
5.4 Hidden Markov Models for Anomaly Detection

5.4.2 Region-Level HMMs

5.4.2.3 Introducing Relative Emissions by Scaling

The precision problem originates from the definition of the emission probability in Eq. 5.13, where we defined the emission probability of our model as a product of $b_i(o_{t,k})$ probabilities. Their values depend on the parameters of the MoG, but are typically small ($\ll 1$). In the training phase these parameters are roughly estimated and then re-estimated iteratively. If the training set of observations is heavily loaded with noise, which is typical in outdoor surveillance, the covariance of the Gaussians will be set to a relatively large value in order to achieve a fit. As a result, the


product in the emission probability Eq. 5.13 will head exponentially to zero. In general, the larger the covariances are, the fewer samples are needed to reach zero. In the case of 64-bit floating point representation the limit is $2^{-1022}$ ($\approx 2.23 \times 10^{-308}$), which restricts the original model to handling only a low number of samples and would make the model unfeasible at QCIF or CIF resolution. Hence, it is necessary to incorporate a scaling procedure into the calculation of the emission probabilities to be able to process a larger number of vectors. For this reason we define the relative emission as follows:

$$\tilde{b}_i(o_t) = \prod_{k=1}^{K_t} c_{t,k}\, b_i(o_{t,k}),$$

where we denote our scaling coefficients as $c_{t,k} = \left[ \sum_{j=1}^{N} b_j(o_{t,k}) \right]^{-1}$.
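The effect of this scaling can be sketched numerically. The following Python snippet is illustrative only (the state count, sample count and per-sample likelihood range are arbitrary assumptions, not values from the thesis); it contrasts the raw product of Eq. 5.13, which underflows, with the product of per-sample scaled factors:

```python
import numpy as np

rng = np.random.default_rng(0)

N, K = 5, 200          # assumed number of states and samples per observation
# hypothetical per-sample MoG likelihoods b_i(o_{t,k}); small, as with noisy data
b = rng.uniform(1e-4, 1e-2, size=(N, K))

# Original emission (Eq. 5.13): product over samples; for K this large the
# result lies far below 2**-1022 and underflows to exactly 0.0
raw = np.prod(b, axis=1)

# Relative emission (Eq. 5.15): scale each factor by c_{t,k} = 1 / sum_j b_j(o_{t,k}),
# so every factor lies in (0, 1] and the factors of different states stay comparable
c = 1.0 / b.sum(axis=0)            # shape (K,)
relative = np.prod(c * b, axis=1)  # finite, nonzero, usable for re-estimation

print(raw)        # all zeros
print(relative)   # small but representable values
```

The scaled factors no longer measure absolute likelihood, only the relative weight of each state per sample, which is exactly what the re-estimation formulas need.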

The scaling procedure introduced above can be incorporated into the sequence scaling method discussed in Sec. 2.4.5, where each forward variable $\alpha_t(i)$ was scaled by the sum of $\alpha_t(i)$ over all states. It can be shown that the original re-estimation procedures can be used by simply changing the emission probabilities to relative emissions (see Eq. II.6–II.10 and Eq. II.11 in Appendix II.2 for more details). The only change is in the computation of the log-likelihood function, which evaluates the convergence of the re-estimation process, and can be expressed as

$$\log P(O \mid \lambda) = -\sum_{t=1}^{T} \log \hat{c}_t,$$

where $\hat{c}_t$ are the scaling coefficients, calculated as in the scaling method of Sec. 2.4.5, but using relative emissions. A comprehensive proof can be found in Appendix II.2.
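A minimal sketch of this Rabiner-style scaled forward pass, with the log-likelihood recovered from the scaling coefficients (function and variable names are illustrative, not from the thesis):

```python
import numpy as np

def forward_scaled(pi, A, B):
    """Scaled forward pass in the style of Sec. 2.4.5.
    pi: (N,) initial state probabilities; A: (N, N) transition matrix;
    B: (N, T) per-state emission values for each time step (here these
    would be the relative emissions b~_i(o_t)).
    Returns the scaled forward variables and the scaling coefficients."""
    N, T = B.shape
    alpha = np.zeros((N, T))
    c_hat = np.zeros(T)
    a = pi * B[:, 0]
    c_hat[0] = 1.0 / a.sum()          # scale by the sum over all states
    alpha[:, 0] = a * c_hat[0]
    for t in range(1, T):
        a = (alpha[:, t - 1] @ A) * B[:, t]
        c_hat[t] = 1.0 / a.sum()
        alpha[:, t] = a * c_hat[t]
    return alpha, c_hat

def log_likelihood(c_hat):
    # log P(O|lambda) = -sum_t log c_hat_t
    return -np.sum(np.log(c_hat))
```

When B holds ordinary emission probabilities this recovers the exact sequence likelihood; with relative emissions the same quantity is used only to monitor the convergence of the re-estimation, as the text notes.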

In order to get the state sequence of the observations, the calculation of the Viterbi path [23] is necessary, where the logarithm of the emission probabilities and of the forward and backward variables can be used [48]. In this case the product in the emission probabilities turns into a sum, hence the use of the proposed relative probabilities is not necessary. See Sec. 2.4.3 and Sec. 2.4.5 for more details.
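A log-domain Viterbi decoder along these lines can be sketched as follows (an illustrative implementation of the standard algorithm, not code from the thesis); since it only ever adds log-emissions, no scaling is required:

```python
import numpy as np

def viterbi_log(pi, A, log_B):
    """Viterbi in the log domain: products become sums, so the
    emission probabilities need no scaling.
    pi: (N,) initial probabilities; A: (N, N) transition matrix;
    log_B: (N, T) log-emissions log b_i(o_t) per state and time.
    Returns the most probable state sequence."""
    N, T = log_B.shape
    log_pi, log_A = np.log(pi), np.log(A)
    delta = np.zeros((N, T))
    psi = np.zeros((N, T), dtype=int)      # backpointers
    delta[:, 0] = log_pi + log_B[:, 0]
    for t in range(1, T):
        scores = delta[:, t - 1][:, None] + log_A   # (from-state, to-state)
        psi[:, t] = np.argmax(scores, axis=0)
        delta[:, t] = scores[psi[:, t], np.arange(N)] + log_B[:, t]
    # backtrack from the best final state
    path = np.zeros(T, dtype=int)
    path[-1] = int(np.argmax(delta[:, -1]))
    for t in range(T - 2, -1, -1):
        path[t] = psi[path[t + 1], t + 1]
    return path
```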

To illustrate the positive effect of our scaling technique, we show example data taken from a test video where five states and six Gaussians were used in the HMM model (i.e. $N = 5$ and $M = 6$).

From the observation sequence we randomly selected one observation $o_r$, which contained nine samples (i.e. $K_r = 9$). We monitored the calculation of the emission probabilities $b_i(o_r)$ and the relative emissions $\tilde{b}_i(o_r)$ while running the Baum–Welch procedure. Finally, we obtained the parameters of our HMM and the state $s^\star$ which most probably emitted the selected observation $o_r$. Fig. 5.8 left contains the comparison of the original emission probability $b_{s^\star}(o_r)$ (marked with yellow) and the proposed relative emission $\tilde{b}_{s^\star}(o_r)$ (marked with black). The vertical axis represents the logarithm of the product, while horizontally we plot the number of samples used in the product. It is easy to see that as the number of samples (i.e. the number of multiplications) increases, the value of the product decreases exponentially (please note that the figure is in negative logarithmic scale).

Practically, this means that with the original method at most approximately 50 optical flow vectors can be present in the ROI, while using relative emissions 200–400 vectors still result in a feasible model (the product stays above the precision level). (Please note that the graphs contain data at three steps of the Baum–Welch algorithm: the initial step, the 1st iteration and the final model.) Fig. 5.8 right contains the result of a second test with $K_r = 12$, $N = 4$ and $M = 12$.

Figure 5.8: Probabilities of two observations by the number of test samples. In both (left and right) examples our method converges significantly more slowly to zero. The original emission of Eq. 5.13 is marked with yellow and the proposed relative emission of Eq. 5.15 is marked with black. Three steps of the iterative training process are selected: the initial estimate is marked with dotted, the first iteration with dashed and the final step with continuous lines.

5.4.2.4 Anomaly Detection

After learning the parameters of the HMM and running the Viterbi algorithm, we can get the most probable state sequence. In the case of our test sequence it is driven by the traffic lights, illustrated in Fig. 5.9 with different colors as follows. The states are represented by MoGs with specific means, covariances and weights. The means give the typical directions occurring in the ROI and the weights are proportional to the relative number of occurrences. This information is visualized by different colors (means) and column heights (weights) drawn along the time axis. We used the HSV color space, where the hue was determined by the mean angle.
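The angle-to-hue mapping used for this visualization can be sketched as follows (an illustrative helper, not code from the thesis; the function name and signature are assumptions):

```python
import colorsys
import math

def state_color(mean_angle_rad, weight):
    """Map a MoG component's mean direction to an HSV hue (fully
    saturated, full value) and pass its weight through as the column
    height, in the spirit of the Fig. 5.9 visualization."""
    hue = (mean_angle_rad % (2 * math.pi)) / (2 * math.pi)  # hue in [0, 1)
    rgb = colorsys.hsv_to_rgb(hue, 1.0, 1.0)
    return rgb, weight  # bar color and bar height
```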

From real-world recordings we synthesized a new realistic video where an anomalously moving vehicle ran in between other cars (the car in the middle of the crossing is shown in Fig. 5.10), and used the previously trained HMM to detect this unusual event.

Figure 5.9: Top: examples of the four detected states. Bottom: the state sequence of the highlighted ROI represented by the mean angles (hue) of the state MoGs in HSV color space. The rhythm of the traffic is clearly visible.

Our anomaly detector uses both the hidden and the Markovian properties of the model, similar to the method discussed in Sec. 3.3.1. We define the probability of the observation $O_t$ at time $t$ that was generated by state $s_i$, given the previous state $Q_{t-1}$, as

$$P(O_t, Q_t = s_i \mid Q_{t-1}) = \begin{cases} \tilde{b}_i(O_t)\, a_{Q_{t-1}, i} & \text{if } Q_{t-1} \neq s_{-1}, \\ \tilde{b}_i(O_t)\, \pi_i & \text{if } Q_{t-1} = s_{-1}, \end{cases}$$

where $Q_{t-1} = s_{-1}$ denotes that the previous state is unknown. We select the state $s^\star$ with the highest such probability value and define the probability that $O_t$ is usual as

$$P^\star(O_t) = P(O_t, Q_t = s^\star \mid Q_{t-1}).$$

The result, the smoothed value of $-\log P^\star(O_t)$, is demonstrated in Fig. 5.10 with the three detected frames of the video having the lowest probability. Unusual event detection can be easily carried out by thresholding this scalar function. Similarly to the first example, in our second experiment an anomalously moving vehicle is driving dangerously through an intersection, between two cars that are passing the crossing lane. The result is demonstrated in Fig. 5.11 with three detected example frames and probability values (the anomalously moving vehicle is marked with a yellow contour).
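A per-frame score in this spirit can be sketched as follows. The combination rule used here (emission weighted by the transition probability from the previous state, or by the initial distribution when the previous state is unknown) is an assumption about the detector, as are all names; the thesis's own equations above are authoritative:

```python
import numpy as np

def anomaly_score(B_rel_t, A, prev_state=None, pi=None):
    """Anomaly score of one frame.
    B_rel_t: (N,) relative emissions b~_i(O_t) for the current frame;
    A: (N, N) transition matrix; prev_state: index of Q_{t-1}, or None
    when the previous state is unknown (s_{-1}), in which case the
    initial distribution pi is used instead of a transition row."""
    trans = pi if prev_state is None else A[prev_state]
    joint = B_rel_t * trans           # P(O_t, Q_t = s_i | Q_{t-1}) per state
    p_star = joint.max()              # probability of the best-matching state
    return -np.log(p_star)            # high score = unusual observation

# Thresholding the (optionally smoothed) score over time flags unusual events.
```

An observation consistent with the learned dynamics yields a low score, while one that no state explains well under the expected transition yields a high score.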