

5.3 Pixel-level Motion Direction Models

Our pixel-level detectors (denoted as low-level detectors in Fig. 5.2) classify motion as certain or uncertain and as usual or unusual, where by uncertain we mean that the motion information is unreliable. If we detect motion at a place where no motion was observable previously, the algorithm should classify this case as uncertain (but not as unusual).

Also, in our approach the motion of different image areas is considered independently, so our detection method does not investigate the co-occurrence of events, which would require a global clustering of motion information. The problem with several approaches is that they assume a certain rhythm of the traffic, which holds only for a limited number of situations; this will be discussed in Sec. 5.4.

The performance, behavior, and output of a system based on thousands of noisy and unreliable observations greatly depend on data collection, cleaning/filtering, interpretation, and the motion model itself. In the following subsections we describe our pixel-based motion direction modeling techniques, starting from the interpretation of raw motion statistics and moving on to the inclusion of temporal and spatial support. The algorithms of the following subsections investigate local motion information in a small window of time.

5.3.1 Empirical Probability Estimates of Motion Direction

One may think that the local statistics of motion directions can give valuable information about the typical motion in road environments. To investigate this, first we simply collected 8-bin motion direction histograms for all image pixels, where the bins represented predefined disjoint direction classes $\{E, NE, N, NW, W, SW, S, SE\}$, each having a constant $\Delta = 45°$ range (see Fig. 5.4).

Formally, the direction class $d_k$ is represented by its mean $m_k = k \cdot 45°$, where $0 \le k \le 7$, and $D = \{d_k\}_{0 \le k \le 7}$. Thus the empirical model is parameterized by $\theta_E = \{|d_k|\}_{0 \le k \le 7}$, where $|d_k|$ denotes the number of observations (frequency) collected during a training phase.


Figure 5.4: Motion vector directions are classified into eight predefined direction classes.


Increasing the number of bins could enhance precision but would also increase uncertainty, since the learning time is limited and there is no guarantee of obtaining a continuous distribution during learning.

Initially, we applied no spatial or temporal filtering of the collected data, but supposed that the relative occurrence of motion vectors gives a simple but effective estimate of the empirical probability. Let $o_t$ denote a particular motion direction at time $t$, i.e. $o_t \in \mathbb{R}$, and $d_k \in D$ the direction class into which $o_t$ falls, i.e. $m_k - \Delta/2 \le o_t \le m_k + \Delta/2$. The empirical probability of $o_t$ given the model $\theta_E$ is

$$P(o_t \mid \theta_E) = \frac{|d_k|}{\sum_{d_i \in D} |d_i|} \; . \qquad (5.1)$$
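As an illustration, here is a minimal NumPy sketch of this histogram model (our own sketch, not taken from the thesis; the array names and the degree-valued angle convention are assumptions):

import numpy as np

N_BINS, BIN_WIDTH = 8, 45.0  # direction classes E, NE, ..., SE; Delta = 45 degrees

def direction_bin(angle_deg):
    """Map angles (in degrees) to the class d_k whose mean m_k = k*45 is closest."""
    return np.round(np.asarray(angle_deg) / BIN_WIDTH).astype(np.int64) % N_BINS

def train_histograms(train_dirs, train_mask):
    """Accumulate the per-pixel 8-bin direction counts |d_k| over training frames.

    train_dirs: (T, H, W) motion directions in degrees
    train_mask: (T, H, W) bool, True where motion was observed
    """
    T, H, W = train_dirs.shape
    hist = np.zeros((H, W, N_BINS), dtype=np.int64)
    for t in range(T):
        k = direction_bin(train_dirs[t])
        for b in range(N_BINS):
            hist[:, :, b] += train_mask[t] & (k == b)
    return hist

def empirical_prob(hist, obs_dirs):
    """Eq. 5.1: P(o_t | theta_E) = |d_k| / sum_i |d_i|, evaluated per pixel."""
    total = hist.sum(axis=-1)                               # sum_i |d_i|
    k = direction_bin(obs_dirs)[..., None]                  # bin index of o_t
    counts = np.take_along_axis(hist, k, axis=-1)[..., 0]   # |d_k|
    return np.where(total > 0, counts / np.maximum(total, 1), 0.0)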

In Fig. 5.5 (b) and (c) we used eight distinct colors ($C_k$) for the visualization of motion directions. The color of a pixel at $\mathbf{x} = (x, y)$ is calculated as a weighted sum:

$$\mathrm{RGB}_{\mathbf{x}} = \sum_{0 \le k \le 7} P_{\mathbf{x}}(m_k \mid \theta_E) \, C_k \, , \qquad (5.2)$$

where $P_{\mathbf{x}}(\cdot \mid \theta_E)$ denotes the empirical probability at position $\mathbf{x} = (x, y)$. Naturally, there can be regions where no motion was detected in the video during the learning phase; we excluded from detection those regions where less than 2% of the frames showed any motion during training.
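The coloring of Eq. 5.2 can then be sketched as follows (the palette is an arbitrary choice of ours, and min_active encodes the 2% threshold quoted above):

import numpy as np

# one arbitrary RGB color C_k per direction class E, NE, N, NW, W, SW, S, SE
PALETTE = np.array([[255, 0, 0], [255, 128, 0], [255, 255, 0], [0, 255, 0],
                    [0, 255, 255], [0, 0, 255], [128, 0, 255], [255, 0, 255]],
                   dtype=np.float64)

def direction_color_map(hist, n_train_frames, min_active=0.02):
    """Eq. 5.2: RGB_x = sum_k P_x(m_k | theta_E) * C_k for every pixel.

    Pixels where motion occurred in fewer than 2% of the training frames
    are masked out (shown black), as described above.
    """
    total = hist.sum(axis=-1, keepdims=True)    # (H, W, 1)
    probs = hist / np.maximum(total, 1)         # P_x(m_k | theta_E)
    rgb = probs @ PALETTE                       # weighted sum of class colors
    active = total[..., 0] >= min_active * n_train_frames
    rgb[~active] = 0.0
    return rgb.astype(np.uint8)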


Figure 5.5: (a) sample frame from the One-Way video; (b) pixel-wise motion direction statistics; (c) Mean Shift segmented statistics, colored by Eq. 5.2; (d) an object moving against the traffic in a one-way street; the red overlay marks the unusually moving object detected by the Mean Shift based method, which is also enclosed in a yellow rectangle.


5.3.2 Mixture of Gaussians by Expectation Maximization

In Sec. 5.3.1 a simple histogram-based empirical probability model with predefined direction classes was used, where the exact typical directions were unknown. Now we create a continuous model as a refinement, which is able to represent multiple usual directions (i.e. it is multimodal). A straightforward approach is to fit a MoG (introduced in Sec. 2.2) on the noisy observation data. A mixture of $M$ Gaussians is defined as

$$p(o_t) = \sum_{k=1}^{M} \omega_k \, \mathcal{N}(o_t \mid \mu_k, \Sigma_k) \, , \qquad (5.3)$$

where $o_t$ is the observation being modeled at time $t$, i.e. a motion direction at a pixel position, $\omega_k$ are the weights, and $\mu_k$ and $\Sigma_k$ are the expected values and covariances of the Gaussians. The parameters $\theta_{EM}$ of a MoG for a particular observation set $\{o_1, o_2, \ldots, o_N\}$ (observed motion directions at a given pixel position) can be computed iteratively by Expectation Maximization (EM) [19], as discussed in Sec. 2.2.1.

According to [52], if an observation is within $2.5\sigma$ of the expected value of a distribution, we consider the observation to match the distribution. Denote the set of weights of the matching distributions by $W = \{\omega_{m_1}, \omega_{m_2}, \ldots, \omega_{m_k}\}$, where $1 \le m_i \le M$. Then we define the probability of observation $o_t$ given the MoG model $\theta_{EM}$ as

$$P(o_t \mid \theta_{EM}) = \max\{W\} \, . \qquad (5.4)$$
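A sketch of this matching rule, substituting scikit-learn's EM implementation for the iterative estimation of Sec. 2.2.1 (the function names are ours; angle wrap-around at 0/360 degrees is ignored for brevity):

import numpy as np
from sklearn.mixture import GaussianMixture  # EM-based MoG fitting

def fit_pixel_mog(directions, M=3, seed=0):
    """Fit a 1-D mixture of M Gaussians to one pixel's observed directions."""
    X = np.asarray(directions, dtype=np.float64).reshape(-1, 1)
    return GaussianMixture(n_components=M, random_state=seed).fit(X)

def usual_prob(mog, o_t, match_sigma=2.5):
    """Eq. 5.4: the largest weight among the matching components.

    A component matches if o_t is within 2.5 sigma of its mean [52]; if no
    component matches, the observation is treated as maximally unusual.
    """
    mu = mog.means_.ravel()
    sigma = np.sqrt(mog.covariances_.ravel())
    matching = np.abs(o_t - mu) <= match_sigma * sigma
    return mog.weights_[matching].max() if matching.any() else 0.0

Here a non-matching observation is mapped to probability 0, which Eq. 5.9 below turns into a maximally unusual event.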

5.3.3 Adaptive Mixture of Gaussians

In the previous section a MoG was fitted on the observation set (motion directions) at each pixel to obtain a probability model. However, the pixel-wise iterative parameter estimation made the training process extremely slow. In [51] an adaptive algorithm is proposed to update the parameters of the MoG model used for motion detection: the expected value, covariance, and weight of the distributions are recalculated at each frame (see Sec. 4.3 for a detailed discussion). It is a question whether the adaptive updating process, as proposed in the original article, can follow the fluctuation of optical flow: while in the case of background modeling the background pixels change their values relatively fast and roughly periodically, in the current case we observe recurrence over longer periods. In our experiments we initialized the distributions randomly (testing with different numbers of classes), then in the detection phase we used the method of [51] to update the MoG parameters.

Consider the same mixture of $M$ Gaussians as in Sec. 5.3.2. The adaptive algorithm has to decide whether a new observation $o_t$ at time $t$ matches any component in the mixture. We define the probability that $o_t$ is usual similarly to Eq. 5.4, i.e.

$$P(o_t \mid \theta_{AD}) = \max\{W\} \, , \qquad (5.5)$$

and let $m^\star = \arg\max\{W\}$. At each step (at each frame) $t$ of the training phase we update the weights of all distributions as

$$\omega_{k,t} = (1-\alpha) \cdot \omega_{k,t-1} + \alpha \cdot M_{k,t} \, , \qquad (5.6)$$

where $\alpha$ is the learning factor and $M_{k,t}$ equals $1$ if $k = m^\star$ and $0$ otherwise. After each step the weights are normalized.

The best matching Gaussian $m^\star$ is updated by shifting its expected value toward the observation $o_t$ and then recalculating the covariance. This process can be formulated mathematically as

$$\mu_{m^\star,t} = (1-\rho) \cdot \mu_{m^\star,t-1} + \rho \cdot o_t \, , \qquad (5.7)$$

$$\sigma^2_{m^\star,t} = (1-\rho) \cdot \sigma^2_{m^\star,t-1} + \rho \cdot (o_t - \mu_{m^\star,t})^2 \, , \qquad (5.8)$$

where $\rho = \mathcal{N}(o_t \mid \mu_{m^\star,t-1}, \Sigma_{m^\star,t-1})$, or a constant value [51] as discussed in the previous chapter, which we experimentally set to $\rho = 0.15$.
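The update loop can be sketched per pixel as follows (a sketch of ours; ALPHA is an illustrative learning factor, not a value from the text, while RHO = 0.15 is the constant quoted above):

import numpy as np

ALPHA = 0.01  # weight learning factor (illustrative value, not from the text)
RHO = 0.15    # parameter learning factor, set experimentally as above

def adaptive_update(weights, means, variances, o_t, match_sigma=2.5):
    """One online training step (Eqs. 5.5-5.8) for a single pixel's mixture."""
    matching = np.abs(o_t - means) <= match_sigma * np.sqrt(variances)
    if not matching.any():
        return weights, means, variances, 0.0   # no match: uncertain/unusual
    p_usual = weights[matching].max()           # Eq. 5.5
    m = np.flatnonzero(matching)[np.argmax(weights[matching])]  # m* = argmax{W}
    M_kt = np.zeros_like(weights)
    M_kt[m] = 1.0
    weights = (1 - ALPHA) * weights + ALPHA * M_kt              # Eq. 5.6
    weights /= weights.sum()                                    # renormalize
    means[m] = (1 - RHO) * means[m] + RHO * o_t                 # Eq. 5.7
    variances[m] = (1 - RHO) * variances[m] + RHO * (o_t - means[m]) ** 2  # Eq. 5.8
    return weights, means, variances, p_usual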

5.3.4 Spatial Support: Segmentation of Motion Statistics

To include spatial support we used the Mean Shift (MS) segmentation technique of [25] on the empirical probability estimates $P(\cdot \mid \theta_E)$ (see Sec. 5.3.1). The MS technique is widely used in image segmentation and has the ability to consider spatial information along with other features such as color.

Fig. 5.5 illustrates (a) the input image, (b) the pixel-wise motion statistics, and (c) the segmented motion statistics map (the segmentation parameters were set as proposed in [25]). We denote the set of $L$ disjoint regions resulting from the segmentation by $R = \{r_1, r_2, \ldots, r_L\}$. For a particular region $r_i$, and for the pixels contained in it, the probability distribution $P(\cdot \mid \theta_{MS})$ is obtained from the segmentation.
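A rough sketch of this step, substituting OpenCV's pyrMeanShiftFiltering for the implementation of [25] and approximating regions by their filtered color (the bandwidths sp, sr are illustrative, not the values proposed in [25]):

import numpy as np
import cv2  # pyrMeanShiftFiltering stands in for the segmenter of [25]

def segment_statistics(rgb_stats, sp=10, sr=20):
    """Mean Shift filter the color-coded statistics map (Fig. 5.5(b) -> (c)).

    rgb_stats: (H, W, 3) uint8 image produced by Eq. 5.2;
    sp, sr: spatial/range bandwidths (illustrative values).
    """
    return cv2.pyrMeanShiftFiltering(rgb_stats, sp, sr)

def region_probabilities(filtered, probs):
    """Average P(.|theta_E) within each region to approximate P(.|theta_MS).

    Regions are approximated here by grouping pixels of equal filtered color;
    a true MS segmenter would provide the region labels r_1, ..., r_L directly.
    """
    flat_p = probs.reshape(-1, probs.shape[-1])
    _, labels = np.unique(filtered.reshape(-1, 3), axis=0, return_inverse=True)
    out = np.empty_like(flat_p)
    for r in range(labels.max() + 1):
        idx = labels == r
        out[idx] = flat_p[idx].mean(axis=0)
    return out.reshape(probs.shape)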


5.3.5 Temporal Support: Markovian Extension

Although we insist on the statistical processing of raw data, without object-level understanding of the motion processes or the tracking of objects, we can still use spatiotemporal information for the analysis of motion. We assume that unusual events happen not on a single frame but on at least two consecutive frames, supposing a Markov chain property of the motion of objects. In our approach this means that if we find an anomalously moving pixel and estimate its motion direction at time $t$, then, projecting back with motion compensation to the preceding frame at time $t-1$, there should also be a corresponding anomalous motion vector with low probability. This is formalized as follows. Let

$$P^U_{\mathbf{x}}(o_{\mathbf{x},t} \mid \theta) = 1 - P_{\mathbf{x}}(o_{\mathbf{x},t} \mid \theta) \qquad (5.9)$$

denote the probability of a given motion direction $o_{\mathbf{x},t}$ being unusual at time $t$ at the pixel position $\mathbf{x} = (x, y)$, using the motion model $\theta \in \{\theta_E, \theta_{EM}, \theta_{AD}, \theta_{MS}\}$. Then, using the above assumptions:

$$P^{UM}_{\mathbf{x}}(o_{\mathbf{x},t} \mid \theta) = P^U_{\mathbf{x}}(o_{\mathbf{x},t} \mid \theta) \times \max_{\mathbf{x}' \in N(\mathbf{x}+\mathbf{v})} \left\{ P^U_{\mathbf{x}'}(o_{\mathbf{x}',t-1} \mid \theta) \right\} \, , \qquad (5.10)$$

where the second term of the product means that we use the highest probability of an unusual observation in the $N(\cdot)$ neighborhood set of the motion compensated position $\mathbf{x} + \mathbf{v}$ at time $t-1$, where $\mathbf{v} = (v_x, v_y)$ denotes the optical flow velocity vector. In our experiments $N(\cdot)$ was a square of size $5 \times 5$ around the back-projected position. Our tests showed that using $P^{UM}$ instead of $P^U$ significantly improves anomaly detection performance for all the motion models.
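A sketch of Eq. 5.10 (ours; we assume $\mathbf{v}$ points back from frame $t$ to $t-1$, so that $\mathbf{x} + \mathbf{v}$ is the back-projected position, and half_win = 2 gives the 5 x 5 neighborhood used in our experiments):

import numpy as np

def markov_unusual_prob(p_u_t, p_u_prev, flow, half_win=2):
    """Eq. 5.10: support each pixel's unusual-probability with the maximum
    unusual-probability in a (2*half_win+1)^2 window around x + v at t-1.

    p_u_t, p_u_prev: (H, W) maps of P^U (Eq. 5.9) at times t and t-1
    flow: (H, W, 2) per-pixel velocity v = (vx, vy) pointing back to frame t-1
    """
    H, W = p_u_t.shape
    out = np.zeros_like(p_u_t)
    for y, x in zip(*np.nonzero(p_u_t > 0)):        # pixels with motion estimates
        px = int(round(x + flow[y, x, 0]))          # motion compensated position
        py = int(round(y + flow[y, x, 1]))
        y0, y1 = max(py - half_win, 0), min(py + half_win + 1, H)
        x0, x1 = max(px - half_win, 0), min(px + half_win + 1, W)
        if y0 < y1 and x0 < x1:
            out[y, x] = p_u_t[y, x] * p_u_prev[y0:y1, x0:x1].max()
    return out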

5.3.6 Results, Evaluation

To use the above methods for motion anomaly detection we have to train our statistical models. Basically there is only one parameter to be determined: the time interval for the collection of motion directions at the image pixels. We have to pay attention to gathering enough data and to avoiding unusual, rare events in the training video. In our examples the number of image frames used for training was between 6000 and 8000.

We can monitor the probability of events continuously via $P^U_{\mathbf{x}}(\cdot \mid \theta)$ and $P^{UM}_{\mathbf{x}}(\cdot \mid \theta)$. While we basically apply pixel-based processing, we can still group the local estimates with a simple method: we label each connected component (blobs above the size of 10 pixels) of the binary foreground mask with the average probability $P^U_{\mathbf{x}}(\cdot \mid \theta)$ or the Markovian value $P^{UM}_{\mathbf{x}}(\cdot \mid \theta)$.
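This grouping can be sketched with SciPy's connected-component labeling (our own sketch); plotting its output per frame, after smoothing, yields curves like those in Fig. 5.6:

import numpy as np
from scipy import ndimage  # connected-component labeling

MIN_BLOB_SIZE = 10  # pixels, as above

def most_suspicious_blob(fg_mask, p_unusual):
    """Score each foreground blob by its mean P^U (or P^UM) and return the
    highest score on this frame.

    fg_mask:   (H, W) bool binary foreground mask
    p_unusual: (H, W) map of P^U or P^UM values
    """
    labels, n = ndimage.label(fg_mask)
    best = 0.0
    for blob_id in range(1, n + 1):
        idx = labels == blob_id
        if idx.sum() >= MIN_BLOB_SIZE:      # ignore blobs below the size threshold
            best = max(best, float(p_unusual[idx].mean()))
    return best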



Figure 5.6: Anomaly detection results for the frames of the One-Way sequence. (a) Empirical probability; (b) Mean Shift probability; (c) Mixture of Gaussians by EM; (d) Adaptive Mixture of Gaussians. Red dashed line: without Markovian extension ($P^U_{\mathbf{x}}(\cdot \mid \theta)$). Blue solid line: with Markovian extension ($P^{UM}_{\mathbf{x}}(\cdot \mid \theta)$).

We plot the probability of the most suspicious blob (the one with the highest value) as a function of the frame number. Fig. 5.6 illustrates the results for the One-Way sequence, where the trend lines, considered as the final output of the detector, are smoothed versions of the probability values. In all cases the peak is at frame 28500, the only unusual event, when a bicycle comes in the wrong way on the one-way street. This is illustrated in Fig. 5.5(d), where the unusually moving bicycle is marked in red. For detection we used the Mean Shift based detector with Markovian support ($P^{UM}_{\mathbf{x}}(\cdot \mid \theta_{MS})$).

While all methods succeeded in such situations (and in other test videos as well), the question is the size of the gap between real unusual events and other rare events. From the graphs in Fig. 5.6 it is easy to see that the Markovian extension increased the difference between the anomalous and the usual events by approximately 30% in all cases, but there is no significant difference between the models. In general the EM method outperforms the others, but unfortunately the iterative estimation of the parameters (weights, expected values, and covariances) at every pixel using EM [19] is very time consuming, and the memory requirement makes this approach infeasible: we have to store the motion directions calculated on every frame at every pixel position, which for 5000 training frames of a QCIF video requires 366 MB of memory.