Neighborhood Effects Among Foreground Pixels

4.4 Modified Mixture of Gaussians

4.4.2 Neighborhood Effects Among Foreground Pixels

In the previous section we introduced a Gaussian distribution to represent homogeneous foreground areas. Now we have to ensure that these foreground pixels do not become part of the background due to the foreground aperture effect. To this end, we check whether a pixel of the foreground model has foreground neighbors with a similar expected value and a smaller standard deviation. (Note that this refers to large homogeneous moving areas: as a large plain region moves, its margins on the side facing the direction of motion have a relatively high standard deviation, which decreases towards the other parts of the plain region.) If a neighbor's standard deviation is within the range of the investigated pixel, then we have to increase that neighbor's standard deviation. Based on these assumptions we define the neighbor foreground set $NF_{\mathbf{x},t}(\Delta)$ at position $\mathbf{x} = (x, y)$ within radius $\Delta$, which contains those foreground models that meet the following conditions:

1. the foreground models in $NF_{\mathbf{x},t}(\Delta)$ and the foreground model at $\mathbf{x}$ have similar expected values;

2. the standard deviation of a model in $NF_{\mathbf{x},t}(\Delta)$ is smaller than the standard deviation of the foreground model at position $\mathbf{x}$.

Denoting by $\mu_{\mathbf{x},F,t}$ and $\sigma_{\mathbf{x},F,t}$ the mean and standard deviation of the foreground model at position $\mathbf{x}$ at time $t$, the above assumptions are formalized as

$NF_{\mathbf{x},t}(\Delta) =$
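The two conditions listed above can be written as a set. The following is a sketch under stated assumptions rather than the original definition: the choice of norm and the mean-similarity tolerance $\varepsilon_\mu$ are hypothetical, introduced here only to make "within radius $\Delta$" and "similar expected values" concrete.

```latex
NF_{\mathbf{x},t}(\Delta) = \Bigl\{\, \mathbf{y} \;:\;
  0 < \lVert \mathbf{y} - \mathbf{x} \rVert \le \Delta,\;
  \lVert \mu_{\mathbf{y},F,t} - \mu_{\mathbf{x},F,t} \rVert \le \varepsilon_\mu,\;
  \sigma_{\mathbf{y},F,t} < \sigma_{\mathbf{x},F,t} \,\Bigr\}
```

Here $\mathbf{y}$ ranges over positions that currently hold a foreground model; the second term corresponds to condition 1 and the third to condition 2.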

The following recursive Diffuse algorithm of the method is used to increase the standard deviation of the neighboring foreground models at position $\mathbf{x}$:

procedure Diffuse($\mathbf{x}$)


The Diffuse procedure defined above is initially called for every image pixel at time $t$. As a result, the high standard deviation of the foreground model found at the margins of homogeneous regions diffuses into the inner areas of objects. $D_{max}$ blocks this diffusion at disoccluded parts of the image and $M_{max}$ modifies the sensitivity of this process.
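The diffusion idea can be illustrated with a minimal Python sketch. It is not the procedure of the method itself: the grayscale (scalar) means, the square neighborhood, the tolerance `mean_tol`, the update factor `rate` and the `visited` set are all assumptions made for illustration, and how they relate to $D_{max}$ and $M_{max}$ is not specified here.

```python
def neighbor_set(x, mu, sigma, fg, radius, mean_tol):
    """Foreground pixels within `radius` of x that have a similar mean
    and a smaller standard deviation than the model at x."""
    r0, c0 = x
    rows, cols = len(mu), len(mu[0])
    result = []
    for r in range(max(0, r0 - radius), min(rows, r0 + radius + 1)):
        for c in range(max(0, c0 - radius), min(cols, c0 + radius + 1)):
            if (r, c) == x or not fg[r][c]:
                continue
            similar_mean = abs(mu[r][c] - mu[r0][c0]) <= mean_tol  # condition 1
            smaller_std = sigma[r][c] < sigma[r0][c0]              # condition 2
            if similar_mean and smaller_std:
                result.append((r, c))
    return result


def diffuse(x, mu, sigma, fg, radius=1, mean_tol=5.0, rate=0.5, visited=None):
    """Recursively raise the standard deviation of qualifying neighbors.

    `rate` (hypothetical) moves a neighbor's sigma part of the way toward
    the sigma at x; `visited` ensures each pixel is updated at most once
    per top-level call, which keeps the recursion finite and cheap.
    """
    r0, c0 = x
    if not fg[r0][c0]:
        return
    visited = {x} if visited is None else visited
    for y in neighbor_set(x, mu, sigma, fg, radius, mean_tol):
        if y in visited:
            continue
        visited.add(y)
        r, c = y
        sigma[r][c] += rate * (sigma[r0][c0] - sigma[r][c])
        diffuse(y, mu, sigma, fg, radius, mean_tol, rate, visited)


# A 1x5 strip: a high-variance margin pixel followed by a plain interior.
mu = [[100.0] * 5]
sigma = [[10.0, 4.0, 4.0, 4.0, 4.0]]
fg = [[True] * 5]
diffuse((0, 0), mu, sigma, fg)
print(sigma[0])
```

In this toy strip the standard deviation decays gradually from the margin towards the interior, which is the qualitative behavior described above. For full-size frames an explicit stack would be preferable to Python recursion.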

4.5 Experiments, Results

We have extended the original method of [51] with the above algorithm and used several different test videos to investigate the performance. We expected that in large moving homogeneous regions fewer pixels would be learned by the background model, and thus foreground objects would be less ragged.

Besides our own test videos, we have chosen some, with the permission of the authors, from those used by Toyama et al. when testing Wallflower [55]. Fig. 4.1 shows one input frame, the output of the original MoG method [51], the output frame after morphological closing and the result of the proposed technique. As can be seen, the foreground area remained more connected and kept the shape of the objects. The proposed technique clearly outperforms the often used morphological post-processing and, besides keeping the objects' integrity, partly or slowly moving regions remain detected (see the chair in the image).

(a) Input frame (b) Original MoG [51] (c) MoG + morphology (d) Proposed MoG

Figure 4.1: Visual comparison of one frame from the Wallflower dataset [55].

For numerical comparison we created a small dataset containing three videos with homogeneous foreground areas, and we produced manual ground truth segmentations for manually selected frames. Finally, we compared the original MoG-based method with our extended model. Fig. 4.2 presents the three frames on which the numerical comparison was carried out. Table 4.1 summarizes the performance of the original [51] and the proposed method by counting the misclassified pixels (either false positive or false negative).
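Counting misclassified pixels against a ground truth mask can be sketched as follows. This is a minimal illustration, assuming binary masks stored as nested lists where a truthy value means foreground; the function name and mask layout are not taken from the original implementation.

```python
def count_misclassified(predicted, ground_truth):
    """Return (false_positives, false_negatives, total) for two binary masks."""
    false_pos = false_neg = 0
    for pred_row, gt_row in zip(predicted, ground_truth):
        for pred, truth in zip(pred_row, gt_row):
            if pred and not truth:
                false_pos += 1   # background pixel reported as foreground
            elif truth and not pred:
                false_neg += 1   # foreground pixel missed by the detector
    return false_pos, false_neg, false_pos + false_neg


predicted = [[1, 1, 0],
             [0, 1, 0]]
ground_truth = [[1, 0, 0],
                [1, 1, 1]]
print(count_misclassified(predicted, ground_truth))
```

The total is the sum of both error types, so a method is not rewarded for trading missed foreground pixels for false detections.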

To compare computational complexity we implemented the original and the improved algorithm in the same framework as single-threaded C++ applications using the Intel Image Processing Library (IPL). Processing three-channel video frames at a resolution of 320×240 on a Windows XP PC with a 3 GHz Intel Xeon CPU, the frame rate dropped from 12 FPS to approximately 8 FPS. In our experiments we used three Gaussian components in the background models.

| Video | Original MoG | Proposed MoG | Improvement |
|---|---|---|---|
| Car | 4350 | 1934 | 55.54% |
| Woman | 3840 | 1778 | 53.70% |
| Man | 2815 | 1447 | 48.60% |

Table 4.1: Misclassified pixels using the original MoG-based method and our extended model on the frames of Fig. 4.2.
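The Improvement column of Table 4.1 is consistent with the relative reduction in misclassified pixels, (original − proposed) / original. The short check below recomputes it from the table values; the helper name is ours, and the same ratio is applied to the reported frame rates for comparison.

```python
def relative_reduction(before, after):
    """Relative decrease from `before` to `after`, in percent."""
    return 100.0 * (before - after) / before


# Misclassified pixel counts from Table 4.1: (original MoG, proposed MoG).
errors = {"Car": (4350, 1934), "Woman": (3840, 1778), "Man": (2815, 1447)}
improvement = {video: round(relative_reduction(orig, prop), 2)
               for video, (orig, prop) in errors.items()}
print(improvement)

# Frame rate change reported in the text: 12 FPS down to about 8 FPS.
print(round(relative_reduction(12, 8), 1))
```

Since the 8 FPS figure is itself approximate, the frame rate ratio is only a rough indication of the computational cost.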

(rows) Car, Woman, Man; (columns) Input, Ground truth, Original MoG, Proposed MoG

Figure 4.2: Comparison of classical MoG-based foreground-background separation and the proposed method. No morphological post-processing step was performed on the output.

4.6 Conclusion

In this chapter we discussed the basics of MoG-based motion detection. According to our experience and other papers, the foreground aperture problem is an important problem in motion detection, so we developed a new method to minimize it. Compared to other techniques our approach reduces the undetected areas drastically, preserves the shape of the objects and detects slowly moving objects, but decreases the frame rate by approximately 30%. This is achieved by an improvement of the original adaptive MoG model, without any iterative optimization. To illustrate our results we made ground truth foreground-background separation on several test videos and performed numerical comparison.


Chapter 5

Unusual Event Detection in Urban Environment

The analysis of motion vectors is one of the main tools for understanding complex motion behaviors. In most cases these tasks require real-time and robust image processing methods, because the images of low-cost camera systems are often loaded with heavy noise. As already discussed, the noise may originate from the device (electronic noise, optical distortion, shaking, flicker, auto white balance, aliasing error, frame drop, compression artifacts, etc.) or from the circumstances of the surveillance scene (weather and lighting conditions such as rain, wind, sunlight, dirt, headlight glare of vehicles, occlusion, non-rigid motion, shadows, etc.). It is obvious that all these types of noise cannot be removed directly in real time; instead, spatial and temporal support must be involved in the analysis and interpretation of observed data. In this chapter we show robust, pixel-dense motion modeling techniques to detect unusual events in low-quality open-air surveillance videos without any specific prior knowledge of the motion patterns, situation and environment. The proposed automatic methods require only an unsupervised training phase.

Since the frame rate of surveillance videos is often not stable, we do not consider the magnitude of motion vectors extracted from input video frames. In the proposed methods we use pixel-level probability models, multidimensional mixtures of Gaussians (MoGs), and single-level and hierarchical hidden Markov models (HMMs). In our approach there is no need to recognize or track objects in order to raise alerts on anomalous motion or to model and visualize the fluctuation of traffic.
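Discarding the magnitude and keeping only the direction of a motion vector can be sketched as below. This is an illustration under assumptions: the quantization into `n_bins` equal sectors (eight by default, with bin 0 centered on the positive x axis) is our choice for the example and is not prescribed by the text.

```python
import math


def direction_bin(dx, dy, n_bins=8):
    """Quantize the direction of a flow vector (dx, dy) into one of n_bins
    equal sectors. The magnitude is ignored; a zero vector has no direction
    and yields None."""
    if dx == 0 and dy == 0:
        return None
    angle = math.atan2(dy, dx) % (2 * math.pi)   # angle in [0, 2*pi)
    width = 2 * math.pi / n_bins
    # Shift by half a sector so that each bin is centered on its direction.
    return int((angle + width / 2) // width) % n_bins


# Vectors of different length pointing the same way share a bin.
print(direction_bin(1, 0), direction_bin(5, 0))
print(direction_bin(0, 1), direction_bin(-1, 0), direction_bin(0, -1))
```

Because a vector observed at half the frame rate is roughly twice as long but points the same way, a direction-only observation is unaffected by frame drops of this kind.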

A hierarchy of HMMs can be composed of separate HMMs of non-overlapping image regions and of higher-level HMMs analyzing the co-occurrence of low-level states. We also show a scaling method introduced into the mathematical model of the HMM to get a robust tool for the statistical analysis of a large number of motion vector samples at a time. We illustrate the usage of our models with several real-life videos.
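To show why scaling matters for HMMs, the sketch below implements the widely used per-step normalization of the forward variables for a discrete HMM. It is the textbook technique, given here as an assumption-laden illustration; it is not necessarily the scaling scheme developed in this chapter, and the toy parameters are invented.

```python
import math


def scaled_forward_log_likelihood(pi, A, B, observations):
    """Log-likelihood of a discrete observation sequence under an HMM.

    pi[i]   : initial probability of state i
    A[i][j] : transition probability from state i to state j
    B[i][k] : probability of emitting symbol k in state i
    The forward variables are renormalized at every step and the logs of
    the normalizers are accumulated, so long sequences do not underflow.
    Assumes every observation has nonzero probability under the model.
    """
    n = len(pi)
    log_likelihood = 0.0
    alpha = None
    for t, obs in enumerate(observations):
        if t == 0:
            alpha = [pi[i] * B[i][obs] for i in range(n)]
        else:
            alpha = [sum(alpha[i] * A[i][j] for i in range(n)) * B[j][obs]
                     for j in range(n)]
        scale = sum(alpha)
        alpha = [a / scale for a in alpha]
        log_likelihood += math.log(scale)
    return log_likelihood


# Toy two-state model with two observation symbols (illustrative values).
pi = [0.6, 0.4]
A = [[0.7, 0.3], [0.4, 0.6]]
B = [[0.9, 0.1], [0.2, 0.8]]
print(scaled_forward_log_likelihood(pi, A, B, [0, 1] * 2500))
```

The product of the per-step normalizers equals the sequence likelihood, so summing their logarithms gives the same quantity as the unscaled recursion while remaining finite for thousands of samples.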

The remainder of this chapter is organized as follows. After overviewing related works in Sec. 5.1, we give a brief outline and the assumptions of our proposed system in Sec. 5.2, including the preprocessing steps (optical flow calculation and region construction) required by the models defined later. In Sec. 5.3 we investigate the use of some low-level techniques for the analysis of dense optical flow directions without object-level understanding. In our discussion we call a motion event unusual at any location if the observed direction is implausible. Our two different HMM-based detection algorithms are discussed in Sec. 5.4.2 and Sec. 5.4.3; anomaly detection is demonstrated on real-life videos at the end of these sections. Finally, in Sec. 5.4.4, we analyze the performance of our HMM-based method to demonstrate its real-time processing capability.
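The notion of an implausible direction at a location can be made concrete with a deliberately simple stand-in. The sketch below uses a smoothed per-location histogram of quantized directions rather than the MoG or HMM models of this chapter; the class name, the number of bins and the probability threshold are all hypothetical.

```python
from collections import Counter


class DirectionModel:
    """Per-location histogram of quantized motion directions."""

    def __init__(self, n_bins=8):
        self.n_bins = n_bins
        self.counts = Counter()
        self.total = 0

    def train(self, direction_bins):
        """Accumulate direction bins observed during unsupervised training."""
        for b in direction_bins:
            self.counts[b] += 1
            self.total += 1

    def probability(self, b):
        """Add-one smoothed probability of observing direction bin b."""
        return (self.counts[b] + 1) / (self.total + self.n_bins)

    def is_unusual(self, b, threshold=0.05):
        """Flag a direction whose learned probability is below the threshold."""
        return self.probability(b) < threshold


# Training data for one location: mostly bin 0, sometimes bin 1.
model = DirectionModel()
model.train([0] * 100 + [1] * 20)
print(model.is_unusual(0), model.is_unusual(1), model.is_unusual(4))
```

A direction never seen during training (bin 4 here) receives a small but nonzero probability through smoothing and is flagged, while the frequent directions are not.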

5.1 Unusual Event Detection in Videos

A common approach in automated visual surveillance is to find specific motion patterns or objects (e.g. pedestrians, cars, motion at specific locations, etc.), or alternatively to find unexpected events or behavior without any a priori knowledge or specification of their features, situation and environment. Most of the known methods for surveillance apply object tracking based approaches (e.g. [27, 32]), or investigate the co-occurrence of different motions (e.g. [77]). However, object tracking based approaches work with a high false alarm rate in noisy environments, and thus surveillance applications face a lot of problems, as discussed in several papers (e.g. [20]), resulting in a significant gap between laboratory testing and real-life applications. A good survey of visual surveillance can be found in [29]. While there is a wide range of approaches, many of them cannot be applied in outdoor surveillance due to unreliable observation data. Some examples of the practical problems of object tracking are presented in Fig. 5.1. In this experiment we used the Mean Shift based object tracking method of [13] and applied Kalman [34] filtering on the position and size of the objects.

(a) (b) (c)

Figure 5.1: Typical practical problems of the object tracking based approaches: in (a) the tracker lost the highlighted vehicle and found another one with similar color; in (b) two vehicles were treated as a single object, and after these vehicles separated the tracker found two other similar vehicles; in (c) the tracker grouped two vehicles entering the scene, but later the two cars moved in different directions. In our experiments we used a Mean Shift based tracker [13] and a Kalman [34] post-processing filter.

HMM-based methods are widely used in computer vision for offline video analysis. In [76], a semi-supervised adapted HMM framework is proposed, where usual and unusual event models are created in an iterative process. They use simple background subtraction for motion detection and then principal component analysis (PCA) is used to reduce the dimension of observation data. The effectiveness is not proven for outdoor surveillance (only indoor videos have been tested); even the simple motion detection mechanism may generate lots of false data in case of real outdoor security recordings. Moreover, the usage of PCA (trained on usual video examples) is questionable since in the detection phase it may easily filter out rare/abnormal information as PCA-based data reduction tries to preserve the most probable information in the data set. The method in [32] uses clustering to select normal event pattern groups of object trajectories, then HMM training is performed on each normal group. Finally, unusual events are detected by analyzing the likelihood of an unseen object trajectory in every model. The method assumes the knowledge of trajectories, which is not trivial in many crowded scenarios. Hongeng et al. in [27] use object tracking and analyze human activities considered to be composed of action threads, each thread being executed by a single actor. 2D shape and trajectory features are analyzed and the effects of noise are to be reduced by additional mechanisms such as ground plane detection and feet detection, while the complexity of their inference method largely depends on the number of moving objects and the complexity of the scene. In [1], the anomalies of optical flow in pedestrian traffic are analyzed. Video frames are divided into fixed size blocks and local models are defined for each block. They propose the use of HMMs for the analysis of likelihoods of observations in a time-window. Unfortunately only indoor data are evaluated, and they project the input optical flow patterns on the principal components of the training flow fields, which might filter out valuable information about abnormal events. [43] also proposes the use of HMMs for the analysis of the motion of people in indoor scenes. While they concentrated only on human tracking and applied background subtraction for motion detection, known to be unreliable in open air, they report that the false alarm rate was too high. They trained a discrete HMM for each normal activity, an approach that is not realizable in video surveillance. [11] uses entropic minimization in the training process of the HMM, and finds unusual events in an offline process by analyzing the likelihoods in the forward-backward procedure in a given time-window. Unfortunately, spatially distant events cannot be easily separated in that model and localizing the anomalous event is also impossible. [71] presents a unified automatic model selection based approach for modeling complex activities of multiple objects in cluttered scenes. Events are represented by pixel change history (PCH), which are classified by unsupervised MoG-based clustering using Expectation-Maximization, and the model order selection is based on Schwarz's Bayesian information criterion (BIC). Dynamic probabilistic networks (DPNs) are formulated for modeling the temporal and causal correlations among discrete events for robust and holistic scene-level behavior interpretation.
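For reference, Schwarz's criterion in its commonly used form is given below for a model with $k$ free parameters fitted to $n$ samples with maximized likelihood $\hat{L}$. The sign convention and the parameter count actually used in [71] are not stated here, so this should be read as the usual textbook form, under which the candidate number of mixture components $K$ with the lowest value is selected.

```latex
\mathrm{BIC}(K) = k_K \ln n - 2 \ln \hat{L}_K ,
\qquad
K^{*} = \arg\min_{K} \mathrm{BIC}(K)
```

The first term penalizes model complexity and the second rewards goodness of fit, which is what allows the number of Gaussian components to be chosen automatically.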

In [66] recognition of unusual motion is done by modeling activity by the polygonal shape of the configuration of interacting objects at any time, and its deformation over time. In their method they tried to avoid tracking, but their proposed technique is inadequate for outdoor traffic videos where occlusion, noise and shadows make it impossible to generate point masses for shape analysis as proposed. [8] uses space-time video segments measured relative to all the other video segments within a small window in time, and multiscale representations of these patches are created. Judging from the examples shown in their paper, it is unclear how the algorithm behaves in noisy outdoor sequences and what happens if the number of objects is high, or their size is small and occlusion happens. [77] applies co-occurrence matrices of simple prototype features (such as the spatial histogram of binary moving object maps) for the analysis of motion. Event detection is performed by K-means clustering, identifying isolated clusters as unusual events.

The number of papers about tracking based approaches is tremendous and is still increasing. [15] is a good example of new directions in this field: this paper proposes a detection approach not requiring the binarization of the difference image. Local density maxima in the difference image (usually representing moving objects) are outlined by a fast non-parametric mean shift clustering procedure. Object tracking is carried out by updating and propagating cluster parameters over time using the mode seeking property of the mean shift procedure. For occluding targets, a fast procedure determining the object configuration maximizing image likelihood is presented.
