
Analyzing the above papers we have found that most of the methods use tracking information, and usually indoor applications were targeted. Some papers applied PCA for dimension reduction, with the danger of filtering out abnormal motion patterns, while others need examples of each normal activity for training. Unfortunately, there is no uniform standard to compare and judge the different methods in a quantitative way. Even the definition of an unusual event or anomalous event is not very clear and may differ depending on the specific application environment, and naturally it is not easy to find videos with anomalous content.

Instead of mentioning other approaches, we refer to previous overviews such as the survey in [42], reviewing advances in human motion capture and analysis from 2000 to 2006, following a previous survey of papers up to 2000 [41].

5.2 General Overview of The Proposed System

As there are lots of approaches to unusual event detection and motion analysis (for example, [42] lists 424 papers in the given period), we should clarify what is new in our model and what our purposes and assumptions are:

• According to our observations, occlusions, shadows, and other effects make tracking unfeasible in several outdoor cases. To avoid the failure of the tracking algorithm, application-specific modeling of objects or partial objects and their behavior would be necessary. To keep away from such complex models we apply no object-based tracking. Instead, we use a dense optical flow field to build models upon. Our low-level detector models detect unusual motion at every pixel and do not assume the presence of any particular traffic rhythm; the HMMs, however, will give us temporal links between observations. To avoid misunderstanding we should emphasize that in traditional tracking approaches there is object-to-object, or at least blob-to-blob, correspondence between video frames. This includes labeling and the characterization of object trajectories. On the contrary, in our approach we need no object or blob labeling from frame to frame.

• We assume that the source video frames are recorded with an unstable frame rate; hence our models are based on localized motion directions, and the magnitudes of the optical flow vectors are neglected.
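The direction-only observation above can be sketched as a simple quantization step; the bin count of 8 and the helper below are illustrative assumptions, not part of the original method (which feeds continuous observations to the regional HMMs):

```python
import numpy as np

# Hypothetical sketch: each optical flow vector is reduced to a quantized
# direction bin, so that an unstable frame rate, which scales the vector
# magnitudes, does not affect the observations.

def flow_directions(u, v, n_bins=8):
    """Quantize flow vectors into n_bins direction bins, ignoring magnitude."""
    angles = np.arctan2(v, u)                              # in (-pi, pi]
    bins = np.floor((angles + np.pi) / (2 * np.pi / n_bins)).astype(int)
    return bins % n_bins                                   # guard the angle == pi edge

u = np.array([1.0, 0.0, -5.0])
v = np.array([0.0, 2.0,  0.0])
# flow_directions(u, v) gives the same bins regardless of vector length
```

Note that scaling a vector by any positive factor (e.g. a frame-rate change) leaves its bin unchanged, which is exactly the invariance the assumption requires.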

• We construct our HMM models on two levels: region-based continuous-distribution HMMs are trained with the optical flow vectors, thus local observations are evaluated in the light of the temporal behavior of traffic. On top of the several regional HMMs we create another, discrete HMM. The hierarchical representation of region information enables us to consider those regions which might have effects on each other in a traffic situation.

• In general there is a prior unsupervised learning phase, but the collection of training data can be accomplished in the processing phase simultaneously, without a significant drop of computational performance.

The different steps of the proposed algorithm are discussed in the following sections, and the general architecture of our system is demonstrated in Fig. 5.2. In the training phase the optical flow vectors are calculated and filtered in the first step, and are used both to create the coherent regions (ROIs) of the scene and to estimate the parameters of the regional HMMs. Finally, on top of the several regional HMMs, a higher-level discrete HMM is trained. In the detection phase the optical flow vectors inside a selected ROI and the current state of the regional HMM characterize the next state of the model and the probability of this event. Finally, the high-level HMM analyses the state configuration of the several regional models.
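The detection-phase scoring can be illustrated with one online forward step of an HMM, which yields the probability of the newest observation given the current state distribution; low values flag unusual events. The matrices below are illustrative placeholders, not parameters taken from the text:

```python
import numpy as np

# Hypothetical sketch of the detection-phase idea: one step of the HMM
# forward recursion updates the state distribution and scores the
# probability of the newest observation.

A = np.array([[0.9, 0.1],      # assumed 2-state transition matrix
              [0.2, 0.8]])
alpha = np.array([0.7, 0.3])   # current (filtered) state distribution

def forward_step(alpha, A, b_obs):
    """b_obs[i] = likelihood of the new observation under state i.
    Returns the updated state distribution and the observation
    probability (low values indicate unusual events)."""
    unnorm = (alpha @ A) * b_obs
    p_obs = unnorm.sum()
    return unnorm / p_obs, p_obs

alpha2, p = forward_step(alpha, A, np.array([0.5, 0.01]))
```

An observation that is unlikely under every state reachable from the current distribution produces a small `p`, which is the kind of event the detector reports.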

5.2.1 Preprocessing

To avoid unnecessary computations we apply an adaptive MoG change detection algorithm to exclude non-changing areas from further analysis. The approach of [52] is well suited for open-air applications since it can learn the multimodal appearance of objects (trees, water, rain, shadows, etc.). Fig. 5.3 shows the foreground areas of typical surveillance videos detected by the method proposed in [51]; this method is discussed in detail in Sec. 4.3. As observable, the objects are ragged, so it would be difficult to track each of them by object-based methods. For dense optical flow calculation we successfully used the multi-scale gradient method of Bergen [7], employing a pyramid of successively lowpass-filtered versions of the images. In this pyramid one finer image level is warped from the neighboring frames with bilinear interpolation by the scaled motion vector calculated on the previous level. The motion estimation on this warped level then gives the residual finer motion, resulting in a coarse-to-fine motion estimation. The multi-scale property helps to avoid problems of temporal under-sampling when the frame rate of the video is low compared to the speed of the observed objects, a frequent case in real-life surveillance videos. We also successfully used the method of Lucas and Kanade [9] in our tests. Instead of using time-consuming filters (e.g. spatiotemporal Gaussian or median) we only used simple filters to get rid of outliers: the vectors with unusually large (>30% of image size) or small (<1 pixel) magnitude were simply neglected.
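The outlier filtering above amounts to a magnitude test on each flow vector. A minimal sketch, assuming the flow is stored as two per-pixel displacement arrays (the array layout and function name are illustrative):

```python
import numpy as np

# Minimal sketch of the outlier filtering described above: drop flow
# vectors longer than 30% of the image size or shorter than 1 pixel.

def filter_flow(u, v, image_size):
    """Return a boolean mask of flow vectors kept after filtering."""
    mag = np.hypot(u, v)
    return (mag >= 1.0) & (mag <= 0.3 * image_size)

u = np.array([0.5, 3.0, 200.0])
v = np.array([0.0, 4.0,   0.0])
mask = filter_flow(u, v, image_size=320)  # 30% of 320 px = 96 px
# keeps only the middle vector (magnitude 5): [False, True, False]
```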


Figure 5.2: General architecture of the proposed system. Dashed lines denote processing steps in the training phase only.

Our real-life test videos, recorded by commercial systems, had a variable frame rate; hence only the directions of the optical flow vectors were used in further steps. The proposed algorithms were tested on the intensity channel of color videos due to complexity considerations (most surveillance systems' workstations have to process and store the images of 4-16 cameras), but they can be easily extended to three channels.

Figure 5.3: Typical output of the foreground-background separation using the MoG method of [51].