
Figure 3.9: Example frames from video segments we used to simulate device malfunction in our experiments.

video frames (Sec. 3.2.1), and model parameter estimation as discussed in Sec. 3.2.3. According to our tests, the first step takes 4.61 seconds and one iteration of the model training step takes 4.12 milliseconds.

We used the trained model for offline video segmentation, determining the most probable state (camera) sequence for the observation sequence with the Viterbi algorithm (see Sec. 3.2.4).
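To make the decoding step concrete, the following is a minimal sketch of a generic log-domain Viterbi decoder for a discrete-emission HMM. It is not the model of Sec. 3.2: the two camera states, the quantized observation symbols "a" and "b", and all probabilities are made-up values for illustration only.

```python
import math


def viterbi(obs, states, log_start, log_trans, log_emit):
    """Return the most probable state sequence for obs (log-domain Viterbi)."""
    # delta[s]: best log-probability of any state path that ends in state s.
    delta = {s: log_start[s] + log_emit[s][obs[0]] for s in states}
    backpointers = []
    for o in obs[1:]:
        prev, delta, ptr = delta, {}, {}
        for s in states:
            # Best predecessor of s given the scores of the previous step.
            best = max(states, key=lambda r: prev[r] + log_trans[r][s])
            delta[s] = prev[best] + log_trans[best][s] + log_emit[s][o]
            ptr[s] = best
        backpointers.append(ptr)
    # Backtrack from the best final state.
    path = [max(states, key=lambda s: delta[s])]
    for ptr in reversed(backpointers):
        path.append(ptr[path[-1]])
    path.reverse()
    return path


def logs(probabilities):
    """Convert a dict of (non-zero) probabilities to log-probabilities."""
    return {k: math.log(v) for k, v in probabilities.items()}


# Hypothetical toy model: two cameras, each mostly showing its own scene symbol.
states = ["cam1", "cam2"]
log_start = logs({"cam1": 0.5, "cam2": 0.5})
log_trans = {
    "cam1": logs({"cam1": 0.9, "cam2": 0.1}),
    "cam2": logs({"cam1": 0.1, "cam2": 0.9}),
}
log_emit = {
    "cam1": logs({"a": 0.8, "b": 0.2}),
    "cam2": logs({"a": 0.2, "b": 0.8}),
}
obs = ["a", "a", "b", "a", "a", "b", "b", "b"]
path = viterbi(obs, states, log_start, log_trans, log_emit)
print(path)
```

In this toy run the isolated "b" at the third position is absorbed into the first camera's segment, because switching cameras twice is less likely than one atypical observation. The work per frame is constant for a fixed number of states, so the running time grows linearly with the number of frames.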

Segmenting 1000 frames, including the observation generation step (see Sec. 3.2.1), took 4.74 seconds; the running time increases linearly with the number of frames [48]. In practice this means that the segmentation of a video containing 12 hours of recordings with 20 fps frame rate (i.e. 864000 frames in total) can be performed in 112 seconds.

Next we tested the HMM- and HSMM-based online detectors (see Sec. 3.3.1 and Sec. 3.3.2) on finding anomalous camera events; for this purpose we manually added several video frames containing device malfunction output. According to our experiments, the average speed of both detectors is approximately 208 frames per second on the previously specified machine. Finally, 7500 frames from two videos (one night-time and one daytime) were evaluated, containing 6 cases of anomalous camera order, 7 unusual camera durations, 175 PTZ frames and 185 frames of device malfunction. Over all 7500 frames the false positive detection rate was 0%. All anomalous events were detected, but within PTZ segments about 30% of the individual frames were missed (this had no effect on detecting the PTZ segment boundaries).

3.6 Conclusion

Multi-camera surveillance systems produce large amounts of video recordings. A significant part of these videos is recorded using time-division multiplexing. Processing and analyzing these videos is labour intensive, hence automatic tools are required to aid these tasks. Since most analytical tools work on single cameras only, the segmentation of time-multiplexed video streams is a prerequisite step for many machine vision and security tasks such as motion detection, abandoned object recognition or unusual motion detection. However, segmentation is ambiguous in cases where the monitored scene changes significantly in a short time interval, periods are missing or unknown scenes are visited during manual PTZ control. In this chapter we presented our real-time HMM- and HSMM-based detectors that are capable of taking into account the usual durations, typical scene images and the order of the cameras in the video stream. Moreover, the proposed methods provide an effective tool for offline segmentation of large amounts of archive recordings. We tested our methods on real-life low-quality recordings and demonstrated the detection of several unusual events (anomalous order, unusual duration, manual control, and device malfunction) typically occurring in security applications. The observation vectors and the model parameters can easily be updated according to the period of the day; implementing this is left as future work.


Chapter 4

Foreground-Background Separation

Separating moving image parts from the static background is an important phase in video surveillance applications, such as object recognition, tracking and motion analysis. In recent years pixel-wise background modeling approaches have become widely applied in different video processing problems, even in real-time applications, due to the increase in the processing power of computers and the relatively low complexity of these algorithms. In these methods the background is modeled by the recent values of image pixels, and the pixels of the incoming frames are compared to these models. The different approaches apply different model creation and decision mechanisms to designate moving areas.

The method based on a mixture of Gaussians (MoG) is a widely used technique and provides a robust tool to learn noisy and changing backgrounds automatically and adaptively. Known background modeling methods (including the MoG-based ones) suffer from a phenomenon called the foreground aperture problem, in which the interior of large, homogeneous moving regions becomes part of the background instead of being selected as moving pixels. In this chapter we introduce a new MoG-based method to eliminate this problem. The rest of this chapter is organized as follows. In Sec. 4.1 we overview the task of foreground-background separation and some of the widely used techniques. The practical problems a robust method has to cope with are discussed in Sec. 4.2. A detailed introduction of the basic MoG-based method is given in Sec. 4.3. We present our improved MoG-based method in Sec. 4.4. The results of our experiments are presented in Sec. 4.5. Finally, Sec. 4.6 concludes the chapter.


4.1 Foreground-Background Separation

Background modeling is the core element of almost every foreground-background separation technique. Therefore, the background model has to be robust against noise, while the estimation or update of the model parameters has to run in real time. Most of the techniques can be classified into two categories: non-recursive and recursive.

The methods of the first group store a predefined number of the previous video frames in a buffer, and perform a pixel-wise background estimation using the buffer data. The frame differencing method is the simplest of these background modeling techniques and it uses the previous video frame as a background model. Since it uses only the previous frame, the interior pixels of slowly moving large homogeneous objects will be classified as background (aperture problem). Minor improvements can be achieved e.g. by using the combination of two consecutive frame differences [35].
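The aperture problem of frame differencing can be reproduced on a toy example. The sketch below is illustrative only: the function name, the one-row grayscale frames and the threshold of 30 are assumptions, not values taken from this work.

```python
def frame_difference_mask(prev_frame, curr_frame, threshold):
    """Mark a pixel as foreground (1) when it differs from the previous
    frame by more than threshold, otherwise as background (0)."""
    return [
        [1 if abs(c - p) > threshold else 0 for p, c in zip(prev_row, curr_row)]
        for prev_row, curr_row in zip(prev_frame, curr_frame)
    ]


# Background intensity 10; a homogeneous object of intensity 200, five pixels
# wide, moves one pixel to the right between the two frames.
prev_frame = [[10, 200, 200, 200, 200, 200, 10, 10]]
curr_frame = [[10, 10, 200, 200, 200, 200, 200, 10]]

mask = frame_difference_mask(prev_frame, curr_frame, threshold=30)
print(mask)  # [[0, 1, 0, 0, 0, 0, 1, 0]]
```

Only the leading edge of the object (index 6) and the background uncovered behind it (index 1) are marked. The four interior pixels of the object (indices 2 to 5) keep the same intensity in both frames and are therefore labelled as background.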

The median filter method estimates the background as the average or median (or medoid in case of a color input) of the buffer data [18] at a given location. In case of linear predictive filtering techniques, the intensity of the background is estimated by applying a linear predictive filter (e.g. Wiener filter) on the buffer data [55]. At each time step the filter coefficients are re-estimated, making the method difficult to apply in real time. Unlike the previous techniques that use a single background estimate, the kernel density estimation technique uses the entire buffer data to form a kernel density estimator (e.g. using a Gaussian kernel) [21]. The pixel is classified as foreground when it is unlikely to come from the estimated background distribution.
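Two of the buffer-based estimators above can be sketched in a few lines: the per-pixel temporal median and a Gaussian kernel density estimate over one pixel's buffered values. This is a minimal illustration under assumed settings; the buffer contents, the bandwidth of 5 and the decision threshold of 1e-3 are hypothetical and are not taken from [18] or [21].

```python
import math
from statistics import median


def median_background(buffer):
    """Per-pixel temporal median over a buffer of equally sized grayscale frames."""
    height, width = len(buffer[0]), len(buffer[0][0])
    return [
        [median(frame[y][x] for frame in buffer) for x in range(width)]
        for y in range(height)
    ]


def kde_density(value, samples, bandwidth):
    """Gaussian kernel density estimate at value from one pixel's buffered samples."""
    norm = 1.0 / (bandwidth * math.sqrt(2.0 * math.pi))
    total = sum(
        norm * math.exp(-0.5 * ((value - s) / bandwidth) ** 2) for s in samples
    )
    return total / len(samples)


# Buffer of five 1x3 frames; the third pixel contains one outlier (200),
# e.g. an object passing through in a single frame.
buffer = [
    [[10, 50, 90]],
    [[12, 52, 200]],
    [[11, 49, 91]],
    [[9, 51, 89]],
    [[10, 50, 92]],
]
background = median_background(buffer)
print(background)  # [[10, 50, 91]]

# KDE decision for the third pixel: a value is foreground when its
# estimated background density falls below the (assumed) threshold.
samples = [frame[0][2] for frame in buffer]
density_bg = kde_density(90, samples, bandwidth=5.0)
density_fg = kde_density(150, samples, bandwidth=5.0)
is_foreground = density_fg < 1e-3
```

The median ignores the single outlier in the buffer, while the kernel density estimate keeps every sample and assigns a very low density to values that lie far from all of them.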

Instead of keeping large numbers of previous frames in memory, the recursive methods recursively update a background model based on each video frame. As a result, temporally distant observations can affect the current model. To suppress this, most techniques include exponential weighting to discount the past. The approximated median filter updates its median estimate with each incoming pixel value: if the value is larger than the current estimate, the estimate is increased, and if it is smaller, the estimate is decreased [39,49]. The Kalman filter [34] recursively estimates the state of a variable assuming noisy observations. One of the simplest recursive background modeling techniques, the Pfinder algorithm [70], uses the pixel intensities only, and a single Gaussian distribution is used to model the background. This approach works well for static backgrounds (indoor applications), but fails in other cases such as waving trees, flickering objects, or water (outdoor applications). The authors of [36] used both the intensity and its temporal derivative, while Koller et al. used the intensity and its spatial derivative in the model [37]. The MoG-based techniques track multiple Gaussians simultaneously and update the model parameters for each input frame. The main advantage of this method is the capability of learning multimodal backgrounds (e.g. waving trees), and the technique has become popular