
5.4.4 Performance Analysis, Scope, and Limitations


In our experiments the input video (320×240 frame size) of 4800 frames (with unstable frame rate) was resized to half size to test our HMM-based detection algorithm discussed in Sec. 5.4 on a Windows XP, 3 GHz Intel Xeon PC, implemented as a single-threaded C++ application using the OpenCV library [45]. We measured the speed of the different phases of the detection algorithm. The following three phases were defined:

1. Preprocessing: MoG change detection, optical flow calculation and filtering, connected components localization;

2. Observation construction: MoG fit on vectors of each blob inside the selected ROI;

3. Anomaly detection: evaluating Eq. 5.17.

According to our tests the first phase was processed in 51.9 ms, the second in 17.24 ms, and the third in 6.14 ms per frame on average. Altogether this means approximately 13 FPS processing performance.
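The throughput figure follows directly from the per-frame phase timings; a quick sanity check (illustrative only, the phase names are ours):

```python
# Measured per-frame cost of the three phases (ms), from the tests above
phases_ms = {"preprocessing": 51.9,
             "observation construction": 17.24,
             "anomaly detection": 6.14}
total_ms = sum(phases_ms.values())   # 75.28 ms per frame
fps = 1000.0 / total_ms              # ~13.3 FPS
print(round(fps, 1))                 # 13.3
```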

As shown in our example, the detection mechanism can be well used in urban traffic applications. However, there are some limitations that should be considered: we assume that the surveillance camera is positioned well above the road, hence full occlusion happens only occasionally. We should also ensure that the training video data does not contain unusual events. One could construct an application where data collection and model training run simultaneously with event detection, or where the model is continuously updated with new observations.

It would also be possible to apply different models for different periods of the day. Processing such long-term observations is outside the scope of this work. Moreover, in case of traffic anomalies caused by stationary objects (e.g., a parking vehicle) our detectors will fail, since the low-level HMM models are based on optical flow and motion detection. This can be solved by incorporating duration information into the high-level model to signal unusually long motionless states, e.g., by using a hidden semi-Markov model.
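As a rough illustration of the duration-based extension suggested above, the following sketch flags unusually long motionless runs from a per-frame motion indicator. It is a simple thresholding stand-in, not a full hidden semi-Markov model; the function name and threshold value are hypothetical:

```python
def long_motionless_runs(motion_flags, max_duration):
    """Flag runs of motionless frames (flag == 0) longer than max_duration.

    motion_flags: per-frame values, truthy if motion was detected.
    Returns (start_frame, length) for each unusually long motionless run.
    """
    runs, start = [], None
    # Append a sentinel "moving" frame so the final run is also flushed
    for t, moving in enumerate(list(motion_flags) + [True]):
        if not moving and start is None:
            start = t
        elif moving and start is not None:
            if t - start > max_duration:
                runs.append((start, t - start))
            start = None
    return runs

# e.g. a vehicle standing still for 6 frames in a region where stops
# longer than 4 frames are considered unusual
flags = [1, 1, 0, 0, 0, 0, 0, 0, 1, 1]
print(long_motionless_runs(flags, max_duration=4))  # [(2, 6)]
```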

5.5 Conclusion

Object-tracking-based traffic analysis approaches are difficult to use in noisy outdoor environments, especially where the number of objects is large and the motion is complex. In this chapter we presented alternative new methods to model the motion of outdoor traffic areas without the


direct semantic modeling of specific motion events. Two main approaches were investigated, both building on the pixel-wise, dense optical flow: local probability models for motion direction evaluation, and HMM-based spatiotemporal modeling to discover the latent semantic rules of complex motion. Both of the proposed anomaly detectors can achieve real-time processing performance.

We also proved that with relative scaling of emission probabilities a large number of samples can be used for hidden Markov modeling of motion directions. This way new pixel-dense probabilistic approaches can be constructed, where the joint probability of hundreds of motion vectors is very low. Our methods make it possible to segment the road areas and model the behavior of selected ROIs independently by HMMs. Naturally, the HMMs can be used for anomaly detection in road traffic, and we could create hierarchical HMMs to find the relations between neighboring ROIs. Interesting questions to be investigated in the future are the optimal automatic selection of ROIs to participate in the hierarchical model, and the ways of long-term model training. Further improvements might be achieved by using an HSMM-based method (see Sec. 3.3.2) to model either the individual regions or the connections between regions. This would give us a tool for detecting unusual durations, which is also useful when the traffic is analyzed, e.g., traffic jams might be detected as being in a motionless state with unusually long duration.


Chapter 6

Conclusions

In this thesis the research history and new results of our work in the field of visual surveillance, together with related methods and applications, have been collected and presented. During the development of the presented methods we focused on robustness and near real-time processing capability.

A quick overview of the contributions of the respective thesis groups can be found in Table 6.1, including their locations in the text.

(1) Analysis of time-multiplexed videos (Chapter 3, pages 34-47):
  - robust method for automatic offline segmentation
  - real-time unusual camera event detection

(2) Foreground-background separation (Chapter 4, pages 49-57):
  - elimination of the foreground aperture problem

(3) Unusual event detection (Chapter 5, pages 58-80):
  - pixel-wise modeling of motion directions
  - regional HMM to model the fluctuation of the traffic
  - scaling technique to solve a numerical precision problem
  - hierarchical composition of regional models

Table 6.1: Contributions by theses (thesis group, chapter with page range, and contributions).

The presented methods are built on probabilistic models for achieving robust and high-performance tools for different automatic surveillance tasks in noisy and cluttered outdoor environments.

We presented new results in automatic scene recognition and abnormal camera event detection in a multi-camera surveillance system, which is a key preprocessing step in many single-camera machine vision applications, since in most systems only the visual information of the cameras is transmitted and stored without any additional metadata. The presented offline segmentation method is a useful tool for the processing of large amounts of archived time-multiplexed video data, while the proposed real-time detectors can be used to find anomalous camera activity such as unusual camera order or duration, manual PTZ control, and device malfunction.

In foreground-background separation we introduced an extension to the most widely used mixture of Gaussians background model to cope with a practical problem, namely the foreground aperture. Our results show that the number of falsely classified pixels was reduced significantly, without using any iterative optimization process.

We presented two different unusual event detectors for signaling two different traffic anomalies, both using only the extracted pixel-wise optical flow directions. To solve a numerical precision problem in the training procedure of the hidden Markov model based detector, we presented a scaling technique in the mathematical formulae of the parameter estimation method, without compromising the speed of the procedure.

The developed algorithms directly correspond to ongoing research projects with the participation of the MTA-SZTAKI. In particular, the aim of the MEDUSA project of the European Defence Agency is to realize an intelligent multi-sensor data fusion grid, and the integration of the unusual event detection methods presented in Chapter 5 into the final prototype system is currently in progress. The pixel-level unusual event detection methods of Sec. 5.3 were also integrated into the system of the MONLINGV project [53] of the Jedlik Ányos programme.


Appendix I

I.1 Illustrations: Aberrations and Artifacts

This appendix contains some example images to illustrate the different aberrations and artifacts caused by lenses, image sensors, and compression, that might appear in recorded video streams.

Figure I.1: Blooming effect: too much charge at a given pixel causes overflow to pixels in its neighborhood.

Figure I.2: Thermal noise generated by the agitation of electrons inside the sensor.


Figure I.3: Smear effect: vertical white stripes caused by the readout process of the CCD are clearly visible.

Figure I.4: Aliasing error at patterns containing high spatial frequencies.

Figure I.5: Hot pixels are permanent and can be found in almost all image sensors.


Figure I.6: Interlaced videos displayed on a progressive scan device. Artifacts are visible at the location of moving objects.

Figure I.7: Comatic aberration: beams from off-axis objects produce comet-like shapes.

Figure I.8: Chromatic aberration: color fringes are present in the image.



Figure I.9: (a) Barrel distortion: straight lines bulge outwards from the center. (b) Pincushion distortion: the aberration is the opposite of the barrel distortion.


Figure I.10: Vignetting: (a) does not show any notable vignetting artifact, but the reduction of brightness at the corners of (b) is signicant.


Figure I.11: Compression artifacts: (a) blocking; (b) mosquito noise; (c) chroma subsampling error.



Figure I.12: Real-life outdoor recordings might contain: (a) rain and reections; (b) multiple illumination sources, e.g. headlights; (c) cast shadow and occlusion.


Figure I.13: Example video frames from real-life outdoor recordings demonstrating practical problems in urban environments: (a) cluttered scenes; (b) dirt; (c) multiple illumination sources in nighttime videos.


Appendix II

Consider a continuous HMM \lambda = \{\pi, A, B\} with N states and a mixture of M Gaussians emission probability. Moreover, let O = (O_1, O_2, \ldots, O_T) denote the observation sequence of length T, where O_t = (O_{t,1}, O_{t,2}, \ldots, O_{t,K_t}). The state sequence of a process is Q = (Q_1, Q_2, \ldots, Q_T).

II.1 Re-estimation Using Multiple Observation Sequences

Using E training sequences, the original iterative re-estimation formula of the Baum-Welch algorithm [48] (see Sec. 2.4.4) is

    \bar{a}_{ij} = \frac{\sum_{e=1}^{E} \sum_{t=1}^{T_e-1} \xi_t^e(i,j)}{\sum_{e=1}^{E} \sum_{t=1}^{T_e-1} \gamma_t^e(i)},

with

    \gamma_t^e(i) = \frac{\alpha_t^e(i)\,\beta_t^e(i)}{P(O^e|\lambda)}, \qquad \xi_t^e(i,j) = \frac{\alpha_t^e(i)\,a_{ij}\,b_j(O_{t+1}^e)\,\beta_{t+1}^e(j)}{P(O^e|\lambda)},

where the e in the superscript of the \alpha, \beta, \gamma, and \xi variables denotes a particular sequence (1 \le e \le E), O^e is the e-th observation sequence, and T_e is its length. The prior and emission parameters are re-estimated analogously.
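As an illustration, the multiple-sequence transition update can be sketched in NumPy as follows. This is our own minimal sketch under the standard forward-backward definitions (variable names are ours, and the emission-parameter updates are omitted):

```python
import numpy as np

def reestimate_A(A, Bs, alphas, betas):
    """Multiple-sequence transition re-estimation (one Baum-Welch step).

    A:      (N, N) current transition matrix
    Bs:     per-sequence (T_e, N) emission likelihoods b_j(O^e_t)
    alphas: per-sequence (T_e, N) unscaled forward variables
    betas:  per-sequence (T_e, N) unscaled backward variables
    """
    N = A.shape[0]
    num, den = np.zeros((N, N)), np.zeros(N)
    for B, al, be in zip(Bs, alphas, betas):
        P = al[-1].sum()                       # P(O^e | lambda)
        for t in range(len(B) - 1):
            # xi^e_t(i,j) = alpha_t(i) a_ij b_j(O_{t+1}) beta_{t+1}(j) / P
            num += np.outer(al[t], B[t + 1] * be[t + 1]) * A / P
            den += al[t] * be[t] / P           # gamma^e_t(i)
    return num / den[:, None]

# Toy setup: random model, two observation sequences
rng = np.random.default_rng(1)
N = 3
A = rng.random((N, N)); A /= A.sum(axis=1, keepdims=True)
pi = np.full(N, 1.0 / N)
Bs, alphas, betas = [], [], []
for T in (4, 6):
    B = rng.random((T, N)) + 0.1               # stand-in emission likelihoods
    al, be = np.zeros((T, N)), np.ones((T, N))
    al[0] = pi * B[0]
    for t in range(1, T):                      # forward recursion
        al[t] = (al[t - 1] @ A) * B[t]
    for t in range(T - 2, -1, -1):             # backward recursion
        be[t] = A @ (B[t + 1] * be[t + 1])
    Bs.append(B); alphas.append(al); betas.append(be)

A_new = reestimate_A(A, Bs, alphas, betas)     # rows sum to 1 by construction
```

Since summing \xi_t(i,j) over j yields \gamma_t(i), the re-estimated rows sum to one exactly, which makes a convenient sanity check.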

II.2 Expectation Maximization Using Relative Emission

The re-estimation procedure of the \lambda HMM, using one observation sequence O, is

    \bar{\pi}_i = \gamma_1(i), \qquad \bar{a}_{ij} = \frac{\sum_{t=1}^{T-1} \xi_t(i,j)}{\sum_{t=1}^{T-1} \gamma_t(i)},

with \gamma_t(i) and \xi_t(i,j) computed from the forward and backward variables as above; the mixture weights, means, and covariances of B are re-estimated analogously.

Moreover, the transition probability re-estimation can be computed directly from the forward and backward variables:

    \bar{a}_{ij} = \frac{\sum_{t=1}^{T-1} \alpha_t(i)\,a_{ij}\,b_j(O_{t+1})\,\beta_{t+1}(j)}{\sum_{t=1}^{T-1} \alpha_t(i)\,\beta_t(i)}. \qquad (II.11)

In Sec. 5.4.2.3 we defined the relative emission as follows:

    \tilde{b}_i(O_t) = \prod_{k=1}^{K_t} \frac{b_i(O_{t,k})}{\sum_{j=1}^{N} b_j(O_{t,k})},

and denote the scaling coefficients as e_{t,k} = \left[\sum_{j=1}^{N} b_j(O_{t,k})\right]^{-1}. Denoting E_t = \prod_{k=1}^{K_t} e_{t,k} we get

    \tilde{b}_i(O_t) = E_t\, b_i(O_t).

Now we have to find the relation between the original and the scaled forward \tilde{\alpha}_t(i) and backward \tilde{\beta}_t(i) variables in order to make the changes to the Baum-Welch procedure Eq. II.6 if necessary. First, at time t = 1 we can write:

    \tilde{\alpha}_1(i) = \pi_i \tilde{b}_i(O_1) = \pi_i E_1 b_i(O_1) = E_1 \alpha_1(i). \qquad (II.14)

In the next step t = 2, using the recursive formula of Eq. 2.27, we get:

    \tilde{\alpha}_2(i) = \left[\sum_{j=1}^{N} \tilde{\alpha}_1(j)\, a_{ji}\right] \tilde{b}_i(O_2) = E_1 E_2\, \alpha_2(i).

Thus finally by induction we get the rule:

    \tilde{\alpha}_t(i) = \left[\prod_{\tau=1}^{t} E_\tau\right] \alpha_t(i),

and similarly for the backward variables

    \tilde{\beta}_t(i) = \left[\prod_{\tau=t+1}^{T} E_\tau\right] \beta_t(i).


The direct re-estimation of the transition probabilities in Eq. II.11 then becomes

    \bar{a}_{ij} = \frac{\sum_{t=1}^{T-1} \tilde{\alpha}_t(i)\,a_{ij}\,\tilde{b}_j(O_{t+1})\,\tilde{\beta}_{t+1}(j)}{\sum_{t=1}^{T-1} \tilde{\alpha}_t(i)\,\tilde{\beta}_t(i)},

where every term of both the numerator and the denominator carries the common factor \prod_{\tau=1}^{T} E_\tau; since this factor is independent of t, we can cancel it out from both the numerator and denominator. By the above results we have proved that the Baum-Welch re-estimation algorithm Eq. II.6 can be used with the new relative emissions.

The procedure introduced above can be incorporated into the sequence scaling method of [48] (introduced in Sec. 2.4.5), where each \alpha_t(i) was scaled by the sum over all states of \alpha_t(i). First we calculate the relative emissions, followed by sequence scaling. The final scaled forward and backward variables will be denoted by \hat{\alpha}_t(i) and \hat{\beta}_t(i) and can be calculated as follows. The forward procedure starts from t = 1, where we can write:

    \alpha'_1(i) = \pi_i \tilde{b}_i(O_1) = \pi_i E_1 b_i(O_1) = E_1 \alpha_1(i), \qquad (II.25)

    \hat{c}_1 = \left[\sum_{i=1}^{N} \alpha'_1(i)\right]^{-1}, \qquad \hat{\alpha}_1(i) = \hat{c}_1\, \alpha'_1(i),

and for t > 1:

    \alpha'_t(i) = \left[\sum_{j=1}^{N} \hat{\alpha}_{t-1}(j)\, a_{ji}\right] \tilde{b}_i(O_t), \qquad \hat{c}_t = \left[\sum_{i=1}^{N} \alpha'_t(i)\right]^{-1}, \qquad \hat{\alpha}_t(i) = \hat{c}_t\, \alpha'_t(i).


The backward variables are scaled by the same \hat{c}_t coefficients. In the first step t = T:

    \beta'_T(i) = 1, \qquad (II.30)

    \hat{\beta}_T(i) = \hat{c}_T\, \beta'_T(i),

and for t < T:

    \beta'_t(i) = \sum_{j=1}^{N} a_{ij}\, \tilde{b}_j(O_{t+1})\, \hat{\beta}_{t+1}(j), \qquad \hat{\beta}_t(i) = \hat{c}_t\, \beta'_t(i).


The direct re-estimation of the transition probabilities in Eq. II.11 now reads

    \bar{a}_{ij} = \frac{\sum_{t=1}^{T-1} \hat{\alpha}_t(i)\,a_{ij}\,\tilde{b}_j(O_{t+1})\,\hat{\beta}_{t+1}(j)}{\sum_{t=1}^{T-1} \sum_{j=1}^{N} \hat{\alpha}_t(i)\,a_{ij}\,\tilde{b}_j(O_{t+1})\,\hat{\beta}_{t+1}(j)};

every term carries the common factor \prod_{\tau=1}^{T} \hat{c}_\tau E_\tau, and since this factor is independent of t we can cancel them out from both the numerator and denominator.

For the computation of the log-likelihood function in Eq. II.12, used to evaluate the convergence of the re-estimation process, we can use

    \hat{\alpha}_T(i) = \left[\prod_{\tau=1}^{T} \hat{c}_\tau E_\tau\right] \alpha_T(i),

and since the \hat{\alpha}_T(i) are normalized their sum is equal to 1, hence

    P(O|\lambda) = \frac{1}{\prod_{\tau=1}^{T} \hat{c}_\tau E_\tau}, \qquad \log P(O|\lambda) = -\sum_{\tau=1}^{T} \log(\hat{c}_\tau) - \sum_{\tau=1}^{T} \log(E_\tau),


or, by using E_t = \prod_{k=1}^{K_t} e_{t,k}:

    \log P(O|\lambda) = -\sum_{\tau=1}^{T} \log(\hat{c}_\tau) - \sum_{\tau=1}^{T} \sum_{k=1}^{K_\tau} \log(e_{\tau,k}). \qquad (II.43)
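The scaling derivation above can be checked numerically. The following sketch (our own illustrative code, not the thesis implementation) runs the forward procedure with relative emissions and sequence scaling on random parameters, and verifies that the accumulated scaling coefficients recover the log-likelihood of the unscaled forward procedure, as in Eq. II.43:

```python
import numpy as np

rng = np.random.default_rng(0)
N, T = 3, 5
K = [2, 4, 3, 2, 5]                 # K_t: number of motion vectors per frame

pi = np.full(N, 1.0 / N)
A = rng.random((N, N)); A /= A.sum(axis=1, keepdims=True)
# b[t][i, k]: emission likelihood of the k-th vector at time t under state i
b = [rng.random((N, K[t])) + 0.1 for t in range(T)]

# --- direct (unscaled) forward procedure ---
B = np.array([b[t].prod(axis=1) for t in range(T)])   # b_i(O_t)
alpha = pi * B[0]
for t in range(1, T):
    alpha = (alpha @ A) * B[t]
P_direct = alpha.sum()                                # P(O | lambda)

# --- relative emissions combined with sequence scaling ---
# E_t = prod_k e_{t,k} with e_{t,k} = 1 / sum_j b_j(O_{t,k})
E = np.array([1.0 / b[t].sum(axis=0).prod() for t in range(T)])
Bt = B * E[:, None]                                   # b~_i(O_t) = E_t b_i(O_t)
c = np.empty(T)
a_hat = pi * Bt[0]
c[0] = 1.0 / a_hat.sum(); a_hat = a_hat * c[0]
for t in range(1, T):
    a_hat = (a_hat @ A) * Bt[t]
    c[t] = 1.0 / a_hat.sum(); a_hat = a_hat * c[t]

# Eq. II.43: log P(O|lambda) = -sum_tau log c_tau - sum_tau log E_tau
logP = -(np.log(c).sum() + np.log(E).sum())
assert np.isclose(logP, np.log(P_direct))
```

The final assertion holds because \hat{\alpha}_T(i) = [\prod_\tau \hat{c}_\tau E_\tau] \alpha_T(i) sums to one, so the product of all scaling coefficients is exactly 1/P(O|\lambda).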

Bibliography

[1] E. L. Andrade, S. Blunsden, and R. B. Fisher. Performance analysis of event detection models in crowded scenes. In Proceedings of The International Conference on Visual Information Engineering, pages 427-432, Bangalore, India, September 26-28 2006.

[2] L. E. Baum. An inequality and associated maximization technique in statistical estimation for probabilistic functions of Markov processes. In Inequalities III: Proceedings of the Third Symposium on Inequalities, pages 1-8, Los Angeles, CA, USA, 1972.

[3] L. E. Baum and J. A. Eagon. An inequality with applications to statistical estimation for probabilistic functions of a Markov process and to a model for ecology. Bulletin of the American Mathematical Society, 73(3):360-363, 1967.

[4] L. E. Baum and T. Petrie. Statistical inference for probabilistic functions of finite state Markov chains. The Annals of Mathematical Statistics, 37(6):1554-1563, 1966.

[5] L. E. Baum, T. Petrie, G. Soules, and N. Weiss. A maximization technique occurring in the statistical analysis of probabilistic functions of Markov chains. The Annals of Mathematical Statistics, 41(1):164-171, 1970.

[6] L. E. Baum and L. R. Welch. A statistical estimation procedure for probabilistic functions of finite Markov processes. Submitted for publication to Proc. Nat. Acad. Sci. U.S.A.

[7] J. R. Bergen and R. Hingorani. Hierarchical motion-based frame rate conversion. Technical report, David Sarnoff Research Center, Princeton, NJ 08540, 1990.

[8] O. Boiman and M. Irani. Detecting irregularities in images and in video. International Journal of Computer Vision, 74(1):17-31, 2007.

[9] J.-Y. Bouguet. Pyramidal implementation of the Lucas-Kanade feature tracker. Technical report, Intel Corp., Microprocessor Research Labs, 2000.

[10] T. Bouwmans, F. El Baf, and B. Vachon. Background modeling using mixture of Gaussians for foreground detection - a survey. Recent Patents on Computer Science, 1(3):219-237, 2008.

[11] M. Brand and V. Kettnaker. Discovery and segmentation of activities in video. IEEE Transactions on Pattern Analysis and Machine Intelligence, 22(8):844-851, 2000.

[12] Y. Chang, D. J. Lee, Y. Hong, and J. Archibald. Unsupervised video shot detection using clustering ensemble with a color global scale-invariant feature transform descriptor. Journal on Image and Video Processing, 8(1):1-10, 2008.

[13] D. Comaniciu, V. Ramesh, and P. Meer. Real-time tracking of non-rigid objects using mean shift. In Proceedings of The IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pages 142-149, Hilton Head, SC, USA, June 13-15 2000.

[14] C. Cotsaces, N. Nikolaidis, and I. Pitas. Video shot detection and condensed representation: A review. Signal Processing Magazine, IEEE, 23(2):28-37, 2006.

[15] Cs. Beleznai, B. Frühstück, and H. Bischof. Human tracking by mode seeking. In Proceedings of The 4th International Symposium on Image and Signal Processing and Analysis, pages 1-6, Zagreb, Croatia, September 15-17 2005.

[16] Cs. Benedek and T. Szirányi. Markovian framework for foreground-background-shadow separation of real world video scenes. In Proceedings of The 7th Asian Conference on Computer Vision, pages 898-907, Hyderabad, India, January 13-16 2006.

[17] Cs. Benedek and T. Szirányi. Bayesian foreground and shadow detection in uncertain frame rate surveillance videos. IEEE Transactions on Image Processing, 17(4):608-621, 2008.

[18] R. Cucchiara, C. Grana, M. Piccardi, and A. Prati. Detecting moving objects, ghosts and shadows in video streams. IEEE Transactions on Pattern Analysis and Machine Intelligence, 25(10):1337-1342, 2003.

[19] A. P. Dempster, N. M. Laird, and D. B. Rubin. Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society. Series B (Methodological), 39(1):1-38, 1977.

[20] A. R. Dick and M. J. Brooks. Issues in automated visual surveillance. In Proceedings of The 7th International Conference on Digital Image Computing: Techniques and Applications, pages 195-204, Sydney, Australia, December 10-12 2003.

[21] A. Elgammal, D. Harwood, and L. Davis. Non-parametric model for background subtraction. In Proceedings of IEEE International Conference on Computer Vision, Frame-rate Workshop, pages 751-767, Kerkyra, Greece, September 20-25 2000.

[22] J. D. Ferguson. Variable duration models for speech. In Proceedings of The Symposium on the Application of HMMs to Text and Speech, pages 143-179, Princeton, NJ, USA, 1980.

[23] G. D. Forney. The Viterbi algorithm. Proceedings of the IEEE, 61(3):268-278, 1973.

[24] N. Friedman and S. Russell. Image segmentation in video sequences: A probabilistic approach. In Proceedings of The 13th Conference on Uncertainty in Artificial Intelligence, pages 175-181, Rhode Island, USA, August 1-3 1997.

[25] B. Georgescu, I. Shimshoni, and P. Meer. Mean Shift based clustering in high dimensions: A texture classification example. In Proceedings of The 9th IEEE International Conference on Computer Vision, pages 456-463, Nice, France, October 14-17 2003.

[26] K. J. Han and A. H. Tewfik. Eigen-image based video segmentation and indexing. In Proceedings of The International Conference on Image Processing, pages 538-541, Washington, DC, USA, October 26-29 1997.

[27] S. Hongeng, R. Nevatia, and F. Bremond. Video-based event recognition: Activity representation and probabilistic recognition methods. Computer Vision and Image Understanding, 96(2):129-162, 2004.

[28] J.-S. Hu and T.-M. Su. Robust background subtraction with shadow and highlight removal for indoor surveillance. EURASIP Journal on Applied Signal Processing, 2007(1):108, 2007.

[29] W. Hu, T. Tan, L. Wang, and S. Maybank. A survey on visual surveillance of object motion and behaviors. IEEE Transactions on Systems, Man and Cybernetics, 34:334-352, 2004.

[30] X. Huang, H. Ma, and H. Yuan. A hidden Markov model approach to parsing MTV video shot. In Proceedings of The Congress on Image and Signal Processing, pages 276-280, Sanya, Hainan, China, May 27-30 2008.

[31] V. Jain, B. B. Kimia, and J. L. Mundy. Background modeling based on subpixel edges. In Proceedings of The International Conference on Image Processing, pages 321-324, San Antonio, TX, USA, September 16-19 2007.

[32] F. Jiang, Y. Wu, and A. K. Katsaggelos. Abnormal event detection from surveillance video by dynamic hierarchical clustering. In Proceedings of The International Conference on Image Processing, pages 145-148, San Antonio, TX, USA, September 16-19 2007.

[33] M. T. Johnson. Capacity and complexity of HMM duration modeling techniques. Signal Processing Letters, IEEE, 12(5):407-410, 2005.

[34] R. E. Kalman. A new approach to linear filtering and prediction problems. Transactions of the ASME, Journal of Basic Engineering, 82(Series D):35-45, 1960.

[35] Y. Kameda and M. Minoh. A human motion estimation method using 3-successive video frames. In Proceedings of The 2nd International Conference on Virtual Systems and MultiMedia, pages 135-140, Gifu, Japan, September 18-20 1996.

[36] K. P. Karmann and A. von Brandt. Moving object recognition using an adaptive background memory. In Proceedings of The 3rd International Workshop on Time-Varying Image Processing and Moving Object Recognition, Amsterdam, The Netherlands, 1990.

[37] D. Koller, J. Weber, and J. Malik. Robust multiple car tracking with occlusion reasoning. Technical report, EECS Department, University of California, Berkeley, 1993.

[38] S. E. Levinson. Continuously variable duration hidden Markov models for automatic speech recognition. Computer Speech and Language, 1(1):29-45, 1986.

[39] N. J. B. McFarlane and C. P. Schofield. Segmentation and tracking of piglets in images. Machine Vision and Applications, 8(3):187-193, 1995.

[40] C. D. Mitchell and L. H. Jamieson. Modeling duration in a hidden Markov model with the exponential family. In IEEE Proceedings of International Conference on Acoustics, Speech, and Signal Processing, volume 2, pages 331-334, Minneapolis, MN, USA, April 27-30 1993.

[41] T. B. Moeslund and E. Granum. A survey of advances in vision-based human motion capture. Computer Vision and Image Understanding, 81(3):231-268, 2001.

[42] T. B. Moeslund, A. Hilton, and V. Krüger. A survey of advances in vision-based human motion capture and analysis. Computer Vision and Image Understanding, 104(2):90-126, 2006.

[43] V. Nair and J. J. Clark. Automated visual surveillance using hidden Markov models. In Proceedings of The 15th International Conference on Vision Interface, pages 88-94, Calgary, Canada, May 27-29 2002.

[44] Notesco Oy, 2010. http://www.notesco.net/download/markkinakatsaus.pdf.

[45] OpenCV. Open Source Computer Vision library. http://opencv.willowgarage.com.

[46] I. Petrás, Cs. Beleznai, Y. Dedeoğlu, M. Pardàs, L. Kovács, Z. Szlávik, L. Havasi, T. Szirányi, B. U. Töreyin, U. Güdükbay, A. E. Çetin, and C. Canton-Ferrer. Flexible test-bed for unusual behavior detection. In Proceedings of The 6th ACM International Conference on Image and Video Retrieval, pages 105-108, Amsterdam, The Netherlands, July 9-11 2007.

[47] F. Porikli. Human body tracking by adaptive background models and mean-shift analysis. In Proceedings of The IEEE International Workshop on Performance Evaluation of Tracking and Surveillance, Graz, Austria, March 31 2003.

[48] L. R. Rabiner. A tutorial on hidden Markov models and selected applications in speech recognition. Proceedings of the IEEE, 77(2):257-286, 1989.

[49] P. Remagnino, A. Baumberg, T. Grove, D. Hogg, T. Tan, A. Worrall, and K. Baker. An integrated traffic and pedestrian model-based vision system. In Proceedings of The 8th British Machine Vision Conference, pages 380-389, Colchester, U.K., September 8-11 1997.

[50] M. J. Russell and R. K. Moore. Explicit modelling of state occupancy in hidden Markov models for automatic speech recognition. In IEEE Proceedings of The International Conference on Acoustics, Speech, and Signal Processing, volume 10, pages 5-8, Tampa, FL, USA, March 26-29 1985.

[51] C. Stauffer and W. E. L. Grimson. Adaptive background mixture models for real-time tracking. In Proceedings of The IEEE Computer Society Conference on Computer Vision and Pattern Recognition, volume 2, pages 246-252, Fort Collins, CO, USA, June 23-25 1999.

[52] C. Stauffer and W. E. L. Grimson. Learning patterns of activity using real-time tracking. IEEE Transactions on Pattern Analysis and Machine Intelligence, 22(8):747-757, 2000.

[53] Z. Szlávik, L. Kovács, L. Havasi, Cs. Benedek, I. Petrás, Á. Utasi, A. Licsár, L. Czúni, and T. Szirányi. Behavior and event detection for annotation and surveillance. In Proceedings of the 6th International Workshop on Content-Based Multimedia Indexing, pages 117-124, London, UK, June 18-20 2008.

[54] E. Thul. An evaluation of Chris Stauffer and W. E. L. Grimson's method for background subtraction. Technical report, School of Computer Science, McGill University, 2007.

[55] K. Toyama, J. Krumm, B. Brumitt, and B. Meyers. Wallflower: Principles and practice of background maintenance. In Proceedings of The 7th IEEE International Conference on Computer Vision, volume 1, pages 255-261, Kerkyra, Greece, September 20-25 1999.

[56] Á. Utasi and L. Czúni. Reducing the foreground aperture problem in mixture of Gaussians based motion detection. In Proceedings of The 14th International Conference on Systems, Signals and Image Processing and 6th EURASIP Conference Focused on Speech and Image Processing, Multimedia Communications and Services, pages 157-160, Maribor, Slovenia, June 27-30 2007.

[57] Á. Utasi and L. Czúni. Valós idejű mozgásdetektálás módosított mixture of Gaussians eljárással (Real-time motion detection with a modified mixture of Gaussians method). In Proceedings of The 6th Conference of Hungarian Association for Image Processing and Pattern Recognition, Debrecen, Hungary, January 25-27 2007.

[58] Á. Utasi and L. Czúni. Anomaly detection with low-level processes in videos. In Proceedings