Detecting anomalous behaviour in audio-video sensor networks

This project investigates the identification and classification of behaviours as normal or abnormal, safe or threatening, from an irregular and often heterogeneous sensor network. Although some general principles may be applied, there is unlikely to be a unified approach that works across different sensor domains; for example, tracking moving ‘blobs’ in radar or sonar data differs in dimension and in its mathematical and applied treatment from tracking human subjects in CCTV image data. In this project we focus specifically on the problems of using electro-optic (video, IR, LiDAR) and audible data to monitor behaviour and detect anomalies, thereby addressing a number of specified areas of interest and challenges.

Our current work addresses the problem of statistical anomaly detection in video pattern recognition. This can be applied to tracking people and objects in surveillance networks; learning statistical representations of normality which evolve to produce pattern-of-life estimates for targets; and detecting deviations from normality while using the spatial context of behaviour to filter out spurious anomalies. The same context-driven approach can be applied to various domains, for example airport or city surveillance.

Initial progress has focused on developing algorithms that incorporate contextual understanding into low-level tracking. Specifically, we have focused on concourse protection, where the challenge is to identify a threat before it reaches the entry point of an asset (e.g. checkpoint, forward operating base). We have developed a novel head-pose classifier for surveillance video that is able to identify where a person is looking. Our approach has shown encouraging results on two public video datasets (Benfold and CAVIAR), as shown below:
[Figure: AV-anomalous-P1]

By integrating head-pose information from our classifier into a Kalman filter, our ‘Intentional Tracker’ uses this contextual prior when predicting target movement. We have demonstrated that this intentional prior yields better predictions of expected behaviour. The theory generalises to other types of target (e.g. vehicles using indicators) and to other spatio-temporal priors, which will become the focus of our work over the coming months. Using contextual information, our algorithms construct better models of normal behaviour. This enables them to predict expected behaviour more accurately, so that anomalous activity can be identified by its low probability under the model.
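
The idea of feeding head pose into a Kalman filter and flagging low-probability observations can be illustrated with a small sketch. This is not the published Intentional Tracker: the class name `HeadPoseKalman`, the assumption that a person walks at a fixed `speed` in the direction of gaze, the isotropic noise variances `q` and `r`, and all numeric values are hypothetical choices made only to keep the example short. With isotropic covariances, every matrix in the filter collapses to a scalar, so no linear algebra library is needed.

```python
import math


class HeadPoseKalman:
    """2-D position Kalman filter with a gaze-driven control input.

    Illustrative assumption: the target moves at `speed` in the direction
    it is looking, so the head-pose angle acts as a prior on the next
    position. State and noise covariances are isotropic (variance * I).
    """

    def __init__(self, x, y, p=1.0, q=0.05, r=0.25, speed=1.4):
        self.x, self.y = x, y  # position estimate (metres)
        self.p = p             # state variance per axis
        self.q = q             # process noise variance per step (assumed)
        self.r = r             # measurement noise variance (assumed)
        self.speed = speed     # assumed walking speed (m/s)

    def step(self, z, gaze, dt=1.0):
        """Predict using the gaze angle (radians), then update with z=(zx, zy).

        Returns the negative log-likelihood of z under the prediction;
        larger values mean the observed motion was less expected.
        """
        # Predict: move along the gaze direction, inflate uncertainty.
        px = self.x + self.speed * math.cos(gaze) * dt
        py = self.y + self.speed * math.sin(gaze) * dt
        p_pred = self.p + self.q

        # Innovation and its (isotropic) variance.
        ix, iy = z[0] - px, z[1] - py
        s = p_pred + self.r

        # Negative log of a 2-D Gaussian N(0, s*I) evaluated at the innovation.
        nll = math.log(2.0 * math.pi * s) + 0.5 * (ix * ix + iy * iy) / s

        # Update: standard Kalman gain, applied per axis.
        k = p_pred / s
        self.x = px + k * ix
        self.y = py + k * iy
        self.p = (1.0 - k) * p_pred
        return nll


if __name__ == "__main__":
    tracker = HeadPoseKalman(0.0, 0.0)
    # Target looks east (gaze = 0) and is observed where the prior expects it.
    expected = tracker.step((1.4, 0.0), gaze=0.0)
    # Target still looks east but is observed back at the origin.
    unexpected = tracker.step((0.0, 0.0), gaze=0.0)
    print("expected motion score:", expected)
    print("unexpected motion score:", unexpected)
```

In this sketch an observation that agrees with the gaze-based prediction receives a low score, while one that contradicts it receives a high score; thresholding that score is one simple way to turn "low probability" into an anomaly flag. A fuller model would also carry velocity in the state and treat the head-pose estimate itself as uncertain.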

Publications:

An adaptive motion model for person tracking with instantaneous head-pose features. R H Baxter, M J V Leach, S S Mukherjee, N M Robertson. IEEE Signal Processing Letters, Vol. 22, Iss. 5, pp. 578-582, 2015. doi: 10.1109/LSP.2014.2364458
Tracking with Intent. R H Baxter, M J V Leach, N M Robertson. Sensor Signal Processing for Defence, 2014. doi: 10.1109/SSPD.2014.6943323
Detecting Social Groups in Crowded Surveillance Videos Using Visual Attention. M J V Leach, R H Baxter, N M Robertson, E P Sparks. CVPR Workshop on Computational Models for Social Interaction and Behavior, 2014. doi: 10.1109/CVPRW.2014.75

Contacts:

Dr. Rolf H Baxter
Dr. Neil M Robertson

Useful links:

Related publications
UDRC