
Weakly-Supervised Anomaly Detection in Surveillance Videos Based on Two-Stream I3D Convolution Network
Abstract
The widespread adoption of city surveillance systems has led to an increase in the use of surveillance videos for maintaining public safety and security. This thesis tackles the problem of detecting anomalous events in surveillance videos. The goal is to automatically identify abnormal events by learning from both normal and abnormal videos. Most of previous works consider any deviation from learned normal patterns as an anomaly, which may not always be valid since the same activity could be normal or abnormal under different circumstances. To address this issue, the thesis utilizes the Two-Stream Inflated 3D (I3D) Convolutional Networks to extract spatial and temporal video features and demonstrates how it outperforms the 3D Convolutional Network (C3D) used in prior work as feature extractor. To avoid annotating abnormal activities in training videos, a weakly supervised anomaly detection model is implemented based on the Multiple Instance Learning (MIL) framework. The model considers normal and abnormal videos as bags and video clips as instances, learns a ranking model to predict high anomaly scores for video clips containing anomalies. The thesis further shows that the choice of features input, such as concatenating RGB and flow features, and careful choice of optimization settings, such as optimizer, can significantly improve the performance of the anomaly detection model on some evaluation metrics.