Detecting anomalous events in videos by learning deep representations of appearance and motion / Xu, Dan; Yan, Yan; Ricci, Elisa; Sebe, Nicu. - In: COMPUTER VISION AND IMAGE UNDERSTANDING. - ISSN 1077-3142. - 156:(2017), pp. 117-127. [10.1016/j.cviu.2016.10.010]
Detecting anomalous events in videos by learning deep representations of appearance and motion
Xu, Dan; Yan, Yan; Ricci, Elisa; Sebe, Nicu
2017-01-01
Abstract
Anomalous event detection is of utmost importance in intelligent video surveillance. Most current approaches to the automatic analysis of complex video scenes rely on hand-crafted appearance and motion features. However, adopting user-defined representations is clearly suboptimal, as it is desirable to learn descriptors specific to the scene of interest. To cope with this need, in this paper we propose Appearance and Motion DeepNet (AMDN), a novel approach based on deep neural networks to automatically learn feature representations. To exploit the complementary information of both appearance and motion patterns, we introduce a novel double fusion framework, combining the benefits of traditional early fusion and late fusion strategies. Specifically, stacked denoising autoencoders are proposed to separately learn both appearance and motion features as well as a joint representation (early fusion). Then, based on the learned features, multiple one-class SVM models are used to predict the anomaly scores of each input. Finally, a novel late fusion strategy is proposed to combine the computed scores and detect abnormal events. The proposed AMDN is extensively evaluated on publicly available video surveillance datasets, including UCSD Pedestrian, Subway, and Train, showing competitive performance with respect to state-of-the-art approaches. © 2016 Elsevier Inc. All rights reserved.
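The abstract outlines a three-stage pipeline: stacked denoising autoencoders learn features from appearance, motion, and a concatenated (early-fused) input; a one-class SVM scores each learned feature stream; and the per-stream scores are combined by late fusion. The sketch below illustrates that pipeline in Python. It is not the authors' implementation: the random stand-in data, the single-layer autoencoder (the paper stacks several), the layer sizes, noise level, SVM parameters, and uniform fusion weights are all assumptions made for illustration.

```python
# Illustrative sketch of an AMDN-style pipeline (NOT the authors' code):
# (1) a denoising autoencoder learns features per stream,
# (2) one one-class SVM scores each stream, (3) scores are fused.
import numpy as np
import torch
import torch.nn as nn
from sklearn.svm import OneClassSVM

class DenoisingAE(nn.Module):
    """Single-layer denoising autoencoder (one block of an SDAE)."""
    def __init__(self, dim_in, dim_hidden):
        super().__init__()
        self.enc = nn.Linear(dim_in, dim_hidden)
        self.dec = nn.Linear(dim_hidden, dim_in)

    def forward(self, x):
        h = torch.sigmoid(self.enc(x))
        return torch.sigmoid(self.dec(h)), h

def train_dae(x_clean, dim_hidden=64, noise=0.1, epochs=50):
    """Train to reconstruct clean inputs from corrupted copies."""
    model = DenoisingAE(x_clean.shape[1], dim_hidden)
    opt = torch.optim.Adam(model.parameters(), lr=1e-3)
    loss_fn = nn.MSELoss()
    for _ in range(epochs):
        x_noisy = x_clean + noise * torch.randn_like(x_clean)
        recon, _ = model(x_noisy)
        loss = loss_fn(recon, x_clean)
        opt.zero_grad()
        loss.backward()
        opt.step()
    return model

# Stand-in appearance / motion descriptors; the paper learns from
# video data, random tensors just keep the sketch self-contained.
torch.manual_seed(0)
appearance = torch.rand(500, 256)
motion = torch.rand(500, 256)
joint = torch.cat([appearance, motion], dim=1)   # early fusion input

scores = []
for x in (appearance, motion, joint):
    dae = train_dae(x)
    with torch.no_grad():
        _, feats = dae(x)
    # One one-class SVM per learned feature stream.
    ocsvm = OneClassSVM(kernel="rbf", nu=0.1, gamma="scale")
    ocsvm.fit(feats.numpy())
    # Negate so that larger values mean "more anomalous".
    scores.append(-ocsvm.decision_function(feats.numpy()))

# Late fusion: combine per-stream anomaly scores (uniform average
# here; the paper proposes a dedicated late fusion strategy).
fused = np.mean(scores, axis=0)
print("top-5 most anomalous samples:", np.argsort(fused)[-5:])
```

Thresholding the fused score flags abnormal events; the paper's novel late fusion strategy replaces the uniform average used in this sketch.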
File | Type | License | Access | Size | Format
---|---|---|---|---|---
1-s2.0-S1077314216301618-main.pdf | Publisher's version (publisher's layout) | All rights reserved | Repository managers only | 3.49 MB | Adobe PDF
CVIU_final.pdf | Non-refereed preprint | All rights reserved | Open access | 4.19 MB | Adobe PDF
Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.