
Learning Deep Representations of Appearance and Motion for Anomalous Event Detection / Xu, Dan; Song, Jingkuan; Yan, Yan; Ricci, E.; Sebe, Niculae. - (2015), pp. 1-12. (Paper presented at the BMVC conference held in Swansea, 7-10 September) [10.5244/C.29.8].

Learning Deep Representations of Appearance and Motion for Anomalous Event Detection

Xu, Dan; Song, Jingkuan; Yan, Yan; Ricci, E.; Sebe, Niculae
2015-01-01

Abstract

We present a novel unsupervised deep learning framework for anomalous event detection in complex video scenes. While most existing works merely use hand-crafted appearance and motion features, we propose Appearance and Motion DeepNet (AMDN), which utilizes deep neural networks to automatically learn feature representations. To exploit the complementary information of appearance and motion patterns, we introduce a novel double fusion framework, combining the benefits of traditional early fusion and late fusion strategies. Specifically, stacked denoising autoencoders are proposed to separately learn appearance and motion features as well as a joint representation (early fusion). Based on the learned representations, multiple one-class SVM models are used to predict the anomaly scores of each input, which are then integrated with a late fusion strategy for final anomaly detection. We evaluate the proposed method on two publicly available video surveillance datasets, showing competitive performance with respect to state-of-the-art approaches.
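The double fusion pipeline the abstract describes (learn separate and joint representations with denoising autoencoders, score each with a one-class SVM, then combine scores) can be sketched as follows. This is a minimal illustration only, not the authors' implementation: it uses a single denoising-autoencoder layer rather than a stacked one, synthetic stand-ins for the appearance and motion inputs (the paper uses image patches and optical-flow maps), invented hyperparameters throughout, and scikit-learn's `OneClassSVM` in place of the paper's one-class SVM models.

```python
import numpy as np
from sklearn.svm import OneClassSVM

rng = np.random.default_rng(0)

def train_denoising_autoencoder(X, n_hidden=8, noise=0.2, lr=0.1, epochs=200):
    """Single denoising-autoencoder layer (sigmoid units, tied weights):
    reconstruct the clean input X from a noise-corrupted copy."""
    n_in = X.shape[1]
    W = rng.normal(0, 0.1, (n_in, n_hidden))
    b = np.zeros(n_hidden)          # encoder bias
    c = np.zeros(n_in)              # decoder bias
    sig = lambda z: 1.0 / (1.0 + np.exp(-z))
    for _ in range(epochs):
        Xn = X + noise * rng.normal(size=X.shape)   # corrupt the input
        H = sig(Xn @ W + b)                         # encode
        R = sig(H @ W.T + c)                        # decode (tied weights)
        dR = (R - X) * R * (1 - R)                  # grad of squared error
        dH = (dR @ W) * H * (1 - H)                 # backprop to encoder
        W -= lr * (Xn.T @ dH + dR.T @ H) / len(X)
        b -= lr * dH.mean(axis=0)
        c -= lr * dR.mean(axis=0)
    return lambda Z: sig(Z @ W + b)                 # learned encoder

# Synthetic "normal" appearance / motion features clustered near 0.5
# (hypothetical stand-ins for the paper's patch and optical-flow inputs).
X_app = np.clip(0.5 + 0.1 * rng.normal(size=(300, 16)), 0, 1)
X_mot = np.clip(0.5 + 0.1 * rng.normal(size=(300, 16)), 0, 1)

enc_app = train_denoising_autoencoder(X_app)
enc_mot = train_denoising_autoencoder(X_mot)
enc_joint = train_denoising_autoencoder(np.hstack([X_app, X_mot]))  # early fusion

# One one-class SVM per learned representation.
feats = [enc_app(X_app), enc_mot(X_mot), enc_joint(np.hstack([X_app, X_mot]))]
svms = [OneClassSVM(nu=0.1, gamma="scale").fit(F) for F in feats]

def anomaly_score(x_app, x_mot):
    """Late fusion: average the negated SVM decision values, so
    samples far from the normal data get higher scores."""
    reps = [enc_app(x_app), enc_mot(x_mot),
            enc_joint(np.hstack([x_app, x_mot]))]
    return float(np.mean([-m.decision_function(r)[0]
                          for m, r in zip(svms, reps)]))

normal = anomaly_score(X_app[:1], X_mot[:1])
odd = anomaly_score(np.ones((1, 16)), np.zeros((1, 16)))  # clear outlier
```

With this toy data, `odd` comes out above `normal`: the outlier falls outside the region the one-class SVMs learned, so all three negated decision values push its fused score up, which is the behavior the double fusion strategy relies on.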
2015
Proceedings of the British Machine Vision Conference 2015
Swansea, UK
BMVA Press
1-901725-53-7
Files in this record:

paper008.pdf
  Access: open access
  Type: Publisher's version (publisher's layout)
  License: Other type of license
  Size: 3.23 MB
  Format: Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11572/125100