Detecting anomalous events in videos by learning deep representations of appearance and motion / Xu, Dan; Yan, Yan; Ricci, Elisa; Sebe, Nicu. - In: COMPUTER VISION AND IMAGE UNDERSTANDING. - ISSN 1077-3142. - 156:(2017), pp. 117-127. [10.1016/j.cviu.2016.10.010]

Detecting anomalous events in videos by learning deep representations of appearance and motion

Xu, Dan; Yan, Yan; Ricci, Elisa; Sebe, Nicu
2017-01-01

Abstract

Anomalous event detection is of utmost importance in intelligent video surveillance. Most current approaches for the automatic analysis of complex video scenes rely on hand-crafted appearance and motion features. However, adopting user-defined representations is clearly suboptimal, as it is desirable to learn descriptors specific to the scene of interest. To cope with this need, in this paper we propose Appearance and Motion DeepNet (AMDN), a novel approach based on deep neural networks to automatically learn feature representations. To exploit the complementary information of both appearance and motion patterns, we introduce a novel double fusion framework, combining the benefits of traditional early fusion and late fusion strategies. Specifically, stacked denoising autoencoders are proposed to separately learn appearance and motion features as well as a joint representation (early fusion). Then, based on the learned features, multiple one-class SVM models are used to predict the anomaly scores of each input. Finally, a novel late fusion strategy is proposed to combine the computed scores and detect abnormal events. The proposed AMDN is extensively evaluated on publicly available video surveillance datasets, including UCSD Pedestrian, Subway, and Train, showing competitive performance with respect to state-of-the-art approaches. © 2016 Elsevier Inc. All rights reserved.
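The abstract outlines a three-stream pipeline: denoising autoencoders learn appearance, motion, and joint (early-fused) representations; a one-class SVM scores each stream; and a late-fusion step combines the scores. The following is only a minimal sketch of that flow, not the paper's actual model: the single-layer autoencoder, the helper names train_denoising_autoencoder and anomaly_score, the toy random descriptors, and the equal fusion weights are all illustrative assumptions.

import numpy as np
from sklearn.svm import OneClassSVM

rng = np.random.default_rng(0)

def train_denoising_autoencoder(X, n_hidden=64, noise=0.1, lr=0.01, epochs=50):
    """Single-layer denoising autoencoder trained with plain gradient descent
    (an illustrative stand-in for the paper's stacked denoising autoencoders)."""
    n, d = X.shape
    W = rng.normal(0.0, 0.01, (d, n_hidden)); b = np.zeros(n_hidden)
    W2 = rng.normal(0.0, 0.01, (n_hidden, d)); b2 = np.zeros(d)
    for _ in range(epochs):
        X_noisy = X + noise * rng.normal(size=X.shape)   # corrupt the input
        H = np.tanh(X_noisy @ W + b)                     # encode
        err = (H @ W2 + b2) - X                          # reconstruct the *clean* input
        dH = (err @ W2.T) * (1.0 - H ** 2)               # backprop through tanh
        W2 -= lr * (H.T @ err) / n;      b2 -= lr * err.mean(0)
        W  -= lr * (X_noisy.T @ dH) / n; b  -= lr * dH.mean(0)
    return lambda Z: np.tanh(Z @ W + b)                  # return the encoder only

# Toy stand-ins for appearance (e.g. gradient) and motion (e.g. optical-flow)
# descriptors extracted from normal training video.
X_app = rng.normal(size=(500, 100))
X_mot = rng.normal(size=(500, 100))
X_joint = np.hstack([X_app, X_mot])                      # early fusion of both streams

# One denoising autoencoder plus one one-class SVM per stream.
streams = {}
for name, X in {"appearance": X_app, "motion": X_mot, "joint": X_joint}.items():
    encode = train_denoising_autoencoder(X)
    svm = OneClassSVM(kernel="rbf", nu=0.1, gamma="scale").fit(encode(X))
    streams[name] = (encode, svm)

def anomaly_score(x_app, x_mot, weights=(1 / 3, 1 / 3, 1 / 3)):
    """Late fusion: weighted sum of per-stream one-class SVM anomaly scores.
    Equal weights are an assumption; the paper learns the fusion weights."""
    inputs = {"appearance": x_app, "motion": x_mot,
              "joint": np.hstack([x_app, x_mot])}
    score = 0.0
    for w, (name, (encode, svm)) in zip(weights, streams.items()):
        # decision_function is positive inside the learned "normal" region,
        # so its negation serves as an anomaly score.
        score += w * (-svm.decision_function(encode(inputs[name][None, :]))[0])
    return score

print(anomaly_score(rng.normal(size=100), rng.normal(size=100)))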
Files in this record:

1-s2.0-S1077314216301618-main.pdf
  Access: Restricted (archive administrators only)
  Type: Publisher's version (Publisher's layout)
  License: All rights reserved
  Size: 3.49 MB
  Format: Adobe PDF

CVIU_final.pdf
  Access: Open access
  Type: Non-refereed preprint
  License: All rights reserved
  Size: 4.19 MB
  Format: Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11572/193376
Citations
  • PMC: not available
  • Scopus: 368
  • Web of Science (ISI): 306