Video anomaly detection using deep residual-spatiotemporal translation network


Abstract

Video anomaly detection has gained significant attention in current intelligent surveillance systems. We propose the Deep Residual Spatiotemporal Translation Network (DR-STN), a novel unsupervised Deep Residual conditional Generative Adversarial Network (DR-cGAN) model with an Online Hard Negative Mining (OHNM) approach. The proposed DR-cGAN provides a wider network that learns a mapping from spatial to temporal representations and enhances the perceptual quality of the images synthesized by the generator. During DR-cGAN training, we use only frames of normal events to produce their corresponding dense optical flow. At test time, we compute the per-pixel reconstruction error between the synthesized and the real dense optical flow and then apply OHNM to remove false-positive detections. Finally, semantic region merging is introduced to integrate the intensities of all individual abnormal objects into a full output frame. The proposed DR-STN has been extensively evaluated on publicly available benchmarks, including UCSD, UMN, and CUHK Avenue, demonstrating superior results over state-of-the-art methods in both frame-level and pixel-level evaluations. The average frame-level Area Under the Curve (AUC) across the three benchmarks is 96.73%, an improvement of 7.6% in frame-level AUC over state-of-the-art methods.
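The abstract outlines a test-time pipeline: per-pixel reconstruction error between real and synthesized dense optical flow, OHNM filtering of false positives, region merging into a full output frame, and a frame-level score. The paper's exact error metric, OHNM criterion, and merging rule are not reproduced on this page, so the following is only a minimal Python sketch under simple assumptions: per-pixel L2 error on the flow field, area-based filtering of connected components standing in for OHNM, and intensity merging of the surviving regions. All function names, thresholds, and parameters are illustrative, not the authors' implementation.

import numpy as np
from scipy import ndimage  # connected-component labelling

def anomaly_map(flow_real, flow_syn, err_thresh=0.5, min_area=50):
    """Sketch of the test-time scoring described in the abstract.

    flow_real, flow_syn: (H, W, 2) dense optical flow, real vs. synthesized.
    Returns an (H, W) map with the merged intensities of abnormal regions.
    Thresholds are illustrative, not values from the paper.
    """
    # Per-pixel reconstruction error between real and synthesized flow.
    err = np.linalg.norm(flow_real - flow_syn, axis=-1)
    mask = err > err_thresh  # candidate abnormal pixels

    # Stand-in for OHNM: discard small connected regions, treated here
    # as likely false-positive detections.
    labels, n_regions = ndimage.label(mask)
    keep = np.zeros_like(mask)
    for i in range(1, n_regions + 1):
        region = labels == i
        if region.sum() >= min_area:
            keep |= region

    # Stand-in for semantic region merging: integrate the intensities
    # of all surviving abnormal regions into one full output frame.
    return np.where(keep, err, 0.0)

def frame_score(amap):
    """Frame-level anomaly score: peak merged intensity (one simple choice)."""
    return float(amap.max())

Given per-frame scores from such a pipeline and binary ground-truth labels, the frame-level AUC reported in the abstract corresponds to a standard ROC computation, e.g. sklearn.metrics.roc_auc_score(y_true, scores).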
Year: 2022
Authors: Ganokratanaa, T.; Aramvith, S.; Sebe, N.
Citation: Video anomaly detection using deep residual-spatiotemporal translation network / Ganokratanaa, T.; Aramvith, S.; Sebe, N. - In: PATTERN RECOGNITION LETTERS. - ISSN 0167-8655. - 155 (2022), pp. 143-150. DOI: 10.1016/j.patrec.2021.11.001
Files in this item:

File: 1-s2.0-S0167865521003925-main.pdf
Access: open access
Type: Editorial version (publisher's layout)
License: Creative Commons
Size: 1.3 MB
Format: Adobe PDF
Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11572/340575
Citations
  • PMC: n/a
  • Scopus: 19
  • Web of Science: 12