Ganokratanaa, T.; Aramvith, S.; Sebe, N. Video anomaly detection using deep residual-spatiotemporal translation network. Pattern Recognition Letters, ISSN 0167-8655, 155 (2022), pp. 143-150. DOI: 10.1016/j.patrec.2021.11.001
Video anomaly detection using deep residual-spatiotemporal translation network
Sebe, N.
2022-01-01
Abstract
Video anomaly detection has gained significant attention in modern intelligent surveillance systems. We propose the Deep Residual Spatiotemporal Translation Network (DR-STN), a novel unsupervised Deep Residual conditional Generative Adversarial Network (DR-cGAN) model with an Online Hard Negative Mining (OHNM) approach. The proposed DR-cGAN provides a wider network that learns a mapping from spatial to temporal representations and enhances the perceptual quality of the images synthesized by the generator. During DR-cGAN training, we use only frames of normal events and produce their corresponding dense optical flow. At test time, we compute the per-pixel reconstruction error between the synthesized and the real dense optical flow and then apply OHNM to remove false-positive detections. Finally, semantic region merging is introduced to integrate the intensities of all individual abnormal objects into a full output frame. The proposed DR-STN has been extensively evaluated on publicly available benchmarks, including UCSD, UMN, and CUHK Avenue, demonstrating superior results over other state-of-the-art methods in both frame-level and pixel-level evaluations. The average Area Under the Curve (AUC) of the frame-level evaluation across the three benchmarks is 96.73%, an improvement of 7.6% in frame-level AUC over state-of-the-art methods.
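The abstract compresses the test-time pipeline into a few sentences; the sketch below illustrates its general shape in plain NumPy/OpenCV. Everything here is an assumption for illustration, not the paper's implementation: the DR-cGAN generator is abstracted behind a `fake_flow` argument supplied by the caller, Farneback flow stands in for whichever dense optical flow the authors compute, and a top-k connected-component filter is only a crude proxy for OHNM and semantic region merging.

```python
# Minimal sketch of the test-time scoring step, under the assumptions above.
import numpy as np
import cv2


def dense_optical_flow(prev_gray: np.ndarray, next_gray: np.ndarray) -> np.ndarray:
    """Real dense optical flow between consecutive grayscale frames.

    Farneback flow is an illustrative stand-in for the flow used in the paper.
    Returns an (H, W, 2) array of per-pixel displacement vectors."""
    return cv2.calcOpticalFlowFarneback(
        prev_gray, next_gray, None,
        pyr_scale=0.5, levels=3, winsize=15,
        iterations=3, poly_n=5, poly_sigma=1.2, flags=0,
    )


def reconstruction_error(real_flow: np.ndarray, fake_flow: np.ndarray) -> np.ndarray:
    """Per-pixel error between the real and the synthesized flow fields."""
    return np.linalg.norm(real_flow - fake_flow, axis=-1)


def frame_score(error: np.ndarray) -> float:
    """Frame-level anomaly score: mean reconstruction error over the frame."""
    return float(error.mean())


def anomaly_mask(error: np.ndarray, thresh: float, keep_top_k: int = 5) -> np.ndarray:
    """Threshold the error map, then keep only the k largest connected
    components -- a crude stand-in for OHNM false-positive filtering,
    with the surviving regions merged into one full-frame binary mask."""
    binary = (error > thresh).astype(np.uint8)
    n_labels, labels, stats, _ = cv2.connectedComponentsWithStats(binary)
    # Rank components by area (label 0 is background) and keep the top k.
    areas = stats[1:, cv2.CC_STAT_AREA]
    keep = 1 + np.argsort(areas)[::-1][:keep_top_k]
    return np.isin(labels, keep).astype(np.uint8)
```

In the actual method, `fake_flow` would come from the trained DR-cGAN generator conditioned on the input frame; here it is simply an array passed in by the caller, and the threshold and `keep_top_k` values are illustrative rather than taken from the paper.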
| File | Access | Type | License | Size | Format |
|---|---|---|---|---|---|
| 1-s2.0-S0167865521003925-main.pdf | Open access | Publisher's version (publisher's layout) | Creative Commons | 1.3 MB | Adobe PDF |