Deep learning for automatic violence detection: Tests on the AIRTLab dataset

Sernani, Paolo; Falcionelli, Nicola; Tomassini, Selene; Contardo, Paolo; Dragoni, Aldo Franco

doi:10.1109/ACCESS.2021.3131315

Following the growing availability of video surveillance cameras and the need for techniques to automatically identify events in video footages, there is an increasing interest towards automatic violence detection in videos. Deep learning-based architectures, such as 3D Convolutional Neural Networks, demonstrated their capability of extracting spatio-temporal features from videos, being effective in violence detection. However, friendly behaviours or fast moves such as hugs, small hits, claps, high fives, etc., can still cause false positives, interpreting a harmless action as violent. To this end, we present three deep-learning based models for violence detection and test them on the AIRTLab dataset, a novel dataset designed to check the robustness of algorithms against false positives. The objective is twofold: on one hand, we compute accuracy metrics on the three proposed models (two are based on transfer learning and one is trained from scratch), building a baseline of metrics for the AIRTLab dataset; on the other hand, we validate the capability of the proposed dataset of challenging the robustness to false positives. The results of the proposed models are in line with the scientific literature, in terms of accuracy, with transfer learning-based networks exhibiting better generalization capabilities than the trained from scratch network. Moreover, the tests highlighted that most of the classification errors concern the identification of non-violent clips, validating the design of the proposed dataset. Finally, to demonstrate the significance of the proposed models, the paper presents a comparison with the related literature, as well as with models based on well-established pre-trained 2D Convolutional Neural Networks 2D CNNs. Such comparison highlights that 3D models get better accuracy performance than time distributed 2D CNNs (merged with a recurrent model) in processing the spatio-temporal features of video clips. The source code of the experiments and the AIRTLab dataset are available in public repositories.

Deep learning for automatic violence detection: Tests on the AIRTLab dataset / Sernani, P., Falcionelli, N., Tomassini, S., Contardo, P., Dragoni, A.F.. - In: IEEE ACCESS. - ISSN 2169-3536. - ELETTRONICO. - 9:(2021), pp. 160580-160595. [10.1109/ACCESS.2021.3131315]

Deep learning for automatic violence detection: Tests on the AIRTLab dataset

Sernani, Paolo;Falcionelli, Nicola;Tomassini, Selene;Contardo, Paolo;Dragoni, Aldo Franco

2021-01-01

Abstract

Following the growing availability of video surveillance cameras and the need for techniques to automatically identify events in video footages, there is an increasing interest towards automatic violence detection in videos. Deep learning-based architectures, such as 3D Convolutional Neural Networks, demonstrated their capability of extracting spatio-temporal features from videos, being effective in violence detection. However, friendly behaviours or fast moves such as hugs, small hits, claps, high fives, etc., can still cause false positives, interpreting a harmless action as violent. To this end, we present three deep-learning based models for violence detection and test them on the AIRTLab dataset, a novel dataset designed to check the robustness of algorithms against false positives. The objective is twofold: on one hand, we compute accuracy metrics on the three proposed models (two are based on transfer learning and one is trained from scratch), building a baseline of metrics for the AIRTLab dataset; on the other hand, we validate the capability of the proposed dataset of challenging the robustness to false positives. The results of the proposed models are in line with the scientific literature, in terms of accuracy, with transfer learning-based networks exhibiting better generalization capabilities than the trained from scratch network. Moreover, the tests highlighted that most of the classification errors concern the identification of non-violent clips, validating the design of the proposed dataset. Finally, to demonstrate the significance of the proposed models, the paper presents a comparison with the related literature, as well as with models based on well-established pre-trained 2D Convolutional Neural Networks 2D CNNs. Such comparison highlights that 3D models get better accuracy performance than time distributed 2D CNNs (merged with a recurrent model) in processing the spatio-temporal features of video clips. The source code of the experiments and the AIRTLab dataset are available in public repositories.

Scheda breve

Scheda completa

Scheda completa (DC)

	Anno di pubblicazione (Date of publication)
	
				2021
			
	Titolo del periodico (Journal title)
	
				IEEE ACCESS
			
	DOI
	
				https://dx.doi.org/10.1109/ACCESS.2021.3131315
			
	Codice Scopus (Scopus identifier)
	
				2-s2.0-85120578509
			
	Codice WOS (WOS identifier)
	
				WOS:000728921200001
			
	Tutti gli autori
	
						Sernani, Paolo; Falcionelli, Nicola; Tomassini, Selene; Contardo, Paolo; Dragoni, Aldo Franco
					
	Citazione
	
				Deep learning for automatic violence detection: Tests on the AIRTLab dataset / Sernani, P., Falcionelli, N., Tomassini, S., Contardo, P., Dragoni, A.F.. - In: IEEE ACCESS. - ISSN 2169-3536. - ELETTRONICO. - 9:(2021), pp. 160580-160595. [10.1109/ACCESS.2021.3131315]
			
	Appare nelle tipologie:
	
				03.1 Articolo su rivista (Journal article)

File in questo prodotto:

File	Dimensione	Formato
2021 - IEEE Access.pdf accesso aperto Tipologia: Versione editoriale (Publisher’s layout) Licenza: Creative commons Dimensione 2.42 MB Formato Adobe PDF Visualizza/Apri	2.42 MB	Adobe PDF	Visualizza/Apri