Combining a mobile deep neural network and a recurrent layer for violence detection in videos


Contardo, P.;Tomassini, S.;Falcionelli, N.;Dragoni, A. F.;Sernani, P.

2023-01-01

Abstract

Several techniques for the automatic detection of violent scenes in videos and security footage have appeared in recent years, for example with the goal of relieving authorities of the need to analyze hours of Closed-Circuit Television (CCTV) clips. In this regard, Deep Learning-based techniques such as Recurrent Neural Networks (RNNs) and Convolutional Neural Networks (CNNs) have emerged as effective for violence detection. Nevertheless, most such techniques require significant computational and memory resources. Thus, we propose combining an established CNN designed for mobile and embedded devices, MobileNetV2, with a recurrent layer to extract the spatio-temporal features in security videos. A lightweight model can run on embedded devices in an edge-computing fashion, for example to process the videos near the camera recording them and thereby preserve privacy. Specifically, we exploit transfer learning by using a pre-trained version of MobileNetV2, and we propose two different models that combine it with a Bidirectional Long Short-Term Memory (Bi-LSTM) layer and a Convolutional LSTM (ConvLSTM) layer, respectively. The paper presents accuracy tests of the two models on the AIRTLab dataset and a comparison with more complex models developed in our previous work, in order to evaluate the drop in accuracy incurred by using a model compatible with limited resources. The network composed of MobileNetV2 and the ConvLSTM achieves 94.1% accuracy, against the 96.1% of a model based on a more complex 3D CNN.
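To make the difference between the two recurrent heads concrete, the following is a minimal shape-bookkeeping sketch in plain Python, not the authors' implementation. It relies on two known properties of MobileNetV2 without its classifier head (overall downsampling by 32 and 1280 output channels at width multiplier 1.0). Everything else is an assumption for illustration: the clip geometry in `ClipSpec`, the per-frame global average pooling before the Bi-LSTM, the `units` and `filters` values, and the choice to keep only the final recurrent state. The abstract does not specify any of these.

```python
from dataclasses import dataclass
from typing import Tuple

# Known properties of MobileNetV2 (width multiplier 1.0, classifier head removed):
# the backbone downsamples by 32 overall and ends with 1280 channels.
MOBILENETV2_STRIDE = 32
MOBILENETV2_CHANNELS = 1280


@dataclass(frozen=True)
class ClipSpec:
    """Hypothetical clip geometry; the abstract does not give these values."""
    frames: int = 16
    height: int = 224
    width: int = 224


def backbone_features(clip: ClipSpec) -> Tuple[int, int, int, int]:
    """Shape of the per-frame feature maps: (frames, h, w, channels)."""
    if clip.height % MOBILENETV2_STRIDE or clip.width % MOBILENETV2_STRIDE:
        # Keep the sketch simple: only sizes that divide evenly by the stride.
        raise ValueError("height and width must be multiples of 32")
    return (
        clip.frames,
        clip.height // MOBILENETV2_STRIDE,
        clip.width // MOBILENETV2_STRIDE,
        MOBILENETV2_CHANNELS,
    )


def bilstm_head(features: Tuple[int, int, int, int], units: int = 64):
    """Assumed Bi-LSTM path: pool each frame to a vector, then recur over time."""
    frames, _, _, channels = features
    sequence = (frames, channels)  # spatial grid collapsed by average pooling
    output = (2 * units,)          # forward and backward states concatenated
    return sequence, output


def convlstm_head(features: Tuple[int, int, int, int], filters: int = 64):
    """Assumed ConvLSTM path: keep the spatial grid and recur over time."""
    _, h, w, _ = features
    output = (h, w, filters)       # 'same' padding preserves the grid size
    return features, output


feats = backbone_features(ClipSpec())
print(feats)                    # (16, 7, 7, 1280)
print(bilstm_head(feats)[1])    # (128,)
print(convlstm_head(feats)[1])  # (7, 7, 64)
```

Under these assumptions, the Bi-LSTM consumes one vector per frame, while the ConvLSTM consumes one feature map per frame and so retains spatial layout through the recurrent step.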


Date of publication: 2023
Proceedings title: CEUR Workshop Proceedings
Place of publication: Aachen, Germany
Publisher: CEUR-WS
Scopus identifier: 2-s2.0-85161927368
All authors: Contardo, P.; Tomassini, S.; Falcionelli, N.; Dragoni, A. F.; Sernani, P.
Citation: Combining a mobile deep neural network and a recurrent layer for violence detection in videos / Contardo, P.; Tomassini, S.; Falcionelli, N.; Dragoni, A. F.; Sernani, P. - Electronic. - 3402:(2023), pp. 35-43. (Paper presented at the 5th International Conference on Recent Trends and Applications in Computer Science and Information Technology, RTA-CSIT 2023, held in Tirana, Albania, 26-27 April 2023).
Appears in types: 04.1 Paper in Proceedings

Files in this item:

File	Size	Format
2023 - RTA-CSIT.pdf (open access; type: publisher's layout; license: Creative Commons)	1.34 MB	Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11572/403292
