Deep Unsupervised Key Frame Extraction for Efficient Video Classification

Tang, Hao; Ding, Lei; Wu, Songsong; Ren, Bin; Sebe, Nicu; Rota, Paolo

doi:10.1145/3571735

Video processing and analysis have become urgent tasks, as a huge number of videos are uploaded online every day (e.g., to YouTube and Hulu). Extracting representative key frames is an important step in video processing and analysis because it greatly reduces the computing resources and time required. Although great progress has been made recently, large-scale video classification remains an open problem, as existing methods have not balanced performance and efficiency well. To tackle this problem, this work presents an unsupervised method for retrieving key frames that combines a Convolutional Neural Network (CNN) with Temporal Segment Density Peaks Clustering (TSDPC). The proposed TSDPC is a generic and powerful framework with two advantages over previous work: it calculates the number of key frames automatically, and it preserves the temporal information of the video. It thereby improves the efficiency of video classification. Furthermore, a Long Short-Term Memory (LSTM) network is added on top of the CNN to further improve classification performance. Moreover, a weight fusion strategy over different input networks is presented to boost performance. By optimizing video classification and key frame extraction simultaneously, we achieve better classification performance and higher efficiency. We evaluate our method on two popular datasets, HMDB51 and UCF101, and the experimental results consistently demonstrate that our strategy achieves competitive performance and efficiency compared with state-of-the-art approaches.
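
The abstract names the clustering step but gives no formulas, so the following is only a minimal sketch of how density-peaks-style key frame selection could look. It is not the authors' TSDPC implementation. It assumes per-frame feature vectors (for example CNN embeddings) are already available as a NumPy array. The function names `density_peaks` and `select_key_frames`, the percentile rule for the cutoff distance, the mean-plus-standard-deviation threshold used to pick the number of peaks, and the equal-length temporal segments are all illustrative assumptions.

```python
import numpy as np


def density_peaks(features, dc_percentile=20.0, n_std=1.0):
    """Return indices of density-peak rows in `features` (n_frames x dim).

    Hypothetical helper: generic density peaks clustering, with an assumed
    thresholding rule so the number of peaks is not fixed in advance.
    """
    n = len(features)
    if n == 1:
        return np.array([0])
    # Pairwise Euclidean distances between frame features.
    diff = features[:, None, :] - features[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    # Cutoff distance d_c: a percentile of the pairwise distances (assumption).
    dc = max(np.percentile(dist[np.triu_indices(n, k=1)], dc_percentile), 1e-12)
    # Local density rho (Gaussian kernel), excluding each point's self term.
    rho = np.exp(-(dist / dc) ** 2).sum(axis=1) - 1.0
    # delta: distance to the nearest point that has a higher density.
    delta = np.empty(n)
    for i in range(n):
        higher = rho > rho[i]
        delta[i] = dist[i, higher].min() if higher.any() else dist[i].max()
    # Points with both high density and high separation score highest.
    gamma = rho * delta
    peaks = np.flatnonzero(gamma > gamma.mean() + n_std * gamma.std())
    if peaks.size == 0:
        # Fall back to the single best-scoring frame.
        peaks = np.array([int(gamma.argmax())])
    return peaks


def select_key_frames(features, n_segments=3, **kwargs):
    """Pick key frames per temporal segment and return them in frame order."""
    features = np.asarray(features, dtype=float)
    # Equal-length consecutive segments (assumption, not from the paper).
    bounds = np.linspace(0, len(features), n_segments + 1).astype(int)
    selected = []
    for start, stop in zip(bounds[:-1], bounds[1:]):
        if stop > start:
            peaks = density_peaks(features[start:stop], **kwargs)
            selected.extend(start + np.sort(peaks))
    return [int(i) for i in selected]
```

Calling `select_key_frames(features, n_segments=3)` on an array of shape `(n_frames, dim)` returns frame indices in increasing order, with at least one index per non-empty segment. Clustering each segment separately is one simple way to keep the selected frames in temporal order; the paper may handle this differently.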

Deep Unsupervised Key Frame Extraction for Efficient Video Classification / Tang, Hao; Ding, Lei; Wu, Songsong; Ren, Bin; Sebe, Nicu; Rota, Paolo. - In: ACM TRANSACTIONS ON MULTIMEDIA COMPUTING, COMMUNICATIONS AND APPLICATIONS. - ISSN 1551-6857. - 19:3(2023), pp. 11901-11917. [10.1145/3571735]

2023-01-01

Date of publication: 2023
Journal title: ACM TRANSACTIONS ON MULTIMEDIA COMPUTING, COMMUNICATIONS AND APPLICATIONS
Issue number and part: 3
DOI: https://dx.doi.org/10.1145/3571735
Scopus identifier: 2-s2.0-85205345621
WOS identifier: WOS:001011930300019
All authors: Tang, Hao; Ding, Lei; Wu, Songsong; Ren, Bin; Sebe, Nicu; Rota, Paolo
Citation: Deep Unsupervised Key Frame Extraction for Efficient Video Classification / Tang, Hao; Ding, Lei; Wu, Songsong; Ren, Bin; Sebe, Nicu; Rota, Paolo. - In: ACM TRANSACTIONS ON MULTIMEDIA COMPUTING, COMMUNICATIONS AND APPLICATIONS. - ISSN 1551-6857. - 19:3(2023), pp. 11901-11917. [10.1145/3571735]
Appears in types: 03.1 Journal article

Files in this record:

File	Access	Type	License	Size	Format
2211.06742.pdf	Open access	Non-refereed preprint	All rights reserved	4.54 MB	Adobe PDF
3571735.pdf	Archive managers only	Publisher's layout	All rights reserved	8.94 MB	Adobe PDF