Spatio-temporal VLAD encoding for human action recognition in videos

Duta, Ionut C.; Ionescu, Bogdan; Aizawa, Kiyoharu; Sebe, Nicu

doi:10.1007/978-3-319-51811-4_30

Encoding is one of the key factors in building an effective video representation. In recent work, super vector-based encoding approaches stand out as some of the most powerful representation generators. Vector of Locally Aggregated Descriptors (VLAD) is one of the most widely used super vector methods. However, one limitation of VLAD encoding is that it captures no spatial information from the data. This is critical, especially when dealing with video information. In this work, we propose Spatio-temporal VLAD (ST-VLAD), an extended encoding method which incorporates spatio-temporal information within the encoding process. This is done by dividing the video and extracting specific information over the feature group of each video split. Experimental validation is performed using both hand-crafted and deep features. Our pipeline for action recognition with the proposed encoding method obtains state-of-the-art performance over three challenging datasets: ...
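For readers unfamiliar with the baseline that the abstract builds on, the following is a minimal sketch of standard VLAD encoding: each local descriptor is hard-assigned to its nearest codebook centroid, and the residuals are summed per centroid and concatenated. It does not implement ST-VLAD, whose video division and per-split information are not detailed in the abstract. The function name `vlad_encode`, the toy values, and the final L2 normalization are illustrative assumptions, not taken from the paper.

```python
import math


def vlad_encode(descriptors, centroids):
    """Baseline VLAD: sum residuals to the nearest centroid, then L2-normalize."""
    dim = len(centroids[0])
    # One residual accumulator per codebook centroid.
    residual_sums = [[0.0] * dim for _ in centroids]

    for x in descriptors:
        # Hard-assign the descriptor to its nearest centroid (squared Euclidean distance).
        nearest = min(
            range(len(centroids)),
            key=lambda k: sum((xi - ci) ** 2 for xi, ci in zip(x, centroids[k])),
        )
        # Accumulate the residual x - c_k for that centroid.
        for j in range(dim):
            residual_sums[nearest][j] += x[j] - centroids[nearest][j]

    # Concatenate the per-centroid sums into one vector of length K * D.
    vector = [v for row in residual_sums for v in row]

    # L2 normalization (a common post-processing choice, assumed here).
    norm = math.sqrt(sum(v * v for v in vector))
    return [v / norm for v in vector] if norm > 0 else vector


# Toy example with made-up values: two 2-D centroids and three descriptors.
centroids = [[0.0, 0.0], [10.0, 10.0]]
descriptors = [[1.0, 0.0], [0.0, 1.0], [11.0, 10.0]]
encoding = vlad_encode(descriptors, centroids)
```

Note that the resulting vector depends only on descriptor values, not on where or when in the video each descriptor was extracted, which is the limitation the abstract points to.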

Spatio-temporal VLAD encoding for human action recognition in videos / Duta, Ionut C.; Ionescu, Bogdan; Aizawa, Kiyoharu; Sebe, Nicu. - 10132:(2017), pp. 365-378. (23rd International Conference on MultiMedia Modeling, MMM 2017, Iceland, 2017) [10.1007/978-3-319-51811-4_30].

2017-01-01

Date of publication: 2017
Proceedings title: 23rd International Conference on MultiMedia Modeling, MMM 2017
Place of publication: Heidelberg
Publisher: Springer Verlag
ISBN: 9783319518107
Scopus identifier: 2-s2.0-85009810516
WOS identifier: WOS:000418363200030
Publication type: 04.1 Paper in Proceedings

Files in this record:

File	Size	Format
MMM2017_stVLAD.pdf (Access: archive managers only; Type: Refereed author's manuscript, post-print; License: All rights reserved)	516.78 kB	Adobe PDF