Simple, efficient and effective encodings of local deep features for video action recognition

Duta, Ionut C.; Ionescu, Bogdan; Aizawa, Kiyoharu; Sebe, Nicu

doi:10.1145/3078971.3078988

In an action recognition system, feature encoding is a decisive component: it builds the final representation that serves as input to a classifier. A shortcoming of existing encoding approaches is that they are built around hand-crafted features and are not highly competitive at encoding current deep features, which are necessary in many practical scenarios. In this work we propose two solutions specifically designed for encoding local deep features; they take advantage of the nature of deep networks by focusing on capturing the highest feature response of the convolutional maps. The proposed approaches provide a way to encapsulate the features extracted with a convolutional neural network over the entire video. In terms of accuracy, our encodings outperform the most widely used and powerful current encoding approaches by a large margin, while being extremely efficient in computational cost. Evaluated in...
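The abstract does not spell out the two proposed encodings, so the following is only a minimal sketch of the general idea it names: keeping the highest response of each convolutional map over the entire video. It assumes the local deep features are available as an array of shape (frames, height, width, channels). The function name `max_response_encoding` and the final L2 normalization are illustrative assumptions, not details taken from the paper.

```python
import numpy as np


def max_response_encoding(conv_maps: np.ndarray) -> np.ndarray:
    """Encode a video as the per-channel maximum of its convolutional maps.

    conv_maps: array of shape (T, H, W, C), i.e. T frames, H x W spatial
    positions, C channels (assumed layout, hypothetical helper).
    Returns a vector of length C.
    """
    if conv_maps.ndim != 4:
        raise ValueError("expected an array of shape (T, H, W, C)")

    channels = conv_maps.shape[-1]
    # Treat every spatial position of every frame as one local deep feature.
    local_features = conv_maps.reshape(-1, channels)
    # Keep the highest response of each channel over the whole video.
    encoding = local_features.max(axis=0)
    # L2 normalization is an assumption added here for classifier input.
    norm = np.linalg.norm(encoding)
    return encoding / norm if norm > 0 else encoding


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    video_maps = rng.random((16, 7, 7, 512))  # synthetic stand-in for CNN output
    print(max_response_encoding(video_maps).shape)
```

The resulting vector has one entry per channel regardless of video length, so a sketch like this would produce a fixed-size input for a classifier at the cost of a single pass over the features.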

Simple, efficient and effective encodings of local deep features for video action recognition / Duta, I.C., Ionescu, B., Aizawa, K., Sebe, N.. - (2017), pp. 218-225. (17th ACM International Conference on Multimedia Retrieval, ICMR 2017 Bucharest 2017) [10.1145/3078971.3078988].

2017-01-01

Date of publication: 2017
Proceedings title: ICMR 2017 - Proceedings of the 2017 ACM International Conference on Multimedia Retrieval
Book author/s: Duta, Ionut C.
Place of publication: New York
Publisher: Association for Computing Machinery, Inc
ISBN: 9781450347013
Scopus identifier: 2-s2.0-85021818671
WOS identifier: WOS:000610413000033
Appears in types: 04.1 Paper in Proceedings

Files in this item:

File	Size	Format
p218-duta.pdf (access: archive managers only; type: publisher's layout; license: all rights reserved)	1.23 MB	Adobe PDF