Simple, efficient and effective encodings of local deep features for video action recognition / Duta, Ionut C.; Ionescu, Bogdan; Aizawa, Kiyoharu; Sebe, Nicu. - (2017), pp. 218-225. (17th ACM International Conference on Multimedia Retrieval, ICMR 2017, Bucharest, 2017) [10.1145/3078971.3078988].

Simple, efficient and effective encodings of local deep features for video action recognition

Duta, Ionut C.; Sebe, Nicu
2017-01-01

Abstract

In an action recognition system, feature encoding is a decisive component: it builds the final representation that serves as input to the classifier. A shortcoming of existing encoding approaches is that they were built around hand-crafted features and are not highly competitive at encoding current deep features, which is necessary in many practical scenarios. In this work we propose two solutions designed specifically for encoding local deep features. They take advantage of the nature of deep networks by focusing on capturing the highest feature responses of the convolutional maps. The proposed approaches provide a way to encapsulate the features extracted with a convolutional neural network over the entire video. In terms of accuracy, our encodings outperform by a large margin the most widely used and powerful current encoding approaches, while being extremely efficient in computational cost. Evaluated in...
2017
ICMR 2017 - Proceedings of the 2017 ACM International Conference on Multimedia Retrieval
Duta, Ionut C.
New York
Association for Computing Machinery, Inc
9781450347013
Duta, Ionut C.; Ionescu, Bogdan; Aizawa, Kiyoharu; Sebe, Nicu
Files in this item:
File: p218-duta.pdf
Accessible to archive managers only
Type: Publisher's version (Publisher's layout)
License: All rights reserved
Size: 1.23 MB
Format: Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11572/193356