Beyond Bag-of-Words: Fast video classification with Fisher Kernel Vector of Locally Aggregated Descriptors / Mironica, Ionut; Duta, Ionut Cosmin; Ionescu, Bogdan; Sebe, Niculae. - (2015), pp. 1-6. (Paper presented at the ICME 2015 conference, held in Torino, 29 June - 3 July 2015) [10.1109/ICME.2015.7177489].

Beyond Bag-of-Words: Fast video classification with Fisher Kernel Vector of Locally Aggregated Descriptors

Duta, Ionut Cosmin; Sebe, Niculae
2015-01-01

Abstract

In this paper we introduce a new video description framework that replaces the traditional Bag-of-Words with a combination of Fisher Kernels (FK) and the Vector of Locally Aggregated Descriptors (VLAD). The main contributions are: (i) a fast algorithm for densely extracting global frame features, which are easier and faster to compute than spatio-temporal local features; (ii) a Random Forest approach that replaces the traditional k-means based vocabulary and allows a significant speedup; (iii) a modified VLAD and FK representation that replaces the classic Bag-of-Words and obtains better performance. We show that the framework is highly general and does not depend on a particular type of descriptor. It achieves state-of-the-art results in several classification scenarios.
Year: 2015
Conference: 2015 IEEE International Conference on Multimedia and Expo (ICME)
Place of publication: Piscataway, NJ
Publisher: IEEE
ISBN: 978-1-4799-7082-7
Authors: Mironica, Ionut; Duta, Ionut Cosmin; Ionescu, Bogdan; Sebe, Niculae
Files in this item:
File: Mironica-ICME15.pdf
Access: Archive administrators only
Type: Publisher's version (Publisher's layout)
License: All rights reserved
Size: 113.79 kB
Format: Adobe PDF

Use this identifier to cite or link to this document: https://hdl.handle.net/11572/115059