Video Classification with Densely Extracted HOG/HOF/MBH Features: An Evaluation of the Accuracy/Computational Efficiency Trade-off / Uijlings, J.; Duta, Ionut Cosmin; Sangineto, Enver; Sebe, Niculae. - In: INTERNATIONAL JOURNAL OF MULTIMEDIA INFORMATION RETRIEVAL. - ISSN 2192-6611. - 4:(2015), pp. 33-44. [10.1007/s13735-014-0069-5]
Video Classification with Densely Extracted HOG/HOF/MBH Features: An Evaluation of the Accuracy/Computational Efficiency Trade-off
Duta, Ionut Cosmin; Sangineto, Enver; Sebe, Niculae
2015-01-01
Abstract
The current state-of-the-art in video classification is based on Bag-of-Words using local visual descriptors. Most commonly these are histogram of oriented gradients (HOG), histogram of optical flow (HOF) and motion boundary histograms (MBH) descriptors. While this approach is very powerful for classification, it is also computationally expensive. This paper addresses the problem of computational efficiency. Specifically: (1) We propose several speed-ups for densely sampled HOG, HOF and MBH descriptors and release Matlab code; (2) We investigate the trade-off between accuracy and computational efficiency of descriptors in terms of frame sampling rate and the type of optical flow method; (3) We investigate the trade-off between accuracy and computational efficiency for computing the feature vocabulary, using and comparing most of the commonly adopted vector quantization techniques: k-means, hierarchical k-means, Random Forests, Fisher Vectors and VLAD.

File | Access | Description | Type | License | Size | Format |
---|---|---|---|---|---|---|
realtimeVideoClassificationIJMIR2014.pdf | Open access | | Non-refereed preprint | All rights reserved | 495.96 kB | Adobe PDF |
Uijlings2015_Article_VideoClassificationWithDensely.pdf | Archive managers only | Publisher's version as included in the journal | Publisher's layout | All rights reserved | 834.94 kB | Adobe PDF |
Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.