CV-C3D: Action recognition on compressed videos with convolutional 3D networks / Dos Santos, S. F.; Sebe, N.; Almeida, J. - (2019), pp. 24-30. (Paper presented at the 32nd SIBGRAPI Conference on Graphics, Patterns and Images, SIBGRAPI 2019, held at CINCO - Riocentro Convention and Event Center, Brazil, in 2019) [10.1109/SIBGRAPI.2019.00012].

CV-C3D: Action recognition on compressed videos with convolutional 3D networks

Sebe, N.
2019-01-01

Abstract

Action recognition in videos has gained substantial attention from the computer vision community due to its wide range of possible applications. Recent works have addressed this problem with deep learning methods. The main limitation of existing approaches is their difficulty in learning temporal dynamics, owing to the high computational load of processing the huge amounts of data required to train a model. To overcome this problem, we propose a Compressed Video Convolutional 3D network (CV-C3D). It exploits information from the compressed representation of a video to avoid the high computational cost of fully decoding the video stream. The resulting speedup enables our network to use 3D convolutions to capture temporal context efficiently. Our network has the lowest computational complexity among all the compared approaches. On the task of action recognition, results on two public benchmarks, UCF-101 and HMDB-51, were comparable to the baselines, with the advantage of faster inference.
2019
Proceedings - 32nd Conference on Graphics, Patterns and Images, SIBGRAPI 2019
New York
Institute of Electrical and Electronics Engineers Inc.
978-1-7281-5227-1
Dos Santos, S. F.; Sebe, N.; Almeida, J.
Files in this item:
File: 08919874.pdf
Access: Archive managers only
Type: Publisher's version (publisher's layout)
License: All rights reserved
Size: 318.54 kB
Format: Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11572/250826
Citations
  • PMC: not available
  • Scopus: 12
  • ISI: 11