Unsupervised Domain Adaptation for Video Transformers in Action Recognition

da Costa, Victor G. Turrisi; Zara, Giacomo; Rota, Paolo; Oliveira-Santos, Thiago; Sebe, Nicu; Murino, Vittorio; Ricci, Elisa

doi:10.1109/ICPR56361.2022.9956679

Over the last few years, Unsupervised Domain Adaptation (UDA) techniques have acquired remarkable importance and popularity in computer vision. However, compared to the extensive literature available for images, video remains relatively unexplored, even though the performance of action recognition models is heavily affected by domain shift. In this paper, we propose a simple and novel UDA approach for video action recognition. Our approach leverages recent advances in spatio-temporal transformers to build a robust source model that better generalises to the target domain. Furthermore, our architecture learns domain-invariant features thanks to a novel alignment loss term derived from the Information Bottleneck principle. We report results on two video action recognition benchmarks for UDA, showing state-of-the-art performance on HMDB ↔ UCF as well as on the more challenging Kinetics→NEC-Drone. This demonstrates the effectiveness of our method in handling different levels of domain shift. The source code is available at https://github.com/vturrisi/UDAVT.

Unsupervised Domain Adaptation for Video Transformers in Action Recognition / da Costa, Victor G. Turrisi; Zara, Giacomo; Rota, Paolo; Oliveira-Santos, Thiago; Sebe, Nicu; Murino, Vittorio; Ricci, Elisa. - 2022-:(2022), pp. 1258-1265. (Paper presented at the 26th International Conference on Pattern Recognition, ICPR 2022, held at the Palais des Congres de Montreal, Canada, in 2022) [10.1109/ICPR56361.2022.9956679].

2022-01-01

Date of publication: 2022
Proceedings title: International Conference on Pattern Recognition
Place of publication: 345 E 47TH ST, NEW YORK, NY 10017 USA
Publisher: IEEE
ISBN: 978-1-6654-9062-7
Scopus identifier: 2-s2.0-85143640146
WOS identifier: WOS:000897707601037
			
Appears in types: 04.1 Paper in Proceedings

Files in this product:

File	Size	Format
Unsupervised_Domain_Adaptation_for_Video_Transformers_in_Action_Recognition.pdf (access: archive managers only; type: publisher's layout; license: all rights reserved)	2.52 MB	Adobe PDF