Sacilotti, André; Felipe Dos Santos, Samuel; Sebe, Nicu; Almeida, Jurandy. Transferable-Guided Attention Is All You Need for Video Domain Adaptation. In: 2025 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV 2025), USA, 2025, pp. 8691-8701. [10.1109/WACV61041.2025.00842]
Transferable-Guided Attention Is All You Need for Video Domain Adaptation
Nicu Sebe
2025-01-01
Abstract
Unsupervised domain adaptation (UDA) in videos is a challenging task that remains underexplored compared with image-based UDA techniques. Although vision transformers (ViT) achieve state-of-the-art performance in many computer vision tasks, their use in video UDA has received little attention. Our key idea is to use transformer layers as a feature encoder and to incorporate spatial and temporal transferability relationships into the attention mechanism. We then develop the Transferable-guided Attention (TransferAttn) framework, which exploits the capacity of the transformer to adapt cross-domain knowledge across different backbones. To improve the transferability of ViT, we introduce a novel and effective module, the Domain Transferable-guided Attention Block (DTAB). DTAB compels ViT to focus on the spatio-temporal transferability relationship among video frames by replacing the self-attention mechanism with a transferability attention mechanism. Extensive experiments on the UCF-HMDB, Kinetics-Gameplay, and Kinetics-NEC Drone datasets, with different backbones such as ResNet101, I3D, and STAM, verify the effectiveness of TransferAttn compared with state-of-the-art approaches. We also demonstrate that DTAB yields performance gains when applied to other state-of-the-art transformer-based UDA methods from both the video and image domains. Our code is available at https://github.com/Andre-Sacilotti/transferattn-project-code.
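The abstract describes DTAB only at a high level (standard self-attention over frame tokens replaced by a transferability-guided attention). As an illustration only, the minimal PyTorch sketch below shows one plausible way such a block could be wired: the class name, the per-frame domain discriminator, and the entropy-based transferability weighting are assumptions made for this sketch, not the authors' actual DTAB implementation (see the linked repository for that).

```python
# Illustrative sketch of a "transferability-guided" attention block.
# NOTE: not the authors' DTAB; names, shapes, and the entropy-based
# transferability weighting are assumptions for illustration only.
import torch
import torch.nn as nn


class TransferabilityGuidedAttention(nn.Module):
    """Self-attention over frame tokens whose attention weights are re-scaled
    by a per-frame transferability score (here: the entropy of a small
    frame-level domain discriminator, a common proxy for transferability)."""

    def __init__(self, dim: int, num_heads: int = 8):
        super().__init__()
        assert dim % num_heads == 0
        self.num_heads = num_heads
        self.head_dim = dim // num_heads
        self.qkv = nn.Linear(dim, dim * 3)
        self.proj = nn.Linear(dim, dim)
        # Tiny domain discriminator acting on each frame token (assumption).
        self.domain_disc = nn.Sequential(
            nn.Linear(dim, dim // 2), nn.ReLU(), nn.Linear(dim // 2, 1)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, num_frames, dim) -- one token per video frame.
        b, t, d = x.shape
        qkv = self.qkv(x).reshape(b, t, 3, self.num_heads, self.head_dim)
        q, k, v = qkv.permute(2, 0, 3, 1, 4)  # each: (b, heads, t, head_dim)

        # Per-frame transferability: domain predictions near 0.5 (high entropy)
        # are treated as more domain-invariant, hence more transferable.
        p = torch.sigmoid(self.domain_disc(x)).squeeze(-1)  # (b, t)
        transferability = -(p * p.clamp_min(1e-6).log()
                            + (1 - p) * (1 - p).clamp_min(1e-6).log())

        attn = (q @ k.transpose(-2, -1)) / self.head_dim ** 0.5  # (b, heads, t, t)
        attn = attn.softmax(dim=-1)
        # Re-weight attention toward transferable key frames and renormalize.
        attn = attn * transferability[:, None, None, :]
        attn = attn / attn.sum(dim=-1, keepdim=True).clamp_min(1e-6)

        out = (attn @ v).transpose(1, 2).reshape(b, t, d)
        return self.proj(out)


# Example usage: 8 frame tokens of dimension 512 (e.g., pooled backbone features).
block = TransferabilityGuidedAttention(dim=512, num_heads=8)
frames = torch.randn(4, 8, 512)
out = block(frames)  # (4, 8, 512)
```

The key design point conveyed by the sketch is that the attention distribution itself, rather than only the loss, is biased toward frames judged to be domain-transferable; how the paper actually measures and injects transferability may differ.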
| File | Access | Type | License | Size | Format | |
|---|---|---|---|---|---|---|
| Sacilotti_Transferable-Guided_Attention_is_All_You_Need_for_Video_Domain_Adaptation_WACV_2025_paper.pdf | Open access | Refereed post-print (author's manuscript) | Other type of license | 1.99 MB | Adobe PDF | View/Open |
| Transferable-Guided_Attention_Is_All_You_Need_for_Video_Domain_Adaptation.pdf | Archive administrators only | Publisher's layout | All rights reserved | 1.77 MB | Adobe PDF | View/Open |
Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.



