Unsupervised domain adaptation (UDA) in videos is a challenging task that remains underexplored compared to image-based UDA. Although vision transformers (ViT) achieve state-of-the-art performance in many computer vision tasks, their use in video UDA has received little attention. Our key idea is to use transformer layers as a feature encoder and to incorporate spatial and temporal transferability relationships into the attention mechanism. We develop a Transferable-guided Attention (TransferAttn) framework that exploits the capacity of the transformer to adapt cross-domain knowledge across different backbones. To improve the transferability of ViT, we introduce a novel module, the Domain Transferable-guided Attention Block (DTAB), which compels ViT to focus on the spatio-temporal transferability relationships among video frames by replacing the self-attention mechanism with a transferability attention mechanism. Extensive experiments were conducted on the UCF-HMDB, Kinetics-Gameplay, and Kinetics-NEC Drone datasets, with different backbones such as ResNet101, I3D, and STAM, to verify the effectiveness of TransferAttn compared with state-of-the-art approaches. We also demonstrate that DTAB yields performance gains when applied to other state-of-the-art transformer-based UDA methods from both the video and image domains. Our code is available at https://github.com/Andre-Sacilotti/transferattn-project-code.

Transferable-Guided Attention Is All You Need for Video Domain Adaptation / Sacilotti, André; Felipe Dos Santos, Samuel; Sebe, Nicu; Almeida, Jurandy. - (2025), pp. 8691-8701. (2025 IEEE/CVF Winter Conference on Applications of Computer Vision, WACV 2025, USA, 2025) [10.1109/WACV61041.2025.00842].

Transferable-Guided Attention Is All You Need for Video Domain Adaptation

Nicu Sebe
2025-01-01

Year: 2025
Published in: Proceedings - 2025 IEEE Winter Conference on Applications of Computer Vision, WACV 2025
Place of publication: New York
Publisher: Institute of Electrical and Electronics Engineers Inc.
ISBN: 9798331510831
Authors: Sacilotti, André; Felipe Dos Santos, Samuel; Sebe, Nicu; Almeida, Jurandy
Files in this item:

Sacilotti_Transferable-Guided_Attention_is_All_You_Need_for_Video_Domain_Adaptation_WACV_2025_paper.pdf

Access: Open access
Type: Refereed author's manuscript (post-print)
License: Other type of license
Size: 1.99 MB
Format: Adobe PDF

Transferable-Guided_Attention_Is_All_You_Need_for_Video_Domain_Adaptation.pdf

Access: Archive managers only
Type: Publisher's layout
License: All rights reserved
Size: 1.77 MB
Format: Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11572/453793