The Unreasonable Effectiveness of Large Language-Vision Models for Source-free Video Domain Adaptation

Zara, Giacomo; Conti, Alessandro; Roy, Subhankar; Lathuilière, Stéphane; Rota, Paolo; Ricci, Elisa
2023-01-01

Abstract

The Source-Free Video Unsupervised Domain Adaptation (SFVUDA) task consists of adapting an action recognition model, trained on a labelled source dataset, to an unlabelled target dataset, without accessing the actual source data. Previous approaches have attempted to address SFVUDA by leveraging self-supervision (e.g., enforcing temporal consistency) derived from the target data itself. In this work, we take an orthogonal approach by exploiting "web-supervision" from Large Language-Vision Models (LLVMs), driven by the rationale that LLVMs contain a rich world prior that is surprisingly robust to domain shift. We showcase the unreasonable effectiveness of integrating LLVMs for SFVUDA by devising an intuitive and parameter-efficient method, which we name Domain Adaptation with Large Language-Vision models (DALL-V), that distills the world prior and complementary source model information into a student network tailored for the target domain. Despite its simplicity, DALL-V achieves significant improvement over state-of-the-art SFVUDA methods.
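The abstract describes DALL-V as distilling the world prior of a large language-vision model, together with complementary source model information, into a student network tailored for the target. The following is a minimal PyTorch sketch of that ensemble-distillation idea only; the module names (student, source_model, llvm_zero_shot), the equal teacher averaging, and the temperature value are illustrative assumptions, not details taken from the paper.

import torch
import torch.nn.functional as F

def distillation_step(student, source_model, llvm_zero_shot, clips, tau=2.0):
    """One training step on an unlabelled target batch `clips`.

    `source_model` is the frozen model trained on the source domain;
    `llvm_zero_shot` is a frozen language-vision model (e.g., CLIP) queried
    with the class names; `student` is the target-tailored network.
    All names and the 0.5/0.5 ensemble weighting are hypothetical.
    """
    with torch.no_grad():
        # Teacher signals: the source model's predictions and the LLVM's
        # zero-shot "web-supervised" prior, softened by temperature tau.
        p_source = F.softmax(source_model(clips) / tau, dim=-1)
        p_llvm = F.softmax(llvm_zero_shot(clips) / tau, dim=-1)
        p_teacher = 0.5 * (p_source + p_llvm)  # simple average ensemble

    log_p_student = F.log_softmax(student(clips) / tau, dim=-1)
    # Standard temperature-scaled KL distillation loss pulling the student
    # toward the ensembled teacher distribution.
    loss = F.kl_div(log_p_student, p_teacher, reduction="batchmean") * tau**2
    loss.backward()
    return loss.item()

The KL term above follows standard knowledge distillation; the paper's actual objective and its parameter-efficient adaptation design may differ.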
2023
2023 IEEE/CVF International Conference on Computer Vision (ICCV)
Piscataway, NJ, USA
IEEE Computer Society
979-8-3503-0718-4
Zara, Giacomo; Conti, Alessandro; Roy, Subhankar; Lathuilière, Stéphane; Rota, Paolo; Ricci, Elisa
The Unreasonable Effectiveness of Large Language-Vision Models for Source-free Video Domain Adaptation / Zara, Giacomo; Conti, Alessandro; Roy, Subhankar; Lathuilière, Stéphane; Rota, Paolo; Ricci, Elisa. - (2023), pp. 10273-10283. (Paper presented at ICCV 2023, held in Paris, France, 01-06 October 2023) [10.1109/ICCV51070.2023.00946].
Files in this item:

File: Zara_The_Unreasonable_Effectiveness_of_Large_Language-Vision_Models_for_Source-Free_Video_ICCV_2023_paper.pdf
Access: Open access
Description: This ICCV paper is the Open Access version, provided by the Computer Vision Foundation. Except for the watermark, it is identical to the accepted version.
Type: Refereed author's manuscript (post-print)
License: All rights reserved
Size: 753.02 kB
Format: Adobe PDF

File: The_Unreasonable_Effectiveness_of_Large_Language-Vision_Models_for_Source-free_Video_Domain_Adaptation.pdf
Access: Archive administrators only
Type: Publisher's version (publisher's layout)
License: All rights reserved
Size: 1.28 MB
Format: Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this item: https://hdl.handle.net/11572/400792