Open-set Unsupervised Video Domain Adaptation (OU-VDA) deals with the task of adapting an action recognition model from a labelled source domain to an unlabelled target domain that contains "target-private" categories, which are present in the target but absent in the source. In this work we deviate from the prior work of training a specialized open-set classifier or weighted adversarial learning by proposing to use pre-trained Language and Vision Models (CLIP). The CLIP is well suited for OUVDA due to its rich representation and the zero-shot recognition capabilities. However, rejecting target-private instances with the CLIP's zero-shot protocol requires oracle knowledge about the target-private label names. To circumvent the impossibility of the knowledge of label names, we propose AutoLabel that automatically discovers and generates object-centric compositional candidate target-private class names. Despite its simplicity, we show that CLIP when equipped with AutoLabel can satisfactorily reject the target-private instances, thereby facilitating better alignment between the shared classes of the two domains. The code is available(1).
AutoLabel: CLIP-based framework for Open-set Video Domain Adaptation / Zara, G; Roy, S; Rota, P; Ricci, E. - (2023), pp. 11504-11513. (Intervento presentato al convegno CVPR tenutosi a Vancouver BC, Canada nel 17-24 June 2023) [10.1109/CVPR52729.2023.01107].
AutoLabel: CLIP-based framework for Open-set Video Domain Adaptation
Zara, G;Roy, S;Rota, P
;Ricci, E
2023-01-01
Abstract
Open-set Unsupervised Video Domain Adaptation (OU-VDA) deals with the task of adapting an action recognition model from a labelled source domain to an unlabelled target domain that contains "target-private" categories, which are present in the target but absent in the source. In this work we deviate from the prior work of training a specialized open-set classifier or weighted adversarial learning by proposing to use pre-trained Language and Vision Models (CLIP). The CLIP is well suited for OUVDA due to its rich representation and the zero-shot recognition capabilities. However, rejecting target-private instances with the CLIP's zero-shot protocol requires oracle knowledge about the target-private label names. To circumvent the impossibility of the knowledge of label names, we propose AutoLabel that automatically discovers and generates object-centric compositional candidate target-private class names. Despite its simplicity, we show that CLIP when equipped with AutoLabel can satisfactorily reject the target-private instances, thereby facilitating better alignment between the shared classes of the two domains. The code is available(1).File | Dimensione | Formato | |
---|---|---|---|
Zara_AutoLabel_CLIP-Based_Framework_for_Open-Set_Video_Domain_Adaptation_CVPR_2023_paper.pdf
accesso aperto
Descrizione: CVPR Open Access version
Tipologia:
Post-print referato (Refereed author’s manuscript)
Licenza:
Tutti i diritti riservati (All rights reserved)
Dimensione
1.58 MB
Formato
Adobe PDF
|
1.58 MB | Adobe PDF | Visualizza/Apri |
AutoLabel_CLIP-based_framework_for_Open-Set_Video_Domain_Adaptation.pdf
Solo gestori archivio
Tipologia:
Versione editoriale (Publisher’s layout)
Licenza:
Tutti i diritti riservati (All rights reserved)
Dimensione
529.92 kB
Formato
Adobe PDF
|
529.92 kB | Adobe PDF | Visualizza/Apri |
I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione