AutoLabel: CLIP-based framework for Open-set Video Domain Adaptation


Open-set Unsupervised Video Domain Adaptation (OUVDA) is the task of adapting an action recognition model from a labelled source domain to an unlabelled target domain that contains "target-private" categories, which are present in the target but absent from the source. In this work we depart from prior approaches, which train a specialized open-set classifier or rely on weighted adversarial learning, and instead propose to use pre-trained language and vision models (CLIP). CLIP is well suited to OUVDA because of its rich representation and its zero-shot recognition capabilities. However, rejecting target-private instances with CLIP's zero-shot protocol requires oracle knowledge of the target-private label names. Because these label names cannot be known in advance, we propose AutoLabel, which automatically discovers and generates object-centric compositional candidate target-private class names. Despite its simplicity, we show that CLIP equipped with AutoLabel can satisfactorily reject target-private instances, thereby facilitating better alignment between the shared classes of the two domains. The code is available.
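The rejection idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes that video and label-name embeddings are already available (here replaced by hypothetical 3-dimensional toy vectors rather than real CLIP features), and it assumes a simple rule in which a video is rejected as target-private when its most similar label name is one of the candidate target-private names. The function names `cosine` and `classify_or_reject` are illustrative only.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def classify_or_reject(video_emb, shared_text_embs, candidate_text_embs):
    """Return the index of the best-matching shared class, or -1 to reject.

    Assumed rule: the video is rejected as target-private when its most
    similar label embedding belongs to the candidate target-private names
    rather than to the shared class names.
    """
    all_embs = list(shared_text_embs) + list(candidate_text_embs)
    sims = [cosine(video_emb, t) for t in all_embs]
    best = max(range(len(sims)), key=sims.__getitem__)
    return best if best < len(shared_text_embs) else -1


# Hypothetical toy embeddings standing in for CLIP text features.
shared = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]  # two shared class names
candidates = [[0.0, 0.0, 1.0]]               # one candidate target-private name

print(classify_or_reject([0.9, 0.1, 0.0], shared, candidates))   # prints 0
print(classify_or_reject([0.1, 0.0, 0.95], shared, candidates))  # prints -1
```

The first toy video is closest to the first shared class name and is assigned to it; the second is closest to the candidate name and is rejected. Without candidate names, every video would be forced into one of the shared classes, which is why the zero-shot protocol needs some source of target-private label names.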

AutoLabel: CLIP-based framework for Open-set Video Domain Adaptation / Zara, G; Roy, S; Rota, P; Ricci, E. - (2023), pp. 11504-11513. (Paper presented at the CVPR conference held in Vancouver, BC, Canada, 17-24 June 2023) [10.1109/CVPR52729.2023.01107].




Date of publication: 2023
Proceedings title: 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)
Place of publication: 10662 LOS VAQUEROS CIRCLE, PO BOX 3014, LOS ALAMITOS, CA 90720-1264 USA
Publisher: IEEE COMPUTER SOC
ISBN: 979-8-3503-0129-8
WOS identifier: WOS:001062522103078
All authors: Zara, G; Roy, S; Rota, P; Ricci, E
Publication type: 04.1 Paper in Proceedings

Files in this record:

File	Access	Description	Type	License	Size	Format
Zara_AutoLabel_CLIP-Based_Framework_for_Open-Set_Video_Domain_Adaptation_CVPR_2023_paper.pdf	Open access	CVPR Open Access version	Refereed author's manuscript (post-print)	All rights reserved	1.58 MB	Adobe PDF
AutoLabel_CLIP-based_framework_for_Open-Set_Video_Domain_Adaptation.pdf	Archive managers only		Publisher's layout	All rights reserved	529.92 kB	Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11572/400408
