In an effort to reduce annotation costs in action recognition, unsupervised video domain adaptation methods have been proposed that aim to adapt a predictive model from a labelled dataset (i.e., source domain) to an unlabelled dataset (i.e., target domain). In this work we address a more realistic scenario, called open-set video domain adaptation (OUVDA), where the target dataset contains "unknown" semantic categories that are not shared with the source. The challenge lies in aligning the shared classes of the two domains while separating the shared classes from the unknown ones. We propose to address OUVDA with a unified contrastive learning framework that learns discriminative and well-clustered features. We also propose a video-oriented temporal contrastive loss that enables our method to better cluster the feature space by exploiting the freely available temporal information in video data. We show that a discriminative feature space facilitates better separation of the unknown classes, and thereby allows us to use a simple similarity-based score to identify them. We conduct a thorough experimental evaluation on multiple OUVDA benchmarks and show the effectiveness of our proposed method against the prior art.

Simplifying open-set video domain adaptation with contrastive learning / Zara, Giacomo; Turrisi da Costa, Victor Guilherme; Roy, Subhankar; Rota, Paolo; Ricci, Elisa. - In: COMPUTER VISION AND IMAGE UNDERSTANDING. - ISSN 1077-3142. - 241:103953(2024). [10.1016/j.cviu.2024.103953]

Simplifying open-set video domain adaptation with contrastive learning

Zara, Giacomo; Turrisi da Costa, Victor Guilherme; Roy, Subhankar; Rota, Paolo; Ricci, Elisa
2024-01-01

Files in this record:

2301.03322v1.pdf
Access: archive managers only
Type: Refereed author's manuscript (post-print)
License: All rights reserved
Size: 1.57 MB
Format: Adobe PDF

1-s2.0-S1077314224000341-main.pdf
Access: archive managers only
Type: Publisher's layout (published version)
License: All rights reserved
Size: 1.15 MB
Format: Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11572/437441