This paper addresses cross-class domain adaptation (CCDA) in semantic segmentation, where the target domain contains both shared classes and novel classes that are either unlabeled or unseen in the source domain. The problem is challenging because the absence of labels for novel classes hampers effective solutions to both the cross-domain and the cross-class problems. Since Visual Language Models (VLMs) generalize well across diverse data distributions and can produce zero-shot predictions without task-specific training examples, we propose a label alignment method that leverages VLMs to relabel pseudo labels for novel classes. Because VLMs typically provide only image-level predictions, we embed a two-stage method to enable fine-grained semantic segmentation and design a threshold based on the uncertainty of pseudo labels to exclude noisy VLM predictions. To further strengthen the supervision of novel classes, we devise memory banks with an adaptive update scheme to manage accurate VLM predictions, which are then resampled to increase the sampling probability of novel classes. Comprehensive experiments demonstrate the effectiveness and versatility of the proposed method across various CCDA scenarios.
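As an illustration only, one possible reading of the uncertainty-based threshold mentioned in the abstract can be sketched as follows. The abstract does not give the actual rule, so everything here is an assumption: the function names `normalized_entropy` and `accept_vlm_label`, the use of normalized Shannon entropy as the uncertainty measure, and the choice of `1 - uncertainty` as the acceptance threshold are hypothetical and are not taken from the paper.

```python
import math
from typing import Sequence


def normalized_entropy(probs: Sequence[float]) -> float:
    """Shannon entropy of a class distribution, scaled to the range [0, 1]."""
    n = len(probs)
    if n < 2:
        return 0.0
    h = -sum(p * math.log(p) for p in probs if p > 0.0)
    return h / math.log(n)


def accept_vlm_label(seg_probs: Sequence[float], vlm_score: float) -> bool:
    """Decide whether a VLM prediction may overwrite a pseudo label.

    Assumption (not from the paper): the threshold equals the segmenter's
    certainty, 1 - normalized entropy. A confident pseudo label is only
    replaced by a very confident VLM prediction, while an uncertain one
    is replaced more readily.
    """
    threshold = 1.0 - normalized_entropy(seg_probs)
    return vlm_score >= threshold


if __name__ == "__main__":
    uncertain = [0.25, 0.25, 0.25, 0.25]   # segmenter has no preference
    confident = [0.97, 0.01, 0.01, 0.01]   # segmenter is nearly certain
    print(accept_vlm_label(uncertain, 0.3))
    print(accept_vlm_label(confident, 0.6))
```

Under these assumptions, a VLM prediction with a modest score is kept when the pseudo label is highly uncertain and discarded when the segmenter is already confident, which is the filtering behavior the abstract describes at a high level.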

Cross-Class Domain Adaptive Semantic Segmentation with Visual Language Models / Ren, W.; Xia, R.; Zheng, M.; Wu, Z.; Tang, Y.; Sebe, N. - (2024), pp. 5005-5014. (32nd ACM International Conference on Multimedia, MM 2024, Australia, 2024) [10.1145/3664647.3681122].

Cross-Class Domain Adaptive Semantic Segmentation with Visual Language Models

Sebe N.
2024-01-01

Year: 2024
Published in: MM '24: Proceedings of the 32nd ACM International Conference on Multimedia
Place of publication: New York
Publisher: Association for Computing Machinery, Inc
ISBN: 9798400706868
Authors: Ren, W.; Xia, R.; Zheng, M.; Wu, Z.; Tang, Y.; Sebe, N.
Files in this item:
File: 3664647.3681122-compressed.pdf
Access: Archive managers only
Type: Publisher's version (Publisher’s layout)
License: All rights reserved
Size: 420.9 kB
Format: Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11572/439450
Citations
  • PMC: not available
  • Scopus: 2
  • ISI: not available
  • OpenAlex: not available