Cross-Class Domain Adaptive Semantic Segmentation with Visual Language Models

Ren, W.; Xia, R.; Zheng, M.; Wu, Z.; Tang, Y.; Sebe, N.

doi:10.1145/3664647.3681122

This paper addresses cross-class domain adaptation (CCDA) in semantic segmentation, where the target domain contains both shared and novel classes that are either unlabeled or unseen in the source domain. The problem is challenging because the absence of labels for novel classes hampers effective solutions to both the cross-domain and the cross-class problems. Since Visual Language Models (VLMs) have exhibited impressive generalization across diverse data distributions and can generate zero-shot predictions without task-specific training examples, we propose a label alignment method that leverages VLMs to relabel pseudo labels for novel classes. Because VLMs typically provide only image-level predictions, we embed a two-stage method to enable fine-grained semantic segmentation and design a threshold based on the uncertainty of pseudo labels to exclude noisy VLM predictions. To further augment the supervision of novel classes, we devise memory banks with an adaptive update scheme to manage accurate VLM predictions, which are then resampled to increase the sampling probability of novel classes. Through comprehensive experiments, we demonstrate the effectiveness and versatility of the proposed method across various CCDA scenarios.
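The two mechanisms named in the abstract, an uncertainty-based threshold on VLM predictions and a resampled memory bank for novel classes, can be illustrated with a minimal Python sketch. The abstract does not give the formulas, so everything concrete here is an assumption made for illustration: the threshold is taken to be one minus the normalized entropy of the pseudo label, the "adaptive update" is approximated by keeping the most confident entries per class, and the names `normalized_entropy`, `relabel`, and `MemoryBank` are hypothetical, not the authors' code.

```python
import math
import random


def normalized_entropy(probs):
    """Entropy of a probability vector, scaled to the range [0, 1]."""
    h = -sum(p * math.log(p) for p in probs if p > 0)
    return h / math.log(len(probs))


def relabel(pseudo_probs, shared_classes, vlm_label, vlm_conf, novel_classes):
    """Decide whether a VLM prediction overrides a region's pseudo label.

    Assumed rule (not from the paper): the acceptance threshold is
    1 - normalized entropy, so a confident pseudo label demands a very
    confident VLM prediction, while an uncertain one is easier to override.
    Returns (label, accepted).
    """
    best = max(range(len(pseudo_probs)), key=lambda i: pseudo_probs[i])
    tau = 1.0 - normalized_entropy(pseudo_probs)
    if vlm_label in novel_classes and vlm_conf >= tau:
        return vlm_label, True
    return shared_classes[best], False


class MemoryBank:
    """Per-class store of accepted VLM predictions (illustrative only)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.items = {}  # class name -> list of (confidence, sample_id)

    def update(self, label, conf, sample_id):
        # Assumed update scheme: keep the `capacity` most confident entries.
        bank = self.items.setdefault(label, [])
        bank.append((conf, sample_id))
        bank.sort(reverse=True)
        del bank[self.capacity:]

    def sample(self, rng):
        # Pick a class uniformly first, so rare novel classes are drawn
        # as often as frequent ones, then pick one stored entry.
        label = rng.choice(sorted(self.items))
        _, sample_id = rng.choice(self.items[label])
        return label, sample_id


shared = ["road", "car", "train"]
novel = {"truck"}
bank = MemoryBank(capacity=2)

# An uncertain pseudo label is overridden; a confident one is kept.
uncertain = relabel([0.4, 0.3, 0.3], shared, "truck", 0.6, novel)
confident = relabel([0.98, 0.01, 0.01], shared, "truck", 0.6, novel)

for conf, sample_id in [(0.6, "a"), (0.9, "b"), (0.7, "c")]:
    bank.update("truck", conf, sample_id)
drawn = bank.sample(random.Random(0))
```

In this sketch, class-uniform sampling is one simple way to raise the sampling probability of novel classes; the paper's actual resampling strategy may differ.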

Cross-Class Domain Adaptive Semantic Segmentation with Visual Language Models / Ren, W.; Xia, R.; Zheng, M.; Wu, Z.; Tang, Y.; Sebe, N. - (2024), pp. 5005-5014. (32nd ACM International Conference on Multimedia, MM 2024, Australia, 2024) [10.1145/3664647.3681122].

2024-01-01

Date of publication: 2024
Proceedings title: MM '24: Proceedings of the 32nd ACM International Conference on Multimedia
Place of publication: New York
Publisher: Association for Computing Machinery, Inc
ISBN: 9798400706868
Scopus identifier: 2-s2.0-85209778051
All authors: Ren, W.; Xia, R.; Zheng, M.; Wu, Z.; Tang, Y.; Sebe, N.
Document type: 04.1 Paper in Proceedings

Files in this record:

File	Size	Format
3664647.3681122-compressed.pdf (type: publisher's layout; license: all rights reserved; access: archive managers only)	420.9 kB	Adobe PDF