Open-vocabulary object detection models allow users to freely specify a class vocabulary in natural language at test time, guiding the detection of desired objects. However, vocabularies can be overly broad or even mis-specified, hampering the overall performance of the detector. In this work, we propose a plug-and-play Vocabulary Adapter (VocAda) to refine the user-defined vocabulary, automatically tailoring it to the categories that are relevant for a given image. VocAda requires no training and operates at inference time in three steps: i) it uses an image captioner to describe visible objects, ii) it parses nouns from those captions, and iii) it selects relevant classes from the user-defined vocabulary, discarding irrelevant ones. Experiments on COCO and Objects365 with three state-of-the-art detectors show that VocAda consistently improves performance, proving its versatility. The code is open source.
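The three inference-time steps described in the abstract can be sketched as follows. This is a minimal illustration, not the paper's implementation: the captioner here returns canned captions, and the noun parser matches against a tiny hand-made lexicon, standing in for the captioning model and POS tagger a real system would use.

```python
# Hedged sketch of VocAda's three inference-time steps:
# i) caption the image, ii) parse nouns, iii) filter the user vocabulary.
# All components below are illustrative stand-ins, not the paper's modules.

def caption_image(image):
    # Stand-in for an image captioning model; returns canned captions.
    return ["a dog sitting on a couch", "a remote on the couch cushion"]

def parse_nouns(captions):
    # Stand-in noun parser: a real system would use a POS tagger;
    # here we match words against a small noun lexicon.
    noun_lexicon = {"dog", "couch", "remote", "cushion", "cat", "car"}
    nouns = set()
    for cap in captions:
        nouns.update(word for word in cap.split() if word in noun_lexicon)
    return nouns

def adapt_vocabulary(user_vocab, image):
    # Keep only user-defined classes supported by the captions,
    # discarding classes irrelevant to this image.
    nouns = parse_nouns(caption_image(image))
    return [cls for cls in user_vocab if cls in nouns]

vocab = ["dog", "cat", "couch", "remote", "airplane"]
print(adapt_vocabulary(vocab, image=None))  # -> ['dog', 'couch', 'remote']
```

The adapted vocabulary would then be passed to the open-vocabulary detector in place of the original user-defined one.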
Liu, Mingxuan; Hayes, Tyler L.; Mancini, Massimiliano; Ricci, Elisa; Volpi, Riccardo; Csurka, Gabriela. (2025). Test-Time Vocabulary Adaptation for Language-Driven Object Detection. In 2025 IEEE International Conference on Image Processing (ICIP), USA, pp. 540-545. [10.1109/icip55913.2025.11084618]
Test-Time Vocabulary Adaptation for Language-Driven Object Detection
Liu, Mingxuan; Hayes, Tyler L.; Mancini, Massimiliano; Ricci, Elisa; Volpi, Riccardo; Csurka, Gabriela
2025-01-01
| File | Type | License | Size | Format |
|---|---|---|---|---|
| Test_Time_Adaptations (1).pdf (embargo until 17/09/2027) | Post-print (Refereed author's manuscript) | All rights reserved | 1.98 MB | Adobe PDF |
| Test-Time_Vocabulary_Adaptation_for_Language-Driven_Object_Detection.pdf (archive administrators only) | Publisher's layout | All rights reserved | 2.05 MB | Adobe PDF |
Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.



