Open-vocabulary object detection models allow users to freely specify a class vocabulary in natural language at test time, guiding the detection of desired objects. However, vocabularies can be overly broad or even mis-specified, hampering the overall performance of the detector. In this work, we propose a plug-and-play Vocabulary Adapter (VocAda) to refine the user-defined vocabulary, automatically tailoring it to the categories that are relevant for a given image. VocAda requires no training and operates at inference time in three steps: i) it uses an image captioner to describe visible objects, ii) it parses nouns from those captions, and iii) it selects relevant classes from the user-defined vocabulary, discarding irrelevant ones. Experiments on COCO and Objects365 with three state-of-the-art detectors show that VocAda consistently improves performance, proving its versatility. The code is open source.
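The three inference-time steps described in the abstract can be sketched as follows. This is a minimal illustration, not the paper's implementation: the captioner here returns canned captions, and the noun parser matches against a tiny hand-made lexicon, standing in for the captioning model and POS tagger a real system would use.

```python
# Hedged sketch of VocAda's three inference-time steps:
# i) caption the image, ii) parse nouns, iii) filter the user vocabulary.
# All components below are illustrative stand-ins, not the paper's modules.

def caption_image(image):
    # Stand-in for an image captioning model; returns canned captions.
    return ["a dog sitting on a couch", "a remote on the couch cushion"]

def parse_nouns(captions):
    # Stand-in noun parser: a real system would use a POS tagger;
    # here we match words against a small noun lexicon.
    noun_lexicon = {"dog", "couch", "remote", "cushion", "cat", "car"}
    nouns = set()
    for cap in captions:
        nouns.update(word for word in cap.split() if word in noun_lexicon)
    return nouns

def adapt_vocabulary(user_vocab, image):
    # Keep only user-defined classes supported by the captions,
    # discarding classes irrelevant to this image.
    nouns = parse_nouns(caption_image(image))
    return [cls for cls in user_vocab if cls in nouns]

vocab = ["dog", "cat", "couch", "remote", "airplane"]
print(adapt_vocabulary(vocab, image=None))  # -> ['dog', 'couch', 'remote']
```

The adapted vocabulary would then be passed to the open-vocabulary detector in place of the original user-defined one.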
Liu, Mingxuan; Hayes, Tyler L.; Mancini, Massimiliano; Ricci, Elisa; Volpi, Riccardo; Csurka, Gabriela. (2025). Test-Time Vocabulary Adaptation for Language-Driven Object Detection. In 2025 IEEE International Conference on Image Processing (ICIP), USA, pp. 540-545. [10.1109/icip55913.2025.11084618]
Test-Time Vocabulary Adaptation for Language-Driven Object Detection
Liu, Mingxuan; Hayes, Tyler L.; Mancini, Massimiliano; Ricci, Elisa; Volpi, Riccardo; Csurka, Gabriela
2025-01-01
| File | Type | License | Size | Format |
|---|---|---|---|---|
| Test_Time_Adaptations (1).pdf (embargo until 17/09/2027) | Post-print (Refereed author's manuscript) | All rights reserved | 1.98 MB | Adobe PDF |
| Test-Time_Vocabulary_Adaptation_for_Language-Driven_Object_Detection.pdf (archive administrators only) | Publisher's layout | All rights reserved | 2.05 MB | Adobe PDF |
Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.



