GPL at SemEval-2023 Task 1: WordNet and CLIP to disambiguate images

Zhang, Shibingfeng; Nath, Shantanu; Mazzaccara, Davide

doi:10.18653/v1/2023.semeval-1.219

Given a word in context, the task of Visual Word Sense Disambiguation consists of selecting the correct image among a set of candidates. To select the correct image, we propose a solution blending text augmentation and multimodal models. Text augmentation leverages the fine-grained semantic annotation from WordNet to get a better representation of the textual component. We then compare this sense-augmented text to the image set using the pre-trained multimodal models CLIP and ViLT. Our system ranked 16th for the English language, achieving 68.5 points for hit rate and 79.2 for mean reciprocal rank.
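The two scores quoted in the abstract can be illustrated with a minimal sketch. It assumes that hit rate means the fraction of instances whose top-ranked image is the gold image, and that scores are reported on a 0-100 scale; neither detail is spelled out in this record. The function names `reciprocal_rank` and `evaluate` and the toy image identifiers are hypothetical and are not taken from the authors' system.

```python
from typing import Sequence, Tuple


def reciprocal_rank(ranked: Sequence[str], gold: str) -> float:
    """Return 1/rank of the gold image (ranks start at 1), or 0.0 if it is absent."""
    for position, candidate in enumerate(ranked, start=1):
        if candidate == gold:
            return 1.0 / position
    return 0.0


def evaluate(rankings: Sequence[Sequence[str]], golds: Sequence[str]) -> Tuple[float, float]:
    """Compute (hit rate, mean reciprocal rank), both scaled to 0-100.

    Assumption: hit rate counts an instance as correct only when the gold
    image is ranked first.
    """
    if len(rankings) != len(golds) or not golds:
        raise ValueError("rankings and golds must be non-empty and of equal length")
    n = len(golds)
    hits = sum(1 for ranked, gold in zip(rankings, golds) if ranked and ranked[0] == gold)
    rr_total = sum(reciprocal_rank(ranked, gold) for ranked, gold in zip(rankings, golds))
    return 100.0 * hits / n, 100.0 * rr_total / n


# Toy data: three instances, each with candidate images ranked best-first.
toy_rankings = [["a", "b", "c"], ["b", "a", "c"], ["c", "b", "a"]]
toy_golds = ["a", "a", "a"]
toy_hit_rate, toy_mrr = evaluate(toy_rankings, toy_golds)
```

In the toy data the gold image is ranked first, second, and third, so only one of three instances counts as a hit while every instance still contributes to the mean reciprocal rank.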

GPL at SemEval-2023 Task 1: WordNet and CLIP to disambiguate images / Zhang, Shibingfeng; Nath, Shantanu; Mazzaccara, Davide. - Electronic. - (2023), pp. 1592-1597. (Paper presented at the SemEval-2023 conference, held in Toronto, Canada, 13-14 July 2023) [10.18653/v1/2023.semeval-1.219].

2023-01-01

	Date of publication
				2023
	Proceedings title
				Proceedings of the 17th International Workshop on Semantic Evaluation (SemEval-2023)
	Place of publication
				Toronto, Canada
	Publisher
				Association for Computational Linguistics
	All authors
				Zhang, Shibingfeng; Nath, Shantanu; Mazzaccara, Davide
	Appears in types:
				04.1 Paper in Proceedings

Files in this product:

File	Size	Format
2023.semeval-1.219.pdf (open access; Type: Publisher's layout; License: Creative Commons)	1.19 MB	Adobe PDF