GPL at SemEval-2023 Task 1: WordNet and CLIP to disambiguate images / Zhang, Shibingfeng; Nath, Shantanu; Mazzaccara, Davide. - Electronic. - (2023), pp. 1592-1597. (Paper presented at the SemEval-2023 conference, held in Toronto, Canada, on 13-14 July 2023) [10.18653/v1/2023.semeval-1.219].

GPL at SemEval-2023 Task 1: WordNet and CLIP to disambiguate images

Mazzaccara, Davide
2023-01-01

Abstract

Given a word in context, the task of Visual Word Sense Disambiguation consists of selecting the correct image from a set of candidates. To select the correct image, we propose a solution that blends text augmentation with multimodal models. Text augmentation leverages the fine-grained semantic annotation in WordNet to obtain a better representation of the textual component. We then compare this sense-augmented text to the image set using the pre-trained multimodal models CLIP and ViLT. Our system ranked 16th for the English language, achieving a hit rate of 68.5 and a mean reciprocal rank of 79.2.
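
The two scores reported in the abstract can be illustrated with a minimal sketch. It assumes that hit rate means the share of instances whose top-ranked candidate is the correct image, and that both scores are reported on a 0-100 scale; the function names and the toy image identifiers are hypothetical and do not come from the paper or the task's official scorer.

```python
from typing import Sequence


def reciprocal_rank(ranked: Sequence[str], gold: str) -> float:
    """Return 1/rank of the gold image in a ranked candidate list (0.0 if absent)."""
    for position, candidate in enumerate(ranked, start=1):
        if candidate == gold:
            return 1.0 / position
    return 0.0


def hit_rate(rankings: Sequence[Sequence[str]], golds: Sequence[str]) -> float:
    """Percentage of instances whose top-ranked candidate is the gold image (assumed top-1)."""
    hits = sum(1 for ranked, gold in zip(rankings, golds) if ranked and ranked[0] == gold)
    return 100.0 * hits / len(golds)


def mean_reciprocal_rank(rankings: Sequence[Sequence[str]], golds: Sequence[str]) -> float:
    """Mean of the reciprocal ranks over all instances, scaled to 0-100."""
    total = sum(reciprocal_rank(ranked, gold) for ranked, gold in zip(rankings, golds))
    return 100.0 * total / len(golds)


# Hypothetical toy data: each inner list is a system's ranking of candidate images,
# best first, for one word-in-context instance.
rankings = [
    ["img_a", "img_b", "img_c"],
    ["img_d", "img_e", "img_f"],
    ["img_g", "img_h", "img_i"],
    ["img_j", "img_k", "img_l"],
]
golds = ["img_a", "img_e", "img_g", "img_l"]

print(f"hit rate: {hit_rate(rankings, golds):.1f}")
print(f"MRR:      {mean_reciprocal_rank(rankings, golds):.1f}")
```

Under these assumptions, mean reciprocal rank is never lower than hit rate, because a correct image ranked second or third still contributes partial credit.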
2023
Proceedings of the 17th International Workshop on Semantic Evaluation (SemEval-2023)
Toronto, Canada
Association for Computational Linguistics
Zhang, Shibingfeng; Nath, Shantanu; Mazzaccara, Davide
Files in this item:
2023.semeval-1.219.pdf
Access: open access
Type: Publisher's version (publisher's layout)
License: Creative Commons
Size: 1.19 MB
Format: Adobe PDF


Use this identifier to cite or link to this document: https://hdl.handle.net/11572/388134