Vision and Language Integration: Moving beyond Objects / Shekhar, Ravi; Pezzelle, Sandro; Herbelot, Aurelie Georgette Geraldine; Nabi, Moin; Sangineto, Enver; Bernardi, Raffaella. - ELECTRONIC. - (2017), pp. 1-6. (Paper presented at IWCS 2017, held in Montpellier, France, 19th-22nd September 2017).
Vision and Language Integration: Moving beyond Objects
Ravi Shekhar; Sandro Pezzelle; Aurelie Herbelot; Moin Nabi; Enver Sangineto; Raffaella Bernardi
2017-01-01
Abstract
Recent years have seen an explosion of work on the integration of vision and language data. New tasks such as Image Captioning and Visual Question Answering have been proposed, and impressive results have been achieved. There is now a shared desire to gain an in-depth understanding of the strengths and weaknesses of these models. To this end, several datasets have been proposed to challenge the state of the art. These datasets, however, mostly focus on the interpretation of objects (as denoted by nouns in the corresponding captions). In this paper, we reuse a previously proposed methodology to evaluate the ability of current systems to move beyond objects and deal with attributes (as denoted by adjectives), actions (verbs), manner (adverbs) and spatial relations (prepositions). We show that the coarse representations given by current approaches are not informative enough to interpret attributes or actions, whilst spatial relations fare somewhat better, but only in attention models.
File | Description | Type | License | Size | Format
---|---|---|---|---|---
W17-6938-vision.pdf (open access) | main article | Publisher's layout version | All rights reserved | 591.98 kB | Adobe PDF
Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.