Learning quantification from images: A structured neural architecture

Sorodoc, Ionut; Pezzelle, Sandro; Herbelot, Aurelie; Dimiccoli, Mariella; Bernardi, Raffaella

doi:10.1017/S1351324918000128

Major advances have recently been made in merging language and vision representations. Most tasks considered so far have confined themselves to the processing of objects and lexicalised relations amongst objects (content words). We know, however, that humans (even pre-school children) can abstract over raw multimodal data to perform certain types of higher level reasoning, expressed in natural language by function words. A case in point is given by their ability to learn quantifiers, i.e. expressions like "few", "some" and "all". From formal semantics and cognitive linguistics, we know that quantifiers are relations over sets which, as a simplification, we can see as proportions. For instance, in "most fish are red", "most" encodes the proportion of fish which are red fish. In this paper, we study how well current neural network strategies model such relations. We propose a task where, given an image and a query expressed by an object–property pair, the system must return a quantifier expressing which proportions of the queried object have the queried property. Our contributions are twofold. First, we show that the best performance on this task involves coupling state-of-the-art attention mechanisms with a network architecture mirroring the logical structure assigned to quantifiers by classic linguistic formalisation. Second, we introduce a new balanced dataset of image scenarios associated with quantification queries, which we hope will foster further research in this area.
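The proportional reading of quantifiers described in the abstract can be illustrated with a minimal symbolic sketch. It is not the neural architecture of the paper: the scene is a hand-written list of annotated objects rather than an image, the function name `quantifier_for` is hypothetical, the label "no" is included as an assumption, and the cut-offs `FEW_MAX` and `MOST_MIN` are invented for illustration because the abstract does not state the proportion ranges used.

```python
from fractions import Fraction

# Hypothetical cut-offs, chosen only for illustration; the abstract does not
# give the proportion ranges that separate "few", "some" and "most".
FEW_MAX = Fraction(1, 3)
MOST_MIN = Fraction(2, 3)


def quantifier_for(scene, target_object, target_property):
    """Pick a quantifier for the query (target_object, target_property).

    scene is a list of (object_label, set_of_properties) pairs, a symbolic
    stand-in for the objects annotated in one image.
    """
    # Restrictor set: every item in the scene that is the queried object.
    restrictor = [props for label, props in scene if label == target_object]
    if not restrictor:
        raise ValueError(f"no {target_object!r} in the scene")

    # Scope set: the restrictor items that also have the queried property.
    in_scope = sum(1 for props in restrictor if target_property in props)
    proportion = Fraction(in_scope, len(restrictor))

    # Map the proportion to a quantifier label.
    if proportion == 0:
        return "no"
    if proportion == 1:
        return "all"
    if proportion <= FEW_MAX:
        return "few"
    if proportion >= MOST_MIN:
        return "most"
    return "some"


# A toy scene: four fish (three red, one blue) and one brown dog.
scene = [
    ("fish", {"red"}),
    ("fish", {"red"}),
    ("fish", {"red"}),
    ("fish", {"blue"}),
    ("dog", {"brown"}),
]
print(quantifier_for(scene, "fish", "red"))
```

With three of the four fish red, the proportion is 3/4, which this sketch labels "most" under the assumed cut-offs. The task in the paper is harder because the system has to arrive at such a label from the image itself instead of from given annotations.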

Learning quantification from images: A structured neural architecture / Sorodoc, Ionut-teodor; Pezzelle, Sandro; Herbelot, Aurelie Georgette Geraldine; Dimiccoli, Mariella; Bernardi, Raffaella. - In: NATURAL LANGUAGE ENGINEERING. - ISSN 1351-3249. - ELECTRONIC. - 2018, 24:3(2018), pp. 363-392. [10.1017/S1351324918000128]

2018-01-01

Date of publication	2018
Journal title	NATURAL LANGUAGE ENGINEERING
Issue number and part	3
DOI	https://dx.doi.org/10.1017/S1351324918000128
Scopus identifier	2-s2.0-85044667404
WOS identifier	WOS:000430702300002
All authors	Sorodoc, Ionut-teodor; Pezzelle, Sandro; Herbelot, Aurelie Georgette Geraldine; Dimiccoli, Mariella; Bernardi, Raffaella
Appears in categories	03.1 Journal article

Files in this item:

File	Size	Format
camera-ready.pdf (archive managers only; type: publisher's layout; license: all rights reserved)	1.36 MB	Adobe PDF