Learning quantification from images: A structured neural architecture / Sorodoc, Ionut-teodor; Pezzelle, Sandro; Herbelot, Aurelie Georgette Geraldine; Dimiccoli, Mariella; Bernardi, Raffaella. - In: NATURAL LANGUAGE ENGINEERING. - ISSN 1351-3249. - ELETTRONICO. - 2018, 24:3(2018), pp. 363-392. [10.1017/S1351324918000128]

Learning quantification from images: A structured neural architecture

Ionut Sorodoc;Sandro Pezzelle;Aurelie Herbelot;Mariella Dimiccoli;Raffaella Bernardi
2018

Abstract

Major advances have recently been made in merging language and vision representations. Most tasks considered so far have confined themselves to the processing of objects and lexicalised relations amongst objects (content words). We know, however, that humans (even pre-school children) can abstract over raw multimodal data to perform certain types of higher-level reasoning, expressed in natural language by function words. A case in point is their ability to learn quantifiers, i.e. expressions like few, some and all. From formal semantics and cognitive linguistics, we know that quantifiers are relations over sets which, as a simplification, we can see as proportions. For instance, in most fish are red, most encodes the proportion of fish which are red fish. In this paper, we study how well current neural network strategies model such relations. We propose a task where, given an image and a query expressed by an object–property pair, the system must return a quantifier expressing which proportion of the queried objects has the queried property. Our contributions are twofold. First, we show that the best performance on this task involves coupling state-of-the-art attention mechanisms with a network architecture mirroring the logical structure assigned to quantifiers by classic linguistic formalisation. Second, we introduce a new balanced dataset of image scenarios associated with quantification queries, which we hope will foster further research in this area.
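As an illustration of the quantifier-as-proportion view described above, the sketch below maps a raw proportion to a quantifier label by simple thresholding. The thresholds and the label set are hypothetical choices for this example, not the paper's definitions; the paper instead learns the mapping from images and queries with a neural architecture.

```python
# Illustrative sketch only: a hand-coded proportion-to-quantifier mapping.
# The cut-off values 0.33 and 0.66 are assumed for this example and are
# NOT taken from the paper, which learns this mapping end-to-end.
def quantify(n_with_property: int, n_total: int) -> str:
    """Return a quantifier for the proportion of queried objects
    that have the queried property (e.g. red fish among all fish)."""
    if n_total == 0:
        raise ValueError("no queried objects in the scene")
    p = n_with_property / n_total
    if p == 0.0:
        return "no"
    if p == 1.0:
        return "all"
    if p < 0.33:
        return "few"
    if p < 0.66:
        return "some"
    return "most"

# Example: 9 of 10 fish are red -> "most" (most fish are red).
```

A learned model replaces both the explicit counts and the fixed thresholds: it must estimate the relevant sets directly from pixels, which is what makes the task non-trivial.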
Files in this record:
camera-ready.pdf — Publisher's version (Publisher's layout); licence: All rights reserved; 1.36 MB; Adobe PDF; access restricted to repository managers.
Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: http://hdl.handle.net/11572/206432
Citations
  • PMC: n/a
  • Scopus: 2
  • Web of Science: 2