Quantifiers in a Multimodal World: Hallucinating Vision with Language and Sound

Testoni, Alberto; Pezzelle, Sandro; Bernardi, Raffaella

doi:10.18653/v1/W19-2912

Inspired by the literature on multisensory integration, we develop a computational model to ground quantifiers in perception. The model learns to pick, out of nine quantifiers (‘few’, ‘many’, ‘all’, etc.), the one that is most likely to describe the percentage of animals in a visual-auditory input containing both animals and artifacts. We show that relying on concurrent sensory inputs increases model performance on the quantification task. Moreover, we evaluate the model in a situation in which only the auditory modality is given, while the visual one is ‘hallucinated’ either from the auditory input itself or from a linguistic caption describing the quantity of entities in the auditory input. This way, the model exploits prior associations between modalities. We show that the model profits from this prior knowledge and outperforms the auditory-only setting.

Quantifiers in a Multimodal World: Hallucinating Vision with Language and Sound / Testoni, Alberto; Pezzelle, Sandro; Bernardi, Raffaella. - Electronic. - (2019), pp. 105-116. (CMCL 2019, Minneapolis, MN, 7 June 2019) [10.18653/v1/W19-2912].

2019-01-01

Date of publication: 2019
Proceedings title: NAACL HLT 2019: Cognitive Modeling and Computational Linguistics: Proceedings of the Workshop
Place of publication: Stroudsburg, PA
Publisher: ACL
ISBN: 978-1-948087-96-4
All authors: Testoni, Alberto; Pezzelle, Sandro; Bernardi, Raffaella
Appears in categories: 04.1 Paper in Proceedings

Files in this item:

File	Size	Format
cmcl19.pdf (open access; type: publisher’s layout; license: all rights reserved)	913.86 kB	Adobe PDF