They Are Not All Alike: Answering Different Spatial Questions Requires Different Grounding Strategies

IRIS

2020-01-01

Abstract

In this paper, we study the grounding skills required to answer spatial questions asked by humans while playing the GuessWhat?! game. We propose a classification of spatial questions into absolute, relational, and group questions. We build a new answerer model based on the LXMERT multimodal transformer and compare a baseline with and without visual features of the scene. We are interested in how LXMERT's attention mechanisms are used to answer spatial questions, since such questions require attending to more than one region simultaneously and identifying the relation that holds among them. We show that our proposed model outperforms the baseline by a large margin (9.70% on spatial questions and 6.27% overall). By analyzing LXMERT's errors and attention mechanisms, we find that our classification helps to better understand the skills required to answer different kinds of spatial questions.

	Date of publication
	
				2020
			
	Proceedings title
	
				Proceedings of the Third International Workshop on Spatial Language Understanding
			
	Place of publication
	
				Stroudsburg, PA, USA
			
	Publisher
	
				Association for Computational Linguistics
			
	All authors
	
						Testoni, Alberto; Greco, Claudio; Bianchi, Tobias; Mazuecos, Mauricio; Marcante, Agata; Benotti, Luciana; Bernardi, Raffaella
					
	Citation
	
				They Are Not All Alike: Answering Different Spatial Questions Requires Different Grounding Strategies / Testoni, A., Greco, C., Bianchi, T., Mazuecos, M., Marcante, A., Benotti, L., Bernardi, R. - Electronic. - (2020), pp. 29-38. (SpLU 2020, online, November 19, 2020) [10.18653/v1/2020.splu-1.4].
			
	Item type:
	
				04.1 Paper in Proceedings

Files in this item:

File	Size	Format
2020.splu-1.4.pdf (open access; type: publisher's layout; license: Creative Commons)	3.01 MB	Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11572/286799
