The Devil is in the Detail: A Magnifying Glass for the GuessWhich Visual Dialogue Game / Testoni, Alberto; Shekhar, Ravi; Fernández, Raquel; Bernardi, Raffaella. - Electronic. - (2019), pp. 15-24. (Paper presented at the SemDial 2019 conference, held in London, 4-6 September 2019).

The Devil is in the Detail: A Magnifying Glass for the GuessWhich Visual Dialogue Game

Testoni, Alberto; Shekhar, Ravi; Fernández, Raquel; Bernardi, Raffaella
2019-01-01

Abstract

Grounded conversational agents are a fascinating research line on which important progress has been made lately, thanks to the development of neural network models and to the release of visual dialogue datasets. The latter have been used to set up visual dialogue games, which are an interesting test bed for evaluating conversational agents. Researchers' attention has been on building models of increasing complexity, trained with computationally costly machine learning paradigms that lead to higher task success scores. In this paper, we take a step back: we use a rather simple neural network architecture and we scrutinize the GuessWhich task, the dataset, and the quality of the generated dialogues. We show that our simple Questioner agent reaches state-of-the-art performance, that the commonly used evaluation metric is too coarse to compare different models, and that high task success does not correspond to high dialogue quality. Our work shows the importance of running detailed analyses of the results to spot possible weaknesses of models, rather than aiming to outperform state-of-the-art scores.
2019
SemDial 2019; LondonLogue: Proceedings of the 23rd Workshop on the Semantics and Pragmatics of Dialogue
London, United Kingdom
SEMDIAL
Files in this item:
Testoni_semdial_0005.pdf (open access)
Type: Publisher's version (publisher's layout)
License: All rights reserved
Size: 965.44 kB
Format: Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11572/250557