“I’ve Seen Things You People Wouldn’t Believe”: Hallucinating Entities in GuessWhat?! / Testoni, Alberto; Bernardi, Raffaella. - ELECTRONIC. - (2021), pp. 101-111. (Paper presented at the ACL SRW conference, held online, 1-6 August 2021) [10.18653/v1/2021.acl-srw.11].

“I’ve Seen Things You People Wouldn’t Believe”: Hallucinating Entities in GuessWhat?!

Testoni, Alberto; Bernardi, Raffaella
2021-01-01

Abstract

Natural language generation systems have made significant progress in recent years, but they have been shown to generate tokens that are unrelated to the source input. This problem affects computational models in many NLP tasks, and it is particularly harmful in multimodal systems. In this work, we assess the rate of object hallucination in multimodal conversational agents playing the GuessWhat?! referential game. Since better visual processing has been shown to mitigate this issue in image captioning, we adapt the best available visual processing models to the GuessWhat?! task and propose two new models to play the Questioner agent. We show that the new models generate fewer hallucinations than other well-known models from the literature. Moreover, their hallucinations are less severe (they affect task accuracy less) and are more human-like. We also analyse where hallucinations tend to occur through the dialogue: hallucinations are less frequent in earlier turns, they trigger a cascade hallucination effect, and they are often preceded by negative answers, which have been shown to be harder to ground.
Year: 2021
Published in: Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing: Student Research Workshop
Venue: Online
Publisher: ACL
ISBN: 978-1-954085-55-8
Authors: Testoni, Alberto; Bernardi, Raffaella
Files in this record:

2021.acl-srw.11.pdf (open access)
Description: main article
Type: Publisher's version (publisher's layout)
License: Creative Commons
Size: 899.78 kB
Format: Adobe PDF

Documents in IRIS are protected by copyright, and all rights are reserved unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11572/328642
Citations:
  • Scopus: 2