“I’ve Seen Things You People Wouldn’t Believe”: Hallucinating Entities in GuessWhat?! / Testoni, Alberto; Bernardi, Raffaella. - ELECTRONIC. - (2021), pp. 101-111. (Paper presented at the ACL Student Research Workshop (ACL SRW), held online, 1-6 August 2021) [10.18653/v1/2021.acl-srw.11].
“I’ve Seen Things You People Wouldn’t Believe”: Hallucinating Entities in GuessWhat?!
Testoni, Alberto; Bernardi, Raffaella
2021-01-01
Abstract
Natural language generation systems have witnessed important progress in recent years, yet they are known to generate tokens that are unrelated to the source input. This problem affects computational models across many NLP tasks, and it is particularly troublesome in multimodal systems. In this work, we assess the rate of object hallucination in multimodal conversational agents playing the GuessWhat?! referential game. Since better visual processing has been shown to mitigate this issue in image captioning, we adapt the best available visual processing models to the GuessWhat?! task and propose two new models to play the Questioner agent. We show that the new models generate fewer hallucinations than other well-known models from the literature. Moreover, their hallucinations are less severe (they affect task accuracy less) and are more human-like. We also analyse where hallucinations tend to occur in the dialogue: they are less frequent in earlier turns, trigger a cascade of further hallucinations, and are often preceded by negative answers, which have been shown to be harder to ground.
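The abstract measures object hallucination as the generation of object mentions that do not appear in the image. As a rough illustration only (this is not the authors' released code), the sketch below shows one way such a rate could be computed, assuming a hypothetical set of object-category words (e.g. COCO category names) and per-image ground-truth categories.

```python
# Hypothetical sketch (not the paper's pipeline): flag generated GuessWhat?!
# questions that mention object categories absent from the image annotations.
from typing import List, Set, Tuple

def hallucinated_mentions(question: str, image_categories: Set[str],
                          vocabulary: Set[str]) -> List[str]:
    """Return object words mentioned in the question but not present in the image.

    `vocabulary` (the object-category words to check) and `image_categories`
    (ground-truth categories for the image) are assumed inputs.
    """
    tokens = question.lower().strip("?").split()
    return [t for t in tokens if t in vocabulary and t not in image_categories]

def hallucination_rate(dialogues: List[Tuple[List[str], Set[str]]],
                       vocabulary: Set[str]) -> float:
    """Fraction of generated questions containing at least one hallucinated object."""
    questions = [(q, cats) for qs, cats in dialogues for q in qs]
    flagged = sum(bool(hallucinated_mentions(q, cats, vocabulary))
                  for q, cats in questions)
    return flagged / len(questions) if questions else 0.0

# Toy usage: the image contains a dog and a bench; "cat" is hallucinated.
dialogues = [(["is it the dog?", "is it the cat on the sofa?"], {"dog", "bench"})]
print(hallucination_rate(dialogues, vocabulary={"dog", "cat", "sofa", "bench"}))  # 0.5
```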
File | Description | Type | License | Size | Format
---|---|---|---|---|---
2021.acl-srw.11.pdf (open access) | Main article | Publisher's version (publisher's layout) | Creative Commons | 899.78 kB | Adobe PDF
Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.