Probability Distributions as a Litmus Test to Inspect NNs Grounding Skills

Lucassen, A. J.; Testoni, A.; Bernardi, Raffaella

Today's AI systems are typically trained with a final classifier that performs a downstream task, and they are mostly evaluated on the task success they reach. Not enough attention is given to how the classifier distributes probability among the candidates, from which the one with the highest probability is selected as the target. We propose to take the probability distribution as a litmus test to inspect models’ grounding skills. We take a visually grounded referential guessing game as a test-bed and use the probability distribution to evaluate whether question-answer (QA) pairs are well grounded by the model. To this end, we propose a method to obtain soft labels automatically and show that they correlate well with human uncertainty about the grounded interpretation of the QA pair. Our results show that higher task accuracy does not necessarily correspond to a more meaningful probability distribution; we do not consider models that fail our litmus test trustworthy.
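
The abstract's central point, that two models can reach the same guess while distributing probability very differently, can be illustrated with a small sketch. Everything in it is an assumption made for illustration: the four-candidate setup, the soft label, the two model outputs, and the use of KL divergence as the comparison measure are hypothetical and are not taken from the paper, whose actual procedure is not described in this record.

```python
import math

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions over the same candidates.

    Terms with p_i == 0 contribute nothing; q_i is floored at eps to
    avoid division by zero.
    """
    return sum(pi * math.log(pi / max(qi, eps)) for pi, qi in zip(p, q) if pi > 0)

def guess(probs):
    """Index of the candidate with the highest probability."""
    return max(range(len(probs)), key=probs.__getitem__)

# Hypothetical soft label: the QA pair is compatible with candidates 0 and 1
# and rules out candidates 2 and 3.
soft_label = [0.5, 0.5, 0.0, 0.0]

# Two hypothetical model outputs that both select candidate 0.
model_a = [0.48, 0.44, 0.04, 0.04]  # mass stays on the compatible candidates
model_b = [0.40, 0.05, 0.30, 0.25]  # mass leaks onto ruled-out candidates

target = 0  # hypothetical index of the true target
for name, probs in (("model_a", model_a), ("model_b", model_b)):
    correct = guess(probs) == target
    divergence = kl_divergence(soft_label, probs)
    print(f"{name}: correct={correct}, KL(soft_label || model)={divergence:.3f}")
```

Under these assumed numbers both models guess the target, yet the second one diverges far more from the soft label, which is the kind of gap that task accuracy alone would not reveal.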

Probability Distributions as a Litmus Test to Inspect NNs Grounding Skills / Lucassen, A. J.; Testoni, A.; Bernardi, Raffaella. - Electronic. - 3287:(2022), pp. 108-126. (6th Workshop on Natural Language for Artificial Intelligence, NL4AI 2022, Udine, 30 November 2022).

2022-01-01

Date of publication: 2022
Proceedings title: Sixth Workshop on Natural Language for Artificial Intelligence
Place of publication: Aachen
Publisher: CEUR-WS
Scopus identifier: 2-s2.0-85143255573
Appears in categories: 04.1 Paper in Proceedings

Files in this item:

File	Size	Format
paper11.pdf (open access; description: paper; type: publisher's version, publisher's layout; license: Creative Commons)	3.93 MB	Adobe PDF