A Small but Informed and Diverse Model: The Case of the Multimodal GuessWhat!? Guessing Game


Pre-trained Vision and Language Transformers achieve high performance on downstream tasks because they transfer representational knowledge accumulated during pretraining on large amounts of data. In this paper, we ask whether it is possible to compete with such models using features based on transferred (pre-trained, frozen) representations combined with a lightweight architecture. We take the multimodal guessing task GuessWhat?! as our testbed. An ensemble of our lightweight model matches the performance of the fine-tuned pre-trained transformer (LXMERT). An uncertainty analysis of our ensemble shows that the lightweight transferred representations close the data uncertainty gap with LXMERT while retaining the model diversity that yields the ensemble boost. We further demonstrate that LXMERT’s performance gain is due solely to its extra V&L pretraining rather than to architectural improvements. These results argue for flexible integration of multiple features and lightweight models as a viable alternative to large, cumbersome, pre-trained models.
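The abstract does not say how the uncertainty analysis is computed. As a rough illustration only, one common way to separate data uncertainty from model diversity in an ensemble is to split the entropy of the averaged prediction into the average per-member entropy plus a disagreement term. The sketch below assumes this decomposition; the function names and the input format (one probability distribution over candidate objects per ensemble member) are hypothetical and not taken from the paper.

```python
import math
from typing import List, Sequence, Tuple


def entropy(p: Sequence[float]) -> float:
    """Shannon entropy (in nats) of a discrete distribution."""
    return -sum(q * math.log(q) for q in p if q > 0.0)


def decompose_uncertainty(
    member_probs: List[Sequence[float]],
) -> Tuple[float, float, float]:
    """Split ensemble uncertainty for a single example.

    member_probs: one distribution over candidate objects per ensemble
    member (hypothetical format, assumed for this sketch).
    Returns (total, data, model) with total = data + model.
    """
    n = len(member_probs)
    k = len(member_probs[0])
    # Average the members' predictions, candidate by candidate.
    mean = [sum(p[i] for p in member_probs) / n for i in range(k)]
    total = entropy(mean)                             # entropy of the averaged prediction
    data = sum(entropy(p) for p in member_probs) / n  # average per-member entropy
    model = total - data                              # disagreement between members
    return total, data, model


# Two toy ensembles guessing among three candidate objects.
agree = [[0.8, 0.1, 0.1], [0.8, 0.1, 0.1]]
disagree = [[0.8, 0.1, 0.1], [0.1, 0.8, 0.1]]
print(decompose_uncertainty(agree))
print(decompose_uncertainty(disagree))
```

In this toy setting both ensembles have the same data term, because each member is equally confident, but only the second has a positive model term, because its members prefer different objects. Averaging helps only when that disagreement term is non-zero, which is the sense in which diversity can produce an ensemble boost.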

A Small but Informed and Diverse Model: The Case of the Multimodal GuessWhat!? Guessing Game / Greco, Claudio; Testoni, Alberto; Bernardi, Raffaella; Frank, Stella. - Electronic. - (2022), pp. 1-10. (Paper presented at the CLASP conference, held in Gothenburg, 15-16 September 2022).


Greco, Claudio; Testoni, Alberto; Bernardi, Raffaella; Frank, Stella

2022-01-01



Date of publication: 2022
Proceedings title: Proceedings of the 2022 CLASP Conference on (Dis)embodiment
Place of publication: USA
Publisher: Association for Computational Linguistics
ISBN: 978-1-955917-67-4
All authors: Greco, Claudio; Testoni, Alberto; Bernardi, Raffaella; Frank, Stella
Publication type: 04.1 Paper in Proceedings

Files in this item:

File	Access	Description	Type	License	Size	Format
2022.clasp-1.1.pdf	Open access	Paper	Publisher's version (publisher's layout)	Creative Commons	535.72 kB	Adobe PDF


Use this identifier to cite or link to this document: https://hdl.handle.net/11572/365192
