ACT-Thor: A Controlled Benchmark for Embodied Action Understanding in Simulated Environments / Hanna, Michael; Pedeni, Federico; Testoni, Alberto; Suglia, Alessandro; Bernardi, Raffaella. - ELECTRONIC. - 29:1 (2022), pp. 5597-5612. (Paper presented at the 29th International Conference on Computational Linguistics, COLING 2022, held in Gyeongju, Republic of Korea, 12-17 October 2022).
ACT-Thor: A Controlled Benchmark for Embodied Action Understanding in Simulated Environments
Testoni, Alberto; Bernardi, Raffaella
2022-01-01
Abstract
Artificial agents are increasingly challenged to perform embodied AI tasks. To succeed, agents must understand the meaning of verbs and how their corresponding actions transform the surrounding world. In this work, we propose ACT-Thor, a novel controlled benchmark for embodied action understanding. We use the AI2-THOR simulated environment to produce a controlled setup in which an agent, given a before-image and an associated action command, has to determine the correct after-image among a set of candidates. First, we assess the feasibility of the task via a human evaluation that resulted in 81.4% accuracy and very high inter-annotator agreement (84.9%). Second, we design both unimodal and multimodal baselines using state-of-the-art visual feature extractors. Our evaluation and error analysis suggest that only models with a highly structured representation of actions, combined with powerful visual features, can perform well on the task. However, they still fall behind human performance in a zero-shot scenario where the model is exposed to unseen (action, object) pairs. This paves the way for a systematic way of evaluating embodied AI agents that understand grounded actions.
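The benchmark casts action understanding as a contrastive choice over candidate after-images. The sketch below shows how a (before-image, action, after-image) sample can in principle be generated with the standard ai2thor Python API; the scene name, the chosen action, and the object-selection heuristic are illustrative assumptions, not the exact ACT-Thor data-generation pipeline.

```python
# Minimal sketch of producing a (before-image, action, after-image) sample
# with AI2-THOR. Scene, action, and target object are illustrative choices,
# not the exact configuration used by ACT-Thor.
from ai2thor.controller import Controller

controller = Controller(scene="FloorPlan1")  # a default kitchen scene

# Capture the before-image from the agent's egocentric camera.
before = controller.last_event.frame  # numpy array, H x W x 3 (RGB)

# Pick a visible, pickupable object to act on (hypothetical heuristic).
target = next(
    obj for obj in controller.last_event.metadata["objects"]
    if obj["pickupable"] and obj["visible"]
)

# Apply the action command, e.g. "PickupObject".
event = controller.step(action="PickupObject", objectId=target["objectId"])

# Capture the after-image; together with the action string and the
# before-image, this forms the positive sample. Distractor candidates
# would come from applying other actions to the same before-state.
after = event.frame

controller.stop()
```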
| File | Description | Type | License | Size | Format |
|---|---|---|---|---|---|
| 2022.coling-1.495.pdf (open access) | Article | Publisher's version (publisher's layout) | Creative Commons | 2.87 MB | Adobe PDF |
Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.