Predicting Gaze from Egocentric Social Interaction Videos and IMU Data / Kumar Thakur, Sanket; Beyan, Cigdem; Morerio, Pietro; Del Bue, Alessio. - (2021), pp. 717-722. (Paper presented at ACM ICMI '21, held in Montreal, 18–22 October 2021) [10.1145/3462244.3479954].
Predicting Gaze from Egocentric Social Interaction Videos and IMU Data
Cigdem Beyan
2021-01-01
Abstract
Gaze prediction in egocentric videos is a fairly new research topic with several potential applications: assistive technology (e.g., supporting blind people in their daily interactions), security (e.g., attention tracking in risky work environments), and education (e.g., augmented/mixed reality training simulators, immersive games), among others. Egocentric gaze is typically estimated from video, while only a few works attempt to use inertial measurement unit (IMU) data, a sensor modality often available in wearable devices (e.g., augmented reality headsets). In this paper, we examine whether joint learning of egocentric video and the corresponding IMU data can improve first-person gaze prediction compared to using these modalities separately. To this end, we propose a multimodal network and evaluate it on several unconstrained social interaction scenarios captured from a first-person perspective. The proposed multimodal network achieves better results than unimodal methods as well as several (multimodal) baselines, showing that using egocentric video together with IMU data can boost first-person gaze estimation performance.