Can computers learn from humans to see better? Inferring scene semantics from viewers' eye movements

IRIS

This paper describes an attempt to bridge the semantic gap between computer vision and scene understanding employing eye movements. Even as computer vision algorithms can efficiently detect scene objects, discovering semantic relationships between these objects is as essential for scene understanding. Humans understand complex scenes by rapidly moving their eyes (saccades) to selectively focus on salient entities (fixations). For 110 social scenes, we compared verbal descriptions provided by observers against eye movements recorded during a free-viewing task. Data analysis confirms (i) a strong correlation between task-explicit linguistic descriptions and task-implicit eye movements, both of which are influenced by underlying scene semantics and (ii) the ability of eye movements in the form of fixations and saccades to indicate salient entities and entity relationships mentioned in scene descriptions. We demonstrate how eye movements are useful for inferring the meaning of social (ever...

Can computers learn from humans to see better? Inferring scene semantics from viewers' eye movements

Subramanian, Ramanathan;Yanulevskaya, Victoria;Sebe, Niculae

2011-01-01

Abstract

This paper describes an attempt to bridge the semantic gap between computer vision and scene understanding employing eye movements. Even as computer vision algorithms can efficiently detect scene objects, discovering semantic relationships between these objects is as essential for scene understanding. Humans understand complex scenes by rapidly moving their eyes (saccades) to selectively focus on salient entities (fixations). For 110 social scenes, we compared verbal descriptions provided by observers against eye movements recorded during a free-viewing task. Data analysis confirms (i) a strong correlation between task-explicit linguistic descriptions and task-implicit eye movements, both of which are influenced by underlying scene semantics and (ii) the ability of eye movements in the form of fixations and saccades to indicate salient entities and entity relationships mentioned in scene descriptions. We demonstrate how eye movements are useful for inferring the meaning of social (ever...

Scheda breve

Scheda completa

Scheda completa (DC)

	Anno di pubblicazione (Date of publication)
	
				2011
			
	Titolo del volume (Proceedings title)
	
				MM '11 Proceedings of the 19th ACM international conference on Multimedia
			
	Luogo di edizione (Place of publication)
	
				New York
			
	Casa editrice (Publisher)
	
				ACM Press
			
	ISBN
	
				9781450306164
			
	Codice Scopus (Scopus Identifier)
	
				2-s2.0-84455173231
			
	Tutti gli autori
	
						Subramanian, Ramanathan; Yanulevskaya, Victoria; Sebe, Niculae
					
	Appare nelle tipologie:
	
				04.1 Saggio in atti di convegno (Paper in Proceedings)

File in questo prodotto:

Non ci sono file associati a questo prodotto.

I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione

Utilizza questo identificativo per citare o creare un link a questo documento: https://hdl.handle.net/11572/88417

Attenzione

Attenzione! I dati visualizzati non sono stati sottoposti a validazione da parte dell'ateneo

Citazioni

ND

24

ND

35

social impact