Despite many attempts in the last few years, automatic analysis of social scenes captured by wide-angle camera networks remains a very challenging task due to the low resolution of targets, background clutter, and frequent and persistent occlusions. In this paper, we present a novel framework for jointly estimating (i) the head and body orientations of targets and (ii) conversational groups, called F-formations, from social scenes. In contrast to prior works that have (a) exploited the limited range of head and body orientations to jointly learn both, or (b) employed the mutual head (but not body) pose of interactors for deducing F-formations, we propose a weakly-supervised learning algorithm for joint inference. Our algorithm employs body pose as the primary cue for F-formation estimation and uses an alternating optimization strategy to iteratively refine the F-formation and pose estimates. We demonstrate the increased efficacy of joint inference over the state of the art via extensive experiments on three social datasets.
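
The alternating idea mentioned in the abstract can be illustrated with a toy sketch. This is not the paper's weakly-supervised algorithm: it only shows, under simplifying assumptions, how group estimates and body-orientation estimates might refine each other in turn. All names (`o_space_center`, `group_by_centers`, `refine_orientations`, `alternate`) and all parameters (`stride`, `radius`, `blend`) are hypothetical and chosen for illustration only.

```python
import math

def o_space_center(pos, theta, stride=1.0):
    # Point `stride` units ahead of a person along their body orientation.
    # Assumption: people in one group project to roughly the same point.
    return (pos[0] + stride * math.cos(theta), pos[1] + stride * math.sin(theta))

def group_by_centers(centers, radius):
    # Single-link grouping: projected centers within `radius` share a label.
    labels = list(range(len(centers)))
    for i in range(len(centers)):
        for j in range(i + 1, len(centers)):
            if math.dist(centers[i], centers[j]) <= radius:
                old, new = labels[j], labels[i]
                labels = [new if lab == old else lab for lab in labels]
    return labels

def refine_orientations(positions, thetas, labels, stride=1.0, blend=0.5):
    # Nudge each grouped person's orientation toward their group's mean center.
    centers = [o_space_center(p, t, stride) for p, t in zip(positions, thetas)]
    refined = []
    for i, (p, t) in enumerate(zip(positions, thetas)):
        members = [k for k, lab in enumerate(labels) if lab == labels[i]]
        if len(members) < 2:
            refined.append(t)  # singletons keep their current estimate
            continue
        cx = sum(centers[k][0] for k in members) / len(members)
        cy = sum(centers[k][1] for k in members) / len(members)
        target = math.atan2(cy - p[1], cx - p[0])
        # Shortest signed angular difference between target and current angle.
        delta = math.atan2(math.sin(target - t), math.cos(target - t))
        refined.append(t + blend * delta)
    return refined

def alternate(positions, thetas, stride=1.0, radius=0.8, iters=10):
    # Alternate between (1) grouping from orientations and
    # (2) orientation refinement from groups, for a fixed number of rounds.
    labels = list(range(len(positions)))
    for _ in range(iters):
        centers = [o_space_center(p, t, stride) for p, t in zip(positions, thetas)]
        labels = group_by_centers(centers, radius)
        thetas = refine_orientations(positions, thetas, labels, stride)
    return labels, thetas

# Two people roughly facing each other, plus one person far away.
positions = [(0.0, 0.0), (2.0, 0.0), (10.0, 10.0)]
noisy_thetas = [0.3, math.pi - 0.3, 0.0]
labels, thetas = alternate(positions, noisy_thetas)
```

In this toy setting the first two people end up in one group and their orientations drift toward each other, while the distant person stays alone and unchanged. The actual method described in the abstract additionally estimates head pose and learns from weak supervision, neither of which is modelled here.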

Joint Estimation of Human Pose and Conversational Groups from Social Scenes / Varadarajan, Jagannadan; Subramanian, Ramanathan; Bulò, Samuel Rota; Ahuja, Narendra; Lanz, Oswald; Ricci, Elisa. - In: INTERNATIONAL JOURNAL OF COMPUTER VISION. - ISSN 0920-5691. - PRINT. - 126:(2018), pp. 410-429. [10.1007/s11263-017-1026-6]

Joint Estimation of Human Pose and Conversational Groups from Social Scenes

Subramanian, Ramanathan;Lanz, Oswald;Ricci, Elisa
2018-01-01

Files in this item:
File: IJCV2018.pdf
Access: Archive managers only
Type: Publisher's layout
License: All rights reserved
Size: 2.23 MB
Format: Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11572/194332
Citations
  • PMC: N/A
  • Scopus: 26
  • ISI: 19