Monocular Human Pose Estimation (HPE) aims at determining the 3D positions of human joints from a single 2D image captured by a camera. However, a single 2D point in the image may correspond to multiple points in 3D space. Typically, the uniqueness of the 2D-3D relationship is approximated using an orthographic or weak-perspective camera model. In this study, instead of relying on approximations, we advocate for utilizing the full perspective camera model. This involves estimating camera parameters and establishing a precise, unambiguous 2D-3D relationship. To do so, we introduce the EPOCH framework, comprising two main components: the pose lifter network (LiftNet) and the pose regressor network (RegNet). LiftNet utilizes the full perspective camera model to precisely estimate the 3D pose in an unsupervised manner. It takes a 2D pose and camera parameters as inputs and produces the corresponding 3D pose estimation. These inputs are obtained from RegNet, which starts from a single image and provides estimates for the 2D pose and camera parameters. RegNet utilizes only 2D pose data as weak supervision. Internally, RegNet predicts a 3D pose, which is then projected to 2D using the estimated camera parameters. This process enables RegNet to establish the unambiguous 2D-3D relationship. Our experiments show that modeling the lifting as an unsupervised task with a camera in-the-loop results in better generalization to unseen data. We obtain state-of-the-art results for the 3D HPE on the Human3.6M and MPI-INF-3DHP datasets. More information on our project page: https://carstenepic.github.io/epoch/.

EPOCH: Jointly Estimating the 3D Pose of Cameras and Humans / Garau, Nicola; Martinelli, Giulia; Bisagno, Niccolò; Tomè, Denis; Stoll, Carsten. - 15635 LNCS:(2025), pp. 1-18. ( Workshops that were held in conjunction with the 18th European Conference on Computer Vision, ECCV 2024 ita 2024) [10.1007/978-3-031-91575-8_1].

EPOCH: Jointly Estimating the 3D Pose of Cameras and Humans

Nicola Garau;Giulia Martinelli;Niccolò Bisagno;
2025-01-01

Abstract

Monocular Human Pose Estimation (HPE) aims at determining the 3D positions of human joints from a single 2D image captured by a camera. However, a single 2D point in the image may correspond to multiple points in 3D space. Typically, the uniqueness of the 2D-3D relationship is approximated using an orthographic or weak-perspective camera model. In this study, instead of relying on approximations, we advocate for utilizing the full perspective camera model. This involves estimating camera parameters and establishing a precise, unambiguous 2D-3D relationship. To do so, we introduce the EPOCH framework, comprising two main components: the pose lifter network (LiftNet) and the pose regressor network (RegNet). LiftNet utilizes the full perspective camera model to precisely estimate the 3D pose in an unsupervised manner. It takes a 2D pose and camera parameters as inputs and produces the corresponding 3D pose estimation. These inputs are obtained from RegNet, which starts from a single image and provides estimates for the 2D pose and camera parameters. RegNet utilizes only 2D pose data as weak supervision. Internally, RegNet predicts a 3D pose, which is then projected to 2D using the estimated camera parameters. This process enables RegNet to establish the unambiguous 2D-3D relationship. Our experiments show that modeling the lifting as an unsupervised task with a camera in-the-loop results in better generalization to unseen data. We obtain state-of-the-art results for the 3D HPE on the Human3.6M and MPI-INF-3DHP datasets. More information on our project page: https://carstenepic.github.io/epoch/.
2025
Lecture Notes in Computer Science
GEWERBESTRASSE 11, CHAM, CH-6330, SWITZERLAND
Springer Science and Business Media Deutschland GmbH
9783031915741
9783031915758
Garau, Nicola; Martinelli, Giulia; Bisagno, Niccolò; Tomè, Denis; Stoll, Carsten
EPOCH: Jointly Estimating the 3D Pose of Cameras and Humans / Garau, Nicola; Martinelli, Giulia; Bisagno, Niccolò; Tomè, Denis; Stoll, Carsten. - 15635 LNCS:(2025), pp. 1-18. ( Workshops that were held in conjunction with the 18th European Conference on Computer Vision, ECCV 2024 ita 2024) [10.1007/978-3-031-91575-8_1].
File in questo prodotto:
Non ci sono file associati a questo prodotto.

I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione

Utilizza questo identificativo per citare o creare un link a questo documento: https://hdl.handle.net/11572/465955
 Attenzione

Attenzione! I dati visualizzati non sono stati sottoposti a validazione da parte dell'ateneo

Citazioni
  • ???jsp.display-item.citation.pmc??? ND
  • Scopus 0
  • ???jsp.display-item.citation.isi??? 0
  • OpenAlex ND
social impact