
3D reconstruction and scene understanding with 3D Gaussian Splatting representation / Yan, Ziyang. - (2026 May 14).

3D reconstruction and scene understanding with 3D Gaussian Splatting representation

Yan, Ziyang
2026-05-14

Abstract

The rapid development of 3D computer vision has made the precise reconstruction and understanding of complex environments essential. At the heart of this challenge lies the need for efficient, expressive 3D representations capable of capturing geometry, appearance, and semantics. While recent neural scene representations have notably improved image-based reconstruction, they are often hindered by high computational costs, limited robustness, and poor scalability for large-scale or interactive applications. The primary objective of this thesis is to improve the performance of neural scene representations, with a particular focus on 3D Gaussian Splatting, to support both precise 3D reconstruction and comprehensive scene understanding across diverse real-world scenarios. To achieve this goal, we start by introducing NeRFBK, a comprehensive benchmark dataset designed to systematically evaluate radiance field representations against traditional photogrammetry. Next, we investigate the reconstruction of non-collaborative surfaces, such as reflective and transparent objects, which remain challenging for existing 3D reconstruction approaches. By incorporating surface normal supervision and relighting guidance into the 3D Gaussian Splatting framework, we develop a reconstruction pipeline that significantly improves geometric fidelity under sparse views and complex illumination. Furthermore, we extend 3D Gaussian representations to large-scale outdoor scene understanding by introducing RenderWorld, a unified framework that leverages 3D Gaussian Splatting to generate self-supervised 3D occupancy representations from multi-view images, enabling efficient scene reconstruction, semantic reasoning, and motion forecasting in autonomous driving scenarios. Finally, we explore the potential of 3D Gaussian representations for interactive scene manipulation. We propose 3DSceneEditor, a fully 3D-based framework that enables controllable, semantically aware editing of complex indoor scenes by operating directly on Gaussian primitives. Overall, this thesis presents a series of advances in Gaussian-based scene representations for 3D reconstruction, scene understanding, and interactive 3D applications. These contributions not only improve the fidelity and efficiency of 3D reconstruction but also advance the capability of machines to interpret and interact with complex 3D environments, bringing machine perception closer to human-level spatial understanding.
14 May 2026
XXXVIII
2025-2026
Università degli Studi di Trento
Information and Communication Technology
Remondino, Fabio
no
English
Files in this item:
No files are associated with this item.

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11572/486611
Warning: the displayed data have not been validated by the university.
