
3D reconstruction and scene understanding with 3D Gaussian Splatting representation / Yan, Ziyang. - (2026 May 14).

3D reconstruction and scene understanding with 3D Gaussian Splatting representation

Yan, Ziyang
2026-05-14

Abstract

The rapid development of 3D computer vision has made the precise reconstruction and understanding of complex environments essential. At the heart of this challenge lies the need for efficient, expressive 3D representations capable of capturing geometry, appearance, and semantics. While recent neural scene representations have notably improved image-based reconstruction, they are often hindered by high computational costs, limited robustness, and poor scalability for large-scale or interactive applications. The primary objective of this thesis is to improve the performance of neural scene representations, with a particular focus on 3D Gaussian Splatting, to support both precise 3D reconstruction and comprehensive scene understanding across diverse real-world scenarios. To achieve this goal, we start by introducing NeRFBK, a comprehensive benchmark dataset designed to systematically evaluate radiance field representations against traditional photogrammetry. Next, we investigate the reconstruction of non-collaborative surfaces, such as reflective and transparent objects, which remain challenging for existing 3D reconstruction approaches. By incorporating surface normal supervision and relighting guidance into the 3D Gaussian Splatting framework, we develop a reconstruction pipeline that significantly improves geometric fidelity under sparse views and complex illumination. Furthermore, we extend 3D Gaussian representations to large-scale outdoor scene understanding by introducing RenderWorld, a unified framework that leverages 3D Gaussian Splatting to generate self-supervised 3D occupancy representations from multi-view images, enabling efficient scene reconstruction, semantic reasoning, and motion forecasting in autonomous driving scenarios. Finally, we explore the potential of 3D Gaussian representations for interactive scene manipulation. We propose 3DSceneEditor, a fully 3D-based framework that enables controllable, semantically aware editing of complex indoor scenes by operating directly on Gaussian primitives. Overall, this thesis presents a series of advances in Gaussian-based scene representations for 3D reconstruction, scene understanding, and interactive 3D applications. These contributions not only improve the fidelity and efficiency of 3D reconstruction but also advance the capability of machines to interpret and interact with complex 3D environments, bringing machine perception closer to human-level spatial understanding.
14 May 2026
XXXVIII
2025-2026
Università degli Studi di Trento
Information and Communication Technology
Remondino, Fabio
no
English
Files in this item:
No files are associated with this item.

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11572/486611
Warning: the displayed data have not been validated by the university.
