Cues3D: Unleashing the power of sole NeRF for consistent and unique instances in open-vocabulary 3D panoptic segmentation

Xue, Feng; Xu, Wenzhuang; Zhong, Guofeng; Ming, Anlong; Sebe, Nicu
2025-01-01

Abstract

Open-vocabulary 3D panoptic segmentation has recently emerged as a significant research trend. Top-performing methods currently integrate 2D segmentation with geometry-aware 3D primitives. However, this advantage is lost when high-fidelity 3D point clouds are unavailable, as is the case for methods based on Neural Radiance Fields (NeRF), which lack the capacity to maintain consistency across partial observations. To address this, recent works have employed contrastive losses or cross-view association pre-processing to achieve view consensus. In contrast, we present Cues3D, a compact approach that relies solely on NeRF rather than on pre-computed associations. The core idea is that NeRF's implicit 3D field inherently establishes globally consistent geometry, enabling objects to be distinguished without explicit cross-view supervision. We propose a three-phase training framework for NeRF, initialization-disambiguation-refinement, in which instance IDs are corrected using the knowledge learned during initialization. Additionally, an instance disambiguation method matches NeRF-rendered 3D masks to ensure globally unique 3D instance identities. With Cues3D, we obtain a highly consistent and unique 3D instance ID for each object across views using a balanced variant of NeRF. Experiments are conducted on the ScanNet v2, ScanNet200, ScanNet++, and Replica datasets for 3D instance, panoptic, and semantic segmentation. Cues3D outperforms other 2D-image-based methods and is competitive with the latest 2D-3D merging-based methods, even surpassing them when additional 3D point clouds are used. The code link can be found in the appendix, and the code will be released on GitHub.
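The instance-disambiguation step described above (matching NeRF-rendered masks so that each object keeps one globally unique ID) can be illustrated with a minimal sketch. The sketch below is an assumption-laden simplification rather than the paper's algorithm: it greedily merges per-view instance masks by IoU in a shared, already-aligned label grid, whereas Cues3D operates on NeRF-rendered 3D masks within its initialization-disambiguation-refinement schedule. All function names, thresholds, and the greedy matching strategy are hypothetical.

```python
# Illustrative sketch only: a simplified, hypothetical take on the
# "instance disambiguation" idea from the abstract -- match per-view
# instance masks by overlap and relabel them to globally unique IDs.
import numpy as np


def mask_iou(a: np.ndarray, b: np.ndarray) -> float:
    """IoU between two boolean masks."""
    inter = np.logical_and(a, b).sum()
    union = np.logical_or(a, b).sum()
    return float(inter) / float(union) if union > 0 else 0.0


def disambiguate_instance_maps(id_maps, iou_thr=0.5):
    """Greedily relabel per-view instance ID maps to globally unique IDs.

    id_maps: list of (H, W) integer arrays rendered for the same scene from
             different viewpoints; 0 is background, >0 are per-view
             (possibly conflicting) instance IDs.
    Returns the relabeled maps and the number of global instances.
    """
    global_masks = []  # one representative boolean mask per global ID
    relabeled = []

    for id_map in id_maps:
        out = np.zeros_like(id_map)
        for local_id in np.unique(id_map):
            if local_id == 0:
                continue
            mask = id_map == local_id
            # Match against existing global instances by mask overlap.
            best_gid, best_iou = -1, 0.0
            for gid, gmask in enumerate(global_masks):
                iou = mask_iou(mask, gmask)
                if iou > best_iou:
                    best_gid, best_iou = gid, iou
            if best_iou >= iou_thr:
                gid = best_gid
                # Keep the global mask as the union of all matched views.
                global_masks[gid] = np.logical_or(global_masks[gid], mask)
            else:
                gid = len(global_masks)
                global_masks.append(mask)
            out[mask] = gid + 1  # global IDs start at 1; 0 stays background
        relabeled.append(out)

    return relabeled, len(global_masks)


if __name__ == "__main__":
    # Two toy "views" of the same scene whose local IDs disagree.
    view_a = np.zeros((4, 4), dtype=int)
    view_a[:2, :2] = 1
    view_b = np.zeros((4, 4), dtype=int)
    view_b[:2, :2] = 7  # same object, different local ID
    maps, n = disambiguate_instance_maps([view_a, view_b])
    print(n, np.unique(maps[0]), np.unique(maps[1]))  # -> 1 [0 1] [0 1]
```

In the toy run, the two conflicting local IDs (1 and 7) are merged into a single global instance, which is the behavior the disambiguation phase is meant to guarantee across all views.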
Cues3D: Unleashing the power of sole NeRF for consistent and unique instances in open-vocabulary 3D panoptic segmentation / Xue, Feng; Xu, Wenzhuang; Zhong, Guofeng; Ming, Anlong; Sebe, Nicu. - In: INFORMATION FUSION. - ISSN 1566-2535. - 122:(2025). [10.1016/j.inffus.2025.103164]
Files in this item:

File: 1-s2.0-S1566253525002374-main.pdf
Access: Repository administrators only
Type: Publisher's version (Publisher's layout)
License: All rights reserved
Size: 5.11 MB
Format: Adobe PDF


Use this identifier to cite or link to this item: https://hdl.handle.net/11572/453792