Li, Chenxi; Wang, Weijie; Li, Qiang; Sebe, Nicu; Lepri, Bruno; Nie, Weizhi. FreeInsert: Disentangled Text-Guided Object Insertion in 3D Gaussian Scene without Spatial Priors. In Proceedings of ACM Multimedia 2025, Dublin, October 2025, pp. 10915-10924. DOI: 10.1145/3746027.3755072.
FreeInsert: Disentangled Text-Guided Object Insertion in 3D Gaussian Scene without Spatial Priors
Wang, Weijie; Sebe, Nicu; Lepri, Bruno
2025-01-01
Abstract
Text-driven object insertion in 3D scenes is an emerging task that enables intuitive scene editing through natural language. Despite its potential, existing 2D editing-based methods often rely on spatial priors such as 2D masks or 3D bounding boxes, and they struggle to ensure the consistency of inserted objects. These limitations hinder flexibility and scalability in real-world applications. In this paper, we propose FreeInsert, a novel framework that leverages foundation models (MLLMs, LGM, and diffusion models) to disentangle object generation from spatial placement, enabling unsupervised and flexible object insertion into 3D scenes without spatial priors. FreeInsert begins with an MLLM-based parser that extracts structured semantics, including object types, spatial relationships, and attachment regions, from user instructions. These semantics guide both the reconstruction of the inserted object for 3D consistency and the learning of its degrees of freedom. We first leverage the spatial reasoning capabilities of MLLMs to initialize the object's pose and scale. To further integrate the object naturally into the scene, a hierarchical spatially-aware stage refines its placement, incorporating both the spatial semantics and the priors inferred by the MLLM. Finally, the object's appearance is enhanced using the inserted-object image to improve visual fidelity. Experimental results demonstrate that FreeInsert enables semantically coherent, spatially precise, and visually realistic 3D insertions without requiring any spatial priors, offering a user-friendly and flexible editing experience. Project page: https://tjulcx.github.io/FreeInsert/.
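To make the parsing stage concrete, below is a minimal Python sketch of how an MLLM-based parser might map a free-form editing instruction to the structured semantics the abstract describes (object type, spatial relationship, attachment region). The prompt wording, the `query_mllm` callable, and the JSON field names are illustrative assumptions, not the paper's actual implementation.

```python
# Hypothetical sketch of FreeInsert's first stage: an MLLM-based parser that
# turns a free-form instruction into structured semantics. The `query_mllm`
# callable and the prompt format are assumptions made for illustration.
import json
from dataclasses import dataclass

@dataclass
class InsertionSemantics:
    object_type: str        # e.g. "vase"
    spatial_relation: str   # e.g. "on top of"
    attachment_region: str  # e.g. "the wooden table"

PARSE_PROMPT = (
    "Extract the object to insert, its spatial relation, and the scene region "
    "it attaches to from this instruction. Reply as JSON with keys "
    "'object_type', 'spatial_relation', 'attachment_region'.\n\n"
    "Instruction: {instruction}"
)

def parse_instruction(instruction: str, query_mllm) -> InsertionSemantics:
    """Ask an MLLM to extract structured semantics from a user instruction.

    `query_mllm` is a hypothetical callable (prompt -> JSON string) standing
    in for whatever multimodal LLM backend is used.
    """
    reply = query_mllm(PARSE_PROMPT.format(instruction=instruction))
    fields = json.loads(reply)
    return InsertionSemantics(
        object_type=fields["object_type"],
        spatial_relation=fields["spatial_relation"],
        attachment_region=fields["attachment_region"],
    )
```

With a stubbed MLLM, an instruction like "Place a vase on the wooden table" would yield `InsertionSemantics(object_type="vase", spatial_relation="on top of", attachment_region="the wooden table")`, which the downstream stages could then use to generate the object and initialize its pose and scale.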

| File | Size | Format |
|---|---|---|
| FreeInsert.pdf (open access; Publisher's layout; All rights reserved) | 9.17 MB | Adobe PDF |
Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.



