In recent years, 3D models have been utilized in many applications, such as auto-drivers, 3D reconstruction, VR, and AR. However, the scarcity of 3D model data does not meet its practical demands. Thus, generating high-quality 3D models efficiently from textual descriptions is a promising but challenging way to solve this problem. In this paper, inspired by the creative mechanisms of human imagination, which concretely supplement the target model from ambiguous descriptions built upon human experiential knowledge, we propose a novel text-3D generation model (T2TD). T2TD aims to generate the target model based on the textual description with the aid of experiential knowledge. Its target creation process simulates the imaginative mechanisms of human beings. In this process, we first introduce the text-3D knowledge graph to preserve the relationship between 3D models and textual semantic information, which provides related shapes like humans' experiential information. Second, we propose an effective causal inference model to select useful feature information from these related shapes, which can remove the unrelated structure information and only retain solely the feature information strongly related to the textual description. Third, we adopt a novel multi-layer transformer structure to progressively fuse this strongly related structure information and textual information, compensating for the lack of structural information, and enhancing the final performance of the 3D generation model. The final experimental results demonstrate that our approach significantly improves 3D model generation quality and outperforms the SOTA methods on the text2shape datasets.

T2TD: Text-3D Generation Model Based on Prior Knowledge Guidance / Nie, Weizhi; Chen, Ruidong; Wang, Weijie; Lepri, Bruno; Sebe, Nicu. - In: IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE. - ISSN 0162-8828. - 47:1(2025), pp. 172-189. [10.1109/TPAMI.2024.3463753]

T2TD: Text-3D Generation Model Based on Prior Knowledge Guidance

Wang, Weijie;Lepri, Bruno;Sebe, Nicu
2025-01-01

Abstract

In recent years, 3D models have been utilized in many applications, such as auto-drivers, 3D reconstruction, VR, and AR. However, the scarcity of 3D model data does not meet its practical demands. Thus, generating high-quality 3D models efficiently from textual descriptions is a promising but challenging way to solve this problem. In this paper, inspired by the creative mechanisms of human imagination, which concretely supplement the target model from ambiguous descriptions built upon human experiential knowledge, we propose a novel text-3D generation model (T2TD). T2TD aims to generate the target model based on the textual description with the aid of experiential knowledge. Its target creation process simulates the imaginative mechanisms of human beings. In this process, we first introduce the text-3D knowledge graph to preserve the relationship between 3D models and textual semantic information, which provides related shapes like humans' experiential information. Second, we propose an effective causal inference model to select useful feature information from these related shapes, which can remove the unrelated structure information and only retain solely the feature information strongly related to the textual description. Third, we adopt a novel multi-layer transformer structure to progressively fuse this strongly related structure information and textual information, compensating for the lack of structural information, and enhancing the final performance of the 3D generation model. The final experimental results demonstrate that our approach significantly improves 3D model generation quality and outperforms the SOTA methods on the text2shape datasets.
2025
1
Nie, Weizhi; Chen, Ruidong; Wang, Weijie; Lepri, Bruno; Sebe, Nicu
T2TD: Text-3D Generation Model Based on Prior Knowledge Guidance / Nie, Weizhi; Chen, Ruidong; Wang, Weijie; Lepri, Bruno; Sebe, Nicu. - In: IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE. - ISSN 0162-8828. - 47:1(2025), pp. 172-189. [10.1109/TPAMI.2024.3463753]
File in questo prodotto:
File Dimensione Formato  
T2TD_Text-3D_Generation_Model_Based_on_Prior_Knowledge_Guidance.pdf

Solo gestori archivio

Tipologia: Versione editoriale (Publisher’s layout)
Licenza: Tutti i diritti riservati (All rights reserved)
Dimensione 4.34 MB
Formato Adobe PDF
4.34 MB Adobe PDF   Visualizza/Apri

I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione

Utilizza questo identificativo per citare o creare un link a questo documento: https://hdl.handle.net/11572/445414
Citazioni
  • ???jsp.display-item.citation.pmc??? ND
  • Scopus 3
  • ???jsp.display-item.citation.isi??? 0
  • OpenAlex ND
social impact