
LESS: Label-Efficient and Single-Stage Referring 3D Instance Segmentation / Liu, Xuexun; Xu, Xiaoxu; Li, Jinlong; Zhang, Qiudan; Wang, Xu; Sebe, Nicu; Ma, Lin. - (2024). (38th Conference on Neural Information Processing Systems (NeurIPS 2024), Vancouver, Canada, December 2024).

LESS: Label-Efficient and Single-Stage Referring 3D Instance Segmentation

Jinlong Li; Nicu Sebe
2024-01-01

Abstract

Referring 3D Segmentation is a vision-language task that segments, from a 3D point cloud, all points of the object specified by a query sentence. Previous works follow a two-stage paradigm: they first perform language-agnostic instance segmentation and then match the resulting instances against the given text query. However, the semantic concepts from the text query and the visual cues interact only separately during training, and both instance and semantic labels are required for each object, which is time-consuming and labor-intensive. To mitigate these issues, we propose LESS, a novel Label-Efficient and Single-Stage Referring 3D Segmentation pipeline that is supervised only by efficient binary masks. Specifically, we design a Point-Word Cross-Modal Alignment module that aligns fine-grained point features with textual embeddings. A Query Mask Predictor module and a Query-Sentence Alignment module are introduced for coarse-grained alignment between masks and the query. Furthermore, we propose an area regularization loss, which coarsely reduces irrelevant background predictions on a large scale, and a point-to-point contrastive loss, which focuses on distinguishing points with subtly similar features. Through extensive experiments, we achieve state-of-the-art performance on the ScanRefer dataset, surpassing previous methods by about 3.7% mIoU while using only binary labels. Code is available at https://github.com/mellody11/LESS.
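The abstract names two auxiliary objectives without giving formulas. Below is a minimal PyTorch sketch of one plausible form for each, inferred only from the abstract's wording: the function names, the prototype-based contrast, and the temperature value are assumptions, not the authors' implementation (see the linked repository for the actual code).

```python
import torch
import torch.nn.functional as F


def area_regularization_loss(mask_logits: torch.Tensor) -> torch.Tensor:
    # Mean predicted foreground probability over all N points; minimizing it
    # coarsely shrinks the predicted area, suppressing large background responses.
    return torch.sigmoid(mask_logits).mean()


def point_contrastive_loss(point_feats: torch.Tensor,
                           binary_mask: torch.Tensor,
                           temperature: float = 0.1) -> torch.Tensor:
    # Hypothetical prototype-based point-to-point contrast: foreground point
    # features are pulled toward the foreground prototype, background features
    # are pushed away, using only the binary ground-truth mask as supervision.
    feats = F.normalize(point_feats, dim=-1)                    # (N, C)
    fg, bg = feats[binary_mask.bool()], feats[~binary_mask.bool()]
    proto = F.normalize(fg.mean(dim=0, keepdim=True), dim=-1)   # (1, C)
    logits = torch.cat([fg @ proto.t(), bg @ proto.t()]) / temperature
    labels = torch.cat([torch.ones(len(fg), 1), torch.zeros(len(bg), 1)])
    return F.binary_cross_entropy_with_logits(logits, labels.to(logits.device))


# Usage with random stand-in data: 4096 points, 32-dim features.
feats = torch.randn(4096, 32)
mask = (torch.rand(4096) > 0.9).float()   # sparse binary ground-truth mask
logits = torch.randn(4096)                # per-point mask logits
loss = area_regularization_loss(logits) + point_contrastive_loss(feats, mask)
print(loss.item())
```

Under these assumptions, the area term acts as a sparsity prior on the predicted mask, while the contrastive term separates foreground points from visually similar background points without requiring instance or semantic labels.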
Year: 2024
Conference: 38th Conference on Neural Information Processing Systems (NeurIPS 2024)
Publication place: New York
Publisher: NeurIPS
Authors: Liu, Xuexun; Xu, Xiaoxu; Li, Jinlong; Zhang, Qiudan; Wang, Xu; Sebe, Nicu; Ma, Lin
Files in this record:

File: 10191_LESS_Label_Efficient_and.pdf
Access: open access
Type: Publisher's version (publisher's layout)
License: Creative Commons
Size: 3.14 MB
Format: Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11572/442617