The integration of point and voxel representations is becoming more common in Light Detection and Ranging (LiDAR)-based 3D object detection. However, existing fusion strategies suffer from ineffective semantic alignment and contextual information loss, while relying solely on point features within regions of interest leads to geometric detail degradation and limited local–global feature integration. To tackle these challenges, we propose the Point-Voxel Attention Fusion Network (PVAFN), a novel two-stage 3D object detector that introduces a point-voxel attention fusion module based on dual-gated cross-modal interaction and a multi-pooling strategy based on density-space awareness. During the feature extraction and fusion stage, a dual-gated hierarchical attention mechanism is proposed to dynamically fuse three heterogeneous modalities—keypoint-based geometric details, voxel-wise local regularity, and Bird's-Eye-View (BEV)-level global semantics—through learnable gating functions. In the refinement stage, a density-spatial-aware multi-pooling enhancement module is designed to synergize density-aware cluster pooling and multi-scale spatial-aware pyramid pooling, efficiently capturing key geometric details and fine-grained shape structures. This design enhances the integration of local and global features while enabling adaptive multi-scale context modeling and spatially sensitive feature aggregation. Extensive experiments on the KITTI and Waymo benchmark datasets demonstrate that PVAFN achieves promising detection accuracy in 3D mean Average Precision.
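The abstract does not give the equations of the dual-gated attention mechanism, so the following is only a minimal sketch of the general idea of fusing three feature sources through learnable gates. It assumes one feature vector per keypoint for each source (point, voxel, BEV), all of the same length, and a plain sigmoid gate per source computed from the concatenated features. The function name `gated_fusion`, the weight layout, and the gate form are illustrative assumptions, not the authors' implementation.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def gated_fusion(point_feat, voxel_feat, bev_feat, weights, biases):
    """Fuse three C-dimensional feature vectors of one keypoint.

    Hypothetical layout: weights[m] is a C x 3C matrix (list of rows) and
    biases[m] a length-C list, one pair per source m. This is a generic
    sigmoid gate, not the paper's dual-gated design.
    """
    feats = [point_feat, voxel_feat, bev_feat]
    # Shared context: the three vectors concatenated (length 3C).
    context = point_feat + voxel_feat + bev_feat
    fused = [0.0] * len(point_feat)
    for feat, w, b in zip(feats, weights, biases):
        for j, value in enumerate(feat):
            # Gate in (0, 1) decides how much of this channel passes through.
            pre_activation = sum(wi * ci for wi, ci in zip(w[j], context)) + b[j]
            fused[j] += sigmoid(pre_activation) * value
    return fused


# Usage with random stand-in features and weights (C = 4).
random.seed(0)
C = 4
point_feat, voxel_feat, bev_feat = (
    [random.gauss(0.0, 1.0) for _ in range(C)] for _ in range(3)
)
weights = [
    [[random.gauss(0.0, 0.1) for _ in range(3 * C)] for _ in range(C)]
    for _ in range(3)
]
biases = [[0.0] * C for _ in range(3)]
fused = gated_fusion(point_feat, voxel_feat, bev_feat, weights, biases)
print(len(fused))  # one fused value per channel
```

With all weights and biases set to zero, every gate equals 0.5 and the output reduces to half the element-wise sum of the three inputs, which is a convenient sanity check for a sketch like this.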

PVAFN: Point-Voxel Attention Fusion Network with Multi-Pooling Enhancing for 3D Object Detection / Li, Yidi; Wen, Jiahao; Gong, Rui; Ren, Bin; Wenhao, ; Cheng, Chen; Liu, Hong; Sebe, Nicu. - In: EXPERT SYSTEMS WITH APPLICATIONS. - ISSN 0957-4174. - 281:(2025). [10.1016/j.eswa.2025.127608]

PVAFN: Point-Voxel Attention Fusion Network with Multi-Pooling Enhancing for 3D Object Detection

Bin Ren; Nicu Sebe
2025-01-01

Li, Yidi; Wen, Jiahao; Gong, Rui; Ren, Bin; Wenhao, ; Cheng, Chen; Liu, Hong; Sebe, Nicu
Files in this item:
File: 1-s2.0-S0957417425012308-main.pdf
Access: open access
Type: Publisher's version (publisher's layout)
License: Creative Commons
Size: 2.68 MB
Format: Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11572/453791