Bringing Masked Autoencoders Explicit Contrastive Properties for Point Cloud Self-supervised Learning / Ren, Bin; Mei, Guofeng; Pani Paudel, Danda; Wang, Weijie; Li, Yawei; Liu, Mengyuan; Cucchiara, Rita; Van Gool, Luc; Sebe, Nicu. - LNCS 15478 (2024), pp. 56-75. (Paper presented at the 17th Asian Conference on Computer Vision, ACCV 2024, held in Hanoi in 2024) [10.1007/978-981-96-0963-5_4].

Bringing Masked Autoencoders Explicit Contrastive Properties for Point Cloud Self-supervised Learning

Ren, Bin; Wang, Weijie; Sebe, Nicu
2024-01-01

Abstract

Contrastive learning (CL) for Vision Transformers (ViTs) in image domains has achieved performance comparable to CL for traditional convolutional backbones. However, in 3D point cloud pre-training with ViTs, masked autoencoder (MAE) modeling remains dominant. This raises the question: can we combine the best of both worlds? To answer it, we first empirically validate that integrating MAE-based point cloud pre-training with the standard contrastive learning paradigm, even with meticulous design, can degrade performance. To address this limitation, we reintroduce CL into the MAE-based point cloud pre-training paradigm by leveraging the inherent contrastive properties of MAE. Specifically, rather than relying on extensive data augmentation as is common in the image domain, we randomly mask the input tokens twice to generate a pair of contrastive inputs. A weight-sharing encoder and two identically structured decoders then perform masked-token reconstruction. Additionally, we enforce that for an input token masked by both masks simultaneously, the two reconstructed features should be as similar as possible. This establishes an explicit contrastive constraint within the generative MAE-based pre-training paradigm, yielding our proposed method, Point-CMAE. Point-CMAE consequently improves representation quality and transfer performance over its MAE counterpart. Experimental evaluations across various downstream applications, including classification, part segmentation, and few-shot learning, demonstrate that our framework surpasses state-of-the-art techniques under standard ViTs and single-modal settings. The source code and trained models are available at https://github.com/Amazingren/Point-CMAE.
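For illustration, the following is a minimal PyTorch sketch of the dual-masking contrastive constraint described in the abstract. It is not the authors' implementation: the class names, the toy decoder, the mask-sampling scheme, and the cosine-similarity loss are illustrative assumptions, and the standard MAE point-reconstruction loss is omitted.

import torch
import torch.nn as nn
import torch.nn.functional as F

class ToyDecoder(nn.Module):
    # Illustrative stand-in for the paper's decoder: fills masked positions
    # with a learnable mask token, then projects all tokens.
    def __init__(self, dim):
        super().__init__()
        self.mask_token = nn.Parameter(torch.zeros(dim))
        self.proj = nn.Linear(dim, dim)

    def forward(self, visible_feat, mask):
        B, _, C = visible_feat.shape
        full = self.mask_token.expand(B, mask.numel(), C).clone()
        full[:, ~mask] = visible_feat        # keep encoded visible tokens
        return self.proj(full)               # (B, N, C) reconstructed features

class PointCMAESketch(nn.Module):
    # Hypothetical wrapper: one weight-sharing encoder, two identically
    # structured (but independently parameterized) decoders.
    def __init__(self, encoder, decoder_a, decoder_b, mask_ratio=0.6):
        super().__init__()
        self.encoder, self.mask_ratio = encoder, mask_ratio
        self.decoder_a, self.decoder_b = decoder_a, decoder_b

    def random_mask(self, n, device):
        # Boolean token mask: True = masked; the ratio is an assumption.
        idx = torch.rand(n, device=device).topk(int(n * self.mask_ratio)).indices
        mask = torch.zeros(n, dtype=torch.bool, device=device)
        mask[idx] = True
        return mask

    def forward(self, tokens):               # tokens: (B, N, C) patch embeddings
        n = tokens.shape[1]
        mask_a = self.random_mask(n, tokens.device)   # first random masking
        mask_b = self.random_mask(n, tokens.device)   # second random masking
        feat_a = self.decoder_a(self.encoder(tokens[:, ~mask_a]), mask_a)
        feat_b = self.decoder_b(self.encoder(tokens[:, ~mask_b]), mask_b)
        both = mask_a & mask_b                # tokens masked in *both* views
        # Explicit contrastive constraint: features reconstructed for
        # doubly-masked tokens should agree across the two decoders.
        return 1 - F.cosine_similarity(feat_a[:, both], feat_b[:, both], dim=-1).mean()

A quick smoke test, with nn.Identity() standing in for the ViT encoder: model = PointCMAESketch(nn.Identity(), ToyDecoder(32), ToyDecoder(32)); loss = model(torch.randn(2, 64, 32)). Note that with any mask ratio of at least 0.5 the two masks are guaranteed to overlap, so the constraint always has support.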
Year: 2024
Series: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Place of publication: Heidelberg
Publisher: Springer Science and Business Media Deutschland GmbH
ISBN: 9789819609628; 9789819609635
Authors: Ren, Bin; Mei, Guofeng; Pani Paudel, Danda; Wang, Weijie; Li, Yawei; Liu, Mengyuan; Cucchiara, Rita; Van Gool, Luc; Sebe, Nicu
Files in this record:
No files are associated with this record.

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11572/442591
Warning: the data displayed have not been validated by the university.

Citations
  • PMC: not available
  • Scopus: 0
  • Web of Science: not available
  • OpenAlex: not available