SM4Depth: Seamless Monocular Metric Depth Estimation across Multiple Cameras and Scenes by One Model

IRIS

In the last year, universal monocular metric depth estimation (universal MMDE) has gained considerable attention, serving as the foundation model for various multimedia tasks, such as video and image editing. Nonetheless, current approaches face challenges in maintaining consistent accuracy across diverse scenes without scene-specific parameters and pre-training, hindering the practicality of MMDE. Furthermore, these methods rely on extensive datasets comprising millions, if not tens of millions, of data for training, leading to significant time and hardware expenses. This paper presents SM4Depth, a model that seamlessly works for both indoor and outdoor scenes, without needing extensive training data and GPU clusters. Firstly, to obtain consistent depth across diverse scenes, we propose a novel metric scale modeling, i.e., variation- based unnormalized depth bins. It reduces the ambiguity of the conventional metric bins and enables better adaptation to large depth gaps of scenes during training. Secondly, we propose a ''divide and conquer'' solution to reduce reliance on massive training data. Instead of estimating directly from the vast solution space, the metric bins are estimated from multiple solution sub-spaces to reduce complexity. Additionally, we introduce an uncut depth dataset, BUPT Depth, to evaluate the depth accuracy and consistency across various indoor and outdoor scenes. Trained on a consumer-grade GPU using just 150K RGB-D pairs, SM4Depth achieves outstanding performance on the most never-before-seen datasets, especially maintaining consistent accuracy across indoors and outdoors. The code can be found here.

SM4Depth: Seamless Monocular Metric Depth Estimation across Multiple Cameras and Scenes by One Model / Liu, Y.; Xue, F.; Ming, A.; Zhao, M.; Ma, H.; Sebe, N.. - (2024), pp. 3469-3478. ( 32nd ACM International Conference on Multimedia, MM 2024 aus 2024) [10.1145/3664647.3681405].

SM4Depth: Seamless Monocular Metric Depth Estimation across Multiple Cameras and Scenes by One Model

Liu Y.;Xue F.;Ming A.;Zhao M.;Ma H.;Sebe N.

2024-01-01

Abstract

In the last year, universal monocular metric depth estimation (universal MMDE) has gained considerable attention, serving as the foundation model for various multimedia tasks, such as video and image editing. Nonetheless, current approaches face challenges in maintaining consistent accuracy across diverse scenes without scene-specific parameters and pre-training, hindering the practicality of MMDE. Furthermore, these methods rely on extensive datasets comprising millions, if not tens of millions, of data for training, leading to significant time and hardware expenses. This paper presents SM4Depth, a model that seamlessly works for both indoor and outdoor scenes, without needing extensive training data and GPU clusters. Firstly, to obtain consistent depth across diverse scenes, we propose a novel metric scale modeling, i.e., variation- based unnormalized depth bins. It reduces the ambiguity of the conventional metric bins and enables better adaptation to large depth gaps of scenes during training. Secondly, we propose a ''divide and conquer'' solution to reduce reliance on massive training data. Instead of estimating directly from the vast solution space, the metric bins are estimated from multiple solution sub-spaces to reduce complexity. Additionally, we introduce an uncut depth dataset, BUPT Depth, to evaluate the depth accuracy and consistency across various indoor and outdoor scenes. Trained on a consumer-grade GPU using just 150K RGB-D pairs, SM4Depth achieves outstanding performance on the most never-before-seen datasets, especially maintaining consistent accuracy across indoors and outdoors. The code can be found here.

Scheda breve

Scheda completa

Scheda completa (DC)

	Anno di pubblicazione (Date of publication)
	
				2024
			
	Titolo del volume (Proceedings title)
	
				MM 2024 - Proceedings of the 32nd ACM International Conference on Multimedia
			
	Luogo di edizione (Place of publication)
	
				New York
			
	Casa editrice (Publisher)
	
				Association for Computing Machinery, Inc
			
	ISBN
	
				9798400706868
			
	Codice Scopus (Scopus Identifier)
	
				2-s2.0-85209824126
			
	Tutti gli autori
	
						Liu, Y.; Xue, F.; Ming, A.; Zhao, M.; Ma, H.; Sebe, N.
					
	Citazione
	
				SM4Depth: Seamless Monocular Metric Depth Estimation across Multiple Cameras and Scenes by One Model / Liu, Y.; Xue, F.; Ming, A.; Zhao, M.; Ma, H.; Sebe, N.. - (2024), pp. 3469-3478. ( 32nd ACM International Conference on Multimedia, MM 2024 aus 2024) [10.1145/3664647.3681405].

File in questo prodotto:

Non ci sono file associati a questo prodotto.

I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione

Utilizza questo identificativo per citare o creare un link a questo documento: https://hdl.handle.net/11572/439531

Attenzione

Attenzione! I dati visualizzati non sono stati sottoposti a validazione da parte dell'ateneo

Citazioni

ND

1

ND

ND

social impact