GraphMLP: A graph MLP-like architecture for 3D human pose estimation

Li, Wenhao; Liu, Mengyuan; Liu, Hong; Guo, Tianyu; Wang, Ti; Tang, Hao; Sebe, Nicu

doi:10.1016/j.patcog.2024.110925

Modern multi-layer perceptron (MLP) models have shown competitive results in learning visual representations without self-attention. However, existing MLP models are not good at capturing local details and lack prior knowledge of human body configurations, which limits their modeling power for skeletal representation learning. To address these issues, we propose a simple yet effective graph-reinforced MLP-like architecture, named GraphMLP, that combines MLPs and graph convolutional networks (GCNs) in a global-local-graphical unified architecture for 3D human pose estimation. GraphMLP incorporates the graph structure of human bodies into an MLP model to meet the domain-specific demands of 3D human pose estimation, while allowing for both local and global spatial interactions. Furthermore, we propose to flexibly and efficiently extend GraphMLP to the video domain and show that complex temporal dynamics can be effectively modeled in a simple way, with a negligible increase in computational cost as the sequence length grows. To the best of our knowledge, this is the first MLP-like architecture for 3D human pose estimation in a single frame and a video sequence. Extensive experiments show that the proposed GraphMLP achieves state-of-the-art performance on two datasets, i.e., Human3.6M and MPI-INF-3DHP. Code and models are available at https://github.com/Vegetebird/GraphMLP.
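The idea of combining a global MLP branch with a local graph branch can be illustrated with a minimal sketch. This is not the authors' implementation (see the repository linked in the abstract for that); the 5-joint chain skeleton, the ReLU activation, the residual sum of the two branches, and all function and variable names below are assumptions chosen only to make the concept concrete in plain Python.

```python
import math
import random


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(p * q for p, q in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(m):
    return [list(col) for col in zip(*m)]


def add(a, b):
    return [[p + q for p, q in zip(ra, rb)] for ra, rb in zip(a, b)]


def relu(m):
    return [[max(0.0, v) for v in row] for row in m]


def normalized_adjacency(edges, num_joints):
    """Adjacency with self-loops, normalized as D^-1/2 (A + I) D^-1/2."""
    a = [[1.0 if i == j else 0.0 for j in range(num_joints)]
         for i in range(num_joints)]
    for i, j in edges:
        a[i][j] = a[j][i] = 1.0
    deg = [sum(row) for row in a]
    return [[a[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(num_joints)]
            for i in range(num_joints)]


def joint_mixing_mlp(x, w1, w2):
    """Global branch: a two-layer MLP applied across the joint axis.

    x is J x C, w1 is J x H, w2 is H x J, so every joint can see every other.
    """
    hidden = relu(matmul(transpose(x), w1))   # C x H
    return transpose(matmul(hidden, w2))      # back to J x C


def graph_conv(x, a_hat, w):
    """Local branch: each joint aggregates only its skeletal neighbours."""
    return matmul(matmul(a_hat, x), w)        # (J x J)(J x C)(C x C)


def graph_mlp_block(x, a_hat, w1, w2, wg):
    """Hypothetical block: residual sum of the global and local branches."""
    return add(x, add(joint_mixing_mlp(x, w1, w2), graph_conv(x, a_hat, wg)))


def rand_matrix(rows, cols):
    return [[random.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


def zeros(rows, cols):
    return [[0.0] * cols for _ in range(rows)]


random.seed(0)
NUM_JOINTS, CHANNELS, HIDDEN = 5, 4, 8
EDGES = [(0, 1), (1, 2), (2, 3), (3, 4)]      # toy chain, not a real skeleton
A_HAT = normalized_adjacency(EDGES, NUM_JOINTS)

x = rand_matrix(NUM_JOINTS, CHANNELS)
out = graph_mlp_block(x, A_HAT,
                      rand_matrix(NUM_JOINTS, HIDDEN),
                      rand_matrix(HIDDEN, NUM_JOINTS),
                      rand_matrix(CHANNELS, CHANNELS))
print(len(out), len(out[0]))  # 5 4
```

In this sketch the joint-mixing MLP lets any joint influence any other, while the adjacency matrix restricts the graph branch to bone-connected joints, which is one way to read the "global" and "local" interactions the abstract describes.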

GraphMLP: A graph MLP-like architecture for 3D human pose estimation / Li, W., Liu, M., Liu, H., Guo, T., Wang, T., Tang, H., Sebe, N. - In: PATTERN RECOGNITION. - ISSN 0031-3203. - 158 (2025), pp. 11092501-11092510. [10.1016/j.patcog.2024.110925]

2025-01-01

Date of publication: 2025

Journal title: PATTERN RECOGNITION

DOI: https://dx.doi.org/10.1016/j.patcog.2024.110925

Scopus identifier: 2-s2.0-85203812631

WOS identifier: WOS:001317916400001

All authors: Li, Wenhao; Liu, Mengyuan; Liu, Hong; Guo, Tianyu; Wang, Ti; Tang, Hao; Sebe, Nicu

Citation: GraphMLP: A graph MLP-like architecture for 3D human pose estimation / Li, W., Liu, M., Liu, H., Guo, T., Wang, T., Tang, H., Sebe, N. - In: PATTERN RECOGNITION. - ISSN 0031-3203. - 158 (2025), pp. 11092501-11092510. [10.1016/j.patcog.2024.110925]

Appears in types: 03.1 Journal article

Files in this item:

File	Size	Format
1-s2.0-S0031320324006769-main.pdf (Access: archive managers only; Type: Publisher's layout; License: All rights reserved)	1.88 MB	Adobe PDF