Spatio-Temporal Graph Diffusion for Text-Driven Human Motion Generation

Liu, Chang; Ren, Bin; Sebe, Nicu
2023-01-01

Abstract

Text-based human motion generation is challenging due to the complexity and context-dependency of natural human motions. In recent years, an increasing number of studies have focused on using transformer-based diffusion models to tackle this issue. However, an over-reliance on transformers has resulted in a lack of adequate detail in the generated motions. This study proposes a novel graph network-based diffusion model to address this challenging problem. Specifically, we use spatio-temporal graphs to capture local details for each node and an auxiliary transformer to aggregate information across all nodes. In addition, the transformer processes conditional global information that is difficult to handle with graph networks. Our model achieves competitive results on HumanML3D, currently the largest dataset, and outperforms existing diffusion models in terms of FID and diversity, demonstrating the advantages of graph neural networks in modeling human motion data.
2023
British Machine Vision Conference
Spatio-Temporal Graph Diffusion for Text-Driven Human Motion Generation / Liu, Chang; Zhao, Mengyi; Ren, Bin; Liu, Mengyuan; Sebe, Nicu. - ELECTRONIC. - (2023). (Paper presented at the BMVC conference held in Aberdeen, UK, 20th November - 24th November 2023).
Liu, Chang; Zhao, Mengyi; Ren, Bin; Liu, Mengyuan; Sebe, Nicu
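
The abstract above outlines a hybrid denoiser: spatio-temporal graph layers capture per-joint local detail, while an auxiliary transformer aggregates information across all nodes and injects the text condition and diffusion timestep. Below is a minimal, illustrative PyTorch sketch of such a block; the module names, dimensions, adjacency handling, and conditioning scheme are assumptions made for illustration, not the authors' implementation.

# Illustrative sketch only: a denoiser block combining a spatio-temporal graph
# convolution over skeleton joints with a transformer that mixes information
# across nodes and injects text/timestep conditioning. All names, dimensions,
# and the adjacency construction are assumptions, not the paper's code.
import torch
import torch.nn as nn


class SpatioTemporalGraphConv(nn.Module):
    """Graph conv over joints (spatial) followed by a temporal conv per joint."""

    def __init__(self, dim, adjacency):
        super().__init__()
        # Normalized adjacency over the J skeleton joints (assumed given), shape (J, J).
        self.register_buffer("adj", adjacency)
        self.spatial = nn.Linear(dim, dim)                # mixes neighbor features
        self.temporal = nn.Conv1d(dim, dim, kernel_size=3, padding=1)
        self.norm = nn.LayerNorm(dim)

    def forward(self, x):                                 # x: (B, T, J, D)
        h = torch.einsum("jk,btkd->btjd", self.adj, self.spatial(x))
        B, T, J, D = h.shape
        h = self.temporal(h.permute(0, 2, 3, 1).reshape(B * J, D, T))
        h = h.reshape(B, J, D, T).permute(0, 3, 1, 2)     # back to (B, T, J, D)
        return self.norm(x + h)                           # residual + norm


class GraphDiffusionDenoiser(nn.Module):
    """One denoising step: local graph features plus global transformer mixing."""

    def __init__(self, dim, adjacency, text_dim=512, layers=4):
        super().__init__()
        self.graph = SpatioTemporalGraphConv(dim, adjacency)
        self.cond = nn.Linear(text_dim + dim, dim)        # fuse text + timestep
        enc = nn.TransformerEncoderLayer(dim, nhead=4, batch_first=True)
        self.transformer = nn.TransformerEncoder(enc, num_layers=layers)
        self.time_embed = nn.Sequential(nn.Linear(1, dim), nn.SiLU(), nn.Linear(dim, dim))
        self.out = nn.Linear(dim, dim)

    def forward(self, x_t, t, text_emb):                  # x_t: (B, T, J, D)
        B, T, J, D = x_t.shape
        h = self.graph(x_t)                               # local per-joint detail
        # Single conditioning token built from the text embedding and timestep.
        cond = self.cond(torch.cat([text_emb, self.time_embed(t[:, None].float())], dim=-1))
        tokens = torch.cat([cond[:, None], h.reshape(B, T * J, D)], dim=1)
        tokens = self.transformer(tokens)                 # global cross-node aggregation
        return self.out(tokens[:, 1:]).reshape(B, T, J, D)  # predicted noise

In a toy run one might pick, say, dim=64, a 22-joint skeleton as used by HumanML3D, and a CLIP-style text embedding of size 512; none of these settings are taken from the paper.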

Use this identifier to cite or link to this document: https://hdl.handle.net/11572/399762