
Spatio-Temporal Graph Diffusion for Text-Driven Human Motion Generation / Liu, Chang; Zhao, Mengyi; Ren, Bin; Liu, Mengyuan; Sebe, Nicu. - Electronic. - (2023). (Paper presented at BMVC, held in Aberdeen, UK, 20-24 November 2023).

Spatio-Temporal Graph Diffusion for Text-Driven Human Motion Generation

Liu, Chang; Ren, Bin; Sebe, Nicu
2023-01-01

Abstract

Text-based human motion generation is challenging due to the complexity and context-dependency of natural human motions. In recent years, an increasing number of studies have used transformer-based diffusion models to tackle this task. However, over-reliance on transformers leaves the generated motions lacking in fine-grained detail. This study proposes a novel graph-network-based diffusion model to address this problem. Specifically, we use spatio-temporal graphs to capture local details at each node, with an auxiliary transformer aggregating information across all nodes. The transformer also processes conditional global information that is difficult to handle with graph networks. Our model achieves competitive results on HumanML3D, currently the largest dataset for this task, and outperforms existing diffusion models in FID and diversity, demonstrating the advantages of graph neural networks for modeling human motion data.
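The division of labour the abstract describes can be illustrated with a minimal sketch. This is not the paper's code: the function names, the toy 3-joint skeleton, and the use of mean-pooling as a stand-in for the auxiliary transformer are all illustrative assumptions. The graph step captures per-joint local detail (spatial neighbour averaging plus temporal smoothing), while the global module aggregates information across all nodes.

```python
import numpy as np

def st_graph_step(x, adj):
    """One spatio-temporal graph step on motion features (illustrative).

    x:   (T, J, D) array -- T frames, J skeleton joints, D features per joint.
    adj: (J, J) skeleton adjacency matrix, self-loops included.
    """
    deg = adj.sum(axis=1)                                         # (J,)
    # Spatial: each joint averages the features of its graph neighbours.
    x_sp = np.einsum('ij,tjd->tid', adj, x) / deg[None, :, None]
    # Temporal: average each frame with its neighbours (edge-padded).
    x_pad = np.pad(x_sp, ((1, 1), (0, 0), (0, 0)), mode='edge')
    return (x_pad[:-2] + x_pad[1:-1] + x_pad[2:]) / 3.0

def global_aggregate(x):
    """Stand-in for the auxiliary transformer: pool across all joints."""
    return x.mean(axis=1)                                         # (T, D)

# Toy skeleton: 3 joints in a chain (0-1-2), self-loops on the diagonal.
adj = np.array([[1, 1, 0],
                [1, 1, 1],
                [0, 1, 1]], dtype=float)
x = np.ones((4, 3, 2))                  # 4 frames, 3 joints, 2 feature dims
local = st_graph_step(x, adj)           # per-joint local features, (4, 3, 2)
ctx = global_aggregate(local)           # cross-node summary, (4, 2)
print(local.shape, ctx.shape)           # (4, 3, 2) (4, 2)
```

In the actual model, the graph step would be a learned layer inside the diffusion denoiser and the pooling would be replaced by transformer attention, which also ingests the text conditioning.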
2023
British Machine Vision Conference
Liu, Chang; Zhao, Mengyi; Ren, Bin; Liu, Mengyuan; Sebe, Nicu
Files in this record:
No files are associated with this record.

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this record: https://hdl.handle.net/11572/399762
Warning: the displayed data have not been validated by the university.
