Bidirectional Transformer GAN for Long-term Human Motion Prediction

Tang, H.; Sebe, N.; Wang, W.
2023-01-01

Abstract

Mainstream motion prediction methods usually focus on short-term prediction, and their predicted long-term motions often collapse to an average pose, i.e., the freezing forecasting problem [27]. To mitigate this problem, we propose a novel Bidirectional Transformer-based Generative Adversarial Network (BiTGAN) for long-term human motion prediction. The bidirectional setup leads to consistent and smooth generation in both the forward and backward directions. In addition, to make full use of the historical motions, we split them into two parts: the first part is fed to the Transformer encoder in our BiTGAN, while the second part is used as the decoder input. This strategy alleviates the exposure bias problem [37]. Moreover, to better maintain both the local (i.e., frame-level pose) and global (i.e., video-level semantic) similarities between the predicted motion sequence and the real one, a soft dynamic time warping (Soft-DTW) loss is introduced into the generator. Finally, we utilize a dual discriminator to distinguish the predicted sequence from the real one at both the frame and sequence levels. Extensive experiments on the public Human3.6M dataset demonstrate that our proposed BiTGAN achieves state-of-the-art performance on long-term (4 s) human motion prediction and reduces the average error over all actions by 4%.
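The paper's implementation is not part of this record; purely as an illustration of the Soft-DTW term mentioned in the abstract, the following is a minimal NumPy sketch of the standard soft dynamic time warping recursion (a soft-min over the three dynamic-programming predecessors), assuming squared-Euclidean frame distances. The function name, the smoothing parameter gamma, the 48-dimensional pose vectors, and the sequence lengths are illustrative assumptions, not the authors' actual settings.

import numpy as np

def soft_dtw(pred, real, gamma=0.1):
    """Soft-DTW alignment cost between a predicted pose sequence pred (n, d)
    and a real one real (m, d), using the smoothed DP recursion with a
    soft-min over the three predecessor cells (smoothing parameter gamma)."""
    n, m = len(pred), len(real)
    # Pairwise squared-Euclidean distances between frames (local, frame-level cost).
    D = ((pred[:, None, :] - real[None, :, :]) ** 2).sum(-1)
    R = np.full((n + 1, m + 1), np.inf)
    R[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            # Soft-min over match / deletion / insertion predecessors.
            prev = np.array([R[i - 1, j - 1], R[i - 1, j], R[i, j - 1]])
            R[i, j] = D[i - 1, j - 1] - gamma * np.logaddexp.reduce(-prev / gamma)
    return R[n, m]

# Illustrative usage with random 48-D poses (dimensionality is an assumption).
predicted = np.random.randn(100, 48)
ground_truth = np.random.randn(100, 48)
print(soft_dtw(predicted, ground_truth, gamma=0.1))

Because the soft-min is smooth, this alignment cost is differentiable with respect to the frame distances, which is what allows a term of this kind to serve as a sequence-level training loss alongside adversarial objectives; a training implementation would use a batched, GPU-friendly version rather than this double loop.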
Year: 2023
Issue: 5
Authors: Zhao, My.; Tang, H.; Xie, P.; Dai, Sl.; Sebe, N.; Wang, W.
Bidirectional Transformer GAN for Long-term Human Motion Prediction / Zhao, My.; Tang, H.; Xie, P.; Dai, Sl.; Sebe, N.; Wang, W.. - In: ACM TRANSACTIONS ON MULTIMEDIA COMPUTING, COMMUNICATIONS AND APPLICATIONS. - ISSN 1551-6857. - 19:5(2023), pp. 16301-16319. [10.1145/3579359]
Files in this item:

File: Bidirectional_TOMM22.pdf
Access: open access
Description: just accepted
Type: Refereed author's manuscript (post-print)
License: All rights reserved
Size: 1.62 MB
Format: Adobe PDF

File: 3579359.pdf
Access: archive administrators only
Type: Publisher's layout (editorial version)
License: All rights reserved
Size: 4.6 MB
Format: Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this item: https://hdl.handle.net/11572/385749
Citations
  • PubMed Central: not available
  • Scopus: not available
  • Web of Science: 9
  • OpenAlex: not available