Multi-term attention networks for skeleton-based action recognition

IRIS

The same action can take different amounts of time in different instances, and this variation affects the accuracy of action recognition to some extent. We propose an end-to-end deep neural network called "Multi-Term Attention Networks" (MTANs), which addresses this problem by extracting temporal features at different time scales. The network consists of a Multi-Term Attention Recurrent Neural Network (MTA-RNN) and a Spatio-Temporal Convolutional Neural Network (ST-CNN). In MTA-RNN, a method for fusing multi-term temporal features is proposed to extract temporal dependencies at different time scales, and the weighted, fused temporal feature is recalibrated by an attention mechanism. Ablation studies show that this network has strong spatio-temporal dynamic modeling capabilities for actions with different time scales. We perform extensive experiments on four challenging benchmark datasets: the NTU RGB+D, UT-Kinect, Northwestern-UCLA, and UWA3DII datasets. Our method achieves better results than state-of-the-art methods, which demonstrates the effectiveness of MTANs.
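The fusion-and-recalibration idea summarized in the abstract can be illustrated with a toy sketch. This is not the authors' MTA-RNN: causal moving averages stand in for the recurrent units, and the function names (`moving_average`, `fuse_multi_term`, `recalibrate`), the window sizes, the softmax fusion weights, and the sigmoid gate are all assumptions made purely for illustration.

```python
import math


def moving_average(seq, window):
    """Causal mean over the last `window` frames (a stand-in for one 'term')."""
    out = []
    for t in range(len(seq)):
        start = max(0, t - window + 1)
        chunk = seq[start:t + 1]
        out.append(sum(chunk) / len(chunk))
    return out


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_multi_term(seq, windows, scale_logits):
    """Weighted fusion of features computed at several time scales.

    `scale_logits` are hypothetical per-scale scores; in a trained
    network they would be learned rather than hand-set.
    """
    weights = softmax(scale_logits)
    per_scale = [moving_average(seq, w) for w in windows]
    return [
        sum(wt * feats[t] for wt, feats in zip(weights, per_scale))
        for t in range(len(seq))
    ]


def recalibrate(fused):
    """Attention-style gating: each frame is rescaled by a gate in (0, 1)."""
    mean = sum(fused) / len(fused)
    gates = [1.0 / (1.0 + math.exp(-(x - mean))) for x in fused]
    return [g * x for g, x in zip(gates, fused)]


# Toy 1-D trajectory of a single joint coordinate (made-up values).
trajectory = [0.0, 1.0, 2.0, 3.0]
fused = fuse_multi_term(trajectory, windows=[1, 2], scale_logits=[0.0, 0.0])
recalibrated = recalibrate(fused)
```

Short windows follow fast motion closely while long windows smooth it, so a weighted mix lets the same feature respond to actions performed at different speeds; the gate then emphasizes or suppresses individual frames.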

Multi-term attention networks for skeleton-based action recognition / Diao, X.; Li, X.; Huang, C. - In: APPLIED SCIENCES. - ISSN 2076-3417. - 10:15(2020), p. 5326. [10.3390/APP10155326]

Diao, X.; Li, X.; Huang, C.

2020-01-01

Date of publication: 2020
Journal title: APPLIED SCIENCES
Issue number and part: 15
DOI: https://dx.doi.org/10.3390/APP10155326
Scopus identifier: 2-s2.0-85089934522
WOS identifier: WOS:000559000100001
All authors: Diao, X.; Li, X.; Huang, C.
Citation: Multi-term attention networks for skeleton-based action recognition / Diao, X.; Li, X.; Huang, C. - In: APPLIED SCIENCES. - ISSN 2076-3417. - 10:15(2020), p. 5326. [10.3390/APP10155326]

Use this identifier to cite or link to this document: https://hdl.handle.net/11572/369608
