Text-Enhanced Zero-Shot Action Recognition: A Training-Free Approach / Bosetti, Massimo; Zhang, Shibingfeng; Liberatori, Benedetta; Zara, Giacomo; Ricci, Elisa; Rota, Paolo. - 15315:(2024), pp. 327-342. (ICPR, Kolkata, 1-5/12/2024) [10.1007/978-3-031-78354-8_21].

Text-Enhanced Zero-Shot Action Recognition: A Training-Free Approach

Bosetti, Massimo; Zara, Giacomo; Ricci, Elisa; Rota, Paolo
2024-01-01

Abstract

Vision-language models (VLMs) have demonstrated remarkable performance across various visual tasks, leveraging joint learning of visual and textual representations. While these models excel in zero-shot image tasks, their application to zero-shot video action recognition (ZS-VAR) remains challenging due to the dynamic and temporal nature of actions. Existing methods for ZS-VAR typically require extensive training on specific datasets, which can be resource-intensive and may introduce domain biases. In this work, we propose Text-Enhanced Action Recognition (TEAR), a simple approach to ZS-VAR that is training-free and does not require the availability of training data or extensive computational resources. Drawing inspiration from recent findings in the vision and language literature, we utilize action descriptors for decomposition and contextual information to enhance zero-shot action recognition. Through experiments on the UCF101, HMDB51, and Kinetics-600 datasets, we showcase the effectiveness and applicability of our proposed approach in addressing the challenges of ZS-VAR. (The code will be released later at https://github.com/MaXDL4Phys/tear).
2024
Lecture Notes in Computer Science (LNCS, volume 15315)
Springer International Publishing AG, Gewerbestrasse 11, Cham, CH-6330, Switzerland
9783031783531
9783031783548
Bosetti, Massimo; Zhang, Shibingfeng; Liberatori, Benedetta; Zara, Giacomo; Ricci, Elisa; Rota, Paolo

Use this identifier to cite or link to this document: https://hdl.handle.net/11572/470970

Citations
  • Scopus: 3
  • OpenAlex: 1