Advance Fake Video Detection via Vision Transformers




Battocchio, Joy; Dell'Anna, Stefano; Montibeller, Andrea; Boato, Giulia

2025-01-01

Abstract

Recent advances in AI-based multimedia generation have enabled the creation of hyper-realistic images and videos, raising concerns about their potential use in spreading misinformation. Generative techniques that produce fake multimedia from prompts or existing media are widely accessible and continuously refined, which underscores the urgent need for highly accurate and generalizable methods for detecting AI-generated media, a need also underlined by new regulations such as the European Digital AI Act. In this paper, we draw inspiration from Vision Transformer (ViT)-based fake image detection and extend this idea to video. We propose an original framework that integrates ViT embeddings over time to enhance detection performance. Our method shows promising accuracy, generalization, and few-shot learning capabilities on a new, large, and diverse dataset of videos generated using five state-of-the-art open-source generative techniques, as well as on a separate dataset of videos produced by proprietary generative methods.


Date of publication: 2025
Proceedings title: Information Hiding and Multimedia Security (IH&MMSec)
Place of publication: 1601 Broadway, 10th Floor, New York, NY, United States
Publisher: Association for Computing Machinery
ISBN: 9798400718878
Scopus identifier: 2-s2.0-105031784279
WOS identifier: WOS:001540643000001
All authors: Battocchio, Joy; Dell'Anna, Stefano; Montibeller, Andrea; Boato, Giulia
Citation: Advance Fake Video Detection via Vision Transformers / Battocchio, J., Dell'Anna, S., Montibeller, A., Boato, G. - (2025), pp. 1-11. (13th ACM Workshop on Information Hiding and Multimedia Security, IHandMMSec 2025, San Jose, California, 28 June 2025) [10.1145/3733102.3733129].


Use this identifier to cite or link to this document: https://hdl.handle.net/11572/463351
