Click to Move: Controlling Video Generation with Sparse Motion

IRIS

This paper introduces Click to Move (C2M), a novel framework for video generation where the user can control the motion of the synthesized video through mouse clicks specifying simple object trajectories of the key objects in the scene. Our model receives as input an initial frame, its corresponding segmentation map and the sparse motion vectors encoding the input provided by the user. It outputs a plausible video sequence starting from the given frame and with a motion that is consistent with user input. Notably, our proposed deep architecture incorporates a Graph Convolution Network (GCN) modelling the movements of all the objects in the scene in a holistic manner and effectively combining the sparse user motion information and image features. Experimental results show that C2M outperforms existing methods on two publicly available datasets, thus demonstrating the effectiveness of our GCN framework at modelling object interactions. The source code is publicly available at https://github.com/PierfrancescoArdino/C2M.

Click to Move: Controlling Video Generation with Sparse Motion / Ardino, P.; De Nadai, M.; Lepri, B.; Ricci, E.; Lathuiliere, S. - (2021), pp. 14729-14738. (Paper presented at the 18th IEEE/CVF International Conference on Computer Vision, ICCV 2021, held virtually in 2021) [10.1109/ICCV48922.2021.01448].

Ardino P.;De Nadai M.;Lepri B.;Ricci E.;Lathuiliere S.

2021-01-01

Date of publication: 2021
Proceedings title: Proceedings of the IEEE International Conference on Computer Vision
Place of publication: Virtual
Publisher: Institute of Electrical and Electronics Engineers Inc.
ISBN: 978-1-6654-2812-5
Scopus identifier: 2-s2.0-85127815298
WOS identifier: WOS:000798743204092
All authors: Ardino, P.; De Nadai, M.; Lepri, B.; Ricci, E.; Lathuiliere, S.
Citation: Click to Move: Controlling Video Generation with Sparse Motion / Ardino, P.; De Nadai, M.; Lepri, B.; Ricci, E.; Lathuiliere, S. - (2021), pp. 14729-14738. (Paper presented at the 18th IEEE/CVF International Conference on Computer Vision, ICCV 2021, held virtually in 2021) [10.1109/ICCV48922.2021.01448].
Appears in categories: 04.1 Paper in Proceedings

Files in this item:

There are no files associated with this item.

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11572/341652

Warning: the data shown have not been validated by the university.
