Image Animation Using Deep Learning

Siarohin, Aliaksandr

doi:10.15168/11572_310291

Media content generation with deep learning, particularly of images and videos, has recently gained a lot of attention in the research community. One of the main reasons is the surge of interaction on social networks, which draws many people without specialized backgrounds into the media industry. This raises interest in tools that simplify the production of media content such as images and videos. Another potential avenue for deep learning methods is simplifying content generation for traditional media, especially the creation of movies, whose visual effects require significant human effort. One of the most promising directions in deep-learning-based content production is image animation, i.e. the generation of videos from a known appearance and a known movement. While GANs and other generative models have shown great progress in generating random images and videos, photorealistic quality in image animation has not yet been achieved. In this work we take a step in this direction by stating the task of image animation and proposing several new methods for solving it. In image animation we are given a single source image and a driving video, and are asked to produce a video in which the object from the source image moves like the object in the driving video. Before adapting deep learning techniques to image animation, we investigate their ability to handle a simpler but related task, namely pose-guided generation. In this task we are given a source image and a target pose, and are asked to generate the person from the source image in the target pose. We identify the main flaw of current image2image architectures and propose a new architecture, based on deformable skip connections, to address it. We use these insights to create methods for image animation. To this end we propose an architecture called Monkey-Net, which is based on keypoints learned in an unsupervised way. However, keypoints alone cannot represent all possible variations in pose movement; we therefore propose to extend the keypoint representation with local affine transformations around these keypoints in the First Order Model. Finally, we propose a new, better way of estimating the affine transformations using Principal Component Analysis (PCA).

Image Animation Using Deep Learning / Siarohin, Aliaksandr. - (2021 Jun 24), pp. 1-88. [10.15168/11572_310291]

2021-06-24

Defended on: 24 Jun 2021
Cycle: XXXIII
Academic year: 2019-2020
Department: Ingegneria e Scienza dell'Informaz (cess.4/11/12)
Doctoral programme: Information and Communication Technology
Unitn internal supervisor: Sebe, Niculae
Bi-nationally supervised doctoral thesis: no
DOI: https://dx.doi.org/10.15168/11572_310291
Language: English
Appears in types: 08.1 Doctoral Thesis

Files in this item:

File	Size	Format
phd_unitn_alisaksandr_siarohin.pdf (open access; type: Doctoral Thesis; license: Creative Commons)	45.32 MB	Adobe PDF