Exploring 2D and 3D Human Generation and Editing

Zhang, Jichao

doi:10.15168/11572_400992

In modern society, cameras on smart devices capture a huge number of natural images, including images of the human body and face. There is therefore strong demand for more efficient image editing to serve everyday and professional needs, including entertainment applications such as image beautification. In recent years, deep generative models have attracted considerable attention in the Artificial Intelligence field, and powerful methods such as Variational Autoencoders and Generative Adversarial Networks can generate very high-resolution, realistic images, especially of faces and human bodies. In this thesis, we build on these generative models and focus on human image generation and editing, covering local generation and editing of the eye and face regions and global generation and editing of the human body. We introduce different methods that improve on previous baselines for each human region.
1) Eye region: Gaze correction and redirection aim to manipulate the eye gaze to a desired direction. Previous gaze correction methods commonly require training data annotated with precise gaze and head pose information. To address this issue, we propose new datasets as training data and formulate gaze correction as a generative inpainting problem, which we address with two new modules.
2) Face region: Building on powerful generative models for faces, many works have learned to control the latent space to manipulate face attributes. However, they lack precise control over 3D factors such as camera pose, because they tend to ignore the underlying 3D scene rendering process. We therefore take a pre-trained 3D-aware generative model as the backbone and learn to manipulate its latent space, using attribute labels as conditional information, to achieve 3D-aware face generation and editing.
3) Human body region: 3D-aware generative models have been shown to produce realistic images of rigid or semi-rigid objects, such as faces. However, they usually struggle to generate high-quality images of non-rigid objects such as the human body, which is of great interest for many computer graphics applications. We therefore introduce semantic segmentation into the model: we split the generation pipeline into two stages and use intermediate segmentation masks to bridge them. Furthermore, by using multiple latent codes, our model can control pose, semantics, and appearance to achieve human image editing.

Exploring 2D and 3D Human Generation and Editing / Zhang, Jichao. - (2024 Feb 12), pp. 1-134. [10.15168/11572_400992]

2024-02-12

Defended on: 12 Feb 2024
Cycle: XXXI
Academic year: 2022-2023
Department: Informatica e Telecomunicazioni (cess.31/12/07)
Doctoral programme: Information and Communication Technology
Unitn internal supervisor: Sebe, Niculae
Bi-nationally supervised doctoral thesis: no
Country of the institution (in case of a bi-nationally supervised PhD thesis or other international collaborations): Italy
DOI: https://dx.doi.org/10.15168/11572_400992
Language: English
Appears in types: 08.1 Tesi di dottorato (Doctoral Thesis)

Files in this item:

File	Access	Type	License	Size	Format
Jichao_PHD_Thesis.pdf	Open access	Doctoral Thesis	All rights reserved	48.99 MB	Adobe PDF