In this thesis, we mainly focus on image generation. However, one can still observe unsatisfying results produced by existing state-of-the-art methods. To address this limitation and further improve the quality of generated images, we propose a few novel models. The image generation task can be roughly divided into three subtasks, i.e., person image generation, scene image generation, and cross-modal translation. Person image generation can be further divided into three subtasks, namely, hand gesture generation, facial expression generation, and person pose generation. Meanwhile, scene image generation can be further divided into two subtasks, i.e., cross-view image translation and semantic image synthesis. For each task, we have proposed the corresponding solution. Specifically, for hand gesture generation, we have proposed the GestureGAN framework. For facial expression generation, we have proposed the Cycle-in-Cycle GAN (C2GAN) framework. For person pose generation, we have proposed the XingGAN and BiGraphGAN frameworks. For cross-view image translation, we have proposed the SelectionGAN framework. For semantic image synthesis, we have proposed the Local and Global GAN (LGGAN), EdgeGAN, and Dual Attention GAN (DAGAN) frameworks. Although each method was originally proposed for a certain task, we later discovered that each method is universal and can be used to solve different tasks. For instance, GestureGAN can be used to solve both hand gesture generation and cross-view image translation tasks. C2GAN can be used to solve facial expression generation, person pose generation, hand gesture generation, and cross-view image translation. SelectionGAN can be used to solve cross-view image translation, facial expression generation, person pose generation, hand gesture generation, and semantic image synthesis. Moreover, we explore cross-modal translation and propose a novel DanceGAN for audio-to-video translation.
Learning to Generate Things and Stuff: Guided Generative Adversarial Networks for Generating Human Faces, Hands, Bodies, and Natural Scenes / Tang, Hao. - (2021 May 27), pp. 1-281. [10.15168/11572_306790]
Learning to Generate Things and Stuff: Guided Generative Adversarial Networks for Generating Human Faces, Hands, Bodies, and Natural Scenes
Tang, Hao
2021-05-27
Abstract
In this thesis, we mainly focus on image generation. However, one can still observe unsatisfying results produced by existing state-of-the-art methods. To address this limitation and further improve the quality of generated images, we propose a few novel models. The image generation task can be roughly divided into three subtasks, i.e., person image generation, scene image generation, and cross-modal translation. Person image generation can be further divided into three subtasks, namely, hand gesture generation, facial expression generation, and person pose generation. Meanwhile, scene image generation can be further divided into two subtasks, i.e., cross-view image translation and semantic image synthesis. For each task, we have proposed the corresponding solution. Specifically, for hand gesture generation, we have proposed the GestureGAN framework. For facial expression generation, we have proposed the Cycle-in-Cycle GAN (C2GAN) framework. For person pose generation, we have proposed the XingGAN and BiGraphGAN frameworks. For cross-view image translation, we have proposed the SelectionGAN framework. For semantic image synthesis, we have proposed the Local and Global GAN (LGGAN), EdgeGAN, and Dual Attention GAN (DAGAN) frameworks. Although each method was originally proposed for a certain task, we later discovered that each method is universal and can be used to solve different tasks. For instance, GestureGAN can be used to solve both hand gesture generation and cross-view image translation tasks. C2GAN can be used to solve facial expression generation, person pose generation, hand gesture generation, and cross-view image translation. SelectionGAN can be used to solve cross-view image translation, facial expression generation, person pose generation, hand gesture generation, and semantic image synthesis. Moreover, we explore cross-modal translation and propose a novel DanceGAN for audio-to-video translation.File | Dimensione | Formato | |
---|---|---|---|
Thesis.pdf
Open Access dal 02/09/2021
Tipologia:
Tesi di dottorato (Doctoral Thesis)
Licenza:
Tutti i diritti riservati (All rights reserved)
Dimensione
55.61 MB
Formato
Adobe PDF
|
55.61 MB | Adobe PDF | Visualizza/Apri |
I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione