Human Face and Behavior Analysis

Wang, Wei

Human face and behavior analysis is an important research topic in computer vision, with broad applications in everyday life. For instance, face alignment, face aging, facial expression analysis and action recognition have been well studied and applied in security and entertainment. Face analysis techniques such as face aging can improve the performance of cross-age face verification systems, which banks and electronic devices now use to recognize their clients. Action recognition systems can help summarize user-uploaded videos or generate logs for surveillance videos, which in turn makes video retrieval more accurate and easier. Dictionary learning and neural networks are powerful machine learning models for these research tasks. We first focus on the multi-view action recognition task. A class-wise dictionary is pre-trained, which encourages the sparse representations of same-class videos from different views to lie close together. We then integrate the classifiers and the dictionary learning model into a unified model so that the dictionary and the classifiers are learned jointly. For face alignment, we frame the standard cascaded face alignment problem as a recurrent process by using a recurrent neural network. Importantly, by combining a convolutional neural network with a recurrent one, we avoid hand-crafted features and learn task-specific features instead. For the face aging task, our framework takes a single image as input and automatically outputs a series of aged faces. Since human face aging is a smooth progression, it is more appropriate to age the face by going through smooth transitional states; in this way, the intermediate aged faces between age groups can also be generated. To this end, we employ a recurrent neural network in a recurrent face aging (RFA) framework. The hidden units in the RFA are connected autoregressively, allowing the framework to age the person by referring to the previously generated aged faces.
For smile video generation, one person may smile in different ways (e.g., closing or opening the eyes or mouth). This is a one-to-many image-to-video generation problem, and we introduce a deep neural architecture named conditional multi-mode network (CMM-Net) to address it. A multi-mode recurrent generator is trained to induce diversity and generate K different sequences of video frames.
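
The one-to-many idea in the last paragraph can be illustrated with a toy sketch. It is not the CMM-Net architecture, whose details are not given in this record: the function names, the one-hot mode vector, and the random untrained weights are all assumptions made for illustration. It only shows how a single input code, combined with K different mode codes and an autoregressive recurrent rollout, can yield K distinct sequences.

```python
import math
import random


def make_matrix(rows, cols, rng):
    """Random weight matrix as a list of rows (stand-in for trained weights)."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


def matvec(matrix, vector):
    """Plain matrix-vector product on Python lists."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def generate_sequences(image_code, k_modes, n_frames, seed=0):
    """Roll out k_modes sequences of frame codes from one image code.

    Hypothetical helper: each mode is conditioned on a different one-hot
    vector, and every step feeds the previous hidden state back in.
    """
    rng = random.Random(seed)
    dim = len(image_code)
    w_hidden = make_matrix(dim, dim, rng)     # hidden-to-hidden weights
    w_mode = make_matrix(dim, k_modes, rng)   # mode-to-hidden weights
    sequences = []
    for k in range(k_modes):
        mode = [1.0 if j == k else 0.0 for j in range(k_modes)]
        mode_drive = matvec(w_mode, mode)
        hidden = list(image_code)             # all modes start from the same image
        frames = []
        for _ in range(n_frames):
            pre = matvec(w_hidden, hidden)
            hidden = [math.tanh(a + b) for a, b in zip(pre, mode_drive)]
            frames.append(hidden)
        sequences.append(frames)
    return sequences


# One 4-dimensional "image code", K = 3 modes, 5 frames per sequence.
seqs = generate_sequences([0.2, -0.1, 0.4, 0.0], k_modes=3, n_frames=5)
print(len(seqs), len(seqs[0]), len(seqs[0][0]))  # prints: 3 5 4
```

In a trained model the frame codes would be decoded into images and the weights would be learned; here they only make the data flow concrete.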

Human Face and Behavior Analysis / Wang, Wei. - (2018), pp. 1-101.

2018-01-01

Defended on: 2018
Cycle: XXX
Academic year: 2018-2019
Department: Ingegneria e scienza dell'Informaz (29/10/12-)
Doctoral programme: Information and Communication Technology
External supervisor: Sebe, Nicu
Bi-nationally supervised doctoral thesis: no
Language: English
Reference SSD (valid until 24/06/2024): Sector INF/01 - Computer Science
Appears in types: 08.1 Doctoral Thesis

Files in this item:

File	Access	Type	License	Size	Format
phd_thesis.pdf	Open access from 09/05/2020	Doctoral Thesis	All rights reserved	18.66 MB	Adobe PDF
Disclaimer_Wei.pdf	Archive managers only	Doctoral Thesis	All rights reserved	1.06 MB	Adobe PDF