Human Face and Behavior Analysis / Wang, Wei. - (2018), pp. 1-101.

Human Face and Behavior Analysis

Wang, Wei
2018-01-01

Abstract

Human face and behavior analysis are important research topics in computer vision with broad applications in everyday life. Face alignment, face aging, facial expression analysis and action recognition, for instance, have been well studied and applied in security and entertainment. Face analysis techniques such as face aging can improve the performance of cross-age face verification systems, which banks and electronic devices now use to recognize their clients. Action recognition systems can help summarize user-uploaded videos or generate logs for surveillance videos, which makes it easier to retrieve videos accurately. Dictionary learning and neural networks are powerful machine learning models for these tasks.

We first focus on multi-view action recognition. A class-wise dictionary is pre-trained so that the sparse representations of the between-class videos from different views lie close together. We then integrate the classifiers and the dictionary learning model into a unified model that learns the dictionary and the classifiers jointly.

For face alignment, we frame the standard cascaded face alignment problem as a recurrent process using a recurrent neural network. Importantly, combining a convolutional neural network with a recurrent one removes the need for hand-crafted features and lets the model learn task-specific features.

For face aging, the framework takes a single image as input and automatically outputs a series of aged faces. Since human face aging is a smooth progression, it is more appropriate to age the face through smooth transitional states, so that the intermediate aged faces between age groups can also be generated. To this end, we employ a recurrent neural network in a recurrent face aging (RFA) framework. The hidden units in the RFA are connected autoregressively, which allows the framework to age a person by referring to the previously generated aged faces.

For smile video generation, one person may smile in different ways (e.g., with the eyes or mouth open or closed). This is a one-to-many image-to-video generation problem, and we introduce a deep neural architecture named conditional multi-mode network (CMM-Net) to address it. A multi-mode recurrent generator is trained to induce diversity and to generate K different sequences of video frames.
2018
XXX
2018-2019
Ingegneria e scienza dell'Informaz (29/10/12-)
Information and Communication Technology
Sebe, Nicu
no
English
Sector INF/01 - Computer Science
Files in this item:

File: phd_thesis.pdf
Access: Open Access from 09/05/2020
Type: Doctoral Thesis
License: All rights reserved
Size: 18.66 MB
Format: Adobe PDF

File: Disclaimer_Wei.pdf
Access: Archive managers only
Type: Doctoral Thesis
License: All rights reserved
Size: 1.06 MB
Format: Adobe PDF

Use this identifier to cite or link to this document: https://hdl.handle.net/11572/367945