Each smile is unique: one person surely smiles in different ways (e.g. closing/opening the eyes or mouth). Given one input image of a neutral face, can we generate multiple smile videos with distinctive characteristics? To tackle this one-to-many video generation problem, we propose a novel deep learning architecture named Conditional Multi-Mode Network (CMM-Net). To better encode the dynamics of facial expressions, CMM-Net explicitly exploits facial landmarks for generating smile sequences. Specifically, a variational auto-encoder is used to learn a facial landmark embedding. This single embedding is then exploited by a conditional recurrent network, which generates a landmark embedding sequence conditioned on a specific expression (e.g. spontaneous smile). Next, the generated landmark embeddings are fed into a multi-mode recurrent landmark generator, producing a set of landmark sequences that are still associated with the given smile class but clearly distinct from each other. Finally, these landmark sequences are translated into face videos. Our experimental results demonstrate the effectiveness of CMM-Net in generating realistic videos of multiple smile expressions.
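The data flow described in the abstract can be pictured with a toy sketch. This is not the authors' implementation: the weights are random stand-ins for learned parameters, the embedding size, landmark count, and all function names are hypothetical, the variational auto-encoder is reduced to a fixed projection, and the final landmark-to-video stage is omitted. It only illustrates how one neutral-face input could fan out into several distinct landmark sequences for the same smile class.

```python
import math
import random

EMBED_DIM = 4    # hypothetical embedding size
NUM_POINTS = 5   # hypothetical number of 2-D facial landmarks


def make_matrix(rows, cols, rng):
    """Random weight matrix standing in for learned parameters."""
    return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]


def matvec(matrix, vec):
    """Plain matrix-vector product."""
    return [sum(w * x for w, x in zip(row, vec)) for row in matrix]


def encode(landmarks, enc_w):
    """Stage 1 stand-in: map flattened landmarks to a single embedding."""
    flat = [c for point in landmarks for c in point]
    return [math.tanh(v) for v in matvec(enc_w, flat)]


def conditional_sequence(embedding, label_vec, rec_w, steps):
    """Stage 2 stand-in: unroll a recurrence conditioned on the smile class."""
    seq, h = [], embedding
    for _ in range(steps):
        h = [math.tanh(a + b) for a, b in zip(matvec(rec_w, h), label_vec)]
        seq.append(h)
    return seq


def multi_mode_decode(embedding_seq, dec_w, mode_offsets):
    """Stage 3 stand-in: decode one landmark sequence per mode."""
    outputs = []
    for offset in mode_offsets:
        frames = []
        for h in embedding_seq:
            flat = matvec(dec_w, [a + b for a, b in zip(h, offset)])
            frames.append([(flat[i], flat[i + 1]) for i in range(0, len(flat), 2)])
        outputs.append(frames)
    return outputs


def generate(neutral_landmarks, label_vec, num_modes=3, steps=8, seed=0):
    """Neutral landmarks -> embedding -> embedding sequence -> K landmark sequences."""
    rng = random.Random(seed)
    enc_w = make_matrix(EMBED_DIM, 2 * NUM_POINTS, rng)
    rec_w = make_matrix(EMBED_DIM, EMBED_DIM, rng)
    dec_w = make_matrix(2 * NUM_POINTS, EMBED_DIM, rng)
    mode_offsets = [[rng.uniform(-0.5, 0.5) for _ in range(EMBED_DIM)]
                    for _ in range(num_modes)]
    z = encode(neutral_landmarks, enc_w)
    seq = conditional_sequence(z, label_vec, rec_w, steps)
    return multi_mode_decode(seq, dec_w, mode_offsets)
```

Calling `generate(neutral, [1.0, 0.0, 0.0, 0.0])` with a list of `NUM_POINTS` coordinate pairs returns `num_modes` sequences of `steps` frames each; the label vector (here assumed to have length `EMBED_DIM`) plays the role of the smile class, and the per-mode offsets are what make the sequences differ.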

Every Smile is Unique: Landmark-Guided Diverse Smile Generation / Wang, Wei; Alameda-Pineda, Xavier; Xu, Dan; Fua, Pascal; Ricci, Elisa; Sebe, Nicu. - (2018), pp. 7083-7092. (31st Meeting of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2018, Salt Lake City, UT, USA, 18-23 June) [10.1109/CVPR.2018.00740].

Every Smile is Unique: Landmark-Guided Diverse Smile Generation

Wang, Wei; Alameda-Pineda, Xavier; Xu, Dan; Fua, Pascal; Ricci, Elisa; Sebe, Nicu
2018-01-01

Year: 2018
Conference: IEEE/CVF Conference on Computer Vision and Pattern Recognition
Place of publication: Piscataway, NJ, USA
Publisher: IEEE
ISBN: 978-1-5386-6420-9
Authors: Wang, Wei; Alameda-Pineda, Xavier; Xu, Dan; Fua, Pascal; Ricci, Elisa; Sebe, Nicu
Files in this item:
File: every.pdf
Access: Archive managers only
Type: Publisher's layout
License: All rights reserved
Size: 1.95 MB
Format: Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11572/225600
Citations
  • PMC: not available
  • Scopus: 58
  • ISI: 35
  • OpenAlex: not available