Personalized Face Modeling for Improved Face
Reconstruction and Motion Retargeting


ECCV 2020 (Spotlight)

Bindita Chaudhuri1
Noranart Vesdapunt2
Linda Shapiro1
Baoyuan Wang2
1 University of Washington     2 Microsoft Cloud and AI



Unfortunately, we are unable to release our code, since parts of it are Microsoft confidential property.




Our end-to-end framework. Our framework takes frames from in-the-wild video(s) of a user as input and generates per-frame tracking parameters via the TrackNet and a personalized face model of the user via the ModelNet. The face model and tracking parameters are then combined to obtain the 3D reconstruction. The networks are trained together in an end-to-end manner (marked in red) by projecting the reconstructed outputs into 2D using a differentiable renderer and computing multi-image consistency losses and other regularization losses.
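To make the data flow in the figure concrete, below is a minimal PyTorch-style sketch of the forward pass as we read it from the caption. All module and variable names (the TrackNet/ModelNet wrappers, renderer, and face prior) are placeholders of our own choosing, not the released implementation, which remains confidential.

```python
import torch
import torch.nn as nn

class PersonalizedFaceFramework(nn.Module):
    """Rough sketch of the two-branch design described in the figure caption.
    Module and tensor names are illustrative assumptions, not the actual code."""

    def __init__(self, track_net, model_net, renderer, face_prior):
        super().__init__()
        self.track_net = track_net   # per-frame motion: expression weights, pose, lighting
        self.model_net = model_net   # per-user model: blendshape and albedo corrections
        self.renderer = renderer     # differentiable renderer used for 2D projection
        self.prior = face_prior      # 3DMM prior: mean shape, blendshapes, mean albedo

    def forward(self, frames):
        # TrackNet: per-frame tracking parameters
        expr_weights, pose, lighting = self.track_net(frames)

        # ModelNet: personalized corrections shared across all frames of the user
        blendshape_corr, albedo_corr = self.model_net(frames)

        # Combine the 3DMM prior, personalized corrections, and per-frame motion
        verts = self.prior.mean_shape + (
            (self.prior.blendshapes + blendshape_corr) * expr_weights[..., None, None]
        ).sum(dim=1)
        albedo = self.prior.mean_albedo + albedo_corr

        # Project back to 2D and compare against the input frames
        rendered = self.renderer(verts, albedo, pose, lighting)
        photo_loss = (rendered - frames).abs().mean()  # multi-image consistency term
        return rendered, photo_loss
```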

Abstract


Traditional methods for image-based 3D face reconstruction and facial motion retargeting fit a 3D morphable model (3DMM) to the face, which has limited modeling capacity and fails to generalize well to in-the-wild data. Using deformation transfer or a multilinear tensor as a personalized 3DMM for blendshape interpolation does not address the fact that facial expressions result in different local and global skin deformations in different people. Moreover, existing methods learn a single albedo per user, which is not enough to capture expression-specific skin reflectance variations. We propose an end-to-end framework that jointly learns a personalized face model per user and per-frame facial motion parameters from a large corpus of in-the-wild videos of user expressions. Specifically, we learn user-specific expression blendshapes and dynamic (expression-specific) albedo maps by predicting personalized corrections on top of a 3DMM prior. We introduce novel training constraints to ensure that the corrected blendshapes retain their semantic meanings and the reconstructed geometry is disentangled from the albedo. Experimental results show that our personalization accurately captures fine-grained facial dynamics in a wide range of conditions and efficiently decouples the learned face model from facial motion, resulting in more accurate face reconstruction and facial motion retargeting compared to state-of-the-art methods.
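As a rough illustration of the correction scheme described in the abstract (the parameterization below is our own notation, not taken verbatim from the paper), the per-frame geometry S_t and albedo A_t of a user can be written as a 3DMM prior plus learned personalized corrections:

S_t = \bar{S} + \Delta\bar{S} + \sum_{i=1}^{K} w_{t,i}\,(B_i + \Delta B_i), \qquad
A_t = \bar{A} + \Delta\bar{A} + \sum_{i=1}^{K} w_{t,i}\,\Delta A_i

where \bar{S} and B_i are the 3DMM mean shape and expression blendshapes, \bar{A} is the prior albedo, the \Delta terms are the user-specific corrections predicted by the ModelNet, and w_{t,i} are the per-frame expression weights predicted by the TrackNet. The expression-weighted albedo corrections are what make the learned skin reflectance dynamic rather than a single static map per user.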




Results for Images



Qualitative results of our method. Our modeling network accurately captures high-fidelity facial details specific to the user, thereby enabling the tracking network to better learn user-independent facial motion. Our network can handle a wide variety of expressions, head poses, lighting conditions, ages, ethnicities, facial hair, and makeup.



Results for Videos




Acknowledgements

We would like to thank Muscle Wu, Zeyu Chen, Wenbin Zhu and other members of the AI Perception team at Microsoft Cloud+AI for their help with the project. We would also like to thank Alex Colburn from UW Graphics and Imaging Laboratory for his valuable discussions. Thanks to https://richzhang.github.io/colorization/ for this webpage template.