Joint Face Detection and Facial Motion Retargeting
for Multiple Faces

CVPR 2019

Bindita Chaudhuri¹

Noranart Vesdapunt²

Baoyuan Wang²

¹ University of Washington ² Microsoft Cloud and AI

Unfortunately, we won't be able to release our code since some parts of it are Microsoft's confidential property.

Retargeting results from 2D human face(s) to 3D character(s) using our approach. These examples are screenshots taken during live performance capture experiments run on the CPU of a regular PC with a webcam. More examples from our test sets are shown below.

Abstract

Facial motion retargeting, an important problem in both computer graphics and vision, involves capturing the performance of a human face and transferring it to another 3D character. Learning 3D morphable model (3DMM) parameters from 2D face images using convolutional neural networks is common in 2D face alignment, 3D face reconstruction, etc. However, existing methods either require an additional face detection step before retargeting or use a cascade of separate networks to perform detection followed by retargeting in sequence. In this paper, we present a single end-to-end network to jointly predict the bounding box locations and 3DMM parameters for multiple faces. First, we design a novel multitask learning framework that learns a disentangled representation of 3DMM parameters for a single face. Then, we leverage the trained single face model to generate ground truth 3DMM parameters for multiple faces to train another network that performs joint face detection and motion retargeting for images with multiple faces. Experimental results show that our joint detection and retargeting network has high face detection accuracy and is robust to extreme expressions and poses while being faster than state-of-the-art methods.

Results for Images

The results here show the 3DMM rendered with the parameters predicted by our networks for single-face and multi-face images. During retargeting, we transfer only the expression and rotation parameters to the 3D characters, as shown in the teaser image at the top of this page. In the paper, we demonstrate that our approach disentangles these parameters from the other parameters, resulting in accurate retargeted facial motion on 3D characters.
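To make the retargeting step concrete, here is a minimal sketch of how predicted expression and rotation parameters could drive a character mesh. It is not our released implementation. It assumes a linear blendshape character (a neutral mesh plus one target mesh per expression coefficient) and Euler-angle rotation in radians; the names `euler_to_matrix` and `retarget`, the array shapes, and the rotation order are all hypothetical choices made for illustration.

```python
import numpy as np


def euler_to_matrix(pitch, yaw, roll):
    """Build a 3x3 rotation matrix from Euler angles in radians.

    Assumed convention (illustrative only): R = Rz(roll) @ Ry(yaw) @ Rx(pitch).
    """
    cx, sx = np.cos(pitch), np.sin(pitch)
    cy, sy = np.cos(yaw), np.sin(yaw)
    cz, sz = np.cos(roll), np.sin(roll)
    rx = np.array([[1.0, 0.0, 0.0], [0.0, cx, -sx], [0.0, sx, cx]])
    ry = np.array([[cy, 0.0, sy], [0.0, 1.0, 0.0], [-sy, 0.0, cy]])
    rz = np.array([[cz, -sz, 0.0], [sz, cz, 0.0], [0.0, 0.0, 1.0]])
    return rz @ ry @ rx


def retarget(neutral, targets, expr_weights, rotation):
    """Pose a character mesh from expression weights and a head rotation.

    neutral:      (V, 3) neutral character vertices
    targets:      (K, V, 3) one character mesh per expression coefficient
    expr_weights: (K,) expression coefficients predicted for the human face
    rotation:     (3, 3) head rotation matrix
    Identity, translation and scale are deliberately not transferred.
    """
    neutral = np.asarray(neutral, dtype=float)
    deltas = np.asarray(targets, dtype=float) - neutral[None]  # (K, V, 3)
    # Weighted sum of per-expression offsets added to the neutral shape.
    posed = neutral + np.tensordot(np.asarray(expr_weights, dtype=float), deltas, axes=1)
    # Rotate every vertex (row vectors, hence the transpose).
    return posed @ np.asarray(rotation, dtype=float).T
```

Because only the expression weights and the rotation enter `retarget`, the character keeps its own geometry regardless of the identity of the captured face, which is why disentangling these parameters from the rest matters.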

Results for single face images using our multi-scale single face retargeting network. Top row: input images, bottom row: predicted 3DMM parameters rendered on top of the input images.

Results for images with multiple faces using our multi-scale multi-face retargeting network. Top row: input images with predicted face bounding boxes in green rectangles, bottom row: predicted 3DMM parameters rendered on top of each face in the input images.

Results for Videos

Acknowledgements

We would like to thank Pai Zhang, Muscle Wu, Xiang Yan, Zeyu Chen and other members of the Visual Intelligence Group at Microsoft Research AI for their help with the project. We would also like to thank Linda Shapiro, Alex Colburn and Barbara Mones from the UW Graphics and Imaging Laboratory for valuable discussions. Thanks to https://github.com/experiencor/keras-yolo2 for sharing their code and to https://richzhang.github.io/colorization/ for this webpage template.