Deep Energies for Estimating Three-Dimensional Facial Pose and Expression
Jane Wu, Michael Bao, Xinwei Yao, Ronald Fedkiw
While much progress has been made in capturing high-quality facial performances using motion capture markers and shape-from-shading, high-end systems typically also rely on rotoscope curves hand-drawn on the image. These curves are subjective and difficult to draw consistently; moreover, ad-hoc procedural methods are required to generate matching rotoscope curves on the synthetic renders embedded in the optimization used to determine three-dimensional (3D) facial pose and expression. We propose an alternative approach in which these curves and other keypoints are detected automatically on both the image and the synthetic renders using trained neural networks, eliminating both artist subjectivity and the ad-hoc procedures meant to mimic it. More generally, we propose using machine learning networks to implicitly define deep energies that, when minimized using classical optimization techniques, yield estimates of 3D facial pose and expression.
Numerical optimization / Neural networks / Motion capture / Face tracking
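As a rough illustration of the idea of minimizing an energy defined through a detector, the sketch below uses a toy setup that is not the authors' pipeline. The hypothetical function `detect_keypoints` stands in for "render the face with the given parameters, then run a trained keypoint network on the render"; here it is simply a 2D rigid transform of a fixed template. The energy is half the squared distance between keypoints predicted from the parameters and keypoints detected on the target, and Gauss-Newton with a finite-difference Jacobian is used as one example of a classical optimizer. All names and the choice of optimizer are assumptions made for illustration.

```python
import numpy as np

# Hypothetical template keypoints (assumption: four 2D points centered at the origin).
TEMPLATE = np.array([[-1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, -1.0]])


def detect_keypoints(params):
    """Toy stand-in for 'render with params, then run a keypoint network'.

    params = (rotation angle, x translation, y translation).
    """
    angle, tx, ty = params
    c, s = np.cos(angle), np.sin(angle)
    rot = np.array([[c, -s], [s, c]])
    return TEMPLATE @ rot.T + np.array([tx, ty])


def residual(params, target_keypoints):
    # Flattened difference between predicted and target keypoints.
    return (detect_keypoints(params) - target_keypoints).ravel()


def energy(params, target_keypoints):
    # Energy to minimize: half the sum of squared keypoint differences.
    r = residual(params, target_keypoints)
    return 0.5 * float(r @ r)


def numerical_jacobian(params, target_keypoints, eps=1e-6):
    # Forward finite differences; a real system would likely use autodiff.
    r0 = residual(params, target_keypoints)
    jac = np.zeros((r0.size, params.size))
    for i in range(params.size):
        step = np.zeros_like(params)
        step[i] = eps
        jac[:, i] = (residual(params + step, target_keypoints) - r0) / eps
    return jac


def gauss_newton(params, target_keypoints, iterations=20, damping=1e-8):
    params = np.asarray(params, dtype=float).copy()
    for _ in range(iterations):
        r = residual(params, target_keypoints)
        jac = numerical_jacobian(params, target_keypoints)
        # Lightly damped normal equations: (J^T J + damping * I) delta = -J^T r
        lhs = jac.T @ jac + damping * np.eye(params.size)
        delta = np.linalg.solve(lhs, -jac.T @ r)
        params += delta
    return params


# Keypoints "detected on the image" are simulated from known parameters.
true_params = np.array([0.3, 0.5, -0.2])
target = detect_keypoints(true_params)
estimate = gauss_newton(np.zeros(3), target)
print("estimated parameters:", estimate)
print("final energy:", energy(estimate, target))
```

In this toy problem the detector is an explicit formula, so the energy is smooth and easy to minimize. With a trained network in its place, the same structure applies, but the Jacobian would come from differentiating through the network and the renderer.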