Mask guided diverse face image synthesis
Song SUN, Bo ZHAO, Muhammad MATEEN, Xin CHEN, Junhao WEN
Recent studies have shown remarkable success in the face image generation task. However, existing approaches offer limited diversity, quality, and controllability in their generated results. To address these issues, we propose a novel end-to-end learning framework that generates diverse, realistic, and controllable face images guided by face masks. The face mask provides a strong geometric constraint on a face by specifying the size and location of its components, such as the eyes, nose, and mouth. The framework consists of four components: a style encoder, a style decoder, a generator, and a discriminator. The style encoder produces a style code that represents the style of the resulting face; the generator translates the input face mask into a realistic face conditioned on the style code; the style decoder learns to reconstruct the style code from the generated face image; and the discriminator classifies an input face image as real or fake. By sampling different style codes, the proposed model can generate distinct face images that all match the same input face mask, and by manipulating the face mask we can finely control the generated face image. We empirically demonstrate the effectiveness of our approach on the mask-guided face image synthesis task.
face image generation / image translation / generative adversarial networks
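The four-component framework described in the abstract can be sketched in PyTorch as below. This is an illustrative minimal sketch, not the authors' implementation: the layer sizes, the style-code dimensionality `STYLE_DIM`, and the way the style code is broadcast and concatenated with the mask are all assumptions made for demonstration.

```python
import torch
import torch.nn as nn

STYLE_DIM = 64  # assumed style-code dimensionality


def _image_head(in_ch, out_dim):
    """Tiny conv stack mapping an image to a flat vector (illustrative)."""
    return nn.Sequential(
        nn.Conv2d(in_ch, 32, 4, stride=2, padding=1), nn.ReLU(),
        nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        nn.Linear(32, out_dim))


class Generator(nn.Module):
    """Translates a 1-channel face mask into an RGB face, conditioned on a style code."""
    def __init__(self):
        super().__init__()
        self.embed = nn.Linear(STYLE_DIM, 16)
        self.net = nn.Conv2d(1 + 16, 3, 3, padding=1)

    def forward(self, mask, style):
        b, _, h, w = mask.shape
        # Broadcast the embedded style code over the spatial grid of the mask.
        s = self.embed(style).view(b, 16, 1, 1).expand(b, 16, h, w)
        return torch.tanh(self.net(torch.cat([mask, s], dim=1)))


# Style decoder: reconstructs the style code from a generated face,
# encouraging the generator to actually use the code (diversity).
style_decoder = _image_head(3, STYLE_DIM)
# Discriminator: one real/fake logit per image.
discriminator = _image_head(3, 1)

generator = Generator()
mask = torch.zeros(2, 1, 64, 64)          # batch of face masks
z = torch.randn(2, STYLE_DIM)             # sampled style codes
fake = generator(mask, z)                 # (2, 3, 64, 64) generated faces
z_rec = style_decoder(fake)               # (2, STYLE_DIM) reconstructed codes
score = discriminator(fake)               # (2, 1) real/fake logits
```

Sampling several `z` vectors for the same `mask` yields different faces with identical geometry, which is the diversity mechanism the abstract describes.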
[1] Yan X, Yang J, Sohn K, Lee H. Attribute2image: Conditional image generation from visual attributes. In: Proceedings of the European Conference on Computer Vision. 2015, 776−791
[2] Zhang H, Xu T, Li H, Zhang S, Wang X, Huang X, Metaxas D. Stackgan: Text to photo-realistic image synthesis with stacked generative adversarial networks. In: Proceedings of the IEEE International Conference on Computer Vision. 2017, 5907−5915
[3] Zhang H, Xu T, Li H, Zhang S, Wang X, Huang X, Metaxas D. Stackgan++: Realistic image synthesis with stacked generative adversarial networks. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2017, 41(8): 1947−1962
[4] Choi Y, Choi M, Kim M, Ha J W, Kim S, Choo J. Stargan: Unified generative adversarial networks for multi-domain image-to-image translation. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2018, 8789−8797
[5] Choi Y, Uh Y, Yoo J, Ha J W. Stargan v2: Diverse image synthesis for multiple domains. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2020, 8188−8197
[6] Isola P, Zhu J Y, Zhou T, Efros A A. Image-to-image translation with conditional adversarial networks. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2017, 1125−1134
[7] Wang T C, Liu M Y, Zhu J Y, Tao A, Kautz J, Catanzaro B. High-resolution image synthesis and semantic manipulation with conditional gans. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2018, 8798−8807
[8] Liu X, Yin G, Shao J, Wang X, Li H. Learning to predict layout-to-image conditional convolutions for semantic image synthesis. In: Proceedings of Advances in Neural Information Processing Systems. 2019, 570−580
[9] Brock A, Donahue J, Simonyan K. Large scale GAN training for high fidelity natural image synthesis. In: Proceedings of the International Conference on Learning Representations. 2019, 1−35
[10] Karras T, Laine S, Aila T. A style-based generator architecture for generative adversarial networks. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2019, 4401−4410
[11] Huang X, Belongie S. Arbitrary style transfer in real-time with adaptive instance normalization. In: Proceedings of the IEEE International Conference on Computer Vision. 2017, 1501−1510
[12] Goodfellow I, Pouget-Abadie J, Mirza M, Xu B, Warde-Farley D, Ozair S, Courville A, Bengio Y. Generative adversarial nets. In: Proceedings of Advances in Neural Information Processing Systems. 2014, 2672−2680
[13] Zablotskaia P, Siarohin A, Zhao B, Sigal L. Dwnet: Dense warp-based network for pose-guided human video generation. In: Proceedings of the British Machine Vision Conference. 2019, 205.1−205.13
[14] Zhu J Y, Park T, Isola P, Efros A A. Unpaired image-to-image translation using cycle-consistent adversarial networks. In: Proceedings of the IEEE International Conference on Computer Vision. 2017, 2223−2232
[15] Zhu J Y, Zhang R, Pathak D, Darrell T, Efros A A, Wang O, Shechtman E. Toward multimodal image-to-image translation. In: Proceedings of Advances in Neural Information Processing Systems. 2017, 465−476
[16] Zhang G, Kan M, Shan S, Chen X. Generative adversarial network with spatial attention for face attribute editing. In: Proceedings of the European Conference on Computer Vision. 2018, 417−432
[17] Liu Z, Luo P, Wang X, Tang X. Deep learning face attributes in the wild. In: Proceedings of the IEEE International Conference on Computer Vision. 2015, 3730−3738
[18] Zhao B, Meng L, Yin W, Sigal L. Image generation from layout. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2019, 8584−8593
[19] Zhang M J, Wang N, Li Y, Gao X. Deep latent low-rank representation for face sketch synthesis. IEEE Transactions on Neural Networks and Learning Systems, 2019, 30(10): 3109−3123
[20] Zhang M J, Wang N, Li Y, Gao X. Neural probabilistic graphical model for face sketch synthesis. IEEE Transactions on Neural Networks and Learning Systems, 2020, 31(7): 2623−2637
[21] Zhang M J, Li J, Wang N, Gao X. Compositional model-based sketch generator in facial entertainment. IEEE Transactions on Cybernetics, 2018, 48(3): 904−915
[22] Zhang M J, Wang N, Li Y, Gao X. Bionic face sketch generator. IEEE Transactions on Cybernetics, 2019, 50(6): 2701−2714
[23] Zhang M J, Wang N, Li Y, Gao X, Tao D. Dual-transfer face sketch-photo synthesis. IEEE Transactions on Image Processing, 2019, 28(2): 642−657
[24] Zhang M J, Li Y, Wang N, Chi Y, Gao X. Cascaded face sketch synthesis under various illuminations. IEEE Transactions on Image Processing, 2019, 29
[25] He Z, Kan M, Zhang J, Shan S. PA-GAN: progressive attention generative adversarial network for facial attribute editing. arXiv: 2007.05892, 2020
[26] Gu S, Bao J, Yang H, Chen D, Wen F, Yuan L. Mask-guided portrait editing with conditional gans. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2019, 3436−3445
[27] Lee C H, Liu Z, Wu L, Luo P. Maskgan: towards diverse and interactive facial image manipulation. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2020, 5549−5558
[28] Kim T, Cha M, Kim H, Lee J K, Kim J. Learning to discover cross-domain relations with generative adversarial networks. In: Proceedings of International Conference on Machine Learning. 2017, 1857−1865
[29] Mao Q, Lee H Y, Tseng H Y, Ma S, Yang M H. Mode seeking generative adversarial networks for diverse image synthesis. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2019, 1429−1437
[30] Yang D, Hong S, Jang Y, Zhao T, Lee H. Diversity-sensitive conditional generative adversarial networks. arXiv: 1901.09024, 2019
[31] Heusel M, Ramsauer H, Unterthiner T, Nessler B, Hochreiter S. Gans trained by a two time-scale update rule converge to a local nash equilibrium. In: Proceedings of Advances in Neural Information Processing Systems. 2017, 6629−6640
[32] Zhang R, Isola P, Efros A A, Shechtman E, Wang O. The unreasonable effectiveness of deep features as a perceptual metric. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2018, 586−595
[33] Salimans T, Goodfellow I, Zaremba W, Cheung V, Radford A, Chen X. Improved techniques for training gans. In: Proceedings of Advances in Neural Information Processing Systems. 2016, 2234−2242
[34] Krizhevsky A, Sutskever I, Hinton G E. Imagenet classification with deep convolutional neural networks. In: Proceedings of Advances in Neural Information Processing Systems. 2012, 1097−1105
[35] Paszke A, Gross S, Chintala S, Chanan G, Yang E, DeVito Z, Lin Z, Desmaison A, Antiga L, Lerer A. Automatic differentiation in pytorch. In: Proceedings of Advances in Neural Information Processing Systems Workshop. 2017, 1−4
[36] Mescheder L, Geiger A, Nowozin S. Which training methods for gans do actually converge? In: Proceedings of International Conference on Machine Learning. 2018, 3481−3490
[37] Kingma D P, Ba J. Adam: A method for stochastic optimization. arXiv: 1412.6980, 2014
[38] Karras T, Aila T, Laine S, Lehtinen J. Progressive growing of gans for improved quality, stability, and variation. arXiv: 1710.10196, 2017
[39] Yazici Y, Foo C, Winkler S, Yap K H, Piliouras G, Chandrasekhar V. The unusual effectiveness of averaging in gan training. In: Proceedings of the International Conference on Learning Representations. 2019, 1−22
[40] He K, Zhang X, Ren S, Sun J. Delving deep into rectifiers: Surpassing human-level performance on imagenet classification. In: Proceedings of the IEEE International Conference on Computer Vision. 2015, 1026−1034