Towards Interpretable Face Morphing via Unsupervised Learning of Layer-Wise and Local Features

Cheng Yu, Chuan Chen, Wenmin Wang

CAAI Transactions on Intelligence Technology, 2026, Vol. 11, Issue (1): 137-148. DOI: 10.1049/cit2.70088

ORIGINAL RESEARCH

Abstract

Discovering meaningful face morphing is critical for applications in image synthesis. Traditional unsupervised methods rely on global or layer-wise representations, neglecting finer local details and thus limiting control over specific facial attributes. In this work, we introduce an improved unsupervised approach that leverages contrastive learning and K-means clustering to learn both layer-wise and local features (LLF) in the latent space of StyleGAN. Our method segments latent representations into multiple local components across different layers, enabling fine-grained control over attributes such as hair, eyes, and mouth. Experimental results demonstrate that LLF outperforms existing methods by providing more interpretable facial transformations while preserving high image realism, offering a promising solution for enhanced unsupervised face morphing applications. The code is available at https://github.com/disanda/LLF.
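The abstract describes clustering latent representations into local components via K-means. A minimal, hypothetical sketch of that idea (not the authors' actual implementation — the feature map here is synthetic, and the layer shape, cluster count, and all variable names are illustrative assumptions): each spatial position of a StyleGAN-style feature map is treated as a channel-dimensional vector and clustered, yielding per-pixel component masks that could localize regions such as hair, eyes, or mouth.

```python
import numpy as np

def kmeans(X, k, iters=50, seed=0):
    """Plain NumPy K-means: returns (centroids, labels)."""
    rng = np.random.default_rng(seed)
    # Initialize centroids from k distinct data points.
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(iters):
        # Assign each row to its nearest centroid (Euclidean distance).
        d = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=-1)
        labels = d.argmin(axis=1)
        # Recompute centroids; keep the old one if a cluster empties.
        for j in range(k):
            if np.any(labels == j):
                centroids[j] = X[labels == j].mean(axis=0)
    return centroids, labels

# Stand-in for one generator layer's feature map: (channels, H, W).
rng = np.random.default_rng(1)
C, H, W = 32, 8, 8
feats = rng.normal(size=(C, H, W))

# Treat each spatial position as a C-dim vector and cluster into
# k local components (illustrating the paper's local-feature idea).
X = feats.reshape(C, H * W).T      # (H*W, C)
_, labels = kmeans(X, k=4)
masks = labels.reshape(H, W)       # per-pixel component index
print(masks.shape)
```

In the paper's setting this clustering would be applied per layer, so that editing directions can be restricted to one layer-local component at a time rather than the whole latent code.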

Keywords

computer vision / generative adversarial network / neural network

Cite this article

Download citation ▾
Cheng Yu, Chuan Chen, Wenmin Wang. Towards Interpretable Face Morphing via Unsupervised Learning of Layer-Wise and Local Features. CAAI Transactions on Intelligence Technology, 2026, 11(1): 137-148. DOI: 10.1049/cit2.70088


Funding

This work was supported by the Natural Science Foundation of Chongqing, China (Grant CSTB2023NSCQ‐LZX0068), Science and Technology Research Program of Chongqing Education Commission of China (Youth Project‐KJQN202401159), and Scientific Research Foundation of Chongqing University of Technology (Grant 2023ZDZ022).

Conflicts of Interest

Wenmin Wang is an editorial board member of the journal and was not involved in the peer review process or the decision to publish this article. The authors declare that they have no conflict of interest.

Data Availability Statement

The authors have nothing to report.

Endnotes

1. https://www.wjx.cn/

