Learning storage-efficient 3D Gaussian head avatars from monocular videos via parametric adaptation and material decomposition

Guohao LI, Hongyu YANG, Di HUANG, Yunhong WANG

Front. Comput. Sci. ›› 2026, Vol. 20 ›› Issue (12) : 2012711

DOI: 10.1007/s11704-025-50214-5
Image and Graphics
RESEARCH ARTICLE


Abstract

Recent advances in 3D head avatar generation combine 3D Gaussian Splatting (3DGS) with 3D Morphable Models (3DMM) to reconstruct animatable avatars from monocular video. However, existing approaches have two critical limitations: prohibitive storage requirements caused by per-primitive animation parameters and spherical harmonics (SH) coefficients, and compromised facial fidelity caused by insufficient modeling of dynamic details. To address these challenges, we propose PBR-GAvatar, a framework with two key innovations. First, we develop a hierarchical parametric adaptation scheme that combines coarse 3DMM basis refinement via Low-Rank Adaptation (LoRA) with a lightweight Dynamic Detail Generator (DDG) that produces expression-conditioned details. Second, we introduce a material decomposition paradigm that replaces SH coefficients with compact Physically Based Rendering (PBR) textures. The framework jointly optimizes geometry, dynamics, and material properties through differentiable rendering. It achieves a 20× size reduction (under 10 MB) compared with state-of-the-art methods while demonstrating superior reconstruction fidelity on the INSTA and GBS benchmarks. The PBR material system not only reduces storage demands but also supports photorealistic relighting under arbitrary illumination conditions. Our implementation will be made publicly available at liguohao96.github.io/PBR-GAvatar/.
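To illustrate why low-rank adaptation of a linear 3DMM basis is storage-friendly, the following is a minimal sketch and not the authors' implementation. It assumes the basis is a matrix mapping expression coefficients to per-vertex offsets; the function names, the array shapes, and the sizes `V`, `K`, and `r` are hypothetical and are not taken from the paper.

```python
import numpy as np


def lora_adapted_offsets(basis, lora_a, lora_b, coeffs, scale=1.0):
    """Vertex offsets from a frozen linear basis plus a low-rank update.

    basis:  (3V, K) frozen expression basis (assumed layout)
    lora_b: (3V, r) trainable factor, initialized to zero
    lora_a: (r, K)  trainable factor, randomly initialized
    coeffs: (K,)    expression coefficients
    """
    # Apply the factors right to left so the full (3V, K) update
    # matrix lora_b @ lora_a is never materialized.
    low_rank = lora_b @ (lora_a @ coeffs)
    return basis @ coeffs + scale * low_rank


def param_counts(num_vertices, num_coeffs, rank):
    """Parameters stored for a full basis update vs. a rank-r update."""
    full = 3 * num_vertices * num_coeffs
    lora = rank * (3 * num_vertices + num_coeffs)
    return full, lora


rng = np.random.default_rng(0)
V, K, r = 5000, 100, 4  # illustrative sizes only

basis = rng.standard_normal((3 * V, K))
lora_a = rng.standard_normal((r, K)) * 0.01
lora_b = np.zeros((3 * V, r))  # zero init: adaptation starts as a no-op
coeffs = rng.standard_normal(K)

adapted = lora_adapted_offsets(basis, lora_a, lora_b, coeffs)
full, lora = param_counts(V, K, r)
print(f"full update: {full} params, rank-{r} update: {lora} params")
```

With these illustrative sizes, the rank-4 update stores 60,400 values instead of the 1,500,000 needed to store a full per-avatar basis update, which is the kind of saving that motivates refining a shared basis rather than storing per-primitive parameters.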


Keywords

storage-efficient 3D Gaussian avatars / PBR / dynamic detail modeling

Cite this article

Guohao LI, Hongyu YANG, Di HUANG, Yunhong WANG. Learning storage-efficient 3D Gaussian head avatars from monocular videos via parametric adaptation and material decomposition. Front. Comput. Sci., 2026, 20(12): 2012711. DOI: 10.1007/s11704-025-50214-5



RIGHTS & PERMISSIONS

Higher Education Press
