3D asset generation: a survey of evolution towards autoregressive and agent-driven paradigms

Hongxing FAN, Haohua CHEN, Zehuan HUANG, Ziwei LIU, Lu SHENG

Front. Comput. Sci. ›› 2026, Vol. 20 ›› Issue (11): 2011710
DOI: 10.1007/s11704-025-50381-5

Image and Graphics
REVIEW ARTICLE

Abstract

Generating high-quality 3D assets is a fundamental challenge in computer vision and graphics. While the field has progressed significantly from early VAE/GAN approaches through diffusion models and large reconstruction models, persistent limitations hinder widespread application. Specifically, achieving high geometric and appearance fidelity, intuitive user control, versatile multi-modal conditioning, and directly usable outputs (e.g., structured meshes) remains challenging for established paradigms. This paper surveys the evolution of deep generative models for 3D content creation, with a primary focus on two emerging paradigms poised to address these shortcomings: autoregressive (AR) generation and agent-driven approaches. AR models generate assets sequentially (e.g., token-by-token or part-by-part), offering inherent potential for finer control, structured outputs, and the integration of user guidance during the step-by-step process. Agent-driven methods, in contrast, leverage the reasoning and linguistic capabilities of Large Language Models (LLMs), enabling intuitive and flexible 3D creation by decomposing complex tasks and invoking external tools through multi-agent systems. We provide a comprehensive overview of these novel techniques, discuss their potential advantages over current methods, and outline key challenges and future directions towards more capable and intelligent 3D generation systems.
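As an illustration of the step-by-step decoding described above, the following is a minimal, self-contained Python sketch of a token-by-token mesh-generation loop. It is a sketch only: the coordinate tokenization, the stub logits function, and all names are hypothetical stand-ins for a trained decoder-only transformer over mesh tokens, not the API of any surveyed system.

```python
# Toy autoregressive mesh generation: each token encodes one quantized
# vertex coordinate, so three tokens form a vertex and nine form a face.
# toy_next_token_logits is a hypothetical stand-in for a real model.
import math
import random

COORD_BINS = 128          # quantization resolution per axis
STOP_TOKEN = COORD_BINS   # sequence terminator

def toy_next_token_logits(prefix):
    """Stand-in for a decoder-only transformer; a real model would
    condition on the full token prefix (and any user guidance)."""
    rng = random.Random(len(prefix))  # deterministic per position
    return [rng.gauss(0.0, 1.0) for _ in range(COORD_BINS + 1)]

def sample(logits, temperature=1.0):
    """Temperature-scaled softmax sampling over a list of logits."""
    scaled = [l / temperature for l in logits]
    m = max(scaled)
    probs = [math.exp(l - m) for l in scaled]
    total = sum(probs)
    return random.choices(range(len(probs)),
                          weights=[p / total for p in probs])[0]

def generate_mesh_tokens(max_tokens=90):
    """Token-by-token loop: an interactive system could inspect or edit
    the partial sequence between steps, which is the control advantage
    the abstract attributes to AR generation."""
    tokens = []
    while len(tokens) < max_tokens:
        tok = sample(toy_next_token_logits(tokens))
        if tok == STOP_TOKEN:
            break
        tokens.append(tok)
    # regroup flat tokens into triangles: 9 coordinates = 3 vertices
    return [tokens[i:i + 9] for i in range(0, len(tokens) - 8, 9)]

if __name__ == "__main__":
    for face in generate_mesh_tokens():
        print("face (3 vertices, quantized xyz):", face)
```

The agent-driven paradigm can be sketched in the same spirit: an LLM planner (stubbed below) decomposes a request into calls against a registry of external 3D tools. The tool names and the plan format are assumptions for illustration, not the interface of any surveyed agent system.

```python
# Toy agent loop: plan a request, then execute external 3D tools in order.
# llm_plan is a hypothetical stand-in for prompting an actual LLM.
from typing import Callable

TOOLS: dict[str, Callable[[str], str]] = {
    "generate_shape": lambda arg: f"<mesh for '{arg}'>",
    "texture_mesh":   lambda arg: f"<textured {arg}>",
    "render_preview": lambda arg: f"<preview of {arg}>",
}

def llm_plan(request: str) -> list[tuple[str, str]]:
    """Stand-in for the LLM planner; a real agent would prompt a model
    to decompose the request into this tool-call sequence."""
    return [
        ("generate_shape", request),
        ("texture_mesh", "mesh"),
        ("render_preview", "textured mesh"),
    ]

def run_agent(request: str) -> str:
    result = ""
    for tool, arg in llm_plan(request):
        result = TOOLS[tool](arg)   # execute the external tool
        print(f"{tool}({arg!r}) -> {result}")
    return result

if __name__ == "__main__":
    run_agent("a wooden chair")
```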


Keywords

3D generation paradigms / autoregressive models / agent-driven 3D generation

Cite this article

Hongxing FAN, Haohua CHEN, Zehuan HUANG, Ziwei LIU, Lu SHENG. 3D asset generation: a survey of evolution towards autoregressive and agent-driven paradigms. Front. Comput. Sci., 2026, 20(11): 2011710. DOI: 10.1007/s11704-025-50381-5



RIGHTS & PERMISSIONS

Higher Education Press