Drag2Build++: A drag-based 3D architectural mesh editing workflow based on differentiable surface modeling

Jun Yin , Pengyu Zeng , Peilin Li , Jing Zhong , Tianze Hao , Han Zheng , Shuai Lu

Front. Archit. Res., 2025, Vol. 14, Issue 6: 1602-1620. DOI: 10.1016/j.foar.2025.07.005

RESEARCH ARTICLE


Abstract

In modern architectural design, as complexity grows and demands diversify, reconstructing 3D spaces has become a crucial technique. However, existing methods remain limited to small-scale scenarios and reconstruct building-scale environments poorly, resulting in unstable mesh quality and reduced design productivity. Furthermore, the lack of real-time, interactive editing tools prolongs design iteration cycles and impedes workflow efficiency. To address these issues, we make the following contributions:

(1) We construct ArchiNet++, an architectural dataset that includes 710,180 multi-view images, 5200 SketchUp models, and corresponding camera parameters from the conceptual design phase of architectural projects.

(2) We introduce Drag2Build++, an interactive 3D mesh reconstruction framework featuring drag-based editing and three core innovations: a differentiable geometry module for fine-grained deformation, a 2D-3D rendering bridge for supervision, and a GAN-based refinement module for photorealistic texture synthesis.

(3) Comprehensive experiments demonstrate that our model excels at generating high-quality 3D meshes and enables rapid mesh editing via drag-based interactions. Furthermore, incorporating textured mesh generation into this interactive workflow improves both efficiency and modeling flexibility.

We hope this combination contributes to a more intuitive modeling process and offers a practical toolset that supports the digital transformation of architectural design.
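The drag-based editing idea in contribution (2) can be illustrated with a minimal sketch: treat the mesh vertices as optimizable parameters, pull a user-selected handle vertex toward a drag target, and add a shape-preservation term so the surrounding surface follows smoothly. This toy NumPy example is illustrative only: the grid mesh, the `lam`/`lr` values, and the simple edge-spring regularizer are assumptions for demonstration, not the paper's differentiable geometry module.

```python
import numpy as np

# Toy mesh: a 3x3 vertex grid in the z = 0 plane (stand-in for a building surface).
verts = np.array([[x, y, 0.0] for y in range(3) for x in range(3)], dtype=float)
# Grid edges, used by the shape-preservation term.
edges = [(i, i + 1) for i in range(9) if i % 3 != 2] + [(i, i + 3) for i in range(6)]
rest = {e: verts[e[0]] - verts[e[1]] for e in edges}  # rest-state edge vectors

handle = 4                            # the user clicks the centre vertex ...
target = np.array([1.0, 1.0, 0.5])    # ... and drags it 0.5 units upward
lam, lr = 0.5, 0.1                    # regularizer weight and step size (assumed)

for _ in range(500):
    grad = np.zeros_like(verts)
    # Drag term: pull the handle vertex toward the drag target.
    grad[handle] += 2.0 * (verts[handle] - target)
    # Shape term: penalize edges that deviate from their rest vectors,
    # so the surrounding mesh deforms with the handle instead of tearing.
    for i, j in edges:
        d = (verts[i] - verts[j]) - rest[(i, j)]
        grad[i] += 2.0 * lam * d
        grad[j] -= 2.0 * lam * d
    verts -= lr * grad                # gradient-descent deformation step

# After optimization the handle vertex sits at the drag target and the
# neighbouring vertices have followed it smoothly.
```

In this toy quadratic the whole patch ends up following the handle; the paper's pipeline instead supervises such deformations through a 2D-3D rendering bridge and refines textures with a GAN-based module.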

Keywords

3D architectural generation / GAN model / 3D reconstruction / Drag-based generation

Cite this article

Jun Yin, Pengyu Zeng, Peilin Li, Jing Zhong, Tianze Hao, Han Zheng, Shuai Lu. Drag2Build++: A drag-based 3D architectural mesh editing workflow based on differentiable surface modeling. Front. Archit. Res., 2025, 14(6): 1602-1620. DOI: 10.1016/j.foar.2025.07.005



RIGHTS & PERMISSIONS

© The Author(s). Publishing services by Elsevier B.V. on behalf of Higher Education Press and KeAi.
