1 Introduction
Chinese classical gardens, represented by Jiangnan private gardens and northern imperial gardens, constitute a distinctive tradition in Chinese landscape design and have exerted broad influence on garden and landscape practices worldwide [1–2]. Rooted in philosophy, literature, and art, their design emphasizes the ideal of "harmony between heaven and humanity" and encodes cultural meanings through coherent spatial organization and symbolic details.
Over the past few decades, Chinese classical garden design has undergone a significant transformation from traditional hand drawing to parametric design [3]. Digital technologies such as modeling and rendering software have improved design accuracy and flexibility, enhanced visual representation, and given designers more effective decision-support tools [4]. However, current workflows are not yet intelligent in any substantive sense: design processes still rely heavily on designers' expertise and intuition. Because Chinese classical garden design is inherently complex, it demands a high level of professional skill, which creates a substantial barrier to entry. The design process is also labor-intensive. To produce a visually appealing plan, designers typically undertake time-consuming tasks such as modeling, rendering, and image editing, so considerable effort goes into presentation and refinement rather than conceptual design. Meanwhile, with renewed societal interest in Chinese classical gardens, contemporary projects are often shaped by heterogeneous site contexts, functional demands, cultural interpretations, and stakeholder preferences, which together may require more iterative design coordination and refinement. Consequently, reducing design difficulty, improving design quality and efficiency, and streamlining workflows have become critical challenges for the future development of Chinese classical garden design.
With the rapid advancement of artificial intelligence (AI), the design industry is undergoing a profound transformation toward intelligent workflows [5]. In this paper, the term "AIGC-assisted workflow" refers specifically to processes that leverage Artificial Intelligence–Generated Content (AIGC), emphasizing the role of AI in content generation and its assistive function in augmenting, rather than replacing, the designer's creative process. Unlike traditional design methods that focus primarily on form and function, AIGC-assisted creative tools are reshaping creative workflows by reducing limitations related to scale, scope, and technical learning requirements [6]. Designers can now use low-dimensional inputs, such as prompts, hand-drawn sketches, and rough models, as control elements to generate detailed, high-dimensional design outputs. This approach reduces reliance on repetitive drafting and manual iteration [7–8].
The application of AI in design has become increasingly widespread, influencing design paradigms across spatial scales, from urban and neighborhood planning to architecture and landscape design [9]. Its influence now extends to fields including architectural design [10–15], interior design [16], and graphic design [17]. At present, AIGC applications in design practice can generally be categorized into generative design based on intelligent algorithms and assisted design through intelligent platforms [18–19]. On the algorithmic side, Generative Adversarial Networks (GANs) and their variants (e.g., Pix2Pix, CycleGAN) have been widely employed to generate 2D design proposals and for image-to-image translation tasks [20]. Initially, these algorithms were mainly applied to architectural and urban design. For instance, Jige Steven Quan developed the Urban-GAN system for procedural urban design [21], while Xueqing Li et al. combined GANs with Generative Genetic Design (GGD) to optimize the spatial layout of pedestrian-friendly urban cooling nodes [22]. Hybrid models combining autoencoders and GANs have been applied to volumetric building form generation [23], and Graph-Constrained GANs have been introduced for modular housing floor plan design [24]. Researchers have also begun to integrate intelligent algorithms into landscape architecture, with promising preliminary results. These include automatic rendering of landscape visualizations with the PlantoGraphy system [25] and the generation and visualization of landscape layout plans with a Pix2Pix-BicycleGAN workflow [26]. These techniques have been applied in specific contexts such as pocket parks [27] and floral border compositions [28], suggesting that they can be adapted to different landscape types and scaled to further applications. Overall, research based on intelligent algorithms is broad in scope and technically mature, showing strong potential in complex spatial generation, image interpretation, and multi-objective optimization.
At the platform level, intelligent tools such as CityEngine, UrbanSim, Delve, DEEPUD, XKool, and CityCAD have emerged with capabilities for urban form generation and spatial simulation [29–30]. Concurrently, text-to-image generation platforms such as Stable Diffusion, Midjourney, and DALL-E enable designers to rapidly generate visual scenarios from natural language prompts, allowing rational analysis of a design brief to be combined with intuitive, image-based exploration. Representative applications include the systematic assessment of platform-specific potentials and limitations in architectural design [31]. However, research on intelligent algorithms substantially outpaces research on intelligent platforms in both volume and depth, particularly in architecture and urban design, while applications in landscape architecture remain comparatively limited. Furthermore, whether algorithm-based or platform-based, most current methods rely predominantly on 2D top-view plans or related imagery to represent 3D spatial qualities indirectly. This 2D, image-centric generation logic limits the ability to address the spatial integrity, layering, and immersive experience that are fundamental to complex 3D design. A critical question therefore remains: can approaches that rely primarily on 2D planar compositions adequately support spatially coherent and immersive 3D design outcomes? This issue warrants further theoretical and methodological investigation.
As a vital component of the design field, Chinese classical gardens are entering a new phase of development through the integration of advanced AI technologies. For AI to contribute meaningfully to design practice, a shared theoretical foundation bridging computer science and the principles of Chinese classical garden design is needed. Currently, AIGC-assisted design for Chinese classical gardens relies primarily on algorithms (e.g., CycleGAN, Pix2Pix, Pix2PixHD) to achieve intelligent and efficient design generation. For example, previous studies have used artificial neural network (ANN) and GAN models to learn 3D spatial structures and generate Taihu stones [32], and have trained GAN models to generate layout plans for private Chinese courtyards [33]. However, these methods depend heavily on large-scale, high-quality image datasets and generally require comprehensive feature extraction to produce high-quality outputs. They also often require advanced programming skills, which limits their accessibility in practical design applications. In contrast, the rapid development of widely accessible AIGC platforms has encouraged designers to adopt such platforms as primary creative tools. Recent reviews of AIGC-assisted design and human–AI interaction highlight a paradigm shift from automation toward co-creation [34–36]. However, as noted in previous studies, most current frameworks focus on modern urban scenarios, while research explicitly addressing the high-context cultural semantics embedded in heritage domains such as Chinese classical gardens remains limited. There is therefore a need for practical and theoretically grounded workflows based on widely used AIGC platforms.
Critically, this study aims to move beyond the simple aggregation of commercial AI tools by establishing a logic-driven generative framework intended to bridge the gap between high-context cultural semantics and low-context generative AI. The study argues that standard generative models may suffer from data bias and spatial hallucinations when interpreting non-Western architectural forms. Therefore, beyond improving efficiency, the primary contribution of this research lies in the proposed logic-driven generative framework, which addresses three limitations of current AIGC systems. First, unlike conventional text-to-image generation approaches, the framework introduces a Semantic Divergence Layer that functions as a cultural schema retrieval mechanism. By translating abstract classical theories from Yuan Ye (《园冶》) into structured prompt logic, this layer aims to reduce semantic drift commonly observed in general-purpose models. Second, to address the lack of inherent 3D reasoning in current generative systems, the framework proposes a Topological Constraint Layer based on control-guided generation techniques. This layer formalizes the classical principle of ranking the location (相地) as computational edge constraints, compelling the generative process to follow physical spatial logic rather than purely pixel-based probabilities. Third, in response to the limitations of 2D generation, the framework adopts a human-scale sequential viewing strategy through an Ontological Refinement Layer that generates coherent image sequences instead of isolated views. This approach is intended as a transitional methodology bridging current 2D paradigms and future 3D world models, while preserving the logic of "changing scenery with every step" (步移景异) through consistent topological constraints across multiple views. 
This research seeks to contribute new perspectives to the contemporary development of Chinese classical garden design and to support the continued relevance and accessibility of this cultural heritage in modern society.
2 Methods
2.1 Analyzing the Logic of Chinese Classical Garden Design
The design logic draws directly on classical Chinese aesthetics and on treatises such as Cheng Ji's Yuan Ye [37] and Zhenheng Wen's Zhang Wu Zhi (《长物志》) [38]. Specifically, the concept of borrowing scenery (借景) guides the Semantic Divergence Layer, allowing the AI to expand visual boundaries while maintaining logical coherence. Similarly, the classical principle of ranking the location from Yuan Ye was applied as the theoretical basis for the Topological Constraint Layer, which treats physical site conditions as rigid computational constraints.
Based on the integrated relationship between spatial layouts and garden elements in Chinese classical garden design①, this research examined landscape characteristics and design logic from two complementary perspectives: the organization of spatial layouts and the composition of garden elements. The analysis of garden elements concentrates on four principal components: rockeries, water, vegetation, and architecture.
① The spatial arrangement and combination of Chinese classical garden elements generate complex spatial and temporal relationships, including viewing and being viewed, primary and secondary elements with focal points, spatial contrasts, concealment and exposure, guidance and suggestion, sparsity and density, undulation and layering, void and solid, meandering paths, varied terrain, upward and downward perspectives, penetration and layering, and spatial sequences.
2.2 Analyzing the Ability of Existing AIGC Platforms in Chinese Classical Garden Design
We tested the capabilities of mainstream AIGC platforms, including ChatGPT 4.0 (with its integrated DALL-E 3 image generator), Stable Diffusion, and Midjourney, in layout and element generation to evaluate their understanding of Chinese classical gardens.
1) Viewpoint. Since AIGC platforms primarily generate 2D images, we analyzed their capability to interpret various viewpoints (e.g., plan, elevation, human perspectives, bird's-eye views). Viewpoint alignment was judged independently by three trained annotators based on horizon line position, camera elevation angle, and consistency between described and rendered occlusion relationships.
2) Spatial layouts and garden elements. The test of spatial layouts and garden elements consisted of preliminary and detailed stages. Preliminary testing utilized approximately 10 general spatial prompts to evaluate baseline comprehension and identify the most capable platforms. Subsequently, detailed testing evaluated these top-performing platforms on their understanding of specific categories within spatial layouts and garden elements.
3) Prompt standardization procedure. To ensure a fair and scientifically rigorous comparison across AIGC platforms with different syntax requirements, we established a systematic prompt standardization procedure. We designed a structured prompt template that decomposes each design concept into five core informational fields: core concept, main elements, spatial layout, art style or atmosphere, and technical parameters.
This core semantic information was then translated into platform-optimal formats: coherent narrative paragraphs for DALL-E 3, and comma-separated keywords for Midjourney and Stable Diffusion (augmented with negative prompts for the latter). This structured approach ensures that the design brief remains consistent, allowing for a valid comparison of their generative performance.
2.3 Multidimensional Evaluation Protocol
To move beyond surface-level image metrics, we established a comprehensive evaluation framework comprising four distinct stages.
1) Technical baseline. We employed established metrics from computer vision and generative model assessment. The Fréchet Inception Distance (FID) measures the distance between the feature distributions (Inception-v3 embeddings) of real and generated images, with lower values indicating closer resemblance. We first constructed a reference dataset of over 1,000 high-resolution photographs of real Chinese classical gardens, then generated a corresponding set of 1,000 images on each AIGC platform using a standardized set of prompts, and computed the FID between each generated set and the reference set. We also employed the Contrastive Language-Image Pre-training (CLIP) score, which calculates the cosine similarity between the embeddings of a text prompt and a generated image; higher values indicate closer text–image alignment. To quantify the impact of specific adjustments to the spatial layout, we computed the peak signal-to-noise ratio (PSNR) and structural similarity index measure (SSIM) between the pre-refinement and post-refinement images. For garden elements, approximately 30 preliminary tests were conducted, followed by 20 detailed tests for each category.
2) Human-centered user experience. We recruited 72 participants (37 graduate students in Landscape Architecture and 35 junior designers with 1–3 years of practical experience). Each participant was given a standardized 2-hour design task: to develop a conceptual design for a small courtyard corner in the style of a Jiangnan private garden. Each participant completed the System Usability Scale (SUS) questionnaire immediately after the task. We adopted an extended SUS structure (20 items, each rated on a 5-point Likert scale) rather than the standard 10-item version.
3) Expert cultural Turing test. We convened a panel of 26 senior landscape architecture academics and practitioners and presented them with a curated set of 20 anonymized images. The experts rated each image on a 5-point Likert scale across five criteria: cultural fidelity, spatial logic, implementability, yijing (意境)②, and aesthetic atmosphere. Inter-rater agreement was then assessed with Fleiss' kappa; a kappa value above 0.6 is generally interpreted as indicating substantial agreement.
② "Yijing" (意境) is sometimes translated as "imagery" or "artistic conception" in the relevant literature.
4) Layer-wise ablation analysis. To attribute observed improvements to specific components rather than tool choice or operator skill, we conducted controlled ablation experiments by treating the Semantic Divergence, Topological Constraint, and Ontological Refinement Layers as independent functional modules. Four ablation variants were compared under identical design tasks and operational conditions (Table 1). Minimal human selection was allowed, while cross-layer compensatory operations were strictly prohibited. By jointly analyzing quantitative metrics (FID, CLIP, PSNR, SSIM), user evaluations, and expert assessments, this comparative analysis enabled explicit identification of each layer's contribution to spatial coherence, cultural accuracy, and human post-editing effort.
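Several of the metrics in this protocol reduce to short formulas. The dependency-free sketch below shows PSNR, the cosine similarity underlying the CLIP score, SUS scoring, and Fleiss' kappa. It assumes grayscale images given as nested lists and embeddings already produced by a CLIP model. The SUS function generalizes the standard 10-item rule to n items under the assumption that the extended 20-item version keeps the alternating positive/negative item wording; that assumption is ours. FID and SSIM are omitted because they are normally computed with library implementations (for example, scikit-image's `structural_similarity` for SSIM).

```python
import math


def psnr(reference, test, max_value=255.0):
    """Peak signal-to-noise ratio (dB) between two same-sized grayscale images."""
    ref = [p for row in reference for p in row]
    out = [p for row in test for p in row]
    if len(ref) != len(out):
        raise ValueError("images must have the same size")
    mse = sum((a - b) ** 2 for a, b in zip(ref, out)) / len(ref)
    if mse == 0:
        return math.inf  # identical images
    return 10 * math.log10(max_value ** 2 / mse)


def cosine_similarity(u, v):
    """Cosine similarity between a text embedding and an image embedding.

    Common CLIP-score implementations additionally clip negative values to
    zero and rescale the result; that step is omitted here.
    """
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def sus_score(responses):
    """SUS score on a 0-100 scale from 1-5 Likert responses (item 1 first).

    Odd-numbered items contribute (r - 1), even-numbered items (5 - r), and
    the sum is rescaled to 0-100. For 10 items this is the usual "times 2.5"
    rule; applying it to 20 items is an assumption.
    """
    if not responses or any(r not in (1, 2, 3, 4, 5) for r in responses):
        raise ValueError("responses must be non-empty ratings from 1 to 5")
    total = sum(r - 1 if i % 2 == 0 else 5 - r for i, r in enumerate(responses))
    return total * 100 / (4 * len(responses))


def fleiss_kappa(counts):
    """Fleiss' kappa from a subjects x categories table of rating counts.

    counts[i][j] is the number of raters who put subject i in category j;
    every subject must be rated by the same number of raters.
    """
    n_subjects = len(counts)
    n_raters = sum(counts[0])
    if any(sum(row) != n_raters for row in counts):
        raise ValueError("every subject needs the same number of ratings")
    # Observed agreement per subject, then averaged over subjects.
    p_i = [(sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
           for row in counts]
    p_bar = sum(p_i) / n_subjects
    # Chance agreement from the overall category proportions.
    p_j = [sum(row[j] for row in counts) / (n_subjects * n_raters)
           for j in range(len(counts[0]))]
    p_e = sum(p * p for p in p_j)
    return (p_bar - p_e) / (1 - p_e)
```

For a single criterion of the expert test, for example, each of the 20 images would be one row, the five Likert levels the columns, and each row would sum to 26 ratings; how agreement was combined across the five criteria is not specified in this section.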
2.4 Exploring a Generative Framework and Operational Workflow for Chinese Classical Garden Design
In the design process, designers must provide AI tools with comprehensive and accurate prompts that encapsulate project requirements, site characteristics, and professional insights. Guiding AI platforms in the creative process is inherently iterative: designers continuously supply information to the tools and refine concepts and ideas based on the feedback. The quality and structure of these prompts therefore directly determine the efficiency and quality of the generated outputs.
The fundamental logic of Chinese classical garden design relies on integrating "spatial layouts" and "garden elements" to construct layered 3D spaces. However, current AIGC platforms lack inherent 3D spatial cognition and struggle to accurately interpret 3D characteristics of landscape spaces from 2D plan views. To bridge this gap, we proposed using 2D landscape images from non-plan perspectives as the primary design medium, dividing the workflow into two distinct stages: conceptual design and detailed design.
1) Conceptual design. Utilizing platforms with strong text-to-image capabilities, designers input detailed prompts to generate initial concepts and stylistic directions. To ensure rigorous evaluation, a conceptual design iteration is considered acceptable only when all major spatial layouts, garden elements, and stylistic markers described in the prompt are present without structural contradictions.
2) Detailed design. Building upon the accepted conceptual images, this stage involves precise, localized adjustments to resolve spatial issues such as disproportionate layouts, unclear path organization, and disconnected water-land relationships. A modification during this phase is quantitatively and qualitatively defined as successful when the PSNR or SSIM values match the intended magnitude of change, and human annotators confirm that the spatial logic is demonstrably improved relative to the previous iteration.
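The two acceptance rules can be expressed as simple predicates. The sketch below is illustrative only: it assumes that element presence and structural contradictions have already been judged, that the detailed-design check uses SSIM alone, that annotator confirmation is decided by majority vote, and that the "intended magnitude of change" is expressed as an SSIM band. The band values and labels are hypothetical placeholders, not thresholds reported by the study.

```python
from dataclasses import dataclass

# Hypothetical SSIM bands (pre- vs post-refinement image). A local fix should
# leave most of the image intact; a larger re-layout is expected to change more.
SSIM_BANDS = {"local": (0.80, 1.00), "moderate": (0.50, 0.80)}


def concept_accepted(required: set[str], present: set[str], contradictions: int) -> bool:
    """Conceptual iteration: every required layout/element/style marker is
    present and no structural contradiction was found."""
    return required <= present and contradictions == 0


@dataclass
class DetailedEdit:
    ssim: float                  # SSIM between pre- and post-refinement images
    intended_change: str         # key into SSIM_BANDS (hypothetical labels)
    annotator_votes: list[bool]  # True = spatial logic judged improved


def edit_successful(edit: DetailedEdit) -> bool:
    """Detailed iteration: change magnitude as intended AND a majority of
    annotators confirm that the spatial logic improved."""
    low, high = SSIM_BANDS[edit.intended_change]
    magnitude_ok = low <= edit.ssim <= high
    majority_ok = sum(edit.annotator_votes) > len(edit.annotator_votes) / 2
    return magnitude_ok and majority_ok


print(concept_accepted({"bridge", "pond", "pavilion"},
                       {"bridge", "pond", "pavilion", "rockery"}, 0))
print(edit_successful(DetailedEdit(0.91, "local", [True, True, False])))
```

Keeping the two checks separate mirrors the text: the quantitative metric only gauges how much changed, while the annotators judge whether the change actually improved the spatial logic.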
Finally, this optimized logic-driven workflow was applied to a controlled case study of a Jiangnan private garden, covering an area of approximately 500 m² and characterized by small bridges and flowing water, to explore the practical application of AIGC in Chinese classical garden design.
3 Results
3.1 The Logic of Chinese Classical Garden Design
Through a systematic analysis of Chinese classical gardens, we first abstracted the complex design logic of Chinese classical gardens into two computationally evaluable dimensions, i.e., spatial layouts and garden elements, and summarized various garden elements, including rockeries, water, vegetation, and architecture. This analysis provides a theoretical basis for evaluating the AI's ability to interpret and generate Chinese classical gardens.
3.1.1 Spatial Layouts
Chinese classical gardens emphasize the concept of "grandness within smallness" (小中见大), skillfully using limited space to create diverse and rich scenic effects. By guiding visitors' sight lines and walking paths, the design enhances the overall experience. The spatial layouts of Chinese classical gardens generally fall into two types: dominant scene layout and clustered scene layout. The dominant scene layout uses techniques such as elevation changes and axial symmetry to highlight the main scenery, enhancing its prominence and visual appeal. In contrast, the clustered scene layout appears to arrange elements freely and loosely but in fact maintains a strong underlying order that keeps the scattered elements from seeming chaotic. The techniques employed in spatial layouts include borrowed scenery (借景), framed scenery (框景), leaking scenery (漏景), opposing scenery (对景), layered scenery (夹景), obstructed scenery (障景), added scenery (添景), and hidden scenery (藏景) [39].
3.1.2 Garden Elements
(1) Rockeries
In Chinese classical gardens, rockeries are constructed from natural stones [40] to emulate the grandeur of natural mountains. Stones are selected for their natural beauty; common types include Taihu stone (太湖石), lingbi stone (灵璧石), ying stone (英石), and kun stone (昆石) [41]. The basic techniques for constructing rockeries include an (安), lian (连), jie (接), dou (斗), pin (拼), kua (挎), xuan (悬), jian (剑), ka (卡), chui (垂), tiao (挑), cheng (撑), die (叠), shu (竖), jia (夹), zheng (整), and gou (钩). The surface texture treatment techniques of rockeries, known as cunfa (皴法), include styles such as mayacun (马牙皴), pimacun (披麻皴), fupicun (斧劈皴), zhedaicun (折带皴), and luanchacun (乱柴皴).
(2) Water
In the creation of waterscapes, two primary layout forms are employed: centralized layouts and dispersed layouts. The design aims to achieve various artistic effects, such as graceful water curves, the natural rhythm of water sounds, the dynamic and static interplay of water bodies, and the enchanting beauty of reflections on the water surface [42]. Water is typically integrated closely with the spatial layouts. Through its interaction with other garden elements, water creates diverse spatial variations, significantly enhancing the vibrancy and layering of the entire garden.
(3) Vegetation
Chinese classical gardens place great emphasis on the multifaceted appreciation of vegetation, including its color, fragrance, shape, form, and charm. Planting generally follows natural forms and uses arrangements such as paired planting (对植), row planting (列植), clustered planting (丛植), and solitary planting (孤植). The main categories of plants include trees, flowers, bamboo, and herbaceous plants [43]. Different plants are often imbued with symbolic meanings, such as elegance, resilience, and prosperity.
(4) Architecture
The landscape composition of Chinese classical gardens is often realized through a variety of architectural forms or structures. The architectural types within gardens are diverse, including pavilions (亭), terraces (台), towers (楼), chambers (阁), verandas (轩), open halls (榭), archways (卷), open spaces (广), and corridors (廊) [44]. Depending on the specific setting and functional requirements, the roof styles also vary, with common types including wudian (庑殿), xieshan (歇山), xuanshan (悬山), yingshan (硬山), and cuanjian (攒尖).
3.2 Objective and Subjective Evaluation Results
3.2.1 Technical Baseline
The generation performance of the three AIGC platforms was initially benchmarked using FID, CLIP, PSNR, and SSIM metrics (Table 2). Midjourney achieved the lowest FID (8.4) and the highest visual quality scores compared with Stable Diffusion and DALL-E 3, indicating superior baseline generation capabilities.
3.2.2 Human-Centered Evaluation
A usability study with 72 participants evaluating the multi-platform workflow yielded a mean SUS score of 76.6, indicating a favorable user experience, particularly with respect to efficiency and support for iterative design. The results also indicate ongoing limitations of current image-based AIGC systems in representing complex spatial perception and experiential continuity.
3.2.3 Expert Cultural Turing Test
While Midjourney achieved the best baseline technical scores, the expert evaluation (Fleiss' kappa = 0.8) revealed a critical divergence between visual appeal and design logic. As shown in Fig. 1, although single-platform outputs scored highly on aesthetic atmosphere, they scored markedly lower on spatial logic because of structural hallucinations, such as physically implausible bridge structures. In particular, the generated bridges often lacked proper integration with the path system, resulting in low implementability scores. In contrast, the proposed multi-platform framework achieved consistently high scores across all dimensions, outperforming single-platform baselines particularly in cultural fidelity (4.5/5.0) and spatial logic (4.2/5.0).
3.2.4 Layer-wise Ablation Analysis
To systematically attribute the observed improvements to specific components of the framework, a layer-wise ablation analysis was conducted. Compared with the G0 baseline, the full three-layer workflow (G3) achieved consistent improvements across generation quality, expert evaluation, and editing efficiency. FID decreased from 68.42 ± 4.21 to 47.92 ± 2.93, corresponding to a 30.0% reduction, while SSIM and PSNR increased from 0.62 ± 0.05 to 0.73 ± 0.03 and from 17.31 ± 1.10 to 21.15 ± 0.81, respectively. Expert-rated cultural fidelity improved by 1.92 (2.63 to 4.55), and spatial logic improved by 1.82 (2.42 to 4.24). Meanwhile, human post-editing time was reduced from 42.02 ± 8.51 to 18.01 ± 5.26 min, a 57.1% decrease. These results suggest that G3 substantially enhances cultural-spatial consistency while reducing manual correction effort.
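The relative changes reported above can be re-derived from the listed means as a quick consistency check. The snippet uses only the G0 and G3 means given in this paragraph and ignores the standard deviations.

```python
def relative_change(before: float, after: float) -> float:
    """Signed percentage change from `before` to `after`."""
    return (after - before) / before * 100


fid_change = relative_change(68.42, 47.92)   # G0 -> G3 FID (lower is better)
time_change = relative_change(42.02, 18.01)  # post-editing time in minutes
fidelity_gain = round(4.55 - 2.63, 2)        # expert-rated cultural fidelity
logic_gain = round(4.24 - 2.42, 2)           # expert-rated spatial logic

print(f"FID change: {fid_change:.1f}%")            # FID change: -30.0%
print(f"Editing time change: {time_change:.1f}%")  # Editing time change: -57.1%
print(f"Cultural fidelity gain: {fidelity_gain}")  # Cultural fidelity gain: 1.92
print(f"Spatial logic gain: {logic_gain}")         # Spatial logic gain: 1.82
```

Both percentages agree with the 30.0% and 57.1% reductions stated in the text.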
3.3 A Comprehensive Workflow for Chinese Classical Garden Design
The test results show that, although images generated by DALL-E 3 and Stable Diffusion generally align with the input prompts, their visual quality is only moderate. In contrast, Midjourney not only adheres closely to prompts but also produces visually superior results (Fig. 2). The findings further indicate that an iterative, three-stage input strategy (defining task objectives, structuring a detailed prompt, and optimizing based on outputs) substantially improves the consistency and visual fidelity of AI-generated images (Fig. 3).
Our experiments show that an iterative, four-stage input process ("prepare accurate spatial layout color block diagrams," "modify erroneous spatial layouts," "input positive and negative prompts," and "adjust and optimize") can substantially improve the stability and aesthetic quality of AI-generated images, yielding more precise design outputs and clearer representations of spatial layout.
Evaluations (Table 3) indicate that Stable Diffusion performs exceptionally well with the "Seg" (semantic segmentation) control type, mapping different color blocks to the corresponding landscape elements with a high degree of alignment. This precise spatial control is achieved primarily through ControlNet, a neural network architecture that injects additional spatial conditioning into pretrained diffusion models. The remaining control types proved poorly suited to Chinese classical garden design.
The underlying mechanism of ControlNet explains its suitability for this task. Instead of retraining the entire foundation model, ControlNet keeps a locked copy of the pretrained Stable Diffusion model, preserving its extensive visual knowledge, while a trainable copy of its U-Net encoder learns the relationship between an input control map (e.g., a Canny edge map or a semantic segmentation map) and the desired output. The trainable copy is connected to the locked model through zero-initialized convolution layers, and during generation its outputs are added to those of the locked model, so that the final image is guided by both the semantic text prompt and explicit spatial constraints. This architecture is well suited to the detailed design phase because it imposes explicit structural and layout information onto a generated concept without degrading the visual knowledge embedded in the base model. Therefore, refining reference images from the conceptual design phase by localized repainting with the "Seg" control type can effectively enhance the controllability and quality of the generated design outputs.
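For reference, the ControlNet paper (Zhang, Rao, and Agrawala, 2023) formulates the control mechanism for a single neural block as in the first line below. Here F(x; Θ) is a locked pretrained block, Θ_c the parameters of its trainable copy, c the control map (in this workflow, the Seg map), and Z(·; ·) a 1×1 convolution whose weights and bias are initialized to zero. Because both zero convolutions output zero at initialization, y_c initially equals the locked block's output, so training starts from the unaltered base model. The second line is our own notation for the inference-time conditioning scale that common front-ends (for example, the diffusers library) apply to the control branch; a control weight of w = 1, as used in the case study, leaves the branch unscaled.

```latex
\begin{aligned}
y_c &= \mathcal{F}(x;\Theta)
      + \mathcal{Z}\!\left(\mathcal{F}\!\left(x + \mathcal{Z}(c;\Theta_{z1});\Theta_c\right);\Theta_{z2}\right),\\
y_c^{(w)} &= \mathcal{F}(x;\Theta)
      + w\,\mathcal{Z}\!\left(\mathcal{F}\!\left(x + \mathcal{Z}(c;\Theta_{z1});\Theta_c\right);\Theta_{z2}\right),
      \qquad w \ge 0 .
\end{aligned}
```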
Within this semantic segmentation approach, using the ADE20K class-color scheme (the palette expected by the ControlNet Seg model) enables Stable Diffusion to generate corresponding elements with high precision from predefined colors. Further analysis reveals that reclassifying and integrating the four major categories of garden elements within this scheme improves both the rigor and the practical applicability of AI-generated images in landscape design.
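A color block diagram of this kind is simply an image in which every region is filled with the palette color of its ADE20K class. The sketch below assumes one possible reclassification of the four garden-element categories (rockery to rock, water to water, vegetation to tree, architecture to building); this mapping is hypothetical. The RGB values are taken from the ADE20K palette commonly used with ControlNet segmentation as we recall it, so they should be checked against the palette of the preprocessor actually in use. The diagram is encoded as a binary PPM image so that no imaging library is needed.

```python
# Hypothetical reclassification of garden-element categories into ADE20K classes.
GARDEN_TO_ADE20K = {
    "rockery": "rock",
    "water": "water",
    "vegetation": "tree",
    "architecture": "building",
    "background": "sky",
}

# ADE20K palette entries (RGB) as commonly used by ControlNet Seg preprocessors;
# verify against the palette shipped with your own preprocessor.
ADE20K_RGB = {
    "building": (180, 120, 120),
    "sky": (6, 230, 230),
    "tree": (4, 200, 3),
    "water": (61, 230, 250),
    "rock": (255, 41, 10),
}


def color_block_diagram(width, height, regions):
    """Paint half-open rectangles (category, x0, y0, x1, y1) onto a background;
    later regions overwrite earlier ones."""
    fill = ADE20K_RGB[GARDEN_TO_ADE20K["background"]]
    image = [[fill] * width for _ in range(height)]
    for category, x0, y0, x1, y1 in regions:
        rgb = ADE20K_RGB[GARDEN_TO_ADE20K[category]]
        for y in range(max(0, y0), min(height, y1)):
            for x in range(max(0, x0), min(width, x1)):
                image[y][x] = rgb
    return image


def to_ppm(image):
    """Encode an RGB image (rows of (r, g, b) tuples) as binary PPM (P6)."""
    height, width = len(image), len(image[0])
    header = f"P6\n{width} {height}\n255\n".encode("ascii")
    pixels = bytes(channel for row in image for pixel in row for channel in pixel)
    return header + pixels


diagram = color_block_diagram(64, 48, [
    ("water", 0, 30, 64, 48),          # pond across the foreground
    ("rockery", 40, 18, 56, 32),       # rockery on the far bank
    ("vegetation", 0, 10, 20, 32),     # tree mass on the left
    ("architecture", 24, 14, 38, 30),  # pavilion in the middle ground
])
ppm_bytes = to_ppm(diagram)  # e.g. write to "layout_seg.ppm" in binary mode
```

Painting regions in a fixed order keeps overlaps deterministic: whatever is listed last (here, the pavilion) sits in front, much as a designer would layer color blocks in an image editor.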
The test results indicate that ChatGPT 4.0 (DALL-E 3) performs strongly in understanding and representing garden elements, generating more precise design images from well-structured prompt combinations (Fig. 4-1). Further analysis reveals that an iterative, three-stage input process ("assign a role," "define garden element attributes," and "adjust and optimize") can substantially improve the stability and aesthetic quality of the generated images. The specific tasks in this three-stage process for garden element generation are detailed in the supplementary information.
Finally, the test results indicate that current AIGC platforms cannot yet reliably generate complex spatial layouts and precise garden elements at the same time, so the two must be processed separately. This decoupling is critical for controlling the size, position, and accuracy of garden elements within the spatial layout (Fig. 4-2). Combining the "Canny" control type with conventional image editing software (e.g., Photoshop) supports precise visual refinement. Adjusting parameters through a three-stage process ("prepare accurate garden elements," "remove erroneous garden elements," and "adjust and optimize") can substantially improve the stability and aesthetic quality of the generated images.
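The "remove erroneous garden elements" step typically relies on a mask that marks which pixels may be regenerated during localized repainting. The sketch below assumes the convention used by common Stable Diffusion inpainting front-ends (white, 255, is repainted; black, 0, is preserved) and rectangular problem areas grown by a small margin so that the repainted patch can blend with its surroundings; the box coordinates and margin are hypothetical.

```python
def inpaint_mask(width, height, boxes, margin=0):
    """Build a grayscale mask: 255 = repaint, 0 = keep.

    boxes are half-open rectangles (x0, y0, x1, y1) around erroneous elements;
    each box is grown by `margin` pixels and clipped to the image.
    """
    mask = [[0] * width for _ in range(height)]
    for x0, y0, x1, y1 in boxes:
        for y in range(max(0, y0 - margin), min(height, y1 + margin)):
            for x in range(max(0, x0 - margin), min(width, x1 + margin)):
                mask[y][x] = 255
    return mask


def masked_fraction(mask):
    """Share of the image that will be regenerated (a check on locality)."""
    total = len(mask) * len(mask[0])
    return sum(value == 255 for row in mask for value in row) / total


# Hypothetical box around an incorrectly scaled pavilion in a 64 x 48 image.
mask = inpaint_mask(64, 48, [(10, 20, 18, 30)], margin=2)
print(f"{masked_fraction(mask):.1%} of the image will be repainted")
```

Tracking the masked fraction is one simple way to keep a correction localized, so that the rest of the accepted concept image is left untouched.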
Based on the framework set out in Section 2.4 and the test results above, we finalized the operational workflow shown in Fig. 5. The validation results indicate that separating the process into conceptual design and detailed design effectively mitigates the AI's limitations in one-shot generation. In the detailed design stage, the workflow explicitly decouples the optimization of spatial layouts and garden elements to provide separate control over topological logic and semantic fidelity.
3.4 Case Study
To examine the rigor and reproducibility of our framework, we applied it to a controlled design experiment: a 500 m² Jiangnan private garden. Rather than treating generation as a black box, we documented the specific AI errors encountered at each stage and how the layered framework corrected them (Fig. 6).
1) Stage 1: The Semantic Divergence Layer, implemented with Midjourney, provided a strong atmospheric baseline. However, structural analysis revealed significant spatial errors, notably a physically impossible bridge structure that disconnected the water system. This illustrates that purely semantic generation lacks topological logic.
2) Stage 2: To correct the spatial error, we did not rely on random regeneration. Instead, we applied the Topological Constraint Layer: by extracting the Seg map of a functionally correct layout and injecting it via ControlNet (control weight 1), we forced the model to reconstruct the bridge with correct perspective and connection logic, imposing explicit spatial constraints on the generative process.
3) Stage 3-A: The initial AI output rendered the pavilion with generic Asian-style roofs, exhibiting semantic drift. We applied the Ontological Refinement Layer using localized inpainting. By constraining generation to specific architectural vocabulary, we restored the cultural authenticity of the architectural elements.
4) Stage 3-B: The final output was not merely selected for its aesthetics but validated against the established expert evaluation criteria. This correction process suggests that the workflow is reproducible: by constraining topology and refining ontology, designers can achieve culturally accurate outcomes consistently, with reduced dependence on random seed variation.
4 Discussion
4.1 The Adaptability of the Framework in the Era of Rapid AI Evolution
The field of generative AI is evolving rapidly, with newer models such as Google's Gemini 2.5 and Black Forest Labs' FLUX.1 offering improved prompt adherence and integrated editing capabilities. However, these advanced models do not render the proposed workflow obsolete; rather, they highlight the resilience of the logic-driven generative framework proposed in this study. The framework is designed as a modular system in which specific tools function as interchangeable components within three logic layers (Table 4).
1) Semantic Divergence Layer. Currently powered by Midjourney, this layer can be upgraded to newer engines such as FLUX.1 without altering its role. While new models improve textual understanding, the methodological necessity of this layer, namely retrieving cultural schemas before defining structure, remains unchanged.
2) Topological Constraint Layer. Although models such as Gemini 2.5 offer editing, they often lack the granular, pixel-level structural control provided by ControlNet. For Chinese classical gardens, where spatial relationships must follow strict physical logic (ranking the location), the black-box editing of end-to-end models is often insufficient. The framework's insistence on explicit topological constraints therefore remains a key theoretical contribution.
3) Ontological Refinement Layer. General-purpose models still suffer from dataset bias with respect to high-context cultural heritage. Integrating LoRA (Low-Rank Adaptation) fine-tuning offers a durable strategy for ensuring cultural fidelity, a requirement that persists regardless of how capable the base model becomes.
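The modular design described above can be sketched as three fixed layers whose back ends are swappable. In the Python sketch below, the layer order is the invariant and the engine behind each layer is a plug-in. The class names, the `Engine` signature, and the placeholder engines are hypothetical stand-ins; nothing here calls Midjourney, FLUX.1, ControlNet, or any real API.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# Hypothetical adapter signature: an engine receives the current design state
# and returns a short description of its output for its own layer.
Engine = Callable[[Dict[str, str]], str]

# The methodological invariant: the order of the three logic layers.
LAYER_ORDER = ("semantic_divergence", "topological_constraint", "ontological_refinement")


@dataclass
class Workflow:
    engines: Dict[str, Engine]
    log: List[str] = field(default_factory=list)

    def __post_init__(self) -> None:
        missing = [layer for layer in LAYER_ORDER if layer not in self.engines]
        if missing:
            raise ValueError(f"no engine assigned to layers: {missing}")

    def swap(self, layer: str, engine: Engine) -> None:
        """Replace the tool behind one layer; the other layers are untouched."""
        if layer not in LAYER_ORDER:
            raise KeyError(f"unknown layer: {layer}")
        self.engines[layer] = engine

    def run(self, brief: str) -> Dict[str, str]:
        state = {"brief": brief}
        for layer in LAYER_ORDER:  # layer sequence is fixed, tools are not
            state[layer] = self.engines[layer](state)
            self.log.append(f"{layer}: {state[layer]}")
        return state


def placeholder(tool: str) -> Engine:
    """Dummy engine standing in for a real tool; it only reports its name."""
    return lambda state: tool


wf = Workflow({
    "semantic_divergence": placeholder("Midjourney"),
    "topological_constraint": placeholder("ControlNet (Seg)"),
    "ontological_refinement": placeholder("inpainting"),
})
before = wf.run("Jiangnan private garden brief")
wf.swap("semantic_divergence", placeholder("FLUX.1"))
after = wf.run("Jiangnan private garden brief")
print(before["semantic_divergence"], "->", after["semantic_divergence"])  # Midjourney -> FLUX.1
```

Swapping a tool changes only the entry for its own layer; the layer sequence, which is where the methodology lives, stays the same.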
4.2 Capability Boundaries and the Path to World Models
To assess the long-term stability of the proposed workflow, it is necessary to situate it within the fundamental principles of current AI approaches. Mainstream models such as Stable Diffusion, a latent diffusion model, and Midjourney, whose architecture is not publicly documented but is generally understood to be diffusion-based, rely on learned probability distributions to map textual inputs to 2D pixels. Although effective at rendering texture and stylistic features, these models lack an internal representation of physical laws, which often results in spatial inconsistencies and geometric errors. Emerging advances in 3D generation, built on representations such as NeRF and 3D Gaussian Splatting, as well as world models such as Sora and Gen-3, aim to address these limitations by incorporating physical simulation and object permanence. However, current 3D approaches often sacrifice texture fidelity and cultural specificity for geometric accuracy. In this transitional period, the human-perspective-first strategy proposed in this study serves as a critical bridge. By generating spatially consistent 2D image sequences via the Topological Constraint Layer, the workflow acts as a semantic front end. These culturally accurate 2D sequences can serve as high-quality input for subsequent 3D reconstruction algorithms, such as image-to-3D NeRF pipelines, helping to retain the cultural essence established by the framework as spatial representation shifts toward fully generated 3D forms. Consequently, the core logic of the framework remains applicable as the underlying generative engines evolve from 2D to 3D paradigms.
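The claim that latent diffusion models learn a probabilistic mapping rather than physical laws can be made concrete with the training objective of latent diffusion as it is commonly stated. Here $\mathcal{E}$ is the image encoder, $x$ a training image, $z_t$ its noised latent at timestep $t$, $\bar{\alpha}_t$ the noise schedule, $y$ the text prompt, and $\tau_\theta$ the text encoder. The formulation follows the general latent diffusion literature and is not specific to this study or to any one product.

```latex
\begin{aligned}
z_0 &= \mathcal{E}(x), \qquad
z_t = \sqrt{\bar{\alpha}_t}\, z_0 + \sqrt{1 - \bar{\alpha}_t}\, \epsilon,
\qquad \epsilon \sim \mathcal{N}(0, I), \\
\mathcal{L}_{\mathrm{LDM}} &=
\mathbb{E}_{\mathcal{E}(x),\, y,\, \epsilon,\, t}
\left[ \left\lVert \epsilon - \epsilon_\theta\!\left(z_t,\, t,\, \tau_\theta(y)\right) \right\rVert_2^2 \right]
\end{aligned}
```

The loss rewards only accurate prediction of the added noise, conditioned on the prompt. No term explicitly measures geometric or physical consistency, so any spatial coherence must be learned implicitly from the training data, which is why external constraints such as those of the Topological Constraint Layer remain useful.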
4.3 Addressing Cultural Homogenization in Heritage Design
While established commercial tools such as Photoshop (Firefly) and FLUX.1 Inpainting offer streamlined workflows for image composition and refinement, applying these general-purpose tools directly to Chinese classical garden design reveals a critical limitation: cultural homogenization. Commercial black-box models are typically trained on copyright-cleared, globalized datasets dominated by Western or modern architectural forms. Comparative testing indicates that when these tools are applied to the restoration of specific heritage elements, such as Taihu stones, they tend to produce generic rock forms that lack the defining aesthetic qualities of shou (瘦, slenderness), zhou (皱, wrinkling), lou (漏, porosity), and tou (透, perforation). Similarly, they frequently misinterpret the complex curvature of flying eaves (翘角) as standard pagoda roofs. In contrast, the innovation of the proposed workflow lies not in inpainting itself but in modeling cultural semantics within a controllable, open ecosystem. By building on the open Stable Diffusion ecosystem, the framework allows specific cultural datasets to be injected via LoRA fine-tuning within the Ontological Refinement Layer. Although the current study validates this layer using iterative inpainting, the framework provides the technical infrastructure for future custom cultural models, so that spatial sequencing and element generation can be not merely visually plausible but culturally accurate. This methodology therefore goes beyond the convenience of commercial tools, offering a technical pathway for preserving cultural sovereignty in the age of generative AI.
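The LoRA strategy referred to above adapts a pretrained model by learning a low-rank update to selected weight matrices (in Stable Diffusion, commonly the attention projections) while the original weights $W_0$ stay frozen. The standard formulation is shown below; the $768 \times 768$ matrix and rank $r = 8$ in the last line are illustrative values chosen for the arithmetic, not settings used in this study.

```latex
\begin{aligned}
W' &= W_0 + \Delta W = W_0 + \frac{\alpha}{r}\, B A,
\qquad B \in \mathbb{R}^{d \times r},\; A \in \mathbb{R}^{r \times k},\; r \ll \min(d, k), \\
N_{\text{LoRA}} &= r\,(d + k) = 8 \times (768 + 768) = 12{,}288
\quad \text{vs.} \quad d\,k = 768 \times 768 = 589{,}824 \;(\text{about } 2\%).
\end{aligned}
```

Because only $A$ and $B$ are trained, a relatively small curated dataset (for example, images of Taihu stones or flying eaves) can shift generation toward culturally specific forms without retraining the base model. The strategy carries over when the base model is upgraded, although a given set of LoRA weights generally has to be retrained for each new base model.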
4.4 Research Objectives and Tool Limitations
It is crucial to distinguish between the inherent limitations of current generative tools and the methodological objectives of this study. Although current tools lack innate 3D understanding and cannot yet support full automation, the primary objective of this research is not to create a fully automated 3D modeling tool but to establish a logic-driven methodology for translating the high-context semantics of Chinese classical gardens into computational constraints.
5 Conclusions
The relationship between designers and AI is shifting from a simple user-tool dynamic to one of deep collaboration and co-creation. To move toward this goal, this research analyzed the characteristics and inherent logic of Chinese classical garden design in depth, evaluated the capabilities of mainstream AI platforms, and integrated the strengths of these platforms into an efficient AIGC-assisted workflow for Chinese classical garden design.
The main findings are as follows. First, Chinese classical gardens have distinctive characteristics, and their inherent logic centers on the organization and planning of spatial layouts and garden elements. In the context of current AI generation, this process often involves the complex task of translating 3D spaces into 2D representations. Second, mainstream AI platforms demonstrate distinct advantages: Midjourney excels at providing design inspiration, ChatGPT 4.0 (DALL-E 3) at generating accurate garden elements, and Stable Diffusion at controlling and optimizing spatial layouts. Third, although AI has shown preliminary design capabilities, achieving high-quality outcomes still requires integrating traditional tools and professional input from designers. In the future, improved training datasets and generation techniques could further support the intelligent transformation of Chinese classical garden design.
This research presents an initial foundational framework for integrating AI with Chinese classical garden design and marks a new starting point for this field. Through this exploratory work, we have opened new directions for the digitalization and intelligent development of Chinese classical garden design and laid a foundation for future in-depth studies. Our findings provide scholars with a reference framework that we hope will encourage further innovative research and support the transformation of Chinese classical garden design in the context of modern technology.