1 Introduction
The Fourth Industrial Revolution has introduced disruptive technologies represented by artificial intelligence (AI), driving continuous transformation in scientific research paradigms and teaching models. With rapid technological advancement, researchers are now able to integrate heterogeneous multi-source data such as text, images, and geospatial information to analyze urban issues in more fine-grained, dynamic, and visualized ways [1–4]. At the same time, higher education is undergoing profound structural adjustment and systemic reconfiguration.
Currently, higher education faces several challenges, including the uneven distribution of high-quality educational resources across regions, disciplines, and institutions, as well as the limitations of the conventional instructional model characterized by teacher-centered approaches, standardized curricula, and fixed learning paces in addressing students’ diverse and individualized learning needs [5–7]. However, the pathways, mechanisms, key nodes, and underlying logic of deep transformation driven by technological advancement still require systematic clarification and empirical validation. Existing studies indicate that AI demonstrates significant potential in areas such as instructional content generation, knowledge graph construction, and human–machine collaborative learning. It is expected to reshape modes of knowledge acquisition and drive educational ecosystems toward greater intelligence, openness, and adaptability [8–12]. Recent studies have also validated the effectiveness of AI in applications such as virtual teaching assistants, writing support, lesson plan development, and assignment grading [13–15]. However, research on the system architecture, functional design, and integration of AI teaching assistants into instructional contexts remains in its early stages, lacking a mature theoretical framework and well-established practical paradigms. From an educational theory perspective, if AI teaching assistants are integrated with constructivist learning theory [16] and cognitive load theory [17], this may help explain the mechanisms through which they promote personalized learning and deeper cognitive engagement, thereby providing a theoretical foundation for their adoption in higher education.
This study takes the New Science of Cities, an undergraduate general education course at Tsinghua University, as a representative case. It aims to investigate the key mechanisms underlying the evolution of the AI teaching assistant from general-purpose language models to domain-specific intelligent engines, and to explore innovative pathways for integrating AI into teaching practice. The study seeks to provide a replicable and transferable theoretical framework and operational paradigm for promoting the deep integration of AI technologies and teaching practice in higher education.
2 Methods and Technical Implementation of the AI Teaching Assistant
For text-based interaction, current AI teaching systems largely rely on keyword matching or on outputs generated by a general-purpose large language model (LLM). As a result, their responses often lack in-depth understanding of domain-specific knowledge and the latest research developments. In addition, most existing systems either do not support image generation or rely solely on basic generative models, making it difficult to meet the expressive and cognitive requirements of professional teaching. Building upon existing technologies, the AI teaching assistant for the New Science of Cities is specifically adapted to course requirements and provides two core functions: text generation and image generation.
The text generation module enhances instructional relevance and domain alignment in course-based question answering by constructing a high-quality domain-specific corpus and incorporating a Retrieval-Augmented Generation (RAG) mechanism. The image generation module, on the other hand, leverages a large-scale image-text paired dataset and model fine-tuning to improve visual representation and cognitive support in the teaching context of future cities. Furthermore, these two core functions are integrated into the Zhipu Qingyan AI platform and the Hetang RainClassroom system. A functional card interface is introduced to provide prompt templates based on typical task scenarios, thereby improving usability and pedagogical applicability. Compared with general-purpose models (e.g., ChatGPT, Midjourney), the customized AI teaching assistant requires greater investment in corpus construction, model fine-tuning, and platform integration. However, it demonstrates significant advantages in instructional task alignment, response accuracy, and cultural contextualization.
2.1 Text Generation Function
In terms of text generation, the research team first focused on the systematic organization and construction of high-quality domain-specific data. The overall workflow, illustrated in Fig. 1, aims to provide course instruction with knowledge support that has both disciplinary characteristics and educational value. The textual data include textbooks, online course materials, seminal papers, and case-based literature. By constructing a comprehensive corpus based on the New Science of Cities and urban planning literature, the model’s capabilities in knowledge understanding and contextual adaptation for teaching and academic research are significantly improved.
In terms of model capability optimization, the AI teaching assistant adopts ChatGLM-4 [18] as the foundational LLM and incorporates RAG to achieve domain-specific enhancement for teaching scenarios. Compared with traditional fine-tuning approaches, RAG does not require updating model parameters and therefore offers advantages such as faster iteration and more convenient knowledge updating. It is particularly suitable for domains where knowledge evolves rapidly or is frequently updated. While ensuring domain specificity, this approach also effectively mitigates the hallucination problem of LLMs. During the response generation stage, prompt engineering is applied to explicitly require the LLM to adopt course-specific analytical perspectives and frameworks when structuring its responses, thereby guiding the AI teaching assistant to more fully and systematically utilize the retrieved corpus in generating answers.
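The retrieve-then-prompt flow described above can be illustrated with a minimal sketch. It assumes a toy bag-of-words retriever and a three-sentence placeholder corpus; the retriever, the prompt wording, and all function names (`tokenize`, `cosine`, `retrieve`, `build_prompt`) are hypothetical stand-ins, since the course system's retrieval component and prompts are not specified here. No model parameters are touched: updating knowledge only means editing the corpus.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(count * b[token] for token, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query, corpus, k=2):
    """Return up to k passages that share vocabulary with the query, best first."""
    q = Counter(tokenize(query))
    scored = [(cosine(q, Counter(tokenize(p))), p) for p in corpus]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [p for score, p in scored[:k] if score > 0]


def build_prompt(query, passages):
    """Assemble a prompt that asks the model to answer from the retrieved passages."""
    context = "\n".join(f"[{i}] {p}" for i, p in enumerate(passages, start=1))
    return (
        "Answer using the course's analytical perspective and the passages below.\n"
        f"Passages:\n{context}\n"
        f"Question: {query}"
    )


# Hypothetical mini-corpus; the real course corpus is not reproduced here.
CORPUS = [
    "The New Science of Cities studies how data and technology reshape urban form and daily life.",
    "Cognitive load theory concerns the limits of working memory during learning.",
    "Low-rank adaptation trains a small number of added parameters.",
]

if __name__ == "__main__":
    question = "How does technology reshape urban form?"
    print(build_prompt(question, retrieve(question, CORPUS)))
```

The string returned by `build_prompt` would then be sent to the LLM. A production system would more plausibly use dense embeddings and a vector index in place of word overlap, but the division of labor between retrieval and generation is the same.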
2.2 Text-to-Image Generation Function
In terms of the image generation capability, to enhance the AI teaching assistant’s support for visually representing urban scenarios, the research team fine-tuned the model on a self-constructed high-quality image-text paired dataset, improving its text-to-image generation performance (Fig. 2). The dataset consists of both manually curated and automatically collected components. The manual process focuses on representative cases of future urban spatial design, where 1,100 representative images were carefully selected and annotated by researchers with a background in design. For the automated component, 350,000 high-resolution real-world images were collected from international architectural design platforms, and an additional 50,000 high-quality AI-generated images were incorporated. To support annotation at scale, CogVLM [18], a vision-language model, was employed to automatically generate image descriptions. Ultimately, a dataset of 400,000 image-text pairs was constructed, covering multiple urban scenario categories such as residential spaces, office spaces, and public spaces, providing rich training samples for model development.
Based on the above dataset, the research team fine-tuned CogView3 [18] as the backbone model, employing Low-Rank Adaptation (LoRA) to optimize key parameters, and conducted approximately 2,000 training steps. To further enhance the model’s ability to understand prompts, an LLM was introduced prior to image generation to rewrite and expand user inputs. In this process, concise keywords containing domain knowledge were transformed into semantically enriched sentences with explicit spatial composition and complete scene descriptions, thereby improving both visual quality and semantic accuracy of generated images.
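To give a sense of why LoRA makes fine-tuning on a course-scale dataset practical, the short calculation below compares the number of trainable parameters for one weight matrix under full fine-tuning and under LoRA. LoRA freezes the original matrix of shape d_out × d_in and trains two low-rank factors of shapes d_out × r and r × d_in. The layer size and rank used here are hypothetical; the rank and target layers used for CogView3 are not reported above.

```python
def full_params(d_out, d_in):
    """Trainable parameters when the whole d_out x d_in weight matrix is updated."""
    return d_out * d_in


def lora_trainable_params(d_out, d_in, rank):
    """Trainable parameters in the LoRA factors B (d_out x rank) and A (rank x d_in)."""
    return rank * (d_out + d_in)


if __name__ == "__main__":
    # Hypothetical layer size and rank, chosen only for illustration.
    d_out, d_in, rank = 4096, 4096, 8
    full = full_params(d_out, d_in)
    lora = lora_trainable_params(d_out, d_in, rank)
    print(f"full fine-tuning: {full} parameters")
    print(f"LoRA (rank {rank}): {lora} parameters ({lora / full:.2%} of full)")
```

Under these assumed dimensions, the LoRA factors hold well under one percent of the parameters in the frozen matrix, which is what keeps a short run of a few thousand steps affordable.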
In the evaluation stage, 240 test samples were constructed across 30 categories of representative urban spatial scenarios (with 8 descriptions per category), covering both classical and future-oriented as well as indoor and outdoor environments. The results show that the latest CogView3-Plus model [18] achieved a user satisfaction rate of approximately 60% on the test set, whereas the fine-tuned model based on CogView3 performed better in terms of image content accuracy, visual style alignment, and cultural contextual representation (Fig. 3). Notably, it demonstrates a significant improvement in the representation of Chinese urban contexts, achieving a user satisfaction rate of 75%, which confirms its practical value in supporting creative expression.
2.3 Interface Development
The research team integrated the two core functions—text generation and image generation—into the Zhipu Qingyan AI platform and the Hetang RainClassroom system, enabling the application of AI in real-world teaching scenarios. The former served as a testing platform for students and was subsequently integrated into the latter after functional refinement, thereby better connecting existing course resources and improving the overall learning experience.
To improve students’ operational efficiency and targeted questioning, the team designed and deployed 25 functional card modules on the Zhipu Qingyan AI platform, covering five application scenarios of experimental planning, observational reporting, transformation proposals, revision and evaluation, and question generation (Fig. 4). These functional cards received positive feedback from students and were subsequently migrated to the Hetang RainClassroom system to further extend their applicability in real teaching contexts. Based on teaching experience, the team systematically identified key and difficult points in the course, developed customized question templates, and embedded them into the system in the form of functional cards. This enables students to directly access structured and intelligent responses after completing each course unit (Fig. 5). In addition, by integrating knowledge graphs, MOOC resources, and course materials, the AI teaching assistant system establishes a multi-source learning resource linkage, which further supports students in achieving a deeper understanding of the course content (Fig. 6).
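A functional card of the kind described above can be thought of as a named prompt template bound to one of the five application scenarios. The sketch below is one possible representation, not the platform's actual data model: the class `FunctionalCard`, its fields, and the example template text are all hypothetical, and only the five scenario names come from the description above.

```python
from dataclasses import dataclass
from string import Template

# The five application scenarios named in the text.
SCENARIOS = (
    "experimental planning",
    "observational reporting",
    "transformation proposals",
    "revision and evaluation",
    "question generation",
)


@dataclass(frozen=True)
class FunctionalCard:
    """A reusable prompt template tied to one application scenario (hypothetical model)."""

    title: str
    scenario: str
    template: str  # uses $name placeholders filled in by the student

    def __post_init__(self):
        if self.scenario not in SCENARIOS:
            raise ValueError(f"unknown scenario: {self.scenario!r}")

    def render(self, **fields):
        """Fill the placeholders; raises KeyError if a required field is missing."""
        return Template(self.template).substitute(**fields)


if __name__ == "__main__":
    # Hypothetical card; the wording of the real cards is not reproduced here.
    card = FunctionalCard(
        title="Site observation report",
        scenario="observational reporting",
        template="Draft an observation report for $site, covering users, activities, and spatial problems.",
    )
    print(card.render(site="a campus plaza"))
```

Keeping the template separate from the student's input is what lets instructors encode key and difficult points once and reuse them across course units.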
3 Application of the AI Teaching Assistant in Teaching Practice
3.1 Intelligent Classroom Interaction
In classroom teaching, instructors can use the AI teaching assistant to generate introductory content aligned with the course progression, combining academic rigor with pedagogical relevance to stimulate student interest. During lectures, the AI teaching assistant acts as a “third instructional agent,” participating in teacher–student interaction by providing exploratory and heuristic responses, thereby supporting instructors in guiding critical thinking. In discussion sessions, students can leverage the AI teaching assistant to clarify discussion directions and obtain supplementary knowledge and case materials, thereby enhancing both the quality and depth of classroom discussions (Fig. 7).
3.2 Post-Class Tutoring Support
In the post-class stage, the AI teaching assistant provides multidimensional support for knowledge consolidation and assignment assistance. For content not fully understood during class, students can quickly raise questions via functional cards or access a 24-hour intelligent learning companion for explanations (Fig. 8). For extended learning needs, relevant content can be retrieved from the knowledge base for further exploration. In addition, the system supports multiple interaction modalities, including voice-based interaction, screenshot-based Q&A, and voice-enabled mind map generation.
The AI teaching assistant is integrated throughout all stages of course projects. It assists students in constructing research frameworks, organizing ideas, and recommending appropriate research methodologies. By leveraging academic resources, the system provides analytical suggestions that help optimize logical structures, while also efficiently filtering relevant literature and cases to enhance information synthesis capabilities. In terms of visual representation, the image generation function supports the depiction of future scenarios, thereby strengthening the expressiveness of research outcomes. Finally, the system can further refine and polish written text, improving linguistic accuracy and academic expression quality.
3.3 Student and Instructor Evaluation
To evaluate the practical effectiveness of the AI teaching assistant, the research team analyzed student usage data from the Fall semesters of 2023 and 2024, and collected student feedback through questionnaires.
In terms of usage, 30 students participated in the feedback survey in Fall 2023, and approximately 70% reported using the AI teaching assistant in tasks such as information retrieval, concept understanding, research framework construction, and text refinement. In Fall 2024, all 17 students used ChatGLM, the underlying LLM, for assistance. The total number of interactions reached 432, with a maximum of 108 queries submitted by a single user and a maximum usage duration of 24 h 11 min (Fig. 9). Students with different learning styles exhibited significant differences in their usage patterns. Students who tended toward independent thinking primarily used the system in a critical and verification-oriented manner, focusing on information retrieval and rapid validation, and thus showed relatively shorter usage durations. In contrast, students who preferred interactive learning were more inclined to use the system as a conversational cognitive tool, engaging in continuous dialogue to expand cognitive boundaries and deepen understanding, resulting in longer usage times. In addition, students in the top 20% of usage duration generally achieved “excellent” course grades, indicating a potential positive correlation between AI teaching assistant usage intensity and learning performance. However, this relationship may also be influenced by factors such as overall learning engagement, and therefore requires further validation.
Meanwhile, students’ feedback indicates that the AI teaching assistant plays a significant role in improving research thinking and writing quality. However, several limitations remain, including insufficient depth of responses, unstable image generation quality, slow response speed, and incomplete literature retrieval functionality. Some students suggested enhancing the system’s understanding of urban space and campus environments, improving contextual memory and update capabilities, reducing templated expressions, and strengthening cross-domain adaptability. In addition, existing studies have shown that excessive reliance on intelligent tools may weaken learners’ autonomous learning and problem-solving abilities [19]. Therefore, in teaching practice, it is necessary to clearly define the auxiliary role of AI teaching assistants and maintain students’ critical thinking and reflective capabilities through appropriate pedagogical guidance. Future development of the AI teaching assistant should not only improve the depth and flexibility of responses at the technical level, but also enhance interpretability and transparency at the design level, thereby developing students’ ability to effectively interact with AI rather than becoming dependent on it.
From the instructors’ perspective, this practice involves both the construction and application of the AI teaching assistant. Instructors serve not only as system designers and builders, but also as users and observers in the teaching process. During system development, key challenges include accurately identifying instructional requirements, systematically constructing the knowledge base, and ensuring deep alignment between the system and course content. In practical application, three operational considerations stand out. First, the AI teaching assistant should be introduced appropriately during classroom teaching, emphasizing its supportive rather than substitutive role, so as to stimulate student interest and enhance classroom engagement. Second, in the post-class stage, the system’s knowledge-based question-answering capability should be fully utilized, allowing it to handle routine and repetitive inquiries, thereby enabling instructors to focus more on guiding students’ systematic thinking. Third, for individualized questions from students, instructors should integrate AI-generated feedback with professional expertise to provide more in-depth analysis and guidance.
4 Exploration of AI-Enabled Discipline-Based Knowledge Engine Construction
4.1 From AI Teaching Assistant to Domain-Specific Vertical Foundation Model
Tsinghua University has taken the lead in exploring AI-enabled disciplinary engine construction based on its experience in online education development and AI-supported teaching practices, aiming to promote the transformation of higher education paradigms. The AI teaching assistant for the New Science of Cities can serve as a starting point for gradually advancing toward the construction of a disciplinary engine in the field of urban planning. The technical architecture of the disciplinary engine consists of four main layers. The knowledge graph layer organizes multi-source knowledge in the field of urban planning (including core concepts, methods, and cases) into a structured graph format, enabling logical reasoning and cross-domain retrieval. The multimodal fusion layer integrates text, images, and tabular data, and through cross-modal alignment and representation learning, enables the model to jointly understand and process visual, statistical, and textual information. The model evolution layer builds upon general-purpose LLMs and incorporates RAG, gradually evolving into a domain-specific vertical foundation model for urban planning through domain corpus fine-tuning, knowledge injection, and human-in-the-loop feedback. The application interface layer supports diverse application scenarios, including personalized theoretical learning and planning design guidance. Through this architecture, the AI teaching assistant is no longer limited to a supportive instructional tool, but is further developed into a system with generalized reasoning and domain-specific intelligence capabilities in urban planning.
Specifically, the team further expanded the input content and application scope of the AI teaching assistant (Fig. 10). Building upon the original AI assistant model, 20 subfields of urban planning were identified based on the undergraduate training program in urban planning. A comprehensive, broad-scope corpus covering the New Science of Cities and the urban planning disciplines was constructed, providing a solid domain-specific knowledge foundation for the LLM. Meanwhile, 49 sets of real and mock tests for the certified urban–rural planner examination were collected for model evaluation. The results show that the domain-specific vertical model outperformed the general-purpose models on the corresponding test tasks (Table 1).
4.2 Future Perspectives on Discipline-Based Knowledge Engine Construction
Starting from the AI teaching assistant, the construction of a discipline-based knowledge engine for urban planning will reshape the development landscape of the discipline across multiple dimensions, including knowledge, functionality, and application. It will open new pathways for cultivating innovative planning professionals who are capable of addressing future urban development challenges, solving complex urban planning problems, and advancing frontier research in the field (Fig. 11).
In terms of knowledge sources, the AI teaching assistant periodically integrates textbooks and course materials, standards and regulations, as well as the latest literature and case studies. New knowledge is embedded into the existing knowledge graph according to thematic units and course chapters, enabling a stable mapping from authoritative sources to specific clauses, indicators, and representative cases. This ensures that the model can continuously learn and dynamically update its knowledge base, thereby developing more accurate understanding and stronger logical reasoning over disciplinary concepts and rules.
In terms of knowledge fusion, existing textbooks and regulatory documents, recent research findings, and practical cases are first segmented, deduplicated, anonymized, and version-labeled. An indexing system is then constructed based on course syllabi and thematic tags. Second, a terminology glossary and conceptual network are used to constrain the usage boundaries of synonyms and semantically similar concepts. Finally, without altering the original usage of the knowledge system, new knowledge is precisely injected and dynamically updated.
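The first stage of this fusion workflow (segmentation, deduplication, version labeling, and tag-based indexing) can be sketched as follows. This is a simplified illustration under stated assumptions: documents are plain dictionaries with hypothetical `source`, `tags`, and `text` keys, segments are paragraphs separated by blank lines, duplicates are detected by hashing whitespace- and case-normalized text, and the anonymization and terminology-constraint steps are omitted.

```python
import hashlib
import re
from collections import defaultdict


def segment(text):
    """Split a document into paragraph-level chunks on blank lines."""
    return [p.strip() for p in re.split(r"\n\s*\n", text) if p.strip()]


def fingerprint(chunk):
    """Hash of the case- and whitespace-normalized chunk, used to detect duplicates."""
    normalized = " ".join(chunk.lower().split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def ingest(documents, version):
    """Segment, deduplicate, version-label, and index chunks by thematic tag."""
    seen = set()
    index = defaultdict(list)
    for doc in documents:
        for chunk in segment(doc["text"]):
            fp = fingerprint(chunk)
            if fp in seen:
                continue  # the same passage was already ingested from another source
            seen.add(fp)
            record = {"text": chunk, "source": doc["source"], "version": version}
            for tag in doc["tags"]:
                index[tag].append(record)
    return index


if __name__ == "__main__":
    # Hypothetical documents and tags, for illustration only.
    docs = [
        {"source": "textbook", "tags": ["zoning"],
         "text": "Zoning separates land uses.\n\nFloor area ratio limits density."},
        {"source": "lecture notes", "tags": ["zoning", "density"],
         "text": "zoning  separates land uses.\n\nSetbacks regulate building placement."},
    ]
    index = ingest(docs, version="2024-fall")
    for tag, records in index.items():
        print(tag, [r["text"] for r in records])
```

Attaching the version label to every record is what allows later updates to be injected without rewriting earlier entries.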
In terms of reasoning mechanisms, retrieval order and scope can be dynamically adjusted according to task objectives. In research-support scenarios, the system strengthens retrieval of recent literature and case studies. In practical guidance scenarios, it simultaneously retrieves regulatory clauses and relevant cases. The evidence chain generation module links extracted clauses, parameters or conditions, applicability scopes, and citation sources at each retrieval step, and performs alignment and conflict resolution when necessary, ensuring that the intermediate reasoning process remains grounded in disciplinary evidence. Ultimately, the outputs are presented in a standardized structure comprising conclusions, key evidence, scope of applicability, and avenues for further verification.
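The task-dependent retrieval order and the four-part output structure can be made concrete with a small sketch. The names `RETRIEVAL_PLAN`, `plan_retrieval`, and `StructuredAnswer` are hypothetical, as are the source-type labels; only the two scenarios and the four output sections (conclusions, key evidence, scope of applicability, avenues for further verification) follow the description above. Placeholder strings stand in for real content.

```python
from dataclasses import dataclass
from typing import List

# Ordered source types to query for each task type (labels are illustrative).
RETRIEVAL_PLAN = {
    "research_support": ["recent_literature", "case_studies"],
    "practical_guidance": ["regulatory_clauses", "case_studies"],
}


def plan_retrieval(task):
    """Return the ordered list of source types to search for the given task."""
    try:
        return list(RETRIEVAL_PLAN[task])
    except KeyError:
        raise ValueError(f"unknown task: {task!r}") from None


@dataclass
class StructuredAnswer:
    """Standardized output: conclusion, evidence, scope, and verification avenues."""

    conclusion: str
    key_evidence: List[str]
    scope: str
    further_verification: List[str]

    def render(self):
        lines = [f"Conclusion: {self.conclusion}", "Key evidence:"]
        lines += [f"- {item}" for item in self.key_evidence]
        lines.append(f"Scope of applicability: {self.scope}")
        lines.append("Further verification:")
        lines += [f"- {item}" for item in self.further_verification]
        return "\n".join(lines)


if __name__ == "__main__":
    print(plan_retrieval("practical_guidance"))
    answer = StructuredAnswer(
        conclusion="<conclusion text>",
        key_evidence=["<clause and its citation source>", "<relevant case>"],
        scope="<conditions under which the conclusion holds>",
        further_verification=["<document or authority to consult>"],
    )
    print(answer.render())
```

Fixing the output schema in code rather than in free-form prompts makes it easier to check that every answer carries its evidence and applicability scope.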
Expanding application scenarios is also an important future direction for discipline-specific foundation models in urban planning. Urban managers can use such specialized models to predict urban development trends under different planning policies, thereby supporting decision-making. Planners and designers can leverage the models’ capabilities in creative inspiration and regulatory compliance review to enhance both the innovativeness and compliance of design proposals. Meanwhile, the public may access personalized spatial planning recommendations through open interfaces.
5 Conclusions and Outlook
The study takes the New Science of Cities course as a case study and systematically reviews the development pathway and teaching practice of an AI teaching assistant. Focusing on text generation, image generation, and platform integration, it establishes an intelligent teaching support system for urban planning education. By constructing a high-quality domain-specific corpus and introducing a RAG mechanism along with fine-tuning of image-text models, the system significantly enhances AI capabilities in knowledge retrieval and visual representation. Meanwhile, platform integration further improves instructional interaction and enriches students’ learning experience.
On this basis, the study proposes the concept of a “discipline-based knowledge engine” and constructs a multimodal educational system centered on LLMs, integrating domain-specific corpora, knowledge graphs, and generative model optimization. This system not only supports classroom teaching but can also be extended to research training, competency assessment, and interdisciplinary collaboration, serving as an important intelligent infrastructure for advancing academic innovation and educational transformation.
However, the adoption of AI in teaching still faces several challenges, including insufficient understanding of teaching contexts, excessive student dependence, and limited system transparency. Therefore, the construction and application of the AI teaching assistant should not remain at the level of technological feasibility alone, but must be continuously reflected upon and optimized in relation to educational objectives and knowledge construction logic. Future work should further refine learning pathways and instructional interaction mechanisms under the guidance of constructivist learning theory and cognitive load theory, forming a hybrid paradigm in which technology and educational theory mutually reinforce each other, and promoting the transformation of AI from an assistive tool to a deeply embedded teaching partner.
In addition, ethical governance is a key issue for the sustainable development of AI-based education. In the future, more comprehensive regulatory frameworks need to be established for data privacy protection, algorithmic fairness, and system interpretability. Only with the dual support of technological innovation and ethical governance can AI teaching assistants truly become a driving force for the intelligent and sustainable transformation of higher education.