Mathematical reasoning is a fundamental aspect of intelligence, encompassing a spectrum from basic arithmetic to intricate problem-solving. Recent investigations into the mathematical abilities of large language models (LLMs) have yielded inconsistent and incomplete assessments. In response, we introduce MathEval, a comprehensive benchmark designed to methodically evaluate the mathematical problem-solving proficiency of LLMs across various contexts, adaptation strategies, and evaluation metrics. MathEval consolidates 22 distinct datasets spanning a broad range of mathematical disciplines, languages (including English and Chinese), and problem categories (from arithmetic and competition mathematics to higher mathematics), with difficulty levels ranging from elementary to advanced. To address the complexity of mathematical reasoning outputs and adapt to diverse models and prompts, we employ GPT-4 in an automated pipeline for answer extraction and comparison. Additionally, we trained and publicly released a DeepSeek-LLM-7B-Base model on GPT-4’s outputs, enabling precise answer validation without requiring GPT-4 access. To mitigate potential test data contamination and truly gauge progress, MathEval incorporates an annually refreshed set of problems from the latest Chinese National College Entrance Examination (Gaokao-2023, Gaokao-2024), thereby benchmarking genuine advancements in mathematical problem-solving skills.
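The answer extraction and comparison step described above can be sketched as follows. This is a minimal illustration of a judge-style comparison pipeline, not MathEval’s actual implementation: the prompt template and helper names are hypothetical, and a cheap string-normalization fallback stands in for cases where no LLM judge is available.

```python
# Hypothetical sketch of an LLM-judge answer-comparison step.
# Prompt wording and function names are illustrative only.

def build_judge_prompt(question: str, model_output: str, gold_answer: str) -> str:
    """Compose the prompt an LLM judge would receive to extract the final
    answer from a free-form solution and compare it to the reference."""
    return (
        "You are grading a math solution.\n"
        f"Question: {question}\n"
        f"Model solution: {model_output}\n"
        f"Reference answer: {gold_answer}\n"
        "Extract the final answer from the model solution and reply "
        "'CORRECT' if it matches the reference answer, else 'INCORRECT'."
    )

def normalize(ans: str) -> str:
    """Cheap local fallback: strip whitespace and trailing punctuation
    before an exact-match comparison."""
    return ans.strip().lower().replace(" ", "").rstrip(".")

def quick_match(model_answer: str, gold_answer: str) -> bool:
    """Exact match after normalization; an LLM judge handles the rest."""
    return normalize(model_answer) == normalize(gold_answer)
```

In practice the judge prompt would be sent to GPT-4 (or the released 7B validator), with the string fallback reserved for trivially comparable answers.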
With the rapid development of online education, cognitive diagnosis has become a key task in intelligent education, particularly for student ability assessment and resource recommendation. However, existing cognitive diagnosis models face a diagnostic-system cold-start problem: new domains have no response logs, making accurate student diagnosis challenging. This research defines this task as zero-shot cross-domain cognitive diagnosis (ZCCD), which aims to diagnose students’ cognitive abilities in a target domain using only response logs from a source domain, without any prior interaction data in the target domain. To address this, a novel paradigm, large language model (LLM)-guided cognitive state transfer (LCST), is proposed, which leverages the powerful capabilities of LLMs to bridge the gap between the source and target domains. By modelling cognitive states as natural language tasks, LLMs act as intermediaries to transfer students’ cognitive states across domains. The research uses advanced LLMs to analyze the relationships between knowledge concepts and diagnose students’ mastery of the target domain. Experimental results on real-world datasets show that LCST significantly improves cognitive diagnostic performance, highlighting the potential of LLMs as educational experts in this context. This approach provides a promising direction for solving the ZCCD challenge and advancing the application of LLMs in intelligent education.
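The core LCST idea, verbalizing a student’s source-domain cognitive state so an LLM can reason about target-domain concepts, can be sketched as below. All function names, thresholds, and prompt wording are assumptions for illustration; the paper’s actual prompts and state encoding may differ.

```python
# Illustrative sketch of the LCST transfer step: per-concept mastery
# scores are turned into natural language, then an LLM is asked to
# estimate mastery of related target-domain concepts. All names and
# thresholds here are hypothetical.

def verbalize_state(mastery: dict[str, float]) -> str:
    """Turn per-concept mastery scores (0-1) into a textual description."""
    parts = []
    for concept, score in mastery.items():
        level = "strong" if score >= 0.7 else "moderate" if score >= 0.4 else "weak"
        parts.append(f"{concept}: {level} ({score:.2f})")
    return "; ".join(parts)

def build_transfer_prompt(source_state: dict[str, float],
                          target_concepts: list[str]) -> str:
    """Compose the prompt handed to the LLM intermediary."""
    return (
        "A student's mastery in the source domain is: "
        f"{verbalize_state(source_state)}.\n"
        "Based on the relationships between these knowledge concepts, "
        "estimate the student's mastery (0-1) of each target-domain concept: "
        + ", ".join(target_concepts) + "."
    )
```

The LLM’s response would then be parsed back into numeric mastery estimates for the target-domain diagnosis.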
As a data-driven analysis and decision-making tool, student portraits have gained significant attention in education management and personalized instruction. This research systematically explores the construction of student portraits by integrating knowledge graph technology with advanced data analytics, including clustering, predictive modelling, and natural language processing. It then examines the portraits’ applications in personalized learning, such as student-centric adaptation of content and learning paths, and in personalized teaching, especially educator-driven instructional adjustments. Through case studies and quantitative analysis of multimodal datasets, including structured academic records, unstructured behavioural logs, and socio-emotional assessments, the research demonstrates how student portraits enable academic early warning, adaptive learning path design, and equitable resource allocation. The findings provide actionable insights and technical frameworks for implementing precision education.
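The clustering step in portrait construction can be illustrated with a toy example: grouping students by a pair of simple features. The features, data, and plain k-means routine below are invented for illustration and are not the study’s actual pipeline.

```python
import numpy as np

def kmeans(X: np.ndarray, k: int, iters: int = 20) -> np.ndarray:
    """Plain k-means with deterministic, evenly spaced initial centres."""
    centers = X[np.linspace(0, len(X) - 1, k).astype(int)].copy()
    labels = np.zeros(len(X), dtype=int)
    for _ in range(iters):
        # assign each student to the nearest centre
        labels = np.argmin(((X[:, None] - centers[None]) ** 2).sum(-1), axis=1)
        # move each centre to the mean of its assigned students
        for j in range(k):
            if np.any(labels == j):
                centers[j] = X[labels == j].mean(axis=0)
    return labels

# toy feature matrix, columns: [average grade (0-100), weekly study hours]
students = np.array([[92, 14], [88, 12], [55, 3],
                     [60, 4], [90, 13], [58, 2]], dtype=float)
labels = kmeans(students, k=2)
```

A real portrait system would cluster far richer multimodal features (academic records, behavioural logs, socio-emotional scores) and attach the resulting group profiles to the knowledge graph.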
The rapid development of artificial intelligence technology has propelled automated, humanized, and personalized learning services to become a core topic in the transformation of education. Generative artificial intelligence (GenAI), represented by large language models (LLMs), offers opportunities to reshape how personalized learning objectives are set, how learning patterns are organized, how learning resources are constructed, and how learning is evaluated. However, it still faces significant limitations in understanding differences in individual static characteristics, dynamic learning processes, and students’ literacy goals, as well as in actively differentiating and adapting to these differences. This study clarifies the technical strategies and application services of GenAI-empowered personalized learning, and analyzes challenges such as lagging theoretical foundations and a lack of practical guidance, weak autonomy and controllability of key technologies, insufficient understanding of the learning process, absent mechanisms for cultivating higher-order literacy, and deficiencies in safety and ethical regulation. It proposes implementation paths centred on interdisciplinary theoretical innovation, development of LLMs, enhancement of basic personalized services, improvement of higher-order literacy, optimization of long-term evidence-based effects, and establishment of a safety and ethical value regulation system, aiming to promote safe, efficient, and sustainable personalized learning.
With the development of the Internet and intelligent education systems, the significance of cognitive diagnosis has become increasingly acknowledged. Cognitive diagnosis models (CDMs) aim to characterize learners’ cognitive states based on their responses to a series of exercises. However, conventional CDMs often struggle with less frequently observed learners and items, primarily due to limited prior knowledge. Recent advancements in large language models (LLMs) offer a promising avenue for infusing rich domain information into CDMs. However, integrating LLMs directly into CDMs poses significant challenges. While LLMs excel at semantic comprehension, they are less adept at capturing the fine-grained, interactive behaviours central to cognitive diagnosis. Moreover, the inherent difference between LLMs’ semantic representations and CDMs’ behavioural feature spaces hinders their seamless integration. To address these issues, this research proposes a model-agnostic framework that enhances CDMs with LLMs’ extensive domain knowledge. It augments various CDM architectures by leveraging LLM-derived domain knowledge and the structure of observed learning outcomes (SOLO) taxonomy. It operates in two stages: first, LLM diagnosis, which assesses learners via educational techniques to establish a richer and more comprehensive knowledge representation; second, cognitive level alignment, which reconciles the LLM’s semantic space with the CDM’s behavioural domain through contrastive learning and mask-reconstruction learning. Empirical evaluations on multiple real-world datasets demonstrate that the proposed framework significantly improves diagnostic accuracy, underscoring the value of integrating LLM-driven semantic knowledge into traditional cognitive diagnosis paradigms.
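The contrastive-learning side of the cognitive level alignment stage can be sketched with a standard InfoNCE-style objective: pull each learner’s LLM-derived semantic embedding toward the same learner’s CDM behavioural embedding and push it away from other learners’. This numpy sketch is an illustration of the general technique, not the paper’s implementation.

```python
import numpy as np

def info_nce(sem: np.ndarray, beh: np.ndarray, tau: float = 0.1) -> float:
    """InfoNCE-style alignment loss between a batch of semantic embeddings
    (rows of `sem`) and behavioural embeddings (rows of `beh`); row i of
    each matrix belongs to the same learner and forms the positive pair."""
    sem = sem / np.linalg.norm(sem, axis=1, keepdims=True)
    beh = beh / np.linalg.norm(beh, axis=1, keepdims=True)
    logits = sem @ beh.T / tau                    # cosine similarity / temperature
    logits -= logits.max(axis=1, keepdims=True)   # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-np.mean(np.diag(log_probs)))    # positives on the diagonal
```

Minimizing this loss drives matched semantic/behavioural pairs together in a shared space; the mask-reconstruction objective mentioned in the abstract would be trained alongside it.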
Generative artificial intelligence (GenAI) models, such as ChatGPT, have rapidly gained popularity. Despite this widespread usage, there is still a limited understanding of how this emerging technology impacts different stakeholders in higher education. While extensive research exists on the general opportunities and risks in education, it often lacks specificity regarding the target audience (namely, students, educators, and institutions), and concrete solution strategies and recommendations are typically absent. Our goal is to address the perspectives of students and educators separately and to offer tailored solutions for each of these two stakeholder groups. This study employs a mixed-methods approach that combines a detailed online questionnaire of 188 students with a scenario analysis to examine the potential benefits and drawbacks introduced by GenAI. The findings indicate that students use the technology for tasks such as assignment writing and exam preparation, seeing it as an effective tool for achieving academic goals. Subsequently, the scenario analysis provided insights into possible future scenarios, highlighting both opportunities and challenges of integrating GenAI within higher education for students as well as educators. The primary aim is to offer a clear and precise understanding of the potential implications for students and educators separately while providing recommendations and solution strategies. The results suggest that irresponsible and excessive use of the technology could pose significant challenges. Therefore, educators need to establish clear policies, reevaluate learning objectives, enhance AI skills, update curricula, and reconsider examination methods.
The rapid advancement of artificial intelligence has significantly impacted education, with large-scale foundation models (LFMs) emerging as transformative tools. While LFMs have demonstrated exceptional performance across diverse domains, their integration into K-12 education remains in its early stages, requiring alignment with pedagogical principles, cognitive development, and curriculum standards. This paper provides a comprehensive technological review of LFM applications in K-12 education, examining current workflows, challenges, and future opportunities. We explore how LFMs facilitate personalized learning, teacher–student collaboration, and automated assessment while highlighting critical issues such as motivation, engagement, and age-appropriate instructional strategies. By analyzing global developments, this study offers valuable insights for educators seeking to optimize AI-driven teaching methods and for students leveraging AI for self-directed learning. Our findings aim to inform future research and drive innovation in educational AI, ensuring the effective and ethical integration of LFMs into the evolving K-12 educational landscape.
Research on open-source large language models (LLMs) has made significant progress, but most studies focus predominantly on general-purpose English data, which poses challenges for LLM research in Chinese education. To address this, this research first reviews and synthesizes the core technologies of representative open-source LLMs, and then designs an advanced 1.5B-parameter LLM tailored to the Chinese education field. The Chinese education large language model (CELLM) is trained from scratch in two stages: pre-training and instruction fine-tuning. In the pre-training phase, an open-source dataset for the Chinese education domain is utilized. During the instruction fine-tuning stage, a Chinese instruction dataset comprising over 258,000 entries is developed and open-sourced. Finally, results and analysis of CELLM across multiple evaluation datasets are presented, providing a reference baseline for future research. All models, data, and code are open-sourced to foster community research on LLMs in the Chinese education domain.
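Instruction fine-tuning datasets of the kind described above are typically stored as instruction/input/output records. The sketch below builds one such record in the common convention; the field names and example content are assumptions for illustration and not necessarily CELLM’s exact schema.

```python
import json

# Hypothetical sketch of a single instruction fine-tuning record, using
# the widely adopted instruction/input/output convention. Field names and
# example content are illustrative, not CELLM's actual schema.

def make_record(instruction: str, input_text: str, output: str) -> str:
    """Serialize one training example as a JSON line; ensure_ascii=False
    keeps Chinese text readable rather than \\u-escaped."""
    return json.dumps(
        {"instruction": instruction, "input": input_text, "output": output},
        ensure_ascii=False,
    )

record = make_record(
    "Explain the concept to a middle-school student.",
    "What is the Pythagorean theorem?",
    "In a right triangle, the square of the hypotenuse equals the sum of "
    "the squares of the other two sides: a^2 + b^2 = c^2.",
)
```

A fine-tuning corpus is then simply one such JSON record per line (JSONL), which the training pipeline tokenizes into prompt/response pairs.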