1 Introduction
In recent years, the field of artificial intelligence (AI) has witnessed significant advancements, giving rise to a powerful class of large language models (LLMs). At the forefront of these language models is GPT-4 (
OpenAI, 2023). With an impressive number of parameters in the trillions, GPT-4 is one of the largest language models ever created, capable of understanding complex language patterns and generating responses that are often indistinguishable from those of a human. At the same time, open-source communities have developed several alternatives, including LLaMA (
Touvron et al., 2023). Trained on trillions of tokens of publicly available data, LLaMA achieves competitive performance with a relatively small number of parameters and affordable training costs. It thus emerges as a valuable open-source tool for a wide range of applications.
While large language models have demonstrated remarkable capabilities, they encounter distinct challenges when applied to educational tasks. We outline their key limitations below and illustrate them in Fig.1.
Fig.1 Limitations of general LLMs in education: (a) comprehension ability, (b) out-of-date knowledge, (c) personalized ability, (d) Chinese proficiency, (e) logical reasoning ability.
First, inadequacy of basic cognitive capacities. Although LLMs possess exceptional capabilities in the general domain, their fundamental educational capabilities, such as retention and comprehension, remain limited by their training data. Because subject knowledge is vast, LLMs often show only a constrained comprehension of specialized expertise that extends beyond their training data, yielding inaccurate responses. In addition, academic knowledge evolves constantly, especially in practical subjects; the information in the training data may become outdated and obsolete, limiting the ability of LLMs to generate factually accurate responses to inquiries that require up-to-date awareness of post-training events and knowledge (
Cao et al., 2021;
Liu et al., 2023;
Wang et al., 2021;
Yang et al., 2023).
Second, lack of advanced cognitive capacities. Proficiency in advanced cognitive capacities, such as analysis, evaluation and innovation, is vital for tackling challenging tasks. Existing studies have demonstrated that LLMs lack these capabilities, which can lead to failures. For instance,
Liu et al. (2023) have highlighted the persisting difficulty in logical reasoning for ChatGPT and GPT-4, especially when confronted with unfamiliar data and natural language inference datasets.
Baidoo-Anu and Owusu Ansah (2023) discovered that generative models were limited to generating responses solely based on patterns present in their training data, thereby restricting the creativity and originality of their outputs. Moreover,
Baidoo-Anu and Owusu Ansah (2023) provided evidence that ChatGPT and other generative AI models may offer general information and assistance but lack the ability to personalize instruction to cater to the specific needs of individual students. The insufficiency of advanced skills in LLMs hinders their broader utilization in educational contexts.
Third, limited Chinese proficiency. While several large language models, such as LLaMA, have been made available to the public, they focus primarily on English corpora, which limits their applicability to other languages.
Cui et al. (2023) showed that the vocabularies of LLaMA and Alpaca (
Taori et al., 2023) contained only a few hundred Chinese tokens, substantially hindering their efficiency in encoding and decoding Chinese text.
The key to rectifying these shortcomings and adapting LLMs to the realm of education lies in combining LLMs with educational theories, thereby equipping them with abilities at varying levels. Among the many educational theories, Bloom’s Taxonomy, as delineated in
Anderson et al. (2001), proffers a framework for categorizing the diverse objectives and proficiencies that educators aspire to instill into their students. The new taxonomy embraces a two-dimensional framework encompassing “knowledge” and “cognitive processes.” “Knowledge” pertains to the relevant content involved in learning, while “cognitive processes” refer to the academic behaviors and manifestations of learning that need to be mastered. Bloom’s Taxonomy has provided ample scope for guiding teaching practices and helping learners progress to higher-order thinking (
Ramirez, 2017); hence, it is well suited to boosting the abilities of LLMs in the realm of education.
In accordance with Bloom’s Taxonomy, we propose in this paper a method that transfers general-domain LLMs into the educational field by simultaneously learning the “knowledge” and “cognitive processes” dimensions, as illustrated in Fig.2. Firstly, to expand the breadth of the model’s knowledge, we manually summarize coarse-grained knowledge concepts drawn from authentic textbooks, and utilize strong LLMs to generate fine-grained knowledge concepts aligned with the “knowledge” categories. This approach provides comprehensive coverage of detailed knowledge concepts spanning multiple levels and complexities, which helps to enhance the model’s basic cognitive capacities. Subsequently, we employ self-instruction, as demonstrated in (
Wang et al., 2023), to construct over forty thousand Chinese instructions based on the “cognitive processes” dimension, covering educational tasks that include professional knowledge question answering, test problem generation, and intelligent tutoring. These instructions not only embody educational capacities at various levels, particularly advanced ones, but also encompass all the aforementioned fine-grained knowledge concepts. Using them as instruction-tuning data significantly enhances the proficiency of the models. Finally, to enhance the model’s awareness of knowledge that lies beyond its training data, we utilize two retrieval augmentation strategies during inference, namely local knowledge base retrieval and search engine retrieval, as extra knowledge sources.
Fig.2 Training pipeline. We collect knowledge concepts and instructions under the guidance of textbooks, Bloom’s Taxonomy and strong LLMs, serving as instruction-tuning data to transform general LLMs to educational LLMs. During inference, we construct a local knowledge base based on the textbook, incorporating search engine capabilities for retrieval enhancement.
We apply our method to two open-source Chinese language models, Chinese-LLaMA-Alpaca (
Cui et al., 2023) and Qwen-7B-Chat (
Bai et al., 2023). Experiments demonstrate the superiority of our fine-tuned models over the original models across various educational tasks, assessed from a diverse range of evaluation perspectives. It is worth noting that our experiments are conducted specifically within the domain of Chinese AI instruction. In summary, our contributions are three-fold:
(1) We devise instructions based on textbooks and the guidance of Bloom’s Taxonomy to transfer general LLMs to the educational domain;
(2) We utilize retrieval augmentation strategies during inference to expand the breadth of the model’s knowledge and enhance the quality of responses to factual inquiries;
(3) We conduct evaluations on various educational tasks, demonstrating the superiority of our fine-tuned models compared to the original models.
2 Related Work
2.1 Large Language Models
LLMs have revolutionized the field of AI and natural language processing, opening up new possibilities for human–computer interaction and advancing our understanding of language and its applications. With the ability to process and analyse vast amounts of textual data, large language models have showcased remarkable capabilities in tasks such as text generation, question answering, summarization, translation, sentiment analysis, and more.
Among all the models, ChatGPT (
Ouyang et al., 2022) and GPT-4 (
OpenAI, 2023) are two prominent iterations of large language models developed by OpenAI. Despite their excellent performance on general tasks, they are not open-source and have a huge number of parameters, which hinders personal deployment and research. LLaMA (
Touvron et al., 2023) is an open-source substitute for GPT, with parameter counts ranging from 7 billion to 65 billion. Alpaca (
Taori et al., 2023) conducts instruction tuning on LLaMA with 52K instruction examples, achieving performance comparable to ChatGPT in English at an affordable cost.
Despite its great performance in English, LLaMA shows weakness in Chinese due to the lack of Chinese corpora in its training data. To tackle this problem, Chinese-LLaMA-Alpaca (
Cui et al., 2023) augments LLaMA’s ability to understand, generate, and follow instructions in Chinese by extending LLaMA’s existing vocabulary with additional Chinese tokens and further fine-tuning the model on Chinese instruction datasets. Unlike Chinese-LLaMA-Alpaca, which directly fine-tunes LLaMA, ChatGLM (
Zeng et al., 2022) is a new model built on the existing GLM architecture. Trained on a large Chinese–English bilingual corpus and aligned with human intentions through supervised fine-tuning, ChatGLM shows powerful Chinese language capabilities.
2.2 Bloom’s Taxonomy
Bloom’s Taxonomy (
Anderson et al., 2001) is a widely recognized framework that categorizes educational objectives and cognitive processes. It was first developed in the 1950s by Benjamin Bloom and his colleagues, and it has since become a fundamental tool in the field of education. Bloom’s Taxonomy provides a structured way to understand and organize different levels of thinking and learning (
Bloom et al., 1956).
In its 2001 revision, the new taxonomy embraces a two-dimensional framework encompassing “knowledge” and “cognitive processes.” Knowledge pertains to the relevant content involved in learning, encompassing four categories ranging from concrete to abstract: factual knowledge, conceptual knowledge, procedural knowledge, and metacognitive knowledge. Cognitive processes refer to the academic behaviors and manifestations of learning that need to be mastered, including six categories: remember, understand, apply, analyse, evaluate, and create, which are arranged in ascending order of cognitive complexity. This theory has been used to explore the weaknesses of ChatGPT in the field of education (
Elsayed, 2023).
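For reference, the two dimensions and their categories can be written down as a simple data structure; the following Python snippet merely enumerates the categories listed above.

```python
# The two dimensions of the revised Bloom's Taxonomy as described above:
# "knowledge" categories run from concrete to abstract, and "cognitive
# processes" are ordered by increasing cognitive complexity.
BLOOMS_TAXONOMY = {
    "knowledge": [
        "factual knowledge",
        "conceptual knowledge",
        "procedural knowledge",
        "metacognitive knowledge",
    ],
    "cognitive_processes": [
        "remember",
        "understand",
        "apply",
        "analyse",
        "evaluate",
        "create",
    ],
}
```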
3 Methods
The details of our fine-tuning method are introduced in this section. Firstly, we collect fine-grained concepts corresponding to the categories in the “knowledge” dimension of Bloom’s Taxonomy. Then, instruction templates are designed to align with the “cognitive processes” dimension of Bloom’s Taxonomy. By combining knowledge concepts and instruction templates, we obtain instruction-output pairs as our training data. In addition, two retrieval augmentation methods are employed during inference, enhancing the accuracy and professionalism of the model’s responses.
3.1 Knowledge Concepts
Knowledge concepts, as the fundamental units for transmitting instructional information in teaching activities, play an important role in both teaching and learning. They can be regarded as the basic components of subject knowledge and serve as the cornerstone for constructing a systematic knowledge system. We collect knowledge concepts in two steps, corresponding to different levels of granularity. Coarse-grained knowledge concepts are fewer in number and easy to obtain, so in the first step we manually extract them from the textbooks. In this way, we collect 117 coarse-grained knowledge concepts about AI, which are used for fine-grained knowledge concept generation.
In the second step, we aim to collect intricate fine-grained knowledge concepts that are derived from the coarse ones. However, acquiring these detailed concepts manually is a time-consuming task and requires significant human effort. Therefore, we employ a self-instruction approach (
Wang et al., 2023) to acquire them. To be specific, for each coarse concept and knowledge category within Bloom’s Taxonomy, we employ ChatGPT to act as an AI learner. We prompt it to provide a series of questions that it may encounter during the learning process and to summarize the corresponding fine-grained knowledge concepts related to each category. Following careful manual extraction, cleaning, and filtering of the responses, we obtain a total of 981 fine-grained concepts and 1,196 questions. These concepts encompass various levels and diverse subjects.
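For illustration, this generation step can be sketched as follows, assuming access to an OpenAI-compatible chat endpoint; the prompt wording, model name, and helper function are illustrative rather than the exact ones used.

```python
from openai import OpenAI

client = OpenAI()  # assumes an OPENAI_API_KEY environment variable

KNOWLEDGE_CATEGORIES = ["factual", "conceptual", "procedural", "metacognitive"]

def generate_fine_grained(coarse_concept: str, category: str) -> str:
    """Ask ChatGPT, acting as an AI learner, for questions it might meet while
    studying `coarse_concept` and for the fine-grained knowledge concepts of
    one knowledge category behind those questions (illustrative prompt)."""
    prompt = (
        f"You are a student learning about '{coarse_concept}'. "
        f"List questions you may encounter during learning, and summarize the "
        f"fine-grained {category} knowledge concepts related to each question."
    )
    response = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content

# The raw responses are then manually extracted, cleaned, and filtered,
# yielding the fine-grained concepts and questions described above.
```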
3.2 Knowledge-Based Instruction Tuning
Instruction tuning (
Wei et al., 2021) is a simple method for improving the ability of language models to respond to NLP instructions, enabling them to perform tasks described purely via instructions. Inspired by automatic instruction generation, we design templates and construct instructions using the concepts above.
Specifically, we develop 39 distinct templates into which fine-grained concepts or questions are filled. These templates use three common educational tasks, i.e., subject knowledge QA, test problem generation, and intelligent tutoring, as carriers, comprehensively covering all the learning abilities described in the cognitive processes of Bloom’s Taxonomy. Subsequently, these templates are merged with concepts or questions to formulate original instructions. However, these instructions often suffer from substandard quality and a lack of diversity, which might affect the performance of the model (
Wang et al., 2023). Therefore, we employ ChatGPT to assess the coherence of each instruction. Only those instructions deemed of high quality are selected and revised to reduce similarity with others. Through this process, we acquire a set of instructions of exceptional quality. These instructions are then submitted to human experts, who use GPT as an assistant to generate the corresponding answers. Each instruction, along with its answer, is organized in the Stanford Alpaca (
Taori et al., 2023) style, which consists of mandatory Instruction and Output fields and an optional Input field. Ultimately, we obtain 38,784 pairs of instruction and output, which serve as the foundation for supervised fine-tuning. We illustrate some data samples from our dataset in Fig.3.
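A minimal sketch of how such Alpaca-style records might be assembled is shown below; the template strings and the example concept are illustrative placeholders, not the 39 templates actually used.

```python
import json

# Illustrative templates for the three task carriers; the paper's 39 templates
# are not reproduced here.
TEMPLATES = {
    "knowledge_qa": "请解释以下人工智能知识点：{concept}",
    "problem_generation": "请围绕知识点“{concept}”出一道测试题，并给出参考答案。",
    "tutoring": "我在学习“{concept}”时遇到了困难，请一步一步地辅导我。",
}

def build_record(concept: str, task: str, output: str) -> dict:
    """Assemble one Stanford-Alpaca-style record with mandatory
    instruction/output fields and an optional (here empty) input field."""
    return {
        "instruction": TEMPLATES[task].format(concept=concept),
        "input": "",
        "output": output,  # answer written by experts with GPT assistance
    }

record = build_record("反向传播算法", "knowledge_qa", "反向传播算法是一种……")
print(json.dumps(record, ensure_ascii=False, indent=2))
```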
Fig.3 Demonstrations of the dataset.
3.3 Retrieval Enhancement
As mentioned in (
Cao et al., 2021;
Liu et al., 2023;
Wang et al., 2021;
Yang et al., 2023), LLMs have limited performance in producing factually accurate answers. To tackle this problem, we utilize two retrieval enhancement methods during inference, namely local knowledge base retrieval and search engine retrieval.
Local knowledge base retrieval primarily addresses factual information contained in textbooks, which is advantageous when the model responds to queries involving obscure knowledge or highly professional language. To establish such a local knowledge base, we follow the standard LangChain-style procedure. Initially, we import our textbooks as unstructured textual content and divide them into multiple text chunks using a text splitter. Subsequently, a text embedding model transforms these chunks into a vector space while preserving textual coherence and similarity. Each query can then be converted into the same vector space, enabling retrieval of the k most similar text segments from the textbooks to serve as reference materials.
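A minimal sketch of this chunk-embed-retrieve pipeline is given below; the naive character splitter and the multilingual embedding model named here are placeholder choices, since the paper does not specify them.

```python
import numpy as np
from sentence_transformers import SentenceTransformer

# Placeholder embedding model; the paper does not specify which model is used.
embedder = SentenceTransformer("paraphrase-multilingual-MiniLM-L12-v2")

def split_text(text: str, chunk_size: int = 500) -> list[str]:
    """Naive splitter: cut the textbook into fixed-size character chunks."""
    return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]

def build_index(chunks: list[str]) -> np.ndarray:
    """Embed every chunk into a shared vector space."""
    return embedder.encode(chunks, normalize_embeddings=True)

def retrieve(query: str, chunks: list[str], index: np.ndarray, k: int = 3) -> list[str]:
    """Embed the query into the same space and return the k most similar chunks."""
    query_vec = embedder.encode([query], normalize_embeddings=True)[0]
    scores = index @ query_vec  # cosine similarity, since vectors are normalized
    top = np.argsort(-scores)[:k]
    return [chunks[i] for i in top]
```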
To address inquiries that exceed the scope of the textbooks, we augment the model with a search engine. Each inquiry is dispatched to the Azure Bing Search API, which returns a collection of search results serving as pertinent resources. Whether retrieved from the local knowledge base or the search engine, these resources are combined with the user’s query to form the model input, from which a more accurate, comprehensive, and professional response can be generated. Note that retrieval enhancement is an optional feature: users decide whether to enable it according to their needs.
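For illustration, the search-engine branch could look roughly like the sketch below; the endpoint and response fields follow the public Bing Web Search v7 API, while the environment variable name and the Chinese prompt wrapper are assumptions.

```python
import os
import requests

BING_ENDPOINT = "https://api.bing.microsoft.com/v7.0/search"

def bing_search(query: str, count: int = 3) -> list[str]:
    """Send the query to the Azure Bing Search API and return result snippets.
    Assumes a subscription key in the BING_SEARCH_KEY environment variable."""
    resp = requests.get(
        BING_ENDPOINT,
        headers={"Ocp-Apim-Subscription-Key": os.environ["BING_SEARCH_KEY"]},
        params={"q": query, "count": count},
        timeout=10,
    )
    resp.raise_for_status()
    pages = resp.json().get("webPages", {}).get("value", [])
    return [page["snippet"] for page in pages]

def build_prompt(query: str, snippets: list[str]) -> str:
    """Merge retrieved snippets with the user's query to form the model input."""
    context = "\n".join(snippets)
    return f"参考资料：\n{context}\n\n请根据以上资料回答：{query}"
```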
4 Experiments
4.1 Baselines
We conduct experiments on two Chinese baselines:
(1) Chinese-LLaMA-Alpaca (
Cui et al., 2023) continues training on LLaMA with Chinese data. We conduct our training on Chinese-Alpaca-7B and Chinese-Alpaca-13B.
(2) Qwen (
Bai et al., 2023) is an optimized dialogue model specifically designed for the Chinese chatting scenario. We conduct our training on Qwen-7B-Chat.
4.2 Experiment Detail
We adopt the AdamW optimizer with an initial learning rate of 2e-5. The models are trained on 8 A100 GPUs, and the batch size on each GPU is set to 16. We use the Low-Rank Adaptation (LoRA) (
Hu et al., 2021) training strategy to reduce the number of trainable parameters when training on Chinese-LLaMA-Alpaca. The LoRA rank is set to 8 for Chinese-Alpaca-7B and 32 for Chinese-Alpaca-13B. For Qwen-7B-Chat, we use full-parameter fine-tuning. For inference, we set the temperature to 1, top-p to 0.7, beam size to 1, and the maximum generation length to 1,024.
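These settings can be expressed as a configuration sketch using the Hugging Face transformers and peft libraries; the LoRA alpha, target modules, and output directory below are illustrative, as the paper only reports the values listed above.

```python
from peft import LoraConfig
from transformers import GenerationConfig, TrainingArguments

# LoRA settings: rank 8 for Chinese-Alpaca-7B, rank 32 for Chinese-Alpaca-13B.
# lora_alpha and target_modules are illustrative; the paper reports only the ranks.
lora_7b = LoraConfig(r=8, lora_alpha=16, target_modules=["q_proj", "v_proj"])
lora_13b = LoraConfig(r=32, lora_alpha=64, target_modules=["q_proj", "v_proj"])

# Optimizer and batch settings reported above (per-GPU batch size 16, lr 2e-5).
training_args = TrainingArguments(
    output_dir="wisdombot",          # illustrative path
    learning_rate=2e-5,
    per_device_train_batch_size=16,
    optim="adamw_torch",
)

# Decoding settings used at inference time.
generation_config = GenerationConfig(
    do_sample=True,
    temperature=1.0,
    top_p=0.7,
    num_beams=1,
    max_new_tokens=1024,
)
```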
4.3 Testset Detail
4.3.1 Self-Constructed Dataset
Our test set consists of two parts: educational functions and cognitive capacities. There are 70 instances in the educational functions part, covering professional question answering, test problem generation, and intelligent tutoring; each function is evaluated from diverse perspectives. The remaining 60 instances test cognitive capacities, corresponding to the six cognitive processes described in Bloom’s Taxonomy. Fig.4 illustrates the distribution of our test data.
Fig.4 Distribution of test data.
4.3.2 Public Dataset: C-Eval
C-Eval (
Huang et al., 2024) is a comprehensive Chinese evaluation suite designed to assess the advanced knowledge and reasoning abilities of foundation models in a Chinese context. It includes 13,948 multiple-choice questions across four difficulty levels (middle school, high school, college, and vocational) and spans 52 disciplines. C-Eval also features a subset called C-Eval Hard, focusing on particularly challenging subjects. Evaluation results show that even the most advanced models like GPT-4 have significant room for improvement, highlighting the suite’s ability to benchmark the capabilities and limitations of current language models. We conduct evaluation on the validation set of C-Eval.
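A rough sketch of how zero-shot multiple-choice accuracy can be computed on the C-Eval validation set is shown below; the prompt format and answer-letter extraction are illustrative, not necessarily the exact procedure used.

```python
import re

def format_question(item: dict) -> str:
    """Turn one C-Eval item (question plus options A-D) into a zero-shot prompt.
    The prompt wording is illustrative."""
    return (
        f"{item['question']}\n"
        f"A. {item['A']}\nB. {item['B']}\nC. {item['C']}\nD. {item['D']}\n"
        "答案："
    )

def extract_choice(response: str) -> str:
    """Take the first A/B/C/D letter appearing in the model's response."""
    match = re.search(r"[ABCD]", response)
    return match.group(0) if match else ""

def accuracy(generate, items: list[dict]) -> float:
    """`generate` maps a prompt string to the model's text output."""
    correct = sum(
        extract_choice(generate(format_question(item))) == item["answer"]
        for item in items
    )
    return correct / len(items)
```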
4.4 Results on Self-Constructed Dataset
We conduct model inference on our self-constructed dataset to compare the results generated by the original LLMs (i.e., Chinese-Alpaca-7B and Chinese-Alpaca-13B) and our WisdomBot. We use both human and GPT-4 evaluation when comparing performance, ensuring the accuracy and diversity of the evaluation. For human evaluation, we recruit ten experts in the field of AI to compare the responses of the two models. For GPT-4 evaluation, we run two evaluation passes for each question, swapping the order of the two models’ responses between passes, because GPT-4 tends to favor the response that comes first in the prompt.
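The order-swapping procedure can be sketched as follows; the judging prompt and model identifier are illustrative, not the exact ones used.

```python
from openai import OpenAI

client = OpenAI()  # assumes an OPENAI_API_KEY environment variable

JUDGE_PROMPT = (
    "以下是两位助手对同一问题的回答，请判断哪一个更好。\n"
    "问题：{question}\n回答一：{first}\n回答二：{second}\n"
    "请只输出“回答一”或“回答二”。"
)  # illustrative judging prompt

def judge_once(question: str, first: str, second: str) -> str:
    resp = client.chat.completions.create(
        model="gpt-4",
        messages=[{"role": "user", "content": JUDGE_PROMPT.format(
            question=question, first=first, second=second)}],
    )
    return resp.choices[0].message.content

def judge_pair(question: str, ours: str, baseline: str) -> list[str]:
    """Evaluate each question twice with the response order swapped, so that
    GPT-4's preference for the first-listed answer is averaged out."""
    return [
        judge_once(question, ours, baseline),
        judge_once(question, baseline, ours),
    ]
```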
The results are shown in Fig.5–Fig.8. From these figures we observe that WisdomBot achieves a winning rate of at least 63% in every evaluation part, and even reaches a 100% winning rate on the professional question answering part when compared with Chinese-Alpaca-7B. These results demonstrate that WisdomBot provides more accurate responses.
Fig.5 Human evaluation of whether WisdomBot outperforms Chinese-Alpaca-7B.
Fig.6 Human evaluation of whether WisdomBot outperforms Chinese-Alpaca-13B.
Fig.7 GPT-4 evaluation of whether WisdomBot outperforms Chinese-Alpaca-7B.
Fig.8 GPT-4 evaluation of whether WisdomBot outperforms Chinese-Alpaca-13B.
4.5 Results on C-Eval
We conduct zero-shot evaluation on the validation set of the C-Eval benchmark for all the baselines and WisdomBot. We list the overall performance of the three models in Tab.1, and the performance on each subset of the C-Eval benchmark in Tab.2–Tab.5. From these tables we observe that WisdomBot outperforms the baselines in most subjects, especially those related to information and computer science. We attribute this performance enhancement to our training data, which is highly relevant to these subjects. For other subjects such as social science and humanities, WisdomBot does not exhibit a significant performance decrease; its performance on the “other” subset even increases compared with the baselines. The results demonstrate the superiority of our WisdomBot model.
Tab.1 Results on the validation set of C-Eval benchmark |
Model | STEM | Social science | Humanities | Other | Hard | Average |
Chinese-Alpaca-7B | 35.45 | 51.53 | 47.67 | 41.87 | 28.28 | 42.49 |
Qwen-7B-Chat | 51.61 | 72.64 | 66.94 | 53.83 | 35.14 | 59.37 |
WisdomBot | 59.17 | 72.01 | 65.38 | 54.96 | 49.26 | 62.06 |
Tab.2 Results on the STEM subset within the validation set of C-Eval benchmark |
Model | Computer network | Operating system | Computer architecture | College programming
Chinese-Alpaca-7B | 36.84 | 52.63 | 38.1 | 43.24 |
Qwen-7B-Chat | 42.11 | 42.11 | 52.38 | 64.86 |
WisdomBot | 52.63 | 57.89 | 57.14 | 62.16 |
|
Model | College physics | College chemistry | Advanced mathematics | Probability and statistics
Chinese-Alpaca-7B | 31.58 | 16.67 | 21.05 | 33.33 |
Qwen-7B-Chat | 31.58 | 54.17 | 10.53 | 22.22 |
WisdomBot | 57.89 | 58.33 | 26.32 | 33.33 |
|
Model | Discrete mathematics | Electrical engineer | Metrology engineer | High school mathematics
Chinese-Alpaca-7B | 43.75 | 37.84 | 50 | 16.67 |
Qwen-7B-Chat | 18.75 | 24.32 | 75 | 33.33 |
WisdomBot | 37.5 | 35.14 | 70.83 | 33.33 |
|
Model | High school physics | High school chemistry | High school biology | Middle school mathematics
Chinese-Alpaca-7B | 31.58 | 31.58 | 42.11 | 21.05 |
Qwen-7B-Chat | 57.89 | 52.63 | 73.68 | 63.16 |
WisdomBot | 78.95 | 68.42 | 68.42 | 63.16 |
|
Model | Middle school biology | Middle school physics | Middle school chemistry | Veterinary medicine |
Chinese-Alpaca-7B | 47.62 | 47.37 | 40 | 26.09 |
Qwen-7B-Chat | 85.71 | 84.21 | 100 | 43.48 |
WisdomBot | 90.48 | 84.21 | 95 | 52.17 |
Tab.3 Results on the social science subset within the validation set of C-Eval benchmark |
Model | College economics | Business administration | Marxism | Mao Zedong Thought |
Chinese-Alpaca-7B | 32.73 | 45.45 | 52.63 | 54.17 |
Qwen-7B-Chat | 45.45 | 54.55 | 73.68 | 75.00 |
WisdomBot | 38.18 | 54.55 | 84.21 | 62.50 |
|
Model | Education science | Teacher qualification | High school politics | High school geography |
Chinese-Alpaca-7B | 37.93 | 59.09 | 57.89 | 42.11 |
Qwen-7B-Chat | 65.52 | 84.09 | 94.74 | 63.16 |
WisdomBot | 72.41 | 81.82 | 94.74 | 57.89 |
|
Model | Middle school politics | Middle school geography | | |
Chinese-Alpaca-7B | 66.67 | 66.67 | | |
Qwen-7B-Chat | 95.24 | 75.00 | | |
WisdomBot | 90.48 | 83.33 | | |
Tab.4 Results on the humanities subset within the validation set of C-Eval benchmark |
Model | Modern Chinese history | Ideological and moral cultivation | Logic | Law |
Chinese-Alpaca-7B | 52.17 | 52.63 | 54.55 | 20.83 |
Qwen-7B-Chat | 78.26 | 84.21 | 36.36 | 41.67 |
WisdomBot | 69.57 | 94.74 | 59.09 | 37.50 |
|
Model | Chinese language and literature | Art studies | Professional tour guide | Legal professional |
Chinese-Alpaca-7B | 34.78 | 48.48 | 51.72 | 39.13 |
Qwen-7B-Chat | 56.52 | 66.67 | 79.31 | 43.48 |
WisdomBot | 47.83 | 69.70 | 68.97 | 43.48 |
|
Model | High school Chinese | High school history | Middle school history | |
Chinese-Alpaca-7B | 47.37 | 50.00 | 72.73 | |
Qwen-7B-Chat | 78.95 | 80.00 | 90.91 | |
WisdomBot | 57.89 | 75.00 | 95.45 | |
Tab.5 Results on the other subset within the validation set of C-Eval benchmark |
Model | Civil servant | Sports science | Plant protection | Basic medicine |
Chinese-Alpaca-7B | 40.43 | 57.89 | 36.36 | 47.37 |
Qwen-7B-Chat | 48.94 | 47.37 | 68.18 | 63.16 |
WisdomBot | 53.19 | 52.63 | 59.09 | 68.42 |
|
Model | Clinical medicine | Urban and rural planner | Accountant | Fire engineer |
Chinese-Alpaca-7B | 36.36 | 52.17 | 36.73 | 38.71 |
Qwen-7B-Chat | 45.45 | 63.04 | 51.02 | 48.39 |
WisdomBot | 50.00 | 60.87 | 53.06 | 45.16 |
|
Model | Environmental impact assessment engineer | Tax accountant | Physician | |
Chinese-Alpaca-7B | 45.16 | 34.69 | 34.69 | |
Qwen-7B-Chat | 48.39 | 53.06 | 55.10 | |
WisdomBot | 58.06 | 44.90 | 59.18 | |
4.6 Advanced Cognitive Ability Comparisons
We compare WisdomBot with baseline models to evaluate their advanced cognitive abilities, encompassing creativity, personalized ability, and logical reasoning ability. We curate 50 test samples for each ability test. For the creativity and personalization tests, we ask GPT-4 to score each model’s response on a scale from 1 to 5, with higher scores indicating stronger abilities. For the logical reasoning test, we directly assess the outputs and calculate each model’s accuracy. The results are reported in Tab.6, which demonstrates the superiority of WisdomBot in terms of advanced cognitive abilities.
Tab.6 Comparisons on three advanced cognitive abilities |
Model | Creativity | Personalized ability | Logical reasoning (%) |
Chinese-Alpaca-7B | 2.78 | 3.56 | 8 |
Qwen-7B-Chat | 2.86 | 3.34 | 46 |
WisdomBot | 3.28 | 3.80 | 52 |
4.7 Experiments on Retrieval Enhancement
We evaluate the effectiveness of two retrieval enhancements: local knowledge base retrieval and search engine retrieval. For the knowledge base retrieval, we curate 50 professional AI questions and assess the professional level of the answers using GPT-4. For the search engine retrieval, we curate 30 factual questions and evaluate the correctness of the answers. The results, presented in Tab.7, show that the local knowledge base retrieval enables the model to generate more professional answers, while the search engine retrieval improves the model’s accuracy on factual questions.
Tab.7 Comparisons on retrieval enhancements |
Model | Local knowledge base (%) | Search engine (%) |
w/o retrieval | 30 | 35 |
w/ retrieval | 70 | 93
4.8 Case Study
As illustrated in Fig.9, WisdomBot demonstrates enhanced creativity (a), personalized ability (b), and logical reasoning ability (c) compared to baseline models. Additionally, local knowledge base retrieval (d) and search engine retrieval (e) respectively improve the professional level and accuracy of WisdomBot’s response.
Fig.9 Case examples generated by WisdomBot and baselines: (a) creativity, (b) personalized ability, (c) logical reasoning. WisdomBot with two retrieval enhancement methods: (d) local knowledge base retrieval, (e) search engine retrieval.
5 Conclusion
General large language models lack both the basic and advanced cognitive abilities required in education. We propose a novel tuning approach that takes high-quality textbook-level corpora as its basis, constructs training data around knowledge concepts, and migrates open-source large language models to the education field, producing the educational large language model WisdomBot. Experiments show that WisdomBot achieves excellent performance across different educational scenarios and various subjects.