LLM-Driven Cognitive Diagnosis with SOLO Taxonomy: A Model-Agnostic Framework

Zhiang Dong, Jingyuan Chen, Fei Wu

Frontiers of Digital Education ›› 2025, Vol. 2 ›› Issue (2) : 20. DOI: 10.1007/s44366-025-0057-8

RESEARCH ARTICLE


Abstract

With the development of the Internet and intelligent education systems, the significance of cognitive diagnosis has become increasingly acknowledged. Cognitive diagnosis models (CDMs) aim to characterize learners’ cognitive states based on their responses to a series of exercises. However, conventional CDMs often struggle with less frequently observed learners and items, primarily due to limited prior knowledge. Recent advancements in large language models (LLMs) offer a promising avenue for infusing rich domain information into CDMs. Yet integrating LLMs directly into CDMs poses significant challenges: while LLMs excel in semantic comprehension, they are less adept at capturing the fine-grained, interactive behaviours central to cognitive diagnosis, and the inherent difference between LLMs’ semantic representations and CDMs’ behavioural feature spaces hinders their seamless integration. To address these issues, this research proposes a model-agnostic framework that enhances various CDM architectures by leveraging the extensive domain knowledge of LLMs together with the structure of the observed learning outcome (SOLO) taxonomy. The framework operates in two stages: first, LLM diagnosis, which assesses learners via educational techniques to establish a richer and more comprehensive knowledge representation; second, cognitive level alignment, which reconciles the LLM’s semantic space with the CDM’s behavioural domain through contrastive learning and mask-reconstruction learning. Empirical evaluations on multiple real-world datasets demonstrate that the proposed framework significantly improves diagnostic accuracy, underscoring the value of integrating LLM-driven semantic knowledge into traditional cognitive diagnosis paradigms.
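The cognitive level alignment stage described above reconciles the LLM's semantic space with the CDM's behavioural space via contrastive learning. The paper's actual implementation is not reproduced here; as an illustration only, the following minimal NumPy sketch shows an InfoNCE-style contrastive loss in which the LLM embedding and the CDM embedding of the same learner form a positive pair and all other pairings are negatives. The function name `info_nce_alignment_loss` and the temperature value are assumptions for this sketch.

```python
import numpy as np

def info_nce_alignment_loss(llm_emb, cdm_emb, temperature=0.1):
    """InfoNCE-style contrastive loss aligning LLM semantic embeddings
    with CDM behavioural embeddings.

    llm_emb, cdm_emb: (n_learners, dim) arrays; row i of each matrix
    describes the same learner, so the (i, i) pairs are positives and
    every other pairing serves as a negative.
    """
    # L2-normalise rows so the dot product is cosine similarity.
    a = llm_emb / np.linalg.norm(llm_emb, axis=1, keepdims=True)
    b = cdm_emb / np.linalg.norm(cdm_emb, axis=1, keepdims=True)
    logits = a @ b.T / temperature                # (n, n) similarity matrix
    # Cross-entropy with the diagonal (matched learner) as the target class.
    logits -= logits.max(axis=1, keepdims=True)   # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-np.mean(np.diag(log_probs)))
```

Minimising this loss pulls the two representations of each learner together while pushing apart representations of different learners, which is the behaviour the alignment stage requires; the mask-reconstruction objective mentioned in the abstract would be trained alongside it.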


Keywords

large language models / cognitive diagnosis models / intelligent education system / SOLO taxonomy / knowledge representation

Cite this article

Zhiang Dong, Jingyuan Chen, Fei Wu. LLM-Driven Cognitive Diagnosis with SOLO Taxonomy: A Model-Agnostic Framework. Frontiers of Digital Education, 2025, 2(2): 20 https://doi.org/10.1007/s44366-025-0057-8


Acknowledgments

This research was partially supported by the National Natural Science Foundation of China (Grant Nos. 62037001 and 62307032), and the Zhejiang Province Leading Geese Plan (Grant No. 2025C02022).

Conflict of Interest

Fei Wu is a member of the Editorial Board and Jingyuan Chen is a Senior Editor of Frontiers of Digital Education, who were excluded from the peer-review process and all editorial decisions related to the acceptance and publication of this article. Peer-review was handled independently by the other editors to minimise bias.

Ethics Statements

The authors declare that their Institutional Ethics Committee confirmed that no ethical review was required for this study. Written informed consent for participation was not required because all participants’ data were anonymized before the statistical analyses were conducted.

Data Availability Statements

The authors confirm that all data generated or analysed during this study are included in this published article.

Authors’ Contributions

Zhiang Dong made substantial contributions to the conception of the work, the acquisition, analysis, and interpretation of data, and drafted the work. Jingyuan Chen made substantial contributions to the conception of the work and revised it critically for important intellectual content. Fei Wu made substantial contributions to the acquisition of data and revised it critically for important intellectual content. All authors approved the version to be published and agree to be accountable for all aspects of the work in ensuring that questions related to the accuracy or integrity of any part of the work are appropriately investigated and resolved.

RIGHTS & PERMISSIONS

© 2025 Higher Education Press