PediaBench: a comprehensive Chinese pediatric dataset for benchmarking large language models

Qian ZHANG, Panfeng CHEN, Linkun FENG, Shuyu LIU, Jiali LI, Heng ZHAO, Mei CHEN, Hui LI, Yanhao WANG

Front. Comput. Sci. ›› 2026, Vol. 20 ›› Issue (3) : 2003902

DOI: 10.1007/s11704-025-41345-w
Interdisciplinary
LETTER

Qian ZHANG, Panfeng CHEN, Linkun FENG, Shuyu LIU, Jiali LI, Heng ZHAO, Mei CHEN, Hui LI, Yanhao WANG. PediaBench: a comprehensive Chinese pediatric dataset for benchmarking large language models. Front. Comput. Sci., 2026, 20(3): 2003902 DOI:10.1007/s11704-025-41345-w



RIGHTS & PERMISSIONS

Higher Education Press
