Enhancing Orthopedic Knowledge Assessments: The Performance of Specialized Generative Language Model Optimization

Hong Zhou; Hong-lin Wang; Yu-yu Duan; Zi-neng Yan; Rui Luo; Xiang-xin Lv; Yi Xie; Jia-yao Zhang; Jia-ming Yang; Ming-di Xue; Ying Fang; Lin Lu; Peng-ran Liu; Zhe-wei Ye

doi:10.1007/s11596-024-2929-4

Current Medical Science, 2024, Vol. 44, Issue 5: 1001-1005

Original Article



Abstract

Objective

This study aimed to evaluate and compare the effectiveness of knowledge base-optimized and unoptimized large language models (LLMs) in orthopedics, and to explore optimization strategies for applying LLMs in specific fields.

Methods

A specialized knowledge base was constructed from clinical guidelines of the American Academy of Orthopaedic Surgeons (AAOS) and authoritative orthopedic publications. A total of 30 orthopedic questions covering anatomical knowledge, disease diagnosis, fracture classification, treatment options, and surgical techniques were input into both the knowledge base-optimized and unoptimized versions of GPT-4, ChatGLM, and Spark LLM, and the generated responses were recorded. Three experienced orthopedic surgeons evaluated the overall quality, accuracy, and comprehensiveness of these responses.
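The abstract does not state how the knowledge base was connected to the models. One common pattern is to retrieve the most relevant passage for each question and place it in the prompt ahead of the question. The following is a minimal sketch of that pattern under this assumption; the passages, function names, and word-overlap scoring are all hypothetical and are not taken from the study.

```python
import re
from collections import Counter

# Hypothetical placeholder passages; NOT text from the AAOS guidelines.
KNOWLEDGE_BASE = [
    "Passage A: placeholder text about hip fracture management in older adults.",
    "Passage B: placeholder text about distal radius fracture treatment options.",
    "Passage C: placeholder text about patella fracture surgical techniques.",
]


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def overlap_score(question, passage):
    """Count word tokens shared by the question and the passage."""
    shared = Counter(tokenize(question)) & Counter(tokenize(passage))
    return sum(shared.values())


def retrieve(question, passages, k=1):
    """Return the k passages with the highest word overlap (assumed scoring)."""
    ranked = sorted(passages, key=lambda p: overlap_score(question, p), reverse=True)
    return ranked[:k]


def unoptimized_prompt(question):
    """Baseline condition: the question is sent to the model as-is."""
    return question


def optimized_prompt(question, passages, k=1):
    """Assumed optimized condition: retrieved passages precede the question."""
    context = "\n".join(retrieve(question, passages, k))
    return f"Reference material:\n{context}\n\nQuestion: {question}"


if __name__ == "__main__":
    q = "What are the treatment options for a distal radius fracture?"
    print(optimized_prompt(q, KNOWLEDGE_BASE))
```

Both prompt variants would then be sent to the same model, so that any difference in the surgeons' ratings can be attributed to the added reference material rather than to the model itself.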

Results

Compared with its unoptimized version, the knowledge base-optimized GPT-4 showed improvements of 15.3% in overall quality, 12.5% in accuracy, and 12.8% in comprehensiveness; ChatGLM showed improvements of 24.8%, 16.1%, and 19.6%, respectively; and Spark LLM showed improvements of 6.5%, 14.5%, and 24.7%, respectively.
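The abstract does not specify the rating scale or whether these percentages are relative or absolute changes. As an illustration only, the sketch below assumes each response is scored by 3 raters, scores are averaged per question and then across questions, and the improvement is the relative change in that mean. The function names and the ratings are hypothetical and are not data from the study.

```python
from statistics import mean


def mean_score(ratings):
    """Average the raters' scores per question, then average across questions."""
    return mean(mean(per_question) for per_question in ratings)


def relative_improvement(baseline, optimized):
    """Relative change in mean score, in percent (assumed definition)."""
    base = mean_score(baseline)
    return (mean_score(optimized) - base) / base * 100


# Hypothetical ratings: one inner list per question, one score per rater.
baseline_ratings = [[3, 4, 5], [4, 4, 4]]
optimized_ratings = [[5, 5, 5], [5, 5, 5]]

if __name__ == "__main__":
    change = relative_improvement(baseline_ratings, optimized_ratings)
    print(f"Relative improvement: {change:.1f}%")
```

If the reported figures are instead absolute differences on a percentage scale, the division by the baseline mean would be dropped.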

Conclusion

Knowledge base optimization significantly enhanced the quality, accuracy, and comprehensiveness of the responses provided by the 3 models in the orthopedic field. It is therefore an effective method for improving the performance of LLMs in specific fields.

Cite this article


Hong Zhou, Hong-lin Wang, Yu-yu Duan, Zi-neng Yan, Rui Luo, Xiang-xin Lv, Yi Xie, Jia-yao Zhang, Jia-ming Yang, Ming-di Xue, Ying Fang, Lin Lu, Peng-ran Liu, Zhe-wei Ye. Enhancing Orthopedic Knowledge Assessments: The Performance of Specialized Generative Language Model Optimization. Current Medical Science, 2024, 44(5): 1001-1005. https://doi.org/10.1007/s11596-024-2929-4

