Enhancing Orthopedic Knowledge Assessments: The Performance of Specialized Generative Language Model Optimization

Hong Zhou, Hong-lin Wang, Yu-yu Duan, Zi-neng Yan, Rui Luo, Xiang-xin Lv, Yi Xie, Jia-yao Zhang, Jia-ming Yang, Ming-di Xue, Ying Fang, Lin Lu, Peng-ran Liu, Zhe-wei Ye

Current Medical Science. DOI: 10.1007/s11596-024-2929-4
Original Article

Abstract

Objective

This study aimed to evaluate and compare the effectiveness of knowledge base-optimized and unoptimized large language models (LLMs) in orthopedics, and thereby to explore optimization strategies for applying LLMs in specific fields.

Methods

A specialized knowledge base was constructed from clinical guidelines of the American Academy of Orthopaedic Surgeons (AAOS) and authoritative orthopedic publications. A total of 30 orthopedic questions covering anatomical knowledge, disease diagnosis, fracture classification, treatment options, and surgical techniques were input into both the knowledge base-optimized and unoptimized versions of GPT-4, ChatGLM, and Spark LLM, and the generated responses were recorded. Three experienced orthopedic surgeons evaluated the overall quality, accuracy, and comprehensiveness of these responses.

Results

Compared with its unoptimized counterpart, the optimized version of GPT-4 showed improvements of 15.3% in overall quality, 12.5% in accuracy, and 12.8% in comprehensiveness. The optimized version of ChatGLM showed improvements of 24.8%, 16.1%, and 19.6%, respectively, and that of Spark LLM showed improvements of 6.5%, 14.5%, and 24.7%, respectively.
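The abstract does not state how these percentages were derived. As a minimal sketch, assuming each figure is the relative change in the mean rater score between the unoptimized and optimized model, the arithmetic could look like the following. The function name and all scores are hypothetical illustrations, not data from the study.

```python
from statistics import mean


def relative_improvement(baseline: float, optimized: float) -> float:
    """Percent change of the optimized mean score relative to the baseline mean."""
    if baseline <= 0:
        raise ValueError("baseline mean score must be positive")
    return (optimized - baseline) / baseline * 100.0


# Hypothetical scores from three raters for one model; NOT taken from the paper.
baseline_scores = [4, 3, 5]    # unoptimized model
optimized_scores = [5, 5, 5]   # knowledge base-optimized model

improvement = relative_improvement(mean(baseline_scores), mean(optimized_scores))
print(f"{improvement:.1f}%")
```

If the reported figures are instead differences in percentage points, the subtraction alone would apply and the division by the baseline would be dropped.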

Conclusion

Knowledge base optimization significantly enhanced the quality, accuracy, and comprehensiveness of the responses provided by all 3 models in the orthopedic field. Knowledge base optimization is therefore an effective method for improving the performance of LLMs in specific fields.
