Enhancing Orthopedic Knowledge Assessments: The Performance of Specialized Generative Language Model Optimization

Hong Zhou, Hong-lin Wang, Yu-yu Duan, Zi-neng Yan, Rui Luo, Xiang-xin Lv, Yi Xie, Jia-yao Zhang, Jia-ming Yang, Ming-di Xue, Ying Fang, Lin Lu, Peng-ran Liu, Zhe-wei Ye

Current Medical Science, 2024, Vol. 44, Issue (5): 1001-1005.

DOI: 10.1007/s11596-024-2929-4
Original Article

Abstract

Objective

This study aimed to evaluate and compare the effectiveness of knowledge base-optimized and unoptimized large language models (LLMs) in orthopedics, in order to explore optimization strategies for applying LLMs in specific fields.

Methods

A specialized knowledge base was constructed from clinical guidelines of the American Academy of Orthopaedic Surgeons (AAOS) and authoritative orthopedic publications. A total of 30 orthopedic questions covering anatomical knowledge, disease diagnosis, fracture classification, treatment options, and surgical techniques were input into both the knowledge base-optimized and unoptimized versions of GPT-4, ChatGLM, and Spark LLM, and the generated responses were recorded. Three experienced orthopedic surgeons evaluated the overall quality, accuracy, and comprehensiveness of these responses.
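The abstract does not say how the knowledge base was connected to the models. As a rough illustration only, one common way to supply reference material to an LLM is to retrieve the passages most relevant to a question and prepend them to the prompt. The sketch below assumes a simple word-overlap retriever; the function names (`tokenize`, `retrieve`, `build_prompt`) and the placeholder passages are hypothetical and are not taken from the study.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase a string and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def retrieve(question, passages, k=2):
    """Return the k passages that share the most word tokens with the question."""
    question_counts = Counter(tokenize(question))

    def overlap(passage):
        passage_counts = Counter(tokenize(passage))
        return sum(min(question_counts[t], passage_counts[t]) for t in question_counts)

    # sorted() is stable, so ties keep their original knowledge-base order.
    return sorted(passages, key=overlap, reverse=True)[:k]


def build_prompt(question, passages, k=2):
    """Prepend the retrieved passages to the question as reference material."""
    context = "\n".join(f"- {p}" for p in retrieve(question, passages, k))
    return f"Reference material:\n{context}\n\nQuestion: {question}\nAnswer:"


# Hypothetical placeholder passages; NOT text from any guideline or textbook.
KNOWLEDGE_BASE = [
    "Placeholder passage about hip fracture management in older adults.",
    "Placeholder passage about distal radius fracture treatment.",
    "Placeholder passage about patella fracture fixation.",
]

question = "How are hip fracture cases in older adults managed?"
prompt = build_prompt(question, KNOWLEDGE_BASE, k=1)
print(prompt)
```

In this sketch the "unoptimized" condition would correspond to sending the question alone, and the "optimized" condition to sending `prompt`. A real system would more likely use embedding-based retrieval than word overlap.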

Results

Compared with its unoptimized counterpart, the optimized version of GPT-4 showed improvements of 15.3% in overall quality, 12.5% in accuracy, and 12.8% in comprehensiveness; ChatGLM showed improvements of 24.8%, 16.1%, and 19.6%, respectively; and Spark LLM showed improvements of 6.5%, 14.5%, and 24.7%, respectively.
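The abstract does not state how these percentages were calculated. One plausible reading is a relative change in the mean rater score between the unoptimized and optimized versions of a model. The sketch below works through that arithmetic under this assumption; the function name `relative_improvement` and the scores are hypothetical and are not data from the study.

```python
from statistics import mean


def relative_improvement(baseline_scores, optimized_scores):
    """Percentage change in mean score from baseline to optimized."""
    base = mean(baseline_scores)
    opt = mean(optimized_scores)
    if base == 0:
        raise ValueError("baseline mean is zero; relative change is undefined")
    return (opt - base) / base * 100.0


# Hypothetical rater scores for illustration only (not study data).
baseline = [3.0, 4.0, 3.0, 4.0]   # mean 3.5
optimized = [4.0, 4.0, 4.0, 5.0]  # mean 4.25

improvement = relative_improvement(baseline, optimized)
print(f"{improvement:.1f}%")  # (4.25 - 3.5) / 3.5 * 100, about 21.4%
```

If the study instead reported absolute differences on a percentage scale, the calculation would be `opt - base` without dividing by the baseline.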

Conclusion

Knowledge base optimization significantly enhanced the quality, accuracy, and comprehensiveness of the responses provided by the 3 models in the orthopedic field. Knowledge base optimization is therefore an effective method for improving the performance of LLMs in specific fields.

Cite this article

Hong Zhou, Hong-lin Wang, Yu-yu Duan, Zi-neng Yan, Rui Luo, Xiang-xin Lv, Yi Xie, Jia-yao Zhang, Jia-ming Yang, Ming-di Xue, Ying Fang, Lin Lu, Peng-ran Liu, Zhe-wei Ye. Enhancing Orthopedic Knowledge Assessments: The Performance of Specialized Generative Language Model Optimization. Current Medical Science, 2024, 44(5): 1001-1005. DOI: 10.1007/s11596-024-2929-4

RIGHTS & PERMISSIONS

Huazhong University of Science and Technology
