Abstract
Aim: The purpose of this study was to elucidate differences in patient perspectives on large language model (LLM) vs. physician-generated responses to frequently asked questions about anterior cervical discectomy and fusion (ACDF) surgery.
Methods: This cross-sectional study had three phases: In phase 1, we generated 10 common questions about ACDF surgery using ChatGPT-3.5, ChatGPT-4.0, and Google search. Phase 2 involved obtaining answers to these questions from two spine surgeons, ChatGPT-3.5, and Gemini. In phase 3, we recruited 5 cervical spine surgery patients and 5 age-matched controls to assess the clarity and completeness of the responses.
Results: LLM-generated responses were significantly shorter, on average, than physician-generated responses (30.0 ± 23.5 vs. 153.7 ± 86.7 words, P < 0.001). Study participants were more likely to give LLM-generated responses higher clarity ratings (H = 6.25, P = 0.012), with no significant difference in completeness ratings (H = 0.695, P = 0.404). On an individual question basis, there were no significant differences in the ratings given to LLM vs. physician-generated responses. Compared with age-matched controls, cervical spine surgery patients were more likely to rate physician-generated responses higher in both clarity (H = 6.42, P = 0.011) and completeness (H = 7.65, P = 0.006).
Conclusion: Despite a small sample size, our findings indicate that LLMs offer comparable, and occasionally preferred, information in terms of the clarity and completeness of responses to common ACDF questions. It is particularly striking that ratings were similar, considering that LLM-generated responses were, on average, about 80% shorter than physician responses. Further studies are needed to determine how LLMs can be integrated into spine surgery education.
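As a rough consistency check on the Results, the sketch below converts each reported H statistic into a P value and recomputes the relative difference in mean word count. It assumes (the abstract does not state this) that H is a Kruskal-Wallis statistic comparing two groups, so that it is referred to a chi-square distribution with 1 degree of freedom. The function name and the comparison labels are illustrative only.

```python
import math


def p_from_h_two_groups(h: float) -> float:
    """Upper-tail probability of a chi-square variable with 1 degree of freedom.

    ASSUMPTION: H is a Kruskal-Wallis statistic for a two-group comparison.
    For 1 degree of freedom, P(X > h) = erfc(sqrt(h / 2)).
    """
    return math.erfc(math.sqrt(h / 2.0))


# (H, reported P) pairs copied from the Results; the labels are illustrative.
reported = {
    "clarity, LLM vs. physician": (6.25, 0.012),
    "completeness, LLM vs. physician": (0.695, 0.404),
    "clarity of physician responses, patients vs. controls": (6.42, 0.011),
    "completeness of physician responses, patients vs. controls": (7.65, 0.006),
}

# Recompute each P value from its H statistic.
recomputed = {label: p_from_h_two_groups(h) for label, (h, _) in reported.items()}

# Mean word counts copied from the Results.
llm_mean_words = 30.0
physician_mean_words = 153.7
percent_shorter = 100.0 * (physician_mean_words - llm_mean_words) / physician_mean_words

for label, (h, p_reported) in reported.items():
    print(f"{label}: H = {h}, reported P = {p_reported}, recomputed P = {recomputed[label]:.4f}")
print(f"LLM responses are {percent_shorter:.1f}% shorter on average")
```

Under that assumption, each recomputed P value agrees with the reported one to within about 0.001, and the difference in mean word counts corresponds to the roughly 80% reduction cited in the Conclusion.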
Keywords
Anterior cervical discectomy and fusion (ACDF) / large language model (LLM) / ChatGPT / Gemini / patient education / health information quality / patient perspectives
Cite this article
Ezra T. Yoseph, Aneysis D. Gonzalez-Suarez, Siegmund Lang, Atman Desai, Serena S. Hu, Corinna C. Zygourakis.
Patient perspectives on AI: a pilot study comparing large language model and physician-generated responses to routine cervical spine surgery questions.
Artificial Intelligence Surgery, 2024, 4(3): 267-77. DOI: 10.20517/ais.2024.38