Exploring chatbot applications in pancreatic disease treatment: potential and pitfalls

Alberto Balduzzi; Matteo De Pastena; Susanna Tondato; Federico Gronchi; Tommaso Dall’Olio; Giuseppe Malleo; Antonio Pea; Salvatore Paiella; Roberto Salvia

Artificial Intelligence Surgery 2025;5(3):377-86. doi:10.20517/ais.2025.11

Original Article

Abstract

Aim: Chatbots that answer questions across many domains are becoming integrated into daily life and may come to replace traditional search engines. This study investigated how well different large language models (LLMs) provide surgeons with recommendations on pancreatic cancer (PC).

Methods: Standardized prompts were engineered to query four freely accessible LLMs (ChatGPT-4, Personal Intelligence by Inflection AI, Anthropic Claude 3 Haiku Version 3.5, and Perplexity AI) on October 9, 2024. Fourteen questions covered the incidence, diagnosis, and treatment of radiologically resectable, borderline resectable, locally advanced, and metastatic PC. Three investigators queried the LLMs simultaneously. The reliability and accuracy of the responses were rated on a 4-point Likert scale and compared against international guidelines. Descriptive statistics were used to report outcomes as counts and percentages.

Results: Overall, 72% of the responses were deemed correct (scored 3 or 4). Claude provided the most accurate responses (32%), followed by ChatGPT (28%). ChatGPT-4 and Anthropic Claude 3 Haiku Version 3.5 achieved the highest rates of top scores (4 points), at 50% and 52%, respectively. In terms of response quality, ChatGPT cited guidelines most frequently (29%). However, only 19% of all evaluated responses included guideline citations.
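The scoring rule described in the Methods and Results (a 4-point Likert scale, scores of 3 or 4 counted as correct, outcomes reported as counts and percentages) can be illustrated with a short Python sketch. It assumes each scored response is recorded as a (model, score) pair and that each percentage uses all scored responses for that model (or for all models, in the overall row) as its denominator. The function names and the toy scores are hypothetical and are not data from the study.

```python
from collections import defaultdict

# Assumption: "correct" means a Likert score of 3 or 4; "top score" means 4.
CORRECT_MIN = 3
TOP_SCORE = 4


def _rates(values):
    """Return counts and percentages for one list of Likert scores."""
    n = len(values)
    if n == 0:
        raise ValueError("no scored responses")
    correct = sum(1 for v in values if v >= CORRECT_MIN)
    top = sum(1 for v in values if v == TOP_SCORE)
    return {
        "n": n,
        "correct": correct,
        "pct_correct": round(100 * correct / n, 1),
        "top": top,
        "pct_top": round(100 * top / n, 1),
    }


def summarize(scored_responses):
    """scored_responses: iterable of (model_name, likert_score) pairs."""
    by_model = defaultdict(list)
    for model, score in scored_responses:
        if score not in (1, 2, 3, 4):
            raise ValueError(f"Likert score out of range: {score!r}")
        by_model[model].append(score)
    # Per-model descriptive statistics, plus an overall row pooling every response.
    summary = {model: _rates(vals) for model, vals in by_model.items()}
    all_scores = [v for vals in by_model.values() for v in vals]
    summary["overall"] = _rates(all_scores)
    return summary


# Hypothetical toy scores for two unnamed models (NOT data from the study).
toy_scores = [
    ("Model A", 4), ("Model A", 3), ("Model A", 2), ("Model A", 4),
    ("Model B", 1), ("Model B", 3), ("Model B", 4), ("Model B", 2),
]
result = summarize(toy_scores)
for name, stats in result.items():
    print(name, stats)
```

With these toy scores, Model A has 3 of 4 responses rated correct (75.0%) and 2 of 4 rated 4 (50.0%), while the pooled row reports 5 of 8 correct (62.5%). Keeping the raw counts next to the percentages mirrors the abstract's reporting of outcomes as both counts and percentages.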

Conclusion: LLMs are not yet suitable for safe, standalone use in medicine, but their rapid learning capabilities suggest they may become indispensable tools for medical professionals in the future.

Keywords

Leveraging large language model (LLM) / artificial intelligence / pancreas / pancreatic ductal adenocarcinoma / guidelines

Cite this article

Alberto Balduzzi, Matteo De Pastena, Susanna Tondato, Federico Gronchi, Tommaso Dall’Olio, Giuseppe Malleo, Antonio Pea, Salvatore Paiella, Roberto Salvia. Exploring chatbot applications in pancreatic disease treatment: potential and pitfalls. Artificial Intelligence Surgery, 2025, 5(3): 377-86 DOI:10.20517/ais.2025.11
