Exploring chatbot applications in pancreatic disease treatment: potential and pitfalls
Alberto Balduzzi, Matteo De Pastena, Susanna Tondato, Federico Gronchi, Tommaso Dall’Olio, Giuseppe Malleo, Antonio Pea, Salvatore Paiella, Roberto Salvia
Artificial Intelligence Surgery, 2025, Vol. 5, Issue 3: 377-86.
Aim: Chatbots are increasingly used to answer questions across various domains and are becoming integrated into daily life, potentially replacing traditional search engines. This study aimed to investigate the performance of different large language models (LLMs) in providing recommendations regarding pancreatic cancer (PC) to surgeons.
Methods: Standardized prompts were engineered to query four freely accessible LLMs (ChatGPT-4, Personal Intelligence by Inflection AI, Anthropic Claude 3 Haiku Version 3.5, Perplexity AI) on October 9th, 2024. Fourteen questions covered the incidence, diagnosis, and treatment of radiologically resectable, borderline resectable, locally advanced, and metastatic PC. Three different investigators queried the LLMs simultaneously. The reliability and accuracy of the responses were evaluated using a 4-point Likert scale and then compared with international guidelines. Descriptive statistics were used to report outcomes as counts and percentages.
Results: Overall, 72% of the responses were deemed correct (scored 3 or 4). Claude provided the most accurate responses (32%), followed by ChatGPT (28%). ChatGPT-4 and Anthropic Claude 3 Haiku Version 3.5 achieved the highest rates of top-scored (4-point) responses, at 50% and 52%, respectively. Regarding the quality and accuracy of the responses, ChatGPT cited guidelines most frequently (29%). However, only 19% of all evaluated responses included guideline citations.
Conclusion: LLMs are not yet suitable for safe, standalone use in the medical field, but their rapid learning capabilities suggest they may become indispensable tools for medical professionals in the future.
Keywords: Leveraging large language model (LLM) / artificial intelligence / pancreas / pancreatic ductal adenocarcinoma / guidelines