Artificial Intelligence Surgery

2025, Volume 5 Issue 3 (2025-07-30)

Systematic Review

Machine learning and artificial intelligence for predicting short and long-term complications following metabolic bariatric surgery - a systematic review

Athanasios G. Pantelis, Panagiota Epiphaniou, Dimitris P. Lapatsanis

2025, 5(3): 322-44. https://doi.org/10.20517/ais.2024.104

Background: Machine learning (ML) and other applications of artificial intelligence (AI) are revolutionizing medicine, particularly in the field of surgery. These models have the potential to outperform traditional predictive tools, aiding clinicians in decision making and enhancing operative safety through improved patient selection.

Methods: A systematic search was conducted across PubMed/MEDLINE and Google Scholar, guided by the preferred reporting items for systematic reviews and meta-analyses (PRISMA) statement, to identify studies employing ML and AI algorithms to predict postoperative complications following metabolic bariatric surgery (MBS). The search included primary studies published in English up to November 2024. The area under the receiver operating characteristic curve (AUROC) was used as a surrogate metric for algorithm performance, with values exceeding 0.8 considered clinically significant; however, studies were not excluded based on AUROC thresholds.

Results: The search identified 23 studies meeting the inclusion criteria. These were categorized into seven domains: general complications (8 studies, 34.8%), readmissions after MBS (4 studies, 17.4%), hemorrhage (1 study, 4.3%), leaks (1 study, 4.3%), venous thromboembolism (3 studies, 13.0%), nutritional deficiencies (4 studies, 17.4%), and miscellaneous complications such as gastroesophageal reflux disease, gallbladder disease, and major adverse cardiovascular events (MACE) (3 studies, 13.0%). The studies spanned from 2007 to 2024, with 87.0% (20/23) published in or after 2019. In total, 87 AI/ML algorithms were analyzed. While several studies reported AUROC values exceeding 0.7, the highest achieved was 0.94. However, most studies exhibited methodological limitations, including a lack of external validation and inadequate handling of imbalanced datasets, where complication events were markedly fewer than non-events.

Conclusions: While AI and ML approaches generally outperform traditional predictive models in forecasting postoperative complications following MBS, few algorithms demonstrated clinically significant performance with AUROC values above 0.8. Future research should adopt more rigorous methodologies and implement strategies to address imbalanced datasets, ensuring broader clinical applicability of AI/ML tools.
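
To illustrate the AUROC metric used in this review, the minimal Python sketch below computes AUROC as the probability that a randomly chosen patient with a complication receives a higher predicted risk than a randomly chosen patient without one, with ties counted as half. The function name `auroc` and the toy labels and scores are hypothetical and are not data from the reviewed studies; only the 0.8 threshold comes from the abstract. The toy set is deliberately imbalanced (2 events, 8 non-events) to echo the limitation noted in the Results.

```python
def auroc(labels, scores):
    """Rank-based AUROC: P(score of a positive > score of a negative), ties count 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 1 = complication, 0 = no complication.
labels = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
scores = [0.05, 0.10, 0.20, 0.15, 0.30, 0.60, 0.25, 0.35, 0.55, 0.70]

value = auroc(labels, scores)
# Threshold taken from the abstract's definition of clinical significance.
meets_threshold = value > 0.8
print(f"AUROC = {value:.4f}, exceeds 0.8: {meets_threshold}")
```

With so few events, a single reordered pair shifts the estimate noticeably, which is one reason AUROC on imbalanced data is usually read alongside other metrics.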

Perspective

From surgical outcome prediction to optimizing surgical performance: the role of artificial intelligence in hernia surgery

Victoria L. Walker, B. Todd Heniford, Gregory T. Scarola, Sullivan A. Ayuso

2025, 5(3): 345-9. https://doi.org/10.20517/ais.2025.02

Artificial intelligence (AI) is starting to change the way we approach hernia surgery and abdominal wall reconstruction (AWR). From improving surgical planning to predicting outcomes and enhancing intraoperative precision, AI, particularly deep learning models (DLMs), offers tools that can support decision making and personalize patient care. These models can analyze large datasets, such as preoperative imaging, to spot patterns we might miss. We have seen AI outperform experienced surgeons in predicting complications like mesh infections. Despite its promise, there are also valid concerns about data quality, transparency, and overreliance. Thoughtful integration is essential, ensuring that AI complements rather than replaces clinical judgment. Moving forward, collaboration with fields such as computer science and support from surgical societies will be essential. With appropriate groundwork, AI can be a powerful tool in the pre-op, intra-op, and post-op phases of care.

Review

From scalpel to software: the potential role of AI in plastic surgery training - a scoping review

Elizabeth Hogue, Sidney Nottingham, Andrew James, Fernando A. Herrera

2025, 5(3): 350-60. https://doi.org/10.20517/ais.2025.19

Aim: The evolving capabilities of artificial intelligence (AI) are revolutionizing medicine, and AI integration into surgical training has produced novel tools that are altering the educational landscape. Therefore, the aim of this review is to demonstrate current and future applications of AI in plastic surgery training.

Methods: A detailed search was performed using PubMed and other search engines for applications of AI within surgical education.

Results: Of papers that met inclusion criteria, eight addressed AI in plastic surgery education, with others addressing general surgery (n = 4), neurosurgery (n = 3), endodontics (n = 1), obstetrics/gynecology (n = 1), orthopedic surgery (n = 1), urology (n = 1), and craniofacial surgery (n = 1). Three key areas of research emerged: supplemental/independent learning, operative skills practice, and resident feedback.

Conclusions: Novel applications of various AI algorithms within these areas were explored. The limited integration of AI into plastic surgery education compared with other surgical specialties and the limitations inherent to AI were also highlighted. Though limited research has specifically examined the applications of AI in plastic surgery education, its potential as a versatile educational tool within the field is evident. Novel AI algorithms are already enhancing study tools, surgical skill acquisition, and feedback. Further study is imperative to investigate outlets that leverage AI for the advancement of plastic surgery education.

Review

Data quality for safer and more personalized perioperative care: a scoping review

Massimiliano Greco, Ilesa Bose, Brenda Lupo Pasinetti, Maurizio Cecconi

2025, 5(3): 361-76. https://doi.org/10.20517/ais.2024.100

Background: The exponential growth of perioperative data generated by monitors, electronic health records (EHRs), and wearable devices (WD) holds significant promise for improving risk assessment, preventing complications, and personalizing perioperative care. Perioperative care produces a wide range of data types from these diverse sources that can be analyzed using machine learning (ML) techniques. The application of data-driven techniques to perioperative big data is being extended to different settings of perioperative care, including risk prediction, intraoperative monitoring, complication reduction, and decision support. However, the quality of these data often remains uncertain, potentially limiting the effectiveness of even the most advanced models.

Objective: This scoping review maps the current literature on perioperative data quality. It explores common quality challenges (such as missing, inaccurate, or non-standardized data) and highlights tools, frameworks, and methodologies, from harmonization standards to ML-based imputation techniques. We address the challenges of ensuring adequate data collection, data accuracy, and consistency. We emphasize the importance of data standardization and harmonization through common models to facilitate interaction and integration among different hospitals, systems, and countries. Such efforts aim to enhance external validation and bridge the translational gap from bench to bedside.

Design: We included English-language publications that addressed perioperative data quality issues. We searched PubMed and reviewed the reference lists of relevant articles. Two independent reviewers selected studies and extracted data. Our analysis focused on four key topics: data accuracy, handling of missing data, standardization, and harmonization.

Results: Of the 342 publications, many highlight that perioperative data derive from multiple sources, including intraoperative monitors, ICU systems, EHRs, registries, and WD. Missing values, artifacts, and uneven documentation were common challenges. Studies reported that using advanced filtering and imputation algorithms, standard vocabularies (like SNOMED CT and LOINC), and common data models (CDMs, such as OMOP) improved data sharing and use. Initiatives like the Multicenter Perioperative Outcomes Group (MPOG) demonstrated how harmonized datasets could drive multi-institutional quality improvement and research.

Conclusions: This review focuses on perioperative data quality; we translate technical methods into practical strategies for data-driven perioperative care. It highlights the strong link between data quality and improved perioperative care. Achieving the diffusion of reliable and standardized data calls for strategic efforts on regulatory alignment, staff training, and the development of large collaborative networks. As perioperative medicine evolves, high-quality data will serve as the foundation for reliable predictive modeling, safer anesthesia management, and more patient-centered approaches.

Original Article

Exploring chatbot applications in pancreatic disease treatment: potential and pitfalls

Alberto Balduzzi, Matteo De Pastena, Susanna Tondato, Federico Gronchi, Tommaso Dall’Olio, Giuseppe Malleo, Antonio Pea, Salvatore Paiella, Roberto Salvia

2025, 5(3): 377-86. https://doi.org/10.20517/ais.2025.11

Aim: The use of chatbots to respond across various domains is becoming more integrated into daily life, potentially replacing traditional search engines. The study aimed to investigate the performance of different large language models (LLMs) in providing recommendations regarding pancreatic cancer (PC) to surgeons.

Methods: Standardized prompts were engineered to query four freely accessible LLMs (ChatGPT-4, Personal Intelligence by Inflection AI, Anthropic Claude 3 Haiku Version 3.5, Perplexity AI) on October 9th, 2024. Fourteen questions covered the incidence, diagnosis, and treatment of radiologically resectable, borderline resectable, locally advanced, and metastatic PC. Three different investigators queried the LLMs simultaneously. The reliability and accuracy of the responses were evaluated using a 4-point Likert scale and then compared to the international guidelines. Descriptive statistics were used to report outcomes as counts and percentages.

Results: Overall, 72% of the responses were deemed correct (scored 3 or 4). Claude provided the most accurate responses (32%), followed by ChatGPT (28%). ChatGPT-4 and Anthropic Claude 3 Haiku Version 3.5 achieved the overall highest score rate (4-point) at 50% and 52%, respectively. Regarding the quality and accuracy of the responses, ChatGPT cited guidelines most frequently (29%). However, only 19% of all evaluated responses included guideline citations.

Conclusion: The LLMs are still not suitable for safe, standalone use in the medical field, but their rapid learning capabilities suggest they may become indispensable tools for medical professionals in the future.

Meta-Analysis

Clinical outcomes, learning effectiveness, and patient-safety implications of AI-assisted HPB surgery for trainees: a systematic review and multiple meta-analyses

Fahim Kanani, Narmin Zoabi, Goykhman Yaacov, Nir Messer, Amedeo Carraro, Nir Lubezky, Aviad Gravetz, Eviatar Nesher

2025, 5(3): 387-417. https://doi.org/10.20517/ais.2025.47

Introduction: Artificial intelligence (AI) applications are increasingly integrated into hepato-pancreato-biliary (HPB) surgery training, yet their impact on educational outcomes and patient safety remains unclear. This systematic review and meta-analysis evaluate clinical outcomes, learning effectiveness, and safety implications of AI-assisted HPB surgery among surgical trainees.

Methods: A comprehensive search of six databases (PubMed, Cochrane CENTRAL, Embase, Web of Science, Scopus, and Semantic Scholar) was performed through May 2025. Studies involving surgical trainees utilizing AI-based platforms with measurable clinical, educational, or safety outcomes were included. Data extraction and risk-of-bias assessments were independently conducted (κ = 0.86-0.91). Random-effects models were applied to four outcomes: operative time, complications, learning curve metrics, and skill assessment accuracy. Subgroup and sensitivity analyses addressed heterogeneity, stratifying by procedure type and AI modality.

Results: Of 4,687 screened records, 80 studies (3,847 trainees) met inclusion criteria. Four separate meta-analyses revealed: (1) operative time reduction of 32.5 min (MD -32.5, 95% CI: -45.2 to -19.8; I² = 65%; 15 studies, 1,234 procedures); (2) decreased complications (RR 0.72, 95% CI: 0.58-0.89; I² = 42%; 18 studies, 2,156 patients); (3) accelerated learning with 2.3 fewer cases to proficiency (SMD -2.3, 95% CI: -2.8 to -1.8; I² = 55%; 10 studies, 423 trainees); and (4) AI skill assessment accuracy of 85.4% (95% CI: 81.2%-89.6%; I² = 78%; 12 studies, 847 assessments). Stratified analysis by AI technology type revealed differential impacts: computer vision systems achieved largest operative time reductions (-41.2 min, 95% CI: -54.3 to -28.1), augmented reality showed -38.7 min (95% CI: -49.8 to -27.6), while machine learning demonstrated -24.3 min (95% CI: -32.1 to -16.5); test for subgroup differences P = 0.02. Subgroup analysis showed greater benefits for complex procedures (pancreaticoduodenectomy: -48.3 min) versus simple procedures (cholecystectomy: -18.4 min, P = 0.003). Complications showed similar procedure-specific patterns, with pancreaticoduodenectomy achieving RR 0.65 versus cholecystectomy RR 0.78. Critical View of Safety achievement improved from 11% to 78% (RR 2.84, 95% CI: 2.12-3.81). Publication bias was not detected (Egger’s test P > 0.05 for all outcomes).

Discussion: AI-assisted HPB surgical training improves operative efficiency, reduces complications, enhances learning curves, and enables accurate skill assessment. These findings support systematic AI integration with standardized protocols and multicenter validation.
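
The abstract reports random-effects pooling and I² heterogeneity but does not name the estimator used. As a sketch only, assuming the widely used DerSimonian-Laird method, the Python example below shows how per-study effects (for instance, mean differences in operative time) could be pooled. The function name `dersimonian_laird` and the input values are hypothetical and are not the study's data.

```python
import math


def dersimonian_laird(effects, variances):
    """Pool study effects with a DerSimonian-Laird random-effects model.

    Returns (pooled effect, 95% CI tuple, tau^2, I^2 in percent).
    """
    w = [1.0 / v for v in variances]                 # inverse-variance weights
    sw = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sw
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, effects))  # Cochran's Q
    df = len(effects) - 1
    c = sw - sum(wi ** 2 for wi in w) / sw
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0  # between-study variance
    w_star = [1.0 / (v + tau2) for v in variances]   # random-effects weights
    pooled = sum(wi * yi for wi, yi in zip(w_star, effects)) / sum(w_star)
    se = math.sqrt(1.0 / sum(w_star))
    i2 = max(0.0, (q - df) / q) * 100.0 if q > 0 else 0.0
    return pooled, (pooled - 1.96 * se, pooled + 1.96 * se), tau2, i2


# Hypothetical mean differences in operative time (minutes) and their variances.
effects = [-45.0, -20.0, -35.0, -28.0]
variances = [36.0, 25.0, 49.0, 16.0]

pooled, ci, tau2, i2 = dersimonian_laird(effects, variances)
print(f"MD {pooled:.1f} (95% CI {ci[0]:.1f} to {ci[1]:.1f}), I2 = {i2:.0f}%")
```

When the between-study variance estimate is zero, the random-effects weights reduce to the fixed-effect inverse-variance weights, so the two models give the same pooled estimate.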

Perspective

AI-powered medical imaging for ventral hernia repair

Ankoor Talwar, Akshay I. Kelshiker, John P. Fischer

2025, 5(3): 418-24. https://doi.org/10.20517/ais.2025.22

Ventral hernia repair (VHR) is the surgical restoration of abdominal wall integrity to correct hernia defects and prevent recurrence. Artificial intelligence (AI) has emerged as a transformative tool in medical imaging, offering novel solutions to enhance the workflow and outcomes in VHR. This manuscript explores AI-driven applications in imaging for VHR, focusing on preoperative risk stratification, intraoperative augmented reality guidance, and postoperative wound monitoring. AI imaging models have demonstrated efficacy in preoperatively predicting hernia formation, optimizing surgical planning, and predicting complications. Recent advancements, including convolutional neural networks and real-time object detection models, have shown promise in automating wound assessment and streamlining clinical workflow. Still, there are notable challenges in AI imaging, such as dataset bias, high computational demands, and model interpretability. Future work should prioritize dataset diversity, computational efficiency, and explainable AI to ensure equitable, scalable, and clinically reliable AI imaging integration for VHR.

Systematic Review

Artificial intelligence use in abdominal wall reconstruction: a systematic review

Amy Liu, Akash Liyanage, Brian Chen, Peter Deptula, Daniel Murariu

2025, 5(3): 425-33. https://doi.org/10.20517/ais.2025.01

Aim: The use of artificial intelligence (AI) in medicine has grown significantly in recent years. This systematic review aims to highlight current trends in the application of AI specifically in abdominal wall reconstruction, which represents one of many medical fields utilizing AI technology.

Methods: A systematic review was conducted following the preferred reporting items for systematic reviews and meta-analyses (PRISMA) guidelines. Electronic databases including PubMed, Google Scholar, EBSCO, Ovid, and the Cochrane Library were searched for studies published between 2000 and 2024 that evaluated AI applications in abdominal wall reconstruction.

Results: A total of 142 publications were identified, of which 12 met the inclusion criteria and were included in this review. All included studies were published between 2019 and 2024. Among these, 2 studies investigated AI models for predicting hernia occurrence and the need for abdominal wall reconstruction; 1 study focused on AI for preoperative planning; 6 articles examined AI-based prediction of postoperative complications; and 3 publications explored the use of AI to answer patient questions.

Conclusion: Current research on AI in abdominal wall reconstruction primarily focuses on predicting postoperative outcomes and minimizing complications. However, there is no established consensus regarding the optimal applications or methodologies for integrating AI in this surgical field.

Review

Role of artificial intelligence in the detection, assessment and outcome of gastroesophageal varices

Gianluca Rompianesi, Francesca Pegoraro, Bianca Pacilio, Giusy Petti, Gianluca Benassai, Micaela Cappuccio, Roberto Montalti, Roberto Troisi

2025, 5(3): 434-47. https://doi.org/10.20517/ais.2025.09

Gastroesophageal varices (GEVs) are one of the first clinically relevant consequences of portal hypertension (PH), developing in 60%-80% of patients with liver cirrhosis. They are directly associated with a higher risk of decompensation and death. Endoscopy is the most common screening strategy in patients with cirrhosis. However, there is growing interest in non-invasive predictors of GEVs that would make it possible to safely avoid costly and potentially harmful procedures. Artificial intelligence (AI)-driven predictive models effectively integrate diverse clinical, imaging, and laboratory data to provide non-invasive and precise risk stratification, reducing the reliance on endoscopic evaluations. Deep learning applications, particularly convolutional neural networks (CNNs), have proved highly effective in analyzing endoscopic images, thereby enhancing diagnostic accuracy beyond traditional visual inspection. Additionally, radiomics-based AI models utilizing computed tomography (CT) and elastography have enabled non-invasive risk assessment, improving predictions of bleeding risk and estimations of the hepatic venous pressure gradient (HVPG). Ethical considerations, such as data privacy and algorithmic bias, also require careful management. Future research should focus on prospective validation, real-world application studies, and the development of standardized AI frameworks to ensure the clinical applicability of these methods. AI-driven precision medicine has the potential to revolutionize the management of GEVs, offering more efficient, accurate, and individualized patient care while optimizing healthcare resource utilization.

Correction

Correction: Contribution of 3D virtual modeling in locating hepatic metastases, particularly “vanishing tumors”: a pilot study

Mike Salavracos, Etienne Danse, Nicolas Michoux, Alexandre de Hemptinne, Tancrède De Poortere, Laurent Coubeau

2025, 5(3): 448-9. https://doi.org/10.20517/ais.2025.04

Systematic Review

Artificial intelligence for real-time surgical phase recognition in minimal invasive inguinal hernia repair: a systematic review on behalf of TROGSS - the robotic global surgical society

Aman Goyal, Mathew Mendoza, Alfonzo E. Munoz, Christian Adrian Macias, Adel Abou-Mrad, Luigi Marano, Rodolfo J. Oviedo

2025, 5(3): 450-64. https://doi.org/10.20517/ais.2024.108

Introduction: Artificial intelligence (AI) integration into surgical practice has advanced intraoperative precision, complication prediction, and procedural efficiency. While AI has demonstrated advancements in colorectal, cardiac, and other laparoscopic procedures, its application in inguinal hernia repair (IHR), one of the most commonly performed surgeries, remains underexplored. AI models demonstrate potential in real-time recognition of surgical phases, anatomical structures, and instruments, particularly in transabdominal preperitoneal (TAPP), total extraperitoneal (TEP), and robotic inguinal hernia repair (RIHR). This systematic review evaluates the accuracy, applicability, and clinical impact of AI-based systems in real-time surgical phase recognition during IHR.

Methods: Following PRISMA 2020 guidelines and PROSPERO registration (CRD42024621178), a systematic search of PubMed, Scopus, Web of Science, Embase, Cochrane Library, and ScienceDirect was conducted on November 12, 2024. Studies utilizing AI models for real-time video-based surgical phase recognition in minimally invasive IHR (TAPP, TEP, and RIHR) were included. The screening process, data extraction task, and quality assessment using NOS (Newcastle-Ottawa Scale) were performed by three independent reviewers. Primary outcomes were AI performance metrics (accuracy, F1-score, precision, recall, and latency), and secondary outcomes included clinical phase recognition performance.

Results: Out of 903 records, six studies (2022-2024) were included, involving laparoscopic (n = 4) and robotic-assisted (n = 2) IHR from the United States (n = 2), France (n = 2), and Greece (n = 1). A total of 774 videos (25-619 per study) underwent pre-processing (frame extraction or down-sampling). Annotation tools included CVAT, SuperAnnotate, and manual labeling. AI models (VTN, DETR, ResNet-50, YOLOv8) demonstrated accuracy between 74% and > 87%, with YOLOv8 achieving the highest F1-score (82%). Risk of bias was moderate to high, with Fleiss’ kappa for inter-rater agreement at 0.82 (selection) and 0.49 (comparability).

Conclusion: AI and ML models demonstrate significant potential in achieving real-time surgical phase recognition during minimally invasive IHR. Despite promising accuracies, challenges such as heterogeneity in model performance, reliance on annotated datasets, and the need for real-time validation persist. Standardized benchmarks, multicenter studies, and hardware advancements will be essential to fully integrate AI into surgical workflows, improving surgical training, technical performance, and patient outcomes.
