Machine learning algorithms and control systems have transformed the design of modern prosthetic devices. This narrative review explores the evolution and application of machine learning in advanced prosthetics. Despite the advances made in prosthetic technology over the years, the field has not yet achieved the necessary level of functional rehabilitation or a seamless interface that allows users to truly mirror natural movement. Challenges persist in creating intuitive control strategies that can both interpret complex neural signals and translate them into fluid, multi-articulated movements; better control strategies for these advanced prosthetic devices are therefore needed. Regenerative Peripheral Nerve Interface (RPNI) surgery has emerged as a promising means of enhancing prosthetic functionality. However, significant work is still needed to bridge the gap between current capabilities and the seamless, intuitive control required for naturalistic movement and true prosthetic embodiment. For continuous control, Kalman and Wiener filters have successfully translated EMG signals into smooth finger movements; in a study with rhesus macaques, a Kalman filter-based system achieved closed-loop continuous hand control using RPNI signals. For pose identification, Naïve Bayes (NB) classifiers and Hidden Markov Models combined with NB (HMM-NB) have shown high accuracy, with one study reporting > 96% accuracy in classifying finger movements using an NB classifier in rhesus macaques with RPNIs. In human participants, researchers decoded five different finger postures using only RPNI signals, both offline and in real time. Long-term stability of RPNI-based control has also been demonstrated, with controllers maintaining high accuracy using calibration data collected up to 246 days prior.
In a practical application, a human participant with RPNIs successfully completed a Coffee Making Task using four distinct grip patterns, showcasing the system’s functional utility.
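The continuous-control approach described above can be illustrated with a minimal Kalman filter sketch: a linear dynamics model predicts the finger state, and each incoming EMG feature vector corrects that prediction. All matrices below are illustrative toy values, not the parameters fitted in the studies cited.

```python
import numpy as np

# Toy linear state-space decoder: state = (finger position, velocity),
# observation = 2-D EMG feature vector. Values are illustrative only.
A = np.array([[1.0, 0.1], [0.0, 0.9]])   # state transition model
C = np.array([[1.0, 0.0], [0.5, 1.0]])   # EMG observation model
Q = 0.01 * np.eye(2)                      # process noise covariance
R = 0.10 * np.eye(2)                      # observation noise covariance

def kalman_step(x, P, z):
    """One predict/update cycle: prior from the dynamics model,
    then correction from the observed EMG feature vector z."""
    x_pred = A @ x
    P_pred = A @ P @ A.T + Q
    S = C @ P_pred @ C.T + R              # innovation covariance
    K = P_pred @ C.T @ np.linalg.inv(S)   # Kalman gain
    x_new = x_pred + K @ (z - C @ x_pred)
    P_new = (np.eye(2) - K @ C) @ P_pred
    return x_new, P_new

# Run the filter over a stream of simulated EMG feature vectors.
x, P = np.zeros(2), np.eye(2)
for z in np.random.default_rng(0).normal(size=(50, 2)):
    x, P = kalman_step(x, P, z)
```

In practice the decoder matrices are fitted from paired EMG and kinematic training data, and the filtered state drives the prosthetic joint angles directly.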
Aim: Mesh implant surgeries for hernia repair are frequently associated with adverse events that can compromise patient outcomes. Extracting structured clinical insights from large-scale, unstructured data sources such as the U.S. Food and Drug Administration’s Manufacturer and User Facility Device Experience (FDA MAUDE) database remains a challenge due to variability and subjectivity in patient narratives. This study aims to develop and evaluate a zero-shot generative artificial intelligence (AI) framework enhanced with Retrieval-Augmented Generation (RAG) to automatically extract structured clinical information and adverse event indicators from unstructured mesh implant reports, assessing its accuracy, interpretability, and scalability against a manually annotated benchmark.
Methods: The study employed the LLaMA 2 (13B) model for zero-shot structured summarization and adverse event extraction from FDA MAUDE mesh implant reports (2000–2021). The framework integrated retrieval-based context using RAG and evaluated model performance on report date, hernia type, and adverse event flag using accuracy, Jaccard similarity, and Chi-square tests (P < 0.05). Statistical analysis validated improvements in output reliability and clinical relevance.
Results: The model outputs were compared to a manually annotated benchmark baseline. With zero-shot prompting alone, the model achieved accuracies of 67% for report date, 60% for hernia type, and 83% for adverse event flag. After integrating the RAG approach, these accuracies improved to 81%, 82%, and 99%, respectively. The accuracy for adverse event extraction increased from 60% to 86%, and the Jaccard similarity improved from 75% to 88.9%. Chi-square tests confirmed statistical significance (P < 0.05) for most of the observed improvements.
Conclusion: This study demonstrates that combining zero-shot generative AI with retrieval augmentation can effectively convert unstructured patient reports into structured data. This approach offers a scalable and interpretable method for adverse event monitoring in mesh implant surgeries and supports data-driven evaluation of patient-reported outcomes.
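The Jaccard similarity used above compares the set of terms a model extracts against a manually annotated reference set. A minimal sketch, with hypothetical adverse-event terms standing in for the study's actual annotations:

```python
def jaccard(a, b):
    """Jaccard similarity of two term sets: |A ∩ B| / |A ∪ B|.
    Two empty sets are treated as a perfect match."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if a | b else 1.0

# Hypothetical example: model-extracted adverse-event terms vs.
# a manually annotated benchmark for one MAUDE report.
pred = {"pain", "infection", "mesh erosion"}
gold = {"pain", "infection", "recurrence"}
score = jaccard(pred, gold)   # 2 shared terms of 4 distinct -> 0.5
```

Field-level accuracy (e.g., for report date or hernia type) is simpler still: the fraction of reports whose extracted value exactly matches the benchmark.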
Aim: This scoping review aimed to synthesize current evidence on the application of artificial intelligence (AI), including natural language processing (NLP) and large language models (LLMs), in post-polypectomy surveillance for colorectal cancer (CRC). Specific objectives were to assess technological advances, evaluate their impact on guideline adherence, and identify gaps for future research.
Methods: We conducted a scoping review following the Arksey and O’Malley framework and PRISMA-ScR (Preferred Reporting Items for Systematic Reviews and Meta-Analyses Extension for Scoping Reviews) guidelines. Searches across PubMed, EMBASE, Scopus, and Web of Science identified studies applying AI to CRC surveillance after polypectomy. Eligible studies investigated AI models for interval assignment or risk stratification using colonoscopy and pathology data.
Results: Of 950 screened articles, seven met the inclusion criteria. Five studies used NLP-based decision support tools, achieving concordance rates of 81.7%-99.9% with guideline-recommended surveillance intervals, consistently outperforming clinician recommendations. Two studies evaluated ChatGPT-4 in clinical decision-making; fine-tuned models demonstrated an accuracy of up to 85.7%, surpassing that of physicians in retrospective and simulated scenarios. NLP systems demonstrated technical maturity and scalability, while LLMs offered flexible, user-friendly interfaces but were less reliable in complex clinical scenarios.
Conclusion: AI tools, particularly NLP-enhanced systems, demonstrate strong potential to standardize post-polypectomy surveillance and improve guideline adherence. LLMs are promising but remain under validation. Future research should assess clinical implementation, long-term outcomes, and integration within electronic health records.
The application of artificial intelligence (AI) in continuing education for traumatic orthopedics is rapidly evolving and demonstrates significant potential. Through AI technologies, surgeons can enhance their surgical skills and operational confidence within safe simulated environments, particularly in contexts where hands-on practice opportunities are diminishing. However, the long-term efficacy of AI in continuing education remains incompletely validated, with further research required to assess skill retention among trainees and the practical outcomes of its application. Additionally, AI holds substantial promise in clinical diagnosis and decision-making support, enabling surgeons to rapidly analyze and process complex data in trauma and emergency settings. Despite its broad prospects in acute surgical interventions and educational training, the adoption of AI remains in its nascent stage due to limited physician understanding of AI technologies and current technical constraints. AI also exhibits advantages in personalized teaching by assessing trainee competencies and providing feedback to optimize educational processes. Nevertheless, challenges such as data imbalance and insufficient sample sizes persist in AI-driven continuing education. While the widespread integration of AI in orthopedic trauma education - particularly in medical imaging diagnostics and surgical training - can significantly improve clinical outcomes, physicians must fully acknowledge its limitations and exercise prudence when implementing AI solutions.
Aim: Nerve dysfunction often manifests as abnormal eye behaviors, necessitating accurate and objective neurological assessment. Current deep learning-based facial analysis methods lack adaptability to inter-patient variability, making it difficult to capture subtle and rapid ocular dynamics such as incomplete eyelid closure or asymmetric eye movement. To address this, we propose a precise deep learning system for quantitative ocular state analysis, providing objective support for the evaluation of neurological dysfunction.
Methods: We propose the Ocular-enhanced Face Keypoints Net (OFKNet). It incorporates three key innovations: (1) a 40-point anatomically informed ocular landmark design enabling dense modeling of eyelid contours, canthus structure, and pupil dynamics; (2) a MobileNetV3-based region enhancement module that amplifies feature responses within clinically critical areas such as the internal canthus; and (3) an improved Path Aggregation Network combined with Squeeze-and-Excitation modules that enables adaptive multi-scale fusion and enhances sensitivity to subtle ocular deformations.
Results: Using clinically acquired data, OFKNet demonstrates substantial performance gains over state-of-the-art baselines. It achieves a 65.3% reduction in normalized mean error on the 40-point dataset (0.029 vs. 0.084) and a 38.4% reduction on the 14-point dataset, with all improvements statistically significant (P < 0.001). Despite operating on high-resolution inputs, the system maintains real-time capability and provides stable frame-level landmark localization, enabling precise capture of dynamic ocular motion patterns.
Conclusion: OFKNet provides a reliable tool for real-time monitoring of eye movement patterns in patients with neurological disorders. By visualizing time-series graphs of bilateral eye openness, the system enables a more comprehensive understanding of ocular dynamics and supports timely clinical decision-making and treatment adjustment.
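The normalized mean error (NME) reported above is the mean Euclidean landmark error divided by a normalizing distance, which makes scores comparable across face sizes. A minimal sketch with toy coordinates (the choice of normalizing distance, e.g., interocular distance, is an assumption for illustration):

```python
import numpy as np

def normalized_mean_error(pred, gt, norm_dist):
    """Mean Euclidean error over all landmarks, divided by a
    normalizing distance (e.g., interocular distance in pixels)."""
    errs = np.linalg.norm(pred - gt, axis=-1)   # per-landmark error
    return errs.mean() / norm_dist

# Toy example: 40 ocular landmarks, every prediction off by 5 px.
gt = np.zeros((40, 2))
pred = gt + np.array([3.0, 4.0])                # each error = 5 px
nme = normalized_mean_error(pred, gt, norm_dist=100.0)   # -> 0.05
```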
Aim: The EARLYDRAIN trial showed that prophylactic lumbar drainage (LD) could reduce poor outcomes in patients with aneurysmal subarachnoid hemorrhage, although not uniformly. We aim to reanalyze trial data using Double Machine Learning (DML) to estimate individualized treatment effects and identify patients or patient subgroups most likely to benefit.
Methods: We applied a DML framework with causal Random Forests to data from 287 patients randomized in the EARLYDRAIN trial to prophylactic LD or standard care. Six binary outcomes were analyzed: vasospasm, cerebral infarction, infection, favorable six-month outcome [modified Rankin Scale (mRS) ≤ 2], functional independence [Glasgow Outcomes Scale-Extended (GOS-E) ≥ 5], and shunt dependency. Average treatment effects (ATEs) and conditional ATEs (CATEs) were estimated. Uniform Manifold Approximation and Projection of the CATE values identified treatment-response clusters.
Results: Across the full cohort, prophylactic LD showed no consistent ATE across outcomes [e.g., mRS ATE: 0.02; 95% confidence interval (CI): -0.13 to 0.17]; CATE distributions revealed significant heterogeneity, with four treatment-response phenotypes. Younger patients with elevated intracranial pressure and lower drainage volumes derived greater benefit, with potentially reduced vasospasm risk. Patients older than 60 with higher systolic blood pressure, greater hemorrhage burden, and positive fluid balance experienced limited benefit and increased shunt dependency. A web-based application (https://earlydrain.streamlit.app) was developed to translate these findings into clinical decision support.
Conclusion: DML analysis of the EARLYDRAIN trial substantiated the heterogeneity in the effects of prophylactic LD observed in the initial trial. DML offers a scalable framework to reveal treatment heterogeneity masked in trial averages and support precision medicine in neurosurgery.
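The core quantity behind DML-style effect estimation is a doubly robust (AIPW) score: averaging it gives the ATE, and regressing it on covariates gives CATEs. A minimal sketch on simulated randomized data (known propensity 0.5, crude group-mean outcome models); this is a toy illustration, not the causal Random Forest pipeline used in the study.

```python
import numpy as np

# Simulated randomized trial with a known true ATE of 0.15.
rng = np.random.default_rng(42)
n = 20_000
t = rng.integers(0, 2, n)                     # random assignment
y = 0.3 + 0.15 * t + rng.normal(0, 0.5, n)    # toy outcome

e = 0.5                                       # known propensity (RCT)
mu1 = y[t == 1].mean()                        # crude outcome model, treated
mu0 = y[t == 0].mean()                        # crude outcome model, control

# Doubly robust (AIPW) score: outcome-model difference plus
# inverse-propensity-weighted residual corrections.
psi = (mu1 - mu0
       + t * (y - mu1) / e
       - (1 - t) * (y - mu0) / (1 - e))
ate = psi.mean()                              # estimate of the ATE
```

In the full DML framework, flexible learners fit the outcome and propensity models with cross-fitting, and a forest over the scores yields the per-patient CATEs that drive the phenotype clustering described above.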
Aim: Real-time image guidance using deep learning is being increasingly used in surgery. This systematic review aims to characterize intraoperative systems, mapping applications, performance and latency, validation practices, and the reported effects on workflow and patient-relevant outcomes.
Methods: A systematic review was conducted on PubMed, Embase, Scopus, ScienceDirect, IEEE Xplore, Google Scholar, and Directory of Open Access Journals through December 31, 2024. Eligible English-language, peer-reviewed diagnostic accuracy, cohort, quasi-experimental, or randomized studies (2017-2024) evaluated deep learning for real-time intraoperative guidance. Two reviewers screened, applied the Joanna Briggs Institute checklists, and extracted the design, modality, architecture, training, validation, performance, and latency. Heterogeneity precluded meta-analysis.
Results: Twenty-seven studies spanning laparoscopic, neurosurgical, breast, colorectal, cardiac, and other workflows met the criteria. The modalities included red-green-blue laparoscopy or endoscopy, ultrasound, optical coherence tomography, cone-beam computed tomography, and stimulated Raman histology. The architectures were mainly convolutional neural networks with frequent transfer learning. Reported performance was high, with classification accuracy commonly 90%-97% and segmentation Dice or intersection over union up to 0.95 at operating-room-compatible speeds of about 20-300 frames per second or sub-second per-frame latency; volumetric pipelines sometimes required up to 1 min. Several systems demonstrated intraoperative feasibility and high surgeon acceptance, yet fewer than one quarter reported external validation and only a small subset linked outputs to patient-important outcomes.
Conclusion: Deep-learning systems for real-time image guidance exhibit strong technical performance and emerging workflow benefits. Priorities include multicenter prospective evaluations, standardized reporting of latency and external validation, rigorous human factors assessment, and open benchmarking to demonstrate generalizability and patient impact.
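The Dice coefficient and intersection over union (IoU) reported for segmentation above compare a predicted binary mask against a reference mask. A minimal sketch with toy masks:

```python
import numpy as np

def dice(pred, gt):
    """Dice coefficient: 2|A ∩ B| / (|A| + |B|) for binary masks."""
    inter = np.logical_and(pred, gt).sum()
    return 2 * inter / (pred.sum() + gt.sum())

def iou(pred, gt):
    """Intersection over union (Jaccard index) for binary masks."""
    inter = np.logical_and(pred, gt).sum()
    union = np.logical_or(pred, gt).sum()
    return inter / union

# Toy 8x8 masks: 32 px each, overlapping on 16 px.
pred = np.zeros((8, 8), dtype=bool)
pred[:4, :] = True          # top half predicted
gt = np.zeros((8, 8), dtype=bool)
gt[2:6, :] = True           # middle band is ground truth
d = dice(pred, gt)          # 2*16 / (32+32) -> 0.5
j = iou(pred, gt)           # 16 / 48 -> 0.333...
```

Dice is always at least as large as IoU for the same masks, which is worth remembering when comparing scores across papers that report different metrics.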