2026, Volume 6, Issue 1 (2026-01-06)

  • Review
    Michele Introna, John George Karippacheril, Sara Pilla, Davide Trimarchi, Marco Gemma, Donato Martino, Carla Carozzi

    Artificial intelligence (AI) has shown considerable potential in perioperative monitoring, particularly in electroencephalogram (EEG) analysis for assessing the depth of anesthesia. AI methods may enable dynamic recognition of complex time-frequency EEG patterns and adaptation of monitoring strategies to patient-specific brain responses. Studies using convolutional neural networks, artificial neural networks, and hybrid deep learning models have reported encouraging results in detecting anesthetic states, estimating bispectral index values, and identifying relevant EEG features (such as alpha-delta shifts or burst suppression) without relying on manual feature engineering. Parallel efforts using virtual and augmented reality platforms suggest possible benefits for training anesthesiologists in EEG interpretation and pharmacologic titration. Despite these advances, important limitations constrain clinical translation. A major challenge is the absence of standardized EEG pattern definitions across anesthetic agents and patient groups, which limits model generalizability. Restricted interoperability between EEG monitors and electronic health records, together with proprietary data formats, reduces access to raw EEG signals and hampers large-scale model development. Privacy and governance requirements add further barriers to data integration. Methodologically, many studies suffer from insufficient internal validation, suboptimal reporting, and testing under experimental rather than real-world conditions, which reduces their translational value. While AI could eventually improve anesthetic precision and safety through EEG-guided approaches, realizing this potential will require transparent algorithms, multicenter and heterogeneous datasets, and robust interoperability and data-sharing standards. Only through such coordinated efforts can these tools evolve from promising research applications into reliable components of routine anesthetic care.

  • Original Article
    Runchen Wang, Zhiming Ye, Qixia Wang, Bo Liang, Nanfei Fu, Wenxi Wang, Huimin Deng, Taimin Zhu, Shangxi Zeng, Yudong Zhang, Shunjun Jiang, Ying Huang, Wenhua Liang, Hengrui Liang, Jianxing He, Xusen Zou

    Aim: We aimed to develop EnrichGT, an open-source and clinician-friendly R package for functional genomics enrichment analysis leveraging large language models (LLMs). The tool addresses major limitations of existing approaches, including semantic redundancy, limited interpretability, and static reporting frameworks, thereby facilitating clinical interpretation and supporting data-driven decision-making.

    Methods: EnrichGT implemented both over-representation analysis and preranked gene set enrichment analysis using multiple knowledge bases. To minimize redundancy, enriched pathways were clustered based on shared genes, emphasizing coherent biological themes. Biological interpretability was further improved by inferring transcription factor activity through CollecTRI (Collection of Transcription Regulation Interactions, https://github.com/saezlab/CollecTRI) and pathway activity through PROGENy (Pathway RespOnsive GENes for activity inference, https://saezlab.github.io/progeny/). Additionally, context-aware annotations were generated through LLM integration, and results were compiled into dynamic, interactive reports using Quarto.
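
    Over-representation analysis is commonly based on a one-sided hypergeometric test. Suppose a gene universe has size N, a pathway has K member genes, and a query list has n genes, of which k fall in the pathway. The p-value is then P(X ≥ k). The Python sketch below illustrates that test using only the standard library. The function names are hypothetical, the sketch does not reproduce EnrichGT's (R) implementation, and it omits the multiple-testing correction a real analysis would apply.

```python
from math import comb


def hypergeom_sf(k: int, N: int, K: int, n: int) -> float:
    """P(X >= k) for X ~ Hypergeometric(population N, K pathway genes, n draws)."""
    total = comb(N, n)
    upper = min(K, n)  # the overlap cannot exceed the pathway or query size
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return tail / total


def ora_pvalue(query_genes, pathway_genes, universe):
    """Hypothetical helper: overlap count and one-sided ORA p-value for one pathway."""
    universe = set(universe)
    query = set(query_genes) & universe      # restrict both sets to the universe
    pathway = set(pathway_genes) & universe
    overlap = len(query & pathway)
    p = hypergeom_sf(overlap, len(universe), len(pathway), len(query))
    return overlap, p
```

    In practice this test is run for every pathway in a knowledge base, and the resulting p-values are adjusted (for example, with Benjamini-Hochberg) before pathways are called enriched.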

    Results: EnrichGT streamlines functional genomics enrichment analysis by clustering pathways based on gene co-occurrence, substantially reducing redundancy and enhancing interpretability. When applied to lung adenocarcinoma data from The Cancer Genome Atlas (TCGA), 873 enriched Gene Ontology terms were consolidated into 15 biologically coherent modules, revealing key processes such as myeloid cell activation and tumor-associated angiogenesis. Downstream analysis identified major tumor-associated regulators, including CREB1 (cAMP responsive element binding protein 1), RELA (RELA proto-oncogene, the p65 subunit of nuclear factor kappa-light-chain-enhancer of activated B cells, NF-κB), HIF1A (hypoxia inducible factor 1 subunit alpha), PPARG (peroxisome proliferator activated receptor gamma), and ETS1 (ETS proto-oncogene 1). It also identified critical signaling axes, including tumor necrosis factor alpha (TNFα), NF-κB, and hypoxia (oxygen deprivation-related) signaling. Automated LLM-based annotations and multi-database integration provided complementary pathway insights. Furthermore, EnrichGT's comparative multi-condition framework revealed conserved and condition-specific biological patterns across datasets, including single-cell data on ear-canal development and TCGA tumor-stage progression. Its dynamic reporting interface supported transparent, reproducible, and iterative exploration of enrichment results.
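
    The abstract does not specify EnrichGT's clustering algorithm, so the following is an illustration only. The Python sketch below assumes a simple scheme in which two pathways are linked when the Jaccard similarity of their gene sets reaches a threshold (here a hypothetical 0.3). Linked pathways are then merged into modules by single linkage using union-find. The function names and threshold are illustrative assumptions, and EnrichGT's actual method may differ.

```python
def jaccard(a, b) -> float:
    """Shared genes divided by all distinct genes across two gene sets."""
    a, b = set(a), set(b)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def cluster_pathways(pathways: dict, threshold: float = 0.3) -> list:
    """Hypothetical helper: group pathways into modules by single-linkage gene overlap.

    `pathways` maps a pathway name to its set of member genes.
    """
    names = list(pathways)
    parent = {n: n for n in names}  # union-find forest, one node per pathway

    def find(x):
        # Follow parent links to the root, halving the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    # Link every pair whose gene-set overlap reaches the threshold.
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if jaccard(pathways[a], pathways[b]) >= threshold:
                parent[find(a)] = find(b)

    # Collect pathways by root; order modules from largest to smallest.
    modules = {}
    for n in names:
        modules.setdefault(find(n), []).append(n)
    return sorted(modules.values(), key=lambda m: (-len(m), m))
```

    For example, two pathways sharing two of four distinct genes (Jaccard 0.5) fall into one module, while a pathway with no shared genes forms a module of its own. Single linkage keeps the sketch short, but it can chain loosely related pathways together. Stricter linkage rules or similarity measures are common alternatives.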

    Conclusion: EnrichGT offers a robust, clinician-friendly solution for functional genomics enrichment analysis, enhancing clinical interpretation and data-driven decision-making.

  • Review
    Brandon Valencia-Coronel, Jacques F. Marescaux, Mariano Gimenez

    We aimed to evaluate artificial intelligence (AI)-powered tools for optimizing surgical research workflows and to establish an evidence-based framework for selecting appropriate AI technologies throughout the research pipeline. We conducted a structured, qualitative narrative appraisal of 43 AI-powered tools (October 2024-March 2025), categorizing them into five functional domains: (1) scientific search engines, (2) document interaction systems, (3) literature analysis tools, (4) writing assistants, and (5) graphic design and reference management solutions. Our assessment framework evaluated key functionalities, costs, technical capabilities, and practical limitations through comprehensive documentation analysis, operational testing, and a systematic review of demonstration materials. All assessments reflect tool versions accessed during this period, acknowledging the rapidly evolving nature of this ecosystem. AI technologies primarily enhanced efficiency in literature discovery, content synthesis, and manuscript preparation while maintaining methodological rigor. The 43 evaluated tools demonstrated significant capabilities in processing scientific information, with each category offering distinct advantages for specific research tasks. Findings indicate substantial time savings in literature searches, document analysis, and manuscript preparation when the tools are properly integrated into research workflows. AI-powered tools demonstrate transformative potential for optimizing surgical research processes, providing significant efficiencies from the initial literature search to final publication. Successful implementation requires a critical balance between technological innovation and fundamental scientific principles. Human oversight remains essential to prevent overreliance on automation that could compromise critical thinking and analytical skills.

  • Systematic Review
    Carolina González-Abós, Roberto Molina, Sofía Almirante, Mariano Vázquez, Fabio Ausania

    Aim: Insufficient assessment of post-surgical organ perfusion in hepatobiliopancreatic surgery can lead to serious complications. Consequently, various technological solutions have been developed to achieve non-invasive and accurate blood flow assessment. This article aims to evaluate the current state of four-dimensional flow magnetic resonance imaging (4D-flow MRI) and computational fluid dynamics (CFD) technologies in assessing vascular blood flow within this surgical field.

    Methods: A comprehensive literature search of ClinicalTrials.gov and PubMed/MEDLINE was performed, including articles published between 2015 and 2025. Broad search terms were used, including "blood flow measurement", "4D-flow MRI", or "computational fluid dynamics", combined with "abdomen" or "liver".

    Results: Twenty-two studies were analyzed in detail. Nineteen focused on vascular conditions surrounding the liver, with 15 assessing venous flow and five evaluating the hepatic artery. Additional hemodynamic features analyzed included blood velocity, pressure, and particle distribution. The clinical applications investigated (number of studies) were portal vein embolization (1), venous anastomosis (3), liver resection (2), portal hypertension (2), transarterial radioembolization (2), transjugular intrahepatic portosystemic shunt (4), and liver fibrosis (1). Notably, only CFD enabled the simulation of prospective hemodynamic conditions (2 studies).

    Conclusion: Both 4D-flow MRI and CFD technologies facilitate the accurate study of blood flow dynamics within the supramesocolic compartment. Furthermore, CFD enables the simulation of prospective vascular conditions, establishing its potential as a preoperative planning tool. However, further research is required to fully validate the clinical utility of CFD in this surgical context.

  • Original Article
    Xiaojian Ji, Nianzhe Sun, Anan Wang, Jing Dong, Jiawen Hu, Jian Zhu, Feng Huang, Zhengbo Zhang, Kunpeng Li, Da Teng, Tao Li

    Aim: General-purpose Large Language Models (LLMs) exhibit significant limitations in high-stakes clinical domains such as spondyloarthritis (SpA) diagnosis, yet the absence of specialized evaluation tools precludes the quantification of these failures. This study aims to break this critical evaluation impasse and rigorously test the hypothesis that domain specialization is a necessity for achieving expert-level performance in complex medical diagnostics.

    Methods: We employed a two-pronged experimental approach. First, we introduced the Spondyloarthritis Multiple-Choice Question Answering Benchmark (SpAMCQA), a comprehensive, expert-validated benchmark engineered to probe the nuanced diagnostic reasoning required for SpA. Second, to validate the domain specialization hypothesis, we developed the Spondyloarthritis Diagnosis Large Language Model (SpAD-LLM) by fine-tuning a foundation model on a curated corpus of SpA-specific clinical data. The efficacy of SpAD-LLM was then evaluated against leading generalist models, including Generative Pre-trained Transformer 4 (GPT-4), on the SpAMCQA testbed.

    Results: On the SpAMCQA benchmark, our specialized SpAD-LLM achieved a state-of-the-art accuracy of 92.36%, decisively outperforming the 86.05% accuracy of the leading generalist model, GPT-4. This result provides the first empirical evidence on a purpose-built benchmark that generalist scaling alone is insufficient for mastering the specific inferential knowledge required for SpA diagnosis.

    Conclusion: Our findings demonstrate that in high-stakes domains, domain specialization is not merely an incremental improvement but a categorical necessity. We release the SpAMCQA benchmark and full inference logs to the public, providing the community with a foundational evaluation toolkit, while positioning the SpAD-LLM series as a validated baseline to catalyze the development of truly expert-level medical artificial intelligence.

  • Review
    Xi Xu, Ying Zhang, Qiufei Niu, Nianjiao Long, Jianqiang Li, Linna Zhao, Jian Yin, Jijiang Yang

    Background: Speech production is a coordinated physiological process and a vital digital biomarker for health assessment. Recent advances in artificial intelligence (AI), particularly in representation learning, have substantially expanded the application of speech analysis across diverse clinical domains.

    Methods: This review was conducted in accordance with the Preferred Reporting Items for Systematic Reviews and Meta-Analyses Extension for Scoping Reviews (PRISMA-ScR). Five major bibliographic databases were systematically searched for studies published between 2015 and 2025. Eligible studies applied AI-driven speech analysis for clinical diagnosis or monitoring, while those lacking quantitative evaluation or sufficient methodological detail were excluded.

    Results: A total of 124 studies were analyzed, covering neurological, psychiatric, and respiratory disorders. The field has transitioned from traditional machine learning with handcrafted features to deep learning and foundation models. Parkinson’s disease, Alzheimer’s disease, depression, and coronavirus disease 2019 (COVID-19) are the most frequently investigated conditions. The included studies were charted and synthesized to map disease coverage, methodological trends, and clinical application scenarios.

    Conclusion: Speech analysis offers a non-invasive approach for early disease detection and remote monitoring in telemedicine. To support clinical translation, future research should prioritize model robustness and interpretability across diverse clinical populations.