EnrichGT: a comprehensive R-based tool for functional genomics enrichment analysis based on large language models
Runchen Wang, Zhiming Ye, Qixia Wang, Bo Liang, Nanfei Fu, Wenxi Wang, Huimin Deng, Taimin Zhu, Shangxi Zeng, Yudong Zhang, Shunjun Jiang, Ying Huang, Wenhua Liang, Hengrui Liang, Jianxing He, Xusen Zou
Artificial Intelligence Surgery, 2026, Vol. 6, Issue 1: 18-35.
Aim: We aimed to develop EnrichGT, an open-source and clinician-friendly R package for functional genomics enrichment analysis leveraging large language models (LLMs). The tool addresses major limitations of existing approaches, including semantic redundancy, limited interpretability, and static reporting frameworks, thereby facilitating clinical interpretation and supporting data-driven decision-making.
Methods: EnrichGT implemented both over-representation analysis and preranked gene set enrichment analysis using multiple knowledge bases. To minimize redundancy, enriched pathways were clustered based on shared genes, emphasizing coherent biological themes. Biological interpretability was further improved by inferring transcription factor activity through CollecTRI (Collection of Transcription Regulation Interactions, https://github.com/saezlab/CollecTRI) and pathway activity via PROGENy (Pathway RespOnsive GENes for activity inference, https://saezlab.github.io/progeny/). Additionally, context-aware annotations were generated through LLM integration, and results were compiled into dynamic, interactive reports using Quarto.
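For readers unfamiliar with over-representation analysis, the sketch below shows the hypergeometric upper-tail test that this type of analysis is conventionally built on. It is an illustration only and is not taken from EnrichGT; the function names (`ora_pvalue`, `over_representation`) and the placeholder gene identifiers are assumptions made for this example, and multiple-testing correction is omitted.

```python
from math import comb


def ora_pvalue(hits: int, query_size: int, set_size: int, universe_size: int) -> float:
    """Upper-tail hypergeometric probability P(X >= hits).

    X is the number of query genes that fall in a gene set of `set_size`
    genes when `query_size` genes are drawn from `universe_size` genes.
    """
    max_hits = min(query_size, set_size)
    total = comb(universe_size, query_size)
    tail = sum(
        comb(set_size, k) * comb(universe_size - set_size, query_size - k)
        for k in range(hits, max_hits + 1)
    )
    return tail / total


def over_representation(query, gene_set, universe):
    """Return (overlap size, p-value) for one gene set (hypothetical helper)."""
    universe = set(universe)
    query = set(query) & universe        # ignore genes outside the background
    gene_set = set(gene_set) & universe
    hits = len(query & gene_set)
    return hits, ora_pvalue(hits, len(query), len(gene_set), len(universe))


# Toy example with placeholder gene identifiers (not real data).
universe = [f"g{i}" for i in range(10)]
hits, p = over_representation(["g0", "g1", "g9"], ["g0", "g1", "g2", "g3"], universe)
```

In this toy case two of the three query genes fall in a four-gene set drawn from a ten-gene background, so the p-value is the probability of observing two or more hits by chance.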
Results: EnrichGT streamlined functional genomics enrichment analysis by clustering pathways based on gene co-occurrence, reducing redundancy and enhancing interpretability. When applied to lung adenocarcinoma data from The Cancer Genome Atlas (TCGA), 873 enriched Gene Ontology terms were consolidated into 15 biologically coherent modules, revealing key processes such as myeloid cell activation and tumor-associated angiogenesis. Downstream analysis identified major tumor-associated regulators [CREB1 (cAMP responsive element binding protein 1), RELA/NF-κB p65 (RELA proto-oncogene; NF-κB, nuclear factor kappa-light-chain-enhancer of activated B cells), HIF1A (hypoxia inducible factor 1 subunit alpha), PPARG (peroxisome proliferator activated receptor gamma), and ETS1 (ETS proto-oncogene 1)] and critical signaling axes [TNFα (tumor necrosis factor alpha), NF-κB, and hypoxia (oxygen deprivation-related signaling)]. Automated LLM-based annotations and multi-database integration provided complementary pathway insights. Furthermore, EnrichGT’s comparative multi-condition framework revealed conserved and condition-specific biological patterns across datasets, including single-cell ear-canal development and TCGA tumor-stage progression. Its dynamic reporting interface ensured transparent, reproducible, and iterative exploration of enrichment results.
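The consolidation of many enriched terms into a few modules can be pictured with a small sketch of clustering by shared genes. The abstract states only that pathways are clustered on gene co-occurrence; the Jaccard similarity, the 0.5 threshold, the single-linkage grouping, and the names `jaccard` and `cluster_terms` are all assumptions chosen for illustration and do not describe EnrichGT's actual procedure. The gene identifiers are placeholders.

```python
from itertools import combinations


def jaccard(a: set, b: set) -> float:
    """Shared genes divided by all genes in either set."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cluster_terms(term_genes: dict, threshold: float = 0.5) -> list:
    """Group terms whose gene sets overlap at or above `threshold`.

    Single linkage: two terms join the same module if they are connected
    by a chain of sufficiently similar pairs (tracked with union-find).
    """
    parent = {term: term for term in term_genes}

    def find(term):
        while parent[term] != term:
            parent[term] = parent[parent[term]]  # path compression
            term = parent[term]
        return term

    for a, b in combinations(term_genes, 2):
        if jaccard(set(term_genes[a]), set(term_genes[b])) >= threshold:
            parent[find(a)] = find(b)

    groups = {}
    for term in term_genes:
        groups.setdefault(find(term), []).append(term)
    return sorted(sorted(members) for members in groups.values())


# Toy enrichment result: term -> genes driving the enrichment (placeholders).
toy_terms = {
    "myeloid cell activation": {"A", "B", "C", "D"},
    "leukocyte activation": {"A", "B", "C", "E"},
    "angiogenesis": {"X", "Y", "Z"},
    "blood vessel development": {"X", "Y", "W"},
}
modules = cluster_terms(toy_terms, threshold=0.5)
```

Here the four toy terms collapse into two modules, one immune-related and one vascular, because each pair within a module shares at least half of its combined genes.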
Conclusion: EnrichGT offered a robust, clinician-friendly solution for functional genomics enrichment analysis, enhancing clinical interpretation and data-driven decision-making.
Keywords: Enrichment analysis; large language models; visualization; EnrichGT