Clinical insight from mesh implant narratives using zero-shot Retrieval-Augmented Generation approach

Indu Bala; Ekta Sharma; Lewis Mitchell

doi:10.20517/ais.2025.44

Artificial Intelligence Surgery, 2025, Vol. 5, Issue 4: 476-89. DOI: 10.20517/ais.2025.44

Original Article

Abstract

Aim: Mesh implant surgeries for hernia repair are frequently associated with adverse events that can compromise patient outcomes. Extracting structured clinical insights from large-scale, unstructured data sources such as the U.S. Food and Drug Administration’s Manufacturer and User Facility Device Experience (FDA MAUDE) database remains a challenge due to variability and subjectivity in patient narratives. This study aims to develop and evaluate a zero-shot generative artificial intelligence (AI) framework enhanced with Retrieval-Augmented Generation (RAG) to automatically extract structured clinical information and adverse event indicators from unstructured mesh implant reports, assessing its accuracy, interpretability, and scalability against a manually annotated benchmark.

Methods: The study employed the LLaMA 2 (13B) model for zero-shot structured summarization and adverse event extraction from FDA MAUDE mesh implant reports (2000–2021). The framework integrated retrieval-based context using RAG and evaluated model performance on report date, hernia type, and adverse event flag using accuracy, Jaccard similarity, and Chi-square tests (P < 0.05). Statistical analysis validated improvements in output reliability and clinical relevance.
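The two agreement metrics named above can be illustrated with a minimal Python sketch. It assumes that accuracy is computed per field as the fraction of reports where the model output exactly matches the manual annotation, and that Jaccard similarity is computed over sets of extracted adverse event terms; the abstract does not spell out either detail. The function names and the toy values are hypothetical and are not taken from the study.

```python
def accuracy(predicted, reference):
    """Fraction of positions where the predicted label equals the reference label."""
    if len(predicted) != len(reference):
        raise ValueError("predicted and reference must have the same length")
    if not reference:
        raise ValueError("cannot compute accuracy on empty input")
    matches = sum(p == r for p, r in zip(predicted, reference))
    return matches / len(reference)


def jaccard(predicted_terms, reference_terms):
    """Jaccard similarity |A & B| / |A | B| between two collections of terms."""
    a, b = set(predicted_terms), set(reference_terms)
    if not a and not b:
        return 1.0  # convention chosen here: two empty sets count as identical
    return len(a & b) / len(a | b)


# Hypothetical toy data, for illustration only (not from the MAUDE reports).
pred_hernia = ["inguinal", "ventral", "umbilical", "inguinal"]
gold_hernia = ["inguinal", "ventral", "incisional", "inguinal"]
hernia_accuracy = accuracy(pred_hernia, gold_hernia)  # 3 of 4 labels match

pred_events = {"pain", "infection", "adhesion"}
gold_events = {"pain", "infection", "recurrence"}
event_jaccard = jaccard(pred_events, gold_events)  # 2 shared terms, 4 in the union

print(hernia_accuracy, event_jaccard)
```

Exact matching is the strictest choice for the categorical fields; a study could equally normalise labels (case, synonyms) before comparison, which would raise the reported agreement.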

Results: The model outputs were compared to a manually annotated benchmark baseline. With zero-shot prompting alone, the model achieved accuracies of 67% for report date, 60% for hernia type, and 83% for adverse event flag. After integrating the RAG approach, these accuracies improved to 81%, 82%, and 99%, respectively. The accuracy for adverse event extraction increased from 60% to 86%, and the Jaccard similarity improved from 75% to 88.9%. Chi-square tests confirmed statistical significance (P < 0.05) for most of the observed improvements.
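As a rough illustration of how a Chi-square test can be applied to an accuracy improvement such as the one reported for hernia type (60% to 82%), the sketch below computes the Pearson statistic for a 2×2 table of correct and incorrect counts. The abstract does not give the number of evaluated reports or the exact test design, so the sample size `n = 100` is a purely hypothetical assumption, the helper names are invented for this example, and the two runs are treated as independent samples.

```python
def chi_square_2x2(a, b, c, d):
    """Pearson chi-square statistic (no continuity correction) for [[a, b], [c, d]]."""
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    if 0 in (row1, row2, col1, col2):
        raise ValueError("every row and column total must be non-zero")
    return n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)


def correct_incorrect(acc, n):
    """Turn an accuracy and a sample size into (correct, incorrect) counts."""
    correct = round(acc * n)
    return correct, n - correct


# Critical value of the chi-square distribution for df = 1 at alpha = 0.05.
CRITICAL_DF1_ALPHA_05 = 3.841

n = 100  # HYPOTHETICAL sample size; the abstract does not report it
zero_shot = correct_incorrect(0.60, n)  # hernia type accuracy, zero-shot only
with_rag = correct_incorrect(0.82, n)   # hernia type accuracy, with RAG

stat = chi_square_2x2(*zero_shot, *with_rag)
significant = stat > CRITICAL_DF1_ALPHA_05
print(f"chi-square = {stat:.3f}, significant at 0.05: {significant}")
```

Because both configurations are scored on the same reports, a paired test such as McNemar's could also be argued for; the independent-samples form is used here only because it matches the test named in the abstract.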

Conclusion: This study demonstrates that combining zero-shot generative AI with retrieval augmentation can effectively convert unstructured patient reports into structured data. This approach offers a scalable and interpretable method for adverse event monitoring in mesh implant surgeries and supports data-driven evaluation of patient-reported outcomes.

Keywords

Mesh implantation / adverse events / Retrieval-Augmented Generation (RAG) / hernia repair / GenAI

Cite this article

Indu Bala, Ekta Sharma, Lewis Mitchell. Clinical insight from mesh implant narratives using zero-shot Retrieval-Augmented Generation approach. Artificial Intelligence Surgery, 2025, 5(4): 476-89. DOI: 10.20517/ais.2025.44

