Abstract
Aim: Mesh implant surgeries for hernia repair are frequently associated with adverse events that can compromise patient outcomes. Extracting structured clinical insights from large-scale, unstructured data sources such as the U.S. Food and Drug Administration’s Manufacturer and User Facility Device Experience (FDA MAUDE) database remains a challenge due to variability and subjectivity in patient narratives. This study aims to develop and evaluate a zero-shot generative artificial intelligence (AI) framework enhanced with Retrieval-Augmented Generation (RAG) to automatically extract structured clinical information and adverse event indicators from unstructured mesh implant reports, assessing its accuracy, interpretability, and scalability against a manually annotated benchmark.
Methods: The study employed the LLaMA 2 (13B) model for zero-shot structured summarization and adverse event extraction from FDA MAUDE mesh implant reports (2000–2021). The framework integrated retrieval-based context using RAG and evaluated model performance on report date, hernia type, and adverse event flag using accuracy, Jaccard similarity, and Chi-square tests (P < 0.05). Statistical analysis validated improvements in output reliability and clinical relevance.
Results: The model outputs were compared to a manually annotated Benchmark Baseline. With zero-shot prompting alone, the model achieved accuracies of 67% for report date, 60% for hernia type, and 83% for adverse event flag. After integrating the RAG approach, these accuracies improved to 81%, 82%, and 99%, respectively. The accuracy for adverse event extraction increased from 60% to 86%, and the Jaccard similarity improved from 75% to 88.9%. Chi-square tests confirmed statistical significance (P < 0.05) for most of the observed improvements.
Conclusion: This study demonstrates that combining zero-shot generative AI with retrieval augmentation can effectively convert unstructured patient reports into structured data. This approach offers a scalable and interpretable method for adverse event monitoring in mesh implant surgeries and supports data-driven evaluation of patient-reported outcomes.
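The three evaluation measures named in the Methods (accuracy, Jaccard similarity, and the chi-square test) can be illustrated with a minimal sketch. It is not the authors' evaluation code. The function names, the toy labels, and the 2x2 counts are hypothetical, and the chi-square statistic is computed without a continuity correction, which is an assumption about how the test was applied.

```python
from typing import Sequence, Set


def accuracy(predicted: Sequence[str], reference: Sequence[str]) -> float:
    """Fraction of reports whose extracted field equals the manual annotation."""
    if len(predicted) != len(reference):
        raise ValueError("predicted and reference must have the same length")
    if not reference:
        raise ValueError("no reports to score")
    matches = sum(p == r for p, r in zip(predicted, reference))
    return matches / len(reference)


def jaccard(predicted: Set[str], reference: Set[str]) -> float:
    """Size of the intersection divided by size of the union.

    Two empty sets are treated as identical (similarity 1.0).
    """
    union = predicted | reference
    if not union:
        return 1.0
    return len(predicted & reference) / len(union)


def chi_square_2x2(a: int, b: int, c: int, d: int) -> float:
    """Pearson chi-square statistic for the table [[a, b], [c, d]].

    Hypothetical layout: rows = zero-shot / RAG, columns = correct / incorrect.
    No continuity correction is applied.
    """
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        raise ValueError("a row or column total is zero")
    return n * (a * d - b * c) ** 2 / denom


# Critical value of the chi-square distribution for df = 1 at alpha = 0.05.
CRITICAL_005_DF1 = 3.841

if __name__ == "__main__":
    # Toy data only; these are not values from the study.
    print(accuracy(["inguinal", "ventral", "hiatal", "inguinal"],
                   ["inguinal", "ventral", "umbilical", "inguinal"]))
    print(jaccard({"pain", "infection"}, {"pain", "infection", "recurrence"}))
    stat = chi_square_2x2(12, 8, 18, 2)
    print(stat, stat > CRITICAL_005_DF1)
```

A statistic above the critical value corresponds to P < 0.05 for a 2x2 table, which is the significance threshold the abstract reports.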
Keywords
Mesh implantation / adverse events / Retrieval-Augmented Generation (RAG) / hernia repair / GenAI
Cite this article
Indu Bala, Ekta Sharma, Lewis Mitchell. Clinical insight from mesh implant narratives using zero-shot Retrieval-Augmented Generation approach. Artificial Intelligence Surgery, 2025, 5(4): 476-89. DOI: 10.20517/ais.2025.44