Opportunities and Challenges of Cardiovascular Disease Risk Prediction for Primary Prevention Using Machine Learning and Electronic Health Records: A Systematic Review
Tianyi Liu, Andrew J. Krentz, Zhiqiang Huo, Vasa Ćurčin
Reviews in Cardiovascular Medicine. 2025; 26(4): 37443.
Cardiovascular disease (CVD) remains the foremost cause of morbidity and mortality worldwide. Recent advances in machine learning (ML) have demonstrated substantial potential to improve risk stratification for primary prevention, surpassing conventional statistical models in predictive performance. Integrating ML with electronic health records (EHRs) enables refined risk estimation by leveraging the granularity and breadth of longitudinal individual patient data. However, fundamental barriers persist, including limited generalizability, challenges in interpretability, and the absence of rigorous external validation, all of which impede widespread clinical deployment.
This review follows the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) and Scale for the Assessment of Narrative Review Articles (SANRA) guidelines. A systematic literature search of the Medline and Embase databases was performed in March 2024 to identify studies published since 2010. Supplementary references were retrieved from the Institute for Scientific Information (ISI) Web of Science and through manual searches. The selection process, conducted in Rayyan, focused on systematic and narrative reviews evaluating ML-driven models for long-term CVD risk prediction in primary prevention using EHR data. Studies investigating short-term prognostication, highly specific comorbid cohorts, or conventional models without ML components were excluded.
Following screening of 1757 records, 22 studies met the inclusion criteria. Of these, 10 were systematic reviews (four incorporating meta-analyses) and 12 were narrative reviews, with the majority published after 2020. The synthesis underscores the superiority of ML in modeling intricate EHR-derived risk factors, facilitating precision-driven cardiovascular risk assessment. Nonetheless, salient challenges endure: heterogeneity in CVD outcome definitions undermines comparability, data incompleteness and inconsistency compromise model robustness, and a dearth of external validation constrains clinical translatability. Moreover, ethical and regulatory considerations, including algorithmic opacity, equity in predictive performance, and the absence of standardized evaluation frameworks, pose formidable obstacles to integration into clinical workflows.
Despite its transformative potential, ML-based CVD risk prediction remains encumbered by methodological, technical, and regulatory impediments that hinder its full-scale adoption in real-world healthcare settings. This review underscores the need for standardized validation protocols, stringent regulatory oversight, and interdisciplinary collaboration to bridge the translational divide. Our findings establish an integrative framework for developing, validating, and applying ML-based CVD risk prediction algorithms, addressing both clinical and technical dimensions. To further advance this field, we propose a standardized, transparent, and regulated EHR platform that facilitates fair model evaluation, reproducibility, and clinical translation by providing a high-quality, representative dataset with structured governance and benchmarking mechanisms. Future work must also prioritize enhancing model transparency, mitigating biases, and ensuring adaptability to heterogeneous clinical populations, fostering equitable and evidence-based implementation of ML-driven predictive analytics in cardiovascular medicine.
cardiovascular disease / machine learning / electronic health records / risk prediction / primary prevention
1. Examine the current evidence on CVD prediction models and assess the potential of EHRs and ML models for enhancing CVD risk prediction.
2. Identify the limitations of using EHRs and ML for CVD risk prediction, covering both clinical and technical aspects.
3. Identify elements of an integrative framework for the development, validation, and application of ML-based CVD risk prediction algorithms.
4. Highlight directions for future research to optimize the use of EHRs and ML for CVD risk prediction.
Engineering and Physical Sciences Research Council (EPSRC)-funded King’s Health Partners Digital Health Hub (EP/X030628/1)
Metadvice Ltd. and the National Institute for Health Research (NIHR) Biomedical Research Centre based at Guy’s and St Thomas’ NHS Foundation Trust and King’s College London (IS-BRC-1215-20006)