Evaluating Predictive Performance of Machine Learning Algorithms That Integrate Routine Clinical Variables With Imaging-Derived Information in Stroke Recurrence Risk

Li Gao; Shitao Wang; Jinlian Li; Mingkun Zhang

British Journal of Hospital Medicine, 2026, Vol. 87, Issue 1: 50394. DOI: 10.31083/BJHM50394


Li Gao¹,², Shitao Wang³, Jinlian Li², Mingkun Zhang¹,⁴,*


Abstract

Aims/Background:

Stroke recurrence remains a significant challenge in post-stroke management, with traditional prediction models often showing limited accuracy. This study aims to compare the performance of multiple machine learning (ML) algorithms that integrate routine clinical variables with imaging-derived features in predicting stroke recurrence risk, and to identify the optimal predictive model.

Methods:

This retrospective cohort study enrolled 350 patients with ischemic stroke who were admitted to The Fifth People’s Hospital of Jinan between January 2018 and December 2021. Patients were divided into three groups according to admission period: Group A (n = 110), Group B (n = 120), and Group C (n = 120). Routine clinical variables (e.g., age, gender, hypertension, diabetes, and fasting blood glucose) and imaging features (e.g., infarct size and location) were collected. Four ML algorithms (logistic regression, random forest (RF), support vector machine (SVM), and extreme gradient boosting (XGBoost)) were used to construct predictive models, and their performance was evaluated by area under the curve (AUC), sensitivity, specificity, and accuracy.

Results:

The XGBoost model showed the best predictive performance, with the highest AUC (0.86), followed by the random forest (0.82), support vector machine (0.78), and logistic regression (0.75) models. The most influential predictors of stroke recurrence were infarct size, history of hypertension, and fasting blood glucose level.

Conclusion:

ML-based algorithms that integrate routine clinical variables with imaging-derived data can predict stroke recurrence risk effectively, with the XGBoost model demonstrating superior predictive performance, which may further support more individualized clinical decision-making.

Keywords

stroke rehabilitation / machine learning / risk assessment / neuroimaging / secondary prevention

Cite this article


Li Gao, Shitao Wang, Jinlian Li, Mingkun Zhang. Evaluating Predictive Performance of Machine Learning Algorithms That Integrate Routine Clinical Variables With Imaging-Derived Information in Stroke Recurrence Risk. British Journal of Hospital Medicine, 2026, 87(1): 50394 DOI:10.31083/BJHM50394


1. Introduction

Stroke is a devastating global health burden and remains one of the leading causes of death and long-term disability across all age groups [1, 2]. The World Health Organization estimates that over 15 million people experience a stroke each year; approximately 5 million die and 5 million are left with permanent disability [3]. A major challenge in effective stroke management is the significant risk of recurrence. Epidemiological evidence indicates that about 5.7–51.3% of patients experience a second stroke within the first year after the initial event, and the risk can persist for years [4]. Recurrent stroke often results in more severe neurological impairment, increased healthcare costs, and a significant reduction in quality of life for patients and their families [5]. Therefore, early and accurate identification of individuals at high risk of recurrence is not merely a clinical priority but also a critical public health need, enabling individualized secondary prevention strategies to mitigate this risk.

Traditional approaches for predicting the risk of stroke recurrence, such as the Essen Stroke Risk Score (ESRS), the Stroke Prognostic Instrument (SPI), and the ABCD² (Age, Blood pressure, Clinical features, Duration of symptoms, Diabetes) score, are widely used in routine clinical care [6]. These models generally rely on a limited set of readily available clinical variables, including age, history of hypertension, diabetes mellitus, atrial fibrillation, and a previous transient ischemic attack (TIA) [7]. While they provide a convenient approach to risk stratification, their predictive performance is often moderate, with validation studies reporting area under the curve (AUC) values of 0.6 to 0.7 [8]. This modest accuracy reflects, in part, their limited ability to capture the complex, multifactorial biology of stroke, which involves interactions between clinical features, biochemical pathways, and structural brain changes. Moreover, many of these models do not incorporate detailed neuroimaging information, which can provide insight into the severity and anatomical distribution of cerebral damage, both important determinants of recurrence risk.

In recent years, machine learning (ML) has been widely applied across medicine, including diagnostic imaging, prognostic modeling, and assessment of treatment response [9]. By processing high-dimensional data, capturing non-linear relationships, and extracting complex patterns from large datasets, ML approaches offer a promising alternative to traditional statistical methods for predicting stroke recurrence [10]. Unlike conventional risk scores, ML models can integrate diverse data sources, including routine clinical variables, laboratory results, and imaging-derived features, enabling the development of more comprehensive and potentially more accurate prediction tools [11].

Neuroimaging, in particular, holds significant potential for enhancing the prediction of recurrent stroke risk. Computed tomography (CT) and magnetic resonance imaging (MRI) can characterize infarct size and location and detect associated pathologies such as leukoaraiosis, cerebral microbleeds, and carotid artery stenosis [12]. These imaging features can reflect the underlying vascular pathology, the severity of cerebral ischemia, and the burden of silent cerebrovascular disease, all of which are strongly linked to stroke recurrence. For example, larger infarct sizes have consistently been associated with a higher recurrence risk [13], likely indicating more extensive vascular injury and a greater likelihood of unstable atherosclerotic plaques. Similarly, leukoaraiosis, a marker of cerebral small-vessel disease, has been established as an independent predictor of recurrent vascular events [14].

Despite growing interest in applying ML to stroke research, few studies have systematically compared multiple ML algorithms for predicting stroke recurrence using combined routine clinical and imaging features. Most published studies have assessed a single algorithm or used a single data modality (e.g., clinical data without imaging, or imaging without detailed clinical data), which limits our understanding of which algorithm and which data integration approach yield the best predictive performance. Additionally, identifying and interpreting the most influential predictors of recurrence within an integrated dataset remains important, both to enhance model transparency and to generate mechanistic insights that could inform more effective secondary prevention strategies.

Therefore, this study aimed to address these gaps by evaluating four commonly used ML approaches: logistic regression, random forest, support vector machine (SVM), and extreme gradient boosting (XGBoost). Using an integrated dataset that combines routine clinical data with detailed imaging features, we sought to determine which algorithm achieves the highest predictive performance for stroke recurrence. We further aimed to identify the most influential predictors of recurrence within the integrated dataset and to assess the generalizability of the optimal model across clinically relevant subgroups, such as patients with cortical versus subcortical infarcts. These findings may support the development of more accurate and clinically useful tools for recurrence risk stratification, enabling more individualized secondary prevention and improved patient outcomes.

2. Methods

2.1 Study Population

This study enrolled 350 patients with ischemic stroke from the Department of Neurology, The Fifth People’s Hospital of Jinan, China, between January 2018 and December 2021. Inclusion criteria were: (1) diagnosis consistent with the Chinese Stroke Association guidelines for the clinical management of ischaemic cerebrovascular diseases (executive summary and 2023 update) [15]; (2) first-ever ischemic stroke confirmed by CT or MRI; and (3) availability of complete clinical and imaging data. Exclusion criteria were: (1) hemorrhagic stroke; (2) stroke secondary to trauma, tumor, or other non-atherosclerotic causes; and (3) severe cognitive impairment or other conditions preventing completion of follow-up.

Patients were categorized into three groups based on the admission period: Group A (January 2018–December 2019; n = 110), Group B (January 2020–June 2021; n = 120), and Group C (July 2021–December 2021; n = 120). This non-uniform interval design was adopted to account for a hospital-wide transition to a digital medical record system in the later study phase (after June 2021), which substantially improved the efficiency of patient identification and research recruitment; the higher monthly enrollment rate in Group C likely reflects this transition, as the inclusion criteria remained the same throughout. Longer intervals were therefore used for Groups A and B (pre-digitalization) to accumulate adequate numbers of patients, and a shorter interval was applied for Group C (post-digitalization) to avoid over-recruitment, yielding groups of comparable size with no significant differences in baseline characteristics (all p > 0.05).

The primary outcome was stroke recurrence, defined as a new ischemic stroke event confirmed by imaging within one year after the first stroke. A 1-year follow-up was selected because the risk of stroke recurrence is highest during the first year after the initial event, making it a critical window for intensified secondary prevention [16].
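The non-uniform windows are easier to compare as per-month enrollment rates. The short sketch below divides each group's size (from the Abstract) by the length of its admission window, assuming each window spans whole calendar months from its first to its last month inclusive; the names `GROUPS`, `months_inclusive`, and `monthly_rates` are illustrative and do not come from the source.

```python
# Group sizes (Abstract) and admission windows (Section 2.1) as (year, month).
GROUPS = {
    "A": {"n": 110, "start": (2018, 1), "end": (2019, 12)},
    "B": {"n": 120, "start": (2020, 1), "end": (2021, 6)},
    "C": {"n": 120, "start": (2021, 7), "end": (2021, 12)},
}


def months_inclusive(start, end):
    """Calendar months from start to end, counting both endpoint months.

    Assumption: each admission window covers whole months.
    """
    (y0, m0), (y1, m1) = start, end
    return (y1 - y0) * 12 + (m1 - m0) + 1


def monthly_rates(groups):
    """Return {group: (window_length_in_months, patients_per_month)}."""
    rates = {}
    for name, g in groups.items():
        months = months_inclusive(g["start"], g["end"])
        rates[name] = (months, g["n"] / months)
    return rates


if __name__ == "__main__":
    for name, (months, rate) in monthly_rates(GROUPS).items():
        print(f"Group {name}: {months} months, {rate:.1f} patients/month")
```

Under that assumption, Groups A, B, and C span 24, 18, and 6 months, so Group C enrolled about 20 patients per month versus roughly 4.6 in Group A and 6.7 in Group B, the difference attributed to the digital record system.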

2.2 Data Collection

Two categories of variables, namely routine clinical data and imaging features, were collected for each participant. Routine clinical variables included demographic characteristics (age, gender), comorbidities (hypertension, diabetes, atrial fibrillation, coronary heart disease), laboratory results (fasting blood glucose, total cholesterol, low-density lipoprotein cholesterol, creatinine), and treatment (antiplatelet therapy, recorded as a binary variable without specifying the agent or combination regimen, and statin use). Information on formal anticoagulation (e.g., warfarin or direct oral anticoagulants) was not consistently available and was therefore excluded from the analysis. Demographic factors and key comorbidities (hypertension, diabetes, atrial fibrillation, and coronary heart disease) were selected because they are well-established clinical determinants of stroke recurrence.

Imaging features included infarct size (cm², measured by CT/MRI), infarct location (cortical, subcortical, or posterior circulation), severity of leukoaraiosis (mild, moderate, or severe), and carotid artery stenosis (>50% or not, assessed using ultrasound).
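The imaging variables mix a continuous measure, a nominal category, an ordinal grade, and a binary threshold. One plausible encoding, sketched below, one-hot encodes location, maps leukoaraiosis to ordered integers, and binarizes carotid stenosis at 50%. The source does not state how leukoaraiosis was encoded or whether stenosis was recorded as a percentage, so the ordinal mapping, the input field names, and the function `encode_imaging` are all hypothetical.

```python
LOCATIONS = ("cortical", "subcortical", "posterior_circulation")
# Assumed ordinal coding; the source does not specify one.
LEUKOARAIOSIS_GRADES = {"mild": 0, "moderate": 1, "severe": 2}


def encode_imaging(record):
    """Map one patient's raw imaging fields to a flat numeric feature dict.

    record: dict with hypothetical keys
        infarct_size_cm2 (float), location (str),
        leukoaraiosis (str), carotid_stenosis_pct (float, from ultrasound).
    """
    if record["location"] not in LOCATIONS:
        raise ValueError(f"unknown infarct location: {record['location']!r}")
    features = {"infarct_size_cm2": float(record["infarct_size_cm2"])}
    # One-hot encoding of the nominal location variable.
    for loc in LOCATIONS:
        features[f"location_{loc}"] = int(record["location"] == loc)
    # Ordinal grade preserves the ordering mild < moderate < severe.
    features["leukoaraiosis_grade"] = LEUKOARAIOSIS_GRADES[record["leukoaraiosis"]]
    # Binary indicator: stenosis strictly greater than 50%.
    features["carotid_stenosis_gt50"] = int(record["carotid_stenosis_pct"] > 50)
    return features
```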

2.3 Machine Learning Models

The following four ML algorithms were selected for model construction: (i) logistic regression (LR), a linear classifier that models the log-odds of a binary outcome, with L1 regularization to reduce overfitting and support feature selection [17]; (ii) random forest (RF), an ensemble of decision trees that uses bootstrap resampling and random feature selection to enhance robustness and reduce variance [18]; (iii) SVM, a margin-based classifier that identifies an optimal separating hyperplane, with a radial basis function kernel to capture non-linear associations [19]; and (iv) XGBoost, a gradient-boosting framework that builds trees sequentially, with regularization to improve generalization and minimize prediction error [20].
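As a rough orientation, the four model families map onto standard estimators as sketched below. This is a minimal sketch assuming scikit-learn and, optionally, the `xgboost` package; the hyperparameter values are placeholders rather than the tuned values reported in Table 1, and the fallback to scikit-learn's `GradientBoostingClassifier` when `xgboost` is not installed is a substitution for illustration only, not the authors' setup.

```python
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC

try:
    from xgboost import XGBClassifier  # used when xgboost is installed
except ImportError:
    XGBClassifier = None  # fall back to a scikit-learn stand-in below


def build_models(random_state=42):
    """Untuned estimators for the four model families (placeholder settings)."""
    if XGBClassifier is not None:
        booster = XGBClassifier(n_estimators=200, max_depth=3, learning_rate=0.1,
                                reg_lambda=1.0, random_state=random_state)
    else:
        # Stand-in gradient boosting, NOT XGBoost itself.
        booster = GradientBoostingClassifier(random_state=random_state)
    return {
        # L1-penalized logistic regression; liblinear supports the l1 penalty.
        "LR": LogisticRegression(penalty="l1", solver="liblinear", C=1.0),
        # Bootstrap resampling plus random feature subsets at each split.
        "RF": RandomForestClassifier(n_estimators=500, max_features="sqrt",
                                     bootstrap=True, random_state=random_state),
        # RBF kernel; probability=True enables predict_proba for AUC.
        "SVM": SVC(kernel="rbf", C=1.0, gamma="scale", probability=True,
                   random_state=random_state),
        "XGBoost": booster,
    }
```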

Feature importance was derived from each model’s internal metric; for the XGBoost model, features were scored by their average gain across all splits in which they were used. For benchmarking against traditional risk stratification, the Essen Stroke Risk Score (ESRS) was also calculated for each patient.
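For readers unfamiliar with the benchmark, the ESRS is an additive point score. The function below follows the item weights as commonly reported for the ESRS (age 65–75 years, 1 point; age >75 years, 2 points; 1 point each for hypertension, diabetes, previous myocardial infarction, other cardiovascular disease excluding myocardial infarction and atrial fibrillation, peripheral arterial disease, smoking, and previous TIA or ischemic stroke in addition to the qualifying event; range 0–9). The source does not list the items it used, and smoking status was unavailable in this dataset (see the limitations), so how the authors handled that item is unknown; the function and argument names are hypothetical.

```python
def essen_stroke_risk_score(age, hypertension, diabetes, previous_mi,
                            other_cvd, pad, smoker, prior_tia_or_stroke):
    """Essen Stroke Risk Score (0-9), using commonly reported item weights.

    Boolean flags: `other_cvd` = cardiovascular disease other than
    myocardial infarction and atrial fibrillation; `prior_tia_or_stroke` =
    TIA or ischemic stroke in addition to the qualifying event.
    How unavailable items (e.g., smoking) were handled in the study is unknown.
    """
    if age > 75:
        score = 2
    elif age >= 65:
        score = 1
    else:
        score = 0
    score += sum(int(bool(flag)) for flag in (
        hypertension, diabetes, previous_mi, other_cvd,
        pad, smoker, prior_tia_or_stroke,
    ))
    return score


def is_high_risk(score):
    """Conventional ESRS cut-off: 3 or more points indicates high risk."""
    return score >= 3
```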

2.4 Model Training and Evaluation

The entire cohort was randomly split into a training set (70%, n = 245) for model development and an independent testing set (30%, n = 105) for final performance evaluation. All preprocessing steps were fitted on the training data only and then applied to the testing data to prevent data leakage. These steps included imputation of missing values (median for continuous variables and mode for categorical variables), standardization of continuous variables, one-hot encoding of categorical variables, winsorization of outliers at the 1st and 99th percentiles, and application of the Synthetic Minority Over-sampling Technique (SMOTE) to address class imbalance.
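The leakage-free preprocessing described above can be sketched with NumPy alone: every statistic (medians, 1st/99th percentiles, means, standard deviations) is estimated on the training rows and reused unchanged on the test rows. The `smote` function is a minimal, from-scratch version of the SMOTE idea (interpolating between a minority sample and one of its k nearest minority neighbours), shown for clarity; in practice a maintained implementation such as imbalanced-learn's `SMOTE` would normally be used. Two points are assumptions because the source does not specify them: the order impute, winsorize, standardize, and that synthetic samples are added to the training set only, since oversampling the test set would distort performance estimates. Mode imputation and one-hot encoding of categorical variables are omitted for brevity.

```python
import numpy as np


class TrainFittedPreprocessor:
    """Median imputation, 1st/99th-percentile winsorization and z-scoring.

    All statistics are learned from the training matrix only and reused
    unchanged on other data. Assumed order: impute -> winsorize -> scale.
    """

    def fit(self, X_train):
        X = np.asarray(X_train, dtype=float)
        self.medians_ = np.nanmedian(X, axis=0)
        filled = np.where(np.isnan(X), self.medians_, X)
        self.lower_, self.upper_ = np.percentile(filled, [1, 99], axis=0)
        clipped = np.clip(filled, self.lower_, self.upper_)
        self.mean_ = clipped.mean(axis=0)
        self.std_ = clipped.std(axis=0)
        self.std_[self.std_ == 0] = 1.0  # avoid division by zero for constant columns
        return self

    def transform(self, X):
        X = np.asarray(X, dtype=float)
        filled = np.where(np.isnan(X), self.medians_, X)
        clipped = np.clip(filled, self.lower_, self.upper_)
        return (clipped - self.mean_) / self.std_


def smote(X_minority, n_synthetic, k=5, seed=None):
    """Minimal SMOTE: new points on segments joining minority neighbours."""
    rng = np.random.default_rng(seed)
    X = np.asarray(X_minority, dtype=float)
    k = min(k, len(X) - 1)
    dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    np.fill_diagonal(dist, np.inf)  # a sample is not its own neighbour
    neighbours = np.argsort(dist, axis=1)[:, :k]
    base = rng.integers(0, len(X), size=n_synthetic)
    partner = neighbours[base, rng.integers(0, k, size=n_synthetic)]
    gap = rng.random((n_synthetic, 1))  # interpolation weight in [0, 1)
    return X[base] + gap * (X[partner] - X[base])


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(350, 4))
    y = (rng.random(350) < 0.2).astype(int)
    X[rng.random(X.shape) < 0.02] = np.nan  # sprinkle missing values
    idx = rng.permutation(350)
    train, test = idx[:245], idx[245:]
    pre = TrainFittedPreprocessor().fit(X[train])
    X_train, X_test = pre.transform(X[train]), pre.transform(X[test])
    # Oversample the minority class in the training set only (assumption).
    n_new = int((y[train] == 0).sum() - (y[train] == 1).sum())
    X_bal = np.vstack([X_train, smote(X_train[y[train] == 1], n_new, seed=1)])
    y_bal = np.concatenate([y[train], np.ones(n_new, dtype=int)])
    print(X_bal.shape, y_bal.mean())
```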

Model hyperparameters were optimized using 5-fold cross-validation within the training set, applying grid search for LR, RF, and SVM, and Bayesian optimization for XGBoost. The hyperparameter search ranges and the optimally selected values are detailed in Table 1. Model performance was determined on the independent testing set using the AUC, sensitivity, specificity, and accuracy. All analyses were performed in Python 3.9 (Python Software Foundation, Beaverton, OR, USA) using scikit-learn (v1.0.2) and XGBoost (v1.5.1) libraries.
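Tuning by 5-fold cross-validation within the training set can be sketched with scikit-learn's `GridSearchCV`. Wrapping scaling and the classifier in a `Pipeline` means the scaler is refitted inside each fold, mirroring the leakage safeguard described above. This is a minimal sketch for the SVM only: the search grid is hypothetical (Table 1 is not reproduced here), the data are synthetic, and the Bayesian optimization used for XGBoost would require an additional library (for example, Optuna or scikit-optimize's `BayesSearchCV`) and is not shown.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, StratifiedKFold
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC


def tune_svm(X_train, y_train, random_state=0):
    """5-fold stratified grid search over an RBF-SVM, scored by ROC AUC."""
    pipe = Pipeline([
        ("scale", StandardScaler()),  # refitted inside every CV fold
        ("svm", SVC(kernel="rbf", probability=True, random_state=random_state)),
    ])
    grid = {  # hypothetical search ranges, not those in Table 1
        "svm__C": [0.1, 1, 10],
        "svm__gamma": ["scale", 0.01, 0.1],
    }
    cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=random_state)
    search = GridSearchCV(pipe, grid, scoring="roc_auc", cv=cv)
    search.fit(X_train, y_train)
    return search


if __name__ == "__main__":
    # Synthetic stand-in for a 245-patient training set.
    X, y = make_classification(n_samples=245, n_features=12,
                               weights=[0.8, 0.2], random_state=0)
    search = tune_svm(X, y)
    print(search.best_params_, round(search.best_score_, 3))
```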

2.5 Statistical Analysis

Statistical analyses were conducted using Python (v3.9; Python Software Foundation, Beaverton, OR, USA) with the scikit-learn (v1.0.2) and XGBoost (v1.5.1) libraries, and R (v4.1.2; R Foundation for Statistical Computing, Vienna, Austria) with the tidyverse (v1.3.1) and pROC (v1.18.0) packages. Categorical variables are presented as frequencies and percentages (n, %) and were compared using Pearson’s chi-square test. For continuous variables, normality was assessed using the Shapiro-Wilk test and homogeneity of variances using Levene’s test. Normally distributed continuous variables are presented as mean ± standard deviation (mean ± SD) and were compared using Student’s t-test (two groups) or one-way analysis of variance (ANOVA; three or more groups). Non-normally distributed continuous variables are presented as median (interquartile range, IQR) and were compared using the Mann-Whitney U test (two groups) or the Kruskal-Wallis test (three or more groups). All statistical tests were two-tailed, and p < 0.05 was considered statistically significant.
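The test-selection rules above can be expressed as a small decision function using SciPy. This is a sketch, not the authors' code; in particular, the source does not say how the Levene result was used, so switching to Welch's t-test when two-group variances differ is an assumption, and the function names are hypothetical. `correction=False` is passed so that 2×2 tables receive the uncorrected Pearson statistic rather than SciPy's default Yates continuity correction.

```python
from scipy import stats


def compare_continuous(*groups, alpha=0.05):
    """Pick a parametric or rank-based test; return (test_name, p_value).

    Normality: Shapiro-Wilk within every group; variances: Levene's test.
    """
    normal = all(stats.shapiro(g).pvalue > alpha for g in groups)
    equal_var = stats.levene(*groups).pvalue > alpha
    if len(groups) == 2:
        if normal:
            # Assumption: Welch's t-test when Levene rejects equal variances.
            res = stats.ttest_ind(*groups, equal_var=equal_var)
            name = "Student t-test" if equal_var else "Welch t-test"
        else:
            res = stats.mannwhitneyu(*groups, alternative="two-sided")
            name = "Mann-Whitney U"
    elif normal:
        res, name = stats.f_oneway(*groups), "one-way ANOVA"
    else:
        res, name = stats.kruskal(*groups), "Kruskal-Wallis"
    return name, res.pvalue


def compare_categorical(table):
    """Pearson chi-square test on a contingency table (no Yates correction)."""
    chi2_stat, p, dof, expected = stats.chi2_contingency(table, correction=False)
    return chi2_stat, p
```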

Model calibration, the agreement between predicted probabilities and observed outcomes, was assessed using the Hosmer-Lemeshow goodness-of-fit test. To further evaluate the key predictors identified by the best-performing model, multivariate logistic regression was performed with adjustment for potential confounders.
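The Hosmer-Lemeshow statistic sorts patients by predicted risk, splits them into groups (conventionally ten), and compares observed with expected event counts in each group. The number of groups used in this study is not reported, so ten equal-sized groups is an assumption here, as is the stable tie-breaking used when sorting. A non-significant result means that miscalibration was not detected, not that good calibration has been proven.

```python
import numpy as np
from scipy.stats import chi2


def hosmer_lemeshow(y_true, y_prob, n_groups=10):
    """Hosmer-Lemeshow goodness-of-fit test on risk-ordered groups.

    H = sum_g (O_g - E_g)^2 / (E_g * (1 - E_g / n_g)), referred to a
    chi-square distribution with n_groups - 2 degrees of freedom.
    Assumption: equal-sized groups after a stable sort by predicted risk.
    """
    y_true = np.asarray(y_true, dtype=float)
    y_prob = np.asarray(y_prob, dtype=float)
    order = np.argsort(y_prob, kind="stable")
    stat = 0.0
    for idx in np.array_split(order, n_groups):
        n_g = len(idx)
        observed = y_true[idx].sum()
        expected = y_prob[idx].sum()
        denom = expected * (1 - expected / n_g)
        if denom > 0:  # skip degenerate groups with all-0 or all-1 predictions
            stat += (observed - expected) ** 2 / denom
    return stat, chi2.sf(stat, n_groups - 2)
```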

3. Results

3.1 Comparison of Baseline Characteristics Across Three Groups

The baseline characteristics of the three groups are summarized in Table 2. No significant differences were found across the three groups (Groups A, B, and C) in age, gender, comorbidities, or imaging features (all p > 0.05), indicating that the groups were well balanced at baseline.

3.2 Comparison of Characteristics Between the Training and Testing Sets

Baseline characteristics of the training set (70% of patients, n = 245) and the testing set (30%, n = 105) are compared in Table 3. No significant differences were observed in any variable, including demographic factors, comorbidities, laboratory results, treatments, and imaging features (all p > 0.05), indicating that the random split produced comparable sets for model training and evaluation.

3.3 Stroke Recurrence Rate

Stroke recurrence rates across predefined subgroups, including admission-period groups, infarct location, and key clinical risk factors, are summarized in Table 4. Recurrence rates were comparable across the three admission-period groups. In contrast, hypertension and carotid stenosis (>50%) were associated with significantly higher recurrence rates, supporting their role in recurrent stroke risk.

3.4 Predictive Performance of the ML Models

Predictive performance of the four ML models for stroke recurrence is shown in Table 5. Among them, the XGBoost model achieved the highest discrimination, with an AUC of 0.86 (95% confidence interval [CI]: 0.79–0.92), followed by RF (AUC 0.82, 95% CI: 0.75–0.89), SVM (AUC 0.78, 95% CI: 0.70–0.86), and LR (AUC 0.75, 95% CI: 0.67–0.83). Additionally, the XGBoost model showed the highest sensitivity (81.0%), specificity (84.1%), and overall accuracy (83.5%).
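The four reported metrics can be computed from predicted probabilities as sketched below in plain Python. The 0.5 classification threshold in the usage example is an assumption (the source does not report the threshold), and `rank_auc` uses the Mann-Whitney formulation of the AUC: the probability that a randomly chosen recurrence case receives a higher score than a randomly chosen non-recurrence case, with ties counting one half.

```python
def confusion_metrics(y_true, y_pred):
    """Sensitivity, specificity and accuracy for binary labels (1 = recurrence)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "accuracy": (tp + tn) / len(pairs),
    }


def rank_auc(y_true, scores):
    """ROC AUC as P(score of a positive > score of a negative); ties count 0.5."""
    pos = [s for s, t in zip(scores, y_true) if t == 1]
    neg = [s for s, t in zip(scores, y_true) if t == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            wins += 1.0 if p > n else 0.5 if p == n else 0.0
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    y = [0, 0, 1, 1]
    probs = [0.1, 0.4, 0.35, 0.8]
    THRESHOLD = 0.5  # assumption: the study's threshold is not reported
    print(rank_auc(y, probs), confusion_metrics(y, [int(p >= THRESHOLD) for p in probs]))
```

Because accuracy equals prevalence × sensitivity + (1 − prevalence) × specificity, it always lies between sensitivity and specificity, as it does for the XGBoost values reported for this study (81.0%, 84.1%, and 83.5%).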

3.5 Calibration of Models

Calibration of all five predictive models (the four ML models and the traditional ESRS), reflecting the agreement between predicted probabilities and observed outcomes, was assessed using the Hosmer-Lemeshow goodness-of-fit test. As shown in Supplementary Table 1, all models, including the ESRS, had non-significant test results (all p > 0.05), indicating no significant disagreement between predicted and observed stroke recurrence risk.

3.6 Subgroup Analysis by Infarct Location

To assess whether the optimal model generalized across pathophysiologically heterogeneous stroke subtypes, despite comparable overall recurrence rates, model performance was evaluated separately in subgroups stratified by infarct location: cortical, subcortical, and posterior circulation. As shown in Table 6, XGBoost achieved the highest performance in all three subgroups, with an AUC of 0.88 (95% CI: 0.80–0.96) for cortical infarcts, 0.84 (95% CI: 0.76–0.92) for subcortical infarcts, and 0.81 (95% CI: 0.70–0.92) for posterior circulation infarcts. Random forest was the second-best performer in each subgroup, with AUCs of 0.83, 0.80, and 0.78, respectively.
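Subgroup evaluation applies one fitted model's scores within each infarct-location stratum. The source does not state how the 95% CIs were obtained (pROC, which the authors list, provides both DeLong and bootstrap intervals), so the sketch below uses a percentile bootstrap as one reasonable choice; the function names and defaults are hypothetical. In small strata, a resample can lack one of the two classes; such resamples are skipped because the AUC is undefined for them.

```python
import numpy as np
from sklearn.metrics import roc_auc_score


def bootstrap_auc_ci(y_true, y_score, n_boot=2000, alpha=0.05, seed=0):
    """Point AUC plus a percentile-bootstrap (1 - alpha) confidence interval.

    One possible CI method; the study's actual method is not reported.
    """
    y_true, y_score = np.asarray(y_true), np.asarray(y_score)
    rng = np.random.default_rng(seed)
    n = len(y_true)
    aucs = []
    for _ in range(n_boot):
        idx = rng.integers(0, n, size=n)
        if y_true[idx].min() == y_true[idx].max():
            continue  # only one class in this resample: AUC undefined
        aucs.append(roc_auc_score(y_true[idx], y_score[idx]))
    lo, hi = np.percentile(aucs, [100 * alpha / 2, 100 * (1 - alpha / 2)])
    return roc_auc_score(y_true, y_score), lo, hi


def auc_by_subgroup(y_true, y_score, groups, **kwargs):
    """Evaluate one fitted model's scores separately within each subgroup label."""
    y_true, y_score, groups = map(np.asarray, (y_true, y_score, groups))
    return {
        g: bootstrap_auc_ci(y_true[groups == g], y_score[groups == g], **kwargs)
        for g in np.unique(groups)
    }
```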

3.7 Key Predictors of Stroke Recurrence

Table 7 lists the ten most influential predictors of stroke recurrence identified by the XGBoost model based on feature importance ranking. Infarct size made the greatest contribution (importance score 100.0), followed by history of hypertension (85.2) and fasting blood glucose (78.6), suggesting that these variables play key roles in recurrence risk prediction.
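Because infarct size scores exactly 100.0 in Table 7, the importances appear to be rescaled so that the top feature equals 100; the source does not state this, so the rescaling below is an assumption. With the `xgboost` package, raw average-gain scores can be obtained from a fitted model through `model.get_booster().get_score(importance_type="gain")`, which returns a dictionary mapping feature names to scores; the helper `relative_importance` is hypothetical.

```python
def relative_importance(gains):
    """Rescale raw importance scores so the top feature equals 100.

    Assumption: Table 7 reports max-scaled scores. Output is ordered from
    most to least important and rounded to one decimal place.
    """
    top = max(gains.values())
    ranked = sorted(gains.items(), key=lambda kv: kv[1], reverse=True)
    return {name: round(100.0 * value / top, 1) for name, value in ranked}
```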

3.8 Multivariate Logistic Regression for Key Predictors

Table 8 shows the results of multivariate logistic regression assessing associations between key predictors and stroke recurrence. Infarct size (odds ratio [OR] = 2.15, 95% CI: 1.52–3.04), hypertension (OR = 1.89, 95% CI: 1.12–3.18), and fasting blood glucose (OR = 1.67, 95% CI: 1.03–2.71) were independently associated with an increased risk of stroke recurrence (all p < 0.05).
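Odds ratios from logistic regression are exp(β), and a Wald 95% CI is exp(β ± 1.96·SE). Inverting this lets a reader check the reported intervals: the sketch below recovers β, SE, z, and a two-sided p-value from each OR and CI reported in Table 8, assuming Wald intervals. The source does not state the unit of increase for infarct size or fasting blood glucose, so the ORs are taken as reported; the function name is hypothetical.

```python
import math


def wald_from_or_ci(odds_ratio, ci_low, ci_high, z_crit=1.96):
    """Back-calculate beta, SE, z and two-sided p from an OR and its 95% CI.

    Assumes a Wald interval, i.e. CI = exp(beta +/- z_crit * SE).
    """
    beta = math.log(odds_ratio)
    se = (math.log(ci_high) - math.log(ci_low)) / (2 * z_crit)
    z = beta / se
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal p-value
    return beta, se, z, p


TABLE_8 = {  # OR, lower and upper 95% CI as reported in Section 3.8
    "infarct size": (2.15, 1.52, 3.04),
    "hypertension": (1.89, 1.12, 3.18),
    "fasting blood glucose": (1.67, 1.03, 2.71),
}

if __name__ == "__main__":
    for name, (or_, lo, hi) in TABLE_8.items():
        beta, se, z, p = wald_from_or_ci(or_, lo, hi)
        print(f"{name}: beta={beta:.3f}, SE={se:.3f}, z={z:.2f}, p={p:.4f}")
```

Run on the reported values, the recovered p-values are below 0.05 for all three predictors (roughly 0.02 for hypertension and 0.04 for fasting blood glucose), consistent with the text; because the inputs are rounded to two decimals, these are approximations.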

4. Discussion

The present study systematically compared the performance of four machine learning algorithms for predicting stroke recurrence using an integrated set of routine clinical variables and imaging features. Among them, the XGBoost model demonstrated the strongest predictive performance, achieving an AUC of 0.86. The findings underscore the potential of ML-based approaches to enhance risk stratification for stroke recurrence and to address key limitations of traditional prediction models that rely on a narrow set of clinical variables.

The superior performance of XGBoost compared with logistic regression, random forest, and SVM aligns with previous findings that gradient-boosting frameworks are well suited to complex, high-dimensional clinical datasets [21]. A possible explanation is XGBoost’s ability to model non-linear relationships and higher-order interactions among variables, such as a synergistic effect of infarct size and hypertension. For instance, while large infarcts are associated with a higher recurrence risk, this effect may be amplified in patients with poorly controlled hypertension, a relationship that a main-effects logistic regression cannot capture unless an interaction term is specified explicitly. This capability is particularly relevant in stroke, where recurrence risk is determined by complex interactions among vascular, metabolic, and neuroimaging-related factors.
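In logistic-regression terms, the synergy described above is an interaction. With a product term, logit(p) = b0 + b_size·size + b_htn·htn + b_int·size·htn, so the odds ratio per unit of infarct size is exp(b_size) without hypertension and exp(b_size + b_int) with it; a main-effects model forces these two odds ratios to be equal. The coefficients used with this sketch are hypothetical and serve only to illustrate the algebra.

```python
import math


def recurrence_probability(size, htn, b0, b_size, b_htn, b_int):
    """Predicted probability from a logit with a size x hypertension product term.

    All coefficients are hypothetical inputs, not estimates from the study.
    """
    logit = b0 + b_size * size + b_htn * htn + b_int * size * htn
    return 1.0 / (1.0 + math.exp(-logit))


def odds_ratio_per_unit_size(b_size, b_int, hypertensive):
    """OR for a one-unit increase in infarct size within a hypertension stratum."""
    return math.exp(b_size + (b_int if hypertensive else 0.0))


if __name__ == "__main__":
    B_SIZE, B_INT = 0.3, 0.4  # hypothetical coefficients
    print(odds_ratio_per_unit_size(B_SIZE, B_INT, hypertensive=False))
    print(odds_ratio_per_unit_size(B_SIZE, B_INT, hypertensive=True))
```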

Integrating imaging-derived features into the predictive models is a key strength of this study. Traditional models often overlook neuroimaging data because of its analytical complexity and the need for specialized interpretation; however, our results indicate that imaging features, particularly infarct size, contribute substantially to recurrence prediction. The ranking of infarct size as the most important predictor in the XGBoost model is consistent with previous evidence linking larger infarcts to higher recurrence risk [22]. Larger infarcts usually reflect more severe arterial occlusion, greater ischemic injury, and a higher likelihood of underlying vasculopathy, all of which increase the risk of subsequent cerebrovascular events [23]. Additionally, incorporating markers such as leukoaraiosis and carotid artery stenosis captures the contributions of small-vessel disease and large-artery atherosclerosis, respectively, thereby enhancing the clinical relevance of risk stratification [24].

The identification of hypertension and fasting blood glucose as key predictors reinforces the crucial role of metabolic and vascular risk management in secondary prevention. Hypertension, a well-established driver of stroke pathogenesis, promotes arteriosclerosis, disrupts endothelial function, and increases susceptibility to small vessel occlusion [25]. Similarly, elevated fasting blood glucose levels, even among individuals without a diagnosis of diabetes, may indicate insulin resistance and systemic inflammation, both of which contribute to vascular injury and thrombus formation [26]. Notably, lifestyle-based interventions can significantly improve these metabolic parameters [27]. These findings support current clinical guidelines that emphasize tight blood pressure and glycemic management after stroke, while also highlighting how ML-based models may help identify high-risk individuals who could benefit from more aggressive intervention.

Subgroup analyses showed that the XGBoost model maintained strong predictive performance in patients with cortical, subcortical, and posterior circulation infarcts, subtypes that often differ in etiology (e.g., large-artery atherosclerosis for cortical, small-vessel disease for subcortical, and vertebrobasilar pathology for posterior circulation infarcts). This is clinically relevant because such subtypes may require tailored preventive strategies [15]. The consistent performance across these subgroups suggests that the model may be broadly applicable in clinical risk stratification, although these estimates come from a single-center cohort and the wider confidence interval for posterior circulation infarcts (0.70–0.92) indicates lower precision.

Our results also highlight the limitations of traditional risk scores. For example, the ESRS, which relies on variables such as age, hypertension, and diabetes, typically achieves an AUC of about 0.65–0.70 for predicting recurrence [28]. The XGBoost model yielded an AUC of 0.86 which, although derived from a different cohort than these literature estimates, suggests a meaningful improvement in identifying high-risk patients. However, ML-based models should complement, not replace, clinical decision-making. While the XGBoost model provides a quantitative risk estimate, clinicians should interpret it alongside patient-specific factors, including medication adherence and lifestyle, to guide tailored management.

Several limitations of this study should be considered when interpreting these results. First, the single-center, retrospective design may limit the generalizability of the findings. Variations in clinical practice patterns, imaging acquisition and interpretation, and follow-up procedures across institutions could affect model performance, emphasizing the need for external validation in multicenter cohorts. Second, the study focused on recurrence within the first year after stroke, and longer follow-up is needed to assess how well these models predict late recurrent events. Third, several potentially informative predictors, including genetic markers, lifestyle factors (e.g., smoking status and physical activity), and detailed data on medication adherence, were not included because they were unavailable in the electronic medical records. Incorporating these variables in future studies may further improve predictive accuracy. Fourth, although the XGBoost model demonstrated strong performance, the limited interpretability typical of “black box” models may hinder clinical acceptance without robust explanation frameworks and prospective assessment. Fifth, and importantly, antithrombotic medications were inadequately characterized. Antiplatelet therapy was captured only as a binary variable and did not distinguish between single and dual regimens. Crucially, data on anticoagulant use, a critical determinant of recurrence prevention in patients with atrial fibrillation, were not consistently available. The absence of this key confounder likely affected the model’s performance and should be addressed in future studies.

Despite these limitations, this study advances our understanding of ML-based stroke recurrence prediction by demonstrating the benefit of integrating routine clinical variables with imaging-derived data. The XGBoost model showed high discriminative performance and consistent results across subgroups, indicating potential for supporting personalized secondary prevention strategies. However, the single-center, retrospective design and the lack of external validation in diverse, multicenter cohorts remain significant limitations that may restrict generalizability, and restricting outcomes to a 1-year recurrence window does not capture late recurrent events. Future studies should prioritize external validation to ensure robustness across different patient populations, imaging protocols, and clinical workflows; extend follow-up; incorporate additional predictive variables (such as lifestyle, adherence, and other biologically informative predictors); and develop practical, user-friendly tools to facilitate implementation in routine clinical care.

In summary, machine learning algorithms that integrate routine clinical variables with imaging-derived features can effectively predict stroke recurrence risk, with the XGBoost model offering the highest overall performance. Infarct size, hypertension, and fasting blood glucose were identified as the most influential predictors, underscoring the importance of structural neuroimaging and rigorous management of metabolic and vascular risk factors in secondary prevention. These findings support the use of ML-based models as adjuncts to clinical decision-making, with the potential to improve outcomes by facilitating more targeted risk reduction approaches.

5. Conclusion

This study demonstrates that machine learning algorithms integrating routine clinical data and imaging features can predict stroke recurrence risk effectively, with the XGBoost model achieving the highest overall performance. The key predictors, particularly infarct size and a history of hypertension, underscore the significance of structural brain injury and vascular-metabolic dysregulation in driving recurrence risk. Robust performance across cortical, subcortical, and posterior circulation infarct subgroups further supports the model’s potential clinical utility in diverse stroke subtypes with distinct pathophysiological mechanisms.

Key Points

• Machine learning models, particularly XGBoost, that integrate both routine clinical and imaging-derived features demonstrate a higher predictive performance for stroke recurrence risk than traditional models.

• Infarct size, a history of hypertension, and fasting blood glucose levels were identified as the most influential predictors of recurrence.

• The XGBoost model maintained robust predictive performance across different stroke subtypes defined by infarct location.

• This study highlights the potential of applying advanced analytical methods and multimodal data for enhancing risk stratification and supporting personalized secondary prevention strategies in stroke survivors.

Availability of Data and Materials

The datasets analyzed during the current study are available from the corresponding author on reasonable request.

References


[1]	Feigin VL, Brainin M, Norrving B, Martins SO, Pandian J, Lindsay P, et al. World Stroke Organization: Global Stroke Fact Sheet 2025. International Journal of Stroke. 2025; 20: 132–144. https://doi.org/10.1177/17474930241308142.

[2]	Hilkens NA, Casolla B, Leung TW, de Leeuw FE. Stroke. Lancet. 2024; 403: 2820–2836. https://doi.org/10.1016/S0140-6736(24)00642-1.

[3]	GBD 2021 Diseases and Injuries Collaborators. Global incidence, prevalence, years lived with disability (YLDs), disability-adjusted life-years (DALYs), and healthy life expectancy (HALE) for 371 diseases and injuries in 204 countries and territories and 811 subnational locations, 1990-2021: a systematic analysis for the Global Burden of Disease Study 2021. Lancet. 2024; 403: 2133–2161. https://doi.org/10.1016/S0140-6736(24)00757-8.

[4]	Kolmos M, Christoffersen L, Kruuse C. Recurrent Ischemic Stroke - A Systematic Review and Meta-Analysis. Journal of Stroke and Cerebrovascular Diseases. 2021; 30: 105935. https://doi.org/10.1016/j.jstrokecerebrovasdis.2021.105935.

[5]	Dymm BL, Kwicklis M, Meurer WJ, Shi X, Lisabeth LD. Recurrent stroke arrival time. Journal of Stroke and Cerebrovascular Diseases. 2023; 32: 107069. https://doi.org/10.1016/j.jstrokecerebrovasdis.2023.107069.

[6]	Ke L, Zhang H, Long K, Peng Z, Huang Y, Ma X, et al. Risk factors and prediction models for recurrent acute ischemic stroke: a retrospective analysis. PeerJ. 2024; 12: e18605. https://doi.org/10.7717/peerj.18605.

[7]	Abedi V, Avula V, Chaudhary D, Shahjouei S, Khan A, Griessenauer CJ, et al. Prediction of Long-Term Stroke Recurrence Using Machine Learning Models. Journal of Clinical Medicine. 2021; 10: 1286. https://doi.org/10.3390/jcm10061286.

[8]	Gladstone DJ, Lindsay MP, Douketis J, Smith EE, Dowlatshahi D, Wein T, et al. Canadian Stroke Best Practice Recommendations: Secondary Prevention of Stroke Update 2020. The Canadian Journal of Neurological Sciences. 2022; 49: 315–337. https://doi.org/10.1017/cjn.2021.127.

[9]	Karako K, Tang W. Applications of and issues with machine learning in medicine: Bridging the gap with explainable AI. BioScience Trends. 2025; 18: 497–504. https://doi.org/10.5582/bst.2024.01342.

[10]	Colangelo G, Ribo M, Montiel E, Dominguez D, Olivé-Gadea M, Muchada M, et al. PRERISK: A Personalized, Artificial Intelligence-Based and Statistically-Based Stroke Recurrence Predictor for Recurrent Stroke. Stroke. 2024; 55: 1200–1209. https://doi.org/10.1161/STROKEAHA.123.043691.

[11]	Ley C, Martin RK, Pareek A, Groll A, Seil R, Tischer T. Machine learning and conventional statistics: making sense of the differences. Knee Surgery, Sports Traumatology, Arthroscopy. 2022; 30: 753–757. https://doi.org/10.1007/s00167-022-06896-6.

[12]	van Dam-Nolen DHK, Truijman MTB, van der Kolk AG, Liem MI, Schreuder FHBM, Boersma E, et al. Carotid Plaque Characteristics Predict Recurrent Ischemic Stroke and TIA: The PARISK (Plaque At RISK) Study. JACC. Cardiovascular Imaging. 2022; 15: 1715–1726. https://doi.org/10.1016/j.jcmg.2022.04.003.

[13]	Arends CM, Liman TG, Strzelecka PM, Kufner A, Löwe P, Huo S, et al. Associations of clonal hematopoiesis with recurrent vascular events and death in patients with incident ischemic stroke. Blood. 2023; 141: 787–799. https://doi.org/10.1182/blood.2022017661.

[14]	Moroni F, Ammirati E, Magnoni M, D’Ascenzo F, Anselmino M, Anzalone N, et al. Carotid atherosclerosis, silent ischemic brain damage and brain atrophy: A systematic review and meta-analysis. International Journal of Cardiology. 2016; 223: 681–687. https://doi.org/10.1016/j.ijcard.2016.08.234.

[15]	Liu L, Li Z, Zhou H, Duan W, Huo X, Xu W, et al. Chinese Stroke Association guidelines for clinical management of ischaemic cerebrovascular diseases: executive summary and 2023 update. Stroke and Vascular Neurology. 2023; 8: e3. https://doi.org/10.1136/svn-2023-002998.

[16]	Månsson K, Söderholm M, Berhin I, Pessah-Rasmussen H, Ullberg T. The Post-Stroke Checklist: longitudinal use in routine clinical practice during first year after stroke. BMC Cardiovascular Disorders. 2024; 24: 601. https://doi.org/10.1186/s12872-024-04239-6.

[17]	Mahmood NH, Kadir DH. Sparsity regularization enhances gene selection and leukemia subtype classification via logistic regression. Leukemia Research. 2025; 150: 107663. https://doi.org/10.1016/j.leukres.2025.107663.

[18]	Hu J, Szymczak S. A review on longitudinal data analysis with random forest. Briefings in Bioinformatics. 2023; 24: bbad002. https://doi.org/10.1093/bib/bbad002.

[19]	Cortes C, Vapnik V. Support-vector networks. Machine Learning. 1995; 20: 273–297. https://doi.org/10.1007/BF00994018.

[20]	Chen T, Guestrin C. XGBoost: a scalable tree boosting system. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (pp. 785–794). ACM: New York. 2016. https://doi.org/10.1145/2939672.2939785.

[21]	Olaniran OR, Olaniran SF, Allohibi J, Alharbi AA, Alharbi NM. Mixed effect gradient boosting for high-dimensional longitudinal data. Scientific Reports. 2025; 15: 30927. https://doi.org/10.1038/s41598-025-16526-z.

[22]	Li G, Feng Z, Zhang H, Zou Y, Xv H, Jiang S. Analysis of influencing factors and interaction effects on stroke recurrence in patients with middle cerebral artery occlusion treated with mechanical thrombectomy. Frontiers in Neurology. 2025; 16: 1580950. https://doi.org/10.3389/fneur.2025.1580950.

[23]	Jreij G, Canton G, Hippe DS, Balu N, Yuan C, Cebral J, et al. Systematic review of biomechanical forces associated with carotid plaque disruption and stroke. Journal of Vascular Surgery. 2025; 82: 1113–1124.e7. https://doi.org/10.1016/j.jvs.2025.05.014.

[24]	Golsari A, Bittersohl D, Cheng B, Griem P, Beck C, Hassenstein A, et al. Silent Brain Infarctions and Leukoaraiosis in Patients With Retinal Ischemia: A Prospective Single-Center Observational Study. Stroke. 2017; 48: 1392–1396. https://doi.org/10.1161/STROKEAHA.117.016467.

[25]	Zhang C, Li Z, Liu L, Pu Y, Zou X, Yan H, et al. The role of hypertension and diabetes mellitus on the etiology of middle cerebral artery disease. Brain and Behavior. 2022; 12: e2521. https://doi.org/10.1002/brb3.2521.

[26]	Giacchetti G, Sechi LA, Rilli S, Carey RM. The renin-angiotensin-aldosterone system, glucose metabolism and diabetes. Trends in Endocrinology and Metabolism. 2005; 16: 120–126. https://doi.org/10.1016/j.tem.2005.02.003.

[27]	Hörber S, Lehmann R, Fritsche L, Machann J, Birkenfeld AL, Häring HU, et al. Lifestyle Intervention Improves Prothrombotic Coagulation Profile in Individuals at High Risk for Type 2 Diabetes. The Journal of Clinical Endocrinology and Metabolism. 2021; 106: e3198–e3207. https://doi.org/10.1210/clinem/dgab124.

[28]	Zhao J, Wang D, Liu X, Wang Y, Zhao X. The Predictive Value of Essen and SPI-II on the Risk of 5-Year Recurrence in Chinese Patients with Acute Ischemic Stroke. Neuropsychiatric Disease and Treatment. 2023; 19: 2251–2260. https://doi.org/10.2147/NDT.S433383.

