Prostate cancer prediction model: A retrospective analysis based on machine learning using the MIMIC-IV database

Wei Wang, Xin Jin

Intelligent Pharmacy, 2023, Vol. 1, Issue 4: 268-273. DOI: 10.1016/j.ipha.2023.04.010
Research Article

Abstract

Prostate cancer (PC) is a common malignant tumor in men, and early diagnosis and treatment are crucial for patient survival. This study aimed to establish a machine learning-based PC prediction model to help physicians accurately identify high-risk patients. We performed a retrospective analysis using patient data from the MIMIC-IV database. First, the data were cleaned and preprocessed, including imputation of missing values, handling of outliers, and feature selection. Then, prediction models were built using six machine learning algorithms (logistic regression [LR], support vector machine [SVM], deep neural network [DNN], XGBoost, LightGBM, and CatBoost) and evaluated using cross-validation and receiver operating characteristic (ROC) curve analysis. Based on ICD codes, we identified 1975 patients diagnosed with PC and 11,745 patients diagnosed with benign prostatic hyperplasia (BPH). Among the BPH patients, 467 were also diagnosed with PC, so we excluded these patients. The LightGBM model outperformed the other models in distinguishing patients with PC [area under the curve (AUC) for LightGBM vs. CatBoost vs. XGBoost vs. DNN vs. SVM vs. LR: 0.93 vs. 0.91 vs. 0.89 vs. 0.86 vs. 0.70 vs. 0.68, respectively]. At the best cut-off value, the LightGBM model had a sensitivity of 86% and a specificity of 85%. The model predicted whether a patient had PC from clinical features (including age and laboratory test results) with a high level of accuracy and stability. The machine learning-based PC prediction model established in this study has some clinical application value and can help physicians accurately identify high-risk patients, supporting more precise prevention and treatment plans.
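
The evaluation metrics reported in the abstract (AUC, and sensitivity and specificity at a best cut-off) can be illustrated with a minimal, dependency-free sketch. It is not the authors' code. The labels and scores are made-up toy values, not MIMIC-IV data; the function names are hypothetical; and the choice of Youden's J statistic to define the "best" cut-off is an assumption, since the abstract does not state how the cut-off was selected.

```python
from typing import List, Tuple

Point = Tuple[float, float, float]  # (threshold, false positive rate, true positive rate)


def roc_points(y_true: List[int], scores: List[float]) -> List[Point]:
    """Build ROC points by sweeping each distinct score as a threshold (score >= t is positive)."""
    pos = sum(y_true)
    neg = len(y_true) - pos
    points: List[Point] = [(float("inf"), 0.0, 0.0)]  # nothing predicted positive
    for t in sorted(set(scores), reverse=True):
        tp = sum(1 for y, s in zip(y_true, scores) if s >= t and y == 1)
        fp = sum(1 for y, s in zip(y_true, scores) if s >= t and y == 0)
        points.append((t, fp / neg, tp / pos))
    return points


def auc(points: List[Point]) -> float:
    """Area under the ROC curve by the trapezoidal rule."""
    area = 0.0
    for (_, x0, y0), (_, x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area


def best_cutoff(points: List[Point]) -> Tuple[float, float, float]:
    """Return (threshold, sensitivity, specificity) maximizing Youden's J = TPR - FPR.

    Assumption: Youden's J is used here only for illustration.
    """
    t, fpr, tpr = max(points[1:], key=lambda p: p[2] - p[1])
    return t, tpr, 1 - fpr


# Toy data (hypothetical): 1 = PC, 0 = BPH; scores are model-predicted probabilities.
y_true = [1, 1, 0, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2]

points = roc_points(y_true, scores)
print("AUC:", auc(points))
print("cut-off, sensitivity, specificity:", best_cutoff(points))
```

In a cross-validated setting, the same functions would be applied to the held-out predictions of each fold, and the resulting AUC values compared across models.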

Keywords

Prostate cancer / LightGBM / Predictive model / China

Cite this article

Wei Wang, Xin Jin. Prostate cancer prediction model: A retrospective analysis based on machine learning using the MIMIC-IV database. Intelligent Pharmacy, 2023, 1(4): 268‒273 https://doi.org/10.1016/j.ipha.2023.04.010

RIGHTS & PERMISSIONS

© 2023 The Authors. Publishing services by Elsevier B.V. on behalf of KeAi Communications Co. Ltd. This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/).