Prediction of Hyperuricemia Risk Based on Medical Examination Report Analysis

Rong Hou, Yongbo Xiao, Yan Zhu, Hongyan Zhao

Journal of Systems Science and Systems Engineering, 2020, Vol. 29, Issue 4: 468-503.

DOI: 10.1007/s11518-020-5462-4

Abstract

This study aims to contribute to disease detection by analyzing a medical examination dataset of 123,968 samples. Based on association rule mining and related medical knowledge, six models were constructed to predict hyperuricemia prevalence and investigate its risk factors. Among the models, Lasso logistic regression, traditional logistic regression, and random forest achieve excellent prediction performance, and their results can be interpreted. The PCA logistic regression model also performs well, but it does not lend itself to interpretation. KNN's prediction performance is relatively poor, although dimensionality reduction significantly improves its AUC. SVC performs worst, and it is extremely inefficient on large, high-dimensional datasets. The risk factors of hyperuricemia fall mainly into four categories: obesity-related factors, renal function factors, liver function factors, and myeloproliferative disease-related factors. Random forest, Lasso regression, and logistic regression all identify serum creatinine, BMI, triglyceride, fatty liver, and age as key predictive variables. The models also show that serum urea, serum alanine aminotransferase, negative urobilinogen, red blood cell count, white blood cell count, and pH are significantly correlated with the risk.

Keywords

Medical examination / hyperuricemia / machine learning / risk prediction / risk factors
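
The abstract compares models by AUC. As a minimal sketch of what that metric measures, the snippet below computes AUC as the fraction of positive/negative sample pairs that a model ranks correctly (the rank, or Mann-Whitney, formulation). The function name `auc`, the labels, and the two score lists are hypothetical toy values chosen for illustration; they are not taken from the study's data or code.

```python
def auc(labels, scores):
    """Area under the ROC curve via the rank (Mann-Whitney U) formulation."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    # Count positive/negative pairs ranked correctly; ties count as half.
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 1 = hyperuricemia, 0 = normal uric acid level.
labels = [1, 0, 1, 0, 0, 1]
model_a = [0.9, 0.2, 0.7, 0.4, 0.1, 0.8]  # illustrative risk scores
model_b = [0.6, 0.5, 0.3, 0.4, 0.7, 0.8]  # illustrative risk scores

auc_a = auc(labels, model_a)
auc_b = auc(labels, model_b)
print(f"model A AUC = {auc_a:.3f}, model B AUC = {auc_b:.3f}")
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation, which is why it is a convenient single number for comparing classifiers on imbalanced screening data.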

Cite this article

Rong Hou, Yongbo Xiao, Yan Zhu, Hongyan Zhao. Prediction of Hyperuricemia Risk Based on Medical Examination Report Analysis. Journal of Systems Science and Systems Engineering, 2020, 29(4): 468-503. DOI: 10.1007/s11518-020-5462-4
