Abstract
Cataract is a very common eye disease and the leading cause of blindness. Given its burden on society, this study focuses on testing risk factors for cataract and on building robust machine learning models that use these factors to predict cataract risk. The data were collected by a physical examination center in Shanghai, China, and cover more than 120,000 examinees and about 500 physical examination metrics. First, association rules were used to screen out 39 abnormalities that are more likely to be associated with cataract, and the significance of these abnormalities was then tested with univariate and multivariate analysis. The test results indicate that age, diabetes, refractive error, retinal arteriosclerosis, thyroid nodules, and incomplete mammary gland degeneration significantly increase the likelihood of cataract. Various machine learning models were then compared on how well they predict cataract risk from these six factors; the logistic regression model and the decision-tree-based ensemble methods outperform the others, with test-set AUC values reaching 0.84.
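The association-rule screening step can be illustrated with a minimal sketch. The abstract does not state which rule metrics or thresholds were used, so the sketch below assumes the standard support, confidence, and lift measures and a hypothetical cut-off of lift > 1. The records, the factor names, and the helper names `rule_metrics` and `screen` are made up for illustration and are not taken from the study.

```python
from typing import Dict, List, Set


def rule_metrics(records: List[Set[str]], antecedent: str, consequent: str) -> Dict[str, float]:
    """Support, confidence and lift for the rule antecedent -> consequent."""
    n = len(records)
    n_a = sum(1 for r in records if antecedent in r)
    n_c = sum(1 for r in records if consequent in r)
    n_ac = sum(1 for r in records if antecedent in r and consequent in r)
    support = n_ac / n                           # P(antecedent and consequent)
    confidence = n_ac / n_a if n_a else 0.0      # P(consequent | antecedent)
    base_rate = n_c / n                          # P(consequent)
    lift = confidence / base_rate if base_rate else 0.0
    return {"support": support, "confidence": confidence, "lift": lift}


def screen(records: List[Set[str]], candidates: List[str],
           consequent: str = "cataract", min_lift: float = 1.0) -> List[str]:
    """Keep candidates whose rule 'candidate -> consequent' has lift above min_lift.

    The lift > 1 threshold is an assumption, not the study's criterion.
    """
    return [c for c in candidates
            if rule_metrics(records, c, consequent)["lift"] > min_lift]


# Toy, made-up records: each set lists the findings for one examinee.
records = [
    {"diabetes", "cataract"},
    {"diabetes", "cataract"},
    {"diabetes"},
    {"refractive_error", "cataract"},
    {"refractive_error"},
    {"fatty_liver"},
    {"fatty_liver"},
    set(),
]

kept = screen(records, ["diabetes", "refractive_error", "fatty_liver"])
print(kept)
```

A lift above 1 only says that the abnormality co-occurs with cataract more often than the overall cataract rate would suggest, which is why a screening step of this kind is followed by univariate and multivariate significance tests rather than being treated as evidence on its own.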
Keywords
Cataract / risk factors / physical examination data / machine learning
Cite this article
Jianqiao Hao, Yongbo Xiao, Shudi Du. Physical Examination Data Based Cataract Risk Analysis. Journal of Systems Science and Systems Engineering, 2021, 30(2): 198-214. DOI: 10.1007/s11518-021-5477-5