Optimizing Diabetes Classification: BOA-Enhanced ML with EDA and SMOTE

R. Harihara Krishnan, Ananthi Sheshasaayee

Journal of Modern Applied Statistical Methods, 2025, Vol. 24, Issue (1): 11

DOI: 10.56801/Jmasm.V24.i1.11

Abstract

Diabetes mellitus, a chronic metabolic disorder stemming from fluctuations in blood glucose and insulin levels, affects every organ and significantly compromises overall health. While a permanent cure remains elusive, proactive management can control the disease's extent. Early detection is pivotal in averting its onset. This research employs Exploratory Data Analysis (EDA), coupled with the Synthetic Minority Over-sampling Technique (SMOTE), to reveal patterns, correlations, characteristics, and data structures. For diabetes classification, Support Vector Machine (SVM), Extreme Gradient Boosting (XGBoost), Random Forest (RF), Logistic Regression (LR), and Decision Tree (DT) models, optimized by the Bees Optimization Algorithm (BOA), were employed. Model performance is evaluated using accuracy, precision, recall, the F1 score, and the ROC curve. To determine the parameters that support classification, the models were tested on the PIMA Indian dataset and a real-time dataset. With BOA, the SVM model achieved 98.86% accuracy on the real-time dataset and 96% accuracy on the PIMA dataset. The study therefore indicates that, compared with state-of-the-art techniques, combining EDA with SMOTE and ML with BOA produces better outcomes.
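As an illustration of two steps named in the abstract, the sketch below shows the core SMOTE idea (interpolating between a minority-class sample and one of its nearest minority neighbours) and how accuracy, precision, recall, and the F1 score follow from binary predictions. It is a minimal pure-Python sketch, not the authors' implementation: the function names, the toy feature values, and the defaults `k=5` and `seed=0` are assumptions made for this example, and the classifiers and BOA tuning are not shown.

```python
import math
import random


def smote_oversample(minority, n_new, k=5, seed=0):
    """Return n_new synthetic points, each interpolated between a minority
    sample and one of its k nearest minority neighbours (core SMOTE idea)."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples to interpolate")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority samples by Euclidean distance to the base.
        others = [p for j, p in enumerate(minority) if j != i]
        others.sort(key=lambda p: math.dist(base, p))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


def classification_metrics(y_true, y_pred, positive=1):
    """Accuracy, precision, recall and F1 for binary labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Hypothetical toy rows ([glucose, BMI]) for the under-represented class.
minority = [[148.0, 33.6], [183.0, 23.3], [137.0, 43.1], [168.0, 38.0]]
new_points = smote_oversample(minority, n_new=6, k=3)

# Hypothetical true labels and predictions for six test records.
metrics = classification_metrics([1, 0, 1, 1, 0, 0], [1, 0, 0, 1, 0, 1])
print(len(new_points), metrics)
```

In practice, oversampling is normally applied to the training split only, so that synthetic records never leak into the data used to compute the reported metrics.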

Cite this article

R. Harihara Krishnan, Ananthi Sheshasaayee. Optimizing Diabetes Classification: BOA-Enhanced ML with EDA and SMOTE. Journal of Modern Applied Statistical Methods, 2025, 24(1): 11. DOI: 10.56801/Jmasm.V24.i1.11

Author Contributions

R.H.K. and A.S.: conceptualization, methodology, software, data curation, writing—original draft preparation, visualization, investigation, supervision, validation. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

This classification study used both the PIMA Indian dataset [13] and a real-time dataset gathered through a Google Form distributed among the general population. Nearly 2000 data entries were collected.

Conflicts of Interest

The authors declare no conflict of interest.

References

[1]

Wee B.F.; Sivakumar S.; Lim K.H.; et al. Diabetes detection based on machine learning and deep learning approaches. Multimed. Tools Appl. 2023, 83, 24153-24185. https://doi.org/10.1007/s11042-023-16407-5.

[2]

Bala Manoj Kumar P.; Srinivasa Perumal R.; Nadesh R.K.; et al. Type 2: Diabetes mellitus prediction using Deep Neural Networks classifier. Int. J. Cogn. Comput. Eng. 2020, 1, 55-61. https://doi.org/10.1016/j.ijcce.2020.10.002.

[3]

Ahamed B.S.; Arya M.S.; Nancy A.O.V. Diabetes Mellitus Disease Prediction Using Machine Learning Classifiers with Oversampling and Feature Augmentation. Adv. Hum.-Comput. Interact. 2022, 2022, 9220560. https://doi.org/10.1155/2022/9220560.

[4]

Zhan W. A Comparative Study on Machine Learning Based Type 2 Diabetes Mellitus Prediction. In Proceedings of the 2022 International Conference on Computer Science, Information Engineering and Digital Economy (CSIEDE 2022), Guangzhou, China, 28-30 October 2022. https://doi.org/10.2991/978-94-6463-108-1_95.

[5]

Rahman M.A.; Abdulrazak L.F.; Ali M.M.; et al. Machine Learning-Based Approach for Predicting Diabetes Employing Socio-Demographic Characteristics. Algorithms 2023, 16, 503. https://doi.org/10.3390/a16110503.

[6]

Khanam J.J.; Foo S.Y. A comparison of machine learning algorithms for diabetes prediction. ICT Express 2021, 7, 432-439. https://doi.org/10.1016/j.icte.2021.02.004.

[7]

Tigga N.P.; Garg S. Predicting type 2 Diabetes using Logistic Regression. In Lecture Notes in Electrical Engineering; Springer: Singapore, 2020.

[8]

Chapelle O.; Haffner P.; Vapnik V.N. Support vector machines for histogram-based image classification. IEEE Trans. Neural Netw. 1999, 10, 1055-1064. https://doi.org/10.1109/72.788646.

[9]

Maniruzzaman M.; Rahman M.J.; Ahammed B.; et al. Classification and prediction of diabetes disease using machine learning paradigm. Health Inf. Sci. Syst. 2020, 8, 7. https://doi.org/10.1007/s13755-019-0095-z.

[10]

Banerjee S. Machine Learning (ML) in Diet Planning for Type-1 Diabetes—An Overview. J. Healthc. Treat. Dev. (JHTD) 2022, 2, 1-5. https://doi.org/10.55529/jhtd25.1.5.

[11]

Zou Q.; Qu K.; Luo Y.; et al. Predicting Diabetes Mellitus with Machine Learning Techniques. Front. Genet. 2018, 9, 515. https://doi.org/10.3389/fgene.2018.00515.

[12]

Islam M.R.; Banik S.; Rahman K.N.; et al. A comparative approach to alleviating the prevalence of diabetes mellitus using machine learning. Comput. Methods Programs Biomed. Update 2023, 4, 100113. https://doi.org/10.1016/j.cmpbup.2023.100113.

[13]

Pima Indians Diabetes Database. Available online: https://www.kaggle.com/datasets/uciml/pima-indians-diabetes-database.

[14]

Sivaranjani S.; Ananya S.; Aravinth J.; et al. Diabetes prediction using machine learning algorithms with feature selection and dimensionality reduction. In Proceedings of the 2021 7th International Conference on Advanced Computing and Communication Systems (ICACCS), Coimbatore, India, 19-20 March 2021; Volume 1, pp. 141-146. https://doi.org/10.1109/ICACCS51430.2021.9441935.

[15]

Unwin A. Exploratory Data Analysis. In International Encyclopedia of Education, 3rd ed.; Peterson P., Baker E., McGaw B., Eds.; Elsevier: Amsterdam, The Netherlands, 2010; pp. 156-161, ISBN 9780080448947. https://doi.org/10.1016/B978-0-08-044894-7.01327-0.

[16]

Sneha N.; Gangil T. Analysis of diabetes mellitus for early prediction using optimal features selection. J. Big Data 2019, 6, 13. https://doi.org/10.1186/s40537-019-0175-6.

[17]

Thamilarasi V. Artificial Intelligence-Driven Smart Scenic Management: Automated Decision Making and Optimization. Int. J. Intell. Syst. Appl. Eng. 2024, 12, 2731.

[18]

Kopitar L.; Kocbek P.; Cilar L.; et al. Early detection of type 2 diabetes mellitus using machine learning-based prediction models. Sci. Rep. 2020, 10, 11981. https://doi.org/10.1038/s41598-020-68771-z.

[19]

Thamilarasi V.; Roselin R. Automatic Classification and Accuracy by Deep Learning Using CNN Methods in Lung Chest X-Ray Images. IOP Conf. Ser. Mater. Sci. Eng. 2020, 1055, 012099.

[20]

Chou C.Y.; Hsu D.Y.; Chou C.H. Predicting the Onset of Diabetes with Machine Learning Methods. J. Pers. Med. 2023, 13, 406. https://doi.org/10.3390/jpm13030406.
