Efficient feature selection based on Gower distance for breast cancer diagnosis

Salwa Shakir Baawi, Mustafa Noaman Kadhim, Dhiah Al-Shammary

Journal of Electronic Science and Technology, 2025, Vol. 23, Issue 2, Article 100315. DOI: 10.1016/j.jnlest.2025.100315

Abstract

This study presents an efficient feature selection method based on the Gower distance to improve the accuracy and efficiency of standard classifiers on high-dimensional medical datasets. High-dimensional data poses significant challenges for traditional classifiers because many features are redundant or irrelevant. The proposed method addresses these challenges by partitioning the dataset into blocks, calculating the Gower distance within each block, and selecting features based on their average similarity. Technically, for each numerical feature, the Gower distance normalizes the absolute difference between two samples' values by that feature's range, ensuring that each feature contributes equally to the distance calculation. This normalization prevents features with larger scales from overshadowing those with smaller scales. The process facilitates the identification of features that exhibit high harmony and are the most relevant for classification. The proposed feature selection strategy significantly reduces dimensionality, retains the most relevant features, and improves model performance. Experimental results show that the accuracy of the classifiers, namely k-nearest neighbors (KNN), naive Bayes (NB), decision tree (DT), random forest (RF), support vector machine (SVM), and logistic regression (LR), increased by 4.38%–7.02%. In addition, the smaller feature set considerably reduces computational complexity and thus speeds up diagnosis. On average, execution time was reduced by 77.82% for all samples and by 76.45% for one sample. These results demonstrate that the proposed feature selection method improves both prediction accuracy and diagnostic speed, making it a promising tool for real-time clinical decision-making and for improving patient care outcomes.
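The block-wise procedure summarized in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not state how blocks are formed or how the final feature subset is chosen, so the sketch assumes blocks of consecutive rows, scores each numerical feature by one minus its mean range-normalized absolute difference over all sample pairs in a block, averages the scores across blocks, and keeps the highest-scoring features. The function names and the parameters `block_size` and `n_keep` are hypothetical.

```python
import numpy as np


def per_feature_gower_similarity(block, ranges):
    """Mean pairwise Gower similarity of each feature within one block of rows."""
    n = block.shape[0]
    # Pairwise absolute differences per feature: shape (n, n, n_features).
    diffs = np.abs(block[:, None, :] - block[None, :, :])
    # Divide by the feature range so every feature lies on a [0, 1] scale.
    scaled = diffs / ranges
    # Average over distinct sample pairs (i < j) only.
    pair_idx = np.triu_indices(n, k=1)
    mean_distance = scaled[pair_idx].mean(axis=0)
    return 1.0 - mean_distance


def select_features(X, block_size=50, n_keep=10):
    """Rank features by average within-block similarity (assumed criterion)."""
    X = np.asarray(X, dtype=float)
    # Ranges are taken over the whole dataset; constant features get range 1
    # to avoid division by zero (their differences are all zero anyway).
    ranges = X.max(axis=0) - X.min(axis=0)
    ranges[ranges == 0] = 1.0

    block_scores = []
    # Assumption: blocks are consecutive, non-overlapping groups of rows.
    for start in range(0, X.shape[0], block_size):
        block = X[start:start + block_size]
        if block.shape[0] < 2:
            continue  # a single row has no pairs to compare
        block_scores.append(per_feature_gower_similarity(block, ranges))

    avg_similarity = np.mean(block_scores, axis=0)
    # Assumption: keep the n_keep features with the highest average similarity.
    order = np.argsort(-avg_similarity, kind="stable")
    return np.sort(order[:n_keep]), avg_similarity
```

Calling `select_features(X, block_size=50, n_keep=10)` on a samples-by-features array returns the indices of the retained columns together with the per-feature scores; the reduced matrix `X[:, selected]` can then be passed to any of the classifiers listed above. Both parameter values here are placeholders, not settings reported in the paper.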

Keywords

Breast cancer disease classification / Feature selection / Gower distance / Machine learning classifiers

Cite this article

Salwa Shakir Baawi, Mustafa Noaman Kadhim, Dhiah Al-Shammary. Efficient feature selection based on Gower distance for breast cancer diagnosis. Journal of Electronic Science and Technology, 2025, 23(2): 100315. DOI: 10.1016/j.jnlest.2025.100315

Data availability

The data used in this paper is publicly available at https://www.kaggle.com/datasets/uciml/breast-cancer-wisconsin-data.

CRediT authorship contribution statement

Salwa Shakir Baawi: Conceptualization, Methodology, Supervision, Writing – review & editing. Mustafa Noaman Kadhim: Data curation, Formal analysis, Software, Visualization, Writing – original draft. Dhiah Al-Shammary: Methodology, Validation, Investigation, Resources, Writing – review & editing, Project administration.

Declaration of competing interest

The authors declare no known conflicts of interest associated with this publication.
