Efficient feature selection based on Gower distance for breast cancer diagnosis
Salwa Shakir Baawi , Mustafa Noaman Kadhim , Dhiah Al-Shammary
Journal of Electronic Science and Technology ›› 2025, Vol. 23 ›› Issue (2) : 100315
This study presents an efficient feature selection method based on the Gower distance to enhance the accuracy and efficiency of standard classifiers on high-dimensional medical datasets. High-dimensional data poses significant challenges for traditional classifiers because many features are redundant or irrelevant. The proposed method addresses these challenges by partitioning the dataset into blocks, calculating the Gower distance within each block, and selecting features based on their average similarity. Technically, the Gower distance normalizes the absolute difference between numerical features, so that each feature contributes equally to the distance calculation and features with larger scales do not overshadow those with smaller scales. This process facilitates the identification of features that exhibit high harmony and are the most relevant for classification. The proposed feature selection strategy significantly reduces dimensionality, retains the most relevant features, and improves model performance. Experimental results show that the accuracy of the classifiers, namely k-nearest neighbors (KNN), naive Bayes (NB), decision tree (DT), random forest (RF), support vector machine (SVM), and logistic regression (LR), increased by 4.38%–7.02%. In addition, the smaller feature set considerably decreases computational complexity and therefore speeds up diagnosis: on average, execution time was reduced by 77.82% for all samples and by 76.45% for one sample. These results demonstrate that the proposed feature selection method improves both prediction accuracy and diagnostic speed, making it a promising tool for real-time clinical decision-making and for improving patient care outcomes.
Breast cancer disease classification / Feature selection / Gower distance / Machine learning classifiers
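As an illustration of the normalization described in the abstract, the following is a minimal sketch of the Gower distance restricted to numerical features, where each absolute difference is divided by that feature's range before averaging. It is not the authors' implementation: the function name `gower_numeric`, the use of NumPy, and the toy data are assumptions made here for illustration, and the block partitioning and feature-ranking steps of the proposed method are not reproduced.

```python
import numpy as np


def gower_numeric(X):
    """Pairwise Gower distance for purely numerical features (illustrative sketch).

    d(i, j) = (1 / p) * sum_k |x_ik - x_jk| / R_k, where R_k is the range of
    feature k over the rows of X. Every feature therefore contributes a value
    in [0, 1], regardless of its original scale.
    """
    X = np.asarray(X, dtype=float)
    ranges = X.max(axis=0) - X.min(axis=0)
    # A constant feature has zero range; its differences are all zero anyway,
    # so dividing by 1 avoids a division by zero without changing the result.
    ranges[ranges == 0] = 1.0
    # diffs[i, j, k] = |x_ik - x_jk| / R_k, built with broadcasting.
    diffs = np.abs(X[:, None, :] - X[None, :, :]) / ranges
    return diffs.mean(axis=2)


# Toy data (hypothetical): two features on very different scales.
X = np.array([[1.0, 100.0],
              [2.0, 300.0],
              [3.0, 200.0]])
D = gower_numeric(X)

# Rescaling one feature leaves the distances unchanged, which is the
# "no feature overshadows another" property mentioned in the abstract.
D_rescaled = gower_numeric(X * np.array([1.0, 1000.0]))
```

In this sketch the ranges are computed over all rows passed to the function; applying it to one block at a time would normalize by the within-block range instead, which is a design choice the abstract does not specify.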