A Hybrid Evolutionary Under-sampling Method for Handling the Class Imbalance Problem with Overlap in Credit Classification
Ping Gong, Junguang Gao, Li Wang
Journal of Systems Science and Systems Engineering, 2022, Vol. 31, Issue 6: 728-752.
Credit risk assessment is an important risk-management task for financial institutions. Machine learning-based approaches have made promising progress in credit risk assessment by treating it as an imbalanced binary classification task. However, few efforts have been made to deal simultaneously with the class overlap problem that accompanies class imbalance. To this end, this study proposes a Tomek link and genetic algorithm (GA)-based under-sampling framework (TEUS) that addresses the class imbalance and overlap issues in binary credit classification by eliminating majority class instances while considering multi-perspective factors. First, TEUS determines the boundary majority instances with Tomek links; it then takes the distance from each majority instance to its nearest boundary as a radius and assigns the density of opposite-class samples within that radius as the overlap potential of that majority instance. Second, TEUS weighs each non-borderline majority instance by its information contribution to estimating class labels. After partitioning the non-borderline majority instances into subgroups according to overlap potential and information contribution, TEUS applies a GA to select samples from the subgroups and merges them with the minority samples into a new training set. A key innovation is that the design of the GA fitness function and the grouping of the non-borderline majority instances not only trade off the multi-perspective characteristics of instances but also help reduce the computational complexity of the sampling optimization search. Numerical experiments on real-world credit data sets demonstrate the effectiveness of the proposed TEUS.
Keywords: imbalanced classification / credit classification / class overlap / evolutionary under-sampling / genetic algorithm
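The sampling pipeline described in the abstract can be sketched roughly as follows (Python 3.8+ for `math.dist`, standard library only). This is an illustrative sketch under stated assumptions, not the authors' implementation: labels are encoded as 0 (majority) and 1 (minority); distances are Euclidean with brute-force nearest-neighbour search; "nearest boundary" is read as the nearest Tomek-link majority instance and "density" as the share of minority points inside that radius; boundary majority instances are simply dropped; the information-contribution weighting and the subgroup partitioning are omitted; and the GA fitness (balance penalty plus mean overlap potential) is a hypothetical stand-in, since the abstract does not give TEUS's fitness function. All function names are hypothetical.

```python
import math
import random

MAJ, MIN = 0, 1  # assumed encoding: 0 = majority (e.g. good credit), 1 = minority


def nearest_neighbour(i, X):
    """Index of the closest other point to X[i] (brute-force Euclidean)."""
    return min((j for j in range(len(X)) if j != i),
               key=lambda j: math.dist(X[i], X[j]))


def boundary_majority(X, y):
    """Majority indices in a Tomek link: an opposite-class pair of points
    that are each other's nearest neighbour."""
    nn = [nearest_neighbour(i, X) for i in range(len(X))]
    return {i for i in range(len(X))
            if y[i] == MAJ and y[nn[i]] == MIN and nn[nn[i]] == i}


def overlap_potential(X, y, boundary):
    """Score each non-boundary majority point. Radius = distance to the nearest
    boundary majority point; score = share of minority points among the other
    points inside that radius (an assumed reading of 'density')."""
    if not boundary:  # no Tomek links, so there is no radius to measure
        return {i: 0.0 for i in range(len(X)) if y[i] == MAJ}
    scores = {}
    for i in range(len(X)):
        if y[i] != MAJ or i in boundary:
            continue
        r = min(math.dist(X[i], X[b]) for b in boundary)
        inside = [j for j in range(len(X))
                  if j != i and math.dist(X[i], X[j]) <= r]
        scores[i] = sum(y[j] == MIN for j in inside) / len(inside) if inside else 0.0
    return scores


def ga_select(candidates, scores, target, pop=30, gens=40, p_mut=0.05, seed=0):
    """Toy GA over 0/1 masks of candidate majority points. Hypothetical fitness:
    stay close to `target` kept points while preferring low overlap potential."""
    rng = random.Random(seed)
    n = len(candidates)

    def fitness(mask):
        chosen = [c for c, bit in zip(candidates, mask) if bit]
        if not chosen:
            return -math.inf
        balance = abs(len(chosen) - target) / max(target, 1)
        overlap = sum(scores[c] for c in chosen) / len(chosen)
        return -(balance + overlap)

    population = [[rng.randint(0, 1) for _ in range(n)] for _ in range(pop)]
    for _ in range(gens):
        ranked = sorted(population, key=fitness, reverse=True)
        nxt = ranked[:2]  # elitism: the two best masks survive unchanged
        while len(nxt) < pop:
            # tournament selection of two parents (best of 3 random masks each)
            a, b = (max(rng.sample(ranked, 3), key=fitness) for _ in range(2))
            cut = rng.randrange(1, n) if n > 1 else 0
            child = a[:cut] + b[cut:]  # one-point crossover
            nxt.append([bit ^ (rng.random() < p_mut) for bit in child])  # bit-flip mutation
        population = nxt
    best = max(population, key=fitness)
    return [c for c, bit in zip(candidates, best) if bit]


def undersample(X, y, **ga_kwargs):
    """Indices of the new training set: all minority points plus the GA-kept
    non-boundary majority points (boundary majority points are dropped)."""
    boundary = boundary_majority(X, y)
    scores = overlap_potential(X, y, boundary)
    minority = [i for i in range(len(X)) if y[i] == MIN]
    kept = ga_select(sorted(scores), scores, target=len(minority), **ga_kwargs)
    return sorted(minority + kept)


if __name__ == "__main__":
    # Made-up points laid out on the x-axis, purely for illustration.
    X = [(0, 0), (1, 0), (4, 0), (4.4, 0), (7, 0), (8, 0), (12, 0), (13, 0)]
    y = [MAJ, MAJ, MAJ, MIN, MAJ, MIN, MAJ, MAJ]
    print(boundary_majority(X, y))          # {2, 4}
    print(overlap_potential(X, y, {2, 4}))  # 0.0 for points 0 and 1, 1/3 for 6 and 7
    print(undersample(X, y))
```

Because this toy GA uses one bit per candidate instance, its search space grows as 2^n; the abstract credits TEUS's subgroup partitioning with reducing that search cost, and that step is not reproduced here.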