Boosting imbalanced data learning with Wiener process oversampling

Qian LI, Gang LI, Wenjia NIU, Yanan CAO, Liang CHANG, Jianlong TAN, Li GUO

Front. Comput. Sci., 2017, Vol. 11, Issue (5): 836–851.

DOI: 10.1007/s11704-016-5250-y
RESEARCH ARTICLE


Abstract

Learning from imbalanced data is a challenging task in a wide range of applications and has attracted significant research effort from the machine learning and data mining communities. As a natural approach to this issue, oversampling balances the training samples by replicating existing samples or synthesizing new ones. In general, synthesis outperforms replication because it supplies additional information on the minority class. However, this additional information needs to follow the same normal distribution as the training set, which in turn confines the new samples to the predefined range of the training set. In this paper, we present the Wiener process oversampling (WPO) technique, which brings a physical phenomenon into sample synthesis. WPO constructs a robust decision region by expanding the attribute ranges of the training set while keeping the same normal distribution. WPO achieves satisfactory performance with much lower computational complexity. In addition, by integrating WPO with ensemble learning, the WPOBoost algorithm outperforms many prevalent imbalance learning solutions.
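The abstract does not give the WPO procedure itself, so the following is only a rough sketch of the general idea of synthesizing minority samples with Wiener-process (Brownian motion) increments. It is not the authors' algorithm. The function name `wiener_oversample` and the parameters `step_scale`, `n_steps`, and `seed` are hypothetical, and the choice to scale the diffusion by each attribute's standard deviation is an assumption made for illustration.

```python
import numpy as np

def wiener_oversample(X_min, n_new, step_scale=0.1, n_steps=5, seed=0):
    """Illustrative only: synthesize minority samples via short Brownian walks.

    X_min      : (n, d) array of minority-class samples
    n_new      : number of synthetic samples to generate
    step_scale : assumed fraction of each attribute's std used as diffusion scale
    n_steps    : number of discrete Wiener increments over unit time
    """
    rng = np.random.default_rng(seed)
    X_min = np.asarray(X_min, dtype=float)

    # Assumption: diffusion scale per attribute is tied to the minority spread.
    sigma = X_min.std(axis=0) * step_scale

    # Start each walker from a randomly chosen existing minority sample.
    idx = rng.integers(0, len(X_min), size=n_new)
    walkers = X_min[idx].copy()

    dt = 1.0 / n_steps
    for _ in range(n_steps):
        # Wiener increment: W(t + dt) - W(t) ~ N(0, dt), independent per attribute.
        walkers += sigma * np.sqrt(dt) * rng.standard_normal(walkers.shape)

    # Walkers are not clipped, so they may leave the original attribute ranges.
    return walkers

# Usage: generate 10 synthetic points from 4 minority samples.
X_minority = np.array([[0.0, 0.0], [1.0, 2.0], [2.0, 1.0], [3.0, 3.0]])
synthetic = wiener_oversample(X_minority, n_new=10)
```

Because the increments are Gaussian and unclipped, synthetic points in this sketch can fall outside the observed attribute ranges, which loosely mirrors the range expansion the abstract describes. Setting `step_scale=0` reduces the sketch to plain replication.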

Keywords

imbalanced-data learning / oversampling / ensemble learning / Wiener process / AdaBoost

Cite this article

Qian LI, Gang LI, Wenjia NIU, Yanan CAO, Liang CHANG, Jianlong TAN, Li GUO. Boosting imbalanced data learning with Wiener process oversampling. Front. Comput. Sci., 2017, 11(5): 836–851. DOI: 10.1007/s11704-016-5250-y



RIGHTS & PERMISSIONS

Higher Education Press and Springer-Verlag Berlin Heidelberg


Supplementary files

FCS-0836-15250-WJN_suppl_1
