A primal perspective for indefinite kernel SVM problem
Hui XUE, Haiming XU, Xiaohong CHEN, Yunyun WANG
A primal perspective for indefinite kernel SVM problem
Indefinite kernel support vector machine (IKSVM) has recently attracted increasing attentions in machine learning. Since IKSVM essentially is a non-convex problem, existing algorithms either change the spectrum of indefinite kernel directly but risking losing some valuable information or solve the dual form of IKSVM whereas suffering from a dual gap problem. In this paper, we propose a primal perspective for solving the problem. That is, we directly focus on the primal form of IKSVM and present a novel algorithm termed as IKSVM-DC for binary and multi-class classification. Concretely, according to the characteristics of the spectrum for the indefinite kernel matrix, IKSVM-DC decomposes the primal function into the subtraction of two convex functions as a difference of convex functions (DC) programming. To accelerate convergence rate, IKSVM-DC combines the classical DC algorithm with a line search step along the descent direction at each iteration. Furthermore, we construct a multi-class IKSVM model which can classify multiple classes in a unified form. A theoretical analysis is then presented to validate that IKSVM-DC can converge to a local minimum. Finally, we conduct experiments on both binary and multi-class datasets and the experimental results show that IKSVM-DC is superior to other state-of-the-art IKSVM algorithms.
indefinite kernel / support vector machine / multi-class classification / non-convex optimization
[1] |
Cortes C. Support-vector network. Machine Learning Journal, 1995, 20(3): 273–297
CrossRef
Google scholar
|
[2] |
Cristianini N, Shawe-Taylor J. An Introduction to Support Vector Machines and Other Kernel-based Learning Methods. Cambridge: Cambridge University Press, 2000
CrossRef
Google scholar
|
[3] |
Li Y Y, Lu R Q. Locality preserving projection on SPD matrix Lie group: algorithm and analysis. Science China Information Sciences, 2018, 61(9): 092104
CrossRef
Google scholar
|
[4] |
Ma L R, Song D D, Liao L J, Wang J. PSVM: a preference-enhanced SVM model using preference data for classification. Science China Information Sciences, 2017, 60(12): 122103
CrossRef
Google scholar
|
[5] |
Saigo H, Vert J-P, Ueda N, Akutsu T. Protein homology detection using string alignment kernels. Bioinformatics, 2004, 20(11): 1682–1689
CrossRef
Google scholar
|
[6] |
Wang C, Song Y, Li H, Zhang M, Han J. Text classification with heterogeneous information network kernels. In: Proceedings of the 30th AAAI Conference on Artificial Intelligence. 2016, 2130–2136
|
[7] |
Vapnik V. The Nature of Statistical Learning Theory. Germany: Springer Science & Business Media, 2013
|
[8] |
Suard F, Rakotomamonjy A, Bensrhair A. Kernel on bag of paths for measuring similarity of shapes. In: Proceedings of European Symposium on Artificial Neural Networks. 2007, 355–360
|
[9] |
Chen Y, Garcia E, Gupta M, Rahimi A, Cazzanti L. Similarity-based classification: concepts and algorithms. Journal of Machine Learning Research, 2009, 10(3): 747–776
|
[10] |
Chen Y, Gupta M. Fusing similarities and kernels for classification, In: Proceedings of the 12th International Conference on IEEE Information Fusion. 2009, 474–481
|
[11] |
Pkalska E, Haasdonk B. Kernel discriminant analysis for positive definite and indefinite kernels. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2009, 31(6): 1017–1032
CrossRef
Google scholar
|
[12] |
Sun H, Wu Q. Least square regression with indefinite kernels and coefficient regularization. Applied and Computational Harmonic Analysis, 2011, 30(1): 96–109
CrossRef
Google scholar
|
[13] |
Ying Y, Campbell C, Girolami M. Analysis of SVMwith indefinite kernels. In: Proceedings of the 22nd International Conference on Neural Information Processing Systems. 2009, 2205–2213
|
[14] |
Loosli G, Canu S. Non positive SVM. Technical Report, 2010
|
[15] |
Haasdonk B. Feature space interpretation of SVMs with indefinite kernels. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2005, 27(4): 482–492
CrossRef
Google scholar
|
[16] |
Pekalska E, Paclik P, Duin R P. A generalized kernel approach to dissimilarity-based classification. Journal of Machine Learning Research, 2001, 2(12): 175–211
|
[17] |
Graepel T, Herbrich R, Bollmann-Sdorra P, Obermayer K. Classification on pairwise proximity data. In: Proceedings of the 13th Conference on Neural Information Processing Systems. 1999, 438–444
|
[18] |
Roth V, Laub J, Kawanabe M, Buhmann J M. Optimal cluster preserving embedding of nonmetric proximity data. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2003, 25(12): 1540–1551
CrossRef
Google scholar
|
[19] |
Luss R, d’Aspremont A. Support vector machine classification with indefinite kernels. In: Proceedings of the 21st Conference on Neural Information Processing Systems. 2008, 953–960
|
[20] |
Chen J, Ye J. Training SVM with indefinite kernels. In: Proceedings of the 25th International Conference on Machine Learning. 2008, 136–143
CrossRef
Google scholar
|
[21] |
Chen Y, Gupta M R, Recht B. Learning kernels from indefinite similarities. In: Proceedings of the 26th International Conference on Machine Learning. 2009, 145–152
CrossRef
Google scholar
|
[22] |
Gu S, Guo Y. Learning SVM classifiers with indefinite kernels. In: Proceedings of the 26th AAAI Conference on Artificial Intelligence. 2012, 942–948
|
[23] |
Lin H T, Lin C J. A study on sigmoid kernels for SVM and the training of non-PSD kernels by SMO-type methods. Neural Computation, 2003, 3: 1–32
|
[24] |
Akoa F B. Combining DC algorithms (DCAs) and decomposition techniques for the training of nonpositive-semidefinite kernels. IEEE Transactions on Neural Networks, 2008, 19(11): 1854–1872
CrossRef
Google scholar
|
[25] |
Ong C S, Mary X, Canu S, Smola A J. Learning with non-positive kernels. In: Proceedings of the 21st International Conference on Machine Learning. 2004, 81
CrossRef
Google scholar
|
[26] |
Loosli G, Canu S, Ong C S. Learning SVM in Kre˘ın spaces. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2016, 38(6): 1204–1216
CrossRef
Google scholar
|
[27] |
Alabdulmohsin I M, Gao X, Zhang X. Support vector machines with indefinite kernels. In: Proceedings of the 6th Asian Conference on Machine Learning. 2014, 32–47
|
[28] |
Kotsiantis S B, Zaharakis I, Pintelas P. Supervised machine learning: a review of classification techniques. Emerging Artificial Intelligence Applications in Computer Engineering, 2007, 160: 3–24
CrossRef
Google scholar
|
[29] |
Friedman J. Another approach to polychotomous classification. Technical Report, Department of Statistics, Stanford University, 1996
|
[30] |
Krebel U. Pairwise classification and support vector machines. Advances in Kernel Methods: Support Vector Learning, 1999, 255–268
|
[31] |
Bottou L, Cortes C, Denker J S, Drucker H, Guyon Z, Jackel L D, Le Cun Y, Muller U A, Sackinger E, Simard P, Vapnik U. Comparison of classifier methods: a case study in handwritten digit recognition. In: Proceedings of the 12th IAPR International Conference on Pattern Recognition. 1994, 77–82
|
[32] |
Dietterich T G, Bakiri G. Solving multiclass learning problems via error-correcting output codes. Journal of Artificial Intelligence Research, 1995, 2: 263–286
CrossRef
Google scholar
|
[33] |
Allwein E L, Schapire R E, Singer Y. Reducing multiclass to binary: a unifying approach for margin classifiers. Journal of Machine Learning Research, 2000, 1(12): 113–141
|
[34] |
Platt J C, Cristianini N, Shawe-Taylor J. Large margin dags for multiclass classification. In: Proceedings of the 12th International Conference on Neural Information Processing Systems. 1999, 547–553
|
[35] |
Scholkopf B, Smola A J. Learning with Kernels: Support Vector Machines, Regularization, Optimization, and Beyond. Massackusetts: MIT Press, 2001
|
[36] |
Tao P D, An L T H. Convex analysis approach to DC programming: theory, algorithms and applications. Acta Mathematica Vietnamica, 1997, 22(1): 289–355
|
[37] |
Dinh T P, Le Thi H A. Recent advances in DC programming and DCA. In: Nguyen N T, Le-Thi H A, eds. Transactions on Computational Intelligence XIII. Springer, Berlin, Heidelbery, 2014, 1–37
CrossRef
Google scholar
|
[38] |
Piot B, Geist M, Pietquin O. Difference of convex functions programming for reinforcement learning. In: Proceedings of the 27th Conference on Neural Information Processing Systems. 2014, 2519–2527
|
[39] |
Xu H M, Xue H, Chen X H, Wang Y Y. Solving indefinite kernel support vector machine with difference of convex functions programming. In: Proceedings of the 31st AAAI Conference on Artificial Intelligence. 2017, 2782–2788
|
[40] |
Artacho F J A, Fleming R M, Vuong P T. Accelerating the DC algorithm for smooth functions. Mathematical Programming, 2018, 169(1): 95–118
CrossRef
Google scholar
|
[41] |
Nesterov Y. Introductory Lectures on Convex Optimization: A Basic Course. Germany: Springer Science & Business Media, 2013
|
[42] |
Wang Z, Xue X. Multi-class support vector machine. In: Ma Y, Guo G, eds. Support Vector Machines Applications. Springer, Cham, 2014, 23–48
CrossRef
Google scholar
|
[43] |
Vapnik V N. Statistical Learning Theory. Wiley New York: JohnWiley & Sons, Inc., 1998
|
[44] |
Weston J, Watkins C. Multi-class support vector machines. Technical Report, Department of Computer Science, Royal Holloway, University of London, 1998
|
[45] |
Crammer K, Singer Y. On the learnability and design of output codes for multi-class problems. Machine Learning, 2002, 47(2): 201–233
CrossRef
Google scholar
|
[46] |
Blake C, Merz C J. UCI repository of machine learning databases, 1998
|
[47] |
Ratsch G, Onoda T, Muller K R. Soft margins for AdaBoost. Machine Learning, 2001, 42(3): 287–320
CrossRef
Google scholar
|
[48] |
Duin R P, Pekalska E. The dissimilarity representation for pattern recognition: a tutorial. Technical Report, 2009
|
[49] |
Mosek A. The MOSEK optimization software. Google Website, 2010
|
[50] |
Wu G, Chang E Y, Zhang Z. An analysis of transformation on nonpositive semidefinite similarity matrix for kernel machines. In: Proceedings of the 22nd International Conference on Machine Learning. 2005, 1245–1256
|
[51] |
Chang C C, Lin C J. LIBSVM: a library for support vector machines. ACM Transactions on Intelligent Systems and Technology, 2011, 2(3): 27
CrossRef
Google scholar
|
[52] |
Zhang M L, Zhou Z H. Exploiting unlabeled data to enhance ensemble diversity. DataMining and Knowledge Discovery, 2013, 26(1): 98–129
CrossRef
Google scholar
|
/
〈 | 〉 |