A primal perspective for indefinite kernel SVM problem

Hui XUE; Haiming XU; Xiaohong CHEN; Yunyun WANG

doi:10.1007/s11704-018-8148-z

PDF(533 KB)

Front. Comput. Sci. ›› 2020, Vol. 14 ›› Issue (2) : 349-363. DOI: 10.1007/s11704-018-8148-z

RESEARCH ARTICLE

A primal perspective for indefinite kernel SVM problem

Hui XUE¹^,² ,
Haiming XU¹^,² ,
Xiaohong CHEN³ ,
Yunyun WANG⁴

Author information +

History +

Abstract

Indefinite kernel support vector machine (IKSVM) has recently attracted increasing attentions in machine learning. Since IKSVM essentially is a non-convex problem, existing algorithms either change the spectrum of indefinite kernel directly but risking losing some valuable information or solve the dual form of IKSVM whereas suffering from a dual gap problem. In this paper, we propose a primal perspective for solving the problem. That is, we directly focus on the primal form of IKSVM and present a novel algorithm termed as IKSVM-DC for binary and multi-class classification. Concretely, according to the characteristics of the spectrum for the indefinite kernel matrix, IKSVM-DC decomposes the primal function into the subtraction of two convex functions as a difference of convex functions (DC) programming. To accelerate convergence rate, IKSVM-DC combines the classical DC algorithm with a line search step along the descent direction at each iteration. Furthermore, we construct a multi-class IKSVM model which can classify multiple classes in a unified form. A theoretical analysis is then presented to validate that IKSVM-DC can converge to a local minimum. Finally, we conduct experiments on both binary and multi-class datasets and the experimental results show that IKSVM-DC is superior to other state-of-the-art IKSVM algorithms.

Keywords

indefinite kernel / support vector machine / multi-class classification / non-convex optimization

Cite this article

EndNote

Ris (Procite)

Bibtex

Download citation ▾

Hui XUE, Haiming XU, Xiaohong CHEN, Yunyun WANG. A primal perspective for indefinite kernel SVM problem. Front. Comput. Sci., 2020, 14(2): 349‒363 https://doi.org/10.1007/s11704-018-8148-z

References

Publishing order | Descend order by publishing year | Descend order by cited within

[1]	Cortes C. Support-vector network. Machine Learning Journal, 1995, 20(3): 273–297 CrossRef Google scholar

[2]	Cristianini N, Shawe-Taylor J. An Introduction to Support Vector Machines and Other Kernel-based Learning Methods. Cambridge: Cambridge University Press, 2000 CrossRef Google scholar

[3]	Li Y Y, Lu R Q. Locality preserving projection on SPD matrix Lie group: algorithm and analysis. Science China Information Sciences, 2018, 61(9): 092104 CrossRef Google scholar

[4]	Ma L R, Song D D, Liao L J, Wang J. PSVM: a preference-enhanced SVM model using preference data for classification. Science China Information Sciences, 2017, 60(12): 122103 CrossRef Google scholar

[5]	Saigo H, Vert J-P, Ueda N, Akutsu T. Protein homology detection using string alignment kernels. Bioinformatics, 2004, 20(11): 1682–1689 CrossRef Google scholar

[6]	Wang C, Song Y, Li H, Zhang M, Han J. Text classification with heterogeneous information network kernels. In: Proceedings of the 30th AAAI Conference on Artificial Intelligence. 2016, 2130–2136

[7]	Vapnik V. The Nature of Statistical Learning Theory. Germany: Springer Science & Business Media, 2013

[8]	Suard F, Rakotomamonjy A, Bensrhair A. Kernel on bag of paths for measuring similarity of shapes. In: Proceedings of European Symposium on Artificial Neural Networks. 2007, 355–360

[9]	Chen Y, Garcia E, Gupta M, Rahimi A, Cazzanti L. Similarity-based classification: concepts and algorithms. Journal of Machine Learning Research, 2009, 10(3): 747–776

[10]	Chen Y, Gupta M. Fusing similarities and kernels for classification, In: Proceedings of the 12th International Conference on IEEE Information Fusion. 2009, 474–481

[11]	Pkalska E, Haasdonk B. Kernel discriminant analysis for positive definite and indefinite kernels. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2009, 31(6): 1017–1032 CrossRef Google scholar

[12]	Sun H, Wu Q. Least square regression with indefinite kernels and coefficient regularization. Applied and Computational Harmonic Analysis, 2011, 30(1): 96–109 CrossRef Google scholar

[13]	Ying Y, Campbell C, Girolami M. Analysis of SVMwith indefinite kernels. In: Proceedings of the 22nd International Conference on Neural Information Processing Systems. 2009, 2205–2213

[14]	Loosli G, Canu S. Non positive SVM. Technical Report, 2010

[15]	Haasdonk B. Feature space interpretation of SVMs with indefinite kernels. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2005, 27(4): 482–492 CrossRef Google scholar

[16]	Pekalska E, Paclik P, Duin R P. A generalized kernel approach to dissimilarity-based classification. Journal of Machine Learning Research, 2001, 2(12): 175–211

[17]	Graepel T, Herbrich R, Bollmann-Sdorra P, Obermayer K. Classification on pairwise proximity data. In: Proceedings of the 13th Conference on Neural Information Processing Systems. 1999, 438–444

[18]	Roth V, Laub J, Kawanabe M, Buhmann J M. Optimal cluster preserving embedding of nonmetric proximity data. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2003, 25(12): 1540–1551 CrossRef Google scholar

[19]	Luss R, d’Aspremont A. Support vector machine classification with indefinite kernels. In: Proceedings of the 21st Conference on Neural Information Processing Systems. 2008, 953–960

[20]	Chen J, Ye J. Training SVM with indefinite kernels. In: Proceedings of the 25th International Conference on Machine Learning. 2008, 136–143 CrossRef Google scholar

[21]	Chen Y, Gupta M R, Recht B. Learning kernels from indefinite similarities. In: Proceedings of the 26th International Conference on Machine Learning. 2009, 145–152 CrossRef Google scholar

[22]	Gu S, Guo Y. Learning SVM classifiers with indefinite kernels. In: Proceedings of the 26th AAAI Conference on Artificial Intelligence. 2012, 942–948

[23]	Lin H T, Lin C J. A study on sigmoid kernels for SVM and the training of non-PSD kernels by SMO-type methods. Neural Computation, 2003, 3: 1–32

[24]	Akoa F B. Combining DC algorithms (DCAs) and decomposition techniques for the training of nonpositive-semidefinite kernels. IEEE Transactions on Neural Networks, 2008, 19(11): 1854–1872 CrossRef Google scholar

[25]	Ong C S, Mary X, Canu S, Smola A J. Learning with non-positive kernels. In: Proceedings of the 21st International Conference on Machine Learning. 2004, 81 CrossRef Google scholar

[26]	Loosli G, Canu S, Ong C S. Learning SVM in Kre˘ın spaces. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2016, 38(6): 1204–1216 CrossRef Google scholar

[27]	Alabdulmohsin I M, Gao X, Zhang X. Support vector machines with indefinite kernels. In: Proceedings of the 6th Asian Conference on Machine Learning. 2014, 32–47

[28]	Kotsiantis S B, Zaharakis I, Pintelas P. Supervised machine learning: a review of classification techniques. Emerging Artificial Intelligence Applications in Computer Engineering, 2007, 160: 3–24 CrossRef Google scholar

[29]	Friedman J. Another approach to polychotomous classification. Technical Report, Department of Statistics, Stanford University, 1996

[30]	Krebel U. Pairwise classification and support vector machines. Advances in Kernel Methods: Support Vector Learning, 1999, 255–268

[31]	Bottou L, Cortes C, Denker J S, Drucker H, Guyon Z, Jackel L D, Le Cun Y, Muller U A, Sackinger E, Simard P, Vapnik U. Comparison of classifier methods: a case study in handwritten digit recognition. In: Proceedings of the 12th IAPR International Conference on Pattern Recognition. 1994, 77–82

[32]	Dietterich T G, Bakiri G. Solving multiclass learning problems via error-correcting output codes. Journal of Artificial Intelligence Research, 1995, 2: 263–286 CrossRef Google scholar

[33]	Allwein E L, Schapire R E, Singer Y. Reducing multiclass to binary: a unifying approach for margin classifiers. Journal of Machine Learning Research, 2000, 1(12): 113–141

[34]	Platt J C, Cristianini N, Shawe-Taylor J. Large margin dags for multiclass classification. In: Proceedings of the 12th International Conference on Neural Information Processing Systems. 1999, 547–553

[35]	Scholkopf B, Smola A J. Learning with Kernels: Support Vector Machines, Regularization, Optimization, and Beyond. Massackusetts: MIT Press, 2001

[36]	Tao P D, An L T H. Convex analysis approach to DC programming: theory, algorithms and applications. Acta Mathematica Vietnamica, 1997, 22(1): 289–355

[37]	Dinh T P, Le Thi H A. Recent advances in DC programming and DCA. In: Nguyen N T, Le-Thi H A, eds. Transactions on Computational Intelligence XIII. Springer, Berlin, Heidelbery, 2014, 1–37 CrossRef Google scholar

[38]	Piot B, Geist M, Pietquin O. Difference of convex functions programming for reinforcement learning. In: Proceedings of the 27th Conference on Neural Information Processing Systems. 2014, 2519–2527

[39]	Xu H M, Xue H, Chen X H, Wang Y Y. Solving indefinite kernel support vector machine with difference of convex functions programming. In: Proceedings of the 31st AAAI Conference on Artificial Intelligence. 2017, 2782–2788

[40]	Artacho F J A, Fleming R M, Vuong P T. Accelerating the DC algorithm for smooth functions. Mathematical Programming, 2018, 169(1): 95–118 CrossRef Google scholar

[41]	Nesterov Y. Introductory Lectures on Convex Optimization: A Basic Course. Germany: Springer Science & Business Media, 2013

[42]	Wang Z, Xue X. Multi-class support vector machine. In: Ma Y, Guo G, eds. Support Vector Machines Applications. Springer, Cham, 2014, 23–48 CrossRef Google scholar

[43]	Vapnik V N. Statistical Learning Theory. Wiley New York: JohnWiley & Sons, Inc., 1998

[44]	Weston J, Watkins C. Multi-class support vector machines. Technical Report, Department of Computer Science, Royal Holloway, University of London, 1998

[45]	Crammer K, Singer Y. On the learnability and design of output codes for multi-class problems. Machine Learning, 2002, 47(2): 201–233 CrossRef Google scholar

[46]	Blake C, Merz C J. UCI repository of machine learning databases, 1998

[47]	Ratsch G, Onoda T, Muller K R. Soft margins for AdaBoost. Machine Learning, 2001, 42(3): 287–320 CrossRef Google scholar

[48]	Duin R P, Pekalska E. The dissimilarity representation for pattern recognition: a tutorial. Technical Report, 2009

[49]	Mosek A. The MOSEK optimization software. Google Website, 2010

[50]	Wu G, Chang E Y, Zhang Z. An analysis of transformation on nonpositive semidefinite similarity matrix for kernel machines. In: Proceedings of the 22nd International Conference on Machine Learning. 2005, 1245–1256

[51]	Chang C C, Lin C J. LIBSVM: a library for support vector machines. ACM Transactions on Intelligent Systems and Technology, 2011, 2(3): 27 CrossRef Google scholar

[52]	Zhang M L, Zhou Z H. Exploiting unlabeled data to enhance ensemble diversity. DataMining and Knowledge Discovery, 2013, 26(1): 98–129 CrossRef Google scholar

RIGHTS & PERMISSIONS

2019 Higher Education Press and Springer-Verlag GmbH Germany, part of Springer Nature

AI Summary AI Mindmap

PDF(533 KB)

Accesses

Citations

Detail

Sections

Recommended

Received	Accepted	Published
18 Apr 2018	21 Nov 2018	15 Apr 2020
Just Accepted Date	Online First Date	Issue Date
30 Nov 2018	17 Sep 2019	16 Oct 2019

About the journal

Aims & scope

Description

Editorial board

Abstracting / Indexing

Contact us

Browse

Just accepted

Online first

Latest issue

All volumes and issues

Collections

Featured articles

Most accessed

Most cited

Collections

Multimedia collections

Authors & reviewers

Online submisson

Call for papers

Guidelines for authors

Download templates