Analysis of loss functions in support vector machines

Huajun WANG, Naihua XIU

Front. Math. China 2023, Vol. 18, Issue (6): 381–414. DOI: 10.3868/s140-DDD-023-0027-x

SURVEY ARTICLE

Abstract

Support vector machines (SVMs) are an important class of machine learning methods that grew out of the interplay between statistical learning theory and optimization, and they have been widely applied to text categorization, disease diagnosis, face detection, and related tasks. The loss function is the core research topic in SVMs, and its variational properties play an important role in analyzing optimality conditions, designing optimization algorithms, representing support vectors, and studying dual problems. This paper surveys and analyzes the 0-1 loss function and eighteen popular surrogate loss functions used in SVMs, and presents three variational properties of these loss functions: the subdifferential, the proximal operator, and the Fenchel conjugate, of which nine proximal operators and fifteen Fenchel conjugates are derived in this paper.
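The variational tools named in the abstract admit simple closed forms for the classical hinge loss. As an illustrative sketch (not taken from the paper), the following Python snippet evaluates the 0-1 loss on the margin t = y·f(x), its hinge surrogate max(0, 1 − t), the closed-form proximal operator of the scaled hinge loss λ·max(0, 1 − t), and the hinge loss's Fenchel conjugate:

```python
import numpy as np

def zero_one_loss(t):
    """0-1 loss: 1 if the margin t = y * f(x) is nonpositive, else 0."""
    t = np.asarray(t, dtype=float)
    return (t <= 0).astype(float)

def hinge_loss(t):
    """Hinge loss l(t) = max(0, 1 - t), a convex surrogate of the 0-1 loss."""
    t = np.asarray(t, dtype=float)
    return np.maximum(0.0, 1.0 - t)

def prox_hinge(x, lam):
    """Proximal operator of lam * hinge:
    argmin_u  lam * max(0, 1 - u) + 0.5 * (u - x)^2.
    Closed form: x + lam if x < 1 - lam;  1 if 1 - lam <= x <= 1;  x if x > 1.
    """
    x = np.asarray(x, dtype=float)
    return np.where(x < 1.0 - lam, x + lam, np.where(x <= 1.0, 1.0, x))

def hinge_conjugate(s):
    """Fenchel conjugate of the hinge loss: l*(s) = s on [-1, 0], +inf otherwise."""
    s = np.asarray(s, dtype=float)
    return np.where((s >= -1.0) & (s <= 0.0), s, np.inf)
```

For example, with lam = 0.5, `prox_hinge(0.0, 0.5)` moves the point up to 0.5, `prox_hinge(0.8, 0.5)` snaps it to the kink at 1, and `prox_hinge(2.0, 0.5)` leaves it unchanged, matching the three branches of the closed form.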

Keywords

Support vector machines / loss function / subdifferential / proximal operator / Fenchel conjugate

Cite this article

Huajun WANG, Naihua XIU. Analysis of loss functions in support vector machines. Front. Math. China, 2023, 18(6): 381–414. DOI: 10.3868/s140-DDD-023-0027-x



RIGHTS & PERMISSIONS

Higher Education Press 2023
