Learning rates for multi-kernel linear programming classifiers

Feilong CAO, Xing XING

Front. Math. China ›› 2011, Vol. 6 ›› Issue (2) : 203-219. DOI: 10.1007/s11464-011-0103-3
RESEARCH ARTICLE

Abstract

In this paper, we consider the learning rates of multi-kernel linear programming classifiers. Our analysis shows that the convergence behavior of multi-kernel linear programming classifiers is almost the same as that of multi-kernel quadratic programming classifiers. This is achieved by constructing a stepping stone between the linear programming and quadratic programming schemes. Upper bounds on the learning rate are presented both for general probability distributions and for distributions satisfying a Tsybakov noise condition.
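For orientation, the following is a brief sketch of the two regularization schemes being compared, written in the standard notation of soft margin classification; the paper's own multi-kernel hypothesis space and normalization may differ in detail. Given a sample z = {(x_i, y_i)}_{i=1}^m ⊂ X × {-1, 1}, a Mercer kernel K with reproducing kernel Hilbert space (H_K, ||·||_K), and the hinge loss φ(t) = max{1 - t, 0}, the quadratic programming (2-norm) scheme solves

    f_z = arg min_{f ∈ H_K} { (1/m) Σ_{i=1}^m φ(y_i f(x_i)) + λ ||f||_K^2 },

whereas the linear programming (1-norm) scheme searches over kernel expansions f_c(x) = Σ_{i=1}^m c_i K(x, x_i), c ∈ R^m, and penalizes the 1-norm of the coefficient vector:

    c_z = arg min_{c ∈ R^m} { (1/m) Σ_{i=1}^m φ(y_i f_c(x_i)) + λ Σ_{i=1}^m |c_i| }.

Both problems are convex, and the second becomes a linear program once slack variables are introduced for the hinge loss and the absolute values. The Tsybakov noise condition referred to in the abstract is, in its usual form, the requirement that for some exponent q ∈ [0, ∞) and constant c_q > 0,

    P_X({x ∈ X : |f_ρ(x)| ≤ t}) ≤ c_q t^q   for all t > 0,

where f_ρ(x) = P(y = 1 | x) - P(y = -1 | x) is the regression function of the underlying distribution; a larger q means less probability mass near the decision boundary and typically yields faster learning rates.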

Keywords

Multi-kernel / linear programming / learning rate / classification

Cite this article

Feilong CAO, Xing XING. Learning rates for multi-kernel linear programming classifiers. Front. Math. China, 2011, 6(2): 203-219. https://doi.org/10.1007/s11464-011-0103-3


RIGHTS & PERMISSIONS

© 2014 Higher Education Press and Springer-Verlag Berlin Heidelberg