Learning rates for multi-kernel linear programming classifiers
Feilong CAO, Xing XING
In this paper, we consider the learning rates of multi-kernel linear programming classifiers. Our analysis shows that the convergence behavior of multi-kernel linear programming classifiers is almost the same as that of multi-kernel quadratic programming classifiers. This is achieved by establishing a stepping stone between the linear programming and the quadratic programming schemes. Upper bounds are presented both for general probability distributions and for distributions satisfying a Tsybakov noise condition.
Multi-kernel / linear programming / learning rate / classification