Learning rates for multi-kernel linear programming classifiers
Feilong CAO, Xing XING
In this paper, we consider the learning rates of multi-kernel linear programming classifiers. Our analysis shows that the convergence behavior of multi-kernel linear programming classifiers is almost the same as that of multi-kernel quadratic programming classifiers. This is achieved by establishing a stepping stone between the linear programming and the quadratic programming schemes. Upper bounds are presented both for general probability distributions and for distributions satisfying a Tsybakov noise condition.
Multi-kernel / linear programming / learning rate / classification