RESEARCH ARTICLE

Learning rates for multi-kernel linear programming classifiers

  • Feilong CAO,
  • Xing XING
  • Department of Mathematics, China Jiliang University, Hangzhou 310018, China

Received date: 10 Sep 2010

Accepted date: 08 Jan 2011

Published date: 01 Apr 2011

Copyright

2014 Higher Education Press and Springer-Verlag Berlin Heidelberg

Abstract

In this paper, we consider the learning rates of multi-kernel linear programming classifiers. Our analysis shows that the convergence behavior of multi-kernel linear programming classifiers is almost the same as that of multi-kernel quadratic programming classifiers. This is achieved by setting up a stepping stone between the linear programming and the quadratic programming schemes. Upper bounds are presented both for general probability distributions and for distributions satisfying a Tsybakov noise condition.
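
To make the comparison in the abstract concrete, the following display sketches the standard single-kernel forms of the two regularized schemes and the noise condition; it is an illustrative assumption based on the usual setup of Wu and Zhou [16] and Tsybakov [12], not the paper's exact multi-kernel formulation. For a sample $z = \{(x_i, y_i)\}_{i=1}^m$ and a kernel $K$ with reproducing kernel Hilbert space $\mathcal{H}_K$, the quadratic programming (QP) classifier regularizes by the RKHS norm, while the linear programming (LP) classifier regularizes by the $\ell^1$ norm of the coefficients:

\[
f_z^{\mathrm{QP}} = \arg\min_{f \in \mathcal{H}_K} \ \frac{1}{m}\sum_{i=1}^{m} \bigl(1 - y_i f(x_i)\bigr)_+ + \lambda \|f\|_K^2,
\]
\[
f_z^{\mathrm{LP}} = \arg\min_{f = \sum_i \alpha_i K(\cdot, x_i)} \ \frac{1}{m}\sum_{i=1}^{m} \bigl(1 - y_i f(x_i)\bigr)_+ + \lambda \sum_{i=1}^{m} |\alpha_i|.
\]

The Tsybakov noise condition with exponent $q \in [0, \infty]$ asks that the regression function $f_\rho(x) = P(y = 1 \mid x) - P(y = -1 \mid x)$ satisfy

\[
\rho_X\bigl(\{x \in X : |f_\rho(x)| \le t\}\bigr) \le c_q\, t^q \qquad \text{for all } t > 0,
\]

so that a larger $q$ means less probability mass near the decision boundary and, typically, a faster learning rate.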

Cite this article

Feilong CAO, Xing XING. Learning rates for multi-kernel linear programming classifiers[J]. Frontiers of Mathematics in China, 2011, 6(2): 203-219. DOI: 10.1007/s11464-011-0103-3

1
Anthony M, Bartlett P L. Neural Network Learning: Theoretical Foundations. Cambridge: Cambridge University Press, 1999

2
Aronszajn N. Theory of reproducing kernels. Trans Amer Math Soc, 1950, 68: 337-404

3
Bartlett P L. The sample complexity of pattern classification with neural networks: the size of the weights is more important than the size of the network. IEEE Trans Inform Theory, 1998, 44: 525-536

4
Bartlett P L, Jordan M I, McAuliffe J D. Convexity, classification, and risk bounds. J Amer Statist Assoc, 2006, 101: 138-156

5
Chen D R, Wu Q, Ying Y, Zhou D X. Support vector machine soft margin classifier: error analysis. J Machine Learning Research, 2004, 5: 1143-1175

6
Cucker F, Smale S. On the mathematical foundations of learning. Bull Amer Math Soc, 2001, 39: 1-49

7
Devroye L, Györfi L, Lugosi G. A Probabilistic Theory of Pattern Recognition. Berlin: Springer-Verlag, 1997

8
Lugosi G, Vayatis N. On the Bayes-risk consistency of regularized boosting methods. Ann Statist, 2004, 32: 30-55

9
Pollard D. Convergence of Stochastic Processes. Berlin: Springer-Verlag, 1984

10
Scovel C, Steinwart I. Fast Rates for Support Vector Machines. Berlin/Heidelberg: Springer-Verlag, 2005

11
Smale S, Zhou D X. Shannon sampling and function reconstruction from point values. Bull Amer Math Soc, 2004, 41: 279-305

12
Tsybakov A B. Optimal aggregation of classifiers in statistical learning. Ann Statist, 2004, 32: 135-166

13
Vapnik V. Statistical Learning Theory. New York: John Wiley & Sons, 1998

14
Wahba G. Spline Models for Observational Data. Philadelphia: SIAM, 1990

15
Wu Q, Ying Y, Zhou D X. Multi-kernel regularized classifiers. J Complexity, 2007, 23: 108-134

16
Wu Q, Zhou D X. SVM soft margin classifiers: linear programming versus quadratic programming. Neural Computation, 2005, 17: 1160-1187

17
Zhang T. Covering number bounds of certain regularized linear function classes. J Machine Learning Research, 2002, 2: 527-550

18
Zhang T. Statistical behavior and consistency of classification methods based on convex risk minimization. Ann Statist, 2004, 32: 56-85
