Max-margin based Bayesian classifier

Tao-cheng HU; Jin-hui YU

doi:10.1631/FITEE.1601078


Front. Inform. Technol. Electron. Eng., 2016, Vol. 17, Issue 10, pp. 973–981.


Abstract

There is a tradeoff between generalization capability and computational overhead in multi-class learning. We propose a generative probabilistic multi-class classifier that balances generalization capability against learning and prediction speed. We show that the classifier has a max-margin property, so its performance on future unseen data can come close to its performance on the training data. In addition, local variables are eliminated, which greatly simplifies the optimization problem. Using convex and probabilistic analysis, we develop an efficient online learning algorithm. Unlike the classical setting, the algorithm aggregates rather than averages the dualities. Empirical results indicate that our method has good generalization capability and coverage rate.
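To make "max-margin" and "online learning" concrete in the multi-class setting, here is a minimal sketch of a standard discriminative technique. It is not the authors' generative Bayesian classifier, and it does not implement their duality aggregation. It is an online subgradient learner for the Crammer–Singer multi-class hinge loss with Pegasos-style step sizes 1/(λt). The function names (`multiclass_hinge_loss`, `online_max_margin`, `predict`) and the parameters `lam`, `epochs`, and `seed` are hypothetical and chosen only for illustration. The sketch assumes NumPy, dense feature vectors, and no bias term.

```python
import numpy as np


def multiclass_hinge_loss(W, x, y):
    """Crammer-Singer hinge loss: max(0, 1 + max_{r != y} w_r.x - w_y.x).

    Illustrative helper (hypothetical name), not taken from the paper.
    """
    scores = W @ x
    margins = scores - scores[y] + 1.0
    margins[y] = 0.0  # the true class contributes no margin violation
    return max(0.0, margins.max())


def online_max_margin(X, Y, n_classes, lam=0.01, epochs=5, seed=0):
    """Online subgradient descent on lam/2 * ||W||^2 + multi-class hinge loss.

    Hypothetical sketch using the Pegasos step size eta_t = 1 / (lam * t);
    returns the last iterate W of shape (n_classes, n_features).
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    W = np.zeros((n_classes, d))
    t = 0
    for _ in range(epochs):
        for i in rng.permutation(n):  # one example at a time (online)
            t += 1
            eta = 1.0 / (lam * t)
            x, y = X[i], Y[i]
            scores = W @ x
            margins = scores - scores[y] + 1.0
            margins[y] = -np.inf  # exclude the true class
            r = int(np.argmax(margins))  # most violating competitor
            W *= 1.0 - eta * lam  # gradient step on the regularizer
            if margins[r] > 0:  # margin of 1 violated: hinge subgradient
                W[y] += eta * x
                W[r] -= eta * x
    return W


def predict(W, X):
    """Assign each row of X to the class with the highest linear score."""
    return np.argmax(X @ W.T, axis=1)
```

Because the regularized objective is strongly convex, a step size of 1/(λt) is the usual choice for this kind of online subgradient method. The sketch returns the last iterate rather than an averaged one.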

Keywords

Multi-class learning / Max-margin learning / Online algorithm

Cite this article


Tao-cheng HU, Jin-hui YU. Max-margin based Bayesian classifier. Front. Inform. Technol. Electron. Eng., 2016, 17(10): 973–981. https://doi.org/10.1631/FITEE.1601078
