Extreme vocabulary learning
Hanze DONG, Zhenfeng SUN, Yanwei FU, Shi ZHONG, Zhengjun ZHANG, Yu-Gang JIANG
From the perspective of extreme value theory, the unseen novel classes in open-set recognition can be viewed as extreme values of the training classes. Following this idea, we introduce margin and coverage distributions to model the training classes. We propose a novel visual-semantic embedding framework, extreme vocabulary learning (EVoL), which embeds visual features into the semantic space in a probabilistic way. Notably, we use the vast open vocabulary in the semantic space to further constrain the margin and coverage of the training classes. The learned embedding can be used directly to address supervised learning, zero-shot learning, and open-set recognition simultaneously. Experiments on two benchmark datasets demonstrate the effectiveness of the proposed framework compared with conventional approaches.
Keywords: vocabulary-informed learning, zero-shot learning, extreme value theory
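As a rough illustration of the idea summarized in the abstract, the sketch below scores an embedded sample against class prototypes in a semantic space and rejects it as unknown when no training class claims it with enough probability. It is not the EVoL formulation: the Weibull-shaped inclusion function, the rule that sets each class scale to half the distance to the nearest vocabulary prototype, the fixed shape and threshold values, and the toy two-dimensional prototypes are all assumptions made for this example. The names `margin_scales`, `inclusion_probability`, and `classify` are hypothetical.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def margin_scales(prototypes):
    """Assumed margin rule: half the distance from each prototype to its
    nearest neighbour among ALL vocabulary prototypes, so that words
    without training images still tighten nearby training classes."""
    scales = {}
    for name, vec in prototypes.items():
        nearest = min(
            euclidean(vec, other_vec)
            for other, other_vec in prototypes.items()
            if other != name
        )
        scales[name] = nearest / 2.0
    return scales


def inclusion_probability(distance, scale, shape=2.0):
    """Weibull-shaped survival function (an assumption, not the paper's
    fitted distribution): 1 at the prototype, decaying with distance."""
    return math.exp(-((distance / scale) ** shape))


def classify(embedded, prototypes, train_classes, scales, threshold=0.5):
    """Return (label, probability); label is "unknown" when no training
    class includes the sample with probability >= threshold."""
    best_label, best_prob = None, 0.0
    for name in train_classes:
        dist = euclidean(embedded, prototypes[name])
        prob = inclusion_probability(dist, scales[name])
        if prob > best_prob:
            best_label, best_prob = name, prob
    if best_prob < threshold:
        return "unknown", best_prob
    return best_label, best_prob


# Toy semantic space: "cat" and "dog" have training data, while "truck"
# is an open-vocabulary word with no training images.
prototypes = {"cat": (0.0, 0.0), "dog": (2.0, 0.0), "truck": (0.0, 4.0)}
train_classes = ["cat", "dog"]
scales = margin_scales(prototypes)

near_cat = classify((0.1, 0.0), prototypes, train_classes, scales)
far_away = classify((0.0, 3.0), prototypes, train_classes, scales)
print(near_cat[0], far_away[0])
```

In this toy setup a sample embedded close to the "cat" prototype is assigned to "cat", while a sample far from both training prototypes falls below the threshold and is rejected as unknown. A fuller treatment would estimate the distribution parameters from data rather than fixing them by hand.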