Extreme vocabulary learning

Hanze DONG, Zhenfeng SUN, Yanwei FU, Shi ZHONG, Zhengjun ZHANG, Yu-Gang JIANG

Front. Comput. Sci., 2020, Vol. 14, Issue 6: 146315. DOI: 10.1007/s11704-019-8249-3
RESEARCH ARTICLE


Abstract

From the perspective of extreme value theory, the unseen novel classes in open-set recognition can be regarded as extreme values of the training classes. Following this idea, we introduce margin and coverage distributions to model the training classes. We propose a novel visual-semantic embedding framework, extreme vocabulary learning (EVoL), which embeds visual features into the semantic space in a probabilistic way. Notably, we exploit the vast open vocabulary in the semantic space to further constrain the margin and coverage of the training classes. The learned embedding can be used directly to address supervised learning, zero-shot learning, and open-set recognition simultaneously. Experiments on two benchmark datasets demonstrate the effectiveness of the proposed framework over conventional methods.
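The abstract only names the core idea: model how far each training class extends (its margin) with extreme value theory, and treat queries that fall outside every class as novel. The sketch below illustrates that idea with a swapped-in, well-known formulation, the Extreme Value Machine's per-point Weibull margin model; it is not EVoL's own margin and coverage distributions, which this page does not specify. Assumptions, all hypothetical: visual features are already embedded into a toy 2-D semantic space; the helper names `fit_weibull`, `fit_margins`, `inclusion`, `classify`, the parameters `tau = 3` and `delta = 0.5`, and the toy vocabulary are illustrative only.

```python
import math
from typing import Dict, List, Sequence, Tuple

Vec = Sequence[float]
# Hypothetical per-point model: (class label, training point, Weibull shape, Weibull scale).
Model = Tuple[str, Vec, float, float]


def dist(a: Vec, b: Vec) -> float:
    """Euclidean distance in the (assumed) semantic embedding space."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def fit_weibull(samples: List[float], lo: float = 1e-3, hi: float = 100.0) -> Tuple[float, float]:
    """Two-parameter Weibull MLE with location fixed at 0; returns (shape, scale).

    The shape k solves sum(x^k ln x) / sum(x^k) - 1/k - mean(ln x) = 0, whose
    left side is increasing in k, so bisection on [lo, hi] finds the root.
    """
    m = max(samples)
    xs = [s / m for s in samples]  # rescale into (0, 1] so x ** k cannot overflow
    logs = [math.log(x) for x in xs]
    mean_log = sum(logs) / len(logs)

    def g(k: float) -> float:
        w = [x ** k for x in xs]
        return sum(wi * li for wi, li in zip(w, logs)) / sum(w) - 1.0 / k - mean_log

    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if g(mid) > 0:
            hi = mid
        else:
            lo = mid
    k = 0.5 * (lo + hi)
    scale = m * (sum(x ** k for x in xs) / len(xs)) ** (1.0 / k)
    return k, scale


def fit_margins(train: Dict[str, List[Vec]], tau: int = 3) -> List[Model]:
    """Fit a Weibull to the tau smallest half-distances from each point to other classes."""
    models: List[Model] = []
    for label, points in train.items():
        negatives = [q for other, qs in train.items() if other != label for q in qs]
        for p in points:
            margins = sorted(dist(p, q) / 2.0 for q in negatives)[:tau]
            k, scale = fit_weibull(margins)
            models.append((label, p, k, scale))
    return models


def inclusion(x: Vec, model: Model) -> float:
    """Probability that x lies inside the point's margin: exp(-(d / scale) ** k)."""
    _, p, k, scale = model
    d = dist(x, p)
    if d == 0.0:
        return 1.0
    z = k * math.log(d / scale)  # log of (d / scale) ** k, computed safely
    return 0.0 if z > 700 else math.exp(-math.exp(z))


def classify(x: Vec, models: List[Model], vocab: Dict[str, Vec], delta: float = 0.5) -> Tuple[str, bool]:
    """Return (label, known): a seen class if its best inclusion score reaches delta,
    otherwise reject x as unknown and name it with the nearest unseen vocabulary word."""
    best: Dict[str, float] = {}
    for m in models:
        best[m[0]] = max(best.get(m[0], 0.0), inclusion(x, m))
    label, score = max(best.items(), key=lambda kv: kv[1])
    if score >= delta:
        return label, True
    unseen = {w: v for w, v in vocab.items() if w not in best}
    return min(unseen, key=lambda w: dist(x, unseen[w])), False


# Toy usage: two seen classes, plus two vocabulary words never seen in training.
train = {
    "cat": [(0.0, 0.0), (0.5, 0.0), (0.0, 0.5), (0.5, 0.5)],
    "dog": [(4.0, 0.0), (4.5, 0.0), (4.0, 0.5), (4.5, 0.5)],
}
vocab = {"cat": (0.25, 0.25), "dog": (4.25, 0.25), "zebra": (2.0, 9.0), "car": (9.0, 9.0)}
models = fit_margins(train)
print(classify((0.1, 0.1), models, vocab))   # -> ('cat', True)
print(classify((2.0, 10.0), models, vocab))  # -> ('zebra', False)
```

Fitting only the smallest half-distances to other classes is what makes this an extreme-value model: the tail of the margin distribution near the class boundary alone determines how quickly inclusion probability decays. Naming a rejected query by its nearest unseen word loosely mirrors how an open vocabulary can give novel classes a label; EVoL's actual use of the vocabulary to constrain margin and coverage is defined in the full paper.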

Keywords

vocabulary-informed learning / zero-shot learning / extreme value theory

Cite this article

Hanze DONG, Zhenfeng SUN, Yanwei FU, Shi ZHONG, Zhengjun ZHANG, Yu-Gang JIANG. Extreme vocabulary learning. Front. Comput. Sci., 2020, 14(6): 146315 https://doi.org/10.1007/s11704-019-8249-3

RIGHTS & PERMISSIONS

2019 Higher Education Press and Springer-Verlag GmbH Germany, part of Springer Nature