Extreme vocabulary learning

Hanze DONG, Zhenfeng SUN, Yanwei FU, Shi ZHONG, Zhengjun ZHANG, Yu-Gang JIANG

Front. Comput. Sci. ›› 2020, Vol. 14 ›› Issue (6): 146315 DOI: 10.1007/s11704-019-8249-3

RESEARCH ARTICLE

Abstract

From the perspective of extreme value theory, the unseen novel classes in open-set recognition can be viewed as extreme values of the training classes. Following this idea, we introduce margin and coverage distributions to model the training classes. We propose a novel visual-semantic embedding framework, extreme vocabulary learning (EVoL), which embeds visual features into the semantic space in a probabilistic way. Notably, we exploit the vast open vocabulary in the semantic space to further constrain the margin and coverage of the training classes. The learned embedding can be used directly to solve supervised learning, zero-shot learning, and open-set recognition simultaneously. Experiments on two benchmark datasets demonstrate the effectiveness of the proposed framework over conventional methods.
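
To make the embedding idea concrete, below is a minimal sketch, assuming a PyTorch setup, of a vocabulary-informed visual-semantic embedding in the spirit of the abstract. It is not the paper's exact EVoL model: the linear map, cosine similarity, margin value, and feature dimensions are all illustrative assumptions.

```python
# Hypothetical sketch (not the authors' implementation) of a
# vocabulary-informed visual-semantic embedding. Visual features are mapped
# into a word-vector space; a hinge loss keeps each image closer to its own
# class word vector than to words drawn from a large open vocabulary,
# by at least a margin.
import torch
import torch.nn.functional as F

DIM_VISUAL, DIM_SEMANTIC, MARGIN = 2048, 300, 0.1  # illustrative sizes

# Linear map g: visual feature space -> semantic (word-vector) space.
g = torch.nn.Linear(DIM_VISUAL, DIM_SEMANTIC)

def margin_loss(x, y_true, vocab):
    """x: (B, DIM_VISUAL) visual features.
    y_true: (B, DIM_SEMANTIC) word vectors of the ground-truth classes.
    vocab: (V, DIM_SEMANTIC) open-vocabulary word vectors, assumed here
           to exclude the ground-truth classes of the batch."""
    z = F.normalize(g(x), dim=1)                   # embedded images
    pos = (z * F.normalize(y_true, dim=1)).sum(1)  # cosine sim to true class
    neg = z @ F.normalize(vocab, dim=1).T          # cosine sim to vocab words
    # The true class should beat every open-vocabulary word by MARGIN.
    return F.relu(MARGIN + neg - pos.unsqueeze(1)).mean()
```

Because the learned map lands in the full word-vector space, the same embedding can score seen classes, unseen (zero-shot) classes, and candidates for open-set rejection by nearest word vector.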

Keywords

vocabulary-informed learning / zero-shot learning / extreme value theory
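
The extreme-value-theory ingredient can likewise be sketched. The following is a hedged illustration of the generic EVT recipe for open-set recognition, not the paper's exact margin/coverage formulation: fit a Weibull distribution to the tail of training distances (from embedded samples to their class word vectors) and reject a test sample as an unseen class when its distance is extreme under the fitted tail. The synthetic data, tail size, and threshold are assumptions.

```python
# Hypothetical EVT sketch: model the extreme tail of training distances
# with a Weibull fit and use it to flag likely unseen classes.
import numpy as np
from scipy.stats import weibull_min

rng = np.random.default_rng(0)
train_dists = rng.gamma(shape=2.0, scale=0.5, size=1000)  # stand-in data

tail = np.sort(train_dists)[-100:]            # 100 largest training distances
c, loc, scale = weibull_min.fit(tail, floc=0.0)

def looks_unseen(dist, alpha=0.05):
    # Flag as "unseen class" if dist is more extreme than a 1 - alpha
    # quantile of the fitted tail distribution.
    return weibull_min.cdf(dist, c, loc=loc, scale=scale) > 1 - alpha
```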

Cite this article

Hanze DONG, Zhenfeng SUN, Yanwei FU, Shi ZHONG, Zhengjun ZHANG, Yu-Gang JIANG. Extreme vocabulary learning. Front. Comput. Sci., 2020, 14(6): 146315. DOI: 10.1007/s11704-019-8249-3



RIGHTS & PERMISSIONS

Higher Education Press and Springer-Verlag GmbH Germany, part of Springer Nature
