Extreme vocabulary learning

Hanze DONG; Zhenfeng SUN; Yanwei FU; Shi ZHONG; Zhengjun ZHANG; Yu-Gang JIANG

doi:10.1007/s11704-019-8249-3

Front. Comput. Sci., 2020, Vol. 14, Issue 6: 146315. DOI: 10.1007/s11704-019-8249-3

RESEARCH ARTICLE

Abstract

From the perspective of extreme value theory, the unseen novel classes in open-set recognition can be regarded as the extreme values of the training classes. Following this idea, we introduce margin and coverage distributions to model the training classes. We propose a novel visual-semantic embedding framework, extreme vocabulary learning (EVoL), which embeds visual features into the semantic space in a probabilistic way. Notably, we use the vast open vocabulary in the semantic space to further constrain the margin and coverage of the training classes. The learned embedding can be used directly to solve supervised learning, zero-shot learning, and open-set recognition simultaneously. Experiments on two benchmark datasets demonstrate the effectiveness of the proposed framework over conventional approaches.

Keywords

vocabulary-informed learning / zero-shot learning / extreme value theory
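
The abstract's central idea, treating unseen classes as extreme values of the training classes, can be illustrated with a generic extreme-value-style rejection rule. The sketch below is not the EVoL model: it assumes a Weibull-shaped inclusion probability of the form exp(-(d/scale)^shape), in the style of the extreme value machine, and every name in it (`inclusion_probability`, `classify_open_set`, the prototypes, the scale and shape values, the 0.5 threshold) is a hypothetical chosen for illustration rather than taken from the paper. In practice the scale and shape would be fitted to the margin distances of each training class; here they are fixed by hand.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def inclusion_probability(distance, scale, shape):
    """Weibull-shaped probability that a point at `distance` from a class
    prototype still belongs to that class (assumed form, not from the paper)."""
    return math.exp(-((distance / scale) ** shape))


def classify_open_set(embedding, prototypes, params, threshold=0.5):
    """Return (label, probability); the label is "unknown" when no class
    includes the embedding with probability of at least `threshold`."""
    best_label, best_prob = None, 0.0
    for label, prototype in prototypes.items():
        scale, shape = params[label]
        prob = inclusion_probability(euclidean(embedding, prototype), scale, shape)
        if prob > best_prob:
            best_label, best_prob = label, prob
    if best_prob < threshold:
        return "unknown", best_prob
    return best_label, best_prob


# Hypothetical class prototypes in a 2-D "semantic space".
prototypes = {"cat": (0.0, 0.0), "truck": (4.0, 0.0)}
# Hand-picked (scale, shape) pairs; a real system would fit these per class.
params = {"cat": (1.0, 2.0), "truck": (1.0, 2.0)}

near_cat = classify_open_set((0.1, 0.0), prototypes, params)
far_away = classify_open_set((2.0, 3.0), prototypes, params)
print(near_cat[0], far_away[0])
```

A sample close to a prototype is assigned to that class, while a sample far from every prototype falls below the threshold and is rejected as an open-set instance. The same scoring can serve supervised and zero-shot prediction by changing which prototypes are included in `prototypes`.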
