Relation Reconstructive Binarization of word embeddings
Feiyang PAN, Shuokai LI, Xiang AO, Qing HE
Front. Comput. Sci., 2022, Vol. 16, Issue (2): 162307
Word embeddings are one of the backbones of modern natural language processing (NLP). Recently, driven by the need to deploy NLP models on low-resource devices, there has been a surge of interest in compressing word embeddings into hash codes or binary vectors to reduce storage and memory consumption. Typically, existing work learns to encode an embedding into a compressed representation from which the original embedding can be reconstructed. Although these methods aim to preserve most of the information in each individual word, they often fail to retain the relations between words and can therefore incur large losses on certain tasks. To this end, this paper presents Relation Reconstructive Binarization (R2B), which transforms word embeddings into binary codes that preserve the relations between words. At its heart, R2B trains an auto-encoder to generate binary codes from which the word-by-word relations in the original embedding space can be reconstructed. Experiments showed that our method achieved significant improvements over previous methods on a number of tasks, along with space savings of up to 98.4%. In particular, our method performed even better on word-similarity evaluation than the uncompressed pre-trained embeddings, and significantly outperformed previous compression methods that do not consider word relations.
embedding compression / variational auto-encoder / binary word embedding
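The abstract describes R2B's core idea: an auto-encoder whose binary codes are trained so that pairwise relations in the original embedding space can be reconstructed from the codes alone. As a rough illustration only, below is a minimal PyTorch sketch of one way such an objective could look. It assumes the relation is cosine similarity and uses a straight-through estimator for binarization; the paper itself (per its keywords) builds on a variational auto-encoder, and its exact encoder, relation function, and loss may differ. All names here (R2BSketch, relation_loss, code_bits) are hypothetical.

```python
# Hypothetical sketch of relation-reconstructive binarization;
# not the paper's exact architecture or loss.
import torch
import torch.nn as nn
import torch.nn.functional as F

class R2BSketch(nn.Module):
    def __init__(self, emb_dim: int, code_bits: int):
        super().__init__()
        self.encoder = nn.Linear(emb_dim, code_bits)  # real-valued logits
        self.scale = nn.Parameter(torch.tensor(1.0))  # learned similarity scale

    def binarize(self, logits: torch.Tensor) -> torch.Tensor:
        # Straight-through estimator: hard {-1, +1} codes in the forward
        # pass, smooth tanh gradients in the backward pass.
        soft = torch.tanh(logits)
        hard = torch.sign(soft)
        return soft + (hard - soft).detach()

    def forward(self, emb_a: torch.Tensor, emb_b: torch.Tensor) -> torch.Tensor:
        code_a = self.binarize(self.encoder(emb_a))
        code_b = self.binarize(self.encoder(emb_b))
        # Reconstruct the pairwise relation from the binary codes alone.
        return self.scale * (code_a * code_b).mean(dim=-1)

def relation_loss(model: R2BSketch, emb_a: torch.Tensor, emb_b: torch.Tensor) -> torch.Tensor:
    # Target relation: cosine similarity in the original embedding space
    # (one plausible choice of "relation"; the paper's may differ).
    target = F.cosine_similarity(emb_a, emb_b, dim=-1)
    return ((model(emb_a, emb_b) - target) ** 2).mean()

# Usage: sample word pairs from a pre-trained embedding matrix and
# minimize relation_loss with any standard optimizer, e.g. Adam.
```

After training, only the sign bits of the encoder outputs need to be stored (one bit per code dimension instead of a 32-bit float per embedding dimension), which is where space savings on the order of those reported in the abstract would come from.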