Bidirectional Transformer with absolute-position aware relative position encoding for encoding sentences

Le QI, Yu ZHANG, Ting LIU

Front. Comput. Sci., 2023, 17(1): 171301. DOI: 10.1007/s11704-022-0610-2
Artificial Intelligence
RESEARCH ARTICLE


Abstract

Transformers have been widely studied in many natural language processing (NLP) tasks; they can capture dependencies across the whole sentence with high parallelizability thanks to multi-head attention and the position-wise feed-forward network. However, both of these components are position-independent, which leaves Transformers weak at modeling sentence structure. Existing studies commonly rely on positional encodings or mask strategies to capture the structural information of sentences. In this paper, we aim to strengthen the ability of Transformers to model the linear structure of sentences from three aspects: the absolute position of tokens, the relative distance between tokens, and the direction between tokens. We propose a novel bidirectional Transformer with absolute-position aware relative position encoding (BiAR-Transformer) that combines positional encoding and a mask strategy. We model the relative distance between tokens, together with their absolute positions, through a novel absolute-position aware relative position encoding, and we apply a bidirectional mask strategy to model the direction between tokens. Experimental results on natural language inference, paraphrase identification, sentiment classification and machine translation tasks show that BiAR-Transformer outperforms other strong baselines.
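To make the three ingredients concrete, the sketch below (NumPy) shows one plausible way to combine an absolute sinusoidal position signal, a bias indexed by the clipped relative distance between tokens, and forward/backward attention masks in a single attention head. This is an illustrative sketch only: the function and variable names, the clipping window, and the exact way the three signals are combined are assumptions, not the BiAR-Transformer's actual formulation.

```python
# Minimal sketch of direction-aware self-attention with an absolute position
# signal and a relative-distance bias. Not the authors' implementation.
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def sinusoidal_positions(seq_len, d_model):
    # Standard absolute sinusoidal position encoding.
    pos = np.arange(seq_len)[:, None]
    i = np.arange(d_model)[None, :]
    angle = pos / np.power(10000.0, (2 * (i // 2)) / d_model)
    enc = np.zeros((seq_len, d_model))
    enc[:, 0::2] = np.sin(angle[:, 0::2])
    enc[:, 1::2] = np.cos(angle[:, 1::2])
    return enc

def directional_attention(x, Wq, Wk, Wv, rel_bias_table, max_dist=4, direction="forward"):
    """Single attention head with:
      - absolute positions added to the input (absolute position signal),
      - a learned scalar bias indexed by the clipped relative distance j - i,
      - a directional mask so each token attends only forward or only backward."""
    n, d = x.shape
    x = x + sinusoidal_positions(n, d)
    q, k, v = x @ Wq, x @ Wk, x @ Wv

    # Clipped relative distances j - i, shifted to index the bias table.
    rel = np.clip(np.arange(n)[None, :] - np.arange(n)[:, None], -max_dist, max_dist)
    scores = q @ k.T / np.sqrt(d) + rel_bias_table[rel + max_dist]

    # Directional mask: "forward" sees positions j >= i, "backward" sees j <= i.
    i_idx, j_idx = np.arange(n)[:, None], np.arange(n)[None, :]
    mask = (j_idx >= i_idx) if direction == "forward" else (j_idx <= i_idx)
    scores = np.where(mask, scores, -1e9)
    return softmax(scores) @ v

# Toy usage: a forward head and a backward head over the same sentence,
# concatenated into a bidirectional, direction-aware representation.
rng = np.random.default_rng(0)
n, d, max_dist = 5, 8, 4
x = rng.normal(size=(n, d))
Wq, Wk, Wv = (rng.normal(scale=0.1, size=(d, d)) for _ in range(3))
rel_bias_table = rng.normal(scale=0.1, size=(2 * max_dist + 1,))
fwd = directional_attention(x, Wq, Wk, Wv, rel_bias_table, max_dist, "forward")
bwd = directional_attention(x, Wq, Wk, Wv, rel_bias_table, max_dist, "backward")
bidirectional = np.concatenate([fwd, bwd], axis=-1)   # shape (n, 2 * d)
```

Concatenating the outputs of a forward-masked head and a backward-masked head, as in the toy usage at the end, is one simple way to obtain the kind of direction-aware, bidirectional sentence representation the abstract describes.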


Keywords

Transformer / relative position encoding / bidirectional mask strategy / sentence encoder

Cite this article

Le QI, Yu ZHANG, Ting LIU. Bidirectional Transformer with absolute-position aware relative position encoding for encoding sentences. Front. Comput. Sci., 2023, 17(1): 171301. DOI: 10.1007/s11704-022-0610-2



RIGHTS & PERMISSIONS

Higher Education Press 2021
