Bidirectional Transformer with absolute-position aware relative position encoding for encoding sentences

Authors: Le QI; Yu ZHANG; Ting LIU

Affiliation: School of Computer Science and Technology, Harbin Institute of Technology, Harbin 150001, China

Published in: Frontiers of Computer Science

Year/Volume/Issue: 2023, Vol. 17, No. 1

Pages: 63-71


Subject Classification: 08 [Engineering]; 080203 [Engineering - Mechanical Design and Theory]; 0802 [Engineering - Mechanical Engineering]; 0835 [Engineering - Software Engineering]; 0812 [Engineering - Computer Science and Technology (degrees conferrable in Engineering or Science)]

Funding: Supported by the Key Development Program of the Ministry of Science and Technology (2019YFF0303003), the National Natural Science Foundation of China (Grant No. 61976068), and the "Hundreds, Millions" Engineering Science and Technology Major Special Project of Heilongjiang Province (2020ZX14A02).

Keywords: Transformer; relative position encoding; bidirectional mask strategy; sentence encoder

Abstract: Transformers have been widely studied in many natural language processing (NLP) tasks; they can capture dependencies across the whole sentence with high parallelizability thanks to multi-head attention and the position-wise feed-forward network. However, both of these components are position-independent, which makes Transformers weak at modeling sentence structure. Existing studies commonly utilize positional encodings or mask strategies to capture the structural information of sentences. In this paper, we aim to strengthen the ability of Transformers to model the linear structure of sentences from three aspects: the absolute position of tokens, the relative distance between tokens, and the direction between tokens. We propose a novel bidirectional Transformer with absolute-position aware relative position encoding (BiAR-Transformer) that combines positional encoding and the mask strategy. We model the relative distance between tokens together with their absolute positions by a novel absolute-position aware relative position encoding, and we apply a bidirectional mask strategy to model the direction between tokens. Experimental results on natural language inference, paraphrase identification, sentiment classification, and machine translation tasks show that BiAR-Transformer outperforms other strong baselines.
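The abstract describes three ingredients: absolute position information, a relative-distance-aware attention bias, and a direction-aware (bidirectional) mask. The PyTorch sketch below is only an illustrative approximation of these ideas, not the paper's actual formulation; all names and hyper-parameters (d_model, n_heads, max_rel_dist, the head split) are placeholders chosen for the example.

# Minimal sketch, assuming a standard self-attention layer augmented with
# (1) learned biases indexed by clipped relative distance,
# (2) absolute position embeddings added to token embeddings, and
# (3) a bidirectional mask that lets half the heads attend backward and
#     half attend forward. This is NOT the published BiAR-Transformer code.
import torch
import torch.nn as nn
import torch.nn.functional as F


class DirectionalRelativeAttention(nn.Module):
    def __init__(self, d_model=64, n_heads=4, max_rel_dist=16):
        super().__init__()
        assert n_heads % 2 == 0, "heads are split into backward/forward halves"
        self.n_heads = n_heads
        self.d_head = d_model // n_heads
        self.qkv = nn.Linear(d_model, 3 * d_model)
        self.out = nn.Linear(d_model, d_model)
        # one learned bias per clipped relative distance and per head
        self.max_rel_dist = max_rel_dist
        self.rel_bias = nn.Embedding(2 * max_rel_dist + 1, n_heads)

    def forward(self, x):
        B, T, _ = x.shape
        q, k, v = self.qkv(x).chunk(3, dim=-1)
        q = q.view(B, T, self.n_heads, self.d_head).transpose(1, 2)
        k = k.view(B, T, self.n_heads, self.d_head).transpose(1, 2)
        v = v.view(B, T, self.n_heads, self.d_head).transpose(1, 2)

        scores = q @ k.transpose(-2, -1) / self.d_head ** 0.5  # (B, H, T, T)

        # relative-distance bias, clipped to [-max_rel_dist, max_rel_dist]
        pos = torch.arange(T, device=x.device)
        rel = (pos[None, :] - pos[:, None]).clamp(-self.max_rel_dist, self.max_rel_dist)
        bias = self.rel_bias(rel + self.max_rel_dist)          # (T, T, H)
        scores = scores + bias.permute(2, 0, 1).unsqueeze(0)   # (B, H, T, T)

        # bidirectional mask: half the heads attend only to positions at or
        # before the query token, the other half only to positions at or after
        backward = pos[None, :] <= pos[:, None]
        forward = pos[None, :] >= pos[:, None]
        mask = torch.stack([backward] * (self.n_heads // 2)
                           + [forward] * (self.n_heads // 2))  # (H, T, T)
        scores = scores.masked_fill(~mask.unsqueeze(0), float("-inf"))

        attn = F.softmax(scores, dim=-1)
        out = (attn @ v).transpose(1, 2).reshape(B, T, -1)
        return self.out(out)


class TinySentenceEncoder(nn.Module):
    def __init__(self, vocab=1000, d_model=64, max_len=128):
        super().__init__()
        self.tok = nn.Embedding(vocab, d_model)
        self.abs_pos = nn.Embedding(max_len, d_model)  # absolute positions
        self.attn = DirectionalRelativeAttention(d_model)

    def forward(self, ids):
        pos = torch.arange(ids.size(1), device=ids.device)
        h = self.tok(ids) + self.abs_pos(pos)[None]
        return self.attn(h)


if __name__ == "__main__":
    ids = torch.randint(0, 1000, (2, 10))
    print(TinySentenceEncoder()(ids).shape)  # torch.Size([2, 10, 64])

In this toy setup, direction is modeled by dedicating head groups to backward-only and forward-only attention; the paper combines direction with its absolute-position aware relative position encoding, which may differ in detail from this head-splitting choice.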
