A novel unsupervised method for new word extraction

Authors: Lili MEI, Heyan HUANG, Xiaochi WEI, Xianling MAO

Affiliation: Beijing Engineering Research Center of High Volume Language Information Processing and Cloud Computing Applications; Department of Computer Science and Technology, Beijing Institute of Technology

Publication: Science China (Information Sciences) (中国科学:信息科学(英文版), English edition)

Year, volume, issue: 2016, Vol. 59, No. 9

Pages: 11-21

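Subject classification: 0810 [Engineering: Information and Communication Engineering]; 0808 [Engineering: Electrical Engineering]; 08 [Engineering]; 081203 [Engineering: Computer Application Technology]; 0812 [Engineering: Computer Science and Technology (engineering or science degrees may be conferred)]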

Funding: Supported by the State Key Program of National Natural Science of China (Grant No. 61132009), the National High Technology Research and Development Program of China (863 Program) (Grant No. 2015AA015404), and the National Natural Science Foundation of China (Grant Nos. 61201351, 61402036).

Keywords: new word extraction; word segmentation; domain specificity; statistical language knowledge; domain word extraction

Abstract: New words can benefit many NLP tasks, such as sentence chunking and sentiment analysis. However, automatic new word extraction is challenging because new words usually follow no fixed linguistic pattern and may even appear as new meanings of existing words. To tackle these problems, this paper proposes a novel method for extracting new words that not only considers domain specificity but also combines multiple kinds of statistical language knowledge. First, we apply a filtering algorithm to obtain a candidate list of new words. Then, we use statistical language knowledge to rank the candidates and extract the top-ranked new words. Experimental results show that the proposed method extracts a large number of new words from both Chinese and English corpora and notably outperforms state-of-the-art methods. Moreover, we demonstrate that our method increases the accuracy of Chinese word segmentation by 10% on a corpus containing new words.
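The abstract describes a two-stage pipeline (filter candidates, then rank them with statistical language knowledge while accounting for domain specificity) but does not say which statistics are used. The Python sketch below only illustrates that general shape under stated assumptions; it is not the authors' algorithm. Every concrete choice is hypothetical: treating character n-grams as candidates, the frequency and lexicon filter, a PMI-style cohesion score, left/right branching entropy as a boundary-freedom score, an add-one-smoothed log frequency ratio against a background corpus as a stand-in for "domain specificity", the unweighted sum that combines them, and the names `extract_new_words`, `min_freq`, and `top_k`.

```python
import math
from collections import Counter, defaultdict

# Hypothetical sketch of a filter-then-rank new word extractor.
# This is NOT the method from the paper; all statistics below are assumed.


def ngram_stats(text, max_n):
    """Count character n-grams (1..max_n) and record each n-gram's left/right neighbor characters."""
    counts = Counter()
    left = defaultdict(Counter)
    right = defaultdict(Counter)
    for n in range(1, max_n + 1):
        for i in range(len(text) - n + 1):
            gram = text[i:i + n]
            counts[gram] += 1
            if i > 0:
                left[gram][text[i - 1]] += 1
            if i + n < len(text):
                right[gram][text[i + n]] += 1
    return counts, left, right


def entropy(counter):
    """Shannon entropy (natural log) of a neighbor-frequency distribution; 0.0 if empty."""
    total = sum(counter.values())
    return -sum(c / total * math.log(c / total) for c in counter.values()) if total else 0.0


def cohesion(gram, counts, n_chars):
    """Minimum pointwise mutual information over all binary splits of the n-gram."""
    return min(
        math.log(counts[gram] * n_chars / (counts[gram[:k]] * counts[gram[k:]]))
        for k in range(1, len(gram))
    )


def extract_new_words(domain_text, background_text, lexicon, max_n=4, min_freq=2, top_k=10):
    """Return up to top_k (candidate, score) pairs, highest score first."""
    counts, left, right = ngram_stats(domain_text, max_n)
    bg_counts, _, _ = ngram_stats(background_text, max_n)
    n_dom, n_bg = len(domain_text), len(background_text)

    scored = []
    for gram, freq in counts.items():
        # Stage 1 (filtering): multi-character, frequent, word characters only, not already in the lexicon.
        if len(gram) < 2 or freq < min_freq or not gram.isalnum() or gram in lexicon:
            continue
        # Assumed domain specificity: log ratio of relative frequency in the domain
        # corpus vs. the add-one-smoothed relative frequency in the background corpus.
        specificity = math.log((freq / n_dom) / ((bg_counts[gram] + 1) / (n_bg + 1)))
        if specificity <= 0:
            continue
        # Stage 2 (ranking): internal cohesion + boundary freedom + domain specificity, unweighted.
        freedom = min(entropy(left[gram]), entropy(right[gram]))
        score = cohesion(gram, counts, n_dom) + freedom + specificity
        scored.append((gram, score))

    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:top_k]
```

For example, on a small domain text in which 区块链 occurs four times with different neighbors on each side, the full string outranks its fragments 区块 and 块链, because each fragment always has the same neighbor on one side and therefore zero branching entropy there. Taking the minimum of the left and right entropies is what penalizes such incomplete fragments.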