SciCN: A Scientific Dataset for Chinese Named Entity Recognition

Authors: Jing Yang, Bin Ji, Shasha Li, Jun Ma, Jie Yu

Affiliation: College of Computer, National University of Defense Technology, Changsha, 410073, China

Publication: Computers, Materials & Continua

Year, volume, issue: 2024, Vol. 78, No. 3

Pages: 4303-4315

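Subject classification: 0502 [Literature: Foreign Languages and Literatures]; 050201 [Literature: English Language and Literature]; 05 [Literature]; 0805 [Engineering: Materials Science and Engineering (degrees in engineering or science may be conferred)]; 0812 [Engineering: Computer Science and Technology (degrees in engineering or science may be conferred)]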

Funding: This research was supported by the National Key Research and Development Program [2020YFB1006302].

Keywords: Named entity recognition; dataset; scientific information extraction; lexicon

Abstract: Named entity recognition (NER) is a fundamental task of information extraction (IE), and it has attracted considerable research attention in recent years. Abundant annotated English NER datasets have significantly promoted NER research for English. By contrast, far less effort has been devoted to Chinese NER research, especially in the scientific domain, due to the scarcity of Chinese NER datasets. To alleviate this problem, we present a Chinese scientific NER dataset, SciCN, which contains entity annotations of titles and abstracts derived from 3,500 scientific papers. We manually annotate a total of 62,059 entities, and these entities are classified into six types. Compared to English scientific NER datasets, SciCN has a larger scale and is more diverse, for it not only contains more paper abstracts but these abstracts are also derived from more research fields. To investigate the properties of SciCN and provide baselines for future research, we adapt a number of previous state-of-the-art Chinese NER models to evaluate SciCN. Experimental results show that SciCN is more challenging than other Chinese NER datasets. In addition, previous studies have proven the effectiveness of using lexicons to enhance Chinese NER models. Motivated by this fact, we provide a scientific domain-specific lexicon. Validation results demonstrate that our lexicon delivers better performance gains than lexicons of other domains. We hope that the SciCN dataset and the lexicon will enable us to benchmark the NER task regarding the Chinese scientific domain and make progress for future research. The dataset and lexicon are available at: https://***/yangjingla/SciCN.git.
