Word Embedding Bootstrapped Deep Active Learning Method to Information Extraction on Chinese Electronic Medical Record
Author affiliation: Shanghai Chest Hospital, Shanghai Jiao Tong University, Shanghai 200030, China
Publication: Journal of Shanghai Jiaotong University (Science) (English edition of the Journal of Shanghai Jiao Tong University)
Year, volume, issue: 2021, Vol. 26, No. 4
Pages: 494-502
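Subject classification: 1001 [Medicine - Basic Medicine (degrees conferrable in Medicine or Science)]; 081203 [Engineering - Computer Application Technology]; 08 [Engineering]; 0835 [Engineering - Software Engineering]; 0812 [Engineering - Computer Science and Technology (degrees conferrable in Engineering or Science)]; 10 [Medicine]
Keywords: deep active learning; named entity recognition (NER); information extraction; word embedding; Chinese electronic medical record (EMR)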
Abstract: Electronic medical records (EMRs) contain rich biomedical information and have great potential in disease diagnosis and biomedical research. However, EMR information is usually stored as unstructured text, which raises the cost of using it and hinders its application. This work presents an effective named entity recognition (NER) method for information extraction from Chinese EMRs, based on word embedding bootstrapped deep active learning, to promote the acquisition of medical information from Chinese EMRs and to release its value. Deep active learning with a bi-directional long short-term memory network followed by a conditional random field (Bi-LSTM+CRF) is used to capture the characteristics of different information from a labeled corpus, and the continuous bag-of-words (CBOW) and skip-gram word embedding models are combined with this model, respectively, to capture the text features of Chinese EMRs from an unlabeled corpus. The method is evaluated on NER tasks over the “medical history” content of Chinese EMRs. Experimental results show that the word embedding bootstrapped deep active learning method, using an unlabeled medical corpus, achieves better performance than the other models compared.
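The abstract does not say how the active learner chooses which unlabeled sentences to send for annotation. As an illustration only, the sketch below shows one common choice, uncertainty sampling: each sentence in the unlabeled pool is scored by the mean negative log-probability of its most likely label per token, and the highest-scoring sentences are queried. The function names, the scoring rule, and the toy probabilities are all assumptions made here, not details from the paper. A real Bi-LSTM+CRF tagger would supply the per-token label probabilities (for example, CRF marginals).

```python
import math
from typing import Dict, List, Sequence


def sequence_uncertainty(token_probs: Sequence[Sequence[float]]) -> float:
    """Mean negative log-probability of the top label at each token.

    Assumed scoring rule (not from the paper): higher values mean the
    tagger is less sure about the sentence as a whole.
    """
    if not token_probs:
        return 0.0
    total = sum(-math.log(max(probs)) for probs in token_probs)
    return total / len(token_probs)


def select_batch(pool: Dict[str, Sequence[Sequence[float]]], k: int) -> List[str]:
    """Return the ids of the k most uncertain sentences in the pool."""
    ranked = sorted(pool, key=lambda sid: sequence_uncertainty(pool[sid]), reverse=True)
    return ranked[:k]


# Toy pool: sentence id -> per-token label distributions (hypothetical values).
pool = {
    "a": [[0.9, 0.1], [0.8, 0.2]],
    "b": [[0.5, 0.5], [0.6, 0.4]],
    "c": [[0.99, 0.01]],
}
queried = select_batch(pool, 2)
```

In a full loop, the queried sentences would be labeled by an annotator, moved from the pool to the labeled corpus, and the tagger retrained before the next round of selection.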