Text Mining and Analysis of Treatise on Febrile Diseases Based on Natural Language Processing
Author affiliations: School of Traditional Chinese Medicine, Beijing University of Chinese Medicine; School of Life Science, Beijing University of Chinese Medicine, Beijing 100029, China
Publication: World Journal of Traditional Chinese Medicine (世界中医药杂志, English edition)
Year, volume, issue: 2020, Vol. 6, No. 1
Pages: 67-73
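Subject classification: 100208 [Medicine - Clinical Laboratory Diagnostics]; 1002 [Medicine - Clinical Medicine]; 081203 [Engineering - Computer Application Technology]; 08 [Engineering]; 0835 [Engineering - Software Engineering]; 10 [Medicine]; 0812 [Engineering - Computer Science and Technology (engineering or science degrees may be awarded)]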
Keywords: knowledge discovery; natural language processing; text mining; traditional Chinese medicine literature; Treatise on Febrile Diseases
Abstract: Objective: This paper applies natural language processing (NLP) to text mining of traditional Chinese medicine (TCM) literature by analyzing and processing the text of Treatise on Febrile Diseases (TFDs) to find important information. Materials and Methods: The experiment was implemented in Python using NLP toolkits such as Jieba, nltk, gensim, and sklearn, together with Excel and Word. The text of TFDs was cleaned, segmented, and stripped of stop words in sequence; word frequency statistics and analysis, keyword extraction, named entity recognition (NER), and other operations were then performed, and finally text similarity was calculated. Results: Jieba can accurately identify the herbal names in TFDs. Word frequency statistics based on the word segmentation show that warm therapy is an important treatment in TFDs. Guizhi decoction is the main prescription, and five core decoctions are identified. Keyword extraction based on the term frequency-inverse document frequency algorithm is *** accuracy of NER in TFDs is about 86%; when similarity is calculated with a latent semantic indexing model, Understanding of Synopsis of Golden Chamber (SGC) is much more similar to SGC than to TFDs. The results meet expectations. Conclusions: This work lays a research foundation for applying NLP to text mining of unstructured TCM literature. Combined with deep learning technology, NLP, as an important branch of artificial intelligence, will have broader application prospects in text mining of TCM literature, construction of TCM knowledge graphs, and TCM knowledge services.
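The steps named in the abstract (stop-word removal, word frequency statistics, TF-IDF keyword weighting, and text similarity) can be illustrated with a minimal sketch. This is not the authors' code. It assumes the text has already been segmented into tokens (the paper used Jieba for that step), it uses only the Python standard library instead of gensim or sklearn, and it substitutes plain cosine similarity over TF-IDF vectors for the latent semantic indexing model reported in the paper. The toy documents, the stop-word list, and all function names are hypothetical.

```python
import math
from collections import Counter

# Hypothetical, already-segmented toy documents. The tokens are illustrative
# placeholders and are not taken from the paper's data.
docs = {
    "doc_a": ["guizhi", "decoction", "warm", "guizhi", "sweat"],
    "doc_b": ["guizhi", "decoction", "warm", "ginger"],
    "doc_c": ["kidney", "qi", "pill", "of", "water"],
}

# Placeholder stop-word list (assumption); a real list would be much longer.
STOP_WORDS = {"of", "the"}


def clean(tokens):
    """Drop stop words from a token list."""
    return [t for t in tokens if t not in STOP_WORDS]


def word_frequency(tokens):
    """Count how often each non-stop-word token occurs."""
    return Counter(clean(tokens))


def tf_idf(documents):
    """Return {doc_name: {term: weight}} using a smoothed IDF variant.

    tf  = term count / document length
    idf = log((1 + N) / (1 + df)) + 1   (one common smoothing choice)
    """
    cleaned = {name: clean(toks) for name, toks in documents.items()}
    n_docs = len(cleaned)

    # Document frequency: number of documents containing each term.
    df = Counter()
    for toks in cleaned.values():
        df.update(set(toks))

    weights = {}
    for name, toks in cleaned.items():
        counts = Counter(toks)
        total = len(toks)
        weights[name] = {
            term: (count / total) * (math.log((1 + n_docs) / (1 + df[term])) + 1)
            for term, count in counts.items()
        }
    return weights


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(term, 0.0) for term, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


weights = tf_idf(docs)

# Top keywords for one document, ranked by TF-IDF weight.
top_keywords = sorted(weights["doc_a"], key=weights["doc_a"].get, reverse=True)[:3]

# Pairwise similarity: documents sharing vocabulary score higher.
sim_ab = cosine(weights["doc_a"], weights["doc_b"])
sim_ac = cosine(weights["doc_a"], weights["doc_c"])
```

In this toy setup, doc_a and doc_b share several terms while doc_a and doc_c share none, so sim_ab exceeds sim_ac. An LSI model would additionally project the vectors into a low-rank topic space before comparing them, which this sketch does not attempt.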