A Cross-Domain Ontology Semantic Representation Based on NCBI-BlueBERT Embedding
Author affiliations: Faculty of Computing, Harbin Institute of Technology; Department of Medical Informatics, School of Biomedical Engineering and Informatics, Nanjing Medical University; Beijing Key Laboratory of Intelligent Processing for Building Big Data, School of Electrical and Information Engineering, Beijing University of Civil Engineering and Architecture
Publication: Chinese Journal of Electronics (电子学报(英文))
Year, volume, issue: 2022, Vol. 31, No. 5
Pages: 860-869
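Subject classification: 0808 [Engineering - Electrical Engineering]; 08 [Engineering]; 081203 [Engineering - Computer Application Technology]; 0812 [Engineering - Computer Science and Technology (engineering or science degree may be conferred)]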
Funding: Supported by the National Natural Science Foundation of China (62171164, 62102191, 61872114, 62131004)
Keywords: Ontology; Semantic representation; Semantic similarity; Protein-protein interaction
Abstract: A common but critical task in the analysis of biological ontology data is comparing ontologies with one another. Numerous ontology-based semantic similarity measures have been proposed within specific ontology domains, but comparing ontologies across domains remains a challenge. An ontology contains scientific natural-language descriptions of the biological aspect it covers. We therefore develop a new method for cross-domain semantic representation of biological ontologies, based on the natural language processing (NLP) representation model bidirectional encoder representations from transformers (BERT). This article uses the BERT model to represent ontologies at the word level as a set of vectors, which facilitates semantic analysis and comparison of the biomedical entities named in an ontology or associated with ontology terms. We evaluated the method in two experiments: computing pairwise similarities between Disease Ontology and Human Phenotype Ontology terms, and predicting pairwise protein interactions. The experimental results demonstrated comparable performance. This shows promise for the development of NLP methods in biological data analysis.
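As an illustration of the idea in the abstract, the sketch below shows how word-level vectors could be combined into a term vector and how two terms from different ontologies could then be compared. It is a minimal sketch under stated assumptions, not the authors' implementation: the abstract does not specify the pooling or the similarity function, so mean pooling and cosine similarity are assumed here, and the names `TOY_WORD_VECTORS`, `term_vector`, and `cosine_similarity`, along with the three-dimensional vectors, are hypothetical stand-ins for real BERT embeddings.

```python
import math

# Hypothetical toy word vectors standing in for BERT word-level embeddings.
# Real embeddings would come from a pretrained model and have hundreds of dimensions.
TOY_WORD_VECTORS = {
    "cardiac": [0.9, 0.1, 0.0],
    "heart": [0.8, 0.2, 0.1],
    "arrhythmia": [0.7, 0.6, 0.0],
    "abnormality": [0.2, 0.7, 0.1],
    "rhythm": [0.6, 0.5, 0.1],
}


def term_vector(term, word_vectors):
    """Mean-pool the vectors of a term's words (assumed pooling strategy)."""
    vectors = [word_vectors[w] for w in term.lower().split() if w in word_vectors]
    if not vectors:
        raise ValueError(f"no known words in term: {term!r}")
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


# Compare a disease-style term with a phenotype-style term (both invented labels).
disease = term_vector("cardiac arrhythmia", TOY_WORD_VECTORS)
phenotype = term_vector("abnormality heart rhythm", TOY_WORD_VECTORS)
similarity = cosine_similarity(disease, phenotype)
print(f"similarity = {similarity:.3f}")
```

Because both terms are mapped into the same vector space, the comparison does not depend on the two ontologies sharing any structure, which is the property that makes a text-based representation attractive for cross-domain use.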