Deciphering “the language of nature”: A transformer-based language model for deleterious mutations in proteins
Author affiliations: Raymond G. Perelman Center for Cellular and Molecular Therapeutics, Children's Hospital of Philadelphia, Philadelphia, PA 19104, USA; Palisades Charter High School, Pacific Palisades, CA 90272, USA; Massachusetts Institute of Technology, Cambridge, MA 02139, USA; Department of Genetics and Biomedical Informatics, Zhongshan School of Medicine, Sun Yat-sen University, Guangzhou 510080, China; Department of Pathology and Laboratory Medicine, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA
Publication: The Innovation
Year/Volume/Issue: 2023, Vol. 4, No. 5
Pages: 47-58
Subject classification: 0502 [Literature - Foreign Language and Literature]; 050201 [Literature - English Language and Literature]; 05 [Literature]
Funding: NIH grant GM132713 (K.W.); CHOP Research Institute; Fundamental Research Funds for the Central Universities, Sun Yat-sen University (No. 23ptpy119, to L.F.)
Keywords: delete; prediction; apply
Abstract: Various machine-learning models, including deep neural network models, have already been developed to predict the deleteriousness of missense (non-synonymous) variants. Potential improvements to the current state of the art, however, may still benefit from a fresh look at the biological problem using more sophisticated self-adaptive machine-learning approaches. Recent advances in the field of natural language processing show transformer models (a type of deep neural network) to be particularly powerful at modeling sequence information with context dependence. In this study, we introduce MutFormer, a transformer-based model for the prediction of deleterious missense mutations, which uses reference and mutated protein sequences from the human genome as the primary features. MutFormer takes advantage of a combination of self-attention layers and convolutional layers to learn both long-range and short-range dependencies between amino acid mutations in a protein sequence. We first pre-trained MutFormer on reference protein sequences and mutated protein sequences resulting from common genetic variants observed in human populations. We next examined different fine-tuning methods to successfully apply the model to deleteriousness prediction of missense variants. Finally, we evaluated MutFormer's performance on multiple testing datasets.
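The abstract describes combining self-attention layers (long-range dependencies) with convolutional layers (short-range, local dependencies) over an embedded amino-acid sequence. MutFormer's exact architecture is not given in this record; the following is a minimal NumPy sketch of that general idea only, with toy dimensions and randomly initialized weights (all names such as `self_attention` and `conv1d` are illustrative, not taken from the paper's code).

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(X, Wq, Wk, Wv):
    # Long-range path: every residue position attends to every other position.
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    scores = Q @ K.T / np.sqrt(K.shape[1])       # scaled dot-product attention
    return softmax(scores, axis=-1) @ V

def conv1d(X, W):
    # Short-range path: a width-k convolution over the sequence axis,
    # mixing each position only with its immediate neighbors.
    k = W.shape[0]
    pad = k // 2
    Xp = np.pad(X, ((pad, pad), (0, 0)))
    return np.stack([(Xp[i:i + k] * W).sum(axis=0) for i in range(X.shape[0])])

rng = np.random.default_rng(0)
L, d = 12, 8                                     # toy sequence length, embedding size
X = rng.normal(size=(L, d))                      # embedded amino-acid sequence
Wq, Wk, Wv = (rng.normal(size=(d, d)) for _ in range(3))
Wc = rng.normal(size=(3, d))                     # convolution kernel of width 3

# Combine both paths into one hidden representation, per position.
H = self_attention(X, Wq, Wk, Wv) + conv1d(X, Wc)
print(H.shape)                                   # one (L, d) representation
```

A real model would stack several such blocks with learned weights, layer normalization, and a classification head fine-tuned to label a reference/mutated sequence pair as benign or deleterious, matching the pre-train-then-fine-tune workflow the abstract outlines.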