Structure-based Comparative Analysis and Prediction of N-linked Glycosylation Sites in Evolutionarily Distant Eukaryotes
Author affiliations: Life Sciences Department, Paris Diderot University; Department of Biochemistry and Molecular Biology, George Washington University Medical Center; Department of Oncology, Georgetown University; Center for Biologics Evaluation and Research, Food and Drug Administration
Publication: Genomics, Proteomics & Bioinformatics
Year, volume, issue: 2013, Vol. 11, No. 2
Pages: 96-104
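Subject classification: 0710 [Science - Biology]; 071010 [Science - Biochemistry and Molecular Biology]; 081704 [Engineering - Applied Chemistry]; 1001 [Medicine - Basic Medicine (medical or science degrees may be conferred)]; 07 [Science]; 08 [Engineering]; 0817 [Engineering - Chemical Engineering and Technology]; 0714 [Science - Statistics (science or economics degrees may be conferred)]; 0703 [Science - Chemistry]; 0701 [Science - Mathematics]; 0812 [Engineering - Computer Science and Technology (engineering or science degrees may be conferred)]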
Funding: Support for this work came from the George Washington University funds to RM. RG's participation is supported by RO1 CA135069 and U01 CA168926 supported in part by an appointment to the Research Participation Program at the Center for Biologics Evaluation and Research administered by the Oak Ridge Institute for Science and Education through an interagency agreement between the U.S. Department of Energy and the U.S. Food and Drug Administration.
Keywords: N-linked glycosylation; Gain and loss of glycosylation; nsSNP; nsSNV; Variation
Abstract: The asparagine-X-serine/threonine (NXS/T) motif, where X is any amino acid except proline, is the consensus motif for N-linked glycosylation. Significant numbers of high-resolution crystal structures of glycosylated proteins allow us to carry out structural analysis of the N-linked glycosylation sites (NGS). Our analysis shows that there is enough structural information from diverse glycoproteins to allow the development of rules which can be used to predict NGS. A Python-based tool was developed to investigate asparagines implicated in N-glycosylation in five species: Homo sapiens, Mus musculus, Drosophila melanogaster, Arabidopsis thaliana and Saccharomyces cerevisiae. Our analysis shows that 78% of all asparagines of the NXS/T motif involved in N-glycosylation are localized in the loop/turn conformation in the human proteome. A similar distribution was found for all the other species examined. Comparative analysis of the occurrence of NXS/T motifs not known to be glycosylated and their reverse sequence (S/TXN) shows a similar distribution across the secondary structural elements, indicating that the NXS/T motif in itself is not biologically relevant. Based on our analysis, we have defined rules to determine NGS. Using machine learning methods based on these rules, we can predict with 93% accuracy whether a particular site will be glycosylated. If structural information is not available, the tool uses structural prediction results, giving 74% accuracy. The tool was used to identify glycosylation sites in 108 human proteins with structures and 2247 proteins without structures that have acquired NXS/T site(s) due to non-synonymous variation. The tool, Structure Feature Analysis Tool (SFAT), is freely available to the public at http://***/tools/sfat.
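The consensus motif defined in the abstract (N, then any residue except proline, then S or T) can be illustrated with a short sequence scan. The following is a minimal sketch only, not the SFAT implementation: the function name `find_sequons` and the example sequence are hypothetical, and it covers only the sequence-level motif search, not the structural rules or the machine learning step described above.

```python
import re

# N, then any character other than P, then S or T.
# The zero-width lookahead lets overlapping motifs (e.g. "NNSS") all be reported.
# Assumption: the input contains only one-letter amino acid codes; no validation is done.
SEQUON = re.compile(r"(?=(N[^P][ST]))")


def find_sequons(sequence):
    """Return (1-based position of the asparagine, motif) for each NXS/T match."""
    seq = sequence.upper()
    return [(m.start() + 1, m.group(1)) for m in SEQUON.finditer(seq)]


# Hypothetical example sequence, chosen to show a proline exclusion ("NPS")
# and two overlapping motifs ("NNS" and "NSS").
print(find_sequons("MNGTANPSNNSSK"))
# prints [(2, 'NGT'), (9, 'NNS'), (10, 'NSS')]
```

A match from this scan only marks a candidate site. As the abstract notes, the presence of the motif alone does not indicate glycosylation, so each candidate would still need the structural context (for example, loop/turn conformation) to be assessed.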