Improving conditional random field model for prediction of protein-RNA residue-base contacts
Author affiliations: Department of Electrical Engineering and Computer Science, National Institute of Technology, Matsue College, Shimane 690-8518, Japan; Graduate School of Medicine, Kyoto University, Kyoto 606-8507, Japan; RIKEN Quantitative Biology Center, Hyogo 650-0047, Japan
Publication: Frontiers of Electrical and Electronic Engineering in China (中国电气与电子工程前沿, English edition)
Year, volume, issue: 2018, Vol. 6, No. 2
Pages: 155-162
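Subject classification: 12 [Management]; 0832 [Engineering: Food Science and Engineering (degrees awardable in engineering or agriculture)]; 1201 [Management: Management Science and Engineering (degrees awardable in management or engineering)]; 081104 [Engineering: Pattern Recognition and Intelligent Systems]; 08 [Engineering]; 0835 [Engineering: Software Engineering]; 0811 [Engineering: Control Science and Engineering]; 0812 [Engineering: Computer Science and Technology (degrees awardable in engineering or science)]; 083203 [Engineering: Agricultural Products Processing and Storage Engineering]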
Funding: supported by JSPS Japan Grants-in-Aid
Keywords: protein-RNA interaction; residue-base contact; conditional random field
Abstract: Background: To understand biological cellular systems, it is important to analyze interactions between protein residues and RNA bases. A method based on conditional random fields (CRFs) was previously developed for predicting contacts between residues and bases. It takes multiple sequence alignments of the given protein and RNA sequences as input and learns a model with many parameters, which describe relationships between neighboring residue-base pairs, by maximizing the pseudo-likelihood function. Methods: In this paper, we propose a novel CRF-based model that has more complicated dependency relationships between random variables than the previous model but uses fewer parameters in order to avoid overfitting to the training data. Results: We performed cross-validation experiments to evaluate the proposed model and took the average of the AUC (area under the receiver operating characteristic curve) scores. The results suggest that the proposed CRF-based model without L1-norm regularization (lasso) outperforms the existing model both with and without the lasso under several input observations to the CRFs. Conclusions: We proposed a novel stochastic model for predicting protein-RNA residue-base contacts and improved the prediction accuracy in terms of the AUC score. This implies that more dependency relationships in a CRF could be controlled by fewer parameters.
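The evaluation described in the abstract (AUC scores averaged over cross-validation folds) can be illustrated with a minimal Python sketch. This is not the authors' code: the function names `auc_score` and `mean_auc` and the toy fold data are hypothetical, and the AUC is computed with the pairwise (Mann-Whitney) formulation, which is one standard way to obtain the area under the ROC curve.

```python
from typing import List, Sequence, Tuple


def auc_score(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the fraction of (contact, non-contact) pairs in which the
    contact receives the higher score; ties count as one half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def mean_auc(folds: List[Tuple[Sequence[int], Sequence[float]]]) -> float:
    """Average the per-fold AUC over all cross-validation folds."""
    return sum(auc_score(y, s) for y, s in folds) / len(folds)


# Hypothetical toy data: (true contact labels, predicted scores) per fold.
folds = [
    ([1, 0, 1, 0], [0.9, 0.2, 0.6, 0.4]),  # every contact outranks every non-contact
    ([1, 0, 0, 1], [0.8, 0.7, 0.1, 0.3]),  # one contact is ranked below a non-contact
]
print(mean_auc(folds))
```

In this toy input the first fold is ranked perfectly (AUC 1.0) and the second has one misordered pair out of four (AUC 0.75), so the averaged score is 0.875. The quadratic pair loop is chosen for readability; a rank-based computation would be preferable for large numbers of residue-base pairs.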