Improving conditional random field model for prediction of protein-RNA residue-base contacts

Morihiro Hayashida, Noriyuki Okada, Mayumi Kamada, Hitoshi Koyano

Quant. Biol. 2018, Vol. 6, Issue (2): 155-162.

DOI: 10.1007/s40484-018-0136-7
RESEARCH ARTICLE

Abstract

Background: Analyzing interactions between protein residues and RNA bases is important for understanding cellular systems. A method based on conditional random fields (CRFs) was previously developed for predicting contacts between residues and bases. It takes multiple sequence alignments of the given protein and RNA sequences as input and learns a model with many parameters, which capture relationships between neighboring residue-base pairs, by maximizing the pseudo-likelihood function.
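
For orientation, the pseudo-likelihood mentioned here can be written in the following generic form. This is a sketch under assumed notation, not the paper's own parameterization: y_{ij} is a hypothetical binary contact label for protein residue i and RNA base j, x denotes the observations derived from the multiple sequence alignments, N(i,j) is the set of neighboring residue-base pairs, θ is the parameter vector, and λ is an assumed weight for the optional L1-norm (lasso) penalty.

```latex
% Pseudo-likelihood: product of per-variable conditionals given the neighbors
\mathrm{PL}(\theta) = \prod_{(i,j)} P\bigl(y_{ij} \mid y_{N(i,j)},\, x;\, \theta\bigr)

% Training objective, with an optional L1 (lasso) penalty of assumed weight \lambda \ge 0
\hat{\theta} = \arg\max_{\theta}
  \Bigl[ \sum_{(i,j)} \log P\bigl(y_{ij} \mid y_{N(i,j)},\, x;\, \theta\bigr)
         - \lambda \lVert \theta \rVert_{1} \Bigr]
```

Each factor conditions only on the labels of neighboring pairs, so the objective avoids computing the global partition function of the CRF. Setting λ = 0 gives the unregularized case.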

Methods: In this paper, we propose a novel CRF-based model that has more complicated dependency relationships between random variables than the previous model but uses fewer parameters, in order to avoid overfitting to the training data.

Results: We performed cross-validation experiments to evaluate the proposed model and averaged the AUC (area under the receiver operating characteristic curve) scores. The results suggest that the proposed CRF-based model without L1-norm regularization (lasso) outperforms the existing model, both with and without the lasso, under several types of input observations to the CRFs.
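
The evaluation measure described above, the AUC averaged over cross-validation folds, can be sketched as follows. This is a minimal illustration rather than the authors' evaluation code: the function names `auc_score` and `mean_auc` and the toy fold data are hypothetical, and the AUC is computed with the standard pairwise-ranking definition (the probability that a randomly chosen contact scores higher than a randomly chosen non-contact, with ties counted as one half).

```python
from typing import List, Sequence, Tuple


def auc_score(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the fraction of (positive, negative) pairs ranked correctly."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # ties count as half
    return wins / (len(pos) * len(neg))


def mean_auc(folds: List[Tuple[Sequence[int], Sequence[float]]]) -> float:
    """Average the per-fold AUC over all held-out folds."""
    aucs = [auc_score(labels, scores) for labels, scores in folds]
    return sum(aucs) / len(aucs)


# Toy held-out folds (hypothetical data): (true contact labels, predicted scores)
folds = [
    ([1, 0, 1, 0], [0.9, 0.2, 0.6, 0.4]),  # every contact outranks every non-contact
    ([1, 0, 0, 1], [0.8, 0.7, 0.1, 0.3]),  # one of four pairs is misordered
]
print(mean_auc(folds))  # 0.875
```

The pairwise loop is quadratic in the number of examples, which is acceptable for a sketch; a sort-based rank computation would be the usual choice for large contact maps.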

Conclusions: We proposed a novel stochastic model for predicting protein-RNA residue-base contacts and improved the prediction accuracy in terms of the AUC score. This implies that more dependency relationships in a CRF could be controlled with fewer parameters.

Keywords

protein-RNA interaction / residue-base contact / conditional random field

RIGHTS & PERMISSIONS

Higher Education Press and Springer-Verlag GmbH Germany, part of Springer Nature
