Improving conditional random field model for prediction of protein-RNA residue-base contacts

Morihiro Hayashida, Noriyuki Okada, Mayumi Kamada, Hitoshi Koyano

Quant. Biol. ›› 2018, Vol. 6 ›› Issue (2) : 155-162. DOI: 10.1007/s40484-018-0136-7
RESEARCH ARTICLE


Abstract

Background: Understanding biological cellular systems requires analyzing interactions between protein residues and RNA bases. A method based on conditional random fields (CRFs) was previously developed for predicting contacts between residues and bases. It takes multiple sequence alignments for the given protein and RNA sequences as input, and learns a model with many parameters describing relationships between neighboring residue-base pairs by maximizing the pseudo-likelihood function.
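
As a rough illustration of the training objective named above, the pseudo-likelihood of a CRF over residue-base pairs can be written in its generic form. The notation is an assumption made for illustration and is not taken from the paper: y_{ij} in {0, 1} marks whether residue i and base j are in contact, N(i, j) is the set of neighboring residue-base pairs, x is the observation derived from the alignments, and θ is the parameter vector.

```latex
% Generic pseudo-likelihood (illustrative notation, not the paper's exact model):
% each label is conditioned on its neighbors' labels instead of normalizing
% over all joint label assignments.
\[
  \mathrm{PL}(\theta)
    = \prod_{(i,j)} P\bigl(y_{ij} \mid y_{N(i,j)},\, x;\, \theta\bigr),
  \qquad
  \hat{\theta} = \arg\max_{\theta} \, \log \mathrm{PL}(\theta).
\]
% With L1-norm regularization (lasso), a penalty weighted by an assumed
% hyperparameter \lambda > 0 is subtracted from the objective:
\[
  \hat{\theta}_{\mathrm{lasso}}
    = \arg\max_{\theta} \, \bigl( \log \mathrm{PL}(\theta) - \lambda \lVert \theta \rVert_{1} \bigr).
\]
```

Each factor only requires normalizing over the two possible values of a single label, which is why this objective is commonly used when the exact likelihood of a grid-structured model is too expensive to compute.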

Methods: In this paper, we propose a novel CRF-based model that has more complex dependency relationships between random variables than the previous model but uses fewer parameters, in order to avoid overfitting to the training data.

Results: We performed cross-validation experiments to evaluate the proposed model and averaged the AUC (area under the receiver operating characteristic curve) scores. The results suggest that the proposed CRF-based model without L1-norm regularization (lasso) outperforms the existing model, both with and without the lasso, under several input observations to the CRFs.
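
The evaluation described above (an AUC score per cross-validation fold, then the average) can be sketched as follows. This is a minimal illustration, not the authors' evaluation code: the function names `auc_score` and `mean_auc` and the toy data are hypothetical, and labels are assumed to be 1 for a contact and 0 for a non-contact.

```python
from statistics import mean


def auc_score(labels, scores):
    """Area under the ROC curve via the rank-sum (Mann-Whitney) identity.

    labels: iterable of 0/1 (assumed: 1 = contact, 0 = no contact).
    scores: predicted scores, higher meaning "more likely a contact".
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    # Fraction of (positive, negative) pairs ranked correctly; ties count half.
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def mean_auc(folds):
    """Average the per-fold AUC over (labels, scores) pairs, one per test fold."""
    return mean(auc_score(labels, scores) for labels, scores in folds)


# Toy example with two hypothetical test folds.
folds = [
    ([1, 0, 1, 0], [0.9, 0.1, 0.8, 0.3]),
    ([0, 1, 1, 0], [0.1, 0.4, 0.35, 0.8]),
]
print(mean_auc(folds))
```

The pairwise form is quadratic in the number of examples, which is fine for a sketch; a sort-based rank computation would be the usual choice for large contact maps.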

Conclusions: We proposed a novel stochastic model for predicting protein-RNA residue-base contacts and improved the prediction accuracy in terms of the AUC score. This suggests that more dependency relationships in a CRF can be controlled by fewer parameters.


Keywords

protein-RNA interaction / residue-base contact / conditional random field

Cite this article

Morihiro Hayashida, Noriyuki Okada, Mayumi Kamada, Hitoshi Koyano. Improving conditional random field model for prediction of protein-RNA residue-base contacts. Quant. Biol., 2018, 6(2): 155‒162 https://doi.org/10.1007/s40484-018-0136-7


ACKNOWLEDGEMENTS

This work was partially supported by Grants-in-Aid #16K00392 and #16KT0020 from JSPS, Japan.

COMPLIANCE WITH ETHICS GUIDELINES

The authors Morihiro Hayashida, Noriyuki Okada, Mayumi Kamada and Hitoshi Koyano declare that they have no conflicts of interest.
This article does not contain any studies with human or animal subjects performed by any of the authors.

RIGHTS & PERMISSIONS

© 2018 Higher Education Press and Springer-Verlag GmbH Germany, part of Springer Nature