Improving conditional random field model for prediction of protein-RNA residue-base contacts

Morihiro Hayashida; Noriyuki Okada; Mayumi Kamada; Hitoshi Koyano

doi:10.1007/s40484-018-0136-7

Quant. Biol., 2018, Vol. 6, Issue 2: 155-162. DOI: 10.1007/s40484-018-0136-7

RESEARCH ARTICLE

Abstract

Background: Understanding cellular systems requires analyzing interactions between protein residues and RNA bases. A method based on conditional random fields (CRFs) was previously developed for predicting contacts between residues and bases. It takes multiple sequence alignments of the given protein and RNA sequences as input and learns a model with many parameters, which describe relationships between neighboring residue-base pairs, by maximizing the pseudo-likelihood function.

Methods: In this paper, we propose a novel CRF-based model that has more complicated dependency relationships between random variables than the previous model but uses fewer parameters, in order to avoid overfitting to the training data.

Results: We performed cross-validation experiments to evaluate the proposed model and averaged the AUC (area under the receiver operating characteristic curve) scores. The results suggest that the proposed CRF-based model without L₁-norm regularization (lasso) outperforms the existing model, both with and without the lasso, for several types of input observations to the CRFs.

Conclusions: We proposed a novel stochastic model for predicting protein-RNA residue-base contacts and improved the prediction accuracy in terms of the AUC score. This implies that more dependency relationships in a CRF could be controlled with fewer parameters.
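
The evaluation measure named in the Results, the AUC averaged over cross-validation folds, can be illustrated with a minimal sketch. It is not the authors' code: the function names `auc_score` and `mean_auc` and the toy labels and scores are hypothetical, and the AUC is computed through its pairwise-ranking interpretation (the fraction of positive-negative pairs in which the positive example receives the higher score, with ties counted as one half).

```python
from statistics import mean


def auc_score(labels, scores):
    """AUC of one fold: probability that a random positive outranks a random negative."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("each fold needs at least one positive and one negative example")
    # Count pairwise wins; a tie contributes one half.
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def mean_auc(folds):
    """Average the per-fold AUC over (labels, scores) pairs, one pair per fold."""
    return mean(auc_score(labels, scores) for labels, scores in folds)


# Toy data (hypothetical): label 1 marks a residue-base pair in contact, 0 otherwise.
folds = [
    ([1, 0, 1, 0], [0.9, 0.1, 0.8, 0.2]),
    ([1, 0, 1, 0], [0.9, 0.8, 0.3, 0.1]),
]
print(mean_auc(folds))
```

The quadratic pairwise loop is chosen for readability; a rank-based computation gives the same value and scales better to the large number of residue-base pairs in a real data set.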

Keywords

protein-RNA interaction / residue-base contact / conditional random field

Cite this article

Morihiro Hayashida, Noriyuki Okada, Mayumi Kamada, Hitoshi Koyano. Improving conditional random field model for prediction of protein-RNA residue-base contacts. Quant. Biol., 2018, 6(2): 155-162. https://doi.org/10.1007/s40484-018-0136-7

ACKNOWLEDGEMENTS

This work was partially supported by Grants-in-Aid #16K00392 and #16KT0020 from JSPS, Japan.

COMPLIANCE WITH ETHICS GUIDELINES

The authors Morihiro Hayashida, Noriyuki Okada, Mayumi Kamada and Hitoshi Koyano declare that they have no conflicts of interest.

This article does not contain any studies with human or animal subjects performed by any of the authors.

RIGHTS & PERMISSIONS

2018 Higher Education Press and Springer-Verlag GmbH Germany, part of Springer Nature

Received: 01 Sep 2017
Revised: 04 Dec 2017
Accepted: 13 Dec 2017
Online First Date: 09 May 2018
Published: 11 Jun 2018
Issue Date: 11 Jun 2018
