A Bayesian hierarchical model for analyzing methylated RNA immunoprecipitation sequencing data

Minzhe Zhang, Qiwei Li, Yang Xie

Quant. Biol., 2018, Vol. 6, Issue 3: 275–286. DOI: 10.1007/s40484-018-0149-2
METHODOLOGY ARTICLE

Abstract

Background: The recently developed technology of methylated RNA immunoprecipitation sequencing (MeRIP-seq) sheds light on the study of RNA epigenetics. Analyzing these data poses a new bioinformatics problem that calls for effective and robust peak-calling algorithms to detect mRNA methylation sites.

Methods: We propose a Bayesian hierarchical model to detect methylation sites from MeRIP-seq data. Our modeling approach has several important characteristics. First, it models the zero-inflated and over-dispersed counts with a zero-inflated negative binomial model. Second, it incorporates a hidden Markov model (HMM) to account for the spatial dependency of neighboring read enrichment. Third, our Bayesian inference allows the proposed model to borrow strength in parameter estimation, which greatly improves model stability when dealing with MeRIP-seq data with a small number of replicates. We use Markov chain Monte Carlo (MCMC) algorithms to infer all model parameters simultaneously and de novo. The R Shiny demo is available at the authors' website and the R/C++ code is available at https://github.com/liqiwei2000/BaySeqPeak.
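To make the first modeling choice concrete: under a zero-inflated negative binomial (ZINB) model, a count is a structural zero with probability π and otherwise follows a negative binomial NB(μ, φ), so P(Y = 0) = π + (1 − π)·NB(0 | μ, φ) and P(Y = y) = (1 − π)·NB(y | μ, φ) for y > 0. The sketch below assumes the common mean/size parameterization, in which Var(Y) = μ + μ²/size for the NB part; the function names and parameterization are illustrative and are not taken from the BaySeqPeak code.

```python
import math

def nb_logpmf(y: int, mu: float, size: float) -> float:
    """Log-pmf of a negative binomial with mean `mu` (> 0) and size `size` (> 0).

    Var(Y) = mu + mu**2 / size, so a small `size` means strong over-dispersion.
    """
    return (math.lgamma(y + size) - math.lgamma(size) - math.lgamma(y + 1)
            + size * math.log(size / (size + mu))
            + y * math.log(mu / (size + mu)))

def zinb_logpmf(y: int, pi: float, mu: float, size: float) -> float:
    """Log-pmf of a zero-inflated NB: with probability `pi` the count is a structural zero."""
    if y == 0:
        # A zero can come from the point mass or from the NB component itself.
        return math.log(pi + (1.0 - pi) * math.exp(nb_logpmf(0, mu, size)))
    return math.log(1.0 - pi) + nb_logpmf(y, mu, size)

# Probability of a zero with 30% zero inflation, NB mean 5 and size 2.
print(math.exp(zinb_logpmf(0, 0.3, 5.0, 2.0)))  # ≈ 0.357
```

Setting `pi = 0` recovers the ordinary NB log-pmf, so the zero-inflation parameter only adds probability mass at zero on top of the over-dispersed count model.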

Results: In simulation studies, the proposed method outperformed the competing methods exomePeak and MeTPeak, especially when the data contained an excess of zeros. In real MeRIP-seq data analysis, the proposed method identified methylation sites that were more consistent with biological knowledge and had better spatial resolution than those identified by the other methods.
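What "an excess of zeros" means can be illustrated by simulating ZINB counts and comparing the observed zero fraction with the one a Poisson model of the same mean would predict. This is a minimal sketch using only the Python standard library; the parameter values (π = 0.4, μ = 5, size = 1) are assumptions chosen for illustration and are not the simulation design used in the paper. The NB draws use the standard gamma-Poisson mixture.

```python
import math
import random

def sample_poisson(lam: float, rng: random.Random) -> int:
    """Draw a Poisson(lam) variate with Knuth's multiplication method.

    Adequate for the moderate means used here; exp(-lam) underflows for very large lam.
    """
    threshold = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= threshold:
            return k
        k += 1

def sample_zinb(n: int, pi: float, mu: float, size: float, rng: random.Random) -> list[int]:
    """Draw n counts from a zero-inflated NB with zero-inflation pi, NB mean mu and size."""
    counts = []
    for _ in range(n):
        if rng.random() < pi:
            counts.append(0)  # structural zero from the point mass
        else:
            # Gamma-Poisson mixture: lam ~ Gamma(shape=size, scale=mu/size) gives NB(mu, size).
            lam = rng.gammavariate(size, mu / size)
            counts.append(sample_poisson(lam, rng))
    return counts

rng = random.Random(2018)
pi, mu, size = 0.4, 5.0, 1.0  # illustrative values, not the paper's simulation settings
counts = sample_zinb(20000, pi, mu, size, rng)

mean = sum(counts) / len(counts)
var = sum((c - mean) ** 2 for c in counts) / (len(counts) - 1)
zero_frac = counts.count(0) / len(counts)
poisson_zero = math.exp(-mean)  # zero fraction a Poisson with the same mean would predict
print(f"mean={mean:.2f} var={var:.2f} zeros={zero_frac:.3f} poisson_zeros={poisson_zero:.3f}")
```

For these settings the theoretical mean is 3, the variance is 24 and the zero fraction is 0.5, whereas a Poisson distribution with mean 3 puts only about 5% of its mass at zero. Both the inflated variance and the inflated zero fraction are features a plain Poisson or NB model would misfit.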

Conclusions: In this study, we develop a Bayesian hierarchical model to identify methylation peaks in MeRIP-seq data. The proposed method has a competitive edge over existing methods in terms of accuracy, robustness and spatial resolution.

Author summary

Methylated RNA immunoprecipitation combined with RNA sequencing (MeRIP-seq), which can be viewed as a marriage of two well-studied techniques, ChIP-seq and RNA-seq, is changing the landscape of RNA epigenomics by enabling its study at higher resolution. We propose a Bayesian statistical model to identify transcriptome methylation sites from MeRIP-seq data. Our approach takes into account (i) the high proportion of zeros in the data caused by insufficient sequencing depth and (ii) the spatial dependence of read enrichment in neighboring regions. Compared with existing methods, our predictions are more consistent with biological knowledge and have better accuracy and spatial resolution.
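Point (ii) can be illustrated with a two-state hidden Markov model over consecutive windows of a transcript, where the hidden state is "background" or "methylation peak" and each state emits ZINB counts. Sticky self-transitions let a window borrow evidence from its neighbors, so an isolated zero inside a run of high counts can still be called part of a peak. This is a minimal sketch under loudly stated assumptions: one count per window (the actual model contrasts IP and input samples across replicates, which is ignored here), fixed hypothetical parameters, and the exact forward-backward algorithm in place of the MCMC sampling the authors use. None of the names come from the BaySeqPeak code.

```python
import math

def nb_logpmf(y, mu, size):
    """Log-pmf of a negative binomial with mean mu and size (inverse dispersion)."""
    return (math.lgamma(y + size) - math.lgamma(size) - math.lgamma(y + 1)
            + size * math.log(size / (size + mu)) + y * math.log(mu / (size + mu)))

def zinb_pmf(y, pi, mu, size):
    """Zero-inflated NB pmf: extra mass pi at zero."""
    nb = math.exp(nb_logpmf(y, mu, size))
    return pi + (1 - pi) * nb if y == 0 else (1 - pi) * nb

def forward_backward(counts, init, trans, emit):
    """Scaled forward-backward for a discrete-state HMM.

    trans[j][k] = P(state k at window t+1 | state j at window t); emit(k, y) = P(y | state k).
    Returns the posterior P(state k at window t | all counts) for every window t.
    """
    n, K = len(counts), len(init)
    alpha, scale = [], []
    for t, y in enumerate(counts):
        if t == 0:
            a = [init[k] * emit(k, y) for k in range(K)]
        else:
            prev = alpha[-1]
            a = [sum(prev[j] * trans[j][k] for j in range(K)) * emit(k, y) for k in range(K)]
        c = sum(a)  # scaling constant keeps the recursion from underflowing
        scale.append(c)
        alpha.append([v / c for v in a])
    beta = [[1.0] * K for _ in range(n)]
    for t in range(n - 2, -1, -1):
        y_next = counts[t + 1]
        b = [sum(trans[j][k] * emit(k, y_next) * beta[t + 1][k] for k in range(K))
             for j in range(K)]
        beta[t] = [v / scale[t + 1] for v in b]
    posterior = []
    for t in range(n):
        g = [alpha[t][k] * beta[t][k] for k in range(K)]
        s = sum(g)
        posterior.append([v / s for v in g])
    return posterior

# Hypothetical setting: state 0 = background, state 1 = methylation peak.
params = {0: (0.5, 2.0, 1.0), 1: (0.1, 20.0, 2.0)}  # (pi, mu, size) per state, illustrative only

def emit(k, y):
    return zinb_pmf(y, *params[k])

init = [0.9, 0.1]
trans = [[0.95, 0.05], [0.10, 0.90]]  # sticky transitions encode spatial dependence
counts = [0, 1, 0, 3, 0, 18, 25, 0, 22, 30, 2, 0, 1, 0]
post = forward_backward(counts, init, trans, emit)
for y, p in zip(counts, post):
    print(y, round(p[1], 3))  # count and posterior probability of the peak state
```

In this toy run, the zero at position 7 sits between counts of 25 and 22 and keeps a high posterior probability of belonging to the peak, because the transition matrix makes a brief excursion to background unlikely. Lowering the self-transition probabilities weakens this smoothing.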


Keywords

MeRIP-seq data / RNA epigenomics / Bayesian inference / hidden Markov model / zero-inflated negative binomial

Cite this article

Minzhe Zhang, Qiwei Li, Yang Xie. A Bayesian hierarchical model for analyzing methylated RNA immunoprecipitation sequencing data. Quant. Biol., 2018, 6(3): 275‒286 https://doi.org/10.1007/s40484-018-0149-2


ACKNOWLEDGEMENTS

The authors would like to thank Jessie Norris for helping with proofreading the manuscript. This work was partially supported by the National Institutes of Health (Nos. R01CA172211, P50CA70907, P30CA142543, R01GM-115473, R01GM117597, R15GM113157, and R01CA152301), and the Cancer Prevention and Research Institute of Texas (No. RP120732).

COMPLIANCE WITH ETHICS GUIDELINES

The authors Minzhe Zhang, Qiwei Li and Yang Xie declare that they have no conflicts of interest. This article does not contain any studies with human or animal subjects performed by any of the authors.

RIGHTS & PERMISSIONS

© 2018 Higher Education Press and Springer-Verlag GmbH Germany, part of Springer Nature