Modeling and analysis of RNA-seq data: a review from a statistical perspective

Wei Vivian Li, Jingyi Jessica Li

Quant. Biol., 2018, Vol. 6, Issue 3: 195–209. DOI: 10.1007/s40484-018-0144-7
REVIEW

Abstract

Background: Since their invention, next-generation RNA sequencing (RNA-seq) technologies have become a powerful tool for studying the presence and quantity of RNA molecules in biological samples, and they have revolutionized transcriptomic studies. The analysis of RNA-seq data at four different levels (samples, genes, transcripts, and exons) involves multiple statistical and computational questions, some of which remain challenging to date.

Results: We review RNA-seq analysis tools at the sample, gene, transcript, and exon levels from a statistical perspective. We also highlight the biological and statistical questions of greatest practical importance.

Conclusions: Statistical and computational methods for analyzing RNA-seq data have advanced significantly in the past decade. However, methods developed to answer the same biological question often rely on diverse statistical models and perform differently under different scenarios. This review discusses and compares multiple commonly used statistical models with regard to their assumptions, in the hope of helping users select appropriate methods as needed and assisting developers in future method development.

Keywords

RNA-seq / statistical modeling / differentially expressed genes / alternatively spliced exons / isoform reconstruction and quantification

Cite this article

Wei Vivian Li, Jingyi Jessica Li. Modeling and analysis of RNA-seq data: a review from a statistical perspective. Quant. Biol., 2018, 6(3): 195‒209 https://doi.org/10.1007/s40484-018-0144-7

ACKNOWLEDGEMENTS

This work was supported by the following grants: National Science Foundation DMS-1613338, NIH/NIGMS R01GM120507, PhRMA Foundation Research Starter Grant in Informatics, Johnson & Johnson WiSTEM2D Award, and Sloan Research Fellowship (to J.J.L.), and the UCLA Dissertation Year Fellowship (to W.V.L.). The authors thank Dr. Lior Pachter at California Institute of Technology and Dr. Michael I. Love at University of North Carolina at Chapel Hill for their insightful feedback.

COMPLIANCE WITH ETHICS GUIDELINES

The authors Wei Vivian Li and Jingyi Jessica Li declare that they have no conflicts of interest.
This article is a review article and does not contain any studies with human or animal subjects performed by any of the authors.

RIGHTS & PERMISSIONS

© 2018 Higher Education Press and Springer-Verlag GmbH Germany, part of Springer Nature