Modeling and analysis of RNA-seq data: a review from a statistical perspective

Wei Vivian Li; Jingyi Jessica Li

doi:10.1007/s40484-018-0144-7

Quant. Biol., 2018, Vol. 6, Issue 3: 195–209

REVIEW

Abstract

Background: Since their invention, next-generation RNA sequencing (RNA-seq) technologies have become a powerful tool for studying the presence and quantity of RNA molecules in biological samples and have revolutionized transcriptomic studies. The analysis of RNA-seq data at four different levels (samples, genes, transcripts, and exons) involves multiple statistical and computational questions, some of which remain challenging to date.

Results: We review RNA-seq analysis tools at the sample, gene, transcript, and exon levels from a statistical perspective. We also highlight the biological and statistical questions of greatest practical importance.

Conclusions: The development of statistical and computational methods for analyzing RNA-seq data has advanced significantly over the past decade. However, methods developed to answer the same biological question often rely on diverse statistical models and perform differently under different scenarios. This review discusses and compares multiple commonly used statistical models in terms of their assumptions, in the hope of helping users select appropriate methods as needed and assisting developers in future method development.

Keywords

RNA-seq / statistical modeling / differentially expressed genes / alternatively spliced exons / isoform reconstruction and quantification

Cite this article

Wei Vivian Li, Jingyi Jessica Li. Modeling and analysis of RNA-seq data: a review from a statistical perspective. Quant. Biol., 2018, 6(3): 195–209. https://doi.org/10.1007/s40484-018-0144-7

ACKNOWLEDGEMENTS

This work was supported by the following grants: National Science Foundation DMS-1613338, NIH/NIGMS R01GM120507, PhRMA Foundation Research Starter Grant in Informatics, Johnson & Johnson WiSTEM2D Award, and Sloan Research Fellowship (to J.J.L.) and the UCLA Dissertation Year Fellowship (to W.V.L.). The authors thank Dr. Lior Pachter at California Institute of Technology and Dr. Michael I. Love at University of North Carolina at Chapel Hill for their insightful feedback.

COMPLIANCE WITH ETHICS GUIDELINES

The authors Wei Vivian Li and Jingyi Jessica Li declare that they have no conflict of interest.

This article is a review article and does not contain any studies with human or animal subjects performed by any of the authors.

RIGHTS & PERMISSIONS

© 2018 Higher Education Press and Springer-Verlag GmbH Germany, part of Springer Nature

Received: 05 Dec 2017
Revised: 23 Feb 2018
Accepted: 29 Mar 2018
Online first: 09 Aug 2018
Published: 13 Sep 2018
Issue date: 13 Sep 2018