<em><bold></bold>De novo</em> assembly of transcriptome from next-generation sequencing data

Xuan Li; Yimeng Kong; Qiong-Yi Zhao; Yuan-Yuan Li; Pei Hao

doi:10.1007/s40484-016-0069-y

PDF(212 KB)

Quant. Biol. ›› 2016, Vol. 4 ›› Issue (2) : 94-105. DOI: 10.1007/s40484-016-0069-y

REVIEW

De novo assembly of transcriptome from next-generation sequencing data

Author information +

History +

Abstract

Reconstruction of transcriptome by de novo assembly from next generation sequencing (NGS) short-sequence reads provides an essential mean to catalog expressed genes, identify splicing isoforms, and capture the expression detail of transcripts for organisms with no reference genome available. De novo transcriptome assembly faces many unique challenges, including alternative splicing, variable expression level covering a dynamic range of several orders of magnitude, artifacts introduced by reverse transcription, etc. In the current review, we illustrate the grand strategy in applying De Bruijn Graph (DBG) approach in de novo transcriptome assembly. We further analyze many parameters proven critical in transcriptome assembly using DBG. Among them, k-mer length, coverage depth of reads, genome complexity, performance of different programs are addressed in greater details. A multi-k-mer strategy balancing efficiency and sensitivity is discussed and highly recommended for de novo transcriptome assembly. Future direction points to the combination of NGS and third generation sequencing technology that would greatly enhance the power of de novo transcriptomics study.

Graphical abstract

Keywords

transcriptome / de novo assembly / De Bruijn Graph / next generation sequencing / k-mer length / RNA splicing / performance

Cite this article

EndNote

Ris (Procite)

Bibtex

Download citation ▾

Xuan Li, Yimeng Kong, Qiong-Yi Zhao, Yuan-Yuan Li, Pei Hao. De novo assembly of transcriptome from next-generation sequencing data. Quant. Biol., 2016, 4(2): 94‒105 https://doi.org/10.1007/s40484-016-0069-y

References

Publishing order | Descend order by publishing year | Descend order by cited within

[1]	Sanger, F., Nicklen, S. and Coulson, A. R. (1977) DNA sequencing with chain-terminating inhibitors. Proc. Natl. Acad. Sci. USA, 74, 5463–5467 CrossRef Google scholar

[2]	Kheterpal, I., Scherer, J. R., Clark, S. M., Radhakrishnan, A., Ju, J., Ginther, C. L., Sensabaugh, G. F. and Mathies, R. A. (1996) DNA sequencing using a four-color confocal fluorescence capillary array scanner. Electrophoresis, 17, 1852–1859 CrossRef Google scholar

[3]	International Human Genome Sequencing Consortium (2004) Finishing the euchromatic sequence of the human genome. Nature, 431, 931–945 CrossRef Google scholar

[4]	Margulies, M., Egholm, M., Altman, W. E., Attiya, S., Bader, J. S., Bemben, L. A., Berka, J., Braverman, M. S., Chen, Y.-J., Chen, Z., (2005) Genome sequencing in microfabricated high-density picolitre reactors. Nature, 437, 376–380

[5]	Bentley, D. R., Balasubramanian, S., Swerdlow, H. P., Smith, G. P., Milton, J., Brown, C. G., Hall, K. P., Evers, D. J., Barnes, C. L., Bignell, H. R., (2008) Accurate whole human genome sequencing using reversible terminator chemistry. Nature, 456, 53–59 CrossRef Google scholar

[6]	Valouev, A., Ichikawa, J., Tonthat, T., Stuart, J., Ranade, S., Peckham, H., Zeng, K., Malek, J. A., Costa, G., McKernan, K., (2008) A high-resolution, nucleosome position map of C. elegans reveals a lack of universal sequence-dictated positioning. Genome Res., 18, 1051–1063 CrossRef Google scholar

[7]	Metzker, M. L. (2010) Sequencing technologies—the next generation. Nat. Rev. Genet., 11, 31–46 CrossRef Google scholar

[8]	Morozova, O., Hirst, M. and Marra, M. A. (2009) Applications of new sequencing technologies for transcriptome analysis. Annu. Rev. Genomics Hum. Genet., 10, 135–151 CrossRef Google scholar

[9]	Shendure, J. and Ji, H. (2008) Next-generation DNA sequencing. Nat. Biotechnol., 26, 1135–1145 CrossRef Google scholar

[10]	Mardis, E. R. (2008) The impact of next-generation sequencing technology on genetics. Trends Genet., 24, 133–141 CrossRef Google scholar

[11]	Graveley, B. R., Brooks, A. N., Carlson, J. W., Duff, M. O., Landolin, J. M., Yang, L., Artieri, C. G., van Baren, M. J., Boley, N., Booth, B. W., (2011) The developmental transcriptome of Drosophila melanogaster. Nature, 471, 473–479 CrossRef Google scholar

[12]	Li, C.-F., Zhu, Y., Yu, Y., Zhao, Q.-Y., Wang, S.-J., Wang, X.-C., Yao, M.-Z., Luo, D., Li, X., Chen, L., (2015) Global transcriptome and gene regulation network for secondary metabolite biosynthesis of tea plant (Camellia sinensis). BMC Genomics, 16:560

[13]	Wang, X. C., Zhao, Q. Y., Ma, C. L., Zhang, Z. H., Cao, H. L., Kong, Y. M., Yue, C., Hao, X. Y., Chen, L., Ma, J. Q., (2013) Global transcriptome profiles of Camellia sinensis during cold acclimation. BMC Genomics, 14, 415 CrossRef Google scholar

[14]	Pickrell, J. K., Marioni, J. C., Pai, A. A., Degner, J. F., Engelhardt, B. E., Nkadori, E., Veyrieras, J.-B., Stephens, M., Gilad, Y. and Pritchard, J. K. (2010) Understanding mechanisms underlying human gene expression variation with RNA sequencing. Nature, 464, 768–772 CrossRef Google scholar

[15]	Nagalakshmi, U., Wang, Z., Waern, K., Shou, C., Raha, D., Gerstein, M. and Snyder, M. (2008) The transcriptional landscape of the yeast genome defined by RNA sequencing. Science, 320, 1344–1349 CrossRef Google scholar

[16]	Necsulea, A., Soumillon, M., Warnefors, M., Liechti, A., Daish, T., Zeller, U., Baker, J. C., Grtzner, F. and Kaessmann, H. (2014) The evolution of lncRNA repertoires and expression patterns in tetrapods. Nature, 505, 635–640 CrossRef Google scholar

[17]	Nam, J. W. and Bartel, D. P. (2012) Long noncoding RNAs in C. elegans. Genome Res., 22, 2529–2540 CrossRef Google scholar

[18]	Cabili, M. N., Trapnell, C., Goff, L., Koziol, M., Tazon-Vega, B., Regev, A. and Rinn, J. L. (2011) Integrative annotation of human large intergenic noncoding RNAs reveals global properties and specific subclasses. Genes Dev., 25, 1915–1927 CrossRef Google scholar

[19]	Chen, X., Gao, C., Li, H., Huang, L., Sun, Q., Dong, Y., Tian, C., Gao, S., Dong, H., Guan, D., (2010) Identification and characterization of microRNAs in raw milk during different periods of lactation, commercial fluid, and powdered milk products. Cell Res., 20, 1128–1137 CrossRef Google scholar

[20]	Marquez, Y., Brown, J. W. S., Simpson, C., Barta, A. and Kalyna, M. (2012) Transcriptome survey reveals increased complexity of the alternative splicing landscape in Arabidopsis. Genome Res., 22, 1184–1195 CrossRef Google scholar

[21]	Shao, W., Zhao, Q. Y., Wang, X. Y., Xu, X. Y., Tang, Q., Li, M., Li, X. and Xu, Y. Z. (2012) Alternative splicing and trans-splicing events revealed by analysis of the Bombyx mori transcriptome. RNA, 18, 1395–1407 CrossRef Google scholar

[22]	Barbosa-Morais, N. L., Irimia, M., Pan, Q., Xiong, H. Y., Gueroussov, S., Lee, L. J., Slobodeniuc, V., Kutter, C., Watt, S., Colak, R., (2012) The evolutionary landscape of alternative splicing in vertebrate species. Science, 338, 1587–1593 CrossRef Google scholar

[23]	Xu, P., Kong, Y., Song, D., Huang, C., Li, X. and Li, L. (2014) Conservation and functional influence of alternative splicing in wood formation of Populus and Eucalyptus. BMC Genomics, 15, 780 CrossRef Google scholar

[24]	Trapnell, C., Hendrickson, D. G., Sauvageau, M., Goff, L., Rinn, J. L. and Pachter, L. (2012) Differential analysis of gene regulation at transcript resolution with RNA-seq. Nat. Biotechnol., 31, 46–53 CrossRef Google scholar

[25]	Adams, M. D., Kelley, J. M., Gocayne, J. D., Dubnick, M., Polymeropoulos, M. H., Xiao, H., Merril, C. R., Wu, A., Olde, B., Moreno, R. F., (1991) Complementary DNA sequencing: expressed sequence tags and human genome project. Science, 252, 1651–1656 CrossRef Google scholar

[26]	Aaronson, J. S., Eckman, B., Blevins, R. A., Borkowski, J. A., Myerson, J., Imran, S. and Elliston, K. O. (1996) Toward the development of a gene index to the human genome: an assessment of the nature of high-throughput EST sequence data. Genome Res., 6, 829–845 CrossRef Google scholar

[27]	Kan, Z. Y., Rouchka, E. C., Gish, W. R. and States, D. J. (2001) Gene structure prediction and alternative splicing analysis using genomically aligned ESTs. Genome Res., 11, 889–900 CrossRef Google scholar

[28]	Modrek, B., Resch, A., Grasso, C. and Lee, C. (2001) Genome-wide detection of alternative splicing in expressed sequences of human genes. Nucleic Acids Res., 29, 2850–2859 CrossRef Google scholar

[29]	Velculescu, V. E., Zhang, L., Vogelstein, B. and Kinzler, K. W. (1995) Serial analysis of gene-expression. Science, 270, 484–487 CrossRef Google scholar

[30]

Alvarez, H., Corvalan, A., Roa, J. C., Argani, P., Murillo, F., Edwards, J., Beaty, R., Feldmann, G., Hong, S. M., Mullendore, M., (2008) Serial analysis of gene expression identifies connective tissue growth factor expression as a prognostic biomarker in gallbladder cancer. Clin. Cancer Res., 14, 2631–2638

CrossRef Google scholar

[31]	Horan, M. P. (2009) Application of serial analysis of gene expression to the study of human genetic disease. Hum. Genet., 126, 605–614 CrossRef Google scholar

[32]	Honda, H., Barrueto, F. F., Gogusev, J., Im, D. D. and Morin, P. J. (2008) Serial analysis of gene expression reveals differential expression between endometriosis and normal endometrium. Possible roles for AXL and SHC1 in the pathogenesis of endometriosis. Reprod. Biol. Endocrinol., 6–59

[33]	Kodzius, R., Kojima, M., Nishiyori, H., Nakamura, M., Fukuda, S., Tagami, M., Sasaki, D., Imamura, K., Kai, C., Harbers, M., (2006) CAGE: cap analysis of gene expression. Nat. Methods, 3, 211–222 CrossRef Google scholar

[34]	Harbers, M. and Carninci, P. (2005) Tag-based approaches for transcriptome research and genome annotation. Nat. Methods, 2, 495–502 CrossRef Google scholar

[35]

Shiraki, T., Kondo, S., Katayama, S., Waki, K., Kasukawa, T., Kawaji, H., Kodzius, R., Watahiki, A., Nakamura, M., Arakawa, T., (2003) Cap analysis gene expression for high-throughput analysis of transcriptional starting point and identification of promoter usage. Proc. Natl. Acad. Sci. USA, 100, 15776–15781

CrossRef Google scholar

[36]	Maekawa, S., Matsumoto, A., Takenaka, Y. and Matsuda, H. (2007) Tissue-specific functions based on information content of gene ontology using cap analysis gene expression. Med. Biol. Eng. Comput., 45, 1029–1036 CrossRef Google scholar

[37]	Brenner, S., Johnson, M., Bridgham, J., Golda, G., Lloyd, D. H., Johnson, D., Luo, S. J., McCurdy, S., Foy, M., Ewan, M., (2000) Gene expression analysis by massively parallel signature sequencing (MPSS) on microbead arrays. Nat. Biotechnol., 18, 630–634 CrossRef Google scholar

[38]	Ozsolak, F. and Milos, P. M. (2011) RNA sequencing: advances, challenges and opportunities. Nat. Rev. Genet., 12, 87–98 CrossRef Google scholar

[39]	Wang, Z., Gerstein, M. and Snyder, M. (2009) RNA-Seq: a revolutionary tool for transcriptomics. Nat. Rev. Genet., 10, 57–63 CrossRef Google scholar

[40]	Schena, M., Shalon, D., Davis, R. W. and Brown, P. O. (1995) Quantitative monitoring of gene-expression patterns with a complementary-dna microarray. Science, 270, 467–470 CrossRef Google scholar

[41]	Okoniewski, M. J. and Miller, C. J. (2006) Hybridization interactions between probesets in short oligo microarrays lead to spurious correlations. BMC Bioinformatics, 7, 276

[42]	Pauli, A., Valen, E., Lin, M. F., Garber, M., Vastenhouw, N. L., Levin, J. Z., Fan, L., Sandelin, A., Rinn, J. L., Regev, A., (2012) Systematic identification of long noncoding RNAs expressed during zebrafish embryogenesis. Genome Res., 22, 577–591 CrossRef Google scholar

[43]	Wang, E. T., Sandberg, R., Luo, S., Khrebtukova, I., Zhang, L., Mayr, C., Kingsmore, S. F., Schroth, G. P. and Burge, C. B. (2008) Alternative isoform regulation in human tissue transcriptomes. Nature, 456, 470–476 CrossRef Google scholar

[44]	Filichkin, S. A., Priest, H. D., Givan, S. A., Shen, R., Bryant, D. W., Fox, S. E., Wong, W. K. and Mockler, T. C. (2010) Genome-wide mapping of alternative splicing in Arabidopsis thaliana. Genome Res., 20, 45–58 CrossRef Google scholar

[45]	Keren, H., Lev-Maor, G. and Ast, G. (2010) Alternative splicing and evolution: diversification, exon definition and function. Nat. Rev. Genet., 11, 345–355 CrossRef Google scholar

[46]	Mamanova, L., Andrews, R. M., James, K. D., Sheridan, E. M., Ellis, P. D., Langford, C. F., Ost, T. W. B., Collins, J. E. and Turner, D. J. (2010) FRT-seq: amplification-free, strand-specific transcriptome sequencing. Nat. Methods, 7, 130–132 CrossRef Google scholar

[47]	Faghihi, M. A. and Wahlestedt, C. (2009) Regulatory roles of natural antisense transcripts. Nat. Rev. Mol. Cell Biol., 10, 637–643 CrossRef Google scholar

[48]

Yamashita, R., Sathira, N. P., Kanai, A., Tanimoto, K., Arauchi, T., Tanaka, Y., Hashimoto, S. i., Sugano, S., Nakai, K. and Suzuki, Y. (2011) Genome-wide characterization of transcriptional start sites in humans by integrative transcriptome analysis. Genome Res., 21, 775–789

CrossRef Google scholar

[49]	Zhang, S. J., Liu, C. J., Yu, P., Zhong, X., Chen, J. Y., Yang, X., Peng, J., Yan, S., Wang, C., Zhu, X., (2014) Evolutionary interrogation of human biology in well-annotated genomic framework of Rhesus Macaque. Mol. Biol. Evol., 31, 1309–1324 CrossRef Google scholar

[50]	Derti, A., Garrett-Engele, P., MacIsaac, K. D., Stevens, R. C., Sriram, S., Chen, R., Rohl, C. A., Johnson, J. M. and Babak, T. (2012) A quantitative atlas of polyadenylation in five mammals. Genome Res., 22, 1173–1183 CrossRef Google scholar

[51]	Mortazavi, A., Williams, B. A., McCue, K., Schaeffer, L. and Wold, B. (2008) Mapping and quantifying mammalian transcriptomes by RNA-Seq. Nat. Methods, 5, 621–628 CrossRef Google scholar

[52]	Jia, G., Huang, X., Zhi, H., Zhao, Y., Zhao, Q., Li, W., Chai, Y., Yang, L., Liu, K., Lu, H., (2013) A haplotype map of genomic variations and genome-wide association studies of agronomic traits in foxtail millet (Setaria italica). Nat. Genet., 45, 957–961 CrossRef Google scholar

[53]	Kumar, S., Banks, T. W. and Cloutier, S. (2012) SNP discovery through next-generation sequencing and its applications. Int. J. Plant Genomics, 2012, 1–15

[54]	Ramaswami, G., Zhang, R., Piskol, R., Keegan, L. P., Deng, P., O’Connell, M. A. and Li, J. B. (2013) Identifying RNA editing sites using RNA sequencing data alone. Nat. Methods, 10, 128–132 CrossRef Google scholar

[55]	Ramaswami, G., Lin, W., Piskol, R., Tan, M. H., Davis, C. and Li, J. B. (2012) Accurate identification of human Alu and non-Alu RNA editing sites. Nat. Methods, 9, 579–581 CrossRef Google scholar

[56]	Ward, J. A., Ponnala, L. and Weber, C. A. (2012) Strategies for transcriptome analysis in nonmodel plants. Am. J. Bot., 99, 267–276 CrossRef Google scholar

[57]	Duan, J. L., Xia, C., Zhao, G. Y., Jia, J. Z. and Kong, X. Y. (2012) Optimizing de novo common wheat transcriptome assembly using short-read RNA-Seq data. BMC Genomics, 13, 392

[58]	Pan, Q., Shai, O., Lee, L. J., Frey, B. J. and Blencowe, B. J. (2008) Deep surveying of alternative splicing complexity in the human transcriptome by high-throughput sequencing. Nat. Genet., 40, 1413–1415 CrossRef Google scholar

[59]	Zhang, G., Guo, G., Hu, X., Zhang, Y., Li, Q., Li, R., Zhuang, R., Lu, Z., He, Z., Fang, X., (2010) Deep RNA sequencing at single base-pair resolution reveals high complexity of the rice transcriptome. Genome Res., 20, 646–654 CrossRef Google scholar

[60]	Allen, M. A., Hillier, L. W., Waterston, R. H. and Blumenthal, T. (2011) A global analysis of C. elegans trans-splicing. Genome Res., 21, 255–264 CrossRef Google scholar

[61]	McManus, C. J., Duff, M. O., Eipper-Mains, J. and Graveley, B. R. (2010) Global analysis of trans-splicing in Drosophila. Proc. Natl. Acad. Sci. USA, 107, 12975–12979 CrossRef Google scholar

[62]	Kong, Y., Zhou, H., Yu, Y., Chen, L., Hao, P. and Li, X. (2015) The evolutionary landscape of intergenic trans-splicing events in insects. Nat. Commun., 6, 8734 CrossRef Google scholar

[63]

Derrien, T., Johnson, R., Bussotti, G., Tanzer, A., Djebali, S., Tilgner, H., Guernec, G., Martin, D., Merkel, A., Knowles, D. G., (2012) The GENCODE v7 catalog of human long noncoding RNAs: analysis of their gene structure, evolution, and expression. Genome Res., 22, 1775–1789

CrossRef Google scholar

[64]	Nacu, S., Yuan, W., Kan, Z., Bhatt, D., Rivers, C., Stinson, J., Peters, B. A., Modrusan, Z., Jung, K., Seshagiri, S., (2011) Deep RNA sequencing analysis of readthrough gene fusions in human prostate adenocarcinoma and reference samples. BMC Med. Genomics, 4, 11 CrossRef Google scholar

[65]	Rung, J. and Brazma, A. (2012) Reuse of public genome-wide gene expression data. Nat. Rev. Genet., 14, 89–99 CrossRef Google scholar

[66]	Schliesky, S., Gowik, U., Weber, A. P. M. and Braeutigam, A. (2012) RNA-seq assembly — are we there yet? Front. Plant Sci., 3, 220

[67]	He, W., You, M., Vasseur, L., Yang, G., Xie, M., Cui, K., Bai, J., Liu, C., Li, X., Xu, X., (2012) Developmental and insecticide-resistant insights from the de novo assembled transcriptome of the diamondback moth, Plutella xylostella. Genomics, 99, 169–177 CrossRef Google scholar

[68]	Zhan, S., Merlin, C., Boore, J. L. and Reppert, S. M. (2011) The monarch butterfly genome yields insights into long-distance migration. Cell, 147, 1171–1185 CrossRef Google scholar

[69]	Akbari, O. S., Antoshechkin, I., Amrhein, H., Williams, B., Diloreto, R., Sandler, J. and Hay, B. A. (2013) The developmental transcriptome of the mosquito Aedes aegypti, an invasive species and major arbovirus vector. G3, 3, 1493–1509 CrossRef Google scholar

[70]	Merkin, J., Russell, C., Chen, P. and Burge, C. B. (2012) Evolutionary dynamics of gene and isoform regulation in mammalian tissues. Science, 338, 1593–1599 CrossRef Google scholar

[71]	Grabherr, M. G., Haas, B. J., Yassour, M., Levin, J. Z., Thompson, D. A., Amit, I., Adiconis, X., Fan, L., Raychowdhury, R., Zeng, Q., (2011) Full-length transcriptome assembly from RNA-Seq data without a reference genome. Nat. Biotechnol., 29, 644–652 CrossRef Google scholar

[72]	Pevzner, P. A., Tang, H. X. and Waterman, M. S. (2001) An Eulerian path approach to DNA fragment assembly. Proc. Natl. Acad. Sci. USA, 98, 9748–9753 CrossRef Google scholar

[73]	Batzoglou, S. (2004). Algorithmic challenges in mammalian whole-genome assembly. In Encyclopedia of Genetics, Genomics, Proteomics and Bioinformatics. John Wiley & Sons, Ltd

[74]	Sundquist, A., Ronaghi, M., Tang, H., Pevzner, P. and Batzoglou, S. (2007) Whole-genome sequencing and assembly with high-throughput, short-read technologies. PLoS One, 2, e484

[75]	Warren, R. L., Sutton, G. G., Jones, S. J. M. and Holt, R. A. (2007) Assembling millions of short DNA sequences using SSAKE. Bioinformatics, 23, 500–501 CrossRef Google scholar

[76]	Dohm, J. C., Lottaz, C., Borodina, T. and Himmelbauer, H. (2007) SHARCGS, a fast and highly accurate short-read assembly algorithm for de novo genomic sequencing. Genome Res., 17, 1697–1706 CrossRef Google scholar

[77]	Jeck, W. R., Reinhardt, J. A., Baltrus, D. A., Hickenbotham, M. T., Magrini, V., Mardis, E. R., Dangl, J. L. and Jones, C. D. (2007) Extending assembly of short DNA sequences to handle error. Bioinformatics, 23, 2942–2944 CrossRef Google scholar

[78]	Zerbino, D. R. and Birney, E. (2008) Velvet: algorithms for de novo short read assembly using de Bruijn graphs. Genome Res., 18, 821–829 CrossRef Google scholar

[79]	Simpson, J. T., Wong, K., Jackman, S. D., Schein, J. E., Jones, S. J. M. and Birol, I. (2009) ABySS: a parallel assembler for short read sequence data. Genome Res., 19, 1117–1123 CrossRef Google scholar

[80]	Li, R., Zhu, H., Ruan, J., Qian, W., Fang, X., Shi, Z., Li, Y., Li, S., Shan, G., Kristiansen, K., (2010) De novo assembly of human genomes with massively parallel short read sequencing. Genome Res., 20, 265–272 CrossRef Google scholar

[81]	Surget-Groba, Y. and Montoya-Burgos, J. I. (2010) Optimization of de novo transcriptome assembly from next-generation sequencing data. Genome Res., 20, 1432–1440 CrossRef Google scholar

[82]	Birol, I., Jackman, S. D., Nielsen, C. B., Qian, J. Q., Varhol, R., Stazyk, G., Morin, R. D., Zhao, Y., Hirst, M., Schein, J. E., (2009) De novo transcriptome assembly with ABySS. Bioinformatics, 25, 2872–2877 CrossRef Google scholar

[83]	Robertson, G., Schein, J., Chiu, R., Corbett, R., Field, M., Jackman, S. D., Mungall, K., Lee, S., Okada, H. M., Qian, J. Q., (2010) De novo assembly and analysis of RNA-seq data. Nat. Methods, 7, 909–912 CrossRef Google scholar

[84]	Schulz, M. H., Zerbino, D. R., Vingron, M. and Birney, E. (2012) Oases: robust de novo RNA-seq assembly across the dynamic range of expression levels. Bioinformatics, 28, 1086–1092 CrossRef Google scholar

[85]	Grabherr, M. G., Haas, B. J., Yassour, M., Levin, J. Z., Thompson, D. A., Amit, I., Adiconis, X., Fan, L., Raychowdhury, R., Zeng, Q., (2011) Full-length transcriptome assembly from RNA-Seq data without a reference genome. Nat. Biotechnol., 29, 644–652 CrossRef Google scholar

[86]	Schuster, S. C. (2008) Next-generation sequencing transforms today’s biology. Nat. Methods, 5, 16–18 CrossRef Google scholar

[87]	Zhao, Q.-Y., Wang, Y., Kong, Y.-M., Luo, D., Li, X. and Hao, P. (2011) Optimizing de novo transcriptome assembly from short-read RNA-Seq data: a comparative study. BMC Bioinformatics, 12, S2 CrossRef Google scholar

[88]

Braeutigam, A., Kajala, K., Wullenweber, J., Sommer, M., Gagneul, D., Weber, K. L., Carr, K. M., Gowik, U., Mass, J., Lercher, M. J., (2011) An mRNA blueprint for C-4 photosynthesis derived from comparative transcriptomics of closely related C-3 and C-4 species. Plant Physiol., 155, 142–156

CrossRef Google scholar

[89]	Gowik, U., Brautigam, A., Weber, K. L., Weber, A. P. M. and Westhoff, P. (2011) Evolution of C-4 photosynthesis in the genus Flaveria: how many and which genes does it take to make C-4? Plant Cell, 23, 2087–2105 CrossRef Google scholar

[90]	Wang, Y., Yu, Y., Pan, B., Hao, P., Li, Y., Shao, Z., Xu, X. and Li, X. (2012) Optimizing hybrid assembly of next-generation sequence data from Enterococcus faecium: a microbe with highly divergent genome. BMC Syst. Biol., 6(Suppl 3), S21 CrossRef Google scholar

[91]	Falgueras, J., Lara, A. J., Fernandez-Pozo, N., Canton, F. R., Perez-Trabado, G. and Claros, M. G. (2010) SeqTrim: a high-throughput pipeline for preprocessing any type of sequence reads. BMC Bioinformatics, 11, 38 CrossRef Google scholar

[92]	Lassmann, T., Hayashizaki, Y. and Daub, C. O. (2009) TagDust-a program to eliminate artifacts from next generation sequencing data. Bioinformatics, 25, 2839–2840 CrossRef Google scholar

[93]	Martin, J., Bruno, V. M., Fang, Z., Meng, X., Blow, M., Zhang, T., Sherlock, G., Snyder, M. and Wang, Z. (2010) Rnnotator: an automated de novo transcriptome assembly pipeline from stranded RNA-Seq reads. BMC Genomics, 11,663

[94]	Shi, H., Schmidt, B., Liu, W. and Mueller-Wittig, W. (2010) A parallel algorithm for error correction in high-throughput short-read data on CUDA-enabled graphics hardware. J. Comput. Biol., 17, 603–615 CrossRef Google scholar

[95]	Kelley, D. R., Schatz, M. C. and Salzberg, S. L. (2010) Quake: quality-aware detection and correction of sequencing errors. Genome Biol., 11, R116

[96]	Yang, X., Chockalingam, S. P. and Aluru, S. (2013) A survey of error-correction methods for next-generation sequencing. Brief. Bioinform., 14, 56–66 CrossRef Google scholar

[97]	Liu, B., Yuan, J., Yiu, S.-M., Li, Z., Xie, Y., Chen, Y., Shi, Y., Zhang, H., Li, Y., Lam, T.-W., (2012) COPE: an accurate k-mer-based pair-end reads connection tool to facilitate genome assembly. Bioinformatics, 28, 2870–2874 CrossRef Google scholar

[98]	Conway, T. C. and Bromage, A. J. (2011) Succinct data structures for assembling large genomes. Bioinformatics, 27, 479–486 CrossRef Google scholar

[99]	Pell, J., Hintze, A., Canino-Koning, R., Howe, A., Tiedje, J. M. and Brown, C. T. (2012) Scaling metagenome sequence assembly with probabilistic de Bruijn graphs. Proc. Natl. Acad. Sci. USA, 109, 13272–13277 CrossRef Google scholar

[100]

HannonLab. (2009) FASTX TOOLKIT. http://hannonlab.cshl.edu/fastx_toolkit/

[101]

Joshi, N. A. and Fass, J. N. (2011) Sickle: a sliding-window, adaptive, quality-based trimming tool for FastQ files (Version 1.33) [Software]. Available athttps://github.com/najoshi/sickle

[102]

Andrews, S. (2010). FastQC. http://www.bioinformatics.babraham.ac.uk/projects/fastqc/

[103]

Lohse, M., Bolger, A. M., Nagel, A., Fernie, A. R., Lunn, J. E., Stitt, M. and Usadel, B. (2012) RobiNA: a user-friendly, integrated software solution for RNA-Seq-based transcriptomics. Nucleic Acids Res., 40, W622–W627

CrossRef Google scholar

[104]

Hansen, M. A., Oey, H., Fernandez-Valverde, S., Jung, C.-H. and Mattick, J. S. (2008). Biopieces: a bioinformatics toolset and framework. In 19^th International Conference on Genome Informatics

[105]

Modolo, L. and Lerat, E. (2015) UrQt: an efficient software for the unsupervised quality trimming of NGS data. BMC Bioinformatics, 16, 137

CrossRef Google scholar

[106]

Riesgo, A., Perez-Porro, A. R., Carmona, S., Leys, S. P. and Giribet, G. (2012) Optimization of preservation and storage time of sponge tissues to obtain quality mRNA for next-generation sequencing. Mol. Ecol. Resour., 12, 312–322

CrossRef Google scholar

[107]

Looso, M., Preussner, J., Sousounis, K., Bruckskotten, M., Michel, C. S., Lignelli, E., Reinhardt, R., Hoeffner, S., Krueger, M., Tsonis, P. A.,(2013) A de novo assembly of the newt transcriptome combined with proteomic validation identifies new protein families expressed during tissue regeneration. Genome Biol., 14, R16

[108]

MacManes, M. D. (2014) On the optimal trimming of high-throughput mRNA sequence data. Front. Genet., 5, 13

CrossRef Google scholar

[109]

MacManes, M. D. and Eisen, M. B. (2013) Improving transcriptome assembly through error correction of high-throughput sequence reads. PeerJ, 1, e113

CrossRef Google scholar

[110]

Mbandi, S. K., Hesse, U., Rees, D. J. G. and Christoffels, A. (2014) A glance at quality score: implication for de novo transcriptome reconstruction of Illumina reads. Front. Genet., 5, 17

CrossRef Google scholar

[111]

Compeau, P. E. C., Pevzner, P. A. and Tesler, G. (2011) How to apply de Bruijn graphs to genome assembly. Nat. Biotechnol., 29, 987–991

CrossRef Google scholar

[112]

Blumenthal, T. (1998) Gene clusters and polycistronic transcription in eukaryotes. BioEssays, 20, 480–487

CrossRef Google scholar

[113]

Kazan, K. (2003) Alternative splicing and proteome diversity in plants: the tip of the iceberg has just emerged. Trends Plant Sci., 8, 468–471

CrossRef Google scholar

[114]

Leff, S. E. and Rosenfeld, M. G. (1986) Complex transcriptional units: diversity in gene-expression by alternative RNA processing. Annu. Rev. Biochem., 55, 1091–1117

CrossRef Google scholar

[115]

Gibbons, J. G., Janson, E. M., Hittinger, C. T., Johnston, M., Abbot, P. and Rokas, A. (2009) Benchmarking next-generation transcriptome sequencing for functional and evolutionary genomics. Mol. Biol. Evol., 26, 2731–2744

CrossRef Google scholar

[116]

Gruenheit, N., Deusch, O., Esser, C., Becker, M., Voelckel, C. and Lockhart, P. (2012) Cutoffs and k-mers: implications from a transcriptome study in allopolyploid plants. BMC Genomics, 13, 92

CrossRef Google scholar

[117]

Haas, B. J., Papanicolaou, A., Yassour, M., Grabherr, M., Blood, P. D., Bowden, J., Couger, M. B., Eccles, D., Li, B., Lieber, M., (2013) De novo transcript sequence reconstruction from RNA-seq using the trinity platform for reference generation and analysis. Nat. Protoc., 8, 1494–1512

CrossRef Google scholar

[118]

Trapnell, C., Williams, B. A., Pertea, G., Mortazavi, A., Kwan, G., van Baren, M. J., Salzberg, S. L., Wold, B. J. and Pachter, L. (2010) Transcript assembly and quantification by RNA-Seq reveals unannotated transcripts and isoform switching during cell differentiation. Nat. Biotechnol., 28, 511–515

CrossRef Google scholar

[119]

Trapnell, C., Pachter, L. and Salzberg, S. L. (2009) TopHat: discovering splice junctions with RNA-Seq. Bioinformatics, 25, 1105–1111

CrossRef Google scholar

[120]

Griffith, M., Griffith, O. L., Mwenifumbo, J., Goya, R., Morrissy, A. S., Morin, R. D., Corbett, R., Tang, M. J., Hou, Y.-C., Pugh, T. J., (2010) Alternative expression analysis by RNA sequencing. Nat. Methods, 7, 843–847

CrossRef Google scholar

[121]

Melicher, D., Torson, A., Dworkin, I. and Bowsher, J. (2014) A pipeline for the de novo assembly of the Themira biloba (Sepsidae: Diptera) transcriptome using a multiple k-mer length approach. BMC Genomics, 15, 188

CrossRef Google scholar

[122]

Francis, W. R., Christianson, L. M., Kiko, R., Powers, M. L., Shaner, N. C. and Haddock, S. H. D. (2013) A comparison across non-model animals suggests an optimal sequencing depth for de novo transcriptome assembly. BMC Genomics, 14, 167

CrossRef Google scholar

[123]

Kumar, S. and Blaxter, M. (2010) Comparing de novo assemblers for 454 transcriptome data. BMC Genomics, 11, 571

CrossRef Google scholar

[124]

Ren, X., Liu, T., Dong, J., Sun, L., Yang, J., Zhu, Y. and Jin, Q. (2012) Evaluating de bruijn graph assemblers on 454 transcriptomic data. PLoS One, 7, e51188

CrossRef Google scholar

[125]

O’Neil, S. and Emrich, S. (2013) Assessing de novo transcriptome assembly metrics for consistency and utility. BMC Genomics, 14, 465

CrossRef Google scholar

[126]

Salzberg, S. L., Phillippy, A. M., Zimin, A., Puiu, D., Magoc, T., Koren, S., Treangen, T. J., Schatz, M. C., Delcher, A. L., Roberts, M., (2012) GAGE: a critical evaluation of genome assemblies and assembly algorithms. Genome Res., 22, 557–567

CrossRef Google scholar

[127]

Mundry, M., Bornberg-Bauer, E., Sammeth, M. and Feulner, P. G. D. (2012) Evaluating characteristics of de novo assembly software on 454 transcriptome data: a simulation approach. PLoS One, 7, e31410

CrossRef Google scholar

[128]

Li, B., Fillmore, N., Bai, Y., Collins, M., Thomson, J. A., Stewart, R. and Dewey, C. N. (2014) Evaluation of de novo transcriptome assemblies from RNA-Seq data. Genome Biol., 15, 553

CrossRef Google scholar

[129]

Clark, S. C., Egan, R., Frazier, P. I. and Wang, Z. (2013) ALE: a generic assembly likelihood evaluation framework for assessing the accuracy of genome and metagenome assemblies. Bioinformatics, 29, 435–443

CrossRef Google scholar

[130]

Henschel, R., Lieber, M., Wu, L.-S., Nista, P. M., Haas, B. J. and LeDuc, R. D. (2012). Trinity RNA-Seq assembler performance optimization. In Proceedings of the 1^st Conference of the Extreme Science and Engineering Discovery Environment: Bridging from the eXtreme to the campus and beyond. 1–8

[131]

Xie, Y., Wu, G., Tang, J., Luo, R., Patterson, J., Liu, S., Huang, W., He, G., Gu, S., Li, S., (2014) SOAPdenovo-Trans: de novo transcriptome assembly with short RNA-Seq reads. Bioinformatics, 30, 1660–1666

CrossRef Google scholar

[132]

Chang, Z., Li, G., Liu, J., Zhang, Y., Ashby, C., Liu, D., Cramer, C. L. and Huang, X. (2015) Bridger: a new framework for de novo transcriptome assembly using RNA-seq data. Genome Biol., 16, 30

CrossRef Google scholar

[133]

Li, Y., Hu, Y., Bolund, L. and Wang, J. (2010) State of the art de novo assembly of human genomes from massively parallel sequencing data. Hum. Genomics, 4, 271–277

CrossRef Google scholar

[134]

Zhou, S., Liao, R. and Guan, J. (2013) When cloud computing meets bioinformatics: a review. J. Bioinform. Comput. Biol., 11, 1330002

[135]

Taylor, R. (2010) An overview of the Hadoop/MapReduce/HBase framework and its current applications in bioinformatics. BMC Bioinformatics, 11, S1

CrossRef Google scholar

[136]

Check Hayden, E. (2009) Genome sequencing: the third generation. Nature, 457, 768–769

CrossRef Google scholar

[137]

Schadt, E. E., Turner, S. and Kasarskis, A. (2010) A window into third-generation sequencing. Hum. Mol. Genet., 19, R227–R240

CrossRef Google scholar

[138]

Eid, J., Fehr, A., Gray, J., Luong, K., Lyle, J., Otto, G., Peluso, P., Rank, D., Baybayan, P., Bettman, B., (2009) Real-time DNA sequencing from single polymerase molecules. Science, 323, 133– 138

CrossRef Google scholar

[139]

Koren, S., Schatz, M. C., Walenz, B. P., Martin, J., Howard, J. T., Ganapathy, G., Wang, Z., Rasko, D. A., McCombie, W. R., Jarvis, E. D., (2012) Hybrid error correction and de novo assembly of single-molecule sequencing reads. Nat. Biotechnol., 30, 693–700

CrossRef Google scholar

[140]

English, A. C., Richards, S., Han, Y., Wang, M., Vee, V., Qu, J., Qin, X., Muzny, D. M., Reid, J. G., Worley, K. C., (2012) Mind the gap: upgrading genomes with Pacific Biosciences RS long-read sequencing technology. PLoS One, 7, e47768

CrossRef Google scholar

[141]

Ferrarini, M., Moretto, M., Ward, J. A., Šurbanovski, N., Stevanović, V., Giongo, L., Viola, R., Cavalieri, D., Velasco, R., Cestaro, A., (2013) An evaluation of the PacBio RS platform for sequencing and de novo assembly of a chloroplast genome. BMC Genomics, 14, 1–12

CrossRef Google scholar

ABBREVIATIONS

CAGE, cap analysis of gene expression; cDNA, complementary DNA; CDS, coding sequence; CPU, central processing unit; DBG, De Bruijn Graph; MPSS, massively parallel signature sequencing; NGS, next generation sequencing; ORF, open reading frame; RMBT, reads mapped back to assembled transcripts; SAGE, serial analysis of gene expression; SNV, single nucleotide variation; SOLiD, sequencing by oligonucleotide ligation and detection; UTR, untranslated region

ACKNOWLEDGEMENTS

This work is supported in part by grants from the National Basic Research Program of China (Nos. 2012CB316501 and 2013CB127000) and the National Natural Science Foundation of China (Nos. 31571310 and 31271409).

COMPLIANCE WITH ETHICS GUIDELINES

The authors Xuan Li, Yimeng Kong, Qiong-Yi Zhao, Yuan-Yuan Li and Pei Hao declare they have no conflict of interests.

This article does not contain any studies with human or animal subjects performed by any of the authors.

Funding

RIGHTS & PERMISSIONS

2016 Higher Education Press and Springer-Verlag Berlin Heidelberg

AI Summary AI Mindmap

PDF(212 KB)

Accesses

Citations

Detail

Sections

Recommended

Received	Accepted	Published
15 Nov 2015	04 Feb 2016	26 May 2016
Online First Date	Issue Date
13 May 2016	26 May 2016

About the journal

Aims & scopes

Description

Editorial board

Abstracting / Indexing

Cover gallery

Contact us

Browse

Just accepted

Online first

Latest issue

All volumes and issues

Collections

Featured articles

Most accessed

Most cited

Collections

Authors & reviewers

Online submisson

Call for papers

Editorial policy

Guidelines for authors

Download templates

Classifications via endnote

Guidelines for reviewers

Author FAQs

Abstract

Graphical abstract

Keywords

Cite this article

{{custom_sec.title}}

{{custom_sec.title}}

References

ABBREVIATIONS

ACKNOWLEDGEMENTS

COMPLIANCE WITH ETHICS GUIDELINES

RIGHTS & PERMISSIONS