De novo assembly of transcriptome from next-generation sequencing data

Xuan Li, Yimeng Kong, Qiong-Yi Zhao, Yuan-Yuan Li, Pei Hao

Quant. Biol. ›› 2016, Vol. 4 ›› Issue (2) : 94-105. DOI: 10.1007/s40484-016-0069-y
REVIEW


Abstract

Reconstructing a transcriptome by de novo assembly of next-generation sequencing (NGS) short reads provides an essential means to catalog expressed genes, identify splicing isoforms, and capture the expression details of transcripts in organisms with no available reference genome. De novo transcriptome assembly faces many unique challenges, including alternative splicing, expression levels that span a dynamic range of several orders of magnitude, and artifacts introduced by reverse transcription. In this review, we illustrate the overall strategy for applying the De Bruijn Graph (DBG) approach to de novo transcriptome assembly. We further analyze parameters that have proven critical in DBG-based transcriptome assembly; among them, k-mer length, read coverage depth, genome complexity, and the performance of different programs are addressed in greater detail. A multi-k-mer strategy that balances efficiency and sensitivity is discussed and highly recommended for de novo transcriptome assembly. Future directions point to combining NGS with third-generation sequencing technology, which would greatly enhance the power of de novo transcriptomics studies.
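
As a rough illustration of the DBG idea and of why k-mer length matters, the following minimal Python sketch builds a De Bruijn graph from short reads at several k values and counts branching nodes. It is not taken from the review or from any assembler it discusses: the function names (build_dbg, count_branch_nodes, multi_k_summary) and the toy reads are hypothetical, and the sketch ignores reverse complements, sequencing errors, and coverage-based filtering that real tools would need to handle.

```python
from collections import defaultdict


def build_dbg(reads, k):
    """Build a simple De Bruijn graph from reads.

    Nodes are (k-1)-mers and each observed k-mer is an edge from its
    prefix to its suffix. Returns {prefix: {suffix: k-mer count}}.
    """
    graph = defaultdict(lambda: defaultdict(int))
    for read in reads:
        # Slide a window of length k across the read.
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer[:-1]][kmer[1:]] += 1
    return graph


def count_branch_nodes(graph):
    """Count nodes with more than one distinct outgoing edge."""
    return sum(1 for successors in graph.values() if len(successors) > 1)


def multi_k_summary(reads, ks):
    """For each k, report (distinct k-mer edges, branching nodes)."""
    summary = {}
    for k in ks:
        graph = build_dbg(reads, k)
        n_edges = sum(len(successors) for successors in graph.values())
        summary[k] = (n_edges, count_branch_nodes(graph))
    return summary


if __name__ == "__main__":
    # Hypothetical toy reads, invented purely for illustration.
    toy_reads = ["ATGGCGTACGT", "GCGTACGTTAG", "ATGGCGTACCA", "GCGTACCATGA"]
    for k, (n_edges, n_branches) in multi_k_summary(toy_reads, [4, 6, 8]).items():
        print(f"k={k}: {n_edges} distinct k-mers, {n_branches} branching nodes")
```

Comparing the summaries across k values gives a feel for the trade-off that a multi-k-mer strategy tries to balance: shorter k-mers connect more reads but create more ambiguous branches, while longer k-mers reduce ambiguity at the cost of requiring longer overlaps between reads.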

Keywords

transcriptome / de novo assembly / De Bruijn Graph / next generation sequencing / k-mer length / RNA splicing / performance

Cite this article

Xuan Li, Yimeng Kong, Qiong-Yi Zhao, Yuan-Yuan Li, Pei Hao. De novo assembly of transcriptome from next-generation sequencing data. Quant. Biol., 2016, 4(2): 94‒105 https://doi.org/10.1007/s40484-016-0069-y

ABBREVIATIONS

CAGE, cap analysis of gene expression; cDNA, complementary DNA; CDS, coding sequence; CPU, central processing unit; DBG, De Bruijn Graph; MPSS, massively parallel signature sequencing; NGS, next generation sequencing; ORF, open reading frame; RMBT, reads mapped back to assembled transcripts; SAGE, serial analysis of gene expression; SNV, single nucleotide variation; SOLiD, sequencing by oligonucleotide ligation and detection; UTR, untranslated region

ACKNOWLEDGEMENTS

This work is supported in part by grants from the National Basic Research Program of China (Nos. 2012CB316501 and 2013CB127000) and the National Natural Science Foundation of China (Nos. 31571310 and 31271409).

COMPLIANCE WITH ETHICS GUIDELINES

The authors Xuan Li, Yimeng Kong, Qiong-Yi Zhao, Yuan-Yuan Li and Pei Hao declare that they have no conflicts of interest.
This article does not contain any studies with human or animal subjects performed by any of the authors.

RIGHTS & PERMISSIONS

2016 Higher Education Press and Springer-Verlag Berlin Heidelberg