Current challenges and solutions of de novo assembly

Xingyu Liao; Min Li; You Zou; Fang-Xiang Wu; Yi Pan; Jianxin Wang

doi:10.1007/s40484-019-0166-9


Quant. Biol. 2019, Vol. 7, Issue 2: 90-109. DOI: 10.1007/s40484-019-0166-9

REVIEW



Abstract

Background: Next-generation sequencing (NGS) technologies have fostered an unprecedented proliferation of high-throughput sequencing projects and a concomitant development of novel algorithms for the assembly of short reads. However, numerous technical and computational challenges in de novo assembly remain, even though many new ideas and solutions have been proposed to address them in both experimental and computational settings.

Results: In this review, we first briefly introduce some of the major challenges faced by NGS sequence assembly. We then analyze the characteristics of various sequencing platforms and their impact on assembly results. Next, we classify de novo assemblers according to their frameworks (overlap graph-based, de Bruijn graph-based and string graph-based), and describe the characteristics of each assembly tool and the scenarios to which it is suited. We then introduce in detail the solutions to the main challenges of de novo assembly of next-generation sequencing data, single-cell sequencing data and single-molecule sequencing (SMS) data. Finally, we discuss the application of SMS long reads in solving problems encountered in NGS assembly.

Conclusions: This review not only gives an overview of the latest methods and developments in assembly algorithms, but also provides guidelines to determine the optimal assembly algorithm for a given input sequencing data type.
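The abstract names overlap graph-based and de Bruijn graph-based frameworks without illustrating them. The following is a minimal Python sketch of the two underlying ideas, not the method of any assembler covered by the review: the function names, the toy reads, and the parameters `k` and `min_len` are all assumptions introduced here for illustration. Real assemblers additionally handle sequencing errors, reverse complements, and repeats.

```python
from collections import defaultdict


def de_bruijn_graph(reads, k):
    """Map each (k-1)-mer to the list of (k-1)-mers that follow it in some read.

    Every k-mer in a read contributes one edge: prefix (k-1)-mer -> suffix (k-1)-mer.
    Repeated k-mers yield repeated edges, so edge multiplicity reflects coverage.
    """
    graph = defaultdict(list)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer[:-1]].append(kmer[1:])
    return dict(graph)


def longest_overlap(a, b, min_len=3):
    """Length of the longest suffix of `a` that equals a prefix of `b`.

    Returns 0 when no overlap of at least `min_len` characters exists.
    Overlap graph-based assemblers connect reads using overlaps of this kind.
    """
    for n in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:n]):
            return n
    return 0


# Toy reads, invented for illustration only.
reads = ["ACGTC", "GTCAG", "CAGTT"]
graph = de_bruijn_graph(reads, k=3)
overlap = longest_overlap(reads[0], reads[1])
```

In the de Bruijn sketch the reads are never compared with one another; they are only decomposed into k-mers. This is one reason the approach is commonly used for large volumes of short reads, whereas overlap-based approaches compare reads pairwise.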

Author summary

In this review, we focus on the main challenges facing de novo assembly and their solutions. First, we introduce some of the major challenges faced by de novo assembly. Second, we analyze the characteristics of various sequencing platforms and their impact on assembly results, and describe the characteristics of each assembler and the scenarios to which it is suited. Third, we introduce in detail the solutions to the main challenges of de novo assembly. Finally, we discuss the latest methods and developments in de novo assembly.


Keywords

next-generation sequencing / single-cell sequencing / single-molecule sequencing / de novo assembly algorithms

Cite this article


Xingyu Liao, Min Li, You Zou, Fang-Xiang Wu, Yi Pan, Jianxin Wang. Current challenges and solutions of de novo assembly. Quant. Biol., 2019, 7(2): 90-109. https://doi.org/10.1007/s40484-019-0166-9



ACKNOWLEDGMENTS

This work was supported by the National Natural Science Foundation of China (Nos. 61732009, 61772557 and 61420106009), the 111 Project (No. B18059), and the Fundamental Research Funds for the Central Universities of Central South University (No. 1053320171177).

COMPLIANCE WITH ETHICS GUIDELINES

The authors Xingyu Liao, Min Li, You Zou, Fang-Xiang Wu, Yi Pan and Jianxin Wang declare that they have no conflicts of interest.

This paper is a review and does not contain any studies with human or animal subjects performed by any of the authors.

RIGHTS & PERMISSIONS

© 2019 Higher Education Press and Springer-Verlag GmbH Germany, part of Springer Nature

Received: 05 Apr 2018
Revised: 14 Jun 2018
Accepted: 16 Jun 2018
Published: 15 Jun 2019
Online First Date: 24 Apr 2019
Issue Date: 30 May 2019
