A survey on de novo assembly methods for single-molecular sequencing

Ying Chen, Chuan-Le Xiao

PDF(217 KB)
PDF(217 KB)
Quant. Biol. ›› 2020, Vol. 8 ›› Issue (3) : 203-215. DOI: 10.1007/s40484-020-0214-5
REVIEW
REVIEW

A survey on de novo assembly methods for single-molecular sequencing

Author information +
History +

Abstract

Background: The single-molecular sequencing (SMS) is under rapid development and generating increasingly long and accurate sequences. De novo assembly of genomes from SMS sequences is a critical step for many genomic studies. To scale well with the developing trends of SMS, many de novo assemblers for SMS have been released. These assembly workflows can be categorized into two different kinds: the correction-and-assembly strategy and the assembly-and-correction strategy, both of which are gaining more and more attentions.

Results: In this article we make a discussion on the characteristics of errors in SMS sequences. We then review the currently widely applied de novo assemblers for SMS sequences. We also describe computational methods relevant to de novo assembly, including the alignment methods and the error correction methods. Benchmarks are provided to analyze their performance on different datasets and to provide use guides on applying the computation methods.

Conclusion: We make a detailed review on the latest development of de novo assembly and some relevant algorithms for SMS, including their rationales, solutions and results. Besides, we provide use guides on the algorithms based on their benchmark results. Finally we conclude the review by giving some developing trends of third generation sequencing (TGS).

Graphical abstract

Keywords

third generation sequencing / single-molecular real-time sequencing / sequence alignment / sequence error correction / de novo assembly

Cite this article

Download citation ▾
Ying Chen, Chuan-Le Xiao. A survey on de novo assembly methods for single-molecular sequencing. Quant. Biol., 2020, 8(3): 203‒215 https://doi.org/10.1007/s40484-020-0214-5

References

[1]
Brown, S. D., Nagaraju, S., Utturkar, S., De Tissera, S., Segovia, S., Mitchell, W., Land, M. L., Dassanayake, A. and Köpke, M. (2014) Comparison of single-molecule sequencing and hybrid approaches for finishing the genome of Clostridium autoethanogenum and analysis of CRISPR systems in industrial relevant Clostridia. Biotechnol. Biofuels, 7, 40
CrossRef Pubmed Google scholar
[2]
Rhoads, A. and Au, K. F. (2015) PacBio sequencing and its applications. Genom. Proteom. Bioinf., 13, 278–289
CrossRef Pubmed Google scholar
[3]
Eid, J., Fehr, A., Gray, J., Luong, K., Lyle, J., Otto, G., Peluso, P., Rank, D., Baybayan, P., Bettman, B., (2009) Real-time DNA sequencing from single polymerase molecules. Science, 323, 133–138
CrossRef Pubmed Google scholar
[4]
Detter, J. C., Johnson, S. L., Bishop-Lilly, K. A., Chain, P. S., Gibbons, H. S., Minogue, T. D., Sozhamannan, S., Van Gieson, E.J. and Resnick, I. G. (2014) Nucleic acid sequencing for characterizing infectious and/or novel agents in complex samples. In: Biological Identification, pp. 3–53. Sawston: Woodhead Publishing
[5]
Wenger, A. M., Peluso, P., Rowell, W. J., Chang, P. C., Hall, R. J., Concepcion, G. T., Ebler, J., Fungtammasan, A., Kolesnikov, A., Olson, N. D., (2019) Accurate circular consensus long-read sequencing improves variant detection and assembly of a human genome. Nat. Biotechnol., 37, 1155–1162
CrossRef Pubmed Google scholar
[6]
Chin, C. S., Alexander, D. H., Marks, P., Klammer, A. A., Drake, J., Heiner, C., Clum, A., Copeland, A., Huddleston, J., Eichler, E. E., (2013) Nonhybrid, finished microbial genome assemblies from long-read SMRT sequencing data. Nat. Methods, 10, 563–569
CrossRef Pubmed Google scholar
[7]
Magi, A., Semeraro, R., Mingrino, A., Giusti, B. and D’Aurizio, R. (2018) Nanopore sequencing data analysis: state of the art, applications and challenges. Brief. Bioinformatics, 19, 1256–1272
Pubmed
[8]
Jain, M., Koren, S., Miga, K. H., Quick, J., Rand, A. C., Sasani, T. A., Tyson, J. R., Beggs, A. D., Dilthey, A. T., Fiddes, I. T., (2018) Nanopore sequencing and assembly of a human genome with ultra-long reads. Nat. Biotechnol., 36, 338–345
CrossRef Pubmed Google scholar
[9]
Magi, A., Giusti, B. and Tattini, L. (2017) Characterization of MinION nanopore data for resequencing analyses. Brief. Bioinformatics, 18, 940–953
Pubmed
[10]
Rang, F. J., Kloosterman, W. P. and de Ridder, J. (2018) From squiggle to basepair: computational approaches for improving nanopore sequencing read accuracy. Genome Biol., 19, 90
CrossRef Pubmed Google scholar
[11]
Loman, N. J., Quick, J. and Simpson, J. T. (2015) A complete bacterial genome assembled de novo using only nanopore sequencing data. Nat. Methods, 12, 733–735
CrossRef Pubmed Google scholar
[12]
Walker, B. J., Abeel, T., Shea, T., Priest, M., Abouelliel, A., Sakthikumar, S., Cuomo, C. A., Zeng, Q., Wortman, J., Young, S. K., (2014) Pilon: an integrated tool for comprehensive microbial variant detection and genome assembly improvement. PLoS One, 9, e112963
CrossRef Pubmed Google scholar
[13]
Kingan, S. B., Heaton, H., Cudini, J., Lambert, C. C., Baybayan, P., Galvin, B. D., Durbin, R., Korlach, J. and Lawniczak, M. K. N. (2019) A high-quality de novo genome assembly from a single mosquito using PacBio sequencing. Genes (Basel), 10, 62
CrossRef Pubmed Google scholar
[14]
Vaser, R., Sović I., Nagarajan, N. and Šikić, M. (2017) Fast and accurate de novo genome assembly from long uncorrected reads. Genome Res., 27, 737–746
CrossRef Pubmed Google scholar
[15]
Bashir, A., Klammer, A., Robins, W. P., Chin, C. S., Webster, D., Paxinos, E., Hsu, D., Ashby, M., Wang, S., Peluso, P., (2012) A hybrid approach for the automated finishing of bacterial genomes. Nat. Biotechnol., 30, 701–707
CrossRef Pubmed Google scholar
[16]
Koren, S., Schatz, M. C., Walenz, B. P., Martin, J., Howard, J. T., Ganapathy, G., Wang, Z., Rasko, D. A., McCombie, W. R., Jarvis, E. D., (2012) Hybrid error correction and de novo assembly of single-molecule sequencing reads. Nat. Biotechnol., 30, 693–700
CrossRef Pubmed Google scholar
[17]
Ribeiro, F. J., Przybylski, D., Yin, S., Sharpe, T., Gnerre, S., Abouelleil, A., Berlin, A. M., Montmayeur,A., Shea, T. P., Walker, B. J., (2012) Finished bacterial genomes from shotgun sequence data. Genome Res., 22, 2270–2277
CrossRef Pubmed Google scholar
[18]
Chin, C. S., Alexander, D. H., Marks, P., Klammer, A. A., Drake, J., Heiner, C., Clum, A., Copeland, A., Huddleston, J., Eichler, E. E., (2013) Nonhybrid, finished microbial genome assemblies from long-read SMRT sequencing data. Nat. Methods, 10, 563–569
CrossRef Pubmed Google scholar
[19]
Smith, T. F. and Waterman, M. S. (1981) Identification of common molecular subsequences. J. Mol. Biol., 147, 195–197
CrossRef Pubmed Google scholar
[20]
Altschul, S. F., Madden, T. L., Schäffer, A. A., Zhang, J., Zhang, Z., Miller, W. and Lipman, D. J. (1997) Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res., 25, 3389–3402
CrossRef Pubmed Google scholar
[21]
Chaisson, M. J. and Tesler, G. (2012) Mapping single molecule sequencing reads using basic local alignment with successive refinement (BLASR): application and theory. BMC Bioinformatics, 13, 238
CrossRef Pubmed Google scholar
[22]
Ferragina, P. and Manzini, G. (2000) Opportunistic data structures with applications In: Proceedings the 41st Annual Symposium on Foundations of Computer Science, pp. 390–398. IEEE
[23]
Li, H. and Durbin, R. (2010) Fast and accurate long-read alignment with Burrows-Wheeler transform. Bioinformatics, 26, 589–595
CrossRef Pubmed Google scholar
[24]
Ben, L. and Salzberg, S. L. (2012) Fast gapped-read alignment with Bowtie 2. Nat. Methods 9, 357–359
[25]
Chen, Y., Ye, W., Zhang, Y. and Xu, Y. (2015) High speed BLASTN: an accelerated MegaBLAST search tool. Nucleic Acids Res., 43, 7762–7768
CrossRef Pubmed Google scholar
[26]
Lam, T. W., Sung, W. K., Tam, S. L., Wong, C. K. and Yiu, S. M. (2008) Compressed indexing and local alignment of DNA. Bioinformatics, 24, 791–797
CrossRef Pubmed Google scholar
[27]
Abouelhoda, M. I. and Ohlebusch, E. A. (2003) Local chaining algorithm and its applications in comparative genomics. In: International Workshop on Algorithms in Bioinformatics, pp. 1–16. Berlin: Springer
[28]
Eppstein, D., Galil, Z., Giancarlo, R. and Italiano, G. F. (1992) Sparse dynamic programming I: linear cost functions. J. Assoc. Comput. Mach., 39, 519–545
CrossRef Google scholar
[29]
Myers, G. (2014) Efficient local alignment discovery amongst noisy long reads In: International Workshop on Algorithms in Bioinformatics, pp. 52–67. Berlin: Springer
[30]
Cormen, T. H., Leiserson,C. E., Rivest, R. L. and Stein, C. (2009) Introduction to Algorithms (3rd. edition), pp.197–204. Cambridge: MIT Press
[31]
Myers, E. W. (1986) AnO (ND) difference algorithm and its variations. Algorithmica, 1, 251–266
CrossRef Google scholar
[32]
Roberts, M., Hayes, W., Hunt, B. R., Mount, S. M. and Yorke, J. A. (2004) Reducing storage requirements for biological sequence comparison. Bioinformatics, 20, 3363–3369
CrossRef Pubmed Google scholar
[33]
Li, H. (2016) Minimap and miniasm: fast mapping and de novo assembly for noisy long sequences. Bioinformatics, 32, 2103–2110
CrossRef Pubmed Google scholar
[34]
Li, H. (2018) Minimap2: pairwise alignment for nucleotide sequences. Bioinformatics, 34, 3094–3100
CrossRef Pubmed Google scholar
[35]
Gotoh, O. (1990) Optimal sequence alignment allowing for long gaps. Bull. Math. Biol., 52, 359–373
CrossRef Pubmed Google scholar
[36]
Suzuki, H. and Kasahara, M. (2018) Introducing difference recurrence relations for faster semi-global alignment of long sequences. BMC Bioinformatics, 19, 45
CrossRef Pubmed Google scholar
[37]
Šošic, M. and Šikic, M. (2017) Edlib: a C/C ++ library for fast, exact sequence alignment using edit distance. Bioinformatics, 33, 1394–1395
CrossRef Pubmed Google scholar
[38]
Myers, G. (1999) A fast bit-vector algorithm for approximate string matching based on dynamic programming. J. Assoc. Comput. Mach., 46, 395–415
CrossRef Google scholar
[39]
Hirschberg, D. S. (1975) A linear space algorithm for computing maximal common subsequences. Commun. ACM, 18, 341–343
CrossRef Google scholar
[40]
Lee, C., Grasso, C. and Sharlow, M. F. (2002) Multiple sequence alignment using partial order graphs. Bioinformatics, 18, 452–464
CrossRef Pubmed Google scholar
[41]
Chin, C. S., Peluso, P., Sedlazeck, F. J., Nattestad, M., Concepcion,G. T., Clum, A., Dunn, C., O’Malley, R., Figueroa-Balderas, R., Morales-Cruz, A., (2016) Phased diploid genome assembly with single-molecule real-time sequencing. Nat. Methods, 13, 1050–1054
CrossRef Pubmed Google scholar
[42]
Myers, E. W. (2005) The fragment assembly string graph. Bioinformatics, 21(suppl_2), ii79–ii85
[43]
Berlin, K., Koren, S., Chin, C. S., Drake, J. P., Landolin, J. M. and Phillippy, A. M. (2015) Assembling large genomes with single-molecule sequencing and locality-sensitive hashing. Nat. Biotechnol., 33, 623–630
CrossRef Pubmed Google scholar
[44]
Koren, S., Walenz, B. P., Berlin, K., Miller, J. R., Bergman, N. H. and Phillippy, A. M. (2017) Canu: scalable and accurate long-read assembly via adaptive k-mer weighting and repeat separation. Genome Res., 27, 722–736
CrossRef Pubmed Google scholar
[45]
Miller, J. R., Delcher, A. L., Koren, S., Venter, E., Walenz, B. P., Brownley, A., Johnson, J., Li, K., Mobarry, C. and Sutton, G. (2008) Aggressive assembly of pyrosequencing reads with mates. Bioinformatics, 24, 2818–2824
CrossRef Pubmed Google scholar
[46]
Xiao, C. L., Chen, Y., Xie, S. Q., Chen,K.-N., Wang, Y., Han, Y., Luo F., Xie , Z. (2017) MECAT: fast mapping, error correction, and de novo assembly for single-molecule sequencing reads. Nat. Methods, 14, 1072–1074
[47]
Kolmogorov, M., Yuan, J., Lin, Y. and Pevzner, P. A. (2019) Assembly of long, error-prone reads using repeat graphs. Nat. Biotechnol., 37, 540–546
CrossRef Pubmed Google scholar
[48]
Ruan, J. and Li, H. (2020) Fast and accurate long-read assembly with wtdbg2. Nat. Methods, 17, 155–158
CrossRef Pubmed Google scholar
[49]
Chen, Y., Nie, F., Xie, S.-Q. and Zheng, Y.-F. (2020) Fast and accurate assembly of Nanopore reads via progressive error correction and adaptive read selection. bioRxiv, 930107

COMPLIANCE WITH ETHICS GUIDELINES

The authors Ying Chen and Chuan-Le Xiao declare that they have no conflict of interests.
This article is a review article and does not contain any studies with human or animal subjects performed by any of the authors.

RIGHTS & PERMISSIONS

2020 Higher Education Press and Springer-Verlag GmbH Germany, part of Springer Nature
AI Summary AI Mindmap
PDF(217 KB)

Accesses

Citations

Detail

Sections
Recommended

/