Quantitative Biology

Quant. Biol.    2018, Vol. 6 Issue (4) : 359-368     https://doi.org/10.1007/s40484-018-0155-4
METHODOLOGY ARTICLE
WaveNano: a signal-level nanopore base-caller via simultaneous prediction of nucleotide labels and move labels through bi-directional WaveNets
Sheng Wang1, Zhen Li, Yizhou Yu2, Xin Gao1
1. Computational Bioscience Research Center (CBRC), King Abdullah University of Science and Technology (KAUST), Thuwal 23955-6900, Kingdom of Saudi Arabia
2. Department of Computer Science, University of Hong Kong, Hong Kong SAR 999077, China
3. School of Science and Engineering, Shenzhen Research Institute of Big Data, The Chinese University of Hong Kong, Shenzhen 518172, China
Abstract

Background: The Oxford MinION nanopore sequencer is a recently popular third-generation genome sequencing device that is portable and no larger than a cellphone. Although MinION can sequence ultra-long reads in real time, the high error rate of existing base-calling methods, especially for indels (insertions and deletions), prevents its use in a variety of applications.

Methods: In this paper, we show that such indel errors are largely due to the segmentation of the input electrical current signal from MinION. All existing methods perform segmentation and nucleotide label prediction sequentially, so errors accumulated in the first step irreversibly affect the final base-calling. We further show that the indel issue can be significantly reduced by accurately labeling nucleotides and moves directly on the raw signal; a bi-directional WaveNet model can then learn both label types simultaneously and efficiently through feature sharing. With residual blocks and skip connections, the bi-directional WaveNet model is able to capture extremely long dependencies in the raw signal. Taking the predicted moves as segmentation guidance, we employ Viterbi decoding to obtain the final base-calling results from the smoothed nucleotide probability matrix.
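The role of the move labels as segmentation guidance can be illustrated with a minimal sketch. It is not the WaveNano implementation: it assumes 4-class per-sample nucleotide probabilities instead of 5-mer labels, and it replaces the smoothing and Viterbi steps with a summed log-probability vote inside each move-delimited segment. The function names `move_guided_decode` and `naive_collapse` are hypothetical.

```python
import numpy as np

BASES = "ACGT"


def move_guided_decode(probs, moves):
    """Call one base per move-delimited segment of the raw signal.

    probs: (T, 4) array, one nucleotide probability row per signal sample.
    moves: length-T 0/1 array; 1 marks the sample where a new base starts.
    Simplification (assumption): summed log-probabilities per segment stand
    in for the smoothing and Viterbi decoding named in the abstract.
    """
    probs = np.asarray(probs, dtype=float)
    moves = np.asarray(moves, dtype=int)
    if probs.ndim != 2 or probs.shape[1] != len(BASES):
        raise ValueError("probs must have shape (T, 4)")
    if moves.ndim != 1 or moves.shape[0] != probs.shape[0]:
        raise ValueError("moves must have one entry per signal sample")
    if probs.shape[0] == 0:
        return ""
    # Sample 0 always opens a segment; every predicted move opens another.
    starts = np.flatnonzero(moves)
    if starts.size == 0 or starts[0] != 0:
        starts = np.concatenate(([0], starts))
    ends = np.append(starts[1:], probs.shape[0])
    log_p = np.log(probs + 1e-12)
    return "".join(
        BASES[int(np.argmax(log_p[s:e].sum(axis=0)))]
        for s, e in zip(starts, ends)
    )


def naive_collapse(probs):
    """Baseline without move labels: merge runs of identical per-sample calls."""
    labels = np.argmax(np.asarray(probs, dtype=float), axis=1)
    if labels.size == 0:
        return ""
    keep = np.concatenate(([True], labels[1:] != labels[:-1]))
    return "".join(BASES[i] for i in labels[keep])


# Toy signal: a homopolymer "AA" with one noisy sample, followed by "C".
A_ROW = [0.90, 0.04, 0.03, 0.03]
NOISY = [0.35, 0.15, 0.40, 0.10]
C_ROW = [0.05, 0.85, 0.05, 0.05]
toy_probs = [A_ROW, A_ROW, A_ROW, A_ROW, NOISY, A_ROW, C_ROW, C_ROW]
toy_moves = [1, 0, 0, 1, 0, 0, 1, 0]

print(move_guided_decode(toy_probs, toy_moves))
print(naive_collapse(toy_probs))
```

In this toy input the move labels keep the two adjacent A bases separate and absorb the single noisy sample, whereas the run-merging baseline has no way to tell where one base ends and the next begins.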

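Results: On real MinION sequencing data from Lambda phage, our proposed base-caller, WaveNano, outperforms Metrichor and Albacore in Tab.1, with a SeqID of 92.3 versus 85.2 and 85.9 and a Grate of 5.6 versus 8.8 and 8.2.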

Conclusions: The signal-level nanopore base-caller WaveNano can obtain higher base-calling accuracy, and generate fewer insertions/deletions in the base-called sequences.

Keywords: nanopore sequencing; bi-directional WaveNets; base-calling; third generation sequencing; deep learning
Corresponding Authors: Sheng Wang, Xin Gao
Online First Date: 30 November 2018    Issue Date: 10 December 2018
Cite this article:
Sheng Wang, Zhen Li, Yizhou Yu, et al. WaveNano: a signal-level nanopore base-caller via simultaneous prediction of nucleotide labels and move labels through bi-directional WaveNets[J]. Quant. Biol., 2018, 6(4): 359-368.
URL:
http://journal.hep.com.cn/qb/EN/10.1007/s40484-018-0155-4
http://journal.hep.com.cn/qb/EN/Y2018/V6/I4/359
Fig.1  Mechanism of MinION nanopore sequencer and the two signal-level labels.
Fig.2  Comparison of the conventional base-calling methods and the WaveNano model.

| Method | 5-mer prediction | Move prediction | SeqID1 | SeqID2 | SeqID | Grate |
|---|---|---|---|---|---|---|
| Metrichor | / | / | 86.1 | 91.6 | 85.2 | 8.8 |
| Albacore | / | / | 87.8 | 88.4 | 85.9 | 8.2 |
| WaveNano | 63.3 | 94.7 | 95.6 | 93.2 | 92.3 | 5.6 |
| w/o bi-WaveNet | 59.7 | 92.2 | 93.7 | 91.3 | 89.4 | 7.1 |
| w/o move guidance | 56.4 | / | 91.1 | 84.6 | 83.3 | 12.8 |

Tab.1  Base-calling performance on the Lambda phage genome (48.5 Kb)
Fig.3  The overall pipeline of WaveNano.
Fig.4  Basic architecture of the WaveNet model.
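The abstract states that the model captures extremely long dependencies in the raw signal. In a WaveNet-style network this reach comes from stacking dilated convolutions, and the following sketch computes the resulting receptive field. The kernel size, layer count, and doubling dilation schedule are illustrative assumptions rather than WaveNano's reported settings, and the function names are hypothetical.

```python
def doubling_dilations(n_layers):
    """Dilation schedule 1, 2, 4, ... commonly used in WaveNet-style stacks."""
    return [2 ** i for i in range(n_layers)]


def receptive_field(kernel_size, dilations):
    """Number of input samples that influence one output of the stack.

    Each dilated 1-D convolution layer with dilation d widens the
    receptive field by (kernel_size - 1) * d samples.
    """
    if kernel_size < 1:
        raise ValueError("kernel_size must be at least 1")
    return 1 + (kernel_size - 1) * sum(dilations)


# Assumed example: 10 layers, kernel size 2, dilations 1 .. 512.
one_direction = receptive_field(2, doubling_dilations(10))

# Assumption: a forward and a backward stack share the current sample,
# so the combined context spans both sides of it.
both_directions = 2 * one_direction - 1

print(one_direction, both_directions)
```

Because the dilations double at every layer, the receptive field grows exponentially with depth while the parameter count grows only linearly, which is what makes long raw-signal contexts affordable.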