Combinatorial pooled sequencing: experiment design and decoding

Chang-chang Cao, Xiao Sun

Quant. Biol., 2016, Vol. 4, Issue 1: 36–46. DOI: 10.1007/s40484-016-0064-3

Abstract

Owing to rapid advances in next-generation sequencing technology, the cost of DNA sequencing has fallen by several orders of magnitude. However, population-scale genomic sequencing of individuals is still restricted to a few model species because of the huge challenge of constructing libraries for thousands of samples. Pooled sequencing offers a cost-effective alternative to sequencing individuals separately and can vastly reduce the time and cost of DNA library preparation. Technological improvements, together with the broad range of biological questions that require large sample sizes, mean that pooled sequencing will continue to complement the sequencing of individual genomes and will become increasingly important in the foreseeable future. However, simply mixing samples together for sequencing makes it impossible to tell which reads belong to which sample. Barcoding can solve this problem, but barcoding every sample is currently costly, especially for large numbers of samples. An alternative to barcoding is combinatorial pooled sequencing, which uses pooling patterns rather than short DNA barcodes to encode each sample. In combinatorial pooled sequencing, samples are mixed into a few pools according to a carefully designed pooling strategy, so that the sequencing data can be decoded to assign reads from sequences that are unique or rare in the population to the samples they came from. In this review, we survey the experiment design and decoding procedures of combinatorial pooled sequencing as applied to screening for carriers of rare variants and rare haplotypes, to complex genome assembly, and to single individual haplotyping.
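
The central idea above, that a sample's pooling pattern can stand in for a DNA barcode, can be illustrated with a small group-testing sketch in Python. This is a minimal, hypothetical example, not any specific design from this review. It assumes at most one carrier of a rare variant, noise-free detection of the variant in each pool, and a simple binary pooling design in which sample i joins, for each bit of i, one of two complementary pools. Decoding uses the standard COMP rule from group testing (any sample found in a negative pool is ruled out). The function names binary_pooling_design, pool_outcomes and decode_comp are illustrative only.

```python
def binary_pooling_design(n_samples):
    """Hypothetical binary design: pool 2*b holds samples whose bit b is 0,
    pool 2*b + 1 holds samples whose bit b is 1. Each sample joins exactly one
    pool of every pair, so its set of pools (its pooling pattern) is unique."""
    n_bits = max(1, (n_samples - 1).bit_length())
    design = [
        {2 * b + ((i >> b) & 1) for b in range(n_bits)}
        for i in range(n_samples)
    ]
    return design, 2 * n_bits


def pool_outcomes(design, n_pools, carriers):
    """Noise-free readout (an assumption): a pool is positive if any carrier is in it."""
    positive = [False] * n_pools
    for c in carriers:
        for p in design[c]:
            positive[p] = True
    return positive


def decode_comp(design, positive):
    """COMP decoding: a sample that sits in any negative pool cannot be a carrier,
    so keep only the samples whose pools are all positive."""
    return [i for i, pools in enumerate(design) if all(positive[p] for p in pools)]


if __name__ == "__main__":
    design, n_pools = binary_pooling_design(1000)
    print(n_pools)  # 20 pools for 1000 samples
    print(decode_comp(design, pool_outcomes(design, n_pools, {437})))  # [437]
```

Under these assumptions, 1,000 samples need only 2 × ⌈log2 1000⌉ = 20 pooled libraries instead of 1,000 barcoded ones. The design is fragile beyond a single carrier: among 8 samples, carriers 1 and 3 still decode to exactly [1, 3], but carriers 1 and 6 differ in every bit, so every pool turns positive and all 8 samples remain candidates. Designs intended for several carriers or for noisy pool readouts need more pools per sample and more robust decoding.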

Keywords

combinatorial pooled sequencing / experiment design / decoding

Cite this article

Chang-chang Cao, Xiao Sun. Combinatorial pooled sequencing: experiment design and decoding. Quant. Biol., 2016, 4(1): 36-46 DOI:10.1007/s40484-016-0064-3

Rights and permissions

Higher Education Press and Springer-Verlag Berlin Heidelberg
