Combinatorial pooled sequencing: experiment design and decoding

Chang-chang Cao, Xiao Sun

Quant. Biol., 2016, Vol. 4, Issue 1: 36-46. DOI: 10.1007/s40484-016-0064-3

Abstract

Owing to rapid advances in next-generation sequencing technology, the cost of DNA sequencing has fallen by several orders of magnitude. However, genomic sequencing of individuals at the population scale is still restricted to a few model species because constructing libraries for thousands of samples remains a major challenge. Pooled sequencing provides a cost-effective alternative to sequencing individuals separately and can greatly reduce the time and cost of DNA library preparation. Technological improvements, together with the broad range of biological research questions that require large sample sizes, mean that pooled sequencing will continue to complement the sequencing of individual genomes and will become increasingly important in the foreseeable future. However, simply mixing samples together for sequencing makes it impossible to identify the reads that belong to each sample. Barcoding can solve this problem, but barcoding every sample is currently costly, especially for large sample sets. An alternative to barcoding is combinatorial pooled sequencing, which uses a pooling pattern rather than a short DNA barcode to encode each sample. In combinatorial pooled sequencing, samples are mixed into a small number of pools according to a carefully designed pooling strategy, which allows the sequencing data to be decoded to identify the reads that belong to samples that are unique or rare in the population. In this review, we mainly survey experiment design and decoding procedures for combinatorial pooled sequencing as applied to screening for carriers of rare variants and rare haplotypes, complex genome assembly, and single-individual haplotyping.
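The idea of encoding each sample by a pooling pattern instead of a barcode can be illustrated with a toy example. The sketch below is not a method from the review; it is a minimal, noise-free illustration that assumes each sample is placed in a distinct set of pools and that a rare variant is simply observed or not observed in each pool. The function names (`build_design`, `pool_results`, `decode`) and the parameter values are hypothetical.

```python
from itertools import combinations


def build_design(n_samples, n_pools, pools_per_sample):
    """Give each sample a distinct set of pools (its pooling pattern)."""
    patterns = list(combinations(range(n_pools), pools_per_sample))
    if len(patterns) < n_samples:
        raise ValueError("not enough distinct pooling patterns")
    return patterns[:n_samples]


def pool_results(design, carriers):
    """Noise-free readout: pools in which the rare variant is observed."""
    positive = set()
    for sample in carriers:
        positive.update(design[sample])
    return positive


def decode(design, positive_pools):
    """Naive decoder: keep samples whose pools are all positive."""
    return [s for s, pools in enumerate(design) if set(pools) <= positive_pools]


# 20 samples sequenced in 8 pools instead of 20 individual libraries.
design = build_design(n_samples=20, n_pools=8, pools_per_sample=3)
positive = pool_results(design, carriers={7})
candidates = decode(design, positive)
print(candidates)
```

With a single carrier the pattern of positive pools identifies that sample uniquely, because all patterns are distinct sets of the same size. With several carriers this naive decoder can also return non-carriers, which is one reason more careful designs and decoding procedures are needed in practice.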

Keywords

combinatorial pooled sequencing / experiment design / decoding

Cite this article

Chang-chang Cao, Xiao Sun. Combinatorial pooled sequencing: experiment design and decoding. Quant. Biol., 2016, 4(1): 36‒46 https://doi.org/10.1007/s40484-016-0064-3

ACKNOWLEDGEMENTS

This work was supported by the National Basic Research Program of China (No. 2012CB316501), the National Natural Science Foundation of China (No. 61472078), and the Scientific Research Foundation of Graduate School of Southeast University.
The authors Chang-chang Cao and Xiao Sun declare no competing financial interests. This article does not contain any studies with human or animal subjects performed by any of the authors.
RIGHTS & PERMISSIONS

2014 Higher Education Press and Springer-Verlag Berlin Heidelberg