Algorithmic approaches to clonal reconstruction in heterogeneous cell populations
Wazim Mohammed Ismail, Etienne Nzabarushimana, Haixu Tang
Background: The reconstruction of clonal haplotypes and their evolutionary history in evolving populations is a common problem in both microbial evolutionary biology and cancer biology. The clonal theory of evolution provides a theoretical framework for modeling the evolution of clones.
Results: In this paper, we review the theoretical framework and assumptions on which the clonal reconstruction problem is formulated. We formally define the problem and then discuss its complexity and solution space. Various methods have been proposed to find the phylogeny that best explains the observed data. We categorize these methods based on the type of input data that they use (space-resolved or time-resolved), and also based on their computational formulation as either combinatorial or probabilistic. It is crucial to understand the different types of input data because each provides essential but distinct information for drastically reducing the solution space of the clonal reconstruction problem. Complementary information provided by single cell sequencing or by whole genome sequencing of randomly isolated clones can also improve the accuracy of clonal reconstruction. We briefly review the existing algorithms and their relationships. Finally, we summarize the tools that have been developed for directly solving either the clonal reconstruction problem or a related computational problem.
Conclusions: In this review, we discuss the various formulations of the problem of inferring the clonal evolutionary history from allele frequency data, review existing algorithms, and categorize them according to their problem formulation and solution approaches. We note that most of the available clonal inference algorithms were developed for elucidating tumor evolution, whereas clonal reconstruction for unicellular genomes is less addressed. We conclude the review by discussing open problems such as the lack of benchmark datasets and of performance comparisons between available tools.
clonal theory / infinite sites assumption / clonal reconstruction problem / bacteria evolution / tumor evolution / combinatorial algorithm / probabilistic algorithm