Abstract
Background: The reconstruction of clonal haplotypes and their evolutionary history in evolving populations is a common problem in both microbial evolutionary biology and cancer biology. The clonal theory of evolution provides a theoretical framework for modeling the evolution of clones.
Results: In this paper, we review the theoretical framework and the assumptions under which the clonal reconstruction problem is formulated. We formally define the problem and then discuss its complexity and solution space. Various methods have been proposed to find the phylogeny that best explains the observed data. We categorize these methods by the type of input data they use (space-resolved or time-resolved) and by their computational formulation (combinatorial or probabilistic). It is crucial to understand the different types of input data because each provides essential but distinct information for drastically reducing the solution space of the clonal reconstruction problem. Complementary information from single-cell sequencing or from whole-genome sequencing of randomly isolated clones can also improve the accuracy of clonal reconstruction. We briefly review the existing algorithms and their relationships. Finally, we summarize the tools that have been developed either for directly solving the clonal reconstruction problem or for solving a related computational problem.
Conclusions: In this review, we discuss the various formulations of the problem of inferring clonal evolutionary history from allele frequency data, review existing algorithms, and categorize them according to their problem formulation and solution approach. We note that most of the available clonal inference algorithms were developed to elucidate tumor evolution, whereas clonal reconstruction for unicellular genomes has received less attention. We conclude the review by discussing open problems, such as the lack of benchmark datasets and of performance comparisons between the available tools.
Keywords
clonal theory / infinite sites assumption / clonal reconstruction problem / bacteria evolution / tumor evolution / combinatorial algorithm / probabilistic algorithm
Cite this article
Wazim Mohammed Ismail, Etienne Nzabarushimana, Haixu Tang.
Algorithmic approaches to clonal reconstruction in heterogeneous cell populations.
Quant. Biol., 2019, 7(4): 255-265 DOI:10.1007/s40484-019-0188-3
RIGHTS & PERMISSIONS
Higher Education Press and Springer-Verlag GmbH Germany, part of Springer Nature