Applications of probability and statistics in cancer genomics

Xiaotu Ma, Sasi Arunachalam, Yanling Liu

Quant. Biol., 2020, Vol. 8, Issue 2: 95-108. DOI: 10.1007/s40484-020-0203-8
REVIEW


Abstract

Background: The past decade has witnessed rapid progress in our understanding of the genetics of cancer and its progression. Probabilistic and statistical modeling has played a pivotal role in the discovery of general patterns from cancer genomics datasets and continues to be of central importance for personalized medicine.

Results: In this review, we introduce cancer genomics from a probabilistic and statistical perspective. We start with (1) the functional classification of genes into oncogenes and tumor suppressor genes, then (2) demonstrate the importance of comprehensive analysis of different mutation types in individual cancer genomes, followed by (3) tumor purity analysis, which in turn leads to (4) the concepts of ploidy and clonality, which are then connected to (5) tumor evolution under treatment pressure, yielding insights into cancer drug resistance. We also discuss future challenges, including non-coding genomic regions, integrative analysis of genomics and epigenomics, and early cancer detection.

Conclusion: We believe probabilistic and statistical modeling will continue to play an important role in novel discoveries in the field of cancer genomics and personalized medicine.

Keywords

cancer genomics / sequence analysis / probability and statistics

Cite this article

Xiaotu Ma, Sasi Arunachalam, Yanling Liu. Applications of probability and statistics in cancer genomics. Quant. Biol., 2020, 8(2): 95‒108 https://doi.org/10.1007/s40484-020-0203-8

ACKNOWLEDGEMENTS

X.M. is partly supported by The Innovation in Cancer Informatics (ICI) Fund. The authors are grateful for the editorial support of Makeda Porter-Carr.

COMPLIANCE WITH ETHICS GUIDELINES

The authors Xiaotu Ma, Sasi Arunachalam and Yanling Liu declare that they have no conflicts of interest.
This article is a review article and does not contain any studies with human or animal subjects performed by any of the authors.

RIGHTS & PERMISSIONS

© 2020 Higher Education Press and Springer-Verlag GmbH Germany, part of Springer Nature