Applications of probability and statistics in cancer genomics

Xiaotu Ma, Sasi Arunachalam, Yanling Liu

Quant. Biol., 2020, Vol. 8, Issue 2: 95-108.

DOI: 10.1007/s40484-020-0203-8
REVIEW

Abstract

Background: The past decade has witnessed rapid progress in our understanding of the genetics of cancer and its progression. Probabilistic and statistical modeling has played a pivotal role in the discovery of general patterns from cancer genomics datasets and continues to be of central importance for personalized medicine.

Results: In this review we introduce cancer genomics from a probabilistic and statistical perspective. We start with (1) the functional classification of genes into oncogenes and tumor suppressor genes, then (2) demonstrate the importance of comprehensive analysis of different mutation types for individual cancer genomes, followed by (3) tumor purity analysis, which in turn leads to (4) the concepts of ploidy and clonality, which are then connected to (5) tumor evolution under treatment pressure, which yields insights into cancer drug resistance. We also discuss future challenges, including non-coding genomic regions, integrative analysis of genomics and epigenomics, and early cancer detection.

Conclusion: We believe probabilistic and statistical modeling will continue to play an important role in novel discoveries in the field of cancer genomics and personalized medicine.

Keywords

cancer genomics / sequence analysis / probability and statistics

Cite this article

Xiaotu Ma, Sasi Arunachalam, Yanling Liu. Applications of probability and statistics in cancer genomics. Quant. Biol., 2020, 8(2): 95-108. DOI: 10.1007/s40484-020-0203-8

RIGHTS & PERMISSIONS

Higher Education Press and Springer-Verlag GmbH Germany, part of Springer Nature