Integrative clustering methods of multi-omics data for molecule-based cancer classifications

Dongfang Wang, Jin Gu

Quant. Biol., 2016, Vol. 4, Issue 1: 58–67. DOI: 10.1007/s40484-016-0063-4

Abstract

One goal of precision oncology is to re-classify cancers by their molecular features rather than by tissue of origin. Integrative clustering of large-scale multi-omics data is an important approach to molecule-based cancer classification. Data heterogeneity and the complexity of inter-omics variation are the two major challenges for integrative clustering analysis. According to the strategies used to address these challenges, we group the clustering methods into three major categories: direct integrative clustering, clustering of clusters, and regulatory integrative clustering. We also discuss practical considerations for data pre-processing, post-clustering analysis, and pathway-based analysis.

Keywords

clustering / cancer classification / omics / integrative analysis

Cite this article

Dongfang Wang, Jin Gu. Integrative clustering methods of multi-omics data for molecule-based cancer classifications. Quant. Biol., 2016, 4(1): 58–67. DOI: 10.1007/s40484-016-0063-4

RIGHTS & PERMISSIONS

Higher Education Press and Springer-Verlag Berlin Heidelberg
