Integrative clustering methods of multi-omics data for molecule-based cancer classifications
Dongfang Wang, Jin Gu
One goal of precision oncology is to reclassify cancers by their molecular features rather than their tissue of origin. Integrative clustering of large-scale multi-omics data is an important approach to molecule-based cancer classification. Data heterogeneity and the complexity of inter-omics variation are two major challenges for integrative clustering analysis. Based on the strategies used to address these challenges, we group the clustering methods into three major categories: direct integrative clustering, clustering of clusters, and regulatory integrative clustering. We also discuss a few practical considerations for data pre-processing, post-clustering analysis and pathway-based analysis.
clustering / cancer classification / omics / integrative analysis