Biclustering by sparse canonical correlation analysis

Harold Pimentel; Zhiyue Hu; Haiyan Huang

doi:10.1007/s40484-017-0127-0

PDF(1299 KB)

Quant. Biol. ›› 2018, Vol. 6 ›› Issue (1) : 56-67. DOI: 10.1007/s40484-017-0127-0

RESEARCH ARTICLE

Biclustering by sparse canonical correlation analysis

Author information +

History +

Abstract

Background: Developing appropriate computational tools to distill biological insights from large-scale gene expression data has been an important part of systems biology. Considering that gene relationships may change or only exist in a subset of collected samples, biclustering that involves clustering both genes and samples has become in-creasingly important, especially when the samples are pooled from a wide range of experimental conditions.

Methods: In this paper, we introduce a new biclustering algorithm to find subsets of genomic expression features (EFs) (e.g., genes, isoforms, exon inclusion) that show strong “group interactions” under certain subsets of samples. Group interactions are defined by strong partial correlations, or equivalently, conditional dependencies between EFs after removing the influences of a set of other functionally related EFs. Our new biclustering method, named SCCA-BC, extends an existing method for group interaction inference, which is based on sparse canonical correlation analysis (SCCA) coupled with repeated random partitioning of the gene expression data set.

Results: SCCA-BC gives sensible results on real data sets and outperforms most existing methods in simulations. Software is available at https://github.com/pimentel/scca-bc.

Conclusions: SCCA-BC seems to work in numerous conditions and the results seem promising for future extensions. SCCA-BC has the ability to find different types of bicluster patterns, and it is especially advantageous in identifying a bicluster whose elements share the same progressive and multivariate normal distribution with a dense covariance matrix.

Graphical abstract

Keywords

biclustering / SCCA / gene clusters

Cite this article

EndNote

Ris (Procite)

Bibtex

Download citation ▾

Harold Pimentel, Zhiyue Hu, Haiyan Huang. Biclustering by sparse canonical correlation analysis. Quant. Biol., 2018, 6(1): 56‒67 https://doi.org/10.1007/s40484-017-0127-0

References

Publishing order | Descend order by publishing year | Descend order by cited within

[1]	Eisen, M. B., Spellman, P. T., Brown, P. O. and Botstein, D. (1998) Cluster analysis and display of genome-wide expression patterns. Proc. Natl. Acad. Sci. USA, 95, 14863–14868 CrossRef Pubmed Google scholar

[2]	Lazzeroni, L. and Owen, A. (2002) Plaid models for gene expression data. Stat. Sin., 12, 61–86

[3]	Kluger, Y., Basri, R., Chang, J. T. and Gerstein, M. (2003) Spectral biclustering of microarray data: coclustering genes and conditions. Genome Res., 13, 703–716 CrossRef Pubmed Google scholar

[4]	Bhattacharya, A. and De, R. K. (2009) Bi-correlation clustering algorithm for determining a set of co-regulated genes. Bioinformatics, 25, 2795–2801 CrossRef Pubmed Google scholar

[5]	Nepomuceno, J. A., Troncoso, A. and Aguilar-Ruiz, J. S. (2011) Biclustering of gene expression data by correlation-based scatter search. BioData Min., 4, 3 CrossRef Pubmed Google scholar

[6]	Gao, Q., Ho, C., Jia, Y., Li, J. J. and Huang, H. (2012) Biclustering of linear patterns in gene expression data. J. Comput. Biol., 19, 619–631 CrossRef Pubmed Google scholar

[7]	Ben-Dor, A., Chor, B., Karp, R. and Yakhini, Z. (2003) Discovering local structure in gene expression data: the order-preserving submatrix problem. J. Comput. Biol., 10, 373–384 CrossRef Pubmed Google scholar

[8]	Liu, J. and Wang, W. (2003) Op-cluster: Clustering by tendency in high dimensional space. In Third IEEE International Conference on Data Mining, 2003. ICDM 2003, pp. 187–194 IEEE

[9]	Wang, Y. X. R., Jiang, K., Feldman, L. J., Bickel, P. J., and Huang, H. (2015) Inferring gene-gene interactions and functional modules using sparse canonical correlation analysis. Ann. Appl. Stat. 9, 300–323

[10]	Tan, K. M. and Witten, D. M. (2014) Sparse biclustering of transposable data. J. Comput. Graph. Stat., 23, 985–1008 CrossRef Pubmed Google scholar

[11]	Turner, H., Bailey, T. and Krzanowski, W. (2005) Improved biclustering of microarray data demonstrated through systematic performance tests. Comput. Stat. Data Anal., 48, 235–254 CrossRef Google scholar

[12]	Lee, M., Shen, H., Huang, J. Z. and Marron, J. S. (2010) Biclustering via sparse singular value decomposition. Biometrics, 66, 1087–1095 CrossRef Pubmed Google scholar

[13]	Prelić, A., Bleuler, S., Zimmermann, P., Wille, A., Bhlmann, P., Gruissem, W., Hennig, L., Thiele, L. and Zitzler, E. (2006) A systematic comparison and evaluation of biclustering methods for gene expression data. Bioinformatics, 22, 1122–1129 CrossRef Google scholar

[14]	Higham, N. J. (2002) Computing the nearest correlation matrix–a problem from finance. IMA J. Numer. Anal., 22, 329–343 CrossRef Google scholar

[15]	St Johnston, D. (2002) The art and design of genetic screens: Drosophila melanogaster. Nat. Rev. Genet., 3, 176–188 CrossRef Pubmed Google scholar

[16]	Jorgensen, E. M. and Mango, S. E. (2002) The art and design of genetic screens: Caenorhabditis elegans. Nat. Rev.Genet., 3, 356–369 CrossRef Google scholar

[17]	Brown, J. B., Boley, N., Eisman, R., May, G. E., Stoiber, M. H., Duff, M. O., Booth, B. W., Wen, J., Park, S., Suzuki, A. M., (2014) Diversity and dynamics of the Drosophila transcriptome. Nature, 512, 393–399 CrossRef Pubmed Google scholar

[18]	Hotelling, H. (1936) Relations between two sets of variates. Biometrika, 28, 321–377 CrossRef Google scholar

[19]	Lee, W., Lee, D., Lee, Y. and Pawitan, Y. (2011) Sparse canonical covariance analysis for high-throughput data. Stat. Appl. Genet. Mol. Biol., 10 CrossRef Google scholar

[20]	Tibshirani, R., Walther, G. and Hastie, T. (2001) Estimating the number of clusters in a dataset via the gap statistic. J. R. Stat. Soc. B, 63, 411–423 CrossRef Google scholar

[21]	Anderson, T. W. (1958) An Introduction to Multivariate Statistical Analysis. New York: Wiley

[22]	Tibshirani, R. (1996) Regression shrinkage and selection via the lasso. J. R. Stat. Soc. B, 267–288

ACKNOWLEDGEMENT

The work is partially supported by NSF DMS-1160319.

COMPLIANCE WITH ETHICS GUIDELINES

The authors Harold Pimentel, Zhiyue Hu and Haiyan Huang declare they have no conflict of interests.

This article does not contain any studies with human or animal subjects performed by any of the authors.

RIGHTS & PERMISSIONS

2018 Higher Education Press and Springer-Verlag GmbH Germany, part of Springer Nature

AI Summary AI Mindmap

PDF(1299 KB)

Accesses

Citations

Detail

Sections

Recommended

Received	Revised	Accepted	Published
13 Jul 2017	20 Sep 2017	20 Sep 2017	08 Mar 2018
Online First Date	Issue Date
26 Feb 2018	08 Mar 2018

About the journal

Aims & scopes

Description

Editorial board

Abstracting / Indexing

Cover gallery

Contact us

Browse

Just accepted

Online first

Latest issue

All volumes and issues

Collections

Featured articles

Most accessed

Most cited

Collections

Authors & reviewers

Online submisson

Call for papers

Editorial policy

Guidelines for authors

Download templates

Classifications via endnote

Guidelines for reviewers

Author FAQs