On the statistical significance of protein complex
Youfu Su, Can Zhao, Zheng Chen, Bo Tian, Zengyou He
On the statistical significance of protein complex
Background: Statistical validation of predicted complexes is a fundamental issue in proteomics and bioinformatics. The target is to measure the statistical significance of each predicted complex in terms of p-values. Surprisingly, this issue has not received much attention in the literature. To our knowledge, only a few research efforts have been made towards this direction.
Methods: In this article, we propose a novel method for calculating the p-value of a predicted complex. The null hypothesis is that there is no difference between the number of edges in target protein complex and that in the random null model. In addition, we assume that a true protein complex must be a connected subgraph. Based on this null hypothesis, we present an algorithm to compute the p-value of a given predicted complex.
Results: We test our method on five benchmark data sets to evaluate its effectiveness.
Conclusions: The experimental results show that our method is superior to the state-of-the-art algorithms on assessing the statistical significance of candidate protein complexes.
predicted complex / statistical significance testing / subgraph mining / community detection
[1] |
Uetz, P., Giot, L., Cagney, G., Mansfield, T. A., Judson, R. S., Knight, J. R., Lockshon, D., Narayan, V., Srinivasan, M., Pochart, P.,
CrossRef
Pubmed
Google scholar
|
[2] |
Gavin, A.-C., Aloy, P., Grandi, P., Krause, R., Boesche, M., Marzioch, M., Rau, C., Jensen, L. J., Bastuck, S., Dümpelfeld, B.,
CrossRef
Pubmed
Google scholar
|
[3] |
Nepusz, T., Yu, H. and Paccanaro, A. (2012) Detecting overlapping protein complexes in protein-protein interaction networks. Nat. Methods, 9, 471–472
CrossRef
Pubmed
Google scholar
|
[4] |
Teng, B., Zhao, C., Liu, X. and He, Z. (2015) Network inference from AP-MS data: computational challenges and solutions. Brief. Bioinform., 16, 658–674
CrossRef
Pubmed
Google scholar
|
[5] |
Ma, X., Zhou, G., Shang, J., Wang, J., Peng, J. and Han, J. (2017) Detection of complexes in biological networks through diversified dense subgraph mining. J. Comput. Biol., 24, 923–941
CrossRef
Pubmed
Google scholar
|
[6] |
Chen, B., Fan, W., Liu, J. and Wu, F.-X. (2014) Identifying protein complexes and functional modules–from static PPI networks to dynamic PPI networks. Brief. Bioinform., 15, 177–194
CrossRef
Pubmed
Google scholar
|
[7] |
Ji, J., Zhang, A., Liu, C., Quan, X. and Liu, Z. (2014) Survey: functional module detection from protein-protein interaction networks. IEEE Trans. Knowl. Data Eng., 26, 261–277
CrossRef
Google scholar
|
[8] |
Li, X., Wu, M., Kwoh, C.-K. and Ng, S.-K. (2010) Computational approaches for detecting protein complexes from protein interaction networks: a survey. BMC Genomics, 11, S3
CrossRef
Pubmed
Google scholar
|
[9] |
Wang, J., Li, M., Deng, Y. and Pan, Y. (2010) Recent advances in clustering methods for protein interaction networks. BMC Genomics, 11, S10
Pubmed
|
[10] |
Bhowmick, S. S. and Seah, B. S. (2016) Clustering and summarizing protein-protein interaction networks: a survey. IEEE Trans. Knowl. Data Eng., 28, 638–658
CrossRef
Google scholar
|
[11] |
Adamcsek, B., Palla, G., Farkas, I. J., Derényi, I. and Vicsek, T. (2006) CFinder: locating cliques and overlapping modules in biological networks. Bioinformatics, 22, 1021–1023
CrossRef
Pubmed
Google scholar
|
[12] |
Palla, G., Derényi, I., Farkas, I. and Vicsek, T. (2005) Uncovering the overlapping community structure of complex networks in nature and society. Nature, 435, 814–818
CrossRef
Pubmed
Google scholar
|
[13] |
Brohée, S. and van Helden, J. (2006) Evaluation of clustering algorithms for protein-protein interaction networks. BMC Bioinformatics, 7, 488
CrossRef
Pubmed
Google scholar
|
[14] |
Song, J. and Singh, M. (2009) How and when should interactome-derived clusters be used to predict functional modules and protein function? Bioinformatics, 25, 3143–3150
CrossRef
Pubmed
Google scholar
|
[15] |
Traag, V. A., Krings, G. and Van Dooren, P. (2013) Significant scales in community structure. Sci. Rep., 3, 2930
|
[16] |
Koyutürk, M., Szpankowski, W. and Grama, A. (2007) Assessing significance of connectivity and conservation in protein interaction networks. J. Comput. Biol., 14, 747–764
CrossRef
Pubmed
Google scholar
|
[17] |
Lancichinetti, A., Radicchi, F., Ramasco, J. J. and Fortunato, S. (2011) Finding statistically significant communities in networks. PLoS One, 6, e18961
CrossRef
Pubmed
Google scholar
|
[18] |
Spirin, V. and Mirny, L. A. (2003) Protein complexes and functional modules in molecular networks. Proc. Natl. Acad. Sci. USA, 100, 12123–12128
CrossRef
Pubmed
Google scholar
|
[19] |
Chakraborty, T., Dalmia, A., Mukherjee, A. and Ganguly, N. (2017) Metrics for community analysis: A survey. ACM Comput. Surv., 50, 1–37
CrossRef
Google scholar
|
[20] |
Zhang, P. and Moore, C. (2014) Scalable detection of statistically significant communities and hierarchies, using message passing for modularity. Proc. Natl. Acad. Sci. USA, 111, 18144–18149
CrossRef
Pubmed
Google scholar
|
[21] |
Csardi, G. and Nepusz, T. (2006) The Igraph software package for complex network research. Inter. Journal Complex Systems, 1695, 1–9
|
[22] |
Nepusz, T., Yu, H. and Paccanaro, A. Clusterone cytoscape plugin. http://www.paccanarolab.org/static_content/clusterone/cl1-cytoscape3-1.0.html
|
[23] |
Collins, S. R., Kemmeren, P., Zhao, X.-C., Greenblatt, J. F., Spencer, F., Holstege, F. C., Weissman, J. S. and Krogan, N. J. (2007) Toward a comprehensive atlas of the physical interactome of Saccharomyces cerevisiae. Mol. Cell. Proteomics, 6, 439–450
CrossRef
Pubmed
Google scholar
|
[24] |
Krogan, N. J., Cagney, G., Yu, H., Zhong, G., Guo, X., Ignatchenko, A., Li, J., Pu, S., Datta, N., Tikuisis, A. P.,
CrossRef
Pubmed
Google scholar
|
[25] |
Stark, C., Breitkreutz, B.-J., Reguly, T., Boucher, L., Breitkreutz, A. and Tyers, M. (2006) Biogrid: a general repository for interaction datasets. Nucleic Acids Res. 34, Suppl 1, D535–D539
CrossRef
Google scholar
|
[26] |
“How many connected graphs over v vertices and e edges?” http://math.stackexchange.com/questions/689526/how-many-connected-graphs-over-v-vertices-and-e-edges
|
[27] |
Shor, P. W. (1995) A new proof of cayley’s formula for counting labeled trees. J. Com. Theory, 71, 154–158
CrossRef
Google scholar
|
[28] |
Marquardt, D. W. (1963) An algorithm for least-squares estimation of nonlinear parameters. J. Soc. Ind. Appl. Math., 11, 431–441
CrossRef
Google scholar
|
[29] |
Moré, J. (1977) The levenberg–marquardt algorithm: Implementation and theory. In Conference on Numerical Analysis. Dundee, UK
|
/
〈 | 〉 |