PDF
(769KB)
Abstract
Background: Statistical validation of predicted complexes is a fundamental issue in proteomics and bioinformatics. The target is to measure the statistical significance of each predicted complex in terms of p-values. Surprisingly, this issue has not received much attention in the literature. To our knowledge, only a few research efforts have been made towards this direction.
Methods: In this article, we propose a novel method for calculating the p-value of a predicted complex. The null hypothesis is that there is no difference between the number of edges in target protein complex and that in the random null model. In addition, we assume that a true protein complex must be a connected subgraph. Based on this null hypothesis, we present an algorithm to compute the p-value of a given predicted complex.
Results: We test our method on five benchmark data sets to evaluate its effectiveness.
Conclusions: The experimental results show that our method is superior to the state-of-the-art algorithms on assessing the statistical significance of candidate protein complexes.
Graphical abstract
Keywords
predicted complex
/
statistical significance testing
/
subgraph mining
/
community detection
Cite this article
Download citation ▾
Youfu Su, Can Zhao, Zheng Chen, Bo Tian, Zengyou He.
On the statistical significance of protein complex.
Quant. Biol., 2018, 6(4): 313-320 DOI:10.1007/s40484-018-0153-6
| [1] |
Uetz, P., Giot, L., Cagney, G., Mansfield, T. A., Judson, R. S., Knight, J. R., Lockshon, D., Narayan, V., Srinivasan, M., Pochart, P., (2000) A comprehensive analysis of protein-protein interactions in Saccharomyces cerevisiae. Nature, 403, 623–627
|
| [2] |
Gavin, A.-C., Aloy, P., Grandi, P., Krause, R., Boesche, M., Marzioch, M., Rau, C., Jensen, L. J., Bastuck, S., Dümpelfeld, B., (2006) Proteome survey reveals modularity of the yeast cell machinery. Nature, 440, 631–636
|
| [3] |
Nepusz, T., Yu, H. and Paccanaro, A. (2012) Detecting overlapping protein complexes in protein-protein interaction networks. Nat. Methods, 9, 471–472
|
| [4] |
Teng, B., Zhao, C., Liu, X. and He, Z. (2015) Network inference from AP-MS data: computational challenges and solutions. Brief. Bioinform., 16, 658–674
|
| [5] |
Ma, X., Zhou, G., Shang, J., Wang, J., Peng, J. and Han, J. (2017) Detection of complexes in biological networks through diversified dense subgraph mining. J. Comput. Biol., 24, 923–941
|
| [6] |
Chen, B., Fan, W., Liu, J. and Wu, F.-X. (2014) Identifying protein complexes and functional modules–from static PPI networks to dynamic PPI networks. Brief. Bioinform., 15, 177–194
|
| [7] |
Ji, J., Zhang, A., Liu, C., Quan, X. and Liu, Z. (2014) Survey: functional module detection from protein-protein interaction networks. IEEE Trans. Knowl. Data Eng., 26, 261–277
|
| [8] |
Li, X., Wu, M., Kwoh, C.-K. and Ng, S.-K. (2010) Computational approaches for detecting protein complexes from protein interaction networks: a survey. BMC Genomics, 11, S3
|
| [9] |
Wang, J., Li, M., Deng, Y. and Pan, Y. (2010) Recent advances in clustering methods for protein interaction networks. BMC Genomics, 11, S10
|
| [10] |
Bhowmick, S. S. and Seah, B. S. (2016) Clustering and summarizing protein-protein interaction networks: a survey. IEEE Trans. Knowl. Data Eng., 28, 638–658
|
| [11] |
Adamcsek, B., Palla, G., Farkas, I. J., Derényi, I. and Vicsek, T. (2006) CFinder: locating cliques and overlapping modules in biological networks. Bioinformatics, 22, 1021–1023
|
| [12] |
Palla, G., Derényi, I., Farkas, I. and Vicsek, T. (2005) Uncovering the overlapping community structure of complex networks in nature and society. Nature, 435, 814–818
|
| [13] |
Brohée, S. and van Helden, J. (2006) Evaluation of clustering algorithms for protein-protein interaction networks. BMC Bioinformatics, 7, 488
|
| [14] |
Song, J. and Singh, M. (2009) How and when should interactome-derived clusters be used to predict functional modules and protein function? Bioinformatics, 25, 3143–3150
|
| [15] |
Traag, V. A., Krings, G. and Van Dooren, P. (2013) Significant scales in community structure. Sci. Rep., 3, 2930
|
| [16] |
Koyutürk, M., Szpankowski, W. and Grama, A. (2007) Assessing significance of connectivity and conservation in protein interaction networks. J. Comput. Biol., 14, 747–764
|
| [17] |
Lancichinetti, A., Radicchi, F., Ramasco, J. J. and Fortunato, S. (2011) Finding statistically significant communities in networks. PLoS One, 6, e18961
|
| [18] |
Spirin, V. and Mirny, L. A. (2003) Protein complexes and functional modules in molecular networks. Proc. Natl. Acad. Sci. USA, 100, 12123–12128
|
| [19] |
Chakraborty, T., Dalmia, A., Mukherjee, A. and Ganguly, N. (2017) Metrics for community analysis: A survey. ACM Comput. Surv., 50, 1–37
|
| [20] |
Zhang, P. and Moore, C. (2014) Scalable detection of statistically significant communities and hierarchies, using message passing for modularity. Proc. Natl. Acad. Sci. USA, 111, 18144–18149
|
| [21] |
Csardi, G. and Nepusz, T. (2006) The Igraph software package for complex network research. Inter. Journal Complex Systems, 1695, 1–9
|
| [22] |
Nepusz, T., Yu, H. and Paccanaro, A. Clusterone cytoscape plugin.
|
| [23] |
Collins, S. R., Kemmeren, P., Zhao, X.-C., Greenblatt, J. F., Spencer, F., Holstege, F. C., Weissman, J. S. and Krogan, N. J. (2007) Toward a comprehensive atlas of the physical interactome of Saccharomyces cerevisiae. Mol. Cell. Proteomics, 6, 439–450
|
| [24] |
Krogan, N. J., Cagney, G., Yu, H., Zhong, G., Guo, X., Ignatchenko, A., Li, J., Pu, S., Datta, N., Tikuisis, A. P., (2006) Global landscape of protein complexes in the yeast Saccharomyces cerevisiae. Nature, 440, 637–643
|
| [25] |
Stark, C., Breitkreutz, B.-J., Reguly, T., Boucher, L., Breitkreutz, A. and Tyers, M. (2006) Biogrid: a general repository for interaction datasets. Nucleic Acids Res. 34, Suppl 1, D535–D539
|
| [26] |
“How many connected graphs over v vertices and e edges?”
|
| [27] |
Shor, P. W. (1995) A new proof of cayley’s formula for counting labeled trees. J. Com. Theory, 71, 154–158
|
| [28] |
Marquardt, D. W. (1963) An algorithm for least-squares estimation of nonlinear parameters. J. Soc. Ind. Appl. Math., 11, 431–441
|
| [29] |
Moré J. (1977) The levenberg–marquardt algorithm: Implementation and theory. In Conference on Numerical Analysis. Dundee, UK
|
RIGHTS & PERMISSIONS
Higher Education Press and Springer-Verlag GmbH Germany, part of Springer Nature