REVIEW

Proteome-wide prediction of protein-protein interactions from high-throughput data

  • 1. Key Laboratory of Systems Biology, SIBS-Novo Nordisk Translational Research Centre for PreDiabetes, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai 200031, China; 2. Institute of Industrial Science, The University of Tokyo, Tokyo 153-8505, Japan

Received date: 12 May 2012

Accepted date: 30 May 2012

Published date: 01 Jul 2012

Abstract

This paper briefly reviews existing computational methods for predicting proteome-wide protein-protein interaction networks from high-throughput data. The availability of various types of omics data offers a great opportunity, and poses an unprecedented challenge, for inferring the interactome in cells. Reconstructing the interactome, or interaction network, is a crucial step in studying the functional relationships among proteins and the biological processes in which they are involved. The protein interaction network provides a valuable resource for deciphering the mechanisms of these functionally interacting elements as well as the overall operation of the cell. We describe the main steps in predicting protein-protein interaction networks and categorize the available approaches that couple physical and functional linkages. Future topics and analyses beyond prediction are also discussed.

Cite this article

Zhi-Ping Liu, Luonan Chen. Proteome-wide prediction of protein-protein interactions from high-throughput data[J]. Protein & Cell, 2012, 3(7): 508-520. DOI: 10.1007/s13238-012-2945-1
