INTRODUCTION
Proteins form the basic functional units of a cell. They carry out their functions by interacting with other proteins and small molecules. It is important to characterize the protein-protein interaction interface to gain mechanistic insight into these interactions. On a systems level, these interactions form a complex network responsible for responding to both intracellular and extracellular perturbations [1]. A number of experimental techniques have been developed in recent years to comprehensively map such networks. However, due to technical limitations, a significant number of interactions, and in particular the dynamic interactions in such networks, are yet to be discovered.
In this review, we first discuss the experimental and theoretical methods to construct “classical” protein-protein interaction networks. We then summarize recent progress on integrating structural information into these interaction networks. We also discuss how interaction networks are being utilized in rational drug design.
GRAPH THEORY BASED “CLASSICAL” PPI NETWORKS
In the post-genomic era, significant effort has been put into identifying and understanding the role of various coding and non-coding regions of the genome [2]. Knockout experiments, targeted mutations, functional assays and other biochemical methods have been used to gain insight into the functions of individual proteins [3]. As most proteins carry out their functions by interacting with other proteins, a number of experimental methodologies, such as yeast two-hybrid (Y2H) [4-8], co-immunoprecipitation [9, 10] and co-expression data [11, 12], have been used to construct protein-protein interaction networks.
Several databases, including DIP [13], MINT [14], HPRD [15], BioGrid [16], BIND [17] and IntAct [18], have compiled these data from various sources. A brief description of these databases can be found in Table 1.
Traditionally, PPI networks have been represented as graphs (Fig. 1), where each node represents a protein and interactions between them are shown as edges. Several analyses have illustrated the built-in robustness of these networks by calculating the degree (number of interactions) of each protein [19-24]. Moreover, the proteins/genes in these networks are not randomly located; instead, proteins associated with a particular function tend to form clusters [25-28], and those associated with a disease have a large number of protein-protein interactions [29, 30]. However, the elevated degree observed for disease-associated genes may reflect some inherent bias: many studies have focused on cancer genes alone, and disease-associated genes in general might have more reported interactions because they attract more research interest [31]. The graphical representation of the network is also useful in tracing potentially perturbed or malfunctioning proteins (nodes). However, this representation ignores the structural information of the protein-protein interaction interface.
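The graph representation and the degree calculation described above can be illustrated with a minimal Python sketch. The protein names and interaction pairs are made up for illustration, and the helper names (`build_ppi_graph`, `degrees`) are hypothetical; they do not come from any of the cited databases or tools.

```python
from collections import defaultdict


def build_ppi_graph(interactions):
    """Build an undirected graph (protein -> set of partners) from binary pairs."""
    graph = defaultdict(set)
    for a, b in interactions:
        if a == b:
            continue  # self-interactions are ignored in this sketch
        graph[a].add(b)
        graph[b].add(a)
    return dict(graph)


def degrees(graph):
    """Degree of a node = number of distinct interaction partners."""
    return {protein: len(partners) for protein, partners in graph.items()}


# Toy interaction list (assumed data, not taken from any database)
toy_interactions = [
    ("P1", "P2"), ("P1", "P3"), ("P1", "P4"),
    ("P2", "P3"), ("P4", "P5"),
]
toy_ppi_graph = build_ppi_graph(toy_interactions)
toy_degrees = degrees(toy_ppi_graph)

# The highest-degree node in the toy network
highest_degree_protein = max(toy_degrees, key=toy_degrees.get)
```

Note that such a graph stores only which proteins interact; nothing in it records where on the protein surface each interaction takes place, which is the limitation discussed above.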
STRUCTURE BASED PPI NETWORKS
High-throughput approaches like Y2H, co-immunoprecipitation and co-expression do not provide structural details for protein-protein interactions, and their results sometimes contain significant false positives [8, 32-34]. High-resolution structural protein-protein interaction data can be obtained by X-ray crystallography [35] and NMR spectroscopy [36], while cryo-EM provides low resolution structural data [37]. As of November 2012, more than 86000 structures had been deposited in the Protein Data Bank (PDB) [38]. Although a significant number of structures are now available, a portion of these are monomeric and others may contain non-native packing interactions [39].
Several computational methods have been designed to complement experimental approaches. Docking methodologies are widely used to predict the bound state of two proteins. ZDOCK [40], PIPER [41], ClusPro [42], HADDOCK [43], RosettaDOCK [44] and PatchDock [45] are some of the most commonly used docking methods. They can be broadly divided into two categories: (i) methods that utilize the Fast Fourier Transform (FFT) to search for the best interaction conformation during rigid body rotations/translations, and (ii) methods that use experimental information, such as interface residues and NMR data. The Critical Assessment of PRedicted Interactions (CAPRI) is a community-wide experiment, held every two years, that aims to judge the performance of existing methods [46]. Homology based methodologies are also widely used, particularly for large-scale studies, as docking methods are time intensive [47-50]. An alternative approach to predict reliable protein-protein interactions is to utilize only the interface information from a homologous protein [51-54]. Interface-based approaches take advantage of the observation that protein interaction sites are more conserved than the remainder of the protein surface [55-57]. A recent study on 231 enzyme families showed that even a sequence identity of 45% between the binding surface of the template protein and the modeled protein could generate the interaction interface successfully [58]. Computational approaches that specialize in identification of the protein-protein interaction sites for a particular type of protein, e.g., membrane proteins, have also been very successful [59-64]. The identification of pockets on the protein surface has also been used successfully by a number of groups to predict protein-protein interaction sites [65-67]. A brief description of computational tools to detect protein-protein interactions can be found in Table 2.
Structural protein-protein interaction networks provide rich insight into the regulatory mechanisms of proteins. Not only do they provide information about the important residues involved in the interactions, but they also indicate whether two proteins might simultaneously interact or compete for a binding partner. If both proteins bind to approximately the same surface on a protein, it is very likely that they will compete for the binding interaction due to steric hindrance. On the other hand, if the two proteins bind to different parts of a protein surface, it is likely that they can interact with the protein simultaneously. Most proteins interact with only a few other proteins. However, some proteins (named hub proteins) have a large number of protein-protein interactions [70, 71]. Hub proteins can include families of enzymes, transcription factors and intrinsically disordered proteins, among others [72, 73]. The number of interactions in hub proteins is larger than the number of interaction interfaces. Therefore, hub proteins often reuse their PPI interfaces for multiple interactions. Intrinsically disordered proteins achieve this by sampling the low energy conformation landscape continuously. This enables them to present different interaction interfaces to different binding partners [72]. Studies have shown that hub proteins are more likely to be associated with diseases like cancer than non-hub proteins [30, 74].
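The competition argument above (overlapping binding surfaces imply mutually exclusive interactions, distinct surfaces allow simultaneous ones) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: each interface is reduced to a set of residue numbers on the shared protein, the overlap measure and the 0.5 threshold are arbitrary choices made for this example, and the function names are hypothetical rather than taken from any published tool.

```python
from itertools import combinations


def interface_overlap(residues_a, residues_b):
    """Fraction of the smaller interface that is shared with the other one."""
    a, b = set(residues_a), set(residues_b)
    if not a or not b:
        return 0.0
    return len(a & b) / min(len(a), len(b))


def classify_partner_pairs(interfaces, threshold=0.5):
    """Label each pair of partners of one protein by interface overlap.

    interfaces: partner name -> set of residue numbers that the partner
    contacts on the shared protein. The threshold is an assumed cutoff.
    """
    labels = {}
    for p, q in combinations(sorted(interfaces), 2):
        overlap = interface_overlap(interfaces[p], interfaces[q])
        if overlap >= threshold:
            labels[(p, q)] = "mutually exclusive"
        else:
            labels[(p, q)] = "possibly simultaneous"
    return labels


# Toy interfaces on one hub protein (made-up residue numbers)
toy_interfaces = {
    "partnerA": {10, 11, 12, 13, 14},
    "partnerB": {12, 13, 14, 15},
    "partnerC": {80, 81, 82},
}
toy_pair_labels = classify_partner_pairs(toy_interfaces)
```

In this toy case partnerA and partnerB share three of partnerB's four interface residues and are labeled mutually exclusive, while partnerC binds a separate patch. A real analysis would work on three-dimensional structures rather than residue lists, since non-overlapping interfaces can still clash sterically.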
A human structural interaction network: Wang et al. have used high quality binary interaction data and homology modeling to construct a human structural interaction network (hSIN) that consists of 2816 proteins and 4222 structurally resolved interactions [75]. Utilizing this structurally resolved protein-protein interaction network, they were able to demonstrate that for the corresponding diseases, the in-frame mutations were enriched on the interaction interfaces of the proteins. Moreover, they discuss the basis of pleiotropy of disease genes and locus heterogeneity with experimental case studies on the interactions of WASP protein with CDC42 and VASP proteins [75]. They also predicted 292 candidate genes to have 694 previously unknown disease-to-gene associations by applying the guilt-by-association principle on their structurally resolved interaction network, based on mutations of known disease genes [75].
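The guilt-by-association principle mentioned above can be illustrated with a generic neighbor-voting sketch in Python. This is not the procedure of Wang et al. [75], which additionally uses the location of mutations on interaction interfaces; the toy network, the disease labels, the `min_support` cutoff and the function name are all assumptions made for this example.

```python
from collections import Counter


def guilt_by_association(graph, known_associations, min_support=2):
    """Propose disease associations for proteins from their interaction partners.

    graph: protein -> set of interaction partners.
    known_associations: protein -> set of diseases already linked to it.
    min_support: assumed minimum number of partners sharing a disease.
    """
    predictions = {}
    for protein, partners in graph.items():
        votes = Counter()
        for partner in partners:
            for disease in known_associations.get(partner, ()):
                votes[disease] += 1
        already_known = known_associations.get(protein, set())
        candidates = {
            disease: count
            for disease, count in votes.items()
            if count >= min_support and disease not in already_known
        }
        if candidates:
            predictions[protein] = candidates
    return predictions


# Toy network and annotations (made-up gene and disease names)
toy_network = {
    "G1": {"G2", "G3", "G4"},
    "G2": {"G1"},
    "G3": {"G1"},
    "G4": {"G1"},
}
toy_known = {"G2": {"diseaseX"}, "G3": {"diseaseX"}, "G4": {"diseaseY"}}
toy_predictions = guilt_by_association(toy_network, toy_known)
```

Here G1 is proposed as a candidate for diseaseX because two of its partners carry that annotation, whereas diseaseY has only one supporting partner and falls below the assumed cutoff.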
An extracellular signal-regulated kinase network: PRISM (PRotein Interactions by Structural Matching) is another useful tool for constructing structure based protein-protein interaction networks [51, 76, 77]. PRISM utilizes structural motifs derived from known non-redundant binary interactions, evolutionary conservation and flexible refinement to predict protein-protein interactions on a proteome-wide scale. The structural network of the Extracellular signal-Regulated Kinases (ERK) in the Mitogen-Activated Protein Kinase (MAPK) signaling pathway was constructed using this approach [77]. This network provides rich information about interactions that can occur simultaneously, and those that are mutually exclusive [77]. Of the 25 protein-protein interaction interfaces in the network, 64% are utilized for two or more interactions. Most notably, the ERK protein is involved in seven interactions using seven distinct interfaces [77].
Interacting proteins share at least some subcellular localization. PPI networks have, therefore, also been used to predict the subcellular localization of protein complexes [78, 79]. Interestingly, there is some evidence that suggests that information on subcellular localization can be used, in combination with other features, to predict PPIs [80].
It is important to note that apart from PPI networks, other types of networks that depict other cellular activities, for example metabolic networks (KEGG [81], EcoCyc [82], BioCyc [83], and metaTIGER [84]), have also been used extensively in computational and experimental studies.
PPI NETWORKS AND DRUG DESIGN
Structural protein-protein interaction networks are a valuable resource for drug discovery. Proteins function by interacting with other proteins. Therefore, interacting proteins are likely to be involved in the same cellular processes. As a result, perturbing these interactions can result in a number of outcomes, including onset or intensification of a disease such as cancer [85-87]. Perturbing these interactions can often cause loss of function or gain of function [88]. With the availability of the complete structural information of the interaction interface, it is possible to design peptide inhibitors that mimic the interaction partner and perturb a normal PPI [89, 90]. Moreover, the side effects of a drug can be predicted much more accurately using the structural and topological information embedded in the structure based interaction networks, as compared to just the topological information in the classical graph-based interaction networks [91].
The classical assumption of one drug target for one drug to treat a single disease has been shown to be inaccurate in a number of cases. This assumption might be the reason for the high failure rate of new drugs in clinical trials as a result of low efficacy and high toxicity [92-94]. So-called “Off-Target” binding generally contributes to side effects and toxicity [95]. However, there have been a few cases where “Off-Target” binding has been beneficial [96]. Each known drug on average binds to 6 known targets, and therefore it is predicted that on average there will be 6 targets, known or unknown, for each newly discovered drug [97]. To understand the side effects and toxicity of rejected drugs, it is important to predict “Off-Target” binding sites. A number of methods have used clustering of proteins into families [98, 99], global structure similarity measurement [100, 101] and interface similarity measurement [102-105] to predict “Off-Target” binding or to redesign drugs to enhance efficacy. Lounkine et al. have used a similarity ensemble approach to predict off-targets, based on whether a molecule will bind to a target with similar chemical features to those of known targets [106]. They further linked the off-targets to adverse drug reactions (ADRs) by using a guilt-by-association pipeline that linked off-targets to the ADRs of drugs for which the off-targets were primary targets [106]. They computationally screened 656 drugs approved for human use against 73 target proteins, and verified their predictions by either searching in protein-ligand databases or performing binding and functional assays [106]. Moreover, they were able to construct a three-way Drug-Target-ADR network that could be an extremely useful starting point for future off-target and ADR predictions [106]. The traditional approach to drug design is to focus on the molecular level, while the phenotypic outcomes in the clinical trial are measured at the organism level. Therefore, in the future it will be extremely beneficial to predict off-target binding using structure based PPI networks [85, 107], and to predict the drug targets [108], drug response [109, 110] and even drug resistance [111] well before entering the clinical trial stage.
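The linking step described above, in which a predicted off-target inherits the ADRs of drugs that have it as a primary target, can be sketched as a simple join in Python. This is a simplified illustration and not the pipeline of Lounkine et al. [106], which also includes the similarity-based off-target prediction and statistical filtering; the drug, target and ADR names are made up, and the function name is hypothetical.

```python
def link_off_targets_to_adrs(predicted_off_targets, primary_targets, drug_adrs):
    """Return sorted (drug, off_target, adr) triples of a toy Drug-Target-ADR network.

    predicted_off_targets: drug -> set of predicted off-target proteins.
    primary_targets: drug -> set of its primary targets.
    drug_adrs: drug -> set of adverse drug reactions reported for it.
    """
    # Collect, for every target, the ADRs of drugs that hit it as a primary target.
    adrs_by_target = {}
    for drug, targets in primary_targets.items():
        for target in targets:
            adrs_by_target.setdefault(target, set()).update(drug_adrs.get(drug, set()))

    # Propagate those ADRs to every drug predicted to bind the target off-target.
    triples = set()
    for drug, off_targets in predicted_off_targets.items():
        for target in off_targets:
            for adr in adrs_by_target.get(target, ()):
                triples.add((drug, target, adr))
    return sorted(triples)


# Toy inputs (assumed data for illustration only)
toy_primary_targets = {"drugA": {"T1"}, "drugB": {"T2"}}
toy_drug_adrs = {"drugA": {"nausea"}, "drugB": {"rash", "headache"}}
toy_off_targets = {"drugA": {"T2"}}

toy_triples = link_off_targets_to_adrs(
    toy_off_targets, toy_primary_targets, toy_drug_adrs
)
```

In the toy data, drugA is predicted to bind T2, the primary target of drugB, so the ADRs of drugB become candidate ADRs of drugA. Each triple is a hypothesis that would still need the kind of database or assay verification described above.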
Drug repositioning: Due to the high failure rate and huge costs involved with traditional drug design [112, 113], a new paradigm has emerged that identifies new targets for existing approved drugs [114, 115]. This approach is based on the fact that each drug on average binds more than one target, and that the cost of Phase I clinical trials could be saved by re-using existing approved drugs. The amount of chemical and biological data has increased exponentially in recent years, and databases have been developed to integrate large amounts of data arising from different sources. PROMISCUOUS is one such database that integrates drug-protein interactions, protein-protein interactions and side effect data for drug repositioning studies [116]. Chem2Bio2RDF is another useful database that integrates chemical and drug data, protein and gene data, chemogenomics data, protein-protein interaction and pathway data and side effects data for designing multiple pathway inhibitors and predicting adverse drug reactions [117]. Drug repositioning can be particularly useful for rare and orphan diseases [118, 119]. Successful repositioning of drugs for novel targets has been achieved not only using approved drugs [120] but also using late stage failures [121, 122]. With the advent of structure based PPI networks this approach is likely to become much more effective, as lead targets and ADRs can be predicted much more accurately using structure based PPI networks than graph theory based classical networks.
Drug-drug interaction (DDI): DDIs are often the cause of ADRs, particularly for patient populations regularly taking multiple drugs. DDIs occur when the pharmacologic response of a particular drug is altered by the action of another drug [123], resulting in potentially harmful clinical effects. A recent algorithm by Huang et al. for the systematic prediction of pharmacodynamic DDIs, which considers drug actions and their clinical effects for the first time in the context of complex PPI networks, is a significant step forward for predicting ADRs [124]. The authors show that the integration of network topology, cross-tissue gene expression correlations and side effect similarity can predict DDIs with significant success. One of the major avenues for improvement in future studies would be utilization of a structure based PPI network, which would provide a high confidence molecular network and reduce the number of false positives. This should also provide a framework to study the mechanisms of drug-protein interactions. By integrating structural information, it will be possible for the first time to assess the ratio of DDIs occurring as a result of competitive binding, allosteric effects or indirect influence.
Yet not all DDIs are harmful side effects of drugs; some DDIs provide a useful drug combination strategy. Identification of drugs that could be prescribed simultaneously may improve efficacy and reduce side effects. This strategy is particularly effective in accounting for pathway redundancy. Several drug combinations have already been reported, particularly for various types of cancer [125-127] and Human Immunodeficiency Virus (HIV) [128-130]. In the near future, the use of structure based PPI networks will provide more detailed information to bring bioinformatics studies of drug combination strategies to the next level.
CONCLUDING REMARKS
Protein-protein interaction interfaces are a rich resource for gaining insight into the mechanism of how a protein carries out its functions. A complete structurally resolved interaction network of all the proteins will be an invaluable resource for not only understanding the complex and essential functions associated with these proteins, but will also help in designing novel therapeutic strategies for the diseases associated with these proteins. Due to experimental limitations, computational methods are extremely useful for completing such interaction networks, and for using these networks to predict the side effects of new drugs and to reposition existing drugs.
ACKNOWLEDGEMENTS
This work was funded by grants from the National Natural Science Foundation of China (NSFC) (Grant #31210103916 and 91019019), Chinese Ministry of Science and Technology (Grant #2011CB504206) and Chinese Academy of Sciences (CAS) (Grant #KSCX2-EW-R-02 and KSCX2-EW-J-15) and stem cell leading project XDA01010303 to J.D.J.H. H.N. was supported by the Chinese Academy of Sciences Fellowship for Young International Scientist [Grant # 2012Y1SB0006] and the China Natural National Science Foundation [Grant # 31250110524]. The authors thank Dr. Jerome Boyd-Kirkup for extensive editing and Hamna Anwar for proofreading the manuscript.
CONFLICT OF INTEREST
The authors Hammad Naveed and Jingdong J. Han declare that they have no conflict of interest.
Higher Education Press and Springer-Verlag Berlin Heidelberg