1 INTRODUCTION
The exploration of biomedical interactions between chemical compounds (drugs, molecules) and protein targets is of great significance for drug discovery [1]. It is believed that drugs interact with biological systems by binding to protein targets and affecting their downstream activity. Prediction of drug-target interactions (DTIs) is thus important for identifying therapeutic targets or characterizing drug targets. Abundant knowledge of DTIs also provides valuable insight towards understanding and uncovering higher-level information such as therapeutic mechanisms in drug repurposing [2]. For instance, Sildenafil was initially developed to treat pulmonary hypertension, but identification of its side effects allowed it to be repositioned for treating erectile dysfunction [3]. In addition, since most human diseases are complex biological processes that are resistant to the activity of a single drug [4, 5], polypharmacy has become a promising strategy among pharmacists. Prediction and validation of drug-drug interactions (DDIs) can reveal potential synergies in drug combinations that improve the therapeutic efficacy of individual drugs [6]. More importantly, negative DDIs are major causes of adverse drug reactions (ADRs) [7], especially among the elderly, who are more likely to take multiple medications [8]. Severe ADRs from critical DDIs may lead to the withdrawal of drugs from the market, as happened with mibefradil and cerivastatin in the US [9, 10]. Hence, accurate prediction of interactions between drugs can not only ensure drug safety but also shed light on drug repositioning and repurposing, potentially lowering overall drug development costs and improving development efficiency.
Over the past decade, the emergence of various biochemical databases, such as DrugBank [11], TwoSides [12], the RCSB Protein Data Bank [13] and PubChem [14], has provided health professionals with rich resources for studying DTIs and DDIs. However, prediction of novel or unseen biochemical interactions remains a challenging task.
In vitro experimental techniques are reliable but expensive and time-consuming.
In silico computational approaches have received far more attention due to their cost-effectiveness and increasing accuracy in various drug-related prediction tasks [15–19]. State-of-the-art computational methods for interaction prediction rely on machine learning algorithms that incorporate large-scale biochemical data. Most of these efforts are based on the principle that similar drugs tend to share similar target proteins and vice versa [20]. Hence, the most popular frameworks formulate the prediction of DTIs and DDIs as classification tasks and use different forms of similarity functions as inputs [21]. Another common type of approach constructs heterogeneous networks in the chemogenomics space to predict potential interactions using random walks [22]. The rise of machine learning, especially deep learning, has advanced drug-related research tremendously over the last two decades, including the prediction of DTIs and DDIs [23, 24]. For example, DeepDDI [16] first generated a feature vector called the structural similarity profile (SSP) for each drug, then derived a combined representation of a drug pair by applying dimension reduction (PCA) to the concatenation of the two SSPs; the combined SSPs were used to train the DeepDDI model for DDI prediction. Similar to DeepDDI, NDD [15] first calculated high-level drug features from multiple drug similarities based on drug substructure, target, side effect, pathway, etc., and then used a multi-layer perceptron for interaction prediction based on the curated features. DeepPurpose [25] is a deep learning framework for DTI and DDI prediction that integrates different types of neural network architectures using only sequence-based inputs. DeepDTA [26] used two convolutional neural networks to learn from compound SMILES and protein sequences to predict interactions. GraphDTA [27] used graph neural networks and a convolutional neural network to learn high-dimensional features of drugs and targets separately and makes interaction predictions via fully connected layers. DDIMDL [28] built a multimodal deep learning framework with multiple drug features to predict DDIs. SkipGNN [29] is a graph neural network approach for predicting molecular interactions by aggregating information from direct and second-order interactions.
In spite of these advances, there is still room for improvement in several aspects. First of all, accurate prediction of unseen drug interactions depends heavily on the feature extraction technique or similarity kernel used. Since different forms of feature extraction or similarity kernels introduce varying amounts of human-engineered bias, they often display different levels of predictive performance depending on the relevant settings, and no single kernel outperforms the others universally [30]. Similarity-based methods are also difficult to apply to large-scale datasets due to the significant computational complexity of computing similarity matrices [31]. Network-based methods built upon topological properties of the multipartite graph suffer from the same problem, depending on the complexity of the graph [32]. Deep learning-based methods have utilized either sequence-based or structural information alone; none of them combines both sources of information to comprehensively model a specific drug or protein. Moreover, none of the existing methods considers solving the DDI and DTI tasks within a unified framework.
In recent years, deep learning frameworks based on various graph neural networks, such as graph convolutional networks (GCNs) [33], graph attention networks (GATs) [34], gated graph neural networks (GGNNs) [35] and residual gated graph convolutional networks [36], have demonstrated ground-breaking performance in social science, natural science, knowledge graphs and many other research areas [37–39]. In particular, GCNs have been applied to various biochemical problems such as molecular property prediction [40], molecular generation and protein function prediction [41]. As pharmacological similarities are mainly derived and computed not only from sequence but also from structural properties, graph representations of biochemical entities can capture structural features that Euclidean (sequence-based) representations miss, without requiring feature engineering [42, 43].
Based on these observations, we propose DeepDrug, a graph-based deep learning framework, to learn drug interactions such as pairwise DDIs or DTIs. A key insight of our framework is that biochemical interactions are primarily determined by both the sequence and the structure of the participating entities, and that both drugs and proteins can be naturally represented as graphs. It is therefore crucial for the predictive model to incorporate both sequence-based and structural information, and we employ a graph-based architecture for DeepDrug. The proposed model differs from previous methods mainly in three aspects: (i) unlike previous methods that use only sequence or only structure information, DeepDrug takes both the traditional sequence representation and a structure-based graph representation as inputs to learn a more comprehensive representation of drugs and proteins; (ii) we introduce a novel Res-GCN module to better capture the intrinsic structural information among the atoms of a compound and the residues of a protein; (iii) to the best of our knowledge, DeepDrug is the first work to solve both the DDI and DTI tasks within a unified framework. A series of systematic experiments shows that DeepDrug outperforms other state-of-the-art models and demonstrates high robustness under different experimental settings. We conclude that DeepDrug, as an effective tool for predicting DDIs and DTIs, could shed light on the understanding of biochemical interactions.
2 RESULTS
2.1 Overview of DeepDrug
We developed a deep learning framework, DeepDrug, to predict drug interactions (e.g., DDIs and DTIs) by combining sequence and structural profiles. For each input (drug or protein), we used the sequence data as well as the partially available structural data as separate input branches to the DeepDrug model (Fig.1). The input sequence of a drug or protein was converted into a one-hot encoded representation and fed to convolution layers. The drug chemical structure was encoded as a graph, where nodes represent atoms and edges denote chemical bonds. Similarly, the protein structure was encoded as a graph with nodes and edges denoting amino acids and the interactions between them. The graph representation was then fed to several residual graph convolution layers (Res-GCNs). The hidden features extracted from the sequence branches and structural branches were subsequently merged by concatenation. Finally, a fully connected layer with a Sigmoid, Softmax or no activation function was used to produce the output for binary classification, multi-class/multi-label classification, or regression, respectively. Detailed hyperparameters are illustrated in Supplementary Fig. S1.
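The two-branch design described above can be summarized in the following minimal PyTorch sketch. It is an illustrative simplification, not the released implementation: the module names (`SeqCNN`, `DeepDrugLike`), the layer sizes and the pooling choices are assumptions, and the graph encoder is a placeholder passed in from outside.

```python
import torch
import torch.nn as nn

class SeqCNN(nn.Module):
    """Sequence branch: 1D convolutions over a one-hot encoded SMILES/protein sequence."""
    def __init__(self, vocab_size, emb_dim=128, out_dim=128):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv1d(vocab_size, emb_dim, kernel_size=7, padding=3),
            nn.ReLU(),
            nn.Conv1d(emb_dim, out_dim, kernel_size=7, padding=3),
            nn.ReLU(),
            nn.AdaptiveMaxPool1d(1),  # global max pooling over the sequence
        )

    def forward(self, x_onehot):                      # (batch, vocab_size, seq_len)
        return self.conv(x_onehot).squeeze(-1)        # (batch, out_dim)

class DeepDrugLike(nn.Module):
    """Two entities (drug-drug or drug-protein), each with a graph branch and a sequence branch."""
    def __init__(self, graph_encoder_a, graph_encoder_b, seq_a, seq_b, feat_dim=128, n_out=1):
        super().__init__()
        self.graph_a, self.graph_b = graph_encoder_a, graph_encoder_b
        self.seq_a, self.seq_b = seq_a, seq_b
        self.head = nn.Linear(4 * feat_dim, n_out)    # concatenation of the four embeddings

    def forward(self, graph_in_a, seq_in_a, graph_in_b, seq_in_b):
        z = torch.cat([self.graph_a(graph_in_a), self.seq_a(seq_in_a),
                       self.graph_b(graph_in_b), self.seq_b(seq_in_b)], dim=-1)
        return self.head(z)   # Sigmoid/Softmax/identity is applied outside, depending on the task
```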
2.2 DeepDrug enables superior drug-drug interactions prediction
DDI prediction falls into two categories: (i) binary classification, where each pair of drugs in the database was annotated as a positive or negative example, with negative samples selected by either random pairing or a stringent blind test; and (ii) multi-class/multi-label classification, where the labels were obtained from annotations of the different interaction types defined in DrugBank and TwoSides (see Methods). We first evaluated the performance of DeepDrug for DDI prediction in the binary classification setting. We benchmarked DeepDrug against eight baseline methods: random forest classification (RF), logistic regression (LR), DeepDDI [16], DeepPurpose [25], NDD [15], AttentionDDI [44], DDIMDL [28] and SkipGNN [29]. Five datasets were used for evaluation: DDInter [45], DrugBank, TwoSides and two datasets from the NDD paper [15]. Our analysis showed that deep learning methods outperform similarity-based methods and traditional machine learning methods, such as RF and LR, across different datasets by a large margin. Among all competing methods, DeepDrug consistently performed best, achieving the highest F1 score of 0.916–0.955, the highest area under the precision-recall curve (auPRC) of 0.964–0.987 and the highest area under the receiver operating characteristic curve (auROC) of 0.971–0.988 in the balanced setting (Supplementary Table S1). Compared to the second-best baseline method, DeepPurpose, DeepDrug achieved on average a 2.1% higher F1 score, 1.3% higher auPRC and 1.1% higher auROC.
However, due to the rarity of DDIs [46], the number of known DDIs in a typical drug database is usually very low. Hence, to be more realistic and practical, we also evaluated the robustness of DeepDrug on imbalanced datasets by altering the ratio between positive and negative samples to 1:2, 1:4, 1:8 and 1:16 based on the number of drugs in the different datasets (Fig.2). Note that the results of NDD and AttentionDDI were directly collected from the original papers. Although the auPRC scores of all competing methods dropped, DeepDrug still outperformed the other methods across all datasets and positive-to-negative ratios, achieving the highest F1 and auPRC scores (Fig.2, Supplementary Fig. S2). Specifically, DeepDrug was more robust and achieved significantly higher performance than the best baselines, DeepPurpose and DeepDDI, when the dataset was extremely imbalanced (Fig.2). For example, the margin of DeepDrug over DeepPurpose in F1 score increased from 1.7% to 9.6% when the positive-to-negative sample ratio changed from 1:1 to 1:16 on the DDInter dataset, and the corresponding margin in auPRC increased from 1.40% to 8.50%. In summary, the F1 and auPRC performance of DeepDrug over the other prediction methods demonstrated its superior ability to predict DDIs, especially on imbalanced datasets.
Next, we compared DeepDrug with other methods under multi-class/multi-label classification settings, where only DeepDDI and DeepPurpose are applicable. We conducted the classification experiments on the DrugBank and TwoSides databases, based on 86 and 1317 interaction types, respectively. All of the DDI methods were evaluated using standard metrics, including macro F1 score and auPRC. In multi-class classification, DeepDrug achieved the best performance, obtaining a 4.3%–5.8% higher F1 score and 4.9%–6.7% higher auPRC than the best baseline method (Supplementary Table S2). This outperformance indicates the advantage of using both structural and sequence-based drug representations in DDI prediction. The same trend was observed in the multi-label classification results: although the 1317 interaction types lowered the F1 scores of all methods, DeepDrug achieved a much higher F1 score (0.292) and auPRC (0.265) than the second-best method DeepPurpose (F1 score 0.227 and auPRC 0.191; Supplementary Table S2, Fig. S3).
To evaluate the performance of DeepDrug under a more stringent setting, we used a blind test for binary classification, in which five-fold cross-validation ensured that one or both drugs in the test set were not seen in the training set. DeepDrug again outperformed the best baseline, DeepPurpose, achieving on average a 4.45% higher F1 score and 1.8% higher auPRC under double-blind testing across four datasets (Supplementary Table S3). To sum up, DeepDrug proved superior and robust in both binary and multi-class/multi-label classification of DDIs. Unlike DeepPurpose, which only uses SMILES sequence information, DeepDrug exploits both structural information from a novel graph representation and sequence information from the SMILES string, which is potentially capable of learning the underlying structural properties and thereby achieving better performance.
2.3 DeepDrug accurately identifies drug-target interactions
Although proteins generally have more intricate structures than chemical drugs due to the three-dimensional arrangement of their sequence residues, they can still be effectively represented by 3D graphs. We first classified the DTI dataset with binary labels and benchmarked DeepDrug against six baseline methods: RF, LR, DeepPurpose [47], CPI [48], MolTrans [49] and TransformerCPI [50]. Three benchmark datasets were introduced: BindingDB, DAVIS and KIBA. The benchmark results showed the same trend as in the DDI tasks, with deep learning methods dominating DTI prediction. DeepDrug again obtained the best performance among all deep learning methods, achieving an average auPRC of 0.811 across the three datasets, compared to 0.788 for the second-best baseline, DeepPurpose (Fig.3, Supplementary Table S4). Notably, DeepDrug and DeepPurpose were the only two deep learning methods applicable to the largest dataset, BindingDB, whereas the transformer-based method TransformerCPI failed due to its low computational efficiency. The superior performance of DeepDrug in the DTI prediction tasks indicates that the graph-based representation of drugs can serve as a general framework for boosting prediction performance in various drug-related tasks.
Next, we compared DeepDrug to four baseline methods, GraphDTA [27], DeepDTA [26], DeepPurpose [47] and RF, in the DTI regression setting, where we directly predict the continuous binding affinity measured by the Kd value (see Methods). We conducted the regression experiments on the same three datasets (BindingDB, DAVIS and KIBA [51]) based on the Kd value (kinase dissociation constant). All of the competing methods were evaluated using standard metrics, including the concordance index, Pearson r and R2 score. Again, DeepDrug achieved the best performance on all three evaluation measures compared to the baseline methods (Fig.3, Supplementary Table S5). Specifically, DeepDrug achieved the highest concordance score of 0.836 on BindingDB, which is 1.2% and 2.3% higher than DeepPurpose and the graph neural network-based method GraphDTA, respectively. The superiority of DeepDrug was consistently observed on the DAVIS and KIBA datasets. Unlike GraphDTA, which only updates node features in its graph convolutional layers, DeepDrug considers both node and edge features and updates them iteratively, leading to a more comprehensive representation of a drug and increased predictive power in the DTI tasks. The superiority of DeepDrug indicates the benefit of combining a comprehensive structural representation with a sequence representation for both drugs and proteins in DTI prediction.
To further explore the ability of DeepDrug in drug repositioning, similar to the DDI blind test, we stringently separated the drugs and proteins into training and test sets using five-fold cross-validation, curating a blind test set in which the drugs and/or proteins were unseen in the training set. This task is much more challenging because both the drugs and the proteins are unseen during training. DeepDrug achieved an average concordance score of 0.677 and Pearson r of 0.468 on the DAVIS dataset, outperforming DeepPurpose (concordance score of 0.605, Pearson r of 0.392) by a noticeable margin (Supplementary Table S6). Therefore, by exploiting structural information from the graph representations of drugs and proteins, DeepDrug was consistently superior to the baseline methods in both classification and regression of DTIs. We conclude that DeepDrug provides a powerful representation of both drugs and proteins by considering comprehensive structural information as well as sequence information. Its superior performance across various settings in the DDI and DTI prediction tasks implies a strong generalization ability for a wide range of drug-related applications.
2.4 Model ablation analysis
To further support the results shown in the above sections, we conducted a comprehensive model ablation analysis to measure the contributions of the different modules used in the DeepDrug architecture (Methods). First, we analyzed the performance of DeepDrug with respect to the presence of the Res-GCN module and of the CNN module, which leverage structural and sequence information, respectively. We used the binary DDI classification task at multiple positive-to-negative sample ratios and the DTI regression task for the ablation studies. Using the Res-GCN module alone led to a 0.5%–2.6% lower F1 score, while using only the CNN module resulted in a 0.2%–1.6% decline in F1 score (Supplementary Table S7). Similar decreases were observed in R2 and concordance score in the DTI regression tasks: removing the structural information reduced R2 by 1.1% and 1.8% and Pearson r by 0.6% and 1.4% on the KIBA and DAVIS datasets, respectively. DeepDrug with fused structural and sequence features performed best, indicating the benefit of integrating structural information with sequence information in the DTI tasks. Next, we removed the edge features, which are ignored by existing works but used in our Res-GCN modules; the F1 score decreased by about 2.4% to 3.2% (Supplementary Table S8). In summary, the Res-GCN module and the CNN module complement each other to further improve predictive performance, demonstrating the usefulness of the DeepDrug architecture.
We also analyzed the robustness of DeepDrug with respect to the following hyperparameter settings: the choice of feature aggregation, the number of hidden units in each GCN layer, and the total number of GCN layers. DeepDrug with the SoftMax aggregation function performed better than with other aggregation functions such as Mean and Sum (Supplementary Table S9). As the number of hidden units in the Res-GCN layers grew sufficiently large (e.g., 32 and above), both evaluation metrics began to saturate. Likewise, once enough layers were stacked, the model became insensitive to the exact number of Res-GCN and CNN layers (Supplementary Table S9). To sum up, DeepDrug was insensitive to most parameter choices, illustrating the robustness of the framework.
2.5 DeepDrug embeddings reflect drug types and drug functions
To demonstrate that DeepDrug effectively captures the variability of structural information in the embeddings learned by the Res-GCN module, we visualized the structural embeddings of drugs from the benchmark DrugBank dataset using t-distributed stochastic neighbor embedding (tSNE). The DeepDrug embeddings exhibited clear patterns that corresponded to the underlying drug types and drug functions (Fig.4, Supplementary Fig. S4). We assumed that drugs that are close in the embedding space (e.g., within the same cluster) share some form of higher similarity or closer relationship. To verify this, we quantified the effectiveness of the embeddings under various evaluation settings and found that the DeepDrug embeddings consistently outperformed those of DeepPurpose, achieving a higher average Drug Category Enrichment Score (0.690 vs 0.621, see DCES in Supplementary Note 1) and a higher silhouette score (0.568 vs 0.543, see Fig.4 and Supplementary Table S10). Extensive evaluations across folds showed that the DeepDrug embeddings consistently achieved the best performance (Fig.4). Furthermore, to evaluate DeepDrug on unseen drugs, we collected 4886 additional drugs from the DrugBank website that were not used in the benchmark studies; the mean DCES was again better than that of DeepPurpose (0.575 vs 0.514, Supplementary Table S11).
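The embedding evaluation described above can be approximated with standard scikit-learn utilities. The snippet below is a minimal sketch under the assumption that the learned drug embeddings and their category labels are available as NumPy arrays; it is not the authors' evaluation code, and the DCES metric (defined in their Supplementary Note 1) is not reproduced here.

```python
import numpy as np
from sklearn.manifold import TSNE
from sklearn.metrics import silhouette_score

def visualize_and_score(embeddings: np.ndarray, labels: np.ndarray):
    """Project drug embeddings to 2D with tSNE and score cluster separation.

    embeddings: (n_drugs, d) structural embeddings from the Res-GCN branch (assumed precomputed).
    labels:     (n_drugs,) integer drug-category labels used for the silhouette score.
    """
    coords = TSNE(n_components=2, perplexity=30, random_state=0).fit_transform(embeddings)
    sil = silhouette_score(embeddings, labels)   # higher means tighter, better-separated clusters
    return coords, sil
```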
Next, we isolated the 28 drugs in cluster 4 (enriched for opioids) and compared their chemical structures and functionalities with other randomly sampled drugs that lay far away from the cluster. A subset of the sampled drugs is presented in Fig.4. Strikingly, the drugs in the cluster shared very similar structural compositions. In terms of functionality, the drugs in the cluster identified by the DeepDrug embeddings were also highly similar to one another (Fig.4): all 28 drugs in cluster 4 are intended for pain relief (Supplementary Table S12). Taken together, these results demonstrate that the DeepDrug structural embeddings effectively capture the structural information that may determine the functionality of the input entities, thereby reflecting the underlying drug functions. This structural embedding capability is considered the main driving force behind the superior performance of DeepDrug.
2.6 DeepDrug provides potential therapeutic opportunities against SARS-CoV-2
SARS-CoV-2 is a newly emerged enveloped positive-strand RNA virus with probably the largest genome (approximately 30 kb) among all RNA viruses. The nucleocapsid (N) protein, which is mainly responsible for recognizing and wrapping viral RNA into helically symmetric structures, has been reported to boost the efficiency of transcription and replication of viral RNA, implying its vital and multifunctional roles in the life cycle of the coronavirus [52].
We then investigated whether DeepDrug was able to correctly identify the interactions of SARS-CoV-2 proteins. We constructed two drug-target positive datasets (i.e., one expert-confirmed and one literature-based) for SARS-CoV-2 from a recent study [53] (Supplementary Note 2). In our benchmark BindingDB dataset, there were 68 SARS-CoV-2-interacting drugs and 124 proteins similar to the SARS-CoV-2 proteins. To construct the dataset under a stringent rule, we removed from the training set those SARS-CoV-2-interacting drugs and analogous drugs sharing similar SMILES sequences (drug sequence similarity > 60%, see Methods, Supplementary Fig. S5A, B), and removed proteins similar to the SARS-CoV-2 proteins with protein sequence similarity > 30% (a sketch of this similarity-based filtering is given below). After removing these records, we re-trained the DeepDrug model and used the held-out SARS-CoV-2-interacting drugs to construct an independent test set. The DeepDrug prediction scores for interacting and non-interacting pairs are shown in Fig.5; DeepDrug assigned higher prediction scores to the interacting pairs. The results showed that DeepDrug was able to distinguish expert-confirmed positive pairs from negative pairs under both the mean and maximum strategies (p-values of 5.06 × 10⁻⁹ and 8.06 × 10⁻⁷, respectively, one-sided paired t-test). Predictions based on similar templates from the RCSB database, rather than simulated structures, showed a similarly significant discrimination between expert-confirmed positive pairs and negative pairs (Supplementary Fig. S6). In addition, DeepDrug trained on the original BindingDB dataset showed similar performance (Supplementary Fig. S7). We observed some outliers with very high predicted affinity among the negative pairs, which could be valid potential drugs. Among the top-ranked drug-protein pairs, 2 of the top-3 and 7 of the top-10 drugs had already been reported in the literature (Supplementary Table S13). Among these molecules, prinomastat (the 2nd top-ranked molecule), a matrix metalloprotease inhibitor, was reported to have selective activity against SARS-CoV-2 but not against SARS-CoV [54]. Besides, pioneering research [55] has shown that the TNF, IL1B, IL6, IL8, NFKB1, NFKB2 and RELB genes are significantly upregulated, leading to strong activation of the TNF and NFκB signaling pathways in SARS-CoV-2 patients. These pathologic features are similar to those of chronic obstructive pulmonary disease (COPD). Tiotropium (the 4th top-ranked molecule), which is observed to alleviate airway inflammation and improve pulmonary function, is a well-known therapeutic drug for COPD patients; therefore, tiotropium is a potentially effective drug for the treatment of SARS-CoV-2. In addition, a therapeutic trial on 11 SARS-CoV-2 patients demonstrated that danoprevir (the 7th top-ranked molecule) treatment effectively inhibited viral replication and improved patient health status [56]. Hence, repurposing danoprevir, a potent hepatitis C virus (HCV) protease inhibitor, for SARS-CoV-2 is a promising therapeutic option. These results further demonstrated the strong predictive power of DeepDrug, which may provide therapeutic opportunities against newly discovered proteins such as those of SARS-CoV-2.
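The similarity-based filtering used to build this stringent test set can be illustrated as follows. This is a hedged sketch, not the authors' code: it assumes RDKit Morgan-fingerprint Tanimoto similarity as the drug-similarity measure, whereas the exact similarity definitions used in the paper are given in its Methods, and the protein-sequence filtering step is omitted.

```python
from rdkit import Chem
from rdkit.Chem import AllChem
from rdkit.DataStructs import TanimotoSimilarity

def max_drug_similarity(query_smiles: str, reference_smiles: list) -> float:
    """Maximum Tanimoto similarity between a query drug and a set of reference drugs."""
    q = AllChem.GetMorganFingerprintAsBitVect(Chem.MolFromSmiles(query_smiles), radius=2, nBits=2048)
    refs = [AllChem.GetMorganFingerprintAsBitVect(Chem.MolFromSmiles(s), radius=2, nBits=2048)
            for s in reference_smiles]
    return max(TanimotoSimilarity(q, r) for r in refs)

def filter_training_drugs(train_smiles, sars_cov2_smiles, cutoff=0.6):
    """Drop training drugs whose similarity to any SARS-CoV-2-interacting drug exceeds the cutoff."""
    return [s for s in train_smiles if max_drug_similarity(s, sars_cov2_smiles) <= cutoff]
```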
3 DISCUSSION
In this study, we proposed DeepDrug, a novel end-to-end deep learning framework for DDI and DTI prediction. DeepDrug takes both the topological structure information and the sequence information of either a drug-drug pair or a drug-protein pair as inputs and utilizes Res-GCNs and CNNs to learn graph representations and high-level sequence embeddings, respectively. The multi-source features are fused so that they complement each other, achieving highly accurate predictions. To the best of our knowledge, DeepDrug is the first work to apply both graph convolutions and sequence convolutions to molecular representation. In addition, we demonstrated that combining intrinsic graph-based representations with high-level sequence embeddings is appealing for a comprehensive assessment of DDIs and DTIs. Unlike the AdvProp method [57], which uses ensemble learning to combine the outputs of separate sub-models that take structural and sequence information as inputs, DeepDrug integrates structural and sequence information within a single model architecture. Our extensive experiments highlighted the predictive power of DeepDrug and its potential translational value in drug repositioning.
We also identify three possible directions for improving the DeepDrug model. First, rich multi-omics data, including genomic, transcriptomic, epigenomic and proteomic data, which have proven to be informative [58–63], could further improve the predictive power of DeepDrug; we plan to incorporate such data into the model. Second, the current interaction predictions (e.g., DTIs) do not consider causal interactions, in which one drug is involved in a biological or biochemical process that directly or indirectly affects a protein. Identifying such direct and indirect interactions with causal inference methods [64] could help us better understand the related biological or chemical pathways and mechanisms. Third, based on complex gene-protein-drug-disease heterogeneous networks constructed from multiple genomics databases [65–67], combining sequence and structural features of proteins/targets with association features in complex graphs through heterogeneous graph convolutional networks would be another promising direction.
To sum up, we introduced DeepDrug, which can serve as a framework for systematically exploring the DDI and DTI prediction tasks with a unified model architecture. With DeepDrug, researchers can perform drug repositioning for specific target proteins, simultaneously learning the interaction mechanism and annotating the interaction potential of every candidate drug. Using large-scale public data, one can train an accurate and interpretable model to predict the interactions associated with human diseases (e.g., SARS-CoV-2). We hope our approach will help unveil drug interaction mechanisms and facilitate further biochemical research.
4 METHODS
4.1 Drug and protein feature representation
We used DeepChem [68] to convert drug SMILES strings into graph representations in the form of feature matrices (i.e., node/edge feature matrices) and adjacency matrices. We used the PAIRPred [69] software to extract protein PDB data into similar graph representations, also including feature matrices and adjacency matrices. Specifically, each drug graph is constructed with 11-dimensional edge features and 93-dimensional node features, of which 91 features were calculated using DeepChem and the remaining two are the in-degree and out-degree of each node. The cutoff for drug sequence length is set to 200. For the graph features of proteins, we first collected the PDB files of all proteins from the RCSB database. For each protein, we selected the longest crystal structure, i.e., the longest chain in the PDB file, as the 3D structure of the protein. Each protein graph is constructed with 80-dimensional node features, including amino acid features, and 2-dimensional edge features (the distance and the angle between amino acids [41]). Among the node features, 78 are calculated by the PAIRPred software and the remaining two are the in-degree and out-degree of the amino acids. Note that we did not take into account the conformational plasticity of proteins towards different drugs due to the lack of sufficient available data. The cutoff for protein sequence length is set to 1000. For the DTI datasets, we removed proteins without 3D structures and the corresponding DTI pairs.
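As an illustration of this kind of drug featurization, the sketch below builds a node-feature matrix and adjacency matrix from a SMILES string. It is a simplified, hedged example using DeepChem's ConvMolFeaturizer with manually appended in/out-degree columns; the exact 91 DeepChem features used in the paper and the PAIRPred protein pipeline are not reproduced here.

```python
import numpy as np
import deepchem as dc

def drug_to_graph(smiles: str):
    """Convert a drug SMILES string into (node_features, adjacency_matrix).

    Node features are DeepChem atom features with the node in-degree and
    out-degree appended as two extra columns (for an undirected molecular
    graph the two columns coincide).
    """
    conv_mol = dc.feat.ConvMolFeaturizer().featurize([smiles])[0]
    atom_feats = conv_mol.get_atom_features()        # (n_atoms, n_deepchem_feats)
    neighbors = conv_mol.get_adjacency_list()        # list of neighbor indices per atom

    n = atom_feats.shape[0]
    adj = np.zeros((n, n), dtype=np.float32)
    for i, nbrs in enumerate(neighbors):
        adj[i, nbrs] = 1.0

    in_deg = adj.sum(axis=0, keepdims=True).T        # (n, 1)
    out_deg = adj.sum(axis=1, keepdims=True)         # (n, 1)
    node_feats = np.concatenate([atom_feats, in_deg, out_deg], axis=1)
    return node_feats, adj
```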
4.2 Residual graph convolutional network
The residual graph convolutional network (Res-GCN) module is capable of learning both node embeddings and edge embeddings simultaneously through graph convolutions, whereas other GCN-based methods in this field consider only node embeddings. The Res-GCN module converts the original node features (93 and 80 features for drugs and proteins, respectively) into 128-dimensional features and likewise converts the original edge features (11 features for drugs and 2 features for proteins) into 128-dimensional features. Borrowing from the success of deep residual networks [70], we applied convolutional residual blocks in the Res-GCN module, with 22 residual blocks for the drug branch (DDI and DTI tasks) and 6 residual blocks for the protein branch (DTI tasks).
We carefully designed a strategy for iteratively updating the edge features and node features in each convolutional residual block as follows. Taking the $l$-th graph convolutional residual block as an example, we denote the input node features and edge features as $H^{(l)} \in \mathbb{R}^{N \times d_n}$ and $E^{(l)} \in \mathbb{R}^{M \times d_e}$, respectively, where $N$ and $M$ denote the numbers of nodes and edges, and $d_n$ and $d_e$ represent the node and edge feature dimensions. For initialization, $H^{(1)}$ is set to the original node features ($d_n = 93$ for drugs or $d_n = 80$ for proteins) and $E^{(1)}$ to the original edge features ($d_e = 11$ for drugs or $d_e = 2$ for proteins). Both the node features $H^{(l)}$ and the edge features $E^{(l)}$ were first passed through a layer-normalization layer [71], a ReLU nonlinear layer and a dropout layer (ratio = 0.1), which is represented as
$$\tilde{H}^{(l)} = \mathrm{Dropout}\big(\mathrm{ReLU}(\mathrm{LN}(H^{(l)}))\big), \qquad \tilde{E}^{(l)} = \mathrm{Dropout}\big(\mathrm{ReLU}(\mathrm{LN}(E^{(l)}))\big).$$
We next illustrate how the updated node and edge features are obtained from the processed feature matrices $\tilde{H}^{(l)}$ and $\tilde{E}^{(l)}$ through an iterative strategy. We use $\tilde{h}_i^{(l)}$ and $\tilde{e}_{ij}^{(l)}$ to denote the $i$-th node features (i.e., the $i$-th row of $\tilde{H}^{(l)}$) and the edge features between the $i$-th and $j$-th nodes. Taking the $i$-th node as an instance, we first calculated the residual features of the edge between the $i$-th and $j$-th nodes based on the current edge features $\tilde{e}_{ij}^{(l)}$ and the node features $\tilde{h}_i^{(l)}$ and $\tilde{h}_j^{(l)}$, which is represented as
$$\Delta e_{ij}^{(l)} = \mathrm{MLP}^{(l)}\big(\big[\tilde{h}_i^{(l)} \,\|\, \tilde{h}_j^{(l)} \,\|\, \tilde{e}_{ij}^{(l)}\big]\big),$$
where $\mathrm{MLP}^{(l)}$ is a two-layer perceptron with 256 and 128 nodes and ReLU activation, taking the concatenation of the node features and the corresponding edge features as input. Next, we calculated the residual features of the $i$-th node based on the current $i$-th node features and all edge features connected to node $i$, which is formulated as
$$\Delta h_i^{(l)} = W^{(l)} \tilde{h}_i^{(l)} + \sum_{j \in \mathcal{N}(i)} \frac{\exp\!\big(\beta^{(l)} \Delta e_{ij}^{(l)}\big)}{\sum_{k \in \mathcal{N}(i)} \exp\!\big(\beta^{(l)} \Delta e_{ik}^{(l)}\big)} \odot \Delta e_{ij}^{(l)},$$
where $W^{(l)}$ is a learnable parameter of the $l$-th graph convolutional residual block and the second term is the SoftMax aggregation function [70] that aggregates the information of the edges between node $i$ and all of its neighboring nodes $\mathcal{N}(i)$. Note that the SoftMax aggregation function is parametrized by the temperature $\beta^{(l)}$, which is also learnable during training. After obtaining the residual features of all nodes and edges, which form the residual node feature matrix $\Delta H^{(l)}$ and the residual edge feature matrix $\Delta E^{(l)}$, the node features $H^{(l+1)}$ and edge features $E^{(l+1)}$ of the $(l+1)$-th graph convolutional residual block were updated through the following propagation rule:
$$H^{(l+1)} = H^{(l)} + \Delta H^{(l)}, \qquad E^{(l+1)} = E^{(l)} + \Delta E^{(l)}.$$
To ensure that the features $H^{(l)}$ and $\Delta H^{(l)}$ (and likewise $E^{(l)}$ and $\Delta E^{(l)}$) are compatible for the shortcut addition in the first block, where the input dimensions differ from 128, we additionally used a linear layer with 128 nodes and a layer normalization to transform the input dimension.
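To make the update rule above concrete, the following PyTorch sketch implements one such residual block over an edge list. It is a hedged illustration of the described computation, not the released DeepDrug code: the tensor layouts, the `edge_index` convention and the handling of the learnable temperature `beta` are assumptions.

```python
import torch
import torch.nn as nn

class ResGCNBlock(nn.Module):
    """One residual block that updates node and edge features jointly (illustrative sketch)."""
    def __init__(self, dim=128, dropout=0.1):
        super().__init__()
        self.pre_h = nn.Sequential(nn.LayerNorm(dim), nn.ReLU(), nn.Dropout(dropout))
        self.pre_e = nn.Sequential(nn.LayerNorm(dim), nn.ReLU(), nn.Dropout(dropout))
        self.edge_mlp = nn.Sequential(nn.Linear(3 * dim, 256), nn.ReLU(), nn.Linear(256, dim))
        self.node_lin = nn.Linear(dim, dim)          # W^(l)
        self.beta = nn.Parameter(torch.ones(1))      # learnable SoftMax temperature

    def forward(self, h, e, edge_index):
        # h: (N, dim) node features; e: (M, dim) edge features, both assumed already
        # projected to `dim` (the first block applies a linear layer + layer norm).
        # edge_index: (2, M) long tensor with source and target node ids per edge.
        src, dst = edge_index
        h_t, e_t = self.pre_h(h), self.pre_e(e)

        # Residual edge features from the two endpoint nodes and the current edge features
        delta_e = self.edge_mlp(torch.cat([h_t[src], h_t[dst], e_t], dim=-1))   # (M, dim)

        # SoftMax aggregation of edge messages per target node (element-wise over features)
        logits = self.beta * delta_e
        logits = logits - logits.max()                       # numerical stability
        num = torch.exp(logits)
        den = torch.zeros_like(h).index_add_(0, dst, num)    # per-node normalizer
        weights = num / (den[dst] + 1e-16)
        agg = torch.zeros_like(h).index_add_(0, dst, weights * delta_e)

        delta_h = self.node_lin(h_t) + agg
        return h + delta_h, e + delta_e                      # residual (shortcut) updates
```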
Note that the two Res-GCN (or CNN) modules share weights in the DDI tasks and are independent in the DTI tasks. In the DDI tasks, the interaction between two drugs is not reciprocal; thus, although DeepDrug's feature extraction modules are shared by both drugs, the input positions of the two drugs cannot be switched (i.e., the A-B drug pair input is not equivalent to the B-A drug pair input for DeepDrug).
4.3 Data preparation
We collected five DDI benchmark datasets for evaluation. The DrugBank benchmark dataset consists of 1706 drugs with 191,808 drug pairs among 86 types of drug interactions based on drug function. The TwoSides dataset consists of 645 drugs with 63,473 drug pairs among 1317 kinds of interactions based on side effects, such as "abscess", "adenoma" and "agnosia". Different from the mutually exclusive interactions of the DrugBank dataset, the side effects in the TwoSides dataset are not exclusive, making side effect prediction a multi-label classification task. We further filtered out classes with fewer than 500 samples and constructed TwoSides (963), the TwoSides dataset with 963 interaction types. Two datasets were collected from NDD [15]. The first one, termed NDD_DS1, is composed of 548 drugs with 300,304 drug pairs, of which 97,168 pairs are positive and the rest are negative. The second one, termed NDD_DS2, consists of 707 drugs with 499,849 drug pairs, of which 34,412 pairs are positive. The DDInter [45] dataset consists of 1493 drugs with 117,608 drug pairs.
To generate a series of binary datasets from DrugBank with different positive-to-negative ratio, we considered all the pairs in the DrugBank dataset as positive samples. As for negative samples, we randomly selected drug pairs in the dataset and eliminated drug pairs that overlapped with positive samples and duplicated drug pairs. In this way, we constructed a series of binary classification datasets with positive-to-negative ratio of 1:1, 1:2, 1:4, 1:8 and 1:16.
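The random negative-sampling procedure just described can be sketched as follows; the function name and the use of frozensets to treat pairs as unordered for the overlap and duplicate checks are illustrative assumptions rather than the paper's exact implementation.

```python
import random

def sample_negative_pairs(drugs, positive_pairs, ratio=1, seed=0):
    """Randomly pair drugs to create negatives, excluding known positives and duplicates.

    drugs:          list of drug identifiers
    positive_pairs: iterable of (drug_a, drug_b) known interactions
    ratio:          number of negatives per positive (1, 2, 4, 8 or 16 in the paper)
    Pairs are treated as unordered here for the overlap check (an assumption).
    """
    rng = random.Random(seed)
    positives = {frozenset(p) for p in positive_pairs}
    negatives = set()
    target = ratio * len(positives)
    while len(negatives) < target:
        a, b = rng.sample(drugs, 2)          # two distinct drugs
        pair = frozenset((a, b))
        if pair not in positives:
            negatives.add(pair)              # set membership removes duplicates
    return [tuple(p) for p in negatives]
```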
We collected three DTA benchmark datasets for evaluation: DAVIS, KIBA and BindingDB [72]. After discarding proteins without 3D structures in the RCSB database, the DAVIS dataset consists of 68 drugs and 316 proteins, forming 21,488 drug-protein pairs. The KIBA dataset consists of 2111 drugs and 185 proteins, forming 390,535 drug-protein pairs. The BindingDB dataset consists of 417,893 drugs and 2076 proteins, forming 751,808 drug-protein pairs. Following a pioneering study [26], we applied thresholds of 100, 12.1 and 400 to the raw affinity scores of the DAVIS, KIBA and BindingDB datasets, respectively, to construct the corresponding binary datasets.
For the DAVIS and BindingDB datasets, the binding affinity is measured by the Kd value (kinase dissociation constant), whose range is too large to be used directly. Kd is therefore log-transformed into pKd using the following formula [26]:
$$\mathrm{p}K_d = -\log_{10}\!\left(\frac{K_d}{10^9}\right).$$
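For reference, the thresholding and log-transformation described above can be written in a few lines. The function names are illustrative, the thresholds are those quoted in the text, and the direction of the binary comparison is an assumption that should follow the convention of [26].

```python
import numpy as np

def kd_to_pkd(kd_nm: np.ndarray) -> np.ndarray:
    """Log-transform Kd values (in nM) into pKd, as in DeepDTA [26]."""
    return -np.log10(kd_nm / 1e9)

def binarize_affinity(raw_affinity: np.ndarray, threshold: float, higher_is_stronger: bool) -> np.ndarray:
    """Binary interaction label from a raw affinity score.

    The comparison direction is dataset-dependent (an assumption here): for Kd-based
    datasets lower values mean stronger binding, while for KIBA-style scores the
    convention may be reversed; the paper's exact rule follows [26].
    """
    if higher_is_stronger:
        return (raw_affinity >= threshold).astype(int)
    return (raw_affinity <= threshold).astype(int)
```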
4.4 Baseline methods
To evaluate the performance of DeepDrug, we benchmarked it on multiple datasets with a five-fold cross-validation strategy for the DDI and DTI tasks. For the classification task, we benchmarked DeepDrug against multiple baseline methods, including DeepPurpose [47], DeepDDI [16], NDD [15], AttentionDDI [44], DDIMDL [28], SkipGNN [29], logistic regression (LR) and random forest (RF). We modified DeepDDI slightly to make it suitable for binary classification. Note that NDD and AttentionDDI are based on multiple similarity matrices that cannot be computed on the other datasets because the source code is not released; we therefore directly collected their results on NDD_DS1 and NDD_DS2 from the original papers. DeepPurpose is a deep learning framework for DTI prediction; we used its default setting (CNN embeddings for drugs and targets) for benchmarking and modified it slightly to make it suitable for DDI prediction. For the drug-target interaction task, we benchmarked DeepDrug against RF, LR, MolTrans [49], CPI [48], TransformerCPI [50] and DeepPurpose. Note that we did not evaluate MolTrans, LR or TransformerCPI on the BindingDB dataset because of the running-time limitation (48 hours). For the drug-target affinity regression tasks, we benchmarked DeepDrug against DeepDTA [26], GraphDTA [27] and DeepPurpose.
4.5 Model training and evaluation
The final prediction layer was a linear layer with an activation function that depended on the task. Specifically, the Sigmoid activation function was used for the binary classification and multi-label classification tasks. The multi-label information was collected from the TwoSides and DrugBank databases, which contain 1317 and 86 interaction-type categories, respectively. The Softmax activation function was used for the multi-class classification task, and no activation function was used for the regression task. Cross-entropy (CE) loss was used in the classification settings and mean squared error (MSE) loss in the regression settings. We used the Adam optimizer with an initial learning rate of 0.01 and a weight decay of 10⁻⁴. The dropout ratio was set to 0.1. DeepDrug was implemented with the PyTorch framework [73]. We used Ray [74] for hyperparameter searching (Supplementary Note 3).
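A minimal sketch of the training setup described above is given below. It assumes a `model` built as in the earlier architecture sketch and a `task` string; the helper name and the choice of `BCEWithLogitsLoss` for the Sigmoid-based tasks are illustrative assumptions, not the released training script.

```python
import torch
import torch.nn as nn

def build_training_objects(model: nn.Module, task: str):
    """Pick the loss and optimizer according to the task type (illustrative sketch)."""
    if task in ("binary", "multi_label"):
        criterion = nn.BCEWithLogitsLoss()     # Sigmoid folded into the loss for numerical stability
    elif task == "multi_class":
        criterion = nn.CrossEntropyLoss()      # Softmax folded into the loss
    elif task == "regression":
        criterion = nn.MSELoss()               # no output activation
    else:
        raise ValueError(f"unknown task: {task}")
    optimizer = torch.optim.Adam(model.parameters(), lr=0.01, weight_decay=1e-4)
    return criterion, optimizer
```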
The F1 score, auROC and auPRC were used to measure performance in the classification tasks. Because the datasets are imbalanced, the macro F1 score and auPRC are the more suitable metrics. For multi-label and multi-class classification, we treated the problem as multiple binary classification tasks, calculated auROC and auPRC individually, and averaged them to obtain the final auROC and auPRC scores. For the regression task, we used several metrics to evaluate the performance of affinity prediction, including R2, Pearson correlation, and the concordance index.
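The per-class averaging scheme described here corresponds to standard scikit-learn calls; the sketch below shows one way to compute it for the multi-label case and is illustrative rather than the authors' evaluation code.

```python
import numpy as np
from sklearn.metrics import f1_score, roc_auc_score, average_precision_score

def multilabel_scores(y_true: np.ndarray, y_prob: np.ndarray, threshold: float = 0.5):
    """Macro F1, mean per-class auROC and mean per-class auPRC for multi-label predictions.

    y_true: (n_samples, n_classes) binary labels
    y_prob: (n_samples, n_classes) predicted probabilities
    """
    y_pred = (y_prob >= threshold).astype(int)
    macro_f1 = f1_score(y_true, y_pred, average="macro")
    # Treat each class as an independent binary task, then average (single-label classes are skipped)
    aurocs, auprcs = [], []
    for c in range(y_true.shape[1]):
        if len(np.unique(y_true[:, c])) < 2:
            continue
        aurocs.append(roc_auc_score(y_true[:, c], y_prob[:, c]))
        auprcs.append(average_precision_score(y_true[:, c], y_prob[:, c]))
    return macro_f1, float(np.mean(aurocs)), float(np.mean(auprcs))
```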