1 INTRODUCTION
Module (or community) detection is useful for both decomposing and visualizing various types of biological association data such as detecting dense communities in genetic interaction networks, finding patterns representing protein complexes and pathways, mining tight interaction patterns in phosphorylation networks and so on [
1–
15]. Thus, a number of tools have been developed to identify modules such as CFinder [
16], MINE [
17], Linkcomm [
18] and SurpriseMe [
19]. Besides, several Cytoscape plugins have also been designed for such tasks [
20].
Module detection in bipartite biological networks is also a key and valuable issue. For example, in transcriptional regulatory network, mining tight interactions between transcriptional factors (TFs) and target genes could be perceived as a bipartite module detection problem, where TFs represent one type of node set and target genes represent another node set. Thus, it is an urgent task to develop efficient module detection methods for bipartite biological networks. A few methods have been developed for this task such as AdaptiveBrim [
21], BCFinder [
22], BiLPA [
23], BiSBM [
24], LeadingEigenvector [
25], and LPBrim [
26]. In addition, some are implemented as a plugin of Cytoscape, like NeMo [
27], which combines hierarchical agglomerative clustering with neighbor-sharing score to detect diverse network communities. However, there is few user-friendly online tools to implement diverse methods and facilitate their applications. To this end, we design a web toolkit BMTK (bipartite module detection toolkit) to implement seven bipartite module detection methods. This toolkit unifies operation platform of different methods, standardizes their input and output formats, improves program structures to speed up their computations, and provides online visualization function. BMTK will be a powerful tool for detecting bipartite modules in diverse bipartite biological networks.
2 METHODS
Here, we introduce the main seven methods implemented in BMTK. AdaptiveBrim employs BRIM (bipartite, recursively induced modules) with an adaptive and heuristic strategy to maximize the modularity to get an optimal partition [
21]. The basic idea includes doubling the number of modules until modularity decreases and then using a bisection method to identify the maximal value of modularity between current and previous partition. BCFinder is based on an extension of the
k-clique community detection algorithm. It detects all kinds of bicliques
K(
m,
n) and merges them to form communities [
22]. BiLPA
employs a heuristic adapted label propagation algorithm to optimize a new quantitative function called bipartite partition density to identify bipartite network modules [
23]. BiSBM
implements a projection-free and stochastic block model, which can be applied into
k-partite networks directly [
24]. It is a variant of the classic Kernighan-Lin algorithm. LeadingEigenvector adopts a spectral approach to determine the leading non-negative eigenvector of the modularity matrix of a network to extract corresponding dense modules [
25]. LPBrim
first employs a fast label propagation (LP) approach to find an initial module partition efficiently and then adopt a recursive induced method BRIM to determine the final module division [
26]. This algorithm is designed for detecting modular structure in large-scale bipartite networks within reasonable time. LocalBM uses a greedy growth strategy to maximize a cohesiveness function. This strategy has been adopted for protein complex extraction in unipartite biological networks [
28].
2.1 Implementation
BMTK is a user-friendly online software implemented by JDK1.6 (Figure 1 and Supplementary Materials) [
21]. It is designed to pack diverse algorithmic implementations in Matlab, C++ or others into a unified Java package via mixed programming techniques. In BMTK, we first provide users a brief introduction, and further provide a tutorial on how to use BMTK including uploading a recognizable bipartite network file, running a method, and demonstrating and downloading identified modules. The key part is about running one of the methods. Users could select any of them by clicking corresponding method label.
We uniform the input formats of methods in BMTK which supports either adjacency matrix or edge list of a bipartite network. The toolkit will automatically extract required information and transform it into corresponding format for each method which has its own input format originally.
After uploading the data, BMTK turns into a data summary page which provides some basic information about the uploaded bipartite network, including the number of nodes and edges, density of this network, and an interactive visualization of the network by Cytoscape Web. Next, by clicking the link below the summary, BMTK starts to apply a user-preferred method to the input network, and then automatically turns into the result page with a table containing the basic statistics of all modules. Moreover, we provide visualization of the detected modules by Cytoscape Web. Besides, we also give a global visualization of partition result in two figures by BiMat [
29]. At last, all results can be downloaded for local analysis.
More detailed description about BMTK is introduced as follows.
2.1.1 Input data
We provide two input formats for a bipartite network in BMTK:
1) Edge list of the bipartite network (.txt)
The file contains two node columns separated by single tab or space. Nodes in each column belong to part U ={u1,u2,…} and part V ={v1,v2,…} of the bipartite network, respectively. Each row indicates one edge.
2) Adjacency matrix of the bipartite network (.txt)
The file contains the bipartite adjacency matrix, whose rows and columns respectively correspond to the nodes in two parts, and the values in it must be binary (0 or 1), where 1 indicates one edge exists.
2.1.2 Run
1) Open the website of tools in the Zhang lab. Click “Run” section located in menu on the left, then select a method (e.g., AdaptiveBrim).
2) Upload network file through clicking “Browse” and select file format via “format” option.
3) After submitting data, the page will jump to “data summary”, which shows essential information of the bipartite network and check the reasonability of key parameters. Then click “Running” to detect modules of a given bipartite network.
4) After a while (the computing time depends on characteristics of the input network), the page will show basic statistics of the identified modules, reordered adjacency matrix and layout of the bipartite network based on the detected modules. The user can download the module member nodes and module subnetworks stored in “moduleMembership.txt” and “moduleSubnetwork.txt”, respectively.
2.1.3 Overlap test
We build an “overlap test” function in BMTK. It gives the consistency degree of modules detecting by two methods in a heatmap. Hypergeometric test is used for all pairwise modules to test weather their overlaps are significant or not.
2.1.4 Tutorial and Demo
We have shown a detailed tutorial about BMTK. You can use the bipartite network via selecting “edgelist.txt” or “matrix.txt” in the “Demo” to test any method in BMTK.
2.2 Application
We first apply BMTK onto a set of simulated data with sizes of 100×120, 250×300, 500×600 and 1,000×1,200 and diverse numbers of embedded modules (Table 1 and Tables S1–S3). The links between nodes in each network is randomly generated with probability 0.3 within modules and 0.03 for others. The adjust rand index (ARI), a measure of agreement, is adopted to measure the module partition results against a given criterion. Generally, we find that BiLPA tends to find more detailed modules in each case. As to the time complexity, we can see that AdaptiveBrim is the most efficient one. LeadingEigenvector and LocalBM perform well for all situations. BiSBM is different from other five methods. It detects modules in two parts of a network separately with the given numbers of modules in both parts. Thus, it does not reveal the link affiliation with respect to modules in both parts. Generally, all the methods perform differently and user may need to select one method based on some domain knowledge or experience for further analysis. As BCFinder is based on an extension of the k-clique community detection algorithm which differs from other methods, we do not show its results in tables.
Moreover, we believe that bipartite module detection in a drug-target network will be helpful for understanding the modular characteristics of such a network (Figure 2A). Here, we apply BMTK to a drug-target bipartite network (Supplementary Figure S1) [
30]. This network includes 208 drugs, 201 targets and 1,473 drug-target interactions. We detect a set of drug-target modules with each method (Table 2), and analyze the functions of drugs and targets (Supplementary Figures S2–S7 and Tables S4–S10). We find that in the same detected module, drugs often function in the same pathway and so do targets. For example, LeadingEigenvector identifies 14 modules, which contain 13 drugs and eight targets on average (Figure 2A and Supplementary Table S8).
For the left module in Figure 2B, it closely relates to antidiabetics (named as antidiabetics module). This module contains 13 drugs and three targets, in which 11 drugs are blood sugar related agents. Specifically, based on the KEGG DRUG database, D00593, D00336, D01599, D00380, D00219, D00379, D00335 and D01799 all belong to hypoglycemic agents. D00594, D01111 and D01854 are all antidiabetics. For the drug D01828 (amlexanox, an approved small molecule to treat aphthous ulcers and asthma), a previous study reported that using this drug for obese mice elevates energy expenditure resulting in improved insulin sensitivity and decreased steatosis. Thus, it is a promising candidate for obesity treatment. For the drug D00418, it is a kind of ATP-sensitive potassium (KATP) channel blocker. In pancreatic beta-cells, KATP channels regulate glucose-dependent insulin secretion. That is, when KATP channels open, beta-cells hyperpolarize and insulin secretion is suppressed. Thus, such KATP blockers could serve as the target for the treatment of type II diabetes. For the targets hsa:3767 and hsa:6833, they are the members of type II diabetes mellitus and insulin secretion pathways. hsa:3785 (KCNQ2) encodes the potassium channel protein KvLQT2, which is also associated with insulin secretion stated as before. Besides, in the original drug-target bipartite network, it links to nine drugs, eight of which are hypoglycemic agents or antidiabetics. This also indicates that hsa:3785 probably involves in decreasing blood sugar levels. In summary, this module structure detected by BMTK is very biologically meaningful and helpful to explore the relationships between drugs and target genes. Interestingly, the ninth detected module by BiLPA is the same as this one; and for LPBrim and AdaptiveBrim, this module is separated into the same two modules, one of which contains one target (hsa:3767) and its linked five drugs and the other one contains two targets (hsa:6833 and hsa:3758) and their linked eight drugs.
3 CONCLUSION
Here we develop a web toolkit BMTK, implementing seven methods originally programmed in different environments. It is a convenient and powerful tool for bipartite network module detection. It uniforms input data format for all methods, optimizes each method, provides necessary data analysis, result visualization, and downloads. Moreover, we test it on a real drug-target bipartite network to show its effectiveness.
Higher Education Press and Springer-Verlag GmbH Germany, part of Springer Nature