1 INTRODUCTION
Escherichia coli is one of the most important model organisms in biology, and its genome-scale metabolic model (GEM) has aided the development of microbial systems biology [1]. To understand microbial biology at the systems level, metabolic network reconstruction is a key technology for exploring the structure and dynamics of the cell system. Based on genome and literature data, several groups have reconstructed genome-scale metabolic networks of E. coli [1–6]. The biological information in these networks focuses on the endogenous metabolism of E. coli; the large amount of information on engineered strains is not covered by these studies. In addition, many tools for visualizing pathways, reactions and compounds rely on traditional 2D layouts. For instance, KEGG Atlas [7] is a graphical interface to the KEGG suite of databases that contains a manually created global map of metabolism. Pathway Tools [8], which is used in BioCyc, is a production-quality software environment for creating a type of model-organism database called a Pathway/Genome Database, and it can describe the genome and biochemical networks of organisms. Many other tools [9, 10], such as Cytoscape, VisANT, Pathway Studio and Patika, have emerged for visually exploring biological networks. Due to the complexity of the metabolic network [10] and the various types of information it contains, the traditional 2D representation of metabolism data can hardly be extended to hundreds of pathways. Furthermore, traditional 2D visualization of metabolism data lacks compactness and information density. Collecting the biological PRM data scattered across the literature on engineered E. coli and visualizing it in a global 3D overview is therefore a major challenge. Several tools could be adapted to make the visualization of metabolism data more vivid. Arena3D [11] places nodes in different layers to reveal interactions between node types. However, because Arena3D computes a separate layout for each layer, edges between layers are often cluttered and difficult to follow. MetNetGE [12] uses a novel layout approach called enhanced radial space-filling (ERSF) to give an overview of the hierarchical pathway ontology together with 3D tiered layouts, and its graphical user interface (GUI) is written with PyQt. However, MetNetGE lacks experimental PRM data on engineered E. coli.
With the great achievements of metabolic engineering and synthetic biology over the past 100 years, more and more biological knowledge has been discovered. At the same time, several groups have collected such information to establish biological reaction databases. For example, EcoCyc [13] combines information on the metabolic processes and genome of E. coli. KEGG [14] contains more than 9,000 biochemical reactions. BRENDA [15], an enzyme information system, collects information on enzymes, enzyme ligands, reactions and pathways. BKM-react [16] is a non-redundant reaction database that integrates BRENDA, KEGG and MetaCyc. However, most of these databases are not chassis-centered, and they have not comprehensively curated biosynthetic information for engineered E. coli from the original literature.
The analysis above shows that there is no online three-dimensional visualization web server that integrates various informatics tools to represent a comprehensive metabolic network containing PRM data for engineered E. coli. Based on the biological information in EcoCyc and a large amount of experimental PRM data collected from scientific publications on engineered E. coli, SynBioEcoli gives researchers a global, three-dimensional overview of the biosynthetic ability of E. coli.
2 RESULTS AND DISCUSSION
2.1 SynBioEcoli system
The SynBioEcoli system consists of several components, including data curation, quadratic partitioning, informatics tools and 3D web-based rendering (Figure 1). Compared with traditional 2D visualization, the 3D network can avoid the overlapping of nodes and edges that an automatically generated 2D layout produces (Figure 2).
During metabolic network reconstruction, the issues of “reaction specificity” and “currency metabolites” were resolved manually, and the results are listed online at: http://www.rxnfinder.org/media/ecoli/data.html.
In the global graph of SynBioEcoli, metabolites are treated as graph nodes and the biochemical reactions among them as graph edges. SynBioEcoli allows researchers to retrieve a target item by name or by (sub)structure SMILES. Once the target item is determined, the SynBioEcoli graph repositions itself so that the target item is shown in the center of the screen, and researchers can click the colored object to get more information.
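As a minimal illustration of this graph model, the Python sketch below builds an undirected metabolite graph from a reaction list, looks up a compound by name, and computes the translation that would bring it to the origin of the view. The two reactions are the astaxanthin steps mentioned in this paper; the coordinates and all function names are hypothetical and are not taken from the SynBioEcoli implementation.

```python
from collections import defaultdict

# Reaction list as (substrate, product) pairs; the two steps are the
# astaxanthin example described in the text.
REACTIONS = [
    ("zeaxanthin", "adonixanthin"),
    ("adonixanthin", "astaxanthin"),
]

# Hypothetical 3D coordinates, standing in for a precomputed layout.
COORDS = {
    "zeaxanthin": (1.0, 2.0, 0.5),
    "adonixanthin": (2.0, 2.5, 1.0),
    "astaxanthin": (3.0, 1.5, 2.0),
}


def build_graph(reactions):
    """Return an adjacency map: metabolites are nodes, reactions are edges."""
    graph = defaultdict(set)
    for substrate, product in reactions:
        graph[substrate].add(product)
        graph[product].add(substrate)
    return graph


def find_by_name(graph, query):
    """Case-insensitive substring lookup; returns matching node names, sorted."""
    q = query.lower()
    return sorted(name for name in graph if q in name.lower())


def recenter(coords, target):
    """Translate all coordinates so that `target` sits at the origin."""
    tx, ty, tz = coords[target]
    return {n: (x - tx, y - ty, z - tz) for n, (x, y, z) in coords.items()}


graph = build_graph(REACTIONS)
hits = find_by_name(graph, "Astaxanthin")
centered = recenter(COORDS, hits[0])
print(hits, centered[hits[0]], sorted(graph["adonixanthin"]))
```

In a browser-based viewer the same translation would typically be applied to the camera or scene rather than to every node, but the arithmetic is the same.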
EcoCyc (biopax-level3.owl from its website) contains 338 biosynthetic pathways, 890 metabolic reactions and 964 chemical compounds, whereas SynBioEcoli contains 740 biosynthetic pathways, 3,889 biosynthetic reactions and 2,255 chemical compounds, and thus represents a more comprehensive knowledgebase for exploring the biosynthetic ability of E. coli. As an example, lycopene and astaxanthin are value-added compounds. Researchers looking for biosynthetic pathway information on them, with E. coli as the cell factory, cannot retrieve any from EcoCyc because they are non-native metabolites in E. coli. Collecting useful biosynthetic information (pathways, reactions, compounds and enzymes) from PubMed is time-consuming. In SynBioEcoli, we collected such information manually. When users search for lycopene, five engineered pathways are returned (Biosynthetic pathway of lycopene in E. coli from a foreign mevalonate pathway; Simplified diagram of glycolysis, TCA cycle, PPP, biosynthesis pathway of lycopene; lycopene biosynthesis 1; lycopene biosynthesis 2; lycopene biosynthesis 3), and when users search for astaxanthin, two engineered pathways are returned (astaxanthin; astaxanthin biosynthetic pathway in astaxanthin-producing bacteria and the catalytic function of CrtZ and CrtW). Moreover, many newly elucidated biochemical reactions have been successfully introduced into E. coli. For example, in the astaxanthin biosynthesis pathway, zeaxanthin is converted to adonixanthin, which is then converted to astaxanthin. Each reaction is supported by specific literature.
2.2 Substructure searching and sequence similarity searching
In SynBioEcoli, chemoinformatics tools are implemented to search for (sub)structures of metabolites and for proteins/genes that catalyze enzymatic reactions. The example in Figure 3A demonstrates a chemical substructure search in SynBioEcoli.
After users input the query “O=C(O)c1ccccc1” (benzoic acid), a list of compounds containing the substructure is displayed in the panel list.
(i) When users click a compound name in the panel list, the structure image is loaded automatically. Users can also move the mouse over the structure image to magnify it for more detail.
(ii) Users can click the “>>” symbol to expand all pathways in engineered E. coli that contain the specified compound. After users click a pathway, the SynBioEcoli view changes so that the current compound is repositioned to the center of the screen.
2.3 Network analysis
SynBioEcoli contains thousands of metabolites (graph nodes) and reactions (graph edges) curated from thousands of publications on engineered E. coli. Some network properties (Figure 3B), such as the degree ranking and degree distribution of nodes, edge statistics, the number of pathways and the network graph density, are calculated automatically by the network analysis function.
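The properties listed above have simple definitions, which the following Python sketch makes concrete for an undirected graph. It uses only the standard library so that it runs anywhere; the server itself is described as using NetworkX (Section 4.4), which offers equivalent functionality. The edge list and function names are hypothetical toy values, not SynBioEcoli data.

```python
from collections import Counter

# Hypothetical toy edge list (undirected metabolite pairs); not SynBioEcoli data.
EDGES = [("A", "B"), ("A", "C"), ("A", "D"), ("B", "C"), ("D", "E")]


def degrees(edges):
    """Count how many edges touch each node."""
    deg = Counter()
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    return deg


def degree_ranking(deg):
    """Nodes sorted by decreasing degree, ties broken alphabetically."""
    return sorted(deg.items(), key=lambda kv: (-kv[1], kv[0]))


def degree_distribution(deg):
    """Map each degree value to the number of nodes that have it."""
    return dict(sorted(Counter(deg.values()).items()))


def density(n_nodes, n_edges):
    """Density of a simple undirected graph: 2m / (n * (n - 1))."""
    if n_nodes < 2:
        return 0.0
    return 2.0 * n_edges / (n_nodes * (n_nodes - 1))


deg = degrees(EDGES)
ranking = degree_ranking(deg)
distribution = degree_distribution(deg)
graph_density = density(len(deg), len(EDGES))
print(ranking, distribution, graph_density)
```

For this toy graph the hub node "A" ranks first with degree 3, and the density is 0.5 because 5 of the 10 possible node pairs are connected.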
3 CONCLUSIONS
SynBioEcoli uses three-dimensional visualization technology to display a metabolic network containing rich experimental PRM data for engineered E. coli. With a focus on the biosynthetic ability of E. coli, SynBioEcoli contains 740 pathways, 3,889 metabolic reactions and 2,255 metabolites, and almost all items are supported by specific literature. It could provide a bridge between 2D metabolic networks and 3D virtual reality of E. coli cellular metabolism, and it could serve as a comprehensive knowledgebase for exploring the biosynthetic ability of E. coli based on experimental PRM data in the E. coli host.
4 METHODS
4.1 Data curation
First, PubMed publications on biosynthesis were retrieved using the search terms “(biosynthetic [Title/Abstract] OR biosynthesis [Title/Abstract] OR metabolic engineering [Title/Abstract]) AND (Escherichia coli [Title/Abstract] OR E. coli [Title/Abstract])”, which returned about 11,000 publications related to engineered E. coli of various strain types. Most reactions and pathways are shown as diagrams in the literature, so they had to be curated manually. Our data curators downloaded the relevant publications, read them and entered the PRM data into our in-house website platform; several biological experts then reviewed the data to ensure its correctness and completeness.
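The PubMed query above could also be issued programmatically. The sketch below only builds a request URL for the NCBI E-utilities ESearch service and does not contact the network; using E-utilities at all is an assumption, since the text states only that PubMed was searched. The function name and the paging defaults are hypothetical.

```python
from urllib.parse import urlencode, urlparse, parse_qs

# Search terms quoted from the text.
QUERY = (
    "(biosynthetic [Title/Abstract] OR biosynthesis [Title/Abstract] "
    "OR metabolic engineering [Title/Abstract]) AND "
    "(Escherichia coli [Title/Abstract] OR E. coli [Title/Abstract])"
)

# NCBI E-utilities ESearch endpoint. Using it here is an assumption:
# the text only says that PubMed was searched.
ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_esearch_url(term, retmax=100, retstart=0):
    """Return an ESearch URL for one page of PubMed IDs matching `term`."""
    params = {
        "db": "pubmed",
        "term": term,
        "retmode": "json",
        "retmax": retmax,      # page size (hypothetical default)
        "retstart": retstart,  # offset of the first record on this page
    }
    return ESEARCH + "?" + urlencode(params)


url = build_esearch_url(QUERY)
decoded = parse_qs(urlparse(url).query)
print(url)
```

Paging through roughly 11,000 hits would mean repeating the request with increasing `retstart` values; the retrieved records would still need the manual curation described above.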
To increase the reliability of the data, each item in SynBioEcoli is supported by the corresponding original literature. In addition, a specific ID has been assigned to each item to avoid duplication. Three components (compounds, reactions and pathways) constitute the framework of SynBioEcoli, which also includes the EcoCyc data.
4.2 Three dimensional graph visualization
A graph G = (V, E) consists of a set V of vertices and a set E of edges, in which each edge joins a pair of vertices. The Fruchterman-Reingold force-directed graph drawing algorithm [17], which we implemented in a previous study to visualize network pharmacology [18], was adopted again in this work to generate the 3D coordinates of nodes and edges. Note that the algorithm was applied twice (referred to as the quadratic partitioning algorithm): first to grid the 3D space for the pathways, and second to partition each pathway grid cell among the chemical compounds contained in that pathway. To give the edges nearly equal length and to reduce crossings, the algorithm assigns forces among edges and nodes based on their relative positions and then minimizes the energy of the system.
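The two-pass idea can be sketched in Python as follows. This is an illustrative reading of the description above, not the authors' code: a basic 3D Fruchterman-Reingold loop (repulsion k²/d between all node pairs, attraction d²/k along edges, a cooling step limit) is run once to place pathways in a large cube and once per pathway to place its compounds in a small cube centered on the pathway position. The cube sizes, iteration count, cooling factor, seeds and toy pathways are all assumptions.

```python
import math
import random


def fruchterman_reingold_3d(nodes, edges, size=1.0, iterations=50, seed=0):
    """Place nodes in the cube [0, size]^3 with a basic 3D force-directed loop."""
    nodes = list(nodes)
    rng = random.Random(seed)
    pos = {n: [rng.uniform(0, size) for _ in range(3)] for n in nodes}
    if len(nodes) < 2:
        return {n: tuple(p) for n, p in pos.items()}
    k = (size ** 3 / len(nodes)) ** (1.0 / 3.0)  # ideal distance between nodes
    temp = size / 10.0                           # maximum step per iteration

    def offset(u, v):
        delta = [a - b for a, b in zip(pos[u], pos[v])]
        return delta, max(math.sqrt(sum(c * c for c in delta)), 1e-9)

    for _ in range(iterations):
        disp = {n: [0.0, 0.0, 0.0] for n in nodes}
        # Repulsion between every pair of nodes: f_r(d) = k^2 / d
        for i, u in enumerate(nodes):
            for v in nodes[i + 1:]:
                delta, d = offset(u, v)
                for ax in range(3):
                    push = delta[ax] / d * (k * k / d)
                    disp[u][ax] += push
                    disp[v][ax] -= push
        # Attraction along each edge: f_a(d) = d^2 / k
        for u, v in edges:
            delta, d = offset(u, v)
            for ax in range(3):
                pull = delta[ax] / d * (d * d / k)
                disp[u][ax] -= pull
                disp[v][ax] += pull
        # Move each node by at most `temp` and clamp it to the cube
        for n in nodes:
            length = max(math.sqrt(sum(c * c for c in disp[n])), 1e-9)
            step = min(length, temp)
            for ax in range(3):
                moved = pos[n][ax] + disp[n][ax] / length * step
                pos[n][ax] = min(size, max(0.0, moved))
        temp *= 0.95  # cooling
    return {n: tuple(p) for n, p in pos.items()}


def two_level_layout(pathways, pathway_links, world=10.0, cell=1.0):
    """First pass places pathways; second pass places compounds around each."""
    names = sorted(pathways)
    centers = fruchterman_reingold_3d(names, pathway_links, size=world)
    layout = {}
    for name in names:
        compounds, reactions = pathways[name]
        local = fruchterman_reingold_3d(compounds, reactions, size=cell, seed=1)
        for c, xyz in local.items():
            layout[(name, c)] = tuple(
                ctr + v - cell / 2 for ctr, v in zip(centers[name], xyz)
            )
    return centers, layout


# Hypothetical toy input: pathway name -> (compounds, reaction edges)
PATHWAYS = {
    "pathway_1": (["a", "b", "c"], [("a", "b"), ("b", "c")]),
    "pathway_2": (["c", "d"], [("c", "d")]),
}
LINKS = [("pathway_1", "pathway_2")]
centers, layout = two_level_layout(PATHWAYS, LINKS)
print(centers, layout)
```

Because both passes use fixed seeds, the layout is reproducible. A compound shared by two pathways (here "c") receives one position per pathway in this sketch; how SynBioEcoli handles shared compounds is not stated in the text.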
4.3 Substructure searching and sequence similarity searching
In the metabolic network, thousands of metabolites are represented by nodes. In chemoinformatics, a chemical substructure is a subgraph of a molecular graph, labeled in a manner that reflects the nature of the atoms and bonds in the original molecule. In this work, chemical substructure searching, similar to the molecular fragment searching in our previous studies [19, 20], is used to retrieve related compounds and pathways.
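The subgraph definition above can be made concrete with a small backtracking matcher. The Python sketch below is illustrative only and is not the search engine used in SynBioEcoli: molecules are hand-coded as labeled graphs (heavy atoms only, bond labels 1, 2 or "ar" for aromatic) rather than parsed from SMILES, and all names are hypothetical. A production system would normally rely on a cheminformatics toolkit for parsing, aromaticity perception and indexing.

```python
def make_mol(atoms, bonds):
    """atoms: list of element symbols; bonds: list of (i, j, label)."""
    adj = {i: {} for i in range(len(atoms))}
    for i, j, label in bonds:
        adj[i][j] = label
        adj[j][i] = label
    return atoms, adj


def ring(first):
    """Aromatic bonds of a six-membered ring on atoms first..first+5."""
    idx = list(range(first, first + 6))
    return [(idx[i], idx[(i + 1) % 6], "ar") for i in range(6)]


def has_substructure(query, target):
    """True if query maps onto a subgraph of target with equal atom/bond labels."""
    q_atoms, q_adj = query
    t_atoms, t_adj = target

    def extend(mapping):
        qi = len(mapping)  # index of the next query atom to place
        if qi == len(q_atoms):
            return True
        for ti in range(len(t_atoms)):
            if ti in mapping or t_atoms[ti] != q_atoms[qi]:
                continue
            # Every bond from qi to an already placed query atom must
            # exist in the target with the same label.
            if all(t_adj[ti].get(mapping[qj]) == label
                   for qj, label in q_adj[qi].items() if qj < qi):
                if extend(mapping + [ti]):
                    return True
        return False

    return extend([])


# Benzoic acid, O=C(O)c1ccccc1: atoms 0-2 form the carboxyl group, 3-8 the ring.
BENZOIC_ACID = make_mol(
    ["O", "C", "O"] + ["C"] * 6,
    [(0, 1, 2), (1, 2, 1), (1, 3, 1)] + ring(3),
)
# Hand-coded toy library (not SynBioEcoli data).
LIBRARY = {
    "benzoic acid": BENZOIC_ACID,
    "salicylic acid": make_mol(
        ["O", "C", "O"] + ["C"] * 6 + ["O"],
        [(0, 1, 2), (1, 2, 1), (1, 3, 1), (4, 9, 1)] + ring(3),
    ),
    "phenol": make_mol(["O"] + ["C"] * 6, [(0, 1, 1)] + ring(1)),
}

matches = sorted(n for n, m in LIBRARY.items() if has_substructure(BENZOIC_ACID, m))
print(matches)
```

Salicylic acid matches because it is benzoic acid with one extra hydroxyl group, whereas phenol lacks the carboxyl group and is rejected.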
In SynBioEcoli, similarity search methods [21–23] based on the FASTA algorithm are used to retrieve a specific enzyme or gene from amino acid or nucleotide sequence fragments, respectively. The FASTA software package was downloaded from the FASTA team at the University of Virginia.
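To give a feel for how a FASTA-style search locates candidate matches, the Python sketch below implements only the initial word-lookup idea: short words (k-tuples) shared by the query and a database sequence are collected, and hits are counted per diagonal (database position minus query position). It is a simplified illustration, not the FASTA package and not the SynBioEcoli code; the later rescoring and alignment stages are omitted, and the sequences and function names are hypothetical.

```python
from collections import Counter, defaultdict


def kmer_index(seq, k):
    """Map every word of length k in `seq` to the positions where it starts."""
    index = defaultdict(list)
    for i in range(len(seq) - k + 1):
        index[seq[i:i + k]].append(i)
    return index


def best_diagonal(query, subject, k=2):
    """Return (diagonal, hits) for the diagonal with the most shared k-mers.

    A diagonal is subject_position - query_position; many hits on one
    diagonal suggest an ungapped region of similarity.
    """
    index = kmer_index(subject, k)
    diagonals = Counter()
    for qpos in range(len(query) - k + 1):
        for spos in index.get(query[qpos:qpos + k], ()):
            diagonals[spos - qpos] += 1
    if not diagonals:
        return None, 0
    # Most hits wins; ties go to the diagonal closest to zero.
    return max(diagonals.items(), key=lambda kv: (kv[1], -abs(kv[0])))


# Hypothetical toy sequences, not real enzymes.
QUERY = "MKTAYIAK"
DATABASE = {"enzyme_A": "GGMKTAYIAKQR", "enzyme_B": "PLVWSDE"}

ranked = sorted(
    ((name, best_diagonal(QUERY, seq)) for name, seq in DATABASE.items()),
    key=lambda item: -item[1][1],
)
print(ranked)
```

Here all seven two-letter words of the query fall on diagonal 2 of enzyme_A, so it ranks first, while enzyme_B shares no words with the query.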
4.4 Network analysis and web server
The network analysis function provides useful knowledge about SynBioEcoli, such as the properties of nodes, edges, pathways and the network as a whole. NetworkX, a Python package (high-productivity software for complex networks), is used in the network analysis function. The SynBioEcoli server uses a Browser/Server architecture in a Linux environment and includes Ajax, Apache HTTP Server, CSS, Django, HTML5, JavaScript, JSON, C++, Python, an online molecular editor and network graph analysis algorithms. In addition, WebGL and the Three.js framework [24] are used for interactive three-dimensional visualization in the browser.
Higher Education Press and Springer-Verlag Berlin Heidelberg