INTRODUCTION
The application of high-throughput biological techniques has been increasing in the last decade. Annotated sequences, biochemical and physiological data are now available for many organisms, and the most exciting challenge for systems biology is to describe, integrate and analyze this huge amount of heterogeneous data in order to transform them into usable knowledge.
Metabolism is a highly coordinated system which provides energy and precursors of macromolecules for cellular functions. The metabolism of most cells is tightly regulated at multiple levels in response to external or internal stimuli. Constraint-based mathematical metabolic models have been extensively used as a tool to simulate the behavior of metabolic networks, guide metabolic engineering, and interpret biological data [
1]. However, constraint-based metabolic modeling does not take account of regulation. Consequently, these models cannot accurately carry out prediction when the regulatory mechanisms play a decisive role in defining the behavior of the system. So, in order to improve their prediction ability, it is useful to integrate regulatory information into constraint-based metabolic models. Integration of regulatory data into genome-scale metabolic models is a challenge for systems biology, but it could lead towards a better understanding of mechanisms governing the cell, and allows the analysis of the relationship between the environment and the metabolic capabilities of an organism.
Transcriptional regulation is one of the mechanisms controlling metabolic activities. The regulatory interactions between transcription factors (TFs) and their target genes form a transcriptional regulatory network (TRN) [
2], which responds to the amount and kind of exogenous or endogenous metabolites (input) and adjusts the mRNA synthesis of genes for specific metabolic pathways (output). Genome-scale TRNs have been reconstructed in many model organisms, such as
Escherichia coli and
Saccharomyces cerevisiae, due to the extensive knowledge and experimental data available for these organisms.
Here we will review studies on the reconstruction of the TRN of yeast, and how it can be integrated into constraint-based metabolic models. As there have been many thorough reviews on the methods for the reconstruction and analysis of TRNs [
2–
8], the progress that has been made in
S. cerevisiae will be emphasized. For the integration of TRN reconstructions into metabolic models, many of the methods were first developed in
E. coli and then extended to eukaryotes in a second stage. These methods have a general validity, and theoretically can be applied to any organisms. However, the results obtained in
S. cerevisiae by the current integration techniques are often unsatisfactory, possibly due to the high complexity of gene regulation [
9], the current limited knowledge available on such mechanisms, and the presence of many cellular compartments in eukaryotic cells. Thus, we are going to represent the current state of the art of these integration methods and emphasize the results obtained in
S. cerevisiae when possible.
TRANSCRIPTIONAL REGULATION OF METABOLISM IN S. CEREVISIAE
The metabolic state is determined by multiple factors including substrate concentrations and the activities of metabolic enzymes [
10]. Protein abundance, covalent modification and allosteric regulation all affect the activities of metabolic enzymes. Transcriptional regulation plays a role in controlling the abundances of some metabolic enzymes through regulating the levels of their corresponding mRNAs. The transcriptional regulation of metabolism in
S. cerevisiae has been widely studied for the physiological effects and molecular mechanisms. From present results we can draw several lessons:
(1) Transcriptional regulation widely occurs in metabolic pathways. Enzymes in carbon (Figure 1), nitrogen, sulfur, phosphate and oxygen metabolisms are extensively regulated at the transcriptional level in
S. cerevisiae [
13,
14], with about 100 TFs having been confirmed or suggested to be involved. In fact, about 40 TFs in
S. cerevisiae are named after their functions in metabolism regulation, e.g., Adr1p (Alcohol Dehydrogenase Regulator), Rgt1p (Restores Glucose Transport), and Ino2p (INOsitol requiring).
(2) The contribution of transcriptional regulation in the regulation of metabolism is reaction- and condition-dependent. The abundances of metabolic enzymes are regulated at several other layers (e.g., mRNA degradation, translation, protein degradation) in addition to transcription. Moreover, many metabolic enzymes are regulated at the activity level. Thus, it is quite natural that changes in the transcription levels of enzyme genes are often not consistent with those in corresponding metabolic fluxes in
S. cerevisiae [
15–
17]. The correlation between gene transcription and flux is affected by the degree of other kinds of regulations (e.g., metabolite-enzyme interactions [
18]), and is dynamic under different conditions even for the same reaction [
19,
20]. Some reactions for alternative carbon source (e.g., galactose and maltose) utilization [
21], glyoxylate cycle, gluconeogenesis [
16], ergosterol biosynthesis [
19], amino acid metabolism, and nucleotide metabolism [
14] are suggested to be transcriptionally-controlled. For example, a mutant with disruption of
PUT3, which encodes a transcriptional activator of proline utilization genes, completely loses the ability to use proline as a sole nitrogen source [
22]. On the other hand, transcriptional regulation plays only minor roles in controlling many other metabolic reactions, e.g., glycolysis [
15,
16].
(3) Diverse input (upstream signal to TFs) and output (TFs to targets) mechanisms are used by the TFs for regulation of metabolism. Similar with metabolic enzymes, the TFs themselves are regulated in abundance and/or activity levels by upstream signals. In addition, some TFs are regulated at the cellular location level (cytoplasm or nucleus). These regulations are achieved through diverse molecular mechanisms: (a) transcriptional control by other TFs, e.g., repression of gluconeogenesis activator gene
CAT8 by Mig1p [
23]; (b) specific mRNA degradation, e.g., deadenylation of the mRNA of glucose repressor gene
NRG1 [
24]; (c) translational control, e.g., translation initiation regulation of amino acid biosynthesis activator Gcn4p through upstream ORFs [
25]; (d) regulated protein degradation, e.g., ubiquitin-mediated degradation of peptide transport repressor Cup9p [
26]; (e) proteolytic processing, e.g., activation of amino acid utilization activator Stp1p and Stp2p by protease Ssy5p [
27]; (f) covalent protein modification, e.g., phosphorylation of nitrogen source utilization activator Gln3p, which controls its cellular location [
28]; (g) allosteric regulation by metabolites, e.g., activation of Put3p by proline [
29]; (h) regulation by protein-protein interaction, e.g., inhibition of galactose utilization activator Gal4p by Gal80p [
30]. Actually, many TFs are under the control of multiple regulatory mechanisms. For example, the above-mentioned Gal4p is also regulated at the transcriptional level [
30], and Gcn4p is also regulated at the protein degradation level [
31]. To exert their biological functions, some TFs can differently act on different targets. For example, when the medium is shifted from glucose to ethanol, Rds2p activates the transcription of gluconeogenic genes and represses that of genes suppressing gluconeogenesis [
32]. Similarly, in the presence of serine, Cha4p activates the transcription of a serine catabolic gene and represses that of a serine biosynthetic gene, with the latter achieved by activating the expression of a neighboring intergenic non-coding transcript [
33].
(4) Different types of associations exist between TFs with shared target genes. For example, phospholipid biosynthesis activators Ino2p and Ino4p form a complex on target gene promoters, and both TFs are necessary for inositol synthesis [
34]. However, in the case of sulfur metabolism activators Met31p and Met32p, which are paralogs binding to the same site on the promoter, only disruption of both TFs led to methionine auxotrophy, suggesting redundant functions of them in methionine biosynthesis [
35]. Four TFs involved in nitrogen source utilization, including repressors Dal80p and Gzf3p and activators Gat1p and Gln3p, influence each other by both transcriptional regulation and physical interaction [
36]. This kind of associations between TFs needs to be carefully considered when converted to mathematical representations.
RECONSTRUCTION OF THE TRANSCRIPTIONAL REGULATORY NETWORK
Strategies for establishing TF-target interactions
Regulatory interactions linking TFs (regulatory proteins) and their targets (regulated genes) are the basic units of a TRN [
37]. As many transcriptional regulatory mechanisms are less conserved among organisms [
37,
38], it is generally difficult to reconstruct TRNs starting from protein homology analysis, as can be done for metabolic network reconstructions [
39]. Experimental studies, biochemical or genetic, bottom-up or top-down, are usually needed to establish the transcriptional regulatory interactions in a specific organism. However, genomic annotation is also important because it provides useful information for experiment design (e.g., TF prediction) and data integration (e.g., sequence conservations) (Figure 2).
Biochemical approaches can be used to identify direct interactions between TFs and their target genes. Here direct interactions mean physical occupancies of TFs on the promoters of genes. The methods in this field (for reviews see Refs.[
40,
41]), especially electrophoretic mobility shift assay (EMSA) [
42] and DNAase footprinting assay [
43], have been frequently used to study
in vitro interactions between specific TF-promoter pairs in
S. cerevisiae. Chromatin immunoprecipitation combined with cDNA microarray analysis (ChIP-chip), which was first developed in yeast [
44], is a powerful
in vivo method to find genome-wide direct targets of TFs. The first high-throughput ChIP-chip experiment in yeast identified the targets of 106 TFs in rich medium [
9], and a later study provided interaction information for 203 TFs under more environmental conditions [
45]. The data from these two studies are used in most TRN reconstruction studies in yeast (see section Reconstruction and Characterization of the TRN). The third high-throughput ChIP-chip experiment in yeast mainly focused on proteins involved in the transcriptional machinery [
46]. Alternative ChIP-based techniques have been developed recently, including ChIP followed by high-throughput sequencing (ChIP-seq) for higher resolution, larger coverage, lower noise and more accurate quantitation [
47], ChIP-exo achieving single-nucleotide resolution [
48], and competition ChIP for determining the residence time of TFs on promoters [
49]. As an alternative approach, affinity chromatography with immobilization of DNA followed by proteomic profiling can be used to identify the TFs occupying on a specific promoter
in vivo [
50].
The results obtained using biochemical approaches have at least two drawbacks. First, TF occupancy does not necessarily represent functional regulation, e.g., Leu3p constitutively occupies its target promoters whether it functions or not [
45]. In some cases, the functional activation of TFs (e.g., presence of cooperative factors) is needed in addition to their binding on promoters [
51]. Second, the interaction information between TFs and target genes obtained by biochemical methods does not contain the directivity (repression or activation) of transcriptional regulations.
Genetic approaches cannot distinguish direct and indirect TF-target interactions, but they do provide information on the existence (
if) and directivity (
how) of regulatory functions of TFs. This function information is very important, because only functional regulations can be integrated to metabolic models as constraints. In genetic approaches, usually transcriptional regulations are perturbed by deletion, suppression, over-expression or mutation of TF genes first, and then target genes with affected transcription levels are discovered through comparative analysis of parent and mutant strains. The transcription levels of specific genes can be determined using Northern blotting, quantitative PCR or reporter gene assay, while genome-wide transcriptional changes can be measured using transcriptomic technologies such as microarraysor RNA-seq [
52].
In
S. cerevisiae, a genome-wide single-gene-deletion mutant library has been constructed [
53], and genes with significant differential expressions in rich medium responding to individual deletions of 269 TFs were identified [
54]. A reanalysis of these data estimated that about 98% of the differential expressions were mediated by indirect TF-target interactions [
55]. These indirect regulations may be mediated by regulatory cascades (intermediate TFs) or physiological changes (e.g., deletion of one TF alters the concentration of some metabolite, which affects gene expression through other regulatory pathways). Transcriptomic responses to individual over-expressions of 55 growth-involved TFs were also reported in
S. cerevisiae [
56]. Compared with gene deletion, the gene over-expression strategy has the advantage in identifying condition-specific targets of some TFs even under standard growth conditions [
57]. It should be noted that the function of some TFs may be masked by paralog compensation when genetic perturbation of one single TF was performed [
58].
Transcriptome analysis has also been extensively used to study the responses of laboratory wild-type
S. cerevisiae strains to environmental perturbations. Some of the co-expressed gene groups (modules) generated by clustering analysis of these transcriptome data are assumed to be “co-regulated” genes under the same TF(s) [
59]. TF-target gene interactions can be predicted from these co-expression modules, for some TFs themselves are regulated at the transcriptional level [
60]. This method is particularly meaningful in finding condition-specific TF-target interactions, as there have been thousands of transcriptome datasets collected under various conditions (environments, developmental stages, time-series, etc.) in
S. cerevisiae [
61]. However, the correlations between the expression level of TF itself and those of their targets is weak for many TFs [
62], which seriously limits the inference of the TRN solely from gene co-expression data.
TFs regulate the expression of target genes through occupying specific
cis-acting transcription factor-binding sites (TFBSs). Thus, the “TF-target gene” relationships are actually the combination of “TF-TFBS” and “TFBS-target gene” relationships. Knowing the TFBS information is important for understanding the molecular mechanisms of TRNs. In addition, the presence, position, orientation and combination of TFBSs on promoters could be used to predict gene expression patterns in
S. cerevisiae [
63] and to infer the TRN (see section Reconstruction and Characterization of the TRN). The “TF-TFBS” relationships can be determined by
in vitro biochemical methods such as EMSA and large-scale microarray-based assays. Protein-binding oligonucleotide microarrays (PBMs) were used to determine the TFBSs of 112 [
64] and 89 [
65] TFs in
S. cerevisiae by different research groups. A microfluidics-based assay was shown to have higher sensitivity than PBM, and can measure the binding affinities of TFs [
66]. On the other hand, an oligonucleotide-binding protein microarray comprising 282
S. cerevisae TFs also succeeded in finding new interactions between TFs and oligonucleotides [
67].
“TF-TFBS” relationships can also be computationally predicted based on the over-representation of TFBSs in the target gene sequences of TFs. Target gene sets identified by both ChIP experiments [
48] and genetic perturbations [
56] can be used for computational discovery of TFBSs, and obviously the former (direct target gene sets) are more likely to give significant results. These computational tools were also used to predict “TFBS-target gene” relationships from other kinds of gene sets, such as co-expressed [
68] or functionally-related [
69] genes in
S. cerevisiae. In addition, as functional TFBSs are usually highly conserved, some of them can be identified by phylogenetic footprinting (i.e., looking for conserved regions after aligning promoter sequences of orthologous genes from closely related species) [
70,
71]. Integration of TFBSs determined by different approaches can give high-confidence data [
45,
72], and combination of fragmental “TF-TFBS” and “TFBS-target gene” relationships successfully gave complete “TF-target gene” interactions, e.g., the discovery of two TFs controlling ribosome biogenesis in
S. cerevisiae [
65].
Many of the transcriptional regulatory interactions obtained using the above approaches in
S. cerevisiae have been manually collected from original literature. The examples include a review summarizing the early results of individual TF-target interactions [
65], databases YeTFaSCo [
73] and ScerTF [
74] extensively collecting and curating TFBSs, and databases YEASTRACT [
11] and SCPD [
75] containing both TFBSs and TF-target interactions. In the
Saccharomyces Genome Database [
12], regulatory relationships between TFs and target genes are being added for nearly all genes recently [
76], which provide machine-readable data for the
S. cerevisiae TRN research.
Reconstruction and characterizations of the TRN
Interactions between TFs, TFBSs and target genes obtained from different approaches are complementary due to their own advantages and drawbacks. For example, comparison of target genes from biochemical and genetic experiments found non-occupying regulation and non-regulatory occupancy of TFs on target genes [
54]. Another example is that comparison of TFBSs obtained using
in vivo and
in vitro methods found combinational or “piggy-back” binding of TFs on promoters [
72]. Thus, data of different resources need to be integrated to obtain high-quality TRN reconstructions [
2].
Many algorithms for integrated TRN reconstruction and validation have been developed, such as those integrating ChIP data and gene expression data [
9,
77–
82], TFBS information and gene expression data [
83–
87], or all the three types of data [
88–
90]. The algorithms are generally based on the assumption that the direct target genes of a TF should meet more than one of the following criteria: (1) being detected for the known TFBS of the TF on the promoter; (2) being physically occupied by the TF; (3) transcriptionally responding to the perturbation of the TF; (4) transcriptionally responding to the active condition(s) of the TF; (5) having related functional annotations. A good example in reconstructing integrative TRNs is a study on DNA damage response in
S. cerevisiae, in which 30 TFs were first selected based on existing knowledge, and then gene expression profiling of wild-type and TF-deleted mutants and ChIP-chip were performed (both under normal and DNA-damaging conditions), providing multi-view data for integrative analysis [
91]. For less-studied TFs whose “active” conditions are unknown, integration of
in vitro TFBS information and co-expression data could be used to infer the possible conditions for further studies [
84]. TF-target interactions established using low- and high-throughput methods are also complementary regarding their coverage and signal-to-noise ratios. An integrative strategy for this is starting with well-studied small TRN reconstructions, followed by expanding it using interactions from high-throughput analyses [
92].
The TRN reconstructions are usually represented as network graphs, in which nodes indicate TFs and target genes, and arrows indicate transcriptional regulations. In addition, TRNs were modeled using different mathematical formalisms such as ordinary differential equations, stochastic equations, Petri nets, or Bayesian networks (for reviews see Refs.[
93–
95]). Nevertheless, the lack of high-throughput experimental methods to measure biochemical kinetic parameters
in vivo, necessary for the description of the system, limits the application of these approaches to only the description of small networks [
96].
Currently, genome-scale TRN reconstructions were converted into mathematic models only based on the use of Boolean logic. This formalism only requires information about the structure of the network and the nature of interactions (activation/repression) between the network elements [
97,
98]. The state of elements in Boolean network is described by an on/off representation, so the state of the whole network is defined by an
n-tuple of 1s and 0s according to the presence/absence of the particular elements. However, using the Boolean rules the level of detail of a system decreases and it is not possible to carry out quantitative prediction of the behavior of the system. Matrix based formalism has been used to study the properties of TRNs [
99]. This formalism is based on a reformulation of the Boolean rules into a matrix representation. Through the reformulation, the TRN is described as a regulatory network matrix (R matrix) used to investigate its functional (expression) states in response to environmental conditions.
In
S. cerevisiae, several large-scale TRN reconstructions have been reported during the past ten years. Figure 3 lists eleven of these reconstructions, which vary in terms of the number of TFs, target genes and TF-target interactions due to different data resources and different criteria used [
9,
11,
45,
46,
100–
106]. Most of these studies focused on the structural properties of TRNs, and the major conclusions from these studies are
(1) The structure of the
S. cerevisiae TRN is scale-free [
107], with most target genes having a few TFs and most TFs regulating a few targets [
6,
105]. The favor of links between highly-connected nodes (hub TFs or genes) and low-connected nodes, and limiting links between hubs, divide the TRN into structural modules [
104].
(2) From the viewpoint of hierarchical organization, TFs in top, middle and bottom layers have significantly different sequence conservations, transcript and protein abundances, numbers of direct targets, and effects on global gene expressions [
102,
108]. Interestingly, TFs in the middle layer were found to have the most direct targets.
(3) The structure of the
S. cerevisiae TRN is highly dynamic in responding to condition changes [
45]. Local regulatory units of different topological structures are favored by different conditions, and most hub TFs are only active under certain conditions [
101].
(4) The TRN of
S. cerevisiae is more complex compared with that of
E. coli. A comparative analysis found that although the total numbers of TFs are similar,
S. cerevisiae has ~5-fold more TF-target regulatory interactions, and ~12-fold more co-regulation links between TFs (i.e., two TFs share a common target), than those in
E. coli [
107].
INTEGRATION OF TRANSCRIPTIONAL REGULATORY NETWORK RECONSTRUCTIONS INTO METABOLIC MODELS
Constraint-based metabolic models
The concept of genome-scale metabolic reconstruction has been extensively researched and reviewed in the literature [
5,
109]. By using genomic and biochemical data available, it is possible to reconstruct genome-scale models of metabolism and analyze them to infer information about the behavior of the cell. Such models can be used for simulations and integrative data analysis, and several toolboxes have been developed for handling these models, e.g., COBRA [
110] and RAVEN [
39].
A protocol to generate a genome-scale model includes different stages [
111]. The first stage is the creation of a draft reconstruction by using a genome annotation of the organism of interest, and bioinformatics resources such as databases of reactions [
112,
113]. The second stage is the manual refining of the draft reconstruction to improve the quality of the model. This work is mainly based on the study of primary literature and includes the formulation of the biomass equation. The next stages are formed by the conversion of the reconstructed network into a mathematical model used to carry out simulations and by the evaluation of the quality of the reconstruction.
Many genome-scale reconstructions of the
S. cerevisiae metabolic network have been developed during the past decade. A consensus network was reconstructed as a result of a common effort by researchers to produce a highly annotated network with a standard nomenclature (see Ref. [
114] and visit http://yeast.sourceforge.net/ for the consensus yeast metabolic project). Moreover, the development of new models is still in progress in order to improve our description of some parts of the yeast metabolic network such as the lipid metabolism [
19,
115].
Flux Balance Analysis (FBA) [
116] is a mathematical method based on linear programming through which one can simulate the behavior of the reconstructed metabolic network under specific environmental conditions. The analysis is based on the fact that a reaction network is subject to a series of constraints that limit its behavior, such as environmental and physicochemical ones [
117]. The constraints define a closed space which contains all the possible stationary states that the metabolic network can reach. The more constraints one adds, the smaller the solution space becomes.
In particular, FBA allows finding a particular solution (optimal solution) inside the solution space that maximizes or minimizes a particular objective function. One of the most used objective functions is the optimization of biomass formation, which is a reaction that describes the incorporation of all necessary precursors like amino acids, nucleotides and cofactors into cellular macromolecules that make up the functional cell [
118]. Figure 4 summarizes the process of simulation of a metabolic network. Starting with a metabolic reconstruction and applying a series of constraints a solution space is defined. A particular solution inside the solution space is found using an optimization technique such as linear programming.
FBA is a powerful tool to study the properties of a metabolic network but it still has some limitations [
119]. In particular, FBA can make incorrect predictions when the regulatory network plays a key role in the definition of the behavior of the cell [
120]. For example, an FBA simulation of
E. coli metabolic network during aerobic growth in presence of both glucose and lactose predicts an uptake of both carbon sources at the same time. This prediction is not in agreement with the experimental result that the cell prefers glucose over galactose, and misunderstandings in the final analysis of the process may arise [
121].
Moreover, often the flux distribution obtained from linear programming is not unique but is part of a set of different flux distributions that generate the same value for the objective function. Every optimal flux distribution describes an alternative pathway usage that maximizes (or minimizes) the objective function [
122,
123]. This fact can lead to several problems. For instance, if there are two possible metabolic pathways that lead to the same final product from the same substrate with an equivalent stoichiometry, FBA is not able to distinguish between them. As a consequence, they may both be used in the simulation. So, these miscalculations often occur given that, for example, isoenzymes are the simplest form of an alternative pathway involved in this process, and without addition of extra information FBA cannot predict which of the isoenzymes is active at a given moment of time. In a study of the central carbon metabolism in
S. cerevisiae, the prediction of fluxes using different objective functions was shown to be able to catch the essential feature of the behavior of a system perturbed by gene knockouts, but with some shortcomings likely due to the presence of alternative pathways and regulatory mechanisms [
124].
The process of integration of transcriptional regulation into metabolic models
Two kinds of constraints can be applied to a reaction network: nonadjustable and adjustable constraints [
125]. The first kind is invariable because it is related to characteristics of the process that can hardly change such as network topology, thermodynamics, and maximal flux capacities. In contrast, the second kind (of constraints) depends on regulatory events so they can vary over time as well as under different environmental conditions. One can consider these constraints to be temporal constraints applied to the metabolic network.
In the first generation of constraints-based models only nonadjustable constraints were used, while the addition of adjustable constraints led to the second generation of constraints-based models [
126]. Incorporation of regulation into genome-scale metabolic models is, however, still a challenge and requires more work.
A small example is useful to clarify the application of adjustable constraints. In the network represented in Figure 5A, the metabolism is represented by a single reaction that carries out the transformation of the substrate (M-1) into the product (M-2) using an enzyme (Enz). The product of the reaction suppresses the activity of the transcription factor which activates the expression of the enzyme.
The behavior of this small network can be described by some simple Boolean rules (Figure 5B). According to these rules the reaction is active when the substrate M-1 and the enzyme are both present. The enzyme is present when both the transcription factor and the gene are present and the product M-2 is absent. This small network contains five elements, thus all the possible states of the network are 2
5 = 32. The reaction can be turned on or turned off depending on which state the network finds itself in. It is quite clear that even a simple network can show a very complex behavior. The most important consequence of the imposition of regulatory constraints is that it reduces the solution space of the metabolic network (Figure 6) and this fact can also be examined using topological techniques [
126].
The rFBA method and its current development
Historically, first attempts to integrate FBA with regulation started with the observation that time constants characterizing metabolism and regulation differ by one order of magnitude at least [
120]. In particular, the time constants that describe the behavior of the metabolism are faster (order of milliseconds) than the time constants that describe the transcription regulation (order of minutes). As a consequence, a dynamic process can be simulated as a sequence of steady states if one looks at the behavior of metabolic network during a time series as a sequence of short intervals of time in which the metabolic network reaches a quasi-steady state.
So the whole time of the simulation is split in several time intervals in which a quasi-steady state for the metabolic network is assumed. For the first time interval, the knowledge of the environmental condition is available, thus the FBA method can be used in order to simulate the response of the metabolic network to the environment. Additionally one can determine the status of the regulatory network by solving the Boolean equations for the time interval based on the presence or absence of particular metabolites, regulatory proteins or fluxes in the internal environment.
Finally, the output of the simulation is used to update the environmental condition for the next time interval by using a numerical integration in order to calculate the new concentrations of the external species [
127]. For each of the following time steps the state of the regulatory network is calculated and used in addition to the environmental condition calculated from the preceding time step in order to constrain an FBA simulation.
The regulatory constraint process is accomplished by checking the presence of the enzyme in the specific time interval based on the results of the regulatory rules. If the enzyme is not present, the flux of the reaction at the given interval is set equal to zero. If the enzyme is present, the flux is determined with FBA simulation in that given time interval.
As a consequence, the linear optimization for the metabolic network and the calculation of the state of the regulatory network are made in two separate moments, and the regulation is represented as a time dependent (adjustable) set of constraints applied to an FBA simulation. This method is called regulatory FBA (rFBA) and allows for the prediction of dynamic profiles of the cell growth [
120,
121,
128].
A common way in order to validate genome-scale models is by confronting simulated growth and exchanging fluxes with experimental data. Although this is the common accepted practice, some efforts have been done to try to validate models studying the correlations between the
in silico calculated and
in vivo measured internal fluxes. This is also important in order to analyze the ability of the models to predict internal fluxes [
124,
129]. One of the limitations of this comparison is that the number of measured fluxes is limited compared with the number of fluxes present in a genome-scale reconstruction. Furthermore, large-scale datasets usually cover mainly the central carbon metabolism and less other pathways, e.g., biosynthesis of biomass components like lipids and nucleotides [
130].
Moreover, it is not yet clear what the best methodology is in order to select reactions which allow the reconstruction of the whole system state, although recent results about the observability of complex systems, such as metabolic network, suggest that in yeast the minimum number of sensor nodes should be about 10% of the number of reactions of the genome-scale reconstruction [
131].
An integrated FBA such as rFBA can be used to overcome this limitation by improving the ability to make predictions and validate the model by using large gene transcription datasets available today. On one hand, the prediction of metabolic flux in a given environmental condition is possible using integrated FBA and this can suggest possible interpretations for up or down regulation of genes. On the other hand, this approach can also allow for the creation of
in silico gene expression arrays and comparison with experimental data to validate the integrated model. For instance, Covert and coworkers in a large-scale simulation of
E. coli [
128] used their model to predict differential expression of genes. Afterwards they compared the
in silico gene expression with
in vivo gene expression to update their model by rewriting, relaxing or removing regulatory rules in order to improve the prediction ability of their reconstruction in a iterative fashion.
A large-scale study of an integrated yeast rFBA model was performed by Herrgård et al. [
103]. They reconstructed a yeast TRN formed by 55 transcription factors related with the metabolism. Afterwards they integrated their TRN reconstruction with a genome-scale metabolic reconstruction and used this integrated yeast model to study the interactions between the regulatory and the metabolic networks. They also predicted growth phenotypes and gene expression changes. This work clearly showed us how the identification of new interactions and analysis of different kinds of data in yeast are possible by using only one model framework.
Herrgård et al. used an iterative approach conceptual similar to the one used by Covert et al., which was based on the update of regulatory rules. In fact, they expanded the regulatory network of the model by using experimental data taken from the literature. This approach showed that discrepancies between the model predictions and the experimental data can be used in order to drive experiments to investigate and identify potential new target genes for each TF.
Later other efforts for extending rFBA to simulate the transcriptional regulation of the whole system led towards the integration of different types of model formalisms to describe the different process inside the cell [
132]. This means using ordinary differential equation (ODE) when kinetic parameters are available, in order to simulate the signaling network and keep using Boolean rules to simulate the transcription regulatory system. This approach gave birth to hybrid methods used for
E. coli central metabolism (iFBA [
133]) and for a portion of the
S. cerevisiae system (idFBA [
134]). The final goal of this kind of approaches is to build a whole-cell system model as has been done for
Mycoplasma genitalium [
135].
The application of this hybrid models showed improvement over the rFBA approach. In particular, the ability of the model to simulate the dead ends [
133] was enhanced. Dead ends are reactions present in the reconstructed network but not used in the optimal solution because their products are not involved in the growth objective function. However, the metabolic products of the dead ends are often very important in the cell for other reasons, for example, they can be mediators of signaling.
In particular, the method called idFBA [
134] has been applied to a module of
S. cerevisiae composed by a small portion of signaling, metabolic and transcriptional regulatory networks related with high-osmolarity glycerol response mitogen-activated protein kinase pathway. This work showed that an integrated system spanning different time scales can be studied with an integrated FBA based approach.
The MILP reformulation and the data driven approaches
In the rFBA method, the TRN and the metabolic network are analyzed separately and simulated separately. The presence of alternative solutions for both networks makes it difficult to integrate them and imposes a requirement for selecting which of the steady state solutions have to be selected for simulating the next time interval. Nevertheless, Boolean rules can be converted into integer inequalities, and these inequalities can be used to formulate a mixed integer linear program problem (MILP). Thus, an integrated model reformulated in terms of an MILP problem can be obtained by changing the formalism used in order to describe the TRN. This MILP reformulation integrates the regulatory structure with the metabolic model in a single model.
The steady-state regulatory flux balance analysis (SR-FBA) [
136] method is based on the reformulation of the integrated model into an MILP problem and it was used in order to find steady state solutions for the metabolic network and for the regulation network which are consistent between them. The solutions of the TRN were mapped on the metabolic network and the coherence of the integration was studied by checking the consistency between the states of the TRN and the metabolic network, in an effort to improve the evaluation of the optima alternative solutions by classifying each gene based on its expression and flux activity states.
The SR-FBA approach gave birth to important development such as the OptORF [
137] and the
GeneForce [
138] algorithm. The limited and incomplete information available about the regulatory mechanisms and the inherent complexity of the system can lead to incomplete integrated models which can be inconsistent with experimental evidences. The
GeneForce algorithm was developed to refine and solve inconsistency in large-scale integrated models by suggesting corrections for the model to improve its prediction ability. This algorithm tries to identify the minimum number of rules that can be violated so as to allow the model to perform the correct growth prediction, preserving as much as possible the regulatory rules already applied. Thus, by relaxing the regulation rules the solution space is remodeled to include solutions previously eliminated by the first shaping of the integration process. Moreover, the procedure of correction of these inconsistencies can suggest hypotheses which can drive future experiments and lead to new discoveries.
The
GeneForce algorithm follows an approach based on using the experimental data available to lead modifications for a model in order to perform a best fit of experimental data. This data driven approach is currently applied in a more extended way by other methods. They are based on using the omics data to constrain an FBA simulation in trying to integrate directly gene expression into FBA without using any TRN reconstruction. Indeed, the process of reconstruction of a TRN in terms of Boolean rules is time consuming and difficult, because the literature available have to be searched manually. Consequently, this process has been done only for few organisms [
103,
128,
139].
For this reason, the data driven approaches have been used extensively recently. Several algorithms have been developed and many examples of this approach are present in the literature, such as the E-flux [
140] and the GIMME [
141] method and sometimes implemented in one single software [
142]. Some of them use thresholds in order to reformulate the gene expression levels into on/off description; others try to overcome the use of arbitrary thresholds (for a recent review see Ref. [
143]). Nevertheless, the results performed by data driven approaches can be limited because of the lack of correlation between the gene expression and the protein level [
144]. However, this correlation can be improved by using a network based model [
18] under certain circumstances.
Furthermore, the separation between a data driven method and a regulatory reconstruction driven method is not clearly defined sometimes. For instance, the probabilistic regulation of metabolism (PROM) [
145] algorithm starts from a regulatory network structure and gene expression data in order to model the transcription regulation by applying a probabilistic approach. The PROM algorithm was initially applied to
E. coli and
Mycobacterium tuberculosis, and was later applied to
S. cerevisiae to predict the structure of its TRN reconstruction [
146]. Moreover, some methods use the information available from gene expression and regulatory networks together to improve the prediction ability of their models. One relevant example of this approach is the MADE [
147] algorithm which has been applied to a
S. cerevisiae metabolic reconstruction. The MADE algorithm uses experimental gene expression data by setting a binary approximation of gene expression changes in different conditions after a statistical test to evaluate the most probable approximation. In this way MADE algorithm is able to formulate an MILP problem and avoid using arbitrary thresholds. The MADE method has been integrated during a second stage with the yeast TRN reconstruction, and this step improved and refined the prediction ability of the method.
FUTURE PERSPECTIVES
Although many large-scale TRNs have been reconstructed, and some of them have been integrated into metabolic models for S. cerevisiae, there are still several important problems to be solved in this area. Many of the transcriptional regulatory interactions obtained by biochemical approaches need to be studied for the efficiency of their functions. In addition, the dynamics of the TRN need to be described in more detail by examining the transcriptional regulatory interactions under more environmental conditions. For the integration of TRN reconstructions into metabolic models, only a small number of approaches used for E. coli have been also applied to S. cerevisiae so far. The transition from the modeling of prokaryotic organisms to eukaryotic organisms is challenging due to the increase in complexity of such organisms and the scarcity of our knowledge about the regulatory mechanisms.
The merging process of the Boolean logic qualitative description with the FBA quantitative results limits the outcome of the integrated system. Even when this integration is possible, the use of an on/off logic to describe the regulatory network reduces the ability of the model to make predictions. The Boolean on/off description of regulatory interactions is not sufficient to catch all the regulatory events of the cell, because it is not able to describe the intermediate levels of regulation present in a continuous system. For instance, when this binary representation is applied to describe the role of essential genes, a deletion of such genes due to a down-regulation event has the consequence that the model is not feasible. As a result, the development of methods allow a low flux even when an essential gene is down-regulated, and new large-scale methods based on non-Boolean formalisms are advantageous for a better integration of TRN reconstructions with metabolic models. Some steps toward this direction have been made with the introduction of probabilistic and data driven approaches using the MILP formulation, but other steps can be made by the extension of rFBA based techniques. Nevertheless, these are not simple tasks because the improvements of the experimental methodologies as well as the improvements of the modeling techniques are necessary in order to gain more quantitative data about the TRN in the future.
Recently, the efforts to reconstruct the metabolic network have been extended to create reconstructions of other systems present inside the cell. This new kind of modelling is based on an explicit representation of the elements which carry out the cellular tasks like the transcription and the translation process. This kind of reconstruction dramatically improves the dimension and the level of detail of the model. The reconstruction of
E.coli’s transcriptional and translational machinery [
148] allowed the formulation of the next generation genome-scale metabolic model (ME-model) and ME-models have been recently developed for
E. coli [
149,
150] and
Thermotoga maritima [
151]. This is a promising approach for building a scaffold that can be used for the integration of omics data at several layers of the model. Nevertheless, if this approach can be carried out for a eukaryotic organism is an open question. There is a lack of information about several processes inside the yeast cell, so it is not clear if it will be possible to create a ME-model for yeast in the immediate future, although a step toward this objective has been made with the reconstruction of the yeast secretory machinery [
152].
Higher Education Press and Springer-Verlag Berlin Heidelberg