Computational Biology Lab, Bio-X Program, Stanford University, Stanford, CA 94305, USA
whwong@stanford.edu
Show less
History+
Received
Accepted
Published
2014-01-01
2014-01-10
2014-06-25
Issue Date
Revised Date
2014-06-25
PDF
(3945KB)
Abstract
The Master equation is considered the gold standard for modeling the stochastic mechanisms of gene regulation in molecular detail, but it is too complex to solve exactly in most cases, so approximation and simulation methods are essential. However, there is still a lack of consensus about the best way to carry these out. To help clarify the situation, we review Master equation models of gene regulation, theoretical approximations based on an expansion method due to N.G. van Kampen and R. Kubo, and simulation algorithms due to D.T. Gillespie and P. Langevin. Expansion of the Master equation shows that for systems with a single stable steady-state, the stochastic model reduces to a deterministic model in a first-order approximation. Additional theory, also due to van Kampen, describes the asymptotic behavior of multistable systems. To support and illustrate the theory and provide further insight into the complex behavior of multistable systems, we perform a detailed simulation study comparing the various approximation and simulation methods applied to synthetic gene regulatory systems with various qualitative characteristics. The simulation studies show that for large stochastic systems with a single steady-state, deterministic models are quite accurate, since the probability distribution of the solution has a single peak tracking the deterministic trajectory whose variance is inversely proportional to the system size. In multistable stochastic systems, large fluctuations can cause individual trajectories to escape from the domain of attraction of one steady-state and be attracted to another, so the system eventually reaches a multimodal probability distribution in which all stable steady-states are represented proportional to their relative stability. However, since the escape time scales exponentially with system size, this process can take a very long time in large systems.
Like any physical quantity, gene expression level measurements are subject to noise. In fact, the experimental techniques for measuring biological quantities tend to be much noisier than measurements in other scientific disciplines. The extrinsic noise arising from measurement error can be mitigated by averaging many experimental replicates. However, in addition to straightforward extrinsic noise, gene expression is also characterized by intrinsic noise arising from the fundamental stochasticity of the underlying processes, which cannot be simply averaged away [1,2].
Experimental studies using single-cell biotechnologies have revealed that the biological mechanisms of gene regulation, including promoter activation and deactivation, transcription, translation, and degradation, are inherently stochastic [3–10]. Stochasticity can sometimes dramatically affect the behavior of gene regulatory networks [8,11,12] as stochasticity leads to different phase diagrams and can cause instability [13], and small molecular numbers can seriously impact system behavior [14]. These observations have important implications in synthetic biology, including the engineering of switches, feedback loops, and oscillatory systems [15–18].
A number of stochastic models have been applied to the gene regulation problem, since it is clear that in many cases, additive noise (independent of expression level) is not sufficient [19]. Since gene regulation depends on a series of chemical reactions (including the binding of TFs and RNAP to the promoter, transcription, translation, and degradation), it can be modeled with chemical equations [20,21]. A number of ad hoc approaches have also shown promise, including Poisson models (although expression regulation can lead to non-constant rates, disrupting the Poisson character), fluctuation noise analysis in small systems [4,22–26], and structural inference on large networks based on noise correlation [27,28]. The highest quantitative resolution is obtained from more sophisticated analysis based on the Master equation [29], including approximation methods [25,30–33], and more accurate modeling via simulation or theoretical deduction [14,19,34–37].
While the Master equation is generally accepted as the gold standard for modeling the processes of gene regulation in molecular detail, it is too complex to solve exactly except in simple cases. Approximations are needed to make it useful, but researchers have still not reached a clear consensus about the proper way to carry them out. In this paper, we attempt to shed light on these issues by discussing theoretical approximations of the Master equation based on an expansion method due to N. G. van Kampen and R. Kubo, and simulation algorithms due to Gillespie and Langevin. The van Kampen expansion shows that the stochastic model reduces to a deterministic model in a first-order approximation, provided the system has a single stable steady-state. We also discuss additional theory due to van Kampen for modeling the complex behavior of stochastic systems with multiple stable steady-states. To illustrate the methods and provide further insight into the behavior of multistable system, we perform a detailed simulation study comparing the various approximation and simulation methods applied to synthetic gene regulatory systems with a variety of qualitative characteristics.
DYNAMICAL SYSTEM MODELS OF GENE REGULATION
Deterministic dynamical system models of gene regulation lay the groundwork for stochastic modeling efforts. Master equation models are discretized stochastic generalizations of dynamical system models, so we start by introducing these basic models to provide context and clarify the basic modeling assumptions. For both dynamical system models and their stochastic counterparts, the form of the RNA polymerase binding probability function is a key modeling choice, and researchers have considered many different possibilities. Linear functions yield the simplest models, but lead to dynamical systems (or stochastic systems) with only one steady-state, while nonlinear choices yield much more complex systems with multiple steady-states, like many of the most interesting biological systems. Lyapunov theory characterizes the stability of steady-states of a dynamical system and hence the system’s long term behavior; multi-stability will become an even more interesting and critical issue when stochasticity comes into play. In this section, we discuss dynamical system models, linear and nonlinear choices of binding probability function, and steady-states and their stability, in order to set the stage for stochastic models of gene regulation.
Dynamical system model
Before formulating the Master equation model for gene regulation, we introduce its natural precursor, a deterministic dynamical system model. Although dynamical system models do not account for intrinsic noise, they capture cells’ ability to assume different characters as they transition through their lifecycles and respond to stimuli, and their quantitative form means that they can be used to predict systems’ future behavior. They also lend themselves to inference algorithms that allow the structure of novel gene regulatory systems to be learned from data [38]. As we will see in a later section, the stochastic model reduces to the deterministic model in a first-order approximation.
In the standard dynamical system model of gene regulation, the levels of RNA and protein evolve according to a system of differential equations. The basic assumption is that each species of RNA is transcribed at a rate proportional to the probability of RNA polymerase (RNAP) binding to the gene promoter (as a function of the expression levels of the transcription factor (TF) proteins that regulate it) and degrades at a rate proportional to its current level, while the corresponding protein is translated at a rate proportional to the current RNA level and also degrades at a rate proportional to its own level.
Hence, the dynamical system model is given by:where represents RNA concentrations and represents protein concentrations corresponding to a set of n genes, f(y) is the probability that RNAP is bound to the promoter as a function of the concentrations of regulatory proteins, is the transcription rate when RNAP is bound to promoter, is the translation rate, and , are the RNA and protein degradation rates, respectively.
In some situations, it is necessary or more appropriate to ignore the distinction between RNA and protein and use a model of the form:involving only the RNA concentrations x, which serve as a surrogate for the protein concentrations y. We will make similar modeling assumptions when we apply the Master equation to the gene regulation problem.
RNAP binding probability models
In Equation (1), the term f(y) represents the probability of RNAP binding to the gene promoter as a function of the protein concentrations y. There are many possible approaches to the modeling of the RNAP binding probability, yielding different functional forms of f.
The simplest approach is to use a linear function f(x) = Ax, where , , and A is an n×n matrix. Such a system can have only one steady-state, as we will discuss in more detail later on. Each coefficient aij represents the linear regulatory effect of gene j on gene i, and it is clear how perturbing the system and observing its response would allow for straightforward inference of these coefficients. If we are only interested in one specific steady-state of a biological system, this may be a good choice, and has led to success in many cases [39-42]. However, the usefulness of the linear model is severely limited by the fact that it cannot capture the ability of gene regulatory systems to maintain multiple stable steady-states, one of the key features that often motivate their study. Hence, we now turn our attention to nonlinear models.
Michaelis-Menten kinetics and the Hill equation are classical nonlinear model for activation or repression by a single factor, based on thermodynamic theory. Michaelis-Menten kinetics [43] can be applied to gene regulation by a single transcription factor by modeling transcription as the enzymatic reaction serieswhere X is an activator, D0 is an unbound promoter, D1 is an activator-bound promoter, P is an RNA polymerase, and Y is an RNA transcript. The corresponding kinetic equations are:
Let us assume that the reversible TF-promoter binding and RNAP-promoter binding reactions occur much faster than gene transcription, so that the quasi-steady-state assumptionapproximately holds. That is, the bound and unbound promoter states maintain equilibrium concentrations. Then we can rearrange to obtainA model for gene repression can be derived in a similar manner by replacing Equation (4) with(since now transcription only occurs for the unbound promoter), leading to
The Hill equation [44] is another classical model with a similar form, which models cooperative binding. For activation by a single transcription factor, for example, it has the formwhere n is the Hill coefficient.
More sophisticated thermodynamic models can account for multiple regulatory mechanisms [45,46]. The Bintu et al formulation is most general thermodynamic model in common use, capable of capturing the full spectrum of network interactions [47,48]. The Bintu RNAP binding probability function takes the general formwhere Sij lists the gene products that interact to form a regulatory complex, and bij, cij are nonnegative coefficients satisfying , which depend on the binding energies of regulator complexes to the promoter. bi0 and ci0 correspond to the case when the promoter is not bound by any regulator (), and the coefficients are normalized so that ci0 = 1.
The form of fi can model many different types of regulation in quantitative detail. Terms that appear in the denominator only are repressors, and the degree of repression depends on the magnitude of the coefficient, while terms that appear in the numerator and denominator may act as either activators or repressors depending on the relative magnitudes of the coefficients and the current gene expression levels.
Despite its detail and generality, not even the Bintu et al model captures the full complexity of the situation. One of the key assumptions implicit in the Bintu model is that RNAP levels are approximately constant on the time scale of interest. A second assumption is that nonspecific binding energy is equal for all non-promoter locations the RNAP could occupy. Finally, the model omits several reversible intermediate reactions such as binding and unbinding of RNAP and TFs. Since these reactions typically occur very quickly relative to the transcription time scale, we can reasonably assume that the quantities involved are in thermodynamic steady-state. Hence we can apply the quasi-steady-state assumption to eliminate the reversible reactions from the model. (The same argument applies to the stochastic formulation of the next section, as Rao and Arkin demonstrate how to use the quasi-steady-state assumption to eliminate intermediate species from a multivariate Master equation [49].) Nevertheless, eliminating short lived intermediate species is an approximation.
Steady-states and stability
The ability of gene regulatory networks to maintain multiple distinct steady-states is one of their most crucial properties and a major motivation for their study. Since steady-states and stability will be central to our discussion, we need to establish the necessary mathematical groundwork. Fortunately, the classical theory for general dynamical systems due to A. Lyapunov (1857–1918) perfectly serves our purposes. We will only outline the key definitions and theorems here; for a complete discussion, see a text like Walker’s “Dynamical systems and evolution equations” [50].
Consider a general nonlinear dynamical system of the formwhere , , and f is continuous. Assume that this system has an equilibrium point xe, i.e., f(xe) = 0. If f is linear or affine, f(xe) = 0 has exactly one solution so the system has exactly one steady-state; if f is nonlinear, the system may have zero, one, or multiple steady-states. (We must therefore use nonlinear functions to model gene regulatory systems if we wish to capture their ability to maintain multiple stable steady-states.) Let Φ(t; ) denote the unique solution trajectory x(t) corresponding to x(0) = . Lyapunov defined stability and a stronger condition, asymptotic stability, as follows:
Definition: The equilibrium xe is said to be stable if for all there exists , such that(where B(x, ϵ) denotes the open ball of radius ϵ centered at x).
It is said to be asymptotically stable if it is stable, and for all there exists , such that
In essence, stability means that there exists a neighborhood of the steady-state such that trajectories that start inside that neighborhood remain there for all time. Asymptotical stability means that in addition to this, nearby trajectories are attracted to the steady-state and eventually get infinitely close to it. The next theorem provides the essential stability criterion.
Theorem: Assume , and set (the Jacobian matrix at xe). If there exists a symmetric positive definite matrix P such that , then xe is asymptotically stable [50].
THE MASTER EQUATION
The basic approach to treating the stochasticity of gene regulation is to model gene expression level as a Markov process, whose future state depends probabilistically only on the current state. This is the most appropriate description for most processes in physics and chemistry [29], and this case is no exception: the mechanisms of transcription, translation, and degradation mean that the probability of each of these events depends only on the current quantity of each of the species involved in these processes, including RNAP, transcription factors, and ribosomes (each of which is the product of one or more genes, and can therefore be accounted for in our formulation). The Master equation is the natural model for gene regulation under the Markov assumption.
In this section we introduce the Master equation, which follows directly from the Markov property [29]. We will briefly discuss the general form of the Master equation, then turn our attention to the special case of birth-and-death processes, which provide an excellent model for gene regulation. We will need the multivariate form of the Master equation to treat systems with multiple genes. In the next subsection, we will apply the basic general theory outlined in this subsection to the gene regulation problem.
Markov processes
A Markov process is a stochastic process such that for any t1<t2<…<tn,
Hence a Markov process is completely determined by the functions P(y1, t1) and the transition probabilities P(y2, t2|y1, t1). The Master equation holds for any Markov process: Appendix A contains a complete derivation of the Master equation as an equivalent form of the Chapman-Kolmogorov equation, which is a direct consequence of the Markov property (adapted from Chapters IV and X of van Kampen’s Stochastic Processes in Physics and Chemistry [29]). For a general Markov process Y , the Master equation readswhere is the transition probability per unit time from y to . If the range of Y is a discrete set of states labeled by n, then Eq. (7) reduces to
Birth-and-death processes
Birth-and-death (or one-step) processes are a special class of Markov processes whose range consists of integers n and whose transition matrix permits only jumps between adjacent sites:(Note that this does not mean that it is impossible for the system to make two jumps within one time step ∆t, but only that the probability is O(∆t2).) Hence the Master equation reduces to
The birth and death rates, gn, rn, respectively, can be arbitrary functions of n, even nonlinear ones. If only non-negative integers are allowed, then for n = 0 we must replace withor alternatively we may define r0 = g-1 = 0.
One important example of a one-step process with constant transition rates is the Poisson process: rn = 0, gn = q, pn(0) = δn,0, i.e.,
It is random walk over the integers taking steps to the right only, but at random times. The negative Poisson process (taking steps to the left) is a good model for protein degradation, as we will see shortly.
Multivariable birth-and-death processes
The generalization of the Master equation for birth-and-death processes to multiple variables is straightforward. Consider an n-dimensional birth-and-death process with birth and death rates g; r : , respectively. That is, gj(k), rj(k) denote the birth and death rates, respectively, of the jth species when . The Master equation governing this process is given bywhere .
The Master equation model for gene regulation
Now we are ready to apply the basic theory of the previous subsection to the gene regulation problem. We will show how to model gene regulation as a birth-and-death process, where birth corresponds to transcription and death to degradation, and derive the appropriate Master equation model. In the one gene case, an explicit steady-state solution is available. In order to account for multiple genes and the distinction between RNA and protein, we require a multivariate formulation.
Gene regulation as a birth-and-death process
We now wish to develop a stochastic model for gene regulation. We will start simply, considering a system with a single gene, and temporarily ignoring the distinction between RNA and protein. Let X(t) be a discrete random variable representing the number of RNA transcripts present in the cell at time t. X(t) has a time-dependent probability distribution given by P(k, t) ≡ P{X(t) = k}. Analogous to the deterministic model of section Dynamical system model, we can model X(t) as a birth-and-death process with birth rate τF(k) and death rate γk, where F models the RNAP-promoter binding probability as a function of the current number of transcripts (in the single-gene case, we can only account for self-regulation). If there are initially k RNA transcripts, then over an infinitesimal timestep ∆t either a degradation event may occur with probability γk∆t, an RNAP-promoter binding event may occur followed by RNA transcription with probability τF(k)∆t, or neither may occur. (It is highly unlikely (O(∆t2)) that two or more of these events occur within a single timestep.) Hence, as Figure 1 shows, the probability P(k, t) increases by P(k-1) times the probability transcription plus P(k,t) times the probability of degradation, and decreases by P(k) times the probability of transcription plus the probability of degradation. The Master equation governing the evolution of P(k,t) over time is therefore:
Explicit steady-state solution for one gene systems
A general single-species birth-and-death process governed by the Master equationhas an explicit steady-state probability distribution given by(van Kampen VI.3.8 [29]). The proof is by induction. Applied to a single-gene system, this becomes
This formula is very useful for studying one gene systems with minimal computation. For example, it can be used to directly compute the steady-state mean and variance of a single-gene system.
Multiple genes
In order to study stochasticity in gene regulation, we must extend our framework to include multiple-gene systems as well. In order to do this we can apply the Master equation for multivariate birth-and-death processes. Consider a system with n genes, and let be a discrete random vector, where Xj(t) represents the number of RNA transcripts of gene j present in the cell at time t. X(t) has a time-dependent probability distribution given by , for . Similar to the one gene case, we can model X(t) as a birth-and-death process with Master Equation:where
RNA and protein
Initially, we simplified the discussion by ignoring protein translation and focusing only on the number of RNA transcripts of each gene. The same multivariate Master equation that allowed us to handle multiple genes also allows us to model the stochasticity of protein translation. If we introduce another discrete random vector , where Yj(t) denotes the number of protein translates of gene j, and define P(kr; kp; t) ≡ P(X(t) = kr;Y(t) = kp) the Master equation corresponding to the deterministic model (1) is
If we apply the quasi-steady-assumption discussed in section RNAP binding probability models to the translation step, that is, we assume that protein levels are approximately proportional to RNA levels at all times, then this equation reduces to the form (13) [49]. Although this assumption may not always be biologically accurate, model (13) is the only practical option in many cases, since experimental technologies for measuring both mRNA and protein levels concurrently are not yet available.
Any modeling effort is necessarily a compromise between accuracy and tractability, and this case is no exception. Since the biological mechanisms of gene transcription are extraordinarily complex and not completely understood, our model relies on a number of simplifying assumptions, both biological and physical in nature. One of the most explicit is the assumption that the rates of degradation, translation, and transcription (when RNAP is bound) are constant for each gene. In reality, the rates are affected by many other processes including chromatin remodeling, translational regulation, and protein folding. As discussed in section RNAP binding probability models, modeling the RNAP binding function also involves several simplifications and assumptions.
EXPANSION AND SIMULATION METHODS
The Master equation cannot be solved explicitly except in the simplest cases. For a one gene system, we have an explicit formula for the steady-state distribution (Equation (12)), but no such formula exist for multiple genes. Therefore, in order to make further progress we will need approximations of the Master equation and efficient simulation methods. Fortunately, much of the work has already been done by physicists studying the Master equation. Beginning in the 1970s, N. G. van Kampen [29] and Ryogo Kubo [51] developed a systematic expansion method for approximating the Master equation at any level of detail. Gillespie created a stochastic simulation algorithm to generate statistically correct trajectories of the Master equation; another simulation method based on the Langevin equation is less accurate but more efficient. We will summarize their findings in this section and show how they can be applied to the gene regulation problem. In the next section we will perform simulation studies on simple synthetic gene regulatory systems to illustrate the application of these methods and understand their strengths and weaknesses.
The Gillespie algorithm
The Gillespie algorithm enables numerical simulation of statistically correct trajectories of a system governed by the Master equation. The iterative Monte Carlo procedure randomly chooses the next event that will occur and the intervening time interval, then updates the molecular numbers of each species and the trajectory time [52]. If the simulated system is in state at time t, the waiting time τ before its next jump is drawn from an exponential distribution, and the probability of jumping to state X(μ)is wμ ∝ W(X(μ)|X) (the Master equation transition probability for X→X(μ)per unit time). The basic steps of the algorithm are
1. Initialize the molecular numbers of each species, X1,…, Xn, and set t = 0.
2. Randomly choose the next event to occur, and an exponential waiting time τ , by generating uniform random numbers r1, r2 from Unif(0,1), and setting
3. Update the time and molecular numbers based on the chosen event and time
4. Repeat steps 2–3 until the simulation time reaches limit t>Tsim.
The Gillespie algorithm provides an exact simulation of the Master equation at a high computational cost, which increases rapidly with the number of species and the system size. While it is very attractive for small systems, alternative approaches are needed for gene regulatory systems with many genes and large systems sizes. In the next few sections, we will discuss theoretical approximations as well as an efficient but inexact simulation method based on the Langevin equation.
The van Kampen expansion
N. G. van Kampen provides a systematic approximation method involving an expansion in the powers of small parameter inversely related to the system size [29]. The Master equation can be approximated at any level of detail by truncating the expansion to omit the higher-order terms. Ryogo Kubo, a contemporary of van Kampen, arrived at an equivalent formulation by a slightly different approach [51]. We will follow van Kampen’s development here since it is more transparent. For simplicity, we only describe the one-dimensional expansion, but the multivariate case is similar; van Kampen Chapter X.5 shows how to extend the theory to multiple variables [29].
In order to establish the relative scales of macroscopic and microscopic (jump) events, van Kampen introduces a system-size parameter Ω, such that for large Ω the fluctuations are relatively small. His approximation takes the form of an expansion in the powers of . A critical assumption is that the transition probability function W has the formwhich means that the transition probabilities depend only on the macroscopic variable and on the size of the jumps . For our application, we assume that this is the case, and that the jumps can only have magnitude 1:
The expansion begins with the Ansatz that the probability distribution P(X, t) has a peak of order tracking the macroscopic solution, with width of order corresponding to the fluctuations:
The motivation for the Ansatz is the observation that the relative fluctuation effects in chemical systems tend to scale as the inverse square root of the system size [53]. It is justified a posteriori by the fact that P(x, t), expressed in terms of , turns out to be independent of Ω to first approximation. As part of the expansion procedure, is chosen to track the peak, and turns out to be exactly the deterministic solution.
To compute the expansion, van Kampen redefines P(X, t) as a function of the new parameters , viarewrites the Master equation in terms of Ω, and proceeds to expand it in negative powers of Ω. To simplify the calculations, he defines the jump moments
The first jump moment corresponds to the deterministic equation:
For a birth-and-death process, this simplifies toin our case
For multiple genes, the first-order jump moments are just α1,i(y) = τifi(y) - γyi for 1≤i≤n, but the second-order jump moments, α2,i,j , 1≤i, j≤n, are more complex since they involve interactions: see van Kampen Chapter X.5 for further details.
The complete calculation (adapted from Chapter X of van Kampen) is provided in Appendix B. A crucial step in the expansion is the cancellation of terms of order , which cannot belong to a proper expansion for large Ω. The cancellation is made possible by choosing (t) (the macroscopic part of X) such that
That is, exactly satisfies the deterministic equation.
The final result (to order ) is thatwith jump moments αv defined by (15).
As we will discuss in greater detail later, the validity of the expansion relies on the assumption that the macroscopic equation has a single stable stationary state (satisfying ), which attracts all trajectories. If this is not the case, it is possible for a random fluctuation to send a stochastic trajectory out of the domain of attraction of the deterministic steady-state near which we would expect it to remain. For now, we will assume that the condition holds. Then the expansion is valid and can be truncated at the desired level of detail and translated back into the original variable via to yield various approximation schemes.
The linear noise approximation
Restricting attention to the terms of order Ω0 = 1 in this expansion yields the linear noise approximation
This is a linear Fokker-Planck equation, and the solution turns out to be a Gaussian (see van Kampen VIII.6 [29]). Hence it is completely characterized by the first and second moments of , which are of the most interest to us anyway. Multiplying Equation (18) by and , respectively, yields differential equations in the mean and variance of (denoted , , respectively):
After solving for the mean and variance of and solving the deterministic equation for , we can use the Ansatz (14) to find the mean and variance of X:
The initial condition P(X, 0) = δ(X - X0) implies 0 = x0 and , hence . (Even if has a nonzero initial distribution, if we still will have . Hence the mean of the solution to the Master equation with initial distribution approximately satisfies the deterministic equation:
This result extends to the multivariate case. If , the mean becomes a vector in and the variance becomes an n × n covariance matrix. In Equations (19) and (20), the functions , α2 are replaced with a Jacobian matrix and a matrix of second-order jump moments, but the basic structure of the expansion is similar, and (21) still holds. The multivariate case is discussed further in van Kampen Chapter X.5, and by Komorowski et al. (with different notation) [54].
Connection to nonlinear deterministic model
Equation (21) provides the link between the stochastic Master equation and the nonlinear deterministic dynamical system model (1). It shows that the deterministic equation is an approximate model for the evolution of the mean expression of the stochastic process. That is,with error on the order of a single molecule. Therefore, under a few reasonable assumptions about system size and steady-state stability, the population mean still approximately satisfies the nonlinear deterministic equation
One of the attractions of this model is an associated algorithm for learning an unknown underlying gene regulatory network from experimental data [38,55]. The algorithm selects terms to include in the model and estimates their coefficients without requiring any prior knowledge of the regulators, based on gene expression data at perturbed steady-states. Thus, relatively easy-to-acquire steady-state gene expression data can be used to learn a model that quantitatively describes the complete dynamical behavior of the system.
The Fokker-Planck and Langevin equations
In section The linear noise approximation, we saw that the linear noise approximation gave rise to a linear Fokker-Planck equation. Fokker-Planck or (mathematically equivalent) Langevin equations predate the van Kampen expansion and are still often used as approximations of the Master equation or directly as models of Markov processes with small jumps (although this sometimes leads to difficulties that must be resolved by van Kampen’s approach). In this section we will discuss these two types of equations and their applications on the gene regulation. Although the approximation is not entirely consistent due to the nonlinearity of the problem, the Langevin equation is the basis of an efficient simulation approach that enables large-scale simulations of multiple-gene systems.
The Fokker-Planck equation is a differential equation consisting of a “transport” and a “diffusion” term:(For the multivariate version, see van Kampen VIII.6.1.) In the general form of the equation, α1, α2 are any real differentiable functions with α2>0, but in Planck’s derivation of the equation as an approximation to the Master equation [56], they are exactly the first and second jump moments (15). Since the Fokker-Planck equation is always linear in P, we follow van Kampen in appropriating the term linear to mean that α1 is linear and α2 constant.
The Langevin equation is a stochastic differential equation (SDE) of the form where W(t) is a Wiener process, or Brownian motion. Again, α1, α2>0 may be any C1 functions in general, but in the case of interest to us, they represent the jump moments (15). Equations (22) and (23) are mathematically equivalent using the Ito interpretation of (23) (see van Kampen IX.4 [29] for the proof).
These equations are very appealing for modeling physical processes since they are easy to derive and interpret. For both equations, α1, α2 (thought of for now as general functions, not as the jump moments) can be inferred without even knowing the underlying Master equation, using only the macroscopic law and fluctuations around the steady-state solution (known from statistical mechanics). The approach works very well in situations where the macroscopic law α1 is linear [56–59]. However, confusion can arise when α1 is nonlinear, since effects on the order of the fluctuations are invisible macroscopically [60]. One of the major motivations for van Kampen’s systematic expansion was the need to resolve disagreements between authors who had developed different but equally plausible characterizations of the noise in nonlinear systems using this approach.
For systems with linear deterministic equations, the van Kampen approximation agrees exactly with the Fokker-Planck model, since the linear noise approximation yields a linear Fokker-Planck equation. However, discrepancies may arise for nonlinear systems, and we should consider the van Kampen theory definitive in such cases. The error in the nonlinear Fokker-Planck model is that it retains the full functional dependence on the nonlinear functions α1, α2 (in effect, keeping infinitely many terms of their Taylor expansions) while cutting off their third-order and higher derivatives in the expansion about the deterministic path (t). In contrast, the truncated van Kampen expansion replaces α1, α2 by their Taylor polynomials at a level of detail consistent with the order of the approximation. The van Kampen expansion provides a completely consistent approximation of the Master equation to any desired order of accuracy, while the Fokker-Planck model is a slightly inconsistent second-order approximation only. Nevertheless, the discrepancy between the Fokker-Planck and van Kampen approximations is often not too serious (and a second-order approximation is typically good enough), so the models are still very useful in many cases.
The Langevin equation, in particular, lends itself to efficient simulation [37,53]. Simulation provides insight into the behavior of individual trajectories as well as moment information, and applies directly to multistable systems (while van Kampen requires alternative theory since the expansion method only applies to systems with one stable steady-state). However, simulation can be very expensive. The exact Gillespie algorithm and other direct simulation methods are only computationally feasible for very small systems. Fortunately, the Langevin simulation works well for large systems with many genes, since trajectories of the Langevin equation can be simulated by evolving a small system of stochastic differential equations, rather than accounting for every single reaction like the Gillespie algorithm. Hence the Langevin simulation is appropriate for large systems with complex qualitative structures.
With these risks and potential rewards in mind, we will show how to apply the Langevin approach to the gene regulation problem. In the next section we will compare the results of Langevin simulations with the more accurate predictions of van Kampen or the direct Master equation simulation where possible. Using the first- and second-jump moments (16) for our problem, the one-dimensional Langevin equation is where W1, W2 are independent Wiener processes.
Systems with multiple stable steady-states
We have alluded several times to the fact that stochasticity can lead to unexpected results for systems with multiple stable steady-states. The basic reason is that random fluctuations can send stochastic trajectories out of the domain of attraction of one deterministic steady-state and into the domain of another. Van Kampen treats these issues in detail in Chapter XIII of his book [29]. In this section, we will summarize the points that are most relevant to our topic. In the next chapter, simulation studies will illustrate these points and provide additional insight.
For simplicity, consider a birth-and-death process with two distinct stable steady-states, and an unstable steady-state . By this we mean that the corresponding deterministic equation has the following properties:
A deterministic trajectory will eventually converge to the nearest stable steady-state, that is, trajectories with initial conditions will converge to , and those with initial conditions will converge to . (A trajectory with initial condition will remain there, but this is not physically meaningful even in the deterministic case since the slightest perturbation will send the trajectory toward or .)
When we take stochasticity into account, it is also possible for a large fluctuation to send a trajectory out of the domain of attraction of and into that of . These large fluctuations are usually unlikely, so it may take a very long time before one occurs. For systems of macroscopic size, this escape time can be so long that the event may never be observed. In smaller systems, however, transitions between steady-state domains can be a fairly common occurrence.
For systems in which giant fluctuations are relatively rare, we can distinguish two time scales: a short time scale on which equilibrium is established within the domain of attraction of a particular steady-state, and a long time scale on which giant fluctuations occur (sending trajectories out of the domain of attraction of one steady-state and into another). The rate of occurrence of the giant fluctuations is roughly equal to the height of the steady-state distribution at the unstable point , which means that the escape time scales exponentially with the system size, Ω.
A system that starts out near the unstable point evolves in three basic stages. At first, each trajectory has a reasonable probability of moving toward either of the stable points or , so the distribution widens quickly, but fluctuations across are quite possible. In the next stage, the probability has split into two autonomous parts, and fluctuations across cease, since each trajectory has settled into the domain of attraction of either or . In the final stage, the probability has reached a final bimodal stochastic steady-state distribution peaked at and . There is still a chance that fluctuations will send trajectories from one regime to another, but the probabilities are balanced so as to maintain the distribution.
A system that starts out near the stable point evolves differently, but eventually reaches the same bimodal stochastic steady-state distribution peaked at and (i.e., stage three), although it takes much longer to do so. Giant fluctuations can release trajectories from the domain of attraction of , but these occur on the long time-scale, so the probability peak at builds up much more slowly. Of course, if giant fluctuations are not particularly rare (in small systems, for example), then the initial condition has little impact on the time required to reach the steady-state distribution.
The relationship between the escape times and the probability of the regimes in the stochastic steady-state distribution is simple. Define the probabilities , of a trajectory (t) being in the domain of , , respectively, by
Let τac, τca represent the escape times, that is, is the probability per unit time for a trajectory in the domain of to cross the boundary into the domain of . Then we have [van Kanmpen XIII. 1.4]
At steady-state (),
We can identify the escape time with the mean first-passage time from to . For the one dimensional process defined by Equation (9), the mean first-passage time from to is given bywhere ps is the stationary distribution (11), as shown in Appendix C. The escape rate is O(), the height of stationary distribution at the unstable point b, so the escape time scales exponentially with the system size [61].
The relative stability of the two stable steady-states, , depends on the relative depths and widths of the two corresponding potential energy wells. To illustrate this, consider the Fokker-Planck equation modeling diffusion in a potential U:
Although this model is not even approximately appropriate for the gene regulation problem since the diffusion coeffcient is constant, while in the gene regulation problem it is a function of x, it helps clarify some important issues. To that end, suppose the derivative of U satisfies the bistability conditions (26), so that dU=dx and U have the shapes shown in Figure 2. dU/dx has zeros at the steady-states , , , and U has minima at the stable points , and a maximum at the unstable point .
The corresponding deterministic equation is . The stationary distribution is given byand for small θ we can approximate
Hence the relative stability of the two stable steady-states depends on both the depths of the potential energy wells (U(a) and U(c)) and their widths (U′′(a) and U′′(c)). In Figure 2, is more stable than , since its potential energy is lower and energy well is wider. The relative stability in this example is about , meaning that at stochastic steady-state, about 43% of trajectories will be near and 57% will be near at a given time (as shown in Figure 2, right pane). Similarly, we can approximate the escape time (mean first-passage time) by
Hence the escape time depends on the height of the energy barrier U(b) and energy well U(a), and the widths (a), (b) of the well and barrier. Since the potential energy difference is O(Ω), we see again that the escape time scales exponentially with the system size. Diffusion in multiple dimensions is qualitatively similar; van Kampen discusses the two-dimension case in XIII.4 [29].
In order to extend some of these ideas to the gene regulation problem, at least approximately, we need a non-constant diffusion term in the Fokker-Planck equation. The Fokker-Planck approximation corresponding to the fully nonlinear Master equation used for gene regulation is given by(As we noted earlier, although this does not technically constitute a consistent approximation, it works well in most cases.) The steady-state solution is given by(In the multidimensional case, finding the steady-state solution is less straightforward, as discussed in van Kampen XI.4, but simulating the equivalent Langevin simulation would provide an approximate result.) We can define an “effective potential” byand numerically evaluate πa, πc, and the relative stability, using
In the one-dimensional case, we could use the exact steady-state solution of the Master equation (12) instead, although the Fokker-Planck stationary solution may be more convenient. In the multivariate case, we must use the Fokker-Planck/Langevin approach, since there is no general approach to finding stationary solutions of multivariate Master equations.
Summary
Nonlinear Master equation models capture the stochastic mechanisms of gene regulation in full molecular detail. The Master equation can rarely be solved explicitly for multiple gene systems, but theoretical approximations and simulation algorithms can give insight into these systems. The Gillespie algorithm allows us to numerically simulate exact trajectories of the Master equation, although the computational cost becomes prohibitive for large systems with many genes. The van Kampen expansion method allows us to rigorously approximate the Master equation at any level of detail we desire (the deterministic model (1) being the simplest), provided that the system has only one stable steady-state, and van Kampen provides alternative theory for analyzing systems with multiple stable steady-states. The Langevin equation (equivalent to the Fokker-Planck equation) is an inexact approximation to the Master equation and is the basis of a highly effcient simulation method that is well suited for large multiple-gene systems. In the next section, we will perform simulation studies on simple synthetic gene regulatory systems to illustrate the application of each of these methods and evaluate their performance. As one might expect, the behavior of systems with multiple stable steady-states is particularly interesting.
STOCHASTIC SIMULATION STUDIES
In this section, we study several small synthetic gene regulatory systems in order to gain insight into the effects of stochasticity on systems with different qualitative characteristics, and the suitability and accuracy of different approximation and simulation methods in various situations. The simulation studies will compare the true Master equation (when feasible), second-order van Kampen approximation, deterministic equation (linear-noise approximation), Gillespie simulation, and Langevin simulation, in order to understand the strengths and limitations of each.
One gene system with one stable steady-state
Consider a single self-repressing gene whose self-regulation is governed by the deterministic differential equation
It has a single (non-negative) deterministic steady-state at y = 1, satisfying f(y) - γy = 0. (The other solution, y = -2, is negative and therefore not physically meaningful, nor is it realizable by the system assuming a non-negative initial condition.) The corresponding Master equation iswhere F(k) = Ωf(k/Ω). Numerical evolution of the Master equation by iteratively updating a vector of probabilities according to (32) is feasible in this case because the system is so simple. The Master equation also has the explicit steady-state solution:
Figure 3 shows the stationary probability distributions for a range of values of Ω, revealing that as Ω increases, the distribution is increasingly sharply peaked at ys = 1. That is, the mean approaches ys = 1, and the variance goes to zero as Ω increases.
To second order, the van Kampen expansion gives
We can solve for the steady-state values of , , and by setting the left-hand-sides of all three equations to zero. The first equation is the deterministic evolution equation: we already know that its only non-negative solution is . Evaluating f and its derivatives at :and plugging into the last two equations yields
Finally we obtain expressions for the steady-state mean and variance in terms of Ω:
The Langevin model for this system is given by the SDEwhere W1(t), W2(t) are independent Wiener processes.
Figure 4 compares the exact Master equation, second-order van Kampen approximation, Gillespie simulation, and Langevin simulation for this system with initial condition ys = 1 (the steady-state value) and three different values of Ω. As Ω increases, the agreement improves as the mean approaches the deterministic trajectory (that is, the steady-state value ys = 1), and the variance decreases. The discrepancy between the stochastic mean and the deterministic trajectory and the variance are both O(Ω-1) (as predicted by the van Kampen expansion).
The Master equation governs the evolution of the probability distribution; Figure 5 shows the final probability distributions for each value of Ω. In each case, the initial probability is a delta-distribution centered at Ωys, and the probability spreads out over time to reach a steady-state distribution, which is extremely close to a Gaussian for Ω>>1. For larger values of Ω, the final probability distribution remains sharply peaked around ys.
Two gene system with one stable steady-state
Next we consider a two gene system, again with a single stable steady-state, governed by the deterministic differential equation
It has a single deterministic steady-state at y1 = 1, y2 = 2. With two genes, directly evolving the Master equation is very expensive for moderately sized systems, as each probability distribution is now two-dimensional (in general, the computational cost of evolving the Master equation with system size Ω is O(Ωn) per timestep), so we omit this method and focus on the van Kampen approximation and the Gillespie and Langevin simulations. In the Langevin simulation, we neglected the interaction terms in the second-order jump moments for simplicity. Figure 6 shows that the situation is qualitatively similar to the one gene case we just discussed. The approximation and simulation means differ from the deterministic trajectory by O(Ω-1), and the variance is also O(Ω-1). For Ω = 1, the y2 variance and mean discrepancy of the Langevin simulation and the van Kampen approximation are slightly lower than those of the exact Gillespie trajectories. This inaccuracy arises from zero-boundary effects and the non-Gaussianity of the probability distribution at small system sizes (on the order of a single molecule).
Gene regulatory systems with multiple stable steady-states are ubiquitous in nature as this property plays a key role in cellular lifecycles and responses to external stimuli. However, constructing synthetic systems with multiple stable steady-states with our chosen functional form (5) can be challenging. One approach, which Chickarmane et al. used to develop their ESC-inspired system [62], is to start with a well-understood biological network with multiple steady-states and use experimental data and knowledge of qualitative behavior to suggest the appropriate terms and parameter values. This can be an interesting and useful program, especially as the synthetic network may later be useful for gaining further insight into the behavior of the original biological network; however, there are very few biological networks well-understood enough to lend themselves to this type of modeling. Furthermore, it is limiting in the sense that it relies on existing known networks, and provides little insight into methods for generating original networks. Ideally, we would like to be able to create novel networks from scratch with specific properties of our own choosing. In this section we will discuss our efforts toward this end. Although we have not fully solved this problem by any means and would encourage further work in this direction, we have developed a heuristic algorithm that, together with some trial-and-error, allowed us to generate the two multistable synthetic gene networks we study shortly.
Suppose we wish to construct an n-gene system with k stable steady-states e1, …, ek, of our choosing. That is, we want to find parameters bij , cij , i = 1, …, n, j = 1, …, m, where m is the number of terms in the model, so that
Furthermore, e1,…, ek should be stable, so we requirewhere Jf (y) denotes the n × n Jacobian matrix of f at .
Hence, we wish to find bi, ci such that fi(ej) = γej,i while satisfying the Jacobian condition and the other constraints. That is, we want to solve the feasibility problem:
If the problem is feasible, then bi, ci parametrize a system with the desired properties. Not all choices of the ej necessarily lead to a feasible problem, so we may have to try several possibilities before we find a system with multiple stable steady-states.
The problem is nonconvex due to the rational form of f and the stability condition, so we can either use a nonconvex solver, or use heuristics and trial-and-error and solve with a convex solver. Specifically, we can use an iterative approach to enforce the stability constraint [63], and simply replace the denominator of each f with a constant value and add a constraint forcing the denominator to be equal to that constant. Of course, not all constant values lead to feasible problems, so if we use the heuristic approach, we must guess-and-check the denominator values as well as the steady-state locations.
One gene system with two stable steady-states
In this section, we study a one gene system with two stable steady-states (and one unstable steady-state) inspired by a synthetic system developed by Chao Du and refined using the algorithm of the previous section.gives rise to two stable steady-states: e1 ≈ 1.0431 and e2 ≈ 7.9845, and an unstable steady-state e3 ≈ 4.0416.
At the end of the last section, we discussed methods from Chapter XIII of van Kampen’s book for analyzing the equilibrium behavior of systems with multiple stable steady-states. These tools provide a great deal of insight into long-term system behavior with minimal computation, since they only require the stationary probability distribution (which can be computed directly in the single-gene case using (12), or approximated using the Fokker-Planck equation in general). These tools will allow us to predict some of the basic behavior of system (36) with very little effort. Simulations will confirm and complete the picture.
Let us first examine the most basic properties of the system with Ω = 1. As Figure 7 shows, the deterministic system is bistable. The deterministic function α1(x) = f(x)-γx has three zeros corresponding to the three deterministic steady-states. The derivative of the deterministic function is negative (dα1 = dt<0) at the stable steady-states, and positive at the unstable steady-state. The stationary distribution has a strong peak at the more stable steady-state, e1, and a weaker one at the less-stable point e2. The system is three times as likely to be in the domain of e1 as in the domain of e2. We can use the relative stability of the two stable points to estimate the steady-state mean: π1e1 + π2e2 ≈ 2.78, which will be confirmed by our simulation study.
Next, let us examine the system with Ω = 10. Figure 8 shows the deterministic function, the effective potential, and the (approximate) stationary distribution, computed using the Fokker-Planck approach. The deterministic function and stationary distribution have the same qualitative properties as they did for Ω = 1, except that the e1 peak of the stationary distribution is now even higher relative to e2 (π1 ≈ 97%; π2 ≈ 3%) and the steady-state mean, 1.24, is therefore closer to e1. The effective potential has minima at the stable steady-states, but the “energy” of the more stable state, e1, is much lower.
Simulations reveal how the mean, variance, and probability distribution of the system actually evolve. Figures 9, 12 and 13 compare the exact Master equation, second-order van Kampen approximation, and Master equation and Langevin simulations for Ω = 1,10, and all but the exact Master equation for Ω = 100 (due to instability), respectively. Unlike for the one gene system described by Equation (31), the exact Master equation and both simulations deviate dramatically from both the van Kampen approximation and the deterministic trajectory, at least for Ω = 1,10. The reason for this is the bistability of the system. Especially when Ω is fairly small (hence the variance is relatively large) each stochastic trajectory starting from steady-state e1 has a reasonably large probability of escaping from the domain of attraction of e1 and being attracted to e2, and vice versa. In the long run, the system settles to a bimodal steady-state distribution, in which both stable steady-states are represented proportional to their relative stability. Therefore, the steady-state mean regardless of the starting point converges to the roughly weighted average of the two deterministic stable steady-states predicted by the basic stability analysis described in Figure 7. The second-order van Kampen expansion centered at either of the two steady-states does not account for this blending effect and therefore underestimates both the variance and the deviation of the mean trajectory from the deterministic trajectory. In reality, the second-order expansion should never have been applied in this case since it is only valid for systems with a single stable steady-state, as van Kampen explains in Chapter X of his book [29].
Figure 10 shows how the probability distribution evolves from two different initial conditions, peaked at e1 and e2, respectively. Regardless of the starting point, the probability distributions eventually converge to identical steady-state distributions with a strong, sharp peak near e1 and a weaker peak centered near e2. When the initial condition is a peak at e1, the probability spreads out over time and shifts some of its weight toward e2, and vice-versa, although much more weight is shifted from e2 to e1 than the other direction.
The system behavior with Ω = 10 is qualitatively similar, as Figure 12 shows, but the bimodal steady-state probability distribution is even more sharply peaked at e1 and the stochastic mean converges to an average closer to e1, in agreement with the analysis of Figure 8.
The situation appears to be different for Ω = 100, as shown in Figure 13. In fact, the system seems to behave much more like a single stable steady-state system. In that the stochastic mean remains close to the initial steady-state, the van Kampen approximation agrees well with the simulation results, and the variance and difference between the mean and deterministic trajectory are both on the order of O(Ω-1). The explanation is that for very large systems, the probability of a jump between e1 and e2 is extremely small, so the escape time is much longer than the length of the simulation. Figure 11 confirms that the escape time scales exponentially with the system size, as discussed in the previous section and Appendix C. Therefore, for a large system like this one, the stochastic trajectories are highly unlikely to diverge from the deterministic steady-state where they originated for the duration of the simulation. If the simulation ran long enough, some trajectories would eventually escape from their initial domains of attraction, and the same blending of the two steady-states that we observed in the smaller systems would occur. The large system size means that the initial time period in which the two stable steady-states operate independently of each other takes up the entire simulation, however, so we never observe this blending.
Two gene system with two stable steady-states
We constructed a two gene system with two stable steady-states using the heuristic approach described in section Constructing multistable systems. We selected the two steady-statesand solved the optimization problem (35) for the coefficients bi, ci (where m is the number of terms in the model), for i = 1, 2, and Lyapunov matrices P1, P2, to enforce the steady-state condition (fi(ej) = γej,i for i = 1, 2, j = 1, 2), and the stability condition (Jf (ej)TPj + PjJf (ej) at each steady-state j = 1, 2). The problem is non-convex due to the rational form of f and the stability condition. For convenience, we found a solution using a convex solver and a series of heuristics, then checked that the result was indeed a solution of (35), but we could also have used a general solver to solve the feasibility problem (35) directly.
The deterministic model for the two gene system with two stable steady-states iswith bi, ci, i= 1, 2, given in Appendix D.
Figure 14 compares the second-order van Kampen approximation and the Gillespie and Langevin simulations for Ω = 10 and Ω = 1000 (where in the Langevin simulation, we again neglected the interaction terms in the second-order jump moments for simplicity). The qualitative behavior of this system is exactly the same as that of the bistable one gene system. For small system sizes, each stochastic trajectory has a reasonable probability of escaping from the domain of attraction of one stable steady-state and being attracted to the other, so in the long run, the system settles to a bimodal steady-state distribution. Hence, regardless of the initial condition, the steady-state mean converges to a weighted average of the two deterministic stable steady-states. The second-order van Kampen approximation centered at either of the two steady-states does not properly apply, and would seriously underestimate both the variance and the deviation of the mean trajectory from the deterministic trajectory. For very large systems, in contrast, the probability of a giant fluctuation between e1 and e2 is very small. Since the escape time scales exponentially with the system size, it can far exceed the length of the simulation for large systems. Therefore, the stochastic trajectories remain close to the deterministic steady-state where they originated for the duration of the simulation, and the van Kampen approximation is quite accurate within this timeframe.
Simulation summary
Our simulation studies support and illustrate the theory discussed in the last section by comparing the van Kampen expansion, Gillespie simulation, and Langevin simulation for systems with one or multiple stable steady-states, hence very different qualitative characteristics. For one gene systems, we can compare the performance of each approach to the exact trajectory of the Master equation. Our study of a one gene system with one stable steady-state shows that for system-size Ω, both the variance and the difference between the stochastic mean and deterministic trajectory are O(Ω-1), and the van Kampen expansion, Gillespie simulation and Langevin simulation are all in excellent agreement with Master equation, (except for slight inaccuracy in the van Kampen and Langevin approximations for very small systems). Furthermore, the deterministic and stochastic trajectories are almost identical for large systems. As the system size increases, the final probability distribution of the stochastic system becomes increasingly sharply peaked at the deterministic steady-state. The two gene system with one stable steady-state confirms these observations. The bistable systems exhibit much more complex behavior. Rather than staying near the initial deterministic steady-state, the Gillespie and Langevin simulations (and exact Master equation, for the one gene system) deviate dramatically from both the (improperly applied) van Kampen expansion and the deterministic trajectory, at least for small Ω. The explanation is that each stochastic trajectory has a reasonable probability of escaping from the domain of attraction of one stable steady-state and being attracted to another. In the long run, the system settles to a bimodal steady-state distribution, in which both stable steady-states are represented proportional to their relative stability, and the mean is the weighted average of the two stable steady-states (as predicted by the alternative van Kampen theory for multiple stable steady-states). However, for large bistable systems, the escape time can far exceed the length of the simulation, since escape time scales exponentially with system size. Therefore, the stochastic trajectories remain close to the deterministic steady-state where they originated for the duration of the simulation.
CONCLUSIONS
We can draw several important conclusions from the theory of the earlier sections and the results of these studies. The first is that for large systems with a single stable steady-state, the deterministic model is sufficient for most practical purposes, since the probability distribution of the stochastic solution consists of a peak tracking the deterministic solution with variance inversely proportional to the system size. Multistable systems display more complex behavior since large fluctuations can cause trajectories to jump from the domain of attraction of one steady-state into another. Eventually, multistable systems settle to multimodal steady-state probability distributions peaked at the deterministic steady-states, with peak strengths proportional to the relative stability of the steady-states. Moderate-sized or randomly initialized multistable systems reach their final multimodal distribution relatively quickly, but since the escape time scales exponentially with the system size, the steady-states of large multistable systems may operate independently of each other practically indefinitely.
These observations are particularly relevant to the deterministic model-based inference method we presented in a earlier publication [38]. Since the deterministic model is very accurate for large systems with a single steady-state, the inference method applies directly to the mean of gene expression measurements for this type of system. For multistable systems, we can find the deterministic steady-state expression levels needed for the algorithm by locating the expression peaks rather than averaging the measurements. For the large system sizes typical in gene expression studies, these peaks will be extremely close to the deterministic steady-states. It is also worth specifically relating the effects of stochasticity to gene perturbations, which are central to our inference algorithm and many other applications. A gene regulatory system immediately following a perturbation like gene knockdown is not in steady-state, so the expression distribution will be in flux for some period of time before reaching a final stochastic steady-state consistent with the perturbation. This steady-state is, in general, a multimodal distribution different from the system’s natural steady-state distribution due to the perturbation. The peaks of the distribution correspond to deterministic stable steady-states consistent with the fixed expression levels of the perturbed genes. If there is only one such deterministic steady-state, the final distribution will be unimodal; if there are multiple, the system will eventually explore them all. Generally, a perturbed system does not start out very close to a particular deterministic steady-state, so it has a reasonable probability of initial attraction to any possible state and the distribution quickly reaches its multimodal steady-state (on a very short time scale relative to the escape time, as discussed in section Systems with multiple stable steady-states). To collect data for the inference algorithm, the experimenter should apply each perturbation, wait for the system to settle to its stochastic steady-state distribution, and measure the expression peaks, which correspond to deterministic perturbed steady-states.
Stochastic effects become more dominant for small systems, where fluctuations have greater impact relative to the system as a whole. In particular, stochastic modeling can be critical for genes with very low expression numbers. In these cases, exact but expensive methods like the explicit Master equation solution for one gene systems or the Gillespie algorithm may be attractive. Our results indicate that the Langevin simulation is also reasonably accurate, especially for moderate-sized systems, at much lower computational cost than the Gillespie algorithm. For systems with one stable steady-state, the van Kampen expansion is excellent for approximating the Master equation at any level of detail desired, and alternative van Kampen theory can yield insight into the asymptotic behavior of multistable systems. We hope our discussion of gene regulation modeling via the Master equation and our analysis and demonstration of approximation and simulation methods will help future researchers treat stochasticity in gene regulation more confidently and effectively.
APPENDICES
A Derivation of the Master equation
The following derivation, simplified and adapted from Chapters IV and X of van Kampen’s Stochastic Processes in Physics and Chemistry [29], is provided here for the reader’s convenience.
The Chapman-Kolmogorov equation
A Markov process is a stochastic process such that for any t1<t2<…<tn,
Hence a Markov process is completely determined by the functions P(y1, t1) and the transition probabilities . For example, for any t1<t2<t3,.
If we integrate this identity over y2 and divide both sides by P(y1, t1), we obtain the Chapman-Kolmogorov equation, which necessarily holds for any Markov process:
The Master equation
The Master equation (also known as the Kolmogorov Backward equation) is a differential form of the Chapman-Kolmogorov equation that is often more convenient and easier to relate to physical processes.
In order to derive it, we first assume for convenience that the process is time homogeneous, so we can write the transition probabilities as , i.e.,.
It can be shown (see van Kampen IV.6) that for small , has the formwhere W(y2|y1) is the y1 → y2 transition probability per unit time. The coefficient in front of the delta is the probability that no transition occurs during , so
Inserting (39) in place of in the Chapman-Kolmogorov equation (38) yields
We can rewrite this equation as (7) from the main text as follows:Or change the names of the variables:.
B van Kampen’s expansion of the Master equation
This calculation is adapted from van Kampen, Chapter X [29]. It is simplified from the original by assuming a birth-and-death process, and provided here for the reader’s convenience. The Master equation for a birth-and-death process is given by
Assume that the transition probabilities have the special form:and define
For birth-and-death processes, r only takes the values±1, so we have
Hence the Master equation becomes
As discussed in the main text, we make the Ansatzand define by.
The partial derivatives are given by
Therefore we can rewrite (40) as
Taylor expanding about allows us to approximate (40) in terms of the jump moments α1, α2:
A second Taylor expansion of about gives
We can cancel the terms on the right- and left-hand-sides by choosing
This is the final form of the expansion. It can be truncated at any level of detail desired and translated back into the original variables to yield various approximations of the Master equation. Note that it is only applicable for systems with a single stable steady-state [29].
C Mean first-passage time
For a birth-and-death process with states 0, 1, 2, …, we can derive a simple formula for the mean first-passage time. Suppose the system starts at state m and we want to find the mean first-passage time to state n. Let denote the expected time to reach state n starting from state i. Clearly = 0, and the quantity of interest is . Let gk, rk denote the birth and death rates of the chain, respectively, and tk denote the expected waiting time in state k before a transition. The waiting times and transition probabilities are related to the rates as follows:
Then we havewhere is the stationary distribution (11). Observe that if n,m are stable points and l with is an unstable point, then the stationary distribution will have peaks at n and m and a valley at l. The most important terms in the sum are therefore those with being the denominator, and the inner sum is then πm, which is O(1). Hence the escape rate is on the order of ; that is,
The escape time scales as since the stationary distribution is approximately a mixture of Gaussians with peaks of order Ω at the stable points, so is .
D Coefficients of the bistable two gene system
The coefficients of system (37) are given by
CONFLICT OF INTEREST
The authors Arwen Meister, Chao Du, Ye Henry Li and Wing Hung Wong declare that they have no conflict of interests.
Swain, P. S., Elowitz, M. B. and Siggia, E. D. (2002) Intrinsic and extrinsic contributions to stochasticity in gene expression. Proc. Natl. Acad. Sci. U.S.A., 99, 12795–12800. Available at: and
[2]
Paulsson, J. (2004) Summing up the noise in gene networks. Nature, 427, 415–418. Available at: and
[3]
Elowitz, M. B., Levine, A. J., Siggia, E. D. and Swain, P. S. (2002) Stochastic gene expression in a single cell. Sci. Signal., 297, 1183.
[4]
Ozbudak, E. M., Thattai, M., Kurtser, I., Grossman, A. D. and van Oudenaarden, A. (2002) Regulation of noise in the expression of a single gene. Nat. Genet., 31, 69–73. Available at: and
[5]
Blake, W. J.,Kaern, M., Cantor, C. R. and Collins, J. J. (2003) Noise in eukaryotic gene expression. Nature, 422, 633–637
[6]
Rao, C. V., Wolf, D. M. and Arkin, A. P. (2002) Control, exploitation and tolerance of intracellular noise. Nature, 420, 231–237. Available at: and
[7]
Kaern, M., Elston, T. C., Blake, W. J. and Collins, J. J. (2005) Stochasticity in gene expression: from theories to phenotypes. Nat. Rev. Genet., 6, 451–464. Available at: and
[8]
Raj, A. and van Oudenaarden, A. (2008) Nature, nurture, or chance: stochastic gene expression and its consequences. Cell, 135, 216–226. Available at: and
[9]
Munsky, B., Neuert, G. and van Oudenaarden, A. (2012) Using gene expression noise to understand gene regulation. Science, 336, 183–187. Available at: and
[10]
Hager, G. L., McNally, J. G. and Misteli, T. (2009) Transcription dynamics. Mol. Cell, 35, 741–753. Available at: and
[11]
Kittisopikul, M. and Süel, G. M. (2010) Biological role of noise encoded in a genetic network motif. Proc. Natl. Acad. Sci. U.S.A., 107, 13300–13305. Available at: and
[12]
Pedraza, J. M. and van Oudenaarden, A. (2005) Noise propagation in gene networks. Science, 307, 1965–1969. Available at: and
[13]
Kepler, T. B. and Elston, T. C. (2001) Stochasticity in transcriptional regulation: origins, consequences, and mathematical representations. Biophys. J., 81, 3116–3136
[14]
Ma, R., Wang, J., Hou, Z. and Liu, H. (2012) Small-number effects: a third stable state in a genetic bistable toggle switch. Phys. Rev. Lett., 109, 248107. Available at:
[15]
Elowitz, M. B. and Leibler, S. (2000) A synthetic oscillatory network of transcriptional regulators. Nature, 403, 335–338. Available at: and
[16]
Gardner, T., Cantor, C. and Collins, J. (2000) Construction of a genetic toggle switch in Escherichia coli. Nature, 403.
[17]
Hasty, J., McMillen, D. and Collins, J. J. (2002) Engineered gene circuits. Nature, 420, 224–230. Available at: and
[18]
Ozbudak, E. M., Thattai, M., Lim, H. N., Shraiman, B. I. and Van Oudenaarden, A. (2004) Multistability in the lactose utilization network of Escherichia coli. Nature, 427, 737–740. Available at: and
[19]
Frigola, D., Casanellas, L., Sancho, J. M. and Ibañes, M. (2012) Asymmetric stochastic switching driven by intrinsic molecular noise. PLoS ONE, 7, e31407. Available at: and
[20]
Novak, B. and Tyson, J. J. (1997) Modeling the control of DNA replication in fission yeast. Proc. Natl. Acad. Sci. U.S.A., 94, 9147–9152. Available at: and
[21]
Arkin, A., Ross, J. and McAdams, H. H. (1998) Stochastic kinetic analysis of developmental pathway bifurcation in phage lambda-infected Escherichia coli cells. Genetics, 149, 1633–1648. Available at:
[22]
Thattai, M. and van Oudenaarden, A. (2001) Intrinsic noise in gene regulatory networks. Proc. Natl. Acad. Sci. U.S.A., 98, 8614–8619. Available at: and
[23]
Tao, Y. (2004) Intrinsic noise, gene regulation and steady-state statistics in a two gene network. J. Theor. Biol., 231, 563–568. Available at: and
[24]
Rosenfeld, N., Young, J. W., Alon, U., Swain, P. S. and Elowitz, M. B. (2005) Gene regulation at the single-cell level. Sci. Signal., 307, 1962.
[25]
Krishnamurthy, S., Smith, E., Krakauer, D., Fontana, W. (2007) The stochastic behavior of a molecular switching circuit with feedback. Biol. Direct, 2, 1–17. Available at:
[26]
Munsky, B., Trinh, B. and Khammash, M. (2009) Listening to the noise: random fluctuations reveal gene network parameters. Mol. Syst. Biol., 5, 318. Available at: and
[27]
Dunlop, M. J., Cox, R. S. 3rd, Levine, J. H., Murray, R. M. and Elowitz, M. B. (2008) Regulatory activity revealed by dynamic correlations in gene expression noise. Nat. Genet., 40, 1493–1498. Available at: and
[28]
Stewart-Ornstein, J., Weissman, J. S. and El-Samad, H. (2012) Cellular noise regulons underlie fluctuations in Saccharomyces cerevisiae. Mol. Cell, 45, 483–493. Available at:
[29]
Van Kampen, N. G.Stochastic Processes in Physics and Chemistry. (3rd, Ed). North Holland, 2007.
[30]
Peles, S., Munsky, B. and Khammash, M. (2006) Reduction and solution of the chemical Master equation using time scale separation and finite state projection. J. Chem. Phys., 125, 204104. Available at: and
[31]
Hegland, M., Burden, C., Santoso, L., MacNamara, S. and Booth, H. (2007) A solver for the stochastic master equation applied to gene regulatory networks. J. Comput. Appl. Math., 205, 708–724. Available at:
[32]
Macnamara, S., Bersani, A. M., Burrage, K. and Sidje, R. B. (2008) Stochastic chemical kinetics and the total quasi-steady-state assumption: application to the stochastic simulation algorithm and chemical master equation. J. Chem. Phys., 129, 095105. Available at: and
[33]
Smadbeck, P. and Kaznessis, Y. (2012) Stochastic model reduction using a modified Hill-type kinetic rate law. J. Chem. Phys., 137, 234109. Available at:
[34]
Waldherr, S., Wu, J. and Allgöwer, F. (2010) Bridging time scales in cellular decision making with a stochastic bistable switch. BMC Syst. Biol., 4, 108. Available at: and
[35]
Liang, J. and Qian, H. (2010) Computational cellular dynamics based on the chemical master equation: A challenge for understanding complexity. Journal of Computer Science and Technology, 25, 154–168. Available at:
[36]
Gutierrez, P. S., Monteoliva, D. and Diambra, L. (2012) Cooperative binding of transcription factors promotes bimodal gene expression response. PLoS ONE, 7, e44812. Available at: and
[37]
Khanin, R. and Higham, D. J. (2008) Chemical Master Equation and Langevin regimes for a gene transcription model. Theor. Comput. Sci., 408, 31–40
[38]
Meister, A., Li, Y. H., Choi, B. and Wong, W. H. (2013) Learning a nonlinear dynamical system model of gene regulation: A perturbed steady-state approach. Ann. Appl. Stat., 7, 1311–1333. Available at:
[39]
Faith, J. J., Hayete, B., Thaden, J. T., Mogno, I., Wierzbowski, J., Cottarel, G., Kasif, S., Collins, J. J. and Gardner, T. S. (2007) Large-scale mapping and validation of Escherichia coli transcriptional regulation from a compendium of expression profiles. PLoS Biol., 5, e8. Available at: and
[40]
Bansal, M., Belcastro, V., Ambesi-Impiombato, A. and di Bernardo, D. (2007) How to infer gene networks from expression profiles. Mol. Syst. Biol., 3, 78. Available at:
[41]
Gardner, T. S., di Bernardo, D., Lorenz, D. and Collins, J. J. (2003) Inferring genetic networks and identifying compound mode of action via expression profiling. Science, 301, 102–105. Available at: and
[42]
di Bernardo, D., Thompson, M. J., Gardner, T. S., Chobot, S. E., Eastwood, E. L., Wojtovich, A. P., Elliott, S. J., Schaus, S. E. and Collins, J. J. (2005) Chemogenomic profiling on a genome-wide scale using reverse-engineered gene networks. Nat. Biotechnol., 23, 377–383. Available at: and
[43]
Michaelis, L. and Menten, M. L. (1913) Die kinetik der invertinwirkung. Biochem. Z., 49, 333–369.
[44]
Hill, A. V. (1913) The combinations of haemoglobin with oxygen and with carbon monoxide. Biochem. J., 7, 471–480. Available at:
[45]
Ackers, G. K., Johnson, A. D. and Shea, M. A. (1982) Quantitative model for gene regulation by lambda phage repressor. Proc. Natl. Acad. Sci. U.S.A., 79, 1129–1133Available at:
[46]
Shea, M. A. and Ackers, G. K. (1985) The OR control system of bacteriophage lambda: A physicalchemical model for gene regulation. J. Mol. Biol., 181, 211–230. Available at:
[47]
Bintu, L., Buchler, N. E., Garcia, H. G., Gerland, U., Hwa, T., Kondev, J. and Phillips, R. (2005) Transcriptional regulation by the numbers: models. Curr. Opin. Genet. Dev., 15, 116–124. Available at: and
[48]
Bintu, L., Buchler, N. E., Garcia, H. G., Gerland, U., Hwa, T., Kondev, J., Kuhlman, T. and Phillips, R. (2005) Transcriptional regulation by the numbers: applications. Curr. Opin. Genet. Dev., 15, 125–135. Available at: and
[49]
Rao, C. V. and Arkin, A. P.(2003) Stochastic chemical kinetics and the quasi-steady-state assumption: Application to the Gillespie algorithm. J. Chem. Phys., 118, 4999–5010
[50]
Walker, J. A.Dynamical Systems and Evolution Equations. New York: Plenum Press, 1939.
[51]
Kubo, R., Matsuo, K. and Kitahara, K. (1973) Fluctuation and relaxation of macrovariables. J. Stat. Phys., 9, 51–96. Available at:
[52]
Gillespie, D. T. (1977) Exact stochastic simulation of coupled chemical reactions. J. Phys. Chem., 81, 2340–2361. Available at:
[53]
Gillespie, D. T. (2000) The chemical Langevin equation. J. Chem. Phys., 113, 297. Available at:
[54]
Komorowski, M., Finkenstädt, B., Harper, C. V. and Rand, D. A. (2009) Bayesian inference of biochemical kinetic parameters using the linear noise approximation. BMC Bioinformatics, 10, 343. Available at: and
[55]
Choi, B.Learning networks in biological systems, Ph.D. thesis, Department of Applied Physics, Stanford University, Stanford, 2012.
[56]
Planck, M. and Verband Deutscher Physikalischer Gesellschaften. Physikalische abhandlungen und vorträge. 1958.
[57]
Lord Rayleigh. (1891) Liii. Dynamical problems in illustration of the theory of gases. The London, Edinburgh, and Dublin Philosophical Magazine and Journal of Science, 32, 424–445.
[58]
Einstein, A.. (1906) Eine neue bestimmung der molek uldimensionen. Annalen der Physik, 324, 289–306
[59]
Von Smoluchowski, M. (1906). Zur kinetischen theorie der brownschen molekularbewegung und der suspensionen. Annalen der physik, 326, 756–780.
[60]
Van Kampen, N. G.Fluctuations in Nonlinear Systems. Fluctuation Phenomena in Solids, New York: Academic Press, 1965.
[61]
Bar-Haim, A. and Klafter, J. (1998) Geometric versus energetic competition in light harvesting by dendrimers. J. Phys. Chem. B, 102, 1662–1664. Available at:
[62]
Chickarmane, V. and Peterson, C. (2008) A computational model for understanding stem cell, trophectoderm and endoderm lineage determination. PLoS ONE, 3, e3478. Available at: and
[63]
Zavlanos, M. M., Julius, A. A., Boyd, S. P. and Pappas, G. J. (2011) Inferring stable genetic networks from steady-state data. Automatica, 47, 1113–1122. Available at:
RIGHTS & PERMISSIONS
Higher Education Press and Springer-Verlag Berlin Heidelberg
AI Summary 中Eng×
Note: Please be aware that the following content is generated by artificial intelligence. This website is not responsible for any consequences arising from the use of this content.