Rigorous bounds on the stationary distributions of the chemical master equation via mathematical programming
Abstract
The stochastic dynamics of networks of biochemical reactions in living cells are typically modelled using chemical master equations (CMEs). The stationary distributions of CMEs are seldom solvable analytically, and few methods exist that yield numerical estimates with computable error bounds. Here, we present two such methods based on mathematical programming techniques. First, we use semidefinite programming to obtain increasingly tighter upper and lower bounds on the moments of the stationary distribution for networks with rational propensities. Second, we employ linear programming to compute convergent upper and lower bounds on the stationary distributions themselves. The bounds obtained provide a computational test for the uniqueness of the stationary distribution. In the unique case, the bounds collectively form an approximation of the stationary distribution accompanied with a computable error bound. In the nonunique case, we explain how to adapt our approach so that it yields approximations of the ergodic distributions, also accompanied with computable error bounds. We illustrate our methodology through two biological examples: Schlögl’s model and a toggle switch model.
theoremsection
I Introduction
Celltocell variability is pervasive in cell biology and biotechnology. This stochasticity stems from the fact that biochemical reactions inside living cells often involve biomolecules present in only few copies per cells Elowitz et al. (2002); Taniguchi et al. (2010); Uphoff et al. (2016). Under wellmixed conditions the chemical master equation (CME) is used to model the stochastic dynamics of interacting biochemical components in the cell. Prominent examples include gene regulatory and signalling networks, some of which are involved in cellular adaptation and cell fate decisions Arkin, Ross, and McAdams (1998); Dandach and Khammash (2010); Thomas, Popović, and Grima (2014). As increasingly accurate experimental techniques become available, it is of paramount importance to develop accurate and reliable solution methods to stochastic models. By enabling the reliable inference of underlying model parametersMunsky, Fox, and Neuert (2015); Fröhlich et al. (2016), these methodologies can facilitate the identification of molecular mechanisms Neuert et al. (2013) and the design of synthetic circuits in living cells OyarzuÌn, Lugagne, and Stan (2014); Zechner et al. (2016).
Significant efforts have been made to investigate the stationary distributions of CMEs because they determine the long time behaviour of the underlying continuoustime Markov chainMeyn and Tweedie (1993). While exact MonteCarlo methods have been developed to sample from stationary distributions Hemberg and Barahona (2007), analytical solutions, however, are known only in a few special cases. These include single species onestep processesGardiner (2009), certain classes of unimolecular reaction networks Jahnke and Huisinga (2007); LópezCaamal and Tatiana (2014), networks obeying the conditions of detailed balance Van Kampen (1976a); Danos and Oury (2013), deficiency zero networks Anderson, Craciun, and Kurtz (2010); Cappelletti and Wiuf (2016), and small model systems Peccoud and Ycart (1995); Bokes et al. (2012). Despite these efforts, the CME is generally considered intractable because, aside of systems with finite state space, it consists of infinitely many coupled equations.
One approach to circumvent this problem is to compute moments of the stochastic process. However, this approach circumvents the problem only for networks of unimolecular reactions. In all other cases the equations for the first few moments involve unknown higher moments and thus these equations also constitute an infinite number of coupled equations that cannot be solved analytically. Moment approximations, some of which require assumptions about the apriori unknown distribution solution, are commonly employed to overcome this issue Van Kampen (1976b); Engblom (2006); Lakatos et al. (2015); Schnoerr, Sanguinetti, and Grima (2015). Few of these methods, however, admit quantified error estimates on its solution Engblom and Pender (2014); Kurtz (1976). Independently, mathematical programming has been employed to compute bounds on the moments of Markov processes in various contexts. All of these consider a finite set of moment equations supplemented by moment inequalities whose solutions can be bounded in terms of a linear program (LP) Schwerer (1996); Helmes and Stockbridge (2003) or, under stronger conditions, using a semidefinite program (SDP) HernándezLerma and Lasserre (2003); Kuntz et al. (2016).
Another approach is to approximate the distribution solution, either by expanding the CME using a finite number of basis functions Engblom (2009); Thomas and Grima (2015) or by projecting it onto finite state spaces Munsky and Khammash (2006); Kazeev et al. (2014). The finite state projection algorithm Munsky and Khammash (2006), for instance, solves the CME on a finite subset with an absorbing boundary. A particular advantage of this approach over other methods is that the accuracy of its solution is controlled. Specifically, the solution of the finite state projection algorithm is a lower bound on the timedependent solution of the CME, while the probability mass in the absorbing state yields a bound on the total error of its solution. However, at long times, all of the probability mass becomes trapped into the absorbing state, and for this reason the method is not wellsuited to study the CME’s longterm behaviour and its stationary distributions.
In this paper we present two mathematical programming approaches, one that yields upper and lower bounds on the moments of any stationary distribution, the other that provides bounds on the probability that the stationary distribution gives to a given subset of the state space. We build on previous methodologies developed to obtain moment bounds of polynomial diffusions using semidefinite programmingKuntz et al. (2016) and adapt these to discrete reaction networks with polynomial or rational propensities. In particular, we consider the first few moment equations, which are generally not closed (that is, underdetermined), and constrain their solution space using positive semidefinite inequalities that are satisfied by the moments of any admissible probability distribution.
The second approach yields convergent upper and lower bounds on the probability that the stationary distributions give to a given subset of the state space by solving LPs. By repeatedly applying the approach, we produce bounds on the complete stationary distributions and their marginals. The method considers finite but underdetermined systems of stationary equations supplemented with inequalities involving a moment bound computed using the first approach. Additionally, we quantify the error in our approach, by providing a computable bound on the total variation (or, equivalently, the ) distance between the vector of bounds and the exact distribution solution (Lemma 3).
The paper is organised as follows. In Sec. II we review the basic equations for stationary moments and distributions. We then discuss the problem of bounding the stationary moments in Sec. III. We introduce our approach by demonstrating how analytical bounds for the first two moments can be obtained for a singlespecies example and then develop its generalisation to multispecies networks with rational propensities. In Sec. IV, we show how to bound event probabilities, Theorem 2. We first demonstrate the approach for singlespecies onestep processes for which distribution bounds can be obtained analytically. Then, we explain how to generalise the approach to bound event probabilities, distributions and marginals of arbitrary networks. In Sec. V, we exemplify the usefulness of our methods by computing the stationary distribution and marginals of a toggle switch. We conclude with a discussion of our results in Sec. VI.
Ii Stationary moments and distributions
We consider a generic reaction network involving species that interact via reactions of the form
(1) 
where is the reaction index running from to , is the propensity of the reaction, and are the stoichiometric coefficients. Denoting the number of molecules at time with , the probability of observing the process in the state at time satisfies the CME Gillespie (1977); Anderson (1991)
with initial conditions for each in . The rate matrix is defined as
for all vectors and is the stoichiometric vector that denotes the netchange of molecule numbers incurred in the reaction.
A stationary distribution of the network is a distribution on such that if the has law at time zero, then is a stationary process. Consequently, is a fixed point of the CME, e.g. see Theorem 4.3 in Chapter 5.4 of Ref. Anderson (1991),
(2) 
Since the network may have more than one stationary distribution, we denote the set of solutions by . Assuming that the network is positive recurrent, the stationary distributions determine the long term behaviour of the chainMeyn and Tweedie (1993). Verifying whether or not a network is positive recurrent consists of finding an appropriate Lyapunov function Meyn and Tweedie (), which in certain cases can be achieved using mathematical programming Gupta and Khammash (2015).
Alternatively, we can consider expectations: from Eq. (2), it follows that, if a realvalued function on satisfies some integrability conditions (for example, Condition (29) in Appendix B), then
(3) 
where we are using vector notation , see the proof of Theorem 1 in Appendix B. For networks with polynomial propensities , the equations for the moments up to moment order are obtained by letting in Eq. (3) for all . However, these equations are closed only for networks with propensities that depend at most linearly on the molecular states , such as networks composed solely of zeroorder and unimolecular reactions with massaction propensities. For networks composed bimolecular or higherorder reactions, the equations of order depend on moments of order higher than , and hence these equations are not closed, that is, they form a system of underdetermined linear equation.
Iii Bounding the stationary moments
We here develop rigorous bounds on the stationary moments of the CME. We introduce our approach for a model system and obtain analytically bounds on the first two moments, and then present the general method that relies on solving SDPs.
iii.1 Moment bounds for Schlögl’s model
We consider an autocatalytic network proposed by SchlöglSchlögl (1972), that models a chemical phase transition involving a single species ,
(4) 
The propensities follow massaction kinetics and are given by
(5) 
If the reactions are modelled stochastically, the network has a unique stationary distribution with either uni or bimodal distributions depending on the parameters Ebeling and SchimanskyGeier (1979), and all of its moments are finite for any set of positive reaction rate constants, see Appendix A.
At stationarity, the first moment equation reads
(6) 
where , , , and denote the moments of , and , , , . Instead of applying a moment closure, which assumes a particular form of the distribution, we will consider inequalities satisfied by the moments of any possible distribution.
Specifically, for any distribution with moments and , it holds that the following two matrices are positive semidefiniteLasserre (2009) (in short )
(7) 
By Sylvester’s criterion the above is equivalent to
(8) 
The first five inequalities state that the moments are nonnegative and that so is the distribution’s variance. The last inequality gives a condition involving moments .
The set of vectors that satisfy Eq. (6), the inequalities (8), and the normalising condition , constrains the values the moments of can take in the sense that . To give an explicit characterisation of this set, we restrict ourselves to the case (the other case can be treated similarly). A vector satisfies the last inequality, , and the moment equation (6) if and only if
The set of pairs of nonnegative numbers that satisfy the above inequality and can be reduced to
with
In summary, the feasible set of moment values is given by
(9) 
The projection of this set onto the  plane is shown in Fig. 1a.
Because the moments of are contained in , which is clearly bounded, we have that
are respective upper and lower bounds on for . To determine these bounds, we note that the functions and are monotonically increasing, such that the maximum values of and are achieved at the furthermost northwestern intersection point of the curves and (Fig. 1a dots). In other words, there is a unique maximum with and is the rightmost root of the quartic polynomial . It hence follows that
which provides us with lower (albeit uninformative) and upper bounds on the first and second moments.
iii.2 Moment bounds for multispecies reaction networks
We generalise the above approach to obtain moment bounds of multispecies networks of the form (1). Since in many applications reaction networks with rational propensities are prevalent, it is desirable to consider moment equations that allow for this type of propensity. To this end, we introduce rational moments of the form
(10) 
with polynomial denominator (with for all ). Here, we employ multiindex notation in which multiindices are denoted by Greek letters , monomials are written as , and . More generally, we denote by the rational moments of order those for which in Eq. (10).
The key observation is that by letting in Eq. (3), one obtains linear equations involving the rational moments with appropriately chosen denominator . Moreover, the power moments can be expressed as linear combinations of these moments
where the denote the polynomial coefficients such that . As we show in the following
(11) 
where the vector represents the polynomial coefficients of and is the set of admissible values for the rational moments of order up to order with denominator , which satisfy both the rational moment equations and certain positive semidefinite inequalities. Since the set is is defined by linear equalities and semidefinite inequalities, both the left and right hand side of (11) are SDPs, a tractable class of convex optimisation problems that can be efficiently solved computationally using standard solvers. The details of this approach are developed in the following.
Rational moment equations
To derive the moment equations for networks with rational propensities, we rewrite the propensities as , where and are polynomials of degrees and . For example, for , where (resp. ) are nonnegative (resp. positive) polynomials, a convenient choice is the common denominator .
Choosing now in Eq. (3) and rearranging, we find that the rational moments satisfy the linear equations
where
and are the coefficients in the polynomials and is its maximum degree. Note that for and that the second sum is taken over all such that for all and such that there is at least one for which . It is clear that for polynomial propensities, i.e. , we recover the equations for the power moments.
The equations obtained by plugging into Eq. (3) for all involve the rational moments of order . We can write these equations compactly in vector notation as
where for all , and for all . For instance, in the case of Schlögl’s model, is just a shorthand of Eq. (6). If the above equations form a underdetermined system equations. In the following, we constrain the set of solutions with the use of socalled moment inequalities.
Semidefinite inequalities for the moments
The rational moments of any probability distribution on the nonnegative integers satisfy certain semidefinite inequalities Lasserre (2009); Blekherman, Parrilo, and Thomas (2013). In particular, we define the moment matrices
for , where , and denotes the unit vector. For instance, and are the matrices we employed in (7) for Schlögl’s model. If is the vector of rational moments of some distribution on , that is
(12) 
then these matrices are positive semidefinite (or in short , ). This follows from the fact that for any polynomial of degree with polynomial coefficients , it holds that
since . Similarly, we have
Additionally, since any probability distribution must have mass of one
Bounding moments using semidefinite programs
In summary, the set of vectors satisfying both the rational moment equations and the positive semidefinite inequalities is a spectrahedron, a convex set defined by matrix inequalities, which reads
(13) 
where denotes the cardinality of . Denoting the set of stationary distributions with rational moments up to order by , that is the set of such that all the averages in Eq. (10) are finite for all , we obtain the following straightforward theorem:
Theorem 1.
For any in , the vector containing the rational moments of order or less of (defined in Eq. (12)) belongs to .
A proof of the theorem can be found in Appendix B. This result has the following useful consequences that for any polynomial of degree and
where
(14) 
In particular, choosing with , it follows that the power moment is contained in the interval and thus we obtain the bounds of Eq. (11). Computing either or consists of solving the corresponding SDP given in Eq. (14).
Our result does not allow us to conclude on the convergence of these bounds with increasing number of moment equations and inequalities. However, it follows by construction that the sequence of lower bounds is nondecreasing while the sequence of upper bounds is nonincreasing with increasing number of moments equalities and inequalities, i.e. it holds that
In practice, however, we find that the bounds often converge to the true moments (or, in the case of multiple stationary distributions, the maximum/minimum moments over the set of stationary distributions). For example, for Schlögl’s model, in Fig. 1b we can see the projection of onto the  plane collapsing onto the point as is increased.
To investigate this convergence further, in Fig. 2a we show the upper and lower bounds on the first three moments as a function of for Schlögl’s model with a different parameter set than that in Fig. 1b. The insets show that the difference between upper and lower bounds decreases with indicating that either bound converges to the true moment. Similarly, using these bounds we obtain converging bounds on several commonly used summary statistics as shown in Fig. 2b.
iii.3 Implementation details and numerical considerations
To setup the SDPs we used the modelling package YALMIPLöfberg (2004), and to solve them we used the solver SDPAGMPNakata (2010) in conjunction with the interface mpYALMIPFantuzzi (2016).
Before proceeding onto bounding the distributions, we should mention that SDPs associated with moment problems, such as those we have discussed in this section, are often ill conditioned. The reasons why are as of yet unclear. As we discuss in Section 5 ofKuntz et al. (2016), we believe they revolve around the fact that the moments of distributions change order of magnitude very quickly which leads to numerical instability in the solvers. This, and the success encountered inFantuzzi et al. (2016) when using SDPGMP to tackle similar SDPs, motivated us to employ the multiprecision solver SDPGMP instead of a standard doubleprecision SDP solver. Alternatively, employing rational moments, even when the network has polynomial propensities, can ameliorate this problem.
Iv Bounding the stationary distributions
Here we explain how to bound the probability that the stationary distributions give to any given subset of , and, in the case of a unique stationary distribution, how to iterate our approach to yield informative bounds on the complete distribution and its marginals. We begin by illustrating our approach with analytically tractable onestep processes. We then present our main result which enables us to use linear programming to bound the probability that the distributions give to any given subset of . The case of a unique stationary distribution is considered next. Lastly, we consider the nonunique case and we show how our method provides a computational test for uniqueness of the stationary distribution and how to adapt it to yield informative bounds in the nonunique case.
iv.1 Semianalytical bounds for onestep processes
For simplicity, we first focus on the case of a onestep process that involves a single species whose molecule count changes in the reactions by . In this case, the stationary CME (2) can be solved recursivelyGardiner (2009) to obtain
(15) 
where are the propensities of the forward and backward jumps and we have defined the function . The probability is obtained by invoking the normalising condition
(16) 
Obtaining analytically is, however, possible only in simple cases.
Upper bounds
In cases for which computing analytically is not possible, it is numerically straightforward to truncate the sum in Eq. (16) considering only states for which . While this procedure may be commonly used in practice, it does only provide an upper bound on the probability distribution. Specifically, since , for all , we have that
(17) 
where the right hand side, , is an upper on for every . Clearly, tends to as .
Lower bounds
Less obvious, however, is how to obtain lower bounds on . To this end, we consider the mass of the tail of the distribution that can be bounded using Markov’s inequality as follows
However, explicit expressions for the involved moment are available only for linear propensities. To this end, we employ an upper bound that is readily computable using the method discussed in Sec. III. This provides us with a computable bound on the mass of the tail
Since
and using Eqs. (16,17) we obtain the lower bound
Error bounds
Generally, we would like to determine how close the vector of upper bounds is to the vector composed of the first few values of the stationary distribution for a given . To this end, we consider their distance (or, equivalently, twice the total variation distance)
(18) 
where we have used the fact that by definition (17).
Similarly, the distance between the vector of lower bounds and is bounded by
(19) 
Alternatively, by combining the upper bound on the lower bound we can quantify the remaining uncertainty in our knowledge of the individual probabilities of the stationary distribution:
(20) 
Since tends to as , Eqs. (1820) imply that the vectors of bounds converge both statewise and in norm to as .
Schlögl’s model
As an application of these distribution bounds we consider Schlögl’s model introduced in Sec. III.1. The first and third propensities in Eq. (III.1) can be lumped together into , while the second and fourth reaction can be lumped together into . The model enjoys the particular advantage that it can be solved exactly and it thus provides an ideal test case for the derived bounds. Using the normalising condition and Eq. (16), can be expressed in terms of a generalised hypergeometric function
(21) 
with and . To compute the bounds, we require a bound on the mass of the tail. To this end, we compute upper bounds on the and moments using the first moment equations and the corresponding matrix inequalities involving the first moments. We considered two parameter sets, one for which the stationary distribution is unimodal, and another for which it is bimodal. In Figs. 3a and 3c, we compare the bounds we obtained with the exact distribution computed using Eq. (15) and (21).
In practice, we need to choose both the truncation parameter and which moment bound to employ when computing the vectors and . As Lemmas 3 and 4 in the following section show, we can always use the computed vectors and the moment bound to a posteriori bound the distance between the vector of bounds and the stationary distribution, which we refer to as the “total error”. If the error bound is unsatisfactory, then we can increase and recompute these bounds. Choosing a good and moment bound combination a priori is not so straightforward. A basic guideline follows from Jensen’s inequality
In words, to achieve an error of less than , must always be greater than the mean number of molecules. To investigate this matter further, we studied in Figs. 3b and 3d how the error and the bound on the error depend on and on which moment we employed for Schlögl’s model. The error (solid lines) was computed using the analytical expression available for the stationary distribution of this simple network, Eqs. (15) and (21). For more general networks where no such expression is known, the error is not computable. However, the error bounds (dashed lines) are still readily available as their computation only involves the vectors , , and the bound on the moment. From the definition of the error bound , it is clear that the error bound converges to zero far quicker when using the moment (blue dashed line) rather than the moment (red dashed line), in good agreement with the actual error (solid blue and red). Hence for large truncations it is preferable to use a higher moment to compute the distribution bounds. However, for moderate it is interesting to note that lower moments can also outperform higher ones.
iv.2 Linear programming approach to multispecies networks
The approach of the previous section relied on expression (2), only available for singlespecies onestep process, that relates the whole stationary distribution to one single entry . To generalise our approach to multispecies networks, we deal directly the stationary equations. Specifically, we consider the equations for such that
(22) 
where the ’s are positive weights. Introducing such weights is important in applications because it allows choosing truncations based on the abundance of different species. There are many such equations involving the many states contained in
(23) 
Let denote the space of all realvalued vectors indexed by and (resp. ) be the projection operator that takes a vector and returns (resp. ). We can write these first few equations as , where is a dimensional matrix. These equations are underdetermined since, except for finite state spaces, . We overcome this issue in a similar manner to how we overcame the issue for the moments: we supplement these equations with inequalities and then compute the bounds by solving LPs (a subclass of SDPs). To this end, we require an upper bound on a moment, which via Markov’s inequality also equips us with an upper bound on the mass of the tail.
Bounding event probabilities
Let denote the subset of stationary distributions such that , where is a constant. A bound on the mass of the tail , i.e. the mass in the complement of , is obtained as follows
(24) 
for any and the summation in the second sum is over all states for which . The second inequality is an instance of Markov’s inequality. Consider now the convex polytope defined by
For any (resp. ) let (resp. ) to denote the indicator vector
for all (resp. ). For any , the probability of an event is then given by , and can be bounded from above and below as follows:
Theorem 2.
Pick any , let and

For any , we have that
for all .

Suppose that is nonempty and let
Both sequences and converge and
A proof of the above theorem is given in Appendix C. Note that by definition and thus is indeed a lower bound on . The number will be an upper bound on if . Computing either or consists of solving a LP, that can efficiently done using any one of the solvers are readily available online. The practical implications of the theorem are discussed in the following sections.
Unique case: Bounding probability distributions
If there exists a unique stationary distribution , we are often interested in computing the complete distribution instead of the probability that assigns to a single subset of the state space . Suppose that and , then Theorem 2 implies that
statewise for each . Additionally, we have that both and converge to as .
Denoting by the vector of lower bounds for all states and similarly for , it follows that
In words, by computing the lower bounds in and the upper bounds in we can bound the distance between and either vector of bounds. We refer to this distance as the “total error” of the vector of bounds and we refer to any bound on the total error, such as that appearing in the above, as an “error bound”. Additionally, we have the following two other error bounds
Lemma 3.
For any , we have that
(25a)  
(25b) 
That is, we can also bound the total error of one vector of bounds without computing the other vector of bounds. A proof of the above lemma can be found in Appendix C.
Unique case: Bounding marginal distributions
For highdimensional networks we are often interested in marginal distributions rather than the full stationary distribution. For simplicity, we assume that we are interested only in the first species, so that we are marginalising over the last species. We decompose the vectors of molecule numbers into where and such that the marginal distribution on is as follows
To proceed we introduce the slice of in Eq. (IV.2),
and let in Theorem 2. Thus, for each