Variance reduction for steadystate simulation and sensitivity analysis of stochastic chemical systems
Abstract
We address the problem of estimating steadystate quantities associated to systems of stochastic chemical kinetics. In most cases of interest these systems are analytically intractable, and one has to resort to computational methods to estimate stationary values of cost functions. In this work we consider a previously introduced variance reduction method and present an algorithm for its application in the context of stochastic chemical kinetics. Using two numerical examples, we test the efficiency of the method for the calculation of steadystate parametric sensitivities and evaluate its performance in comparison to other estimation methods.
1 Introduction
Knowledge of steadystate quantities related to an ergodic stochastic chemical system can provide useful insights into its properties. Moreover, steadystate values of cost functions are often easier to estimate with good accuracy compared to stationary distributions. When the system propensities are affine in the state, mean values of polynomial functions of the system state can be computed analytically, as the system of moments is closed. However, when nonpolynomial functions are considered, or the system propensities are not affine, analytic calculations are no longer possible and the only solution left is simulation.
While moment closure methods [20] can be used to provide good approximations to system moments over a finite time interval, they commonly tend to diverge from the true solution over time, thus resulting in biased steadystate values. The solution presented in Ref. [18] works only when polynomial functions of the state are considered, and generalization to arbitrary functions is still very difficult. The Finite State Projection algorithm [13] can be alternatively employed to provide moment estimates with guaranteed accuracy bounds, however the number of states required to attain a certain accuracy makes the method applicable to small problems.
On the other hand, stochastic simulation [5] can always provide estimates for the stationary mean of any function of the state, however these estimates are inevitably noisy. Bruteforce noise reduction can only be achieved at an increased computational cost, either by simulating longer trajectories or by running many trajectories in parallel.
Another possibility for reducing the noise in the estimated quantities is the application of a variance reduction technique [3], provided the added computational cost of the reduced variance estimator is significantly smaller than the gain in computer time. In this work we present the application of such a variance reduction technique [9, 8] to systems of stochastic chemical kinetics. The idea is based on socalled shadow functions and originated in the queueing systems simulation literature, a field where the range of analytically tractable systems in that field is much larger. We demonstrate how the same idea can be applied to steadystate simulation of stochastic chemical systems. We further test the capabilities of the reducedvariance estimators by performing parametric sensitivity calculations for two systems governed by nonlinear propensity functions.
The paper is organized as follows: in Sections II and III we define the steadystate estimation problem and define the naïve and shadow function estimators. In Sections IV and V we present one possible implementation of the variance reduction technique to stochastic chemical kinetics and its applicability to steadystate sensitivity analysis. The numerical examples in Section VI serve to demonstrate the effectiveness of the shadow function method in practice and assess its computational cost in comparison to naïve estimation. The conclusions of our study and some future research directions are finally summarized in Section VII.
2 Problem statement
2.1 Setup
Assume an irreducible positive recurrent Markov chain on a countable space . In the case of stochastic chemical kinetics, , where is the number of chemical species in the system. The chain moves according to a finite set of available transitions , with a corresponding set of propensity functions . The infinitesimal generator of is the operator satisfying
(1) 
for all such that . The discreteness of allows us to enumerate its elements and think of as an infinite matrix . Similarly, any function on can be thought of as an infinite column vector, and distributions on can be defined as infinite row vectors.
2.2 Steadystate estimators
Let be a integrable cost function associated with . The ergodic theorem for Markov chains [14] ascertains that for any initial condition
where is the unique invariant distribution of the system and the steadystate mean value of . Since the analytic calculation of is possible only in very special cases, its estimation from simulation is usually the only possibility. The most straightforward estimator of is
(2) 
which is also strongly consistent [3].
Under some further general conditions on , and , we also know that
(3) 
as , where denotes weak convergence, is the standard normal distribution and is called the time average variance constant (TAVC) for [3].
The TAVC can be expressed in terms of the integrated autocovariance function of the process , where , according to the formula [3]:
(4) 
An alternative expression for can be derived from the functional Central Limit Theorem for continuoustime Markov chains [4]:
(5) 
where is a solution to the socalled Poisson’s equation[2] (Note that solutions to the Poisson equation are unique up to an additive constant, i.e. if is a solution, then is also a solution [2]):
(6) 
A more general class of estimators for has the form
(7) 
where is chosen such that almost surely for all [9]. The function offers an extra degree of freedom in the design of the estimator, which can be exploited to achieve variance reduction. In other words, can be chosen such that the TAVC of , denoted by , is smaller than . The obvious choice is of course intractable, however it suggests that a function with a zero steadystate mean that is approximately equal to could also achieve variance reduction. Such functions would result in a process that behaves almost antithetically from , thus making the variance of smaller than that of alone. In the steadystate simulation literature, a function that satisfies is called a shadow function [8].
The problem then becomes the selection of an appropriate shadow function , so that , with . From (3) we see that a reduction of variance by a factor implies that the variance of is equal to the variance of . Assuming that the computational cost of both estimators is dominated by the cost of simulating the process , can be used as an indicator of the efficiency of relative to .
The basic idea of the shadow function method of Ref. [8], outlined in the next section, is to obtain such an by using analytical information from a second Markov chain that approximates the original one and is mathematically tractable. A second alternative solution of more general applicability will be described after presenting the method in more detail.
3 The shadow function method: main idea
The basic idea to the shadow function method is to consider candidate functions of the form
(8) 
where is the generator matrix of the Markov chain and is any integrable function (so that the ergodic theorem holds for it as well). In this case, and under the assumption that (that holds under some nottoostringent conditions on [8]), becomes a shadow function. We are then naturally led to consider the solution of the Poisson equation (6), which could provide us with the appropriate function . Solving (6) is of course not possible, since the state space is countable and is unknown. However, we can look for socalled surrogate functions that approximate this solution to build a better estimator.
Following the analysis from [8], we consider another Markov chain evolving on a countable space , with stationary distribution and generator . We also assume a map (not necessarily onetoone) and a function that is somehow closely related to the original cost function . If is integrable, we further assume that we can compute the solution to the Poisson equation
(9) 
through which we arrive at a surrogate function
(10) 
Summing up, the approach outlined above is based on the fact that if is: 1) a relatively good approximation of and 2) tractable analytically, then we can derive a surrogate function and an estimator which is better than the original estimator in terms of TAVC (assuming that the extra calculation time needed for is not significant).
4 Practical implementation of the shadow function method in chemical kinetics
The shadow function method was originally developed for steadystate simulation of queueing systems, for which a wide range of known and tractable approximations exists. The solution of the approximating Poisson equation can thus be calculated explicitly in many cases, and the application of the method is straightforward. This is not the case for stochastic chemical kinetic systems, where explicit solutions are very hard or impossible to calculate. One thus has to resort to different types of approximation schemes, outlined below.
4.1 Statespace truncation
The Markov chains we are interested in satisfy the following properties:

They have a finite number of bounded increments over each finite time interval

Each state leads to a finite number of states (i.e. for every , for finitely many ’s)
For such chains, an obvious idea for obtaining an approximating process is to consider a chain evolving on a finite truncation of (i.e. consider to be a finite subset of ). Actually, under quite weak assumptions and careful definition of , one can show that the invariant distribution of on approaches that of as the truncation size grows [21]. This of course implies that also approaches . In this case, the function between the two state spaces can be intuitively defined to map every to itself, and every to some (which may vary with ). In this way, .
In order to arrive at a good approximation with this approach, one first has to study a few simulations of , to determine a finite set that contains a good amount of its invariant mass and then perform the necessary calculation of the solution to the Poisson equation on . The size of this set is determined in practice as a tradeoff between tractability and approximation accuracy. However, the applicability of this approach is in general very limited due to the fact that the required truncations grow exponentially with the system dimension. Another problem is that the approximation of (the solution to the original intractable Poisson equation) will be very poor for states , because of the form of , which projects are states outside back into the set. This implies that significant variance reduction will be hard to achieve (and in some cases variance may even increase), if the chain sample paths exit too frequently during simulation.
4.2 Approximating solutions of the Poisson equation
Instead of searching for an approximating Markov process, one may try to approximate the solution of (6) directly, to arrive at a suitable shadow function . This approach is also followed in Ref. [12], where the discretetime steadystate simulation problem is considered. Given a set of functions ^{1}^{1}1Candidate functions must satisfy a boundedness condition derived from a FosterLyapunov inequality. For more details, see Ref. [7] or Ch.8 of Ref. [12]. In the sequel we will assume that all the functions considered satisfy this property., one can define
(11) 
where and is a vector of weights.
In principle one could then try to calculate the value of that minimizes the TAVC of . Using (5) and (11), this variance constant turns out to be (see Appendix A)
(12) 
where solves (6). Thus, minimizing the TAVC of requires knowledge of , which is unavailable.
We thus have to resort to heuristic methods for obtaining a suboptimal estimate of , for example by determining the value of that minimizes
for some suitable measure . This is a linear least squares regression problem, which can be solved approximately by generating a set of training data , , with weights .
If we define the finitesample version of by
and similarly set
we can then calculate the matrix corresponding to by using the explicitly known form of the Markov chain generator (1) and finally obtain
(13) 
where , as the (weighted) least squares minimizer of .
4.3 Variance reduction algorithm using a shadow function
Putting together all the elements presented above, we summarize below the basic steps of the variance reduction algorithm implemented in this work:

Simulate a long path of the process using any preferred version of the stochastic simulation algorithm [5].

Obtain a rough estimate of from the simulated trajectory using .

Pick a set of functions and approximate the solution to the Poisson equation (6) by using the approach outlined above.

Evaluate along the simulated sample path.

Refine the estimate of using .

Verify that variance reduction has been achieved.
The last step is necessary to ensure that the variance has not actually increased due to the use of a suboptimal weight vector , and it can be carried out quite straightforwardly using the method of batch means [3] and the simulated trajectory from Step 1. In all cases we have tested, Steps 26 do not contribute more than a few seconds to the computational cost of this algorithm, which implies that the main computational bottleneck still lies at Step 1.
4.3.1 Implementation issues
The estimate of obtained by weighted least squares is clearly suboptimal, however it may still yield a reducedvariance estimator. The choice of the weighting measure in the optimization problem above is completely free, and one could in principle try to optimize over both and for a given problem. In practice however, such an approach would increase computational cost of the reducedvariance estimator and possibly eliminate the benefit of variance reduction. To maintain estimator efficiency, one should thus consider a single (or a few) “generic” choices for , and preferably reuse the points generated at Step 1.
A reasonable choice of weighting measure would be itself. The training set for regression would then consist of all distinct points visited by the process over the course of simulation in Step 1 (possibly after discarding the burnin period), weighted according to the empirical distribution of the process. A more coarse approximation of would be to use the same sample with all weights being equal. Yet another possibility consists of sampling from a uniform grid that is centered on the area containing the bulk of the invariant mass of the chain. This area can also be crudely determined from the sample of Step 1. All these approaches can achieve variance reduction, however the optimal choice remains problemdependent. Given that the calculation of least squares estimates can be carried out very efficiently using linear algebraic techniques, it is highly advisable to test several alternatives for the problem at hand.
In Ch.11 of Ref. [12], the problem of selecting an optimal is overcome by introducing a leastsquares temporal difference learning (LSTD) algorithm for the approximation of the value of that minimizes the variance of in the context of discretetime chains. The same algorithm could in principle be applied to continuoustime chains using the embedded discretetime Markov chain and carrying out the necessary modifications to the original algorithm, based on the results of Ref. [10]. While this solution is theoretically justified, it requires setting up and running an LSTD estimator in parallel with the simulated chain that will asymptotically converge to the optimal value of . Depending on the convergence properties of this estimator, the overall efficiency of the variance reduction scheme may be smaller than the efficiency achieved by using a suboptimal value for , especially when several approximating functions are considered.
Another degree of freedom in the design of shadow function estimators is the choice of the approximating set . Here, the probabilistic interpretation of Poisson’s equation may assist the selection of approximating functions by providing some useful intuition: Assuming is integrable and ergodic, it holds that [2, 11]
where is the hitting time of some state (changing simply shifts by a constant) and denotes expectation given . From this equation one may infer some general properties of (e.g. monotonicity, oscillatory behavior etc.) based on the form of the propensity functions. The same formula can be used to provide some crude simulationbased estimates of , which can be also helpful for the selection of . Finally, a Lyapunovtype analysis can be employed to infer the asymptotic behavior of [7].
5 Steadystate parameter sensitivity
Chemical reaction systems typically depend on several kinetic parameters, and the calculation of the output sensitivity with respect to these parameters is an essential step in the analysis of a given model. While there are several powerful parameter sensitivity methods available today [19, 1], they are mostly appropriate for transient sensitivity analysis, as the variance of their estimates tends to grow with the simulation length. Indeed, it can be shown that the variance of sensitivity methods based on the socalled likelihood ratio [6] or the Girsanov transformation [16] grows linearly with time. On the other hand, the variance of estimators based on finite parametric perturbations can be shown to remain bounded under mild conditions on the propensity functions, provided the underlying process is ergodic. However, the stationary variance can be still quite large, which makes necessary the use of a variance reduction method, such as the one presented here. Besides providing reducedvariance estimates of various steadystate functions of the chain, the shadow function estimator can be also employed for sensitivity analysis using a finite difference scheme [3] and the Common Random Numbers (CRN) estimator [17].
More analytically, assuming that the propensity functions of are of the form , where is a parameter of interest, the finite difference method aims to characterize the sensitivity of the steadystate value of a given function to a small finite perturbation of of around a nominal value . If is small enough, we expect that will be approximately equal to .
Finite differencebased sensitivity analysis using shadow functions can be simply carried out by generating process trajectories for the nominal and perturbed parameter values, and estimating by . As shown in Ref. [17], use of the same random number stream for the generation of both the nominal and perturbed trajectories can result in great variance decrease compared to using independent streams.
6 Numerical Examples
To demonstrate the efficiency of shadow function estimators, we next present two applications of the method to steadystate sensitivity estimation. We compare our finite difference scheme that uses common random numbers and the shadow function estimator to the method of Coupled Finite Differences (CFD) [1], which frequently outperforms finitedifference estimators based on common random numbers and the Random Time Change representation [1, 17].
All numerical examples were generated using customwritten Matlab scripts running on a 3.4 Ghz quadcore PC with 8 GB of RAM.
6.1 Stochastic focusing
As a first example, we consider the stochastic focusing model of [15], where an input signaling molecule inhibits the production of another molecule . Stochastic focusing arises due to the presence of stochastic fluctuations in , that make the mean value of more sensitive to changes than predicted by the deterministic model of the system. The same system is treated in Ref. [22] using a more sophisticated method based on trajectory reweighting.
The system reactions are given below:
(14) 
where . The parameters used are , and , while is varied between 200 and 900 to study the effect of varying on . More specifically (and similarly to Ref. [22]), we want to calculate the gain
To this end we estimate using finite differences with at several points between and . Figure 1 shows the calculated confidence intervals for obtained by the Common Random Number (CRN) estimator, the CRN estimator in conjunction with a shadow function and the CFD method. For each value of , a simulated sample path of length time units (t.u.) was used to generate 19 batches of length 400 t.u. each, while the first 400 t.u. were discarded as burnin.
Shadow functions consisted of linear combinations of all monomials in two variables up to order three (that is, , with ), together with ^{2}^{2}2This is an example where the probabilistic interpretation of the Poisson equation given in subsection 4.3.1 can provide useful intuition for the selection of approximating functions. In the case at hand, , so (the solution to the Poisson equation) is expected to grow only very slowly with , as the production rate of tends to zero as . . This set of ’s was selected manually and is definitely not the “optimal” choice. The training set used for regression consisted of all unique points visited by the process sample paths after a burnin period. Two alternative weighting schemes were tested for each value of : according to the first, all points were assigned equal weight (), while in the second one the points were weighted according to the empirical distribution of the process, calculated using the simulated sample paths (. Both schemes lead to variance reduction, and calculation of in each case can be performed very fast ( sec), given the small number of training points ().
Postprocessing of the trajectories for the evaluation of the shadow function over the different batches takes another 5 sec of CPU time. On the other hand, SSA simulation takes on average 40 sec, which demonstrates that the overhead associated with the shadow function usage is relatively small, while the computational savings in the estimation of are significant, as Table 1 demonstrates. Finally, a CFD simulation of the same length requires 220 sec of CPU time on average, while achieving a smaller magnitude of variance reduction.
2  2.5  3  3.5  4  4.5  5  5.5  6  7  9  
CFD:  5.0  5.3  4.0  5.6  21.0  6.5  16.7  25.3  6.9  22.2  16.6 
CRN+SF:  7464  2051  968  735  444  379  232  475  386  108  129 
Before we leave this example, we should point out that application of the shadow function method to just the birthanddeath process of results in tremendous variance reduction for and . As an example, Table 2 shows the confidence intervals of uncentered moment estimates obtained with and without a shadow function for , , using a simulated trajectory of t.u. and 10 batches. We attribute this phenomenon to the fact that the chosen set of functions can approximate the true solution to the Poisson equation very closely for . Of course, it is known that has a Poisson stationary distribution, which makes the use of a moment estimator pointless in this case. However, this interesting observation provides some heuristic justification for using polynomial approximating functions .
Moment  True value  C.I.  C.I. 

5  
30  
205  
1555 
6.2 A sixdimensional system
Our second example is a system consisting of two interacting genes, and . The product of gene forms homodimers, which repress the expression of , as well as heterodimers with the product of , which repress the expression of . The system species are listed in Table 3, while Table 4 displays the reaction scheme and propensities of our model. Note that several parameters are assumed to be the same for the two genes for simplicity. For the same reason, the gene states are omitted from the model.
Name  Symbol 

Gene A mRNA  
Protein A monomer  
Protein A dimer  
Gene B mRNA  
Protein B momomer  
AB dimer 
Reactions  Propensities 

,  
,  
,  
,  
,  
, 
The system comprises six molecular species interacting through twelve reactions ^{3}^{3}3Note that all examples presented in Ref. [22] consist of twospecies systems and no more than four reactions. Our goal is to estimate the sensitivity of the steadystate mean of (the second repressor dimer), denoted by , to small variations of each of the system parameters. Once more, we compare the behavior of the CRN steadystate estimator with and without a shadow function to the performance of the CFD method. Shadow functions for this system consisted of linear combinations of all monomials of state pairs up to order two. Since the number of unique points visited by this sixdimensional process during simulation was (expectedly) too large to be handled with the least squares method, 10000 points sampled uniformly at random from this set were used in the regression step.
For the finite difference method we perturbed each parameter by and estimated the 95% confidence intervals of each sensitivity estimate using batch means with 24 batches of length 4000 time units each (with an additional 4000 t.u. for burnin). Prior to parameter perturbations, the estimate of was calculated for
The estimates and their corresponding variances were: , , and . The results of the sensitivity analysis are summarized in Table 5. CRN sensitivity estimates and their associated confidence intervals not accurate enough to provide useful information. On the contrary, using a shadow function results in great improvements, as now the relative magnitudes and signs of the various sensitivity coefficients can be meaningfully compared to each other.
Normalized sensitivity coefficient  CRN 95% CI  CFD 95% CI  CRN+SF 95% CI 

The variance reduction method remains quite efficient computationally in this case as well: SSA simulation of a t.u. trajectory takes about 17 sec of CPU time, while calculation of requires 1 sec and postprocessing of the sample path another 3 sec. At the same time, a CFD simulation of the same length requires 95 sec of CPU time on average, while failing to achieve a comparable level of variance reduction.
7 Discussion
We demonstrated the applicability of the powerful shadow function method to the problem of steadystate simulation of stochastic chemical kinetics. Our results suggest that a significant increase in the efficiency of a steadystate estimator is possible by only a small increase in its computational cost. The method can be applied to the steadystate estimation of practically any function of the process, and can thus provide improved estimates of high order (cross)moments, as well as estimates of stationary probabilities for subsets of the process state space, by using set indicators as cost functions. The magnitude of variance reduction achieved by the shadow function method allows also the efficient and precise computation of steadystate parameter sensitivities using the finite difference method.
The comparison of the efficiency of this approach for providing steadystate sensitivity estimates with the one presented in [22] is the topic of our ongoing work. It would also be instructive to assess the relative strengths and weaknesses of the LSTD approximation algorithm for optimizing the shadow function [12] and test its scalability with system size and number of approximating functions (note that only onedimensional examples are treated in [12]).
The proposed workflow for arriving at a useful shadow function can be improved at several points, by drawing from the large literature on function approximation techniques, in order to enlarge its range of applicability and its accuracy. However, even a crude approach such as the one presented above seems to be sufficient for systems of practical interest.
Appendix A TAVC for shadow function estimator
References
 [1] Anderson, D. F. An efficient finite difference method for parameter sensitivities of continuous time markov chains. SIAM Journal on Numerical Analysis 50, 5 (2012), 2237–2258.
 [2] Asmussen, S. Applied probability and queues. Springer, 2003.
 [3] Asmussen, S., and Glynn, P. Stochastic simulation: Algorithms and analysis, vol. 57. Springer, 2007.
 [4] Bhattacharya, R. N. On the functional central limit theorem and the law of the iterated logarithm for markov processes. Probability Theory and Related Fields 60, 2 (1982), 185–201.
 [5] Gillespie, D. T. Stochastic simulation of chemical kinetics. Annu. Rev. Phys. Chem. 58 (2007), 35–55.
 [6] Glynn, P. W. Likelihood ratio gradient estimation for stochastic systems. Communications of the ACM 33, 10 (1990), 75–84.
 [7] Glynn, P. W., and Meyn, S. P. A Lyapunov bound for solutions of the Poisson equation. The Annals of Probability 24, 2 (1996), 916–931.
 [8] Henderson, S. Variance Reduction Via an Approximating Markov Process. PhD thesis, Dept. of Operations Research, Stanford University, 1997.
 [9] Henderson, S., and Glynn, P. Approximating martingales for variance reduction in markov process simulation. Mathematics of Operations Research 27, 2 (2002), 253–271.
 [10] Hordijk, A., Iglehart, D. L., and Schassberger, R. Discrete time methods for simulating continuous time markov chains. Advances in Applied Probability (1976), 772–788.
 [11] Makowski, A. M., and Shwartz, A. The poisson equation for countable markov chains: probabilistic methods and interpretations. In Handbook of Markov decision processes. Springer, 2002, pp. 269–303.
 [12] Meyn, S. Control techniques for complex networks. Cambridge University Press, 2007.
 [13] Munsky, B., and Khammash, M. The finite state projection algorithm for the solution of the chemical master equation. The Journal of Chemical Physics 124 (2006), 044104.
 [14] Norris, J. R. Markov chains. Cambridge University Press, 1998.
 [15] Paulsson, J., Berg, O. G., and Ehrenberg, M. Stochastic focusing: Fluctuationenhanced sensitivity of intracellular regulation. Proceedings of the National Academy of Sciences 97, 13 (2000), 7148–7153.
 [16] Plyasunov, S., and Arkin, A. P. Efficient stochastic sensitivity analysis of discrete event systems. Journal of Computational Physics 221, 2 (2007), 724–738.
 [17] Rathinam, M., Sheppard, P. W., and Khammash, M. Efficient computation of parameter sensitivities of discrete stochastic chemical reaction networks. The Journal of chemical physics 132, 3 (2010).
 [18] Ruess, J., MiliasArgeitis, A., Summers, S., and Lygeros, J. Moment estimation for chemically reacting systems by extended kalman filtering. The Journal of chemical physics 135, 16 (2011), 165102–165102.
 [19] Sheppard, P. W., Rathinam, M., and Khammash, M. A pathwise derivative approach to the computation of parameter sensitivities in discrete stochastic chemical systems. The Journal of chemical physics 136 (2012), 034115.
 [20] Singh, A., and Hespanha, J. P. Approximate moment dynamics for chemically reacting systems. Automatic Control, IEEE Transactions on 56, 2 (2011), 414–418.
 [21] Tweedie, R. L. Truncation approximations of invariant measures for Markov chains. Journal of applied probability (1998), 517–536.
 [22] Warren, P. B., and Allen, R. J. Steadystate parameter sensitivity in stochastic modeling via trajectory reweighting. The Journal of Chemical Physics 136 (2012), 104106.