Control Variates for Stochastic Simulation
of Chemical Reaction Networks
Abstract
Stochastic simulation is a widely used method for estimating quantities in models of chemical reaction networks where uncertainty plays a crucial role. However, reducing the statistical uncertainty of the corresponding estimators requires the generation of a large number of simulation runs, which is computationally expensive. To reduce the number of necessary runs, we propose a variance reduction technique based on control variates. We exploit constraints on the statistical moments of the stochastic process to reduce the estimators’ variances. We develop an algorithm that selects appropriate control variates in an online fashion and demonstrate the efficiency of our approach on several case studies.
1 Introduction
Chemical reaction networks that are used to describe cellular processes are often subject to inherent stochasticity. The dynamics of gene expression, for instance, is influenced by single random events (e.g. transcription factor binding) and hence, models that take this randomness into account must monitor discrete molecular counts and reaction events that change these counts. Discrete-state continuous-time Markov chains have successfully been used to describe networks of chemical reactions over time that correspond to the basic events of such processes. The time-evolution of the corresponding probability distribution is given by the chemical master equation, whose numerical solution is extremely challenging because of the enormous size of the underlying state-space.
Analysis approaches based on sampling, such as the Stochastic Simulation Algorithm (SSA) [16], can be applied independently of the size of the model's state-space. However, statistical approaches are costly, since a large number of simulation runs is necessary to reduce the statistical inaccuracy of estimators. This problem is particularly severe if reactions occur on multiple time scales or if the event of interest is rare. A particularly popular technique to speed up simulations is τ-leaping, which applies multiple reactions in one step of the simulation. However, such multi-step simulations rely on certain assumptions about the number of reactions in a certain time interval. These assumptions are typically only approximately fulfilled and therefore introduce approximation errors on top of the statistical uncertainty of the considered point estimators.
Moment-based techniques offer a fast approximation of the statistical moments of the model. The exact moment dynamics can be expressed as an infinite-dimensional system of ODEs, which cannot be directly integrated for a transient analysis. Hence, ad-hoc approximations need to be introduced, expressing higher-order moments as functions of lower-order ones [1, 11]. However, moment-based approaches rely on assumptions about the dynamics that are often not even approximately fulfilled and may lead to high approximation errors. Recently, equations expressing the moment dynamics have also been used as constraints for parameter estimation [3] and for computing moment bounds using semidefinite programming [10, 13].
In this work, we propose a combination of such moment constraints with the SSA approach. Specifically, we interpret these constraints as random variables that are correlated with the estimators of interest, which are usually given as functions of chemical population variables. These constraints can be used as (linear) control variates in order to improve the final estimate and reduce its variance [21, 31]. The method is intuitive: if a control variate is positively correlated with the function to be estimated, then we can use the estimate of the variate to adjust the target estimate.
The incorporation of control variates into the SSA introduces additional simulation costs for the calculation of the constraint values. These values are integrals over time, which we accumulate along the piecewise-constant trajectories. This introduces a trade-off between the variance reduction achieved by using control variates and the increased simulation cost. This trade-off is expressed as the product of the variance reduction ratio and the cost increase ratio.
For a good tradeoff, it is crucial to find an appropriate set of control variates. Here we propose a class of constraints which is parameterized by a moment vector and a weighting parameter, resulting in infinitely many choices. We present an algorithm that samples from the set of all constraints and proceeds to remove constraints that are either only weakly correlated with the target function or are redundant in combination with other constraints.
In a case study, we explore different variants of this algorithm both in terms of generating the initial constraint set and of removing weak or redundant constraints. We find that the algorithm’s efficiency is superior to a standard estimation procedure using stochastic simulation alone in almost all cases.
Although in this work we focus on estimating first-order moments at fixed time points, the proposed approach can in principle deal with any property that can be expressed in terms of expected values, such as probabilities of complex path properties. Another advantage of our technique is that the increased efficiency does not come at the price of an additional approximation error, as is the case for methods based on moment approximations or multi-step simulations.
This paper is structured as follows. In Section 2 we give a brief survey of methods and tools related to efficient stochastic simulation and moment techniques. In Section 3 we introduce the common stochastic semantics of chemical reaction networks. From these semantics we show in Section 4 how to derive constraints on the moments of the transient distribution. The variance reduction technique of control variates is described in Section 5. We show the design of an algorithm using moment constraints to reduce sample variance in Section 6. The efficiency and other characteristics of this algorithm are evaluated on four non-trivial case studies in Section 7. Finally, we discuss the findings and give possibilities for further work in Section 8.
2 Related Work
Much research has been directed at the efficient analysis of stochastic chemical reaction networks. Usually research focuses on improving efficiency by making certain approximations.
If the state-space is finite and small enough, one can deal with the underlying Markov chain directly. But there are also cases where the transient distribution has an infinitely large support and one can still deal with explicit state probabilities. To this end, one can fix a finite state-space that should contain most of the probability [24]. Refinements of this method work dynamically and adjust the state-space according to the transient distributions [2, 18, 23].
On the other end of the spectrum there are mean-field approximations, which model the mean densities faithfully in the system size limit [4]. In between, there are techniques such as moment closure [29] that consider not only the mean, but also the variance and other higher-order moments. These methods depend on ad-hoc approximations of higher-order moments to close the ODE system given by the moment equations. Yet another class of methods treats molecular counts as continuous and approximates the dynamics in this continuous space, e.g. the system size expansion [32] and the chemical Langevin equation [14].
While the moment closure method uses ad-hoc approximations of higher-order moments to facilitate numerical integration, these can be avoided in some contexts. For the equilibrium distribution, for example, the time-derivative of all moments is equal to zero. This directly yields constraints that have been used for parameter estimation at steady-state [3] and for bounding moments of the equilibrium distribution using semidefinite programming [12, 13, 19]. The latter technique of bounding moments has been successfully adapted in the context of transient analysis [10, 26, 27]. We adapt the constraints proposed in these works to improve statistical estimations via stochastic simulation (cf. Section 4).
While the above techniques give a deterministic output, stochastic simulation generates single executions of the stochastic process [16]. Estimating quantities therefore requires accumulating large numbers of simulation runs, which adds a significant computational burden. Consequently, some effort has been directed at lowering this cost. A prominent technique is τ-leaping [15], which performs multiple reactions in a single step instead of only one. Another approach is to find approximations that are specific to the problem at hand, such as approximations based on time-scale separations [6, 5].
The most prominent application of a variance reduction technique in the context of stochastic reaction networks is importance sampling [20]. This technique relies on an alteration of the process and a subsequent weighting of the samples by the likelihood ratio between the original and the altered process.
3 Stochastic Chemical Kinetics
A chemical reaction network (CRN) describes the interactions between a set of species in a well-stirred reactor. Since we assume that all reactant molecules are spatially uniformly distributed, we just keep track of the overall amount of each molecule. Therefore, for $n$ species, the state-space is given by $\mathbb{N}^n$. These interactions are expressed as a set of reactions with certain inputs and outputs, given by the vectors $v_j^-$ and $v_j^+$ for reaction $j$, respectively. Such reactions are denoted as
\[ \sum_{i=1}^{n} v_{ji}^{-}\, S_i \;\xrightarrow{c_j}\; \sum_{i=1}^{n} v_{ji}^{+}\, S_i. \tag{1} \]
The reaction rate constant $c_j$ gives us information on the propensity of the reaction. If just a constant is given, mass-action propensities are assumed. In a stochastic setting, for some state $x \in \mathbb{N}^n$ these are
\[ \alpha_j(x) = c_j \prod_{i=1}^{n} \binom{x_i}{v_{ji}^{-}}. \tag{2} \]
The system's behavior is described by a stochastic process $\{X_t\}_{t \geq 0}$. The propensity function $\alpha_j$ gives the infinitesimal probability of reaction $j$ occurring, given a state $x$. That is, for a small time step $\Delta t > 0$,
\[ \Pr\left(X_{t+\Delta t} = x + v_j \mid X_t = x\right) = \alpha_j(x)\,\Delta t + o(\Delta t), \tag{3} \]
where $v_j = v_j^+ - v_j^-$ is the change vector of reaction $j$.
This induces a corresponding continuous-time Markov chain (CTMC) on $\mathbb{N}^n$ with generator matrix (assuming a fixed enumeration of the state-space)
\[ Q_{x,y} = \begin{cases} \sum_{j\,:\, v_j = y - x} \alpha_j(x), & \text{if } y \neq x, \\[4pt] -\sum_{j} \alpha_j(x), & \text{if } y = x. \end{cases} \tag{4} \]
Accordingly, the time-evolution of the process' distribution $\pi_t$, given an initial distribution $\pi_0$, is given by the Kolmogorov forward equation, i.e. $\frac{d}{dt}\pi_t = \pi_t Q$, where $\pi_t(x) = \Pr(X_t = x)$. Written out for a single state, it is commonly referred to as the chemical master equation (CME)
\[ \frac{d}{dt}\pi_t(x) = \sum_{j} \left( \alpha_j(x - v_j)\,\pi_t(x - v_j) - \alpha_j(x)\,\pi_t(x) \right). \tag{5} \]
A direct solution of (5) is usually not possible. If the part of the state-space carrying non-negligible probability is suitably small, a state-space truncation can be applied, i.e. (5) is integrated on a possibly time-dependent subset [18, 24, 30]. Instead of analyzing (5) directly, one often resorts to simulating trajectories. A trajectory over the interval $[0, T]$ is a sequence of states $x_{t_0}, x_{t_1}, \ldots, x_{t_K}$ and corresponding jump times $t_0 < t_1 < \cdots < t_K$, with $t_0 = 0$ and $t_K \leq T$. We can sample trajectories of $X$ by using stochastic simulation [16].
Consider the birth-death model below as an example.
Model 1 (Birth-death process)
A single species $S$ has a constant production and a decay that is linear in the current amount of molecules. Therefore the model consists of two mass-action reactions
\[ \emptyset \xrightarrow{k_1} S, \qquad S \xrightarrow{k_2} \emptyset, \]
where $\emptyset$ denotes no reactant or no product, respectively.
For Model 1, expanding (5) shows that the change of probability mass in a single state $x$ is described by
\[ \frac{d}{dt}\pi_t(x) = k_1\,\pi_t(x-1) + k_2\,(x+1)\,\pi_t(x+1) - \left(k_1 + k_2\,x\right)\pi_t(x), \]
with the convention $\pi_t(-1) = 0$.
We can generate trajectories of this model by choosing either reaction with a probability that is proportional to its rate given the current state $x$. The jump time is determined by sampling from an exponential distribution with rate $k_1 + k_2 x$.
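This direct simulation scheme can be sketched compactly. The Python sketch below simulates Model 1 with Gillespie's direct method; the rate constants, initial state, and time horizon are illustrative placeholders, not values from the paper.

```python
import random

def ssa_birth_death(k1, k2, x0, t_max, rng):
    """Gillespie's direct method for the birth-death process:
    production with constant rate k1, decay with rate k2 * x."""
    t, x = 0.0, x0
    times, states = [0.0], [x0]
    while True:
        a1 = k1        # propensity of production
        a2 = k2 * x    # propensity of decay
        a0 = a1 + a2
        t += rng.expovariate(a0)   # exponentially distributed jump time
        if t >= t_max:
            break
        # pick a reaction with probability proportional to its propensity
        x += 1 if rng.random() < a1 / a0 else -1
        times.append(t)
        states.append(x)
    return times, states

# illustrative parameters
rng = random.Random(0)
times, states = ssa_birth_death(k1=10.0, k2=0.1, x0=0, t_max=50.0, rng=rng)
```

For these parameters the process relaxes towards a stationary mean of $k_1/k_2 = 100$ molecules.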
4 Moment Constraints
The time-evolution of $E[f(X_t)]$ for some function $f$ can be directly derived from (5) by computing the sum $\sum_x f(x)\,\frac{d}{dt}\pi_t(x)$, which yields
\[ \frac{d}{dt} E\left[f(X_t)\right] = \sum_{j} E\left[ \alpha_j(X_t)\left( f(X_t + v_j) - f(X_t) \right) \right]. \tag{6} \]
While many choices of $f$ are possible, for this work we will restrict ourselves to monomial functions $f(x) = x^m = \prod_{i=1}^{n} x_i^{m_i}$, i.e. the non-central moments of the process. The order of a moment is the sum over the exponents, i.e. $\sum_i m_i$. The integration of (6) with such functions is well-known in the context of moment approximations of CRN models. For most models the arising ODE system is infinitely large, because the time-derivative of low-order moments usually depends on the values of higher-order moments. To close this system, moment closures, i.e. ad-hoc approximations of higher-order moments, are applied [28]. The main drawback of this kind of analysis is that it is not known whether the chosen closure gives an accurate approximation for the case at hand. Here, such approximations are not necessary, since we will apply the moment dynamics in the context of stochastic sampling instead of trying to integrate (6).
Apart from integration strategies, setting (6) to zero has been used as a constraint for parameter estimation at steady-state [3] and for bounding moments at steady-state [9, 13, 19]. The extension of the latter has recently led to the adaptation of these constraints to a transient setting [10, 27]. These two transient constraint variants are analogously derived by multiplying (6) by a time-dependent, differentiable weighting function $w(t)$ and integrating over $[0, T]$:
\[ w(T)\,E[f(X_T)] - w(0)\,E[f(X_0)] = \int_0^T \frac{dw(t)}{dt}\, E[f(X_t)]\,dt + \int_0^T w(t) \sum_{j} E\left[\alpha_j(X_t)\left(f(X_t + v_j) - f(X_t)\right)\right] dt. \tag{7} \]
In the context of computing moment bounds via semidefinite programming, polynomial weighting functions [27] and exponential weighting functions of the form $w(t) = e^{\lambda t}$ [10] have been proposed. While both choices proved to be effective in different case studies, relying solely on the latter choice, i.e. $w(t) = e^{\lambda t}$, was sufficient.
By expanding the rate functions $\alpha_j$ and the monomial $f$ in (7), and substituting the exponential weight function $w(t) = e^{\lambda t}$, we can rewrite (7) as
\[ e^{\lambda T} E\left[X_T^m\right] - E\left[X_0^m\right] = \lambda \int_0^T e^{\lambda t} E\left[X_t^m\right] dt + \sum_k c_k \int_0^T e^{\lambda t} E\left[X_t^{m_k}\right] dt \tag{8} \]
with coefficients $c_k$ and moment vectors $m_k$ defined accordingly. Assuming the moments remain finite on $[0, T]$, we can define the random variable
\[ Z = e^{\lambda T} X_T^m - X_0^m - \lambda \int_0^T e^{\lambda t} X_t^m\,dt - \sum_k c_k \int_0^T e^{\lambda t} X_t^{m_k}\,dt \tag{9} \]
with $E[Z] = 0$.
Note that a realization of $Z$ depends on the whole trajectory over $[0, T]$. Thus, for the integral terms in (9) we have to compute, for $\lambda \neq 0$, sums of the form
\[ \int_0^T e^{\lambda t} X_t^m\,dt = \sum_{i} x_{t_i}^m\, \frac{e^{\lambda t_{i+1}} - e^{\lambda t_i}}{\lambda} \tag{10} \]
over a given trajectory, where the last interval ends at $T$.
over a given trajectory. This accumulation is best done during the simulation to avoid storing the whole trajectory. Still, the cost of a simulation run increases. For the method to be efficient, the variance reduction (Section 5) needs to overcompensate for this increased cost of a simulation run.
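The accumulation in (10) can be performed in a single pass over the jump times and states of a trajectory. The sketch below assumes a piecewise-constant trajectory represented by lists of jump times and states; the case $\lambda = 0$ reduces to an unweighted integral.

```python
import math

def weighted_moment_integral(times, states, lam, m, t_max):
    """One-pass accumulation of  integral_0^T exp(lam * t) * x_t^m dt
    over a piecewise-constant trajectory (jump times and states)."""
    total = 0.0
    for i, x in enumerate(states):
        t0 = times[i]
        t1 = times[i + 1] if i + 1 < len(times) else t_max
        if lam == 0.0:
            seg = t1 - t0  # uniform weighting function w(t) = 1
        else:
            seg = (math.exp(lam * t1) - math.exp(lam * t0)) / lam
        total += (x ** m) * seg
    return total
```

For instance, for $\lambda = 0$ and a constant trajectory $x_t = 2$ on $[0, 5]$, the third moment integrates to $2^3 \cdot 5 = 40$.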
5 Control Variates
Now, we are interested in the estimation of some quantity $E[Y]$ by stochastic simulation. Let $Y_1, \ldots, Y_n$ be independent samples of $Y$. Then the sample mean $\hat{\mu}_n = \frac{1}{n}\sum_{i=1}^{n} Y_i$ is an estimate of $E[Y]$. By the central limit theorem, $\hat{\mu}_n$ is, for large $n$, approximately normally distributed with mean $E[Y]$ and variance $\sigma^2/n$, where $\sigma^2 = \operatorname{Var}(Y)$.
Now suppose we know of a random variable $Z$ with $E[Z] = 0$. The variable $Z$ is called a control variate. If a control variate is correlated with $Y$, we can use it to reduce the variance of $\hat{\mu}_n$ [17, 25, 31, 33]. For example, suppose we are running a set of simulations and consider a single constraint. If the estimated value of this constraint is larger than zero and we estimate a positive correlation between the constraint and $Y$, we would, intuitively, like to decrease our estimate accordingly. This results in an estimation of the mean of the random variable
\[ Y(\alpha) = Y - \alpha Z \]
instead of $Y$. Its variance is
\[ \operatorname{Var}(Y - \alpha Z) = \operatorname{Var}(Y) - 2\alpha \operatorname{Cov}(Y, Z) + \alpha^2 \operatorname{Var}(Z). \]
The optimal choice of $\alpha$ can be computed by considering the minimum of $\operatorname{Var}(Y - \alpha Z)$ with respect to $\alpha$, which is attained at $\alpha^* = \operatorname{Cov}(Y, Z)/\operatorname{Var}(Z)$. Then
\[ \operatorname{Var}(Y - \alpha^* Z) = \operatorname{Var}(Y) - \frac{\operatorname{Cov}(Y, Z)^2}{\operatorname{Var}(Z)}. \]
Therefore $\operatorname{Var}(Y - \alpha^* Z) = (1 - \rho_{YZ}^2)\operatorname{Var}(Y)$, where $\rho_{YZ}$ is the correlation of $Y$ and $Z$.
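The effect of the optimal coefficient $\alpha^*$ can be checked numerically. The following Python sketch uses synthetic data with a hypothetical target mean of 2.0 and a zero-mean Gaussian control variate; none of the constants come from the paper.

```python
import random

def controlled_mean(ys, zs):
    """Estimate E[Y] using a single control variate Z with known mean 0,
    with the optimal coefficient alpha* = Cov(Y, Z) / Var(Z)."""
    n = len(ys)
    my, mz = sum(ys) / n, sum(zs) / n
    cov = sum((y - my) * (z - mz) for y, z in zip(ys, zs)) / n
    var_z = sum((z - mz) ** 2 for z in zs) / n
    alpha = cov / var_z
    # mean of the controlled samples Y - alpha * Z
    return sum(y - alpha * z for y, z in zip(ys, zs)) / n, alpha

# synthetic data: true mean 2.0, control variate positively correlated with Y
rng = random.Random(42)
zs = [rng.gauss(0.0, 1.0) for _ in range(20_000)]
ys = [2.0 + 0.8 * z + rng.gauss(0.0, 0.6) for z in zs]
est, alpha = controlled_mean(ys, zs)
```

Here the controlled samples have residual variance $(1 - \rho_{YZ}^2)\operatorname{Var}(Y) = 0.36$ instead of $\operatorname{Var}(Y) = 1$, i.e. the estimator needs far fewer runs for the same precision.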
If we have multiple control variates, we can proceed in a similar fashion. Now, let $Z = (Z_1, \ldots, Z_d)^\top$ denote a vector of control variates and let $\Sigma_Z$ be the covariance matrix of $Z$. As above, we estimate the mean of $Y - \alpha^\top Z$. The ideal choice of $\alpha$ is the result of an ordinary least squares regression between $Z$ and $Y$. Specifically, $\alpha = \Sigma_Z^{-1} \operatorname{Cov}(Z, Y)$. Then, asymptotically, the variance of this estimator is [31]
\[ \operatorname{Var}\left(\hat{\mu}_n^{\mathrm{CV}}\right) = \left(1 - R_{ZY}^2\right) \frac{\sigma^2}{n}, \tag{12} \]
where $R_{ZY}$ is the multiple correlation coefficient between $Z$ and $Y$.
In practice, however, $\alpha$ is unknown and needs to be replaced by an estimate $\hat{\alpha}$. This leads to an increase in the estimator's variance. Under the assumption of $Y$ and $Z$ having a multivariate normal distribution [8, 21], the variance of the estimator is
\[ \operatorname{Var}\left(\hat{\mu}_n^{\mathrm{CV}}\right) = \frac{n - 2}{n - d - 2}\left(1 - R_{ZY}^2\right) \frac{\sigma^2}{n}, \tag{13} \]
where $d$ is the number of control variates.
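For several control variates, the coefficient vector $\alpha = \Sigma_Z^{-1}\operatorname{Cov}(Z, Y)$ amounts to solving a small linear system. The sketch below does this explicitly for two synthetic zero-mean variates; all constants are illustrative assumptions.

```python
import random

def controlled_mean_2cv(ys, z1s, z2s):
    """Estimate E[Y] with two zero-mean control variates, using
    alpha = Sigma_Z^{-1} Cov(Z, Y) (an OLS regression of Y on Z)."""
    n = len(ys)
    my, m1, m2 = sum(ys) / n, sum(z1s) / n, sum(z2s) / n
    cov = lambda a, b, ma, mb: sum((u - ma) * (v - mb) for u, v in zip(a, b)) / n
    s11, s22 = cov(z1s, z1s, m1, m1), cov(z2s, z2s, m2, m2)
    s12 = cov(z1s, z2s, m1, m2)
    b1, b2 = cov(z1s, ys, m1, my), cov(z2s, ys, m2, my)
    det = s11 * s22 - s12 * s12       # invert the 2x2 covariance matrix
    a1 = (s22 * b1 - s12 * b2) / det
    a2 = (s11 * b2 - s12 * b1) / det
    return sum(y - a1 * u - a2 * v for y, u, v in zip(ys, z1s, z2s)) / n

# synthetic data: true mean 5.0, both variates correlated with Y
rng = random.Random(7)
z1 = [rng.gauss(0.0, 1.0) for _ in range(20_000)]
z2 = [rng.gauss(0.0, 1.0) for _ in range(20_000)]
ys = [5.0 + 1.0 * u - 0.5 * v + rng.gauss(0.0, 0.5) for u, v in zip(z1, z2)]
est = controlled_mean_2cv(ys, z1, z2)
```

A near-singular $\Sigma_Z$ (highly correlated variates) makes this inversion unstable, which is exactly why the algorithm in Section 6 prunes redundant variates.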
Clearly, a control variate is “good” if it is highly correlated with $Y$. The constraint in (11) is an example of the extreme case. When we use this constraint as a control variate for the estimation of the mean at some time point $t$, it has a correlation of $\pm 1$, since it describes the mean at that time precisely. Therefore the variance is reduced to zero. We thus aim to pick control variates that are highly correlated with $Y$.
Consider, for example, the above case of the birth-death process. If we choose (11) as a constraint, it would always yield the exact difference between the exact mean and the sample mean and would therefore have a perfect correlation. Clearly, $R_{ZY}^2$ reduces to 1 and the variance in (13) becomes zero.
6 Moment-based Variance Reduction
We propose an adaptive estimation algorithm (Algorithm 1) that starts out with an initial set of control variates and periodically removes potentially inefficient variates.
The algorithm consists of a main loop which performs the simulation runs (line 4). Between runs, the mean and covariance estimates of the control variates and the target variable are updated (line 6). Periodically, the control variates are checked for efficiency and redundancy (lines 7–12).
Checking both conditions is based on the estimated correlation $\rho_{ij}$ between the $i$-th and $j$-th control variate and the estimated correlation $\rho_{iY}$ of a control variate with $Y$. The first condition is a simple lower threshold $\rho_{\min}$ on the correlation $|\rho_{iY}|$. This condition aims to remove those variates from the control variate set that are only weakly correlated with $Y$ (line 9). The rationale is that, if variate $Z_i$ has a low correlation with the variable of interest $Y$, its computation may not be worth the cost. Here, we propose to set $\rho_{\min}$ heuristically, depending on an algorithm parameter.
The second condition aims to remove redundant control variates. This is not only beneficial for the efficiency of the estimator, but also necessary for the matrix inversion in (12), because perfectly or highly correlated constraints make the covariance matrix estimate (quasi-)singular. For all considered criteria we iterate over all tuples $(i, j)$ with $i < j$, removing the weaker of the two, i.e. the one with the smaller correlation with $Y$, if the two control variates are considered redundant (line 10).
There are many ways to define such a redundancy criterion. Here, we focus on criteria that are defined in terms of the average correlation $\bar{\rho}_{ij} = \frac{1}{2}\left(|\rho_{iY}| + |\rho_{jY}|\right)$. For two variates $Z_i$ and $Z_j$ we then check whether their mutual correlation exceeds some function $f$ of $\bar{\rho}_{ij}$, i.e. we check the inequality
\[ |\rho_{ij}| > f(\bar{\rho}_{ij}). \]
If this inequality holds, the weaker of the two constraints is removed. Naturally, there are many possible choices for the above decision boundary $f$ (cf. Fig. 1a).
The simplest choice is to ignore $\bar{\rho}_{ij}$ and just fix a constant close to 1 as a threshold. While this often leads to the strongest variance reduction and avoids numerical issues in the control variate computation, it turns out that the computational overhead is not as well-compensated as by other choices of $f$ (see Section 7).
Another option is to fix a simple linear function, i.e. $f(\bar{\rho}) = \bar{\rho}$. For this choice the intuition is that one of two constraints is removed if their mutual correlation exceeds their average correlation with $Y$.
Here, we also assess two quadratic choices for $f$. The first is more tolerant than the linear function and stricter than a constant threshold, except for highly correlated control variates. The second variant includes the lower bound $\rho_{\min}$ and scales the quadratic function accordingly. The different choices of $f$ considered here are plotted in Figure 1a.
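The pruning step can be sketched as follows. The concrete decision boundaries used here (a constant of 0.99, the identity, and a plain square) are illustrative assumptions, not the exact curves of Figure 1a.

```python
# Illustrative decision boundaries f(rho_bar); assumed forms only.
boundaries = {
    "constant":  lambda r: 0.99,   # fixed threshold close to 1
    "linear":    lambda r: r,      # remove if mutual corr. exceeds avg. corr. with Y
    "quadratic": lambda r: r * r,
}

def prune_redundant(corr_zz, corr_zy, f):
    """For every pair (i, j) with |corr(Z_i, Z_j)| > f(rho_bar), where
    rho_bar averages the pair's correlations with Y, drop the variate
    that is less correlated with Y. Returns the surviving indices."""
    keep = set(range(len(corr_zy)))
    for i in range(len(corr_zy)):
        for j in range(i + 1, len(corr_zy)):
            if i in keep and j in keep:
                rho_bar = (abs(corr_zy[i]) + abs(corr_zy[j])) / 2
                if abs(corr_zz[i][j]) > f(rho_bar):
                    keep.discard(i if abs(corr_zy[i]) < abs(corr_zy[j]) else j)
    return sorted(keep)
```

With two variates of mutual correlation 0.95 and target correlations 0.9 and 0.5, the linear rule removes the weaker one, while the constant threshold keeps both.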
Now, we discuss the choice of the initial control variates. We identify a control variate by a tuple $(m, \lambda)$ of a moment vector and a time-weighting parameter. That is, we use $f(x) = x^m$ and $w(t) = e^{\lambda t}$ in (7). For a given set of parameters $\lambda$, we use all moment vectors up to some fixed order (line 3). The ideal set of parameters is generally not known. For certain choices the correlation of the control variates and the variable of interest is higher than for others. To illustrate this, consider the above example of the birth-death process. Choosing $m = 1$ and $\lambda$ equal to the decay rate constant cancels the time-integral terms in (9) and leads to a control variate that has a perfect correlation with $X_T$. Therefore, this would be the ideal choice of initial value for $\lambda$. This, however, is generally not known. Therefore, we sample a set of $\lambda$'s from some fixed distribution (line 2).
7 Case Studies
We first define a criterion of efficiency in order to judge whether the reduction in variance is worth the increased cost. A natural baseline for a variance reduction is that paying for the overhead of the reduction should be cheaper than generating more samples to achieve a similar reduction of variance. Let $\hat{\sigma}^2$ and $\hat{\sigma}_{\mathrm{CV}}^2$ be the variances of the standard and the control variate estimator, and $\hat{c}$ and $\hat{c}_{\mathrm{CV}}$ the respective average costs of a single run. The efficiency of the method is the ratio of the cost necessary to achieve a given variance with the standard estimate to the cost necessary with the CV estimate [22], i.e.
\[ \text{efficiency} = \frac{\hat{\sigma}^2\, \hat{c}}{\hat{\sigma}_{\mathrm{CV}}^2\, \hat{c}_{\mathrm{CV}}}. \tag{14} \]
That ratio depends on both the specific implementation and the technical setup. The simulation is implemented in the Rust programming language (https://www.rust-lang.org). The model description is parsed from a high-level specification. Rate functions are compiled to stack programs for fast evaluation. The cost increase is mainly due to the computation of the integrals in (8), but the repeated checking of control variates for efficiency also increases the cost. The accumulation over the trajectory directly increases the cost of a single simulation, which is the critical part of the estimation. To estimate the baseline cost, 2000 estimations were performed without considering any control variates.
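The efficiency criterion of (14) amounts to comparing variance–cost products. A minimal helper, applied to made-up numbers:

```python
def efficiency(var_std, cost_std, var_cv, cost_cv):
    """Efficiency ratio in the sense of (14): cost of reaching a target
    variance with the standard estimator divided by the cost with the
    control variate estimator; values above 1 favor the control variates."""
    return (var_std * cost_std) / (var_cv * cost_cv)

# illustrative: quartering the variance at double the per-run cost
ratio = efficiency(var_std=1.0, cost_std=1.0, var_cv=0.25, cost_cv=2.0)
```

In this hypothetical scenario the ratio is 2, i.e. the control variates pay for their overhead twice over; with no variance reduction the same cost increase would push the ratio below break-even.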
We consider four non-trivial case studies. Three models exhibit complex multimodal behaviour. We now describe the models and the estimated quantities in detail.
The first model is a simple dimerization on a countably infinite state-space.
Model 2 (Dimerization)
We first examine a simple dimerization model on an unbounded state-space, in which a monomer $M$ is produced at a constant rate and two monomers can irreversibly form a dimer $D$:
\[ \emptyset \xrightarrow{k_1} M, \qquad 2M \xrightarrow{k_2} D. \]
Despite the model's simplicity, the moment equations are not closed for this system, due to the second reaction, which is non-linear. Therefore a direct analysis of the expected value would require a closure. For this model we will estimate the mean monomer count at a fixed time horizon $T$.
The following two models are bimodal, i.e. they each possess two stable regimes among which they can switch stochastically. For both models we choose the initial conditions such that the process will move towards either attracting region with equal probability.
Model 3 (Distributive Modification)
The distributive modification model was introduced in [7]. It consists of bimolecular modification reactions among three species whose total molecule count is conserved, with initial conditions placing the process between the two stable regimes.
Model 4 (Exclusive Switch)
The exclusive switch model consists of 5 species, 3 of which are typically binary (the activity states of the promoter). Two products are expressed from a shared promoter region, degrade, and can reversibly bind the promoter, each blocking the expression of its competitor while bound. The initial conditions are chosen such that the switch moves towards either of its two stable regimes with equal probability.
We evaluate the influence of algorithm parameters, choices of distributions to sample $\lambda$ from, and the influence of the sample size on the efficiency of the proposed method. Note that the implementation does not simplify the constraint representations or the state-space according to stoichiometric invariants or limited state-spaces. Model 3, for example, has an invariant total molecule count, which could be used to reduce the state-space dimensionality to two. In Model 4 the invariant that exactly one of the three promoter states is occupied at any time could be used to optimize the algorithm by eliminating redundant moments. Such an optimization would further increase the efficiency of the algorithm.
We first turn to the choice of the distribution from which the weighting parameters $\lambda$ are sampled. Here we consider two choices:
- a standard normal distribution $\lambda \sim \mathcal{N}(0, 1)$,
- a uniform distribution over a fixed interval.
We deterministically include $\lambda = 0$ in the constraint set, as this parameter corresponds to a uniform weighting function. We performed estimations on the case studies using different valuations of the algorithm parameters, i.e. the minimum threshold $\rho_{\min}$ and the number of sampled $\lambda$ values. We used a fixed sample size and periodically checked the control variates for the defined criteria. For each valuation 1000 estimations were performed. In Figure 2b, we summarize the efficiencies for the arising parameter combinations on the three case studies. Most strikingly, we can note that the efficiency was consistently larger than one in all cases. Generally, the normal sampling distribution outperformed the alternative uniform distribution, except in the case of the dimerization. The reason for this becomes apparent when examining Figure 1b,c: in the case of the dimerization model the most efficient constraints are found for larger values of $\lambda$, while in the case of the distributive modification they are located just above 0 (we observe a similar pattern for the exclusive switch case study). Therefore, sampling efficient values is more likely using a uniform distribution for the dimerization case study than it is for the others. Given that larger absolute values of $\lambda$ seem unreasonable due to their exponential influence on the weighting function, and given the problem of fixing a suitable interval for a uniform sampling scheme, the choice of a standard normal distribution for $\lambda$ seems superior.
In Figure 3 we compare efficiencies for different maximum orders of the constraint moments. This comparison is performed for different choices of the redundancy rule $f$ and of the initial sample size. Again, for each parameter valuation 1000 estimations were performed. With respect to the maximum constraint order we see a clear tendency that the inclusion of second-order constraints lessens the efficiency of the method. In the case of a constant redundancy threshold it even dips below break-even for the distributive modification case study. This is not surprising, since the inclusion of second-order moments increases the number of initial constraints quadratically, and the incurred cost, especially over the first iterations, lessens the efficiency.
Figure 4 depicts the trade-off between the variance reduction and the cost ratio. Comparing the redundancy criterion based on a constant threshold to the others, we observe both a larger variance reduction and an increased cost. This is due to the fact that more control variates are retained throughout the simulations (cf. Appendix A, Tables 1, 2). Depending on the sample distribution and the case study, this permissive strategy may pay off. In the case of the dimerization, for example, it pays off, while in the case of the distributive modification it leads to a lower efficiency ratio. In the latter case the model is more complex, and therefore the set of initial control variates is larger. With a more permissive redundancy strategy, more control variates are kept (ca. 10 when using the constant threshold vs. ca. 2–3 for the others). The other redundancy boundaries move the results further in the direction of less variance reduction while keeping the cost increase low. On the opposite end of the spectrum is the linear boundary. The quadratic versions can be found in the middle of this spectrum.
We also observe that an increase in the number of sampled $\lambda$ values is particularly beneficial if the sampling distribution does not capture the parameter region of the highest correlations well. This can be seen for the dimerization case study, where the variance reduction increases strongly with increasing sample size (App. A, Tables 1, 2). Since the most efficient values of $\lambda$ lie far from the mode of the sampling distribution, more samples are needed to find them (cf. Figure 1b).
Finally, we discuss the effect of the sample size $n$ on the efficiency. In Figure 5 we give both the efficiencies and the slowdown for different sample sizes. As a redundancy rule we used the unscaled quadratic function with 30 initial values of $\lambda$. With increasing sample size, the efficiency usually approaches an upper limit. This is due to the fact that most control variates are dropped early on, and the retained control variates often remain the same for the rest of the simulations. If we assume that there are no helpful control variates in the initial set and that all of them are removed at iteration 100, the efficiency approaches 1 as $n$ grows, since the initial overhead is amortized.
8 Conclusion
In this work we have shown that known constraints on the moment dynamics can be successfully leveraged in simulation-based estimation of expected values. The empirical results indicate that supplementing a standard SSA estimation with moment information can drastically reduce the estimators' variance. This reduction is paid for by accumulating information along the trajectory during simulation. However, the reduction is able to compensate for this increase. This means that for fixed costs, using estimates with control variates is more beneficial than using estimates without control variates.
While a variety of algorithmic variants was evaluated, many aspects remain subject to further study. In particular, different choices of the functions $f$ and $w$ in (7) may improve efficiency further. These choices become particularly interesting when moving from the estimation of simple first-order moments to more complex queries such as behavioural probabilities of trajectories. In such cases, one might even attempt to find efficient control variate functions using machine learning methods.
Another open question regarding this work is its performance when multiple quantities instead of a single quantity are to be estimated. In such a case, constraints would be particularly beneficial if they lead to improvements for as many estimation targets as possible.
Furthermore, the identification of the best weighting parameters $\lambda$ could be done in a more adaptive fashion. The presented scheme of sampling from a fixed distribution could be extended into a Bayesian-like procedure, wherein the values of $\lambda$ are periodically re-sampled from a distribution that is adjusted according to the best-performing constraints up to that point.
References
 [1] Ale, A., Kirk, P., Stumpf, M.P.: A general moment expansion method for stochastic kinetic models. The Journal of chemical physics 138(17), 174101 (2013)
 [2] Andreychenko, A., Mikeev, L., Spieler, D., Wolf, V.: Parameter identification for Markov models of biochemical reactions. In: International Conference on Computer Aided Verification. pp. 83–98. Springer (2011)
 [3] Backenköhler, M., Bortolussi, L., Wolf, V.: Momentbased parameter estimation for stochastic reaction networks in equilibrium. IEEE/ACM Transactions on Computational Biology and Bioinformatics (TCBB) 15(4), 1180–1192 (2018)
 [4] Bortolussi, L., Hillston, J., Latella, D., Massink, M.: Continuous approximation of collective system behaviour: A tutorial. Performance Evaluation 70(5), 317–349 (2013)
 [5] Bortolussi, L., Milios, D., Sanguinetti, G.: Efficient stochastic simulation of systems with multiple time scales via statistical abstraction. In: International Conference on Computational Methods in Systems Biology. pp. 40–51. Springer (2015)
 [6] Cao, Y., Gillespie, D.T., Petzold, L.R.: The slowscale stochastic simulation algorithm. The Journal of chemical physics 122(1), 014116 (2005)
 [7] Cardelli, L., CsikászNagy, A.: The cell cycle switch computes approximate majority. Scientific reports 2, 656 (2012)
 [8] Cheng, R.C.: Analysis of simulation experiments under normality assumptions. Journal of the Operational Research Society 29(5), 493–497 (1978)
 [9] Dowdy, G.R., Barton, P.I.: Bounds on stochastic chemical kinetic systems at steady state. The Journal of chemical physics 148(8), 084106 (2018)
 [10] Dowdy, G.R., Barton, P.I.: Dynamic bounds on stochastic chemical kinetic systems using semidefinite programming. The Journal of chemical physics 149(7), 074103 (2018)
 [11] Engblom, S.: Computing the moments of high dimensional solutions of the master equation. Applied Mathematics and Computation 180(2), 498–515 (2006)
 [12] Ghusinga, K.R., Lamperski, A., Singh, A.: Estimating stationary characteristic functions of stochastic systems via semidefinite programming. In: 2018 European Control Conference (ECC). pp. 2720–2725. IEEE (2018)
 [13] Ghusinga, K.R., VargasGarcia, C.A., Lamperski, A., Singh, A.: Exact lower and upper bounds on stationary moments in stochastic biochemical systems. Physical biology 14(4), 04LT01 (2017)
 [14] Gillespie, D.T.: The chemical Langevin equation. The Journal of Chemical Physics 113(1), 297–306 (2000)
 [15] Gillespie, D.T.: Approximate accelerated stochastic simulation of chemically reacting systems. The Journal of Chemical Physics 115(4), 1716–1733 (2001)
 [16] Gillespie, D.: Exact stochastic simulation of coupled chemical reactions. The Journal of Physical Chemistry 81(25), 2340–2361 (1977)
 [17] Glasserman, P., Yu, B.: Large sample properties of weighted Monte Carlo estimators. Operations Research 53(2), 298–312 (2005)
 [18] Henzinger, T.A., Mateescu, M., Wolf, V.: Sliding window abstraction for infinite Markov chains. In: International Conference on Computer Aided Verification. pp. 337–352. Springer (2009)
 [19] Kuntz, J., Thomas, P., Stan, G.B., Barahona, M.: Rigorous bounds on the stationary distributions of the chemical master equation via mathematical programming. arXiv preprint arXiv:1702.05468 (2017)
 [20] Kuwahara, H., Mura, I.: An efficient and exact stochastic simulation method to analyze rare events in biochemical systems. The Journal of chemical physics 129(16), 10B619 (2008)
 [21] Lavenberg, S.S., Moeller, T.L., Welch, P.D.: Statistical results on control variables with application to queueing network simulation. Operations Research 30(1), 182–202 (1982)
 [22] L’Ecuyer, P.: Efficiency improvement and variance reduction. In: Proceedings of the 26th conference on Winter simulation. pp. 122–132. Society for Computer Simulation International (1994)
 [23] Mateescu, M., Wolf, V., Didier, F., Henzinger, T.: Fast adaptive uniformisation of the chemical master equation. IET Systems Biology 4(6), 441–452 (2010)
 [24] Munsky, B., Khammash, M.: The finite state projection algorithm for the solution of the chemical master equation. The Journal of Chemical Physics 124(4), 044104 (2006)
 [25] Nelson, B.L.: Control variate remedies. Operations Research 38(6), 974–992 (1990)
 [26] Sakurai, Y., Hori, Y.: A convex approach to steady state moment analysis for stochastic chemical reactions. In: 2017 IEEE 56th Annual Conference on Decision and Control (CDC). pp. 1206–1211. IEEE (2017)
 [27] Sakurai, Y., Hori, Y.: Bounding transient moments of stochastic chemical reactions. IEEE Control Systems Letters 3(2), 290–295 (2019)
 [28] Schnoerr, D., Sanguinetti, G., Grima, R.: Comparison of different moment-closure approximations for stochastic chemical kinetics. The Journal of Chemical Physics 143(18), 11B610_1 (2015)
 [29] Singh, A., Hespanha, J.P.: Lognormal moment closures for biochemical reactions. In: Proceedings of the 45th IEEE Conference on Decision and Control. pp. 2063–2068. IEEE (2006)
 [30] Spieler, D.: Numerical analysis of long-run properties for Markov population models. Ph.D. thesis, Saarland University (2014)
 [31] Szechtman, R.: Control variate techniques for Monte Carlo simulation. In: Proceedings of the 35th Conference on Winter Simulation: Driving Innovation. pp. 144–149. Winter Simulation Conference (2003)
 [32] Van Kampen, N.G.: Stochastic processes in physics and chemistry, vol. 1. Elsevier (1992)
 [33] Wilson, J.R.: Variance reduction techniques for digital simulation. American Journal of Mathematical and Management Sciences 4(3–4), 277–312 (1984)
Appendix A Supplementary Material
In Figure 6 we give detailed information on the influence of the algorithm parameters, the number of initial values, and different redundancy rules. The sampling distribution is a standard normal.
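To make the metrics in the tables below concrete, the following is a minimal, self-contained sketch of the control-variate idea on a toy problem, not the paper's moment-constraint construction: we estimate E[exp(U)] for U uniform on (0, 1), using Y = U as a control variate with exactly known mean E[Y] = 0.5. The function name `control_variate_demo` and all parameter choices are illustrative assumptions.

```python
import math
import random
import statistics

def control_variate_demo(n=10_000, seed=1):
    """Toy control-variate estimator (illustration only): estimate
    E[exp(U)], U ~ Uniform(0, 1), using the control Y = U with known
    mean E[Y] = 0.5."""
    rng = random.Random(seed)
    xs, ys = [], []
    for _ in range(n):
        u = rng.random()
        xs.append(math.exp(u))  # quantity of interest
        ys.append(u)            # control variate with known mean
    mx, my = statistics.mean(xs), statistics.mean(ys)
    # Estimate the (near-)optimal coefficient b* = Cov(X, Y) / Var(Y)
    # from the same sample.
    cov_xy = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (n - 1)
    b = cov_xy / statistics.variance(ys)
    # Adjusted samples keep the same expectation but have smaller variance,
    # because the known deviation of Y from its mean is subtracted out.
    adj = [x - b * (y - 0.5) for x, y in zip(xs, ys)]
    return (mx, statistics.variance(xs),
            statistics.mean(adj), statistics.variance(adj))
```

Because exp(U) and U are strongly correlated, the adjusted samples have a much smaller variance than the plain ones; relating such a variance reduction to the extra per-sample cost is the kind of efficiency trade-off the tables in this appendix quantify.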
| Case study | | | slowdown | efficiency | |
|---|---|---|---|---|---|
| Dimerization | 10 | 0.881641 | 1.530137 | 5.558692 | 1.621917 |
| Dimerization | 10 | 0.965224 | 1.945588 | 14.859417 | 3.338501 |
| Dimerization | 10 | 0.916445 | 1.625232 | 7.409904 | 1.997045 |
| Dimerization | 10 | 0.868288 | 1.380344 | 5.529745 | 1.081152 |
| Dimerization | 20 | 0.941153 | 1.637272 | 10.437978 | 1.842971 |
| Dimerization | 20 | 0.964204 | 1.907999 | 14.747328 | 2.915082 |
| Dimerization | 20 | 0.947984 | 1.747519 | 11.072422 | 2.227250 |
| Dimerization | 20 | 0.931030 | 1.433401 | 10.169570 | 1.088572 |
| Dimerization | 30 | 0.959517 | 1.723449 | 14.404936 | 1.972426 |
| Dimerization | 30 | 0.962514 | 1.770936 | 15.142156 | 2.216103 |
| Dimerization | 30 | 0.966216 | 1.847441 | 16.117387 | 2.446661 |
| Dimerization | 30 | 0.945724 | 1.456432 | 12.710196 | 1.084188 |
| Dist. mod. | 10 | 0.619560 | 1.488483 | 1.770218 | 3.232575 |
| Dist. mod. | 10 | 0.700255 | 2.241695 | 1.492171 | 8.008607 |
| Dist. mod. | 10 | 0.643550 | 1.613001 | 1.743500 | 3.817641 |
| Dist. mod. | 10 | 0.596650 | 1.459405 | 1.703170 | 2.657000 |
| Dist. mod. | 20 | 0.697414 | 1.519425 | 2.181687 | 2.631677 |
| Dist. mod. | 20 | 0.713445 | 2.706546 | 1.292838 | 10.295856 |
| Dist. mod. | 20 | 0.697654 | 1.585313 | 2.092817 | 3.398235 |
| Dist. mod. | 20 | 0.695846 | 1.473976 | 2.235418 | 2.226530 |
| Dist. mod. | 30 | 0.712941 | 1.543068 | 2.263644 | 2.378037 |
| Dist. mod. | 30 | 0.721354 | 2.874249 | 1.252541 | 10.910880 |
| Dist. mod. | 30 | 0.711877 | 1.607712 | 2.164485 | 2.979704 |
| Dist. mod. | 30 | 0.669963 | 1.522184 | 1.996300 | 2.085473 |
| Excl. switch | 10 | 0.807184 | 1.227471 | 4.239255 | 2.536479 |
| Excl. switch | 10 | 0.880285 | 1.633530 | 5.135205 | 7.411732 |
| Excl. switch | 10 | 0.849082 | 1.312416 | 5.067770 | 3.639250 |
| Excl. switch | 10 | 0.783459 | 1.195821 | 3.874778 | 2.090101 |
| Excl. switch | 20 | 0.856593 | 1.263340 | 5.539683 | 2.206154 |
| Excl. switch | 20 | 0.910480 | 1.864405 | 6.011256 | 9.441336 |
| Excl. switch | 20 | 0.867987 | 1.317958 | 5.765884 | 3.140806 |
| Excl. switch | 20 | 0.825518 | 1.243075 | 4.627662 | 1.981143 |
| Excl. switch | 30 | 0.869165 | 1.298893 | 5.905196 | 2.059415 |
| Excl. switch | 30 | 0.921019 | 1.966191 | 6.461331 | 9.928998 |
| Excl. switch | 30 | 0.876822 | 1.340409 | 6.079876 | 2.762449 |
| Excl. switch | 30 | 0.843288 | 1.288925 | 4.968796 | 1.983174 |
| Case study | | | slowdown | efficiency | |
|---|---|---|---|---|---|
| Dimerization | 10 | 0.905254 | 1.659799 | 6.402491 | 2.388472 |
| Dimerization | 10 | 0.987526 | 2.474939 | 33.074955 | 6.501180 |
| Dimerization | 10 | 0.923063 | 1.822654 | 7.195544 | 3.179257 |
| Dimerization | 10 | 0.878232 | 1.415909 | 5.830248 | 1.092264 |
| Dimerization | 20 | 0.949038 | 1.831995 | 10.797164 | 2.890898 |
| Dimerization | 20 | 0.985710 | 2.391457 | 29.704344 | 5.450299 |
| Dimerization | 20 | 0.968076 | 2.021487 | 15.662368 | 3.681229 |
| Dimerization | 20 | 0.925413 | 1.449386 | 9.298961 | 1.072761 |
| Dimerization | 30 | 0.964855 | 1.924268 | 14.911787 | 3.026275 |
| Dimerization | 30 | 0.981507 | 2.144089 | 25.520987 | 4.179125 |
| Dimerization | 30 | 0.973902 | 2.095985 | 18.507746 | 3.685851 |
| Dimerization | 30 | 0.948349 | 1.507425 | 12.904707 | 1.074538 |
| Dist. mod. | 10 | 0.619450 | 1.734737 | 1.519168 | 3.148184 |
| Dist. mod. | 10 | 0.665361 | 3.301159 | 0.909443 | 13.456259 |
| Dist. mod. | 10 | 0.680592 | 1.840457 | 1.705876 | 3.864240 |
| Dist. mod. | 10 | 0.612674 | 1.662962 | 1.556868 | 2.659592 |
| Dist. mod. | 20 | 0.684789 | 1.811408 | 1.755652 | 2.687379 |
| Dist. mod. | 20 | 0.689835 | 4.455005 | 0.726640 | 17.609554 |
| Dist. mod. | 20 | 0.687665 | 1.901258 | 1.688449 | 3.413595 |
| Dist. mod. | 20 | 0.651262 | 1.770238 | 1.623924 | 2.266729 |
| Dist. mod. | 30 | 0.690602 | 1.922217 | 1.686011 | 2.375455 |
| Dist. mod. | 30 | 0.649191 | 4.837419 | 0.591701 | 19.145054 |
| Dist. mod. | 30 | 0.701253 | 2.001179 | 1.677062 | 3.007525 |
| Dist. mod. | 30 | 0.639123 | 1.894074 | 1.467403 | 2.086275 |
| Excl. switch | 10 | 0.811956 | 1.505521 | 3.544783 | 2.323999 |
| Excl. switch | 10 | 0.916866 | 4.507566 | 2.681363 | 21.692390 |
| Excl. switch | 10 | 0.868874 | 1.776190 | 4.309354 | 4.739893 |
| Excl. switch | 10 | 0.795802 | 1.466579 | 3.353046 | 2.016196 |
| Excl. switch | 20 | 0.832562 | 1.657484 | 3.617313 | 2.085711 |
| Excl. switch | 20 | 0.934280 | 6.348223 | 2.406431 | 29.976320 |
| Excl. switch | 20 | 0.878944 | 1.879341 | 4.416281 | 3.990881 |
| Excl. switch | 20 | 0.837922 | 1.647329 | 3.759896 | 1.978017 |
| Excl. switch | 30 | 0.829427 | 1.844766 | 3.190308 | 2.043201 |
| Excl. switch | 30 | 0.947324 | 7.130628 | 2.673225 | 32.513670 |
| Excl. switch | 30 | 0.878830 | 2.053317 | 4.034987 | 3.611746 |
| Excl. switch | 30 | 0.824936 | 1.838879 | 3.118728 | 1.978836 |