Bayesian inference for Markov jump processes with informative observations

Bayesian inference for Markov jump processes with informative observations


In this paper we consider the problem of parameter inference for Markov jump process (MJP) representations of stochastic kinetic models. Since transition probabilities are intractable for most processes of interest yet forward simulation is straightforward, Bayesian inference typically proceeds through computationally intensive methods such as (particle) MCMC. Such methods ostensibly require the ability to simulate trajectories from the conditioned jump process. When observations are highly informative, use of the forward simulator is likely to be inefficient and may even preclude an exact (simulation based) analysis. We therefore propose three methods for improving the efficiency of simulating conditioned jump processes. A conditioned hazard is derived based on an approximation to the jump process, and used to generate end-point conditioned trajectories for use inside an importance sampling algorithm. We also adapt a recently proposed sequential Monte Carlo scheme to our problem. Essentially, trajectories are reweighted at a set of intermediate time points, with more weight assigned to trajectories that are consistent with the next observation. We consider two implementations of this approach, based on two continuous approximations of the MJP. We compare these constructs for a simple tractable jump process before using them to perform inference for a Lotka-Volterra system. The best performing construct is used to infer the parameters governing a simple model of motility regulation in Bacillus subtilis.

Keywords: Bayes; chemical Langevin equation (CLE); linear noise approximation (LNA); Markov jump process (MJP); pMCMC; particle marginal Metropolis-Hastings (PMMH); sequential Monte Carlo (SMC); stochastic kinetic model (SKM).


Stochastic kinetic models, most naturally represented by Markov jump processes (MJPs), can be used to model a wide range of real-world phenomena including the evolution of biological systems such as intra-cellular processes [24], predator-prey interaction [8] and epidemics [5]. The focus of this paper is to perform exact and fully Bayesian inference for the parameters governing the MJP, using discrete time course observations that may be incomplete and subject to measurement error. A number of recent attempts to address the inference problem have been made. For example, a data augmentation approach was adopted by [8] and applied to discrete (and error-free) observations of a Lotka-Volterra process. The particle marginal Metropolis-Hastings (PMMH) algorithm of [1] has been applied by [25] and [39] to estimate the parameters in model auto-regulatory networks.

The PMMH algorithm offers a promising approach, as it permits a joint update of the parameters and latent process, thus alleviating mixing problems associated with strong correlations. Moreover, the simplest approach is “likelihood-free” in the sense that only forward simulations from the MJP are required. These simulations can be readily obtained by using, for example, Gillespie’s direct method [20]. The PMMH scheme requires running a sequential Monte Carlo (SMC) scheme (such as the bootstrap particle filter of [26]) at every iteration. Given the potential for huge computational burden, improvements to the overall efficiency of PMMH for MJPs has been the focus of [23]. The latter propose a delayed acceptance analogue of PMMH, (daPMMH), that uses approximations to the MJP such as the chemical Langevin equation (CLE) and linear noise approximation (LNA) [43] to screen out parameter draws that are likely to be rejected by the sampler. It should be noted that the simplest likelihood free implementations of both PMMH and daPMMH are likely to perform poorly unless the noise in the measurement error process dominates the intrinsic stochasticity in the MJP. Essentially, in low measurement error cases, only a small number of simulated trajectories will be given reasonable weight inside the SMC scheme, leading to highly variable estimates of marginal likelihood used by the PMMH scheme to construct the acceptance probability. Intolerably long mixing times ensue, unless computational budget permits a large number of particles to be used. In the special case of error-free observation, the algorithm will be impracticable for models of realistic size and complexity, since in this case, trajectories must “hit” the observations.

The development of efficient schemes for generating MJP trajectories that are conditioned on the observations, henceforth referred to as MJP bridges, is therefore of paramount importance. Whilst there is considerable work on the construction of bridges for continuous valued Markov (diffusion) processes [13], seemingly little has been done for discrete state spaces. The approach taken by [8] linearly interpolates the hazard function between observation times but requires full and error-free observation of the system of interest. [15] consider an importance sampling algorithm for finite state Markov processes where informative observations are dealt with by sampling reaction times from a truncated exponential distribution and reaction type probabilities are weighted by the probability of reaching the next observation. [27] improve the performance of particle-based Monte Carlo algorithms by analytically marginalising waiting times. The method requires a user-specified potential to push trajectories towards the observation.

Our novel contribution is an MJP bridge obtained by sampling a jump process with a conditioned hazard that is derived by approximating the expected number of reaction events between observations, given the observations themselves. The resulting hazard is time dependent, however, we find that a simple implementation based on exponential waiting times between proposed reaction events works well in practice. We also adapt the recently proposed bridge particle filter of [10] to our problem. Their scheme works by generating forward simulations from the process of interest, and reweighting at a set of intermediate times at which resampling may also take place. A look ahead step in the spirit of [32] prunes out trajectories that are inconsistent with the next observation. The implementation requires an approximation to the (unavailable) transition probability governing the MJP. [10] used a flexible Gaussian process to approximate the unavailable transition density of a diffusion process. Here, we take advantage of two well known continuous time approximations of the MJP by considering use of the transition density under a discretisation of the CLE or the tractable transition density under the LNA. The methods we propose are simple to implement and are not restricted to finite state spaces.

In section 2, a review of the basic structure of the problem is presented, showing how the Markov process representation of a reaction network is constructed. In section 3, we consider the problem of sampling conditioned MJPs and give three viable solutions to this problem. In section 4, it is shown how the recently proposed particle MCMC algorithms [1] may be applied to this class of models. It is also shown how the bridge constructs introduced in section 3 can be used with a pMCMC scheme. The methodology is applied to a number of applications in section 5 before conclusions are drawn in section 6.

2Stochastic kinetic models

We consider a reaction network involving species and reactions , with a typical reaction denoted by and written using standard chemical reaction notation as

Let denote the number of molecules of species at time , and let be the -vector . The dynamics of this model can be described by a vector of rates (or hazards) of the reactions together with a matrix which describes the effect of each reaction on the state. We therefore define a rate function , giving the overall hazard of a type reaction occurring, and we let this depend explicitly on the reaction rate constant , as well as the state of the system at time . We model the system with a Markov jump process (MJP), so that for an infinitesimal time increment , the probability of a type reaction occurring in the time interval is . When a type reaction does occur, the system state changes discretely, via the th row of the so called net effect matrix , a matrix with th element given by . In what follows, for notational convenience, we work with the stoichiometry matrix defined as . Under the standard assumption of mass action kinetics, the hazard function for a particular reaction of type takes the form of the rate constant multiplied by a product of binomial coefficients expressing the number of ways in which the reaction can occur, that is,

Values for and the initial system state complete specification of the Markov process. Although this process is rarely analytically tractable for interesting models, it is straightforward to forward-simulate exact realisations of this Markov process using a discrete event simulation method. This is due to the fact that if the current time and state of the system are and respectively, then the time to the next event will be exponential with rate parameter

and the event will be a reaction of type with probability independently of the waiting time. Forwards simulation of process realisations in this way is typically referred to as Gillespie’s direct method in the stochastic kinetics literature, after [20]. See [46] for further background on stochastic kinetic modelling.

The primary goal of this paper is that of inference for the stochastic rate constants , given potentially noisy observations of the system state at a set of discrete times. [25] demonstrated that it is possible to use a particle marginal Metropolis-Hastings (PMMH) scheme for such problems, using only the ability to forward simulate from the system of interest and evaluate the density associated with the observation error process. This “likelihood free” implementation uses the bootstrap particle filter of [26]. As noted by [25], the efficiency of this algorithm is likely to be very poor when observations are highly informative. Moreover, in the special case of error-free observation, the algorithm will be computationally intractable for models of realistic size and complexity. We therefore consider three constructions for generating realisations of conditioned jump processes, for use in a PMMH scheme. These constructs rely on the ability to construct both cheap and accurate approximations of the MJP. We therefore consider two candidate approximations in the next section.

2.1SKM approximations

Chemical Langevin equation

Over an infinitesimal time interval, , the reaction hazards will remain constant almost surely. The occurrence of reaction events can therefore be regarded as the occurrence of events of a Poisson process with independent realisations for each reaction type. Therefore, the mean and variance of the change in the MJP over the infinitesimal time interval can be calculated as

The Itô stochastic differential equation (SDE) that has the same infinitesimal mean and variance as the true MJP is therefore

where (without loss of generality) is a matrix and is a -vector of standard Brownian motion. Equation is the SDE most commonly referred to as the chemical Langevin equation (CLE), and can be shown to approximate the SKM increasingly well in high concentration scenarios [22]. The CLE can rarely be solved analytically, and it is common to work with a discretisation such as the Euler-Maruyama discretisation:

where is a standard multivariate Gaussian random variable.

For a more formal discussion of the CLE and its derivation, we refer the reader to [21] and [22].

Linear noise approximation

The linear noise approximation (LNA) further approximates the MJP by linearising the drift and noise terms of the CLE. The LNA generally possesses a greater degree of numerical and analytic tractability than the CLE. For example, the LNA solution involves (numerically) integrating a set of ODEs for which standard routines, such as the lsoda package [35], exist. The LNA can be derived in a number of more or less formal ways [30]. Our brief derivation follows the approach of [46] to which we refer the reader for further details.

We begin by replacing the hazard function in equation (Equation 1) with the rescaled form where is the volume of the container in which the reactions are taking place. Note that the LNA approximates the CLE increasingly well as and become large, that is, as the system approaches its thermodynamic limit. The CLE then becomes

We now write the solution of the CLE as a deterministic process plus a residual stochastic process [43],

We then Taylor expand the rate function around to give

where is the Jacobian matrix with th element and we suppress the dependence of on and for simplicity. Substituting (Equation 3) and (Equation 4) into equation (Equation 2) and collecting terms of and give the ODE satisfied by , and SDE satisfied by respectively, as

Equations (Equation 3), (Equation 5) and ( ?) give the linear noise approximation of the CLE and in turn, an approximation of the Markov jump process model.

For fixed or Gaussian initial conditions, that is , the SDE in ( ?) can be solved explicitly to give

where and satisfy the coupled ODE system given by

Hence we obtain

In what follows we assume, without loss of generality, that .

3Sampling conditioned MJPs

We suppose that interest lies in the Markov jump process over an interval denoted by . In fact, it is convenient to denote by the collection of reaction times and types over the interval , which in turn gives the sample path of each species over this interval. Suppose further that the initial state is a known fixed value and that (a subset of components of) the process is observed at time subject to Gaussian error, giving a single observation on the random variable

Here, is a length- vector, is a constant matrix of dimension and is a length- Gaussian random vector. We denote the density linking and as . For simplicity, we assume in this section that both and the values of the rate constants are known, and drop them from the notation where possible.

Our goal is to generate trajectories from with associated probability function

where is the probability function associated with . Although will typically be intractable, simulation from is straightforward (via Gillespie’s direct method), suggesting the construction of a numerical scheme such as Markov chain Monte Carlo or importance sampling. In keeping with the pMCMC methods described in section 4, we focus on the latter.

The simplest importance sampling strategy (given in Algorithm ?) proposes from and weights by . If desired, an unweighted sample can easily be obtained by resampling (with replacement) from the discrete distribution over trajectory draws using the normalised weights as probabilities. Plainly, taking the average of the unnormalised weights gives an unbiased estimate of the normalising constant

This strategy is likely to work well provided that is not particularly informative. The proposal mechanism is independent of the observation and as is reduced, the variance of the importance weights increases. In an error free scenario, with , the unnormalised weights take the value 1 if and are 0 otherwise. Hence, in this extreme scenario, only trajectories that “hit” the observation have non-zero weight.

In order to circumvent these problems, in Section 3.1 we derive a novel proposal mechanism based on an approximation of the expected number of reaction events over the interval of interest, conditioned on the observation. In addition, in Section 3.2 we adapt a recently proposed bridge particle filter [10] to our problem.

3.1Conditioned hazard

We suppose that we have simulated as far as time and derive an approximation of the expected number of reaction events over the interval . Let denote the number of reaction events over the time . We approximate by assuming a constant reaction hazard over . A normal approximation to the corresponding Poisson distribution then gives

where . Under the Gaussian observation regime we have that

Hence, the joint distribution of and can then be obtained approximately as

Taking the expectation of using standard multivariate normal theory, and dividing the resulting expression by gives an approximate conditioned hazard as

A proposed path can then be produced by sampling reaction events according to an inhomogeneous Poisson process with rate given by (Equation 6). An importance sampling scheme based on this proposal mechanism can then be obtained. Although the conditioned hazard in (Equation 6) depends on the current time in a nonlinear way, a simple implementation ignores this time dependence, giving exponential waiting times between proposed reaction events. Algorithm ? describes the mechanism for generating .

To calculate the importance weights, we first note that can be written explicitly by considering the generation of all reaction times and types over . To this end, we let denote the number of reaction events of type , , and define as the total number of reaction events over the interval. Reaction times (assumed to be in increasing order) and types are denoted by , , and we take and . [46] gives , also known as the complete data likelihood over , as

We let denote the complete data likelihood under the approximate jump process with hazard , so that the importance weight for a path is given by

When the inhomogeneous Poisson process approximation is sampled exactly, the importance weight in (Equation 7) becomes

where the last term is seen to be the Radon-Nikodym derivative of the true Markov jump process () with respect to the inhomogeneous Poisson process approximation () and measures the closeness of the approximating process to the true process.

Algorithm ? gives importance sampling algorithm that uses an approximate implementation of the inhomogeneous Poisson process approximation. Note that in the special case of no error, the importance weight in step 1(b) has replaced with an indicator function assigning the value 1 if and 0 otherwise. Upon completion of the algorithm, an equally weighted sample approximately distributed according to is obtained. The average unnormalised weight can be used to (unbiasedly) estimate the normalising constant .

3.2Bridge particle filter

[10] considered the problem of sampling continuous time, continuous valued Markov processes and proposed a bridge particle filter to weight forward trajectories based on an approximation to the unknown transition probabilities at each reweighting step. Here, we adapt their method to our problem. We note that when using the bridge particle filter to sample MJP trajectories, it is possible to obtain a likelihood free scheme.

Without loss of generality, we adopt an equispaced partition of as

This partition is used to determine the times at which resampling may take place. Introduce the weight functions

where , are positive functions. Note that

and write as

where denotes the probability function associated with and the last line (Equation 8) follows by taking to be and absorbing into the proportionality constant. The form of (Equation 8) suggests a sequential Monte Carlo (SMC) scheme (also known as a particle filter) where at time each particle (trajectory) is extended by simulating from and incrementally weighted by . Intuitively, by “looking ahead” to the observation, trajectories that are not consistent with are given small weight and should be pruned out with a resampling step. [10] suggest an adaptive resampling procedure so that resampling is only performed if the effective sample size (ESS) falls below some fraction of the number of particles, say . The ESS is defined [33] as a function of the weights by

It remains that we can choose sensible functions to be used to construct the weights. We propose to use the density associated with under the CLE or LNA:

Note that due to the intractability of the CLE, we propose to use a single step of the Euler-Mauyama approximation. Comments on the relative merits of each scheme are given in Section 3.3.

Algorithm ? gives the sequence of steps necessary to implement the bridge particle filter. The average unnormalised weight obtained at time can be used to estimate the normalising constant :

We now consider some special cases of Algorithm ?. For unknown with prior probability mass function , the target becomes

which suggests that step 1(a) should be replaced by sampling particles . The contribution could either be absorbed into the final weight (taking care to respect the ancestral lineage of the trajectory), or an initial look ahead step could be performed by resampling amongst the with weights proportional to . If the latter strategy is adopted and no additional resampling steps are performed, the algorithm reduces to the auxiliary particle filter of [37], where particles are pre-weighted by and propagated through myopic forward simulation. If no resampling steps are performed at any time, the algorithm reduces to the myopic importance sampling strategy described in Section ?.

In the error free scenario, the target can be written as

where the incremental weight functions are redefined as

The form of the target suggests that at time , particle trajectories should be propagated via and weighted by , provided that the trajectory “hits” the observation , otherwise a weight of 0 should be assigned. Hence, unlike in the continuous state space scenario considered by [10], the algorithm is likelihood free, in the sense that need not be evaluated.

3.3Comments on efficiency

Application of Algorithm ? requires calculation of the conditioned hazard function in (Equation 6) after every reaction event. The cost of this calculation will therefore be dictated by the number of observed components , given that a matrix must be inverted. Despite this, for many systems of interest, it is unlikely that all components will be observed and we anticipate that in practice , where is the number of species in the system. The construction of the conditioned hazard is based on an assumption that the hazard function is constant over diminishing time intervals and that the number of reactions over this interval is approximately Gaussian. The performance of the construct is therefore likely to diminish if applied over time horizons during which the reaction hazards vary substantially. We also note that the elements of the conditioned hazard are not guaranteed to be positive and we therefore truncate each hazard component at zero.

We implement the bridge particle filter in Algorithm ? with the weight functions obtained either through the CLE or the LNA. To maximise statistical efficiency, we require that . Given the analytic intractability of the CLE, we obtain via a single step of an Euler-Maruyama scheme. Whilst this is likely to be computationally efficient, given the simplicity of applying a single step of the Euler-Maruyama scheme, we anticipate that applying the scheme over large time intervals (where non-linear dynamics are observed) is likely to be unsatisfactory. The tractability of the LNA has been recently exploited [29] and shown to give a reasonable approximation to the MJP for a number of reaction networks. However, use of the LNA requires the solution of a system of coupled ODEs. For most stochastic kinetic models of interest, the solution to the LNA ODEs will not be analytically tractable. Whilst good numerical ODE solvers are readily available, the bridge particle filter is likely to require a full numerical solution over the time interval of interest for each particle (except in the special error free case where only a single solution is required). Both the CLE and LNA replace the intractable transition probability with a Gaussian approximation. Moreover, the approximations may be light tailed relative to the target, and erstwhile valid trajectories may be pruned out by the resampling procedure. Tempering the approximations by raising to a power () may alleviate this problem at the expense of choosing an appropriate value for the additional tuning parameter . We assess the empirical performance of each scheme in Section 5.

4Bayesian inference

Consider a time interval over which a Markov jump process is not observed directly, but observations (on a regular grid) are available and assumed conditionally independent (given ) with conditional probability distribution obtained via the observation equation,

As in Section 3, we take to be a length- vector, is a constant matrix of dimension and is a length- Gaussian random vector. We assume that primary interest lies in the rate constants where, in the case of unknown measurement error variance, the parameter vector is augmented to include the parameters of . Bayesian inference may then proceed through the marginal posterior density

where is the prior density ascribed to and is the marginal likelihood. Since the posterior in (Equation 11) will be intractable in practice, samples are usually generated from (Equation 11) via MCMC. A further complication is the intractability of the marginal likelihood term, and we therefore adopt the particle marginal Metropolis-Hastings scheme of [2] which has been successfully applied to stochastic kinetic models in [25] and [23].

4.1Particle marginal Metropolis-Hastings

Since interest lies in the marginal posterior in (Equation 11), we consider the special case of the particle marginal Metropolis-Hastings (PMMH) algorithm [2] which can be seen as a pseudo-marginal MH scheme [6]. Under some fairly mild conditions (for which we refer the reader to [9] and [2]), a sequential Monte Carlo scheme targeting the probability associated with the conditioned MJP, , can be implemented to give an unbiased estimate of the marginal likelihood. We write this estimate as where denotes all random variables generated by the SMC scheme according to some density . We now consider a target of the form

for which marginalisation over gives

Hence, a MH scheme targeting with proposal kernel accepts a move from to with probability

We see that the values of need never be stored and it should now be clear that the scheme targets the correct marginal distribution .


Algorithms ? and ? can readily be applied to give an SMC scheme targeting . In each case, an initialisation step should be performed where a weighted sample is obtained by drawing values from some prior with mass function and assigning weights proportional to . If desired, resampling could be performed so that the algorithm is initialised with an equally weighted sample drawn from . Algorithms ? and ? can then be applied sequentially, for times , simply by replacing with . After assimilating all information, an unbiased estimate of the marginal likelihood is obtained as

where and we have dropped from the notation for simplicity. The product in (Equation 12) can be obtained from the output of Algorithms ? and ?. For example, when using the conditioned hazard approach, (Equation 12) is simply the product of the average unnormalised weight obtained in step 1(b). Use of Algorithms ? and ? in this way give SMC schemes that fall into a class of auxiliary particle filters [37]. We refer the reader to [36] for a theoretical treatment of the use of an auxiliary particle filter inside an MH scheme.

The mixing of the PMMH scheme is likely to depend on the number of particles used in the SMC scheme. Whilst the method can be implemented using just particle, the corresponding estimator of marginal likelihood will be highly variable, and the impact of this on the PMMH algorithm will be a poorly mixing chain. As noted by [3], the mixing efficiency of the PMMH scheme decreases as the variance of the estimated marginal likelihood increases. This problem can be alleviated at the expense of greater computational cost by increasing . This therefore suggests an optimal value of and finding this choice is the subject of [40] and [12]. The former show that for a “standard asymptotic regime” should be chosen so that the variance in the noise in the estimated log-posterior is around 3, but find that for low dimensional problems a smaller value (around 2) is optimal. We therefore recommend performing an initial pilot run of PMMH to obtain an estimate of the posterior mean (or median) parameter value, and a (small) number of additional sampled values. The value of should then be chosen so that the variance of the noise in the estimated log-posterior is (ideally) in the range .

Since all parameter values must be strictly positive we adopt a proposal kernel corresponding to a random walk on , with Gaussian innovations. We take the innovation variance to be and follow the practical advice of [40] by tuning to give an acceptance rate of around 15%.


In order to examine the empirical performance of the methods proposed in Section 3, we consider three examples. These are a simple (and tractable) birth-death model, the stochastic Lotka-Volterra model examined by [8] and a systems biology model of bacterial motility regulation [45].


The birth-death reaction network takes the form

with birth and death reactions shown respectively. The stoichiometry matrix is given by

and the associated hazard function is

where denotes the state of the system at time . The CLE is given by

which can be seen as a degenerate case of a Feller square-root diffusion [18]. For reaction networks of reasonable size and complexity, the CLE will be intractable. To explore the effect of working with a numerical approximation of the CLE inside the bridge particle filter, we adopt the Euler-Maruyama approximation which gives (for a fixed initial condition ) an approximation to the transition density as

The ODE system governing the LNA with initial conditions , and can be solved analytically to give

We consider a an example in which and are fixed. To provide a challenging scenario we took to be the upper 99% quantile of . To assess the performance of each algorithm as an observation is made with increasing time sparsity, we took . Algorithms ? (denoted MIS), ? (denoted CH) and ? (denoted BPF-CLE or BPF-LNA) were run with to give a set of estimates of the transition probability and we denote this set by . The bridge particle filter also requires specification of the intermediate time points at which resampling could take place. For simplicity, we took an equispaced partition of with a time step of 0.02 for , and for . We found that these gave a good balance between statistical efficiency and CPU time.

Table 1: , and , based on 5000 runs of MIS, CH, BPF-CLE and BPF-LNA. For MIS, the expected number of non-zero estimates (as obtained analytically) is reported. In all cases, and is the upper 99% quantile of .
MIS 10 300, 293, 6.2 171, 168, 3.5 151, 149, 3.0
50 1340, 1190, 1.2 827, 773, 7.0 682, 639, 5.8
100 2331, 1921, 6.4 1488, 1308, 3.5 1364, 1203, 3.2
500 4776, 3771, 1.2 4196, 3230, 6.8 3901, 3004, 6.1
CH 10 4974, 3264, 1.6 4985, 2998, 7.8 4990, 3581, 2.4
50 5000, 4395, 4.6 5000, 4546, 1.2 5000, 4508, 9.7
100 5000, 4689, 2.4 5000, 4668, 8.5 5000, 4798, 3.8
500 5000, 4921, 7.7 5000, 4943, 1.6 5000, 4939, 1.2
BPF-CLE 10 2581, 349, 5.7 2412, 556, 7.7 2745, 532,
50 4982, 2137, 6.3 4920, 3391, 4.9 3236, 4925, 4.0
100 5000, 3519, 1.9 4998, 3979, 2.8 4999, 4106, 4.1
500 5000, 3841, 1.5 5000, 4756, 6.7 5000, 4780, 3.2
BPF-LNA 10 2634, 403, 4.3 2514, 636, 6.9 2843, 1102, 2.4
50 4963, 2748, 3.2 4926, 3198, 6.0 4949, 3625, 2.8
100 5000, 3612, 1.5 4998, 4055, 2.5 5000, 4016, 1.9
500 5000, 3643, 1.4 5000, 4655, 8.8 5000, 4771, 5.4

To compare the algorithms, we report the number of non-zero normalising constant estimates , the effective sample size whose form is defined in (Equation 9) and mean-squared error given by

where can be obtained analytically [4].

The results are summarised in Table 1. Use of the conditioned hazard and bridge particle filters (CH, BPF-CLE and BPF-LNA) comprehensively outperform the myopic importance sampler (MIS). For example, for the case, an order of magnitude improvement is observed when comparing BPF (CLE or LNA) with MIS in terms of mean squared error. We see a reduction in mean squared error of two orders of magnitude when comparing MIS with CH, across all experiments, and performance (across all metrics) of MIS with is comparable with the performance of CH when . BPF-LNA generally outperforms BPF-CLE, although the difference is small. Running the BPF schemes generally requires twice as much computational effort than MIS, whereas CH is roughly three times slower than MIS. Even when this additional cost is taken into account, MIS cannot be recommended in this example.

Naturally, the performance of BPF will depend on the accuracy of the normal approximations used by the CLE and LNA. In particular, we expect these approximations to be unsatisfactory when species numbers are low. Moreover, when the conditioned jump process exhibits nonlinear dynamics, we expect the Euler approximation to be particularly poor. We therefore repeated the experiments of Table 1 with , , and as the lower 1% quantile of . Results are reported in Table 2. We see that in this case, MIS outperforms BPF and the performance of BPF-CLE worsens as increases, suggesting that a single step of the Euler approximation is unsatisfactory for . Use of the conditioned hazard on the other hand appears fairly robust to different choices of , and .

Table 2: , and , based on 5000 runs of MIS, CH, BPF-CLE and BPF-LNA. For MIS, the expected number of non-zero estimates (as obtained analytically) is reported. In all cases, , and is the lower 1% quantile of .
MIS 5000, 4747, 7.3 4998, 4426, 3.1 4999, 4500, 3.7
CH 5000, 4979, 8.7 5000, 4963, 2.3 5000, 4965, 2.58
BPF-CLE 5000, 4131, 3.9 5000, 3013, 1.4 5000, 3478, 1.8
BPF-LNA 5000, 3946, 3.6 5000, 3667, 1.4 5000, 3639, 1.3


We consider a simple model of predator and prey interaction comprising three reactions:

Denote the current state of the system by where we have dropped dependence of the state on for notational simplicity. The stoichiometry matrix is given by

and the associated hazard function is

We consider three synthetic datasets consisting of 51 observations at integer times on prey and predator levels generated from the stochastic kinetic model using Gillespie’s direct method and corrupted with zero mean Gaussian noise. The observation equation (Equation 10) is therefore

where , . We took to construct the first dataset (), to construct the second () and to give the third synthetic dataset (). In all cases we assumed to be known. True values of the rate constants were taken to be 0.5, 0.0025, and 0.3 following [8]. We took the initial latent state as assumed known for simplicity. Independent proper Uniform priors were ascribed to each