Towards automatic calibration of the number of state particles within the SMC$^2$ algorithm
SMC (Chopin et al., 2013) is an efficient algorithm for sequential estimation and state inference of state-space models. It generates parameter particles , and, for each , it runs a particle filter of size (i.e. at each time step, particles are generated in the state space ). We discuss how to automatically calibrate in the course of the algorithm. Our approach relies on conditional Sequential Monte Carlo updates, monitoring the state of the pseudo random number generator and on an estimator of the variance of the unbiased estimate of the likelihood that is produced by the particle filters, which is obtained using nonparametric regression techniques. We observe that our approach is both less CPU intensive and with smaller Monte Carlo errors than the initial version of SMC.
footnoteinfo]The first author is partially supported by a grant from the French National Research Agency (ANR) as part of the âInvestissements d’Avenirâ program (ANR-11-LABEX-0047). The third author is supported by DARPA under Grant No. FA8750-14-2-0117.
First]N. Chopin Second]J. Ridgway Third]M. Gerber Fourth]O. Papaspiliopoulos
ayesian inference, Estimation algorithms, Hidden Markov models, Monte Carlo simulation, Particle filtering, State space models
Consider a state-space model, with parameter , latent Markov process , and observed process , taking values respectively in and . The model is defined through the following probability densities: has prior , has initial law and Markov transition , and the ’s are conditionally independent, given the ’s, with density . Sequential analysis of such a model amounts to computing recursively (in ) the posterior distributions
or some of its marginals (e.g. ); the normalising constant of the above density is the marginal likelihood (evidence) of the data observed up to time .
For a fixed , the standard approach to sequential analysis of state-space models is particle filtering: one propagates particles in over time through mutation steps (based on proposal distribution at time ) and resampling steps; see Algorithm 1. Note the conventions: denotes the set of integers , is , , , and so on.
Operations involving superscript must be performed for all .
At time :
normalised weights, , and incremental likelihood estimate
Recursively, from time to time :
Sample , the multinomial distribution which generates value with probability .
and incremental likelihood estimate
The output of Algorithm 1 may be used in different ways: at time , the quantity is a consistent (as ) estimator of the filtering expectation ; In addition, is an unbiased estimator of incremental likelihood , and is an unbiased estimator of the full likelihood (Del Moral, 1996, Lemma 3).
In order to perform joint inference on parameter and state variables, Chopin et al. (2013) derived the SMC sampler, that is, a SMC (Sequential Monte Carlo) algorithm in space, which generates and propagates values in , and which, for each , runs a particle filter (i.e. Algorithm 1) for , of size . One issue however is how to choose : if too big, then CPU time is wasted, while if taken too small, then the performance of the algorithm deteriorates. Chopin et al. (2013) give formal results (adapted from Andrieu et al. (2010)) that suggest that should grow at a linear rate during the course of the algorithm. They also propose a practical method for increasing adaptively, based on an importance sampling step where the particle systems, of size , are replaced by new particle systems of size . But this importance sampling step increases the degeneracy of the weights, which in return may leads to more frequent resampling steps, which are expensive. In this paper, we derive an alternative way to increase adaptively, which is not based on importance sampling, but rather on a CSMC (conditional Sequential Monte Carlo) update, which is less CPU intensive.
2 Background on SMC
To explain SMC, we first recall the structure of the IBIS algorithm (Chopin, 2002) as Algorithm 2.1. For a model with parameter , prior , data , and incremental likelihood , IBIS provides at each iteration an approximation of partial posterior . In practice, IBIS samples particles from the prior, then perfoms sequential importance sampling steps, from to using incremental weight .
Operations involving superscript must be performed for all .
Sample , set .
From time to time , do
Update importance weights
If ESS(, sample (for all ) from mixture
where is a Markov kernel with invariant distribution ; finally reset particle system to
To avoid weight degeneracy, one performs a resample-move step (described as Step (b) in Algorithm 2.1). When the ESS (effective sample size) of the weights, computed as:
goes below some threshold (e.g. ), the ’s are resampled, then moved according to some Markov kernel that leaves invariant the current target of the algorithm, . This resample-move step re-introduces diversity among the -particles.
A convenient default choice for is several iterations of random-walk Metropolis, with the random step calibrated to the spread of the current particle population (i.e. variance of random step equals some fraction of the covariance matrix of the resampled particles).
The main limitation of IBIS is that it requires evaluating the likelihood increment , which is typically intractable for state-space models. On the other hand, we have seen that this quantity may be estimated unbiasedly by particle filtering. This suggests combining IBIS (i.e. SMC in the -dimension) with particle filtering (i.e. SMC in the dimension), as done in the SMC algorithm.
The general structure of SMC is recalled as Algorithm 2.2. Essentially, one recognises the IBIS algorithm, where the intractable incremental weight has been replaced by the unbiased estimate . This estimate is obtained from a PF run for ; thus PFs are run in parallel. Denote the random variables generated by the PF associated to .
Operations involving superscript must be performed for all .
Sample , set .
From time to time , do
For each , run iteration of Algorithm 1, so as to obtain , and .
If ESS(, sample (for all ) from mixture
where is a PMCMC kernel with invariant distribution (see text); finally reset particle system to
and , for all .
This ‘double-layer’ structure suggests that SMC suffers from two levels of approximation, and as such that it requires both and to converge. It turns out however that SMC is valid for any fixed value of ; that is, for any fixed , it converges as .
This property is intuitive in the simplified case when resampling-move steps are never triggered (i.e. take ). Then SMC collapses to importance sampling, with weights replaced by unbiased estimates, and it is easy to show convergence from first principles.
We now give a brief outline of the formal justification of SMC for fixed , and refer to Chopin et al. (2013) for more details. SMC may be formalised as a SMC sampler for the sequence of extended distributions:
where denotes the joint pdf of the random variables generated by a PF up to time (for parameter ), and denotes the unbiased estimate of the likelihood increment computed from that PF, , for ; i.e. is actually a function of .
One recognises in the type of extended target distribution simulated by PMCMC (Particle MCMC, Andrieu et al. (2010)) algorithms. Note is a proper probability density (it integrates to one), and that the marginal distribution of is . These two properties are easily deduced from the unbiasedness of (as an estimator of ). In addition,
where one recognises in the second factor the distribution of the variables generated by a PF at time , conditional on those variables generated up to time . Thus, the equation above justifies both Step (a) of Algorithm 2.2, where the particle filters are extended from time to , and Step (b), where the particles are reweighted by .
We describe in the following section PMCMC moves that may be used in Step (c). Before, we note that a naive implementation of SMC has a memory cost at time , as one must stores in memory for each . This memory cost may be substantial even on a modern computer.
2.3 PMCMC moves
To make more explicit the dependence of the unbiased estimate of the likelihood on the variables generated during the course of PF, define
The PMMH (Particle Markov Metropolis-Hastings) kernel, described as Algorithm 2.3, may be described informally as a Metropolis step in -space, where the likelihood of both the current value and the proposed value have been replaced by unbiased estimators. Formally, as proven in Andrieu et al. (2010), it is in fact a standard Metropolis step with respect to the extended distribution ; in particular it leaves invariant . (For convenience, our description of PMMH assumes a random walk proposal, but PMMH is not restricted to this kind of proposal.)
Generate PF (Algorithm 1) for parameter ; let the output.
With probability ,
let ; otherwise .
In practice, we set , the covariance matrix of the proposal, to a fraction of the covariance matrix of the resampled -particles.
One advantage of using PMHMH within SMC is that it does not require storing all the variables generated by the PFs: operations at time require only having access to, for each , and , which is computed recursively. Memory cost then reduces to .
The Particle Gibbs approach is an alternative PMCMC step, based on the following property of target : if one extends with random index , such that , and , the normalised weighs at the final iteration, then (a) the selected trajectory, together with , follow the posterior distribution ; and (b) the remaining arguments of follow a CSMC (conditional SMC) distribution, which corresponds to the distribution of the random variables generated by a PF, but conditional on one trajectory fixed to the selected trajectory; see Algorithm 2.3.
Sample , with From to , set . Set , for all .
Sample from a MCMC step that leaves invariant distribution , but with set to .
Sample as in Algorithm 1, but for parameter and conditionally on , that is: at time , generate for , at time , sample , for , and , and so on.
In contrast with PMMH, implementing particle Gibbs steps within SMC requires having access to all the variables at time , which as we have already discussed, might incur too big a memory cost.
In the context of SMC, this suggests that should be allowed to increase in the course of the algorithm. To that effect, Chopin et al. (2013) devised an exchange step, which consists in exchanging the current particle systems, of size , with new particle systems, of size , through importance sampling. In Chopin et al. (2013)’s implementation, the exchange step is triggered each time the acceptance rate of the PMMH step (as performed in Step 3. of Algorithm 2.3) is below a certain threshold, and (i.e. doubles every time).
The main drawback of this approach is that it introduces some weight degeneracy immediately after the resampling step. In particular, we will observe in our simulations that this prevents us from changing too frequently, as the ESS of the weights then becomes too low.
In this paper, we discuss how to use a Particle Gibbs step in order to increase without changing the weights.
3 Proposed approach
3.1 Particle Gibbs and memory cost
We first remark that the Particle Gibbs step, Algorithm 2.3, offers a very simple way to change during the course of the algorithm: In Step (2), simply re-generate a particle system (conditional on selected trajectory ) of size . But, as already discussed, such a strategy requires then to access past particle values (and also , rather than only current particle values .
This problem may be addressed in two ways. First, one may remark that, to implement Particle Gibbs, one needs to store only those (and ) which have descendant among the current particles . Jacob et al. (2013) developed such a path storage approach, and gave conditions on the mixing of Markov chain under which this approach has memory cost (for a single PF with particles, run until time ). Thus, an implementation of this approach within SMC would lead to a memory cost.
A second approach, developed here, exploits the deterministic nature of PRNGs (pseudo-random number generators): a sequence of computer-generated random variates is actually a deterministic sequence determined by the initial state (seed) of the PRNG. It is sufficient to store that initial state and in order to recover any in the future. The trade-off is an increase in CPU cost, as each access to require re-computing .
We apply this idea to the variables . By close inspection of Algorithm 2.2, we note that variables in a ‘time slice’ , (or at time 0) are always generated jointly, either during Step (a), or during Step (c). In both cases, this time-slice is a deterministic function of the current PRNG state and the previous time slice. Thus, one may recover any time slice (when needed) by storing only (i) the PNRG state (immediately before the generation of the time slice); and (ii) in which Step (either (a) or (c)) the time slice was generated. This reduces the memory cost of SMC from to .
Compared to the path storage approach mentioned above, our PRNG recycling approach has a larger CPU cost, a smaller memory cost, and does not require any conditions on the mixing properties of process . Note that the CPU cost increase is within a factor of two, because each time a Particle Gibbs update is performed, the number of random variables that must be re-generated (i.e. the and in Algorithm 2.3) roughly equals the number of random variables that are generated for the first time (i.e. the and in Algorithm 2.3).
3.2 Nonparametric estimation of
As seen in Algorithm 2.2, a Particle Gibbs step will be performed each time the ESS goes below some threshold. That the ESS is low may indicate that is also too low, and therefore that the variance of the likelihood estimates is too high. Our strategy is to update (each time a Particle Gibbs step is performed) the current value of to , where is some (possibly rough) estimate of the variance of the log likelihood estimates. This is motivated by results from Doucet et al. (2012), who also develop some theory that supports choosing is optimal (although their optimality results do not extend straightforwardly to our settings).
Assume . To estimate , we use backfitting to fit a GAM (generalized additive model) to the responses :
using as covariates the principal components of the resampled -particles. The estimate is then the empirical variance of the residuals. See e.g. Chap. 9 of Hastie et al. (2009) for more details on backfitting and GAM modelling.
We found this strategy to work well, with the caveat that choosing required some trial and error.
3.3 Additional considerations
Using Particle Gibbs as our PMCMC move within SMC hast two advantages: (a) it makes it possible to change without changing the weights, as explained above; and (b) it also makes it possible to update the according to Gibbs or Metropolis step that leaves invariant); see Step (3) of Algorithm 2.3. For models where sampling from is not convenient, one may instead update through several PMMH steps performed after the Particle Gibbs step.
4 Numerical example
We consider the following stochastic volatility model: , and ; thus , with , . We assign independent priors to the components of : , constrained to , and . The dataset consists in log-returns from the monthly SP500 index, observed from 29/05/2013 to 19/12/2014; .
Figure 1 plots the marginal posterior , as approximated by SMC, run up to time . This figure illustrates the need for modelling nonparametrically the true likelihood as a function of , in order to estimate the variance of the estimated likelihood.
For this model, sampling jointly from is difficult, but it is easy to perform a Gibbs step that leaves invariant , as the full conditionals of each component (e.g. and so on) are standard distributions. Let’s call ‘full PG’ Algorithm 2.3, where Step 2 consists of this Gibbs step for ; and conversely let’s call ‘partial PG’ Algorithm 2.3 with in Step 2 ( is not updated).
We compare four versions of SMC: (a) the standard version, as proposed in Chopin et al. (2013) (i.e. Step (c) of Algorithm 2.2 is a PMMH step, and that step is followed by an exchange step to double when the acceptance rate of PMMH is below ); (b) the same algorithm, except that an exchange step is systematically performed after Step (c), and is set to the value obtained with our non-parametric approach (see Section 3.2); (c) the version developed in this paper, with full PG steps (and updated through the non-parametric procedure); (d) the same algorithm, but with partial PG steps, followed by 3 PMMH steps to update .
The point of Algorithm (b) is to show that adapting too often during the course of the algorithm is not desirable when using the exchange step, as this leads to too much variance. The point of Algorithm (d) is to see how our approach performs when sampling from (either independently or through MCMC) is not feasible.
Figure 2 plots the evolution of over time for the four SMC algorithms. One sees that, for these model and dataset, the CPU cost of the standard SMC algorithm is quite volatile, as increases very quickly in certain runs. In fact certain runs are incomplete, as they were stopped when the CPU time exceeded hours. On the other hand, the CPU cost of other versions is more stable across runs, and, more importantly, quite lower.
Figure 3 plots the empirical variance of the estimated marginal likelihood (evidence, ), normalised with the running time up to time step . One observes that version (c) does quite better than (d), and far much better than (a). Results from Algorithm (b) were to variable to be included.
Figure 4 plots the acceptance rate of PMMH steps for Algorithms (a), (b) and (d). (Recall that Algorithm (c) does not perform PMMH steps). Note the poor performance of Algorithm (b). Figure 5 compares the box-plots of posterior estimates of at final time , obtained from several runs of Algorithms (c) and (d). Algorithm (c) shows slightly less variability, while being faster on average. One sees that the improvement brought by ability to sample from is modest here for parameter estimation, but recall that in Figure 3, the improvement was more substantial.
We thank Pierre Jacob for useful comments.
- thanks: [
- Andrieu, C., Doucet, A., and Holenstein, R. (2010). Particle Markov chain Monte Carlo methods. J. R. Statist. Soc. B, 72(3), 269–342. doi:10.1111/j.1467-9868.2009.00736.x.
- Andrieu, C., Lee, A., and Vihola, M. (2013). Uniform Ergodicity of the Iterated Conditional SMC and Geometric Ergodicity of Particle Gibbs samplers. ArXiv e-prints.
- Chopin, N. (2002). A sequential particle filter for static models. Biometrika, 89, 539–552.
- Chopin, N., Jacob, P., and Papaspiliopoulos, O. (2013). SMC: A sequential Monte Carlo algorithm with particle Markov chain Monte Carlo updates. J. R. Statist. Soc. B, 75(3), 397–426.
- Del Moral, P. (1996). Non-linear filtering: interacting particle resolution. Markov processes and related fields, 2(4), 555–581.
- Doucet, A., Pitt, M., Deligiannidis, G., and Kohn, R. (2012). Efficient implementation of Markov chain Monte Carlo when using an unbiased likelihood estimator. ArXiv preprint.
- Hastie, T., Tibshirani, R., Friedman, J., Hastie, T., Friedman, J., and Tibshirani, R. (2009). The elements of statistical learning, volume 2. Springer.
- Jacob, P., Murray, L., and Rubenthaler, S. (2013). Path storage in the particle filter. Statist. Comput., 1–10. doi:10.1007/s11222-013-9445-x. URL http://dx.doi.org/10.1007/s11222-013-9445-x.