An extended space approach for particle Markov chain Monte Carlo methods
Abstract
In this paper we consider fully Bayesian inference in general state space models. Existing particle Markov chain Monte Carlo (MCMC) algorithms use an augmented model that takes into account all the variables sampled in a sequential Monte Carlo algorithm. This paper describes an approach that also uses sequential Monte Carlo to construct an approximation to the state space, but generates extra states using MCMC runs at each time point. We construct an augmented model for our extended space with the marginal distribution of the sampled states matching the posterior distribution of the state vector. We show how our method may be combined with particle independent Metropolis-Hastings or particle Gibbs steps to obtain a smoothing algorithm. All the Metropolis acceptance probabilities are identical to those obtained in existing approaches, so there is no extra cost in terms of Metropolis-Hastings rejections when using our approach. The number of MCMC iterates at each time point is chosen by the user and our augmented model collapses back to the model in Olsson and Ryden (2011) when the number of MCMC iterations is reduced to zero. We show empirically that our approach works well on applied examples and can outperform existing methods.
1 Introduction
Our article deals with statistical inference for non-Gaussian state space models. Its main goal is to provide flexible methods that give efficient estimates for a wide class of state space models. This work extends the methods proposed by Andrieu et al. (2010), Bunch and Godsill (2013), Lindsten and Schön (2012), Lindsten et al. (2014) and Olsson and Ryden (2011).
MCMC methods for Bayesian inference for Gaussian state space models or conditionally Gaussian state space models are well developed, with algorithms to generate from the joint distribution of all the state vectors and to generate from marginal distributions with the state vectors integrated out – see, for example, Carter and Kohn (1994), Frühwirth-Schnatter (1994), Gerlach et al. (2000) and Frühwirth-Schnatter (2006). Bayesian inference for general non-Gaussian state space models has proved to be a much harder problem. MCMC approaches include single-site updating of the state vectors in Carlin et al. (1992) and block-updating of the state vectors in Shephard and Pitt (1997). These approaches apply to general models, but they can be inefficient in some cases and can require numerical approximations over high dimensional spaces. MCMC methods based on the particle filter have proved to be an attractive alternative. A class of MCMC methods involving unbiased estimation of the likelihood was introduced by Beaumont (2003) and its theoretical properties are discussed in Andrieu and Roberts (2009).
Andrieu et al. (2010) extend these methods by constructing a joint distribution for the output of the particle filter that has a marginal distribution equal to the posterior distribution of the states in a state space model. This marginal distribution involves the states determined by tracing back the ancestors of a selected particle and is called the ancestral tracing approach by Andrieu et al. (2010). They show that previous approaches involving unbiased estimation of the likelihood correspond to Metropolis-Hastings sampling schemes under their joint distribution. The methods in Andrieu et al. (2010) can also be viewed as a fully Bayesian approach to the smoothing algorithm of Kitagawa (1996). The Andrieu et al. (2010) approach also allows other possible MCMC sampling schemes and they construct a particle Gibbs sampler which targets the same joint distribution. Lindsten et al. (2014) construct another particle Gibbs sampler for this model and give empirical evidence that their sampler improves the mixing properties of the resulting Markov chains. Dubarry and Douc (2011) give a smoothing method based on single-site MCMC updating of the generated trajectories from the ancestral tracing approach in Andrieu et al. (2010).
Olsson and Ryden (2011) extend the methods in Andrieu et al. (2010) by constructing a joint distribution on the output of the particle filter together with a series of indices corresponding to the selected states. The sampling of indices is based on the forward filtering backward simulation approach in Godsill et al. (2004) and is called the backward simulation approach in the literature. Their joint distribution also has a marginal distribution equal to the posterior distribution of the states in a state space model and their Metropolis-Hastings sampling schemes have the same acceptance probabilities as the Andrieu et al. (2010) approach. Lindsten and Schön (2012) construct a particle Gibbs algorithm for the Olsson and Ryden (2011) model and give empirical results showing improved efficiency over previous approaches. Chopin and Singh (2013) give theoretical results showing that the particle Gibbs sampler with backward simulation in Lindsten and Schön (2012) has a smaller integrated autocorrelation time than the Andrieu et al. (2010) particle Gibbs sampler.
Bunch and Godsill (2013) give a smoothing algorithm which runs the particle filter and then uses a backward simulation approach that involves running an MCMC chain at each time point. They show that the advantage of their method is that new values of the state vectors are generated during the backward simulation step, whereas many other approaches are restricted to the output of the particle filter. Fearnhead et al. (2010) give a smoothing algorithm based on combining particles from a forward filter and a backward information filter, which also generates new values of the state vectors.
Our work extends the methods in Olsson and Ryden (2011), Lindsten and Schön (2012) and Bunch and Godsill (2013) by using an augmented model that includes the results of the particle filter, a series of indices which correspond to starting values of an MCMC run at each time point, and the output of the MCMC runs. We construct a joint distribution for our augmented space which has a marginal distribution equal to the posterior distribution of the states in a state space model and we show that our Metropolis-Hastings sampling schemes have the same acceptance probabilities as the approaches in Andrieu et al. (2010) and Olsson and Ryden (2011). The advantage of our approach is that the MCMC runs at each time point generate new values of the state vectors, so we are not restricted to the output of the particle filter. Our method can be used to generate states from the smoothing distribution or for Bayesian inference involving parameters. Our method is fully Bayesian, so the output of our MCMC converges to the posterior distribution given suitable regularity conditions which we discuss. We derive a particle Gibbs sampler for our augmented model.
The paper is organised as follows. Section 2 describes our state space model and sequential Monte Carlo algorithm. This section also constructs the joint distribution we use for Bayesian inference, describes the properties of this distribution, and gives our particle Gibbs algorithm. Section 3 describes our MCMC sampling schemes to carry out smoothing and Bayesian inference and discusses their convergence properties. Section 4 reports the empirical results. Proofs are given in an Appendix.
2 Generating the states
This section gives the technical results that are required for the Markov chain Monte Carlo methods described in Section 3. We describe the State Space Model, the Sequential Monte Carlo algorithm to generate the particles, and the extra Markov chain Monte Carlo steps in our method to generate the states. We then derive the properties of the distributions resulting from our algorithms. We also give a conditional sequential Monte Carlo algorithm that is used for particle Gibbs steps in Section 3. We use the standard convention where capital letters denote random variables and lower case letters denote their values.
2.1 State Space Model
Consider the state space model with states denoted by and observations denoted by . We will assume the transition and observation distributions have positive densities denoted by
(2.1)  
(2.2)  
(2.3) 
All the densities are with respect to Lebesgue measure for continuous variables and counting measure for discrete valued variables unless otherwise indicated. The vector represents parameters which are discussed in Section 3 and in the examples in Section 4. We use the following notation for sequences, and we denote the joint density of given by
(2.4) 
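As a concrete running example, the sketch below simulates data from a simple linear-Gaussian instance of a model of the form (2.1)–(2.4); the AR(1) transition, additive Gaussian observation equation and all parameter values are illustrative assumptions rather than the specification used in the paper.

```python
import numpy as np

def simulate_ssm(T, phi=0.9, sigma_x=1.0, sigma_y=1.0, seed=0):
    """Simulate an illustrative linear-Gaussian state space model:
    x_1 ~ N(0, sigma_x^2 / (1 - phi^2)),  x_t = phi x_{t-1} + sigma_x v_t,
    y_t = x_t + sigma_y w_t,  with v_t, w_t independent N(0, 1)."""
    rng = np.random.default_rng(seed)
    x = np.empty(T)
    x[0] = rng.normal(0.0, sigma_x / np.sqrt(1.0 - phi ** 2))
    for t in range(1, T):
        x[t] = phi * x[t - 1] + sigma_x * rng.normal()
    y = x + sigma_y * rng.normal(size=T)
    return x, y
```

Here x plays the role of the latent state sequence and y of the observation sequence.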
2.2 Sequential Monte Carlo algorithm
The Sequential Monte Carlo algorithm we use for the state space model defined by (2.1)–(2.4) at time constructs a sample of particles denoted by with associated normalized weights that approximates the distribution by
(2.5) 
In the pseudocode of the sequential Monte Carlo Algorithm 1 described below we denote the unnormalized weights at time by and use the notation for the discrete probability distribution on of parameter , with and , for some . Algorithm 1 uses the importance densities and for . We make Assumption 1 about these importance densities for the results in later sections.
Assumption 1
and for are finite strictly positive densities.
Algorithm 1 is based on Andrieu et al. (2010) and we include it for completeness and notational consistency. We use the convention that whenever the index is used for a particular value of we mean ‘for all ’.
Algorithm 1 (Sequential Monte Carlo)
 Step 1

For
 Step 1.1

sample
 Step 1.2

compute and normalize the weights
(2.6)
 Step 2

For
 Step 2.1

sample
 Step 2.2

sample
 Step 2.3

compute and normalize the weights
(2.7)
The variable in the above algorithm represents the index of the parent at time of particle . Our methods do not require the full trajectory of the states in a particle and are more concerned with the individual values for and for . We denote the collection of states at time by for and the corresponding collection of parent indices by for . We will also use the notation for sequences and .
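A minimal sketch of Algorithm 1 follows, assuming an illustrative linear-Gaussian model, bootstrap importance densities (the transition density as proposal, so the incremental weight is the observation density) and multinomial resampling in Step 2.1; the model and these concrete choices are assumptions, since the algorithm leaves the importance densities to the user.

```python
import numpy as np

def particle_filter(y, N, phi=0.9, sigma_x=1.0, sigma_y=1.0, seed=1):
    """Bootstrap version of Algorithm 1. Returns particles X[t, i],
    normalized weights W[t, i] and parent indices A[t, i] (A[0] unused)."""
    rng = np.random.default_rng(seed)
    T = len(y)
    X = np.empty((T, N))
    W = np.empty((T, N))
    A = np.zeros((T, N), dtype=int)

    def normalize(logw):                 # (2.6)/(2.7): normalize in log space
        w = np.exp(logw - logw.max())
        return w / w.sum()

    # Step 1: propose x_1 and weight by the observation density g(y_1 | x_1).
    X[0] = rng.normal(0.0, sigma_x / np.sqrt(1.0 - phi ** 2), size=N)
    W[0] = normalize(-0.5 * ((y[0] - X[0]) / sigma_y) ** 2)
    for t in range(1, T):
        # Step 2.1: sample parent indices from the discrete distribution on 1..N
        # with probabilities given by the previous normalized weights.
        A[t] = rng.choice(N, size=N, p=W[t - 1])
        # Step 2.2: propagate each particle through the transition density.
        X[t] = phi * X[t - 1, A[t]] + sigma_x * rng.normal(size=N)
        # Step 2.3: weight by g(y_t | x_t) and normalize.
        W[t] = normalize(-0.5 * ((y[t] - X[t]) / sigma_y) ** 2)
    return X, W, A
```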
2.3 MCMC steps to generate states
Algorithm 2 described below takes the output of the Sequential Monte Carlo steps described in Algorithm 1 and runs a backward simulation algorithm to generate extra state values. The state at time is generated using the approach in Andrieu et al. (2010) and the states at times are generated using an approach related to that of Bunch and Godsill (2013), which, at time involves a Markov chain Monte Carlo run of length . We denote the generated values at time by for and use the sequence notation
These Markov chain Monte Carlo runs involve the following components. For , the target density for the Metropolis-Hastings step is
so no approximation using the output from Algorithm 1 is required. For , the target densities for the Metropolis-Hastings steps are
(2.8) 
which approximates
based on the output from Algorithm 1. Similarly, for the target density for the Metropolis-Hastings steps is
(2.9) 
which approximates
The following lemma follows immediately from the assumption that and are strictly positive densities for .
Lemma 1
The densities for and are strictly positive.
We denote the MCMC transition kernels by
(2.10) 
for and
(2.11) 
The choice of Metropolis-Hastings proposal is determined by the user, but the conditioning indicated in (2.10) and (2.11) is sufficient for the results given in Sections 2.4 and 2.5. We require the standard reversibility condition of detailed balance as described in Assumption 2. Sections 3.3 and 4.1 give more detail on the transition kernels.
Assumption 2 (Detailed balance)
For all
 (a)

 (b)

for and
 (c)

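Assumption 2 is satisfied, for example, by a random-walk Metropolis-Hastings kernel: the symmetric Gaussian proposal cancels in the acceptance ratio, leaving only the ratio of target densities, and the accept/reject rule then enforces detailed balance with respect to the target. The generic one-step kernel below is an illustrative sketch; the proposal form and step size are assumptions.

```python
import numpy as np

def rw_mh_step(x, log_target, step, rng):
    """One random-walk Metropolis-Hastings transition K(x, .).
    Because the Gaussian proposal is symmetric in (x, x'), the
    acceptance probability reduces to min(1, pi(x') / pi(x)), and the
    resulting kernel satisfies detailed balance with respect to pi."""
    x_prop = x + step * rng.normal()
    if np.log(rng.uniform()) < log_target(x_prop) - log_target(x):
        return x_prop
    return x
```

Iterating this kernel leaves the target density invariant, which is the property required of the kernels in (2.10) and (2.11).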
Algorithm 2 generates the states using Markov chain Monte Carlo runs.
Algorithm 2 (Markov chain Monte Carlo)
 Step 1

Run the sequential Monte Carlo algorithm (Algorithm 1) to obtain and .
 Step 2

For sample
 Step 3

For sample as follows
 Step 3.1

compute and normalize the weights
(2.12)
 Step 3.2

sample
 Step 3.3

set
 Step 3.4

For
 Step 3.4.1

sample
 Step 3.4.2

set
 Step 4

For
 Step 4.1

compute and normalize the weights
(2.13)
 Step 4.2

sample
 Step 4.3

set
 Step 4.4

For
 Step 4.4.1

sample
 Step 4.4.2

set
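The backward pass in Steps 2–4 can be sketched as follows for an illustrative linear-Gaussian model with random-walk Metropolis-Hastings refinement steps; the model, the proposal and all parameter values are assumptions made for the example, and the empirical target mirrors the form of the targets in (2.8) and (2.9).

```python
import numpy as np

def backward_mcmc_pass(X, W, y, R, phi=0.9, sigma_x=1.0, sigma_y=1.0,
                       step=0.5, seed=2):
    """Illustrative backward pass in the spirit of Algorithm 2: select a
    starting particle with backward-simulation weights, then refine it
    with R random-walk Metropolis-Hastings steps targeting an empirical
    approximation of p(x_t | x_{t+1}, y_{1:t})."""
    rng = np.random.default_rng(seed)
    T, N = X.shape
    xs = np.empty(T)

    def log_norm(z, m, s):               # Gaussian log density, up to a constant
        return -0.5 * ((z - m) / s) ** 2

    # Step 2: draw x_T from the time-T particle approximation.
    xs[T - 1] = X[T - 1, rng.choice(N, p=W[T - 1])]
    for t in range(T - 2, -1, -1):
        # Backward-simulation weights proportional to W_t^i f(x_{t+1} | x_t^i).
        logw = np.log(W[t]) + log_norm(xs[t + 1], phi * X[t], sigma_x)
        w = np.exp(logw - logw.max()); w /= w.sum()
        x = X[t, rng.choice(N, p=w)]     # starting value for the MCMC run

        def log_target(z, t=t):          # empirical target, mixture prior as in (2.8)
            if t > 0:                    # particle mixture approximates the predictive density
                prior = np.logaddexp.reduce(
                    np.log(W[t - 1]) + log_norm(z, phi * X[t - 1], sigma_x))
            else:                        # exact initial density: no approximation at t = 1
                prior = log_norm(z, 0.0, sigma_x / np.sqrt(1.0 - phi ** 2))
            return (prior + log_norm(xs[t + 1], phi * z, sigma_x)
                    + log_norm(y[t], z, sigma_y))

        for _ in range(R):               # R Metropolis-Hastings refinement steps
            z = x + step * rng.normal()
            if np.log(rng.uniform()) < log_target(z) - log_target(x):
                x = z
        xs[t] = x
    return xs
```

Setting R = 0 keeps the backward-simulation draws unchanged, which is the sense in which the method collapses back to plain backward simulation.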
2.4 Distributions on the extended space
This section first gives the joint probability distribution of the variables generated by Algorithms 1 and 2 before constructing our target distribution and deriving its properties. To simplify the notation, we group the variables together as . We denote the sample space of by
and the joint distribution of generated by Algorithms 1 and 2 by
Let
It is straightforward to show that the distribution of the variables generated by Algorithm 1 is
(2.14) 
see Andrieu et al. (2010) for details.
The conditional distribution generated by Algorithm 2 is
We now construct a joint distribution on the variable that will be the target distribution of a Markov chain Monte Carlo sampling scheme to generate a sample from the posterior distribution of the states in a state space model. To simplify the notation, define
(2.16) 
as the posterior density of the states in the state space model defined by (2.1)–(2.4). The distribution we construct is
(2.17) 
which is well defined by Assumption 1.
The following lemma describes the properties of the distribution defined in (2.17). Its proof is given in the Appendix.
Lemma 2
 (i)

The joint distribution has marginal distribution
 (ii)

For all the measures and are equivalent.
 (iii)

There exists a version of the density
with
(2.18)
Lemma 3 shows how to generate a sample from the distribution
(2.19) 
Its proof is given in the Appendix.
2.5 Conditional sequential Monte Carlo
This section gives a conditional sequential Monte Carlo algorithm that is used to construct a particle Gibbs step later in the paper. We first describe the algorithm and derive its properties. Section 3 shows how to use it in Markov chain Monte Carlo sampling schemes.
Algorithm 3 generates from the conditional distribution