An extended space approach for particle Markov chain Monte Carlo methods


Christopher K. Carter    Eduardo F. Mendes    Robert Kohn
Abstract

In this paper we consider fully Bayesian inference in general state space models. Existing particle Markov chain Monte Carlo (MCMC) algorithms use an augmented model that takes into account all the variables sampled in a sequential Monte Carlo algorithm. This paper describes an approach that also uses sequential Monte Carlo to construct an approximation to the state space, but generates extra states using MCMC runs at each time point. We construct an augmented model for our extended space with the marginal distribution of the sampled states matching the posterior distribution of the state vector. We show how our method may be combined with particle independent Metropolis-Hastings or particle Gibbs steps to obtain a smoothing algorithm. All the Metropolis-Hastings acceptance probabilities are identical to those obtained in existing approaches, so there is no extra cost in terms of Metropolis-Hastings rejections when using our approach. The number of MCMC iterates at each time point is chosen by the user, and our augmented model collapses back to the model in Olsson and Ryden (2011) when the number of MCMC iterations is reduced to zero. We show empirically that our approach works well on applied examples and can outperform existing methods.

1 Introduction

Our article deals with statistical inference for non-Gaussian state space models. Its main goal is to provide flexible methods that give efficient estimates for a wide class of state space models. This work extends the methods proposed by Andrieu et al. (2010), Bunch and Godsill (2013), Lindsten and Schön (2012), Lindsten et al. (2014) and Olsson and Ryden (2011).

MCMC methods for Bayesian inference for Gaussian state space models or conditionally Gaussian state space models are well developed, with algorithms to generate from the joint distribution of all the state vectors and to generate from marginal distributions with the state vectors integrated out – see, for example, Carter and Kohn (1994), Frühwirth-Schnatter (1994), Gerlach et al. (2000) and Frühwirth-Schnatter (2006). Bayesian inference for general non-Gaussian state space models has proved to be a much harder problem. MCMC approaches include single-site updating of the state vectors in Carlin et al. (1992) and block-updating of the state vectors in Shephard and Pitt (1997). These approaches apply to general models, but they can be inefficient in some cases and can require numerical approximations over high dimensional spaces. MCMC methods based on the particle filter have proved to be an attractive alternative. A class of MCMC methods involving unbiased estimation of the likelihood was introduced by Beaumont (2003) and its theoretical properties are discussed in Andrieu and Roberts (2009).

Andrieu et al. (2010) extend these methods by constructing a joint distribution for the output of the particle filter that has a marginal distribution equal to the posterior distribution of the states in a state space model. This marginal distribution involves the states determined by tracing back the ancestors of a selected particle and is called the ancestral tracing approach by Andrieu et al. (2010). They show that previous approaches involving unbiased estimation of the likelihood correspond to Metropolis-Hastings sampling schemes under their joint distribution. The methods in Andrieu et al. (2010) can also be viewed as a fully Bayesian approach to the smoothing algorithm of Kitagawa (1996). The Andrieu et al. (2010) approach also allows other possible MCMC sampling schemes and they construct a particle Gibbs sampler which targets the same joint distribution. Lindsten et al. (2014) construct another particle Gibbs sampler for this model and give empirical evidence that their sampler improves the mixing properties of the resulting Markov chains. Dubarry and Douc (2011) give a smoothing method based on single-site MCMC updating of the generated trajectories from the ancestral tracing approach in Andrieu et al. (2010).

Olsson and Ryden (2011) extend the methods in Andrieu et al. (2010) by constructing a joint distribution on the output of the particle filter together with a series of indices corresponding to the selected states. The sampling of indices is based on the forward filtering backward simulation approach in Godsill et al. (2004) and is called the backward simulation approach in the literature. Their joint distribution also has a marginal distribution equal to the posterior distribution of the states in a state space model and their Metropolis-Hastings sampling schemes have the same acceptance probabilities as the Andrieu et al. (2010) approach. Lindsten and Schön (2012) constructs a particle Gibbs algorithm for the Olsson and Ryden (2011) model and gives empirical results showing improved efficiency over previous approaches. Chopin and Singh (2013) gives theoretical results showing that the particle Gibbs with backward simulation in Lindsten and Schön (2012) has a smaller integrated autocorrelation time than the Andrieu et al. (2010) particle Gibbs sampler.

Bunch and Godsill (2013) give a smoothing algorithm which runs the particle filter and then uses a backwards simulation approach that involves running an MCMC at each time point. They show that the advantage of their method is that new values of the state vectors are generated during the backward simulation step, whereas many other approaches are restricted to the output of the particle filter. Fearnhead et al. (2010) give a smoothing algorithm based on combining particles from a forward filter and a backward information filter, which also generates new values of the state vectors.

Our work extends the methods in Olsson and Ryden (2011), Lindsten and Schön (2012) and Bunch and Godsill (2013) by using an augmented model that includes the results of the particle filter, a series of indices which correspond to starting values of an MCMC run at each time point, and the output of the MCMC runs. We construct a joint distribution for our augmented space which has a marginal distribution equal to the posterior distribution of the states in a state space model, and we show that our Metropolis-Hastings sampling schemes have the same acceptance probabilities as the approaches in Andrieu et al. (2010) and Olsson and Ryden (2011). The advantage of our approach is that the MCMC runs at each time point generate new values of the state vectors, so we are not restricted to the output of the particle filter. Our method can be used to obtain generated states from the smoothing distribution or for Bayesian inference involving parameters. Our method is fully Bayesian, so the output of our MCMC converges to the posterior distribution given suitable regularity conditions which we discuss. We derive a particle Gibbs sampler for our augmented model.

The paper is organised as follows. Section 2 describes our state space model and sequential Monte Carlo algorithm. This section also constructs the joint distribution we use for Bayesian inference, describes the properties of this distribution, and gives our particle Gibbs algorithm. Section 3 describes our MCMC sampling schemes to carry out smoothing and Bayesian inference and discusses their convergence properties. Section 4 reports the empirical results. Proofs are given in an Appendix.

2 Generating the states

This section gives the technical results that are required for the Markov chain Monte Carlo methods described in Section 3. We describe the State Space Model, the Sequential Monte Carlo algorithm to generate the particles, and the extra Markov chain Monte Carlo steps in our method to generate the states. We then derive the properties of the distributions resulting from our algorithms. We also give a conditional sequential Monte Carlo algorithm that is used for particle Gibbs steps in Section 3. We use the standard convention where capital letters denote random variables and lower case letters denote their values.

2.1 State Space Model

Consider the state space model with states denoted by $X_t$, $t=1,\dots,T$, taking values in $\mathcal{X}$, and observations denoted by $Y_t$, $t=1,\dots,T$. We will assume the transition and observation distributions have positive densities denoted by

$$X_1 \sim f_1(\cdot \mid \theta), \quad (2.1)$$
$$X_t \mid (X_{t-1}=x) \sim f_t(\cdot \mid x, \theta), \quad t=2,\dots,T, \quad (2.2)$$
$$Y_t \mid (X_t=x) \sim g_t(\cdot \mid x, \theta), \quad t=1,\dots,T. \quad (2.3)$$

All the densities are with respect to Lebesgue measure for continuous variables and counting measure for discrete valued variables unless otherwise indicated. The vector $\theta$ represents parameters which are discussed in Section 3 and in the examples in Section 4. We use the notation $x_{1:t} := (x_1,\dots,x_t)$ for sequences, and we denote the joint density of $(Y_{1:T}, X_{1:T})$ given $\theta$ by

$$p(y_{1:T}, x_{1:T} \mid \theta) := g_1(y_1 \mid x_1, \theta)\, f_1(x_1 \mid \theta) \prod_{t=2}^{T} g_t(y_t \mid x_t, \theta)\, f_t(x_t \mid x_{t-1}, \theta). \quad (2.4)$$
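As a running illustration, the model (2.1)–(2.4) can be simulated directly once $f_t$ and $g_t$ are specified. The sketch below assumes a linear-Gaussian AR(1)-plus-noise model with $\theta = (\phi, \sigma_x, \sigma_y)$; this choice is ours for illustration only and is not a model from the paper.

```python
import numpy as np

def simulate_ssm(T, theta, rng):
    """Simulate (x_{1:T}, y_{1:T}) from a linear-Gaussian instance of
    (2.1)-(2.3): an AR(1) transition f_t and an additive-noise observation
    density g_t. theta = (phi, sigma_x, sigma_y) is our illustrative
    parameterization."""
    phi, sigma_x, sigma_y = theta
    x = np.empty(T)
    # f_1: the stationary distribution of the AR(1) process
    x[0] = rng.normal(0.0, sigma_x / np.sqrt(1.0 - phi ** 2))
    for t in range(1, T):
        # f_t(. | x_{t-1}): AR(1) transition
        x[t] = phi * x[t - 1] + sigma_x * rng.normal()
    # g_t(. | x_t): observe each state with Gaussian noise
    y = x + sigma_y * rng.normal(size=T)
    return x, y
```

Any other choice of positive transition and observation densities fits the same template.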

2.2 Sequential Monte Carlo algorithm

The Sequential Monte Carlo algorithm we use for the state space model defined by (2.1)–(2.4) constructs, at each time $t$, a sample of $N_t$ particles denoted by $X_{1:t}^i$, $i=1,\dots,N_t$, with associated normalized weights $W_t^i$, that approximates the distribution $p(dx_{1:t} \mid y_{1:t}, \theta)$ by

$$\hat p(dx_{1:t} \mid y_{1:t}, \theta) := \sum_{i=1}^{N_t} W_t^i\, \delta_{X_{1:t}^i}(dx_{1:t}). \quad (2.5)$$

In the pseudocode of the sequential Monte Carlo Algorithm 1 described below we denote the unnormalized weights at time $t$ by $w_t^i$ and use the notation $F(\cdot \mid W)$ for the discrete probability distribution on $\{1,\dots,N\}$ with parameter $W = (W^1,\dots,W^N)$, with $W^i \ge 0$ and $\sum_{i=1}^{N} W^i = 1$, for some $N$. Algorithm 1 uses the importance densities $M_1(x_1 \mid y_1, \theta)$ and $M_t(x_t \mid y_t, x_{t-1}, \theta)$ for $t = 2,\dots,T$. We make Assumption 1 about these importance densities for the results in later sections.

Assumption 1

$M_1(x_1 \mid y_1, \theta)$ and $M_t(x_t \mid y_t, x_{t-1}, \theta)$ for $t = 2,\dots,T$ are finite, strictly positive densities.

Algorithm 1 is based on Andrieu et al. (2010) and we include it for completeness and notational consistency. We use the convention that whenever the index $i$ is used for a particular value of $t$ we mean ‘for all $i = 1,\dots,N_t$’.

Algorithm 1 (Sequential Monte Carlo)
Step 1

For $t = 1$:

Step 1.1

sample $X_1^i \sim M_1(\cdot \mid y_1, \theta)$,

Step 1.2

compute and normalize the weights

$$w_1^i = \frac{p(y_1 \mid X_1^i, \theta)\, p(X_1^i \mid \theta)}{M_1(X_1^i \mid y_1, \theta)}, \quad (2.6)$$
$$W_1^i = \frac{w_1^i}{\sum_{j=1}^{N_1} w_1^j}.$$
Step 2

For $t = 2,\dots,T$:

Step 2.1

sample $A_{t-1}^i \sim F(\cdot \mid W_{t-1})$,

Step 2.2

sample $X_t^i \sim M_t(\cdot \mid y_t, X_{t-1}^{A_{t-1}^i}, \theta)$,

Step 2.3

compute and normalize the weights

$$w_t^i = \frac{p(y_t \mid X_t^i, \theta)\, p(X_t^i \mid X_{t-1}^{A_{t-1}^i}, \theta)}{M_t(X_t^i \mid y_t, X_{t-1}^{A_{t-1}^i}, \theta)}, \quad (2.7)$$
$$W_t^i = \frac{w_t^i}{\sum_{j=1}^{N_t} w_t^j}.$$

The variable $A_{t-1}^i$ in the above algorithm represents the index of the parent at time $t-1$ of particle $X_t^i$. Our methods do not require the full trajectory of the states in a particle and are more concerned with the individual values $X_t^i$ for $t = 1,\dots,T$ and $i = 1,\dots,N_t$. We denote the collection of states at time $t$ by $\bar X_t := (X_t^1, \dots, X_t^{N_t})$ for $t = 1,\dots,T$ and the corresponding collection of parent indices by $\bar A_t := (A_t^1, \dots, A_t^{N_{t+1}})$ for $t = 1,\dots,T-1$. We will also use the notation $\bar X_{1:t} := (\bar X_1, \dots, \bar X_t)$ and $\bar A_{1:t} := (\bar A_1, \dots, \bar A_t)$ for sequences.
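A minimal implementation of Algorithm 1 may help fix the notation. The sketch below assumes a hypothetical AR(1)-plus-noise model with $\theta = (\phi, \sigma_x, \sigma_y)$ and the bootstrap choice $M_t = f_t$, so the proposal density cancels in (2.6)–(2.7) and the weights reduce to the observation density; any positive importance densities satisfying Assumption 1 could be used instead. For simplicity we take $N_t = N$ for all $t$.

```python
import numpy as np

def norm_logpdf(z, scale):
    """Gaussian log-density with mean 0 and standard deviation `scale`."""
    return -0.5 * (z / scale) ** 2 - np.log(scale * np.sqrt(2.0 * np.pi))

def particle_filter(y, N, theta, rng):
    """Sketch of Algorithm 1 with M_t = f_t (bootstrap filter) for a
    hypothetical AR(1)-plus-noise model. Returns particles X[t, i],
    parent indices A[t-1, i], and unnormalized / normalized weights w, W."""
    phi, sigma_x, sigma_y = theta
    T = len(y)
    X = np.empty((T, N))
    A = np.empty((T - 1, N), dtype=int)
    w = np.empty((T, N))
    W = np.empty((T, N))

    # Step 1: sample from M_1 = f_1 and weight; with the bootstrap choice
    # the proposal cancels in (2.6), leaving the observation density g_1.
    X[0] = (sigma_x / np.sqrt(1.0 - phi ** 2)) * rng.normal(size=N)
    w[0] = np.exp(norm_logpdf(y[0] - X[0], sigma_y))
    W[0] = w[0] / w[0].sum()

    for t in range(1, T):
        A[t - 1] = rng.choice(N, size=N, p=W[t - 1])  # Step 2.1: A ~ F(.|W_{t-1})
        X[t] = phi * X[t - 1, A[t - 1]] + sigma_x * rng.normal(size=N)  # Step 2.2
        w[t] = np.exp(norm_logpdf(y[t] - X[t], sigma_y))  # Step 2.3, eq (2.7)
        W[t] = w[t] / w[t].sum()
    return X, A, w, W
```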

2.3 MCMC steps to generate states

Algorithm 2 described below takes the output of the sequential Monte Carlo steps described in Algorithm 1 and runs a backward simulation algorithm to generate extra state values. The state at time $T$ is generated using the approach in Andrieu et al. (2010) and the states at times $t = T-1, \dots, 1$ are generated using an approach related to that of Bunch and Godsill (2013), which, at time $t$, involves a Markov chain Monte Carlo run of length $C_t$. We denote the generated values at time $t$ by $\widetilde X_t^j$ for $j = 1,\dots,C_t$ and use the sequence notation $\widetilde X_t := (\widetilde X_t^1, \dots, \widetilde X_t^{C_t})$ and $\widetilde X_{1:T-1} := (\widetilde X_1, \dots, \widetilde X_{T-1})$.

These Markov chain Monte Carlo runs involve the following components. For $t = 1$, the target density for the Metropolis-Hastings step is

$$p(x_1 \mid y_1, \tilde x_2^{C_2}, \theta) \propto g_1(y_1 \mid x_1, \theta)\, f_1(x_1 \mid \theta)\, f_2(\tilde x_2^{C_2} \mid x_1, \theta),$$

so no approximation using the output from Algorithm 1 is required. For $t = 2,\dots,T-2$, the target densities for the Metropolis-Hastings steps are

$$\hat p(x_t \mid \bar x_{1:t-1}, \tilde x_{t+1}^{C_{t+1}}, \bar a_{1:t-2}, \theta) \propto g_t(y_t \mid x_t, \theta)\, f_{t+1}(\tilde x_{t+1}^{C_{t+1}} \mid x_t, \theta) \sum_{i=1}^{N_{t-1}} w_{t-1}^i\, f_t(x_t \mid x_{t-1}^i, \theta), \quad (2.8)$$

which approximates

$$p(x_t \mid y_{1:t}, X_{t+1} = \tilde x_{t+1}^{C_{t+1}}, \theta) \propto g_t(y_t \mid x_t, \theta)\, f_{t+1}(\tilde x_{t+1}^{C_{t+1}} \mid x_t, \theta) \int p(x_t \mid y_{1:t-1}, x_{t-1}, \theta)\, p(x_{t-1} \mid y_{1:t-1}, \theta)\, dx_{t-1}$$

based on the output from Algorithm 1. Similarly, for $t = T-1$ the target density for the Metropolis-Hastings steps is

$$\hat p(x_{T-1} \mid \bar x_{1:T-2}, x_T^{b_T}, \bar a_{1:T-3}, \theta) \propto g_{T-1}(y_{T-1} \mid x_{T-1}, \theta)\, f_T(x_T^{b_T} \mid x_{T-1}, \theta) \sum_{i=1}^{N_{T-2}} w_{T-2}^i\, f_{T-1}(x_{T-1} \mid x_{T-2}^i, \theta), \quad (2.9)$$

which approximates

$$p(x_{T-1} \mid y_{1:T-1}, X_T = x_T^{b_T}, \theta) \propto g_{T-1}(y_{T-1} \mid x_{T-1}, \theta)\, f_T(x_T^{b_T} \mid x_{T-1}, \theta) \int p(x_{T-1} \mid y_{1:T-2}, x_{T-2}, \theta)\, p(x_{T-2} \mid y_{1:T-2}, \theta)\, dx_{T-2}.$$

The following lemma follows immediately from the assumption that $f_t$ and $g_t$ are strictly positive densities for $t = 1,\dots,T$.

Lemma 1

The target densities in (2.8), for $t = 2,\dots,T-2$, and in (2.9) are strictly positive.

We denote the MCMC transition kernels by

$$K_t(x_t, dx_t' \mid \bar x_{1:t-1}, \bar x_t^{-b_t}, \tilde x_{t+1}^{C_{t+1}}, \bar a_{1:t-2}, a_{t-1}^{-b_t}, \theta), \quad (2.10)$$

for $t = 1,\dots,T-2$ (with the convention that conditioning variables with empty index ranges are omitted), and

$$K_{T-1}(x_{T-1}, dx_{T-1}' \mid \bar x_{1:T-2}, \bar x_{T-1}^{-b_{T-1}}, x_T^{b_T}, \bar a_{1:T-3}, a_{T-2}^{-b_{T-1}}, \theta). \quad (2.11)$$

The choice of Metropolis-Hastings proposal is determined by the user, but the conditioning indicated in (2.10) and (2.11) is sufficient for the results given in Sections 2.4 and 2.5. We require the standard reversibility condition of detailed balance as described in Assumption 2. Sections 3.3 and 4.1 give more detail on the transition kernels.
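For concreteness, the sketch below shows a random-walk Metropolis-Hastings step, one standard way (among many) of constructing a kernel that satisfies the detailed-balance requirement of Assumption 2 with respect to a given target. The model-specific log-density implements the form of (2.8) for a hypothetical AR(1)-plus-noise model with $\theta = (\phi, \sigma_x, \sigma_y)$; the function names are ours, not the paper's.

```python
import numpy as np

def mh_step(x, log_target, rng, step=0.5):
    """One random-walk Metropolis-Hastings step. The symmetric Gaussian
    proposal makes the acceptance ratio a plain density ratio, and the
    accept/reject rule gives detailed balance with respect to the
    (unnormalized) density exp(log_target)."""
    x_prop = x + step * rng.normal()
    if np.log(rng.uniform()) < log_target(x_prop) - log_target(x):
        return x_prop
    return x

def backward_log_target(y_t, x_next, w_prev, x_prev, theta):
    """Unnormalized log-density with the structure of (2.8) for a
    hypothetical AR(1)-plus-noise model:
    g_t(y_t|x_t) f_{t+1}(x_next|x_t) sum_i w_prev[i] f_t(x_t|x_prev[i])."""
    phi, sigma_x, sigma_y = theta

    def log_target(x):
        lg = -0.5 * ((y_t - x) / sigma_y) ** 2          # g_t term
        lf = -0.5 * ((x_next - phi * x) / sigma_x) ** 2  # f_{t+1} term
        # mixture over the time t-1 particles, as in (2.8)
        mix = np.sum(w_prev * np.exp(-0.5 * ((x - phi * x_prev) / sigma_x) ** 2))
        return lg + lf + np.log(mix)

    return log_target
```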

Assumption 2 (Detailed balance)

For all $x_t$, $x_t'$, and all values of the conditioning variables:

(a)
$$p(dx_1 \mid y_1, \tilde x_2^{C_2}, \theta)\, K_1(x_1, dx_1' \mid \bar x_1^{-b_1}, \tilde x_2^{C_2}, \theta) = p(dx_1' \mid y_1, \tilde x_2^{C_2}, \theta)\, K_1(x_1', dx_1 \mid \bar x_1^{-b_1}, \tilde x_2^{C_2}, \theta),$$
(b)
$$\hat p(dx_t \mid \bar x_{1:t-1}, \tilde x_{t+1}^{C_{t+1}}, \bar a_{1:t-2}, \theta)\, K_t(x_t, dx_t' \mid \bar x_{1:t-1}, \bar x_t^{-b_t}, \tilde x_{t+1}^{C_{t+1}}, \bar a_{1:t-2}, a_{t-1}^{-b_t}, \theta) = \hat p(dx_t' \mid \bar x_{1:t-1}, \tilde x_{t+1}^{C_{t+1}}, \bar a_{1:t-2}, \theta)\, K_t(x_t', dx_t \mid \bar x_{1:t-1}, \bar x_t^{-b_t}, \tilde x_{t+1}^{C_{t+1}}, \bar a_{1:t-2}, a_{t-1}^{-b_t}, \theta),$$

for $t = 2,\dots,T-2$, and

(c)
$$\hat p(dx_{T-1} \mid \bar x_{1:T-2}, x_T^{b_T}, \bar a_{1:T-3}, \theta)\, K_{T-1}(x_{T-1}, dx_{T-1}' \mid \bar x_{1:T-2}, \bar x_{T-1}^{-b_{T-1}}, x_T^{b_T}, \bar a_{1:T-3}, a_{T-2}^{-b_{T-1}}, \theta) = \hat p(dx_{T-1}' \mid \bar x_{1:T-2}, x_T^{b_T}, \bar a_{1:T-3}, \theta)\, K_{T-1}(x_{T-1}', dx_{T-1} \mid \bar x_{1:T-2}, \bar x_{T-1}^{-b_{T-1}}, x_T^{b_T}, \bar a_{1:T-3}, a_{T-2}^{-b_{T-1}}, \theta).$$

Algorithm 2 generates the states using Markov chain Monte Carlo runs.

Algorithm 2 (Markov chain Monte Carlo)
Step 1

Run the sequential Monte Carlo algorithm (Algorithm 1) to obtain $\bar X_{1:T}$ and $\bar A_{1:T-1}$.

Step 2

For $t = T$, sample $B_T \sim F(\cdot \mid W_T)$.

Step 3

For $t = T-1$, sample $\widetilde X_{T-1}^{1:C_{T-1}}$ as follows:

Step 3.1

compute and normalize the weights

$$\tilde w_{T-1}^i = w_{T-1}^i\, f_T(X_T^{B_T} \mid X_{T-1}^i, \theta), \quad (2.12)$$
$$\widetilde W_{T-1}^i = \frac{\tilde w_{T-1}^i}{\sum_{j=1}^{N_{T-1}} \tilde w_{T-1}^j},$$
Step 3.2

sample $B_{T-1} \sim F(\cdot \mid \widetilde W_{T-1})$,

Step 3.3

set

$$\widetilde X_{T-1}^{\text{old}} = X_{T-1}^{B_{T-1}},$$
Step 3.4

For $j = 1,\dots,C_{T-1}$:

Step 3.4.1

sample

$$\widetilde X_{T-1}^j \sim K_{T-1}(\widetilde X_{T-1}^{\text{old}}, \cdot \mid \bar X_{1:T-2}, X_{T-1}^{-B_{T-1}}, X_T^{B_T}, \bar A_{1:T-3}, A_{T-2}^{-B_{T-1}}, \theta),$$
Step 3.4.2

set

$$\widetilde X_{T-1}^{\text{old}} = \widetilde X_{T-1}^j.$$
Step 4

For $t = T-2,\dots,1$:

Step 4.1

compute and normalize the weights

$$\tilde w_t^i = w_t^i\, f_{t+1}(\widetilde X_{t+1}^{C_{t+1}} \mid X_t^i, \theta), \quad (2.13)$$
$$\widetilde W_t^i = \frac{\tilde w_t^i}{\sum_{j=1}^{N_t} \tilde w_t^j},$$
Step 4.2

sample $B_t \sim F(\cdot \mid \widetilde W_t)$,

Step 4.3

set

$$\widetilde X_t^{\text{old}} = X_t^{B_t},$$
Step 4.4

For $j = 1,\dots,C_t$:

Step 4.4.1

sample

$$\widetilde X_t^j \sim K_t(\widetilde X_t^{\text{old}}, \cdot \mid \bar X_{1:t-1}, X_t^{-B_t}, \widetilde X_{t+1}^{C_{t+1}}, \bar A_{1:t-2}, A_{t-1}^{-B_t}, \theta),$$
Step 4.4.2

set

$$\widetilde X_t^{\text{old}} = \widetilde X_t^j.$$
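The backward pass of Algorithm 2 (Steps 2–4) can be sketched as follows, for a hypothetical AR(1)-plus-noise model with $\theta = (\phi, \sigma_x, \sigma_y)$, a common run length $C_t = C$, and a simple random-walk Metropolis-Hastings move for each $K_t$. The ancestor indices are not needed in this sketch because the mixture in (2.8) runs over all time $t-1$ particles.

```python
import numpy as np

def backward_mcmc_pass(X, w, y, theta, C, rng, step=0.5):
    """Sketch of Steps 2-4 of Algorithm 2 for a hypothetical
    AR(1)-plus-noise model. X and w are the (T, N) particle and
    unnormalized-weight arrays from Algorithm 1; C is the common MCMC
    run length C_t. Returns one generated state trajectory."""
    phi, sigma_x, sigma_y = theta
    T, N = X.shape
    traj = np.empty(T)

    # Step 2: B_T ~ F(.|W_T); keep the selected final state.
    traj[T - 1] = X[T - 1, rng.choice(N, p=w[T - 1] / w[T - 1].sum())]

    for t in range(T - 2, -1, -1):
        # Steps 3.1 / 4.1: backward reweighting, eqs (2.12)-(2.13).
        wt = w[t] * np.exp(-0.5 * ((traj[t + 1] - phi * X[t]) / sigma_x) ** 2)
        # Steps 3.2-3.3 / 4.2-4.3: select the MCMC starting value.
        x = X[t, rng.choice(N, p=wt / wt.sum())]

        def log_target(z, t=t):
            # Target with the structure of (2.8)/(2.9); at t = 0 the particle
            # mixture is replaced by the prior f_1, so no approximation is
            # needed there.
            lg = -0.5 * ((y[t] - z) / sigma_y) ** 2
            lf = -0.5 * ((traj[t + 1] - phi * z) / sigma_x) ** 2
            if t == 0:
                prior_sd = sigma_x / np.sqrt(1.0 - phi ** 2)
                lmix = -0.5 * (z / prior_sd) ** 2
            else:
                lmix = np.log(np.sum(
                    w[t - 1] * np.exp(-0.5 * ((z - phi * X[t - 1]) / sigma_x) ** 2)))
            return lg + lf + lmix

        # Steps 3.4 / 4.4: refresh the selected state by C MH iterations.
        for _ in range(C):
            z_prop = x + step * rng.normal()
            if np.log(rng.uniform()) < log_target(z_prop) - log_target(x):
                x = z_prop
        traj[t] = x
    return traj
```

The final MCMC iterate at each time point plays the role of $\widetilde X_t^{C_t}$ in the conditioning for the next (earlier) time point.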

2.4 Distributions on the extended space

This section first gives the joint probability distribution of the variables generated by Algorithms 1 and 2 before constructing our target distribution and deriving its properties. To simplify the notation, we group the variables together as $U_{1:T} := (\bar X_{1:T}, \bar A_{1:T-1}, B_{1:T}, \widetilde X_{1:T-1})$. We denote the sample space of $U_{1:T}$ by

$$\mathcal{U}_{1:T} = \prod_{t=1}^{T} \mathcal{X}^{N_t} \times \prod_{t=1}^{T-1} \{1,\dots,N_t\}^{N_{t+1}} \times \prod_{t=1}^{T} \{1,\dots,N_t\} \times \prod_{t=1}^{T-1} \mathcal{X}^{C_t},$$

and the joint distribution of $U_{1:T}$ generated by Algorithms 1 and 2 by $\Psi(du_{1:T} \mid \theta)$. Let

$$r(\bar a_t \mid W_t) := \prod_{i=1}^{N_{t+1}} F(a_t^i \mid W_t).$$

It is straightforward to show that the distribution of the variables generated by Algorithm 1 is

$$\Psi(d\bar x_{1:T}, \bar a_{1:T-1} \mid \theta) = \prod_{i=1}^{N_1} M_1(dx_1^i \mid y_1, \theta) \prod_{t=2}^{T} \Big\{ r(\bar a_{t-1} \mid W_{t-1}) \prod_{i=1}^{N_t} M_t(dx_t^i \mid y_t, x_{t-1}^{a_{t-1}^i}, \theta) \Big\}; \quad (2.14)$$

see Andrieu et al. (2010) for details.

The conditional distribution generated by Algorithm 2 is

$$\begin{aligned} \Psi(b_{1:T}, d\tilde x_{1:T-1} \mid \bar x_{1:T}, \bar a_{1:T-1}, \theta) = {}& W_T^{b_T}\, \widetilde W_{T-1}^{b_{T-1}}\, K_{T-1}(x_{T-1}^{b_{T-1}}, d\tilde x_{T-1}^1 \mid \bar x_{1:T-2}, x_{T-1}^{-b_{T-1}}, x_T^{b_T}, \bar a_{1:T-3}, a_{T-2}^{-b_{T-1}}, \theta) \\ &\times \prod_{j=2}^{C_{T-1}} K_{T-1}(\tilde x_{T-1}^{j-1}, d\tilde x_{T-1}^j \mid \bar x_{1:T-2}, x_{T-1}^{-b_{T-1}}, x_T^{b_T}, \bar a_{1:T-3}, a_{T-2}^{-b_{T-1}}, \theta) \\ &\times \prod_{t=1}^{T-2} \Big\{ \widetilde W_t^{b_t}\, K_t(x_t^{b_t}, d\tilde x_t^1 \mid \bar x_{1:t-1}, x_t^{-b_t}, \tilde x_{t+1}^{C_{t+1}}, \bar a_{1:t-2}, a_{t-1}^{-b_t}, \theta) \\ &\qquad \times \prod_{j=2}^{C_t} K_t(\tilde x_t^{j-1}, d\tilde x_t^j \mid \bar x_{1:t-1}, x_t^{-b_t}, \tilde x_{t+1}^{C_{t+1}}, \bar a_{1:t-2}, a_{t-1}^{-b_t}, \theta) \Big\}, \end{aligned} \quad (2.15)$$

and hence the joint distribution of $U_{1:T}$ is the product of (2.14) and (2.15).

We now construct a joint distribution on the variable $U_{1:T}$ that will be the target distribution of a Markov chain Monte Carlo sampling scheme to generate a sample from the posterior distribution of the states in a state space model. To simplify the notation, define

$$\pi(x_{1:T} \mid \theta) := p(x_{1:T} \mid y_{1:T}, \theta) \quad (2.16)$$

as the posterior density of the states in the state space model defined by (2.1)–(2.4). The distribution we construct is

$$\begin{aligned} \Pi(du_{1:T} \mid \theta) := {}& \pi(\tilde x_1^{C_1}, \dots, \tilde x_{T-1}^{C_{T-1}}, x_T^{b_T} \mid \theta)\, d\tilde x_1^{C_1} \cdots d\tilde x_{T-1}^{C_{T-1}}\, dx_T^{b_T}\, \Big(\frac{1}{N_T}\Big) \\ &\times \frac{\Psi(d\bar x_{1:T}, \bar a_{1:T-1} \mid \theta)}{M_1(dx_1^{b_1} \mid y_1, \theta) \prod_{t=2}^{T} \big\{ W_{t-1}^{a_{t-1}^{b_t}} M_t(dx_t^{b_t} \mid y_t, x_{t-1}^{a_{t-1}^{b_t}}, \theta) \big\}} \\ &\times \prod_{t=2}^{T} \frac{w_{t-1}^{a_{t-1}^{b_t}}\, f_t(x_t^{b_t} \mid x_{t-1}^{a_{t-1}^{b_t}}, \theta)}{\sum_{i=1}^{N_{t-1}} w_{t-1}^i\, f_t(x_t^{b_t} \mid x_{t-1}^i, \theta)} \\ &\times \Big(\frac{1}{N_{T-1}}\Big) \prod_{j=2}^{C_{T-1}} K_{T-1}(\tilde x_{T-1}^j, d\tilde x_{T-1}^{j-1} \mid \bar x_{1:T-2}, x_{T-1}^{-b_{T-1}}, x_T^{b_T}, \bar a_{1:T-3}, a_{T-2}^{-b_{T-1}}, \theta) \\ &\qquad \times K_{T-1}(\tilde x_{T-1}^1, dx_{T-1}^{b_{T-1}} \mid \bar x_{1:T-2}, x_{T-1}^{-b_{T-1}}, x_T^{b_T}, \bar a_{1:T-3}, a_{T-2}^{-b_{T-1}}, \theta) \\ &\times \prod_{t=1}^{T-2} \Big\{ \Big(\frac{1}{N_t}\Big) \prod_{j=2}^{C_t} K_t(\tilde x_t^j, d\tilde x_t^{j-1} \mid \bar x_{1:t-1}, x_t^{-b_t}, \tilde x_{t+1}^{C_{t+1}}, \bar a_{1:t-2}, a_{t-1}^{-b_t}, \theta) \\ &\qquad \times K_t(\tilde x_t^1, dx_t^{b_t} \mid \bar x_{1:t-1}, x_t^{-b_t}, \tilde x_{t+1}^{C_{t+1}}, \bar a_{1:t-2}, a_{t-1}^{-b_t}, \theta) \Big\}, \end{aligned} \quad (2.17)$$

which is well defined by Assumption 1.

The following lemma describes the properties of the distribution defined in (2.17). Its proof is given in the Appendix.

Lemma 2
(i)

The joint distribution $\Pi(du_{1:T} \mid \theta)$ has marginal distribution

$$\Pi(d\tilde x_1^{C_1}, \dots, d\tilde x_{T-1}^{C_{T-1}}, dx_T^{b_T}, b_T \mid \theta) = \pi(\tilde x_1^{C_1}, \dots, \tilde x_{T-1}^{C_{T-1}}, x_T^{b_T} \mid \theta)\, d\tilde x_1^{C_1} \cdots d\tilde x_{T-1}^{C_{T-1}}\, dx_T^{b_T}\, \Big(\frac{1}{N_T}\Big).$$
(ii)

For all $\theta$ the measures $\Pi(\cdot \mid \theta)$ and $\Psi(\cdot \mid \theta)$ are equivalent.

(iii)

There exists a version of the density

$$h(u_{1:T} \mid \theta) = \frac{\Pi(du_{1:T} \mid \theta)}{\Psi(du_{1:T} \mid \theta)}$$

with

$$h(u_{1:T} \mid \theta) = \frac{\prod_{t=1}^{T} \big\{ \big(\sum_{i=1}^{N_t} w_t^i\big) \big(\frac{1}{N_t}\big) \big\}}{p(y_{1:T} \mid \theta)}. \quad (2.18)$$
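The numerator in (2.18) is the usual sequential Monte Carlo estimate of the likelihood $p(y_{1:T} \mid \theta)$. Given the array of unnormalized weights $w_t^i$ produced by Algorithm 1 (here with $N_t = N$ for all $t$), its logarithm can be computed as:

```python
import numpy as np

def log_likelihood_estimate(w):
    """Log of the likelihood estimate in the numerator of (2.18):
    prod_{t=1}^{T} { (sum_i w_t^i) (1/N_t) }, for a (T, N) array of
    unnormalized weights from Algorithm 1. Working on the log scale
    avoids underflow for long series."""
    T, N = w.shape
    return np.sum(np.log(w.sum(axis=1) / N))
```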

Lemma 3 shows how to generate a sample from the distribution

$$\Pi(b_{1:T}, d\tilde x_{1:T-1} \mid \bar x_{1:T}, \bar a_{1:T-1}, \theta). \quad (2.19)$$

Its proof is given in the Appendix.

Lemma 3
$$\Pi(b_{1:T}, d\tilde x_{1:T-1} \mid \bar x_{1:T}, \bar a_{1:T-1}, \theta) = \Psi(b_{1:T}, d\tilde x_{1:T-1} \mid \bar x_{1:T}, \bar a_{1:T-1}, \theta),$$

and hence Algorithm 2 generates a sample from the distribution given in (2.19).

2.5 Conditional sequential Monte Carlo

This section gives a conditional sequential Monte Carlo algorithm that is used to construct a particle Gibbs step later in the paper. We first describe the algorithm and derive its properties. Section 3 shows how to use it in Markov chain Monte Carlo sampling schemes.

Algorithm 3 generates from the conditional distribution

$\Pi(d\bar x_{1:T-1}, dx_T^{-b_T}, \bar a_{1:T}$