An Introduction to Twisted Particle Filters and Parameter Estimation in Non-linear State-space Models

# An Introduction to Twisted Particle Filters and Parameter Estimation in Non-linear State-space Models

Juha Ala-Luhtala, Nick Whiteley, Kari Heine and Robert Piché J. Ala-Luhtala is with the Department of Mathematics, Tampere University of Technology, PO Box 553, 33101, Tampere, Finland, e-mail: juha.ala-luhtala@tut.fi.N. Whiteley is with the School of Mathematics, University of Bristol, University Walk, Bristol, BS8 1TW, UK, e-mail: nick.whiteley@bristol.ac.ukK. Heine is with the Department of Statistical Science, University College London, Gower Street, London, WC1E 6BT, UK, e-mail: k.heine@ucl.ac.ukR. Piché is with the Department of Automation Science and Engineering, Tampere University of Technology, e-mail: robert.piche@tut.fi.
###### Abstract

Twisted particle filters are a class of sequential Monte Carlo methods recently introduced by Whiteley and Lee [1] to improve the efficiency of marginal likelihood estimation in state-space models. The purpose of this article is to extend the twisted particle filtering methodology, establish accessible theoretical results which convey its rationale, and provide a demonstration of its practical performance within particle Markov chain Monte Carlo for estimating static model parameters. We derive twisted particle filters that incorporate systematic or multinomial resampling and information from historical particle states, and a transparent proof which identifies the optimal algorithm for marginal likelihood estimation. We demonstrate how to approximate the optimal algorithm for nonlinear state-space models with Gaussian noise and we apply such approximations to two examples: a range and bearing tracking problem and an indoor positioning problem with Bluetooth signal strength measurements. We demonstrate improvements over standard algorithms in terms of variance of marginal likelihood estimates and Markov chain autocorrelation for given CPU time, and improved tracking performance using estimated parameters.

Particle filter, sequential Monte Carlo, particle MCMC, Gaussian state-space model, parameter estimation.

## I Introduction

State-space models are applied to a wide variety of signal processing problems, especially in positioning, tracking and navigation [2, 3, 4]. These models need to be calibrated by inferring unknown parameters from data. There are a variety of approaches to this inference problem, such as maximum likelihood (ML) or maximum a posteriori (MAP) estimation using the Expectation Maximization algorithm or Laplace approximations, Gaussian filtering based approximations, and state augmentation techniques [5, 6, 3]. In this paper we consider a Bayesian approach, which has the advantage of allowing prior information about parameters to be imparted, and a variety of estimates and measures of uncertainty to be reported based on the posterior distribution. By using a Markov chain Monte Carlo (MCMC) algorithm, e.g. Metropolis-Hastings (M-H) (see [7] for an introduction), one can in principle explore the entire posterior, but in practice the design of efficient MCMC algorithm can be a challenging task.

A direct application of M-H to a state-space model requires evaluation of the marginal likelihood of data, which is a high-dimensional, analytically intractable integral in many cases of interest. However, this issue can be circumvented through the application of pseudo-marginal MCMC methods [8, 9], which allow MCMC algorithms yielding samples from the desired posterior to be constructed if an unbiased estimator of the marginal likelihood is available. Particle filters [10] (see [4, 3] for overviews in the context of tracking applications) provide such an estimator, and the resulting MCMC scheme is known as a particle Markov chain Monte Carlo (PMCMC) method [11]. Typically the most substantial contribution to the overall cost of a PMCMC algorithm arises from the need to run a particle filter at each iteration of the MCMC sampler, and the performance of the sampler is sensitive to the variability of the marginal likelihood estimate which the particle filter delivers [12]. This motivates the development of particle filters which can provide reliable marginal likelihood estimates at a low computational cost.

In this paper we develop new “twisted particle filtering” methodology, building from ideas recently introduced by [1]. Twisted particle filters are purposefully designed to provide more reliable approximations of marginal likelihoods than standard particle filters, while preserving the lack-of-bias property which permits their use within PMCMC.

Unlike traditional approaches to improving the efficiency of particle filters which modify the proposal distribution on a per-particle basis [13] or employ auxiliary weights for resampling [14], twisted particle filters are derived by applying a form of re-weighting to the particle system as a whole, using a so-called “twisting” function. The role of the twisting function is to incorporate information from the observations, possibly future and past, into the mechanism by which particles are propagated over time. The ability to choose different twisting functions introduces a degree of freedom into the design of the particle algorithm, leading naturally to questions of optimality. In concrete terms, if the twisting function is chosen well, the twisted particle filter can estimate the marginal likelihood with greater accuracy than a standard particle filter, in turn allowing more efficient estimation inference for static parameters in the state-space model.

The investigations of [1] focussed mainly on theoretical analysis of twisted particle filters, studying their asymptotic properties in the regimes where the number of particles tends to infinity and where the length of the time horizon grows, under probabilistic assumptions on the observation sequence and strong regularity conditions on the statistical model.

The objectives of this paper are to present new twisted particle filtering methodology, validate it theoretically, and demonstrate its application and effectiveness within PMCMC for inferring the parameters of state-space models. Our main contributions are as follows.

#### I-1 Algorithms

We introduce a general formulation of twisted particle filters. The first novel aspect of this formulation beyond that given in [1], is that it allows various resampling methods to be incorporated in twisted particle filters. In particular, we derive a twisted particle filter around the popular systematic resampling method, which is known to reduce variance within the particle filter. The second novel aspect of the new formulation is that it allows for twisting functions which depend on historical particle states, which is an important factor when designing them efficiently in practice. The methodology of [1], which treated only multinomial resampling and twisting functions which depend only on current particle states. The utility of these algorithmic developments is that they allow for more accurate estimation of marginal likelihoods.

#### I-2 Theory

We provide novel theoretical results which justify the new algorithms and characterize their optimal operation. The first result, Theorem 1, establishes the lack-of-bias property of the marginal likelihood approximation delivered by the new twisted particle filter. The importance of this result is that it justifies the use of the twisted particle filter within PMCMC, whilst allowing for more general structure of the resampling technique and twisting function than in [1].

The second result, Theorem 2, identifies the twisting functions which are optimal for approximating the marginal likelihood for a given finite number of observations. This provides a different perspective to the results of [1], which are asymptotic in nature, considering optimality in terms of minimal variance growth rate in the regime where the length of the data record tends to infinity. Theorem 2 relies on only mild regularity assumptions on the ingredients of the twisted particle filter, whereas the results of [1] assume a particularly strong form of geometric ergodicity of the signal in the hidden Markov model (HMM) and certain uniform upper and lower bounds on the likelihood functions. Moreover, compared to the analyses of [1], the proof of Theorem 2 is less intricate, and gives the reader a more accessible route to understanding how twisted particle filters work.

#### I-3 Approximation techniques

Informed by Theorem 2, we propose methods to approximate the optimal twisting function for nonlinear Gaussian state-space models, based on ideas of Kalman filtering methodology together with local linearization using historical particle states.

#### I-4 Applications and numerical results

We provide numerical results in the context of two applications.

The first application is a range and bearing tracking problem. This is a classical nonlinear tracking scenario and serves as a benchmark application of particle filters [15], [2]. The purpose of this example is to compare the performance of twisted particle filters and the corresponding PMCMC algorithms to standard particle filters in a situation where the ground truth for static parameters is available, with simulated data. The twisted particle filters we consider employ linearization techniques to approximate the optimal twisting functions. The results we obtain illustrate that twisted particle filters can more reliably approximate marginal likelihoods for the same or less computational cost than standard particle filters. The benefits of using twisted particle filters within PMCMC are also demonstrated in terms of lower auto-correlation, and consequently more accurate approximation of posterior distributions over static parameters. We also compare tracking performance based on estimated parameter values.

The second application is a more complex indoor positioning problem. In this application a state-space model represents the unknown position of a user over time, observed indirectly and with uncertainty through received signal strength (RSS) measurements. Such data are widely available from many different wireless communication systems including mobile networks and WLAN. They have been proposed for use in location estimation in a variety of novel location-aware applications, such as environmental and structure monitoring, and many military and public safety applications, see [16], [17] and references therein. We work with a real Bluetooth RSS data set. A key task when dealing with RSS measurements is to calibrate the model by estimating unknown parameters which describe attenuation characteristics of the environment in which the user moves, since otherwise one must resort to oversimplified models [17], which exhibit inferior tracking performance.

A variety of approaches to estimating these parameters have been suggested, involving least squares [17] and weighted least squares [18] methods. These techniques provide point estimates of parameter values from a batch of data. Bayesian approaches, e.g., [19], allow additionally for uncertainty associated with estimates to be reported, and incorporate prior distributions to quantify expert knowledge and physical constraints on parameter values. They also naturally handle uncertainty over state variables when inferring parameters through marginalization.

The price to pay for the Bayesian approach is the computational cost of Monte Carlo sampling, and so our numerical investigations largely focus on computational efficiency. We compare the performance of twisted particle filters to more standard particle filters using a variety of proposal and resampling techniques. We demonstrate improved CPU-time efficiency in estimating marginal likelihoods, and we show that this efficiency is carried over to the particle MCMC algorithm, giving performance gains in terms of quality of the resulting MCMC chain compared to PMCMC using a standard particle filter. We also demonstrate favourable tracking performance using parameter estimates obtained from PMCMC.

The structure of the paper is as follows. Section II gives the problem formulation. Section III introduces a PMCMC algorithm and a standard particle filter. Section IV presents the twisted particle filtering methodology and Theorems 1-2, which characterize the lack-of-bias property and optimal twisting functions. Computational complexity is also discussed. Section V introduces methods for approximating the optimal twisting functions in nonlinear state-space models with Gaussian noise. Section VI contains applications and numerical results. Finally, conclusions are presented in Section VII.

## Ii Problem formulation

We first introduce some notation. Uppercase is used to denote random variables (e.g. ) and realized values are denoted with lowercase (e.g. ). For any sequence and we write .

We consider state-space models of the form

 X0∼μ0,θ(⋅),Xk ∼fk,θ(⋅|Xk−1),k≥1, Yk ∼gk,θ(⋅|Xk),k≥0, (1)

where is the state vector, is the measurement vector, is the initial distribution, describes the transitions of the state process and is the conditional distribution for the measurement. All the model distributions are assumed to admit probability densities denoted with the same letter as the distribution. The joint density of the state-variables and measurements for is given by

 pθ(y0:k,x0:k) =μ0,θ(x0)g0,θ(y0|x0) ⋅k∏s=1fs,θ(xs|xs−1)gs,θ(ys|xs). (2)

The parameter vector contains all the unknown parameters of the model.

We are mainly concerned in estimating the unknown parameters using a set of realized measurements . In the Bayesian framework, the parameters are considered as random variables and estimates are computed using the posterior distribution

 p(θ|y0:t)∝pθ(y0:t)p(θ), (3)

where is the likelihood and is the prior.

With the shorthand

 π−k,θ(dxk):=pθ(dxk|y0:k−1),πk,θ(dxk):=pθ(dxk|y0:k),

the likelihood term can be evaluated recursively, for ,

 pθ(y0:k) =pθ(y0:k−1)∫Xgk,θ(yk|xk)π−k,θ(dxk), (4)

, and

 π−k,θ(xk)=∫Xfk,θ(xk|xk−1)πk−1,θ(dxk−1),k≥1, (5)
 πk,θ(xk)∝{g0,θ(y0|x0)μ0,θ(x0),k=0,gk,θ(yk|xk)π−k,θ(xk),k≥1. (6)

Exact inference using (3) directly is usually intractable, since the likelihood term can be evaluated exactly for only some special models (e.g. linear Gaussian model). We consider particle filtering methods for computing unbiased estimates for the likelihood term. These can then be used as a part of particle MCMC methods that draw samples from the posterior distribution of interest.

## Iii Particle MCMC

In this section we describe methods for drawing samples from the parameter posterior distribution in (3). Algorithms targeting only the parameter posterior are often called marginal algorithms, because samples are drawn only from the marginal posterior instead of the full posterior .

MCMC methods generate samples from the target posterior distribution by simulating a Markov chain that has the target posterior distribution as a stationary distribution [7]. One of the best known and general MCMC methods is the Metropolis-Hastings (MH) algorithm, where a new sample at step is generated from a proposal distribution . The generated sample is then accepted with probability

 min{1,pθ∗(y0:t)p(θ∗)pθi−1(y0:t)p(θi−1)κ(θi−1|θ∗)κ(θ∗|θi−1)}. (7)

To compute this acceptance probability, we need to evaluate likelihood terms , but that is not possible for a general nonlinear state-space model. However, if an unbiased estimator for the likelihood is available, it is still possible to construct an MCMC algorithm to sample from the posterior distribution [8, 9]. For state-space models, we can use particle filters as unbiased estimators of the likelihood [11]. A Metropolis-Hastings algorithm using particle filters to estimate the likelihood terms, called particle marginal Metropolis-Hastings (PMMH) [11], is given in Algorithm 1.

### Iii-a Particle filtering

We proceed with an account of a standard particle filter. Our notation is in some places a little non-standard, but is chosen deliberately to help with the derivation of twisted particle filters in Section IV. Henceforth, for notational simplicity, we often omit the subscript and implicitly assume that the distributions can depend on the parameters.

We denote the set of particles at time by , with corresponding unnormalized weights . The filtering distribution is approximated by

 πk,θ(dxk)≈∑ni=1Wikδξik(dxk)∑ni=1Wik, (8)

where denotes a unit point mass centered at .

In order to describe the sampling mechanism for the particles and understand certain properties of the algorithm it is convenient to also introduce, for each , the ancestor indicator variables , where each takes a value in . If we also define for each and , by letting and for , recursively , , then we can write the “ancestral line” of particle as

 Lik:=(ξik,ξBik,k−1k−1,…,ξBik,00), (9)

which is a -valued random variable.

A particle filter is given in Algorithm 2. Here the proposal distributions are assumed to be chosen such that for each the weights are strictly positive and finite. Each may be chosen to depend also on the observations , but this dependence is suppressed from the notation.

### Iii-B Resampling

Lines 7 and 8 in Algorithm 2 together implement a generic resampling operation. Line 7 generates consisting of i.i.d. random variables, each uniformly distributed on . Line 8 passes and the unnormalized weights to a deterministic mapping , which returns the ancestor indicator variables . With indicating the th element in the vector returned by , for brevity we sometimes write .

A variety of resampling mechanisms can be cast in this form through specific choices of and . We describe here two well known schemes: the multinomial and systematic methods; see [20] for background information. These techniques are standard; the details are included here in order to prepare for the presentation of the non-standard resampling techniques in twisted particle filters.

1. Multinomial resampling: We have and the mapping is defined as

 ri(u,w)=j⇔ui∈(dj−1,dj], (10)

where and .

2. Systematic resampling: We have and the mapping is defined as

 ri(u,w)=j⇔u+i−1∈(ndj−1,ndj], (11)

where and .

Systematic resampling is computationally light and has been found to have good empirical performance, although theoretical analysis is difficult due to high dependence between the resampled particles. Nevertheless, it is known, see e.g. [20], that both multinomial and systematic resampling satisfy Assumption 1 below.

We define the shorthand notation and for , .

###### Assumption 1.

The mapping is such that for any and integrable function ,

 E[1nn∑i=1φ(Lrik(Uk)k)∣∣ ∣∣Fk]=∑ni=1Wikφ(Lik)∑ni=1Wik,

where denotes expectation when sampling according to Algorithm 2.

Lines 5 and 13 compute a sequence , where each is an estimate of . The following proposition justifies the use of Algorithm 2 to provide an unbiased estimate of at line 5 of Algorithm 1. This kind of result is well known; a proof is outlined in Appendix A for completeness.

###### Proposition 1.

If Assumption 1 holds, then for each , .

## Iv Twisted particle filters

In order to introduce and validate twisted particle filters we think more explicitly about and the sequence as a stochastic process and consider the following initial and conditional distributions, according to which and evolve when sampled through Algorithm 2.

 M0(dξ0)=n∏i=1q0(dξi0), (12a) Mk(dξk,duk−1|Fk−1) =U(duk−1)n∏i=1qk(dξik|Lrik−1(uk−1)k−1), (12b)

where denotes the uniform distribution on .

Twisted particle filters are obtained by sampling the process , from alternatives to (12a)–(12b), which we discuss in more detail below.

###### Remark 1.

For historical perspective, we note that the idea of constructing alternative distributions over the random variables in particle filters appears in some of the theoretical arguments which justify PMCMC [11]. However, the specifics of twisted particle filters are more akin to eigenfunction changes of measure for branching processes, which were studied earlier in the stochastic processes literature, see [21, Section 3] and references therein.

Let be a sequence of strictly positive functions, such that and for , . We shall often write interchangeably . Each may also depend on and any number of the measurements , but this dependence is suppressed from the notation.

The initial and conditional distributions for the twisted particle filter are given by

 ˜M0(dξ0)∝1nn∑s=1M0(dξ0)ψ0(ξs0), (13a) ˜Mk(dξk,duk−1|Fk−1) ∝1nn∑s=1Mk(dξk,duk−1|Fk−1)ψk(Lrsk−1(uk−1)k−1,ξsk), (13b)

where the functions are called “twisting functions”. To avoid some tangential complications we shall assume henceforth that for each , which is sufficient to ensure that the integrals needed to normalize and each are finite.

A more explicit expression for is obtained by plugging in (12a) and normalizing, to give

 ˜M0(dξ0)=1nn∑s=1˜q0(dξs0)∏i≠sq0(dξi0),

where . So to sample from , one first draws a random variable, say , from the uniform distribution on , then samples and for . Deriving a similar sampling recipe for is somewhat more involved. We state the resulting procedure in Algorithm 3, then formalize its validity and other properties in Theorems 1 and 2.

To write out Algorithm 3 we need a few more definitions. For , define the twisted (unnormalized) weights

 ˜Wik:=Wik˜Vik,1≤i≤n, (14)

where

 ˜Vik:=∫Xψk+1(Lik,xk+1)qk+1(dxk+1|Lik). (15)

For , define the twisted proposal distribution

 ˜qk(dxk|x0:k−1)∝ψk(x0:k)qk(dxk|x0:k−1). (16)

Let be a discrete random variable conditional on , having distribution on , whose probabilities are proportional to

 ˜Sk(Sk=s)∝∫[0,1]mU(du) ⋅∫Xψk(Lrsk−1(u)k−1,xk)qk(dxk|Lrsk−1(u)k−1). (17)

Also introduce a distribution on given by

 ˜Uk−1(du|s)∝ U(du)∫Xψk(Lrsk−1(u)k−1,xk)qk(dxk|Lrsk−1(u)k−1). (18)

Note that the distributions and depend on the resampling method defined through the mapping . Details of how to sample from these distributions in the cases when corresponds to multinomial or systematic resampling are given in Sections IV-A and IV-B.

Our first main result, Theorem 1, establishes that Algorithm 3 indeed samples from (13a)–(13b) and delivers unbiased estimates of , which justifies its use within Algorithm 1. The proof is given in Appendix B.

###### Theorem 1.

The random variables and sampled using Algorithm 3 are drawn from (13a)–(13b). Furthermore, if Assumption 1 holds, then for each ,

 ˜E[˜Zk]=E[Zk]=p(y0:k), (19)

where (resp. ) denotes expectation w.r.t. (13a)–(13b) (resp. (12a)–(12b)).

Theorem 2 identifies the choice of the functions which are ideal for estimating . The proof is given in Appendix C.

###### Theorem 2.

If we choose

 ψk(x0:k) =⎧⎪ ⎪ ⎪ ⎪ ⎪⎨⎪ ⎪ ⎪ ⎪ ⎪⎩μ0(x0)p(y0:t|x0)q0(x0),k=0,fk(xk|xk−1)p(yk:t|xk)qk(xk|x0:k−1),1≤k≤t, (20)

then .

The choice of identified in (20) is of course not usually available in practice, but Theorem 2 motivates us to consider twisting functions of the form

 ψk,l(x0:k) =⎧⎪ ⎪ ⎪ ⎪ ⎪⎨⎪ ⎪ ⎪ ⎪ ⎪⎩μ0(x0)ϕ0,l(x0)q0(x0),k=0,fk(xk|xk−1)ϕk,l(x0:k)qk(xk|x0:k−1),1≤k≤t, (21)

where the functions are chosen to approximate , possibly also depending on , and is a positive integer parameter such that , which specifies how many future measurements are used in the twisting function. Devising such approximations is the subject of Section V.

We conclude Section IV by showing how to sample and on lines 11 and 12 in Algorithm 3.

### Iv-a Twisted multinomial resampling

In this case , and using definition (10) for , it is easily checked that the probabilities in (17) are independent of the value , i.e. is the uniform distribution over .

The density function corresponding to (18) can be written as

 ˜Uk−1(u|s) ∝I[0,1](us)∫Xψk(Lrsk−1(us)k−1,xk)qk(dxk|Lrsk−1(us)k−1) ⋅∏i≠sI[0,1](ui) =[n∑j=1I(dj−1,dj](us)~vjk−1]∏i≠sI[0,1](ui),

where the equality uses (10), and , for , for any set , , when and zero otherwise, and the terms are given by (15).

We therefore have the following procedure for sampling and from and respectively:

1. Sample uniformly from

2. Sample index from the discrete distribution on such that the probability that is proportional to

 ∫[0,1]I(dj−1k−1,djk−1](us)dus~vjk−1=wjk−1~vjk−1
3. Sample from the uniform distribution on and for each , from the uniform distribution on

### Iv-B Twisted systematic resampling

In this case we have , and using definition (11) for , the probabilities in (17) are

 ˜Sk(Sk=s) ∝∫[0,1]U(du)∫Xqk(dxk|Lrsk−1(u)k−1)ψk(Lrsk−1(u)k−1,xk) =∑{j|Ij,sk−1≠∅}[min(ndjk−1−s+1,1) −max(ndj−1k−1−s+1,0)]~vjk−1, (22)

where the first equality follows from (11), and and are defined as in the twisted multinomial resampling.

The probability density function corresponding to (18) can be written as

 ˜Uk−1(u|s) ∝I[0,1](u)∫Xψk(Lrsk−1(u)k−1,xk)qk(dxk|Lrsk−1(u)k−1) =n∑j=1IIs,jk−1(u)~vjk−1,

where the equality follows from (11).

This leads to the following procedure for sampling and from and respectively:

1. Sample from a distribution over with probabilities given by (22)

2. Sample index from the discrete distribution on such that the probability that is proportional to

 ∫[0,1]IISk,jk−1(u)du~vjk−1 =[min(ndjk−1−Sk+1,1) −max(ndj−1k−1−Sk+1,0)]~vjk−1,

if , and otherwise the probability that is zero.

3. Sample from the uniform distribution on

### Iv-C Complexity of twisted resampling methods

Twisted multinomial resampling involves sampling 2 times from a discrete distribution with elements and times from a continuous uniform distribution, and can be implemented in time.

Twisted systematic resampling involves sampling two times from a discrete distribution with elements and one time from a continuous uniform distribution, and can be implemented in time. Compared to twisted multinomial resampling, some computation time is saved since only one draw from the continuous uniform distribution is needed. However, the computation of the probabilities for the discrete distributions is computationally more involved for the twisted systematic resampling.

The overall complexity of Algorithm 3 depends on the specific nature of the twisting function and how it is computed. This is a problem-specific issue, which we discuss in the context of a particular family of models and twisting functions in Section V-C.

## V Twisted particle filters for Gaussian state-space models

In this section, we present methods for approximating the optimal twisting function in Gaussian state-space models with , and

 μ0(⋅) =N(⋅|ν0,P0), (23a) fk(⋅|xk−1) =N(⋅|ck−1(xk−1),Qk−1),k≥1, (23b) gk(⋅|xk) =N(⋅|hk(xk),Rk),k≥0, (23c)

where denotes a Gaussian distribution with mean vector and covariance matrix . The mean functions and can be nonlinear functions of the state vector.

To use the twisted particle filter in practice, we need to evaluate the integrals in (15) and sample from the twisted distributions given by (16). For the Gaussian model, we choose an exponential form for the function in (21), given by

 ϕk,l(x0:k)=αk,lexp{−12xTkΓk,lxk+xTkβk,l}, (24)

where , and are parameters, possibly depending on and any number of measurements. For , we use shorthand notation , and . Methods for computing these parameters are considered in Sections V-A and V-B.

With twisting function given by (21) and (24), we have , where

 μ0,l =Σ0,l(P−10ν0+β0,l), (25a) Σ0,l =(P−10+Γ0,l)−1. (25b)

For and , we have , where

 μik,l =Σik,l(Q−1k−