# Bayesian Conditional Monte Carlo Algorithms for Sequential Single and Multi-Object filtering

Yohan Petetin and François Desbouvries, Telecom and Mines Institute / Telecom SudParis / CITI Department and CNRS UMR 5157, 9 rue Charles Fourier, 91011 Evry, France. Email: {yohan.petetin,francois.desbouvries}@telecom-sudparis.eu
###### Abstract

Bayesian filtering aims at tracking sequentially a hidden process from an observed one. In particular, sequential Monte Carlo (SMC) techniques propagate in time weighted trajectories which represent the posterior probability density function (pdf) of the hidden process given the available observations. On the other hand, Conditional Monte Carlo (CMC) is a variance reduction technique which replaces the estimator of a moment of interest by its conditional expectation given another variable. In this paper we show that up to some adaptations, one can make use of the time recursive nature of SMC algorithms in order to propose natural temporal CMC estimators of some point estimates of the hidden process, which outperform the associated crude Monte Carlo (MC) estimator whatever the number of samples. We next show that our Bayesian CMC estimators can be computed exactly, or approximated efficiently, in some hidden Markov chain (HMC) models; in some jump Markov state-space systems (JMSS); as well as in multitarget filtering. Finally our algorithms are validated via simulations.

###### Keywords:
Conditional Monte Carlo, Bayesian Filtering, Hidden Markov Models, Jump Markov State-Space Systems, Rao-Blackwellized Particle Filters, Probability Hypothesis Density.

## 1 Introduction

### 1.1 SMC algorithms for single- or multi-object Bayesian filtering

In single-object Bayesian filtering we consider two random processes X = {Xn}n≥0 and Y = {Yn}n≥0 with given joint probability law. Y is observed, i.e. we have at our disposal realizations y0, …, yn of Y0, …, Yn (as far as notations are concerned, upper case letters denote random variables (r.v.), lower case ones their realizations, and bold letters vectors; p(x), say, denotes the pdf of r.v. X and p(x|y), say, the conditional pdf of X given Y = y; x0:n is a shorthand notation for (x0, …, xn); if {xi}i=1N are samples from p then the set {x1, …, xN} can also be denoted x1:N; subscripts are reserved for time indices and superscripts for realizations). Process X is hidden, and our aim is to compute, for each time instant n, some moment of interest

 Θn=∫f(x0:n)p(x0:n|y0:n)dx0:n (1)

of the a posteriori pdf p(x0:n|y0:n) of X0:n given Y0:n = y0:n. Unfortunately, in most models (1) cannot be computed exactly. Suboptimal solutions for computing Θn include SMC techniques livredoucet () arulampalamshort (), which propagate over time N weighted trajectories {(wni, x0:ni)}i=1N with ∑i=1N wni = 1. In other words, ˆp(x0:n|y0:n) = ∑i=1N wni δx0:ni, in which δx is the Dirac mass at point x, is a discrete (and random) approximation of p(x0:n|y0:n).

On the other hand, multi-object filtering (see e.g. MAHLER_ARTICLE2003 ()) essentially reduces to computing Θn = ∫f(xn) vn(xn) dxn, in which vn is now the so-called Probability Hypothesis Density (PHD), i.e. the a posteriori spatial density of the expected number of targets, given all measurements (be they due to detected targets or to false alarms). Again, SMC techniques propagate an approximation of vn with a set of weighted samples {(wni, xni)}i=1N; here ∑i=1N wni, which in general is different from 1, is an estimator of the number of targets.

Now, SMC algorithms, be they for single- or multi-object Bayesian filtering, usually focus on how to propagate approximations ˆp (or ˆvn) of p(x0:n|y0:n) (or vn); once ˆp or ˆvn has been computed, Θn is finally estimated either as ∑i=1N wni f(x0:ni) or ∑i=1N wni f(xni). By contrast, in this paper we directly focus on Θn itself, and see under which conditions one can improve this point estimator at a reasonable computational cost.

### 1.2 Variance reduction via conditioning: Rao-Blackwellized particle filters (RB-PF)

This problem leads us to variance reduction techniques, which form an important part of computer simulation (see e.g. asmussen-glynn ()). Among them, methods based on conditioning variables rely on the following well known result. Let X1 and X2 be two r.v. and f some function. Then

 E(E(f(X2)|X1)) = E(f(X2)), (2)
 var(E(f(X2)|X1)) = var(f(X2)) − E(var(f(X2)|X1)). (3)

So if the aim is to compute Θ = E(f(X2)) and we have at our disposal N samples X1i of X1, then the so-called CMC estimator (1/N)∑i=1N E(f(X2)|X1i) has lower variance than the corresponding crude MC one (1/N)∑i=1N f(X2i). Of course, the interest of the CMC estimator vs. the crude one depends on the choice of X1: ideally, one should easily sample from p(x1); the variance reduction in (3) should be as large as possible; but in the meantime function x1 ↦ E(f(X2)|X1 = x1) should remain computable at a reasonable computational cost.
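The variance reduction (3) is easy to check numerically. Below is a minimal sketch on a toy model of our choosing (not from the paper): X1 ∼ N(0,1), X2|X1 ∼ N(X1, 2²) and f(x) = x, so that E(f(X2)|X1) = X1 is available in closed form.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy illustration (assumed model): X1 ~ N(0,1), X2 | X1 ~ N(X1, 2^2),
# f(x) = x, so E[f(X2)|X1] = X1 exactly.
def crude_mc(n):
    x1 = rng.standard_normal(n)
    x2 = x1 + 2.0 * rng.standard_normal(n)
    return x2.mean()            # crude MC: (1/N) sum_i f(X2^i)

def cmc(n):
    x1 = rng.standard_normal(n)
    return x1.mean()            # CMC: (1/N) sum_i E[f(X2)|X1^i]

reps, n = 2000, 100
var_crude = np.var([crude_mc(n) for _ in range(reps)])
var_cmc = np.var([cmc(n) for _ in range(reps)])
print(var_crude, var_cmc)      # by (3), the conditioned estimator has smaller spread
```

Here var(f(X2)) = 1 + 4 while var(E(f(X2)|X1)) = 1, so the empirical variances differ by roughly a factor of five.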

Variance reduction techniques based on CMC methods have been adapted to Bayesian filtering; in this context, these methods are known as marginalized or RB-PF Chen-Mixture-Kalman () doucet-sequentialMC () doucet-jump-Mkv () Shon-MPF (). The rationale is as follows. Let now Θ in (1) be rewritten as Θ = ∫f(x1,x2) p(x1,x2) dx1 dx2. It is usually not possible to sample from p(x1,x2), and often p is only known up to a constant, whence the use of Bayesian (or normalized) importance sampling (IS) techniques Gewecke (). So let

 ˆΘ(x1:N1,x1:N2) = N∑i=1 wi2(x1:N1,x1:N2) f(xi1,xi2) with (xi1,xi2) ∼ q2, (4)
 ˆΘRB(x1:N1) = N∑i=1 wi1(x1:N1) E(f(xi1,X2)|xi1) with xi1 ∼ q1, (5)

in which wi1 and wi2 are the normalized importance weights associated with q1 and q2. Estimator ˆΘRB depends on samples x1:N1 only and is known as the RB estimator of Θ. However ˆΘRB is known to outperform ˆΘ only under specific assumptions on f, q1 and q2. In particular, for suitably related importance distributions q1 and q2, the variance of the weights of ˆΘRB can only be lower than that of the weights of ˆΘ doucet-sequentialMC (). If moreover the samples are independent, an asymptotic analysis based on (2) and (3) proves that ˆΘRB indeed outperforms ˆΘ doucet-jump-Mkv (). However, independence never holds in the presence of resampling; in the general case, the comparison of both estimators depends on the choice of the importance distributions q1 and q2, and can be proved (asymptotically) only under specific sufficient conditions chopin () lindstein-rao ().

RB-PF have been applied in the specific case where the state vectors xn can be partitioned into a “linear” component and a “non-linear” one. Models in which computing the conditional expectation in (5) is possible include linear and Gaussian JMSS doucet-jump-Mkv () Chen-Mixture-Kalman () or partially linear and Gaussian HMC Shon-MPF (). In other models, it may be possible to approximate that expectation by using numerical approximations of the relevant conditional pdfs and moments. However, due to the spatial structure of the decomposition of xn, approximating Θn in (1) involves propagating numerical approximations over time.

### 1.3 Bayesian CMC estimators

#### 1.3.1 Spatial vs. temporal RB-PF

In this paper we propose another class of RB-PF; the main difference is that our partitioning of x0:n is now temporal rather than spatial. The question arises naturally in the Bayesian filtering context: at time n we usually build ˆΘn from x1:N0:n, but x1:N0:n−1 was indeed also available for free since, by nature, sequential MC algorithms construct x1:N0:n from x1:N0:n−1. Now, compared with spatially partitioned RB-PF, a temporal partition of x0:n has a number of statistical and computational structural consequences, as we now see. So let again

 Θ = ∫f(x1,x2) p(x1,x2) dx1 dx2 (6)
 = ∫[∫f(x1,x2) p(x2|x1) dx2] p(x1) dx1. (7)

Let us start from the following approximation of p(x1):

 p(x1)≈ˆp(x1)=N∑i=1wi(x1:N1)δxi1. (8)

For all i, 1 ≤ i ≤ N, let next xi2 ∼ p(x2|xi1). This yields the following approximation of p(x1,x2):

 ˆp(x1,x2)=N∑i=1wi(x1:N1)δ(xi1,xi2); (9)

note that each weight wi may depend on x1:N1, but not on x1:N2. The reason why is that we now use a temporal partition, and not a spatial one: in the spatial subdivision case, we would need to sample at each time step the whole set of components, instead of simply extending the trajectories.

Finally we have two options: computing the full expectation in (6) by using (9), or only the outer one in (7) by using (8). So let

 ˆΘ(x1:N1,x1:N2) = N∑i=1 wi(x1:N1) f(xi1,xi2), (10)
 ˜Θ(x1:N1) = N∑i=1 wi(x1:N1) [∫f(xi1,x2) p(x2|xi1) dx2]. (11)

In this paper, we shall call ˆΘ (resp. ˜Θ) the Bayesian crude MC (resp. Bayesian CMC) estimator of Θ.

#### 1.3.2 Discussion

Let us now compare ˜Θ to ˆΘ. As in section 1.2, ˜Θ outperforms ˆΘ for all N, but not for the same reasons. Indeed we have

 E(wi(X1:N1)f(Xi1,Xi2)|x1:N1)=wi(x1:N1)∫f(xi1,x2)p(x2|xi1)dx2. (12)

So from (3), the variance of each term of (11) is lower than or equal to that of the corresponding term in (10); however this is not sufficient to conclude that var(˜Θ) ≤ var(ˆΘ), since the terms may be dependent. Fortunately (12) implies that E(ˆΘ(X1:N1,X1:N2)|X1:N1) = ˜Θ(X1:N1), so ˜Θ is preferable to ˆΘ, due to (2) and (3).
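The comparison between (10) and (11) can be sketched numerically. The following toy setup is ours, not the paper's: X1 ∼ N(0,1) targeted by self-normalized importance sampling from q = N(0, 2²), X2|X1 ∼ N(X1, 1), and f(x1,x2) = x2, so that the inner expectation in (11) is simply x1.

```python
import numpy as np

rng = np.random.default_rng(1)

# Sketch of (10) vs (11) on an assumed toy target: X1 ~ N(0,1) sampled by
# importance sampling from q = N(0, 2^2); X2 | X1 ~ N(X1, 1); f(x1, x2) = x2,
# so E[f(x1, X2) | x1] = x1 in closed form.
def both_estimators(n):
    x1 = 2.0 * rng.standard_normal(n)                # x1^i ~ q
    logw = -0.5 * x1**2 + 0.5 * (x1 / 2.0)**2        # log p(x1)/q(x1) up to a constant
    w = np.exp(logw - logw.max()); w /= w.sum()      # normalized weights w^i(x1^{1:N})
    x2 = x1 + rng.standard_normal(n)                 # x2^i ~ p(x2|x1^i)
    return np.sum(w * x2), np.sum(w * x1)            # (10) and (11)

vals = np.array([both_estimators(200) for _ in range(3000)])
print(vals[:, 0].var(), vals[:, 1].var())            # crude variance vs CMC variance
```

Both estimators share the same weights (which depend on x1 samples only), so the variance gap comes entirely from replacing the sampled x2 by its conditional expectation, as in (12).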

Let us now turn to practical considerations. Of course, ˜Θ is of interest only if the conditional expectation in (11) can be computed easily. In the rest of this paper we will see that this is indeed the case in some Markovian models; for other models, we will propose and discuss approximations which make the Bayesian CMC estimator a tool of practical interest, one which can be used as an alternative to purely Monte Carlo classical PF. From a modeling point of view, by contrast with spatially partitioned RB-PF, the state space no longer needs to be multi-dimensional; here a key point is the availability (and integrability) of p(x2|x1), which, in the temporal partitions considered below, will coincide with the so-called optimal conditional importance distribution. From a numerical point of view, another interesting feature of temporal RB-PF is that numerical approximations, when necessary, do not need to be propagated over time.

Let us finally address complexity. As we shall see, in some cases ˜Θ can even be computed under the same assumptions and for the same computational cost as ˆΘ (see sections 2.2.1 and 3.2.2). Also one should note that in the partition of a given set of variables, the sampled component should be as small as possible. More precisely, two Bayesian CMC estimators can be thought of: one in which the inner expectation is computed exactly over the largest possible block of variables, and one in which part of that block is sampled as well. The first estimator is preferable to the second, but computing it requires an additional exact expectation computation. As we shall see in section 3.2.2, in some Markovian models both estimators can indeed be computed, and the first one only requires a moderate additional computational cost.

The rest of this paper is organized as follows. First in section 2 we see that in some HMC models (including the Autoregressive Conditional Heteroscedasticity (ARCH) ones), a Bayesian CMC estimator can replace the classical one in the case where the sampling importance resampling (SIR) algorithm with optimal importance distribution is used. In Section 3 we develop our Bayesian CMC estimators for JMSS; in section 3.1 we address the linear and Gaussian case, where our solution can be seen as a further (temporal) RB step of an already (spatial) RB-PF algorithm; in section 3.2 we develop Bayesian CMC estimators for general JMSS. Finally in Section 4 we address a multi-target scenario and adapt Bayesian CMC to the PHD filter. In all these sections we propose relevant approximate estimators when the Bayesian CMC estimator cannot be computed exactly, and we validate our algorithms via simulations. We finally end the paper with a Conclusion.

## 2 Bayesian CMC PF for some HMC models

### 2.1 Deriving a Bayesian CMC estimator ˜Θn

Let xn (resp. yn) denote the state vector (resp. the observation) at time n. We assume that (Xn, Yn)n≥0 follows the well known HMC model:

 p(x0:n,y0:n)=p(x0)n∏i=1fi|i−1(xi|xi−1)n∏i=0gi(yi|xi), (13)

in which fi|i−1(xi|xi−1) is the transition pdf of Markov chain {Xn} and gi(yi|xi) the likelihood of yi given xi. The Bayesian filtering problem consists in computing some moment of interest Θn = ∫f(xn) p(xn|y0:n) dxn, which we rewrite as

 Θn = ∫f(xn)p(x0:n−1,xn|y0:n)dx0:n−1dxn. (14)

So (14) coincides with (6), in which x1 = x0:n−1 and x2 = xn, f depends on xn only, and p(x1,x2) is the a posteriori (i.e., given y0:n) joint pdf

 p(x0:n−1,xn|y0:n) = p(x0:n−1|y0:n) p(xn|xn−1,yn), (15)
in which the first factor plays the role of p(x1) in (7) and the second one that of p(x2|x1).

According to (8) we first need an approximation of p(x0:n−1|y0:n), which in model (13) reads:

 p(x0:n−1|y0:n)=p(yn|xn−1)p(x0:n−1|y0:n−1)∫p(yn|xn−1)p(x0:n−1|y0:n−1)dx0:n−1, (16)

in which p(yn|xn−1) = ∫gn(yn|xn) fn|n−1(xn|xn−1) dxn. On the other hand, PF algorithms propagate approximations of p(x0:n−1|y0:n−1) or of p(xn−1|y0:n−1). So let us start from ˆp(x0:n−1|y0:n−1) = ∑i=1N win−1 δxi0:n−1. According to Rubin’s SIR mechanism rubin1988 () gelfand-smith () smith-gelfand (), ∑i=1N ˜win δxi0:n−1, where ˜win ∝ win−1 p(yn|xin−1), is an approximation of p(x0:n−1|y0:n). Next, p(xn|xn−1,yn) in (15) coincides with the so-called optimal conditional importance pdf, i.e. the importance density which minimizes the conditional variance of the weights, given past trajectories and observations zaritskii1975 () kong1994 () liu-chen1995 () doucet-sequentialMC (). This leads to the so-called SIR algorithm with optimal importance distribution and optional resampling step:

SIR algorithm. Let {(win−1, xi0:n−1)}i=1N be an MC approximation of p(x0:n−1|y0:n−1).

1. For all i, 1 ≤ i ≤ N, sample ˜xin ∼ p(xn|xin−1, yn);

2. For all i, 1 ≤ i ≤ N, set ˜win ∝ win−1 p(yn|xin−1), ∑i=1N ˜win = 1;

3. (Optional). For all i, 1 ≤ i ≤ N, (re)sample xi0:n ∼ ∑j=1N ˜wjn δ(xj0:n−1,˜xjn), and set win = 1/N; otherwise set xi0:n = (xi0:n−1, ˜xin) and win = ˜win.

This third resampling step is usually performed only if some criterion holds, and aims at preventing weight degeneracy; see e.g. livredoucet (), arulampalamshort (). Then

 ˆpSIR0:n|n=N∑i=1˜winδxi0:n−1,˜xin (17)

is a (SIR-based) SMC approximation of p(x0:n−1, xn|y0:n), and plays the role of ˆp(x1,x2) in (9). Finally from (10) and (11), the SIR-based crude and CMC estimators of moment Θn defined in (14) are respectively

 ˆΘSIRn(x1:N0:n−1,˜x1:Nn) = N∑i=1 ˜win(x1:N0:n−1) f(˜xin), (18)
 ˜ΘSIRn(x1:N0:n−1) = N∑i=1 ˜win(x1:N0:n−1) ∫f(xn) p(xn|xin−1,yn) dxn. (19)
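The SIR update with optimal importance density can be sketched as follows. The two model hooks `sample_opt` and `lik_pred` (draws from p(xn|xn−1, yn) and evaluation of p(yn|xn−1)) are hypothetical names introduced here; the usage below instantiates them on an assumed toy random-walk model Xn = Xn−1 + Un, Yn = Xn + Vn with standard Gaussian noises, for which the optimal density is N((xn−1 + yn)/2, 1/2).

```python
import numpy as np

# Sketch of the SIR update above; sample_opt draws from p(x_n|x_{n-1}, y_n)
# and lik_pred evaluates p(y_n|x_{n-1}) (both are hypothetical model hooks).
def sir_update(x_prev, w_prev, y, sample_opt, lik_pred, rng, ess_ratio=0.5):
    N = x_prev.size
    x_new = sample_opt(x_prev, y, rng)          # step 1: optimal importance draw
    w = w_prev * lik_pred(x_prev, y)            # step 2: weights depend on the past only
    w /= w.sum()
    if 1.0 / np.sum(w**2) < ess_ratio * N:      # step 3: optional resampling (ESS criterion)
        idx = rng.choice(N, size=N, p=w)
        x_new, w = x_new[idx], np.full(N, 1.0 / N)
    return x_new, w

# Usage on an assumed toy model X_n = X_{n-1} + U_n, Y_n = X_n + V_n,
# U_n, V_n ~ N(0,1): optimal density N((x_{n-1}+y)/2, 1/2), p(y|x_{n-1}) = N(y; x_{n-1}, 2).
rng = np.random.default_rng(2)
sample_opt = lambda xp, y, r: 0.5 * (xp + y) + np.sqrt(0.5) * r.standard_normal(xp.size)
lik_pred = lambda xp, y: np.exp(-0.25 * (y - xp)**2)   # up to a constant
x, w = rng.standard_normal(1000), np.full(1000, 1e-3)
x, w = sir_update(x, w, 0.3, sample_opt, lik_pred, rng)
print(x.shape, w.sum())
```

With f(x) = x, (18) would be computed as `np.sum(w * x)` using the weights from step 2, and (19) by replacing `x` with the optimal means, which are available before step 1 even draws the particles.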

### 2.2 Computing ˜ΘSIRn in practice

#### 2.2.1 Exact computation

From (12) we know that ˜ΘSIRn outperforms ˆΘSIRn; but ˜ΘSIRn can be used only if p(yn|xn−1) and the integral ∫f(xn) p(xn|xin−1, yn) dxn can be computed. As we now see, this is the case in some particular HMC models and for some functions f. Let us e.g. consider the semi-linear stochastic models with additive Gaussian noise, given by

 Xn = fn(Xn−1) + Kn(Xn−1) × Un, (20)
 Yn = Hn Xn + Vn, (21)

in which {Un} and {Vn} are i.i.d., mutually independent and independent of X0, with Un ∼ N(0, I) and Vn ∼ N(0, Rvn). The one-dimensional ARCH model is one such model, with fn ≡ 0 and Kn(xn−1) = (β0 + β1 x2n−1)1/2. In model (20)-(21), p(xn|xn−1, yn) and p(yn|xn−1) are Gaussian. More precisely, let Qn(xn−1) = Kn(xn−1) KTn(xn−1); then

 Ln(xn−1) = Hn Qn(xn−1) HTn + Rvn, (22)
 mn(xn−1,yn) = fn(xn−1) + Qn(xn−1) HTn L−1n(xn−1) (yn − Hn fn(xn−1)), (23)
 Pn(xn−1) = Qn(xn−1) − Qn(xn−1) HTn L−1n(xn−1) Hn Qn(xn−1), (24)
 p(xn|xn−1,yn) = N(xn, mn(xn−1,yn), Pn(xn−1)), (25)
 p(yn|xn−1) = N(yn, Hn fn(xn−1), Ln(xn−1)). (26)

Finally, in such models the Bayesian CMC estimator is workable for some functions f. If f is a polynomial in xn, the problem reduces to computing the first moments of the available Gaussian pdf (25). In the important particular case where f(xn) = xn (used to give an estimator of the hidden state), no further computation is indeed necessary; in this case the integral in (19) is equal to mn(xin−1, yn).
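For a one-dimensional ARCH-type instance of (20)-(21), (22)-(26) and both estimators with f(x) = x can be sketched directly; the coefficients b0, b1 and the single observation y below are assumed for illustration.

```python
import numpy as np

rng = np.random.default_rng(3)

# Assumed one-dimensional ARCH-type model: X_n = sqrt(b0 + b1 X_{n-1}^2) U_n,
# Y_n = X_n + V_n, U_n, V_n ~ N(0,1); i.e. f_n = 0, H_n = 1, R = 1 in (20)-(21).
b0, b1, R = 1.0, 0.5, 1.0

def optimal_params(x_prev, y):
    Q = b0 + b1 * x_prev**2        # Q_n(x_{n-1}) = K_n K_n^T
    L = Q + R                      # (22)
    m = (Q / L) * y                # (23), since f_n(x_{n-1}) = 0
    P = Q - Q**2 / L               # (24)
    return m, P, L

N, y = 1000, 0.7
x_prev = rng.standard_normal(N)               # previous particles (uniform weights)
m, P, L = optimal_params(x_prev, y)
logw = -0.5 * y**2 / L - 0.5 * np.log(L)      # log p(y_n|x_{n-1}^i), cf. (26)
w = np.exp(logw - logw.max()); w /= w.sum()
x_new = m + np.sqrt(P) * rng.standard_normal(N)   # draws from (25)
theta_crude = np.sum(w * x_new)    # (18) with f(x) = x
theta_cmc = np.sum(w * m)          # (19): the integral equals m_n(x_{n-1}^i, y_n)
print(theta_crude, theta_cmc)
```

Both estimators use exactly the same quantities mn and Pn, which illustrates the remark below: with f(x) = x, the CMC estimator comes at no extra cost.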

###### Remark 1

In this class of models, computing ˆΘSIRn or ˜ΘSIRn requires the same computational cost if f(xn) = xn. Both estimators indeed compute the parameters mn(xin−1, yn) and Pn(xin−1) of the pdfs p(xn|xin−1, yn), and use these pdfs to sample the new particles ˜xin, which in both cases are needed for the next time step. The only difference is that ˆΘSIRn averages the samples ˜xin, while ˜ΘSIRn averages the means mn(xin−1, yn).

#### 2.2.2 Approximate computation

Let us now discuss cases where the Bayesian CMC estimator cannot be computed exactly, because p(yn|xn−1) and/or the moments of p(xn|xn−1, yn) are not computable. Two approximations are proposed:

• Available techniques such as local linearizations doucet-sequentialMC (), Taylor series expansions Saha_EMM () or the Unscented Transformation (UT) julier-procieee () have already been proposed for approximating p(yn|xn−1) and the moments of p(xn|xn−1, yn), so one can use any of them in (19). The resulting algorithm can be seen as an alternative to solutions like the Extended Kalman Filter (EKF) or the Unscented Kalman filter (UKF), which approximate the filtering pdf by a Gaussian and rely on linearizations or the UT; or to SMC methods, which look for a discrete approximation of the filtering pdf. In our approximate Bayesian CMC technique, we start from a discrete approximation produced by an SMC method; then, similarly to the EKF/UKF, we look for a numerical approximation of the conditional expectation in (19), given that discrete approximation. However, deriving a good approximation of p(yn|xn−1) can be an intricate issue, so we next look for approximations which do not rely on an approximation of p(yn|xn−1).

• In the SIR algorithm used so far, ˜xin is drawn from p(xn|xin−1, yn), whence a weight update factor equal to p(yn|xin−1). On the other hand, sampling ˜xin from an alternate (i.e., not necessarily optimal) pdf q(xn|xin−1, yn) yields an approximation of p(x0:n−1, xn|y0:n) given by ∑i=1N ˜win δ(xi0:n−1,˜xin), where the weights are now proportional to win−1 gn(yn|˜xin) fn|n−1(˜xin|xin−1)/q(˜xin|xin−1, yn), and so depend also on the new samples ˜x1:Nn. In that case, the associated Bayesian crude and CMC estimators become

 ˆΘn(x1:N0:n−1,˜x1:Nn) = N∑i=1 ˜win(x1:N0:n−1,˜x1:Nn) f(˜xin), (27)
 ˜Θn(x1:N0:n−1,˜x1:Nn) = N∑i=1 ˜win(x1:N0:n−1,˜x1:Nn) ∫f(xn) p(xn|xin−1,yn) dxn, (28)

which can no longer be compared easily (such a comparison was possible in section 1.3 because the weights in (10) and (11) depend on x1:N1 only). On the other hand, the computation of (28) does not require that of p(yn|xn−1), but only that of the moments of p(xn|xn−1, yn). This is of interest in some models where approximating p(yn|xn−1) may be challenging because of the form of fn|n−1(xn|xn−1), while the first order moments of p(xn|xn−1, yn) can be computed or approximated easily Saha_EMM ().

The two approximate implementations of the Bayesian CMC estimator which we just discussed will be compared via simulations in section 2.4.3.
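The second option above, with the transition prior as importance density, can be sketched on the assumed toy random-walk model Xn = Xn−1 + Un, Yn = Xn + Vn (standard Gaussian noises): the weights now involve the new samples, while the CMC term in (28) still uses the mean of p(xn|xin−1, yn), here (xin−1 + yn)/2.

```python
import numpy as np

rng = np.random.default_rng(4)

# Sketch of (27)-(28) with q = f_{n|n-1} (bootstrap proposal) on an assumed
# toy model X_n = X_{n-1} + U_n, Y_n = X_n + V_n, U_n, V_n ~ N(0,1).
N, y = 1000, 0.3
x_prev = rng.standard_normal(N)
w_prev = np.full(N, 1.0 / N)
x_tilde = x_prev + rng.standard_normal(N)         # ~ q = transition prior
logw = np.log(w_prev) - 0.5 * (y - x_tilde)**2    # ∝ w_{n-1} g_n(y_n|x_tilde): depends on new samples
w = np.exp(logw - logw.max()); w /= w.sum()
m_opt = 0.5 * (x_prev + y)                        # mean of p(x_n|x_{n-1}^i, y_n)
theta_crude = np.sum(w * x_tilde)                 # (27) with f(x) = x
theta_cmc = np.sum(w * m_opt)                     # (28): p(y_n|x_{n-1}) is never evaluated
print(theta_crude, theta_cmc)
```

Note that (28) only needed the conditional mean m_opt, not the predictive likelihood, which is precisely the point of this second approximation.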

### 2.3 Alternate Bayesian CMC solutions

#### 2.3.1 A Bayesian CMC estimator based on the fully-adapted auxiliary particle filter (FA)

The SIR algorithm of section 2.1 is not the only SMC algorithm which enables one to compute an approximation of p(x0:n−1, xn|y0:n) in which the weights depend on x1:N0:n−1 only. Starting from ˆp(x0:n−1|y0:n−1), the so-called FA algorithm auxiliary () fearnhead () is one such alternative:

FA algorithm. Let {(win−1, xi0:n−1)}i=1N be an MC approximation of p(x0:n−1|y0:n−1).

1. For all i, 1 ≤ i ≤ N, set λin ∝ win−1 p(yn|xin−1), ∑i=1N λin = 1;

2. For all i, 1 ≤ i ≤ N, sample ˜xi0:n−1 ∼ ∑j=1N λjn δxj0:n−1;

3. For all i, 1 ≤ i ≤ N, sample xin ∼ p(xn|˜xin−1, yn) and set xi0:n = (˜xi0:n−1, xin), win = 1/N.

Finally

 ˆpFA0:n−1,n|n=N∑i=11Nδ˜xi0:n−1,xin (29)

is the FA-based SMC approximation of p(x0:n−1, xn|y0:n), and the FA-based crude and CMC estimators of Θn become respectively

 ˆΘFAn(˜x1:N0:n−1,x1:Nn) = ˆΘFAn(x1:Nn) = 1N N∑i=1 f(xin), (30)
 ˜ΘFAn(˜x1:N0:n−1) = ˜ΘFAn(˜x1:Nn−1) = 1N N∑i=1 ∫f(xn) p(xn|˜xin−1,yn) dxn. (31)
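One FA update and the estimators (30)-(31) can be sketched on the same assumed toy random-walk model used earlier (Xn = Xn−1 + Un, Yn = Xn + Vn, standard Gaussian noises), for which p(yn|xn−1) = N(yn; xn−1, 2) and the optimal density is N((xn−1 + yn)/2, 1/2).

```python
import numpy as np

rng = np.random.default_rng(5)

# Sketch of one FA update on an assumed toy model X_n = X_{n-1} + U_n,
# Y_n = X_n + V_n with U_n, V_n ~ N(0,1).
def fa_update(x_prev, w_prev, y, rng):
    N = x_prev.size
    lam = w_prev * np.exp(-0.25 * (y - x_prev)**2)     # step 1: ∝ w_{n-1} p(y_n|x_{n-1})
    lam /= lam.sum()
    anc = x_prev[rng.choice(N, size=N, p=lam)]         # step 2: resample ancestors
    m = 0.5 * (anc + y)                                # optimal mean for this model
    x_new = m + np.sqrt(0.5) * rng.standard_normal(N)  # step 3: draw from the optimal density
    theta_crude = x_new.mean()                         # (30) with f(x) = x, equal weights 1/N
    theta_cmc = m.mean()                               # (31): the integral equals m
    return x_new, theta_crude, theta_cmc

x = rng.standard_normal(2000)
x, tc, tr = fa_update(x, np.full(2000, 1.0 / 2000), 0.4, rng)
print(tc, tr)
```

As with SIR, the CMC estimator (31) reuses the optimal means already computed to sample the particles, so it adds essentially no cost.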

#### 2.3.2 Discussion

Comparing with section 2.1, we see that two Bayesian CMC estimators are indeed available: the SIR-based one given by (19), and the FA-based one given by (31). The natural question which arises at this point is thus which one is best. Two arguments are available.

Let us first start from a common MC approximation of . Given and , trajectories produced by the FA algorithm are i.i.d. from