
# Stability of Feynman-Kac formulae with path-dependent potentials

## Abstract

Several particle algorithms admit a Feynman-Kac representation in which the potential function may be expressed as a recursive function of the complete state trajectory. An important example is the mixture Kalman filter, but other models and algorithms of practical interest fall in this category. We study the asymptotic stability of such particle algorithms as time goes to infinity. As a corollary, practical conditions for the stability of the mixture Kalman filter, and a mixture GARCH filter, are derived. Finally, we show that our results can also lead to weaker conditions for the stability of standard particle algorithms, in which the potential function depends on the last state only.

## 1 Introduction

The most common application of the theory of Feynman-Kac formulae (see e.g. Del Moral, 2004) is nonlinear filtering of a hidden Markov chain $(\Lambda_n)_{n\ge 0}$, based on an observed process $(Y_n)_{n\ge 1}$. In such settings, the potential function at time $n$ typically depends only on the current state $\Lambda_n$. The uniform stability of the corresponding particle approximations can be obtained under appropriate conditions, see Section 7.4.3 of the aforementioned book and references therein. For a good overview of the theoretical and methodological aspects of particle approximation algorithms, also known as particle filtering algorithms, see also Doucet et al. (2001), Künsch (2001), and Cappé et al. (2005).

There are however several applications of practical interest where the potential function depends on the complete state trajectory $\Lambda_{0:n}$. The corresponding particle filtering algorithms still have a fixed computational cost per iteration, because the potential can be computed using recursive formulae. An important example is the class of conditional linear Gaussian dynamic models, where the conditioning is on some unobserved Markov chain $(\Lambda_n)$. The corresponding particle algorithm is known as the mixture Kalman filter (Chen and Liu, 2000, see also Example 7 in Doucet et al., 2000, and Andrieu and Doucet, 2002, for a related algorithm): the potential function at time $n$ is then a Gaussian density, the parameters of which are computed recursively using the Kalman-Bucy filter (Kalman and Bucy, 1961). Another example is the mixture GARCH model considered in Chopin (2007).
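To fix ideas, the recursion underlying the mixture Kalman filter potential can be sketched as follows, for a hypothetical scalar conditionally linear Gaussian model (the model, the function name and its parameters are illustrative assumptions, not taken from the paper): conditionally on the regime trajectory, one Kalman-Bucy step yields both the updated filter parameters and the potential, a Gaussian density.

```python
import numpy as np

def kalman_potential_step(m, P, y, rho, sigma_x, sigma_y):
    """One Kalman-Bucy prediction/update step for the scalar model
    X_n = rho * X_{n-1} + sigma_x * U_n,   Y_n = X_n + sigma_y * V_n,
    conditional on the current regime (rho, sigma_x).  Returns the
    updated filter mean and variance, together with the potential,
    i.e. the Gaussian predictive density of y given the regime path."""
    m_pred = rho * m                       # predictive mean of X_n
    P_pred = rho**2 * P + sigma_x**2       # predictive variance of X_n
    S = P_pred + sigma_y**2                # predictive variance of Y_n
    psi = np.exp(-0.5 * (y - m_pred)**2 / S) / np.sqrt(2 * np.pi * S)
    K = P_pred / S                         # Kalman gain
    return m_pred + K * (y - m_pred), (1 - K) * P_pred, psi
```

Each particle then carries the pair `(m, P)` along its regime trajectory, which is what gives the fixed computational cost per iteration despite the path dependence of the potential.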

It is worth noting that models in which the potential functions are path-dependent can often be reformulated as a standard hidden Markov model, with a potential function depending on the last state only, by adding components to the hidden Markov chain. For instance, the mixture Kalman filter may be interpreted as a standard particle filtering algorithm, provided the hidden Markov process is augmented with the associated Kalman filter parameters (filtering expectation and error covariance matrix) that are computed iteratively in the algorithm. However, this representation is unwieldy, and the augmented Markov process does not fulfil the usual mixing conditions found in the literature on the stability of particle approximations. This is the main reason why our study is based on path-dependent potential functions. Quite interestingly, we shall see that the opposite perspective is more fruitful. Specifically, our stability results obtained for path-dependent potential functions can also be applied to standard state-space models, leading to stability results under conditions different from those previously given in the literature.

In this paper, we study the asymptotic stability of particle algorithms based on path-dependent potential functions. We work under the assumption that the dependence of the potential $\Psi_n$ on the state $\Lambda_{n-p}$ vanishes exponentially in $p$. This assumption is met in practical settings because of the recursive nature of the potential functions. Our proofs are based on the following construction: the true filter is compared with an approximate filter associated to ‘truncated’ potentials, that is potentials that depend only on $\lambda_{n-p+1:n}$, the vector of the last $p$ states, for some well-chosen integer $p$. Then, we compare the truncated filter with its particle approximation, using the fact that the ‘truncated’ filter corresponds to a standard Feynman-Kac model with a Markov chain of fixed dimension. Finally, we use a coupling construction to compare the particle approximations of the true filter and the truncated filter. In this way, we obtain estimates of the stability of the particle algorithm of interest. We apply our results to the two aforementioned classes of models, and obtain practical conditions under which the corresponding particle algorithms are stable uniformly in time.

The paper is organised as follows. Section 2 introduces the model and the notations. Section 3 evaluates the local error induced by the truncation. Section 4 studies the mixing properties of the truncated filter. Section 5 studies the propagation of the truncation error. Section 6 develops a coupling argument for the two particle systems. Section 7 states the main theorem of the paper, which provides a bound for the particle error and derives time-uniform estimates for the long-term propagation of the error in the particle approximation of the true model. Section 8 applies these results to two particle algorithms of practical interest, namely, the mixture Kalman filter, and the mixture GARCH filter, and shows how these results can be adapted to standard state-space models, in which the potential function depends only on the last state.

## 2 Model and notations

We consider a hidden Markov model, with latent (non-observed) state process $(\Lambda_n)_{n\ge 0}$, and observed process $(Y_n)_{n\ge 1}$, taking values respectively in a complete separable metric space $E$ and in $\mathbb{R}^{d}$. The state process is an inhomogeneous Markov chain, with initial probability distribution $\zeta$, and transition kernel $Q_n(\lambda_{n-1},\mathrm{d}\lambda_n)$ at time $n$. The observed process admits $\psi_n(y_n|y_{1:n-1},\lambda_{0:n})$ as a conditional probability density (with respect to an appropriate dominating measure) given $\Lambda_{0:n}=\lambda_{0:n}$ and $Y_{1:n-1}=y_{1:n-1}$, where the short-hand $v_{i:j}$, for any symbol $v$, stands for the vector $(v_i,\ldots,v_j)$. As explained in the Introduction, this quantity depends on the entire path $\lambda_{0:n}$, rather than the last state $\lambda_n$ only. Following common practice, we drop dependencies on the $y_n$’s in the notations, as the observed sequence may be considered as fixed, and use the short-hand $\Psi_n(\lambda_{0:n})=\psi_n(y_n|y_{1:n-1},\lambda_{0:n})$. The model admits a Feynman-Kac representation which we describe fully in (2.1). We consider the following assumptions.

###### Hypothesis 1.

For all $n\ge 1$, the kernel $Q_n$ is mixing, i.e. there exists $\varepsilon_n\in(0,1]$ such that

$$\varepsilon_n\,\xi(A)\le Q_n(\lambda_{n-1},A)\le\frac{1}{\varepsilon_n}\,\xi(A)$$

for some probability measure $\xi$ on $E$, for any Borel set $A\subseteq E$, and any $\lambda_{n-1}\in E$.

###### Hypothesis 2.

For $p$ large enough, and all $n\ge p$, there exists a ‘truncated’ potential function $\tilde\Psi^p_n$ that depends on the last $p$ states only, and that approximates $\Psi_n$ in the sense that

$$\bigl|\Psi_n(\lambda_{0:n})-\tilde\Psi^p_n(\lambda_{n-p+1:n})\bigr|\le\phi_n\,\tau^p\,\bigl\{\Psi_n(\lambda_{0:n})\wedge\tilde\Psi^p_n(\lambda_{n-p+1:n})\bigr\}$$

for some constants $\phi_n>0$ and $\tau\in(0,1)$, and all $\lambda_{0:n}\in E^{n+1}$. For convenience, we abuse notations and set $\tilde\Psi^p_n=\Psi_n$ for $n<p$.

###### Hypothesis 3.

There exist constants $a_n\ge 1$, $b_n\ge 1$, for $n\ge 1$, such that

$$\frac{1}{a_n}\le\Psi_n(\lambda_{0:n})\le b_n,\qquad\frac{1}{a_n}\le\tilde\Psi^p_n(\lambda_{(n-p+1)_+:n})\le b_n$$

for all $\lambda_{0:n}\in E^{n+1}$, using the short-hand $(x)_+=x\vee 0$ for any integer $x$.

The constants $a_n$ and $b_n$ depend implicitly on the realisation of the observed process. Hypotheses 1 and 3 are standard in the filtering literature; see e.g. Del Moral (2004). Hypothesis 2 formalises the fact that potential functions are computed using iterative formulae, and therefore should forget past states at an exponential rate. One may take, for instance, $\tilde\Psi^p_n(\lambda_{n-p+1:n})=\Psi_n(\lambda^\star,\ldots,\lambda^\star,\lambda_{n-p+1:n})$, where $\lambda^\star$ is an arbitrary element of $E$. We shall work out, in Section 8, practical conditions under which Hypothesis 2 is fulfilled in several models of interest.
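To see Hypothesis 2 at work numerically, the following toy sketch (an illustrative assumption, not a model from the paper) builds a potential whose logarithm depends on past states through geometrically decaying weights, truncates it by freezing the early states at a reference point $\lambda^\star=0$, and measures the relative error, which decays at rate $\tau^p$.

```python
import numpy as np

rng = np.random.default_rng(0)
tau, n = 0.5, 20

def psi(lam):
    """Toy path-dependent potential: the influence of state lam[i] on
    log Psi_n decays geometrically with its distance from time n."""
    w = tau ** np.arange(len(lam))[::-1]   # weight tau^(n-i) on lam[i]
    return float(np.exp(np.sum(w * np.tanh(lam))))

lam = rng.normal(size=n + 1)               # a path lambda_{0:n}
for p in (3, 6, 9):
    lam_p = lam.copy()
    lam_p[: n - p + 1] = 0.0               # freeze early states at lam* = 0
    rel = abs(psi(lam) - psi(lam_p)) / min(psi(lam), psi(lam_p))
    print(p, rel)                          # decays roughly like tau^p
```

Here the relative error equals $e^{|\Delta|}-1$ with $|\Delta|\le\tau^p/(1-\tau)$, so Hypothesis 2 holds with $\tau$ as above and a bounded $\phi_n$.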

We introduce the following notations for the forward kernels, for $n\ge 1$:

$$\gamma_n(\lambda_{0:n-1},\mathrm{d}\lambda'_{0:n})=\delta_{\lambda_{0:n-1}}(\mathrm{d}\lambda'_{0:n-1})\,Q_n(\lambda_{n-1},\mathrm{d}\lambda'_n)\,\Psi_n(\lambda'_{0:n})$$

where $\delta_{\lambda_{0:n-1}}$ is the Dirac measure centred at $\lambda_{0:n-1}$. The above kernels implicitly define operators on measures and on test functions, i.e.,

$$\gamma_n\mu(f)=\langle\gamma_n\mu,f\rangle=\int\mu(\mathrm{d}\lambda_{0:n-1})\,\gamma_n(\lambda_{0:n-1},\mathrm{d}\lambda'_{0:n})\,f(\lambda'_{0:n}),$$

for any $\mu\in\mathcal{M}^+(E^n)$ and any test function $f$, where $\mathcal{M}^+(E^n)$ denotes the set of nonnegative measures on $E^n$, and $\mathcal{P}(E^n)$ the set of probability measures on $E^n$.

We associate to $\gamma_n$ a “normalised” operator $R_n$, such that, for any $\mu\in\mathcal{P}(E^n)$, $R_n\mu\in\mathcal{P}(E^{n+1})$ is defined as:

$$R_n\mu(f)=\frac{\gamma_n\mu(f)}{\gamma_n\mu(1)}$$

for any test function $f$. Both the $\gamma_n$’s and the $R_n$’s may be iterated using the following short-hands, for $1\le k\le n$:

$$\gamma_{k:n}\mu=\gamma_n\cdots\gamma_k\mu,\qquad R_{k:n}\mu=R_n\cdots R_k\mu.$$
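On a finite state space, the operators $\gamma_n$ and $R_n$ reduce to matrix-vector products, which the following sketch illustrates (with a potential depending on the current state only, for simplicity; the names and numerical values are illustrative, not from the paper):

```python
import numpy as np

def feynman_kac_step(mu, Q, psi):
    """One normalised forward step: propagate the current filter mu
    through the transition matrix Q, reweight by the potential psi,
    and renormalise.  This is the finite-state analogue of
    R_n mu(f) = gamma_n mu(f) / gamma_n mu(1)."""
    unnorm = (mu @ Q) * psi          # gamma_n mu, as a vector
    return unnorm / unnorm.sum()     # R_n mu

# toy 2-state chain
Q = np.array([[0.9, 0.1], [0.2, 0.8]])
psi = np.array([0.5, 2.0])           # potential favouring state 1
mu = np.array([0.5, 0.5])
for _ in range(10):                  # iterate R_1, ..., R_10
    mu = feynman_kac_step(mu, Q, psi)
print(mu)                            # filter concentrates on state 1
```

In the path-dependent setting of this paper, `psi` would itself be computed recursively along each trajectory rather than read off the current state.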

We have the following Feynman-Kac representation:

$$\mathbb{E}\bigl(f(\Lambda_{0:n})\,\big|\,Y_{1:n}=y_{1:n}\bigr)=R_{1:n}\zeta(f),\tag{2.1}$$

for $n\ge 1$ and any test function $f$, where, as mentioned above, $\zeta$ is the law of $\Lambda_0$.

Finally, we denote the total variation norm on nonnegative measures by $\|\cdot\|_{\mathrm{TV}}$, the supremum norm on bounded functions by $\|\cdot\|_\infty$, and the Hilbert metric by $h(\mu,\mu')$, for any pair $\mu,\mu'\in\mathcal{M}^+(E^n)$; see e.g. Atar and Zeitouni (1997) or Le Gland and Oudjane (2004), Definition 3.3. We recall that the Hilbert metric is scale invariant, and is related to the total variation norm in the following way, see e.g. Lemma 3.4 in Le Gland and Oudjane (2004):

$$\|\mu-\mu'\|_{\mathrm{TV}}\le\frac{2}{\log 3}\,h(\mu,\mu')\tag{2.2}$$

$$h(K\mu,K\mu')\le\frac{1}{\varepsilon^2}\,\|\mu-\mu'\|_{\mathrm{TV}}\tag{2.3}$$

provided $K$ is an $\varepsilon$-mixing kernel. We can also derive the following properties from the definition of the Hilbert metric:

$$h(Q\mu,Q\mu')\le h(\mu,\mu')\quad\text{for any Markov kernel }Q,\tag{2.4}$$

$$h(\psi\mu,\psi\mu')\le h(\mu,\mu')\quad\text{for any nonnegative function }\psi,\tag{2.5}$$

with an equality in the latter equation if $\psi$ is positive.
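On discrete measures the Hilbert metric has a closed form (the log-ratio of the extreme componentwise ratios), so its scale invariance and inequality (2.2) can be checked numerically; the sketch below is purely illustrative:

```python
import numpy as np

def hilbert_metric(mu, nu):
    """Hilbert (projective) metric between two positive discrete
    measures: log of the ratio of the largest to the smallest
    componentwise ratio.  Scale invariant by construction."""
    r = mu / nu
    return float(np.log(r.max() / r.min()))

rng = np.random.default_rng(1)
mu = rng.random(5); mu /= mu.sum()        # two probability vectors
nu = rng.random(5); nu /= nu.sum()
tv = np.abs(mu - nu).sum()                # total variation distance
h = hilbert_metric(mu, nu)
assert tv <= 2.0 / np.log(3.0) * h        # inequality (2.2)
# scale invariance: rescaling one measure leaves h unchanged
assert abs(hilbert_metric(3.0 * mu, nu) - h) < 1e-12
```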

## 3 Local error induced by truncation

Until further notice, $p$ is a fixed integer such that $p\ge 1$ and such that Hypothesis 2 holds. Since our proofs involve a comparison between the true filter and a ‘truncated’ filter, we introduce the projection operator $H^p_n$ which, for $n\ge p$, associates to any measure $\mu\in\mathcal{M}^+(E^{n+1})$ its marginal w.r.t. its last $p$ components, i.e.:

$$H^p_n(\mu)(f)=\int\mu(\mathrm{d}\lambda_{0:n})\,f(\lambda_{n-p+1:n})$$

for any test function $f$; for $n<p$, we let $H^p_n$ be the identity operator. We also define the following ‘truncated’ forward kernels, for $n\ge p$:

$$\tilde\gamma^p_n(\lambda_{n-p:n-1},\mathrm{d}\lambda'_{n-p+1:n})=\delta_{\lambda_{n-p+1:n-1}}(\mathrm{d}\lambda'_{n-p+1:n-1})\,Q_n(\lambda_{n-1},\mathrm{d}\lambda'_n)\,\tilde\Psi^p_n(\lambda'_{n-p+1:n})$$

and the associated normalised operators, for $n\ge p$ and $\mu\in\mathcal{P}(E^p)$:

$$\tilde R^p_n\mu(f)=\frac{\tilde\gamma^p_n\mu(f)}{\tilde\gamma^p_n\mu(1)}$$

and set $\tilde\gamma^p_n=\gamma_n$, $\tilde R^p_n=R_n$, for $n<p$. From now on, we will refer to the filter associated to these ‘truncated’ operators as the truncated filter.

We now evaluate the local error induced by the truncation.

###### Lemma 1.

For all $n\ge k\ge 1$, and for all $\mu\in\mathcal{P}(E^{k})$,

$$\bigl\|\tilde R^p_{k+1:n}H^p_kR_k\mu-\tilde R^p_{k:n}H^p_{k-1}\mu\bigr\|_{\mathrm{TV}}\le 2\,\phi_k\,\tau^p.$$
###### Proof.

Let $f$ be a test function such that $\|f\|_\infty\le 1$. One has

$$\tilde R^p_{k+1:n}H^p_kR_k\mu(f)=\frac{\tilde\gamma^p_{k+1:n}H^p_k\gamma_k\mu(f)}{\tilde\gamma^p_{k+1:n}H^p_k\gamma_k\mu(1)},\qquad\tilde R^p_{k:n}H^p_{k-1}\mu(f)=\frac{\tilde\gamma^p_{k:n}H^p_{k-1}\mu(f)}{\tilde\gamma^p_{k:n}H^p_{k-1}\mu(1)}$$

where

$$\tilde\gamma^p_{k+1:n}H^p_k\gamma_k\mu(f)=\int_{E^{n+1}}\mu(\mathrm{d}\lambda_{0:k-1})\,Q_k(\lambda_{k-1},\mathrm{d}\lambda_k)\,\Psi_k(\lambda_{0:k})\,f(\lambda_{(n-p+1)_+:n})\prod_{i=k+1}^n\bigl[Q_i(\lambda_{i-1},\mathrm{d}\lambda_i)\,\tilde\Psi^p_i(\lambda_{(i-p+1)_+:i})\bigr]$$

and

$$\tilde\gamma^p_{k:n}H^p_{k-1}\mu(f)=\int_{E^{n+1}}\mu(\mathrm{d}\lambda_{0:k-1})\,Q_k(\lambda_{k-1},\mathrm{d}\lambda_k)\,\tilde\Psi^p_k(\lambda_{(k-p+1)_+:k})\,f(\lambda_{(n-p+1)_+:n})\prod_{i=k+1}^n\bigl[Q_i(\lambda_{i-1},\mathrm{d}\lambda_i)\,\tilde\Psi^p_i(\lambda_{(i-p+1)_+:i})\bigr]$$

hence

$$\begin{aligned}
\bigl|\tilde\gamma^p_{k+1:n}H^p_k\gamma_k\mu(f)-\tilde\gamma^p_{k:n}H^p_{k-1}\mu(f)\bigr|&\le\int_{E^{n+1}}\mu(\mathrm{d}\lambda_{0:k-1})\,Q_k(\lambda_{k-1},\mathrm{d}\lambda_k)\,\bigl|\Psi_k(\lambda_{0:k})-\tilde\Psi^p_k(\lambda_{(k-p+1)_+:k})\bigr|\\
&\qquad\times f(\lambda_{(n-p+1)_+:n})\prod_{i=k+1}^n\bigl[Q_i(\lambda_{i-1},\mathrm{d}\lambda_i)\,\tilde\Psi^p_i(\lambda_{(i-p+1)_+:i})\bigr]\\
&\le\phi_k\tau^p\int_{E^{n+1}}\mu(\mathrm{d}\lambda_{0:k-1})\,Q_k(\lambda_{k-1},\mathrm{d}\lambda_k)\,\bigl\{\Psi_k(\lambda_{0:k})\wedge\tilde\Psi^p_k(\lambda_{(k-p+1)_+:k})\bigr\}\\
&\qquad\times f(\lambda_{(n-p+1)_+:n})\prod_{i=k+1}^n\bigl[Q_i(\lambda_{i-1},\mathrm{d}\lambda_i)\,\tilde\Psi^p_i(\lambda_{(i-p+1)_+:i})\bigr]\\
&\le\phi_k\tau^p\bigl\{\tilde\gamma^p_{k+1:n}H^p_k\gamma_k\mu(f)\wedge\tilde\gamma^p_{k:n}H^p_{k-1}\mu(f)\bigr\}
\end{aligned}$$

according to Hypothesis 2. And since, for all $a,c\in\mathbb{R}$ and $b,d>0$ such that $|a|\le b$ and $|c|\le d$,

$$\left|\frac{a}{b}-\frac{c}{d}\right|\le\frac{|a-c|}{b}+\frac{|d-b|}{b},\tag{3.6}$$

one may conclude directly by taking $a=\tilde\gamma^p_{k+1:n}H^p_k\gamma_k\mu(f)$, $b=\tilde\gamma^p_{k+1:n}H^p_k\gamma_k\mu(1)$, $c=\tilde\gamma^p_{k:n}H^p_{k-1}\mu(f)$, and $d=\tilde\gamma^p_{k:n}H^p_{k-1}\mu(1)$. ∎
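Inequality (3.6) is elementary: writing $a/b-c/d=(a-c)/b+(c/d)(d-b)/b$ and using $|c|/d\le 1$ gives the bound. The quick numerical check below (illustrative only) samples instances with $0\le a\le b$ and $0\le c\le d$:

```python
import numpy as np

rng = np.random.default_rng(2)
for _ in range(1000):
    # instances with b, d > 0, 0 <= a <= b and 0 <= c <= d
    b, d = rng.uniform(0.1, 10.0, size=2)
    a, c = rng.uniform(0.0, b), rng.uniform(0.0, d)
    lhs = abs(a / b - c / d)
    rhs = abs(a - c) / b + abs(d - b) / b
    assert lhs <= rhs + 1e-12
print("inequality (3.6) verified on 1000 random instances")
```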

###### Lemma 2.

For $k\ge 1$, if there exists a (possibly random) probability kernel $\bar R_k$ such that, for all probability measures $\mu$,

$$\sup_{f:\|f\|_\infty=1}\mathbb{E}\bigl(\bigl|\langle\tilde R^p_k\mu-\bar R_k\mu,f\rangle\bigr|\bigr)\le\delta_k$$

for some $\delta_k>0$, then, for all $i\ge 1$ and all probability measures $\mu$,

$$\sup_{f:\|f\|_\infty=1}\mathbb{E}\bigl(\bigl|\langle\tilde R^p_{k:k+i}\mu-\tilde R^p_{k+1:k+i}\bar R_k\mu,f\rangle\bigr|\bigr)\le 2(a_{k+1}\cdots a_{k+i})(b_{k+1}\cdots b_{k+i})\,\delta_k$$

where the expectation is with respect to the distribution of the random kernel $\bar R_k$.

###### Proof.

Using the same ideas as above, one has, for $\|f\|_\infty=1$,

$$\langle\tilde R^p_{k:k+i}\mu-\tilde R^p_{k+1:k+i}\bar R_k\mu,f\rangle=\frac{\tilde\gamma^p_{k+1:k+i}\tilde R^p_k\mu(f)}{\tilde\gamma^p_{k+1:k+i}\tilde R^p_k\mu(1)}-\frac{\tilde\gamma^p_{k+1:k+i}\bar R_k\mu(f)}{\tilde\gamma^p_{k+1:k+i}\bar R_k\mu(1)}.$$

In order to use inequality (3.6), compute

$$\begin{aligned}
\mathbb{E}\bigl(\bigl|\tilde\gamma^p_{k+1:k+i}\tilde R^p_k\mu(f)-\tilde\gamma^p_{k+1:k+i}\bar R_k\mu(f)\bigr|\bigr)&=\mathbb{E}\left(\left|\int(\tilde R^p_k\mu-\bar R_k\mu)(\mathrm{d}\lambda_{(k-p+1)_+:k})\prod_{l=k+1}^{k+i}Q_l(\lambda_{l-1},\mathrm{d}\lambda_l)\,\tilde\Psi^p_l(\lambda_{(l-p+1)_+:l})\,f(\lambda_{(k+i-p+1)_+:k+i})\right|\right)\\
&\le\mathbb{E}\bigl(b_{k+1}\cdots b_{k+i}\,\bigl|(\tilde R^p_k\mu-\bar R_k\mu)(\bar f)\bigr|\bigr)\\
&\le b_{k+1}\cdots b_{k+i}\,\delta_k
\end{aligned}$$

where $\bar f$ is defined as

$$\bar f(\lambda_{(k-p+1)_+:k})=\int\prod_{l=k+1}^{k+i}Q_l(\lambda_{l-1},\mathrm{d}\lambda_l)\,f(\lambda_{(k+i-p+1)_+:k+i})\le 1,$$

and conclude by noting that

$$\tilde\gamma^p_{k+1:k+i}\tilde R^p_k\mu(1)=\int(\tilde R^p_k\mu)(\mathrm{d}\lambda_{(k-p+1)_+:k})\prod_{l=k+1}^{k+i}Q_l(\lambda_{l-1},\mathrm{d}\lambda_l)\,\tilde\Psi^p_l(\lambda_{(l-p+1)_+:l})\ge\frac{1}{a_{k+1}\cdots a_{k+i}}$$

since $\tilde R^p_k\mu$ is a probability measure. ∎

## 4 Mixing and contraction properties of the truncated filter

The truncated filter may be interpreted as a standard filter based on the Markov chain $(\Lambda_{(n-p+1)_+:n})_{n\ge 0}$. This insight allows us to establish the contraction properties of the truncated filter.

###### Lemma 3.

One has:

$$h(\tilde R^p_{k+1:k+p}\mu,\tilde R^p_{k+1:k+p}\mu')\le\frac{1}{\tilde\varepsilon^2_{k+1,p}}\,\|\mu-\mu'\|_{\mathrm{TV}}$$

and

$$h(\tilde R^p_{k+1:k+p}\mu,\tilde R^p_{k+1:k+p}\mu')\le\tilde\rho_{k+1,p}\,h(\mu,\mu')$$

where

$$\tilde\varepsilon^2_{k,p}=\frac{\varepsilon^2_k}{(a_k\cdots a_{k+p-2})(b_k\cdots b_{k+p-2})},\qquad\tilde\rho_{k,p}=\frac{1-\tilde\varepsilon^2_{k,p}}{1+\tilde\varepsilon^2_{k,p}},$$

for all $k\ge 0$, and all probability measures $\mu$, $\mu'$.

Note that $\tilde\varepsilon_{k,p}$ must be interpreted as a mixing coefficient, and $\tilde\rho_{k,p}$ as a Birkhoff contraction coefficient.

###### Proof.

Using Hypothesis 3, one has:

$$\begin{aligned}
Q_{k+p}\tilde\gamma^p_{k+1:k+p-1}\mu&=\int\mu(\mathrm{d}\lambda_{(k-p+1)_+:k})\prod_{i=k+1}^{k+p}Q_i(\lambda_{i-1},\mathrm{d}\lambda_i)\prod_{i=k+1}^{k+p-1}\tilde\Psi^p_i(\lambda_{(i-p+1)_+:i})\\
&\le b_{k+1}\cdots b_{k+p-1}\int\mu(\mathrm{d}\lambda_{(k-p+1)_+:k})\prod_{i=k+1}^{k+p}Q_i(\lambda_{i-1},\mathrm{d}\lambda_i)\\
&\le\frac{b_{k+1}\cdots b_{k+p-1}}{\varepsilon_{k+1}}\,\tilde\xi_p(\mathrm{d}\lambda_{k+1:k+p})
\end{aligned}$$

where $\tilde\xi_p$ stands for the following reference measure:

$$\tilde\xi_p(\mathrm{d}\lambda_{k+1:k+p})=\xi(\mathrm{d}\lambda_{k+1})\prod_{i=k+2}^{k+p}Q_i(\lambda_{i-1},\mathrm{d}\lambda_i).$$

One shows similarly that

$$Q_{k+p}\tilde\gamma^p_{k+1:k+p-1}\mu\ge\frac{\varepsilon_{k+1}}{a_{k+1}\cdots a_{k+p-1}}\,\tilde\xi_p(\mathrm{d}\lambda_{k+1:k+p}).$$

Hence the kernel $Q_{k+p}\tilde\gamma^p_{k+1:k+p-1}$ is mixing, with mixing coefficient $\tilde\varepsilon_{k+1,p}$.

Following Lemma 3.4 in Le Gland and Oudjane (2004),

$$h(\tilde R^p_{k+1:k+p}\mu,\tilde R^p_{k+1:k+p}\mu')=h(Q_{k+p}\tilde\gamma^p_{k+1:k+p-1}\mu,Q_{k+p}\tilde\gamma^p_{k+1:k+p-1}\mu')\le\frac{1}{\tilde\varepsilon^2_{k+1,p}}\,\|\mu-\mu'\|_{\mathrm{TV}}$$

using the scale invariance property of the Hilbert metric. Similarly, according to Lemma 3.9 in the same paper:

$$h(\tilde R^p_{k+1:k+p}\mu,\tilde R^p_{k+1:k+p}\mu')=h(Q_{k+p}\tilde\gamma^p_{k+1:k+p-1}\mu,Q_{k+p}\tilde\gamma^p_{k+1:k+p-1}\mu')\le\left(\frac{1-\tilde\varepsilon^2_{k+1,p}}{1+\tilde\varepsilon^2_{k+1,p}}\right)h(\mu,\mu').$$

∎
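The mechanism behind Lemma 3, that an $\varepsilon$-mixing kernel contracts the Hilbert metric by at least the Birkhoff coefficient $(1-\varepsilon^2)/(1+\varepsilon^2)$, can be illustrated numerically. The construction below (with a uniform reference measure and a mixture kernel, both illustrative assumptions, not taken from the paper) builds such a kernel and verifies the contraction:

```python
import numpy as np

def hilbert_metric(mu, nu):
    """Hilbert metric between positive discrete measures."""
    r = mu / nu
    return float(np.log(r.max() / r.min()))

rng = np.random.default_rng(3)
d, alpha = 3, 0.5
xi = np.full(d, 1.0 / d)                     # uniform reference measure
P = rng.random((d, d))
P /= P.sum(axis=1, keepdims=True)            # arbitrary stochastic matrix
K = alpha * xi + (1 - alpha) * P             # rows lie between eps*xi and xi/eps
eps = min(alpha, 1.0 / (3.0 - 2.0 * alpha))  # mixing coefficient (for d = 3)
rho = (1 - eps**2) / (1 + eps**2)            # Birkhoff contraction bound

mu = rng.random(d); mu /= mu.sum()
nu = rng.random(d); nu /= nu.sum()
# contraction: one step through K shrinks the Hilbert metric by rho
assert hilbert_metric(mu @ K, nu @ K) <= rho * hilbert_metric(mu, nu) + 1e-12
```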

## 5 Propagation of truncation error

We first establish the following two lemmas.

###### Lemma 4.

Let $(\bar R_n)_{n\ge 1}$ be a sequence of (possibly random) probability kernels such that, for all $n\ge 1$ and all probability measures $\mu$,

$$\sup_{f:\|f\|_\infty=1}\mathbb{E}\bigl\{\bigl|\langle\tilde R^p_n\mu-\bar R_n\mu,f\rangle\bigr|\bigr\}\le\delta_n,$$

where the expectation is w.r.t. the randomness of $\bar R_n$; then, for all $n\ge 1$, one has

$$\sup_{f:\|f\|_\infty=1}\mathbb{E}\bigl\{\bigl|\langle\tilde R^p_{1:n}\zeta-\bar R_{1:n}\zeta,f\rangle\bigr|\bigr\}\le\frac{8}{\log 3}\sum_{i=1}^n\left(\frac{\delta_i}{\tilde\varepsilon^2_{i+1,p}\,\tilde\varepsilon^2_{i+p+1,p}}\prod_{j=2}^{\lfloor(n-i)/p\rfloor-1}\tilde\rho_{i+jp+1,p}\right)$$

where $\lfloor x\rfloor$ denotes the integer part of $x$, and with the convention that empty products equal one.

###### Proof.

The following difference can be decomposed into a telescopic sum:

$$\tilde R^p_{1:n}\zeta-\bar R_{1:n}\zeta=\sum_{i=1}^n\bigl(\tilde R^p_{i+1:n}\tilde R^p_i\bar R_{1:i-1}\zeta-\tilde R^p_{i+1:n}\bar R_i\bar R_{1:i-1}\zeta\bigr).$$

We fix the integers $i$, $n$, and consider some arbitrary test function $f$ such that $\|f\|_\infty=1$. For $n-i\le 2p$, one may apply Lemma 2:

$$\sup_{f:\|f\|_\infty=1}\mathbb{E}\bigl\{\bigl|\langle\tilde R^p_{i+1:n}\tilde R^p_i\bar R_{1:i-1}\zeta-\tilde R^p_{i+1:n}\bar R_i\bar R_{1:i-1}\zeta,f\rangle\bigr|\bigr\}\le 2(a_{i+1}\cdots a_n)(b_{i+1}\cdots b_n)\,\delta_i\le\frac{8}{\log 3}\,\frac{\delta_i}{\tilde\varepsilon^2_{i+1,p}\,\tilde\varepsilon^2_{i+p+1,p}}$$

since $2\le 8/\log 3$, $\varepsilon_k\le 1$, and $a_k,b_k\ge 1$ for all $k$.

For $n-i>2p$, let $k=\lfloor(n-i)/p\rfloor$; then, using Lemma 3 and Equations (2.2) to (2.5), one has

$$\begin{aligned}
\bigl|\langle\tilde R^p_{i+1:n}\tilde R^p_i\bar R_{1:i-1}\zeta-\tilde R^p_{i+1:n}\bar R_i\bar R_{1:i-1}\zeta,f\rangle\bigr|&\le\bigl\|\tilde R^p_{i+1:n}\tilde R^p_i\bar R_{1:i-1}\zeta-\tilde R^p_{i+1:n}\bar R_i\bar R_{1:i-1}\zeta\bigr\|_{\mathrm{TV}}\\
&\le\frac{2}{\log 3}\,h\bigl(\tilde R^p_{i+1:i+kp}\tilde R^p_i\bar R_{1:i-1}\zeta,\,\tilde R^p_{i+1:i+kp}\bar R_i\bar R_{1:i-1}\zeta\bigr)\\
&\le\frac{2}{\log 3}\,\frac{1}{\tilde\varepsilon^2_{i+p+1,p}}\prod_{j=2}^{k-1}\tilde\rho_{i+jp+1,p}\,\bigl\|\tilde R^p_{i+1:i+p}\nu-\tilde R^p_{i+1:i+p}\nu'\bigr\|_{\mathrm{TV}}
\end{aligned}$$

where $\nu=\tilde R^p_i\bar R_{1:i-1}\zeta$ and $\nu'=\bar R_i\bar R_{1:i-1}\zeta$. Applying (7), p. 160, of Le Gland and Oudjane (2004), one gets

$$\bigl\|\tilde R^p_{i+1:i+p}\nu-\tilde R^p_{i+1:i+p}\nu'\bigr\|_{\mathrm{TV}}\le\frac{2\,\bigl\|\tilde\gamma^p_{i+1:i+p}\nu-\tilde\gamma^p_{i+1:i+p}\nu'\bigr\|_{\mathrm{TV}}}{\tilde\gamma^p_{i+1:i+p}\nu(1)},$$

where, using the same calculations as in Lemma 3,

$$\tilde\gamma^p_{i+1:i+p}\nu(1)\ge\frac{\varepsilon_{i+1}}{a_{i+1}\cdots a_{i+p}}$$

and

$$\begin{aligned}
\mathbb{E}\bigl[\bigl\|\tilde\gamma^p_{i+1:i+p}\nu-\tilde\gamma^p_{i+1:i+p}\nu'\bigr\|_{\mathrm{TV}}\bigr]&=\mathbb{E}\left[\int_{x'\in E^p}\left|\int_{x\in E^p}(\nu-\nu')(\mathrm{d}x)\,\tilde\gamma^p_{i+1:i+p}(x,\mathrm{d}x')\right|\right]\\
&\le\frac{b_{i+1}\cdots b_{i+p}}{\varepsilon_{i+1}}\left[\sup_{\phi:\|\phi\|_\infty=1}\mathbb{E}\bigl(|\langle\nu-\nu',\phi\rangle|\bigr)\right]
\end{aligned}$$

which ends the proof. ∎

###### Lemma 5.

For all $n\ge 1$ and all $p\ge 1$ such that Hypothesis 2 holds, one has

$$\bigl\|\tilde R^p_{1:n}\zeta-H^p_nR_{1:n}\zeta\bigr\|_{\mathrm{TV}}\le\frac{4\tau^p}{\log 3}\left\{\sum_{i=1}^n\frac{\phi_i}{\tilde\varepsilon^2_{i+1,p}}\prod_{j=1}^{\lfloor(n-i)/p\rfloor-1}\tilde\rho_{i+jp+1,p}\right\}$$

with the convention that empty sums equal zero, and empty products equal one.

###### Proof.

One has:

$$\tilde R^p_{1:n}\zeta-H^p_nR_{1:n}\zeta=\sum_{i=1}^n\bigl(\tilde R^p_{i+1:n}\tilde R^p_iH^p_{i-1}R_{1:i-1}\zeta-\tilde R^p_{i+1:n}H^p_iR_{1:i}\zeta\bigr).$$

For $n-i>2p$, let $k=\lfloor(n-i)/p\rfloor$; then, according to Lemma 3:

$$\bigl\|\tilde R^p_{i+1:n}\tilde R^p_iH^p_{i-1}R_{1:i-1}\zeta-\tilde R^p_{i+1:n}H^p_iR_{1:i}\zeta\bigr\|_{\mathrm{TV}}\le\frac{2}{\log 3}\,h\bigl(\tilde R^p_{i+1:i+kp}\tilde R^p_iH^p_{i-1}R_{1:i-1}\zeta,\,\tilde R^p_{i+1:i+kp}H^p_iR_{1:i}\zeta\bigr)$$