A proof of uniform convergence over time for a distributed particle filter

A proof of uniform convergence over time for a distributed particle filter

Joaquín Míguez j.miguez@qmul.ac.uk Manuel A. Vázquez mvazquez@tsc.uc3m.es School of Mathematical Sciences, Queen Mary University of London.
Mile End Rd, E1 4NS London, UK.
Departamento de Teoría de la Señal y Comunicaciones, Universidad Carlos III de Madrid, Avenida de la Universidad 30, 28911 Leganés, Madrid, Spain.
Abstract

Distributed signal processing algorithms have become a hot topic during the past years. One class of algorithms that have received special attention are particles filters (PFs). However, most distributed PFs involve various heuristic or simplifying approximations and, as a consequence, classical convergence theorems for standard PFs do not hold for their distributed counterparts. In this paper, we analyze a distributed PF based on the non-proportional weight-allocation scheme of Bolic et al (2005) and prove rigorously that, under certain stability assumptions, its asymptotic convergence is guaranteed uniformly over time, in such a way that approximation errors can be kept bounded with a fixed computational budget. To illustrate the theoretical findings, we carry out computer simulations for a target tracking problem. The numerical results show that the distributed PF has a negligible performance loss (compared to a centralized filter) for this problem and enable us to empirically validate the key assumptions of the analysis.

1 Introduction

Distributed signal processing algorithms have become a hot topic during the past years, propelled by fast technological developments in the fields of parallel computing, on one hand, and wireless sensor networks (WSNs), on the other. In parallel computing, algorithms are optimized to run fast on a set of concurrent processors (e.g., in a graphics processing unit (GPU) Suchard10 ()), while signal processing methods for WSNs are designed for their implementation over a collection of low-power nodes that communicate wirelessly and share the processing tasks Read14 (). Popular techniques in the WSN arena include consensus-based estimators Fagnani08 (); Kar09 (); Kar10 (), diffusion-based adaptive algorithms Lopes08 (); Cattivelli10 (); Chen12 () and distributed stochastic filters, including Kalman filters Schizas08a (); Schizas08b () and particle filters (PFs) Hlinka12 (); Lee13 (); Dias13 (); Dias13b (). While consensus and diffusion algorithms require many iterations of message passing for convergence, PFs are a priori better suited for online estimation and prediction tasks. Unfortunately, most distributed PFs (DPFs) rely on simplifying approximations and their convergence cannot be guaranteed by the classical theorems in Crisan01 (); DelMoral01c (); Bain08 (). One exception is the Markov chain distributed particle filter (MCDPF), for which analytical results exist Lee13 (). However, the MCDPF converges asymptotically as sets of samples and weights are retransmitted repeatedly over the network according to a random scheme. From this point of view, it is as communication-intensive as consensus algorithms and, therefore, less appropriate for online processing compared to classical PFs.

The implementation of PFs on parallel computing systems has received considerable attention since these methods were originally proposed in Gordon93 (). The efficient implementation of PFs on parallel devices such as GPUs and multi-core CPUs is not as straightforward as it seems a priori because these Monte Carlo algorithms involve a resampling step which is inherently hard to parallelize. This issue is directly addressed in Bolic05 (), where two parallel implementations of the resampling step are proposed. While the approach of Bolic05 () is sound, the authors focus on implementation issues and no proof of convergence of the resulting PFs is provided. Only very recently, a number of authors have proposed distributed particle filtering schemes with provable convergence Whiteley13 (); Verge13 (). These methods have a fairly broad scope (the methodology in Whiteley13 () can actually be seen as a generalization of the techniques in Bolic05 ()) yet they appear to be less suitable for practical implementations under communications or computing power constraints, as they involve considerable parallelization overhead Verge13 () or depend on the centralized computation of certain statistics that involve the whole set of particles in the filter Whiteley13 ().

The goal of this paper is to provide a rigorous proof of convergence for a DPF that relies on the distributed resampling with non-proportional weight-allocation scheme of Bolic05 () (later adapted for implementation over WSNs in Read14 ()). Under assumptions regarding the stability of the state-space model underlying the PF, we prove that this algorithm converges asymptotically (as the number of particles generated by the filter increases) and uniformly over time. Time-uniform convergence implies that the estimation errors stay bounded without having to increase the computational effort of the filter over time. We provide explicit convergence rates for the DPF and discuss the implications of this result and the assumptions on which the analysis is based. The theoretical investigation is complemented by computer simulations of an indoor target tracking problem. For this specific system, we first show that the performance of the centralized and distributed PFs is very similar and then proceed to validate numerically a key assumption used in the analysis, related to the degree of cooperation among processing elements in the distributed computing system on which the algorithm is run.

The rest of the paper is organized as follows. In Section 2 we describe the DPF of interest. In Section 3 we prove a uniform convergence result for this filter and discuss the implications of such result. Computer simulations are presented in Section 4 and, finally, Section 5 is devoted to the conclusions.

2 A distributed particle filtering algorithm

2.1 State space systems and the standard particle filter

The stochastic filtering problem consists in tracking the posterior probability distribution of the state variables of a random dynamic system. Often, the problem is restricted to the (broad) class of Markov state space systems with conditionally independent observations. Let denote the discrete-time random sequence of the system state variables, taking values on the -dimensional set , and let denote the corresponding sequence of observations, taking values on . The systems of interest are modeled by triplets of the form , where is the prior probability measure associated to the random variable (r.v.) , is a Markov kernel that determines the probability distribution of conditional on , and is the conditional probability density function (pdf) of the random observation , given the state , with respect to (w.r.t.) the Lebesgue measure. The latter is most often used as the likelihood of given the observation . We write as a function of explicitly, namely , to emphasize this fact.

The goal in the stochastic filtering problem is to sequentially compute the posterior probability measures of given the observations , denoted , for (note that ). Except for a few particular cases, e.g., the Kalman Kalman60 (); Anderson79 () and Beneš Bain08 () filters, cannot be computed exactly and numerical approximations are pursued instead. PFs are recursive Monte Carlo algorithms that generate random discrete approximations of the probability measures Crisan01 (); DelMoral01c (); Bain08 (). At time a typical particle filtering algorithm produces a set of random samples (often termed particles) and associated importance weights, with , and approximate by way of the random probability measure , where denotes the Dirac (unit) delta measure located at .

It is common to analyze the convergence of PFs in terms of the approximation of integrals w.r.t. DelMoral00 (); Crisan01 (); Bain08 (); DelMoral01c (); Miguez13b (). To be specific, let be a real function integrable w.r.t. . Then we denote

and approximate the latter integral (generally intractable) as

2.2 A distributed particle filter

We describe a PF based on the distributed resampling with non-proportional allocation (DRNA) scheme of (Bolic05, , Section IV.A.3) (see also Miguez07b (); Balasingam11 (); Read14 ()). Assume that the set of weighted particles can be split into disjoint sets,

each of them assigned to an independent processing element (PE). The total number of particles is , where is the number of PEs and is the number of particles per PE. At the -th PE, , we additionally keep track of the aggregated weight for that PE.

Every time steps, the PEs exchange subsets of particles and weights by using some communication network Bolic05 (). We formally represent this transfer of data among PEs by means of a deterministic one-to-one map

that keeps the number of particles per PE, , invariant. To be specific, means that the -th particle of the -th PE is transmitted to the -th PE, where it becomes particle number . Typically, only subsets of particles are transmitted from one PE to a different one, hence for many values of and . The DPF of interest in this paper can be outlined as follows.

Algorithm 1.

DPF based on the DRNA scheme of (Bolic05, , Section IV.A.3), with PEs, particles per PE and periodic particle exchanges every time steps.

  1. For (concurrently) draw , , and set and .

  2. Assume that and are available for each .

    1. For (concurrently) and ,

    2. Local resampling: for (concurrently) set with probability

      for and .

      Set for each and all .

    3. Particle exchange: If for some , then set

      for every . Also set for every .

      Otherwise, if , set , , .

Every PE operates independently of all others except for the particle exchange, step 2.c), which is performed every time steps. The degree of interaction can be controlled by designing the map in a proper way. Typically, exchanging a subset of particles with “neighbor” PEs is sufficient, as illustrated by the following example.

Example 1.

Consider a circular arrangement in which the -th PE exchanges particles with PE and , where indicates the modulus operation ( is the integer remainder of the division ). To be explicit,

  • for each , the -th PE exchanges particles with two neighbours, namely PE number and PE number ,

  • PE number 1 exchanges particles with PE number and PE number 2, and

  • PE number exchanges particles with PE number and PE number 1.

Next, assume for simplicity that each PE sends one particle to each one of its neighbors (i.e., it sends out two particles) and receives one particle from each one of its neighbors as well (i.e., it gets two new particles) so that the number of particles per PE, remains constant. One choice of map that implements such an exchange is the following

In other words, for each PE ,

  • all particles with index remain the same,

  • the first particle () is sent to PE and, in exchange, the -th particle from that neighbor is received, i.e., and ,

  • the last particle () is sent to PE and, in exchange, the -st particle from that neighbor is received, i.e., and .

It is apparent that this instance of preserves the number of particles per PE constant. More elaborate schemes can be designed and in general this is related to the structure of the communication network that interconnects the PEs. As long as it is guaranteed that the number of particles that a PE gives to its neighbors is the same as the number of particles that it receives from these neighbors, the number of particles per PE remains invariant.

Remark 1.

The local resampling step 2.b) is carried out independently, and concurrently, at each PE and it does not change the aggregate weights, i.e., . We assume a multinomial resampling procedure, but other schemes (see, e.g., Bain08 ()) can be easily incorporated.

2.3 Measure and integral approximations

Let

be the locally normalized versions of the importance weights and let

be the globally normalized aggregated weights of the -th PE before and after the particle exchange step, respectively. The DPF produces three different local approximations of the posterior measure at each PE, namely

corresponding to steps 2.a), 2.b) and 2.c) of Algorithm 1. The normalized aggregate weights can be used to combine the local approximations, which readily yields global approximations of the posterior measure, i.e.,

(1)

Note that only the local normalization of the weights , , is necessary for Algorithm 1 to run (as they are needed in the local resampling step 2.b). The computation of the ’s, the ’s or are only necessary when local or global estimates of are needed. However, the computation of these estimates can be carried out concurrently with Algorithm 1, i.e., the DPF can keep running in parallel with the computation of any estimates.

Remark 2.

Algorithm 1 enjoys some relatively straightforward properties that should be highlighted, as they are relevant for the analysis in Section 3.

  1. The particle exchange step does not change the particles or their weights. It only “shuffles” the particles among the PEs and updates the aggregate weights accordingly. As a result, , since the individual particles and weights are not changed.

  2. A random exchange (i.e., a random ) is also possible, although it makes certain practical implementations harder Bolic05 (). We abide by a deterministic map for the sake of conciseness, although the analysis can be extended to account for random schemes in a relatively straightforward manner.

  3. The ensemble of local resampling steps keeps the local and aggregate importance weights proper and is globally unbiased Read14 ().

The goal of this paper is to analyze the approximation of integrals using the random measure in (1). We look into the norm of the approximation errors, namely for , where is an integrable real function on and . The expectation is taken w.r.t. the distribution of the r.v. .

3 Analysis

3.1 Assumptions, preliminary results and notations

Let be the set of probability measures on , where is the Borel -algebra of open subsets of the state space . Choose a measure and let be a real function integrable w.r.t. . We define the measure-valued map as

and it is not difficult to show that is the transformation that converts the filter measure at time into the filter at time , i.e., DelMoral01c (); Bain08 (). Composition of maps is denoted , for , hence , and we adopt the convention . We also define the functional , recursively, as

for , and it is not difficult to show that DelMoral01c ()

(2)

where is the constant unit function.

For conciseness, we denote the set of bounded real functions over as , i.e., if, and only if, is a map of the form and . In the sequel, we analyze the asymptotics of the approximation errors , where , subject to the following assumptions.

Assumption 1.

There exists a bounded sequence of positive real numbers such that , for every , and .

Assumption 2.

For any probability measures and every ,

Assumption 1 states that the likelihood functions are upper-bounded as well as bounded away from 0. Assumption 2 states that the optimal filter for the given state-space system is stable. A detailed study of the stability properties of optimal filters for the class of state space models of interest here can be found in DelMoral01c () (see also VanHandel09 (); VanHandel09b () for recent developments), including conditions on the kernels and the likelihoods which are sufficient to ensure stability.

Assumption 3.

The particle exchange step, with period , guarantees that

(3)

and some constants , and independent of .

Intuitively, Assumption 3 says that the aggregate weights remain “sufficiently balanced” (i.e., no PE takes too much weight compared to others). We also introduce the lemma below which, combined with Assumption 3, is key to the analysis of the approximation errors, as it enables us to obtain tractable bounds for the aggregate weights.

Lemma 1.

Assume that

(4)

for some , and constant w.r.t. . Then, there exists a non-negative and a.s. finite random variable , independent of , such that

(5)

where is also a constant w.r.t. . Moreover, there is another constant independent of and such that

Proof. See A.

Remark 3.

The r.v. can be written as

(see Eq. (45) in A), where is constant w.r.t. . If we also note that the aggregate weights after the particle exchange step, , can be computed deterministically111Because the map used for the particle exchange is deterministic. given the aggregates before the exchange, , then it follows that is measurable w.r.t. the -algebra , where each term in the countable union is a generated -algebra, namely

Finally, we introduce a simple inequality that will be repeatedly used through the analysis of Algorithm 1. Let be probability measures and let be two real bounded functions on such that and . If the identities hold, then it is straightforward to show (see, e.g., Crisan01 ()) that

(6)

3.2 Uniform convergence over time

In this section we rigorously prove that , as and remains fixed, uniformly over time. The key result is Lemma 2 below, on the propagation of errors across the map . From this result, we then obtain the main theorem on the convergence of Algorithm 1.

Lemma 2.

Let be fixed. If Assumption 1 holds, with , and Assumption 3 holds, with and , then there exist constants and , independent of , such that

(7)

for every , , , and .

Proof. Let us write in the remaining of the proof for conciseness. We can use the relationship (2) to rewrite the norm of the approximation error as

and applying (6) together with Assumption 1 in the equation above, we readily find an upper bound of the form

(8)

where

(9)

(note that ).

The two terms between square brackets on the right hand side (rhs) of (8) have the same form. To upper-bound them, we need to find bounds for errors of the form , where . To do this, we first split the norm of the error using a triangle inequality,

To deduce a bound for the first term on the rhs of (3.2), let us recall that the particle exchange step does not modify the individual particle weights, only the aggregates, hence . Then, we can readily write the conditional expectation of (given , see Remark 3) as

(11)

where the r.v.’s are conditionally independent (given ), zero mean (since ) and bounded (namely, for all , , and ). Additionally, from step 2.a) of Algorithm 1 it follows that the normalized aggregate weights , , have the form

(12)
(13)

where the inequality (12) is a consequence of Assumption 1, while (13) follows immediately from the definition of the weights in Algorithm 1. However, given (13) and provided that there is no particle exchange at times (exchanges occur periodically with period ) we readily obtain a straightforward relationship in the sequence of aggregate weights, namely

(14)

Since the most recent particle exchange was carried out at most time steps earlier, we can readily iterate (14) to obtain

(15)

However, the inequality (15) combined with Assumption 3 yields

(16)

for some , where , and are constants independent of , and . In turn, the inequality (16) enables the application of Lemma 1, which states that there exists an a.s. finite r.v. , independent of , such that

(17)

where is also constant w.r.t. . Substituting (17) back into Eq. (11) we arrive at

(18)

Since is measurable w.r.t. (see Remark 3) and the r.v.’s are conditionally independent, have zero mean and upper bound , it is an exercise in combinatorics to show that

(19)

for some constant independent of , and (actually, independent of the distribution of the ’s). Taking unconditional expectations on both sides of the inequality in (19) yields

(20)

where (see Lemma 1), since in Assumption 3. Moreover, from Lemma 1, there exists a constant such that for some . Therefore, for any there exists such that and

which readily yields, for any ,

(21)

for any