Distributed Kalman Filter via Gaussian Belief Propagation

# Distributed Kalman Filter via Gaussian Belief Propagation

## Abstract

Recent result shows how to compute distributively and efficiently the linear MMSE for the multiuser detection problem, using the Gaussian BP algorithm. In the current work, we extend this construction, and show that operating this algorithm twice on the matching inputs, has several interesting interpretations. First, we show equivalence to computing one iteration of the Kalman filter. Second, we show that the Kalman filter is a special case of the Gaussian information bottleneck algorithm, when the weight parameter . Third, we discuss the relation to the Affine-scaling interior-point method and show it is a special case of Kalman filter.

Besides of the theoretical interest of this linking estimation, compression/clustering and optimization, we allow a single distributed implementation of those algorithms, which is a highly practical and important task in sensor and mobile ad-hoc networks. Application to numerous problem domains includes collaborative signal processing and distributed allocation of resources in a communication network.

## 1 Introduction

Recent work [1] shows how to compute efficiently and distributively the MMSE prediction for the multiuser detection problem, using the Gaussian Belief Propagation (GaBP) algorithm. The basic idea is to shift the problem from linear algebra domain into a probabilistic graphical model, solving an equivalent inference problem using the efficient belief propagation inference engine. [2] compares the empirical performance of the GaBP algorithm relative to other linear iterative algorithms, demonstrating faster convergence. [3] elaborates on the relation to solving systems of linear equations.

In the present work, we propose to extend the previous construction, and show, that by performing the MMSE computation twice on the matching inputs we are able to compute several algorithms. First, we reduce the discrete Kalman filter computation [4] to a matrix inversion problem and show how to solve it using the GaBP algorithm. We show that Kalman filter iteration which is composed from prediction and measurement steps can be computed by two consecutive MMSE predictions. Second, we explore the relation to Gaussian information bottleneck (GIB) [5] and show that Kalman filter is a special instance of the GIB algorithm, when the weight parameter . To the best of the authors knowledge, this is the first algorithmic link between the information bottleneck framework and linear dynamical systems. Third, we discuss the connection to the Affine-scaling interior-point method and show it is an instance of the Kalman filter.

Besides of the theoretical interest of linking compression, estimation and optimization together, our work is highly practical since it proposes a general framework for computing all of the above tasks distributively in a computer network. This result can have many applications in the fields of estimation, collaborative signal processing, distributed resource allocation etc.

A closely related work is [6]. In this work, Frey et al. focus on the belief propagation algorithm (a.k.a sum-product algorithm) using factor graph topologies. They show that the Kalman filter algorithm can be computed using belief propagation over a factor graph. In this contribution we extend their work in several directions. First, we extend the computation to vector variables (relative to scalar variables in Frey’s work). Second, we use a different graphical model: an undirected graphical model which results in simpler update rules, where Frey uses factor-graph with two types of messages: factor to variables and variables to factors. Third and most important, we allow an efficient distributed calculation of the Kalman filter steps, where Frey’s algorithm is centralized.

Another related work is [7]. In this work the link between Kalman filter and linear programming is established. In this work we propose a new and different construction which ties the two algorithms.

The structure of this paper is as follows. In Section 2 we describe the discrete Kalman filter. In Section 3 we outline the GIB algorithm and discuss its relation to the Kalman filter. Section 4 presents the Affine-scaling interior-point method and compares it to the Kalman filter algorithm. Section 5 presents our novel construction for performing an efficient distributed computation of the three methods.

## 2 Kalman Filter

### 2.1 An Overview

The Kalman filter is an efficient iterative algorithm to estimate the state of a discrete-time controlled process that is governed by the linear stochastic difference equation 1:

 xk=Axk−1+Buk−1+wk−1, (1)

with a measurement that is The random variables and that represent the process and measurement AWGN noise (respectively). . We further assume that the matrices are given2.

The discrete Kalman filter update equations are given by [4]:

The prediction step:

 ^x−k = A^xk−1+Buk−1, (2a) P−k = APk−1AT+Q. (2b)

The measurement step:

 Kk = P−kHT(HP−kHT+R)−1, (3a) ^xk = ^x−k+Kk(zk−H^x−k), (3b) Pk = (I−KkH)P−k. (3c)

where is the identity matrix.

The algorithm operates in rounds. In round the estimates are computed, incorporating the (noisy) measurement obtained in this round. The output of the algorithm are the mean vector and the covariance matrix .

### 2.2 New construction

Our novel contribution is a new efficient distributed algorithm for computing the Kalman filter. We begin by showing that the Kalman filter can be computed by inverting the following covariance matrix:

 E=⎛⎜⎝−Pk−1A0ATQH0HTR⎞⎟⎠, (4)

and taking the lower right block to be .

The computation of can be done efficiently using recent advances in the field of Gaussian belief propagation [3, 1]. The intuition for our approach, is that the Kalman filter is composed of two steps. In the prediction step, given , we compute the MMSE prediction of [6]. In the measurement step, we compute the MMSE prediction of given , the output of the prediction step. Each MMSE computation can be done using the GaBP algorithm [1]. The basic idea, is that given the joint Gaussian distribution with the covariance matrix , we can compute the MMSE prediction

 ^y=argmaxyp(y|x)∝N(μy|x,Σ−1y|x),

where

 μy|x=(Σyy−ΣyxΣ−1xxΣxy)−1ΣyxΣ−1xxx,
 Σy|x=(Σyy−ΣyxΣ−1xxΣxy)−1.

This in turn is equivalent to computing the Schur complement of the lower right block of the matrix . In total, computing the MMSE prediction in Gaussian graphical model boils down to a computation of a matrix inverse. In [3] we have shown that GaBP is an efficient iterative algorithm for solving a system of linear equations (or equivalently computing a matrix inverse). In [1] we have shown that for the specific case of linear detection we can compute the MMSE estimator using the GaBP algorithm. Next, we show that performing two consecutive computations of the MMSE are equivalent to one iteration of the Kalman filter.

###### Theorem 1

The lower right block of the matrix inverse (eq. 4), computed by two MMSE iterations, is equivalent to the computation of done by one iteration of the Kalman filter algorithm.

Proof of Theorem 1 is given in Appendix A.

In Section 5 we explain how to utilize the above observation to an efficient distributed iterative algorithm for computing the Kalman filter.

## 3 Gaussian Information Bottleneck

Given the joint distribution of a source variable X and another relevance variable Y, Information bottleneck (IB) operates to compress X, while preserving information about Y [8, 9], using the following variational problem:

 minp(t|x)L:L≡I(X;T)−βI(T;Y)

represents the compressed representation of via the conditional distributions , while the information that maintains on is captured by the distribution . is a lagrange multiplier which weights the tradeoff between minimizing the compression information and maximizing the relevant information. As we are interested solely in compression, but all relevant information about Y is lost . When where are focused on preservation of relevant information, in this case T is simply the distribution X and we obtain . The interesting cases are in between, when for finite values of we are able to extract rather compressed representation of X while still maintaining a significant fraction of the original information about Y.

An iterative algorithm for solving the IB problem is given in [9]:

 Pk+1(t|x)= Pk(t)Zk+1(x,β)⋅ (5a) ⋅exp(−βDKL[p(y|x)||pk(y|t)]), Pk(t)= ∫xp(x)Pk(t|x)dx, (5b) Pk(y|t)= 1Pk(t)∫xPk(t|x)p(x,y)dx. (5c)

where is a normalization factor computed in round .

The Gaussian information bottleneck (GIB) [5] deals with the special case where the underlying distributions are Gaussian. In this case, the computed distribution is Gaussian as well, represented by a linear transformation where is a joint covariance matrix of and , is a multivariate Gaussian independent of X. The outputs of the algorithm are the covariance matrices representing the linear transformation T: .

An iterative algorithm is derived by substituting Gaussian distributions into (5), resulting in the following update rules:

 Σξ+1 = (βΣtk|y−(β−1)Σ−1tk), (6a) Ak+1 = βΣξk+1Σ−1tk|yAk(I−Σy|xΣ−1x). (6b)

Since the underlying graphical model of both algorithms (GIB and Kalman filter) is Markovian with Gaussian probabilities, it is interesting to ask what is the relation between them. In this work we show, that the Kalman filter posterior error covariance computation is a special case of the GIB algorithm when . Furthermore, we show how to compute GIB using the Kalman filter when (the case where is not interesting since it gives a degenerate solution where [5].) Table 1 outlines the different notations used by both algorithms.

###### Theorem 2

The GIB algorithm when is equivalent to the Kalman filter algorithm.

The proof is given in Appendix B.

###### Theorem 3

The GIB algorithm when can be computed by a modified Kalman filter iteration.

The proof is given in Appendix C.

There are some differences between the GIB algorithm and Kalman filter computation. First, the Kalman filter has input observations in each round. Note that the observations do not affect the posterior error covariance computation (eq. 3c), but affect the posterior mean (eq. 3b). Second, Kalman filter computes both posterior mean and error covariance . The covariance computed by the GIB algorithm was shown to be identical to when . The GIB algorithm does not compute the posterior mean, but computes an additional covariance (eq. 6b), which is assumed known in the Kalman filter.

From the information theoretic perspective, our work extends the ideas presented in [10]. Predictive information is defined to be the mutual information between the past and the future of a time serias. In that sense, by using Theorem 2, Kalman filter can be thought of as a prediction of the future, which from the one hand compresses the information about past, and from the other hand maintains information about the present.

The origins of similarity between the GIB algorithm and Kalman filter are rooted in the IB iterative algorithm: For computing (5a), we need to compute (5b,5c) in recursion, and vice versa.

## 4 Relation to the Affine-scaling algorithm

One of the most efficient interior point methods used for linear programming is the Affine-scaling algorithm [11]. It is known that the Kalman filter is linked to the Affine-scaling algorithm [7]. In this work we give an alternate proof, based on different construction, which shows that Affine-scaling is an instance of Kalman filter, which is an instance of GIB. This link between estimation and optimization allows for numerous applications. Furthermore, by providing a single distribute efficient implementation of the GIB algorithm, we are able to solve numerous problems in communication networks.

The linear programming problem in its canonical form is given by:

 minimize cTx (7a) subject to Ax=b,    x≥0. (7b)

where with . We assume the problem is solvable with an optimal . We also assume that the problem is strictly feasible, in other words there exists that satisfies and .

The Affine-scaling algorithm [11] is summarized below. Assume is an interior feasible point to (7b). Let . The Affine-scaling is an iterative algorithm which computes a new feasible point that minimizes the cost function (7a):

 x1=x0−αγD2r (8)

where is the step size, is the step direction.

 r = (c−ATw), (9a) w = (AD2AT)−1AD2c, (9b) γ = maxi(eiPDc). (9c)

Where is the unit vector and is a projection matrix given by:

The algorithm continues in rounds and is guaranteed to find an optimal solution in at most rounds. In a nutshell, in each iteration, the Affine-scaling algorithm first performs an Affine-scaling with respect to the current solution point and obtains the direction of descent by projecting the gradient of the transformed cost function on the null space of the constraints set. The new solution is obtained by translating the current solution along the direction found and then mapping the result back into the original space [7]. This has interesting analogy for the two phases of the Kalman filter.

###### Theorem 4

The Affine-scaling algorithm iteration is an instance of the Kalman filter algorithm iteration.

Proof is given in Appendix D.

## 5 Efficient distributed computation

We have shown how to express the Kalman filter, Gaussian information bottleneck and Affine-scaling algorithms as a two step MMSE computation. Each step involves inverting a block matrix. Recent result by Bickson and Shental et al. [1] show that the MMSE computation can be done efficiently and distributively using the Gaussian belief propagation algorithm. Because of space limitations the full algorithm is not reproduced here.

The interested reader is referred to [3, 1] for a complete derivation of the GaBP update rules and convergence analysis. The GaBP algorithm is summarized in Table 2.

Regarding convergence, if it converges, GaBP is known to result in exact inference [12]. Determining the exact region of convergence and convergence rate remain open research problems. All that is known is a sufficient (but not necessary) condition [13, 14] stating that GaBP converges when the spectral radius satisfies , where is first normalized s.t. the main diagonal contains ones. A stricter sufficient condition [12], determines that the matrix must be diagonally dominant (i.e. , ) in order for GaBP to converge.

Regarding convergence speed, [15] shows that when converging, the algorithm converges in iterations, where is the desired accuracy, and is a parameter related to the inverted matrix. The computation overhead in each iteration is determined by the number of non-zero elements of the inverted matrix . In practice, [16] demonstrates convergence of 5-10 rounds on sparse matrices with several millions of variables. [17] shows convergence of dense constraint matrices of size up to in 6 rounds, where the algorithm is run in parallel using 1,024 CPUs. Empirical comparison with other iterative algorithms is given in [2].

## 6 Example application

The TransFab software package is a distributed middleware developed in IBM Haifa Labs, which supports real time forwarding of message streams, providing quality of service guarantees. We plan to use our distributed Kalman filter algorithm for online monitoring of software resources and performance. On each second each node records a vector of performance parameters like memory usage, CPU usage, current bandwidth, queue sizes etc. The nodes execute the distributed Kalman filter algorithm on the background. Figure 3 plots a covariance matrix of running an experiment using two TransFab nodes propagating data. The covariance matrix is used as an input the Kalman filter algorithm. Yellow sections show high correlation between measured parameters. Initial results are encouraging, we plan to report them using a future contribution.

## 7 Conclusion

In this work we have linked together several different algorithms from the the fields of estimation (Kalman filter), clustering/compression (Gaussian information bottleneck) and optimization (Affine-scaling interior-point method). Besides of the theoretical interest in linking those different domains, we are motivated by practical problems in communication networks. To this end, we propose an efficient distributed iterative algorithm, the Gaussian belief propagation algorithm, to be used for efficiently solving these problems.

## Acknowledgment

O. Shental acknowledges the partial support of the NSF (Grant CCF-0514859). The authors are grateful to Noam Slonim and Naftali Tishby from the Hebrew University of Jerusalem for useful discussions.

## Appendix A

{proof}

We prove that inverting the matrix (eq. 4) is equivalent to one iteration of the Kalman filter for computing .

We start from the matrix and show that can be computed in recursion using the Schur complement formula:

 D−CA−1B (11)

applied to the upper left submatrix of E, where we get:

 P−k=DQ−+CAT−A−1Pk−1BA.

Now we compute recursively the Schur complement of lower right submatrix of the matrix using the matrix inversion lemma:

 A−1+A−1B(D−CA−1B)−1CA−1

where In total we get:

 A−1P−k+A−1P−kBHT(DR+CHA−1P−kBHT)−1CHA−1P−k= (12)
 (I−(???)P−kHT(HP−kHT+R)−1H)P−k=
 =(???)(I−KkH)P−k=Pk

## Appendix B

{proof}

Looking at [5, ], when we get

 Σξ+1=(Σ−1tk|y)−1=Σtk|y=MMSEΣtk−ΣtkyΣ−1yΣytk=\@@cite[cite]{[% \@@bibref{Refnum}{GIB}{}{}, \S 38b]}Σtk+BTΣy|tkB=Σtk+\@@cite[c% ite]{[\@@bibref{Refnum}{GIB}{}{}, \S 34]}Σ−1tkΣtkyΣy|tk\@@cite[cite]{[\@@bibref{Refnum}{GIB}{}{}, % \S 34]}ΣytkΣ−1tk=\@@cite[cite]{[\@@bibref{Refn% um}{GIB}{}{}, \S 33]}ATΣxA+Σξ+% \@@cite[cite]{[\@@bibref{Refnum}{GIB}{}{}, \S 33]}(ATΣxA+Σξ)ATΣxy⋅⋅Σy|tkΣyxA\@@cite[cite]{[\@@bibref{Refnum}{GIB}{}{}, \S 33]}(ATΣxA+Σξ)T=ATΣxA+Σξ+(ATΣxA+Σξ)ATΣxy⋅⋅MMSE(Σy+ΣytkΣ−1tkΣtky)ΣyxA(ATΣxA+Σξ)T=ATΣxA+Σξ+(ATΣxA+Σξ)ATΣxy⋅(Σy+\@@cite[cite]{[\@@bibref{Refnu% m}{GIB}{}{}, \S 5]}ATΣyx(% \@@cite[cite]{[\@@bibref{Refnum}{GIB}{}{}, \S 5]}(AΣxAT+Σξ)\@@cite[cite]{[\@@bibref{Refnum}{GIB}{}{}, \S 5]}ΣxyA)⋅⋅ΣyxA(ATΣxA+Σξ)T.

Now we show this formulation is equivalent to the Kalman filter with the following notations:

 P−k≜(ATΣxA+Σξ)  ,H≜ATΣyx,  R≜Σy,
 Pk−1≜Σx,Q≜Σξ.

Substituting we get:

 P−k(ATΣxA+Σξ)+P−k(ATΣxA+Σξ)HTATΣxy⋅⋅(RΣy+HATΣyxP−k(ATΣxA+Σξ)HTΣxyA)⋅⋅HΣyxAP−k(ATΣxA+Σξ).

Which is equivalent to (12). Now we can apply Theorem 1 and get the desired result.

## Appendix C

{proof}

In the case where , the MAP covariance matrix as computed by the GIB algorithm is:

 Σξk+1=βΣtk|y+(1−β)Σtk (13)

This is a weighted average of two covariance matrices. is computed at the first phase of the algorithm (equivalent to the prediction phase in Kalman literature), and is computed in the second phase of the algorithm (measurement phase). At the end of the Kalman iteration, we simply compute the weighted average of the two matrices to get (13). Finally, we compute using (eq. 6b) by substituting the modified .

## Appendix D

{proof}

We start by expanding the Affine-scaling update rule:

Looking at the numerator and using the Schur complement formula (11) with the following notations: we get the following matrix: . Again, the upper left block is a Schur complement of the following matrix: . In total with get a block matrix of the form: .
Note that the divisor is a scalar which affects the scaling of the step size.

Using Theorem 1, we get a computation of Kalman filter with the following parameters: . This has an interesting interpretation in the context of Kalman filter: both prediction and measurement transformation are identical and equal . The noise variance of both transformations are Gaussian variables with prior .

### Footnotes

1. In this paper, we assume there is no external input, namely . However, our approach can be easily extended to support external inputs.
2. Another possible extension is that the matrices change in time, in this paper we assume they are fixed. However, our approach can be generalized to this case as well.

### References

1. D. Bickson, O. Shental, P. H. Siegel, J. K. Wolf, and D. Dolev, “Gaussian belief propagation based multiuser detection,” in IEEE Int. Symp. on Inform. Theory (ISIT), Toronto, Canada, July 2008.
2. ——, “Linear detection via belief propagation,” in Proc. 45th Allerton Conf. on Communications, Control and Computing, Monticello, IL, USA, Sept. 2007.
3. O. Shental, D. Bickson, P. H. Siegel, J. K. Wolf, and D. Dolev, “Gaussian belief propagation solver for systems of linear equations,” in IEEE Int. Symp. on Inform. Theory (ISIT), Toronto, Canada, July 2008.
4. G. Welch and G. Bishop, “An introduction to the kalman filter,” Tech. Rep., 2006. [Online]. Available: http://www.cs.unc.edu/~welch/kalman/kalmanIntro.html
5. G. Chechik, A. Globerson, N. Tishby, and Y. Weiss, “Information bottleneck for gaussian variables,” in Journal of Machine Learning Research, vol. 6.   Cambridge, MA, USA: MIT Press, 2005, pp. 165–188.
6. F. Kschischang, B. Frey, and H. A. Loeliger, “Factor graphs and the sum-product algorithm,” in IEEE Transactions on Information Theory, vol. 47, Feb. 2001, pp. 498–519.
7. S. Puthenpura, L. Sinha, S.-C. Fang, and R. Saigal, “Solving stochastic programming problems via kalman filter and affine scaling,” in European Journal of Operational Research, vol. 83, no. 3, 1995, pp. 503–513. [Online]. Available: http://ideas.repec.org/a/eee/ejores/v83y1995i3p503-513.html
8. N. Tishby, F. Pereira, and W. Bialek, “The information bottleneck method,” in The 37th annual Allerton Conference on Communication, Control, and Computing, invited paper, September 1999.
9. N. Slonim, “The information bottelneck: Theory and application,” in Ph.D. Thesis, School of Computer Science and Enigeering, The Hebrew University of Jerusalem, 2003.
10. W. Bialek, I. Nemenman, and N. Tishby, “Predictability, complexity, and learning,” in Neural Comput., vol. 13, no. 11.   Cambridge, MA, USA: MIT Press, November 2001, pp. 2409–2463.
11. R. J. Vanderbei, M. S. Meketon, and B. A. Freedman, “A modification of karmarkar’s linear programming algorithm,” in Algorithmica, vol. 1, no. 1, March 1986, pp. 395–407.
12. Y. Weiss and W. T. Freeman, “Correctness of belief propagation in Gaussian graphical models of arbitrary topology,” in Neural Computation, vol. 13, no. 10, 2001, pp. 2173–2200.
13. J. Johnson, D. Malioutov, and A. Willsky, “Walk-sum interpretation and analysis of gaussian belief propagation,” in Nineteenth Annual Conference on Neural Information Processing Systems (NIPS 05’), 2005.
14. D. M. Malioutov, J. K. Johnson, and A. S. Willsky, “Walk-sums and belief propagation in Gaussian graphical models,” in Journal of Machine Learning Research, vol. 7, Oct. 2006.
15. D. Bickson, Y. Tock, D. Dolev, and O. Shental, “Polynomial linear programming with gaussian belief propagation,” in the 46th Allerton Conf. on Communications, Control and Computing, Monticello, IL, USA, 2008.
16. D. Bickson and D. Malkhi, “A unifying framework for rating users and data items in peer-to-peer and social networks,” in Peer-to-Peer Networking and Applications (PPNA) Journal, Springer-Verlag, 2008.
17. D. Bickson, D. Dolev, and E. Yom-Tov, “A gaussian belief propagation solver for large scale support vector machines,” in 5th European Conference on Complex Systems, Jerusalem, Sept. 2008.
72418