Gaussian Belief Propagation Solverfor Systems of Linear Equations

# Gaussian Belief Propagation Solver for Systems of Linear Equations

## Abstract

1

The canonical problem of solving a system of linear equations arises in numerous contexts in information theory, communication theory, and related fields. In this contribution, we develop a solution based upon Gaussian belief propagation (GaBP) that does not involve direct matrix inversion. The iterative nature of our approach allows for a distributed message-passing implementation of the solution algorithm. We also address some properties of the GaBP solver, including convergence, exactness, its max-product version and relation to classical solution methods. The application example of decorrelation in CDMA is used to demonstrate the faster convergence rate of the proposed solver in comparison to conventional linear-algebraic iterative solution methods.

## 1 Problem Formulation and Introduction

Solving a system of linear equations is one of the most fundamental problems in algebra, with countless applications in the mathematical sciences and engineering. Given the observation vector , and the data matrix , a unique solution, , exists if and only if the data matrix is full rank. In this contribution we concentrate on the popular case where the data matrices, , are also symmetric (e.g. , as in correlation matrices). Thus, assuming a nonsingular symmetric matrix , the system of equations can be solved either directly or in an iterative manner. Direct matrix inversion methods, such as Gaussian elimination (LU factorization, [1]-Ch. 3) or band Cholesky factorization ([1]-Ch. 4), find the solution with a finite number of operations, typically, for a dense matrix, on the order of . The former is particularly effective for systems with unstructured dense data matrices, while the latter is typically used for structured dense systems.

Iterative methods [2] are inherently simpler, requiring only additions and multiplications, and have the further advantage that they can exploit the sparsity of the matrix to reduce the computational complexity as well as the algorithmic storage requirements [3]. By comparison, for large, sparse and amorphous data matrices, the direct methods are impractical due to the need for excessive row reordering operations. The main drawback of the iterative approaches is that, under certain conditions, they converge only asymptotically to the exact solution  [2]. Thus, there is the risk that they may converge slowly, or not at all. In practice, however, it has been found that they often converge to the exact solution or a good approximation after a relatively small number of iterations.

A powerful and efficient iterative algorithm, belief propagation (BP) [4], also known as the sum-product algorithm, has been very successfully used to solve, either exactly or approximately, inference problems in probabilistic graphical models [5]. In this paper, we reformulate the general problem of solving a linear system of algebraic equations as a probabilistic inference problem on a suitably-defined graph. We believe that this is the first time that an explicit connection between these two ubiquitous problems has been established. As an important consequence, we demonstrate that Gaussian BP (GaBP) provides an efficient, distributed approach to solving a linear system that circumvents the potentially complex operation of direct matrix inversion.

We shall use the following notations. The operator denotes a vector or matrix transpose, the matrix is a identity matrix, while the symbols and denote entries of a vector and matrix, respectively.

## 2 The GaBP Solver

### 2.1 From Linear Algebra to Probabilistic Inference

We begin our derivation by defining an undirected graphical model (i.e. , a Markov random field), , corresponding to the linear system of equations. Specifically, let , where is a set of nodes that are in one-to-one correspondence with the linear system’s variables , and where is a set of undirected edges determined by the non-zero entries of the (symmetric) matrix . Using this graph, we can translate the problem of solving the linear system from the algebraic domain to the domain of probabilistic inference, as stated in the following theorem.

###### Proposition 1 (Solution and inference)

The computation of the solution vector is identical to the inference of the vector of marginal means over the graph with the associated joint Gaussian probability density function .

{proof}

Another way of solving the set of linear equations is to represent it by using a quadratic form . As the matrix is symmetric, the derivative of the quadratic form w.r.t. the vector is given by the vector . Thus equating gives the stationary point , which is nothing but the desired solution to . Next, one can define the following joint Gaussian probability density function

 p(x)≜Z−1exp(−q(x))=Z−1exp(−xTAx/2+bTx), (1)

where is a distribution normalization factor. Denoting the vector , the Gaussian density function can be rewritten as

 p(x) = Z−1exp(μTAμ/2) (2) × exp(−xTAx/2+μTAx−μTAμ/2) = ζ−1exp(−(x−μ)TA(x−μ)/2) = N(μ,A−1),

where the new normalization factor . It follows that the target solution is equal to , the mean vector of the distribution , as defined above (1). Hence, in order to solve the system of linear equations we need to infer the marginal densities, which must also be Gaussian, , where and are the marginal mean and inverse variance (sometimes called the precision), respectively.

According to Proposition 1, solving a deterministic vector-matrix linear equation translates to solving an inference problem in the corresponding graph. The move to the probabilistic domain calls for the utilization of BP as an efficient inference engine.

### 2.2 Belief Propagation in Graphical Model

Belief propagation (BP) is equivalent to applying Pearl’s local message-passing algorithm [4], originally derived for exact inference in trees, to a general graph even if it contains cycles (loops). BP has been found to have outstanding empirical success in many applications, e.g. , in decoding Turbo codes and low-density parity-check (LDPC) codes. The excellent performance of BP in these applications may be attributed to the sparsity of the graphs, which ensures that cycles in the graph are long, and inference may be performed as if the graph were a tree.

Given the data matrix and the observation vector , one can write explicitly the Gaussian density function,  (2), and its corresponding graph consisting of edge potentials (‘compatibility functions’) and self potentials (‘evidence’) . These graph potentials are simply determined according to the following pairwise factorization of the Gaussian function (1)

 p(x)∝n∏i=1ϕi(xi)∏{i,j}ψij(xi,xj), (3)

resulting in and . Note that by completing the square, one can observe that . The graph topology is specified by the structure of the matrix , i.e. , the edges set includes all non-zero entries of for which .

The BP algorithm functions by passing real-valued messages across edges in the graph and consists of two computational rules, namely the ‘sum-product rule’ and the ‘product rule’. In contrast to typical applications of BP in coding theory [6], our graphical representation resembles a pairwise Markov random field[5] with a single type of propagating message, rather than a factor graph [7] with two different types of messages, originating from either the variable node or the factor node. Furthermore, in most graphical model representations used in the information theory literature the graph nodes are assigned discrete values, while in this contribution we deal with nodes corresponding to continuous variables. Thus, for a graph composed of potentials and as previously defined, the conventional sum-product rule becomes an integral-product rule [8] and the message , sent from node to node over their shared edge on the graph, is given by

 mij(xj)∝∫xiψij(xi,xj)ϕi(xi)∏k∈N(i)∖jmki(xi)dxi. (4)

The marginals are computed (as usual) according to the product rule

 p(xi)=αϕi(xi)∏k∈N(i)mki(xi), (5)

where the scalar is a normalization constant. The set of graph nodes denotes the set of all the nodes neighboring the th node. The set excludes the node from .

### 2.3 The Gaussian BP Algorithm

Gaussian BP is a special case of continuous BP, where the underlying distribution is Gaussian. Now, we derive the Gaussian BP update rules by substituting Gaussian distributions into the continuous BP update equations (4)-(5). Before describing the inference algorithm performed over the graphical model, we make the elementary but very useful observation that the product of Gaussian densities over a common variable is, up to a constant factor, also a Gaussian density.

###### Lemma 2 (Product of Gaussians)

Let and be the probability density functions of a Gaussian random variable with two possible densities and , respectively. Then their product, is, up to a constant factor, the probability density function of a Gaussian random variable with distribution , where

 P−1 = (P1+P2)−1, (6) μ = P−1(P1μ1+P2μ2). (7)
{proof}

The proof of this lemma is straightforward, thus omitted.

Fig. 1. plots a portion of a certain graph, describing the neighborhood of node . Each node (empty circle) is associated with a variable and self potential , which is a function of this variable, while edges are identified with the pairwise (symmetric) potentials . Messages propagate along the edges in both directions. The messages relevant for the computation of message are shown in Fig. 1.). Looking at the right hand side of the integral-product rule (4), node needs to first calculate the product of all incoming messages, except for the message coming from node . Recall that since is jointly Gaussian, the factorized self potentials and similarly all messages are of Gaussian form as well.

As the terms in the product of the incoming messages and the self potential in the integral-product rule (4) are all a function of the same variable, (associated with the node ), then, according to the multivariate extension of Lemma 2, is proportional to a certain Gaussian distribution, . Applying the multivariate version of the product precision expression in (6), the update rule for the inverse variance is given by (over-braces denote the origin of each of the terms)

 Pi∖j=ϕi(xi)Pii+∑k∈N(i)∖jmki(xi)Pki, (8)

where is the inverse variance a-priori associated with node , via the precision of , and are the inverse variances of the messages . Similarly, using (7) for the multivariate case, we can calculate the mean

 μi∖j=P−1i∖j(ϕi(xi)Piiμii+∑k∈N(i)∖jmki(xi)Pkiμki), (9)

where is the mean of the self potential and are the means of the incoming messages.

Next, we calculate the remaining terms of the message , including the integration over . After some algebraic manipulation, using the Gaussian integral , we find that the messages are proportional to a normal distribution with precision and mean

 Pij=−A2ijP−1i∖j, (10) μij=−P−1ijAijμi∖j. (11)

These two scalars represent the messages propagated in the GaBP-based algorithm.

Finally, computing the product rule (5) is similar to the calculation of the previous product and the resulting mean (9) and precision (8), but including all incoming messages. The marginals are inferred by normalizing the result of this product. Thus, the marginals are found to be Gaussian probability density functions with precision and mean

 Pi=ϕi(xi)Pii+∑k∈N(i)mki(xi)Pki, (12) μi=P−1i∖j(ϕi(xi)Piiμii+∑k∈N(i)mki(xi)Pkiμki), (13)

respectively.

For a dense data matrix, the number of messages passed on the graph can be reduced from (i.e. , twice the number of edges) down to messages per iteration round by using a similar construction to Bickson et al.  [9]: Instead of sending a unique message composed of the pair of and from node to node , a node broadcasts aggregated sums to all its neighbors, and consequently each node can retrieve locally  (8) and  (9) from the aggregated sums

 ~Pi = Pii+∑k∈N(i)Pki, (14) ~μi = ~P−1i(Piiμii+∑k∈N(i)Pkiμki) (15)

by means of a subtraction

 Pi∖j = ~Pi−Pji, (16) μi∖j = ~μi−P−1i∖jPjiμji. (17)

The following pseudo-code summarizes the GaBP solver algorithm.

###### Algorithm 1 (GaBP solver)
1. Initialize: Set the neighborhood to include   . Fix the scalars and , . Set the initial broadcast messages    and . Set the initial internal scalars    and . Set a convergence threshold . 2. Iterate: Broadcast the aggregated sum messages , ,   (under chosen scheduling). Compute the internal scalars , . 3. Check: If the internal scalars and did not   converge (w.r.t. ), return to Step 2. Else, continue to Step 4. 4. Infer: Compute the marginal means   , . Optionally compute the marginal precisions    5. Solve: Find the solution , .

### 2.4 Max-Product Rule

A well-known alternative to the sum-product BP algorithm is the max-product (a.k.a. min-sum) algorithm [5]. In this variant of BP, a maximization operation is performed rather than marginalization, i.e. , variables are eliminated by taking maxima instead of sums. For trellis trees (e.g. , graphical representation of convolutional codes or ISI channels), the conventional sum-product BP algorithm boils down to performing the BCJR algorithm, resulting in the most probable symbol, while its max-product counterpart is equivalent to the Viterbi algorithm, thus inferring the most probable sequence of symbols [7].

In order to derive the max-product version of the proposed GaBP solver, the integral(sum)-product rule (4) is replaced by a new rule

 mij(xj)∝argmaxxiψij(xi,xj)ϕi(xi)∏k∈N(i)∖jmki(xi). (18)

Computing according to this max-product rule, one gets (the exact derivation is omitted)

 mij(xj)∝N(μij=−P−1ijAijμi∖j,P−1ij=−A−2ijPi∖j), (19)

which is identical to the messages derived for the sum-product case (10)-(11). Thus interestingly, as opposed to ordinary (discrete) BP, the following property of the GaBP solver emerges.

###### Corollary 3 (Max-product)

The max-product (18) and sum-product (4) versions of the GaBP solver are identical.

## 3 Convergence and Exactness

In ordinary BP, convergence does not guarantee exactness of the inferred probabilities, unless the graph has no cycles. Luckily, this is not the case for the GaBP solver. Its underlying Gaussian nature yields a direct connection between convergence and exact inference. Moreover, in contrast to BP, the convergence of GaBP is not limited to acyclic or sparse graphs and can occur even for dense (fully-connected) graphs, adhering to certain rules that we now discuss. We can use results from the literature on probabilistic inference in graphical models [8, 10, 11] to determine the convergence and exactness properties of the GaBP solver. The following two theorems establish sufficient conditions under which GaBP is guaranteed to converge to the exact marginal means.

###### Theorem 4

[8, Claim 4] If the matrix is strictly diagonally dominant (i.e. , ), then GaBP converges and the marginal means converge to the true means.

This sufficient condition was recently relaxed to include a wider group of matrices.

###### Theorem 5

[10, Proposition 2] If the spectral radius (i.e. , the maximum of the absolute values of the eigenvalues) of the matrix satisfies , then GaBP converges and the marginal means converge to the true means.

There are many examples of linear systems that violate these conditions for which the GaBP solver nevertheless converges to the exact solution. In particular, if the graph corresponding to the system is acyclic (i.e. , a tree), GaBP yields the exact marginal means (and even marginal variances), regardless of the value of the spectral radius [8].

## 4 Relation to Classical Solution Methods

It can be shown (see also Plarre and Kumar [12]) that the GaBP solver (Algorithm 1) for a system of linear equations represented by a tree graph is identical to the renowned direct method of Gaussian elimination (a.k.a. LU factorization, [1]). The interesting relation to classical iterative solution methods [2] is revealed via the following proposition.

###### Proposition 6 (Jacobi and GaBP solvers)

The GaBP solver (Algorithm 1)

1. with inverse variance messages arbitrarily set to zero, i.e. , ;

2. incorporating the message received from node when computing the message to be sent from node to node , i.e. , replacing with ;

is identical to the Jacobi iterative method.

{proof}

Arbitrarily setting the precisions to zero, we get in correspondence to the above derivation,

 Pi∖j = Pii=Aii, (20) Pijμij = −Aijμi∖j, (21) μi = A−1ii(bi−∑k∈N(i)Akiμk∖i). (22)

Note that the inverse relation between and  (10) is no longer valid in this case. Now, we rewrite the mean  (9) without excluding the information from node ,

 μi∖j=A−1ii(bi−∑k∈N(i)Akiμk∖i). (23)

Note that , hence the inferred marginal mean  (22) can be rewritten as

 μi=A−1ii(bi−∑k≠iAkiμk), (24)

where the expression for all neighbors of node is replaced by the redundant, yet identical, expression . This fixed-point iteration (24) is identical to the element-wise expression of the Jacobi method[2], concluding the proof.

Now, the Gauss-Seidel (GS) method can be viewed as a ‘serial scheduling’ version of the Jacobi method; thus, based on Proposition 6, it can be derived also as an instance of the serial (message-passing) GaBP solver. Next, since successive over-relaxation (SOR) is nothing but a GS method averaged over two consecutive iterations, SOR can be obtained as a serial GaBP solver with ‘damping’ operation [13].

## 5 Application Example: Linear Detection

We examine the implementation of a decorrelator linear detector in a CDMA system with spreading codes based upon Gold sequences of length . Two system setups are simulated, corresponding to and users. The decorrelator detector, a member of the family of linear detectors, solves a system of linear equations, , where the matrix is equal to the correlation matrix , and the observation vector is identical to the -length CDMA channel output vector . Thus, the vector of decorrelator decisions is determined by taking the signum (for binary signaling) of the vector . Note that and in this case are not strictly diagonally dominant, but their spectral radii are less than unity, since and , respectively. In all of the experiments, we assumed the (noisy) output sample was the all-ones vector.

Table 1 compares the proposed GaBP solver with standard iterative solution methods [2], previously employed for CDMA multiuser detection (MUD). Specifically, MUD algorithms based on the algorithms of Jacobi, GS and (optimally weighted) SOR were investigated [14, 15, 16]. Table 1 lists the convergence rates for the two Gold code-based CDMA settings. Convergence is identified and declared when the differences in all the iterated values are less than . We see that, in comparison with the previously proposed detectors based upon the Jacobi and GS algorithms, the serial (asynchronous) message-passing GaBP detector converges more rapidly for both and and achieves the best overall convergence rate, surpassing even the optimal SOR-based detector. Further speed-up of the GaBP solver can be achieved by adopting known acceleration techniques from linear algebra. Table 1 demonstrates the speed-up of the GaBP solver obtained by using such an acceleration method, termed Steffensen’s iterations [17], in comparison with the accelerated Jacobi algorithm (diverged for the 4 users setup). We remark that this is the first time such an acceleration method is examined within the framework of message-passing algorithms and that the region of convergence of the accelerated GaBP solver remains unchanged.

The convergence contours for the Jacobi and parallel (synchronous) GaBP solvers for the case of 3 users are plotted in the space of in Fig. 2. As expected, the Jacobi algorithm converges in zigzags directly towards the fixed point. It is interesting to note that the GaBP solver’s convergence is in a spiral shape, hinting that despite the overall convergence improvement, performance improvement is not guaranteed in successive iteration rounds. Further results and elaborate discussion on the application of GaBP specifically to linear MUD may be found in recent contributions [18, 19].

### Footnotes

1. footnotetext: Contributed equally to this work.
Supported in part by NSF Grant No. CCR-0514859 and EVERGROW, IP 1935 of the EU Sixth Framework.

### References

1. G. H. Golub and C. F. V. Loan, Matrix Computation, 3rd ed.   Baltimore, MD: The Johns Hopkins University Press, 1996.
2. O. Axelsson, Iterative Solution Methods.   Cambridge, UK: Cambridge University Press, 1994.
3. Y. Saad, Iterative Methods for Sparse Linear Systems.   PWS Publishing company, 1996.
4. J. Pearl, Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Inference.   San Francisco: Morgan Kaufmann, 1988.
5. M. I. Jordan, Ed., Learning in Graphical Models.   Cambridge, MA: The MIT Press, 1999.
6. T. Richardson and R. Urbanke, Modern Coding Theory.   Cambridge University Press, 2007.
7. F. Kschischang, B. Frey, and H. A. Loeliger, “Factor graphs and the sum-product algorithm,” IEEE Trans. Inform. Theory, vol. 47, pp. 498–519, Feb. 2001.
8. Y. Weiss and W. T. Freeman, “Correctness of belief propagation in Gaussian graphical models of arbitrary topology,” Neural Computation, vol. 13, no. 10, pp. 2173–2200, 2001.
9. D. Bickson, D. Dolev, and Y. Weiss, “Modified belief propagation for energy saving in wireless and sensor networks,” in Leibniz Center TR-2005-85, School of Computer Science and Engineering, The Hebrew University, 2005. [Online]. Available: http://leibniz.cs.huji.ac.il/tr/842.pdf
10. J. K. Johnson, D. M. Malioutov, and A. S. Willsky, “Walk-sum interpretation and analysis of Gaussian belief propagation,” in Advances in Neural Information Processing Systems 18, Y. Weiss, B. Schölkopf, and J. Platt, Eds.   Cambridge, MA: MIT Press, 2006, pp. 579–586.
11. D. M. Malioutov, J. K. Johnson, and A. S. Willsky, “Walk-sums and belief propagation in Gaussian graphical models,” Journal of Machine Learning Research, vol. 7, Oct. 2006.
12. K. Plarre and P. Kumar, “Extended message passing algorithm for inference in loopy Gaussian graphical models,” Ad Hoc Networks, 2004.
13. K. M. Murphy, Y. Weiss, and M. I. Jordan, “Loopy belief propagation for approximate inference: An empirical study,” in Proc. of UAI, 1999.
14. A. Yener, R. D. Yates, and S. Ulukus, “CDMA multiuser detection: A nonlinear programming approach,” IEEE Trans. Commun., vol. 50, no. 6, pp. 1016–1024, June 2002.
15. A. Grant and C. Schlegel, “Iterative implementations for linear multiuser detectors,” IEEE Trans. Commun., vol. 49, no. 10, pp. 1824–1834, Oct. 2001.
16. P. H. Tan and L. K. Rasmussen, “Linear interference cancellation in CDMA based on iterative techniques for linear equation systems,” IEEE Trans. Commun., vol. 48, no. 12, pp. 2099–2108, Dec. 2000.
17. P. Henrici, Elements of Numerical Analysis.   New York: John Wiley and Sons, 1964.
18. D. Bickson, O. Shental, P. H. Siegel, J. K. Wolf, and D. Dolev, “Linear detection via belief propagation,” in Proc. 45th Allerton Conf. on Communications, Control and Computing, Monticello, IL, USA, Sept. 2007.
19. ——, “Gaussian belief propagation based multiuser detection,” in IEEE Int. Symp. on Inform. Theory (ISIT), Toronto, Canada, July 2008.
You are adding the first comment!
How to quickly get a good reply:
• Give credit where it’s due by listing out the positive aspects of a paper before getting into which changes should be made.
• Be specific in your critique, and provide supporting evidence with appropriate references to substantiate general statements.
• Your comment should inspire ideas to flow and help the author improves the paper.

The better we are at sharing our knowledge with each other, the faster we move forward.
The feedback must be of minumum 40 characters