Convergence Analysis of Belief Propagation for Pairwise Linear Gaussian Models
Gaussian belief propagation (BP) has been widely used for distributed inference in large-scale networks such as the smart grid, sensor networks, and social networks, where local measurements/observations are scattered over a wide geographical area. One particular case is when two neighboring agents share a common observation. For example, to estimate voltage in the direct current (DC) power flow model, the current measurement over a power line is proportional to the voltage difference between two neighboring buses. When applying the Gaussian BP algorithm to this type of problem, the convergence condition remains an open issue. In this paper, we analyze the convergence properties of Gaussian BP for this pairwise linear Gaussian model. We show analytically that the message information matrix converges at a geometric rate to a unique positive definite matrix for any positive semidefinite initial value, and we further provide the necessary and sufficient condition for the belief mean vector to converge to the optimal estimate.
Gaussian belief propagation (BP) provides an efficient distributed way to compute the marginal distribution from the joint distribution of unknown random variables, and it has been adopted in a variety of areas such as distributed power state estimation in power networks, synchronization in wireless communication networks, cooperative localization in distributed networks, factor analyzer networks, sparse Bayesian learning, and peer-to-peer rating in social networks. In one particular model of interest, two neighboring agents share a common observation. In this paper, we name this type of model the pairwise linear Gaussian model.
Despite its great empirical success, the major challenge that hinders Gaussian BP from realizing its full potential is the lack of theoretical guarantees of convergence in loopy networks. Sufficient convergence conditions for Gaussian BP have been developed for the case in which the underlying Gaussian distribution is expressed in terms of pairwise connections between scalar variables (also known as a Markov random field (MRF)). However, the iterative equations for Gaussian BP on MRFs differ from those for distributed estimation problems, where linear measurements are involved. Therefore, the existing conditions and analysis methods for MRFs are not applicable to distributed estimation problems. Although a necessary and sufficient convergence condition of BP is available for the Gaussian linear model, the type of observation allowed there is not the most general, in the sense that it does not allow two neighboring agents to share a common observation. In this paper, we focus on the convergence analysis of BP for this pairwise linear Gaussian model. We show analytically that the message information matrix converges at a geometric rate to a unique positive definite matrix for any positive semidefinite initial value, and we further provide the necessary and sufficient condition for the belief mean vector to converge to the optimal estimate.
Note that, in the setup of deterministic unknown parameter estimation, the distributed algorithm based on the consensus+innovations philosophy (see also the related family of diffusion algorithms) converges to the optimal centralized estimator under the assumption of global observability of the (aggregate) sensing model and connectivity of the inter-agent communication network. In particular, these algorithms allow 1) the communication or message exchange network to be different from the physical coupling network, and 2) the communication network to have arbitrary network structure with cycles (as long as it is connected). These results imply that the unknown variables can be reconstructed completely at each agent in the network. For large-scale networks with high-dimensional unknown variables, it may be impractical to reconstruct all of them at every agent. Approaches have been developed to address this problem, in which each agent reconstructs a set of unknown variables that should be larger than the set of variables that influence its local measurement. This paper studies a different distributed estimation problem, in which each agent estimates only its own unknown variables under a pairwise independence condition on the unknown variables; this leads to lower-dimensional data exchanges between neighbors.
Consider a general connected network of agents, with denoting the set of agents, and as the set of all undirected communication links in the network, i.e., if and are within the communication range, . The local observations, , between agents and are modeled by a pairwise Gaussian linear model:
where and are the known coefficient matrices with full column rank, and are the local unknown vector parameters at agent and with dimension and , and with the prior distribution and and is the additive noise with distribution . It is assumed that and for . The goal is to estimate , based on , and for all . Note that in (Equation 1), .
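For concreteness, the structure of the pairwise model described above can be sketched as follows. The symbol names ($\mathbf{y}_{i,j}$, $\mathbf{A}_{j,i}$, $\mathbf{W}_i$, $\mathbf{R}_{i,j}$) and the zero prior mean are assumptions made for illustration, since the original notation is not reproduced in this text. Only the structure is taken from the description: one shared observation, linear in the two neighboring unknowns, with additive Gaussian noise and Gaussian priors.

```latex
\begin{align}
\mathbf{y}_{i,j} &= \mathbf{A}_{j,i}\,\mathbf{x}_i + \mathbf{A}_{i,j}\,\mathbf{x}_j + \mathbf{z}_{i,j}, \\
\mathbf{x}_i &\sim \mathcal{N}(\mathbf{0}, \mathbf{W}_i), \qquad
\mathbf{z}_{i,j} \sim \mathcal{N}(\mathbf{0}, \mathbf{R}_{i,j}).
\end{align}
```

In the DC power flow example, the unknowns would be scalar bus voltages and the observation would be proportional to their difference, i.e., $\mathbf{A}_{j,i} = -\mathbf{A}_{i,j}$ would be a scalar line coefficient.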
In centralized estimation, all the observations at different agents are forwarded to a central processing unit. Define vectors , and as the stacking of , and in ascending order first with respect to and then on , respectively; then we obtain where is constructed from , with specific arrangement depending on the network topology. Assuming is a full column rank matrix, and since is a Gaussian random vector, the optimal estimate of is given by
where and are block diagonal matrices containing and as their diagonal blocks, respectively. Although well-established, centralized estimation has several drawbacks in large-scale networks: 1) the transmission of , and from peripheral agents to the computation center imposes huge communication overhead; 2) knowledge of the global network topology is needed in order to construct ; and 3) the computational burden at the computation center scales cubically with the dimension of the matrix to be inverted in (Equation 2).
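As a point of comparison for the distributed algorithm, the centralized estimator can be sketched numerically. This is a minimal illustration, assuming a zero-mean Gaussian prior and the standard linear MMSE form (prior information plus measurement information); the function name `centralized_estimate` and the three-agent example values are hypothetical and not taken from the paper.

```python
import numpy as np

def centralized_estimate(A, R, W, y):
    """MMSE estimate of x for y = A x + z, with x ~ N(0, W) and z ~ N(0, R).

    Assumes a zero-mean prior (an assumption of this sketch).
    """
    W_inv = np.linalg.inv(W)
    R_inv = np.linalg.inv(R)
    info = W_inv + A.T @ R_inv @ A          # posterior information matrix
    return np.linalg.solve(info, A.T @ R_inv @ y)

# Hypothetical triangle network with scalar unknowns: each row is one shared
# observation that involves exactly two neighboring agents.
A = np.array([[1.0, 0.5, 0.0],
              [0.0, 1.0, 0.5],
              [0.5, 0.0, 1.0]])
W = np.eye(3)                # prior covariances stacked block-diagonally
R = 0.1 * np.eye(3)          # noise covariances stacked block-diagonally
y = np.array([0.4, -0.3, 0.2])

x_hat = centralized_estimate(A, R, W, y)
```

The `np.linalg.solve` call is where the cubic cost in the total number of unknowns arises, which is the third drawback listed above.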
The joint distribution is first written as the product of the prior distribution and the likelihood function as
To facilitate the derivation of the distributed inference algorithm, the factorization above is expressed in terms of a factor graph, where every variable vector is represented by a variable node and the probability distribution of a vector variable or a group of vector variables is represented by a factor node. A variable node is connected to a factor node if the variable is involved in that particular factor. It involves two types of messages: One is the message from a factor node with function to its neighboring variable node , defined as
where denotes the set of neighboring variable nodes of factor node on the factor graph. The other type of message is from a variable node to its neighboring factor node , which denotes a likelihood function or prior distribution, and it is defined as
where denotes the set of neighbouring factor nodes of , and is the message from to at time . The process iterates between equations (Equation 4) and (Equation 3). At each iteration , the approximate marginal distribution, also named belief, on is computed locally at as
It can be shown that the message from factor node to variable node is given by 
where and are the message covariance matrix and mean vector received at variable node at the iteration with
Furthermore, the general expression for the message from variable node to factor node is
where and are the message covariance matrix and mean vector received at variable node at the - iteration, with the information matrix computed as
and the mean vector is
Following Lemma 2 in , we know that setting the initial information matrix for all and guarantees for . Therefore, let the initial messages at factor node be in Gaussian function forms with covariance for all and . Then all the messages and exist and are in Gaussian form. Furthermore, during each round of message passing, each agent can compute the belief for using (Equation 5), which can be easily shown to be
with the inverse of the covariance matrix
and mean vector
The iterative algorithm based on BP is summarized as follows. The algorithm is started by setting the message from factor node to variable node as with a random initial vector and . At each round of message exchange, every variable node computes the outgoing messages to factor nodes according to (Equation 10) and (Equation 11). After receiving the messages from its neighboring variable nodes, each factor node computes its outgoing messages according to (Equation 7) and (Equation 8). Such iteration is terminated when (Equation 13) converges (e.g., when , where is a threshold) or the maximum number of iterations is reached. Then the estimate of of each node is obtained as in (Equation 13).
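The message-passing schedule summarized above can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes zero-mean priors, keeps messages in information form (information matrix and potential vector), folds the prior factor directly into the variable-node update, and uses a parallel schedule. The function name `gaussian_bp`, the dictionary layout, and the three-agent chain are hypothetical. The chain is a tree, so BP is exact there and the result can be compared with the centralized estimate.

```python
import numpy as np

def gaussian_bp(edges, A, y, R, W, n_iter=20):
    """Gaussian BP for y[e] = A[e][i] x_i + A[e][j] x_j + z_e, with x_i ~ N(0, W[i])."""
    nodes = sorted(W)
    W_inv = {i: np.linalg.inv(W[i]) for i in nodes}
    factors_at = {i: [e for e in edges if i in e] for i in nodes}
    # Factor-to-variable messages (J, h), started from a zero information matrix.
    msg = {(e, i): (np.zeros_like(W[i]), np.zeros(len(W[i])))
           for e in edges for i in e}
    for _ in range(n_iter):
        new_msg = {}
        for e in edges:
            for i, j in (e, e[::-1]):
                # Variable j -> factor e: prior plus all incoming messages except e's.
                others = [f for f in factors_at[j] if f != e]
                J_j = W_inv[j] + sum(msg[f, j][0] for f in others)
                h_j = sum((msg[f, j][1] for f in others), np.zeros(len(W[j])))
                C_j = np.linalg.inv(J_j)
                # Factor e -> variable i: marginalize x_j out of the likelihood.
                S = R[e] + A[e][j] @ C_j @ A[e][j].T
                G = A[e][i].T @ np.linalg.inv(S)
                new_msg[e, i] = (G @ A[e][i], G @ (y[e] - A[e][j] @ C_j @ h_j))
        msg = new_msg
    means = {}
    for i in nodes:
        J = W_inv[i] + sum(msg[e, i][0] for e in factors_at[i])
        h = sum((msg[e, i][1] for e in factors_at[i]), np.zeros(len(W[i])))
        means[i] = np.linalg.solve(J, h)   # belief mean at agent i
    return means

# Hypothetical 3-agent chain with scalar unknowns and difference measurements.
one = np.array([[1.0]])
edges = [(0, 1), (1, 2)]
A = {(0, 1): {0: one, 1: -one}, (1, 2): {1: one, 2: -one}}
y = {(0, 1): np.array([0.4]), (1, 2): np.array([-0.3])}
R = {e: 0.1 * one for e in edges}
W = {i: one for i in range(3)}
bp_means = gaussian_bp(edges, A, y, R, W)

# Centralized estimate for the same data, for comparison.
A_full = np.array([[1.0, -1.0, 0.0], [0.0, 1.0, -1.0]])
y_full = np.array([0.4, -0.3])
x_central = np.linalg.solve(np.eye(3) + A_full.T @ A_full / 0.1, A_full.T @ y_full / 0.1)
```

On a loopy network the same code runs unchanged, but whether the belief means converge then depends on the condition analyzed in the convergence section.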
3 Convergence Analysis
The challenge of deploying the BP algorithm for large-scale networks is determining whether it will converge. In particular, it is generally known that, if the factor graph contains cycles, the BP algorithm may diverge. Thus, determining convergence conditions for the BP algorithm is very important. Sufficient conditions for the convergence of Gaussian BP with scalar variables in loopy graphs are available for Markov random fields. Unfortunately, the convergence analyses for the Gaussian Markov random field and for the Gaussian linear model are quite different because their iteration equations differ. Although a necessary and sufficient convergence condition of BP is available for the Gaussian linear model, the type of observation allowed there is not the most general, in the sense that it does not allow two neighboring agents to share a common observation as in equation (1) in this paper. In the following, we provide the convergence analysis of Gaussian BP for the pairwise linear Gaussian model.
Due to the recursively updating property of and in (Equation 9) and (Equation 6), the message evolution can be simplified by combining these two types of messages into a single one. By substituting in (Equation 10) into (Equation 7), the updating of the message covariance matrix inverse, named message information matrix in the following, can be denoted as
Observing that in (Equation 14) is independent of , the other type of updating information, we first focus on the convergence property of .
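The decoupling noted here can be illustrated numerically: the information-matrix recursion can be iterated on its own, without any mean vectors. The sketch below assumes the standard information-form update for a pairwise linear Gaussian factor (obtained by marginalizing the neighboring unknown out of the likelihood); the function names, the triangle network, and all numerical values are hypothetical. It runs the recursion on a loopy graph from two different positive semidefinite initializations and keeps both results for comparison.

```python
import numpy as np

def info_update(J, edges, A, R, W_inv):
    """One parallel update of all factor-to-variable information matrices."""
    new_J = {}
    for e in edges:
        for i, j in (e, e[::-1]):
            # Information held at variable j, excluding what came from factor e.
            J_j = W_inv[j] + sum(J[f, j] for f in edges if j in f and f != e)
            S = R[e] + A[e][j] @ np.linalg.inv(J_j) @ A[e][j].T
            new_J[e, i] = A[e][i].T @ np.linalg.inv(S) @ A[e][i]
    return new_J

def run(init_scale, edges, A, R, W_inv, n_iter=60):
    """Iterate from J = init_scale * I on every message (PSD for init_scale >= 0)."""
    J = {(e, i): init_scale * np.eye(2) for e in edges for i in e}
    for _ in range(n_iter):
        J = info_update(J, edges, A, R, W_inv)
    return J

# Hypothetical loopy triangle with 2-dimensional unknowns at each agent.
I2 = np.eye(2)
edges = [(0, 1), (1, 2), (0, 2)]
A = {e: {e[0]: I2, e[1]: -I2} for e in edges}
R = {e: 0.5 * I2 for e in edges}
W_inv = {i: I2 for i in range(3)}

J_from_zero = run(0.0, edges, A, R, W_inv)
J_from_large = run(10.0, edges, A, R, W_inv)
max_gap = max(np.abs(J_from_zero[k] - J_from_large[k]).max() for k in J_from_zero)
```

In this toy setting `max_gap` is at the level of floating-point error, which is consistent with the claim that the limit does not depend on the positive semidefinite starting point.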
To consider the updates of all message information matrices, we introduce the following definitions. Let be a block diagonal matrix with diagonal blocks being the message information matrices in the network at time with index arranged in ascending order first on and then on . Using the definition of , the term in (Equation 14) can be written as , where selects appropriate components from to form the summation.
We define the function that updates . Then, by stacking on the left side of (Equation 15) for all and as the block diagonal matrix , we obtain
where , , , , and are block diagonal matrices with block elements , , , , and , respectively, arranged in ascending order, first on and then on (i.e., the same order as in ). We first present properties of the updating operator , where the proof follows that in .
Property 1.1: , if .
Property 1.2: and , if and .
Property 1.3: Define and . With arbitrary , is bounded by for .
In this paper, () means that is positive semidefinite (definite). Note is different from the function in . However, as demonstrated in , if a function satisfies Property 1, we can establish the convergence property for given by the following theorem, with the detailed proof provided in .
Thus, if we choose for all and , then converges at a double exponential rate to a unique p.d. matrix . Furthermore, according to (Equation 10), also converges to a p.d. matrix once converges; the converged value is denoted by . Then, for arbitrary initial value , the evolution of in (Equation 11) can be written in terms of the limit message information matrices as
Using (Equation 8), and replacing indices , with , respectively, is given by
where and . The above equation for all cases can be further written in a compact form as
with the column vector containing all as subvectors with ascending index on . Similarly, contains all as subvectors with ascending index on , and contains for all as subvectors with ascending index first on and then on . The matrix is a block matrix with component blocks and where . We further define a block diagonal matrix as with increasing order on , and let and be the vectors containing and , respectively, with the same stacking order as . Following (Equation 20), we have
For this linear updating equation, it is well known that, for an arbitrary initial value , converges if and only if the spectral radius of the iteration matrix is strictly less than one. Note that a way to check this condition in a distributed manner is provided in the reference on distributed convergence verification for Gaussian belief propagation. As the convergence of depends on the convergence of , we have the following result.
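According to (Equation 13), the convergence of the belief mean depends on the message information matrices and the message mean vectors. As Theorem 1 shows that the information matrices are guaranteed to converge for any positive semidefinite initial value, the convergence of the belief mean is equivalent to the convergence of the message mean vectors. Moreover, as shown in prior work, once the belief mean converges, it converges to the optimal estimate. We therefore conclude that the necessary and sufficient condition for the belief mean to converge to the optimal estimate is that the spectral radius of the iteration matrix in the linear update above is less than one.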
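The spectral-radius condition for a linear update of the form v ← Q v + b can be checked numerically as sketched below. The names `Q`, `b`, and `v` and the 2×2 example are placeholders chosen for illustration; in the setting of this paper the iteration matrix would be built from the converged message information matrices, which is not reproduced here.

```python
import numpy as np

def spectral_radius(Q):
    """Largest eigenvalue magnitude of a square matrix."""
    return np.max(np.abs(np.linalg.eigvals(Q)))

def iterate(Q, b, v0, n_iter=200):
    """Run the linear update v <- Q v + b for a fixed number of rounds."""
    v = v0.copy()
    for _ in range(n_iter):
        v = Q @ v + b
    return v

# Hypothetical iteration matrix and offset (placeholders, not from the paper).
Q = np.array([[0.5, 0.2],
              [0.1, 0.3]])
b = np.array([1.0, -1.0])

rho = spectral_radius(Q)
v_limit = iterate(Q, b, np.array([100.0, -50.0]))    # arbitrary starting point
v_fixed = np.linalg.solve(np.eye(2) - Q, b)          # fixed point of v = Q v + b

# A matrix that violates the condition: the iteration would diverge from
# almost every starting point.
rho_bad = spectral_radius(np.array([[1.1, 0.0],
                                    [0.0, 0.5]]))
```

When `rho` is below one, the iterate approaches `v_fixed` regardless of the starting vector, and the error shrinks roughly by a factor of `rho` per round.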
In this paper, we have studied distributed inference using Gaussian belief propagation (BP) over networks with two neighboring agents sharing a common observation. We have analyzed the convergence property of the Gaussian BP algorithm for this particular model. We have shown analytically that, with arbitrary positive semidefinite matrix initialization, the message information matrix exchanged among agents converges at a geometric rate to a unique positive definite matrix. Moreover, we have presented the necessary and sufficient condition for convergence under which the belief mean vector converges to the optimal centralized estimate.
- Y. Hu, A. Kuh, T. Yang, and A. Kavcic, “A belief propagation based power distribution system state estimator,” IEEE Comput. Intell. Mag., vol. 6, no. 3, pp. 36–46, 2011.
- J. Du and Y.-C. Wu, “Distributed clock skew and offset estimation in wireless sensor networks: Asynchronous algorithm and convergence analysis,” IEEE Trans. Wireless Commun., vol. 12, no. 11, pp. 5908–5917, Nov 2013.
- ——, “Network-wide distributed carrier frequency offsets estimation and compensation via belief propagation,” IEEE Trans. Signal Process., vol. 61, no. 23, pp. 5868–5877, 2013.
- ——, “Fully distributed clock skew and offset estimation in wireless sensor networks,” in Acoustics, Speech and Signal Processing (ICASSP), 2013 IEEE International Conference on, 2013, pp. 4499–4503.
- Y. Zhou, H. Liu, Z. Pan, L. Tian, J. Shi, and G. Yang, “Two-stage cooperative multicast transmission with optimized power consumption and guaranteed coverage,” IEEE Journal on Selected Areas in Communications, vol. 32, no. 2, pp. 274–284, 2014.
- Y. Zhou, H. Liu, Z. Pan, L. Tian, and J. Shi, “Spectral- and energy-efficient two-stage cooperative multicast for LTE-Advanced and beyond,” IEEE Wireless Communications, vol. 21, no. 2, pp. 34–41, 2014.
- H. Wymeersch, J. Lien, and M. Z. Win, “Cooperative localization in wireless networks,” Proceedings of the IEEE, vol. 97, no. 2, pp. 427–450, 2009.
- B. J. Frey, “Local probability propagation for factor analysis,” in Neural Information Processing Systems (NIPS), Dec 1999, pp. 442–448.
- X. Tan and J. Li, “Computationally efficient sparse Bayesian learning via belief propagation,” IEEE Trans. Signal Process., vol. 58, no. 4, pp. 2010–2021, April 2010.
- D. Bickson and D. Malkhi, “A unifying framework for rating users and data items in peer-to-peer and social networks,” Peer-to-Peer Networking and Applications (PPNA) Journal, vol. 1, no. 2, pp. 93–103, 2008.
- J. Du, S. Ma, Y.-C. Wu, and H. V. Poor, “Distributed Bayesian hybrid power state estimation with PMU synchronization errors,” in Global Communications Conference, 2014 IEEE, 2014, pp. 3174–3179.
- Y. Weiss and W. T. Freeman, “Correctness of belief propagation in Gaussian graphical models of arbitrary topology,” Neural Computation, vol. 13, no. 10, pp. 2173–2200, Mar. 2001.
- D. M. Malioutov, J. K. Johnson, and A. S. Willsky, “Walk-sums and belief propagation in Gaussian graphical models,” Journal of Machine Learning Research, vol. 7, no. 2, pp. 2031–2064, Feb. 2006.
- C. C. Moallemi and B. V. Roy, “Convergence of min-sum message passing for quadratic optimization,” IEEE Trans. Inf. Theory, vol. 55, no. 5, pp. 2413–2423, 2009.
- J. Du, S. Ma, Y.-C. Wu, S. Kar, and J. M. F. Moura, “Convergence analysis of distributed inference with vector-valued Gaussian belief propagation,” submitted for publication [Preprint Available]: https://users.ece.cmu.edu/ soummyak/GBP_convergence.
- B. L. Ng, J. Evans, S. Hanly, and D. Aktas, “Distributed downlink beamforming with cooperative base stations,” IEEE Trans. Inf. Theory, vol. 54, no. 12, pp. 5491–5499, Dec 2008.
- J. Du and Y.-C. Wu, “Distributed CFOs estimation and compensation in multi-cell cooperative networks,” in International Conference on Information and Communication Technology Convergence, 2013.
- S. Kar and J. M. F. Moura, “Consensus+innovations distributed inference over networks: cooperation and sensing in networked systems,” IEEE Signal Process. Mag., vol. 30, no. 3, pp. 99–109, 2013.
- S. Kar, J. M. F. Moura, and H. Poor, “Distributed linear parameter estimation: asymptotically efficient adaptive strategies,” SIAM Journal on Control and Optimization, vol. 51, no. 3, pp. 2200–2229, 2013.
- F. S. Cattivelli and A. H. Sayed, “Diffusion LMS strategies for distributed estimation,” IEEE Trans. Signal Process., vol. 58, no. 3, pp. 1035–1048, 2010.
- S. Kar, “Large scale networked dynamical systems: Distributed inference,” Ph.D. dissertation, Carnegie Mellon University, Pittsburgh, PA, Department of Electrical and Computer Engineering, June 2010.
- J. Du, S. Ma, Y.-C. Wu, S. Kar, and J. M. F. Moura, “Convergence analysis of the information matrix in Gaussian belief propagation,” in Proc. IEEE Acoustics, Speech and Signal Processing Conf. (ICASSP 2017), 2017.
- J. Du, S. Kar, and J. M. F. Moura, “Distributed convergence verification for Gaussian belief propagation,” to appear in 2017 Asilomar Conference on Signals, Systems, and Computers.