Broadcast gossip averaging algorithms: interference and asymptotical error in large networks
In this paper we study two related iterative randomized algorithms for distributed computation of averages. The first one is the recently proposed Broadcast Gossip Algorithm, in which at each iteration one randomly selected node broadcasts its own state to its neighbors. The second algorithm is a novel de-synchronized version of the previous one, in which at each iteration every node is allowed to broadcast, with a given probability: hence this algorithm is affected by interference among messages. Both algorithms are proved to converge, and their performance is evaluated in terms of rate of convergence and asymptotical error: focusing on the behavior for large networks, we highlight the role of topology and design parameters on the performance. Namely, we show that on fully-connected graphs the rate is bounded away from one, whereas the asymptotical error is bounded away from zero. On the contrary, on a wide class of locally-connected graphs, the rate goes to one and the asymptotical error goes to zero, as the size of the network grows larger.
When it comes to performing control and monitoring tasks through networked systems, a crucial role is played by algorithms for distributed estimation, that is, algorithms that collectively compute aggregate information from locally available data. Among these problems, a prototypical one is the distributed computation of averages, also known as the average consensus problem. In the average consensus problem each node of a network is given a real number, and the goal is for the nodes to iteratively converge to a good estimate of the average of these initial values, by repeatedly communicating and updating their states.
Recently, the control and signal processing communities have devoted increasing interest to randomized algorithms able to solve the average consensus problem. This interest is motivated by the fact that randomized algorithms may offer better performance or robustness than their deterministic counterparts. In addition, randomized algorithms may require less or no synchronization among the nodes, and synchronization is often difficult to guarantee in applications. Moreover, it may very well happen that the communication network itself is random, thus implying the need for a stochastic analysis. These facts are especially true when communication takes place over a wireless network. For these reasons, the present paper studies the performance of a pair of notable randomized algorithms, in terms of their ability to approach average consensus. Among randomized algorithms, researchers have devised gossip algorithms, in which at each iteration only a random subset of the nodes performs communication and update. Among these algorithms, the Broadcast Gossip Algorithm has been recently proposed: at each time step one node, randomly selected from a uniform distribution over the nodes, broadcasts its current value to its neighbors. Each of its neighbors, in turn, updates its value to a convex combination of its previous value and the received one. This algorithm may seem to require significant synchronization, since the choice of the broadcasting node has to be made at the global level. However, it has been observed that this communication model is equivalent, up to a suitable scaling of time, to assuming that each node broadcasts at time instants selected by a private Poisson process. Nevertheless, this equivalence no longer holds if broadcasting takes a finite duration of time. When this happens a node is, with non-zero probability, the target of more than one simultaneous communication, and a destructive collision may occur.
This is especially true in wireless communications, which have to share their communication medium. Hence, the practical applicability of this algorithm in a distributed system rests either on the possibility of incorporating some nontrivial collision detection scheme, or on the assumption that communications are instantaneous. The first contribution of this paper is to relax this assumption and allow a communication model in which more than one node can broadcast at the same time, possibly implying the interference of attempted communications. Thus, we introduce a novel distributed randomized algorithm for average computation, which faces the issue of interference in communication.
As a second contribution, we study both the original broadcast algorithm and the novel one, in terms of their asymptotical estimation error. Note, indeed, that the iterations of broadcast algorithms do not preserve the average of states, and in general do not converge to the initial average. Hence, it is crucial for the application to estimate such bias. In this paper, we prove that on sparse graphs with bounded degree, both algorithms are asymptotically unbiased, in the sense that the asymptotical errors go to zero as the network grows larger. Instead, on complete graphs, both algorithms are asymptotically biased. Moreover, for both algorithms we investigate significant trade-offs between speed of convergence and asymptotical performance. Our results are obtained via a mean square analysis, under the assumption that the communication graph possesses some structural symmetries, namely, that it is the Cayley graph of an Abelian group.
In recent years, many papers have dealt with distributed estimation and synchronization in networks. The latter problem, which partly motivates the interest in our algorithms, has been studied in several papers, also using approaches based on consensus, as in [25, 7]. In particular, an increasing interest has been devoted to randomized averaging algorithms. An influential pairwise gossip communication model is introduced in  and . Moreover, the interest in wireless networks has induced several authors to consider gossip models based on broadcast, rather than on pairwise communication [17, 3]. Other gossip models have been developed, for instance, in [5, 12, 26, 10].
Broadcast and wireless communication inherently imply the issue of interference among simultaneous communications. This has been considered since early times: actually, our communication model can be easily related to the slotted ALOHA protocol illustrated in . More recently, the negative effects of interference on the connectivity of wireless networks have been discussed in various papers, for instance [13, 14]. The role of interference and message collisions in consensus problems, and the effectiveness of countermeasures, has already been investigated in the computer science community : in the latter paper, consensus has to be achieved on a variable belonging to a finite set. Several related papers about real-valued average consensus problems have also appeared: we shall briefly review some of them. A few papers are concerned with the design of communication protocols dealing with message collisions: for instance,  presents a data-driven architecture which grants channel access to nodes based on their local data values. In , a consensus algorithm allowing simultaneous quantized transmissions has been proposed: the issue of collisions is avoided by a suitable data-dependent coding scheme. In , local additive interference is considered, and under some technical assumptions a collaborative consensus algorithm is proposed, which allows simultaneous transmissions and ensures energy savings. Besides interference, related robustness results against data losses in distributed systems have been presented, for instance, in [30, 18]. Finally, the paper  proposes a related communication model, in which the broadcast values are received or not with a probability which depends on the transmitter and receiver nodes, rather than on the activity of their neighbors.
Our results build on the general mean square analysis developed for randomized consensus algorithms in [16, 17], and on results in algebra, linear algebra and probability theory. Indeed, our main results will assume that the network topology exhibits specific symmetries, in the sense that it is represented by the Cayley graph of an Abelian group. Cayley graphs have a long history in abstract mathematics , and they have been recently used in control theoretical applications, for instance in [29, 21], to describe communication networks. Assuming Abelian Cayley topologies is motivated by their algebraic structure, which allows a formal mathematical treatment, as well as by the potential applications. Indeed, Abelian Cayley graphs are a simplified and idealized version of communication scenarios of practical interest. In particular, they capture the effects on performance of the strong constraint that, for many networks of interest, communication is local, not only in the sense of a small number of neighbors, but also with a bound on the geometric distance among connected agents. This is especially true for wireless networks: indeed, Abelian Cayley graphs have been related, for instance in [6, 28, 8], to other models for wireless networks, such as random geometric graphs or disk-graphs . Moreover, it has to be noted that several topologies which appear in applications are themselves Abelian Cayley: for instance, complete graphs, rings, toroidal grids, hypercubes. Finally, our main findings about the asymptotical error are based on a result in  about the limit of the invariant vectors of sequences of stochastic matrices: related results dealing with comparison of Markov chains have appeared, for instance, in [19, 11].
After presenting the averaging problem and the algorithms under consideration in Section 2, we develop our analysis tools in Section 3. Later, Sections 4 and 5 are devoted to analyzing the original broadcast gossip algorithm and the novel one, respectively. Some concluding remarks are presented in Section 6.
Notations and preliminaries
Given a set of finite cardinality , we define a graph on this set as , where (we exclude the presence of self-loops, namely edges of type ). Given , if , we shall say that is an in-neighbor of , and conversely is an out-neighbor of . We will denote by and , the set of, respectively, the out-neighbors and the in-neighbors of . Also, and , are said to be the out-degree and the in-degree of node , respectively. A graph whose nodes all have in-degree is said to be -regular. A graph is said to be (strongly) connected if for any pair of nodes , one can find a path, that is an ordered list of edges, from to . A graph is said to be symmetric, if implies . In a symmetric graph, being the neighborhood relation symmetrical, there is no distinction between in- and out-neighbors and we will drop, consequently, the index and . We let be the vector whose entries are all , be the identity matrix, and . Given a -vector , we denote by the diagonal matrix whose diagonal is equal to . The adjacency matrix of the graph , denoted by , is the matrix in such that if and only if . We also define the out-degree matrix as , the in-degree matrix as , and the Laplacian matrix as . The subscript will be usually skipped for the ease of notation. Given a matrix , we define the graph by putting iff and . A matrix is said to be adapted to the graph if , that is if .
When comparing two sequences and , we shall write that if , that if , and that if there exist and positive scalars such that for
Given a linear operator from a vector space to itself, for instance represented by a square matrix, we denote by its spectral radius, that is the modulus of its largest in magnitude eigenvalue. Whenever we shall define as the modulus of the second largest eigenvalue in magnitude.
2 Broadcast gossip averaging algorithms
In this section we present the averaging problem and the algorithms we are dealing with. Let us be given a graph and a vector of real values , assigned to the nodes. Then, the averaging problem consists in approximating the average , with the constraint that at each time step each node can communicate its current state to its out-neighbors only. The simplest solution to this problem consists in an iterative algorithm such that , and for all , , where is a doubly stochastic matrix adapted to . Provided the diagonal of is non-zero, and the graph is connected, the algorithm solves the averaging problem, in the sense that for every ,
where by definition .
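As an illustration of the synchronous iteration just described, the following minimal sketch runs it on a ring of five nodes. The matrix used here (weight 1/3 on each node and on each of its two neighbors) and all helper names are assumptions chosen for the example, not taken from the paper. The matrix is doubly stochastic with positive diagonal, so the states are expected to approach the initial average.

```python
def ring_averaging_matrix(n, w=1.0 / 3.0):
    """Doubly stochastic matrix adapted to a ring: weight w to each neighbor."""
    P = [[0.0] * n for _ in range(n)]
    for i in range(n):
        P[i][i] = 1.0 - 2.0 * w
        P[i][(i - 1) % n] += w
        P[i][(i + 1) % n] += w
    return P


def step(P, x):
    """One synchronous update x <- P x."""
    n = len(x)
    return [sum(P[i][j] * x[j] for j in range(n)) for i in range(n)]


def run_averaging(x0, iterations=500):
    P = ring_averaging_matrix(len(x0))
    x = list(x0)
    for _ in range(iterations):
        x = step(P, x)
    return x


x0 = [1.0, 5.0, -2.0, 8.0, 3.0]
x_ave = sum(x0) / len(x0)      # initial average
x_final = run_averaging(x0)    # every entry should be close to x_ave
```

Note that every node updates at every step, which is precisely the synchronization requirement that the gossip algorithms studied below try to avoid.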
However, it is clear that this algorithm potentially requires synchronous communication along all the edges of the graph. As this requirement may be difficult to meet in real applications, in this paper we shall study algorithms which require little or no synchronization among the agents. We assume from now on that the communication network is represented by a graph, denoted by , whose adjacency and Laplacian matrices will be denoted by and , respectively.
We start recalling the Broadcast Gossip Algorithm [17, 3]. In this algorithm, at each time step one node, randomly selected from a uniform distribution over the nodes, broadcasts its current value to its neighbors. Its neighbors, in turn, update their values to a convex combination of their previous values and the received ones. More formally, we can write the algorithm as follows. Note that the only design parameter is the weight given to the received value in the convex update.
Broadcast Gossip Algorithm – Parameters: For all ,
This algorithm can also be written in the form of iterated matrix multiplication. Let be the broadcasting node which has been sampled at time . Then, where
and is the -th element of the canonical basis of . Clearly, at each time , the matrix is the realization of a uniformly distributed random variable, depending on the stochastic choice of the broadcasting node.
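The update rule described above can be sketched as follows. This is a minimal illustration on a symmetric ring, where `neighbors[j]` lists the nodes that hear node `j`; the function names, the value `q = 0.5`, the seed and the number of steps are assumptions made for the example.

```python
import random


def bga_step(x, neighbors, q, rng):
    """One BGA iteration: a uniformly chosen node j broadcasts, and each
    neighbor i of j sets x_i <- (1 - q) * x_i + q * x_j."""
    j = rng.randrange(len(x))
    x_new = list(x)
    for i in neighbors[j]:
        x_new[i] = (1.0 - q) * x[i] + q * x[j]
    return x_new


def ring_neighbors(n):
    return [[(i - 1) % n, (i + 1) % n] for i in range(n)]


def run_bga(x0, neighbors, q=0.5, steps=5000, seed=0):
    rng = random.Random(seed)
    x = list(x0)
    for _ in range(steps):
        x = bga_step(x, neighbors, q, rng)
    return x


x0 = [1.0, 5.0, -2.0, 8.0, 3.0, 0.0]
x_end = run_bga(x0, ring_neighbors(len(x0)))
spread = max(x_end) - min(x_end)   # close to zero once consensus is reached
```

Since the broadcasting node does not change its own value, the sum of the states is generally not preserved, so the common value reached by the nodes need not coincide with the initial average.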
The Broadcast Gossip algorithm has received considerable attention, and has been extensively studied in : in that paper, under the assumption that the communication graph is symmetric, the algorithm is shown to converge, and its speed of convergence is estimated. Instead, in this paper we shall concentrate on another crucial analysis question. Since the algorithm does not preserve the average of states through iterations, how far from the initial average will the convergence value be?
As we noted in the introduction, the practical interest of the BGA algorithm depends on the assumption that the transmissions are instantaneous, and reliable. In an effort towards more realistic communication models, we propose a modification of the Broadcast Gossip Algorithm, which has the feature of dealing with the issue of finite-length transmissions, and consequent packet losses due to collisions.
At each time step, each node wakes up, independently with probability , and broadcasts its current state to all its out-neighbors. It is clear that some agents can be the target of more than one message: in this case, we assume that a destructive collision occurs, and no message is actually received by these agents. Moreover, interference prevents the broadcasting nodes from hearing any others (half-duplex constraint). The half-duplex constraint is assumed throughout the paper; however, dropping it would imply minimal changes in the analysis. If an agent is able to receive a message from agent , it updates its state to a convex combination with the received value, similarly to the standard BGA.
More formally, the algorithm is as follows.
Collision Broadcast Gossip Algorithm – Parameters: , For all ,
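A minimal sketch of one iteration under the collision model described in words above: a node updates only if it is silent (half-duplex) and exactly one of its in-neighbors broadcasts. The names `p` (wake-up probability) and `q` (weight of the received value), the ring topology, and the numerical values are assumptions made for the example.

```python
import random


def cbga_step(x, in_neighbors, p, q, rng):
    """One CBGA iteration with destructive collisions and half-duplex nodes."""
    n = len(x)
    awake = [rng.random() < p for _ in range(n)]
    x_new = list(x)
    for i in range(n):
        if awake[i]:
            continue  # half-duplex: a broadcasting node hears nothing
        senders = [j for j in in_neighbors[i] if awake[j]]
        if len(senders) == 1:  # exactly one sender, hence no collision
            j = senders[0]
            x_new[i] = (1.0 - q) * x[i] + q * x[j]
    return x_new


def ring_neighbors(n):
    return [[(i - 1) % n, (i + 1) % n] for i in range(n)]


def run_cbga(x0, in_neighbors, p=1.0 / 3.0, q=0.5, steps=5000, seed=0):
    rng = random.Random(seed)
    x = list(x0)
    for _ in range(steps):
        x = cbga_step(x, in_neighbors, p, q, rng)
    return x


x0 = [1.0, 5.0, -2.0, 8.0, 3.0, 0.0]
x_end = run_cbga(x0, ring_neighbors(len(x0)))
spread = max(x_end) - min(x_end)   # close to zero once consensus is reached
```

All updates within a step use the states from the beginning of the step, so several successful transmissions may take place in the same slot.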
Also the latter algorithm can be written as matrix multiplication, defining
Both algorithms can actually be rewritten in the following graph-theoretic way. Let be the subgraph of depicting the communications taking place at a certain instant : the pair is an edge in iff successfully receives a message from at time . Denote by , , the adjacency, degree and Laplacian matrices, respectively, of . Clearly, for both algorithms:
Several questions are natural for the collision-prone CBGA algorithm, in comparison with its synchronous collision-less counterpart. Does the algorithm converge? How fast? Does it preserve the average of states? If not, how far does it drift from it? Is performance poorer because of interference?
We are going to answer the analysis questions we have posed, via a mean square analysis of the algorithm. Our interest will be mostly devoted to the properties of algorithms for large networks. To this goal, we shall often assume to have a sequence of graphs of increasing order , and we shall consider, for each , the corresponding matrix , which depends on and then on . Thus we will focus on studying the asymptotical properties of the algorithms as goes to infinity.
3 Mathematical models and techniques
In this section we lay down some mathematical tools that can be used to analyze gossip and other randomized algorithms. Namely, in Subsection 3.1 we review the mean square analysis in , which is going to be applied to the BGA and CBGA algorithms in Sections 4 and 5, respectively. Later, in Subsection 3.2 we introduce Abelian Cayley graphs and their properties, and in Subsection 3.3 we present perturbation results about sequences of stochastic matrices and their invariant vectors.
3.1 Mean square analysis
Motivated by the interpretation of the broadcast algorithms as iterated multiplications by random matrices, given in Equations (1) and (2), in this subsection we shall recall from  some definitions and results for the analysis of randomized schemes, in which the vector of states evolves in time following an iterate where is a sequence of i.i.d. stochastic-matrix-valued random variables. Consequently, is a stochastic process.
The sequence is said to achieve probabilistic consensus if for any , there exists a scalar random variable such that almost surely. The following result, proved in , is a simple and effective tool to prove the convergence of these randomized linear algorithms. Let .
Proposition 3.1 (Probabilistic consensus criterion)
If is such that the graph is strongly connected and, for all , almost surely , then achieves probabilistic consensus.
For the rest of this section, we shall assume that the assumptions of Proposition 3.1 are satisfied. Then, in order to describe the speed of convergence of the algorithm, let us define and the rate of convergence as
Let us consider the (linear) operator such that
Notice that , and that with Let be the reachable space of the pair . Then,
and it has been proved in  that the rate can be estimated in terms of eigenvalues of matrices, as
It is clear that, if all matrices are doubly stochastic, converges to the initial average of states . If they are not, as in the cases we are studying in this paper, it is worth asking how far is the convergence value from the initial average. To study this bias in the estimation of the average, we let , and we define a matrix such that
Let , so that The convergence of the algorithm is equivalent to the existence of a random variable , taking values in , such that . This implies that
where and are the eigenvectors relative to 1 of and , respectively. In particular, if is doubly stochastic, then
For any -dimensional matrix , as a consequence, in the latter case the matrix can be computed as
Instead of computing , it may be easier and significant to obtain results about some functional of , for instance the spectral norm or the trace . The latter figure is of interest because, if we assume that the initial values are i.i.d. random variables with zero mean and variance , then
Motivated by our interest for the properties of the algorithms on large networks, and by Equation (9), we state the following definition.
Given a sequence of graphs , a randomized algorithm is said to be asymptotically unbiased if
3.2 Abelian Cayley graphs
A special family of graphs is that of Abelian Cayley graphs, which are graphs representing a group, as follows. Let be an Abelian group, considered with the additive notation, and let be a subset of . Then, the Abelian Cayley graph generated by in is the graph having as node set and as edge set. Note that the graph is symmetric if and only if is inverse-closed, and is connected if and only if generates the group . As well, a notion of Abelian Cayley matrix can be defined. Given a group and a generating vector of length , we shall define the Cayley matrix generated by as Correspondingly, for a given Cayley matrix , we shall denote by the generating vector of the Cayley matrix . Clearly, the adjacency matrices of -Cayley graphs are -Cayley matrices. Abelian Cayley graphs and matrices enjoy important properties: we refer the reader to [4, 31] for more details.
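The two structural facts stated above (symmetry if and only if the generating set is inverse-closed, connectivity if and only if it generates the group) can be checked on small cyclic groups with the following sketch. The edge convention, namely an edge from g to h whenever h - g belongs to S, is an assumption of this example, as are the function names.

```python
def cayley_graph_zn(n, S):
    """Edges (g, h) of the Cayley graph on Z_n with h - g in S (mod n)."""
    gens = {s % n for s in S} - {0}   # drop 0 to exclude self-loops
    return {(g, (g + s) % n) for g in range(n) for s in gens}


def is_symmetric(edges):
    return all((h, g) in edges for (g, h) in edges)


def is_inverse_closed(n, S):
    gens = {s % n for s in S}
    return all((-s) % n in gens for s in gens)


def is_strongly_connected(n, edges):
    """In a finite Cayley graph, reaching every node from 0 is enough."""
    out = {g: [] for g in range(n)}
    for g, h in edges:
        out[g].append(h)
    seen, stack = {0}, [0]
    while stack:
        g = stack.pop()
        for h in out[g]:
            if h not in seen:
                seen.add(h)
                stack.append(h)
    return len(seen) == n


ring = cayley_graph_zn(8, {1, -1})     # symmetric ring
cycle = cayley_graph_zn(8, {1})        # directed cycle, not symmetric
evens = cayley_graph_zn(8, {2, -2})    # {2, -2} does not generate Z_8
```

Here `ring` is symmetric and connected, `cycle` is connected but not symmetric, and `evens` is symmetric but disconnected because its generators only reach the even residues.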
Abelian Cayley graphs encompass several important examples.
The complete graph on nodes, that is the graph where each node is directly connected with every other node, is ;
The circulant graphs (resp. matrices) are Abelian Cayley graphs (resp. matrices) on the group ; we shall denote the circulant matrix generated by as . For instance, the ring graph is the circulant graph ; its adjacency matrix is and its Laplacian is . For a ring, the eigenvalues of are , and in particular as
The square grids on a -dimensional torus are where are elements of the canonical basis of . In particular, the -dimensional hypercube graph is
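For the ring example in the list above, the spectrum can be checked numerically. The sketch below uses the standard closed form 2 - 2cos(2πk/N) for the eigenvalues of the Laplacian of a ring with N nodes, with Fourier vectors as eigenvectors; this formula is a well-known fact about circulant matrices and is stated here as an assumption, since the corresponding expressions are not legible in this copy of the text.

```python
import cmath
import math


def ring_laplacian(n):
    """Laplacian of the symmetric ring on n >= 3 nodes."""
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        L[i][i] = 2.0
        L[i][(i - 1) % n] -= 1.0
        L[i][(i + 1) % n] -= 1.0
    return L


def ring_eigenvalue(n, k):
    return 2.0 - 2.0 * math.cos(2.0 * math.pi * k / n)


def residual(n, k):
    """Largest entry of |L v - lambda_k v| for v_m = exp(2 pi i m k / n)."""
    L = ring_laplacian(n)
    v = [cmath.exp(2j * cmath.pi * m * k / n) for m in range(n)]
    lam = ring_eigenvalue(n, k)
    return max(abs(sum(L[r][c] * v[c] for c in range(n)) - lam * v[r])
               for r in range(n))


n = 12
worst = max(residual(n, k) for k in range(n))   # numerically zero
lambda_1 = ring_eigenvalue(n, 1)                # smallest positive eigenvalue
```

For large N the smallest positive eigenvalue behaves like 4π²/N², which is why the convergence rate on rings approaches one as the network grows.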
Notice how all the examples above naturally form a sequence of graphs indexed by the number of nodes . Special cases for which we will be able to prove asymptotic unbiasedness in the following are those in which the generating set is finite and “kept fixed”, as in the ring graph. Precisely, we consider the following general example.
Start from an infinite lattice and fix a finite generating as a group. For every integer , let considered as the Abelian group and let be the Cayley Abelian graph generated by . Notice that all graphs have the same generating set for sufficiently large, in particular they have the same degree. Moreover, by the assumption made on , all of them are strongly connected. Rings and grids fit in this framework.
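A minimal sketch of this construction, assuming the finite group is taken to be (Z_n)^d and the fixed generating set S is a finite collection of integer vectors reduced modulo n (both are assumptions of the example, as are the function names). It illustrates that the degree stabilizes once n is large enough for the reduced generators to be distinct.

```python
from itertools import product


def torus_cayley_neighbors(n, S):
    """Out-neighbors in the Cayley graph on (Z_n)^d generated by S mod n."""
    d = len(next(iter(S)))
    gens = {tuple(c % n for c in s) for s in S}
    gens.discard((0,) * d)   # no self-loops
    return {g: {tuple((a + b) % n for a, b in zip(g, s)) for s in gens}
            for g in product(range(n), repeat=d)}


# Generators of the two-dimensional toroidal grid.
S = [(1, 0), (-1, 0), (0, 1), (0, -1)]
degrees = {n: len(next(iter(torus_cayley_neighbors(n, S).values())))
           for n in (2, 3, 5)}
```

With these generators the degree is 2 for n = 2, because opposite generators coincide modulo 2, and 4 for every n of at least 3.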
A -Cayley structure for the communication graph has deep consequences on the mean square analysis of randomized consensus algorithms. In particular, it is easy to see that if is a -Cayley matrix, then also is -Cayley. To exploit this property, let us define a sequence of matrices by the following recursion. Let and . Since is a Cayley matrix on any Abelian group, the above fact implies that for every , the matrices are -Cayley. Thus, the sequence can be equivalently seen as the sequence of the corresponding generating vectors . We shall refer to this vector as the MSA vector. Since is linear, the MSA vector evolution can be written as a matrix multiplication . Clearly,
Moreover, is -stochastic, and if we let be the right invariant vector of , that is
then, can be computed using the fact that stochastic Cayley matrices are also doubly stochastic and Equation (8), obtaining
The following simple result will be useful later on.
Let be -Cayley, and suppose that almost surely. Then,
A straightforward computation shows that
Hence, implies that there exists such that and . If , this yields . If , then or also . Finally, if both cases above do not happen, then, and which yields . The second inclusion is thus proven. To prove the first one, notice that, by the assumption made, . This completes the proof. ∎
Notice that in the BGA and CBGA examples we can always apply Lemma 3.4 with .
3.3 Local perturbation of stochastic matrices
In this section we recall a perturbation result presented in  and which will be used later to estimate the trace of the matrix and to prove asymptotic unbiasedness for sequences of Cayley graphs.
We assume we have fixed an infinite universe set , an increasing sequence of finite cardinality subsets of such that and a sequence of irreducible stochastic matrices on the state spaces with the following stabilizing property: for every , there exist such that and
This property allows us to define, in a natural way, a limit stochastic matrix on . For every , we define
The sequence of stochastic matrices is said to be weakly democratic if the corresponding invariant vectors are such that, for all , for . Fix now a finite subset and another sequence of irreducible stochastic matrices on such that
In other terms, can be seen as a perturbed version of with the perturbation confined to the fixed subset and stable (it does not change as increases). Also for this perturbed sequence we can define, following (13), the asymptotic chain . The following result has been proven in .
Suppose that and are both irreducible. Then, if is weakly democratic, also is weakly democratic.
In the sequel we will use this result to prove the convergence to of the trace in (11).
4 Broadcast without collisions
In this section we present a comprehensive analysis of the Broadcast Gossip Algorithm, in terms of both rate of convergence and bias. The following result characterizes the convergence properties of the algorithm, extending [3, Lemma 2 and 4] to directed networks.
Proposition 4.1 (Convergence of BGA algorithm)
Consider the BGA algorithm. Let be any connected graph, its Laplacian matrix, and the smallest positive eigenvalue of . Then
In particular, the BGA algorithm achieves probabilistic consensus.
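The expected update matrix can be checked by direct enumeration on a small symmetric graph. The sketch below averages the N broadcast matrices and compares the result with I - (q/N) L, which is the identity that this enumeration yields for symmetric graphs; the formulas of the proposition are not legible in this copy, so this identity, the path graph, and the names used are assumptions of the example.

```python
def bga_matrix(n, neighbors, j, q):
    """Update matrix when node j broadcasts: only the rows of its neighbors change."""
    P = [[1.0 if r == c else 0.0 for c in range(n)] for r in range(n)]
    for i in neighbors[j]:
        P[i][i] = 1.0 - q
        P[i][j] = q
    return P


def expected_bga_matrix(n, neighbors, q):
    """Average of the update matrices over a uniformly chosen broadcaster."""
    E = [[0.0] * n for _ in range(n)]
    for j in range(n):
        P = bga_matrix(n, neighbors, j, q)
        for r in range(n):
            for c in range(n):
                E[r][c] += P[r][c] / n
    return E


def laplacian(n, neighbors):
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in neighbors[i]:
            L[i][i] += 1.0
            L[i][j] -= 1.0
    return L


# Path graph on four nodes (symmetric, not regular).
neighbors = [[1], [0, 2], [1, 3], [2]]
n, q = 4, 0.4
E = expected_bga_matrix(n, neighbors, q)
L = laplacian(n, neighbors)
max_gap = max(abs(E[r][c] - ((1.0 if r == c else 0.0) - q / n * L[r][c]))
              for r in range(n) for c in range(n))
```

On this example the averaged matrix has unit row and column sums, whereas each individual broadcast matrix has unit row sums only, consistent with the average being preserved in expectation but not along single iterations.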
First, we compute , as
Then, by Proposition 3.1, the algorithm achieves probabilistic consensus. To compute , we notice that , and we compute
The next corollary provides explicit bounds on the convergence rate, assuming the communication graph to be symmetric.
Under the assumptions of Proposition 4.1, if is symmetric, then
In particular, the convergence rate can be estimated as
Once we have understood that the algorithm converges, and how fast, it is worth discussing to which value it converges. It is clear that the average is not preserved through each iteration, although the average can sometimes be preserved in expectation. Indeed, , so that if is symmetric,
However, this weak preservation property does not imply that the expected estimation error is zero, either in finite time or as time goes to infinity. In fact, the following example, derived from , shows that can be positive and, furthermore, it can be bounded away from zero, uniformly in the network size .
Example 4.3 (Complete graph)
Let be a complete graph, and consider the BGA algorithm. In this case, the operator can be computed explicitly, giving
Namely, and hence the BGA is not asymptotically unbiased on the complete graph. Note that by changing the parameter , one can trade off speed and estimation bias: this is numerically investigated in Figure 1. Similar trade-offs will be considered throughout the paper.
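The trade-off between speed and bias on the complete graph can also be explored by simulation. The following Monte Carlo sketch estimates the mean squared distance between the consensus value and the initial average, for i.i.d. standard Gaussian initial values. The network size, the two values of `q`, the number of trials and steps, and the seed are all assumptions made for the example.

```python
import random


def bga_complete_consensus(x0, q, steps, rng):
    """Run the BGA on the complete graph: all nodes hear the broadcaster."""
    x = list(x0)
    n = len(x)
    for _ in range(steps):
        j = rng.randrange(n)
        xj = x[j]
        x = [xj if i == j else (1.0 - q) * x[i] + q * xj for i in range(n)]
    return x


def estimate_bias(n, q, trials=300, steps=300, seed=1):
    """Monte Carlo estimate of E[(consensus value - initial average)^2]."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        x0 = [rng.gauss(0.0, 1.0) for _ in range(n)]
        x_ave = sum(x0) / n
        x_end = bga_complete_consensus(x0, q, steps, rng)
        total += (sum(x_end) / n - x_ave) ** 2
    return total / trials


bias_small_q = estimate_bias(10, 0.1)
bias_large_q = estimate_bias(10, 0.9)
```

A larger `q` makes the states contract faster but lets the first few broadcasts dominate the final value, so the estimated bias for `q = 0.9` comes out larger than for `q = 0.1`.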
Next, we focus on Abelian Cayley graphs.
Consider the BGA algorithm and let the communication graph be Abelian Cayley with degree . Then, the MSA vector evolves as
where the matrix
is Cayley and is a matrix such that where depends neither on nor explicitly on but only on , and is such that the number of non-zero rows is at most and the number of non-zero columns is at most .
Using the fact that all matrices are Abelian Cayley, we obtain
Since (being symmetric for every ), we easily obtain from above that
From this we immediately see that the non-zero elements of have row indices in and column indices in . Hence the result follows.
Note that, in general, will not be a stochastic matrix, since it may be negative on the diagonal; however, for large enough (and fixed) it is surely a stochastic matrix. Note also that the entries of the matrix in (19) are proportional to and that the number of non-zero entries is upper bounded by , which does not depend on . Hence, we expect that, as diverges, would become negligible, and the MSA would depend on the matrix only. This would imply the unbiasedness of the algorithm, because it is immediate to remark that the invariant vector of is This fact can be actually stated as the following result.
Theorem 4.5 (Unbiasedness of BGA)
Fix a finite generating as a group. For every integer , let considered as the Abelian group and let be the Cayley Abelian graph generated by . On the sequence of the BGA is asymptotically unbiased.
The idea is to apply the perturbation result Theorem 3.5 to the sequence of matrices and . Notice that while by Lemma 3.4. Hence, and are both strongly connected. This also implies that the limit graph on of the two sequences and both contain which is simply the Abelian Cayley graph on generated by which is strongly connected by the assumption made. Finally, notice that is Abelian Cayley, hence obviously weakly democratic, while Lemma 4.4 guarantees that is a finite perturbation of in the sense of Section 3.3. Hence also is weakly democratic. This yields, by (11),
The results about Cayley graphs can be made more specific in the following example.
Example 4.6 (Ring graph)
Note that in this case the Cayley graph is circulant. For the ring graph, Proposition 4.1 implies that, for large enough,
and namely . Specializing the proof of Lemma 4.4, the evolution of can be written as
that is , with
Thanks to these explicit formulas, we can numerically compute the rate and bias. Namely, about the rate we obtain that , and . This means that the perturbation does not significantly affect the rate for large . Moreover, since and we argue that actually
On the other hand, about the bias we obtain that . We can also numerically evaluate and as functions of : these results are shown in Figure 2. Note that choosing we can trade off asymptotical error and convergence rate.
Remark 4.1 (Pareto optimality)
Combining Examples 4.3 and 4.6, we obtain that is proportional to on a complete graph, and to on a ring graph, considering the approximation for large . In Figure 3 we plot against , for both graphs, up to a suitable scaling with respect to . The fact that both curves are monotonically decreasing shows that every value of is Pareto optimal, in the sense that one cannot improve one of the two objectives without making the other worse.
Example 4.7 (Random geometric graph)
In order to take into account the locality constraint on connectivity in real-world networks, several models of random geometric graphs have been proposed, as surveyed in . In this example we consider sequences of random geometric graphs, based on the following construction. For all , we sample points from a uniform distribution over a unit square , and we draw an edge between nodes when On these realizations we run the BGA algorithm until convergence is reached, up to a small tolerance threshold, and in this way we compute an approximation of The results are plotted in Figure 4, in comparison with the analogous quantity on the complete and ring graph. It appears from simulations that, as diverges, is on the complete graph, whereas it is on the ring graph, and on the random geometric graph. This evidence is in accordance with the theoretical results, and suggests their extension to other families of geometric graphs.
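The graph construction used in this example can be sketched as follows. The connection threshold is not legible in this copy of the text, so the parameter `radius` below is a hypothetical stand-in for it; the function name and the seed are also assumptions of the example.

```python
import math
import random


def random_geometric_graph(n, radius, seed=0):
    """Sample n uniform points in the unit square and connect two nodes
    when their Euclidean distance is below `radius` (hypothetical threshold)."""
    rng = random.Random(seed)
    pts = [(rng.random(), rng.random()) for _ in range(n)]
    neighbors = [[] for _ in range(n)]
    for u in range(n):
        for v in range(u + 1, n):
            dx = pts[u][0] - pts[v][0]
            dy = pts[u][1] - pts[v][1]
            if math.hypot(dx, dy) < radius:
                neighbors[u].append(v)   # the graph is symmetric
                neighbors[v].append(u)
    return pts, neighbors


pts, neighbors = random_geometric_graph(50, 0.3)
```

The resulting adjacency lists are symmetric, so they can be passed directly to a broadcast gossip simulation; a realization may be disconnected when the radius is small, in which case it should be resampled before running the algorithm.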
5 Broadcast with collisions
In this section we present a comprehensive analysis of the Collision Broadcast Gossip Algorithm, in terms of both rate of convergence and bias. After proving a general convergence result, we focus on complete graphs in Subsection 5.1, on Abelian Cayley graphs in Subsection 5.2, and finally on ring graphs in Subsection 5.3. Our main finding is that the performance of the CBGA, in terms of both speed and bias, is close to the BGA: in this sense we may claim the robustness of broadcast gossip algorithms to local interferences.
Proposition 5.1 (Convergence of CBGA algorithm)
Consider the CBGA algorithm. Let be any connected graph, and its Laplacian matrix. Then,
where for every and . In particular, the CBGA algorithm achieves probabilistic consensus.
The probability of having at time a successful transmission from to is . Hence,
Note that is strongly connected: by Proposition 3.1, we can conclude the convergence of the CBGA. ∎
Remark: Formula (20) is simpler if the graph is -regular, because in that case
Denote by that is the smallest nonzero eigenvalue of . Then , and this together with (5) and leads to
as a lower bound for the rate of convergence. As a function of , this lower bound is minimal whenever is maximal, that is for equal to
Hence, natural questions are: is this bound tight? Is the best choice to improve the convergence rate? The next section will answer these questions positively for complete and ring graphs.
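The probability of a successful transmission can be checked by exhaustive enumeration on a small graph. Under the collision model described in Section 2, node i hears node j when j is awake, i is silent, and every other in-neighbor of i is silent, which gives p(1-p)^{d_i} with d_i the in-degree of i. This closed form is the reading adopted in this sketch, since the formula is not legible in this copy, and the function names are assumptions of the example.

```python
from itertools import product


def success_probability(in_neighbors, j, i, p):
    """Exact probability that i receives the message of j in one slot,
    obtained by enumerating all wake-up patterns (small graphs only)."""
    n = len(in_neighbors)
    total = 0.0
    for awake in product([False, True], repeat=n):
        prob = 1.0
        for a in awake:
            prob *= p if a else 1.0 - p
        senders = [k for k in in_neighbors[i] if awake[k]]
        if not awake[i] and senders == [j]:
            total += prob
    return total


def best_p_on_grid(d, grid=1000):
    """Grid search for the p in (0, 1) maximizing p * (1 - p) ** d."""
    return max((k / grid for k in range(1, grid)),
               key=lambda p: p * (1.0 - p) ** d)


ring = [[(i - 1) % 5, (i + 1) % 5] for i in range(5)]
p_success = success_probability(ring, 1, 0, 0.3)   # compare with 0.3 * 0.7 ** 2
p_best_ring = best_p_on_grid(2)                     # compare with 1 / 3
```

For a d-regular graph the function p(1-p)^d is maximized at p = 1/(d+1) by elementary calculus, and the grid search returns a value close to 1/3 for the ring, where d = 2.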
5.1 Complete graphs
A thorough analysis can be carried out for the complete graph.