# Revisiting Randomized Gossip Algorithms: General Framework, Convergence Rates and Novel Block and Accelerated Protocols

Part of the results was presented in [46, 49, 45].

Nicolas Loizou
University of Edinburgh
n.loizou@sms.ed.ac.uk
Peter Richtárik
KAUST, MIPT
peter.richtarik@kaust.edu.sa
###### Abstract

In this work we present a new framework for the analysis and design of randomized gossip algorithms for solving the average consensus problem. We show how classical randomized iterative methods for solving linear systems can be interpreted as gossip algorithms when applied to special systems encoding the underlying network and explain in detail their decentralized nature. Our general framework recovers a comprehensive array of well-known gossip algorithms as special cases, including the pairwise randomized gossip algorithm and path averaging gossip, and allows for the development of provably faster variants. The flexibility of the new approach enables the design of a number of new specific gossip methods. For instance, we propose and analyze novel block and the first provably accelerated randomized gossip protocols, and dual randomized gossip algorithms.

From a numerical analysis viewpoint, our work is the first that explores in depth the decentralized nature of randomized iterative methods for linear systems and proposes them as methods for solving the average consensus problem.

We evaluate the performance of the proposed gossip protocols by performing extensive experimental testing on typical wireless network topologies.

Keywords: randomized gossip algorithms, average consensus, weighted average consensus, stochastic methods, linear systems, randomized Kaczmarz, randomized block Kaczmarz, randomized coordinate descent, heavy ball momentum, Nesterov’s acceleration, duality, convex optimization, wireless sensor networks

Mathematical Subject Classifications: 93A14, 68W15, 68Q25, 68W20, 68W40, 65Y20, 90C15, 90C20, 90C25, 15A06, 15B52, 65F10

## 1 Introduction

Average consensus is a fundamental problem in distributed computing and multi-agent systems. It comes up in many real-world applications such as coordination of autonomous agents, estimation, rumour spreading in social networks, PageRank, distributed data fusion on ad-hoc networks, and decentralized optimization. Due to its great importance there is much classical [81, 13] and recent [88, 87, 6] work on the design of efficient algorithms/protocols for solving it.

In the average consensus (AC) problem we are given an undirected connected network $\mathcal{G}=(\mathcal{V},\mathcal{E})$ with node set $\mathcal{V}=\{1,2,\dots,n\}$ and edges $\mathcal{E}$. Each node $i\in\mathcal{V}$ “knows” a private value $c_i\in\mathbb{R}$. The goal of AC is for every node to compute the average of these private values, $\bar c:=\frac{1}{n}\sum_{i=1}^n c_i$, in a distributed fashion. That is, the exchange of information can only occur between connected nodes (neighbors).

One of the most attractive classes of protocols for solving the average consensus problem is that of gossip algorithms. The development and design of gossip algorithms was studied extensively in the last decade. The seminal 2006 paper of Boyd et al. [6] motivated a flurry of subsequent research, and gossip algorithms now appear in many applications, including distributed data fusion in sensor networks [88], load balancing [10] and clock synchronization [19]. For a survey of selected relevant work prior to 2010, we refer the reader to the work of Dimakis et al. [14]. For more recent results on randomized gossip algorithms we suggest [94, 40, 63, 43, 55, 3]. See also [15, 4, 64, 29, 30].

### 1.1 Main contributions

In this work, we connect two areas of research which until now have remained remarkably disjoint in the literature: randomized iterative (projection) methods for solving linear systems and randomized gossip protocols for solving the average consensus problem. This connection allows each body of literature to borrow from the other, and using it we propose a new framework for the design and analysis of novel efficient randomized gossip protocols.

The main contributions of our work include:

• RandNLA. We show how classical randomized iterative methods for solving linear systems can be interpreted as gossip algorithms when applied to special systems encoding the underlying network and explain in detail their decentralized nature. Through our general framework we recover a comprehensive array of well-known gossip protocols as special cases. In addition, our approach allows for the development of novel block and dual variants of all of these methods. From a numerical analysis viewpoint, our work is the first that explores in depth the decentralized nature of randomized iterative methods for solving linear systems and proposes them as efficient methods for solving the average consensus problem (and its weighted variant).

• Weighted AC. The methods presented in this work solve the more general weighted average consensus (Weighted AC) problem (Section 3.1), which is popular in the area of distributed cooperative spectrum sensing networks. The proposed protocols are the first randomized gossip algorithms that directly solve this problem with a finite-time convergence rate analysis. In particular, we prove linear convergence of the proposed protocols and explain how we can obtain further acceleration using momentum. To the best of our knowledge, existing decentralized protocols for the weighted average consensus problem are shown to converge, but without an analysis of their convergence rates.

• Acceleration. We present novel and provably accelerated randomized gossip protocols. In each step of the proposed algorithms, all nodes of the network update their values using their own information, but only a subset of them exchange messages. The protocols are inspired by the recently proposed accelerated variants of randomized Kaczmarz-type methods and use momentum terms on top of the sketch and project update rule (gossip communication) to obtain better theoretical and practical performance. To the best of our knowledge, our accelerated protocols are the first randomized gossip algorithms that converge to a consensus with a provably accelerated linear rate without making any further assumptions on the structure of the network. Achieving an accelerated linear rate in this setting using randomized gossip protocols was an open problem.

• Duality. We reveal a hidden duality of randomized gossip algorithms, with the dual iterative process maintaining variables attached to the edges of the network. We show how the randomized coordinate descent and randomized Newton methods work as edge-based dual randomized gossip algorithms.

• Experiments. We corroborate our theoretical results with extensive experimental testing on typical wireless network topologies. We numerically verify the linear convergence of our protocols for solving the weighted AC problem. We explain the benefit of using block variants in the gossip protocols, where more than two nodes update their values in each iteration. We explore the performance of the proposed provably accelerated gossip protocols and show that they significantly outperform the standard pairwise gossip algorithm and existing fast pairwise gossip protocols with momentum. An experiment showing the importance of over-relaxation in the gossip setting is also presented.

This paper contains a synthesis and a unified presentation of the randomized gossip protocols proposed in Loizou and Richtárik [46], Loizou and Richtárik [49] and Loizou et al. [45]. In [46], building upon the results from [26], a connection between the area of randomized iterative methods for linear systems and gossip algorithms was established and block gossip algorithms were developed. Then in [49] and [45] faster and provably accelerated gossip algorithms were proposed using the heavy ball momentum and Nesterov’s acceleration technique, respectively. This paper expands upon these results and presents proofs for theorems that are referenced in the above papers. We also conduct several new experiments.

We believe that this work could potentially open up new avenues of research in the area of decentralized gossip protocols.

### 1.2 Structure of the paper

This work is organized as follows. Section 2 introduces the necessary background on basic randomized iterative methods for linear systems that will be used for the development of randomized gossip protocols. Related work on linear system solvers, randomized gossip algorithms for averaging and gossip algorithms for consensus optimization is also presented. In Section 3 the more general weighted average consensus problem is described and the connections between the two areas of research (randomized projection methods for linear systems and gossip algorithms) are established. In particular, we explain how methods for solving linear systems can be interpreted as gossip algorithms when applied to special systems encoding the underlying network and elaborate in detail on their distributed nature. Novel block gossip variants are also presented. In Section 4 we describe and analyze fast and provably accelerated randomized gossip algorithms. In each step of these protocols all nodes of the network update their values, but only a subset of them exchange their private values. Section 5 describes dual randomized gossip algorithms that operate with values associated with the edges of the network, and Section 6 highlights further connections between methods for solving linear systems and gossip algorithms. Numerical evaluation of the new gossip protocols is presented in Section 7. Finally, concluding remarks are given in Section 8.

### 1.3 Notation

For convenience, a table of the most frequently used notation is included in Appendix B. In particular, we use boldface upper-case letters to denote matrices; $\mathbf{I}$ is the identity matrix. By $\|\cdot\|$ and $\|\cdot\|_F$ we denote the Euclidean norm and the Frobenius norm, respectively. For a positive integer $n$, we write $[n]:=\{1,2,\dots,n\}$. By $\mathcal{L}$ we denote the solution set of the linear system $\mathbf{A}x=b$, where $\mathbf{A}\in\mathbb{R}^{m\times n}$ and $b\in\mathbb{R}^m$.

We shall often refer to specific matrix expressions involving several matrices. In order to keep these expressions brief throughout the paper it will be useful to define the following two matrices:

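$$\mathbf{H}:=\mathbf{S}\left(\mathbf{S}^\top\mathbf{A}\mathbf{B}^{-1}\mathbf{A}^\top\mathbf{S}\right)^\dagger\mathbf{S}^\top \quad\text{and}\quad \mathbf{Z}:=\mathbf{A}^\top\mathbf{H}\mathbf{A}, \tag{1}$$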

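depending on a random matrix $\mathbf{S}\in\mathbb{R}^{m\times q}$ drawn from a given distribution $\mathcal{D}$ and on an $n\times n$ positive definite matrix $\mathbf{B}$ which defines the geometry of the space. In particular, we define the inner product in $\mathbb{R}^n$ via $\langle x,z\rangle_{\mathbf{B}}:=\langle\mathbf{B}x,z\rangle$ and the induced norm $\|x\|_{\mathbf{B}}:=(x^\top\mathbf{B}x)^{1/2}$. By $\mathbf{A}_{i:}$ and $\mathbf{A}_{:j}$ we denote the $i$-th row and the $j$-th column of matrix $\mathbf{A}$, respectively. By $\dagger$ we denote the Moore-Penrose pseudoinverse.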

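The complexity of all gossip protocols presented in this paper is described by the spectrum of the matrix

$$\mathbf{W}:=\mathbf{B}^{-1/2}\,\mathbb{E}_{\mathbf{S}\sim\mathcal{D}}[\mathbf{Z}]\,\mathbf{B}^{-1/2}, \tag{2}$$

where the expectation is taken over $\mathbf{S}\sim\mathcal{D}$. With $\lambda_{\min}^+$ and $\lambda_{\max}$ we indicate the smallest nonzero and the largest eigenvalue of matrix $\mathbf{W}$, respectively.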

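The vector $x^k=(x_1^k,\dots,x_n^k)\in\mathbb{R}^n$ represents the private values of the nodes of the network at the $k$-th iteration, while $x_i^k$ denotes the value of node $i\in\mathcal{V}$ at the $k$-th iteration. $\mathcal{N}_i\subseteq\mathcal{V}$ denotes the set of nodes that are neighbors of node $i\in\mathcal{V}$. By $\alpha(\mathcal{G})$ we denote the algebraic connectivity of graph $\mathcal{G}$.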

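Throughout the paper, $x^*$ is the projection of $x^0$ onto $\mathcal{L}$ in the $\mathbf{B}$-norm. We write $x^*=\Pi_{\mathcal{L},\mathbf{B}}(x^0)$. An explicit formula for the projection of $x$ onto the set $\mathcal{L}$ is given by

$$\Pi_{\mathcal{L},\mathbf{B}}(x):=\arg\min_{x'\in\mathcal{L}}\|x'-x\|_{\mathbf{B}}=x-\mathbf{B}^{-1}\mathbf{A}^\top\left(\mathbf{A}\mathbf{B}^{-1}\mathbf{A}^\top\right)^\dagger(\mathbf{A}x-b).$$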
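The closed-form projection can be sanity-checked numerically. The following minimal NumPy sketch is our own illustration (the random data and the helper name `project_B` are assumptions, not part of the paper): it computes $\Pi_{\mathcal{L},\mathbf{B}}(x)$ and checks feasibility together with the optimality condition that $x'-x$ is $\mathbf{B}$-orthogonal to $\mathrm{Null}(\mathbf{A})$.

```python
import numpy as np

def project_B(x, A, b, B):
    """B-norm projection of x onto {z : Az = b} via the closed form
    x - B^{-1} A^T (A B^{-1} A^T)^+ (Ax - b)."""
    Binv_At = np.linalg.solve(B, A.T)              # B^{-1} A^T
    return x - Binv_At @ np.linalg.pinv(A @ Binv_At) @ (A @ x - b)

rng = np.random.default_rng(0)
A = rng.standard_normal((3, 6))
b = A @ rng.standard_normal(6)                     # consistent right-hand side
B = np.diag(rng.uniform(0.5, 2.0, size=6))         # diagonal positive definite B
x = rng.standard_normal(6)
p = project_B(x, A, b, B)

# Optimality check: p - x must be B-orthogonal to Null(A).
_, _, Vt = np.linalg.svd(A)
N = Vt[3:].T                                       # columns span Null(A) (A has rank 3)
print(np.allclose(A @ p, b), np.allclose(N.T @ B @ (p - x), 0))
```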

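Finally, with $\mathbf{Q}\in\mathbb{R}^{|\mathcal{E}|\times n}$ we denote the incidence matrix and with $\mathbf{L}\in\mathbb{R}^{n\times n}$ the Laplacian matrix of the network. Note that $\mathbf{L}=\mathbf{Q}^\top\mathbf{Q}$. Further, with $\mathbf{D}$ we denote the degree matrix of the graph. That is, $\mathbf{D}=\mathrm{Diag}(d_1,d_2,\dots,d_n)\in\mathbb{R}^{n\times n}$, where $d_i$ is the degree of node $i\in\mathcal{V}$.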
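A small NumPy sketch of these graph matrices, assuming a hypothetical 4-node graph and the illustrative helper name `incidence_matrix`. It builds $\mathbf{Q}$ with the $+1/-1$ row convention used in Section 3.2 and checks that $\mathbf{L}=\mathbf{Q}^\top\mathbf{Q}$ coincides with $\mathbf{D}$ minus the adjacency matrix.

```python
import numpy as np

def incidence_matrix(n, edges):
    """Row e = (i, j): +1 in column i, -1 in column j (arbitrary but fixed orientation)."""
    Q = np.zeros((len(edges), n))
    for e, (i, j) in enumerate(edges):
        Q[e, i], Q[e, j] = 1.0, -1.0
    return Q

n = 4
edges = [(0, 1), (1, 2), (2, 3), (3, 0), (0, 2)]   # hypothetical example graph
Q = incidence_matrix(n, edges)
L = Q.T @ Q                                        # Laplacian via L = Q^T Q

Adj = np.zeros((n, n))                             # adjacency matrix, for comparison
for i, j in edges:
    Adj[i, j] = Adj[j, i] = 1.0
D = np.diag(Adj.sum(axis=1))                       # degree matrix D = Diag(d_1, ..., d_n)
print(np.allclose(L, D - Adj), np.diag(L))         # L = D - Adj; diagonal holds the degrees
```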

## 2 Background - Technical Preliminaries

Solving linear systems is a central problem in numerical linear algebra and plays an important role in computer science, control theory, scientific computing, optimization, computer vision, machine learning, and many other fields. With the advent of the age of big data, practitioners are looking for ways to solve linear systems of unprecedented sizes. In this large scale setting, randomized iterative methods are preferred mainly because of their cheap per iteration cost and because they can easily scale to extreme dimensions.

### 2.1 Randomized iterative methods for linear systems

Kaczmarz-type methods are very popular for solving linear systems with many equations. The (deterministic) Kaczmarz method for solving consistent linear systems was originally introduced by Kaczmarz in 1937 [34]. Despite the fact that a large volume of papers was written on the topic, the first provably linearly convergent variant of the Kaczmarz method, the randomized Kaczmarz method (RK), was developed more than 70 years later, by Strohmer and Vershynin [77]. This result sparked renewed interest in the design of randomized methods for solving linear systems [56, 57, 18, 51, 93, 58, 74, 41]. More recently, Gower and Richtárik [25] provided a unified analysis of several randomized iterative methods for solving linear systems using a sketch-and-project framework. We adopt this framework in this paper.

In particular, the analysis in [25] was done under the assumption that the matrix $\mathbf{A}$ has full column rank. This assumption was lifted in [26], where a duality theory for the method was also developed. Later, in [73], it was shown that the sketch and project method of [25] can be interpreted as stochastic gradient descent applied to a suitable stochastic optimization problem, and relaxed variants of the proposed methods were presented.

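The sketch-and-project algorithm [25, 73] for solving a consistent linear system $\mathbf{A}x=b$ has the form

$$\begin{aligned} x^{k+1} &= x^k-\omega\mathbf{B}^{-1}\mathbf{A}^\top\mathbf{S}_k\left(\mathbf{S}_k^\top\mathbf{A}\mathbf{B}^{-1}\mathbf{A}^\top\mathbf{S}_k\right)^\dagger\mathbf{S}_k^\top(\mathbf{A}x^k-b)\\ &= x^k-\omega\mathbf{B}^{-1}\mathbf{A}^\top\mathbf{H}_k(\mathbf{A}x^k-b), \end{aligned} \tag{3}$$

where in each iteration the matrix $\mathbf{S}_k\in\mathbb{R}^{m\times q}$ is sampled afresh from an arbitrary distribution $\mathcal{D}$, and $\mathbf{H}_k$ denotes the matrix $\mathbf{H}$ of (1) with $\mathbf{S}=\mathbf{S}_k$. (We stress that there is no restriction on the number of columns of the matrix $\mathbf{S}_k$; $q$ can vary.) In [25] it was shown that many popular randomized algorithms for solving linear systems, including the RK method and the randomized coordinate descent method (a.k.a. the Gauss-Seidel method), can be cast as special cases of the above update by choosing an appropriate combination of the distribution $\mathcal{D}$ and the positive definite matrix $\mathbf{B}$.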

In the special case that $\omega=1$ (no relaxation), the update rule (3) can be equivalently written as follows:

$$x^{k+1}:=\arg\min_{x\in\mathbb{R}^n}\|x-x^k\|_{\mathbf{B}}^2 \quad\text{subject to}\quad \mathbf{S}_k^\top\mathbf{A}x=\mathbf{S}_k^\top b. \tag{4}$$

This equivalent presentation of the method justifies the name Sketch and Project. In particular, the method is a two-step procedure: (i) draw a random matrix $\mathbf{S}_k$ from the distribution $\mathcal{D}$ and formulate the sketched system $\mathbf{S}_k^\top\mathbf{A}x=\mathbf{S}_k^\top b$; (ii) project the last iterate $x^k$ onto the solution set of the sketched system.

A formal presentation of the Sketch and Project method is shown in Algorithm 1.
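To make update (3) concrete, here is a minimal NumPy sketch of one sketch-and-project step for an arbitrary sketch $\mathbf{S}_k$. It is an illustration under stated assumptions, not the authors' implementation: Gaussian sketches with $q=2$ columns play the role of $\mathcal{D}$, $\mathbf{B}$ is a random diagonal matrix, and the toy matrix $[\mathbf{I}\ |\ 0.3\mathbf{G}]$ is chosen only to be well conditioned. The iterates should approach the $\mathbf{B}$-projection of $x^0$ onto the solution set.

```python
import numpy as np

def sketch_and_project_step(x, A, b, B, S, omega=1.0):
    """One step of update (3) for a sketch S of shape (m, q)."""
    Binv_At_S = np.linalg.solve(B, A.T @ S)        # B^{-1} A^T S
    M = S.T @ A @ Binv_At_S                        # S^T A B^{-1} A^T S  (q x q)
    return x - omega * Binv_At_S @ (np.linalg.pinv(M) @ (S.T @ (A @ x - b)))

rng = np.random.default_rng(1)
m, n, q = 4, 8, 2
A = np.hstack([np.eye(m), 0.3 * rng.standard_normal((m, n - m))])  # toy, well conditioned
b = A @ rng.standard_normal(n)                     # consistent system
B = np.diag(rng.uniform(0.5, 2.0, size=n))
x0 = np.zeros(n)

x = x0.copy()
for _ in range(1000):
    S = rng.standard_normal((m, q))                # Gaussian sketch, drawn afresh each step
    x = sketch_and_project_step(x, A, b, B, S)

# Reference: B-projection of x0 onto {Ax = b}
Binv_At = np.linalg.solve(B, A.T)
x_star = x0 - Binv_At @ np.linalg.pinv(A @ Binv_At) @ (A @ x0 - b)
print(np.linalg.norm(x - x_star))                  # should be close to 0
```

Dense `pinv` and `solve` are used only for readability; for coordinate sketches one would exploit their structure instead of forming these products.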

In this work, we are mostly interested in two special cases of the sketch and project framework: the randomized Kaczmarz (RK) method and its block variant, the randomized block Kaczmarz (RBK) method. In addition, in the following sections we present novel scaled and accelerated variants of these two cases and interpret their gossip nature. In particular, we focus on explaining how these methods can solve the average consensus problem and its more general version, the weighted average consensus problem (Subsection 3.1).

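Let $e_i\in\mathbb{R}^m$ be the $i$-th unit coordinate vector in $\mathbb{R}^m$ and let $\mathbf{I}_{:C}$ be the column submatrix of the $m\times m$ identity matrix with columns indexed by $C\subseteq[m]$. Then the RK and RBK methods can be obtained as special cases of the update rule (3) as follows:

• RK: Let $\mathbf{B}=\mathbf{I}$ and $\mathbf{S}_k=e_i$, where $i\in[m]$ is chosen independently at each iteration, with probability $p_i>0$. In this setup the update rule (3) simplifies to

$$x^{k+1}=x^k-\omega\frac{\mathbf{A}_{i:}x^k-b_i}{\|\mathbf{A}_{i:}\|^2}\mathbf{A}_{i:}^\top. \tag{5}$$

• RBK: Let $\mathbf{B}=\mathbf{I}$ and $\mathbf{S}_k=\mathbf{I}_{:C}$, where the set $C\subseteq[m]$ is chosen independently at each iteration, with probability $p_C\ge0$. In this setup the update rule (3) simplifies to

$$x^{k+1}=x^k-\omega\mathbf{A}_{C:}^\top\left(\mathbf{A}_{C:}\mathbf{A}_{C:}^\top\right)^\dagger\left(\mathbf{A}_{C:}x^k-b_C\right). \tag{6}$$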
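The two special cases can be sketched in a few lines of NumPy. This is an illustration under our own assumptions: a small well-conditioned toy system, probabilities $p_i\propto\|\mathbf{A}_{i:}\|^2$ for RK, uniformly sampled pairs of rows for RBK, $\omega=1$ and $\mathbf{B}=\mathbf{I}$. Both runs are compared against the Euclidean projection of $x^0$ onto the solution set.

```python
import numpy as np

def rk_step(x, A, b, i, omega=1.0):
    """Randomized Kaczmarz update (5) using row i."""
    a = A[i]
    return x - omega * (a @ x - b[i]) / (a @ a) * a

def rbk_step(x, A, b, C, omega=1.0):
    """Randomized block Kaczmarz update (6) using the rows indexed by C."""
    AC = A[C]
    return x - omega * AC.T @ (np.linalg.pinv(AC @ AC.T) @ (AC @ x - b[C]))

rng = np.random.default_rng(2)
m, n = 6, 10
A = np.hstack([np.eye(m), 0.3 * rng.standard_normal((m, n - m))])  # toy, well conditioned
b = A @ rng.standard_normal(n)                     # consistent system
x0 = rng.standard_normal(n)
x_star = x0 - A.T @ np.linalg.pinv(A @ A.T) @ (A @ x0 - b)   # projection of x0 (B = I)

p = np.sum(A**2, axis=1) / np.sum(A**2)            # p_i proportional to ||A_i:||^2
x_rk, x_rbk = x0.copy(), x0.copy()
for _ in range(2000):
    x_rk = rk_step(x_rk, A, b, rng.choice(m, p=p))
    C = rng.choice(m, size=2, replace=False)       # random pair of rows, uniformly
    x_rbk = rbk_step(x_rbk, A, b, C)
print(np.linalg.norm(x_rk - x_star), np.linalg.norm(x_rbk - x_star))
```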

In several papers [26, 73, 48, 50], it was shown that even in the case of consistent linear systems with multiple solutions, Kaczmarz-type methods converge linearly to one particular solution: the projection (in the $\mathbf{B}$-norm) of the initial iterate $x^0$ onto the solution set of the linear system. This naturally leads to the formulation of the best approximation problem:

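$$\min_{x=(x_1,\dots,x_n)\in\mathbb{R}^n}\ \frac12\|x-x^0\|_{\mathbf{B}}^2 \quad\text{subject to}\quad \mathbf{A}x=b. \tag{7}$$

In the rest of this manuscript, $x^*$ denotes the solution of (7) and we write $x^*=\Pi_{\mathcal{L},\mathbf{B}}(x^0)$.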

##### Exactness.

An important assumption that is required for the convergence analysis of the randomized iterative methods under study is exactness. That is:

$$\mathrm{Null}\left(\mathbb{E}_{\mathbf{S}\sim\mathcal{D}}[\mathbf{Z}]\right)=\mathrm{Null}(\mathbf{A}). \tag{8}$$

The exactness property is of key importance in the sketch and project framework, and should be seen as an assumption on the distribution $\mathcal{D}$ and not on the matrix $\mathbf{A}$.

Clearly, an assumption on the distribution $\mathcal{D}$ of the random matrices $\mathbf{S}$ is required for the convergence of Algorithm 1. For instance, if $\mathcal{D}$ is such that $\mathbf{S}=e_i$ with probability 1 for some fixed $i\in[m]$, where $e_i$ is the $i$-th unit coordinate vector in $\mathbb{R}^m$, then the algorithm will select the same row of matrix $\mathbf{A}$ in each step. For this choice of distribution the algorithm clearly will not converge to a solution of the linear system. The exactness assumption guarantees that this will not happen.

For necessary and sufficient conditions for exactness, we refer the reader to [73]. Here it suffices to remark that the exactness condition is very weak, allowing $\mathcal{D}$ to be virtually any reasonable distribution of random matrices. For instance, a sufficient condition for exactness is for the matrix $\mathbb{E}_{\mathbf{S}\sim\mathcal{D}}[\mathbf{H}]$ to be positive definite [26]. This is indeed a weak condition, since this matrix is symmetric and positive semidefinite by design, without the need to invoke any assumptions.

A much stronger condition than exactness is $\mathbb{E}_{\mathbf{S}\sim\mathcal{D}}[\mathbf{Z}]\succ0$, which has been used for the analysis of the sketch and project method in [25]. In this case, the matrix $\mathbf{A}$ of the linear system is required to have full column rank, and as a result the consistent linear system has a unique solution.

The convergence performance of the Sketch and Project method (Algorithm 1) under the exactness assumption for solving the best approximation problem is described by the following theorem.

###### Theorem 1 ([73]).

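Assume exactness and let $\{x^k\}_{k=0}^\infty$ be the iterates produced by the sketch and project method (Algorithm 1) with step-size $\omega\in(0,2)$. Set $x^*=\Pi_{\mathcal{L},\mathbf{B}}(x^0)$. Then,

$$\mathbb{E}\left[\|x^k-x^*\|_{\mathbf{B}}^2\right]\le\rho^k\|x^0-x^*\|_{\mathbf{B}}^2, \tag{9}$$

where

$$\rho:=1-\omega(2-\omega)\lambda_{\min}^+\in[0,1]. \tag{10}$$

Recall that $\lambda_{\min}^+$ denotes the minimum nonzero eigenvalue of matrix $\mathbf{W}$.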

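In other words, using standard arguments (namely $\rho^k\le e^{-k(1-\rho)}$), from Theorem 1 we observe that for a given $\epsilon\in(0,1)$ we have that:

$$k\ge\frac{1}{1-\rho}\log\left(\frac{1}{\epsilon}\right)\ \Rightarrow\ \mathbb{E}\left[\|x^k-x^*\|_{\mathbf{B}}^2\right]\le\epsilon\|x^0-x^*\|_{\mathbf{B}}^2.$$

We say that the iteration complexity of the sketch and project method is

$$O\left(\frac{1}{1-\rho}\log\left(\frac{1}{\epsilon}\right)\right).$$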
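As an illustration of how $\rho$ and the iteration complexity are obtained in practice, the sketch below forms $\mathbf{W}$ for plain RK ($\mathbf{B}=\mathbf{I}$, $\mathbf{S}=e_i$ with the assumed sampling rule $p_i\propto\|\mathbf{A}_{i:}\|^2$) on a random full-column-rank matrix. For this choice $\mathbb{E}[\mathbf{Z}]=\mathbf{A}^\top\mathbf{A}/\|\mathbf{A}\|_F^2$, so $\lambda_{\min}^+=\sigma_{\min}^2(\mathbf{A})/\|\mathbf{A}\|_F^2$. The helper names are hypothetical.

```python
import numpy as np

def expected_Z_rk(A, p):
    """E[Z] for RK (B = I, S = e_i with probability p_i): sum_i p_i a_i a_i^T / ||a_i||^2."""
    return sum(pi * np.outer(a, a) / (a @ a) for pi, a in zip(p, A))

def rate_and_complexity(W, omega, eps, tol=1e-10):
    """lambda_min^+ of W, rho from (10), and k = ceil(log(1/eps) / (1 - rho))."""
    eig = np.linalg.eigvalsh(W)
    lam = eig[eig > tol].min()                     # smallest nonzero eigenvalue
    rho = 1 - omega * (2 - omega) * lam
    k = int(np.ceil(np.log(1 / eps) / (1 - rho)))
    return lam, rho, k

rng = np.random.default_rng(3)
A = rng.standard_normal((5, 3))
p = np.sum(A**2, axis=1) / np.sum(A**2)            # p_i proportional to ||A_i:||^2
W = expected_Z_rk(A, p)                            # with B = I, W = E[Z]
lam, rho, k = rate_and_complexity(W, omega=1.0, eps=1e-6)
print(lam, rho, k)
```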

### 2.2 Other related work

##### On Sketch and Project Methods.

Variants of the sketch-and-project methods have been recently proposed for solving several other problems. [22] and [27] use similar ideas for the development of linearly convergent randomized iterative methods for computing/estimating the inverse and the pseudoinverse of a large matrix, respectively. A limited memory variant of the stochastic block BFGS method for solving the empirical risk minimization problem arising in machine learning was proposed in [23]. Tu et al. [82] utilize the sketch-and-project framework to show that breaking block locality can accelerate block Gauss-Seidel methods. In addition, they develop an accelerated variant of the method for a specific distribution $\mathcal{D}$. A sketch and project method with the heavy ball momentum was studied in [48, 47], and an accelerated (in the sense of Nesterov) variant of the method was proposed in [24] for the more general Euclidean setting and applied to matrix inversion and quasi-Newton updates. Inexact variants of Algorithm 1 have been proposed in [50]. As we have already mentioned, in [73], through the development of stochastic reformulations, a stochastic gradient descent interpretation of the sketch and project method has been proposed. Recently, using a different stochastic reformulation, [21] performed a tight convergence analysis of stochastic gradient descent in a more general convex setting. The analysis proposed in [21] recovers the linear convergence rate of the sketch and project method (Theorem 1) as a special case.

##### Gossip algorithms for average consensus

The problem of average consensus has been extensively studied in the automatic control and signal processing literature for the past two decades [14], and was first introduced for decentralized processing in the seminal work [81]. A clear connection between the rate of convergence and spectral characteristics of the underlying network topology over which message passing occurs was first established in [6] for pairwise randomized gossip algorithms.

Motivated by network topologies with salient properties of wireless networks (e.g., nodes can communicate directly only with other nearby nodes), several methods were proposed to accelerate the convergence of gossip algorithms. For instance, [5] proposed averaging among a set of nodes forming a path in the network (this protocol can be seen as a special case of our block variants in Section 3.4). Broadcast gossip algorithms have also been analyzed [4], where the nodes communicate with more than one of their neighbors by broadcasting their values.

While the gossip algorithms studied in [6, 5, 4] are all first-order (the update of $x^{k+1}$ only depends on $x^k$), a faster randomized pairwise gossip protocol was proposed in [7], which suggested incorporating additional memory to accelerate convergence. The first analysis of this protocol was later proposed in [40] under strong conditions. It is worth mentioning that in the setting of deterministic gossip algorithms, theoretical guarantees for accelerated convergence were obtained in [65, 35]. In Section 4 we propose fast and provably accelerated randomized gossip algorithms with memory and compare them in more detail with the fast randomized algorithm proposed in [7, 40].

##### Gossip algorithms for multiagent consensus optimization.

In the past decade there has been substantial interest in consensus-based multiagent optimization methods that use gossip updates in their update rule [55, 90, 76]. In the multiagent consensus optimization setting, $n$ agents or nodes cooperate to solve an optimization problem. In particular, a local objective function $f_i:\mathbb{R}^d\to\mathbb{R}$ is associated with each node $i\in[n]$ and the goal is for all nodes to solve the optimization problem

$$\min_{x\in\mathbb{R}^d}\ \frac1n\sum_{i=1}^n f_i(x) \tag{11}$$

by communicating only with their neighbors. In this setting, gossip algorithms work in two steps, first executing some local computation and then communicating over the network [55]. Note that the average consensus problem with $c_i$ as the initial value of node $i$ can be cast as a special case of the optimization problem (11) when the local functions are $f_i(x)=\frac12(x-c_i)^2$.
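To see why this choice recovers average consensus (taking $d=1$ and $f_i(x)=\frac12(x-c_i)^2$), set the gradient of the objective in (11) to zero; since the objective is strictly convex, the stationary point is its unique minimizer:

$$\nabla\left(\frac1n\sum_{i=1}^n\frac12(x-c_i)^2\right)=\frac1n\sum_{i=1}^n(x-c_i)=x-\bar c=0\quad\Longrightarrow\quad x^*=\bar c.$$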

Recently there has been an increasing interest in applying multiagent optimization methods to solve convex and non-convex optimization problems arising in machine learning [80, 39, 1, 2, 9, 36, 32]. In this setting, most consensus-based optimization methods make use of standard, first-order gossip, such as those described in [6], and incorporate momentum into their updates to improve their practical performance.

## 3 Sketch and Project Methods as Gossip Algorithms

In this section we show how, by carefully choosing the linear system in the constraints of the best approximation problem (7) and the combination of the parameters of the Sketch and Project method (Algorithm 1), we can design efficient randomized gossip algorithms. We show that the proposed protocols can actually solve the weighted average consensus problem, a more general version of the average consensus problem described in Section 1. In particular, we focus on a scaled variant of the RK method (5) and on the RBK method (6), and study the convergence rates of these methods in the consensus setting, their distributed nature and how they are connected with existing gossip protocols.

### 3.1 Weighted average consensus

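In the weighted average consensus (Weighted AC) problem we are given an undirected connected network $\mathcal{G}=(\mathcal{V},\mathcal{E})$ with node set $\mathcal{V}=\{1,2,\dots,n\}$ and edges $\mathcal{E}$. Each node $i\in\mathcal{V}$ holds a private value $c_i\in\mathbb{R}$ and its weight $w_i>0$. The goal of this problem is for every node to compute the weighted average of the private values,

$$\bar c:=\frac{\sum_{i=1}^n w_ic_i}{\sum_{i=1}^n w_i},$$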

in a distributed fashion. That is, the exchange of information can only occur between connected nodes (neighbors).

Note that in the special case when the weights of all nodes are the same ($w_i=r$ for all $i\in[n]$), the weighted average consensus problem reduces to the standard average consensus problem. However, there are more special cases that could be interesting. For instance, the weights can represent the degrees of the nodes ($w_i=d_i$), or they can denote a probability vector and satisfy $\sum_{i=1}^n w_i=1$ with $w_i>0$.

It can be easily shown that the weighted average consensus problem can be expressed as optimization problem as follows:

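$$\min_{x=(x_1,\dots,x_n)\in\mathbb{R}^n}\ \frac12\|x-c\|_{\mathbf{B}}^2 \quad\text{subject to}\quad x_1=x_2=\cdots=x_n \tag{12}$$

where the matrix $\mathbf{B}=\mathrm{Diag}(w_1,w_2,\dots,w_n)$ is a diagonal positive definite matrix (that is, $w_i>0$ for all $i\in[n]$) and $c=(c_1,c_2,\dots,c_n)^\top$ is the vector with the initial values $c_i$ of all nodes $i\in\mathcal{V}$. The optimal solution of this problem is $x_i^*=\frac{\sum_{i=1}^n w_ic_i}{\sum_{i=1}^n w_i}=\bar c$ for all $i\in[n]$, which is exactly the solution of the weighted average consensus problem.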
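A quick numerical check of this claim, as a minimal sketch under illustrative assumptions: the constraint $x_1=\dots=x_n$ is encoded with the incidence matrix of a path graph (any connected graph would do), the $\mathbf{B}$-projection of $c$ with $\mathbf{B}=\mathrm{Diag}(w)$ is computed with the projection formula from Section 1.3, and the result is compared with $\bar c\mathbf{1}_n$. The random values and weights are made up for illustration.

```python
import numpy as np

n = 5
edges = [(i, i + 1) for i in range(n - 1)]         # path graph (illustrative choice)
Q = np.zeros((len(edges), n))
for e, (i, j) in enumerate(edges):
    Q[e, i], Q[e, j] = 1.0, -1.0                   # Qx = 0  <=>  x_1 = ... = x_n

rng = np.random.default_rng(4)
c = rng.standard_normal(n)                         # private values
w = rng.uniform(0.5, 3.0, size=n)                  # positive weights
B = np.diag(w)

Binv_Qt = np.linalg.solve(B, Q.T)
x_star = c - Binv_Qt @ np.linalg.pinv(Q @ Binv_Qt) @ (Q @ c)   # B-projection of c
c_bar = (w @ c) / w.sum()                          # weighted average
print(x_star, c_bar)
```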

As we have explained, the standard average consensus problem can be cast as a special case of weighted average consensus. However, in the situation when the nodes have access to global information related to the network, such as the size of the network (number of nodes $n$) and the sum of the weights $\sum_{i=1}^n w_i$, any algorithm that solves the standard average consensus problem can be used to solve the weighted average consensus problem, with the initial private values of the nodes changed from $c_i$ to $\frac{nw_ic_i}{\sum_{j=1}^n w_j}$.

The weighted AC problem is popular in the area of distributed cooperative spectrum sensing networks [33, 66, 91, 92]. In this setting, one of the goals is to develop decentralized protocols for solving the cooperative sensing problem in cognitive radio systems. The weights in this case represent a ratio related to the channel conditions of each node/agent [33]. The development of methods for solving the weighted AC problem is an active area of research (see [33] for a recent comparison of existing algorithms). However, to the best of our knowledge, existing analyses of the proposed algorithms focus on showing convergence and not on providing convergence rates. Our framework allows us to obtain novel randomized gossip algorithms for solving the weighted AC problem. In addition, we provide a tight analysis of their convergence rates. In particular, we show convergence with a linear rate. See Section 7.1 for an experiment confirming linear convergence of one of our proposed protocols on typical wireless network topologies.

### 3.2 Gossip algorithms through sketch and project framework

We propose that randomized gossip algorithms should be viewed as special cases of the Sketch and Project update applied to a particular problem of the form (7). In particular, we let $c=(c_1,\dots,c_n)$ be the initial values stored at the nodes of $\mathcal{G}$, and choose $\mathbf{A}$ and $b$ so that the constraint $\mathbf{A}x=b$ is equivalent to the requirement that $x_i=x_j$ (the value stored at node $i$ is equal to the value stored at node $j$) for all $i,j\in\mathcal{V}$.

###### Definition 2.

We say that $\mathbf{A}x=b$ is an “average consensus (AC) system” when $\mathbf{A}x=b$ iff $x_i=x_j$ for all $i,j\in\mathcal{V}$.

It is easy to see that $\mathbf{A}x=b$ is an AC system precisely when $b=0$ and the nullspace of $\mathbf{A}$ is $\{t\mathbf{1}_n : t\in\mathbb{R}\}$, where $\mathbf{1}_n$ is the vector of all ones in $\mathbb{R}^n$. Hence, $\mathbf{A}$ has rank $n-1$. Moreover, in the case that $x^0=c$, it is easy to see that for any AC system the solution of (7) is necessarily $x^*=\bar c\mathbf{1}_n$; this is why we singled out AC systems. In this sense, any algorithm for solving (7) will “find” the (weighted) average $\bar c$. However, in order to obtain a distributed algorithm we need to make sure that only “local” (with respect to $\mathcal{G}$) exchange of information is allowed.

It can be shown that many linear systems satisfy the above definition.

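For example, we can choose $b=0$ and $\mathbf{A}=\mathbf{Q}\in\mathbb{R}^{|\mathcal{E}|\times n}$ to be the incidence matrix of $\mathcal{G}$. That is, $\mathbf{Q}$ is such that $\mathbf{Q}x=0$ directly encodes the constraints $x_i=x_j$ for $(i,j)\in\mathcal{E}$: row $e=(i,j)\in\mathcal{E}$ of matrix $\mathbf{Q}$ contains value $1$ in column $i$, value $-1$ in column $j$ (we use an arbitrary but fixed order of nodes defining each edge in order to fix $\mathbf{Q}$) and zeros elsewhere. A different choice is to pick $b=0$ and $\mathbf{A}=\mathbf{L}=\mathbf{Q}^\top\mathbf{Q}$, where $\mathbf{L}$ is the Laplacian matrix of network $\mathcal{G}$. Depending on which AC system is used, the sketch and project methods can have different interpretations as gossip protocols.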
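The defining property of an AC system ($b=0$ and $\mathrm{Null}(\mathbf{A})=\{t\mathbf{1}_n\}$, hence rank $n-1$) is easy to test numerically. The hypothetical helper `is_ac_system` below is our own sketch, not part of the paper; it suggests that both $\mathbf{Q}$ and $\mathbf{L}=\mathbf{Q}^\top\mathbf{Q}$ of a connected cycle qualify, while the incidence matrix of a disconnected graph does not.

```python
import numpy as np

def incidence(n, edges):
    """Incidence matrix: row (i, j) has +1 in column i and -1 in column j."""
    Q = np.zeros((len(edges), n))
    for e, (i, j) in enumerate(edges):
        Q[e, i], Q[e, j] = 1.0, -1.0
    return Q

def is_ac_system(A, tol=1e-10):
    """For b = 0: True iff Null(A) = span{1_n}, i.e. rank n-1 with a constant null vector."""
    n = A.shape[1]
    _, s, Vt = np.linalg.svd(A)
    if int(np.sum(s > tol)) != n - 1:
        return False
    v = Vt[-1]                                     # unit vector spanning Null(A)
    return bool(np.allclose(np.abs(v), 1 / np.sqrt(n)))

cycle = incidence(4, [(0, 1), (1, 2), (2, 3), (3, 0)])
split = incidence(4, [(0, 1), (2, 3)])             # two disconnected components
print(is_ac_system(cycle), is_ac_system(cycle.T @ cycle), is_ac_system(split))
```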

In this work we mainly focus on the above two AC systems, but we highlight that other choices are possible. (Novel gossip algorithms can be proposed by using different AC systems to formulate the average consensus problem. For example, one possibility is using the random walk normalized Laplacian $\mathbf{L}^{\mathrm{rw}}=\mathbf{D}^{-1}\mathbf{L}$. For the case of degree-regular networks, the symmetric normalized Laplacian matrix $\mathbf{L}^{\mathrm{sym}}=\mathbf{D}^{-1/2}\mathbf{L}\mathbf{D}^{-1/2}$ can also be used.) In Section 4.2, for the provably accelerated gossip protocols, we also use a normalized variant of the incidence matrix.

#### 3.2.1 Standard form and mass preservation

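Assume that $\mathbf{A}x=b$ is an AC system. Note that since $b=0$, the general sketch-and-project update rule (3) simplifies to:

$$x^{k+1}=\left[\mathbf{I}-\omega\mathbf{B}^{-1}\mathbf{Z}_k\right]x^k, \qquad \mathbf{Z}_k:=\mathbf{A}^\top\mathbf{H}_k\mathbf{A}. \tag{13}$$

This is the standard form in which randomized gossip algorithms are written. What is new here is that the iteration matrix $\mathbf{I}-\omega\mathbf{B}^{-1}\mathbf{Z}_k$ has a specific structure which guarantees convergence to $x^*$ under very weak assumptions (see Theorem 1). Note that if $x^0=c$, i.e., the starting primal iterate is the vector of private values (as should be expected from any gossip algorithm), then the iterates of (13) enjoy a mass preservation property (the proof follows from the fact that $\mathbf{A}\mathbf{1}_n=0$):

###### Theorem 3 (Mass preservation).

If $\mathbf{A}x=b$ is an AC system and $x^0=c$, then the iterates produced by (13) satisfy $\mathbf{1}_n^\top\mathbf{B}x^k=\mathbf{1}_n^\top\mathbf{B}c$ for all $k\ge0$. For $\mathbf{B}=\mathrm{Diag}(w_1,\dots,w_n)$ this reads $\sum_{i=1}^n w_ix_i^k=\sum_{i=1}^n w_ic_i$.

###### Proof.

Fix $k\ge0$. Since $\mathbf{A}\mathbf{1}_n=0$ and $\mathbf{Z}_k$ is symmetric, $\mathbf{1}_n^\top\mathbf{Z}_k=(\mathbf{Z}_k\mathbf{1}_n)^\top=(\mathbf{A}^\top\mathbf{H}_k\mathbf{A}\mathbf{1}_n)^\top=0$. Then $\mathbf{1}_n^\top\mathbf{B}x^{k+1}=\mathbf{1}_n^\top\mathbf{B}x^k-\omega\mathbf{1}_n^\top\mathbf{Z}_kx^k=\mathbf{1}_n^\top\mathbf{B}x^k$, and the claim follows by induction from $x^0=c$. ∎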
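The following NumPy sketch illustrates mass preservation numerically (an illustration, not a proof). It runs update (13) with $\mathbf{A}=\mathbf{Q}$ on a cycle graph, $\mathbf{B}=\mathrm{Diag}(w)$, and block sketches $\mathbf{S}_k=\mathbf{I}_{:C}$ that select two random edges per step; the graph, the weights and this choice of $\mathcal{D}$ are our own assumptions. It tracks the largest deviation of $\sum_i w_ix_i^k$ from $\sum_i w_ic_i$ and checks that the iterates reach the weighted average.

```python
import numpy as np

rng = np.random.default_rng(5)
n = 6
edges = [(i, (i + 1) % n) for i in range(n)]       # cycle graph (illustrative choice)
Q = np.zeros((len(edges), n))
for e, (i, j) in enumerate(edges):
    Q[e, i], Q[e, j] = 1.0, -1.0

w = rng.uniform(0.5, 2.0, size=n)                  # node weights; B = Diag(w)
Binv = np.diag(1.0 / w)
c = rng.standard_normal(n)                         # private values, x^0 = c
x, omega, drift = c.copy(), 1.0, 0.0
mass0 = w @ c                                      # 1^T B c

for _ in range(3000):
    C = rng.choice(len(edges), size=2, replace=False)           # random block of two edges
    S = np.eye(len(edges))[:, C]                                # S_k = I_{:C}
    H = S @ np.linalg.pinv(S.T @ Q @ Binv @ Q.T @ S) @ S.T      # H_k from (1)
    Z = Q.T @ H @ Q                                             # Z_k = A^T H_k A
    x = x - omega * Binv @ Z @ x                                # update (13)
    drift = max(drift, abs(w @ x - mass0))

print(drift, x, mass0 / w.sum())
```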

#### 3.2.2 ε-Averaging time

Let $z_k:=\|x^k-x^*\|_{\mathbf{B}}$. The typical measure of convergence speed employed in the randomized gossip literature, called $\varepsilon$-averaging time and here denoted by $T_{ave}(\varepsilon)$, represents the smallest time $k$ for which $x^k$ gets within $\varepsilon z_0$ from $x^*$, with probability greater than $1-\varepsilon$, uniformly over all starting values $x^0=c$. More formally, we define

$$T_{ave}(\varepsilon):=\sup_{c\in\mathbb{R}^n}\inf\left\{k : \mathbb{P}(z_k>\varepsilon z_0)\le\varepsilon\right\}.$$

This definition differs slightly from the standard one in that we use $z_0$ instead of $\|c\|$.

Inequality (9), together with Markov's inequality, can be used to give a bound on $T_{ave}(\varepsilon)$, formalized next:

###### Theorem 4.

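Assume $\mathbf{A}x=b$ is an AC system. Let $x^0=c$ and let $\mathbf{B}$ be a positive definite diagonal matrix. Assume exactness. Then for any $0<\varepsilon<1$ we have

$$T_{ave}(\varepsilon)\le3\frac{\log(1/\varepsilon)}{\log(1/\rho)}\le3\frac{\log(1/\varepsilon)}{1-\rho},$$

where $\rho$ is defined in (10).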

###### Proof.

See Appendix A.1. ∎

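Note that under the assumptions of the above theorem, $\mathbf{W}$ has only a single zero eigenvalue, and hence $\lambda_{\min}^+(\mathbf{W})$ is the second smallest eigenvalue of $\mathbf{W}$. Thus, $1-\lambda_{\min}^+(\mathbf{W})$ is the second largest eigenvalue of $\mathbf{I}-\mathbf{W}$. The bound on $T_{ave}(\varepsilon)$ appearing in Theorem 4 is often written with $\rho$ replaced by this second largest eigenvalue of $\mathbf{I}-\mathbf{W}$ [6] (the two coincide for $\omega=1$).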
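For standard pairwise gossip these spectral quantities can be computed directly. In the sketch below (our own illustration, with a cycle graph and uniform edge sampling as assumptions), RK is applied to $\mathbf{Q}x=0$ with $\mathbf{B}=\mathbf{I}$, so that $\mathbf{W}=\mathbf{L}/(2|\mathcal{E}|)$ and $\lambda_{\min}^+=\alpha(\mathcal{G})/(2|\mathcal{E}|)$. The code checks that $\mathbf{W}$ has a single zero eigenvalue, that $\rho$ equals the second largest eigenvalue of $\mathbf{I}-\mathbf{W}$ when $\omega=1$, and evaluates the bound of Theorem 4.

```python
import numpy as np

def gossip_W(n, edges):
    """W = E[Z] for RK on Qx = 0 with B = I and edges sampled uniformly (1/|E| each)."""
    W = np.zeros((n, n))
    for i, j in edges:
        q = np.zeros(n)
        q[i], q[j] = 1.0, -1.0                     # row Q_{e:} of the incidence matrix
        W += np.outer(q, q) / (q @ q) / len(edges)
    return W

n = 8
edges = [(i, (i + 1) % n) for i in range(n)]       # cycle graph (illustrative choice)
W = gossip_W(n, edges)
eig = np.sort(np.linalg.eigvalsh(W))
lam_plus = eig[1]                                  # smallest nonzero eigenvalue (connected graph)

omega, eps = 1.0, 1e-3
rho = 1 - omega * (2 - omega) * lam_plus           # (10)
T_bound = 3 * np.log(1 / eps) / np.log(1 / rho)    # bound of Theorem 4
print(lam_plus, rho, T_bound)
```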

In the rest of this section we show how two special cases of the sketch and project framework, the randomized Kaczmarz (RK) method and its block variant, the randomized block Kaczmarz (RBK) method, work as gossip algorithms for the two AC systems described above.

### 3.3 Randomized Kaczmarz method as gossip algorithm

As described before, the sketch and project update rule (3) has several parameters that should be chosen in advance by the user. These are the stepsize $\omega$ (relaxation parameter), the positive definite matrix $\mathbf{B}$ and the distribution $\mathcal{D}$ of the random matrices $\mathbf{S}$.

In this section we focus on one particular special case of the sketch and project framework, a scaled/weighted variant of the randomized Kaczmarz method (RK) presented in (5), and we show how this method works as a gossip algorithm when applied to special systems encoding the underlying network. In particular, the linear systems that we solve are the two AC systems described in the previous section, where the matrix $\mathbf{A}$ is either the incidence matrix $\mathbf{Q}$ or the Laplacian matrix $\mathbf{L}$ of the network.

As described in (5), the standard RK method can be cast as a special case of the sketch and project update (3) by choosing $\mathbf{B}=\mathbf{I}$ and $\mathbf{S}_k=e_i$. In this section, we focus on a small modification of this algorithm and choose the positive definite matrix $\mathbf{B}$ to be $\mathrm{Diag}(w_1,w_2,\dots,w_n)$, the diagonal matrix of the weights from the weighted average consensus problem.

##### Scaled RK:

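Let us have a general consistent linear system $\mathbf{A}x=b$ with $\mathbf{A}\in\mathbb{R}^{m\times n}$. Let us also choose $\mathbf{B}=\operatorname{Diag}(w_1,w_2,\dots,w_n)$ and $\mathbf{S}=e_i$, where $i\in[m]$ is chosen in each iteration independently, with probability $p_i>0$. In this setup the update rule (3) simplifies to

$$x^{k+1}=x^k-\omega\,\frac{\mathbf{A}_{i:}x^k-b_i}{\|\mathbf{B}^{-1/2}\mathbf{A}_{i:}^\top\|^2}\,\mathbf{B}^{-1}\mathbf{A}_{i:}^\top.\tag{14}$$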

This small modification of RK allows us to solve the more general weighted average consensus problem presented in Section 3.1 (and at the same time the standard average consensus problem if $\mathbf{B}=r\mathbf{I}$ where $r>0$). Although this variant is a special case of the general sketch and project update, to the best of our knowledge it has never been explicitly presented before in any setting.
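The scaled RK update (14) is easy to prototype. The NumPy sketch below is ours (the names `scaled_rk_step` and `scaled_rk` are hypothetical). It samples row $i$ with probability proportional to $\|\mathbf{B}^{-1/2}\mathbf{A}_{i:}^\top\|^2$, which is the choice (24) made later in this section. Since every step adds a multiple of $\mathbf{B}^{-1}\mathbf{A}_{i:}^\top$, for a consistent system the iterates should approach the $\mathbf{B}$-weighted projection of $x^0$ onto the solution set.

```python
import numpy as np

def scaled_rk_step(x, A, b, w, i, omega=1.0):
    """One scaled RK step (14) on row i, with B = Diag(w)."""
    a = A[i]                        # row A_{i:}
    binv_a = a / w                  # B^{-1} A_{i:}^T (B is diagonal)
    denom = a @ binv_a              # ||B^{-1/2} A_{i:}^T||^2
    return x - omega * (a @ x - b[i]) / denom * binv_a

def scaled_rk(A, b, w, x0, iters=3000, omega=1.0, seed=0):
    """Run (14), sampling row i with probability proportional to ||B^{-1/2} A_{i:}^T||^2."""
    rng = np.random.default_rng(seed)
    row_norms = (A**2 / w).sum(axis=1)   # ||B^{-1/2} A_{i:}^T||^2 for every row
    p = row_norms / row_norms.sum()
    x = np.asarray(x0, dtype=float).copy()
    for _ in range(iters):
        i = rng.choice(len(b), p=p)
        x = scaled_rk_step(x, A, b, w, i, omega)
    return x
```

With `A` set to an incidence or Laplacian matrix and `b` set to zero, the same function runs the gossip protocols derived next.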

#### 3.3.1 AC system with incidence matrix Q

Let us represent the constraints of problem (12) as a linear system with matrix $\mathbf{A}=\mathbf{Q}\in\mathbb{R}^{m\times n}$, the incidence matrix of the graph, and right hand side $b=0$. Let us also assume that the random matrices $\mathbf{S}\sim\mathcal{D}$ are unit coordinate vectors in $\mathbb{R}^m$.

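Let $e=(i,j)\in\mathcal{E}$; then from the definition of matrix $\mathbf{Q}$ we have that $\mathbf{Q}_{e:}^\top=f_i-f_j$, where $f_i,f_j$ are unit coordinate vectors in $\mathbb{R}^n$. In addition, from the definition of the diagonal positive definite matrix $\mathbf{B}$ we have that

$$\|\mathbf{B}^{-1/2}\mathbf{Q}_{e:}^\top\|^2=\|\mathbf{B}^{-1/2}(f_i-f_j)\|^2=\frac{1}{w_i}+\frac{1}{w_j}.\tag{15}$$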

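Thus in this case the update rule (14) simplifies to:

$$
\begin{aligned}
x^{k+1} &\overset{(14),\;b=0,\;\mathbf{A}=\mathbf{Q}}{=} x^k-\omega\,\frac{\mathbf{Q}_{e:}x^k}{\|\mathbf{B}^{-1/2}\mathbf{Q}_{e:}^\top\|^2}\,\mathbf{B}^{-1}\mathbf{Q}_{e:}^\top\\
&\overset{(15)}{=} x^k-\omega\,\frac{\mathbf{Q}_{e:}x^k}{\frac{1}{w_i}+\frac{1}{w_j}}\,\mathbf{B}^{-1}\mathbf{Q}_{e:}^\top\\
&= x^k-\omega\,\frac{x_i^k-x_j^k}{\frac{1}{w_i}+\frac{1}{w_j}}\left(\frac{1}{w_i}f_i-\frac{1}{w_j}f_j\right).
\end{aligned}\tag{16}
$$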

From (16) it can easily be seen that only coordinates $i$ and $j$ update their values. These coordinates correspond to the private values $x_i^k$ and $x_j^k$ of the nodes of the selected edge $e=(i,j)$. In particular, the values of $x_i^k$ and $x_j^k$ are updated as follows:

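$$x_i^{k+1}=\left(1-\omega\frac{w_j}{w_j+w_i}\right)x_i^k+\omega\frac{w_j}{w_j+w_i}x_j^k\quad\text{and}\quad x_j^{k+1}=\omega\frac{w_i}{w_j+w_i}x_i^k+\left(1-\omega\frac{w_i}{w_j+w_i}\right)x_j^k.\tag{17}$$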
###### Remark 1.

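In the special case that $\mathbf{B}=r\mathbf{I}$ where $r>0$ (we solve the standard average consensus problem), the update of the two nodes simplifies to

$$x_i^{k+1}=\left(1-\frac{\omega}{2}\right)x_i^k+\frac{\omega}{2}x_j^k\quad\text{and}\quad x_j^{k+1}=\frac{\omega}{2}x_i^k+\left(1-\frac{\omega}{2}\right)x_j^k.$$

If we further select $\omega=1$, then this becomes:

$$x_i^{k+1}=x_j^{k+1}=\frac{x_i^k+x_j^k}{2},\tag{18}$$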

which is the update of the standard pairwise randomized gossip algorithm first presented and analyzed in [6].
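The weighted pairwise update (17) can be simulated in a few lines of plain Python. In this sketch (our own; the name `weighted_pairwise_gossip` is hypothetical) edges are drawn uniformly at random, which is an assumption that coincides with (24) only when $\mathbf{B}=r\mathbf{I}$. Each step touches only the two endpoints of the sampled edge and preserves the weighted sum $\sum_i w_i x_i$, so on a connected graph the values should approach the weighted average $\sum_i w_ic_i/\sum_i w_i$; with unit weights and $\omega=1$ a single step is exactly (18).

```python
import random

def weighted_pairwise_gossip(values, weights, edges, iters=10_000, omega=1.0, seed=0):
    """Randomized pairwise gossip built from update (17).

    values  : initial private values c_1, ..., c_n
    weights : positive weights w_1, ..., w_n (the diagonal of B)
    edges   : list of (i, j) pairs; one edge is drawn uniformly at random per step
    """
    rng = random.Random(seed)
    x = [float(v) for v in values]
    for _ in range(iters):
        i, j = rng.choice(edges)
        wi, wj = weights[i], weights[j]
        xi, xj = x[i], x[j]
        # Update (17): only the two endpoints of the sampled edge change.
        x[i] = (1 - omega * wj / (wj + wi)) * xi + omega * wj / (wj + wi) * xj
        x[j] = omega * wi / (wj + wi) * xi + (1 - omega * wi / (wj + wi)) * xj
    return x

# Example: a 5-node cycle with unequal weights; the target is sum(w*c)/sum(w) = 3.125.
cycle = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 0)]
print(weighted_pairwise_gossip([1, 2, 3, 4, 5], [1, 2, 1, 3, 1], cycle))
```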

#### 3.3.2 AC system with Laplacian matrix L

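The AC system takes the form $\mathbf{L}x=0$, where $\mathbf{L}\in\mathbb{R}^{n\times n}$ is the Laplacian matrix of the network. In this case, each row of the matrix corresponds to a node. Using the definition of the Laplacian, we have that $\mathbf{L}_{i:}^\top=d_if_i-\sum_{j\in\mathcal{N}_i}f_j$, where $f_i,f_j$ are unit coordinate vectors in $\mathbb{R}^n$ and $d_i$ is the degree of node $i$.

Thus, by letting $\mathbf{B}=\operatorname{Diag}(w_1,\dots,w_n)$ be the diagonal matrix of the weights, we obtain:

$$\|\mathbf{B}^{-1/2}\mathbf{L}_{i:}^\top\|^2=\Big\|\mathbf{B}^{-1/2}\Big(d_if_i-\sum_{j\in\mathcal{N}_i}f_j\Big)\Big\|^2=\frac{d_i^2}{w_i}+\sum_{j\in\mathcal{N}_i}\frac{1}{w_j}.\tag{19}$$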

In this case, the update rule (14) simplifies to:

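$$
\begin{aligned}
x^{k+1} &\overset{(14),\;b=0,\;\mathbf{A}=\mathbf{L}}{=} x^k-\omega\,\frac{\mathbf{L}_{i:}x^k}{\|\mathbf{B}^{-1/2}\mathbf{L}_{i:}^\top\|^2}\,\mathbf{B}^{-1}\mathbf{L}_{i:}^\top\\
&\overset{(19)}{=} x^k-\omega\,\frac{\mathbf{L}_{i:}x^k}{\frac{d_i^2}{w_i}+\sum_{j\in\mathcal{N}_i}\frac{1}{w_j}}\,\mathbf{B}^{-1}\mathbf{L}_{i:}^\top\\
&= x^k-\omega\,\frac{d_ix_i^k-\sum_{j\in\mathcal{N}_i}x_j^k}{\frac{d_i^2}{w_i}+\sum_{j\in\mathcal{N}_i}\frac{1}{w_j}}\left(\frac{d_i}{w_i}f_i-\sum_{j\in\mathcal{N}_i}\frac{1}{w_j}f_j\right).
\end{aligned}\tag{20}
$$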

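From (20), it is clear that only coordinates $\{i\}\cup\mathcal{N}_i$ update their values. All the other coordinates remain unchanged. In particular, the value of the selected node (coordinate $i$) is updated as follows:

$$x_i^{k+1}=x_i^k-\omega\,\frac{d_ix_i^k-\sum_{j\in\mathcal{N}_i}x_j^k}{\frac{d_i^2}{w_i}+\sum_{j\in\mathcal{N}_i}\frac{1}{w_j}}\,\frac{d_i}{w_i},\tag{21}$$

while the values of its neighbors $j\in\mathcal{N}_i$ are updated as:

$$x_j^{k+1}=x_j^k+\omega\,\frac{d_ix_i^k-\sum_{\ell\in\mathcal{N}_i}x_\ell^k}{\frac{d_i^2}{w_i}+\sum_{\ell\in\mathcal{N}_i}\frac{1}{w_\ell}}\,\frac{1}{w_j}.\tag{22}$$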
###### Remark 2.

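Let $\omega=1$ and $\mathbf{B}=r\mathbf{I}$ where $r>0$; then the selected nodes update their values as follows:

$$x_i^{k+1}=\frac{\sum_{\ell\in\{i\}\cup\mathcal{N}_i}x_\ell^k}{d_i+1}\quad\text{and}\quad x_j^{k+1}=x_j^k+\frac{d_ix_i^k-\sum_{\ell\in\mathcal{N}_i}x_\ell^k}{d_i^2+d_i}.\tag{23}$$

That is, the selected node $i$ updates its value to the average of its neighbors and itself, while all the nodes $j\in\mathcal{N}_i$ update their values using the current value of node $i$ and of all nodes in $\mathcal{N}_i$.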

In a wireless network, to implement such an update, node $i$ would first broadcast its current value to all of its neighbors. Then it would need to receive values from each neighbor to compute the sums over $\mathcal{N}_i$, after which node $i$ would broadcast the sum to all neighbors (since there may be two neighbors $j_1,j_2\in\mathcal{N}_i$ for which $(j_1,j_2)\notin\mathcal{E}$). In a wired network, using standard concepts from the MPI library, such an update rule could be implemented efficiently by defining a process group consisting of $\{i\}\cup\mathcal{N}_i$, and performing one Broadcast in this group from $i$ (containing $x_i$) followed by an AllReduce to sum over $\mathcal{N}_i$. Note that the terms involving diagonal entries of $\mathbf{B}$ and the degrees $d_i$ could be sent once, cached, and reused throughout the algorithm execution to reduce communication overhead.
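The single-node Laplacian update (21) and (22) can be simulated sequentially in plain Python, without any message passing. In this sketch the names `laplacian_gossip_step` and `laplacian_gossip` are hypothetical, the graph is stored as an adjacency dictionary, every node is assumed to have at least one neighbor, and the activated node is drawn uniformly at random (an assumption, not the distribution (24)). With $\omega=1$ and unit weights one step reproduces Remark 2, and the weighted sum $\sum_i w_ix_i$ is preserved, so on a connected graph the values should approach the weighted average.

```python
import random

def laplacian_gossip_step(x, neighbors, weights, i, omega=1.0):
    """Updates (21)-(22): node i and its neighbors change, every other node is untouched."""
    Ni = neighbors[i]
    di = len(Ni)
    residual = di * x[i] - sum(x[j] for j in Ni)                    # L_{i:} x^k
    denom = di**2 / weights[i] + sum(1.0 / weights[j] for j in Ni)  # ||B^{-1/2} L_{i:}^T||^2, eq. (19)
    step = omega * residual / denom
    new = list(x)
    new[i] = x[i] - step * di / weights[i]        # update (21)
    for j in Ni:
        new[j] = x[j] + step / weights[j]         # update (22)
    return new

def laplacian_gossip(values, neighbors, weights, iters=20_000, omega=1.0, seed=0):
    """Repeat the step above, activating one node chosen uniformly at random per iteration."""
    rng = random.Random(seed)
    nodes = list(neighbors)
    x = [float(v) for v in values]
    for _ in range(iters):
        x = laplacian_gossip_step(x, neighbors, weights, rng.choice(nodes), omega)
    return x

# Remark 2 on a star: the center becomes (0 + 3 + 6 + 9) / 4, each leaf moves by -18 / 12.
star = {0: [1, 2, 3], 1: [0], 2: [0], 3: [0]}
print(laplacian_gossip_step([0.0, 3.0, 6.0, 9.0], star, [1.0] * 4, i=0))
```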

#### 3.3.3 Details on complexity results

Recall that the convergence rate of the sketch and project method (Algorithm 1) is given by:

$$\rho:=1-\omega(2-\omega)\lambda_{\min}^+(\mathbf{W}),$$

where $\omega\in(0,2)$ and $\mathbf{W}=\mathbf{B}^{-1/2}\mathbf{A}^\top\mathbb{E}[\mathbf{H}]\mathbf{A}\mathbf{B}^{-1/2}$ (from Theorem 1). In this subsection we explain how the convergence rate of the scaled RK method (14) changes for different choices of the main parameters of the method.

Let us choose $\omega=1$ (no over-relaxation). In this case, the rate simplifies to $\rho=1-\lambda_{\min}^+(\mathbf{W})$.

Note that the different ways of modeling the problem (AC system) and the selection of the main parameters (weight matrix $\mathbf{B}$ and distribution $\mathcal{D}$) determine the convergence rate of the method through the spectrum of matrix $\mathbf{W}$.

Recall that in each iteration of the scaled RK method (14) a random vector $e_i$ is chosen with probability $p_i>0$. For convenience, let us choose⁴:

$$p_i=\frac{\|\mathbf{B}^{-1/2}\mathbf{A}_{i:}^\top\|^2}{\|\mathbf{B}^{-1/2}\mathbf{A}^\top\|_F^2}.\tag{24}$$

⁴ Similar probabilities have been chosen in [25] for the convergence of the standard RK method ($\mathbf{B}=\mathbf{I}$). The distribution of the matrices $\mathbf{S}$ used in equation (24) is common in the area of randomized iterative methods for linear systems and is used to simplify the analysis and the expressions of the convergence rates. For more choices of distributions we refer the interested reader to [25]. It is worth mentioning that the probability distribution that optimizes the convergence rate of RK and other projection methods can be expressed as the solution to a convex semidefinite program [25, 11].

Then we have that:

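$$
\begin{aligned}
\mathbb{E}[\mathbf{H}] &= \mathbb{E}\left[\mathbf{S}(\mathbf{S}^\top\mathbf{A}\mathbf{B}^{-1}\mathbf{A}^\top\mathbf{S})^\dagger\mathbf{S}^\top\right]\\
&= \sum_{i=1}^m p_i\,\frac{e_ie_i^\top}{e_i^\top\mathbf{A}\mathbf{B}^{-1}\mathbf{A}^\top e_i}
= \sum_{i=1}^m p_i\,\frac{e_ie_i^\top}{\|\mathbf{A}_{i:}^\top\|_{\mathbf{B}^{-1}}^2}
= \sum_{i=1}^m p_i\,\frac{e_ie_i^\top}{\|\mathbf{B}^{-1/2}\mathbf{A}_{i:}^\top\|^2}
\overset{(24)}{=} \frac{\mathbf{I}_m}{\|\mathbf{B}^{-1/2}\mathbf{A}^\top\|_F^2}
\end{aligned}\tag{25}
$$

and

$$\mathbf{W}=\mathbf{B}^{-1/2}\mathbf{A}^\top\mathbb{E}[\mathbf{H}]\mathbf{A}\mathbf{B}^{-1/2}\overset{(25)}{=}\frac{\mathbf{B}^{-1/2}\mathbf{A}^\top\mathbf{A}\mathbf{B}^{-1/2}}{\|\mathbf{B}^{-1/2}\mathbf{A}^\top\|_F^2}.\tag{26}$$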
##### Incidence Matrix:

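Let us choose the AC system to be the one with the incidence matrix $\mathbf{A}=\mathbf{Q}$. Then $\|\mathbf{B}^{-1/2}\mathbf{Q}^\top\|_F^2=\sum_{(i,j)\in\mathcal{E}}\left(\frac{1}{w_i}+\frac{1}{w_j}\right)=\sum_{i=1}^n\frac{d_i}{w_i}$ and, since $\mathbf{Q}^\top\mathbf{Q}=\mathbf{L}$, we obtain

$$\rho=1-\frac{\lambda_{\min}^+(\mathbf{B}^{-1/2}\mathbf{L}\mathbf{B}^{-1/2})}{\sum_{i=1}^n d_i/w_i}.$$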

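If we further have $\mathbf{B}=\mathbf{D}$ (the degree matrix), then $\|\mathbf{B}^{-1/2}\mathbf{Q}^\top\|_F^2=n$ and the convergence rate simplifies to:

$$\rho=1-\frac{\lambda_{\min}^+(\mathbf{D}^{-1/2}\mathbf{L}\mathbf{D}^{-1/2})}{n}=1-\frac{\lambda_{\min}^+(\mathbf{L}_{sym})}{n}.$$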

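If $\mathbf{B}=r\mathbf{I}$ where $r>0$ (we solve the standard average consensus problem), then $\|\mathbf{B}^{-1/2}\mathbf{Q}^\top\|_F^2=2m/r$ and the convergence rate simplifies to

$$\rho=1-\frac{\lambda_{\min}^+(\mathbf{L})}{2m}=1-\frac{\alpha(\mathcal{G})}{2m}.\tag{27}$$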

The convergence rate (27) is identical to the rate proposed for the convergence of the standard pairwise gossip algorithm in [6]. Recall that in this special case the proposed gossip protocol has exactly the same update rule as the algorithm presented in [6]; see equation (18).

##### Laplacian Matrix:

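If we choose to formulate the AC system using the Laplacian matrix $\mathbf{L}$, that is $\mathbf{A}=\mathbf{L}$, then $\|\mathbf{B}^{-1/2}\mathbf{L}^\top\|_F^2=\sum_{i=1}^n\left(\frac{d_i^2}{w_i}+\sum_{j\in\mathcal{N}_i}\frac{1}{w_j}\right)=\sum_{i=1}^n\frac{d_i(d_i+1)}{w_i}$ and we have:

$$\rho=1-\frac{\lambda_{\min}^+(\mathbf{B}^{-1/2}\mathbf{L}^\top\mathbf{L}\mathbf{B}^{-1/2})}{\sum_{i=1}^n d_i(d_i+1)/w_i}.$$

If $\mathbf{B}=\mathbf{D}$, then the convergence rate simplifies to:

$$\rho=1-\frac{\lambda_{\min}^+(\mathbf{D}^{-1/2}\mathbf{L}^\top\mathbf{L}\mathbf{D}^{-1/2})}{\sum_{i=1}^n(d_i+1)}=1-\frac{\lambda_{\min}^+(\mathbf{D}^{-1/2}\mathbf{L}^2\mathbf{D}^{-1/2})}{n+\sum_{i=1}^n d_i}\overset{\sum_{i=1}^n d_i=2m}{=}1-\frac{\lambda_{\min}^+(\mathbf{D}^{-1/2}\mathbf{L}^2\mathbf{D}^{-1/2})}{n+2m}.$$

If $\mathbf{B}=r\mathbf{I}$, where $r>0$, then $\|\mathbf{B}^{-1/2}\mathbf{L}^\top\|_F^2=\frac{1}{r}\sum_{i=1}^n d_i(d_i+1)$ and the convergence rate simplifies to

$$\rho=1-\frac{\lambda_{\min}^+(\mathbf{L}^2)}{\sum_{i=1}^n d_i(d_i+1)}=1-\frac{\alpha(\mathcal{G})^2}{\sum_{i=1}^n d_i(d_i+1)}.$$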
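The closed-form rates of this subsection can be checked numerically by forming $\mathbf{W}$ from (26) directly. The NumPy sketch below is ours (the helper names `incidence_matrix` and `rk_rate`, and the example graph, a 4-cycle with one chord, are illustrative choices). It takes $\omega=1$ and the probabilities (24), and compares $1-\lambda_{\min}^+(\mathbf{W})$ with the expressions above for the incidence and the Laplacian AC systems with $\mathbf{B}=\mathbf{I}$.

```python
import numpy as np

def incidence_matrix(n, edges):
    """Q with one row per edge (i, j), equal to (f_i - f_j)^T."""
    Q = np.zeros((len(edges), n))
    for e, (i, j) in enumerate(edges):
        Q[e, i], Q[e, j] = 1.0, -1.0
    return Q

def rk_rate(A, w, tol=1e-10):
    """rho = 1 - lambda_min^+(W), with W from (26), B = Diag(w) and omega = 1."""
    AB = A / np.sqrt(w)                                # A B^{-1/2}: column j scaled by w_j^{-1/2}
    W = AB.T @ AB / np.linalg.norm(AB, "fro") ** 2     # B^{-1/2} A^T A B^{-1/2} / ||B^{-1/2} A^T||_F^2
    eigs = np.linalg.eigvalsh(W)
    return 1.0 - eigs[eigs > tol].min()                # smallest nonzero eigenvalue

n, edges = 4, [(0, 1), (1, 2), (2, 3), (3, 0), (0, 2)]
m = len(edges)
Q = incidence_matrix(n, edges)
L = Q.T @ Q                                            # graph Laplacian
d = np.diag(L)                                         # node degrees
alpha = np.linalg.eigvalsh(L)[1]                       # algebraic connectivity alpha(G)

print(rk_rate(Q, np.ones(n)), 1 - alpha / (2 * m))                 # incidence system, rate (27)
print(rk_rate(L, np.ones(n)), 1 - alpha**2 / np.sum(d * (d + 1)))  # Laplacian system, B = I
```

Passing `w = d` instead of `np.ones(n)` gives the $\mathbf{B}=\mathbf{D}$ cases.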

### 3.4 Block gossip algorithms

Up to this point we focused on the basic connections between the convergence analysis of the sketch and project methods and the literature of randomized gossip algorithms. We showed how specific variants of the randomized Kaczmarz method (RK) can be interpreted as gossip algorithms for solving the weighted and standard average consensus problems.

In this part we extend the previously described methods to their block variants, related to the randomized block Kaczmarz (RBK) method (6). In particular, in each step of the sketch and project method (3), the random matrix $\mathbf{S}$ is selected to be a random column submatrix of the $m\times m$ identity matrix corresponding to columns indexed by a random subset $\mathcal{C}\subseteq[m]$. That is, $\mathbf{S}=\mathbf{I}_{:\mathcal{C}}$, where a set $\mathcal{C}\subseteq[m]$ is chosen in each iteration independently, with probability $p_{\mathcal{C}}\ge 0$ (see equation (6)). Note that in the special case that the set $\mathcal{C}$ is a singleton with probability 1, the algorithm is simply the randomized Kaczmarz method of the previous section.

To keep things simple, we assume that $\mathbf{B}=\mathbf{I}$ (standard average consensus, without weights) and choose the stepsize $\omega=1$. In the next section, we will describe gossip algorithms with heavy ball momentum and explain in detail how the gossip interpretation of RBK changes in the more general case of $\omega\in(0,2)$.

Similar to the previous subsections, we formulate the consensus problem using either $\mathbf{A}=\mathbf{Q}$ or $\mathbf{A}=\mathbf{L}$ as the matrix in the AC system. In this setup, the iterative process (3) takes the form:

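$$x^{k+1}=x^k-\mathbf{A}^\top\mathbf{I}_{:\mathcal{C}}\left(\mathbf{I}_{:\mathcal{C}}^\top\mathbf{A}\mathbf{A}^\top\mathbf{I}_{:\mathcal{C}}\right)^\dagger\mathbf{I}_{:\mathcal{C}}^\top\mathbf{A}x^k=x^k-\mathbf{A}_{\mathcal{C}:}^\top\left(\mathbf{A}_{\mathcal{C}:}\mathbf{A}_{\mathcal{C}:}^\top\right)^\dagger\mathbf{A}_{\mathcal{C}:}x^k,\tag{28}$$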

which, as explained in the introduction, can be equivalently written as:

$$x^{k+1}=\arg\min_{x\in\mathbb{R}^n}\left\{\|x-x^k\|^2:\ \mathbf{I}_{:\mathcal{C}}^\top\mathbf{A}x=0\right\}.\tag{29}$$

Essentially, in each step of this method the next iterate is the projection of the current iterate onto the solution set of a row subsystem of $\mathbf{A}x=0$.

##### AC system with Incidence Matrix:

In the case that $\mathbf{A}=\mathbf{Q}$, the selected rows correspond to a random subset $\mathcal{C}\subseteq\mathcal{E}$ of selected edges. While (28) (resp. (29)) may seem to be a complicated algebraic (resp. variational) characterization of the method, due to our choice of $\mathbf{A}$ we have the following result, which gives a natural interpretation of RBK as a gossip algorithm (see also Figure 1).

###### Theorem 5 (RBK as Gossip algorithm: RBKG).

Consider the AC system with the constraints being expressed using the incidence matrix $\mathbf{Q}$. Then each iteration of RBK (Algorithm (28)) works as a gossip algorithm as follows:

1. Select a random set of edges $\mathcal{C}\subseteq\mathcal{E}$,

2. Form the subgraph $\mathcal{G}_k$ of $\mathcal{G}$ from the selected edges,

3. For each connected component of $\mathcal{G}_k$, replace node values with their average.

###### Proof.

See Appendix A.2. ∎
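Theorem 5 suggests a purely combinatorial implementation of one RBK step: group the nodes into the connected components of the selected-edge subgraph and average within each component. The sketch below is ours (it uses a small union-find; the names `rbkg_step` and `rbk_projection_step` are hypothetical). It implements this rule and, for comparison, the algebraic update (28) with $\mathbf{A}=\mathbf{Q}$, $\mathbf{B}=\mathbf{I}$ and $\omega=1$; the two should agree up to rounding.

```python
import numpy as np

def rbkg_step(x, selected_edges):
    """One RBKG iteration (Theorem 5): average within each connected component
    of the subgraph formed by the selected edges; untouched nodes stay fixed."""
    n = len(x)
    parent = list(range(n))                      # union-find forest over the nodes

    def find(u):
        while parent[u] != u:
            parent[u] = parent[parent[u]]        # path halving
            u = parent[u]
        return u

    for i, j in selected_edges:
        parent[find(i)] = find(j)                # merge the components of the two endpoints
    components = {}
    for v in range(n):
        components.setdefault(find(v), []).append(v)
    x_new = np.array(x, dtype=float)
    for members in components.values():
        x_new[members] = x_new[members].mean()   # singletons are left unchanged
    return x_new

def rbk_projection_step(x, selected_edges):
    """Update (28) with A = Q, B = I, omega = 1: x - Q_C^T (Q_C Q_C^T)^+ Q_C x."""
    x = np.asarray(x, dtype=float)
    Q_C = np.zeros((len(selected_edges), len(x)))
    for r, (i, j) in enumerate(selected_edges):
        Q_C[r, i], Q_C[r, j] = 1.0, -1.0
    return x - Q_C.T @ np.linalg.pinv(Q_C @ Q_C.T) @ (Q_C @ x)

x = [1.0, 2.0, 6.0, 3.0, 5.0, 10.0]
C = [(0, 1), (1, 2), (3, 4)]       # components {0, 1, 2}, {3, 4} and the singleton {5}
print(rbkg_step(x, C))
print(rbk_projection_step(x, C))
```

Passing the edges of a path as `selected_edges` gives one step of path averaging [5], and a single edge gives the pairwise update (18).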

Using the convergence result of the general Theorem 1 and the form of matrix $\mathbf{W}$ (recall that in this case we assume $\mathbf{B}=\mathbf{I}$, $\omega=1$ and $\mathbf{A}=\mathbf{Q}$), we obtain the following complexity for the algorithm:

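$$\mathbb{E}\left[\|x^k-x^*\|^2\right]\le\left[1-\lambda_{\min}^+\left(\mathbb{E}\left[\mathbf{Q}_{\mathcal{C}:}^\top\left(\mathbf{Q}_{\mathcal{C}:}\mathbf{Q}_{\mathcal{C}:}^\top\right)^\dagger\mathbf{Q}_{\mathcal{C}:}\right]\right)\right]^k\|x^0-x^*\|^2.\tag{30}$$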

For more details on the convergence rate of the randomized block Kaczmarz method, including meaningful bounds on the rate in a more general setting, we refer the reader to [57, 58].

There is a very close relationship between the gossip interpretation of RBK explained in Theorem 5 and several existing randomized gossip algorithms that update the values of more than two nodes in each step. For example, the path averaging algorithm proposed in [5] is a special case of RBK in which the set $\mathcal{C}$ is restricted to correspond to a path of vertices. That is, in path averaging, in each iteration a path of nodes is selected and the nodes that belong to it update their values to their exact average. A different example is the recently proposed clique gossiping [44], where the network is already divided into cliques; through a random procedure a clique is activated and its nodes update their values to their exact average. In [6] a synchronous variant of the gossip algorithm is presented, where in each step multiple node pairs communicate at exactly the same time, with the restriction that these simultaneously active node pairs are disjoint.

It is easy to see that all of the above algorithms can be cast as special cases of RBK if the distribution $\mathcal{D}$ of the random matrices $\mathbf{S}$ is chosen carefully to be over random matrices (column submatrices of the identity) that select specific sets of edges in each iteration. As a result, our general convergence analysis can recover the complexity results proposed in the above works.

Finally, as we mentioned, in the special case in which the set $\mathcal{C}$ is always a singleton, Algorithm (28) reduces to the standard randomized Kaczmarz method. This means that only a random edge is selected in each iteration and the nodes incident with this edge replace their local values with their average. This is the pairwise gossip algorithm of Boyd et al. [6] presented in equation (18). Theorem 5 extends this interpretation to the case of the RBK method.

##### AC system with Laplacian Matrix:

For this choice of AC system the update is more complicated. To simplify the gossip interpretation of the block variant, we make an extra assumption: the selected rows of the constraint in update (29) have nonzero elements at different coordinates. This allows a direct extension of the serial variant presented in Remark 2. Thus, in this setup, the RBK update rule (28) works as a gossip algorithm as follows:

1. $|\mathcal{C}|$ nodes are activated (with the restriction that no two of them are neighbors and no two of them share a common neighbor),

2. For each activated node $i\in\mathcal{C}$ and each of its neighbors $j\in\mathcal{N}_i$ we have the following update:

$$x_i^{k+1}=\frac{\sum_{\ell\in\{i\}\cup\mathcal{N}_i}x_\ell^k}{d_i+1}\quad\text{and}\quad x_j^{k+1}=x_j^k+\frac{d_ix_i^k-\sum_{\ell\in\mathcal{N}_i}x_\ell^k}{d_i^2+d_i}.\tag{31}$$

The above update rule can be seen as a parallel variant of update (23). Similar to the case of the incidence matrix, RBK for solving the AC system with a Laplacian matrix converges to $x^*$ with the following rate (using the result of Theorem 1):

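$$\mathbb{E}\left[\|x^k-x^*\|^2\right]\le\left[1-\lambda_{\min}^+\left(\mathbb{E}\left[\mathbf{L}_{\mathcal{C}:}^\top\left(\mathbf{L}_{\mathcal{C}:}\mathbf{L}_{\mathcal{C}:}^\top\right)^\dagger\mathbf{L}_{\mathcal{C}:}\right]\right)\right]^k\|x^0-x^*\|^2.$$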
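Under the disjoint-support assumption the activated rows of $\mathbf{L}$ are orthogonal, so the projection (28) with $\mathbf{A}=\mathbf{L}$ splits into independent copies of (23). The NumPy sketch below illustrates this. The helper names are hypothetical, $\mathbf{B}=\mathbf{I}$ and $\omega=1$ as in this subsection, and every activated node is assumed to have at least one neighbor. It applies (31) and compares the result with the algebraic update (28) on a path graph.

```python
import numpy as np

def laplacian(n, edges):
    """Graph Laplacian (degree matrix minus adjacency) for an undirected edge list."""
    L = np.zeros((n, n))
    for i, j in edges:
        L[i, i] += 1.0
        L[j, j] += 1.0
        L[i, j] -= 1.0
        L[j, i] -= 1.0
    return L

def block_laplacian_gossip(x, L, active):
    """Parallel update (31) for activated nodes whose sets {i} and N_i are pairwise disjoint."""
    x = np.asarray(x, dtype=float)
    x_new = x.copy()
    for i in active:
        Ni = [j for j in np.flatnonzero(L[i]) if j != i]    # neighbors of node i
        di = len(Ni)
        s = x[Ni].sum()
        x_new[i] = (x[i] + s) / (di + 1)                    # average of i and its neighbors
        x_new[Ni] = x[Ni] + (di * x[i] - s) / (di**2 + di)  # neighbor part of (31)
    return x_new

def rbk_laplacian_step(x, L, active):
    """Update (28) with A = L, B = I, omega = 1, using the rows of L indexed by `active`."""
    x = np.asarray(x, dtype=float)
    L_C = L[list(active)]
    return x - L_C.T @ np.linalg.pinv(L_C @ L_C.T) @ (L_C @ x)

# Path 0-1-...-6: nodes 1 and 5 are not neighbors and share no neighbor.
L = laplacian(7, [(k, k + 1) for k in range(6)])
x = np.arange(1.0, 8.0) ** 2
print(block_laplacian_gossip(x, L, [1, 5]))
print(rbk_laplacian_step(x, L, [1, 5]))
```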

## 4 Faster and Provably Accelerated Randomized Gossip Algorithms

The main goal in the design of gossip protocols is for the computation and communication to be done as quickly and efficiently as possible. In this section, our focus is precisely this. We design randomized gossip protocols which converge to consensus fast with provable accelerated linear rates. To the best of our knowledge, the proposed protocols are the first randomized gossip algorithms that converge to consensus with an accelerated linear rate.

In particular, we present novel protocols for solving the average consensus problem where in each step all nodes of the network update their values but only a subset of them exchange their private values. The protocols are inspired by the recently developed accelerated variants of randomized Kaczmarz-type methods for solving consistent linear systems, where the addition of momentum terms on top of the sketch and project update rule provides better theoretical and practical performance.

In the area of optimization algorithms, there are two popular ways to accelerate an algorithm using momentum. The first one is Polyak's heavy ball momentum [67] and the second one is the theoretically much better understood momentum introduced by Nesterov [59, 61]. Both momentum approaches have recently been proposed and analyzed to improve the performance of randomized iterative methods for solving linear systems.

To simplify the presentation, the accelerated algorithms and their convergence rates are presented for solving the standard average consensus problem (