Push-sum on random graphs
Abstract
In this paper, we study the problem of achieving average consensus over a random time-varying sequence of directed graphs by extending the class of so-called push-sum algorithms to such random scenarios. Provided that an ergodicity notion, which we term the directed infinite flow property, holds and the auxiliary states of agents are uniformly bounded away from zero infinitely often, we prove the almost sure convergence of the evolutions of this class of algorithms to the average of initial states. Moreover, for a random sequence of graphs generated using a time-varying irreducible probability matrix, we establish convergence rates for the proposed push-sum algorithm.
1 Introduction
Many distributed algorithms, executed with limited information over a network of agents, rely on estimating the average value of the initial state of the individual agents. These include distributed optimization protocols [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11], distributed regret minimization algorithms in machine learning [12], and dynamics for fusion of information in sensor networks [13]. There is a large body of work devoted to the average consensus problem, starting with the pioneering work [14], where the so-called push-sum algorithm is first introduced. The key differentiating factor of the push-sum algorithm from consensus dynamics is that it takes advantage of a parallel scalar-valued agreement dynamics, initiated uniformly across the agents, that tracks the imbalances of the network and adjusts for them when estimating the consensus value.
In addition to the earlier work [14], several recent papers have studied the problem of average consensus; see, for example, [15], where other classes of algorithms based on weight adaptation are considered, ensuring convergence to the average on fixed directed graphs. The study of convergence properties of push-sum algorithms on time-varying deterministic sequences of directed graphs, to the best of our knowledge, was initiated in [16] and extended in [11], where push-sum protocols are intricately utilized to prove the convergence of a class of distributed optimization protocols on a sequence of time-varying directed graphs. The key assumption in [11] is the $B$-connectedness of the sequence, which means that in any window of size $B$ the union of the underlying directed graphs over time is strongly connected. As we demonstrate, a byproduct of our work in deterministic settings is the generalization of the sequences on which the convergence of the push-sum algorithms is valid to the ones which satisfy the infinite flow property; in this sense, this extension mimics the properties required for the convergence of consensus dynamics, along the lines of [17].
This paper is concerned with the problem of average consensus for scenarios where communication between nodes is time-varying and possibly random. The convergence properties of consensus dynamics on random sequences of directed graphs are by this time well-established; see, for example, [17, 18, 19]. Average consensus on random graphs has also been studied in [16], under the assumption that the corresponding random sequence of stochastic matrices is stationary and ergodic with positive diagonals and irreducible expectation. One of our main objectives in this work is to extend these results to more general sequences of random stochastic matrices, in particular, beyond stationary ones. More importantly, to the best of our knowledge, we establish for the first time convergence rates for the push-sum algorithms on random sequences of directed graphs.
The remainder of this paper is organized as follows. Section 2 contains mathematical preliminaries. In Section 3, we give a formal description of our consensus problem. In Section 4, we describe the push-sum algorithm. Section 5 studies the ergodicity of row-stochastic matrices, and Section 6 contains our main convergence results. In Section 7, we derive convergence rates for the push-sum algorithm for a class of random column-stochastic matrices. Finally, we gather our conclusions and ideas for future directions in Section 8.
2 Mathematical Preliminaries
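We start by introducing some notational conventions. Let $\mathbb{R}$ and $\mathbb{Z}$ denote the sets of real and integer numbers, respectively, and let $\mathbb{R}_{\geq 0}$ and $\mathbb{Z}_{\geq 0}$ denote the sets of nonnegative real numbers and integers, respectively. For a set $X$, we write $S \subset X$ if $S$ is a proper subset of $X$, and we call the empty set and $X$ trivial subsets of $X$. The complement of $S$ is denoted by $\bar{S}$. Let $|S|$ denote the cardinality of a finite set $S$. We view all vectors in $\mathbb{R}^n$ as column vectors, where $n \in \mathbb{Z}_{>0}$. We denote by $\|\cdot\|$, $\|\cdot\|_1$, and $\|\cdot\|_\infty$ the standard Euclidean norm, the $1$-norm, and the infinity norm on $\mathbb{R}^n$, respectively. The $i$th unit vector in $\mathbb{R}^n$, whose $i$th component is $1$ and all other components are $0$, is denoted by $e_i$. We will also use the shorthand notation $\mathbf{1}_n = [1, \ldots, 1]^T \in \mathbb{R}^n$ and $\mathbf{0}_n = [0, \ldots, 0]^T \in \mathbb{R}^n$. A vector $v \in \mathbb{R}^n$ is stochastic if its elements are nonnegative real numbers that sum to $1$. We use $\mathbb{R}^{n \times n}_{\geq 0}$ to denote the set of $n \times n$ nonnegative real-valued matrices. A matrix $A \in \mathbb{R}^{n \times n}_{\geq 0}$ is row-stochastic (column-stochastic) if each of its rows (columns) sums to $1$. For a given $A \in \mathbb{R}^{n \times n}_{\geq 0}$ and any nontrivial $S \subset \{1, \ldots, n\}$, we let $A_{S\bar{S}} = \sum_{i \in S,\, j \in \bar{S}} A_{ij}$. The notation $A^T$ and $v^T$ will refer to the transpose of the matrix $A$ and the vector $v$, respectively. A positive matrix is a real matrix all of whose elements are positive. Finally, $A_i$ denotes the $i$th row of matrix $A$ and $A^j$ denotes the $j$th column of $A$.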
2.1 Graph theory
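A (weighted) directed graph $G = (V, E, A)$ consists of a node set $V = \{1, \ldots, n\}$, an edge set $E \subseteq V \times V$, and a nonnegative weighted adjacency matrix $A \in \mathbb{R}^{n \times n}$, with $A_{ji} > 0$ if and only if $(i, j) \in E$, in which case we say that $i$ is connected to $j$. Similarly, given a nonnegative matrix $A \in \mathbb{R}^{n \times n}$, one can associate to $A$ a directed graph $G = (V, E)$, where $(i, j) \in E$ if and only if $A_{ji} > 0$, and hence $A$ is the corresponding adjacency matrix for $G$. The in-neighbors and the out-neighbors of $i \in V$ are the sets of nodes $N_i^{\text{in}} = \{ j \in V : A_{ij} > 0 \}$ and $N_i^{\text{out}} = \{ j \in V : A_{ji} > 0 \}$, respectively. The out-degree of $i$ is $d_i^{\text{out}} = |N_i^{\text{out}}|$. A path is a sequence of nodes connected by edges. A directed graph is strongly connected if there is a path between any pair of nodes. A directed graph is complete if every pair of distinct vertices is connected by an edge. If the directed graph $G$ is strongly connected, we say that $A$ is irreducible.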
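The notion of irreducibility above can be checked numerically. The following is a minimal sketch, assuming the matrix is given as a list of lists and that a positive entry `A[v][u]` is read as an edge from `u` to `v`; the function name `is_irreducible` is ours. Since strong connectivity is unchanged when all edges are reversed, the outcome does not depend on this orientation convention.

```python
def is_irreducible(A):
    """Return True if the directed graph of the nonnegative matrix A is strongly connected."""
    n = len(A)

    def reachable(start, forward):
        # Depth-first search from `start`, following edges forward or backward.
        seen = {start}
        stack = [start]
        while stack:
            u = stack.pop()
            for v in range(n):
                # Assumed convention: A[v][u] > 0 is an edge u -> v.
                weight = A[v][u] if forward else A[u][v]
                if weight > 0 and v not in seen:
                    seen.add(v)
                    stack.append(v)
        return seen

    # Strongly connected iff node 0 reaches every node and every node reaches node 0.
    return len(reachable(0, True)) == n and len(reachable(0, False)) == n
```

For instance, the adjacency matrix of a directed cycle on three nodes is irreducible, whereas a two-node matrix with a single one-way edge is not.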
2.2 Sequences of random stochastic matrices
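Let $\mathcal{S}_n^+$ be the set of $n \times n$ column-stochastic matrices that have positive diagonal entries, and let $\mathcal{F}$ denote the Borel $\sigma$-algebra on $\mathcal{S}_n^+$. Given a probability space $(\Omega, \mathcal{B}, \mu)$, a measurable function $W : (\Omega, \mathcal{B}, \mu) \to (\mathcal{S}_n^+, \mathcal{F})$ is called a random column-stochastic matrix, and a sequence $\{W(t)\}$ of such measurable functions on $(\Omega, \mathcal{B}, \mu)$ is called a random column-stochastic matrix sequence; throughout, we assume that $t \in \mathbb{Z}_{\geq 0}$. Note that for any $\omega \in \Omega$, one can associate a sequence of directed graphs $\{G(t)(\omega)\}$ to $\{W(t)(\omega)\}$, where $(i, j) \in E(t)(\omega)$ if and only if $W_{ji}(t)(\omega) > 0$. This in turn defines a sequence of random directed graphs on $V = \{1, \ldots, n\}$, which we denote by $\{G(t)\}$.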
3 Problem Statement
Consider a network of nodes $V = \{1, \ldots, n\}$, where node $i \in V$ has an initial state (or opinion) $x_i(0) \in \mathbb{R}$; the assumption that this initial state is a scalar is without loss of generality, and our treatment can easily be extended to the vector case. The objective of each node is to achieve average consensus, that is, to compute the average $\bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i(0)$ under the constraint that only limited exchange of information between nodes is permitted. The communication layer between nodes at each time $t \geq 0$ is specified by a sequence of random directed graphs $\{G(t)\}$, where $G(t) = (V, E(t), W(t))$. Specifically, at each time $t$, node $i$ updates its value based on the values of its in-neighbors $j \in N_i^{\text{in}}(t)$, where $N_i^{\text{in}}(t) = \{ j \in V : W_{ij}(t) > 0 \}$. One standing assumption throughout this paper is that each node $i$ knows its out-degree $d_i^{\text{out}}(t)$ at every time $t$; this assumption is indeed necessary, as shown in [20]. Our main objective is to show that the class of so-called push-sum algorithms can be used to achieve average consensus at every node, under the assumption that the communication network is random. This key point distinguishes our work from the existing results in the literature [14], [11], [15]. Another key objective that we pursue in this paper is to obtain rates of convergence for such algorithms. We start our treatment by reviewing the push-sum algorithm.
4 Random Push-Sum
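Consider a network of nodes $V = \{1, \ldots, n\}$, where node $i \in V$ has an initial state (or opinion) $x_i(0) \in \mathbb{R}$. The push-sum algorithm, proposed originally in [14], is defined as follows. Each node $i$ maintains and updates, at each time $t \geq 0$, two state variables $x_i(t)$ and $y_i(t)$. The first state variable is initialized to $x_i(0)$ and the second one is initialized to $y_i(0) = 1$, for all $i \in V$. At time $t \geq 0$, node $i$ sends $x_i(t)/d_i^{\text{out}}(t)$ and $y_i(t)/d_i^{\text{out}}(t)$ to its out-neighbors in the random directed graph $G(t)$, which we assume to contain self-loops at each node for all $t \geq 0$. At time $t+1$, node $i$ updates its state variables according to
$$x_i(t+1) = \sum_{j \in N_i^{\text{in}}(t)} \frac{x_j(t)}{d_j^{\text{out}}(t)}, \tag{1}$$
$$y_i(t+1) = \sum_{j \in N_i^{\text{in}}(t)} \frac{y_j(t)}{d_j^{\text{out}}(t)}. \tag{2}$$
It is useful to define another auxiliary variable $z_i(t) = x_i(t)/y_i(t)$; as we will show later, $z_i(t)$ is the estimate by node $i$ of the average $\bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i(0)$. One can rewrite this algorithm in vector form; let the column-stochastic matrix $W(t)$ be a function of $G(t)$ with entries
$$W_{ij}(t) = \begin{cases} \dfrac{1}{d_j^{\text{out}}(t)} & \text{if } j \in N_i^{\text{in}}(t), \\ 0 & \text{otherwise.} \end{cases} \tag{3}$$
Using these weighted adjacency matrices, for every $t \geq 0$, we can rewrite the dynamics (1) and (2) as
$$x(t+1) = W(t)\, x(t), \tag{4}$$
$$y(t+1) = W(t)\, y(t), \tag{5}$$
where $x(t) = [x_1(t), \ldots, x_n(t)]^T$ and $y(t) = [y_1(t), \ldots, y_n(t)]^T$.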
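To make the update rule concrete, the following is a minimal simulation sketch of push-sum under stated assumptions: each directed link is drawn independently with a fixed probability `p` at every step, every node keeps a self-loop, and the second state variable starts at one. The helper names (`random_digraph`, `push_sum_step`, `push_sum`) and the edge model are ours, chosen only for illustration.

```python
import random


def random_digraph(n, p, rng):
    """out[j] is the set of out-neighbors of node j, always including j (self-loop)."""
    return [{i for i in range(n) if i == j or rng.random() < p} for j in range(n)]


def push_sum_step(x, y, out):
    """One push-sum step: node j splits x[j] and y[j] equally among its out-neighbors."""
    n = len(x)
    new_x = [0.0] * n
    new_y = [0.0] * n
    for j in range(n):
        d = len(out[j])  # out-degree of j, counting the self-loop
        for i in out[j]:
            new_x[i] += x[j] / d
            new_y[i] += y[j] / d
    return new_x, new_y


def push_sum(x0, steps, p=0.3, seed=0):
    """Run push-sum for `steps` steps and return the ratio estimates x_i / y_i."""
    rng = random.Random(seed)
    n = len(x0)
    x, y = list(x0), [1.0] * n
    for _ in range(steps):
        x, y = push_sum_step(x, y, random_digraph(n, p, rng))
    return [xi / yi for xi, yi in zip(x, y)]
```

Calling `push_sum([1.0, 2.0, 3.0, 4.0, 10.0], steps=300)` should return ratios close to the average of the initial values. Each step preserves the sums of both state vectors, which is the column-stochasticity of the update matrix in disguise.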
5 Ergodicity
In this section, we establish some important auxiliary results regarding the convergence of products of matrices which satisfy the so-called directed infinite flow property (cf. Definition 3). We study the products of a class of matrices in a deterministic setting, which we then use to study the push-sum algorithm in the next section. We start with some definitions.
Definition 1 (Ergodicity [21], [17]).
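Let $\{A(t)\}$ be a sequence of row-stochastic matrices, and for $t > s \geq 0$, let $A(t:s)$ denote the product
$$A(t:s) = A(t-1)\, A(t-2) \cdots A(s), \tag{6}$$
where $A(s:s) = I$. The sequence $\{A(t)\}$ is said to be weakly ergodic if, for all $i, j, \ell \in \{1, \ldots, n\}$ and any $s \geq 0$, $\lim_{t \to \infty} \left( A_{i\ell}(t:s) - A_{j\ell}(t:s) \right) = 0$. The sequence is said to be strongly ergodic if, for any $s \geq 0$, $\lim_{t \to \infty} A(t:s) = \mathbf{1} \pi^T(s)$, where $\mathbf{1}$ is the all-ones vector and $\pi(s)$ is a stochastic vector.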
It can be shown that weak ergodicity and strong ergodicity are equivalent [21, Theorem 1]. We will simply call such a sequence of row-stochastic matrices ergodic.
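Ergodicity can be observed numerically: as more row-stochastic factors are multiplied on the left, the rows of the product become nearly identical. The sketch below assumes strictly positive random row-stochastic factors, a stronger condition than anything required in this paper, chosen only so that the contraction is easy to see. The helper names are ours.

```python
import random


def random_row_stochastic(n, rng):
    """A random row-stochastic matrix with strictly positive entries."""
    rows = []
    for _ in range(n):
        w = [rng.random() + 0.1 for _ in range(n)]
        s = sum(w)
        rows.append([v / s for v in w])
    return rows


def matmul(A, B):
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)] for i in range(n)]


def row_spread(P):
    """Largest gap between two entries in the same column of P (zero iff all rows agree)."""
    n = len(P)
    return max(
        max(P[i][j] for i in range(n)) - min(P[i][j] for i in range(n))
        for j in range(n)
    )


def backward_product_spreads(n, steps, seed=0):
    """Row spread of the running product, with the newest factor multiplied on the left."""
    rng = random.Random(seed)
    P = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    spreads = []
    for _ in range(steps):
        P = matmul(random_row_stochastic(n, rng), P)
        spreads.append(row_spread(P))
    return spreads
```

The returned spreads are nonincreasing, because every row of the new product is a convex combination of the rows of the previous one, and they decay toward zero, which is the weak ergodicity condition in this special case.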
We first establish a sufficient condition for ergodicity of a sequence of row-stochastic matrices, Proposition 2, which we subsequently use in our convergence result for the push-sum algorithm. To this end, we consider the following dynamical system:
$$x(t+1) = A(t)\, x(t), \qquad t \geq 0. \tag{7}$$
Let us start with two key definitions.
Definition 2 (Strong Aperiodicity [17]).
We say that a sequence of matrices $\{A(t)\}$ is strongly aperiodic if there exists $\gamma > 0$ such that $A_{ii}(t) \geq \gamma$, for all $t \geq 0$ and $i \in \{1, \ldots, n\}$.
Motivated by the infinite flow property [17, Definition 3.2.], we provide the following definition.
Definition 3 (Directed Infinite Flow Property).
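We say that a sequence of matrices $\{A(t)\}$ has the directed infinite flow property if for any nontrivial $S \subset \{1, \ldots, n\}$, $\sum_{t=0}^{\infty} A_{S\bar{S}}(t) = \infty$, where $A_{S\bar{S}}(t) = \sum_{i \in S,\, j \in \bar{S}} A_{ij}(t)$.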
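The directed infinite flow property concerns an infinite sum, so it cannot be verified from finitely many matrices; a finite prefix can nevertheless expose a cut that carries no flow. The sketch below assumes that the flow of a matrix $A$ across a subset $S$ is the sum of $A_{ij}$ over $i \in S$ and $j \notin S$, and it enumerates all nontrivial subsets, which is feasible only for small $n$. The function names are ours.

```python
from itertools import combinations


def cut_flow(A, S):
    """Sum of A[i][j] over i in S and j outside S."""
    n = len(A)
    S = set(S)
    return sum(A[i][j] for i in S for j in range(n) if j not in S)


def min_accumulated_flow(matrices):
    """Smallest accumulated cut flow over all nontrivial subsets, for a finite list of matrices."""
    n = len(matrices[0])
    best = float("inf")
    for size in range(1, n):
        for S in combinations(range(n), size):
            best = min(best, sum(cut_flow(A, S) for A in matrices))
    return best
```

For the constant sequence with `A = [[0.5, 0.5], [0.0, 1.0]]`, the cut out of node 1 never carries flow, so the minimum stays at zero for any horizon and the property fails. For `A = [[0.5, 0.5], [0.5, 0.5]]`, the minimum grows linearly with the horizon, which is consistent with the property.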
Consider now a sequence of matrices that is strongly aperiodic and has the directed infinite flow property. Let , and for any , define
(8) 
Note that is the minimal time instance after , such that there is nonzero information flow between any nontrivial subset of and its complement; consequently, the directed graph associated with the product is strongly connected.
Proposition 1.
If a sequence of matrices has the directed infinite flow property, is finite for all .
Proof.
Suppose that is not finite for some . Then, using (8), there exists a nontrivial subset such that . This implies that , which contradicts the assumption that has the directed infinite flow property. ∎
To establish convergence results for the products of rowstochastic matrices satisfying Proposition 3, we argue that in each time window where the underlying directed graph becomes strongly connected for times, i.e., after time steps for some , significant mixing will occur. To formalize this statement, let and
(9) 
for . For , we also define
We are now ready to state our first result.
Proposition 2.
Consider the dynamics (7), where the sequence of row-stochastic matrices is such that satisfies (3). Suppose, additionally, that is strongly aperiodic and has the directed infinite flow property. Then,

there is a vector such that, for all and ,
where and ;

if, for the sequence associated with , we have
(10) then the sequence is ergodic.
Proof.
We start by proving the first statement. By definition of , we know that for all , is irreducible. Since each is strongly aperiodic, by Lemma A.1, the matrix
which is the product of irreducible matrices, is positive for all . Hence, by Lemma A.2 (ii), for all , we have
Now, since and for all , , using [22, Lemma 3], we obtain
(11) 
Note that if we let for all , we have
(12) 
Using (11) and (12), we conclude that
for all .
We next prove part (ii); since for all , we have that , where we have used the fact that for all . This implies
(13) 
On the other hand, we have
The definition of the sets implies that we can write the right-hand side as , which gives
where the last equality follows from (13) and the assumption . Using the fact that , we have that , for any . Hence, by Proposition 2, part (i), we conclude that is weakly (and thus strongly) ergodic. ∎
Following similar steps as in Proposition 2, we obtain the following result for sequences of column-stochastic matrices of the form (3).
Proposition 3.
Consider the dynamics (7) and assume that the sequence of matrices is strongly aperiodic and has the directed infinite flow property, where the are weighted adjacency matrices in the form of (3). Then,

there is a vector such that, for all and ,
where and ;

for the sequence associated with , if
then for all , .
6 Convergence of Push-Sum
With all the pieces in place, we are now ready to study the behavior of the push-sum algorithm in a random setting.
Theorem 1.
Consider the push-sum algorithm (4) and suppose that the sequence of random column-stochastic matrices has the directed infinite flow property, almost surely. Then, we have
where and .
Proof.
Define
where is a (random) vector from part (i) of Proposition 3. In addition, under the push-sum algorithm we have that
for all . Hence, for every and all , we have
Using the fact that and by bringing the fractions to a common denominator, we have
Note that the denominator in the last equation is equal to . Hence, for all and we have
where the inequality follows from the triangle inequality. Since , we have that
Using the upper bound in part (i) of Proposition 3, we obtain
(14) 
∎
Proposition 4.
Consider the push-sum algorithm (4) and suppose that the sequence of random column-stochastic matrices has the directed infinite flow property, almost surely. Moreover, suppose that the sequence associated with satisfies (10), almost surely. If there exists , such that for any , there is such that for all , then
Remark 1.
In the next section we exhibit a class of random matrix sequences that satisfy the conditions of Proposition 4 and thus admit average consensus almost surely.
Proof.
The proof of this proposition is similar to the proof of Theorem 4.1 in [16], where the sequence is assumed to be stationary; however, since we do not assume stationarity, we provide a proof. By Proposition 3 part (ii), for any there is a time such that for all and ,
By assumption, there exists such that , which implies that , where is defined as in Lemma A.3. Since by Lemma A.3, is nonincreasing, for all , meaning that converges to zero as and hence, , almost surely. ∎
7 $B$-Irreducible Sequences
In this section we characterize a class of random column-stochastic matrices that admits average consensus and we provide a rate of convergence of the push-sum algorithm for this class. To achieve this, we restrict the class of random matrices that we consider; as we will point out later, this restricted class still includes many interesting sequences of random matrices.
In the following discussion, we assume that the push-sum dynamics is generated by a column-stochastic matrix sequence where
(15) 
for all , where is with probability , and is with probability such that are independent random variables. In other words, there is a random communication link between node and at time with probability . Note that is a sequence of independent random column-stochastic matrices.
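As an illustration of this random model, the sketch below draws one column-stochastic matrix from a matrix of link probabilities. It rests on assumptions that are ours: the indicator of the link from node `j` to node `i` is Bernoulli with parameter `P[i][j]`, self-loops are always present, and each column is normalized by the number of active links in it, in the spirit of (3). The function name `sample_column_stochastic` is hypothetical.

```python
import random


def sample_column_stochastic(P, rng):
    """Draw independent Bernoulli links and normalize each column to sum to one."""
    n = len(P)
    # B[i][j] = 1 means the link j -> i is active; self-loops are forced (assumption).
    B = [
        [1 if i == j or rng.random() < P[i][j] else 0 for j in range(n)]
        for i in range(n)
    ]
    W = [[0.0] * n for _ in range(n)]
    for j in range(n):
        d = sum(B[i][j] for i in range(n))  # at least 1 because of the self-loop
        for i in range(n):
            W[i][j] = B[i][j] / d
    return W
```

Every sample has columns summing to one and a positive diagonal, so repeated independent calls produce a sequence of independent random column-stochastic matrices with positive diagonal entries.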
Furthermore, for the probability matrix sequence , we assume that the following holds.
Assumption 1.
is a sequence of matrices with . Additionally, we assume that , for all . Also, for some constant , we assume that for all and all such that . Finally, we assume that the sequence is irreducible, i.e. for some integer ,
is irreducible for all .
We next state the main result of this section.
Theorem 2.
The proof relies on the following results.
Lemma 1.
Proof.
We start by proving (i). For any , let us define the sequence of events
(16) 
Note that for all , the events are independent and that implies , for any nontrivial . Since , for all , we have
This follows from [23, Corollary 5.3.6] and the fact that is irreducible and hence, there is at least a subset of size of the edges that form a strongly connected graph and for some .
Since the events are independent, by the second Borel-Cantelli lemma [24, Theorem 2.3.6], infinitely often, almost surely. Moreover, since every positive entry of is bounded below by , for any nontrivial , , almost surely, implying that has the directed infinite flow property, almost surely. This also implies that and are finite for all , almost surely. This completes the proof of (i).
To prove (ii), let us define, for all ,
(17) 
where is defined in (16). Since the are independent, for all . This implies that . Again, since the are independent, by the Borel-Cantelli lemma, occurs infinitely often, almost surely. This implies that infinitely often, almost surely. Hence, , almost surely. ∎
Lemma 2.
Proof.
The preceding two lemmas and Proposition 4 imply the following.
Corollary 1.
Let be a sequence of random column-stochastic matrices corresponding to the sequence satisfying Assumption 1. Then admits average consensus, almost surely.
Lemma 3.
Proof.
Let be the indicator of the event , i.e.,
By the preceding argument, we have . Note that the are independent. We let for all , and define
By definition of and , we have that
(18) 
Now, we have that
Since all terms on the right-hand side are less than or equal to , we have
Using (18), we have
Let us consider the second term on the right-hand side. When , we have . Using Lemma A.5 to maximize the second term on the right-hand side over the choices of , we obtain
(19)  
(20) 
To further simplify the above inequality, we show that . To show this, we note that for all , we have and hence, . Now, assume that . We have . Therefore,
where the last inequality follows from the fact that .