Diffusion Strategies Outperform Consensus Strategies for Distributed Estimation over Adaptive Networks
Adaptive networks consist of a collection of nodes with adaptation and learning abilities. The nodes interact with each other on a local level and diffuse information across the network to solve estimation and inference tasks in a distributed manner. In this work, we compare the mean-square performance of two main strategies for distributed estimation over networks: consensus strategies and diffusion strategies. The analysis in the paper confirms that under constant step-sizes, diffusion strategies allow information to diffuse more thoroughly through the network and this property has a favorable effect on the evolution of the network: diffusion networks are shown to converge faster and reach lower mean-square deviation than consensus networks, and their mean-square stability is insensitive to the choice of the combination weights. In contrast, and surprisingly, it is shown that consensus networks can become unstable even if all the individual nodes are stable and able to solve the estimation task on their own. When this occurs, cooperation over the network leads to a catastrophic failure of the estimation task. This phenomenon does not occur for diffusion networks: we show that stability of the individual nodes always ensures stability of the diffusion network irrespective of the combination topology. Simulation results support the theoretical findings.
Adaptive networks, diffusion strategy, consensus strategy, mean stability, mean-square stability, mean-square-error performance, combination weights.
Adaptive networks consist of a collection of spatially distributed nodes that are linked together through a topology and that cooperate with each other through local interactions. Adaptive networks are well-suited to perform decentralized information processing and inference tasks [2, 3] and to model complex and self-organized behavior encountered in biological systems [4, 5].
We examine two types of fully decentralized strategies, namely, consensus strategies and diffusion strategies. The consensus strategy was originally proposed in the statistics literature and has since been developed into an elegant procedure to enforce agreement among cooperating nodes. Average consensus and gossip algorithms have been studied extensively in recent years, especially in the control literature [7, 8, 9, 10, 11, 12], and applied to the study of multi-agent formations [13, 14], distributed optimization [15, 16], and distributed estimation problems [17, 18, 19]. Original implementations of the consensus strategy relied on the use of two time-scales [20, 21, 22]: one time-scale for the collection of measurements across the nodes and another time-scale to iterate over the collected data enough times to attain agreement before the process is repeated. Unfortunately, two time-scale implementations hinder the ability to perform real-time recursive estimation and adaptation when measurement data keep streaming in. For this reason, in this work, we focus instead on consensus implementations that operate in a single time-scale. Such implementations appear in several recent works, including [16, 19, 17, 18], and are largely motivated by the procedure developed earlier in [15, 23] for the solution of distributed optimization problems.
The second class of algorithms that we consider deals with diffusion strategies, which were originally introduced for the solution of distributed estimation and adaptation problems in [24, 25, 26, 2, 3]. The main motivation for the introduction of diffusion strategies in these works was the desire to develop distributed schemes that are able to respond in real-time to continuous streaming of data at the nodes by operating over a single time-scale. A useful overview of diffusion strategies appears in . Since their inception, diffusion strategies have been applied to model various forms of complex behavior encountered in nature [4, 5]; they have also been adopted to solve distributed optimization problems advantageously in [28, 29, 30]; and have been studied under varied conditions in [31, 32, 33, 34] as well. Diffusion strategies are inherently single time-scale implementations and are therefore naturally amenable to real-time and recursive implementations. It turns out that the dynamics of the consensus and diffusion strategies differ in important ways, which in turn impact the mean-square behavior of the respective networks in a fundamental manner.
The analysis in this paper will confirm that under constant step-sizes, diffusion strategies allow information to diffuse more thoroughly through networks and this property has a favorable effect on the evolution of the network. It will be shown that diffusion networks converge faster and reach lower mean-square deviation than consensus networks, and their mean-square stability is insensitive to the choice of the combination weights. In comparison, and surprisingly, it is shown that consensus networks can become unstable even if all the individual nodes are stable and able to solve the estimation task on their own. In other words, the learning curve of a cooperative consensus network can diverge even if the learning curves for the non-cooperative individual nodes converge. When this occurs, cooperation over the network leads to a catastrophic failure of the estimation task. This behavior does not occur for diffusion networks: we will show that stability of the individual nodes is sufficient to ensure stability of the diffusion network regardless of the combination weights. The properties revealed in this paper indicate that care is needed when using consensus strategies for adaptation because they can lead to network failure even if the individual nodes are stable and well-behaved. The analysis also suggests that diffusion strategies provide a proper way to enforce cooperation over networks; their operation is such that diffusion networks will always remain stable irrespective of the combination topology.
II Estimation Strategies over Networks
Consider a network consisting of $N$ nodes distributed over a spatial domain. Two nodes are said to be neighbors if they can exchange information. The neighborhood of node $k$ is denoted by $\mathcal{N}_k$. The nodes in the network would like to estimate an unknown $M \times 1$ vector, $w^o$. At every time instant, $i$, each node $k$ is able to observe realizations $\{d_k(i), u_{k,i}\}$ of a scalar random process $\boldsymbol{d}_k(i)$ and a $1 \times M$ vector random process $\boldsymbol{u}_{k,i}$ with a positive-definite covariance matrix, $R_{u,k} = \mathbb{E}\,\boldsymbol{u}_{k,i}^{*}\boldsymbol{u}_{k,i} > 0$, where $\mathbb{E}$ denotes the expectation operator. All vectors in our treatment are column vectors with the exception of the regression vector, $\boldsymbol{u}_{k,i}$, which is taken to be a row vector for convenience of presentation. The random processes $\{\boldsymbol{d}_k(i), \boldsymbol{u}_{k,i}\}$ are related to $w^o$ via the linear regression model:
$$\boldsymbol{d}_k(i) = \boldsymbol{u}_{k,i}\, w^o + \boldsymbol{v}_k(i) \tag{1}$$
where $\boldsymbol{v}_k(i)$ is measurement noise with variance $\sigma_{v,k}^2$, assumed to be temporally white and spatially independent, i.e.,
$$\mathbb{E}\,\boldsymbol{v}_k^{*}(i)\,\boldsymbol{v}_l(j) = \sigma_{v,k}^2\,\delta_{kl}\,\delta_{ij} \tag{2}$$
in terms of the Kronecker delta function. The regression data $\{\boldsymbol{u}_{k,i}\}$ are likewise assumed to be temporally white and spatially independent. The noise $\boldsymbol{v}_k(i)$ and the regressors $\boldsymbol{u}_{l,j}$ are assumed to be independent of each other for all $\{k, l, i, j\}$. All random processes are assumed to be zero mean. Note that we use boldface letters to denote random quantities and normal letters to denote their realizations or deterministic quantities. Models of the form (1) are useful in capturing many situations of interest, such as estimating the parameters of some underlying physical phenomenon, tracking a moving target by a collection of nodes, or estimating the location of a nutrient source or predator in biological networks (see, e.g., [4, 5, 35]); these models are also useful in the study of the performance limits of combinations of adaptive filters [36, 37, 38, 39].
The objective of the network is to estimate $w^o$ in a distributed manner through an online learning process. The nodes estimate $w^o$ by seeking to minimize the following global cost function:
$$J^{\mathrm{glob}}(w) \triangleq \sum_{k=1}^{N} \mathbb{E}\,\big|\boldsymbol{d}_k(i) - \boldsymbol{u}_{k,i}\, w\big|^2 \tag{3}$$
In the sequel, we describe the algorithms pertaining to the consensus and diffusion strategies that we study in this article, in addition to the non-cooperative mode of operation. Afterwards, we move on to the main theme of this work, which is to show why diffusion networks outperform consensus networks. We may remark that the same strategies can be used to optimize global cost functions where the individual costs are not necessarily quadratic in $w$ as in (3). Most of the mean-square analysis performed here can be extended to this more general scenario; see, e.g., [30, 40] and the references therein.
II-A Non-Cooperative Strategy
In the non-cooperative mode of operation, each node $k$ operates independently of the other nodes and estimates $w^o$ by means of a local LMS adaptive filter applied to its data $\{d_k(i), u_{k,i}\}$. The filter update takes the following form [41, 35]:
$$w_{k,i} = w_{k,i-1} + \mu_k\, u_{k,i}^{*}\,\big[d_k(i) - u_{k,i}\, w_{k,i-1}\big] \tag{4}$$
where $\mu_k > 0$ is the constant step-size used by node $k$. In (4), the vector $w_{k,i}$ denotes the estimate for $w^o$ that is computed by node $k$ at time $i$. Note that for the underlying model where $R_{u,k} > 0$ for all $k$, every individual node can employ (4) to estimate $w^o$ independently if desired. Studies allowing for other observability conditions for diffusion and consensus strategies, including possibly singular covariance matrices, appear in [18, 42].
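As a concrete illustration of the non-cooperative update, the following minimal sketch runs a real-valued LMS filter on synthetic data generated from the linear model (1). The function name `lms_run`, the unit-variance Gaussian regressors, and the zero initial condition are assumptions made for the example only.

```python
import random


def lms_run(w_true, mu, num_iter, noise_std=0.1, seed=0):
    """Run real-valued LMS on synthetic data d(i) = u_i w^o + v(i).

    Illustrative assumptions: unit-variance Gaussian regressors,
    zero initial estimate, and a constant step-size mu.
    """
    rng = random.Random(seed)
    M = len(w_true)
    w = [0.0] * M  # initial estimate
    for _ in range(num_iter):
        u = [rng.gauss(0.0, 1.0) for _ in range(M)]       # regressor (row vector)
        v = rng.gauss(0.0, noise_std)                      # measurement noise
        d = sum(uj * wj for uj, wj in zip(u, w_true)) + v  # model (1)
        err = d - sum(uj * wj for uj, wj in zip(u, w))     # d(i) - u_i w_{i-1}
        w = [wj + mu * uj * err for wj, uj in zip(w, u)]   # LMS update
    return w
```

With a small step-size such as `mu=0.05`, the returned estimate settles in a small neighborhood of `w_true`; because the step-size is constant, the filter keeps adapting and can track a slowly changing `w_true`.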
II-B Cooperative Strategies
In the cooperative mode of operation, nodes interact with their neighbors by sharing information. In this article, we study three cooperative strategies for distributed estimation.
B.1. Consensus Strategy
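The consensus strategy is commonly written in the following form:
$$w_{k,i} = w_{k,i-1} - \mu_k(i) \sum_{l \in \mathcal{N}_k \setminus \{k\}} b_{l,k}\,\big(w_{k,i-1} - w_{l,i-1}\big) + \mu_k(i)\, u_{k,i}^{*}\,\big[d_k(i) - u_{k,i}\, w_{k,i-1}\big] \tag{5}$$
where $\{b_{l,k}\}$ is a set of nonnegative coefficients. It should be noted that in most works on consensus implementations, especially in the context of distributed optimization problems [16, 23, 17, 18, 28], the step-sizes $\{\mu_k(i)\}$ that are used in (5) depend on the time-index $i$ and are required to satisfy
$$\sum_{i=0}^{\infty} \mu_k(i) = \infty, \qquad \sum_{i=0}^{\infty} \mu_k^2(i) < \infty \tag{6}$$
In other words, for each node $k$, the step-size sequence $\mu_k(i)$ is required to vanish as $i \to \infty$. Under such conditions, it is known that consensus strategies allow the nodes to reach agreement about $w^o$ [16, 18, 43, 44]. Here, instead, we will use constant step-sizes $\mu_k$. This is because we are interested in studying the adaptation and learning abilities of the networks. Constant step-sizes are critical to endow networks with continuous adaptation and tracking abilities; otherwise, under (6), once the step-sizes have decayed to zero, the network stops adapting and learning is turned off.
With constant step-sizes, we introduce the combination weights
$$a_{l,k} \triangleq \begin{cases} 1 - \mu_k \displaystyle\sum_{j \in \mathcal{N}_k \setminus \{k\}} b_{j,k}, & l = k \\ \mu_k\, b_{l,k}, & l \in \mathcal{N}_k \setminus \{k\} \\ 0, & \text{otherwise} \end{cases} \tag{7}$$
so that the consensus strategy (5) can be rewritten as
$$w_{k,i} = \sum_{l \in \mathcal{N}_k} a_{l,k}\, w_{l,i-1} + \mu_k\, u_{k,i}^{*}\,\big[d_k(i) - u_{k,i}\, w_{k,i-1}\big] \tag{8}$$
The entry $a_{l,k}$ denotes the weight that node $k$ assigns to the estimate $w_{l,i-1}$ received from its neighbor $l$ (see Fig. 1); note that the weights $\{a_{l,k}\}$ are nonnegative for $l \neq k$ and that $a_{k,k}$ is nonnegative for sufficiently small step-sizes. If we collect the nonnegative weights $\{a_{l,k}\}$ into an $N \times N$ matrix $A$, then it follows from (7) that the combination matrix $A$ satisfies the following properties:
$$A^{T} \mathbb{1}_N = \mathbb{1}_N, \qquad a_{l,k} = 0 \ \text{ if } \ l \notin \mathcal{N}_k \tag{9}$$
where $\mathbb{1}_N$ is a vector of size $N$ with all entries equal to one. That is, the weights on the links arriving at node $k$ add up to one, which is equivalent to saying that the matrix $A$ is left-stochastic. Moreover, if two nodes $l$ and $k$ are not linked, then their corresponding entry $a_{l,k}$ is zero.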
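The two properties required of the combination matrix (columns that sum to one, and zero entries for unlinked nodes) are easy to check numerically. The sketch below builds one admissible choice, the uniform rule in which node k weights each member of its neighborhood by the inverse of the neighborhood size. The helper names are hypothetical, and each neighborhood is assumed to contain the node itself.

```python
def uniform_combination_matrix(neighbors):
    """Build A with A[l][k] = 1/|N_k| if l is in N_k, and 0 otherwise.

    neighbors[k] is the set of nodes linked to node k; by assumption
    it includes k itself.
    """
    N = len(neighbors)
    A = [[0.0] * N for _ in range(N)]
    for k in range(N):
        for l in neighbors[k]:
            A[l][k] = 1.0 / len(neighbors[k])
    return A


def is_left_stochastic(A, tol=1e-12):
    """Check that all entries are nonnegative and every column sums to one."""
    N = len(A)
    nonneg = all(A[l][k] >= 0.0 for l in range(N) for k in range(N))
    cols_ok = all(abs(sum(A[l][k] for l in range(N)) - 1.0) <= tol
                  for k in range(N))
    return nonneg and cols_ok
```

For a three-node line network, `uniform_combination_matrix([{0, 1}, {0, 1, 2}, {1, 2}])` returns a left-stochastic matrix that is not symmetric, since the end nodes and the middle node have neighborhoods of different sizes.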
B.2. ATC Diffusion Strategy
Diffusion strategies for the optimization of (3) in a fully decentralized manner were derived in [24, 25, 2, 26, 3, 30] by applying a completion-of-squares argument, followed by a stochastic approximation step and an incremental approximation step. The adapt-then-combine (ATC) form of the diffusion strategy is described by the following update equations:
$$\begin{aligned} \psi_{k,i} &= w_{k,i-1} + \mu_k\, u_{k,i}^{*}\,\big[d_k(i) - u_{k,i}\, w_{k,i-1}\big] \\ w_{k,i} &= \sum_{l \in \mathcal{N}_k} a_{l,k}\, \psi_{l,i} \end{aligned} \tag{10}$$
The above strategy consists of two steps. The first step of (10) involves local adaptation, where node $k$ uses its own data $\{d_k(i), u_{k,i}\}$ to update its weight estimate from $w_{k,i-1}$ to an intermediate value $\psi_{k,i}$. The second step of (10) is a consultation (combination) step where the intermediate estimates $\{\psi_{l,i}\}$ from the neighborhood of node $k$ are combined through weights $\{a_{l,k}\}$ that satisfy (9) to obtain the updated weight estimate $w_{k,i}$.
B.3. CTA Diffusion Strategy
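Another variant of the diffusion strategy is the combine-then-adapt (CTA) form, which is described by the following update equations:
$$\begin{aligned} \psi_{k,i-1} &= \sum_{l \in \mathcal{N}_k} a_{l,k}\, w_{l,i-1} \\ w_{k,i} &= \psi_{k,i-1} + \mu_k\, u_{k,i}^{*}\,\big[d_k(i) - u_{k,i}\, \psi_{k,i-1}\big] \end{aligned} \tag{11}$$
Thus, comparing the ATC and CTA strategies, we note that the order of the consultation and adaptation steps is simply reversed. The first step of (11) involves a consultation step, where the existing estimates $\{w_{l,i-1}\}$ from the neighbors of node $k$ are combined through the weights $\{a_{l,k}\}$. The second step of (11) is a local adaptation step, where node $k$ uses its own data $\{d_k(i), u_{k,i}\}$ to update its weight estimate from the intermediate value $\psi_{k,i-1}$ to $w_{k,i}$.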
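The difference between the three cooperative strategies is only the order in which combination and adaptation are applied, which a single-iteration sketch makes explicit. The code below assumes real-valued data, stores the weights as `A[l][k]` for the weight node k gives to node l, and uses hypothetical function names; zero entries of `A` encode non-neighbors.

```python
def lms_correction(w, u, d, mu):
    """Return mu * u^T * (d - u w) for real data (u, w are lists of length M)."""
    err = d - sum(uj * wj for uj, wj in zip(u, w))
    return [mu * uj * err for uj in u]


def combine(A, vectors, k):
    """Return sum_l A[l][k] * vectors[l]."""
    M = len(vectors[0])
    return [sum(A[l][k] * vectors[l][m] for l in range(len(vectors)))
            for m in range(M)]


def consensus_step(W, U, D, A, mu):
    """Combine neighbors' estimates, then add a correction evaluated at w_{k,i-1}."""
    return [[c + g for c, g in zip(combine(A, W, k),
                                   lms_correction(W[k], U[k], D[k], mu[k]))]
            for k in range(len(W))]


def atc_step(W, U, D, A, mu):
    """Adapt first (intermediate psi), then combine the neighbors' psi."""
    psi = [[w + g for w, g in zip(W[k], lms_correction(W[k], U[k], D[k], mu[k]))]
           for k in range(len(W))]
    return [combine(A, psi, k) for k in range(len(W))]


def cta_step(W, U, D, A, mu):
    """Combine first (intermediate psi), then adapt starting from psi."""
    psi = [combine(A, W, k) for k in range(len(W))]
    return [[p + g for p, g in zip(psi[k],
                                   lms_correction(psi[k], U[k], D[k], mu[k]))]
            for k in range(len(W))]
```

When `A` is the identity matrix, all three functions reduce to the same non-cooperative LMS step; for any other left-stochastic `A` they generally return different estimates from the same data, at the same computational and communication cost per node.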
B.4. Comparing Diffusion and Consensus Strategies
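To facilitate the comparison, the consensus strategy (8) and the diffusion strategies (10)-(11) can each be written as a single recursion for $w_{k,i}$:
$$w_{k,i} = \sum_{l \in \mathcal{N}_k} a_{l,k}\, w_{l,i-1} + \mu_k\, u_{k,i}^{*}\,\big[d_k(i) - u_{k,i}\, w_{k,i-1}\big] \quad \text{(consensus)} \tag{12}$$
$$w_{k,i} = \sum_{l \in \mathcal{N}_k} a_{l,k}\, w_{l,i-1} + \sum_{l \in \mathcal{N}_k} a_{l,k}\, \mu_l\, u_{l,i}^{*}\,\big[d_l(i) - u_{l,i}\, w_{l,i-1}\big] \quad \text{(ATC diffusion)} \tag{13}$$
$$w_{k,i} = \sum_{l \in \mathcal{N}_k} a_{l,k}\, w_{l,i-1} + \mu_k\, u_{k,i}^{*}\,\Big[d_k(i) - u_{k,i} \sum_{l \in \mathcal{N}_k} a_{l,k}\, w_{l,i-1}\Big] \quad \text{(CTA diffusion)} \tag{14}$$
Note that the first terms on the right-hand side of these recursions are all the same. For the second terms, only the variable $w_{k,i-1}$ appears in the consensus strategy (12), while the diffusion strategies (13)-(14) incorporate the estimates $\{w_{l,i-1}\}$ from the neighborhood of node $k$ into the update of $w_{k,i}$. Moreover, in contrast to the consensus (12) and CTA diffusion (14) strategies, the ATC diffusion strategy (13) further incorporates the influence of the data $\{d_l(i), u_{l,i}\}$ from the neighborhood of node $k$ into the update of $w_{k,i}$. These differences in the order in which the computations are performed have important implications for the evolution of the weight-error vectors across consensus and diffusion networks. It is important to note that the diffusion strategies (13)-(14) are able to incorporate additional information into their processing steps without being more complex than the consensus strategy. All three strategies have the same computational complexity and require sharing the same amount of data (see Table I), as can be ascertained by comparing the actual implementations (8), (10), and (11). The key fact to note is that the diffusion implementations first generate an intermediate state variable, which is subsequently used in the final update. This ordering of the calculations has a critical influence on the performance of the algorithms, as we now move on to reveal.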
|ATC diffusion (10)||CTA diffusion (11)||Consensus (8)|
III Mean-Square Performance Analysis
The mean-square performance of diffusion networks has been studied in detail in [2, 3, 27] by applying energy conservation arguments [35, 45]. Following , we will first show how to carry out the performance analysis in a unified manner that covers both diffusion and consensus strategies (see Table II further ahead, which highlights how the parameters for both strategies differ). Subsequently, we use the resulting performance expressions to carry out detailed comparisons and to establish and highlight some surprising and interesting differences in performance.
III-A Network Error Recursion
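Let the error vector for an arbitrary node $k$ be denoted by
$$\tilde{\boldsymbol{w}}_{k,i} \triangleq w^o - \boldsymbol{w}_{k,i}$$
We collect all error vectors and step-sizes across the network into a block vector and block matrix:
$$\tilde{\boldsymbol{w}}_i \triangleq \mathrm{col}\{\tilde{\boldsymbol{w}}_{1,i}, \ldots, \tilde{\boldsymbol{w}}_{N,i}\}, \qquad \mathcal{M} \triangleq \mathrm{diag}\{\mu_1 I_M, \ldots, \mu_N I_M\}$$
where the notation $\mathrm{col}\{\cdot\}$ denotes the vector that is obtained by stacking its arguments on top of each other, and the notation $\mathrm{diag}\{\cdot\}$ constructs a diagonal matrix from its arguments. We further introduce the extended combination matrix:
$$\mathcal{A} \triangleq A \otimes I_M$$
where the symbol $\otimes$ denotes the Kronecker product of two matrices. This construction replaces each entry $a_{l,k}$ in $A$ by the diagonal matrix $a_{l,k} I_M$ in $\mathcal{A}$. Then, if we start from (12), (13), or (14), and use model (1), some straightforward algebra similar to [3, 27] shows that the global error vector $\tilde{\boldsymbol{w}}_i$ for the various strategies evolves according to the following recursion:
$$\tilde{\boldsymbol{w}}_i = \boldsymbol{\mathcal{B}}_i\, \tilde{\boldsymbol{w}}_{i-1} - \boldsymbol{y}_i \tag{19}$$
where the quantities $\boldsymbol{\mathcal{B}}_i$ and $\boldsymbol{y}_i$ are listed in Table II and where $\boldsymbol{\mathcal{R}}_i$ is a block diagonal matrix and $\boldsymbol{s}_i$ is a block column vector:
$$\boldsymbol{\mathcal{R}}_i \triangleq \mathrm{diag}\{\boldsymbol{u}_{1,i}^{*}\boldsymbol{u}_{1,i}, \ldots, \boldsymbol{u}_{N,i}^{*}\boldsymbol{u}_{N,i}\}, \qquad \boldsymbol{s}_i \triangleq \mathrm{col}\{\boldsymbol{u}_{1,i}^{*}\boldsymbol{v}_1(i), \ldots, \boldsymbol{u}_{N,i}^{*}\boldsymbol{v}_N(i)\}$$
The coefficient matrix $\boldsymbol{\mathcal{B}}_i$ is an $N \times N$ block matrix with blocks of size $M \times M$ each. Likewise, the driving vector $\boldsymbol{y}_i$ is an $N \times 1$ block vector with entries that are $M \times 1$ each. The matrix $\boldsymbol{\mathcal{B}}_i$ controls the evolution of the network error vector $\tilde{\boldsymbol{w}}_i$. It is obvious from Table II that this matrix is different for each of the strategies under consideration. We shall verify in the sequel that the differences have critical ramifications when we compare consensus and diffusion strategies. Note in passing that any of these three distributed strategies degenerates to the non-cooperative strategy (4) when $A = I_N$.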
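The "straightforward algebra" behind the error recursion can be sketched as follows. The derivation assumes that the error at node $k$ is defined as $\tilde{w}_{k,i} = w^o - w_{k,i}$, that $\mathcal{A} = A \otimes I_M$, that $\mathcal{M}$ is the block-diagonal matrix of step-sizes, and that $\mathcal{R}_i$ and $s_i$ collect the terms $u_{k,i}^{*}u_{k,i}$ and $u_{k,i}^{*}v_k(i)$; the expressions are offered as a consistency check rather than as a reproduction of Table II.

```latex
% Substituting d_k(i) = u_{k,i} w^o + v_k(i) into the LMS correction term:
%   mu_k u^*_{k,i} [ d_k(i) - u_{k,i} w ] = mu_k u^*_{k,i} u_{k,i} (w^o - w) + mu_k u^*_{k,i} v_k(i)
% and using  sum_l a_{l,k} = 1  to write  w^o = sum_l a_{l,k} w^o :
\begin{aligned}
\text{Consensus:}\quad
  \tilde{w}_{k,i} &= \sum_{l\in\mathcal{N}_k} a_{l,k}\,\tilde{w}_{l,i-1}
     - \mu_k u_{k,i}^{*}u_{k,i}\,\tilde{w}_{k,i-1} - \mu_k u_{k,i}^{*}v_k(i)
  &&\Longrightarrow\quad
  \tilde{w}_i = \big(\mathcal{A}^{T} - \mathcal{M}\mathcal{R}_i\big)\tilde{w}_{i-1} - \mathcal{M}s_i \\[4pt]
\text{ATC:}\quad
  \tilde{w}_{k,i} &= \sum_{l\in\mathcal{N}_k} a_{l,k}
     \big[(I_M - \mu_l u_{l,i}^{*}u_{l,i})\,\tilde{w}_{l,i-1} - \mu_l u_{l,i}^{*}v_l(i)\big]
  &&\Longrightarrow\quad
  \tilde{w}_i = \mathcal{A}^{T}\big(I - \mathcal{M}\mathcal{R}_i\big)\tilde{w}_{i-1} - \mathcal{A}^{T}\mathcal{M}s_i \\[4pt]
\text{CTA:}\quad
  \tilde{w}_{k,i} &= (I_M - \mu_k u_{k,i}^{*}u_{k,i})\sum_{l\in\mathcal{N}_k} a_{l,k}\,\tilde{w}_{l,i-1}
     - \mu_k u_{k,i}^{*}v_k(i)
  &&\Longrightarrow\quad
  \tilde{w}_i = \big(I - \mathcal{M}\mathcal{R}_i\big)\mathcal{A}^{T}\tilde{w}_{i-1} - \mathcal{M}s_i
\end{aligned}
```

In the consensus case the combination matrix and the data term enter additively, whereas in both diffusion cases they enter as a product; this structural difference is what the stability comparison relies on.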
III-B Mean Stability
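We start our analysis by examining the stability in the mean of the networks, i.e., the stability of the recursion for $\mathbb{E}\,\tilde{\boldsymbol{w}}_i$. Thus, note that the matrices $\{\boldsymbol{\mathcal{B}}_i\}$ in Table II are random matrices due to the randomness of the regressors in $\boldsymbol{\mathcal{R}}_i$. In other words, the evolution of the networks is stochastic in nature. Now, since the regressors are temporally white and spatially independent, the $\boldsymbol{\mathcal{B}}_i$ are independent of $\tilde{\boldsymbol{w}}_{i-1}$ for any of the strategies. Moreover, since the $\{\boldsymbol{u}_{k,i}, \boldsymbol{v}_k(i)\}$ are independent of each other, the $\boldsymbol{y}_i$ are zero mean. Taking expectation of both sides of (19), we find that the mean of $\tilde{\boldsymbol{w}}_i$ evolves in time according to the recursion:
$$\mathbb{E}\,\tilde{\boldsymbol{w}}_i = \mathcal{B}\; \mathbb{E}\,\tilde{\boldsymbol{w}}_{i-1}$$
where $\mathcal{B} \triangleq \mathbb{E}\,\boldsymbol{\mathcal{B}}_i$ is shown in Table II and
$$\mathcal{R} \triangleq \mathbb{E}\,\boldsymbol{\mathcal{R}}_i = \mathrm{diag}\{R_{u,1}, \ldots, R_{u,N}\}$$
The necessary and sufficient condition to ensure mean stability of the network (namely, $\mathbb{E}\,\tilde{\boldsymbol{w}}_i \to 0$ as $i \to \infty$) is therefore to select step-sizes $\{\mu_k\}$ that ensure
$$\rho(\mathcal{B}) < 1 \tag{24}$$
where $\rho(\cdot)$ denotes the spectral radius of its matrix argument. Note that the coefficient matrices $\{\mathcal{B}\}$ that control the evolution of $\mathbb{E}\,\tilde{\boldsymbol{w}}_i$ are different in the cases listed in Table II. These differences lead to interesting conclusions.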
B.1. Comparison of Mean Stability
To begin with, the matrix $\mathcal{B}$ is block diagonal in the non-cooperative case and equal to
$$\mathcal{B}_{\mathrm{ncop}} = I_{NM} - \mathcal{M}\mathcal{R} = \mathrm{diag}\{I_M - \mu_1 R_{u,1}, \ldots, I_M - \mu_N R_{u,N}\} \tag{26}$$
Therefore, for each of the individual nodes to be stable in the mean, it is necessary and sufficient that the step-sizes $\{\mu_k\}$ be selected to satisfy
$$0 < \mu_k < \frac{2}{\lambda_{\max}(R_{u,k})}, \qquad k = 1, \ldots, N \tag{27}$$
where $\lambda_{\max}(\cdot)$ denotes the maximum eigenvalue of its Hermitian matrix argument. Condition (27) guarantees that when each node acts individually and applies the LMS recursion (4), then the mean of its weight error vector will tend asymptotically to zero. That is, by selecting the step-sizes to satisfy (27), all individual nodes will be stable in the mean.
Now consider the matrix $\mathcal{B}$ in the consensus case; it is equal to
$$\mathcal{B}_{\mathrm{cons}} = \mathcal{A}^{T} - \mathcal{M}\mathcal{R} \tag{28}$$
It is seen in this case that the stability of $\mathcal{B}_{\mathrm{cons}}$ depends on $A$. The fact that the stability of the consensus strategy is sensitive to the choice of the combination matrix is known in the consensus literature for the conventional implementation that computes averages and does not involve streaming data or gradient noise [6, 46]. Here, we are studying the more demanding case of the single time-scale consensus iteration (8) in the presence of both noisy and streaming data. It is clear from (28) that the choice of $A$ can destroy the stability of the consensus network even when the step-sizes are chosen according to (27) and all nodes are stable on their own. This behavior does not occur for diffusion networks where the matrices $\{\mathcal{B}\}$ for the ATC and CTA diffusion strategies are instead given by
$$\mathcal{B}_{\mathrm{atc}} = \mathcal{A}^{T}\,(I_{NM} - \mathcal{M}\mathcal{R}), \qquad \mathcal{B}_{\mathrm{cta}} = (I_{NM} - \mathcal{M}\mathcal{R})\,\mathcal{A}^{T}$$
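The step-size range in (27) follows from a short eigenvalue argument. The sketch below assumes that the mean error of an isolated node obeys the recursion obtained by taking expectations of the LMS update (4) under the independence assumptions stated earlier.

```latex
% Mean recursion of an isolated node (assumed form):
%   E w~_{k,i} = (I_M - mu_k R_{u,k}) E w~_{k,i-1}
% The eigenvalues of I_M - mu_k R_{u,k} are 1 - mu_k * lambda_m(R_{u,k}), m = 1, ..., M.
\begin{aligned}
\rho\big(I_M - \mu_k R_{u,k}\big) < 1
 &\iff \big|1 - \mu_k \lambda_m(R_{u,k})\big| < 1 \quad \text{for all } m \\
 &\iff 0 < \mu_k \lambda_m(R_{u,k}) < 2 \quad \text{for all } m \\
 &\iff 0 < \mu_k < \frac{2}{\lambda_{\max}(R_{u,k})}
 \qquad \text{(since } R_{u,k} > 0\text{)}
\end{aligned}
```

For a scalar regressor with variance $\sigma_{u,k}^2$, the condition reduces to $0 < \mu_k \sigma_{u,k}^2 < 2$.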
The following result clarifies these statements.
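Theorem 1 (Spectral properties of $\mathcal{B}$).
It holds that
$$\rho(\mathcal{B}_{\mathrm{atc}}) = \rho(\mathcal{B}_{\mathrm{cta}}) \le \rho(\mathcal{B}_{\mathrm{ncop}}) \tag{30}$$
irrespective of the choice of the left-stochastic matrices $A$. Moreover, if the combination matrix $A$ is symmetric, then the eigenvalues of $\mathcal{B}_{\mathrm{cons}}$ are less than or equal to the corresponding eigenvalues of $\mathcal{B}_{\mathrm{ncop}}$, i.e.,
$$\lambda_l(\mathcal{B}_{\mathrm{cons}}) \le \lambda_l(\mathcal{B}_{\mathrm{ncop}}), \qquad l = 1, \ldots, NM \tag{31}$$
where the eigenvalues are arranged in decreasing order, i.e., $\lambda_m(\cdot) \ge \lambda_n(\cdot)$ if $m \le n$.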
Proof: See Appendix A. ∎
Result (30) establishes the important conclusion that the coefficient matrix $\mathcal{B}$ for the diffusion strategies is stable whenever $\mathcal{B}_{\mathrm{ncop}}$ (or, from (26), each of the matrices $\{I_M - \mu_k R_{u,k}\}$) is stable; this conclusion is independent of $A$. The stability of the matrices $\{I_M - \mu_k R_{u,k}\}$ is ensured by any step-size satisfying (27). Therefore, stability of the individual nodes will always guarantee the stability of $\mathcal{B}$ in the ATC and CTA diffusion cases, regardless of the choice of $A$. This is not the case for the consensus strategy (8); even when the step-sizes are selected to satisfy (27) so that all individual nodes are mean stable, the matrix $\mathcal{B}_{\mathrm{cons}}$ can still be unstable depending on the choice of $A$ (and, therefore, on the network topology as well). Therefore, if we start from a collection of nodes that are behaving in a stable manner on their own, and if we connect them through a topology and then apply consensus to solve the same estimation problem through cooperation, then the network may end up being unstable and the estimation task can fail drastically (see Fig. 2 further ahead). Moreover, it is further shown in Appendix A that when $A$ is symmetric, the consensus strategy is mean-stable for step-sizes satisfying:
$$0 < \mu_k < \frac{1 + \lambda_{\min}(A)}{\lambda_{\max}(R_{u,k})}, \qquad k = 1, \ldots, N \tag{32}$$
Note from (9) that since $A$ is a left-stochastic matrix, its spectral radius is equal to one and one of its eigenvalues is also equal to one. It follows that $1 + \lambda_{\min}(A) \le 2$, so that the upper bound in (32) does not exceed the upper bound in (27) and diffusion networks are stable over a wider range of step-sizes. Actually, the upper bound in (32) can be much smaller than the one in (27) or even zero because $\lambda_{\min}(A)$ can be negative or equal to $-1$.
What if some of the nodes are unstable in the mean to begin with? How would the behavior of the diffusion and consensus strategies differ? Assume that there is at least one individual unstable node, i.e., $\rho(I_M - \mu_k R_{u,k}) \ge 1$ for some $k$ so that $\rho(\mathcal{B}_{\mathrm{ncop}}) \ge 1$. Then, we observe from (30) that the spectral radius of $\mathcal{B}_{\mathrm{atc}}$ or $\mathcal{B}_{\mathrm{cta}}$ can still be smaller than one even if $\rho(\mathcal{B}_{\mathrm{ncop}}) \ge 1$. It follows that even if some individual node is unstable, the diffusion strategies can still be stable if we properly choose $A$. In other words, diffusion cooperation has a stabilizing effect on the network. In contrast, if there is at least one individual unstable node and the combination matrix is symmetric, then from (31), no matter how we choose $A$, the spectral radius $\rho(\mathcal{B}_{\mathrm{cons}})$ will be larger than or equal to one and the consensus network will be unstable.
The above results suggest that fusing results from neighborhoods according to the consensus strategy (8) is not necessarily the best thing to do because it can lead to instability and catastrophic failure. On the other hand, fusing the results from neighbors via diffusion ensures stability regardless of the topology.
B.2. Example: Two-Node Networks
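To illustrate these important observations, let us consider an example consisting of two cooperating nodes; in this case, it is possible to carry out the calculations analytically in order to highlight the various patterns of behavior. Later, in the simulations section, we illustrate the behavior for networks with multiple nodes. Thus, consider a network consisting of $N = 2$ nodes. For simplicity, we assume the weight vector $w^o$ is a scalar, and $R_{u,1} = \sigma_{u,1}^2$ and $R_{u,2} = \sigma_{u,2}^2$. Without loss of generality, we assume $\mu_1 \sigma_{u,1}^2 \le \mu_2 \sigma_{u,2}^2$. The combination matrix for this example is of the form (Fig. 2):
$$A = \begin{bmatrix} 1-a & b \\ a & 1-b \end{bmatrix} \tag{33}$$
with $a, b \in [0, 1]$. When desired, a symmetric $A$ can be selected by simply setting $a = b$. Then, using (33), we get
$$\mathcal{B}_{\mathrm{atc}} = \begin{bmatrix} (1-a)(1 - \mu_1\sigma_{u,1}^2) & a\,(1 - \mu_2\sigma_{u,2}^2) \\ b\,(1 - \mu_1\sigma_{u,1}^2) & (1-b)(1 - \mu_2\sigma_{u,2}^2) \end{bmatrix} \tag{34}$$
$$\mathcal{B}_{\mathrm{cons}} = \begin{bmatrix} 1 - a - \mu_1\sigma_{u,1}^2 & a \\ b & 1 - b - \mu_2\sigma_{u,2}^2 \end{bmatrix} \tag{35}$$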
We first assume that
so that both individual nodes are stable in the mean by virtue of (27). Then, by Theorem 1, the ATC diffusion network will also be stable in the mean for any choice of the parameters . We now verify that there are choices for that will turn the consensus network unstable. Specifically, we verify below that if and happen to satisfy
then consensus will lead to unstable network behavior even though both individual nodes are stable. Indeed, note first that the minimum eigenvalue of is given by:
From the first equality of (39), we know that and, hence, is real. When (36)-(37) are satisfied, we have that and in the second equality of (39) are nonnegative. It follows that the consensus network is unstable since
Next, we consider an example satisfying
so that node is still stable, whereas node becomes unstable. From the first equality of (39), we again conclude that
That is, in this second case, no matter how we choose the parameters , the consensus network is always unstable. In contrast, the diffusion network is able to stabilize the network. To see this, we set so that the eigenvalues of in (34) are . Some algebra shows that the diffusion network is stable if satisfies
In Fig. 2(b), we set and so that node is stable, but node is unstable. If we now set , then (43) is satisfied and the diffusion strategies become stable even when the non-cooperative and consensus strategies are unstable.
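Both scenarios of the two-node example can be checked numerically. The sketch below assumes the parametrization $A = [[1-a,\ b],[a,\ 1-b]]$, writes $x_k$ for the product $\mu_k \sigma_{u,k}^2$, and forms the mean coefficient matrices as $I - \mathcal{M}\mathcal{R}$ (non-cooperative), $A^T(I - \mathcal{M}\mathcal{R})$ (ATC diffusion), and $A^T - \mathcal{M}\mathcal{R}$ (consensus). The numerical values used afterwards are illustrative choices, not the settings of Fig. 2.

```python
import cmath


def eigenvalues_2x2(B):
    """Eigenvalues of a 2x2 matrix from its trace and determinant."""
    tr = B[0][0] + B[1][1]
    det = B[0][0] * B[1][1] - B[0][1] * B[1][0]
    root = cmath.sqrt(tr * tr - 4.0 * det)
    return ((tr - root) / 2.0, (tr + root) / 2.0)


def spectral_radius(B):
    return max(abs(lam) for lam in eigenvalues_2x2(B))


def two_node_matrices(a, b, x1, x2):
    """Mean coefficient matrices for the two-node scalar network.

    x_k stands for mu_k * sigma_{u,k}^2 (assumed notation).
    """
    At = [[1.0 - a, a], [b, 1.0 - b]]                     # A^T
    ncop = [[1.0 - x1, 0.0], [0.0, 1.0 - x2]]             # I - M R
    atc = [[At[0][0] * (1.0 - x1), At[0][1] * (1.0 - x2)],
           [At[1][0] * (1.0 - x1), At[1][1] * (1.0 - x2)]]  # A^T (I - M R)
    cons = [[At[0][0] - x1, At[0][1]],
            [At[1][0], At[1][1] - x2]]                    # A^T - M R
    return ncop, atc, cons
```

With `a = b = 0.9` and `x1 = x2 = 1.5`, both nodes are individually stable and the diffusion matrix is stable, while the consensus matrix has an eigenvalue near -2.3 and is unstable. With `a = b = 0.5`, `x1 = 0.4`, and `x2 = 2.4`, node 2 is unstable on its own and so is the consensus network, yet the diffusion matrix has eigenvalues 0 and about -0.4 and is stable.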
III-C Mean-Square Stability
We now examine the stability in the mean-square sense of the consensus and diffusion strategies. Let denote an arbitrary nonnegative-definite matrix that we are free to choose. From (19), we get the following weighted variance relation for sufficiently small step-sizes:
where the notation denotes the weighted square quantity and appears in Table II with the covariance matrix defined by:
As shown in [3, 48, 27], step-sizes that satisfy (24) and are sufficiently small will also ensure mean-square stability of the network (namely, as ). Therefore, we find again that, for infinitesimally small step-sizes, the mean-square stability of consensus networks is sensitive to the choice of , whereas the mean-square stability of diffusion networks is not affected by . In the next section, we will examine more closely for the various strategies listed in Table II and establish that diffusion networks are not only more stable than consensus networks but also lead to better mean-square-error performance as well.
III-D Mean-Square Deviation
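The mean-square deviation (MSD) measure is used to assess how well the nodes in the network estimate the weight vector, $w^o$. The MSD at node $k$ is defined as follows:
$$\mathrm{MSD}_k \triangleq \lim_{i \to \infty} \mathbb{E}\,\|\tilde{\boldsymbol{w}}_{k,i}\|^2$$
where $\|\cdot\|$ denotes the Euclidean norm for vectors. The network MSD is defined as the average MSD across the network, i.e.,
$$\mathrm{MSD} \triangleq \frac{1}{N} \sum_{k=1}^{N} \mathrm{MSD}_k$$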
Iterating (44), we can obtain a series expression for the network MSD as:
We can also obtain a series expansion for the MSD at each individual node as follows:
IV Comparison of Mean-Square Performance for Homogeneous Agents
In the previous section, we compared the stability of the various estimation strategies in the mean and mean-square senses. In particular, we established that stability of the individual nodes ensures stability of diffusion networks irrespective of the combination topology. In the sequel, we shall assume that the step-sizes are sufficiently small so that conditions (27) and (32) hold and the diffusion and consensus networks, as well as the individual nodes, are stable in the mean and mean-square senses. Under these conditions, the networks achieve steady-state operation. We now use the MSD expressions derived above to establish that ATC diffusion achieves lower (and, hence, better) MSD values than the consensus, CTA, and non-cooperative strategies. In this way, diffusion strategies not only ensure stability of the cooperative behavior but also lead to improved mean-square-error performance. We establish these results under the following reasonable condition.
Assumption 1. All nodes in the network use the same step-size, $\mu_k = \mu$, and they observe data with the same regressor covariance matrix, so that $R_{u,k} = R_u$ for all $k$. In other words, we are dealing with a network of homogeneous nodes interacting with each other. In this way, it is possible to quantify the differences in performance without biasing the results by differences in the adaptation mechanism (step-sizes) or in the covariance matrices of the regression data at the nodes.
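The effect of cooperation on the steady-state MSD of homogeneous nodes can be explored with a small Monte Carlo experiment. The sketch below is illustrative only: it assumes a scalar unknown ($M = 1$, with the hypothetical value $w^o = 1$), unit-variance Gaussian regressors, the same noise level at every node, and a time average over the second half of a single run as a stand-in for the steady-state expectation. The function name and defaults are hypothetical.

```python
import random


def simulate_msd(strategy, A, mu, sigma_v, num_iter=4000, burn_in=2000, seed=1):
    """Time-averaged network MSD for a scalar, homogeneous network.

    strategy is one of "ncop", "consensus", "atc", "cta";
    A[l][k] is the weight node k assigns to node l.
    """
    rng = random.Random(seed)
    N = len(A)
    w_true = 1.0
    w = [0.0] * N
    acc = 0.0
    for i in range(num_iter):
        # The same seed yields the same data for every strategy.
        u = [rng.gauss(0.0, 1.0) for _ in range(N)]
        d = [u[k] * w_true + rng.gauss(0.0, sigma_v) for k in range(N)]
        if strategy == "ncop":
            w = [w[k] + mu * u[k] * (d[k] - u[k] * w[k]) for k in range(N)]
        elif strategy == "consensus":
            w = [sum(A[l][k] * w[l] for l in range(N))
                 + mu * u[k] * (d[k] - u[k] * w[k]) for k in range(N)]
        elif strategy == "atc":
            psi = [w[k] + mu * u[k] * (d[k] - u[k] * w[k]) for k in range(N)]
            w = [sum(A[l][k] * psi[l] for l in range(N)) for k in range(N)]
        elif strategy == "cta":
            psi = [sum(A[l][k] * w[l] for l in range(N)) for k in range(N)]
            w = [psi[k] + mu * u[k] * (d[k] - u[k] * psi[k]) for k in range(N)]
        else:
            raise ValueError("unknown strategy: " + strategy)
        if i >= burn_in:
            acc += sum((w_true - wk) ** 2 for wk in w) / N
    return acc / (num_iter - burn_in)
```

For example, with four fully connected nodes using uniform weights (`A = [[0.25] * 4 for _ in range(4)]`), `mu = 0.05`, and `sigma_v = 0.1`, the ATC estimate of the network MSD comes out well below the non-cooperative one. A single run gives only a rough estimate; averaging over independent runs would be needed for a careful comparison between ATC, CTA, and consensus.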
Under Assumption 1, it holds that and , and thus the matrices and in Table II reduce to the expressions shown in Table III, where we introduced the diagonal matrix
Note that the ATC and CTA diffusion strategies now have the same coefficient matrix . We explain in the sequel the terms that appear in the last row of Table III.
|ATC diffusion (10)||CTA diffusion (11)||Consensus (8)||Non-cooperative (4)|
IV-A Spectral Properties of $\mathcal{B}$
As mentioned before, the stability and mean-square-error performance of the various algorithms depend on the corresponding matrix ; therefore, in this section, we examine more closely the eigen-structure of . For the distributed strategies (diffusion and consensus), the eigen-structure of will depend on the combination matrix . Thus, let and () denote an arbitrary pair of right and left eigenvectors of corresponding to the eigenvalue . That is,
We scale the vectors and to satisfy:
Recall that . Furthermore, we let () denote the eigenvector of the covariance matrix that is associated with the eigenvalue . That is,
Since is Hermitian and positive-definite, the are orthonormal, i.e., , and the are positive. The following result describes the eigen-structure of the matrix in terms of the eigen-structures of for the diffusion and consensus algorithms of Table III. Note that the results for any of these distributed strategies collapse to the result for the non-cooperative strategy when we set for all .
Lemma 1 (Eigen-structure of under diffusion and consensus).
The matrices appearing in Table III for the diffusion and consensus strategies have right and left eigenvectors given by:
with the corresponding eigenvalues, $\lambda_{l,m}(\mathcal{B})$, shown in Table III for $l = 1, \ldots, N$ and $m = 1, \ldots, M$. Note that while the eigenvectors are the same for the diffusion and consensus strategies, the corresponding eigenvalues are different.
We only consider the diffusion case and denote its coefficient matrix by ; the same argument applies to the consensus strategy. We multiply by the defined in (54) from the right and obtain
where we used the Kronecker product property for matrices of compatible dimensions . In a similar manner, we can verify that has left eigenvector defined in (54) with the corresponding eigenvalue from Table III. ∎
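The Kronecker-product argument can be verified numerically on a small case. The sketch below assumes that, for homogeneous nodes, the diffusion matrix is $A^T \otimes (I - \mu R_u)$ and the consensus matrix is $A^T \otimes I - I \otimes \mu R_u$. It then checks that the Kronecker product of a left eigenvector of $A$ with an eigenvector of $R_u$ is a right eigenvector of both, with eigenvalues $\lambda_l(A)(1 - \mu\lambda_m(R_u))$ and $\lambda_l(A) - \mu\lambda_m(R_u)$, respectively. All numerical values and helper names are illustrative.

```python
def kron(X, Y):
    """Kronecker product of two matrices stored as lists of rows."""
    return [[X[i][j] * Y[k][l] for j in range(len(X[0])) for l in range(len(Y[0]))]
            for i in range(len(X)) for k in range(len(Y))]


def transpose(X):
    return [list(row) for row in zip(*X)]


def identity(n):
    return [[1.0 if r == c else 0.0 for c in range(n)] for r in range(n)]


def scale(c, X):
    return [[c * x for x in row] for row in X]


def sub(X, Y):
    return [[x - y for x, y in zip(rx, ry)] for rx, ry in zip(X, Y)]


def eig_residual(B, lam, v):
    """Largest entry of |B v - lam v|; zero for an exact eigenpair."""
    Bv = [sum(B[r][c] * v[c] for c in range(len(v))) for r in range(len(v))]
    return max(abs(Bv[r] - lam * v[r]) for r in range(len(v)))


mu = 0.1
A = [[0.7, 0.4], [0.3, 0.6]]                              # left-stochastic example
R_u = [[2.0, 1.0], [1.0, 2.0]]                            # covariance example
left_eig_A = [(1.0, [1.0, 1.0]), (0.3, [3.0, -4.0])]      # s^T A = lambda s^T
eig_R = [(3.0, [1.0, 1.0]), (1.0, [1.0, -1.0])]           # R_u z = lambda z

I2 = identity(2)
B_diff = kron(transpose(A), sub(I2, scale(mu, R_u)))
B_cons = sub(kron(transpose(A), I2), kron(I2, scale(mu, R_u)))


def max_residuals():
    """Worst eigenpair residual over all (l, m) pairs, for both strategies."""
    worst_diff = worst_cons = 0.0
    for lam_a, s in left_eig_A:
        for lam_r, z in eig_R:
            v = [si * zj for si in s for zj in z]         # s kron z
            worst_diff = max(worst_diff,
                             eig_residual(B_diff, lam_a * (1.0 - mu * lam_r), v))
            worst_cons = max(worst_cons,
                             eig_residual(B_cons, lam_a - mu * lam_r, v))
    return worst_diff, worst_cons
```

Calling `max_residuals()` returns residuals at the level of floating-point round-off, which is consistent with the two strategies sharing eigenvectors while having different eigenvalues.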
Theorem 2 (Spectral radius of under diffusion and consensus).
Under Assumption 1, it holds that
where equality holds if or when the step-size satisfies:
Proof: See Appendix B. ∎
Note that the upper bound in (57) is even smaller than the one in (32) and, therefore, can again be very small or even zero. It follows that there is generally a wide range of step-sizes over which is greater than . When this happens, the convergence rate of diffusion networks is superior to the convergence rate of consensus networks; in particular, the quantities and will converge faster towards their steady-state values over diffusion networks than over consensus networks.
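The convergence-rate comparison can be made concrete by evaluating the spectral radii directly from the eigenvalues. The sketch below assumes the homogeneous-node eigenvalue expressions $\lambda_l(A)(1 - \mu\lambda_m(R_u))$ for diffusion and $\lambda_l(A) - \mu\lambda_m(R_u)$ for consensus, and real eigenvalues of $A$ (as for a symmetric $A$ or the illustrative values used afterwards). Function names are hypothetical.

```python
def rho_diffusion(eig_A, eig_R, mu):
    """Spectral radius from the products lambda_l(A) * (1 - mu * lambda_m(R_u))."""
    return max(abs(la * (1.0 - mu * lr)) for la in eig_A for lr in eig_R)


def rho_consensus(eig_A, eig_R, mu):
    """Spectral radius from the differences lambda_l(A) - mu * lambda_m(R_u)."""
    return max(abs(la - mu * lr) for la in eig_A for lr in eig_R)


def rho_noncooperative(eig_R, mu):
    """Spectral radius of I - mu * R_u."""
    return max(abs(1.0 - mu * lr) for lr in eig_R)
```

For instance, with `eig_A = [1.0, 0.3]` and `eig_R = [1.0, 3.0]`, the step-size `mu = 0.5` lies inside the individual-node range (below 2/3), and the diffusion spectral radius equals the non-cooperative value 0.5, whereas the consensus spectral radius is about 1.2, so the consensus network diverges. For a small step-size such as `mu = 0.1`, the two spectral radii coincide at 0.9 in this example.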
IV-B Network MSD Performance
We now compare the MSD performance. Note that the expressions for the individual MSD in (49) and the network MSD in (48) depend on in a nontrivial manner. To simplify these MSD expressions, we introduce the following assumption on the combination matrix.
The combination matrix is diagonalizable, i.e., there exists an invertible matrix and a diagonal matrix such that
That is, the columns of consist of the right eigenvectors of and the rows of consist of the left eigenvectors of , as defined by (51).
Note that, besides condition (52), it follows from Assumption 2 that . Furthermore, any symmetric combination matrix is diagonalizable and therefore satisfies condition (58) automatically. Actually, when is symmetric, more can be said about its eigenvectors. In that case, the matrix will be orthogonal so that and it will further hold that . Assumption 2 allows the analysis to apply to important cases in which is not necessarily symmetric but is still diagonalizable (such as when is constructed according to the uniform rule by assigning to the links of node weights that are equal to the inverse of its degree, ). We can now simplify the MSD expressions by using the eigen-decomposition of from Lemma 1 and the above eigen-decomposition of .