Decentralized Joint-Sparse Signal Recovery: A Sparse Bayesian Learning Approach†
†This work has appeared in part in [1].
Abstract
This work proposes a decentralized, iterative, Bayesian algorithm called CB-DSBL for in-network estimation of multiple jointly sparse vectors by a network of nodes, using noisy and underdetermined linear measurements. The proposed algorithm exploits the network-wide joint sparsity of the unknown sparse vectors to recover them from significantly fewer local measurements than stand-alone sparse signal recovery schemes require. To reduce the amount of inter-node communication and the associated overheads, the nodes exchange messages with only a small subset of their single-hop neighbors. Under this communication scheme, we separately analyze the convergence of the underlying Alternating Direction Method of Multipliers (ADMM) iterations used in our proposed algorithm and establish their linear convergence rate. The findings from the convergence analysis of the decentralized ADMM are used to accelerate the convergence of the proposed CB-DSBL algorithm. Using Monte Carlo simulations, we demonstrate the superior signal reconstruction as well as support recovery performance of our proposed algorithm compared to existing decentralized algorithms: DRL-1, DC-OMP and DCSP.
Decentralized Estimation, Distributed Compressive Sensing, Joint Sparsity, Sparse Bayesian Learning, Sensor Networks.
I Introduction
We consider the problem of in-network estimation of multiple joint-sparse vectors by a network of connected agents or processing nodes, using noisy and underdetermined linear measurements. Two or more vectors are called joint-sparse if, in addition to each vector being individually sparse (a vector is said to be sparse if only a small fraction of its coefficients are nonzero), their nonzero coefficients belong to a common index set. Joint sparsity occurs naturally in scenarios involving multiple agents trying to learn a sparse representation of a common physical phenomenon. Since the underlying physical phenomenon is the same for all the agents (with similar acquisition modalities), their individual sparse representations/model parameters tend to exhibit joint sparsity. In this work, we consider joint-sparse vectors which belong to the Type-2 Joint Sparse Model [2], or JSM-2, one of the three generative models for joint-sparse signals. JSM-2 signal vectors satisfy the property that their nonzero coefficients are uncorrelated within and across the vectors. JSM-2 has been successfully used in several applications such as cooperative spectrum sensing [3, 4], decentralized event detection [5, 6], multi-task compressive sensing [7] and MIMO channel estimation [8, 9, 10].
To further motivate the joint sparsity signal structure in a distributed setup, consider the problem of detection/classification of randomly occurring events in a field by multiple sensor nodes. Each sensor node employs a local dictionary, each column of which is the signature corresponding to one of the events that can potentially occur. In many cases, owing to the inability to accurately model the sensing process, the signature vectors are simply chosen to be past recordings at the sensor corresponding to stand-alone occurrences of the event, averaged across multiple experiments [6]. This procedure can result in a dictionary whose columns are highly correlated. Thus, for events occurring simultaneously, a noisy sensor recording might belong to multiple subspaces, each spanned by a different subset of columns of the local dictionary. In such a scenario, enforcing joint sparsity across the sensor nodes can resolve the ambiguity in selecting the correct subset of columns at each sensor node.
In this work, we consider a distributed setup where each individual joint-sparse vector is estimated by a distinct node in a network comprising multiple nodes, with each node having access to noisy and underdetermined linear measurements of its local sparse vector. By collaborating with each other, these nodes can exploit the underlying joint sparsity of their local sparse vectors to reduce the number of measurements required per node or improve the quality of their local signal estimates. In [2], it has been shown that the number of local measurements required for common support recovery can be dramatically reduced by exploiting the joint sparsity structure prevalent across the network. In fact, as the number of nodes grows, exact signal reconstruction is possible from a number of measurements per node approaching the size of the common support set. Such a substantial reduction in the number of measurements is highly desirable, especially in applications where the cost or time required to acquire new measurements is high.
Distributed algorithms for JSM-2 signal recovery come in two flavors: centralized and decentralized. In the centralized approach, each node transmits its local measurements to a fusion center (FC) which runs a joint-sparse signal recovery algorithm. The FC then transmits the reconstructed sparse signal estimates back to their respective nodes. In contrast, in a decentralized approach, the goal is to obtain the same solution as the centralized scheme at all nodes by allowing each node to exchange information with its single-hop neighbors in addition to processing its local measurements. Besides being inherently robust to node failures, decentralized schemes also tend to be more energy efficient, as the inter-node communication is restricted to relatively short ranges covering only single-hop communication links. In this work, we focus on the decentralized approach for solving the sparse signal recovery problem under the JSM-2 signal model.
I-A Related Work
In this subsection, we briefly summarize the existing centralized and decentralized algorithms for JSM-2 signal recovery. The earliest work on joint-sparse signal recovery considered extensions of recovery algorithms meant for the single measurement vector setup to the centralized multiple measurement vector (MMV) model [11], and demonstrated the significant performance gains achievable by exploiting the joint sparsity structure. MMV Basic Matching Pursuit (M-BMP), MMV Orthogonal Matching Pursuit (M-OMP) and MMV FOcal Underdetermined System Solver (M-FOCUSS), introduced in [11], belong to this category. In [12], joint sparsity was exploited for distributed encoding of multiple sparse signals. This work characterized joint-sparse signals as being generated according to one of three joint-sparse signal models (JSM-1, 2, 3). It also proposed a centralized greedy algorithm called Simultaneous Orthogonal Matching Pursuit (SOMP) [2] for JSM-2 recovery. In [13], the Alternating Direction Method for the MMV setup (ADM-MMV) was proposed, which uses a mixed-norm penalty to promote a joint-sparse solution. In [14], the multiple response sparse Bayesian learning (MSBL) algorithm was proposed as an MMV extension of the SBL algorithm [15]. Unlike the algorithms discussed earlier, MSBL adopts a probabilistic approach by seeking the maximum a posteriori (MAP) estimate of the JSM-2 signals. In MSBL, a joint-sparse solution is encouraged by assuming a joint sparsity inducing parameterized prior on the unknown sparse vectors, with the prior parameters learnt directly from the measurements. MSBL has been shown to outperform deterministic methods based on norm relaxation, such as M-BMP and M-FOCUSS [11], as well as greedy algorithms such as SOMP. AMP-MMV [16] is another Bayesian algorithm, which uses approximate message passing (AMP) to obtain marginalized conditional posterior distributions of the joint-sparse signals.
Owing to their low computational complexity, AMP based algorithms are suitable for recovering signals of large dimensions. However, they have been shown to converge only for large-dimensional and randomly constructed measurement matrices. Interested readers are referred to [17] for an excellent study comparing some of the aforementioned centralized JSM-2 signal recovery algorithms.
Among decentralized algorithms, collaborative orthogonal matching pursuit (DC-OMP) [18] and collaborative subspace pursuit (DCSP) [19] are greedy algorithms for JSM-2 signal recovery, and both are computationally very fast. However, as demonstrated later in this paper, they do not perform as well as regularization based methods, which induce joint sparsity in their solution by employing a suitable penalty or, indirectly, via a joint signal prior. Moreover, both DC-OMP and DCSP assume a priori knowledge of the size of the nonzero support set, which could be unknown or hard to estimate. Decentralized row-based LASSO (DR-LASSO) [20] is an iterative alternating minimization algorithm which optimizes a nonconvex objective with mixed-norm regularization to obtain a joint-sparse solution. The decentralized reweighted minimization algorithms DRL-1,2 [5] employ a nonconvex sum-log-sum penalty to promote a joint-sparse solution. Although nonconvex regularizers induce sparsity much more strongly than convex norm based regularizers [21], the resulting nonconvex optimization can be difficult to solve efficiently. In DRL-1/2, the nonconvex objective is replaced by a surrogate convex function constructed from iteration-dependent weighted $\ell_1$/$\ell_2$ norm terms. Using a nonconvex sum-log-sum regularization results in a sparser solution than the convex regularization used in DR-LASSO. However, both DR-LASSO and DRL-1,2 necessitate cross-validation to tune the amount of regularization needed for optimal support recovery performance. DRL-1,2 also requires proper tuning of a so-called smoothing parameter and an ADMM parameter for optimal performance. By employing a Bayesian approach, we can eliminate the need for cross-validation altogether, by learning the parameters of a family of signal priors such that the selected signal prior has maximum Bayesian evidence.
DCS-AMP [3] is one such decentralized algorithm, which employs approximate message passing to learn a parameterized joint sparsity inducing Bernoulli-Gaussian signal prior. Turbo Bayesian Compressive Sensing (Turbo-BCS) [22], another decentralized algorithm, adopts a more relaxed zero-mean Gaussian signal prior, with the variance hyperparameters themselves distributed according to an exponential distribution. This relaxation of the signal prior results in improved MSE without compromising the sparsity of the solution. Turbo-BCS, however, involves direct exchange of signal estimates between the nodes, which renders it unsuitable for applications where it is necessary to preserve the privacy of the local signals.
TABLE I: Comparison of decentralized joint-sparse signal recovery algorithms.

Decentralized algorithm | Privacy of local signal estimates | Tunable parameters (if any) | Assumes a priori knowledge of sparsity level
DCSP [19] | Yes | None | Yes
DC-OMP [18] | Yes | None | Yes
DRL-1 [5] | Yes | Yes | No
DR-LASSO [20] | Yes | Yes | No
Turbo-BCS [22] | No | None | No
DCS-AMP [3] | Yes | Yes | No
CB-DSBL (proposed) | Yes | None | No

The table also compares the per-node, per-iteration computational and communication complexities of the algorithms, stated in terms of: the dimension of the unknown sparse vector, the number of local measurements per node, the number of nonzero coefficients in the true support, the network size, the maximum number of communication links activated per node per communication round, and the number of inner-loop ADMM iterations executed per CB-DSBL iteration. The last of these is also the number of ADMM iterations used to obtain an inexact solution to the weighted-norm subproblem in the inner loop of DRL-1; analogous counts of the two different inner-loop iterations executed per DR-LASSO iteration also appear.
I-B Contributions
Our main contributions in this work are as follows:

We propose a novel decentralized, iterative, Bayesian joint-sparse signal recovery algorithm called Consensus Based Distributed Sparse Bayesian Learning, or CB-DSBL. CB-DSBL works by establishing network-wide consensus with respect to the estimated parameters of a joint sparsity inducing signal prior. The learnt signal prior is subsequently used by the individual nodes to obtain MAP estimates of their local sparse signal vectors. The proposed CB-DSBL algorithm does not require direct exchange of either local measurements or signal estimates between the nodes and hence is well suited for applications where it is important to preserve the privacy of the local signal coefficients.

The proposed algorithm employs the Alternating Direction Method of Multipliers (ADMM) to solve a series of iteration-dependent consensus optimization problems which require the nodes to exchange messages with each other. To reduce the associated communication overheads, we adopt a bandwidth efficient inter-node communication scheme. This scheme entails each node exchanging messages with only a pre-designated subset of its single-hop neighbors known as bridge nodes, as motivated in [23]. By selecting these bridge nodes, one can trade off between the communication bandwidth requirements and the ADMM's robustness to node failures. In this connection, we analytically establish the relationship between the selected set of bridge nodes and the convergence rate of the ADMM iterations. For the bridge-node based inter-node communication scheme, we show a linear rate of convergence for the ADMM iterations when applied to a generic consensus optimization problem. The analysis yields a closed form expression for the tunable parameter of our proposed joint-sparse signal recovery algorithm, ensuring its fast convergence.

We empirically demonstrate the superior MSE and support recovery performance of CB-DSBL in comparison to existing decentralized algorithms: DRL-1, DC-OMP and DCSP.
In Table I, we compare the existing decentralized joint-sparse signal recovery schemes with respect to their per-iteration computational and communication complexity, privacy of local estimates, presence or absence of tunable parameters, and dependence on prior knowledge of the sparsity level. As highlighted in Table I, CB-DSBL belongs to a handful of decentralized algorithms for joint-sparse signal recovery which do not require a priori knowledge of the sparsity level, rely only on single-hop communication, and do not involve direct exchange of local signal estimates between network nodes. Besides this, unlike loopy Belief Propagation (BP) or Approximate Message Passing (AMP) based Bayesian algorithms, CB-DSBL does not suffer from convergence issues even when the local measurement matrix at each node is dense or not randomly constructed.
The rest of this paper is organized as follows. Section II describes the system model and the problem statement of distributed JSM-2 signal recovery. Section III discusses centralized MSBL [14] adapted to our setup and sets the stage for our proposed decentralized solution. Section IV develops the proposed CB-DSBL algorithm, along with a detailed discussion of the convergence properties of the underlying ADMM iterations; other implementation specific issues are also discussed. Section V compares the performance of the proposed algorithm with that of existing algorithms with respect to various performance metrics. Finally, Section VI concludes the paper.
Notation: Boldface lowercase and uppercase letters are used to denote vectors and matrices, respectively. Script-styled letters (for example, $\mathcal{S}$) denote sets, and $|\mathcal{S}|$ denotes the cardinality of the set $\mathcal{S}$. Subscripts on a vector indicate the node it is associated with, and superscripts indicate the iteration/time index. The superscript $T$ denotes the transpose operation. For matrices $\mathbf{A}$ and $\mathbf{B}$ of sizes $m \times n$ and $p \times q$, respectively, $\mathbf{A} \otimes \mathbf{B}$ denotes their Kronecker product, which is of size $mp \times nq$. $\mathcal{N}(\boldsymbol{\mu}, \boldsymbol{\Sigma})$ denotes the Gaussian distribution with mean $\boldsymbol{\mu}$ and covariance matrix $\boldsymbol{\Sigma}$. $\mathbb{E}[x \mid y]$ denotes the expectation of the random variable $x$ conditioned on the random variable $y$.
II Distributed JSM-2 System Model
We consider a network of $L$ nodes/sensors described by a bidirectional graph $\mathcal{G} = (\mathcal{V}, \mathcal{E})$. $\mathcal{V}$ is the set of vertices in $\mathcal{G}$, each vertex representing a node in the network. The set $\mathcal{E}$ contains the edges in $\mathcal{G}$, each edge representing a single-hop, error-free communication link between a distinct pair of nodes. Each node $j \in \mathcal{V}$ is interested in estimating an unknown sparse vector $\mathbf{x}_j$ from locally acquired noisy linear measurements $\mathbf{y}_j$. The generative model of the local measurement vector at node $j$ is given by
$$\mathbf{y}_j = \mathbf{A}_j \mathbf{x}_j + \mathbf{w}_j, \qquad (1)$$
where $\mathbf{A}_j$ is a full rank sensing matrix and $\mathbf{w}_j$ is the measurement noise, modeled as zero mean Gaussian distributed with covariance matrix $\sigma_j^2 \mathbf{I}$. The sparse vectors $\mathbf{x}_1, \dots, \mathbf{x}_L$ at the different nodes follow the JSM-2 signal model [12]. This implies that all of them share a common support, represented by an index set $\mathcal{S}$. From the JSM-2 model, it also follows that the nonzero coefficients of the sparse vectors are independent within and across the vectors.
The goal is to recover the local sparse vectors at their respective nodes using decentralized processing. In addition to processing its local data $(\mathbf{y}_j, \mathbf{A}_j)$, each node must collaborate with its single-hop neighboring nodes to exploit the network-wide joint sparsity of the unknown sparse vectors. For the sake of privacy, the nodes are prohibited from directly exchanging their local measurements or local signal estimates. Finally, the decentralized algorithm should be able to generate the centralized solution at each node, as if each node had access to the entire global information, i.e., $\{\mathbf{y}_j, \mathbf{A}_j\}_{j=1}^{L}$.
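As a concrete illustration of the JSM-2 measurement model above, the following sketch generates jointly sparse vectors with a common support and the corresponding noisy per-node measurements. All dimensions, variable names and the equal-noise-variance choice are illustrative assumptions, not values from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)
L, N, M, K = 5, 50, 15, 4        # network size, signal dim, measurements/node, support size

# Common support shared by all nodes -- the defining JSM-2 property
support = rng.choice(N, size=K, replace=False)

A, x, y = [], [], []
sigma2 = 1e-4                    # local noise variance (taken equal at all nodes here)
for j in range(L):
    xj = np.zeros(N)
    xj[support] = rng.standard_normal(K)     # nonzeros independent within and across nodes
    Aj = rng.standard_normal((M, N))         # local sensing matrix (full rank w.h.p.)
    yj = Aj @ xj + np.sqrt(sigma2) * rng.standard_normal(M)
    A.append(Aj); x.append(xj); y.append(yj)
```

Note that each node's nonzero values differ, but the index set of nonzeros is identical across the network, which is exactly the structure the decentralized algorithm exploits.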
III Centralized Algorithm for JSM-2
In this section, we briefly recall the centralized MSBL algorithm [14] for JSM-2 signal recovery and extend it to support distinct measurement matrices and noise variances at each node. The centralized algorithm runs at an FC, which is assumed to have complete knowledge of the network-wide information $\{\mathbf{y}_j, \mathbf{A}_j\}_{j=1}^{L}$. For ease of notation, we introduce the collections $\mathbf{Y} = \{\mathbf{y}_j\}_{j=1}^{L}$ and $\mathcal{X} = \{\mathbf{x}_j\}_{j=1}^{L}$, to be used in the sequel.
Similar to MSBL, each of the sparse vectors is assumed to be distributed according to the parameterized signal prior
$$p(\mathbf{x}_j; \boldsymbol{\gamma}) = \prod_{i=1}^{N} \mathcal{N}\!\left(x_j(i); 0, \gamma_i\right). \qquad (2)$$
Further, the joint signal prior is assumed to be given by
$$p(\mathcal{X}; \boldsymbol{\gamma}) = \prod_{j=1}^{L} p(\mathbf{x}_j; \boldsymbol{\gamma}). \qquad (3)$$
In the above, $\boldsymbol{\gamma}$ is an $N$-dimensional hyperparameter vector whose $i$-th entry, $\gamma_i$, models the common variance of $x_j(i)$ for $j = 1, \dots, L$. Since the signal priors are parameterized by a common $\boldsymbol{\gamma}$, if $\boldsymbol{\gamma}$ has a sparse support $\mathcal{S}$, then the MAP estimates of $\mathbf{x}_1, \dots, \mathbf{x}_L$ will also be jointly sparse with the same common support $\mathcal{S}$. The Gaussian prior in (2) promotes sparsity, as it has an alternate interpretation as a parameterized model for the family of variational approximations to a sparsity inducing Student's t-distributed prior [24]. Under this interpretation, finding the hyperparameter vector which maximizes the likelihood is equivalent to finding the variational approximation with the largest Bayesian evidence.
Let $\hat{\boldsymbol{\gamma}}_{\mathrm{ML}}$ denote the maximum likelihood (ML) estimate of the hyperparameters of the joint source prior:
$$\hat{\boldsymbol{\gamma}}_{\mathrm{ML}} = \arg\max_{\boldsymbol{\gamma} \succeq \mathbf{0}} \; p(\mathbf{Y}; \boldsymbol{\gamma}), \qquad (4)$$
where $p(\mathbf{Y}; \boldsymbol{\gamma})$ is a type-2 likelihood function obtained by marginalizing the joint density with respect to the unknown vectors in $\mathcal{X}$, i.e.,
$$p(\mathbf{Y}; \boldsymbol{\gamma}) = \prod_{j=1}^{L} \int p(\mathbf{y}_j \mid \mathbf{x}_j)\, p(\mathbf{x}_j; \boldsymbol{\gamma})\, d\mathbf{x}_j = \prod_{j=1}^{L} \mathcal{N}\!\left(\mathbf{y}_j; \mathbf{0}, \, \sigma_j^2 \mathbf{I} + \mathbf{A}_j \boldsymbol{\Gamma} \mathbf{A}_j^{T}\right). \qquad (5)$$
Here, $\boldsymbol{\Gamma} = \mathrm{diag}(\boldsymbol{\gamma})$. We note that $\hat{\boldsymbol{\gamma}}_{\mathrm{ML}}$ cannot be derived in closed form by directly maximizing the likelihood in (5) with respect to $\boldsymbol{\gamma}$. Hence, as suggested in the SBL framework [15], we use the expectation maximization (EM) procedure to maximize $p(\mathbf{Y}; \boldsymbol{\gamma})$ by treating $\mathcal{X}$ as hidden variables.
We now discuss the main steps of the EM algorithm used to obtain $\hat{\boldsymbol{\gamma}}_{\mathrm{ML}}$. Let $q(\mathcal{X})$ denote the variational approximation of the true conditional density $p(\mathcal{X} \mid \mathbf{Y}; \boldsymbol{\gamma})$, with variational parameters $\{\boldsymbol{\mu}_j, \boldsymbol{\Sigma}_j\}$ representing the conditional mean and covariance of $\mathbf{x}_j$ given $\mathbf{y}_j$. Then, as shown in [25], the log likelihood admits the following decomposition:
$$\log p(\mathbf{Y}; \boldsymbol{\gamma}) = \mathbb{E}_{q}\!\left[\log \frac{p(\mathbf{Y}, \mathcal{X}; \boldsymbol{\gamma})}{q(\mathcal{X})}\right] + \mathrm{KL}\!\left(q(\mathcal{X}) \,\|\, p(\mathcal{X} \mid \mathbf{Y}; \boldsymbol{\gamma})\right), \qquad (6)$$
where the second term is the Kullback-Leibler (KL) divergence between the probability densities $q(\mathcal{X})$ and $p(\mathcal{X} \mid \mathbf{Y}; \boldsymbol{\gamma})$. From the nonnegativity of the KL divergence [26], the log likelihood is lower bounded by the first term on the RHS. In the E-step, we choose $q$ to make this variational lower bound tight by minimizing the KL divergence term:
$$q^{(k)} = \arg\min_{q} \; \mathrm{KL}\!\left(q(\mathcal{X}) \,\|\, p(\mathcal{X} \mid \mathbf{Y}; \boldsymbol{\gamma}^{(k)})\right). \qquad (7)$$
Here, $k$ denotes the iteration index of the EM algorithm. From LMMSE theory, $p(\mathbf{x}_j \mid \mathbf{y}_j; \boldsymbol{\gamma}^{(k)})$ is Gaussian with covariance and mean given by
$$\boldsymbol{\Sigma}_j^{(k)} = \left(\frac{1}{\sigma_j^2} \mathbf{A}_j^{T} \mathbf{A}_j + \left(\boldsymbol{\Gamma}^{(k)}\right)^{-1}\right)^{-1}, \qquad \boldsymbol{\mu}_j^{(k)} = \frac{1}{\sigma_j^2}\, \boldsymbol{\Sigma}_j^{(k)} \mathbf{A}_j^{T} \mathbf{y}_j. \qquad (8)$$
By choosing $\boldsymbol{\mu}_j = \boldsymbol{\mu}_j^{(k)}$ and $\boldsymbol{\Sigma}_j = \boldsymbol{\Sigma}_j^{(k)}$, the KL divergence term in (7) can be driven to its minimum value of zero.
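In code, the E-step at each node reduces to a standard Gaussian posterior computation. The sketch below follows the notation of (8); the function name, the numerical guard on tiny variances, and the explicit matrix inverse (rather than a solver) are my own illustrative choices:

```python
import numpy as np

def sbl_e_step(y_j, A_j, gamma, sigma2_j):
    """E-step at node j: posterior mean and covariance of x_j given y_j,
    under the prior x_j ~ N(0, diag(gamma)), cf. (8)."""
    Gamma_inv = np.diag(1.0 / np.maximum(gamma, 1e-12))   # guard against gamma_i -> 0
    Sigma_j = np.linalg.inv(A_j.T @ A_j / sigma2_j + Gamma_inv)
    mu_j = Sigma_j @ A_j.T @ y_j / sigma2_j
    return mu_j, Sigma_j
```

When a hyperparameter $\gamma_i$ is driven toward zero during the EM iterations, the corresponding entry of the posterior mean is also driven to zero, which is the mechanism that produces sparse MAP estimates on convergence.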
In the M-step, we choose $\boldsymbol{\gamma}$ to maximize the tight variational lower bound obtained in the E-step:
$$\boldsymbol{\gamma}^{(k+1)} = \arg\max_{\boldsymbol{\gamma} \succeq \mathbf{0}} \; \mathbb{E}_{q^{(k)}}\!\left[\log p(\mathbf{Y}, \mathcal{X}; \boldsymbol{\gamma})\right]. \qquad (9)$$
As shown in Appendix A, the optimization problem (9) can be recast as the following minimization problem:
$$\boldsymbol{\gamma}^{(k+1)} = \arg\min_{\boldsymbol{\gamma} \succeq \mathbf{0}} \; \sum_{j=1}^{L} \sum_{i=1}^{N} \left(\log \gamma_i + \frac{\boldsymbol{\mu}_j^{(k)}(i)^2 + \boldsymbol{\Sigma}_j^{(k)}(i,i)}{\gamma_i}\right). \qquad (10)$$
From the zero gradient optimality condition in (10), the M-step reduces to the following update rule:
$$\gamma_i^{(k+1)} = \frac{1}{L} \sum_{j=1}^{L} \left(\boldsymbol{\mu}_j^{(k)}(i)^2 + \boldsymbol{\Sigma}_j^{(k)}(i,i)\right), \quad i = 1, \dots, N. \qquad (11)$$
By repeatedly iterating between the E-step (8) and the M-step (11), the EM algorithm converges to either a local maximum or a saddle point of $p(\mathbf{Y}; \boldsymbol{\gamma})$ [27]. Once $\hat{\boldsymbol{\gamma}}_{\mathrm{ML}}$ is obtained, the MAP estimate of $\mathbf{x}_j$ is evaluated by substituting it in the expression for $\boldsymbol{\mu}_j$ in (8). It is observed that when the EM algorithm converges, the $\gamma_i$'s belonging to the inactive support tend to zero, resulting in sparse MAP estimates. In practice, hard thresholding of $\boldsymbol{\gamma}$ is required to identify the nonzero support set. In this work, we remove from the active support set all coefficients for which $\gamma_i$ is below the local noise variance. It must be noted that if the local noise variance at each node is unknown, it can be estimated along with $\boldsymbol{\gamma}$ within the EM framework, as discussed in [14].
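Putting the E-step (8) and the M-step (11) together gives the centralized EM loop that the decentralized algorithm later mimics. The following is a minimal runnable sketch under the same notational assumptions as before (names, iteration count and the lack of a stopping test are mine):

```python
import numpy as np

def msbl_centralized(ys, As, sigma2s, n_iter=150):
    """Centralized EM for the common hyperparameter vector gamma (MSBL-style)."""
    L, N = len(ys), As[0].shape[1]
    gamma = np.ones(N)
    for _ in range(n_iter):
        second_moments = np.zeros(N)
        for y_j, A_j, s2 in zip(ys, As, sigma2s):
            # E-step: posterior moments of x_j, cf. (8)
            Gamma_inv = np.diag(1.0 / np.maximum(gamma, 1e-12))
            Sigma_j = np.linalg.inv(A_j.T @ A_j / s2 + Gamma_inv)
            mu_j = Sigma_j @ A_j.T @ y_j / s2
            second_moments += mu_j**2 + np.diag(Sigma_j)
        # M-step (11): average of posterior second moments across the L nodes
        gamma = second_moments / L
    return gamma
```

On JSM-2 data, the learnt $\boldsymbol{\gamma}$ concentrates on the common support, and thresholding it against the noise variance recovers the support, as described above.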
IV Decentralized Algorithm for JSM-2
IV-A Algorithm Development
In this section, we develop a decentralized version of the centralized algorithm discussed in the previous section. For notational convenience, we introduce an $N$-length vector $\mathbf{a}_j$ maintained at node $j$, with entries $\mathbf{a}_j(i) = \boldsymbol{\mu}_j(i)^2 + \boldsymbol{\Sigma}_j(i,i)$, where $\boldsymbol{\mu}_j$ and $\boldsymbol{\Sigma}_j$ are as defined in (8).
From (11), we observe that the solution of the M-step optimization (10) can be interpreted as the average of the vectors $\mathbf{a}_1, \dots, \mathbf{a}_L$. The same solution can also be obtained by solving a different minimization problem:
$$\boldsymbol{\gamma}^{(k+1)} = \arg\min_{\boldsymbol{\gamma}} \; \sum_{j=1}^{L} \left\|\boldsymbol{\gamma} - \mathbf{a}_j\right\|_2^2. \qquad (12)$$
Unlike the nonconvex M-step objective function in (10), the surrogate objective function in (12) is convex in $\boldsymbol{\gamma}$ and can therefore be minimized in a distributed manner using powerful convex optimization techniques. An alternate form of (12), amenable to distributed optimization, is given by
$$\min_{\{\boldsymbol{\gamma}_j\}} \; \sum_{j=1}^{L} \left\|\boldsymbol{\gamma}_j - \mathbf{a}_j\right\|_2^2 \quad \text{subject to} \quad \boldsymbol{\gamma}_j = \boldsymbol{\gamma}_k \;\; \forall\, k \in \mathcal{N}_j, \; j = 1, \dots, L, \qquad (13)$$
where $\mathcal{N}_j$ denotes the set of single-hop neighbors of node $j$. The equality constraints in (13) ensure its equivalence to the unconstrained optimization in (12). Here, the number of equality constraints is equal to $|\mathcal{E}|$, i.e., the total number of single-hop links in the network. In a conventional decentralized implementation of (13), the number of messages exchanged between the nodes grows linearly with the number of consensus constraints. By restricting the nodes to exchange information only through a relatively small set of pre-designated nodes called bridge nodes, the number of consensus constraints can be drastically reduced without affecting the equivalence of (12) and (13). Let $\mathcal{B}$ denote the set of all bridge nodes in the network and $\mathcal{B}_j$ denote the set of bridge nodes belonging to the single-hop neighborhood of node $j$; then (13) can be rewritten as
$$\min_{\{\boldsymbol{\gamma}_j\}, \{\mathbf{z}_b\}} \; \sum_{j=1}^{L} \left\|\boldsymbol{\gamma}_j - \mathbf{a}_j\right\|_2^2 \quad \text{subject to} \quad \boldsymbol{\gamma}_j = \mathbf{z}_b \;\; \forall\, b \in \mathcal{B}_j, \; j = 1, \dots, L. \qquad (14)$$
The auxiliary variables $\mathbf{z}_b$, called bridge parameters, are used to establish consensus among the $\boldsymbol{\gamma}_j$'s. Each bridge parameter $\mathbf{z}_b$ is a non-negative $N$-length vector maintained by the bridge node $b \in \mathcal{B}$. As motivated in [23], [28], using bridge nodes to impose network-wide consensus allows us to trade off between the communication cost and the robustness of the distributed optimization algorithm.²
²In an alternate embodiment of the proposed algorithm, the message exchanges could be restricted to occur only through the (trustworthy) bridge nodes, thereby avoiding direct communication between the nodes. In this case, the role of the bridge nodes would be to enforce consensus across the nodes, and these nodes need not directly participate in signal reconstruction.
The following Lemma provides sufficient conditions on the choice of the bridge node set under which (12) and (14) are equivalent. The proof for the Lemma can be found in [23].
Lemma 1.
For a connected graph $\mathcal{G}$, if the bridge node set $\mathcal{B}$ satisfies the following conditions:

1. each node is connected to at least one bridge node, i.e., $\mathcal{B}_j \neq \emptyset$ for every $j \in \mathcal{V}$; and

2. if two nodes $j$ and $k$ are single-hop neighbors, then $\mathcal{B}_j \cap \mathcal{B}_k \neq \emptyset$,

then, in the solution to (14), the $\boldsymbol{\gamma}_j$'s are equal for all $j \in \mathcal{V}$.
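The two conditions of Lemma 1 are straightforward to check programmatically. The sketch below encodes my reading of the lemma (whose precise set notation was lost in extraction): every node must see at least one bridge node in its closed single-hop neighborhood, and any two neighboring nodes must share a common bridge node:

```python
def bridge_neighborhoods(adj, bridges):
    """B_j: bridge nodes in the closed single-hop neighborhood of node j.
    adj maps each node to its set of single-hop neighbors."""
    return {j: {b for b in bridges if b == j or b in adj[j]} for j in adj}

def satisfies_lemma1(adj, bridges):
    B = bridge_neighborhoods(adj, bridges)
    if any(len(B[j]) == 0 for j in adj):      # condition 1: no node is bridge-less
        return False
    for j in adj:
        for k in adj[j]:                       # condition 2: neighbors share a bridge
            if not (B[j] & B[k]):
                return False
    return True
```

For example, on a path graph 1-2-3, the set {2} is a valid bridge set (node 2 covers everyone), whereas {1} is not, since node 3 sees no bridge node.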
Fig. 1 illustrates the selection of bridge nodes according to Lemma 1 in a sample network. In this work, we employ the Alternating Direction Method of Multipliers (ADMM) [29] to solve the convex optimization problem in (14). ADMM is a state-of-the-art dual ascent algorithm for solving constrained convex optimization problems, offering a linear convergence rate and a natural extension to decentralized implementations.
We start by constructing the augmented Lagrangian $L_\rho$, given by
$$L_\rho\!\left(\{\boldsymbol{\gamma}_j\}, \{\mathbf{z}_b\}, \{\boldsymbol{\lambda}_{jb}\}\right) = \sum_{j=1}^{L} \left\|\boldsymbol{\gamma}_j - \mathbf{a}_j\right\|_2^2 + \sum_{j=1}^{L} \sum_{b \in \mathcal{B}_j} \left(\boldsymbol{\lambda}_{jb}^{T}\!\left(\boldsymbol{\gamma}_j - \mathbf{z}_b\right) + \frac{\rho}{2} \left\|\boldsymbol{\gamma}_j - \mathbf{z}_b\right\|_2^2\right), \qquad (15)$$
where $\boldsymbol{\lambda}_{jb}$ denotes the $N$-sized Lagrange multiplier vector corresponding to the equality constraint $\boldsymbol{\gamma}_j = \mathbf{z}_b$, and $\rho$ is a positive scalar which weights the quadratic consensus penalty term. For ease of notation, we define the concatenated vectors $\boldsymbol{\gamma} = [\boldsymbol{\gamma}_1^T, \dots, \boldsymbol{\gamma}_L^T]^T$ and $\mathbf{z} = [\mathbf{z}_b^T]_{b \in \mathcal{B}}^T$, to be used in the sequel. We also define the concatenated Lagrange multiplier vector $\boldsymbol{\lambda} \in \mathbb{R}^{m_c N}$, where $m_c$ is the number of equality constraints in (14). The solution to (14) is then obtained by executing the following ADMM iterations until convergence:
$$\{\boldsymbol{\gamma}_j^{t+1}\} = \arg\min_{\{\boldsymbol{\gamma}_j\}} \; L_\rho\!\left(\{\boldsymbol{\gamma}_j\}, \{\mathbf{z}_b^{t}\}, \{\boldsymbol{\lambda}_{jb}^{t}\}\right), \qquad (16)$$
$$\{\mathbf{z}_b^{t+1}\} = \arg\min_{\{\mathbf{z}_b\}} \; L_\rho\!\left(\{\boldsymbol{\gamma}_j^{t+1}\}, \{\mathbf{z}_b\}, \{\boldsymbol{\lambda}_{jb}^{t}\}\right), \qquad (17)$$
$$\boldsymbol{\lambda}_{jb}^{t+1} = \boldsymbol{\lambda}_{jb}^{t} + \rho\left(\boldsymbol{\gamma}_j^{t+1} - \mathbf{z}_b^{t+1}\right), \quad b \in \mathcal{B}_j, \; j = 1, \dots, L. \qquad (18)$$
Here, $t$ denotes the ADMM iteration index. In (16)-(17), the primal variables $\boldsymbol{\gamma}_j$ and $\mathbf{z}_b$ are updated in a Gauss-Seidel fashion by minimizing the augmented Lagrangian $L_\rho$ evaluated at the previous estimate of the dual variables. Because of the extra quadratic penalty term added to the original Lagrangian, the objective in (17) is no longer affine in $\mathbf{z}$ and hence has a bounded minimizer. The dual variables are updated via the gradient ascent step (18), with step size equal to the ADMM parameter $\rho$. This particular choice of step size ensures the dual feasibility of the iterates for all $t \geq 1$. Since the augmented Lagrangian is strictly convex with respect to $\boldsymbol{\gamma}$ and $\mathbf{z}$ individually, the zero gradient optimality conditions for (16) and (17) translate into simple update equations for $\boldsymbol{\gamma}_j$ and $\mathbf{z}_b$:
$$\boldsymbol{\gamma}_j^{t+1} = \frac{2\,\mathbf{a}_j + \sum_{b \in \mathcal{B}_j}\left(\rho\, \mathbf{z}_b^{t} - \boldsymbol{\lambda}_{jb}^{t}\right)}{2 + \rho\, |\mathcal{B}_j|}, \qquad (19)$$
$$\mathbf{z}_b^{t+1} = \frac{1}{|\mathcal{N}_b|} \sum_{j \in \mathcal{N}_b} \left(\boldsymbol{\gamma}_j^{t+1} + \frac{1}{\rho}\, \boldsymbol{\lambda}_{jb}^{t}\right). \qquad (20)$$
Here, $\mathcal{N}_b$ denotes the set of nodes connected to bridge node $b$. As shown in Appendix B, by eliminating the Lagrange multiplier terms from (18) and (20), the update rule for $\mathbf{z}_b$ can be further simplified to
$$\mathbf{z}_b^{t+1} = \frac{1}{|\mathcal{N}_b|} \sum_{j \in \mathcal{N}_b} \boldsymbol{\gamma}_j^{t+1}. \qquad (21)$$
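The updates (18)-(20) for the quadratic per-node objective of (12) can be sketched as below. This is my own minimal implementation of the scheme as described (it keeps the Lagrange multipliers explicit rather than using the simplified bridge update (21)); the data structures and variable names are illustrative:

```python
import numpy as np

def mstep_admm(a, bridge_nbrs, node_nbrs, rho=1.0, n_iter=300):
    """Decentralized ADMM for min sum_j ||gamma_j - a_j||^2 s.t. gamma_j = z_b, b in B_j.
    a: {node: local vector a_j}; bridge_nbrs: {node: set of its bridge nodes B_j};
    node_nbrs: {bridge: set of nodes N_b connected to it}."""
    N = len(next(iter(a.values())))
    gamma = {j: a[j].copy() for j in a}
    z = {b: np.zeros(N) for b in node_nbrs}
    lam = {(j, b): np.zeros(N) for j in a for b in bridge_nbrs[j]}
    for _ in range(n_iter):
        for j in a:                              # gamma-update, cf. (19)
            s = sum(rho * z[b] - lam[(j, b)] for b in bridge_nbrs[j])
            gamma[j] = (2.0 * a[j] + s) / (2.0 + rho * len(bridge_nbrs[j]))
        for b in node_nbrs:                      # bridge update, cf. (20)
            z[b] = np.mean([gamma[j] + lam[(j, b)] / rho for j in node_nbrs[b]], axis=0)
        for j in a:                              # dual ascent, cf. (18)
            for b in bridge_nbrs[j]:
                lam[(j, b)] += rho * (gamma[j] - z[b])
    return gamma
```

On a star network with the hub acting as the single bridge node, every local iterate converges to the network-wide average of the $\mathbf{a}_j$'s, i.e., to the centralized M-step solution (11).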
In Section IV-F, we compare the bridge node based ADMM discussed above with other decentralized optimization techniques available in the literature. We show empirically that the bridge node based ADMM scheme is able to flexibly trade off between communication complexity, robustness to node failures, speed of convergence, and signal reconstruction performance.
IV-B CB-DSBL Algorithm
We now propose the CB-DSBL algorithm. Essentially, it is a decentralized EM algorithm for finding the ML estimate of the hyperparameters $\boldsymbol{\gamma}$. The algorithm comprises two nested loops. In the outer loop, each node performs the E-step (8) in a stand-alone manner. In the inner loop, ADMM iterations are performed to solve the M-step optimization in a decentralized manner. Upon convergence of the outer loop, each node has the same ML estimate of $\boldsymbol{\gamma}$, which is then used to obtain a MAP estimate of the local sparse vector $\mathbf{x}_j$, as in the centralized algorithm. The steps of the CB-DSBL algorithm are detailed in Algorithm 1.
Algorithm 1: CB-DSBL (at node $j$)
Initialize $\boldsymbol{\gamma}^{(0)}$, $\mathbf{z}_b^{0}$, $\boldsymbol{\lambda}_{jb}^{0}$.
while not converged do
  E-step: compute $\boldsymbol{\mu}_j$, $\boldsymbol{\Sigma}_j$ via (8) and form $\mathbf{a}_j$.
  M-step: for each of the inner-loop ADMM iterations:
    update $\boldsymbol{\gamma}_j$ via (19) and transmit it to the bridge nodes in $\mathcal{B}_j$;
    each bridge node $b$ updates $\mathbf{z}_b$ via (21) and transmits it to the nodes in $\mathcal{N}_b$;
    update $\boldsymbol{\lambda}_{jb}$ via (18).
end
Each ADMM iteration in the M-step of the CB-DSBL algorithm involves two rounds of communication between the nodes. In the first communication round, each node transmits its updated $\boldsymbol{\gamma}_j$ to its single-hop bridge nodes. In the second communication round, each bridge node transmits its updated $\mathbf{z}_b$ to its single-hop neighbors. Thus, in each ADMM iteration of the M-step, an $N$-length vector of real numbers is exchanged in each direction between the nodes and their respective bridge nodes. In Fig. 2, we compare different variants of CB-DSBL with respect to the average number of inter-node message exchanges required to reach a target signal reconstruction error. From the figure, it is evident that the aforementioned bridge node based ADMM technique is effective in reducing the overall inter-node communication and the associated costs, without compromising signal reconstruction performance. One way of selecting the bridge node set is to sort the nodes in decreasing order of their nodal degrees and retain the smallest number of top nodes satisfying the conditions in Lemma 1. Although suboptimal, this scheme significantly reduces the overall communication complexity of the algorithm, as demonstrated empirically in Fig. 2. In Section IV-D, a rule of thumb policy is discussed for selecting the bridge nodes so as to ensure fast convergence of the decentralized ADMM iterations in the M-step of the CB-DSBL algorithm.
Further reduction in inter-node communication is possible by executing only a finite number of ADMM iterations per M-step. In a practical embodiment of the algorithm, running a single ADMM iteration per M-step is sufficient for CB-DSBL to converge. As shown in Fig. 3, beyond two or three ADMM iterations per M-step, there is only a marginal improvement in the quality of the solution as well as in the convergence speed. Fig. 4 shows that even with a single ADMM iteration per M-step, CB-DSBL typically converges quite rapidly to the centralized solution.
IV-C Convergence of ADMM Iterations in the M-step
In this section, we analyze the convergence of the ADMM iterations (18), (19) and (21) derived for the M-step optimization in CB-DSBL. In doing so, we aim to highlight the effects of the bridge node set $\mathcal{B}$ and the augmented Lagrangian parameter $\rho$ on the convergence of the ADMM iterations.
ADMM has been a very popular choice for solving both convex [33, 5, 29, 31, 23] and, more recently, nonconvex [34] optimization problems in a distributed setup. In its classical form, ADMM solves the following constrained optimization problem:
$$\min_{\mathbf{x}, \mathbf{z}} \; f(\mathbf{x}) + g(\mathbf{z}) \quad \text{subject to} \quad \mathbf{A}\mathbf{x} + \mathbf{B}\mathbf{z} = \mathbf{c}, \qquad (22)$$
where $\mathbf{x}$ and $\mathbf{z}$ are the primal variables. The matrices $\mathbf{A}$, $\mathbf{B}$ and the vector $\mathbf{c}$ appearing in the linear equality constraint are of appropriate dimensions. The functions $f$ and $g$ are convex with respect to $\mathbf{x}$ and $\mathbf{z}$, respectively. In [35], the authors have shown a linear convergence rate for the classical ADMM iterations under the assumptions of strict convexity and Lipschitz continuous gradient on one of $f$ or $g$, along with full row rank assumptions on the constraint matrix. However, in the ADMM formulation of a decentralized consensus optimization problem, the coefficient matrix is seldom of full row rank. In [36], the full row rank condition was relaxed and a linear rate of convergence was established for decentralized ADMM iterations for a generic convex optimization with linear consensus constraints similar to (13). In [37], the convergence of ADMM for solving an average consensus problem was analyzed for both noiseless and noisy communication links. In both [36] and [37], the auxiliary primal variables (the entries of $\mathbf{z}$) have a one-to-one correspondence with the communication links between the network nodes. However, such a bijection is missing for the bridge variables used in our work for enforcing consensus between the primal variables. Due to this, the convergence results of [36, 37] are not directly applicable to our case. In the sequel, we present a convergence analysis of the decentralized ADMM iterations under the bridge node based inter-node communication scheme.
We start by defining block matrices $\mathbf{B}_1$ and $\mathbf{B}_2$ of sizes $m_c N \times LN$ and $m_c N \times |\mathcal{B}|N$, respectively, where $m_c$ is the number of equality constraints in (14). The block rows of $\mathbf{B}_1$ and $\mathbf{B}_2$ encode the equality constraints in (14): if the $c$-th equality constraint is $\boldsymbol{\gamma}_j = \mathbf{z}_b$, then the $(c, j)$-th $N \times N$ block of $\mathbf{B}_1$ is $\mathbf{I}_N$ and the $(c, b)$-th block of $\mathbf{B}_2$ is $-\mathbf{I}_N$, with the rest of the entries in the block row being zero. It can easily be shown that the minimum and maximum number of bridge nodes connected to any node in the network coincide with the minimum and maximum eigenvalues of $\mathbf{B}_1^{T}\mathbf{B}_1$, denoted by $\lambda_{\min}$ and $\lambda_{\max}$, respectively. Fig. 5 illustrates the construction of the block matrices $\mathbf{B}_1$ and $\mathbf{B}_2$ for an example network.
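The block matrix construction, and the eigenvalue claim that goes with it, are easy to verify numerically. The sketch below uses my own symbol names (the originals were lost in extraction); each constraint contributes one block row carrying an identity block in the corresponding node column and a negated identity block in the corresponding bridge column:

```python
import numpy as np

def build_block_matrices(constraints, n_nodes, n_bridges, N):
    """constraints: list of (j, b) pairs, one per equality gamma_j = z_b.
    Returns the two block matrices encoding the constraints of (14)."""
    m = len(constraints)
    B1 = np.zeros((m * N, n_nodes * N))
    B2 = np.zeros((m * N, n_bridges * N))
    for c, (j, b) in enumerate(constraints):
        B1[c*N:(c+1)*N, j*N:(j+1)*N] = np.eye(N)
        B2[c*N:(c+1)*N, b*N:(b+1)*N] = -np.eye(N)
    return B1, B2
```

The Gram matrix of the first block matrix is block diagonal, with the $j$-th block equal to $|\mathcal{B}_j| \cdot \mathbf{I}_N$; its extreme eigenvalues therefore equal the minimum and maximum number of bridge nodes attached to any node, as claimed in the text.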
Using the newly defined terms, the optimization problem in (14) can be rewritten compactly as
$$\min_{\boldsymbol{\gamma}, \mathbf{z}} \; f(\boldsymbol{\gamma}) \quad \text{subject to} \quad \mathbf{B}_1 \boldsymbol{\gamma} + \mathbf{B}_2 \mathbf{z} = \mathbf{0}, \qquad (23)$$
where $f(\boldsymbol{\gamma})$ denotes the objective function in (14), which depends only on $\boldsymbol{\gamma}$. The augmented Lagrangian corresponding to (23) can also be rewritten compactly as
$$L_\rho(\boldsymbol{\gamma}, \mathbf{z}, \boldsymbol{\lambda}) = f(\boldsymbol{\gamma}) + \boldsymbol{\lambda}^{T}\!\left(\mathbf{B}_1 \boldsymbol{\gamma} + \mathbf{B}_2 \mathbf{z}\right) + \frac{\rho}{2}\left\|\mathbf{B}_1 \boldsymbol{\gamma} + \mathbf{B}_2 \mathbf{z}\right\|_2^2. \qquad (24)$$
By construction, the block matrix $\mathbf{B}_2$ has full column rank, as all its columns are mutually disjoint in support. However, $\mathbf{B}_1$ can be row rank deficient due to repeated block rows caused by a node being connected to multiple bridge nodes, which is often the case. Since $\mathbf{B}_1$ is row rank deficient, the ADMM convergence results of [35] are not applicable to (23). Theorem 1 below summarizes the convergence of the ADMM iterations (18), (19) and (21) to their fixed point. The result in Theorem 1 holds for any $f$ that is strongly convex with strong convexity constant $m_f$ and has an $L_f$-Lipschitz continuous gradient.
Theorem 1.
Let , and denote the unique primal and dual optimal solutions of (23), and vector be constructed as (similarly for ). Then, it holds that

The sequence is Qlinearly^{3}^{3}3 A sequence is said to be Qlinearly convergent to if there exists such that [36]. convergent to , i.e.,
(25) where is evaluated as
(26) 
The primal sequence is Rlinearly^{4}^{4}4 A sequence is said to be Rlinearly convergent to if there exists a Qlinearly convergent sequence which converges to zero such that . convergent to , i.e.,
(27)
where is the weighted norm with respect to the diagonal matrix .
Proof.
See Appendix C. ∎
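As a quick numeric illustration of the two convergence notions appearing in Theorem 1, the sketch below contrasts a Qlinear sequence (constant per-iteration contraction) with an Rlinear one (merely bounded by a Qlinear envelope). The contraction factor used here is an arbitrary assumption, not the quantity defined in (26).

```python
import numpy as np

delta = 0.25  # assumed contraction factor; the paper's value comes from (26)
k = np.arange(20)

# Q-linear sequence: the gap shrinks by the *same* factor every iteration.
u_gap = (1.0 / (1.0 + delta)) ** k
ratios = u_gap[1:] / u_gap[:-1]
print(np.allclose(ratios, 1.0 / (1.0 + delta)))  # True

# R-linear sequence: bounded above by the Q-linear envelope, but its own
# successive ratios may fluctuate, as for the primal iterates in (27).
rng = np.random.default_rng(0)
x_gap = u_gap * rng.uniform(0.2, 1.0, size=k.size)
print(bool(np.all(x_gap <= u_gap)))  # True
```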
IVD Selection of the Augmented Lagrangian Parameter
From (25) and (27) in Theorem 1, we observe that to optimize the decay of the primal optimality gap between and in each ADMM iteration, the augmented Lagrangian parameter has to be chosen such that it maximizes in (26). Theorem 2 reveals the optimal value of and the corresponding value of .
Theorem 2.
The optimal value of the augmented Lagrangian parameter which uniquely maximizes the as defined in (26) is given by
(28) 
The corresponding maximal value of is given by
(29) 
where represents the condition number of the objective function in (14) and is the ratio of the maximum and minimum eigenvalues of .
Proof.
See Appendix E. ∎
From (29), we observe that the convergence rate of the ADMM iterations in the Mstep of the CBDSBL algorithm depends on two factors: and . A ratio close to its minimum value of unity results in faster convergence of the ADMM iterations. Since this ratio also equals the ratio of the maximum and minimum number of bridge nodes per node in the network, a rule of thumb for bridge node selection is to ensure that each node is connected to roughly the same number of bridge nodes. The convergence rate also depends on , a parameter that reflects how well conditioned the function is. For the case where is the objective function in (14), it is easy to show that and . Thus, specific to CBDSBL, the optimal ADMM parameter is given by and the corresponding . For a given network connectivity graph , this can be computed offline and programmed into each node. As shown in Fig. 6, the average MSE and the mean number of iterations vary widely with : an inappropriate choice of results in slow convergence and poor reconstruction performance. Also, the computed in (28) is very close to the value of that results in both the fastest convergence and the lowest average MSE.
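The sensitivity to the augmented Lagrangian parameter reported in Fig. 6 can be reproduced in miniature on a toy problem. The sketch below is not the CBDSBL Mstep; it is a generic scaled form consensus ADMM on scalar quadratics (an assumption made for brevity). Since each local objective has unit curvature, a moderate parameter value should need far fewer iterations than a very small or very large one.

```python
import numpy as np

def consensus_admm_iters(a, rho, tol=1e-8, max_iter=500):
    """Iterations needed by scaled-form consensus ADMM to solve
    min sum_i 0.5*(x_i - a_i)^2  s.t.  x_i = z for all i."""
    x = np.zeros_like(a)
    z = 0.0
    u = np.zeros_like(a)
    for k in range(1, max_iter + 1):
        x = (a + rho * (z - u)) / (1.0 + rho)  # local prox steps
        z = np.mean(x + u)                     # consensus (bridge-like) update
        u = u + x - z                          # dual update
        if np.max(np.abs(x - z)) < tol and abs(z - np.mean(a)) < tol:
            return k
    return max_iter

a = np.array([1.0, -2.0, 0.5, 3.0])
iters = {rho: consensus_admm_iters(a, rho) for rho in (0.01, 1.0, 100.0)}
# A badly scaled parameter (too small or too large) slows convergence markedly.
```

The same sweep, run on the actual Mstep objective, mirrors the offline tuning of the parameter described above.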
IVE Computational Complexity of CBDSBL
In this section, we discuss the computational complexity of the steps involved in a single iteration of the CBDSBL algorithm. The local Estep requires elementary operations at each node. The Mstep is executed as multiple (say, ) ADMM iterations. A single ADMM iteration involves updating the local hyperparameter estimate and the Lagrange multipliers, which takes computations per node, with being the highest number of bridge nodes assigned to any node in the network. Further, each bridge node has to perform an additional computations to update the local bridge parameters in every ADMM iteration. Thus, the overall computational complexity of a single CBDSBL iteration at each node is , and, as desired, it does not scale with , the total number of nodes in the network.
IVF Other CBDSBL Variants
There are several alternatives to the aforementioned bridge node based ADMM technique that could potentially be used to solve the Mstep optimization in (13). In this section, we present empirical results comparing the performance and communication complexity of four different variants of the proposed CBDSBL algorithm based on (i) bridge node based ADMM [23], (ii) Distributed ADMM (DADMM) [31], (iii) Consensus averaging Method of Multipliers (CAMoM) [30], and (iv) EXact firsT ordeR Algorithm (EXTRA) [32]. Each of these decentralized algorithms enjoys at least a convergence rate, where denotes the iteration count. Besides these four, there are proximal gradient based methods [38, 39] relying on Nesterovtype acceleration techniques which also offer linear convergence rates. However, these algorithms require the objective function to be bounded and involve multiple communication rounds per iteration, which is a major concern in our work. As shown in Fig. 2, the proposed CBDSBL variant relying on the bridge node based ADMM scheme is the most communication efficient.
IVG Implementation Issues
The CBDSBL algorithm can be seen as a decentralized EM algorithm for finding the ML estimate of the hyperparameters of a sparsity inducing prior. Not surprisingly, CBDSBL also inherits the tendency of the EM algorithm to converge to one of the multiple local maxima of the ML cost function . However, getting trapped in a local maximum is not a serious problem, as it has been shown in [15] that all local maxima of the are at most sparse and hence qualify as reasonably good solutions to our original sparse model estimation problem. Despite this, it is recommended to seed the EM algorithm with an initial whose entries are all close to zero.
Another common issue is the wide variation in the energy of the nonzero entries of across the network. Specifically, in distributed event classification by a multitude of different types of sensors [6], each sensor node may employ its own distinct sensing modality and hence may perceive a different SNR. In such cases, a preconditioning step which normalizes the local response vector to unit energy is recommended for fast convergence of the CBDSBL algorithm. The local sparse signal estimates can be rescaled at the end to undo the preconditioning.
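A minimal sketch of such a preconditioning step follows. The function names are ours; the only operation taken from the text is the normalization of the local measurement vector to unit energy, with a matching rescaling of the final estimate.

```python
import numpy as np

def precondition(y):
    """Scale the local measurement vector to unit energy."""
    scale = np.linalg.norm(y)
    return (y / scale if scale > 0 else y), scale

def undo_preconditioning(x_hat, scale):
    """Rescale the local signal estimate after recovery. Since
    y/s = Phi (x/s) + n/s, the recovered vector estimates x/s."""
    return x_hat * scale

y = np.array([3.0, 0.0, 4.0])   # local measurements with energy 25
y_n, s = precondition(y)
print(np.linalg.norm(y_n), s)   # 1.0 5.0
```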
V Simulation Results
In this section, we present simulation results examining the performance and complexity of the proposed CBDSBL algorithm in comparison with the existing decentralized algorithms DRL1 [5], DCOMP [18] and DCSP [19]. The centralized MSBL [14] is also included in the study as a performance benchmark for the proposed decentralized algorithm. The CBDSBL variant considered here executes two ADMM iterations in the inner loop for every EM iteration in the outer loop. The value of the augmented Lagrangian parameter, , is chosen according to (28). For each experiment, the set of bridge nodes is selected as described in section IVB. The local measurement matrices are chosen to be Gaussian random matrices with normalized columns. The nonzero signal coefficients are sampled independently from the Rademacher distribution, unless mentioned otherwise. For each trial, the connections between the nodes are generated according to a random Erdős-Rényi graph with a node connection probability of . In the final step of the MSBL and CBDSBL algorithms, the active support is identified by elementwise thresholding of the local hyperparameter vector at node using the threshold , where denotes the local measurement noise variance.
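The experimental setup described above can be sketched as follows. The function names and small problem sizes are ours, but each ingredient (unit norm Gaussian columns, Rademacher nonzeros on a common support, Erdős-Rényi connectivity) is taken directly from the text.

```python
import numpy as np

rng = np.random.default_rng(7)

def measurement_matrix(m, n):
    """Gaussian random matrix with columns normalized to unit norm."""
    Phi = rng.standard_normal((m, n))
    return Phi / np.linalg.norm(Phi, axis=0)

def jsm2_signal(n, support):
    """Rademacher (+/-1) nonzero coefficients on a shared support."""
    x = np.zeros(n)
    x[support] = rng.choice([-1.0, 1.0], size=len(support))
    return x

def erdos_renyi_edges(num_nodes, p):
    """Random internode connections with connection probability p."""
    return [(i, j) for i in range(num_nodes)
            for j in range(i + 1, num_nodes) if rng.random() < p]

Phi = measurement_matrix(10, 50)
x = jsm2_signal(50, support=[3, 17, 41])
```

Each node would draw its own `Phi` and its own Rademacher signs, while the `support` list is shared network wide, as the JSM2 model prescribes.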
VA Performance versus SNR
In the first set of experiments, we compare the normalized mean squared error (NMSE) and the normalized support error rate (NSER) of different algorithms for a range of SNRs. The supportaware LMMSE estimator sets the MSE performance benchmark for all the support agnostic algorithms considered here. The NMSE and NSER error metrics are defined as
where is the true common support and is the support estimated at node . The network size is fixed at nodes. As seen in Fig. 7, CBDSBL matches the performance of centralized MSBL in all cases. At higher SNR ( dB), both MSBL and the proposed CBDSBL are MSE optimal. CBDSBL also outperforms DRL1 and DCOMP in terms of both MSE and support recovery. This is attributed to the fact that the Gaussian prior used in CBDSBL, with its alternate interpretation as a variational approximation to the Student's tdistribution, induces sparsity more strongly than the sumlogsum penalty used in DRL1. The poor performance of DCOMP is primarily due to its sequential approach to support recovery, which prevents any corrections from being applied to the support estimate at each step of the algorithm. Contrary to [19], DCSP fails to perform better than DCOMP. This is because DCSP works only when the number of measurements exceeds , where is the size of the nonzero support.
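The two error metrics can be sketched as below, using one common convention: per node normalized squared error averaged over nodes for the NMSE, and the size of the symmetric difference between true and estimated supports normalized by the true support size for the NSER. The exact normalization is our assumption, as the displayed definitions do not fix it here.

```python
import numpy as np

def nmse(X_true, X_hat):
    """Average over nodes of ||x_hat - x||^2 / ||x||^2 (columns = nodes)."""
    num = np.sum((X_hat - X_true) ** 2, axis=0)
    den = np.sum(X_true ** 2, axis=0)
    return float(np.mean(num / den))

def nser(true_support, estimated_supports):
    """Average over nodes of |S xor S_hat| / |S| (assumed convention)."""
    S = set(true_support)
    return float(np.mean([len(S.symmetric_difference(Sh)) / len(S)
                          for Sh in estimated_supports]))
```

With this convention, perfect recovery at every node yields zero for both metrics, and a single missed or spurious support index at one node raises the NSER by 1/(|S| × number of nodes).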
VB Tradeoff between Measurement Rate and Network Size
In the second set of experiments, we characterize the NMSE phase transition of the different algorithms in the plane to identify the minimum measurement rate () needed to ensure less than signal reconstruction error (or, NMSE dB) for different network sizes () and a fixed sparsity rate (). As shown in Fig. 8, for the same network size, CBDSBL successfully recovers the unknown signals at a much lower measurement rate than DRL1, DCOMP and DCSP. This plot brings out the significant benefit of collaboration between nodes and of exploiting the JSM2 model in reducing the number of measurements required per node for successful signal recovery. Additionally, as the network grows in size, the number of local measurements required per node decreases, and with it the complexity of the local computations at each node (see section IVE).
VC Performance versus Measurement Rate ()
In the third set of experiments, we compare the algorithms with respect to their ability to recover the exact support for different undersampling ratios. As seen in Fig. 9, for a similar network size, CBDSBL exploits the joint sparsity structure better than DCOMP, DCSP and DRL1, and can correctly recover the support from significantly fewer measurements per node. Once again, CBDSBL has support recovery performance identical to that of centralized MSBL, which was one of our design goals.
VD Phase Transition Characteristics
In this set of experiments, we compare the phase transition behavior of the different algorithms under NMSE and support recovery based pass/fail criteria. Fig. 9(a) plots the MSE phase transition of the different algorithms, where any point below the phase transition curve represents a sparsity rate and measurement rate tuple which results in an NMSE smaller than dB, corresponding to less than percent signal reconstruction error. Likewise, in Fig. 9(b), points below the support recovery phase transition curve represent tuples which result in more than percent accurate nonzero support reconstruction across all the nodes. Again, we see that CBDSBL and centralized MSBL have identical performance, and both are capable of signal reconstruction from considerably fewer measurements than DRL1, DCOMP and DCSP.
VE Tradeoff between Number of Bridge Nodes and Robustness to Node Failures
In the final set of experiments, we demonstrate empirically that increasing the number of bridge nodes makes the CBDSBL algorithm more robust to random node failures. As shown in Fig. 11, by gradually increasing the density of bridge nodes in the network, the CBDSBL algorithm is able to tolerate higher rates of node failure without compromising signal reconstruction performance. More interestingly, only a relatively small fraction of the nodes ( of the total network size) need to be bridge nodes to ensure that CBDSBL operates robustly in the face of random node failures.
VI Conclusions
In this paper, we proposed a novel iterative Bayesian algorithm called CBDSBL for decentralized estimation of jointsparse signals by multiple nodes in a network. The CBDSBL algorithm employs an ADMM based decentralized EM procedure to efficiently learn the parameters of a joint sparsity inducing signal prior shared by all the nodes, which is subsequently used in the MAP estimation of the local signals. The CBDSBL algorithm is well suited to applications where the privacy of the signal coefficients is important, as there is no direct exchange of either measurements or signal coefficients between the nodes. Experimental results showed that CBDSBL outperforms the existing decentralized algorithms DRL1, DCOMP and DCSP in terms of both NMSE and support recovery performance. We also established the Rlinear convergence of the underlying decentralized ADMM iterations. The amount of internode communication during the ADMM iterations is controlled by restricting each node to exchange information with only a small subset of its single hop neighbors. For this internode communication scheme, the ADMM convergence results presented here are applicable to any consensus driven optimization of a convex objective function. Future extensions of this work could exploit any intervector correlation between the jointly sparse signals. It would also be interesting to analyze the convergence of the CBDSBL algorithm in the presence of noisy communication links between nodes and under asynchronous network operation.
Appendix
A Derivation of the Mstep Cost Function
B Derivation of the Simplified Update for
C Proof of Theorem 1
The proof of the convergence of ADMM discussed in the sequel is based on the proof given in [36]. However, our proof differs from the one in [36] due to the different scheme adopted here, which uses auxiliary/bridge nodes to enforce consensus between the nodes. We make the following assumptions about the objective function in (23).

is twice differentiable and strongly convex in . This implies that there exists such that, for all , the following holds
(34) 
is Lipschitz continuous, i.e., there exists a positive scalar such that, for all