
# Decentralized Joint-Sparse Signal Recovery: A Sparse Bayesian Learning Approach

*This work has appeared in part in [1].*

Saurabh Khanna, *Student Member, IEEE*, and Chandra R. Murthy, *Senior Member, IEEE*
Dept. of ECE, Indian Institute of Science, Bangalore, India. {sakhanna, cmurthy}@ece.iisc.ernet.in
###### Abstract

This work proposes a decentralized, iterative, Bayesian algorithm called CB-DSBL for in-network estimation of multiple jointly sparse vectors by a network of nodes, using noisy and underdetermined linear measurements. The proposed algorithm exploits the network wide joint sparsity of the unknown sparse vectors to recover them from significantly fewer local measurements compared to standalone sparse signal recovery schemes. To reduce the amount of inter-node communication and the associated overheads, the nodes exchange messages with only a small subset of their single hop neighbors. Under this communication scheme, we separately analyze the convergence of the underlying Alternating Directions Method of Multipliers (ADMM) iterations used in our proposed algorithm and establish its linear convergence rate. The findings from the convergence analysis of the decentralized ADMM are used to accelerate the convergence of the proposed CB-DSBL algorithm. Using Monte Carlo simulations, we demonstrate the superior signal reconstruction and support recovery performance of our proposed algorithm compared to existing decentralized algorithms: DRL-1, DCOMP and DCSP.

**Keywords:** Decentralized Estimation, Distributed Compressive Sensing, Joint Sparsity, Sparse Bayesian Learning, Sensor Networks.

## I Introduction

We consider the problem of in-network estimation of multiple joint-sparse vectors by a network of connected agents or processing nodes, using noisy and underdetermined linear measurements. Two or more vectors in $\mathbb{R}^n$ are called joint-sparse if, in addition to each vector being individually sparse (a vector in $\mathbb{R}^n$ is said to be $k$-sparse if only $k$ out of its $n$ coefficients are nonzero), their nonzero coefficients belong to a common index set. Joint sparsity occurs naturally in scenarios involving multiple agents trying to learn a sparse representation of a common physical phenomenon. Since the underlying physical phenomenon is the same for all the agents (with similar acquisition modalities), their individual sparse representations/model parameters tend to exhibit joint sparsity. In this work, we consider joint-sparse vectors which belong to the Type-2 Joint Sparse Model [2], or JSM-2, one of the three generative models for joint-sparse signals. JSM-2 signal vectors satisfy the property that their nonzero coefficients are uncorrelated within and across the vectors. JSM-2 has been successfully used in several applications such as cooperative spectrum sensing [3, 4], decentralized event detection [5, 6], multi-task compressive sensing [7] and MIMO channel estimation [8, 9, 10].

To further motivate the signal structure of joint sparsity in a distributed setup, consider the problem of detection/classification of randomly occurring events in a field by multiple sensor nodes. Each sensor node $j$, $1 \leq j \leq L$, employs a dictionary $\mathbf{\Phi}_j$, whose $i$-th column is the signature corresponding to the $i$-th event, one out of $n$ events which can potentially occur. In many cases, due to the inability to accurately model the sensing process, the signature vectors are simply chosen to be the past recordings of sensor $j$ corresponding to standalone occurrences of the $i$-th event, averaged across multiple experiments [6]. This procedure can result in a dictionary whose columns are highly correlated. Thus, for any $k$ events occurring simultaneously, a noisy sensor recording might belong to multiple subspaces, each spanned by a different subset of columns of the local dictionary. In such a scenario, enforcing joint sparsity across the sensor nodes can resolve the ambiguity in selecting the correct subset of columns at each sensor node.

In this work, we consider a distributed setup where each individual joint-sparse vector is estimated by a distinct node in a network comprising multiple nodes, with each node having access to noisy and underdetermined linear measurements of its local sparse vector. By collaborating with each other, these nodes can exploit the underlying joint sparsity of their local sparse vectors to reduce the measurements required per node or improve the quality of their local signal estimates. In [2], it has been shown that the number of local measurements required for common support recovery can be dramatically reduced by exploiting the joint sparsity structure prevalent across the network. In fact, as the nodes increase in number, exact signal reconstruction is possible from as few as $k+1$ measurements per node, where $k$ denotes the size of the support set. Such a substantial reduction in the number of measurements is highly desirable, especially in applications where the cost or time required to acquire new measurements is high.

Distributed algorithms for JSM-2 signal recovery come in two flavors: centralized and decentralized. In the centralized approach, each node transmits its local measurements to a fusion center (FC) which runs a joint-sparse signal recovery algorithm. The FC then transmits the reconstructed sparse signal estimates back to their respective nodes. In contrast, in a decentralized approach, the goal is to obtain the same solution as with the centralized scheme at all nodes by allowing each node to exchange information with its single hop neighbors in addition to processing its local measurements. Besides being inherently robust to node failures, decentralized schemes also tend to be more energy efficient, as the inter-node communication is restricted to relatively short ranges covering only one hop communication links. In this work, we focus on the decentralized approach for solving the sparse signal recovery problem under the JSM-2 signal model.

### I-A Related Work

In this subsection, we briefly summarize the existing centralized and decentralized algorithms for JSM-2 signal recovery. The earliest work on joint-sparse signal recovery considered extensions of recovery algorithms meant for the single measurement vector setup to the centralized multiple measurement vector (MMV) model [11], and demonstrated the significant performance gains that are achievable by exploiting the joint sparsity structure. MMV Basic Matching Pursuit (M-BMP), MMV Orthogonal Matching Pursuit (M-OMP) and MMV FOcal Underdetermined System Solver (M-FOCUSS), introduced in [11], belong to this category. In [12], joint sparsity was exploited for distributed encoding of multiple sparse signals. This work modeled joint-sparse signals as being generated according to one of three joint-sparse signal models (JSM-1, 2, 3), and also proposed a centralized greedy algorithm called Simultaneous Orthogonal Matching Pursuit (SOMP) [2] for JSM-2 recovery. In [13], the Alternating Directions Method for the MMV setup (ADM-MMV) was proposed, which used a mixed-norm penalty to promote a joint-sparse solution. In [14], the multiple response sparse Bayesian learning (M-SBL) algorithm was proposed as an MMV extension of the SBL algorithm [15]. Unlike the algorithms discussed earlier, M-SBL adopts a probabilistic approach by seeking the maximum a posteriori (MAP) estimate of the JSM-2 signals. In M-SBL, a joint-sparse solution is encouraged by assuming a joint sparsity inducing parameterized prior on the unknown sparse vectors, with the prior parameters learnt directly from the measurements. M-SBL has been shown to outperform deterministic methods based on norm relaxation, such as M-BMP and M-FOCUSS [11], as well as greedy algorithms such as SOMP. AMP-MMV [16] is another Bayesian algorithm, which uses approximate message passing (AMP) to obtain marginalized conditional posterior distributions of the joint-sparse signals.
Owing to their low computational complexity, AMP based algorithms are suitable for recovering signals with large dimensions. However, they have been shown to converge only for large dimensional and randomly constructed measurement matrices. Interested readers are referred to [17] for an excellent study comparing some of the aforementioned centralized JSM-2 signal recovery algorithms.

Among decentralized algorithms, collaborative orthogonal matching pursuit (DCOMP) [18] and collaborative subspace pursuit (DCSP) [19] are greedy algorithms for JSM-2 signal recovery, and both are computationally very fast. However, as demonstrated later in this paper, they do not perform as well as regularization based methods, which induce joint sparsity in their solution by employing a suitable penalty or indirectly via a joint signal prior. Moreover, both DCOMP and DCSP assume a priori knowledge of the size of the nonzero support set, which could be unknown or hard to estimate. Decentralized row-based LASSO (DR-LASSO) [20] is an iterative alternating minimization algorithm which optimizes a non-convex objective with mixed-norm regularization to obtain a joint-sparse solution. The decentralized re-weighted minimization algorithms DRL-1 and DRL-2 [5] employ a non-convex sum-log-sum penalty to promote a joint-sparse solution. Although non-convex regularizers induce sparsity much more strongly than convex norm-based regularizers [21], the resulting non-convex optimization can be difficult to solve efficiently. In DRL-1/2, the non-convex objective is replaced by a surrogate convex function constructed from iteration dependent weighted $\ell_1$/$\ell_2$ norm terms. Using a non-convex sum-log-sum regularization results in a sparser solution compared to the convex regularization used in DR-LASSO. However, both DR-LASSO and DRL-1,2 necessitate cross validation to tune the amount of regularization needed for optimal support recovery performance. DRL-1,2 also require proper tuning of a so-called smoothing parameter and an ADMM parameter for optimal performance. By employing a Bayesian approach, we can completely eliminate the need for cross validation: the parameters of a family of signal priors are learnt such that the selected signal prior has maximum Bayesian evidence.
DCS-AMP [3] is one such decentralized algorithm, which employs approximate message passing to learn a parameterized joint sparsity inducing Bernoulli-Gaussian signal prior. Turbo Bayesian Compressive Sensing (Turbo-BCS) [22], another decentralized algorithm, adopts a more relaxed zero mean Gaussian signal prior, with the variance hyperparameters themselves distributed according to an exponential distribution. This relaxation of the signal prior results in improved MSE without compromising on the sparsity of the solution. Turbo-BCS, however, involves direct exchange of signal estimates between the nodes, which renders it unsuitable for applications where it is necessary to preserve the privacy of the local signals.

### I-B Contributions

Our main contributions in this work are as follows:

1. We propose a novel decentralized, iterative, Bayesian joint-sparse signal recovery algorithm called Consensus Based Distributed Sparse Bayesian Learning (CB-DSBL). CB-DSBL works by establishing network wide consensus with respect to the estimated parameters of a joint sparsity inducing signal prior. The learnt signal prior is subsequently used by the individual nodes to obtain MAP estimates of their local sparse signal vectors. The proposed CB-DSBL algorithm does not require direct exchange of either local measurements or signal estimates between the nodes, and hence is well suited for applications where it is important to preserve the privacy of the local signal coefficients.

2. The proposed algorithm employs the Alternating Directions Method of Multipliers (ADMM) to solve a series of iteration dependent consensus optimization problems which require the nodes to exchange messages with each other. To reduce the associated communication overheads, we adopt a bandwidth efficient inter-node communication scheme. This scheme entails the nodes exchanging messages with only a predesignated subset of their single hop neighbors known as bridge nodes, as motivated in [23]. By selecting these bridge nodes, one can trade off between the communication bandwidth requirements and the ADMM's robustness to node failures. In this connection, we analytically establish the relationship between the selected set of bridge nodes and the convergence rate of the ADMM iterations. For the bridge-node based inter-node communication scheme, we show a linear rate of convergence for the ADMM iterations when applied to a generic consensus optimization problem. The analysis is useful in obtaining a closed form expression for the tunable parameter of our proposed joint-sparse signal recovery algorithm, ensuring its fast convergence.

3. We empirically demonstrate the superior MSE and support recovery performance of CB-DSBL in comparison to existing decentralized algorithms: DRL-1, DCOMP and DCSP.

In Table I, we compare the existing decentralized joint-sparse signal recovery schemes with respect to their per iteration computational and communication complexity, privacy of local estimates, presence/absence of tunable parameters, and dependence on prior knowledge of the sparsity level. As highlighted in Table I, CB-DSBL is one of only a handful of decentralized algorithms for joint-sparse signal recovery which do not require a priori knowledge of the sparsity level, rely only on single hop communication, and do not involve direct exchange of local signal estimates between network nodes. Besides this, unlike loopy Belief Propagation (BP) or Approximate Message Passing (AMP) based Bayesian algorithms, CB-DSBL does not suffer from convergence issues even when the local measurement matrix at each node is dense or not randomly constructed.

The rest of this paper is organized as follows. Section II describes the system model and the problem statement of distributed JSM-2 signal recovery. Section III discusses the centralized M-SBL algorithm [14] adapted to our setup, and sets the stage for our proposed decentralized solution. Section IV develops the proposed CB-DSBL algorithm, along with a detailed discussion on the convergence properties of the underlying ADMM iterations and other implementation specific issues. Section V compares the performance of the proposed algorithm with existing ones with respect to various performance metrics. Finally, Section VI concludes the paper.

Notation: Boldface lowercase and uppercase alphabets are used to denote vectors and matrices, respectively. A script styled alphabet (for example, $\mathcal{J}$) is used to denote a set, and $|\mathcal{J}|$ denotes the cardinality of the set $\mathcal{J}$. The term $x_j^k(i)$ denotes the $i$-th element of the vector $\mathbf{x}$ associated with node $j$ at iteration/time index $k$. The superscript $(\cdot)^T$ denotes the transpose operation. For matrices $\mathbf{A}$ and $\mathbf{B}$ of sizes $m \times n$ and $p \times q$ respectively, $\mathbf{A} \otimes \mathbf{B}$ denotes their Kronecker product, which is of size $mp \times nq$. $\mathcal{N}(\boldsymbol{\mu}, \mathbf{\Sigma})$ denotes the Gaussian distribution with mean $\boldsymbol{\mu}$ and covariance matrix $\mathbf{\Sigma}$. $\mathbb{E}_{x|y}[\cdot]$ denotes the expectation of a random variable $x$ conditioned on another random variable $y$.

## II Distributed JSM-2 System Model

We consider a network of $L$ nodes/sensors whose connectivity is described by a bi-directional graph $\mathcal{G} = (\mathcal{J}, \mathcal{E})$. $\mathcal{J}$ is the set of vertices in $\mathcal{G}$, each vertex representing a node in the network. The set $\mathcal{E}$ contains the edges in $\mathcal{G}$, each edge representing a single hop error-free communication link between a distinct pair of nodes. Each node $j \in \mathcal{J}$ is interested in estimating an unknown $k$-sparse vector $\mathbf{x}_j \in \mathbb{R}^n$ from locally acquired noisy linear measurements $\mathbf{y}_j \in \mathbb{R}^m$ ($m < n$). The generative model of the local measurement vector at node $j$ is given by

$$\mathbf{y}_j = \mathbf{\Phi}_j \mathbf{x}_j + \mathbf{w}_j, \quad 1 \leq j \leq L \tag{1}$$

where $\mathbf{\Phi}_j \in \mathbb{R}^{m \times n}$ is a full rank sensing matrix and $\mathbf{w}_j$ is the measurement noise, modeled as zero mean Gaussian distributed with covariance matrix $\sigma_j^2 \mathbf{I}_m$. The sparse vectors $\mathbf{x}_j$ at different nodes follow the JSM-2 signal model [12]. This implies that all $\mathbf{x}_j$ share a common support, represented by the index set $\mathcal{S}$. From the JSM-2 model, it also follows that the nonzero coefficients of the sparse vectors are independent within and across the vectors.

The goal is to recover the local sparse vectors $\mathbf{x}_j$ at their respective nodes using decentralized processing. In addition to processing the local data $(\mathbf{y}_j, \mathbf{\Phi}_j)$, each node must collaborate with its single hop neighboring nodes to exploit the network wide joint sparsity of the unknown sparse vectors. For the sake of privacy, the nodes are prohibited from directly exchanging their local measurements or local signal estimates. Finally, the decentralized algorithm should be able to generate the centralized solution at each node, as if each node had access to the entire global information, i.e., $\{\mathbf{y}_j, \mathbf{\Phi}_j\}_{j \in \mathcal{J}}$.
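To make the setup concrete, the following Python sketch generates a small JSM-2 instance per the measurement model (1): a common support drawn once, independent nonzero coefficients at each node, and noisy local measurements. All dimensions, the function name, and the noise level are illustrative choices, not taken from the paper.

```python
import numpy as np

def generate_jsm2_data(L=8, n=50, m=15, k=5, noise_var=0.01, seed=0):
    """Generate L jointly sparse vectors (JSM-2: common support,
    independent nonzero coefficients) and noisy local measurements
    y_j = Phi_j @ x_j + w_j at each node."""
    rng = np.random.default_rng(seed)
    support = rng.choice(n, size=k, replace=False)    # common support S
    Phi, X, Y = [], [], []
    for _ in range(L):
        x = np.zeros(n)
        x[support] = rng.standard_normal(k)           # independent nonzeros
        A = rng.standard_normal((m, n)) / np.sqrt(m)  # local sensing matrix
        w = np.sqrt(noise_var) * rng.standard_normal(m)
        Phi.append(A); X.append(x); Y.append(A @ x + w)
    return Phi, X, Y, support
```

Note that each node sees a different sensing matrix and different nonzero values; only the support is shared, which is exactly the structure CB-DSBL exploits.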

## III Centralized Algorithm for JSM-2

In this section, we briefly recall the centralized M-SBL algorithm [14] for JSM-2 signal recovery and extend it to support distinct measurement matrices and noise variances at each node. The centralized algorithm runs at an FC, which assumes complete knowledge of the network wide information $\{\mathbf{y}_j, \mathbf{\Phi}_j, \sigma_j^2\}_{j \in \mathcal{J}}$. For ease of notation, we introduce two variables, $\mathbf{Y} \triangleq [\mathbf{y}_1, \mathbf{y}_2, \ldots, \mathbf{y}_L]$ and $\mathbf{X} \triangleq [\mathbf{x}_1, \mathbf{x}_2, \ldots, \mathbf{x}_L]$, to be used in the sequel.

Similar to M-SBL, each of the sparse vectors $\mathbf{x}_j$ is assumed to be distributed according to the parameterized signal prior shown below.

$$p(\mathbf{x}_j; \boldsymbol{\gamma}) = \prod_{i=1}^{n} p(x_j(i); \gamma(i)) = \prod_{i=1}^{n} \frac{1}{\sqrt{2\pi\gamma(i)}} \exp\left(-\frac{x_j(i)^2}{2\gamma(i)}\right). \tag{2}$$

Further, the joint signal prior is assumed to be given by

$$p(\mathbf{X}; \boldsymbol{\gamma}) = \prod_{j \in \mathcal{J}} p(\mathbf{x}_j; \boldsymbol{\gamma}). \tag{3}$$

In the above, $\boldsymbol{\gamma}$ is an $n$-dimensional hyperparameter vector, whose $i$-th entry, $\gamma(i)$, models the common variance of $x_j(i)$ for $j \in \mathcal{J}$. Since the signal priors are parameterized by a common $\boldsymbol{\gamma}$, if $\boldsymbol{\gamma}$ has a sparse support $\mathcal{S}$, then the MAP estimates of $\mathbf{x}_j$ will also be jointly sparse with the same common support $\mathcal{S}$. The Gaussian prior in (2) promotes sparsity, as it has an alternate interpretation as a parameterized model for the family of variational approximations to a sparsity inducing Student's t-distributed prior [24]. Under this interpretation, finding the hyperparameter vector which maximizes the likelihood $p(\mathbf{Y}; \boldsymbol{\gamma})$ is equivalent to finding the variational approximation which has the largest Bayesian evidence.

Let $\hat{\boldsymbol{\gamma}}_{\text{ML}}$ denote the maximum likelihood (ML) estimate of the hyperparameters of the joint source prior:

$$\hat{\boldsymbol{\gamma}}_{\text{ML}} = \arg\max_{\boldsymbol{\gamma}} \; p(\mathbf{Y}; \boldsymbol{\gamma}) \tag{4}$$

where $p(\mathbf{Y}; \boldsymbol{\gamma})$ is a type-2 likelihood function obtained by marginalizing the joint density $p(\mathbf{Y}, \mathbf{X}; \boldsymbol{\gamma})$ with respect to the unknown vectors in $\mathbf{X}$, i.e.,

$$p(\mathbf{Y}; \boldsymbol{\gamma}) = \prod_{j=1}^{L} \int p(\mathbf{y}_j | \mathbf{x}_j) \, p(\mathbf{x}_j; \boldsymbol{\gamma}) \, d\mathbf{x}_j = \prod_{j=1}^{L} \mathcal{N}\left(\mathbf{0}, \, \mathbf{\Phi}_j \mathbf{\Gamma} \mathbf{\Phi}_j^T + \sigma_j^2 \mathbf{I}_m\right). \tag{5}$$

Here, $\mathbf{\Gamma} \triangleq \text{diag}(\boldsymbol{\gamma})$. We note that $\hat{\boldsymbol{\gamma}}_{\text{ML}}$ cannot be derived in closed form by directly maximizing the likelihood in (5) with respect to $\boldsymbol{\gamma}$. Hence, as suggested in the SBL framework [15], we use the expectation maximization (EM) procedure to maximize $p(\mathbf{Y}; \boldsymbol{\gamma})$ by treating the entries of $\mathbf{X}$ as hidden variables.

We now discuss the main steps of the EM algorithm to obtain $\hat{\boldsymbol{\gamma}}_{\text{ML}}$. Let $q_\theta(\mathbf{X})$ denote the variational approximation of the true conditional density $p(\mathbf{X}|\mathbf{Y}; \boldsymbol{\gamma})$ with variational parameter set $\theta = \{\boldsymbol{\mu}_j, \mathbf{\Sigma}_j\}_{j \in \mathcal{J}}$. The variational parameters $\boldsymbol{\mu}_j$ and $\mathbf{\Sigma}_j$ represent the conditional mean and covariance of $\mathbf{x}_j$ given $\mathbf{y}_j$. Then, as shown in [25], the log likelihood admits the following decomposition.

$$\log p(\mathbf{Y}; \boldsymbol{\gamma}) = \int q_\theta(\mathbf{X}) \log \frac{p(\mathbf{Y}, \mathbf{X}; \boldsymbol{\gamma})}{q_\theta(\mathbf{X})} \, d\mathbf{X} + D\left(q_\theta(\mathbf{X}) \,\|\, p(\mathbf{X}|\mathbf{Y}; \boldsymbol{\gamma})\right) \tag{6}$$

where the term $D(q_\theta(\mathbf{X}) \,\|\, p(\mathbf{X}|\mathbf{Y}; \boldsymbol{\gamma}))$ is the Kullback-Leibler (KL) divergence between the probability densities $q_\theta(\mathbf{X})$ and $p(\mathbf{X}|\mathbf{Y}; \boldsymbol{\gamma})$. From the non-negativity of the KL divergence [26], the log likelihood is lower bounded by the first term in the RHS. In the E-step, we choose the variational parameters $\theta$ to make this variational lower bound tight by minimizing the KL divergence term:

$$\theta^{k+1} = \arg\min_{\theta} \; D\left(q_\theta(\mathbf{X}) \,\|\, p(\mathbf{X}|\mathbf{Y}; \boldsymbol{\gamma}^k)\right). \tag{7}$$

Here, $k$ denotes the iteration index of the EM algorithm. From LMMSE theory, $p(\mathbf{x}_j|\mathbf{y}_j; \boldsymbol{\gamma}^k)$ is Gaussian with mean $\boldsymbol{\mu}_j^{k+1}$ and covariance $\mathbf{\Sigma}_j^{k+1}$ given by

$$\mathbf{\Sigma}_j^{k+1} = \mathbf{\Gamma}^k - \mathbf{\Gamma}^k \mathbf{\Phi}_j^T \left(\sigma_j^2 \mathbf{I}_m + \mathbf{\Phi}_j \mathbf{\Gamma}^k \mathbf{\Phi}_j^T\right)^{-1} \mathbf{\Phi}_j \mathbf{\Gamma}^k \quad \text{and} \quad \boldsymbol{\mu}_j^{k+1} = \sigma_j^{-2} \mathbf{\Sigma}_j^{k+1} \mathbf{\Phi}_j^T \mathbf{y}_j. \tag{8}$$

By choosing $\boldsymbol{\mu}_j = \boldsymbol{\mu}_j^{k+1}$ and $\mathbf{\Sigma}_j = \mathbf{\Sigma}_j^{k+1}$, the KL divergence term in (7) can be driven to its minimum value of zero.
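As a concrete illustration, the E-step (8) at a single node can be sketched as follows (Python; the function name and arguments are illustrative). The code computes $\boldsymbol{\mu}_j^{k+1}$ in the equivalent form $\mathbf{\Gamma}^k \mathbf{\Phi}_j^T (\sigma_j^2\mathbf{I}_m + \mathbf{\Phi}_j\mathbf{\Gamma}^k\mathbf{\Phi}_j^T)^{-1}\mathbf{y}_j$, which matches $\sigma_j^{-2}\mathbf{\Sigma}_j^{k+1}\mathbf{\Phi}_j^T\mathbf{y}_j$ by the matrix inversion lemma.

```python
import numpy as np

def e_step(y, Phi, gamma, noise_var):
    """E-step (8): posterior mean and covariance of x_j given y_j,
    under the prior x_j ~ N(0, diag(gamma))."""
    m = Phi.shape[0]
    G = np.diag(gamma)
    C = noise_var * np.eye(m) + Phi @ G @ Phi.T   # sigma^2 I + Phi Gamma Phi^T
    K = G @ Phi.T @ np.linalg.inv(C)
    Sigma = G - K @ Phi @ G                       # posterior covariance (8)
    mu = K @ y                                    # posterior mean (8)
    return mu, Sigma
```

Only the $m \times m$ matrix is inverted, so the per-node E-step cost is modest when $m \ll n$.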

In the M-step, we choose $\boldsymbol{\gamma}^{k+1}$ to maximize the tight variational lower bound obtained in the E-step:

$$\boldsymbol{\gamma}^{k+1} = \arg\max_{\boldsymbol{\gamma}} \int q_{\theta^{k+1}}(\mathbf{X}) \log \frac{p(\mathbf{Y}, \mathbf{X}; \boldsymbol{\gamma})}{q_{\theta^{k+1}}(\mathbf{X})} \, d\mathbf{X} = \arg\max_{\boldsymbol{\gamma}} \; \mathbb{E}_{\mathbf{X} \sim q_{\theta^{k+1}}}\left[\log p(\mathbf{Y}, \mathbf{X}; \boldsymbol{\gamma})\right]. \tag{9}$$

As shown in Appendix A, the optimization problem (9) can be recast as the following minimization problem.

$$\boldsymbol{\gamma}^{k+1} = \arg\min_{\boldsymbol{\gamma} \in \mathbb{R}_+^n} \sum_{j \in \mathcal{J}} \sum_{i=1}^{n} \left(\log \gamma(i) + \frac{\mathbf{\Sigma}_j^{k+1}(i,i) + \mu_j^{k+1}(i)^2}{\gamma(i)}\right). \tag{10}$$

From the zero gradient optimality condition in (10), the M-step reduces to the following update rule:

$$\gamma^{k+1}(i) = \frac{1}{L} \sum_{j \in \mathcal{J}} \left(\mathbf{\Sigma}_j^{k+1}(i,i) + \mu_j^{k+1}(i)^2\right) \quad \text{for } 1 \leq i \leq n. \tag{11}$$

By repeatedly iterating between the E-step (8) and the M-step (11), the EM algorithm converges to either a local maximum or a saddle point of $p(\mathbf{Y}; \boldsymbol{\gamma})$ [27]. Once $\hat{\boldsymbol{\gamma}}_{\text{ML}}$ is obtained, the MAP estimate of $\mathbf{x}_j$ is evaluated by substituting it in the expression for $\boldsymbol{\mu}_j^{k+1}$ in (8). It is observed that, when the EM algorithm converges, the $\gamma(i)$'s belonging to the inactive support tend to zero, resulting in sparse MAP estimates. In practice, hard thresholding of $\boldsymbol{\gamma}$ is required to identify the nonzero support set. In this work, we remove from the active support set all coefficients $i$ for which $\gamma(i)$ is below the local noise variance. It must be noted that if the local noise variance at each node is unknown, it can be estimated along with $\boldsymbol{\gamma}$ within the EM framework, as discussed in [14].

## IV Decentralized Algorithm for JSM-2

### IV-A Algorithm Development

In this section, we develop a decentralized version of the centralized algorithm discussed in the previous section. For notational convenience, we introduce an $n$-length vector $\mathbf{a}_j^{k+1}$ maintained at node $j$, where $a_j^{k+1}(i) \triangleq \mathbf{\Sigma}_j^{k+1}(i,i) + \mu_j^{k+1}(i)^2$, and $\mathbf{\Sigma}_j^{k+1}$ and $\boldsymbol{\mu}_j^{k+1}$ are as defined in (8).

From (11), we observe that the solution of the M-step optimization (10) can be interpreted as an average of the vectors $\mathbf{a}_j^{k+1}$. The same solution can also be obtained by solving a different minimization problem:

$$\boldsymbol{\gamma}^{k+1} = \arg\min_{\boldsymbol{\gamma} \in \mathbb{R}_+^n} \sum_{j \in \mathcal{J}} \left\|\boldsymbol{\gamma} - \mathbf{a}_j^{k+1}\right\|_2^2. \tag{12}$$

Unlike the non-convex M-step objective function in (10), the surrogate objective function in (12) is convex in $\boldsymbol{\gamma}$ and therefore can be minimized in a distributed manner using powerful convex optimization techniques. An alternate form of (12), amenable to distributed optimization, is given by

$$\min_{\boldsymbol{\gamma}_j \in \mathbb{R}_+^n, \, j \in \mathcal{J}} \; \sum_{j \in \mathcal{J}} \left\|\boldsymbol{\gamma}_j - \mathbf{a}_j^{k+1}\right\|_2^2 \quad \text{subject to} \quad \boldsymbol{\gamma}_j = \boldsymbol{\gamma}_{j'} \;\; \forall j \in \mathcal{J}, \, j' \in \mathcal{N}_j \tag{13}$$

where $\mathcal{N}_j$ denotes the set of single hop neighbors of node $j$. The equality constraints in (13) ensure its equivalence to the unconstrained optimization in (12). Here, the number of equality constraints is equal to the total number of single hop links in the network. In a conventional decentralized implementation of (13), the number of messages exchanged between the nodes grows linearly with the number of consensus constraints. By restricting the nodes to exchange information only through a relatively small set of pre-designated nodes called bridge nodes, the number of consensus constraints can be drastically reduced without affecting the equivalence of (12) and (13). Let $\mathcal{B}$ denote the set of all bridge nodes in the network and $\mathcal{B}_j$ denote the set of bridge nodes belonging to the single hop neighborhood of node $j$; then (13) can be rewritten as

$$\min_{\boldsymbol{\gamma}_j \in \mathbb{R}_+^n, \, j \in \mathcal{J}} \; \sum_{j \in \mathcal{J}} \left\|\boldsymbol{\gamma}_j - \mathbf{a}_j^{k+1}\right\|_2^2 \quad \text{subject to} \quad \boldsymbol{\gamma}_j = \boldsymbol{\gamma}_b \;\; \forall j \in \mathcal{J}, \, b \in \mathcal{B}_j. \tag{14}$$

The auxiliary variables $\boldsymbol{\gamma}_b$, called bridge parameters, are used to establish consensus among the $\boldsymbol{\gamma}_j$'s. Each bridge parameter $\boldsymbol{\gamma}_b$ is a non-negative $n$-length vector maintained by the bridge node $b \in \mathcal{B}$. As motivated in [23], [28], using bridge nodes to impose network wide consensus allows us to trade off between the communication cost and the robustness of the distributed optimization algorithm. (In an alternate embodiment of the proposed algorithm, the message exchanges could be restricted to occur only through the (trustworthy) bridge nodes, thereby avoiding direct communication between the nodes. In this case, the role of the bridge nodes would be to enforce consensus in $\boldsymbol{\gamma}$ across the nodes, and they need not directly participate in signal reconstruction.)

The following lemma provides sufficient conditions on the choice of the bridge node set $\mathcal{B}$ under which (12) and (14) are equivalent. The proof of the lemma can be found in [23].

###### Lemma 1.

For a connected graph $\mathcal{G}$, if the bridge node set $\mathcal{B}$ satisfies the following conditions:

1. Each node must be connected to at least one bridge node in $\mathcal{B}$, i.e., $\mathcal{B}_j \neq \emptyset$ for any $j \in \mathcal{J}$, and,

2. If two nodes $j$ and $j'$ are single-hop neighbors, then $\mathcal{B}_j \cap \mathcal{B}_{j'} \neq \emptyset$ for any $(j, j') \in \mathcal{E}$,

then, in the solution to (14), the $\boldsymbol{\gamma}_j$'s are equal for all $j \in \mathcal{J}$.

Fig. 1 illustrates the selection of bridge nodes according to Lemma 1 in a sample network. In this work, we employ the Alternating Directions Method of Multipliers (ADMM) [29] to solve the convex optimization problem in (14). ADMM is a state of the art dual ascent algorithm for solving constrained convex optimization problems, offering a linear convergence rate and a natural extension to a decentralized implementation.

We start by constructing an augmented Lagrangian, $L_\rho(\boldsymbol{\gamma}_{\mathcal{J}}, \boldsymbol{\gamma}_{\mathcal{B}}, \boldsymbol{\lambda})$, given by

$$L_\rho(\boldsymbol{\gamma}_{\mathcal{J}}, \boldsymbol{\gamma}_{\mathcal{B}}, \boldsymbol{\lambda}) \triangleq \sum_{j \in \mathcal{J}} \left\|\boldsymbol{\gamma}_j - \mathbf{a}_j^{k+1}\right\|_2^2 + \sum_{j \in \mathcal{J}} \sum_{b \in \mathcal{B}_j} (\boldsymbol{\lambda}_j^b)^T (\boldsymbol{\gamma}_j - \boldsymbol{\gamma}_b) + \frac{\rho}{2} \sum_{j \in \mathcal{J}} \sum_{b \in \mathcal{B}_j} \left\|\boldsymbol{\gamma}_j - \boldsymbol{\gamma}_b\right\|_2^2 \tag{15}$$

where $\boldsymbol{\lambda}_j^b$ denotes the $n \times 1$ Lagrange multiplier vector corresponding to the equality constraint $\boldsymbol{\gamma}_j = \boldsymbol{\gamma}_b$, and $\rho$ is a positive scalar which biases the quadratic consensus penalty term. For ease of notation, we define the concatenated vectors $\boldsymbol{\gamma}_{\mathcal{J}}$ (the stacking of $\boldsymbol{\gamma}_j$, $j \in \mathcal{J}$) and $\boldsymbol{\gamma}_{\mathcal{B}}$ (the stacking of $\boldsymbol{\gamma}_b$, $b \in \mathcal{B}$) to be used in the sequel. We also define the concatenated Lagrange multiplier vector $\boldsymbol{\lambda} \in \mathbb{R}^{nc}$, where $c$ is the number of equality constraints in (14). The solution to (14) is then obtained by executing the following ADMM iterations until convergence:

$$\boldsymbol{\gamma}_{\mathcal{J}}^{r+1} = \arg\min_{\boldsymbol{\gamma}_{\mathcal{J}}} \; L_\rho(\boldsymbol{\gamma}_{\mathcal{J}}, \boldsymbol{\gamma}_{\mathcal{B}}^r, \boldsymbol{\lambda}^r) \tag{16}$$
$$\boldsymbol{\gamma}_{\mathcal{B}}^{r+1} = \arg\min_{\boldsymbol{\gamma}_{\mathcal{B}}} \; L_\rho(\boldsymbol{\gamma}_{\mathcal{J}}^{r+1}, \boldsymbol{\gamma}_{\mathcal{B}}, \boldsymbol{\lambda}^r) \tag{17}$$
$$(\boldsymbol{\lambda}_j^b)^{r+1} = (\boldsymbol{\lambda}_j^b)^r + \rho\left(\boldsymbol{\gamma}_j^{r+1} - \boldsymbol{\gamma}_b^{r+1}\right) \tag{18}$$

Here, $r$ denotes the ADMM iteration index. In (16)-(17), the primal variables, $\boldsymbol{\gamma}_{\mathcal{J}}$ and $\boldsymbol{\gamma}_{\mathcal{B}}$, are updated in a Gauss-Seidel fashion by minimizing the augmented Lagrangian $L_\rho$ evaluated at the previous estimate of the dual variable $\boldsymbol{\lambda}^r$. By adding an extra quadratic penalty term to the original Lagrangian, the objective in (17) is no longer affine in $\boldsymbol{\gamma}_{\mathcal{B}}$ and hence has a bounded minimizer. The dual variable is updated via a gradient-ascent step (18) with a step-size equal to the ADMM parameter $\rho$. This particular choice of step-size ensures the dual feasibility of the iterates for all $r$. Since the augmented Lagrangian is strictly convex with respect to $\boldsymbol{\gamma}_{\mathcal{J}}$ and $\boldsymbol{\gamma}_{\mathcal{B}}$ individually, the zero gradient optimality conditions for (16) and (17) translate into simple update equations for $\boldsymbol{\gamma}_j$ and $\boldsymbol{\gamma}_b$:

$$\boldsymbol{\gamma}_j^{r+1} = \frac{2\mathbf{a}_j^{k+1} + \sum_{b \in \mathcal{B}_j}\left(\rho\boldsymbol{\gamma}_b^r - (\boldsymbol{\lambda}_j^b)^r\right)}{2 + \rho|\mathcal{B}_j|} \quad \forall j \in \mathcal{J} \tag{19}$$
$$\text{and} \quad \boldsymbol{\gamma}_b^{r+1} = \frac{\sum_{j \in \mathcal{N}_b}\left(\rho\boldsymbol{\gamma}_j^{r+1} + (\boldsymbol{\lambda}_j^b)^r\right)}{\rho|\mathcal{N}_b|} \quad \forall b \in \mathcal{B}. \tag{20}$$

Here, $\mathcal{N}_b$ denotes the set of nodes connected to bridge node $b$. As shown in Appendix B, by eliminating the Lagrange multiplier terms from (18) and (20), the update rule for $\boldsymbol{\gamma}_b$ can be further simplified to

$$\boldsymbol{\gamma}_b^{r+1} = \frac{1}{|\mathcal{N}_b|} \sum_{j \in \mathcal{N}_b} \boldsymbol{\gamma}_j^{r+1} \quad \forall b \in \mathcal{B}. \tag{21}$$

In section IV-F, we compare the bridge node based ADMM discussed above with other decentralized optimization techniques available in the literature. We show empirically that the bridge node based ADMM scheme is able to flexibly trade off between communication complexity, robustness to node failures, speed of convergence, and signal reconstruction performance.
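With zero-initialized multipliers, the bridge update (20) coincides with the averaging rule (21) from the first iteration onward, so a minimal sketch of one decentralized M-step needs only (19), (21) and (18). The Python sketch below (illustrative names; the non-negativity constraint $\boldsymbol{\gamma}_j \in \mathbb{R}_+^n$ is dropped for simplicity, since (19)-(21) are the unconstrained stationarity conditions) iterates the three updates:

```python
import numpy as np

def admm_mstep(a, bridges_of, nodes_of, rho=1.0, n_iter=300):
    """One decentralized M-step: ADMM iterations (19), (21), (18).
    a[j]          : node j's local vector a_j^{k+1}
    bridges_of[j] : the bridge set B_j of node j
    nodes_of[b]   : the node set N_b served by bridge node b."""
    L, n = len(a), len(a[0])
    a = [np.asarray(aj, float) for aj in a]
    g = [aj.copy() for aj in a]                 # gamma_j iterates
    gb = {b: np.zeros(n) for b in nodes_of}     # bridge parameters gamma_b
    lam = {(j, b): np.zeros(n) for j in range(L) for b in bridges_of[j]}
    for _ in range(n_iter):
        for j in range(L):                      # node update (19)
            s = sum(rho * gb[b] - lam[(j, b)] for b in bridges_of[j])
            g[j] = (2 * a[j] + s) / (2 + rho * len(bridges_of[j]))
        for b in nodes_of:                      # bridge averaging (21)
            gb[b] = sum(g[j] for j in nodes_of[b]) / len(nodes_of[b])
        for (j, b) in lam:                      # dual ascent (18)
            lam[(j, b)] = lam[(j, b)] + rho * (g[j] - gb[b])
    return g
```

When the bridge set satisfies Lemma 1, every $\boldsymbol{\gamma}_j$ converges to the network wide average of the $\mathbf{a}_j^{k+1}$'s, i.e., to the centralized M-step solution (11).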

### IV-B CB-DSBL Algorithm

We now propose the CB-DSBL algorithm. Essentially, it is a decentralized EM algorithm for finding the ML estimate of the hyperparameters $\boldsymbol{\gamma}$. The algorithm comprises two nested loops. In the outer loop, each node performs the E-step (8) in a standalone manner. In the inner loop, ADMM iterations are performed to solve the M-step optimization in a decentralized manner. Upon convergence of the outer loop, each node has the same ML estimate of $\boldsymbol{\gamma}$, which is then used to obtain a MAP estimate of the local sparse vector $\mathbf{x}_j$, similar to the centralized algorithm. The steps of the CB-DSBL algorithm are detailed in Algorithm 1.

Each ADMM iteration in the M-step of the CB-DSBL algorithm involves two rounds of communication between the nodes. In the first communication round, each node $j$ transmits its updated $\boldsymbol{\gamma}_j^{r+1}$ to the bridge nodes among its single hop neighbors. In the second communication round, each bridge node $b$ transmits its updated $\boldsymbol{\gamma}_b^{r+1}$ to its single hop neighbors. Thus, in each M-step, $n$-length real vectors are exchanged between the nodes and their respective bridge nodes. In Fig. 2, we compare different variants of CB-DSBL with respect to the average number of inter-node message exchanges required to achieve a target signal reconstruction error. From the figure, it is evident that the aforementioned bridge node based ADMM technique is effective in reducing the overall inter-node communication and the associated costs, without compromising on signal reconstruction performance. One way of selecting the bridge node set $\mathcal{B}$ is to sort the nodes in decreasing order of their nodal degrees and retain the smallest number of top nodes satisfying the conditions in Lemma 1. Although suboptimal, this scheme significantly reduces the overall communication complexity of the algorithm, as demonstrated empirically in Fig. 2. In section IV-D, a rule of thumb policy is discussed for selecting the bridge nodes which ensures fast convergence of the decentralized ADMM iterations in the M-step of the CB-DSBL algorithm.
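The degree-based bridge selection heuristic described above can be sketched as follows (Python; we additionally assume that a bridge node serves its own consensus constraint, i.e., closed neighborhoods are used when forming $\mathcal{B}_j$, which is one possible reading of Lemma 1 and is an assumption of this sketch):

```python
def select_bridge_nodes(adj):
    """Pick bridge nodes as the shortest prefix of the degree-sorted node
    list satisfying the two conditions of Lemma 1. adj[j] is the set of
    single hop neighbors of node j."""
    nodes = sorted(adj, key=lambda j: len(adj[j]), reverse=True)
    closed = {j: adj[j] | {j} for j in adj}        # closed neighborhoods
    for t in range(1, len(nodes) + 1):
        B = set(nodes[:t])
        Bj = {j: closed[j] & B for j in adj}       # bridge sets B_j
        cond1 = all(Bj[j] for j in adj)            # every node covered
        cond2 = all(Bj[j] & Bj[jp]                 # neighbors share a bridge
                    for j in adj for jp in adj[j])
        if cond1 and cond2:
            return B, Bj
    return set(nodes), {j: closed[j] for j in adj}
```

For a star network, the hub alone suffices; for a path, the interior nodes are selected.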

Further reduction in inter-node communication is possible by executing only a finite number of ADMM iterations per M-step. In a practical embodiment of the algorithm, running a single ADMM iteration per M-step is sufficient for CB-DSBL to converge. As shown in Fig. 3, beyond two or three ADMM iterations per M-step, there is only a marginal improvement in the quality of the solution as well as the convergence speed. Fig. 4 shows that even with a single ADMM iteration per M-step, CB-DSBL typically converges quite rapidly to the centralized solution.

### IV-C Convergence of ADMM Iterations in the M-step

In this section, we analyze the convergence of the ADMM iterations (18), (19) and (21) derived for the M-step optimization in CB-DSBL. By doing so, we aim to highlight the effects of the bridge node set $\mathcal{B}$ and the augmented Lagrangian parameter $\rho$ on the convergence of the ADMM iterations.

ADMM has been a popular choice for solving both convex [33, 5, 29, 31, 23] and, more recently, nonconvex [34] optimization problems in a distributed setup. In its classical form, ADMM solves the following constrained optimization problem:

$$\min_{\mathbf{x}, \mathbf{z}} \; f(\mathbf{x}) + g(\mathbf{z}) \quad \text{subject to} \quad \mathbf{A}\mathbf{x} + \mathbf{B}\mathbf{z} = \mathbf{c}, \tag{22}$$

where $\mathbf{x}$ and $\mathbf{z}$ are the primal variables. The matrices $\mathbf{A}$, $\mathbf{B}$ and the vector $\mathbf{c}$ appearing in the linear equality constraint are of appropriate dimensions. The functions $f$ and $g$ are convex with respect to $\mathbf{x}$ and $\mathbf{z}$, respectively. In [35], the authors have shown a linear convergence rate for the classical ADMM iterations under the assumptions of strict convexity and Lipschitz continuous gradient on one of $f$ or $g$, along with a full row rank assumption on the matrix $\mathbf{A}$. However, in the ADMM formulation of a decentralized consensus optimization problem, the coefficient matrix is seldom of full row rank. In [36], the full row rank condition was relaxed and a linear rate of convergence was established for decentralized ADMM iterations for a generic convex optimization with linear consensus constraints similar to (13). In [37], the convergence of ADMM for solving an average consensus problem has been analyzed for both noiseless and noisy communication links. In both [36] and [37], the auxiliary primal variables (the entries of $\mathbf{z}$) have a one-to-one correspondence with the communication links between the network nodes. However, such a bijection is missing for the bridge variables used in our work for enforcing consensus between the primal variables. Due to this, the convergence results of [36, 37] are not directly applicable to our case. In the sequel, we present an analysis of the convergence of the decentralized ADMM iterations under the bridge node based inter-node communication scheme.

We start by defining block matrices $\mathbf{E}_1$ and $\mathbf{E}_2$ of sizes $nc \times nL$ and $nc \times n|\mathcal{B}|$, respectively. The block rows of $\mathbf{E}_1$ and $\mathbf{E}_2$ encode the equality constraints in (14), such that if the $l$-th equality constraint is $\boldsymbol{\gamma}_j = \boldsymbol{\gamma}_b$, $b \in \mathcal{B}_j$, then the $(l, j)$-th block of $\mathbf{E}_1$ is $\mathbf{I}_n$ and the $(l, b)$-th block of $\mathbf{E}_2$ is $-\mathbf{I}_n$, with the rest of the entries in the block row being zero. It can easily be shown that the minimum and maximum number of bridge nodes connected to any node in the network coincide with the minimum and maximum eigenvalues of $\mathbf{E}_1^T\mathbf{E}_1$, denoted by $\sigma_{\min}^2$ and $\sigma_{\max}^2$, respectively. Fig. 5 illustrates the construction of the block matrices $\mathbf{E}_1$ and $\mathbf{E}_2$ for an example network.

Using the newly defined terms, the optimization problem in (14) can be rewritten compactly as

 \min_{\gamma_J,\gamma_B} f(\gamma_J) \quad \text{s.t. } E_1\gamma_J + E_2\gamma_B = 0 \qquad (23)

where $f(\gamma_J)$ denotes the objective function in (14), which depends only on $\gamma_J$. The augmented Lagrangian corresponding to (23) can also be rewritten compactly as

 L_\rho(\gamma_J,\gamma_B,\lambda) = f(\gamma_J) + \lambda^T(E_1\gamma_J+E_2\gamma_B) + \frac{\rho}{2}(E_1\gamma_J+E_2\gamma_B)^T(E_1\gamma_J+E_2\gamma_B). \qquad (24)

By construction, the block matrix $E_2$ has full column rank, as all its columns are mutually disjoint in support. However, $E_1$ can be row rank deficient due to repeated rows caused by a node being connected to multiple bridge nodes, which is often the case. Since the matrix $E_1$ is row rank deficient, the ADMM convergence results of [35] are not applicable to (23). Theorem 1 below summarizes the convergence of the ADMM iterations (18), (19) and (21) to their fixed point. The result in Theorem 1 holds for any $f$ that is strongly convex with strong convexity constant $m_f$, and with an $M_f$-Lipschitz continuous gradient.

###### Theorem 1.

Let $\gamma_J^*$, $\gamma_B^*$ and $\lambda^*$ denote the unique primal and dual optimal solutions of (23), and let the vector $u$ be constructed by stacking $\gamma_B$ and $\lambda$, i.e., $u = [\gamma_B^T \ \lambda^T]^T$ (similarly for $u^*$). Then, it holds that

1. The sequence $\{u^r\}$ is Q-linearly convergent to $u^*$ (a sequence $\{u^r\}$ is said to be Q-linearly convergent to $u^*$ if there exists $\delta > 0$ such that $\|u^{r+1}-u^*\| \leq \frac{1}{1+\delta}\|u^r-u^*\|$ [36]), i.e.,

 \|u^{r+1}-u^*\|_G \leq \frac{1}{1+\delta}\|u^r-u^*\|_G \qquad (25)

where $\delta$ is evaluated as

 \delta = \max_{\mu,\nu>1}\ \min\left(\frac{2m_f}{\frac{\nu M_f^2}{\rho(\nu-1)\sigma_{\min}^2}+\mu\rho\sigma_{\max}^2},\ \frac{\sigma_{\min}^2}{\nu\sigma_{\max}^2},\ \frac{\mu-1}{\mu}\right). \qquad (26)
2. The primal sequence $\{\gamma_J^r\}$ is R-linearly convergent to $\gamma_J^*$ (a sequence $\{\gamma^r\}$ is said to be R-linearly convergent to $\gamma^*$ if there exists a Q-linearly convergent sequence $\{u^r\}$, converging to zero, such that $\|\gamma^r-\gamma^*\| \leq u^r$), i.e.,

 \|\gamma_J^{r+1}-\gamma_J^*\|_2 \leq \frac{1}{2m_f}\|u^r-u^*\|_G \qquad (27)

where $\|\cdot\|_G$ denotes the weighted norm with respect to the diagonal matrix $G$, i.e., $\|u\|_G^2 = u^T G u$.

###### Proof.

See Appendix C. ∎

According to Theorem 1, the primal optimality gap decays R-linearly with each ADMM iteration. Moreover, since $\gamma_J^*$ is primal feasible, there is consensus among the local hyperparameter iterates upon convergence, implying that each node effectively minimizes the centralized M-step cost function in (10).

### Iv-D Selection of the Augmented Lagrangian Parameter ρ

From (25) and (27) in Theorem 1, we observe that to maximize the rate of decay of the primal optimality gap between $\gamma_J^r$ and $\gamma_J^*$ in each ADMM iteration, the augmented Lagrangian parameter $\rho$ has to be chosen such that it maximizes $\delta$ in (26). Theorem 2 gives the optimal value of $\rho$ and the corresponding value of $\delta$.

###### Theorem 2.

The optimal value of the augmented Lagrangian parameter $\rho$ which uniquely maximizes $\delta$ as defined in (26) is given by

 \rho_{\text{opt}} = \frac{M_f}{\sigma_{\max}\sigma_{\min}}\left[\frac{\sqrt{(\kappa-1)^2+4\kappa\kappa_f^2}+(\kappa-1)}{\sqrt{(\kappa-1)^2+4\kappa\kappa_f^2}-(\kappa-1)}\right]^{\frac{1}{2}}. \qquad (28)

The corresponding maximal value of $\delta$ is given by

 \delta_{\text{opt}} = \frac{2}{\kappa+1+\sqrt{(\kappa-1)^2+4\kappa\kappa_f^2}} \qquad (29)

where $\kappa_f = M_f/m_f$ represents the condition number of the objective function in (14) and $\kappa = \sigma_{\max}^2/\sigma_{\min}^2$ is the ratio of the maximum and minimum eigenvalues of $E_1^T E_1$.

###### Proof.

See Appendix E. ∎

From (29), we observe that the convergence rate of the ADMM iterations in the M-step of the CB-DSBL algorithm depends on two factors: $\kappa$ and $\kappa_f$. A value of $\kappa$ close to its minimum value of unity results in faster convergence of the ADMM iterations. Since $\kappa$ is also equal to the ratio of the maximum to the minimum number of bridge nodes per node in the network, a rule of thumb for bridge node selection is to ensure that each node is connected to roughly the same number of bridge nodes. The convergence rate also depends on $\kappa_f$, which captures how well conditioned the function $f$ is. For the case where $f$ is the objective function in (14), the constants $m_f$ and $M_f$, and hence the optimal ADMM parameter $\rho_{\text{opt}}$ and the corresponding $\delta_{\text{opt}}$, can be evaluated in closed form. For a given network connectivity graph, $\rho_{\text{opt}}$ can be computed off-line and programmed into each node. As shown in Fig. 6, the average MSE and the mean number of iterations vary widely with $\rho$, with an inappropriate choice of $\rho$ resulting in slow convergence and poor reconstruction performance. Also, the $\rho_{\text{opt}}$ computed in (28) is very close to the $\rho$ that results in both the fastest convergence and the lowest average MSE.
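As a numerical sanity check on Theorem 2 (a sketch with arbitrary illustrative constants, not values from the paper), the closed forms (28) and (29) can be compared against a brute-force maximization of (26) over $\mu$ and $\nu$:

```python
import numpy as np

# Arbitrary illustrative constants for the check.
mf, Mf = 1.0, 3.0            # strong convexity / Lipschitz constants of f
s2_min, s2_max = 1.0, 2.0    # extreme eigenvalues of E1^T E1
kappa, kf = s2_max / s2_min, Mf / mf

# Closed forms (28) and (29).
D = np.sqrt((kappa - 1) ** 2 + 4 * kappa * kf ** 2)
rho_opt = Mf / np.sqrt(s2_max * s2_min) * np.sqrt((D + kappa - 1) / (D - kappa + 1))
delta_opt = 2.0 / (kappa + 1 + D)

# Brute-force delta at rho = rho_opt: maximize over a (mu, nu) grid the
# minimum of the three terms in (26); the grid value approaches delta_opt
# from below as the grid is refined.
mu = np.linspace(1.001, 5.0, 1200)[:, None]
nu = np.linspace(1.001, 5.0, 1200)[None, :]
t1 = 2 * mf / (nu * Mf ** 2 / (rho_opt * (nu - 1) * s2_min) + mu * rho_opt * s2_max)
t2 = s2_min / (nu * s2_max)
t3 = (mu - 1) / mu
delta_grid = np.minimum(np.minimum(t1, t2), t3).max()
print(delta_opt, delta_grid)  # nearly equal, with delta_grid <= delta_opt
```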

### Iv-E Computational Complexity of CB-DSBL

In this section, we discuss the computational complexity of the steps involved in a single iteration of the CB-DSBL algorithm. The local E-step requires a fixed number of elementary operations at each node, determined by the local problem dimensions. The M-step is executed as multiple inner ADMM iterations. A single ADMM iteration involves updating the local hyperparameter estimate and the Lagrange multipliers, whose per-node cost scales with the largest number of bridge nodes assigned to any node in the network. Further, each bridge node has to perform additional computations to update the local bridge parameters in every ADMM iteration. Thus, the overall computational complexity of a single CB-DSBL iteration at each node does not scale with $L$, the total number of nodes in the network, as desired.

### Iv-F Other CB-DSBL Variants

There are several alternatives to the aforementioned bridge node based ADMM technique that could potentially be used to solve the M-step optimization in (13). In this section, we present empirical results comparing the performance and communication complexity of four different variants of the proposed CB-DSBL algorithm, based on: (i) bridge node based ADMM [23], (ii) Distributed ADMM (D-ADMM) [31], (iii) Consensus Averaging Method of Multipliers (CA-MoM) [30], and (iv) EXact firsT ordeR Algorithm (EXTRA) [32]. Each of these decentralized algorithms converges at least at an $O(1/r)$ rate, where $r$ stands for the iteration count. Besides these four, there are proximal gradient based methods [38, 39] relying on Nesterov-type acceleration techniques which also offer linear convergence rates. However, these algorithms require the objective function to be bounded and involve multiple communication rounds per iteration, which is a major concern in our work. As shown in Fig. 2, the proposed CB-DSBL variant relying on the bridge node based ADMM scheme is the most communication efficient.

### Iv-G Implementation Issues

The CB-DSBL algorithm can be seen as a decentralized EM algorithm for finding the ML estimate of the hyperparameters of a sparsity inducing prior. Not surprisingly, CB-DSBL also inherits the tendency of the EM algorithm to converge to one of the multiple local maxima of the ML cost function. However, getting trapped in a local maximum is not a serious problem, as it has been shown in [15] that all local maxima of the ML cost function are sparse (with at most as many nonzero entries as measurements), and hence qualify as reasonably good solutions to our original sparse model estimation problem. Despite this, it is recommended to seed the EM algorithm with an initial hyperparameter vector whose entries are all close to zero.

Another common issue is the wide variation in the energy of the nonzero signal coefficients across the network. Specifically, in distributed event classification by a multitude of different types of sensors [6], each sensor node may employ its own distinct sensing modality and hence may perceive a different SNR. In such cases, a preconditioning step which normalizes the local response vector to unit energy is recommended for fast convergence of the CB-DSBL algorithm. The local sparse signal estimates can be re-adjusted at the end to undo the preconditioning.
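A minimal sketch of this preconditioning step, assuming a generic `recover` routine standing in for the local sparse solver (all names here are ours):

```python
import numpy as np

# Preconditioning sketch: normalize the local response vector y to unit
# energy so all nodes see comparable signal energy, recover the scaled
# signal, then undo the scaling on the recovered estimate.
def precondition_and_recover(A, y, recover):
    s = np.linalg.norm(y)
    x_scaled = recover(A, y / s)   # solve with the unit-energy response
    return s * x_scaled            # undo the preconditioning

# Usage with a trivial least-squares "recover" on a square toy system.
A = np.array([[2.0, 0.0], [0.0, 4.0]])
y = np.array([6.0, 8.0])
x = precondition_and_recover(A, y, lambda A, y: np.linalg.solve(A, y))
print(x)  # [3. 2.] -- matches solving A x = y directly
```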

## V Simulation Results

In this section, we present simulation results to examine the performance and complexity aspects of the proposed CB-DSBL algorithm in comparison with existing decentralized algorithms: DRL-1 [5], DCOMP [18] and DCSP [19]. The centralized M-SBL [14] is also included in the study as a performance benchmark for the proposed decentralized algorithm. The CB-DSBL variant considered here executes two ADMM iterations in the inner loop for every EM iteration in the outer loop. The value of the augmented Lagrangian parameter, $\rho$, is chosen according to (28). For each experiment, the set of bridge nodes is selected as described in Section IV-B. The local measurement matrices are chosen to be Gaussian random matrices with normalized columns. The nonzero signal coefficients are sampled independently from the Rademacher distribution, unless mentioned otherwise. For each trial, the connections between the nodes are generated according to a random Erdős–Rényi graph with a fixed node connection probability. In the final step of the M-SBL and CB-DSBL algorithms, the active support is identified by element-wise thresholding of the local hyperparameter vector at each node, with the threshold proportional to the local measurement noise variance.

### V-a Performance versus SNR

In the first set of experiments, we compare the normalized mean squared error (NMSE) and the normalized support error rate (NSER) of different algorithms for a range of SNRs. The support-aware LMMSE estimator sets the MSE performance benchmark for all the support agnostic algorithms considered here. The NMSE and NSER error metrics are defined as

 \text{NMSE} = \frac{1}{L}\sum_{j=1}^{L}\frac{\|x_j-\hat{x}_j\|_2^2}{\|x_j\|_2^2}, \qquad \text{NSER} = \frac{1}{L}\sum_{j=1}^{L}\frac{|\mathcal{S}\setminus\hat{\mathcal{S}}_j| + |\hat{\mathcal{S}}_j\setminus\mathcal{S}|}{|\mathcal{S}|}

where $\mathcal{S}$ denotes the true common support and $\hat{\mathcal{S}}_j$ is the support estimated at node $j$. The network size $L$ is kept fixed across these experiments. As seen in Fig. 7, CB-DSBL matches the performance of centralized M-SBL in all cases. At higher SNRs, both M-SBL and the proposed CB-DSBL are MSE optimal. CB-DSBL also outperforms DRL-1 and DCOMP in terms of both MSE and support recovery. This is attributed to the fact that the Gaussian prior used in CB-DSBL, with its alternate interpretation as a variational approximation to the Student's t-distribution, is more capable of inducing sparsity than the sum-log-sum penalty used in DRL-1. The poor performance of DCOMP is primarily due to its sequential approach to support recovery, which prevents any corrections from being applied to the support estimate at each step of the algorithm. Contrary to [19], DCSP fails to perform better than DCOMP. This is because DCSP works only when the number of measurements is sufficiently large relative to the size $k$ of the nonzero support.
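The two metrics can be computed in a few lines of code (a sketch; the NSER expression counting missed and spurious support indices is our reading of the metric, and all variable names are ours):

```python
import numpy as np

# NMSE: per-node normalized squared error, averaged over the L nodes.
def nmse(X, X_hat):
    # X, X_hat: n x L arrays, one column of signal coefficients per node
    return np.mean(np.sum((X - X_hat) ** 2, axis=0) / np.sum(X ** 2, axis=0))

# NSER: missed plus spurious indices relative to the true support size,
# averaged over the per-node support estimates.
def nser(true_support, est_supports):
    S = set(true_support)
    errs = [(len(S - set(Sj)) + len(set(Sj) - S)) / len(S) for Sj in est_supports]
    return float(np.mean(errs))

X = np.array([[1.0, 2.0], [0.0, 0.0], [-1.0, 1.0]])      # true signals
X_hat = np.array([[1.0, 2.0], [0.0, 0.0], [-1.0, 0.0]])  # estimates
print(nmse(X, X_hat))               # node 2 error: 1/5 -> mean of [0, 0.2] = 0.1
print(nser([0, 2], [[0, 2], [0]]))  # node 2 misses index 2 -> mean of [0, 0.5] = 0.25
```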

### V-B Tradeoff between Measurement Rate and Network Size

In the second set of experiments, we characterize the NMSE phase transition of the different algorithms in the measurement rate versus network size plane, to identify the minimum measurement rate ($m/n$) needed to keep the signal reconstruction error below a fixed NMSE threshold, for different network sizes ($L$) and a fixed sparsity rate ($k/n$). As shown in Fig. 8, for the same network size, CB-DSBL is able to successfully recover the unknown signals at a much lower measurement rate compared to DRL-1, DCOMP and DCSP. This plot brings out the significant benefit of inter-node collaboration and of exploiting the JSM-2 model in reducing the number of measurements required per node for successful signal recovery. Additionally, as the network grows in size, the complexity of the local computations at each node also reduces along with the number of local measurements (see Section IV-E).

### V-C Performance versus Measurement Rate (m/n)

In the third set of experiments, we compare the algorithms with respect to their ability to recover the exact support for different undersampling ratios. As seen in Fig. 9, for a similar network size, CB-DSBL is able to exploit the joint sparsity structure better than DCOMP, DCSP and DRL-1, and can correctly recover the support from significantly fewer measurements per node. Once again, CB-DSBL has support recovery performance identical to that of the centralized M-SBL, which was one of our design goals.

### V-D Phase Transition Characteristics

In this set of experiments, we compare the phase transition behavior of the different algorithms under NMSE and support recovery based pass/fail criteria. Fig. 9(a) plots the MSE phase transition of the different algorithms, where any point below the phase transition curve represents a sparsity rate and measurement rate tuple $(k/n, m/n)$ which results in an NMSE below the chosen threshold, corresponding to less than a fixed percentage of signal reconstruction error. Likewise, in Fig. 9(b), points below the support recovery phase transition curve represent tuples which result in highly accurate recovery of the nonzero support across all the nodes. Again, we see that CB-DSBL and centralized M-SBL have identical performance, and both are capable of signal reconstruction from considerably fewer measurements compared to DRL-1, DCOMP and DCSP.

### V-E Tradeoff between Number of Bridge Nodes and Robustness to Node Failures

In the final set of experiments, we demonstrate empirically that increasing the number of bridge nodes in the CB-DSBL algorithm makes it more robust to random node failures. As shown in Fig. 11, by gradually increasing the density of bridge nodes in the network, the CB-DSBL algorithm is able to tolerate higher rates of node failure without compromising on signal reconstruction performance. More interestingly, only a relatively small fraction of the nodes need to be bridge nodes to ensure that CB-DSBL operates robustly in the face of random node failures.

## Vi Conclusions

In this paper, we proposed a novel iterative Bayesian algorithm called CB-DSBL for decentralized estimation of joint-sparse signals by multiple nodes in a network. The CB-DSBL algorithm employs an ADMM-based decentralized EM procedure to efficiently learn the parameters of a joint sparsity inducing signal prior which is shared by all the nodes and is subsequently used in the MAP estimation of the local signals. The CB-DSBL algorithm is well suited for applications where the privacy of the signal coefficients is important, as there is no direct exchange of either measurements or signal coefficients between the nodes. Experimental results showed that CB-DSBL outperforms the existing decentralized algorithms DRL-1, DCOMP and DCSP in terms of both NMSE and support recovery performance. We also established the R-linear convergence of the underlying decentralized ADMM iterations. The amount of inter-node communication during the ADMM iterations is controlled by restricting each node to exchange information with only a small subset of its single hop neighbors. For this inter-node communication scheme, the ADMM convergence results presented here are applicable to any consensus driven optimization of a convex objective function. Future extensions of this work could exploit any inter-vector correlation between the jointly sparse signals. It would also be interesting to analyze the convergence of the CB-DSBL algorithm in the presence of noisy communication links between nodes and under asynchronous network operation.

## Appendix

### A Derivation of the M-step Cost Function

The conditional expectation in Section III can be simplified as shown below.

 \mathbb{E}_{X}[\log p(Y,X;\gamma)\,|\,Y;\gamma^k] = \mathbb{E}_{X|Y;\gamma^k}[\log p(Y|X) + \log p(X;\gamma)] = \mathbb{E}_{X|Y;\gamma^k}[\log p(Y|X)] + \sum_{j\in\mathcal{J}} \mathbb{E}_{x_j|y_j;\gamma^k}[\log p(x_j;\gamma)]. \qquad (30)

Using (2), and discarding the terms independent of $\gamma$ in (30), the M-step objective function is given by

 Q(\gamma|\gamma^k) = \sum_{j\in\mathcal{J}} \mathbb{E}_{x_j|y_j;\gamma^k}\left[-\frac{1}{2}\log|\Gamma| - \frac{1}{2}x_j^T\Gamma^{-1}x_j\right] = -\frac{1}{2}\sum_{j\in\mathcal{J}}\left(\log|\Gamma| + \sum_{i=1}^{n}\frac{\mathbb{E}_{x_j\sim\mathcal{N}(\mu_j^{k+1},\Sigma_j^{k+1})}[x_j(i)^2]}{\gamma(i)}\right) = -\frac{1}{2}\sum_{j\in\mathcal{J}}\sum_{i=1}^{n}\left(\log\gamma(i) + \frac{\Sigma_j^{k+1}(i,i)+\mu_j^{k+1}(i)^2}{\gamma(i)}\right). \qquad (31)
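Maximizing (31) coordinate-wise gives the standard M-SBL-style closed-form update $\gamma(i) = \frac{1}{L}\sum_{j}\big(\Sigma_j^{k+1}(i,i)+\mu_j^{k+1}(i)^2\big)$, which the following sketch verifies numerically with synthetic values (variable names are ours):

```python
import numpy as np

# Setting dQ/d gamma(i) = 0 in (31) decouples across coordinates i and gives
#   gamma(i) = (1/L) * sum_j ( Sigma_j(i,i) + mu_j(i)^2 ).
# Below, c[j] stands for Sigma_j(i,i) + mu_j(i)^2 at one fixed coordinate i,
# and we confirm the closed form against a brute-force 1-D search.
rng = np.random.default_rng(0)
c = rng.uniform(0.5, 2.0, size=5)  # one synthetic term per node, L = 5

def Q_i(gamma):
    # per-coordinate contribution to (31), evaluated on an array of gammas
    return -0.5 * np.sum(np.log(gamma[:, None]) + c[None, :] / gamma[:, None], axis=1)

closed_form = c.mean()
grid = np.linspace(0.1, 5.0, 50001)
brute_force = grid[np.argmax(Q_i(grid))]
print(closed_form, brute_force)  # agree to within the grid resolution
```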

### B Derivation of the Simplified Update for γb

By summing the dual variable update rule (18) across all the nodes in $\mathcal{N}_b$, the following holds for every bridge node $b$:

 \sum_{j\in\mathcal{N}_b}(\lambda_b^j)^{r+1} = \sum_{j\in\mathcal{N}_b}(\lambda_b^j)^{r} + \rho\sum_{j\in\mathcal{N}_b}\gamma_j^{r+1} - \rho|\mathcal{N}_b|\,\gamma_b^{r+1}. \qquad (32)

Plugging (20) in (32), we obtain

 \sum_{j\in\mathcal{N}_b}(\lambda_b^j)^{r+1} = 0. \qquad (33)

Using (33) in (20), we obtain the simplified update for $\gamma_b$.
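The net effect of (32) and (33) can be seen in a toy run of the bridge-node ADMM (our own example: scalar local objectives $f_j(x) = (x-a_j)^2/2$, a hypothetical network, and our own variable names):

```python
import numpy as np

# Bridge-node ADMM on min sum_j (x_j - a_j)^2 / 2 subject to consensus.
# Upon convergence every node should agree on the centralized minimizer,
# the average of the a_j, and the multipliers at each bridge sum to zero.
a = np.array([1.0, 4.0, -2.0, 7.0])
bridges = {0: [0, 1, 2], 1: [1, 2, 3]}   # bridge b -> member nodes N_b
memberships = {j: [b for b, Nb in bridges.items() if j in Nb] for j in range(4)}
rho = 1.0
x = np.zeros(4)                          # local estimates (gamma_j analogues)
gb = {b: 0.0 for b in bridges}           # bridge variables (gamma_b analogues)
lam = {(b, j): 0.0 for b, Nb in bridges.items() for j in Nb}

for _ in range(500):
    for j in range(4):                   # local primal update, (19)-style
        Bj = memberships[j]
        x[j] = (a[j] - sum(lam[(b, j)] for b in Bj)
                + rho * sum(gb[b] for b in Bj)) / (1 + rho * len(Bj))
    for b, Nb in bridges.items():        # simplified bridge update via (20)+(33)
        gb[b] = np.mean([x[j] for j in Nb])
    for b, Nb in bridges.items():        # dual ascent, (18)-style
        for j in Nb:
            lam[(b, j)] += rho * (x[j] - gb[b])

print(x)  # all entries near mean(a) = 2.5
# The multipliers attached to each bridge node sum to zero, as in (33).
print([sum(lam[(b, j)] for j in Nb) for b, Nb in bridges.items()])
```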

### C Proof of Theorem 1

The proof of the convergence of ADMM discussed in the sequel is based on the proof given in [36]. However, our proof differs from the one in [36] due to the different scheme adopted here, which uses the auxiliary/bridge nodes to enforce consensus between the nodes. We make the following assumptions about the objective function $f$ in (23).

1. $f$ is twice differentiable and strongly convex in $\gamma_J$. This implies that there exists $m_f > 0$ such that, for all $\gamma_J, \gamma_J'$, the following holds:

 \langle \nabla f(\gamma_J)^T - \nabla f(\gamma_J')^T,\ \gamma_J - \gamma_J' \rangle \geq m_f\|\gamma_J-\gamma_J'\|_2^2. \qquad (34)
2. $\nabla f(\gamma_J)$ is Lipschitz continuous, i.e., there exists a positive scalar $M_f$ such that, for all $\gamma_J, \gamma_J'$,