# Joint Transceiver Design for Wireless Sensor Networks through Block Coordinate Descent Optimization

Yang Liu, Jing Li, Xuanxuan Lu, and Chau Yuen
Electrical and Computer Engineering Department, Lehigh University, Bethlehem, PA 18015, USA
Singapore University of Technology and Design, 20 Dover Drive, 138682, Singapore
Email: {yal210@lehigh.edu, jingli@ece.lehigh.edu, xul311@lehigh.edu, yuenchau@sutd.edu.sg}
Supported by the National Science Foundation under Grants No. 0928092, 1133027 and 1343372.
###### Abstract

This paper considers joint transceiver design in a wireless sensor network where multiple sensors observe the same physical event and transmit their contaminated observations to a fusion center, with all nodes equipped with multiple antennas and linear filters. Under the mean square error (MSE) criterion, the joint beamforming design can be formulated as a nonconvex optimization problem. To attack this problem, several block coordinate descent (BCD) algorithms are proposed and their convergence is carefully examined. First, we propose a two-block coordinate descent (2-BCD) algorithm that iteratively designs all the beamformers and the linear receiver, where both subproblems are convex and convergence of limit points to stationary points is guaranteed. In addition, we give a complete solution to the problem of optimizing one single beamformer, which, although discussed several times, is usually treated incompletely in the existing literature. Based on that solution, multiple-block coordinate descent algorithms are proposed. Solving the joint design of the beamformers by cyclically updating each separate beamformer under the 2-BCD framework gives rise to a layered BCD algorithm, which guarantees convergence to stationary points. Furthermore, a wide class of multiple-block BCD algorithms using the general essentially cyclic updating rule is studied. As will be seen, by appropriately adjusting the update of each single beamformer, fast-converging, highly efficient and stationary-point-achieving algorithms can be obtained. Extensive numerical results are presented to verify our findings.

## I Introduction

Consider a typical wireless sensor network (WSN) comprising a fusion center (FC) and numerous sensors that are spatially distributed and wirelessly connected to monitor the same physical event. After harvesting information from the environment, these sensors transmit distorted observations to the FC, which performs data fusion. A central underlying problem is how to design the sensors and the fusion center so that they collaboratively accomplish the sensing, communication and fusion tasks in an efficient and trustworthy manner.

When the sensors and the fusion center are all equipped with multiple antennas and linear filters, this problem may be regarded as one of the cooperative multi-input multi-output (MIMO) beamforming design problems, which have been tackled from various perspectives [1, 2, 3, 4, 5, 6, 7, 8, 9]. For example, [1, 2, 3, 4] target compression (dimensionality reduction) beamforming. [1] and [2] consider scenarios where the orthogonal multiple access channels (MAC) between the sensors and the fusion center are perfect, without fading or noise. For wireless communication the assumption of ideal channels is unrealistic, and imperfect channels are considered in [3, 4, 5, 6, 7, 8, 9]. [3] studies scalar source transmission with all sensors sharing one total transmission power and using orthogonal MAC. Imperfect coherent MAC with a separate power constraint for each sensor is considered in [4], under the assumption that all channel matrices are square and nonsingular. The works [5] and [6] are particularly relevant to our problem. [5] is the first to present a very general system model, which considers noisy and fading channels and separate power constraints, and does not impose any constraints on the dimensions of the beamformers or channel matrices. [5] provides solutions to several interesting special cases of the general model for coherent MAC, such as the noiseless channel case and the no-intersymbol-interference (no-ISI) channel case. In [6], the authors develop a useful type of iterative method that is applicable to the general model in [5] for coherent MAC. All the works mentioned above take the mean square error (MSE) as the performance metric. Recently, under system settings similar to those of [5], joint transceiver design to maximize mutual information (MI) has attracted attention and is studied in [7] and [8], with orthogonal and coherent MAC being considered respectively. The SNR maximization problem for wireless sensor networks with coherent MAC is reported in [9].

It is interesting to note that the beamforming design problems in MIMO multi-sensor decision-fusion systems are closely related to those in other multi-agent communication networks, e.g. MIMO multi-relay and multiuser communication systems. A large body of work exists in the literature; see, for example, [11, 12, 13, 14] and the references therein.

This paper considers the very general coherent MAC model discussed in [5, 6]. To solve the original nonconvex joint beamforming problem, we propose several iterative optimization algorithms using the block coordinate descent (BCD) methodology, with their convergence and complexity carefully studied. Specifically our contributions include:

1) We first propose a two-block coordinate descent (2-BCD) method that decomposes the original problem into two subproblems: one, with all the beamformers given, is a linear minimum mean square error (LMMSE) filtering problem, and the other, jointly optimizing the beamformers with the receiver given, is shown to be convex. It is worth mentioning that [5] considers the special case where the sensor-FC channels are intersymbol-interference (ISI) free (i.e. the sensor-FC channel matrix is an identity matrix) and solves the entire problem by semidefinite programming (SDP) and relaxation. Here we reformulate the joint optimization of the beamformers, even with arbitrary sensor-FC channel matrices, as a second-order cone programming (SOCP) problem, which can be solved more efficiently than a general SDP problem. Convergence analysis shows that this 2-BCD algorithm guarantees that its limit points are stationary points of the original problem. Although not presented in this article, the proposed 2-BCD algorithm is important in one further respect: the convexity of the subproblem that jointly optimizes the beamformers can be exploited by the multiplier method [24], which requires the problem at hand to be convex, and this gives rise to decentralized solutions under the 2-BCD framework.

2) We also address MSE minimization with respect to one single beamformer and develop fully analytical solutions (possibly up to a simple one-dimensional bisection search). Although the same problem has been studied in several previous papers (e.g. [6, 11, 13, 14]), we carry the analysis through to the end and solve the problem completely by clearly describing the solution structure and deriving the solutions for all possible cases. Specifically, we explicitly obtain the conditions that determine whether the Lagrange multiplier is positive. Moreover, in the zero-Lagrange-multiplier case with a singular quadratic matrix, we give, via the pseudoinverse, the energy-preserving solution among all possible optimal solutions. To the best of our knowledge, these exact results have not been discussed in the existing literature.

3) Our closed-form solution for the update of one single beamformer paves the way to multiple-block coordinate descent algorithms. A layered-BCD algorithm is proposed, in which an inner loop that cyclically optimizes each separate beamformer is embedded in the 2-BCD framework. This layered-BCD algorithm is shown to guarantee that the limit points of its solution sequence are stationary. We also consider a wide class of multiple-block coordinate descent algorithms with the very general essentially cyclic updating rule. It is interesting to note that this class of algorithms subsumes the one proposed in [6] as a special case. Furthermore, as will be shown, by appropriately adjusting the update of each single beamformer to a proximal version and introducing an approximation, the essentially cyclic multiple-block coordinate descent algorithm converges fast, guarantees convergence to stationary points and achieves high computational efficiency.

The rest of the paper is organized as follows: Section II introduces the system model of the joint beamforming problem in the MIMO wireless sensor network. Section III discusses the 2-BCD beamforming design approach and analyzes its convexity and convergence. Section IV discusses the further decomposition of the joint optimization of beamformers, including the closed form solution to one separate beamformer’s update, layered BCD algorithms, essentially cyclic BCD algorithms and their variants and convergence. Section V provides simulation verification and Section VI concludes this article.

Notations: We use bold lowercase letters to denote complex vectors and bold capital letters to denote complex matrices. $\mathbf{0}$, $\mathbf{O}_{m\times n}$, and $\mathbf{I}_m$ are used to denote zero vectors, zero matrices of dimension $m\times n$, and identity matrices of order $m$ respectively. $\mathbf{A}^T$, $\mathbf{A}^*$ and $\mathbf{A}^H$ are used to denote transpose, conjugate and conjugate transpose (Hermitian transpose) respectively of an arbitrary complex matrix $\mathbf{A}$. $\mathrm{Tr}\{\cdot\}$ denotes the trace operation of a square matrix. $|\cdot|$ denotes the modulus of a complex scalar, and $\|\cdot\|_2$ denotes the $\ell_2$-norm of a complex vector. $\mathrm{vec}(\cdot)$ means the vectorization operation of a matrix, which is performed by stacking the columns of a matrix into one long column. $\otimes$ denotes the Kronecker product. $\mathrm{diag}\{\mathbf{A}_1,\cdots,\mathbf{A}_n\}$ denotes the block diagonal matrix with its $i$-th diagonal block being the square complex matrix $\mathbf{A}_i$, $i\in\{1,\cdots,n\}$. $\mathrm{Re}\{x\}$ denotes the real part of a complex value $x$.

## II System Model

Consider a centralized wireless sensor network with $L$ sensors and one fusion center, where all the nodes are equipped with multiple antennas, as shown in Figure 1. Let $M$ and $N_i$ ($i\in\{1,\cdots,L\}$) be the number of antennas provisioned to the fusion center and the $i$-th sensor respectively. Denote $\mathbf{s}$ as the common source vector observed by all sensors. The source $\mathbf{s}$ is a complex vector of dimension , i.e. , and is observed by all the sensors. At the $i$-th sensor, the source signal is linearly transformed by an observation matrix $\mathbf{K}_i$ and corrupted by additive observation noise $\mathbf{n}_i$, which has zero mean and covariance matrix $\boldsymbol{\Sigma}_i$.

Each sensor applies a linear precoder, $\mathbf{F}_i\in\mathbb{C}^{N_i\times J_i}$, to its observation before sending it to the common fusion center, where $J_i$ is the dimension of the observation at the $i$-th sensor. Denote $\mathbf{H}_i\in\mathbb{C}^{M\times N_i}$ as the fading channel between the $i$-th sensor and the fusion center. Here we consider the coherent MAC model, where the transmitted data is superimposed and corrupted by additive noise at the fusion center. Without loss of generality, the channel noise is modeled as a vector $\mathbf{n}_0$ with zero mean and white covariance $\sigma_0^2\mathbf{I}_M$. The fusion center, after collecting all the results, applies a linear postcoder, $\mathbf{G}$, to retrieve the original source $\mathbf{s}$.

This system model depicted in Figure 1 is the same as the general model presented in [5, 6]. Following their convention, we assume that the system is perfectly time-synchronous (which may be realized via the GPS system) and that all the channel state information is known (which may be achieved via channel estimation techniques). Since the sensors and the fusion center are usually distributed over a wide range of space, it is reasonable to assume that the noise at different sensors and at the fusion center are mutually uncorrelated.

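The signal transmitted by the $i$-th sensor takes the form $\mathbf{F}_i(\mathbf{K}_i\mathbf{s}+\mathbf{n}_i)$. The output of the postcoder at the fusion center is given as

$$\hat{\mathbf{s}} = \mathbf{G}^H\mathbf{r} = \mathbf{G}^H\Big(\sum_{i=1}^{L}\mathbf{H}_i\mathbf{F}_i(\mathbf{K}_i\mathbf{s}+\mathbf{n}_i)+\mathbf{n}_0\Big) \tag{1}$$

$$\phantom{\hat{\mathbf{s}}} = \mathbf{G}^H\Big(\sum_{i=1}^{L}\mathbf{H}_i\mathbf{F}_i\mathbf{K}_i\Big)\mathbf{s}+\mathbf{G}^H\Big(\underbrace{\sum_{i=1}^{L}\mathbf{H}_i\mathbf{F}_i\mathbf{n}_i+\mathbf{n}_0}_{\mathbf{n}}\Big), \tag{2}$$

where the compound noise vector $\mathbf{n}$ has covariance matrix $\boldsymbol{\Sigma}_n$ given by

$$\boldsymbol{\Sigma}_n=\sigma_0^2\mathbf{I}_M+\sum_{i=1}^{L}\mathbf{H}_i\mathbf{F}_i\boldsymbol{\Sigma}_i\mathbf{F}_i^H\mathbf{H}_i^H. \tag{3}$$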

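In this paper, we take the mean square error as the figure of merit. The mean square error matrix $\boldsymbol{\Phi}$ is defined as

$$\boldsymbol{\Phi} \triangleq \mathsf{E}\big\{(\mathbf{s}-\hat{\mathbf{s}})(\mathbf{s}-\hat{\mathbf{s}})^H\big\}. \tag{4}$$

Assume that the source signal $\mathbf{s}$ has zero mean and covariance matrix $\boldsymbol{\Sigma}_s$. By plugging (2) into (4), we can express the MSE matrix $\boldsymbol{\Phi}$ as a function of $\{\mathbf{F}_i\}_{i=1}^{L}$ and $\mathbf{G}$ as:

$$\begin{aligned}\boldsymbol{\Phi}\big(\{\mathbf{F}_i\}_{i=1}^{L},\mathbf{G}\big) ={}& \mathbf{G}^H\Big(\sum_{i=1}^{L}\mathbf{H}_i\mathbf{F}_i\mathbf{K}_i\Big)\boldsymbol{\Sigma}_s\Big(\sum_{i=1}^{L}\mathbf{H}_i\mathbf{F}_i\mathbf{K}_i\Big)^H\mathbf{G} - \mathbf{G}^H\Big(\sum_{i=1}^{L}\mathbf{H}_i\mathbf{F}_i\mathbf{K}_i\Big)\boldsymbol{\Sigma}_s - \boldsymbol{\Sigma}_s\Big(\sum_{i=1}^{L}\mathbf{H}_i\mathbf{F}_i\mathbf{K}_i\Big)^H\mathbf{G} \\ &+ \sum_{i=1}^{L}\mathbf{G}^H\mathbf{H}_i\mathbf{F}_i\boldsymbol{\Sigma}_i\mathbf{F}_i^H\mathbf{H}_i^H\mathbf{G}+\sigma_0^2\mathbf{G}^H\mathbf{G}+\boldsymbol{\Sigma}_s.\end{aligned} \tag{5}$$

The total $\mathsf{MSE}$ is then given by

$$\mathsf{MSE}\big(\{\mathbf{F}_i\}_{i=1}^{L},\mathbf{G}\big)\triangleq\mathrm{Tr}\big\{\boldsymbol{\Phi}\big(\{\mathbf{F}_i\}_{i=1}^{L},\mathbf{G}\big)\big\}. \tag{6}$$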
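As a numerical sanity check on (5) and (6), the following NumPy sketch evaluates $\boldsymbol{\Phi}$ term by term and compares it with the algebraically equivalent compact form $(\mathbf{G}^H\mathbf{T}-\mathbf{I})\boldsymbol{\Sigma}_s(\mathbf{G}^H\mathbf{T}-\mathbf{I})^H+\mathbf{G}^H\boldsymbol{\Sigma}_n\mathbf{G}$, where $\mathbf{T}=\sum_i\mathbf{H}_i\mathbf{F}_i\mathbf{K}_i$. The dimensions, the random test data and all function and variable names (`crandn`, `mse_matrix_expanded`, `src_dim`, and so on) are illustrative assumptions and do not come from the paper.

```python
import numpy as np

def crandn(rng, *shape):
    """Random complex Gaussian array (illustrative test data only)."""
    return rng.standard_normal(shape) + 1j * rng.standard_normal(shape)

def random_cov(rng, n):
    """Random Hermitian positive definite matrix."""
    X = crandn(rng, n, n)
    return X @ X.conj().T + np.eye(n)

def effective_channel(H, F, K):
    """T = sum_i H_i F_i K_i."""
    return sum(Hi @ Fi @ Ki for Hi, Fi, Ki in zip(H, F, K))

def noise_cov(H, F, Sigma, sigma0_sq):
    """Compound noise covariance Sigma_n of equation (3)."""
    Sn = sigma0_sq * np.eye(H[0].shape[0], dtype=complex)
    for Hi, Fi, Si in zip(H, F, Sigma):
        Sn = Sn + Hi @ Fi @ Si @ Fi.conj().T @ Hi.conj().T
    return Sn

def mse_matrix_expanded(F, G, H, K, Sigma_s, Sigma, sigma0_sq):
    """MSE matrix Phi written term by term as in equation (5)."""
    T = effective_channel(H, F, K)
    GH = G.conj().T
    Phi = GH @ T @ Sigma_s @ T.conj().T @ G
    Phi = Phi - GH @ T @ Sigma_s - Sigma_s @ T.conj().T @ G
    for Hi, Fi, Si in zip(H, F, Sigma):
        Phi = Phi + GH @ Hi @ Fi @ Si @ Fi.conj().T @ Hi.conj().T @ G
    return Phi + sigma0_sq * (GH @ G) + Sigma_s

def mse_matrix_compact(F, G, H, K, Sigma_s, Sigma, sigma0_sq):
    """Equivalent compact form: signal distortion plus filtered noise."""
    T = effective_channel(H, F, K)
    Err = G.conj().T @ T - np.eye(Sigma_s.shape[0])
    return Err @ Sigma_s @ Err.conj().T + G.conj().T @ noise_cov(H, F, Sigma, sigma0_sq) @ G

# Small random instance: 2 sensors, 3 antennas at the fusion center.
rng = np.random.default_rng(0)
src_dim, M, N, J = 2, 3, [2, 3], [3, 2]
H = [crandn(rng, M, n) for n in N]            # channels H_i (M x N_i)
K = [crandn(rng, j, src_dim) for j in J]      # observation matrices K_i
F = [crandn(rng, n, j) for n, j in zip(N, J)] # precoders F_i (N_i x J_i)
G = crandn(rng, M, src_dim)                   # postcoder G
Sigma_s = random_cov(rng, src_dim)
Sigma = [random_cov(rng, j) for j in J]
sigma0_sq = 0.5

Phi_expanded = mse_matrix_expanded(F, G, H, K, Sigma_s, Sigma, sigma0_sq)
Phi_compact = mse_matrix_compact(F, G, H, K, Sigma_s, Sigma, sigma0_sq)
mse_value = np.trace(Phi_expanded).real       # total MSE of equation (6)
print(mse_value, np.allclose(Phi_expanded, Phi_compact))
```

The compact form makes it visible that $\boldsymbol{\Phi}$ is positive semidefinite for any choice of precoders and postcoder, so the trace in (6) is always real and nonnegative.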

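We consider the case where each sensor has its own transmission power constraint. This means $\mathsf{E}\big\{\|\mathbf{F}_i(\mathbf{K}_i\mathbf{s}+\mathbf{n}_i)\|_2^2\big\}=\mathrm{Tr}\big\{\mathbf{F}_i(\mathbf{K}_i\boldsymbol{\Sigma}_s\mathbf{K}_i^H+\boldsymbol{\Sigma}_i)\mathbf{F}_i^H\big\}\le P_i$. The overall beamforming design problem can then be formulated as the following optimization problem:

$$(\mathsf{P}0):\ \min_{\{\mathbf{F}_i\}_{i=1}^{L},\,\mathbf{G}}\ \mathsf{MSE}\big(\{\mathbf{F}_i\}_{i=1}^{L},\mathbf{G}\big), \tag{7a}$$

$$\text{s.t.}\ \ \mathrm{Tr}\big\{\mathbf{F}_i(\mathbf{K}_i\boldsymbol{\Sigma}_s\mathbf{K}_i^H+\boldsymbol{\Sigma}_i)\mathbf{F}_i^H\big\}\le P_i,\quad i\in\{1,\cdots,L\}. \tag{7b}$$

The above problem is nonconvex, which can be verified by checking the special case where $\{\mathbf{F}_i\}_{i=1}^{L}$ and $\mathbf{G}$ are all scalars.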

The remainder of this paper resorts to the block coordinate descent (BCD) method [15, 16, 17, 18], also known as the Gauss-Seidel method, to solve $(\mathsf{P}0)$ by partitioning the variables into separate groups and optimizing each group (with the others fixed) in an iterative manner. An appropriate decomposition can lead to efficiently solvable subproblems and may also provide opportunities for parallel computation.

## III Two-Block Coordinate Descent (2-BCD)

In this section, we study a two-block coordinate descent (2-BCD) method that decouples the design of the postcoder $\mathbf{G}$ (conditioned on the precoders), hereafter referred to as $(\mathsf{P}1)$, from the design of all the precoders $\{\mathbf{F}_i\}_{i=1}^{L}$ (conditioned on the postcoder), hereafter referred to as $(\mathsf{P}2)$.

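### III-A (P1): Optimizing $\mathbf{G}$ given $\{\mathbf{F}_i\}$

For any given $\{\mathbf{F}_i\}_{i=1}^{L}$, minimizing $\mathsf{MSE}$ with respect to $\mathbf{G}$ becomes a strictly convex unconstrained quadratic problem $(\mathsf{P}1)$:

$$(\mathsf{P}1):\ \min_{\mathbf{G}}\ \mathrm{Tr}\big\{\boldsymbol{\Phi}\big(\mathbf{G}\,\big|\,\{\mathbf{F}_i\}_{i=1}^{L}\big)\big\}. \tag{8}$$

By setting the derivative to zero, the optimal receiver is readily obtained as the well-known Wiener filter [21]

$$\mathbf{G}^{\star}_{(\mathsf{P}1)}=\Big[\Big(\sum_{i=1}^{L}\mathbf{H}_i\mathbf{F}_i\mathbf{K}_i\Big)\boldsymbol{\Sigma}_s\Big(\sum_{i=1}^{L}\mathbf{H}_i\mathbf{F}_i\mathbf{K}_i\Big)^H+\boldsymbol{\Sigma}_n\Big]^{-1}\Big(\sum_{i=1}^{L}\mathbf{H}_i\mathbf{F}_i\mathbf{K}_i\Big)\boldsymbol{\Sigma}_s, \tag{9}$$

where $\boldsymbol{\Sigma}_n$ is given in (3).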
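A minimal NumPy sketch of the receiver update (9) is given below, under the assumption that the effective channel $\mathbf{T}=\sum_i\mathbf{H}_i\mathbf{F}_i\mathbf{K}_i$ and the compound noise covariance $\boldsymbol{\Sigma}_n$ have already been formed. The names `wiener_receiver` and `mse`, and the random test matrices, are hypothetical and serve only to illustrate the formula. The check at the end uses the standard LMMSE fact that the minimum MSE equals $\mathrm{Tr}\{\boldsymbol{\Sigma}_s-\boldsymbol{\Sigma}_s\mathbf{T}^H\mathbf{G}^\star\}$.

```python
import numpy as np

def crandn(rng, *shape):
    """Random complex Gaussian array (illustrative test data only)."""
    return rng.standard_normal(shape) + 1j * rng.standard_normal(shape)

def wiener_receiver(T, Sigma_s, Sigma_n):
    """G* = (T Sigma_s T^H + Sigma_n)^{-1} T Sigma_s, cf. equation (9)."""
    R = T @ Sigma_s @ T.conj().T + Sigma_n       # covariance of the received signal
    return np.linalg.solve(R, T @ Sigma_s)       # solve instead of forming the inverse

def mse(G, T, Sigma_s, Sigma_n):
    """Total MSE for a given receiver G and effective channel T."""
    Err = G.conj().T @ T - np.eye(Sigma_s.shape[0])
    Phi = Err @ Sigma_s @ Err.conj().T + G.conj().T @ Sigma_n @ G
    return np.trace(Phi).real

rng = np.random.default_rng(1)
M, src_dim = 4, 2
T = crandn(rng, M, src_dim)                      # stands in for sum_i H_i F_i K_i
X = crandn(rng, src_dim, src_dim)
Sigma_s = X @ X.conj().T + np.eye(src_dim)
Y = crandn(rng, M, M)
Sigma_n = Y @ Y.conj().T + 0.1 * np.eye(M)

G_opt = wiener_receiver(T, Sigma_s, Sigma_n)
mse_opt = mse(G_opt, T, Sigma_s, Sigma_n)
mse_closed_form = np.trace(Sigma_s - Sigma_s @ T.conj().T @ G_opt).real
# Any perturbation of the receiver should increase the MSE (strict convexity in G).
mse_perturbed = [mse(G_opt + 1e-2 * crandn(rng, M, src_dim), T, Sigma_s, Sigma_n)
                 for _ in range(20)]
print(mse_opt, mse_closed_form, min(mse_perturbed))
```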

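### III-B (P2): Optimizing $\{\mathbf{F}_i\}$ given $\mathbf{G}$

With $\mathbf{G}$ being fixed, the subproblem $(\mathsf{P}2)$, which minimizes $\mathsf{MSE}$ with respect to $\{\mathbf{F}_i\}_{i=1}^{L}$, is formulated as

$$(\mathsf{P}2):\ \min_{\{\mathbf{F}_i\}_{i=1}^{L}}\ \mathrm{Tr}\big\{\boldsymbol{\Phi}\big(\{\mathbf{F}_i\}_{i=1}^{L}\,\big|\,\mathbf{G}\big)\big\}, \tag{10a}$$

$$\text{s.t.}\ \ \mathrm{Tr}\big\{\mathbf{F}_i(\mathbf{K}_i\boldsymbol{\Sigma}_s\mathbf{K}_i^H+\boldsymbol{\Sigma}_i)\mathbf{F}_i^H\big\}\le P_i,\quad i\in\{1,\cdots,L\}. \tag{10b}$$

Below we discuss the convexity of $(\mathsf{P}2)$.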

###### Theorem 1.

$(\mathsf{P}2)$ is convex with respect to $\{\mathbf{F}_i\}_{i=1}^{L}$.

###### Proof.

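First consider the function $f(\mathbf{X})=\mathrm{Tr}\{\mathbf{A}^H\mathbf{X}\boldsymbol{\Sigma}\mathbf{X}^H\mathbf{A}\}$, where the constant matrices $\mathbf{A}$ and $\boldsymbol{\Sigma}$ have appropriate dimensions and $\boldsymbol{\Sigma}$ is Hermitian and positive semidefinite.

By the identities $\mathrm{Tr}\{\mathbf{Y}^H\mathbf{Z}\}=\mathrm{vec}(\mathbf{Y})^H\mathrm{vec}(\mathbf{Z})$ and $\mathrm{vec}(\mathbf{P}\mathbf{X}\mathbf{Q})=(\mathbf{Q}^T\otimes\mathbf{P})\mathrm{vec}(\mathbf{X})$, $f(\mathbf{X})$ can be equivalently written as $f(\mathbf{X})=\mathrm{vec}(\mathbf{X})^H\big[\boldsymbol{\Sigma}^*\otimes(\mathbf{A}\mathbf{A}^H)\big]\mathrm{vec}(\mathbf{X})$.

According to [20], i) $(\mathbf{P}\otimes\mathbf{Q})^H=\mathbf{P}^H\otimes\mathbf{Q}^H$; ii) for any two Hermitian matrices $\mathbf{P}$ and $\mathbf{Q}$ having eigenvalues $\{\lambda_i(\mathbf{P})\}$ and $\{\lambda_j(\mathbf{Q})\}$ respectively, the eigenvalues of their Kronecker product $\mathbf{P}\otimes\mathbf{Q}$ are given by $\{\lambda_i(\mathbf{P})\lambda_j(\mathbf{Q})\}$. As a result, $\mathbf{P}\otimes\mathbf{Q}$ is positive semidefinite when $\mathbf{P}$ and $\mathbf{Q}$ are positive semidefinite.

Since $\boldsymbol{\Sigma}^*$ and $\mathbf{A}\mathbf{A}^H$ are both positive semidefinite, $\boldsymbol{\Sigma}^*\otimes(\mathbf{A}\mathbf{A}^H)$ is positive semidefinite and therefore $f(\mathbf{X})$ is actually a convex homogeneous quadratic function of $\mathrm{vec}(\mathbf{X})$.

Now substitute $\mathbf{X}$ in $f(\mathbf{X})$ by $\sum_{i=1}^{L}\mathbf{H}_i\mathbf{F}_i\mathbf{K}_i$ and recall the fact that affine operations preserve convexity [22]; the term $\mathrm{Tr}\big\{\mathbf{G}^H\big(\sum_{i=1}^{L}\mathbf{H}_i\mathbf{F}_i\mathbf{K}_i\big)\boldsymbol{\Sigma}_s\big(\sum_{i=1}^{L}\mathbf{H}_i\mathbf{F}_i\mathbf{K}_i\big)^H\mathbf{G}\big\}$ in the objective function of $(\mathsf{P}2)$ is therefore convex with respect to $\{\mathbf{F}_i\}_{i=1}^{L}$. By the same reasoning, the remaining terms in the objective and the constraints of $(\mathsf{P}2)$ are either convex quadratic or affine functions of $\{\mathbf{F}_i\}_{i=1}^{L}$ and therefore the problem is convex with respect to $\{\mathbf{F}_i\}_{i=1}^{L}$. ∎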
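The two facts used in this proof can be checked numerically. The sketch below assumes a trace function of the form $\mathrm{Tr}\{\mathbf{A}^H\mathbf{X}\boldsymbol{\Sigma}\mathbf{X}^H\mathbf{A}\}$ with Hermitian positive semidefinite $\boldsymbol{\Sigma}$ (the symbols `A`, `X`, `Sigma` and the matrix `Q` are generic illustration names), and verifies that it equals a quadratic form in $\mathrm{vec}(\mathbf{X})$ whose matrix is a Kronecker product with eigenvalues given by pairwise products.

```python
import numpy as np

def crandn(rng, *shape):
    """Random complex Gaussian array (illustrative test data only)."""
    return rng.standard_normal(shape) + 1j * rng.standard_normal(shape)

rng = np.random.default_rng(2)
m, n, k = 3, 4, 2
A = crandn(rng, m, k)
X = crandn(rng, m, n)
S0 = crandn(rng, n, n)
Sigma = S0 @ S0.conj().T                      # Hermitian positive semidefinite
AAH = A @ A.conj().T                          # Hermitian PSD, rank at most k

# Trace form of the function.
f_trace = np.trace(A.conj().T @ X @ Sigma @ X.conj().T @ A).real

# Quadratic form in vec(X); vec stacks columns, hence order="F".
x = X.reshape(-1, order="F")
Q = np.kron(Sigma.conj(), AAH)
f_quad = (x.conj() @ Q @ x).real

# Eigenvalues of a Kronecker product are the pairwise eigenvalue products.
eig_Q = np.linalg.eigvalsh(Q)
eig_products = np.sort(np.outer(np.linalg.eigvalsh(Sigma),
                                np.linalg.eigvalsh(AAH)).ravel())
print(f_trace, f_quad, eig_Q.min())
```

Since both factors are positive semidefinite, all pairwise products are nonnegative, which is exactly why the quadratic form is convex.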

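In the following we reformulate the subproblem $(\mathsf{P}2)$ into a standard second-order cone programming (SOCP) form. To this end, we introduce the following notations:

$$\mathbf{f}_i\triangleq\mathrm{vec}(\mathbf{F}_i);\qquad \mathbf{g}\triangleq\mathrm{vec}(\mathbf{G}); \tag{11a}$$

$$\mathbf{A}_{ij}\triangleq(\mathbf{K}_j\boldsymbol{\Sigma}_s\mathbf{K}_i^H)^T\otimes(\mathbf{H}_i^H\mathbf{G}\mathbf{G}^H\mathbf{H}_j); \tag{11b}$$

$$\mathbf{B}_i\triangleq(\mathbf{K}_i\boldsymbol{\Sigma}_s)^T\otimes\mathbf{H}_i; \tag{11c}$$

$$\mathbf{C}_i\triangleq\boldsymbol{\Sigma}_i^*\otimes(\mathbf{H}_i^H\mathbf{G}\mathbf{G}^H\mathbf{H}_i). \tag{11d}$$

By the identity $\mathrm{Tr}\{\mathbf{P}\mathbf{Q}\mathbf{R}\mathbf{S}\}=\mathrm{vec}^T(\mathbf{S}^T)\big[\mathbf{R}^T\otimes\mathbf{P}\big]\mathrm{vec}(\mathbf{Q})$ and the above notations, we can rewrite the $\mathsf{MSE}$ in $(\mathsf{P}2)$ as

$$\mathsf{MSE}\big(\{\mathbf{f}_i\}_{i=1}^{L}\,\big|\,\mathbf{g}\big)=\sum_{i=1}^{L}\sum_{j=1}^{L}\mathbf{f}_i^H\mathbf{A}_{ij}\mathbf{f}_j-2\mathrm{Re}\Big(\sum_{i=1}^{L}\mathbf{g}^H\mathbf{B}_i\mathbf{f}_i\Big)+\sum_{i=1}^{L}\mathbf{f}_i^H\mathbf{C}_i\mathbf{f}_i+\sigma_0^2\|\mathbf{g}\|_2^2+\mathrm{Tr}\{\boldsymbol{\Sigma}_s\}. \tag{12}$$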

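By further denoting

$$\mathbf{f}^T\triangleq\big[\mathbf{f}_1^T,\cdots,\mathbf{f}_i^T,\cdots,\mathbf{f}_L^T\big]; \tag{13a}$$

$$\mathbf{A}\triangleq\begin{bmatrix}\mathbf{A}_{1,1}&\mathbf{A}_{1,2}&\cdots&\mathbf{A}_{1,L}\\ \mathbf{A}_{2,1}&\mathbf{A}_{2,2}&\cdots&\mathbf{A}_{2,L}\\ \vdots&\vdots&\ddots&\vdots\\ \mathbf{A}_{L,1}&\mathbf{A}_{L,2}&\cdots&\mathbf{A}_{L,L}\end{bmatrix}; \tag{13b}$$

$$\mathbf{B}\triangleq\big[\mathbf{B}_1,\cdots,\mathbf{B}_i,\cdots,\mathbf{B}_L\big]; \tag{13c}$$

$$\mathbf{C}\triangleq\mathrm{diag}\{\mathbf{C}_1,\cdots,\mathbf{C}_i,\cdots,\mathbf{C}_L\}; \tag{13d}$$

$$\mathbf{D}_i\triangleq\mathrm{diag}\Big\{\mathbf{O}_{\sum_{j=1}^{i-1}J_jN_j},\ \mathbf{E}_i,\ \mathbf{O}_{\sum_{j=i+1}^{L}J_jN_j}\Big\},\quad i\in\{1,\cdots,L\}; \tag{13e}$$

$$\mathbf{E}_i\triangleq(\mathbf{K}_i\boldsymbol{\Sigma}_s\mathbf{K}_i^H+\boldsymbol{\Sigma}_i)^T\otimes\mathbf{I}_{N_i},\quad i\in\{1,\cdots,L\}; \tag{13f}$$

$$c\triangleq\mathrm{Tr}\{\boldsymbol{\Sigma}_s\}+\sigma_0^2\|\mathbf{g}\|_2^2, \tag{13g}$$

the problem $(\mathsf{P}2)$ can be rewritten as $(\mathsf{P}2')$:

$$(\mathsf{P}2'):\ \min_{\mathbf{f}}\ \mathbf{f}^H(\mathbf{A}+\mathbf{C})\mathbf{f}-2\mathrm{Re}\{\mathbf{g}^H\mathbf{B}\mathbf{f}\}+c, \tag{14a}$$

$$\text{s.t.}\ \ \mathbf{f}^H\mathbf{D}_i\mathbf{f}\le P_i,\quad i\in\{1,\cdots,L\}. \tag{14b}$$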
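The vectorized quantities in (11) and (13) can be assembled and checked against the matrix-form MSE and power expressions. The following NumPy sketch does so on a small random instance; the instance sizes, the helper names (`crandn`, `vec`) and the variable `src_dim` for the source dimension are assumptions made for illustration only.

```python
import numpy as np

def crandn(rng, *shape):
    """Random complex Gaussian array (illustrative test data only)."""
    return rng.standard_normal(shape) + 1j * rng.standard_normal(shape)

def vec(X):
    """Column-stacking vectorization."""
    return X.reshape(-1, order="F")

rng = np.random.default_rng(5)
src_dim, M, N, J = 2, 3, [2, 3], [3, 2]
L = len(N)
H = [crandn(rng, M, N[i]) for i in range(L)]
K = [crandn(rng, J[i], src_dim) for i in range(L)]
F = [crandn(rng, N[i], J[i]) for i in range(L)]
G = crandn(rng, M, src_dim)
X = crandn(rng, src_dim, src_dim)
Sigma_s = X @ X.conj().T + np.eye(src_dim)
Sigma = []
for i in range(L):
    Y = crandn(rng, J[i], J[i])
    Sigma.append(Y @ Y.conj().T + np.eye(J[i]))
sigma0_sq = 0.3

# Blocks of (11b)-(11d) and the stacked matrices of (13b)-(13f).
GG = G @ G.conj().T
A = np.block([[np.kron((K[j] @ Sigma_s @ K[i].conj().T).T, H[i].conj().T @ GG @ H[j])
               for j in range(L)] for i in range(L)])
B = np.hstack([np.kron((K[i] @ Sigma_s).T, H[i]) for i in range(L)])
sizes = [J[i] * N[i] for i in range(L)]
offsets = np.concatenate(([0], np.cumsum(sizes)))
total = int(offsets[-1])
C = np.zeros((total, total), dtype=complex)
D = []
for i in range(L):
    blk = slice(int(offsets[i]), int(offsets[i + 1]))
    C[blk, blk] = np.kron(Sigma[i].conj(), H[i].conj().T @ GG @ H[i])
    Di = np.zeros((total, total), dtype=complex)
    Di[blk, blk] = np.kron((K[i] @ Sigma_s @ K[i].conj().T + Sigma[i]).T, np.eye(N[i]))
    D.append(Di)

f = np.concatenate([vec(Fi) for Fi in F])
g = vec(G)
c = np.trace(Sigma_s).real + sigma0_sq * np.linalg.norm(g) ** 2
obj_vectorized = (f.conj() @ (A + C) @ f).real - 2 * (g.conj() @ B @ f).real + c

# Matrix-form MSE and per-sensor transmit powers for comparison.
T = sum(H[i] @ F[i] @ K[i] for i in range(L))
Sigma_n = sigma0_sq * np.eye(M) + sum(
    H[i] @ F[i] @ Sigma[i] @ F[i].conj().T @ H[i].conj().T for i in range(L))
Err = G.conj().T @ T - np.eye(src_dim)
obj_direct = np.trace(Err @ Sigma_s @ Err.conj().T + G.conj().T @ Sigma_n @ G).real
power_vectorized = [(f.conj() @ D[i] @ f).real for i in range(L)]
power_direct = [np.trace(F[i] @ (K[i] @ Sigma_s @ K[i].conj().T + Sigma[i])
                         @ F[i].conj().T).real for i in range(L)]
print(obj_vectorized, obj_direct)
```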

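As proved by Theorem 1, $(\mathsf{P}2)$ (or equivalently $(\mathsf{P}2')$) is convex, which implies $(\mathbf{A}+\mathbf{C})$ is positive semidefinite. Thus the square root $(\mathbf{A}+\mathbf{C})^{\frac{1}{2}}$ exists. The above problem can therefore be reformulated in an SOCP form as follows

$$(\mathsf{P}2_{SOCP}):\ \min_{\mathbf{f},t,s}\ t, \tag{15a}$$

$$\text{s.t.}\ \ s-2\mathrm{Re}\{\mathbf{g}^H\mathbf{B}\mathbf{f}\}+c\le t; \tag{15b}$$

$$\left\|\begin{bmatrix}(\mathbf{A}+\mathbf{C})^{\frac{1}{2}}\mathbf{f}\\[2pt] \frac{s-1}{2}\end{bmatrix}\right\|_2\le\frac{s+1}{2}; \tag{15c}$$

$$\left\|\begin{bmatrix}\mathbf{D}_i^{\frac{1}{2}}\mathbf{f}\\[2pt] \frac{P_i-1}{2}\end{bmatrix}\right\|_2\le\frac{P_i+1}{2},\quad i\in\{1,\cdots,L\}. \tag{15d}$$

$(\mathsf{P}2_{SOCP})$ can be numerically solved by off-the-shelf convex programming solvers, such as CVX [23].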
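The cone constraints (15c) and (15d) rest on a standard equivalence between a convex quadratic inequality and a second-order cone constraint. As a brief worked step, stated for a generic vector $\mathbf{x}$ and scalar $s$ (illustration symbols, not notation from the derivation above):

```latex
\left\| \begin{bmatrix} \mathbf{x} \\ \frac{s-1}{2} \end{bmatrix} \right\|_2 \le \frac{s+1}{2}
\;\Longleftrightarrow\;
\|\mathbf{x}\|_2^2 + \frac{(s-1)^2}{4} \le \frac{(s+1)^2}{4}
\ \text{ and }\ s+1 \ge 0
\;\Longleftrightarrow\;
\|\mathbf{x}\|_2^2 \le s .
```

The last step uses $(s+1)^2-(s-1)^2=4s$, and $\|\mathbf{x}\|_2^2\le s$ already forces $s\ge 0$. Taking $\mathbf{x}=(\mathbf{A}+\mathbf{C})^{\frac{1}{2}}\mathbf{f}$ gives $\mathbf{f}^H(\mathbf{A}+\mathbf{C})\mathbf{f}\le s$, and taking $\mathbf{x}=\mathbf{D}_i^{\frac{1}{2}}\mathbf{f}$ with $s$ replaced by $P_i$ recovers the power constraint $\mathbf{f}^H\mathbf{D}_i\mathbf{f}\le P_i$.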

Summarizing the above discussions, the problem $(\mathsf{P}0)$ can be solved by a 2-BCD algorithm that alternately updates $\mathbf{G}$ by solving $(\mathsf{P}1)$ and updates $\{\mathbf{F}_i\}_{i=1}^{L}$ by solving $(\mathsf{P}2_{SOCP})$, as summarized in Algorithm 1.

### III-C Convergence of 2-BCD Algorithm

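In this subsection we study the convergence of the above 2-BCD algorithm. Consider the optimization problem $\min_{\mathbf{x}\in\mathcal{X}}f(\mathbf{x})$ with $f(\cdot)$ being continuously differentiable and the feasible domain $\mathcal{X}$ being closed and nonempty. A point $\mathbf{x}_0\in\mathcal{X}$ is a stationary point if and only if $\nabla f(\mathbf{x}_0)^T(\mathbf{x}-\mathbf{x}_0)\ge 0$, $\forall\mathbf{x}\in\mathcal{X}$, where $\nabla f(\mathbf{x}_0)$ denotes the gradient of $f$ at $\mathbf{x}_0$. For the proposed 2-BCD algorithm, we have the following convergence conclusion.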

###### Theorem 2.

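The objective sequence $\{\mathsf{MSE}^{(k)}\}_{k=0}^{\infty}$ generated by the 2-BCD algorithm in Algorithm 1 is monotonically decreasing. If $\mathbf{K}_i\boldsymbol{\Sigma}_s\mathbf{K}_i^H\succ 0$ or $\boldsymbol{\Sigma}_i\succ 0$ for all $i\in\{1,\cdots,L\}$, the solution sequence $\big\{\big(\{\mathbf{F}_i^{(k)}\}_{i=1}^{L},\mathbf{G}^{(k)}\big)\big\}_{k=1}^{\infty}$ generated by the 2-BCD algorithm has limit points and each limit point of this sequence is a stationary point of $(\mathsf{P}0)$.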

###### Proof.

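Since each block update solves a minimization problem, $\mathsf{MSE}$ keeps decreasing. Let , for and . Under the strict positive definiteness assumption on $\mathbf{K}_i\boldsymbol{\Sigma}_s\mathbf{K}_i^H$ or $\boldsymbol{\Sigma}_i$, we have $(\mathbf{K}_i\boldsymbol{\Sigma}_s\mathbf{K}_i^H+\boldsymbol{\Sigma}_i)\succ 0$ and thus $\mathbf{E}_i\succ 0$ for all $i\in\{1,\cdots,L\}$. This implies that the null space of $\mathbf{E}_i$ is $\{\mathbf{0}\}$ and consequently $\mathbf{f}_i$ has to be bounded to satisfy the power constraint. Therefore $\mathbf{f}_i$ is bounded for all $i$. Since the feasible set for each $\mathbf{f}_i$ is bounded, by the Bolzano-Weierstrass theorem, there exists a convergent subsequence $\{\mathbf{f}^{(k_j)}\}_{j=1}^{\infty}$. Since $\mathbf{G}$ is updated by equation (9) as a continuous function of $\{\mathbf{F}_i\}_{i=1}^{L}$, the corresponding subsequence of $\{\mathbf{G}^{(k)}\}$ also converges and is thus bounded. By further restricting to a subsequence of $\{k_j\}$, we can obtain a convergent subsequence of $\big\{\big(\{\mathbf{F}_i^{(k)}\}_{i=1}^{L},\mathbf{G}^{(k)}\big)\big\}$.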

Since Algorithm 1 is a two-block coordinate descent procedure and the problem $(\mathsf{P}0)$ has a continuously differentiable objective and a closed and convex feasible domain, Corollary 2 in [17] can be invoked, and we conclude that any limit point of the solution sequence is a stationary point of $(\mathsf{P}0)$. ∎

## IV Multi-Block Coordinate Descent

For the above 2-BCD algorithm, although we can solve the subproblem $(\mathsf{P}2)$ as a standard SOCP problem, its closed-form solution is still inaccessible. The complexity for solving $(\mathsf{P}2)$ can be shown to be . This implies that when the sensor network under consideration has a large number of sensors and/or antennas, the complexity of solving $(\mathsf{P}2)$ can be rather daunting. This motivates us to search for more efficient ways to update the sensors' beamformers.

### IV-A Further Decoupling of (P2) and Closed-Form Solution

Looking back at problem $(\mathsf{P}2')$, although it has separable power constraints, the quadratic terms in its objective couple the beamformers of different sensors and thus make the Karush-Kuhn-Tucker (KKT) conditions of $(\mathsf{P}2')$ analytically unsolvable. Here we adopt the BCD methodology to further decompose the subproblem $(\mathsf{P}2')$. Instead of optimizing all the $\mathbf{f}_i$'s in a single batch, we optimize one $\mathbf{f}_i$ at a time with the others being fixed. By introducing the notation $\mathbf{q}_i\triangleq\sum_{j=1,j\neq i}^{L}\mathbf{A}_{ij}\mathbf{f}_j$, each subproblem $(\mathsf{P}2'_i)$ of $(\mathsf{P}2')$ is given as

$$(\mathsf{P}2'_i):\ \min_{\mathbf{f}_i}\ \mathbf{f}_i^H(\mathbf{A}_{ii}+\mathbf{C}_i)\mathbf{f}_i+2\mathrm{Re}\{\mathbf{q}_i^H\mathbf{f}_i\}-2\mathrm{Re}\{\mathbf{g}^H\mathbf{B}_i\mathbf{f}_i\} \tag{16a}$$

$$\text{s.t.}\ \ \mathbf{f}_i^H\mathbf{E}_i\mathbf{f}_i\le P_i. \tag{16b}$$

Now our problem boils down to solving the simpler problem $(\mathsf{P}2'_i)$, for $i\in\{1,\cdots,L\}$. The following theorem provides an almost closed-form solution to $(\mathsf{P}2'_i)$. The only reason that this is not a fully closed-form solution is that it may involve a bisection search to determine the value of a positive real number.
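For completeness, the cross term in (16a) can be traced back to (14a) in one step. The short derivation below assumes that $\mathbf{q}_i$ denotes $\sum_{j\neq i}\mathbf{A}_{ij}\mathbf{f}_j$, which is the definition consistent with (14a) and (17), and uses $\mathbf{A}_{ji}=\mathbf{A}_{ij}^H$, which follows from (11b) because $\boldsymbol{\Sigma}_s$ and $\mathbf{G}\mathbf{G}^H$ are Hermitian:

```latex
\begin{aligned}
\mathbf{f}^H\mathbf{A}\mathbf{f}
 &= \mathbf{f}_i^H\mathbf{A}_{ii}\mathbf{f}_i
  + \sum_{j\neq i}\left(\mathbf{f}_i^H\mathbf{A}_{ij}\mathbf{f}_j
  + \mathbf{f}_j^H\mathbf{A}_{ji}\mathbf{f}_i\right)
  + \sum_{j\neq i}\sum_{l\neq i}\mathbf{f}_j^H\mathbf{A}_{jl}\mathbf{f}_l \\
 &= \mathbf{f}_i^H\mathbf{A}_{ii}\mathbf{f}_i
  + 2\,\mathrm{Re}\Big\{\mathbf{f}_i^H\sum_{j\neq i}\mathbf{A}_{ij}\mathbf{f}_j\Big\}
  + \text{terms independent of } \mathbf{f}_i \\
 &= \mathbf{f}_i^H\mathbf{A}_{ii}\mathbf{f}_i
  + 2\,\mathrm{Re}\{\mathbf{q}_i^H\mathbf{f}_i\}
  + \text{terms independent of } \mathbf{f}_i .
\end{aligned}
```

The block-diagonal structure of $\mathbf{C}$ and $\mathbf{D}_i$ contributes only $\mathbf{f}_i^H\mathbf{C}_i\mathbf{f}_i$ and $\mathbf{f}_i^H\mathbf{E}_i\mathbf{f}_i$, and the linear term of (14a) contributes $-2\mathrm{Re}\{\mathbf{g}^H\mathbf{B}_i\mathbf{f}_i\}$, which together give (16).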

###### Theorem 3.

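Assume $\mathbf{K}_i\boldsymbol{\Sigma}_s\mathbf{K}_i^H\succ 0$ or $\boldsymbol{\Sigma}_i\succ 0$. Define parameters , and as in equations (26) in the appendix, $r_i$ as the rank of and $p_{i,k}$ as the $k$-th entry of $\mathbf{p}_i$. The solution to $(\mathsf{P}2'_i)$ is given as follows:
(I) If either of the following two conditions holds:
i) $\exists k\in\{r_i+1,\cdots,J_iN_i\}$ such that $|p_{i,k}|\neq 0$;
or  ii) $\sum_{k=r_i+1}^{J_iN_i}|p_{i,k}|=0$ and $\sum_{k=1}^{r_i}\frac{|p_{i,k}|^2}{\lambda_{i,k}^2}>P_i$.
The optimal solution to $(\mathsf{P}2'_i)$ is given by

$$\mathbf{f}_i^{\star}=(\mathbf{A}_{ii}+\mathbf{C}_i+\mu_i^{\star}\mathbf{E}_i)^{-1}(\mathbf{B}_i^H\mathbf{g}-\mathbf{q}_i), \tag{17}$$

with the positive value $\mu_i^{\star}$ being the unique solution to the equation: $g_i(\mu_i)=P_i$. An interval containing $\mu_i^{\star}$ is determined by Lemma 1 which comes later.
(II) If $\sum_{k=r_i+1}^{J_iN_i}|p_{i,k}|=0$ and $\sum_{k=1}^{r_i}\frac{|p_{i,k}|^2}{\lambda_{i,k}^2}\le P_i$,
The optimal solution to $(\mathsf{P}2'_i)$ is given by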

 (18)
###### Proof.

See Appendix -A. ∎

Here we have several comments and supplementary discussions on the solution to ().

###### Comment IV.1.

When $\mu_i^{\star}=0$ and $(\mathbf{A}_{ii}+\mathbf{C}_i)$ is singular, the solution to $(\mathsf{P}2'_i)$ is usually non-unique. According to the proof in Appendix -A, (18) is actually the power-preserving optimal solution, which has the minimal transmission power among all optimal solutions to $(\mathsf{P}2'_i)$.

###### Comment IV.2.

It is worth noting that the three cases discussed in the proof of Theorem 3, (I)-case i), (I)-case ii) and (II), are mutually exclusive events. One and only one case will occur.

###### Comment IV.3.

The problem of minimizing MSE with respect to one separate beamformer under one power constraint is a rather standard problem that has been discussed in previous works such as [6, 11, 13, 14]. A key contribution here is that we have fully solved this problem by clearly identifying the solution structure and writing out the almost closed-form solutions for all possible cases, which the previous papers have not done. One key consideration is the case of rank-deficient $(\mathbf{A}_{ii}+\mathbf{C}_i)$ with zero $\mu_i^{\star}$. Although [11] and [6] mention that $\mu_i^{\star}$ can be zero, the solution for singular $(\mathbf{A}_{ii}+\mathbf{C}_i)$ in this case is missing. In fact, when $(\mathbf{A}_{ii}+\mathbf{C}_i)$ does not have full rank and $\mu_i^{\star}$ is zero, its inverse does not exist and consequently the solutions given in [6, 11, 13, 14] no longer hold (they all provide solutions by matrix inversion). It is noted that [6] imposes more assumptions on the number of antennas to exclude some cases where $(\mathbf{A}_{ii}+\mathbf{C}_i)$ is rank deficient. However, these assumptions undermine the generality of the system model and, still, adverse channel parameters can result in rank deficiency of $(\mathbf{A}_{ii}+\mathbf{C}_i)$. It turns out that the rank deficiency scenario is actually not rare. In fact, whenever or holds, the matrices $\mathbf{A}_{ii}$ and $\mathbf{C}_i$ are both inherently rank deficient. If their null spaces share common nonzero components, $(\mathbf{A}_{ii}+\mathbf{C}_i)$ will be rank deficient. For example, consider the simple case where , , and . In this case $(\mathbf{A}_{ii}+\mathbf{C}_i)$ is not of full rank. Besides, unfavorable channel parameters can also generate a rank-deficient $(\mathbf{A}_{ii}+\mathbf{C}_i)$. Thus taking the rank deficiency of $(\mathbf{A}_{ii}+\mathbf{C}_i)$ when $\mu_i^{\star}=0$ into consideration is both necessary and meaningful.

###### Comment IV.4.

In the special case where , the fully closed-form solution to $(\mathsf{P}2'_i)$ does exist. In this case, the optimal $\mu_i^{\star}$ and $\mathbf{f}_i^{\star}$ can be obtained analytically without a bisection search, and eigenvalue decomposition is also unnecessary. So when , solving $(\mathsf{P}2'_i)$ is extremely efficient. The details can be found in [10]. (Footnote 1: [10] actually solves an approximation of problem $(\mathsf{P}2'_i)$ with a scalar source , where a specific affine term of $\mathbf{f}_i$ in the objective of $(\mathsf{P}2'_i)$ is approximated by its latest value (approximation is discussed in subsection IV-D of this paper). However, a fully analytic solution of $(\mathsf{P}2'_i)$ can be obtained by following very similar lines as [10] without introducing approximation of .)

Recall that in (I) of Theorem 3, $\mu_i^{\star}$ is obtained as the solution to $g_i(\mu_i)=P_i$. This equation generally has no analytic solution. Fortunately $g_i(\mu_i)$ is strictly decreasing in $\mu_i$ and thus the equation can be efficiently solved by a bisection search. The following lemma provides an interval containing the positive $\mu_i^{\star}$, from which the bisection search to determine $\mu_i^{\star}$ can be started.

###### Lemma 1.

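The positive $\mu_i^{\star}$ in $(\mathsf{P}2'_i)$ (i.e. CASE (I) in Theorem 3) has the following lower bound $lbd_i$ and upper bound $ubd_i$:
i) For subcase i)

$$lbd_i=\Big[\frac{\|\mathbf{p}_i\|_2}{\sqrt{P_i}}-\lambda_{i,1}\Big]^{+},\qquad ubd_i=\frac{\|\mathbf{p}_i\|_2}{\sqrt{P_i}}; \tag{19}$$

ii) For subcase ii)

$$lbd_i=\Big[\frac{\|\mathbf{p}_i\|_2}{\sqrt{P_i}}-\lambda_{i,1}\Big]^{+},\qquad ubd_i=\frac{\|\mathbf{p}_i\|_2}{\sqrt{P_i}}-\lambda_{i,r_i}, \tag{20}$$

where $[x]^{+}\triangleq\max\{0,x\}$.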

###### Proof.

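For subcase i), by definition of $g_i(\mu_i)$ in (30), we have

$$\frac{\|\mathbf{p}_i\|_2^2}{(\mu_i+\lambda_{i,1})^2}=\frac{\sum_{k=1}^{J_iN_i}|p_{i,k}|^2}{(\mu_i+\lambda_{i,1})^2}\le g_i(\mu_i)=P_i\le\frac{\sum_{k=1}^{J_iN_i}|p_{i,k}|^2}{\mu_i^2}=\frac{\|\mathbf{p}_i\|_2^2}{\mu_i^2}, \tag{21}$$

which can be equivalently written as

$$\frac{\|\mathbf{p}_i\|_2}{\sqrt{P_i}}-\lambda_{i,1}\le\mu_i\le\frac{\|\mathbf{p}_i\|_2}{\sqrt{P_i}}. \tag{22}$$

Also notice that $\mu_i$ should be positive; the bounds in (19) thus follow.

For subcase ii), by assumption, $p_{i,k}=0$ for $k\in\{r_i+1,\cdots,J_iN_i\}$. This leads to

$$\frac{\|\mathbf{p}_i\|_2^2}{(\mu_i+\lambda_{i,1})^2}=\frac{\sum_{k=1}^{r_i}|p_{i,k}|^2}{(\mu_i+\lambda_{i,1})^2}\le g_i(\mu_i)=P_i\le\frac{\sum_{k=1}^{r_i}|p_{i,k}|^2}{(\mu_i+\lambda_{i,r_i})^2}=\frac{\|\mathbf{p}_i\|_2^2}{(\mu_i+\lambda_{i,r_i})^2}. \tag{23}$$

Following the same line of derivation as in subcase i), we obtain the bounds in (20). ∎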

Algorithm 2 summarizes the results obtained in Theorem 3 and Lemma 1 and provides a (nearly) closed-form solution to $(\mathsf{P}2'_i)$.
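The single-beamformer update can be sketched in NumPy as follows. This is an illustrative reconstruction rather than the paper's Algorithm 2: it assumes the whitening $\mathbf{y}=\mathbf{E}_i^{1/2}\mathbf{f}_i$ followed by an eigendecomposition, takes `p` and `lam` to play the roles of $\mathbf{p}_i$ and $\{\lambda_{i,k}\}$, and uses the intervals of Lemma 1 to start the bisection. The function name `update_single_beamformer`, the tolerance `rank_tol` and the test data are hypothetical. In the zero-multiplier case the sketch returns the minimum-power solution obtained through a pseudoinverse.

```python
import numpy as np

def update_single_beamformer(Q, b, E, P, rank_tol=1e-9, max_iter=200):
    """Sketch: minimize f^H Q f - 2 Re{b^H f} subject to f^H E f <= P.

    Q stands for (A_ii + C_i) (Hermitian PSD), b for (B_i^H g - q_i) and
    E for E_i (Hermitian positive definite). Returns (f, mu).
    """
    # Whitening: with y = E^{1/2} f the constraint becomes ||y||^2 <= P.
    w, V = np.linalg.eigh(E)
    E_inv_half = (V / np.sqrt(w)) @ V.conj().T
    Qt = E_inv_half @ Q @ E_inv_half
    lam, U = np.linalg.eigh((Qt + Qt.conj().T) / 2)
    lam, U = lam[::-1], U[:, ::-1]                 # descending eigenvalues
    p = U.conj().T @ (E_inv_half @ b)

    pos = lam > rank_tol * max(lam[0], 0.0)        # numerically nonzero eigenvalues
    lam = np.where(pos, lam, 0.0)
    p_norm = np.linalg.norm(p)
    in_null = np.linalg.norm(p[~pos]) > rank_tol * p_norm   # subcase i)

    if not in_null:
        p = np.where(pos, p, 0.0)                  # drop round-off in the null space
        z0 = np.zeros_like(p)
        z0[pos] = p[pos] / lam[pos]                # pseudoinverse (minimum-power) solution
        if np.sum(np.abs(z0) ** 2) <= P:           # case (II): constraint inactive
            return E_inv_half @ (U @ z0), 0.0
        p_norm = np.linalg.norm(p)

    # Case (I): bisection for g(mu) = P, started from the interval of Lemma 1.
    def g(mu):
        return np.sum(np.abs(p) ** 2 / (mu + lam) ** 2)

    lo = max(p_norm / np.sqrt(P) - lam[0], 0.0)
    hi = p_norm / np.sqrt(P) - (0.0 if in_null else lam[pos][-1])
    for _ in range(max_iter):
        mid = 0.5 * (lo + hi)
        if g(mid) > P:
            lo = mid
        else:
            hi = mid                               # hi always stays feasible
    z = p / (hi + lam)
    return E_inv_half @ (U @ z), hi

rng = np.random.default_rng(3)
n = 4
def crandn(*shape):
    return rng.standard_normal(shape) + 1j * rng.standard_normal(shape)

X = crandn(n, 2)
Q = X @ X.conj().T                                 # rank-deficient PSD matrix
Y = crandn(n, n)
E = Y @ Y.conj().T + np.eye(n)

b1, P1 = crandn(n), 0.1                            # generic b: multiplier is positive
f1, mu1 = update_single_beamformer(Q, b1, E, P1)
b2, P2 = Q @ crandn(n), 1e6                        # b in range(Q), generous power budget
f2, mu2 = update_single_beamformer(Q, b2, E, P2)
print(mu1, (f1.conj() @ E @ f1).real, mu2)
```

In the first call the constraint is active and the returned pair satisfies the stationarity condition behind (17); in the second call the multiplier is zero and the returned vector solves $\mathbf{Q}\mathbf{f}=\mathbf{b}$ with the smallest transmit power.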

### IV-B Layered-BCD Algorithm

The above analysis of $(\mathsf{P}2'_i)$, combined with $(\mathsf{P}1)$, naturally leads to a nested or layered-BCD algorithm that can be used to analytically solve the joint beamforming problem $(\mathsf{P}0)$. The algorithm consists of two loops (two layers). The outer loop is a two-block descent procedure alternately optimizing $\mathbf{G}$ and $\{\mathbf{F}_i\}_{i=1}^{L}$, and the inner loop further decomposes the optimization of $\{\mathbf{F}_i\}_{i=1}^{L}$ into an $L$-block descent procedure operated in an iterative round-robin fashion. Algorithm 3 outlines the overall procedure. As shown next, this layered-BCD algorithm has a strong convergence property.
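To make the two-layer structure concrete, the following sketch runs a layered BCD on a toy all-scalar instance (single-antenna nodes, scalar source and scalar observations), where both the receiver update and each beamformer update have simple closed forms. The scalar specialization, the function names and the random parameters are assumptions made for illustration, and the inner loop is truncated to a fixed number of sweeps rather than run to convergence.

```python
import numpy as np

def mse_scalar(f, g, h, k, var_s, var_n, var_0):
    """Scalar version of the total MSE."""
    T = np.sum(h * f * k)                                   # effective channel
    noise = var_0 + np.sum(np.abs(h * f) ** 2 * var_n)      # scalar version of (3)
    return float(abs(1 - np.conj(g) * T) ** 2 * var_s + abs(g) ** 2 * noise)

def update_receiver(f, h, k, var_s, var_n, var_0):
    """Scalar Wiener filter, cf. equation (9)."""
    T = np.sum(h * f * k)
    noise = var_0 + np.sum(np.abs(h * f) ** 2 * var_n)
    return T * var_s / (abs(T) ** 2 * var_s + noise)

def update_beamformer(i, f, g, h, k, var_s, var_n, P):
    """Exact minimizer over f_i with all other variables fixed."""
    T_rest = np.sum(h * f * k) - h[i] * f[i] * k[i]
    d = 1 - np.conj(g) * T_rest
    e = np.conj(g) * h[i] * k[i]
    a = var_s * abs(e) ** 2 + abs(g) ** 2 * abs(h[i]) ** 2 * var_n[i]
    if a == 0:
        return 0j                                           # objective does not depend on f_i
    f_unc = var_s * d * np.conj(e) / a                      # unconstrained minimizer
    r_max = np.sqrt(P[i] / (abs(k[i]) ** 2 * var_s + var_n[i]))
    # In the scalar case the constrained minimizer is a radial projection.
    return f_unc if abs(f_unc) <= r_max else f_unc * (r_max / abs(f_unc))

def layered_bcd(h, k, var_s, var_n, var_0, P, outer_iters=30, inner_sweeps=3):
    L = len(h)
    f = np.sqrt(P / (np.abs(k) ** 2 * var_s + var_n)).astype(complex)  # full-power start
    history = []
    for _ in range(outer_iters):
        g = update_receiver(f, h, k, var_s, var_n, var_0)   # outer block: receiver
        history.append(mse_scalar(f, g, h, k, var_s, var_n, var_0))
        for _ in range(inner_sweeps):                       # inner loop: round robin
            for i in range(L):
                f[i] = update_beamformer(i, f, g, h, k, var_s, var_n, P)
        history.append(mse_scalar(f, g, h, k, var_s, var_n, var_0))
    return f, g, history

rng = np.random.default_rng(4)
L = 3
h = rng.standard_normal(L) + 1j * rng.standard_normal(L)
k = rng.standard_normal(L) + 1j * rng.standard_normal(L)
var_n = rng.uniform(0.1, 0.5, L)
P = rng.uniform(0.5, 2.0, L)
var_s, var_0 = 1.0, 0.2
f, g, history = layered_bcd(h, k, var_s, var_n, var_0, P)
print(history[0], history[-1])
```

Because every block update is an exact minimization over its own block, the recorded MSE values are non-increasing, which mirrors the monotonicity statement of the convergence analysis.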

###### Theorem 4.

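Assume that $\mathbf{K}_i\boldsymbol{\Sigma}_s\mathbf{K}_i^H\succ 0$ or $\boldsymbol{\Sigma}_i\succ 0$, $\forall i\in\{1,\cdots,L\}$. The objective sequence $\{\mathsf{MSE}^{(k)}\}_{k=0}^{\infty}$ generated by Algorithm 3 is monotonically decreasing. The solution sequence $\big\{\big(\{\mathbf{F}_i^{(k)}\}_{i=1}^{L},\mathbf{G}^{(k)}\big)\big\}_{k=1}^{\infty}$ generated by Algorithm 3 has limit points, and each limit point is a stationary point of $(\mathsf{P}0)$.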

###### Proof.

The proof of the monotonicity of $\{\mathsf{MSE}^{(k)}\}$ and the existence of limit points for the solution sequence follows the same lines as those of Theorem 2.

From Theorem 1, given $\mathbf{G}$, the objective function of Problem (14) is convex (and therefore, of course, pseudoconvex) with respect to $\mathbf{f}$. Since the objective in $(\mathsf{P}2')$ is continuous and the feasible domain of $\mathbf{f}$ is bounded, there exists some feasible point making the level set closed and bounded. Thus Proposition 6 in [17] can be invoked. For a given $\mathbf{G}$ at any step of the outer loop, the inner loop generates limit point(s) converging to a stationary point of the problem $(\mathsf{P}2')$. Since $(\mathsf{P}2')$ is a convex problem, any stationary point is actually an optimal solution [16]. Therefore the subproblem $(\mathsf{P}2')$ is actually globally solved. By Theorem 2, each limit point of the solution sequence is a stationary point of the original problem $(\mathsf{P}0)$. ∎

Although Theorem 4 states that the layered-BCD algorithm guarantees convergence, it requires the inner loop to iterate many times to converge sufficiently. In fact, if each inner loop is performed with only a small number of iterations, the layered-BCD algorithm becomes a special case of the essentially cyclic BCD algorithm, which is discussed in the next subsection.

### IV-C Essentially Cyclic $(L+1)$-BCD Algorithm

In this subsection, we propose an $(L+1)$-BCD algorithm, where in each update the linear FC receiver or one single beamformer is updated efficiently by equation (9) or Theorem 3 respectively. Compared with the 2-BCD algorithm, the block updating rule of a multiple-block coordinate descent method can follow various patterns. Here we adopt a very general updating rule called the essentially cyclic rule [18]. Under the essentially cyclic update rule, there exists a positive integer $T$, called the period, such that each block of variables is updated at least once within any $T$ consecutive updates. The classical Gauss-Seidel method is a special case of the essentially cyclic rule with its period being exactly the number of blocks of variables.
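The essentially cyclic rule can be stated operationally: every window of consecutive updates whose length equals the period must touch every block. The small pure-Python sketch below checks this property on a finite update schedule; the function name `is_essentially_cyclic`, the block indexing (block 0 for the receiver, blocks 1 and 2 for two beamformers) and the example schedules are hypothetical, and a finite schedule can of course only be checked over the windows it contains.

```python
def is_essentially_cyclic(schedule, num_blocks, period):
    """Return True if every window of `period` consecutive updates in
    `schedule` contains each block index 0..num_blocks-1 at least once."""
    if period <= 0 or len(schedule) < period:
        raise ValueError("schedule must contain at least one full window")
    all_blocks = set(range(num_blocks))
    return all(all_blocks <= set(schedule[start:start + period])
               for start in range(len(schedule) - period + 1))

# Classical Gauss-Seidel order: period equals the number of blocks.
gauss_seidel = [0, 1, 2] * 4
# Receiver (block 0) refreshed after every single beamformer update.
receiver_heavy = [0, 1, 0, 2] * 3

print(is_essentially_cyclic(gauss_seidel, 3, 3))    # True
print(is_essentially_cyclic(receiver_heavy, 3, 4))  # True
print(is_essentially_cyclic(receiver_heavy, 3, 3))  # False
```

The second schedule shows that blocks need not be visited equally often: the receiver is updated twice per period while each beamformer is updated once, and the rule is still satisfied with period 4.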

For the convergence of essentially cyclic BCD algorithm, when the whole solution sequence converges, the limit of the solution sequence is stationary. In fact, assume that the sequence converges to the limit point