Optimized Sensor Collaboration for Estimation of Temporally Correlated Parameters

Sijia Liu, Swarnendu Kar, Makan Fardad, and Pramod K. Varshney

Copyright (c) 2015 IEEE. Personal use of this material is permitted. However, permission to use this material for any other purposes must be obtained from the IEEE by sending a request to pubs-permissions@ieee.org.

S. Liu was with Syracuse University. He is now with the Department of Electrical Engineering and Computer Science, University of Michigan, Ann Arbor, MI 48109, USA. Email: lsjxjtu@umich.edu. S. Kar is with the New Devices Group, Intel Corporation, Hillsboro, OR 97124, USA. Email: swarnendu.kar@intel.com. M. Fardad and P. K. Varshney are with the Department of Electrical Engineering and Computer Science, Syracuse University, Syracuse, NY 13244, USA. Email: {makan, varshney}@syr.edu.

The work of S. Liu and P. K. Varshney was supported by the U.S. Air Force Office of Scientific Research under grant FA9550-10-1-0458. The work of M. Fardad was supported by the National Science Foundation under awards EAGER ECCS-1545270 and CNS-1329885.
Abstract

In this paper, we aim to design the optimal sensor collaboration strategy for the estimation of time-varying parameters, where collaboration refers to the act of sharing measurements with neighboring sensors prior to transmission to a fusion center. We begin by addressing the sensor collaboration problem for the estimation of uncorrelated parameters. We show that the resulting collaboration problem can be transformed into a special nonconvex optimization problem, where a difference of convex functions carries all the nonconvexity. This specific problem structure enables the use of a convex-concave procedure to obtain a near-optimal solution. When the parameters of interest are temporally correlated, a penalized version of the convex-concave procedure becomes well suited for designing the optimal collaboration scheme. In order to improve computational efficiency, we further propose a fast algorithm that scales gracefully with problem size via the alternating direction method of multipliers. Numerical results are provided to demonstrate the effectiveness of our approach and the impact of parameter correlation and temporal dynamics of sensor networks on estimation performance.

Index Terms—Distributed estimation, sensor collaboration, convex-concave procedure, semidefinite programming, ADMM, wireless sensor networks.

I Introduction

Wireless sensor networks (WSNs) consist of a large number of spatially distributed sensors that often cooperate to perform parameter estimation; example applications include environment monitoring, source localization and target tracking [1, 2, 3]. Under limited resources, such as limited communication bandwidth and sensor battery power, it is important to design an energy-efficient architecture for distributed estimation. In this paper, we employ a WSN to estimate time-varying parameters in the presence of inter-sensor communication, referred to as sensor collaboration, in which sensors are allowed to update their measurements by taking linear combinations of the measurements of the sensors they interact with prior to transmission to a fusion center (FC). The presence of sensor collaboration smooths out the observation noise, thereby improving the quality of the signal and the eventual estimation performance.

Early research efforts [4, 5, 6, 7, 8, 9, 10, 11, 12, 13] focused on the problem of distributed inference (estimation or detection) in the absence of sensor collaboration, where an amplify-and-forward transmission strategy is commonly used. In [4], the problem of designing optimal power amplifying factors (also known as the power allocation problem) was studied for distributed estimation over an orthogonal multiple access channel (MAC). In [5], the power allocation problem was addressed for a coherent MAC, where sensors coherently form a beam into a common channel received at the FC. In [6], a likelihood-based multiple access communication strategy was proposed for estimation, and was proved to be asymptotically efficient as the number of sensors increases. In [7], feedback signals were studied to combat uncertainty in the observation model for distributed estimation with a coherent MAC. In [8], the distributed detection problem was studied in the setting of identical Gaussian multiple access channels (without fading). It was shown that the centralized error exponent can be achieved via transmission of the log-likelihood ratio as the number of sensors approaches infinity. Furthermore, in [9, 10, 11], asymptotic detection performance was studied over multiple access fading channels. In [12], the problem of power allocation was studied for distributed detection using a MAC. In [13], the impact of nonlinear bounded transmission schemes on distributed detection and estimation was studied. In the aforementioned literature [4, 5, 6, 7, 8, 9, 10, 11, 12, 13], the act of inter-sensor communication was not considered. In contrast, here we seek the optimal sensor collaboration scheme for the estimation of temporally correlated parameters.

Recently, the problem of distributed estimation with sensor collaboration has attracted attention [14, 15, 16, 17, 18, 19, 20, 21, 22]. In [14], the optimal power allocation strategy was found for a fully connected network, where all the sensors are allowed to collaborate, namely, share their measurements with the other sensors. It was shown that sensor collaboration results in significant improvement of estimation performance compared with the conventional amplify-and-forward transmission scheme. In [15] and [16], optimal power allocation schemes were found for star, branch and linear network topologies. In [17], the sensor collaboration problem was studied for parameter estimation via the best linear unbiased estimator. In [18, 19, 20], the problem of sensor collaboration was studied given an arbitrary collaboration topology. It was observed that even a partially connected network can yield performance close to that of a fully connected network. In [21] and [22], nonzero collaboration costs were taken into account, and a sparsity inducing optimization framework was proposed to jointly design both sensor selection and sensor collaboration schemes.

In the existing literature [14, 15, 16, 17, 18, 19, 20, 21, 22], sensor collaboration was studied in static networks, where sensors take a single snapshot of the static parameter, and then initiate sensor collaboration protocols designed in the setting of single-snapshot estimation. In contrast, here we study the problem of sensor collaboration for the estimation of temporally-correlated parameters in dynamic networks that involve, for example, time-varying observation and channel gains. Solving such a problem is also motivated by real-life applications, in which the physical phenomenon to be monitored such as daily temperature, precipitation, soil moisture and seismic activities [23, 24, 25] is temporally correlated. For example, when monitoring daily temperature variations, temperatures at different times of the day are strongly correlated, e.g., a cold morning is likely to be followed by a cold afternoon.

Due to the presence of temporal dynamics and parameter correlation, the optimal sensor collaboration schemes at multiple time steps are coupled with each other, and thus pose many challenges in problem formulation and optimization compared to the existing work [14, 15, 16, 17, 18, 19, 20, 21, 22]. For example, when the parameters of interest are temporally correlated, expressing the estimation distortion in a succinct closed form (with respect to the collaboration variables) is not straightforward. It should be pointed out that even for uncorrelated parameters, finding the optimal collaboration scheme for each time step is nontrivial, since the energy constraints are temporally inseparable. In this paper, we seek the optimal sensor collaboration scheme by minimizing the estimation distortion subject to individual energy constraints of sensors in the presence of (a) temporal dynamics of the system, (b) temporal correlation of the parameters, and (c) energy constraints coupled in time.

Besides [14, 15, 16, 17, 18, 19, 20, 21, 22], our work is also related to, but quite different from, the problem of consensus-based decentralized estimation [26, 27, 28, 29, 30, 31, 32]. The common idea in [26, 27, 28, 29, 30, 31, 32] is that the task of centralized estimation can be performed using local estimators at sensors together with inter-sensor communications. It was shown in [31] and [32] that the success of decentralized estimation rests on the fact that the global estimation cost with respect to the parameter of interest can be converted into a sum of local cost functions subject to consensus constraints. Different from [26, 27, 28, 29, 30, 31, 32], the focus of this paper is to design the optimal energy allocation strategy (namely, the collaboration weights), rather than to find the optimal estimate. Here, the tasks of estimation and optimization are completed at an FC. Moreover, the studied sensor network is not necessarily connected. In the extreme case of no inter-sensor communication, the proposed sensor collaboration problem reduces to the conventional power allocation problem (based on the amplify-and-forward transmission strategy) [5, 4]. Therefore, our problem is different from the consensus-based decentralized estimation problem, in which the network is assumed to be connected so that consensus of the estimates at local sensors can be achieved.

In our work, the design of the optimal collaboration scheme is studied under two scenarios: a) parameters are temporally uncorrelated or prior knowledge about temporal correlation is not available, and b) parameters are temporally correlated. When parameters are uncorrelated, we derive the closed form of the estimation distortion with respect to the sensor collaboration variables, which takes the form of a sum of quadratic ratios. We show that the resulting sensor collaboration problem is equivalent to a nonconvex quadratically constrained problem, in which a difference of convex functions carries all the nonconvexity. This specific problem structure enables the use of the convex-concave procedure (CCP) [33] to solve the sensor collaboration problem in a numerically efficient manner.

When parameters of interest are temporally correlated, expressing the estimation error as an explicit function of the collaboration variables becomes difficult. In this case, we show that the sensor collaboration problem can be converted into a semidefinite program together with a (nonconvex) rank-one constraint. After convexification, the method of penalty CCP [34] becomes well-suited for seeking the optimal sensor collaboration scheme. However, the proposed algorithm is computationally intensive for large-scale problems. To improve computational efficiency, we develop a fast algorithm that scales gracefully with problem size by using the alternating direction method of multipliers (ADMM) [35].

We summarize our contributions as follows.

  • We propose a tractable optimization framework for the design of the optimal collaboration scheme that accounts for parameter correlation and temporal dynamics of sensor networks.

  • We show that the problem of sensor collaboration for the estimation of temporally uncorrelated parameters can be solved as a special nonconvex problem, where the only source of nonconvexity can be isolated to a constraint that contains the difference of convex functions.

  • We provide valuable insights into the problem structure of sensor collaboration with correlated parameters, and propose an ADMM-based algorithm for improving the computational efficiency.

The rest of the paper is organized as follows. In Section II, we introduce the collaborative estimation system, and present the general formulation of the optimal sensor collaboration problem. In Section III, we discuss two types of sensor collaboration problems for the estimation of temporally uncorrelated and correlated parameters. In Section IV, we study the sensor collaboration problem with uncorrelated parameters. In Section V, we propose efficient optimization methods to solve the sensor collaboration problem with correlated parameters. In Section VI, we demonstrate the effectiveness of our approach through numerical examples. Finally, in Section VII we summarize our work and discuss future research directions.

II System Model

In this section, we introduce the collaborative estimation system and formulate the sensor collaboration problem considered in this work. The task here is to estimate a time-varying parameter θ_t over a time horizon of length T. In the estimation system, sensors first acquire their raw measurements via a linear sensing model, and then update their observations through spatial collaboration, where collaboration refers to the act of sharing measurements with neighboring sensors. The collaborative signals are then transmitted through a coherent MAC to the FC, which finally determines a global estimate of θ_t for t = 1, 2, …, T. The overall architecture of the collaborative estimation system is shown in Fig. 1.

Fig. 1: Collaborative estimation architecture.

The vector of measurements from N sensors at time t is given by the linear sensing model

x_t = h_t θ_t + ε_t,  t ∈ [T],   (1)

where, for notational simplicity, [T] denotes the integer set {1, 2, …, T}, x_t ∈ R^N is the vector of measurements, h_t ∈ R^N is the vector of observation gains, θ_t (without loss of generality) is assumed to be a random process with zero mean and variance σ_θ², and ε_t ∈ R^N is the vector of Gaussian noises with i.i.d. variables ε_{t,n} ∼ N(0, σ_ε²) for n ∈ [N] and t ∈ [T].

After linear sensing, each sensor may pass its observation to other sensors for collaboration prior to transmission to the FC. With a relabeling of sensors, we assume that the first M sensors (out of a total of N sensor nodes) communicate with the FC. Collaboration among sensors is represented by a known matrix A with zero-one entries, namely, A_{mn} ∈ {0, 1} for m ∈ [M] and n ∈ [N]. Here we call A a topology matrix, where A_{mn} = 1 signifies that the nth sensor shares its observation with the mth sensor, and A_{mn} = 0 indicates the absence of a collaboration link from the nth sensor to the mth sensor. Note that A is essentially a truncated adjacency matrix. Since a communication link between two sensors need not be bidirectional, the underlying graph of the network is directed, and it is not necessarily connected. In particular, the network given by A = I (for M = N) corresponds to the amplify-and-forward transmission strategy considered in [4].
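To make the topology representation concrete, the following minimal sketch (hypothetical data; the names A, M, N are our own choices, not taken from the paper's experiments) builds a small truncated adjacency matrix and counts its collaboration links.

import numpy as np

# N = 4 sensors in total; the first M = 3 communicate with the FC.
# A[m, n] = 1 means sensor n shares its observation with sensor m;
# the diagonal links let each communicating sensor use its own measurement.
N, M = 4, 3
A = np.zeros((M, N), dtype=int)
A[np.arange(M), np.arange(M)] = 1   # self links
A[0, 1] = 1                         # sensor 1 shares with sensor 0
A[2, 3] = 1                         # sensor 3 shares with sensor 2

K = int(A.sum())                    # number of collaboration links
print(A, "K =", K)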

Based on the topology matrix, the sensor collaboration process at time t is given by

z_t = W_t x_t,  (1 1^T − A) ∘ W_t = 0,   (2)

where z_t = [z_{t,1}, …, z_{t,M}]^T, z_{t,m} is the signal after collaboration at sensor m and time t, W_t ∈ R^{M×N} is the collaboration matrix that contains the collaboration weights (based on the energy allocated) used to combine sensor measurements at time t, ∘ denotes the elementwise product, 1 is the vector of all ones, and 0 is the matrix of all zeros. In what follows, while referring to vectors of all ones and all zeros, their dimensions will be omitted for simplicity but can be inferred from the context. In (2), we assume that sharing of an observation is realized through an ideal (noiseless and cost-free) communication link. The proposed ideal collaboration model enables us to obtain explicit expressions for transmission cost and estimation distortion.
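As a minimal illustration of (2) under assumed toy dimensions, the sketch below masks a weight matrix by the topology matrix so that the constraint (1 1^T − A) ∘ W_t = 0 holds by construction, and then forms the collaborative signals.

import numpy as np

rng = np.random.default_rng(0)
A = np.array([[1, 1, 0, 0],
              [0, 1, 0, 0],
              [0, 0, 1, 1]])           # hypothetical 3 x 4 topology matrix
M, N = A.shape

W_t = rng.standard_normal((M, N)) * A  # masking enforces the topology constraint
x_t = rng.standard_normal(N)           # raw measurements at time t, Eq. (1)
z_t = W_t @ x_t                        # collaborative signals, Eq. (2)
assert np.all(W_t[A == 0] == 0)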

After sensor collaboration, the message z_t is transmitted through a coherent MAC so that the received signal at the FC is a coherent sum [5]

y_t = g_t^T z_t + v_t,  t ∈ [T],   (3)

where g_t ∈ R^M is the vector of channel gains, and v_t is temporally independent Gaussian noise with zero mean and variance σ_v².

From (1) – (3), the vector of received signals at the FC can be compactly expressed as a linear function of the parameters θ = [θ_1, θ_2, …, θ_T]^T,

y = Φ θ + n,   (4)

where y = [y_1, y_2, …, y_T]^T, Φ = diag(g_1^T W_1 h_1, …, g_T^T W_T h_T), n = blkdiag(g_1^T W_1, …, g_T^T W_T) ε + v with ε = [ε_1^T, …, ε_T^T]^T and v = [v_1, …, v_T]^T, and blkdiag(X_1, …, X_T) denotes the block-diagonal matrix with diagonal blocks X_1, …, X_T.

At the FC, we employ a linear minimum mean squared-error (LMMSE) estimator [36] to estimate θ, where we assume that the FC knows the observation gains, channel gains, and the second-order statistics of the parameters of interest and additive noises. The corresponding estimation error covariance is given by [36, Theorem 10.3]

C_e = (Σ_θ^{-1} + Φ^T Σ_n^{-1} Φ)^{-1},   (5)

where Σ_θ = E[θ θ^T] represents prior knowledge about the parameter correlation (in particular, Σ_θ = σ_θ² I for temporally uncorrelated parameters), I is the identity matrix, and Σ_n = E[n n^T]. It is clear from (5) that the estimation error covariance matrix is a function of the collaboration matrices {W_t}_{t=1}^T, and its dependence on {W_t} is through Φ and Σ_n. This dependence does not lend itself to easy optimization of scalar-valued functions of C_e for the design of the optimal sensor collaboration scheme. More insights into the LMMSE will be provided in Sec. III.
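Under the notation assumed in this rewrite (one scalar y_t received per time step, so Φ and Σ_n are diagonal), the error covariance (5) can be evaluated numerically as follows; all data below are randomly generated toy values.

import numpy as np

rng = np.random.default_rng(1)
T, N, M = 3, 4, 3
sig_th2, sig_eps2, sig_v2 = 1.0, 0.5, 0.1
Sigma_theta = sig_th2 * np.eye(T)     # uncorrelated prior; use a full
                                      # covariance for correlated parameters

phi = np.zeros(T)                     # diagonal of Phi: g_t^T W_t h_t
noise = np.zeros(T)                   # diagonal of Sigma_n
for t in range(T):
    h, g = rng.standard_normal(N), rng.standard_normal(M)
    W = rng.standard_normal((M, N))
    phi[t] = g @ W @ h
    noise[t] = sig_eps2 * np.sum((W.T @ g) ** 2) + sig_v2

C_e = np.linalg.inv(np.linalg.inv(Sigma_theta) + np.diag(phi ** 2 / noise))
print("estimation distortion:", np.trace(C_e))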

We next define the transmission cost of the mth sensor at time t, which refers to the energy consumption of transmitting the collaborative message z_{t,m} to the FC. That is,

E_{m,t} = E[z_{t,m}²] = e_m^T W_t (σ_θ² h_t h_t^T + σ_ε² I) W_t^T e_m,   (6)

for m ∈ [M] and t ∈ [T], where e_m is a basis vector with 1 at the mth coordinate and 0s elsewhere. In what follows, while referring to basis vectors and identity matrices, their dimensions will be omitted for simplicity but can be inferred from the context.
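The per-sensor cost (6) is the power of the collaborative message; a small helper under the same assumed notation (function name ours):

import numpy as np

def transmission_cost(W, h, sig_th2, sig_eps2):
    # E_{m,t} = e_m^T W (sig_th2 h h^T + sig_eps2 I) W^T e_m, i.e., the
    # diagonal of W S W^T with S the covariance of the raw measurements.
    S = sig_th2 * np.outer(h, h) + sig_eps2 * np.eye(len(h))
    return np.einsum('mn,nk,mk->m', W, S, W)

rng = np.random.default_rng(2)
W, h = rng.standard_normal((3, 4)), rng.standard_normal(4)
print(transmission_cost(W, h, 1.0, 0.5))   # costs of all M sensors at time t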

We now state the main optimization problem considered in this work for sensor collaboration:

minimize_{W_1, …, W_T}  D(W_1, …, W_T)
subject to  Σ_{t=1}^T E_{m,t} ≤ P_m,  m ∈ [M]
            (1 1^T − A) ∘ W_t = 0,  t ∈ [T],   (7)

where W_t is the optimization variable for t ∈ [T], D denotes the estimation distortion of using the LMMSE, E_{m,t} is the transmission cost given by (6), P_m is a prescribed energy budget of the mth sensor, and the last constraint characterizes the network topology. The problem structure and the solution of (7) will be elaborated on in the rest of the paper.

We end this section with the following remarks.

Remark 1

In the system model, the assumption of known observation and channel gains can be relaxed to knowledge of only their second-order statistics. Our earlier work [22] has shown that under this weaker assumption, we can obtain similar expressions for the linear estimator. In this paper, we assume the observation and channel models are known for ease of presentation and analysis.

Remark 2

Although sensor collaboration is performed with respect to a time-invariant (fixed) topology matrix A, the energy allocation, in terms of the magnitudes of the nonzero entries of W_t, is time varying in the presence of temporal dynamics of the sensor network. As will be evident later, the proposed sensor collaboration approach is also applicable to the problem with time-varying topologies.

III Reformulation and Simplification Using Matrix Vectorization

In this section, we simplify problem (7) by exploiting the sparsity structure of the topology matrix and concatenating the nonzero entries of each collaboration matrix into a collaboration vector. There are two benefits to using matrix vectorization: a) the topology constraint in (7) can be eliminated without loss of performance, which renders a less complex problem; b) the structure of the nonconvexities is more easily revealed via such a reformulation.

Fig. 2: Example of the vectorization of W_t.

In problem (7), the only optimization variables are the nonzero entries of the collaboration matrices. We concatenate these nonzero entries (columnwise) into a collaboration vector

w_t = [w_{t,1}, w_{t,2}, …, w_{t,K}]^T,   (8)

where w_{t,k} denotes the kth entry of w_t, and K is the number of nonzero entries of the topology matrix A. We note that given w_{t,k}, there exist a row index m_k and a column index n_k such that w_{t,k} = [W_t]_{m_k n_k}, where [X]_{mn} (or X_{mn}) denotes the (m, n)th entry of a matrix X. We demonstrate the vectorization of W_t through an example in Fig. 2, where we consider N sensor nodes, M communicating nodes, and K collaboration links.
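The columnwise stacking in (8), together with the index maps (m_k, n_k), can be sketched as follows (toy topology matrix; variable names are ours):

import numpy as np

A = np.array([[1, 1, 0],
              [0, 1, 1]])               # hypothetical 2 x 3 topology matrix
n_idx, m_idx = np.nonzero(A.T)          # transposing yields columnwise order
print(list(zip(m_idx, n_idx)))          # index pairs (m_k, n_k)

W_t = np.arange(1, 7).reshape(2, 3) * A # a hypothetical collaboration matrix
w_t = W_t[m_idx, n_idx]                 # collaboration vector with K entries
K = w_t.size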

III-A Collaboration problem for the estimation of uncorrelated parameters

When the parameters of interest are uncorrelated, the estimation error covariance matrix (5) simplifies to

C_e = diag(D_1, D_2, …, D_T),  D_t = (1/σ_θ² + (g_t^T W_t h_t)² / (σ_ε² ||W_t^T g_t||₂² + σ_v²))^{-1},   (9)

where diag(D_1, D_2, …, D_T) denotes a diagonal matrix with diagonal entries D_1, D_2, …, D_T.

Let w_t be the vector obtained by stacking the nonzero entries of W_t columnwise. Then

g_t^T W_t h_t = c_t^T w_t,  ||W_t^T g_t||₂² = w_t^T Ω_t w_t,   (10)

where c_t ∈ R^K is a coefficient vector with entries [c_t]_k = g_{t,m_k} h_{t,n_k}, Ω_t is a K × K matrix whose (k, k′)th entry is given by

[Ω_t]_{k k′} = g_{t,m_k} g_{t,m_{k′}} 1{n_k = n_{k′}},   (11)

and the indices m_k and n_k are such that w_{t,k} = [W_t]_{m_k n_k} for k ∈ [K]. The proof of equation (10) is given in Appendix A for the sake of completeness.

From (9) and (10), the objective function of problem (7) can be rewritten as

D(w) = Σ_{t=1}^T D_t(w_t),  D_t(w_t) = (1/σ_θ² + (c_t^T w_t)² / (σ_ε² w_t^T Ω_t w_t + σ_v²))^{-1},   (12)

where we used the identities in (10), and w = [w_1^T, w_2^T, …, w_T^T]^T.

Moreover, the transmission cost (6) can be rewritten as

E_{m,t} = w_t^T Ω_{m,t} w_t,   (13)

where Ω_{m,t} is defined in the same way as Ω_t in (10), such that e_m^T W_t (σ_θ² h_t h_t^T + σ_ε² I) W_t^T e_m = w_t^T Ω_{m,t} w_t. We remark that Ω_{m,t} is positive semidefinite for m ∈ [M] and t ∈ [T].

From (12) and (13), the sensor collaboration problem for the estimation of temporally uncorrelated parameters becomes

minimize_w  D(w)
subject to  Σ_{t=1}^T w_t^T Ω_{m,t} w_t ≤ P_m,  m ∈ [M],   (P1)

where w = [w_1^T, …, w_T^T]^T is the optimization variable and D(w) is the estimation distortion given by (12). Note that (P1) cannot be decomposed in time since the sensor energy constraints are temporally inseparable.

Compared to problem (7), the topology constraint in terms of A is eliminated without loss of performance in (P1), since the sparsity structure of the topology matrix has been taken into account while constructing the collaboration vector. In the special case of single-snapshot estimation (namely, T = 1), the objective function of (P1) simplifies to a single quadratic ratio. It has been shown in [18] and [22] that such a nonconvex problem can be readily solved via convex programming. In contrast, (P1) is a more complex nonconvex optimization problem, where the nonconvexity stems from the sum of quadratic ratios in the objective function. As indicated in [37] and [38], the Karush-Kuhn-Tucker (KKT) conditions of such a complex fractional optimization problem are intractable to solve for the globally optimal solution (or all locally optimal solutions). Therefore, an efficient local optimization method will be proposed to solve (P1) in Sec. IV. Also, the efficacy of the proposed solution will be shown in Sec. VI via extensive numerical experiments.
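Under the reconstruction of (12) used in this rewrite, the objective of (P1) can be evaluated directly; the sketch below assumes the quadratic-ratio form with toy data (all names ours).

import numpy as np

def distortion_P1(w_list, c_list, Omega_list, sig_th2, sig_eps2, sig_v2):
    # Sum of per-time LMMSE variances: each term inverts the prior
    # information 1/sig_th2 plus the effective received SNR.
    D = 0.0
    for w, c, Om in zip(w_list, c_list, Omega_list):
        snr = (c @ w) ** 2 / (sig_eps2 * w @ Om @ w + sig_v2)
        D += 1.0 / (1.0 / sig_th2 + snr)
    return D

rng = np.random.default_rng(4)
K, T = 4, 3
w = [rng.standard_normal(K) for _ in range(T)]
c = [rng.standard_normal(K) for _ in range(T)]
print(distortion_P1(w, c, [np.eye(K)] * T, 1.0, 0.5, 0.1))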

III-B Collaboration problem for the estimation of correlated parameters

When parameters are temporally correlated, the covariance matrix Σ_θ is no longer diagonal, and it is not straightforward to express the estimation error in a succinct form, as was done in (12). We recall from (5) that the dependence of the estimation error covariance on the collaboration matrices is through Φ and Σ_n. According to the matrix inversion lemma [36, A 1.1.3],

(Σ_θ^{-1} + Φ^T Σ_n^{-1} Φ)^{-1} = Σ_θ − Σ_θ Φ^T (Φ Σ_θ Φ^T + Σ_n)^{-1} Φ Σ_θ.   (14)

Substituting (14) into (5), we obtain

C_e = Σ_θ − Σ_θ Φ^T J^{-1} Φ Σ_θ,   (15)

with J := Φ Σ_θ Φ^T + Σ_n. According to the definition of Φ in (4), we obtain

Φ = diag(c_1^T w_1, c_2^T w_2, …, c_T^T w_T),   (16)

where c_t has been introduced in (10).

Combining (15) and (16), we can rewrite the estimation error covariance as a function of the collaboration vector w,

C_e(w) = Σ_θ − Σ_θ Φ^T (Φ Σ_θ Φ^T + Σ_n)^{-1} Φ Σ_θ,  [Φ]_{tt} = c_t^T w_t,  [Σ_n]_{tt} = σ_ε² w_t^T Ω_t w_t + σ_v².   (17)

Note that every dependence of (17) on w_t is through quadratic terms of the form c_t^T (w_t w_t^T) c_t and tr(Ω_t w_t w_t^T), namely, through the rank-one matrix w_t w_t^T.

From (17), the sensor collaboration problem for the estimation of temporally correlated parameters becomes

minimize_w  tr(C_e(w))
subject to  Σ_{t=1}^T w_t^T Ω_{m,t} w_t ≤ P_m,  m ∈ [M],   (P2)

where w is the optimization variable.

We note that (P2) is a nonconvex optimization problem. We will show in Sec. V that the rank-one matrix w_t w_t^T that appears in (17) is the source of nonconvexity. Compared to (P1), (P2) is more involved due to the presence of the parameter correlation. We will also show that (P2) can be cast as a particular nonconvex optimization problem, where the objective function is linear, and the constraint set is formed by convex quadratic constraints, linear matrix inequalities and nonconvex rank constraints. The presence of generalized inequalities (with respect to positive semidefinite cones) and rank constraints makes the KKT conditions complex and intractable for finding the globally optimal solution. Instead, we will employ an efficient convexification method to find a locally optimal solution of (P2). The efficacy of the proposed optimization method will be empirically shown in Sec. VI.

We finally remark that both (P1) and (P2) are well-posed optimization problems, in the sense that an optimal solution exists for each of them. This can be verified as follows. First, the constraint set is non-empty; for example, w = 0 is a feasible solution to (P1) and (P2). When w = 0, the estimate of the unknown parameter is determined only by the prior knowledge about the parameter. Second, the optimal value is bounded due to the presence of the energy constraints.

IV Special Case: Optimal Sensor Collaboration for the Estimation of Uncorrelated Parameters

In this section, we show that (P1) can be transformed into a special nonconvex optimization problem, where the difference of convex (DC) functions carries all the nonconvexity. Spurred by the problem structure, we employ a convex-concave procedure (CCP) to solve (P1).

IV-A Equivalent optimization problem

We express (P1) in its epigraph form [39, Sections 3.1 & 7.5]

minimize_{w, α}  1^T α   (18a)
subject to  D_t(w_t) ≤ α_t,  t ∈ [T]   (18b)
            Σ_{t=1}^T w_t^T Ω_{m,t} w_t ≤ P_m,  m ∈ [M],   (18c)

where α = [α_1, α_2, …, α_T]^T is the vector of newly introduced optimization variables.

We further introduce new variables β_t and γ_t for t ∈ [T] to rewrite (18b) as

1/σ_θ² + γ_t ≥ 1/α_t,  γ_t ≤ (c_t^T w_t)² / β_t,  β_t ≥ σ_ε² w_t^T Ω_t w_t + σ_v²,  t ∈ [T],   (19)

where the equivalence between (18b) and (19) holds since the minimization of 1^T α with the above inequalities forces the variables γ_t and β_t to achieve their upper and lower bounds, respectively.

In (19), the ratio (c_t^T w_t)² / β_t together with γ_t can be reformulated as a quadratic inequality of DC type,

(γ_t + β_t)² − [4 (c_t^T w_t)² + (γ_t − β_t)²] ≤ 0,   (20)

where both (γ_t + β_t)² and 4 (c_t^T w_t)² + (γ_t − β_t)² are convex quadratic functions.

From (19) and (20), problem (18) becomes

minimize_{w, α, β, γ}  1^T α   (21a)
subject to  (γ_t + β_t)² − [4 (c_t^T w_t)² + (γ_t − β_t)²] ≤ 0,  t ∈ [T]   (21b)
            4 + (α_t − 1/σ_θ² − γ_t)² − (α_t + 1/σ_θ² + γ_t)² ≤ 0,  t ∈ [T]   (21c)
            σ_ε² w_t^T Ω_t w_t + σ_v² ≤ β_t,  t ∈ [T]   (21d)
            Σ_{t=1}^T w_t^T Ω_{m,t} w_t ≤ P_m,  m ∈ [M]   (21e)
            α ⪰ 0,  γ ⪰ 0,   (21f)

where the optimization variables are w, α, β, and γ, with β = [β_1, …, β_T]^T and γ = [γ_1, …, γ_T]^T, and ⪰ denotes elementwise inequality. Note that the quadratic functions of DC type in (21b) and (21c) contain all the nonconvexity of problem (21). In what follows, we will show that CCP is a suitable convex restriction approach for solving this problem.

IV-B Convex restriction

Problem (21) is convex except for the nonconvex quadratic constraints (21b) and (21c), which have the DC form

f(x) − g(x) ≤ 0,   (22)

where both f and g are convex functions. In (21b), we have x = (w_t, β_t, γ_t), f(x) = (γ_t + β_t)², and g(x) = 4 (c_t^T w_t)² + (γ_t − β_t)². In (21c), x = (α_t, γ_t), f(x) = 4 + (α_t − 1/σ_θ² − γ_t)², and g(x) = (α_t + 1/σ_θ² + γ_t)².

We can convexify (22) by linearizing g around a feasible point x̂,

f(x) − g(x̂) − ∇g(x̂)^T (x − x̂) ≤ 0,   (23)

where ∇g(x̂) is the first-order derivative (gradient) of g at the point x̂. In (23), g(x̂) + ∇g(x̂)^T (x − x̂) is an affine lower bound on the convex function g, and therefore the set of x that satisfy (23) is a subset of the set of x that satisfy (22). This implies that a solution of the optimization problem with the linearized constraint (23) is feasible, and locally optimal, for the problem with the original nonconvex constraint (22).
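The convexification (23) replaces the subtracted convex term by its tangent; a generic sketch (toy DC pair of our own choosing) verifies that the surrogate upper-bounds the original constraint function.

import numpy as np

def linearized_constraint(f, g, grad_g, x, x_hat):
    # CCP surrogate of f(x) - g(x) <= 0: linearize the convex g at x_hat.
    return f(x) - (g(x_hat) + grad_g(x_hat) @ (x - x_hat))

c = np.array([1.0, -2.0])
f = lambda x: x @ x                     # convex term that is kept
g = lambda x: (c @ x) ** 2              # convex term that is linearized
grad_g = lambda x: 2 * (c @ x) * c

x_hat = np.array([1.0, 0.5])
x = np.array([0.9, 0.6])
# Since the tangent under-estimates g, the surrogate over-estimates f - g,
# so feasibility for (23) implies feasibility for (22).
assert linearized_constraint(f, g, grad_g, x, x_hat) >= f(x) - g(x)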

We can obtain a restricted convex version of problem (21) by linearizing (21b) and (21c) as was done in (22) and (23). We then solve a sequence of convex programs with iteratively updated linearization points. The use of linearization to convexify nonconvex problems with DC-type functions is known as CCP [34]. At each iteration of CCP, we solve

minimize_{w, α, β, γ}  1^T α
subject to  f(x) − ĝ(x) ≤ 0 for the DC constraints (21b) and (21c), and (21d) – (21f),   (24)

where the optimization variables are w, α, β, and γ, and ĝ denotes the affine approximation of the corresponding convex function g, namely, ĝ(x) = g(x̂) + ∇g(x̂)^T (x − x̂) around the current linearization point x̂. We summarize CCP for solving problem (21) or (P1) in Algorithm 1.

1: initial points ŵ, β̂, and γ̂, and tolerance ε > 0
2: for iteration k = 0, 1, 2, … do
3:   solve problem (24) for the solution (w*, α*, β*, γ*)
4:   update the linearization point: ŵ = w*, β̂ = β*, and γ̂ = γ*
5:   until |F(k) − F(k−1)| ≤ ε, where F(k) denotes the objective value of (24) at iteration k
6: end for
Algorithm 1 CCP for solving (P1)

To initialize Algorithm 1, we can choose random points, for example drawn from a standard uniform distribution, that are then scaled to satisfy the constraints (21b) – (21e). Our extensive numerical examples show that Algorithm 1 is fairly robust with respect to the choice of the initial point; see Fig. 4-(a) for an example.
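The structure of Algorithm 1 can be sketched with an off-the-shelf convex solver. The toy problem below is not (P1) itself; it only mirrors the pattern "minimize a linear objective subject to one DC constraint," with the linearized constraint rebuilt at each iteration (cvxpy assumed available; all data hypothetical).

import cvxpy as cp
import numpy as np

c = np.array([2.0, 1.0])
x = cp.Variable(2)
x_hat = np.array([1.0, 1.0])            # feasible initial point

for k in range(20):
    # affine lower bound of the convex term (c^T x)^2 around x_hat
    g_lin = (c @ x_hat) ** 2 + 2 * (c @ x_hat) * (c @ (x - x_hat))
    prob = cp.Problem(cp.Minimize(cp.sum(x)),
                      [cp.sum_squares(x) - g_lin <= -1])
    prob.solve()
    if np.linalg.norm(x.value - x_hat) <= 1e-6:  # stopping criterion
        break
    x_hat = x.value

print("CCP solution:", x.value)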

It is known from [40, Theorem 10] that CCP is a descent algorithm that converges to a stationary point of the original nonconvex problem. To be specific, at each iteration we solve a restricted convex problem with a smaller feasible set that contains the linearization point (i.e., the solution of the previous iteration). Therefore, we always obtain a new feasible point with a lower or equal objective value. Moreover, reference [41] showed that CCP converges at least at a linear rate in the number of iterations k. (Given the stopping tolerance ε, a linear convergence rate implies O(log(1/ε)) iterations to convergence.) However, our numerical results and those in [42, 34, 33] have shown that the empirical convergence rate is typically faster, and much of the benefit of using CCP is gained during its first few iterations.

The computational cost of Algorithm 1 is dominated by the solution of the convex program with quadratic constraints in Step 3. When an interior-point algorithm is used [43, Chapter 10], this cost grows polynomially with the number of optimization variables n and the number of constraints m. In problem (24), we have n = O(KT) and m = O(T + M). Therefore, the per-iteration complexity of our algorithm is polynomial in KT. Here we focus on the scenario in which the number of collaboration links K is much larger than T or M.

V General Case: Optimal Sensor Collaboration for the Estimation of Correlated Parameters

Different from (P1), the presence of temporal correlation makes finding the solution of (P2) more challenging. However, we demonstrate that (P2) can be recast as an optimization problem with the important property that the problem becomes a semidefinite program (SDP) if its rank-one constraint is replaced by a linear relaxation/approximation. Spurred by the problem structure, we employ a penalty CCP to solve (P2), and propose a fast optimization algorithm by using the alternating direction method of multipliers (ADMM).

V-A Equivalent optimization problem

We transform (P2) into the following equivalent form:

minimize_{w, X}  tr(X)
subject to  X ⪰ C_e(w),  X ⪰ 0
            Σ_{t=1}^T w_t^T Ω_{m,t} w_t ≤ P_m,  m ∈ [M],   (25)

where X ∈ S^T is the newly introduced optimization variable, S^T represents the set of T × T symmetric matrices, and the notation X ⪰ 0 (or X ⪰ Y) indicates that X (or X − Y) is positive semidefinite. The first inequality constraint of problem (25) is obtained from X ⪰ C_e(w) = J_B^{-1}, where C_e is given by (17), and J_B := Σ_θ^{-1} + Φ^T Σ_n^{-1} Φ represents the Bayesian Fisher information matrix.

We further introduce a new vector of optimization variables γ = [γ_1, …, γ_T]^T, together with matrix variables V_t intended to represent w_t w_t^T, such that the first matrix inequality of problem (25) is expressed as

X ⪰ (Σ_θ^{-1} + diag(γ))^{-1}   (26)
γ_t (σ_ε² tr(Ω_t V_t) + σ_v²) ≤ c_t^T V_t c_t,  t ∈ [T]   (27)
V_t = w_t w_t^T,  t ∈ [T],   (28)

where we use the expression of C_e given by (17), and γ_t is the newly introduced optimization variable for t ∈ [T]. Note that the minimization of tr(X) with inequalities (26) and (27) would force the variable X to achieve its lower bound. In other words, problem (25) is equivalent to the problem in which the first inequality constraint of (25) is replaced by the above inequalities.

By employing the Schur complement, we can express (26) and (27) as linear matrix inequalities (LMIs); in particular, (26) is equivalent to

[ X  I; I  Σ_θ^{-1} + diag(γ) ] ⪰ 0,   (29)

and (27) admits an analogous LMI representation in the variables γ_t and V_t, which we refer to as (30).

Replacing the first inequality of problem (25) with the LMIs (29) – (30), we obtain an optimization problem that is convex except for the rank-one constraint (28), which can be recast as the two inequalities

V_t ⪰ w_t w_t^T,  tr(V_t) ≤ w_t^T w_t.   (31)

According to the Schur complement, the first matrix inequality in (31) is equivalent to the LMI

[ V_t  w_t; w_t^T  1 ] ⪰ 0.   (32)

The second inequality in (31) involves a function of DC type, since both tr(V_t) and w_t^T w_t are convex functions [39].

From (29) – (32), problem (25) or (P2) is equivalent to

minimize_{w, X, γ, {V_t}}  tr(X)   (33a)
subject to  LMIs (29) and (30)   (33b)
            Σ_{t=1}^T tr(Ω_{m,t} V_t) ≤ P_m,  m ∈ [M]   (33c)
            [ V_t  w_t; w_t^T  1 ] ⪰ 0,  t ∈ [T]   (33d)
            tr(V_t) − w_t^T w_t ≤ 0,  t ∈ [T],   (33e)

where the optimization variables are w, X, γ, and V_t for t ∈ [T], and (33e) is a nonconvex constraint of DC type.

V-B Convexification

Proceeding with the same logic as in Sec. IV to convexify the constraint (22), we linearize (33e) around a point ŵ_t,

tr(V_t) − 2 ŵ_t^T w_t + ŵ_t^T ŵ_t ≤ 0,  t ∈ [T].   (34)

It is straightforward to apply CCP to solve problem (33) by replacing (33e) with (34). However, such an approach fails in practice. This is not surprising, since the feasible set determined by (33d) and (34) only contains the linearization point. Specifically, from (33d) and (34), we obtain

w_t^T w_t ≤ tr(V_t) ≤ 2 ŵ_t^T w_t − ŵ_t^T ŵ_t,  i.e.,  ||w_t − ŵ_t||₂² ≤ 0,   (35)

which indicates that w_t = ŵ_t. Therefore, CCP gets trapped at the linearization point.

Remark 3

Dropping the nonconvex constraint (33e) is another method to convexify problem (33), known as semidefinite relaxation [44]. However, such an approach makes the optimization variable V_t unbounded, since the minimization of tr(X) forces γ_t to be as large as possible so that the variable X in (26) is as small as possible, which in turn drives V_t in (27) to be unbounded.

In order to circumvent the drawback of the standard CCP, we consider its penalized version, known as penalty CCP [34, 45], where we add new variables to allow the constraints (34) to be violated and penalize the sum of the violations in the objective function. As a result, the convexification (34) is modified to

tr(V_t) − 2 ŵ_t^T w_t + ŵ_t^T ŵ_t ≤ s_t,  t ∈ [T],   (36)

where s_t is a newly introduced variable. The constraint (36) implicitly adds the additional constraint s_t ≥ 0, since tr(V_t) ≥ w_t^T w_t from (33d) implies s_t ≥ ||w_t − ŵ_t||₂² ≥ 0.

After replacing (33e) with (36), we obtain the SDP

minimize_{w, X, γ, {V_t}, {s_t}}  tr(X) + ρ Σ_{t=1}^T s_t
subject to  (33b) – (33d) and (36),   (37)

where the optimization variables are w, X, γ, V_t and s_t for t ∈ [T], and ρ > 0 is a penalty parameter. Compared to the standard CCP, problem (37) is optimized over a larger feasible set, since we allow the constraints to be violated by adding the variables s_t for t ∈ [T]. We summarize the use of penalty CCP to solve (P2) in Algorithm 2.

1: an initial point ŵ, penalty parameter ρ(0) > 0, maximum penalty ρ_max, growth factor δ > 1, and tolerance ε > 0
2: for iteration k = 0, 1, 2, … do
3:   solve problem (37) for its solution (w*, X*, γ*, {V_t*}, {s_t*}) via an SDP solver or the ADMM-based algorithm in Sec. V-C
4:   update the linearization point: ŵ = w*
5:   update the penalty parameter: ρ(k+1) = min(δ ρ(k), ρ_max)
6:   let F(k) be the objective value of (37)
7:   until |F(k) − F(k−1)| ≤ ε
8: end for
Algorithm 2 Penalty CCP for solving (P2)

In Algorithm 2, the initial point ŵ is randomly picked from a standard uniform distribution. Note that ŵ is not necessarily feasible for (P2), since violations of the constraints are allowed. We also remark that once the penalty parameter reaches its maximum value ρ_max (after finitely many iterations), the penalty CCP reduces to CCP. Therefore, the penalty CCP enjoys the same convergence properties as CCP.
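One subproblem of the form (37) can be prototyped directly with an SDP-capable solver. The sketch below keeps only the ingredients discussed above: the Schur-complement LMI (32) coupling V and w, the penalized linearized rank-one deficit (36), and a linear energy budget; the objective is a simplified stand-in, and all data and names are hypothetical.

import cvxpy as cp
import numpy as np

K = 3
Omega = np.eye(K)                        # hypothetical energy matrix
c_vec = np.array([1.0, 2.0, -1.0])
rho, P = 10.0, 5.0
w_hat = np.ones(K)                       # current linearization point

w = cp.Variable(K)
V = cp.Variable((K, K), symmetric=True)
s = cp.Variable(nonneg=True)             # penalty (slack) variable

schur = cp.bmat([[V, cp.reshape(w, (K, 1))],
                 [cp.reshape(w, (1, K)), np.ones((1, 1))]])
cons = [schur >> 0,                                        # V >= w w^T
        cp.trace(V) - 2 * w_hat @ w + w_hat @ w_hat <= s,  # cf. (36)
        cp.trace(Omega @ V) <= P]                          # energy budget
prob = cp.Problem(cp.Minimize(-c_vec @ V @ c_vec + rho * s), cons)
prob.solve()
print("rank-one gap:", np.trace(V.value) - w.value @ w.value)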

The computational cost of Algorithm 2 is dominated by the solution of the SDP (37) in Step 3. The complexity of the interior-point algorithms in off-the-shelf solvers [43, Chapter 11] grows polynomially, with large exponents, in the number of optimization variables and the size of the semidefinite matrices; in (37), the number of optimization variables is proportional to KT. Clearly, computing solutions to SDPs becomes inefficient for problems of medium or large size. In what follows, we will develop an ADMM-based algorithm that is more amenable to large-scale optimization.

V-C Fast algorithm via ADMM

It has been shown in [35, 46, 47, 48] that ADMM is a powerful tool for solving large-scale optimization problems. The major advantage of ADMM is that it allows us to split the original problem into subproblems, each of which can be solved more efficiently or even analytically. In what follows, we will employ ADMM to solve problem (37).

It is shown in Appendix B that problem (37) can be reformulated in a way that lends itself to the application of ADMM. This is achieved by introducing slack variables and indicator functions to express the inequality constraints of problem (37) as linear equality constraints together with cone constraints with respect to slack variables, including second-order cone and positive semidefinite cone constraints.

ADMM is performed based on the augmented Lagrangian [35] of the reformulated problem (37), and leads to two subproblems: the first can be treated as an unconstrained quadratic program, and the second admits an analytical solution. These two problems are solved iteratively and 'communicate' with each other through special quadratic terms in their objectives; the quadratic term in each problem contains information about the solution of the other problem and also about the dual variables (also known as Lagrange multipliers). In what follows, we refer to these problems as the 'x-minimization' and 'z-minimization' problems. Here x denotes the set of primal variables w, X, γ, and V_t for t ∈ [T], and z denotes the set of slack variables introduced in Appendix B for m ∈ [M] and t ∈ [T]. We also use λ to denote the set of dual variables for m ∈ [M] and t ∈ [T]. The ADMM algorithm is precisely described by (58) – (60) in Appendix B.

We emphasize that the crucial property of the ADMM approach is that, as we demonstrate in the rest of this section, the solution of each of the x- and z-minimization problems can be found exactly and efficiently.
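The x/z/λ alternation can be illustrated on a problem small enough to verify by hand: a nonnegative least-squares split as x = z, where the x-step is an unconstrained quadratic program (as in (38)) and the z-step is an analytical cone projection. This generic scaled-form loop follows [35]; it is not the specific update (58) – (60), and all data are toy values.

import numpy as np

rng = np.random.default_rng(3)
A, b = rng.standard_normal((8, 4)), rng.standard_normal(8)
mu = 1.0                                  # ADMM penalty parameter
x = z = lam = np.zeros(4)

AtA, Atb = A.T @ A, A.T @ b
for k in range(200):
    # x-step: solve the unconstrained quadratic program
    x = np.linalg.solve(2 * AtA + mu * np.eye(4), 2 * Atb + mu * (z - lam))
    # z-step: analytical projection onto the nonnegative cone
    z = np.maximum(x + lam, 0.0)
    # dual update
    lam = lam + x - z

print("ADMM solution:", z)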

V-C1 x-minimization step

The x-minimization problem can be cast as

minimize_x  L(x; z, λ),   (38)

where L is the augmented Lagrangian of the reformulated problem (37), evaluated at the current values of the slack variables z and the dual variables λ at ADMM iteration k. The objective function of problem (38) is written out in (39). For ease of notation, we will omit the ADMM iteration index in what follows.

(39)

We note that problem (38) is an unconstrained quadratic program (UQP) with a large number of variables. In order to reduce the computational complexity and memory requirements of the optimization, we employ a gradient descent method [39] together with a backtracking line search [39, Chapter 9.2] to solve this UQP; a generic sketch of such a solver is given below. In Proposition 1, we show the gradient of the objective function of problem (38).
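A generic version of this solver (assumed Armijo parameters; the actual gradient would come from Proposition 1) is sketched on a toy quadratic.

import numpy as np

def gradient_descent_backtracking(f, grad, x0, alpha=0.3, beta=0.5,
                                  tol=1e-8, max_iter=500):
    # Gradient descent with backtracking line search [39, Chapter 9.2].
    x = x0
    for _ in range(max_iter):
        d = -grad(x)
        if np.linalg.norm(d) <= tol:
            break
        t = 1.0
        while f(x + t * d) > f(x) - alpha * t * (d @ d):
            t *= beta                    # Armijo sufficient decrease
        x = x + t * d
    return x

Q = np.array([[3.0, 1.0], [1.0, 2.0]])   # toy UQP: min x^T Q x - 2 q^T x
q = np.array([1.0, -1.0])
f = lambda x: x @ Q @ x - 2 * (q @ x)
grad = lambda x: 2 * (Q @ x - q)
print(gradient_descent_backtracking(f, grad, np.zeros(2)))  # approx Q^{-1} q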

Proposition 1

The gradient of the objective function of problem (38) is given by

where , is the column of after the last entry is removed, , , is a submatrix of that contains its first rows and columns, , is the first element of , returns the diagonal entries of its matrix argument in vector form, is a submatrix of after the first rows and columns are removed, is a submatrix of after the first row and column are removed, is a submatrix of after the last row and column are removed, and .

Proof: See Appendix C.

In Proposition 1, the optimal values of and are achieved by letting