# A Proximal Dual Consensus ADMM Method for Multi-Agent Constrained Optimization

Tsung-Hui Chang, Member, IEEE

The work of Tsung-Hui Chang is supported by the Ministry of Science and Technology, Taiwan (R.O.C.), under Grant NSC 102-2221-E-011-005-MY3. Tsung-Hui Chang is the corresponding author. Address: Department of Electronic and Computer Engineering, National Taiwan University of Science and Technology, Taipei 10607, Taiwan (R.O.C.). E-mail: tsunghui.chang@ieee.org.

###### Abstract

This paper studies efficient distributed optimization methods for multi-agent networks. Specifically, we consider a convex optimization problem with a globally coupled linear equality constraint and local polyhedra constraints, and develop distributed optimization methods based on the alternating direction method of multipliers (ADMM). The considered problem has many applications in machine learning and smart grid control. Due to the presence of the polyhedra constraints, agents in the existing methods have to solve polyhedra constrained subproblems at each iteration. A key issue is that projection onto a polyhedra constraint is not trivial, which precludes closed-form solutions and the use of simple algorithms for solving these subproblems. In this paper, by judiciously integrating the proximal minimization method with ADMM, we propose a new distributed optimization method in which the polyhedra constraints are handled softly as penalty terms in the subproblems. This makes the subproblems efficiently solvable and consequently reduces the overall computation time. Furthermore, we propose a randomized counterpart that is robust against randomly ON/OFF agents and imperfect communication links. We analytically show that both proposed methods have a worst-case $\mathcal{O}(1/k)$ convergence rate, where $k$ is the iteration number. Numerical results show that the proposed methods offer considerably lower computation time than the existing distributed ADMM method.

EDICS: OPT-DOPT, MLR-DIST, NET-DISP, SPC-APPL.

## I Introduction

Multi-agent distributed optimization [1] has been of great interest due to applications in sensor networks [2] and cloud computing networks [3], and due to recent needs for distributed large-scale signal processing and machine learning tasks [4]. Distributed optimization methods are appealing because the agents access and process local data and communicate with their connected neighbors only [1], which makes them particularly suitable for applications where the local data size is large and the network structure is complex. Many of these problems can be formulated as the following optimization problem

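$$
\begin{aligned}
\text{(P)}\quad \min_{x=[x_1^T,\ldots,x_N^T]^T\in\mathbb{R}^{NK}}\quad & F(x)\triangleq\sum_{i=1}^N f_i(x_i) && \text{(1a)}\\
\text{s.t.}\quad & \sum_{i=1}^N E_ix_i=q, && \text{(1b)}\\
& x_i\in\mathcal{X}_i\triangleq\{x_i \mid C_ix_i\preceq d_i,\ x_i\in\mathcal{S}_i\},\quad i=1,\ldots,N. && \text{(1c)}
\end{aligned}
$$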

In (1), $x_i\in\mathbb{R}^K$ is a local control variable owned by agent $i$, $f_i$ is a local cost function, and $E_i\in\mathbb{R}^{L\times K}$, $q\in\mathbb{R}^L$, $C_i\in\mathbb{R}^{P\times K}$, $d_i\in\mathbb{R}^P$ and $\mathcal{S}_i\subseteq\mathbb{R}^K$ are locally known data matrices (vectors) and constraint set, respectively. The constraint (1b) is a global constraint which couples all the $x_i$'s, while each $\mathcal{X}_i$ in (1c) is a local constraint set of agent $i$ which consists of a simple constraint set $\mathcal{S}_i$ (in the sense that projection onto $\mathcal{S}_i$ is easy to implement) and a polyhedra constraint $C_ix_i\preceq d_i$. It is assumed that each agent $i$ knows only $f_i$, $E_i$, $q$ and $\mathcal{X}_i$, and that the agents collaborate to solve the coupled problem (P). Examples of (P) include the basis pursuit (BP) [5] and LASSO problems [6] in machine learning, the power flow and load control problems in smart grid [7], the network flow problem [8] and the coordinated transmission design problem in communication networks [9], to name a few.

In this paper, we improve upon the works in [19] by presenting new computationally efficient distributed optimization methods for solving (P). Specifically, due to the presence of the polyhedra constraints $C_ix_i\preceq d_i$ in (1c), the agents in the existing methods have to solve a polyhedra constrained subproblem at each iteration. Since projection onto the polyhedra constraint is not trivial, closed-form solutions are not available and, moreover, simple algorithms such as the gradient projection method [22] cannot handle this constrained subproblem efficiently. To overcome this issue, we propose in this paper a proximal dual consensus ADMM (PDC-ADMM) method in which each agent deals with a subproblem with simple constraints only, and which is therefore more efficiently implementable than the dual consensus ADMM (DC-ADMM) method in [19]. This is made possible by the use of the proximal minimization method [14, Sec. 3.4.3] to deal with the dual variables associated with the polyhedra constraints, so that the constraints can be softly handled as penalty terms in the subproblems. Our contributions are summarized as follows.

• We propose a new PDC-ADMM method, and show that the proposed method converges to an optimal solution of (P) with a worst-case $\mathcal{O}(1/k)$ convergence rate, where $k$ is the iteration number. Numerical results will show that the proposed PDC-ADMM method exhibits a significantly lower computation time than DC-ADMM in [19].

• We further our study by presenting a randomized PDC-ADMM method that tolerates randomly ON/OFF agents and is robust against imperfect communication links. We show that the proposed randomized PDC-ADMM method converges to an optimal solution of (P) in the mean, with a worst-case $\mathcal{O}(1/k)$ convergence rate.

The rest of this paper is organized as follows. Section II presents the applications, network model and assumptions of (P). The PDC-ADMM method and the randomized PDC-ADMM method are presented in Section III and Section IV, respectively. Numerical results are presented in Section V and conclusions are given in Section VI.

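Notations: $A\succeq 0$ ($\succ 0$) means that matrix $A$ is positive semidefinite (positive definite); $a\preceq d$ indicates that $(d)_i-(a)_i\geq 0$ for all $i$, where $(a)_i$ means the $i$th element of vector $a$. $I_K$ is the $K\times K$ identity matrix; $1_K$ is the $K$-dimensional all-one vector. $\|a\|_2$ denotes the Euclidean norm of vector $a$, $\|a\|_1$ represents the 1-norm, and $\|a\|_A^2\triangleq a^TAa$ for some $A\succeq 0$; $\mathrm{diag}\{a_1,\ldots,a_N\}$ is a diagonal matrix with the $i$th diagonal element being $a_i$. Notation $\otimes$ denotes the Kronecker product. $\lambda_{\max}(A)$ denotes the maximum eigenvalue of the symmetric matrix $A$.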

## II Applications, Network Model and Assumptions

### II-A Applications

Problem (P) has applications in machine learning [6, 4], data communications [9, 8] and the emerging smart grid systems [13, 7, 23, 24], to name a few. For example, when $f_i(x_i)=\|x_i\|_2^2$ for all $i$, (P) is the least-norm solution problem of the linear system $\sum_{i=1}^N E_ix_i=q$; when $f_i(x_i)=\|x_i\|_1$ for all $i$, (P) is the well-known basis pursuit (BP) problem [5, 17]; and if $f_i(x_i)=\|x_i\|_2$ for all $i$, then (P) is the BP problem with group sparsity [6]. The LASSO problem can also be recast in the form of (P). Specifically, consider a LASSO problem [6] with a column partitioned data model [17, Fig. 1], [25],

$$
\min_{x_i\in\mathcal{X}_i,\ i=1,\ldots,N}\ \Bigl\|\sum_{i=1}^N A_ix_i-b\Bigr\|_2^2+\lambda\sum_{i=1}^N\|x_i\|_1, \tag{2}
$$

where the $A_i$'s contain the training data vectors, $b\in\mathbb{R}^L$ is a response signal and $\lambda>0$ is a penalty parameter. By defining $x_0\triangleq\sum_{i=1}^N A_ix_i-b$, one can equivalently write (2) as

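$$
\begin{aligned}
\min_{x_0\in\mathbb{R}^L,\ x_i\in\mathcal{S}_i,\ i=1,\ldots,N}\quad & \|x_0\|_2^2+\lambda\sum_{i=1}^N\|x_i\|_1 && \text{(3a)}\\
\text{s.t.}\quad & \sum_{i=1}^N A_ix_i-x_0=b, && \text{(3b)}\\
& C_ix_i\preceq d_i,\quad i=1,\ldots,N, && \text{(3c)}
\end{aligned}
$$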

which is exactly an instance of (P). The polyhedra constraint $C_ix_i\preceq d_i$ can arise, for example, in the monotone curvature fitting problem [26]. Specifically, suppose that one wishes to fit a signal vector $b$ over some fine grid of points, using a set of $N$ monotone vectors. Here, each monotone vector is modeled as $A_ix_i$, where $A_i$ contains the basis vectors and $x_i$ is the fitting parameter vector. To impose monotonicity on $A_ix_i$ (for instance, to make it non-increasing over the grid), one needs a set of linear inequality constraints on $x_i$. This constitutes a polyhedra constraint $C_ix_i\preceq d_i$ on $x_i$. Readers may refer to [26] for more about constrained LASSO problems.
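As an illustration of how a monotonicity requirement turns into a polyhedra constraint, the following minimal Python sketch builds one possible pair $(C_i, d_i)$. It assumes that monotonicity is enforced through first-order differences between consecutive grid points, which is only one way to do it and is not necessarily the construction used in [26]. The function names `monotone_constraint` and `is_feasible` are hypothetical.

```python
def monotone_constraint(A):
    """Return (C, d) such that C x <= d holds iff the fitted vector A x is
    non-increasing over the grid (assumption: consecutive differences).

    A is a list of rows; row l holds the basis functions evaluated at grid
    point l, so (A x)[l] is the fitted value at that point.
    """
    # Row l of C is A[l+1] - A[l], so (C x)[l] = (A x)[l+1] - (A x)[l].
    C = [[hi - lo for hi, lo in zip(A[l + 1], A[l])] for l in range(len(A) - 1)]
    d = [0.0] * len(C)
    return C, d


def is_feasible(C, d, x, tol=1e-12):
    """Check the element-wise inequality C x <= d."""
    return all(
        sum(c * v for c, v in zip(row, x)) <= bound + tol
        for row, bound in zip(C, d)
    )


# Affine basis [1, u] sampled on the grid u = 0, 1, 2, 3.
A = [[1.0, float(u)] for u in range(4)]
C, d = monotone_constraint(A)
decreasing_fit = is_feasible(C, d, [2.0, -1.0])  # fitted values 2, 1, 0, -1
increasing_fit = is_feasible(C, d, [2.0, 1.0])   # fitted values 2, 3, 4, 5
```

Each row of the resulting matrix is a linear function of the fitting parameters, which is why the monotonicity requirement fits the form $C_ix_i\preceq d_i$.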

On the other hand, the load control problems [13, 7, 23] and microgrid control problems [24] in smart grid systems are also of the same form as (P). Specifically, consider a utility company that manages the electricity consumption of $N$ customers for power balance. Let $q\in\mathbb{R}^L$ denote the power supply vector, and let $x_i\in\mathbb{R}^K$ be the load control variable that determines the power consumption vector of customer $i$'s load. For many types of electricity loads (e.g., electric vehicles (EVs) and batteries), the load consumption can be expressed as a linear function of $x_i$ [23, 24], i.e., as $E_ix_i$, where $E_i\in\mathbb{R}^{L\times K}$ is a mapping matrix. Besides, the variables $x_i$'s are often subject to some control constraints (e.g., maximum/minimum charging rate and maximum capacity), which can be represented by a polyhedra constraint $C_ix_i\preceq d_i$ for some $C_i$ and $d_i$. Then, the load control problem can be formulated as

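$$
\begin{aligned}
\min_{x_0\in\mathbb{R}^L,\ x_i\in\mathbb{R}^K,\ i=1,\ldots,N}\quad & U(x_0) && \text{(4a)}\\
\text{s.t.}\quad & \sum_{i=1}^N E_ix_i-x_0=q, && \text{(4b)}\\
& C_ix_i\preceq d_i,\quad i=1,\ldots,N, && \text{(4c)}
\end{aligned}
$$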

where $x_0$ is a slack variable and $U$ is the cost function for power imbalance. Problem (4) is again an instance of (P).
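To make the control constraints in (4c) concrete, the sketch below assembles a polyhedra constraint for a single hypothetical EV-type load. It assumes the control variable is a vector of per-slot charging rates, bounded between zero and a maximum rate, with the cumulative charge capped by a capacity. These modeling choices, the function names, and the parameters `max_rate` and `capacity` are illustrative assumptions, not taken from [23, 24].

```python
def ev_charging_constraint(num_slots, max_rate, capacity):
    """Stack rate bounds and a cumulative-capacity bound into C x <= d."""
    T = num_slots
    eye = [[1.0 if s == t else 0.0 for s in range(T)] for t in range(T)]
    neg_eye = [[-v for v in row] for row in eye]
    # Lower-triangular ones: row t sums the charging rates of slots 0..t.
    cumulative = [[1.0 if s <= t else 0.0 for s in range(T)] for t in range(T)]
    C = eye + neg_eye + cumulative
    d = [max_rate] * T + [0.0] * T + [capacity] * T
    return C, d


def satisfies(C, d, x, tol=1e-12):
    """Check the element-wise inequality C x <= d."""
    return all(
        sum(c * v for c, v in zip(row, x)) <= bound + tol
        for row, bound in zip(C, d)
    )


C, d = ev_charging_constraint(num_slots=3, max_rate=2.0, capacity=3.0)
ok_schedule = satisfies(C, d, [1.0, 1.0, 1.0])    # within rate and capacity
over_capacity = satisfies(C, d, [2.0, 2.0, 0.0])  # cumulative charge 4 > 3
negative_rate = satisfies(C, d, [-1.0, 0.0, 0.0]) # violates x >= 0
```

Projecting onto such a set has no simple closed form in general because the cumulative rows couple all time slots, which is the difficulty the proposed method is designed to avoid.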

### II-B Network Model and Assumptions

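We model the multi-agent network as an undirected graph $\mathcal{G}=\{\mathcal{V},\mathcal{E}\}$, where $\mathcal{V}=\{1,\ldots,N\}$ is the set of nodes (i.e., agents) and $\mathcal{E}$ is the set of edges. In particular, an edge $(i,j)\in\mathcal{E}$ if and only if agent $i$ and agent $j$ are neighbors; that is, they can communicate and exchange messages with each other. Thus, for each agent $i$, one can define the index subset of its neighbors as $\mathcal{N}_i=\{j\in\mathcal{V}\mid (i,j)\in\mathcal{E}\}$. Besides, the adjacency matrix of the graph $\mathcal{G}$ is defined by the matrix $W\in\{0,1\}^{N\times N}$, where $[W]_{i,j}=1$ if $(i,j)\in\mathcal{E}$ and $[W]_{i,j}=0$ otherwise. The degree matrix of $\mathcal{G}$ is denoted by $D=\mathrm{diag}\{|\mathcal{N}_1|,\ldots,|\mathcal{N}_N|\}$. We make the following assumption on the graph.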

###### Assumption 1

The undirected graph $\mathcal{G}$ is connected.
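The network model above can be sketched in a few lines of Python. The snippet builds the adjacency matrix and the neighbor counts from an edge list and checks connectivity with a breadth-first search. Agents are indexed from 0 here for convenience, and the names `adjacency_and_degrees` and `is_connected` are hypothetical helpers, not part of the paper.

```python
from collections import deque


def adjacency_and_degrees(num_agents, edges):
    """Build the symmetric 0/1 adjacency matrix and the degrees |N_i|."""
    W = [[0] * num_agents for _ in range(num_agents)]
    for i, j in edges:
        W[i][j] = 1
        W[j][i] = 1  # undirected: i and j can exchange messages both ways
    degrees = [sum(row) for row in W]  # diagonal entries of the degree matrix
    return W, degrees


def is_connected(W):
    """Breadth-first search from agent 0; connected iff all agents are reached."""
    n = len(W)
    seen = {0}
    queue = deque([0])
    while queue:
        i = queue.popleft()
        for j in range(n):
            if W[i][j] and j not in seen:
                seen.add(j)
                queue.append(j)
    return len(seen) == n


W_path, deg_path = adjacency_and_degrees(3, [(0, 1), (1, 2)])  # path 0-1-2
W_split, _ = adjacency_and_degrees(3, [(0, 1)])                # agent 2 isolated
```

In this sketch the path graph passes the connectivity check while the graph with an isolated agent does not.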

Assumption 1 is essential for consensus optimization since it implies that any two agents in the network can always influence each other in the long run. We also have the following assumption on the convexity of (P).

###### Assumption 2

(P) is a convex problem, i.e., the $f_i$'s are proper closed convex functions (possibly non-smooth), and the $\mathcal{S}_i$'s are closed convex sets; there is no duality gap between (P) and its Lagrange dual; moreover, the minimum of (P) is attained and so is its optimal dual value.

## III Proposed Proximal Dual Consensus ADMM Method

In this section, we propose a distributed optimization method for solving (P), referred to as the proximal dual consensus ADMM (PDC-ADMM) method. We will compare the proposed PDC-ADMM method with the existing DC-ADMM method in [19], and discuss the potential computational merit of the proposed PDC-ADMM.

The proposed PDC-ADMM method considers the Lagrange dual of (P). Let us write (P) as follows

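$$
\begin{aligned}
\min_{x_i\in\mathcal{S}_i,\ r_i\succeq 0,\ \forall i\in\mathcal{V}}\quad & \sum_{i=1}^N f_i(x_i) && \text{(5a)}\\
\text{s.t.}\quad & \sum_{i=1}^N E_ix_i=q, && \text{(5b)}\\
& C_ix_i+r_i-d_i=0\quad \forall i\in\mathcal{V}, && \text{(5c)}
\end{aligned}
$$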

where $r_i\in\mathbb{R}^P_+$, $i\in\mathcal{V}$, are introduced slack variables. Denote $y\in\mathbb{R}^L$ as the Lagrange dual variable associated with constraint (5b), and $z_i\in\mathbb{R}^P$ as the Lagrange dual variable associated with each of the constraints in (5c). The Lagrange dual problem of (5) is equivalent to the following problem

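$$
\min_{y\in\mathbb{R}^L,\ z_i\in\mathbb{R}^P\ \forall i\in\mathcal{V}}\ \sum_{i=1}^N\Bigl(\varphi_i(y,z_i)+\frac{1}{N}y^Tq+z_i^Td_i\Bigr) \tag{6}
$$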

where

$$
\varphi_i(y,z_i)\triangleq\max_{x_i\in\mathcal{S}_i,\ r_i\succeq 0}\bigl\{-f_i(x_i)-y^TE_ix_i-z_i^T(C_ix_i+r_i)\bigr\}, \tag{7}
$$

for all $i\in\mathcal{V}$. To enable multi-agent distributed optimization, we allow each agent $i$ to have a local copy of the variable $y$, denoted by $y_i$, while enforcing the distributed $y_i$'s to be the same across the network through proper consensus constraints. This is equivalent to reformulating (6) as the following problem

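$$
\begin{aligned}
\min_{y_i,\ z_i,\ s_i,\ \{t_{ij}\}\ \forall i\in\mathcal{V}}\quad & \sum_{i=1}^N\Bigl(\varphi_i(y_i,z_i)+\frac{1}{N}y_i^Tq+z_i^Td_i\Bigr) && \text{(8a)}\\
\text{s.t.}\quad & y_i=t_{ij}\quad \forall j\in\mathcal{N}_i,\ i\in\mathcal{V}, && \text{(8b)}\\
& y_j=t_{ij}\quad \forall j\in\mathcal{N}_i,\ i\in\mathcal{V}, && \text{(8c)}\\
& z_i=s_i\quad \forall i\in\mathcal{V}, && \text{(8d)}
\end{aligned}
$$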

where $\{t_{ij}\}$ and $\{s_i\}$ are slack variables. Constraints (8b) and (8c) are equivalent to the neighbor-wise consensus constraints, i.e., $y_i=y_j$ for all $j\in\mathcal{N}_i$, $i\in\mathcal{V}$. Under Assumption 1, neighbor-wise consensus is equivalent to global consensus; thus (8) is equivalent to (6). It is worthwhile to note that, while constraint (8d) looks redundant at this stage, it is a key step in constructing the proposed method, as will be clear shortly.

Let us employ the ADMM method [14, 15] to solve (8). ADMM concerns an augmented Lagrangian function of (8)

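$$
\begin{aligned}
\mathcal{L}_c\triangleq{}&\sum_{i=1}^N\Bigl(\varphi_i(y_i,z_i)+\frac{1}{N}y_i^Tq+z_i^Td_i\Bigr)\\
&+\sum_{i=1}^N\sum_{j\in\mathcal{N}_i}\bigl(u_{ij}^T(y_i-t_{ij})+v_{ij}^T(y_j-t_{ij})\bigr)+\sum_{i=1}^N w_i^T(z_i-s_i)\\
&+\frac{c}{2}\sum_{i=1}^N\sum_{j\in\mathcal{N}_i}\bigl(\|y_i-t_{ij}\|_2^2+\|y_j-t_{ij}\|_2^2\bigr)+\sum_{i=1}^N\frac{\tau_i}{2}\|z_i-s_i\|_2^2,
\end{aligned}
\tag{9}
$$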

where $u_{ij}\in\mathbb{R}^L$, $v_{ij}\in\mathbb{R}^L$ and $w_i\in\mathbb{R}^P$ are the Lagrange dual variables associated with each of the constraints in (8b), (8c) and (8d), respectively, and $c>0$ and $\tau_i>0$, $i\in\mathcal{V}$, are penalty parameters. Then, by applying the standard ADMM steps [14, 15] to solve problem (8), we obtain: for iteration $k=1,2,\ldots$,

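$$
\begin{aligned}
(y_i^k,z_i^k)=\arg\min_{y_i,z_i}\ \Bigl\{&\varphi_i(y_i,z_i)+\frac{1}{N}y_i^Tq+z_i^Td_i\\
&+\sum_{j\in\mathcal{N}_i}\bigl((y_i-t_{ij}^{k-1})^Tu_{ij}^{k-1}+(y_i-t_{ji}^{k-1})^Tv_{ji}^{k-1}\bigr)\\
&+\frac{c}{2}\sum_{j\in\mathcal{N}_i}\bigl(\|y_i-t_{ij}^{k-1}\|_2^2+\|y_i-t_{ji}^{k-1}\|_2^2\bigr)\\
&+\frac{\tau_i}{2}\Bigl\|z_i-s_i^{k-1}+\frac{w_i^{k-1}}{\tau_i}\Bigr\|_2^2\Bigr\}\quad \forall i\in\mathcal{V},
\end{aligned}
\tag{10}
$$

$$
t_{ij}^k=\arg\min_{t_{ij}}\ \Bigl\{\Bigl\|y_i^k-t_{ij}+\frac{u_{ij}^{k-1}}{c}\Bigr\|_2^2+\Bigl\|y_j^k-t_{ij}+\frac{v_{ij}^{k-1}}{c}\Bigr\|_2^2\Bigr\}\quad \forall j\in\mathcal{N}_i,\ i\in\mathcal{V}, \tag{11}
$$

$$
s_i^k=\arg\min_{s_i}\ \Bigl\|z_i^k-s_i+\frac{w_i^{k-1}}{\tau_i}\Bigr\|_2^2\quad \forall i\in\mathcal{V}, \tag{12}
$$

$$
w_i^k=w_i^{k-1}+\tau_i(z_i^k-s_i^k)\quad \forall i\in\mathcal{V}, \tag{13}
$$

$$
u_{ij}^k=u_{ij}^{k-1}+c(y_i^k-t_{ij}^k)\quad \forall j\in\mathcal{N}_i,\ i\in\mathcal{V}, \tag{14}
$$

$$
v_{ji}^k=v_{ji}^{k-1}+c(y_i^k-t_{ji}^k)\quad \forall j\in\mathcal{N}_i,\ i\in\mathcal{V}. \tag{15}
$$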

Equations (10), (11) and (12) involve updating the primal variables of (8) in a one-round Gauss-Seidel fashion; while equations (13), (14) and (15) update the dual variables.

It is shown in Appendix A that

$$
t_{ij}^k=t_{ji}^k=\frac{y_i^k+y_j^k}{2},\qquad w_i^k=0,\qquad s_i^k=z_i^k, \tag{16}
$$

for all $k$ and for all $j\in\mathcal{N}_i$, $i\in\mathcal{V}$. By (16), equations (10) to (15) can be simplified to the following steps

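$$
\begin{aligned}
(y_i^k,z_i^k)=\arg\min_{y_i,z_i}\ \Bigl\{&\varphi_i(y_i,z_i)+\frac{1}{N}y_i^Tq+z_i^Td_i\\
&+y_i^T\sum_{j\in\mathcal{N}_i}(u_{ij}^{k-1}+v_{ji}^{k-1})+c\sum_{j\in\mathcal{N}_i}\Bigl\|y_i-\frac{y_i^{k-1}+y_j^{k-1}}{2}\Bigr\|_2^2\\
&+\frac{\tau_i}{2}\|z_i-z_i^{k-1}\|_2^2\Bigr\}\quad \forall i\in\mathcal{V},
\end{aligned}
\tag{17}
$$

$$
u_{ij}^k=u_{ij}^{k-1}+c\Bigl(y_i^k-\frac{y_i^k+y_j^k}{2}\Bigr)\quad \forall j\in\mathcal{N}_i,\ i\in\mathcal{V}, \tag{18}
$$

$$
v_{ji}^k=v_{ji}^{k-1}+c\Bigl(y_i^k-\frac{y_i^k+y_j^k}{2}\Bigr)\quad \forall j\in\mathcal{N}_i,\ i\in\mathcal{V}. \tag{19}
$$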

By letting

$$
p_i^k\triangleq\sum_{j\in\mathcal{N}_i}(u_{ij}^k+v_{ji}^k)\quad \forall i\in\mathcal{V}, \tag{20}
$$

(18) and (19) reduce to

$$
p_i^k=p_i^{k-1}+c\sum_{j\in\mathcal{N}_i}(y_i^k-y_j^k)\quad \forall i\in\mathcal{V}. \tag{21}
$$
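For completeness, the step from (18), (19) and (20) to (21) can be spelled out as follows. This is a short sketch that only uses the update rules stated above.

```latex
\begin{align*}
p_i^k
  &= \sum_{j\in\mathcal{N}_i}\bigl(u_{ij}^k+v_{ji}^k\bigr) \\
  &= \sum_{j\in\mathcal{N}_i}\Bigl(u_{ij}^{k-1}+v_{ji}^{k-1}
     +2c\Bigl(y_i^k-\tfrac{y_i^k+y_j^k}{2}\Bigr)\Bigr) \\
  &= p_i^{k-1}+c\sum_{j\in\mathcal{N}_i}\bigl(y_i^k-y_j^k\bigr).
\end{align*}
```

Because the graph is undirected, the increments $c\sum_{j\in\mathcal{N}_i}(y_i^k-y_j^k)$ cancel when summed over all agents, so $\sum_{i=1}^N p_i^k$ stays equal to its initial value for every $k$.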

On the other hand, note that the subproblem in (17) is a strongly convex problem. However, it is not easy to handle, as subproblem (17) is in fact a min-max (saddle point) problem (see the definition of $\varphi_i$ in (7)). Fortunately, by applying the minimax theorem [27, Proposition 2.6.2] and exploiting the strong convexity of (17) with respect to $(y_i,z_i)$, one may avoid solving the min-max problem (17) directly. As we show in Appendix B, the solution $(y_i^k,z_i^k)$ of subproblem (17) can be conveniently obtained in closed form as follows

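$$
\begin{aligned}
y_i^k&=\frac{1}{2|\mathcal{N}_i|}\Bigl(\sum_{j\in\mathcal{N}_i}(y_i^{k-1}+y_j^{k-1})-\frac{1}{c}p_i^{k-1}+\frac{1}{c}\Bigl(E_ix_i^k-\frac{1}{N}q\Bigr)\Bigr), && \text{(22a)}\\
z_i^k&=z_i^{k-1}+\frac{1}{\tau_i}(C_ix_i^k+r_i^k-d_i), && \text{(22b)}
\end{aligned}
$$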

where $(x_i^k,r_i^k)$ is given by a solution to the following quadratic program (QP)

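$$
\begin{aligned}
(x_i^k,r_i^k)=\arg\min_{x_i\in\mathcal{S}_i,\ r_i\succeq 0}\ \Bigl\{&f_i(x_i)+\frac{c}{4|\mathcal{N}_i|}\Bigl\|\frac{1}{c}\Bigl(E_ix_i-\frac{1}{N}q\Bigr)-\frac{1}{c}p_i^{k-1}+\sum_{j\in\mathcal{N}_i}(y_i^{k-1}+y_j^{k-1})\Bigr\|_2^2\\
&+\frac{1}{2\tau_i}\|C_ix_i+r_i-d_i+\tau_iz_i^{k-1}\|_2^2\Bigr\}.
\end{aligned}
\tag{23}
$$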
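The closed forms in (22) can be checked against (17) with a first-order argument. The sketch below assumes, as the minimax argument of Appendix B is stated to justify, that the inner maximizer in (7) can be fixed at $(x_i^k,r_i^k)$, so that $\varphi_i$ contributes the linear terms $-y_i^TE_ix_i^k-z_i^T(C_ix_i^k+r_i^k)$. It also uses $p_i^{k-1}=\sum_{j\in\mathcal{N}_i}(u_{ij}^{k-1}+v_{ji}^{k-1})$ from (20).

```latex
\begin{align*}
0 &= -E_i x_i^k + \tfrac{1}{N}q + p_i^{k-1}
     + 2c\sum_{j\in\mathcal{N}_i}\Bigl(y_i^k-\tfrac{y_i^{k-1}+y_j^{k-1}}{2}\Bigr)
  && \Longrightarrow\ \text{(22a)}, \\
0 &= -\bigl(C_i x_i^k + r_i^k\bigr) + d_i + \tau_i\bigl(z_i^k - z_i^{k-1}\bigr)
  && \Longrightarrow\ \text{(22b)}.
\end{align*}
```

Solving the first condition for $y_i^k$ gives the factor $1/(2|\mathcal{N}_i|)$ in (22a), and solving the second for $z_i^k$ gives (22b) directly.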

As also shown in Appendix B, the dummy constraint $z_i=s_i$ in (8d) and the augmented term $\sum_{i=1}^N\frac{\tau_i}{2}\|z_i-s_i\|_2^2$ in (9) are essential for arriving at (22) and (23). Since they are equivalent to applying the proximal minimization method [14, Sec. 3.4.3] to the variables $z_i$'s in (8), we name the method developed above the proximal DC-ADMM method. In Algorithm 1, we summarize the proposed PDC-ADMM method. Note that the PDC-ADMM method in Algorithm 1 is fully parallel and distributed except that, in (29), each agent $i$ needs to exchange $y_i^k$ with its neighbors.
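The updates (21), (22) and (23) can be sketched on a toy instance as follows. The example is deliberately scalar ($K=L=P=1$, $E_i=C_i=1$, $\mathcal{S}_i=\mathbb{R}$) with the assumed cost $f_i(x_i)=\frac{1}{2}(x_i-a_i)^2$, so that (23) can be solved exactly by eliminating $r_i$ and checking whether the resulting penalty is active. The function name `pdc_admm_scalar`, the zero initialization, and the parameter values are illustrative assumptions and do not reproduce Algorithm 1 verbatim.

```python
def pdc_admm_scalar(a, d, q, neighbors, c=1.0, tau=1.0, iters=2000):
    """Toy PDC-ADMM for: min sum_i 0.5*(x_i - a_i)^2
    subject to sum_i x_i = q and x_i <= d_i (all quantities scalar)."""
    N = len(a)
    x = [0.0] * N
    y = [0.0] * N  # local copies of the dual variable of the coupling constraint
    z = [0.0] * N  # dual variables of the local inequality constraints
    p = [0.0] * N  # aggregated consensus multipliers, see (20)
    for _ in range(iters):
        y_new = [0.0] * N
        for i in range(N):
            n = len(neighbors[i])
            s = sum(y[i] + y[j] for j in neighbors[i])
            # Subproblem (23) with r_i eliminated:
            # r_i = max(0, d_i - x_i - tau*z_i) leaves a one-sided quadratic penalty.
            h = 1.0 / (2.0 * n * c)
            rhs = a[i] + h * (q / N + p[i]) - s / (2.0 * n)
            xi = rhs / (1.0 + h)                 # stationary point, penalty inactive
            if xi - d[i] + tau * z[i] > 0.0:     # penalty active: re-solve
                xi = (rhs + d[i] / tau - z[i]) / (1.0 + h + 1.0 / tau)
            ri = max(0.0, d[i] - xi - tau * z[i])
            # Closed-form dual updates (22a) and (22b).
            y_new[i] = (s - p[i] / c + (xi - q / N) / c) / (2.0 * n)
            z[i] = z[i] + (xi + ri - d[i]) / tau
            x[i] = xi
        y = y_new  # agents exchange y_i^k with their neighbors
        for i in range(N):
            p[i] += c * sum(y[i] - y[j] for j in neighbors[i])  # update (21)
    return x, y, z


# Three fully connected agents; only agent 2 has a binding bound.
x, y, z = pdc_admm_scalar(
    a=[1.0, 2.0, 3.0], d=[10.0, 10.0, 1.5], q=3.0,
    neighbors=[[1, 2], [0, 2], [0, 1]],
)
```

For this instance the KKT conditions give the optimal solution $x^\star=(0.25, 1.25, 1.5)$ with $y^\star=0.75$ and $z_3^\star=0.75$, so the returned iterates should be close to these values. Note that each agent only ever evaluates a one-sided quadratic penalty, and no projection onto the polyhedra constraint is needed.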

The PDC-ADMM method in Algorithm 1 is provably convergent, as stated in the following theorem.

###### Theorem 1

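Suppose that Assumptions 1 and 2 hold. Let $(x^\star,r^\star)$ and $(y^\star,z^\star)$ be a pair of optimal primal-dual solutions of (5) (i.e., (P)), where $r^\star=[(r_1^\star)^T,\ldots,(r_N^\star)^T]^T$ and $z^\star=[(z_1^\star)^T,\ldots,(z_N^\star)^T]^T$, and let $u^\star$ (which stacks all $u_{ij}^\star$ for all $j\in\mathcal{N}_i$, $i\in\mathcal{V}$) be an optimal dual variable of problem (8). Moreover, let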

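$$
\bar{x}_i^M\triangleq\frac{1}{M}\sum_{k=1}^M x_i^k,\qquad \bar{r}_i^M\triangleq\frac{1}{M}\sum_{k=1}^M r_i^k\quad \forall i\in\mathcal{V}, \tag{24}
$$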

and $\bar{x}^M\triangleq[(\bar{x}_1^M)^T,\ldots,(\bar{x}_N^M)^T]^T$, where $\{x_i^k,r_i^k\}$ are generated by (23). Then, it holds that

$$
|F(\bar{x}^M)-F(x^\star)|+\Bigl\|\sum_{i=1}^N E_i\bar{x}_i^M-q\Bigr\|_2+\sum_{i=1}^N\|C_i\bar{x}_i^M+\bar{r}_i^M-d_i\|_2\leq\frac{(1+\delta)C_1+C_2}{M}, \tag{25}
$$

where $\delta$, $C_1$ and $C_2$ are finite constants that do not depend on $M$.

The proof is presented in Appendix C. Theorem 1 implies that the proposed PDC-ADMM method asymptotically converges to an optimal solution of (P) with a worst-case $\mathcal{O}(1/k)$ convergence rate.
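Since the bound (25) is stated for the averaged iterates in (24), an implementation may want to track these averages without storing the full history. A minimal sketch of the standard incremental mean is given below; the helper name `running_average` is hypothetical.

```python
def running_average(iterates):
    """Return the average of a sequence of equally sized vectors, as in (24),
    using the incremental update avg_k = avg_{k-1} + (x_k - avg_{k-1}) / k."""
    avg = None
    for k, xk in enumerate(iterates, start=1):
        if avg is None:
            avg = list(xk)  # first iterate: the average is the iterate itself
        else:
            avg = [m + (v - m) / k for m, v in zip(avg, xk)]
    return avg


x_bar = running_average([[1.0, 2.0], [3.0, 4.0]])  # average of two iterates
```

Each agent can apply the same update locally to its own $x_i^k$ and $r_i^k$, since (24) involves no communication.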

As discussed in Appendix B, if one removes the dummy constraint from (8) and the augmented term