A duality-based approach for distributed min-max optimization
Abstract
In this paper we consider a distributed optimization scenario in which a set of processors aims at cooperatively solving a class of min-max optimization problems. This setup is motivated by peak-demand minimization problems in smart grids. Here, the goal is to minimize the peak value over a finite horizon with: (i) the demand at each time instant being the sum of contributions from different devices, and (ii) the device states at different time instants being coupled through local constraints (e.g., the dynamics). The min-max structure and the double coupling (through the devices and over the time horizon) make this problem challenging in a distributed setup (e.g., existing distributed dual decomposition approaches cannot be applied). We propose a distributed algorithm based on the combination of duality methods and properties from min-max optimization. Specifically, we repeatedly apply duality theory and properly introduce ad-hoc slack variables in order to derive a series of equivalent problems. On the resulting problem we apply a dual subgradient method, which turns out to be a distributed algorithm consisting of a minimization on the original primal variables and a suitable dual update. We prove the convergence of the proposed algorithm in objective value. Moreover, we show that every limit point of the primal sequence is an optimal (feasible) solution. Finally, we provide numerical computations for a peak-demand optimization problem in a network of thermostatically controlled loads.
I Introduction
Distributed optimization problems arise as building blocks of several network problems in different areas such as control, estimation and learning. In this regard, the addition of processing, measurement, communication and control capability to the electric power grid is leading to “smart grids” in which tasks that were typically performed at a central level can be more efficiently performed by smart devices in a cooperative way. Therefore, these complex systems represent a rich source of motivating optimization scenarios. An interesting example is the design of smart generators, accumulators and loads that cooperatively execute Demand Side Management (DSM) programs [2]. The goal is to reduce the hourly and daily variations and peaks of electric demand by optimizing generation, storage and consumption. A widely adopted objective in DSM programs is the Peak-to-Average Ratio (PAR), defined as the ratio between peak-daily and average-daily power demands. PAR minimization gives rise to a min-max optimization problem if the average daily electric load is assumed not to be affected by the demand response strategy.
This problem has already been investigated in the literature in a non-cooperative framework. In [3] the authors propose a game-theoretic model for PAR minimization and provide a distributed energy-cost-based strategy for the users which is proven to be optimal. A non-cooperative-game approach is also proposed in [4], where optimal strategies are characterized and a distributed scheme is designed based on a proximal decomposition algorithm. It is worth pointing out that in the literature above the term “distributed” is used to indicate that data are deployed on a set of devices, which perform local computation simultaneously. However, the nodes do not run a “distributed algorithm”, that is, they do not cooperate and do not exchange information locally over a communication graph.
Motivated by this application scenario, in this paper we propose a novel distributed optimization framework for min-max optimization problems commonly found in DSM problems. Differently from the references above, we consider a cooperative, distributed computation model in which the agents in the network solve the optimization problem (i) without any knowledge of aggregate quantities, (ii) by communicating only with neighboring agents, and (iii) by performing local computations (with no central coordinator).
The distributed algorithm proposed in the paper heavily relies on duality theory. Duality is a widely used tool for parallel and (classical) distributed optimization algorithms as shown, e.g., in the tutorials [5, 6]. More recently, in [7] a distributed, consensus-based, primal-dual algorithm is proposed to solve constrained optimization problems with separable convex costs and common convex constraints. In [8] the authors use the same technique to solve optimization problems with coupled smooth convex costs and convex inequality constraints. In the proposed algorithm, agents employ a consensus technique to estimate the global cost and constraint functions and use a local primal-dual perturbed subgradient method to obtain a global optimum. These approaches do not apply to optimization problems such as the one considered in this paper.
Primal recovery is a key issue in dual methods, since the primal sequence is not guaranteed, in general, to satisfy the dualized primal constraint. Thus, several strategies have been proposed to cope with this issue. In [9], the authors propose and analyze a centralized algorithm for generating approximate primal solutions via a dual subgradient method applied to a convex constrained optimization problem. Moreover, in the paper the problem of (exact) primal recovery and rate analysis of existing techniques is widely discussed. In [10], still in a centralized setup, the primal convergence rate of dual first-order methods is studied when the primal problem is only approximately solved. In [11] a distributed algorithm is proposed to generate approximate dual solutions for a problem with separable cost function and coupling constraints. A similar optimization setup is considered in [12] in a distributed setup. A dual decomposition approach combined with a proximal minimization is proposed to generate a dual solution. In the last two papers, a primal recovery mechanism is proposed to obtain a primal optimal solution.
Another tool used to develop and analyze the distributed algorithm we propose in the paper is min-max optimization, which is closely related to saddle-point problems. In [13] the authors propose a subgradient method to generate approximate saddle points. A min-max problem is also considered in [14] and a distributed algorithm based on a suitable penalty approach has been proposed. Differently from our setup, in [14] each term of the max function is local and entirely known by a single agent. Another class of algorithms exploits the exchange of active constraints among the network nodes to solve constrained optimization problems which include min-max problems [15, 16]. Although they work under asynchronous, directed communication, they do not scale in setups such as the one in this paper, in which the terms of the max function are coupled. Very recently, in [17] the authors proposed a distributed projected subgradient method to solve constrained saddle-point problems with agreement constraints. The proposed algorithm is based on saddle-point dynamics with Laplacian averaging. Although our problem setup fits in those considered in [17], our algorithmic approach and the analysis are different. In [18, 19] saddle-point dynamics are used to design distributed algorithms for standard separable optimization problems.
The main contributions of this paper are as follows. First, we propose a novel distributed optimization framework which is strongly motivated by peak power-demand minimization in DSM. The optimization problem has a min-max structure with local constraints at each node. Each term in the max function represents a daily cost (so that the maximum over a given horizon needs to be minimized), while the local constraints are due to the local dynamics and state/input constraints of the subsystems in the smart grid. The problem is challenging when approached in a distributed way since it is doubly coupled. Each term of the max function is coupled among the agents, since it is the sum of local functions each one known by the local agent only. Moreover, the local constraints impose a coupling between different “days” in the time horizon. The goal is to solve the problem in a distributed computation framework, in which each agent only knows its local constraint and its local objective function at each day.
Second, as the main contribution of the paper, we propose a distributed algorithm to solve this class of min-max optimization problems. The algorithm has a very simple and clean structure in which a primal minimization and a dual update are performed. The primal problem has a similar structure to the centralized one. Despite this simple structure, which resembles standard distributed dual methods, the algorithm is not a standard decomposition scheme [6], and the derivation of the algorithm is non-obvious. Specifically, the algorithm is derived by heavily resorting to duality theory and properties of min-max optimization (or saddle-point) problems. In particular, a sequence of equivalent problems is derived in order to decompose the originally coupled problem into locally-coupled subproblems, which in turn makes it possible to design a distributed algorithm. An interesting feature of the algorithm is its expression in terms of the original primal variables and of dual variables arising from two different (dual) problems. Since we apply duality more than once, and on different problems, this property, although apparently intuitive, was not obvious a priori. Another appealing feature of the algorithm is that every limit point of the primal sequence at each node is a (feasible) optimal solution of the original optimization problem (although this is only convex and not strictly convex). This property is obtained by the minimizing sequence of the local primal subproblems without resorting to averaging schemes. Finally, since each node only computes the decision variable of interest, our algorithm can solve both large-scale (many agents are present) and big-data (a large horizon is considered) problems.
The paper is structured as follows. In Section II we formalize our distributed min-max optimization setup and present the main contribution of the paper, a novel, duality-based distributed optimization method. In Section III we characterize its convergence properties. In Section IV we corroborate the theoretical results with a numerical example involving peak power minimization in a smart-grid scenario. Finally, in the Appendix we provide some useful preliminaries from optimization, specifically basics on duality theory and a result for the subgradient method.
II Problem Setup and Distributed Optimization Algorithm
In this section we set up the distributed min-max optimization framework and propose a novel distributed algorithm to solve it.
II-A Distributed min-max optimization setup
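We consider a network of $N$ processors which communicate according to a connected, undirected graph $\mathcal{G} = (\{1,\ldots,N\}, \mathcal{E})$, where $\mathcal{E} \subseteq \{1,\ldots,N\} \times \{1,\ldots,N\}$ is the set of edges. That is, the edge $(i,j)$ models the fact that nodes $i$ and $j$ exchange information. We denote by $\mathcal{N}_i$ the set of neighbors of node $i$ in the fixed graph $\mathcal{G}$, i.e., $\mathcal{N}_i := \{ j \in \{1,\ldots,N\} \mid (i,j) \in \mathcal{E} \}$. Also, we denote by $a_{ij}$ the element $(i,j)$ of the adjacency matrix. We recall that $a_{ij} = 1$ if $(i,j) \in \mathcal{E}$ and $i \neq j$, and $a_{ij} = 0$ otherwise.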
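As a minimal sketch of the communication model just described, the snippet below builds the neighbor sets and the 0/1 adjacency matrix of an undirected graph from an edge list and checks connectivity. The function names `build_graph` and `is_connected`, the 0-based node labels and the example path graph are illustrative assumptions, not part of the paper.

```python
def build_graph(num_nodes, edges):
    """Return neighbor sets and the 0/1 adjacency matrix of an undirected graph."""
    neighbors = {i: set() for i in range(num_nodes)}
    adjacency = [[0] * num_nodes for _ in range(num_nodes)]
    for i, j in edges:
        if i == j:
            continue  # no self-loops, so the diagonal stays at zero
        neighbors[i].add(j)
        neighbors[j].add(i)
        adjacency[i][j] = 1
        adjacency[j][i] = 1  # undirected graph: symmetric adjacency matrix
    return neighbors, adjacency


def is_connected(neighbors):
    """Depth-first search from an arbitrary node; True if every node is reached."""
    if not neighbors:
        return True
    start = next(iter(neighbors))
    seen, stack = {start}, [start]
    while stack:
        node = stack.pop()
        for nxt in neighbors[node]:
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return len(seen) == len(neighbors)


# Hypothetical example: a path graph on four nodes.
neighbors, adjacency = build_graph(4, [(0, 1), (1, 2), (2, 3)])
```

Here `neighbors[i]` plays the role of the neighbor set of node $i$, and the symmetry of `adjacency` is the property used later when the multiplier terms are summed over the network.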
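Next, we introduce the min-max optimization problem to be solved by the network processors in a distributed way. Specifically, we associate to each processor $i$ a decision vector $x^i \in \mathbb{R}^S$, a constraint set $X^i \subseteq \mathbb{R}^S$ and local functions $g_s^i : \mathbb{R} \to \mathbb{R}$, $s \in \{1,\ldots,S\}$, and set up the following optimization problem
$$\begin{aligned} \min_{x^1,\ldots,x^N} \;& \max_{s \in \{1,\ldots,S\}} \sum_{i=1}^N g_s^i(x_s^i) \\ \text{subj. to} \;& x^i \in X^i, \quad i \in \{1,\ldots,N\}, \end{aligned} \tag{1}$$
where for each $i \in \{1,\ldots,N\}$ the set $X^i$ is nonempty, convex and compact, and the functions $g_s^i$, $s \in \{1,\ldots,S\}$, are convex.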
Note that we use the superscript $i$ to indicate that a vector belongs to node $i$, while we use the subscript $s$ to identify a vector component, i.e., $x_s^i$ is the $s$-th component of $x^i$.
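Using a standard approach for min-max problems, we introduce an auxiliary variable $P$ to write the so-called epigraph representation of problem (1), given by
$$\begin{aligned} \min_{x^1,\ldots,x^N,\,P} \;& P \\ \text{subj. to} \;& x^i \in X^i, \quad i \in \{1,\ldots,N\}, \\ & \sum_{i=1}^N g_s^i(x_s^i) \le P, \quad s \in \{1,\ldots,S\}. \end{aligned} \tag{2}$$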
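The relation between the min-max cost and its epigraph form can be illustrated with a small sketch: for fixed decision vectors, a scalar bound is feasible for the epigraph inequalities exactly when it is at least the peak of the summed local costs. The helper names `peak` and `epigraph_feasible`, the two local functions and the numerical data are assumptions made for illustration only; membership in the local constraint sets is not checked here.

```python
def peak(g, x):
    """Min-max cost for fixed x: max over s of sum over i of g[i][s](x[i][s])."""
    num_agents, horizon = len(x), len(x[0])
    return max(sum(g[i][s](x[i][s]) for i in range(num_agents))
               for s in range(horizon))


def epigraph_feasible(g, x, bound, tol=1e-12):
    """True if the summed local costs stay below `bound` at every index s."""
    num_agents, horizon = len(x), len(x[0])
    return all(sum(g[i][s](x[i][s]) for i in range(num_agents)) <= bound + tol
               for s in range(horizon))


def identity(v):
    return v


def square(v):
    return v * v


# Hypothetical data: two agents, horizon of three, convex local functions.
g = [[identity] * 3, [square] * 3]
x = [[1.0, 2.0, 0.5], [1.0, 0.5, 2.0]]
peak_value = peak(g, x)  # per-index sums are 2.0, 2.25 and 4.5
```

Minimizing the bound over the feasible set therefore returns the same value as minimizing the peak directly, which is the content of the epigraph reformulation.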
It is worth noticing that this problem has a particular structure, which gives rise to interesting challenges in a distributed setup. First of all, two types of couplings are present, which involve simultaneously the $N$ agents and the $S$ components of each decision variable $x^i$. Specifically, for a given index $s$, the constraint $\sum_{i=1}^N g_s^i(x_s^i) \le P$ couples all the vectors $x^i$, $i \in \{1,\ldots,N\}$. At the same time, for a given $i$, the constraint $x^i \in X^i$ couples all the components $x_1^i,\ldots,x_S^i$ of $x^i$. Figure 1 provides a graphical representation of this interlaced coupling. Moreover, the problem is both large-scale and big-data. That is, both the number of decision variables and the number of constraints depend on $N$ (and thus scale badly with the number of agents in the network). Also, the dimension of the coupling constraint, $S$, can be large. Therefore, common approaches such as reaching a consensus among the nodes on an optimal solution and/or exchanging constraints are not computationally affordable.
To conclude this section, notice that problem (2) is convex, but not strictly convex. This means that it is not guaranteed to have a unique optimal solution. As discussed in the introduction, this impacts on dual approaches when trying to recover a primal optimal solution, see e.g., [9] and references therein. This aspect is even more delicate in a distributed setup in which nodes only know part of the constraints and of the objective function.
II-B Distributed Duality-Based Peak Minimization (DDPM)
Next, we introduce our distributed optimization algorithm. Informally, the algorithm consists of a two-step procedure. First, each node $i$ stores a set of variables $((x^i, \rho^i), \mu^i)$ obtained as a primal-dual optimal solution pair of a local optimization problem with the same epigraph structure as the centralized problem. The coupling with the other nodes in the original formulation is replaced by a term depending on neighboring variables $\lambda^{ij}$, $j \in \mathcal{N}_i$. These variables are updated in the second step according to a suitable linear law weighting the difference of neighboring $\mu^i$. Nodes use a diminishing step-size denoted by $\gamma(t)$ and can initialize the variables $\lambda^{ij}$, $j \in \mathcal{N}_i$, to arbitrary values. In the next table we formally state our Distributed Duality-Based Peak Minimization (DDPM) algorithm from the perspective of node $i$.
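Compute $((x^i(t+1), \rho^i(t+1)), \mu^i(t+1))$ as a primal-dual optimal solution pair of
$$\begin{aligned} \min_{x^i,\,\rho^i} \;& \rho^i \\ \text{subj. to} \;& x^i \in X^i, \\ & g_s^i(x_s^i) + \sum_{j \in \mathcal{N}_i} \big( \lambda^{ij}(t) - \lambda^{ji}(t) \big)_s \le \rho^i, \quad s \in \{1,\ldots,S\}. \end{aligned} \tag{3}$$
Update, for all $j \in \mathcal{N}_i$,
$$\lambda^{ij}(t+1) = \lambda^{ij}(t) - \gamma(t) \big( \mu^i(t+1) - \mu^j(t+1) \big). \tag{4}$$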
The structure of the algorithm and the meaning of the updates will be clear in the constructive analysis carried out in the next section. At this point we want to point out that although problem (3) has the same epigraph structure as problem (2), $\rho^i$ is not a copy of the centralized cost $P$, but rather a local contribution to that cost. That is, as we will see, the total cost $P$ will be the sum of the $\rho^i$'s.
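To make the two-step structure concrete, here is a minimal sketch of the DDPM iteration on a toy instance. Everything specific to the instance is an assumption for illustration and is not taken from the paper: the local functions are the identity, each constraint set is $\{x \ge 0,\ \sum_s x_s = E_i\}$ for a hypothetical budget $E_i$, and the step-size is $\gamma(t) = 1/(t+1)$. Under these assumptions the local problem has a closed-form primal-dual solution (coded in the hypothetical helper `solve_local`), and the optimal peak is $\sum_i E_i / S$ because the total demand must be spread over $S$ slots. A general instance would need a convex solver that also returns the multipliers.

```python
def solve_local(budget, c):
    """Primal-dual solution of the toy local problem:
    min rho  s.t.  x >= 0, sum(x) = budget, x_s + c_s <= rho for all s."""
    S = len(c)
    c_max = max(c)
    rho_flat = (budget + sum(c)) / S
    if rho_flat >= c_max:
        # All S inequalities are active; the multipliers are uniform.
        rho = rho_flat
        x = [rho - cs for cs in c]
        mu = [1.0 / S] * S
    else:
        # Only the largest offset is binding; its multiplier equals one.
        rho = c_max
        s_star = c.index(c_max)
        mu = [1.0 if s == s_star else 0.0 for s in range(S)]
        x, left = [], budget
        for cs in c:  # greedily fill the available room rho - c_s
            xs = min(rho - cs, left)
            x.append(xs)
            left -= xs
    return x, rho, mu


def ddpm(budgets, neighbors, lam, iterations):
    """Run the two-step iteration; lam[(i, j)] is the multiplier node i keeps for j."""
    N, S = len(budgets), len(next(iter(lam.values())))
    history = []
    for t in range(iterations):
        gamma = 1.0 / (t + 1)  # assumed diminishing step-size
        sols = []
        for i in range(N):  # step 1: local primal-dual solve
            c = [sum(lam[(i, j)][s] - lam[(j, i)][s] for j in neighbors[i])
                 for s in range(S)]
            sols.append(solve_local(budgets[i], c))
        history.append(sum(rho for _, rho, _ in sols))
        mus = [mu for _, _, mu in sols]
        for i in range(N):  # step 2: linear update on neighboring multipliers
            for j in neighbors[i]:
                lam[(i, j)] = [lam[(i, j)][s] - gamma * (mus[i][s] - mus[j][s])
                               for s in range(S)]
    return sols, history


# Hypothetical instance: two agents, horizon of two, deliberately poor initialization.
budgets = [2.0, 2.0]
neighbors = {0: [1], 1: [0]}
lam0 = {(0, 1): [4.0, 0.0], (1, 0): [0.0, 0.0]}
solutions, history = ddpm(budgets, neighbors, lam0, iterations=10)
x_final = [x for x, _, _ in solutions]
final_peak = max(sum(x[s] for x in x_final) for s in range(2))
```

On this instance the sum of the local costs starts at 4 and drops to the optimal peak 2 after one update, and both agents end up splitting their budget evenly over the two slots. The sketch only illustrates the message flow; it says nothing about convergence speed on larger problems.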
III Algorithm Analysis
The analysis of the proposed DDPM distributed algorithm is constructive and heavily relies on duality theory tools.
We start by deriving the equivalent dual problem of (2) which is formally stated in the next lemma.
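Lemma III.1
The optimization problem
$$\begin{aligned} \max_{\mu \in \mathbb{R}^S} \;& \sum_{i=1}^N q^i(\mu) \\ \text{subj. to} \;& \mathbf{1}^\top \mu = 1, \quad \mu \succeq 0, \end{aligned} \tag{5}$$
with
$$q^i(\mu) := \min_{x^i \in X^i} \sum_{s=1}^S \mu_s\, g_s^i(x_s^i), \tag{6}$$
is the dual of problem (2), and strong duality holds, i.e., the optimal cost $q^\star$ of (5) equals the optimal cost $P^\star$ of (2).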
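Proof: We start by showing that problem (5) is the dual of (2). Let $\mu := [\mu_1, \ldots, \mu_S]^\top \succeq 0$ be the Lagrange multipliers associated to the inequality constraints $\sum_{i=1}^N g_s^i(x_s^i) - P \le 0$ for $s \in \{1,\ldots,S\}$ in (2). Then the partial Lagrangian of problem (2) is given by
$$L(x^1,\ldots,x^N,P,\mu) = P + \sum_{s=1}^S \mu_s \Big( \sum_{i=1}^N g_s^i(x_s^i) - P \Big) = P \big( 1 - \mathbf{1}^\top \mu \big) + \sum_{i=1}^N \sum_{s=1}^S \mu_s\, g_s^i(x_s^i).$$
By definition, the dual function is
$$q(\mu) := \min_{x^1 \in X^1, \ldots, x^N \in X^N,\; P} L(x^1,\ldots,x^N,P,\mu),$$
where the presence of the constraints $x^i \in X^i$ for all $i \in \{1,\ldots,N\}$ is due to the fact that we have not dualized them.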
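The minimization of $L$ with respect to $P$ gives rise to the simplex constraint $\mathbf{1}^\top \mu = 1$, since otherwise the term $P(1 - \mathbf{1}^\top \mu)$ is unbounded below. The minimization with respect to $x^i$ splits over $i \in \{1,\ldots,N\}$, so that the dual function can be written as the sum of the terms $q^i$ given in (6).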
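To prove strong duality, we show that the strong duality theorem for convex inequality constraints, [20, Proposition 5.3.1], applies. Since the sets $X^i$, $i \in \{1,\ldots,N\}$, are convex (and compact), we need to show that the inequality constraints $\sum_{i=1}^N g_s^i(x_s^i) - P \le 0$ for all $s \in \{1,\ldots,S\}$ are convex and that there exist $\bar{x}^i \in X^i$, $i \in \{1,\ldots,N\}$, and $\bar{P}$ such that the strict inequality holds. Since each $g_s^i$ is convex and the constraint is affine in $P$, for all $s \in \{1,\ldots,S\}$ each function
$$(x^1,\ldots,x^N,P) \mapsto \sum_{i=1}^N g_s^i(x_s^i) - P$$
is convex. Also, since the sets $X^i$, $i \in \{1,\ldots,N\}$, are nonempty, there exist $\bar{x}^i \in X^i$, $i \in \{1,\ldots,N\}$, and a sufficiently large (finite) $\bar{P}$ such that the strict inequalities $\sum_{i=1}^N g_s^i(\bar{x}_s^i) - \bar{P} < 0$, $s \in \{1,\ldots,S\}$, are satisfied and, thus, Slater's condition holds. Finally, since a feasible point for the convex problem (1) always exists, the optimal cost $P^\star$ is finite and so $q^\star = P^\star$, thus concluding the proof.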
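A small numerical sketch can make the duality statement tangible on a toy instance. The instance is an assumption for illustration, not the paper's example: identity local functions and constraint sets $\{x \ge 0,\ \sum_s x_s = E_i\}$ with hypothetical budgets $E_i$. In that case the inner minimization has the closed form $E_i \min_s \mu_s$, and the primal optimum is $\sum_i E_i / S$ (the total demand spread evenly over the horizon). The snippet checks weak duality on random simplex points and tightness at the uniform multiplier.

```python
import random


def q_local(budget, mu):
    """min over {x >= 0, sum(x) = budget} of sum_s mu_s * x_s = budget * min_s mu_s."""
    return budget * min(mu)


def dual_cost(budgets, mu):
    """Sum of the local dual terms for a common multiplier vector mu."""
    return sum(q_local(budget, mu) for budget in budgets)


def random_simplex_point(size, rng):
    """Sample nonnegative weights and normalize them to sum to one."""
    w = [rng.random() + 1e-12 for _ in range(size)]  # offset avoids an all-zero draw
    total = sum(w)
    return [v / total for v in w]


# Hypothetical instance: three agents, horizon of four.
budgets = [2.0, 3.0, 1.0]
horizon = 4
primal_optimum = sum(budgets) / horizon  # even spreading of the total demand

rng = random.Random(0)
samples = [dual_cost(budgets, random_simplex_point(horizon, rng))
           for _ in range(1000)]
uniform_value = dual_cost(budgets, [1.0 / horizon] * horizon)
```

Every sampled dual value stays below the primal optimum, and the uniform multiplier attains it, which is consistent with a zero duality gap on this instance.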
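In order to make problem (5) amenable to a distributed solution, we can rewrite it in an equivalent form. To this end, we introduce copies $\mu^i$ of the common optimization variable $\mu$ and coherence constraints having the sparsity of the connected graph $\mathcal{G}$, thus obtaining
$$\begin{aligned} \max_{\mu^1,\ldots,\mu^N} \;& \sum_{i=1}^N q^i(\mu^i) \\ \text{subj. to} \;& \mathbf{1}^\top \mu^i = 1, \quad \mu^i \succeq 0, \quad i \in \{1,\ldots,N\}, \\ & \mu^i = \mu^j, \quad (i,j) \in \mathcal{E}. \end{aligned} \tag{7}$$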
Notice that we have also duplicated the simplex constraint, so that it becomes local at each node.
To solve this problem, we can use a dual decomposition approach by designing a dual subgradient algorithm. Notice that dual methods can be applied to (7) since the constraints are convex and the cost function is concave. Also, as known in the distributed optimization literature, a dual subgradient algorithm applied to problem (7) would immediately result in a distributed algorithm if the functions $q^i$ were available in closed form.
Remark III.2
In standard convex optimization deriving the dual of a dual problem brings back to a primal formulation. However, we want to stress that in what we will develop in the following, problem (7) is dualized rather than problem (5). In particular, different constraints are dualized, namely the coherence constraints rather than the simplex ones. Therefore, it is not obvious if and how this leads back to a primal formulation.
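We start deriving the dual subgradient algorithm by dualizing only the coherence constraints. Thus, we write the partial Lagrangian
$$L(\mu^1,\ldots,\mu^N,\Lambda) = \sum_{i=1}^N \Big( q^i(\mu^i) + \sum_{j \in \mathcal{N}_i} {\lambda^{ij}}^\top (\mu^i - \mu^j) \Big), \tag{8}$$
where $\Lambda := \{\lambda^{ij}\}_{(i,j) \in \mathcal{E}}$ and $\lambda^{ij} \in \mathbb{R}^S$ for all $(i,j) \in \mathcal{E}$ are the Lagrange multipliers associated to the constraints $\mu^i - \mu^j = 0$.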
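Since the communication graph is undirected and connected, we can exploit the symmetry of the constraints. In fact, for each $(i,j) \in \mathcal{E}$ we also have $(j,i) \in \mathcal{E}$, and, expanding all the terms in (8), for given $i$ and $j$, we always have both the terms ${\lambda^{ij}}^\top (\mu^i - \mu^j)$ and ${\lambda^{ji}}^\top (\mu^j - \mu^i)$. Thus, after some simple algebraic manipulations, we get
$$L(\mu^1,\ldots,\mu^N,\Lambda) = \sum_{i=1}^N \Big( q^i(\mu^i) + {\mu^i}^\top \sum_{j \in \mathcal{N}_i} (\lambda^{ij} - \lambda^{ji}) \Big), \tag{9}$$
which is separable with respect to $\mu^i$, $i \in \{1,\ldots,N\}$.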
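The regrouping of the multiplier terms can be checked numerically. The sketch below evaluates the coupling term once edge by edge (each multiplier paired with the difference of the two local copies) and once node by node (each local copy paired with the net multiplier it sees), on a hypothetical path graph with random data. The function names and the test data are assumptions made for illustration.

```python
import random


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def coupling_edge_form(neighbors, lam, mu):
    """sum_i sum_{j in N_i} lam[(i, j)]^T (mu[i] - mu[j])."""
    return sum(dot(lam[(i, j)], [a - b for a, b in zip(mu[i], mu[j])])
               for i in neighbors for j in neighbors[i])


def coupling_node_form(neighbors, lam, mu):
    """sum_i mu[i]^T sum_{j in N_i} (lam[(i, j)] - lam[(j, i)])."""
    total = 0.0
    for i in neighbors:
        size = len(mu[i])
        net = [sum(lam[(i, j)][s] - lam[(j, i)][s] for j in neighbors[i])
               for s in range(size)]
        total += dot(mu[i], net)
    return total


# Hypothetical data: path graph 0 - 1 - 2, vectors of length two.
rng = random.Random(1)
neighbors = {0: [1], 1: [0, 2], 2: [1]}
lam = {(i, j): [rng.uniform(-1, 1) for _ in range(2)]
       for i in neighbors for j in neighbors[i]}
mu = {i: [rng.uniform(0, 1) for _ in range(2)] for i in neighbors}

edge_value = coupling_edge_form(neighbors, lam, mu)
node_value = coupling_node_form(neighbors, lam, mu)
```

The two values agree up to rounding, and the node-wise form is the one that makes the Lagrangian separable across agents.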
In the next lemma we characterize the properties of problem (10).
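Lemma III.3
The problem
$$\min_{\Lambda} \; \eta(\Lambda) = \sum_{i=1}^N \eta^i\big( \{\lambda^{ij}, \lambda^{ji}\}_{j \in \mathcal{N}_i} \big), \tag{10}$$
with
$$\eta^i\big( \{\lambda^{ij}, \lambda^{ji}\}_{j \in \mathcal{N}_i} \big) := \sup_{\mathbf{1}^\top \mu^i = 1,\; \mu^i \succeq 0} \Big( q^i(\mu^i) + {\mu^i}^\top \sum_{j \in \mathcal{N}_i} (\lambda^{ij} - \lambda^{ji}) \Big), \tag{11}$$
is the dual of problem (7), its optimal cost $\eta^\star$ is finite, and strong duality holds, so that
$$\eta^\star = q^\star = P^\star. \tag{12}$$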
Proof: Since problem (5) is a dual problem, its cost function $\sum_{i=1}^N q^i(\mu)$ is concave on its domain, which is convex (simplex constraint). Moreover, by Lemma III.1 its optimal cost $q^\star$ is finite. Problem (7) is an equivalent formulation of (5) and, thus, has the same (finite) optimal cost $q^\star$. This allows us to conclude that strong duality holds between problem (7) and its dual (10), so that $\eta^\star = q^\star$. The second equality in (12) holds by Lemma III.1, so that the proof follows.
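Problem (10) has a particularly appealing structure for distributed computation. In fact, the cost function is separable and each term $\eta^i$ of the cost function depends only on neighboring variables $\lambda^{ij}$ and $\lambda^{ji}$ with $j \in \mathcal{N}_i$. Thus, a subgradient method applied to this problem turns out to be a distributed algorithm. Since problem (10) is the dual of (7) we recall, [20, Section 6.1], how to compute a subgradient of $\eta$ with respect to each component, that is,
$$\frac{\tilde{\partial} \eta(\Lambda)}{\partial \lambda^{ij}} = {\mu^i}^\star - {\mu^j}^\star, \tag{13}$$
where $\frac{\tilde{\partial} \eta(\cdot)}{\partial \lambda^{ij}}$ denotes the component associated to the variable $\lambda^{ij}$ of a subgradient of $\eta$, and
$${\mu^k}^\star \in \arg\max_{\mathbf{1}^\top \mu^k = 1,\; \mu^k \succeq 0} \Big( q^k(\mu^k) + {\mu^k}^\top \sum_{h \in \mathcal{N}_k} (\lambda^{kh} - \lambda^{hk}) \Big)$$
for $k = i, j$.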
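The distributed dual subgradient algorithm for problem (7) can be summarized as follows. For each node $i \in \{1,\ldots,N\}$:
(S1) receive $\lambda^{ji}(t)$, for each $j \in \mathcal{N}_i$, and compute a subgradient $\mu^i(t+1)$ by solving
$$\max_{\mathbf{1}^\top \mu^i = 1,\; \mu^i \succeq 0} \; q^i(\mu^i) + {\mu^i}^\top \sum_{j \in \mathcal{N}_i} \big( \lambda^{ij}(t) - \lambda^{ji}(t) \big); \tag{14}$$
(S2) receive $\mu^j(t+1)$, for each $j \in \mathcal{N}_i$, and update $\lambda^{ij}$, $j \in \mathcal{N}_i$, via the subgradient step (4), i.e., $\lambda^{ij}(t+1) = \lambda^{ij}(t) - \gamma(t) \big( \mu^i(t+1) - \mu^j(t+1) \big)$, where $\gamma(t)$ denotes the step-size.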
It is worth noting that in (14) the value of $\lambda^{ij}(t)$ and $\lambda^{ji}(t)$, for $j \in \mathcal{N}_i$, is fixed as highlighted by the index $t$. Moreover, we want to stress, once again, that the algorithm is not implementable as it is written, since the functions $q^i$ are not available in closed form.
In this regard, we point out that here we slightly abuse notation since in (S1)-(S2) we use $\mu^i(t)$ as in the DDPM algorithm, but without having proven its equivalence yet. Since we prove it below, we prefer not to overload the notation.
Before proving the convergence of the updates (S1)-(S2) we need the following lemma.
Lemma III.4
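For each $i \in \{1,\ldots,N\}$, the function $q^i$ defined in (6) is concave over $\mu \succeq 0$.
Proof: For each $i \in \{1,\ldots,N\}$, consider the (feasibility) convex problem
$$\begin{aligned} \min_{x^i \in X^i} \;& 0 \\ \text{subj. to} \;& g_s^i(x_s^i) \le 0, \quad s \in \{1,\ldots,S\}. \end{aligned}$$
Then, $q^i$ is the dual function of that problem and, thus, is a concave function on its domain, namely $\mu \succeq 0$.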
We can now prove the convergence in objective value of the dual subgradient.
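Lemma III.5
Let the step-size $\gamma(t)$ satisfy Assumption B.3. Then the dual subgradient updates (S1)-(S2) converge in objective value to the optimal cost $\eta^\star$ of problem (10).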
Proof: As already recalled in equation (13), we can build subgradients of $\eta$ by solving problems of the form (14). Since in (14) the maximization of the concave (Lemma III.4) function $q^i$ is performed over the nonempty, compact (and convex) probability simplex $\mathbf{1}^\top \mu^i = 1$, $\mu^i \succeq 0$, the maximum is always attained at a finite value. As a consequence, at each iteration the subgradients of $\eta$ are bounded quantities. Moreover, the step-size $\gamma(t)$ satisfies Assumption B.3 and, thus, we can invoke Proposition B.4, which guarantees that (S1)-(S2) converges in objective value to the optimal cost $\eta^\star$ of problem (10), so that the proof follows.
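We can explicitly rephrase update (14) by plugging in the definition of $q^i$, given in (6), thus obtaining the following max-min optimization problem
$$\max_{\mathbf{1}^\top \mu^i = 1,\; \mu^i \succeq 0} \; \min_{x^i \in X^i} \; \sum_{s=1}^S \mu_s^i \Big( g_s^i(x_s^i) + \sum_{j \in \mathcal{N}_i} \big( \lambda^{ij}(t) - \lambda^{ji}(t) \big)_s \Big). \tag{15}$$
Notice that (15) is a local problem at each node $i$ once $\lambda^{ij}(t)$ and $\lambda^{ji}(t)$ for all $j \in \mathcal{N}_i$ are given. Thus, the dual subgradient algorithm (S1)-(S2) could be implemented in a distributed way by letting each node $i$ solve problem (15) and exchange $\lambda^{ij}(t)$ and $\mu^i(t+1)$ with neighbors $j \in \mathcal{N}_i$. Next we further explore the structure of (15) to prove that DDPM solves the original problem (2).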
The next lemma is a first instrumental result.
Lemma III.6
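Consider the optimization problem
$$\begin{aligned} \max_{\mu^i} \;& \sum_{s=1}^S \mu_s^i \Big( g_s^i(x_s^i) + \sum_{j \in \mathcal{N}_i} \big( \lambda^{ij}(t) - \lambda^{ji}(t) \big)_s \Big) \\ \text{subj. to} \;& \mathbf{1}^\top \mu^i = 1, \quad \mu^i \succeq 0, \end{aligned} \tag{16}$$
with given $x^i$, and $\lambda^{ij}(t)$, $\lambda^{ji}(t)$, $j \in \mathcal{N}_i$. Then, the problem
$$\begin{aligned} \min_{\rho^i} \;& \rho^i \\ \text{subj. to} \;& g_s^i(x_s^i) + \sum_{j \in \mathcal{N}_i} \big( \lambda^{ij}(t) - \lambda^{ji}(t) \big)_s \le \rho^i, \quad s \in \{1,\ldots,S\}, \end{aligned} \tag{17}$$
is the dual of (16) and strong duality holds.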
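Proof: First, since $x^i$ (as well as $\lambda^{ij}(t)$ and $\lambda^{ji}(t)$) is given, problem (16) is a feasible linear program (the simplex constraint set is nonempty) and, thus, strong duality holds. Introducing a scalar multiplier $\rho^i$ associated to the constraint $\mathbf{1}^\top \mu^i = 1$, we write the partial Lagrangian of (16)
$$L(\mu^i, \rho^i) = \sum_{s=1}^S \mu_s^i \Big( g_s^i(x_s^i) + \sum_{j \in \mathcal{N}_i} \big( \lambda^{ij}(t) - \lambda^{ji}(t) \big)_s \Big) + \rho^i \big( 1 - \mathbf{1}^\top \mu^i \big)$$
and rearrange it as
$$L(\mu^i, \rho^i) = \sum_{s=1}^S \mu_s^i \Big( g_s^i(x_s^i) + \sum_{j \in \mathcal{N}_i} \big( \lambda^{ij}(t) - \lambda^{ji}(t) \big)_s - \rho^i \Big) + \rho^i.$$
The dual function, obtained by maximizing $L$ over $\mu^i \succeq 0$, is equal to $\rho^i$ with domain given by the inequalities $g_s^i(x_s^i) + \sum_{j \in \mathcal{N}_i} ( \lambda^{ij}(t) - \lambda^{ji}(t) )_s \le \rho^i$, $s \in \{1,\ldots,S\}$. Since (16) is a maximization problem, its dual is obtained by minimizing the dual function over its domain, giving (17), so that the proof follows.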
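The linear-programming duality used here reduces to a familiar fact: maximizing a linear function over the probability simplex returns the largest coefficient, which is also the smallest scalar that upper-bounds every coefficient. The sketch below checks this on hypothetical coefficients `a`, standing in for the bracketed terms of the linear program; the function names and numbers are illustrative assumptions.

```python
import random


def simplex_max(a):
    """max of sum_s mu_s * a_s over the simplex; an optimum sits at a vertex."""
    size = len(a)
    vertices = [[1.0 if s == k else 0.0 for s in range(size)]
                for k in range(size)]
    return max(sum(m * v for m, v in zip(mu, a)) for mu in vertices)


def smallest_upper_bound(a):
    """min rho subject to a_s <= rho for all s."""
    return max(a)


# Hypothetical coefficients for a horizon of four.
a = [0.3, -1.2, 2.5, 2.0]
primal_value = simplex_max(a)
dual_value = smallest_upper_bound(a)

# Any other simplex point gives a value that does not exceed the dual value.
rng = random.Random(2)
weights = [rng.random() + 1e-12 for _ in a]
interior = [w / sum(weights) for w in weights]
interior_value = sum(m * v for m, v in zip(interior, a))
```

Both optimal values coincide, which is the zero duality gap claimed for the pair of linear programs.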
The next lemma is a second instrumental result.
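Lemma III.7
The max-min optimization problem (15) is the saddle-point problem associated to problem (3). Moreover, a primal-dual optimal solution pair of (3), call it $((x^i(t+1), \rho^i(t+1)), \mu^i(t+1))$, exists and $(x^i(t+1), \mu^i(t+1))$ is a solution of (15).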
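Define
$$\phi(x^i, \mu^i) := \sum_{s=1}^S \mu_s^i \Big( g_s^i(x_s^i) + \sum_{j \in \mathcal{N}_i} \big( \lambda^{ij}(t) - \lambda^{ji}(t) \big)_s \Big) \tag{18}$$
and note that (i) $\phi(\cdot, \mu^i)$ is closed and convex for all $\mu^i \succeq 0$ (affine transformation of a convex function with compact domain $X^i$) and (ii) $\phi(x^i, \cdot)$ is closed and concave since it is a linear function with compact domain ($\mathbf{1}^\top \mu^i = 1$, $\mu^i \succeq 0$), for all $x^i \in X^i$. Thus we can invoke Proposition A.2, which allows us to switch the $\max$ and $\min$ operators, and write
$$\max_{\mathbf{1}^\top \mu^i = 1,\; \mu^i \succeq 0} \; \min_{x^i \in X^i} \phi(x^i, \mu^i) = \min_{x^i \in X^i} \; \max_{\mathbf{1}^\top \mu^i = 1,\; \mu^i \succeq 0} \phi(x^i, \mu^i). \tag{19}$$
Since the inner maximization problem depends nonlinearly on $x^i$ (which is itself an optimization variable), it cannot be performed without also considering the simultaneous minimization over $x^i$. We overcome this issue by substituting the inner maximization problem with its equivalent dual minimization. In fact, by Lemma III.6 we can rephrase the right-hand side of (19) as
$$\min_{x^i \in X^i} \Big( \min_{\rho^i} \; \rho^i \quad \text{subj. to} \;\; g_s^i(x_s^i) + \sum_{j \in \mathcal{N}_i} \big( \lambda^{ij}(t) - \lambda^{ji}(t) \big)_s \le \rho^i, \;\; s \in \{1,\ldots,S\} \Big). \tag{20}$$
At this point, a joint (constrained) minimization with respect to $x^i$ and $\rho^i$ can be simultaneously performed, leading to problem (3).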
To prove the second part, namely that a primal-dual optimal solution pair exists and solves problem (15), we first notice that problem (3) is convex. Indeed, the cost function is linear and the constraints are convex ($X^i$ is convex, the functions $g_s^i$ are convex and the inequality constraints are affine in $\rho^i$). Then, by using similar arguments as in Lemma III.1, we can show that the problem satisfies Slater's constraint qualification and, thus, strong duality holds. Therefore, a primal-dual optimal solution pair exists and, from the previous arguments, solves (15), thus concluding the proof.
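The equivalence between the max-min problem and the local epigraph problem can be sanity-checked on a toy instance. The instance is an assumption for illustration and not the paper's: identity local functions, constraint set $\{x \ge 0,\ x_1 + x_2 = E\}$ with a hypothetical budget $E$, and a fixed offset vector `c` standing in for the neighboring multiplier terms. For this instance the epigraph problem has optimal cost $\max\{\max_s c_s,\ (E + \sum_s c_s)/S\}$, while the inner minimum of the max-min problem is $E \min_s \mu_s + \mu^\top c$. The sketch compares the closed form with a grid search over the simplex for a horizon of two.

```python
def epigraph_cost(budget, c):
    """Optimal cost of: min rho s.t. x >= 0, sum(x) = budget, x_s + c_s <= rho."""
    return max(max(c), (budget + sum(c)) / len(c))


def maxmin_cost(budget, c, steps=1000):
    """Grid search over mu = (m, 1 - m) of min over x of sum_s mu_s * (x_s + c_s)."""
    best = float("-inf")
    for k in range(steps + 1):
        m = k / steps
        mu = (m, 1.0 - m)
        # The inner minimum puts the whole budget on the cheapest coordinate.
        inner = budget * min(mu) + mu[0] * c[0] + mu[1] * c[1]
        best = max(best, inner)
    return best


# Hypothetical (budget, offsets) pairs for a horizon of two.
cases = [(2.0, [4.0, 0.0]), (2.0, [2.0, 2.0]), (1.0, [0.5, -0.3])]
gaps = [abs(epigraph_cost(b, c) - maxmin_cost(b, c)) for b, c in cases]
```

The grid search is exact here because the objective is piecewise linear in `m` with its only kink at `m = 0.5`, which is a grid point; in general one would solve the max-min problem with a proper solver.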
Remark III.8 (Alternative proof of Lemma III.7)
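Let $\mu_s^i$, $s \in \{1,\ldots,S\}$, be (nonnegative) Lagrange multipliers associated to the inequality constraints of problem (3). Then, its (partial) Lagrangian can be written as
$$L(x^i, \rho^i, \mu^i) = \rho^i + \sum_{s=1}^S \mu_s^i \Big( g_s^i(x_s^i) + \sum_{j \in \mathcal{N}_i} \big( \lambda^{ij}(t) - \lambda^{ji}(t) \big)_s - \rho^i \Big)$$
and, collecting $\rho^i$, we obtain
$$L(x^i, \rho^i, \mu^i) = \rho^i \big( 1 - \mathbf{1}^\top \mu^i \big) + \sum_{s=1}^S \mu_s^i \Big( g_s^i(x_s^i) + \sum_{j \in \mathcal{N}_i} \big( \lambda^{ij}(t) - \lambda^{ji}(t) \big)_s \Big).$$
The minimization of $L$ with respect to $\rho^i$ constrains the 1-norm of the dual variable $\mu^i$ (i.e., $\mathbf{1}^\top \mu^i = 1$). Then, minimizing the remainder over $x^i \in X^i$ and maximizing the result over $\mu^i$ gives problem (15).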
We point out that in the previous lemma we have shown that the minimization in (3) turns out to be equivalent to performing step (S1). An important consequence of Lemma III.7 is that each iteration of the algorithm can in fact be performed (since a primal-dual optimal solution pair of (3) exists). This is strictly related to the result of Lemma III.5. In fact, the solvability of problem (3) is equivalent to the boundedness, at each $t$, of the subgradients of $\eta$. This is ensured, equivalently, by the compactness of the simplex constraint in (14).
The next corollary is a byproduct of the proof of Lemma III.7.
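Corollary III.9
Let $\rho^i(t+1)$ be the optimal cost of problem (3) for given $\lambda^{ij}(t)$ and $\lambda^{ji}(t)$, $j \in \mathcal{N}_i$. Then
$$\eta^i\big( \{\lambda^{ij}(t), \lambda^{ji}(t)\}_{j \in \mathcal{N}_i} \big) = \rho^i(t+1). \tag{21}$$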
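Proof: To prove the corollary, we first rewrite explicitly the definition of $\eta^i$ given in (11), i.e.,
$$\eta^i\big( \{\lambda^{ij}(t), \lambda^{ji}(t)\}_{j \in \mathcal{N}_i} \big) = \max_{\mathbf{1}^\top \mu^i = 1,\; \mu^i \succeq 0} \Big( \min_{x^i \in X^i} \sum_{s=1}^S \mu_s^i\, g_s^i(x_s^i) + {\mu^i}^\top \sum_{j \in \mathcal{N}_i} \big( \lambda^{ij}(t) - \lambda^{ji}(t) \big) \Big). \tag{22}$$
Then, $\rho^i(t+1)$ being the optimal cost of problem (3), it is also the optimal cost of problem (20), which is equivalent to the right-hand side of equation (19). The proof follows by noting that the expression of $\eta^i$ in (22) is exactly the left-hand side of (19) after rearranging some terms.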
We are now ready to state the main result of the paper, namely the convergence of the DDPM distributed algorithm.
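Theorem III.10
Let $\{(x^i(t), \rho^i(t))\}$, $i \in \{1,\ldots,N\}$, be a sequence generated by the DDPM distributed algorithm, with $\gamma(t)$ satisfying Assumption B.3. Then, the following holds: (i) the sequence $\big\{ \sum_{i=1}^N \rho^i(t) \big\}$ converges to the optimal cost $P^\star$ of (1), and (ii) every limit point of the primal sequence $\{x^i(t)\}$, $i \in \{1,\ldots,N\}$, is an optimal (feasible) solution of (1).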
Proof: We prove the theorem by combining all the results given in the previous lemmas.
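First, for each $i \in \{1,\ldots,N\}$, let $\{\mu^i(t)\}$ and $\{\lambda^{ij}(t)\}$, $j \in \mathcal{N}_i$, be the auxiliary sequences defined in the DDPM distributed algorithm associated to $\{(x^i(t), \rho^i(t))\}$. From Lemma III.7 a primal-dual optimal solution pair of (3) in fact exists (so that the algorithm is well-posed) and solves (15). Recalling that solving (15) is equivalent to solving (14), it follows that $\mu^i(t+1)$ in the DDPM implements step (S1) of the dual subgradient (S1)-(S2). Noting that update (4) of $\lambda^{ij}$ is exactly step (S2), it follows that DDPM is an operative way to implement the dual subgradient algorithm (S1)-(S2). From Lemma III.5 the algorithm converges in objective value, that is
$$\lim_{t \to \infty} \eta(\Lambda(t)) = \eta^\star = P^\star,$$
where the second equality follows from Lemma III.3. Then, we notice that from Corollary III.9
$$\sum_{i=1}^N \rho^i(t+1) = \sum_{i=1}^N \eta^i\big( \{\lambda^{ij}(t), \lambda^{ji}(t)\}_{j \in \mathcal{N}_i} \big) = \eta(\Lambda(t)),$$
so that $\lim_{t \to \infty} \sum_{i=1}^N \rho^i(t) = P^\star$, thus concluding the proof of the first statement.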
To prove the second statement, we show that every limit point of the (primal) sequence $\{x^i(t)\}$, $i \in \{1,\ldots,N\}$, is feasible and optimal for problem (1).
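For analysis purposes, let us introduce the sequence $\{P(t)\}$ defined as
$$P(t) := \max_{s \in \{1,\ldots,S\}} \sum_{i=1}^N g_s^i\big( x_s^i(t) \big) \tag{23}$$
for each $t$. Notice that $P(t)$ is also the cost of problem (2) associated to $x^1(t), \ldots, x^N(t)$ and thus, by definition of optimality, satisfies
$$P^\star \le P(t) \tag{24}$$
for all $t$.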
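By summing over $i \in \{1,\ldots,N\}$ both sides of the inequality constraints in (3), at each $t$ the following holds
$$\sum_{i=1}^N g_s^i\big( x_s^i(t+1) \big) + \sum_{i=1}^N \sum_{j \in \mathcal{N}_i} \big( \lambda^{ij}(t) - \lambda^{ji}(t) \big)_s \le \sum_{i=1}^N \rho^i(t+1), \quad s \in \{1,\ldots,S\}. \tag{25}$$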
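Let us denote by $a_{ij}$ the $(i,j)$-th entry of the adjacency matrix associated to the undirected graph $\mathcal{G}$. Then, we can write
$$\sum_{i=1}^N \sum_{j \in \mathcal{N}_i} \big( \lambda^{ij}(t) - \lambda^{ji}(t) \big) = \sum_{i=1}^N \sum_{j=1}^N a_{ij} \lambda^{ij}(t) - \sum_{i=1}^N \sum_{j=1}^N a_{ij} \lambda^{ji}(t).$$
Since the graph $\mathcal{G}$ is undirected, $a_{ij} = a_{ji}$ for all $(i,j)$ and thus
$$\sum_{i=1}^N \sum_{j=1}^N a_{ij} \lambda^{ij}(t) - \sum_{i=1}^N \sum_{j=1}^N a_{ij} \lambda^{ji}(t) = \sum_{i=1}^N \sum_{j=1}^N a_{ij} \lambda^{ij}(t) - \sum_{j=1}^N \sum_{i=1}^N a_{ji} \lambda^{ji}(t) = 0.$$
Hence, (25) reduces to
$$\sum_{i=1}^N g_s^i\big( x_s^i(t+1) \big) \le \sum_{i=1}^N \rho^i(t+1) \tag{26}$$
for all $s \in \{1,\ldots,S\}$ and $t$.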
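For all $i \in \{1,\ldots,N\}$, since $\{x^i(t)\}$ is a bounded sequence in $X^i$, there exists a convergent subsequence $\{x^i(t_n)\}$. Let $\bar{x}^i$ be its limit point. Since each $g_s^i$ is a (finite) convex function over $\mathbb{R}$, it is also continuous over any compact subset of $\mathbb{R}$ and, taking the limit of (26), we can write
$$\sum_{i=1}^N g_s^i\big( \bar{x}_s^i \big) \le \lim_{n \to \infty} \sum_{i=1}^N \rho^i(t_n) = P^\star \tag{27}$$
for $s \in \{1,\ldots,S\}$, where the last equality follows from the first statement of the theorem. Since the subsequence is arbitrary, we have shown that every limit point $\bar{x}^i$, $i \in \{1,\ldots,N\}$, is feasible.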