A joint decomposition method for global optimization of multiscenario nonconvex mixed-integer nonlinear programs
This paper proposes a joint decomposition method that combines Lagrangian decomposition and generalized Benders decomposition, to efficiently solve multiscenario nonconvex mixed-integer nonlinear programming (MINLP) problems to global optimality, without the need for explicit branch and bound search. In this approach, we view the variables coupling the scenario dependent variables and those causing nonconvexity as complicating variables. We systematically solve the Lagrangian decomposition subproblems and the generalized Benders decomposition subproblems in a unified framework. The method requires the solution of a difficult relaxed master problem, but the problem is only solved when necessary. Enhancements to the method are made to reduce the number of the relaxed master problems to be solved and ease the solution of each relaxed master problem. We consider two scenario-based, two-stage stochastic nonconvex MINLP problems that arise from integrated design and operation of process networks in the case study, and we show that the proposed method can solve the two problems significantly faster than state-of-the-art global optimization solvers.
Keywords:Generalized Benders decomposition Dantzig-Wolfe decomposition Lagrangian decomposition Joint decomposition Mixed-integer nonlinear programming Global optimization Stochastic programming
Global optimization is a field of mathematical programming devoted to obtaining global optimal solutions; and it has over the years found enormous applications in Process Systems Engineering (PSE). Mixed-integer nonlinear programs are global optimization problems where some decision variables are integer while others are continuous. Discrete decisions and nonconvex nonlinearities introduce combinatorial behavior for such problems floudas1995nam () tawarmalani2004goo (). Various applications of mixed-integer nonlinear programming for PSE systems include natural gas network design and operation li2010spp (), gasoline blending and scheduling problems li2011 (), expansion of chemical processes sahinidis1991cpo (), reliable design of software Berman1993 () floudas1999hot (), pump network problem adjiman2000goo () westerlund1994oop (), chemical process design synthesis Duran1986 (), planning of facility investments for electric power generation Bloom1983 (), etc.
As adopted for mixed-integer linear programing (MILP), branch-and-bound has been employed for global optimization of nonconvex mixed-integer nonlinear programs (MINLP) Falksoland1969 () Soland1971 () tawarmalani2004goo (). The method entails systematically generating lower and upper bounds of the optimal objective function value over subdomains of the search space. The lower bounds can be generated via convex relaxations (such as McCommick relaxations mccormick1976cog ()) or Lagrangian relaxation (or called Lagrangian decomposition) Fisher1981 ()cynthia1998 () Guignard1987 (). Ways of generating multipliers for the Lagrangian subproblem exist, including subgradient methods Held1974 (), cutting plane methods Fisher1981 (), and the Dantzig-Wolfe master problem (also known the restricted Lagrangian master problem) van1983cross () OgbeLi2016 ().
Branch-and-bound based strategies can be improved by incorporation of domain reduction techniques. Domain reduction entails eliminating portions of the feasible domain based on feasibility and optimality. Bound tightening or contraction Zamora1999 (), range reduction ryoo1996bat () and generation of cutting planes misener-floudas2014 () are different domain reduction strategies that have been successful in solving nonconvex problems floudas1999hot (). In bound contraction, the variable bounds are shrunk at every iteration by solving bound contraction subproblems Zamora1999 (). In range reduction, the bounds on the variables are shrunk based on simple calculations using Lagrange multiplier information ryoo1996bat (). For cutting planes generation, Lagrangian relaxation information provides cuts that is used to cut-off portion of the feasible domain that does not contain the global optimum Karuppiah2008 (). Current state-of-the-art commercial deterministic global optimization solvers embody branch-and-bound and enhancements such as tighter convex relaxations and domain reduction techniques, such as the Branch-And-Reduce Optimization Navigator (BARON) tawarmalani2004goo () and Algorithms for coNTinuous/Integer Global Optimization of Nonlinear Equations (ANTIGONE) misener-floudas:ANTIGONE:2014 (). They do provide rigorous frameworks for global optimization of Problem (P0).
Branch-and-bound based methods have been successful for global optimization, mostly for small to medium sized problems. However, when the size of the problem becomes large, the branch-and-bound steps needed for convergence can be prohibitively large. A typical example of large-scale nonconvex MINLP is the following multiscenario optimization problem:
where links subparts of the model that are indexed by , and it is called linking variable in the paper. We assume that at least one of the functions , , , or one of the sets and is nonconvex, so Problem (P0) is a nonconvex MINLP, or a nonconvex nonlinear program (NLP) if no integer variables are involved. Clearly, (P0) is a large-scale problem when is large. Problem (P0) has attracted more and more attention over the last 20 years in the field of PSE sahinidis2004ouu (). It usually arises from scenario-based two-stage stochastic programming birge2010its () vanslyke1969 (), for which represents the first stage decisions that are made before the uncertainty is realized and represents second-stage decisions that are made after the uncertainty is revealed in scenario . Functions and represent probability times costs associated with and for every scenario . Problem (P0) can also arise from integrated system design and operation problems which consider system operation over multiple time periods (but without uncertainties), such as for energy polygeneration plants chen2011 () and electrical power distribution networks frank2015710 ()). In this case, represents system design decisions and represents system operational decisions for time period (or scenario) , and and represent frequency of occurrence of time period times investment cost and operational cost, respectively. In this paper, we focus on how to efficiently solve Problem (P0) to global optimality, rather than how to generate scenarios and probabilities for stochastic programming or the time periods and their occurrence frequencies for multiperiod optimization.
It is well-known that Problem (P0) has a decomposable structure that could be exploited for efficient solution. Benders decomposition (BD) benders1962pps () (known as L-shaped method in the stochastic programming literature vanslyke1969 () birge2010its ()) is one class of decomposition methods applied for MILPs. Geoffrion geoffrion1972gbd () generalized BD into Generalized Benders Decomposition (GBD), for solving convex MINLPs. Li et al. developed a further extension, called Nonconvex Generalized Benders Decomposition li2011 (), for solving nonconvex MINLPs, but this method can guarantee global optimality only if the linking variable is fully integer. Karuppiah and Grossmann applied a Lagrangian decomposition-based scheme to solve Problem (P0) Karuppiah2008lbb (); in order to guarantee convergence to a global optimum, explicit branch-and-bound of linking variables are needed. They also presented bound contraction as an optional scheme in their Lagrangian-based branch-and-bound strategy. Shim et al. Shim2013 () proposed a method that combines Lagrangian decomposition and BD together with branch-and-bound (to ensure convergence), in order to solve a class of bilevel programs with an integer program in the upper-level and a complementarity problem in the lower-level. A more recent algorithm combining NGBD and Lagrangian decomposition was proposed by Kannan and Barton Rohit2015 (), and this algorithm also requires explicit branch-and-bound for convergence.
Efforts have been taken to achieve better computational efficiency by combining classical decomposition methods. In 1983, Van Roy proposed a cross decomposition method that combines Lagrangian decomposition and Benders decomposition van1983cross () to solve MILP problems which do not have non-linking integer variables. Since then, a number of extensions and variants of cross decomposition have been developed vanroy1984 () Holmberg1990 () Holmberg1997 () Deep1993 () Mitra2016 () OgbeLi2016 (). All of these methods require that no nonconvexity comes from non-linking variables as otherwise finite convergence cannot be guaranteed.
The performance of branch-and-bound based solution methods depends heavily on the branching and node selection strategies, but what are the best strategies for a particular problem are usually unknown. In addition, branching and node selection strategies are not able to fully exploit the problem structure. Therefore, the goal of this paper is to develop a new decomposition method for global optimization of Problem (P0), which does not require explicit branch-and-bound. The new decomposition method was inspired by cross decomposition, and it follows a similar algorithm design philosophy, combining primarily generalized Benders decomposition and Lagrangian decomposition. However, its decomposition procedure is rather different in many details due to the nonconvexity it has to deal with, so we do not call it cross decomposition, but a new name joint decomposition. To the best of our knowledge, this is the first decomposition method that can solve Problem (P0) to global optimality without explicitly performing branch-and-bound (but the solution of nonconvex subproblems requires branch-and-bound based solvers).
The remaining part of the article is organized as follows. In section 2, we give a brief introduction to generalized Benders decomposition and Lagrangian decomposition, using a reformulation of Problem (P0). Then in section 3, we present the basic joint decomposition algorithm and the convergence proof. Section 4 discusses enhancements to the basic joint decomposition algorithm, including domain reduction and use of extra convex relaxation subproblems. The joint decomposition methods are tested with two case study problems adapted from the literature, and the simulation results demonstrate the effectiveness and the computational advantages of the methods. The article ends with concluding remarks in section 6.
2 Problem reformulation and classical decomposition methods
In order to bring up the joint decomposition idea, we reformulate Problem (P0) and briefly discuss how the reformulated problem can be solved via classical GBD and LD methods. The reformulation starts from separating the convex part and the nonconvex part of the problem, and it ends up in the following form:
where set is convex, set is nonconvex, and set can be either convex or nonconvex. The first group of equations in (P) are nonanticipativity constraints (NACs) Guignard1987 ()Caroe199937 () Karuppiah2008 (), where matrix selects from the duplicated for scenario . The details of transforming (P0) to (P) are provided in Appendix A.
and are the two reasons why Problem (P) is difficult to solve. Linking variables couple different subparts of the model and they cause nonconvexity if set is nonconvex. Variables cause nonconvexity due to the nonconvexity of set . If the values of and are fixed, the problem will be much easier to solve. Therefore, in this paper we call and complicating variables. In order to distinguish the two sets of variables, we also call linking variables, and non-linking complicating variables. We also call non-complicating variables.
The classical GBD method can be used to solve Problem (P) by treating and as complicating variables, while the LD method can be used to solve Problem (P) by dualizing NACs so that no long links different scenarios. In the next two subsections we briefly introduce GBD and LD for Problem (P), and we make the following assumptions for Problem (P) for convenience of discussion.
, and for all are non-empty and compact.
After fixing to any point in , if Problem (P) is feasible, it satisfies Slater condition.
Assumption 1 is a mild assumption, as for most real-world applications, the variables are naturally bounded and the functions involved are continuous. If a discontinuous function is involved, it can usually be expressed with continuous functions and extra integer variables. Assumption 2 ensures strong duality of convex subproblems that is required for GBD. If this assumption is not satisfied for a problem, we can treat the non-complicating variables that fail the Slater condition to be complicating variables, so that after fixing all complicating variables the Slater condition is satisfied.
2.1 Generalized Benders decomposition
At each GBD iteration , fixing the complicating variables , () results in an upper bounding problem that can be decomposed into the following Benders primal subproblem for each scenario :
is the optimal objective value of (BPP). For convenience, we indicate the optimal objective value of a problem in the above way for all subproblems discussed in this paper. Obviously, represents an upper bound for Problem (P). If (BPP) is infeasible for one scenario, then solve the following Benders feasibility subproblem for each scenario :
where , , and are slack variables. Note that (BFP) is always feasible according to Assumption 1. Solution of (BFP) provides a feasibility cut (that is described below), which prevents the generation of the same infeasible and lasdon1970spp ().
At the same iteration, the following Benders relaxed master problem is solved to yield a lower bound for Problem (P):
where includes Lagrange multipliers for the first group of constraints in Problem (BPP) or (BFP), and includes Lagrange multipliers for the second group of constraints in Problem (BPP) or (BFP). Set includes indices of Benders iterations at which only (BPP) is solved, and set includes indices of Benders iterations at which (BFP) is solved. Note that Problem (BRMP)) is used in the multicut BD or GBD, which is different from the one used in the classical single cut BD or GBD. The multicut version of the Benders master problem is known to be tighter than the single cut version birge1988mat () OgbeLi2015b (), so it is considered in this paper.
The finite convergence property of GBD is stated and proved in geoffrion1972gbd (). In Section 3, we will provide more details in the context of our new decomposition method.
2.2 Lagrangian decomposition
We start discussing LD from the Lagrangian dual of Problem (P) that is constructed by dualizing the NACs of the problem:
where is the optimal objective value of the following Lagrangian subproblem with given :
Due to weak duality, Problem (DP) or any Lagrangian subproblem is a lower bounding problem for Problem (P). Typically, the LD method is incorporated in a branch-and-bound framework that only needs to branch on linking variables to guarantee convergence to an -optimal solution. At each branch-and-bound node or LD iteration , a set of multipliers are selected to construct a Lagrangian subproblem for (DP), and this subproblem can be naturally decomposed into subproblems, i.e.,
for all . Let be the optimal objective value of the Lagrangian subproblem, then . Clearly, always holds. If happens to be an optimal solution of (DP), then .
The upper bounds in the LD methods are typically generated by fixing to certain values. At each iteration , an upper bounding problem, or called primal problem, is constructed via fixing (which may be the solution of (LS)), and this problem can be separated into primal subproblem in the following form:
Let be the optimal objective value of the primal problem, then .
For generation of multipliers, we take the idea from Dantzig-Wolfe decomposition, which is essentially a special LD method. Consider the convex hull of nonconvex set :
where denotes a point in that is indexed by . The index set may need to be an infinite set for being the convex hull. Replace with its convex hull for all in (P), then we get the following Dantzig-wolfe master problem, or called primal master problem in this paper:
Clearly, Problem (PMP) is a relaxation of Problem (P), and it is either fully convex or partially convex (as set can still be nonconvex). At LD iteration , the following restriction of (PMP) can be solved:
where index set is finite. may consist of indices of that are generated in the previously solved primal problems and Lagrangian subproblems. Replacing set with set is a restriction operation, so (RPMP) is a restriction of (PMP). Since (PMP) is a relaxation of (P), (RPMP) is neither a relaxation nor a restriction of (P), so it does not yield an upper or a lower bound of (P). The role of (RPMP) in joint decomposition is to generate multipliers for NACs in order to construct a Lagrangian subproblem for iteration . Problem (RPMP) can be solved by a state-of-the-art optimization solver directly or by GBD.
Actually, we can construct a different Lagrangian dual of Problem (P) by dualizing both the NACs and the second group of constraints in the problem, as what we do for GBD in the last subsection. However, this Lagrangian dual is not as tight as Problem (DP) (as stated by the following proposition), so it is not preferred for a LD method. The following proposition follows from Theorem 3.1 of Guignard1987 () and its proof is omitted here.
3 The joint decomposition method
3.1 Synergizing LD and GBD
In the LD method described in the last section, at each iteration the subproblems to be solved are much easier than the original problem (P), as either the size of the subproblem is independent of number of scenarios, such as (PP), (LS), and (LS), or the subproblem is a MILP or convex MINLP that can be solved by existing optimization solvers or by GBD relatively easily, such as (RPMP). However, without branching on the linking variables , LD cannot guarantee finding a global solution, and we do not always know how to exploit the problem structure to efficiently branch on and whether the branching can be efficient enough.
On the other hand, GBD can find a global solution, but it requires solving the nonconvex relaxed master problem (BRMP) at each iteration. The size of (BRMP) may be much smaller than the size of (P) if most variables in (P) are non-complicating variables, but (BRMP) can still be difficult to solve, especially considering that it needs to be solved at each iteration and its size grows with the number of iterations.
Therefore, there may be a way to combine LD and GBD, such that we solve as many as possible LD subproblems and Benders primal subproblems (BPP) (as they are relatively easy to solve), but avoid solving many difficult Benders relaxed master problems (BRMP). This idea is similar to the one that motivates cross decomposition van1983cross (), but it leads to very different subproblems and a very different algorithmic procedure. The subproblems are very different, because for problem (P), we prefer dualizing only NACs in LD in order to achieve the smallest possible dual gap (according to Proposition 1), but we have to dualize both the NACs and the second group of constraints in GBD. In addition, due to the different nature of the subproblems, the order in which the subproblems are solved and how often the problems are solved are different. Therefore, we do not name the proposed method cross decomposition, but call it joint decomposition (JD).
Fig. 1 shows the basic framework of JD. Each JD iteration includes one LD iteration part, as indicated by the solid lines, and possibly one GBD iteration, as indicated by the dashed lines. In a JD iteration, the GBD iteration is performed only when the LD iteration improves over the previous LD iteration substantially. The GBD iteration is same to the one described in the last section, except that the relaxed master problem (BRMP) includes more valid cuts (which will be described later). The LD iteration is slightly different from the one described in the last section. One difference is that, after solving (PP) at LD iteration , a Benders primal problem (BPP) is constructed using (which is used for constructing (PP)) and (which is from the optimal solution of (PP)). The (BPP) is solved to generate a Benders cut that can be added to (BRMP). The other difference is that (RPMP), (LS), (LS) (decomposed from (LS)) slightly differ from the ones described in the last section, and they will be described later.
The JD method requires that all subproblems can be solved using an existing optimization solver within reasonable time. If this requirement is not met, then JD does not work, or we have to further decompose the difficult subproblems into smaller, solvable subproblems.
3.2 Feasibility issues
According to Assumption 1, a subproblem in JD either has a solution or is infeasible. Here we explain how JD handles infeasibility of a subproblem.
Second, if (BPP) or (BPP) is infeasible, then JD will solve the corresponding Benders feasibility problem (BFP) or (BFP) to yield a feasibility cut. If (BFP) or (BFP) is infeasible, then (P) is infeasible and JD can terminate.
Third, if (PP) is infeasible, then JD will solve a feasibility problem that ”softens” the second group of constraints: and this problem can be separated into subproblems as follows:
If (FP) is infeasible for one scenario , then (P) is infeasible and JD can terminate. If (FP) is feasible for all scenarios, then JD can construct and solve a feasible Benders feasibility problem (BFP) to yield a Benders feasibility cut for (BRMP).
Finally, problem (RPMP) can actually be infeasible if none of the in the problem is feasible for the original problem (P). To prevent this infeasibility, we can generate a point that is feasible for (P), by solving the following initial feasibility problem:
Problem (IFP) is not naturally decomposable over the scenarios, but it can be solved by JD. When solving (IFP) using JD, the restricted primal master problem (RPMP) must have a solution (according to Assumption 1).
3.3 The tightened subproblems
The relaxed master problem described in Section 2 can be tightened with the solutions of previously solved subproblems in JD. The tightened problem, called joint decomposition relaxed master problem, can be written as:
Since it is already known that Problem (BRMP) is a valid lower bounding problem and and are valid upper and lower bounds, we only need to prove that the cuts from Lagrangian subproblems together with the Benders optimality cuts do not exclude an optimal solution. Let be the optimal objective value of (P), then
On the one hand, ,
On the other hand,
where . From weak duality, ,
For convenience, we call the cuts from the Lagrangian subproblems, Lagrangian cuts. The Benders cuts and the Lagrangian cuts in (JRMP) imply that, ,
Now we get new constraints
which only include variable and do not link different scenarios. This constraint can be used to enhance any subproblems that involves as variables. Specifically, problems (LS), (LS), (RPMP) can be enhanced as: