Adaptive Augmented Lagrangian Methods:
Algorithms and Practical Numerical Experience—Detailed Version
In this paper, we consider augmented Lagrangian (AL) algorithms for solving large-scale nonlinear optimization problems that execute adaptive strategies for updating the penalty parameter. Our work is motivated by the recently proposed adaptive AL trust region method by Curtis, Jiang, and Robinson [Math. Prog., DOI: 10.1007/s10107-014-0784-y, 2013]. The first focal point of this paper is a new variant of the approach that employs a line search rather than a trust region strategy, where a critical algorithmic feature for the line search strategy is the use of convexified piecewise quadratic models of the AL function for computing the search directions. We prove global convergence guarantees for our line search algorithm that are on par with those for the previously proposed trust region method. A second focal point of this paper is the practical performance of the line search and trust region algorithm variants in \Matlab software, as well as that of an adaptive penalty parameter updating strategy incorporated into the \lancelot software. We test these methods on problems from the \CUTEst and \COPS collections, as well as on challenging test problems related to optimal power flow. Our numerical experience suggests that the adaptive algorithms outperform traditional AL methods in terms of efficiency and reliability. As with traditional AL algorithms, the adaptive methods are matrix-free and thus represent a viable option for solving extreme-scale problems.
onlinear optimization, nonconvex optimization, large-scale optimization, augmented Lagrangians, matrix-free methods, steering methods
49M05, 49M15, 49M29, 49M37, 65K05, 65K10, 90C06, 90C30, 93B40
Augmented Lagrangian (AL) methods [25, 33] have recently regained popularity due to growing interests in solving extreme-scale nonlinear optimization problems. These methods are attractive in such settings as they can be implemented matrix-free [2, 3, 11, 28] and have global and local convergence guarantees under relatively weak assumptions [18, 26]. Furthermore, certain variants of AL methods [20, 21] have proved to be very efficient for solving certain structured problems [6, 34, 36].
A new AL trust region method was recently proposed and analyzed in . The novel feature of that algorithm is an adaptive strategy for updating the penalty parameter inspired by techniques for performing such updates in the context of exact penalty methods [7, 8, 29]. This feature is designed to overcome a potentially serious drawback of traditional AL methods, which is that they may be ineffective during some (early) iterations due to poor choices of the penalty parameter and/or Lagrange multiplier estimates. In such situations, the poor choices of these quantities may lead to little or no improvement in the primal space and, in fact, the iterates may diverge from even a well-chosen initial iterate. The key idea for avoiding this behavior in the algorithm proposed in  is to adaptively update the penalty parameter during the step computation in order to ensure that the trial step yields a sufficiently large reduction in linearized constraint violation, thus steering the optimization process steadily toward constraint satisfaction.
The contributions of this paper are two-fold. First, we present an AL line search method based on the same framework employed for the trust region method in . The main difference between our new approach and that in , besides the differences inherent in using line searches instead of a trust region strategy, is that we utilize a convexified piecewise quadratic model of the AL function to compute the search direction in each iteration. With this modification, we prove that our line search method achieves global convergence guarantees on par with those proved for the trust region method in . The second contribution of this paper is that we perform extensive numerical experiments with a \Matlab implementation of the adaptive algorithms (i.e., both line search and trust region variants) and an implementation of an adaptive penalty parameter updating strategy in the \lancelot software . We test these implementations on problems from the \CUTEst  and \COPS  collections, as well as on test problems related to optimal power flow . Our results indicate that our adaptive algorithms outperform traditional AL methods in terms of efficiency and reliability.
The remainder of the paper is organized as follows. In §2, we present our adaptive AL line search method and state convergence results. Details and proofs of these results, which draw from those in , can be found in Appendices A and B. We then provide numerical results in §3 to illustrate the effectiveness of our implementations of our adaptive AL algorithms. We give conclusions in §4.
Notation. We often drop function arguments once a function is defined. We also use a subscript on a function name to denote its value corresponding to algorithmic quantities using the same subscript. For example, for a function , if is the value for the variable during iteration of an algorithm, then . We also often use subscripts for constants to indicate the algorithmic quantity to which they correspond. For example, denotes a parameter corresponding to the algorithmic quantity .
2 An Adaptive Augmented Lagrangian Line Search Algorithm
We assume that all problems under our consideration are formulated as
Here, we assume that the objective function and constraint function are twice continuously differentiable, and that the variable lower bound vector and upper bound vector satisfy . Our goal is to design an algorithm that will compute a first-order primal-dual stationary point for problem (1). However, in order for the algorithm to be suitable as a general-purpose approach, it should have mechanisms for terminating and providing useful information when an instance of (1) is (locally) infeasible. In such cases, we have designed our algorithm so that it transitions to finding an infeasible first-order stationary point for the nonlinear feasibility problem
where is defined as .
As implied by the previous paragraph, our algorithm requires first-order stationarity conditions for problems (1) and (2), which can be stated in the following manner. First, introducing a Lagrange multiplier vector , we define the Lagrangian for problem (1), call it , by
Then, defining the gradient of the objective function by , the Jacobian of the constraint functions by , and the projection operator onto the bounds , component-wise for , by
we may introduce the primal-dual stationarity measure given by
First-order primal-dual stationary points for (1) can then be characterized as zeros of the primal-dual stationarity measure defined by stacking the stationarity measure and the constraint function , i.e., a first-order primal-dual stationary point for (1) is any pair with satisfying
Similarly, a first-order primal stationary point for (2) is any with satisfying
where is defined by
Over the past decades, a variety of effective numerical methods have been proposed for solving large-scale bound-constrained optimization problems. Hence, the critical issue in solving problem (1) is how to handle the presence of the equality constraints. As in the wide variety of penalty methods that have been proposed, the strategy adopted by AL methods is to remove these constraints, but push the algorithm to satisfy them through the addition of influential terms in the objective. In this manner, problem (1) (or at least (2)) can be solved via a sequence of bound-constrained subproblems—thus allowing AL methods to exploit the methods that are available for subproblems of this type. Specifically, AL methods consider a sequence of subproblems in which the objective is a weighted sum of the Lagrangian and the constraint violation measure . By scaling by a penalty parameter , each subproblem involves the minimization of a function , called the augmented Lagrangian (AL), defined by
Observe that the gradient of the AL with respect to , evaluated at , is given by
where we define the function by
Hence, each subproblem to be solved in an AL method has the form
Given a pair , a first-order stationary point for problem (6) is any zero of the primal-dual stationarity measure , defined similarly to but with the Lagrangian replaced by the augmented Lagrangian; i.e., given , a first-order stationary point for (6) is any satisfying
Given a pair with , a traditional AL method proceeds by (approximately) solving (6), which is to say that it finds a point, call it , that (approximately) satisfies (7). If the resulting pair is not a first-order primal-dual stationary point for (1), then the method would modify the Lagrange multiplier or penalty parameter so that, hopefully, the solution of the subsequent subproblem (of the form (6)) yields a better primal-dual solution estimate for (1). The function plays a critical role in this procedure. In particular, observe that if , then and (7) would imply , i.e., that is a first-order primal-dual stationary point for (1). Hence, if the constraint violation at is sufficiently small, then a traditional AL method would set the new value of as . Otherwise, if the constraint violation is not sufficiently small, then the penalty parameter is decreased to place a higher priority on reducing it during subsequent iterations.
2.2 Algorithm Description
Our AL line search algorithm is similar to the AL trust region method proposed in , except for two key differences: it executes line searches rather than using a trust region framework, and it employs a convexified piecewise quadratic model of the AL function for computing the search direction in each iteration. The main motivation for utilizing a convexified model is to ensure that each computed search direction is a direction of strict descent for the AL function from the current iterate, which is necessary to ensure the well-posedness of the line search. However, it should be noted that, practically speaking, the convexification of the model does not necessarily add any computational difficulties when computing each direction; see §3.1.1. Similar to the trust region method proposed in , a critical component of our algorithm is the adaptive strategy for updating the penalty parameter during the search direction computation. This is used to ensure steady progress—i.e., steer the algorithm—toward solving (1) (or at least (2)) by monitoring predicted improvements in linearized feasibility.
The central component of each iteration of our algorithm is the search direction computation. In our approach, this computation is performed based on local models of the constraint violation measure and the AL function at the current iterate, which at iteration is given by . The local models that we employ for these functions are, respectively, and , defined as follows:
We note that is a typical Gauss-Newton model of the constraint violation measure , and is a convexification of a second-order approximation of the augmented Lagrangian. (We use the notation rather than simply to distinguish between the model above and the second-order model—without the max—that appears extensively in .)
Our algorithm computes two types of steps during each iteration. The purpose of the first step, which we refer to as the steering step, is to gauge the progress towards linearized feasibility that may be achieved (locally) from the current iterate. This is done by (approximately) minimizing our model of the constraint violation measure within the bound constraints and a trust region. Then, a step of the second type is computed by (approximately) minimizing our model of the AL function within the bound constraints and a trust region. If the reduction in the model yielded by the latter step is sufficiently large—say, compared to that yielded by the steering step—then the algorithm proceeds using this step as the search direction. Otherwise, the penalty parameter may be reduced, in which case a step of the latter type is recomputed. This process repeats iteratively until a search direction is computed that yields a sufficiently large (or at least not too negative) reduction in . As such, the iterate sequence is intended to make steady progress toward (or at least approximately maintain) constraint satisfaction throughout the optimization process, regardless of the initial penalty parameter value.
We now describe this process in more detail. During iteration , the steering step is computed via the optimization subproblem given by
where, for some constant , the trust region radius is defined to be
A consequence of this choice of trust region radius is that it forces the steering step to be smaller in norm as the iterates of the algorithm approach any stationary point of the constraint violation measure . This prevents the steering step from being too large relative to the progress that can be made toward minimizing . While (8) is a convex optimization problem for which there are efficient methods, in order to reduce computational expense our algorithm only requires to be an approximate solution of (8). In particular, we merely require that yields a reduction in that is proportional to that yielded by the associated Cauchy step (see (16a) later on), which is defined to be
for such that, for some , the step satisfies
Appropriate values for and —along with auxiliary nonnegative scalar quantities and to be used in subsequent calculations in our method—are computed by Algorithm 1. The quantity representing the predicted reduction in constraint violation yielded by is guaranteed to be positive at any that is not a first-order stationary point for subject to the bound constraints; see part (i) of Lemma A.4. We define a similar reduction for the steering step .
After computing a steering step , we proceed to compute a trial step via
where, given from the output of Algorithm 1, we define the trust region radius
As for the steering step, we allow inexactness in the solution of (12) by only requiring the step to satisfy a Cauchy decrease condition (see (16b) later on), where the Cauchy step for problem (12) is
for such that, for returned from Algorithm 1, yields
Algorithm 2 describes our procedure for computing and . (The importance of incorporating in (13) and in (15) is revealed in the proofs of Lemmas A.2 and A.3.) The quantity representing the predicted reduction in yielded by is guaranteed to be positive at any that is not a first-order stationary point for subject to the bound constraints; see part (ii) of Lemma A.4. A similar quantity is also used for the search direction .
Our complete algorithm is given as Algorithm 3 on page 3. In particular, the th iteration proceeds as follows. Given the th iterate tuple , the algorithm first determines whether the first-order primal-dual stationarity conditions for (1) or the first-order stationarity condition for (2) are satisfied. If either is the case, then the algorithm terminates, but otherwise the method enters the while loop in line 11 to check for stationarity with respect to the AL function. This loop is guaranteed to terminate finitely; see Lemma A.1. Next, after computing appropriate trust region radii and Cauchy steps, the method enters a block for computing the steering step and trial step . Through the while loop on line 19, the overall goal of this block is to compute (approximate) solutions of subproblems (8) and (12) satisfying
In these conditions, the method employs user-provided constants and the algorithmic quantity representing the th constraint violation target. It should be noted that, for sufficiently small , many approximate solutions to (8) and (12) satisfy (16), but for our purposes (see Theorem 2.2) it is sufficient that, for sufficiently small , they are at least satisfied by and . A complete description of the motivations underlying conditions (16) can be found in [15, Section 3]. In short, (16a) and (16b) are Cauchy decrease conditions while (16c) ensures that the trial step predicts progress toward constraint satisfaction, or at least predicts that any increase in constraint violation is limited (when the right-hand side is negative).
With the search direction in hand, the method proceeds to perform a backtracking line search along the strict descent direction for at . Specifically, for a given , the method computes the smallest integer such that
and then sets and . The remainder of the iteration is then composed of potential modifications of the Lagrange multiplier vector and target values for the accuracies in minimizing the constraint violation measure and AL function subject to the bound constraints. First, the method checks whether the constraint violation at the next primal iterate is sufficiently small compared to the target . If this requirement is met, then a multiplier vector that satisfies
is computed. Two obvious potential choices for are and , but another viable candidate would be an approximate least-squares multiplier estimate (which may be computed via a linearly constrained optimization subproblem). The method then checks if either or is sufficiently small with respect to the target value . If so, then new target values and are set, is chosen, and a new Lagrange multiplier vector is set as
where is the largest value in such that
This updating procedure is well-defined since the choice results in , for which (20) is satisfied since . If either line 26 or line 28 in Algorithm 3 tests false, then the method simply sets . We note that unlike more traditional augmented Lagrangian approaches [2, 11], the penalty parameter is not adjusted on the basis of a test like that on line 26, but instead relies on our steering procedure. Moreover, in our approach we decrease the target values at a linear rate for simplicity, but more sophisticated approaches may be used .
2.3 Well-posedness and global convergence
In this section, we state two vital results, namely that Algorithm 3 is well posed, and that limit points of the iterate sequence have desirable properties. Proofs of these results, which are similar to those in , are given in Appendices A and B. In order to show well-posedness of the algorithm, we make the following formal assumption.
At each given , the objective function and constraint function are both twice-continuously differentiable.
Under this assumption, we have the following theorem.
According to Theorem 2.2, we have that Algorithm 3 will either terminate finitely or produce an infinite sequence of iterates. If it terminates finitely—which can only occur if line 6 or 9 is executed—then the algorithm has computed a first-order stationary solution or an infeasible stationary point and there is nothing else to prove about the algorithm’s performance in such cases. Therefore, it remains to focus on the global convergence properties of Algorithm 3 under the assumption that the sequence is infinite. For such cases, we make the following additional assumption.
The primal sequences and are contained in a convex compact set over which the objective function and constraint function are both twice-continuously differentiable.
Our main global convergence result for Algorithm 3 is as follows.
every limit point of is an infeasible stationary point;
and there exists an infinite ordered set such that every limit point of is first-order stationary for (1); or
, every limit point of is feasible, and if there exists a positive integer such that for all sufficiently large , then there exists an infinite ordered set such that any limit point of either or is first-order stationary for (1).
Observe that the conclusions are exactly the same as in [15, Theorem 3.14]. We also call the readers attention to the comments following [15, Theorem 3.14], which discuss the consequences of these results. In particular, these comments suggest how Algorithm 3 may be modified to guarantee convergence to first-order stationary points, even in case (iii) of the theorem. However, as mentioned in , we do not consider these modifications to the algorithm to have practical benefits. This perspective is supported by the numerical tests presented in the following section.
3 Numerical Experiments
In this section, we provide evidence that steering can have a positive effect on the performance of AL algorithms. To best illustrate the influence of steering, we implemented and tested algorithms in two pieces of software. First, in \Matlab, we implemented our adaptive AL line search algorithm, i.e., Algorithm 3, and the adaptive AL trust region method given as [15, Algorithm 4]. Since these methods were implemented from scratch, we had control over every aspect of the code, which allowed us to implement all features described in this paper and in . Second, we implemented a simple modification of the AL trust region algorithm in the \lancelot software package . Our only modification to \lancelot was to incorporate a basic form of steering; i.e., we did not change other aspects of \lancelot, such as the mechanisms for triggering a multiplier update. In this manner, we were also able to isolate the effect that steering had on numerical performance, though it should be noted that there were differences between Algorithm 3 and our implemented algorithm in \lancelot in terms of, e.g., the multiplier updates.
While we provide an extensive amount of information about the results of our experiments in this section, further information can be found in Appendix C.
3.1 \Matlab implementation
3.1.1 Implementation details
Our \Matlab software was comprised of six algorithm variants. The algorithms were implemented as part of the same package so that most of the algorithmic components were exactly the same; the primary differences related to the step acceptance mechanisms and the manner in which the Lagrange multiplier estimates and penalty parameter were updated. First, for comparison against algorithms that utilized our steering mechanism, we implemented line search and trust region variants of a basic augmented Lagrangian method, given as [15, Algorithm 1]. We refer to these algorithms as \balls (basic augmented Lagrangian, line search) and \baltr (trust region), respectively. These algorithms clearly differed in that one used a line search and the other used a trust region strategy for step acceptance, but the other difference was that, like Algorithm 3 in this paper, \balls employed a convexified model of the AL function. (We discuss more details about the use of this convexified model below.) The other algorithms implemented in our software included two variants of Algorithm 3 and two variants of [15, Algorithm 4]. The first variants of each, which we refer to as \aalls and \aaltr (adaptive, as opposed to basic), were straightforward implementations of these algorithms, whereas the latter variants, which we refer to as \aallssafe and \aaltrsafe, included an implementation of a safeguarding procedure for the steering mechanism. The safeguarding procedure will be described in detail shortly.
The main per-iteration computational expense for each algorithm variant can be attributed to the search direction computations. For computing a search direction via an approximate solve of (12) or [15, Prob. (3.8)], all algorithms essentially used the same procedure. For simplicity, all algorithms considered variants of these subproblems in which the -norm trust region was replaced by an -norm trust region so that the subproblems were bound-constrained. (The same modification was used in the Cauchy step calculations.) Then, starting with the Cauchy step as the initial solution estimate and defining the initial working set by the bounds identified as active by the Cauchy step, a projected conjugate gradient (PCG) method was used to compute an improved solution. During the PCG routine, if a new trial solution violated a bound constraint that was not already part of the working set, then this bound was added to the working set and the PCG routine was reinitialized. By contrast, if the reduced subproblem (corresponding to the current working set) was solved sufficiently accurately, then a check for termination was performed. In particular, multiplier estimates were computed for the working set elements. If these multiplier estimates were all nonnegative (or at least larger than a small negative number), then the subproblem was deemed to be solved and the routine terminated. Otherwise, an element corresponding to the most negative multiplier estimate was removed from the working set and the PCG routine was reinitialized. We do not claim that the precise manner in which we implemented this approach guaranteed convergence to an exact solution of the subproblem. However, the approach just described was based on well-established methods for solving bound-constrained quadratic optimization problems (QPs), and we found that it worked very well in our experiments. It should be noted that if, at any time, negative curvature was encountered in the PCG routine, then the solver terminated with the current PCG iterate. In this manner, the solutions were generally less accurate when negative curvature was encountered, but we claim that this did not have too adverse an effect on the performance of any of the algorithms.
A few additional comments are necessary to describe our search direction computation procedures. First, it should be noted that for the line search algorithms, the Cauchy step calculation in Algorithm 2 was performed with (15) as stated (i.e., with ), but the above PCG routine to compute the search direction was applied to (12) without the convexification for the quadratic term. However, we claim that this choice remains consistent with the stated algorithms since, for all algorithm variants, we performed a sanity check after the computation of the search direction. In particular, the reduction in the model of the AL function yielded by the search direction was compared against that yielded by the corresponding Cauchy step. If the Cauchy step actually provided a better reduction in the model, then the computed search direction was replaced by the Cauchy step. In this sanity check for the line search algorithms, we computed the model reductions with the convexification of the quadratic term (i.e., with ), which implies that, overall, our implemented algorithm guaranteed Cauchy decrease in the appropriate model for all algorithms. Second, we remark that for the algorithms that employed a steering mechanism, we did not employ the same procedure to approximately solve (8) or [15, Prob. (3.4)]. Instead, we simply used the Cauchy steps as approximate solutions of these subproblems. Finally, we note that in the steering mechanism, we checked condition (16c) with the Cauchy steps for each subproblem, despite the fact that the search direction was computed as a more accurate solution of (12) or [15, Prob. (3.8)]. This had the effect that the algorithms were able to modify the penalty parameter via the steering mechanism prior to computing the search direction; only Cauchy steps for the subproblems were needed for steering.
Most of the other algorithmic components were implemented similarly to the algorithm in . As an example, for the computation of the estimates (which are required to satisfy (18)), we checked whether ; if so, then we set , and otherwise we set . Furthermore, for prescribed tolerances , we terminated an algorithm with a declaration that a stationary point was found if
and terminated with a declaration that an infeasible stationary point was found if
As in , this latter set of conditions shows that we did not declare that an infeasible stationary point was found unless the penalty parameter had already been reduced below a prescribed tolerance. This helps in avoiding premature termination when the algorithm could otherwise continue and potentially find a point satisfying (21), which was always the preferred outcome. Each algorithm terminated with a message of failure if neither (21) nor (22) was satisfied within iterations. It should also be noted that the problems were pre-scaled so that the -norms of the gradients of the problem functions at the initial point would be less than or equal to a prescribed constant . The values for all of these parameters, as well as other input parameter required in the code, are summarized in Table 1. (Values for parameters related to updating the trust region radii required by [15, Algorithm 4] were set as in .)
We close this subsection with a discussion of some additional differences between the algorithms as stated in this paper and in  and those implemented in our software. We claim that none of these differences represents a significant departure from the stated algorithms; we merely made some adjustments to simplify the implementation and to incorporate features that we found to work well in our experiments. First, while all algorithms use the input parameter given in Table 1 for decreasing the penalty parameter, we decrease the penalty parameter less significantly in the steering mechanism. In particular, in line 20 of Algorithm 3 and line 20 of [15, Algorithm 4], we replace with . Second, in the line search algorithms, rather than set the trust region radii as in (9) and (13) where appears as a constant value, we defined a dynamic sequence, call it , that depended on the step-size sequence . In this manner, replaced in (9) and (13) for all . We initialized . Then, for all , if , then we set , and if , then we set . Third, to simplify our implementation, we effectively ignored the imposed bounds on the multiplier estimates by setting and . This choice implies that we always chose in (19). Fourth, we initialized the target values as
Finally, in \aallssafe and \aaltrsafe, we safeguard the steering procedure by shutting it off whenever the penalty parameter was smaller than a prescribed tolerance. Specifically, we considered the while condition in line 19 of Algorithm 3 and line 19 of [15, Algorithm 4] to be satisfied whenever .
3.1.2 Results on \CUTEst test problems
We tested our \Matlab algorithms on the subset of problems from the \CUTEst  collection that have at least one general constraint and at most variables and constraints. This set contains test problems. However, the results that we present in this section are only for those problems for which at least one of our six solvers obtained a successful result, i.e., where (21) or (22) was satisfied. This led to a set of problems that are represented in the numerical results in this section.
To illustrate the performance of our \Matlab software, we use performance profiles as introduced by Dolan and Moré  to provide a visual comparison of different measures of performance. Consider a performance profile that measures performance in terms of required iterations until termination. For such a profile, if the graph associated with an algorithm passes through the point , then, on of the problems, the number of iterations required by the algorithm was less than times the number of iterations required by the algorithm that required the fewest number of iterations. At the extremes of the graph, an algorithm with a higher value on the vertical axis may be considered a more efficient algorithm, whereas an algorithm on top at the far right of the graph may be considered more reliable. Since, for most problems, comparing values in the performance profiles for large values of is not enlightening, we truncated the horizontal axis at 16 and simply remark on the numbers of failures for each algorithm.
Figures 2 and 2 show the results for the three line search variants, namely \balls, \aalls, and \aallssafe. The numbers of failures for these algorithms were 25, 3, and 16, respectively. The same conclusion may be drawn from both profiles: the steering variants (with and without safeguarding) were both more efficient and more reliable than the basic algorithm, where efficiency is measured by either the number of iterations (Figure 2) or the number of function evaluations (Figure 2) required. We display the profile for the number of function evaluations required since, for a line search algorithm, this value is always at least as large as the number of iterations, and will be strictly greater whenever backtracking is required to satisfy (17) (yielding ). From these profiles, one may observe that unrestricted steering (in \aalls) yielded superior performance to restricted steering (in \aallssafe) in terms of both efficiency and reliability; this suggests that safeguarding the steering mechanism may diminish its potential benefits.
Figures 4 and 4 show the results for the three trust region variants, namely \baltr, \aaltr, and \aaltrsafe, the numbers of failures for which were 30, 12, and 20, respectively. Again, as for the line search algorithms, the same conclusion may be drawn from both profiles: the steering variants (with and without safeguarding) are both more efficient and more reliable than the basic algorithm, where now we measure efficiency by either the number of iterations (Figure 4) or the number of gradient evaluations (Figure 4) required before termination. We observe the number of gradient evaluations here (as opposed to the number of function evaluations) since, for a trust region algorithm, this value is never larger than the number of iterations, and will be strictly smaller whenever a step is rejected and the trust-region radius is decreased because of insufficient decrease in the AL function. These profiles also support the other observation that was made by the results for our line search algorithms, i.e., that unrestricted steering may be superior to restricted steering in terms of efficiency and reliability.
The performance profiles in Figures 2–4 suggest that steering has practical benefits, and that safeguarding the procedure may limit its potential benefits. However, to be more confident in these claims, one should observe the final penalty parameter values typically produced by the algorithms. These observations are important since one may be concerned whether the algorithms that employ steering yield final penalty parameter values that are often significantly smaller than those yielded by basic AL algorithms. To investigate this possibility in our experiments, we collected the final penalty parameter values produced by all six algorithms; the results are in Table 2. The column titled gives a range for the final value of the penalty parameter. (For example, the value 27 in the column indicates that the final penalty parameter value computed by our basic line search AL algorithm fell in the range for 27 of the problems.)
We remark on two observations about the data in Table 2. First, as may be expected, the algorithms that employ steering typically reduce the penalty parameter below its initial value on some problems on which the other algorithms do not reduce it at all. This, in itself, is not a major concern, since a reasonable reduction in the penalty parameter may cause an algorithm to locate a stationary point more quickly. Second, we remark that the number of problems for which the final penalty parameter was very small (say, less than ) was similar for all algorithms, even those that employed steering. This suggests that while steering was able to aid in guiding the algorithms toward constraint satisfaction, the algorithms did not reduce the value to such a small value that feasibility became the only priority. Overall, our conclusion from Table 2 is that steering typically decreases the penalty parameter more than does a traditonal updating scheme, but one should not expect that the final penalty parameter value will be reduced unnecessarily small due to steering; rather, steering can have the intended benefit of improving efficiency and reliability by guiding a method toward constraint satisfaction more quickly.
3.1.3 Results on \Cops test problems
We also tested our \Matlab software on the large-scale constrained problems available in the \COPS  collection. This test set was designed to provide difficult test cases for nonlinear optimization software; the problems include examples from fluid dynamics, population dynamics, optimal design, mesh smoothing, and optimal control. For our purposes, we solved the smallest versions of the \AMPL models [1, 19] provided in the collection. All of our solvers failed to solve the problems named chain, dirichlet, henon, lane_emden, and robot1, so these problems were excluded. The remaining set consisted of the following 17 problems: bearing, camshape, catmix, channel, elec, gasoil, glider, marine, methanol, minsurf, pinene, polygon, rocket, steering, tetra, torsion, and triangle. Since the size of this test set is relatively small, we have decided to display pair-wise comparisons of algorithms in the manner suggested in . That is, for a performance measure of interest (e.g., number of iterations required until termination), we compare solvers, call them and , on problem with the logarithmic outperforming factor
Therefore, if the measure of interest is iterations required, then would indicate that solver required the iterations required by solver . For all plots, we focus our attention on the range .
The results of our experiments are given in Figures 6–8. For the same reasons as discussed in §3.1.2, we display results for iterations and function evaluations for the line search algorithms, and display results for iterations and gradient evaluations for the trust region algorithms. In addition, here we ignore the results for \aallssafe and \aaltrsafe since, as in the results in §3.1.2, we did not see benefits in safeguarding the steering mechanism. In each figure, a positive (negative) bar indicates that the algorithm whose name appears above (below) the horizontal axis yielded a better value for the measure on a particular problem. The results are displayed according to the order of the problems listed in the previous paragraph. In Figures 6 and 6 for the line search algorithms, the red bars for problems catmix and polygon indicate that \aalls failed on the former and \balls failed on the latter; similarly, in Figures 8 and 8 for the trust region algorithms, the red bar for catmix indicates that \aaltr failed on it.
The results in Figures 6 and 6 indicate that \aalls more often outperforms \balls in terms of iterations and functions evaluations, though the advantage is not overwhelming. On the other hand, it is clear from Figures 8 and 8 that, despite the one failure, \aaltr is generally superior to \baltr. We conclude from these results that steering was beneficial on this test set, especially in terms of the trust region methods.
3.1.4 Results on optimal power flow (OPF) test problems
As a third and final set of experiments for our \Matlab software, we tested our algorithms on a collection of optimal power flow (OPF) problems modeled in \AMPL using data sets obtained from MATPOWER . OPF problems represent a challenging set of nonconvex problems. The active and reactive power flow and the network balance equations give rise to equality constraints involving nonconvex functions while the inequality constraints are linear and result from placing operating limits on quantities such as flows, voltages, and various control variables. The control variables include the voltages at generator buses and the active-power output of the generating units. The state variables consist of the voltage magnitudes and angles at each node as well as reactive and active flows in each link. Our test set was comprised of problems modeled on systems having 14 to 662 nodes from the IEEE test set. In particular, there are seven IEEE systems, each modeled in four different ways: (i) in Cartesian coordinates; (ii) in polar coordinates; (iii) with basic approximations to the sin and cos functions in the problem functions; and (iv) with linearized constraints based on DC power flow equations (in place of AC power flow). It should be noted that while linearizing the constraints in formulation (iv) led to a set of linear optimization problems, we still find it interesting to investigate the possible effect that steering may have in this context. All of the test problems were solved by all of our algorithm variants.
We provide outperforming factors in the same manner as in §3.1.3. Figures 10 and 10 reveal that \aalls typically outperforms \balls in terms of both iterations and function evaluations, and Figures 12 and 12 reveal that \aaltr more often than not outperforms \baltr in terms of iterations and gradient evaluations. Interestingly, these results suggest more benefits for steering in the line search algorithm than in the trust region algorithm, which is the opposite of that suggested by the results in §3.1.3. However, in any case, we believe that we have presented convincing numerical evidence that steering often has an overall beneficial effect on the performance of our \Matlab solvers.
3.2 An implementation of \lancelot that uses steering
3.2.1 Implementation details
The results for our \Matlab software in the previous section illustrate that our adaptive line search AL algorithm and the adaptive trust region AL algorithm from  are often more efficient and reliable than basic AL algorithms that employ traditional penalty parameter and Lagrange multiplier updates. Recall, however, that our adaptive methods are different from their basic counterparts in two key ways. First, the steering conditions (16) are used to dynamically decrease the penalty parameter during the optimization process for the AL function. Second, our mechanisms for updating the Lagrange multiplier estimate are different than the basic algorithm outlined in [15, Algorithm 1] since they use optimality measures for both the Lagrangian and the AL functions (see line 28 of Algorithm 3) rather than only that for the AL function. We believe this strategy is more adaptive since it allows for updates to the Lagrange multipliers when the primal estimate is still far from a first-order stationary point for the AL function subject to the bounds.
In this section, we isolate the effect of the first of these differences by incorporating a steering strategy in the \lancelot [12, 13] package that is available in the \galahad library . Specifically, we made three principle enhancements in \lancelot. First, along the lines of the model in  and the convexified model defined in this paper, we defined the model of the AL function given by
as an alternative to the Newton model , originally used in \lancelot,
As in our adaptive algorithms, the purpose of employing such a model was to ensure that (pointwise) as , which was required to ensure that our steering procedure was well-defined; see (26a). Second, we added routines to compute generalized Cauchy points  for both the constraint violation measure model and during the loop in which was decreased until the steering test (16c) was satisfied; recall the while loop starting on line 19 of Algorithm 3. Third, we used the value for determined in the steering procedure to compute a generalized Cauchy point for the Newton model , which was the model employed to compute the search direction. For each of the models just discussed, the generalized Cauchy point was computed using either an efficient sequential search along the piece-wise Cauchy arc  or via a backtracking Armijo search along the same arc . We remark that this third enhancement would not have been needed if the model were used to compute the search directions. However, in our experiments, it was revealed that using the Newton model typically led to better performance, so the results in this section were obtained using this third enhancement. In our implementation, the user was allowed to control which model was used via control parameters. We also added control parameters that allowed the user to restrict the number of times that the penalty parameter may be reduced in the steering procedure in a given iteration, and that disabled steering once the penalty parameter was reduced below a given tolerance (as in the safeguarding procedure implemented in our \Matlab software).
The new package was tested with three different control parameter settings. We refer to algorithm with the first setting, which did not allow any steering to occur, simply as \Lancelot. The second setting allowed steering to be used initially, but turned it off whenever (as in our safeguarded \Matlab algorithms). We refer to this variant as \LancelotSS. The third setting allowed for steering to be used without any safeguards or restrictions; we refer to this variant as \LancelotS. As in our \Matlab software, the penalty parameter was decreased by a factor of until the steering test (16c) was satisfied. All other control parameters were set to their default \Lancelot values. The new package will be re-branded as \lancelot in the next official release, \galahad 2.6.
was compiled with gfortran-4.7 with optimization -O and using Intel MKL BLAS. The code was executed on a single core of an Intel Xeon E5620 (2.4GHz) CPU with 23.5 GiB of RAM.
3.2.2 Results on \CUTEst test problems
We tested \Lancelot, \LancelotS, and \LancelotSS on the subset of \CUTEst problems that have at least one general constraint and at most variables and constraints. This amounted to test problems. The results are displayed as performance profiles in Figures 14 and 14, which were created from the 364 of these problems that were solved by at least one of the algorithms. As in the previous sections, since the algorithms are trust region methods, we use the number of iterations and gradient evaluations required as the performance measures of interest.
We can make two important observations from these profiles. First, it is clear that \LancelotS and \LancelotSS yielded similar performance in terms of iterations and gradient evaluations, which suggests that safeguarding the steering mechanism is not necessary in practice. Second, \LancelotS and \LancelotSS were both more efficient and reliable than \Lancelot on these tests, thus showing the positive influence that steering can have on performance.
As in §3.1.2, it is important to observe the final penalty parameter values yielded by \LancelotS and \LancelotSS as opposed to those yielded by \Lancelot. For these experiments, we collected this information; see Table 3.
We make a few remarks about the results in Table 3. First, as may have been expected, the \LancelotS and \LancelotSS algorithms typically reduced the penalty parameter below its initial value, even when \Lancelot did not reduce it at all throughout an entire run. Second, the number of problems for which the final penalty parameter was less than was for \Lancelot and for \LancelotS. Combining this fact with the previous observation leads us to conclude that steering tended to reduce the penalty parameter from its initial value of , but, overall, it did not decrease it much more aggressively than \Lancelot. Third, it is interesting to compare the final penalty parameter values for \LancelotS and \LancelotSS. Of course, these values were equal in any run in which the final penalty parameter was greater than or equal to , since this was the threshold value below which safeguarding was activated. Interestingly, however, \LancelotSS actually produced smaller values of the penalty parameter compared to \LancelotS when the final penalty parameter was smaller than . We initially found this observation to be somewhat counterintuitive, but we believe that it can be explained by observing the penalty parameter updating strategy used by \Lancelot. (Recall that once safeguarding was activated in \LancelotSS, the updating strategy became the same used in \Lancelot.) In particular, the decrease factor for the penalty parameter used in \Lancelot is , whereas the decrease factor used in steering the penalty parameter was . Thus, we believe that \LancelotS reduced the penalty parameter more gradually once it was reduced below while \LancelotSS could only reduce it in the typical aggressive manner. (We remark that to (potentially) circumvent this inefficiency in \Lancelot, one could implement a different strategy in which the penalty parameter decrease factor is increased as the penalty parameter decreases, but in a manner that still ensures that the penalty parameter converges to zero when infinitely many decreases occur.) Overall, our conclusion from Table 3 is that steering typically decreases the penalty parameter more than a traditional updating scheme, but the difference is relatively small and we have implemented steering in a way that improves the overall efficiency and reliability of the method.
In this paper, we explored the numerical performance of adaptive updates to the Lagrange multiplier vector and penalty parameter in AL methods. Specific to the penalty parameter updating scheme is the use of steering conditions that guide the iterates toward the feasible region and toward dual feasibility in a balanced manner. Similar conditions were first introduced in  for exact penalty functions, but have been adapted in  and this paper to be appropriate for AL-based methods. Specifically, since AL methods are not exact (in that, in general, the trial steps do not satisfy linearized feasibility for any positive value of the penalty parameter), we allowed for a relaxation of the linearized constraints. This relaxation was based on obtaining a target level of infeasibility that is driven to zero at a modest, but acceptable, rate. This approach is in the spirit of AL algorithms since feasibility and linearized feasibility are only obtained in the limit. It should be noted that, like other AL algorithms, our adaptive methods can be implemented matrix-free, i.e., they only require matrix-vector products. This is of particular importance when solving large problems that have sparse derivative matrices.
As with steering strategies designed for exact penalty functions, our steering conditions proved to yield more efficient and reliable algorithms than a traditional updating strategy. This conclusion was made by performing a variety of numerical tests that involved our own \Matlab implementations and a simple modification of the well-known AL software \lancelot. To test the potential for the penalty parameter to be reduced too quickly, we also implemented safeguarded variants of our steering algorithms. Across the board, our results indicate that safeguarding was not necessary and would typically degrade performance when compared to the unrestricted steering approach. We feel confident that these tests clearly show that although our theoretical global convergence guarantee is weaker than some algorithms (i.e., we cannot prove that the penalty parameter will remain bounded under a suitable constraint qualification), this should not be a concern in practice. Finally, we suspect that the steering strategies described in this paper would also likely improve the performance of other AL-based methods such as [4, 27].
-  AMPL Home Page, http://www.ampl.com.
-  R. Andreani, E.G. Birgin, J.M. Martínez, and M.L. Schuverdt, Augmented Lagrangian methods under the constant positive linear dependence constraint qualification, Mathematical Programming 111 (2008), pp. 5–32, Available at http://dx.doi.org/10.1007/s10107-006-0077-1.
-  E.G. Birgin and J.M. Martínez, Augmented Lagrangian method with nonmonotone penalty parameters for constrained optimization, Computational Optimization and Applications 51 (2012), pp. 941–965, Available at http://dx.doi.org/10.1007/s10589-011-9396-0.
-  E.G. Birgin and J.M. Martínez, Practical Augmented Lagrangian Methods for Constrained Optimization, Fundamentals of Algorithms, SIAM, Philadelphia, PA, USA, 2014.
-  A. Bondarenko, D. Bortz, and J.J. Moré, COPS: Large-scale nonlinearly constrained optimization problems, Technical Report ANL/MCS-TM-237, Mathematics and Computer Science division, Argonne National Laboratory, Argonne, IL, 1998, revised October 1999.
-  S. Boyd, N. Parikh, E. Chu, B. Peleato, and J. Eckstein, Distributed optimization and statistical learning via the alternating direction method of multipliers, Foundations and Trends in Machine Learning 3 (2011), pp. 1–122.
-  R.H. Byrd, G. Lopez-Calva, and J. Nocedal, A line search exact penalty method using steering rules, Mathematical Programming 133 (2012), pp. 39–73.
-  R.H. Byrd, J. Nocedal, and R.A. Waltz, Steering exact penalty methods for nonlinear programming, Optimization Methods and Software 23 (2008), pp. 197–213, Available at http://dx.doi.org/10.1080/10556780701394169.
-  A.R. Conn, N.I.M. Gould, and Ph.L. Toint, Global convergence of a class of trust region algorithms for optimization with simple bounds, SIAM Journal on Numerical Analysis 25 (1988), pp. 433–460.
-  A.R. Conn, N.I.M. Gould, and Ph.L. Toint, Testing a class of methods for solving minimization problems with simple bounds on the variables, Mathematics of Computation 50 (1988), pp. 399–430.
-  A.R. Conn, N.I.M. Gould, and Ph.L. Toint, A globally convergent augmented Lagrangian algorithm for optimization with general constraints and simple bounds, SIAM Journal on Numerical Analysis 28 (1991), pp. 545–572.
-  A.R. Conn, N.I.M. Gould, and Ph.L. Toint, \lancelot: A Fortran package for large-scale nonlinear optimization (Release A), Lecture Notes in Computation Mathematics 17, Springer Verlag, Berlin, Heidelberg, New York, London, Paris and Tokyo, 1992.
-  A.R. Conn, N.I.M. Gould, and Ph.L. Toint, Numerical experiments with the LANCELOT package (Release A) for large-scale nonlinear optimization, Mathematical Programming 73 (1996), pp. 73–110.
-  A.R. Conn, N.I.M. Gould, and Ph.L. Toint, Trust-Region Methods, Society for Industrial and Applied Mathematics (SIAM), Philadelphia, PA, 2000.
-  F.E. Curtis, H. Jiang, and D.P. Robinson, An adaptive augmented Lagrangian method for large-scale constrained optimization, Mathematical Programming (2014), Available at http://dx.doi.org/10.1007/s10107-014-0784-y.
-  K.R. Davidson and A.P. Donsig, Real Analysis and Applications, Undergraduate Texts in Mathematics, Springer, New York, 2010, Available at http://dx.doi.org/10.1007/978-0-387-98098-0.
-  E.D. Dolan and J.J. Moré, Benchmarking optimization software with performance profiles, Mathematical Programming 91 (2002), pp. 201–213.
-  D. Fernández and M.V. Solodov, Local convergence of exact and inexact augmented Lagrangian methods under the second-order sufficiency condition, SIAM Journal on Optimization 22 (2012), pp. 384–407.
-  R. Fourer, D.M. Gay, and B.W. Kernighan, AMPL: A Modeling Language for Mathematical Programming, Brooks/Cole—Thomson Learning, Pacific Grove, USA, 2003.
-  D. Gabay and B. Mercier, A dual algorithm for the solution of nonlinear variational problems via finite element approximations, Computers and Mathematics with Applications 2 (1976), pp. 17–40.
-  R. Glowinski and A. Marroco, Sur l’Approximation, par Elements Finis d’Ordre Un, el la Resolution, par Penalisation-Dualité, d’une Classe de Problèmes de Dirichlet Nonlineares, Revue Française d’Automatique, Informatique et Recherche Opérationelle 9 (1975), pp. 41–76.
-  N.I.M. Gould, D. Orban, and Ph.L. Toint, CUTEr and sifdec: A constrained and unconstrained testing environment, revisited, ACM Transactions on Mathematical Software 29 (2003), pp. 373–394.
-  N.I.M. Gould, D. Orban, and Ph.L. Toint, Galahad—a library of thread-safe fortran 90 packages for large-scale nonlinear optimization, ACM Transactions on Mathematical Software 29 (2003), pp. 353–372.
-  N.I.M. Gould, D. Orban, and Ph.L. Toint, CUTEst: A constrained and unconstrained testing environment with safe threads, Tech. Rep. RAL-TR-2013-005, Rutherford Appleton Laboratory, 2013.
-  M.R. Hestenes, Multiplier and gradient methods, Journal of Optimization Theory and Applications 4 (1969), pp. 303–320.
-  A.F. Izmailov and M.V. Solodov, On attraction of linearly constrained Lagrangian methods and of stabilized and quasi-Newton SQP methods to critical multipliers, Mathematical Programming 126 (2011), pp. 231–257, Available at http://dx.doi.org/10.1007/s10107-009-0279-4.
-  M. Kočvara and M. Stingl, PENNON: A code for convex nonlinear and semidefinite programming, Optimization Methods and Software 18 (2003), pp. 317–333.
-  M. Kočvara and M. Stingl, PENNON: A generalized augmented Lagrangian method for semidefinite programming, in High Performance Algorithms and Software for Nonlinear Optimization (Erice, 2001), Applied Optimization, Vol. 82, Kluwer Academic Publishing, Norwell, MA, 2003, pp. 303–321, Available at http://dx.doi.org/10.1007/978-1-4613-0241-4_14.
-  M. Mongeau and A. Sartenaer, Automatic decrease of the penalty parameter in exact penalty function methods, European Journal of Operational Research 83 (1995), pp. 686–699.
-  J.L. Morales, A numerical study of limited memory BFGS methods, Applied Mathematics Letters 15 (2002), pp. 481–487.
-  J.J. Moré, Trust regions and projected gradients, in System Modelling and Optimization, Vol. 113, lecture Notes in Control and Information Sciences, Springer Verlag, Heidelberg, Berlin, New York, 1988, pp. 1–13.
-  J.J. Moré, Trust regions and projected gradients, in System Modelling and Optimization, M. Iri and K. Yajima, eds., Lecture Notes in Control and Information Sciences, Vol. 113, Springer Berlin Heidelberg, 1988, pp. 1–13, Available at http://dx.doi.org/10.1007/BFb0042769.
-  M.J.D. Powell, A method for nonlinear constraints in minimization problems, in Optimization, Academic Press, London and New York, 1969, pp. 283–298.
-  Z. Qin, D. Goldfarb, and S. Ma, An alternating direction method for total variation denoising, arXiv preprint arXiv:1108.1587 (2011).
-  Ph.L. Toint, Nonlinear stepsize control, trust regions and regularizations for unconstrained optimization, Optimization Methods and Software 28 (2013), pp. 82–95, Available at http://www.tandfonline.com/doi/abs/10.1080/10556788.2011.6104%58.
-  J. Yang, Y. Zhang, and W. Yin, A fast alternating direction method for TVL1-L2 signal reconstruction from partial Fourier data, IEEE Journal of Selected Topics in Signal Processing 4 (2010), pp. 288–297.
-  R.D. Zimmerman, C.E. Murillo-Sánchez, and R.J. Thomas, Matpower: Steady-state operations, planning, and analysis tools for power systems research and education, Power Systems, IEEE Transactions on 26 (2011), pp. 12–19.
Appendix A Well-posedness
Our goal in this appendix is to prove that Algorithm 3 is well-posed under Assumption 2.1. Since this assumption is assumed to hold throughout the remainder of this appendix, we do not refer to it explicitly in the statement of each lemma and proof.
a.1 Preliminary results
Our proof of the well-posedness of Algorithm 3 relies on showing that it will either terminate finitely or will produce an infinite sequence of iterates . In order to show this, we first require that the while loop that begins at line 11 of Algorithm 3 terminates finitely. Since the same loop appears in the AL trust region method in  and the proof of the result in the case of that algorithm is the same as that for Algorithm 3, we need only refer to the result in  in order to state the following lemma for Algorithm 3.