
# A feasible second order bundle algorithm for nonsmooth nonconvex optimization problems with inequality constraints: II. Implementation and numerical results

Hannes Fendl · Hermann Schichl

Faculty of Mathematics, University of Vienna
Oskar-Morgenstern-Pl. 1, A-1090 Wien, Austria
email: hermann.schichl@univie.ac.at

This research was supported by the Austrian Science Fund (FWF) grant P22239-N13.
###### Abstract

This paper presents a concrete implementation of the feasible second order bundle algorithm from Fendl & Schichl (2011) for nonsmooth, nonconvex optimization problems with inequality constraints. It computes the search direction by solving a convex quadratically constrained quadratic program. Furthermore, certain versions of the search direction problem are discussed, and the applicability of this approach is justified numerically by using different solvers for the computation of the search direction. Finally, the good performance of the second order bundle algorithm is demonstrated by comparison with test results of other solvers on examples of the Hock-Schittkowski collection, on custom examples that arise in the context of finding exclusion boxes for quadratic constraint satisfaction problems, and on higher dimensional piecewise quadratic examples.

###### Keywords:
Nonsmooth optimization, nonconvex optimization, bundle method
###### MSC:
90C56, 49M37, 90C30

## 1 Introduction

Nonsmooth optimization addresses optimization problems of the form

$$\min_x f(x) \quad \text{s.t.} \quad F_i(x) \le 0 \quad \text{for all } i = 1,\dots,m\,, \tag{1}$$

where $f, F_1,\dots,F_m\colon \mathbb{R}^n \to \mathbb{R}$ are locally Lipschitz continuous. Since $F_i(x) \le 0$ holds for all $i = 1,\dots,m$ if and only if $F(x) := \max_{i=1,\dots,m} t_i F_i(x) \le 0$ with constants $t_i > 0$, and since $F$ is still locally Lipschitz continuous (cf., e.g., Mifflin (1977, p. 969, Theorem 6 (a))), we can always assume $m = 1$ in (1). Therefore w.l.o.g. we always consider the nonsmooth optimization problem with a single nonsmooth constraint

$$\min_x f(x) \quad \text{s.t.} \quad F(x) \le 0\,, \tag{2}$$

where $f, F\colon \mathbb{R}^n \to \mathbb{R}$ are locally Lipschitz continuous.
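The reduction of the $m$ constraints in (1) to the single constraint in (2) is straightforward to realize in code. The following sketch (with two hypothetical smooth constraint components `F1`, `F2` chosen purely for illustration, not taken from the paper) aggregates them with a max-function and returns a subgradient of the aggregate by picking the gradient of an active component:

```python
# Two hypothetical smooth constraint components (illustration only).
def F1(x):
    return x[0] ** 2 + x[1] - 1.0

def F2(x):
    return -x[0] + x[1] ** 2 - 1.0

def F(x):
    """Aggregated single constraint F(x) = max(F1(x), F2(x)).

    F is locally Lipschitz whenever the components are, and
    F(x) <= 0 holds iff F1(x) <= 0 and F2(x) <= 0.
    """
    return max(F1(x), F2(x))

def F_subgradient(x):
    """A subgradient of F at x: the gradient of an active component."""
    g1 = (2.0 * x[0], 1.0)   # gradient of F1
    g2 = (-1.0, 2.0 * x[1])  # gradient of F2
    return g1 if F1(x) >= F2(x) else g2
```

At a kink (both components active) any convex combination of the active gradients is a valid subgradient; the sketch simply returns one of them.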

Since locally Lipschitz continuous functions are differentiable almost everywhere, both $f$ and $F$ may have kinks, and therefore already the attempt to solve an unconstrained nonsmooth optimization problem with a smooth solver (e.g., a line search algorithm or a trust region method) by just replacing the gradient by a subgradient fails in general (cf., e.g., Zowe (1989, p. 461-462)): If $g$ is an element of the subdifferential $\partial f(x)$, then the search direction $-g$ need not be a direction of descent (contrary to the behavior of the negative gradient of a differentiable function). Furthermore, it can happen that the sequence of iterates converges towards a minimizer, although the sequence of gradients does not converge towards $0$, and therefore we cannot identify the limit as a minimizer. Moreover, it can happen that the iterates converge towards a point that is not stationary for $f$. The reason for these problems is that if $f$ is not differentiable at $x$, then the gradient is discontinuous at $x$ and therefore gives no information about the behavior of $f$ in a neighborhood of $x$.
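The failure modes just described are easy to observe numerically. The following minimal demo (our own illustration, not from the paper) runs steepest descent with a fixed step on $f(x) = |x|$: the iterates approach the minimizer $0$, yet the subgradients stay at $\pm 1$, so the smooth stopping test on the gradient norm never fires and the iterates end up oscillating around the minimizer:

```python
def subgrad_descent(x0, step, iters):
    """Fixed-step 'gradient' descent on f(x) = |x|, using a subgradient.

    Returns the final iterate and the list of subgradients encountered.
    The subgradients never approach 0, even though the iterates
    approach the minimizer 0 (up to the step size).
    """
    x = x0
    grads = []
    for _ in range(iters):
        g = 1.0 if x > 0 else (-1.0 if x < 0 else 0.0)  # a subgradient of |x|
        grads.append(g)
        x = x - step * g
    return x, grads
```

With `x0 = 0.55` and `step = 0.1` the iterates decrease to about `0.05` and then oscillate between roughly `0.05` and `-0.05` forever, while every recorded subgradient has absolute value `1`.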

Not surprisingly, like in smooth optimization, the presence of constraints adds additional complexity, since constructing a descent sequence whose limit satisfies the constraints is (both theoretically and numerically) much more difficult than achieving this aim without the requirement of satisfying any restrictions.
**Linearly constrained nonsmooth optimization.** There exist various types of nonsmooth solvers, e.g., the R-algorithm by Shor (1985), stochastic algorithms that try to approximate the subdifferential (e.g., by Burke et al. (2005)), and bundle algorithms, which force a descent of the objective function by using local knowledge of the function. We will concentrate on the latter, as they have proved to be quite efficient.

One of the few publicly available bundle methods is the bundle-Newton method for nonsmooth, nonconvex unconstrained minimization by Lukšan & Vlček (1998). We sum up its key features: It is the only method which we know of that uses second order information of the objective function, which results in faster convergence (in particular it was shown in Lukšan & Vlček (1998, p. 385, Section 4) that the bundle-Newton method converges superlinearly for strongly convex functions). Furthermore, the search direction is computed by solving a convex quadratic program (QP) (based on an SQP-approach in some sense) and it uses a line search concept for deciding whether a serious step or a null step is performed. Moreover, its implementation PNEW, which is described in Lukšan & Vlček (1997), is written in FORTRAN. Therefore, we can use the bundle-Newton method for solving linearly constrained nonsmooth optimization problems (as the linear constraints can just be inserted into the QP without any additional difficulties).

In general, every nonsmooth solver for unconstrained optimization can treat constrained problems via penalty functions. Nevertheless, choosing the penalty parameter well is a highly nontrivial task. Furthermore, if an application only allows the nonsmooth solver to perform a few steps (as, e.g., in Fendl et al. (2011)), we need to achieve a feasible descent within these steps.

**Nonlinearly constrained nonsmooth optimization.** Therefore, Fendl & Schichl (2011) extend the bundle-Newton method to the constrained case in a very particular way: We use second order information of the constraint (cf. (2)). Furthermore, we use the SQP-approach of the bundle-Newton method for computing the search direction in the constrained case and combine it with the idea of quadratic constraint approximation, as it is used, e.g., in the sequential quadratically constrained quadratic programming method by Solodov (2004) (which is not a bundle method), in the hope of obtaining good feasible iterates, where we only accept strictly feasible points as serious steps. Therefore, we have to solve a strictly feasible convex QCQP to compute the search direction. Using such a QCQP yields a line search condition for accepting infeasible points as trial points (which differs from that in, e.g., Mifflin (1982)). One of the most important properties of the convex QP (used to determine the search direction) with respect to a bundle method is its strong duality (e.g., for a meaningful termination criterion and for global convergence), which also holds for strictly feasible convex QCQPs (cf. Fendl & Schichl (2011)). Since there exist only a few solvers specialized in solving QCQPs (all written in MATLAB or C, none in FORTRAN), the method is implemented in MATLAB as well as in C.

For a detailed description of the presented issues we refer the reader to Fendl (2011).

The paper is organized as follows: In Section 2 we give a brief description of the implemented variant of the second order bundle algorithm. In Section 3 we discuss some aspects that arise when using a convex QCQP for the computation of the search direction, such as the reduction of its dimension and the existence of a strictly feasible starting point for its SOCP-reformulation. Furthermore, we justify the approach of determining the search direction by solving a QCQP numerically, by comparing the results of some well-known solvers on our search direction problem. In Section 4 we provide numerical results for our second order bundle algorithm for some examples of the Hock-Schittkowski collection by Schittkowski (2009a, b), for custom examples that arise in the context of finding exclusion boxes for a quadratic CSP (constraint satisfaction problem) in GloptLab by Domes (2009), as well as for higher dimensional piecewise quadratic examples, and finally we compare these results to those of MPBNGC by Mäkelä (2003) and SolvOpt by Kappel & Kuntsevich (2000) to emphasize the good performance of the algorithm on constrained problems.

Throughout the paper we use the following notation: We denote the non-negative real numbers by $\mathbb{R}_{\ge 0}$, and the space of all symmetric $n \times n$-matrices by $\mathbb{R}^{n\times n}_{\mathrm{sym}}$. For $x \in \mathbb{R}^n$ we denote the Euclidean norm of $x$ by $|x|$, for integers $a \le b$ we define the (MATLAB-like) colon operator $a{:}b := (a, a+1, \dots, b)$, and for $A \in \mathbb{R}^{m\times n}$ we denote the spectral norm of $A$ by $\|A\|$.

## 2 Presentation of the algorithm

In the following section we give a brief exposition of our implemented variant of the second order bundle algorithm, whose theoretical convergence properties are proved in Fendl & Schichl (2011). For this purpose we assume that the functions $f, F\colon \mathbb{R}^n \to \mathbb{R}$ are locally Lipschitz continuous, and that we can compute $g(x) \in \partial f(x)$ and $\hat g(x) \in \partial F(x)$ as well as $G(x) \in \partial^2 f(x)$ and $\hat G(x) \in \partial^2 F(x)$, where the set of substitutes for the Hessian of $f$ at $x$ is defined by

$$\partial^2 f(x) := \begin{cases} \{G\} & \text{if the Hessian } G \text{ of } f \text{ at } x \text{ exists,} \\ \mathbb{R}^{n\times n}_{\mathrm{sym}} & \text{otherwise,} \end{cases}$$

i.e., we calculate elements of the sets $\partial^2 f(x)$ and $\partial^2 F(x)$ (in the proof of convergence in Fendl & Schichl (2011) only approximations were required). We consider the nonsmooth optimization problem (2), which has a single nonsmooth constraint. The second order bundle algorithm (described in Algorithm 2.1) tries to solve optimization problem (2) according to the following scheme: After choosing a starting point and setting up a few positive definite matrices, we compute the localized approximation errors. Then we solve a convex QCQP to determine the search direction, where the intention of the quadratic constraints of the QCQP is to obtain preferably feasible points that yield a good descent. Therefore, we only use quadratic terms in the QCQP for the approximation of the constraint, but not for the approximation of the objective function (in contrast to Fendl & Schichl (2011)), to balance the effort of solving the QCQP against the higher number of iterations caused by this simplification (in Subsection 3.1 we will discuss a further reduction of the size of the QCQP). Now, after computing the aggregated data and the predicted descent as well as testing the termination criterion, we perform a line search (see Algorithm 2.2) on the ray given by the search direction, which yields a trial point with the following property: Either the trial point is strictly feasible and the objective function achieves sufficient descent (serious step), or it is strictly feasible and the model of the objective function changes sufficiently (null step with respect to the objective function), or it is not strictly feasible and the model of the constraint changes sufficiently (null step with respect to the constraint). Afterwards we update the iteration point and the information stored in the bundle. We repeat this procedure until the termination criterion is satisfied.
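Stripped of all details, the scheme just described is a driver loop. The following sketch is our own simplification, not the paper's implementation: all problem-specific pieces (the QCQP solve, the line search, the bundle update) are passed in as hypothetical callables, and the serious/null distinction is reduced to a single step-size test:

```python
def bundle_loop(x0, solve_qcqp, line_search, update_bundle, tol, max_iter=100):
    """Schematic bundle-method driver (simplified sketch).

    solve_qcqp(x, bundle)   -> (d, w): search direction and optimality measure
    line_search(x, d)       -> (t_serious, y): serious step size (0 if none)
                               and trial point
    update_bundle(bundle, y)-> new bundle enriched with trial-point data
    """
    x = x0
    bundle = [x0]
    for k in range(max_iter):
        d, w = solve_qcqp(x, bundle)       # determine the search direction
        if w <= tol:                       # termination criterion
            return x, k
        t_serious, y = line_search(x, d)   # serious or null step decision
        if t_serious > 0.0:                # serious step: move the iterate
            x = y
        bundle = update_bundle(bundle, y)  # null step: enrich the model
    return x, max_iter
```

On a smooth toy problem with stubbed-in callables (e.g., an exact step on $f(x) = x^2$) the loop terminates as soon as the optimality measure drops below the tolerance.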

###### Algorithm 2.1
1. Initialization:

Choose the following parameters, which will not be changed during the algorithm:

Set the initial values of the data which gets changed during the algorithm:

$$i_n = 0 \ (\#\text{ subsequent null and short steps}), \quad i_s = 0 \ (\#\text{ subsequent serious steps}), \quad J_1 = \{1\} \ (\text{set of bundle indices})\,.$$

Compute the following information at the initial trial point

$$\begin{aligned}
f^1_p = f^1_1 &= f(y_1), & g^1_p = g^1_1 &= g(y_1) \in \partial f(y_1), & G^1_p = G_1 &= G(y_1) \in \partial^2 f(y_1),\\
F^1_p = F^1_1 &= F(y_1) < 0 \ (y_1 \text{ strictly feasible by assumption}), & \hat g^1_p = \hat g^1_1 &= \hat g(y_1) \in \partial F(y_1), & \hat G^1_p = \hat G_1 &= \hat G(y_1) \in \partial^2 F(y_1),
\end{aligned}$$

and set

$$\hat s^1_p = s^1_p = s^1_1 = 0 \ (\text{locality measure}), \quad \hat\rho_1 = \rho_1 = 1 \ (\text{damping parameter}), \quad \bar\kappa_1 = 1 \ (\text{Lagrange multiplier for the optimality condition}), \quad k = 1 \ (\text{iteration counter})\,.$$
2. Determination of the matrices for the QCQP:

if (step and were serious steps) ( )
aaa
else
aaa
end

if

aaa
else
aaa
end
Compute

$$(\hat G^k, \hat G^k_j) = \text{``positive definite modification of } (\hat G^k_p, \hat G_j)\text{''} \quad \text{for all } j \in J_k\,. \tag{3}$$
3. Computation of the localized approximation errors:

$$\begin{aligned}
\alpha^k_j &:= \max\bigl(|f(x_k) - f^k_j|, \gamma_1 (s^k_j)^{\omega_1}\bigr), & \alpha^k_p &:= \max\bigl(|f(x_k) - f^k_p|, \gamma_1 (s^k_p)^{\omega_1}\bigr),\\
A^k_j &:= \max\bigl(|F(x_k) - F^k_j|, \gamma_2 (s^k_j)^{\omega_2}\bigr), & A^k_p &:= \max\bigl(|F(x_k) - F^k_p|, \gamma_2 (\hat s^k_p)^{\omega_2}\bigr)\,.
\end{aligned}$$
4. Determination of the search direction: Compute the solution of the (convex) QCQP

$$\begin{aligned}
\min_{d,\hat v}\ & \hat v + \tfrac12 d^T W^k_p d\\
\text{s.t.}\ & -\alpha^k_j + d^T g^k_j \le \hat v && \text{for } j \in J_k\\
& -\alpha^k_p + d^T g^k_p \le \hat v && \text{if } i_s \le i_r\\
& F(x_k) - A^k_j + d^T \hat g^k_j + \tfrac12 d^T \hat G^k_j d \le 0 && \text{for } j \in J_k\\
& F(x_k) - A^k_p + d^T \hat g^k_p + \tfrac12 d^T \hat G^k d \le 0 && \text{if } i_s \le i_r
\end{aligned} \tag{4}$$

and its corresponding Lagrange multiplier and set and .
if
aaa
else
aaa
end
if

aaa (bundle reset)
end

5. Aggregation: We set for the aggregation of information of the objective function

$$\begin{aligned}
(\tilde f^k_p, \tilde g^k_p, G^{k+1}_p, \tilde s^k_p) &= \sum_{j\in J_k} \lambda^k_j\, (f^k_j, g^k_j, \rho_j G_j, s^k_j) + \lambda^k_p\, (f^k_p, g^k_p, G^k_p, s^k_p)\\
\tilde\alpha^k_p &= \max\bigl(|f(x_k) - \tilde f^k_p|, \gamma_1 (\tilde s^k_p)^{\omega_1}\bigr)
\end{aligned}$$

and for the aggregation of information of the constraint

$$\begin{aligned}
(\tilde F^k_p, \tilde{\hat g}^k_p, \hat G^{k+1}_p, \tilde{\hat s}^k_p) &= \sum_{j\in J_k} \kappa^k_j\, (F^k_j, \hat g^k_j, \hat\rho_j \hat G_j, s^k_j) + \kappa^k_p\, (F^k_p, \hat g^k_p, \hat G^k_p, \hat s^k_p)\\
\tilde A^k_p &= \max\bigl(|F(x_k) - \tilde F^k_p|, \gamma_2 (\tilde{\hat s}^k_p)^{\omega_2}\bigr)
\end{aligned}$$

and we set

$$\begin{aligned}
v_k &= -d_k^T W^k_p d_k - \tfrac12 d_k^T \Bigl(\sum_{j\in J_k} \mu^k_j \hat G^k_j + \mu^k_p \hat G^k\Bigr) d_k - \tilde\alpha^k_p - \bar\kappa_{k+1} \tilde A^k_p - \bar\kappa_{k+1}\bigl(-F(x_k)\bigr)\\
w_k &= \tfrac12 \bigl|H_k(\tilde g^k_p + \bar\kappa_{k+1} \tilde{\hat g}^k_p)\bigr|^2 + \tilde\alpha^k_p + \bar\kappa_{k+1} \tilde A^k_p + \bar\kappa_{k+1}\bigl(-F(x_k)\bigr)\,.
\end{aligned}$$
6. Termination criterion:

if
aaastop
end

7. Line search: We compute step sizes and by using the line search described in Algorithm 2.2 and we set

$$\begin{aligned}
x_{k+1} &= x_k + t^k_L d_k \quad (x_{k+1} \text{ is created strictly feasible by the line search})\\
y_{k+1} &= x_k + t^k_R d_k\\
f_{k+1} &= f(y_{k+1}), \quad g_{k+1} = g(y_{k+1}) \in \partial f(y_{k+1}), \quad G_{k+1} = G(y_{k+1}) \in \partial^2 f(y_{k+1})\\
F_{k+1} &= F(y_{k+1}), \quad \hat g_{k+1} = \hat g(y_{k+1}) \in \partial F(y_{k+1}), \quad \hat G_{k+1} = \hat G(y_{k+1}) \in \partial^2 F(y_{k+1})\,.
\end{aligned}$$
8. Update:

if
aaa
else
aaa
end

if (serious step)
aaa
aaa
else (no serious step, i.e. null or short step)
aaa
end

Compute the updates of the locality measure

$$\begin{aligned}
s^{k+1}_j &= s^k_j + |x_{k+1} - x_k| \quad \text{for } j \in J_k\\
s^{k+1}_{k+1} &= |x_{k+1} - y_{k+1}|\\
s^{k+1}_p &= \tilde s^k_p + |x_{k+1} - x_k|\\
\hat s^{k+1}_p &= \tilde{\hat s}^k_p + |x_{k+1} - x_k|\,.
\end{aligned}$$

Compute the updates for the objective function approximation

$$\begin{aligned}
f^{k+1}_j &= f^k_j + (g^k_j)^T (x_{k+1} - x_k) + \tfrac12 \rho_j (x_{k+1} - x_k)^T G_j (x_{k+1} - x_k) \quad \text{for } j \in J_k\\
f^{k+1}_{k+1} &= f_{k+1} + g_{k+1}^T (x_{k+1} - y_{k+1}) + \tfrac12 \rho_{k+1} (x_{k+1} - y_{k+1})^T G_{k+1} (x_{k+1} - y_{k+1})\\
f^{k+1}_p &= \tilde f^k_p + (\tilde g^k_p)^T (x_{k+1} - x_k) + \tfrac12 (x_{k+1} - x_k)^T G^{k+1}_p (x_{k+1} - x_k)
\end{aligned}$$

and for the constraint

$$\begin{aligned}
F^{k+1}_j &= F^k_j + (\hat g^k_j)^T (x_{k+1} - x_k) + \tfrac12 \hat\rho_j (x_{k+1} - x_k)^T \hat G_j (x_{k+1} - x_k) \quad \text{for } j \in J_k\\
F^{k+1}_{k+1} &= F_{k+1} + \hat g_{k+1}^T (x_{k+1} - y_{k+1}) + \tfrac12 \hat\rho_{k+1} (x_{k+1} - y_{k+1})^T \hat G_{k+1} (x_{k+1} - y_{k+1})\\
F^{k+1}_p &= \tilde F^k_p + (\tilde{\hat g}^k_p)^T (x_{k+1} - x_k) + \tfrac12 (x_{k+1} - x_k)^T \hat G^{k+1}_p (x_{k+1} - x_k)\,.
\end{aligned}$$

Compute the updates for the subgradient of the objective function approximation

$$\begin{aligned}
g^{k+1}_j &= g^k_j + \rho_j G_j (x_{k+1} - x_k) \quad \text{for } j \in J_k\\
g^{k+1}_{k+1} &= g_{k+1} + \rho_{k+1} G_{k+1} (x_{k+1} - y_{k+1})\\
g^{k+1}_p &= \tilde g^k_p + G^{k+1}_p (x_{k+1} - x_k)
\end{aligned}$$

and for the constraint

$$\begin{aligned}
\hat g^{k+1}_j &= \hat g^k_j + \hat\rho_j \hat G_j (x_{k+1} - x_k) \quad \text{for } j \in J_k\\
\hat g^{k+1}_{k+1} &= \hat g_{k+1} + \hat\rho_{k+1} \hat G_{k+1} (x_{k+1} - y_{k+1})\\
\hat g^{k+1}_p &= \tilde{\hat g}^k_p + \hat G^{k+1}_p (x_{k+1} - x_k)\,.
\end{aligned}$$

Choose $J_{k+1} \subseteq J_k \cup \{k+1\}$ with $k+1 \in J_{k+1}$.

Go to 1
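Equations (3) and (5) both call for a "positive definite modification" of a symmetric matrix. One simple realization, a sketch of our own rather than the paper's implementation (which might instead use, e.g., a modified Cholesky factorization), shifts the diagonal by a Gershgorin-type bound so that all eigenvalues are at least a given `eps`:

```python
def make_positive_definite(A, eps=1e-8):
    """Return A + shift*I with shift chosen via a Gershgorin bound.

    Every eigenvalue of a symmetric matrix lies in some disc
    [A[i][i] - r_i, A[i][i] + r_i] with r_i the off-diagonal row sum,
    so shifting by max_i (eps + r_i - A[i][i]) (if positive) guarantees
    all eigenvalues of the result are >= eps.
    """
    n = len(A)
    shift = 0.0
    for i in range(n):
        radius = sum(abs(A[i][j]) for j in range(n) if j != i)
        shift = max(shift, radius - A[i][i] + eps)
    return [[A[i][j] + (shift if i == j else 0.0) for j in range(n)]
            for i in range(n)]
```

The Gershgorin shift can be much larger than necessary (it is a crude eigenvalue bound), which is why practical implementations tend to prefer factorization-based modifications; the sketch only illustrates the idea.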

We extend the line search of the bundle-Newton method for nonsmooth unconstrained minimization to the constrained case in the line search described in Algorithm 2.2. Before formulating the line search in detail, we give a brief overview of its functionality:

Starting with the step size $t = 1$, we check whether the point $x_k + t d_k$ is strictly feasible. If so, and if additionally the objective function decreases sufficiently at this point and the step size is not too small, then we take $x_k + t d_k$ as the new iteration point in Algorithm 2.1 (serious step). Otherwise, if the point is strictly feasible and the model of the objective function changes sufficiently, we take it as a new trial point (short/null step with respect to the objective function). If the point is not strictly feasible, but the model of the constraint changes sufficiently (in particular, here the quadratic approximation of the constraint comes into play), we take it as a new trial point (short/null step with respect to the constraint). After choosing a new step size by interpolation, we iterate this procedure.
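The three-way classification of a trial point described above can be sketched as follows. This is a simplification with a hypothetical sufficient-descent test only (parameter names `m_L`, `v` mimic the line search constants, and the model-change and minimum-step-size tests of Algorithm 2.2 are omitted):

```python
def classify_step(f, F, x, d, t, f_x, v, m_L=0.01):
    """Classify the trial point x + t*d (simplified sketch).

    f, F : objective and constraint callables
    f_x  : f at the current iterate, v : predicted (negative) descent
    Returns one of "serious", "null_f", "null_F" and the trial point.
    """
    y = [xi + t * di for xi, di in zip(x, d)]
    if F(y) < 0.0:                        # strictly feasible trial point
        if f(y) <= f_x + m_L * v * t:     # sufficient descent of the objective
            return "serious", y
        return "null_f", y                # feasible: model of f must change
    return "null_F", y                    # infeasible: model of F must change
```

In the actual line search a "serious" classification additionally requires the serious step size not to be too small, and the two "null" branches are only taken when the corresponding model changes sufficiently.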

###### Algorithm 2.2
1. Initialization: Choose as well as and set as well as .

2. Modification of either or :

    if F(x_k + t d_k) < 0
        if f(x_k + t d_k) ≤ f(x_k) + m_L v_k t
            t_L = t
        else if f(x_k + t d_k) > f(x_k) + m_L v_k t
            t_U = t
        end
    else if F(x_k + t d_k) ≥ 0
        t_U = t
        t_0 = t̂_0 t_U
    end
    if t_L ≥ t_0
        t_R = t_L
        return (serious step)
    end
3. Decision of return:

    if F(x_k + t d_k) < 0
        g = g(x_k + t d_k) ∈ ∂f(x_k + t d_k),  G = G(x_k + t d_k) ∈ ∂²f(x_k + t d_k)
        ρ = min(1, C_G/|G|) if i_n ≤ 3, 0 otherwise
        f = f(x_k + t d_k) + (t_L − t) gᵀd_k + ½ ρ (t_L − t)² d_kᵀ G d_k
        β = max(|f(x_k + t_L d_k) − f|, γ_1 |t_L − t|^{ω_1} |d_k|^{ω_1})
        if −β + d_kᵀ(g + ρ (t_L − t) G d_k) ≥ m_R v_k and (t − t_L)|d_k| ≤ C_S
            t_R = t
            return (short/null step: change of model of the objective function)
        end
    else if F(x_k + t d_k) ≥ 0
        ĝ = ĝ(x_k + t d_k) ∈ ∂F(x_k + t d_k),  Ĝ = Ĝ(x_k + t d_k) ∈ ∂²F(x_k + t d_k)
        ρ̂ = min(1, Ĉ_G/|Ĝ|)
        F = F(x_k + t d_k) + (t_L − t) ĝᵀd_k + ½ ρ̂ (t_L − t)² d_kᵀ Ĝ d_k
        β̂ = max(|F(x_k + t_L d_k) − F|, γ_2 |t_L − t|^{ω_2} |d_k|^{ω_2})
        Ĝ_mod = "positive definite modification of Ĝ"   (5)
        if F(x_k + t_L d_k) − β̂ + d_kᵀ(ĝ + ρ̂ (t_L − t) Ĝ d_k) ≥ m_F (−½ d_kᵀ Ĝ_mod d_k) and (t − t_L)|d_k| ≤ C_S   (6)
            t_R = t
            return (short/null step: change of model of the constraint)
        end
    end
4. Interpolation: Choose a new step size $t \in [t_L, t_U]$ by interpolation (cf. Remark 1).

5. Loop: Go to 1

###### Remark 1

Similar to the line search in the bundle-Newton method for nonsmooth unconstrained minimization by Lukšan & Vlček (1998), we want to choose a new point in the interval $[t_L, t_U]$ by interpolation. For this purpose, we set up a polynomial interpolating the objective function along the search ray as well as a polynomial interpolating the constraint. Now we minimize the former subject to a non-positivity constraint on the latter over $[t_L, t_U]$, and we use a solution as the new point. The degree of the polynomials should be chosen so that determining this minimizer is easy (e.g., if we choose both as quadratic polynomials, then determining the minimizer consists of solving a one-dimensional linear equation, a one-dimensional quadratic equation, and a few case distinctions).
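With quadratic polynomials, the unconstrained part of this interpolation step reduces to elementary formulas. The following sketch (our own simplification: it interpolates one function only and ignores the constraint polynomial) fits a quadratic through the value and derivative at the left endpoint `a` and the value at the right endpoint `b`, then returns its minimizer clipped to the bracket:

```python
def quad_interp_step(phi_a, dphi_a, phi_b, a, b):
    """Minimizer over [a, b] of the quadratic p with
    p(a) = phi_a, p'(a) = dphi_a, p(b) = phi_b (sketch)."""
    # p(t) = phi_a + dphi_a*(t - a) + c*(t - a)^2, determined by p(b) = phi_b.
    h = b - a
    c = (phi_b - phi_a - dphi_a * h) / (h * h)
    if c <= 0.0:                  # p is concave/linear: minimum at an endpoint
        return a if phi_a <= phi_b else b
    t = a - dphi_a / (2.0 * c)    # stationary point of p
    return min(max(t, a), b)      # clip to the bracket [a, b]
```

For example, with data from $\varphi(t) = (t-1)^2$ on $[0, 2]$ the sketch recovers the exact minimizer $t = 1$; handling the constraint polynomial on top of this adds the case distinctions mentioned in the remark.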

## 3 The reduced problem

In this section we present some issues that arise when using a convex QCQP for the computation of the search direction, such as the reduction of its dimension. Moreover, we give a numerical justification of the approach of determining the search direction by solving a QCQP, by comparing the results of some well-known solvers on our search direction problem.

### 3.1 Reduction of problem size

We want to reduce the problem size of the QCQP (4). For this purpose we choose all matrices for the constraint approximation equal to a positive definite modification of an aggregated Hessian of the constraint (similar to the corresponding choice in the bundle-Newton method for nonsmooth unconstrained minimization by Lukšan & Vlček (1998)). For the implementation, we will extract linear constraints that may occur in the single nonsmooth function (via a max-function of the rows