Quasi-Newton approaches to Interior Point Methods for quadratic problems

# Quasi-Newton approaches to Interior Point Methods for quadratic problems

J. Gondzio111School of Mathematics, University of Edinburgh, Edinburgh, EH9 3FD, Scotland, United Kingdom. Email: J.Gondzio@ed.ac.uk    F. N. C. Sobral222Corresponding author. Department of Mathematics, State University of Maringá, Avenida Colombo, 5790, Paraná, Brazil, 87020-900. Phone: +55 44 30116211. E-mail: fncsobral@uem.br
Technical Report ERGO 18-015, School of Mathematics, June 25, 2018
###### Abstract

Interior Point Methods (IPM) rely on the Newton method for solving systems of nonlinear equations. Solving the linear systems which arise from this approach is the most computationally expensive task of an interior point iteration. If, due to problem’s inner structure, there are special techniques for efficiently solving linear systems, IPMs enjoy fast convergence and are able to solve large scale optimization problems. It is tempting to try to replace the Newton method by quasi-Newton methods. Quasi-Newton approaches to IPMs either are built to approximate the Lagrangian function for nonlinear programming problems or provide an inexpensive preconditioner. In this work we study the impact of using quasi-Newton methods applied directly to the nonlinear system of equations for general quadratic programming problems. The cost of each iteration can be compared to the cost of computing correctors in a usual interior point iteration. Numerical experiments show that the new approach is able to reduce the overall number of matrix factorizations and is suitable for a matrix-free implementation.

Keywords: Broyden Method, Quasi-Newton, Interior Point Methods, Matrix-free, Quadratic Programming Problems

## 1 Introduction

Let us consider the following general quadratic programming problem

 min12xTQx+cTxs. t.Ax=bx≥0, (1)

where , , and . We will suppose that the rows of are linearly independent. Define function by

 F(x,λ,z)=⎡⎢⎣−Qx+ATλ+z−cAx−bXZe⎤⎥⎦, (2)

where are diagonal matrices defined by and , respectively, and is the vector of ones of appropriate size. First order necessary conditions for (1) state that, if is a minimizer, then there exist , , and such that .

Primal-Dual IPMs try to solve (1) by solving a sequence of relaxed constrained nonlinear equations in the form of

 F(x,y,s)=⎡⎢⎣00μe⎤⎥⎦,x,s>0, (3)

where is called the barrier parameter, which is associated with the logarithmic barrier applied to the inequalities used to derive the method [1, 2]. As more importance is given to optimality over feasibility. Systems of type (3) are not easy to solve. When , they can be solved by general algorithms for bounded nonlinear systems [3, 4]. In this case, a suitable merit function, usually , has to be used to select the step-sizes. IPMs try to stay near the solution of (3), called the central path, and reduce at each iteration. Instead of solving (3) exactly, one step of the Newton method is applied. Thus, given an iterate , in the interior of the bound constraints, i.e. , the next point is given by

 (xk+1,λk+1,zk+1)=(xk,λk,zk)+(αPΔxk,αDΔλk,αDΔzk), (4)

where is computed by solving some Newton-like systems

 J(xk,λk,zk)⎡⎢⎣ΔxkΔλkΔzk⎤⎥⎦=v, (5)

where and is the Jacobian of , defined by

 J(x,λ,z)=⎡⎢⎣−QATIA00Z0X⎤⎥⎦. (6)

Standard predictor-corrector algorithms solve (5) twice: first the affine scaling predictor is computed for and then the corrector step is computed using , with , . Additional correctors can be computed in one iteration to further accelerate convergence, such as second order correctors [5] or multiple centrality correctors [6]. Scalars and are selected such that and , respectively.

The most expensive task during an interior point (IP) iteration is to solve (5). The coefficient matrix is known as unreduced matrix and has dimension , but its nice structure allows efficient solution techniques to be used. The most common approaches for solving the linear system in IPMs are to work with augmented system or normal equations. If we eliminate in (5), we have the augmented system for which we can solve directly using matrix factorizations or compute adequate preconditioners and solve iteratively by Krylov subspace methods. If matrix is easily invertible, or (linear programming problems), it is possible to further eliminate and solve the normal equations by Cholesky factorization or by Conjugate Gradients, depending on the size of the problem. For both approaches it is known that computing good preconditioners or computing the factorization can be most expensive part of the process. Therefore (5) can be solved several times for the same with different right-hand sides, in a classical predictor-corrector approach [5] or in the multiple centrality correctors framework [7, 1]. In this work we will extensively use the fact that the backsolves in (5) are less expensive than computing a good preconditioner or factorization.

Although is unsymmetric, under reasonable assumptions Greif, Moulding and Orban showed that it has only real eigenvalues [8]. Based on those results, Morini, Simoncini and Tani [9] developed preconditioners for the unreduced matrix and compared the performance of interior point methods using unreduced matrices and augmented system. The unreduced matrix has also two more advantages, when compared to augmented system and normal equations. First, small changes of variables or result in small changes in . Second, is the Jacobian of , so it is possible to approximate it by building models or evaluating on some extra points. These two characteristics are explored in this work.

Since is the Jacobian of , it is natural to ask if it can be approximated by evaluating in some points. Function is composed by two linear and one nonlinear functions. Therefore, the only part of which may change during iterations is the third row. Moreover, it can be efficiently stored by just storing , , and . Since computing and storing is inexpensive, the only reason to use an approximation of is if system (5), using instead of , becomes easier to solve. That is where quasi-Newton methods and low rank updates become an interesting tool in interior point methods.

Quasi-Newton methods are well known techniques for solving large scale nonlinear systems or nonlinear optimization problems. The main motivation is to replace the Jacobian used by the traditional Newton method by its good and inexpensive approximation. Originally, they were useful to avoid computing the derivatives of , but they have become popular as a large scale tool, since they usually do not need to explicitly build matrices and enjoy superlinear convergence. Classical references for quasi-Newton methods are [10, 11] for nonlinear equations and [12] for unconstrained optimization.

In the review [11] about practical quasi-Newton methods for solving nonlinear equations, Martínez suggests that there is room for studying such techniques in the interior point context. The author points to the work of Dennis Jr., Morshedi and Turner [13] which applies quasi-Newton techniques to make the projections in Karmarkar’s algorithm cheaper. The authors write the interpolation equations associated with the linear system in interior point iterations and describe a fast algorithm to compute updates and also to update an already existing Cholesky factorization. When solving general nonlinear programming problems by IPMs, a well known approach is to replace the Hessian of the Lagrangian function by low rank approximations [12].

In 2000, Morales and Nocedal [14] used quasi-Newton arguments to show that the directions calculated by the Conjugate Gradient algorithm can be used to build an automatic preconditioner for the matrix under consideration. The preconditioner is a sequence of rank-one updates of an initial diagonal matrix. Such approach is efficient when solving a sequence of linear systems with the same (or a slowly varying) coefficient matrix. Based on those ideas, a limited memory BFGS-like preconditioner for positive definite matrices was developed in [15] and was specialized for symmetric indefinite matrices in [16] . Recently, Bergamaschi et al. [17] developed limited-memory BFGS-like preconditioners to KKT systems arising from IP iterations and described their spectral properties. The approach was able to reduce the number of iterations in the Conjugate Gradient algorithm, but the approximation deteriorates as the number of interior point iterations increase. Also, extra linear algebra has to be performed to ensure orthogonality of the vectors used to build the updates.

In all works, with exception of [13], the main focus was to use low rank updates of an already computed preconditioner such that new preconditioners are constructed in an inexpensive way and reduce the overall number of linear algebra iterations. In the present work, our main objective is to work directly with nonlinear equations and use low rank secant updates for computing the directions in the IP iterations. We use least change secant updates, in particular Broyden updates, and replace the Newton system (5) by an equivalent one. Some properties of the method are presented and extensive numerical experiments are performed. The main features of the proposed approach are:

• Low rank approximations are matrix-free and use only vector multiplications and additions;

• The quasi-Newton method for solving (5) can be easily inserted into an existing IPM;

• The number of factorizations is reduced for small and large instances of linear and quadratic problems;

• When the cost of the factorization is considerably higher than the cost of the backsolves, the total CPU time is also decreased.

In Section 2 we discuss the basic ideas of quasi-Newton methods, in particular the Broyden method, which is extensively used in the work. In Section 3 we show that, if the initial approximation is good enough, least change secant updates preserve most of the structure of the true coefficient matrix and a traditional IP iteration can be performed with the cost of computing correctors only. New low rank secant updates, which are able to exploit the sparsity of are also discussed. In Section 4 we describe the aspects of a successful implementation of a quasi-Newton interior point method. In Section 5 we compare our approach with a research implementation of the primal-dual IPM for solving small- and medium-sized linear and quadratic problems. Finally, in Section 6 we draw the conclusions and mention possible extensions of the method.

#### Notation.

Throughout this work we use and as short versions of vector and matrix , respectively. The vector denotes the vector of ones of appropriate dimension.

## 2 Background for quasi-Newton methods

Quasi-Newton methods can be described as algorithms which use approximations to the Jacobian in the Newton method in order to solve nonlinear systems. The approximations are generated using information from previous iterations. Suppose that we want to find such that , where is continuously differentiable. Given the current point at iteration , Newton method builds a linear model of around in order to find . Now, suppose that and have already been calculated and let us create a linear model for around :

 Mk+1(¯x)=F(¯xk+1)+Bk+1(¯x−¯xk+1). (7)

The choice results in the Newton method for iteration . In secant methods, is constructed such that interpolates at and , which gives us the secant equation

 Bk+1sk=yk, (8)

where and . When and there are more unknowns than equations and several choices for exist [18, 11].

Let be the current approximation to , the Jacobian of at (it can be itself, for example). One of the most often used simple secant approximations for unsymmetric Jacobians is given by the Broyden “good” method. Given , a new approximation to is given by

 Bk+1=Bk+(yk−Bksk)sTksTksk. (9)

Matrix is the closest matrix to , in Frobenius norm, which satisfies (8). The update of the Broyden method belongs to the class of least change secant updates, since is a rank-one update of . As we are interested in solving a linear system, it may be interesting to analyze matrix , which is obtained by the well known Sherman-Morrison-Woodbury formula:

 Hk+1=Hk+(sk−Hkyk)sTkHksTkHkyk=(I+uksTkρk)Hk, (10)

where and . We can see that is also a least change secant update of . To store , one needs first to compute and then store one scalar and two vectors. Storing is more efficient than storing when is going to be used more than once. According to (10), the cost of computing is the cost of computing plus one scalar product and one sum of vectors times a scalar. After updates of an initial approximation , current approximation is given by

 Hk=(I+uk−1sTk−1ρk−1)Hk−1=[ℓ∏j=1(I+uk−jsTk−jρk−j)]Hk−ℓ.

Instead of updating and then computing its inverse, the Broyden “bad” method directly computes the least change secant update of the inverse:

 Hk+1=Hk+(sk−Hkyk)yTkyTkyk=HkVk+skyTkρk, (11)

where and . Similarly to in (10), given by (11) is the closest matrix of , in the Frobenius norm, such that satisfies (8). The cost of storing is lower than that of (10), since vectors and have already been computed. The cost of calculating is higher: it involves one scalar product, two sums of vector times a scalar and . After updates of an initial approximation , current approximation is given by

 Hk=Hk−1Vk−1+sk−1yTk−1ρk−1=Hk−ℓ⎛⎝k−1∏j=k−ℓVj⎞⎠+ℓ∑i=1⎛⎝sk−iyTk−iρk−ik−1∏j=k−i+1Vj⎞⎠ (12)

Approach (11) has some advantages over (10). First, it does not need to compute for constructing the update. When is a complicated matrix, this is a costly operation. Second, unlike (10), matrices depend solely on and for all , so it is possible to replace by different matrices without updating the whole structure. This is suitable to be applied in a limited-memory scheme [16]. Third, the computation of can be efficiently implemented in a scheme similar to the BFGS update described in [12], as we show in Algorithm 1. Unfortunately, the Broyden “bad” method is known to behave worse in practice than the “good” method [10]. To avoid the extra cost of computing in (10) it is common to compute a Cholesky or LU factorization of and work directly with (9), performing rank-one updates of the factorization, which can be efficiently implemented [19].

The class of rank-one least change secant updates can be generically represented by updates of the form

 Bk+1=Bk+(yk−Bksk)wTkwTksk, (13)

where . Setting defines the Broyden “good” method and defines the Broyden “bad” method. Several other well known quasi-Newton methods fit in update (13), such as the Symmetric Rank-1 update used in nonlinear optimization, which defines . See [18, 10] for details on least change secant updates.

## 3 A quasi-Newton approach for IP iterations

According to the general description of primal-dual IPMs in Section 1, we can see that, at each iteration, they perform one Newton step associated with the nonlinear system (3), for decreasing values of . Each step involves the computation of the Jacobian of and the solution of a linear system (5).

Our proposal for this work is to perform one quasi-Newton step to solve (3), replacing the true Jacobian by a low rank approximation . The idea might seem surprising at first glance, since, for quadratic problems, is very cheap to evaluate. In this section we further develop the quasi-Newton ideas applied to interior point methods and show that they might help to reduce the cost of the linear algebra when solving (1).

It is important to note that and discussed in Section 2 will be given by (2) and (6), respectively, in the interior point context, which highlights the importance of using the unreduced matrix in our analysis. Therefore, variable in Section 2 is given by and, consequently, .

### 3.1 Initial approximation and update

Suppose that is an interior point iteration for which system (5) was solved and was calculated, using any available technique. Usually, solving (5) involves an expensive factorization or the computation of a good preconditioner associated with . Most traditional quasi-Newton methods for general nonlinear systems compute by finite differences or use a diagonal matrix as the initial approximation. According to Section 2, it is necessary to have an initial approximation of in order to generate approximation of by low rank updates. Most of traditional quasi-Newton methods for general systems compute by finite differences or use a diagonal matrix. Since have already been computed, we will define it as , i.e., the perfect approximation to . It is clear that, in such case, is the approximation to .

In order to compute , vectors and in secant equation (8) have to be built:

 sk =⎡⎢⎣sk,xsk,λsk,z⎤⎥⎦=⎡⎢⎣xk+1−xkλk+1−λkzk+1−zk⎤⎥⎦ (14) yk =⎡⎢⎣yk,cyk,byk,μ⎤⎥⎦=F(xk+1,λk+1,zk+1)−F(xk,λk,zk) =⎡⎢⎣−Qsk,x+ATsk,λ+sk,zAsk,xXk+1Zk+1e−XkZke⎤⎥⎦.

The use of as the initial approximation ensures that the first two block elements of are zero. This is a well known property of low rank updates given by (13) when applied to linear functions (see [10, Ch. 8]). In Lemma 1 we show that rank-one secant updates maintain most of the good sparsity structure of approximation when its structure is similar to the true Jacobian of .

###### Lemma 1.

Let be the Jacobian of given by (2). If the least change secant update for approximating is computed by (13) using , , , and is defined by

 Bk=⎡⎢⎣−QATIA00M1kM2kM3k⎤⎥⎦

then

 Bk+1=⎡⎢ ⎢⎣−QATIA00M1k+1M2k+1M3k+1⎤⎥ ⎥⎦,

where is a rank-one update of , for . In addition, if and , then .

###### Proof.

By the definition of and in (14) it is easy to see that , where

 uk=(Xk+1Zk+1−XkZk)e−M1ksk,x−M2ksk,λ−M3ksk,z.

Using the secant update (13), we have that the first two rows of are kept the same and

 M1k+1 =M1k+ukaTk/(wTksk) M2k+1 =M2k+ukbTk/(wTksk) M3k+1 =M3k+ukcTk/(wTksk).

It is easy to see that when and . ∎

By Section 2 we know that Broyden “good” and “bad” updates are represented by specific choices of and, therefore, enjoy the consequences of Lemma 1. Unfortunately, not much can be said about the structure of the “third row” of . When , the diagonal structure of blocks and , as well as the zero block in the middle, are likely to be lost. However, if we select , then, by Lemma 1, the zero block is kept in . The update given by this choice of is a particular case of Schubert’s quasi-Newton update for structured and sparse problems [20]. This update minimizes the distance to on the space of the matrices that satisfy (8) and have the same block sparsity pattern of  [18]. Using the Sherman-Morrison-Woodbury formula, we also have the update for :

 Hk+1=(I−(Hkyk−sk)wTkwTkHkyk)H−1k,

which only needs an extra computation of to be stored. There is no need to store , since it is composed by components of . We can say that this approach is inspired in the Broyden “good” update.

On the other hand, if we use , then we still have by Lemma 1 and, in addition, we are able to remove the calculation in the inverse. This approach is inspired by the Broyden “bad” update and results in the following update

 (15)

Up to the knowledge of the authors, this update has not been theoretically studied in the literature.

Lemma 1 also justifies our choice to work with approximations of rather than . After rank-one updates, if is solved by factorizations and backsolves, it would be necessary to perform updates on the factorization of initial matrix , what could introduce many nonzero elements. A clear benefit of defining is that computing uses the already calculated factorizations/preconditioners for , which were originally used to solve (5) at iteration . Step 1 of Algorithm 1 is an example of low rank update (12). Clearly, we do not explicitly compute , but instead solve the system .

### 3.2 Computation of quasi-Newton steps

Having defined how quasi-Newton updates are initialized and constructed, we now have to insert the approximations in an interior point framework. Denoting as the starting point of the algorithm, at the end of any iteration it is possible to build a rank-one secant approximation of the unreduced matrix to be used at iteration . Let us consider iteration , where and . If , then, by the previous subsection, and the step in the interior point iteration is the usual Newton step, given by (5). If , we have a quasi-Newton step, which can be viewed as a generalization of (5), and is computed by solving

 Bk⎡⎢⎣ΔxkΔλkΔzk⎤⎥⎦=v (16)

or, equivalently, by performing . All the other steps of the IPM remain exactly the same.

When , the cost of solving (16) depends on the type of update that is used. In general, it is the cost of solving system (or, equivalently, ) plus some vector multiplications and additions. However, since has already been the coefficient matrix of a linear system at iteration , it is usually less expensive than solving for the first time. That is one of the main improvements that a quasi-Newton approach brings to interior point methods.

When the Broyden “bad” update (12) is used together with defining as the initial approximation, it is possible to derive an alternative interpretation of (16). Although this update is known to have worse numerical behavior when compared with the “good” update (10), this interpretation can result in a more precise implementation, which is described in Lemma 2.

###### Lemma 2.

Assume that and is the approximation of constructed by updates (12) using initial approximation . Given , the computation of is equivalent to the solution of

 Jk−ℓr=v+⎡⎢ ⎢ ⎢ ⎢⎣00ℓ∑i=1αi(Zk−ℓsk−i,x+Xk−ℓsk−i,z−yk−i,μ)⎤⎥ ⎥ ⎥ ⎥⎦,

where , for .

###### Proof.

Using the expansion (11) of Broyden “bad” update, the definition of and the fact that , we have that

 (17)

where the last equality comes from the definition of in (11), applied recursively. When , we assume that results in the identity matrix, therefore . Multiplying on the left on both sides of (17), we obtain

 Jk−ℓr=v+ℓ∑i=1αi(Jk−ℓsk−i−yk−i).

By Lemma 1 and definition (14), the first two components of are zero, for all , which demonstrates the lemma. ∎

Lemma 2 states that only the third component of the right hand side actually needs to be changed in order to compute Broyden “bad” quasi-Newton steps at iteration . This structure is very similar to corrector or multiple centrality correctors in IPMs and reinforce the argument that the cost of computing a quasi-Newton step is lower than the Newton step. It is important to note that scalars are the same as the ones computed at step 1 of Algorithm 1.

### 3.3 Dealing with regularization

Rank-deficiency of , near singularity of or the lack of strict complementarity at the solution may cause matrix , the augmented system or the normal equations to become singular near the solution of (1). As the iterations advance, it becomes harder to solve the linear systems. Regularization techniques address this issue by adding small perturbations to in order to increase numerical accuracy and convergence speed, without losing theoretical properties. A common approach is to interpret the perturbation as the addition of weighted proximal terms to the primal and dual formulations of (1). Saunders and Tomlin [21] consider fixed perturbations while Altman and Gondzio [22] consider dynamic ones, computed at each iteration. Friedlander and Orban [23] add extra variables to the problem, expand the unreduced system and, after an initial reduction, arrive in a regularized system similar to [22]. In all these approaches, given reference points and , the regularized matrix

 J(x,λ,z)=⎡⎢⎣−Q−RpATIARd0Z0X⎤⎥⎦, (18)

where diagonal matrices and represent primal and dual regularization, respectively, can be viewed as the Jacobian of the following function

 ^F(x,λ,z)=⎡⎢ ⎢⎣ATλ−Qx−Rp(x−^x)−cAx+Rd(λ−^λ)−bXZe⎤⎥ ⎥⎦.

Any choice is possible for reference points and . However, in order to solve the original Newton system (5) and make use of the good properties of the regularization (18) at the same time, they are usually set to the current iteration points and , respectively, which annihilates terms and on the right hand side of (5) during affine scaling steps.

Matrix given by (18) now depends on and in addition to and . The regularization terms and do not need to be considered as variables, but if new regularization parameters are used, a new factorization or preconditioner needs to be computed. Since this is one of the most expensive tasks of the IP iteration, during quasi-Newton step the regularization parameters are not allowed to change from those selected at iteration , where the initial approximation was selected. That is a reasonable decision, as the system that is actually being solved in practice has the coefficient matrix from iteration . The fact that the regularization terms are linear in implies, by Lemma 1, that the structure of (18) is maintained during least change secant updates.

The reference points have no influence in , but they do influence the function . Suppose, as an example, that , i.e., the initial approximation for quasi-Newton is the Jacobian at the starting point , and only quasi-Newton steps are taken in the interior point algorithm. If we use and as the reference points and the algorithm converges, the limit point could be very different from the true solution, as initial points usually are far away from the solution, especially for infeasible IPMs. If we update the reference points at each quasi-Newton iteration, as it is usually the choice in literature [22, 23], we eliminate their effect on the right hand side of (16) during affine scaling steps. By (7), is the Jacobian of a linear approximation of which interpolates and . As the regularization parameters are fixed during quasi-Newton iterations, the reference points can be seen as simple constant shifts on , with no effect on the Jacobian. Therefore, the only request is that has to be evaluated at points and using the same reference points, when calculating by (14). The effect of changing the reference points at each iteration in practice is the extra evaluation of at the beginning of iteration .

## 4 Implementation

The quasi-Newton approach can easily be inserted into an existing interior point method implementation. In this work, the primal-dual interior point algorithm HOPDM [24] was modified to implement the quasi-Newton approach. Algorithm 2 describes the steps of a conceptual quasi-Newton primal-dual interior point algorithm.

The most important element of Algorithm 2 is , the memory size of the low rank update, which controls if the iteration involves Newton or quasi-Newton steps. At step 2 several systems (16) might be solved, depending on the IPM used. HOPDM implements the strategy of multiple centrality correctors [7], which tries to maximize the step-size at the iteration. HOPDM also implements the regularization strategy (18). Note in (16) that we do not have to care how the systems are solved, only how to implement the matrix-vector multiplication efficiently.

Step 2 is the most important step in a quasi-Newton IP algorithm, since it decides whether or not quasi-Newton steps will be used in the next iteration. Several possible strategies are discussed in this section, as well as some implementation details.

Bound constraints

 l≤x≤u,l,u∈Rn

can be considered in the general definition (1) of a quadratic programming problem by using slack variables. HOPDM explicitly deals with bound constraints and increases the number of variables to . When bound constraints are considered, function is given by

 F(x,t,λ,z,w)=⎡⎢ ⎢ ⎢ ⎢ ⎢ ⎢⎣ATλ−Qx+z−w−cAx−bx+t−uXZeTWe⎤⎥ ⎥ ⎥ ⎥ ⎥ ⎥⎦

and the Jacobian is

 J(x,t,λ,z,w)=⎡⎢ ⎢ ⎢ ⎢ ⎢ ⎢⎣−Q0ATI−IA0000II000Z00X00W00T⎤⎥ ⎥ ⎥ ⎥ ⎥ ⎥⎦.

Note that, in this case, is eliminated by proper shifts, represents upper shifted constraints and represents slacks. All the results and discussions considered so far can be easily adapted to the bound-constrained case. Therefore, in order to keep notation simple, we will refer to the more general and simpler formulation (1) and work in the -dimensional space.

### 4.1 Storage of Hk and computation of Hkv

When solving quadratic problems, the Jacobian of function used in a primal-dual interior point method is not expensive to compute and has an excellent structure, which can be efficiently explored by traditional approaches. Therefore, there is no point in explicitly building approximation matrix (or ) since, by Lemma 1, they would be denser. For an efficient implementation of the algorithm only the computation has to be performed in (16). To accomplish this task, we store

• Initial approximation and

• Triples or , , if updates are based on Broyden “good” or “bad” method, respectively.

In order to store we have to store vectors and , since all other blocks of are constant. If regularization is being used, vectors and used at iteration are also stored. The reference points are not stored. The most important structure to store is the factorization or the preconditioner computed when solving (16) at iteration for the first time. Without this information, the computation of would have the same computational cost of using the true matrix . Data is stored at step 2 of Algorithm 2, whenever it has decided to store quasi-Newton information and .

Regarding the triples, they are composed of two -dimensional vectors and one scalar. Storing is the most expensive part in Broyden “bad” updates, since function has to be evaluated twice. In Broyden “good” updates the computation of is the most expensive, due to the computation of .

The implementation of an algorithm to compute depends on the selected type of low rank update. Algorithm 1 is an efficient implementation of the general Broyden “bad” update (12). If the structure described by Lemma 1 is being used, then all vector multiplications are performed before the solution of the linear system, as described by Algorithm 3. Both algorithms can be easily modified to use updates of the form in the generic update (13). The only changes are the storage of an extra vector and the computation of scalars at step 1. The implementation of the sparse update (15) is straightforward and there is no need to store extra information. Algorithm 3 uses a little extra computation, since vector is discarded after the computation of all . On the other hand, there is no need to store blocks , .

Algorithm 4 describes the steps to compute when Broyden “good” update (10) is considered. Note that a linear system is first solved, then a sequence of vector multiplications and additions is applied. The algorithm is simpler and more general than Algorithm 1, but it has to be called more often in an interior point algorithm: to compute the steps (step 2 in Algorithm 2) and to compute , needed to build (step 2 in Algorithm 2). Algorithm 4 is very general and can be easily modified to consider any least change secant update of the form (13) without extra storage requirements, although not necessarily in an efficient way.

### 4.2 Size of ℓ

The cost of computing increases as the quasi-Newton memory increases. In addition, it was observed that the quality of the approximation decreases when the quasi-Newton memory is large [17]. In our implementation of Algorithm 2, we also observed the decrease in the quality of the steps when is too large. The decrease of the barrier parameter for different bounds on is shown in Figure 1, for problem afiro, the smallest example in Netlib test collection. In this example, Newton steps were allowed after quasi-Newton iterations, where . The maximum of 200 iterations was allowed.

We can see that if the Jacobian is only evaluated once () then the method is unable to converge in 200 iterations. As the maximum memory is reduced, the number of iterations to convergence is also reduced. On the other hand, the number of (possibly expensive) Newton steps is increased. When , i.e., no quasi-Newton steps, the algorithm converges in 7 iterations. We take the same approach as [17] and define an upper bound on in the implementation of Algorithm 2. When this upper bound is reached, we set to 0, which, by (16), results in the computation of a Newton step. The verification is performed at step 2 of Algorithm 2. This approach is also known as quasi-Newton with restarts [25] and differs from usual limited-memory quasi-Newton [12], where only the oldest information is dropped.

### 4.3 The quasi-Newton steps

The behavior of consecutive quasi-Newton steps depicted in Figure 1 reminds us that it is important to use the true Jacobian in order to improve convergence of the method. However, we would like to minimize the number of times the Jacobian is evaluated, since it involves expensive factorizations and computations. Unfortunately, to use only the memory bound as a criterion to compute quasi-Newton steps is not a reasonable choice. When , for example, the algorithm converges in 110 iterations, but it spends around 60 iterations without any improvement. As the dimension of the problem increases, this behavior is getting even worse. We can also see that the choice is better for this problem, as the algorithm converges in 31 iterations, computing only two times the Cholesky factorization of the Jacobian.

The lack of reduction is related to small step-sizes and . Our numerical experience with quasi-Newton IP methods indicates that the quasi-Newton steps often are strongly attracted to the boundaries. The step-sizes calculated for directions originated from a quasi-Newton predictor-corrector strategy are almost always small and need to be fixed. Several strategies have been tried to increase the step-sizes of those steps:

1. Perturb complementarity pairs for which the relative component-wise direction magnitude

 |[Δxk]i|xkior|[Δzk]i|zki,i=1,…,n (19)

is high and then recompute quasi-Newton direction;

2. Use multiple centrality correctors [7];

3. Gentle reduction of on quasi-Newton iterations, selecting close to 1 in the predictor and corrector steps.

Note that the terms in (i) are the inverse of the maximum step-size allowed by each component.

The motivation of strategy (i) is the strong relation observed between components of the quasi-Newton direction which are too large with respect their associated variable and components which differ too much from the respective component of the Newton direction for the same iteration, i.e.,

 ∣∣[Δxk(N)−Δxk(QN)]i∣∣xkiand∣∣[Δzk(N)−Δzk(QN)]i∣∣zki,i=1,…,n. (20)

We display this relation in Figure 2(a) for one iteration on linear problem GE. Positive spikes represent the component-wise relative magnitude of quasi-Newton steps (19) for each component of variables and . The higher the spikes, the smaller the step-sizes are. Negative spikes represent the component-wise relative error between the Newton and quasi-Newton directions (20). The lower the spikes, the larger the relative difference between Newton and quasi-Newton components. To generate this figure, the problem was solved twice and, at the selected iteration, the Newton step and quasi-Newton step were saved. Only negative quasi-Newton directions were considered in the figure. It is possible to see in Figure 2(a) that very few components are responsible for the small step-sizes. Interestingly, most of those blocking components are associated with components of the quasi-Newton direction which differ considerably from the Newton direction. Unfortunately, numerical experiments show that the perturbation of variables or setting the problematic components to zero has the drawback of increasing the infeasibility and cannot be performed at every iteration.

To test the impact of each strategy on the quality of the steps, four linear programming problems were selected: afiro, GE, stocfor3 and finnis. The tests were performed as follows. Given an iteration of a problem, we run algorithm HOPDM allowing only Newton steps up to iteration . At iteration only one of each approach is applied: Newton step, quasi-Newton step, or one of the discussed strategies (i), (ii) or (iii). Only one affine-scaling predictor and one corrector were allowed, except for strategy (ii), where multiple centrality correctors were used at iteration . We repeated this procedure for from 2 up to the total number of iterations that the original version of HOPDM needed to declare convergence.

The average of the sum of the step-sizes for each problem and for each approach is shown in Table 1. We can see that quasi-Newton steps are considerably smaller than Newton steps. All improvement strategies are able to increase, on average, the sum of the step-sizes. Strategy (i) has the drawback of increasing the infeasibility and has a huge impact on the convergence of the algorithm. Strategy (iii) is simple and efficient to implement but has worse results when compared to strategy (ii), based on multiple centrality correctors. Strategy (ii) has the ability to improve quasi-Newton directions in almost all iterations and has the drawback of extra backsolves. Similar behavior was observed in [7]. The effect of strategy (ii) is shown in Figure 2(b). Step-sizes are increased, but the new quasi-Newton direction is slightly different from the Newton direction for the same step. Strategy (ii) was selected as the default one in our implementation.

In order to perform as few Newton steps as possible, step 2 of Algorithm 2 has to be carefully implemented. Clearly, the first basic condition to try a quasi-Newton step at iteration , , is to check if there is available memory to store it at iteration .

###### Criterion 1 (Memory criterion).

If .

Our experience shows that quasi-Newton steps should always be tried, since they are cheaper than Newton steps. This means that a quasi-Newton step is always tried (but not necessarily accepted) after a Newton step in the present implementation. As shown in Figure 1, using only Criterion 1 can lead to slow convergence and slow convergence is closely related to small step-sizes. Therefore, in addition to Criterion 1 we tested two criteria, which cannot be used together. In Section 5 we compare those different acceptance criteria.

###### Criterion 2 (α criterion).

If iteration is a quasi-Newton iteration and

 αkP+αkD≥εα.
###### Criterion 3 (Centrality criterion).

If iteration is a quasi-Newton iteration and

 xk+1Tzk+1≤εc(xkTzk).

## 5 Numerical results

Algorithm 2 was implemented in Fortran 77 as a modification of the primal-dual interior point algorithm HOPDM [24], release 2.45. The code was compiled using gfortran 4.8.5 and run in a Dell PowerEdge R830 powered with Red Hat Enterprise Linux, 4 processors Intel Xeon E7-4660 v4 2.2GHz and 512GB RAM. The modifications discussed in Sections 3 and 4 have been performed in order to accommodate the quasi-Newton strategy. The main stopping criteria have been set to Mehrotra and Li’s stopping criteria [7, 26]:

 μ1+|cTx|≤εopt,∥b−Ax∥1+∥b∥≤εP,∥c−ATλ−z∥1+∥c∥≤εD, (21)

where . By default, in HOPDM parameters are defined to , and is set to for linear problems and to for quadratic problems. In addition to (21), successful convergence is also declared when lack of improvement is detected and . Besides several performance heuristics, HOPDM implements the regularization technique [22] and the multiple centrality correctors strategy [7]. When solving systems with the unreduced matrix, sparse Cholesky factorization of normal equations or factorization of the augmented system is automatically selected on initialization. HOPDM also has a matrix-free [27] implementation for which the present approach is fully compatible.

According to Algorithm 2, once a quasi-Newton step is computed, it is used to build point . However, in practice, if such step is considered “bad”, it is also possible to discard it, setting , compute the exact Jacobian and perform the Newton step at this iteration. The idea is to avoid quasi-Newton steps which might degrade the quality of the current point. Preliminary experiments using linear programming problems from Netlib collection were performed, in order to test several possibilities for in Criterion 1 and to select between Criteria 2 and 3. In addition we also verified the possibility to reject quasi-Newton steps, instead of always accepting them. The selected combination uses and Criterion 3 with . Rejecting quasi-Newton steps has not led to reductions in the number of factorizations and has the drawback of more expensive iterations, therefore, the steps are always taken. As mentioned in Section 4, the multiple centrality correctors strategy (ii) is used to improve quasi-Newton directions.

A key comparison concerns the type of low rank update to be used. Three implementations were tested:

• General Broyden “bad” algorithm, described by Algorithm 1;

• Sparse Broyden “bad” algorithm, described by Algorithm 3 using update (15) inspired in Schubert’s update [20];

• General Broyden “good” algorithm, described by Algorithm 4.

Four test sets were used in the comparison: 96 linear problems from Netlib, 10 medium-sized linear problems from Maros-Mészáros misc library, 39 linear problems from the linear relaxation of Quadratic Assignment Problems (QAP) and 138 convex quadratic programming problems from Maros-Mészáros qpdata library. In order to compare algorithms in large test sets, performance profiles were used [28]. A problem is declared solved by an algorithm if the obtained solution satisfies (21). Number of factorizations or total CPU time are used as performance measures.

Using the default HOPDM values for (21), implementations U1, U2 and U3 are able to solve 269, 275 and 271 problems, respectively, out of 283. There were 19 problems in which at least one implementation did not solve. We relaxed the parameters in (21), multiplying them by a factor of , and solved the 19 problems again. The resulting performance profiles in numbers are shown in Table 2, using number of factorizations and CPU time as performance measures. The efficiency of an algorithm is the number of solved problems in which the algorithm spent the smallest number of factorizations (or the smallest amount of CPU time) among the compared algorithms. The robustness is the total number of problems solved.

We can see that update U2 solves 210 problems using the smallest number of factorizations and 137 problems using least CPU time, while U1 solves 177 and 126 and U3 solves 123 and 85, respectively. In addition, updates U2 and U3 are the most robust implementations, being able to solve 281 out of 283 problems. Therefore, U2 was used as the default update in this work. Update U2 has performed particularly well on quadratic problems, what explains the difference in efficiency between updates.

Based on the preliminary results, the default implementation of Algorithm 2, denoted qnHOPDM from now on, uses update U2 for solving (16) and computing the step, strategy (ii) to improve quasi-Newton directions and Criteria 1 and 3 to decide when to use quasi-Newton at step 2. By default, HOPDM uses multiple centrality correctors, which were shown to improve convergence of the algorithm [7]. We implemented two versions of Algorithm 2: with (qnHOPDM-mc) and without (qnHOPDM) multiple centrality correctors for computing Newton steps. Since we are using strategy (ii), multiple correctors are always used for quasi-Newton steps. Each implementation was compared against its respective original version: HOPDM-mc and HOPDM.

In the first round of tests only the QAP collection was excluded from the comparison, which gives 244 problems from Netlib and from Maros-Mészáros linear and quadratic programming test collection. The performance profiles using number of factorizations and CPU time as performance measures are shown in Figure 3. Comparisons between the implementation of HOPDM without multiple centrality correctors and qnHOPDM are given by Figures 3(a) and 3(b). The comparison of implementations HOPDM-mc and qnHOPDM-mc is displayed in Figures 3(c) and 3(d).

Similarly to the previous comparison, using default parameters, 5 problems were not solved by qnHOPDM or HOPDM without multiple centrality correctors, while 7 problems were not solved by qnHOPDM-mc or HOPDM-mc. Criteria (21) was relaxed in the same way on these problems. Using this approach, HOPDM is able to solve all the 244 problems, qnHOPDM solves 242, HOPDM-mc solves 243 and qnHOPDM-mc solves 242. The quasi-Newton implementations are able to successfully reduce the number of factorizations, as shown in Figures 3(a) and 3(c). We can see in Figure 3(a) that from all 242 problems considered solved by qnHOPDM, in 237 it uses less factorizations than HOPDM without multiple centrality correctors. On the other hand, for about 150 problems, qnHOPDM uses at least twice as much CPU time as HOPDM (Figure 3(b)). The behavior of the implementations using multiple centrality correctors in the Newton step is similar, but HOPDM-mc has improved efficiency results. The problems where qnHOPDM reduces both factorizations and CPU time when compared to HOPDM without centrality correctors are highlighted in Table 3. The only problem which qnHOPDM-mc uses strictly less CPU time than HOPDM-mc is the quadratic programming problem cont-101.

Our last comparison considers 39 medium-sized problems from the QAP collection. These problems are challenging, since they are sparse, but their Cholesky factorization is very dense. Performance profiles were once more used for comparing the implementations. As the algorithm approaches the solution, the linear systems become harder to solve. Therefore, using default HOPDM values for parameters in (21) the number of problems solved is 21 (HOPDM), 31 (qnHOPDM), 25 (HOPDM-mc) and 35 (qnHOPDM-mc). Clearly the quasi-Newton approach benefits of using matrices that are not too close to the solution. From the 39 problems, 19 were solved again using relaxed parameters for the comparison between HOPDM and qnHOPDM, and 14 were solved again for the comparison between HOPDM-mc and qnHOPDM-mc. The results are shown in Figure 4. Quasi-Newton IPM is the most efficient and robust algorithm in terms of CPU time for both implementations, solving all 39 problems. Without multiple centrality correctors (Figure 4(a)), HOPDM has a poor performance and is not able to solve any problem using less CPU time than qnHOPDM. When multiple centrality correctors are allowed (Figure 4(b)), HOPDM-mc is able to solve only 10 problems using less or equal CPU time than qnHOPDM-mc.

Clearly, the efficiency of qnHOPDM is due to the decrease in the number of factorizations, as shown in Table 4. In this table we display the number of factorizations (F) and CPU time (CPUt) for each problem and each algorithm in all QAP test problems considered. When no multiple centrality correctors are allowed at Newton steps, qnHOPDM displays the biggest improvements, being the fastest solver in all problems. The results are more competitive when multiple centrality correctors are allowed, but qnHOPDM-mc was the most efficient in 29 problems while HOPDM-mc was the most efficient in 10 problems.