Quasi-Newton approaches to Interior Point Methods for quadratic problems
Abstract
Interior Point Methods (IPM) rely on the Newton method for solving
systems of nonlinear equations. Solving the linear systems which
arise from this approach is the most computationally expensive task
of an interior point iteration. If, due to the problem's inner
structure, there are special techniques for efficiently solving
these linear systems, IPMs enjoy fast convergence and are able to solve
large-scale optimization problems. It is tempting to try to replace
the Newton method by quasi-Newton methods. Quasi-Newton approaches
to IPMs either approximate the Lagrangian function for
nonlinear programming problems or provide an inexpensive
preconditioner. In this work we study the impact of using
quasi-Newton methods applied directly to the nonlinear system of
equations for general quadratic programming problems. The cost of
each iteration can be compared to the cost of computing correctors
in a usual interior point iteration. Numerical experiments show that
the new approach is able to reduce the overall number of matrix
factorizations and is suitable for a matrix-free implementation.
Keywords: Broyden Method, Quasi-Newton, Interior Point Methods, Matrix-free, Quadratic Programming Problems
1 Introduction
Let us consider the following general quadratic programming problem

(1) $\min_x \; \tfrac{1}{2} x^T Q x + c^T x \quad \text{s.t.} \quad Ax = b, \; x \ge 0,$

where $Q \in \mathbb{R}^{n \times n}$ is symmetric positive semidefinite, $A \in \mathbb{R}^{m \times n}$, $c \in \mathbb{R}^n$ and $b \in \mathbb{R}^m$. We will suppose that the rows of $A$ are linearly independent. Define the function $F : \mathbb{R}^{2n+m} \to \mathbb{R}^{2n+m}$ by

(2) $F(x, y, z) = \begin{pmatrix} Ax - b \\ Qx + c - A^T y - z \\ XZe \end{pmatrix},$

where $X$ and $Z$ are diagonal matrices defined by $X = \operatorname{diag}(x)$ and $Z = \operatorname{diag}(z)$, respectively, and $e$ is the vector of ones of appropriate size. First-order necessary conditions for (1) state that, if $x$ is a minimizer, then there exist $y \in \mathbb{R}^m$ and $z \in \mathbb{R}^n$, $z \ge 0$, such that $F(x, y, z) = 0$.
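As a concrete illustration, the first-order conditions can be evaluated directly. The sketch below (in Python with NumPy; the function name and the block ordering primal feasibility, dual feasibility, complementarity are our assumptions, chosen to be consistent with the standard primal-dual formulation) builds the residual for a tiny QP:

```python
import numpy as np

def kkt_residual(Q, A, c, b, x, y, z):
    """Evaluate F(x, y, z) for the QP min 0.5 x'Qx + c'x s.t. Ax = b, x >= 0.
    Block ordering (our assumption): primal feasibility, dual feasibility,
    complementarity."""
    r_p = A @ x - b                 # Ax - b
    r_d = Q @ x + c - A.T @ y - z   # Qx + c - A'y - z
    r_c = x * z                     # XZe, componentwise products
    return np.concatenate([r_p, r_d, r_c])

# Tiny example: minimize x1^2 + x2^2 - 2(x1 + x2) subject to x1 + x2 = 1
Q = np.diag([2.0, 2.0]); c = np.array([-2.0, -2.0])
A = np.array([[1.0, 1.0]]); b = np.array([1.0])
x = np.array([0.5, 0.5]); y = np.array([-1.0]); z = np.array([0.1, 0.1])
F = kkt_residual(Q, A, c, b, x, y, z)
```

At a true KKT point all blocks vanish; here the primal block is already zero while the dual and complementarity blocks are not.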
Primal-dual IPMs try to solve (1) by solving a sequence of relaxed constrained nonlinear equations in the form of

(3) $F(w) = (0, 0, \mu e)^T, \quad (x, z) \ge 0,$

where $w = (x, y, z)$ and $\mu > 0$ is called the barrier parameter, which is associated with the logarithmic barrier applied to the inequalities used to derive the method [1, 2]. As $\mu \to 0$, more importance is given to optimality over feasibility. Systems of type (3) are not easy to solve. When $\mu = 0$, they can be solved by general algorithms for bounded nonlinear systems [3, 4]. In this case, a suitable merit function, usually $\|F(w)\|$, has to be used to select the stepsizes. IPMs try to stay near the solutions of (3), which form the central path, and reduce $\mu$ at each iteration. Instead of solving (3) exactly, one step of the Newton method is applied. Thus, given an iterate $w^k = (x^k, y^k, z^k)$ in the interior of the bound constraints, i.e. $(x^k, z^k) > 0$, the next point is given by

(4) $w^{k+1} = w^k + (\alpha_P \Delta x, \alpha_D \Delta y, \alpha_D \Delta z),$

where $\Delta w = (\Delta x, \Delta y, \Delta z)$ is computed by solving Newton-like systems

(5) $J(w^k) \Delta w = r,$

where $r$ is an appropriate right-hand side and $J$ is the Jacobian of $F$, defined by

(6) $J(w) = \begin{pmatrix} A & 0 & 0 \\ Q & -A^T & -I \\ Z & 0 & X \end{pmatrix}.$

Standard predictor-corrector algorithms solve (5) twice: first the affine-scaling predictor is computed for $r = -F(w^k)$ and then the corrector step is computed using $r = -F(w^k) + (0, 0, \sigma \mu_k e)$, with $\sigma \in (0, 1)$ and $\mu_k = (x^k)^T z^k / n$. Additional correctors can be computed in one iteration to further accelerate convergence, such as second-order correctors [5] or multiple centrality correctors [6]. Scalars $\alpha_P$ and $\alpha_D$ are selected such that $x^{k+1} > 0$ and $z^{k+1} > 0$, respectively.
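The stepsize selection and the centering parameter mentioned above can be sketched as follows (the fraction-to-the-boundary damping value 0.9995 and the cubic centering heuristic are common choices from the predictor-corrector literature, not claims about this paper's exact rules):

```python
import numpy as np

def max_step(v, dv, tau=0.9995):
    """Largest alpha in (0, 1] keeping v + alpha * dv > 0; tau is the
    usual fraction-to-the-boundary damping (an assumed, standard value)."""
    neg = dv < 0
    if not np.any(neg):
        return 1.0
    return min(1.0, tau * np.min(-v[neg] / dv[neg]))

def centering_sigma(x, z, dx_aff, dz_aff, alpha_p, alpha_d):
    """Mehrotra-style centering parameter sigma = (mu_aff / mu)^3,
    a standard heuristic used here only as an illustration."""
    n = len(x)
    mu = (x @ z) / n
    mu_aff = ((x + alpha_p * dx_aff) @ (z + alpha_d * dz_aff)) / n
    return (mu_aff / mu) ** 3
```

A component moving away from its bound never restricts the step; only negative direction components do.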
The most expensive task during an interior point (IP) iteration is to solve (5). The coefficient matrix is known as the unreduced matrix and has dimension $2n + m$, but its nice structure allows efficient solution techniques to be used. The most common approaches for solving the linear system in IPMs are to work with the augmented system or the normal equations. If we eliminate $\Delta z$ in (5), we obtain the augmented system, which we can solve directly using matrix factorizations, or iteratively by Krylov subspace methods after computing adequate preconditioners. If matrix $Q$ is easily invertible, or $Q = 0$ (linear programming problems), it is possible to further eliminate $\Delta x$ and solve the normal equations by Cholesky factorization or by Conjugate Gradients, depending on the size of the problem. For both approaches it is known that computing good preconditioners or computing the factorization can be the most expensive part of the process. Therefore (5) can be solved several times for the same coefficient matrix with different right-hand sides, in a classical predictor-corrector approach [5] or in the multiple centrality correctors framework [7, 1]. In this work we will extensively use the fact that the backsolves in (5) are less expensive than computing a good preconditioner or factorization.
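For the linear programming case ($Q = 0$) the elimination down to the normal equations can be sketched as below (the residual sign convention follows one standard formulation and should be read as our assumption; the factorization of $A D A^T$ is the expensive object that one wants to reuse across right-hand sides):

```python
import numpy as np

def normal_equations_step(A, x, z, r_p, r_d, r_c):
    """Solve the Newton system for Q = 0 by elimination:
        A dx              = -r_p
          -A' dy -  dz    = -r_d
        Z dx      + X dz  = -r_c
    Eliminating dz and then dx gives (A D A') dy = rhs, D = X Z^{-1}."""
    d = x / z                            # diagonal of D
    M = (A * d) @ A.T                    # A D A' (factorization target)
    rhs = -r_p + A @ (d * r_d) + A @ (r_c / z)
    dy = np.linalg.solve(M, rhs)
    dx = d * (-r_d + A.T @ dy) - r_c / z
    dz = (-r_c - z * dx) / x
    return dx, dy, dz
```

Once `M` is factorized, predictor and corrector steps differ only in the residuals, so each extra solve is a cheap backsolve.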
Although $J$ is unsymmetric, under reasonable assumptions Greif, Moulding and Orban showed that it has only real eigenvalues [8]. Based on those results, Morini, Simoncini and Tani [9] developed preconditioners for the unreduced matrix and compared the performance of interior point methods using unreduced matrices and the augmented system. The unreduced matrix also has two more advantages when compared to the augmented system and the normal equations. First, small changes in the variables $x$ or $z$ result in small changes in $J$. Second, $J$ is the Jacobian of $F$, so it is possible to approximate it by building models or by evaluating $F$ at some extra points. These two characteristics are explored in this work.
Since $J$ is the Jacobian of $F$, it is natural to ask if it can be approximated by evaluating $F$ at some points. Function $F$ is composed of two linear functions and one nonlinear function. Therefore, the only part of $J$ which may change during iterations is the third row. Moreover, it can be efficiently stored by just storing $Q$, $A$, $x$ and $z$. Since computing and storing $J$ is inexpensive, the only reason to use an approximation of $J$ is if system (5), using the approximation instead of $J$, becomes easier to solve. That is where quasi-Newton methods and low rank updates become an interesting tool in interior point methods.
Quasi-Newton methods are well known techniques for solving large scale nonlinear systems or nonlinear optimization problems. The main motivation is to replace the Jacobian used by the traditional Newton method by a good and inexpensive approximation. Originally, they were useful to avoid computing the derivatives of the nonlinear function, but they have become popular as a large scale tool, since they usually do not need to explicitly build matrices and enjoy superlinear convergence. Classical references for quasi-Newton methods are [10, 11] for nonlinear equations and [12] for unconstrained optimization.
In the review [11] about practical quasi-Newton methods for solving nonlinear equations, Martínez suggests that there is room for studying such techniques in the interior point context. The author points to the work of Dennis Jr., Morshedi and Turner [13], which applies quasi-Newton techniques to make the projections in Karmarkar’s algorithm cheaper. The authors write the interpolation equations associated with the linear system in interior point iterations and describe a fast algorithm to compute updates and also to update an already existing Cholesky factorization. When solving general nonlinear programming problems by IPMs, a well known approach is to replace the Hessian of the Lagrangian function by low rank approximations [12].
In 2000, Morales and Nocedal [14] used quasi-Newton arguments to show that the directions calculated by the Conjugate Gradient algorithm can be used to build an automatic preconditioner for the matrix under consideration. The preconditioner is a sequence of rank-one updates of an initial diagonal matrix. Such an approach is efficient when solving a sequence of linear systems with the same (or a slowly varying) coefficient matrix. Based on those ideas, a limited memory BFGS-like preconditioner for positive definite matrices was developed in [15] and was specialized for symmetric indefinite matrices in [16]. Recently, Bergamaschi et al. [17] developed limited-memory BFGS-like preconditioners for KKT systems arising from IP iterations and described their spectral properties. The approach was able to reduce the number of iterations of the Conjugate Gradient algorithm, but the approximation deteriorates as the number of interior point iterations increases. Also, extra linear algebra has to be performed to ensure orthogonality of the vectors used to build the updates.
In all the works above, with the exception of [13], the main focus was to use low rank updates of an already computed preconditioner, such that new preconditioners are constructed in an inexpensive way and the overall number of linear algebra iterations is reduced. In the present work, our main objective is to work directly with the nonlinear equations and use low rank secant updates for computing the directions in the IP iterations. We use least change secant updates, in particular Broyden updates, and replace the Newton system (5) by an equivalent one. Some properties of the method are presented and extensive numerical experiments are performed. The main features of the proposed approach are:

- Low rank approximations are matrix-free and use only vector multiplications and additions;
- The quasi-Newton method for solving (5) can be easily inserted into an existing IPM;
- The number of factorizations is reduced for small and large instances of linear and quadratic problems;
- When the cost of the factorization is considerably higher than the cost of the backsolves, the total CPU time is also decreased.
In Section 2 we discuss the basic ideas of quasi-Newton methods, in particular the Broyden method, which is extensively used in this work. In Section 3 we show that, if the initial approximation is good enough, least change secant updates preserve most of the structure of the true coefficient matrix and a traditional IP iteration can be performed at the cost of computing correctors only. New low rank secant updates, which are able to exploit the sparsity of the Jacobian, are also discussed. In Section 4 we describe the aspects of a successful implementation of a quasi-Newton interior point method. In Section 5 we compare our approach with a research implementation of the primal-dual IPM for solving small and medium-sized linear and quadratic problems. Finally, in Section 6 we draw conclusions and mention possible extensions of the method.
Notation.
Throughout this work we use $F_k$ and $J_k$ as short versions of the vector $F(w^k)$ and the matrix $J(w^k)$, respectively. The vector $e$ denotes the vector of ones of appropriate dimension.
2 Background for quasi-Newton methods
Quasi-Newton methods can be described as algorithms which use approximations to the Jacobian in the Newton method in order to solve nonlinear systems. The approximations are generated using information from previous iterations. Suppose that we want to find $x^*$ such that $F(x^*) = 0$, where $F$ is continuously differentiable. Given the current point $x^k$ at iteration $k$, the Newton method builds a linear model of $F$ around $x^k$ in order to find $x^{k+1}$. Now, suppose that $x^k$ and $x^{k+1}$ have already been calculated and let us create a linear model $M_{k+1}$ for $F$ around $x^{k+1}$:

(7) $M_{k+1}(x) = F(x^{k+1}) + B_{k+1}(x - x^{k+1}).$

The choice $B_{k+1} = J(x^{k+1})$ results in the Newton method for iteration $k+1$. In secant methods, $B_{k+1}$ is constructed such that $M_{k+1}$ interpolates $F$ at $x^k$ and $x^{k+1}$, which gives us the secant equation

(8) $B_{k+1} s_k = y_k,$

where $s_k = x^{k+1} - x^k$ and $y_k = F(x^{k+1}) - F(x^k)$. When the dimension is greater than one, there are more unknowns than equations and several choices for $B_{k+1}$ exist [18, 11].

Let $B_k$ be the current approximation to $J(x^k)$, the Jacobian of $F$ at $x^k$ (it can be $J(x^k)$ itself, for example). One of the most often used simple secant approximations for unsymmetric Jacobians is given by the Broyden “good” method. Given $B_k$, a new approximation to $J(x^{k+1})$ is given by

(9) $B_{k+1} = B_k + \dfrac{(y_k - B_k s_k)\, s_k^T}{s_k^T s_k}.$

Matrix $B_{k+1}$ is the closest matrix to $B_k$, in the Frobenius norm, which satisfies (8). The update of the Broyden method belongs to the class of least change secant updates, since $B_{k+1}$ is a rank-one update of $B_k$. As we are interested in solving a linear system, it may be interesting to analyze the matrix $H_{k+1} = B_{k+1}^{-1}$, which is obtained by the well known Sherman-Morrison-Woodbury formula:

(10) $H_{k+1} = H_k + \dfrac{(s_k - H_k y_k)\, s_k^T H_k}{s_k^T H_k y_k},$

where $H_k = B_k^{-1}$. We can see that $H_{k+1}$ is also a least change secant update of $H_k$. To store $H_{k+1}$, one needs first to compute $H_k y_k$ and then store one scalar and two vectors. Storing $H_{k+1}$ is more efficient than storing $B_{k+1}$ when $H_{k+1}$ is going to be used more than once. According to (10), the cost of computing $H_{k+1} r$ is the cost of computing $H_k r$ plus one scalar product and one sum of a vector times a scalar. After $k$ updates of an initial approximation $H_0$, the current approximation $H_k$ is given by the recursive application of (10).
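A direct transcription of the “good” update and its inverse (a dense-matrix sketch for illustration only; function names are ours):

```python
import numpy as np

def broyden_good(B, s, y):
    """Broyden 'good' update: rank-one, least-change in the Frobenius
    norm, and the result satisfies the secant equation B_new s = y."""
    return B + np.outer(y - B @ s, s) / (s @ s)

def broyden_good_inverse(H, s, y):
    """Sherman-Morrison-Woodbury applied to the 'good' update: updates
    H = B^{-1} directly at the cost of H y, one scalar and two vectors."""
    Hy = H @ y
    return H + np.outer(s - Hy, H.T @ s) / (s @ Hy)
```

The inverse form avoids refactorizing: applying the updated inverse costs one application of the old inverse plus a scalar product and a saxpy.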
Instead of updating $B_k$ and then computing its inverse, the Broyden “bad” method directly computes the least change secant update of the inverse:

(11) $H_{k+1} = H_k + u_k y_k^T, \qquad u_k = \dfrac{s_k - H_k y_k}{y_k^T y_k},$

where $s_k$ and $y_k$ are given as before. Similarly to (10), $H_{k+1}$ given by (11) is the closest matrix to $H_k$, in the Frobenius norm, such that its inverse satisfies (8). The cost of storing (11) is lower than that of (10), since vectors $s_k$ and $y_k$ have already been computed. The cost of calculating the update is higher: it involves one scalar product, two sums of a vector times a scalar and the computation of $H_k y_k$. After $k$ updates of an initial approximation $H_0$, the current approximation is given by

(12) $H_k = H_0 + \sum_{j=0}^{k-1} u_j y_j^T.$

Approach (11) has some advantages over (10). First, it does not need products with the transpose of the current approximation for constructing the update, which can be a costly operation when the approximation is available only as an operator. Second, unlike (10), the vectors $y_j$ which multiply the updates depend solely on the secant pairs for all $j$, so it is possible to replace $H_0$ by a different matrix without updating the whole structure. This is suitable to be applied in a limited-memory scheme [16]. Third, the computation of $H_k r$ can be efficiently implemented in a scheme similar to the BFGS update described in [12], as we show in Algorithm 1. Unfortunately, the Broyden “bad” method is known to behave worse in practice than the “good” method [10]. To avoid the extra cost of working with the inverse in (10), it is common to compute a Cholesky or LU factorization of $B_k$ and work directly with (9), performing rank-one updates of the factorization, which can be efficiently implemented [19].
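A matrix-free application of the “bad” update in the spirit of Algorithm 1 (our function names; `solve_B0` stands for the backsolve with the stored factorization of the initial approximation):

```python
import numpy as np

def apply_H(solve_B0, pairs, r):
    """Compute H_l r using the expansion of (11): one backsolve with B_0,
    then one scalar product and one saxpy per stored pair (u_j, y_j)."""
    v = solve_B0(r)
    for u, yj in pairs:
        v = v + u * (yj @ r)
    return v

def append_pair(solve_B0, pairs, s, y):
    """Store u_j = (s_j - H_j y_j) / (y_j' y_j) for later applications;
    note H_j y_j itself is computed matrix-free."""
    Hy = apply_H(solve_B0, pairs, y)
    pairs.append(((s - Hy) / (y @ y), y))
```

By construction the newest secant equation $H_{j+1} y_j = s_j$ holds exactly after each `append_pair`.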
The class of rank-one least change secant updates can be generically represented by updates of the form

(13) $B_{k+1} = B_k + \dfrac{(y_k - B_k s_k)\, v_k^T}{v_k^T s_k},$

where $v_k^T s_k \neq 0$. Setting $v_k = s_k$ defines the Broyden “good” method and $v_k = B_k^T y_k$ defines the Broyden “bad” method. Several other well known quasi-Newton methods fit in update (13), such as the Symmetric Rank-1 update used in nonlinear optimization, which defines $v_k = y_k - B_k s_k$. See [18, 10] for details on least change secant updates.
3 A quasi-Newton approach for IP iterations
According to the general description of primal-dual IPMs in Section 1, we can see that, at each iteration, they perform one Newton step associated with the nonlinear system (3), for decreasing values of $\mu$. Each step involves the computation of the Jacobian of $F$ and the solution of a linear system (5).
Our proposal in this work is to perform one quasi-Newton step to solve (3), replacing the true Jacobian by a low rank approximation. The idea might seem surprising at first glance, since, for quadratic problems, the Jacobian is very cheap to evaluate. In this section we further develop the quasi-Newton ideas applied to interior point methods and show that they might help to reduce the cost of the linear algebra when solving (1).
It is important to note that $F$ and $J$ discussed in Section 2 will be given by (2) and (6), respectively, in the interior point context, which highlights the importance of using the unreduced matrix in our analysis. Therefore, the variable of Section 2 is given by $w = (x, y, z)$ and, consequently, $s_k = w^{k+1} - w^k$ and $y_k = F(w^{k+1}) - F(w^k)$.
3.1 Initial approximation and update
Suppose that $k$ is an interior point iteration for which system (5) was solved and $\Delta w$ was calculated, using any available technique. Usually, solving (5) involves an expensive factorization or the computation of a good preconditioner associated with $J(w^k)$. Most traditional quasi-Newton methods for general nonlinear systems compute the initial approximation by finite differences or use a diagonal matrix. According to Section 2, it is necessary to have an initial approximation in order to generate approximations of the Jacobian by low rank updates. Since $J(w^k)$ has already been computed, we will define $B_0 = J(w^k)$, i.e., the perfect approximation at iteration $k$. It is clear that, in such a case, $H_0 = J(w^k)^{-1}$ is the approximation to the inverse.
In order to compute the first update, vectors $s_k$ and $y_k$ in the secant equation (8) have to be built:

(14) $s_k = w^{k+1} - w^k, \qquad y_k = F(w^{k+1}) - F(w^k).$
The use of $B_0 = J(w^k)$ as the initial approximation ensures that the first two block elements of $y_k - B_0 s_k$ are zero. This is a well known property of low rank updates given by (13) when applied to linear functions (see [10, Ch. 8]). In Lemma 1 we show that rank-one secant updates maintain most of the good sparsity structure of the approximation when its structure is similar to the true Jacobian of $F$.
Lemma 1.
Proof.
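The key structural property can be checked numerically: because the first two block rows of $F$ are linear, the corresponding blocks of $y_k - J(w^k) s_k$ vanish, while the complementarity block generally does not. The sketch below uses our own (assumed, standard) block ordering:

```python
import numpy as np

def jacobian(Q, A, x, z):
    """Unreduced Jacobian with block rows ordered as (primal feasibility,
    dual feasibility, complementarity) -- an assumed, standard ordering."""
    n, m = len(x), A.shape[0]
    J = np.zeros((2 * n + m, 2 * n + m))
    J[:m, :n] = A                      # d(Ax - b)/dx
    J[m:m + n, :n] = Q                 # d(Qx + c - A'y - z)/dx
    J[m:m + n, n:n + m] = -A.T         # .../dy
    J[m:m + n, n + m:] = -np.eye(n)    # .../dz
    J[m + n:, :n] = np.diag(z)         # d(XZe)/dx
    J[m + n:, n + m:] = np.diag(x)     # d(XZe)/dz
    return J

def F(Q, A, c, b, x, y, z):
    return np.concatenate([A @ x - b, Q @ x + c - A.T @ y - z, x * z])

# y_k - J(w^k) s_k: the two linear blocks vanish, the third is dx * dz
Q = np.diag([2.0, 2.0]); c = np.array([-2.0, -2.0])
A = np.array([[1.0, 1.0]]); b = np.array([1.0])
x, y, z = np.array([0.5, 0.5]), np.array([-1.0]), np.array([0.1, 0.2])
dx, dy, dz = np.array([0.1, -0.2]), np.array([0.3]), np.array([0.05, -0.01])
s = np.concatenate([dx, dy, dz])
yk = F(Q, A, c, b, x + dx, y + dy, z + dz) - F(Q, A, c, b, x, y, z)
resid = yk - jacobian(Q, A, x, z) @ s
```

Only the complementarity block of the secant residual is nonzero, which is exactly what makes rank-one updates preserve the linear blocks.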
By Section 2 we know that the Broyden “good” and “bad” updates are represented by specific choices of $v_k$ and, therefore, enjoy the consequences of Lemma 1. Unfortunately, not much can be said about the structure of the “third row” of the updated matrix. When $v_k = s_k$, the diagonal structure of blocks $Z$ and $X$, as well as the zero block in the middle, are likely to be lost. However, if we select $v_k$ with zero components in the positions associated with the $y$ variables, then, by Lemma 1, the middle zero block is kept. The update given by this choice of $v_k$ is a particular case of Schubert’s quasi-Newton update for structured and sparse problems [20]. This update minimizes the distance to the current approximation on the space of the matrices that satisfy (8) and have the same block sparsity pattern as the true Jacobian [18]. Using the Sherman-Morrison-Woodbury formula, we also obtain the corresponding update for the inverse, which only needs one extra backsolve to be stored. There is no need to store $v_k$, since it is composed of components of $s_k$. We can say that this approach is inspired by the Broyden “good” update.
On the other hand, if we use components of $y_k$ to build the update vector, then the middle zero block is still kept by Lemma 1 and, in addition, we avoid products with the transpose in the inverse update. This approach is inspired by the Broyden “bad” update and results in the following update

(15)

To the best of the authors’ knowledge, this update has not been theoretically studied in the literature.
Lemma 1 also justifies our choice to work with approximations of the inverse rather than of the Jacobian itself. After rank-one updates, if the system were solved by factorizations and backsolves, it would be necessary to perform updates on the factorization of the initial matrix, which could introduce many nonzero elements. A clear benefit of defining $B_0 = J(w^k)$ is that applying $H_0$ uses the already calculated factorizations/preconditioners for $J(w^k)$, which were originally used to solve (5) at iteration $k$. Step 1 of Algorithm 1 is an example of low rank update (12). Clearly, we do not explicitly compute $H_0 r$, but instead solve the system $B_0 v = r$.
3.2 Computation of quasi-Newton steps
Having defined how quasi-Newton updates are initialized and constructed, we now have to insert the approximations in an interior point framework. Denoting $w^0$ as the starting point of the algorithm, at the end of any iteration it is possible to build a rank-one secant approximation of the unreduced matrix to be used at the next iteration. Let us consider iteration $k + l$, where $k$ is the iteration at which the initial approximation $B_0 = J(w^k)$ was built and $l \ge 0$ is the number of low rank updates performed since then. If $l = 0$, then, by the previous subsection, the step in the interior point iteration is the usual Newton step, given by (5). If $l > 0$, we have a quasi-Newton step, which can be viewed as a generalization of (5), and is computed by solving

(16) $B_l \Delta w = r,$

or, equivalently, by performing $\Delta w = H_l r$. All the other steps of the IPM remain exactly the same.
When $l > 0$, the cost of solving (16) depends on the type of update that is used. In general, it is the cost of solving a system with coefficient matrix $B_0$ (or, equivalently, computing a product with $H_0$) plus some vector multiplications and additions. However, since $B_0$ has already been the coefficient matrix of a linear system at iteration $k$, this is usually less expensive than solving such a system for the first time. That is one of the main improvements that a quasi-Newton approach brings to interior point methods.
When the Broyden “bad” update (12) is used together with $B_0 = J(w^k)$ as the initial approximation, it is possible to derive an alternative interpretation of (16). Although this update is known to have a worse numerical behavior than the “good” update (10), this interpretation can result in a more precise implementation, which is described in Lemma 2.
Lemma 2.
Assume that $l > 0$ and $H_l$ is the approximation of the inverse of the unreduced matrix constructed by updates (12) using the initial approximation $H_0 = B_0^{-1}$. Given $r$, the computation of $\Delta w = H_l r$ is equivalent to the solution of

$B_0 \Delta w = r + \sum_{j=0}^{l-1} \gamma_j\, B_0 u_j,$

where $\gamma_j = y_j^T r$, for $j = 0, \dots, l-1$.
Proof.
Using the expansion (12) of the Broyden “bad” update, the definition of the vectors $u_j$ and the fact that $H_0 = B_0^{-1}$, we have that

(17) $H_l r = H_0 r + \sum_{j=0}^{l-1} u_j (y_j^T r) = B_0^{-1} \Big( r + \sum_{j=0}^{l-1} (y_j^T r)\, B_0 u_j \Big),$

where the last equality comes from the definition of $u_j$ in (11), applied recursively. When $l = 0$, we assume that the sum is empty, therefore $\Delta w = B_0^{-1} r$. Multiplying both sides of (17) by $B_0$ on the left, we obtain

$B_0 H_l r = r + \sum_{j=0}^{l-1} \gamma_j\, B_0 u_j, \qquad \gamma_j = y_j^T r.$

By Lemma 1 and definition (14), the first two block components of $B_0 u_j$ are zero, for all $j$, which demonstrates the lemma. ∎
Lemma 2 states that only the third block component of the right-hand side actually needs to be changed in order to compute Broyden “bad” quasi-Newton steps. This structure is very similar to correctors or multiple centrality correctors in IPMs and reinforces the argument that the cost of computing a quasi-Newton step is lower than that of the Newton step. It is important to note that the scalars $\gamma_j$ are the same as the ones computed at step 1 of Algorithm 1.
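The equivalence can be verified numerically. In the sketch below (names are ours, not the paper's), `build_corrections` precomputes the vectors $c_j = B_0 u_j$ from the stored secant pairs, and the step is then obtained with a single backsolve against a modified right-hand side:

```python
import numpy as np

def build_corrections(B0, secant_pairs):
    """c_j = B_0 u_j via the 'bad' Broyden recursion:
    c_j = (B_0 s_j - y_j - sum_{i<j} c_i (y_i' y_j)) / (y_j' y_j)."""
    cs, ys = [], []
    for s, y in secant_pairs:
        c = B0 @ s - y
        for ci, yi in zip(cs, ys):
            c = c - ci * (yi @ y)
        cs.append(c / (y @ y))
        ys.append(y)
    return list(zip(cs, ys))

def step_via_modified_rhs(B0, corrections, r):
    """Solve B_0 dw = r + sum_j gamma_j c_j, gamma_j = y_j' r:
    one backsolve with the already-factorized B_0."""
    rhs = r.copy()
    for c, yj in corrections:
        rhs = rhs + c * (yj @ r)
    return np.linalg.solve(B0, rhs)
```

In the IPM setting the leading blocks of each $c_j$ vanish, so only the third block of the right-hand side changes, as in the lemma.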
3.3 Dealing with regularization
Rank-deficiency of $A$, near singularity of $J$ or the lack of strict complementarity at the solution may cause the matrix $J$, the augmented system or the normal equations to become singular near the solution of (1). As the iterations advance, it becomes harder to solve the linear systems. Regularization techniques address this issue by adding small perturbations to $J$ in order to increase numerical accuracy and convergence speed, without losing theoretical properties. A common approach is to interpret the perturbation as the addition of weighted proximal terms to the primal and dual formulations of (1). Saunders and Tomlin [21] consider fixed perturbations while Altman and Gondzio [22] consider dynamic ones, computed at each iteration. Friedlander and Orban [23] add extra variables to the problem, expand the unreduced system and, after an initial reduction, arrive at a regularized system similar to [22]. In all these approaches, given reference points $x_r$ and $y_r$, the regularized matrix

(18) $J_R(w) = \begin{pmatrix} A & R_d & 0 \\ Q + R_p & -A^T & -I \\ Z & 0 & X \end{pmatrix},$

where diagonal matrices $R_p$ and $R_d$ represent primal and dual regularization, respectively, can be viewed as the Jacobian of the following function

$F_R(x, y, z) = \begin{pmatrix} Ax - b + R_d (y - y_r) \\ Qx + c - A^T y - z + R_p (x - x_r) \\ XZe \end{pmatrix}.$

Any choice is possible for the reference points $x_r$ and $y_r$. However, in order to solve the original Newton system (5) and make use of the good properties of the regularization (18) at the same time, they are usually set to the current iteration points $x^k$ and $y^k$, respectively, which annihilates the terms $R_p(x - x_r)$ and $R_d(y - y_r)$ on the right-hand side of (5) during affine scaling steps.
Matrix $J_R$ given by (18) now depends on the regularization matrices in addition to $x$ and $z$. The regularization matrices $R_p$ and $R_d$ do not need to be considered as variables, but if new regularization parameters are used, a new factorization or preconditioner needs to be computed. Since this is one of the most expensive tasks of the IP iteration, during quasi-Newton steps the regularization parameters are not allowed to change from those selected at iteration $k$, where the initial approximation was built. That is a reasonable decision, as the system that is actually being solved in practice has the coefficient matrix from iteration $k$. The fact that the regularization terms are linear implies, by Lemma 1, that the structure of (18) is maintained during least change secant updates.
The reference points have no influence on $J_R$, but they do influence the function $F_R$. Suppose, as an example, that the initial approximation for quasi-Newton is the Jacobian at the starting point $w^0$, and only quasi-Newton steps are taken in the interior point algorithm. If we use $x^0$ and $y^0$ as the reference points and the algorithm converges, the limit point could be very different from the true solution, as initial points usually are far away from the solution, especially for infeasible IPMs. If we update the reference points at each quasi-Newton iteration, as is usually the choice in the literature [22, 23], we eliminate their effect on the right-hand side of (16) during affine scaling steps. By (7), the current approximation is the Jacobian of a linear model of $F_R$ which interpolates $F_R$ at the two most recent points. As the regularization parameters are fixed during quasi-Newton iterations, the reference points can be seen as simple constant shifts on $F_R$, with no effect on the Jacobian. Therefore, the only requirement is that $F_R$ has to be evaluated at both points using the same reference points when calculating $y_k$ by (14). The effect of changing the reference points at each iteration in practice is one extra evaluation of $F_R$ at the beginning of each iteration.
4 Implementation
The quasi-Newton approach can easily be inserted into an existing interior point method implementation. In this work, the primal-dual interior point algorithm HOPDM [24] was modified to implement the quasi-Newton approach. Algorithm 2 describes the steps of a conceptual quasi-Newton primal-dual interior point algorithm.
The most important element of Algorithm 2 is the memory size of the low rank update, which controls whether the iteration takes Newton or quasi-Newton steps. At step 2 several systems (16) might be solved, depending on the IPM used. HOPDM implements the strategy of multiple centrality correctors [7], which tries to maximize the stepsize at the iteration. HOPDM also implements the regularization strategy (18). Note in (16) that we do not have to care how the systems are solved, only how to implement the matrix-vector multiplication efficiently.
Step 2 is the most important step in a quasi-Newton IP algorithm, since it decides whether or not quasi-Newton steps will be used in the next iteration. Several possible strategies are discussed in this section, as well as some implementation details.
Bound constraints
can be considered in the general definition (1) of a quadratic programming problem by using slack variables. HOPDM explicitly deals with bound constraints and increases the number of variables accordingly. When bound constraints are considered, function $F$ is given by
and the Jacobian is
Note that, in this case, the lower bound is eliminated by proper shifts, one block represents the upper shifted constraints and another represents the slacks. All the results and discussions considered so far can be easily adapted to the bound-constrained case. Therefore, in order to keep the notation simple, we will refer to the more general and simpler formulation (1) and work in the $(2n + m)$-dimensional space.
4.1 Storage of the approximations and computation of $H_l r$
When solving quadratic problems, the Jacobian of the function $F$ used in a primal-dual interior point method is not expensive to compute and has an excellent structure, which can be efficiently exploited by traditional approaches. Therefore, there is no point in explicitly building the approximation matrices $B_l$ (or $H_l$) since, by Lemma 1, they would be denser. For an efficient implementation of the algorithm only the computation $\Delta w = H_l r$ has to be performed in (16). To accomplish this task, we store

- Initial approximation $B_0$ and
- Triples composed of one scalar and two vectors, depending on whether the updates are based on the Broyden “good” or “bad” method, respectively.
In order to store $B_0$ we have to store vectors $x^k$ and $z^k$, since all other blocks of the Jacobian are constant. If regularization is being used, the regularization parameters used at iteration $k$ are also stored. The reference points are not stored. The most important structure to store is the factorization or the preconditioner computed when solving (16) at iteration $k$ for the first time. Without this information, the computation of $H_l r$ would have the same computational cost as using the true matrix. Data is stored at step 2 of Algorithm 2, whenever it has been decided to store quasi-Newton information.
Regarding the triples, they are composed of two vectors of dimension $2n + m$ and one scalar. Storing a triple is the most expensive part in Broyden “bad” updates, since function $F$ has to be evaluated twice. In Broyden “good” updates the computation of the update vector is the most expensive, due to the extra application of the current inverse approximation.
The implementation of an algorithm to compute $\Delta w = H_l r$ depends on the selected type of low rank update. Algorithm 1 is an efficient implementation of the general Broyden “bad” update (12). If the structure described by Lemma 1 is being used, then all vector multiplications are performed before the solution of the linear system, as described by Algorithm 3. Both algorithms can be easily modified to use updates of the form given by the generic update (13). The only changes are the storage of an extra vector and the computation of the scalars at step 1. The implementation of the sparse update (15) is straightforward and there is no need to store extra information. Algorithm 3 uses a little extra computation, since an intermediate vector is discarded after the computation of all scalars. On the other hand, there is no need to store the corresponding blocks.
Algorithm 4 describes the steps to compute $\Delta w = H_l r$ when the Broyden “good” update (10) is considered. Note that a linear system is first solved, then a sequence of vector multiplications and additions is applied. The algorithm is simpler and more general than Algorithm 1, but it has to be called more often in an interior point algorithm: to compute the steps (step 2 in Algorithm 2) and to compute the vector needed to build the next update (step 2 in Algorithm 2). Algorithm 4 is very general and can easily be modified to consider any least change secant update of the form (13) without extra storage requirements, although not necessarily in an efficient way.
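In the same spirit (this is our sketch, not the paper's exact Algorithm 4), the “good” inverse update (10) admits the product form $H_l = (I + a_{l-1} s_{l-1}^T) \cdots (I + a_0 s_0^T) H_0$, so one backsolve followed by cheap rank-one corrections suffices:

```python
import numpy as np

def apply_H_good(solve_B0, pairs, r):
    """One backsolve with B_0, then rank-one corrections: pairs stores
    (a_j, s_j) with a_j = (s_j - H_j y_j) / (s_j' H_j y_j)."""
    v = solve_B0(r)
    for a, s in pairs:
        v = v + a * (s @ v)
    return v

def append_pair_good(solve_B0, pairs, s, y):
    """Needs one extra application of the current approximation
    to form H_j y_j before the pair can be stored."""
    Hy = apply_H_good(solve_B0, pairs, y)
    pairs.append(((s - Hy) / (s @ Hy), s))
```

This makes explicit why the routine is called both to compute steps and to build the next update.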
4.2 Size of the quasi-Newton memory
The cost of computing a quasi-Newton step increases as the quasi-Newton memory increases. In addition, it was observed in [17] that the quality of the approximation decreases when the quasi-Newton memory is large. In our implementation of Algorithm 2, we also observed a decrease in the quality of the steps when the memory is too large. The decrease of the barrier parameter $\mu$ for different bounds on the memory is shown in Figure 1, for problem afiro, the smallest example in the Netlib test collection. In this example, Newton steps were allowed only after the maximum number of consecutive quasi-Newton iterations had been taken. A maximum of 200 iterations was allowed.
We can see that if the Jacobian is only evaluated once then the method is unable to converge in 200 iterations. As the maximum memory is reduced, the number of iterations to convergence is also reduced. On the other hand, the number of (possibly expensive) Newton steps is increased. With no quasi-Newton steps at all, the algorithm converges in 7 iterations. We take the same approach as [17] and define an upper bound on the memory in the implementation of Algorithm 2. When this upper bound is reached, we set the memory counter to 0, which, by (16), results in the computation of a Newton step. The verification is performed at step 2 of Algorithm 2. This approach is also known as quasi-Newton with restarts [25] and differs from the usual limited-memory quasi-Newton methods [12], where only the oldest information is dropped.
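The restart rule can be sketched as a simple memory counter (the function name and the exact reset convention are our assumptions):

```python
def plan_steps(num_iters, max_memory):
    """Simulate which IP iterations take Newton steps (memory counter
    l == 0, fresh factorization) under the restart rule: on reaching
    the bound, l is reset to 0 and a true Newton step is taken next."""
    steps, l = [], 0
    for _ in range(num_iters):
        steps.append("newton" if l == 0 else "quasi-newton")
        l = 0 if l + 1 >= max_memory else l + 1
    return steps, steps.count("newton")
```

With a bound of 3, for instance, the pattern is one factorization followed by two cheap quasi-Newton iterations, repeated; a bound of 1 recovers the pure Newton method.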
4.3 The quasi-Newton steps
The behavior of consecutive quasi-Newton steps depicted in Figure 1 reminds us that it is important to use the true Jacobian from time to time in order to improve convergence of the method. However, we would like to minimize the number of times the Jacobian is evaluated, since this involves expensive factorizations and computations. Unfortunately, using only the memory bound as a criterion to compute quasi-Newton steps is not a reasonable choice. With a large memory bound, for example, the algorithm converges in 110 iterations, but it spends around 60 iterations without any improvement. As the dimension of the problem increases, this behavior becomes even worse. We can also see that an intermediate choice is better for this problem, as the algorithm converges in 31 iterations, computing the Cholesky factorization of the Jacobian only twice.
The lack of reduction is related to small stepsizes $\alpha_P$ and $\alpha_D$. Our numerical experience with quasi-Newton IP methods indicates that the quasi-Newton steps often are strongly attracted to the boundary. The stepsizes calculated for directions originating from a quasi-Newton predictor-corrector strategy are almost always small and need to be fixed. Several strategies have been tried to increase the stepsizes of those steps:

(i) Perturb complementarity pairs for which the relative componentwise direction magnitude

(19) $-\dfrac{\Delta v_i}{v_i}, \qquad \Delta v_i < 0,$

is high and then recompute the quasi-Newton direction;

(ii) Use multiple centrality correctors [7];

(iii) Gentle reduction of $\mu$ on quasi-Newton iterations, selecting $\sigma$ close to 1 in the predictor and corrector steps.
Note that the terms in (i) are the inverse of the maximum stepsize allowed by each component.
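A sketch of this quantity and its link to the maximum stepsize (only negative direction components can block the step toward the positivity boundary; the function name is ours):

```python
import numpy as np

def blocking_ratios(v, dv):
    """Componentwise relative direction magnitude: -dv_i / v_i where
    dv_i < 0, zero elsewhere; each ratio is the inverse of the largest
    step that component alone would allow."""
    ratios = np.zeros_like(v)
    neg = dv < 0
    ratios[neg] = -dv[neg] / v[neg]
    return ratios
```

The undamped maximum stepsize is the reciprocal of the largest ratio, so a single large ratio is enough to make the whole step tiny.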
The motivation for strategy (i) is the strong relation observed between components of the quasi-Newton direction which are too large with respect to their associated variables and components which differ too much from the respective components of the Newton direction for the same iteration, i.e., components with a large relative error

(20) $\dfrac{\lvert \Delta v_i^{\mathrm{qn}} - \Delta v_i^{\mathrm{n}} \rvert}{\lvert \Delta v_i^{\mathrm{n}} \rvert},$

where $\Delta v^{\mathrm{n}}$ and $\Delta v^{\mathrm{qn}}$ denote the Newton and quasi-Newton directions, respectively.
We display this relation in Figure 2(a) for one iteration on the linear problem GE. Positive spikes represent the componentwise relative magnitude (19) of the quasi-Newton step for each component of variables $x$ and $z$. The higher the spikes, the smaller the stepsizes. Negative spikes represent the componentwise relative error (20) between the Newton and quasi-Newton directions. The lower the spikes, the larger the relative difference between Newton and quasi-Newton components. To generate this figure, the problem was solved twice and, at the selected iteration, the Newton step and the quasi-Newton step were saved. Only negative quasi-Newton direction components were considered in the figure. It is possible to see in Figure 2(a) that very few components are responsible for the small stepsizes. Interestingly, most of those blocking components are associated with components of the quasi-Newton direction which differ considerably from the Newton direction. Unfortunately, numerical experiments show that perturbing the variables or setting the problematic components to zero has the drawback of increasing the infeasibility, so this cannot be performed at every iteration.
Figure 2: (a) Componentwise relative magnitude of the quasi-Newton step (19) and relative error with respect to the Newton direction (20) for one iteration on problem GE. (b) The same quantities after improving the quasi-Newton direction with multiple centrality correctors.
To test the impact of each strategy on the quality of the steps, four linear programming problems were selected: afiro, GE, stocfor3 and finnis. The tests were performed as follows. Given an iteration index, we run algorithm HOPDM allowing only Newton steps up to that iteration. At that iteration, exactly one approach is applied: a Newton step, a quasi-Newton step, or one of the discussed strategies (i), (ii) or (iii). Only one affine-scaling predictor and one corrector were allowed, except for strategy (ii), where multiple centrality correctors were used at that iteration. We repeated this procedure for every iteration index from 2 up to the total number of iterations that the original version of HOPDM needed to declare convergence.
The average of the sum of the stepsizes for each problem and each approach is shown in Table 1. We can see that quasi-Newton steps are considerably smaller than Newton steps. All improvement strategies are able to increase, on average, the sum of the stepsizes. Strategy (i) has the drawback of increasing the infeasibility, which strongly degrades the convergence of the algorithm. Strategy (iii) is simple and efficient to implement, but gives worse results than strategy (ii), based on multiple centrality correctors. Strategy (ii) is able to improve quasi-Newton directions in almost all iterations, at the cost of extra backsolves. Similar behavior was observed in [7]. The effect of strategy (ii) is shown in Figure 2(b). Stepsizes are increased, but the new quasi-Newton direction is still slightly different from the Newton direction for the same step. Strategy (ii) was selected as the default one in our implementation.
Table 1: Average sum of the stepsizes for each problem and each approach.

            Newton     Quasi-Newton   (i)   (ii)       (iii)
  afiro     1.826500   0.849070             1.065883   0.908283
  GE        0.911343   0.079197             0.264640   0.142124
  stocfor3  1.006294   0.089973             0.569176   0.386584
  finnis    1.454824   0.074488             0.452727   0.405455
In order to perform as few Newton steps as possible, step 2 of Algorithm 2 has to be carefully implemented. Clearly, the first basic condition to try a quasi-Newton step at a given iteration is to check whether there is available memory to store it at the next iteration.
Criterion 1 (Memory criterion).
If .
Our experience shows that quasi-Newton steps should always be tried, since they are cheaper than Newton steps. This means that a quasi-Newton step is always tried (but not necessarily accepted) after a Newton step in the present implementation. As shown in Figure 1, using only Criterion 1 can lead to slow convergence, and slow convergence is closely related to small stepsizes. Therefore, in addition to Criterion 1 we tested two further criteria, which cannot be used together. In Section 5 we compare these different acceptance criteria.
Criterion 2 ( criterion).
If iteration is a quasi-Newton iteration and
Criterion 3 (Centrality criterion).
If iteration is a quasi-Newton iteration and
5 Numerical results
Algorithm 2 was implemented in Fortran 77 as a modification of the primal-dual interior point algorithm HOPDM [24], release 2.45. The code was compiled using gfortran 4.8.5 and run on a Dell PowerEdge R830 running Red Hat Enterprise Linux, with 4 Intel Xeon E7-4660 v4 2.2GHz processors and 512GB of RAM. The modifications discussed in Sections 3 and 4 have been performed in order to accommodate the quasi-Newton strategy. The main stopping criteria have been set to Mehrotra and Li's stopping criteria [7, 26]:
(21) 
where . By default, in HOPDM the parameters are defined to , and is set to for linear problems and to for quadratic problems. In addition to (21), successful convergence is also declared when lack of improvement is detected and . Besides several performance heuristics, HOPDM implements the regularization technique of [22] and the multiple centrality correctors strategy [7]. When solving systems with the unreduced matrix, sparse Cholesky factorization of the normal equations or factorization of the augmented system is automatically selected on initialization. HOPDM also has a matrix-free [27] implementation, with which the present approach is fully compatible.
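The exact constants in (21) are HOPDM's defaults and are not reproduced here; as an illustration only, a Mehrotra-Li style stopping test of this general shape can be sketched as follows. The QP form min ½xᵀQx + cᵀx s.t. Ax = b, x ≥ 0 and the tolerance values are our assumptions, not HOPDM's:

```python
import numpy as np

def converged(A, b, c, Q, x, y, z, eps_p=1e-8, eps_d=1e-8, eps_o=1e-10):
    """Sketch of a Mehrotra-Li style stopping test: relative primal and dual
    infeasibilities and a relative complementarity gap below tolerances.
    Illustrative only; not HOPDM's actual criterion (21)."""
    # relative primal infeasibility ||Ax - b|| / (1 + ||b||)
    primal = np.linalg.norm(A @ x - b) / (1.0 + np.linalg.norm(b))
    # relative dual infeasibility ||Qx + c - A'y - z|| / (1 + ||c||)
    dual = np.linalg.norm(Q @ x + c - A.T @ y - z) / (1.0 + np.linalg.norm(c))
    # relative complementarity gap x'z / (1 + |primal objective|)
    pobj = 0.5 * x @ (Q @ x) + c @ x
    gap = (x @ z) / (1.0 + abs(pobj))
    return primal <= eps_p and dual <= eps_d and gap <= eps_o
```

A point satisfying the KKT conditions of the QP passes this test exactly; relaxing the tolerances, as done in the experiments below, amounts to scaling `eps_p`, `eps_d` and `eps_o`.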
According to Algorithm 2, once a quasi-Newton step is computed, it is used to build the point . However, in practice, if such a step is considered "bad", it is also possible to discard it, compute the exact Jacobian and perform a Newton step at this iteration. The idea is to avoid quasi-Newton steps which might degrade the quality of the current point. Preliminary experiments using linear programming problems from the Netlib collection were performed in order to test several possibilities for in Criterion 1 and to select between Criteria 2 and 3. In addition, we also verified the possibility of rejecting quasi-Newton steps, instead of always accepting them. The selected combination uses and Criterion 3 with . Rejecting quasi-Newton steps has not led to reductions in the number of factorizations and has the drawback of more expensive iterations; therefore, the steps are always taken. As mentioned in Section 4, the multiple centrality correctors strategy (ii) is used to improve quasi-Newton directions.
A key comparison concerns the type of low-rank update to be used. Three implementations, denoted U1, U2 and U3, were tested:
Four test sets were used in the comparison: 96 linear problems from Netlib (http://www.netlib.org/lp/data/), 10 medium-sized linear problems from the Maros-Mészáros misc library (http://old.sztaki.hu/~meszaros/public_ftp/lptestset/misc/), 39 linear problems from the linear relaxation of Quadratic Assignment Problems (QAP) (http://anjos.mgi.polymtl.ca/qaplib/inst.html) and 138 convex quadratic programming problems from the Maros-Mészáros qpdata library (http://old.sztaki.hu/~meszaros/public_ftp/qpdata/). In order to compare algorithms on large test sets, performance profiles were used [28]. A problem is declared solved by an algorithm if the obtained solution satisfies (21). The number of factorizations and the total CPU time are used as performance measures.
Using the default HOPDM values for (21), implementations U1, U2 and U3 are able to solve 269, 275 and 271 problems, respectively, out of 283. There were 19 problems which at least one implementation failed to solve. We relaxed the parameters in (21), multiplying them by a relaxation factor, and solved these 19 problems again. The resulting performance profile statistics are shown in Table 2, using the number of factorizations and the CPU time as performance measures. The efficiency of an algorithm is the number of solved problems on which the algorithm spent the smallest number of factorizations (or the smallest amount of CPU time) among the compared algorithms. The robustness is the total number of problems solved.
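Given a problems-by-solvers cost matrix (number of factorizations or CPU time, with failures marked as infinite cost), the efficiency and robustness figures used here can be computed as in the following sketch, following the performance-profile convention of [28]:

```python
import numpy as np

def efficiency_and_robustness(costs):
    """Per-solver efficiency and robustness from a (problems x solvers) cost
    matrix. Efficiency counts the problems on which a solver attains the
    smallest cost (ties count for every winner); robustness counts the
    problems it solves at all. np.inf marks a failure."""
    costs = np.asarray(costs, dtype=float)
    best = costs.min(axis=1, keepdims=True)       # per-problem best cost
    wins = (costs <= best) & np.isfinite(costs)   # winners on each problem
    efficiency = wins.sum(axis=0)
    robustness = np.isfinite(costs).sum(axis=0)
    return efficiency, robustness
```

For example, on the matrix `[[1, 2], [3, 3], [inf, 5]]` the first solver wins twice and solves two problems, while the second wins twice (once by tie, once by default) and solves all three.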
We can see that update U2 solves 210 problems using the smallest number of factorizations and 137 problems using the least CPU time, while U1 solves 177 and 126 and U3 solves 123 and 85, respectively. In addition, updates U2 and U3 are the most robust implementations, being able to solve 281 out of 283 problems. Therefore, U2 was used as the default update in this work. Update U2 performed particularly well on quadratic problems, which explains the difference in efficiency between the updates.
Table 2: Efficiency and robustness of implementations U1, U2 and U3.

        Efficiency                    Robustness
        Factorizations   CPU time
  U1    177              126          280
  U2    210              137          281
  U3    123               85          281
Based on these preliminary results, the default implementation of Algorithm 2, denoted qnHOPDM from now on, uses update U2 for solving (16) and computing the step, strategy (ii) to improve quasi-Newton directions, and Criteria 1 and 3 to decide when to use a quasi-Newton step at step 2. By default, HOPDM uses multiple centrality correctors, which were shown to improve the convergence of the algorithm [7]. We implemented two versions of Algorithm 2: with (qnHOPDMmc) and without (qnHOPDM) multiple centrality correctors for computing Newton steps. Since we are using strategy (ii), multiple correctors are always used for quasi-Newton steps. Each implementation was compared against its respective original version: HOPDMmc and HOPDM.
In the first round of tests only the QAP collection was excluded from the comparison, which leaves 244 problems from the Netlib and Maros-Mészáros linear and quadratic programming test collections. The performance profiles using the number of factorizations and the CPU time as performance measures are shown in Figure 3. Comparisons between the implementation of HOPDM without multiple centrality correctors and qnHOPDM are given in Figures 3(a) and 3(b). The comparison of implementations HOPDMmc and qnHOPDMmc is displayed in Figures 3(c) and 3(d).
Figure 3: Performance profiles using the number of Cholesky factorizations and the CPU time as performance measures: (a)-(b) without multiple centrality correctors; (c)-(d) with multiple centrality correctors.
Similarly to the previous comparison, using default parameters, 5 problems were not solved by qnHOPDM or HOPDM without multiple centrality correctors, while 7 problems were not solved by qnHOPDMmc or HOPDMmc. Criterion (21) was relaxed in the same way for these problems. Using this approach, HOPDM is able to solve all 244 problems, qnHOPDM solves 242, HOPDMmc solves 243 and qnHOPDMmc solves 242. The quasi-Newton implementations successfully reduce the number of factorizations, as shown in Figures 3(a) and 3(c). We can see in Figure 3(a) that, of the 242 problems considered solved by qnHOPDM, in 237 it uses fewer factorizations than HOPDM without multiple centrality correctors. On the other hand, for about 150 problems, qnHOPDM uses at least twice as much CPU time as HOPDM (Figure 3(b)). The behavior of the implementations using multiple centrality correctors in the Newton step is similar, but HOPDMmc has improved efficiency results. The problems where qnHOPDM reduces both the number of factorizations and the CPU time when compared to HOPDM without centrality correctors are highlighted in Table 3. The only problem on which qnHOPDMmc uses strictly less CPU time than HOPDMmc is the quadratic programming problem cont101.
Table 3: Number of factorizations (F) and CPU time (CPUt) on problems where qnHOPDM reduces both measures.

             HOPDM            qnHOPDM         HOPDMmc          qnHOPDMmc
             F     CPUt       F    CPUt       F     CPUt       F     CPUt
  dfl001     53    56.975     24              24    28.178     21    36.122
  marosr7    16     2.362      8              10     1.723      8     2.361
  pilot87    31     4.242     10              15     2.472     11     3.576
  cont101    11     1.138      5               9     1.255      5
  cont200     9     6.992      5               9     8.031     12    15.666
  dualc8    121     0.049      5              61     0.036     23     0.056
  hs35        8     0.025      3               7     0.022      3     0.023
  tame        5     0.021      2               5     0.020      2     0.021
Our last comparison considers 39 medium-sized problems from the QAP collection. These problems are challenging: they are sparse, but their Cholesky factorizations are very dense. Performance profiles were once more used to compare the implementations. As the algorithm approaches the solution, the linear systems become harder to solve. Therefore, using the default HOPDM values for the parameters in (21), the number of problems solved is 21 (HOPDM), 31 (qnHOPDM), 25 (HOPDMmc) and 35 (qnHOPDMmc). Clearly, the quasi-Newton approach benefits from reusing matrices that are not too close to the solution. Of the 39 problems, 19 were solved again using relaxed parameters for the comparison between HOPDM and qnHOPDM, and 14 were solved again for the comparison between HOPDMmc and qnHOPDMmc. The results are shown in Figure 4. The quasi-Newton IPM is the most efficient and most robust algorithm in terms of CPU time in both comparisons, solving all 39 problems. Without multiple centrality correctors (Figure 4(a)), HOPDM performs poorly and is not able to solve any problem using less CPU time than qnHOPDM. When multiple centrality correctors are allowed (Figure 4(b)), HOPDMmc is able to solve only 10 problems using CPU time less than or equal to that of qnHOPDMmc.
Clearly, the efficiency of qnHOPDM is due to the decrease in the number of factorizations, as shown in Table 4. In this table we display the number of factorizations (F) and the CPU time (CPUt) for each algorithm on all the QAP test problems considered. When no multiple centrality correctors are allowed at Newton steps, qnHOPDM displays the biggest improvements, being the fastest solver on all problems. The results are more competitive when multiple centrality correctors are allowed, but qnHOPDMmc was the most efficient on 29 problems, while HOPDMmc was the most efficient on 10.
Figure 4: Performance profiles on the QAP collection using CPU time: (a) HOPDM vs. qnHOPDM, without multiple centrality correctors; (b) HOPDMmc vs. qnHOPDMmc, with multiple centrality correctors.
Table 4: Number of factorizations (F) and CPU time (CPUt) on the QAP test problems.

             HOPDM            qnHOPDM         HOPDMmc          qnHOPDMmc
             F     CPUt       F    CPUt       F     CPUt       F     CPUt
  qap8       12    0.657      5    0.438