1 Introduction

A Frank-Wolfe Based Branch-and-Bound Algorithm

[1ex] for Mean-Risk Optimization

C. Buchheim, M. De Santis, F. Rinaldi, L. Trieu

Fakultät für Mathematik

TU Dortmund

Vogelpothsweg 87 - 44227 Dortmund - Germany

Institut für Mathematik

Universitätsstrasse 65-67, 9020 Klagenfurt - Austria

Dipartimento di Matematica

Via Trieste, 63 - 35121 Padova - Italy

e-mail (Buchheim): christoph.buchheim@tu-dortmund.de

e-mail (De Santis): marianna.desantis@aau.at

e-mail (Rinaldi): rinaldi@math.unipd.it

e-mail (Trieu): long.trieu@math.tu-dortmund.de

Abstract

We present an exact algorithm for mean-risk optimization subject to a budget constraint, where decision variables may be continuous or integer. The risk is measured by the covariance matrix and weighted by an arbitrary monotone function, which allows to model risk-aversion in a very individual way. We address this class of convex mixed-integer minimization problems by designing a branch-and-bound algorithm, where at each node, the continuous relaxation is solved by a non-monotone Frank-Wolfe type algorithm with away-steps. Experimental results on portfolio optimization problems show that our approach can outperform the MISOCP solver of CPLEX 12.6 for instances where a linear risk-weighting function is considered.

Keywords. mixed-integer programming, mean-risk optimization, global optimization

AMS subject classifications. 90C10, 90C57, 90C90

## 1 Introduction

We consider mixed-integer knapsack problems of the form

 maxc⊤y~{}s.t. a⊤y≤by≥0yi∈Z  ∀i∈I,

where  is the vector of non-negative decision variables, the index set specifies which variables have to take integer values. In many practical applications, the objective function coefficients  are uncertain, while  and  are known precisely. E.g., in portfolio optimization problems, the current prices  and the budget  are given, but the returns  are unknown at the time of investment. The robust optimization approach tries to address such uncertainty by considering worst-case optimal solutions, where the worst-case is taken over a specified set of probable scenarios called the uncertainty set  of the problem. Formally, we thus obtain the problem

 maxminc∈U c⊤y~{}s.t. a⊤y≤by≥0yi∈Z  ∀i∈I. (1)

The coefficients  may also be interpreted as random variables. Assuming a multivariate normal distribution, a natural choice for the set  is an ellipsoid defined by the means  and a positive definite covariance matrix  of . In this case, Problem (1) turns out to be equivalent (see, e.g., [3]) to the non-linear knapsack problem

 maxr⊤y−Ω√y⊤My~{}s.t. a⊤y≤by≥0yi∈Z  ∀i∈I, (2)

where the factor  corresponds to the chosen confidence level. It can be used to balance the mean and the risk in the objective function and hence to model the risk-aversion of the user. Ellipsoidal uncertainty sets have been widely considered in robust optimization [1, 2, 3].

In fact, mean-risk models such as (2) have been studied intensively in portfolio optimization, since Markowitz addressed them in his seminal paper dating back to 1952 [22]. Originally, the risk term was often given as  instead of , which generally leads to a different optimal balance between mean and risk. In our approach, we allow to describe the weight of the risk by any convex, differentiable and non-decreasing function . Typical choices for the function could be , yielding (2), or , which gives a convex MIQP problem. However, it may also be a reasonable choice to neglect small risks while trying to avoid a large risk as far as possible, this could be modeled by an exponential function

 h(t)={0 t≤γexp(t−γ)−(t−γ+1) t>γ.

In summary, our aim is to compute exact solutions for problems of the form

 maxr⊤y−h(√y⊤My)~{}s.t. a⊤y≤by≥0yi∈Z  ∀i∈I. (3)

### 1.1 Our contribution

The main contribution of this paper is an exact algorithm to solve Problem (3), i.e. a class of convex nonlinear mixed-integer programming problems. We propose a branch-and-bound method that suitably combines a Frank-Wolfe like algorithm [11] with a branching strategy already succesfully used in the context of mixed-integer programming problems (see [4] and references therein).

Our approach for solving the continuous relaxation in each subproblem (i.e. the problem obtained by removing the integrality constraints) exploits the simple structure of the feasible set of (3) as well as the specific structure of the objective function. It uses away-steps as proposed by Guélat and Marcotte [17] as well as a non-monotone line search.

Our motivation to choose a Frank-Wolfe like method is twofold. On the one hand, the algorithm, at each iteration, gives a valid dual bound for the original mixed-integer nonlinear programming problem, thus enabling fast pruning of the nodes in the branch-and-bound tree. On the other hand, the running time per iteration is very low, because the computation of the descent direction and the update of the objective function can be performed in an efficient way, as it will be further explained in the next sections. These two properties, along with the possibility of using warmstarts, are the key to a fast enumeration of the nodes in the branch-and-bound algorithm we have designed.

### 1.2 Organization of the paper

The remaining sections of the paper are organized as follows. In Section 2 we describe a modified Frank-Wolfe method to efficiently compute the dual bounds for the node relaxations. The section also includes an in-depth convergence analysis of the algorithm. In Section 3 we shortly explain the main ideas of our branch-and-bound algorithm, including the branching strategy, upper and lower bound computations and several effective warmstart strategies to accelerate the dual bound computation. In Section 4 we test our algorithm on real-world instances. We show computational results and compare the performances of our algorithm and of CPLEX 12.6 for different risk-weighting functions . Finally, in Section 5 we summarize the results and give some conclusions.

## 2 A modified version of the Frank-Wolfe method for the fast computation of valid dual bounds

A continuous convex relaxation of (the minimization version of) Problem (3), simply obtained by removing the integrality constraints in the original formulation, is the following:

 minh(√y⊤My)−r⊤y~{}s.t. a⊤y≤by≥0. (4)

By the transformation , Problem (4) becomes

 minf(x)=h(√x⊤Qx)−μ⊤x~{}s.t. 1⊤x≤1x≥0 (5)

where , and is the -dimensional vector with all entries equal to one. For the following, let denote the feasible set of (5).

In this section, we consider the Frank-Wolfe algorithm with away-steps proposed by Guélat and Marcotte [17], and define a non-monotone version for solving Problem (5). We also analyze its convergence properties. This algorithm is then embedded into our branch-and-bound framework.

The original method described in [17] uses an exact line search to determine, at a given iteration, the stepsize along the descent direction that yields the new iterate. When the exact line search is too expensive (i.e. too many objective function and gradient evaluations are required), different rules can be used for the stepsize calculation; see e.g. [12]. In particular, inexact line search methods can be applied to calculate the stepsize [10], such as the Armijo or Goldstein line search rules. Typical line search algorithms try a sequence of candidate values for the stepsize, stopping as soon as some well-defined conditions on the resulting reduction of the objective function value are met. Since the evaluation of the objective function at the trial points can be performed in constant time (see Section 2.3), line search methods are inexpensive in our context. Furthermore, from our numerical experience, using a non-monotone Armijo line search turned out to be the best choice in practice. With this choice, a stepsize that yields a (safeguarded) growth of the objective function can be accepted (see e.g. [13, 14, 15, 16]).

The outline of our approach is given in Algorithm 1. At each iteration , the algorithm first computes a descent direction, choosing among a standard toward-step and an away-step direction, as clarified in Section 2.2. Then, in case optimality conditions are not satisfied, it calculates a stepsize along the given direction by means of a non-monotone line search, see Section 2.3, updates the point, and starts a new iteration.

In Section 2.1, we will discuss how to decide whether the origin is an optimal solution of Problem (5). If this is not the case, we always choose a starting point better than the origin. The points produced at each iteration thus satisfy , so that and , where

 L(x0)={x∈Rn∣f(x)≤f(x0)}.

This is done in order to avoid obtaining the origin in any of the following iterations, as the objective function may not be differentiable in .

For the following, we summarize some important properties of Problem (5).

###### Lemma 1.

Assume that is not an optimal solution of Problem (5) and a point exists such that . Then,

• the set is compact;

• the function is continuously differentiable in ;

• the function is Lipschitz continuous in ;

• the function is Lipschitz continuous in with Lipschitz constant , where is the Lipschitz constant of the function .

###### Proof.

For (a), it suffices to note that is a closed subset of the compact set , while (b) holds since . As is differentiable on the compact set , we obtain (c). Finally, to prove (d), let  denote the unique symmetric matrix satisfying . Then

 ∥∇f(x)∥=∥∥h′(∥Q1/2x∥)Qx∥Q1/2x∥−μ∥∥≤|h′(∥Q1/2x∥)|∥∥Q1/2Q1/2x∥Q1/2x∥∥∥+∥μ∥≤|h′(∥Q1/2x∥)|∥Q1/2∥+∥μ∥≤L√λmax(Q)+∥μ∥.

In particular, it follows from (d) that is uniformly continuous in .

### 2.1 Checking optimality in the origin

A first difficulty in dealing with Problem (5) arises from the fact that the objective function may not be differentiable in the origin . We thus aim at checking, in a first phase of our algorithm, whether the origin is an optimizer of Problem (5). If so, we are done. Otherwise, our strategy is to avoid the origin as an iterate of our algorithm, as discussed in more detail in the following sections.

Since Problem (5) is convex, the origin is a global optimal solution if and only if there exists a subgradient  such that for all . From standard results of convex analysis (see e.g. Theorem 2.3.9 in Clarke [6]), we have that and we derive that

 ∂f(0)=h′(0)Q12B−μ,

where is the unit ball in . Thus is an optimal solution for Problem (5) if and only if

 ∃v∈B:∀x∈S:(h′(0)Q12v−μ)⊤x≥0. (6)

Since implies and  for all , Condition (6) is equivalent to

 ∃v∈B:h′(0)Q12v−μ≥0. (7)

Note that Condition (7) is never satisfied if , since and . Consequently, the origin is not an optimal solution of Problem (5) in this case. In general, Condition (7) allows to decide whether the origin is optimal by solving a convex quadratic optimization problem with non-negativity constraints, namely

 min||1h′(0)Q−12(y+μ)||~{}s.t. y≥0.

### 2.2 Computation of a feasible descent direction

For the computation of a feasible descent direction we follow the away-step approach described in [17]. At every iteration , we either choose a toward-step or an away-step. We first solve the following linearized problem (corresponding to the toward-step),

 ^xkTS=argmin∇f(xk)⊤(x−xk)~{}s.t. x∈S, (8)

and define as . The maximum stepsize that guarantees feasibility of the point chosen along is . Once the toward-step direction is computed, we consider the problem corresponding to the away-step,

 ^xkAS=argmax∇f(xk)⊤(x−xk)~{}s.t. x∈S,xi=0 if xki=0, (9)

and define as . In this case, the maximum stepsize guaranteeing feasibility is

 αAS=max{α≥0∣xk+αdkAS∈S}.

If , the point may become infeasible in case the non-negativity constraint on is violated. On the other hand, if , the point can only violate the constraint . Therefore, needs to be chosen as:

 αAS:=⎧⎪ ⎪ ⎪⎨⎪ ⎪ ⎪⎩xk^ı1−xk^ıif ^xkAS=e^ı,1−1⊤xk1⊤xkif ^xkAS=0.

Note that, according to this rule, may be an infeasible steplength. Note also that, in case the equality constraints are not enforced in Problem (9), could be trivially zero.

In order to choose between the two directions, we use a criterion similar to the one presented in [17]: if

 ∇f(xk)⊤dkAS≤∇f(xk)⊤dkTS%andαAS>β, (10)

with a suitably chosen constant value, we choose the away-step direction, setting and . Otherwise we select the toward-step direction, setting and . The condition is needed to ensure convergence, as will become clear in Section 2.4 below.

In both Problems (8) and (9), we need to optimize a linear function over a simplex. This reduces to computing the objective function value at each vertex of the simplex, i.e., for  and  in (8) and for  and all  with  in (9). Consequently, after computing the gradient , both solutions can be obtained at a computational cost of .

### 2.3 Computation of a suitable stepsize

When using exact line searches, the Frank-Wolfe method with away-steps converges linearly if the objective function satisfies specific assumptions; see e.g. [17, 21]. When an exact line search approach is too expensive, we combine the away-step approach with non-monotone inexact line searches. Even if the Frank-Wolfe method is not guaranteed to converge linearly in the latter case, it yields very good results in practice, as will be shown in the numerical experience section.

In the non-monotone line search used in our algorithm, a stepsize is accepted as soon as it yields a point which allows a sufficient decrease with respect to a given reference value. A classical choice for the reference value is the maximum among the last  objective function values computed, where  is a positive integer constant. See Algorithm 2 for the details of our line search method.

The maximum stepsize used in Line  of Algorithm 2 is set to if the toward-step direction is chosen; it is set to , otherwise.

The following result states that Algorithm 2 terminates in a finite number of steps. It can be proved using similar arguments as in the proof of Proposition 3 in [16].

###### Proposition 1.

For each , assume that . Then Algorithm 2 determines, in a finite number of iterations of the while loop in Lines 3–5, a stepsize such that

 f(xk+αkdk)≤¯fk+γ1αk∇f(xk)⊤dk−γ2(αk)2∥dk∥2.

From a practical point of view, it is important that the computation of the objective function values of the trial points  can be accelerated by using incremental updates. Therefore, during the entire algorithm for solving Problem (5), we keep the values , , and up-to-date. In the line search, if a toward-step is applied and , we exploit the fact that all expressions

 (xk+αdk)⊤Q(xk+αdk) = (1−α)2(xk)⊤Qxk+2α(1−α)(Qxk)^ı+α2Q^ı^ı μ⊤(xk+αdk) = (1−α)μ⊤xk+αμ^ı

can be computed in constant time. Similarly, for , we obtain

 (xk+αdk)⊤Q(xk+αdk) = (1−α)2(xk)⊤Qxk μ⊤(xk+αdk) = (1−α)μ⊤xk.

In particular, if  can be evaluated in constant time, the same is true for the computation of the objective value . Moreover, when the line search is successful and the next iterate is chosen, the same formula as above can be used to compute  and in constant time, while  can be updated in linear time using

 Q(xk+αdk)={(1−α)Qxk+αQ^ı⋅ if ^xk=e^ı(1−α)Qxk if ^xk=0.

The case of an away-step can be handled analogously.

In summary, after computing , , and from scratch, the computation of objective function values takes  time per iteration of Algorithm 2 – assuming that  can be evaluated in constant time – plus  time per iteration of Algorithm 1 (needed to keep the values of , , and up-to-date).

### 2.4 Convergence analysis of the non-monotone Frank-Wolfe algorithm

We now analyze the convergence properties of the non-monotone Frank-Wolfe algorithm NM-MFW with away-steps (Algorithm 1). All the proofs of the following theoretical results can be found in the Appendix.

###### Lemma 2.

Suppose that NM-MFW produces an infinite sequence . Then

• for all ;

• the sequence is non-increasing and converges to a value .

###### Proof.

For the proof, see Appendix.

###### Lemma 3.

Suppose that NM-MFW produces an infinite sequence . Then

 limk→∞f(xk)=limk→∞¯fk=¯f.
###### Proof.

For the proof, see Appendix.

###### Lemma 4.

Suppose that NM-MFW produces an infinite sequence . Then

 limk→∞∇f(xk)⊤dk=0.
###### Proof.

For the proof, see Appendix.

###### Theorem 1.

Let be the sequence of points produced by NM-MFW. Then, either an integer exists such that is an optimal solution for Problem (5), or the sequence  is infinite and every limit point is an optimal solution for Problem (5).

###### Proof.

For the proof, see Appendix.

We notice that, due to the use of the line search, there is no need to make any particular assumption on the gradient of the objective function (such as Lipschitz continuity) for proving the convergence of Algorithm NM-MFW.

### 2.5 Lower bound computation

When using Algorithm NM-MFW within a branch-and-bound framework as we will present in Section 3, the availability of valid dual bounds during the execution of NM-MFW can help to prune the current node before termination of the algorithm, and thus to decrease the total running time of the branch-and-bound scheme.

Considering Problem (5), we can define the following dual function [7, 19] for all :

 w(x):=minz∈S(f(x)+∇f(x)⊤(z−x)).

From the definition of  and taking into account the convexity of , we have the following weak duality result:

 w(x)≤f(x)+∇f(x)⊤(x⋆−x)≤f(x⋆), (11)

where  again denotes an optimal solution of Problem (5). We thus obtain a dual bound in each iteration for free, given by

 f(xk)+∇f(xk)⊤dk≤w(xk)=f(xk)+minz∈S∇f(xk)⊤(z−xk)=f(xk)+∇f(xk)⊤dkTS.

Note that this equation follows from how our direction is chosen, according to (10) (see Section 2.2 for further details). We can stop Algorithm NM-MFW as soon as  exceeds the current best upper bound in the branch-and-bound scheme. Furthermore, strong duality holds in (11) (in the sense that ); see e.g. [7] and the references therein.

## 3 Branch-and-Bound algorithm

In order to deal with integer variables in Problem (3), we embedded Algorithm 1 into a branch-and-bound framework. Aiming at a fast enumeration of the branch-and-bound tree, we follow the ideas that have been successfully applied in, e.g., [4]. In this section, we give a short overview over the main features of the branch-and-bound scheme.

### 3.1 Branching and enumeration strategy

At every node in our branch-and-bound scheme, we branch by fixing a single integer variable to one of its feasible values. The enumeration order of the children nodes is by increasing distance to the value of this variables in the solution of the continuous relaxation , computed by Algorithm 1. If the closest integer value to is , we thus consecutively fix to integer values , and so on. If the closest integer is , we analogously start with fixing to the integer value . By optimality of and by the fact that the problem is convex, the resulting lower bounds are non-decreasing when fixing to either increasing values greater than or decreasing values less than . In particular, when being able to prune a node, all siblings beyond this node can be pruned as well.

Once we arrive at level , all integer variables are fixed and the problem reduces to the purely continuous problem (4). We refer to [4] and the references therein for further details on the branching strategy.

### 3.2 Lower bounds after fixing

An advantage of branching by fixing variables as opposed to branching by splitting up variable domains is that the subproblems in the enumeration process of the search tree essentially maintain the same structure. Fixing a variable in Problem (4) just corresponds to moving certain coefficients from the matrix  to a linear or constant part under the square root, and from the vector  to a constant part outside the square root. More precisely, assume that the variables with indices in  have been fixed to values . The problem then reduces to the minimization of

 fs:Z|I|−|J|×Rn−|I|→R, x↦h(√x⊤Msx+c⊤sx+ds)−r⊤sx−ts (12)

over the feasible region , where the matrix is obtained by deleting the rows and columns corresponding to , the vector is obtained by deleting the columns corresponding to , and the remaining terms are updated appropriately.

Note that the relaxation of Problem (12) has a slightly more general form than the original Problem (4), since the data  and  may be non-zero as a result of fixing variables. However, the algorithm for solving Problem (4) discussed in Section 2 can easily be applied to the relaxation of Problem (12) as well, the only difference being in the computation of the gradient. In fact, in case at least one variable has been fixed to a non-zero value, we obtain  since . In particular, the objective function becomes globally differentiable in this case.

### 3.3 Upper bounds

As an initial upper bound in the branching tree, we use a simple heuristic, adapted from a greedy heuristic by Julstrom [20] for the quadratic knapsack problem. Analogously to the notation used in the theory of knapsack problems the profit ratio of an item is defined as the sum of all profits that one gains by putting item into the knapsack, divided by its weight. Transferred to our application, we have

 pi:=(h(√mii+2∑j≠imij)−ri)/ai

for all . Julstrom proposed to sort all items in a non-decreasing order with respect to  and, starting from the first item, successively set until the capacity of the knapsack is reached. The remaining variables are set to zero.

We adapt this algorithm by allowing multiple copies of each item, i.e. , where is the current capacity of the knapsack.

During the branch-and-bound enumeration, we do not use any heuristics for improving the primal bound, since the fast enumeration using a depth-first search usually leads to the early identification of good feasible solutions and hence to fast updates of the bound. Once all integer variables have been fixed, we compute the optimal solution of the subproblem in the reduced continuous subspace.

### 3.4 Warmstarts

With the aim of speeding-up our branch-and-bound scheme, we use a warmstart procedure by taking over information from the parent node. For this, let be the optimal solution in the parent node and define by removing the entry of  corresponding to the variable that has been fixed last. If is feasible for the current node relaxation, we always use it as a starting point for NM-MFW, otherwise we choose one of the following feasible points according to our chosen warmstarting rule:

• the first unit vector ;

• the projection of onto the feasible region;

• or the unit vector with .

The resulting warmstarting rules are denoted by (), (), and (), respectively. This notation is meant to emphasize that we either use or, if not possible, one of the other choices depending on the selected rule.

Note that the point can be computed by the algorithm originally proposed by Held et al. [18] that was recently rediscovered by Duchi et al. [9]. For the latter version the overall complexity has been proved to be . The unit vector  is chosen by again adapting ideas of the greedy heuristic by Julstrom [20]. It represents the vertex of where the potential increase of the objective function due to the remaining items is minimized, if setting .

## 4 Numerical experience

In order to investigate the potential of our algorithm FW-BB when applied to Problem (3), we implemented it in C++ and Fortran 90 and performed an extensive computational evaluation. As benchmark data set, we used historical real-data capital market indices from the Standard & Poor’s 500 index (S&P 500) that were used and made public by Cesarone et al. [5]. This data set was used for solving a Limited Asset Markowitz (LAM) model. For each of the 500 stocks the authors obtained 265 weakly price data, adjusted for dividends, from Yahoo Finance for the period from March 2003 to March 2008. Stocks with more than two consecutive missing values were disregarded. The missing values of the remaining stocks were interpolated, resulting in an overall of 476 stocks. Logarithmic weekly returns, expected returns and covariance matrices were computed based on the period March 2003 to March 2007.

By choosing stocks at random from the 476 available ones, we built mixed-integer portfolio optimization instances of different sizes. Namely, we built 10 problems with 100, 10 with 150 and 10 with 200 stocks, considering (so half of the variables are constrained to be integer). We considered three different values for , representing the budget of the investor, namely , , and , yielding a total of 90 instances.

All experiments were carried out on Intel Xeon processors running at 2.60 GHz. All running times were measured in cpu seconds and the time-limit was set to one cpu hour. In the following, we first present a numerical evaluation related to our algorithm FW-BB: we explore the benefits obtained from using the non-monotone line search and using warmstart alternatives. Then, we present a comparison of FW-BB with the MISOCP and the MIQP solver of CPLEX 12.6, for the two cases  and , respectively. Finally, to show the generality of our approach, we report the results of numerical tests for a non-standard risk-weighting function .

### 4.1 Benefits of the non-monotone line search and warmstarts

The NM-MFW-algorithm devised in Section 2 uses a non-monotone line search; in our implementation of FW-BB we set . In order to show the benefits of the non-monotone version of FW-BB we report in Table 1 a comparison between the non-monotone version (NM-FW-BB) and the monotone one (M-FW-BB), on instances with and budget constraint . We considered as warmstart choice. In Table 1 we report, for each dimension, the number of instances solved within the time limit (), the average running times (time), and the average numbers of iterations of NM-MFW in each node of the enumeration tree (it). All averages are taken over the set of instances solved within the time limit. Using the non-monotone line search, FW-BB is able to solve a greater number of instances within the time limit. Furthermore, NM-FW-BB gives in general better performance in terms of running times, while the number of iterations is very similar, showing the advantage of allowing stepsizes with a safeguarded growth of the objective function.

In order to investigate the benefits of the warmstart choices , , , we again ran the different versions of FW-BB on instances with and budget constraint . We compare the three warmstart possibilities presented above with the following alternatives: () always choose ; () always choose . In Table 2 we show the results related to the five different starting point choices. We can observe that the best choice among those considered, according to the number of instances solved within the time limit, is . We also observe that, when , choosing is better than considering or as starting points, highlighting the benefits of using warmstarts.

### 4.2 Comparison with Cplex 12.6

In this section, we present a numerical comparison on instances with and . We compare FW-BB with the MISOCP and the MIQP solver of CPLEX 12.6, respectively. Concerning FW-BB, we consider the two non-monotone versions, FW-BB-P and FW-BB-G, using and , respectively. We use an absolute optimality tolerance of for all algorithms.

#### Comparison on instances with h(t)=Ωt.

In order to compare FW-BB with CPLEX 12.6, we modeled (3) as an equivalent mixed-integer second-order cone program (MISOCP):

 −min{y−r⊤x:a⊤x≤b, Ω√x⊤Mx≤y, x≥0, xi∈Z, i=1,…,|I|, y∈R}.

We chose , where . The value of controls the amount of risk the investor is willing to take. In theory, can take any value in (0,1], where a small value implies a big weight on the risk-term and means that the risk is not taken into account. Numerical tests on single instances showed that any value of in (0,0.9] leads to the trivial optimal solution zero, i.e. not investing anything is the optimal decision for the investor. Therefore, we restricted our experiments to the three values of mentioned above.

In Table 3, we report for each algorithm the following data: numbers of instances solved within the time limit (), average running times (time), average numbers of branch-and-bound nodes (nodes). All averages are taken over the set of instances solved within the time limit. We show the computational results for the three different values of and . We can see that FW-BB suffers from an increasing right hand side , which however holds for CPLEX 12.6 as well, even to a larger extent. The choice of  does not significantly effect the performance of FW-BB, while CPLEX 12.6 performs better on instances with large . Altogether, we can observe that FW-BB-P is able to solve the largest number of instances within the time limit. When the number of solved instances is the same, both version of FW-BB outperform the MISOCP solver of CPLEX 12.6 in terms of cpu time. Note that the average number of branch-and-bound nodes in FW-BB is much larger than that needed by CPLEX 12.6. This highlights how solving the continuous relaxations by NM-FW-BB leads to a fast enumeration of the branch-and-bound nodes. Besides Table 3, we visualize our running time results by performance profiles in Figure 1, as proposed in [8]. They confirm that, in terms of cpu time, FW-BB-P outperforms the MISOCP solver of CPLEX 12.6 significantly.

In our experiments, we noticed that in some cases FW-BB and CPLEX provide slightly different minimizers, yielding slightly different optimal objective function values. While on certain instances the optimal solution of FW-BB is slightly superior to CPLEX, on other instances it is the other way round. We observed a relative difference from the best solution of the order of .

#### Comparison on instances with h(t)=Ωt2.

If we consider as risk-weighting function , Problem (3) reduces to a convex quadratic mixed-integer problem, and the objective function is differentiable everywhere in the feasible set. In Table 4 we report the comparison among FW-BB-P, FW-BB-G and the MIQP solver of CPLEX 12.6. We considered . All algorithms were able to solve all the instances very quickly. The MIQP solver of CPLEX 12.6 shows the best cpu times, although both versions of FW-BB are also very fast, even if they enumerate a higher number of nodes.

We would like to remark that our branch-and-bound algorithm does not exploit the (quadratic) structure of the objective function, since it is designed to solve a more general class of problems than MIQPs. Nevertheless, the algorithm gives competitive results also when dealing with those problems.

### 4.3 Results with a non-standard risk-weighting function

As a further experiment, we tested our instances considering a different risk-weighting function , namely

 hexp(t)={0 t≤γexp(t−γ)−(t−γ+1) t>γ,

such that the investor’s risk-aversion increases exponentially in the risk after exceeding a certain threshold value . In Table 5, we report the results of FW-BB-P, considering three choices . We observe that for both  and our algorithm FW-BB-P is able to solve all instances within the time limit, and that instances get more difficult for FW-BB-P with increasing .

In order to investigate the influence of the risk-weighting function on the optimal solution, we compared different functions for an instance of dimension under the constraint . The results are given in Table 6. We report, for each risk-weighting function depending on a specific risk parameter (risk-par), the objective function value obtained (obj), the value of the return term in the objective function evaluated at the optimal solution (), the number of non-zero entries in the optimal solution (), and the maximal entry in the optimal solution ().

Not surprisingly, the results show that a larger weight on the risk-term leads to a smaller expected return in the optimal solution. At the same time, a large weight on the risk favors a diversified portfolio, so that the number of non-zeros increases with the weight on the risk, at the same and the maximal amount invested into a single investment decreases. However, the precise dependencies are defined by the function . In Figure 2, we show contour plots for the different types of functions  considered here.

## 5 Conclusions

We presented a branch-and-bound algorithm for a large class of convex mixed-integer minimization problems arising in portfolio optimization. Dual bounds are obtained by a modified version of the Frank-Wolfe method. This is motivated mainly by two reasons. On the one hand, the Frank-Wolfe algorithm, at each iteration, gives a valid dual bound for the original mixed-integer problem, therefore it may allow an early pruning of the node. On the other hand, the cost per iteration is very low, since the computation of the descent direction and the update of the objective function can be performed in a very efficient way. Furthermore, the devised Frank-Wolfe method benefits from the use of a non-monotone Armijo line search. Within the branch-and-bound scheme, we propose different warmstarting strategies. The branch-and-bound algorithm has been tested on a set of real-world instances for the capital budgeting problem, considering different classes of risk-weighting functions. Experimental results show that the proposed approach significantly outperforms the MISOCP solver of CPLEX 12.6 for instances where a linear risk-weighting function is considered.

## 6 Appendix

Proof of Lemma 2:

###### Proof.

First note that the definition of  ensures  and hence  for all , which proves (i). For (ii), we have that

 ¯fk+1=max0≤i≤min{pnm,k+1}f(xk+1−i)≤max{¯fk,f(xk+1)}.

Since by the definition of the line search, we derive , which proves that the sequence is non-increasing. By (i), this sequence is bounded from below by the minimum of  on , which exists by Lemma 1, and hence converges.

Proof of Lemma 3:

###### Proof.

For each , choose with . We prove by induction that for any fixed integer we have

 limk→∞f(xtk−i)=limk→∞f(xtk)=limk→∞¯fk=¯f. (13)

Suppose at first . Then (13) follows from Lemma 2.

We now assume that (13) holds for and we prove that it holds for index . We have

 f(xtk−i)≤¯ftk−i−1+γ1αtk−i−1∇f(xtk−i−1)⊤dtk−i−1−γ2(αtk−i−1)2∥dtk−i−1∥2,

so that the same reasoning as before yields

 f(xtk−i)−¯ftk−i−1≤−γ2(αtk−i−1)2∥dtk−i−1∥2. (14)

The left hand side of (14) converges to zero since (13) holds for and the term converges to (by the inductive hypothesis), as well as because of Lemma 2 (and the fact that is bounded by ). Then,

 limk→∞(αtk−i−1)2∥dtk−i−1∥2=0,

so that . Again, uniform continuity of over yields (13) for index .

To conclude the proof, let and note that for any  we can write

 f(xk)=f(xTk)−Tk−k−1∑i=0(f(xTk−i)−f(xTk−i−1)).

Therefore, since the summation vanishes and converges to from Lemma 2, taking the limit for and observing  we obtain the result.

Proof of Lemma 4:

###### Proof.

First note that for all . Let be the stepsize used by NM-MFW at iteration . Then,

 ¯fk−f(xk+αkdk)≥γ1αk|∇f(xk)⊤dk|+γ2(αk)2∥dk∥2≥γ1αk|∇f(xk)⊤dk|≥0.

By Lemma 3, the left hand side converges to zero, hence

 limk→∞αk|∇f(xk)⊤dk|=0. (15)

Since  is continuously differentiable on the compact set  by Lemma 1 and  is bounded on , the sequence  is bounded. It thus suffices to show that any convergent subsequence of  converges to zero.

We assume by contradiction that a subsequence exists with

 limi→∞∇f(xki)⊤dki=−η<0.

Since the sequences  and  are bounded, we can switch to an appropriate subsequence and assume that and exist. From (15) we obtain

 limk→∞αk=0, (16)

and the continuity of the gradient in implies

 ∇f(¯x)⊤¯d=limk→∞∇f(xk)⊤dk=−