Bundle-Level Type Methods Uniformly Optimal
for Smooth and Nonsmooth Convex Optimization
^1 The paper is a combined version of the two manuscripts previously submitted to Mathematical Programming, namely: “Bundle-type methods uniformly optimal for smooth and nonsmooth convex optimization” and “Level methods uniformly optimal for composite and structured nonsmooth convex optimization”.
^† The author of this paper was partially supported by NSF grant CMMI-1000347, ONR grant N00014-13-1-0036 and NSF CAREER Award CMMI-1254446.
Abstract
The main goal of this paper is to develop uniformly optimal first-order methods for convex programming (CP). By uniform optimality we mean that the first-order methods themselves do not require the input of any problem parameters, but can still achieve the best possible iteration complexity bounds. By incorporating a multi-step acceleration scheme into the well-known bundle-level method, we develop an accelerated bundle-level (ABL) method, and show that it can achieve the optimal complexity for solving a general class of black-box CP problems without requiring the input of any smoothness information, such as whether the problem is smooth, nonsmooth or weakly smooth, or the specific values of the Lipschitz constant and smoothness level. We then develop a more practical, restricted-memory version of this method, namely the accelerated prox-level (APL) method. We investigate the generalization of the APL method for solving certain composite CP problems and an important class of saddle-point problems recently studied by Nesterov [Mathematical Programming, 103 (2005), pp. 127–152]. We present promising numerical results for these new bundle-level methods applied to certain classes of semidefinite programming (SDP) and stochastic programming (SP) problems.
Keywords: Convex Programming, Complexity, Bundle-level, Optimal methods
1 Introduction
Consider the convex programming (CP) problem
(1.1) $f^* := \min_{x \in X} f(x),$
where $X$ is a convex compact set and $f : X \to \mathbb{R}$ is a closed convex function. In the classic black-box setting, $f$ is represented by a first-order oracle which, given an input point $x \in X$, returns $f(x)$ and $f'(x) \in \partial f(x)$, where $\partial f(x)$ denotes the subdifferential of $f$ at $x$.
If $f$ is a general nonsmooth Lipschitz continuous convex function, then, by the classic complexity theory for CP nemyud:83, the number of calls to the first-order oracle for finding an $\epsilon$-solution of (1.1) (i.e., a point $\bar x \in X$ s.t. $f(\bar x) - f^* \le \epsilon$) cannot be smaller than $\mathcal{O}(1/\epsilon^2)$ when the dimension $n$ is sufficiently large. This lower complexity bound can be achieved, for example, by the simple subgradient descent or mirror descent method nemyud:83. If $f$ is a smooth function with Lipschitz continuous gradient, Nesterov in a seminal work Nest831 presented an algorithm with iteration complexity bounded by $\mathcal{O}(1/\sqrt{\epsilon})$, which, by nemyud:83, is also optimal for smooth convex optimization if $n$ is sufficiently large. Moreover, if $f$ is a weakly smooth function with Hölder continuous gradient, i.e., there exist constants $M > 0$ and $\rho \in (0,1)$ such that $\|f'(x) - f'(y)\| \le M \|x - y\|^{\rho}$ for all $x, y \in X$, then the optimal iteration complexity bound is given by $\mathcal{O}(1/\epsilon^{2/(1+3\rho)})$ (see NemNes851; Nest881; DeGlNe101).
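To make the interpolation between these regimes concrete, the following small computation (our own illustration, not part of the paper's algorithms) evaluates the exponent $2/(1+3\rho)$ of $1/\epsilon$ in the optimal bound, recovering the nonsmooth rate at $\rho = 0$ and the smooth rate at $\rho = 1$:

```python
def complexity_exponent(rho: float) -> float:
    """Exponent of 1/epsilon in the optimal bound O((1/epsilon)**(2/(1+3*rho)))."""
    assert 0.0 <= rho <= 1.0, "Holder exponent rho must lie in [0, 1]"
    return 2.0 / (1.0 + 3.0 * rho)


def iteration_bound(eps: float, rho: float, scale: float = 1.0) -> float:
    """Order-of-magnitude iteration count; 'scale' stands for the problem constants."""
    return (scale / eps) ** complexity_exponent(rho)


if __name__ == "__main__":
    for rho in (0.0, 0.5, 1.0):
        print(f"rho = {rho:.1f}: N(eps) ~ (1/eps)^{complexity_exponent(rho):.3f}")
```

At $\rho = 0$ the exponent is 2 (subgradient-type rate), at $\rho = 1$ it is 1/2 (Nesterov's smooth rate), and weakly smooth problems fall strictly in between.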
To accelerate the solution of large-scale CP problems, much effort has recently been directed to exploiting the problem’s structure, in order to identify some new classes of CP problems with stronger convergence performance guarantees. One such example is given by the composite CP problems with the objective function given by $f(x) = \Psi(g(x))$. Here $\Psi$ is a relatively simple nonsmooth convex function (see Subsection 4.1 for examples) and $g$ is an $m$-dimensional vector function, see Nest89; Nest04; Nest071; Nem94; LewWri091; Lan103; GhaLan122a; GhaLan101b. In most of these studies, the components of $g$ are assumed to be smooth convex functions. In this case, the iteration complexity can be improved to $\mathcal{O}(1/\sqrt{\epsilon})$ by properly modifying Nesterov’s optimal smooth method, see for example, Nest04; Nest071; Nem94. It should be noted that these optimal first-order methods for general composite CP problems are in a sense “conceptual”, since they require the minimization of the sum of a prox-function and the composition of $\Psi$ with an affine transformation Nest04. More recently, Nesterov Nest051 studied a class of nonsmooth convex-concave saddle point problems, where the objective function, in its basic form, is given by $f(x) = \max_{y \in Y} \langle A x, y \rangle.$
Here $Y$ is a convex compact set and $A$ denotes a linear operator. Nesterov shows that $f$ can be closely approximated by a certain smooth convex function and that the iteration complexity for solving this class of problems can be improved to $\mathcal{O}(1/\epsilon)$. It is noted in jnt08 that this bound is unimprovable, for example, if $Y$ is given by a Euclidean ball and the algorithm can only have access to $Ax$ and $A^T y$ (where $A^T$ denotes the adjoint operator of $A$). These problems were later studied in Nem051; Nest052; AuTe061; Nest061; pena081; LaLuMo111 and found many interesting applications, for example, in dbg081; Lu091; BeBoCa091.
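The smoothing idea can be seen on the simplest example $f(x) = |x| = \max_{|y| \le 1} xy$: subtracting a strongly concave term $\mu y^2/2$ inside the maximum yields the Huber function, a smooth approximation whose uniform error is at most $\mu/2$. The sketch below (our own toy illustration of the smoothing principle, not Nesterov's general scheme) checks this numerically:

```python
def f(x):
    # Nonsmooth model function: |x| = max over |y| <= 1 of x*y.
    return abs(x)

def f_mu(x, mu):
    # Smoothed counterpart max over |y| <= 1 of (x*y - mu*y**2/2):
    # the Huber function, smooth with gradient Lipschitz constant 1/mu.
    if abs(x) <= mu:
        return x * x / (2.0 * mu)
    return abs(x) - mu / 2.0

mu = 0.1
xs = [-2.0 + 4.0 * i / 400 for i in range(401)]
gap = max(abs(f(x) - f_mu(x, mu)) for x in xs)
print(gap)  # ~ mu/2 = 0.05 (up to rounding): uniform approximation error
```

Choosing $\mu$ of order $\epsilon$ trades the approximation error against the smoothness of $f_\mu$, which is the mechanism behind the improved $\mathcal{O}(1/\epsilon)$ complexity.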
The advantages of the aforementioned optimal first-order methods (e.g., the subgradient method or Nesterov’s method) mainly consist of their optimality, simplicity and cheap iteration cost. However, these methods share one shortcoming: each of them is designed for a particular subclass of CP problems (e.g., smooth or nonsmooth). In particular, nonsmooth CP algorithms usually cannot make use of the local smoothness properties that a nonsmooth instance might have, even though it is well known that a Lipschitz continuous function is differentiable almost everywhere in its domain. On the other hand, although it has been shown recently in Lan103 that Nesterov’s method, which was originally designed for smooth CP problems, is also optimal for nonsmooth optimization when employed with a properly specified stepsize policy (see also DeGlNe101 for a more recent generalization to weakly smooth CP problems), one still needs to determine some smoothness properties of $f$ (e.g., whether $f$ is smooth or not, i.e., $\rho = 1$ or $\rho = 0$, and the specific value of $M$), as well as some other global information (e.g., the diameter of the feasible set and, in some cases, the number of iterations), before actually applying these generalized algorithms. Since these parameters describe the structure of CP problems over a global scope, these types of algorithms are still inherently worst-case oriented.
To address these issues, we propose to study so-called uniformly optimal first-order methods. The key difference between uniformly optimal methods and existing ones is that they can achieve the best possible complexity for solving different subclasses of CP problems, while requiring little (preferably no) structural information for their implementation. To this end, we focus on a different type of first-order methods, namely the bundle-level (BL) methods. Evolving from the well-known bundle methods Kiw831; Kiw901; Lem75, the BL method was first proposed by Lemaréchal et al. LNN in 1995. In contrast to subgradient or mirror descent methods for nonsmooth CP, the BL method can achieve the optimal iteration complexity for general nonsmooth CP without requiring the input of any problem parameters. Moreover, the BL method and certain “restricted-memory” variants of it BenNem00; BenNem051; Rich071 often exhibit significantly superior practical performance to subgradient or mirror descent methods. However, to the best of our knowledge, the study of BL methods has so far focused on general nonsmooth CP problems only.
Our contribution in this paper mainly consists of the following aspects. Firstly, we consider a general class of black-box CP problems in the form of (1.1), where $f$ satisfies
(1.2) $f(y) - f(x) - \langle f'(x), y - x \rangle \le \frac{M}{1+\rho} \|y - x\|^{1+\rho}, \quad \forall x, y \in X,$
for some $M > 0$, $\rho \in [0,1]$ and $f'(x) \in \partial f(x)$. Clearly, this class of problems covers nonsmooth ($\rho = 0$), smooth ($\rho = 1$) and weakly smooth ($\rho \in (0,1)$) CP problems (see, for example, p. 22 of Nest04 for the standard arguments used in the smooth and weakly smooth cases, and Lemma 2 of Lan103 for a related result in the nonsmooth case). By incorporating into the BL method a multi-step acceleration scheme that was first used by Nesterov Nest831 and later in AuTe061; Lan103; LaLuMo111; Nest04; Nest051 to accelerate gradient type methods for solving smooth CP problems, we present a new BL-type algorithm, namely the accelerated bundle-level (ABL) method. We show that the iteration complexity of the ABL method can be bounded by
$\mathcal{O}(1) \left( \frac{M D_X^{1+\rho}}{\epsilon} \right)^{\frac{2}{1+3\rho}},$
where $D_X$ denotes the diameter of $X$.
Hence, the ABL method is optimal for solving not only nonsmooth, but also smooth and weakly smooth CP problems. More importantly, this method does not require the input of any smoothness information, such as whether a problem is smooth, nonsmooth or weakly smooth, or the specific values of the problem parameters $M$, $\rho$ and $D_X$. To the best of our knowledge, this is the first time that uniformly optimal algorithms of this type have been proposed in the literature.
Secondly, one problem with the ABL method is that, as the algorithm proceeds, its subproblems become more difficult to solve. As a result, each iteration of the ABL method becomes computationally more and more expensive. To remedy this issue, we present a restricted-memory version of this method, namely the accelerated prox-level (APL) method, and demonstrate that it can also uniformly achieve the optimal complexity for solving any black-box CP problems. In particular, each iteration of the APL method requires the projection onto the feasible set coupled with a few extra linear constraints, and the number of such linear constraints can be fully controlled. The basic idea of this improvement is to incorporate a novel rule due to Kiwiel Kiw951 (later studied by Ben-Tal and Nemirovski BenNem00; BenNem051) for updating the lower bounds and prox-centers. In addition, non-Euclidean prox-functions can be employed to exploit the geometry of the feasible set in order to obtain (nearly) dimension-independent iteration complexity.
Thirdly, we investigate the generalization of the APL method for solving certain classes of composite and structured nonsmooth CP problems. In particular, we show that with little modification, the APL method is optimal for solving a class of generalized composite CP problems with the objective given by $f(x) = \Psi(g_1(x), \ldots, g_m(x))$. Here the inner functions $g_i$, $i = 1, \ldots, m$, can be a mixture of smooth, nonsmooth, weakly smooth or affine components. Such a formulation covers a wide range of CP problems, including nonsmooth, weakly smooth, smooth, minimax, and regularized CP problems (see Subsection 4.1 for more discussion). The APL method can achieve the optimal iteration complexity for solving this class of composite problems without requiring any global information on the inner functions, such as their smoothness levels and the sizes of their Lipschitz constants. In addition, based on the APL method, we develop a completely problem-parameter-free smoothing scheme, namely the uniform smoothing level (USL) method, for solving the aforementioned class of structured CP problems with a bilinear saddle point structure Nest051. We show that this method can find an $\epsilon$-solution of these CP problems in at most $\mathcal{O}(1/\epsilon)$ iterations.
Finally, we demonstrate through preliminary numerical experiments that these new BL-type methods can be competitive with, and even significantly outperform, existing first-order methods for solving certain classes of CP problems. Observe that each iteration of BL-type methods involves the projection onto the feasible set $X$ coupled with a few linear constraints, while gradient-type methods only require the projection onto $X$. As a result, the iteration cost of BL-type methods can be higher than that of gradient-type methods, especially when the projection onto $X$ has an explicit solution. Here we would like to highlight a few interesting cases in which the application of BL-type methods would be preferred: (i) the major iteration cost lies not in the projection onto $X$, but in the computation of first-order information (e.g., involving an eigenvalue decomposition or the solution of another optimization problem); and (ii) the projection onto $X$ is as expensive as the projection onto $X$ coupled with a few linear constraints, e.g., when $X$ is a general polyhedron. In particular, we show that the APL and USL methods, when applied to certain important classes of semidefinite programming (SDP) and stochastic programming (SP) problems, can significantly outperform gradient-type algorithms, as well as some existing BL-type methods. The problems we tested include instances with a large number of decision variables.
The paper is organized as follows. In Section 2, we provide a brief review of the BL method and present the ABL method for black-box CP problems. We then study a restricted-memory version of the ABL method, namely the APL method, in Section 3. In Section 4, we investigate how to generalize the APL method for solving certain composite and structured nonsmooth CP problems. Section 5 is dedicated to numerical experiments conducted on certain classes of SDP and SP problems. Finally, some concluding remarks are made in Section 6.
2 The accelerated bundle-level method
We present a new BL-type method, namely the accelerated bundle-level (ABL) method, which can uniformly achieve the optimal rate of convergence for smooth, weakly smooth and nonsmooth CP problems. More specifically, we provide a brief review of the BL method for nonsmooth minimization in Section 2.1, and then present the ABL method and discuss its main convergence properties in Section 2.2. Section 2.3 is devoted to the proof of a major convergence result used in Section 2.2. Throughout this section, we assume that the Euclidean space $\mathbb{R}^n$ is equipped with the standard Euclidean norm $\|\cdot\|$ associated with the inner product $\langle \cdot, \cdot \rangle$.
2.1 Review of the bundle-level method
Given a sequence of search points $x_1, \ldots, x_k \in X$, an important construct, namely the cutting plane model of the objective function of problem (1.1), is given by
(2.1) $m_k(x) := \max \{ h(x_i, x) : 1 \le i \le k \},$
where
(2.2) $h(z, x) := f(z) + \langle f'(z), x - z \rangle.$
In the simplest cutting plane method CheGol59; Kelley60, we approximate $f$ by $m_k$ and update the search points according to
(2.3) $x_{k+1} \in \operatorname{Argmin}_{x \in X} m_k(x).$
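As a concrete illustration of the model (2.1)–(2.2) and the update (2.3), the following toy sketch runs the cutting plane method in one dimension, minimizing the model over a fine grid in place of the exact subproblem (the grid discretization and the test function are our own illustrative choices):

```python
import numpy as np

def kelley_step(f, fprime, pts, grid):
    """One step of a toy, grid-based cutting-plane (Kelley) method in 1-D.

    The model m_k(x) = max_i [f(x_i) + f'(x_i)(x - x_i)] underestimates the
    convex f; the next iterate minimizes m_k over the discretized feasible set.
    """
    model = np.max([f(p) + fprime(p) * (grid - p) for p in pts], axis=0)
    i = int(np.argmin(model))
    return float(grid[i]), float(model[i])   # next search point, lower bound

f = lambda x: x * x            # a smooth convex test function on X = [-1, 2]
fp = lambda x: 2.0 * x
grid = np.linspace(-1.0, 2.0, 2001)

pts = [2.0]                    # initial search point
for _ in range(20):
    x_next, lb = kelley_step(f, fp, pts, grid)
    pts.append(x_next)

ub = min(f(p) for p in pts)    # best objective value found
print(ub, lb)                  # upper and lower bounds sandwich f* = 0
```

Each model minimum also furnishes a valid lower bound on the optimal value, a fact exploited heavily by the level methods discussed next.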
However, this scheme converges slowly, both theoretically and practically nemyud:83; Nest04. Significant progress Kiw831; Kiw901; Lem75 was made under the name of bundle methods (see, e.g., HeRe00; OliSagSch111 for some important applications of these methods). In these methods, a prox-term is introduced into the objective function of (2.3) and the search points are updated by
$x_{k+1} = \operatorname{argmin}_{x \in X} \left\{ m_k(x) + \frac{\mu_k}{2} \|x - c_k\|^2 \right\}.$
Here, the current prox-center $c_k$ is a certain point chosen from $\{x_1, \ldots, x_k\}$, and $\mu_k$ denotes the current penalty parameter. Moreover, the prox-center for the next iterate, i.e., $c_{k+1}$, will be set to $x_{k+1}$ if $f(x_{k+1})$ is sufficiently smaller than $f(c_k)$; otherwise, $c_{k+1}$ will be the same as $c_k$. The penalty reduces the influence of the model $m_k$’s inaccuracy and hence the instability of the algorithm. Note, however, that the determination of $\mu_k$ usually requires certain online adjustments or line search. In the closely related trust-region technique Rusz06; linWri031, the prox-term is put into the constraints of the subproblem instead of its objective function and the search points are then updated according to
$x_{k+1} = \operatorname{argmin}_{x \in X} \left\{ m_k(x) : \|x - c_k\|^2 \le R_k \right\}.$
This approach encounters similar difficulties in determining the size of the trust-region radius $R_k$.
In an important work LNN, Lemaréchal et al. introduced the idea of incorporating level sets into the bundle method. The basic scheme of their bundle-level (BL) method consists of the following steps:

a) Update the upper bound $\bar f_k$ to be the best objective value found so far and compute a lower bound on $f^*$ by $\underline f_k = \min_{x \in X} m_k(x)$;

b) Set the level $\ell_k = \lambda \underline f_k + (1 - \lambda) \bar f_k$ for some $\lambda \in (0, 1)$;

c) Set $x_{k+1} = \operatorname{argmin}_{x \in X} \left\{ \|x - x_k\|^2 : m_k(x) \le \ell_k \right\}.$
Observe that step c) ensures that the new search point $x_{k+1}$ falls within the level set $\{x \in X : m_k(x) \le \ell_k\}$, while being as close as possible to $x_k$. We refer to $x_k$ as the prox-center, since it controls the proximity between $x_{k+1}$ and the aforementioned level set. It is shown in LNN that, if $f$ is a general nonsmooth convex function (i.e., $\rho = 0$ in (1.2)), then the above scheme can find an $\epsilon$-solution of (1.1) in at most
(2.4) $c(\lambda) \left( \frac{M D_X}{\epsilon} \right)^2$
iterations, where $c(\lambda)$ is a constant depending on $\lambda$ and
(2.5) $D_X := \max_{x, y \in X} \|x - y\|.$
In view of nemyud:83, the complexity bound in (2.4) is unimprovable for nonsmooth convex optimization. Moreover, it turns out that the level sets give a stable description of the objective function and, as a consequence, very good practical performance has been observed for the BL methods, e.g., LNN; BenNem00; lns11.
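The scheme in steps a)–c) can be sketched in a few lines. The toy below (our own grid-based illustration: the grid stands in for the two exact subproblems, and the function, grid size and $\lambda = 1/2$ are illustrative choices) runs the BL iterations on $f(x) = |x|$ over $X = [-1, 2]$:

```python
import numpy as np

f = lambda x: np.abs(x)                       # test function on X = [-1, 2], f* = 0
fp = lambda x: 1.0 if x >= 0 else -1.0        # a subgradient of |x|

grid = np.linspace(-1.0, 2.0, 3001)           # discretized feasible set
lam = 0.5                                     # level parameter lambda in (0, 1)
pts = [2.0]                                   # search points; prox-center = last one

for _ in range(30):
    cuts = np.array([f(p) + fp(p) * (grid - p) for p in pts])
    model = cuts.max(axis=0)                  # cutting-plane model m_k
    f_up = min(float(f(p)) for p in pts)      # step a): best upper bound
    f_low = float(model.min())                # step a): lower bound min_X m_k
    level = lam * f_low + (1.0 - lam) * f_up  # step b): the level ell_k
    feas = grid[model <= level]               # level set {x : m_k(x) <= ell_k}
    i = int(np.argmin(np.abs(feas - pts[-1])))
    pts.append(float(feas[i]))                # step c): project prox-center

print(f_up - f_low)                           # the gap shrinks toward 0
```

The quantity `f_up - f_low` is exactly the optimality gap tracked in the complexity analysis: the method terminates once it falls below the target accuracy.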
2.2 The ABL algorithm and its main convergence properties
Based on the bundle-level method, our goal in this subsection is to present a new bundle-type method, namely the ABL method, which can achieve the optimal complexity for solving any CP problem satisfying (1.2).
We introduce the following two key improvements over the classical BL methods. Firstly, rather than using a single sequence of search points, we employ three related sequences, used respectively to build the cutting-plane models (and hence the lower bounds), to compute the upper bounds, and to control the proximity. Moreover, the relations among these sequences are defined carefully: they are coupled through convex combinations governed by stepsize parameters. This type of multi-step scheme originated from the well-known accelerated gradient method of Nesterov for solving smooth CP problems Nest831. Secondly, we group the iterations performed by the ABL method into phases, and in each phase the gap between the lower and upper bounds on $f^*$ is reduced by a constant factor. It is worth noting that, although the convergence analysis of the BL method also relies on the concept of phases (see, e.g., BenNem00; BenNem051), the description of that method usually does not involve phases. However, we need to use phases explicitly in the ABL method in order to choose the stepsizes in an optimal way and thus achieve the best possible complexity bounds for solving problem (1.1).
We start by describing the ABL gap reduction procedure, which, for a given search point and lower bound on $f^*$, computes a new search point and an updated lower bound whose gap is smaller than the input gap by a constant factor $q \in (0, 1)$.
The ABL gap reduction procedure:

Set , , and . Also let and the cutting plane be arbitrarily chosen, say and . Let .

Update lower bound: set , , ,
(2.6) 
Update prox-center: set and
(2.7) 
Update upper bound: set , and choose such that ;

If , terminate the procedure with and ;

Set and go to Step 1.
We now add a few remarks about the above gap reduction procedure. Firstly, we say that an iteration of the procedure occurs whenever $k$ increases by 1. Observe that, if the stepsizes are all set to 1, then an iteration of the procedure is exactly the same as an iteration of the BL method; in fact, in this case the procedure reduces to one phase of the BL method as described in BenNem051; BenNem00. Secondly, with more general selections of the stepsizes, the iteration cost of the procedure is still about the same as that of the BL method: each iteration of the procedure involves the solution of two subproblems, i.e., (2.6) and (2.7), together with the corresponding calls to the first-order oracle, while the BL method requires the solution of two similar subproblems and the same type of oracle information. Thirdly, it can easily be seen that the quantities computed by the procedure are, respectively, valid lower and upper bounds on $f^*$. Indeed, by the definition of the cutting plane model in (2.2) and the convexity of $f$, we have
(2.8) 
which, in view of (2.6), then implies that Moreover, it follows from the definition of that Hence, denoting
(2.9) 
we have
(2.10) 
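For intuition, the lower-bound claim behind (2.8)–(2.10) follows from a one-line convexity argument; we record it here in the notation of (2.1)–(2.2) as a sketch of the standard reasoning, not a verbatim restoration of the displays:

```latex
\begin{align*}
  h(z, x) &= f(z) + \langle f'(z), x - z \rangle \le f(x)
    && \text{(convexity of $f$)}, \\
  m_k(x) &= \max_{1 \le i \le k} h(x_i, x) \le f(x)
    && \text{(maximum of underestimators)}, \\
  \min_{x \in X} m_k(x) &\le \min_{x \in X} f(x) = f^*
    && \text{(minimize both sides over $X$)}.
\end{align*}
```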
By showing how the quantity defined in (2.9) decreases with respect to $k$, we establish in Theorem 2.1 some important convergence properties of the gap reduction procedure. The proof of this result is somewhat involved and is hence provided separately in Section 2.3.
Theorem 2.1
Let and , , be given. Also let denote the optimality gap obtained at the th iteration of procedure before it terminates. Then for any , we have
(2.11) 
where is defined in (2.5), is the norm,
(2.12) 
(2.13) 
In particular, if and , , are chosen such that for some ,
(2.14) 
then the number of iterations performed by procedure can be bounded by
(2.15) 
Observe that, if the stepsizes are all set to 1, then, as mentioned earlier, the procedure reduces to a single phase (or segment) of the BL method, and hence its termination follows by slightly modifying the standard analysis of the BL algorithm. However, such a selection does not satisfy the conditions stated in (2.14) and thus cannot guarantee the termination of the procedure within the number of iterations given in (2.15). Below we discuss a few possible selections of the stepsizes that satisfy (2.14), in order to obtain the bound in (2.15). It should be pointed out that none of these selections rely on any problem parameters, such as $M$, $\rho$ and $D_X$.
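Since condition (2.14) is not reproduced here, we illustrate with a standard stepsize requirement used in accelerated multi-step schemes of this type, namely $\alpha_1 = 1$ and $(1 - \alpha_{k+1})/\alpha_{k+1}^2 \le 1/\alpha_k^2$; the classical parameter-free choice $\alpha_k = 2/(k+1)$ satisfies it, as the quick numerical check below (our own illustration) confirms:

```python
# A standard stepsize condition in accelerated (multi-step) schemes is
# alpha_1 = 1 and (1 - alpha_{k+1}) / alpha_{k+1}**2 <= 1 / alpha_k**2.
# The classical choice alpha_k = 2/(k+1) satisfies it, as checked below.

def alpha(k: int) -> float:
    return 2.0 / (k + 1)

assert alpha(1) == 1.0
for k in range(1, 1000):
    lhs = (1.0 - alpha(k + 1)) / alpha(k + 1) ** 2
    rhs = 1.0 / alpha(k) ** 2
    # Algebraically: lhs = k(k+2)/4 and rhs = (k+1)**2/4, so lhs <= rhs.
    assert lhs <= rhs + 1e-12, k

print("alpha_k = 2/(k+1) passes the check for k = 1, ..., 999")
```

The point, as emphasized above, is that such selections involve no problem parameters at all.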
Proposition 1
Proof
Denoting and , we first show part a). Note that by (2.12), the selection of and the fact that , we have
Using these relations and the simple observation we conclude that
where the last inequality follows from the facts and for any .
We now show that part b) holds. Note that by (2.16), we have
(2.17) 
which clearly implies that , . We now show that and by induction. Indeed, if , then, by (2.17), we have
The previous conclusion, together with the fact that due to (2.16), then also imply that . Now let us bound for any . First observe that, by (2.16), we have, for any ,
Using the above identity, (2.16) and the fact that due to (2.16), we conclude that
which, in view of the fact that , then implies that Using the previous inequality and (2.16), we conclude that
and
According to the termination criterion in step 4 of the gap reduction procedure, each call to this procedure reduces the gap between given upper and lower bounds on $f^*$ by a constant factor. In the ABL method described below, we iteratively call this procedure until an $\epsilon$-solution of problem (1.1) is found.
The ABL method:

Input: an initial point, a tolerance $\epsilon > 0$ and the algorithmic parameters.

Set , and . Let .

If , terminate;

Set and ;

Set and go to step 1.
Whenever the phase counter increments by 1, we say that a phase of the ABL method occurs. Unless explicitly mentioned otherwise, an iteration of the gap reduction procedure is also referred to as an iteration of the ABL method. The main convergence properties of the ABL method are summarized as follows.
Theorem 2.2
Suppose that and , , in procedure are chosen such that (2.14) holds for some . Let , and be given by (2.5) and (1.2).

The number of phases performed by the ABL method does not exceed
(2.18) 
The total number of iterations performed by the ABL method can be bounded by
(2.19)
Proof
Denote , . Without loss of generality, we assume that , since otherwise the statements are obviously true. Note that by the origin of and , we have
(2.20) 
Also note that, by (1.2), (2.5) and the definition of in the ABL method, we have
(2.21) 
The previous two observations then clearly imply that the number of phases performed by the ABL method is bounded by (2.18). We now bound the total number of iterations performed by the ABL method. Suppose that procedure has been called times for some . It then follows from (2.20) that , , since due to the origin of . Using this observation, we obtain
Moreover, by Theorem 2.1, the total number of iterations performed by the ABL method is bounded by
Our result then immediately follows by combining the above two inequalities.
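The phase-count part of this argument reduces to simple geometric-decrease accounting: if each call to the gap reduction procedure multiplies the optimality gap by a factor $q \in (0,1)$, the number of phases is logarithmic in the ratio of the initial gap to $\epsilon$. The sketch below (an illustrative accounting of this mechanism, not the paper's exact bound (2.18)) computes that count:

```python
import math

def num_phases(delta0: float, eps: float, q: float) -> int:
    """Phases needed if each call to the gap reduction procedure multiplies
    the optimality gap by a factor q in (0, 1), starting from gap delta0."""
    assert 0.0 < q < 1.0 and 0.0 < eps < delta0
    return math.ceil(math.log(delta0 / eps) / math.log(1.0 / q))

print(num_phases(1.0, 1e-3, 0.5))   # 10, since 2**10 = 1024 > 1000
```

The total iteration count in (2.19) is then obtained by summing, over these phases, the per-phase bound of Theorem 2.1.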
We now add a few remarks about Theorem 2.2. Firstly, by setting $\rho = 0$, $\rho = 1$ and $\rho \in (0, 1)$ in (2.19), we obtain the optimal iteration complexity for nonsmooth, smooth and weakly smooth convex optimization, respectively (see Lan103; NemNes851; Nest881; DeGlNe101 for a discussion of the lower complexity bounds for solving these CP problems). Secondly, the ABL method achieves these optimal complexity bounds without requiring the input of any smoothness information, such as whether the problem is smooth or not, or the specific values of $M$ and $\rho$ in (1.2). To the best of our knowledge, the ABL method appears to be the first uniformly optimal method for solving smooth, nonsmooth and weakly smooth CP problems in the literature. Thirdly, observe that one potential problem with the ABL method is that, as the algorithm proceeds, the model accumulates cutting planes, and the subproblems in the gap reduction procedure become more difficult to solve. We will address this issue in Section 3 by developing a variant of the ABL method.
2.3 Convergence analysis of the ABL gap reduction procedure
Our goal in this subsection is to prove Theorem 2.1, which describes some important convergence properties of the gap reduction procedure. We first establish three technical results from which Theorem 2.1 immediately follows.
Lemma 1 below shows that the prox-centers generated by the gap reduction procedure are “close” to each other. The proof follows the standard analysis of the BL method (see, e.g., LNN; BenNem00).
Lemma 1
Let and , , respectively, be computed in step 1 and step 2 of procedure before it terminates. Then the level sets given by have a point in common. As a consequence, we have
(2.22) 
where is defined in (2.5).
Proof
Let , , be defined in (2.9). First, in view of (2.10) and the termination criterion of procedure , we have
(2.23) 
Now let . Observe that, by (2.6), (2.8) and (2.23), we have, for any ,
We have thus shown that for any . Now by (2.7), we have
Summing up the above inequalities and using (2.5), we obtain
which clearly implies (2.22).
The following two technical results will be used in the convergence analysis of a few accelerated bundle-level type methods developed in this paper, including the ABL, APL and USL methods.
Lemma 2
Let be given at the th iteration, , of an iterative scheme and denote . Also let be defined in (2.2) and suppose that the pair of new search points satisfy that, for some and ,
(2.24)  
(2.25) 
Then,
(2.26) 