Approximating optimization problems over convex functions
Many problems of theoretical and practical interest involve finding an optimum over a family of convex functions. For instance, finding the projection on the convex functions in , and optimizing functionals arising from some problems in economics.
In the continuous setting and assuming smoothness, the convexity constraints may be given locally by asking the Hessian matrix to be positive semidefinite, but in making discrete approximations two difficulties arise: the continuous solutions may be not smooth, and functions with positive semidefinite discrete Hessian need not be convex in a discrete sense.
Previous work has concentrated on non-local descriptions of convexity, making the number of constraints to grow super-linearly with the number of nodes even in dimension 2, and these descriptions are very difficult to extend to higher dimensions.
In this paper we propose a finite difference approximation using positive semidefinite programs and discrete Hessians, and prove convergence under very general conditions, even when the continuous solution is not smooth, working on any dimension, and requiring a linear number of constraints in the number of nodes.
Using semidefinite programming codes, we show concrete examples of approximations to problems in two and three dimensions.
Convex and concave functions appear naturally in many branches of science such as biology (growth), medicine (dose-response), or economics (utility, production or costs), spurring in turn the interest of other areas, for example, statistics. It is no surprise, then, that many problems of theoretical and practical interest involve the optimization of a functional over a family of convex functions.
A particularly important case is when the functional to be minimized is the norm of some normed function space , i.e., given find
where is a family of convex functions in . Typical choices are the spaces or , and, in general, the spaces, where is a convex domain in .
Sometimes the convexity is a reasonable shape assumption on the model, which could be replaced by or added to other shape constraints such as radial symmetry, harmonicity or upper and lower bounds. This is the case, for example, of Newton’s problem of minimal resistance (see, e.g., [3, 4, 12, 13]).
More surprisingly perhaps, the convexity constraint may be a consequence of the model, as in the design of some mechanisms in economics [14, 17]. Actually, our interest in the subject arose from one of these problems, which we will call the monopolist problem, and is described in some detail in section 6.1. In this problem we wish to find
where , is a non-negative probability density function over , and is a family of convex functions on with some further properties.
The monopolist problem is numerically very challenging since, unlike the problems coming from physics, the space dimension may be much higher than or .
There is a big qualitative jump going from one to more dimensions when dealing with convexity constraints. This can be appreciated readily by looking at the statistics literature, where the one dimensional convex regression or density problem has been considered both theoretically and numerically, and using different measuring functionals (see, e.g., [1, 8, 9, 15]), but very little has been done numerically in two or more dimensions. In this regard, it is worth mentioning the work by Shih, Chen and Kim , who use appropriate splines in the MARS (Multivariate Adaptive Regression Splines) algorithm.
One of the main difficulties in obtaining discrete approximations to optimization problems over convex functions lies in giving a local and finite description of convex functions if . Though this could be done for smooth functions of continuous variables by asking the Hessian matrix to be positive semidefinite at all points, there is no similar characterization for discrete functions on meshes. As a matter of fact, we show in examples 2.3 and 2.4 that this cannot be done easily.
It goes without saying that there is a lot of work done on convex functions of continuous variables, as exemplified by the classical book by Rockafellar . Traditionally, this work leans more on properties satisfied by convex functions rather than on properties that imply convexity. In particular, very little has been done on local properties guaranteeing convexity.
For discrete variables, there is a rather large work done by the discrete mathematics community for fixed lattices, and a number of definitions for discrete convex functions have been proposed (see, e.g., ). Again, usually these definitions are non-local.
As far as we know, there exists very few literature on optimization problems on convex functions in dimensions two or more, either theoretically or numerically. Besides those already mentioned (and the references therein), let us cite three more which are prominent in the context of our work.
Carlier, Lachand-Robert and Maury  proposed a numerical scheme for minimizers of functionals of the type
on closed convex subsets of convex functions of or , where is a quadratic function of and , and . We will discuss their approach in the next section (after the inequalities (2.5)).
As Carlier et al.  point out, their work encompasses the problem of finding
for given , i.e., a -norm projection, and this problem is equivalent to that of finding the convex envelope of . Thus, minimizing over convex functions and finding the convex envelope of a function are two quite related tasks.
Being one of the main problems in computational geometry, there are a number of well established codes for finding the convex hull of a set of points in , which are very efficient in low dimensions. Hence, it is natural to try to use these codes to approximate optimization problems on convex functions, an approach which Lachand-Robert and Oudet  applied to several problems.
It would be very interesting to see whether these ideas could be carried over, since the convex hull codes are quite fast in low dimensions.
Our approach, based on semidefinite programming, takes a different direction from those mentioned.
Let us recall that a semidefinite program is an optimization problem of the form
where , are symmetric matrices, and indicates that the symmetric matrix is positive semidefinite. By letting the matrices be diagonal, we see that the program (1.2) is a generalization of linear programming (and includes it strictly). Thus, in a semidefinite program the constraints can be a mixture of linear inequalities and positive semidefinite requirements.
In this paper we give a theoretical framework for approximating many optimization problems on convex functions using a finite differences scheme which imposes a positive semidefinite constraint on a discretization of the Hessian matrix.
Although not linear, our approach seems very natural and has many advantages. Being of a local nature, the number of constraints grows only linearly with the number of nodes, and it works for any dimension of the underlying space. Notwithstanding the already mentioned counterexamples (2.3 and 2.4) of the relation between convexity and positive semidefinite Hessian, we will show that for many problems we obtain convergence to a continuous optimum. This convergence holds even for non-smooth optima, as those arising when projecting in the norm or in some problems of the type given by equation (1.1).
In practice, our definition can be used to advantage by using the existing efficient semidefinite codes and, furthermore, it is very simple to program in higher dimensions.
As a final remark, we think that our approach might be a first step to deal with other problems where convexity is a consequence of the model, such as transportation problems where the cost is quadratic, i.e., for solving the Monge-Ampère equations, a prime example of fully non-linear differential equations.
This article is organized as follows.
In section 2 we summarize some techniques that could be used to deal with the numerical approximation of optimal convex functions, showing their strengths and weaknesses, and introduce the discrete Hessian.
In section 3 we give different characterizations of smooth convex functions of continuous variables, which allow us to extend the definition of the Hessian to functions in terms of averages.
The heart of the paper is section 4: in order to have a good definition of discrete convexity, we need to show that any convex function of continuous variables may be approximated by its discrete counterparts, and conversely, that a converging sequence (in a suitable norm) of discrete convex functions will do so to a convex function of continuous variables.
We show in this section that the approximation of the discrete functions to the limit is uniform in and its derivatives if is smooth, and uniform in over compact subsets of , if is merely convex and bounded (and hence continuous). Although our definition is stated using the Hessian, if a sequence of discrete functions converges to a function which is not or even , still convexity of the limit function is guaranteed.
In section 5 we pose a general structure for optimization problems on convex functions which fits the results of the previous sections, and may be applied to several important problems. In addition, error bounds may be obtained if the continuous optimal solution is smooth.
The results of sections 3–5 are somewhat independent of semidefinite programming. However, as explained in section 2, for many problems of interest the functional and constraints involved may be expressed as a semidefinite program, and we present in section 6 numerical examples showing that the current codes make it feasible to use our scheme on them.
We begin section 6 by considering the monopolist problem in two and three dimensions, comparing our results to the analytical solution when in (1.1). We continue by showing some norm projections, exploring the behavior under different functionals of an example given by Carlier, Lachand-Robert and Maury . Moreover, we exhibit an explicit example in which the projection is not unique. We finish the section on numerical examples by showing how our scheme could be used to fit a (discrete) convex function to noisy data given on the nodes of a regular mesh, using the norm (i.e., a least squares approach), and also the and norms, which are not often seen.
2 Dealing with convex functions
In this section we present some ideas and techniques that could be used to approximate numerically an optimal convex function. For the sake of simplicity, we will work on the unit -dimensional cube in , .
is a discretization of , then a convex function defined on satisfies the inequalities
Conversely, if a function defined only in the mesh points satisfies the inequalities (2.2), we can always extend it linearly in the intervals , , so that the resulting piecewise linear function will be convex on .
Thus, the inequalities (2.2) may be used to define discrete convexity in one dimension. It is clear that—via the piecewise linear extension—we can approximate any convex function in . On the other hand, as we show in lemma 2.1, under some adequate assumptions if a sequence of discrete convex functions (satisfying (2.2) for each corresponding subdivision) converges pointwise, then the limit defines a unique convex function in .
Let be given, , and suppose is a subdivision of satisfying (2.1) and such that
Let be defined on and satisfy (2.2). For a given , , let us consider
From (2.2) we see that
for all such that , . Therefore, if
where and are constants depending only on and (but not on or ). In other words, if is extended as a piecewise linear function to all of , the resulting function is uniformly Lipschitz on compact subsets of .
Let be a sequence of meshes in , , satisfying (2.3) for each , and such that for . Suppose is defined in for each , and the sequence is such that for all , exists and is finite, and let
Then may be extended to all of as a continuous convex function in a unique way. Moreover, the convergence of (extended piecewise linearly) to is uniform in compact subsets of .
Also, by the Arzela-Ascoli theorem,
Let be a sequence of meshes as in the previous lemma. Suppose is given, and that for each a function is defined in so that
and assume is extended to all of piecewise linearly.
Then, there exists a subsequence of of converging to a continuous convex function defined on . Moreover, the convergence of the subsequence to is uniform in compact subsets of .
Thus, from a theoretical point of view, the discrete functions satisfying (2.2) are well understood.
From the numerical point of view, when used in a discretization of an optimization problem, the constraints coming from the inequalities (2.2) are linear, and solving the resulting discrete optimization problem with them is usually not much harder than solving it without them.
Hence, the case poses no major trouble.
For it will be more convenient to work only with regular meshes (or grids) on . Thus, for fixed ( for some ), the mesh will consist of all points of the form with . Denoting by the interior of , we set
and denote by the set of real valued functions defined on .
A first simple idea to extend the inequalities (2.2) to more dimensions, is to consider the set of functions satisfying the convexity constraints
where denotes the -th vector of the canonical basis of .
This set of discrete functions is very appealing because the convexity is modelled by using only (number of mesh points) linear constraints, as in the case . As we have seen, this approach is exact in one dimension, and gives satisfactory results in many cases in more dimensions (in particular for some of the specific examples treated later in section 6.1), but there is no guarantee of convexity in the limit function if . In fact, by taking the interpolant, with this set we can certainly approximate any function of continuous variables which is convex, but we can also approximate other functions. For example, we may approximate which is linear—and thus convex—in each coordinate direction, but not convex in .
This shows that the definition of discrete convexity should be done with more care: when , we have to take into account every possible direction.
It is reasonable, then, to say that a discrete function is convex if
Since as goes to the possible directions become dense, it is possible (under some conditions) to regain convexity in the limit as we had for one dimension.
In a way, this is the approach followed by Carlier, Lachand-Robert and Maury . In a two dimensional setting, they consider discrete convex functions which are restrictions to a mesh of convex functions of continuous variables, and show that this definition is equivalent to an intrinsic one, stated only in terms of the value of the function at the grid points, similar to (2.5). The problem with this description is that it is non-local, and the number of constraints needed in two dimensions (after pruning) reportedly grows approximately as , where is the number of nodes in the grid. Moreover, this approach is very difficult to extend to higher dimensions.
In order to keep the definition of discrete convexity local, another possibility is to consider discretizations of the Hessian matrix, as we do in this work.
For and , we define the (forward) first order finite differences by
and the second order finite differences by
or, if , by
Clearly, not all of these finite differences are defined for points in . Therefore, when mentioning or for , we will implicitly assume that and (and eventually ) are such that the corresponding finite difference is well defined at .
Finally, we define the discrete Hessian of at , as the symmetric matrix whose entry is .
We are faced now with two issues:
How can a discrete optimization problem with positive semidefinite discrete Hessian constraints be solved?
Once we found a discrete solution, how does it approximate the continuous solution?
As mentioned in the introduction, for the first issue we will use the semidefinite programming model (1.2), where the objective is linear and the constraints are positive semidefinite. Thus, if the objective of the continuous problem is not linear, we must rewrite it as a constraint. However, it is worth mentioning that not every functional may be readily modelled as a positive semidefinite constraint, an example of such functionals being
arising in Newton’s problem of minimal resistance.
Although this method may not be used for all functionals, it can be used in many cases as we now illustrate. The interested reader is referred to the article by Vandenberghe and Boyd  for other possibilities.
In what follows we associate with each mesh node , , the unknown value , the discrete Hessian , and a -dimensional cube centered at of measure , so that .
subject to convex, is modeled by adding more unknowns (for a total of ) as
In turn, the constraints of the form
are lumped as , and finally modeled as the linear inequalities
This is analogous to the projection:
and the constraints of the form are written in the form
The projection is handled in a similar way.
This is analogous to the other projections, except that we only need to add one more variable (instead of ), obtaining the discrete program
The next issue is to see how well the discrete solutions of the semidefinite program approximate the continuous convex solution.
Our first hope is that will be equivalent to some version of convexity of discrete functions, for example to the one given by the inequalities (2.5). The next examples show that this is not true. The first one shows that a non-convex function may have a positive semidefinite discrete Hessian, and the second one shows that the discrete Hessian may not be positive semidefinite for some convex functions.
Let us consider , and defined on the 2-dimensional grid by
We may check that (with ),
whose eigenvalues are and (with eigenvectors and ), and is thus positive semidefinite.
However, the restriction to the diagonal is not convex:
Consider the same grid of the previous example and defined by
We see that is the restriction to of the convex function
which has eigenvalues and , and hence is not positive semidefinite.
Notwithstanding these examples, in section 4 we will show that, under suitable conditions, by asking for positive semidefinite discrete Hessians we may conveniently approximate continuous convex functions, and obtain variants of lemmas 2.1 and 2.2.
However, we will need some previous results which are tackled in the following section.
3 Convex functions of continuous variables
In this section we deal with variants of the Hessian matrix, initially defined for functions, which allow us to characterize convexity for functions locally.
Let us start by introducing some more precise notation:
The distance from a point to a set will be denoted by .
If and ,
The gradient of is denoted by , and we write for .
If is an open set, we denote by the set of functions having all derivatives up to order continuous on , and by the set of functions having all derivatives up to order uniformly continuous in . If is omitted, .
Let be a non-negative, real valued function in , vanishing outside , and such that and for define The function
defined for functions and points whenever the right hand side makes sense, has many interesting well known properties, and we state some of them without proof, referring the reader to, e.g., the book by Ziemer [22, Theorem 1.6.1 and Remark 1.6.2].
If , then for every , and for every multi-index , where for , .
If then is defined for and .
whenever is a Lebesgue point of .
If , and is a compact subset of , then converges uniformly to on as .
The functions , also called regularizations or mollifiers of , will make us work often with the sets
where , .
Our first result is a simple characterization of continuous convex functions using the regularizations , and we omit its proof.
Suppose and is defined as in (3.1). We have,
If is convex, then is convex in for all .
Conversely, if for a sequence of ’s converging to , is convex in , then is convex.
The Hessian of at is the symmetric matrix whose entry is , and convex functions in are characterized by the positive semidefinite condition
Also, for , and , the average
defined for , converges to for all , as . Thus the condition
implies the convexity of . Actually, since the average of positive semidefinite matrices is positive semidefinite, (3.3) is equivalent to convexity for .
The definitions of and involve second order derivatives which are not defined if is not smooth enough. However, we may express merely in terms of and integrating by parts, and the resulting formula will make sense for . Thus, for and let us define as the symmetric matrix whose diagonal entries are
|and, if ,|
|where denotes the dimensional face of the cube having outward normal .|
Of course, since the equations (3.4) were obtained integrating by parts, we have:
If then .
As we show now, the extension of to functions in , still gives a local characterization of convexity.
The following are valid if :
If is convex, then for all and .
If for a sequence , for all , then is convex.
Suppose , and consider the regularization defined in (3.1). Since we can interchange derivatives and integrals with the convolution,
Using theorem 3.1, since and their derivatives converge uniformly to in compact subsets of , we must have for in compact subsets of , and therefore in the whole of .
which implies that is convex in . Letting go to while keeping fixed, we see that is convex in , and by the second part of lemma 3.2, must be convex. ∎
4 Approximating convex functions, the discrete Hessian
There are two main issues when defining the set of discrete approximants to be used:
we want it to be rich enough to approximate every convex function, and
we want this set to be not too large, to avoid convergence to non-convex functions.
The first point is very natural, and necessary to approximate the solution to the problem. The second point might look artificial at first sight, but if not enforced, then we could be approximating a non-convex function. This is the case, for instance, if we only require convexity along the coordinate axes, as we stated in section 2.
Our discrete approximations will be a subset of functions having positive semidefinite discrete Hessians, and the main purpose of this section is to address the two issues mentioned above.
This is not surprising since we can approximate, say, , by discrete functions whose finite differences up to order converge to the derivatives up to order of , and then add a small perturbation to bring up the eigenvalues of the discrete Hessians. This is the main idea of the proof, and in its course, we will use the following part of the Hoffman-Wielandt theorem .
Theorem 4.1 (Hoffman and Wielandt).
There exists a positive constant , depending only on the dimension , such that if and are symmetric matrices, and and are their minimum eigenvalues, then
Let be convex. Then, for any , there exists such that for any , , there exists a function defined on satisfying for all ,
Let us recall that in the inequality (4.1), for we consider only the finite differences that are defined at that .
If is convex and smooth, given there exists such that for all and ,
where is the constant in theorem 4.1. Hence, since is convex and therefore , the minimum eigenvalue of is uniformly bounded below by .
Now consider the function
for which, if small enough,
where is the identity matrix in .
If , the function
defined on , satisfies
Thus, for some constant depending only on , the inequality (4.1) holds with replaced by .
The result follows now by taking and appropriately. ∎
The main implication of theorem 4.2 is that any smooth convex function is a limit of a sequence of functions with positive semidefinite discrete Hessians, giving an affirmative answer to the first issue mentioned at the beginning of this section. Moreover, an application of lemma 3.2 implies this result also for non-smooth convex functions, with convergence in on compact subsets of (see definitions below).
We would like to show next that our set of approximants also solves the second issue. That is, if we have a convergent (in certain norm) sequence of functions with positive semidefinite discrete Hessians, then the limit is convex. On the other hand, if the sequence is not convergent, we would like also to understand under which further conditions on the sequence we may extract a subsequence converging to a convex function.
In what follows, we will work with sequences of functions defined on finer and finer meshes, that is, sequences with for every , such that and decreases to . We will denote by the set of all such sequences, and use the notation
So as not to clutter even more the notation, usually we will drop the index , writing for , when this does not lead to confusion.
If and , we will say that
if for any , there exists , so that for all and , that is, as . In this case we will write, with a little abuse of notation,
If in addition , we write, similarly,
to indicate that and
uniformly for all for which the finite differences make sense and .
Suppose and are such that
Then is convex.
We follow closely what was done in section 3, using a variant of the divergence theorem for discrete variables, so that only approximations to the function or its derivatives—but not the second derivatives—are needed.
For and , let us define
where the sum is over all such that .
Since the sum of positive semidefinite matrices is a positive semidefinite matrix, we have
In one dimension we have just one entry in , involving a term of the form