Curse of dimensionality reduction in max-plus based approximation methods: theoretical estimates and improved pruning algorithms
Max-plus based methods have been recently developed to approximate the value function of possibly high dimensional optimal control problems. A critical step of these methods consists in approximating a function by a supremum of a small number of functions (max-plus “basis functions”) taken from a prescribed dictionary. We study several variants of this approximation problem, which we show to be continuous versions of the facility location and -center combinatorial optimization problems, in which the connection costs arise from a Bregman distance. We give theoretical error estimates, quantifying the number of basis functions needed to reach a prescribed accuracy. We derive from our approach a refinement of the curse of dimensionality free method introduced previously by McEneaney, with a higher accuracy for a comparable computational cost.
Dynamic programming is one of the main approaches to optimal control. It leads to solving Hamilton-Jacobi-Bellman (HJB) partial differential equations. Several techniques have been proposed in the literature to solve this problem. We mention, for example, finite difference schemes and the method of the vanishing viscosity [crandall-lions], the antidiffusive schemes for advection [zidani-bokanowski], the so-called discrete dynamic programming method or semi-Lagrangian method [capuzzodolcetta, falcone, falcone-ferretti, carlini-falcone-ferretti]. Unlike alternative approaches based on the maximum principles or on direct methods, dynamic programming based methods are guaranteed to give the global optimum of the problem. However, they suffer from the curse of dimensionality, meaning that the execution time grows exponentially with the dimension of the state space.
Recently, a new class of methods has been developed after the work of Fleming and McEneaney [a5], see in particular [curseofdim, a6, a7]. These methods all rely on max-plus algebra. Their common idea is to approximate the value function by a supremum of finitely many “basis functions”. They exploit the max-plus linearity of the Lax-Oleinik semi-group (evolution semi-group of the HJB partial differential equations associated to a deterministic optimal control problem). One of these methods, developed by Akian and al. [a6], was shown to be a max-plus analogue of the (Petrov-Galerkin) finite element method. In particular, the global error of the method can be estimated in terms of certain elementary projection errors, as in the case of usual finite elements.
Among the max-plus methods, the curse of dimensionality reduction of McEneaney [curseofdim] (see also [mccomplex, a7, eneaneyphys]) appears to be of special interest. In its original form, it applies to an optimal switching problem involving linear quadratic models: it approximates the solution by a supremum of quadratic functions which are obtained by solving Riccati equations. The theoretical analysis of the method [mccomplex] shows that the growth of the execution time is only polynomial as the dimension grows, keeping all the other parameters fixed. However, the bound of [mccomplex] still grows exponentially as the required accuracy tends to zero, hence the curse of dimensionality is replaced by a “curse of complexity” [a7]. However, the complexity of the method can be considerably reduced in practice by incorporating a pruning algorithm, which eliminates on the fly the redundant basis functions produced by the algorithm. In this way, high dimensional instances (with state dimensions from 6 to 15) inaccessible by other methods could be solved [a7, eneaneyphys].
This raises the question to understand why, and to what extent, max-plus techniques can attenuate the curse of dimensionality: this is the object of the present paper.
After a brief review of max-plus based methods (Section II) we establish in Section III a negative result, concerning the family of max-plus methods based on -semiconvex transforms, developed by Fleming and McEneaney [a5] and Akian et al. [a6]. In these methods, the function is approximated by a supremum of quadratic forms all of which have the same hessian. Then, Theorem III.3 below shows that the number of max-plus basis functions necessary to reach an accuracy of is at least of order , meaning that the curse of dimensionality is inherent to all of these methods (the order is optimal, it is reached in particular by the max-plus finite elements of [a6], with finite elements and or test functions [theseasma]). The proof, which is sketched in Section IV, relies on results concerning the approximation of smooth convex bodies [a2, b2].
However, this theoretical negative result is contrasted by the experimental efficiency of pruning in the dimensionality free method of [curseofdim], which often gives approximations of an acceptable accuracy for a modest amount of basis functions. Therefore, we focus our attention on the algorithmic aspects of the pruning problem in the rest of the paper. In Section V, we present a primal variant of the method, which avoids the use of dual representations: in the absence of pruning, it is equivalent to the original method, but we shall see that it leads to a more efficient pruning. Next, we show in Section VI that the optimal pruning problem can be formulated as a continuous version of the -median or -center problem, depending on the choice of the norm. The discrete versions of these problems are NP-hard. Hence, we propose several heuristics (combining facility location heuristics and Shor SDP relaxation scheme). Experimental results are given in Section VII. They show that by combining the primal version of the method with improved pruning algorithms, a higher accuracy is reached for a similar running time, by comparison with [curseofdim, a7].
Ii Max-plus numerical methods to solve optimal control problems
Ii-a The Lax-Oleinik semi-group
We consider the optimal control problem
Here, is the set of states, is the set of actions, denotes the horizon, the initial condition , the Lagrangian , the terminal reward , and the dynamics are given. The supremum is taken over all the control functions and system trajectories satisfying (2), and is the value function. We will assume here for simplicity that the set is invariant by the dynamics (2) for all choices of the control function . Under certain regularity assumptions, it is known that is the solution of the Hamilton-Jacobi equation:
with initial condition:
Let be the Lax-Oleinik semi-group, i.e., the evolution semi-group of the Hamilton-Jacobi equation. For every horizon , is a map which associates to the terminal reward the value function on horizon . By semi-group, we mean that for all . Recall that the max-plus semiring, , is the set , equipped with the addition and the multiplication . For all maps from to and , we denote by the map such that and by the map such that . It is known that the semi-group is max-plus linear, i.e.,
We shall see that the max-plus basis method exploit these properties to solve the optimal control problem (1).
Ii-B Max-plus linear spaces
A set of functions is a max-plus linear space if for all and , the functions and belong to . A max-plus linear space is (conditionally) complete if the pointwise supremum of any family of functions of that is bounded from above by an element of is finite.
Let be a set of functions (max-plus basis functions). The complete max-plus (linear) space of functions generated by is defined to be the set of arbitrary linear combinations of elements of , in the max-plus sense, so that an element of reads for some family of elements of . The (non complete) space is defined in a similar way, but the linear combination must now involve a finite family, meaning that for all but finitely many values of . We refer the reader to [litvinov, ilade, mceneaney-livre] for more background on max-plus linear spaces.
Several choices of basis functions have been considered in the literature. Following [a5] and [a6], we will consider here a set consisting of the basis functions of the form
where is a fixed real constant. Hence, an element of can be written as
for some function . Recall that a function is -semiconvex if and only if the function is convex. Then, it follows from Legendre-Fenchel duality that the space coincides with the space of -semiconvex (lower semicontinuous) functions.
When , it is sometimes more convenient to write the expansion (7) in the form
One can pass from one representation to the other by the change of variable .
If is a complete max-plus linear space of functions , and if is any function , the projection of onto is defined to be
(by writing max, we mean that the supremum element of the set under consideration belongs to this set, which follows from the completeness of ).
All the previous definitions can be dualized, replacing max by min, and by . In particular, a complete min-plus linear space is a set of functions such that is a complete max-plus linear space. Then, we define the dual projector by
for all functions .
Ii-C Max-plus basis methods
All these methods approximate the value function at time by a finite max-plus linear combination of max-plus basis functions, i.e.,
where . Typically, (see [a5]), so that the previous approximate representation is nothing but a discretization of the semiconvex representation (7) of .
Then, the coefficients and the functions need to be inductively determined. Let us fix a time discretization step , such that for some integer . Using the semi-group property, we get
We require the max-plus basis approximation of to satisfy the analogous relation, at least approximately:
The various methods differ in the way they address the two subproblems,
for all , replace by an easily computable (accurate enough) approximation ;
Project each , or , to the space of finite max-plus linear combinations of basis functions.
The first subproblem is the simplest one: computing is equivalent to solving an optimal control problem, but the horizon is small, and the terminal reward (typically a quadratic function) has a regularizing and a “concavifying” effect, which implies that the global optimum can be accurately approached (by reduction to a convex programming problem), leading to various approximations with a consistency error of , with , or sometimes better, depending on the scheme, see [mceneaney-livre, a6, theseasma].
The second step is the critical one, in particular, the accuracy of the method is limited by the projection error [a6], i.e., the maximal distance between every function or and its best approximation by a max-plus linear combination of basis functions. In a number of methods, including the original one [a5], the approximation is such that, assuming that the approximation of is exact,
so that where , and is defined as in (8). Then, the approximation error in the norm, satisfies
Similarly, in the max-plus finite element method of Akian, Gaubert and Lakhoua [a6], the approximation is computed recursively by
where at each step, we also have a (dual) min-plus linear space generated by finitely many test-functions. Then, it follows from [a6] that the sup-norm projection error cannot be expected to be smaller than
Moreover, it is shown in [a6] that this estimate is tight, meaning that the total error of the method, is of order at most times the previous sum, up to a term depending on the quality of approximation of by .
Finally, in the curse of dimensionality method of McEneaney [curseofdim], the basis functions are quadratic forms and the semi-group is approximated by the pointwise supremum of semi-groups associated to linear-quadratic control problems
The number of basis functions of the linear combination grows by a factor of at each step. Then, the accuracy of the method is still limited by the projection error, which arises when pruning the less useful basis functions.
Iii Curse of dimensionality for semiconvex based approximations
As discussed above, the main source of inaccuracy of max-plus basis methods is the projection error (12). In this section, we give an asymptotic estimate of the optimal projection error as the number of basis functions tends to infinity, in the special case in which the space consists of quadratic functions (6), as in the max-plus basis method [a5], or in the type finite element method of [a6, theseasma].
Let , and be a -semiconvex function. It is approximated by a given number of semiconvex basis functions :
We are interested in the or approximation error
for some suitable full dimensional compact convex subset . We shall compute these errors when , and so, for reasons discussed in Section II-C (see (11)), we shall require that . We denote by (resp. ) the minimal (resp. ) approximation error on of by as in (14), by the hessian matrix of at point , and by the identity matrix of size .
The next two theorems imply that whatever computation scheme is chosen for the coefficients , the approximation error is necessarily subject to a curse of dimensionality.
Theorem III.1 ( approximation error)
Let , and let denote any full dimensional compact convex subset. If is -semiconvex of class , then, there exists a constant depending only on such that
Theorem III.2 ( approximation error)
Let , , and be as in Theorem III.1. Then there exists a constant depending only on such that
The proof of these theorems is sketched in the next section, it builds on analogous methods and results of Gruber [a2], [b2], concerning the approximation of smooth strictly convex bodies by circumscribed polytopes, the constants and , which grow slowly with , already appeared there.
The following theorem is a direct corollary:
Assume that is the value function, and that it is and -semiconvex. Then, for any max-plus basis method providing an approximation from below of of the value function by a supremum of quadratic functions (see (14)), the error and error of approximation of the value function are both as .
Besides, the estimates of Theorems III.1 and III.2 confirm that when is sufficiently large, it is more interesting to choose the smallest such that is convex. Note also that the integral term can be small if the Hessian of is nearly constant and close to (attenuation of the curse of dimensionality).
In this section, we sketch the proof of Theorems III.1 and III.2. Let , , and satisfy the conditions of these theorems. Note that approximating as in (14) is equivalent to approximating the strongly convex function by of its affine minorants. Hence, it suffices to prove Theorems III.1 and III.2 when , is strongly convex, and is approximated by the supremum of of its affine minorants. We denote by the gradient operator of , by the point and by the quadratic form determined by the hessian matrix . Let us first recall some results on the approximation of convex bodies by circumscribed polytopes.
Gruber proved in [a2], [b2] that for any convex body with boundary and positive Gaussian curvature , there are two constants and depending only on such that as :
Here and are respectively the minimal distance with respect to Hausdorff and metric between and any circumscribed polytope with facets, is the ordinary surface area measure on . Moreover, we have the following asymptotic estimates for the constants and :
where is estimated as [b1]:
Both of the proofs partition into finitely many pieces associated to a family of supporting planes. For each supporting plane, there is a corresponding strongly convex function whose graph is a piece of . Then, the volume of the difference between and a circumscribed polytope can be estimated by computing the norm of the difference of this strongly convex functions with some of its piecewise affine lower bounds. For the Hausdorff metric case, some results regarding the optimal covering of a manifold by geodesic discs are used. We next apply the same techniques to our problem.
First of all, we recall the definition of the Bregman distance.
Definition IV.1 ([Mr0215617])
For any two points and of , the Bregman distance from to , associated to a strongly convex and differentiable function , is defined by
The Bregman distance is positive definite ( and the equality holds if and only if ), but it may not be symmetric.
The proofs follow essentially the same steps as for convex bodies. We give just an outline of the proof without going into details.
To prove Theorem III.1, we need an asymptotic formula for optimal quantization:
Theorem IV.1 ([a2])
Let be Jordan measurable with positive volume , and a positive definite quadratic form on . Then as :
Sketch of proof of Theorem III.1 for
Let , for each , there is an open convex neighborhood such that:
Then for every , the Bregman distance is bounded below and above as:
We choose finitely many points with respective neighborhoods covering the compact set . One may then dissect the integral on smaller pieces with Jordan measurable. Using the asymptotic formula of Theorem IV.1 and some other arithmetic inequalities as in [b2], the theorem can be deduced.
Sketch of proof of Theorem III.2 for
Let be the graph of on . is a -dimensional (Riemannian) manifold of class with metric of class . For each , one may define the Riemannian metric between and , , by:
To prove Theorem III.2, we need an asymptotic formula on the minimum covering. The next lemma is a special case of Lemma 1 in [a2]:
Lemma IV.1 (Compare with Lemma 1 in [a2])
For , let be the minimum number of discs of radius with respect to the Riemannian metric needed to cover . Then:
Given a similar result of minimum covering under Euclidean metric of Hlawka [MR0030547], this lemma essentially proves the equivalence between the Riemannian metric on and the Euclidean metric on . Since our problem is to minimize the maximum radius of Bregman balls covering the manifold , one last thing to be proved is the equivalence between the Riemannian distance and the Bregman distance. Indeed, we prove that:
V Primal curse of dimensionality free method
We consider the optimal control problem for switched linear system studied in [curseofdim] (see also [a7, mccomplex]). Let .
and the state dynamics are given by
where and are such that , . The corresponding HJB PDE is:
where has the form:
Under certain technical assumptions [curseofdim] which we will not repeat here, the function is finite, it is a viscosity solution of (25), and it is given by where is the Lax-Oleinik semi-group of the Hamilton-Jacobi equation (3) for defined in (25). In [curseofdim], the value function is approximated by a max-plus sum of quadratic functions, and the approximated semi-group is propagated in a dual space. We next introduce a variant, which we call the primal curse of dimensionality free method: it is equivalent if no trimming is performed, but it avoids the use of dual representations.
V-a Approximate propagation
V-B Computation of a single semi-group operator
The propagation of a quadratic function by reduces to solving a differential Riccati equation (DRE). Moreover, it is well-known that one can recover the solution of a DRE from a system of Hamiltonian linear differential equations (see, e.g., [MR0357936]). Suppose there are only quadratic terms, i.e., . Let , then , where and are the solution of:
We denote by the matrix coefficient in the above linear system. Note that the invertibility of can be derived from the fact that the value function is finite. Given a fixed time step , the fundamental solution of the previous linear system satisfies:
In the presence of linear or constant terms in the control system or in the quadratic function, the problem can be easily transformed into a purely quadratic one by adding a constant state variable. The above analysis shows that, given a fixed propagation time , computing , for every quadratic form , reduces to a matrix multiplication and an inverse operation, which can be done in incremental time.
V-C Propagation and curse of complexity
We choose a time discretization step , a number of steps and an initial function . Let be the approximation at step . The iteration formula is given by:
The computational growth in the space dimension is cubic, as shown in the above subsection. However, the number of quadratic forms grows by a factor of at each iteration. This curse-of-complexity issue also occurs for the dual method in [curseofdim]. Some SDP relaxation based pruning method was proposed in [a7] to reduce the number of quadratic forms. We next discuss improvements of this pruning methods, still partly SDP based, but now exploiting the combinatorial nature of the problem.
Vi Reduction of pruning to -center and -median problems for a Bregman type distance
Vi-a Formulation of the pruning problem
We first give a general formulation for the pruning problem appearing in max-plus basis methods. Let . Let be a max-plus sum of basis functions: . Let be a fixed integer. The problem is to approximate by keeping only basis functions. To measure the approximation error, we introduce a Bregman type distance between each point and each basis function , such that:
In other words, the distance measures the loss at point caused when approximating by . For example, the simplest choice is to let . Consider a compact set on which we measure the loss. One may minimize the total loss ( metric) or the maximal loss ( metric) on .
metric and -median problem
metric and -center problem
We recognize in (29) and (30) the classical -median and the -center facility location problem with continuous demand area and discrete service points. The facility location problem, discrete or continuous, is known to be -hard even with euclidean distance. Besides, we remark that a subproblem of Problem (29) is the volume computation for polytopes, which is known to be -hard. To the best of our knowledge, the only few references that discuss this general class of location problem replace the continuous demand with a discrete one with large number of points, see [MRDrezner]. In the following, we consider a specific case and propose a method based on SDP relaxation to generate discrete points.
Vi-B Pruning methods
We assume that all basis functions are quadratic: . We normalize the distance function as in [a7], i.e.,
’Sort upper bound’ [a7]
The first method was introduced in [a7]. Roughly speaking, to each basis function we associate an importance metric :
Then is the normalized error caused by pruning the function . In some sense the bigger is, the more useful the function is. In particular, when the function is dominated by the others and it can be pruned without generating any approximation error. Let
The problem (31) is equivalent to:
This nonconvex QCQP(quadratically constrained quadratic program) [MR2061575] has its SDP relaxation given by:
Then is an upper bound of the importance metric . Finally the sort upper bound method consist in sorting all the upper bounds and picking up the first ones.
’Sort lower bound’
The SDP relaxation (33) provides not only an upper bound on the importance metric but also a rather simple way to generate feasible solutions.
Suppose is a solution of program (33). We use the randomization technique [MR1801497] to get feasible points: we pick as a Gaussian random variable with . Then over this distribution, the constraints in (32) are satisfied on average. By sampling a sufficient number of times, we get a such that the inequality constraints in (32) are all satisfied. Then, setting provides a lower bound of (31). The proposed procedure provides in practice a good lower bound, although there is no theoretical guarantee in the present generality.
Then, the sort lower bound method proceeds as follows. Fixing an integer , for each basis function , we resolve the SDP program (33), get feasible points using the above randomization technique and put them into a set . At the end we get a discrete set and for each basis function we calculate its lower bound by:
Finally we sort all of the lower bounds and keep the first ones.
Following the above randomization technique, we get a discrete set which in some sense reflect rather well the importance of each basis function. We replace the compact set by this discrete set and seek to minimize the total loss on . This gives the discrete -median problem:
This central problem in combinatorial optimization has seen a succession of papers designing approximations algorithms. Our two last pruning methods are merely two heuristics for the -median problem (34).
’J-V facility location’
Lin and Vitter [Lin:1992:EMP:129712.129787] proved that the constant factor approximation for general -median problem is -hard. For metric distance, Jain and Vazirani [MR1851303] proposed a primal-dual 6-approximation algorithm. This algorithm is interesting not only due to its constant factor, but also because it is combinatorial (there is no need to solve a linear program).
’greedy facility location’
The fourth method is the greedy heuristic. Remember that the function to be minimized in the facility location problem is supermodular, which implies that the greedy heuristic has a bound estimate (even without the triangular inequality on the distance function). Let be the value of a particular solution constructed by the greedy heuristic, then we have [MR0503866]: where . The execution time of the greedy heuristic is .
Vii Experimental results
Vii-a Problem instance
To compare with the dual max-plus basis method, we use the instance of [a7] originating from H-infinity control, in which the parameters where chosen to exhibit a complex behavior. The state dimension and the switch number are both 6. The overpruning threshold is also the same: we keep basis functions at step .
Without the exact value function, we do not have a direct error estimation. Recall that the value function is the unique viscosity solution of the following HJB equation:
where is defined in (26). The value of Hamiltonian is then used to measure the approximation.
Vii-B Numerical results
All of our results
For comparison with [a7], we first take the same time-discretization step-size , the same iteration steps 25 and the same pruning method (sort upper bound). Figure 1 shows the value of Hamiltonian at the end of 25 iterations. Comparing with the error plot shown in [a7], which is in the same scale but has a peak of error of order (versus here), we see that the primal max-plus basis method yields a small improvement.
Now we take smaller time-discretization step-sizes. Figure 2 compares the four pruning methods with and . They both show that the sort lower bound and the greedy facility location pruning method are better than the two others.
Our experimental results confirm that the total approximation error comes both from the approximation error of the Lax-Oleinik semi-group and from the pruning error. The error of approximation of the semi-group can be improved by decreasing the discretization-time step-size , while the pruning error depends on the pruning method. When becomes small, the pruning appears to be the bottleneck. We introduced here new pruning methods, combining facility location algorithms and semidefinite relaxations, which improve the final precision (see Figure 2 ). However, these pruning methods remain time-consuming (see Table I), new ideas are needed to develop more efficient methods. Our experiments also show that the error is of order , which is smaller than the bound of established in [mccomplex]. This remains to be studied theoretically.
|=0.2, =25||Total time||Propagation||SDP||Pruning|
- The code was mostly written in Matlab (version 184.108.40.2064), calling YALMIP (version 3) and SeDuMi (version 1.3) for the resolution of SDP programs. The computation of the distance function and Jain & Vazirani’s primal dual algorithm were written in C++. The results were obtained on a single core of an Intel quad core running at 2.66GHz, with 8Gb of memory.