Convergence analysis for Lasserre’s measure–based hierarchy of upper bounds for polynomial optimization

Etienne de Klerk
Tilburg University, PO Box 90153, 5000 LE Tilburg, The Netherlands
Email: E.deKlerk@uvt.nl

Monique Laurent
Centrum Wiskunde & Informatica (CWI), Amsterdam, and Tilburg University
CWI, Postbus 94079, 1090 GB Amsterdam, The Netherlands
Email: M.Laurent@cwi.nl

Zhao Sun
École Polytechnique de Montréal, GERAD–HEC Montreal, 3000 Côte-Sainte-Catherine Rd, Montreal, QC H3T 2A7, Canada
Email: Zhao.Sun@polymtl.ca

Abstract

We consider the problem of minimizing a continuous function $f$ over a compact set $K\subseteq\mathbb{R}^n$. We analyze a hierarchy of upper bounds proposed by Lasserre in [SIAM J. Optim. , pp. ], obtained by searching for an optimal probability density function $h$ on $K$ which is a sum of squares of polynomials, so that the expectation $\int_K f(x)h(x)\,dx$ is minimized. We show that the rate of convergence is no worse than $O(1/\sqrt{r})$, where $2r$ is the degree bound on the density function. This analysis applies to the case when $f$ is Lipschitz continuous and $K$ is a full-dimensional compact set satisfying some boundary condition (which is satisfied, e.g., for convex bodies). The $r$th upper bound in the hierarchy may be computed using semidefinite programming if $f$ is a polynomial of degree $d$, and if all moments of order up to $2r+d$ of the Lebesgue measure on $K$ are known, which holds, for example, if $K$ is a simplex, hypercube, or a Euclidean ball.

Keywords: Polynomial optimization · Semidefinite optimization · Lasserre hierarchy

Mathematics Subject Classification: 90C22 · 90C26 · 90C30

1 Introduction and Preliminaries

1.1 Background

We consider the problem of minimizing a continuous function $f:K\to\mathbb{R}$ over a compact set $K\subseteq\mathbb{R}^n$. That is, we consider the problem of computing the parameter:

$$f_{\min,K}:=\min_{x\in K}f(x).$$

Our main interest will be in the case where $f$ is a polynomial, and $K$ is defined by polynomial inequalities and equations. For such problems, active research has been done in recent years to construct tractable hierarchies of (upper and lower) bounds for $f_{\min,K}$, based on using sums of squares of polynomials and semidefinite programming (SDP). The starting point is to reformulate $f_{\min,K}$ as the problem of finding the largest scalar $\lambda$ for which the polynomial $f-\lambda$ is nonnegative over $K$ and then to replace the hard positivity condition by a suitable sum of squares decomposition. Alternatively, one may reformulate $f_{\min,K}$ as the problem of finding a probability measure $\mu$ on $K$ minimizing the integral $\int_K f\,d\mu$. These two dual points of view form the basis of the approach developed by Lasserre Las01 () for building hierarchies of semidefinite programming based lower bounds for $f_{\min,K}$ (see also Las09 (); ML09 () for an overview). Asymptotic convergence to $f_{\min,K}$ holds (under some mild conditions on the set $K$). Moreover, error estimates have been shown in Sch (); NS () when $K$ is a general basic closed semi-algebraic set, and in KL10 (); KLP06 (); KLS13 (); KLS14 (); DW (); Fay (); SZ14 () for simpler sets like the standard simplex, the hypercube and the unit sphere. In particular, Sch () shows that the rate of convergence of the hierarchy of lower bounds based on Schmüdgen’s Positivstellensatz is in the order , while NS () shows a convergence rate in for the (weaker) hierarchy of bounds based on Putinar’s Positivstellensatz. Here, are constants (not explicitly known) depending only on $K$, and is the selected degree bound. For the case of the hypercube, KL10 () shows (using Bernstein approximations) a convergence rate in for the lower bounds based on Schmüdgen’s Positivstellensatz.

On the other hand, by selecting suitable probability measures on $K$, one obtains upper bounds for $f_{\min,K}$. This approach has been investigated, in particular, for minimization over the standard simplex and when selecting some discrete distributions over the grid points in the simplex. The multinomial distribution is used in Nes (); KLS13 () to show convergence in and the multivariate hypergeometric distribution is used in KLS14 () to show convergence in for quadratic minimization over the simplex (and in the general case assuming a rational minimizer exists).

Additionally, Lasserre Las11 () shows that, if we fix any measure $\mu$ on $K$, then it suffices to search for a polynomial density function $h$ which is a sum of squares and minimizes the integral $\int_K fh\,d\mu$ in order to compute the minimum $f_{\min,K}$ over $K$ (see Theorem 1.1 below). By adding degree constraints on the polynomial density $h$ we get a hierarchy of upper bounds for $f_{\min,K}$, and our main objective in this paper is to analyze the quality of this hierarchy of upper bounds. Next we recall this result of Lasserre Las11 () and then describe our main results.

1.2 Lasserre’s hierarchy of upper bounds

Throughout, $\mathbb{R}[x]=\mathbb{R}[x_1,\ldots,x_n]$ is the set of polynomials in $n$ variables with real coefficients, and $\mathbb{R}[x]_r$ is the set of polynomials with degree at most $r$. $\Sigma[x]$ is the set of sums of squares of polynomials, and $\Sigma[x]_r$ consists of all sums of squares of polynomials with degree at most $2r$. We now recall the result of Lasserre Las11 (), which is based on the following characterization for nonnegative continuous functions on a compact set $K$.

Theorem 1.1

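(Las11, Theorem 3.2) Let $K\subseteq\mathbb{R}^n$ be compact, let $\mu$ be an arbitrary finite Borel measure supported by $K$, and let $f$ be a continuous function on $\mathbb{R}^n$. Then, $f$ is nonnegative on $K$ if and only if

$$\int_K g^2 f\,d\mu\ge 0\quad\forall g\in\mathbb{R}[x].$$

Therefore, the minimum of $f$ over $K$ can be expressed as

$$f_{\min,K}=\inf_{h\in\Sigma[x]}\int_K hf\,d\mu\quad\text{s.t.}\quad\int_K h\,d\mu=1.\tag{1}$$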

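Note that formula (1) does not appear explicitly in (Las11, Theorem 3.2), but one can derive it easily from it. Indeed, one can write $f_{\min,K}=\sup\{\lambda : f-\lambda\ge 0 \text{ on } K\}$. Then, by the first part of Theorem 1.1 (applied to $f-\lambda$, and using that every $h\in\Sigma[x]$ is a finite sum of squares $g^2$), we have $f_{\min,K}=\sup\{\lambda : \int_K h(f-\lambda)\,d\mu\ge 0\ \ \forall h\in\Sigma[x]\}$. As $\int_K h(f-\lambda)\,d\mu=\int_K hf\,d\mu-\lambda\int_K h\,d\mu$, after normalizing $h$ so that $\int_K h\,d\mu=1$, we can conclude (1).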

If we select the measure $\mu$ to be the Lebesgue measure in Theorem 1.1, then we obtain the following reformulation for $f_{\min,K}$, which we will consider in this paper:

$$f_{\min,K}=\inf_{h\in\Sigma[x]}\int_K h(x)f(x)\,dx\quad\text{s.t.}\quad\int_K h(x)\,dx=1.$$

By bounding the degree of the polynomial $h$ by $2r$, we can define the parameter:

$$\underline{f}^{(r)}_K:=\inf_{h\in\Sigma[x]_r}\int_K h(x)f(x)\,dx\quad\text{s.t.}\quad\int_K h(x)\,dx=1.\tag{2}$$

Clearly, the inequality $f_{\min,K}\le\underline{f}^{(r)}_K$ holds for all $r\in\mathbb{N}$. Lasserre Las11 () gives conditions under which the infimum is attained in the program (2).

Theorem 1.2

(Las11, Theorems 4.1 and 4.2) Assume $K\subseteq\mathbb{R}^n$ is compact and has nonempty interior and let $f$ be a polynomial. Then, the program (2) has an optimal solution for every $r\in\mathbb{N}$ and

$$\lim_{r\to\infty}\underline{f}^{(r)}_K=f_{\min,K}.$$

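We now recall how to compute the parameter $\underline{f}^{(r)}_K$ in terms of the moments $m_\alpha(K)$ of the Lebesgue measure on $K$, where

$$m_\alpha(K):=\int_K x^\alpha\,dx\quad\text{for } \alpha\in\mathbb{N}^n,$$

and $x^\alpha:=\prod_{i=1}^n x_i^{\alpha_i}$.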

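Let $N(n,r):=\{\alpha\in\mathbb{N}^n : |\alpha|=\sum_{i=1}^n\alpha_i\le r\}$, and suppose $f(x)=\sum_{\beta\in N(n,d)}f_\beta x^\beta$ has degree $d$. If we write $h\in\Sigma[x]_r$ as $h(x)=\sum_{\alpha\in N(n,2r)}h_\alpha x^\alpha$, then the parameter $\underline{f}^{(r)}_K$ from (2) can be reformulated as follows:

$$\begin{aligned}\underline{f}^{(r)}_K=\min\ &\sum_{\beta\in N(n,d)}f_\beta\sum_{\alpha\in N(n,2r)}h_\alpha\,m_{\alpha+\beta}(K)\\ \text{s.t.}\ &\sum_{\alpha\in N(n,2r)}h_\alpha\,m_\alpha(K)=1,\\ &\sum_{\alpha\in N(n,2r)}h_\alpha x^\alpha\in\Sigma[x]_r.\end{aligned}\tag{3}$$

Hence, if we know the moments $m_\alpha(K)$ for all $\alpha\in\mathbb{N}^n$ with $|\alpha|\le 2r+d$, then we can compute the parameter $\underline{f}^{(r)}_K$ by solving the semidefinite program (3), which involves an LMI of size . So the bound $\underline{f}^{(r)}_K$ can be computed in polynomial time for fixed $d$ and $r$ (to any fixed precision).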
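To make the role of the moments concrete, the following Python sketch evaluates the linear objective and the normalization constraint of the program above for one candidate density. It does not solve the semidefinite program. The test function $f(x)=x_1+x_2$, the density $h(x)=3(1-x_1)^2$, the choice $K=[0,1]^2$ (whose moments are $\prod_i 1/(\alpha_i+1)$) and all function names are illustrative assumptions, not taken from the paper.

```python
from fractions import Fraction

def hypercube_moment(alpha):
    """Moment of the Lebesgue measure on [0,1]^n: prod_i 1/(alpha_i + 1)."""
    m = Fraction(1)
    for a in alpha:
        m /= (a + 1)
    return m

def add(alpha, beta):
    """Componentwise sum of two exponent vectors."""
    return tuple(a + b for a, b in zip(alpha, beta))

def objective(f, h, moment):
    """sum_beta f_beta sum_alpha h_alpha m_{alpha+beta}(K), i.e. int_K h f dx."""
    return sum(fb * ha * moment(add(alpha, beta))
               for beta, fb in f.items() for alpha, ha in h.items())

def mass(h, moment):
    """sum_alpha h_alpha m_alpha(K), i.e. int_K h dx (should equal 1)."""
    return sum(ha * moment(alpha) for alpha, ha in h.items())

# Hypothetical data: polynomials stored as {exponent tuple: coefficient}.
f = {(1, 0): Fraction(1), (0, 1): Fraction(1)}                       # x1 + x2
h = {(0, 0): Fraction(3), (1, 0): Fraction(-6), (2, 0): Fraction(3)}  # 3 (1 - x1)^2
uniform = {(0, 0): Fraction(1)}                                       # h = 1

value_h = objective(f, h, hypercube_moment)
value_uniform = objective(f, uniform, hypercube_moment)
```

Here `h` is a square of a polynomial with unit mass, so `value_h` is an upper bound on the minimum of this `f` over the square; it is smaller than `value_uniform`, the mean of `f` under the uniform density.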

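When $K$ is the standard simplex $\Delta_n=\{x\in\mathbb{R}^n : x\ge 0,\ \sum_{i=1}^n x_i\le 1\}$, the unit hypercube $Q_n=[0,1]^n$, or the unit ball $B_1(0)=\{x\in\mathbb{R}^n : \|x\|\le 1\}$, there exist explicit formulas for the moments $m_\alpha(K)$. Namely, for the standard simplex, we have

$$m_\alpha(\Delta_n)=\frac{\prod_{i=1}^n\alpha_i!}{(|\alpha|+n)!},\tag{4}$$

see e.g., (LZ01, equation (2.4)) or (GM78, equation (2.2)). From this one can easily calculate the moments for the hypercube $Q_n$:

$$m_\alpha(Q_n)=\int_{Q_n}x^\alpha\,dx=\prod_{i=1}^n\int_0^1 x_i^{\alpha_i}\,dx_i=\prod_{i=1}^n\frac{1}{\alpha_i+1}.$$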
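As a small sanity check of (4) and of the hypercube formula, the following Python sketch evaluates both in exact rational arithmetic. The function names are ours; the simplex is taken to be the full-dimensional one, $\{x\ge 0,\ \sum_i x_i\le 1\}$, which is the reading under which (4) gives volume $1/n!$.

```python
from fractions import Fraction
from math import factorial

def simplex_moment(alpha):
    """Formula (4): prod_i alpha_i! / (|alpha| + n)!."""
    n = len(alpha)
    numerator = 1
    for a in alpha:
        numerator *= factorial(a)
    return Fraction(numerator, factorial(sum(alpha) + n))

def hypercube_moment(alpha):
    """Moment of [0,1]^n: prod_i 1 / (alpha_i + 1)."""
    denominator = 1
    for a in alpha:
        denominator *= a + 1
    return Fraction(1, denominator)

area_triangle = simplex_moment((0, 0))   # volume of the 2-dimensional simplex
mixed_moment = simplex_moment((1, 1))    # integral of x1 * x2 over the triangle
cube_moment = hypercube_moment((2, 1))   # integral of x1^2 * x2 over the square
```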

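To state the moments for the unit Euclidean ball, we will use the notation $[n]=\{1,\ldots,n\}$, the Euler gamma function $\Gamma(\cdot)$, and the notation $k!!$ for the double factorial of an integer $k$:

$$k!!=\begin{cases}k\cdot(k-2)\cdots 3\cdot 1, & \text{if } k>0 \text{ is odd},\\ k\cdot(k-2)\cdots 4\cdot 2, & \text{if } k>0 \text{ is even},\\ 1 & \text{if } k=0 \text{ or } k=-1.\end{cases}$$

In terms of this notation, the moments for the unit Euclidean ball are given by:

$$m_\alpha(B_1(0))=\begin{cases}\dfrac{\pi^{n/2}\prod_{i=1}^n(\alpha_i-1)!!}{\Gamma\left(1+\frac{n+|\alpha|}{2}\right)2^{|\alpha|/2}}=\dfrac{\pi^{(n-1)/2}\,2^{(n+1)/2}\prod_{i=1}^n(\alpha_i-1)!!}{(n+|\alpha|)!!} & \text{if } \alpha_i \text{ is even for all } i\in[n],\\[2mm] 0 & \text{otherwise.}\end{cases}\tag{5}$$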
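The ball moments can be checked numerically against known volumes. The Python sketch below implements the Gamma-function form of (5) and, separately, the double-factorial form. The function names are ours. The two forms are compared here only for odd $n$, which is the case where $1+(n+|\alpha|)/2$ is a half-integer and the half-integer Gamma identity applies; the Gamma form is the one tested against the volumes of the interval, disc and 3-ball.

```python
from math import gamma, pi, isclose

def double_factorial(k):
    """k!! with the convention 0!! = (-1)!! = 1."""
    result = 1
    while k > 1:
        result *= k
        k -= 2
    return result

def _odd_part(alpha):
    """prod_i (alpha_i - 1)!! for an all-even exponent vector."""
    prod = 1
    for a in alpha:
        prod *= double_factorial(a - 1)
    return prod

def ball_moment(alpha):
    """Gamma-function form of (5) for the unit Euclidean ball."""
    if any(a % 2 for a in alpha):
        return 0.0
    n, s = len(alpha), sum(alpha)
    return pi ** (n / 2) * _odd_part(alpha) / (gamma(1 + (n + s) / 2) * 2 ** (s / 2))

def ball_moment_df(alpha):
    """Double-factorial form of (5); used below only for odd n."""
    if any(a % 2 for a in alpha):
        return 0.0
    n, s = len(alpha), sum(alpha)
    return (pi ** ((n - 1) / 2) * 2 ** ((n + 1) / 2) * _odd_part(alpha)
            / double_factorial(n + s))

volume_disc = ball_moment((0, 0))        # area of the unit disc
second_moment = ball_moment((2, 0))      # integral of x1^2 over the unit disc
```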

One may prove relation (5) using

$$\int_{B_1(0)}x^\alpha\,dx=\frac{1}{\Gamma\left(1+(n+|\alpha|)/2\right)}\int_{\mathbb{R}^n}x^\alpha\exp(-\|x\|^2)\,dx$$

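(see, e.g., (Las14, Theorem 2.1)), together with the fact (see, e.g., page in Las11 ()) that

$$\int_{-\infty}^{+\infty}t^p\exp(-t^2)\,dt=\begin{cases}\Gamma\left(\frac{p+1}{2}\right) & \text{if } p \text{ is even},\\ 0 & \text{if } p \text{ is odd},\end{cases}$$

and the identity $\Gamma\left(k+\frac{1}{2}\right)=\sqrt{\pi}\,\frac{(2k-1)!!}{2^k}$ for all integers $k\ge 0$ (see e.g., (AS64, Section 6.1.12)).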

For a general polytope $K\subseteq\mathbb{R}^n$, it is a hard problem to compute the moments $m_\alpha(K)$. In fact, the problem of computing the volume of polytopes of varying dimensions is already #P-hard DF88 (). On the other hand, any polytope $K$ can be triangulated into finitely many simplices (see e.g., LRS08 ()) so that one could use (4) to obtain the moments $m_\alpha(K)$ of $K$. The complexity of this method depends on the number of simplices in the triangulation. However, this number can be exponentially large (e.g., for the hypercube) and the problem of finding the smallest possible triangulation of a polytope is NP-hard, even in fixed dimension (see e.g., LRS08 ()).

Example

Consider the minimization of the Motzkin polynomial over the hypercube , which has four global minimizers at the points , and . Figure 1 shows the computed optimal sum of squares density function , for , corresponding to . We observe that the optimal density shows four peaks at the four global minimizers and thus, it appears to approximate the density of a convex combination of the Dirac measures at the four minimizers.

We will present several additional numerical examples in Section 4.

1.3 Our main results

In this paper we analyze the quality of the upper bounds $\underline{f}^{(r)}_K$ from (2) for the minimum $f_{\min,K}$ of $f$ over $K$. Our main result is an upper bound for the range $\underline{f}^{(r)}_K-f_{\min,K}$, which applies to the case when $f$ is Lipschitz continuous on $K$ and when $K$ is a full-dimensional compact set satisfying the additional condition from Assumption 1, see Theorem 1.3 below. We will use throughout the following notation about the set $K$.

We let $D(K)=\max_{x,y\in K}\|x-y\|^2$ denote the (squared) diameter of the set $K$, where $\|\cdot\|$ is the $\ell_2$-norm. Moreover, $w_{\min}(K)$ is the minimal width of $K$, which is the minimum distance between two distinct parallel supporting hyperplanes of $K$. Throughout, $B_\epsilon(a):=\{x\in\mathbb{R}^n : \|x-a\|\le\epsilon\}$ denotes the Euclidean ball centered at $a$ and with radius $\epsilon$. With $\gamma_n$ denoting the volume of the $n$-dimensional unit ball, the volume of the ball $B_\epsilon(a)$ is given by

$$\mathrm{vol}\,B_\epsilon(a)=\epsilon^n\gamma_n.$$

We now formulate our geometric assumption about the set $K$, which says (roughly) that, around any point $a\in K$, the set $K$ contains a constant fraction of every sufficiently small ball centered at $a$.

Assumption 1

For all points $a\in K$ there exist constants $\eta_K>0$ and $\epsilon_K>0$ such that

$$\mathrm{vol}(B_\epsilon(a)\cap K)\ge\eta_K\,\mathrm{vol}\,B_\epsilon(a)=\eta_K\epsilon^n\gamma_n\quad\text{for all } 0<\epsilon\le\epsilon_K.\tag{6}$$

Note that Assumption 1 implies that the set $K$ has positive Lebesgue density at all points $a\in K$. For all sets $K$ satisfying Assumption 1, we also define the parameter

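$$r_K:=\max\left\{\frac{D(K)\,e}{2\epsilon_K^3},\,n\right\}\ \text{ if } \epsilon_K\le 1,\quad\text{and}\quad r_K:=\frac{D(K)\,e}{2}\ \text{ if } \epsilon_K\ge 1.\tag{7}$$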

Here, $e$ denotes the base of the natural logarithm. Note that the parameters $\eta_K$, $\epsilon_K$ and $r_K$ depend not only on the set $K$ but also on the point $a$; we omit the dependence on $a$ to simplify notation. Assumption 1 will be used in the case when the point $a$ is a global minimizer in $K$ of the polynomial $f$ to be analyzed.

For instance, convex bodies and, more generally, compact star-shaped sets satisfy Assumption 1 (see Section 5.1). We now give an example of a set that does not satisfy Assumption 1 and refer to Section 5.1 for more discussion about Assumption 1.

Example 1

Consider the following set $K\subseteq\mathbb{R}^2$, displayed in Figure 2:

$$K=\{x\in\mathbb{R}^2 : x\ge 0,\ (x_1-1)^2+(x_2-1)^2\ge 1\}.$$

One can easily check that Assumption 1 is not satisfied, since the condition (6) does not hold for the two points $(1,0)$ and $(0,1)$.
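The failure of condition (6) in Example 1 can be observed numerically. The following Python sketch estimates the ratio $\mathrm{vol}(B_\epsilon(a)\cap K)/\mathrm{vol}\,B_\epsilon(a)$ at the point $a=(1,0)$, where the circle touches the axis $x_2=0$; taking this as one of the two problematic points is our reading of the geometry. For $0<\epsilon<1$, a point $(1+u,v)$ of the ball lies in $K$ exactly when $0\le v\le 1-\sqrt{1-u^2}$, so the area reduces to a one-dimensional integral, approximated here by the midpoint rule. The function name and step count are illustrative.

```python
from math import sqrt, pi

def density_ratio(eps, steps=20000):
    """Estimate vol(B_eps(a) intersect K) / vol(B_eps(a)) at a = (1, 0), 0 < eps < 1."""
    du = 2 * eps / steps
    area = 0.0
    for i in range(steps):
        u = -eps + (i + 0.5) * du                  # horizontal offset from a
        gap_height = 1.0 - sqrt(1.0 - u * u)       # room between the axis and the disc
        ball_height = sqrt(eps * eps - u * u)      # upper half of the ball B_eps(a)
        area += min(gap_height, ball_height) * du
    return area / (pi * eps * eps)

ratios = [density_ratio(eps) for eps in (0.5, 0.1, 0.01)]
```

The ratios shrink roughly linearly with $\epsilon$, so no constant $\eta_K>0$ can serve in (6) at this point.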

We now present our main result.

Theorem 1.3

Assume that $K\subseteq\mathbb{R}^n$ is compact and satisfies Assumption 1. Then there exists a constant $\zeta(K)$ (depending only on $K$) such that, for all Lipschitz continuous functions $f$ with Lipschitz constant $M_f$ on $K$, the following inequality holds:

$$\underline{f}^{(r)}_K-f_{\min,K}\le\frac{\zeta(K)M_f}{\sqrt{r}}\quad\text{for all } r\ge r_K+1.\tag{8}$$

Moreover, if $f$ is a polynomial of degree $d$ and $K$ is a convex body, then

$$\underline{f}^{(r)}_K-f_{\min,K}\le\frac{2d^2\zeta(K)\sup_{x\in K}|f(x)|}{w_{\min}(K)}\,\frac{1}{\sqrt{r}}\quad\text{for all } r\ge r_K+1.\tag{9}$$

The key idea to show this result is to select suitable sums of squares densities which we are able to analyse. For this, we will select a global minimizer $a$ of $f$ over $K$ and consider the Gaussian distribution with mean $a$ and, as sums of squares densities, we will select the polynomials $H_{r,a}$ obtained by truncating the Taylor series expansion of the Gaussian distribution, see relation (14).

Remark 1

When the polynomial $f$ has a root in $K$ (which can be assumed without loss of generality), the parameter $\sup_{x\in K}|f(x)|$ involved in relation (9) can easily be upper bounded in terms of the range of values of $f$; namely,

$$\sup_{x\in K}|f(x)|\le f_{\max,K}-f_{\min,K},$$

where $f_{\max,K}$ denotes the maximum value of $f$ over $K$. Hence relation (9) also implies an upper bound on $\underline{f}^{(r)}_K-f_{\min,K}$ in terms of the range $f_{\max,K}-f_{\min,K}$, as is commonly used in approximation analysis (see, e.g., KHE08 (); KLP06 ()).

1.4 Contents of the paper

Our paper is organized as follows. In Section 2, we give a constructive proof for our main result in Theorem 1.3. In Section 3 we show how to obtain feasible points in $K$ that correspond to the bounds $\underline{f}^{(r)}_K$ through sampling. This is followed by a section with numerical examples (Section 4). Finally, in the concluding remarks (Section 5), we revisit Assumption 1, and discuss perspectives for future research.

2 Proof of our main result in Theorem 1.3

In this section we prove our main result in Theorem 1.3. Our analysis holds for Lipschitz continuous functions, so we start by reviewing some relevant properties in Section 2.1. In the next step we indicate in Section 2.2 how to select the polynomial density function $H_{r,a}$ as a special sum of squares that we will be able to analyze. Namely, we let $a$ denote a global minimizer of the function $f$ over the set $K$. Then we consider the density function $G_a$ in (12) of the Gaussian distribution with mean $a$ (and suitable variance) and the polynomial $H_{r,a}$ in (14), which is obtained from the truncation at degree $4r$ of the Taylor series expansion of the Gaussian density function $G_a$. The final step will be to analyze the quality of the bound obtained by selecting the polynomial $H_{r,a}$, and this will be the most technical part of the proof, carried out in Section 2.3.

2.1 Lipschitz continuous functions

A function $f$ is said to be Lipschitz continuous on $K$, with Lipschitz constant $M_f$, if it satisfies:

$$|f(y)-f(x)|\le M_f\|y-x\|\quad\text{for all } x,y\in K.$$

If $f$ is continuous and differentiable on $K$, then $f$ is Lipschitz continuous on $K$ with respect to the constant

$$M_f=\max_{x\in K}\|\nabla f(x)\|.\tag{10}$$

Furthermore, if $f$ is an $n$-variate polynomial with degree $d$, then the Markov inequality for $f$ on a convex body $K$ reads as

$$\max_{x\in K}\|\nabla f(x)\|\le\frac{2d^2}{w_{\min}(K)}\sup_{x\in K}|f(x)|,$$

see e.g., (KHE08, relation (8)). Thus, together with (10), we have that $f$ is Lipschitz continuous on $K$ with respect to the constant

$$M_f\le\frac{2d^2}{w_{\min}(K)}\sup_{x\in K}|f(x)|.\tag{11}$$
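As an illustration of the bound (11), the following Python sketch compares the largest derivative of a univariate polynomial on an interval with the Markov-type bound. The polynomial $f(x)=x^3-x$, the interval $K=[-1,1]$ (so that $d=3$ and $w_{\min}(K)=2$) and the grid-based evaluation of the maxima are hypothetical choices made only for this example.

```python
def markov_bound(d, w_min, sup_abs_f):
    """Right-hand side of (11): 2 d^2 / w_min(K) * sup_K |f|."""
    return 2 * d ** 2 / w_min * sup_abs_f

def f(x):
    return x ** 3 - x

def df(x):
    return 3 * x ** 2 - 1

# Uniform grid on K = [-1, 1], endpoints included.
grid = [-1 + 2 * i / 2000 for i in range(2001)]
sup_abs_f = max(abs(f(x)) for x in grid)    # grid estimate of sup |f| on K
max_grad = max(abs(df(x)) for x in grid)    # grid estimate of max |f'| on K
bound = markov_bound(3, 2.0, sup_abs_f)
```

In this example the derivative is largest at the endpoints of the interval, and the bound holds with some slack.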

2.2 Choosing the polynomial density function $H_{r,a}$

Consider the function

$$G_a(x):=\frac{1}{(2\pi\sigma^2)^{n/2}}\exp\left(-\frac{\|x-a\|^2}{2\sigma^2}\right),\tag{12}$$

which is the probability density function of the Gaussian distribution with mean $a$ and standard deviation $\sigma$ (whose value will be defined later). Let the constant $C_{K,a}$ be defined by

$$\int_K C_{K,a}\,G_a(x)\,dx=1.\tag{13}$$

Observe that is equal to the function evaluated at the point .

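Denote by $H_{r,a}$ the Taylor series expansion of $G_a$ truncated at the order $2r$. That is,

$$H_{r,a}(x)=\frac{1}{(2\pi\sigma^2)^{n/2}}\sum_{k=0}^{2r}\frac{1}{k!}\left(-\frac{\|x-a\|^2}{2\sigma^2}\right)^k.\tag{14}$$

Moreover consider the constant $c^r_{K,a}$, defined by

$$\int_K c^r_{K,a}\,H_{r,a}(x)\,dx=1.\tag{15}$$

The next step is to show that $H_{r,a}$ is a sum of squares of polynomials and thus $H_{r,a}\in\Sigma[x]_{2r}$. This follows from the next lemma.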

Lemma 1

Let $\phi_{2r}$ denote the (univariate) polynomial of degree $2r$ obtained by truncating the Taylor series expansion of $e^{-t}$ at the order $2r$. That is,

$$\phi_{2r}(t):=\sum_{k=0}^{2r}\frac{(-t)^k}{k!}.$$

Then $\phi_{2r}$ is a sum of squares of polynomials. Moreover, we have

$$0\le\phi_{2r}(t)-e^{-t}\le\frac{t^{2r+1}}{(2r+1)!}\quad\text{for all } t\ge 0.\tag{16}$$

Proof

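First, we show that $\phi_{2r}$ is a sum of squares. As $\phi_{2r}$ is a univariate polynomial, by Hilbert’s Theorem (see e.g., (ML09, Theorem 3.4)), it suffices to show that $\phi_{2r}(t)\ge 0$ for all $t\in\mathbb{R}$. As $\lim_{t\to\pm\infty}\phi_{2r}(t)=+\infty$, it suffices to show that $\phi_{2r}(t)\ge 0$ at all the stationary points $t$ where $\phi_{2r}'(t)=0$. For this, observe that $\phi_{2r}'(t)=-\phi_{2r-1}(t)$, so that it can be written as $\phi_{2r}'(t)=-\phi_{2r}(t)+\frac{t^{2r}}{(2r)!}$. Hence, for all $t$ with $\phi_{2r}'(t)=0$, we have $\phi_{2r}(t)=\frac{t^{2r}}{(2r)!}\ge 0$.

Next, we show that (16) holds for all $t\ge 0$. Fix $t\ge 0$. Then, by Taylor's Theorem (see e.g., WW96 ()), one has $e^{-t}=\phi_{2r}(t)-e^{-\xi}\frac{t^{2r+1}}{(2r+1)!}$ for some $\xi\in[0,t]$. As $0\le e^{-\xi}\le 1$, one can conclude that $\phi_{2r}(t)-e^{-t}\ge 0$ and

$$\phi_{2r}(t)-e^{-t}\le\frac{t^{2r+1}}{(2r+1)!}.\qquad∎$$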
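The two claims of Lemma 1 are easy to probe numerically. The Python sketch below evaluates the truncated series $\phi_{2r}$ and checks, on a grid of sample points, that it is nonnegative on the real line and that the sandwich (16) holds for $t\ge 0$. The function names, the grids and the rounding tolerance `tol` are our own choices; the check is an illustration, not a proof.

```python
from math import exp, factorial

def phi(t, r):
    """Taylor series of exp(-t) truncated at order 2r."""
    return sum((-t) ** k / factorial(k) for k in range(2 * r + 1))

def satisfies_16(r, ts, tol=1e-12):
    """Check 0 <= phi_{2r}(t) - exp(-t) <= t^(2r+1)/(2r+1)! on the sample points ts."""
    for t in ts:
        gap = phi(t, r) - exp(-t)
        upper = t ** (2 * r + 1) / factorial(2 * r + 1)
        if not (-tol <= gap <= upper + tol):
            return False
    return True

nonnegative_ts = [0.1 * i for i in range(51)]       # t in [0, 5]
all_ts = [0.1 * i for i in range(-50, 51)]          # t in [-5, 5]
checks = [satisfies_16(r, nonnegative_ts) for r in range(1, 6)]
```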

We now consider the parameter $f^{(r)}_{K,a}$ defined as

$$f^{(r)}_{K,a}:=\int_K f(x)\,c^r_{K,a}H_{r,a}(x)\,dx.\tag{17}$$
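To get a feeling for the parameter in (17), the following Python sketch evaluates it in the simplest setting $n=1$, $K=[0,1]$, $f(x)=x$ and $a=0$, using the midpoint rule. All of these choices, the function names, and in particular the value of $\sigma$ are assumptions made for illustration: the text fixes $\sigma$ only later, so here it is a free input.

```python
from math import factorial, pi, sqrt

def H(x, a, sigma, r):
    """Truncated Gaussian density (14) in dimension n = 1."""
    t = (x - a) ** 2 / (2 * sigma ** 2)
    series = sum((-t) ** k / factorial(k) for k in range(2 * r + 1))
    return series / sqrt(2 * pi * sigma ** 2)

def midpoint(g, lo, hi, steps=4000):
    """Midpoint-rule approximation of the integral of g over [lo, hi]."""
    width = (hi - lo) / steps
    return sum(g(lo + (i + 0.5) * width) for i in range(steps)) * width

def upper_bound(f, a, sigma, r, lo=0.0, hi=1.0):
    """f^{(r)}_{K,a} from (17) for K = [lo, hi]; dividing by the mass enforces (15)."""
    mass = midpoint(lambda x: H(x, a, sigma, r), lo, hi)
    return midpoint(lambda x: f(x) * H(x, a, sigma, r), lo, hi) / mass

value = upper_bound(lambda x: x, a=0.0, sigma=0.25, r=10)
```

With these inputs the value lies strictly between the minimum $0$ of $f$ and the mean $1/2$ of $f$ under the uniform density. If $\sigma$ is taken much smaller for the same $r$, the truncated series grows large away from $a$ and the value deteriorates, which suggests why $\sigma$ has to be coupled to $r$.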

Our main technical result is the following upper bound for the range $f^{(r)}_{K,a}-f_{\min,K}$.

Theorem 2.1

Assume $K\subseteq\mathbb{R}^n$ is compact and satisfies Assumption 1, and consider the parameter $r_K$ from (7). Then there exists a constant $\zeta(K)$ (depending only on $K$) such that, for all Lipschitz continuous functions $f$ with Lipschitz constant $M_f$ on $K$, the following inequality holds:

$$f^{(r)}_{K,a}-f_{\min,K}\le\frac{\zeta(K)M_f}{\sqrt{2r+1}}\quad\text{for all } r\ge\frac{r_K}{2}.\tag{18}$$

Moreover, if $f$ is a polynomial of degree $d$ and $K$ is a convex body, then

$$f^{(r)}_{K,a}-f_{\min,K}\le\frac{2d^2\zeta(K)\sup_{x\in K}|f(x)|}{w_{\min}(K)\sqrt{2r+1}}\quad\text{for all } r\ge\frac{r_K}{2}.\tag{19}$$

We will give the proof of Theorem 2.1, which has lengthy technical details, in Section 2.3 below. We now show how to derive Theorem 1.3 as a direct application of Theorem 2.1.

Proof

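(of Theorem 1.3) Assume $f$ is Lipschitz continuous with Lipschitz constant $M_f$ on $K$ and $a$ is a minimizer of $f$ over the set $K$. Using the definitions (2) and (17) of the parameters and the fact that $H_{r,a}$ is a sum of squares with degree $4r$, it follows that

$$\underline{f}^{(2r+1)}_K\le\underline{f}^{(2r)}_K\le f^{(r)}_{K,a}\quad\text{for all } r\in\mathbb{N}.$$

Then, from inequality (18) in Theorem 2.1, one obtains

$$\underline{f}^{(2r+1)}_K-f_{\min,K}\le\underline{f}^{(2r)}_K-f_{\min,K}\le f^{(r)}_{K,a}-f_{\min,K}\le\frac{\zeta(K)M_f}{\sqrt{2r+1}}\quad\text{for all } r\ge\frac{r_K}{2}.$$

Hence, for all $r\ge r_K+1$ (applying the previous chain with $\lfloor r/2\rfloor$ in place of $r$, which satisfies $\lfloor r/2\rfloor\ge r_K/2$),

$$\begin{aligned}\underline{f}^{(r)}_K-f_{\min,K}&\le\frac{\zeta(K)M_f}{\sqrt{r+1}}\le\frac{\zeta(K)M_f}{\sqrt{r}} &&\text{for even } r,\\ \underline{f}^{(r)}_K-f_{\min,K}&\le\frac{\zeta(K)M_f}{\sqrt{r}} &&\text{for odd } r.\end{aligned}$$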

This concludes the proof for relation (8), and relation (9) follows from (19) in an analogous way. This finishes the proof of Theorem 1.3. ∎

2.3 Analyzing the polynomial density function $H_{r,a}$

In this section we prove the result of Theorem 2.1. Recall that $a$ is a global minimizer of $f$ over $K$. For the proof, we will need the following four technical lemmas.

Lemma 2

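Assume $K\subseteq\mathbb{R}^n$ is compact and satisfies Assumption 1. Then, for all $0<\epsilon\le\epsilon_K$ and $r\in\mathbb{N}$, we have:

$$c^r_{K,a}\le C_{K,a}\le\frac{(2\pi\sigma^2)^{n/2}\exp\left(\frac{\epsilon^2}{2\sigma^2}\right)}{\eta_K\,\epsilon^n\gamma_n}.\tag{20}$$

Proof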

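By Lemma 1, $\phi_{2r}(t)\ge e^{-t}$ for all $t\ge 0$, which implies $H_{r,a}(x)\ge G_a(x)$ for all $x\in\mathbb{R}^n$. Together with the relations (13) and (15) defining the constants $C_{K,a}$ and $c^r_{K,a}$, we deduce that $c^r_{K,a}\le C_{K,a}$. Moreover, by the definition (13) of the constant $C_{K,a}$, one has

$$\begin{aligned}\frac{1}{C_{K,a}}&=\int_K G_a(x)\,dx=\int_K\frac{1}{(2\pi\sigma^2)^{n/2}}\exp\left(-\frac{\|x-a\|^2}{2\sigma^2}\right)dx\\ &\ge\int_{K\cap B_\epsilon(a)}\frac{1}{(2\pi\sigma^2)^{n/2}}\exp\left(-\frac{\|x-a\|^2}{2\sigma^2}\right)dx\\ &\ge\frac{1}{(2\pi\sigma^2)^{n/2}}\exp\left(-\frac{\epsilon^2}{2\sigma^2}\right)\mathrm{vol}(K\cap B_\epsilon(a)).\end{aligned}$$

We now use relation (6) from Assumption 1 in order to conclude that $\frac{1}{C_{K,a}}\ge\frac{1}{(2\pi\sigma^2)^{n/2}}\exp\left(-\frac{\epsilon^2}{2\sigma^2}\right)\eta_K\,\epsilon^n\gamma_n$, which gives the desired upper bound on $C_{K,a}$. ∎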
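The upper bound on $C_{K,a}$ in (20) can be checked in a one-dimensional toy case where everything is explicit. In the Python sketch below we take $K=[0,1]$ and $a=0$, for which (6) holds with $\eta_K=1/2$ and $\epsilon_K=1$ (half of each interval $[-\epsilon,\epsilon]$ lies in $K$) and $\gamma_1=2$. These values, the function names and the sampled $\sigma$ and $\epsilon$ are our assumptions for the example.

```python
from math import erf, exp, pi, sqrt

def C_exact(sigma):
    """C_{K,a} from (13) for K = [0, 1], a = 0: reciprocal of the Gaussian mass of K."""
    return 2.0 / erf(1.0 / (sigma * sqrt(2.0)))

def C_upper(sigma, eps, eta=0.5, gamma_n=2.0, n=1):
    """Right-hand side of (20)."""
    return ((2 * pi * sigma ** 2) ** (n / 2) * exp(eps ** 2 / (2 * sigma ** 2))
            / (eta * eps ** n * gamma_n))

pairs = [(sigma, eps) for sigma in (0.2, 0.5, 2.0) for eps in (0.1, 0.5, 1.0)]
holds = [C_exact(sigma) <= C_upper(sigma, eps) for sigma, eps in pairs]
```

For large $\sigma$ and $\epsilon=1$ the two sides are close, since the Gaussian is then nearly constant on $K$.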

Lemma 3

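Given $\tilde{x}\in\mathbb{R}^n$ and a function $F:\mathbb{R}_+\to\mathbb{R}$, define the function $f:\mathbb{R}^n\to\mathbb{R}$ by $f(x)=F(\|x-\tilde{x}\|)$ for all $x\in\mathbb{R}^n$. Then, for all $0\le\rho_1\le\rho_2$, one has

$$\int_{B_{\rho_2}(\tilde{x})\setminus B_{\rho_1}(\tilde{x})}f(x)\,dx=n\gamma_n\int_{\rho_1}^{\rho_2}z^{n-1}F(z)\,dz,$$

where $\gamma_n$ is the volume of the unit Euclidean ball in $\mathbb{R}^n$.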

Proof

Apply a change of variables using spherical coordinates as explained, e.g., in LEB60 (). ∎
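The radial formula of Lemma 3 can be sanity-checked on integrands whose shell integrals are known in closed form. The Python sketch below evaluates the right-hand side by the midpoint rule and compares it, for $F\equiv 1$, with the volume $\gamma_n(\rho_2^n-\rho_1^n)$ of the shell and, for $n=2$ and $F(z)=z^2$, with $\pi(\rho_2^4-\rho_1^4)/2$. The function names and the expression $\gamma_n=\pi^{n/2}/\Gamma(n/2+1)$ for the unit ball volume are standard but are our additions.

```python
from math import gamma, pi, isclose

def unit_ball_volume(n):
    """gamma_n = pi^(n/2) / Gamma(n/2 + 1)."""
    return pi ** (n / 2) / gamma(n / 2 + 1)

def radial_shell_integral(F, n, rho1, rho2, steps=20000):
    """n * gamma_n * integral_{rho1}^{rho2} z^(n-1) F(z) dz via the midpoint rule."""
    width = (rho2 - rho1) / steps
    total = 0.0
    for i in range(steps):
        z = rho1 + (i + 0.5) * width
        total += z ** (n - 1) * F(z)
    return n * unit_ball_volume(n) * total * width

shell_volume = radial_shell_integral(lambda z: 1.0, 3, 0.5, 2.0)
annulus_second_moment = radial_shell_integral(lambda z: z * z, 2, 0.5, 2.0)
```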

Lemma 4

For all positive integers and , one has .

Proof

Let $n$ be given. Denote

$$g(r):=\left(\frac{1}{2r+1}\right)^{-\frac{n}{4(2r+1)+2n}}=(2r+1)^{\frac{n}{4(2r+1)+2n}}\quad(r\ge 0).$$

Observe that, , for all , , and thus . It suffices to show for all stationary points . Since

$$\frac{d\ln(g(r))}{dr}=\frac{-8n\ln(2r+1)}{(8r+4+2n)^2}+\frac{2n}{(2r+1)(8r+4+2n)},$$

and , any stationary point satisfies

$$\frac{d\ln(g(r^*))}{dr}=0\iff(2r^*+1)\left[\ln(2r^*+1)-1\right]=\frac{n}{2}.$$

Since

$$(2r^*+1)(\ln(3)-1)\le(2r^*+1)\left[\ln(2r^*+1)-1\right]=\frac{n}{2},$$

one has . Since for all , one has

Lemma 5

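Assume $K\subseteq\mathbb{R}^n$ is compact and satisfies Assumption 1. Then, for all $0<\epsilon\le\epsilon_K$, one has

$$\int_K C_{K,a}\|x-a\|\,G_a(x)\,dx\le\epsilon+\frac{n\sigma^{n+1}p(n)}{\epsilon^n\eta_K}\,e^{\frac{\epsilon^2}{2\sigma^2}},$$

where $p(n)$ is a constant depending on $n$, given by

$$p(n)=\begin{cases}1 & \text{if } n=1,\\ \sqrt{\frac{\pi}{2}}\prod_{j=1}^k(2j-1) & \text{if } n=2k \text{ and } k\ge 1,\\ \prod_{j=1}^k(2j) & \text{if } n=2k+1 \text{ and } k\ge 1.\end{cases}\tag{21}$$

Proof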

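Let $\varphi$ denote the integral that we need to upper bound. We split the integral as $\varphi=\varphi_1+\varphi_2$, depending on whether $x$ lies in the ball $B_\epsilon(a)$ or not.

First, we upper bound the term $\varphi_1$ as

$$\varphi_1:=\int_{K\cap B_\epsilon(a)}\|x-a\|\,C_{K,a}G_a(x)\,dx\le\epsilon\int_{K\cap B_\epsilon(a)}C_{K,a}G_a(x)\,dx\le\epsilon\int_K C_{K,a}G_a(x)\,dx=\epsilon.$$

Second, we bound the integral

$$\varphi_2:=C_{K,a}\int_{K\setminus B_\epsilon(a)}\|x-a\|\,G_a(x)\,dx.$$

Since $K\subseteq B_{\sqrt{D(K)}}(a)$, one has

$$\varphi_2\le C_{K,a}\int_{B_{\sqrt{D(K)}}(a)\setminus B_\epsilon(a)}\|x-a\|\,G_a(x)\,dx,$$

where the right hand side, by Lemma 3, is equal to

$$\frac{C_{K,a}\,n\gamma_n}{(2\pi\sigma^2)^{n/2}}\int_\epsilon^{\sqrt{D(K)}}z^n\exp\left(-\frac{z^2}{2\sigma^2}\right)dz.$$

By a change of variable $t=z/\sigma$, one obtains

$$\varphi_2\le\frac{C_{K,a}\,n\gamma_n\sigma}{(2\pi)^{n/2}}\int_{\epsilon/\sigma}^{\sqrt{D(K)}/\sigma}t^n\exp\left(-\frac{t^2}{2}\right)dt,$$

and thus

$$\varphi_2\le\frac{C_{K,a}\,n\gamma_n\sigma}{(2\pi)^{n/2}}\int_0^{+\infty}t^n\exp\left(-\frac{t^2}{2}\right)dt=\frac{C_{K,a}\,n\gamma_n\sigma}{(2\pi)^{n/2}}\,p(n).$$

Here we have set $p(n):=\int_0^{+\infty}t^n\exp\left(-\frac{t^2}{2}\right)dt$, which can be checked to be given by (21) (e.g., using induction on $n$). Now, combining with the upper bound for $C_{K,a}$ from (20), we obtain

$$\varphi_2\le\frac{n\sigma^{n+1}p(n)}{\epsilon^n\eta_K}\,e^{\frac{\epsilon^2}{2\sigma^2}}.$$

Therefore, we have shown:

$$\varphi=\varphi_1+\varphi_2\le\epsilon+\frac{n\sigma^{n+1}p(n)}{\epsilon^n\eta_K}\,e^{\frac{\epsilon^2}{2\sigma^2}},$$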

which shows the lemma. ∎
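The closed form (21) for the constant $p(n)$ can be cross-checked without induction. Substituting $u=t^2/2$ in $\int_0^{+\infty}t^n\exp(-t^2/2)\,dt$ gives $2^{(n-1)/2}\,\Gamma\left(\frac{n+1}{2}\right)$; this Gamma expression is our own reformulation and does not appear in the text. The Python sketch below compares it with (21) for small $n$; the function names are illustrative.

```python
from math import gamma, pi, sqrt, isclose

def p_closed_form(n):
    """p(n) as given by the case distinction (21)."""
    if n == 1:
        return 1.0
    k = n // 2
    prod = 1
    if n % 2 == 0:                      # n = 2k: sqrt(pi/2) * 1 * 3 * ... * (2k - 1)
        for j in range(1, k + 1):
            prod *= 2 * j - 1
        return sqrt(pi / 2) * prod
    for j in range(1, k + 1):           # n = 2k + 1: 2 * 4 * ... * (2k)
        prod *= 2 * j
    return float(prod)

def p_gamma(n):
    """Integral of t^n exp(-t^2/2) over [0, inf) via the substitution u = t^2/2."""
    return 2 ** ((n - 1) / 2) * gamma((n + 1) / 2)

agreement = [isclose(p_closed_form(n), p_gamma(n)) for n in range(1, 11)]
```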

We are now ready to prove Theorem 2.1.

Proof

(of Theorem 2.1) Observe that, if $f$ is a polynomial, then we can use the upper bound (11) for its Lipschitz constant and thus the inequality (19) follows as a direct consequence of the inequality (18). Therefore, it suffices to show the relation (18).

Recall that $a$ is a minimizer of $f$ over $K$. As $f$ is Lipschitz continuous with Lipschitz constant $M_f$ on $K$, we have

$$f(x)-f(a)\le M_f\|x-a\|\quad\forall x\in K.$$

This implies

$$f^{(r)}_{K,a}-f_{\min,K}=\int_K c^r_{K,a}H_{r,a}(x)\,(f(x)-f(a))\,dx\le M_f\int_K\|x-a\|\,c^r_{K,a}H_{r,a}(x)\,dx.$$

Our objective is now to show the existence of a constant $\zeta(K)$ such that

$$\psi:=\int_K c^r_{K,a}\|x-a\|\,H_{r,a}(x)\,dx\le\frac{\zeta(K)}{\sqrt{2r+1}}\quad\text{for all } r\ge r_K,$$

by which we can then conclude the proof for (18).

For this, we split the integral as the sum of two terms:

 ψ = ∫KcrK,a∥x−a∥Ga(x)dx=:ψ1+∫KcrK,a∥x−a∥(