Semi-infinite optimization with sums of exponentials via polynomial approximation

# Semi-infinite optimization with sums of exponentials via polynomial approximation

Bogdan Dumitrescu, Bogdan C. Şicleru, Florin Avram
###### Abstract

We propose a general method for optimization with semi-infinite constraints that involve a linear combination of functions, focusing on the case of the exponential function. Each function is lower and upper bounded on sub-intervals by low-degree polynomials. Thus, the constraints can be approximated with polynomial inequalities that can be implemented with linear matrix inequalities. Convexity is preserved, but the problem has now a finite number of constraints. We show how to take advantage of the properties of the exponential function in order to build quickly accurate approximations. The problem used for illustration is the least-squares fitting of a positive sum of exponentials to an empirical density. When the exponents are given, the problem is convex, but we also give a procedure for optimizing the exponents. Several examples show that the method is flexible, accurate and gives better results than other methods for the investigated problems.

Department of Automatic Control and Computers, University Politehnica of Bucharest, Spl. Independenţei 313, Bucharest 060042, Romania. E-mails: bogdan.dumitrescu, bogdan.sicleru@acse.pub.ro
Department of Signal Processing, Tampere University of Technology, Finland. E-mail: bogdan.dumitrescu@tut.fi
Université de Pau et des Pays de l’Adour, France

## 1 Introduction

### 1.1 The problem

The purpose of this paper is twofold. Our main aim is to give a detailed procedure for solving a class of semi-infinite programming (SIP) problems involving the function

 f(t)=n∑i=1pi(t)e−λit, (1)

where and are polynomials, having usually a low degree . For the sake of abbreviation, we name SOPE (sum of polynomials times exponentials) a function like (1).

In the same time, but with much less detail, we point out how the proposed approach can be extended to a much more general category of functions, with the exponentials from (1) replaced by arbitrary (typically elementary) functions.

The main source of problems involving (1) is the modeling of random variables. If are all constants (), then is the probability density function (pdf) of a hyperexponential distribution. If and are nonnegative, then is the pdf of an Erlang mixture. If and are nonnegative, then is called a hyper-Erlang density.

There are three types of optimization problems involving (1) that we tackle. They all involve SIP constraints having the form

 f(t)≥0, ∀t∈[t0,tf], (2)

where typical choices are , ; however, it is enough to take a sufficiently large instead of infinity, due to the decay properties of the exponential. For simplicity of reference, we give ad hoc names to these problems and list them in increasing order of difficulty.

Positivity check (SOPE-P). The simplest problem is: given a function (1), decide if the inequality (2) holds. Such decision is necessary e.g. when should be a pdf.

Convex constraint (SOPE-C). More interesting are the cases where the parameters of the function are variables of the optimization problem. In particular, when the polynomial coefficients are variable, but the exponents are given, the constraint (2) is convex. This type of problem is our main focus. We will always assume that the number of exponentials and the degrees of the polynomials are known; they can be selected using Information Theoretic Criteria, but this is beyond the scope of this paper.

A typical problem is fitting a pdf to empirical data. A random process is governed by an unknown pdf ; from empirical observations, we know the values , for some times , , usually equidistant. Assuming that (1) is an appropriate model, we want to fit it to the data by solving the least squares problem

 minJ(f)=1MM∑m=1wm[f(tm)−hm]2s.t.f(t)≥0, ∀t∈[t0,tf] (3)

where are weights (we take implicitly ). If the exponents are given, the problem (3) belongs to convex SIP; its difficulty comes from the infinite number of constraints hidden by (2). There is no simple way to express it equivalently using a finite number of constraints like, for example, for polynomials.

With small modifications, the problem (3) can be posed not for pdfs, but for cumulative density functions (cdf) or complementary cdfs, see later (35).

General constraint (SOPE-G). The most difficult problem is when the exponents are also variable. In this case an optimization problem like (3) is no longer convex. Although we will present a solution to this problem also, it will have no guarantee of optimality.

An interesting particular case of the above problems is when , hence the polynomials are reduced to constants and so

 f(t)=n∑i=1αie−λit. (4)

The name of the problems will be changed accordingly, by replacing SOPE with SOE (sum of exponentials).

### 1.2 Contribution and contents

The approach we propose is based on a polynomial approximation of the exponential function that allows the approximation of a SOPE-C problem like (3) with a polynomial optimization problem that can be expressed with linear matrix inequalities and hence solved efficiently with semidefinite programming (SDP) methods in friendly media like CVX  and POS3POLY  or GloptiPoly .

We split the inequality (2) in sub-intervals

 [t0,tf]=K⋃k=1[tk−1,tk] (5)

and impose positivity on each sub-interval. We compute polynomials , such that

 ^bik(t)≥e−λit≥ˇbik(t),  ∀t∈[tk−1,tk]. (6)

Using such lower and upper approximations of the exponentials, we impose sufficient positivity conditions on (1), simultaneously on all sub-intervals. The resulting approximation of SOPE-C will be detailed in Section 2.

One-sided approximations like in (6) can be computed using the results from  (see also [6, 15]), as reviewed later in Section 3. Since the derivatives of the exponential have constant sign, a satisfactory approximation can be computed by solving a linear system.

It is intuitive that by playing with the degrees of the approximation polynomials and the lengths of the sub-intervals, the approximation can be in principle as good as desired, at the price of increased complexity. This is true not only for the exponential, but for all smooth functions on a finite interval. Numerical accuracy is also an important issue, especially since the approximating polynomials from (6) are defined by their coefficients. However, since we use the inequalities (6) to build a lower approximation of and thus positivity is not violated (like when the exponentials would be approximated with a single polynomial), the quality of the approximation should be comparable to the accuracy of the SDP algorithm used for solving the transformed problem.

In a practical implementation, we have reached the conclusion that it is enough to use relatively low degrees of the approximation polynomials, for example 8 or 10, in order to reach reasonable accuracy and complexity.

Using this approach, we propose in Section 4 a complete iterative procedure for solving the SOE-G and SOPE-G problems (3). In particular, we select initial values for the exponents by searching sparse functions (1) whose exponents belong to an arithmetic progression. For given exponents, the coefficients are optimized using the polynomial approximation suggested above. Possibly better values of the exponents are sought through a descent search, the coefficients are re-optimized, etc.

Although the procedure is based on simple ideas, the results given in Section 5 show that it is competitive, giving good results on several fitting problems studied previously.

### 1.3 Previous work

The idea of using polynomial approximations in optimization is by no means new. However, as far as we know, the particular combination of ideas that lies at the foundation of our approach was not proposed. We present below a few relevant works and show the distinctive features of our method.

Most of the methods use a single polynomial for approximation on the whole interval or on sub-intervals. A leading example is the library Chebfun , which allows numerically-performed ”symbolic” computation by actually replacing given functions with polynomials that approximate them to machine precision on the interval of interest. However, due to numerical considerations, the polynomials are not defined by their coefficients but by their values in Chebyshev nodes and barycentric Lagrange interpolation is employed for computing the function values. Hence, Chebfun can solve a problem like SOPE-P by computing the minimum of the function, but is not able to take advantage of convexity.

In , functional optimization is performed by approximating the unknown function with a polynomial with unknown coefficients. It can be seen as a non-parametric form of our problem, where the values of are sought for all in some interval. Here, we are interested in the parameters of a function with known structure, hence our method is parametric.

A precursor of the above approach was presented in , where the minimum of a given function was obtained by computing instead the minimum of an interpolating polynomial. Polynomial positivity, used for the minimum computation, is enforced through SDP via interpolation .

In somewhat the same vein of non-parametric methods, but with pdf estimation as specific target, functional approximation with polynomials of a pdf fitting given moments is proposed in . A simpler idea appeared in , proposing piecewise polynomial approximations for pdfs, with no interest on optimization and no positivity enforcement. More refined approaches appeal to splines [13, 17, 1] (the latter work summarizes previous work of its authors, including the use of multidimensional splines); positivity is enforced with an ad hoc method in the first work, while in the others it is imposed on each spline section via SDP.

One-sided polynomial approximations are less current; for example, in , they were used for relaxation in a branch and bound process for global optimization. The approximation regarded the optimization variables, not independent functions, like here. Polynomial optimization was solved via linear programming.

There is also an entire body of literature dedicated to the estimation of the parameters of a pdf, with diverse applications. Optimization is used more or less explicitly, but usually in a relatively standard way. We cite only a few methods connected to our work. SDP was used in  for estimating pdfs that are the product between a polynomial and a kernel with few parameters, like the Erlang mixture; polynomial positivity is imposed via SDP. In , the SOE-G problem is solved within the class of exponents in arithmetic progression; however, the opportunity is missed for transforming it into a polynomial problem, like shown in Section 4, and hence use SDP; positivity is imposed by rather ad hoc means. In , matrix exponential distributions are estimated; after finding the exponents with a method for linear systems identification, the coefficients are found by optimization, positivity being ensured by a Budan-Fourier technique. Among other families of techniques, we mention expectation maximization .

Although this paper gives an incomplete image, our aim is visible. When implemented not only for the exponential, but for more general categories of functions, our method could be seen as a possible meeting point between CVX and Chebfun, by extending CVX to SIP, using polynomial approximations like Chebfun and having a simple modus operandi like both of them.

## 2 Transformation to a polynomial problem

We discuss here the approximation of SOPE-C with an optimization problem with polynomials, thus preserving convexity, but transforming the semi-infinite constraint into a finite one.

Let us assume that the splitting (5) and the approximations (6) of the exponentials are available. We show how to impose the positivity constraint (2) on a generic sub-interval , keeping in mind that the conditions are imposed simultaneously on all sub-intervals.

### 2.1 Known signs

Assume first that the polynomials have constant and known signs on . This is usually possible only when solving SOPE-P, where are fixed. Let be the set of indices for which on , and be defined similarly for the negative case. In this situation, we can write

 f(t)≥∑i∈I+pi(t)ˇbik(t)+∑i∈I−pi(t)^bik(t)=Pk(t). (7)

Since is a polynomial whose coefficients depend linearly on those of , imposing on is easy and can be inserted into any convex optimization problem involving . How to transform a polynomial positivity constraint into a linear matrix inequality is discussed in . However, such knowledge is not necessary when using a library facilitating the manipulation of positive polynomials, like POS3POLY.

### 2.2 Unknown signs

In general, the coefficients and the signs of are not known, since the coefficients of these polynomials are variables in the optimization problem SOPE-C. In this case, it is impossible to build the approximation (7). However, we can replace it with

 Pk(t)=n∑i=1[pi(t)−γik(t)]^bik(t)+γik(t)ˇbik(t), (8)

with the extra conditions

 pi(t)−γik(t)≤0,  ∀t∈[tk−1,tk]γik(t)≥0, (9)

where is a polynomial, typically of the same degree as . If , then we can consider the alternative bounding polynomial

 Pk(t)=n∑i=1γik(t)^bik(t)+[pi(t)−γik(t)]ˇbik(t), (10)

with the extra conditions

 pi(t)−γik(t)≥0,  ∀t∈[tk−1,tk]γik(t)≤0. (11)

If , then (8)-(9) and (10)-(11) are equivalent; otherwise, they are different, but it is hard to give general rules for choosing one over the other. In both cases, it is clear that and so is a sufficient condition for . From now on we will work only with (8)-(9).

So, the SOPE-C problem (3) is approximated with

 minJ(f)s.t.∑ni=1[pi(t)−γik(t)]^bik(t)+γik(t)ˇbik(t)≥0,  ∀t∈[tk−1,tk], k=1:Kpi(t)−γik(t)≤0γik(t)≥0}  ∀t∈[tk−1,tk], k=1:K, i=1:n (12)

Note that the criterion is unchanged. The price for generality is the apparition of the new variable polynomials and of the new constraints (9). Although there is a potentially large number of variable and constraints, namely , we will see later that the degrees of all polynomials involved here are generally small, and also the number of sub-intervals is not large. So, the problem (12) does not have an excessively high complexity.

###### Remark 1

For some polynomial , denote

 Pk,ε(t)=n∑i=1[pi(t)−γik(t)−εi(t)]^bik(t)+[γik(t)+εi(t)]ˇbik(t). (13)

We note that .

So, when solving an optimization problem with the constraint , the polynomials will generally tend to take their smallest possible values that are allowed by the constraints (9); the underlying reason is that the least conservative is the constraint , the larger the feasibility domain of the optimization problem and hence a possibly better solution. So, if , then it results that , and (assuming ) if , then ; the approximation (8) actually coincides (a posteriori, after the optimization problem is solved and is available) with that for known signs (7). If changes the sign on the current interval, then is the best upper approximation to .

The above values of are reached only if the constraint is active on the interval . However, the important conclusion is that the construction (8)-(9) naturally gives the best lower approximation of by a polynomial , given the bounding polynomials (6).

## 3 One-sided polynomial approximations of a set of exponentials

We present here the tools for finding the polynomial approximation (6) and the intervals (5) for a set of exponentials , . We also suggest how this can be done for other functions.

### 3.1 Approximation of a single exponential

We start by showing how to find lower and upper polynomial approximations to a single exponential on an interval . This is made via [3, Th.4], taking advantage of the fact that the derivatives of all orders of the exponential have constant sign.

We present in detail only one case, that of the lower approximation of odd degree , to a function whose derivative of order is nonnegative on (which is the case of our exponential). The approximation is optimal in the sense that the norm

 ∫τ1τ0w(t)[ϕ(t)−ˇb(t)]dt (14)

is minimized, where is a weight function. Denote by , …, the zeros of the -th order polynomial from the sequence of polynomials which are orthogonal on with respect to . Then, the coefficients of can be found by solving the linear system given by the equations

 ˇb(xk)=ϕ(xk), ˇb′(xk)=ϕ′(xk),  k=1:ℓ. (15)

So, if the zeros of are readily available, then the computation of is very simple and effective.

The other cases, corresponding to the three other combinations of the parity of and the sign of the -th derivative of are similar, but involve the zeros of the polynomials that are orthogonal with respect to , and , and one or both of the interval ends.

For the sake of quick computation, we chose the weight , which generates Chebyshev polynomials of the first kind. Their roots are

 xk=τ1−τ02cos(2k−1)π2ℓ+τ1+τ02,  k=1:ℓ. (16)

(The other three weights above generate Chebyshev polynomials of the second kind and Jacobi and polynomials, whose roots are also availabe via simple formulas.)

###### Example 2

To have an idea of the approximation error, we plot in Figure 1 the maximum value of and for several degrees of the polynomial and intervals starting from 0 and ending in various points up to 5. One can see that, for example, a polynomial of degree 8 gives an error smaller than for intervals included in . The approximation becomes unreliable when the approximation error approaches , in the sense that it may be no longer one-sided. However, errors of order appear perfectly obtainable. Figure 1: One-sided polynomial approximation error for e−t. Left: for various degrees on the interval [0,1]. Right: for degree 8 and intervals from 0 to the value on the horizontal axis.

Although the infinity norm would be preferrable to an integral norm like (14), in order to minimize the maximum approximation error, the computation of such a norm would involve a Remez-like algorithm which significantly increases the computation time with relatively small benefits (at least in the case of the exponential and our choices of intervals).

An alternative to the above construction was employed in  and works for general functions, not only in the restrictive conditions from . Using a grid like (16), one builds Chebyshev interpolation polynomials, for whom an approximation error bound is available. Subtracting or adding the bound value to the polynomial gives lower or upper one-sided approximations. A more refined approach , based on a sound numerical implementation of the Remez algorithm, can give nearly optimal solutions.

### 3.2 Approximations of a set of exponentials

We go now to our full problem, approximating a set of exponentials on a reunion of intervals. We assume a target approximation error . Given the exponentials , , the problem is how to find the intervals (5) and the degrees of the polynomials (6) such that

 maxt∈[tk−1,tk]|^bik(t)−e−λit|≤ϵ,maxt∈[tk−1,tk]|e−λit−ˇbik(t)|≤ϵ,∀i=1:n, ∀k=1:K. (17)

The solution is obviously not unique, but we aim to find one that is computationally advantageous. The exponential function allows a very cheap solution.

###### Remark 3

Assume that the maximum approximation error of a polynomial of degree approximating on the interval is . Then, there exist polynomials of degree such that:

• is approximated on with error at most ;

• is approximated on with error at most .

Combining these two results, it follows that is approximated on with error at most .

###### Example 4

Let us put the problem differently. We take and . We want to approximate on an interval as large as possible, with approximation error less than . From the above Remark, the error for on should be less than . Looking in Figure 1 (right), we see that this happens for . Indeed, taking gives the desired accuracy.

The approximation procedure we propose is as follows. Besides the input data listed in the beginning of this section, we assume that a maximum degree is given.

1. We make a table of approximation errors given by polynomials built as described in this section for approximating on . The errors are measured on a grid; there can be a few tens of values and can go from 0 to 10 or 12. (The result would be a ”cartesian product” of the two graphs from Figure 1.) This table, having a few hundred entries, is built a single time. Even so, the computation time for a table was only 0.25 seconds on a standard desktop, which is negligible.

2. Assume that we have found and we search . For each , using the table and Remark 3, we find the interval length such that the approximation error for on is at most . More precisely, we seek in the table the value for which has the largest value smaller than and set . Finally, to ensure that the approximation error is respected for all exponentials, we take the smallest interval length and put . This iterative procedure, ending when , is extremely fast since it involves only table searches.

For the sake of numerical accuracy, we optionally can reduce the degrees of the approximating polynomials for the exponentials that are not deciding the length of an interval. (Note that the highest degree of a polynomial actually decides the complexity.) Denoting , for each for which we search in the table the smallest degree for which is smaller than , where is the smallest grid value larger than . Remark 3 ensures that the error made by approximating on with a polynomial of degree is at most .

###### Example 5

Let us approximate the exponentials with exponents , 1 and 3 on the interval , with error less then , using polynomials of degree at most . The table is built with step value for . Using the above procedure, we find that 10 sub-intervals are necessary. The first is , on which the degrees of the approximating polynomials are 5, 6 and 8; the fastest decaying exponential naturally needs the highest degree. The last sub-interval is and the degrees are 6, 5 and 1. They are smaller than because the sub-interval is cut short by ; for example, the previous sub-interval is ; since the exponentials are decaying, the sub-intervals are longer as grows. Note that now the slowest exponential sets the degree; the fastest has almost vanished.

Also for improving numerical accuracy, it is useful to move the approximation on an interval centered in the origin. Similarly to a Vandermonde system, the linear system (15) tends to become ill-conditioned when the roots (16) have all the same sign and large absolute value. So, denoting , instead of working with for , we work with for . Since usually the degree of is small, the bad numerical effect of computing the coefficients of (as a polynomial in ) is much smaller than the reduction of the condition number of the system solved for approximating on .

It is clear that the above procedure is particularly fit for the exponential function. For other functions, the polynomial approximations must be computed explicitly for each sub-interval, in a trial-and-error procedure for finding the right polynomial degrees and the sub-intervals lengths; this is essentially what Chebfun does, but in our case more flexibility is allowed since we don’t (and cannot) aim to approximation within machine precision.

It is also obvious that imposing an approximation error for each exponential, does not make the approximation error in (2) be of the same size. However, an a posteriori analysis of the function , in particular of the values of the computed polynomials , can help estimate the actual error. A single new run with a smaller would be required, since the new optimized function should be near the previous one and hence the worst-case accuracy becomes predictable.

## 4 Fitting an empirical density

Let us now come back to our prototype problem (3) of fitting an empirical density and consider first the simpler SOE case (4). We start by studying a helpful particular case.

### 4.1 Exponents in arithmetic progression

Let us assume that the exponents are known and form an arithmetic progression, meaning that

 λi=λ1+(i−1)q,  i=1:n, (18)

where is the ratio. In this case, the function (4) has the form

 f(t)=e−λ1tn∑i=1αie−(i−1)qt. (19)

Denoting , the condition (2) becomes

 n∑i=1αixi−1≥0,  ∀x∈[e−qtf,e−qt0]. (20)

This is a polynomial positivity condition, easy to impose through LMIs . Hence the SOPE-C problem (3) is equivalent to an SDP problem. No approximation is required, the problem is naturally polynomial.

The method from  works with models based on the progression (18) and transforms into a polynomial as above. However, positivity is imposed by quite rudimentary means (fitting a SOE to the square root of the target pdf, then squaring, which gives also a SOE, but with more terms). Moreover, the values and defining the arithmetic progression are found by just trying different values until a satisfying result is obtained.

### 4.2 An algorithm for the SOE-G problem

The arithmetic progression case can be used for initialization in an iterative procedure for the general problem SOE-G. The iterative part uses the polynomial approximation described in Sections 2 and 3.

1. Initialization using sparse arithmetic progression. Let us assume that an estimate of is available. This can be obtained like in , using the tail of ; for large values of , the slowest exponential dominates the others. Alternatively, we can just take sufficiently small, since it is enough to have .

We then attempt to solve SOE-G by making a more general assumption than in Section 4.1 and searching exponents estimations that belong to an arithmetic progression

 ~λi=~λ1+μiq, (21)

where are unknown positive integers.

To this purpose, we work with the function

 ~f(t)=N∑i=0~αie−(~λ1+qi)t=e−~λ1tN∑i=0~αie−qit, (22)

with given and . In principle, should be small enough to cover decently well the possible intervals where the exponents lie and should be as large as computationally acceptable (e.g. 100 is certainly good, but one can consider going to 200 and even beyond). Since we seek a sparse solution, with only nonzero coefficients , we modify the problem (3) by adding a sparsity-promoting term to the criterion and transforming it into

 minJ(~f)+β∑Ni=0ϖi|~αi|s.t.~f(t)≥0, ∀t∈[0,∞) (23)

where and , , are weighting constants. The second term of the criterion is a weighted 1-norm of the coefficients vector , denoted ; the most meaningful choice appears to be , which takes into account that and thus implicitly normalizes the exponentials from (4).

The problem (23) is convex in the coefficients . The positivity constraint can be expressed as the positivity of a polynomial on , by substituting as in Section 4.1. If the constant is large enough, many of the coefficients will be small and only few will have significant values. We take the largest of them (in weighted absolute value ) and the corresponding are given by their positions.

We can then re-solve (23) by imposing that only the chosen coefficients are nonzero, finding thus the optimal coefficients for the selected (this can have the advantage of exact positivity constraint via polynomials, but can be skipped by going directly to the iterative step).

The problem (23) could be replaced with a minimization of with a bounded ; in this case, the bound should be chosen instead of ; this may make more sense if some value of the criterion is already available.

2. Iterative part. With the above initialization, we can start the iterative part of the algorithm, where the coefficients and the exponents are optimized alternatively.

For given exponents , we optimize the coefficients by solving the approximation (12) of the SOE-C problem (3), as shown in Sections 2 and 3. The auxiliary variables are just scalars.

For optimizing the exponents we attempt small gradient steps. The gradient is

 ∂J(f)∂λi=−2αiMM∑m=1wmtm[f(tm)−hm]e−λitm. (24)

The step size is , so the new values of the exponents are

 λi←λi−ς∂J(f)∂λi. (25)

(If the resulting would become negative, we reduce the step size such that they stay positive.) Then, the SOE-C problem is (approximately) solved with the new exponents . If the value of the criterion does not decrease, we restore the previous exponents, halve the step size and recompute (25). Note that by modifying we usually improve the criterion but may go out of the positivity domain. Solving (12) means returning back to it, but not necessarily with a better criterion value.

The iterations continue as long as the improvement is significant or the step size is not very small. Of course, there is no guarantee of optimality, but our main purpose is to illustrate the kernel of our approach—the polynomial approximation idea. Even with such a simple descent procedure, the results are satisfactory in the test problems we will report in Section 5.

### 4.3 Extension to SOPE

The iterative part of the above procedure can be used for true SOPE functions (1), with some , with the minor change of the gradient expression (24) into

 ∂J(f)∂λi=−2MM∑m=1wmtmpi(tm)[f(tm)−hm]e−λitm. (26)

However, the arithmetic progression trick used in the SOE case is no more possible, since the transformation no longer leads to a polynomial. Instead, we simply use the exponents resulting from a SOE solution (not necessarily fully optimized) as initialization for the SOPE problem. As confirmed by numerical evidence, this seems effective especially when the polynomials degrees from (1) are small.

## 5 Results

The polynomial approximation method described in this paper has been implemented using POS3POLY  for CVX  and can be downloaded from http://www.schur.pub.ro/sope, together with the programs solving the problems presented in this section. The simplest description of the constraint (2), with a SOE function (4) characterized by the variable vector of coefficients alpha and the constant vector of exponents lambda is

   alpha == pos_soe( lambda, t0, tf );


where t0 and tf are the positivity interval ends. This is consistent with CVX style and the parameters of the approximation are hidden, although the user can control them if so desired.

We start illustrating the behavior of our method by solving a problem proposed in , that is slightly different in nature from (3), but finally has the same form.

###### Example 6

Unlike an empirical pdf, the SOE

 h(t)=16e−t2−30e−t+15e−2t (27)

has also negative values. We want to find the nearest positive SOE (4) to , by minimizing

 J(f)=∫∞0[f(t)−h(t)]2dt. (28)

This criterion is convex quadratic in , like (3), but is rational in . The expression of the gradient with respect to is omitted here, for brevity, but easy to obtain.

Using the tail of like in , the smallest exponent estimation is . With , and in (2223), the initialization step of our algorithm gives , , and a criterion value . Positivity is imposed on the interval , which is large enough to ensure it on .

The iterative step improves it to , the result being

 f(t)=19.91e−0.5315t−48.63e−1.0264t+29.66e−1.5177t. (29)

Figure 2 presents the graph of this function and of the target . The whole design, containing 30 iterations, took less than 1 minute on a standard desktop computer. The polynomial approximations of the exponentials were made with and . The minimum value of the SOE (29), computed on a very fine grid, is . Taking a smaller reduces this value only if the accuracy of the SDP solver is also increased. For example, setting cvx_precision best (which means that CVX iterates as long as the criterion can be decreased without numerical trouble), a value of (which is still safe for the polynomial approximation) leads to a minimum of the SOE of .

In the above case, the initialization step already gave a good result. However, the iterative step can significantly decrease the criterion if the approximation is poor. For example, with , the initialization step gives and the final result is .

Solving the SOE-C problem with the exponents from (27) kept fixed, the optimal criterion is 0.0712. The same value is given by a dedicated algorithm that takes into account that the exponents are actually part of an arithmetic progression and imposes positivity via polynomials; the resulting optimal SOE, whose graph is also shown in Figure 2, is

 f(t)=15.5243e−t2−28.5073e−t+14.2410e−2t. (30)

The result reported in  is

 f(t)=16e−t2−29.946e−t+15.5385e−2t. (31)

Somewhat surprisingly, the corresponding criterion is . Figure 2: Graphs of the functions from Example 6. Blue: target h(t) (27). Red: our best approximation (29). Black, dashed: best approximation (30), with the same exponents as h(t).

Many methods in the literature are tested using standard densities, and our next two examples will be of this type. Given a continuous pdf , we simply discretize it on equidistant values in the interval and optimize the criterion (3) through the approximation (12). The values and are kept throughout the rest of the paper. The other parameters may change values, but we will not report them, since they can be found in the Matlab files.

###### Example 7

We discuss in detail the example W1 from , where the target is the Weibull pdf

 h(t)=kη(tη)k−1e(−t/η)k, (32)

with , . We approximate it with a SOE with terms on the interval . The solution

 f(t)=130.12e−2.7535t−329.39e−3.1707t+228.96e−3.5850t−29.665e−4.7543t (33)

gives . The result is illustrated in Figure 3, together with the best 4-term SOE reported in  (where the approximation is made on the Laplace transform), for which the criterion is only . Also, our result is visually similar to the order six model from [2, Fig. 2].

For other examples from  involving the Weibull and lognormal distribution, where we could compare the solutions only graphically (since the result was given only in this form), we have obtained better results with the same number of exponentials or similar results with fewer exponentials. Note also that the method from  allows complex values for the exponents; however, for example W1 the exponents were real. Figure 3: Graphs of the functions from Example 7. Blue: target h(t) (32). Red: our best approximation (33). Black, dashed: the approximation from .
###### Example 8

We turn now to some examples from , where the complementary cumulative density function (ccdf)

 H(t)=∫∞th(τ)dτ (34)

was used instead of the pdf and modeled with a SOE (4). However, since the derivative of a SOE is also a SOE , we have solved a problem very similar to (3), namely

 minJ(F)=1MM∑m=1wm[F(tm)−Hm]2s.t.f(t)≥0, ∀t∈[t0,tf]F′(t)=−f(t)F(t0)=1, F(tf)≥0 (35)

The last two constraints, that do not appear in (3), are linear in the coefficients of the SOE and hence the algorithms described in this paper can be immediately adapted.

The two ccdf discussed in this example are those of the Pareto

 H(t)=11+t (36)

and lognormal

 H(t)=1−12erfc(−log(t−μ)σ√2) (37)

distributions. For the lognormal, the parameters are , . The intervals on which optimization is performed are for Pareto and for the lognormal.

We present in Table 1 the values of the criteria (35) for several values of the number of exponentials and three methods. The first is an optimized version of the method from , which used exponents in arithmetic progression. The first exponent and the progression ratio were found by trial and error; we use the same values as reported there. In , the SOE is approximated using Jacobi polynomials, but with no explicit optimization. We use instead the exact optimization method sketched in section 4.1. Hence we obtain better results than those reported in . (Note that the maximum error is reported in , while we optimize an LS criterion and still get better maximum error.)

The other two methods are ours. In addition to SOE, we report now also some results for SOPE models. The values shown in Table 1 are obtained with degrees , , in (1). For Pareto, when comparing the number of coefficients , the SOPE model is sometimes more efficient than SOE. For example, the criterion for SOPE with and hence is smaller than that for SOE with . Also, if we take via , , the criterion is , smaller than the value for SOE and . For the lognormal (as well as for the previous examples, where SOPE was not mentioned), SOE appears more adequate than SOPE.

Comparing now our SOE method with the least-squares optimal version of the method from , we notice two distinct behaviors. For small , our method is significantly better, showing that the search for good exponents is successful and that using an arithmetic progression may be quite restrictive. However, for larger , the methods give comparable results. Now the arithmetic progression seems a satisfactory model. Numerical accuracy properties of the SDP solver may also come into play, since the fit is actually very good and the values of the criterion very small. Finally, we note that our method, by its nature, targets small values of , where the parameters of the model may have a significance. Taking exponents in a long arithmetic progression resembles more what was called non-parametric method in the introduction.

###### Example 9

We fit now SOE and SOPE models to data belonging to a popular time series, the eruption durations of the Old Faithful geyser. The problem is notoriously difficult since the pdf appears to have two disjoint intervals as support. There are 272 values in the series, which are grouped into equally spaced bins and shifted towards the origin with 1.6 minutes, which is the smallest duration of an eruption. Figure 4 shows the resulting histogram and two approximations with parameters, a SOE with and a SOPE with and , , . The optimization is performed on the interval ; zero values are appended to the histogram for times greater than the longer duration. The values of the criterion are for SOE and for SOPE, so again the SOPE model is more appropriate. The first peak of the data and the valley are quite well followed, while the approximation is worse for the second peak.

A visual comparison with Fig. 4b from  shows that our models with only 6 parameters have a better fit than some (older) methods with order 10 models and are only slightly worse than the model proposed in . All the models investigated there are more general than SOPE. Anyway, our purpose in this example is not to prove that SOPE is a good model for the Old Faithful data, but that our method is flexible enough to give reasonably good results in this case. Better approximation of the data can be obtained by shortening the optimization interval, but then, like in , the pdf is not strictly decreasing after the second peak, but has a third small peak.

## 6 Conclusion and future work

We have presented a method for solving SIP problems with inequality constraints involving functions (1) that are a sums of exponentials multiplied with polynomials with variable coefficients. The method is based on one-sided polynomial approximations of the exponentials on sub-intervals, which allow transformation of SIP into SDP and hence reliable computation of near-optimal solutions that are guaranteed to respect the constraints. We showed through several examples that our method is computationally attractive and gives good results compared to methods based on different principles.

Future work will be dedicated to the investigation of approximations for other functions. The exponential has properties that allowed us several tricks easing the computation of the polynomial approximation. Extension to general functions is not trivial, but is certainly possible. We aim to build an user-friendly library for convex SIP based on the principles set up in this paper.

## References

•  F. Alizadeh and D. Papp, Estimating arrival rate of nonhomogeneous Poisson processes with semidefinite programming, Ann. Oper. Res. (2013), in press.
•  S. Asmussen, O. Nerman, and M. Olsson, Fitting Phase-Type Distributions via the EM Algorithm, Scand. J. Stat. 23 (1996), pp. 419–441.
•  R. Bojanic and R. DeVore, On Polynomials of Best One Sided Approximation, Enseignement Math. 12 (1966), pp. 139–164.
•  E. de Klerk, G. Elabwabi, and D. den Hertog, Optimization of univariate functions on bounded intervals by interpolation and semidefinite programming, CentER discussion paper 2006-26, Tilburg Univ. (2006).
•  O. Devolder, F. Glineur, and Y. Nesterov, Solving infinite-dimensional Optimization Problems by Polynomial Approximation, in Recent Advances in Optimization and its Applications in Engineering, M.D. et al, ed., Springer, 2010.
•  R. DeVore, One Sided Approximation of Functions, J. Approx. Theory 1 (1968), pp. 11–25.
•  D. Dufresne, Fitting Combinations of Exponentials to Probability Distributions, Appl. Stochastic Models Bus. Ind. 23 (2007), pp. 23–48.
•  T. Fushiki, S. Horiuchi, and T. Tsuchiya, A Maximum Likelihood Approach to Density Estimation with Semidefinite Programming, Neural Computation 18 (2006), pp. 2777–2812.
•  M. Grant and S. Boyd, CVX: Matlab software for disciplined convex programming, http://cvxr.com/cvx (2010).
•  C. Harris and W. Marchal, Distribution estimation using Laplace transforms, INFORMS J. Computing 10 (1998), pp. 448–459.
•  D. Henrion, J. Lasserre, and J. Löfberg, GloptiPoly 3: moments, optimization and semidefinite programming, Optim. Meth. Software 24 (2009), pp. 761–779.
•  D. Henrion, J. Lasserre, and M. Mevissen, Mean squared error minimization for inverse moment problems, Appl. Math. Optim. (2012), submitted, http://homepages.laas.fr/henrion/Papers/density.pdf.
•  V. John, I. Angelov, A. Oncul, and D. Thevenin, Techniques for the reconstruction of a distribution from a ?nite number of its moments, Chem. Eng. Science 62 (2007), pp. 2890–2904.
•  K. Kim and N. Thomas, A fitting method with generalized Erlang distributions, Simulation Modelling Practice and Theory 19 (2011), pp. 1507–1517.
•  C. Lewis, Computation of Best One-Sided Approximation, Math. Comput. 24 (1970), pp. 529–536.
•  J. Löfberg and P. Parrilo, From Coefficients to Samples: a New Approach to SOS Optimization, in 43rd IEEE Conf. Decision and Control, Bahamas, 2004, pp. 3154–3159.
•  A. Monteiro, R. Tutuncu, and L. Vicente, Recovering risk-neutral probability density functions from options prices using cubic splines and ensuring nonnegativity, Eur. J. Oper. Res. 187 (2008), pp. 525–542.
•  Y. Nesterov, Squared Functional Systems and Optimization Problems, in High Performance Optimiation, J. Frenk, C. Roos, T. Terlaky, and S. Zhang, eds., Kluwer Academic, 2000, pp. 405–440.
•  R. Pachon and L. Trefethen, Barycentric-Remez algorithms for best polynomial approximation in the chebfun system, BIT Numer. Math. 49 (2009), pp. 721–741.
•  C. Sexton, M. Olivi, and B. Hanzon, Rational Approximation of Transfer Functions for Non-Negative EPT Densities, in IFAC Symposium on System Identification, July, Brussels, Belgium, 2012, pp. 716–721.
•  H. Sherali and H. Wang, Global optimization of nonconvex factorable programming problems, Math. Program., ser. A 89 (2001), pp. 459–478.
•  J. Shortle, Piecewise Polynomial Approximations for Heavy-Tailed Distributions in Queueing Analysis, Stochastic Models 21 (2005), pp. 215–234.
•  B. Şicleru and B. Dumitrescu, POS3POLY – a MATLAB Preprocessor for Optimization with Positive Polynomials, Optimization and Engineering 14 (2013), pp. 251–273.
•  L.N. Trefethen, et al., Chebfun Version 4.2, The Chebfun Development Team (2011), http://www.maths.ox.ac.uk/chebfun/.
You are adding the first comment!
How to quickly get a good reply:
• Give credit where it’s due by listing out the positive aspects of a paper before getting into which changes should be made.
• Be specific in your critique, and provide supporting evidence with appropriate references to substantiate general statements.
• Your comment should inspire ideas to flow and help the author improves the paper.

The better we are at sharing our knowledge with each other, the faster we move forward.
The feedback must be of minimum 40 characters and the title a minimum of 5 characters   