# Optimal polygonal linearization and fast interpolation of nonlinear systems

###### Abstract

The analysis of complex nonlinear systems is often carried out using simpler piecewise linear representations of them. A principled and practical technique is proposed to linearize and evaluate arbitrary continuous nonlinear functions using polygonal (continuous piecewise linear) models under the L1 norm. A thorough error analysis is developed to guide an optimal design of two kinds of polygonal approximations in the asymptotic case of a large budget of evaluation subintervals N. The method allows the user to obtain the level of linearization (N) for a target approximation error and vice versa. It is suitable for, but not limited to, an efficient implementation in modern Graphics Processing Units (GPUs), allowing real-time performance of computationally demanding applications. The quality and efficiency of the technique have been measured in detail on two nonlinear functions that are widely used in many areas of scientific computing and are expensive to evaluate.

## I Introduction

The approximation of complex nonlinear systems by simpler piecewise linear representations is a recurrent and attractive task in many applications since the resulting simplified models have lower complexity, fit into well established tools for linear systems and are capable of representing arbitrary nonlinear mappings. Examples include, among others, complexity reduction for finding the inverse of nonlinear functions [1, 2], distortion mitigation techniques such as predistorters for power amplifier linearization [3, 4], the approximation of nonlinear vector fields obtained from state equations [5], the obtainment of approximate solutions in simulations with complex nonlinear systems [6], or the search for canonical piecewise linear representations in one and multiple dimensions [7].

In the last decades, the main efforts in piecewise linearization have been devoted both to find approximations of multidimensional functions from a mathematical standpoint and to define circuit architectures implementing them (see, for example, [8] and references therein). In the one-dimensional setting, a simple and common linearization strategy consists in building a linear interpolant between samples of the nonlinear function over a uniform partition of its domain. Such a polygonal (i.e., continuous piecewise linear) interpolant may be further optimized by choosing a better partition of the domain according to the minimization of some error measure. This is a sensible strategy in problems where there is a constraint on the budget of samples allowed in the partition.

Hence, in spite of the multiple benefits derived from modeling with piecewise linear representations, a proper selection of the interval partitions and/or predefining the number of partitions is paramount for a satisfactory performance. Some researchers [2] use cross-validation based approaches to select such a number of pieces within a partition. In other applications, the budget of pieces may be constrained by an internal design requirement (speed, memory or target error) of the approximation algorithm or by some external condition.

Simplified models may be built using descent methods [9], dynamic programming [10] or heuristics such as genetic [1] and/or clustering [11] algorithms to optimize some target approximation error. In some cases, however, the resulting piecewise representation may fail to preserve desirable properties of the original nonlinear system such as continuity [1].

We consider the simplified model representation given by the least-first-power or best $L^1$ approximation of a continuous nonlinear function by some polygonal function. The generic topic of least-first-power approximation has been considered in several references, e.g., [12, 13, 14], over a span of many years, and it remains a recurrent topic and a source of insightful results.

We develop a fast and practical method to compute a suboptimal partition of the interval on which the polygonal interpolant and the best polygonal approximation to a nonlinear function are to be computed. This technique allows us to further optimize the polygonal approximation to a function among all possible partitions having the same number of segments or, conversely, to achieve a target approximation error while minimizing the budget of segments used to represent the nonlinear function. The resulting polygonal approximation is useful in applications where the evaluation of continuous mathematical functions constitutes a significant computational burden, such as computer vision [15, 16] or signal processing [17, 18, 19].

Our work may be generalized to the linearization of multidimensional functions [7] and the incorporation of constraints, thus opening new perspectives also in the context of designing circuit architectures for such piecewise linear approximations, as in [8]. However, these interesting generalizations will be the topic of future work.

The paper is organized as follows: two polygonal approximations of real-valued univariate functions (interpolant and best approximation) are presented in Section II. The mathematical foundation and algorithmic procedure to compute a suboptimal partition for the polygonal approximations are developed in Section III. The implementation of the numerical evaluation of polygonal approximations is discussed in Section IV. Experimental results of the developed technique on nonlinear functions (Gaussian, chirp) are given in Section V, both in terms of quality and computational times. Finally, some conclusions are drawn in Section VI.

## II Piecewise Linearization

In general, a piecewise function over an interval $[a,b]$ is specified by two elements: a set of $N+1$ control or nodal points $a = x_0 < x_1 < \dots < x_N = b$, also called knots, that determine a partition of $[a,b]$ into a set of $N$ (disjoint) subintervals $I_k = [x_{k-1}, x_k]$, and a collection of functions (so-called "pieces"), one for each subinterval $I_k$. In particular, a *polygonal* or continuous piecewise linear (CPWL) function satisfies additional constraints: all "pieces" are (continuous) linear segments and there are no jumps across pieces, i.e., continuity is also enforced at the subinterval boundaries $x_k$. Fig. 1 shows, for a given partition, the two polygonal functions that we use throughout the paper to approximate a real-valued function $f$: the interpolant $\hat{f}$ and the best approximation $f^\star$. Polygonal functions over a given partition generate a vector space $V$, since the addition of such functions and/or their multiplication by a scalar yields another polygonal function defined over the same partition.

A useful basis for the vector space $V$ is formed by the set of nodal basis or hat functions $\{\phi_k\}_{k=0}^{N}$, where $\phi_k$, displayed in Fig. 2, is the piecewise linear function in $V$ whose value is 1 at $x_k$ and zero at all other control points $x_j$, $j \neq k$, i.e.,

$$\phi_k(x) = \begin{cases} (x - x_{k-1})/(x_k - x_{k-1}) & \text{if } x \in [x_{k-1}, x_k],\\ (x_{k+1} - x)/(x_{k+1} - x_k) & \text{if } x \in [x_k, x_{k+1}],\\ 0 & \text{otherwise.} \end{cases}$$

Functions $\phi_0$ and $\phi_N$, associated to the boundary points $x_0$ and $x_N$, are only half hats. These basis functions are convenient since they can represent any function $g \in V$ in terms of the values of $g$ at the control points, $g(x_k)$, in the form

$$g(x) = \sum_{k=0}^{N} g(x_k)\,\phi_k(x). \qquad (1)$$

From an approximation point of view this basis is frequently used
in the Finite Element Method since the hat function (*simplex*
in arbitrary dimensions) is flexible, economic and in some way a natural
geometrical element into which to decompose an arbitrary geometric
object.

The polygonal interpolant $\hat{f} \in V$ of a continuous function $f$ (possibly not in $V$) over the interval $[a,b]$ linearly interpolates the samples of $f$ at the control points, thus, using $g(x_k) = f(x_k)$ in (1),

$$\hat{f}(x) = \sum_{k=0}^{N} f(x_k)\,\phi_k(x).$$

This polygonal approximation is trivial to construct and might be good enough in some applications (e.g., power amplifier predistorters [4], the trapezoidal rule for integration), but for us it is useful to analyze other possible polygonal approximations, such as the best one in the $L^1$ sense, as we discuss next.
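For concreteness, the interpolant construction above can be sketched in a few lines of NumPy (a minimal sketch of ours; the function names are not part of any library):

```python
import numpy as np

def hat_basis(x, knots, k):
    """Evaluate the k-th nodal ("hat") basis function at the points x."""
    x = np.asarray(x, dtype=float)
    phi = np.zeros_like(x)
    if k > 0:  # rising edge on [x_{k-1}, x_k]
        m = (x >= knots[k - 1]) & (x <= knots[k])
        phi[m] = (x[m] - knots[k - 1]) / (knots[k] - knots[k - 1])
    if k < len(knots) - 1:  # falling edge on [x_k, x_{k+1}]
        m = (x >= knots[k]) & (x <= knots[k + 1])
        phi[m] = (knots[k + 1] - x[m]) / (knots[k + 1] - knots[k])
    return phi

def polygonal(x, knots, coeffs):
    """Evaluate the CPWL function sum_k c_k * phi_k(x), cf. (1)."""
    return sum(c * hat_basis(x, knots, k) for k, c in enumerate(coeffs))

# The interpolant simply uses c_k = f(x_k):
knots = np.linspace(0.0, 1.0, 5)
f = lambda x: x**2
interp = lambda x: polygonal(x, knots, f(knots))
```

For the interpolant alone, `np.interp(x, knots, f(knots))` computes the same values directly; the explicit basis form is what carries over to the optimized coefficients discussed next.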

Now, consider the problem of approximating a continuous function $f$ by some polygonal function in $V$ using the $L^1$ norm to measure distances. We address natural questions such as the existence and uniqueness of such a best approximation, methods to determine it and the derivation of estimates for the minimal distance.

Let us answer the question about the existence of a best approximation, i.e., the existence of $f^\star \in V$ whose distance from $f$ is least. Recall that the space $C[a,b]$ of continuous functions in a given closed interval $[a,b]$, together with the norm

$$\|g\|_1 = \int_a^b |g(x)|\,dx, \qquad (2)$$

is a normed linear vector space (NLVS). Since $V$ is a finite-dimensional linear subspace (with basis given by the nodal functions $\{\phi_k\}$) of the normed space $\bigl(C[a,b], \|\cdot\|_1\bigr)$, for every $f \in C[a,b]$ there exists a best approximation to $f$ in $V$ [20, Cor. 15.10] [21, Thm. I.1].

The uniqueness of the best approximation is guaranteed for strictly convex subspaces of NLVSs [20, Thm. 15.19] [21, Thm. I.3], i.e., those whose unit balls are strictly convex sets. Linear vector spaces with the $L^1$ or $L^\infty$ norms are not strictly convex, therefore (a priori) the solution might be unique, but it is not guaranteed. In these cases, the uniqueness question requires special consideration. Further insights about this topic are given in [22, ch. 4][21, ch. 3], which are general references for approximation (using polynomials or other functions), and in [23], which is a comprehensive and advanced reference on nonlinear approximation.

Next, we show how to compute such a best approximation, and later we will carry out an error analysis. As is well known [24, p. 130], generically, the analytic approach to optimization problems using the $L^1$ norm involves derivatives of the absolute value, which makes the search for an analytical solution significantly more difficult than in other problems (e.g., those using the $L^2$ norm).

As already seen, a function $g \in V$ can be written as (1). Let us explicitly note the dependence of $g$ with respect to the coefficients $\mathbf{c} = (c_0, \ldots, c_N)^\top$ by $g_{\mathbf{c}}$. The least-first-power or best $L^1$ approximation to $f$ is a function $f^\star \in V$ that minimizes $\|f - g_{\mathbf{c}}\|_1$. Since $f^\star \in V$, it admits the expansion in terms of the basis functions of $V$, i.e., letting $\mathbf{c}^\star = (c_0^\star, \ldots, c_N^\star)^\top$ we may write

$$f^\star(x) = \sum_{k=0}^{N} c_k^\star\, \phi_k(x). \qquad (3)$$

By definition, the coefficients $\mathbf{c}^\star$ minimize the cost function

$$E(\mathbf{c}) = \|f - g_{\mathbf{c}}\|_1 = \int_a^b \Bigl| f(x) - \sum_{k=0}^{N} c_k \phi_k(x) \Bigr|\, dx. \qquad (4)$$

Hence, they solve the necessary optimality conditions given by the non-linear system of equations

$$\nabla E(\mathbf{c}) = \mathbf{0}, \qquad (5)$$

where $\nabla E$ is the gradient of (4), with entries $\partial E / \partial c_k$.

Due to the partition of the interval $[a,b]$ into disjoint subintervals $I_k$, we may write (4) as

$$E(\mathbf{c}) = \sum_{k=1}^{N} \int_{I_k} \Bigl| f(x) - \sum_{j=0}^{N} c_j \phi_j(x) \Bigr|\, dx;$$

therefore, if $u = f - g_{\mathbf{c}}$, each gradient component is

$$\frac{\partial E}{\partial c_k} = -\int_{I_k \cup I_{k+1}} \operatorname{sign}\bigl(u(x)\bigr)\, \phi_k(x)\, dx.$$

Observe that $\partial E/\partial c_k$ solely depends on $c_{k-1}, c_k, c_{k+1}$ (except in the extreme cases $k = 0, N$) due to the locality and adjacency of the basis functions $\phi_k$. In the case of the $L^2$ norm, the optimality conditions are linear and the previous observation leads to a tridiagonal (linear) system of equations.

A closed-form solution of (5) may not be available, and so, to solve the system we use standard numerical iterative algorithms of the form $\mathbf{c}^{(i+1)} = \mathbf{c}^{(i)} + \Delta\mathbf{c}^{(i)}$ to find an approximate local solution. Specifically, we use the step $\Delta\mathbf{c}^{(i)}$ in the Newton-Raphson iteration given by the solution of the linear system $\mathrm{H}\bigl(\mathbf{c}^{(i)}\bigr)\,\Delta\mathbf{c}^{(i)} = -\nabla E\bigl(\mathbf{c}^{(i)}\bigr)$, where $\mathrm{H}$ is the Hessian of (4). Near the solution $\mathbf{c}^\star$, this iteration has a quadratic convergence rate. This is also the step given by Newton’s optimization method when approximating (4) by its quadratic Taylor model. Due to the locality of the basis functions $\phi_k$, cost function (4) has the advantage that its Hessian $\mathrm{H}$ is a tridiagonal matrix, so the step is faster to compute than in the case of a full Hessian matrix.

The search for the optimal coefficients $\mathbf{c}^\star$ may be initialized by setting $\mathbf{c}^{(0)}$ equal to the values of the function at the nodal points, i.e., $c_k^{(0)} = f(x_k)$. A more sensible initialization to improve convergence toward the optimal coefficients is given by the ordinates of the best $L^2$ approximation (i.e., the orthogonal projection of $f$ onto $V$), which are easily obtained by solving a linear system of equations using the Thomas algorithm, with tridiagonal Gramian matrix $G$, $G_{ij} = \langle \phi_i, \phi_j \rangle$, and inner product $\langle u, v \rangle = \int_a^b u(x)\, v(x)\, dx$.
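This $L^2$ initialization can be sketched as follows (our own code, assuming a uniform partition; the Gramian entries $2h/3$ and $h/6$, halved at the boundary nodes, follow from integrating products of adjacent hat functions):

```python
import numpy as np

def thomas(sub, diag, sup, rhs):
    """Solve a tridiagonal system (sub/sup have n-1 entries) in O(n)."""
    n = len(diag)
    c = np.zeros(n); d = np.zeros(n)
    c[0] = sup[0] / diag[0]
    d[0] = rhs[0] / diag[0]
    for i in range(1, n):
        m = diag[i] - sub[i - 1] * c[i - 1]
        if i < n - 1:
            c[i] = sup[i] / m
        d[i] = (rhs[i] - sub[i - 1] * d[i - 1]) / m
    x = np.zeros(n)
    x[-1] = d[-1]
    for i in range(n - 2, -1, -1):
        x[i] = d[i] - c[i] * x[i + 1]
    return x

def l2_projection(f, a, b, N, fine=40):
    """Hat-basis coefficients of the L2 projection of f (uniform partition)."""
    h = (b - a) / N
    knots = np.linspace(a, b, N + 1)
    xs = np.linspace(a, b, fine * N + 1)
    fx = f(xs)
    rhs = np.empty(N + 1)
    for k in range(N + 1):
        phi = np.maximum(0.0, 1.0 - np.abs(xs - knots[k]) / h)  # hat at x_k
        y = fx * phi                        # integrand of <f, phi_k>
        rhs[k] = np.sum((y[1:] + y[:-1]) / 2) * (xs[1] - xs[0])  # trapezoid rule
    diag = np.full(N + 1, 2 * h / 3); diag[0] = diag[-1] = h / 3
    off = np.full(N, h / 6)
    return thomas(off, diag, off, rhs)
```

Since the projection reproduces any function already in $V$, projecting a linear function returns its nodal values, which gives a quick correctness check.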

From a numerical point of view, it is also a reasonable choice to replace the sign function appearing in the gradient of (4) by some smooth approximation, for example $\operatorname{sign}(u) \approx \tanh(u/\delta)$, with a parameter $\delta > 0$ controlling the width of the transition around $u = 0$.

In summary, the coefficients $\mathbf{c}^\star$ that specify the best approximation (3) on a given partition are computed numerically via iterative local optimization techniques starting from an initial guess $\mathbf{c}^{(0)}$.
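The whole procedure can be sketched compactly. The code below is our own minimal illustration: instead of the Newton-Raphson step of the text, it minimizes a smoothed version of the cost (4) by plain gradient descent on a quadrature grid, which keeps the sketch short at the price of slower convergence:

```python
import numpy as np

def best_l1_coeffs(f, a, b, N, delta=1e-2, step=0.02, iters=4000):
    """Approximate best-L1 CPWL coefficients on a uniform partition by
    gradient descent on the smoothed cost  E(c) ~ int sqrt((f-g_c)^2 + delta^2)."""
    knots = np.linspace(a, b, N + 1)
    h = (b - a) / N
    xs = np.linspace(a, b, 200 * N + 1)
    w = np.full(xs.size, xs[1] - xs[0]); w[0] /= 2; w[-1] /= 2  # trapezoid weights
    # All hat basis functions evaluated once on the quadrature grid.
    Phi = np.maximum(0.0, 1.0 - np.abs(xs[None, :] - knots[:, None]) / h)
    fx = f(xs)
    c = fx[::200].copy()   # initialize with the interpolant: c_k = f(x_k)
    for _ in range(iters):
        u = fx - c @ Phi                          # residual f - g_c
        s = u / np.sqrt(u * u + delta * delta)    # smoothed sign(u)
        c += step * (Phi * (w * s)).sum(axis=1)   # c <- c - step * grad E
    return knots, c

knots, c = best_l1_coeffs(lambda x: x**2, 0.0, 1.0, 2)
```

The resulting CPWL function can be evaluated with `np.interp(x, knots, c)`, and its true $L^1$ error should come out noticeably below that of the interpolant on the same partition.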

## III Optimizing the partition

Given a vector space $V$, we are endowed with a procedure to compute the least-first-power approximation $f^\star$ of a function $f$ and the corresponding error, $\|f - f^\star\|_1$. However, the approximation error depends on the choice of $V$, which is specified by the partition $\{x_k\}_{k=0}^{N}$. Hence, the next problem that naturally arises is the optimization of the partition for a given budget of control points, i.e., the search for the best vector space $V$ to approximate $f$ for a given partition size. This is a challenging non-linear optimization problem, even in the simpler case (fewer degrees of freedom) of substituting $f^\star$ by the polygonal interpolant $\hat{f}$. Fortunately, a good approximation of the optimal partition can be easily found using an asymptotic analysis.

Next, we carry out a detailed error analysis for the polygonal interpolant $\hat{f}$ and the polygonal least-first-power approximation $f^\star$. This will help us derive an approximation to the optimal partition that is valid for both $\hat{f}$ and $f^\star$ because, as will be shown, their approximation errors are roughly proportional when a sufficiently large budget of control points, i.e., a large number of subintervals, is available.

### III-A Error in a single interval: linear interpolant

First, let us analyze the error generated when approximating a function $f$, twice continuously differentiable, by its polygonal interpolant $\hat{f}$ in a single interval $I_k = [x_{k-1}, x_k]$ of length $h_k = x_k - x_{k-1}$. To this end, recall the following theorem on interpolation errors [25, sec. 4.2]: Let $f$ be a function in $C^{n+1}[a,b]$, with $[a,b]$ closed, and let $p$ be a polynomial of degree $n$ or less that interpolates $f$ at $n+1$ distinct points $t_0, \ldots, t_n \in [a,b]$. Then, for each $x \in [a,b]$ there exists a point $\xi_x \in (a,b)$ for which

$$f(x) - p(x) = \frac{f^{(n+1)}(\xi_x)}{(n+1)!} \prod_{i=0}^{n} (x - t_i). \qquad (6)$$

In the subinterval $I_k$, letting $t = (x - x_{k-1})/h_k$, the polygonal interpolant is written as

$$\hat{f}(x) = (1 - t)\, f(x_{k-1}) + t\, f(x_k). \qquad (7)$$

Since $\hat{f}$ interpolates the function $f$ at the endpoints of $I_k$, we can apply theorem (6) (with $n = 1$); hence, the approximation error only depends on $f''$ and $h_k$, but not on $f$ or $f'$:

$$f(x) - \hat{f}(x) = \frac{f''(\xi_x)}{2}\, (x - x_{k-1})(x - x_k). \qquad (8)$$

Let us compute the error over the subinterval $I_k$ by integrating the magnitude of (8), according to (2):

$$\epsilon_k = \int_{I_k} |f(x) - \hat{f}(x)|\, dx = \int_{I_k} \frac{|f''(\xi_x)|}{2}\, (x - x_{k-1})(x_k - x)\, dx.$$

Next, recall the first mean value theorem for integration, which states that if $G$ is a continuous function and $\varphi$ is an integrable function that does not change sign on $[a,b]$, then there exists a number $\eta \in (a,b)$ such that

$$\int_a^b G(x)\, \varphi(x)\, dx = G(\eta) \int_a^b \varphi(x)\, dx. \qquad (9)$$

Applying (9) to the error and noting that $(x - x_{k-1})(x_k - x)$ does not change sign on $I_k$ gives

$$\epsilon_k = \frac{|f''(\eta_k)|}{2} \int_{I_k} (x - x_{k-1})(x_k - x)\, dx = \frac{|f''(\eta_k)|}{12}\, h_k^3, \qquad (10)$$

for some $\eta_k \in I_k$. Finally, if $M_k = \max_{x \in I_k} |f''(x)|$, a direct derivation of the error bound yields

$$\epsilon_k \leq \frac{M_k}{12}\, h_k^3. \qquad (11)$$

Formula (11) states that the deviation of $f$ from being linear between the endpoints of $I_k$ is bounded by the maximum concavity/convexity of the function in $I_k$ (i.e., $M_k$ limits the amount of bending) and the cubic power of the interval size $h_k$, which is the reciprocal of the local density of control points.
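The bound (11) is easy to check numerically. The snippet below (our own sketch) compares the exact $L^1$ interpolation error of $\sin x$ on a single subinterval against $h^3 \max |f''| / 12$; since $f'' = -\sin$ on $[0, h]$, the bound is $h^3 \sin(h)/12$:

```python
import numpy as np

def interp_l1_error(f, x0, x1, grid=100001):
    """L1 distance between f and its chord on [x0, x1] (trapezoidal rule)."""
    x = np.linspace(x0, x1, grid)
    chord = f(x0) + (f(x1) - f(x0)) * (x - x0) / (x1 - x0)
    e = np.abs(f(x) - chord)
    return np.sum((e[1:] + e[:-1]) / 2) * (x[1] - x[0])

h = 0.3
err = interp_l1_error(np.sin, 0.0, h)   # exact L1 error of the chord
bound = h**3 / 12 * np.sin(h)           # h^3/12 * max|f''| on [0, h]
```

By (10) the exact error equals $h^3 |f''(\eta)| / 12$ for some interior $\eta$, so it should fall strictly between zero and the bound and be of the same order of magnitude.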

### III-B Error in a single interval: best linear approximation

To analyze the error due to the least-first-power approximation $f^\star$ and see how much it improves over that of the interpolant $\hat{f}$, let us first characterize the error incurred when approximating a function $f$ by a linear segment not necessarily passing through the values of $f$ at the endpoints of $I_k$,

$$\ell(x) = (1 - t)\,\bigl(f(x_{k-1}) + d_1\bigr) + t\,\bigl(f(x_k) + d_2\bigr), \qquad t = \frac{x - x_{k-1}}{h_k}, \qquad (12)$$

where $d_1$ and $d_2$ are extra parameters with respect to (7) that allow the linear segment to better fit the function in $I_k$. Letting $e(x) = f(x) - \ell(x)$, by analogy to (7), the corresponding error is

$$e(x) = \bigl(f(x) - \hat{f}(x)\bigr) - (1 - t)\, d_1 - t\, d_2 \qquad (13)$$

$$\phantom{e(x)} = \frac{f''(\xi_x)}{2}\,(x - x_{k-1})(x - x_k) - (1 - t)\, d_1 - t\, d_2. \qquad (14)$$

#### III-B1 Characterization of the optimal line segment

To find the line segment that minimizes the distance

$$\|e\|_{1,I_k} = \int_{I_k} |e(x)|\, dx,$$

i.e., to specify the values of the optimal $(d_1, d_2)$ in (12), we solve the necessary optimality conditions given by the non-linear system of equations

$$\frac{\partial}{\partial d_1} \int_{I_k} |e(x)|\, dx = -\int_{I_k} \operatorname{sign}\bigl(e(x)\bigr)\,(1 - t)\, dx = 0, \qquad (15)$$

$$\frac{\partial}{\partial d_2} \int_{I_k} |e(x)|\, dx = -\int_{I_k} \operatorname{sign}\bigl(e(x)\bigr)\, t\, dx = 0,$$

where we used that, for a function $u$,

$$\frac{d}{du}\,|u| = \operatorname{sign}(u), \qquad u \neq 0.$$

Adding both optimality equations in (15) gives

$$\int_{I_k} \operatorname{sign}\bigl(e(x)\bigr)\, dx = 0,$$

which implies that $e$ must be positive in half of the interval and negative in the other half.

In fact, [26][21, Cor. 3.1.1] state that if $e = f - \ell$ has a finite number of zeros (at which $e$ changes sign) in $I_k$, then $\ell$ is a best approximation to $f$ if and only if (15) is satisfied. To answer the uniqueness question, [27][28][21, Thm 3.2] state that a continuous function on a closed interval has a unique best $L^1$ approximation out of the set of polynomials of degree at most $n$. Hence, the solution of (15) provides the best linear approximation.

Let us discuss the solution of (15). If $e$ changes sign only at one (normalized) abscissa $t_1$, e.g., $e > 0$ for $t < t_1$ and $e < 0$ for $t > t_1$, the non-linear system of equations (15) cannot be satisfied: the sum of both equations forces $t_1 = \frac{1}{2}$, while the second equation forces $t_1 = 1/\sqrt{2}$, a contradiction. However, in the next simplest case, where $e$ changes sign at two abscissas $t_1 < t_2$, the non-linear system (15) does admit a solution. This is also intuitive to justify since it corresponds to the simplified case of $f''$ approximately constant in $I_k$, where the sign changes occur at the points with $e = 0$, which by (14) is a quadratic equation in $x$. It is also intuitive by looking at a plot of a candidate small error, as in Fig. 3, right.

Next, we further analyze the aforementioned case of $e$ changing sign at $t_1 < t_2$. Assume that $e > 0$ for $t \in (0, t_1) \cup (t_2, 1)$ and $e < 0$ in the rest of $I_k$ (or vice versa). If we apply the change of variables $x = x_{k-1} + h_k t$, then (15) becomes

$$\int_0^{t_1} (1 - t)\, dt - \int_{t_1}^{t_2} (1 - t)\, dt + \int_{t_2}^{1} (1 - t)\, dt = 0,$$

$$\int_0^{t_1} t\, dt - \int_{t_1}^{t_2} t\, dt + \int_{t_2}^{1} t\, dt = 0.$$

Adding both equations gives, as we already mentioned, $t_1 + (1 - t_2) = t_2 - t_1$, i.e., $t_2 - t_1 = \frac{1}{2}$, stating that $e < 0$ in half of the interval. This equation can be used to simplify the second equation, $t_1^2 - (t_2^2 - t_1^2) + (1 - t_2^2) = 0$, yielding $t_1 + t_2 = 1$. Therefore (15) is equivalent to a linear system whose solution is $t_1 = \frac{1}{4}$, $t_2 = \frac{3}{4}$, i.e., the sign changes occur at $x_{k-1} + \frac{h_k}{4}$ and $x_{k-1} + \frac{3h_k}{4}$.

This agrees with the particularization of a more general result [21, Cor.3.4.1]: if $f$ is adjoined to the set $P_n$ of polynomials of degree at most $n$ (linear, $n = 1$, in our case), meaning that $f \notin P_n$ and $f - p$ has at most $n + 1$ distinct zeros in $(a, b)$ for every $p \in P_n$, its best $L^1$ approximation out of $P_n$ is the unique $p^\star \in P_n$ which satisfies

$$p^\star(s_j) = f(s_j), \qquad s_j = \frac{a + b}{2} + \frac{b - a}{2} \cos\Bigl(\frac{j \pi}{n + 2}\Bigr),$$

for $j = 1, \ldots, n + 1$. The cosine term comes from the zeros of the Chebyshev polynomial of the second kind.

In other words, the best approximation is constructed by interpolating $f$ at the *canonical points* (the points of sign change of $e$ in (15)), as expressed by [13][22] in a nonlinear context. Hence, the values of $(d_1, d_2)$ that satisfy (15) are chosen so that the zero crossings of $e$ occur at the canonical points $t_1 = \frac{1}{4}$ and $t_2 = \frac{3}{4}$ of the interval $I_k$, yielding the linear system of equations

$$e\bigl(x_{k-1} + \tfrac{h_k}{4}\bigr) = 0, \qquad e\bigl(x_{k-1} + \tfrac{3 h_k}{4}\bigr) = 0,$$

whose solution is, after substituting in (14),

$$d_1 = -\frac{3 h_k^2}{64}\bigl(3 f''(\xi_1) - f''(\xi_2)\bigr), \qquad d_2 = -\frac{3 h_k^2}{64}\bigl(3 f''(\xi_2) - f''(\xi_1)\bigr), \qquad (16)$$

where $\xi_1, \xi_2 \in I_k$ are the intermediate points given by (14) at the canonical points $t_1$ and $t_2$, respectively.

The previous solution implies that the sum of the displacements has opposite sign to the convexity/concavity of the function $f$:

$$d_1 + d_2 = -\frac{3 h_k^2}{32}\bigl(f''(\xi_1) + f''(\xi_2)\bigr) = -\frac{3 h_k^2}{16}\, f''(\eta_k), \qquad (17)$$

where, from (16), the average $\frac{1}{2}\bigl(f''(\xi_1) + f''(\xi_2)\bigr)$ lies between the least and greatest values of $f''$ on $I_k$, and by the intermediate value theorem it equals $f''(\eta_k)$ for some $\eta_k \in I_k$. This agrees with the intuition/graphical interpretation (see Fig. 3, right).
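The quarter-point characterization is easy to verify numerically. As a brute-force sanity check of ours (not part of the paper's method), take $f(x) = x^2$ on $[0, 1]$, so that $f''$ is constant: the line interpolating $f$ at $\frac{1}{4}$ and $\frac{3}{4}$, namely $y = x - \frac{3}{16}$, should beat both the endpoint chord and small perturbations of itself in the $L^1$ sense:

```python
import numpy as np

x = np.linspace(0.0, 1.0, 100001)
dx = x[1] - x[0]
f = x**2
# Trapezoidal-rule L1 norm of an error function sampled on x.
l1 = lambda e: np.sum((np.abs(e)[1:] + np.abs(e)[:-1]) / 2) * dx

line = lambda slope, icpt: slope * x + icpt
# Line through the canonical points (1/4, 1/16) and (3/4, 9/16): y = x - 3/16.
err_best = l1(f - line(1.0, -3.0 / 16.0))
err_chord = l1(f - line(1.0, 0.0))   # endpoint interpolant: y = x
```

Since the best $L^1$ linear approximation is unique, any perturbed line must have a strictly larger error, which the assertions below confirm.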

#### III-B2 Minimum error of the optimal line segment

Now that the optimal $(d_1, d_2)$ have been specified, we may compute the minimum error $\epsilon_k^\star = \int_{I_k} |e(x)|\, dx$. Since $e$ changes sign only at the canonical points, we may expand

$$\epsilon_k^\star = \left| \int_{x_{k-1}}^{x_{k-1} + h_k/4} e\, dx \;-\; \int_{x_{k-1} + h_k/4}^{x_{k-1} + 3h_k/4} e\, dx \;+\; \int_{x_{k-1} + 3h_k/4}^{x_k} e\, dx \right|. \qquad (18)$$

Next, since $(x - x_{k-1})(x - x_k)$ does not change sign within $I_k$, use the first mean value theorem for integration (9) to simplify each term of the form

$$\int \frac{f''(\xi_x)}{2}\, (x - x_{k-1})(x - x_k)\, dx = \frac{f''(\eta)}{2} \int (x - x_{k-1})(x - x_k)\, dx$$

for some $\eta$ in the corresponding integration interval. In particular, using the previous formula for each term in (18) expresses $\epsilon_k^\star$ in terms of the displacements $(d_1, d_2)$ and the values of $f''$ at three intermediate points of $I_k$.

The segments in the best polygonal approximation $f^\star$ may not strictly satisfy this because $f^\star$ has additional continuity constraints across segments. The jump discontinuity at $x_k$ between adjacent independently-optimized pieces is

$$J_k = \bigl| d_2^{(k)} - d_1^{(k+1)} \bigr|,$$

where $d_2^{(k)}$ and $d_1^{(k+1)}$ are the displacements with respect to $f(x_k)$ of the optimized segments (16) at each side of $x_k$, and the corresponding evaluation points lie in $I_k \cup I_{k+1}$. In the case of twice continuously differentiable functions on a closed interval, the extreme value theorem states that the second-derivative terms in the previous equation are bounded. Accordingly, if $h_k$ and $h_{k+1}$ decrease due to a finer partition of the interval (i.e., a larger number of segments $N$), the discontinuity jumps at the control points of the partition decrease, too. Therefore, the approximation is valid for large $N$. Finally, if $I_k$ is sufficiently small so that $f''$ is approximately constant within it, say $f'' \approx f''(\bar{x}_k)$ with $\bar{x}_k$ the midpoint of $I_k$, then

$$\epsilon_k^\star \approx \frac{|f''(\bar{x}_k)|}{32}\, h_k^3. \qquad (19)$$

In the last step we substituted $\int_{I_k} |e(x)|\, dx = |e(\bar{x}_k)|\, h_k$, valid for constant $f''$, which can be proven by evaluation at the midpoint of the interval:

$$e(\bar{x}_k) = -\frac{f''}{8}\, h_k^2 - \frac{d_1 + d_2}{2} = -\frac{f''}{32}\, h_k^2.$$
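The cubic scaling of the minimum error can be confirmed numerically (our own sketch): for $f(x) = x^2$, with constant $f'' = 2$, the $L^1$ error of the canonical-point line on $[0, h]$ should equal $h^3 |f''| / 32 = h^3 / 16$ for every $h$:

```python
import numpy as np

def quarter_line_l1_error(h, grid=200001):
    """L1 error on [0, h] of the line through (h/4, h^2/16) and (3h/4, 9h^2/16),
    i.e. the canonical-point line for f(x) = x^2 (slope h, intercept -3h^2/16)."""
    x = np.linspace(0.0, h, grid)
    e = np.abs(x**2 - (h * x - 3.0 * h**2 / 16.0))
    return np.sum((e[1:] + e[:-1]) / 2) * (x[1] - x[0])
```

This is a factor $12/32 = 3/8$ smaller than the interpolant error $h^3 |f''| / 12$ on the same interval, which is why the two errors remain roughly proportional as the partition is refined.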

### III-C Approximation to the optimal partition

Having analyzed the errors of both the interpolant and the least-first-power approximation on a subinterval $I_k$, let us use these results to propose a suboptimal partition of the interval $[a,b]$ in the asymptotic case of a large number of subintervals.

A suboptimal partition for a given budget of control points is one in which every subinterval has approximately equal contribution to the total approximation error [29, 30]. Since such an error depends on the function being approximated, it is clear that such a dependence will be transferred to the suboptimal partition, i.e., the suboptimal partition is tailored to $f$. Specifically, because the error is proportional to the local amount of convexity/concavity of the function, a suboptimal partition places more control points in regions of $f$ with larger convexity than in other regions so that error equalization is achieved. Assuming $N$ is large enough so that $f''$ is approximately constant in each subinterval and therefore the bound (11) is tight, we have

$$\epsilon_k \approx \frac{|f''(\bar{x}_k)|}{12}\, h_k^3 = \epsilon \qquad (21)$$

for some constant $\epsilon$, and the control points should be chosen so that the local knot spacing [30] is $h(x) \propto |f''(x)|^{-1/3}$, i.e., smaller intervals as $|f''|$ increases. Hence, the local knot distribution or density is

$$\lambda(x) = \frac{|f''(x)|^{1/3}}{\int_a^b |f''(s)|^{1/3}\, ds}, \qquad (22)$$

meaning, as already announced, that more knots of the partition are placed in the regions with larger magnitude of the second derivative.

The error equalization criterion leads to the following suboptimal partition $\{x_k\}_{k=0}^{N}$: set $x_0 = a$, $x_N = b$, and take the interior knots given by

$$F(x_k) = \frac{k}{N}\, F(b), \qquad k = 1, \ldots, N - 1, \qquad (23)$$

where the monotonically increasing function $F$ is

$$F(x) = \int_a^x |f''(s)|^{1/3}\, ds. \qquad (24)$$

This procedure divides the range of $F$ into $N$ contiguous equal-length sub-ranges, and chooses the control points given by the preimages of the endpoints of the sub-ranges. It is graphically illustrated in Fig. 4. The suboptimal partition is related to the theory of optimum quantization [31], particularly in the asymptotic or high-resolution quantization case [32], where a “companding” function such as $F$ enables non-uniform subinterval spacing within a partition.
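The knot-placement rule (23)-(24) can be sketched as follows (our own code: $F$ is computed by cumulative trapezoidal quadrature and inverted by linear interpolation; the Gaussian example anticipates the kind of function used in the experiments):

```python
import numpy as np

def suboptimal_knots(d2f, a, b, N, grid=20000):
    """Knots x_k with F(x_k) = (k/N) F(b), where F(x) = int_a^x |f''(s)|^(1/3) ds."""
    xs = np.linspace(a, b, grid + 1)
    dens = np.abs(d2f(xs)) ** (1.0 / 3.0)          # unnormalized density (22)
    F = np.concatenate(([0.0],
        np.cumsum((dens[1:] + dens[:-1]) / 2 * np.diff(xs))))  # cumulative trapezoid
    # Invert F at N+1 equally spaced targets in its range, cf. (23).
    knots = np.interp(np.linspace(0.0, F[-1], N + 1), F, xs)
    knots[0], knots[-1] = a, b
    return knots

# Example: Gaussian bump f(x) = exp(-x^2/2), whose f''(x) = (x^2 - 1) exp(-x^2/2).
d2f = lambda x: (x**2 - 1.0) * np.exp(-x**2 / 2.0)
knots = suboptimal_knots(d2f, -3.0, 3.0, 16)
```

As expected from (22), the resulting knots are symmetric about the origin and more widely spaced in the flat tails, where $|f''|$ is small, than near the center.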

This partition allows us to estimate the error bound in the entire interval $[a,b]$ starting from that of the subintervals. For any partition, the total error is the sum of the errors over all subintervals and, by (11),

$$\epsilon_{\text{tot}} = \sum_{k=1}^{N} \epsilon_k \leq \frac{1}{12} \sum_{k=1}^{N} h_k^3 \max_{x \in I_k} |f''(x)|, \qquad (25)$$

which, under the error equalization condition (21), becomes

$$\epsilon_{\text{tot}} \approx N\, \epsilon. \qquad (26)$$

To determine $\epsilon$, sum over all subintervals and approximate the result using the Riemann integral: