Stable soft extrapolation of entire functions

Stable soft extrapolation of entire functions

Dmitry Batenkov Department of Mathematics, Massachussetts Institute of Technology, Cambridge, MA 02139 batenkov@mit.edu Laurent Demanet Department of Mathematics, Massachussetts Institute of Technology, Cambridge, MA 02139 laurent@math.mit.edu  and  Hrushikesh N. Mhaskar Institute of Mathematical Sciences, Claremont Graduate University, Claremont, CA 91711. hrushikesh.mhaskar@cgu.edu
Abstract.

Soft extrapolation refers to the problem of recovering a function from its samples, multiplied by a fast-decaying window and perturbed by an additive noise, over an interval which is potentially larger than the essential support of the window. To achieve stable recovery one must use some prior knowledge about the function class, and a core theoretical question is to provide bounds on the possible amount of extrapolation, depending on the sample perturbation level and the function prior.

In this paper we consider soft extrapolation of entire functions of finite order and type (containing the class of bandlimited functions as a special case), multiplied by a super-exponentially decaying window (such as a Gaussian). We consider a weighted least-squares polynomial approximation with judiciously chosen number of terms and a number of samples which scales linearly with the degree of approximation. It is shown that this simple procedure provides stable recovery with an extrapolation factor which scales logarithmically with the perturbation level and is inversely proportional to the characteristic lengthscale of the function. The pointwise extrapolation error exhibits a Hölder-type continuity with an exponent derived from weighted potential theory, which changes from 1 near the available samples, to 0 when the extrapolation distance reaches the characteristic smoothness length scale of the function. The algorithm is asymptotically minimax, in the sense that there is essentially no better algorithm yielding meaningfully lower error over the same smoothness class.

When viewed in the dual domain, soft extrapolation of an entire function of order 1 and finite exponential type corresponds to the problem of (stable) simultaneous de-convolution and super-resolution for objects of small space/time extent. Our results then show that the amount of achievable super-resolution is inversely proportional to the object size, and therefore can be significant for small objects. These results can be considered as a first step towards analyzing the much more realistic “multiband” model of a sparse combination of compactly-supported “approximate spikes”, which appears in applications such as synthetic aperture radar, seismic imaging and direction of arrival estimation, and for which only limited special cases are well-understood.

Acknowledgements

The research of DB and LD is supported in part by AFOSR grant FA9550-17-1-0316, NSF grant DMS-1255203, and a grant from the MIT-Skolkovo initiative. The research of HNM is supported in part by ARO Grant W911NF-15-1-0385, and by the Office of the Director of National Intelligence (ODNI), Intelligence Advanced Research Projects Activity (IARPA), via 2018-18032000002. The authors would also like to thank Alex Townsend and Matt Li for useful discussions.

1. Introduction

1.1. Background

Consider a one-dimensional (in space/time) object , and suppose it is corrupted by a low-pass convolutional filter and additive modeling/noise error , resulting in the output :

In the Fourier domain, we have

(1.1)

and consequently

(1.2)

where has small frequency support , the so-called “effective bandlimit” of the system (for example as in the case of the ideal low-pass filter ). The problem of computational super-resolution asks to recover features of from below the classical Rayleigh/Nyquist limit [5, 23]. As an ill-posed inverse problem, super-resolution can be regularized using some a-priori information about . One of the main theoretical questions of interest is to quantify the resulting stability of recovery.

Viewed directly in the frequency domain, super-resolution is equivalent to out-of-band extrapolation of for from samples of . Since the Fourier transform is an analytic function, this leads to the the problem of stable analytic continuation, widely studied during the last couple of centuries (Subsection 1.3).

1.2. Contributions

In this paper we consider the question of stable extrapolation of certain class of entire functions . In more detail, we assume that is analytic in and, for some ,

(1.3)

The underlying motivation for choosing such a growth condition is that the resulting set contains Fourier transforms of distributions of compact support (see Subsection 1.5 below for more details). Another common name for such would be “bandlimited” (or, in this case, “time-limited”).

Furthermore, instead of the “hard” cutoff we consider “soft” windows of super-exponentially decaying shapes, parametrized by (see Figure (c)c on page (c)c)

(1.4)

Gaussian point-spread functions are considered a fairly reasonable approximation in microscopy [39], and when is increased the shape of approaches the ideal filter. We therefore argue that the assumption (1.4) is realistic in applications.

We further assume that the perturbation in (1.2) is a uniformly bounded function

(1.5)

The “soft extrapolation” question, schematically depicted in Figure (a)a on page (a)a is then to recover in a stable fashion, over an interval which is potentially larger than the effective support of the window , where both and may depend on and .

(a) Schematic spectrum extrapolation from noisy and bandlimited data. The original function is (green, dashed). After being multiplied by the window (brown, dash-dot), it gives the blue, solid curve. Adding a corruption of size , this gives the data (shaded grey). The question is to recover over as wide an interval as possible, with as good an accuracy as possible.
(b) The object prior: the function has small space/time extent.
(c) The super-exponentially decaying windows (left column) and their inverse Fourier transforms (time domain filters ).
Figure 1.1. The schematic representation of the “soft extrapolation” problem.

Our first main result, Theorem 2.1 below (in particular see also Remark 2.2), shows that a weighted least-squares polynomial approximation with sufficiently dense samples of the form (1.2), taken inside the interval with scaling (up to sub-logarithmic factors111More complete scaling w.r.t , made precise in the sequel, is ) like and a judiciously chosen number of terms achieves an extrapolation factor which scales (again, up to sub-logarithmic in factors) like , while the pointwise extrapolation error exhibits a Hölder-type continuity, morally of the form for . The exponent varies from to , has an explicit form originating from weighted potential theory, and has itself a minor dependence on that we make explicit in the sequel.

Our second main result, Theorem 2.2 below, shows that the above result is minimax in terms of optimal recovery and thus cannot be meaningfully improved with respect to the asymptotic behaviour in .

In fact, we prove more general estimates, which hold for functions of finite exponential order and type , satisfying

(1.6)

provided that the window parameter satisfies .

1.3. Novelty and related work

Analytic continuation is a very old subject since at least the times of Weierstraß [16]. It is the classical example of an ill-posed inverse problem, and has been considered in this framework since the early 1960’s [33, 34, 10, 35], [6, 7, 19], and more recently [12] and [42]. These works establish regularized linear least squares as a near-optimal computational method, while also deriving the general form of Hölder-type stability, logarithmic scaling of the extrapolation range and the connection to potential theory. Weighted approximation of entire functions has been previously considered in [26, 28]. Below we outline some of these results in more detail, and relate them to ours.

  1. Miller [34] and Miller&Viano [35] considered the problem of analytic continuation in the general framework of ill-posed inverse problems with a prescribed bound. A general stabiliy estimate of the form was established using Carleman’s inequality, and regularized least squares (also known as the Tikhonov-Miller regularization) was shown to be a near-optimal recovery method. The setting in these work is one of “hard” extrapolation, where is analytic in an open disc , is being measured on a compact closed curve , and extended into . It is assumed that is uniformly bounded by some constant on .

    In contrast, we consider analytic continuation of entire functions satisfying (1.6) on unbounded domains from samples on intervals of the form into where both and can be arbitrarily large when . It doesn’t appear obvious how to extend the methods in [34, 35] to this setting, without using tools of weighted approximation of entire functions which were developed later.

  2. Tikhonov-Miller theory was applied in the particular context of super-resolution in  [6]. The inverse problem of “optical image extrapolation”, i.e. restoring a square-integrable bandlimited function from its measurments on a bounded interval is precisely dual to extrapolating the Fourier transform of a space-limited object of finite energy. For the “hard window” and any bandlimit the authors showed that the best possible reconstruction error has a Hölder-type scaling for some , in the uniform norm.

    In contrast, we consider the “soft” formulation, where the convolutional kernel does not vanish, but rather becomes exponentially small at high frequencies. Our estimates give point-wise error bounds so that the Hölder exponent is a function of the particular point, while also providing the functional dependence on the bandlimit (corresponding to in our notations).

  3. Out-of-band extrapolation for bandlimited functions was also considered in [19]. This setting is closely related to ours as the sampling interval can be arbitrary. It was shown that in this case, the optimal extrapolation length scales logarithmically with the signal-to-noise ratio, however no pointwise estimates were obtained. The sampling widow was again the “hard” one .

    Our results provide a more complete extrapolation scaling, rather than just Landau’s , while also giving pointwise error estimates. We also allow for functions of exponential order , and taking and , we in fact recover Landau’s scalings.

  4. A recent work by Demanet&Townsend [12] investigates the question of stable extrapolation of analytic functions having convergent Chebyshev polynomial expansions in a Bernstein ellipse with parameter from equispaced data on . Linear least squares reconstruction is shown to be asymptotically optimal, if the number of samples scales quadratically with , the number of terms in the approximation, and also . The resulting extrapolation error scales as where has a simple dependence on . Naturally, the maximal extrapolation extent is the boundary of the ellipse, i.e. , and also , while .

    Our results are close in spirit to [12], in particular the scaling for the polynomial approximation order, however the specific setting of soft extrapolation is different. Furthermore, we do not require equispaced samples (but only bound the sample density), and also we obtain point-wise extrapolation error bounds on the entire complex plane rather than just the real line.

  5. In another related recent work [42], Trefethen considers stable analytic continuation from an interval to an infinite half-strip (the so-called “linear” geometry), and shows an exponential loss of significant digits as function of distance from (corresponding to a stability estimate of the form where is the distance from and decreases exponentially). In contrast, we consider functions analytic in the entire plane which is not conformally equivalent to the linear geometry.

  6. Extrapolation is closely related to approximation, and in our case of “soft window”, one is naturally lead to adding a corresponding weight function. Indeed, the theory of weighted approximation (in particular of entire functions, [26, 27, 28]) plays a crucial role in our developments. In addition to extending these works to the extrapolation setting, our numerical approximation scheme in this paper does not necessarily require construction of quadrature formulas and orthogonal polynomials with respect to the weight .

1.4. Organization of the paper

In Section 2 we define the basic notation and formulate the main theorems. In Section 3 we present some numerical experiments which demonstrate the results in the case of and . In Section 4 we quote results from weighted approximation theory which are subsequently used in Section 5 for the proofs.

1.5. Discussion

When viewed in the dual domain (the variable in (1.1)), soft extrapolation of an entire function of order and finite exponential type corresponds to the problem of (stable) simultaneous de-convolution and super-resolution for objects of small space/time extent . Indeed, if , , is even, and if , then the Fourier transform in (1.1) is an entire function satisfying (1.3) (another common name for would be “bandlimited”, although in this case it would be more appropriate to say “time-limited”). This is an example of theorems of Paley-Wiener type ([41, Sect. 7.2], see also [37, p.12]), which hold also for more general classes such as tempered distributions.

In various applications such as seismic imaging, communications, radar, and microscopy, a fairly realistic prior on takes the form of a sparse atomic combination of compactly supported waveforms, also known as the multiband model in the literature [45, 36, 18, 43]

(1.7)

where each is assumed to have a small space/time support but is otherwise unknown. While there exist numerous studies of multiband signals such as the ones quoted above, super-resolution properties associated with this model are not well-understood, except in only some special cases (see e.g. [1, 2, 3, 4, 8, 9, 11, 13, 14, 22, 25, 29, 38, 40] as a very small sample).

Our results in this paper may be interpreted in this context when instead of the sparse sum in (1.7) we can consider the limit of a single object (possibly a distribution) of compact space/time support (Figure (b)b on page (b)b). In particular, we obtain the best possible scalings for stability of this inverse problem, showing that the amount of achievable super-resolution scales like , and therefore can be significant for small objects with . Furthermore, a simple algorithm — linear least squares fitting — is asymptotically optimal.

2. Optimal extrapolation of entire functions

2.1. Notation

In the sequel, we fix , and omit its mention from notations except to avoid conflict of notation, and for emphasis. Also, contrary to the introductory section, in the remainder of the paper we denote the extrapolation variable by instead of . The functions , , will correspond to and .

We shall use the standard definitions and notations of the spaces for (in this paper with respect to the Lebesgue measure on ) and the corresponding norms .

Given a function , and we define

(2.1)
Definition 2.1.

Given , the class consists of all entire functions , real valued on , and satisfying the condition

(2.2)
Remark 2.1.

Without loss of generality, in this paper we restrict the considerations to the class , although it is a proper subset of the set of entire functions of exponential order and type 222The standard definition of order and type is as follows (see e.g. [20]): if , then and ..

Indeed, for fixed , any (real-valued on ) of order and type and any we have , and therefore for some constant and . Furthermore, if the original is complex-valued on , one can consider the approximation of its real and imaginary parts separately, without changing the main asymptotic behaviour of the bounds.

Definition 2.2.

When comparing small functions of a small quantity , we shall write when there exists , depending on only, such that for all (however small), there exists , depending on such that

When both and we shall sometimes write .

For , denotes the class of all algebraic polynomials of degree at most . This is the same as the class of polynomials of degree at most , but the notation is simplified if we simply interpret in this way, rather than writing .

2.2. Extrapolation by least squares fitting

Let be a set of arbitrary real numbers. We observe data of the form

(2.3)

where and is a function satisfying

(2.4)

Given and as above, we define the operator computing the solution to the following least squares problem of degree :

(2.5)

When clear from the context, we shall omit and write .

Let

(2.6)

Our first main result below bounds the error for with the particular choice

with , under the assumption that the sampling set approximately lies in the interval and is sufficiently dense333 is given by (5.46), and the density conditions are given in (2.7) and (2.8).

The error bound heuristically behaves like a -dependent fractional power of the perturbation, but in reality it is slightly more complicated. In more detail:

  1. There is a natural rescaling of the variable by a factor of (which in turn depends on ), because the sampling set and the resulting approximating polynomial themselves depend on . So instead of bounding directly, we have a bound of the form

  2. The exponent in fact has a weak dependence on , and it is of the form where is the same function used in the definition of above and is a certain logarithmic potential.

There are three distinct regions in the complex plane with respect to the error asymptotics:

  1. The “approximation region” . In the range we have in fact , and therefore (disregarding the rounding effects) it can be easily shown that

    This is the error that would be obtained by conventional deconvolution, i.e. division by . Note that for any we have as

  2. The “extrapolation region”, where the error is less than , i.e. the maximal growth rate for any function in . It turns out that the maximal extrapolation interval can be precisely determined as , where

    In terms of the Hölder exponent , it turns out that , and therefore as we have

  3. The “forbidden” region where essentially no information can be obtained about from the samples on .

In Figure 2.1 on page 2.1 we show an example for the behaviour of both the exponent and the complete bound for different values of and other parameters fixed.

(a) The exponent . As becomes smaller, the values of in the interval approach 1, while at the right boundary they approach 0.
(b) The complete bound (solid), the quantity (dashed) and (dotted).
Figure 2.1. The optimal exponent and the corresponding error bound for several values of and .
Remark 2.2.

The length of the sampling window essentially scales like , while the maximal extrapolation range is of the asymptotic order . Therefore we obtain a genuine extension by a factor of .

The result reads as follows, and is proved in Section 5.

Theorem 2.1.

Let . For , let be as defined in (5.45) below. For any sequence of points satisfying

(2.7)
(2.8)

where is the explicit consant defined in (5.52) below, let be the weighted least squares polynomial fit of degree to using samples of as in (2.3), (2.4) and (2.5).

Then the (properly rescaled) pointwise extrapolation error satisfies

(2.9)

where is given by (5.46) and satisfies , while the function is defined in (5.49) below, such that

for any fixed (2.10)
for any with . (2.11)

Furthermore, the relation in (2.9) holds uniformly on compact subsets in .

Remark 2.3.

If the points are chosen to be equispaced, then it suffices to have at most linear (in the approximation degree ) growth of the size of the sampling set , as indeed it is sufficient to take

2.3. Optimality

Next, we will show that the results above cannot be meaningfully improved. To do so, we first review some facts from the theory of optimal recovery based on [32, 31], as they relate to our problem. We consider the set as a subset444Since the estimate (5.4) is uniform for all functions in , is a compact subset of . of the space of all continuous functions such that .

Let be a normed linear space with norm given by , and , be functions. The function is called information operator, and is called a recovery operator. In [32, 31], is required to be a linear operator, but we do not need this restriction here. In its place, we require that

(2.12)

We give some examples of and .

  1. A trivial example is and , . In this case the recovery operator is of interest only for an extension of to .

  2. In this paper, we are considering the mapping given by

    and the recovery operator given by .

  3. If , we define its Fourier-orthogonal coefficients (see (4.14) below) by

    (2.13)

    Let be the space of all sequences for which

    where is defined in (4.15). The information operator maps into , and satisfies (2.12). An example of a recovery operator is the expansion which converges when and . This shows that one can weaken the condition (2.12), allowing a constant factor on the right hand side of the inequality, by renormalizing .

For any information operator and recovery operator , , and , we are interested in the worst case recovery error when the information is perturbed by :

(2.14)

where again as given by (5.45). The best worst case (minimax) error is defined by

(2.15)

where it is understood that the infimum is over all and with all normed linear spaces , subject to (2.12). Informally, the best accuracy one can expect in reconstructing for every from any kind of information and recovery based only on the values of on . It is closely related to the notion of “best possible stablity estimate” of K.Miller [34], cf. Subsection 1.3.

With these notations, the bound (2.9) in Theorem 2.1 means that (with as before)

The following theorem shows that in this sense of optimal recovery, this result is the best possible.

Theorem 2.2.

There exists a function satisfying , such that for any ,

(2.16)

the relation holding uniformly for compact sets in the complex plane (w.r.t ).

In particular, for any small enough there exists a “dark object” such that

  1. for it has the same magnitude as the perturbation level:

  2. outside the sampling window, it has the same magnitude as the extrapolation error:

3. A numerical illustration: functions of order with Hermite polynomials

In this section we specialize our preceding results to the case , , with a function of finite exponential type . We then run a simple computational experiment and compare the results with the theoretical predictions, showing good agreement between the two in practice.

For technical convenience, we have chosen to work with an off-the-shelf implementation of Hermite polynomials, which are orthogonal with respect to the weight function . So instead of (2.3) we assume that is blurred by . Since our weights are not of this form, we perform a trivial change of variable and apply our results to the function , which is consequently of exponential type .

In this case we have , , , and also555According to [30, (2.9)], the relationship between and the function is The explicit formula for is then given by [30, (2.19),(2.20)].

(3.1)

where the branch of is chosen so that as .

Further, according to our notations, we also have , and

To implement the least squares operator , we have chosen to work in the basis of Hermite orthogonal polynomials which satisfy . Following Remark 2.3 we pick equispaced samples in (i.e. the oversampling factor is ).

For running the experiments below, we have chosen the following model function to extrapolate:

(3.2)

A simple computation shows that , and therefore .

Theorem 2.1 implies that the error satisfies, in the unscaled variable,

(3.3)

Instead of the construction used in the proof of Theorem 2.2 (i.e. the function given in (5.56)), we shall use the following function as our “dark object” (unrelated to ):

(3.4)

It is not difficult to show that this function is in and in fact also satisfies (5.57) and (5.60)666Proof is available upon request.. The underlying reason for using a different function is that it is not known how to evaluate general, while (3.4) is an absolutely explicit formula.

In Figure 3.1 on page 3.1 we show the reconstruction and the corresponding errors + bounds for fixed in the original scaling. As can be seen in Figure (a)a on page (a)a, the derived bounds are reasonably accurate. In Figure (b)b on page (b)b it is clearly seen that the algorithm chooses a reasonable value for , avoiding the extreme noise outside the essential support of the window.

In Figure 3.2 on page 3.2 the reconstruction and comparison were performed for fixed and , varying . It can be seen that the dependence of the error on is accurately determined. The threshold values of for which one moves from interpolation to extrapolation region () and from extrapolation to forbidden region () can be approximately determined as the solutions and of the equations

(3.5)
(3.6)
(a) The error and the bounds. Thin black lines are for different noise realizations (see (2.3) and (2.4)), their maximal envelope is the red solid curve. The green, dashed curve is the analytical bound in (3.3), while the magenta, dashdotted curve is the minimax function defined in (3.4). The dotted vertical lines are the region boundaries.
(b) The function (blue dashed) and its extrapolant (red solid). The black dots are the actual sampling points, while the grey curve is the noisy function which would need to have been used without extrapolation – recall (2.3). As can be seen, the choice of ensures that the samples are taken inside the essential support of the window, which depends on .
Figure 3.1. Results for the function defined in (3.2) with and .
(a) , the transition between regions 1 and 2 (black vertical line) computed numerically from (3.5).
(b) , the transition between regions 2 and 3 (brown vertical line), computed numerically from (3.6).
Figure 3.2. Error (red, solid curve) vs bound (green, dashed curve) and the minimax function (magenta, dashdotted) from (3.4) for , as function of for a fixed . Same function as in (3.2).

4. Preliminaries from weighted approximation theory

4.1. Weighted polynomials

A very important fact in the theory of weighted approximation is that the supremum norm of weighted polynomials is attained on an interval depending only on the degree of the polynomial, and not on the individual polynomials involved. We need to describe this fact in some detail. Let

(4.1)

be the so called Mhaskar-Rakhmanov-Saff numbers. The Ullman distribution is defined by

(4.2)

Further denote by its logarithmic potential

(4.3)

The number

(4.4)

is called the modified Robin’s constant for .

The following is proved in [27, Theorem 6.4.2], in fact, for all .

Theorem 4.1.

Let . Then

  1. The potential satisfies

    (4.5)
  2. If , , then

    (4.6)

    In particular,

    (4.7)

The following theorem gives an analogue of (4.7) in the case of the norms. The proof of this theorem is essentially in [27], but proved in [28, Theorem 2.2] in the form given below.

Theorem 4.2.

Let , , be as in (4.1). For any , integer and , we have

(4.8)

where

(4.9)

In the sequel, we will use the notation

(4.10)
Definition 4.1.

Let denote the unique monic polynomial of degree satisfying

(4.11)

These are also called weighted Chebyshev polynomials, as the above expression is completely analogous to the well-known property of the classical Chebyshev polynomials , namely that is the minimax monic approximant to the zero function [15, Theorem 3.6].

The following proposition (see e.g.[27, Section 6.3], [24, Corollary 3.3]) lists some of the required properties of .

Proposition 4.1.

Let . Then

  1. For , the polynomial has simple zeros in : .

  2. We have

    (4.12)
  3. Uniformly on compact subsets of , we have

    (4.13)

We will also use another family of polynomials. Let be the system of orthonormalized polynomials with respect to ; i.e., for each , , and for ,

(4.14)

It is known [21, Theorem 13.6] that there exist constants depending only on such that

(4.15)

4.2. Weighted approximation of entire functions

If and , we define the degree of approximation of by

(4.16)

In [27, Theorem 7.2.1(b)], we have proved the following theorem (with a different notation).

Theorem 4.3.

Let , , , , and

(4.17)

Then has an extension to the complex plane as an entire function of order and type given by

(4.18)

Conversely, if is the restriction to the real line of an entire function of order and type , then for every , , defined by (4.17) is finite, and (4.18) holds.

5. Proofs

5.1. Asymptotics as

In the sequel, the symbols will denote positive constants depending on , and other explicitly indicated quantities only. Their values may be different in different occurrences, even within the same formula.

We shall be using the following notation to compare decay of sequences up to sub-exponential factors.

Definition 5.1.

For sequences , , we write (or ) to denote the fact that there exists a sequence with , the limit being uniform in such that

Equivalently, for any , there exists , depending on and other explicitly indicated quantities only such that

Proposition 5.3 below relates the above condition to Definition 2.2.

Recall the weighted Chebyshev polynomials from (4.11) and Proposition 4.1. The following theorem is proved implicitly in the course of the proof of Lemma 7.2.5 in [27], but we will sketch a proof again since it is not stated explicitly there.

Theorem 5.1.

Let , , , and . For integer let be the unique polynomial of Lagrange interpolation in that satisfies at each of the zeros