Worst-case optimal approximation with increasingly flat Gaussian kernels
Abstract
We study worst-case optimal approximation of positive linear functionals in reproducing kernel Hilbert spaces (RKHSs) induced by increasingly flat Gaussian kernels. This provides a new perspective on, and some generalisations of, the problem of interpolation with increasingly flat radial basis functions. When the evaluation points are fixed and unisolvent, we show that the worst-case optimal method converges to a polynomial method. In an additional one-dimensional extension, we also allow the points to be selected optimally and show that in this case convergence is to the unique Gaussian-type method that achieves the maximal polynomial degree of exactness. The proofs are based on an explicit characterisation of the Gaussian RKHS in terms of exponentially damped polynomials.
Keywords:
Worst-case analysis, Reproducing kernel Hilbert spaces, Gaussian kernel, Gaussian quadrature
1 Introduction
Let $\Omega$ be a subset of $\mathbb{R}^d$ with a non-empty interior and $L$ a positive linear functional acting on continuous real-valued functions defined on $\Omega$ and satisfying $|L(p)| < \infty$ for any polynomial $p$ on $\Omega$. A cubature rule (quadrature if $d = 1$) $Q$ with the distinct points $x_1, \dots, x_N \in \Omega$ and weights $w_1, \dots, w_N \in \mathbb{R}$ is a weighted approximation to $L$ of the form
(1) $Q(f) = \sum_{n=1}^{N} w_n f(x_n) \approx L(f).$
In this article, the interest is often only in the weights $w = (w_1, \dots, w_N)$ while the points are kept fixed; accordingly, we may denote $Q = Q_w$ when there is no risk of confusion. Every continuous positive-definite kernel $K \colon \Omega \times \Omega \to \mathbb{R}$ induces a unique reproducing kernel Hilbert space (RKHS) $\mathcal{H}(K)$ in which the reproducing property $f(x) = \langle f, K(\cdot, x) \rangle_{\mathcal{H}(K)}$ holds for every $f \in \mathcal{H}(K)$ and $x \in \Omega$. We assume that $(L \otimes L)(K) < \infty$, which guarantees $|L(f)| < \infty$ for any $f \in \mathcal{H}(K)$ and consequently a finite worst-case error for any $w \in \mathbb{R}^N$. The worst-case error of the cubature rule (1) in $\mathcal{H}(K)$ is
(2) $e(Q_w) = \sup_{\|f\|_{\mathcal{H}(K)} \leq 1} \left| L(f) - Q_w(f) \right|.$
Given a fixed set of distinct points, we are interested in the kernel cubature rule $Q_K = Q_{w^K}$ whose weights $w^K \in \mathbb{R}^N$ are chosen so as to minimise the worst-case error: $w^K = \operatorname*{arg\,min}_{w \in \mathbb{R}^N} e(Q_w)$.
These weights are unique and available as the solution of the linear system [Oettershagen, 2017, Section 3.2]
(3) $\mathbf{K} w^K = z, \quad \text{where } \mathbf{K}_{ij} = K(x_i, x_j) \text{ and } z_i = L(K(\cdot, x_i)).$
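As a concrete illustration of the linear system (3) (our own sketch, not code from the original analysis), the following assumes the hypothetical setup $\Omega = [0, 1]$, $L$ the Lebesgue integral, and a Gaussian kernel with length-scale $0.3$; all variable names are ours.

```python
import math
import numpy as np

ell = 0.3                                # assumed length-scale
x = np.linspace(0.0, 1.0, 5)             # fixed, distinct points in [0, 1]

def kernel(s, t):
    return np.exp(-(s - t) ** 2 / (2.0 * ell ** 2))

def kernel_mean(t):
    # z_i = \int_0^1 K(x, x_i) dx, in closed form via the error function.
    c = math.sqrt(math.pi / 2.0) * ell
    return c * (math.erf((1.0 - t) / (math.sqrt(2.0) * ell))
                - math.erf((0.0 - t) / (math.sqrt(2.0) * ell)))

K = kernel(x[:, None], x[None, :])       # Gram matrix K_ij = K(x_i, x_j)
z = np.array([kernel_mean(t) for t in x])
w = np.linalg.solve(K, z)                # worst-case optimal weights

# By construction the rule integrates each K(., x_j) exactly.
assert np.allclose(K @ w, z)
# Sanity check on the smooth test function f(x) = x (exact integral 1/2).
approx = float(w @ x)
```

The only inputs are the Gram matrix and the vector of kernel means; for other functionals $L$ only `kernel_mean` changes.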
Arguably, the two most prominent linear functionals in approximation theory are the point evaluation functionals $\delta_x$, given by $\delta_x(f) = f(x)$ for $x \in \Omega$, and the integration functional $L(f) = \int_\Omega f \,\mathrm{d}\mu$ defined by a measure $\mu$ on $\Omega$. The former case yields the kernel interpolant
$s_K(x) = \sum_{n=1}^{N} f(x_n) u_n(x),$
where $u_n \in \mathcal{H}(K)$ are the Lagrange cardinal functions that satisfy $u_n(x_m) = \delta_{nm}$. For any fixed $x \in \Omega$ we have $w_n^K = u_n(x)$ for $L = \delta_x$. In this case the worst-case error coincides with the power function [Schaback, 1993]. For an arbitrary $L$, the kernel cubature rule can also be obtained by applying $L$ to the kernel interpolant, so that $w_n^K = L(u_n)$.
1.1 Increasingly flat kernels
Selection of the positive-definite kernel has a radical effect on the weights of the kernel cubature rule $Q_K$. In the case of interpolation, interesting behaviour has been observed when the kernel is isotropic (i.e., a radial basis function):
$K(x, y) = \Phi\!\left( \frac{\|x - y\|}{\ell} \right)$
for a positive-definite function $\Phi$ and a length-scale parameter $\ell > 0$. When $\ell \to \infty$, the kernel becomes increasingly flat and the linear system (3) increasingly ill-conditioned. (Note that most of the literature we cite parametrises the kernel in terms of the inverse length-scale $\varepsilon = 1/\ell$ and accordingly considers the case $\varepsilon \to 0$.) Nevertheless, the corresponding kernel interpolant is typically well-behaved at this limit. Starting with the work of Driscoll and Fornberg [2002], it has been shown that a certain unisolvency assumption on the points implies that the kernel interpolant converges to (i) a polynomial interpolant if the kernel is infinitely smooth [Driscoll and Fornberg, 2002, Fornberg et al., 2004, Schaback, 2005, Larsson and Fornberg, 2005, Lee et al., 2007, Schaback, 2008] or (ii) a polyharmonic spline interpolant if the kernel is finitely smooth [Song et al., 2012, Lee et al., 2014]. Further generalisations appear in [Lee et al., 2015]. The former case covers kernels such as Gaussians, multiquadrics, and inverse multiquadrics, while the latter applies to, for example, Matérn kernels and Wendland's functions. Among the most interesting of these results is the one by Schaback [2005], who proved that the interpolant at the increasingly flat limit of the Gaussian kernel
(4) $K_\ell(x, y) = \exp\!\left( -\frac{\|x - y\|^2}{2\ell^2} \right)$
exists regardless of the geometry of the point set and coincides with the de Boor and Ron polynomial interpolant [de Boor and Ron, 1992, de Boor, 1994]. Increasingly flat kernels have also been discussed independently in the literature on the use of Gaussian processes for numerical integration [O'Hagan, 1991, Minka, 2000, Särkkä et al., 2016], albeit accompanied only with non-rigorous arguments. Even though the intuition that the lowest-degree terms in the Taylor expansion of the kernel dominate construction of the interpolant as $\ell \to \infty$, and that this ought to imply convergence to a polynomial interpolant, is quite clear, it has not always been translated into transparent proofs.
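The flat-limit phenomenon is easy to observe numerically. The sketch below is our own illustration (the test function, points, and length-scales are arbitrary choices): it compares the Gaussian-kernel interpolant with the degree-4 polynomial interpolant at an off-node point as the length-scale grows.

```python
import numpy as np

# Unisolvent points in one dimension and a smooth test function (ours).
x = np.linspace(0.0, 1.0, 5)
f = lambda t: 1.0 / (1.0 + t ** 2)
t_star = 0.37                            # off-node evaluation point

# Degree-4 polynomial interpolant: the conjectured flat limit.
p = np.polynomial.Polynomial.fit(x, f(x), deg=4)
p_val = p(t_star)

def gauss_interp(ell):
    # Gaussian-kernel interpolant evaluated at t_star for length-scale ell.
    K = np.exp(-(x[:, None] - x[None, :]) ** 2 / (2.0 * ell ** 2))
    c = np.linalg.solve(K, f(x))
    k_star = np.exp(-(t_star - x) ** 2 / (2.0 * ell ** 2))
    return float(k_star @ c)

# The discrepancy to the polynomial interpolant shrinks as ell grows.
errs = [abs(gauss_interp(ell) - p_val) for ell in (2.0, 5.0, 10.0)]
```

The Gram matrix becomes severely ill-conditioned for large `ell`, in line with the discussion above, yet the interpolant itself remains stable well into the flat regime.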
1.2 Contributions
The purpose of this article is to generalise the aforementioned results on flat limits of kernel interpolants to kernel cubature rules. That such generalisations are possible is perhaps not surprising; it is rather the simple proof technique made possible by the worst-case framework that we find the most interesting aspect. We consider only the Gaussian kernel (4). This is because its RKHS has been completely characterised by Steinwart et al. [2006] and Minh [2010] (see also [Steinwart and Christmann, 2008, Section 4.4] and [De Marchi and Schaback, 2009, Example 3]).
Theorem 1.1
Let $\Omega$ be a subset of $\mathbb{R}^d$ with a non-empty interior. Then the RKHS $\mathcal{H}_\ell(\Omega)$ induced by the Gaussian kernel (4) with length-scale $\ell > 0$ consists of the functions
(5) $f(x) = e^{-\|x\|^2/(2\ell^2)} \sum_{\alpha \in \mathbb{N}_0^d} c_\alpha x^\alpha \quad \text{such that} \quad \|f\|_{\mathcal{H}_\ell(\Omega)}^2 = \sum_{\alpha \in \mathbb{N}_0^d} \alpha!\, \ell^{2|\alpha|} c_\alpha^2 < \infty,$
where convergence is absolute. Its inner product is $\langle f, g \rangle_{\mathcal{H}_\ell(\Omega)} = \sum_{\alpha \in \mathbb{N}_0^d} \alpha!\, \ell^{2|\alpha|} c_\alpha d_\alpha$ for $f$ and $g$ with coefficients $c_\alpha$ and $d_\alpha$. Furthermore, the collection
(6) $e_\alpha^\ell(x) = \frac{x^\alpha}{\sqrt{\alpha!}\, \ell^{|\alpha|}}\, e^{-\|x\|^2/(2\ell^2)}, \quad \alpha \in \mathbb{N}_0^d,$
of functions forms an orthonormal basis of $\mathcal{H}_\ell(\Omega)$.
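In one dimension the basis can be checked directly, since the Gaussian kernel factorises as $e^{-(x-y)^2/(2\ell^2)} = e^{-x^2/(2\ell^2)}\, e^{-y^2/(2\ell^2)}\, e^{xy/\ell^2}$. A small sanity check of this identity (our own, with $d = 1$ and arbitrary numerical inputs):

```python
import math

# Damped monomials e_n(x) = x^n exp(-x^2/(2 ell^2)) / (sqrt(n!) ell^n)
# should reproduce the kernel: sum_n e_n(x) e_n(y) = exp(-(x-y)^2/(2 ell^2)).
ell = 1.3
x, y = 0.7, -0.4

def e_n(n, t):
    return t ** n * math.exp(-t ** 2 / (2.0 * ell ** 2)) / (
        math.sqrt(math.factorial(n)) * ell ** n)

# The series converges absolutely; 40 terms are far more than enough here.
partial = sum(e_n(n, x) * e_n(n, y) for n in range(40))
exact = math.exp(-(x - y) ** 2 / (2.0 * ell ** 2))
```

The $n$th term equals $(xy/\ell^2)^n / n!$ times the two damping factors, which is just the exponential series for $e^{xy/\ell^2}$.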
This theorem is our central tool. Two crucial implications are that $\mathcal{H}_\ell(\Omega)$ consists of certain functions expressible as series of exponentially damped polynomials, the damping effect vanishing as $\ell \to \infty$, and that, due to the factors $\alpha!\, \ell^{2|\alpha|}$ appearing in the RKHS norm, the high-degree terms contribute the most to the norm. Consequently, the worst-case error (2), which takes into account only functions of at most unit norm, is dominated by low-degree terms when $\ell$ is large. The following results then follow:

If the point set is unisolvent with respect to a full polynomial space $\pi_m(\mathbb{R}^d)$ and $N = \dim \pi_m(\mathbb{R}^d)$, then the kernel cubature rule converges to the unique cubature rule that satisfies $Q(p) = L(p)$ for every polynomial $p$ of degree at most $m$. This result, contained in Theorem 2.3 and Corollary 2.4 in Section 2, is a generalisation to arbitrary linear functionals of the interpolation results cited in Section 1.1. It is not required that the domain be bounded: at the end of Section 2 we supply an example involving integration over $\mathbb{R}$ with respect to the Gaussian measure.

In Section 3 we present a generalisation, based on a theorem of Barrow [1978], for optimal kernel quadrature rules [Oettershagen, 2017, Chapter 5] that have both their points and weights selected so as to minimise the worst-case error. The result, Theorem 3.4, states that such rules, if unique, converge to the $N$-point Gaussian quadrature rule for the functional $L$, which is the unique quadrature rule such that $Q(p) = L(p)$ for every polynomial $p$ of degree at most $2N - 1$. This partially settles a conjecture posed by O'Hagan [1991, Section 3.3], and further discussed in [Minka, 2000, Särkkä et al., 2016], on convergence of optimal kernel quadrature rules to Gaussian quadrature rules.
Some generalisations for other kernels and cubature rules of more general form than (1) are briefly discussed in Section 4.
2 Fixed points
Let $\pi_m(\mathbb{R}^d)$ stand for the space of $d$-variate polynomials of degree at most $m$:
$\pi_m(\mathbb{R}^d) = \operatorname{span}\{ x \mapsto x^\alpha : \alpha \in \mathbb{N}_0^d,\ |\alpha| \leq m \}.$
In this section we assume that the point set $X = \{x_1, \dots, x_N\} \subset \Omega$ is unisolvent. That is,
$N = \dim \pi_m(\mathbb{R}^d) = \binom{m + d}{d}$
and the zero function is the only element of $\pi_m(\mathbb{R}^d)$ that vanishes on $X$. This is equivalent to non-singularity of the (generalised) Vandermonde matrix
(7) $V_{ij} = x_i^{\alpha_j},$
where $\alpha_1, \dots, \alpha_N$ is an enumeration of the multi-indices of degree at most $m$. It follows that there is a unique polynomial cubature rule $Q_\pi$ such that $Q_\pi(p) = L(p)$ for every $p \in \pi_m(\mathbb{R}^d)$. Its weights $w^\pi$ solve the linear system $V^{\mathsf{T}} w^\pi = m_\pi$ of equations, where $(m_\pi)_j = L(x^{\alpha_j})$. Define then
(8) $\varphi_\alpha^\ell(x) = x^\alpha e^{-\|x\|^2/(2\ell^2)}, \quad \alpha \in \mathbb{N}_0^d,$
so that functions in the Gaussian RKHS $\mathcal{H}_\ell(\Omega)$, characterised by Theorem 1.1, are of the form $f = \sum_\alpha c_\alpha \varphi_\alpha^\ell$ for coefficients $c_\alpha$ that decay sufficiently fast. Since the exponential function has no real roots, the determinant of the matrix
(9) $\tilde{V}_{ij} = \varphi_{\alpha_j}^\ell(x_i) = x_i^{\alpha_j} e^{-\|x_i\|^2/(2\ell^2)}$
satisfies $\det \tilde{V} = \det V \prod_{i=1}^{N} e^{-\|x_i\|^2/(2\ell^2)} \neq 0$, and $\tilde{V}$ is hence non-singular. From non-singularity it follows that there are unique weights $\tilde{w}^\ell$ such that $Q_{\tilde{w}^\ell}(\varphi_\alpha^\ell) = L(\varphi_\alpha^\ell)$ for every $\alpha$ satisfying $|\alpha| \leq m$. The weights solve $\tilde{V}^{\mathsf{T}} \tilde{w}^\ell = \tilde{m}$, where $\tilde{m}_j = L(\varphi_{\alpha_j}^\ell)$. (See [Fasshauer and McCourt, 2012] for an interpolation method based on a closely related basis derived from a Mercer eigendecomposition of the Gaussian kernel and [Karvonen and Särkkä, 2019] for an explicit construction of weights similar to $\tilde{w}^\ell$ in the case that $L$ is the Gaussian integral.) The following simple lemmas will be useful.
Lemma 2.1
Suppose that is unisolvent and for every . Then there is a constant such that for any .
Proof
The assumption and unisolvency of imply that . Because for any polynomial , both the weights and are finite and the claim follows. ∎
Lemma 2.2
Suppose that is unisolvent and for every . If is a sequence of weights such that
then .
Proof
We have and
where each of the terms on the righthand side vanishes as . Because and is nonsingular, we conclude that . ∎
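To illustrate the exponentially damped basis and the auxiliary weights numerically (our own one-dimensional sketch; the choice $L = \delta_t$ with $t = 1/2$ and the four points are illustrative assumptions), note that the damped Vandermonde matrix is a diagonal rescaling of the ordinary one:

```python
import numpy as np

# Points {0, 1/3, 2/3, 1}, so m = 3, with the hypothetical functional
# L = delta_t (evaluation at t = 1/2).
x = np.array([0.0, 1.0 / 3.0, 2.0 / 3.0, 1.0])
t = 0.5
V = np.vander(x, 4, increasing=True)                 # Vandermonde matrix (7)
w_poly = np.linalg.solve(V.T, t ** np.arange(4))     # polynomial weights

def damped_weights(ell):
    # The damped matrix (9) is diag(exp(-x_i^2/(2 ell^2))) times V,
    # hence non-singular for every ell > 0.
    damp = np.exp(-x ** 2 / (2.0 * ell ** 2))
    m_tilde = t ** np.arange(4) * np.exp(-t ** 2 / (2.0 * ell ** 2))
    return np.linalg.solve((V * damp[:, None]).T, m_tilde)

# For L = delta_t the damped weights are the polynomial weights rescaled by
# damping ratios, so they converge to w_poly as ell grows.
gap = [np.max(np.abs(damped_weights(ell) - w_poly)) for ell in (1.0, 10.0)]
```

This is exactly the boundedness and convergence behaviour that Lemmas 2.1 and 2.2 exploit.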
We are ready to prove the main result of the article for a fixed unisolvent point set consisting of distinct points. First, by considering one of the basis functions (6) we show that for every . Second, the suboptimal cubature rule defined above can be used, in combination with (5), to establish the upper bound . These two bounds imply that for every . If , Lemma 2.2 then implies that .
Theorem 2.3
Let for some and be unisolvent. Suppose that for every such that and that
(10) 
for some and any sequence such that . Then
where are the weights of the unique polynomial cubature rule such that for every .
Proof
For every select the function
From Theorem 1.1 it follows that since is one of the basis functions (6). Thus, by definition of the worst-case error,
(11) 
Next we derive an appropriate upper bound on by considering the unique suboptimal cubature rule that is exact for every with . In the expansion (5) of a function in we have for every term with . Consequently, the worst-case error of this rule admits the bound
where are the coefficients that define in Theorem 1.1. A consequence of (5) is that implies for some reals such that . Therefore, for ,
by assumption (10). Moreover, because
for some and every , we have
where follows from convergence of the last term and Lemma 2.1. Thus
(12) 
when . Since is worst-case optimal, we have thus established with (11) and (12) that, for sufficiently large ,
for every such that and a constant independent of . That is,
(13) 
The claim then follows by setting in Lemma 2.2. ∎
The assumptions of Theorem 2.3 hold, for instance, when the domain is bounded.
Corollary 2.4
Let for some and be unisolvent. Suppose that is bounded. Then
where are the weights of the unique polynomial cubature rule such that for every .
Proof
On a bounded domain the convergence as is uniform. Thus
as for every . Assumption (10) is also satisfied:
where for and finiteness follows from the assumption . ∎
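A quick numerical confirmation of the corollary (our own sketch, with the illustrative choices $\Omega = [0, 1]$, $L$ the Lebesgue integral, and three equispaced points, so that the limiting polynomial rule is Simpson's rule):

```python
import math
import numpy as np

x = np.array([0.0, 0.5, 1.0])
simpson = np.array([1.0, 4.0, 1.0]) / 6.0   # polynomial rule for m = 2

def kernel_weights(ell):
    # Worst-case optimal weights from the system (3) for the Gaussian kernel.
    K = np.exp(-(x[:, None] - x[None, :]) ** 2 / (2.0 * ell ** 2))
    c = math.sqrt(math.pi / 2.0) * ell
    z = np.array([c * (math.erf((1.0 - t) / (math.sqrt(2.0) * ell))
                       + math.erf(t / (math.sqrt(2.0) * ell))) for t in x])
    return np.linalg.solve(K, z)

# The kernel weights approach the Simpson weights as the kernel flattens.
gaps = [np.max(np.abs(kernel_weights(ell) - simpson)) for ell in (1.0, 20.0)]
```

Despite the severe ill-conditioning of the Gram matrix at `ell = 20`, double-precision solves remain accurate enough to exhibit the limit clearly.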
However, boundedness of the domain is not necessary. Consider Gaussian integration:
When for some (otherwise all the integrals below vanish by symmetry),
while
Thus as . To verify (10), recall that the absolute moments of the standard Gaussian distribution are
$\mathbb{E}\,|X|^k = \frac{2^{k/2}\, \Gamma\!\left( \frac{k+1}{2} \right)}{\sqrt{\pi}},$
where $\Gamma$ is the Gamma function. Because for any and
if is odd, we have
Thus
if .
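The Gaussian absolute-moment formula used above is easy to verify; the following check (ours, with an arbitrary truncation of the integration range) compares it against midpoint-rule integration of the standard Gaussian density:

```python
import math

def moment_formula(k):
    # E|X|^k = 2^(k/2) Gamma((k+1)/2) / sqrt(pi) for X ~ N(0, 1).
    return 2.0 ** (k / 2.0) * math.gamma((k + 1.0) / 2.0) / math.sqrt(math.pi)

def moment_numeric(k, a=12.0, n=100_000):
    # Midpoint rule on [-a, a]; the tail beyond |x| = 12 is negligible.
    h = 2.0 * a / n
    total = 0.0
    for i in range(n):
        t = -a + (i + 0.5) * h
        total += abs(t) ** k * math.exp(-t * t / 2.0)
    return total * h / math.sqrt(2.0 * math.pi)

checks = [abs(moment_formula(k) - moment_numeric(k)) for k in range(1, 6)]
```

For $k = 2$ the formula gives $2\,\Gamma(3/2)/\sqrt{\pi} = 1$, the variance of the standard Gaussian, as expected.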
3 Optimal points in one dimension
Let and for . In this section we consider quadrature rules whose points are also selected so as to minimise the worst-case error. A kernel quadrature rule is optimal if
In order to eliminate degrees of freedom in the ordering of the points we require that the points are in ascending order (i.e., ). Even though optimal kernel quadrature rules for the integration functional have been studied since the 1970s [Larkin, 1970, Richter, 1970, Richter-Dyn, 1971, Barrar et al., 1974, Bojanov, 1979] (the main results have been recently collated by Oettershagen [2017, Section 5.1]), their theory is still far from complete. As far as we are aware, there are no results guaranteeing existence or uniqueness of these rules, and the most advanced statement seems to be that an optimal kernel quadrature rule, if it exists, has all its points distinct and in the interior of the domain if the kernel is totally positive (e.g., Gaussian) [Oettershagen, 2017, Corollary 5.13].
In Theorem 3.4 we show that uniqueness of the optimal kernel quadrature rule implies that its increasingly flat limit is the $N$-point Gaussian quadrature rule for the linear functional $L$. This is the unique quadrature rule that is exact for every polynomial of degree at most $2N - 1$. This degree of exactness is maximal; there are no $N$-point quadrature rules exact for all polynomials up to degree $2N$. The most familiar methods of this type are of course the classical Gaussian quadrature rules for numerical integration [Gautschi, 2004, Section 1.4]. For example, the Gauss–Legendre quadrature rule satisfies
$\int_{-1}^{1} p(x) \,\mathrm{d}x = \sum_{n=1}^{N} w_n p(x_n)$
for every polynomial $p$ of degree at most $2N - 1$. The points of this rule are the roots of the $N$th-degree Legendre polynomial. Theorem 3.4 was conjectured by O'Hagan [1991, Section 3.3] in 1991 in the form that the optimal kernel quadrature rule has the classical Gauss–Hermite quadrature rule as its increasingly flat limit if the kernel is Gaussian and $L$ is the Gaussian integral. More discussion of this conjecture (but no rigorous proofs) can be found in [Minka, 2000, Section 4].
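The degree of exactness $2N - 1$, and its maximality, can be checked numerically for the Gauss–Legendre rule; the sketch below (ours) uses NumPy's `leggauss` nodes and weights:

```python
import numpy as np

# n-point Gauss-Legendre rule on [-1, 1]: exact for monomials up to
# degree 2n - 1, but not for degree 2n.
n = 4
nodes, weights = np.polynomial.legendre.leggauss(n)

def exact_moment(k):
    # \int_{-1}^{1} x^k dx: zero for odd k, 2/(k+1) for even k.
    return 2.0 / (k + 1) if k % 2 == 0 else 0.0

errors = [abs(weights @ nodes ** k - exact_moment(k))
          for k in range(2 * n + 1)]
# errors[0], ..., errors[2n - 1] are at machine precision; errors[2n] is not.
```

The failure at degree $2n$ reflects the maximality of the degree of exactness discussed above.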
The proof of Theorem 3.4 is based on a general result by Barrow [1978] on existence and uniqueness of generalised Gaussian quadrature rules. This result replaces the polynomials in a Gaussian quadrature rule with generalised polynomials formed out of functions that constitute an extended Chebyshev system [Karlin and Studden, 1966, Chapter 1]. A collection of $n$ functions is an extended Chebyshev system if any non-trivial linear combination of the functions has at most $n - 1$ zeroes, counting multiplicities. That is, if and for , , and , then . Any basis of the space of polynomials of degree at most $n - 1$ is an extended Chebyshev system. Importantly, the functions in (8) are an extended Chebyshev system for any $\ell > 0$. To verify this, note that any linear combination $f$ of the functions (8) can be written as $f(x) = p(x)\, e^{-x^2/(2\ell^2)}$ for some polynomial $p$ of degree at most $n - 1$ and consequently
$f^{(k)}(x) = q_k(x)\, e^{-x^2/(2\ell^2)}$ for some polynomials $q_k$. From this expression we see that $f^{(k)}(x_0) = 0$ for every $k < r$ if and only if $p^{(k)}(x_0) = 0$ for every $k < r$. Since $p$ can have at most $n - 1$ zeroes, counting multiplicities, it follows that the same is true of $f$.
Theorem 3.1 (Barrow)
Let the $2N$ continuous functions $\varphi_1, \dots, \varphi_{2N}$ be an extended Chebyshev system and $L$ a positive linear functional on their span. Then there exist unique points $x_1 < \dots < x_N$ in the interior of the interval and positive weights $w_1, \dots, w_N$ such that
$\sum_{n=1}^{N} w_n \varphi_j(x_n) = L(\varphi_j) \quad \text{for every } j = 1, \dots, 2N.$
The following lemmas are also needed.
Lemma 3.2
Let and suppose that a cubature rule with nonnegative weights satisfies for some positive function such that for all . Then
Proof
The claims follow immediately from the inequalities
∎
Lemma 3.3
Let be a metric space, a positive constant, and a function. If there is a continuous function such that uniformly as and a unique minimiser for which , then a function such that has .
Proof
The inequality shows that since by assumption and by uniformity of the convergence . Because is continuous, nonnegative, and has a unique minimiser , this implies that . ∎
Theorem 3.4
Suppose that for . If for every there exists a unique optimal kernel quadrature rule , then its points and weights converge to those of the $N$-point Gaussian quadrature rule for :
Moreover, .
Proof
In a manner identical to the proof of Theorem 2.3, we establish the lower bound
that holds for every . Because the functions (8) are an extended Chebyshev system, Theorem 3.1 guarantees the existence of a unique $N$-point quadrature rule such that for every . The points of this rule are distinct and lie in the interior of the domain, and the weights are positive. We can then replicate the rest of the proof of Theorem 2.3 in one dimension, but with and Lemma 2.1 replaced with Lemma 3.2 (applied to the function ), to show that, for sufficiently large