# Worst-case optimal approximation with increasingly flat Gaussian kernels

Toni Karvonen and Simo Särkkä
Department of Electrical Engineering and Automation, Aalto University, Espoo, Finland
Email: tskarvon@iki.fi, simo.sarkka@aalto.fi

###### Abstract

We study worst-case optimal approximation of positive linear functionals in reproducing kernel Hilbert spaces (RKHSs) induced by increasingly flat Gaussian kernels. This provides a new perspective on, and some generalisations of, the problem of interpolation with increasingly flat radial basis functions. When the evaluation points are fixed and unisolvent, we show that the worst-case optimal method converges to a polynomial method. In an additional one-dimensional extension, we also allow the points to be selected optimally and show that in this case convergence is to the unique Gaussian-type method that achieves the maximal polynomial degree of exactness. The proofs are based on an explicit characterisation of the Gaussian RKHS in terms of exponentially damped polynomials.

###### Keywords:
Worst-case analysis, Reproducing kernel Hilbert spaces, Gaussian kernel, Gaussian quadrature

## 1 Introduction

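Let $\Omega$ be a subset of $\mathbb{R}^d$ with a non-empty interior and $L$ a positive linear functional acting on continuous real-valued functions defined on $\Omega$ and satisfying $L[|p|] < \infty$ for any polynomial $p$ on $\Omega$. A cubature rule (quadrature if $d = 1$) $Q(X, w)$ with the distinct points $X = \{x_1, \ldots, x_N\} \subset \Omega$ and weights $w = (w(1), \ldots, w(N)) \in \mathbb{R}^N$ is a weighted approximation to $L$ of the form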

$$Q(X, w)[f] \coloneqq \sum_{n=1}^N w(n) f(x_n) \approx L[f]. \tag{1}$$

In this article, the interest is often only in the weights while the points are kept fixed; accordingly, we may denote $Q(w) = Q(X, w)$ when there is no risk of confusion. Every continuous positive-definite kernel $k \colon \Omega \times \Omega \to \mathbb{R}$ induces a unique reproducing kernel Hilbert space (RKHS) $\mathcal{H}$ where the reproducing property $\langle f, k(\cdot, x) \rangle_{\mathcal{H}} = f(x)$ holds for every $f \in \mathcal{H}$ and $x \in \Omega$. We assume that , which guarantees for any and consequently for any . The worst-case error of the cubature rule (1) in $\mathcal{H}$ is

$$e_{\mathcal{H}}(Q(X, w)) \coloneqq \sup_{\|f\|_{\mathcal{H}} \le 1} \bigg| L[f] - \sum_{n=1}^N w(n) f(x_n) \bigg|. \tag{2}$$

Given a fixed set $X$ of distinct points, we are interested in the kernel cubature rule $Q_{\mathcal{H}} = Q(X, w_{\mathcal{H}})$ whose weights $w_{\mathcal{H}}$ are chosen so as to minimise the worst-case error:

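$$w_{\mathcal{H}} = \operatorname*{arg\,min}_{w \in \mathbb{R}^N} e_{\mathcal{H}}(Q(X, w)) \quad \text{and} \quad e_{\mathcal{H}}(Q_{\mathcal{H}}) = \inf_{w \in \mathbb{R}^N} e_{\mathcal{H}}(Q(X, w)).$$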

These weights are unique and available as the solution to the linear system [Oettershagen, 2017, Section 3.2]

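$$\begin{bmatrix} k(x_1, x_1) & \cdots & k(x_1, x_N) \\ \vdots & \ddots & \vdots \\ k(x_N, x_1) & \cdots & k(x_N, x_N) \end{bmatrix} \begin{bmatrix} w_{\mathcal{H}}(1) \\ \vdots \\ w_{\mathcal{H}}(N) \end{bmatrix} = \begin{bmatrix} L[k(\cdot, x_1)] \\ \vdots \\ L[k(\cdot, x_N)] \end{bmatrix}. \tag{3}$$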
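
As a minimal sketch of how the linear system (3) can be solved in practice, the following Python code computes the weights for a Gaussian kernel in one dimension. The choice of functional (integration over $[-1, 1]$ with respect to the Lebesgue measure), the three points, the length-scale, and all function names are assumptions made for illustration only; the right-hand side uses the closed form of the integral of a Gaussian in terms of the error function.

```python
import math
import numpy as np

def gaussian_kernel(x, y, ell):
    """Gaussian kernel exp(-(x - y)^2 / (2 ell^2)) for scalars or arrays."""
    return np.exp(-(x - y) ** 2 / (2.0 * ell ** 2))

def kernel_mean(y, ell, a=-1.0, b=1.0):
    """Closed form of the integral of k(x, y) over x in [a, b] (assumed functional)."""
    s = math.sqrt(2.0) * ell
    return ell * math.sqrt(math.pi / 2.0) * (math.erf((b - y) / s) - math.erf((a - y) / s))

def kernel_quadrature_weights(points, ell):
    """Solve K w = z, where K is the kernel matrix and z_n = L[k(., x_n)]."""
    points = np.asarray(points, dtype=float)
    K = gaussian_kernel(points[:, None], points[None, :], ell)
    z = np.array([kernel_mean(y, ell) for y in points])
    return np.linalg.solve(K, z), K, z

points = [-1.0, 0.0, 1.0]          # illustrative point set
w, K, z = kernel_quadrature_weights(points, ell=1.0)
print("weights:", w)
print("residual of the linear system:", np.max(np.abs(K @ w - z)))
```

By construction the resulting rule integrates each kernel translate $k(\cdot, x_n)$ exactly, which is what the printed residual checks.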

Arguably, the two most prominent linear functionals in approximation theory are the point evaluation functionals $L_x$, given by $L_x[f] = f(x)$ for $x \in \Omega$, and the integration functional $L[f] = \int_\Omega f \, \mathrm{d}\mu$ defined by a measure $\mu$ on $\Omega$. The former case yields the kernel interpolant

$$s_{f,X}(x) = \sum_{n=1}^N u_n(x) f(x_n),$$

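where $u_n$ are the Lagrange cardinal functions that satisfy $u_n(x_m) = \delta_{nm}$. For any fixed $x \in \Omega$ we have $w_{\mathcal{H}}(n) = u_n(x)$ when $L$ is the point evaluation functional at $x$. In this case the worst-case error coincides with the power function [Schaback, 1993]. For an arbitrary $L$, the kernel cubature rule can also be obtained by applying $L$ to the kernel interpolant so that $Q_{\mathcal{H}}[f] = L[s_{f,X}]$.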

### 1.1 Increasingly flat kernels

Selection of the positive-definite kernel $k$ has a radical effect on the weights of the kernel cubature rule $Q_{\mathcal{H}}$. In the case of interpolation, interesting behaviour has been observed when the kernel is isotropic (i.e., a radial basis function):

$$k_\ell(x, x') = k_0\big(\ell^{-1} \|x - x'\|\big)$$

for a positive-definite function $k_0$ and a length-scale parameter $\ell > 0$. When $\ell \to \infty$, the kernel becomes increasingly flat and the linear system (3) increasingly ill-conditioned. (Note that most of the literature we cite parametrises the kernel in terms of the inverse length-scale $\varepsilon = 1/\ell$ and accordingly considers the case $\varepsilon \to 0$.) Nevertheless, the corresponding kernel interpolant is typically well-behaved at this limit. Starting with the work of Driscoll and Fornberg [2002], it has been shown that a certain unisolvency assumption on $X$ implies that the kernel interpolant converges to (i) a polynomial interpolant if the kernel is infinitely smooth [Driscoll and Fornberg, 2002, Fornberg et al., 2004, Schaback, 2005, Larsson and Fornberg, 2005, Lee et al., 2007, Schaback, 2008] or (ii) a polyharmonic spline interpolant if the kernel is finitely smooth [Song et al., 2012, Lee et al., 2014]. Further generalisations appear in [Lee et al., 2015]. The former case covers kernels such as Gaussians, multiquadrics, and inverse multiquadrics while the latter applies to, for example, Matérn kernels and Wendland's functions. Among the most interesting of these results is the one by Schaback [2005] who proved that the interpolant at the increasingly flat limit of the Gaussian kernel

$$k_\ell(x, x') = \exp\bigg( -\frac{\|x - x'\|^2}{2\ell^2} \bigg) \tag{4}$$

exists regardless of the geometry of $X$ and coincides with the de Boor and Ron polynomial interpolant [de Boor and Ron, 1992, de Boor, 1994]. Increasingly flat kernels have also been discussed independently in the literature on the use of Gaussian processes for numerical integration [O'Hagan, 1991, Minka, 2000, Särkkä et al., 2016], albeit accompanied only by non-rigorous arguments. The intuition is quite clear: the lowest-degree terms in the Taylor expansion of the kernel dominate the construction of the interpolant as $\ell \to \infty$, and this ought to imply convergence to a polynomial interpolant. However, this intuition is not always translated into transparent proofs.
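
The ill-conditioning of the linear system (3) for flat kernels mentioned above is easy to observe numerically. The following sketch, which assumes three illustrative points in one dimension and a few hand-picked length-scales, prints the condition number of the Gaussian kernel matrix as $\ell$ grows.

```python
import numpy as np

def gaussian_gram(points, ell):
    """Kernel matrix of the Gaussian kernel (4) at one-dimensional points."""
    points = np.asarray(points, dtype=float)
    diff = points[:, None] - points[None, :]
    return np.exp(-diff ** 2 / (2.0 * ell ** 2))

points = [-1.0, 0.0, 1.0]                    # illustrative point set
length_scales = (1.0, 3.0, 10.0)             # illustrative length-scales
conds = {ell: float(np.linalg.cond(gaussian_gram(points, ell))) for ell in length_scales}

for ell, c in conds.items():
    print(f"ell = {ell:5.1f}   cond(K) = {c:.3e}")
```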

### 1.2 Contributions

The purpose of this article is to generalise the aforementioned results on flat limits of kernel interpolants to kernel cubature rules. That such generalisations are possible is perhaps not surprising; it is rather the simple proof technique made possible by the worst-case framework that we find the most interesting aspect. We consider only the Gaussian kernel (4). This is because its RKHS has been completely characterised by Steinwart et al. [2006] and Minh [2010] (see also [Steinwart and Christmann, 2008, Section 4.4] and [De Marchi and Schaback, 2009, Example 3]).

###### Theorem 1.1

Let $\Omega$ be a subset of $\mathbb{R}^d$ with a non-empty interior. Then the RKHS $\mathcal{H}_\ell$ induced by the Gaussian kernel (4) with length-scale $\ell > 0$ consists of the functions

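$$f(x) = e^{-\|x\|^2/(2\ell^2)} \sum_{\alpha \in \mathbb{N}_0^d} f_\alpha x^\alpha \quad \text{such that} \quad \|f\|_{\mathcal{H}_\ell}^2 = \sum_{\alpha \in \mathbb{N}_0^d} \ell^{2|\alpha|} \alpha! \, f_\alpha^2 < \infty, \tag{5}$$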

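where convergence is absolute. Its inner product is $\langle f, g \rangle_{\mathcal{H}_\ell} = \sum_{\alpha \in \mathbb{N}_0^d} \ell^{2|\alpha|} \alpha! \, f_\alpha g_\alpha$. Furthermore, the collection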

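$$\bigg\{ \frac{1}{\ell^{|\alpha|} \sqrt{\alpha!}} \, x^\alpha e^{-\|x\|^2/(2\ell^2)} \bigg\}_{\alpha \in \mathbb{N}_0^d} \tag{6}$$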

of functions forms an orthonormal basis of $\mathcal{H}_\ell$.
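
A quick numerical sanity check of this characterisation is possible in one dimension. It relies on the standard fact that the reproducing kernel equals the sum of products $e_n(x) e_n(y)$ over any orthonormal basis $(e_n)$ of the RKHS. The sketch below truncates that sum for the damped monomials $x^n e^{-x^2/(2\ell^2)} / (\ell^n \sqrt{n!})$; the evaluation points, the length-scale, and the truncation level are arbitrary choices for illustration.

```python
import math
import numpy as np

def basis_function(n, x, ell):
    """n-th orthonormal basis function of the Gaussian RKHS for d = 1."""
    return x ** n * np.exp(-x ** 2 / (2.0 * ell ** 2)) / (ell ** n * math.sqrt(math.factorial(n)))

def kernel_from_basis(x, y, ell, n_terms=60):
    """Truncated expansion sum_n e_n(x) e_n(y) of the kernel."""
    return sum(basis_function(n, x, ell) * basis_function(n, y, ell) for n in range(n_terms))

def gaussian_kernel(x, y, ell):
    return np.exp(-(x - y) ** 2 / (2.0 * ell ** 2))

x, y, ell = 0.7, -1.3, 1.5                   # illustrative values
approx = kernel_from_basis(x, y, ell)
exact = gaussian_kernel(x, y, ell)
print(approx, exact)
```

The agreement reflects the identity $e^{-(x-y)^2/(2\ell^2)} = e^{-x^2/(2\ell^2)} e^{-y^2/(2\ell^2)} \sum_{n \ge 0} (xy)^n / (\ell^{2n} n!)$.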

This theorem is our central tool. Two crucial implications are that $\mathcal{H}_\ell$ consists of certain functions expressible as series of exponentially damped polynomials, the damping effect vanishing as $\ell \to \infty$, and that, due to the terms $\ell^{2|\alpha|}$ appearing in the RKHS norm, the high-degree terms contribute the most to the norm. Consequently, the worst-case error (2), taking into account only functions of at most unit norm, is dominated by low-degree terms when $\ell$ is large. The following results then follow:

• If $X$ is unisolvent with respect to a full polynomial space $\Pi_m$ and $\ell \to \infty$, then $Q_{\mathcal{H}_\ell}$ converges to the unique cubature rule $Q(w_\Pi)$ that satisfies $Q(w_\Pi)[p] = L[p]$ for every polynomial $p$ of degree at most $m$. This result, contained in Theorem 2.3 and Corollary 2.4 in Section 2, is a generalisation for arbitrary linear functionals of the interpolation results cited in Section 1.1. It is not required that $\Omega$ be bounded: at the end of Section 2 we supply an example involving integration over $\mathbb{R}^d$ with respect to the Gaussian measure.

• In Section 3 we present a generalisation, based on a theorem of Barrow [1978], for optimal kernel quadrature rules [Oettershagen, 2017, Chapter 5] that have both their points and weights selected so as to minimise the worst-case error. The result, Theorem 3.4, states that such rules, if unique, converge to the $N$-point Gaussian quadrature rule for the functional $L$, which is the unique quadrature rule $Q(X_{\mathrm{G}}, w_{\mathrm{G}})$ such that $Q(X_{\mathrm{G}}, w_{\mathrm{G}})[p] = L[p]$ for every polynomial $p$ of degree at most $2N - 1$. This partially settles a conjecture posed by O'Hagan [1991, Section 3.3] and further discussed in [Minka, 2000, Särkkä et al., 2016] on convergence of optimal kernel quadrature rules to Gaussian quadrature rules.

Some generalisations for other kernels and cubature rules of more general form than (1) are briefly discussed in Section 4.

## 2 Fixed points

Let $\Pi_m$ stand for the space of $d$-variate polynomials of degree at most $m$:

$$\Pi_m = \operatorname{span}\{ x^\alpha : \alpha \in \mathbb{N}_0^d, \ |\alpha| \le m \}.$$

In this section we assume that the point set $X$ is $\Pi_m$-unisolvent. That is,

$$N = \# X = \dim(\Pi_m) = \binom{m + d}{d} = \frac{(m + d)!}{d! \, m!}$$

and the zero function is the only element of $\Pi_m$ that vanishes on $X$. This is equivalent to non-singularity of the (generalised) Vandermonde matrix

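$$P_\Pi \coloneqq \begin{bmatrix} x_1^{\alpha_1} & \cdots & x_1^{\alpha_N} \\ \vdots & \ddots & \vdots \\ x_N^{\alpha_1} & \cdots & x_N^{\alpha_N} \end{bmatrix}, \tag{7}$$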

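where $\alpha_1, \ldots, \alpha_N$ enumerate the multi-indices $\alpha \in \mathbb{N}_0^d$ with $|\alpha| \le m$. It follows that there is a unique polynomial cubature rule $Q(w_\Pi)$ such that $Q(w_\Pi)[p] = L[p]$ for every $p \in \Pi_m$. Its weights $w_\Pi$ solve the linear system $P_\Pi^T w_\Pi = L_\Pi$ of $N$ equations, where $(L_\Pi)_n = L[x^{\alpha_n}]$. Define then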

$$\phi_\alpha^\ell(x) = e^{-\|x\|^2/(2\ell^2)} x^\alpha, \tag{8}$$

so that functions in the Gaussian RKHS $\mathcal{H}_\ell$, characterised by Theorem 1.1, are of the form $f = \sum_{\alpha \in \mathbb{N}_0^d} f_\alpha \phi_\alpha^\ell$ for coefficients $f_\alpha$ that decay sufficiently fast. Since the exponential function has no real roots, the determinant of the matrix

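$$P_{\phi,\ell} \coloneqq \begin{bmatrix} \phi_{\alpha_1}^\ell(x_1) & \cdots & \phi_{\alpha_N}^\ell(x_1) \\ \vdots & \ddots & \vdots \\ \phi_{\alpha_1}^\ell(x_N) & \cdots & \phi_{\alpha_N}^\ell(x_N) \end{bmatrix} \tag{9}$$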

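satisfies $\det P_{\phi,\ell} = \det P_\Pi \prod_{n=1}^N e^{-\|x_n\|^2/(2\ell^2)} \neq 0$ and $P_{\phi,\ell}$ is hence non-singular. From non-singularity it follows that there are unique weights $w_{\phi,\ell}$ such that $Q(w_{\phi,\ell})[\phi_\alpha^\ell] = L[\phi_\alpha^\ell]$ for every $\alpha$ satisfying $|\alpha| \le m$. The weights solve $P_{\phi,\ell}^T w_{\phi,\ell} = L_{\phi,\ell}$, where $(L_{\phi,\ell})_n = L[\phi_{\alpha_n}^\ell]$. (See [Fasshauer and McCourt, 2012] for an interpolation method based on a closely related basis derived from a Mercer eigendecomposition of the Gaussian kernel and [Karvonen and Särkkä, 2019] for an explicit construction of weights similar to $w_{\phi,\ell}$ in the case $L$ is the Gaussian integral.) The following simple lemmas will be useful.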

###### Lemma 2.1

Suppose that $X$ is $\Pi_m$-unisolvent and $L[\phi_\alpha^\ell] \to L[x^\alpha]$ as $\ell \to \infty$ for every $|\alpha| \le m$. Then there is a constant such that for any .

###### Proof

The assumption and unisolvency of $X$ imply that $w_{\phi,\ell} \to w_\Pi$ as $\ell \to \infty$. Because $L[|p|] < \infty$ for any polynomial $p$, both the weights $w_{\phi,\ell}$ and $w_\Pi$ are finite and the claim follows. ∎
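
The two transposed Vandermonde-type systems above can be compared numerically. The sketch below assumes $d = 1$, integration over $[-1, 1]$ as the functional, and three illustrative points; the functional is evaluated with a 40-point Gauss–Legendre rule, which is only a numerical stand-in for the exact integral. All names are hypothetical.

```python
import numpy as np

# Gauss-Legendre nodes and weights, used only to evaluate L numerically.
NODES, NODE_WEIGHTS = np.polynomial.legendre.leggauss(40)

def L(f):
    """Integration functional on [-1, 1] (an assumption for this example)."""
    return float(np.sum(NODE_WEIGHTS * f(NODES)))

def phi(n, ell):
    """Damped monomial x^n exp(-x^2 / (2 ell^2)); ell = inf gives the monomial x^n."""
    return lambda x: x ** n * np.exp(-x ** 2 / (2.0 * ell ** 2))

def exact_weights(points, ell):
    """Weights reproducing L on phi_0, ..., phi_{N-1}: solve P^T w = (L[phi_n])_n."""
    points = np.asarray(points, dtype=float)
    N = len(points)
    P = np.column_stack([phi(n, ell)(points) for n in range(N)])  # P[i, n] = phi_n(x_i)
    rhs = np.array([L(phi(n, ell)) for n in range(N)])
    return np.linalg.solve(P.T, rhs)

points = [-1.0, 0.0, 1.0]
w_poly = exact_weights(points, np.inf)       # polynomial weights (Simpson's rule here)
w_damped_10 = exact_weights(points, 10.0)
w_damped_100 = exact_weights(points, 100.0)
print(w_poly, w_damped_10, w_damped_100)
```

For these points the polynomial weights are those of Simpson's rule, and the damped weights approach them as the length-scale grows.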

###### Lemma 2.2

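Suppose that $X$ is $\Pi_m$-unisolvent and $L[\phi_\alpha^\ell] \to L[x^\alpha]$ as $\ell \to \infty$ for every $|\alpha| \le m$. If $(w_\ell)_{\ell > 0}$ is a sequence of weights such that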

$$\big| L[\phi_\alpha^\ell] - Q(X, w_\ell)[\phi_\alpha^\ell] \big| \to 0 \quad \text{for every } |\alpha| \le m,$$

then $w_\ell \to w_\Pi$ as $\ell \to \infty$.

###### Proof

We have $P_{\phi,\ell} \to P_\Pi$ as $\ell \to \infty$ and

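$$\| L_\Pi - P_\Pi^T w_\ell \| \le \| L_\Pi - L_{\phi,\ell} \| + \| L_{\phi,\ell} - P_{\phi,\ell}^T w_\ell \| + \| P_{\phi,\ell}^T w_\ell - P_\Pi^T w_\ell \|,$$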

where each of the terms on the right-hand side vanishes as $\ell \to \infty$. Because $L_\Pi = P_\Pi^T w_\Pi$ and $P_\Pi$ is non-singular, we conclude that $w_\ell \to w_\Pi$. ∎

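We are ready to prove the main result of the article for a fixed $\Pi_m$-unisolvent point set $X$ consisting of $N$ distinct points. First, by considering one of the basis functions (6) we show that $|L[\phi_\alpha^\ell] - Q_{\mathcal{H}_\ell}[\phi_\alpha^\ell]| \le \ell^{|\alpha|} \sqrt{\alpha!} \, e_{\mathcal{H}_\ell}(Q_{\mathcal{H}_\ell})$ for every $\alpha \in \mathbb{N}_0^d$. Second, the sub-optimal cubature rule $Q(w_{\phi,\ell})$ defined above can be used, in combination with (5), to establish the upper bound $e_{\mathcal{H}_\ell}(Q_{\mathcal{H}_\ell}) \le C \ell^{-(m+1)}$. These two bounds imply that $|L[\phi_\alpha^\ell] - Q_{\mathcal{H}_\ell}[\phi_\alpha^\ell]| \to 0$ for every $|\alpha| \le m$. If $L[\phi_\alpha^\ell] \to L[x^\alpha]$, Lemma 2.2 then implies that $w_{\mathcal{H}_\ell} \to w_\Pi$.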

###### Theorem 2.3

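Let $N = \dim(\Pi_m)$ for some $m \ge 0$ and $X$ be $\Pi_m$-unisolvent. Suppose that $L[\phi_\alpha^\ell] \to L[x^\alpha]$ as $\ell \to \infty$ for every $\alpha$ such that $|\alpha| \le m$ and that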

$$L\bigg[ \sum_{|\alpha| \ge m+1} \frac{|a_\alpha|}{\ell_0^{|\alpha| - (m+1)} \sqrt{\alpha!}} |x^\alpha| \bigg] \le C_L < \infty \tag{10}$$

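for some $\ell_0 > 0$ and any sequence $(a_\alpha)_{\alpha \in \mathbb{N}_0^d}$ such that $\sum_{\alpha \in \mathbb{N}_0^d} a_\alpha^2 \le 1$. Then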

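$$\lim_{\ell \to \infty} w_{\mathcal{H}_\ell} = w_\Pi \quad \text{and} \quad e_{\mathcal{H}_\ell}(Q_{\mathcal{H}_\ell}) = O(\ell^{-(m+1)}),$$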

where $w_\Pi$ are the weights of the unique polynomial cubature rule such that $Q(w_\Pi)[p] = L[p]$ for every $p \in \Pi_m$.

###### Proof

For every $\alpha \in \mathbb{N}_0^d$ select the function

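$$g_\alpha(x) = \frac{1}{\ell^{|\alpha|} \sqrt{\alpha!}} e^{-\|x\|^2/(2\ell^2)} x^\alpha = \frac{1}{\ell^{|\alpha|} \sqrt{\alpha!}} \phi_\alpha^\ell(x).$$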

From Theorem 1.1 it follows that $\|g_\alpha\|_{\mathcal{H}_\ell} = 1$ since $g_\alpha$ is one of the basis functions (6). Thus, by definition of the worst-case error,

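$$\frac{1}{\ell^{|\alpha|} \sqrt{\alpha!}} \big| L[\phi_\alpha^\ell] - Q_{\mathcal{H}_\ell}[\phi_\alpha^\ell] \big| = \big| L[g_\alpha] - Q_{\mathcal{H}_\ell}[g_\alpha] \big| \le e_{\mathcal{H}_\ell}(Q_{\mathcal{H}_\ell}). \tag{11}$$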

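Next we derive an appropriate upper bound on $e_{\mathcal{H}_\ell}(Q_{\mathcal{H}_\ell})$ by considering the unique sub-optimal cubature rule $Q(w_{\phi,\ell})$ that is exact for every $\phi_\alpha^\ell$ with $|\alpha| \le m$. In the expansion (5) of a function $f$ in $\mathcal{H}_\ell$ we have $L[f_\alpha \phi_\alpha^\ell] - Q(w_{\phi,\ell})[f_\alpha \phi_\alpha^\ell] = 0$ for every term with $|\alpha| \le m$. Consequently, the worst-case error of this rule admits the bound

$$e_{\mathcal{H}_\ell}(Q(w_{\phi,\ell})) \le \sup_{\|f\|_{\mathcal{H}_\ell} \le 1} L\bigg[ \sum_{|\alpha| \ge m+1} |f_\alpha| |\phi_\alpha^\ell| \bigg] + \sup_{\|f\|_{\mathcal{H}_\ell} \le 1} \bigg| Q(w_{\phi,\ell})\bigg[ \sum_{|\alpha| \ge m+1} f_\alpha \phi_\alpha^\ell \bigg] \bigg|,$$

where $f_\alpha$ are the coefficients that define $f$ in Theorem 1.1. A consequence of (5) is that $\|f\|_{\mathcal{H}_\ell} \le 1$ implies $f_\alpha = a_\alpha / (\ell^{|\alpha|} \sqrt{\alpha!})$ for some reals $a_\alpha$ such that $\sum_{\alpha \in \mathbb{N}_0^d} a_\alpha^2 \le 1$. Therefore, for $\ell \ge \ell_0$,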

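$$\begin{aligned}
\sup_{\|f\|_{\mathcal{H}_\ell} \le 1} L\bigg[ \sum_{|\alpha| \ge m+1} |f_\alpha| |\phi_\alpha^\ell| \bigg]
&\le L\bigg[ \sum_{|\alpha| \ge m+1} \frac{|a_\alpha|}{\ell^{|\alpha|} \sqrt{\alpha!}} |\phi_\alpha^\ell| \bigg] \\
&\le \ell^{-(m+1)} L\bigg[ \sum_{|\alpha| \ge m+1} \frac{|a_\alpha|}{\ell_0^{|\alpha| - (m+1)} \sqrt{\alpha!}} |\phi_\alpha^\ell| \bigg] \\
&\le \ell^{-(m+1)} L\bigg[ \sum_{|\alpha| \ge m+1} \frac{|a_\alpha|}{\ell_0^{|\alpha| - (m+1)} \sqrt{\alpha!}} |x^\alpha| \bigg] \\
&\le C_L \ell^{-(m+1)}
\end{aligned}$$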

by assumption (10). Moreover, because

$$\max_{n = 1, \ldots, N} |\phi_\alpha^\ell(x_n)| \le \max_{n = 1, \ldots, N} |x_n^\alpha| \le C_X$$

for some constant $C_X$ and every $\alpha$, we have

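$$\begin{aligned}
\sup_{\|f\|_{\mathcal{H}_\ell} \le 1} \bigg| Q(w_{\phi,\ell})\bigg[ \sum_{|\alpha| \ge m+1} f_\alpha \phi_\alpha^\ell \bigg] \bigg|
&\le \sup_{\|f\|_{\mathcal{H}_\ell} \le 1} \sum_{n=1}^N |w_{\phi,\ell}(n)| \sum_{|\alpha| \ge m+1} |f_\alpha| |\phi_\alpha^\ell(x_n)| \\
&\le \ell^{-(m+1)} \sum_{n=1}^N |w_{\phi,\ell}(n)| \sum_{|\alpha| \ge m+1} \frac{|a_\alpha|}{\ell^{|\alpha| - (m+1)} \sqrt{\alpha!}} |\phi_\alpha^\ell(x_n)| \\
&\le \ell^{-(m+1)} \sum_{n=1}^N |w_{\phi,\ell}(n)| \sum_{|\alpha| \ge m+1} \frac{C_X}{\ell_0^{|\alpha| - (m+1)} \sqrt{\alpha!}} \\
&\le \ell^{-(m+1)} \bigg( \sup_{\ell \ge \ell_0} \sum_{n=1}^N |w_{\phi,\ell}(n)| \bigg) \sum_{|\alpha| \ge m+1} \frac{C_X}{\ell_0^{|\alpha| - (m+1)} \sqrt{\alpha!}} \\
&\eqqcolon C_Q \ell^{-(m+1)},
\end{aligned}$$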

where $C_Q < \infty$ follows from convergence of the last sum and Lemma 2.1. Thus

$$e_{\mathcal{H}_\ell}(Q(w_{\phi,\ell})) \le (C_L + C_Q) \ell^{-(m+1)} \eqqcolon C \ell^{-(m+1)} \tag{12}$$

when $\ell \ge \ell_0$. Since $Q_{\mathcal{H}_\ell}$ is worst-case optimal, we have thus established with (11) and (12) that, for sufficiently large $\ell$,

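$$\frac{1}{\ell^{|\alpha|} \sqrt{\alpha!}} \big| L[\phi_\alpha^\ell] - Q_{\mathcal{H}_\ell}[\phi_\alpha^\ell] \big| \le e_{\mathcal{H}_\ell}(Q_{\mathcal{H}_\ell}) \le e_{\mathcal{H}_\ell}(Q(w_{\phi,\ell})) \le C \ell^{-(m+1)}$$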

for every $\alpha$ such that $|\alpha| \le m$ and a constant $C$ independent of $\ell$. That is,

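$$\big| L[\phi_\alpha^\ell] - Q_{\mathcal{H}_\ell}[\phi_\alpha^\ell] \big| \le C \sqrt{\alpha!} \, \ell^{-(m+1) + |\alpha|} \le C \sqrt{m!} \, \ell^{-1} \to 0 \quad \text{as } \ell \to \infty. \tag{13}$$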

The claim then follows by setting $w_\ell = w_{\mathcal{H}_\ell}$ in Lemma 2.2. ∎
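
The convergence of the weights can be observed numerically in a small case. The sketch below is an illustration under assumptions that are not part of the theorem: $d = 1$, integration over $[-1, 1]$, three points (so $m = 2$), and a handful of hand-picked length-scales. It solves the kernel system (3) directly, which is feasible here only because the length-scales are moderate and the system is small.

```python
import math
import numpy as np

def kernel_weights(points, ell, a=-1.0, b=1.0):
    """Worst-case optimal weights for the Gaussian kernel and integration over [a, b]."""
    points = np.asarray(points, dtype=float)
    K = np.exp(-(points[:, None] - points[None, :]) ** 2 / (2.0 * ell ** 2))
    s = math.sqrt(2.0) * ell
    z = np.array([ell * math.sqrt(math.pi / 2.0)
                  * (math.erf((b - y) / s) - math.erf((a - y) / s)) for y in points])
    return np.linalg.solve(K, z)

def polynomial_weights(points, a=-1.0, b=1.0):
    """Weights exact for 1, x, ..., x^(N-1), from the transposed Vandermonde system."""
    points = np.asarray(points, dtype=float)
    N = len(points)
    P = np.vander(points, N, increasing=True)          # P[i, k] = x_i^k
    moments = np.array([(b ** (k + 1) - a ** (k + 1)) / (k + 1) for k in range(N)])
    return np.linalg.solve(P.T, moments)

points = [-1.0, 0.0, 1.0]
w_poly = polynomial_weights(points)
length_scales = [2.0, 5.0, 10.0, 20.0]
errors = [float(np.max(np.abs(kernel_weights(points, ell) - w_poly)))
          for ell in length_scales]

for ell, err in zip(length_scales, errors):
    print(f"ell = {ell:5.1f}   max |w_kernel - w_poly| = {err:.3e}")
```

For much larger length-scales the kernel matrix becomes too ill-conditioned for a direct solve, so this check should be read as a small-scale illustration rather than a general-purpose method.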

The assumptions of Theorem 2.3 hold, for instance, if the domain $\Omega$ is bounded.

###### Corollary 2.4

Let $N = \dim(\Pi_m)$ for some $m \ge 0$ and $X$ be $\Pi_m$-unisolvent. Suppose that $\Omega$ is bounded. Then

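$$\lim_{\ell \to \infty} w_{\mathcal{H}_\ell} = w_\Pi \quad \text{and} \quad e_{\mathcal{H}_\ell}(Q_{\mathcal{H}_\ell}) = O(\ell^{-(m+1)}),$$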

where $w_\Pi$ are the weights of the unique polynomial cubature rule such that $Q(w_\Pi)[p] = L[p]$ for every $p \in \Pi_m$.

###### Proof

On a bounded domain the convergence $\phi_\alpha^\ell \to x^\alpha$ as $\ell \to \infty$ is uniform. Thus

$$\big| L[x^\alpha] - L[\phi_\alpha^\ell] \big| \le L[1] \sup_{x \in \Omega} |x^\alpha - \phi_\alpha^\ell(x)| \to 0$$

as $\ell \to \infty$ for every $\alpha \in \mathbb{N}_0^d$. Assumption (10) is also satisfied:

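$$L\bigg[ \sum_{|\alpha| \ge m+1} \frac{|a_\alpha|}{\ell_0^{|\alpha| - (m+1)} \sqrt{\alpha!}} |x^\alpha| \bigg] \le L\bigg[ \sum_{|\alpha| \ge m+1} \frac{\beta^\alpha}{\ell_0^{|\alpha| - (m+1)} \sqrt{\alpha!}} \bigg] < \infty,$$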

where for and finiteness follows from the assumption . ∎

However, boundedness of $\Omega$ is not necessary. Consider Gaussian integration:

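$$L[f] = \frac{1}{(2\pi)^{d/2}} \int_{\mathbb{R}^d} f(x) e^{-\|x\|^2/2} \, \mathrm{d}x = \prod_{i=1}^d \bigg[ \frac{1}{\sqrt{2\pi}} \int_{\mathbb{R}} f(x) e^{-x_i^2/2} \, \mathrm{d}x_i \bigg].$$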

When $\alpha = 2\gamma$ for some $\gamma \in \mathbb{N}_0^d$ (otherwise all the integrals below vanish by symmetry),

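$$\begin{aligned}
L[\phi_\alpha^\ell] &= \prod_{i=1}^d \bigg[ \frac{1}{\sqrt{2\pi}} \int_{\mathbb{R}} x_i^{\alpha_i} \exp\bigg( -\frac{(1 + \ell^{-2}) x_i^2}{2} \bigg) \mathrm{d}x_i \bigg] \\
&= (1 + \ell^{-2})^{-d/2} \prod_{i=1}^d \bigg[ \sqrt{\frac{1 + \ell^{-2}}{2\pi}} \int_{\mathbb{R}} x_i^{\alpha_i} \exp\bigg( -\frac{(1 + \ell^{-2}) x_i^2}{2} \bigg) \mathrm{d}x_i \bigg] \\
&= (1 + \ell^{-2})^{-d/2} \prod_{i=1}^d (1 + \ell^{-2})^{-\alpha_i/2} (\alpha_i - 1)!!
\end{aligned}$$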

while

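$$L[x^\alpha] = \prod_{i=1}^d \bigg[ \frac{1}{\sqrt{2\pi}} \int_{\mathbb{R}} x_i^{\alpha_i} e^{-x_i^2/2} \, \mathrm{d}x_i \bigg] = \prod_{i=1}^d (\alpha_i - 1)!!.$$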

Thus $L[\phi_\alpha^\ell] \to L[x^\alpha]$ as $\ell \to \infty$. To verify (10), recall that the absolute moments of the standard Gaussian distribution are

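$$L[|x^\alpha|] = \pi^{-d/2} \prod_{i=1}^d 2^{\alpha_i/2} \, \Gamma\bigg( \frac{\alpha_i + 1}{2} \bigg) = \bigg[ \prod_{\alpha_i \text{ odd}} \pi^{-1/2} 2^{\alpha_i/2} \bigg( \frac{\alpha_i - 1}{2} \bigg)! \bigg] \times \bigg[ \prod_{\alpha_i \text{ even}} (\alpha_i - 1)!! \bigg],$$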

where $\Gamma$ is the Gamma function. Because $(n - 1)!! \le \sqrt{n!}$ for any $n \ge 0$ and

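$$\frac{\pi^{-1/2} 2^{n/2}}{\sqrt{n!}} \bigg( \frac{n - 1}{2} \bigg)! = \frac{\pi^{-1/2} 2^{n/2}}{\sqrt{n!}} \times \frac{(n - 1)!!}{2^{(n-1)/2}} = \sqrt{\frac{2}{\pi}} \, \frac{(n - 1)!!}{\sqrt{n!}} \le \sqrt{\frac{2}{\pi}} \le 1$$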

if $n$ is odd, we have

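$$\frac{L[|x^\alpha|]}{\sqrt{\alpha!}} = \bigg[ \prod_{\alpha_i \text{ odd}} \frac{\pi^{-1/2} 2^{\alpha_i/2}}{\sqrt{\alpha_i!}} \bigg( \frac{\alpha_i - 1}{2} \bigg)! \bigg] \times \bigg[ \prod_{\alpha_i \text{ even}} \frac{(\alpha_i - 1)!!}{\sqrt{\alpha_i!}} \bigg] \le 1.$$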

Thus

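$$L\bigg[ \sum_{|\alpha| \ge m+1} \frac{|a_\alpha|}{\ell_0^{|\alpha| - (m+1)} \sqrt{\alpha!}} |x^\alpha| \bigg] \le \sum_{|\alpha| \ge m+1} \frac{1}{\ell_0^{|\alpha| - (m+1)}} < \infty$$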

if $\ell_0 > 1$.
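
The moment bound used above can be checked numerically in one dimension. The following sketch evaluates the absolute moments of the standard Gaussian distribution through the Gamma function and compares them with $\sqrt{n!}$; the range of degrees and the helper names are arbitrary choices for illustration.

```python
import math

def abs_moment(n):
    """E|x|^n for a standard Gaussian: 2^(n/2) Gamma((n + 1) / 2) / sqrt(pi)."""
    return 2.0 ** (n / 2.0) * math.gamma((n + 1) / 2.0) / math.sqrt(math.pi)

def double_factorial(n):
    """n!! with the convention (-1)!! = 0!! = 1."""
    result = 1
    while n > 1:
        result *= n
        n -= 2
    return result

# Ratio E|x|^n / sqrt(n!), which the argument above bounds by one.
ratios = [abs_moment(n) / math.sqrt(math.factorial(n)) for n in range(31)]
print("largest ratio:", max(ratios))

# For even n the absolute moment equals (n - 1)!!.
even_ok = all(math.isclose(abs_moment(n), double_factorial(n - 1), rel_tol=1e-10)
              for n in range(0, 21, 2))
print("even moments match (n - 1)!!:", even_ok)
```

In several dimensions the corresponding quantity is a product of such one-dimensional ratios, so the same bound carries over.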

## 3 Optimal points in one dimension

Let and  for . In this section we consider quadrature rules whose points are also selected so as to minimise the worst-case error. A kernel quadrature rule $Q_{\mathcal{H}}^* = Q(X_{\mathcal{H}}^*, w_{\mathcal{H}}^*)$ is optimal if

$$e_{\mathcal{H}}(Q_{\mathcal{H}}^*) = \inf_{w \in \mathbb{R}^N, \, X \in \Omega^N} e_{\mathcal{H}}(Q(X, w)).$$

In order to eliminate degrees of freedom in ordering the points we require that the points are in ascending order (i.e., $x_1 \le \cdots \le x_N$). Even though optimal kernel quadrature rules have been studied since the 1970s for the integration functional [Larkin, 1970, Richter, 1970, Richter-Dyn, 1971, Barrar et al., 1974, Bojanov, 1979] (the main results have been recently collated by Oettershagen [2017, Section 5.1]), their theory is still far from complete. As far as we are aware, there are no results guaranteeing existence or uniqueness of these rules, and the most advanced statement seems to be that an optimal kernel quadrature rule, if it exists, has all its points distinct and in the interior of $\Omega$ if the kernel is totally positive (e.g., Gaussian) [Oettershagen, 2017, Corollary 5.13].

In Theorem 3.4 we show that uniqueness of $Q_{\mathcal{H}_\ell}^*$ implies that its increasingly flat limit is $Q(X_{\mathrm{G}}, w_{\mathrm{G}})$, the $N$-point Gaussian quadrature rule for the linear functional $L$. This is the unique quadrature rule that is exact for every polynomial of degree at most $2N - 1$: $Q(X_{\mathrm{G}}, w_{\mathrm{G}})[p] = L[p]$ whenever $p \in \Pi_{2N-1}$. This degree of exactness is maximal; there are no $N$-point quadrature rules exact for all polynomials up to degree $2N$. The most familiar methods of this type are of course the classical Gaussian quadrature rules for numerical integration [Gautschi, 2004, Section 1.4]. For example, the Gauss–Legendre quadrature rule satisfies

$$Q(X_{\mathrm{G}}, w_{\mathrm{G}})[p] = \int_{-1}^{1} p(x) \, \mathrm{d}x$$

for every polynomial $p$ of degree at most $2N - 1$. The points of this rule are the roots of the $N$th degree Legendre polynomial. Theorem 3.4 was conjectured by O'Hagan [1991, Section 3.3] in the form that the optimal kernel quadrature rule has the classical Gauss–Hermite quadrature rule as its increasingly flat limit if the kernel is Gaussian and $L$ is the Gaussian integral. More discussion of this conjecture, but no rigorous proofs, can be found in [Minka, 2000, Section 4].
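
The degree of exactness of the Gauss–Legendre rule is easy to confirm numerically. The sketch below uses NumPy's `leggauss` to obtain the nodes and weights for an illustrative choice $N = 3$ and compares the rule against the exact monomial integrals over $[-1, 1]$.

```python
import numpy as np

def monomial_integral(k):
    """Exact integral of x^k over [-1, 1]."""
    return 0.0 if k % 2 else 2.0 / (k + 1)

N = 3                                                   # illustrative number of points
nodes, weights = np.polynomial.legendre.leggauss(N)

# Quadrature error for the monomials x^0, ..., x^(2N).
errors = [abs(float(np.sum(weights * nodes ** k)) - monomial_integral(k))
          for k in range(2 * N + 1)]

# Values of the N-th degree Legendre polynomial at the nodes (should vanish).
legendre_values = np.polynomial.legendre.legval(nodes, [0.0] * N + [1.0])

for k, err in enumerate(errors):
    print(f"degree {k}: error = {err:.3e}")
```

The errors are at rounding level up to degree $2N - 1$ and clearly non-zero at degree $2N$.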

The proof of Theorem 3.4 is based on a general result by Barrow [1978] on existence and uniqueness of generalised Gaussian quadrature rules. This result replaces the polynomials in a Gaussian quadrature rule with generalised polynomials formed out of functions that constitute an extended Chebyshev system [Karlin and Studden, 1966, Chapter 1]. A collection of functions $\{u_n\}_{n=0}^m$ is an extended Chebyshev system if any non-trivial linear combination of the functions has at most $m$ zeroes, counting multiplicities. That is, if and for , , and , then . Any basis of the space of polynomials of degree at most $m$ is an extended Chebyshev system. Importantly, the functions $\{\phi_n^\ell\}_{n=0}^m$ in (8) are an extended Chebyshev system for any $\ell > 0$. To verify this, note that any $\phi \in \operatorname{span}\{\phi_n^\ell\}_{n=0}^m$ can be written as $\phi(x) = e^{-x^2/(2\ell^2)} p(x)$ for some polynomial $p$ of degree at most $m$ and consequently

$$\phi^{(l)}(x) = e^{-x^2/(2\ell^2)} \bigg( \sum_{r=0}^{l-1} s_r(x) p^{(r)}(x) + p^{(l)}(x) \bigg)$$

for some polynomials $s_r$. From this expression we see that $\phi^{(r)}(x) = 0$ for every $r \le l$ if and only if $p^{(r)}(x) = 0$ for every $r \le l$. Since $p$ can have at most $m$ zeroes, counting multiplicities, it follows that the same is true of $\phi$.
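
As a worked instance of the derivative formula, the first two derivatives of $\phi(x) = e^{-x^2/(2\ell^2)} p(x)$ can be written out explicitly by the product rule:

$$\begin{aligned}
\phi'(x) &= e^{-x^2/(2\ell^2)} \bigg( -\frac{x}{\ell^2} p(x) + p'(x) \bigg), \\
\phi''(x) &= e^{-x^2/(2\ell^2)} \bigg( \bigg( \frac{x^2}{\ell^4} - \frac{1}{\ell^2} \bigg) p(x) - \frac{2x}{\ell^2} p'(x) + p''(x) \bigg).
\end{aligned}$$

If $\phi(x_0) = \phi'(x_0) = 0$, the first line gives $p(x_0) = 0$ and then $p'(x_0) = 0$ because the exponential factor never vanishes; the second line extends this to $p''(x_0) = 0$ when also $\phi''(x_0) = 0$. Multiplicities of zeroes of $\phi$ and $p$ therefore agree.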

###### Theorem 3.1 (Barrow)

Let $\{u_n\}_{n=0}^{2N-1}$ be an extended Chebyshev system and $L$ a positive linear functional on . Then there exist unique points $X$ and positive weights $w$ such that

$$Q(X, w)[u_n] = L[u_n] \quad \text{for every } n = 0, \ldots, 2N - 1.$$

The following lemmas are also needed.

###### Lemma 3.2

Let and suppose that a cubature rule $Q(X, w)$ with non-negative weights satisfies $Q(X, w)[u] = L[u]$ for some positive function $u$ such that $c_l \le u(x) \le c_u$ for all $x \in \Omega$. Then

$$\sum_{n=1}^N w(n) \le L[1] \frac{c_u}{c_l} \quad \text{and} \quad \max_{n = 1, \ldots, N} w(n) \le L[1] \frac{c_u}{c_l}.$$

###### Proof

The claims follow immediately from the inequalities

$$\inf_{x \in \Omega} u(x) \sum_{n=1}^N w(n) \le \sum_{n=1}^N w(n) u(x_n) = L[u] \le L[1] c_u.$$

###### Lemma 3.3

Let be a metric space, a positive constant, and a function. If there is a continuous function such that uniformly as and a unique minimiser for which , then a function such that has .

###### Proof

The inequality shows that since by assumption and by uniformity of the convergence . Because is continuous, non-negative, and has a unique minimiser , this implies that . ∎

###### Theorem 3.4

Suppose that for . If for every there exists a unique optimal kernel quadrature rule $Q_{\mathcal{H}_\ell}^*$, then its points and weights converge to those of the $N$-point Gaussian quadrature rule for $L$:

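$$\lim_{\ell \to \infty} X_{\mathcal{H}_\ell}^* = X_{\mathrm{G}} \quad \text{and} \quad \lim_{\ell \to \infty} w_{\mathcal{H}_\ell}^* = w_{\mathrm{G}}.$$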

Moreover, .

###### Proof

In a manner identical to the proof of Theorem 2.3, we establish the lower bound

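$$\frac{1}{\ell^n \sqrt{n!}} \big| L[\phi_n^\ell] - Q_{\mathcal{H}_\ell}^*[\phi_n^\ell] \big| \le e_{\mathcal{H}_\ell}(Q_{\mathcal{H}_\ell}^*)$$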

that holds for every . Because are an extended Chebyshev system, Theorem 3.1 guarantees the existence of a unique -point quadrature rule such that for every . The points of this rule are distinct and lie inside and the weights positive. We can then replicate the rest of the proof of Theorem 2.3 in one dimension but with and Lemma 2.1 replaced with Lemma 3.2 (applied to the function ) to show that, for sufficiently large <