Localized bases for kernel spaces on the unit sphere ^{†}^{†}thanks: 2000 Mathematics Subject Classification:41a05, 41a30, 41a63, 65d05 ^{†}^{†}thanks: Key words:interpolation, thinplate splines, sphere, kernel approximation
Abstract
Approximation/interpolation from spaces of positive definite or conditionally positive definite kernels is an increasingly popular tool for the analysis and synthesis of scattered data, and is central to many meshless methods. For a set of scattered sites, the standard basis for such a space utilizes globally supported kernels; computing with it is prohibitively expensive for large . Easily computable, welllocalized bases, with “smallfootprint” basis elements – i.e., elements using only a small number of kernels – have been unavailable. Working on , with focus on the restricted surface spline kernels (e.g. the thinplate splines restricted to the sphere), we construct easily computable, spatially welllocalized, smallfootprint, robust bases for the associated kernel spaces. Our theory predicts that each element of the local basis is constructed by using a combination of only kernels, which makes the construction computationally cheap. We prove that the new basis is stable and satisfies polynomial decay estimates that are stationary with respect to the density of the data sites, and we present a quasiinterpolation scheme that provides optimal approximation orders. Although our focus is on , much of the theory applies to other manifolds – , the rotation group, and so on. Finally, we construct algorithms to implement these schemes and use them to conduct numerical experiments, which validate our theory for interpolation problems on involving over one hundred fifty thousand data sites.
1 Introduction
Approximation/interpolation with positive definite or conditionally positive definite kernels is an increasingly popular tool for analyzing and synthesizing of scattered data and is central to many meshless methods. The main difficulty in using this tool is that welllocalized bases with “smallfootprint” elements – i.e., elements using only a small number of kernels – have been unavailable. With this in mind, we have two main goals for this paper.
The first is the theoretical development of smallfootprint bases that are welllocalized spatially, for a variety of kernels. For important classes of kernels on , the theory itself predicts that a basis element requires only kernels, where is the number of data sites.
Previous numerical experiments on data sets, with on the order of a thousand, used adhoc techniques to determine the number of kernels per basis element. The predictions of our theory, on the other hand, have been verified numerically on for data sets with over a hundred thousand sites.
Our second goal is to show how to easily and efficiently compute these smallfootprint, welllocalized, robust bases for spaces associated with restricted surfacespline kernels on the sphere . The kernels in question are spherical basis functions having the form
(1) 
for (cf. [17, Eqn. 3.3]). The kernel spaces are denoted – these are finite dimensional spaces of functions obtained as linear combinations of , sampled at some (finite) set of nodes , plus a spherical polynomial of degree , i.e. . The coefficients involved satisfy the simple side conditions given in 5.
The Lagrange functions , which interpolate cardinal sequences: , form a basis for . Recently, it has been shown in [11], for restricted surface splines, as well as many other kernels, that these functions decay extremely rapidly away from . Thus, forms a basis that is theoretically quite good (sufficient to demonstrate that the Lebesgue constant is uniformly bounded, among many other things). However, determining a Lagrange basis function generally requires solving a full linear system with at least unknowns, so working with this basis directly is computationally expensive. In this paper we consider an alternative basis: one that shares many of the nice properties of the Lagrange basis, yet its construction is computationally cheap.
Here is what we would desire in an easily computed, robust basis for . Each basis function should be highly localized with respect to the mesh norm of . Moreover, each should have a nearly stationary construction. By this we mean that each basis element is of the form , where the coefficients and the degree polynomial are completely determined by and a small subset of centers Specifically, we wish to satisfy the following requirements:
i)  
ii) 
where the number of points influencing each basis function is constant or slowly growing with , and the function decays rapidly – at an exponential rate or at least at a fast polynomial rate . The Bspline basis, constructed from the family of truncated power functions (i.e., using in place of ), is a model solution to the problem we consider.
Main results. The solution we present is to consider a basis of “local Lagrange” functions, which are constructed below in Section 3. It has the following properties:

Numerical Stability. For any , one can construct a numerically stable basis with decay .

Small footprint. Each basis function is determined by a relatively small set of centers: , where the constant is proportional to the square of the rate of decay : .

stability. The basis is stable in : sequence norms of the coefficients are comparable to norms of the expansion .

Nearbest approximation. For sufficiently large , the operator provides nearbest approximation.
Preconditioners. Over the years practical implementation of kernel approximation has progressed despite the illconditioning of kernel bases. This has happened with the help of clever numerical techniques like multipole methods and other fast methods of evaluation [1, 4, 12] and often with the help of preconditioners [3, 6, 14, 23]. Many results already exist in the RBF literature concerning preconditioners and “better” bases. For a good list of references and further discussion, see [5]. Several of these papers use local Lagrange functions in their efforts to efficiently construct interpolants, but the number of points chosen to localize the Lagrange functions are ad hoc and seem to be based on experimental evidence. For example, Faul and Powell, in [7], devise an algorithm which converges to a given RBF interpolant that is based on local Lagrange interpolants using about thirty nearby centers. Beatson–Cherrie–Mouat, in [2], use fifty local centers (p. 260, Table 1) in their construction along with a few “far away” points to control the growth of the local interpolant at a distance from the center. In other work, Ling and Kansa [14] and coworkers have studied approximate cardinal basis functions based on solving least squares problems.
An offshoot of our results is a strategy for selecting centers for preconditioning (as in [7] and [2]) that scales correctly with the total number of centers . We demonstrate the power of this approach in Section 7, where the local basis is used to successfully precondition kernel interpolation problems varying in size by several orders of magnitude.
Organization. We now sketch the outline of the remainder of the article. In Section 2 we give some necessary background: in Section 2.1 we treat analysis on spheres and in Section 2.2 we treat conditionally positive definite kernels. Section 3 presents the construction of the local Lagrange basis. Much of the remainder of the article is devoted to proving that this basis has the desired properties mentioned above. However, doing this will first require a thorough understanding of the (full) Lagrange basis , which we study in detail in Sections 4 and 5.
In Section 4 we consider the full Lagrange basis: the stable, local bases constructed in [11]. We first numerically exhibit the exponential decay of these functions away from their associated center. A subsequent experiment shows that the coefficients in the expansion have similar rapid decay. These numerical observations confirm the theory in Section 5, where it is proven that the Lagrange coefficients indeed decay quickly and stationarily with respect to as moves away from .
Section 6 treats the main arguments of the paper. This occurs roughly in three stages.

The decay of the Lagrange function coefficients indicates that truncated Lagrange functions will be satisfactory. But simply truncating causes the function to fall outside of the space (moment conditions for the coefficients are no longer satisfied), so it is necessary to adjust the coefficients slightly. The cost of this readjustment is related to the smallest eigenvalue of a certain Gram matrix: a symmetric positive definite matrix that depends on the set – this is discussed in Section 6.1.

Section 6.2 estimates the minimal eigenvalue of the Gram matrix, which is shown to be quite small compared to tail of the coefficients. Although the resulting truncated, adjusted Lagrange function decays at a fast polynomial rate and requires terms, it is still unsuitable because its construction requires the full expansion.
In Section 7 we demonstrate the effectiveness of using the local basis to build preconditioners for large kernel interpolation problems.
Generalization to other manifolds/kernels. Finally, we note that many of the results here can be demonstrated in far greater generality with minimal effort: in particular, most results hold for Sobolev kernels on manifolds (as considered in [10]) and for many kernels of polyharmonic and related type on two point homogeneous spaces (as considered in [11]). To simplify our exposition, we focus almost entirely on surface splines on . We will include remarks discussing the generalizations as we go along.
2 Background
2.1 The sphere
We denote by the unit sphere in , and by we denote Lebesgue measure. The distance between two points, and , on the sphere is written . The basic neighborhood is the spherical ‘cap’ . The volume of a spherical cap is .
Throughout this article, is assumed to be a finite set of distinct nodes on and we denote the number of elements in by . The mesh norm or fill distance, measures the density of in . The separation radius is , where , and the mesh ratio is .
The LaplaceBeltrami operator and spherical coordinates. Given a north pole on , we will use the longitude and the colatitude as coordinates. The Laplace–Beltrami operator is then given by
For each , the eigenvalues of the negative of the LaplaceBeltrami operator, , have the form ; these have multiplicity . For each fixed , the eigenspace has an orthonormal basis of eigenfunctions, , the spherical harmonics of degree . The space of spherical harmonics of degree is and has dimension . These and are the basic objects of Fourier analysis on the sphere. In order to simplify notation, we often denote a generic basis for as . We deviate from this only when a specific basis of spherical harmonics is required.
Covariant derivatives. The second kind of operators are the covariant derivative operators. These play a secondary role in this article – they are a construction useful for defining smoothness spaces and for proving results about surface splines, but they play no role in the actual implementation of the algorithms. We present some useful overview here – a more detailed discussion, where the relevant concepts are developed for Riemannian manifolds (including ) is given in [10, Section 2]. A more complete discussion is not warranted here.
We consider tensor valued operators , where each entry is a differential operator of order – each index or , corresponding to the variables and . They transform tensorially and are the “invariant” partial derivatives of order . For , is the ordinary gradient, but when , is the invariant Hessian. In spherical coordinates, is a rank two covariant tensor with four components:
For a discussion of covariant derivatives on a general compact, manifold, we refer the reader to [10, (2.7)].
Of special importance is the fact that at each point there is a natural inner product on the space of tensors. The inner product employs the inverse of the metric tensor. For the sphere this is a diagonal matrix with entries: and and . The general form of the inner product for tensors is
This gives rise to a notion of the pointwise size of the th derivative at :
Smoothness spaces. This allows us to construct Sobolev spaces. For each and each measurable subset , the Sobolev norm is
2.2 Conditionally positive definite kernels and interpolation
Many of the useful computational properties of restricted surface splines stem from the fact that they are conditionally positive definite. We treat the topic of conditional positive definiteness for a general kernel .
Definition 2.1.
A kernel is conditionally positive definite with respect to a finite dimensional space if, for any set of distinct centers , the matrix is positive definite on the subspace of all vectors satisfying for .
Let be a complete orthonormal basis of continuous functions. Consider a kernel
(2) 
with coefficients so that all but finitely many coefficients are positive. Then is conditionally positive definite with respect to the (finite dimensional) space where Indeed,
provided for (i.e., satisfying ).
Conditionally positive definite kernels are important for the following interpolation problem. Suppose is a set of nodes on the sphere, is some target function, and are the samples of at the nodes in . We look for a function that interpolates this data from the space
Provided is unisolvent with respect to (meaning that for all implies that for any ), the unique interpolant from can be written
where the expansion coefficients satisfy the (nonsingular) linear system of equations:
(3) 
where , , and , , . This interpolant plays a dual role as the minimizer of the seminorm induced from the “native space” semiinner product
(4) 
Namely, it is the interpolant to having minimal seminorm .
3 Constructing the local Lagrange basis
The restricted surface splines (see (1)) are conditionally positive definite with respect to the space of spherical harmonics of degree up to , i.e. . The finite dimensional spaces associated with these kernels are denoted as in the previous section:
(5)  
The goal of this section is to provide an easily constructed, robust basis for . The fundamental idea behind building this basis is to associate with each , a new basis function that interpolates over a relatively small set of nodes a function that is cardinal at .
Specifically, let be the nearest neighbors to the node , including the node ; see Figure 1 for an illustration. Then the new basis function associated with is given by
(6) 
where are a basis for the spherical harmonics of degree . The coefficients and are determined from the cardinal conditions
(7) 
These coefficients can be determined by solving the (small) linear system
(8) 
where represents the cardinal data and the entries of the matrix follow from (3). We call a local Lagrange function about .
The new basis for will consist of the collection of all the local Lagrange functions for the nodes in . It will be shown in Section 6.3 that by choosing the number of nearest neighbors to each as will give a basis with sufficient locality. The choice of is related to the polynomial rate of decay of away from its center and a priori estimates are given for in Section 6.3. However, in practice it will be sufficient to choose by tuning it appropriately to get the desired rate of decay.
The exact details of the algorithm for constructing this basis then proceed as follows: For each

Find the nearest neighbors to , .
We note each set can be determined in operations by using a KDtree algorithm for sorting and searching through the nodes . After the initial construction of the KDtree, which requires , the construction of all the sets thus takes operations.
Before continuing, we note that our main results, given in Theorem 6.5 and its corollaries, depend heavily on properties that this local Lagrange basis inherits from the full Lagrange basis . Thus, much of what follows is spent on developing a working understanding of the full Lagrange basis and its connections to the local Lagrange basis. Even though the local Lagrange basis is the focus of our work, we will delay any further mention of until Section 6.3.
4 The full Lagrange basis: numerical observations
In this section we numerically examine a full Lagrange basis function and its associated coefficients for the kernel , the second order restricted surface spline (also known as the thin plate spline) on . First, we demonstrate numerically that decays exponentially away from its center. Secondly, we provide the initial evidence that the Lagrange coefficients decay at roughly the same rate, which is proved later in Theorem 5.3.
The full Lagrange function centered at takes the form
where is a degree spherical harmonic. In this example, we use the “minimal energy points” of Womersley for the sphere – these are described and distributed at the website [25].^{1}^{1}1These point sets are used as benchmarks: each set of centers has a nearly identical mesh ratio, and the important geometric properties (e.g., fill distance and separation distance) are explicitly documented. Because of the quasiuniformity of the minimal energy point sets, it is sufficient to consider the Lagrange function centered at the north pole .
Figure 2 displays the maximal colatitudinal values^{2}^{2}2The function is evaluated on a set of points with equispaced longitudes and equispaced colatitudes . of . Until a terminal value of roughly , we clearly observe the exponential decay of the Lagrange function, which follows
(9) 
(this “plateau” at is caused by roundoff error – see Figure 4). The estimate (9) has in fact been proven in [11, Theorem 5.3], where this and other analytic properties of bases for were studied in detail. By fitting a line to the data in Figure 2 where the exponential decay is evident, one can estimate the constants and , which in this case are quite reasonable. For example, the value of , which measures the rate of exponential decay, is observed to be close to (see Table 1).
We can visualize the decay of the corresponding coefficients in the same way. We again take the Lagrange function centered at the north pole: for each , the coefficient in the expansion is plotted with horizontal coordinate . The results for sets of centers of size and are given in Figure 3. The exponential decay seems to follow
Indeed, this is established later in Theorem 5.3. As before, we can estimate the constants and for the decay of the coefficients. Comparing Figures 2 and 3, we note that the coefficient plot is shifted vertically. This is a consequence of the factor of in the estimate (15) below. Table 1 gives estimates for the constants and , along with the constants involved in the decay of the Lagrange functions.
400  0.1136  1.2930  1.1119  0.8382  1.0997  0.5402 

900  0.0874  1.5302  1.3556  1.0982  1.3445  0.7554 
1600  0.0656  1.5333  1.3513  1.2170  1.3216  0.5946 
2500  0.0522  1.5278  1.3345  0.9618  1.3117  0.5494 
5041  0.0365  1.5304  1.3395  1.1080  1.3158  0.6188 
10000  0.0260  1.5421  1.3645  1.1934  1.3369  0.7291 
The perceived plateau present in the Lagrange function values as well as the coefficients shown in Figures 2 and 3 is due purely to roundoff error related to the conditioning of kernel collocation and evaluation matrices. These results were produced using doubleprecision (approximately 16 digits) floating point arithmetic. To illustrate this point, we plot the decay rate of the Lagrange coefficients for the 900 and 1600 point node sets as computed using highprecision (40 digits) floating point arithmetic in Figure 4. The figure clearly shows that the exponential decay does not plateau, but continues as the theory predicts (see Theorem 5.3).
5 Coefficients of the full Lagrange functions
In this section we give theoretical results for the coefficients in the kernel expansion of Lagrange functions. In the first part we give a formula relating the size of coefficients to native space inner products of the Lagrange functions themselves (this is Proposition 5.1). We then obtain estimates for the restricted surface splines on , demonstrating the rapid, stationary decay of these coefficients.
5.1 Interpolation with conditionally positive definite kernels
In this section we demonstrate that the Lagrange function coefficients can be expressed as a certain kind of inner product of different Lagrange functions and . Because this is a fundamental result, we work in generality in this subsection: the kernels we consider here are conditionally positive of the type considered in Section 2.2.
When – meaning that they have the expansion and with coefficients for – then the semiinner product is
(This follows directly from the definition (4) coupled with the observation that for , and .) We can use this expression of the inner product to investigate the kernel expansion of the Lagrange function.
Proposition 5.1.
Let be a conditionally positive definite kernel with respect to the space , and let be unisolvent for . Then (the Lagrange function centered at ) has the kernel expansion with coefficients
Proof.
Select two centers with corresponding Lagrange functions and . Denote the collocation and auxiliary matrices, introduced in Section 2.2, by and . Because and are both orthogonal to , we have
Now define to be the orthogonal projection onto the subspace of samples of on and let be its complement. Then for any data , (3) yields coefficient vectors and satisfying and , hence . Because is also an orthogonal projector, and therefore selfadjoint, it follows that
In the last line, we have introduced the sequence for which which implies that . Using once more the fact that is selfadjoint, and that is in its range, we have
and the lemma follows. ∎
The next result involves estimating the norms and , where and are as in (3). It will be useful later, when we discuss local Lagrange functions. The notation is the same as that used in the proof above. In addition, because is a conditionally positive definite kernel for , the matrix is positive definite on the orthogonal complement of the range of . We will let be the minimum eigenvalue of this matrix; that is,
Proposition 5.2.
Proof.
From (3) and the fact that projects onto the orthogonal complement of the range of , we have that and that . Consequently,
The bound on follows immediately from this and the estimate . To get the bound on , note that and, hence, that
We also have that . However, , which implies that
Next, note that the following inequalities hold: , , and . Applying these to the inequality
then yields the desired bound on , completing the proof. ∎
5.2 Estimating Lagrange function coefficients
In [11], it has been shown that Lagrange functions for restricted surface splines decay exponentially fast away from the center. We can use these decay estimates in conjunction with Proposition 5.1 to estimate the decay of the coefficients .
Recall that the eigenvalues of are . Let . The kernel has the expansion
where, for , , with [17, Eqn. 3.3]. From the expansion, one sees that is conditionally positive definite with respect to . Kernels such as are said to be of polyharmonic or related type; they have been studied in [11]. The kernel acts as the Green’s function for the elliptic operator (cf. [11, Example 3.3]), in the sense that
where is the orthogonal projection of onto .
The native space “inner product” on subsets. In [11] it was shown that for any , the operator (which involves the adjoint – with respect to the inner product – of the covariant derivative operator which was introduced in Section 2.1) can be expressed as with . Consequently, any operator of the form can be expressed as with and viceversa:
(10) 
Because , it follows that , with , and so the native space semiinner product, introduced in (4), can be expressed as
with and are the appropriate constants guaranteed by (5.2). The latter expression allows us to extend naturally the native space inner product to measurable subsets of . Namely,
This has the desirable property of set additivity: for sets and with , we have Unfortunately, since some of the coefficients may be negative, and may assume negative values for some : in other words, the bilinear form is only an indefinite inner product.
A CauchySchwarz type inequality. When restricted to the cone of functions in having a sufficiently dense set of zeros, the quadratic form is positive definite. We now briefly discuss this.
When has Lipschitz boundary and has many zeros, we can relate the quadratic form to a Sobolev norm . Arguing as in [11, (4.2)], we see that
If on a set with with determined only by the boundary of (specifically the radius and aperture of an interior cone condition satisfied by ), Theorem A.11 of [11] guarantees that with depending only on the order and the roughness of the boundary (in this case, depending only on the aperture of the interior cone condition). Thus, by choosing , where satisfies the two conditions
(11) 
we have