
Preconditioned Conjugate Gradients, Radial Basis Functions and Toeplitz Matrices

Abstract

Radial basis functions provide highly useful and flexible interpolants to multivariate functions. Further, they are beginning to be used in the numerical solution of partial differential equations. Unfortunately, their construction requires the solution of a dense linear system. Therefore much attention has been given to iterative methods. In this paper, we present a highly efficient preconditioner for the conjugate gradient solution of the interpolation equations generated by gridded data. Thus our method applies to the corresponding Toeplitz matrices. The number of iterations required to achieve a given tolerance is independent of the number of variables.

B. J. C. Baxter
Department of Mathematics, Imperial College
London SW7 2BZ, England
b.baxter@ic.ac.uk
www.ma.ic.ac.uk/baxter

1 Introduction

A radial basis function approximation has the form

$$s(x) = \sum_{j=1}^{n} y_j\,\varphi(\|x - x_j\|), \qquad x \in \mathbb{R}^d,$$

where $\varphi$ is some given function, $y_1, \ldots, y_n$ are real coefficients, and the centres $x_1, \ldots, x_n$ are points in $\mathbb{R}^d$; the norm will be Euclidean throughout this study. For a wide class of functions $\varphi$, it is known that the interpolation matrix

$$A = \{\varphi(\|x_j - x_k\|)\}_{j,k=1}^{n}$$

is invertible. This matrix is typically full, a fact which has encouraged the study of iterative methods. For example, highly promising results have been published on the use of radial basis functions in collocation and Galerkin methods for the numerical solution of partial differential equations (see \citeasnoun{FrankeSchaback} and \citeasnoun{Wendland}), but direct solution limits their applicability to fairly small problems. The use of the preconditioned conjugate gradient algorithm was pioneered by \citeasnoun{DynLevinRippa}, and some stunning results for scattered data were presented recently in \citeasnoun{FaulPowell}, although the rapid convergence described there is not fully understood. Therefore we study the highly structured case when the data form a finite regular grid. The conjugate gradient algorithm has been applied to Toeplitz matrices with some success; see, for instance, \citeasnoun{Chan}. However, since our matrices are usually not positive definite and often possess elements that grow away from the diagonal, the preconditioners of \citeasnoun{Chan} are not suitable. Nevertheless, our matrices have the redeeming property that their inverses are rather more tractable. Specifically, the detailed study of the spectra of the associated Toeplitz operators presented in Baxter (1992) and Baxter (1994) allows us to create highly efficient preconditioners by inverting relatively small finite sections of the bi-infinite symmetric Toeplitz operator, and this construction is also easily understood via Toeplitz theory.

Let $n$ be a positive integer and let $A$ be the $n \times n$ symmetric Toeplitz matrix given by

$$A = \{\varphi(j-k)\}_{j,k=1}^{n}, \qquad\qquad (1)$$

where $\varphi$ is either a Gaussian ($\varphi(x) = e^{-\lambda x^2}$ for some positive constant $\lambda$) or a multiquadric ($\varphi(x) = \sqrt{x^2 + c^2}$ for some real constant $c$). In this paper we construct efficient preconditioners for the conjugate gradient solution of the linear system

$$Ax = f \qquad\qquad (2)$$

when $\varphi$ is a Gaussian, or the augmented linear system

$$Ax + te = f, \qquad\qquad (3)$$
$$e^T x = 0, \qquad\qquad (4)$$

when $\varphi$ is a multiquadric. Here $f \in \mathbb{R}^n$, $t \in \mathbb{R}$, and $e = [1, 1, \ldots, 1]^T \in \mathbb{R}^n$. Section 2 describes the construction for the Gaussian and Section 3 deals with the multiquadric. Of course, we exploit the Toeplitz structure of $A$ to perform a matrix-vector multiplication in $\mathcal{O}(n \log n)$ operations whilst storing only $\mathcal{O}(n)$ real numbers. Further, we shall see numerically that the number of iterations required to achieve a solution of (2) or (4) to within a given tolerance is independent of $n$. The Matlab software used can be obtained from my homepage.

Our method applies to many other radial basis functions, such as the inverse multiquadric ($\varphi(x) = (x^2 + c^2)^{-1/2}$) and the thin plate spline ($\varphi(x) = x^2 \log |x|$). However, we concentrate on the Gaussian and the multiquadric because they exhibit most of the important features of our approach in a concrete setting. Similarly, we treat the one-dimensional problem merely to avoid complication; the multidimensional case is a rather slight generalization of this work. Let us remark that the analogue of (1) is the operator

$$A = \{\varphi(\|j - k\|)\}_{j, k \in I}, \qquad I = \{1, 2, \ldots, m\}^d, \qquad\qquad (5)$$

and we shall still call $A$ a Toeplitz matrix. Moreover, the matrix-vector multiplication

$$(Ax)_j = \sum_{k \in I} \varphi(\|j - k\|)\, x_k, \qquad j \in I, \qquad\qquad (6)$$

where $\|\cdot\|$ is the Euclidean norm and $x \in \mathbb{R}^N$, can still be calculated in $\mathcal{O}(N \log N)$ operations, where $N = m^d$, requiring $\mathcal{O}(N)$ real numbers to be stored. This trick is a simple extension of the Toeplitz matrix-vector multiplication method used when $d = 1$.
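The univariate version of this trick is the familiar circulant embedding, and it may help to see it concretely. The following Python sketch (our illustration, not part of the paper; the parameter values are arbitrary) multiplies the Gaussian Toeplitz matrix of (1) by a vector using the FFT.

\begin{verbatim}
import numpy as np

def toeplitz_matvec(col, x):
    """Multiply the symmetric Toeplitz matrix A[j,k] = col[|j-k|]
    by a vector in O(n log n) operations, storing only O(n) numbers.
    A is embedded in a circulant matrix of order 2n, which the
    discrete Fourier transform diagonalizes."""
    n = len(col)
    c = np.concatenate([col, [0.0], col[-1:0:-1]])   # circulant column
    xc = np.concatenate([x, np.zeros(n)])            # zero-padded vector
    return np.fft.ifft(np.fft.fft(c) * np.fft.fft(xc))[:n].real

# Check against the dense Gaussian Toeplitz matrix of (1).
n, lam = 8, 0.5
col = np.exp(-lam * np.arange(n) ** 2)
x = np.random.randn(n)
A = col[np.abs(np.arange(n)[:, None] - np.arange(n))]
assert np.allclose(toeplitz_matvec(col, x), A @ x)
\end{verbatim}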

2 The Gaussian

It is well-known that the Gaussian generates a positive definite interpolation matrix, and its functional decay is so rapid that preconditioning the conjugate gradient algorithm is not necessary. However, it provides a useful model problem that we shall describe here before developing the ideas further in the following section.

Our treatment of the preconditioned conjugate gradient (PCG) method follows Section 10.3 of Golub and Van Loan (1989), and we begin with a general description. We let $n$ be a positive integer and $A \in \mathbb{R}^{n \times n}$ be an arbitrary symmetric positive definite matrix. For any nonsingular symmetric matrix $C \in \mathbb{R}^{n \times n}$ and any $f \in \mathbb{R}^n$, we can use the following iteration, which is simply the conjugate gradient method applied to the equivalent system $(C^{-1}AC^{-1})(Cx) = C^{-1}f$, to solve the linear system $Ax = f$.

Algorithm 2.1

Choose any $\hat{x}_0$ in $\mathbb{R}^n$. Set $\hat{r}_0 = C^{-1}f - (C^{-1}AC^{-1})\hat{x}_0$ and $\hat{d}_0 = \hat{r}_0$.

For $k = 0, 1, 2, \ldots$ do begin

  $\alpha_k = \hat{r}_k^T \hat{r}_k \big/ \hat{d}_k^T (C^{-1}AC^{-1}) \hat{d}_k$
  $\hat{x}_{k+1} = \hat{x}_k + \alpha_k \hat{d}_k$
  $\hat{r}_{k+1} = \hat{r}_k - \alpha_k (C^{-1}AC^{-1}) \hat{d}_k$
  $\beta_k = \hat{r}_{k+1}^T \hat{r}_{k+1} \big/ \hat{r}_k^T \hat{r}_k$
  $\hat{d}_{k+1} = \hat{r}_{k+1} + \beta_k \hat{d}_k$
 Stop if $\hat{r}_{k+1} = 0$ or $\|\hat{r}_{k+1}\|$ is sufficiently small.
end.

In order to simplify Algorithm 2.1, define

$$x_k = C^{-1}\hat{x}_k, \qquad r_k = C\hat{r}_k, \qquad d_k = C^{-1}\hat{d}_k, \qquad P = C^{-2}. \qquad\qquad (7)$$

Substituting in Algorithm 2.1 we obtain the following method.

Algorithm 2.2

Choose any $x_0$ in $\mathbb{R}^n$. Set $r_0 = f - Ax_0$, $d_0 = Pr_0$.

For $k = 0, 1, 2, \ldots$ do begin

  $\alpha_k = r_k^T P r_k \big/ d_k^T A d_k$
  $x_{k+1} = x_k + \alpha_k d_k$
  $r_{k+1} = r_k - \alpha_k A d_k$
  $\beta_k = r_{k+1}^T P r_{k+1} \big/ r_k^T P r_k$
  $d_{k+1} = P r_{k+1} + \beta_k d_k$
 Stop if $r_{k+1} = 0$ or $\|r_{k+1}\|$ is sufficiently small.
end.
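For concreteness, here is a minimal Python sketch of Algorithm 2.2 (our illustration; the paper's own software is in Matlab). Note that the preconditioner $P$ is applied as a plain matrix-vector product, because in this paper $P$ approximates $A^{-1}$ directly.

\begin{verbatim}
import numpy as np

def pcg(apply_A, apply_P, f, tol=1e-10, max_iter=200):
    """Algorithm 2.2: apply_A(v) and apply_P(v) return A v and P v,
    where A is symmetric positive definite and P is the symmetric
    positive definite preconditioner (an approximate inverse of A)."""
    x = np.zeros_like(f)
    r = f - apply_A(x)               # residual r_0
    d = apply_P(r)                   # search direction d_0 = P r_0
    rPr = r @ d
    for _ in range(max_iter):
        Ad = apply_A(d)
        alpha = rPr / (d @ Ad)
        x += alpha * d
        r -= alpha * Ad
        if np.linalg.norm(r) < tol:  # stop: residual sufficiently small
            break
        z = apply_P(r)
        rPr_new = r @ z
        d = z + (rPr_new / rPr) * d
        rPr = rPr_new
    return x
\end{verbatim}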

It is Algorithm 2.2 that we shall consider as our PCG method in this section, and we shall call $P$ the preconditioner. We see that the only restriction on $P$ is that it must be a symmetric positive definite matrix, but we observe that the spectrum of $PA$ should consist of a small number of clusters, preferably one cluster concentrated at unity. At this point, we also mention that the condition number of $PA$ is not a reliable guide to the efficacy of our preconditioner. For example, consider the two cases when (i) $PA$ has only two distinct eigenvalues, and (ii) $PA$ has eigenvalues uniformly distributed in an interval bounded away from zero. The former can have the larger condition number but, in exact arithmetic, the answer will be achieved in two steps, whereas the number of steps can be as high as $n$ in the latter case. Thus the term ``preconditioner'' is sometimes inappropriate, although its usage has become standard.
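A small numerical illustration of this point (ours, not the paper's): unpreconditioned conjugate gradients applied to a matrix with two eigenvalue clusters converges in essentially two steps, despite a large condition number, whereas a uniformly spread spectrum with condition number 2 needs noticeably more.

\begin{verbatim}
import numpy as np

def cg_steps(A, f, tol=1e-8):
    """Count conjugate gradient steps until the residual norm < tol."""
    x = np.zeros_like(f)
    r = f - A @ x
    d = r.copy()
    for k in range(1, len(f) + 1):
        Ad = A @ d
        alpha = (r @ r) / (d @ Ad)
        x += alpha * d
        r_new = r - alpha * Ad
        if np.linalg.norm(r_new) < tol:
            return k
        d = r_new + ((r_new @ r_new) / (r @ r)) * d
        r = r_new
    return len(f)

n = 50
rng = np.random.default_rng(1)
Q, _ = np.linalg.qr(rng.standard_normal((n, n)))
f = rng.standard_normal(n)
evals_2 = np.where(np.arange(n) < n // 2, 1e-4, 1.0)  # two clusters
evals_u = np.linspace(1.0, 2.0, n)                    # uniform spread
print(cg_steps(Q @ np.diag(evals_2) @ Q.T, f))  # 2 steps, condition 1e4
print(cg_steps(Q @ np.diag(evals_u) @ Q.T, f))  # more steps, condition 2
\end{verbatim}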

In this paper we concentrate on preconditioners for the Toeplitz matrices generated by radial basis function interpolation on a (finite) regular grid. Accordingly, we let $A$ be the matrix of (1) and let $\varphi(x) = e^{-\lambda x^2}$. Thus $A$ is positive definite and can be embedded in the bi-infinite symmetric Toeplitz matrix

$$\mathcal{A} = \{\varphi(j-k)\}_{j,k \in \mathbb{Z}}. \qquad\qquad (8)$$

The classical theory of Toeplitz operators (see, for instance, Grenander and Szegő (1984)) and the work of Baxter (1994) provide the relations

$$\operatorname{spec} A \subset \operatorname{spec} \mathcal{A} = \{\sigma(\theta) : \theta \in [0, 2\pi]\}, \qquad\qquad (9)$$

where $\sigma$ is the symbol function

$$\sigma(\theta) = \sum_{k \in \mathbb{Z}} \varphi(k) e^{ik\theta}, \qquad\qquad (10)$$

and $\operatorname{spec}$ denotes the spectrum of the operator. Further, Theorem 9 of Buhmann and Micchelli (1991) allows us to conclude that, for any fixed integers $j$ and $k$, we have

$$\lim_{m \to \infty} (A_m^{-1})_{jk} = (\mathcal{A}^{-1})_{jk}, \qquad\qquad (11)$$

where $A_m = \{\varphi(j-k)\}_{j,k=-m}^{m}$ and the rows and columns of $A_m^{-1}$ are indexed from $-m$ to $m$. It was equations (9) and (11) which led us to investigate the possibility of using some of the elements of $A_m^{-1}$, for a relatively small value of $m$, to construct preconditioners for $A$, where $n$ may be much larger than $m$. Specifically, let us choose integers $\nu \le m$ and define the sequence

$$c_j = (A_m^{-1})_{0j}, \qquad |j| \le \nu. \qquad\qquad (12)$$

We now let $P$ be the $n \times n$ banded symmetric Toeplitz matrix

$$P = \{c_{j-k}\}_{j,k=1}^{n}, \qquad \text{where } c_j := 0 \text{ for } |j| > \nu. \qquad\qquad (13)$$

We claim that, for sufficiently large $m$ and $\nu$, this matrix provides an excellent preconditioner when we take it as $P$ in Algorithm 2.2. Before discussing any theoretical motivation for this choice of preconditioner, we present an example. We fix values of $\lambda$, $m$ and $\nu$. Constructing $A_m^{-1}$ and calculating the elements (12), we find that

(14)
Figure 1: The symbol function $\tau$ for the preconditioner (14).

Now $P$ can be embedded in the bi-infinite Toeplitz matrix $\mathcal{P}$ defined by

$$\mathcal{P} = \{c_{j-k}\}_{j,k \in \mathbb{Z}}, \qquad\qquad (15)$$

and the symbol for this operator is the trigonometric polynomial

$$\tau(\theta) = \sum_{|j| \le \nu} c_j e^{ij\theta}. \qquad\qquad (16)$$

In Figure 1 we display a graph of $\tau(\theta)$ for $\theta \in [0, 2\pi]$, and it is clearly a positive function. Thus the relations

$$\operatorname{spec} P \subset \operatorname{spec} \mathcal{P} = \{\tau(\theta) : \theta \in [0, 2\pi]\} \qquad\qquad (17)$$

imply that $P$ is positive definite. Hence it is suitable for use as the preconditioner in Algorithm 2.2. Our aim in this example is to compare this choice of preconditioner with the use of the identity matrix as the preconditioner. To this end, we let the elements of the vector $f$ of Algorithm 2.2 be random real numbers drawn from a uniform distribution. Applying Algorithm 2.2 with the identity matrix as the preconditioner provides the results of Table 1, and Table 2 contains the analogous results using (13) and (14). In both cases the iterations were stopped when the norm of the residual vector $r_k$ fell below a fixed tolerance. The behaviour shown in the tables is typical; we find that the number of steps required is independent of $n$ and of the right-hand side $f$.

Table 1: No preconditioning (columns: iteration, error).

Table 2: Using (13) and (14) as the preconditioner (columns: iteration, error).

Why should (13) and (14) provide a good preconditioner? Let us consider the bi-infinite Toeplitz operator $\mathcal{P}\mathcal{A}$. The spectrum of this operator is given by

$$\operatorname{spec}(\mathcal{P}\mathcal{A}) = \{\tau(\theta)\sigma(\theta) : \theta \in [0, 2\pi]\}, \qquad\qquad (18)$$

where $\sigma$ is given by (10) and $\tau$ by (16). Therefore, in order to concentrate $\operatorname{spec}(\mathcal{P}\mathcal{A})$ at unity we must have

$$\tau(\theta)\sigma(\theta) \approx 1, \qquad \theta \in [0, 2\pi]. \qquad\qquad (19)$$

In other words, we want $\tau$ to be a trigonometric polynomial approximating the continuous function $1/\sigma$. Now if the Fourier series of $1/\sigma$ is given by

$$\frac{1}{\sigma(\theta)} = \sum_{k \in \mathbb{Z}} \mu_k e^{ik\theta}, \qquad\qquad (20)$$

then its Fourier coefficients $\{\mu_k\}_{k \in \mathbb{Z}}$ are the coefficients of the cardinal function for the integer grid, that is

$$\chi(x) = \sum_{k \in \mathbb{Z}} \mu_k \varphi(x - k), \qquad x \in \mathbb{R}, \qquad\qquad (21)$$

and

$$\chi(j) = \delta_{0j}, \qquad j \in \mathbb{Z}. \qquad\qquad (22)$$

(See, for instance, Buhmann (1990).) Recalling (11), we deduce that one way to calculate approximate values of the coefficients $\mu_{-\nu}, \ldots, \mu_{\nu}$ is to solve the linear system

$$\sum_{|k| \le m} \varphi(j - k)\, c_k = \delta_{0j}, \qquad |j| \le m, \qquad\qquad (23)$$

where $\delta$ denotes the Kronecker delta; this is precisely the calculation (12). We now set

$$\mu(\theta) = \frac{1}{\sigma(\theta)}, \qquad \theta \in [0, 2\pi], \qquad\qquad (24)$$

and we observe that the symbol function $\sigma$ for the Gaussian is a theta function (see Baxter (1994), Section 2). Thus $\sigma$ is a positive continuous function whose Fourier series is absolutely convergent. Hence $\mu$ is a positive continuous function, and Wiener's lemma (Rudin (1973)) implies the absolute convergence, and therefore the uniform convergence, of its Fourier series. We deduce that the symbol function $\tau$ can be chosen to approximate $\mu$ to within any required accuracy. More formally, we have the following result.

Lemma 2.1

Given any $\varepsilon > 0$, there are positive integers $m$ and $\nu$ such that

$$|\tau(\theta) - \mu(\theta)| < \varepsilon \qquad\qquad (25)$$

for every $\theta \in [0, 2\pi]$, where the coefficients of $\tau$ are given by (23).

{proof}

The uniform convergence of the Fourier series for $\mu$ implies that we can choose $\nu$ such that

$$\Big| \mu(\theta) - \sum_{|j| \le \nu} \mu_j e^{ij\theta} \Big| < \varepsilon/2, \qquad \theta \in [0, 2\pi]. \qquad\qquad (26)$$

By (11), we can also choose $m$ such that $|c_j - \mu_j| < \varepsilon/(2(2\nu + 1))$ when $|j| \le \nu$. Then we have

$$|\tau(\theta) - \mu(\theta)| \le \sum_{|j| \le \nu} |c_j - \mu_j| + \Big| \mu(\theta) - \sum_{|j| \le \nu} \mu_j e^{ij\theta} \Big| < \varepsilon. \qquad\qquad (27)$$

Since $\varepsilon$ is arbitrary, the proof is complete.
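In computational terms, the preconditioner of this section costs one small dense solve. The following Python sketch (our illustration; the values of $\lambda$, $m$ and $\nu$ are arbitrary) computes the coefficients (12)/(23) and verifies that the symbol (16) is positive, so that the banded Toeplitz preconditioner (13) is positive definite.

\begin{verbatim}
import numpy as np

def gaussian_precond_coeffs(lam, m, nu):
    """Solve the system (23): sum_k phi(j-k) c_k = delta_{0j} for
    |j| <= m, then return the central coefficients c_{-nu..nu}."""
    idx = np.arange(-m, m + 1)
    A_m = np.exp(-lam * (idx[:, None] - idx[None, :]) ** 2)
    c = np.linalg.solve(A_m, (idx == 0).astype(float))
    return c[m - nu : m + nu + 1]

lam, m, nu = 1.0, 20, 5
c = gaussian_precond_coeffs(lam, m, nu)
# The symbol tau of (16): positivity implies P is positive definite.
theta = np.linspace(0.0, 2.0 * np.pi, 401)
j = np.arange(-nu, nu + 1)
tau = np.cos(np.outer(theta, j)) @ c     # c is even, so tau is real
assert tau.min() > 0.0
\end{verbatim}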

3 The Multiquadric

The multiquadric interpolation matrix

$$A = \Big\{\sqrt{\|x_j - x_k\|^2 + c^2}\Big\}_{j,k=1}^{n}, \qquad\qquad (28)$$

where $c$ is a real constant and $x_1, \ldots, x_n$ are points in $\mathbb{R}^d$, is not positive definite. In Micchelli (1986), it was shown to be almost negative definite, that is, for any real numbers $y_1, \ldots, y_n$ satisfying $\sum_{j=1}^{n} y_j = 0$ we have

$$\sum_{j=1}^{n} \sum_{k=1}^{n} y_j y_k \sqrt{\|x_j - x_k\|^2 + c^2} \le 0. \qquad\qquad (29)$$

Furthermore, inequality (29) is strict when $n \ge 2$, the points are all different, and $y \ne 0$. In other words, $A$ is negative definite on the subspace $Y = \{y \in \mathbb{R}^n : e^T y = 0\}$, where $e = [1, 1, \ldots, 1]^T \in \mathbb{R}^n$.

Of course we cannot apply Algorithms 2.1 and 2.2 in this case. However, we can use the almost negative definiteness of $A$ to solve a closely related linearly constrained quadratic programming problem:

$$\min\Big\{\tfrac{1}{2}\, x^T (-A)\, x + f^T x : x \in \mathbb{R}^n,\ e^T x = 0\Big\}, \qquad\qquad (30)$$

where $f$ can be any element of $\mathbb{R}^n$. Standard theory of Lagrange multipliers guarantees the existence of a unique vector $x \in Y$ and scalar $t$ satisfying the equations

$$Ax + te = f, \qquad e^T x = 0, \qquad\qquad (31)$$

where $t$ is the Lagrange multiplier for the constrained optimization problem (30). We do not go into further detail on this point because the nonsingularity of the matrix

$$\begin{pmatrix} A & e \\ e^T & 0 \end{pmatrix} \qquad\qquad (32)$$

is well known (see, for instance, Powell (1990)). Instead we observe that one way to solve (31) is to apply the following modification of Algorithm 2.1 to (30).

Algorithm 3.1

Let $B$ be any symmetric matrix such that $y^T B y > 0$ for every nonzero $y \in Y$, and let $g \in \mathbb{R}^n$.

Set $x_0 = 0$, $r_0 = \Pi g$, $d_0 = r_0$, where $\Pi = I - n^{-1} e e^T$ is the orthogonal projection onto $Y$.
For $k = 0, 1, 2, \ldots$ do begin

  $\alpha_k = r_k^T r_k \big/ d_k^T B d_k$
  $x_{k+1} = x_k + \alpha_k d_k$
  $r_{k+1} = \Pi(r_k - \alpha_k B d_k)$
  $\beta_k = r_{k+1}^T r_{k+1} \big/ r_k^T r_k$
  $d_{k+1} = r_{k+1} + \beta_k d_k$
 Stop if $r_{k+1} = 0$ or $\|r_{k+1}\|$ is sufficiently small.
 end.

We observe that Algorithm 3.1 solves the linearly constrained optimization problem

$$\min\Big\{\tfrac{1}{2}\, x^T B x - g^T x : x \in \mathbb{R}^n,\ e^T x = 0\Big\}. \qquad\qquad (33)$$

Moreover, the following elementary lemma implies that the solutions of (31) and of (33), when we take $B = -A$ and $g = -f$, are related by the equations: the vector $x$ solving (31) coincides with the solution of (33), and $t = n^{-1} e^T (f - Ax)$.

Lemma 3.1

Let $B$ be any symmetric matrix whose restriction to $Y$ is definite, in the sense that $y^T B y \neq 0$ for every nonzero $y \in Y$. Then the map $Y \ni y \mapsto \Pi B y \in Y$ is a bijection. In other words, given any $z \in Y$ there is precisely one $y \in Y$ such that

$$\Pi B y = z. \qquad\qquad (34)$$
{proof}

For any matrix $M \in \mathbb{R}^{n \times n}$ we have the orthogonal decomposition

$$\mathbb{R}^n = \operatorname{ran} M \oplus \ker M^T.$$

Consequently the symmetric matrix $S = \Pi B \Pi$ satisfies

$$\mathbb{R}^n = \operatorname{ran} S \oplus \ker S,$$

whence $\ker S = \operatorname{span}\{e\}$ and $\operatorname{ran} S = Y$; indeed, if $Sw = 0$ and we write $w = y + \alpha e$ with $y \in Y$, then $y^T B y = w^T S w = 0$, so that $y = 0$. Hence for every $z \in Y$ there exists $w$ such that $Sw = z$. Now we can write $w = y + \alpha e$, where $y \in Y$ and $\alpha \in \mathbb{R}$ are uniquely determined by $w$. Thus $\Pi B y = S w = z$, and (34) has a solution. If $y'$ also satisfies (34), then the difference $y - y'$ lies in the intersection $Y \cap \ker(\Pi B) = \{0\}$, which settles the uniqueness of $y$.

Setting $B = A$ in Lemma 3.1, which applies because $A$ is negative definite when restricted to the subspace $Y$, we deduce that there is exactly one $x \in Y$ such that

$$\Pi A x = \Pi f.$$

Following the development of Section 2, we define the preconditioned conjugate gradient recurrences in terms of a symmetric matrix

$$P \approx A^{-1}, \qquad\qquad (35)$$

as in equation (7). However, we cannot define $P = C^{-2}$ by (7), because our $P$ is singular. One solution, advocated by Dyn, Levin and Rippa (1986), is to use the recurrence embodied in Algorithm 2.2 without further ado.

Algorithm 3.2

Choose any $x_0$ in $Y$. Set $r_0 = f - Ax_0$ and $d_0 = Pr_0$.

For $k = 0, 1, 2, \ldots$ do begin

  $\alpha_k = r_k^T P r_k \big/ d_k^T A d_k$
  $x_{k+1} = x_k + \alpha_k d_k$
  $r_{k+1} = r_k - \alpha_k A d_k$
  $\beta_k = r_{k+1}^T P r_{k+1} \big/ r_k^T P r_k$
  $d_{k+1} = P r_{k+1} + \beta_k d_k$
 Stop if $P r_{k+1} = 0$ or $\|P r_{k+1}\|$ is sufficiently small.
end.

However, this algorithm is unstable in finite precision arithmetic, as we shall see in our main example below. One modification that successfully avoids instability is to force the condition

$$e^T r_k = 0 \qquad\qquad (36)$$

to hold for all $k$. Now Lemma 3.1 implies the existence of exactly one vector $x \in Y$ for which $\Pi(f - Ax) = 0$. Therefore, applying the orthogonal projection $\Pi = I - n^{-1} e e^T$ to the residuals at every step, we obtain

Algorithm 3.3

Choose any $x_0$ in $Y$. Set $r_0 = \Pi(f - Ax_0)$, $d_0 = Pr_0$.

For $k = 0, 1, 2, \ldots$ do begin

  $\alpha_k = r_k^T P r_k \big/ d_k^T A d_k$
  $x_{k+1} = x_k + \alpha_k d_k$
  $r_{k+1} = \Pi(r_k - \alpha_k A d_k)$
  $\beta_k = r_{k+1}^T P r_{k+1} \big/ r_k^T P r_k$
  $d_{k+1} = P r_{k+1} + \beta_k d_k$
 Stop if $r_{k+1} = 0$ or $\|r_{k+1}\|$ is sufficiently small.
end.
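A minimal Python sketch of Algorithm 3.3 (our illustration, following the reconstruction above): the residuals are re-projected onto $Y$ at every step, which is the modification that suppresses the instability of Algorithm 3.2.

\begin{verbatim}
import numpy as np

def pcg_on_Y(apply_A, apply_P, f, tol=1e-10, max_iter=200):
    """Algorithm 3.3: solve  Pi A x = Pi f  on Y = {x : e^T x = 0}.
    apply_P must be symmetric non-negative definite with P e = 0, so
    all search directions lie in Y; A may be negative definite on Y
    (the signs of alpha then absorb the difference)."""
    proj = lambda v: v - v.mean()        # Pi = I - e e^T / n
    x = np.zeros_like(f)
    r = proj(f - apply_A(x))
    d = apply_P(r)
    rPr = r @ d
    for _ in range(max_iter):
        Ad = apply_A(d)
        alpha = rPr / (d @ Ad)
        x += alpha * d
        r = proj(r - alpha * Ad)
        if np.linalg.norm(r) < tol:      # stopping criterion
            break
        z = apply_P(r)
        rPr_new = r @ z
        d = z + (rPr_new / rPr) * d
        rPr = rPr_new
    return x
\end{verbatim}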

We see that the only restriction on $P$ is that it must be a non-negative definite symmetric matrix such that $\ker P = \operatorname{span}\{e\}$. It is easy to construct such a matrix given a positive definite symmetric matrix $D$ by a rank-one modification:

$$P = D - \frac{(De)(De)^T}{e^T D e}. \qquad\qquad (37)$$

The Cauchy-Schwarz inequality implies that $x^T P x \ge 0$, with equality if and only if $x \in \operatorname{span}\{e\}$. Of course we do not need to form $P$ explicitly, since $Px = Dx - (De)\,(e^T D x)/(e^T D e)$.
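The rank-one modification (37) is cheap to apply without forming $P$: a sketch (ours), with a small check of the two defining properties.

\begin{verbatim}
import numpy as np

def apply_precond(apply_D, x):
    """Apply P = D - (De)(De)^T / (e^T D e) of (37) to x, where
    apply_D(v) returns D v for a positive definite symmetric D."""
    e = np.ones_like(x)
    De = apply_D(e)
    return apply_D(x) - De * (De @ x) / (e @ De)

# Check: P e = 0 and P is non-negative definite.
n = 6
rng = np.random.default_rng(0)
M = rng.standard_normal((n, n))
D = M @ M.T + n * np.eye(n)              # positive definite D
De = D @ np.ones(n)
P = D - np.outer(De, De) / (np.ones(n) @ De)
assert np.allclose(P @ np.ones(n), 0.0)
assert np.linalg.eigvalsh(P).min() > -1e-12
\end{verbatim}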

Before constructing $D$, we consider the spectral properties of $A$ in more detail. A minor modification to Proposition 5.2.2 of Baxter (1992) yields the following useful result. Let us say that a complex sequence $\{y_j\}_{j \in \mathbb{Z}}$ is zero-summing if it is finitely supported and satisfies $\sum_{j \in \mathbb{Z}} y_j = 0$. The symbol function

$$\sigma(\theta) = \sum_{k \in \mathbb{Z}} \hat{\varphi}(\theta + 2\pi k), \qquad \theta \in (0, 2\pi), \qquad\qquad (38)$$

now requires the distributional Fourier transform of the multiquadric. In the univariate case, this is given by

$$\hat{\varphi}(\xi) = -\frac{2c}{|\xi|}\, K_1(c|\xi|), \qquad \xi \neq 0, \qquad\qquad (39)$$

where $K_1$ is a modified Bessel function. The symbol function is studied extensively in Baxter (1994).

Figure 2: The reciprocal symbol function $1/\sigma$ for the multiquadric.
Proposition 3.2

For every $\theta^* \in (0, 2\pi)$ we can find a set of zero-summing sequences $y^{(1)}, y^{(2)}, \ldots$ such that

$$\lim_{m \to \infty} \frac{\sum_j \sum_k y_j^{(m)} \bar{y}_k^{(m)}\, \varphi(j-k)}{\sum_j |y_j^{(m)}|^2} = \sigma(\theta^*). \qquad\qquad (40)$$
{proof}

We adopt the proof technique of Proposition 5.2.2 of Baxter (1992). For each positive integer $m$ we define the trigonometric polynomial

$$g_m(\theta) = (e^{i\theta} - 1)\,(m+1)^{-1/2} \sum_{j=0}^{m} e^{ij(\theta - \theta^*)},$$

and we recall from Section 2 of Baxter (1994) that

$$\sum_j \sum_k y_j \bar{y}_k\, \varphi(j-k) = \frac{1}{2\pi} \int_0^{2\pi} \Big|\sum_j y_j e^{ij\theta}\Big|^2 \sigma(\theta)\, d\theta \qquad\qquad (41)$$

for every zero-summing sequence $\{y_j\}$. We now choose $y^{(m)}$ to be the Fourier coefficients of the trigonometric polynomial $g_m$, which implies the relation

$$|g_m(\theta)|^2 = |e^{i\theta} - 1|^2\, F_m(\theta - \theta^*),$$

where $F_m$ is the $m$th degree Fejér kernel, and we see that $y^{(m)}$ is a zero-summing sequence, because $g_m(0) = 0$. By the Parseval relation we have

$$\sum_j |y_j^{(m)}|^2 = \frac{1}{2\pi} \int_0^{2\pi} |e^{i\theta} - 1|^2\, F_m(\theta - \theta^*)\, d\theta, \qquad\qquad (42)$$

and the approximate identity property of the Fejér kernel (Zygmund (1988), p. 86) implies that

$$\lim_{m \to \infty} \sum_j |y_j^{(m)}|^2 = |e^{i\theta^*} - 1|^2.$$

Further, because $\sigma$ is continuous on $(0, 2\pi)$ (Baxter (1994), Section 4.4), we have

$$\lim_{m \to \infty} \sum_j \sum_k y_j^{(m)} \bar{y}_k^{(m)}\, \varphi(j-k) = |e^{i\theta^*} - 1|^2\, \sigma(\theta^*),$$

and the ratio in (40) therefore converges to $\sigma(\theta^*)$.

Thus we have shown that, just as in the classical theory of Toeplitz operators (Grenander and Szegő (1984)), everything depends on the range of values of the symbol function $\sigma$. Because $\sigma$ inherits the double pole that $\hat{\varphi}$ enjoys at zero, we have $\sigma(\theta) \to -\infty$ as $\theta \to 0$ or $\theta \to 2\pi$. In Figure 2 we display the function $1/\sigma$.

Now let $\nu$ be a positive integer and let $\{b_j\}_{|j| \le \nu}$ be an even sequence of real numbers, that is $b_{-j} = b_j$. We define a bi-infinite banded symmetric Toeplitz matrix $\mathcal{B}$ by the equations

$$\mathcal{B} = \{b_{j-k}\}_{j,k \in \mathbb{Z}}, \qquad\qquad (43)$$

where $b_j := 0$ for $|j| > \nu$. Further,

$$\operatorname{spec} \mathcal{B} = \{\tau(\theta) : \theta \in [0, 2\pi]\}, \qquad\qquad (44)$$

where the symbol function for the Toeplitz operator $\mathcal{B}$ is given by

$$\tau(\theta) = \sum_{|j| \le \nu} b_j e^{ij\theta}. \qquad\qquad (45)$$

Now the function $\tau\sigma$ is continuous on $(0, 2\pi)$, so the argument of Proposition 3.2 also shows that, for every $\theta^* \in (0, 2\pi)$, we can find a set of zero-summing sequences such that

$$\lim_{m \to \infty} \frac{\sum_j \sum_k y_j^{(m)} \bar{y}_k^{(m)}\, (\mathcal{B}\mathcal{A})_{jk}}{\sum_j |y_j^{(m)}|^2} = \tau(\theta^*)\,\sigma(\theta^*). \qquad\qquad (46)$$

A good preconditioner must ensure that $\{\tau(\theta)\sigma(\theta) : 0 < \theta < 2\pi\}$ is a bounded set. Because of the form of $\hat{\varphi}$, we have the asymptotic relation

$$\sigma(\theta) \sim -\frac{2}{\theta^2}, \qquad \theta \to 0, \qquad\qquad (47)$$

so that $\tau$ must have a double zero at the origin. Moreover, as in Section 2, we want the approximation

$$\tau(\theta) \approx \frac{-1}{\sigma(\theta)}, \qquad \theta \in (0, 2\pi), \qquad\qquad (48)$$

and we need $\tau$ to be a non-negative trigonometric polynomial which is positive almost everywhere, which ensures that every finite section of $\mathcal{B}$ is positive definite.

Let us define

$$c_j^{(m)} = (A_m^{-1})_{0j}, \qquad |j| \le m, \qquad\qquad (49)$$

where, as before, $A_m = \{\varphi(j-k)\}_{j,k=-m}^{m}$, and

$$\chi(x) = \sum_{k \in \mathbb{Z}} \mu_k\, \varphi(x - k), \qquad \chi(j) = \delta_{0j} \quad \text{for } j \in \mathbb{Z}. \qquad\qquad (50)$$

Then Theorem 9 of Buhmann and Micchelli (1991) states that

$$\lim_{m \to \infty} c_j^{(m)} = \mu_j \qquad\qquad (51)$$

for any given fixed integer $j$. We shall use this fact to construct a suitable sequence $\{b_j\}$. First we subtract a multiple of the vector $[1, 1, \ldots, 1]^T$ from $(c_{-\nu}^{(m)}, \ldots, c_{\nu}^{(m)})$ to form a new vector $(b_{-\nu}, \ldots, b_{\nu})$ satisfying $\sum_{|j| \le \nu} b_j = 0$, and we observe that, by (51), the associated symbol (45) is one-signed for all sufficiently large values of $m$. For the numerical experiments reported here, we have chosen fixed values of $\nu$ and $m$.

Table 3: Preconditioned CG (columns: iteration, error).

Table 4: Preconditioned CG (columns: iteration, error).

Thus, given the coefficients

$$b_j = \pm\Big(c_j^{(m)} - \frac{1}{2\nu + 1} \sum_{|k| \le \nu} c_k^{(m)}\Big), \qquad |j| \le \nu,$$

with the sign chosen so that the symbol (45) is non-negative, for any $n > 2\nu$ we let $B$ be the $n \times n$ principal submatrix $\{b_{j-k}\}_{j,k=1}^{n}$ of $\mathcal{B}$ and define the preconditioner $P$ by the equation

$$P = B - \frac{(Be)(Be)^T}{e^T B e}, \qquad\qquad (52)$$

as in (37). We reiterate that we actually compute the matrix-vector product $Px$ by the operations $Px = Bx - (Be)\,(e^T B x)/(e^T B e)$, exploiting the banded Toeplitz structure of $B$, rather than by storing the elements of $P$ in memory.
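A minimal Python sketch of this construction (ours: the values of $c$, $m$ and $\nu$ are arbitrary, $B$ is formed densely for clarity rather than applied in $\mathcal{O}(n\nu)$ band form, and the sign fix implements the one-signedness observation following (51)):

\begin{verbatim}
import numpy as np

def multiquadric_precond(c_mq, m, nu, n):
    """Return a function applying the preconditioner P of (52)."""
    idx = np.arange(-m, m + 1)
    A_m = np.sqrt((idx[:, None] - idx[None, :]) ** 2 + c_mq ** 2)
    c = np.linalg.solve(A_m, (idx == 0).astype(float))   # (49)
    b = c[m - nu : m + nu + 1].copy()
    b -= b.mean()                    # zero sum: tau has a double zero at 0
    if b @ (-1.0) ** np.arange(-nu, nu + 1) < 0.0:
        b = -b                       # make the symbol tau non-negative
    band = np.zeros(n)               # first column of the banded Toeplitz B
    band[: nu + 1] = b[nu:]
    J = np.arange(n)
    B = band[np.abs(J[:, None] - J[None, :])]
    Be, eBe = B @ np.ones(n), np.ones(n) @ B @ np.ones(n)
    return lambda x: B @ x - Be * (Be @ x) / eBe         # (52), P x

apply_P = multiquadric_precond(c_mq=1.0, m=30, nu=5, n=200)
assert np.allclose(apply_P(np.ones(200)), 0.0)           # P e = 0
\end{verbatim}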

This matrix provides an excellent preconditioner. Tables 3 and 4 illustrate its use when Algorithm 3.3 is applied to the linear system

$$\Pi A x = \Pi f$$

for two values of $n$. Here $f$ consists of pseudo-random real numbers drawn from a uniform distribution. Again, this behaviour is typical, and all our numerical experiments indicate that the number of steps is independent of $n$. We remind the reader that the error shown is the norm of the residual $r_k$, but that the iterations are stopped when either $r_k$ or $d_k$ becomes sufficiently small, where we are using the notation of Algorithm 3.3.

Figure 3: The spectrum of the preconditioned operator for one choice of $\nu$.
Figure 4: The spectrum of the preconditioned operator for the alternative choice of $\nu$.

Table 5: Preconditioned CG with the second divided difference preconditioner (columns: iteration, error).

Table 6: Algorithms 3.2 and 3.3 compared (columns: iteration, residual norm for each algorithm).

It is interesting to compare Table 3 with Table 5. For Table 5 we have chosen $\nu = 1$, and the preconditioner is then essentially a multiple of the second divided difference preconditioner advocated by Dyn, Levin and Rippa (1986); indeed, we find that the computed vector $(b_{-1}, b_0, b_1)$ is proportional to $(-1, 2, -1)$. We see that its behaviour is clearly inferior to the preconditioner generated by the larger value of $\nu$. Furthermore, this is to be expected, because we are using a smaller finite section to approximate the reciprocal of the symbol function. However, because $\tau(\theta)$ is then a multiple of $\sin^2(\theta/2)$, this preconditioner still possesses the property that $\{\tau(\theta)\sigma(\theta) : 0 < \theta < 2\pi\}$ is a bounded set of real numbers.

It is also interesting to compare the spectra of the preconditioned operators for the two choices of $\nu$. Accordingly, Figures 3 and 4 display all but the largest nonzero eigenvalues for the two preconditioners respectively; the largest eigenvalues were omitted from the plots in order to reveal detail at smaller scales. We see that the clustering of the spectrum for the larger value of $\nu$ is excellent.

The final topic in this section demonstrates the instability of Algorithm 3.2 when compared with Algorithm 3.3. We refer the reader to Table 6, which uses the same parameter values as the earlier experiments. The iterations for Algorithm 3.3 were stopped when the stopping criterion described above was achieved, and the same criterion was applied to Algorithm 3.2. It is useful to display the norm of the residual at every iteration for both algorithms, since the instability of Algorithm 3.2 is then plainly visible.