Fast SGL Fourier transforms for scattered data

Fast SGL Fourier transforms for scattered data\addtitlefootnote

Christian Wülker Christian Wülker - Department of Mechanical Engineering, Johns Hopkins University
July 27, 2019
Abstract.

Spherical Gauss-Laguerre (SGL) basis functions, i. e., normalized functions of the type , , being a generalized Laguerre polynomial, a spherical harmonic, constitute an orthonormal polynomial basis of the space on with radial Gaussian (multivariate Hermite) weight . We have recently described fast Fourier transforms for the SGL basis functions based on an exact quadrature formula with certain grid points in . In this paper, we present fast SGL Fourier transforms for scattered data. The idea is to employ well-known basal fast algorithms to determine a three-dimensional trigonometric polynomial that coincides with the bandlimited function of interest where the latter is to be evaluated. This trigonometric polynomial can then be evaluated efficiently using the well-known non-equispaced FFT (NFFT). We proof an error estimate for our algorithms and validate their practical suitability in extensive numerical experiments.

Key words and phrases:
Spherical Gauss-Laguerre basis functions, generalized FFTs, non-equispaced data

1. Introduction

Let denote the standard Euclidean norm on . We consider the weighted space

 H\coloneqq{f:R3→C Lebesgue % measurable and∫R3|f(x)|2e−∥x∥22dx<∞}

equipped with the inner product

 ⟨f,g⟩H\coloneqq∫R3f(x)¯¯¯¯¯¯¯¯¯¯g(x)e−∥x∥22dx,f,g∈H,

and induced norm . As the (classical) multivariate Hermite polynomials, spherical Gauss-Laguerre basis functions are orthogonal polynomials in the Hilbert space . They arise from a particular construction approach in spherical coordinates. The latter are defined as radius , polar angle , and azimuthal angle , connected with Cartesian coordinates , , and via

 x =rsinϑcosφ, y =rsinϑsinφ, z =rcosϑ.
Definition 1.1.

The spherical Gauss-Laguerre (SGL) basis function of orders , , and is defined as

 Hnlm:R3→C,Hnlm(r,ϑ,φ)\coloneqqNnlRnl(r)Ylm(ϑ,φ),

wherein

 Nnl\coloneqq√2(n−l−1)!Γ(n+1/2)

is a normalization constant, is the spherical harmonic of degree and order (see [Dai and Xu, 2013, Sect. 1.6.2], for example), while the radial part is defined as

 Rnl(r)\coloneqqL(l+1/2)n−l−1(r2)rl,

being a generalized Laguerre polynomial (see, e. g., [Andrews et al., 1999, Sect. 6.2]).

Theorem 1.2 ([Prestin and Wülker, 2017, Cor. 1.3]).

The SGL basis functions constitute a complete orthonormal polynomial set (a polynomial orthonormal basis) in the Hilbert space .

Theorem 1.2 implies that a function can be approximated arbitrarily well w. r. t.  by finite linear combinations of the SGL basis functions. Such linear combinations are referred to as bandlimited functions. In particular, a function is called bandlimited with bandwidth if the SGL Fourier coefficients vanish for . We have recently described fast and reliable SGL Fourier transforms, i. e., generalized FFTs for the SGL basis functions [Prestin and Wülker, 2017]. These fast algorithms compute the potentially non-zero SGL Fourier coefficients of a function with bandwidth in or even only computation steps from sampled function values, instead of the naive computation steps (as usual, we define a single computation step as a complex multiplication and subsequent addition). However, in these fast SGL Fourier transforms, the function of interest is to be sampled at certain grid points in . In addition to our fast SGL Fourier transforms, another advantage of using the SGL basis functions is that their spectral behavior under rotations and translations in is completely known [Prestin and Wülker, 2018].

There are many applications conceivable in which the sample values of a bandlimited function are not given on the grid points in [Prestin and Wülker, 2017, Thm. 3.4]. It could be, for example, that the points constitute a Cartesian grid in , or that these points are scattered in another way. In such cases, computation of the SGL Fourier coefficients of using our previously described fast SGL Fourier transforms after interpolation of the given function values is generally not advisable. Therefore, in this paper, we develop non-lattice fast SGL Fourier transforms (NFSGLFTs).

In the case of the classical FFT, the now standard non-equispaced equivalent (NFFT) was introduced by Potts et al. [2001], subsequent to works including [Dutt and Rokhlin, 1993, 1995; Beylkin, 1995; Elbel and Steidl, 1998] (cf. [Keiner et al., 2009, Appx. D]). Nowadays the NFFT is firmly established in practice and continues to prove itself useful in many applications. A direct employment of the NFFT also yields “non-equispaced” FFTs on the two-dimensional unit sphere [Kunis and Potts, 2003] and on the three-dimensional rotation group [Potts et al., 2009]. In both cases, the idea is to use well-known basal fast algorithms to determine a multivariate trigonometric polynomial that coincides with the bandlimited function of interest where the latter is to be evaluated. This trigonometric polynomial can then be evaluated efficiently using the NFFT. In this paper, we pursue the same strategy in the SGL case, resulting in a class of NFSGLFTs analogous to the generalized NFFTs on and mentioned above. In addition to the three-dimensional NFFT of Potts et al., as basal fast algorithms, we use a fast discrete Legendre transform (FLT), the Clenshaw algorithm or, alternatively, a fast discrete polynomial transform (FDPT), as well as the well-known fast discrete cosine transform (DCT).

Analogously as in the derivation of all the above-mentioned (generalized) NFFTs, we begin the derivation of our NFSGLFTs with the discrete transform that reconstructs function values of a bandlimited function with bandwidth at scattered points from given SGL Fourier coefficients , . To state this transform, we linearize the index range of the SGL basis functions: We identify the triple , , with via the one-to-one correspondence

 μ=n(n−1)(2n−1)6+l(l+1)+m.

This allows for understanding the indices , , and as functions of the linear index ,

 n(μ) =⌊1√3cosh(13arcosh(36√3μ))+12⌋, (1.1) l(μ) =⌊√μ−n(μ)(n(μ)−1)(2n(μ)−1)6⌋, m(μ) =μ−n(μ)(n(μ)−1)(2n(μ)−1)6−l(μ)(l(μ)+1).
Definition 1.3 (Ndsglft).

Let scattered points , , and a bandwidth be given. We set

The corresponding linear mapping is called the non-lattice discrete SGL Fourier transform (NDSGLFT).

Let now a Fourier vector of a bandlimited function with bandwidth , as well as scattered points be given. Without risk of confusion, we define the vector containing the corresponding function values of the function . These function values can be computed from the Fourier vector by multiplication of the latter by the transformation matrix , i. e.,

 (1.2) f=Λ^f.

A direct multiplication by the matrix , however, apparently requires computation steps. In practice, this is prohibitively expensive. The NFSGFTs presented in this paper, in contrast, have an asymptotic complexity of , where and are functions of the bandwidth ,

 Φ(B) =B4+(σ(B)B)3log(σ(B)B), Ψ(B) =q(B)3,

where is the oversampling factor and the cutoff parameter of the employed three-dimensional NFFT (Sect. 2). When using an FDPT instead of the Clenshaw algorithm, can be achieved. Note that the above functions and are representatives of larger function classes.

The NFSGLFTs presented in this paper are characterized by a respective factorization of the matrix in (1.2). This means that we also have a respective fast adjoint transform, i. e., a fast algorithm for multiplication with the Hermitean-transposed matrix , with the same complexity. Analogously as in the case of the NFFT, we can thus realize a fast inverse transform (iNFSGLFT), i. e., a fast algorithm for computing the SGL Fourier coefficients , , of a bandlimited function with bandwidth from given scattered data , as an iterative conjugate-gradient (CG) method.

It is important to note that as is the case in the classical NFFT and its above-mentioned generalizations, the NFSGLFTs presented in this paper are approximating algorithms, the results of which are approximative even in exact arithmetics. Hence, in our investigations, the relation between the approximation error and the functions and plays a crucial role. In particular, we will show that is possible to appropriately choose and in order to control the error when the bandwidth is increasing.

The remainder of this paper is organized as follows: Sections 2, 3, 4, and 5 deal with the required NFFT, DCT, FLT, and the Clenshaw algorithm / FDPT, respectively. The derivation of the NFSGLFTs is contained in Section 6. In Section 7, we proof an error estimate. Finally, in Section 8, we report and discuss our extensive numerical results, demonstrating the practical applicability of our fast algorithms.

Notation.

We write if there exists a constant such that for all .

2. Non-equispaced fast Fourier transform (NFFT)

In this section, we review the functional principle of the NFFT of Potts et al., due to its fundamental importance within this work. We follow the outline of [Potts, 2003, Sect. 1.1]. As an important result, in Theorem 2.7, we further derive an error estimate for the -dimensional NFFT that we will later need. Although the NFFT itself is a grid-free transform, certain grids occur in the functional background; they also describe the index range of the Fourier coefficients. We begin with the definition of these grids, as well as the -dimensional torus.

Definition 2.1.

Let , even. We set

 Idn\coloneqq{z=[ζ0,…,ζd−1]∈Zd:−n/2≤ζi
Definition 2.2.

For , the -dimensional torus is defined as the quotient group

 Td\coloneqqRd/2πZd.
Remark 2.3.

For each equivalence class , the representative can be chosen in . We thus simply identify the torus with the -dimensional hypercube .

As opposed to in the classical FFT, the derivation of the NFFT starts with the reconstruction of function values from given Fourier coefficients. Let such Fourier coefficients , , of a -dimensional trigonometric polynomial

 (2.1)

of degree at most be given ( even). The NFFT is a fast algorithm to evaluate the trigonometric polynomial at scattered points . The difference to the classical FFT is that these points do not necessarily lie on a grid, and that their number is independent of .

To bring the above problem into matrix-vector notation, the index range of the Fourier coefficients of is linearized (cf. [Potts, 2003, p. 11]): We identify with via the bijective relation

 χ=(κ0+n2)+n(κ1+n2)+⋯+nd−1(κd−1+n2).

This allows for understanding as a function of ,

 k(χ)=[⌊χ−(χ mod nj+1)nj⌋−n2]j=0,…,d−1.

The computation of the polynomial values is equivalent to evaluating the matrix-vector product

 (2.2)
Definition 2.4 (Ndft).

The above linear mapping is called the -dimensional non-equispaced discrete Fourier transform (NDFT).

A direct evaluation of the product (2.2) requires steps, in accordance with the size of the matrix . The NFFT is an efficient approximating algorithm with a lower complexity. The idea is the following: Determine an approximant to the trigonometric polynomial that can be evaluated efficiently in the local (time) domain, so that and are as similar as possible in the frequency domain. The latter requirement is met as best as possible if

 (2.3) (2π)−d∫Tds(t)e−i⟨k,t⟩2dt=ωk,k∈Zd,

holds, where for .

Let now be an ansatz function that can be evaluated in steps. We assume that the -periodized version

 ~φ(t)\coloneqq∑z∈Zdφ(t−2πz),t∈Td,

has a uniformly converging Fourier series. The latter is given by

 (2.4) ~φ(t)=∑k∈Zd~φkei⟨k,t⟩2,t∈Td,

with the Fourier coefficients

 ~φk=(2π)−d∫Td~φ(t)e−i⟨k,t⟩2dt,k∈Zd.

The equality in (2.4) is to be understand in the sense; we shall not mention this for the following Fourier series. The Fourier coefficients are directly connected with the continuous Fourier transform of the ansatz function via the well-known Poisson summation formula, i. e.,

 (2.5) ~φk=(2π)−d∫Rdφ(x)e−i⟨k,x⟩2dx,k∈Zd.

As a first approximant to the trigonometric polynomial , a linear combination of translates of the -periodized ansatz function is chosen,

 (2.6) ~p(t)\coloneqq∑l∈Idσnαl~φ(t−2πσnl),t∈Td,

where is an oversampling factor, and the scaling coefficients of the translates are to be determined so that (2.3) approximately holds. Expanding into a Fourier series yields

 ~p(t) =∑k∈Zdβk~φkei⟨k,t⟩2 (2.7) =∑k∈Idσnβk~φkei⟨k,t⟩2+∑z∈Zd∖{0}∑k∈Idσnβk~φk+σnzei⟨k+σnz,t⟩2

with the -periodic coefficients

 (2.8) βk=∑l∈Idσnαle−2πi⟨k,l⟩2/σn.

Assuming that the absolute value of the Fourier coefficients with and is negligible small, the double sum on the right-hand side of (2.7) can be neglected. Under the additional assumption that the absolute value of the Fourier coefficients does not vanish for , a comparison of the first sum on the right-hand side of (2.7) with the right-hand side of (2.1) motivates the particular choice

 βk\coloneqq{ωk/~φk% if k∈Idn,0if k∈Idσn∖Idn,

so that the equality in (2.3) holds for all .

Now the scaling coefficients in (2.6) are determined from the coefficients . To this end, (2.8) is brought into matrix-vector notation:

 [βk(χ)]χ=0,…,(σn)d−1=[e−2πi⟨k(χ),l(ν)⟩2/σn]χ,ν=0,…,(σn)d−1\eqqcolonF⋅[αl(ν)]ν=0,…,(σn)d−1.

The matrix is a classical Fourier matrix in dimensions with special ordering of the indices. Its inverse is given by . In particular, this implies that by means of the -dimensional iFFT, the scaling coefficients can be computed in steps from the coefficients . To obtain a very fast algorithm, the oversampling factor should thus be chosen such that is a power of two.

In a last step, the approximant is approximated by the final approximant that can be evaluated in local space more efficiently. Under the assumption that the ansatz function decays fast in local space, it can be replaced by another function , the support of which is contained in a hypercube (, ). In particular, the choice

 (2.9) ψ(x)\coloneqq{φ(x)for x∈[−2πq/σn,2πq/σn]d,0otherwise,

is made, and again the -periodized version is considered. The number is called the cutoff parameter. As an approximant to , and thus to , the function

 s(t)\coloneqq∑l∈Idσnαl~ψ(t−2πσnl),t∈Td,

lends itself particularly well (cf. Eq. 2.6). In accordance with (2.9), for fixed , the index range can be restricted to the range , as can be checked geometrically easily. This allows for an evaluation of the approximant in instead of steps. The overall arithmetic complexity of the above-described NFFT is, therefore, . The storage complexity amounts to .

The above-derived NFFT corresponds to an approximate factorization of the matrix in (2.2). Without going into detail, we here refer to [Potts, 2003, Sect. 1.2]. This means in particular that we also have an adjoint NFFT, i. e., a fast algorithm for multiplication with the Hermitean-transposed (adjoint) matrix . The adjoint NFFT has the same arithmetic and storage complexity as the NFFT.

In this work, we will employ a Gaussian ansatz function (cf. [Potts, 2003, Sect. 1.3.2]):

Definition 2.5.

Let , even, and . We define the Gaussian ansatz function for the NFFT as

 φ(x)\coloneqqd−1∏j=0ϕ(ξj),x=[ξ0,…,ξd−1]∈Rd,

with the underlying univariate function

 ϕ(ξ)\coloneqq1√2πλ(σn2π)e−(σn/2π)2ξ2/2λ.

We can state a closed-form expression for the continuous Fourier transform of the Gaussian ansatz function at points in by generalizing [Potts, 2003, Eq. 1.34] to the -dimensional case. By (2.5), this facilitates the use of the Gaussian ansatz function in the NFFT.

Lemma 2.6.

Let , , and be as in Definition 2.5. For the continuous Fourier transform of the Gaussian ansatz function, we have that

 ∫Rdφ(x)e−i⟨k,x⟩2dx=d−1∏j=0ϕκj,k=[κ0,…,κd−1]∈Zd,

with .

In the next part of this section, we derive an error estimate for the NFFT with Gaussian ansatz function. Specifically, we give an upper bound for the maximum absolute error

 E∞=E∞(d,n;σ,q;t0,…,tm−1;w)\coloneqqmaxi∈{0,…,m−1}|p(ti)−s(ti)|

with given Fourier coefficients . For this, by making use of Lemma 2.6, we generalize [Potts, 2003, Thm. 1.7] to the -dimensional case (note the different normalization of the Gaussian ansatz function in [Potts, 2003]). We need this error bound in Section 7 for .

Theorem 2.7.

The maximum absolute error of the NFFT with Gaussian ansatz function and the special choice is bounded by

 E∞≤∥w∥1((2d−1)(2+1πq)d+3d−1qd/2(√2σ−12σ+12π√2σ2σ−1)d)e−qπ(1−12σ(1+d2σ−1)).

In the special case , we obtain for due to monotonicity the error estimate

 E∞≲∥w∥1e−qπ/2.
Proof.

We first get with the triangle inequality for arbitrary that

 (2.10) |p(ti)−s(ti)|≤|p(ti)−~p(ti)|\eqqcolonEa(ti)+|~p(ti)−s(ti)|\eqqcolonEt(ti).

The aliasing error , which results from the cutoff of in the frequency domain, can be estimated as [Potts, 2003, Eq. 1.23]

 Ea(ti)≤∥w∥1maxk∈Idn∑z∈Zd∖{0}∣∣∣~φk+σnz~φk∣∣∣.

On the right-hand side, we distinguish between summands with exactly one of the components of different from zero, with exactly two of the components of different from zero, etc. It follows with (2.5) and Lemma 2.6 that

 Ea(ti) ≤∥w∥1d∑j=1(dj)(maxκ∈I1n∑ζ∈Z∖{0}∣∣∣ϕκ+σnζϕκ∣∣∣\eqqcolonL)j.

The proof of [Potts, 2003, Thm. 1.7] reveals that

 L=L(σ,n,q) ≤e−2π2λ(1−1/σ)(1+σ2π2λ(2σ−1)+e−4λπ2/σ(1+σ2π2λ(2σ+1))) ≤e−qπ(1−12σ(1+d2σ−1))(1+12πq+e−4πq/(2σ−1)(1+12πq2σ−12σ+1)) ≤e−qπ(1−12σ(1+d2σ−1))(2+1πq),

due to the special choice of . It follows with the binomial formula that

 (2.11) Ea(ti)≤∥w∥1(2d−1)(2+1πq)de−qπ(1−12σ(1+d2σ−1)).

The truncation error that is due to the cutoff of in local space can be estimated as

 (2.12)

[Potts, 2003, Eq. 1.27]. According to (2.5) and Lemma 2.6, we have that

 maxk∈Idn|~φk|−1=(2π)d(maxκ∈I1n|ϕκ|−1)d=(2π)deλ(π/σ)2d/2.

Without loss of generality, let . On the right-hand side of (2.12), we distinguish between summands with multi-index for which the condition is fulfilled by exactly one of , by exactly two of , etc. It is

 ∣∣∣ϕ(τj+2πσnι)∣∣∣≤⎧⎪⎨⎪⎩∣∣ϕ(2πσnι)∣∣if ι≥0,∣∣ϕ(2πσn(ι+1))∣∣if ι<0.

Similar as above, we get

 Et(ti) ≤∥w∥1(2πσn)deλ(π/σ)2d/2d∑j=1(dj)(1√2πλ(σn2π))d−j(2∑ι≥q∣∣∣ϕ(2πσnι)∣∣∣)j ≤∥w∥1(2πλ)−d/2eλ(π/σ)2d/2d∑j=1(dj)2j(∑ι≥qe−ι2/2λ\eqqcolonM)j.

The proof of [Potts, 2003, Thm. 1.7] shows that

 M=M(σ,q)≤(1+λq)e−q2/2λ.

Due to the special choice of , and again with the binomial formula, we obtain

 (2.13) Et(ti) ≤∥w∥1(3d−1)(2πλ)−d/2(1+λq)deλ(π/σ)2d/2−q2/2λ =∥w∥13d−1qd/2(√2σ−12σ+12π√2σ2σ−1)de−qπ(1−12σ(1+d2σ−1)).

Inserting (2.11) and (2.13) into (2.10) and taking the maximum over all completes the proof. ∎

We note that the bound for the error in Theorem 2.7 does not directly depend upon , but that must be chosen as . By choosing the oversampling factor large enough, we can see that the error decays not less than exponentially w. r. t. the cutoff parameter .

To close this section, we review how the inverse transform (iNFFT), i. e., the fast algorithm to compute the Fourier coefficients , , of a -dimensional trigonometric polynomial of degree at most from given scattered data , can be constructed from the NFFT and its adjoint. For this, there are different possibilities, of which we only consider one particular here. For a more extended discussion and potential further developments of the method discussed here, we refer to [Kunis, 2006, Chap. 5] (see also [Potts, 2003, Sect. 1.7]).

Let and be as in (2.2). The aim is to find a Fourier vector that solves the linear system

 (2.14) N~f=f.

Under the above assumptions, this system has at least one solution. To compute such, we state (2.14) as a least-squares problem: Determine so that

 (2.15) ∥f−N~f∥22≤∥f−Ng∥22∀g∈Cnd.

We distinguish between three different cases. If the number of sample points is larger than the number of potentially non-zero Fourier coefficients (), then the solutions of (2.15) are obtained by solving the normal equations of first kind of the over-determined system (2.14),

 (2.16) NHN~f=NHf.

If we assume in addition that the columns of are linearly independent (i. e., ), then the solution is unique. The conjugate-gradient normal-equation residual (CGNR) method lends itself well to the numerical solution of (2.16) (see [Golub and van Loan, 1996, Alg. 10.4.1]). The advantage of this method in this context is that it is based on multiplications by the matrices and , a task for which we have the NFFT and its adjoint as fast algorithms.

If, on the other hand, the number of points is smaller than the number of Fourier coefficients (), then the least-squares problem (2.15) is reformulated as an optimization problem: Determine so that

 ∥f−N~f∥22≤∥f−Ng∥22∀g∈Cnd,∥~f∥2=min.

With this additional condition, the solution is uniquely determined independently of the rank of . If we assume that the rows of are linearly independent (i. e., ), then we obtain by solving the normal equations of second kind of the under-determined system (2.14),

 (2.17) NNHg=f,~f=NHg.

The conjugate-gradient normal-equation error (CGNE) method is well suited for the numerical solution of (2.17) (see [Golub and van Loan, 1996, Alg. 10.4.2])). Here again we can use the NFFT and its adjoint for the required multiplications with and , respectively.

Finally, if the number of sample points is exactly the same as the number of Fourier coefficients (), then we can also apply the CGNR method to the normal equations (2.16); with the NFFT and its adjoint, we obtain a fast algorithm here as well.

3. Discrete cosine transform (DCT)

One cannot really speak of the discrete cosine transform, for there are multiple classes of underlying discrete transforms to be distinguished (cf. Rem. 3.2). Here, we consider a particular type and call this the DCT.

Definition 3.1 (Dct).

Let . We set

 Dn \coloneqqdiag[1/√n,√2/n,…,√2/n]∈Rn×n, Cn \coloneqq[cos(iωj)]i,j=0,…,n−1∈Rn×n,

where . The linear mapping is called the discrete cosine transform.

Remark 3.2.

In the literature, four different variants DCT I to IV are typically distinguished from each other (see, e. g., [Plonka and Tasche, 2005, Sect. 2]). These are the four versions established in practice of eight theoretically possible [Strang, 1999]. The DCT in Definition 3.1 is closely related to the DCT II, which is often referred to as the discrete cosine transform.

For a given vector of length , a direct multiplication by the matrix apparently requires steps. In the context of the discrete cosine transform, the fast algorithms for performing the DCT are commonly also abbreviated as DCT, instead of FCT for fast cosine transform. Being closely related to the iFFT of length (cf. [Wülker, 2018, p. 40]), these fast algorithms have an asymptotic complexity of (see also [Plonka and Tasche, 2005], for instance).

As a first important property of the DCT, we note without proof that it is an orthogonal transform, which also answers the question regarding the inverse transform (iDCT):

Lemma 3.3.

It is .

For two vectors and of length , Lemma 3.3 implies that . The DCT is thus an isometric isomorphism w. r. t. the standard Euclidean norm, i. e., . In Section 7, we need the following estimate for the -norm:

Let and . Then .

Proof.

From the isometry property of the DCT w. r. t. the -norm, it follows with the Cauchy-Schwarz inequality that

 ∥~Cnx∥1≤√n∥~Cnx∥2=√n∥x∥2≤√n∥x∥1.

Another property of the DCT is of particular importance to us, for it makes working with (trigonometric) polynomials very easy. To see this, we first introduce the Chebyshev polynomials (of first kind),

 (3.1) Tk:[−1,1]→R,Tk(cosω)\coloneqqcos(kω)(k∈N0).

From the well-known cosine addition theorem, it follows that is a polynomial of degree . Hence, for fixed , the first Chebyshev polynomials constitute a basis of the polynomial space . The next lemma shows how to expand polynomials efficiently w. r. t. the Chebyshev basis with the DCT.

Lemma 3.5 (cf. [Kunis and Potts, 2003, Sect. 3]).

Let

 p=n−1∑k=0αkTk

be a polynomial on of degree at most , and for . Then it is

 [αk]k=0,…,n−1=Dn~Cn[p(cosωj)]j=0,…,n−1.

4. Fast Legendre transform (FLT)

The associated Legendre polynomial of degree and order is defined as

 Plm:[−1,1]→R,Plm(ξ)\coloneqq(−1)m2ll!(1−ξ2)m/2dl+mdξl+m(ξ2−1)l.

Note that the associated Legendre polynomials are, in fact, only polynomials for even order . They are sometimes also referred to as associated Legendre functions. The associated Legendre polynomial constitutes the polar part of the spherical harmonic .

For , we set , , and define the Legendre matrices

 (4.1)
Definition 4.1 (Dlt).

Let . The sequence of linear mappings , , is called discrete Legendre transform (DLT).

In addition to , let data , , be given. When precomputing each required matrix , steps are necessary for directly computing each matrix-vector product . This results in a naive algorithm for performing the DLT with an arithmetic and storage complexity of . In recent years, many FLTs with a lower complexity have been proposed (see, for example, [Driscoll and Healy, 1994; Healy et al., 2003; Kunis and Potts, 2003]). This is due to the fact that the FLT constitutes an integral part of the fast spherical Fourier transform (ibid.). Pursuing the well-known divide-and-conquer strategy (see, e.g., [Cormen et al., 2001, Sect. 2.3.1]), Healy et al. develop FLTs with an arithmetic and storage complexity of ; see [Healy et al., 2003, Thm. 3] and note that the precomputed data structure is required solely for the FLT. Kunis and Potts [2003, Sect. 4] offer FLTs with a complexity of as well; these authors even also carry out a stabilization for large problem sizes. These elaborate FLTs, however, are more of theoretical interest to us: for the three-dimensional problems considered in this work, the DCT-based semi-naive FLT and its adjoint of Healy et al. are suitable choices (cf. [Healy et al., 2003, Sect. 6]). The semi-naive FLT and its adjoint – which correspond to a factorization of the matrices and , respectively, see [Wülker, 2018, Thm. 2.2.14 & Cor. 2.2.15] – have an asymptotic and storage complexity of , i.e., they are no “truly fast” algorithms. The number of required computation steps, however, is significantly reduced in this variant. As an alternative to the semi-naive FLTs, one could also use the Clenshaw algorithm (Sect. 5) or an FDPT and its respective adjoint to obtain a fast FLT and adjoint. This is due to the fact that the associated Legendre polynomials satisfy the three-term recurrence relation

 (4.2) (l+1−m)Pl+1,m(ξ)=(2l+1)ξPlm(ξ)−(l+m)Pl−1,m(ξ),|m|≤l∈N;

cf. Remark 5.1.

5. The Clenshaw algorithm / fast discrete polynomial transform (FDPT)

Consider a function system satisfying a three-term recurrence relation

 (5.1) fk+1(ξ)=αk(ξ)fk(ξ)+βk(ξ)fk−1(ξ),k∈N.

Here we assume that the coefficients functions and can be evaluated in steps. We are looking for an efficient method to evaluate the sums

 (5.2) Sj\coloneqqn−1∑k=0γkfk(ξj),j=0,…,m−1,

with given data at given points (). This problem can be stated in matrix-vector notation as

 (5.3) ⎡⎢ ⎢⎣S0⋮Sm−1⎤⎥ ⎥⎦=⎡⎢ ⎢⎣f0(ξ0)⋯fn−1(ξ0)⋮⋮f0(ξm−1)⋯fn−1(ξm−1)⎤⎥ ⎥⎦\eqqcolonA⋅⎡⎢ ⎢⎣γ0⋮γn−1⎤⎥ ⎥⎦.

If one precomputes the matrix