Fast Algorithms for the Multi-dimensional
Jacobi Polynomial Transform

James Bremer
Department of Mathematics
University of California, Davis, CA, USA
bremer@math.ucdavis.edu
   Qiyuan Pang
Department of Mathematics
Tsinghua University, China
ppangqqyz@foxmail.com
   Haizhao Yang
Department of Mathematics
National University of Singapore, Singapore
haizhao@nus.edu.sg
July 20, 2019
Abstract

We use the well-known observation that the solutions of Jacobi’s differential equation can be represented via non-oscillatory phase and amplitude functions to develop a fast algorithm for computing multi-dimensional Jacobi polynomial transforms. More explicitly, it follows from this observation that the matrix corresponding to the discrete Jacobi transform is the Hadamard product of a numerically low-rank matrix and a multi-dimensional discrete Fourier transform (DFT) matrix. The application of the Hadamard product can be carried out via fast Fourier transforms (FFTs), resulting in a nearly optimal algorithm to compute the multi-dimensional Jacobi polynomial transform.

Keywords. Multi-dimensional Jacobi polynomial transform, nonuniform transforms, non-oscillatory phase function, randomized low-rank approximation, fast Fourier transform.

1 Introduction

The one-dimensional forward discrete Jacobi transform consists of evaluating an expansion of the form

$$\sum_{\nu=0}^{N-1} a_\nu\, P_\nu^{(\alpha,\beta)}(x) \qquad (1)$$

where $P_\nu^{(\alpha,\beta)}$ denotes the order-$\nu$ Jacobi polynomial of the first kind corresponding to parameters $\alpha$ and $\beta$, at a collection of points $x_1, \ldots, x_N$ in $(-1,1)$. The one-dimensional inverse discrete Jacobi transform is the process of computing the coefficients $a_0, \ldots, a_{N-1}$ in an expansion of the form (1) given its values at a collection of $N$ distinct points $x_1, \ldots, x_N$. For the sake of brevity, we will generally drop the adjective “discrete” and simply use the terms “forward Jacobi transform” and “inverse Jacobi transform” when referring to these operations. Often, the points are the nodes of the $N$-point Gauss-Jacobi quadrature rule

$$\int_{-1}^{1} p(x)\, (1-x)^{\alpha} (1+x)^{\beta}\, dx = \sum_{j=1}^{N} w_j\, p(x_j) \qquad (2)$$

that is exact when $p$ is a polynomial of degree less than or equal to $2N-1$. However, it is useful to consider more general sets of points as well. In the case in which the points are the nodes of the Gauss-Jacobi quadrature rule, we say the corresponding transform and inverse transform are uniform; otherwise, we describe them as nonuniform. To form an orthonormal matrix to represent the transform for numerical purposes, we will rescale the transform in (1) with quadrature weights later.
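For concreteness, the following sketch (ours, not the paper's code) evaluates (1) at the nodes of (2) by forming the dense transformation matrix, rescaled by the square roots of the quadrature weights as just described; it is the $O(N^2)$ reference implementation that the fast algorithms in this paper accelerate. It assumes SciPy's `roots_jacobi` and `eval_jacobi`; all names are illustrative.

```python
import numpy as np
from scipy.special import eval_jacobi, roots_jacobi

def direct_jacobi_transform(coeffs, alpha, beta):
    """O(N^2) reference: evaluate (1) at the Gauss-Jacobi nodes of (2),
    rescaled by the square roots of the quadrature weights."""
    N = len(coeffs)
    nodes, weights = roots_jacobi(N, alpha, beta)
    # Dense transform matrix: column nu holds P_nu^(alpha,beta) at the nodes.
    P = np.stack([eval_jacobi(nu, alpha, beta, nodes) for nu in range(N)], axis=1)
    return np.sqrt(weights) * (P @ coeffs)

# Example usage with alpha = beta = 0.25 and random coefficients.
vals = direct_jacobi_transform(np.random.randn(64), 0.25, 0.25)
```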

Many methods for rapidly applying the Jacobi transform and its various special cases have been developed. Examples include the algorithms of [1, 2, 3, 4] for applying the Legendre transform, and those of [5] for forming and evaluating expansions in general Jacobi polynomials. See also [6] and its references for extensive information on numerical algorithms for forming and manipulating Chebyshev expansions. Almost all such algorithms can be placed into one of two categories. Algorithms in the first category, such as [7, 8, 9, 10, 11], make use of the structure of the connection matrices which take the coefficients in the expansion of a function in terms of one set of Jacobi polynomials to the coefficients in the expansion of the same function with respect to a different set of Jacobi polynomials. They typically operate by computing the Chebyshev coefficients of an expansion and then applying a connection matrix or a series of connection matrices to obtain the coefficients in the desired expansion. The computation of the Chebyshev expansion can be carried out efficiently in $O(N \log N)$ operations via the nonuniform FFT (NUFFT) [12, 13], and the application of each connection matrix can be performed in a number of operations which grows linearly or quasi-linearly in $N$ via the fast multipole method (FMM) [3, 14, 15, 5] or the fast eigendecomposition of semiseparable matrices [16, 17, 5].

The second class of algorithms, of which [18] is a major example, makes use of butterfly methods and similar techniques for the application of oscillatory matrices. Most algorithms in this category, including that of [18], require a relatively expensive precomputation phase. However, [19] describes an algorithm based on this approach whose total running time is quasi-linear in $N$, when both parameters $\alpha$ and $\beta$ are in the interval $\left(-\frac{1}{2},\frac{1}{2}\right)$. It makes use of the observation that the solutions of Jacobi’s differential equation can be accurately represented via non-oscillatory phase and amplitude functions to apply the Jacobi transform rapidly. That certain differential equations admit non-oscillatory phase functions has long been known [20]; a general theorem was established in [21] and a numerical method for the computation of non-oscillatory phase functions was given in [22]. In the case of Jacobi’s differential equation, there exist a smooth amplitude function $M_\nu^{(\alpha,\beta)}$ and a smooth phase function $\psi_\nu^{(\alpha,\beta)}$ such that

$$\tilde{P}_\nu^{(\alpha,\beta)}(t) = M_\nu^{(\alpha,\beta)}(t)\, \cos\!\left( \psi_\nu^{(\alpha,\beta)}(t) \right) \qquad (3)$$

and

$$\tilde{Q}_\nu^{(\alpha,\beta)}(t) = M_\nu^{(\alpha,\beta)}(t)\, \sin\!\left( \psi_\nu^{(\alpha,\beta)}(t) \right), \qquad (4)$$

where $\tilde{P}_\nu^{(\alpha,\beta)}$ and $\tilde{Q}_\nu^{(\alpha,\beta)}$ are referred to as the modified Jacobi functions of the first and second kind, respectively. They are defined in terms of the Jacobi functions of the first and second kinds $P_\nu^{(\alpha,\beta)}$ and $Q_\nu^{(\alpha,\beta)}$ (see [20] for definitions) via the formulas

$$\tilde{P}_\nu^{(\alpha,\beta)}(t) = C_\nu^{(\alpha,\beta)}\, P_\nu^{(\alpha,\beta)}(\cos t)\, \sin^{\alpha+\frac{1}{2}}\!\left( \frac{t}{2} \right) \cos^{\beta+\frac{1}{2}}\!\left( \frac{t}{2} \right) \qquad (5)$$

and

$$\tilde{Q}_\nu^{(\alpha,\beta)}(t) = C_\nu^{(\alpha,\beta)}\, Q_\nu^{(\alpha,\beta)}(\cos t)\, \sin^{\alpha+\frac{1}{2}}\!\left( \frac{t}{2} \right) \cos^{\beta+\frac{1}{2}}\!\left( \frac{t}{2} \right), \qquad (6)$$

where

$$C_\nu^{(\alpha,\beta)} = \sqrt{ (2\nu + \alpha + \beta + 1)\, \frac{\Gamma(\nu + 1)\, \Gamma(\nu + \alpha + \beta + 1)}{\Gamma(\nu + \alpha + 1)\, \Gamma(\nu + \beta + 1)} }. \qquad (7)$$

The normalization constant $C_\nu^{(\alpha,\beta)}$ is chosen to ensure the $L^2(0,\pi)$ norm of $\tilde{P}_\nu^{(\alpha,\beta)}$ is $1$ when $\nu$ is an integer. Indeed, the set $\left\{ \tilde{P}_\nu^{(\alpha,\beta)} : \nu = 0, 1, 2, \ldots \right\}$ is an orthonormal basis for $L^2(0,\pi)$. The change of variables $x = \cos(t)$ makes the singularities in the phase and amplitude functions for Jacobi’s differential equation more tractable. Obviously, there is no substantial difference in treating expansions of the type (1) and those of the form

$$\sum_{\nu=0}^{\nu_0 - 1} a_\nu\, \tilde{P}_\nu^{(\alpha,\beta)}(t) + \sum_{\nu=\nu_0}^{N-1} a_\nu\, \tilde{P}_\nu^{(\alpha,\beta)}(t), \qquad (8)$$

where $\nu_0$ is a fixed threshold degree below which the modified Jacobi functions are evaluated directly (see Section 2.2), and we will consider the latter form here; more specifically, the latter form with scaling by quadrature weights. Moreover, we will denote by $t_1, \ldots, t_N$ and $w_1, \ldots, w_N$ the nodes and weights of the trigonometric Gauss-Jacobi quadrature rule

$$\int_{0}^{\pi} p(\cos t)\, \left( 2 \sin^{2}\!\left( \tfrac{t}{2} \right) \right)^{\alpha} \left( 2 \cos^{2}\!\left( \tfrac{t}{2} \right) \right)^{\beta} \sin(t)\, dt = \sum_{j=1}^{N} w_j\, p(\cos t_j) \qquad (9)$$

obtained by applying the change of variables $t = \arccos(x)$ to (2). Again, we refer to transforms which use the set of points $\{t_j\}_{j=1}^{N}$ as uniform and those which use some other set of points as nonuniform.

In [19], it was observed that the second summation in (8) could be interpreted as the application of a structured matrix that is the real part of a Hadamard product of a NUFFT matrix and a numerically low-rank matrix. This leads to a method for its computation which takes quasi-linear time [13, 23, 24]. Expansions in terms of the functions of the second kind can be handled similarly. This non-oscillatory representation method will be described in more detail in Section 2.2.

In this paper, we generalize the one-dimensional Jacobi polynomial transform of [19] to the multi-dimensional case. The transformation matrix in the multi-dimensional case is still the real part of a Hadamard product of a multi-dimensional NUFFT matrix and a numerically low-rank matrix in the form of tensor products. However, the numerical rank of the low-rank matrix might increase quickly with the dimension if the algorithm in [19] is followed directly, and hence the multi-dimensional NUFFT in [12, 13] might not be sufficiently efficient. To obtain a viable multi-dimensional Jacobi polynomial transform, we reformulate the Hadamard product into a new one, which is the Hadamard product of an over-sampled Fourier transform matrix and a numerically low-rank matrix, and propose a fast randomized SVD to achieve near optimality in the low-rank approximation. This leads to an efficient implementation of two-dimensional and three-dimensional Jacobi polynomial transforms. Moreover, as will be demonstrated numerically, the new method proposed in this paper is more robust than the method in [19] and works for $\alpha$ and $\beta$ in the regime $(-1,1)$, rather than only the regime $\left(-\frac{1}{2},\frac{1}{2}\right)$ of [19].

The remainder of the paper is organized as follows. In Section 2, we will first briefly introduce the fast SVD via randomized sampling and the theory of non-oscillatory phase functions. In Section 3, a variant of the fast one-dimensional Jacobi polynomial transform in [19] is given; it is followed by a description of our fast multi-dimensional Jacobi polynomial transform. In Section 4, numerical results which demonstrate the efficiency of our algorithm are described. Finally, we will conclude our discussion in Section 5.

2 Preliminaries

In this section, we will revisit the linear scaling randomized SVD introduced in [25] and the non-oscillatory phase and amplitude functions in [19] to make the presentation self-contained.

2.1 Approximate SVD via Randomized Sampling

For a numerically low-rank matrix $K \in \mathbb{C}^{m \times n}$ with an algorithm to evaluate an arbitrary entry in $O(1)$ operations, [25] introduced an algorithm to construct a rank-$r$ approximate SVD, $K \approx U \Sigma V^{*}$, from $O(r)$ randomly selected rows and columns of $K$, where $^{*}$ denotes the conjugate transpose, $U \in \mathbb{C}^{m \times r}$ and $V \in \mathbb{C}^{n \times r}$ have orthonormal columns, and $\Sigma \in \mathbb{R}^{r \times r}$ is a diagonal matrix with the approximate singular values on its diagonal in descending order.

Here, we adopt the standard notation for a submatrix in MATLAB: given a row index set $I$ and a column index set $J$, $K(I, J)$ is the submatrix with entries from rows in $I$ and columns in $J$; we also use “$:$” to denote the entirety of the columns or rows of the matrix, i.e., $K(I, :)$ and $K(:, J)$. With these handy notations, we briefly introduce the randomized SVD as follows.

Algorithm 2.1.

Approximate SVD via randomized sampling

  1. Let $\Pi_{col}$ and $\Pi_{row}$ denote the important columns and rows of $K$ that are used to form the column and row bases. Initially $\Pi_{col} = \emptyset$ and $\Pi_{row} = \emptyset$.

  2. Randomly sample $qr$ rows and denote their indices by $S_{row}$. Let $I = S_{row} \cup \Pi_{row}$. Here $q$ is a multiplicative oversampling parameter. Perform a pivoted $QR$ decomposition of $K(I, :)$ to get

    $$K(I, :)\, P = Q R, \qquad (10)$$

    where $P$ is the resulting permutation matrix and $R$ is an upper triangular matrix. Define the important column index set $\Pi_{col}$ to be the indices of the first $r$ columns picked within the pivoted $QR$ decomposition.

  3. Randomly sample $qr$ columns and denote their indices by $S_{col}$. Let $J = S_{col} \cup \Pi_{col}$. Perform a pivoted $LQ$ decomposition of $K(:, J)$ to get

    $$P\, K(:, J) = L Q, \qquad (11)$$

    where $P$ is the resulting permutation matrix and $L$ is a lower triangular matrix. Define the important row index set $\Pi_{row}$ to be the indices of the first $r$ rows picked within the pivoted $LQ$ decomposition.

  4. Repeat steps 2 and 3 a few times to ensure $\Pi_{col}$ and $\Pi_{row}$ sufficiently sample the important columns and rows of $K$.

  5. Apply the pivoted $QR$ factorization to $K(:, \Pi_{col})$ and let $Q_{col}$ be the $m \times r$ matrix of the first $r$ columns of the $Q$ matrix. Similarly, apply the pivoted $QR$ factorization to $K(\Pi_{row}, :)^{*}$ and let $Q_{row}$ be the $n \times r$ matrix of the first $r$ columns of the $Q$ matrix.

  6. We seek a middle matrix $M$ such that $K \approx Q_{col}\, M\, Q_{row}^{*}$. To solve this problem efficiently, we approximately reduce it to a least-squares problem of a smaller size. Let $S_{row}$ and $S_{col}$ be the index sets of a few extra randomly sampled rows and columns. Let $I = \Pi_{row} \cup S_{row}$ and $J = \Pi_{col} \cup S_{col}$. A simple least-squares solution to the problem

    $$\min_{M}\, \left\| K(I, J) - Q_{col}(I, :)\; M\; \left( Q_{row}(J, :) \right)^{*} \right\|_{F} \qquad (12)$$

    gives $M = \left( Q_{col}(I, :) \right)^{\dagger}\, K(I, J)\, \left( \left( Q_{row}(J, :) \right)^{*} \right)^{\dagger}$, where $(\cdot)^{\dagger}$ stands for the pseudo-inverse.

  7. Compute an SVD $M = U_{M} \Sigma_{M} V_{M}^{*}$. Then the low-rank approximation of $K$ is given by

    $$K \approx U \Sigma V^{*} = \left( Q_{col} U_{M} \right) \Sigma_{M} \left( Q_{row} V_{M} \right)^{*}. \qquad (13)$$

In our numerical implementation, iterating Steps 2 and 3 twice is empirically sufficient to achieve accurate low-rank approximations via Algorithm 2.1. Similar arguments as in [26] for a randomized CUR factorization can be applied to quantify the error and success probability rigorously for Algorithm 2.1. At this point, however, we only focus on the application of Algorithm 2.1 to the fast Jacobi polynomial transform without theoretical analysis.
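To make Algorithm 2.1 concrete, here is a minimal NumPy sketch under the conventions above; it assumes dense access to $K$ for simplicity, iterates Steps 2 and 3 twice, and uses a multiplicative oversampling parameter $q$. All function and variable names are ours, and the sampling sizes are illustrative.

```python
import numpy as np
from scipy.linalg import qr

def randomized_svd(K, r, q=3, n_iter=2):
    """Sketch of Algorithm 2.1: rank-r approximate SVD K ~ U diag(s) V*
    built from O(qr) sampled rows and columns of the dense array K."""
    m, n = K.shape
    cols = np.array([], dtype=int)
    rows = np.array([], dtype=int)
    for _ in range(n_iter):
        # Step 2: sample rows; pivoted QR picks the important columns.
        S = np.union1d(np.random.choice(m, q * r, replace=False), rows)
        _, _, piv = qr(K[S, :], pivoting=True, mode='economic')
        cols = piv[:r]
        # Step 3: sample columns; pivoted QR of the adjoint picks the rows.
        S = np.union1d(np.random.choice(n, q * r, replace=False), cols)
        _, _, piv = qr(K[:, S].conj().T, pivoting=True, mode='economic')
        rows = piv[:r]
    # Step 5: orthonormal column and row bases.
    Qc, _ = np.linalg.qr(K[:, cols])
    Qr, _ = np.linalg.qr(K[rows, :].conj().T)
    # Step 6: small least-squares problem for the middle matrix (12).
    I = np.union1d(rows, np.random.choice(m, q * r, replace=False))
    J = np.union1d(cols, np.random.choice(n, q * r, replace=False))
    M = np.linalg.pinv(Qc[I, :]) @ K[np.ix_(I, J)] @ np.linalg.pinv(Qr[J, :].conj().T)
    # Step 7: SVD of the middle matrix gives the final factors (13).
    Um, s, Vmh = np.linalg.svd(M)
    return Qc @ Um, s, Qr @ Vmh.conj().T
```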

Note that our goal in the Jacobi polynomial transform is to construct a low-rank approximation of the low-rank matrix $K$, i.e., $K \approx U \Sigma V^{*}$ with $U \in \mathbb{C}^{m \times r}$ and $V \in \mathbb{C}^{n \times r}$, up to a fixed relative error $\epsilon$, rather than with a fixed rank. Algorithm 2.1 can also be embedded into an iterative process that gradually increases the rank parameter $r$ to achieve the desired accuracy. We denote the rank parameter as $r_{\epsilon}$ when it is large enough to obtain the accuracy $\epsilon$.

When $K$ originates from the discretization of a smooth function $K(x, \xi)$ at the grid points $\{x_i\}$ and $\{\xi_j\}$, i.e., $K_{ij} = K(x_i, \xi_j)$, another standard method for constructing a low-rank factorization of $K$ is Lagrange interpolation at Chebyshev grids in $x$ or $\xi$. For example, let $\{g_k\}_{k=1}^{r}$ denote the set of $r$ Chebyshev grid points in the domain of $x$, let $U$ be the matrix containing the $k$-th Lagrange polynomial in $x$ corresponding to $\{g_k\}$, evaluated at the points $\{x_i\}$, as its $k$-th column, and let $V$ be the matrix such that $V(j, k)$ is equal to $\overline{K(g_k, \xi_j)}$, where $\bar{\cdot}$ denotes the complex conjugate; then $K \approx U V^{*}$ by Lagrange interpolation. Usually, an oversampling parameter $q$ is used via setting the number of Chebyshev grid points to $qr$. Then a rank-$r$ truncated SVD of $U V^{*}$ gives a compressed rank-$r$ approximation of $K$.
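For illustration, the following sketch (ours) builds such an interpolation-based factorization for a smooth kernel $K(x,\xi)$, low-rank in the $x$ variable; the kernel, grids, and rank are illustrative.

```python
import numpy as np

def cheb_lowrank(kernel, x, xi, r):
    """K(x_i, xi_j) ~ U @ Vt, with U the Lagrange basis on an r-point
    Chebyshev grid in x and Vt the kernel sampled on that grid."""
    a, b = x.min(), x.max()
    g = 0.5 * (a + b) + 0.5 * (b - a) * np.cos(np.pi * (np.arange(r) + 0.5) / r)
    # U[i, k] = k-th Lagrange polynomial (nodes g) evaluated at x_i.
    U = np.ones((len(x), r), dtype=complex)
    for k in range(r):
        for l in range(r):
            if l != k:
                U[:, k] *= (x - g[l]) / (g[k] - g[l])
    Vt = kernel(g[:, None], xi[None, :])   # samples K(g_k, xi_j) on the grid
    return U, Vt

# Example: smooth oscillatory kernel; a modest rank already gives high accuracy.
x = np.linspace(0, 1, 400); xi = np.linspace(0, 1, 300)
U, Vt = cheb_lowrank(lambda x, s: np.exp(1j * x * s), x, xi, 16)
err = abs(U @ Vt - np.exp(1j * x[:, None] * xi[None, :])).max()
```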

The fast nonuniform FFT in [13] and the one-dimensional Jacobi polynomial transform in [19] adopted low-rank approximation via Lagrange interpolation to deal with the low-rank term in their Hadamard products of low-rank and (nonuniform) FFT matrices, without an extra truncated SVD. In this paper, we propose to use the randomized SVD via random sampling to obtain a nearly optimal rank in the low-rank approximation.

2.2 Non-oscillatory phase and amplitude functions

Given a pair of parameters $\alpha$ and $\beta$, and a maximum degree of interest $N$, we revisit the fast algorithms in [19] for constructing non-oscillatory phase and amplitude functions $\psi_\nu^{(\alpha,\beta)}$ and $M_\nu^{(\alpha,\beta)}$ such that

$$\tilde{P}_\nu^{(\alpha,\beta)}(t) = M_\nu^{(\alpha,\beta)}(t)\, \cos\!\left( \psi_\nu^{(\alpha,\beta)}(t) \right) \qquad (14)$$

and

$$\tilde{Q}_\nu^{(\alpha,\beta)}(t) = M_\nu^{(\alpha,\beta)}(t)\, \sin\!\left( \psi_\nu^{(\alpha,\beta)}(t) \right) \qquad (15)$$

for $t \in (0,\pi)$ and $\nu_0 \leq \nu \leq N$, where $\nu_0$ is the threshold degree introduced in (8). The polynomials with $t$ and $\nu$ out of these ranges can be evaluated by the well-known three-term recurrence relations or various asymptotic expansions; the lower bound $\nu_0$ of $\nu$ was chosen to obtain optimal numerical performance.

Next, we present a few facts regarding the phase and amplitude functions related to Jacobi’s differential equation. It is well known that the functions $\tilde{P}_\nu^{(\alpha,\beta)}$ and $\tilde{Q}_\nu^{(\alpha,\beta)}$ satisfy the second order differential equation

$$y''(t) + q_\nu^{(\alpha,\beta)}(t)\, y(t) = 0, \qquad 0 < t < \pi, \qquad (16)$$

where

$$q_\nu^{(\alpha,\beta)}(t) = \left( \nu + \frac{\alpha + \beta + 1}{2} \right)^{2} + \frac{\frac{1}{4} - \alpha^{2}}{4 \sin^{2}\!\left( \frac{t}{2} \right)} + \frac{\frac{1}{4} - \beta^{2}}{4 \cos^{2}\!\left( \frac{t}{2} \right)}.$$

We refer to Equation (16) as Jacobi’s differential equation. Following the derivation in [19], we can show that the pair of real-valued, linearly independent solutions $\left\{ \tilde{P}_\nu^{(\alpha,\beta)}, \tilde{Q}_\nu^{(\alpha,\beta)} \right\}$ of (16) satisfies

$$\tilde{P}_\nu^{(\alpha,\beta)}(t) = \sqrt{\frac{W}{\left( \psi_\nu^{(\alpha,\beta)} \right)'(t)}}\; \cos\!\left( \psi_\nu^{(\alpha,\beta)}(t) \right) \qquad (17)$$

and

$$\tilde{Q}_\nu^{(\alpha,\beta)}(t) = \sqrt{\frac{W}{\left( \psi_\nu^{(\alpha,\beta)} \right)'(t)}}\; \sin\!\left( \psi_\nu^{(\alpha,\beta)}(t) \right), \qquad (18)$$

where $W$ is the necessarily positive constant Wronskian of the pair $\left\{ \tilde{P}_\nu^{(\alpha,\beta)}, \tilde{Q}_\nu^{(\alpha,\beta)} \right\}$, and $\psi_\nu^{(\alpha,\beta)}$ is the non-oscillatory phase function of the pair satisfying

$$\left( \psi_\nu^{(\alpha,\beta)} \right)'(t) = \frac{W}{\left( \tilde{P}_\nu^{(\alpha,\beta)}(t) \right)^{2} + \left( \tilde{Q}_\nu^{(\alpha,\beta)}(t) \right)^{2}} \qquad (19)$$

with an appropriately chosen integration constant related to $W$. Note that $\left( \psi_\nu^{(\alpha,\beta)} \right)'(t) > 0$ since $W > 0$. Hence, the non-oscillatory amplitude function of (14) and (15) can be defined as

$$M_\nu^{(\alpha,\beta)}(t) = \sqrt{\frac{W}{\left( \psi_\nu^{(\alpha,\beta)} \right)'(t)}} = \sqrt{\left( \tilde{P}_\nu^{(\alpha,\beta)}(t) \right)^{2} + \left( \tilde{Q}_\nu^{(\alpha,\beta)}(t) \right)^{2}}. \qquad (20)$$

Through straightforward computation, it can be verified that the square of the amplitude function, $y = \left( M_\nu^{(\alpha,\beta)} \right)^{2}$, satisfies the third order linear ordinary differential equation (ODE)

$$y'''(t) + 4\, q_\nu^{(\alpha,\beta)}(t)\, y'(t) + 2\, \left( q_\nu^{(\alpha,\beta)} \right)'(t)\, y(t) = 0. \qquad (21)$$

Obviously,

$$\left( M_\nu^{(\alpha,\beta)}(t) \right)^{2} = \left( \tilde{P}_\nu^{(\alpha,\beta)}(t) \right)^{2} + \left( \tilde{Q}_\nu^{(\alpha,\beta)}(t) \right)^{2}. \qquad (22)$$

Hence, we can specify the values of $\left( M_\nu^{(\alpha,\beta)} \right)^{2}$ and its first two derivatives at a point on the interval $(0,\pi)$ for any $\nu$ using various asymptotic expansions for $\tilde{P}_\nu^{(\alpha,\beta)}$ and $\tilde{Q}_\nu^{(\alpha,\beta)}$. Afterwards, we uniquely determine $\left( M_\nu^{(\alpha,\beta)} \right)^{2}$ via solving the ODE (21) using a variant of the integral equation method of [12] (or any standard method for stiff problems).

To obtain the values of $\psi_\nu^{(\alpha,\beta)}$, we first calculate the values of

$$\left( \psi_\nu^{(\alpha,\beta)} \right)'(t) = \frac{W}{\left( M_\nu^{(\alpha,\beta)}(t) \right)^{2}} \qquad (23)$$

via (20). Next, we obtain the values of the function $\psi_0$ defined via

$$\psi_0(t) = \int_{\sigma}^{t} \left( \psi_\nu^{(\alpha,\beta)} \right)'(s)\, ds, \qquad (24)$$

where $\sigma$ is a fixed reference point in $(0,\pi)$, at any point in $(0,\pi)$ via spectral integration. There is an unknown constant $C$ connecting $\psi_0$ with $\psi_\nu^{(\alpha,\beta)}$; that is,

$$\psi_\nu^{(\alpha,\beta)}(t) = \psi_0(t) + C. \qquad (25)$$

To evaluate $C$, we first use a combination of asymptotic and series expansions to calculate $\tilde{P}_\nu^{(\alpha,\beta)}$ and $\tilde{Q}_\nu^{(\alpha,\beta)}$ at the point $\sigma$. Since $\psi_0(\sigma) = 0$, it follows that

$$\tan(C) = \tan\!\left( \psi_\nu^{(\alpha,\beta)}(\sigma) \right) = \frac{\tilde{Q}_\nu^{(\alpha,\beta)}(\sigma)}{\tilde{P}_\nu^{(\alpha,\beta)}(\sigma)}, \qquad (26)$$

and $C$ can be calculated in the obvious fashion.
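The following sketch (ours) illustrates this phase-recovery step on a discrete grid, with the trapezoidal rule standing in for the spectral integration used in the actual algorithm; it assumes the squared amplitude $\left(M_\nu^{(\alpha,\beta)}\right)^2$ on the grid, the Wronskian $W$, and a reference value $\psi(\sigma)$ obtained from the expansions just mentioned.

```python
import numpy as np
from scipy.integrate import cumulative_trapezoid

def phase_from_amplitude(t, M_sq, W, sigma_idx, psi_sigma):
    """Recover psi on the grid t from the squared amplitude M_sq:
    psi' = W / M^2 as in (19)/(23), integrate to obtain psi_0 as in (24),
    then fix the constant C in (25) by matching psi at t[sigma_idx]."""
    dpsi = W / M_sq                                    # psi'(t)
    psi0 = cumulative_trapezoid(dpsi, t, initial=0.0)  # an antiderivative
    psi0 -= psi0[sigma_idx]                            # enforce psi_0(sigma) = 0
    return psi0 + psi_sigma                            # psi = psi_0 + C
```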

The above discussion has introduced an algorithm to evaluate $\psi_\nu^{(\alpha,\beta)}$ and $M_\nu^{(\alpha,\beta)}$ in the whole domain $(0,\pi) \times [\nu_0, N]$. To achieve a fast algorithm, we note that it is enough to conduct the above calculation on selected important grid points, since we can evaluate $\psi_\nu^{(\alpha,\beta)}$ and $M_\nu^{(\alpha,\beta)}$ elsewhere via interpolation with their values on the important grid points.

It is well known that a polynomial $p$ (here $p$ plays the role of $\psi_\nu^{(\alpha,\beta)}$ or $M_\nu^{(\alpha,\beta)}$, which are well approximated by polynomials on each piece) can be evaluated in a numerically stable fashion using the barycentric Chebyshev interpolation formula [27]

$$p(t) \approx \frac{\displaystyle \sum_{j=1}^{k} \frac{w_j}{t - t_j}\, p(t_j)}{\displaystyle \sum_{j=1}^{k} \frac{w_j}{t - t_j}}, \qquad (27)$$

where $t_1, \ldots, t_k$ are the nodes of the $k$-point Chebyshev grid on a sufficiently small interval (such that $p$ is approximated by a polynomial of degree at most $k-1$) and

$$w_j = (-1)^{j-1}\, \delta_j, \qquad \delta_j = \begin{cases} \tfrac{1}{2}, & j = 1 \text{ or } j = k, \\ 1, & \text{otherwise}. \end{cases} \qquad (28)$$

Hence, it is sufficient to evaluate $\left( \psi_\nu^{(\alpha,\beta)} \right)'$ and $\left( M_\nu^{(\alpha,\beta)} \right)^{2}$, which lead to $\psi_\nu^{(\alpha,\beta)}$ and $M_\nu^{(\alpha,\beta)}$, at a tensor product of Chebyshev grid points in $t$ and $\nu$, and to calculate them at an arbitrary location via a bivariate piecewise barycentric Chebyshev interpolation.
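As an illustration, a compact NumPy version of (27)–(28) on a single interval (Chebyshev points of the second kind assumed; names ours):

```python
import numpy as np

def barycentric_cheb_eval(f_vals, a, b, t):
    """Evaluate the interpolant of f_vals, given at the k-point Chebyshev
    grid on [a, b], at the points t via the barycentric formula (27)-(28)."""
    k = len(f_vals)
    nodes = 0.5 * (a + b) + 0.5 * (b - a) * np.cos(np.pi * np.arange(k) / (k - 1))
    w = (-1.0) ** np.arange(k)
    w[0] *= 0.5
    w[-1] *= 0.5                             # endpoint weights delta_j in (28)
    d = t[:, None] - nodes[None, :]
    hit = np.isclose(d, 0.0)                 # t coincides with a node
    d = np.where(hit, 1.0, d)                # avoid division by zero
    terms = w / d
    out = (terms @ np.asarray(f_vals)) / terms.sum(axis=1)
    rows, cols = np.nonzero(hit)
    out[rows] = np.asarray(f_vals)[cols]     # exact values at the nodes
    return out
```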

More explicitly, let be the least integer such that , and let be equal to twice the least integer such that . Next, we define for , for , and for . Now we let

(29)

denote the -point piecewise Chebyshev grid on the intervals

(30)

with points in total; let

(31)

denote the nodes of the -point piecewise Chebyshev grid on the intervals

(32)

with points in total. The piecewise Chebyshev grids (29) and (31) form a tensor product

(33)

Then $\psi_\nu^{(\alpha,\beta)}$ and $M_\nu^{(\alpha,\beta)}$ can be obtained via barycentric Chebyshev interpolations in $t$ and $\nu$ from their values at the tensor product grid, restricted to the piece that $(t, \nu)$ belongs to.

Here we summarize the operation complexity of the whole algorithm above; a detailed discussion can be found in [19]. ODE solvers are applied for $O(\log N)$ values of $\nu$, and the solution of each solve is evaluated at $O(\log N)$ values of $t$, making the total running time of the procedure just described $O(\log^2 N)$.

Once the values of $\psi_\nu^{(\alpha,\beta)}$ and $M_\nu^{(\alpha,\beta)}$ are ready at the tensor product of the piecewise Chebyshev grids (29) and (31), they can be evaluated for any $t \in (0,\pi)$ and $\nu_0 \leq \nu \leq N$ via repeated application of the barycentric Chebyshev interpolation formula, in a number of operations independent of $N$ and $\nu$.

3 Fast Jacobi polynomial transforms

Without loss of generality, we assume that $\psi_\nu^{(\alpha,\beta)}(t)$, $M_\nu^{(\alpha,\beta)}(t)$, $\tilde{P}_\nu^{(\alpha,\beta)}(t)$, and $\tilde{Q}_\nu^{(\alpha,\beta)}(t)$ can be evaluated in $O(1)$ operations for any $t$ and $\nu$. A fast algorithm for rapidly computing the Gauss-Jacobi quadrature nodes and weights in the Jacobi polynomial transform in (8) has also been introduced in [19]. We refer the reader to [19] for details and assume that quadrature nodes and weights are available in this section.

3.1 One-dimensional transform and its inverse

Here we propose a new variant of the one-dimensional Jacobi polynomial transform for $\tilde{P}_\nu^{(\alpha,\beta)}$ in [19]. The new algorithm simplifies the discussion of the fast algorithm. The transform for $\tilde{Q}_\nu^{(\alpha,\beta)}$ is similar. For our purpose, we consider the order-$N$ uniform forward Jacobi polynomial transform with a scaling; that is, we calculate the vector of values

$$\boldsymbol{g} = \left( \sqrt{w_1}\, g(t_1),\ \sqrt{w_2}\, g(t_2),\ \ldots,\ \sqrt{w_N}\, g(t_N) \right)^{T} \qquad (34)$$

given the vector

$$\boldsymbol{a} = \left( a_0, a_1, \ldots, a_{N-1} \right)^{T} \qquad (35)$$

of the coefficients in the expansion

$$g(t) = \sum_{\nu=0}^{N-1} a_\nu\, \tilde{P}_\nu^{(\alpha,\beta)}(t); \qquad (36)$$

here, $\{t_j\}_{j=1}^{N}$ and $\{w_j\}_{j=1}^{N}$ are the nodes and weights of the $N$-point trigonometric Gauss-Jacobi quadrature rule corresponding to the parameters $\alpha$ and $\beta$. The “nonuniform” forward Jacobi transform does not require trigonometric Gauss-Jacobi quadrature nodes and weights.

In the uniform transform, the properties of the trigonometric Gauss-Jacobi quadrature rule and the weighting by square roots in (34) ensure that the matrix $\mathcal{J} = W \mathcal{P}$ taking (35) to (34) is orthogonal, where $W$ is the $N \times N$ matrix

$$W = \operatorname{diag}\left( \sqrt{w_1}, \sqrt{w_2}, \ldots, \sqrt{w_N} \right) \qquad (37)$$

and $\mathcal{P}$ is the $N \times N$ matrix representing the sum in (36), i.e., $\mathcal{P}_{j\nu} = \tilde{P}_\nu^{(\alpha,\beta)}(t_j)$. Hence, the inverse transform is a simple matvec by the transpose of $\mathcal{J}$. In the nonuniform case, $\mathcal{J}$ is usually a very ill-conditioned matrix and the inverse transform requires solving a challenging system of linear equations, which will be discussed in a separate paper in the future.

It follows from (14) that the $(j,\nu)$-th entry of $\mathcal{J}$ with $\nu \geq \nu_0$ is

$$\mathcal{J}_{j\nu} = \sqrt{w_j}\; M_\nu^{(\alpha,\beta)}(t_j)\, \cos\!\left( \psi_\nu^{(\alpha,\beta)}(t_j) \right), \qquad (38)$$

since the part consisting of low-degree polynomials ($\nu < \nu_0$) is computed via three-term recurrence relations or various asymptotic expansions. Let $\mathcal{J}_1$ be the submatrix of $\mathcal{J}$ corresponding to the part of polynomials with degree less than $\nu_0$, and $\mathcal{J}_2$ be the rest of $\mathcal{J}$; then

$$\mathcal{J} = \left[\, \mathcal{J}_1 \ \ \mathcal{J}_2 \,\right].$$

Note that $\mathcal{J}_2$ is the real part of a discrete Fourier integral transform that can be evaluated with a quasi-linear complexity by the algorithm of [23] for applying Fourier integral transforms, and by a special case of that in [24] for general oscillatory integral transforms. Motivated by this observation, the algorithm in [19] constructs a low-rank matrix $L$ whose $(j,\nu)$-th entry is

$$L_{j\nu} = \sqrt{w_j}\; M_\nu^{(\alpha,\beta)}(t_j)\, e^{\, i \left( \psi_\nu^{(\alpha,\beta)}(t_j) - t_j \nu \right)}. \qquad (39)$$

Denote the nonuniform FFT matrix, whose $(j,\nu)$-th entry is

$$F_{j\nu} = e^{\, i\, t_j \nu}, \qquad (40)$$

as $F$. Then

$$\mathcal{J}_2 = \operatorname{Re}\left( L \circ F \right), \qquad (41)$$

where “$\circ$” denotes the Hadamard product. It can be immediately seen from (41) that $\mathcal{J}$ can be applied through a small number of NUFFTs [12, 13], together with a simple direct summation for $\mathcal{J}_1$, in a quasi-linear scaling with respect to $N$.

More specifically, let a low-rank approximation of $L$ be

$$L \approx \sum_{k=1}^{r} u_k\, v_k^{T}, \qquad (42)$$

where $u_k$ and $v_k$ are column vectors for each $k$. Then the transform from (35) to (34) can be approximated by the following sum,

$$\mathcal{J}\, \boldsymbol{a} = \mathcal{J}_1\, \boldsymbol{a}_1 + \mathcal{J}_2\, \boldsymbol{a}_2 \approx \mathcal{J}_1\, \boldsymbol{a}_1 + \operatorname{Re}\left( \sum_{k=1}^{r} \operatorname{diag}(u_k)\; F\; \operatorname{diag}(v_k)\; \boldsymbol{a}_2 \right), \qquad (43)$$

where $\boldsymbol{a}_1$ is the subvector of $\boldsymbol{a}$ with the first $\nu_0$ entries, $\boldsymbol{a}_2$ is the subvector of $\boldsymbol{a}$ with the rest of the entries, and $\operatorname{diag}(\cdot)$ denotes a diagonal matrix with the given column vector on its diagonal. The formula (43) can be carried out by $r$ NUFFTs in $O(r N \log N)$ arithmetic operations.

In practice, inspired by the NUFFT in [13], we can replace the Hadamard product of a low-rank matrix and a NUFFT matrix in (41) with a Hadamard product of a low-rank matrix and a DFT matrix, the matvec of which can be carried out more efficiently with a small number of FFTs.

Consider another matrix $\tilde{L}$ whose $(j,\nu)$-th entry is defined via

$$\tilde{L}_{j\nu} = \sqrt{w_j}\; M_\nu^{(\alpha,\beta)}(t_j)\, e^{\, i \left( \psi_\nu^{(\alpha,\beta)}(t_j) - \frac{2\pi}{N}\, s_j\, \nu \right)}, \qquad (44)$$

where $s_j$ denotes the integer nearest to $\frac{N t_j}{2\pi}$. Then we have

$$\mathcal{J}_2 = \operatorname{Re}\left( \tilde{L} \circ \tilde{F} \right), \qquad (45)$$

where $\tilde{F}$ is the matrix whose $(j,\nu)$-th entry is

$$\tilde{F}_{j\nu} = e^{\, i\, \frac{2\pi}{N}\, s_j\, \nu}. \qquad (46)$$

It is obvious that $\tilde{F}$ in (45) is a row selection and permutation of an inverse DFT matrix. Note that the difference between the phase functions of $\tilde{L}$ and $L$ is less than $\pi$, since $\left| t_j - \frac{2\pi}{N} s_j \right| \leq \frac{\pi}{N}$ and $\nu \leq N$. Hence, $\tilde{L}$ is also a numerically low-rank matrix and we can apply the randomized SVD in Algorithm 2.1 to construct a low-rank approximation of $\tilde{L}$ in $O(N)$ operations.
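In practice, Algorithm 2.1 only needs a routine that returns an arbitrary entry of $\tilde{L}$. A sketch consistent with (44), with `psi` and `M_amp` standing for the interpolants of $\psi_\nu^{(\alpha,\beta)}$ and $M_\nu^{(\alpha,\beta)}$ from Section 2.2 (names ours):

```python
import numpy as np

def Ltilde_entry(j, nu, t, w, psi, M_amp, N):
    """(j, nu)-th entry of the matrix in (44): sqrt(w_j) * M(t_j, nu)
    * exp(i * (psi(t_j, nu) - (2*pi/N) * s_j * nu)), where s_j is the
    integer nearest to N * t_j / (2 * pi)."""
    s_j = np.rint(N * t[j] / (2.0 * np.pi))
    phase = psi(t[j], nu) - (2.0 * np.pi / N) * s_j * nu
    return np.sqrt(w[j]) * M_amp(t[j], nu) * np.exp(1j * phase)
```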

Suppose we have constructed an approximate rank-$r_\epsilon$ SVD of $\tilde{L}$ up to a desired accuracy $\epsilon$ using Algorithm 2.1; that is, suppose that we have computed the factorization

$$\tilde{L} \approx U \Sigma V^{*}, \qquad (47)$$

where $U$ and $V$ have $r_\epsilon$ orthonormal columns, and $\Sigma$ is a positive definite diagonal matrix. By rearranging the factors above, we have

$$\tilde{L} \approx \sum_{k=1}^{r_\epsilon} u_k\, v_k^{T}, \qquad (48)$$

where $u_k = (U\Sigma)(:,k)$ and $v_k^{T} = V^{*}(k,:)$ denote the $k$-th column vector of $U\Sigma$ and the $k$-th row vector of $V^{*}$, respectively, and $^{T}$ denotes the matrix transpose. Once (48) is ready, we have

$$\mathcal{J}_2\, \boldsymbol{a}_2 \approx \operatorname{Re}\left( \sum_{k=1}^{r_\epsilon} \operatorname{diag}(u_k)\; \tilde{F}\; \operatorname{diag}(v_k)\; \boldsymbol{a}_2 \right), \qquad (49)$$

where $\operatorname{diag}(u_k)$ denotes a diagonal matrix with $u_k$ on its diagonal. Formula (49) indicates that the Jacobi polynomial transform can be evaluated efficiently via $r_\epsilon$ inverse FFTs, which requires $O(r_\epsilon N \log N)$ arithmetic operations and $O(r_\epsilon N)$ memory.
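To illustrate (49), the following sketch (ours) applies $\mathcal{J}_2$ with NumPy FFTs; each rank-one term costs one length-$N$ inverse FFT followed by a row selection at the integer indices $s_j$, for a total of $O(r_\epsilon N \log N)$ operations. `U, s, V` are the factors from (47) and `nu0` is the threshold degree.

```python
import numpy as np

def apply_J2(a2, U, s, V, s_idx, nu0):
    """Apply J_2 a2 ~ Re( sum_k diag(u_k) Ftilde diag(v_k) a2 ) as in (49),
    where Ftilde is the row selection (rows s_idx) of the scaled inverse
    DFT matrix and the degrees nu0..N-1 occupy the matching frequency slots."""
    N = len(s_idx)
    out = np.zeros(N)
    for k in range(U.shape[1]):
        u_k = U[:, k] * s[k]                  # fold the singular value into u_k
        c = np.zeros(N, dtype=complex)
        c[nu0:] = np.conj(V[:, k]) * a2       # diag(v_k) a2, with v_k^T = V*(k,:)
        f = N * np.fft.ifft(c)                # one inverse FFT applies Ftilde
        out += np.real(u_k * f[s_idx])        # select rows s_j; take real part
    return out
```

The inverse transform (50) below is the same loop with conjugated factors and forward FFTs.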

Compared to (43) used in the original fast Jacobi transform in [19], the number of inverse FFTs in (49) has been optimized. Thus Formula (49) takes fewer operations to compute a multi-dimensional transform. Note that, compared to the original method in [19], though our new method using (49) costs more time to construct a low-rank approximation, the optimized rank of the low-rank approximation accelerates the application of the multi-dimensional Jacobi transform. In addition, as we mentioned previously, the new method works in a larger range of $\alpha$ and $\beta$. In Section 4, we will provide numerical comparisons to demonstrate the superiority of the new method over the method in [19].

The fast inverse Jacobi transform in the uniform case can be carried out in a similar manner, since $\mathcal{J}$ is an orthogonal matrix. In fact, the inverse transform can be computed via

$$\mathcal{J}^{-1} = \mathcal{J}^{T} = \begin{bmatrix} \mathcal{J}_1^{T} \\ \mathcal{J}_2^{T} \end{bmatrix}$$

and

$$\mathcal{J}_2^{T}\, \boldsymbol{g} \approx \operatorname{Re}\left( \sum_{k=1}^{r_\epsilon} \operatorname{diag}\left( \bar{v}_k \right)\; \tilde{F}^{*}\; \operatorname{diag}\left( \bar{u}_k \right)\; \boldsymbol{g} \right), \qquad (50)$$

where $\tilde{F}^{*}$ is a permutation of the DFT matrix and $\bar{\cdot}$ denotes the complex conjugate. Therefore, the inverse Jacobi polynomial transform can also be computed via $r_\epsilon$ FFTs.

It is worth emphasizing that the fast algorithm introduced in this section also works for non-uniform Jacobi polynomial transforms, since the low-rankness of $\tilde{L}$ is independent of the samples in $t$ and of the quadrature weights used in the uniform Jacobi polynomial transform. Numerical examples verifying this will be presented later. The inverse transform in the non-uniform case requires solving a highly ill-conditioned linear system, which will be reported in a separate paper in the future.

3.2 Two-dimensional transform and its inverse

In this section, we extend our algorithm to the two-dimensional case, using the new method developed in the last section. We only focus on the transform for $\tilde{P}_\nu^{(\alpha,\beta)}$; the one for $\tilde{Q}_\nu^{(\alpha,\beta)}$ is similar. We will adopt the MATLAB notation $\operatorname{vec}(Z)$ for the vector resulting from reshaping the matrix $Z$ into a vector. We also use notations similar to those in Section 3.1: $\mathcal{J}^{x}$, $\mathcal{J}_1^{x}$, $\mathcal{J}_2^{x}$, $\tilde{L}^{x}$, $\tilde{F}^{x}$, and $W^{x}$ denote the matrices analogous to their counterparts in Section 3.1, with the superscript specifying a variable $x$ or $y$ in the spatial domain. In the rest of this section, we always assume a low-rank approximation

$$\tilde{L}^{x} \approx \sum_{k=1}^{r_\epsilon} u_k^{x} \left( v_k^{x} \right)^{T} \qquad (51)$$

has been obtained by Algorithm 2.1 up to a desired accuracy, for the superscript “$x$” standing for $x$ or $y$.

Given locations $\{x_{j_1}\}_{j_1=1}^{N}$ and $\{y_{j_2}\}_{j_2=1}^{N}$ in $(0,\pi)$, with no substantial difference, the forward and inverse two-dimensional Jacobi polynomial transforms arise in the following Jacobi expansion

$$g(x_{j_1}, y_{j_2}) = \sum_{\nu_1=0}^{N-1} \sum_{\nu_2=0}^{N-1} Z_{\nu_1 \nu_2}\; \tilde{P}_{\nu_1}^{(\alpha,\beta)}(x_{j_1})\; \tilde{P}_{\nu_2}^{(\alpha,\beta)}(y_{j_2}), \qquad (52)$$

where $Z$ denotes an expansion coefficients matrix. The forward and inverse transforms can be defined analogously as in the one-dimensional case. When both $\{x_{j_1}\}$ and $\{y_{j_2}\}$ are exactly the nodes of the trigonometric Gauss-Jacobi quadrature rule, the corresponding transform is referred to as the uniform transform. With the additional weight matrices $W^{x}$ and $W^{y}$ applied as in (34), the forward transformation matrix $\mathcal{J}^{x} \otimes \mathcal{J}^{y}$ taking $\operatorname{vec}(Z)$ to the weighted samples of (52) is orthogonal, where “$\otimes$” denotes the Kronecker product. When $\{x_{j_1}\}$, $\{y_{j_2}\}$, and the weights are not given by the trigonometric Gauss-Jacobi quadrature rule, we refer to the corresponding transform as a non-uniform transform.

With the low-rank approximations in (51) available, and noting that $\left( \mathcal{J}^{x} \otimes \mathcal{J}^{y} \right) \operatorname{vec}(Z) = \operatorname{vec}\left( \mathcal{J}^{y}\, Z\, \left( \mathcal{J}^{x} \right)^{T} \right)$ (with the columns of $Z$ indexed by $\nu_1$ and the rows by $\nu_2$), we have

$$\mathcal{J}^{y}\, Z\, \left( \mathcal{J}^{x} \right)^{T} = \mathcal{J}_1^{y}\, Z_{11}\, \left( \mathcal{J}_1^{x} \right)^{T} + \mathcal{J}_1^{y}\, Z_{12}\, \left( \mathcal{J}_2^{x} \right)^{T} + \mathcal{J}_2^{y}\, Z_{21}\, \left( \mathcal{J}_1^{x} \right)^{T} + \mathcal{J}_2^{y}\, Z_{22}\, \left( \mathcal{J}_2^{x} \right)^{T}, \qquad (53)$$

where $Z$ is partitioned into the blocks $Z_{11}$, $Z_{12}$, $Z_{21}$, and $Z_{22}$ according to whether the degrees $\nu_1$ and $\nu_2$ are below the threshold $\nu_0$. In our numerical implementation, only the $O(\nu_0 N)$ entries of $Z$ in the low-degree blocks are needed to apply the first three terms above. Formula (53) indicates that 2D uniform Jacobi polynomial transforms can be evaluated via 1D inverse FFTs (for the two cross terms) and 2D inverse FFTs (for the last term), resulting in quasi-linear arithmetic operations and memory in the number $N^2$ of degrees of freedom. We would like to emphasize that this algorithm also works for the 2D non-uniform forward transform since it is actually a tensor form of the one-dimensional transform; numerical examples verifying this will be given later.
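For the dominant last term of (53), the one-dimensional routine can simply be applied along each dimension in turn; the sketch below (ours) reuses `apply_J2` from our Section 3.1 sketch. An equivalent implementation batches the length-$N$ FFTs into 2D inverse FFTs, as described above.

```python
import numpy as np

def apply_J2_tensor(Z22, args_y, args_x):
    """Apply the term J_2^y Z22 (J_2^x)^T in (53) dimension by dimension,
    where args_y / args_x bundle (U, s, V, s_idx, nu0) for each factor."""
    # 1D fast transform down each column (the y-dimension).
    T = np.stack([apply_J2(col, *args_y) for col in Z22.T], axis=1)
    # 1D fast transform across each row (the x-dimension).
    return np.stack([apply_J2(row, *args_x) for row in T], axis=0)
```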

The fast 2D inverse Jacobi transform in the uniform case can be carried out in a similar manner, since both $\mathcal{J}^{x}$ and $\mathcal{J}^{y}$ are orthogonal matrices. In fact, the 2D inverse transform can be computed via

$$\left( \mathcal{J}^{x} \otimes \mathcal{J}^{y} \right)^{T} \operatorname{vec}(G) = \operatorname{vec}\left( \left( \mathcal{J}^{y} \right)^{T} G\, \mathcal{J}^{x} \right) = \operatorname{vec}\left( \begin{bmatrix} \left( \mathcal{J}_1^{y} \right)^{T} \\ \left( \mathcal{J}_2^{y} \right)^{T} \end{bmatrix} G \left[\, \mathcal{J}_1^{x} \ \ \mathcal{J}_2^{x} \,\right] \right), \qquad (54)$$

where $G$ denotes the matrix of the weighted samples of (52). In our numerical implementation, since the low-degree blocks contribute only $O(\nu_0 N)$ entries of the result, we just need to compute those entries for the terms involving $\mathcal{J}_1^{x}$ or $\mathcal{J}_1^{y}$. Therefore, 2D inverse Jacobi polynomial transforms can also be evaluated via 1D FFTs and 2D FFTs.

3.3 Three-dimensional transform and its inverse

In this section, we continue to extend our algorithm to the 3D Jacobi polynomial transform and its inverse. Analogously, we just discuss the transform for $\tilde{P}_\nu^{(\alpha,\beta)}$; the transform for $\tilde{Q}_\nu^{(\alpha,\beta)}$ is similar. The notations in Section 3.2 will be inherited here.

Given locations $\{x_{j_1}\}$, $\{y_{j_2}\}$, and $\{z_{j_3}\}$ in $(0,\pi)$, with no substantial difference, the forward and inverse three-dimensional Jacobi polynomial transforms arise in the following Jacobi expansion

$$g(x_{j_1}, y_{j_2}, z_{j_3}) = \sum_{\nu_1=0}^{N-1} \sum_{\nu_2=0}^{N-1} \sum_{\nu_3=0}^{N-1} Z_{\nu_1 \nu_2 \nu_3}\; \tilde{P}_{\nu_1}^{(\alpha,\beta)}(x_{j_1})\; \tilde{P}_{\nu_2}^{(\alpha,\beta)}(y_{j_2})\; \tilde{P}_{\nu_3}^{(\alpha,\beta)}(z_{j_3}), \qquad (55)$$

where $Z$ denotes a three-dimensional tensor containing the expansion coefficients. The forward and inverse transforms can be defined analogously to the two-dimensional case. When $\{x_{j_1}\}$, $\{y_{j_2}\}$, and $\{z_{j_3}\}$ are all exactly the nodes of the trigonometric Gauss-Jacobi quadrature rule, the corresponding transform is referred to as the uniform transform. Otherwise, we refer to the corresponding transform as a non-uniform transform.

In the uniform transform, in order to take advantage of orthogonality, we consider the tensor product $\mathcal{J}^{x} \otimes \mathcal{J}^{y} \otimes \mathcal{J}^{z}$ that takes $\operatorname{vec}(Z)$ to the weighted samples of (55). Once the related low-rank approximations,

$$\tilde{L}^{x} \approx \sum_{k=1}^{r_\epsilon} u_k^{x} \left( v_k^{x} \right)^{T}, \qquad \tilde{L}^{y} \approx \sum_{k=1}^{r_\epsilon} u_k^{y} \left( v_k^{y} \right)^{T}, \qquad \tilde{L}^{z} \approx \sum_{k=1}^{r_\epsilon} u_k^{z} \left( v_k^{z} \right)^{T}, \qquad (56)$$

have been obtained, we have

$$\left( \mathcal{J}^{x} \otimes \mathcal{J}^{y} \otimes \mathcal{J}^{z} \right) \operatorname{vec}(Z) = \sum_{(i_1, i_2, i_3) \in \{1,2\}^3} \left( \mathcal{J}_{i_1}^{x} \otimes \mathcal{J}_{i_2}^{y} \otimes \mathcal{J}_{i_3}^{z} \right) \operatorname{vec}\left( Z_{i_1 i_2 i_3} \right), \qquad (57)$$

where

$$Z_{i_1 i_2 i_3}, \qquad (i_1, i_2, i_3) \in \{1,2\}^3, \qquad (58)$$

denote the blocks of the coefficient tensor $Z$ partitioned in each dimension according to whether the corresponding degree is below the threshold $\nu_0$. In our numerical implementation, since the blocks with at least one index equal to $1$ contain only $O(\nu_0 N^2)$ entries, just those entries of $Z$ need to be applied in the first seven terms above. Formulas (57) and (58) indicate that the 3D uniform Jacobi polynomial transform can be evaluated via 1D, 2D, and 3D inverse FFTs.