A Fast Summation Method for translation invariant kernels

# A Fast Summation Method for translation invariant kernels

Fabien Casenave IGN LAREG, Univ Paris Diderot, Sorbonne Paris Cité, 5 rue Thomas Mann, 75205 Paris Cedex 13, France.
###### Abstract

We derive a Fast Multipole Method (FMM) where a low-rank approximation of the kernel is obtained using the Empirical Interpolation Method (EIM). Contrary to classical interpolation-based FMM, where the interpolation points and basis are fixed beforehand, the EIM is a nonlinear approximation method which constructs interpolation points and basis which are adapted to the kernel under consideration. The basis functions are obtained using evaluations of the kernel itself. We restrict ourselves to translation-invariant kernels, for which a modified version of the EIM approximation can be used in a multilevel FMM context; we call the obtained algorithm Empirical Interpolation Fast Multipole Method (EIFMM). An important feature of the EIFMM is a built-in error estimation of the interpolation error made by the low-rank approximation of the far-field behavior of the kernel: the algorithm selects the optimal number of interpolation points required to ensure a given accuracy for the result, leading to important gains for inhomogeneous kernels.

## 1 Introduction

We consider the following problem: compute approximations of many sums of the form

 f(¯xi)=N∑j=1σjK(¯xi,¯yj),1≤i≤N, (1)

for a set of potentials , and target points and source points , where , with a controllable error . The domain and are fixed for all computations, but the potentials and the points and can vary from one computation to another. The many-query situation can come from a temporal N-body problem (in which case the set of source points equals the set of observation points), or the iterative resolution of a dense linear system.

Various algorithms have been proposed to tackle such problems. Some methods only using numerical evaluations of have been proposed, see [1, 8, 15]. Many methods, e.g. the mosaic skeleton method [27], hierarchical matrices [20], the Fast Multipole Method (FMM) [17] (and its algebraic generalization H2-matrices [18, 19]), the panel clustering method [21], rely on local low-rank approximations of :

 K(x,y)≈d∑l,m=1Δl,mgl(x)hm(y),x∈I,y∈J, (2)

where and are subsets of . Such approximations can be analytic, like interpolation (for instance the Chebyshev interpolation as in [14, 20]), or multipole expansions. FMM for general kernels have been developed leading to kernel-independent procedures, see [24, 28]. In [13], another kernel-independent FMM algorithm, based on the Chebyshev interpolation to obtain the low-rank approximation (2) of , is developed and called Black-Box FMM. Since the algorithm we present here is in the same family as the one developed in [13], our numerical experiments contain comparisons with this method. For other kernel-independent interpolation based procedures, see [10, 11]. Adaptive Cross Approximation (ACA) [3, 4] directly constructs approximations from the evaluations of the kernel on the target and source points , by selecting some columns and rows of the matrix . Approximating a matrix by extracting some of its rows and columns has been investigated in [16]. With ACA methods, (2) is such that depends on and depends on , which leads to a storage complexity of . With FMM and H2-matrices, and are independent of respectively and , which allows additional factorization and the use of nested basis to lower the storage complexity to . An improved ACA exploiting the nested basis property has been proposed in [5].

In this work, a low-rank approximation of is obtained with the Empirical Interpolation Method (EIM), a procedure that represents a two-variable function in a so-called affine dependent formula and that is used in the Reduced Basis community, [2]. The EIM provides a non-linear and function-dependent interpolation: the basis functions and interpolation points are not fixed a priori but are computed during a greedy procedure, which differs from the Chebyshev interpolations. The procedure should capture any irregularity of the kernel better than a fixed basis interpolation procedure, since the basis functions of the EIM are based on evaluations of the kernel itself. Theoretical results on the convergence of EIM interpolation error in the general case are limited, to the author knowledge, to an upper bound of of the Lebesgue constant, where is the number of interpolation points, see [2, Proposition 3.2]. Various numerical studies illustrate the efficiency of the EIM [12, 22, 23, 26], even though a theoretical analysis validating this efficiency is not available yet. Depending on the kernel (and on the level in the tree for inhomogeneous kernels), the EIM selects an optimal number of terms that ensures a given accuracy of the approximation. Such a variable-order interpolation procedure have been used in [6] in the H2-matrix context with Lagrange polynomials.

A direct application of the EIM algorithm to the kernel to obtain a local low-rank approximation (2) is such that depends on and depends on , and therefore is not compatible with a multilevel FMM: a nested basis procedure cannot be derived. To correct this, we make use of the translation-invariant hypothesis of and derive a local-global low-rank EIM approximation of , in the sense that one variable is restricted to a local small subdomain while the other variable is taken in almost all the domain. Then, we derive recursive formulae to produce a multilevel FMM for this modified approximation: this is detailed in Section 3.3. The modified approximation uses four summations, instead of two summations for the Black-Box FMM. However, we will see that the M2L step, the most expensive one, has the same complexity as the Black-Box FMM one with respect to the number on interpolation points. Our algorithm requires a precomputation step, during which the greedy step of the EIM at each level is carried-out, at well as compression of some operators. In our simulations, we did not notice any speedup on homogeneous kernels compared to Black-Box FMM. For inhomogeneous kernels, the approximation problem gets easier as we get higher in the tree: corresponding EIM require much less interpolation points and speedups are measured. In points test-cases, depending on the kernel, the EIFMM gets more interesting if one has to compute more than 2 to 18 queries of the summation. In a points test-case, for the kernel , the precomputation step becomes negligible, and a single query is computed twice as fast for a nearly one order of magnitude better accuracy.

In Section 2, the EIM is recalled. In Section 3 is presented the new FMM algorithm, called the Empirical Interpolation Fast Multipole Method (EIFMM): first, a four-summation formula based on the EIM is derived in Section 3.1, that can be used in a multilevel FMM procedure, then a monolevel version of the EIFMM is proposed in Section 3.2, and a multilevel one in Section 3.3. In Section 4, the overall execution time is reduced using some precomputation and classical compression of the M2L step. Finally, numerical experiments are presented in Section 5, and conclusions are drawn in Section 6.

## 2 The Empirical Interpolation Method

Consider a function defined over assumed to be real-valued for simplicity, where . Fix an integer . The Empirical Interpolation Method provides a way to approximate this function in the following form:

 K(x,y)≈(IdK)(x,y):=d∑m=1λm(x)qm(y), (3)

where is such that

 d∑m=1Bl,mλm(x)=K(x,yl),∀1≤l≤d. (4)

The functions and the matrix , which is lower triangular with unity diagonal, are constructed in Algorithm 1, where and is a norm on , for instance the - or the -norm. In practice, the argmax appearing in Algorithm 1 is searched over finite subsets of and , denoted respectively by and and called training sets. Notice that Algorithm 1 also constructs the set of points in used in (4), and a set of points in . These points are interpolations points for the approximation of (see Property 2.1) and are different from the target and source points .

• The dimension of is larger than .

From [(H)], the functions are linearly independent (otherwise, for some in Algorithm 1).

It is shown in [7] that the approximation (3) can be written in the form

 (IdK)(x,y)=d∑l,m=1Δl,mK(x,yl)K(xm,y), (5)

where , and where the matrix is is upper triangular and constructed recursively in the loop in of Algorithm 1 in the following way:

• :

 Γ1,1=K(x1,y1),
• :

 Γk+1,k+1 =(δkK)(xk+1,yk+1), Γm,k+1 =0, ∀1≤m≤k, Γk+1,m =αm, ∀1≤m≤k,

where the vector is such that , for all .

The particular form (5) is the key to the use of the EIM approximation in a fast multipole context.

We recall the interpolation property of ; see [23, Lemma 1]:

###### Property 2.1 (Interpolation property).

For all ,

 {(IdK)(x,ym)=K(x,ym),for all x∈Dx,(IdK)(xm,y)=K(xm,y),for % all y∈Dy.

An important aspect is that the error made by the EIM approximation over the training set , denoted and defined as , is monitored during the greedy procedure. In practice, a stopping criterion is used on instead of on the number of interpolation points , which was used to simplify the presentation of the EIM. Hence, depending on the regularity of the function , the EIM will automatically select the number of interpolation points to certify the error over .

## 3 The Empirical Interpolation Fast Multipole Method

The FMM algorithm is based on a low-rank approximation of the far-field behavior of the kernel of the form (2). Such approximations enable some factorizations in the computation of key-quantities and the propagation of information through a tree structure, resulting in lowering the complexity of the computation of (1). For a general presentation of the FMM algorithm, see the original article [17, Section 3].

### 3.1 A suitable approximation

For the simplicity of the presentation, the algorithm is described in 1D. It can be readily extended in 2D and 3D. Consider a kernel , , where is the complete domain on which the FMM computation is carried out. The kernel is supposed translation invariant, i.e. such that , . The domain is partitioned in intervals, , of length , with . Consider two intervals and , well-separated in the sense that . To anticipate the generalization to higher dimensions, and are called “boxes” in what follows. We denote and the center of the boxes and . Consider an EIM approximation constructed over :

 K(x,y)≈d∑l,m=1ΔI,Jl,mK(x,yJl)K(xIm,y),x∈I,y∈J, (6)

where and superscripts have been added to underline the box-dependent quantities (as deduced from Algorithm 1). In (6), depends on , the box containing and depends on , the box containing : as stated in the introduction, this approximation is then not suitable for a multilevel FMM procedure.

Define now for all box in the partitioning of and all box well separated from , and define the box of length located at the center of (notice that is not a box from the partitioning of ). In our setting, and , see Figure 1. In other words, is a source box translated to the center of the domain, is the union of possible target boxes, translated in the same fashion.

Since , with and , it is inferred that

 K(x,y)≈d∑l,m=1ΔI0,J0l,mK(x−cJ,yJ0l)K(xI0m,y−cJ), (7)

where the EIM is carried out on and . An important aspect is that and are fixed domains, independent from the boxes and containing respectively and . Consider the first evaluation of the kernel in (7). There holds , where and . It is then inferred that

 K(x−cJ,yJ0l)≈d∑l′,m′=1ΔJ0,I0l′,m′K(x−cI,yI0l′)K(xJ0m′,yJ0l+cJ−cI),1≤l≤m, (8)

where a second EIM has been carried out on and (with the same number of terms for simplicity). Notice that if the kernel is symmetric, , for all , and for all . Injecting (8) into (7):

 K(x,y)≈d∑l′=1K(x−cI,yI0l′)d∑m′=1ΔJ0,I0l′,m′d∑l=1K(xJ0m′,yJ0l+cJ−cI)d∑m=1ΔI0,J0l,mK(xI0m,y−cJ),x∈I,y∈J. (9)

In (9), the only term depending on the boxes and (besides and themselves) are and , the center of the boxes and . Hence, in the low-rank approximation, the -dependent function no longer depends on and the -dependent function no longer depends on . As a consequence, the non-trivial manipulation leading to (9) are required for the use of EIM interpolation in a multilevel FMM context.

Notice that in all the kernel evaluations in (9), the first and second variables are always separated by a distance of at least .

### 3.2 A monolevel multipole method

Consider a set of target points , source points in , and potentials , with . We look for a low complexity approximation of

 f(¯xi)=N∑j=1σjK(¯xi,¯yj),1≤i≤N. (10)

We say that two boxes and are well-separated if . The set of boxes well-separated from is denoted , its complement in is the neighborhood of and is denoted . The FMM aims at compressing the “far” interactions in the following fashion:

 fF(¯xi)=∑J∈F(I)∑¯yj∈JσjK(¯xi,¯yj), for all box I and all ¯xi∈I. (11)

A monolevel multipole method can be directly derived from (9) to approximate , see Algorithm 2. The obtained approximation of is denoted . An approximation of is then given by

 f(¯xi)≈fF,FMM(¯xi)+fN(¯xi), (12)

where the “near“ interactions are defined by

 fN(¯xi)=∑¯yj∈N(I)σjK(¯xi,¯yj). (13)

In Table 1 is given the complexity of each step of Algorithm 2. For step 3 and the computation of the near interactions to be in the same order in , one must impose , leading to a overall complexity of the sum of order . As classically done in the literature, the derivation of a multilevel multipole method allows to lower the complexity in of the algorithm to linear.

### 3.3 A multilevel multipole method

The domain of computation is now organized as a binary tree with levels. Level is the root of the tree: there is one box equal to , level contains the partitioning of into two boxes of length , and the following levels are obtained by partitioning each box of the previous level in the same fashion, up to level .

A superscript is now added to the quantities previously defined, to indicate that they depend on the level . Two EIM’s are to be considered at each level, in the same fashion as in Section 3.1, where now the domains depend on the level in the tree: , and .

For symmetric kernels, this amount of precomputation can be lowered to one EIM by level. The key to a multilevel method stands in the ability to derive recursion formulae to propagate information from the level to the level in the upward pass, and from the level to the level in the downward pass.

Black-Box FMM, at least in its original version, is proposed with a constant-order of interpolation, which enables to propagate the information through the levels of the tree without introducing additional errors. Considering a variable-order Chebyshev interpolation is possible, but one must carefully analyse the introduced error, as done in [6] in the context of H2-matrices. In our method, the domains of approximation depicted in Figure 1 naturally enable a variable-order interpolation depending on the level of the tree, for which the introduced error is the same as the EIM approximation error in (9). As a consequence, the variable-order aspect of the approximation is directly obtained at the required accuracy, and the length of the approximation is automatically optimized at each level of the tree, which leads to important gains for inhomogeneous kernels.

#### 3.3.1 Recursion for the upward pass: multipole to multipole (M2M)

The goal is to compute

 WIkm:=∑¯yj∈IkσjK(xIk0m,¯yj−cIk), (14)

for all box at all level with and all by recursion. The initialization of the recursion consists in computing at level using (14). Then, suppose is available at level , there holds

 WIk−1m=∑¯yj∈Ik−1σjK(xIk−10m,¯yj−cIk−1)=∑Jk∈C(Ik−1)∑¯yj∈JkσjK(xIk−10m+cIk−1−cJk,¯yj−cJk), (15)

where denotes the children boxes of . To obtain a recursion on , i.e. to compute using some values of , for being boxes at the level , we will use the EIM approximations already computed at level to derive (9). One must check that the corresponding EIM approximations are evaluated on the right domains, namely and .

For all , , i.e. . Then, for all , (see for instance Figure 3). Moreover, since , it is inferred that

 −L+lk≤xIk−10m+cIk−1−cJk≤−5lk≤−3lk or 3lk≤5lk≤xIk−10m+cIk−1−cJk≤L−lk,

i.e. . Therefore, the EIM from level can then be used to write the following approximation:

 K(xIk−10m+cIk−1−cJk,¯yj−cJk)≈dk∑p,q=1ΔIk0,Jk0p,qK(xIk−10m+cIk−1−cJk,yJk0p)K(xIk0q,¯yj−cJk), (16)

where the approximation error is of the same order as in the (9).

Then, injecting (16) into (15), it is inferred that

 WIk−1m ≈dk∑p,q=1ΔIk0,Jk0p,q∑Jk∈C(Ik−1)K(xIk−10m+cIk−1−cJk,yJk0p)∑¯yj∈JkσjK(xIk0q,¯yj−cJk) (17) ≈dk∑p,q=1ΔIk0,Jk0p,q∑Jk∈C(Ik−1)K(xIk−10m+cIk−1−cJk,yJk0p)WJkq,

which provides a recursive formula for the upward pass. In the same fashion as for the monolevel version, we then compute

 ^WIkl:=dk∑m=1ΔIk0,Jk0l,mWIkm, (18)

for all box at all level k with and all .

#### 3.3.2 Transfer pass: multipole to local (M2L)

We denote , the set of boxes that are the children of the boxes in the neighborhood of the parent of but are well-separated from at level , and call it the interaction list of , see Figure 2.

The M2L step consists in computing

 gIkm′:=∑Jk∈I(Ik)dk∑l=1K(xJk0m′,yJk0l+cJk−cIk)^WJkl, (19)

for all box at all level k with and all .

#### 3.3.3 Recursion for the downward pass: local to local (L2L)

The goal is to compute

 lIkm′:=∑Jk∈F(Ik)dk∑l=1K(xJk0m′,yJk0l+cJk−cIk)^WJkl, (20)

for all box at all level with and all by recursion. The initialization of the recursion consists in computing at level using (20). Then, suppose is available, there holds

 lIk+1m′ =gIk+1m′+∑Jk∈F(P(Ik+1))∑Jk+1∈C(Jk)dk+1∑l=1K(xJk+10m′,yJk+10l+cJk+1−cIk+1)^WJk+1l (21) =gIk+1m′+∑Jk∈F(P(Ik+1))∑Jk+1∈C(Jk)dk+1∑l=1K(xJk+10m′+cIk+1−cP(Ik+1),yJk+10l+cJk+1−cP(Ik+1))^WJk+1l,

where denotes the parent of .

It is seen on Figures 3 and 4 that

 −lk+1≤cIk+1−cP(Ik+1)≤lk+1,

and that

 −L+3lk+1≤cJk+1−cP(Ik+1)≤−7lk+1 % or 7lk+1≤cJk+1−cP(Ik+1)≤L−3lk+1.

Since , it is then readily verified that and

 −L+2lk+1=−L+lk≤yJk+10l+cJk+1−cP(Ik+1)≤−6lk+1=−3lk or 3lk≤yJk+10l+cJk+1−cP(Ik+1)≤L−lk,

i.e. . Therefore, the EIM from level can then be used to write the following approximation

 K(xJk+10m′+cIk+1−cP(Ik+1),yJk+10l+cJk+1−cP(Ik+1))≈ (22) dk∑p′,q′=1ΔJk0,Ik0p′,q′K(xJk+10m′+cIk+1−cP(Ik+1),yIk0p′)K(xJk0q′,yJk+10l+cJk+1−cP(Ik+1)).

Then, injecting (22) into (21), it is inferred that

 lIk+1m′≈gIk+1m′+ ∑Jk∈F(P(Ik+1))∑Jk+1∈C(Jk)dk∑p′,q′=1ΔJk0,Ik