A Fast Summation Method for translation invariant kernels
Abstract
We derive a Fast Multipole Method (FMM) where the low-rank approximation of the kernel is obtained using the Empirical Interpolation Method (EIM). Contrary to classical interpolation-based FMM, where the interpolation points and basis are fixed beforehand, the EIM is a nonlinear approximation method that constructs interpolation points and basis functions adapted to the kernel under consideration. The basis functions are obtained using evaluations of the kernel itself. We restrict ourselves to translation-invariant kernels, for which a modified version of the EIM approximation can be used in a multilevel FMM context; we call the obtained algorithm the Empirical Interpolation Fast Multipole Method (EIFMM). An important feature of the EIFMM is a built-in estimation of the interpolation error made by the low-rank approximation of the far-field behavior of the kernel: the algorithm selects the optimal number of interpolation points required to ensure a given accuracy for the result, leading to important gains for inhomogeneous kernels.
1 Introduction
We consider the following problem: compute approximations of many sums of the form
(1) f(x_i) = \sum_{j=1}^{N} K(x_i, y_j) w_j, 1 \le i \le N,
for a set of potentials (w_j)_{1\le j\le N}, target points (x_i)_{1\le i\le N} \subset X and source points (y_j)_{1\le j\le N} \subset Y, where X, Y \subset R^d, with a controllable error. The domains X and Y are fixed for all computations, but the potentials and the points (x_i) and (y_j) can vary from one computation to another. The many-query situation can come from a temporal N-body problem (in which case the set of source points equals the set of target points), or the iterative resolution of a dense linear system.
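As a baseline, the sums (1) can be evaluated directly in O(N^2) operations; this is the cost that fast summation methods aim to reduce. A minimal sketch, where the kernel, the random point sets, and the sizes are illustrative assumptions:

```python
import numpy as np

# Direct O(N^2) evaluation of f(x_i) = sum_j K(x_i, y_j) w_j.
# The kernel below and the random point sets are illustrative assumptions.
def K(x, y):
    return 1.0 / (1.0 + (x - y) ** 2)

rng = np.random.default_rng(0)
N = 500
x = rng.uniform(0.0, 1.0, N)   # target points
y = rng.uniform(2.0, 3.0, N)   # source points
w = rng.standard_normal(N)     # potentials

# Reference computation with an explicit loop over the targets.
f_loop = np.array([np.sum(K(xi, y) * w) for xi in x])

# Same computation, vectorized: assemble the full N-by-N kernel matrix once.
f_vec = K(x[:, None], y[None, :]) @ w
```

Both variants cost O(N^2) in time, and the vectorized one additionally stores the full kernel matrix; the methods discussed below avoid both costs for well-separated interactions.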
Various algorithms have been proposed to tackle such problems. Some methods using only numerical evaluations of the kernel have been proposed, see [1, 8, 15]. Many methods, e.g. the mosaic-skeleton method [27], hierarchical matrices [20], the Fast Multipole Method (FMM) [17] (and its algebraic generalization, H²-matrices [18, 19]), and the panel clustering method [21], rely on local low-rank approximations of the kernel:
(2) K(x, y) \approx \sum_{p=1}^{r} u_p(x) v_p(y), x \in \tau, y \in \sigma,
where \tau and \sigma are subsets of the domain. Such approximations can be analytic, like interpolation (for instance the Chebyshev interpolation as in [14, 20]), or multipole expansions. FMM for general kernels have been developed, leading to kernel-independent procedures, see [24, 28]. In [13], another kernel-independent FMM algorithm, based on Chebyshev interpolation to obtain the low-rank approximation (2) of the kernel, is developed and called Black-Box FMM. Since the algorithm we present here is in the same family as the one developed in [13], our numerical experiments contain comparisons with this method. For other kernel-independent interpolation-based procedures, see [10, 11]. Adaptive Cross Approximation (ACA) [3, 4] directly constructs approximations from evaluations of the kernel at the target and source points, by selecting some columns and rows of the matrix (K(x_i, y_j))_{i,j}. Approximating a matrix by extracting some of its rows and columns has been investigated in [16]. With ACA methods, (2) is such that u_p depends on \sigma and v_p depends on \tau, which leads to a storage complexity of O(N \log N). With the FMM and H²-matrices, u_p and v_p are independent of respectively \sigma and \tau, which allows additional factorization and the use of nested bases to lower the storage complexity to O(N). An improved ACA exploiting the nested basis property has been proposed in [5].
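The local low-rank approximation (2) can be illustrated with a cross approximation in the spirit of ACA: selected rows and columns of the kernel matrix of a well-separated pair of clusters provide the factors. A minimal sketch with full pivoting; the kernel, cluster geometry and rank are illustrative assumptions:

```python
import numpy as np

# Rank-8 cross approximation of a kernel block, built from selected rows
# and columns (full-pivoting variant, in the spirit of ACA). The residual R
# is the block minus its current low-rank approximation.
def K(x, y):
    return 1.0 / np.abs(x[:, None] - y[None, :])

x = np.linspace(0.0, 1.0, 120)   # target cluster tau
y = np.linspace(2.0, 3.0, 150)   # well-separated source cluster sigma
A = K(x, y)

R = A.copy()
for _ in range(8):                                 # 8 rank-1 steps
    i, j = np.unravel_index(np.argmax(np.abs(R)), R.shape)
    R = R - np.outer(R[:, j], R[i, :]) / R[i, j]   # rank-1 update

rel_err = np.linalg.norm(R) / np.linalg.norm(A)    # relative Frobenius error
```

For well-separated clusters and a smooth kernel, the error decays quickly with the rank; here eight terms already give a small relative error.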
In this work, a low-rank approximation of the kernel is obtained with the Empirical Interpolation Method (EIM), a procedure that represents a two-variable function in a so-called affine decomposition and that is used in the Reduced Basis community [2]. The EIM provides a nonlinear and function-dependent interpolation: the basis functions and interpolation points are not fixed a priori but are computed during a greedy procedure, which differs from Chebyshev interpolation. The procedure should capture irregularities of the kernel better than a fixed-basis interpolation procedure, since the basis functions of the EIM are based on evaluations of the kernel itself. Theoretical results on the convergence of the EIM interpolation error in the general case are limited, to the author's knowledge, to an upper bound of 2^M - 1 on the Lebesgue constant, where M is the number of interpolation points, see [2, Proposition 3.2]. Various numerical studies illustrate the efficiency of the EIM [12, 22, 23, 26], even though a theoretical analysis validating this efficiency is not available yet. Depending on the kernel (and on the level in the tree for inhomogeneous kernels), the EIM selects an optimal number of terms that ensures a given accuracy of the approximation. Such a variable-order interpolation procedure has been used in [6] in the H²-matrix context with Lagrange polynomials.
A direct application of the EIM algorithm to the kernel to obtain a local low-rank approximation (2) is such that u_p depends on \sigma and v_p depends on \tau, and is therefore not compatible with a multilevel FMM: a nested basis procedure cannot be derived. To correct this, we make use of the translation-invariance hypothesis on the kernel and derive a local-global low-rank EIM approximation, in the sense that one variable is restricted to a small local subdomain while the other variable is taken in almost all the domain. Then, we derive recursive formulae to produce a multilevel FMM for this modified approximation: this is detailed in Section 3.3. The modified approximation uses four summations, instead of two summations for the Black-Box FMM. However, we will see that the M2L step, the most expensive one, has the same complexity as the Black-Box FMM one with respect to the number of interpolation points. Our algorithm requires a precomputation step, during which the greedy step of the EIM at each level is carried out, as well as the compression of some operators. In our simulations, we did not notice any speedup on homogeneous kernels compared to the Black-Box FMM. For inhomogeneous kernels, the approximation problem gets easier as we get higher in the tree: the corresponding EIMs require much fewer interpolation points and speedups are measured. In our test cases, depending on the kernel, the EIFMM becomes more interesting if one has to compute more than 2 to 18 queries of the summation. In a larger test case, the precomputation step becomes negligible, and a single query is computed twice as fast with a nearly one order of magnitude better accuracy.
In Section 2, the EIM is recalled. In Section 3, the new FMM algorithm, called the Empirical Interpolation Fast Multipole Method (EIFMM), is presented: first, a four-summation formula based on the EIM, which can be used in a multilevel FMM procedure, is derived in Section 3.1; then a mono-level version of the EIFMM is proposed in Section 3.2, and a multilevel one in Section 3.3. In Section 4, the overall execution time is reduced using some precomputation and a classical compression of the M2L step. Finally, numerical experiments are presented in Section 5, and conclusions are drawn in Section 6.
2 The Empirical Interpolation Method
Consider a function K defined over \Omega_x \times \Omega_y, assumed to be real-valued for simplicity, where \Omega_x \subset R^{d_x} and \Omega_y \subset R^{d_y}. Fix an integer M \ge 1. The Empirical Interpolation Method provides a way to approximate this function in the following form:
(3) K(x, y) \approx I_M[K](x, y) := \sum_{m=1}^{M} q_m(x) \beta_m(y),
where \beta_m is such that
(4) \sum_{m=1}^{M} B_{i,m} \beta_m(y) = K(\bar{x}_i, y), 1 \le i \le M.
The functions (q_m)_{1\le m\le M} and the matrix B, defined by B_{i,m} = q_m(\bar{x}_i) and which is lower triangular with unity diagonal, are constructed in Algorithm 1, where tol > 0 and \|\cdot\| is a norm on functions defined over \Omega_x, for instance the L^2- or the L^\infty-norm. In practice, the argmax appearing in Algorithm 1 is searched over finite subsets of \Omega_x and \Omega_y, denoted respectively by \Omega_x^t and \Omega_y^t and called training sets. Notice that Algorithm 1 also constructs the set of points (\bar{x}_m)_{1\le m\le M} in \Omega_x used in (4), and a set of points (\bar{y}_m)_{1\le m\le M} in \Omega_y. These points are interpolation points for the approximation of K (see Property 2.1) and are different from the target and source points of (1).
The following assumption is made:
(H)
The dimension of span{K(\cdot, y), y \in \Omega_y} is larger than M.
From (H), the functions (q_m)_{1\le m\le M} are linearly independent (otherwise, the residual maximized in Algorithm 1 would vanish for some m \le M).
It is shown in [7] that the approximation (3) can be written in the form
(5) K(x, y) \approx \sum_{m=1}^{M} \sum_{m'=1}^{M} D_{m,m'} q_{m'}(x) K(\bar{x}_m, y),
where D := B^{-T}, and where the matrix D is upper triangular with unity diagonal and constructed recursively in the loop in m of Algorithm 1 in the following way:
m = 1:
D^1 = (1);
m > 1:
D^m = \begin{pmatrix} D^{m-1} & -D^{m-1} b^m \\ 0 & 1 \end{pmatrix},
where the vector b^m \in R^{m-1} is such that (b^m)_i = q_i(\bar{x}_m), for all 1 \le i \le m-1.
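The block recursion for D can be checked directly against a matrix inverse: with B unit lower triangular, the recursion above reproduces B^{-T}. A small numpy sketch, where the subdiagonal entries of B are random stand-ins for actual EIM values q_i(\bar{x}_m):

```python
import numpy as np

# Verify the recursive construction of D = B^{-T}: B is unit lower
# triangular (B[i, m] stands for q_{m+1}(xbar_{i+1}) in 0-based indexing),
# with random subdiagonal entries standing in for actual EIM values.
rng = np.random.default_rng(1)
M = 6
B = np.eye(M)
B[np.tril_indices(M, -1)] = rng.standard_normal(M * (M - 1) // 2)

D = np.array([[1.0]])                    # m = 1
for m in range(2, M + 1):
    b = B[m - 1, : m - 1]                # b^m: first m-1 entries of row m of B
    top = np.hstack([D, -D @ b[:, None]])
    bot = np.hstack([np.zeros((1, m - 1)), np.ones((1, 1))])
    D = np.vstack([top, bot])            # block update for D^m
```

The recursion only costs a matrix-vector product per step, so D is obtained as a by-product of the greedy loop, without ever forming and inverting B explicitly.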
The particular form (5) is the key to the use of the EIM approximation in a fast multipole context.
We recall the interpolation property of I_M[K]; see [23, Lemma 1]:
Property 2.1 (Interpolation property).
For all y \in \Omega_y, I_M[K](\bar{x}_i, y) = K(\bar{x}_i, y), 1 \le i \le M; likewise, for all x \in \Omega_x, I_M[K](x, \bar{y}_m) = K(x, \bar{y}_m), 1 \le m \le M.
An important aspect is that the error made by the EIM approximation over the training sets, denoted \epsilon_M and defined as \epsilon_M := \max_{(x,y) \in \Omega_x^t \times \Omega_y^t} |K(x, y) - I_M[K](x, y)|, is monitored during the greedy procedure. In practice, a stopping criterion on \epsilon_M is used instead of a criterion on the number of interpolation points M, which was used above only to simplify the presentation of the EIM. Hence, depending on the regularity of the function K, the EIM automatically selects the number of interpolation points needed to certify the error over the training sets.
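Algorithm 1 can be sketched on training grids, with the monitored error \epsilon_M as stopping criterion; the greedy step below selects the pivot jointly over both training sets, a common variant. The kernel, the grids and the tolerance are illustrative assumptions:

```python
import numpy as np

# Greedy EIM sketch on training grids: at each step, pick the location of
# the largest residual, normalize the corresponding residual column into a
# new basis function q_m, and deflate. Stops when eps_M <= tol.
def K(x, y):
    return np.exp(-(x[:, None] - y[None, :]) ** 2)

Xt = np.linspace(0.0, 1.0, 200)       # training set in Omega_x
Yt = np.linspace(2.0, 3.0, 150)       # training set in Omega_y
A = K(Xt, Yt)                         # A[i, j] = K(Xt[i], Yt[j])

tol = 1e-8
Q = np.zeros((len(Xt), 0))            # basis functions q_m on the x-grid
xbar = []                             # indices of the interpolation points
R = A.copy()                          # residual K - I_M[K] on the grids
while True:
    eps_M = np.abs(R).max()           # monitored interpolation error
    if eps_M <= tol:
        break
    i, j = np.unravel_index(np.argmax(np.abs(R)), R.shape)
    q = R[:, j] / R[i, j]             # new basis function, q(xbar_new) = 1
    Q = np.hstack([Q, q[:, None]])
    xbar.append(i)
    R = R - np.outer(q, R[i, :])      # residual now vanishes on row xbar_new

M = Q.shape[1]
B = Q[xbar, :]                        # B[i, m] = q_m(xbar_i)
```

By construction B is lower triangular with unity diagonal, the interpolant Q B^{-1} A[xbar, :] matches A on the selected rows, and the number of terms M is selected by the tolerance rather than fixed beforehand.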
3 The Empirical Interpolation Fast Multipole Method
The FMM algorithm is based on a low-rank approximation of the far-field behavior of the kernel, of the form (2). Such approximations enable factorizations in the computation of key quantities and the propagation of information through a tree structure, resulting in a reduced complexity for the computation of (1). For a general presentation of the FMM algorithm, see the original article [17, Section 3].
3.1 A suitable approximation
For simplicity of the presentation, the algorithm is described in 1D; it can be readily extended to 2D and 3D. Consider a kernel K : \Omega \times \Omega \to R, where \Omega \subset R is the complete domain on which the FMM computation is carried out. The kernel is supposed translation invariant, i.e. such that K(x + \delta, y + \delta) = K(x, y), for all x, y and all translations \delta. The domain \Omega is partitioned in N_b intervals \sigma_1, ..., \sigma_{N_b}, of length \ell, with N_b \ell = |\Omega|. Consider two intervals \tau and \sigma, well-separated in the sense that dist(\tau, \sigma) \ge \ell. To anticipate the generalization to higher dimensions, \tau and \sigma are called “boxes” in what follows. We denote c_\tau and c_\sigma the centers of the boxes \tau and \sigma. Consider an EIM approximation constructed over \tau \times \sigma:
(6) K(x, y) \approx \sum_{m=1}^{M} \sum_{m'=1}^{M} D^{\tau,\sigma}_{m,m'} q^{\tau,\sigma}_{m'}(x) K(\bar{x}^{\tau,\sigma}_m, y), x \in \tau, y \in \sigma,
where \tau,\sigma superscripts have been added to underline the box-dependent quantities (as deduced from Algorithm 1). In (6), q^{\tau,\sigma}_{m'} depends on \sigma, the box containing y, and K(\bar{x}^{\tau,\sigma}_m, \cdot) depends on \tau, the box containing x: as stated in the introduction, this approximation is then not suitable for a multilevel FMM procedure.
Denote c the center of \Omega. Define now \tilde{X} := \cup_\sigma \cup_\tau (\tau + c - c_\sigma), for all boxes \sigma in the partitioning of \Omega and all boxes \tau well-separated from \sigma, and define \tilde{Y} the box of length \ell located at the center of \Omega (notice that \tilde{Y} is not a box from the partitioning of \Omega). In our setting, \tilde{Y} = [c - \ell/2, c + \ell/2] and \tilde{X} \subset \{x : dist(x, \tilde{Y}) \ge \ell\}, see Figure 1. In other words, \tilde{Y} is a source box translated to the center of the domain, \tilde{X} is the union of the possible target boxes, translated in the same fashion.
Since K(x, y) = K(x + c - c_\sigma, y + c - c_\sigma), with x + c - c_\sigma \in \tilde{X} and y + c - c_\sigma \in \tilde{Y}, it is inferred that
(7) K(x, y) \approx \sum_{m=1}^{M} \sum_{m'=1}^{M} D^{1}_{m,m'} K(x + c - c_\sigma, \bar{y}^{1}_m) q^{1}_{m'}(y + c - c_\sigma),
where the EIM is carried out on \tilde{X} and \tilde{Y}, the basis functions q^1_{m'} and the interpolation points \bar{y}^1_m being constructed in the variable lying in \tilde{Y}. An important aspect is that \tilde{X} and \tilde{Y} are fixed domains, independent from the boxes \tau and \sigma containing respectively x and y. Consider the first evaluation of the kernel in (7). There holds K(x + c - c_\sigma, \bar{y}^{1}_m) = K(x + c - c_\tau, \bar{y}^{1}_m + c_\sigma - c_\tau), where x + c - c_\tau \in \tilde{Y} and \bar{y}^{1}_m + c_\sigma - c_\tau \in \tilde{X}. It is then inferred that
(8) K(x + c - c_\sigma, \bar{y}^{1}_m) \approx \sum_{n=1}^{M} \sum_{n'=1}^{M} D^{2}_{n,n'} q^{2}_{n'}(x + c - c_\tau) K(\bar{x}^{2}_n, \bar{y}^{1}_m + c_\sigma - c_\tau),
where a second EIM has been carried out on \tilde{Y} and \tilde{X} (with the same number of terms M for simplicity). Notice that if the kernel is symmetric, q^{2}_m = q^{1}_m, for all 1 \le m \le M, and \bar{x}^{2}_m = \bar{y}^{1}_m, for all 1 \le m \le M. Injecting (8) into (7):
(9) K(x, y) \approx \sum_{m=1}^{M} \sum_{m'=1}^{M} \sum_{n=1}^{M} \sum_{n'=1}^{M} D^{1}_{m,m'} D^{2}_{n,n'} q^{2}_{n'}(x + c - c_\tau) K(\bar{x}^{2}_n, \bar{y}^{1}_m + c_\sigma - c_\tau) q^{1}_{m'}(y + c - c_\sigma).
In (9), the only terms depending on the boxes \tau and \sigma (besides x and y themselves) are c_\tau and c_\sigma, the centers of the boxes \tau and \sigma. Hence, in the low-rank approximation, the x-dependent function q^{2}_{n'}(\cdot + c - c_\tau) no longer depends on \sigma and the y-dependent function q^{1}_{m'}(\cdot + c - c_\sigma) no longer depends on \tau. As a consequence, the nontrivial manipulations leading to (9) are required for the use of the EIM interpolation in a multilevel FMM context.
Notice that in all the kernel evaluations in (9), the first and second variables are always separated by a distance of at least \ell.
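The structure of (9) can be checked numerically: two interpolations built once on the fixed domains \tilde{X} and \tilde{Y} reproduce the kernel on any well-separated pair of boxes, the boxes entering only through their centers. The sketch below stands in for the EIM with a greedy cross interpolation (selected rows and columns of the kernel); the kernel 1/|x-y|, the domain \Omega = [0, 1] and the grids are illustrative assumptions:

```python
import numpy as np

# Check of the localization mechanism of Section 3.1: two greedy cross
# interpolations built on the fixed translated domains Xtilde and Ytilde;
# any well-separated pair (tau, sigma) is then handled using only the box
# centers c_tau and c_sigma.
def K(x, y):
    return 1.0 / np.abs(x - y)

ell, c = 0.125, 0.5                       # box length, center of Omega=[0,1]
Yt = np.linspace(c - ell / 2, c + ell / 2, 60)            # Ytilde grid
s = np.linspace(c - 1 + ell / 2, c + 1 - ell / 2, 400)    # Xtilde candidates
Xt = s[np.abs(s - c) >= 1.5 * ell]                        # Xtilde grid

def pivots(A, tol=1e-8):
    # Greedy full-pivot selection; returns the chosen (row, column) pairs.
    R, P = A.copy(), []
    ref = np.abs(A).max()
    while np.abs(R).max() > tol * ref:
        i, j = np.unravel_index(np.argmax(np.abs(R)), R.shape)
        P.append((i, j))
        R = R - np.outer(R[:, j], R[i, :]) / R[i, j]
    return P

# Interpolation 1 on Xtilde x Ytilde, interpolation 2 with roles swapped.
p1 = pivots(K(Xt[:, None], Yt[None, :]))
u1, v1 = Xt[[i for i, _ in p1]], Yt[[j for _, j in p1]]
p2 = pivots(K(Yt[:, None], Xt[None, :]))
u2, v2 = Yt[[i for i, _ in p2]], Xt[[j for _, j in p2]]
W1inv = np.linalg.inv(K(u1[:, None], v1[None, :]))
W2inv = np.linalg.inv(K(u2[:, None], v2[None, :]))

# A well-separated pair of boxes; only the centers appear below.
c_tau, c_sigma = 3.5 * ell, 6.5 * ell
x = np.linspace(c_tau - ell / 2, c_tau + ell / 2, 25)
y = np.linspace(c_sigma - ell / 2, c_sigma + ell / 2, 25)

Fx = K(x[:, None] + c - c_tau, v2[None, :])          # depends on x - c_tau
T = K(u2[:, None], v1[None, :] + c_sigma - c_tau)    # depends on c_sigma - c_tau
Fy = K(u1[:, None], y[None, :] + c - c_sigma)        # depends on y - c_sigma
approx = Fx @ W2inv @ T @ W1inv @ Fy

exact = K(x[:, None], y[None, :])
rel_err = np.abs(exact - approx).max() / np.abs(exact).max()
```

As in (9), the target-side factor depends only on x - c_\tau, the source-side factor only on y - c_\sigma, and the transfer matrix only on c_\sigma - c_\tau, so the same precomputed data serves every well-separated pair.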
3.2 A monolevel multipole method
Consider a set of target points (x_i)_{1\le i\le N} \subset \Omega, source points (y_j)_{1\le j\le N} \subset \Omega, and potentials (w_j)_{1\le j\le N}, with N \in N^*. We look for a low-complexity approximation of
(10) f(x_i) = \sum_{j=1}^{N} K(x_i, y_j) w_j, 1 \le i \le N.
We say that two boxes \tau and \sigma are well-separated if dist(\tau, \sigma) \ge \ell. The set of boxes well-separated from \tau is denoted \Lambda(\tau); its complement in the partitioning of \Omega is the neighborhood of \tau and is denoted \mathcal{N}(\tau). The FMM aims at compressing the “far” interactions in the following fashion: for x_i \in \tau,
(11) f(x_i) = \sum_{\sigma \in \mathcal{N}(\tau)} \sum_{y_j \in \sigma} K(x_i, y_j) w_j + \sum_{\sigma \in \Lambda(\tau)} \sum_{y_j \in \sigma} K(x_i, y_j) w_j,
where the second term is compressed using the low-rank approximation (9).
A mono-level multipole method can be directly derived from (9) to approximate the far interactions, see Algorithm 2. The obtained approximation of the far part is denoted f_{far}(x_i). An approximation \hat{f}(x_i) of f(x_i) is then given by
(12) \hat{f}(x_i) = f_{near}(x_i) + f_{far}(x_i),
where the “near” interactions are defined by
(13) f_{near}(x_i) = \sum_{\sigma \in \mathcal{N}(\tau)} \sum_{y_j \in \sigma} K(x_i, y_j) w_j, x_i \in \tau.
Step number  Complexity
1  O(N M)
2  O(N_b M^2)
3  O(N_b^2 M^2)
4  O(N_b M^2)
5  O(N M)
Table 1 gives the complexity of each step of Algorithm 2. For step 3 and the computation of the near interactions to be of the same order in N, one must impose N_b = O(N^{2/3}), leading to an overall complexity of the sum of order O(N^{4/3}). As classically done in the literature, the derivation of a multilevel multipole method makes it possible to lower the complexity in N of the algorithm to linear.
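The mono-level scheme can be sketched end to end: moments per source box, a transfer depending only on the difference of box centers, evaluation per target box, and a direct near-field sum. Cross interpolation again stands in for the EIM, and the kernel, geometry and sizes are illustrative assumptions:

```python
import numpy as np

# Mono-level fast summation sketch: P2M (moments per source box), M2L
# (transfer through kernel evaluations depending only on c_sigma - c_tau),
# L2P (evaluation at the targets), plus a direct near-field sum.
def K(x, y):
    return 1.0 / (1.0 + (np.asarray(x)[..., None] - np.asarray(y)) ** 2)

rng = np.random.default_rng(2)
Nb = 16                                   # number of boxes
ell, c = 1.0 / Nb, 0.5                    # box length, center of Omega=[0,1]
centers = (np.arange(Nb) + 0.5) * ell
N = 1600
pts = rng.uniform(0.0, 1.0, N)            # sources = targets here
w = rng.standard_normal(N)                # potentials
box = np.minimum((pts / ell).astype(int), Nb - 1)

# Fixed translated domains and the two cross interpolations.
Yt = np.linspace(c - ell / 2, c + ell / 2, 60)
s = np.linspace(c - 1 + ell / 2, c + 1 - ell / 2, 400)
Xt = s[np.abs(s - c) >= 1.5 * ell]

def pivots(A, tol=1e-8):
    R, P = A.copy(), []
    ref = np.abs(A).max()
    while np.abs(R).max() > tol * ref:
        i, j = np.unravel_index(np.argmax(np.abs(R)), R.shape)
        P.append((i, j))
        R = R - np.outer(R[:, j], R[i, :]) / R[i, j]
    return P

p1 = pivots(K(Xt, Yt))
u1, v1 = Xt[[i for i, _ in p1]], Yt[[j for _, j in p1]]
p2 = pivots(K(Yt, Xt))
u2, v2 = Yt[[i for i, _ in p2]], Xt[[j for _, j in p2]]
W1inv, W2inv = np.linalg.inv(K(u1, v1)), np.linalg.inv(K(u2, v2))

# P2M: moments of each source box, premultiplied by the inverse cross matrix.
mom = [W1inv @ (K(u1, pts[box == b] + c - centers[b]) @ w[box == b])
       for b in range(Nb)]

f_hat = np.zeros(N)
for t in range(Nb):
    sel = box == t
    # M2L: gather well-separated boxes through the transfer matrix.
    L = np.zeros(len(u2))
    for b in range(Nb):
        if abs(centers[t] - centers[b]) >= 2 * ell:
            L += K(u2, v1 + centers[b] - centers[t]) @ mom[b]
    # L2P: far field at the targets of box t.
    f_hat[sel] = K(pts[sel] + c - centers[t], v2) @ (W2inv @ L)
    # Near field: direct summation over the neighboring boxes.
    nb_sel = (np.abs(centers - centers[t]) < 2 * ell)[box]
    f_hat[sel] += K(pts[sel], pts[nb_sel]) @ w[nb_sel]
```

The direct O(N^2) sum serves as reference; the fast evaluation agrees with it to high accuracy while the far field only ever touches the small precomputed factors.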
3.3 A multilevel multipole method
The domain of computation is now organized as a binary tree with \kappa + 1 levels. Level 0 is the root of the tree: there is one box equal to \Omega; level 1 contains the partitioning of \Omega into two boxes of length |\Omega|/2, and the following levels are obtained by partitioning each box of the previous level in the same fashion, up to level \kappa.
A superscript k is now added to the quantities previously defined, to indicate that they depend on the level k. Two EIMs are to be considered at each level, in the same fashion as in Section 3.1, where now the domains depend on the level in the tree: \ell_k := |\Omega| 2^{-k}, \tilde{Y}^k is the box of length \ell_k located at the center of \Omega, and \tilde{X}^k is the corresponding union of translated well-separated boxes.
For symmetric kernels, this amount of precomputation can be lowered to one EIM per level. The key to a multilevel method stands in the ability to derive recursion formulae to propagate information from the level k+1 to the level k in the upward pass, and from the level k-1 to the level k in the downward pass.
The Black-Box FMM, at least in its original version, is proposed with a constant order of interpolation, which makes it possible to propagate the information through the levels of the tree without introducing additional errors. Considering a variable-order Chebyshev interpolation is possible, but one must then carefully analyse the introduced error, as done in [6] in the context of H²-matrices. In our method, the domains of approximation depicted in Figure 1 naturally enable a variable-order interpolation depending on the level of the tree, for which the introduced error is the same as the EIM approximation error in (9). As a consequence, the variable-order aspect of the approximation is directly obtained at the required accuracy, and the length of the expansion is automatically optimized at each level of the tree, which leads to important gains for inhomogeneous kernels.
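The level dependence can be illustrated numerically: for a kernel with an intrinsic length scale (a Gaussian here, an assumption for the sake of illustration), the rank needed on the level-k domains to reach a fixed absolute tolerance is much smaller at coarse levels, where well-separated boxes only see the vanishing tail of the kernel:

```python
import numpy as np

# Number of greedy steps (rank) needed per tree level to reach a fixed
# absolute tolerance, for a Gaussian kernel of width 0.05 on Omega = [0, 1].
def K(x, y):
    return np.exp(-(((x[:, None] - y[None, :]) / 0.05) ** 2))

def rank_to_tol(A, tol):
    R, r = A.copy(), 0
    while np.abs(R).max() > tol and r < min(R.shape):
        i, j = np.unravel_index(np.argmax(np.abs(R)), R.shape)
        R = R - np.outer(R[:, j], R[i, :]) / R[i, j]
        r += 1
    return r

c = 0.5
ranks = {}
for k in range(2, 7):                 # levels of the binary tree
    ell = 2.0 ** (-k)                 # box length at level k
    Yt = np.linspace(c - ell / 2, c + ell / 2, 40)
    s = np.linspace(c - 1 + ell / 2, c + 1 - ell / 2, 300)
    Xt = s[np.abs(s - c) >= 1.5 * ell]
    ranks[k] = rank_to_tol(K(Xt, Yt), 1e-6)
```

At coarse levels the boxes are far apart relative to the kernel width and very few terms are needed; the rank grows toward the fine levels, which is the situation where a per-level, tolerance-driven choice of the number of interpolation points pays off.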
3.3.1 Recursion for the upward pass: multipole to multipole (M2M)
The goal is to compute
(14) M^{\sigma}_{m} := \sum_{y_j \in \sigma} q^{1,k}_{m}(y_j + c - c_\sigma) w_j,
for all boxes \sigma at all levels k with 2 \le k \le \kappa and all 1 \le m \le M_k, by recursion. The initialization of the recursion consists in computing M^{\sigma}_{m} at level \kappa using (14). Then, suppose the quantities M^{\sigma'}_{m'} are available at level k+1; there holds
(15) M^{\sigma}_{m} = \sum_{\sigma' \in C(\sigma)} \sum_{y_j \in \sigma'} q^{1,k}_{m}(y_j + c - c_\sigma) w_j,
where C(\sigma) denotes the children boxes of \sigma. To obtain a recursion on M^{\sigma}_{m}, i.e. to compute M^{\sigma}_{m} using some values of M^{\sigma'}_{m'}, for \sigma' being boxes at the level k+1, we use the EIM approximations already computed at level k+1: the functions q^{1,k}_{m}(\cdot + c - c_\sigma) appearing in (15) are interpolated on the level-(k+1) EIM basis. One must check that the corresponding EIM approximations are evaluated on the right domains, namely \tilde{X}^{k+1} and \tilde{Y}^{k+1}.
3.3.2 Transfer pass: multipole to local (M2L)
We denote I(\tau) the set of boxes that are the children of the boxes in the neighborhood of the parent of \tau but are well-separated from \tau at level k, and call it the interaction list of \tau, see Figure 2.
The M2L step consists in computing
(19) T^{\tau}_{n} := \sum_{\sigma \in I(\tau)} \sum_{m=1}^{M_k} \sum_{m'=1}^{M_k} D^{1,k}_{m,m'} K(\bar{x}^{2,k}_{n}, \bar{y}^{1,k}_{m} + c_\sigma - c_\tau) M^{\sigma}_{m'},
for all boxes \tau at all levels k with 2 \le k \le \kappa and all 1 \le n \le M_k.
3.3.3 Recursion for the downward pass: local to local (L2L)
The goal is to compute, by recursion, local expansion coefficients L^{\tau}_{n'} such that, for all boxes \tau at all levels k with 2 \le k \le \kappa, all 1 \le n' \le M_k and all x \in \tau,
(20) \sum_{l=2}^{k} \sum_{\sigma \in I(\tau^{(l)})} \sum_{y_j \in \sigma} K(x, y_j) w_j \approx \sum_{n'=1}^{M_k} q^{2,k}_{n'}(x + c - c_\tau) L^{\tau}_{n'},
where \tau^{(l)} denotes the ancestor of \tau at level l (with \tau^{(k)} = \tau). The initialization of the recursion consists in computing, at level 2, L^{\tau}_{n'} = \sum_{n=1}^{M_2} D^{2,2}_{n,n'} T^{\tau}_{n}. Then, suppose the coefficients L^{P(\tau)}_{n''} are available at level k-1; there holds
(21) L^{\tau}_{n'} = \sum_{n=1}^{M_k} D^{2,k}_{n,n'} \Big( T^{\tau}_{n} + \sum_{n''=1}^{M_{k-1}} q^{2,k-1}_{n''}(\bar{x}^{2,k}_{n} + c_\tau - c_{P(\tau)}) L^{P(\tau)}_{n''} \Big),
where P(\tau) denotes the parent of \tau: the contribution gathered by the parent is re-interpolated on the level-k EIM basis.