Sparsify and sweep: an efficient preconditioner for the Lippmann-Schwinger equation

Sparsify and sweep: an efficient preconditioner for the Lippmann-Schwinger equation

Abstract

This paper presents an efficient preconditioner for the Lippmann-Schwinger equation that combines the ideas of the sparsifying and the sweeping preconditioners. Following first the idea of the sparsifying preconditioner, this new preconditioner starts by transforming the dense linear system of the Lippmann-Schwinger equation into a nearly sparse system. The key novelty is a newly designed perfectly matched layer (PML) stencil for the boundary degrees of freedoms. The resulting sparse system gives rise to fairly accurate solutions and hence can be viewed as an accurate discretization of the Helmholtz equation. This new PML stencil also paves the way for applying the moving PML sweeping preconditioner to invert the resulting sparse system approximately. When combined with the standard GMRES solver, this new preconditioner for the Lippmann-Schwinger equation takes only a few iterations to converge for both 2D and 3D problems, where the iteration numbers are almost independent of the frequency. To the best of our knowledge, this is the first method that achieves near-linear cost to solve the 3D Lippmann-Schwinger equation in high frequency cases.

Keywords. Lippmann-Schwinger equation, acoustic and electromagnetic scattering, quantum scattering, preconditioner

AMS subject classifications. 65F08, 65F50, 65N22, 65R20, 78A45

1Introduction

This paper concerns the time-harmonic scattering problem

where is the given incoming wave, is the scattered field to solve, is the angular frequency and is the velocity field such that outside some bounded region . See Figure 1 for an example. The incoming wave satisfies the homogeneous Helmholtz equation

[!ht]

Figure 1: An example of the incoming wave u_I(x) and the scattered field u(x).
Figure 1: An example of the incoming wave and the scattered field .

Let be the perturbation field. Rewriting in terms of we have

Let be the Green’s function of the free space Helmholtz equation

Convolving both sides of with gives

which is known as the Lippmann-Schwinger equation written in terms of the scattered field .

Solving the integral equation has several advantages compared to solving . First, since is compactly supported, we only need to solve in . The scattered field for is explicitly given by once in is known. More importantly, the resulting wave field in automatically satisfies the Sommerfeld radiation condition. On the contrary, for one has to truncate the domain to some bounded region and impose appropriate boundary conditions to simulate the radiation condition. Second, most local discretizations of suffer from the pollution effect [4] due to inaccurate dispersion relations. avoids this problem by leveraging the Green’s function explicitly in the equation.

However, discretizing also raises several issues. First, the resulting linear system is dense. By the Nyquist theorem, a constant number of points per wavelength is needed to capture the oscillations, thus the number of degrees of freedom is at least . In high frequency cases, can be rather large where it is impractical to solve general dense linear systems with direct method. Second, the discretized system can have very large condition number for non-negligible perturbations due to multiple scattering when is large. As a result, most standard iterative solvers require a large number of iterations to converge.

Recently, several progresses have been made to solve the Lippmann-Schwinger equation [2]. [28] proposes a numerical scheme that has spectral accuracy for smooth media by truncating the interactions on the physical domain. [1] presents an adaptive method for the Lippmann-Schwinger equation in 2D. [8] solves the 2D Lippmann-Schwinger equation with a technique which is now often referred to as recursive interpolative factorization or recursive skeletonization, where the setup cost is and the solve cost is . [31] approximates the discretized dense system by a sparse system, and applies the nested dissection factorization [15] to the sparse system as a preconditioner to the original dense system. The costs are dominated by merely the nested dissection solver, which are and for setup and solve in 2D, and for setup and solve in 3D respectively. [33] combines the sparsifying preconditioner [31] with the method of polarized traces [32] to design a preconditioner for the Lippmann-Schwinger equation in 2D, which achieves setup and solve costs. As far as we know, [33] is the first to achieve near-linear cost in 2D high frequency cases.

Meanwhile, a series of domain decomposition methods were developed to solve the Helmholtz equation with Sommerfeld radiation condition [13]. The idea is to divide the domain into slices and impose suitable transmission conditions between these slices. These methods reduce the computational costs to for setup and for solve in 2D, and for setup and for solve in 3D, which is a notable improvement over the nested dissection method. A recursive technique [20] further reduces both the setup and solve costs in 3D to .

This work combines the sparsifying preconditioner in [31] with the sweeping preconditioner in [14] to develop a new preconditioner which solves the Lippmann-Schwinger equation in near-linear cost. The sketch of the method is as follows. We first construct two types of compact stencil schemes to approximate the discretized dense system by a sparse system, and then apply the sweeping factorization to the sparse system. The solving process of the sweeping factorization induces an approximating solution, which defines a preconditioner to the original system. The setup and application costs are and in 2D and and in 3D respectively. Furthermore, the costs in 3D can be reduced to for setup and for application by a recursive sweep similar to [20]. When combined with the standard GMRES solver, the preconditioner only needs a few iterations to converge, where the iteration number is almost independent of the angular frequency as shown by the numerical results. To the best of our knowledge, this is the first algorithm to solve the Lippmann-Schwinger equation in near-linear cost in 3D high frequency cases.

Another highlight of this work is the newly designed compact stencil introduced for the preconditioner. The design approach focuses on fitting the stencils to the wave data given by the analytic expressions such as the Green’s function. This approach is quite different from the state-of-the-art methods [26] to design compact stencils, which focus more on the analytic property of the underlying differential operator. Numerical results show that, when used as a method for solving the Helmholtz equation, this scheme is comparably as accurate as the Quasi-Stabilized FEM (QSFEM) method in [5] in terms of the phase error.

The rest of the paper is organized as follows. Sections Section 2 and Section 3 present the preconditioners and the numerical results in 2D and 3D respectively, where the detailed approach is explained in Section 2 for the 2D case, and Section 3 generalizes it to 3D with necessary modifications. Section 4 presents numerical results to show the validity of the compact stencil sparsifying scheme presented in this work when used as a direct method. Conclusions and future work are given in Section 5.

2Preconditioner in 2D

This section describes the preconditioner for the 2D Lippmann-Schwinger equation. Starting by formalizing the dense linear system obtained from discretization, we transform it into an approximately sparse one by introducing two types of compact stencils. After that, the sweeping factorization is used to solve the truncated sparse system approximately. The whole process can then be treated as a preconditioner for the original dense system of the Lippmann-Schwinger equation.

2.1Problem formulation

Without loss of generality, we assume that and that is supported in . The task is to discretize the Lippmann-Schwinger equation and solve for in .

The domain is discretized by a uniform Cartesian grid, which allows for the rapid evaluation of the convolution in by FFT. Let be the number of grid points per unit length, be the step size, and be the number of degrees of freedom.

Denote as the 2D index point and as the grid point with step size by

Let be the index set of the grid points in and be the set of the corresponding grid points, given by

We also introduce as the index set for and as the boundary index set by

and correspondingly we have and as

Let be the numerical solution of at for . To compute the integral in , we use the Nyström method

where

and is the weight given by a quadrature correction at the singular point of at , which achieves accuracy when is smooth [12]. This gives the discretized equation

where

and is the discrete value of the incoming wave. Higher order quadrature can be achieved by using more extended local quadrature correction [12].

With a slight abuse of the notations, we extend the discrete vectors and to the whole 2D grid by zero padding

Introducing matrix with , can be written into a more compact form

where .

A subtle difference between and is that, is a set of equations for the unknowns with indices , while can be regarded as an equation set defined on the infinite index set , where the unknown vector is also extended to the whole 2D grid with the extension value determined by the equation implicitly. We have two observations for

  • The solution of agrees with the one of in . To get the numerical solution of in , we can solve and then restrict the solution to instead of solving .

  • The solution of does not match the numerical solution of outside since the zero padding of differs from the discretized value of the right-hand side of in . Nonetheless, this is not an issue as we only care about the solution of in .

One may ask: why do we extend the discrete domain to the infinite grid and consider a problem with infinite size? Besides, the zero padding pattern of seems rather irrational as it creates discontinuities at . The answer is that, we are not going to actually solve the -size problem. The purpose of extending the unknown to a larger domain is to introduce the wave attenuation by PML on the extended grid to simulate the Sommerfeld radiation condition as we shall see in Section ?. The zero padding of is to ensure that there is no source outside such that the PML approximation holds.

The reader may notice that, if we just use the discretized value of the right-hand side of defined on the whole plane, the solution will also satisfy the Sommerfeld condition, so it seems meaningless to introduce the zero padding. It is true that the right-hand side of on the whole plane will induce a solution satisfying the radiation condition. However, in some cases, when solving , we are only given defined in without knowing the actual incoming wave , and it’s computationally impractical to get the extension of determined by . This is especially true when we develop preconditioners where the input only involves the right-hand side in the domain of interest.

With the extended problem , we are now ready to build a sparse system to approximate .

2.2Sparsification

In this section, we adopt the idea of the sparsifying preconditioner [31] to build a sparse system which serves as an approximation to . The sparse system to be constructed has the same sparsity pattern as a compact stencil scheme, i.e., each equation only involves the unknowns at one grid point and its neighbor points, unlike where each equation is dense in .

To be specific, we define as the neighborhood for the index

Now the task is to build for each point a local stencil supported only in . We shall build two types of stencils in what follows. The first type is for the interior points, while the second type is for the points near the boundary which are inside what we call “the PML region”.

The perfectly matched layer (PML) [6] is a technique to attenuate the waves exponentially near the boundary of the domain so that the zero Dirichlet boundary conditions can be imposed directly to simulate the radiation boundary condition without bringing in too much error. We will explain the PML usage during the construction of the second type of the stencils.

Let’s start with extending our domain . Denote as with an -size extension, as the PML extension of with width , given by

Here is the PML width, where is the number of discrete layers in each side of the PML region. is typically around one wavelength. The PML region is where we will attenuate the scattered field in Section ?. Note that there is a small -distance between the domain of interest and the PML region . This small distance is introduced on purpose and the reason will be clear later.

The corresponding index sets in these regions are

Similar to the notations and , we introduce , etc, as the corresponding grid point sets, boundary sets, closures and so on. The meanings are straightforward and we omit the formal definitions. See Figure 2 for an illustration.

[!ht]

Figure 2: This figure is an illustration of \Omega, \Omega^h, \Omega^{h+\eta}, \mathcal{D}, \mathcal{D}^{h}, etc where n=18, b=4. The inner green box is the boundary of \Omega, the outer green box (with blue points on it) is the boundary of \Omega^h and the big blue box (with gray points on it) is the boundary of \Omega^{h+\eta}. The set of the all the green points is \mathcal{D}^h, and the set of all the blue points is \mathcal{D}^{h+\eta}\setminus \mathcal{D}^h. \mathcal{D} is the set of all the green points strictly inside the inner green box, while \partial \mathcal{D} consists of the points exactly located on the inner green box. The gray points form \partial \mathcal{D}^{h+\eta} where we will impose the zero Dirichlet boundary conditions after wave attenuation. Each of the 3\times 3 green grid corresponds to a neighborhood \mu_i for some i \in \mathcal{I}^h where we construct stencils of the first type. Correspondingly, the 3\times 3 blue grids are for \mu_i with i \in \mathcal{I}^{h+\eta} \setminus \mathcal{I}^h where we build stencils of the second type.
Figure 2: This figure is an illustration of , etc where . The inner green box is the boundary of , the outer green box (with blue points on it) is the boundary of and the big blue box (with gray points on it) is the boundary of . The set of the all the green points is , and the set of all the blue points is . is the set of all the green points strictly inside the inner green box, while consists of the points exactly located on the inner green box. The gray points form where we will impose the zero Dirichlet boundary conditions after wave attenuation. Each of the green grid corresponds to a neighborhood for some where we construct stencils of the first type. Correspondingly, the blue grids are for with where we build stencils of the second type.

We now describe how to design two types of stencils for the unknowns indexed by : first for the ones in and then for the ones in . At the end, we assemble them together to form our sparse system.

Stencils for the interior points in

Following the approach in [31], we design the first type of the stencils for the neighborhood where (see Figure 2, the green grids). Taking out the equations in indexed by we have

which can be written as

Here are some explanations for the notations in :

  • The subscript stands for the corresponding vector restricted to the index set , for example, is the vector of the elementwise multiplication of and restricted to .

  • , which is the complement of with respect to .

  • is the sub-matrix of with row index set and column index set .

Let’s consider a linear combination of the equations in . Suppose is a column vector supported on . Multiplying both sides of by gives

where is the conjugate transpose of .

To design a local stencil, we hope that the resulting equation only involves unknowns indexed by . Observing the left-hand side of , we found if , then we can truncate the terms involving and the resulting equation will be local. But does there exist an such that ? The answer is yes. The reason is that the elements of are defined by the Green’s function , which satisfies

Each column of the matrix can be treated as the Green’s function centered at some grid point indexed by and evaluated at the points indexed by the neighborhood , which does not involve the singular point of at . Thus it’s reasonable to expect some local stencil , which can be thought of as a discretization of the local operator , such that . By the translational invariance of the Green’s function, to find such , it suffices to require that , where

which means that we can translate the index to the origin and consider an equivalent problem. Here the complement of is taken with respect to a larger index set. The reason is that, when we translate different indices to the origin, the corresponding complement will also be translated. The larger set is taken as the union of all those translated complements to ensure that the condition is sufficient.

To minimize , we consider the optimization problem

The solution is the left singular vector corresponding to the smallest singular value of , which can be solved in .

Once we have , we compute by setting

Then can be approximated as

This defines the local stencil for each .

Note that, if we do the same thing for , the right-hand side will be due to the zero padding of . If we build the stencils for all and combine them with the Sommerfeld radiation condition at infinity, it will induce a discrete DtN map at . This linear map, though existing in theory, is dense and expensive to compute. Section ? circumvents this issue by exploiting PML on the extended domain and introducing the second type of stencils to approximate this dense map efficiently.

Now why do we introduce the -size padded domain and build the first type of stencils for rather than just for ? The reason is that, for , is not necessarily zero, thus we cannot assign to the second type where the corresponding right-hand side is zero. So we enlarge by -size and build stencils of the first type for . Figure 3 shows this subtlety in 1D as an illustration.

[!ht]

Figure 3: This figure is an illustration of the supports of the discrete vectors g and f in 1D where n= 18, b =4 and f_i \coloneqq \alpha^* g_{\mu_i}. We see that by introducing the stencil \alpha, the support of g is enlarged to the support of f by one grid point on each side.
Figure 3: This figure is an illustration of the supports of the discrete vectors and in 1D where and . We see that by introducing the stencil , the support of is enlarged to the support of by one grid point on each side.

Stencils for the PML points in

Next, we design the stencils for (see Figure 2, the blue grids). Define the auxiliary function

where is some positive constant. We attenuate the scattered field in the PML region by introducing the complex stretching

By changing variable from to , we know that the function satisfies the modified Helmholtz equation in the PML region

A simple way to build local stencils for is to discretize explicitly with some local scheme such as the central difference scheme. Unfortunately, it turns out to be not accurate enough to do so. We adopt a different approach. The idea is similar to what we did in the previous section, where we aim to find some local stencil to annihilate a set of given functions evaluated at the points indexed by . In what above, we used the Greens function to design the stencil . Here we use a set of “modified plane waves” to achieve the same goal.

Specifically, we first note that the plane wave function

satisfies the free space Helmholtz equation

Let be the complex stretching of . We immediately have that satisfies by definition. If we were to design a local stencil for where , we would hope that , where is the function evaluated at the grid points indexed by . Note that any direction such that induces a “modified plane wave” . We hope to solve by annihilating as many as possible. To be precise, Let be a set of directions where the elements are sampled uniformly from the unit circle , and be a matrix of size , each column of which is a modified plane wave function with a direction , evaluated at the grid points indexed by . Then we solve by

Intuitively, it’s better to increase the sample size to improve the reliability of the stencil. However, larger sample size also leads to more computational cost. Fortunately, it turns out that not too many samples are needed for a reliable result. It suffices to use only the eight most common directions – north, south, west, east, northwest, northeast, southwest and southeast – to form , and is given by the vector perpendicular to the eight corresponding vectors on . Note that the solution is unique up to a coefficient since we have independent modified plain waves and the size of the neighborhood is .

In the PML region, we need to compute different stencils for different neighborhoods due to the lack of translational invariance as a result of the complex stretching. Nevertheless, by the symmetry of the stretching, we only need to compute the stencils near a corner of , which takes only work in total. See Figure 4 for an illustration.

[!ht]

Figure 4: By the symmetry and the translational invariance of the complex stretching along each dimension, one only needs to compute the PML stencils for the points marked with red color near the top right corner.
Figure 4: By the symmetry and the translational invariance of the complex stretching along each dimension, one only needs to compute the PML stencils for the points marked with red color near the top right corner.

We denote as the stencil for , then the corresponding approximating equation is

This defines the local stencil for each .

Assemble together

Assembling and together and noting that for , we have

where , , and are given in , and respectively. Noticing also that almost satisfies the zero Dirichlet boundary conditions

we can introduce the sparse linear system

for the unknown defined on that serves as an approximation to . In what follows, we write this system conveniently as

where the right-hand side is given by

The system is defined on . To get the unknowns on , we simply solve and extract the solutions on . The result is an approximation to the true solution of , and this process can serve as a preconditioner for the linear system . In the next section, we present an approach for approximating the solution of efficiently by leveraging the idea of the sweeping preconditioner.

2.3Sweeping factorization

In this section, we adopt the sweeping factorization to solve the sparse system approximately. The main idea of the sweeping factorization is to divide the domain into slices and eliminate the unknowns slice by slice. An auxiliary PML region is introduced for each slice to build a subproblem to approximate the inverse of the Schur complement during the Gaussian elimination to save computational cost.

To be specific, we first divide the 2D grid into slices along the direction. Each slice contains only a few layers. The leftmost slice contains the left PML region and the rightmost one contains the right PML region (see Figure 5). For simplicity, we assume that each of the middle slices contains layers and each of the two boundary slices contains layers – normal layers plus attenuating layers in the PML region. Let be the discrete points in each slice correspondingly, and define and as the restrictions of and on respectively. The sparse system can be written as the block tridiagonal form

where ’s are the corresponding sparse blocks. Note that we use the bracket subscripts to emphasize that the corresponding unknowns are grouped together in each slice.

[!ht]

Figure 5: The grid points are divided into 5 slices along the x_1 direction. Each of the middle slices contains 4 layers, and each of the two boundary ones contains 4 more attenuating layers in the PML region.
Figure 5: The grid points are divided into slices along the direction. Each of the middle slices contains layers, and each of the two boundary ones contains more attenuating layers in the PML region.

We introduce the Schur complement and its inverse slice by slice recursively

Then we can solve by the Gaussian elimination

The expensive part of the above process is to compute and apply it to the vectors on . If say we formed directly, the computation would take steps and the application steps. The sweeping factorization reduces the cost by approximating with a subproblem. To introduce the approximation, we first make a key observation of the operator : inverting the top left block of , one notices that appears at the bottom right block of the resulting matrix. In other words

This means is the restriction of to . Think of as an operator from some input vector to on the grid . Then given , we can compute by solving the equation

where is exactly equal to . That is to say, given , we can find by padding with zeros on , solving the unknowns on by and then extracting the solution on .

Note that the right-hand side of is zero on , thus the only role of the first blocks of equations in is to induce the radiation condition at the left boundary of implicitly. To simulate this radiation condition, one can directly put the PML region to the left side of instead of putting it far away on . That’s the key idea of the sweeping factorization: move the PML region adjacent to the domain of interest and approximate the operator by solving a much smaller system compared to (see Figure 7).

[!ht]

Figure 6: This figure is an illustration of the moving PML method. Left: T_{[3]} is the restriction of H_{[1:3,1:3]}^{-1} to the discrete domain \mathcal{D}_3. Right: We move the PML adjacent to \mathcal{D}_3 to induce the approximation operator \tilde T_{[3]} as the restriction to \mathcal{D}_3 of the corresponding subproblem on \mathcal{D}_{2:3}.
Figure 6: This figure is an illustration of the moving PML method. Left: is the restriction of to the discrete domain . Right: We move the PML adjacent to to induce the approximation operator as the restriction to of the corresponding subproblem on .
Figure 7: This figure is an illustration of the moving PML method. Left: T_{[3]} is the restriction of H_{[1:3,1:3]}^{-1} to the discrete domain \mathcal{D}_3. Right: We move the PML adjacent to \mathcal{D}_3 to induce the approximation operator \tilde T_{[3]} as the restriction to \mathcal{D}_3 of the corresponding subproblem on \mathcal{D}_{2:3}.
Figure 7: This figure is an illustration of the moving PML method. Left: is the restriction of to the discrete domain . Right: We move the PML adjacent to to induce the approximation operator as the restriction to of the corresponding subproblem on .

By introducing the modified plain waves, we can build the local stencils for points in the auxiliary PML region on the left of similar to what was done in Section ?. A subtle difference is that, the local spacial frequency is perturbed to instead of at location , and we need to use this local frequency to build the local stencil for each point.

To save computational cost of the stencil construction, we do not use the exact value of the local frequency. Though building local stencil in the PML region with the exact local frequency takes only constant steps per point in theory, the constant is not small since it involves finding the kernel of a matrix. Instead, we consider the square frequency range:

We choose some samples uniformly from this range interval, and build local stencils only for those samples. Then for each point in the PML region, we assign the stencil to be the one from the samples with the closest local square frequency value, a technique introduced earlier in [21]. In practice, only samples will be enough for an accurate approximation. So it only takes steps to build these stencils, which is negligible compared to the problem size . An intuition of why we only need samples is that, is the size of the variation in one neighborhood on average, so there’s no need to make the sampling scale smaller than that.

With the auxiliary PML region on the left of , we can solve a much smaller system instead of solving . In our setting, the set of the auxiliary PML points for is just since the width of the PML region is the same as . The auxiliary system can be written as

where the bottom block of equations is inherited from , and the top block is defined by the local stencils of the second type in the auxiliary PML region, the role of which is to simulate the radiation boundary condition on the left of .

A minor problem here is that the auxiliary PML region for consists only the normal layers in rather than all the layers, so needs a slight modification for : we restrict the columns of to the normal layers in so that the two blocks are compatible. This problem is inessential and the patch here is only to make the discussion strictly correct. In practice, the width of the slices and the PML regions can be rather flexible.

Equation defines an approximating operator for by restricting the system on to . Note that for , is exactly equal to if we treat as naturally. Compared to , Equation is a much smaller quasi-1D problem, which can be solved efficiently with the LU factorization.

2.4Putting together

We now have all the tools needed to design a linear-complexity preconditioner for the discretized Lippmann-Schwinger equation . The setup and application processes of the preconditioner are given by Algorithms ? and ? respectively. The slice width is typically a small integer less than , thus both the setup and the application costs are linear.

We would like to make some comments below for the actual implementation of the algorithm.

  1. The algorithm presented above constructs the sweeping factorization along the direction from left to right. Indeed, since we have radiation conditions on all sides of the domain, we can construct the factorization from both sides and sweep toward the middle slice. The two sweeping fronts can be processed independently until they meet in the middle, where they exchange some local information in the middle slice and then sweep back to the boundaries independently. This is potentially helpful for the parallelization of the algorithm.

  2. The widths of the slices and the auxiliary PML regions are completely arbitrary. There are two reasons why we set them to be uniformly in what above. The first is for the simplicity of discussion. The second is that, given the PML width , it is optimal to set the width of each slice to be also to minimize the setup and application costs of the preconditioner. In practice, it may not be possible to uniformly divide the domain where each slice contains layers exactly. In that case, we change the widths of one or two slices accordingly, which has negligible effect to the cost and efficiency of the preconditioner.

  3. The constructions of the stencils and , though depending on and , are essentially independent of the velocity field . First, the computation of and only involves the free space Green’s function , where the velocity field is completely irrelevant. Next, for the local PML stencils , they might depend on slightly, but only on the range as we see from the sampling process. In practice , so the range is actually bounded for fixed . Thus we can precompute the stencils without given the velocity field. This means that the stencil construction only needs a fixed cost for given problem size, which can be eliminated from the setup process of the algorithm for the input .

2.5Numerical results

In this section we present the numerical results in 2D. The algorithm is implemented in MATLAB and the tests are performed on a 2.4-GHz server. We force MATLAB to use only one computational thread to test the sequential time cost. The preconditioner is combined with the standard GMRES solver with relative tolerance and restart value . The domain is discretized with where is the typical wavelength.

We choose as the width of the slices and the auxiliary PML regions. This corresponds to about one wavelength width for the PML regions and the slices used in the sweeping preconditioner. The sweeping factorization is built with two fronts sweeping toward the middle slice, and the middle slice is padded with auxiliary PMLs on both sides for the corresponding quasi-1D subproblem.

Four velocity fields are tested in 2D, which are

  1. A converging Gaussian centered at .

  2. A diverging Gaussian centered at .

  3. randomly placed converging Gaussians with narrow width.

  4. A random velocity field that is equal to at .

The incoming wave for each test is a plane wave shooting downward at frequency . The test results are given in Tables Table 1, Table 2, Table 3 and Table 4 respectively. The notations in the tables are listed as follows.

  • is the angular frequency.

  • is the number of unknowns.

  • is the setup cost of the preconditioner in seconds.

  • is the application cost of the preconditioner in seconds.

  • is the iteration number.

  • is the solve cost of the preconditioner in seconds.

[!ht]

image
image
Table 1: Numerical results for velocity field in 2D. Top: The velocity field (left) and the total wave field (right) for . Bottom: Table of the numerical results for different problem sizes.
e e e
e e e
e e e
e e e
e e e

[!ht]

image
image
Table 2: Numerical results for velocity field in 2D. Top: The velocity field (left) and the total wave field (right) for . Bottom: Table of the numerical results for different problem sizes.
e e e
e e e
e e e
e e e
e e e

[!ht]

image
image
Table 3: Numerical results for velocity field in 2D. Top: The velocity field (left) and the total wave field (right) for . Bottom: Table of the numerical results for different problem sizes.
e e e
e e e
e e e
e e e
e e e

[!ht]

image
image
Table 4: Numerical results for velocity field in 2D. Top: The velocity field (left) and the total wave field (right) for . Bottom: Table of the numerical results for different problem sizes.
e e e
e e e
e e e
e e e
e e e

From the numerical tests we observe that both the setup time and the application time scale linearly in , which are in accordance with the complexity analyses. More importantly, the iteration numbers change only slightly as the problem size grows, almost independent of .

We notice that the iteration number also depends on the velocity field. For simple fields such as the diverging Gaussian, it requires less iterations compared to more complicated fields such as the narrower converging Gaussians. This makes sense intuitively since converging lenses and velocity fields with drastic local variations increase the oscillations and refractions of the wave field, thus the corresponding systems are harder to solve. In addition, for the sweeping factorization to work well, we need to assume that there are no strong reflections and refractions during the transmission of the waves so that the auxiliary PMLs in the intermediate slices can make correct approximations to the true underlying DtN maps. In practice, moderate amount of wave-ray bendings can be taken care of by a few more iterations as we see in the tests for the multiple diverging Gaussians and the random field. If the velocity field is even worse, for example, if the field has large region of strong discontinuities, then neither will the Nyström method be able to give an accurate discretization scheme, nor can the sweeping factorization provide an accurate approximating solution due to the strong reflections caused by the discontinuities. Thus for our preconditioner to work, we require certain smoothness from the velocity fields. Nonetheless, as we can tell from the numerical examples, the preconditioner works well even when the fields have drastic transitions in narrow regions. So this approach can be widely applied to many use cases.

3Preconditioner in 3D

This section presents the preconditioner in 3D. As we see from Section 2, the approach is essentially dimension independent and it can be easily generalized to 3D. We will keep the description short, mainly emphasizing the differences compared to the 2D case so that the reader can get the central idea effortlessly. The 3D numerical results for both the recursive approach and the non-recursive approach of the sweeping factorization are provided in the second part of this section.

3.1Problem formulation, sparsification and sweeping factorization

In this section we formulate the approach in 3D. All the notations in 2D can be easily reused without causing any ambiguities. We will keep them unless otherwise stated.

We assume contains the support of . The domain is discretized with step size in each dimension. A similar quadrature correction formula is used for the central weight of the Green’s function, which gives an accuracy of .

For the sparsification process, the first type of stencils and can be constructed similarly, where now each neighborhood has points. For the second type of stencils in the PML region, we use the modified plain waves in 3D, defined similarly as

where now and are in , and is stretched to the complex plane from for all three coordinates. In 2D, the stencil is defined as the kernel vector which annihilates the independent waves shooting toward the eight most common directions. This can be done similarly in 3D. We now need a set of directions, which is defined as

In other words, these are the directions shooting from the center of a neighborhood to the boundary neighbor points.

The computational cost of constructing the stencils in 3D seems higher due to more degrees of freedom and larger size of the neighborhoods. But indeed, the relative cost compared to the sweeping factorization is lower than the 2D case, let alone that the stencil computations are independent of the velocity field and they can be done by a once-in-a-life-time preprocessing.

For the sweeping factorization, the domain are now divided into quasi-2D slices. The auxiliary PMLs are padded to each slice similarly. Each subproblem is quasi-2D, which can be solved efficiently by the nested dissection algorithm with setup cost and application cost. Consisting of subproblems, the whole process has a total setup cost and application cost . Note that the direct use of the nested dissection algorithm to the 3D sparse system costs for setup and for solve. The sweeping factorization drastically reduces the costs by dimension reduction.

For each of the quasi-2D problem, we can sweep similarly along the direction, reducing it to quasi-1D subproblems. This reduces the setup cost to and the application cost to , which are both linear in , but more sensitive to the slice width . We call this the recursive approach [20] while the one in the previous paragraph as non-recursive.

3.2Numerical results

In this section we present the numerical results in 3D. The test configurations are the same as Section 2.5 unless otherwise stated. In the 3D tests, we set for the slice width and PML width.

The four velocity fields tested are

  1. A converging Gaussian centered at .

  2. A diverging Gaussian centered at .

  3. randomly placed converging Gaussians of narrow width.

  4. A random velocity field that is equal to at .

The right-hand side is a plain wave shooting downward at frequency .

The tests of the non-recursive approach are given in Tables Table 5,Table 6, Table 7 and Table 8, and the ones of the recursive approach are in Tables Table 9, Table 10, Table 11, Table 12, where the relative costs compared to the non-recursive approach are also listed as percentages, together with the iteration numbers of the non-recursive ones in the parentheses for the convenience of comparison.

[!ht]

image
image
Table 5: Numerical results for velocity field in 3D with the non-recursive approach. Top: The velocity field (left) and the total wave field (right) in a cross-section view for . Bottom: Table of the numerical results for different problem sizes.
e e e
e e e
e e e
e e e

[!ht]

image
image
Table 6: Numerical results for velocity field in 3D with the non-recursive approach. Top: The velocity field (left) and the total wave field (right) in a cross-section view for . Bottom: Table of the numerical results for different problem sizes.
e e e
e e e
e e e
e e e

[!ht]

image
image
Table 7: Numerical results for velocity field in 3D with the non-recursive approach. Top: The velocity field (left) and the total wave field (right) in a cross-section view for . Bottom: Table of the numerical results for different problem sizes.
e e e
e e e
e e e
e e e

[!ht]

image
image
Table 8: Numerical results for velocity field in 3D with the non-recursive approach. Top: The velocity field (left) and the total wave field (right) in a cross-section view for . Bottom: Table of the numerical results for different problem sizes.
e e e
e e e
e e e
e e e

[!ht]

Table 9: Numerical results for velocity field in 3D with the recursive approach.