1 Introduction

# A fast direct solver for quasi-periodic scattering problems

## Abstract.

We consider the numerical solution of the scattering of time-harmonic plane waves from an infinite periodic array of reflection or transmission obstacles in a homogeneous background medium, in two dimensions. Boundary integral formulations are ideal since they reduce the problem to unknowns on the obstacle boundary. However, for complex geometries and/or higher frequencies the resulting dense linear system becomes large, ruling out dense direct methods, and often ill-conditioned (despite being 2nd-kind), rendering fast multipole-based iterative schemes also inefficient. We present an integral equation based solver with complexity, which handles such ill-conditioning, using recent advances in “fast” direct linear algebra to invert hierarchically the isolated obstacle matrix. This is combined with a recent periodizing scheme that is robust for all incident angles, including Wood’s anomalies, based upon the free space Green’s function kernel. The resulting solver is extremely efficient when multiple incident angles are needed, as occurs in many applications. Our numerical tests include a complicated obstacle several wavelengths in size, with and solution error of , where the solver is 66 times faster per incident angle than a fast multipole based iterative solution, and 600 times faster when incident angles are chosen to share Bloch phases.

## 1. Introduction

Numerical modeling of the scattering of linear time-harmonic waves from materials with periodic geometry plays a key role in modern optics, acoustics, signal processing, and antenna design. Periodic diffraction problems where numerical modeling is crucial include the design of gratings for high-power lasers [6], thin-film solar cells [4] and absorbers [39], process control in semiconductor lithography [34], linear water-wave scattering from pillars [37], and radar sensing of ocean waves. In all such problems, an incident plane wave generates a scattered wave which (in the far field) takes the form of a finite sum of plane waves at known Bragg angles, whose amplitudes are desired.

Several challenges arise in the efficient numerical solution of grating scattering problems: (i) the period of the gratings can be many wavelengths in size, (ii) in many applications (such as photovoltaic [4] or solar absorber design [39]) solutions are needed at many incident angles and/or frequencies, (iii) the scatterer may have physical resonances in the form of guided modes (see Remark 2.1), which leads to ill-conditioned problems, and (iv) so-called Wood’s anomalies may occur, that is, scattering parameters (incident angle and frequency) for which one of the Bragg angles lies precisely along the grating. Note that challenges (iii) and (iv) are distinct: (iii) is a physical resonance leading to an ill-posed problem (see the reviews [41, 30]), whereas Wood’s anomalies do not cause ill-posedness—and yet they do cause problems for many numerical schemes.

In most problems of interest, gratings consist of homogeneous media delineated by sharp interfaces; hence the corresponding partial differential equations (PDEs) have piecewise-constant coefficients. This manuscript focuses on the following two-dimensional Dirichlet boundary value problem, which models acoustics with sound-soft obstacles, or electromagnetic scattering in TM (transverse magnetic) polarization from perfect electric conductors (where represents the out-of-plane electric field [5]). We seek the scattered wave which solves,

 (1.1) (Δ+ω2)u(x)=0  x:=(x,y)∈R2∖¯¯¯¯¯¯¯ΩZu(x)=−ui(x)x∈∂ΩZuradiative as   y→±∞ ,

where is a single bounded obstacle which is repeated to form an infinite grating of obstacles (denoted by ) of period along the -axis (see Fig. 1.1(a)). A plane wave with frequency , angle , and wavevector is incident; hence outside the obstacles and vanishes inside. It is easy to see from (1.1) that the total field vanishes on : this is the physical boundary condition. For the exact radiation condition on the scattered field see (2.5)-(2.6). The incident field, and hence the scattered field, satisfy a quasi-periodicity condition,

 (1.2) u(x+d,y)=αu(x,y)∀(x,y)∈R2 ,

where is known as the Bloch phase. We will also consider the transmission problem corresponding to dielectric obstacles (see section 5).

Integral equation formulations are a natural choice for problems with piecewise-constant coefficients such as (1.1): they exploit this fact by reducing the problem to an integral equation whose discretization involves unknowns living on the boundary alone. This has many advantages over finite elements or finite difference schemes: it reduces the dimensionality by one, allowing for more complicated boundaries to be solved to high accuracy with a small , and is amenable to easily implemented high-order quadratures.

In the grating application, the infinite boundary must be reduced to the boundary of a single obstacle ; there are two main approaches to this task of “periodizing” the integral equations.

The first approach replaces the Green’s function (fundamental solution) that appears in the kernel of in the integral operators by its quasi-periodized version [30] (which satisfies (1.2)). This is used in two dimensions by Bruno–Haslam [10] and in three by Nicholas [36] and Arens [2]. However, this fails as parameters approach a Wood’s anomaly, because the Green’s function does not exist there. One cure (in the case of a connected interface) is to use the quasi-periodic impedance half-space Green’s function [32, 3]. However, all such periodized kernel methods do not scale well to large because dense matrices must be filled at a cost of order a millisecond per element [26].

The second approach, which is robust at all parameters including at or near Wood’s anomalies, is to return to the free-space Green’s function and instead periodize using a small number of additional unknowns on artificial unit-cell walls. The condition (1.2) is then imposed directly in the linear system. This is used by Wu–Lu [43], and also by one of the authors in [5]. Further advantages of the second approach include: the low-rank nature of the periodizing is explicit (no dense matrices of periodized Green’s evaluations are needed), and that free-space boundary integral equation quadratures and codes may be used without modification.

In this paper, we present a fast direct solver for the formulation in [5], that can handle, in reasonable computation times, complicated boundaries that both demand large and cause ill-conditioning. Section 2 reviews the derivation of the periodized integral formulation in the Dirichlet setting (1.1).

Like many boundary integral equations for two-dimensional domains, this gives a linear system that could be solved via an iterative solver (e.g. GMRES) coupled with a fast matrix-vector multiplication scheme such as the fast multipole method (FMM) [12]. Unfortunately, when the system is ill-conditioned (which often occurs for complicated geometry) it can take hundreds of iterations for such an iterative solver to converge, or worse, it may never converge to an acceptable accuracy. Additionally, iterative methods are not able to efficiently solve problems with multiple right hand sides that occur when the response is needed at multiple incident angles. This has motivated in recent years the development of a collection of fast direct solvers. These solvers utilize internal structure (essentially low-rank off-diagonal blocks) to construct rapidly an approximate inverse of the dense matrices resulting from the the discretization of integral equations. For many problems, the computational cost is with typically or [22, 9, 31, 7, 33, 42], which can be orders of magnitude smaller than traditional dense direct methods.

In addition to being fast, the solvers are robust and construct the approximate inverse in such a way that it can be applied in with (with small constant) computational cost. In this work, we choose a Hierarchically Block Separable (HBS) solver [17]. Like the method in [25], the solver, described in section 4, exploits potential theory to create low-rank factorizations. As a result, the computational cost scales linearly with the number of discretization points for low-frequency problems on many domains.

Upon discretization, the integral formulation of the grating scattering problem in section 2 gives a two-by-two block linear system. Section 3 presents an efficient technique for solving this system with multiple right-hand sides, assuming an inverse of the large block is available. The full fast direct scheme for the problem is then achieved by using the HBS solver of section 4 to compute and apply this large matrix inverse whenever it is needed in the technique of section 3.

We describe how our technique may be adapted to the transmission scattering problem in section 5. Section 6 illustrates the performance of the fast direct solver in some test cases, and compares the computation time to a standard fast iterative scheme. Finally, we draw some conclusions in section 7.

## 2. Integral formulation for the quasi-periodic Dirichlet problem

In this section, we describe an integral equation formulation for the grating scattering problem (1.1) that is based upon [5]. First we rephrase this boundary-value problem as an equivalent one on the unit cell containing a single obstacle.

### 2.1. Restriction to a problem on a single unit cell

We assume that the copies of do not intersect and that we can create a unit cell “strip” containing the closure of . Let the infinite bounding left and right walls be denoted by and respectively (see Figure 1.1(b)). As in [8], the boundary value problem can be rephrased as an equivalent problem on the domain , that is

 (2.1) (Δ+ω2)u = 0 in U∖¯¯¯¯Ω (2.2) u = −u\tiny i on ∂Ω .

Quasi-periodicity of the field is imposed via wall matching conditions,

 (2.3) u|L−α−1u|R = 0 (2.4) un|L−α−1un|R = 0

where the normal derivatives on , are taken to be in the positive direction. The total field satisfies the outgoing radiation condition [8], which is to say that it is given by uniformly-convergent Rayleigh–Bloch expansions,

 (2.5) u(x,y) = ∑n∈Zcneiκnxeikn(y−y0)y>y0,|x|≤d/2 (2.6) u(x,y) = ∑n∈Zdneiκnxeikn(−y−y0)y<−y0,|x|≤d/2

where , so that lies within the vertical bounds , the -component of the th mode wavevector is , and the -component is . The sign of the square-root is taken as non-negative real or positive imaginary. The coefficients and for orders that are propagating () are the desired amplitudes of the Bragg orders mentioned in the Introduction.

The above radiation condition also completes the precise description of the scattering problem (1.1). We now make a remark about well-posedness of (1.1), which of course equally well applies to the single unit cell version presented above.

###### Remark 2.1.

With the radiation condition (2.5)–(2.6), at least one solution exists to (1.1) at all scattering parameters (angles and frequencies ) [8, Thm. 3.2]. Furthermore, at each angle, the solution is unique except possibly at a discrete set of frequencies which correspond to physical guided modes. The only such modes accessible in the scattering setting are embedded in the continuous spectrum [41]. [8, 41] give conditions for nonexistence of such modes are given. These results will also hold for the transmission version where (2.1)–(2.2) are replaced by (5.1)–(5.4) below.

### 2.2. An indirect boundary integral equation formulation

Recall the single- and double-layer potentials [15] on an obstacle boundary , defined as

 (2.7) (S∂Ωσ)(x)=∫∂ΩG(x,y)σ(y)dsy,

and

 (2.8) (D∂Ωτ)(x)=∫∂Ω∂G∂n(y)(x,y)τ(%y)dsy,

respectively. The kernel is the free-space Green’s function for the Helmholtz equation at frequency ,

 (2.9) G(x)=i4H(1)0(ω|x|),x∈R2∖{0},

where is the outgoing Hankel function of order zero.

For the purposes of periodizing, auxiliary layer potentials on the and walls are needed. To handle their infinite extent, we switch from coordinate to the Fourier variable , via

 (2.10) ^g(k)=12π∫∞−∞e−ikyg(y)dy,g(y)=∫∞−∞eiky^g(k)dk,

and make use of the spectral representation of the free-space Green’s function [35, Ch. 7.2],

 (2.11) G(x)=i4π∫∞−∞eikyei√ω2−k2|x|√ω2−k2dk,x=(x,y)

Square-roots are again taken non-negative real or positive imaginary, achieved by taking the branch cuts of the function in the plane as along the real axis and using a so-called Sommerfeld contour [14] for the integration passing from the 2nd to 4th quadrants (Fig 1.1(c)). Inserting (2.11) into the usual expressions for single- and double-layer potentials living on a vertical wall ( will be or , where takes the values or respectively), we get “Fourier layer potentials”

 (2.12) (^SW^μ)(x) = i2∫∞−∞eikyei√ω2−k2|x−x0|1√ω2−k2^μ(k)dk , (2.13) (^DW^ν)(x) = sign(x−x0)2∫∞−∞eikyei√ω2−k2|x−x0|^ν(k)dk .

Here and are interpreted as Fourier-transformed layer densities, or the coefficients of a plane-wave representation.

We use a standard combined-field representation (to avoid spurious interior obstacle resonances [15]) with density on the obstacle boundary, and auxiliary densities which will represent fields due to the remaining lattice of obstacles in the grating, thus

 (2.14) Extra open brace or missing close brace

See Fig. 1.1(b). Imposing the Dirichlet boundary condition in (1.1) whilst the imposing quasi-periodicity at the walls (2.3)–(2.4) results in the first and second rows, respectively, of the following 2-by-2 linear operator system

 (2.15) [A^B^C^Q][η^ξ]=[−u\tiny i|∂Ω0] ,

where the four operators , , and are defined in the remainder of this section. 1

Using jump relations [15], we have where are the boundary operators with the kernels of (2.7) and (2.8), respectively.

The operator gives the effect of the auxiliary densities on the obstacle boundary value. By restricting (2.12)–(2.13) and (2.14) to ,

 (2.16) ^B=[^S∂Ω,L+α^S∂Ω,R,^D∂Ω,L+α^D∂Ω,R] ,

where and denote the operators resulting from restricting (2.12) and (2.13) to evaluation on .

Following [5], we impose the second row of (2.15) in Fourier space to enable an efficient discretization with spectral accuracy. Via (2.7), (2.8) and (2.11), the operators mapping single- and double-layer densities on to their Fourier values on a wall are defined by

 (S∧W,∂Ωσ)(k)=i4π∫∂Ωe−ikyei√ω2−k2|x−x0|√ω2−k2σ(y)dsy,y=(x,y)∈∂Ω

and

 (D∧W,∂Ωτ)(k)=14π∫∂Ωe−ikyei√ω2−k2|x−x0|(−sign(x−x0),k√ω2−k2)⋅n(y)τ(y)dsy .

Similarly, the following two operators which instead map to Fourier normal derivatives on the wall are defined by

 (D∗,∧W,∂Ωσ)(k) = 14π∫∂Ωe−ikyei√ω2−k2|x−x0|sign(x−x0)σ(y)dsy (T∧W,∂Ωτ)(k) = i4π∫∂Ωe−ikyei√ω2−k2|x−x0|(√ω2−k2,−ksign(x−x0))⋅n(y)τ(y)dsy .

These four formulae allow us to express the operator in (2.15), which gives the effect of the density on the Fourier transforms of the wall quasi-periodicity conditions (2.3)-(2.4), as

 (2.17) ^C=[D∧L,∂Ω−iωS∧L,∂Ω−α−1(D∧R,∂Ω−iωS∧R,∂Ω)T∧L,∂Ω−iωD∗,∧L,∂Ω−α−1(T∧R,∂Ω−iωD∗,∧R,∂Ω)] .

Finally, maps Fourier wall densities to Fourier wall quasi-periodicity conditions. Thus, by translational invariance, each of its four blocks must be a pure multiplication operator in . The Fourier coefficients of (2.12)–(2.13) in (2.14) and (2.3)–(2.4) give,

 (2.18) Unknown environment '%
###### Remark 2.2.

Once the operator system (2.15) is solved for and , the desired Bragg amplitudes and appearing in (2.5)–(2.6) are extracted by evaluating the scattered field in (2.14), and its -derivative, at typically 20 equi-spaced samples along the lines , then applying the discrete Fourier transform (e.g. via the FFT). See [5] for more details.

### 2.3. Discretizing the integral equations

This section describes a discretization of the linear operator system (2.15) with high-order accuracy to obtain a finite-sized linear system. We discretize all operator blocks of via the Nyström method [29]. Since the operator has a kernel with a logarithmic singularity at the diagonal, it requires special quadrature corrections. For this, we choose the 6th-order Kapur–Rokhlin rule [27]; however, the fast direct solver is compatible with other recently-developed local-correction quadrature schemes such as that of Alpert [1], that of Helsing [24], generalized Gaussian [23], or QBX [28].

The boundary is parameterized by the smooth -periodic function , then discretized using the -point global periodic trapezoid rule with nodes , . Then the -by- matrix represents the operator . The Kapur–Rokhlin scheme modifies the weights (but not the nodes) near the diagonal, giving the matrix entries

 Anm=(2π/N)R|n−m|[∂G(yn,ym)/∂n(ym)−iωG(yn,ym)]|z′(2πm/N)|.

The values are given in terms of from the left-center block of [27, Table 6], as follows: , while for , and otherwise.

There is some freedom in choosing a contour for the Sommerfeld -integrals in (2.12)–(2.13). We choose a hyperbolic tangent curve of height and use the trapezoid rule in the real part of with spacing , truncated to maximum real part (see Fig. 1.1(b) and [5]). is chosen such that the integrand is exponentially small beyond this real part. There are nodes for . Since the scheme is exponentially convergent, is typically only 100 for full machine precision when is around 10. Note that since there are two types of auxiliary densities, there are unknowns in the periodizing scheme. Using the above quadrature nodes and weights, and are simply Nyström discretizations of (2.16) and (2.17), while has four diagonal sub-blocks with entries given by (2.18) evaluated at the Sommerfeld nodes. The entire square block system in unknowns is written (we drop the symbols from now on for simplicity),

 (2.19) [ABCQ][ηξ]=[b0] ,

where is the vector of values of at the nodes on .

###### Remark 2.3.

Once and have been solved for, the desired Bragg amplitudes may be computed as in Remark 2.2, evaluating (2.14) using the underlying Nyström quadratures on and in the Fourier variable.

### 2.4. Improving the convergence rate, and robustness near Wood’s anomalies

In this section, we briefly sketch two ways to improve the convergence rate and robustness of the integral formulation (see [5] for full details).

Firstly, we include nearest neighbor images of the obstacle in the representation, replacing (2.14) with

 (2.20) u=P∑j=−Pαj(D∂Ω+jd−iωS∂Ω+jd)η+(^SL+α^SR)^μ+(^DL+α^DR)^ν in U∖¯¯¯¯Ω

where is the lattice vector and is the number of nearest neighbors on either side of to be included. Since the auxiliary periodizing wall densities have to represent fields whose nearest singularities are now further away, this improves the convergence rate with respect to . We have found that or is optimal. As a result, will now contain not only a self-interaction of , but interactions from the boundaries of obstacles neighboring .

Recall that is already defined as the matrix from Nyström discretization of the self-interaction operator , where and represent the integral operators with single- and double-layer kernels acting from source curve to target curve . For , let denote the matrix resulting from Nyström discretization of the operator . This expresses the effect on of a neighbor obstacles from . Let

 ~A=A+P∑j=−Pj≠0αjAj.

Then, with this new representation, the block linear system (2.19) is replaced by

 (2.21) [~AB~CQ][ηξ]=[b0] ,

where is the Nyström discretization of

 ~C=⎡⎣αP(D∧L,∂Ω+Pd−iωS∧L,∂Ω+Pd)−α−(P+1)(D∧R,∂Ω−(P+1)d−iωS∧R,∂Ω−(P+1)d)αP(T∧L,∂Ω+Pd−iωD∗,∧L,∂Ω+Pd)−α−(P+1)(T∧R,∂Ω−(P+1)d−iωD∗,∧R,∂Ω−(P+1)d)⎤⎦ .

which results from the new representation via some cancellations as in [5].

The scheme as presented so far would fail as one approaches a Wood’s anomaly, since the auxiliary densities and contain poles at , and as any approaches the origin this makes the Sommerfeld quadrature highly inaccurate [5, Sec. 5]. Thus, the second way to improve the scheme is to, in this situation, displace the Sommerfeld contour by an real value such that it is far from all poles . This causes an incorrect radiation condition for the Rayleigh–Bloch mode whose pole the contour crossed. However, this is fixed simply by including an unknown coefficient for this mode in the representation (2.20), and imposing the one extra linear condition that the radiation condition (2.6) be correct. The elements of the matrix row which imposes the condition that there be no incoming radiation in the relevant mode are computed as in Remark 2.3.

The net effect is that, by expanding the system (2.19) or (2.21) by one extra row and column, parameters at or near Wood’s anomalies are also handled to machine precision. This robustness distinguishes the scheme from many other integral-equation-based solvers.

## 3. A direct solution technique

This section presents techniques for solving the discretized linear system (2.21) in the case of multiple incident angles, using only a single inversion of the matrix . Coupling these techniques with the method described in section 4 and [18] will result in a fast direct solver for the Dirichlet scattering problem (1.1).

For simplicity, we start with the block solution technique assuming no contributions from neighboring obstacles, i.e. . Then we describe how to handle by exploiting the internal structure of the matrices characterizing the interaction between and its nearest neighbors.

### 3.1. The block solve (case P=0)

When there are no neighbor contributions, the linear system (2.19) has solutions and given by

 (3.1) ξ =(Q−CA−1B)−1A−1b η =A−1b−A−1Bξ.

Their computation involves inverting two matrices and . Of course this assumes that is invertible; however, this is known to hold for sufficiently large since it is the Nyström discretization of the injective operator arising in the scattering from an isolated obstacle [16, p. 48]. Since the size of is much less than (typically ), the cost of the solve is dominated by the inversion of the matrix . Fortunately, has internal structure which makes it amenable to an inversion technique described in section 4. Notice that while , and do depend upon the incident angle via their dependence on , the matrix does not. Thus may be inverted once and for all at each frequency .

###### Remark 3.1.

At a fixed frequency , there may be multiple incident angles that share a common Bloch phase via the relation . Let be the number of incident angles that share an . It is then easy to check that, within , we have , so that is proportional to the grating period in wavelengths. Notice that the number of matrix-vector multiplies with required to evaluate (3.1) is then , and that need only be accessed twice. We will later exploit this fact for efficiency.

### 3.2. The block solve with neighboring contributions

Recall from section 2.4 that, when contributions from neighboring obstacles are included in the integral formulation, the matrix in (3.1) is replaced by

 ~A=A+P∑j=−P,j≠0αjAj.

In practice, we find or 2 is best as gives little additional accuracy.

Because is separated from its neighbors, the matrices are low rank (i.e. has rank where ). Thus they admit a factorization where and are of size . This means that the matrix that needs to be inverted is where

 L=[α−PL−P|⋯|α−1L−1|αL1|⋯|αPLP]

and

 R=[RT−P|…|RT−1|RT1|…|RTP]T .

The inverse of can be computed using only the inverse of via the Woodbury formula [20]

 (3.2) ~A−1=(A+LR)−1=A−1+A−1L(I+RA−1L)−1RA−1.

Note that the square matrix is only of size thus can easily be inverted with dense linear algebra. Moreover, in practice we do not actually compute . Instead we apply via (3.2) which requires two applications of . For example, to multiply by a vector , we need to evaluate and . For large , this is done via the techniques described in section 4.

Instead of using a QR factorization to find and , we choose to use an interpolatory decomposition [21, 13] defined as follows.

###### Definition 3.1.

The interpolatory decomposition of a matrix that has rank is the factorization

 M=PM(J(1:l),:)

where is a vector of integers such , and is a matrix that contains a identity matrix. Namely, .

Since the and matrices do not involve , they need only be computed once at each frequency, independent of the number of incident angles. However, the cost of computing these factorizations using general linear algebraic techniques such as QR is . This would negate the substantial savings obtained by using the inversion technique for to be described in section 4.

To restore the complexity, we use ideas from potential theory. First, a circle of radius approximately twice the typical radius of is placed concentric with it. From potential theory, we know that any field generated by sources outside of this circle can be approximated arbitrarily well by placing enough equivalent charges on the circle. In practice, it is sufficient to place a small number of “proxy” points spaced evenly on the circle. For the relatively low frequencies (i.e. small ) tested in this paper, we have found it is enough to have proxy points. We call all the points on the neighboring obstacles that are within the proxy circle near points.

Instead of computing the interpolatory decomposition of each , one matrix is generated by computing the interpolatory decomposition of the matrix

 [Aj(:,Inear)|Aproxy)]

where corresponds to the indices of the points on that are near , and is a matrix that characterizes the interaction between the nodes on and the proxy points. We call the points on picked by the interpolatory decomposition skeleton points. Figure 3.1 illustrates the proxy points, near points and skeleton points for a sample domain. This can be used for all nearest neighbors. For each , the is simply given by .

###### Remark 3.2.

When is replaced by in (3.1), and the Woodbury formula used for , the number of matrix-vector multiplies with required to evaluate (3.1) is then .

## 4. Creating a compressed inverse of the matrix A

While the matrix is dense, it has a structure that we call Hierarchically Block Separable (HBS) which allows for an approximation of its inverse to computed rapidly. Loosely speaking, its off-diagonal blocks are low rank. This arises because is the discretization on a curve of an integral operator with smooth kernel (when is not too large). This section briefly describes the HBS property and how it can be exploited to rapidly construct an approximate inverse of a matrix. For additional details see [17]. Note that the HBS property is very similar to the concept of Hierarchically Semi-Separable (HSS) matrices [40, 11].

### 4.1. Block separable

Let be an matrix that is blocked into blocks, each of size .

We say that is “block separable” with “block-rank” if for , there exist matrices and such that each off-diagonal block of admits the factorization

 (4.1) Mσ,τ=Uσ~Mσ,τV∗τ,σ,τ∈{1,2,…,p},σ≠τ.m×mm×kk×kk×m

Observe that the columns of must form a basis for the columns of all off-diagonal blocks in row , and analogously, the columns of must form a basis for the rows in all the off-diagonal blocks in column . When (4.1) holds, the matrix admits a block factorization

 (4.2) M=U~MV∗+D,mp×mpmp×kpkp×kpkp×mpmp×mp

where

 U=diag(U1,U2,…,Up),V=diag(V1,V2,…,Vp),D=diag(D1,D2,…,Dp),

and

 ~M=⎡⎢ ⎢ ⎢ ⎢ ⎢⎣0~M12~M13⋯~M210~M23⋯~M31~M320⋯⋮⋮⋮⎤⎥ ⎥ ⎥ ⎥ ⎥⎦.

Once the matrix has put into block separable form, its inverse is given by

 (4.3) M−1=E(~M+^D)−1F∗+G,

where

 (4.4) ^D= (V∗D−1U)−1, (4.5) E= D−1U^D, (4.6) F= (^DV∗D−1)∗, (4.7) G= D−1−D−1U^DV∗D−1.

### 4.2. Hierarchically Block-Separable

Informally speaking, a matrix is Hierarchically Block-Separable (HBS), if it is amenable to a telescoping version of the above block factorization. In other words, in addition to the matrix being block separable, so is once it has been reblocked to form a matrix with blocks, and one is able to repeat the process in this fashion multiple times.

For example, a “3 level” factorization of is

 (4.8) M=U(3)(U(2)(U(1)~M(0)(V(1))∗+B(1))(V(2))∗+B(2))(V(3))∗+D(3),

where the superscript denotes the level.

The HBS representation of an matrix requires to store and to apply to a vector. By recursively applying formula (4.3) to the telescoping factorization, an approximation of the inverse can be computed with computational cost; see [17]. This compressed inverse can be applied to a vector (or a matrix) very rapidly. Note that for memory movement reasons, it is more efficient to apply to a block of vectors than to each vector sequentially.

## 5. Transmission problem

So far this paper has focused on the Dirichlet problem. However, all of the techniques presented here also apply to the transmission problem, with minor modifications that we describe briefly in this section.

### 5.1. The boundary value problem

This section presents the unit-cell version of the boundary value problem analogous to section 2.1.

Let all obstacles in the grating have refractive index , and the background index be 1. Then the partial differential equation analogous to (2.1) is

 (5.1) (Δ+n2ω2)u = 0 in Ω (5.2) (Δ+ω2)u = 0 in U∖¯¯¯¯Ω

with matching conditions on the boundary analogous to (2.2),

 (5.3) u+−u− = −u\tiny i on ∂Ω (5.4) u+n−u−n = −u\tiny in on ∂Ω .

The quasi-periodicity and radiation conditions are the same as for the Dirichlet case. Problems of this kind correspond to the acoustic scattering from obstacles with a constant wave speed differing from the background value, or electromagnetic scattering from a dielectric grating in TM polarization. Remark  2.1 also applies.

### 5.2. The integral formulation

The transmission obstacle case is handled via an integral formulation often called the Müller–Rokhlin scheme [38]: a pair of unknown densities (where is single-layer and a double-layer) is used on , and these densities also generate potentials inside with the interior wavenumber . The resulting operator turns out to be a compact perturbation of the 2-by-2 identity. Operators and become slightly more elaborate, whilst is unchanged. The complete recipe is found in [5].

### 5.3. The linear system

As with the Dirichlet problem, is discretized via a Nyström method with points, and the Sommerfeld contour is discretized as before with points. However, since there is a pair of densities and defined on the boundary (one of each for every point), the resulting linear system has unknowns. In order for the matrix to be amenable to the fast direct solver described in section 4, it is crucial to interlace the unknowns so that nearby entries in the vector have nearby locations on , namely . This reorders the columns of ; the same reordering is also needed for the rows.

## 6. Numerical examples

In this section, we illustrate the performance of the proposed solution methodology for several different problems. Once the obstacle, period , frequency , and set of desired incident angles are specified, the solver requires two steps:

• Pre-computation: A compressed representation of the matrix is computed via the technique in section 4. If , the low-rank neighbor contributions and are computed as in section 3.2. This step is performed once.

• Block solve: For each distinct Bloch phase derived from the set of incident angles, a -by- right-hand side matrix is formed by stacking the right-hand sides in (2.19) for the incident angles that share this common . The block solve formula (3.1) from Section 3.1 is then applied to this right-hand side matrix, at a cost of matrix-vector products with the compressed inverse.

It is advantageous for the user to choose many incident angles that share a common . This leads an overall efficiency gain of a factor results (assuming ).

All experiments are run on a Lenovo laptop computer with 8GB of RAM and a 2.6GHz Intel i5-2540M processor. The direct solver was run at a requested relative precision of . It was implemented rather crudely in MATLAB, which means that significant further gains in speed should be achievable.

###### Remark 6.1 (error measure).

In the below, the solution error quoted is the “flux error”, namely the absolute difference between the total incoming and outgoing flux. Since these fluxes should be equal, this provides a standard error measure in diffraction problems. It is computed via a weighted sum of the squared magnitudes of the Bragg amplitudes and in (2.5)–(2.6); see [5, Eq. (4.2)]. We have checked that this measure is similar to the pointwise error in the far field.

### 6.1. Scaling with problem size

In this section, we apply the direct solver to both the Dirichlet and transmission scattering problems from a grating of star-shaped domains with period , and the frequency is fixed at . For the transmission case, the index is . We choose a single incident angle . Hence, . The obstacle is defined by the parametrization where the radius function is for angle . Figure 6.1 plots the total field for both cases.

The number of periodizing unknowns were kept fixed, with , while the number of discretization points on the boundary of the obstacle was increased. This experiment is designed to illustrate the scaling of the fast direct solver. (Note that for most of the tested, the boundary is over-discretized: for , the discretized integral equation already has accuracy .) Figure 6.2 illustrates the times for the pre-computation and the solve steps in the direct solver for the Dirichlet and transmission problems both with and without neighbor contributions (). As expected, since is not space-filling and is small, the times scale linearly with . Notice that the cost of adding the contributions from nearest neighbors is small, since their interaction rank is . For the Dirichlet problem with , it takes seconds for the precomputation and seconds for the block solve when , versus seconds for the precomputation and seconds for the block solve when .

### 6.2. Comparison of fast direct solver against a fast iterative solver

In this experiment, we consider a more challenging Dirichlet problem with where the obstacle is the complicated domain whose total field is illustrated in Figure 6.3. This domain is given by a radial function which is a random Fourier series with 201 terms. The first cosine term is set large and negative to give the obstacles the shape of a vertically-oriented ellipse. Its rough surface resembles a dentritic metallic particle. The large number of tight curves and close approaches of the boundary with itself is typical of complex geometries, and demands a large to reach any reasonable accuracy. It also causes ill-conditioning that demands a large number of GMRES iterations in an iterative solver.

The grating period is , corresponding to around 5 wavelengths. The closest distance between obstacles is around . We choose an incident angle corresponding to a Wood’s anomaly (); note that therefore the problem cannot even be solved the standard integral-equation approach based on the quasi-periodic Green’s function. We take . In order to obtain an accuracy of for this problem, unknowns are needed on the boundary of the obstacle () and the number of periodizing unknowns was increased to .

Two methods were used to solve for the unknown densities. Firstly we tested an iterative solution of (2.21) using standard GMRES without restarts, with the publicly-available Helmholtz FMM of Greengard–Gimbutas [19] to apply the matrices in the block (with quadrature corrections near the diagonal where appropriate), and dense matrix-vector multiplication for the other three blocks. Secondly we tested the direct solver scheme presented in this work. Note that the FMM is coded in fortran, whereas our direct solver is (apart from the interpolative decomposition) in MATLAB.

The GMRES+FMM solver takes approximately one hour, taking a large number () of iterations, to solve for the densities. By using the fast direct solver, the densities can be found with minutes for the pre-computation and seconds for the block solve at each . In other words, for this domain, the direct solver can solve independent incident angles in the amount of time it takes the accelerated iterative method to solve for one.

Note that for the neighbor interactions, we chose 200 points on a proxy circle with diameter times the vertical height of the obstacle. The approximate interaction rank between obstacles is then . Note a single matrix vector multiply with takes seconds. Thus if we were to apply to one vector at a time in (3.2) it would take seconds. Thus, since the complete block solve takes seconds, significant timing gains are seen by applying the to block matrices instead of single vectors.

Next, we present an example where a large sampling of incident angles are required, as would be typical for a solar-cell design problem. An arithmetic series in , with around 200 values covering the range , is considered. Their spacing in is , which provides roughly incident angles, where . As discussed above, because around angles share the same , this leads to an additional speed-up of a factor of nearly . It takes minutes to solve for the densities ( minutes of pre-computation, followed by minutes for the block solves.) Notice that this is around 600 times faster than the solution at 200 incident angles would take using GMRES+FMM.

The resulting fractions of incident flux scattered into each of the Bragg modes (i.e.  and ) are shown, as a function of incident angle , in Fig. 6.4 (b).

### 6.3. A challenging transmission problem

Prior to the development of the fast direct solver, accurately solving a transmission problem on complicated domains required a lot of computing time. We applied the direct solver to a transmission problem on the domain from section 6.2. With and all other parameters as in the previous example (thus there are 20241 unknowns), the solver achieves a flux error of in minutes of pre-computation and seconds for each block solve. Figure 6.5 illustrates the total field for this example.

## 7. Conclusion

This paper presented a fast direct solution technique for grating scattering problems with either Dirichlet or transmission boundary conditions. For low frequency problems on simple domains, the computational cost of the solver scales linearly with the number of discretization points on one obstacle. The example in section 6.2 illustrates that when the obstacle is complicated and the frequency somewhat higher, the direct solver is much faster than an FMM accelerated GMRES, because it handles ill-conditioning. Additionally, the direct solver is very fast for multiple incident angles that occur often in design problems, and this can be further accelerated in the case when many incident angles share a Bloch phase . In one complicated Dirichlet obstacle case which requires unknowns to discretize, 200 incident angles are solved in under 6 seconds per incident angle.

Although, for simplicity, we disallowed intersections of with the and walls, it would be quite simple to adapt the fast direct solver to the scheme presented in [5, Sec. 6] to handle this case. It would also be relatively easy to generalize the scheme to multi-layer transmission gratings, which are more common in applications. We anticipate creating a fast solver for this case in future work.

### Footnotes

1. Note that operators with the symbol involve Fourier variables, a notation consistent with [5].

### References

1. B. K. Alpert. Hybrid Gauss-trapezoidal quadrature rules. SIAM J. Sci. Comput., 20:1551–1584, 1999.
2. T. Arens. Scattering by biperiodic layered media: The integral equation approach. Habilitation thesis, Karlsruhe, 2010.
3. T. Arens, S. N. Chandler-Wilde, and J. A. DeSanto. On integral equation and least squares methods for scattering by diffraction gratings. Commun. Comput. Phys., 1:1010–42, 2006.
4. H. A. Atwater and A. Polman. Plasmonics for improved photovoltaic devices. Nature Materials, 9(3):205–213, 2010.
5. A. H. Barnett and L. Greengard. A new integral representation for quasi-periodic scattering problems in two dimensions. BIT Numer. Math., 51:67–90, 2011.
6. C. P. J. Barty, et al. An overview of LLNL high-energy short-pulse technology for advanced radiography of laser fusion experiments. Nuclear Fusion, 44(12):S266, 2004.
7. G. Beylkin, R. Coifman, and V. Rokhlin. Wavelets in numerical analysis. In Wavelets and their applications, pages 181–210. Jones and Bartlett, Boston, MA, 1992.
8. Bonnet-BenDhia, A.-S. and F. Starling. Guided waves by electromagnetic gratings and non-uniqueness examples for the diffraction problem. Math. Meth. Appl. Sci., 17:305–338, 1994.
9. S. Börm. Efficient Numerical Methods for Non-local Operators: -Matrix Compression, Algorithms and Analysis. European Mathematics Society, 2010.
10. O. P. Bruno and M. C. Haslam. Efficient high-order evaluation of scattering by periodic surfaces: deep gratings, high frequencies, and glancing incidences. J. Opt. Soc. Am. A, 26(3):658–668, 2009.
11. S. Chandrasekaran and M. Gu. A divide-and-conquer algorithm for the eigendecomposition of symmetric block-diagonal plus semiseparable matrices. Numer. Math., 96(4):723–731, 2004.
12. H. Cheng, W. Y. Crutchfield, Z. Gimbutas, G. L., J. Huang, V. Rokhlin, N. Yarvin, and J. Zhao. Remarks on the implementation of the wideband FMM for the Helmholtz equation in two dimensions, volume 408 of Contemp. Math., pages 99–110. Amer. Math. Soc., Providence, RI, 2006.
13. H. Cheng, Z. Gimbutas, P. Martinsson, and V. Rokhlin. On the compression of low rank matrices. SIAM Journal of Scientific Computing, 26(4):1389–1404, 2005.
14. W. C. Chew. Waves and Fields in Inhomogeneous Media. Wiley-IEEE Press, 1999.
15. D. Colton and R. Kress. Integral equation methods in scattering theory. Wiley, 1983.
16. D. Colton and R. Kress. Inverse acoustic and electromagnetic scattering theory, volume 93 of Applied Mathematical Sciences. Springer-Verlag, Berlin, second edition, 1998.
17. A. Gillman, P. Young, and P. Martinsson. A direct solver with complexity for integral equations on one-dimensional domains. Frontiers of Mathematics in China, 7(2):217–247, 2012.
18. A. Gillman, P. Young, and P. G. Martinsson. A direct solver with complexity for integral equations on one-dimensional domains, 2011.
19. Z. Gimbutas and L. Greengard. HFMM2D, Fortran code for fast multipole method with Helmholtz kernel in two dimensions (version 20110313), 2011. http://www.cims.nyu.edu/cmcl/fmm2dlib/fmm2dlib.html.
20. G. Golub and C. V. Loan. Matrix computations. Johns Hopkins Studies in the Mathematical Sciences. Johns Hopkins University Press, Baltimore, MD, third edition, 1996.
21. M. Gu and S. C. Eisenstat. Efficient algorithms for computing a strong rank-revealing QR factorization. SIAM J. Sci. Comput., 17(4):848–869, 1996.
22. W. Hackbusch. A sparse matrix arithmetic based on H-matrices; Part I: Introduction to H-matrices. Computing, 62:89–108, 1999.
23. S. Hao, P. Martinsson, and P. Young. High-order accurate Nyström discretization of integral equations with weakly singular kernels on smooth curves in the plane, 2011. arXiv.org report #1112.6262.
24. J. Helsing. Integral equation methods for elliptic problems with boundary conditions of mixed type. J. Comput. Phys., 228:8892–8907, 2009.
25. K. Ho and L. Greengard. A fast direct solver for structured linear systems by recursive skeletonization, 2012. To appear in SIAM J. on Scientific Computing.
26. K. V. Horoshenkov and S. N. Chandler-Wilde. Efficient calculation of two-dimensional periodic and waveguide acoustic Green’s functions. J. Acoust. Soc. Amer., 111:1610–1622, 2002.
27. S. Kapur and V. Rokhlin. High-order corrected trapezoidal quadrature rules for singular functions. SIAM J. Numer. Anal., 34:1331–1356, 1997.
28. A. Klöckner, A. H. Barnett, L. Greengard, and M. O’Neil. Quadrature by expansion: a new method for the evaluation of layer potentials, 2012. submitted.
29. R. Kress. Numerical Analysis. Graduate Texts in Mathematics #181. Springer-Verlag, 1998.
30. C. M. Linton. Lattice sums for the Helmholtz equation. SIAM Review, 52(4):603–674, 2010.
31. P. Martinsson and V. Rokhlin. A fast direct solver for boundary integral equations in two dimensions. J. Comp. Phys., 205(1):1–23, 2005.
32. A. Meier, T. Arens, S. N. Chandler-Wilde, and A. Kirsch. A Nyström method for a class of integral equations on the real line with applications to scattering by diffraction gratings and rough surfaces. J. Integral Equations Appl., 12:281–321, 2000.
33. E. Michielssen, A. Boag, and W. C. Chew. Scattering from elongated objects: direct solution in operations. IEE Proc. Microw. Antennas Propag., 143(4):277 – 283, 1996.
34. R. Model, A. Rathsfeld, H. Gross, M. Wurm, and B. Bodermann. A scatterometry inverse problem in optical mask metrology. J. Phys.: Conf. Ser., 135:012071, 2008.
35. P. Morse and H. Feshbach. Methods of theoretical physics, volume 1. McGraw-Hill, 1953.
36. M. J. Nicholas. A higher order numerical method for 3-D doubly periodic electromagnetic scattering problems. Commun. Math. Sci., 6(3):669–694, 2008.
37. M. A. Peter, M. H. Meylan, and C. M. Linton. Water-wave scattering by a periodic arrya of arbitrary bodies. J. Fluid Mech., 548:237–256, 2006.
38. V. Rokhlin. Solution of acoustic scattering problems by means of second kind integral equations. Wave Motion, 5:257–272, 1983.
39. N. P. Sergeant, M. Agrawal, and P. Peumans. High performance solar-selective absorbers using coated sub-wavelength gratings. Opt. Express, 18(6):5525–5540, 2010.
40. Z. Sheng, P. Dewilde, and S. Chandrasekaran. Algorithms to solve hierarchically semi-separable systems. In System theory, the Schur algorithm and multidimensional analysis, volume 176 of Oper. Theory Adv. Appl., pages 255–294. Birkhäuser, Basel, 2007.
41. S. Shipman. Resonant scattering by open periodic waveguides, volume 1 of Progress in Computational Physics (PiCP), pages 7–50. Bentham Science Publishers, 2010.
42. P. Starr and V. Rokhlin. On the numerical solution of two-point boundary value problems. II. Comm. Pure Appl. Math., 47(8):1117–1159, 1994.
43. Y. Wu and Y. Y. Lu. Analyzing diffraction gratings by a boundary integral equation Neumann-to-Dirichlet map method. J. Opt. Soc. Am. A, 26(11):2444–2451, 2009.
72360