
# Randomized sampling for basis functions construction in generalized finite element methods

Ke Chen Mathematics Department, University of Wisconsin-Madison, 480 Lincoln Dr., Madison, WI 53706 USA. Qin Li Mathematics Department, University of Wisconsin-Madison, 480 Lincoln Dr., Madison, WI 53706 USA. Jianfeng Lu Department of Mathematics, Department of Physics and Department of Chemistry, Duke University, Box 90320, Durham, NC 27708 USA.  and  Stephen J. Wright Computer Sciences Department, University of Wisconsin-Madison, 1210 W Dayton St, Madison, WI 53706 USA.
July 18, 2019
###### Abstract.

In the context of generalized finite element methods for elliptic equations with rough coefficients, the efficiency and accuracy of the numerical method depend critically on the use of appropriate basis functions. This work explores several random sampling strategies for the construction of basis functions, and proposes a quantitative criterion to analyze and compare these sampling strategies. Numerical evidence shows that the optimal basis functions can be well approximated by a random projection of the generalized eigenvalue problem onto a subspace of a-harmonic functions.

The work of Q.L. is supported in part by a start-up fund from UW-Madison and National Science Foundation under the grant DMS-1619778. The work of J.L. is supported in part by the National Science Foundation under grant DMS-1454939. The work of S.W. is supported in part by NSF Awards IIS-1447449, 1628384, and 1634597 and AFOSR Award FA9550-13-1-0138. Both Q.L. and S.W. are supported by TRIPODS: NSF 1740707.

## 1. Introduction

The focus of the paper is the construction of basis functions for elliptic equations with rough coefficients in the framework of generalized finite element methods. Consider an elliptic partial differential equation

 (1.1) {−div(a(x)∇u(x))=f(x),x∈Ω;u(x)=0,x∈∂Ω,

with f ∈ L²(Ω) and the coefficient a ∈ L^∞(Ω) uniformly elliptic, that is, there exist constants 0 < a_min ≤ a_max < ∞ such that a_min ≤ a(x) ≤ a_max for all x ∈ Ω. Note that we assume only L^∞ regularity of a, so the coefficient could be rather rough, which poses challenges to conventional numerical methods, such as the standard finite element method with local polynomial basis functions.

Numerical methods can be designed to take advantage of certain analytical properties of the problem (1.1). A classical example is when a is two-scale, that is, a(x) = a(x, x/ε), where a(x, y) is 1-periodic with respect to its second argument. (Thus, ε characterizes explicitly the small scale of the problem.) Using the theory of homogenization [4, 24], several numerical methods have been proposed over the past decades to capture the homogenized solution of the problem and possibly also provide some microscopic information. Approaches of this type include the multiscale finite element method [8, 13, 14, 12] and the heterogeneous multiscale method [6, 7, 20].

While one could apply methods designed for numerical homogenization to the case of rough media (a ∈ L^∞), the lack of favorable structural properties often degrades the efficiency and convergence rates. Various numerical methods have been proposed for L^∞ media, including the generalized finite element method [2], upscaling based on harmonic coordinates [22], elliptic solvers based on hierarchical matrices [3, 10], and Bayesian numerical homogenization [21], to name just a few. Our work lies in the framework of the generalized finite element method (gFEM) originally proposed in [2]. The idea is to approximate the local solution space by constructing good basis functions locally and to use either the partition-of-unity or the discontinuous Galerkin method to obtain a global discretization.

According to the partition-of-unity finite-element theory, which we will recall briefly in Section 2, the global error is controlled by the accuracy of the numerical local solution spaces. Thus, global performance of the method depends critically on efficient preparation of accurate local solution spaces. Towards this end, in [1], Babuška and Lipton studied the Kolmogorov width of a finite-dimensional approximation to the optimal solution space, and showed that the distance decays almost exponentially fast, as we recall in Section 2. The basis construction algorithm proposed in [1] follows the analysis closely: a full list of a-harmonic functions (up to discretization) is obtained in each patch, and local basis functions are obtained by a “post-processing” step of solving a generalized eigenvalue problem to select modes with the highest “energy ratios”. Since the roughness of a necessitates a fine discretization in each patch, and thus a large number of a-harmonic functions per patch, the overall computational cost of the above strategy is high.

Our work is based on the gFEM framework [2] together with the concept of optimal local solution space via Kolmogorov width studied in [1]. Our main contribution in this work is to accelerate the procedure in [1] by incorporating random sampling techniques. Randomized algorithms have been shown to be powerful in reducing the computational complexity of finding low-rank structure in matrices. Since the generalized eigenvalues decay almost exponentially fast, the local solution space is of low rank, and we find that random sampling approaches can effectively capture the local solution space. The efficiency of the approach certainly depends on the particular random sampling strategy employed; we explore several strategies and identify the most successful ones.

The idea of random sampling or oversampling to construct local basis functions has been proposed before in the literature. In [9], the authors proposed to sample the a-harmonic function space by choosing random boundary conditions, and to use SVD thresholding to obtain well-conditioned basis functions. The method was developed further in [5], where the randomized sample basis functions are selected by solving a local eigenvalue problem associated with the elliptic operator. This post-processing step is similar to the method developed in [17] in the context of solving elliptic eigenvalue problems with a rough coefficient.

There are several important differences between our approach and that of [9, 5]. First, we find that the best randomized sampling strategy is not based on random sampling of boundary conditions, but rather on projection of a Gaussian function randomly centered in the domain. Second, our post-processing step is based on the Kolmogorov n-width analysis in [1], so a low-rank approximation is guaranteed. Third, we provide a quantitative criterion for evaluating the efficiency and accuracy of the method.

Other basis construction approaches based on the gFEM framework have been explored in the literature, mostly based on a similar offline and online strategy. In the offline step, one prepares the solution space (either local or global). In the online step, one assembles the basis through the Galerkin framework (see, for example, [19, 23, 15]). The random sampling strategy could also be explored in these contexts.

The organization of the rest of the paper is as follows. We review preliminaries in Section 2, including the basics of basis construction and error analysis. In Section 3, we describe the random sampling framework, and present a few particular sampling strategies. We also connect and compare the framework with the randomized singular value decomposition (rSVD) in Section 3.3. To compare the various sampling approaches, we propose a criterion in Section 4, according to which random sampling in the interior of the subdomains is more effective than random sampling of boundary conditions, as we demonstrate through numerical examples in Section 5.

## 2. Preliminary Results

Here we provide some preliminary results about the generalized finite element method and review the construction of the optimal basis set. Consider the following elliptic equation:

 (2.1) {Lu=−div(a(x)∇u(x))=f(x),x∈Ω;u(x)=0,x∈∂Ω,

with f ∈ L²(Ω), where L denotes the elliptic operator. The corresponding weak formulation is given by

 ⟨a(x)∇u,∇v⟩L2(Ω)=⟨f,v⟩L2(Ω)

for all test functions v ∈ H₀¹(Ω), where H₀¹(Ω) denotes the usual Sobolev space of functions vanishing on ∂Ω.

In the Galerkin framework, one constructs the solution space first. Given the approximation space

 (2.2) Span{ϕi,i=1,2,…n},

we substitute the ansatz U = ∑_{j=1}^n cjφj and the test functions φi, i = 1, 2, …, n, into the weak formulation of (2.1) to obtain:

 n∑j=1⟨a(x)∇ϕj,∇ϕi⟩L2(Ω)cj=⟨f,ϕi⟩L2(Ω).

We write this system in matrix form as follows:

 (2.3) A→c=→b,

where A is a symmetric n × n matrix with entries Aij = ⟨a∇φj, ∇φi⟩_{L²(Ω)}, and →c = (c1, …, cn)^⊤ is the vector of coefficients to be determined. The right-hand side →b is the load vector, with entries bi = ⟨f, φi⟩_{L²(Ω)}.
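As a concrete illustration, the assembly and solution of (2.3) can be sketched in the one-dimensional version of (1.1) with piecewise-linear elements. The element count, the random coefficient distribution, and the constant right-hand side f = 1 below are illustrative choices, not taken from the paper:

```python
import numpy as np

def assemble_1d(a_vals, h, f=1.0):
    """Stiffness matrix A and load vector b for -(a u')' = f on [0, 1]
    with u(0) = u(1) = 0, piecewise-linear elements, and an
    elementwise-constant coefficient a_vals[k] on element k."""
    n_el = len(a_vals)
    n = n_el - 1                       # number of interior nodes
    A = np.zeros((n, n))
    b = np.zeros(n)
    for k in range(n_el):              # element k touches interior nodes k-1 and k
        ke = a_vals[k] / h             # local stiffness scale: (a_k/h)[[1,-1],[-1,1]]
        for i, j, s in [(k - 1, k - 1, 1), (k - 1, k, -1),
                        (k, k - 1, -1), (k, k, 1)]:
            if 0 <= i < n and 0 <= j < n:
                A[i, j] += s * ke
        for i in (k - 1, k):           # trapezoidal load for constant f
            if 0 <= i < n:
                b[i] += f * h / 2
    return A, b

rng = np.random.default_rng(0)
n_el = 64
h = 1.0 / n_el
a_vals = 1.0 + rng.random(n_el)        # rough, uniformly elliptic coefficient
A, b = assemble_1d(a_vals, h)
c = np.linalg.solve(A, b)              # coefficient vector in (2.3)
```

For a rough coefficient the mesh must resolve the smallest scale of a, which is precisely the cost issue discussed below.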

It is well known that the error u − U is E-orthogonal to the ansatz space (2.2), and the following quasi-optimality property holds:

 ∥u−U∥E(Ω)≤C∥u−Pu∥E(Ω),

where C is some constant depending on the ellipticity bounds of a, and Pu is the projection of the true solution u onto the space (2.2). Here the energy norm on any subdomain ω ⊆ Ω is defined by

 (2.4) ∥v∥E(ω) = ⟨a∇v, ∇v⟩^{1/2}_{L²(ω)} := [∫_ω a|∇v(x)|² dx]^{1/2}.

Thus the existence of a small numerical error requires a set of basis functions that form a space for which the best-approximation error ∥u − Pu∥_{E(Ω)} is small.

The main difficulty in solving the elliptic equation with a rough coefficient is the number of basis functions needed. When a is rough with ε as its smallest scale, for standard piecewise-affine finite elements, the mesh size h needs to resolve the smallest scale, so that h ≲ ε in each dimension. It follows that the dimension of the system (2.3) is n = O(ε^{−d}), where d is the spatial dimension. The large size of the stiffness matrix A and its large condition number (usually on the order of h^{−2}) make the problem expensive to solve using this approach.

The question is then whether it is possible to design a Galerkin space whose dimension is independent of ε. As mentioned in Section 1, the offline-online procedure makes this possible, as we discuss next.

### 2.1. Generalized Finite Element Method

The generalized finite element method was one of the earliest methods to utilize the offline-online procedure. This approach is based on a partition of unity. One first decomposes the domain Ω into many small patches ωi, i = 1, 2, …, m, that form an open cover of Ω. Each patch ωi is assigned a partition-of-unity function νi that vanishes outside ωi. Specifically, there are positive constants C₁ and C₂ such that

 (2.5a) 0 ≤ νi(x) ≤ 1, for all x ∈ Ω and all i = 1, 2, …, m; (2.5b) νi(x) = 0, for all x ∈ Ω∖ωi and all i = 1, 2, …, m; (2.5c) max_{x∈Ω} νi(x) ≤ C₁, for all i = 1, 2, …, m; (2.5d) max_{x∈Ω} |∇νi(x)| ≤ C₂/diam(ωi), for all i = 1, 2, …, m.

Moreover, we have

 (2.6) ∑_{i=1}^m νi(x) = 1, for all x ∈ Ω.

In the offline step, basis functions are constructed for each patch ωi, i = 1, 2, …, m. We denote the numerical local solution space in patch ωi by:

 (2.7) Φ[i]=Span{ϕi,j,j=1,2,…,ni},

where ni is the number of basis functions for ωi. In the online step, the Galerkin formulation is used, with the space in (2.2) replaced by:

 (2.8) Φ:=⨁i=1,2,…,mΦ[i]νi=Span{ϕi,jνi,i=1,2,…,m,j=1,2,…,ni}.

Details can be found in [2].

The total number of basis functions is ∑_{i=1}^m ni. If all ni, i = 1, 2, …, m, are bounded by a modest constant, the dimension of the space Φ is of order m, so the computation in the online step is potentially inexpensive. It is proved in [2] that the total approximation error is governed by the summation of all local approximation errors.
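A minimal sketch of such a partition of unity, assuming a one-dimensional domain covered by m overlapping patches and piecewise-linear bumps (the patch layout is an illustrative choice; pointwise normalization enforces (2.6) exactly):

```python
import numpy as np

def pu_functions(x, centers, half_width):
    """Piecewise-linear bumps, one per patch ω_i = (c_i - w, c_i + w),
    normalized pointwise so that the partition-of-unity property (2.6) holds."""
    raw = np.maximum(0.0, 1.0 - np.abs(x[None, :] - centers[:, None]) / half_width)
    return raw / raw.sum(axis=0)       # normalization: columns sum to 1

x = np.linspace(0.0, 1.0, 201)
m = 5
centers = np.linspace(0.0, 1.0, m)     # overlapping patches covering [0, 1]
nu = pu_functions(x, centers, half_width=0.25)
```

Each row of `nu` is one νi; it is supported on its own patch, lies in [0, 1], and the rows sum to one at every grid point, matching (2.5a), (2.5b), and (2.6).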

###### Theorem 2.1.

Denote by u the solution to (2.1). Suppose {ωi}_{i=1}^m forms an open cover of Ω and let {νi}_{i=1}^m denote a set of partition-of-unity functions as defined in (2.5). Assume that in each patch ωi the solution u can be approximated well by a function ζi ∈ Φ[i]. Then the global error is small too. More specifically, if we assume that

 (2.9) ∥u−ζi∥L2(ωi)≤ε1(i)and∥u−ζi∥E(ωi)≤ε2(i),

and define

 ζ(x)=m∑i=1ζi(x)νi(x),

then ζ ∈ H¹(Ω), and for the constants C₁ and C₂ defined in (2.5), we have

 ∥u−ζ∥L2(Ω)≤C1(m∑i=1ε21(i))1/2

and

 ∥u−ζ∥E(Ω) ≤ C2(∑_{i=1}^m ε1²(i)/diam²(ωi) + C1² ∑_{i=1}^m ε2²(i))^{1/2}.

This theorem shows that the approximation error of the Galerkin numerical solution for the gFEM depends directly on the accuracy of the local approximation spaces in each patch.

### 2.2. Low-Rank Local Solution Space

One reason for the success of gFEM is that the local numerical solution space is approximately low-rank, meaning that ni can be made small for all i in (2.7); see [1]. We review the relevant results in this section, and show how to find Φ[i].

Denote by ω∗ ⊃ ω an enlargement of the patch ω. To simplify notation, we suppress the subscript i from here on. We introduce a restriction operator:

 P:Ha(ω∗)/R→Ha(ω)/R,

where Ha(ω∗) is the collection of all a-harmonic functions in ω∗ and Ha(ω∗)/ℝ represents the quotient space of Ha(ω∗) with respect to constant functions. (This modification is needed to make the energy norm a norm, since an a-harmonic function is defined only up to an additive constant.) The operator P maps a function in Ha(ω∗)/ℝ to its restriction on ω. We denote its adjoint operator by P∗. It is shown in [1] that the operator P∗P is a compact, self-adjoint, nonnegative operator on Ha(ω∗)/ℝ.

To derive an n-dimensional approximation of the image of P, we first define the distance d(Sn, ω∗) of an arbitrary n-dimensional function subspace Sn ⊂ Ha(ω)/ℝ, as follows:

 d(Sn, ω∗) = sup_{u∈Ha(ω∗)/ℝ} inf_{ξ∈Sn} ∥Pu−ξ∥E(ω)/∥u∥E(ω∗).

By considering all possible Sn, we can identify the optimal approximation space Φn (called the optimal n-dimensional set) that achieves the infimum:

 (2.10) Φn = argmin_{Sn} d(Sn, ω∗).

We now define a distance measure between ω and ω∗ as follows:

 dn(ω, ω∗) = d(Φn, ω∗) = sup_{u∈Ha(ω∗)/ℝ} inf_{ξ∈Φn} ∥Pu−ξ∥E(ω)/∥u∥E(ω∗).

The term dn(ω, ω∗) is the celebrated Kolmogorov n-width of the compact operator P. It reflects how quickly a-harmonic functions supported on ω∗ lose their energies when confined to ω. According to [25], the optimal space and the Kolmogorov n-width can be found explicitly, in terms of the eigendecomposition of P∗P on Ha(ω∗)/ℝ, which is

 (2.11) P∗Pψi=λiψi,

with the eigenvalues λi arranged in descending order. We then have

 Φn:=PΨn=Span{ϕ1,ϕ2,…,ϕn},

where

 (2.12) Ψn=Span{ψ1,⋯,ψn}andϕi=Pψi,

and

 (2.13) dn(ω,ω∗)=√λn+1.

Note that the ψi are all supported in the enlarged domain ω∗, while the φi are their confinements in ω. Almost-exponential decay of the Kolmogorov width with respect to n was proved in [1], as follows.

###### Theorem 2.2.

The accuracy dn(ω, ω∗) has nearly exponential decay for n sufficiently large; that is, for any small ε > 0, we have

 dn(ω, ω∗) ≤ e^{−n^{1/(d+1)−ε}}.

It follows that for any function u that is a-harmonic in the patch ω∗, we can find a function v in the space Φn for which

 ∥u−v∥E(ω) ≤ dn(ω, ω∗)∥u∥E(ω∗) ∼ e^{−n^{1/3}}∥u∥E(ω∗), the exponent n^{1/3} corresponding to d = 2.
###### Remark 1.

Note that √λi is the i-th singular value of P. Because of the fast decay of these singular values indicated by Theorem 2.2, P is considered to be an approximately low-rank operator. It follows that almost all a-harmonic functions supported on ω∗, when confined in ω, look almost alike, and can be represented by a relatively small number of “important” modes.

###### Remark 2.

We note that enlarging the domain for over-sampling is not a new idea. In [12], the boundary-layer behavior in the oversampled region was studied and utilized for computation.

### 2.3. Computing the Local Solution Space

We describe here the computation of an approximation to Φn via a discretized implementation of the definitions in the previous subsection. The idea is to discretize the enlarged patch ω∗ with a fine mesh, collect all a-harmonic functions in this discrete setting, and then solve (2.11) in a finite basis. To collect all a-harmonic functions, we would need to solve the system with elliptic operator (1.1) locally, with all possible Dirichlet boundary conditions on ∂ω∗. For ease of presentation, here and in the sequel, we assume that we choose a piecewise-affine finite-element discretization of the patch ω∗ for computing the local a-harmonic functions. Then the discretized a-harmonic functions are determined by their values on the grid points on the boundary of ω∗. We proceed in three stages.

#### Stage A

Approximate the space of a-harmonic functions via the functions χi obtained by solving the following system, for i = 1, 2, …, Ny:

 (2.14) {Lχi = −div(a(x)∇χi) = 0, x ∈ ω∗; χi|∂ω∗ = δi, yi ∈ ∂ω∗,

where δi is the hat function that peaks at the boundary grid point yi ∈ ∂ω∗ and equals zero at the other grid points yj, j ≠ i. Recall that we have assumed a piecewise-affine finite-element discretization of ω∗.

#### Stage B

Compute (2.11) in the space spanned by {χ1, …, χNy}. To do this, we first note that:

 (2.15) ⟨P∗Pψi,δ⟩E(ω∗)=⟨Pψi,Pδ⟩E(ω)=⟨ψi,δ⟩E(ω),∀ δ∈Ha(ω∗)/R,

Thus the weak formulation of (2.11) is given by

 ⟨ψi,χ⟩E(ω)=λi⟨ψi,χ⟩E(ω∗),∀χ∈span{χ1,…,χNy}.

Expanding the eigenfunction ψi in terms of the χj, j = 1, 2, …, Ny:

 (2.16) ψi=∑jc(i)jχj,

we obtain the following equation for the coefficient vector →c^{(i)} = (c^{(i)}_1, …, c^{(i)}_{Ny})^⊤:

 ∑jc(i)j⟨χj,χk⟩E(ω)=λi∑jc(i)j⟨χj,χk⟩E(ω∗).

This system can be written as

 (2.17) S→c^{(i)} = λi S∗→c^{(i)}, with Smn = ⟨χm, χn⟩E(ω) and S∗mn = ⟨χm, χn⟩E(ω∗).

This generalized eigenvalue problem can be solved for the eigenvalues λi, i = 1, 2, …, Ny, arranged in descending order, and their associated eigenfunctions ψi, defined from (2.16) using the generalized eigenvectors →c^{(i)} from (2.17). The index n is chosen so that λ_{n+1} falls below a given error tolerance TOL.

#### Stage C

Obtain Φn by substituting the functions ψi, i = 1, 2, …, n, from Stage B into (2.12).
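The three stages can be sketched end-to-end in a small discrete setting. The sketch below is an assumption-laden simplification: it replaces the finite-element discretization by a graph-Laplacian-style energy on a square grid, with random edge coefficients standing in for a(x), and it deflates the constant function to work in the quotient space Ha(ω∗)/ℝ:

```python
import numpy as np
from scipy.linalg import eigh, null_space, solve

n = 12                                   # (n+1) x (n+1) grid on ω*; ω = middle half
N = (n + 1) ** 2
node = lambda i, j: i * (n + 1) + j
rng = np.random.default_rng(1)

# Grid edges, each with a random coefficient in [1, 2] standing in for a(x).
edges = [(node(i, j), node(i + 1, j)) for i in range(n) for j in range(n + 1)]
edges += [(node(i, j), node(i, j + 1)) for i in range(n + 1) for j in range(n)]
a_e = 1.0 + rng.random(len(edges))

def stiffness(keep):
    """K = sum over kept edges of a_e (e_p - e_q)(e_p - e_q)^T, so that
    u^T K v is a discrete energy inner product over the kept region."""
    K = np.zeros((N, N))
    for (p, q), a, k in zip(edges, a_e, keep):
        if k:
            K[p, p] += a; K[q, q] += a
            K[p, q] -= a; K[q, p] -= a
    return K

in_omega = lambda i, j: n // 4 <= i <= 3 * n // 4 and n // 4 <= j <= 3 * n // 4
omega = {node(i, j) for i in range(n + 1) for j in range(n + 1) if in_omega(i, j)}
K_star = stiffness([True] * len(edges))                             # energy on ω*
K_omega = stiffness([p in omega and q in omega for p, q in edges])  # energy on ω

bdry = [node(i, j) for i in range(n + 1) for j in range(n + 1)
        if i in (0, n) or j in (0, n)]
intr = [k for k in range(N) if k not in bdry]

# Stage A: one discrete a-harmonic function per boundary "hat".
X = np.zeros((N, len(bdry)))
for k, b in enumerate(bdry):
    X[b, k] = 1.0
X[intr, :] = solve(K_star[np.ix_(intr, intr)], -K_star[np.ix_(intr, bdry)])

# Stage B: generalized eigenproblem (2.17); deflate the constant function.
S, S_star = X.T @ K_omega @ X, X.T @ K_star @ X
B = null_space(np.ones((1, len(bdry))))
lam, V = eigh(B.T @ S @ B, B.T @ S_star @ B)
lam, V = lam[::-1], V[:, ::-1]           # descending eigenvalues

# Stage C: the optimal local basis (only its restriction to ω is used).
n_modes = 6
Phi = X @ (B @ V[:, :n_modes])
```

Every eigenvalue lies in [0, 1] because the energy on ω is dominated by the energy on ω∗, and the leading modes give the discrete analogue of Φn.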

## 3. Randomized Sampling Methods for Local Bases

In this section we propose a class of random sampling methods to construct local basis functions efficiently. As seen in Section 2.3, finding the optimal basis functions amounts to solving the generalized eigenvalue problem in (2.17). The main cost does not come from performing the eigenvalue decomposition, but rather from computing the a-harmonic functions χi, which are used to construct the matrices S and S∗ in (2.17). As shown in Section 2.2, the eigenvalues decay almost exponentially fast, indicating that, locally, only a limited number of modes is needed to represent the whole solution space well. This low-rank structure motivates us to consider randomized sampling techniques.

Randomized algorithms have been highly successful in compressed sensing, where they are used to extract low-rank structure efficiently from data. The Johnson-Lindenstrauss lemma [16] suggests that structure in high-dimensional data points is largely preserved when projected onto random lower-dimensional spaces. The randomized SVD (rSVD) algorithm uses this idea to capture the principal components of a large matrix by random projection of its row and column spaces onto smaller subspaces; see [11] for a review. In the current numerical PDE context, knowing that the local solution space is essentially low-rank, we seek to adopt the random sampling idea to generate local approximate solution spaces efficiently.

For our problem, unfortunately, randomized SVD cannot be applied directly, as we discuss further in Section 3.3. Instead, we propose a method based on Galerkin approximation of the generalized eigenvalue problem on a small subspace. One immediate problem is that an arbitrarily given random function is not necessarily a-harmonic. Thus, our method first generates a random collection of functions, projects them onto the a-harmonic function space, and then solves the generalized eigenvalue problem (2.17) on the resulting subspace to find the optimal basis functions. A detailed description of our procedure is shown in Algorithm 1.

Note that the steps in Stage 2 of Algorithm 1 are parallel to those of Section 2.3, but only a small number of functions is used in the generalized eigenvalue problem, rather than the whole list of Ny a-harmonic functions. We therefore save significant computation in preparing the a-harmonic function space, in assembling the S and S∗ matrices, and in solving the generalized eigenvalue decomposition.

The key is to use the random sampling strategy in Stage 1 of Algorithm 1 to generate an effective small subspace for the generalized eigenvalue problem. This aspect of the algorithm will be the focus of the rest of this section.

### 3.1. a-Harmonic Projection

Let us first discuss the a-harmonic projection of a given function ξ supported on ω∗. This problem can be formulated as a PDE-constrained optimization problem:

 (3.4) min_γ ½∥γ−ξ∥²_{L²(ω∗)} s.t. Lγ = 0,

where L is the elliptic operator defined in (2.1). The Lagrangian function for (3.4) is as follows:

 (3.5) F(γ,λ):=12∥γ−ξ∥2L2(ω∗)−⟨λ,Lγ⟩L2(ω∗),

where λ is a Lagrange multiplier. In the discrete setting, we form a grid over ω∗ and denote by ζm the hat function centered at grid point m. (Recall that we have assumed a piecewise-affine finite-element discretization.) The Lagrangian function for the corresponding discretized optimization problem is

 (3.6) F(γ,λ)=12(γi−ξi)⊤(γi−ξi)−λ⊤Aiiγi−λ⊤Aibγb,

where the superindices i and b stand for interior and boundary grids, respectively, and A is the stiffness matrix:

 Amn=⟨a∇ζm,∇ζn⟩L2(ω∗).

In the discrete setting, λ is a vector of the same length as γ^i (the number of grid points in the interior). Note that in the translation to the discrete setting, we represent the constraint Lγ = 0 by its discrete form, which leads to

 Aiiγi+Aibγb=0.

Here A^{ii} is the stiffness matrix confined to the interior, and A^{ib} is the part of the stiffness matrix generated by taking the inner products of the interior basis functions with the boundary basis functions. To solve the minimization problem, we take the partial derivatives of (3.6) with respect to γ^i, γ^b, and λ and set them equal to zero, as follows:

 ∇_{γ^i}F = γ^i − ξ^i − A^{ii}λ = 0, ∇_{γ^b}F = −A^{ib⊤}λ = 0, ∇_λF = A^{ii}γ^i + A^{ib}γ^b = 0.

Some manipulation yields the following system for γ^b and γ^i:

 Aib⊤(Aii)−2Aibγb=−Aib⊤(Aii)−1ξi,γi=−(Aii)−1Aibγb,

and the solution to this system gives the solution of (3.4) in the discrete setting.
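The derivation above can be checked numerically in a one-dimensional discrete setting; the grid size and coefficient below are illustrative choices:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 40                                  # elements on ω* = [0, 1]
h = 1.0 / n
a = 1.0 + rng.random(n)                 # rough elementwise coefficient

# Full stiffness matrix on all n+1 nodes (no boundary condition imposed).
A = np.zeros((n + 1, n + 1))
for k in range(n):
    A[k:k + 2, k:k + 2] += (a[k] / h) * np.array([[1.0, -1.0], [-1.0, 1.0]])

ii = np.arange(1, n)                    # interior nodes
bb = np.array([0, n])                   # boundary nodes
A_ii = A[np.ix_(ii, ii)]
A_ib = A[np.ix_(ii, bb)]

xi = rng.standard_normal(n - 1)         # random sample ξ to be projected

# The derived system: A_ib^T A_ii^{-2} A_ib γ^b = -A_ib^T A_ii^{-1} ξ^i,
# followed by γ^i = -A_ii^{-1} A_ib γ^b.
T = np.linalg.solve(A_ii, A_ib)         # A_ii^{-1} A_ib (A_ii is symmetric)
gamma_b = np.linalg.solve(T.T @ T, -T.T @ xi)
gamma_i = -T @ gamma_b
```

The resulting γ satisfies the discrete a-harmonicity constraint exactly, and the residual γ^i − ξ^i is orthogonal to the range of A_ii^{-1}A_ib, which is the first-order optimality condition.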

### 3.2. Random sampling strategies

We have many possible choices for the random functions ξ in Stage 1-A of Algorithm 1. Here we list a few natural ones.

• Interior δ-function. Choose a random grid point x₀ in ω and set ξ = 1 at this grid point, and zero at all other grid points. That is, ξ is the hat function associated with the grid point x₀.

• Interior i.i.d. function. Choose the value of ξ at each grid point in ω independently from a standard normal distribution. The values of ξ at grid points in ω∗∖ω are set to zero.

• Full-domain i.i.d. function. The same as the interior i.i.d. function, except that the values of ξ at the grid points in ω∗∖ω are also chosen as i.i.d. Gaussian random variables.

• Random Gaussian. Choose a random grid point x₀ ∈ ω and set ξ to a Gaussian function centered at x₀, evaluated at all grid points in ω∗.

We aim to select basis functions (through Stage 2) that are associated with the largest eigenvalues, so that the Kolmogorov n-width (2.13) can be small. Thus, we hope that the functions chosen in Stage 1 produce large eigenvalues in (2.17). A large value of λ indicates that a large portion of the energy is maintained in ω, with only a small amount coming from the buffer region ω∗∖ω. It therefore seems natural to choose functions with most of their variation inside ω. However, the projection onto the a-harmonic space makes the locality of the resulting functions hard to predict. In Section 4, we propose and analyze a criterion for the performance of the random sampling schemes. In particular, we compare the choices listed above.
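The four choices can be sketched as follows, on a uniform grid over ω∗ = [0,1]² with ω taken as the middle half; the patch geometry and the Gaussian width σ are illustrative assumptions:

```python
import numpy as np

def random_sample(kind, n, rng, sigma=0.1):
    """One random sample ξ on an (n+1) x (n+1) grid over ω* = [0,1]^2,
    with ω taken as the middle half of ω* (an illustrative layout)."""
    x = np.linspace(0.0, 1.0, n + 1)
    X, Y = np.meshgrid(x, x, indexing="ij")
    in_omega = (np.abs(X - 0.5) <= 0.25) & (np.abs(Y - 0.5) <= 0.25)
    pts = np.argwhere(in_omega)
    xi = np.zeros((n + 1, n + 1))
    if kind == "interior_delta":        # hat at one random grid point of ω
        i, j = pts[rng.integers(len(pts))]
        xi[i, j] = 1.0
    elif kind == "interior_iid":        # i.i.d. Gaussian inside ω, zero outside
        xi[in_omega] = rng.standard_normal(int(in_omega.sum()))
    elif kind == "full_iid":            # i.i.d. Gaussian on all of ω*
        xi = rng.standard_normal(xi.shape)
    elif kind == "gaussian":            # Gaussian bump centered at a point of ω
        i, j = pts[rng.integers(len(pts))]
        xi = np.exp(-((X - x[i]) ** 2 + (Y - x[j]) ** 2) / (2 * sigma ** 2))
    return xi
```

Each sample would then be projected onto the a-harmonic space (Section 3.1) before entering the generalized eigenvalue problem.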

We mention here two other approaches that have been proposed in the literature for producing bases of a-harmonic functions. In these previous works, the functions obtained are used directly as basis functions, without applying the post-processing step of the generalized eigenvalue problem (2.17) in Stage 2 above.

1. Random boundary sampling. In [9], the authors proposed to obtain a list of random a-harmonic functions by solving the local elliptic equation with i.i.d. random Dirichlet boundary conditions. Assuming there are Ny grid points on the boundary ∂ω∗, we define g to be a vector of length Ny with i.i.d. random variables as its components. We then define γ by solving

 (3.7) {Lγ=0,x∈ω∗γ|∂ω∗=g.

This process is repeated to obtain the desired number of random a-harmonic functions.

2. Randomized boundary sampling with exponential covariance. A technique in which the Dirichlet boundary conditions are chosen to be random Gaussian variables with a specified covariance matrix is described in [18]. The covariance is assumed to be an exponential function of the distance between boundary points, that is,

 (3.8) Cov(yi,yj)=exp(−|yi−yj|/σ).

The first few modes of a Karhunen-Loève expansion are used to construct a boundary condition in (3.7), with which basis functions are computed. Although a justification for this approach is not provided, numerical computations show that it is more efficient than i.i.d. random boundary sampling.
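A sketch of this construction, assuming the boundary of a unit square parametrized by arc length and an illustrative correlation length σ:

```python
import numpy as np

rng = np.random.default_rng(4)
n_b = 80                                        # grid points along ∂ω*, by arc length
s = np.linspace(0.0, 4.0, n_b, endpoint=False)  # unit square: perimeter 4

# Distance between boundary points along the closed curve, and covariance (3.8).
d = np.abs(s[:, None] - s[None, :])
d = np.minimum(d, 4.0 - d)                      # wrap around the closed boundary
sigma = 0.5                                     # illustrative correlation length
C = np.exp(-d / sigma)

# Karhunen-Loève expansion: keep only the leading modes of the covariance.
w, V = np.linalg.eigh(C)
w, V = w[::-1], V[:, ::-1]                      # descending eigenvalues
n_kl = 8
g = V[:, :n_kl] @ (np.sqrt(np.maximum(w[:n_kl], 0.0)) * rng.standard_normal(n_kl))
```

The vector g plays the role of the Dirichlet data in (3.7); truncating the expansion to a few modes yields smooth, strongly correlated boundary samples.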

### 3.3. Connection with Randomized SVD

We briefly address the connection of the random sampling method we propose in this paper with the well-known randomized SVD (rSVD) algorithm. Although rSVD cannot be used directly in our problem, it serves as a motivation for our randomized sampling strategies.

The randomized SVD algorithm, studied thoroughly in [11], speeds up the computation of the SVD of a matrix A when the matrix is large and approximately low-rank. With high probability, the singular vector structure is largely preserved when the matrix is projected onto a random subspace. Specifically, for a random matrix Ω with a small number of columns (the number depending on the rank of A), it is proved in [11] that if we obtain Q from the QR factorization of AΩ, then

 (3.9) ∥A−QQ⊤A∥2≪∥A∥2.

This bound implies that any vector in the range space of A can be well approximated by its projection onto the space spanned by the columns of Q. For example, if →u = A→v, we have from (3.9) that

 (3.10) ∥→u−QQ⊤→u∥≪∥→u∥.

We note that Q and AΩ span the same column space, but Q is easier to work with and better conditioned, because its columns are orthonormal. Equivalently to (3.10), we can also say that any →u in the image of A can be approximated well using a linear combination of the columns of Q.
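The randomized range finder underlying (3.9) can be sketched as follows; the matrix sizes, the rank, and the amount of oversampling are illustrative:

```python
import numpy as np

rng = np.random.default_rng(5)
m, n, r = 200, 150, 10

# A low-rank-plus-noise matrix standing in for a large, approximately
# low-rank operator.
A = rng.standard_normal((m, r)) @ rng.standard_normal((r, n))
A += 1e-8 * rng.standard_normal((m, n))

# Random projection of the column space, with a few columns of oversampling,
# followed by QR to get an orthonormal basis Q of the sampled range.
Omega = rng.standard_normal((n, r + 5))
Q, _ = np.linalg.qr(A @ Omega)

# Relative spectral-norm error in (3.9).
err = np.linalg.norm(A - Q @ (Q.T @ A), 2) / np.linalg.norm(A, 2)
```

Because A is nearly rank r, the r + 5 random samples capture its range almost exactly, and the relative error is at the noise level.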

To see the connection between rSVD and our problem, we first write the generalized eigenvalue problem (2.17) in SVD form. Recall the definitions (3.1) of S and S∗:

 Smn=∫ωa(x)∇χm(x)⋅∇χn(x)dx,S∗mn=∫ω∗a(x)∇χm(x)⋅∇χn(x)dx,

and define

 (3.11) Φ∗=[√a∇χ1,√a∇χ2,…,√a∇χNy],Φ=Φ∗|ω.

Then one can denote S = Φ^⊤Φ and S∗ = Φ∗^⊤Φ∗, and rewrite the generalized eigenvalue problem (2.17) as follows:

 (3.12) Φ⊤Φ→c=λΦ∗⊤Φ∗→c.

We write the QR factorization for as follows:

 Φ∗=QΦ∗RΦ∗,

and denote →d = RΦ∗→c. By substituting into (3.12), we obtain

 (ΦR−1Φ∗)⊤(ΦR−1Φ∗)→d=λ→d,

meaning that (√λ, →d) forms a singular pair of the matrix ΦR⁻¹Φ∗.

According to the rSVD argument, the leading singular vectors of are captured by those of

 (3.13) ΦR−1Φ∗R,

where R is a matrix with a small number of columns whose entries are i.i.d. Gaussian random variables. Specifically, with high probability, the leading singular values of (3.13) are almost the same as those of ΦR⁻¹Φ∗, and the column space spanned by (3.13) largely covers the image of Φ, as defined in (3.11).

We now interpret (3.13) from the viewpoint of PDEs. Decomposing R⁻¹Φ∗R into columns as follows:

 (3.14) R−1Φ∗R=[r1,r2,…],withrk=[rk1,rk2,⋯]⊤,

and denoting γk := ∑_j rkjχj, we have from (3.11) that

 Φrk=√a∇(∑jrkjχj)≐√a∇γk.

Numerically, this corresponds to solving the following system for :

 (3.15) {Lγk = 0, x ∈ ω∗; γk|∂ω∗ = ∑_j rkjδj.

It is apparent from this equation that to obtain γk, we do not need to compute all Ny functions χj and use them to construct Φ. Rather, we can compute γk directly by solving the elliptic equation with the random boundary condition ∑_j rkjδj. The cost of this procedure is proportional to the number of random samples, which is much less than Ny.

Unfortunately, this procedure is difficult to implement in a manner that accords with the rSVD theory. The matrix R is constructed using i.i.d. Gaussian random variables, but RΦ∗ is unknown ahead of time, so the distribution of the columns in (3.14) is unknown. The theory here suggests that there exists a random sampling strategy that achieves the accuracy and efficiency that characterize rSVD, but it does not provide such a strategy.

## 4. Efficiency of Various Random Sampling Methods

As discussed in Section 3, given multiple ways to choose the random samples in Stage 1 of Algorithm 1, it is natural to ask which one is better, and how to predetermine the approximation accuracy. We answer these questions in this section.

The key requirement is that Algorithm 1 should capture the high-energy modes of (2.11), that is, the modes that correspond to the highest values of λi. We start with a simple example in Section 4.1 that finds the relationship between the energy captured by a certain single mode and the angle between that mode and the highest-energy mode (which corresponds to the Kolmogorov n-width). We extend this discussion beyond a single mode in Section 4.2, again using the language of linear algebra. The relevance to local PDE basis construction is outlined in Section 4.3.

### 4.1. A One-Mode Example

Suppose we are working in a three-dimensional space, with symmetric positive definite matrices A and B and generalized eigenvectors x₁, x₂, and x₃ such that

 (4.1) ⟨xi, xj⟩B = xi^⊤Bxj = δij, ⟨xi, xj⟩A = xi^⊤Axj = δijλi,

for generalized eigenvalues λ₁ > λ₂ ≥ λ₃ > 0. We thus have

 A⋅xi=λiB⋅xi,i=1,2,3.

Suppose we have some vector x that is intended to approximate the leading eigenvector x₁. The energy of x is

 (4.2) E(x) = (x^⊤Ax)/(x^⊤Bx),

and the angle between the space spanned by and that spanned by is defined by:

 (4.3) ∠(x, x₁) = max_{β:|β|≤1} min_α ∥αx−βx₁∥A/(∥x∥B∥x₁∥B) = max_{β:|β|≤1} min_α ∥αx−βx₁∥A/∥x∥B.

We have the following result (which generalizes easily to dimensions greater than 3).

###### Proposition 1.

The angle (4.3) is bounded in terms of the energy (4.2) as follows:

 (4.4) ∠(x, x₁) ≤ √(λ₁λ₂(λ₁−E(x)) / ((λ₁−λ₂)E(x))).
###### Proof.

The proof is simple algebra. As x₁, x₂, x₃ span the entire space and are B-orthogonal, we have

 (4.5) x=w1x1+w2x2+w3x3,

with wi = ⟨x, xi⟩B, i = 1, 2, 3. According to the definition of the angle, one can reduce the problem by setting β = 1 and ∥x∥B = 1, so that w₁² + w₂² + w₃² = 1 in (4.3). (With these normalizations, we have from (4.1) and (4.2) that E(x) = w₁²λ₁ + w₂²λ₂ + w₃²λ₃.) We thus have

 (∠(x, x₁))² = min_α ∥αx−x₁∥²A = min_α ∥(αw₁−1)x₁ + αw₂x₂ + αw₃x₃∥²A = min_α ((αw₁−1)²λ₁ + α²w₂²λ₂ + α²w₃²λ₃).

The minimum is achieved at α = w₁λ₁/E(x), with the minimized squared angle being

 (4.6) (∠(x, x₁))² = λ₁(E(x) − w₁²λ₁)/E(x).

To bound the numerator in (4.6) we observe that

 E(x) − w₁²λ₁ = w₂²λ₂ + w₃²λ₃ ≤ w₂²λ₂ + w₃²λ₂ = (1−w₁²)λ₂,

and moreover

 E(x) ≤ w₁²λ₁ + (1−w₁²)λ₂ ⟹ λ₁ − E(x) ≥ (1−w₁²)(λ₁−λ₂) ⟹ 1 − w₁² ≤ (λ₁−E(x))/(λ₁−λ₂).

By combining these last two bounds, we obtain

 E(x) − w₁²λ₁ ≤ λ₂(λ₁−E(x))/(λ₁−λ₂).

By substituting this bound into (4.6), we obtain (4.4). ∎

Note that the bound (4.4) decreases to zero as E(x) → λ₁.

According to (4.4), a larger gap in the spectrum between λ₁ and λ₂ yields a tighter bound, and thus better control over the angle. The proposition indicates that the energy E(x) is the quantity that measures how well the randomly given vector x captures the first mode, and thus serves as the criterion for the quality of the approximation.
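The bound (4.4) can be verified numerically. Taking B = I and A diagonal (an illustrative choice of eigenvalues below), the generalized eigenvectors are the coordinate vectors, and the minimizer from the proof gives the squared angle in closed form:

```python
import numpy as np

rng = np.random.default_rng(6)
lam = np.array([3.0, 1.5, 0.5])         # eigenvalues with λ1 > λ2 ≥ λ3 > 0
A = np.diag(lam)                        # with B = I, eigenvectors are x_i = e_i

angles_sq, bounds_sq = [], []
for _ in range(200):
    x = rng.standard_normal(3)
    x /= np.linalg.norm(x)              # ||x||_B = 1 since B = I
    E = x @ A @ x                       # energy (4.2)
    w1 = x[0]
    # (4.6): squared angle, with the minimum attained at α = w1 λ1 / E
    angles_sq.append(lam[0] * (E - w1 ** 2 * lam[0]) / E)
    # right-hand side of (4.4), squared
    bounds_sq.append(lam[0] * lam[1] * (lam[0] - E) / ((lam[0] - lam[1]) * E))

angles_sq, bounds_sq = np.array(angles_sq), np.array(bounds_sq)
```

Over many random trial vectors, the squared angle never exceeds the squared bound, as Proposition 1 asserts.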

### 4.2. Higher-Dimensional Criteria

In this section, we seek the counterpart of the previous result in higher-dimensional spaces. Suppose now that the two symmetric positive definite matrices A and B are n × n, and that their generalized eigenpairs satisfy the following:

 (4.7) ⟨xi,xj⟩B=δij,⟨xi,xj⟩A=δijλi,

so that

 A⋅xi=λiB⋅xi,withλ1≥⋯≥λk>λk+1≥⋯≥λn>0,

or

 (4.8) A⋅X = B⋅X⋅Λ, with Λ = diag(λ₁, λ₂, …, λn).

Suppose we are trying to recover Xh = [x₁, …, xk], the matrix whose columns are the first k modes. Define Xl = [x_{k+1}, …, xn] to collect the remaining modes, and let Y be our proposed approximation to Xh. We seek a quantity that measures how well Y approximates Xh in the energy norm.

To proceed, we assume first that Y is B-orthonormal, that is,

 (4.9) Y^⊤BY = 𝟙.

(If necessary, a QR procedure based on the B-inner product can be used to ensure this property.) Extending (4.2), we define the energy for an n × k matrix Z as follows:

 (4.10) E(Z) := Tr(Z^⊤AZ)/Tr(Z^⊤BZ).

The Kolmogorov width between the proposed space Span Y and the optimal subspace Span Xh is a generalization of the angle (4.3), defined as follows:

 (4.11) d(Xh,Y)=supz∈SpanXh,∥z∥B≤1infy∈Y∥y−z∥A,

where “Span Xh” denotes the space spanned by the columns of Xh. We show below how d(Xh, Y) is related to the energy difference E(Xh) − E(Y).

Since X spans the entire space, we can express Y as follows:

 (4.12) Y = X⋅C,

where C is an n × k matrix. We have that C has orthonormal columns, because from (4.7) and (4.9), we have

 (4.13) Y = X⋅C ⟹ 𝟙 = Y^⊤⋅B⋅Y = C^⊤⋅X^⊤⋅B⋅X⋅C = C^⊤⋅C.

We denote by Ch the upper k × k portion of C, and by Cl the lower (n−k) × k portion. We denote the elements of C by cji, that is,

 (4.14) Ch=[cji]j=1,2,…,k;i=1,2,…,k,Cl=[cji]j=k+1,k+2,…,n;i=1,2,…,k.

By orthonormality of , we have that

 k∑j=1c2ji+n∑j=k+1c2ji=1,i=1,2,…,k.

and thus

 (4.15) [Cl⊤Cl]ii=1−[Ch⊤Ch]ii=1−k∑j=1c2ji,i=1,2,…,k.
###### Lemma 4.1.

The trace of Cl^⊤Cl is bounded by the energy difference between the optimal space and the proposed space:

 (4.16) Tr(Cl^⊤Cl) ≤ k(E(Xh) − E(Y))/(λk − λ_{k+1}).

Furthermore, Ch is invertible if

 (4.17) E(Xh) − E(Y) < (λk − λ_{k+1})/k.
###### Proof.

Since Y is B-orthonormal, we have from (4.12) that

 (4.18) Ch^⊤Ch + Cl^⊤Cl = 𝟙.