
# A model reduction approach to numerical inversion for a parabolic partial differential equation

Liliana Borcea (Department of Mathematics, University of Michigan, 2074 East Hall, 530 Church Street, Ann Arbor, MI 48109-1043; borcea@umich.edu) · Vladimir Druskin (Schlumberger-Doll Research Center, 1 Hampshire St., Cambridge, MA 02139-1578; druskin1@slb.com) · Alexander V. Mamonov (Schlumberger, 3750 Briarpark Dr., Houston, TX 77042; amamonov@slb.com) · Mikhail Zaslavsky (Schlumberger-Doll Research Center, 1 Hampshire St., Cambridge, MA 02139-1578; mzaslavsky@slb.com)
###### Abstract

We propose a novel numerical inversion algorithm for the coefficients of parabolic partial differential equations, based on model reduction. The study is motivated by the application of controlled source electromagnetic exploration, where the unknown is the subsurface electrical resistivity and the data are time resolved surface measurements of the magnetic field. The algorithm presented in this paper considers inversion in one and two dimensions. The reduced model is obtained with rational interpolation in the frequency (Laplace) domain and a rational Krylov subspace projection method. It amounts to a non-linear mapping from the function space of the unknown resistivity to the small dimensional space of the parameters of the reduced model. We use this mapping as a non-linear preconditioner for the Gauss-Newton iterative solution of the inverse problem. The advantage of the inversion algorithm is twofold. First, the non-linear preconditioner resolves most of the nonlinearity of the problem. Thus the iterations are less likely to get stuck in local minima and the convergence is fast. Second, the inversion is computationally efficient because it avoids repeated accurate simulations of the time-domain response. We study the stability of the inversion algorithm for various rational Krylov subspaces, and assess its performance with numerical experiments.

Key words. Inverse problem, parabolic equation, model reduction, rational Krylov subspace projection

AMS subject classifications. 65M32, 41A20

## 1 Introduction

Inverse problems for parabolic partial differential equations arise in applications such as groundwater flow, solute transport and controlled source electromagnetic oil and gas exploration. We consider the latter problem, where the unknown is the electrical resistivity r(x), the coefficient in the diffusion Maxwell system satisfied by the magnetic field H(t,x),

 −∇×[r(x)∇×H(t,x)] = ∂H(t,x)/∂t,   (1.1)

for time t > 0 and in some spatial domain. The data from which r(x) is to be determined are the time resolved measurements of H at receivers located on the boundary of the domain.

Determining r(x) from the boundary measurements is challenging, especially because the problem is ill-posed and thus sensitive to noise. A typical numerical approach is to minimize a functional given by the least squares data misfit and a regularization term, using Gauss-Newton or non-linear conjugate gradient methods [newman2000three, zhdanov, oristaglio1994inversion]. There are two main drawbacks. First, the functional to be minimized is not convex and the optimization algorithms can get stuck in local minima. The lack of convexity can be overcome to some extent by adding more regularization, at the cost of artifacts in the solution. Nevertheless, convergence may be very slow [oristaglio1994inversion]. Second, evaluations of the functional and its derivatives are computationally expensive, because they involve multiple numerical solutions of the forward problem. In applications the computational domains may be large, with meshes refined near sources, receivers and regions of strong heterogeneity. This results in a large number of unknowns in the forward problem, and time stepping with such large systems is expensive over long time intervals.

We propose a numerical inversion approach that arises when considering the inverse problem in the model reduction framework. We consider one and two dimensional media, and denote the spatial variable by x ∈ Ω, with Ω an interval in one dimension and a simply connected domain in ℝ² with boundary B in two dimensions. The setting is illustrated in figure 1.1.

The resistivity is r(x), and assuming that

 H = u(t,x) ez,   (1.2)

we obtain from (1.1) the parabolic equation

 ∇⋅[r(x)∇u(t,x)] = ∂u(t,x)/∂t,   (1.3)

for t > 0 and x ∈ Ω. The boundary B consists of an accessible part BA, which supports the receivers at which we make the measurements and the initial excitation

 u(0,x) = uo(x),  supp{uo} ⊂ BA,   (1.4)

and an inaccessible part BI where we set

 u(t,x) = 0,  x ∈ BI.   (1.5)

The boundary condition at BA is

 n(x)⋅∇u(t,x) = 0,  x ∈ BA,   (1.6)

where n(x) is the outer normal. We choose the boundary conditions (1.4)-(1.6) to simplify the presentation. Other (possibly inhomogeneous) conditions can be taken into account and amount to minor modifications of the reduced models described in this paper.

The sources and receivers are located on the accessible boundary BA, and provide knowledge of the operator

 M(uo) = u(t,x)|x∈BA,  t > 0,   (1.7)

for every uo such that supp{uo} ⊂ BA. Here u(t,x) is the solution of (1.3) with the initial condition (1.4). The inverse problem is to determine the resistivity r(x) for x ∈ Ω from M. Note that the operator M is highly non-linear in r, but it is linear in uo. This implies that in one dimension, where Ω is an interval, say (0,1), and the accessible boundary is the single point x = 0, M is completely defined by its action on δ(x). All the information about r is contained in a single function of time

 y(t) = M(δ(x)) = u(t,0),  t > 0.   (1.8)

To ease the presentation, we begin by describing in detail the model reduction inversion method for the one dimensional case. Then we show how to extend it to the two dimensional case. The reduced model is obtained with a rational Krylov subspace projection method. Such methods have been applied to forward problems for parabolic equations in [borner2008fast, druskin2009solution, frommer2008matrix], and to inversion in one dimension in [dsz2012] and multiple dimensions in [beattietal]. The reduced models in [dsz2012] are rational interpolants of the transfer function defined by the Laplace transform of y(t) from (1.8). In this paper we build on the results in [dsz2012] to study in more detail and improve the inversion method in one dimension, and to extend it to two dimensions.

The reduced order models allow us to replace the solution of the full scale parabolic problem by its low order projection, thus resolving the high computational cost inversion challenge mentioned above. Conventionally, the reduced order model (the rational approximant of the transfer function) is parametrized in terms of its poles and residues. We parametrize it instead in terms of the coefficients of its continued fraction expansion. The rational approximant can be viewed as the transfer function of a finite difference discretization of (1.3) on a special grid with rather few points, known in the literature as the optimal or spectrally matched grid, or as a finite-difference Gaussian rule [druskin2000gaussian]. The continued fraction coefficients are the entries in the finite difference operator, which are related to the discrete resistivities.

To mitigate the other inversion challenge mentioned above, we introduce a non-linear mapping from the function space of the unknown resistivity to the low-dimensional space of the discrete resistivities. This map appears to resolve the nonlinearity of the problem and we use it in a preconditioned Gauss-Newton iteration that is less likely to get stuck in local minima and converges quickly, even when the initial guess is far from the true resistivity.

To precondition the problem, we map the measured data to the discrete resistivities. The inverse problem is ill-posed, so we limit the number of discrete resistivities (i.e. the size of the reduced order model) computed from the data. This number depends on the noise level and it is typically much less than the dimension of models used in conventional algorithms, where it is determined by the accuracy of the forward problem solution. This represents another significant advantage of our approach.

The paper is organized as follows: We begin in section 2 with a detailed description of our method in one dimension. The inversion in two dimensional media is described in section 3. Numerical results in one and two dimensions are presented in section 4. We conclude with a summary in section 5. The technical details of the computation of the non-linear preconditioner and its Jacobian are given in Appendix A.

## 2 Model order reduction for inversion in one dimensional media

In this section we study in detail the use of reduced order models for inversion in one dimension. Many of the ideas presented here are used in section 3 for the two-dimensional inversion. We begin in section 2.1 by introducing a semi-discrete analogue of the continuum inverse problem, where the differential operator in equation (1.3) is replaced by a matrix. This is done in any numerical inversion, and we use it from the start to adhere to the conventional setting of model order reduction, which is rooted in linear algebra. The projection-based reduced order models are described in section 2.2. They can be parametrized in terms of the coefficients of a certain reduced order finite difference scheme, used in section 2.3 to define a pair of non-linear mappings. They define the objective function for the non-linearly preconditioned optimization problem, as explained in section 2.4. To give some intuition to the preconditioning effect, we relate the mappings to so-called optimal grids in section 2.5. The stability of the inversion is addressed in section 2.6, and our regularization scheme is given in section 2.7. The detailed description of the one dimensional inversion algorithm is in section 2.8.

### 2.1 Semi-discrete inverse problem

In one dimension the domain is an interval, which we scale to (0,1), with accessible and inaccessible boundaries x = 0 and x = 1, respectively. To adhere to the conventional setting of model order reduction, we consider the semi-discretized equation (1.3)

 ∂u(t)/∂t = A(r)u(t),   (2.1)

where

 A(r) = −DT diag(r) D,   (2.2)

is a symmetric and negative definite matrix, the discretization of the elliptic operator in (1.3). The vector r ∈ ℝ+N contains the discrete values of the resistivity and the matrix D arises in the finite difference discretization of the derivative in x, for the boundary conditions (1.4)-(1.6). The discretization is on a very fine uniform grid with N points in the interval (0,1), and spacing h. Note that our results do not depend on the dimension N, and all the derivations can be carried out for a continuum differential operator. We let A(r) be a matrix to avoid unnecessary technicalities. The vector u(t) is the discretization of u(t,x), and the initial condition is

 u(0) = (1/h) e1,   (2.3)

the approximation of a point source excitation at x = 0. The time-domain response of the semi-discrete dynamical system (2.1) is given by

 y(t;r) = e1T u(t),   (2.4)

where e1 = (1,0,…,0)T. This is a direct analogue of (1.8), i.e. it corresponds to measuring the solution of (2.1) at the left end point of the interval. We emphasize in the notation that the response y(t;r) depends on the vector r of discrete resistivities.

We use (2.4) to define the forward map

 F: ℝ+N → C(0,+∞)   (2.5)

that takes the vector r to the time domain response

 F(r) = y(⋅;r).   (2.6)

The measured time domain data is denoted by

 d(t) = y(t; rtrue) + N(t),  t > 0,   (2.7)

where rtrue is the true (unknown) resistivity vector and N(t) is the contribution of the measurement noise and the discretization errors. The inverse problem is: given the data d(t) for t > 0, recover the resistivity vector.
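The semi-discrete forward map (2.5)-(2.6) is easy to prototype. The sketch below builds A(r) = −DT diag(r) D with a Neumann condition at x = 0 and a Dirichlet condition at x = 1, and evaluates the response with a matrix exponential; the grid size and observation times are illustrative choices, not values from the paper.

```python
import numpy as np
from scipy.linalg import expm

def forward_map(r, times):
    """Semi-discrete response y(t; r) = e1^T exp(A(r) t) e1 / h of (2.1)-(2.4)."""
    N = len(r)
    h = 1.0 / N
    # D discretizes d/dx; the last row encodes the Dirichlet condition at x = 1,
    # so that A(r) = -D^T diag(r) D is symmetric negative definite
    D = (np.eye(N, N, 1) - np.eye(N)) / h
    A = -D.T @ (np.asarray(r)[:, None] * D)
    u0 = np.zeros(N)
    u0[0] = 1.0 / h                       # point source approximation (2.3)
    # y(t) is the first component of the solution of du/dt = A u
    return np.array([(expm(A * t) @ u0)[0] for t in times])
```

For constant resistivity the response is positive and decays monotonically in time, as expected for a stable diffusion problem.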

### 2.2 Projection-based model order reduction

In order to apply the theory of model order reduction we treat (2.1)-(2.4) as a dynamical system with the time domain response

 y(t;r) = e1T eA(r)t e1/h = bT eA(r)t b,   (2.8)

written in a symmetrized form using the source/measurement vector

 b = e1/√h.   (2.9)

The transfer function of the dynamical system is the Laplace transform of (2.8)

 Y(s;r) = ∫0+∞ y(t;r) e−st dt = bT(sI − A(r))−1b,  s > 0.   (2.10)

Since A(r) is negative definite, all the poles of the transfer function are negative and the dynamical system is stable.
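The stability claim is easy to check numerically: with a symmetric negative definite A, the transfer function (2.10) has only negative poles, and the resolvent formula agrees with its spectral (partial fraction) form. The matrix below is an arbitrary constant-resistivity example, not data from the paper.

```python
import numpy as np

N = 50
h = 1.0 / N
# Fine-grid operator for constant resistivity r = 1 (illustrative):
# Neumann condition at x = 0, Dirichlet condition at x = 1
D = (np.eye(N, N, 1) - np.eye(N)) / h
A = -D.T @ D
b = np.zeros(N)
b[0] = 1.0 / np.sqrt(h)                  # symmetrized source vector (2.9)

def Y(s):
    """Transfer function (2.10) via the resolvent."""
    return b @ np.linalg.solve(s * np.eye(N) - A, b)

# Spectral form: Y(s) = sum_j (b^T z_j)^2 / (s + theta_j) with theta_j > 0
theta, Z = np.linalg.eigh(-A)
c = (Z.T @ b) ** 2
assert np.all(theta > 0)                 # all poles are negative: stability
s = 3.0
assert abs(Y(s) - np.sum(c / (s + theta))) < 1e-8 * (1 + abs(Y(s)))
```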

In model order reduction we obtain a reduced model of small order m so that its transfer function

 Ym(s) = bmT(sIm − Am)−1bm   (2.11)

is a good approximation of Y(s;r) as a function of s in some norm. Here Im is the m×m identity matrix, Am ∈ ℝm×m and bm ∈ ℝm. Note that while the matrix A(r) given by (2.2) is sparse, the reduced order matrix Am is typically dense. Thus, it has no straightforward interpretation as a discretization of the differential operator on some coarse grid with m points.

Projection-based methods search for reduced models of the form

 Am = VTAV,  bm = VTb,  VTV = Im,   (2.12)

where the columns of V form an orthonormal basis of an m-dimensional subspace of ℝN on which the system is projected. The choice of V depends on the matching conditions for Ym and Y. They prescribe the sense in which Ym approximates Y. Here we consider moment matching at l interpolation nodes sj that may be distinct or coinciding,

 ∂kYm/∂sk|s=sj = ∂kY/∂sk|s=sj,  k = 0,…,2Mj−1,  j = 1,…,l.   (2.13)

The multiplicity of a node sj is denoted by Mj, so at non-coinciding nodes Mj = 1. The reduced order transfer function matches Y and its derivatives up to the order 2Mj−1 at each node, and the size of the reduced model is given by m = M1 + ⋯ + Ml.

Note from (2.11) that Ym(s) is a rational function of s, with the partial fraction representation

 Ym(s) = ∑j=1,…,m cj/(s + θj),  cj > 0,  θj > 0.   (2.14)

Its poles −θj are the eigenvalues of Am and the residues cj are defined in terms of the normalized eigenvectors zj,

 cj = (bmTzj)2,  Amzj + θjzj = 0,  ∥zj∥ = 1,  j = 1,…,m.   (2.15)

Thus, (2.13) is a rational interpolation problem. It is known [grimme1997krylov] to be equivalent to the projection (2.12) when the columns of V form an orthogonal basis of the rational Krylov subspace

 Km(s) = span{(sjI − A)−kb | j = 1,…,l; k = 1,…,Mj}.   (2.16)

The interpolation nodes, obviously, should be chosen in the resolvent set of A. Moreover, since in reality we need to solve a limiting continuum problem, the nodes should be in the closure of the intersection of the resolvent sets of any sequence of finite-difference operators that converge to the continuum problem. This set includes the positive real axis for problems on bounded domains. In our computations the interpolation nodes lie on the positive real axis, since they correspond to working with the Laplace transform of the time domain response.
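The projection (2.12) and its interpolation property are straightforward to verify for distinct real nodes. In the sketch below the operator, the resistivity profile and the nodes are illustrative assumptions; the Galerkin projection on the rational Krylov subspace reproduces Y at each node.

```python
import numpy as np

N = 60
h = 1.0 / N
r = 1.0 + 0.5 * np.sin(np.linspace(0.0, np.pi, N))   # hypothetical resistivity
D = (np.eye(N, N, 1) - np.eye(N)) / h                # Dirichlet condition at x = 1
A = -D.T @ (r[:, None] * D)
b = np.zeros(N)
b[0] = 1.0 / np.sqrt(h)

nodes = [1.0, 4.0, 16.0]                             # distinct positive nodes
# Basis of the rational Krylov subspace (2.16) with M_j = 1
K = np.column_stack([np.linalg.solve(s * np.eye(N) - A, b) for s in nodes])
V, _ = np.linalg.qr(K)
Am = V.T @ A @ V                                     # projection (2.12)
bm = V.T @ b
m = len(nodes)

Y = lambda s: b @ np.linalg.solve(s * np.eye(N) - A, b)
Ym = lambda s: bm @ np.linalg.solve(s * np.eye(m) - Am, bm)

for s in nodes:                                      # Ym interpolates Y at the nodes
    assert abs(Y(s) - Ym(s)) < 1e-6 * (1 + abs(Y(s)))
```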

Ideally, in the solution of the inverse problem we would like to minimize the time (or frequency) domain data misfit in a quadratic norm weighted in accordance with the statistical distribution of the measurement error. When considering the reduced order model, it is natural to choose the interpolation nodes that give the most accurate approximation in that norm. Such interpolation is known in control theory as (Hardy space) optimal, and in many cases the optimal interpolation nodes can be found numerically [gugercin2008mathcal]. Moreover, it was shown in [dsz2012], that the solution of the inverse problem using reduced order models with such interpolation nodes also minimizes the misfit functional (in the absence of measurement errors). When such nodes are not available, we can select some reasonable interpolation nodes chosen based on a priori error estimates, for example, the so-called Zolotarev points or their approximations obtained with the help of potential theory [druskin2009solution, zasletalgeoinv]. In most cases such choices lead to interpolation nodes that are distributed geometrically.

### 2.3 Finite difference parametrization of reduced order models

As we mentioned above, even though the reduced order model comes from the finite difference operator A(r), it does not retain its structure. In particular, Am is a dense matrix. The model can be uniquely parametrized by 2m numbers, for example the poles and residues in the representation (2.14). Here we show how to reparametrize it so that the resulting parameters have a meaning of finite difference coefficients.

A classical result of Stieltjes says that any rational function of the form (2.14) with negative poles and positive residues admits a representation as a Stieltjes continued fraction with positive coefficients

 Ym(s) = 1/(κ̂1s + 1/(κ1 + 1/(κ̂2s + 1/(⋱ + 1/(κ̂ms + 1/κm))))).   (2.17)

Obviously, this is true in our case, since −θj are the Ritz values of a negative definite operator and the residues cj are given as squares in (2.15).

Furthermore, it is known from [druskin2000gaussian] that (2.17) is a boundary response (Neumann-to-Dirichlet map) of a second order finite difference scheme with the three point stencil

 (1/κ̂j)[(wj+1 − wj)/κj − (wj − wj−1)/κj−1] − swj = 0,  j = 2,…,m,   (2.18)

and boundary conditions

 (1/κ̂1)(w2 − w1)/κ1 − sw1 + 1/κ̂1 = 0,  wm+1 = 0.   (2.19)

These equations closely resemble the Laplace transform of (2.1), except that the discretization is at m nodes, with m much smaller than the dimension N of the fine grid.
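A minimal sketch of evaluating the Stieltjes continued fraction (2.17), assuming the coefficients are given. For m = 1 the fraction collapses to a single pole-residue pair, 1/(κ̂1s + 1/κ1) = (1/κ̂1)/(s + 1/(κ1κ̂1)), which gives a simple consistency check against the partial fraction form (2.14).

```python
def stieltjes_cf(s, kappa, kappa_hat):
    """Evaluate Ym(s) in the form (2.17), from the innermost level outward."""
    t = 0.0
    for k, kh in zip(reversed(kappa), reversed(kappa_hat)):
        t = 1.0 / (k + t)        # ... 1/(kappa_j + tail)
        t = 1.0 / (kh * s + t)   # ... 1/(kappa_hat_j * s + tail)
    return t

# m = 1 consistency check against the partial fraction form (2.14)
kappa, kappa_hat, s = [2.0], [4.0], 0.5
c1 = 1.0 / kappa_hat[0]
theta1 = 1.0 / (kappa[0] * kappa_hat[0])
assert abs(stieltjes_cf(s, kappa, kappa_hat) - c1 / (s + theta1)) < 1e-14
```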

It is convenient to work henceforth with the logarithms log κj and log κ̂j of the finite difference coefficients. We now introduce the two mappings Q and R that play the crucial role in our inversion method. We refer to the first mapping as data fitting. It takes the time-dependent data to the logarithms of the reduced order model parameters

 Q(d(⋅)) = {(log κj, log κ̂j)}j=1,…,m   (2.20)

via the following chain of mappings

 Q: d(t) →(a) Y(s) →(b) Ym(s) →(c) {(cj, θj)}j=1,…,m →(d) {(κj, κ̂j)}j=1,…,m →(e) {(log κj, log κ̂j)}j=1,…,m.   (2.21)

Here step (a) is a Laplace transform of the measured data d(t), step (b) requires solving the rational interpolation problem (2.13), which is converted to a partial fraction form in step (c), which in turn is transformed to a Stieltjes continued fraction form in step (d) with a variant of the Lanczos iteration, as explained in Appendix A. Step (e) simply takes the logarithms.

Note that step (b) is the only ill-conditioned computation in the chain. The ill-posedness is inherited from the instability of the parabolic inverse problem and we mitigate it by limiting m. The instability may be understood intuitively by noting that step (b) is related to analytic continuation. We give more details in section 2.6. In practice we choose m so that the resulting κj and κ̂j are positive. This determines the maximum number of degrees of freedom that we can extract from the data at the present noise level.

The mapping R is the non-linear preconditioner. It takes the vector r of discrete resistivities to the same output as Q. In simplest terms, it is a composition of Q and the forward map (2.6) given by

 R(r) = Q(y(⋅;r)).   (2.22)

Unlike the data fitting, the computation of R can be done in a stable manner using a chain of mappings

 R: r →(a) A(r) →(b) V →(c) Am →(d) {(cj, θj)}j=1,…,m →(e) {(κj, κ̂j)}j=1,…,m →(f) {(log κj, log κ̂j)}j=1,…,m.   (2.23)

Here step (a) is just the definition (2.2) of A(r), in (b) we compute the orthonormal basis V for the rational Krylov subspace (2.16) on which A is projected in step (c) to obtain Am and bm. Then, the poles and residues (2.15) are computed in step (d). The last two steps are the same as in the computation of Q.

### 2.4 Non-linearly preconditioned optimization

Now that we have defined the data fitting and the non-linear preconditioner mappings, we can formulate our method for solving the semi-discrete inverse problem as an optimization. Given the data d(t) for t > 0 (recall (2.7)), we estimate the true resistivity by r⋆, the solution of the non-linear optimization problem

 r⋆ = arg minr∈ℝ+N (1/2)∥Q(d(⋅)) − R(r)∥₂².   (2.24)

This is different from the typical optimization-based inversion, which minimizes the norm of the (possibly weighted) misfit between the measured data d(t) and the model response y(t;r). Such an approach is known to have many issues. In particular, the functional is often non-convex with many local minima, which presents a challenge for derivative-based methods (steepest descent, non-linear conjugate gradients, Gauss-Newton). The convergence is often slow and some form of regularization is required. In our approach we aim to convexify the objective functional by constructing the non-linear preconditioner R.
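The structure of the preconditioned iteration can be sketched generically. The Gauss-Newton loop below minimizes (1/2)∥q − R(x)∥² with a finite-difference Jacobian; the map `Rtoy` and its data are hypothetical stand-ins for R and Q(d(⋅)), used only to show the shape of the iteration, not the paper's actual maps.

```python
import numpy as np

def gauss_newton(Rmap, q, x0, iters=20):
    """Minimize 0.5 * || q - Rmap(x) ||^2 by Gauss-Newton with a
    finite-difference Jacobian. Rmap stands in for the preconditioner R
    and q for the fitted data Q(d); both are placeholders here."""
    x = np.asarray(x0, dtype=float)
    eps = 1e-6
    for _ in range(iters):
        res = q - Rmap(x)
        # finite-difference Jacobian of Rmap at x, one column per parameter
        J = np.column_stack([
            (Rmap(x + eps * e) - Rmap(x)) / eps
            for e in np.eye(len(x))
        ])
        x = x + np.linalg.lstsq(J, res, rcond=None)[0]
    return x

# Toy example: a mildly non-linear map; the true parameters are recovered
Rtoy = lambda x: np.array([x[0] + 0.1 * x[1] ** 2, x[1] + 0.1 * x[0] ** 2])
x_true = np.array([1.0, 2.0])
x_star = gauss_newton(Rtoy, Rtoy(x_true), np.zeros(2))
assert np.allclose(x_star, x_true, atol=1e-6)
```

Because the toy residual vanishes at the solution, the iteration converges regardless of the finite-difference error in the Jacobian; for the actual map R the convexifying effect described in the text is what makes such an iteration robust to the initial guess.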

To explain the map R, let us give a physical interpretation of the reduced model parameters by introducing the change of coordinates ξ(x) = ∫0x dx′/√(r(x′)), so that ∂ξ = √r ∂x. The equation for U(s,ξ), the Laplace transform of u(t,x(ξ)), is

 r−1/2∂ξ(r1/2∂ξU) − sU = 0   (2.25)

and (2.18)-(2.19) is its discretization on a staggered grid. The coefficients κj and κ̂j are the increments in the discretization of the differential operators, so they are proportional to the local values of r−1/2 and r1/2, respectively. Since R = Q∘F, an ideal choice of Q would be an approximate inverse of F for resistivity functions that can be approximated well on the discretization grid. It would interpolate in some manner the grid values of r, which are defined by the κj and κ̂j, up to some scaling factors that define the grid spacing.

Not all grids give good results, as explained in more detail in the next section. We can factor out the unknown grid spacings by working with the logarithms of the coefficients instead of the resistivity r. Although it is possible to calculate a good grid, we do not require it in our inversion method. However, the grids can be useful for judging the quality of different matching conditions and for visualizing the behavior of R, as shown in section 2.5.

The ill-posedness of the inverse problem is localized in the computation of the data fitting term Q(d(⋅)), which is computed once. The instability of this computation can be controlled by reducing the size m of the reduced order model projection subspace. A good strategy to follow in practice is to choose the largest m such that all the coefficients κj, κ̂j, j = 1,…,m, computed by Q are positive.

Conventional optimization-based methods regularize the inversion by adding a penalty term to the objective function. Our approach is different. We acknowledge that there is a resolution vs. stability trade-off by reducing the size of the reduced model, and view regularization only as a means of adding prior information about the unknown resistivity. If such information is available, we can incorporate it at each iteration of the Gauss-Newton method via a correction in the null space of the Jacobian of R. The regularization scheme is discussed in detail in section 2.7.

Inversion via model order reduction is a recent idea that was introduced in [dsz2012, druskin2007combining, beattietal]. In particular, the approach in [dsz2012] uses maps analogous to Q and R, but defined in terms of the spectral parameters of the reduced order model, the poles θj and the residues cj. Unlike the continued fraction coefficients κj and κ̂j, the poles and residues do not have a physical meaning of resistivity, and the corresponding mapping does not behave like an approximate identity, as R does. Our definition of the mapping R is based on the ideas from [borcea2005continuum]. It allows us to improve the results of [dsz2012]. The improvement becomes especially pronounced in cases of high resistivity contrast, as shown in the numerical comparison in section 4.

The idea of convexification of the non-linear inverse problem has been pursued before for hyperbolic equations in [beretta2014inverse, klibanov2007numerical] and global convergence results were obtained in [beilina2008globally]. However, it is not clear if these approaches apply to parabolic equations. Although both hyperbolic and parabolic equations can be transformed to the frequency domain via Fourier and Laplace transforms, their data is mapped to different parts of the complex plane where the spectral parameter lies. It is known that the transformations from the hyperbolic to the parabolic data are unstable [kabanikhin2011inverse], so one cannot directly apply the methods from [beilina2008globally, beretta2014inverse, klibanov2007numerical] to the parabolic problem. The model reduction approach in this paper gives a specially designed discrete problem of small size which can be inverted stably.

### 2.5 Connection to optimal grids

We explain here that the map R relates to an interpolant of the resistivity on a special, so-called optimal grid. Although our inversion method does not involve such grids directly, it is beneficial to study them because they provide insight into the behavior of the non-linear preconditioner R. We give a brief description of the grids, and show that they depend strongly on the interpolation nodes in the matching conditions (2.13). We use this dependence to conclude that not all rational approximants of the transfer function are useful in inversion, as shown in section 2.5.2. Then, we use in section 2.5.3 the optimal grids and the continued fraction coefficients κj and κ̂j to visualize the action of R on r. This allows us to display how the non-linear preconditioner approximates the identity.

#### 2.5.1 Optimal grids

The optimal grids have been introduced in [druskin2000gaussian, druskin2000gsr, IngDruKni] to obtain exponential convergence of approximations of the Dirichlet-to-Neumann map. They were used and analyzed in the context of inverse spectral problems in [borcea2005continuum] and in the inverse problem of electrical impedance tomography in [BorDruGue, BDMG-10]. The grids are defined by the coefficients κ(0)j and κ̂(0)j of the continued fraction representation of the rational response function corresponding to a reduced model of a medium with constant resistivity r ≡ 1. In principle we can define grids for other reference resistivities, not necessarily constant, but the results in [BorDruGue, borcea2005continuum, BDMG-10] show that the grids change very little with respect to the reference resistivity. This is why they are very useful in inversion.

Let r ≡ 1, so that the change of coordinates in (2.25) is trivial (ξ = x), and obtain from (2.18)-(2.19) that κ(0)j and κ̂(0)j are the optimal grid steps in a finite difference scheme for the equation

 ∂2w/∂x2 − sw = 0.

The primary and dual grid points are

 x(0)j = ∑k=1,…,j κ(0)k,  x̂(0)j = ∑k=1,…,j κ̂(0)k,  j = 1,…,m,   (2.26)

with boundary nodes x(0)0 = x̂(0)0 = 0. In the numerical experiments we observe that the grid is staggered, i.e. the primary nodes and the dual nodes obey the interlacing conditions

 0 = x̂(0)0 = x(0)0 < x̂(0)1 < x(0)1 < x̂(0)2 < ⋯ < x̂(0)m < x(0)m.   (2.27)

We do not prove (2.27) here, although it is possible to do so at least in some settings.

The optimal grids are closely connected with the sensitivity functions, which for the semi-discrete problem are given by the rows of the Jacobian DR of the map R with respect to log r.

The studies in [BorDruGue, BDMG-10] show that the sensitivity function corresponding to log κj is localized around the corresponding grid cell, with its maximum near the grid node. The same holds for log κ̂j, after interchanging the primary and dual grid nodes. Moreover, the columns of the pseudoinverse of the Jacobian have similar localization behavior. The Gauss-Newton update is a linear combination of these columns, and therefore optimal grids are useful for inversion. They localize well the features of the resistivity that are recoverable from the measurements.

A good grid should have two properties. First, it should be refined near the point of measurement x = 0 to capture correctly the loss of resolution away from it. Second, the nodes should not all be clustered near the origin, because when the nodes get too close, the corresponding rows of the Jacobian become almost linearly dependent, and the Jacobian is ill-conditioned.

#### 2.5.2 Matching conditions

We study here the grids for three choices of matching conditions (2.13). The first corresponds to a single node s1 = 0 with multiplicity M1 = m (simple Padé), and yields the rational Krylov subspace

 Km(0) = span{A−1b, A−2b, …, A−mb}.

This approximant has the best accuracy near the interpolation point (s = 0), and is obviously inferior for global approximation in s when compared to multipoint Padé approximants. The other two choices match Y and its first derivative at nodes that are distributed geometrically

 ˜sj = ˜s1(˜s2/˜s1)j−1,  j = 1,…,m.   (2.28)

We use henceforth the tilde to distinguish these nodes from those obtained with the change of variables

 sj = ˜sj/˜sm ∈ (0,1],   (2.29)

intended to improve the conditioning of the interpolation. The matching conditions at the nodes ˜sj yield the rational Krylov projection subspace

 Km(˜s) = span{(˜s1I − A)−1b, …, (˜smI − A)−1b},

and the two choices of interpolants differ in the rate of growth of the nodes ˜sj. The second interpolant uses the rate of growth ˜s2/˜s1 = 1 + 1/(2m) in (2.28) and ˜s1 = 2, so that

 ˜sj = 2(1 + 1/(2m))j−1,  j = 1,…,m.   (2.30)

This choice approximates the Zolotarev nodes [IngDruKni] which arise in the optimal rational approximation of the transfer function over a real positive and bounded interval of s. The third interpolant uses a faster rate of growth of the nodes and gives worse results, as illustrated in the numerical experiment below.
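Generating the geometric nodes (2.28), the Zolotarev-like choice (2.30), and the rescaling (2.29) takes a few lines; the snippet below simply transcribes those formulas.

```python
import numpy as np

def geometric_nodes(s1, ratio, m):
    """Nodes (2.28): s~_j = s~_1 (s~_2/s~_1)^(j-1), j = 1, ..., m."""
    return s1 * ratio ** np.arange(m)

m = 10
s_tilde = geometric_nodes(2.0, 1.0 + 1.0 / (2 * m), m)   # the choice (2.30)
s = s_tilde / s_tilde[-1]                                # rescaling (2.29) into (0, 1]
assert np.isclose(s_tilde[0], 2.0) and np.isclose(s[-1], 1.0)
assert np.all((s > 0) & (s <= 1))
```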

We show the optimal grids for all three choices of matching conditions in Figure 2.1 for reduced models of several sizes m. We observe that the nodes of the grids corresponding to fast growing ˜sj are clustered too close to the measurement point x = 0. Thus, inversion results are expected to have poor resolution throughout the rest of the domain away from the origin. In addition, the clustering of the grid nodes leads to poor conditioning of the Jacobian of R. This is illustrated in Figure 2.2, where the condition numbers are plotted against the size m of the reduced model. We observe that the condition number of the Jacobian for the reduced model with fast growing ˜sj increases exponentially. The condition numbers of the Jacobians for the other two choices of matching conditions grow very slowly.

We can explain why the case of fast growing nodes is undesirable in inversion by looking at the limiting case of approximation at infinity. The simple Padé approximant at infinity corresponds to the Krylov subspace

 Km(+∞) = span{b, Ab, …, Am−1b},

which is unsuitable for inversion in our setting. To see this, recall from (2.2) that A(r) is tridiagonal and b is a scalar multiple of e1. Thus, for k < m only the first k+1 components of the vector Akb are non-zero. When A is projected on Km(+∞) in (2.12), the reduced model matrix Am is aware only of the upper left m×m block of A, which depends only on the first entries of r. This is unsuitable for inversion where we want to capture the behavior of the resistivity in the whole interval, even for small m. The corresponding optimal grid steps will simply coincide with the first m grid steps of the fine grid discretization in (2.2).
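The locality argument above is easy to verify numerically; the tridiagonal matrix below is an arbitrary stand-in for A(r), not data from the paper.

```python
import numpy as np

# Any tridiagonal A and b ~ e1 illustrate the locality of Km(+infinity)
N = 12
A = (np.diag(-2.0 * np.ones(N))
     + np.diag(np.ones(N - 1), 1)
     + np.diag(np.ones(N - 1), -1))
b = np.zeros(N)
b[0] = 1.0

v = b.copy()
for k in range(4):
    # A^k b has non-zero entries only in its first k+1 components
    assert np.all(v[k + 1:] == 0)
    v = A @ v
```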

When the interpolation nodes grow too quickly, we are near the limiting case of the simple Padé approximant at infinity, and the first optimal grid steps are close to the fine grid steps. Consequently, the corresponding rows of the Jacobian are almost collinear and the Jacobian is poorly conditioned, as shown in Figure 2.2.

#### 2.5.3 Action of the non-linear preconditioner on the resistivity

The optimal grids can be used to obtain resistivity reconstructions directly, without optimization, as was done in [borcea2005continuum] for the inverse spectral problem and in [BorDruGue, BDMG-10] for electrical impedance tomography. We do not use this approach, but we show here such reconstructions to display the behavior of the non-linear preconditioner R when acting on r.

Recall from equation (2.25) and the explanation in section 2.4 that κj and κ̂j are proportional to the values of r−1/2 and r1/2 around the corresponding optimal grid points. If we take as the proportionality coefficients the values κ(0)j and κ̂(0)j, then we expect the ratios

 ζj = (κ(0)j/κj)2,  ζ̂j = (κ̂j/κ̂(0)j)2,  j = 1,…,m,   (2.31)

to behave roughly as the local values of the resistivity at the primary and dual grid nodes. In practice, a more accurate estimate of the resistivity can be obtained by taking the geometric average of ζj and ζ̂j,

 ˜ζj = √(ζjζ̂j) = κ(0)jκ̂j/(κjκ̂(0)j),  j = 1,…,m.   (2.32)

Since building a direct inversion algorithm is not our focus, we only show (2.32) for comparison purposes.

In figure 2.3 we display the ratios (2.31) plotted at the nodes of the optimal grid. We take the same resistivities that are used in the numerical experiments in section 4. They are defined in (4.8) and (4.9). We observe that the curve defined by the linear interpolation of the “primary” points ζj overestimates the true resistivity, while the “dual” curve passing through ζ̂j underestimates it. Both curves capture the shape of the resistivity quite well, so when taking the geometric average (2.32) the reconstruction falls right on top of the true resistivity. This confirms that R resolves most of the non-linearity of the problem and thus acts on the resistivity as an approximate identity.

We can also illustrate how well resolves the non-linearity of the problem by considering an example of high contrast resistivity. In figure 2.4 we plot the same quantities as in figure 2.3 in the case of piecewise constant resistivity of contrast . The contrast is captured quite well by , while the shape of the inclusion is shrunk. This is one of the reasons why we use as a preconditioner for optimization and not a reconstruction mapping. Optimization allows us to recover resistivity features on scales that are smaller than those captured in and . Moreover, since resolves most of the non-linearity of the inverse problem, the optimization avoids the pitfalls of traditional data misfit minimization approaches, such as sensitivity to the initial guess, numerous local minima and slow convergence.

### 2.6 Data fitting via rational interpolation

Unlike the quantities computed using the chain of mappings (2.3), in which all steps are stable, the computation of the data fitting term requires solving an osculatory rational interpolation problem (2.2) in step (b) of (2.3) to obtain the rational interpolant of the transfer function. This involves the solution of a linear system of equations with an ill-conditioned matrix, or computing the singular value decomposition of such a matrix. We use the condition number of the matrix to assess the instability of the problem. The condition number grows exponentially with the size of the reduced model, but the rate of growth depends on the matching conditions used in the rational interpolation. We show this for two choices of matching conditions: interpolation of the transfer function and its first derivatives at distinct nodes distributed as in (2.30) (multipoint Padé), and matching of the moments of the transfer function at s=0 (simple Padé). We describe first both Padé interpolation schemes and then compare their stability numerically.

Multipoint Padé: Let us rewrite the reduced order transfer function (2.2) in the form

Y_m(s)=\frac{f(s)}{g(s)}=\frac{f_0+f_1s+\cdots+f_{m-1}s^{m-1}}{g_0+g_1s+\cdots+g_ms^m}, \tag{2.33}

where f(s) and g(s) are polynomials defined up to a common factor. We use this redundancy later to choose a unique solution of an underdetermined problem. The matching conditions (2.2) of the reduced order transfer function and of the true transfer function at the distinct interpolation nodes are

Y_m(\tilde s_j)=Y(\tilde s_j),\qquad Y_m'(\tilde s_j)=Y'(\tilde s_j),\qquad j=1,\ldots,m. \tag{2.34}

Next, we recall the change of variables (2.29) and define the Vandermonde-like matrices

S=\begin{bmatrix}
1 & s_1 & s_1^2 & \cdots & s_1^m\\
1 & s_2 & s_2^2 & \cdots & s_2^m\\
\vdots & \vdots & \vdots & & \vdots\\
1 & s_m & s_m^2 & \cdots & s_m^m
\end{bmatrix},\qquad
S'=\frac{1}{\tilde s_m}\begin{bmatrix}
0 & 1 & 2s_1 & \cdots & m s_1^{m-1}\\
0 & 1 & 2s_2 & \cdots & m s_2^{m-1}\\
\vdots & \vdots & \vdots & & \vdots\\
0 & 1 & 2s_m & \cdots & m s_m^{m-1}
\end{bmatrix}, \tag{2.35}

and the diagonal matrices

Y=\operatorname{diag}\big(Y(\tilde s_1),\ldots,Y(\tilde s_m)\big),\qquad
Y'=\operatorname{diag}\big(Y'(\tilde s_1),\ldots,Y'(\tilde s_m)\big). \tag{2.36}

This allows us to write equations (2.34) in matrix-vector form as an underdetermined problem

Pu=0,\qquad u\in\mathbb{R}^{2m+1}, \tag{2.37}

with

P=\begin{bmatrix}
S_{1:m,1:m} & -YS\\[2pt]
S'_{1:m,1:m} & -Y'S-YS'
\end{bmatrix}\in\mathbb{R}^{2m\times(2m+1)}, \tag{2.38}

and

f_j=\tilde s_m^{-j}\,u_{j+1},\qquad j=0,\ldots,m-1,\qquad
g_j=\tilde s_m^{-j}\,u_{j+m+1},\qquad j=0,\ldots,m. \tag{2.39}

The problem is underdetermined because of the redundancy in (2.33). We eliminate it with the additional normalization condition on u, which makes it possible to solve (2.37) via the singular value decomposition. If we let U be the matrix of right singular vectors of P, then

u=U_{1:(2m+1),\,2m+1}. \tag{2.40}

Once the polynomials f and g are determined from (2.39), we can compute the partial fraction expansion (2.2). The poles are the roots of g, and the residues are given by

c_j=\frac{f(-\theta_j)}{g_m\displaystyle\prod_{k=1,\,k\neq j}^{m}(\theta_k-\theta_j)},\qquad j=1,\ldots,m,

assuming that the poles are distinct. Finally, the reduced model parameters are obtained from the poles and residues via a Lanczos iteration, as explained in Appendix A.
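The multipoint Padé steps (2.35)-(2.40) can be sketched numerically. In the minimal illustration below, the target transfer function, its poles and residues, and the interpolation nodes are all hypothetical choices for the example, and we skip the rescaling by the largest node from (2.29) since the nodes are already of order one. The target is rational of exactly the reduced order, so the interpolation recovers it to machine precision.

```python
import numpy as np

# Hypothetical target: a rational transfer function with known poles and
# residues, Y(s) = 1/(s+1) + 2/(s+3), so m = 2 and the interpolation is exact.
theta = np.array([1.0, 3.0])
c = np.array([1.0, 2.0])
Yf = lambda s: np.sum(c / (s + theta))
dYf = lambda s: np.sum(-c / (s + theta) ** 2)

m = 2
s = np.array([0.5, 2.0])  # distinct nodes of order one, so the rescaling
                          # of (2.29) is skipped in this sketch

# Vandermonde-like matrices of (2.35) (without the rescaling factor)
S = np.vander(s, m + 1, increasing=True)        # rows [1, s, ..., s^m]
Sp = np.hstack([np.zeros((m, 1)),
                np.vander(s, m, increasing=True) * np.arange(1, m + 1)])

Yd = np.diag([Yf(sj) for sj in s])              # diagonal matrices of (2.36)
dYd = np.diag([dYf(sj) for sj in s])

# Matrix P of (2.38); the right singular vector for the smallest singular
# value solves Pu = 0 with unit norm, cf. (2.40)
P = np.block([[S[:, :m], -Yd @ S],
              [Sp[:, :m], -dYd @ S - Yd @ Sp]])
u = np.linalg.svd(P)[2][-1, :]

f, g = u[:m], u[m:]       # numerator and denominator coefficients, (2.39)

# The poles of Y_m are the roots of g; they recover -theta
poles = np.roots(g[::-1])
print(np.sort((-poles).real))   # recovers theta = [1, 3]
```

Since the null space of P is one dimensional here, the singular value decomposition determines the interpolant uniquely up to the sign of u, which cancels in the ratio f/g.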

Simple Padé: In the limiting case where all the interpolation nodes coalesce at s=0, we obtain a simple Padé approximant which matches the first 2m moments of the time-domain response, because

\frac{\partial^j Y_m}{\partial s^j}\bigg|_{s=0}=\frac{\partial^j Y}{\partial s^j}\bigg|_{s=0}=(-1)^j\int_0^{+\infty}y(t;r)\,t^j\,dt,\qquad j=0,1,\ldots,2m-1.

A robust algorithm for simple Padé approximation is proposed in [gonnet2011robust]. It is also based on the singular value decomposition. If Y(s) has the Taylor expansion at s=0

Y(s)=\tau_0+\tau_1 s+\tau_2 s^2+\cdots+\tau_{2m-1}s^{2m-1}+\cdots, \tag{2.41}

then the algorithm in [gonnet2011robust] performs a singular value decomposition of the Toeplitz matrix

T=\begin{bmatrix}
\tau_m & \tau_{m-1} & \cdots & \tau_1 & \tau_0\\
\tau_{m+1} & \tau_m & \cdots & \tau_2 & \tau_1\\
\vdots & \vdots & \ddots & \vdots & \vdots\\
\tau_{2m-1} & \tau_{2m-2} & \cdots & \tau_m & \tau_{m-1}
\end{bmatrix}\in\mathbb{R}^{m\times(m+1)}. \tag{2.42}

If U is the matrix of right singular vectors of T, then the coefficients in (2.33) satisfy

\begin{pmatrix}g_0\\ g_1\\ \vdots\\ g_m\end{pmatrix}=U_{1:(m+1),\,m+1}, \tag{2.43}

and

\begin{pmatrix}f_0\\ f_1\\ \vdots\\ f_{m-1}\end{pmatrix}=
\begin{bmatrix}
\tau_0 & 0 & \cdots & 0 & 0\\
\tau_1 & \tau_0 & \cdots & 0 & 0\\
\vdots & \vdots & \ddots & \vdots & \vdots\\
\tau_{m-1} & \tau_{m-2} & \cdots & \tau_0 & 0
\end{bmatrix}
\begin{pmatrix}g_0\\ g_1\\ \vdots\\ g_m\end{pmatrix}. \tag{2.44}

We refer the reader to [gonnet2011robust] for detailed explanations.
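The computation (2.42)-(2.44) can be sketched along the same lines. The target function below and its moment (Taylor) coefficients are hypothetical; since the target is itself rational of the reduced order, the approximant recovers its poles exactly. This is only a sketch of the SVD-based construction, not the full robust algorithm of [gonnet2011robust], which additionally truncates negligible singular values.

```python
import numpy as np

# Same hypothetical rational target as in the multipoint sketch:
# Y(s) = 1/(s+1) + 2/(s+3), whose Taylor coefficients at s = 0 are
# tau_j = (-1)^j * sum_k c_k / theta_k^(j+1), cf. (2.41).
theta = np.array([1.0, 3.0])
c = np.array([1.0, 2.0])
m = 2
tau = np.array([(-1.0) ** j * np.sum(c / theta ** (j + 1))
                for j in range(2 * m)])

# Toeplitz matrix T of (2.42); its last right singular vector gives g, (2.43)
T = np.array([[tau[i - k] for k in range(m + 1)] for i in range(m, 2 * m)])
g = np.linalg.svd(T)[2][-1, :]

# Numerator coefficients from the lower-triangular Toeplitz product (2.44)
f = np.array([sum(tau[i - k] * g[k] for k in range(i + 1)) for i in range(m)])

# The simple Pade approximant recovers the exact poles -theta
poles = np.roots(g[::-1])
print(np.sort((-poles).real))   # recovers theta = [1, 3]
```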

Comparison: To compare the performance of the two interpolation procedures, we give in Table 2.1 the condition numbers of the matrices P and T. They are computed for the reference resistivity, for several values of the size m of the reduced model. We observe that while both condition numbers grow exponentially, the growth rate is slower for the multipoint Padé approximant. Thus, we conclude that it is the better of the two choices of matching conditions considered in this section: it allows a more stable computation, and it gives a good distribution of the optimal grid points and a well-conditioned Jacobian.
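The growth of the condition number with m can be reproduced in a few lines. The transfer function Y(s) = exp(-s) below is a hypothetical stand-in (its Taylor coefficients at s = 0 are simply (-1)^j / j!), chosen only so that the Toeplitz matrix (2.42) is well defined and of full rank for every m; the qualitative behavior, not the specific values of Table 2.1, is the point.

```python
import numpy as np
from math import factorial

# Toeplitz matrices (2.42) for the hypothetical transfer function
# Y(s) = exp(-s), with Taylor coefficients tau_j = (-1)^j / j! at s = 0.
conds = []
for m in (2, 4, 6, 8):
    tau = [(-1.0) ** j / factorial(j) for j in range(2 * m)]
    T = np.array([[tau[i - k] for k in range(m + 1)]
                  for i in range(m, 2 * m)])
    conds.append(np.linalg.cond(T))
print(conds)   # the condition number grows rapidly with m
```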

### 2.7 Regularization of Gauss-Newton iteration

Our method of solving the optimization problem (2.4) uses a Gauss-Newton iteration with regularization similar to that in [BorDruGue]. We outline it below, and refer to the next section for the precise formulation of the inversion algorithm.

Recall that the non-linear preconditioner maps the vector of resistivity values on the fine grid to the reduced order model parameters. Since the reduced order model is much coarser than the fine grid discretization, the Jacobian has many more columns than rows, and hence a large null space. At each iteration the Gauss-Newton update lies in the low-dimensional range of the pseudoinverse of the Jacobian. This leads to low resolution in practice, because m is kept small to mitigate the sensitivity of the inverse problem to noise. If we have prior information about the unknown resistivity, we can use it to improve its estimate.

We incorporate the prior information about the true resistivity into a penalty functional. For example, the penalty may be the total variation norm of the resistivity if it is known to be piecewise constant, or the squared norm of the resistivity or of its derivative if it is expected to be smooth.

In our inversion method we separate the minimization of the norm of the residual and of the penalty functional. At each iteration we compute the standard Gauss-Newton solution and then we add a correction to obtain a regularized iterate. The correction lies in the null space of the Jacobian, so that the residual remains unchanged. We define it as the minimizer of the constrained optimization problem

\underset{\rho}{\operatorname{minimize}}\; L(\rho)\qquad\text{s.t.}\qquad [DR]\,(r-\rho)=0, \tag{2.45}

which we can compute explicitly in the case of a weighted discrete seminorm regularization, assumed henceforth,

L(r)=\frac{1}{2}\left\|W^{1/2}\tilde D r\right\|_2^2. \tag{2.46}

Here the matrix \(\tilde D\) is a truncation of the difference matrix D defined as

\tilde D=D_{1:(N-1),\,1:N}\in\mathbb{R}^{(N-1)\times N},

and W is a diagonal matrix of weights. We specify it below depending on the prior information on the true resistivity.

With the choice of penalty in the form (2.46), the optimization problem (2.45) is quadratic with linear constraints, and thus its minimizer can be calculated from the first order optimality conditions given by the linear system

\tilde D^T W\tilde D\,\rho+[DR]^T\lambda=0, \tag{2.47}
[DR]\,\rho=[DR]\,r, \tag{2.48}

where λ is a vector of Lagrange multipliers.
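A minimal sketch of the correction defined by (2.45)-(2.48): the problem sizes, the random stand-in for the Jacobian, and the identity weights below are all assumptions for illustration. The point is that the iterate returned by the KKT system keeps the data residual unchanged while the seminorm (2.46) does not increase.

```python
import numpy as np

# Minimal sketch of the null-space correction (2.45)-(2.48). The sizes, the
# random stand-in for the Jacobian [DR], and the identity weights W are all
# assumptions for illustration.
rng = np.random.default_rng(0)
N, m = 50, 4
J = rng.standard_normal((2 * m, N))   # stand-in for the Jacobian [DR]
r = rng.standard_normal(N)            # current Gauss-Newton iterate

# First-difference matrix and its truncation (the matrix tilde-D)
D = np.eye(N, k=1) - np.eye(N)
Dt = D[: N - 1, :]
W = np.eye(N - 1)                     # identity weights (smooth case)

# First order optimality conditions (2.47)-(2.48) as one KKT linear system
A = np.block([[Dt.T @ W @ Dt, J.T],
              [J, np.zeros((2 * m, 2 * m))]])
b = np.concatenate([np.zeros(N), J @ r])
rho = np.linalg.solve(A, b)[:N]       # regularized iterate

# The correction rho - r is in the null space of J, so the data residual is
# unchanged, while the seminorm (2.46) does not increase.
penalty = lambda x: 0.5 * np.linalg.norm(W ** 0.5 @ Dt @ x) ** 2
print(np.allclose(J @ rho, J @ r), penalty(rho) <= penalty(r))
```

The KKT matrix is nonsingular here because the seminorm is positive definite on the null space of the Jacobian, which excludes the constant vector for a generic J.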

In the numerical results presented in section 4 we consider smooth and piecewise constant resistivities, and choose the weights in (2.46) as follows. For smooth resistivities we simply take the identity weights, so that (2.46) is a regular discrete seminorm. For discontinuous resistivities we would like to minimize the total variation of the resistivity. This does not allow an explicit computation of the correction, so we make a compromise and use the weights introduced in [abubakar2008]. The matrix W is diagonal with entries

 wj=(([˜Dr]j