# Large-scale Inversion of Subsurface Flow using Discrete Adjoint Method

###### Contents

## Abstract

Keywords: subsurface, flow, inversion, adjoint method, sensitivity analysis, parallel, high performance computing.

## 1. Introduction

Inverse analysis plays a key role in developing realistic models for subsurface hydrogeology.
This is largely because state variables such as pressure can be observed, but constitutive parameters such as permeability that are needed to make predictions can only be inferred from observations of the state variables.
There is a long history of applying inverse methods in subsurface hydrology ^{1}; ^{2}; ^{3}; ^{4}; ^{5}; ^{6}; ^{7} often via the geostatistical approach^{8}; ^{6}; ^{5}; ^{9}; ^{10} and using a variety of computational techniques such as dimension reduction^{11}, subspace recycling^{12}, and even quantum computational methods^{13}.
Recently, it is increasingly important to calibrate the model to match large data sets obtained from observing pressure transients at a relatively modest number of wells over a long period of time^{14}.

As a result of an inverse analysis, subsurface hydrologic modelers can often use these calibrated models to make accurate predictions related to, e.g., the impacts of pumping at one well on the water supply at another well or the fate of contaminants in groundwater ^{15}.
In other cases, the data that has been used to calibrate the model may not be sufficient to use the model in a predictive fashion.
When this happens, the calibrated model is often used as a starting point for an uncertainty analysis (e.g., as the starting point in a Markov Chain Monte Carlo or in a null space Monte Carlo method ^{16}).
Often these analyses are used to inform decisions (e.g., related to remediating contaminated groundwater ^{17}).

## 2. Formulation

Let be a bounded open domain, where “nd” is the number of spatial dimensions. The boundary is assumed to be piecewise smooth. The boundary is divided into two parts: and . ()is that part of the boundary on which Dirichlet(Neumann) boundary conditions are prescribed. For mathematical well-posedness, we assume and . The unit outward normal to boundary is denoted as . The permeability tensor is denoted by , which is assumed to symmetric, bounded above and uniformly elliptic. That is, there exists two constant such that

(2.1) |

### 2.1. Govering equations for subsurface flow

The governing equation for subsurface flow is given by \linenomath

(2.2) |

where is the porosity (unitless), is the mass density (kg/m), is the dynamic viscosity (Pa-s), is the permeability (m), is the pressure (Pa), is the volumetric flow rate (kg//s). Assuming is constant and that the spatial gradient of density is small, the above equation reduces to \linenomath

(2.3) |

where is water compressibility (, Pa) and is the hydraulic conductivity (m/s). Here permeability is connected to hydraulic conductivity by , and is the acceleration due to gravity (). We shall denote the pressure field by . Let us consider the transient flow in heterogeneous porous media governed by the following diffusion equation and boundary/initial conditions

(2.4) | ||||

where is the volumetric source or sink, are prescribed pressure and flux respectively. is the scaled diffusivity as D(x) = . For uniqueness, we assume is not empty. This initial-boundary-value-problem(IBVP) is a second-order parabolic partial differential equation(PDE). Let denote the operator in Eq. 2.4, it is worthwhile to point out that the adjoint operator is , where time runs backwards. The adjoint problem to Eq. 2.4 involves adjoint boundary conditions, which is often non-trivial to formulate, especially when irregular boundary configuration is involved.

#### 2.1.1. Maximum principle

The maximum principle of a transient diffusion equation asserts that the maximum can occur only on the boundary of the domain or in the initial condition if and . Mathematically, a solution to equations (2.18a)–(2.18a) will satisfy:

### 2.2. PDE-constrained optimization

Determining parameters of a partial differential equations(PDE) model is often formulated as a PDE-constrained optimization problem where the field values mathch observations. This is also referred as inverse problem. Such problems take the form,

s.t. |

where , , and are field value, parameters in PDE, objective function and PDE induced constraints. From a optimization point of view, it is required that be feasible at every step in when converges to a minimizer. The necessary ingredients of a capable optimization solver for Eq. 2.2 should: 1)be able to solve (PDE or forward problem solver); 2) evaluate ; 3) provide the gradient . Among those problems, time-dependent ones arise wide attentions for such a reason that forward problems are often treated by the method-of-line which induces a system of ODE. The adjoint equation to the probelm is also an ODE, which means that they both can be solved by the same standard ODE integrators. The adjoint method of time-dependent problem comes in the form,

(2.5) | ||||||

s.t. | ||||||

The ith total derivative(gradient) is denoted as,

(2.6) |

The corresponding Lagrangian of 2.5 can be written as

(2.7) |

where and are vectors of Lagrangian multipliers as function of time. They are also named by adjoint vectors. Since only equality constraints are involved, we are free to set values of and . Also note that , the total derivative is,

(2.8) | ||||

where integration by part is used. The term is non-trivial to obtain, thus we set to make the whole term vanish. By setting , evaluation of term is avoided. Recursively, we can avoid computing for all by setting

(2.9) |

The following algorithm describes how is computed:

The output is the Jacobian matrix which is associated with the sensitivity on . It only takes 1 forward and 1 adjoint(inverse) run, the Jacobian is yielded. As a compare, differentiation based approach needs to take forward runs. The advantages get siginificant when .

#### 2.2.1. Discrete adjoint sensitivity analysis

There are several ways to solve Eq. 2.4, here the method of line approach is adopted, which resulting a system of ordinary differential equations(ODE) as,

(2.10) |

where is the spatial discretization of flow field . is the mass matrix which is usually symmetric-positive definite. Here assume is identity for brevity of notations. The right-hand-side involves the contribution from the parameters of model(permeability distribution ). Let us consider a simples t forward integration scheme, backward Euler, for 2.10 as

(2.11) |

Now define the sensitivity variable as , where means the th parameters in the model. The sensitivity equation corresponding to is immediately obtained after pluging into Eq. 2.11,

(2.12) |

where and are Jacobian matrices. As we can see that the trajactory of follows a similar trajactory with model’s state variable in the forward process. To be general, use to denote any one-step integration scheme. In our implementation , the objective function is chosen to only involve the terminal term under the effect of parameters as

(2.13) |

The constraints of the optimization problem are chosen to be the discretized PDE at each time step. Therefore, the Lagrangian is written as

(2.14) |

,where are Lagrange multipliers. We use to approximate . Differentiating this function with respect to yields

(2.15) |

Let and define

(2.16) |

The gradient of target objective function is

(2.17) |

Now treat as a implicit function and use backward Euler as example. Take derivative of in Eq. 2.11, we get

(2.18) |

Combining with Eq. 2.16, the discrete adjoint equation is formulated as,

(2.19) |

## 3. Numerical Implementation

#### 3.0.1. PETSc and TAO

We leverage on scientific libraries such as PETSc and TAO to implement the large-scale inversion’s computation. PETSc is a suite of data structures and routines for the scalable (parallel) solution of scientific applications modeled by PDEs, implementing MPI standard and widely used in parallel finite element codes development. It also provides interfaces to several other libraries such as Metis/ParMETIS and HDF5 for mesh partitioning and binary data format handling respectively. To solve the large-scale optimization problem, another important feature with PETSc, TAO, is employed. Our non-negative methodology will use the Bounded Limited-Memory Variable-Metric(BLMVM) solver available in TAO to approximate the Hessian, and this is efficient in large-scale context. Other optization algorithm like Conjugate Gradient(CG) and Limited-Memory Variable-Metric(LMVM) will be compared in the convergence and memory consumption. Further details regarding the implementation of these various methods may be found in and the references within.

#### 3.0.2. Weak formulation

Continuous Galerkin approach is adopted for the FE setup. The trial and test function spaces are chosen to be

(3.1) | ||||

The weak form for Eq. 2.4 reads: find , such that

(3.2) |

where the bilinear form and linear functional are, respectively, defined as

(3.3) | ||||

The assembly of mass/stiffness matrix, Gaussian quadrature and other routines are implemented in-house while the parallel matrix/vector operations are interfaced with in PETSc’s build-in. First, following the FE model outlined in [19], we consider the weak form that depends on fields and gradients. The residual evaluation can be expressed as:

(3.4) |

where and are point-wise functions that capture the physics. This framework decouples the problem specification from the mesh and degree of freedom traversal which easy the implementation on distributed memory machines. The discretization of the residual is written as:

(3.5) |

where represents the assembly operator, and are matrix forms of basis functions over quadrature points, diagonal matrix is the quadurature weights, and is the field value at quadrature point . Mapping back to Eq. 3.3,

(3.6) |

here the superscript denotes for time step and also assume for simplicity. Naturally, the Jacobian is the derivatives of Eq.3.5 as,

(3.7) | ||||

The point wise functions are

(3.8) |

#### 3.0.3. Parallel finite element assembly

In each optimization step, one forward and adjoint run are conducted and each run is a solution of time-dependent problem. The PETSc interface for solving time dependent problems assumes the problem is written in the form

(3.9) |

User has to provide how to evaluate the residual and Jacobian from using interface functions "TSSetIFunction" and "TSSetIJacobian". Take backward Euler scheme applied to as example, the time derivate , it results in the Jacobian . As a result, evalulation of the Jacobian for each forward/adjoint run is required since it is a function of . But within one forward(or adjoint)run, it just has been computed once if fixed time step is assumed. Apart from matrix systems solution by Krylov method, matrices assembly is another bottle-neck when going to large scale.

This paper considers a hybrid framework of parallelism on both shared-memory(OpenMP) and distributed-memory(MPI) level. In dealing with shared memory machines, the assembly of stiffness matrices in FE will encounter race condition if two adjacent elements are assembled at the same time by two threads. With the help of graph coloring, the elements can be assembled one color at a time, thus preventing race condition. In order to do the coloring, the indices of neighboring elements are necessary ,which can be readily obtained from the adjacency graph of the mesh. Take the triangular mesh in Fig 1 as example, the corresponding graph and one possible coloring(4-colored) are shown.

Since the test and trial functions are nodal based, two elements are considered to be connected once they share at least one node. Elements in the same color now are safe to be assembled by different threads.

## 4. Results and Performance

### 4.1. 2D Verification

#### 4.1.1. Hetergeneous diffusion in 2D geometry

Convergence with tao types

#### 4.1.2. Hetergeneous diffusion in a unit cube with spherical holes

Let the computational domain be a unit cube with two spherical holes of radius 0.2 and 0.35. The concentration on the outer boundary is taken to be zero and the concentration on the interior boundary is taken to be unity. The volumetric source is taken as zero (i.e., f (x) = 0). The velocity vector field for this problem is chosen to be

The estimation of diffusivity for mesh type B after three optimization steps are shown below.

## 5. Parallel Performance

### 5.1. Strong scaling performance of forward run

1million hex grid forward run

#### 5.1.1. Scaling performance of multi-threading

The multi-threading is implemented with OpenMP on multi-core CPUs.

#### 5.1.2. Scaling performance of MPI+OpenMP

### 5.2. Performance modeling

Since the inversion process involves both forward and backward runs, as well as optimization steps, there will be fraction of the code that is not amenable to parallelization. Here we employ Amdahl’s law and Gustafson’s law to model strong and weak scaling respectively.

### 5.3. Strong scaling model

Amdahl’s law can be formulated as follows

(5.1) |

where s is the proportion of execution time spent on the serial part and N is the number of processors. Amdahl’s law states that, for a fixed problem, the upper limit of speedup is determined by the serial fraction of the code. Here we tested on three types of mesh with unknown sizes being 0.25, 1 and 4 million. The results and fitted model are shown below.

As dipicted in the figure, the fraction of serial part of the code decreases with the increasing of problem size. Thus we expect better strong scaling performance for large problem.

### 5.4. Weak scaling model

The sizes of problems scale with the amount of available resources in real applications. A more reasonable choice is to use small amounts of resources for small problems and larger quantities of resources for big problems. Amdahl’s law gives the upper limit of speedup for a problem of fixed size. For measuring the weak scaling, where the scaled speedup is calculated based on the amount of work done for a scaled problem size (in contrast to Amdahl’s law which focuses on fixed problem size), Gustafson’s law is a more wise choice. It is based on the approximations that the parallel part scales linearly with the amount of resources, and that the serial part does not increase with respect to the size of the problem. It provides the formula for scaled speedup as:

(5.2) |

, where and has the same meaning as in Amdahl’s law. Here we fix the number of cells per processor and increase the number of processors. The results and fitted model are shown below.

We observe that, as incresing the workload per processor, the weak scaling gets worse, together with the proportion of serial part increases. This can be explained by the fact that the optimizer, which is the major serial part, takes more efforts to find next optimization direction. Also notice that the discrepancy in s between strong/weak scaling modeling. This is attributed to the approximations in the laws — the serial fraction is assumed to remain constant, and the parallel part is assumed to be speed up in proportion to the number of processors. In practice, the overhead of parallelization may also increase with the job size (e.g. from the scheduling of threads), and in this case it is understandable that the weak scaling model gives a larger serial fraction s.

### 5.5. Real-world Problem

We consider a real world problem of subsurface flow in this section. The parameters of simulation domain is described in the table below. In the first case, a 2D model is considered. The inversion run is carried out based on the observation of pressure collected at day 1 and day 150. In this simulation, only the flows on x-y plane are considered. The initial pressure is only collected at 25 locations and the background pressure is assumed to be Pa. The pressure at day 1 and 150 are shown in Fig. 11.

In Fig. 12, the diffusivity field after 50 and 110 TAO iterations are plotted as compared to the true diffusivity. The convergence history is also shown.

Due to the sparsity of the observations, the inversion could only reveals the diffusivity field at sample locations.

3D example, In the z direction, 2 kilometer. The initial condition is a steady state(run >1000 days from the funky ic I generated). Sinks are prescribed at 5 locations and run forwardly for 200 days. For the initial condition and observation data, please see Fig.13

The parameters for this module is : g=9.8m/, kg/; ; and K=9.8 m/s to 2.94m/s

Run the inversion on 64 nodes for 160 TAO iterations. The inverted diffusivity as compared to true distribution are shown in Fig. 14.

And the convergence plot.

## Conclusions

## Acknowledgments

SK thanks BER for support. SW thanks LANL Parallel Computing Summer School for support.

## References

- 1 S. P. Neuman and S. Yakowitz. A statistical approach to the inverse problem of aquifer hydrology: 1. theory. Water Resources Research, 15(4):845–860, 1979.
- 2 S. P. Neuman, G. E. Fogg, and E. A. Jacobson. A statistical approach to the inverse problem of aquifer hydrology: 2. case study. Water Resources Research, 16(1):33–58, 1980.
- 3 J. Carrera and S. P. Neuman. Estimation of aquifer parameters under transient and steady state conditions: 1. maximum likelihood method incorporating prior information. Water Resources Research, (22):199–210, 1986.
- 4 N. Sun. Inverse problems in groundwater modeling. Kluwer Academic Publishers, 1994.
- 5 P.K. Kitanidis. Introduction to Geostatistics: Applications to Hydrogeology. Stanford-Cambridge program. Cambridge University Press, 1997.
- 6 J. Zhang and T-C J. Yeh. An iterative geostatistical inverse method for steady flow in the vadose zone. Water Resources Research, 33(1):63–71, 1997.
- 7 Jesús Carrera, Andrés Alcolea, Agustín Medina, Juan Hidalgo, and Luit J Slooten. Inverse problem in hydrogeology. Hydrogeology journal, 13(1):206–222, 2005.
- 8 Peter K Kitanidis. Quasi-linear geostatistical theory for inversing. Water resources research, 31(10):2411–2419, 1995.
- 9 Peter K Kitanidis. The minimum structure solution to the inverse problem. Water resources research, 33(10):2263–2272, 1997.
- 10 V.V. Vesselinov, S.P. Neuman, and W.A. Illman. Three-dimensional numerical inversion of pneumatic cross-hole tests in unsaturated fractured tuff 1. Methodology and borehole effects. Water Resources Research, 37(12), 2001.
- 11 Jonghyun Lee and Peter K Kitanidis. Large-scale hydraulic tomography and joint inversion of head and tracer data using the principal component geostatistical approach (pcga). Water Resources Research, 50(7):5410–5427, 2014.
- 12 Youzuo Lin, Daniel O’Malley, and Velimir V Vesselinov. A computationally efficient parallel levenberg-marquardt algorithm for highly parameterized inverse model analyses. Water Resources Research, 52(9):6948–6977, 2016.
- 13 Daniel O’Malley. An approach to quantum-computational hydrologic inverse analysis. Scientific reports, 8(1):6919, 2018.
- 14 Youzuo Lin, Ellen B Le, Daniel O’Malley, Velimir V Vesselinov, and Tan Bui-Thanh. Large-scale inverse model analyses employing fast randomized data reduction. Water Resources Research, 53(8):6784–6801, 2017.
- 15 D O’Malley and VV Vesselinov. A combined probabilistic/nonprobabilistic decision analysis for contaminant remediation. SIAM/ASA Journal on Uncertainty Quantification, 2(1):607–621, 2014.
- 16 Matthew Tonkin and John Doherty. Calibration-constrained monte carlo analysis of highly parameterized models using subspace techniques. Water Resources Research, 45(12), 2009.
- 17 D O’Malley and VV Vesselinov. Groundwater remediation using the information gap decision theory. Water Resources Research, 50(1):246–256, 2014.