Gradient-Based Estimation of Uncertain Parameters for Elliptic Partial Differential Equations

# Gradient-Based Estimation of Uncertain Parameters for Elliptic Partial Differential Equations

Jeff Borggaard, Hans-Werner van Wyk
###### Abstract

This paper addresses the estimation of uncertain distributed diffusion coefficients in elliptic systems based on noisy measurements of the model output. We formulate the parameter identification problem as an infinite dimensional constrained optimization problem for which we establish existence of minimizers as well as first order necessary conditions. A spectral approximation of the uncertain observations allows us to estimate the infinite dimensional problem by a smooth, albeit high dimensional, deterministic optimization problem, the so-called finite noise problem in the space of functions with bounded mixed derivatives. We prove convergence of finite noise minimizers to the appropriate infinite dimensional ones, and devise a stochastic augmented Lagrangian method for locating these numerically. Lastly, we illustrate our method with three numerical examples.

## 1 Introduction

This paper discusses a variational approach to estimating the parameter in the elliptic system

 −∇⋅(q∇u)=f on\ D,u=0 on ∂D, (1)

based on noisy measurements of , when is modeled as a spatially varying random field. Equation (1), defined over the physical domain , may describe the flow of fluid through a medium with permeability coefficient or heat conduction across a material with conductivity . Variational formulations in which the identification problem is posed as a constrained optimization, have been studied extensively for the case when is deterministic [6, 12, 13, 19, 17]. Aleatoric uncertainty arising in these problems from imprecise, noisy measurements, variability in operating conditions, or unresolved scales are traditionally modeled as perturbations and addressed by means of regularization techniques. These approximate the original inverse problem by one in which the parameter depends continuously on the data , thus ensuring an estimation error commensurate with the noise level. However, when a statistical model for uncertainty in the dynamical system is available, it is desirable to incorporate this information more directly into the estimation framework to obtain an approximation not only of itself but also of its probability distribution.

Bayesian methods provide a sampling-based approach to statistical parameter identification problems with random observations . By relating the observation noise in to the uncertainty associated with the estimated parameter via Bayes’ Theorem [37, 38], these methods allow us to sample directly from the joint distribution of at a given set of spatial points, through repeated evaluation of the deterministic forward model. The convergence of numerical implementations of Bayesian methods, most notably Markov chain Monte Carlo schemes, depends predominantly on the statistical complexity of the input and the measured output and is often difficult to assess. In addition, the computational cost of evaluating the forward model can possibly severely limit their efficiency.

There has also been a continued interest in adapting variational methods to estimate parameter uncertainty [5, 31, 32, 42]. Benefits include a well-established infrastructure of existing theory and algorithms, the possibility of incorporating multiple statistical influences, arising from uncertainty in boundary conditions or source terms for instance, and clearly defined convergence criteria. Let be a complete probability space and suppose we have a statistical model of the measured data in the form of a random field contained in the tensor product . A least squares formulation of the parameter identification problem in (1), when is a random field, may take the form

where the regularization term with is added to ensure continuous dependence of the minimizer on the data . Here , where is any Hilbert space that imbeds continuously in , which may be taken to be the Sobolev space when or when (see [19]). The feasible set is given by

while the stochastic equality constraint represents Equation (1). It can also be written in its weak form as a functional equation in , where

 ⟨~e(q,u),v⟩H−1,H10:=∫Ω∫Dq(x,ω)∇u(x,ω)⋅∇v(x,ω)dxdω−∫Ω∫Df(x)ϕ(x,ω)dω (2)

for all [4]. For our purposes, it is useful to consider the equivalent functional equation in , where in the weak sense. Although these two forms of equality constraint are equivalent, pre-multiplication by the inverse Laplace operator adds a degree of preconditioning to the problem, as observed in [19]. We assume for the sake of simplicity that is deterministic.

This formulation poses a number of theoretical, as well as computational challenges. The lack of smoothness of the random field in its stochastic component limits the regularity of the equality constraint as a function of , making it difficult to use theory analogous to the deterministic case in establishing first order necessary optimality conditions, as will be shown in Section 2. The most significant hurdle from a computational point of view is the need to approximate high dimensional integrals, both when evaluating the cost functional and when dealing with the equality constraint (2). Monte Carlo type schemes seem inefficient, especially when compared with Bayesian methods. The recent success of Stochastic Galerkin methods [4, 41] and stochastic collocation-based approaches [4, 27] in efficiently estimating high dimensional integrals related to stochastic forward problems has, however, motivated investigations into their potential use in associated inverse and design problems.

In forward simulations, collocation methods make use of spectral expansions, such as the Karhunen-Loève (KL) series, to approximate the known input random field by a smooth function of finitely many random variables, a so-called finite noise approximation. Standard PDE regularity theory [4] then ensures that the corresponding model output depends smoothly (even analytically) on these random variables. This facilitates the use of high-dimensional quadrature techniques, based on sparse grid interpolation of high order global polynomials. Inverse problems on the other hand are generally ill-posed and consequently any smoothness of a finite noise approximation of the given measured data does not necessarily carry over to the unknown parameter . In variational formulations, explicit assumptions should therefore be made on the smoothness of finite noise approximations of to facilitate efficient implementation, while also accurately estimating problem ().

We approximate () in the space of functions with bounded mixed derivatives. Posing the finite noise minimization problem () in this space not only guarantees that the equality constraint is twice Fréchet differentiable in (see Section 4), but also allows for the use of numerical discretization schemes based on sparse grid hierarchical finite elements, approximations known not only for their amenability to adaptive refinement, but also for their effectiveness in mitigating the curse of dimensionality [11]. The authors in [42] demonstrate the use of piecewise linear hierarchical finite elements to approximate the finite noise design parameter in a least squares formulation of a heat flux control problem subject to system uncertainty, which is solved numerically through gradient-based methods. This paper aims to provide a rigorous framework within which to analyze and numerically approximate problems of the form ().

In Section 2, we establish existence and first order necessary optimality conditions for the infinite dimensional problem (). In Section 3 we make use of standard regularization theory to analytically justify the approximation of () by the finite noise problem (). We discuss existence and first order necessary optimality conditions for () in Section 4 and formulate an augmented Lagrangian algorithm for finding its solution in Section 5. Section 6 covers the numerical approximation of and , as well as the discretization of augmented Lagrangian optimization problem. Finally, we illustrate the application of our method on three numerical examples.

## 2 The Infinite Dimensional Problem

In order to accommodate the lack of smoothness of as a function of in our analysis, we impose inequality constraints uniformly in random space. Any function in the feasible set , satisfies the norm bound uniformly on , which by the continuous imbedding of into , implies for all . This assumption, while ruling out unbounded processes, nevertheless reflects actual physical constraints. The uniform coercivity condition , guarantees that for each , there exists a unique solution to the weak form (2) of the equality constraint [3] satisfying the bound

 ∥u∥2H10≤CDqmin∥f∥L2. (3)

Hence all and their respective model outputs have statistical moments of all orders.

### 2.1 Existence of Minimizers

An explicit stability estimate of in terms of the norm of was given in [3, 4] for . These norms, besides not having Hilbert space structure, give rise to topologies that are too weak for our purposes. The following lemmas establish the weak compactness of the feasible set, continuity of the solution mapping restricted to , as well as the weak closedness of its graph in the stronger norm and will be used to prove the existence of solutions to ().

###### Lemma 2.1.

The set is closed, convex, and hence weakly compact in .

###### Proof.

Recall that

 Qad={q∈H:q(x,ω)≥qmin,  a.% s. on D×Ω, ∥q(.,ω)∥H≤qmax \ a.s% . on Ω}.

Convexity is easily verified. To show that is closed, let and be such that

 ∥qn−q∥2H=∫Ω∥qn(.,ω)−q(.,ω)∥2Hdω→0as n→∞.

Since convergence in implies pointwise almost sure convergence of a subsequence on , it follows that

 ∥qnk(.,ω)−q(.,ω)∥H→0   a.s. on Ω

for some subsequence . Additionally, a.s. on for and therefore also satisfies this constraint. Finally, imbeds continuously in , which implies that the subsequence in fact converges to pointwise a.s. on , ensuring that also satisfies pointwise constraint a.s. on . ∎

###### Lemma 2.2.

The mapping is continuous.

###### Proof.

Suppose in . As in the proof of the previous lemma, there exists a subsequence pointwise a.s. on . The upper bound on the function established in [4, p. 1261] ensures that

 ∥u(qnk)−u(q)∥H10≤(CD∥f∥L2q2min)∥qnk−q∥L∞(Ω;L∞(D))→0   as n→∞,

where is the constant appearing in the Poincaré inequality on . Furthermore, since any subsequence of has a subsequence converging to , it follow that in fact . ∎

###### Lemma 2.3.

The graph of is weakly closed.

###### Proof.

Let be a sequence in , so that in and in . The weak compactness of shown in Lemma 2.1, directly implies . It now remains to be shown that or equivalently that solves . Written in variational form, the requirement is given by

 ∫Ω∫Dq∇u⋅∇vdxdω=∫Ω∫Dfvdxdω     for all v∈H10. (4)

Since the condition can be written as:

 ∫Ω∫Dqn∇un⋅∇vdxdω=∫Ω∫Dfvdxdω     for all v∈H10, (5)

it suffices to show that the left hand side of (5) (or some subsequence thereof) converges to the left hand side of (4) for all . Now for any and ,

 ∫Ω∫D(qn∇un−q∇u)⋅∇vdxdω =∫Ω∫D(qn−q)∇un⋅∇vdxdω +∫Ω∫Dq∇(un−u)⋅∇vdxdω.

Let be the subsequence of that converges to pointwise a.s. on , as guaranteed by Lemma 2.1. We can then bound the first term by

 ∣∣∣∫Ω∫D(qnk−q)∇unk⋅∇vdxdω∣∣∣ ≤ (∫Ω∫D|qnk−q||∇unk|2dxdω)12(∫Ω∫D|qnk−q||∇v|2dxdω)12 ≤ 2qmaxqmin∥f∥L2(∫Ω∫D|qnk−q||∇v|2dxdω)12→0  as nk→∞,

by the Dominated Convergence Theorem, since the integrand is bounded above by .

The second term in this sum converges to due to the weak convergence and the fact that the mapping defines a norm that is equivalent to , by virtue of the fact that . Therefore

 ∫Ω∫Dq∇u⋅∇vdxdω=limn→∞∫Ω∫Dqn∇un⋅∇vdxdω=∫Ω∫Dfvdxdω

for all and hence . ∎

By combining these lemmas, we can now show that a solution of the infinite dimensional minimization problem () exists for any .

###### Theorem 2.4 (Existence of Minimizers).

For each , the problem () has a minimizer.

###### Proof.

Let be a minimizing sequence for the cost functional over , i.e.

Since satisfies the equality constraint , and consequently for all (Lax-Milgram), the Banach Alaoglu theorem guarantees the existence of a weakly convergent subsequence . Moreover, the weak compactness of established in Lemma 2.1 also yields a subsequence as , so that . The fact that the infimum of is attained at the point follows directly from the weak lower semicontinuity of norms [30]. Indeed,

Finally, it follows directly from Lemma 2.3 that and hence satisfies the inequality constraint . The regularization term was not required to show the existence of minimizers. ∎

### 2.2 A Saddle Point Condition

Although solutions to () exist, the inherent lack of smoothness of in the stochastic variable complicates the establishment of traditional necessary optimality conditions. A short calculation reveals that the equality constraint is not Fréchet differentiable, as a function in . Additionally, the set of constraints has an empty interior in the -norm. Instead, we follow [14] in deriving a saddle point condition for the optimizer of () with the help of a Hahn-Banach separation argument.

Let denote the inner product. For any triple , we define the Lagrangian functional by

 L(q,u,λ)=J(q,u)+⟨e(q,u),λ⟩H10=12∥u−^u∥2H10+β2∥q∥2H+⟨q∇u,∇λ⟩−⟨f,λ⟩.

The main theorem of this subsection is the following

###### Theorem 2.5 (Saddle Point Condition).

Let solve problem (). Then there exists a Lagrange multiplier so that the saddle point condition

 L(q∗,u∗,μ)≤L(q∗,u∗,λ∗)≤L(q,u,λ∗) (6)

holds for all .

###### Proof.

Note that the second inequality simply reflects the optimality of . To obtain the first inequality, we rely on a Hahn-Banach separation argument. Let

and

 T={(−t,0)∈\real×H10:t>0}

In the ensuing three lemmas we will show that

1. and are convex (Lemma 2.6),

2. (Lemma 2.7), and

3. has at least one interior point (Lemma 2.8).

The Hahn-Banach Theorem thus gives rise to a separating hyperplane, i.e. a pair in , such that

Letting and readily yields . In fact . Suppose to the contrary that . Then by (7)

particularly for and satisfying , we have

 ⟨q∗∇u,∇λ0⟩−⟨f,λ0⟩=⟨f−λ0,λ0⟩−⟨f,λ0⟩=−⟨λ0,λ0⟩≥0,

which implies that . This contradicts the fact that . Dividing (7) by and letting yields and hence

 L(q∗,u∗,μ) =J(q∗,u∗)+⟨e(q∗,u∗),μ⟩H10=J(q∗,u∗) ≤J(q,u)+⟨e(q,u),λ∗⟩H10=L(q,u,λ∗)

for all . ∎

###### Lemma 2.6.

The sets and are convex.

###### Proof.

Clearly, is convex. Let and consider the convex combination where . Hence is of the form where

 pα =α(J(q1,u1)−J(q∗,u∗)+s1)+(1−α)(J(q2,u2)−J(q∗,u∗)+s2) wα =αe(q1,u1)+(1−α)e(q2,u2)

with . It now remains to show that for some and for some . Let and let be the unique solution of the variational problem

 ⟨qα∇uα,∇ϕ⟩=⟨αq1∇u1+(1−α)q2∇u2,∇ϕ⟩∀ϕ∈H10.

Therefore

 ⟨wα,ϕ⟩H10 =⟨αq1∇u1+(1−α)q2∇u2,∇ϕ⟩−⟨f,ϕ⟩ =⟨qα∇uα,∇ϕ⟩−⟨f,ϕ⟩=⟨e(qα,uα),ϕ⟩H10 ∀ϕ∈H10

which implies that . Moreover, it follows readily from the convexity of norms that

 J(qα,uα)≤αJ(q1,u2)+(1−α)J(q2,u2)

and therefore letting

 sα =αJ(q1,u1)+(1−α)J(q2,u2)−J(qα,uα)+αs1+(1−α)s2≥αs1+(1−α)s2≥0

we obtain

 pα=J(qα,uα)−J(q∗,u∗)+sα.

###### Lemma 2.7.

The sets and are disjoint.

###### Proof.

This follows directly from the fact that for all points in

###### Lemma 2.8.

The set has a non-empty interior.

###### Proof.

Clearly for any . For any , let belong to the -neighborhood of . In other words . Let and let be the solution to the problem

 ⟨q∗∇u,∇ϕ⟩=⟨f,ϕ⟩+⟨∇w,∇ϕ⟩∀ϕ∈H10(D). (8)

Clearly, by definition. Then

 s′ :=s0+J(q∗,u∗)−J(q,u)=s0+J(q∗,u∗)−J(q∗,u) =s0+12∫Ω∫D|∇(u∗(x,ω)−^u(x,ω))|2dxdω−12∫Ω∫D|∇(u(x,ω)−^u(x,ω))|2dxdω =s0−12∫Ω∫D∇(u(x,ω)−u∗(x,ω))⋅∇(u(x,ω)+u∗(x,ω)−2^u(x,ω))dxdω

Now satisfies and hence by (3). Similarly, since solves (8), it follows that and hence . We therefore have

 s′ ≥s0−12∥u−u∗∥H10(∥u∗∥H10+∥u∥H10+2∥^u∥H10) ≥s0−ϵ2qmin(CDqmin∥f∥L2+CDqmin(∥f∥L2+ϵ)+2∥^u∥H10) ≥s0−ϵ2q2min(2CD∥f∥L2+CDϵ+2qmin∥^u∥H10)≥0

for small enough . Therefore for any in a small enough -neighborhood of . ∎

In the following section, we will show that if the observed data is expressed as a Karhunen-Loève series [23, 33], we may approximate problem () by a finite noise optimization problem (), where is a smooth, albeit high-dimensional, function of and intermediary random variables . The convergence framework not only informs the choice of numerical discretization, but also suggests the use of a dimension-adaptive scheme to exploit the progressive ‘smoothing’ of the problem.

## 3 Approximation by the Finite Noise Problem

According to [23], the random field may be written as the Karhunen-Loève (KL) series

 ^u(x,ω)=^u0(x)+∞∑k=1√νkbk(x)Yk(ω), (9)

where is an uncorrelated orthonormal sequence of random variables with zero mean and unit variance and is the eigenpair sequence of ’s compact covariance operator [33]. Moreover, the truncated series

 ^un(x,ω)=u0(x)+n∑k=1√νkbk(x)Yk(ω)

converges to in , i.e. as . Assume w.l.o.g. that forms a complete orthonormal basis for , the set of functions in with zero mean. If this is not the case, we can restrict ourselves to . The following additional assumption imposes restrictions on the range of the random vectors we consider.

###### Assumption 3.1.

Assume the random variables are bounded uniformly in , i.e.

 ymin≤Yn(ω)≤ymax   a.s.\;on Ω for all n∈N and some ymin,ymax∈R.

Furthermore, assume that for any the probability measure of the random vector is absolutely continuous with respect to the Lebesgue measure and hence has joint density , where the hypercube denotes the range of .

Since depends on only through the intermediary variables , it seems reasonable to also estimate the unknown parameter as a function of these, i.e.

 qn(x,ω)=qn(x,Y1(ω),...,Yn(ω)).

The appropriate parameter space for the finite noise identification problem is not immediately apparent. In order for the finite noise optimization problem to approximate (), should at the very least be square integrable in , i.e. . With this parameter space, however, the finite noise problem suffers from the same lack of regularity encountered in the infinite dimensional problem (). In order to ensure both that the finite noise equality constraint is Fréchet differentiable and that the set of admissible parameters has a non-empty interior, we require a higher degree of smoothness in as a function of .

For the sake of our analysis, we therefore seek finite noise minimizers in the space , where is the space of functions with bounded mixed derivatives, [39]. A function is one for which the -norm,

 ∥v∥2˜Hmix:=∑|γ|∞≤s∑|α|1≤td∫D∫Γn∣∣DγyDαxv(x,y)∣∣2ρn(y)dydx (10)

is finite, where and are multi-indices, with , and when or when . Apart from considerations of convenience, the use of this parameter space is partly justified by the fact that forms a basis for . The minimizer of the original infinite dimensional problem () thus takes the form

 q∗(x,ω)=q∗0(x)+∞∑n=1qn(x)Yn(ω),

which is linear in each of the random variables . Any minimizer of () that approximates (even in the weak sense) is therefore expected to depend relatively smoothly on when is large. At low orders of approximation, on the other hand, the parameter that gives rise to the model output most closely resembling the partial data may not exhibit the same degree of smoothness in the variable . Since the accuracy in approximation of functions in high dimensions benefits greatly from a high degree of smoothness [7], this suggests the use of a dimension adaptive strategy in which the smoothness requirement of the parameter is gradually strengthened as the stochastic dimension increases.

We can now proceed to formulate a finite noise least squares parameter estimation problem for the perturbed, finite noise data :