Mean-variance risk-averse optimal control of systems governed by PDEs with random parameter fields using quadratic approximationsThis work was partially supported by NSF grants 1508713 and 1507009, and DOE grants DE-FC02-13ER26128, DE-SC0010518, and DE-FC02-11ER26052.

# Mean-variance risk-averse optimal control of systems governed by PDEs with random parameter fields using quadratic approximations††thanks: This work was partially supported by NSF grants 1508713 and 1507009, and DOE grants DE-FC02-13ER26128, DE-SC0010518, and DE-FC02-11ER26052.

Alen Alexanderian111Department of Mathematics, North Carolina State University, Raleigh, NC, USA ().    Noemi Petra222Applied Mathematics, School of Natural Sciences, University of California, Merced, CA, USA ().    Georg Stadler333Courant Institute of Mathematical Sciences, New York University, New York, NY, USA ().    Omar Ghattas444Institute for Computational Engineering & Sciences, Department of Mechanical Engineering, and Department of Geological Sciences, The University of Texas at Austin, Austin, TX, USA ().
###### Abstract

We present a method for optimal control of systems governed by partial differential equations (PDEs) with uncertain parameter fields. We consider an objective function that involves the mean and variance of the control objective, leading to a risk-averse optimal control problem. Conventional numerical methods for optimization under uncertainty are prohibitive when applied to this problem. To make the optimal control problem tractable, we invoke a quadratic Taylor series approximation of the control objective with respect to the uncertain parameter field. This enables deriving explicit expressions for the mean and variance of the control objective in terms of its gradients and Hessians with respect to the uncertain parameter. The risk-averse optimal control problem is then formulated as a PDE-constrained optimization problem with constraints given by the forward and adjoint PDEs defining these gradients and Hessians. The expressions for the mean and variance of the control objective under the quadratic approximation involve the trace of the (preconditioned) Hessian, and are thus prohibitive to evaluate. To overcome this difficulty, we employ trace estimators, which only require a modest number of Hessian-vector products. We illustrate our approach with two specific problems: the control of a semilinear elliptic PDE with an uncertain boundary source term, and the control of a linear elliptic PDE with an uncertain coefficient field. For the latter problem, we derive adjoint-based expressions for efficient computation of the gradient of the risk-averse objective with respect to the controls. Along with the quadratic approximation and trace estimation, this ensures that the cost of computing the risk-averse objective and its gradient with respect to the control—measured in the number of PDE solves—is independent of the (discretized) parameter and control dimensions, and depends only on the number of random vectors employed in the trace estimation, leading to an efficient quasi-Newton method for solving the optimal control problem. Finally, we present a comprehensive numerical study of an optimal control problem for fluid flow in a porous medium with uncertain permeability field.

O

ptimization under uncertainty, PDE-constrained optimization, optimal control, risk-aversion, PDEs with random coefficients, Gaussian measure, Hessian, trace estimators

{AMS}

60H15, 60H35, 35Q93, 35R60, 65K10

## 1 Introduction

An important class of problems arising in engineering and science is the optimization or optimal control of natural or engineered systems governed by partial differential equations (PDEs). Often, the PDE models of these systems are characterized by parameters (or parameter functions) that are not known and are considered uncertain and modeled as random variables. These parameters can appear for example as coefficients, boundary data, initial conditions, or source terms. Consequently, optimization of such systems should be done in a way that the computed optimal controls or designs are robust with respect to the variability in the uncertain parameters.

##### Literature survey and challenges

There is a rich body of literature on theoretical and computational aspects of optimal control of systems governed by PDEs [45, 25, 5, 9, 27], and of optimization under uncertainty (OUU) [6, 40, 4, 41]. Recently there has been considerable interest in solution methods for optimization problems lying at the intersection of these two fields, namely, optimization problems governed by PDEs with uncertain parameters [10, 9, 11, 26, 28, 44, 29, 32, 31, 34, 30, 13, 33, 16]. To discuss the challenges of optimal control, and more generally optimization, of systems governed by PDEs with uncertain parameters, we consider a real-valued optimization objective that depends on a control variable and an uncertain parameter , both of which can be finite- or infinite-dimensional. Throughout this article we refer to as the control objective. The evaluation of this control objective requires the solution of a system of PDEs. Namely, with , where is a PDE solution operator. Here, we assumed that the system of PDEs admits a unique solution for every pair of controls and parameters. This dependence of on the solution of a PDE makes the evaluation (and the computation of derivatives) of computationally expensive. The presence of uncertain parameters greatly compounds the computational challenges of solving the PDE-constrained optimization problem.

In an OUU problem, it is natural to seek optimal controls that make small in an average sense. For example, a risk-neutral optimal control approach seeks controls that solve

 minzE{Θ(z,m)}, (1)

where denotes expectation over the uncertain parameter . If we seek controls that, in addition to minimizing the expected value of with respect to , result in a small uncertainty in , we are led to risk-averse optimal control. In the present work, we use the variance of the control objective as a risk measure, and seek optimal controls that solve the problem

 minzE{Θ(z,m)}+\upbetaVar{Θ(z,m)}. (2)

Here, denotes the variance with respect to , and is a risk-aversion parameter that aims to penalize large variances of the control objective. This mean-variance formulation is only one of several formulations for finding risk-averse optimal controls. Other examples of more complex risk measures include the value at risk (VaR) and the conditional value at risk (CVaR) [37, 41]. Compared to the mean-variance formulation, these approaches do not symmetrically penalize the deviation of the quantity of interest around the mean, which is desirable for instance in applications where the control objective models a loss. In [33], the authors consider primal and dual formulations of a risk-averse PDE-constrained OUU problem using CVaR. They employ and study smooth approximations of the primal formulation to enable the application of derivative-based optimization methods to the CVaR objective, and rely on quadrature-based discretizations in the discretized parameter space.

To illustrate the main computational challenges involved in OUU problems, let us consider the (simpler) risk-neutral problem. A common approach to cope with the expectation in the objective function uses sampling over the random parameter space, , where is a sample set, and are sample weights. In the context of PDE-constrained OUU, evaluation of requires solving the PDE problem for each sample point . The sample set is chosen either by Monte Carlo sampling, where each is a draw from the distribution law of (and for every ), or, for a suitably low-dimensional parameter space, based on quadrature rules (and are quadrature weights). The Monte Carlo-based approach, sometimes referred to as sample average approximation (SAA), is computationally prohibitive for OUU problems governed by PDEs. This is due to the slow convergence of Monte Carlo and the resulting large number of PDE solves for each evaluation of the expectation. Quadrature-based methods, obtained from tensorization of one-dimensional quadrature rules, use regularity of with respect to and can accelerate convergence, but they suffer from the the curse of dimensionality—the exponential growth of the number of quadrature points as the dimension increases. The use of sparse quadrature [42] can mitigate but not overcome the curse of dimensionality. Quadrature-based methods can be improved significantly by using adaptive sparse grids (see e.g., [32, 9, 31], in which adaptive sparse grids are employed to solve OUU problems); however these approaches are still computationally expensive for problems with parameter dimensions in the order of hundreds or thousands. Another class of methods for OUU problems are stochastic approximation (SA) methods [36, 24, 22, 39]. Similar to methods based on SAA, SA methods are computationally intractable for PDE-constrained OUU problems with high-dimensional parameters due to their slow convergence and the resulting need for a prohibitively large number of PDE solves.

##### Approach

We consider an uncertain parameter that is modeled with a random field, which can also be viewed as a function-valued random variable. In the uncertainty quantification literature, it is a common to use an a priori dimension reduction provided by a truncated Karhunen–Loève (KL) decomposition for such problems. However, KL modes that appear unimportant in simulating the random process may turn out to be important to the control objective. Moreover, a priori truncation of the KL expansion of a random field is most useful if the eigenvalues of the covariance operator (of the uncertain parameter) exhibit rapid decay. This is not always the case, for example, in presence of small correlation lengths; in such cases, an a priori truncation needs to retain a large number of KL modes. We do not follow such approaches to avoid bias introduced by the truncated KL expansion. Instead, we seek formulations that preserve the problem’s infinite-dimensional character and work in an infinite-dimensional setting as long as possible. Moreover, we aim to devise algorithms whose computational complexity, measured in the number of PDE solves, is independent of the discretized parameter dimension.

In the present work, we employ quadratic approximations of the parameter-to-objective map, , to render the computation of the control objective and its gradient (as necessitated by a gradient-based optimization method) tractable. Related approaches for OUU with finite-dimensional uncertain parameters and inexpensive-to-evaluate (compared to problems governed by PDEs) control objectives are used in [18, 17]. More generally, linear or quadratic expansions with respect to uncertain finite-dimensional parameters have also been used for robust (finite-dimensional) optimization and reliability methods in engineering applications; we refer, e.g., to [21, 35]. Using this approach, we can compute the moments of the first and second-order Taylor expansions of analytically. For optimal control problems with infinite-dimensional parameters, computation of the derivatives with respect to the uncertain parameters and the controls is prohibitive using the direct sensitivity approach (or finite differences). Instead, we employ adjoint methods to avoid dependence on the dimension of the discretized parameter field. Our formulation is particularized to two model problems, the control of a semilinear elliptic PDE with an uncertain boundary source term, and the control of a linear elliptic PDE with an uncertain coefficient field. The latter is motivated by industrial problems involving the optimal control of flows in porous media.

As we will see, using the quadratic approximation of results in an OUU objective function that involves traces of operators that depend on the Hessian of this mapping. Since direct computation of these traces is prohibitive for high-dimensional problems (explicit computation of the Hessian requires as many PDE solves as there are parameters), we use trace estimation either based on random vectors, or on eigenvectors of the (preconditioned) Hessian at a nominal control. This only requires the action of the Hessian on vectors and is thus well suited for control problems governed by systems of PDEs.

##### Contributions

The main contributions of this work are as follows: (1) For an uncertain parameter field that follows a Gaussian distribution law, we derive analytic expressions for the mean and variance of a quadratic approximation to the parameter-to-objective map in infinite dimensions. These results are the basis for developing an efficient OUU approach that extends the work in [17] to a method suitable for large-scale PDE-constrained OUU problems. (2) We propose a formulation of the risk-averse OUU problem as a PDE-constrained optimization problem, with the constraints given by the PDEs defining the adjoint-based expression for the gradient and the linear action of the Hessian of the parameter-to-objective map. Our method ensures that the cost of computing the risk-averse objective and its gradient with respect to the control—measured in the number of PDE solves—is independent of the (discretized) parameter and control dimensions, and depends only on the number of random vectors used in the trace estimation. (3) We fully elaborate our approach for the risk-averse control of an elliptic PDE with uncertain coefficient field, and in particular derive the adjoint-based gradient of the control objective with respect to the control. We numerically study various aspects of our risk-aversion measure and of the efficiency of the method. The results show the effectiveness of our approach in computing risk-averse optimal controls in a problem with a -dimensional discretized parameter space.

##### Limitations

We also remark on limitations of our method. (1) Since we rely on approximations based on Taylor expansions, our arguments require smoothness (and proper boundedness of the derivatives) of the parameter-to-objective map. For the mean-variance formulation used here, this smoothness mainly depends on the governing PDEs and how the parameter enters these PDEs. (2) Compared to sampling-based methods, our approach requires first and second derivatives of with respect to , and thus appropriate adjoint solvers for the efficient computation of these derivatives. However, efficient methods for solution of optimal control problems governed by PDEs will require adjoint-based derivatives with respect to the control; only minor modifications are needed to obtain derivatives with respect to the uncertain parameters. (3) Our derivation of the mean and variance of the quadratic approximation to the parameter-to-objective map assumes that the parameter space is a Hilbert space, say , and that is defined and has the required derivatives in . As such, our framework does not apply to cases where the parameter space is a general Banach space. Even when we consider a Hilbert space of uncertain parameters, may not be defined for every in and have derivatives that are bounded in , in which case the expressions for the mean and variance could be used only formally to obtain an objective function for a risk-averse OUU problem. This is the case for the linear elliptic PDE problem with uncertain coefficient field, where the parameter-to-objective map is defined and has the required smoothness only in a subspace that has full measure. In section 3.3, we discuss such issues further and give conditions that ensure that the expressions for the mean and variance are well defined. Extensions to more general cases is a subject of our future work.

## 2 Preliminaries

We let be an infinite-dimensional real separable Hilbert space endowed with an inner product and induced norm . We consider uncertain parameters that are modeled as spatially distributed random processes, which can be viewed as function-valued random variables. Let denote such an uncertain parameter. We assume the distribution law of , which we denote by , is supported on and consider as an -valued random variable. That is, is a function, , where is a sample space, is an appropriate sigma-algebra, and is a probability measure; here, denotes the Borel sigma-algebra on . In what follows, with a slight abuse of notation, we denote the realizations of the uncertain parameter using the same symbol .

As is common practice, instead of working on the abstract probability space , we work in the image space , where is the law of , and use for an integrable function . As mentioned in the introduction, we assume that has a Gaussian probability law, . Here, is a self-adjoint, positive trace class operator and thus defines a Gaussian measure on .

We denote by the set of admissible controls, which is a closed convex subset of , where is a bounded open set with piecewise smooth boundary. We consider the control objective, as a real-valued function defined on . We refer to the mapping as the parameter-to-objective map. For normed linear spaces and we denote by the space of bounded linear transformations from to , and by the space of bounded linear operators on . For a Hilbert space , we use to denote the subspace of consisting of self-adjoint linear operators.

### 2.1 Linear and quadratic expansions

Recall that for a function , where is a Banach space, existence of first and second Fréchet derivatives at implies that

 f(m)=f(¯m)+f′(¯m)[m−¯m]+12f′′(¯m)[m−¯m,m−¯m]+o(∥m−¯m∥2X), (3)

with and . In the following, we consider the case where is a Hilbert space , and hence the derivatives admit Riesz representers in .

Next, we consider the control objective, and assume for an arbitrary control , existence of first and second Fréchet derivatives for the parameter-to-objective map at . Consider the linear and quadratic approximations of the parameter-to-objective map:

 Θlin(z,m) =Θ(z,¯m)+⟨Θm(z,¯m),m−¯m⟩, (4) Θquad(z,m) =Θ(z,¯m)+⟨Θm(z,¯m),m−¯m⟩+12⟨Θmm(z,¯m)(m−¯m),m−¯m⟩. (5)

Notice that for clarity, we denote first and second derivatives of with respect to by and , respectively. In this paper, we mainly use the quadratic approximation .

Using an approximation rather than the exact parameter-to-objective map enables computing the moments appearing in the objective function of the (approximate) risk-averse OUU problem analytically. This computation is facilitated by the fact that has a Gaussian distribution law. Note also that the accuracy of such linear and quadratic approximations, considered in an average sense, can be related to the variance of the uncertain parameter. In section 3, where we derive analytic expressions for the mean and variance of the local quadratic approximation to , we also describe how the variance of is related to the expected value of the truncation error in the quadratic approximation.

### 2.2 Probability measures on H

Here we recall basics regarding Borel probability measures, and Gaussian measures on infinite-dimensional Hilbert spaces. Let be a Borel probability measure on with finite first and second moments. The mean of is an element of such that,

 ∫H⟨s,b⟩μ(ds)=⟨¯a,b⟩for all b∈H.

The covariance operator of is a positive self-adjoint trace-class operator that satisfies

 ∫H⟨a,s−¯a⟩⟨b,s−¯a⟩μ(ds)=⟨Ca,b⟩for all a,b∈H.

It is straightforward to show that , where denotes the trace of the (positive self-adjoint) operator ; see, e.g., [14, p. 8]. Note also that

 ∫H⟨b,s−¯a⟩2μ(ds)=⟨Cb,b⟩. (6)

One can also show (see e.g.,[1, Lemma 1]) that for a bounded linear operator ,

 (7)

Now consider the case where the measure is a Gaussian, , where is a positive, self-adjoint, trace-class operator. The mean is assumed to belong to the Cameron–Martin space . The Cameron–Martin space is a dense subspace of and is a Hilbert space endowed with the inner product  [15]. While the space is dense in , it is in some sense very “thin”; more precisely, .

We follow the construction in [12], and define a Gaussian measure on , where is a bounded domain with piecewise smooth boundary, as follows: define the covariance operator as the inverse of the square of a Laplacian-like operator:

 C=(−κΔ+αI)−2=:A−2, (8)

where , and the domain of is given by . Here, is the Sobolev space of functions with square integrable first and second weak derivatives, and is the unit outward normal for the boundary . This construction of the operator ensures that it is positive, self-adjoint, and of trace-class and thus the Gaussian measure on is well-defined. A Gaussian random field whose law is given by such a Gaussian measure has almost surely continuous realizations [43].

As we saw in (7), given a Borel probability measure on with bounded first and second moments, we can obtain a simple expression for the first moment of a quadratic form on . If is a Gaussian measure on , we can also compute the second moment of a quadratic form (see Remark 1.2.9. in [15]). In particular, if we let be a self-adjoint bounded linear operator on , then

 (9)

## 3 Risk-averse OUU with quadratic approximation of the parameter-to-objective map

As discussed above, we consider a control objective , where has a Gaussian distribution law . In section 3.1, we analytically derive the moments of the quadratic approximation of . We discuss the approximation errors due to this approximation by studying the expected value of the remainder term in the Taylor expansion. In section 3.2, using the expressions for the moments of the quadratic approximation, we formulate the optimization problem for finding risk-averse optimal controls. Extensions of our OUU approach to problems where is defined only in a subspace of are discussed in section 3.3.

### 3.1 Quadratic approximation to a function of a Gaussian random variable

In this section, we compute mean and variance of defined in (5) in the infinite-dimensional Hilbert space setting. The following arguments are pointwise in the control and hence, for notational convenience, we suppress the dependence of on . We begin by establishing the following technical result: {lemma} Let be a Gaussian measure on , and be fixed, and let be a bounded linear operator on . Then,

 ∫H⟨b,s−¯a⟩⟨K(s−¯a),s−¯a⟩μ(ds)=0.
{proof}

Without loss of generality, we assume . Let be an orthonormal basis of eigenvectors of with corresponding positive eigenvalues . By we denote the orthogonal projection onto the span of the first eigenvectors. Observe that , and that . Since , we can apply the Lebesgue Dominated Convergence Theorem to obtain

 ∫H ⟨b,s⟩⟨Ks,s⟩μ(ds)=limn→∞∫H⟨b,πn(s)⟩⟨Kπn(s),πn(s)⟩μ(ds) =limn→∞n∑i,j,k=1⟨b,ei⟩⟨Kej,ek⟩∫H⟨s,ei⟩⟨s,ej⟩⟨s,ek⟩μ(ds)=0,

where in the last step we used that the random -vector defined by is an -variate Gaussian whose distribution law is . Note that we also use the result (see, e.g., [46]) that for a mean zero -variate normal random vector , for .

#### 3.1.1 Mean and variance of the quadratic approximation

Next, we derive expressions for the mean and variance of in the infinite-dimensional Hilbert space setting. {proposition} Let be a function that is twice differentiable at , with gradient and Hessian . Let be as defined in (5). Then,

 E{Θquad} =Θ(¯m)+12Tr[C1/2Θmm(¯m)C1/2], (10) Var{Θquad} =⟨Θm(¯m),C[Θm(¯m)]⟩+12Tr[(C1/2Θmm(¯m)C1/2)2]. (11)
{proof}

The first statement follows from,

 E{Θquad}=∫HΘquad(m)μ(dm) =Θ(¯m)+12∫H⟨Θmm(¯m)(m−¯m),m−¯m⟩μ(dm) =Θ(¯m)+12Tr[Θmm(¯m)C]=Θ(¯m)+12Tr[C1/2Θmm(¯m)C1/2].

To derive the expression for the variance, first note that the variance of equals the variance of . Thus,

 Var{Θquad}=E{(Θquad(m)−Θ(¯m))2}−E{Θquad(m)−Θ(¯m)}2. (12)

The first term on the right hand side is given by

 E{ =E{⟨Θm(¯m),m−¯m)⟩2}+14E{⟨Θmm(¯m)(m−¯m),m−¯m⟩2} +E{⟨Θm(¯m),m−¯m)⟩⟨Θmm(¯m)(m−¯m),m−¯m⟩}

This, along with and (12) finishes the proof.

Note that the expressions for the mean and variance of the linear approximation defined in (4) consist of only the first terms in (10) and (11), respectively. We also point out an intuitive interpretation of the covariance-preconditioned Hessian, , in the expressions for the mean and variance in (10) and (11). As the Hessian only appears preconditioned by the covariance , the second-order contributions to the expectation and the variance are large only if the dominating eigenvector directions of and are “similar”. More precisely, the eigenvectors of that correspond to large eigenvalues (i.e., directions of large uncertainty) only have a significant influence on the mean and variance if these directions are also important for the Hessian of the prediction. Conversely, important directions for the prediction Hessian only result in significant contributions to the second-order approximation of mean and variance if the uncertainty in these directions is significant.

We point out that while the Gaussian assumption on the distribution law of is required for derivation of the expression for in Proposition 3.1.1, the expression for can be derived without this assumption. Namely, it holds if the law of is any Borel probability measure on with bounded first and second moments. These assumptions on the law of are also sufficient for deriving the expressions for the mean and variance of ,

 E{Θlin}=Θ(¯m),Var{Θlin}=⟨Θm(¯m),C[Θm(¯m)]⟩.

Here, the expression for the mean is immediate and the one for variance follows from (6).

#### 3.1.2 Expected value of the truncation error

Next, we discuss the error due to replacing by the quadratic approximation . Assuming sufficient smoothness and boundedness of the derivatives of , we study the expected value of the truncation error as the uncertainty in the parameter decreases, i.e., we consider , where approaches zero. To gain intuition, consider the case when the covariance operator is such that parameter draws have a high probability of being close to the mean , where the quadratic approximation is more accurate. In this case, we can expect the truncation error to be small.

To derive a quantitative estimate, we assume that is three times continuously differentiable such that the remainder term in the quadratic expansion (3) has the form:

 R(m;¯m):=Θ(m)−Θquad(m)=13!Θ(3)(ξ)[m−¯m,m−¯m,m−¯m]. (13)

Here, denotes the third derivative with respect to at , which is an element of the line segment between and . Note that this and the following considerations are pointwise in the control variable . We are interested in the expectation value of the remainder term, which we will relate to , the average variance of .

Assuming is a uniformly bounded trilinear map, i.e., for all , we obtain

 ∫H|R(m;¯m)|μ(dm)≤K∫H∥m−¯m∥3μ(dm)=K∫H∥m−¯m∥∥m−¯m∥2μ(dm)≤K(∫H∥m−¯m∥2μ(dm))1/2(∫H∥m−¯m∥4μ(dm))1/2. (14)

Next, recall that . Moreover, using (9) with , we have

 ∫H∥m−¯m∥4μ(dm)=2Tr(C2)+Tr(C)2≤3Tr(C)2, (15)

where we have also used . Therefore, using (14) and (15), we have

 ∫H|R(m;¯m)|μ(dm)≤√3KTr(C)3/2.

Now, if we consider a family of laws , , for , then, the expected value of the remainder (13) is , as . This should be contrasted with the expected value of the remainder term for the linear expansion which can be shown to be . Note that if is cubic, the third derivative is constant and the expectation over the remainder term vanishes since follows a Gaussian distribution that is symmetric with respect to its mean.

The above argument regarding the expected value of the remainder in a Taylor expansion can be generalized for higher order expansions; see appendix A.

### 3.2 The OUU objective function

We can now give the explicit form for the objective function for the risk-averse OUU problem (2), in which we use rather than , and thus (10) and (11):

 (16)

Note that we have also added the control cost in (16). The numerical computation of operator traces appearing in the expressions for the mean and variance of the quadratic approximation is expensive and can be prohibitive for inverse problems governed by PDEs. Hence, we employ approximations obtained by randomized trace estimators [3, 38], which require only the application of the operator to (random) vectors and provide reasonably accurate trace estimates using a small number of trace estimator vectors (see also [2, Appendix A] for a result on an infinite-dimensional Gaussian trace estimator). Trace estimation for and amounts to

 Tr(C1/2ΘmmC1/2) (17) Tr[(C1/2ΘmmC1/2)2]

where , are draws from the measure . This form of the trace estimators is justified by the identities

 ∫H⟨Θmmζ,ζ⟩ν(dζ)=Tr(C1/2ΘmmC1/2),∫H⟨Θmmζ,C[Θmmζ]⟩ν(dζ)=Tr[(C1/2ΘmmC1/2)2].

Replacing the operator traces in the OUU objective function (16) using trace estimators results in the OUU objective function

 J(z):=Θ(z,¯m) (18a) +\upbeta2{⟨Θm(z,¯m),C[Θm(z,¯m)]⟩+12n% trntr∑j=1⟨ψj,Cψj⟩}+γ2∥z∥2, where for j∈{1,…,ntr}, ψj=Θmm(z,¯m)ζj,ζj∼N(0,C). (18b)

As an alternative to the randomized estimator in (17), we can use

 Tr(C1/2ΘmmC1/2)≈ntr∑j=1⟨wj,Θmmwj⟩,Tr[(C1/2ΘmmC1/2)2]≈ntr∑j=1⟨Θmmwj,C[Θmmwj]⟩,

with , where is an orthonormal basis of , and is an appropriate truncation level. One possibility is to choose , as the dominant eigenvectors of , where is a nominal control variable. Since in many applications the operator has a rapidly decaying spectrum, can be chosen small. While such an estimator is tailored to the Hessian evaluated at , we have observed it to perform well for values of the control variable in a neighborhood of . We will demonstrate the utility of this approach in our computational results.

Note that the existence of minimizers for (as well as (16)) depends on the control space and on properties of and , and must be argued on a case-by-case basis.

### 3.3 Extensions

In some applications, the control objective might be defined only for in a Banach subspace with . Let be such a subspace, and recall that since the Cameron–Martin space is compactly embedded in all subspaces of that have full measure [43], is compactly embedded in . Moreover, when using quadratic approximations, we need derivatives at . It is thus reasonable to require existence of derivatives only in the Cameron–Martin space. In such cases, one might be tempted to consider the restriction of to the Cameron–Martin space and use

 (19)

Here denotes the duality pairing between and its dual , and and are the gradient and Hessian of at , respectively. Note that we have suppressed the dependence of on .

The definition (19), however, is not meaningful from a measure-theoretic point of view as has measure zero. A possible remedy is to define a bounded and self-adjoint linear operator and to consider the composition . One possibility is to choose , in which case the smoothing is controlled by . The gradient and Hessian of are now given by

 Θδm(¯m)=SδΘm(Sδ¯m)∈H,Θδmm(m)=SδΘmm(Sδ¯m)Sδ∈L(H).

This way, one might consider the local quadratic approximation

 Θδquad(m)=Θδ(¯m)+⟨Θδm(¯m),m−¯m⟩+12⟨Θδmm(¯m)(m−¯m),m−¯m⟩,m∈H.

This construction allows to consider control objectives that are defined only on to be extended to via the mapping . Then, the Hilbert space formulation of the risk-averse OUU with quadratic approximations, developed in earlier sections, can be applied.

Another case where the Hilbert space theory needs extension is when is defined on a Banach subspace of full measure, and thus its gradient and Hessian belong to and , respectively. For example, in the control problem governed by a linear elliptic PDE with uncertain coefficient discussed in section 5, the space plays such a role (it is known [43, 19] that due to our choice of the covariance operator , ). In this case, we show that the expressions for the mean and variance of the quadratic approximation continue—under appropriate assumptions—to be well-defined.

Now, the gradient , and thus can be interpreted as a duality product, i.e., the linear action of on . Here, we have used that for the covariance operator defined above, we have .

Next, considering the expressions (10) and (11) for mean and variance, it remains to specify conditions that ensure that the operator is trace-class on . {proposition} Let be the covariance operator as defined in section 2.2. Assume that restricted to is a bounded linear operator on . Then, the operator is a trace class operator on . {proof} It is straightforward to see that . It remains to show that is trace-class. Let be the complete orthonormal set of eigenvectors of , with corresponding (positive) eigenvalues . We note that for , . Therefore, for each , we can write

 ⟨ej,C1/2Θmm(¯m)C1/2ej⟩=⟨C1/2ej,CΘmm(¯m)C1/2ej⟩E≤∥Θmm(¯m)∥∥C1/2Cej∥E∥C1/2ej∥E=∥Θmm(¯m)∥∥∥Cej∥∥∥∥ej∥∥=∥Θmm(¯m)∥λj.

Therefore, .

## 4 Control of a semilinear elliptic PDE with uncertain Neumann boundary data

We first illustrate our approach for the optimal control of a semilinear elliptic PDE with uncertain Neumann boundary data and a right hand side control. In this problem, the nonlinearity in the governing PDE is the sole reason why the quadratic approximation of the objective is not exact. Below, we present and discuss the optimization formulation for this PDE-constrained OUU problem.

We assume a bounded domain with boundary split into disjoint parts and , and we consider the semilinear elliptic equation

 −Δu+cu3 =z in D, (20) u =0 on ΓD, ∇u⋅\boldmathn =m on ΓN.

Here, , is the control and the uncertain parameter is , distributed according to the law with mean and covariance operator . Due to the monotonicity of the nonlinear term, the state equation (20) has a unique solution for every Neumann data and every , [20].

We consider a control objective of tracking type as follows:

 Θ(z,m)=12∥u−ud∥2, (21)

where is a given desired state. It is straightforward to show that the solution of (20) satisfies the estimate , for a constant . Hence, since has moments of all orders, it follows that the state variable also has moments of all orders for every . This in particular implies the existence of the first and second moments of the control objective for every . Therefore, a mean-variance risk-averse OUU objective function is well-defined.

To derive the quadratic approximation of (21) with respect to at the mean , we compute, for fixed control , the gradient and Hessian of with respect to . The gradient of with respect to is , where satisfies (20) with (the corresponding state is denoted by ), and satisfies the adjoint equation

 −Δ¯p+3c¯u2¯p =−(¯u−ud) in D, (22) ¯p =0 on ΓD, ∇¯p⋅\boldmathn =0 on ΓN.

The second derivative at evaluated in a direction is given by , where solves the incremental adjoint equation:

 −Δ^p+3c¯u2^p =−(6c¯u¯p+1)^u in D, (23) ^p =0 on ΓD, ∇^p⋅\boldmathn =0 on ΓN,

and the incremental state equation:

 −Δ^u+3c¯u2^u =0 in D, (24) ^u =0 on ΓD, ∇^u⋅\boldmathn =^m on ΓN.

We now show that is bounded as mapping from . From (24) it follows that . To estimate the -norm of the right hand side in (23), we consider an arbitrary . Using Hölder’s inequality and the continuous embedding of in , we obtain

 ∫D(6c¯u¯p+1)^uvdx ≤∥6c¯u¯p+1∥∥^uv∥≤(c1+c2∥¯u∥L4(D)∥¯p∥L4(D))∥^u∥L4(D)∥v∥L4(D) ≤c3∥^u∥H1(D)∥v∥H1(D),

where the constants do not depend on or . This shows that the -norm of the right hand side in (24) is bounded by . Hence, with constants :

 ∥^p|ΓN∥L2(ΓN)≤c4∥^p∥H1(D)≤c5∥^u∥H1(D)≤c6∥^m∥L2(ΓN),

which proves the boundedness of . Thus, the OUU objective function (16), specialized to the present example, with the gradient and Hessian operators and defined above is well-defined and conforms to the theory outlined in sections 3.13.2.

## 5 Control of an elliptic PDE with uncertain coefficient

Motivated by problems involving the optimal control of flows in porous media, we consider the optimal control of a linear elliptic PDE with uncertain coefficient field. We discuss this application numerically in section 6, where we consider control of fluid injection into the subsurface at injection wells. In this section, we describe the control objective and the PDE-constrained objective function for the risk-averse OUU problem, and derive the adjoint-based expressions for the gradient of the OUU objective function. This is an example for a problem in which the parameter-to-objective map is defined only on a Banach subspace of that has full measure. We thus use the expressions for the mean and variance of the quadratic approximation, developed in a Hilbert space setting in section 3, formally, to define the objective function for the risk-averse optimal control problem.

We begin by describing the state (forward) equation. On an open bounded and sufficiently smooth domain , with boundary , we consider the following elliptic partial differential equation:

 −∇⋅(em∇u) =b+Fz in D, (25) u =g on ΓD, em∇u⋅\boldmathn =0 on ΓN.

Here, the boundary is split into disjoint parts and on which we impose Dirichlet and Neumann boundary conditions, respectively. The Dirichlet data is