A fast Monte-Carlo method with a Reduced Basis of Control Variates applied to Uncertainty Propagation and Bayesian Estimation
The Reduced-Basis Control-Variate Monte-Carlo method was introduced recently in [S. Boyaval and T. Lelièvre, CMS, 8 2010] as an improved Monte-Carlo method, for the fast estimation of many parametrized expected values at many parameter values. We provide here a more complete analysis of the method including precise error estimates and convergence results. We also numerically demonstrate that it can be useful to some parametrized frameworks in Uncertainty Quantification, in particular (i) the case where the parametrized expectation is a scalar output of the solution to a Partial Differential Equation (PDE) with stochastic coefficients (an Uncertainty Propagation problem), and (ii) the case where the parametrized expectation is the Bayesian estimator of a scalar output in a similar PDE context. Moreover, in each case, a PDE has to be solved many times for many values of its coefficients. This is costly and we also use a reduced basis of PDE solutions like in [S. Boyaval, C. Le Bris, Nguyen C., Y. Maday and T. Patera, CMAME, 198 2009]. This is the first combination of various Reduced-Basis ideas to our knowledge, here with a view to reducing as much as possible the computational cost of a simple approach to Uncertainty Quantification.
keywords:Monte-Carlo method, Variance reduction, Control variate, Reduced Basis method, Partial Differential Equations with stochastic coefficients, Uncertainty Quantification, Bayesian estimation .
The Reduced-Basis (RB) control-variate Monte-Carlo (MC) method was recently introduced in Boyaval and Lelièvre (2010) to compute fast many expectations of scalar outputs of the solutions to parametrized ordinary Stochastic Differential Equations (SDEs) at many parameter values. But as a simple, generic MC method with reduced variance, the RB control-variate MC method can also be useful in other parametric contexts and the main goal of this article is to show that it can be useful to Uncertainty Quantification (UQ) too, possibly in combination with the standard RB method in a PDE context.
There is a huge literature on the UQ subject. Indeed, to be actually predictive in real-life situations Liu et al. (2008), most numerical models require (i) to calibrate as much as possible the parameters and (ii) to quantify the remaining uncertainties propagated by the model. Besides, the latter two steps are complementary in an iterative procedure to improve numerical models using datas from experiments: quantifying the variations of outputs generated by input parameters allows one to calibrate the input uncertainties with data and in turn reduces the epistemic uncertainty in outputs despite irreducible aleatoric uncertainty. Various numerical techniques have been developped to quantify uncertainties and have sometimes been used for years Ghanem and Spanos (1991); L. A. Bergman and Wojtkiewicz (1997). But there are still a number of challenges Debusschere et al. (2004); Burkardt et al. (2007); Doostan et al. (2007).
For PDEs in particular, the coefficients are typical sources of uncertainties. One common modelling of these uncertainties endows the coefficients with a probability distribution that presumably belongs to some parametric family and the PDEs solutions inherit the random nature of the uncertainty sources. A Bayesian approach is often favoured to calibrate the parameters in the probability law using observations of the reality Wang and Zabaras (2005); Marzouk and Najm (2009). But the accurate numerical simulation of the PDEs solutions as a function of parametrized uncertain coefficients is a computational challenge due to its complexity, and even more so is the numerical optimization of the parameters in uncertain models. That is why new/improved techniques are still being investigated Ghanem and Doostan (2006); Soize (2008). Our goal in this work is to develop a practically useful numerical approach that bases on the simple MC method to simulate the probability law of the uncertain coefficients. We suggest to use the RB control-variate MC method in some UQ frameworks to improve the computational cost of the naive MC method, in particular in contexts where some coefficients in a PDE are uncertain and other are controlled.
There exist various numerical approaches to UQ. The computational cost of MC methods is certainly not optimal when the random PDE solution is regular, see e.g. Cohen et al. (2010). But we focus here on MC methods because they are (a) very robust, that is useful when the regularity of the solution with respect to the random variables degrades, and (b) very easy to implement (they are non-intrusive in the sense that they can use a PDE numerical solver as a black-box, with the values of the PDE coefficients as input and that of the discrete solution as output). Besides, note that even when the random PDE solution is very regular with respect to the random variables, it is not yet obvious how algorithms can take optimal profit of the regularity of random PDE solutions and remain practically efficient as the dimension of the (parametric) probability space increases, see e.g. Cohen et al. (2011). So, focusing on a MC approach, our numerical challenge here is basically two-sided: (i) on the probabilistic side, one should sample fast the statistics of the random PDE solution (or of some random output that is the quantity of interest), and (ii) on the deterministic side, one has to compute fast the solution to a PDE for many realizations of the random coefficients. It was proposed in Boyaval et al. (2009) to use the RB method in order to reduce the numerical complexity of (ii), but this does not fully answer the numerical challenge. In particular, although the RB method can improve naive MC approaches at no-cost (since the selection of the reduced basis for the PDE solutions at various coefficients values can be trained on the same large sample of coefficients values that is necessary to the MC sampling), the resulting MC approach might still be very costly, maybe prohibitively, due to the large number of realizations that is necessary to accurately sample the statistics of the PDE solution (side (i) of our challenge above). In this work, we thus tackle the question how to reduce the numerical complexity of (i). We have in mind the particular but useful case where one is interested in the expected value of a random scalar output of the random PDE solution as a function of a (deterministic) control parameter, typically another (deterministic) coefficient in the UQ problem which is “under control”. (Think of the construction of response surfaces for a mean value as a function of control parameters.) A similar parametric context occurs in Bayesian estimation, sometimes by additionally varying the hyper parameters or the observations. In any case, our goal is to reduce the computational cost of a parametrized (scalar) MC estimation when the latter has to be done many times for many values of a parameter, and we illustrate it with examples meaningful in a UQ context.
To accelerate the convergence of MC methods as numerical quadratures for the expectation of a random variable, one idea is to tune the sampling for a given family of random variables like in the quasi-Monte-Carlo (qMC) methods Sloan and Woźniakowski (1998); Venkiteswaran and Junk (2005); Kuo et al. (2011). Another common idea is to sample another random variable with same mean but with a smaller variance. Reducing the variance allows one to take smaller MC samples of realizations and yet get MC estimations with confidence intervals of similar (asymptotic) probability. Many techniques have been designed in order to reduce the variance in various contexts Hammersley and Handscomb (1965); Jourdain (2009). Our RB control-variate MC method bases on the so-called control-variate technique. It has a specific scope of application in parametric contexts. But it suits very well to some computational problems in mathematical finance and molecular dynamics as shown in Boyaval and Lelièvre (2010), and can be useful in UQ as we are going to see.
The paper is organized as follows. In Section 2, we recall the RB control-variate technique as a general variance reduction tool for the MC approximation of a parametrized expected value at many values of the parameter. The presentation is a bit different to that in Boyaval and Lelièvre (2010), which was more focused on SDEs. Moreover, we also give new error estimates and convergence results. In Section 3, the RB control-variate MC method is applied to compute the mean of a random scalar output in a model PDE with stochastic coefficients (the input uncertainty) at many values of a control parameter. In Section 4, it is applied to Bayes estimation, first for a toy model where various parametric contexts are easily discussed, then for the same random PDE as in section 3.
We also note that this work does not only improve on the RB approach to UQ Boyaval et al. (2009) but also on an RB approach to Bayesian estimation proposed in Nguyen et al. (2009) with a deterministic quadrature formula to evaluate integrals. For both applications, to our knowledge, our work is the first attempt at optimally approximating the solution with a simple MC/FE method by combining RB ideas of two kinds, stochastic and deterministic ones Boyaval et al. (2010). Note that for convenience of the reader non-expert in RB methods, the standard RB method Maday (2006); Patera and Rozza (2007) is briefly recalled in Section 3.3.
2 The RB Control-Variate MC Method
The RB control-variate technique is a generic variance reduction tool for the MC approximation of a parametrized expected value at many values of the parameter. In this section we recall the technique for the expectation of a generic square-integrable random variable parametrized by . The principle for the reduction of computations is based on the same paradigm as the standard RB method and allows one to accelerate the MC computations of many at many values of . Our presentation is slightly different than the initial one in Boyaval and Lelièvre (2010) and gives new elements of analysis (error estimates and convergence results).
2.1 Principles of RB control-variate MC method
Let be a probability measure such that is a random variable in for all parameter values in a given fixed range . Assume one has an algorithm to simulate the law of whatever . Then, at any , one can define MC estimators that provide useful approximations of , by virtue of the strong law of large numbers
provided the number of independent identically distributed (i.i.d.) random variables , , is sufficiently large. Here, the idea is: if is already known with a good precision for parameter values , , () and if the law of depends smoothly on then, given well-chosen real values , , the standard MC estimator could be efficiently replaced by a MC estimator for that is as accurate and uses much less than copies of .
In other words, if the random variable
is correlated with such that the control of by reduces the variance, that is if
then the confidence intervals with asymptotic probability () for MC estimations
that are in the Central Limit Theorem (CLT)
At a given , we define , , in (2.2) to obtain the optimal variance reduction minimizing . Equivalently, the control variates of the form (2.2) are thus defined as best approximations of the ideal control variate that reduces the variance to zero
The , , thus solve a least-squares problem with normal equations for
where denotes the covariance between . And they are unique as long as the matrix with entries remains definite.
In general, the computation of a good control variate is difficult (the ideal one requires the result itself). But we proved in Boyaval and Lelièvre (2010) that control variates of the form (2.2) actually make sense in some contexts, a result inspired by Maday et al. (2002a, b). Let us denote by the Hilbert subspace of random variables that are centered .
Consider a set of random variables
where , , are uncorrelated and are functions. If there exists a constant and an interval such that, for all parameter ranges , there exists a diffeomorphism where
for all -derivatives of , then there exist constants independent of and such that, for all parameter ranges , for all , ,
where and uses the explicit values .
But the explicit construction of , , in Prop. 1, for random variables that are analytic in a 1D parameter , does not generalize to higher-dimensions in an efficient way, a well-known manifestation of the curse of dimensionality 111 A tensor-product of (2.9) in dimension yields a rate . . Yet, our numerical results (here and in Boyaval and Lelièvre (2010)) show our variance reduction principle can work well in contexts with higher-dimensional or less regular parametrization. Then, given a parametric family , , how to choose parameter values , , that can in practice efficiently reduce the MC computations at all ?
Given , let us define , , as the parameter values with optimal approximation properties in : the linear space spanned by minimizes the variances over all , or equivalently the , , minimize
The variates , , define an optimal -dimensional reduced basis in for the approximations of the ideal controls at all . The values in (2.10) coincide with a well-known notion in approximation theory Pinkus (1985): the Kolmogorov widths of in , and are upper bounds for the more usual Kolmogorov widths of in
Explicitly computing Kolmogorov widths is a difficult problem solved only for a few simple sets , let alone the computation of (non-necessarily unique) minimizers. But in a many-query parametric framework where the MC computations are queried many times for many values of the parameter, one can take advantage of a presumed smooth dependence of the MC computations on the parameter to identify a posteriori some approximations for the optimal parameter values , , with prior computations at only a few parameter values. Moreover, in practice, we shall be content with approximations of , that are not very good, provided they produce sufficiently fast-decaying upper-bounds of as . To our knowledge, this reduction paradigm has been applied for the first time in Boyaval and Lelièvre (2010) to minimize the variance of many parametrized MC estimations. It is inspired by the (by now) standard RB method to minimize the discretization error in many-query Boundary Value Problems (BVP) for parametrized PDEs (see Maday (2006); Patera and Rozza (2007) e.g. and Section 3.3).
Let us define linear combinations like (2.2) when by . We next define a (non-unique) sequence of parameter values , by the sucessive iterations of a greedy algorithm
For all , the parameter values , , approximate , in (2.10) and yield the following upper-bound
Using a finite discrete subset (possibly in case of finite cardinality ) and a MC estimator for the variance, like
we can finally define a computable sequence of parameter values , by sucessive iterations of a weak greedy algorithm at
For all , our computable approximations of the optimal parameter values , used hereafter in practical applications will be , , from a set like in (2.14). Note that the , , are constructed iteratively and thus do not depend on but only on (and possibly on ), which is useful in practice: the adequate value of for a good variance reduction is typically not known in advanced and one simply stops the weak greedy algorithm at when, at a given ,
is smaller than a maximum tolerance .
Even if greedy algorithms might yield approximate minimizers of (2.10) that are far suboptimal, numerical results show that they can nevertheless be useful to computational reductions using the RB control-variate MC method, just as they already proved useful to computational reductions with the standard RB method in numerous practical examples. Recent theoretical results Buffa et al. (2009); Binev et al. (2011) also support that viewpoint as concerns the standard RB method and can be straightforwardly adapted to our framework. Comparing directly with at same will not, in general, give estimates better than . And this is pessimistic as regards the convergence of the greedy algorithm insofar as it predicts variance reduction only for sets with extremely fast decaying Kolmogorov widths . Yet, the two sequences and typically have comparable decay rates as , and it holds for adequate given any :
where c) is sharper than b) if, and only if, or and . So, when the Kolmogorov widths decay fast (variance reduction is a priori possible), greedy algorithms can be useful if used with sufficiently many iterations 222 Then computational reductions are possible, but only for sufficiently many queries in the parameter of course, as we shall see. .
The latter results can also be adapted to the weak greedy algorithm, like in Binev et al. (2011). For all and , given any , we define the smallest number of realizations such that
holds for all and control variates of dimension . Of course, we do not know exactly in practice, which could even not be finite if variations in are not smooth enough. But if, for all , we assume that one can use more than realizations in the computations of step of the weak greedy algorithm, then the following events occur with probability more than , for adequate given any :
see A for a proof. Of course, the constants in the upper-bounds above can be overly pessimistic. But our numerical results show that one does not need to go to the asymptotics in order to get good variance reduction results.
2.2 Implementation and analysis
At any step of a weak greedy algorithm, whether we are looking for a new parameter value or not ( reached), the practical implementation of our RB control-variate MC method still requires a few discretization ingredients. In particular, one has to be able to actually compute approximations of and whatever . We are now going to suggest a generic practical approach to that question. On the contrary, practically useful choices of specifically depend on the parametric context and will be discussed only for specific applications in the next sections.
First, we need to approximate the expectations , . To obtain good useful approximations of in general, we suggest to use an expensive MC estimation , independent from subsequent realizations of the random variable . Although this is a priori as computationally expensive as approximating well at any by a naive MC estimation, we do it only once at each of the few . Once a realization of has been computed at step of the weak greedy algorithm () with realizations, the deterministic result can be stored in memory. We denote the computable approximations of by , , so that the actual control variate used in practice is in fact a linear combination in a linear space of dimension spanned by , . Note that given we expect realizations of to be computed concurrently with realizations of , using the same pseudo-random numbers generated by a computer, with approximately the same cost.
Second, for any , we need to replace the coefficients , , with a tractable solution of the minimization problem (2.5), so that practical control variates in fact read
Now, the MC computations at any one specific should be fast, in particular the approximate solution to the minimization problem (2.5), whether we decide to stop the weak greedy algorithm at step and use only parameter values or still want to explore the parameter values in to select a new parameter value . So we invoke only a small number of realizations of and to compute the . Note that there are different strategies for the numerical solution to a least-squares problem like (2.5) that use only realizations of and . One can try to compute directly the solution to an approximation of the linear system (2.6) where variances and covraiances are approximated with MC estimations similar to (2.13) using i.i.d. realizations and , , of and . The MC estimations similar to (2.13) are indeed interesting insofar as they remain positive, see Welford (1962); West (1979) for a study of consistency. But there may still be difficulties when is too ill-conditioned. One can thus also apply a QR decomposition approach to a small set of realizations, using the Modified Gram-Schmidt (MGS) algorithm for instance 333 The matrix with entries uniquely decomposes as where is a matrix such that is the identity matrix and is an upper triangular matrix. The -dimensional vector with entries , , useful in is next defined as (where is the -dimensional vector with entries ). The are thus random variables depending on realizations and an approximate minimum of (2.5) is . . In any case, realizations of and , , should be correlated and thus computed with the same collection of random numbers in practice. Besides, one can use a single collection of random numbers (generated by initializing the random number generator at one fixed seed) for all , including , , and realizations of can thus be computed only once (at step of the weak greedy algorithm) and next stored in memory.
Finally we approximate by one draw of
whatever and compute a MC estimation (2.13) of with the same realizations of and . Remember that at step of the weak greedy algorithm, (2.17) is useful either to inspect all parameter values , check whether is reached and next select a new parameter value when it is not. Or if has been reached before, then (2.17) is used for definitive estimations at any (still with companion variance estimations (2.13) to concurrently certify that is maintained).
Let us compare our strategy with the cost of direct MC estimations that have same confidence levels and thus use realizations such that . We denote by the cost of one realization at one parameter value compared with that of one multiplication. The evaluation of realizations at each , plus multiplications for the QR solution to the least-squares problem, multiplications for the MC estimation of the output expectation, and for the variance, make our RB control-variate MC method interesting as soon as , at least for “real-time” purposes. Then, the price of identifying control variates with the weak greedy algorithm pays back if one needs to compute very fast for any , provided the control variates still provide good variance reduction in the case where they were constructed by a weak greedy algorithm trained on . In addition, the RB control-variate MC method is also interesting in the many-query cases where one has to compute for a sufficiently large number of parameter values . At each greedy step , in addition to variance estimations at each , the selection of requires a quicksort of the sample of estimated variances , plus the computation of and realizations of to be stored. The cost of the greedy algorithm is then compensated by variance reductions as soon as .
We also have a posteriori error estimates. For values given by a fixed MC estimation with realizations independent of new realizations in (2.17), CLT (2.4) holds444 If the realizations in (2.17) are the same as the ones used to compute , then (2.17) is not a MC empirical estimator of the type . It does not use independent realizations of the random variable , so the CLT does not hold and it is difficult to give a rigorous quantitative estimate of the statistical error. , , for
where the MC empirical estimator is defined in (2.17) and where one can replace by a MC estimator by Slutsky theorem. Furthermore, although (2.18) is not a full error analysis because it does not take into account the bias , a function of and precomputed realizations, probabilities like (2.18) are conditionally to (and to ), so Bayes rule applies and it also holds, for all :
where variances have been replaced by estimators.
Last, to get a full convergence analysis that predicts an efficient variance reduction with the RB control-variate MC method in practice, at least when , one should take into account all realizations in which in fact reads . Yet, note first that in the weak greedy algorithm (2.14), only the realizations introduce new statistical error. And second, one can again predict that, with a good probability, the greedy algorithm is robust to discretization (in the same sense as in Binev et al. (2011)) provided the realizations of the least-squares problems are not too close to rank-deficient and their numerical solution is close to the solution. We do not state this more rigorously but instead refer to Binev et al. (2011), whose results can be adapted in the same way as in A for the week greedy algorithm.
3 Application to Uncertainty Propagation
In this section, we numerically demonstrate the efficiency of the RB control-variate MC method for uncertainty propagation in a representative UQ framework. As example, we consider a PDE parametrized by stochastic coefficients and other non-stochastic coefficients which we term control parameters. The goal is to compute fast many expectations of a scalar output of the random PDE solution for many values of the control parameters.
For the numerical simulations, as usual in UQ frameworks Ghanem and Spanos (1991), we use stochastic coefficients that are random fields with a Karhunen-Loève (KL) spectral decomposition. In practice, one can only use representations with a finite-rank of course and the MC method allows one to simulate the law of the random field just by generating realizations of the random numbers in the KL decomposition. We compute approximations to the realizations of the random PDE solution with a standard FE discretization method (the same one whatever the realizations). The expectation of the random scalar output can be computed for many values of a control parameter just by reiterating many times the MC procedure. But this is costly. So first, we do not actually compute the PDE solution with the FE method at each realization of the stochastic coefficients and for each control parameter value. In fact, we replace it with a cheap reliable surrogate built with the standard RB method like in Boyaval et al. (2009). Yet, even with a cheap surrogate model instead of the full FE, the MC method is still costly, because computing the expectation of the parametrized output under the input probability law still requires a very large number of realizations. This quickly becomes untractable as soon as one has to do it many times for many values of the control parameter. Then, we use the RB control-variate MC method to decrease the number of MC realizations needed at most of the control parameter values.
Compared with Boyaval et al. (2009), one uses here an improved MC method to show how various RB ideas can be combined to efficiently tackle some uncertainty propagation problems for partial differential equations with stochastic coefficients. We thus especially discuss the practice of the RB control-variate MC method here. Yet, although the example is exactly the same as in Boyaval et al. (2009), the specific use of the standard RB method in the frame of UQ, and in combination with the RB control-variate MC method, requires special care. That is why at the end of this section we briefly expose our implementation of the standard RB method, without which we cannot use the MC method at a reasonable price, and next discuss our specific use of it.
3.1 An elliptic PDE with stochastic coefficients
Consider first the solution to a scalar Robin Boundary Value Problem (BVP) in a regular domain : satisfies Laplace equation in
and conditions of third type on the boundary
where and denote the usual divergence and gradient operators in equipped with a cartesian frame, the outward unit normal on , and , and are scalar parameter functions. For the simulations we will choose in particular
where is the characteristic function of ,
and the boundary decomposes into subsets
As output, we consider the compliance 555 For symmetric problems like (3.6), the output (3.7) is a particularly simple choice because it allows a simple accurate a posteriori error estimation without invoking a dual problem, but neither this choice nor the symmetric character of the problem are limitations to our approach, see e.g. Nguyen et al. (2005).
The accurate numerical discretization of (3.6) is standard, for instance using the Finite-Element (FE) method. In what follows, we shall invoke continuous piecewise linear approximations defined in conforming, regular FE spaces of .
To fix ideas, one could think of as the temperature in a fin subject to a constant radiative flux on (in contact with a heat source) and to convective thermal exchanges on , see Fig. 1. The function determines the value of the Biot number along , that is the intensity of local heat transfers on the part of the boundary in contact with a fluid. Now, the Biot number is typically uncertain. We shall model it as a random field in a probability space and one is typically interested in quantifying the uncertainty propagated in . More precisely, we define an essentially bounded function , positive almost everywhere (a.e.) on with probability (-almost surely or a.s. in abbrev.). Then, with a slight abuse of notation, now denotes the solution in to (3.1–3.2) when . And, for many values of the control parameter , we consider computing expectations under of the output . To that aim, we will propagate the uncertainty (random “noise”) from to the scalar (random) output variable by simulating the law of . We recall that here we focus on MC discretizations of the noise. Compared with other discretizations Matthies and Keese (2005); Babuška et al. (2007), it is very easy to implement, in particular because it is not intrusive with respect to existing discretizations of the deterministic problem, although the convergence is a priori much slower when the stochastic variations are smooth Cohen et al. (2010, 2011).
We consider a bounded second-order stochastic process as random input. We denote its constant mean value on by , the spatial correlation length between random fluctuations by and the relative intensity of the (bounded) random fluctuations around the mean by . More precisely, like in Boyaval et al. (2009), the random field shall be the limit in of random fields whose covariance defines a kernel operator of finite rank in and which can be easily simulated with a MC method using independent random scalar parameters in a priori given ranges. For the numerical simulations, we choose a gaussian kernel to define a covariance operator in , where are in the same connected component of the boundary in Fig. 1. The covariance operator has eigencouples , and a normalized trace . Then, invoking uniformly distributed independent variates for each spectral mode (the same on each size of the boundary in Fig. 1 for the sake of simplicity), we define by its KL representation
which makes sense since truncated representations
Approximations converge to in as and for a.e. , realizations of the solution to (3.10) solve a variational formulation with test function
parametrized by and realizations