
# Bayesian inference using intermediate distribution based on coarse multiscale model for time fractional diffusion equation

*L. Jiang acknowledges the support of Chinese NSF 11471107.*

Lijian Jiang, Institute of Mathematics, Hunan University, Changsha 410082, China (ljjiang@hnu.edu.cn), corresponding author.    Na Ou, College of Mathematics and Econometrics, Hunan University, Changsha 410082, China (oyoungla@163.com).
###### Abstract

In this paper, we present a strategy for accelerating posterior inference for unknown inputs in time fractional diffusion models. In many inference problems, the posterior may be concentrated in a small portion of the entire prior support, so it is much more efficient to build and simulate a surrogate only over the significant region of the posterior. To this end, we construct a coarse model using the Generalized Multiscale Finite Element Method (GMsFEM), and solve a least-squares problem for the coarse model with a regularizing Levenberg-Marquardt algorithm. An intermediate distribution is built based on the resulting approximate sampling distribution. For Bayesian inference, we use GMsFEM and a least-squares stochastic collocation method to obtain a reduced coarse model based on the intermediate distribution. To increase the sampling speed of Markov chain Monte Carlo, the DREAM algorithm is used to explore the surrogate posterior density, which combines the surrogate likelihood and the intermediate distribution. The proposed method with a lower gPC order gives an approximate posterior as accurate as that of the surrogate model built directly on the original prior. A few numerical examples for time fractional diffusion equations are carried out to demonstrate the performance of the proposed method with applications to Bayesian inversion.

Key words. Bayesian inversion, GMsFEM, intermediate distribution, time fractional diffusion equation

AMS subject classifications. 65N99, 65N30, 35R60

## 1 Introduction

Fractional-differential equations play a crucial role in modeling many physical phenomena. They are broadly applied in damping laws, fluid mechanics, viscoelasticity, biology, physics, engineering and the modeling of earthquakes due to anomalous diffusion effects in constrained environments [1, 14]. These fractional equations, which can be asymptotically derived from basic random walk models or generalized master equations, are a physical-mathematical approach to anomalous diffusion. In practice, the inputs or parameters in the fractional models are often unknown and need to be identified based on some observation data and prior information. The problem of identifying unknown inputs in mathematical models has been intensively studied in the framework of inverse problems, and various numerical methods have been developed [3, 30, 23].

Because this kind of inverse problem involves sparse and indirect observations, together with uncertainties from the prior information of forward models, it is often ill-posed. In practical applications, the inevitable measurement error poses great challenges in obtaining stable and accurate numerical solutions of inverse problems. Iterative regularization methods [5] can be used to solve ill-posed inverse problems. Another classical approach to regularize inverse problems is through the least-squares approach and penalty functional regularization [10, 36]. Both of them lead to point estimates of the parameters. However, we are often interested not only in point estimates but also in the statistical distribution of the parameters. This can be fulfilled by Bayesian inference. In this paper, we resort to Bayesian inference to identify unknown parameters in fractional diffusion equations.

The Bayesian approach [21, 36, 12] incorporates uncertainties in noisy observations and prior information, and can derive the posterior probability density of the parameters, which enables us to quantify the uncertainty in the parameters. We can use the posterior conditional expectation or the maximum a posteriori (MAP) estimate to characterize the parameters. However, the nonlinearity of the parameter-to-observation map and the lack of an analytical form of the forward model make the direct evaluation of the posterior very difficult. Alternatively, one can use Markov chain Monte Carlo (MCMC) [25, 4, 12] techniques to construct Markov chains whose stationary distribution is the posterior density. The efficiency of many standard MCMC chains degrades with respect to the dimension of the parameters [33, 29, 32]. The DREAM [37, 22] algorithm employs multiple simultaneous Markov chains (generally 3-5) and uses the differential evolution algorithm to generate trial points for each chain. It has been particularly successful in finding appropriate trial parameters, which can improve the efficiency of MCMC chains. The DREAM algorithm has successfully coped with high dimensionality, nonlinearity, non-convexity, multimodality and problems with numerous local optima [37].

The MCMC approach entails repeated solutions of the forward model. When the forward model is computationally intensive, as multiscale models are, a direct application of Bayesian inference would be computationally prohibitive. To significantly improve the simulation efficiency, reducing the order of the forward model, searching for surrogates, or seeking more efficient sampling from the posterior [7, 26] is necessary to accelerate Bayesian inference. Numerical multiscale methods can efficiently and accurately solve multiscale models on a coarse grid without resolving all scales on a very fine grid. The Multiscale Finite Element Method (MsFEM) [15] is one of the methods to efficiently simulate multiscale models. The basic idea of MsFEM is to incorporate the small-scale information into multiscale basis functions and capture the impact of small-scale features on the coarse scale through a variational formulation. One of the most important features of MsFEM is that the multiscale basis functions can be computed offline and used repeatedly for the model with different source terms and boundary conditions. Many other multiscale methods [2, 6, 16, 18, 20] share similarities with MsFEM. The Generalized Multiscale Finite Element Method (GMsFEM) [9, 8] has been developed to solve multiscale models with complex multiscale structures. GMsFEM has some advantages over the standard MsFEM. For example, the coarse space in GMsFEM is more flexible, and the convergence of GMsFEM is independent of the contrast of the multiple scales.

In Bayesian inversion, the model's output depends on the random parameters. We can use generalized polynomial chaos (gPC)-based stochastic Galerkin methods to propagate prior uncertainty through the forward model. As an alternative to the stochastic Galerkin approach, stochastic collocation requires only a small number of uncoupled deterministic simulations, with no reformulation of the governing equations of the forward model. We assume that the model's output is a stochastic field and admits a gPC expansion. Then we choose a set of collocation nodes and use least-squares methods to determine the coefficients of the gPC basis functions. We call this method the least-squares stochastic collocation method (LS-SCM). This method shares the same idea as the probabilistic collocation method [24]. LS-SCM inherits the merits of stochastic Galerkin methods and collocation methods. The recent work [39] employs a stochastic collocation algorithm using l1-minimization to construct stochastic sparse models with a limited number of nodes, and this strategy has been applied to the Bayesian approach to handle nonlinear problems. The authors of [17] present a basis selection method that is used with l1-minimization to adaptively determine the large coefficients of polynomial chaos expansions. Such sparse stochastic collocation methods may give a feasible approach to solve problems in high-dimensional random spaces.

The gPC surrogate is usually constructed based on a prior density [19]. However, in many inference problems the posterior is concentrated in a small portion of the entire prior support. Thus, it may be much more efficient to build surrogates only over the important region of the posterior than over the entire prior support. In this paper, we construct an intermediate distribution by solving a coarse reduced optimization problem, and build a surrogate on this distribution. The proposed method yields the posterior density of the unknowns; instead of the point estimates or confidence intervals that deterministic methods provide, the posterior density gives access to more quantities of interest. Compared with the two-stage MCMC method [7], which uses the coarse forward model to reject low-probability samples via the MH algorithm, the intermediate distribution can be easily obtained by a gradient-based iteration method; it is not required to be close to the posterior, nor does it work as the proposal as in the two-stage MCMC method, i.e., the intermediate distribution is not a part of the Markov chain. Moreover, the intermediate distribution is obtained from the sampling distribution and considerably reduces the uncertainty of the unknown parameters compared to the prior density, so the gPC surrogate constructed on this distribution leads to a better approximate posterior density than the one derived from the original prior. Because the proposed approach excludes the unimportant region of the posterior through the intermediate distribution, the acceptance rate of the Markov chains is improved significantly.

Based on the intermediate distribution, we construct the reduced-order model by combining GMsFEM with LS-SCM. This gives a representation of the model response. We first solve an optimization problem based on the coarse reduced-order model, where the measured data inform the significant region of the posterior, and obtain the approximate sampling distribution. Then we construct the intermediate distribution from the sampling distribution, on which we use GMsFEM and LS-SCM to construct the surrogate model, and use the DREAM algorithm to explore the surrogate posterior density. We have combined deterministic and statistical methods to solve inverse problems and obtain the statistical properties of the unknowns. We find that the surrogate based on the intermediate distribution and a lower gPC order leads to an approximate posterior as accurate as that of the surrogate model built directly on the original prior. This can significantly alleviate the difficulty of simulating problems in high-dimensional random spaces.

The paper is organized as follows. We begin with formulating a time-fractional diffusion equation and its inverse problems in Section 2, and introduce the framework of surrogate construction, especially the ideas of constructing the intermediate distribution. Section 3 is devoted to reducing the dimensionality of the model parameter space and the model state space. This includes the Karhunen-Loève expansion, particularly for multiple random fields. The model reduction method using GMsFEM and LS-SCM is also presented in this section. In Section 4, we present a few numerical examples to illustrate the performance of the proposed method with applications to inverse problems of time-fractional diffusion equations. Some conclusions and comments are given at the end.

## 2 Statistical inversion based on intermediate distribution

Anomalous diffusion has been well known since Richardson's treatise on turbulent diffusion in 1926 [31]. The role of anomalous diffusion has received attention in the literature to describe many physical scenarios [35, 38], most prominently within crowded systems, for example, protein diffusion within cells and diffusion through porous media. Fractional kinetic equations of the diffusion, diffusion-advection and Fokker-Planck type have been presented as a useful approach for describing transport dynamics in complex systems, which are governed by anomalous diffusion and non-exponential relaxation patterns. The typical time-fractional diffusion equation is defined by

$$q(x)\,{}^{c}D_t^{\gamma} u - \operatorname{div}\bigl(k(x)\nabla u\bigr) = f(x,t), \qquad x \in \Omega,\; t \in (0,T], \tag{2.1}$$

subject to an appropriate boundary condition and initial condition. Here $u(x,t)$ denotes the state variable at space point $x$ and time $t$, and $\gamma \in (0,1)$ is the fractional order of the derivative in time, where the Caputo fractional derivative of order $\gamma$ with respect to time is used and defined by

$${}^{c}D_t^{\gamma} u = \frac{1}{\Gamma(1-\gamma)}\int_0^t \frac{\partial u(x,s)}{\partial s}\,\frac{ds}{(t-s)^{\gamma}}, \tag{2.2}$$

where $\Gamma(\cdot)$ is the Gamma function. Here $k(x)$ is a spatially varying diffusion coefficient, $q(x)$ is the specific storage field, and $f(x,t)$ is a source (or sink) term. To simplify the function notations, we will suppress the variables $x$ and $t$ in functions when no ambiguity occurs. For practical models, the model inputs such as $k$, $q$ and $\gamma$ may not be known, and they need to be estimated from some observations or measurements. In this paper, we focus on the inverse problems for the fractional diffusion model (2.1).

### 2.1 Bayesian inference

In this paper, we use Bayesian inference to estimate unknown parameters from some given noisy measurements of the model response at various sensors. We consider the case of additive noise $\varepsilon$ with probability density function $\pi_N$; then the measurement data can be expressed by

$$D = G(m) + \varepsilon, \tag{2.3}$$

where $m$ is a vector of model parameters or inputs and $G(m)$ is the observed model response at the measurement sensors. Here $n_d$ is the dimension of the observations. We assume that $\varepsilon$ is independent of $m$; then the conditional probability density for the measurement data $d$ given the unknown $m$, i.e., the likelihood function, is given by

$$\pi(d\,|\,m) = \pi_N\bigl(d - G(m)\bigr). \tag{2.4}$$

We use Bayesian inference to solve the inverse problem. This approach results in a measure of uncertainty rather than only a single-point estimate of the parameter. This is an advantage of the Bayesian method over standard regularization methods. In the Bayesian setting, both $m$ and $d$ are random variables. Then the posterior probability density for $m$ can be derived by Bayes' rule,

$$\pi(m\,|\,d) \propto \pi(d\,|\,m)\,\pi(m), \tag{2.5}$$

where $\pi(m)$ is the prior distribution encoding the available prior information before the data are observed. The data enter the Bayesian formulation through the likelihood function $\pi(d\,|\,m)$. For convenience of notation, we will use $\pi^d(m)$ to denote the posterior density $\pi(m\,|\,d)$ and $L(m)$ to denote the likelihood function $\pi(d\,|\,m)$. Then (2.5) can be rewritten as

$$\pi^d(m) \propto L(m)\,\pi(m). \tag{2.6}$$

Furthermore, if the prior density is conditional on an unknown hyperparameter $\theta$, i.e., $\pi(m\,|\,\theta)$, the parameter $\theta$ is also a part of the inference problem in the Bayesian framework. In other words, these hyperparameters may be endowed with priors and estimated from the data:

$$\pi(m,\theta\,|\,d) \propto L(m)\,\pi(m\,|\,\theta)\,\pi(\theta).$$

The noise vector $\varepsilon$ is assumed to be an independent and identically distributed (i.i.d.) Gaussian random vector with mean zero and standard deviation $\sigma_0$,

$$\varepsilon \sim N(0, \sigma_0^2 I),$$

where $I$ is the identity matrix of size $n_d$. Then the likelihood defined in (2.4) is given by

$$L(m) = \bigl(2\pi\sigma_0^2\bigr)^{-\frac{n_d}{2}} \exp\!\left(-\frac{\|d - G(m)\|_2^2}{2\sigma_0^2}\right), \tag{2.7}$$

where $\|\cdot\|_2$ refers to the Euclidean norm. We note that it is not necessary to compute the normalization term in (2.7) under most circumstances. Once the posterior distribution of $m$ is inferred, we can extract the posterior mean or the maximum a posteriori (MAP) estimate of the unknowns,

$$m_{\mathrm{MAP}} = \arg\max_m \pi^d(m).$$

The MAP estimate is equivalent to the solution of a regularized minimization problem for some specific priors. However, the analytical expression of the posterior distribution is generally unavailable, and the high-dimensional integration involved in the posterior expectation is a great challenge. Markov chain Monte Carlo (MCMC) methods are a class of algorithms for sampling from a probability distribution based on constructing a Markov chain that has the desired distribution as its equilibrium distribution, and we can use MCMC methods to explore the posterior state space of the unknowns.
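To make the Gaussian likelihood concrete, the following sketch evaluates $\log L(m)$ from (2.7) in log form, which is what an MCMC sampler actually needs; the function name and interface are ours, not the paper's:

```python
import numpy as np

def log_likelihood(d, G_m, sigma0):
    """Log of the Gaussian likelihood (2.7): data d, model prediction
    G_m = G(m), noise standard deviation sigma0. Working in log space
    avoids underflow for large misfits."""
    r = np.asarray(d, dtype=float) - np.asarray(G_m, dtype=float)
    nd = r.size
    return -0.5 * nd * np.log(2 * np.pi * sigma0**2) - (r @ r) / (2 * sigma0**2)
```

The normalization term is constant in $m$ and could be dropped without changing the MCMC acceptance ratios.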

We can calculate credible intervals for the inferred parameters and the model response based on posterior samples. The $(1-\alpha)$ credible interval $(\mu_L, \mu_U)$ for a random variable $m$ is defined by

$$\mathrm{Prob}\bigl(\mu_L < m < \mu_U\bigr) = 1 - \alpha,$$

and the prediction interval for a random response refers to the pair of statistics $(\Upsilon_L, \Upsilon_U)$ constructed from a random observation such that

$$\mathrm{Prob}\bigl(\Upsilon_L < D_0 < \Upsilon_U\bigr) = 1 - \alpha,$$

where $D_0$ is a new observation at a given point, which is independent of the data used to construct $\Upsilon_L$ and $\Upsilon_U$. Once the corresponding samples are obtained, the credible interval and prediction interval are easily constructed. For example, for $\alpha = 0.05$, we put the samples in ascending order and determine the locations of the 0.025 and 0.975 values, since we have the Monte Carlo approximation of the quantile for a random variable as

$$p = \int_{-\infty}^{\mu_0} \pi^d(m)\,dm \approx \frac{1}{M}\sum_{k=1}^{M} I\bigl(m_k < \mu_0\bigr) = \frac{M_{\mu_0}}{M},$$

where the sequence $\{m_k\}_{k=1}^{M}$ are the samples drawn from the density $\pi^d(m)$, and $I(\cdot)$ is an indicator function. We can obtain the location of $\mu_0$ by $M_{\mu_0} = pM$, i.e., the $(pM)$th sample in the ascending-ordered samples is the quantile $\mu_0$.
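The quantile construction above can be sketched as follows (a hypothetical helper, not code from the paper): sort the posterior draws and read off the empirical $\alpha/2$ and $1-\alpha/2$ quantiles.

```python
import numpy as np

def credible_interval(samples, alpha=0.05):
    """Equal-tailed (1 - alpha) credible interval from MCMC samples:
    the alpha/2 and 1 - alpha/2 empirical quantiles of the sorted draws."""
    s = np.sort(np.asarray(samples))
    M = s.size
    lo = s[int(np.floor(alpha / 2 * M))]
    hi = s[min(int(np.ceil((1 - alpha / 2) * M)) - 1, M - 1)]
    return lo, hi
```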

### 2.2 Reduced order model and intermediate distribution

Let $\mathbb{R}^{N_p}$ be the parameter space with $m \in \mathbb{R}^{N_p}$; we can directly apply the Bayesian formulation described in the previous section and explore the posterior density of $m$ with MCMC. When the finite element method (FEM) is used to solve the forward model, equation (2.1) leads to a system of algebraic equations

$$K(m)\,u = f, \qquad G(m) = C u,$$

where $K(m)$ is the nonlinear forward operator depending on the parameter $m$, $u \in \mathbb{R}^{N_f}$ is the solution or state, $f$ is the source, $N_f$ is the number of degrees of freedom in the FEM, $C$ is the observation operator, and $G(m) \in \mathbb{R}^{n_d}$ is the observation vector.

The random vector $m$ may be high-dimensional because it depends on the discretization of the physical domain. High dimensionality will make MCMC exploration of the posterior more challenging. Besides, we note that the dimension of the states scales with the number of degrees of freedom of the FEM, while the dimension of the observable outputs is small in practice. Thus, it is necessary to construct a reduced model to accelerate the forward model computation. We assume that the parameter and state can be adequately approximated in the span of the parameter and state bases $E$ and $R$, respectively. This leads to a reduced algebraic system corresponding to (2.1) as follows:

$$K_r(z)\,u_r = f_r, \qquad G_r(z) = C_r u_r,$$

where

$$K_r(z) = R^T K(Ez)\,R, \qquad f_r = R^T f, \qquad C_r = C R. \tag{2.8}$$

Here $K_r(z)$ denotes the reduced forward model depending on the reduced parameter $z$, $u_r$ is the reduced state, $f_r$ is the projected source, $C_r$ is the reduced-model observation operator, and $G_r(z)$ are the reduced-model outputs. The reduction in parameter space is enforced directly by assuming $m = Ez$. We note that $n_z \ll N_p$ and $M_v \ll N_f$, where $n_z$ and $M_v$ are the reduced parameter and state dimensions. Thus, both the states and the parameters reside in lower-dimensional spaces.
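The projection (2.8) amounts to a Galerkin reduction. A minimal sketch (dense matrices and our own function name; a real implementation would assemble $K$ from the full parameter $m = Ez$) that maps the full operators to reduced-model outputs:

```python
import numpy as np

def reduced_outputs(K, f, C, R):
    """Galerkin reduction of K u = f, G = C u onto the basis R (eq. (2.8)):
    Kr = R^T K R, fr = R^T f, Cr = C R, then solve the small system."""
    Kr = R.T @ K @ R
    fr = R.T @ f
    Cr = C @ R
    ur = np.linalg.solve(Kr, fr)   # reduced state
    return Cr @ ur                 # reduced-model outputs Gr(z)
```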

In addition to reducing the parameter space and state space, we also use stochastic response surface methods (e.g., generalized polynomial chaos) to construct a surrogate, which can be used to accelerate evaluations of the posterior density. Generally, one can construct the surrogate based on the prior density using the reduced model and stochastic response surface methods. However, constructing a sufficiently accurate surrogate model over the support of the prior distribution may not be possible in many practical problems. As the posterior density reflects an information gain relative to the prior distribution, this motivates us to seek an intermediate distribution from a sampling distribution. To this end, we solve the nonlinear optimization problem for the derived reduced-order model

$$F_r(z) = \min_z \left\{\frac{1}{2}\,\|d - G_r(z)\|_2^2\right\}. \tag{2.9}$$

From a frequentist perspective, given the observed data, the solution to problem (2.9) is one realization of the ordinary least squares (OLS) estimator $\hat z$, where

$$\hat z = \arg\min_z \frac{1}{2}\,\|D - G_r(z)\|_2^2.$$

Here $D$ is a random vector as noted in equation (2.3), and $d$ is a realization of $D$. Let $z_0$ denote the true but unknown value of the parameter set that generated the observation $d$. Then we have the following conclusion [34]: when the measurement errors are i.i.d. Gaussian, or $n_d$ is sufficiently large, we can specify a sampling distribution for $\hat z$, and it can be directly or asymptotically established that

$$\hat z \sim N(z_0, V),$$

where the covariance matrix $V$ is given by

$$V = \sigma_0^2\,\bigl[J^T(z_0)\,J(z_0)\bigr]^{-1},$$

and $J$ denotes the sensitivity matrix, whose elements are

$$J_{ik}(z) = \frac{\partial [G_r]_i(z)}{\partial z_k}.$$

The asymptotic distribution is the so-called sampling distribution. When we use the reduced model to approximate the forward model as in the optimization problem (2.9), the model reduction error is unknown. This makes the error between the measured data and the reduced model unknown, and the covariance matrix can be approximated by

$$V \approx \sigma_{\mathrm{OLS}}^2\,\bigl[J^T(z_{\mathrm{OLS}})\,J(z_{\mathrm{OLS}})\bigr]^{-1},$$

where

$$\sigma_{\mathrm{OLS}}^2 = \frac{1}{n_d - n_z}\,R^T R, \qquad R = d - G_r(z_{\mathrm{OLS}}).$$
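A sketch of this covariance estimate (our naming; `J` is the sensitivity matrix of the reduced model at the OLS point and `residual` is $d - G_r(z_{\mathrm{OLS}})$):

```python
import numpy as np

def ols_covariance(J, residual):
    """Approximate sampling covariance V = sigma_OLS^2 [J^T J]^{-1},
    with sigma_OLS^2 = R^T R / (nd - nz) estimated from the residual R."""
    nd, nz = J.shape
    sigma2 = (residual @ residual) / (nd - nz)
    return sigma2, sigma2 * np.linalg.inv(J.T @ J)
```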

Furthermore, if we use $\delta_k$ to denote the $k$th diagonal element of $\bigl[J^T(z_{\mathrm{OLS}})\,J(z_{\mathrm{OLS}})\bigr]^{-1}$ and $z_{0k}$ the $k$th element of the true parameter $z_0$, then the $(1-\alpha_0)$ confidence interval for $z_{0k}$ is

$$\Bigl[z_{\mathrm{OLS},k} - t_{n_d-n_z,\,1-\alpha_0/2}\,\sigma_{\mathrm{OLS}}\sqrt{\delta_k},\;\; z_{\mathrm{OLS},k} + t_{n_d-n_z,\,1-\alpha_0/2}\,\sigma_{\mathrm{OLS}}\sqrt{\delta_k}\Bigr].$$

As $n_d - n_z$ is large in practice, the t-distribution with $n_d - n_z$ degrees of freedom can be approximated by the normal distribution. Following the $2\sigma$ rule of the normal distribution, we have the assertion: each $z_{0k}$ has high probability of being included in the interval

$$\Bigl[z_{\mathrm{OLS},k} - 2\sigma_{\mathrm{OLS}}\sqrt{\delta_k},\;\; z_{\mathrm{OLS},k} + 2\sigma_{\mathrm{OLS}}\sqrt{\delta_k}\Bigr],$$

where $z_{\mathrm{OLS}}$ is a realization of $\hat z$. We have obtained an interval estimate for $z_0$ in the probability sense, and we assume $z$ is uniformly distributed on this interval, which is used as the new prior in Bayesian inference. To distinguish it from the original prior, we call the new prior the intermediate distribution. The diagonal elements $\delta_k$ are often very small, which implies that the support of the intermediate distribution is narrower than the original prior's. We will construct the surrogate based on the intermediate distribution.

### 2.3 DreamZS sampling method

After obtaining the approximate posterior for the reduced parameter $z$, which combines the surrogate likelihood and the intermediate distribution, we use the DREAM algorithm to explore the posterior density and obtain samples. DREAM has been widely used to increase the speed of the MCMC process. It employs multiple Markov chains (generally 3-5) simultaneously and uses a differential evolution algorithm to generate trial points for each chain.

The DreamZS algorithm stems from DREAM but creates an archive of potential solutions; thus we can focus on introducing the DREAM algorithm. For $n_z$-dimensional parameters, $N$ chains ($i = 1, \dots, N$) run simultaneously in parallel, and the present population is stored in an $N \times n_z$ matrix $Z$. The candidates are randomly generated according to

$$q_i = z_i + (1_{n_z} + e)\,g(\beta, n_z')\left[\sum_{j=1}^{\beta} z_{r_1(j)} - \sum_{j=1}^{\beta} z_{r_2(j)}\right] + \epsilon, \tag{2.10}$$

where $\beta$ signifies the number of pairs used to generate the proposal, and $r_1(j), r_2(j) \in \{1, \dots, N\}$ with $r_1(j) \neq r_2(j) \neq i$ for $j = 1, \dots, \beta$. The values of $e$ and $\epsilon$ are drawn from uniform and normal distributions, respectively, with the scale of $\epsilon$ small compared to the width of the target distribution. The value of the jump size $g$ depends on $\beta$ and $n_z'$, i.e., $g = 2.38/\sqrt{2\beta n_z'}$, and we set $g = 1$ at every 5th generation. The algorithm is described in Table LABEL:dreamtab.

Subspace sampling is implemented in DREAM by updating only randomly selected dimensions of $z_i$ each time a proposal is generated. If $z'$ is a subset of $n_z'$ dimensions of the original parameter space, then a jump in the $i$th chain at each iteration is calculated using the differential evolution crossover operation. The crossover operation decides whether to update the $j$th dimension of the parameter, and the selection process follows the rule (for chain $i$ and the $j$th parameter)

$$q_{ij} = \begin{cases} z_{ij} & \text{if } U \le 1 - CR,\\ q_{ij} & \text{otherwise}, \end{cases}$$

where $CR$ denotes the crossover probability, and $U \sim U(0,1)$ is a draw from a uniform distribution. If a user-defined set of crossover values is prescribed, then $CR$ is taken from a multinomial distribution over that set. The pseudo-code of the DREAM algorithm is presented in Algorithm LABEL:dreamtab.
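A sketch of the differential-evolution proposal (2.10) with subspace crossover, under our reading of the update rule (the function name, argument defaults, and the standard jump size $2.38/\sqrt{2\beta n_z'}$ come from the DREAM literature, not from this paper's pseudo-code):

```python
import numpy as np

def dream_proposal(Z, i, beta=1, cr=0.9, eps_scale=1e-6, rng=None):
    """Trial point for chain i from the population Z (N x nz):
    jump along the difference of beta random chain pairs, then revert
    each coordinate with probability 1 - CR (subspace crossover)."""
    rng = np.random.default_rng() if rng is None else rng
    N, nz = Z.shape
    others = [k for k in range(N) if k != i]
    r1 = rng.choice(others, size=beta)
    r2 = rng.choice(others, size=beta)
    g = 2.38 / np.sqrt(2 * beta * nz)        # standard jump size
    q = Z[i] + g * (Z[r1].sum(0) - Z[r2].sum(0)) + eps_scale * rng.standard_normal(nz)
    revert = rng.random(nz) <= 1 - cr        # crossover: keep old coordinates
    q[revert] = Z[i][revert]
    return q
```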

## 3 Construction of surrogate model

As we can see, obtaining samples by the DREAM algorithm requires running 3-5 chains simultaneously, which means we need to call a large number of deterministic forward solvers. This may be computationally expensive and inefficient. We appeal to the truncated KLE and GMsFEM to reduce the dimensions of the unknown parameter space and the state space in equation (2.8). We assume that

$$E = \Bigl[\sqrt{\lambda_1}\,e_1(x), \cdots, \sqrt{\lambda_{n_z}}\,e_{n_z}(x)\Bigr]$$

is the matrix of weighted eigenfunctions in the KLE, where $(\lambda_i, e_i(x))$, $i = 1, \dots, n_z$, are the KLE modes, and the matrix $R$ is the multiscale basis. We note that GMsFEM provides much flexibility in choosing multiscale basis functions, so we can use a small number of selected basis functions on each coarse neighborhood while maintaining accuracy to some degree. Hence, we use a few multiscale basis functions in each coarse block to construct the reduced observation vector $G_r(z)$, and use the regularizing Levenberg-Marquardt algorithm to solve the ill-posed nonlinear problem (2.9). The computational cost of solving the PDE involved in the optimization iterations can be reduced significantly due to the scale reduction of the forward model. The iteration of the regularizing Levenberg-Marquardt algorithm is given by

$$z_{k+1} = z_k + J^{\dagger}\bigl(d - G_r(z_k)\bigr),$$

where

$$J^{\dagger} = \bigl(J^T J + \alpha I\bigr)^{-1} J^T.$$

Here $\alpha$ is the regularization parameter, $I$ is the identity matrix of size $n_z$, and $J$ is the sensitivity matrix, whose transpose is defined by

$$J^T(z_k) = \bigl[\nabla G_{r1}(z_k), \nabla G_{r2}(z_k), \cdots, \nabla G_{r n_d}(z_k)\bigr].$$

The gradient of $G_{ri}$ can be approximated by a difference method; e.g., each column of $J$ can be computed by

$$J(:,j) = \frac{G_r(z_k + h\eta_j) - G_r(z_k)}{h}$$

for $j = 1, \dots, n_z$, where $h$ is the stepsize and $\eta_j$ is a vector of zeros with one in the $j$th component.
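The iteration and the finite-difference Jacobian can be sketched as follows (our function names; `Gr` is any callable reduced forward model):

```python
import numpy as np

def fd_jacobian(Gr, z, h=1e-6):
    """Forward-difference sensitivity matrix, J[:, j] = (Gr(z + h e_j) - Gr(z)) / h."""
    g0 = Gr(z)
    J = np.empty((g0.size, z.size))
    for j in range(z.size):
        zj = z.copy()
        zj[j] += h
        J[:, j] = (Gr(zj) - g0) / h
    return J

def lm_step(Gr, z, d, alpha=1e-3, h=1e-6):
    """One regularizing Levenberg-Marquardt update:
    z + (J^T J + alpha I)^{-1} J^T (d - Gr(z))."""
    J = fd_jacobian(Gr, z, h)
    r = d - Gr(z)
    return z + np.linalg.solve(J.T @ J + alpha * np.eye(z.size), J.T @ r)
```

In practice the step is iterated until the residual stalls, with $\alpha$ adapted between iterations.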

In reality, the exact solution to problem (2.9) is inaccessible; both the discretization of the forward model and the iterative method introduce errors. This causes some deviation from the exact interval estimate of the true parameter $z_0$. Thus we need some adjustment in determining the radius of the interval. We recall that the interval estimate of $z_{0k}$ is

$$\Bigl[z_{\mathrm{OLS},k} - 2\sigma_{\mathrm{OLS}}\sqrt{\delta_k},\;\; z_{\mathrm{OLS},k} + 2\sigma_{\mathrm{OLS}}\sqrt{\delta_k}\Bigr].$$

Because the diagonal elements $\delta_k$ are often very small, the support of the intermediate distribution would be small, and we need to extend it to avoid concentration in a very narrow region. We denote the modified intermediate distribution of $z_k$ by $U(z_{\mathrm{OLS},k} - r_k,\, z_{\mathrm{OLS},k} + r_k)$, where the radius $r_k$ is chosen following the rule: if $2\sigma_{\mathrm{OLS}}\sqrt{\delta_k} \ge 1$, we set $r_k = 2\sigma_{\mathrm{OLS}}\sqrt{\delta_k}$; otherwise $r_k$ is set as $1$. The adjustment may not be optimal; we choose 1 as the criterion here just because we want to use Legendre polynomials to construct the surrogate in computation. We note that the support of the modified intermediate distribution is still much narrower than the original prior's.
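Under this reading of the extension rule, the support of the modified intermediate distribution can be computed as follows (a hypothetical helper; `r_min=1.0` reflects the Legendre-normalization criterion mentioned above):

```python
import numpy as np

def intermediate_bounds(z_ols, sigma_ols, delta, r_min=1.0):
    """Support [z_k - r_k, z_k + r_k] of the uniform intermediate
    distribution, with radius r_k = max(2 sigma_OLS sqrt(delta_k), r_min)."""
    r = np.maximum(2.0 * sigma_ols * np.sqrt(np.asarray(delta, dtype=float)), r_min)
    return z_ols - r, z_ols + r
```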

In summary, we use the regularizing Levenberg-Marquardt algorithm to solve the problem (2.9) and construct an intermediate distribution, on which we build the surrogate using LS-SCM and GMsFEM. Then we obtain an approximate posterior by combining the surrogate likelihood and the intermediate distribution. Furthermore, we explore the approximate posterior by an MCMC method and characterize the statistical properties of the unknowns by samples. The outline of surrogate construction is described in Table LABEL:tab2. We can use the Kullback-Leibler divergence to measure the difference between the approximate posterior and the full posterior.

### 3.1 Karhunen-Loève expansion

Let $Y_1(x,\omega)$ and $Y_2(x,\omega)$ be two Gaussian random fields with finite second-order moments. We first parameterize the random fields $Y_j(x,\omega)$ by applying the Karhunen-Loève expansion (KLE) with given covariance functions, and truncate the KLE to approximately represent the random fields.

For a general presentation, we consider multiple random input fields $Y_j(x,\omega)$ defined on the same physical domain $\Omega$, which can be normalized as $Y_j'(x,\omega) = \bigl(Y_j(x,\omega) - E[Y_j(x,\omega)]\bigr)/\sigma_{Y_j}$, where $\sigma_{Y_j}$ is the standard deviation of $Y_j$ and $E[\cdot]$ is the expectation operator. A single normalized random field $Y_j'(x,\omega)$ can be expressed as

$$Y_j'(x,\omega) = \sum_{i=1}^{\infty} \sqrt{\lambda_{ji}}\, e_{ji}(x)\, Z_{ji}(\omega),$$

where $Z_{ji}(\omega)$ are uncorrelated random variables, and $(\lambda_{ji}, e_{ji}(x))$ are the eigenpairs defined by

$$\int_{\Omega} \mathrm{cov}[Y_j'](x_1, x_2)\, e_j(x_2)\, dx_2 = \lambda_j\, e_j(x_1),$$

where $\mathrm{cov}[Y_j']$ is the covariance function of $Y_j'$. In particular, if $Y_j'$ is a Gaussian field, then $Z_{ji}$ are independent standard Gaussian random variables. We sort the eigenvalues in descending order, i.e., $\lambda_{j1} \ge \lambda_{j2} \ge \cdots$, and their corresponding eigenfunctions are sorted accordingly. The field $Y_j'$ is approximated by the truncated KLE with the first $N_j$ terms, where $N_j$ depends on the spectrum energy measured by $\sum_{i=1}^{N_j}\lambda_{ji}/|\Omega|$, with $|\Omega|$ the area of the spatial domain $\Omega$.

For multiple random input fields, the correlation between them can be described by a symmetric and positive definite covariance matrix with $(i,j)$th element $\rho_{i,j}$,

$$\rho_{i,j} = E\bigl[Y_i'(x,\omega)\, Y_j'(x,\omega)\bigr].$$

We decompose the covariance matrix into $LL^T$ by Cholesky decomposition and incorporate the correlation into the representation of the input random fields as follows [13]:

$$Y_j(x,\omega) = E[Y_j(x,\omega)] + \sigma_{Y_j}\left(\sum_{k=1}^{j} L_{j,k}\, Y_k'(x,\omega)\right). \tag{3.1}$$

It is noted that the KLE representation has the same mean and covariance function as the prescribed input field $Y_j$. Furthermore, the input random field can be approximated by truncating the infinite summation in (3.1) at the $N_k$th term,

$$Y_j(x,\omega) \approx E[Y_j(x,\omega)] + \sigma_{Y_j}\left(\sum_{k=1}^{j} L_{j,k} \sum_{i=1}^{N_k} \sqrt{\lambda_{ki}}\, e_{ki}(x)\, Z_{ki}(\omega)\right).$$

As a result, the input fields can be approximated by representations with finitely many parameters. Thus, the stochastic properties of the input random fields $Y_j(x,\omega)$ can be effectively approximated with finitely many random variables $Z_{ki}(\omega)$ by retaining the leading KLE terms with large eigenvalues.
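On a grid, the truncated KLE of a single zero-mean field reduces to an eigendecomposition of the discrete covariance matrix; a minimal sketch (our function name, the discrete covariance assumed already assembled):

```python
import numpy as np

def truncated_kle_sample(cov, n_terms, rng=None):
    """One realization of a zero-mean Gaussian field from the leading
    n_terms of its discrete KLE: keep the largest eigenvalues of the
    covariance matrix and weight i.i.d. standard normal variables Z_i."""
    rng = np.random.default_rng() if rng is None else rng
    lam, e = np.linalg.eigh(cov)           # ascending eigenvalues
    lam, e = lam[::-1], e[:, ::-1]         # reorder: largest first
    Z = rng.standard_normal(n_terms)
    return e[:, :n_terms] @ (np.sqrt(np.maximum(lam[:n_terms], 0.0)) * Z)
```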

### 3.2 GMsFEM

GMsFEM can achieve efficient forward model simulation and provide an accurate approximation of the solution of multiscale problems. In this section, we follow the idea of GMsFEM [8] and apply it to the time-fractional diffusion equation (2.1). For GMsFEM, we need to pre-compute a set of multiscale basis functions. For the space of snapshots, we solve the following local eigenvalue problem on each coarse neighborhood $\omega_i$:

$$\begin{cases} -\operatorname{div}\bigl(k(x,\mu_j)\nabla \varphi_{l,j}\bigr) = \lambda_{l,j}\, k(x,\mu_j)\, \varphi_{l,j} & \text{in } \omega_i,\\[2pt] k(x,\mu_j)\nabla \varphi_{l,j}\cdot \vec n = 0 & \text{on } \partial\omega_i, \end{cases} \tag{3.2}$$

where the samples $\mu_j$, $j = 1, \dots, N_\mu$, are drawn from the prior distribution of the parameter. After discretization using the finite element method, the local eigenvalue problem can be formulated as an algebraic system,

$$A(\mu_j)\,\varphi_{l,j} = \lambda_{l,j}\, S(\mu_j)\,\varphi_{l,j},$$

where

$$A(\mu_j) = \bigl[a(\mu_j)_{mn}\bigr] = \int_{\omega_i} k(x,\mu_j)\,\nabla v_n\cdot\nabla v_m, \qquad S(\mu_j) = \bigl[s(\mu_j)_{mn}\bigr] = \int_{\omega_i} k(x,\mu_j)\, v_n v_m,$$

where $v_n$ denotes the basis functions on the fine grid. We take the first $M_i^{\mathrm{snap}}$ eigenfunctions corresponding to the dominant eigenvalues for each coarse neighborhood $\omega_i$ (see Figure LABEL:coarse-cell), $i = 1, \dots, N_H$, where $N_H$ is the number of coarse nodes. Hence we can construct the space of snapshots as

$$V_{\mathrm{snap}}^{\omega_i} = \mathrm{span}\bigl\{\varphi_{l,j} : 1 \le j \le N_\mu,\; 1 \le l \le M_i^{\mathrm{snap}}\bigr\},$$

and the snapshot functions can be stacked into a matrix as

$$R_{\mathrm{snap}} = \bigl[\varphi_1, \cdots, \varphi_{M_{\mathrm{snap}}}\bigr],$$

where $M_{\mathrm{snap}}$ denotes the total number of snapshots used in the construction.

Next we perform a spectral decomposition of the space of snapshots and solve the local problems

$$\begin{cases} -\operatorname{div}\bigl(k(x,\bar\mu)\nabla \psi_k^i\bigr) = \lambda_k\, k(x,\bar\mu)\, \psi_k^i & \text{in } \omega_i,\\[2pt] k(x,\bar\mu)\nabla \psi_k^i\cdot \vec n = 0 & \text{on } \partial\omega_i, \end{cases} \tag{3.3}$$

where $\bar\mu$ is a fixed reference parameter (e.g., the prior mean), to reduce the dimension of the snapshot space and construct the offline space. We choose the smallest $M_i$ eigenvalues of

$$A\,\psi_k^i = \lambda_k\, S\,\psi_k^i,$$

and take the corresponding eigenvectors in the space of snapshots by setting $\psi_k^i = \sum_l (\psi_k^i)_l\,\varphi_l$, for $k = 1, \dots, M_i$, to form the reduced snapshot space, where $(\psi_k^i)_l$ are the coordinates of the vector $\psi_k^i$ and

$$A = [a_{mn}] = \int_{\omega_i} k(x,\bar\mu)\,\nabla\varphi_n\cdot\nabla\varphi_m = R_{\mathrm{snap}}^T\,\bar A\, R_{\mathrm{snap}}, \qquad S = [s_{mn}] = \int_{\omega_i} k(x,\bar\mu)\,\varphi_n\varphi_m = R_{\mathrm{snap}}^T\,\bar S\, R_{\mathrm{snap}},$$

where $\bar A$ and $\bar S$ denote the fine-scale stiffness and mass matrices with the permeability $k(x,\bar\mu)$. For each coarse element $K$, let $\chi_i$ be the solution to the equation

$$\begin{cases} -\operatorname{div}\bigl(k(x,\bar\mu)\nabla \chi_i\bigr) = 0 & \text{in } K \subset \omega_i,\\[2pt] \chi_i = g_i & \text{on } \partial K, \end{cases}$$

where $g_i$ is a linear hat function. The relationship between a coarse neighborhood and its coarse elements is illustrated in Figure LABEL:coarse-cell. Thus the $\chi_i$ form a set of partition of unity functions associated with the open cover $\{\omega_i\}$ of $\Omega$.

Then we multiply the partition of unity functions by the eigenfunctions to construct GMsFE space,

$$V_H = \mathrm{span}\bigl\{\Psi_{il} : \Psi_{il} = \chi_i\,\psi_l^i,\; 1 \le i \le N_H \text{ and } 1 \le l \le M_i\bigr\}.$$

We use a single index for the multiscale basis function set and place them in the following matrix

$$R = \bigl[\Psi_1, \Psi_2, \cdots, \Psi_{M_v}\bigr],$$

where $M_v$ denotes the total number of multiscale basis functions. We note that once the matrix $R$ is constructed, it can be used repeatedly for simulation.

In this paper, we first discretize the fractional derivative as

$$\int_0^t \frac{\partial u(x,s)}{\partial s}\,\frac{ds}{(t-s)^{\gamma}} = \sum_{k=1}^{n}\int_{t_k}^{t_{k+1}} \frac{\partial u(x,s)}{\partial s}\,\frac{ds}{(t-s)^{\gamma}} = \sum_{k=1}^{n} \frac{u^{k+1}-u^{k}}{\Delta t}\int_{t_k}^{t_{k+1}} \frac{ds}{(t_{n+1}-s)^{\gamma}} + O(\Delta t) = \frac{\Delta t^{-\gamma}}{1-\gamma}\sum_{k=1}^{n}\bigl(u^{k+1}-u^{k}\bigr)\bigl[(n+1-k)^{1-\gamma}-(n-k)^{1-\gamma}\bigr] + O(\Delta t),$$

where $t_k = (k-1)\Delta t$ and $u^k := u(x, t_k)$. Thus, we have the approximation of the derivative at time $t_{n+1}$ as

 {}^{c}D^{\gamma}_{t}u(x,t_{n+1})=\frac{\Delta t^{-\gamma}}{\Gamma(2-\gamma)}\sum_{k=1}^{n}(u^{k+1}-u^{k})\big[(n+1-k)^{1-\gamma}-(n-k)^{1-\gamma}\big]+O(\Delta t).

For simplicity of presentation, we define

 b_{k}:=(n+1-k)^{1-\gamma}-(n-k)^{1-\gamma},\quad k=1,2,\cdots,n.
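For concreteness, the weights $b_k$ can be computed as below; the helper name `l1_weights` is ours.

```python
import numpy as np

def l1_weights(n, gamma):
    """L1 weights b_k = (n+1-k)^(1-gamma) - (n-k)^(1-gamma), k = 1, ..., n."""
    k = np.arange(1, n + 1)
    return (n + 1 - k) ** (1.0 - gamma) - (n - k) ** (1.0 - gamma)

b = l1_weights(5, gamma=0.5)
# b_n = 1, so the u^{n+1} term can be isolated from the sum,
# and the weights telescope: sum_k b_k = n^{1-gamma}.
```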

Let $u^{n}$ be the solution at the $n$th time level. Then we have the weak formulation for the diffusion equation (LABEL:frac-eq): find $u^{n+1}$ such that

 \frac{\Delta t^{-\gamma}}{\Gamma(2-\gamma)}\sum_{k=1}^{n}b_{k}\,\tilde{a}(u^{k+1}-u^{k},v)+a(u^{n+1},v)=(f^{n+1},v)\quad\text{for all test functions }v,

where $(\cdot,\cdot)$ denotes the usual $L^{2}$ inner product and

 a(u,v)=\int k(x)\nabla u\cdot\nabla v\,dx,\qquad \tilde{a}(u,v)=\int q(x)uv\,dx.

Let $c_{1}=b_{1}$ and $c_{i}=b_{i}-b_{i-1}$ for $i=2,\cdots,n$. Noting that $b_{n}=1$, the weak formulation can be rewritten as

 \tilde{a}(u^{n+1},v)+\Delta t^{\gamma}\Gamma(2-\gamma)\,a(u^{n+1},v)=\sum_{i=1}^{n}c_{i}\,\tilde{a}(u^{i},v)+\Delta t^{\gamma}\Gamma(2-\gamma)(f^{n+1},v).
We assume that the coarse-grid solution $U^{n}$ has the expansion

 U^{n}=\sum_{j=1}^{M_v}u^{n}_{Hj}\Psi_{j}(x),

where the subscript $H$ denotes the GMsFEM solution on the coarse grid. Let

 u^{n}_{H}=(u^{n}_{H1},u^{n}_{H2},\cdots,u^{n}_{HM_v})^{T}.

Then for $k=1,\cdots,M_v$,

 \sum_{j=1}^{M_v}u^{n+1}_{Hj}\tilde{a}(\Psi_{j},\Psi_{k})+\Delta t^{\gamma}\Gamma(2-\gamma)\sum_{j=1}^{M_v}u^{n+1}_{Hj}a(\Psi_{j},\Psi_{k})=\sum_{i=1}^{n}\sum_{j=1}^{M_v}c_{i}u^{i}_{Hj}\tilde{a}(\Psi_{j},\Psi_{k})+\Delta t^{\gamma}\Gamma(2-\gamma)(f^{n+1},\Psi_{k}).  (3.4)

Let $B$, $K$ and $F$ be the weighted mass matrix, stiffness matrix and load vector using the FEM basis functions on the fine grid, respectively. Then equation (LABEL:c-eq) gives the following algebraic system,

 R^{T}BRu^{n+1}_{H}+\Delta t^{\gamma}\Gamma(2-\gamma)R^{T}KRu^{n+1}_{H}=\sum_{i=1}^{n}c_{i}R^{T}BRu^{i}_{H}+\Delta t^{\gamma}\Gamma(2-\gamma)R^{T}F.

If we denote

 \tilde{B}=R^{T}BR,\qquad \tilde{K}=R^{T}KR,

then $u^{n+1}_{H}$ can be computed by the iteration

 u^{n+1}_{H}=\Big(\tilde{B}+\Delta t^{\gamma}\Gamma(2-\gamma)\tilde{K}\Big)^{-1}\Big(\sum_{i=1}^{n}c_{i}\tilde{B}u^{i}_{H}+\Delta t^{\gamma}\Gamma(2-\gamma)R^{T}F\Big).  (3.5)

By using the multiscale basis functions, the solution on the fine grid can be downscaled by the transformation $u^{n+1}_{h}=Ru^{n+1}_{H}$.

We note that when GMsFEM is not applied, the full model solution is obtained by the iteration

 u^{n+1}_{h}=\Big(B+\Delta t^{\gamma}\Gamma(2-\gamma)K\Big)^{-1}\Big(\sum_{i=1}^{n}c_{i}Bu^{i}_{h}+\Delta t^{\gamma}\Gamma(2-\gamma)F\Big).  (3.6)

Compared with (3.5), it can be seen that the sizes of $B$ and $K$ scale with the number of fine-grid degrees of freedom, whereas $\tilde{B}$ and $\tilde{K}$ are only $M_v\times M_v$, with $M_v$ much smaller. Thus a much smaller system is solved in GMsFEM. The matrix $R$ of multiscale basis functions is computed in advance and can be repeatedly used for all time levels. This significantly improves the efficiency of forward model simulations.
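Since the reduced system has the same form as the full one, both iterations can be sketched in a single routine. The function name `frac_step`, the random stand-in matrices in the usage below, and the explicit weight $w=\Delta t^{\gamma}\Gamma(2-\gamma)$ are our reading of the scheme, not code from the paper.

```python
import numpy as np
from math import gamma as Gamma

def frac_step(B, K, F, u_hist, c, dt, gam):
    """One implicit L1 time step:
    (B + w K) u^{n+1} = sum_i c_i B u^i + w F,  with w = dt^gam * Gamma(2 - gam).
    Pass the fine matrices (B, K, F) for the full model, or the projected ones
    (R^T B R, R^T K R, R^T F) for the reduced GMsFEM iteration."""
    w = dt ** gam * Gamma(2.0 - gam)
    # Weighted history term sum_i c_i u^i, then one linear solve per step
    history = sum(ci * ui for ci, ui in zip(c, u_hist))
    return np.linalg.solve(B + w * K, B @ history + w * F)
```

The history sum over all previous levels is the memory effect of the fractional derivative; it makes each step cost grow with the number of stored levels, which is another reason the reduced system pays off.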

### 3.3 Stochastic collocation via least-squares method

The stochastic collocation method is an efficient approach to approximating the solution of PDEs with random inputs. In this paper, the stochastic collocation method is based on generalized polynomial chaos (gPC) and the least-squares method: the solution of the forward model is represented by a gPC expansion whose coefficients are determined by least squares. With the established gPC approximation of the forward model, the evaluation of the likelihood function in MCMC sampling can be significantly accelerated.

We denote the random parameters by $Z=(Z_1,\cdots,Z_{n_z})$, and assume that each random variable $Z_i$ has a prior probability density function $\pi_i(z_i)$, for $i=1,\cdots,n_z$, where $\Gamma_i$ is the support of $\pi_i$. Then the joint prior density function of $Z$ is

 \pi(z)=\prod_{i=1}^{n_z}\pi_{i}(z_{i}),

and its support has the form

 \Gamma:=\prod_{i=1}^{n_z}\Gamma_{i}\subseteq\mathbb{R}^{n_z}.

For standard normal random parameters, the support of the prior is $\mathbb{R}^{n_z}$, and Hermite polynomials can be used. If the supports of the random parameters are bounded, e.g., $\Gamma_i=[a_i,b_i]$, we can use Legendre orthogonal polynomials as the basis functions to construct approximations of the forward model solution, since random variables with bounded support can always be mapped to $[-1,1]$.
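As a small illustration of the bounded case, the affine map to $[-1,1]$ and the corresponding normalised Legendre basis can be written as follows. The helper names are ours; the $\sqrt{2k+1}$ factor is what makes the basis orthonormal with respect to the uniform density on $[a,b]$.

```python
import numpy as np
from numpy.polynomial.legendre import legval

def to_reference(z, a, b):
    """Affine map from [a, b] onto the Legendre reference interval [-1, 1]."""
    return 2.0 * (z - a) / (b - a) - 1.0

def legendre_phi(k, z, a, b):
    """k-th Legendre polynomial on [a, b], normalised so that
    E[phi_j phi_k] = delta_jk under the Uniform(a, b) density."""
    coeffs = np.zeros(k + 1)
    coeffs[k] = 1.0  # select the k-th Legendre polynomial P_k
    return np.sqrt(2.0 * k + 1.0) * legval(to_reference(z, a, b), coeffs)
```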

Without loss of generality, we describe the gPC approximation to the forward model $G(Z)$. Let $\mathbf{i}=(i_1,\cdots,i_{n_z})$ be a multi-index with $|\mathbf{i}|=i_1+\cdots+i_{n_z}$, and let $N$ be a non-negative integer. The $N$th-degree gPC expansion of $G(Z)$ is defined as

 G_{N}(Z)=\sum_{i=1}^{P}c_{i}\Phi_{i}(Z),\qquad P=\frac{(N+n_z)!}{N!\,n_z!},  (3.7)

where $\{\Phi_i(Z)\}$ are the basis functions defined as

 \Phi_{\mathbf{i}}(Z)=\phi_{i_1}(Z_{1})\cdots\phi_{i_{n_z}}(Z_{n_z}),\quad 0\le|\mathbf{i}|\le N,

and $\Phi_{\mathbf{i}}$ satisfies

 \mathbb{E}[\Phi_{\mathbf{i}}(Z)\Phi_{\mathbf{j}}(Z)]=\int\Phi_{\mathbf{i}}(z)\Phi_{\mathbf{j}}(z)\pi(z)\,dz=\delta_{\mathbf{i}\mathbf{j}},\quad 0\le|\mathbf{i}|,|\mathbf{j}|\le N,  (3.8)

where $\delta_{\mathbf{i}\mathbf{j}}=\prod_{k=1}^{n_z}\delta_{i_k j_k}$, because $\phi_{i_k}$ is the $i_k$th-degree one-dimensional orthogonal polynomial, normalised in the $z_k$ direction.
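For standard normal parameters, the normalised one-dimensional polynomials are probabilists' Hermite polynomials divided by $\sqrt{k!}$, and the set of multi-indices with total degree at most $N$ has exactly $P=(N+n_z)!/(N!\,n_z!)$ members. The sketch below (function names ours) makes both facts concrete.

```python
import numpy as np
from itertools import product
from math import factorial
from numpy.polynomial.hermite_e import hermeval, hermegauss

def hermite_phi(k, z):
    """k-th probabilists' Hermite polynomial He_k / sqrt(k!), so that
    E[phi_j phi_k] = delta_jk under the standard normal density."""
    c = np.zeros(k + 1)
    c[k] = 1.0  # select He_k
    return hermeval(z, c) / np.sqrt(factorial(k))

def total_degree_indices(nz, N):
    """All multi-indices i with |i| <= N; there are (N+nz)!/(N! nz!) of them."""
    return [i for i in product(range(N + 1), repeat=nz) if sum(i) <= N]

def Phi(multi_index, z):
    """Tensor-product basis Phi_i(Z) = phi_{i1}(Z_1) ... phi_{inz}(Z_nz)."""
    return np.prod([hermite_phi(k, z[d]) for d, k in enumerate(multi_index)])
```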

In the stochastic collocation method, we first choose a set of collocation nodes $\{z^{(i)}\}_{i=1}^{Q}$, where $Q$ is the number of nodes. Then for each node $z^{(i)}$, we solve a deterministic problem to obtain

 G(z^{(i)})=g\circ u(x,t;z^{(i)}),

where $g$ is a state function. After the pairings $(z^{(i)},G(z^{(i)}))$ have been obtained, we are able to construct a good approximation $G_N$ of $G$, such that $G_N(z^{(i)})\approx G(z^{(i)})$ for all $i$. Thus, we need to solve $Q$ deterministic problems. In this paper we use the least-squares method to obtain the coefficients $c_i$ in (3.7).

Let $\{z^{(i)}\}_{i=1}^{Q}$ be the set of i.i.d. samples of $Z$ and $\{G(z^{(i)})\}_{i=1}^{Q}$ the corresponding realizations of the stochastic function $G$. Let

 c=(c_{1},\cdots,c_{P})^{T}\in\mathbb{R}^{P},\qquad b=\big(G(z^{(1)}),\cdots,G(z^{(Q)})\big)^{T}\in\mathbb{R}^{Q}.

If we impose the conditions $G_N(z^{(i)})=G(z^{(i)})$, $i=1,\cdots,Q$, then the following equation holds,

 Ac=b,  (3.9)

where $A$ is the $Q\times P$ matrix with the entries

 A_{ij}=\Phi_{j}(z^{(i)}),\quad i=1,\cdots,Q,\ j=1,\cdots,P.
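A one-dimensional sketch of the least-squares step (3.9): build $A_{ij}=\phi_j(z^{(i)})$ from normalised Hermite polynomials, sample the forward map, and solve for $c$ with `numpy.linalg.lstsq`. The forward map $G(z)=z^2$ and the helper name are illustrative; since $z^2=\phi_0+\sqrt{2}\,\phi_2$ exactly in this basis, the fit recovers these coefficients.

```python
import numpy as np
from math import factorial
from numpy.polynomial.hermite_e import hermeval

def fit_gpc_1d(G, N, Q, rng):
    """Least-squares gPC fit G_N(z) = sum_j c_j phi_j(z) from Q i.i.d.
    standard-normal samples, with phi_j = He_j / sqrt(j!)."""
    z = rng.standard_normal(Q)
    # A_ij = phi_j(z_i): the Q x (N+1) collocation matrix of (3.9)
    A = np.column_stack([
        hermeval(z, np.eye(N + 1)[j]) / np.sqrt(factorial(j)) for j in range(N + 1)
    ])
    b = G(z)
    # Solve Ac = b in the least-squares sense (Q > N + 1)
    c, *_ = np.linalg.lstsq(A, b, rcond=None)
    return c

c = fit_gpc_1d(lambda z: z ** 2, N=3, Q=200, rng=np.random.default_rng(42))
```

Evaluating the fitted surrogate is then a cheap polynomial evaluation, which is what accelerates the likelihood computations in the MCMC sampling.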