A two-stage ensemble Kalman filter based on multiscale model reduction for inverse problems in time fractional diffusion-wave equations

A two-stage ensemble Kalman filter based on multiscale model reduction for inverse problems in time fractional diffusion-wave equations

Yuming Ba College of Mathematics and Econometrics, Hunan University, Changsha 410082, China. Email:yumingb@hnu.edu.cn.    Lijian Jiang Institute of Mathematics, Hunan University, Changsha 410082, China. Email: ljjiang@hnu.edu.cn. Corresponding author    Na Ou College of Mathematics and Econometrics, Hunan University, Changsha 410082, China. Email: oyoungla@hnu.edu.cn.

ABSTRACT

Ensemble Kalman filter (EnKF) has been widely used in state estimation and parameter estimation for the dynamic system where observational data is obtained sequentially in time. Very burdened simulations for the forward problem are needed to update a large number of EnKF ensemble samples. This will slow down the analysis efficiency of the EnKF for large-scale and high dimensional models. To reduce uncertainty and accelerate posterior inference, a two-stage ensemble Kalman filter is presented to improve the sequential analysis of EnKF. It is known that the final posterior ensemble may be concentrated in a small portion of the entire support of the initial prior ensemble. It will be much more efficient if we first build a new prior by some partial observations, and construct a surrogate only over the significant region of the new prior. To this end, we construct a very coarse model using generalized multiscale finite element method (GMsFEM) and generate a new prior ensemble in the first stage. GMsFEM provides a set of hierarchical multiscale basis functions supported in coarse blocks. This gives flexibility and adaptivity to choosing degree of freedoms to construct a reduce model. In the second stage, we build an initial surrogate model based on the new prior by using GMsFEM and sparse generalized polynomial chaos (gPC)-based stochastic collocation methods. To improve the initial surrogate model, we dynamically update the surrogate model, which is adapted to the sequential availability of data and the updated analysis. The two-stage EnKF can achieve a better estimation than standard EnKF, and significantly improve the efficiency to update the ensemble analysis (posterior exploration). To enhance the applicability and flexibility in Bayesian inverse problems, we extend the two-stage EnKF to non-Gaussian models and hierarchical models. In the paper, we focus on the time fractional diffusion-wave models in porous media and investigate their Bayesian inverse problems using the proposed two-stage EnKF. A few numerical examples are carried out to demonstrate the performance of the two-stage EnKF method by taking account of parameter and structure inversion in permeability fields and source functions.

keywords: GMsFEM, time fractional diffusion-wave equation, two-stage EnKF, inverse problems

1 Introduction

The model inputs (parameters, source, domain geometry and system structure, et. al.) in many practical systems are often unknown. We need to identify or estimate these inputs by partial and noisy observations to construct predictive models and calibrate the models. This results in inverse problems. In the paper, we consider the inverse problems in anomalous diffusion models. The anomalous diffusion can be roughly classified into two categories: subdiffusion and superdiffusion . Here denotes the fraction derivative with respect to time. The anomalous diffusion equations are also called fractional diffusion-wave equations. Theoretical results such as existence, uniqueness, stability and numerical error estimates are presented in [19] for some type of anomalous diffusion equations. The relationship between anomalous diffusion equations and regular diffusion equations is discussed in [28]. The fractional order diffusion-wave equation as a typical fractional partial differential equation [35], is a generalization of the classical diffusion and wave equation and can be used to better characterize anomalous diffusion phenomena in various fields. The fractional diffusion-wave equations can model porous media applications, viscoelastic mechanics, power-law phenomenon in fluid and complex network, allometric scaling laws in biology and ecology, quantum evolution of complex systems and fractional kinetics [27].

In practice, the inputs and parameters in the anomalous diffusion models are often unknown and need to be identified based on some observation data and prior information. The problem of identifying unknown inputs in mathematical models has been intensively studied in the framework of inverse problems and various numerical methods have been developed [1, 20, 26]. The mathematical model of inverse problem is featured with quantities which renders useful simulation prediction obtained by imperfect model equations and measurements. The inverse problem is usually ill-posed. Many methods [17, 11] such as regularization or penalty can be used to overcome the ill-posedness. The unknown inputs (e.g., permeability field) of the anomalous diffusion models in porous media may have multiscale structure, complex geometry patterns and uncertainty. This significantly increases the challenge of the inverse problems for these models.

Practical models usually involve uncertainty. Moreover, the prior information for unknown parameters and observations are often characterized by random variables. Thus, it is desirable to treat the computational model and its inverse problem in statistical perspective. Once of statistical approach for inverse problems is Bayesian inference. The Bayesian approach [17, 33] incorporates uncertainties in noisy observations and prior information, and can derive the posterior probability density of the parameters, which enables us to quantify the uncertainty in the parameters. The popular sampling methods in Bayesian inversion are Markov chain Monte Carlo (MCMC) [10] method and its variants [2, 22], which require costly computation to achieve convergence and explore the whole state space in high dimension sample spaces. MCMC simulation has to run a long enough chain to give an accurate estimate, and entails repeated solutions of the forward model. This leads to great challenge for solving the Bayesian inverse problem.

The ensemble Kalman filter (EnKF) is another Bayesian method. It can be seen as a reduced-order Kalman filter (KF) or a Monte Carlo implementation of KF [8, 23]. Since its introduction by Evenson in [8], EnKF has been applied in many fields such as oceanography, numerical weather prediction, hydrology and petroleum reservoir history matching. EnKF can be used for both data assimilation, where the goal is to estimate model states by incorporating dynamical observation data, and inverse problem, where the objective is estimate unknown parameters appearing in models. Ensemble members of EnKF are forecasted in time by solving forward models and updated by an approximate Kalman filter scheme. EnKF has the significant advantage that its inherent recursive process is adapted to the sequential availability of observation data in dynamical systems. Unlike MCMC, the ensemble samples are updated independently each other in EnKF and it is not necessary to propose samplers in a very tricky manner. Thus, EnKF and other filter methods have attracted much more attention in community of Bayesian inversion [13, 6]. For nonlinear models, the linearization of model can be avoided in EnKF using ensemble covariance as an approximation of the posterior error covariance. The EnKF methods provide the first and second moments of random parameter, which are approximated by ensemble mean and ensemble covariance, respectively. Thus, the EnKF algorithm makes Gaussian approximation in a sequential manner. These approximation is accurate for Gaussian prior models. Some insight analysis of EnKF for inverse problems has been made in recent years [7, 31]. For the non-Gaussian models, a normal-score ensemble Kalman filter is proposed in [38], where the normal-score transformation is applied to transform unknown non-Gaussian parameters to Gaussian and make the parameters follow marginal Gaussian distributions.

As a Bayesian sampling method, EnKF needs to compute the forward problem repeatedly. When the forward model is computationally intensive, such as multiscale models, a direct application of EnKF forecast with full order model would be computationally prohibitive. In order to significantly improve the simulation efficiency, seeking more efficient sampling from the previous posterior and building surrogates of the forward models [3, 25] are necessary to accelerate the EnKF analysis (posterior exploration). Multiscale models can be solved efficiently and accurately by the numerical multiscale methods in a coarse grid instead of resolving all scales in very fine grid. As a numerical multiscale method, Generalized Multiscale Finite Element Method (GMsFEM) [4, 12] can provide a reduce model with a good trade-off between accuracy and computation efficiency. The main idea of GMsFEM is to use a variational formulation to capture the impact of small-scale features on the coarse-scale by using multiscale basis functions. The small-scale information is integrated into multiscale basis functions, which can be used repeatedly for different source terms and boundary conditions of the model [14]. GMsFEM has been developed to solve multiscale models with complex multiscale structures and its convergence is independent of the contrastness of the multiscales [4].

In the framework of EnKF, the output of model depends on random parameters. We use generalized polynomial chaos (gPC)-based stochastic collocation methods to propagate prior uncertainty through the forward model in a sequential manner. The gPC stochastic collocation methods require only a few number of uncoupled deterministic simulations with no reformulation of the governing equations of the forward model. We assume that the model’s output is a stochastic field and admits a gPC expansion. Then we select a set of collocation nodes and use least-squares methods to determine the coefficients of the gPC basis functions. To taking account of the potential sparsity of the gPC expansion, we use regularization to the least-squares problem. This allows using much fewer samples to construct a gPC surrogate model. The idea has been employed in Bayesian inverse problems [15, 16, 21, 37]. To accelerate computation for the the regularized least-squares problem, the lagged diffusivity fixed point method is used. The gPC surrogate is usually constructed based on a prior density [15, 21, 37] . However, the posterior is concentrated in a small portion of the entire prior support in many inference problems [16]. In EnKF, the prior is sequentially updated by incorporating new data information. Thus, it may be much more efficient to build surrogates only over the important region of the updated prior than the initial prior support.

In this paper, we propose a two-stage EnKF to take care of the challenges and concerns mentioned above. In the first stage, we construct a coarse GMsFEM model with very few multiscale basis functions, and build a new prior using standard EnKF based on the first partial measurement data information in time. The initial ensemble samples are drawn from the new prior for the second stage of EnKF. By integrating GMsFEM and sparse gPC stochastic collocation method based on the new prior, we build an initial surrogate model for the second stage. Because the ensemble samples are updated by the new analysis of EnKF, this also sequentially updates the prior based on the new ensemble samples. To improve the initial surrogate model, we dynamically update the surrogate model based on the updated prior in each assimilation step. We note that the surrogates are constructed efficiently in EnKF procedure by using GMsFEM and sparse gPC stochastic collocation method. By virtue of building new priors, we exclude the unimportant region of the posterior. We may use some other methods such as ensemble smoother (ES) [5] to build the new prior. In general, ES is used to estimate the parameters and states when simulation models are typically stable functions. To extend the two-stage EnKF to non-Gaussian models, we integrate the proposed EnKF with normal-score transformation to broaden the applicability. The two-stage EnKF is also explored in hierarchical Bayesian inverse problems. This increases flexibility in prior modeling for the Bayesian inference.

The structure of the paper is as follows. We begin with the general framework of EnKF for inverse problems. In Section 3, we focus on the time fractional diffusion-wave models and the surrogate model construction using GMsFEM and sparse gPC. Section 4 is devoted to presenting the two-stage EnKF based on the surrogate model. In Section 5, we present a few numerical examples to illustrate the performance of proposed EnKF with applications of inverse problems for time fractional diffusion-wave equations. Some conclusions and comments are made finally.

2 Ensemble Kalman filter for inverse problems

Let be a Hilbert space and a generic forward operator on for some physical system. We assume that the forward operator describes the relation of parameter , state and source term , i.e.,

(2.1)

where , , the dual space of . We assume that the solution of the problem has the form

Let be the observation operator mapping the model state to the observation space

Let be additive Gaussian noise for observation. Then the observation data can be expressed by

(2.2)

In the paper, we assume that is independent of . In practical setting, the observation data is in a finite dimensional space and can be expressed by

where is the dimension of observations.

2.1 Bayesian inference using EnKF

The EnKF was introduced by Evensen [8] as a powerful data assimilation method. Kalman filter is used for sequential update for states in linear dynamical systems and Gaussian distribution. It provides the mean and covariance information of the posterior distribution. When the prior is Gaussian, the filter gets the posterior Gaussian distribution from the joint Gaussian observation and the parameter. But for nonlinear dynamical system, EnKF has been widely used for data assimilation. In the paper, we use EnKF for Bayesian inverse problems. This is a particular application of EnKF in recent years [6, 32].

Given some observation data, we want to estimate the parameter . In Bayesian context, both and are random variables. Thus Bayes rule gives the posterior probability density for by

where is the prior distribution before the data is observed. The data enter the Bayesian inference through the likelihood function .

If the information of observations is incomplete, the covariance of the noise observation may be unknown. For this situation, we need to estimate the covariance of observation noise. Let be independent and identically distributed (i.i.d.) Gaussian random vector with zero mean and variance , i.e.,

where is the identity matrix. Thus the likelihood function obeys the Gaussian distribution. Let be the Euclidean norm and the weighted norm, where is the prior’s covariance matrix. If the prior is also Gaussian distribution, then

(2.3)

where is the mean of prior (background information). When is unknown, is a hyperparameter in the hierarchical Bayesian model. Then the corresponding posterior

The marginal posterior of is

Because the likelihood

(2.4)

belongs to the inverse-gamma family, the conjugate prior can be the inverse-gamma distribution

(2.5)

From and , we get

(2.6)

As in [9], we choose two numbers ( often small and between and ) and

where , such that and . Once the posterior distribution of is inferred, we can extract the posterior mean or the maximum a posteriori (MAP) of the unknown parameter . We note that the MAP estimate is equivalent to the solution of a regularized optimization problem. In fact, maximizing the right hand side of is equivalent to the minimization problem

(2.7)

where . The hyperparameter can be drawn from Inv-gamma distribution of (2.6). When the observation operator is linear, Kalman filter (KF) method can be derived from (2.7) by completing the squares on the variable and gives the following analysis

(2.8)

where and the Kalman gain is given by

Let be an artificial time for the data assimilation in a dynamic system. We denote unknown parameter, hyperparameter and observation data as , and at artificial time step , respectively. Then we define the artificial discrete dynamic system

In the framework of EnKF, an estimate for is updated in each data assimilation step. The sequential update needs forecast steps and analysis steps, which transport information of the current time to the next observation time in the forecast step. At the time step , we denote the forecast by , the analysis by , forecast error covariance matrix by and analysis error covariance matrix by . Then we have

As in (2.8), the posterior is the weighted sum of observations and forecast in the analysis step , i.e.,

where Kalman gain

Here is the Jacobian matrix of in Extended Kalman Filter (EKF) when is nonlinear.

It is well known that the KF and EKF are numerically scarcely affordable and the storage of a few state vectors is impossible in the high-dimensional systems. To overcome the difficulty, EnKF is desirable for nonlinear data assimilation problems in high-dimensional space. The advantage of EnKF is that we apply a useful approximation to the Kalman filter to avoid propagating the first and second order statistical moments. To this end, Monte Carlo method is used to propagate an ensemble of realizations from the prior distribution. In EnKF, we just update the propagating ensemble and the Kalman gain matrix is approximated by

where is the forecast ensemble and is the ensemble of simulated observations. Thus, the forecast error covariance matrix and analysis error covariance matrix are not necessary to compute. The true mean and covariance are approximated by ensemble mean and ensemble covariance, respectively. In the paper, we make use of the stochastic analysis ensemble generation method, where the simulated observations are perturbed by simulated observation error . The is independent of .

Let be a time series of observations and (). We assume that the prior distribution of is (Gaussian), and , where is unknown. We initially pick ensemble members for EnKF, through which we obtain the analysis ensemble . Furthermore, the mean and covariance of can be used to estimate the unknown parameter. The pseudo-code of EnKF algorithm is presented in Algorithm 1.

number of ensemble members , initial ensemble members drawn from prior , the number of data assimilation steps , observations .
.
.
. for
. Forecast/predictor: Generate ensemble of by

for
,  , , ,
.
end for
,  ,  .
. Analysis/corrector: Update the previous ensemble by
,
where and .
end for

Algorithm 1 Sequential EnKF algorithm with unknown
Remark 2.1.

We denote by . We replace by in (2.2). We can use ensemble smoother (ES) to do a single global update. Then the analysis in ES is

where .

2.2 EnKF for non-Gaussian model using normal-score transformation

In general, EnKF is a Gaussian approximation for the estimated parameter because it reproduces the mean and covariance. If the target distribution is Gaussian and unimodal, EnKF inherently gives an accurate estimation. However, if the the target distribution is non-Gaussian or multimodal, the approximation may not capture the properties of target distribution. In this situation, we can invoke the normal-score transformation, which maps non-Gaussian into Gaussian and is invertible [29]. We perform the normal-score transformation after each forecast step. Let be the normal-score operator at the assimilation step and satisfy

Because the support of cumulative distribution function (CDF) is , the transformation can be fulfilled by CDF. The normal-score transformation renders the Gaussian random variables one by one, and the multivariate properties of parameter vector are also changed but not necessary to be multi-Gaussian [38].

We want to incorporate the normal-score transformation into EnKF for non-Gaussian cases. Let and be the forecast and analysis after the normal-score transformation. Then the forecast step of EnKF implies that

The analysis step is followed by

where the Kalman gain matrix is approximated by

Here is the forecast ensemble after the transformation and is the ensemble of simulated observations. Let be a non-Gaussian distribution and . We describe the normal-score EnKF (NS-EnKF) in Algorithm 2.

Input: number of ensemble members , initial ensemble members drawn from prior , the number of data assimilation steps , observations .
Output: .
. ,
.for
. Forecast/predictor: Generate ensemble of by
, ,  .
for
,  , .
end for
,  .
. Analysis/corrector: Update the previous ensemble and by
,  ,
where and .
end for

Algorithm 2 NS-EnKF algorithm

3 Surrogate model construction using GMsFEM and sparse gPC

For the EnKF methods presented in Algorithm 1 and Algorithm 2, we need to repeatedly compute the forward model for all ensemble members. This computation is very expensive when the forward model is a complex PDE model and the number of ensemble members is large. In order to significantly accelerate the forward model computation, we construct a surrogate model for the forward model using model reduction methods.

The goal is to approximate a large-scale problem in a low dimensional space. To this end, the key idea is to choose a set of appropriate basis functions, which can span a good approximation space for the solution. If equation (2.1) is linear with respect to , we can derive an algebraic system for (2.1) as follows by applying suitable discretization method

(3.9)

where is the numerical solution vector and the source vector. The is the number of spatial degree of freedoms and is usually very large if we straightforwardly solve the equation in fine grid. We can use a model reduction method and reduce the number of basis functions to improve the efficiency. Then we can get a reduced algebraic system for (2.1),

Let () be the matrix comprised of the reduced basis functions. Then a projection reduce method implies

In order to accelerate evaluations of the posterior density for each updated parameter ensemble, we use stochastic response surface methods to construct surrogate. The solution of the reduced model can be expressed by stochastic basis functions such as polynomial chaos [36], radial basis functions [30], and wavelet basis functions [24]. The surrogate model is constructed through the stochastic collocation method by solving a penalized least-square problem. We use the lagged diffusivity fixed point method for the optimization problem and get a sparse representation for using fewer samples.

3.1 GMsFEM

In the paper, we consider the following time fractional PDE model

(3.10)

where and is the fractional order of the derivative with respect to time. Here we consider the Caputo fractional derivative defined by

(3.11)

where is the Gamma function and is a positive integer. The equation (3.10) has a close form subject to a suitable boundary condition and initial condition.

In the model equation, usually refers to a permeability field in porous media applications. The permeability field has heterogeneous and multiscale structure inherently and results in a multiscale model. We will use general multiscale finite element method (GMsFEM) to reduce the model and get a coarse GMsFEM model. This can achieve a good trade-off between efficiency and accuracy for simulating the forward model. We will apply GMsFEM presented in [4] to the time-fractional diffusion-wave equation (3.10). For GMsFEM, we need to pre-compute a set of multiscale basis functions. To this end, the first step is to construct a snapshot space for multiscale basis by solving local eigenvalue problem on each coarse block ,

(3.12)

where the samplers are drawn from the prior distribution of . By a finite element method discretization on underlying fine grid, the local eigenvalue problem can be formulated as an algebraic system,

where

and are the basis functions in fine grid. We take the first eigenfunctions corresponding to the dominant eigenvalues for each coarse neighborhood (see Figure 3.1), , where is the number of coarse nodes. Hence we construct the space of snapshots by

The snapshot functions can be stacked into a matrix as

where denotes the total number of snapshots used in the construction. The second step is to solve the following local problems in the snapshot space

(3.13)

where . We define

where and denote fine-scale matrices corresponding to the stiffness and mass matrices, respectively, with the permeability . We choose the smallest eigenvalues of the equation

and take the corresponding eigenvectors in the snapshot space by setting , for , to form the reduced snapshot space, where are the coordinates of the vector .

Let be a set of partition of unity functions associated with the open cover of .

Figure 3.1: Illustration of a coarse neighborhood and a coarse block

Then we multiply the partition of unity functions by the eigenfunctions to construct GMsFE space, i.e.,

We can use a single index for the multiscale basis function set and place them in the following matrix

where denotes the total number of multiscale basis functions. We note that once the matrix is constructed, it can be repeatedly used for simulation.

Next we present the temporal discretization for the equation (3.10). When , the equation (3.10) is the subdiffusion equation. We use the method in [20] to discretize the fractional derivative and have

where , , and

(3.14)

When , the is the superdiffusion equation. We can use the method in [18] to discretize the fractional derivative and have

where

Let be the solution at the th time level. Then we have the weak formulation for the subdiffusion equation ,

(3.15)

where and are defined in (3.14) and

The weak formulation for the superdiffusion equation reads

(3.16)

For subdiffusion case , we define

For superdiffusion case , we define

Then the weak formulation of can be rewritten as

We assume that has the expansion

where denote the GMsFEM basis functions. Let

Then for ,

Let , and be the weighted mass, stiffness matrices and load vector using FEM basis functions in fine grid, respectively. Then the equation gives the following algebraic system,

If we define

then can be computed by the iteration

By using the multiscale basis functions, the solution of in fine grid can be obtained by downscaling through the transformation .

In a similar way to solving equation , the GMsFEM solution of can be computed by the iteration

We note that when GMsFEM is not applied, the full order model solution in fine grid is obtained by the iteration

and

By comparing the GMsFEM model with the full order model, we see that the size of and are , but the size of and are (). Thus a much smaller system is solved in GMsFEM. The matrix for multiscale basis functions is computed overhead and it can be repeatedly used for all the time levels. This significantly improves the efficiency for forward model simulations.

Remark 3.1.

In the sequential data assimilation process, ensemble members update and so of (3.13) updates as well. We can update the GMsFEM basis matrix to improve the GMsFEM model.

3.2 Stochastic collation method using regularized least-squares

Stochastic collocation method is an efficient approach to approximate the solution of PDEs with random inputs. In this paper, we use stochastic collocation method (SCM) to obtain an expansion for observation operator and efficiently evaluate the simulated observation ensemble.

We use generalized polynomial chaos (gPC) functions to represent by penalized least-squares method. Let be a multi-index with and be a nonnegative integer. The th-degree gPC expansion of is then approximated by a linear combination of gPC basis , i.e.,

(3.17)

The coefficients of expansion are obtained by choosing some collocation points and least-squares method. We first take realizations of in the support of prior distribution . Then for each , we solve a deterministic problem at the node to obtain . After we obtain all pairs (), we are able to construct a approximation of such that for all . Thus, can produce a system of linear equations

(3.18)

where is the matrix with the entries

and the right term satisfing

If we solve the system in the ordinary least-squares method, the system should be overdetermined, i.e., should be much larger than . This means that we need solve the forward model for a large number of samples. To reduce the computation burden, we can take much fewer samples () and use regularized least-squares, i.e.,

which is equivalent to the optimization problem

(3.19)

We use the lagged diffusivity fixed point method [34] for the penalized least-squares problem. Due to the nondifferentiability of the norm, we take an approximation to the penalty such as , where is a scalar and is a small positive parameter. We denote the approximated penalty by

where

For any ,

where diag denotes the diagonal matrix whose th diagonal entry is , and denotes the Euclidean inner product on