
# Bayesian inversion of convolved hidden Markov models with applications in reservoir prediction

Torstein Fjeldstad torstein.fjeldstad@ntnu.no Department of Mathematical Sciences, Norwegian University of Science and Technology, Trondheim. Henning Omre Department of Mathematical Sciences, Norwegian University of Science and Technology, Trondheim.
###### Abstract

Efficient assessment of convolved hidden Markov models is discussed. The bottom-layer is defined as an unobservable categorical first-order Markov chain, while the middle-layer, conditional on the bottom-layer, is assumed to be a Gaussian spatial variable. Hence, this layer appears unconditionally as a Gaussian mixture spatial variable. We observe the top-layer as a convolution of the middle-layer with Gaussian errors. Focus is on assessment of the categorical and Gaussian mixture variables given the observations, and we operate in a Bayesian inversion framework. The model is defined to invert subsurface seismic AVO data into lithology/fluid classes and to assess the associated elastic material properties. Due to the spatial coupling in the likelihood functions, evaluation of the posterior normalizing constant is computationally demanding, and brute-force, single-site updating Markov chain Monte Carlo algorithms converge far too slowly to be useful. We construct two classes of approximate posterior models which we assess analytically and efficiently using the recursive Forward-Backward algorithm. These approximate posterior densities are used as proposal densities in an independent proposal Markov chain Monte Carlo algorithm to assess the correct posterior model. A set of realistic synthetic examples is presented. The proposed approximations provide efficient proposal densities which result in acceptance probabilities in the range 0.10-0.50 in the Markov chain Monte Carlo algorithm. A case study of lithology/fluid seismic inversion is presented. The lithology/fluid classes and the elastic material properties can be reliably predicted.

## 1 Introduction

Inverse problems arise naturally in several fields of engineering, such as image analysis and signal processing, and constitute a major challenge. The variables of interest are observed through an acquisition procedure together with an error term. The objective is to predict the variables of interest given the observations, which is an inverse problem. We cast the problem into a Bayesian inversion framework [28], defined by a likelihood and a prior model.

Focus is on the class of switching state-space models where the likelihood function for the observations varies according to an unobservable categorical random process, see [11] and references therein. Switching state-space models have numerous applications in, for example, econometrics [12], signal processing [4], speech recognition [22] and blind deconvolution [19].

We restrict ourselves to the subclass of convolved hidden Markov models inspired by [18], [30] and [25]. The bottom-layer is assumed to be an unobservable categorical Markov chain. Conditional on the bottom-layer we define a middle-layer as a Gaussian spatial variable, which then appears as a Gaussian mixture spatial variable unconditionally. The top-layer, representing the convolved observations, is assumed to be Gaussian conditional on the middle-layer. Focus is on assessment of the categorical and the Gaussian mixture variables given the convolved observations, which appears as an inverse problem.

Recursive algorithms have proven to be successful for hidden Markov models [27, 23]. Indeed, they are well-suited for models where each observation depends only on current and past values of the categorical process; convolved observations, by contrast, also depend on future values. Thus, for more complex likelihood functions sample-based inference by Monte Carlo sampling, such as particle filters, is often required [10].

Consider a discretized vertical profile of subsurface categorical classes penetrating a reservoir unit. The categorical variable in the bottom-layer represents, for example, geological lithology/fluid classes such as gas sandstone, brine sandstone or shale. In reservoir characterisation such inverse problems are of utmost importance to predict the presence of hydrocarbons [3, 9, 15]. The middle-layer, being a Gaussian mixture spatial variable, represents, for example, the elastic material properties of the reservoir. [7] proposed a linearized Bayesian inversion technique for assessment of continuous-valued subsurface properties; however, they did not account for the different effects of the lithology/fluid classes. [13] proposed a Gaussian mixture density prior model for seismic velocities, but their model did not include spatial dependence in the middle-layer. [8] proposed an approximate sampling technique based on approximate Bayesian computation to assess lithology/fluid classes and the continuous-valued properties.

We consider an extension of the convolved hidden Markov model defined and evaluated in [18] and [25]. We assume a Markov chain prior model for the bottom-layer. In our model the middle-layer is a Gaussian spatial variable conditional on the bottom-layer, defined by class-dependent expectations and variances, and a spatial correlation function. The spatial correlation is defined to enforce spatial continuity in the middle-layer, a property observed in subsurface seismic velocities. Unconditionally the middle-layer appears as a Gaussian mixture spatial model, extending the traditional Gaussian prior assumption for seismic velocities [7]. The top-layer contains convolved observations, which appear unconditionally as a Gaussian mixture random variable dependent on past, current and future values of the two lower layers. Assessment of the categorical variable given the observations, which constitutes a challenging categorical inverse problem, is discussed. Moreover, we discuss simulation and prediction of the middle-layer Gaussian mixture variable.

The convolution and spatial coupling in the likelihood model prevent straightforward use of recursive algorithms since the posterior model cannot be written on factorial form [23]. [18], [16] and [25] proposed various approximate posterior models on lower-order factorial form which are computationally feasible. For the extended model defined in our study, we present two classes of likelihood approximations and demonstrate that their posterior models can be written on factorial form. Hence, the approximate posterior models can be efficiently assessed by the recursive Forward-Backward algorithm [5]. The correct posterior model is assessed by an independent proposal Markov chain Monte Carlo (MCMC) Metropolis-Hastings (MH) algorithm, with the approximate posterior model as proposal density. In general, independent proposal densities result in a poor rate of convergence. [25] verified in a simulation study that it is possible to obtain satisfactory acceptance rates with their proposed approximate posterior models.

Denote by p(·) a generic probability distribution for both categorical and continuous random variables. The probability density function (pdf) for an n-dimensional Gaussian random variable x having mean μ and covariance matrix Σ is denoted by ϕ_n(x; μ, Σ). We refer to a likelihood model that is linear in the conditioning variable with additive Gaussian errors as a Gauss-linear likelihood model. Let I_n be the n × n identity matrix.

In Section 2 we define the convolved hidden Markov model. Section 3 contains the definition of the proposed likelihood approximations, and an MCMC algorithm to assess the correct posterior model is presented in Section 4. A simulation study is included in Section 5, where we consider various model parameters to empirically evaluate the overall performance of the two proposed likelihood approximations. The synthetic simulation models are chosen to be comparable to the ones in [25]. A seismic inversion case study based on real data is included in Section 6.

## 2 Model specification

Consider an unobservable categorical variable κ = (κ_1, …, κ_n) on a discretized top-down regular grid along a vertical profile, representing for example lithology/fluid classes of the subsurface. Label the variable κ_t for t = 1, …, n. A continuous-valued variable d is observed and represents, for example, seismic data. The main objective is to predict κ given d with the associated uncertainty, being a categorical inverse problem. We operate in a Bayesian framework, and the posterior density for the categorical variable given the observations is defined by Bayes’ theorem:

 p(κ|d)=const×p(d|κ)p(κ), (1)

where const is the normalizing constant, p(d|κ) is the likelihood function, and p(κ) is the prior model. The solution of the categorical inverse problem in Eq. (1) is computationally challenging, and analytical assessment is only feasible for very particular models. We introduce an additional continuous-valued variable m separating d and κ. That is, we assume d and κ to be conditionally independent given m. The variable m represents, for example, the logarithm of the seismic velocity. We define the gross likelihood model [18] as:

 p(d|κ) = ∫ p(d|m) p(m|κ) dm, (2)

where p(d|m) and p(m|κ) are respectively the acquisition and response likelihood functions. The variable m is marginalized out in Eq. (2), but its uncertainty propagates into p(d|κ). In general, integrating out m in Eq. (2) is a challenging problem caused by its dimension, and analytical expressions need not exist.

### 2.1 Prior model

We define a Markov chain prior model to represent vertical dependency in the categorical variable of interest. Let the categorical variable κ be defined by a k-th order stationary Markov chain,

 p(κ) = p_s(κ_1,…,κ_k) × ∏_{t=k+1}^{n} p(κ_t|κ_{t−1},…,κ_{t−k}) = p_s(κ^{(k)}_k) × ∏_{t=k+1}^{n} p(κ^{(k)}_t | κ^{(k)}_{t−1}), (3)

with κ^{(k)}_t = (κ_{t−k+1},…,κ_t) for t = k,…,n. The latter equality is justified by a trivial extension of the sample space since the original conditioning set is a subset of the extended conditioning set. The corresponding transition matrix P is time-independent and, due to the trivial extension, contains relatively few non-zero elements. Let p_s be the corresponding stationary distribution. Indeed, in this case the spatial Markov property holds, and the prior model is a k-th order Markov random field [6].
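As an illustration, the prior model in Eq. (3) can be simulated directly for a first-order chain (k = 1). A minimal sketch, assuming numeric class labels 0, 1, 2 and borrowing the transition matrix of the synthetic study in Section 5; the function names are our own:

```python
import numpy as np

# Three-class transition matrix, taken from the synthetic example (Eq. 36).
P = np.array([[0.80, 0.15, 0.05],
              [0.15, 0.75, 0.10],
              [0.05, 0.05, 0.90]])

def stationary_distribution(P):
    """Left eigenvector of P for the eigenvalue 1, normalized to sum to one."""
    vals, vecs = np.linalg.eig(P.T)
    v = np.real(vecs[:, np.argmax(np.real(vals))])
    return v / v.sum()

def simulate_chain(P, n, rng):
    """Draw kappa_1 from the stationary distribution, then propagate with P."""
    ps = stationary_distribution(P)
    kappa = np.empty(n, dtype=int)
    kappa[0] = rng.choice(len(ps), p=ps)
    for t in range(1, n):
        kappa[t] = rng.choice(P.shape[1], p=P[kappa[t - 1]])
    return kappa

rng = np.random.default_rng(0)
kappa = simulate_chain(P, 100, rng)
```

Starting the chain in its stationary distribution makes the simulated prior stationary, matching the model assumption.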

### 2.2 Likelihood model

We define the response model as follows:

 [m|κ]=μm|κ+ϵm|κ, (4)

where μ_{m|κ} is the vector with pointwise conditional means given κ, and ϵ_{m|κ} is a centered Gaussian process having covariance matrix Σ_{m|κ}. The response likelihood is therefore Gaussian:

 p(m|κ)=ϕn(m;μm|κ,Σm|κ). (5)

We decompose the covariance matrix  as follows:

 Σm|κ=Σσm|κΣρmΣσm|κ, (6)

where Σ^σ_{m|κ} is a diagonal matrix with the conditional standard deviations. The correlation matrix Σ^ρ_m is defined from a spatial correlation function ρ_m(τ), with |ρ_m(τ)| ≤ 1 for all lags τ. This entails that the response process m is constructed from a normalized Gaussian process which is scaled and shifted dependent on the current categorical variable κ_t for each t. This response process can be extended to have a set of correlated Gaussian processes, with separate expectations, variances and spatial correlation functions, one for each of the classes of the categorical variable. The inversion methodology defined in the following sections works also for this case, but the notation would be more complex. We present only the simpler parametrization in Eq. (5)-(6) to ease readability.
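The decomposition in Eq. (6) is easy to realize numerically: build the correlation matrix from the lag structure, scale it by the class-dependent standard deviations, and sample the conditional response. A sketch, where the class means, standard deviations and the exponential correlation range are illustrative assumptions:

```python
import numpy as np

def response_covariance(kappa, sigma_by_class, corr_fn):
    """Sigma_{m|kappa} = S_sigma R S_sigma (Eq. 6), where R is built
    from a spatial correlation function of the lag |t - s|."""
    n = len(kappa)
    lags = np.abs(np.subtract.outer(np.arange(n), np.arange(n)))
    R = corr_fn(lags)                      # correlation matrix Sigma^rho_m
    S = np.diag(sigma_by_class[kappa])     # diagonal matrix Sigma^sigma_{m|kappa}
    return S @ R @ S

# Illustrative parameters (not from the paper):
mu_by_class = np.array([-1.0, 0.0, 1.0])
sigma_by_class = np.array([0.5, 0.5, 0.5])
corr_fn = lambda h: np.exp(-h / 3.0)       # first-order exponential correlation

rng = np.random.default_rng(1)
kappa = rng.integers(0, 3, size=50)
Sigma = response_covariance(kappa, sigma_by_class, corr_fn)
m = rng.multivariate_normal(mu_by_class[kappa], Sigma)  # one draw of [m|kappa], Eq. (4)
```

The diagonal of the resulting covariance matrix equals the squared class-dependent standard deviations, as the decomposition requires.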

The marginal pdf of m:

 p(m)=∑κp(m|κ)p(κ), (7)

is an n-dimensional Gaussian mixture pdf whose number of unique modes may grow exponentially with n. Even for short profiles, evaluation of p(m) is unfeasible since it requires evaluation of all possible configurations of κ. Also, for t = 1,…,n, the unconditional pdfs

 p(mt)=∑κtϕ1(mt;μmt|κt,σ2mt|κt)p(κt), (8)

are univariate Gaussian mixture pdfs with at most one unique mode per class.

We consider a convolved data acquisition procedure represented by the convolution matrix W, representing time-independent convolutions. Hence, each datum d_t appears as a weighted sum of neighboring elements of the response variable m. Consider the following linear acquisition model:

 [d|m]=Wm+ϵd|m, (9)

where ϵ_{d|m} is a centered Gaussian process with covariance matrix Σ_{d|m}. Thus, the acquisition likelihood model:

 p(d|m)=ϕn(d;Wm,Σd|m) (10)

is Gaussian with mean vector Wm and covariance matrix Σ_{d|m}. Indeed, the proposed acquisition likelihood is straightforward to generalize to cases where d and m are of different dimension.
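The acquisition model of Eq. (9) amounts to building a banded matrix whose rows hold a shifted copy of the convolution kernel. A sketch, with a Gaussian-shaped (second-order exponential) kernel whose width and support are illustrative assumptions:

```python
import numpy as np

def convolution_matrix(kernel_fn, n, support):
    """Time-invariant convolution matrix W: row t holds the kernel
    centered at position t, truncated to the given half-width."""
    W = np.zeros((n, n))
    for t in range(n):
        lo, hi = max(0, t - support), min(n, t + support + 1)
        W[t, lo:hi] = kernel_fn(np.arange(lo, hi) - t)
    return W

# Second-order exponential kernel; width parameter is an assumption.
kernel = lambda h: np.exp(-(h / 2.0) ** 2)
n = 50
W = convolution_matrix(kernel, n, support=6)

rng = np.random.default_rng(2)
m = rng.standard_normal(n)                      # stand-in response signal
sigma_d = 0.1
d = W @ m + sigma_d * rng.standard_normal(n)    # Eq. (9) with white errors
```

Each datum d_t is thereby a weighted sum of the response values within the kernel support around t, plus an additive Gaussian error.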

Since both the response and acquisition likelihood models are assumed to be Gaussian, the challenging gross likelihood model in Eq. (2) is also Gaussian:

 p(d|κ) = ϕ_n(d; Wμ_{m|κ}, WΣ_{m|κ}W^⊺ + Σ_{d|m}). (11)

Given κ, each datum has an expectation which is a weighted sum of the conditional mean vector μ_{m|κ}. Note that the explicit dependence on κ in the covariance matrix requires the covariance matrix to be computed for each unique κ; thus evaluation of the likelihood is computationally expensive for a large set of unique κ.

### 2.3 Posterior model

The model defined in the previous section may be represented by a graph. If we assume that the spatial correlation function is of first-order exponential form, the convolution kernel defining W has finite support and the prior model is a first-order Markov chain, the graph takes a simple form (Fig. 1). Indeed, each observation is seen to depend on a large subset of κ.

The general posterior model is

 p(κ|d) = const × ∫ p(d|m) p(m|κ) dm × p(κ), (12)

where

 const = [ ∑_κ p(d|κ) p(κ) ]^{−1} (13)

is computationally challenging since it requires summing over all configurations of κ. The posterior distribution is the ultimate solution to the Bayesian inversion problem, but it is often represented by a set of independent realizations from p(κ|d). Characteristics such as the marginal probability profiles MPR for each class κ:

 pκMPR={p(κt=κ|d);t=1,…,n}, (14)

maximum posterior (MAP) predictor

 κMAP=argmaxκp(κ|d), (15)

or alternatively the marginal maximum posterior (MMAP) predictor

 κMMAP={argmaxκp(κt=κ|d);t=1,…,n} (16)

are often used to describe the distribution. Since the MMAP predictor is only a sequence of pointwise maxima, it need not equal the computationally unfeasible global MAP predictor. Indeed, transitions having zero probability in the prior model may occur in the MMAP predictor since the latter is only a pointwise property.

It can be shown that the posterior model p(m|d) is also a Gaussian mixture density

 p(m|d)=∑κϕn(m;μm|d,κ,Σm|d,κ)p(κ|d), (17)

where the model parameters μ_{m|d,κ} and Σ_{m|d,κ} are as given in [14]. That is, p(m|κ) is a conjugate prior model with respect to the Gaussian likelihood function p(d|m). We define the following MMAP predictor:

 ^mt=argmaxm{∑κϕ1(m;μm|d,κ,σ2m|d,κ)p(κ|d)} (18)

for t = 1,…,n, being univariate optimizations. Here, μ_{m_t|d,κ} is the t-th element in μ_{m|d,κ} and σ²_{m_t|d,κ} is the corresponding t-th diagonal element in Σ_{m|d,κ}. Posterior prediction intervals are obtained similarly.
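The univariate optimization in Eq. (18) can be sketched as a grid search over the mixture pdf at one location t, assuming the posterior weights and conditional moments have already been computed; the mixture parameters below are illustrative:

```python
import numpy as np

def gm_mmap_pointwise(weights, means, sds, grid):
    """Maximize a univariate Gaussian mixture pdf over a grid (Eq. 18).
    weights, means, sds: mixture components at one location t."""
    z = (grid[:, None] - means[None, :]) / sds[None, :]
    pdf = (weights[None, :] * np.exp(-0.5 * z ** 2)
           / (sds[None, :] * np.sqrt(2.0 * np.pi))).sum(axis=1)
    return grid[np.argmax(pdf)]

# Illustrative bimodal posterior mixture at a single location:
grid = np.linspace(-3.0, 3.0, 2001)
m_hat = gm_mmap_pointwise(np.array([0.7, 0.3]),
                          np.array([-1.0, 1.0]),
                          np.array([0.3, 0.3]), grid)
```

A grid search is a simple and robust choice here because the mixture pdf may be multimodal, so gradient-based optimization could stop at a minor mode.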

In a generic convolved hidden Markov model there are severe couplings, mostly due to the convolution operator W and spatial dependence. Assessment of the posterior model by brute-force single-site proposal MCMC algorithms is not feasible. We choose to approximate the likelihood model such that we obtain an approximate posterior model which is analytically tractable. This approximate posterior model is then used as a proposal distribution in an independent proposal MCMC MH algorithm to assess the correct posterior model. For a simpler model, [25] obtained satisfactory acceptance rates in a simulation study.

## 3 Likelihood approximations

We present two likelihood approximations of the gross likelihood, inspired by [25], to obtain approximate posterior models on low-order factorial form. Such models are efficiently assessed by recursive algorithms. We consider only approximations of the likelihood function, since the spatial correlation in the response likelihood and the convolution in the acquisition likelihood contribute the major spatial couplings in the model. Recall that a likelihood function, contrary to a probability density, need not be normalized; hence the former is scale invariant.

Define m^{(k)}_t = (m_{t−k+1},…,m_t) for t = k,…,n, similarly to κ^{(k)}_t. From the definition of the response likelihood function it follows that:

 p(m(k)t∣∣κ)=p(m(k)t∣∣κ(k)t). (19)

The proposed approximations are based on, respectively, a naïve truncation of the likelihood function and a Gaussian approximation of the Gaussian mixture pdf p(m).

The proposed likelihood approximations should be such that the discrepancy between the approximate and the correct likelihood is small with respect to some measure, and decreases for increasing k. In practice, we have to accept a low-order approximation since we have to re-run the approximation for different model parameter values.

### 3.1 Truncation approximation

First we consider a naïve approximation, inspired by Approximation 1 in [25], by truncating the marginal densities. For simplicity we assume the error covariance Σ_{d|m} to be diagonal, since coloured errors in d appear as results of the convolution. Thus, the acquisition likelihood can be written on factorial form:

 p(d|m)=n∏t=1p(dt|m). (20)

Define the k-band truncated matrix by setting equal to zero every element in W more than a given distance from the diagonal, where the distance is chosen such that the truncated kernel covers the dominant convolution weights. Thus, it follows that the k-th order marginal acquisition likelihood for t = 1,…,n is given as:

 p(k)(dt|m)=ϕ1(dt;wktm,σ2d|m), (21)

where w^k_t is the t-th row in the k-band truncated matrix. Combining Eq. (19) and Eq. (21) we obtain

 p^{(k)}(d_t|κ^{(k)}_t) = ∫ p^{(k)}(d_t|m^{(k)}_t) p(m^{(k)}_t|κ^{(k)}_t) dm^{(k)}_t, (22)

which are Gaussian pdfs for t = 1,…,n. We define the k-th order truncation likelihood approximation as:

 (23)

where the first and last factors are boundary correction terms. Note that each factor is only dependent on a small subset of the observations. Indeed, if W and Σ_{d|m} are diagonal matrices the truncation approximation of order one is exact, and the model corresponds to a standard hidden Markov model. Note that we do not require the error variances in Σ_{d|m} to be identical for all t. The definition above is based on W representing a time-invariant convolution, but it can be extended to an arbitrary linear operator W by k-truncating each row such that it covers as much weight as possible, and by extracting the corresponding subset of m.
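The band truncation itself is a one-line masking operation. A sketch, where the half-bandwidth `q` stands in for the truncation distance of the k-band matrix (an assumption, since the exact band width depends on the kernel):

```python
import numpy as np

def band_truncate(W, q):
    """k-band truncation: zero every element of W more than q
    positions away from the diagonal (q is the assumed half-bandwidth)."""
    n = W.shape[0]
    dist = np.abs(np.subtract.outer(np.arange(n), np.arange(n)))
    return np.where(dist <= q, W, 0.0)

# Toy convolution matrix with exponentially decaying off-diagonal weights.
W = np.exp(-0.5 * np.abs(np.subtract.outer(np.arange(8), np.arange(8))))
Wq = band_truncate(W, 2)
```

Elements inside the band, including the diagonal, are left untouched; everything outside is set to zero, so each row of the truncated matrix touches only a small window of m.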

### 3.2 Projection approximation

The second proposed likelihood approximation, which we denote the projection approximation, is based on a Gaussian approximation of the Gaussian mixture pdf p(m). Let ϕ_n(m; μ_m, Σ_m) be the Gaussian approximation with mean

 [μ_m]_t = ∑_{κ_t} μ_{m_t|κ_t} p_s(κ_t) = μ_m; t = 1,…,n, (24)

and covariance

 [Σ_m]_{t,s} = ∑_{κ_t} ∑_{κ_s} p(κ_t, κ_s) { σ_{m_t|κ_t} σ_{m_s|κ_s} ρ_m(t−s) + (μ_{m_t|κ_t} − μ_m)(μ_{m_s|κ_s} − μ_m) } (25)

for t, s = 1,…,n. These expressions are obtained analytically using the laws of total expectation and covariance.
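The moments in Eqs. (24)-(25) can be computed directly from the chain, using that the two-point marginal of a stationary first-order chain is p(κ_t, κ_s) = p_s(κ_t) P^{|t−s|}. A sketch for k = 1 with illustrative two-class parameters:

```python
import numpy as np

def projection_moments(P, ps, mu, sigma, rho, n):
    """Mean and covariance of the Gaussian approximation to p(m)
    (Eqs. 24-25), via the laws of total expectation and covariance."""
    mu_m = ps @ mu                      # Eq. (24); identical for every t
    Sigma = np.empty((n, n))
    for t in range(n):
        for s in range(n):
            lag = abs(t - s)
            # two-point marginal p(kappa_t, kappa_s) for a stationary chain
            joint = ps[:, None] * np.linalg.matrix_power(P, lag)
            term = (np.outer(sigma, sigma) * rho(lag)
                    + np.outer(mu - mu_m, mu - mu_m))
            Sigma[t, s] = (joint * term).sum()
    return mu_m, Sigma

# Illustrative two-class chain (ps is stationary for this P):
P = np.array([[0.8, 0.2], [0.3, 0.7]])
ps = np.array([0.6, 0.4])
mu = np.array([-1.0, 1.0])
sigma = np.array([0.5, 0.5])
rho = lambda h: np.exp(-h / 3.0)
mu_m, Sigma_m = projection_moments(P, ps, mu, sigma, rho, 10)
```

At lag zero the formula reduces to the law of total variance, so the diagonal of Σ_m equals the within-class variance plus the between-class spread of the means.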

The joint approximate distribution is a Gaussian pdf, thus also the marginal distributions p^{(k)}(m^{(k)}_t) for t = k,…,n are Gaussian pdfs. By conditioning, we define the k-th order approximate acquisition likelihood model p^{(k)}(d|m^{(k)}_t) for t = k,…,n, which are Gaussian pdfs. Combining the results above with Eq. (19) it can be verified that

 p(k)(d|κ(k)t)=∫p(k)(d,m(k)t∣∣κ(k)t)\difm(k)t (26)

are Gaussian pdfs for t = k,…,n. We note that the marginalization requires evaluation of an n-dimensional Gaussian pdf, which is computationally expensive for large n. Indeed, by Bayes’ theorem, we have that

 p(k)(d|m(k)t)=p(k)(m(k)t∣∣d)p(d)p(k)(m(k)t). (27)

Thus, we rephrase Eq. (26) as:

 p^{(k)}(d|κ^{(k)}_t) ∝ ∫ [ p^{(k)}(m^{(k)}_t|d) / p^{(k)}(m^{(k)}_t) ] p(m^{(k)}_t|κ^{(k)}_t) dm^{(k)}_t, (28)

where the densities to be evaluated are of dimension k.

We define the -th order projection approximation as follows:

 (29)

where the k-th root ensures that each datum is used exactly once, and the first and last factors are boundary corrections.

Note that the projection approximation is straightforward to generalize to an arbitrary linear operator W. Contrary to the truncation approximation, the full set of observations d is used for each t in the projection approximation.

## 4 Assessment of posterior model

Assessment of the correct posterior model by brute-force single-site simulation is unfeasible due to spatial and convolutional coupling in the observations, and possible prior ordering constraints on the categorical variable. We define an independent proposal MCMC MH algorithm based on an approximate posterior model p^{(k)}(κ|d), where the latter is exactly assessed by a recursive algorithm.

Both the truncation and projection based approximate likelihood functions can be phrased on factorial form

 p(k)(d|κ)=p(k)a(d|κ(k)k)×n−1∏t=k+1p(k)a(d|κ(k)t)×p(k)a(d|κ(k)n), (30)

where the subscript a denotes the chosen approximation type. Thus, their approximate posterior densities can be phrased as follows:

 p^{(k)}(κ|d) = const × p^{(k)}(d|κ) × p(κ), (31)

which is a k-th order non-homogeneous Markov chain. Such factorisable posterior models are exactly and efficiently assessed by the recursive Forward-Backward algorithm [23]. For a given approximation order k, the marginal probability (MPR) profiles for the approximate posterior model are denoted aMPR. Correspondingly we denote the maximum posterior (MAP) and marginal maximum posterior (MMAP) predictors, respectively, aMAP and aMMAP. These characteristics are exactly calculated for the approximate posterior model. Assessment of aMAP requires the use of the recursive Viterbi algorithm [31]. We refer to [20] for a discussion of model parameter estimation in a convolved hidden Markov model.
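For the first-order case (k = 1), the Forward-Backward recursions take a particularly simple form. A minimal sketch with per-site scaled recursions; the toy transition matrix and the flat likelihood are illustrative, not from the paper:

```python
import numpy as np

def forward_backward(log_lik, P, ps):
    """Scaled Forward-Backward recursions for a posterior on first-order
    factorial form: log_lik[t, j] is the approximate log-likelihood factor
    at site t for class j, P the prior transition matrix and ps the
    stationary (initial) distribution. Returns the marginal posteriors."""
    n, K = log_lik.shape
    lik = np.exp(log_lik - log_lik.max(axis=1, keepdims=True))
    fwd = np.empty((n, K))
    bwd = np.ones((n, K))
    fwd[0] = ps * lik[0]
    fwd[0] /= fwd[0].sum()
    for t in range(1, n):                      # forward pass
        fwd[t] = (fwd[t - 1] @ P) * lik[t]
        fwd[t] /= fwd[t].sum()
    for t in range(n - 2, -1, -1):             # backward pass
        bwd[t] = P @ (lik[t + 1] * bwd[t + 1])
        bwd[t] /= bwd[t].sum()
    post = fwd * bwd
    return post / post.sum(axis=1, keepdims=True)

# Two-class toy chain; with a flat likelihood the marginal posteriors
# must reproduce the stationary prior at every site.
P = np.array([[0.9, 0.1],
              [0.2, 0.8]])
ps = np.array([2 / 3, 1 / 3])
post = forward_backward(np.zeros((5, 2)), P, ps)
```

Per-site renormalization keeps the recursions numerically stable for long profiles without changing the resulting marginal posteriors.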

The correct posterior model p(κ|d) is assessed by an independent proposal MCMC MH algorithm [26] using p^{(k)}(κ|d) as proposal distribution. At each iteration, the acceptance probability is given as

 α = min{1, [p(~κ|d)/p(κ|d)] × [p^{(k)}(κ|d)/p^{(k)}(~κ|d)]}, (32)

where the troublesome normalizing constant in Eq. (1) cancels. After burn-in we generate realizations from the correct posterior model p(κ|d). To empirically quantify the quality of the proposed approximations we consider the similarity measure defined in [25]:

 α(k)=Eκ,~κ{min{1,p(~κ|d)p(κ|d)×p(k)(κ|d)p(k)(~κ|d)}}, (33)

being the average acceptance rate in the MCMC algorithm. We denote the acceptance rates based on the k-th order truncation and projection approximations, respectively. Acceptance rates close to unity entail that the approximate posterior is close to the correct posterior, which is as desired. We define corresponding performance measures to compare the effect of the approximation order as a function of computational demands.
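The independent proposal MH step can be sketched generically: propose from the approximation, and accept with the ratio of target and proposal densities. The function and variable names below are our own; the flat target/proposal pair is only a sanity check, since a proposal identical to the target must accept every move:

```python
import numpy as np

def mh_independent(log_post, log_prop, sample_prop, n_iter, rng):
    """Independent-proposal Metropolis-Hastings: log_post evaluates the
    correct posterior up to a constant, while log_prop / sample_prop
    evaluate and draw from the approximate posterior."""
    state = sample_prop(rng)
    chain, accepts = [state], 0
    for _ in range(n_iter):
        prop = sample_prop(rng)
        log_a = (log_post(prop) - log_post(state)
                 + log_prop(state) - log_prop(prop))
        if np.log(rng.uniform()) < log_a:   # accept with prob min{1, exp(log_a)}
            state, accepts = prop, accepts + 1
        chain.append(state)
    return chain, accepts / n_iter

# Sanity check: proposal equal to the target gives acceptance rate one.
rng = np.random.default_rng(3)
flat = lambda x: 0.0
draw = lambda rng: int(rng.integers(0, 3))
chain, rate = mh_independent(flat, flat, draw, 500, rng)
```

The closer the approximate posterior is to the correct one, the closer the log-ratio stays to zero and the higher the observed acceptance rate, which is exactly what the similarity measure in Eq. (33) quantifies.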

We estimate the marginal probability profiles MPR for the correct posterior model for each class κ by:

 ^p^κ_MPR = { ^p^κ_t = (1/B) ∑_{b=1}^{B} I(κ^b_t = κ); t = 1,…,n }, (34)

and similarly the marginal MAP predictor by

 ^κ_MMAP = { ^κ_t = argmax_κ ^p^κ_t; t = 1,…,n }. (35)
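The estimators in Eqs. (34)-(35) are simple empirical frequencies over the B posterior realizations. A sketch with a tiny hand-made sample array (the data values are illustrative):

```python
import numpy as np

def estimate_mpr(samples, K):
    """Empirical marginal probability profiles (Eq. 34) and the marginal
    MAP predictor (Eq. 35) from a (B, n) array of posterior realizations."""
    mpr = np.stack([(samples == k).mean(axis=0) for k in range(K)], axis=1)
    return mpr, mpr.argmax(axis=1)

# Four realizations (rows) of a profile of length three:
samples = np.array([[0, 1, 2],
                    [0, 1, 1],
                    [0, 2, 2],
                    [1, 1, 2]])
mpr, kmap = estimate_mpr(samples, 3)
```

Each row of `mpr` sums to one, and `kmap` picks the pointwise most probable class, which is the marginal MAP predictor.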

Evaluating the correct posterior density is computationally expensive since each iteration requires evaluation of an n-dimensional Gaussian pdf. In practice, this limits the number of realizations that can be generated in reasonable time. However, we note that the cost of the MCMC-step is essentially independent of the approximation order k; that is, for a fixed computational budget it can be beneficial to generate a few realizations from a high-order approximation rather than many realizations from a low-order approximation.

Zero-transitions in the prior matrix P enforce zero-transitions in the posterior; thus the k-th order approximation can be obtained at a cost smaller than the theoretical one for a full matrix P.

## 5 Synthetic example

We empirically evaluate the proposed likelihood approximations for various orders k in a synthetic example. The synthetic example is defined from a Base case and six deviating cases, having different model parameters. We avoid the limiting cases where the covariance matrix in the gross likelihood tends toward a diagonal matrix, which corresponds to the well-known standard hidden Markov model. That is, we consider only deviating cases where the number of non-zero entries in the covariance matrix is at least as high as for the Base case. For the standard hidden Markov model the truncation approximation is exact for all k, and the projection approximation is empirically verified to have an acceptance rate close to unity for all k. We claim that the chosen deviating cases are the most challenging ones, since they appear with strong spatial coupling in the likelihood function.

The reference profile of interest is assumed to have three distinct classes, namely black, red and brown, see Fig. 2. The reference profile is identical for all cases to be discussed, and it is generated as a realization from a first-order stationary Markov chain with transition matrix

 P = ( 0.80 0.15 0.05
       0.15 0.75 0.10
       0.05 0.05 0.90 ), (36)

having a corresponding stationary distribution.

The Base case has response likelihood model parameters:

 (μ_{m_t|κ_t}, σ_{m_t|κ_t}) = (−1, 0.5) if black; (0, 0.5) if red; (1, 0.5) if brown, (37)

and an exponential spatial correlation function. The acquisition likelihood model is specified through a second-order exponential convolution kernel.

The deviating cases, displayed in Fig. 2, are defined as follows, relative to the Base case:

1. High smoothness: a spatial correlation function with the same range, but a higher degree of smoothness.

2. Long correlation: a spatial correlation function with an identical degree of smoothness but a higher range parameter.

3. Overlapping classes: the black and red classes have identical pointwise means, but different marginal variances:

 (μ_{m_t|κ_t}, σ_{m_t|κ_t}) = (0, 0.1) if black; (0, 0.5) if red; (1, 0.5) if brown (38)
4. Wide convolution: a second-order exponential convolution kernel with a larger width.

5. High noise: an acquisition likelihood with a large error component.

6. Ricker convolution: a convolution kernel with negative lobes.

Note that cases one through three are defined by different response models, resulting in different m signals; however, they have an identical acquisition model. Cases four through six have an identical response profile, but different acquisition models. The various sets of observations are generated by simulation from the respective gross likelihood models given the reference profile κ.

Fig. 2 contains the reference classification κ, response signals m and observations d displayed for the various cases. The objective is to predict κ given d for each case. The reference profile κ, and m and d in pairs, are displayed in the top row for the Base, High smoothness, Long correlation and Overlapping classes cases. Relative to the Base case, m appears to be smoother in the High smoothness and Long correlation cases; however, the observations appear to be almost identical. For the Overlapping classes case we observe that d has less variability than in the Base case.

The bottom row displays, in a similar format, the Base case, Wide convolution, High noise and Ricker convolution cases. We do not display m in the three latter cases since the response models are assumed to be identical to the Base case, to ease interpretation. The respective observations are very different, however.

We assess by simulation the associated signal-to-noise-ratios:

 snr = tr(WΣ_{m|κ}W^⊺) / tr(Σ_{d|m}). (39)

The signal-to-noise-ratio for the Base case is 2.53, and the signal-to-noise-ratios for the deviating cases are presented in Table 1. The Base case, High smoothness, Wide convolution and Ricker convolution cases have almost identical signal-to-noise-ratios, while the other cases appear with lower signal-to-noise-ratios. Particularly the High noise case has a large noise component, as it should.

For each case we apply both the truncation and the projection approximations to assess the corresponding approximate posterior models by the efficient recursive Forward-Backward algorithm. We use the approximate posterior models as proposal distribution in an independent proposal MCMC MH algorithm to assess the correct posterior model for each case, and 160,000 realizations are generated from the correct posterior model. We discard the initial 10,000 realizations as the burn-in phase. A higher order delayed rejection step [29] is included in the algorithm. Fig. 3 contains a sequence of 1,000 consecutive posterior realizations for the Base case based on a ninth order projection approximation. We observe relatively good mixing, with the three classes represented in the realizations. This characteristic is shared also by the other cases, but the results are not presented here.

In Fig. 4 results for the Base case are presented. The top line of the display is based on the truncation approximation, while the bottom line is for the projection approximation. Both the aMPR profiles and the aMAP predictors are displayed for varying orders of approximation. These characteristics are exactly calculated by recursive algorithms, although the computational demands increase rapidly with increasing order k. To the right, estimated MPR profiles and the MMAP predictor for the correct posterior model are displayed together with the reference profile κ. These characteristics are only available by MCMC-based inference. Note that the reference profile is more heterogeneous than the MMAP predictor due to the regression towards the local majority class.

The aMPR profiles for the truncation approximation tend towards the MPR profiles as the order increases, which is desirable. It can be shown that in the limit they are identical. It is problematic, however, that for low orders the approximations are not satisfactory, since the computational demands increase rapidly with k. For the projection approximation the results appear to be better: good approximations for low orders and improvements towards the correct model results as k increases.

Lastly, note that the MCMC estimates of the correct model characteristics are, as expected, almost identical regardless of whether the proposal is based on the truncation or the projection approximation. The MCMC acceptance rates will of course depend on the proposal distribution.

In Fig. 5 the similarity measures for the approximate versus correct posterior models for all cases are presented in the top row for increasing k. The corresponding performance measures, taking also computational demands into account, are displayed in the bottom row.

The similarity measures appear to increase monotonically with increasing order k in the various cases. This indicates that the two sequences of approximations, parametrized by order k, provide monotonically improved approximations of the correct posterior model with increasing k, at the cost of increased computational demands. The projection approximation is almost uniformly better than the truncation approximation. For all cases, the similarity measure for the projection approximation reaches a high level for a ninth-order approximation, which entails satisfactory average acceptance rates in the independent proposal MCMC MH algorithm. These acceptance rates are very satisfactory, but the computational demands for the ninth-order projection approximation are large. In the bottom row, we observe that the performance measure indicates that an order of around three is optimal.

Fig. 6 contains exact aMPR profiles and exact aMAP predictors for the deviating cases for the truncation and projection approximations. Also, the estimated MPR profiles and MMAP predictors for the correct posterior models are displayed. The aMPR profiles and aMAP predictors based on the projection approximation are closer to the MPR profiles and MMAP predictors for the correct posterior model than the ones based on the truncation approximation. This is particularly so for the Long correlation, Overlapping classes, Wide convolution and Ricker convolution cases, that is, for cases with strong spatial coupling between states. The aMPR profiles and aMAP predictors, which can be exactly assessed by recursive algorithms, are so close to the corresponding characteristics for the correct posterior model that one may question the necessity of an MCMC step. Recall that calculating the posterior density in each MCMC iteration is computationally demanding.

We compute the MMAP predictor for m from Eq. (18). Posterior realizations, the MMAP predictor, the 80 % prediction interval and the fitted density are displayed in Fig. 7. We observe that the MMAP predictor captures the class transitions in κ. The fitted density appears with similar characteristics, such as bimodality and skewness, as the reference. Compared to the true response profile m we obtain a root mean squared error (rmse) value of 0.54 in the Base case. In Fig. 8 we display the empirical coverage ratios for various prediction intervals, and we observe the coverage ratios to be satisfactory.

In Fig. 9 we present posterior results for the deviating cases. The MMAP predictors are seen to capture the main characteristics of the true response profile in most of the cases. Also, the 80 % prediction intervals appear to be satisfactory. Coverage ratios for the 80 % prediction intervals and rmse values are given in Tab. 2 for the deviating cases. As expected, the rmse values are higher in the Wide convolution and High noise cases. Somewhat surprisingly, the rmse values are almost identical in the Base, High smoothness, Long correlation and Ricker convolution cases. In general, the 80 % coverage ratios are satisfactory.
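The empirical coverage ratios and rmse values reported here can be computed from posterior realizations as in the following sketch. The data are synthetic stand-ins, not the paper's realizations.

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical posterior realizations at n_sites sites (one row per
# realization) and a hypothetical true response profile.
n_sites, n_real = 100, 1000
samples = rng.normal(size=(n_real, n_sites))
truth = rng.normal(size=n_sites)

# 80 % pointwise prediction interval from the 10 % and 90 % sample
# quantiles at each site.
lo = np.quantile(samples, 0.10, axis=0)
hi = np.quantile(samples, 0.90, axis=0)

# Empirical coverage ratio: fraction of sites where the truth falls
# inside its pointwise interval (should be close to 0.80 here).
coverage = np.mean((truth >= lo) & (truth <= hi))

# Root mean squared error of the pointwise posterior mean predictor.
rmse = np.sqrt(np.mean((samples.mean(axis=0) - truth) ** 2))
print(round(coverage, 2), round(rmse, 2))
```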

To conclude, the projection approximation is clearly preferable to the truncation approximation, particularly for models where the likelihood functions have a high degree of spatial coupling. These models are indeed the ones most challenging to invert. For hard problems with many classes, a projection approximation of moderate order combined with an MCMC step to assess the correct posterior model is recommended. In most cases with a small to moderate number of classes, a projection approximation of order about nine will provide aMPR profiles and aMAP predictors so close to the correct MPR profiles and MMAP predictor that the MCMC step may not be necessary. The MMAP predictor is seen to capture the main characteristics of the true response profile, and the coverage ratios appear to be satisfactory compared to the reference. Bimodality and skewness are present in the posterior model, as desired.
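The role of the approximate posterior as a proposal density can be illustrated by a one-dimensional independence Metropolis-Hastings sampler. The target and proposal below are hypothetical toy densities standing in for the correct posterior and its approximation; this is a sketch, not the paper's sampler.

```python
import numpy as np

def gauss_pdf(x, m, s):
    # Univariate Gaussian density used to build the toy densities.
    return np.exp(-0.5 * ((x - m) / s) ** 2) / (s * np.sqrt(2.0 * np.pi))

def log_p(x):
    # Bimodal target: stand-in for the correct posterior model.
    return np.log(0.5 * gauss_pdf(x, -2.0, 1.0) + 0.5 * gauss_pdf(x, 2.0, 1.0))

def log_q(x):
    # Single-Gaussian stand-in for an approximate posterior model.
    return np.log(gauss_pdf(x, 0.0, 2.2))

rng = np.random.default_rng(1)
x = 0.0
accepted = 0
n_iter = 5000
for _ in range(n_iter):
    y = rng.normal(0.0, 2.2)  # independent proposal: draw from q
    # Independence-sampler MH acceptance probability:
    # alpha = min(1, p(y) q(x) / (p(x) q(y)))
    log_alpha = log_p(y) + log_q(x) - log_p(x) - log_q(y)
    if np.log(rng.uniform()) < log_alpha:
        x, accepted = y, accepted + 1
print(accepted / n_iter)
```

The closer q is to p, the higher the acceptance rate, which mirrors the effect of increasing the approximation order.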

## 6 Norwegian Sea case study

Subsurface lithology/fluid prediction is a major challenge in reservoir characterization. The variables of interest are the lithology/fluid classes and elastic attributes along a vertical profile penetrating a reservoir unit, and the objective is to predict these variables given a set of seismic amplitude-versus-offset (AVO) observations. We demonstrate the proposed methodology on a case study from the Norwegian Sea. The data set covers a mid-Jurassic gas sandstone reservoir in the Garn and Ile formations, separated by a thick silty-shale formation. The reservoir zones are characterized by relatively low P-wave velocity and density. We refer to [2] for further details about the reservoir of interest.

We discretize the upscaled region of interest onto a regular grid, and we operate in the time domain. Three distinct lithology/fluid classes are identified in a reference solution (Fig. 11), namely gas sandstone, brine sandstone and shale. We define a first order Markov chain prior model for the lithology/fluid classes with transition matrix:

\[
\mathbf{P} = \begin{pmatrix} 0.75 & 0.05 & 0.20 \\ 0.15 & 0.85 & 0 \\ 0.30 & 0 & 0.70 \end{pmatrix}, \tag{40}
\]

having stationary distribution approximately (0.50, 0.17, 0.33). We note the prior zero-probability transitions between gas sandstone and brine sandstone due to gravitational sorting [17].
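The stationary distribution of the transition matrix in Eq. (40) can be verified numerically as the normalized left eigenvector of P for eigenvalue one; a minimal sketch (not part of the original analysis):

```python
import numpy as np

# Transition matrix from Eq. (40); row i holds the transition
# probabilities out of class i (rows sum to one).
P = np.array([[0.75, 0.05, 0.20],
              [0.15, 0.85, 0.00],
              [0.30, 0.00, 0.70]])

# The stationary distribution pi solves pi P = pi with sum(pi) = 1,
# i.e. it is a left eigenvector of P for eigenvalue 1.
eigvals, eigvecs = np.linalg.eig(P.T)
idx = np.argmin(np.abs(eigvals - 1.0))
pi = np.real(eigvecs[:, idx])
pi = pi / pi.sum()
print(pi)  # approximately [0.5, 0.1667, 0.3333]
```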

The middle-layer represents the logarithm of the elastic material properties, which are the canonical variables for the seismic AVO observations. These properties are parametrized by the logarithms of the pressure-wave velocity, the shear-wave velocity and the density. The response likelihood model, or rock-physics model [3, 21], is empirically calibrated from an upscaled nearby well. Contour plots for the various class-conditional densities are displayed in Fig. 10. Indeed, as seen in the contour plots, the elastic attributes are correlated. An exponential spatial correlation function is also estimated from the nearby well.

The upper-layer d represents angle-dependent seismic AVO data from near and far angles (Fig. 11). The reflections and subsurface wave propagation are modeled by a convolutional, linearized Zoeppritz version of the wave equation [7]. That is, \(\mathbf{d} = \mathbf{W}\mathbf{A}\mathbf{D}\mathbf{m} + \mathbf{e}\), where \(\mathbf{W}\) is a convolutional matrix, \(\mathbf{A}\) contains the angle-dependent weak-contrast Aki-Richards coefficients [1], and \(\mathbf{D}\) is a differential matrix which computes contrasts. The covariance matrix in the acquisition model is assumed known. The signal-to-noise ratio is assessed using Eq. (39), and it is estimated to be 2.23, which is comparable to most cases discussed in Section 5.
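A simplified, single-angle version of this convolutional forward model can be sketched as follows. The wavelet, its parameters and the middle-layer profile are hypothetical, and the angle-dependent Aki-Richards coefficient matrix is omitted for brevity.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 50

# Hypothetical middle-layer profile m (a log elastic attribute),
# generated as a smooth random walk.
m = np.cumsum(rng.normal(scale=0.1, size=n))

# D: first-order difference matrix computing contrasts of m.
D = np.eye(n, k=1) - np.eye(n)

# W: banded convolution matrix built from a hypothetical Ricker
# wavelet with dominant frequency f (cycles per sample).
t = np.arange(-10, 11)
f = 0.1
wavelet = (1 - 2 * (np.pi * f * t) ** 2) * np.exp(-(np.pi * f * t) ** 2)
W = np.zeros((n, n))
for i in range(n):
    for j in range(n):
        if abs(i - j) <= 10:
            W[i, j] = wavelet[i - j + 10]

# Convolved observations with additive Gaussian errors e.
d = W @ D @ m + rng.normal(scale=0.01, size=n)
print(d.shape)  # (50,)
```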

We consider only the projection approximation, based on the discussion in Section 5. A set of 100,000 realizations is generated, and the initial 10,000 realizations are discarded as the burn-in phase. The similarity measures and the performance measures are given in Table 3. We obtain acceptance rates in the range 0.15-0.35, mostly increasing monotonically with the approximation order. As in Section 5, an order of around 2-3 appears to be a reasonable trade-off between acceptance rate and computational cost.

Fig. 12 contains aMPR profiles and aMMAP predictors based on the projection approximation. Both the aMPR profiles and the aMMAP predictors appear similar, and almost identical for the higher orders. The estimated MPR profiles are displayed together with the MMAP predictor, and we observe that most of their characteristics are shared by the approximate solutions. However, we note that in contrast to the aMMAP predictors, the thin silty-shale layer around 2320 ms is identified in the MMAP predictor. The brine sandstone layer at 2430 ms is captured in the aMMAP predictor for low orders, but not for higher orders. As seen in the aMPR profiles, the marginal probability for brine sandstone is approximately 50 % around 2430 ms, thus it does not influence the proposal density in the MCMC algorithm significantly. The MMAP predictor is observed to capture the main characteristics of the reference profile, however small-scale variability is lost since the predictions cause a regression towards the dominating class, shale. We observe from the MPR profiles that the MMAP predictor is more uncertain in the lowermost part of the domain of interest. Indeed, this is in correspondence with the rock-physics model, where we see that brine sandstone is partly masked by shale whereas gas sandstone is clearly separated from shale (Fig. 10).

Posterior predictions and prediction intervals for the elastic attributes are displayed in Fig. 13. In particular, we observe that we are able to predict the low P-wave velocity and low density zones where the reservoirs are located. We display the kernel smoothed time-averaged histograms at the bottom of Fig. 13 and observe the posterior models to resemble the smoothed histograms based on the well observations. Root mean squared errors are computed for the predictions of each elastic attribute, and the coverage ratios for the 80 % prediction intervals are satisfactory.

As seen in Fig. 13, the width of the prediction intervals varies with time; in particular, the prediction interval in the uppermost gas reservoir is wider than in the lowermost gas reservoir. Fig. 14 displays posterior histograms at 2320 ms and 2360 ms. The posterior is observed to be wider at the former. At 2320 ms the posterior is bimodal, while it is unimodal at 2360 ms. This is consistent with the MPR profiles displayed in Fig. 12, where the marginal probability for shale is higher in the upper reservoir. That is, the uncertainty in the lithology/fluid classification propagates into the uncertainty of the posterior for the elastic properties.

## 7 Conclusion

A convolved hidden Markov model extending [18] and [25] is defined, inspired by applications in subsurface reservoir prediction. The bottom-layer is a categorical model, and the middle-layer is a Gaussian spatial model conditional on the bottom-layer. Hence, the middle-layer appears as an unconditional Gaussian mixture spatial variable. The top-layer, representing the observations, is assumed to be Gaussian conditional on the middle-layer. Prediction of the categorical and Gaussian mixture variables, given the observations in a Bayesian inversion framework, is of interest.

Two classes of likelihood approximations are presented to obtain approximate posterior models on factorial form, which are analytically and efficiently assessed by the recursive Forward-Backward algorithm. The first approximation is based on a naïve truncation of the marginal densities, while the second is based on a Gaussian approximation of a Gaussian mixture density. The correct posterior model is thereafter assessed using an approximate posterior model as proposal density in an independent proposal MCMC MH algorithm. The approximations are empirically evaluated on a set of synthetic cases. In general, higher order approximations result in acceptance rates up to around 0.5, at the cost of additional computational resources which increase exponentially with the order. The projection approximation provides higher acceptance rates in the MCMC algorithm than the truncation approximation. We observe that an order of about two or three appears to be a reasonable trade-off between accuracy and computational cost.
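The idea of approximating a Gaussian mixture by a single moment-matched Gaussian can be illustrated in one dimension; this is a sketch with hypothetical mixture parameters, not the paper's multivariate construction.

```python
import numpy as np

# Hypothetical 1D Gaussian mixture: weights, means, variances.
w = np.array([0.6, 0.4])
mu = np.array([-1.0, 2.0])
var = np.array([0.5, 1.0])

# Moment-matched single Gaussian: match the mixture mean and
# variance. The mixture variance follows from the law of total
# variance: E[var] + var[mean].
mean = np.sum(w * mu)
second_moment = np.sum(w * (var + mu ** 2))
variance = second_moment - mean ** 2
print(mean, variance)  # approximately 0.2 and 2.86
```

Such a collapsed Gaussian preserves the first two moments but, by construction, discards the multimodality that the MCMC step subsequently restores.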

The MMAP predictor for the middle-layer is verified to reproduce the bimodality and skewness of the middle-layer in the synthetic examples. Coverage ratios for the prediction intervals are observed to be satisfactory.

The projection approximation has been empirically verified on a subsurface case study to predict both the lithology/fluid classes and the elastic material properties reliably.

We observe that for a higher order approximation, additional MCMC sampling may not be necessary since the approximate posterior models appear to be very close to the correct posterior model.

The current work should naturally be extended to 2D and 3D models along the lines of [30] and [24]. Preliminary work on real 2D subsurface seismic data shows encouraging results.

### Acknowledgments

This research is part of the Uncertainty in Reservoir Evaluation (URE) activity at the Norwegian University of Science and Technology (NTNU). We thank PGS and Tullow Oil for providing the data.

## References

• Aki and Richards [1980] Aki, K. and P. Richards (1980). Quantitative Seismology: Theory and Methods. W. H. Freeman and Co., New York.
• Avseth et al. [2016] Avseth, P., A. Janke, and F. Horn (2016). AVO inversion in exploration - Key learnings from a Norwegian Sea discovery. The Leading Edge 35, 405–414.
• Avseth et al. [2005] Avseth, P., T. Mukerji, and G. Mavko (2005). Quantitative Seismic Interpretation: Applying Rock Physics Tools to Reduce Interpretation Risk. Cambridge University Press.
• Barembruch et al. [2009] Barembruch, S., A. Garivier, and E. Moulines (2009). On Approximate Maximum-Likelihood Methods for Blind Identification: How to Cope With the Curse of Dimensionality. IEEE Transactions on Signal Processing 57(11), 4247–4259.
• Baum et al. [1970] Baum, L. E., T. Petrie, G. Soules, and N. Weiss (1970). A Maximization Technique Occurring in the Statistical Analysis of Probabilistic Functions of Markov Chains. The Annals of Mathematical Statistics 41(1), 164–171.
• Besag [1974] Besag, J. (1974). Spatial Interaction and the Statistical Analysis of Lattice Systems. Journal of the Royal Statistical Society: Series B (Methodological) 36(2), 192–236.
• Buland and Omre [2003] Buland, A. and H. Omre (2003). Bayesian linearized AVO inversion. Geophysics 68(1), 185–198.
• Connolly and Hughes [2016] Connolly, P. A. and M. J. Hughes (2016). Stochastic inversion by matching to large numbers of pseudo-wells. Geophysics 81(2), M7–M22.
• Doyen [2007] Doyen, P. (2007). Seismic reservoir characterization: an earth modelling perspective. Education tour series. EAGE publications.
• Fearnhead and Clifford [2003] Fearnhead, P. and P. Clifford (2003). On-line inference for hidden Markov models via particle filters. Journal of the Royal Statistical Society: Series B (Statistical Methodology) 65(4), 887–899.
• Frühwirth-Schnatter [2006] Frühwirth-Schnatter, S. (2006). Finite Mixture and Markov Switching Models. Springer Series in Statistics. Springer New York.
• Giordani et al. [2007] Giordani, P., R. Kohn, and D. van Dijk (2007). A unified approach to nonlinearity, structural change, and outliers. Journal of Econometrics 137(1), 112 – 133.
• Grana and Della Rossa [2010] Grana, D. and E. Della Rossa (2010). Probabilistic petrophysical-properties estimation integrating statistical rock physics with seismic inversion. Geophysics 75(3), O21–O37.
• Grana et al. [2017] Grana, D., T. Fjeldstad, and H. Omre (2017). Bayesian Gaussian Mixture Linear Inversion for Geophysical Inverse Problems. Mathematical Geosciences, in press.
• Gunning and Glinsky [2007] Gunning, J. and M. E. Glinsky (2007). Detection of reservoir quality using Bayesian seismic inversion. Geophysics 72(3), R37–R49.
• Hammer and Tjelmeland [2011] Hammer, H. and H. Tjelmeland (2011). Approximate forward–backward algorithm for a switching linear Gaussian model. Computational Statistics & Data Analysis 55(1), 154 – 167.
• Krumbein and Dacey [1969] Krumbein, W. C. and M. F. Dacey (1969). Markov chains and embedded Markov chains in geology. Mathematical Geology 1, 79–96.
• Larsen et al. [2006] Larsen, A. L., M. Ulvmoen, H. Omre, and A. Buland (2006). Bayesian lithology/fluid prediction and simulation on the basis of a Markov-chain prior model. Geophysics 71(5), R69–R78.
• Lindberg and Omre [2014] Lindberg, D. V. and H. Omre (2014). Blind Categorical Deconvolution in Two-Level Hidden Markov Models. IEEE Transactions on Geoscience and Remote Sensing 52(11), 7435–7447.
• Lindberg and Omre [2015] Lindberg, D. V. and H. Omre (2015). Inference of the Transition Matrix in Convolved Hidden Markov Models and the Generalized Baum-Welch Algorithm. IEEE Transactions on Geoscience and Remote Sensing 53(12), 6443–6456.
• Mavko et al. [2009] Mavko, G., T. Mukerji, and J. Dvorkin (2009). The Rock Physics Handbook (Second ed.). Cambridge University Press.
• Rabiner [1989] Rabiner, L. R. (1989). A tutorial on hidden Markov models and selected applications in speech recognition. Proceedings of the IEEE 77(2), 257–286.
• Reeves and Pettitt [2004] Reeves, R. and A. N. Pettitt (2004). Efficient recursions for general factorisable models. Biometrika 91, 751–757.
• Rimstad and Omre [2010] Rimstad, K. and H. Omre (2010). Impact of rock-physics depth trends and Markov random fields on hierarchical Bayesian lithology/fluid prediction. Geophysics 75(4), R93–R108.
• Rimstad and Omre [2013] Rimstad, K. and H. Omre (2013). Approximate posterior distributions for convolutional two-level hidden Markov models. Computational Statistics & Data Analysis 58, 187–200.
• Robert and Casella [2005] Robert, C. P. and G. Casella (2005). Monte Carlo Statistical Methods (Springer Texts in Statistics). Secaucus, NJ, USA: Springer-Verlag New York, Inc.
• Scott [2002] Scott, S. L. (2002). Bayesian Methods for Hidden Markov Models: Recursive Computing in the 21st Century. Journal of the American Statistical Association 97(457), 337–351.
• Tarantola [2005] Tarantola, A. (2005). Inverse Problem Theory and Methods for Model Parameter Estimation. Philadelphia, PA, USA: Society for Industrial and Applied Mathematics.
• Trias et al. [2009] Trias, M., A. Vecchio, and J. Veitch (2009). Delayed rejection schemes for efficient Markov-Chain Monte-Carlo sampling of multimodal distributions.
• Ulvmoen and Omre [2010] Ulvmoen, M. and H. Omre (2010). Improved resolution in Bayesian lithology/fluid inversion from seismic prestack data and well observations: Part I - Methodology. Geophysics 75(2), R21–R35.
• Viterbi [1967] Viterbi, A. (1967). Error bounds for convolutional codes and an asymptotically optimum decoding algorithm. IEEE Transactions on Information Theory 13(2), 260–269.