Bayesian inversion of convolved hidden Markov models with applications in reservoir prediction
Abstract
Efficient assessment of convolved hidden Markov models is discussed. The bottomlayer is defined as an unobservable categorical firstorder Markov chain, while the middlelayer is assumed to be a Gaussian spatial variable conditional on the bottomlayer. Hence, this layer appear as a Gaussian mixture spatial variable unconditionally. We observe the toplayer as a convolution of the middlelayer with Gaussian errors. Focus is on assessment of the categorical and Gaussian mixture variables given the observations, and we operate in a Bayesian inversion framework. The model is defined to make inversion of subsurface seismic AVO data into lithology/fluid classes and to assess the associated elastic material properties. Due to the spatial coupling in the likelihood functions, evaluation of the posterior normalizing constant is computationally demanding, and bruteforce, singlesite updating Markov chain Monte Carlo algorithms converges far too slow to be useful. We construct two classes of approximate posterior models which we assess analytically and efficiently using the recursive ForwardBackward algorithm. These approximate posterior densities are used as proposal densities in an independent proposal Markov chain Monte Carlo algorithm, to assess the correct posterior model. A set of synthetic realistic examples are presented. The proposed approximations provides efficient proposal densities which results in acceptance probabilities in the range 0.100.50 in the Markov chain Monte Carlo algorithm. A case study of lithology/fluid seismic inversion is presented. The lithology/fluid classes and the elastic material properties can be reliably predicted.
1 Introduction
Inverse problems arises naturally in several fields of engineering, such as image analysis and signal processing, and constitutes a major challenge. The variables of interest are observed through an acquisition procedure together with an error term. The objective is to predict the variables of interest given the observations, being an inverse problem. We cast the problem into a Bayesian inversion framework [28], defined by a likelihood and a prior model.
Focus is on the class of switching statespace models where the likelihood function for the observations vary according to an unobservable categorical random process, see [11] and references therein. Switching statespace models have numerous applications in, for example, econometrics [12], signal processing [4], speech recognition [22] and blind deconvolution [19].
We restrict ourselves to the subclass of convolved hidden Markov models inspired by [18], [30] and [25]. The bottomlayer is assumed to be an unobservable categorical Markov chain. Conditional on the bottomlayer we define a middlelayer being a Gaussian spatial variable, which then appear as a Gaussian mixture spatial variable unconditionally. The toplayer, representing the convolved observations, is assumed to be Gaussian conditional on the middlelayer. Focus is on assessment of the categorical and the Gaussian mixture variables given the convolved observations, which appear as an inverse problem.
Recursive algorithms have proven to be successful for hidden Markov models [27, 23]. Indeed, they are wellsuited for models where each observation depend only on current and past values of the categorical process, not future values which is the case with convolved observations. Thus, for more complex likelihood functions sample based inference by Monte Carlo sampling, such as particle filters, is often required [10].
Consider a discretized vertical profile of categorical classes subsurface penetrating a reservoir unit. The categorical variable in the bottomlayer represent, for example, geological lithology/fluid classes such as gas sandstone, brine sandstone or shale. In reservoir characterisation such inverse problems are of utmost importance to predict the presence of hydrocarbons [3, 9, 15]. The middlelayer, being a Gaussian mixture spatial variable, represents, for example, the elastic material properties of the reservoir. [7] proposed a linearized Bayesian inversion technique for assessment of continuousvalued properties subsurface, however they did not account for the different effects of the lithology/fluid classes. [13] proposed a Gaussian mixture density prior model for seismic velocities, but their model did not included spatial dependence in the middlelayer. [8] proposed an approximate sampling technique based on approximate Bayesian computation to assess lithology/fluid classes and the continuousvalued properties.
We consider an extension of the convolved hidden Markov model defined and evaluated in [18] and [25]. We assume a Markov chain prior model for the bottomlayer. In our model the middlelayer is Gaussian spatial variable conditional on the bottomlayer defined by classdependent expectations and variances, and a spatial correlation function. The spatial correlation is defined to enforce spatial continuity in the middlelayer, a property observed in subsurface seismic velocities. Unconditionally the middlelayer appear as a Gaussian mixture spatial model, extending the traditional Gaussian prior assumption for seismic velocities [7]. The toplayer contains convolved observations, which appear unconditionally as a Gaussian mixture random variable dependent on past, current and future values of the two lower layers. Assessment of the categorical variable given the observations, which constitutes a challenging categorical inverse problem, is discussed. Moreover, we discuss simulation and prediction of the middlelayer Gaussian mixture variable.
The convolution and spatial coupling in the likelihood model prevents straightforward use of recursive algorithms since the posterior model can not be written on factorial form [23]. [18], [16] and [25] proposed various approximate posterior models on lower order factorial form which are computationally feasible. For the extended model defined in our study, we present two classes of likelihood approximations and demonstrate that their posterior models can be written on factorial form. Hence, the approximate posterior models can be efficiently assessed by the recursive ForwardBackward algorithm [5]. The correct posterior model is assessed by an independent proposal Markov chain Monte Carlo (MCMC) MetropolisHastings (MH) algorithm, with the approximate posterior model as proposal density. In general, independent proposal densities results in a poor rate of convergence. [25] verified in a simulation study that it is possible to obtain satisfactory acceptance rates with their proposed approximate posterior models.
Denote by a generic probability distribution for both categorical and continuous random variables. The probability density function (pdf) for a dimensional Gaussian random variable having mean and covariance matrix is denoted by . We refer to a likelihood model that is linear in the conditioning variable with additive Gaussian errors as a Gausslinear likelihood model. Let be the identity matrix.
In Section 2 we define the convolved hidden Markov model. Section 3 contains the definition of the proposed likelihood approximations, and a MCMC algorithm to assess the correct posterior model is presented in Section 4. A simulation study is included in Section 5, where we consider various model parameters to empirically evaluate the overall performance of the two proposed likelihood approximations. The synthetic simulation models are chosen to be comparable to the ones in [25]. A seismic inversion case study based on real data is included in Section 6.
2 Model specification
Consider an unobservable categorical variable on a discretized topdown regular grid , along a vertical profile, representing for example lithology/fluid classes of the subsurface. Label the variable for . A continuous valued variable is observed and represent, for example, seismic data. The main objective is to predict given d with the associated uncertainty, being a categorical inverse problem. We operate in a Bayesian framework, and the posterior density for the categorical variable given the observations is defined by Bayes’ theorem:
(1) 
where is the normalizing constant, is the likelihood function, and is the prior model. The solution of the categorial inverse problem in Eq. (1) is a computationally challenging problem, and analytical assessment is only feasible for very particular models. We introduce an additional continuous valued variable separating d and . That is, we assume d and to be conditionally independent given . The variable represents, for example, the logarithm of the seismic velocity. We define the gross likelihood model [18] as:
(2) 
where and are respectively the acquisition and response likelihood functions. The variable is marginalized out in , but its uncertainty propagates into . In general, integrating out in Eq. (2) is a challenging problem caused by its dimension, and analytical expressions need not exist.
2.1 Prior model
We define a Markov chain prior model to represent vertical dependency in the categorical variable of interest. Let the categorical variable be defined by a th order stationary Markov chain,
(3) 
with for . The latter equality is justified by a trivial extension of the sample space since the set is a subset of the conditioning set. The corresponding transition matrix is timeindependent and contain at most nonzero elements, due to the trivial extension. Let be the corresponding stationary distribution. Indeed in this case the spatial Markov property is , and the prior model is a th order Markov random field [6].
2.2 Likelihood model
We define the response model as follows:
(4) 
where is the vector with pointwise conditional means given , and is a centered Gaussian process having covariance matrix . The response likelihood is therefore Gaussian:
(5) 
We decompose the covariance matrix as follows:
(6) 
where is a diagonal matrix with the conditional standard deviations. The correlation matrix is defined from a spatial correlation function , where for all combinations of . This entails that the response process is constructed by a normalized Gaussian process which is scaled and shifted dependent on the current categorical variable for each . For each , the Gaussian process representing the current categorical variable is assigned. This response process can be extended to have a set of correlated Gaussian processes, with separate expectations, variances and spatial correlation functions  one for each of the classes of the categorical variable. The inversion methodology defined in the following sections work also for this case, but the notation will be more complex. We present only the simpler parametrization in Eq. (5)(6) to ease readability.
The marginal pdf of :
(7) 
is a dimensional Gaussian mixture pdf with at most unique modes. Even for short profiles, evaluation of is unfeasible since it requires evaluation of different configurations of . Also, for , the unconditional pdfs
(8) 
are univariate Gaussian mixture pdfs with at most unique modes.
We consider a convolved data acquisition procedure represented by the convolution matrix , representing timeindependent convolutions. Hence, each datum appear as a weighted sum of neighboring elements of the response variable . Consider the following linear acquisition model:
(9) 
where is a centered Gaussian process with covariance matrix . Thus, the acquisition likelihood model:
(10) 
is Gaussian with mean vector and covariance matrix . Indeed, the proposed acquisition likelihood is straightforward to generalize to cases where d and are of different dimension.
Since both the response and acquisition likelihood models are assumed to be Gaussian, the challenging gross likelihood model in Eq. (2) is also Gaussian:
(11) 
Given , each datum has an expectation as a weighted sum of the conditional mean vector . Note that the explicit dependence on in the covariance matrix requires the covariance matrix to be computed for each unique , thus evaluation of the likelihood is computationally expensive for a set of unique .
2.3 Posterior model
The model defined in the previous section may be represented by a graph. If we assume that the spatial correlation function is on first order exponential form, the convolution kernel defining has finite support and the prior model is a first order Markov chain, the graph takes a simple form (Fig. 1). Indeed, each observation is seen to be dependent on a large subset of .
The general posterior model is
(12) 
where
(13) 
is computationally challenging since it requires summing over elements. The posterior distribution is the ultimate solution to the Bayesian inversion problem, but it is often represented by a set of independent realizations from . Characteristics such as marginal probability profiles MPR for :
(14) 
maximum posterior (MAP) predictor
(15) 
or alternatively the marginal maximum posterior (MMAP) predictor
(16) 
are often used to describe the distribution. Since the MMAP predictor is only a sequence of pointwise maximums, it need not necessarily equal the computationally unfeasible global MAP predictor. Indeed, transitions having zeroprobability in the prior model may occur in the MMAP since the latter is only a pointwise property.
It can be shown that the posterior model is also a Gaussian mixture density
(17) 
where the model parameters and are as given in [14]. That is, is a conjugate prior model with respect to the Gaussian likelihood function . We define the following MMAP predictor:
(18) 
for , being univariate optimizations. Here, is the th element in and is the corresponding th diagonal element in . Posterior % prediction intervals are obtained similarly.
In a generic convolved hidden Markov model there are severe couplings, mostly due to the convolution operator and spatial dependence. Assessment of the posterior model by bruteforce singlesite proposal MCMC algorithms is not feasible. We choose to approximate the likelihood model such that we obtain an approximate posterior model which is analytically tractable. This approximate posterior model is afterwards used as a proposal distribution in an independent proposal MCMC MH algorithm to assess the correct posterior model. For a simpler model [25] obtained satisfactory acceptance rates in a simulation study.
3 Likelihood approximations
We present two likelihood approximations of the gross likelihood, inspired by [25], to obtain approximate posterior models on loworder factorial form. Such models are efficiently assessed by recursive algorithms. We consider only approximations of the likelihood function denoted by for , since the spatial correlation in the response likelihood and the convolution acquisition likelihood contribute with the major spatial couplings in the model. Recall that a likelihood function, contrary to a probability density, need not be normalized, hence the former is scale invariant.
Define for , similarly as . From the definition of the response likelihood function it follows that:
(19) 
The proposed approximations are based on, respectively, a naïve truncation of the likelihood function and a Gaussian approximation of the Gaussian mixture pdf .
The proposed likelihood approximations should be such that is small with respect to some measure, and decrease for increasing . In practice, we have to accepted a loworder approximation since we have to rerun the approximation for different model parameter values.
3.1 Truncation approximation
First we consider a naïve approximation, inspired by Approximation 1 in [25], by truncating the marginal densities. For simplicity we assume , since coloured errors in appear as results of the convolution. Thus, the acquisition likelihood can be written on factorial form:
(20) 
Define the band truncated matrix by truncating every element in more than from the diagonal equal to zero, where is such that for . Thus, it follows that the th order marginal acquisition likelihood for is given as:
(21) 
where is the th row in . Combining Eq. (19) and Eq. (21) we obtain
(22) 
which are Gaussian pdfs for . We define the th order truncation likelihood approximation as:
(23) 
where first and last factor are boundary correction terms. Note that each is only dependent on a small subset of the observations. Indeed, if and are diagonal matrices the truncation approximation of order one is exact, and the model correspond to a standard hidden Markov model. Note that we do not require to be independent of . The definition above is based on representing a timeinvariant convolution, but it can be extended to an arbitrary linear operator by truncating each row such that it covers as much weight as possible, and by extracting the corresponding subset of .
3.2 Projection approximation
The second proposed likelihood approximation, which we denote the projection approximation, is based on a Gaussian approximation of the Gaussian mixture pdf . Let be the Gaussian approximation with mean
(24) 
and covariance
(25)  
for . These expressions are obtained analytically using the laws of total expectation and covariance.
The joint approximate distribution is a Gaussian pdf, thus also the marginal distributions for are Gaussian pdfs. By conditioning, we define the th order approximate acquisition likelihood model for , which are Gaussian pdfs. Combining the results above with Eq. (19) it can be verified that
(26) 
are Gaussian pdfs for . We note that the marginalization requires evaluation of a dimensional Gaussian pdf, , which is computationally expensive for large . Indeed, by Bayes’ theorem, we have that
(27) 
Thus, we rephrase Eq. (26) as:
(28) 
where the densities to be evaluated are of dimension .
We define the th order projection approximation as follows:
(29) 
where the root ensures that each datum is used exactly once, and the first and last factors are boundary corrections.
Note that the projection approximation is straightforward to generalize to an arbitrary linear operator . Contrary to the truncation approximation, the full set of observations d is used for each in the projection approximation.
4 Assessment of posterior model
Assessment of the correct posterior model by bruteforce singlesite simulation is unfeasible due to spatial and convolutional coupling in the observations, and possible prior ordering constraints on the categorical variable. We define an independent proposal MCMC MH algorithm based on an approximate posterior model , where the latter is exactly accessed by a recursive algorithm.
Both the truncation and projection based approximate likelihoods functions can be phrased on factorial form
(30) 
where the subscript denotes the chosen approximation type. Thus, their approximate posterior densities can be phrased as follows:
(31) 
which is a th order nonhomogeneous Markov chain. Such factorisable posterior models are exactly and efficiently assessed by the recursive ForwardBackward algorithm in operations [23]. For a given approximation order , the marginal probability (MPR) profiles for the approximate posterior model is denoted aMPR for and . Correspondingly we denote the maximum posterior (MAP) and marginal maximum posterior (MMAP) predictors, respectively, aMAP and aMMAP. These characteristics are exactly calculated for the approximate posterior model. Assessment of aMAP requires the use of the recursive Viterbi algorithm [31]. We refer to [20] for a discussion of model parameter estimation in a convolved hidden Markov model.
The correct posterior model is assessed by an independent proposal MCMC MH algorithm [26] using as proposal distribution. At each iteration, the acceptance probability is given as
(32) 
where the troublesome normalizing constant in Eq. (1) cancels. After burnin we generate realizations for from the correct posterior model . To empirically quantify the quality of the proposed approximations we consider the similarity measure defined in [25]:
(33) 
being the average acceptance rate in the MCMC algorithm. We define and to be the acceptance rates based on, respectively, the th order truncation and projection approximation. Acceptance rates close to unity entails that is close to , which is as desired. We define performance measures and to compare the effect of the approximation order as a function of computational demands.
We estimate the marginal probabilities profiles MPR for the correct posterior model for each by:
(34) 
and similarly the marginal MAP predictor by
(35) 
Evaluating the correct posterior density is computationally expensive since each iteration requires evaluation of a dimensional Gaussian pdf . In practice, this limits the number of realizations to be generated in reasonable time. However, we note that the MCMCstep is essentially independent of the approximation order ; that is, for a fixed computational budget it can be beneficiary to generate few realizations from a high order approximation than many realizations from a low order approximation.
Zerotransitions in the prior matrix enforces zerotransitions in the posterior, thus the th order approximation can be obtained at a cost smaller than the theoretical for a full matrix .
5 Synthetic example
We empirically evaluate the proposed likelihood approximations for various order in a synthetic example. The synthetic example is defined from a Base case and six deviating cases, having different model parameters. We avoid the limiting cases where the covariance matrix in the gross likelihood tends toward a diagonal matrix, which is the wellknown standard hidden Markov model. That is, we consider only deviating cases where the number of nonzero entries in the covariance matrix is at least as high as for the Base case. For the standard hidden Markov model the truncation approximation is exact for all , and the projection approximation is empirically verified to have an acceptance rate close to unity for all . We claim that the chosen deviating cases are the most challenging ones, since they appear with strong spatial coupling in the likelihood function.
The reference profile of interest is assumed to be of length and have three distinct classes; namely , see Figure 2. The reference profile is identical for all cases to be discussed, and it is generated as a realization from a first order stationary Markov chain with transition matrix
(36) 
having stationary distribution .
The Base case has response likelihood model parameters:
(37) 
and spatial correlation function for . The acquisition likelihood model is specified through a secondorder exponential convolution kernel, .
The deviating cases, displayed in Fig. 2, are defined as follows, relative to the Base case:

High smoothness: a spatial correlation function with the same range, but a higher degree of smoothness: .

Long correlation: a spatial correlation function with an identical degree of smoothness but a higher range parameter: .

Overlapping classes: the black and red classes have identical pointwise means, but different marginal variances:
(38) 
Wide convolution: a secondorder exponential convolution kernel:

High noise: an acquisition likelihood with a large error component: .

Ricker convolution: a convolution kernel with negative lobes:
.
Note that cases one through three are defined by different response models, resulting in different signals, however they do have an identical acquisition model. Cases four through six have an identical response profile, but different acquisition models. The various set of observations are generated by simulation from the various gross likelihood models given the reference profile .
Fig. 2 contains the reference classification , response signals and observations d displayed for the various cases. The objective is to predict given d for each case. The reference profile , and and d in pairs are displayed in the top row for the Base, High smoothness, Long correlation and Overlapping classes cases. Relative to the Base case, appears to be smoother in the High smoothness and Long correlation cases, however, the observations appears to be almost identical. For the Overlapping classes case we observe that d has less variability than in the Base case.
The bottom row displays, in a similar format, the Base case, Wide convolution, High noise and Ricker convolution cases. We do not display in the three latter cases since the response models are assumed to be identical to the Base case to ease interpretation. The respective observations are very different, however.
We assess by simulation the associated signaltonoiseratios:
(39) 
The signaltonoiseratio for the Base case is 2.53, and the signaltonoiseratios for the deviating cases are presented in Table 1. The Base case, High smoothness, Wide convolution and Ricker convolution cases have almost identical signaltonoiseratios, while the other cases appear with lower singaltonoiseratios. Particularly the High noise case has a large noise component, as it should.
Different response models  Different acquisition models  

High smoothness  2.49  Wide convolution  2.42 
Long correlation  1.71  High noise  0.74 
Overlapping classes  1.35  Ricker convolution  2.55 
For each case we apply both the truncation and the projection approximations to assess the corresponding approximate posterior models by the efficient recursive ForwardBackward algorithm. We use the approximate posterior models as proposal distribution in an independent proposal MCMC MH algorithm to assess the correct posterior model for each case, and 160,000 realizations are generated from the correct posterior model. We discard the initial 10,000 realizations as the burnin phase. A higher order delayed rejection step [29] is included in the algorithm. Fig. 3 contains a sequence of 1,000 consecutive posterior realizations for the Base case based on a ninth order projection approximation. We observe relatively good mixing, with the three classes represented in the realizations. This characteristic is shared also by the other cases, but the results are not presented here.
In Fig. 4 results for the Base case are presented. The top line of display is based on the truncation approximation, while the bottom line is for the projection approximation. Both the aMPR profiles for and the aMAP predictors are displayed for varying orders of of approximations. These characteristics are exactly calculated by recursive algorithms, although the computer demand increase fast with increasing order . To the right estimated MPR profiles for and the MMAP predictor for the correct posterior model are displayed together with the reference profile . These characteristics are only available by MCMC based inference. Note that the reference profile is more heterogeneous than the MMAP predictor due to the regression towards the local majority class.
The aMPR profiles for the truncation approximation tend towards the MPR profiles as the order increase, which is desirable. It can be shown that for they are identical. It is problematic, however, that for low order the approximations are not satisfactory, since the computer demands increases fast with . For the projection approximation the results appears to be better; good approximations for low order and improvements towards the correct model results as increase.
Lastly, note that the MCMC estimates of the correct model characteristics are almost identical  independent of whether the proposal is based on the truncation or projection approximation  of course. The MCMC acceptance rates will of course depend on the proposal distribution.
In Fig. 5 the similarity measures and for the approximate versus correct posterior model for all cases are presented in the top row for increasing . The corresponding performance measures and , taking also computer demands into account, are displayed in the bottom row.
The similarity measures and appears to increase monotonically with increased order in the various cases. This indicates that the two sequences of approximations, parametrized by order , provides monotonical improved approximations for the correct posterior model with increasing at the cost of increased computer demands. The projection approximation is almost uniformly better than the truncation approximation. For all cases, the similarity measure for the former reaches the range for a ninth order approximation, which entails average acceptance rates of % in the independent proposal MCMC MH algorithm. These acceptance rates are very satisfactory, but the computer demands for the ninth order projection approximation is large. In the bottom line, we observe that the performance measure indicates that the order in around 3 is optimal with acceptance rates in range %.
Fig. 6 contains exact aMPR profiles for the deviating cases for and exact aMAP predictors for the truncation and projection approximations of order . Also, the estimated MPR profiles for and MMAP predictor for the correct posterior models are displayed. The aMPR profiles and aMAP predictors based on the projection approximation are closer to the MRP profiles and MMAP predictors for the correct posterior model than the ones based on the truncation approximation. It is particularly so for the Long correlation, Overlapping classes, Wide convolution and Rick convolution cases, that is, for cases with strong spatial coupling between states. The aMPR profiles for and aMAP predictors, which can be exactly assessed by recursive algorithms, are so close to the corresponding characteristics for the correct posterior model that one may question the necessity of a MCMC step. Recall that calculating the posterior density in each MCMC iteration is computer demanding.
We define the following MMAP predictor: from Eq. (18). Posterior realizations, MMAP predictor, 80 % prediction interval and the fitted density are displayed in Fig. 7. We observe that the MMAP predictor captures the class transitions in . The fitted density appear with similar characteristics, such as bimodality and skewness, as . Compared to the true response profile we obtain a root mean squared error (rmse) value of 0.54 in the Base case. In Fig. 8 we display the empirical coverage ratios for various prediction interval, and we observe the coverage ratios to be satisfactory.
In Fig. 9 we present posterior results for for the deviating cases. The MMAP predictors are seen to capture the main characteristics of the true response profile in most of the cases. Also, the 80 % prediction intervals appear to be satisfactory. Coverage ratios for the 80 % prediction intervals and rmse values are given in Tab. 2 for the deviating cases. As expected, the rmse values are higher in the Wide convolution and High noise case. Somewhat surprisingly the rmse values are almost identical in the Base, High smoothness, Long correlation and Ricker convolution cases. In general, the 80 % coverage ratios are satisfactory.
Root mean squared error  Coverage ratio  

High smoothness  0.56  77 % 
Long correlation  0.52  79 % 
Overlapping classes  0.29  89 % 
Wide convolution  0.83  70 % 
High noise  0.75  76 % 
Ricker convolution  0.53  78 % 
To conclude  the projection approximation is clearly preferable to the truncation approximation, particularly for models where the likelihood functions have a high degree of spatial coupling. These models are indeed the ones most challenging to invert. For hard problems with many classes a projection approximation of order and a MCMC step to assess the correct posterior model is recommended. In most cases with a small to moderate number of classes a projection approximation of order about nine will provide aMPR profiles for and aMAP predictors that are so close to the correct MPR profiles and MMAP predictor, that the MCMC step need not be necessary. The MMAP predictor is seen to capture the main characteristics of the true response profile , and the coverage ratios appear to be satisfactory compared to the reference. Bimodality and skewness are present in the posterior model , as desired.
6 Norwegian Sea case study
Lithology/fluid prediction subsurface is a major challenge in reservoir characterization. The variables of interest are the lithology/fluid classes and elastic attributes along a vertical profile penetrating a reservoir unit, and the objective is to predict these variables given a set of seismic amplitudeversusobservations (AVO) observations. We demonstrate the proposed methodology on a case study from the Norwegian Sea. The data set covers a midJurassic gas sandstone reservoir in the Garn and Ile formation, separated by a thick siltyshale formation. The reservoir zones are characterized by a relatively low Pwave velocity and density. We refer to [2] for further details about the reservoir of interest.
We discretize the upscaled region of interest onto a regular grid of size , and we operate in timedomain. Three distinct lithology/fluid classes are identified in a reference solution (Fig. 11), namely, . We define a first order Markov chain prior model for the lithology/fluid classes:
(40) 
having stationary distribution . We note the prior zeroprobability transitions between gas sandstone and brine sandstone due to gravitational sorting [17].
The middlelayer represents the logarithm of the elastic material properties, which are the canonical variables for the seismic AVO observations. These properties are parametrized by the logarithm of pressurewave velocity , the logarithm of shearwave velocity , and the logarithm of the density . The response likelihood model, or rockphysics model [3, 21], is empirically calibrated from an upscaled nearby well. Contour plots for the various are displayed in Fig. 10. Indeed, as seen in the contour plot, the elastic attributes are correlated. An exponential spatial correlation function is also estimated from the nearby well.
The upperlayer d represents angledependent seismic AVO data from near and far angles (Fig. 11). The reflections and wave propagation subsurface are modeled by a convolutional, linearized Zoeppritz version of the wave equation [7]. That is, , where is a convolutional matrix, is the angledependent weakcontrast AkiRichards coefficients [1], and is a differential matrix which calculates contrasts. The covariance matrix in the acquisition model is assumed to be . The signaltonoiseratio is assessed using Eq. (39), and it is estimated to be 2.23, which is comparable to most cases discussed in Section 6.
We consider only the projection approximation for based on the discussion in Section 5. A set of 100,000 realizations is generated from , and the initial 10,000 realizations are discarded as the burnin phase. The similarity measures and the performance measures are given in Table 3. We obtain acceptance rates in the range 0.15  0.35, which is monotonically increasing with except for . As in Section 6, around 23 appear to be a reasonable tradeoff between acceptance rate and computational cost.
Similarity  

measure  0.1612  0.2139  0.2580  0.2812  0.2392  0.3460 
Performance  
measure  0.0537  0.0238  0.0096  0.0035  0.0010  0.0005 
Fig. 12 contains aMPR profiles for and aMMAP predictors for based on the projection approximation. Both the aMPR profiles and the aMMAP predictors appear to be similar and almost identical for . The estimated MPR profiles are displayed together with the MMAP predictor, and we observe that most of their characteristics are shared by the approximate solutions. However, we note that in contrast to the aMMAP predictors, the thin siltyshale layer around 2320 ms is identified in the MMAP. The brine sandstone layer at 2430 ms is captured in the aMAP for low order , but it is not captured for higher order . As seen in the aMPR profiles the marginal probability for brine sandstone is approximately 50 % around 2430 ms, thus it does not influence the proposal density in the MCMC algorithm significantly. The MMAP predictor is observed to capture the main characteristics of the reference profile , however smallscale variability is lost since predictions causes a regression towards the dominating class; shale. We observe that the MMAP predictor is more uncertain in the lowermost part of the the domain of interest based on the MPR profiles. Indeed, this in correspondence with the rockphysics model where we see that the brine sandstone is partly masked by shale whereas gas sandstone is clearly separated from shale (Fig. 10).
Posterior predictions and prediction intervals for the elastic attributes are displayed in Fig. 13. In specific, we observe that we are able to predict low and zones where the reservoirs are located. We display the kernel smoothed timeaveraged histograms at the bottom in Fig. 13 and observe the posterior models for and to resemble the observed smoothed histograms based on the well observations. Root mean squaredd errors for the predictions are for . Coverage ratios for the 80 % prediction intervals are for , which are satisfactory.
As seen in Fig. 13 the width of the prediction intervals varies with time, in specific, the width of the prediction interval in the uppermost gas reservoir is wider than in the lowermost gas reservoir. Fig. 14 displays histograms of at 2320 ms and 2360 ms. The posterior is observed to wider at the former. At 2320 ms the posterior is bimodal, while it is unimodal at 2360 ms. This is in consistent with the MPR profiles displayed in Fig. 12 where the marginal probability for shale is higher in the upper reservoir. That is, the uncertainty in the lithology/fluid classification propagates into the uncertainty of the posterior for the elastic properties.
7 Conclusion
A convolved hidden Markov model extending [18] and [25] is defined inspired by applications in subsurface reservoir prediction. The bottomlayer is a categorical model, and the middlelayer is Gaussian spatial model conditional on the bottomlayer. Hence, the middlelayer appears as an unconditional Gaussian mixture spatial variable. The toplayer, representing the observations, is assumed to be Gaussian conditional on the middlelayer. Prediction of the categorical and Gaussian mixture variables, given the observations in a Bayesian inversion framework, is of interest.
Two classes of likelihood approximations are presented to obtain approximate posterior models on factorial form, which are analytically and efficiently assessed by the recursive Forward Backward algorithm. The first approximation is based on a naïve truncation of the marginal densities, while the second approximation is based on a Gaussian approximation of a Gaussian mixture density. The correct posterior model is thereafter assessed using an approximate posterior model as proposal density in an independent proposal MCMC MH algorithm. The approximations are empirically evaluated on a set of synthetic cases. In general, higher order approximations results in acceptance rates up to around 0.5, at the cost of additional computational resources which increases exponentially. The projection approximation appears with higher acceptance rates in the MCMC algorithm than the truncation approximation. We observe that about two or three appears to be a reasonable tradeoff between accuracy and computational cost.
The MMAP predictor for the middlelayer is verified to reproduce bimodality and skewness in the middle layer in the synthetic examples. Coverage ratios for the prediction intervals are observed to be satisfactory.
The projection approximation have been empirically verified on a subsurface case study to predict both the lithology/fluid classes and elastic material properties reliably.
We observe that for a higher order approximation additional MCMC sampling may not be necessary since the approximate posterior models appear to be very close to the correct posterior model.
The current work should naturally be extended to 2D and 3D models along the lines in [30] and [24]. Preliminary work on real 2D subsurface seismic show encouraging results.
Acknowledgments
This research is part of the Uncertainty in Reservoir Evaluation (URE) activity at the Norwegian University of Science and Technology (NTNU). We thank PGS and Tullow Oil for providing the data.
References
 Aki and Richards [1980] Aki, K. and P. Richards (1980). Quantitative Seismology: Theory and Methods. W. H. Freeman and Co., New York.
 Avseth et al. [2016] Avseth, P., A. Janke, and F. Horn (2016). AVO inversion in exploration  Key learnings from a Norwegian Sea discovery. The Leading Edge 35, 405–414.
 Avseth et al. [2005] Avseth, P., T. Mukerji, and G. Mavko (2005). Quantitative Seismic Interpretation: Applying Rock Physics Tools to Reduce Interpretation Risk. Cambridge University Press.
 Barembruch et al. [2009] Barembruch, S., A. Garivier, and E. Moulines (2009). On Approximate MaximumLikelihood Methods for Blind Identification: How to Cope With the Curse of Dimensionality. IEEE Transactions on Signal Processing 57(11), 4247–4259.
 Baum et al. [1970] Baum, L. E., T. Petrie, G. Soules, and N. Weiss (1970). A Maximization Technique Occurring in the Statistical Analysis of Probabilistic Functions of Markov Chains. The Annals of Mathematical Statistics 41(1), 164–171.
 Besag [1974] Besag, J. (1974). Spatial Interaction and the Statistical Analysis of Lattice Systems. Journal of the Royal Statistical Society. Series B (Methodological) 36(2), pp. 192–236.
 Buland and Omre [2003] Buland, A. and H. Omre (2003). Bayesian linearized AVO inversion. Geophysics 68(1), 185–198.
 Connolly and Hughes [2016] Connolly, P. A. and M. J. Hughes (2016). Stochastic inversion by matching to large numbers of pseudowells. Geophysics 81(2), M7–M22.
 Doyen [2007] Doyen, P. (2007). Seismic reservoir characterization: an earth modelling perspective. Education tour series. EAGE publications.
 Fearnhead and Clifford [2003] Fearnhead, P. and P. Clifford (2003). Online inference for hidden Markov models via particle filters. Journal of the Royal Statistical Society: Series B (Statistical Methodology) 65(4), 887–899.
 FrühwirthSchnatter [2006] FrühwirthSchnatter, S. (2006). Finite Mixture and Markov Switching Models. Springer Series in Statistics. Springer New York.
 Giordani et al. [2007] Giordani, P., R. Kohn, and D. van Dijk (2007). A unified approach to nonlinearity, structural change, and outliers. Journal of Econometrics 137(1), 112 – 133.
 Grana and Della Rossa [2010] Grana, D. and E. Della Rossa (2010). Probabilistic petrophysicalproperties estimation integrating statistical rock physics with seismic inversion. Geophysics 75(3), O21–O37.
 Grana et al. [2017] Grana, D., T. Fjeldstad, and H. Omre (2017). Bayesian Gaussian Mixture Linear Inversion for Geophysical Inverse Problems. Mathematical Geosciences, in press.
 Gunning and Glinsky [2007] Gunning, J. and M. E. Glinsky (2007). Detection of reservoir quality using Bayesian seismic inversion. Geophysics 72(3), R37–R49.
 Hammer and Tjelmeland [2011] Hammer, H. and H. Tjelmeland (2011). Approximate forward–backward algorithm for a switching linear Gaussian model. Computational Statistics & Data Analysis 55(1), 154 – 167.
 Krumbein and Dacey [1969] Krumbein, W. C. and M. F. Dacey (1969). Markov chains and embedded Markov chains in geology. Mathematical Geology 1, 79–96.
 Larsen et al. [2006] Larsen, A. L., M. Ulvmoen, H. Omre, and A. Buland (2006). Bayesian lithology/fluid prediction and simulation on the basis of a Markovchain prior model. Geophysics 71(5), R69–R78.
 Lindberg and Omre [2014] Lindberg, D. V. and H. Omre (2014). Blind Categorical Deconvolution in TwoLevel Hidden Markov Models. IEEE Transactions on Geoscience and Remote Sensing 52(11), 7435–7447.
 Lindberg and Omre [2015] Lindberg, D. V. and H. Omre (2015). Inference of the Transition Matrix in Convolved Hidden Markov Models and the Generalized BaumWelch Algorithm. IEEE Transactions on Geoscience and Remote Sensing 53(12), 6443–6456.
 Mavko et al. [2009] Mavko, G., T. Mukerji, and J. Dvorkin (2009). The Rock Physics Handbook (Second ed.). Cambridge University Press.
 Rabiner [1989] Rabiner, L. R. (1989). A tutorial on hidden Markov models and selected applications in speech recognition. Proceedings of the IEEE 77(2), 257–286.
 Reeves and Pettitt [2004] Reeves, R. and A. N. Pettitt (2004). Efficient recursions for general factorisable models. Biometrika 91, 751–757.
 Rimstad and Omre [2010] Rimstad, K. and H. Omre (2010). Impact of rockphysics depth trends and Markov random fields on hierarchical Bayesian lithology/fluid prediction. Geophysics 75(4), R93–R108.
 Rimstad and Omre [2013] Rimstad, K. and H. Omre (2013). Approximate posterior distributions for convolutional twolevel hidden Markov models. Computational Statistics & Data Analysis 58, 187–200.
 Robert and Casella [2005] Robert, C. P. and G. Casella (2005). Monte Carlo Statistical Methods (Springer Texts in Statistics). Secaucus, NJ, USA: SpringerVerlag New York, Inc.
 Scott [2002] Scott, S. L. (2002). Bayesian Methods for Hidden Markov Models: Recursive Computing in the 21st Century. Journal of the American Statistical Association 97(457), 337–351.
 Tarantola [2005] Tarantola, A. (2005). Inverse Problem Theory and Methods for Model Parameter Estimation. Philadelphia, PA, USA: Society for Industrial and Applied Mathematics.
 Trias et al. [2009] Trias, M., A. Vecchio, and J. Veitch (2009). Delayed rejection schemes for efficient MarkovChain MonteCarlo sampling of multimodal distributions.
 Ulvmoen and Omre [2010] Ulvmoen, M. and H. Omre (2010). Improved resolution in Baeysian lithology/fluid inversion from seismic prestack data and well observations: Part I  Methodology. Geophysics 75(2), R21–R35.
 Viterbi [1967] Viterbi, A. (1967). Error bounds for convolutional codes and an asymptotically optimum decoding algorithm. IEEE Transactions on Information Theory 13(2), 260–269.