
# Skew-Gaussian Random Fields

Kjartan Rimstad and Henning Omre
Department of Mathematical Sciences, Norwegian University of Science and Technology, Trondheim, Norway
###### Abstract

Skewness is present in a wide range of spatial prediction problems, and modeling it in the spatial context remains a challenging problem. In this study a skew-Gaussian random field is considered. The skew-Gaussian random field is constructed by using the multivariate closed skew-normal distribution, which is a generalization of the traditional normal distribution. We present a Metropolis-Hastings algorithm for simulating realizations efficiently from the random field, and an algorithm for estimating parameters by maximum likelihood with a Monte Carlo approximation of the likelihood. We demonstrate and evaluate the algorithms on synthetic cases. The skewness in the skew-Gaussian random field is found to be strongly influenced by the spatial correlation in the field, and the parameter estimators appear to be consistent as the size of the random field increases. Moreover, we use the closed skew-normal distribution in a multivariate random field predictive setting on real seismic data from the Sleipner field in the North Sea.

###### keywords:
Spatial statistics, Skewness, Orthant probabilities, Seismic inversion

## 1 Introduction

Spatial prediction is an important element in many earth science and engineering applications, such as climate studies, geographical and geological sciences, petroleum engineering, mining, and environmental engineering. Usually, data are considered to be a realization from a random field, and focus is on predicting values in unobserved locations or regions. The random field is often assumed to be Gaussian, but histograms of the raw data are frequently skewed, multi-modal, and/or heavy tailed and hence do not appear Gaussian. A common approach to deal with non-Gaussianity is to univariately transform the spatial field into a field with Gaussian marginals and then use Gaussian models, but the transformation is usually not known and the inverse transformation may be difficult to assess (see e.g. De Oliveira et al., 1997; Diggle and Ribeiro, 2007). Moreover, it is difficult to create a joint transformation of the entire random field into a Gaussian random field. An alternative strategy is to consider a non-Gaussian random field model. We follow the latter approach in this study and define a random field which captures skewness.

The univariate skew-normal distribution was introduced in Azzalini (1985), and the multivariate skew-normal distribution in Azzalini and Dalla Valle (1996) and Azzalini and Capitanio (1999). Several authors have generalized the multivariate skew-normal distribution, and many of these generalizations are summarized in Arellano-Valle and Azzalini (2006). The book Genton (2004) provides a detailed overview of a variety of skewed probability distributions. We consider the multivariate closed skew-normal (CSN) distribution introduced in González-Farías et al. (2004a, b). The family of CSN distributions inherits many favorable properties from the multivariate normal distribution; for example, it is closed under marginalization and conditioning.

The skew-normal probability distributions have also previously been cast in a spatial context. In Kim and Mallick (2004) a skew-normal random field using the multivariate skew-normal distribution is defined, and an approach for spatial interpolation is presented. Allard and Naveau (2007) study a random field constructed from the CSN distribution and outline a procedure for spatial prediction. Bayesian spatial prediction for CSN random fields is presented in Karimi and Mohammadzadeh (2011), and Bayesian spatial regression with CSN errors and missing observations is considered in Karimi and Mohammadzadeh (2012). Zhang and El-Shaarawi (2010) present an approach based on the univariate skew-normal distribution, but the model lacks the closure properties which the CSN model exhibits. In the current study we follow Allard and Naveau (2007) and consider the random field constructed from the CSN distribution. This choice is made because the CSN distribution family has several favorable closure properties, which will be exploited for inference and prediction. We extend the study in Allard and Naveau (2007) by using a grid representation which defines an approximately stationary random field, using a slightly different parameterization, estimating model parameters by maximum likelihood, and using the model in a predictive setting with real seismic data from the North Sea.

In order to evaluate the CSN density function we have to calculate high-dimensional orthant probabilities of the normal distribution, which is an extremely computationally demanding task. We present an algorithm inspired by the algorithms presented in Genz (1992) and Genz and Bretz (2009) to calculate these orthant probabilities. Moreover, an efficient method for sampling from the truncated normal distribution is presented. Lastly, we use a Monte Carlo based maximum likelihood estimation approach to assess the model parameters (Geyer and Thompson, 1992). All computations in this study are performed on a laptop computer (Intel Core i7 CPU, 8 GB memory).

## 2 Model

The multivariate CSN distribution is defined in González-Farías et al. (2004a) as a generalization of the multivariate normal distribution. Let $U$ be multivariate normally distributed, using the notation:

$$
U = \begin{pmatrix} U_1 \\ U_2 \end{pmatrix} \sim N_{p+q}\!\left[\begin{pmatrix} \mu_1 \\ \mu_2 \end{pmatrix},\ \begin{pmatrix} \Sigma_1 & \Sigma_{12} \\ \Sigma_{21} & \Sigma_2 \end{pmatrix}\right], \tag{1}
$$

where $\mu_1 \in \mathbb{R}^p$, $\mu_2 \in \mathbb{R}^q$, $\Sigma_1$ is a $p \times p$ covariance matrix, $\Sigma_2$ is a $q \times q$ covariance matrix, $\Sigma_{12} = \Sigma_{21}^{T}$ is a $p \times q$ cross-covariance matrix, and $N_{p+q}(\mu, \Sigma)$ denotes the $(p+q)$-dimensional multivariate normal distribution with expectation vector $\mu$ and covariance matrix $\Sigma$. Then $X = [U_1 \mid U_2 \le 0]$ is defined to be CSN distributed, denoted $X \sim \mathrm{CSN}_{p,q}(\mu_1, \mu_2, \Sigma_1, \Sigma_2, \Sigma_{12})$. Here the notation $U_2 \le 0$ corresponds to all elements in $U_2$ being jointly negative. The corresponding probability density function (pdf) of the CSN distribution is

$$
f_{p,q}(x;\mu_1,\mu_2,\Sigma_1,\Sigma_2,\Sigma_{12}) = \phi_p(x;\mu_1,\Sigma_1)\,\frac{\Phi_q\!\left(0;\,\mu_2+\Sigma_{21}\Sigma_1^{-1}(x-\mu_1),\ \Sigma_2-\Sigma_{21}\Sigma_1^{-1}\Sigma_{12}\right)}{\Phi_q(0;\mu_2,\Sigma_2)}, \tag{2}
$$

where $\phi_p(\cdot;\mu,\Sigma)$ is the multivariate normal pdf and $\Phi_q(\cdot;\mu,\Sigma)$ is the multivariate normal cumulative distribution function (cdf). For $\Sigma_{12}$ being a matrix of zeros, the CSN projects into the multivariate normal distribution. For $q=1$ and $\mu_2 = 0$ the CSN is identical to the multivariate skew-normal distribution as defined in Azzalini and Dalla Valle (1996). The class of CSN distributions inherits important properties from the multivariate normal distribution, such as being closed under marginalization, conditioning, and linear transformations (González-Farías et al., 2004a).
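For small $p$ and $q$, Expression 2 can be evaluated directly with standard numerical routines. The following sketch is illustrative only: the function and argument names are ours, and SciPy's `multivariate_normal` is assumed as a stand-in for the Genz-type algorithms the paper uses at realistic dimensions.

```python
import numpy as np
from scipy.stats import multivariate_normal

def csn_pdf(x, mu1, mu2, S1, S2, S12):
    """Evaluate the CSN density of Expression 2:
    phi_p(x) * Phi_q(0; conditional moments) / Phi_q(0; mu2, S2)."""
    x, mu1, mu2 = map(np.atleast_1d, (x, mu1, mu2))
    S1, S2, S12 = map(np.atleast_2d, (S1, S2, S12))
    S21 = S12.T
    phi = multivariate_normal(mean=mu1, cov=S1).pdf(x)
    cond_mean = mu2 + S21 @ np.linalg.solve(S1, x - mu1)
    cond_cov = S2 - S21 @ np.linalg.solve(S1, S12)
    num = multivariate_normal(mean=cond_mean, cov=cond_cov).cdf(np.zeros(len(mu2)))
    den = multivariate_normal(mean=mu2, cov=S2).cdf(np.zeros(len(mu2)))
    return phi * num / den
```

With $\Sigma_{12} = 0$ the cdf ratio is one and the density reduces to the normal pdf; with $p = q = 1$ and $\mu_2 = 0$ it reduces to Azzalini's univariate skew-normal, which gives a quick consistency check.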

We work in a spatial random field context and let $\{X(s) : s \in D\}$ be a real-valued random field, where $D$ is a finite domain in $\mathbb{R}^n$ and $s$ is a generic location in $D$. The random field is a Gaussian random field if for all configurations of locations $(s_1, \ldots, s_p) \in D$ and all $p$ the pdf of $(X(s_1), \ldots, X(s_p))$ is a multivariate normal distribution.

Consider the bivariate Gaussian random field

$$
\left\{ U(s) = \begin{pmatrix} U_1(s) \\ U_2(s) \end{pmatrix} :\; s \in D \right\}. \tag{3}
$$

Then the random field defined by

$$
\{\, Y(s) = [\, U_1(s) \mid U_2(s') \le 0 : s' \in D \,] : s \in D \,\}, \tag{4}
$$

is a skewed Gaussian random field. One particular case occurs when $\{U_1(s)\}$ and $\{U_2(s)\}$ are independent; then $\{Y(s)\}$ is a Gaussian random field. Moreover, for the extreme case with full dependence between $\{U_1(s)\}$ and $\{U_2(s)\}$, we have that $\{Y(s)\}$ is a truncated Gaussian random field. When the bivariate $\{U(s)\}$ is a stationary Gaussian random field, then $\{Y(s)\}$ is also a stationary random field, except for border effects.

The random field defined in Expression 4 is difficult to handle in practice; therefore, we consider a CSN random field as defined in Allard and Naveau (2007). The random field is defined by first specifying a discretization $\{s'_1, \ldots, s'_q\} \subset D$ with $q$ finite and fixed, and letting $U_2 = (U_2(s'_1), \ldots, U_2(s'_q))$. Then we define the associated CSN random field as

$$
\{\, X(s) = [\, U_1(s) \mid U_2 \le 0 \,] : s \in D \,\}, \tag{5}
$$

if for all configurations of locations $(s_1, \ldots, s_p) \in D$ and all $p$ the pdf of $(X(s_1), \ldots, X(s_p))$ is CSN distributed. Or equivalently, if the Gaussian random field $\{U_1(s)\}$ and $U_2$ are jointly Gaussian, then $\{X(s)\}$ is a CSN random field.
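The conditional definition in Expression 5 suggests a direct, if inefficient, way to draw exact realizations for small $q$: sample the joint Gaussian vector and accept it whenever the latent part is jointly negative. A minimal sketch (function and argument names are ours; the acceptance rate decays quickly with $q$, which is why the MH algorithm of Appendix A is needed for realistic grids):

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_csn_rejection(mu1, mu2, S1, S12, S2, n=1):
    """Exact sampler for X = [U1 | U2 <= 0] by rejection; feasible for small q."""
    p, q = len(mu1), len(mu2)
    mu = np.concatenate([mu1, mu2])
    S = np.block([[S1, S12], [S12.T, S2]])
    out = []
    while len(out) < n:
        u = rng.multivariate_normal(mu, S)
        if np.all(u[p:] <= 0.0):        # accept when all latent components are negative
            out.append(u[:p])
    return np.array(out)
```

With $\Sigma_{12} = 0$ the truncation has no effect and the samples are plain Gaussian; a negative cross-covariance with truncation below zero shifts the accepted $U_1$ values upward, producing positive skewness.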

Note that even when $\{U(s)\}$ in Expression 3 is a stationary Gaussian random field, the CSN random field as defined in Expression 5 is not stationary in the traditional sense. The non-stationarity is a consequence of the conditioning locations $s'_1, \ldots, s'_q$ being finite and fixed. For example, when $s$ is far from all $s'_i$, and hence $U_1(s)$ and $U_2$ are weakly correlated, the marginal distribution of $X(s)$ will be approximately Gaussian.

The random field has some stationarity properties, however; for example, for $\{s'_1, \ldots, s'_q\}$ being a regular grid discretization over $D$, the random field is approximately stationary in the discretization locations sufficiently far away from the border. In addition, the CSN random field is approximately stationary when the discretization in some sense is dense compared to the correlation range of the random field. The CSN random field can also be interpreted in a Bayesian setting, with the truncated random field seen as a latent random field (Liseo and Loperfido, 2003).

The skew-Gaussian random field introduced in Kim and Mallick (2004) is a special case of the CSN random field defined above with $q = 1$. By choosing $q = 1$, we obtain the CSN random field

$$
\{\, X(s) = U_1(s) + \mathrm{Cov}(U_1(s), U_2)\,\Sigma_2^{-1}\left([\, U_2 \mid U_2 \le 0 \,] - \mu_2\right) : s \in D \,\}, \tag{6}
$$

where $\{U_1(s)\}$ is a Gaussian random field and $[U_2 \mid U_2 \le 0]$ is a zero-truncated Gaussian random variable. If we assume that $\{U_1(s)\}$ is independent of $U_2$, it is easy to see that this random field behaves like a Gaussian random field with a skewed mean (Allard and Naveau, 2007; Zhang and El-Shaarawi, 2010), and the skewness will only be identified through repeated sampling from the random field. This effect is also observed for multivariate t-fields, where each realization behaves like a Gaussian random field, and the heavy tails will only be identifiable through repeated sampling from the random field (see e.g. Røislien and Omre, 2006). Another consequence of each sample behaving like a Gaussian random field is that the skew-Gaussian random field model in Kim and Mallick (2004) cannot be identified from a single realization (Genton and Zhang, 2012).

An additional problem with the skew-Gaussian random field introduced in Kim and Mallick (2004) is that $\{U_1(s)\}$ contains the spatial variability while the scalar $U_2$ carries the skewness. If we fix the correlation structure of $\{U_1(s)\}$ and assume that $\{U_1(s)\}$ has short correlation length relative to the size of $D$, the correlation between $U_1(s)$ and $U_2$ has to be small for the joint covariance matrix to remain valid. Thus the skewness in $\{X(s)\}$ will be small. This correlation-versus-skewness trade-off is also discussed in Azzalini and Dalla Valle (1996). Evidently, similar problems will appear for the CSN random field when $p$ is large compared to $q$. In Karimi and Mohammadzadeh (2011, 2012) $q = 1$ is used in the prediction part; thus each realization of the random field cannot contain a high degree of skewness.

In the current study we consider a two-dimensional approximately stationary CSN random field defined on a regular grid over $D$, which has the distribution in Expression 2. We choose to use a parsimonious model with few parameters such that we are able to estimate the parameters from one realization of the random field. The model should, however, be sufficiently flexible such that it is able to describe different levels of skewness. We use $q = p$, where $U_1 = (U_1(s_1), \ldots, U_1(s_p))$ and $U_2 = (U_2(s_1), \ldots, U_2(s_p))$, with location parameters $\mu_1 = \mu \mathbf{1}$ and $\mu_2 = \nu \mathbf{1}$, where $\mathbf{1}$ is a $p$-vector of ones. The covariance structure is defined to be on the form

$$
\Sigma = \begin{bmatrix} \sigma^2 C & -\gamma\sigma C \\ -\gamma\sigma C & (1-\gamma^2)I_p + \gamma^2 C \end{bmatrix}, \tag{7}
$$

where $\sigma$ is a scale parameter, $\gamma$ is a skewness parameter, $I_p$ is a $p$-dimensional identity matrix, and $C$ is a stationary correlation matrix with an exponential correlation function of the distance $h$ between two spatial locations, where $d_h$ and $d_v$ are horizontal and vertical range parameters, respectively. The parameterization structure of $\Sigma$ and the restrictions on the parameters ensure positive semidefiniteness of $\Sigma$. The new parameterization of Expression 2 becomes

$$
f_{p,p}(x;\mu,\nu,\sigma^2,\gamma,d_h,d_v) = \phi_p(x;\mu\mathbf{1},\sigma^2 C)\,\frac{\prod_{i=1}^{p}\Phi_1\!\left(0;\,\nu - \frac{\gamma}{\sigma}(x_i-\mu),\ 1-\gamma^2\right)}{\Phi_p\!\left(0;\,\nu\mathbf{1},\ (1-\gamma^2)I_p+\gamma^2 C\right)}, \tag{8}
$$

which is equivalent to $X \sim \mathrm{CSN}_{p,p}(\mu\mathbf{1}, \nu\mathbf{1}, \sigma^2 C, (1-\gamma^2)I_p + \gamma^2 C, -\gamma\sigma C)$. The factorization in the numerator is a consequence of the identity-matrix term in Expression 7. The parameters in the model are $\mu$, $\nu$, $\sigma^2$, $\gamma$, $d_h$, and $d_v$, and the constraints $\sigma^2 > 0$ and $-1 \le \gamma \le 1$ ensure that $\Sigma$ is positive semidefinite and hence a valid covariance matrix. Note that there is only one multivariate normal cdf in Expression 8. The marginal distribution for $X_j = X(s_j)$ is

$$
\begin{aligned}
f_{1,p}(x_j;\mu,\nu,\sigma^2,\gamma,d_h,d_v) ={}& \phi_1(x_j;\mu,\sigma^2)\,\Phi_1\!\left(0;\,\nu-\tfrac{\gamma}{\sigma}(x_j-\mu),\ 1-\gamma^2\right) \\
&\times \frac{\displaystyle\int \phi_{p-1}(x_{-j}\mid x_j;\,\mu\mathbf{1},\sigma^2 C)\prod_{i=1,\,i\ne j}^{p}\Phi_1\!\left(0;\,\nu-\tfrac{\gamma}{\sigma}(x_i-\mu),\ 1-\gamma^2\right)dx_{-j}}{\Phi_p\!\left(0;\,\nu\mathbf{1},\ (1-\gamma^2)I_p+\gamma^2 C\right)},
\end{aligned} \tag{9}
$$

where $x_{-j}$ denotes the vector $x$ with element $j$ removed and $\phi_{p-1}(x_{-j} \mid x_j; \mu\mathbf{1}, \sigma^2 C)$ is the conditional distribution of $X_{-j}$ given $X_j = x_j$. This marginal distribution is not equal to the univariate skew-normal distribution introduced in Azzalini (1985). The marginal pdf in Expression 9 depends on the complete grid design; hence all parameter inference and predictions must be made with reference to one common design. This is not so for Gaussian random fields, where the marginal pdfs are independent of the spatial correlation function. These important features of CSN random fields are not discussed in Allard and Naveau (2007). We will later in this study see that the coupling between $x_j$ and $x_{-j}$ reduces the maximum skewness we can obtain in the marginal distribution compared to the corresponding skewness in the univariate skew-normal distribution introduced in Azzalini (1985). The maximum skewness depends on the spatial coupling of the grid nodes, and increased coupling decreases the maximum skewness. Increased coupling can be caused by either a denser grid or stronger spatial correlation, or both. Moreover, the random field is stationary except for border effects. The parameterization in Expression 7 is similar, but not identical, to the parameterization in Allard and Naveau (2007), where the full covariance matrix is of the form
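The degenerate white-noise case $C = I_p$ can be checked by brute force: the numerator of Expression 8 then factorizes node by node, and the marginal coincides with the univariate skew-normal of Azzalini (1985), the level of skewness that spatial coupling subsequently erodes. A small rejection-sampling check (our own construction, with $\gamma = 0.9$ chosen for illustration):

```python
import numpy as np
from scipy.stats import skew

rng = np.random.default_rng(1)

# White-noise CSN field per Expression 7 with C = I, mu = nu = 0, sigma = 1;
# then Sigma_2 = (1 - g^2) I + g^2 I = I and rejection sampling is cheap.
p, g = 4, 0.9
S = np.block([[np.eye(p), -g * np.eye(p)],
              [-g * np.eye(p), np.eye(p)]])
u = rng.multivariate_normal(np.zeros(2 * p), S, size=400_000)
x = u[np.all(u[:, p:] <= 0.0, axis=1), 0]   # keep U1 at node 1 when all U2 <= 0

# With C = I the marginal is univariate skew-normal with delta = g, so the
# sample skewness should match the closed-form skew-normal skewness:
b = np.sqrt(2.0 / np.pi)
theory = (4.0 - np.pi) / 2.0 * (b * g) ** 3 / (1.0 - b ** 2 * g ** 2) ** 1.5
```

Replacing $I_p$ by a correlated $C$ in the same construction reproduces the coupling effect discussed above: per the paper's finding, the marginal skewness then falls below this white-noise value.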

$$
\begin{bmatrix} \sigma^2 C & -\gamma\sigma^2 C \\ -\gamma\sigma^2 C & (1+\gamma^2)\sigma^2 C \end{bmatrix}. \tag{10}
$$

Note that for this parameterization there are no restrictions on $\gamma$, but we will later in this study argue that our parameterization is usually able to capture a higher degree of skewness than the model in Allard and Naveau (2007).

### 2.1 Simulation study

In this section we explore the properties of the CSN random field defined above for six sets of parameter values, one base case, case 1, and five deviating cases. We are particularly concerned about the ability to represent skewness in the marginal pdf and its dependence on spatial coupling. The parameter values of the six cases are summarized in Table 1, and we present grid random fields for the six parameter cases.

A Metropolis-Hastings (MH) algorithm is used to sample from the distributions. The algorithm is summarized in Appendix A, and it uses the importance sampler from Genz (1992) as a block proposal distribution in the update steps. The block size used in the MH-algorithm is chosen to give a suitable acceptance rate for most of the parameter cases we present. It takes a couple of minutes on a laptop computer to sample one realization of the CSN random field with the MH-algorithm implemented in C. The burn-in and mixing appear satisfactory and are not displayed.

Figure 1 displays the results from the six cases. The marginal distributions in the center location of the random field are presented in the first column. Normal distributions, with the first two moments identical to the CSN marginal distribution, are also displayed. The second column displays quantile-quantile plots of the marginal CSN distribution versus the normal distribution. One arbitrary realization from the CSN random field is presented in the last column.

The first row in Figure 1 displays the base case of a CSN random field with isotropic spatial correlation and reasonably strong correlation with the hidden truncated random field. Note that some skewness in the marginal distribution is visible, but it is not very pronounced. In the second row a CSN random field without spatial correlation, i.e. a white-noise random field, with otherwise identical parameters, is displayed. We observe that the removal of spatial correlation increases the skewness in the random field.

The integral in Expression 9 produces the spatial coupling effect, since increased correlation of the variables makes the mode of the truncated pdf move away from the truncation border and hence appear more normal-like. Support S1 contains an illustrative example of the effect. This effect is similar to the correlation-versus-skewness effect discussed in Azzalini and Dalla Valle (1996). The model in Allard and Naveau (2007) will usually have higher correlation in the truncated field than our model, due to the lack of the identity-matrix term in Expression 10 compared to Expression 7. Thus, the skewness in their model will be reduced compared to the skewness in our model.

In case 3, the third row of Figure 1, the spatial correlation range parameters are increased, with otherwise identical parameters to case 1, which reduces the skewness even further. In the fourth row the correlation parameter $\gamma$ between the observable and the hidden truncated random field is increased. Note that a value of $\gamma$ close to unity represents a random field that is close to a truncated Gaussian random field. The increase in $\gamma$ causes somewhat higher skewness in the marginal distribution. The fifth row presents results from a case where the truncation of the latent random field appears further out in the tail. This change of conditioning causes only minor changes in the marginal distribution compared to the base case. The last row in Figure 1 presents the case with spatially anisotropic correlation, with correlation only in the horizontal direction, and otherwise identical parameters to case 1. More skewness in the marginal distribution is obtained compared to case 1, where correlation is present in both directions. This spatially anisotropic correlation structure is similar to the one estimated for the seismic data case studied later in this paper.

The study shows that it is difficult to obtain a high degree of skewness in the marginal distribution of CSN random fields due to the spatial coupling effect. This lack of skewness is unfortunate and reduces the relevance of the family of CSN random fields. Moreover, spatial coupling complicates model parameter estimation by a maximum likelihood criterion.

## 3 Parameter estimation

Parameter estimation for CSN random fields poses challenging numerical problems, since the probability density function and moments (see Allard and Naveau, 2007) are functions of multivariate normal cdfs. In Allard and Naveau (2007) a method of moments estimation is discussed, while a weighted method of moments estimation approach is used in Flecher et al. (2009). The method of moments estimators are particularly computationally demanding due to the frequent appearance of the multivariate normal cdfs in the moments. In the current study we use a maximum likelihood estimator. The multivariate normal cdf also appears in the normalizing constant in the likelihood, but the total number of evaluations will usually be smaller than in a method of moments estimation procedure. From the previous section we learned that the differences between case 1 and case 5 are small. The only difference between the two cases is the value of $\nu$; therefore, we choose to fix $\nu = 0$. In this section we only consider isotropic random fields and let $d = d_h = d_v$. The log-likelihood is then

$$
\begin{aligned}
l(\mu,\sigma^2,\gamma,d;x) = \log L(\mu,\sigma^2,\gamma,d;x) ={}& \log\phi_p(x;\mu\mathbf{1},\sigma^2 C) + \sum_{i=1}^{p}\log\Phi_1\!\left(0;\,-\tfrac{\gamma}{\sigma}(x_i-\mu),\ 1-\gamma^2\right) \\
&- \log\Phi_p\!\left(0;\,\mathbf{0},\ (1-\gamma^2)I_p+\gamma^2 C\right),
\end{aligned} \tag{11}
$$

with $C$ being the stationary correlation matrix as defined previously, with $d = d_h = d_v$. We only need to compute the multivariate normal cdf once for each likelihood evaluation, while the optimization requires sequential likelihood computations. Note that for our model the challenging last term in Expression 11 only depends on the parameters $\gamma$ and $d$, not on $\mu$ and $\sigma^2$, which simplifies the computations somewhat.
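For small $p$, Expression 11 can be evaluated directly, with SciPy's quasi-Monte Carlo normal cdf standing in for the importance-sampling estimator used in the paper. A minimal sketch (function and argument names are ours):

```python
import numpy as np
from scipy.stats import multivariate_normal, norm

def csn_loglik(x, mu, sigma2, gamma, C):
    """Log-likelihood of Expression 11 (nu fixed at 0, |gamma| < 1),
    for a supplied correlation matrix C; feasible for small p only."""
    p = len(x)
    sigma = np.sqrt(sigma2)
    l1 = multivariate_normal(mean=mu * np.ones(p), cov=sigma2 * C).logpdf(x)
    l2 = np.sum(norm.logcdf(0.0, loc=-(gamma / sigma) * (x - mu),
                            scale=np.sqrt(1.0 - gamma ** 2)))
    S2 = (1.0 - gamma ** 2) * np.eye(p) + gamma ** 2 * C
    l3 = multivariate_normal(mean=np.zeros(p), cov=S2).cdf(np.zeros(p))
    return l1 + l2 - np.log(l3)
```

For $\gamma = 0$ the second and third terms cancel exactly (both reduce to $p \log 0.5$), and the log-likelihood collapses to the Gaussian one, which is a convenient unit test.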

The model can be considered as a missing data model (Little and Rubin, 1987), with the latent truncated variable $U_2$ unobserved, and the expectation-maximization (EM; Dempster et al., 1977) algorithm could be used to obtain the maximum likelihood estimate. Usually the EM-algorithm has slow convergence, and we cannot calculate the expectation step analytically; thus a Monte Carlo EM-algorithm (Wei and Tanner, 1990) has to be applied, which is a computational burden. Direct maximization of the likelihood requires estimates of multivariate normal cdfs, which can be obtained by Monte Carlo methods; this is the approach chosen in this paper. Our experience is that this approach normally converges faster than the alternatives discussed above.

Importance sampling provides a simple method for estimating the orthant probability in Expression 11, and Monte Carlo optimization can be used to maximize the likelihood (Geyer, 1996). In our study we approximate these orthant probabilities by using the importance sampling method described in Genz (1992) and Genz and Bretz (2009); the importance sampling algorithm is summarized in Appendix B. By using the same set of uniform random variables for each likelihood evaluation we ensure that the approximated likelihood is smooth; thus we are able to use standard optimization routines. The Monte Carlo errors in the Monte Carlo maximum likelihood estimates are evaluated by doing an ensemble of optimizations with different Monte Carlo likelihood approximations, i.e. by using different sets of random numbers.
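The idea of reusing one fixed set of uniforms can be made concrete. The following is a compact, vectorized rendering of the Genz (1992) separation-of-variables estimator for the orthant probability $\Phi_d(0; 0, \Sigma)$ (our own simplification, not the paper's implementation); because the uniforms `W` are held fixed, the returned estimate varies smoothly with $\Sigma$ and can be handed to a standard optimizer:

```python
import numpy as np
from scipy.stats import norm

def genz_orthant_prob(Sigma, W):
    """Separation-of-variables estimate of P(Y <= 0) for Y ~ N(0, Sigma).
    W holds fixed uniforms, shape (n_samples, d - 1); reusing the same W
    across calls makes the estimate a smooth function of Sigma."""
    L = np.linalg.cholesky(Sigma)
    d = Sigma.shape[0]
    n = W.shape[0]
    y = np.zeros((n, d - 1))
    e = np.full(n, 0.5)                 # Phi(0 / L[0, 0]) = 0.5 for the first variable
    f = e.copy()
    for i in range(1, d):
        y[:, i - 1] = norm.ppf(W[:, i - 1] * e)   # conditional draw via inverse cdf
        e = norm.cdf(-(y[:, :i] @ L[i, :i]) / L[i, i])
        f *= e
    return f.mean()
```

For independent components the estimator is exact ($2^{-d}$ for every sample), and for equicorrelated cases it can be checked against the classical arcsine formulas for orthant probabilities.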

As discussed in Azzalini (1985), the information matrix for the original parameterization of the univariate skew-normal distribution becomes singular as the skewness parameter goes to zero, but this singularity problem can be solved by reparameterizing the model (Azzalini, 1985; Azzalini and Capitanio, 1999). The singularity in the information matrix becomes a problem in numerical optimization with Newton methods. We used the original parameterization of the CSN random field, but we used some steps with a Nelder-Mead simplex method before we used an interior-reflective Newton method in MATLAB to maximize the likelihood. By following this approach we did not notice any abnormalities in the optimization procedure.

The likelihood function is, however, generally not a convex function of the parameters. Thus to find the global optimum we start the optimization at multiple starting points and choose the values of the parameters that give the largest value of the likelihood function. In our simulation study we did not encounter any problem in identifying the dominant mode.
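The multistart strategy itself is simple to sketch. The helper below is generic and our own (shown on a toy bimodal objective rather than the CSN likelihood): run a local optimizer from several starting points and keep the best local optimum.

```python
import numpy as np
from scipy.optimize import minimize

def multistart_minimize(objective, starts):
    """Run a local optimizer from several starting points and keep the
    best local optimum, as a guard against multimodal objectives."""
    results = [minimize(objective, x0, method="Nelder-Mead") for x0 in starts]
    return min(results, key=lambda r: r.fun)

# Toy bimodal objective: global minimum near x = 2, local minimum near x = -2.
f = lambda x: -(np.exp(-(x[0] - 2.0) ** 2) + 0.5 * np.exp(-(x[0] + 2.0) ** 2))
best = multistart_minimize(f, [np.array([-3.0]), np.array([0.0]), np.array([3.0])])
```

A start at $-3$ converges to the local optimum near $-2$; the comparison over starts recovers the global optimum near $2$.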

### 3.1 Empirical study

In this section we estimate parameters from realizations of the CSN random field from the base case in the previous section, i.e. case 1 in Table 1. We consider the parameter estimates as a function of the dimension of the random field and the number of Monte Carlo samples used to assess the orthant probability. We want to evaluate the size of the random field needed and the computational demands required to get proper estimates. The computing time is typically one minute on the laptop computer for estimating one set of parameters.

Figure 2 displays the parameter estimates as a function of the number of Monte Carlo points. The maximum likelihood estimates are plotted for eight different Monte Carlo approximations of the likelihood, i.e. we use eight sets of independent random numbers in the approximation of the likelihood. We use the same realization of the random field for all eight Monte Carlo likelihood approximations. The variations among these eight estimates illustrate the Monte Carlo likelihood approximation error in the parameter estimates. The dashed lines are prediction intervals, computed with variances taken as the diagonal elements of the inverse Hessian of the log-likelihood function evaluated at its maximum. Figure 2 illustrates that the Monte Carlo error is significantly smaller than the prediction intervals, that the Monte Carlo error increases with higher dimension $p$, and that the inverse-Hessian variance decreases with higher dimension $p$. Hence we need a higher number of Monte Carlo points for increasing $p$, which is expected since we need to compute orthant probabilities of higher dimensions.

Figure 3 contains the distribution of the maximum likelihood estimates randomized over realizations from the base case random field, for increasing sizes of the random field. The number of Monte Carlo samples is held constant, and we assume that the Monte Carlo likelihood approximation error is negligible. Note that the estimated pdf for $\hat\gamma$ has non-zero mass at the boundary, but these boundary values are not “unacceptable” values, as discussed in Azzalini and Capitanio (1999) and Azzalini (2005), as they represent a truncated Gaussian random field in the same way as $\gamma = 0$ represents a Gaussian random field. The figure also displays that the occurrence of boundary estimates decreases with higher $p$, as discussed in Liseo (1990). The maximum likelihood estimates in Figure 3 are not unbiased, but the estimators appear consistent since the biases and variances shrink with increasing size of the random field.

## 4 Seismic inversion of data from the North Sea

In this section we use the CSN random field model in a predictive setting. We consider inversion of seismic amplitude-versus-offset (AVO) data into elastic material properties (pressure-wave velocity, shear-wave velocity, density) in the subsurface, which is a major challenge in the modeling of hydrocarbon reservoirs. The seismic AVO data are measurements from the Sleipner Øst field, a gas condensate field in the southern part of the North Sea. The depth of the reservoir is in the range from 2270 to 2500 meters sub-sea. We have seismic AVO data from a 2D profile and observations of the elastic material properties from one well drilled through the reservoir, see Figure 4.

In Buland and Omre (2003) the seismic inversion is cast in a Bayesian predictive setting with a Gaussian prior on the logarithm of the elastic material properties and a Gauss-linear likelihood for the seismic observations. The methodology is illustrated on data from the Sleipner Øst field. To justify the use of a Gauss-linear model, the prior model for the logarithm of the elastic material properties has to be assumed Gaussian, but this assumption does not fit the observations from an available well particularly well. Data from the same area are also considered in Karimi et al. (2010), where a CSN model with a pseudo-likelihood is used on a 1D profile along one well. The pseudo-likelihood approach will suffer from instabilities in the parameter inference, especially in a 2D setting where the spatial coupling is more prominent due to strong horizontal correlation. In this section we consider the full 2D profile of the seismic data and aim at predicting the associated elastic material properties. We use the CSN random field presented above as the prior model for the 2D profile of the logarithm of the elastic material properties.

The seismic AVO data are collected by a seismic survey, which is an active acoustic data acquisition technique. Explosions are fired at several locations at the surface, and reflections from a grid covering the subsurface are collected for a set of reflection angles. The data represent angle-dependent seismic AVO data for three angles at each node of a grid covering the 2D profile; hence the dimension of the seismic AVO data is three times the number of grid nodes.

The variable of interest $m$ represents the logarithm of the elastic material properties on the grid covering the 2D profile; hence the dimension of $m$ is also three times the number of grid nodes. The logarithmic transformation is used to get a linear relationship between the variables of interest $m$ and the seismic data $d$. The elastic material properties are also observed along the well trace, see Figure 4. The observations in the well are assumed to be exact, and they are displayed in Figure 5. The elastic material properties are centered around linear vertical trends, and their histograms are displayed in Figures 5(b) and 5(c). The histograms appear skewed, or may even be bimodal.

The link between the observations and the elastic material properties, termed the seismic likelihood model $p(d \mid m)$, is defined by a weak-contrast, convolutional, linearized Zoeppritz model (Buland and Omre, 2003). The convolutional forward model is defined by the matrix $G = WAD$, where $W$ is a convolutional matrix defined by the wavelet kernels presented in Support S2, $A$ is a matrix of angle-dependent weak-contrast Aki-Richards coefficients (Aki and Richards, 1980), and $D$ is a differential matrix which calculates contrasts. This forward matrix represents the physics of the wave reflections: the reflection depends on the contrasts in the material properties ($D$), it is angle dependent ($A$), and wave propagation is a diffusive process ($W$). The model is $d = Gm + e$, where $e$ is assumed to be a colored Gaussian error term with zero mean and covariance matrix $\Sigma_e$. The covariance matrix is parameterized as a Kronecker product $\Sigma_e = \sigma_e^2\, C_w \otimes C_h \otimes C_v$, where $\otimes$ denotes the Kronecker product, $\sigma_e^2$ is the error variance, $C_w$ is a wavelet correlation matrix, $C_h$ is a horizontal correlation matrix, and $C_v$ is a vertical correlation matrix. These matrices are parameterized by exponential correlation functions with parameters $d_{we}$, $d_{he}$, and $d_{ve}$, respectively. Thus the likelihood is Gaussian with mean $Gm$ and covariance $\Sigma_e$.
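The structure of the forward matrix can be illustrated for a single trace, a single angle, and a single material property, with a scalar placeholder standing in for the Aki-Richards coefficient matrix (a toy sketch of our own, not the paper's implementation):

```python
import numpy as np

def difference_matrix(n):
    """D: first-order contrasts along a depth trace (last row left zero)."""
    D = np.zeros((n, n))
    i = np.arange(n - 1)
    D[i, i], D[i, i + 1] = -1.0, 1.0
    return D

def convolution_matrix(wavelet, n):
    """W: banded matrix that convolves a reflectivity trace with the wavelet."""
    k = len(wavelet) // 2
    W = np.zeros((n, n))
    for row in range(n):
        for j, w in enumerate(wavelet):
            col = row + j - k
            if 0 <= col < n:
                W[row, col] = w
    return W

# Single-angle, single-property sketch of d = G m + e with G = W a D,
# where the scalar a stands in for one Aki-Richards coefficient.
m = np.log(np.array([2.0, 2.0, 2.3, 2.3, 2.1]))   # toy log-property trace
wavelet = np.array([0.25, 1.0, 0.25])             # toy symmetric wavelet
a = 0.5
G = convolution_matrix(wavelet, 5) @ (a * difference_matrix(5))
d_noisefree = G @ m
```

The composition mirrors the text: `difference_matrix` produces the contrasts, the scalar weights them angle-dependently, and `convolution_matrix` smears them with the wavelet, which is why `G` is ill-conditioned and rank-deficient.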

The objective is to predict the elastic variables $m$ from the observed seismic AVO data $d$. The variables of interest, $m$, are high-dimensional, and so are the observations, $d$. The forward matrix $G$ is ill-conditioned and rank-deficient, however. Moreover, there is a colored error term, $e$, in the observations. Hence $m$ cannot be uniquely determined from $d$. We cast this prediction in a Bayesian setting; hence the posterior distribution is the objective of the study,

$$
p(m \mid d) = \mathrm{const} \times p(d \mid m)\, p(m), \tag{12}
$$

where const is a normalizing constant and $p(m)$ is the prior distribution of $m$, which must be defined.
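For the Gaussian comparison model considered later in this section, Expression 12 has a closed form. The following generic helper (our own, with hypothetical names) computes the posterior mean and covariance for a Gauss-linear model $d = Gm + e$; under the CSN prior the Gaussian part is handled analogously, with the truncation adding the cdf factors of Expression 2.

```python
import numpy as np

def gaussian_posterior(G, d, mu_m, S_m, S_e):
    """Posterior mean and covariance of m | d for d = G m + e with
    Gaussian prior m ~ N(mu_m, S_m) and error e ~ N(0, S_e)."""
    S_d = G @ S_m @ G.T + S_e                  # marginal covariance of d
    K = S_m @ G.T @ np.linalg.inv(S_d)         # gain matrix
    mu_post = mu_m + K @ (d - G @ mu_m)
    S_post = S_m - K @ G @ S_m
    return mu_post, S_post
```

As a sanity check, with $G = I$ and near-zero observation noise the posterior mean collapses onto the data and the posterior variance vanishes.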

Let the prior model for $m$ be a stationary CSN random field of the form previously defined, extended to the tri-variate case. The location parameter is $\mu_{0m} \otimes \mathbf{1}$, where $\mu_{0m}$ is the vector of location levels of the three log-elastic properties. The covariance matrix is parameterized as $\Sigma_m = \Sigma_{0m} \otimes C_h \otimes C_v$, where $\Sigma_{0m}$ is the $3 \times 3$ inter-variable covariance matrix, $C_h$ is a horizontal-direction exponential correlation matrix with parameter $d_{hm}$, and $C_v$ is a vertical-direction exponential correlation matrix with parameter $d_{vm}$. The skewness parameter is the vector $\gamma = (\gamma_1, \gamma_2, \gamma_3)$, with one component for each elastic property. The full covariance matrix for the joint vector of the field and the latent truncated field is

$$
\Sigma = \begin{bmatrix} \Sigma_{0m} \otimes C & -(\Sigma_{0m} S \Gamma) \otimes C \\ -(\Gamma S \Sigma_{0m}) \otimes C & (I_3 - \Gamma^2) \otimes I + (\Gamma S \Sigma_{0m} S \Gamma) \otimes C \end{bmatrix}, \qquad C = C_h \otimes C_v,\ \ \Gamma = \mathrm{diag}(\gamma), \tag{13}
$$

where $S$ is a diagonal matrix whose elements are the square roots of the inverses of the diagonal elements of $\Sigma_{0m}$, and $S$ is used to scale the covariance matrix of the truncated field, i.e. $S \Sigma_{0m} S$ is a correlation matrix. This expression corresponds to Expression 7 extended to a tri-variate random field.

The unknown model parameters in both the likelihood and the prior models are $\theta = (\sigma_e^2, d_{we}, d_{he}, d_{ve}, \mu_{0m}, \Sigma_{0m}, \gamma, d_{hm}, d_{vm})$. We estimate $\theta$ by maximum marginal likelihood for the seismic AVO data $d$ and the well observations $m_w$. These parameter estimates, $\hat\theta$, are used as plug-in values in the posterior model, see Expression 12, to obtain an operable model.

The marginal likelihood to be maximized with respect to $\theta$ is:

$$
p(d, m_w; \theta) = \int p(d, m; \theta)\, dm_{-w} = \int p(d; m, \theta)\, p(m; \theta)\, dm_{-w}, \tag{14}
$$

where $m_{-w}$ denotes the material properties everywhere except in the well trace. Expression 14 is analytically tractable since CSN random fields are closed under linear operations. In practice, we only use 20 seismic traces on each side of the well to reduce the computational burden, i.e. the normalizing-constant integral in the CSN distributions is of correspondingly reduced dimension.

The estimation procedure, including the Monte Carlo approximation of the orthant probabilities, is identical to the one discussed in the parameter estimation section. Note that we use a common grid design for both predictions and model parameter inference in order to obtain consistent estimates. Each likelihood evaluation takes a couple of minutes on the laptop computer; thus the optimization procedure takes a couple of hours. The estimated parameters for the CSN random field prior model are

$$
\hat\sigma_e^2 = 0.27, \quad \hat d_{we} = 0.09, \quad \hat d_{he} = 22.85, \quad \hat d_{ve} = 16.19,
$$
$$
\hat\mu_{0m} = \begin{bmatrix} -0.27 \\ -0.55 \\ 0.10 \end{bmatrix}, \quad
\hat\Sigma_{0m} = \begin{bmatrix} 0.0052 & 0.0080 & 0.0003 \\ 0.0080 & 0.0342 & -0.0010 \\ 0.0003 & -0.0010 & 0.0022 \end{bmatrix}, \quad
\hat\gamma = \begin{bmatrix} 0.941 \\ 0.996 \\ -0.902 \end{bmatrix},
$$
$$
\hat d_{hm} = 3.37, \quad \hat d_{vm} = 13.76.
$$

Note that the estimated skewness parameter $\hat\gamma$ indicates a positive skewness for pressure-wave velocity and shear-wave velocity, and a negative skewness for density, which agrees with the histograms in Figure 5. Note also that the correlation in the horizontal direction of the prior model for the elastic properties is much higher than in the vertical direction.

We compare the CSN random field model with a Gaussian random field model, which is the special case of the CSN random field with $\gamma = 0$. The estimated parameters for the Gaussian model are

$$
\hat\sigma_e^2 = 0.32, \quad \hat d_{we} = 0.10, \quad \hat d_{he} = 24.75, \quad \hat d_{ve} = 17.9,
$$
$$
\hat\mu_{0m} = \begin{bmatrix} -0.025 \\ -0.054 \\ 0.0010 \end{bmatrix}, \quad
\hat\Sigma_{0m} = \begin{bmatrix} 0.0040 & 0.0035 & 0.0002 \\ 0.0035 & 0.0193 & -0.0009 \\ 0.0002 & -0.0009 & 0.0013 \end{bmatrix},
$$
$$
\hat d_{hm} = 3.25, \quad \hat d_{vm} = 14.85.
$$

The estimation procedure for the Gaussian model takes only a couple of minutes on the laptop computer. The main deviations between the estimates under the two models are the differences in the location and scale parameter estimates. Recall that the location and scale parameters are not identical to the expected value and variance in the CSN model.
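To make the distinction between location/scale parameters and moments concrete, consider the simplest CSN special case, the univariate skew-normal $SN(\xi, \omega^2, \alpha)$. The sketch below uses the standard moment formulas for that distribution; it is an illustration of the general point, not the paper's multivariate CSN parameterization.

```python
import numpy as np

def sn_mean_var(xi, omega2, alpha):
    """Mean and variance of a univariate skew-normal SN(xi, omega^2, alpha):
    the location xi and scale omega^2 differ from the moments whenever the
    skewness parameter alpha is nonzero."""
    delta = alpha / np.sqrt(1.0 + alpha**2)
    omega = np.sqrt(omega2)
    mean = xi + omega * delta * np.sqrt(2.0 / np.pi)
    var = omega2 * (1.0 - 2.0 * delta**2 / np.pi)
    return mean, var
```

Only for $\alpha = 0$ do the location and scale coincide with the mean and variance, which is why the CSN and Gaussian estimates above are not directly comparable.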

Having an estimate of the model parameters, $\hat{\theta}$, we use $p(m \mid d; \hat{\theta})$ with plug-in values as the predictive posterior distribution, see Expression 12. The posterior model is a CSN random field and analytically tractable due to the closure properties of the CSN distribution (González-Farías et al., 2004a; Karimi et al., 2010). Note that we do not use the observations of the elastic material properties in the well trace, $m_w$, explicitly when predicting $m$; the well observations influence the predictions only through $\hat{\theta}$.

The posterior distribution is estimated from samples generated by the MH algorithm in Appendix A. Figure 6 displays the posterior median for both models. The predictions from the two models appear fairly similar, but the predictions with the CSN model generally have lower values for $v_p$ and $v_s$ than those of the Gaussian model. The predictions for $\rho$ appear to deviate more from the well observations for both models than the predictions for $v_p$ and $v_s$, but this is expected from the geophysical model since there is less information about $\rho$ in the data. Realizations from the CSN and Gaussian posterior distributions are presented in Support S2, and they do not deviate dramatically from the well observations.

The posterior standard deviations for the CSN model depend on both the observation design and the observed values, see Support S2; extreme predictions usually have higher prediction variance. The posterior standard deviations for the Gaussian model depend only on the observation design, not the observed values, and hence are almost constant due to the symmetric design. The standard deviations for the Gaussian model are slightly larger than the typical values in the CSN model for $v_p$ and $v_s$, while the standard deviations for $\rho$ are similar in both models.

Figure 7 displays the observations and the well predictions from the seismic data. The bold black solid lines are well observations, the thin black solid lines are posterior medians, the dashed black lines are posterior 80% prediction intervals, the thin gray solid lines are prior medians, and the dashed gray lines are prior 80% prediction intervals. We see that the CSN model predictions match the well observations better than the Gaussian model for low values of $v_p$ and $v_s$, and reach almost as high as the Gaussian model predictions for the high values. This is as expected since the CSN model has larger flexibility than the Gaussian model. The differences for the $\rho$ predictions are small. The CSN model produces asymmetric prediction intervals due to skewness in the marginal posterior pdfs, but this effect is not very prominent in the display.

Table 2 displays the mean absolute error (MAE) and the prior and posterior coverages of the prediction intervals for the CSN and Gaussian models, with the well observations used as truth. The MAE is reduced when using the CSN model compared to the Gaussian model. The prior coverage is a reference coverage, and the posterior coverage should ideally be identical to the prior one. The reductions in coverage, entailing underestimation of the prediction intervals, are much larger for the Gaussian model than for the CSN model for $v_p$ and $v_s$. Recall that $v_p$ and $v_s$ are the variables with most skewness. This indicates that the CSN model is superior to the Gaussian model in seismic inversion into elastic material properties for the Sleipner case.
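The MAE and coverage diagnostics reported in Table 2 can be computed from posterior samples along the following lines. This is a minimal sketch; the function name and array layout are our own, not taken from the paper.

```python
import numpy as np

def mae_and_coverage(truth, post_samples, level=0.80):
    """MAE of the posterior median and empirical coverage of the central
    `level` prediction interval along a well trace.
    truth: (n_locations,) observations; post_samples: (n_draws, n_locations)."""
    med = np.median(post_samples, axis=0)
    lo, hi = np.quantile(post_samples,
                         [(1.0 - level) / 2.0, (1.0 + level) / 2.0], axis=0)
    mae = np.mean(np.abs(med - truth))
    coverage = np.mean((truth >= lo) & (truth <= hi))
    return mae, coverage
```

A well-calibrated model should report an empirical coverage close to the nominal level; coverage clearly below it signals underestimated prediction intervals, as observed for the Gaussian model.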

The well observations can also be used in the prediction based on the CSN random field model, in which case the predictive distribution is $p(m \mid d, m_w; \hat{\theta})$. Figure 8 corresponds to Figure 6a when $m_w$ is also conditioned on. As expected, these predictions reproduce the well observations and appear with higher resolution close to the well. A simulated realization from the posterior distribution is displayed in Support S2, and the well observations appear as an integral part of the realization.

For the Gaussian random field model, the computation times for parameter estimation and sampling are within minutes on the laptop computer, while for the CSN random field model the corresponding times are hours. One may ask whether the rather small improvements in the predictions are worth the large increase in computation time, although in some cases these minor improvements may be important for identifying a hydrocarbon reservoir of immense value.

## 5 Conclusion

We define an approximately stationary CSN random field with skewed Gaussian marginal distributions. The field is based on the definition of the multivariate CSN distribution (González-Farías et al., 2004a) and has a somewhat different parameterization than the CSN random field defined in Allard and Naveau (2007). We demonstrate that there is a strong dependence between the maximum skewness in the marginal pdfs and the spatial coupling.

A Metropolis-Hastings algorithm for efficient simulation of realizations from the CSN random field and a procedure for estimating model parameters by maximum likelihood are presented. A simulation study on different CSN random fields illustrates that the maximum skewness of the marginal distributions is severely reduced compared to the skewness in the univariate skew-normal distribution, due to spatial coupling effects. The maximum likelihood estimates for the model parameters are biased for small random fields, but the bias and variances of the estimates are reduced with increasing extent of the random field. The model parameter estimators appear to be consistent.

A case study of seismic AVO inversion into elastic material properties is presented. The inversion is cast in a Bayesian predictive setting with a tri-variate CSN random field prior model and a Gauss-linear likelihood model. The posterior model is also a CSN random field and is analytically tractable. Plug-in model parameter estimates based on maximum marginal likelihood are used. Improvements in prediction MAE compared to a Gaussian random field model are documented.

The computational cost of parameter estimation and prediction is feasible even for random fields discretized onto large grids; our example runs within hours on a laptop computer.

## 6 Acknowledgments

The research is a part of the Uncertainty in Reservoir Evaluation (URE) activity at the Norwegian University of Science and Technology (NTNU). We thank the operator Statoil and the Sleipner licence (Statoil, ExxonMobil, and Total) for providing the data.

## Supporting Information

Additional supporting information may be found in the online version of this article:
S1. Illustration: Marginals of truncated multivariate normal distributions.
S2. Figures: Additional figures from seismic inversion of data from the North Sea.

## References

• Aki and Richards (1980) Aki, K., Richards, P. G., 1980. Quantitative seismology: Theory and methods. W. H. Freeman and Co., New York.
• Allard and Naveau (2007) Allard, D., Naveau, P., 2007. A new spatial skew-normal random field model. Communications in Statistics: Theory and Methods 36 (9), 1821–1834.
• Arellano-Valle and Azzalini (2006) Arellano-Valle, R., Azzalini, A., 2006. On the unification of families of skew-normal distributions. Scandinavian Journal of Statistics 33 (3), 561–574.
• Azzalini (1985) Azzalini, A., 1985. A class of distributions which includes the normal ones. Scandinavian Journal of Statistics 12 (2), 171–178.
• Azzalini (2005) Azzalini, A., 2005. The skew-normal distribution and related multivariate families. Scandinavian Journal of Statistics 32 (2), 159–188.
• Azzalini and Capitanio (1999) Azzalini, A., Capitanio, A., 1999. Statistical applications of the multivariate skew normal distribution. Journal of the Royal Statistical Society: Series B (Statistical Methodology) 61 (3), 579–602.
• Azzalini and Dalla Valle (1996) Azzalini, A., Dalla Valle, A., 1996. The multivariate skew-normal distribution. Biometrika 83 (4), 715.
• Buland and Omre (2003) Buland, A., Omre, H., 2003. Bayesian linearized AVO inversion. Geophysics 68 (1), 185–198.
• De Oliveira et al. (1997) De Oliveira, V., Kedem, B., Short, D., 1997. Bayesian prediction of transformed Gaussian random fields. Journal of the American Statistical Association 92, 1422–1433.
• Dempster et al. (1977) Dempster, A., Laird, N., Rubin, D., et al., 1977. Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society. Series B (Methodological) 39 (1), 1–38.
• Diggle and Ribeiro (2007) Diggle, P., Ribeiro, P., 2007. Model-based geostatistics. New York: Springer.
• Flecher et al. (2009) Flecher, C., Naveau, P., Allard, D., October 2009. Estimating the closed skew-normal distribution parameters using weighted moments. Statistics & Probability Letters 79 (19), 1977–1984.
• Genton and Zhang (2012) Genton, M., Zhang, H., 2012. Identifiability problems in some non-Gaussian spatial random fields. Chilean Journal of Statistics, to appear.
• Genton (2004) Genton, M. G. (Ed.), 2004. Skew-Elliptical Distributions and Their Applications: A Journey Beyond Normality, 1st Edition. Chapman & Hall/CRC, Boca Raton, FL.
• Genz (1992) Genz, A., 1992. Numerical computation of multivariate normal probabilities. Journal of Computational and Graphical Statistics 1 (2), 141–149.
• Genz and Bretz (2009) Genz, A., Bretz, F., 2009. Computation of Multivariate Normal and t Probabilities. Springer Verlag.
• Geyer (1996) Geyer, C., 1996. Estimation and optimization of functions. In: Gilks, W., Richardson, S., Spiegelhalter, D. (Eds.), Markov chain Monte Carlo in Practice. Chapman and Hall, New York, pp. 241–258.
• Geyer and Thompson (1992) Geyer, C., Thompson, E., 1992. Constrained Monte Carlo maximum likelihood for dependent data. Journal of the Royal Statistical Society. Series B (Methodological), 657–699.
• González-Farías et al. (2004a) González-Farías, G., Domínguez-Molina, J., Gupta, A., 2004a. Additive properties of skew normal random vectors. Journal of statistical planning and inference 126 (2), 521–534.
• González-Farías et al. (2004b) González-Farías, G., Domínguez-Molina, J., Gupta, A., 2004b. The closed skew-normal distribution. In: Genton, M. G. (Ed.), Skew-Elliptical Distributions and Their Applications: A Journey Beyond Normality. Chapman & Hall / CRC, Boca Raton, FL, pp. 25–42.
• Karimi and Mohammadzadeh (2011) Karimi, O., Mohammadzadeh, M., 2011. Bayesian spatial prediction for discrete closed skew Gaussian random field. Mathematical Geosciences 43, 565–582.
• Karimi and Mohammadzadeh (2012) Karimi, O., Mohammadzadeh, M., 2012. Bayesian spatial regression models with closed skew normal correlated errors and missing observations. Statistical Papers 53, 205–218.
• Karimi et al. (2010) Karimi, O., Omre, H., Mohammadzadeh, M., 2010. Bayesian closed-skew Gaussian inversion of seismic AVO data for elastic material properties. Geophysics 75 (1), R1–R11.
• Kim and Mallick (2004) Kim, H.-M., Mallick, B. K., 2004. A Bayesian prediction using the skew Gaussian distribution. Journal of Statistical Planning and Inference 120 (1-2), 85–101.
• Liseo (1990) Liseo, B., 1990. La classe delle densita normali sghembe: aspetti inferenziali da un punto di vista bayesiano. Statistica 50 (1), 71–79.
• Liseo and Loperfido (2003) Liseo, B., Loperfido, N., 2003. A Bayesian interpretation of the multivariate skew-normal distribution. Statistics & probability letters 61 (4), 395–401.
• Little and Rubin (1987) Little, R., Rubin, D., 1987. Statistical analysis with missing data. Wiley New York.
• Robert (1995) Robert, C. P., June 1995. Simulation of truncated normal variables. Statistics and Computing 5 (2), 121–125.
• Røislien and Omre (2006) Røislien, J., Omre, H., 2006. T-distributed random fields: A parametric model for heavy-tailed well-log data. Mathematical Geology 38 (7), 821–849.
• Wei and Tanner (1990) Wei, G., Tanner, M., 1990. A Monte Carlo implementation of the EM algorithm and the poor man’s data augmentation algorithms. Journal of the American Statistical Association 85 (411), 699–704.
• Zhang and El-Shaarawi (2010) Zhang, H., El-Shaarawi, A., 2010. On spatial skew-Gaussian processes and applications. Environmetrics 21 (1), 33–47.

## Appendix A Algorithm: Sampling from a truncated multivariate normal distribution

Consider the problem of sampling realizations from a truncated multivariate normal distribution with unnormalized density $I(x \le 0)\,\phi_n(x; \mu, \Sigma)$, where $x \in \mathbb{R}^n$, $I(\cdot)$ is the indicator function, the notation $x \le 0$ corresponds to all elements of $x$ being jointly negative, and $\phi_n$ is the $n$-dimensional multivariate normal density. Let $x = (x_a, x_b)$, where $x_a$ is the block to be updated and $x_b$ contains the remaining elements. In order to sample from this distribution we extend the Metropolis-Hastings algorithm in Robert (1995) with a block independence proposal distribution defined by

$$p^*(x_a \mid x_b) = \prod_{i=1}^{q} \frac{I(x_{a_i} \le 0)\, \phi_1(x_{a_i} \mid x_{a_{1:i-1}}, x_b; \mu, \Sigma)}{\Phi_1(0 \mid x_{a_{1:i-1}}, x_b; \mu, \Sigma)}, \qquad (15)$$

where $q$ is the block size, and $\phi_1$ and $\Phi_1$ are the conditional normal density and cumulative distribution function of $x_{a_i}$ given $x_{a_{1:i-1}}$ and $x_b$, respectively, with the empty conditioning set $x_{a_{1:0}}$ for $i = 1$. Note that for block size $q = 1$ we recover the algorithm in Robert (1995). Note also that the distribution in Expression 15 is normalized, that it is easy to sample from sequentially, and that it depends on the ordering of $x_a$. Expression 15 is the distribution used as the independence sampler in Genz (1992) for estimating orthant probabilities.

The acceptance probability in the accept/reject step is

$$\alpha = \min\left\{1,\; \frac{p(x_a' \mid x_b)}{p(x_a \mid x_b)} \cdot \frac{p^*(x_a \mid x_b)}{p^*(x_a' \mid x_b)}\right\} = \min\left\{1,\; \frac{\prod_{i=1}^{q} \Phi_1(0 \mid x_{a_{1:i-1}}', x_b; \mu, \Sigma)}{\prod_{i=1}^{q} \Phi_1(0 \mid x_{a_{1:i-1}}, x_b; \mu, \Sigma)}\right\}, \qquad (16)$$

where $x_a'$ is the new proposed state. The Metropolis-Hastings algorithm is presented in Algorithm 1.

In practice we calculate the conditional distributions in advance. To save memory and time we also limit the blocks $x_a$, i.e. the index sets, we are allowed to choose, but we try to choose the allowed blocks such that all elements in $x$ have approximately equal update probability. We normally use a fixed block size $q$.
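The sampler in this appendix can be sketched as follows. This is a minimal illustration, assuming a dense covariance matrix, conditioning by direct matrix inversion, and a randomly chosen block per iteration; the function names and the feasible starting state are our own choices, not from the paper.

```python
import numpy as np
from scipy.stats import norm

def _cond_params(mu, Sigma, i, x, cond):
    """Mean and st.dev. of component i given x[cond] under N(mu, Sigma)."""
    if len(cond) == 0:
        return mu[i], np.sqrt(Sigma[i, i])
    c = np.asarray(cond)
    S12 = Sigma[i, c]
    S22inv = np.linalg.inv(Sigma[np.ix_(c, c)])
    m = mu[i] + S12 @ S22inv @ (x[c] - mu[c])
    v = Sigma[i, i] - S12 @ S22inv @ S12
    return m, np.sqrt(v)

def _sweep(mu, Sigma, x, block, rng, sample):
    """One sequential pass over the block. If sample=True, draw each element
    from its conditional normal truncated to (-inf, 0] (Expression 15).
    Always returns the log of the product of Phi_1 terms (Expression 16)."""
    x = x.copy()
    cond = [j for j in range(len(mu)) if j not in block]  # indices of x_b
    log_prod = 0.0
    for i in block:
        m, s = _cond_params(mu, Sigma, i, x, cond)
        p0 = norm.cdf(0.0, loc=m, scale=s)                # Phi_1(0 | ...)
        if sample:
            # inverse-CDF draw from the truncated conditional
            x[i] = norm.ppf(rng.uniform(0.0, p0), loc=m, scale=s)
        log_prod += np.log(p0)
        cond.append(i)
    return x, log_prod

def mh_truncated_mvn(mu, Sigma, n_iter, block_size=4, seed=0):
    """Metropolis-Hastings for N(mu, Sigma) restricted to {x <= 0}, with the
    block independence proposal of Expression 15; a random block is updated
    in each iteration."""
    rng = np.random.default_rng(seed)
    n = len(mu)
    x = -np.ones(n)                     # feasible starting state, x <= 0
    out = np.empty((n_iter, n))
    for t in range(n_iter):
        block = list(rng.choice(n, size=min(block_size, n), replace=False))
        x_prop, lp_prop = _sweep(mu, Sigma, x, block, rng, sample=True)
        _, lp_cur = _sweep(mu, Sigma, x, block, rng, sample=False)
        if np.log(rng.uniform()) < lp_prop - lp_cur:      # Expression 16
            x = x_prop
        out[t] = x
    return out
```

For a diagonal covariance matrix the proposal coincides with the target conditional, so every proposal is accepted and the sampler draws independent realizations.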

## Appendix B Algorithm: Monte Carlo estimation of normal orthant probabilities

Consider the problem of estimating the orthant probability

$$\Phi_n(0; \mu, \Sigma) = \int I(x \le 0)\, \phi_n(x; \mu, \Sigma)\, \mathrm{d}x, \qquad (17)$$

where $x \in \mathbb{R}^n$, $I(\cdot)$ is the indicator function, the notation $x \le 0$ corresponds to all elements of $x$ being jointly less than or equal to zero, and $\phi_n$ is the $n$-dimensional multivariate normal density. The usual importance sampling Monte Carlo approximation with importance function $f_n$ is

$$\Phi_n(0; \mu, \Sigma) \approx \frac{1}{N} \sum_{j=1}^{N} \frac{I(x_j \le 0)\, \phi_n(x_j; \mu, \Sigma)}{f_n(x_j; \mu, \Sigma)}, \qquad (18)$$

with $x_j \sim f_n$, where $N$ is the number of Monte Carlo sampling points. We follow the approach presented in Genz (1992) and use the importance function

$$f_n(x; \mu, \Sigma) = \prod_{i=1}^{n} \frac{I(x_i \le 0)\, \phi_1(x_i \mid x_{1:i-1}; \mu, \Sigma)}{\Phi_1(0 \mid x_{1:i-1}; \mu, \Sigma)}, \qquad (19)$$

where $\phi_1$ and $\Phi_1$ are the conditional normal density and cumulative distribution function of $x_i$ given $x_{1:i-1}$, respectively, with the empty conditioning set $x_{1:0}$ for $i = 1$. However, we also introduce a mean shift parameter $\eta$ in the importance function. The importance sampling approximation then appears as

$$\Phi_n(0; \mu, \Sigma) \approx \frac{1}{N} \sum_{j=1}^{N} \frac{\phi_n(x_j; \mu, \Sigma)}{\phi_n(x_j; \mu + \eta, \Sigma)} \prod_{i=1}^{n} \Phi_1(0 \mid x_{j,1:i-1}; \mu + \eta, \Sigma), \qquad (20)$$

with $x_j \sim f_n(\cdot; \mu + \eta, \Sigma)$. We use a larger mean shift when the correlation structure in $\Sigma$ is strong, and $\eta$ close to zero when $\Sigma$ is a diagonal matrix. It is also possible to use a different covariance matrix in the importance function, but the variance reduction we were able to attain was small compared to the extra computational time.
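The estimator of Expression 20 can be sketched as follows. This is a minimal illustration with direct matrix inversion for the conditionals; the function name and defaults are our own, and no effort is made here to vectorize the sampling as a production implementation would.

```python
import numpy as np
from scipy.stats import norm, multivariate_normal

def orthant_prob_is(mu, Sigma, eta=None, N=2000, seed=0):
    """Importance sampling estimate of Phi_n(0; mu, Sigma) with the
    sequential importance function of Expression 19, optionally mean-shifted
    by eta as in Expression 20."""
    rng = np.random.default_rng(seed)
    n = len(mu)
    mu_s = mu if eta is None else mu + eta        # shifted location mu + eta
    weights = np.empty(N)
    for j in range(N):
        x = np.empty(n)
        log_w = 0.0
        for i in range(n):
            # conditional N(m, s^2) of x_i | x_{1:i-1} under N(mu_s, Sigma)
            if i == 0:
                m, s = mu_s[0], np.sqrt(Sigma[0, 0])
            else:
                c = np.arange(i)
                S12 = Sigma[i, c]
                S22inv = np.linalg.inv(Sigma[np.ix_(c, c)])
                m = mu_s[i] + S12 @ S22inv @ (x[c] - mu_s[c])
                s = np.sqrt(Sigma[i, i] - S12 @ S22inv @ S12)
            p0 = norm.cdf(0.0, loc=m, scale=s)    # Phi_1(0 | x_{1:i-1})
            x[i] = norm.ppf(rng.uniform(0.0, p0), loc=m, scale=s)
            log_w += np.log(p0)
        # density ratio phi_n(x; mu, Sigma) / phi_n(x; mu + eta, Sigma)
        log_w += multivariate_normal.logpdf(x, mu, Sigma)
        log_w -= multivariate_normal.logpdf(x, mu_s, Sigma)
        weights[j] = np.exp(log_w)
    return weights.mean()
```

For a diagonal $\Sigma$ with $\eta = 0$ every weight equals the true orthant probability exactly, so the estimator has zero variance; for the bivariate case with correlation $\rho$ the result can be checked against the closed form $1/4 + \arcsin(\rho)/(2\pi)$.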
