Parametrization and Generation of Geological Models with Generative Advesarial Networks
One of the main challenges in the parametrization of geological models is the ability to capture complex geological structures often observed in subsurface fields. In recent years, Generative Adversarial Networks (GAN) were proposed as an efficient method for the generation and parametrization of complex data, showing state-of-the-art performances in challenging computer vision tasks such as reproducing natural images (handwritten digits, human faces, etc.). In this work, we study the application of Wasserstein GAN for the parametrization of geological models. The effectiveness of the method is assessed for uncertainty propagation tasks using several test cases involving different permeability patterns and subsurface flow problems. Results show that GANs are able to generate samples that preserve the multipoint statistical features of the geological models both visually and quantitatively. The generated samples reproduce both the geological structures and the flow properties of the reference data.
Generation and parametrization of geological models are fundamental tasks in subsurface reservoir management. Due to inherent difficulties in obtaining a complete and accurate description of the subsurface properties –such as the porosity and the permeability fields–, engineers must contemplate a set of possible realizations of such unknown fields while honoring the sparse information available (e.g. information from well logs). Standard algorithms used to generate geological realizations rely on the variogram and two-point geostatistics [deutsch1998gslib], however these algorithms are not suitable for generating complex geological models that are often encountered in practice, such as those containing channels and faults. More recently, a number of multipoint statistics (MPS) algorithms were introduced [strebelle2001reservoir, strebelle2002conditional]. MPS algorithms infer properties from a conceptual model. Basically, a conceptual geological model is designed under expert knowledge which conveys expected patterns of the random field. MPS algorithms then generate realizations honoring this model and any other information available.
One limitation of MPS algorithms is that they do not provide a useful parametrization of the output realizations. Concretely, to achieve efficient reservoir management, it is of interest to be able to characterize the geological realizations in terms of a few number of parameters. Moreover, it is often desirable that such parametrization be differentiable. This is of practical importance because the generated realizations are often coupled with other parts of the simulation pipeline, e.g. for history matching or uncertainty quantification tasks. In these procedures, the differentiability allows for gradient-based optimization techniques. Furthermore, using MPS algorithms to generate a large number of realizations – usually required for history matching and uncertainty quantification – is often computationally demanding.
A typical approach for parameterizing distributed fields is the Principal Component Analysis (PCA), or K-L expansion, performed using a limited set of realizations of the random field. However, PCA is based on Gaussian assumptions and has limited capacity to parameterize general fields, such as complex geological structures with channelized features. In the field of machine learning, where parametrization of high dimensional data is ubiquitous, Kernel Principal Component Analysis (kPCA) [scholkopf1998nonlinear] was introduced as an extension of PCA. In essence, kPCA relies on an initial transformation of the input data to enable more general features, followed by standard PCA application on the transformed data. This is done indirectly through the “kernel trick” to avoid the evident computational implications of transformations in very high dimensions. The application of this machine learning tool in geostatistics was first studied in [sarma2008kernel], and further improved in [ma2011kernel, vo2016regularized]. Later, other PCA-based methods were proposed such as in [vo2014new, vo2015data] where PCA is formulated as an optimization problem, allowing the deliberate introduction of bounds and regularization terms. This idea was then adopted in [vo2016regularized] to extend the kPCA method.
Since the cost of standard singular value decomposition grows cubically with the size of the covariance matrix, PCA/kPCA-based methods do not scale very well with the number of data samples and the dimensionality of the data. In reservoir engineering, the latter is already very large, while both the former and the latter are ever increasing as the trend of decreasing computational and storage cost is sustained. Therefore a method that is scalable in both regards is appealing in the present and near future. Regarding the kernel-based variants, an additional disadvantage is the need to solve a pre-image problem to actually obtain a parametrization with respect to the input space.
In recent years, a novel parametrization method called Generative Adversarial Networks (GAN) [goodfellow2014generative] was introduced in the machine learning community. The formulation of GANs takes a distinctive approach based on game theory, improving upon previous generative models by avoiding intractable likelihood functions and Markov chains. The sampling from the resulting generator is performed simply using a forward function evaluation.
In this work, we study the application of Wasserstein GAN [arjovsky2017wasserstein] (a particular implementation of GAN) for the parametrization of complex geological models. This method is scalable and the resulting parametrization is differentiable and straightforward (no pre-image problem required). The main motivation in the study of GAN comes from the well-known capability of neural networks to learn very complex, nonlinear features. In fact, in combination with convolutional networks [lecun1989generalization, krizhevsky2012imagenet], GAN models have shown state of the art performances in computer vision, specifically in challenging tasks such as reproducing natural images (human faces, handwritten digits, etc. [radford2015unsupervised]). On a more related topic, GAN was recently employed in the reconstruction of porous media [mosser2017reconstruction]. All this suggests that GAN can be effective in the parametrization of very general geological models exhibiting complex structures.
The rest of this paper is organized as follows: In section 2, we briefly describe the goal of parametrization of geological models, along with the standard principal component analysis approach. In section 3, we describe Generative Adversarial Networks and the Wasserstein formulation. In section 4, we compare the generated realizations and present several numerical results in subsurface flow problems. Finally, we draw our conclusions in section 5.
2 Parametrization of geological models
In a generic modelling setting, a spatial discretization grid is usually defined over the physical domain of interest. The underlying random field (e.g. porosity, permeability, etc.) then takes the form of a random vector where is the value at grid block , and is the total number of grid blocks. The goal of parametrization is to obtain a functional relationship , where is a random vector with assumed prior distribution , of size much smaller than (i.e. ). Additionally, it is desirable that be differentiable. New realizations of the random vector can then be generated by sampling the random vector from the prior .
The traditional approach in parameterizing is to perform a principal component analysis (PCA). Given a set of centered realizations (i.e. zero mean) , let be the matrix whose columns are the realization vectors. The covariance matrix of the random vector can be approximated by
PCA assumes the following parametrization of :
where is the matrix whose columns are the eigenvectors of the covariance matrix , is a diagonal matrix containing the respective eigenvalues , and is the parameterizing random vector of uncorrelated components. Assumming , the dimensionality reduction is obtained by keeping only the first components, that is, where , .
It can be easily verified that realizations generated by Equation 2 reproduce the covariance matrix . However, these new realizations do not in general reproduce higher order moments of the original realizations.
3 Generative adversarial networks
Generative adversarial networks (GAN) [goodfellow2014generative] consist of two competing functions (typically neural networks), namely a generator and a discriminator . The generator maps an input noise vector to an output “synthetic” sample . A prior distribution is assumed for , usually a Gaussian or uniform noise. Given data consisting of a set of original samples (for example, realizations ), the goal of is to generate samples that resemble samples from the data. More specifically, the objective of is to generate samples with a distribution that is close to the data generating distribution , i.e. attempts to replicate the data generating process. On the other hand, the discriminator is a classifier which takes a sample as input and tries to determine whether this sample is real (i.e. coming from the data) or synthetic (i.e. coming from the generator ). The output of is a scalar representing the probability of coming from the data.
In short, the goal of is to correctly discern between synthetic and real samples, while the goal of is to fool . This interplay beween and is formulated as a two-player minmax game:
In practice, the solution to this game can be approached by iteratively optimizing and in an alternating manner and independently. Given infinite capacity, the game converges to a global optimum where and everywhere. That is, the generator learns the data generating distribution exactly, therefore the discriminator fails to distinguish real samples from synthetic ones, resulting in a “coin toss” situation. After the model has been trained, can be used to generate new synthetic samples, parameterized by the noise vector . We note that the differentiability of with respect to is a direct consequence of choosing a set of differentiable functions for the search space of . This is often the case when neural networks are employed.
3.1 Wasserstein GAN
Finding the equilibrium of the game defined by (3) is notoriously hard in practice. In particular, if the discriminator is “too well trained”, it becomes saturated and provides no useful information for improvement of . On the other hand, overtraining might result in mode collapse of the generated distribution toward a single sample that always accepts. Solving this balancing problem has been the focus of many recent works, as summarized in [salimans2016improved].
One major development in improving GANs is Wasserstein GAN (WGAN) [arjovsky2017wasserstein]. In WGAN, the focus is in directly minimizing the Wasserstein distance instead of the Jensen-Shannon divergence implied by solving the game (3). The Wasserstein distance is defined as:
where denotes the set of all joint distributions with marginals and , correspondingly. The Wasserstein distance as expressed in Equation 4 is intractable in practice, therefore the following expression, due to the Kantorovich-Rubinstein duality, is used instead:
where indicates that the supremum is over the set of real-valued -Lipschitz functions, for some (irrelevant) constant . In effect, the function in Section 3.1, also called the “critic”, replaces the discriminator in standard GAN. Instead of a classifier function outputing a probability, the critic outputs a general real-valued score to the samples.
Even though the theoretical derivation of WGAN is very different from standard GAN, the only difference in the resulting formulation is a small change in the objective function, plus the Lipschitz constraint in the search space of the adversarial function (the critic), although relaxing the compliance to be a classifier function. However, the practical implications of WGAN are much broader, most importantly the critic can be trained to optimality (and in fact it should, in order to obtain a good approximation of ), alleviating the balancing problem. Additionally, WGAN provides a meaningful loss metric during training that correlates with the quality of the generated samples. This is very important for assessing the training progress, as well as providing a criteria for convergence.
3.1.1 Training WGAN
Typically, both the critic function and the generator are neural networks with specified architectures. Let and represent the weights of the networks for and , respectively, and we use the symbols and to explicitly express this dependency. To enforce the Lipschitz condition of , it is enough to assume a compact search space for the weights . In practice, this can be done, for example, by clipping the values of to certain range after each update of the training iteration.
The training procedure consists of alternating steps where and are optimized independently: In step A, we fix the generator weights and optimize the critic network (ideally to optimality) following the maximization problem:
Once the critic is fitted, we are provided with an approximation of the Wasserstein distance up to a multiplicative constant, via the objective function in expression (6). In step B, this distance is differentiated with respect to to provide the gradient required to optimize :
Steps A and B are iteratively repeated until convergence is reached.
The quality of the approximation of the Wasserstein distance, and therefore of its gradient, will depend of course on how well the critic has been trained. Hence, it is desirable to train the critic relatively well in each iteration. In practice, it is common to fully optimize every so often by setting a high number of inner iterations in step A after a number of full A-B loops. This is in contrast to standard GAN where the training of the discriminator and the generator has to be carefully balanced. Moreover, the readily available approximated Wasserstein distance provides an useful metric that correlates well with the quality of generated samples. We note the lack of a similar a metric in the standard GAN formulation.
4 Numerical Experiments
In this section, we demonstrate the use of Wasserstein GAN ( referred simply as GAN in the rest of this section) for the parameterization of two types of permeability. We further assess the generated realizations with two subsurface flow problems. An uncertainty propagation study is performed to estimate densities for several quantities of interest. Results are compared with the true results derived from the original samples. We also include results based on standard PCA parametrization.
4.1 The dataset
We evaluate the effectiveness of GANs in parameterizing two types of permeability. The first type of permeability is characterized by channelized structures showing semi-straight patterns. The second type consists of channelized structures exhibiting heavy meandering patterns. Conceptual images111Conceptual images or conceptual models are also called training images in the geology literature. Unluckily, the same term is also employed in machine learning to refer to the input training samples (the realizations). Hence, we do not employ such term to avoid confusion. for each permeability type are shown in Figures 0(b) and 0(a). These are binary images representing log-permeability values, where (black) designates the high permeability channels and (white) designates the low permeability background. In practice, these could be channels of sand embedded in a background of clay. The conceptual images used are of size .
We consider realizations of the permeability of size . Typically, a conceptual image is passed to a multipoint geostatistics algorithm such as snesim [strebelle2001reservoir] to generate a set of realizations. Here we utilize a straightforward “sliding window” approach, i.e. the realizations were obtained by simply cropping out regions of from the conceptual image. A total of samples can be obtained in this way for each pattern considered, although with significant overlap between samples. Some samples obtained by this procedure are shown in Figure 0(c).
4.2 The setup
For each pattern considered, a dataset of samples is employed to train a GAN model and to perform PCA for comparison. For the design of our GAN architecture, we followed the guidelines provided in [radford2015unsupervised]. We found that the inclusion of fully connected layers worked best in our experiments. More specifically, we included one fully connected layer after the input of , and one before the output of . The rest of the architecture design follows the suggestions from the guidelines. The network is trained with the Wasserstein formulation using the default settings provided in [arjovsky2017wasserstein]. For the size of the noise vector , we looked at sizes and , and we assumed standard Gaussian distribution for the prior . We used as the output activation of the generator and we preprocessed the binary data before training by shifting and scaling, from the binary range to the range . Using as the output activation automatically bounds the output of the generator. After the generator is trained and new realizations are generated, we can simply transform the outputs back to the binary range222Alternatively, one can use a sigmoid function, which is already bounded in , as the output activation of . However, offers better properties during training, such as already providing zero-centered inputs for the discriminator..
Regarding PCA, we followed the procedure in published literature [sarma2008kernel, ma2011kernel] of retaining of the total variance. This corresponded to retaining eigencomponents in the semi-straight pattern, and eigencomponents in the meandering pattern. A scree-plot of the eigenvalues is shown in Figure 2 for both permeability patterns.
4.3 Comparing realizations
We denote by GAN20 and GAN40 the GAN models trained with input of sizes and , respectively. Figures 4 and 3 show realizations generated with the GAN models and PCA, along with original realizations. These realizations are fair draws (not cherry-picked). It is evident from these images that the GAN models outperform PCA in capturing the channelized patterns of the data. In the semi-straight case, PCA slightly manages to capture the horizontal correlation of the channels, but it is clear that the generated samples are not channelized. In the meandering case, PCA seems to completely fail in capturing any pattern from the data. Additional samples are displayed in Appendix A.
Visually, we did not see much difference in samples generated by GAN20 and GAN40. Overall, both models generated realizations that are highly plausible, with samples that may trick the human eye. There are, however, some features that expose them such as pixel noise, which could be addressed by applying a median filter. Perhaps of more importance in regards to reservoir simulation are the “hanging channels”, i.e. channels with endpoints inside the domain, which are not present in the original realizations. In principle, this could be addressed by further tuning the network architecture (theoretically, models with infinite capacity and training time eventually learn the data generating distribution, as proven in [goodfellow2014generative]).
Following the visual comparison, we plot the histogram of the permeability values at the center of the domain, which corresponds to block . For this, we generated a number of realizations using the GAN models and PCA, for each of the permeability patterns. The results presented in Figure 5 show that both GAN20 and GAN40 learn the correct type of distribution (binary) with the vast majority of the values toward the extremes ( and ), and the obtained frequencies are very close to data. In contrast, the results from PCA follow a normal distribution, as expected.
4.4 Uncertainty propagation study
In this subsection, we study the effectiveness of the parametrization for an uncertainty propagation task. Since no significant differences were found in the visual comparison between GAN20 and GAN40, we only consider the results for GAN20 to simplify the presentation. We denote the results of GAN20 simply as GAN in the rest of this subsection.
We consider a subsurface flow problem where water is injected in an oil-filled reservoir. The physical domain under consideration is the unit square discretized by a grid and with isotropic permeability field as prescribed by the generated realizations. To simplify the physical equations, we assume that both water and oil have the same fluid properties. Under this assumption, the system of equations to be propagated is
where denotes the fluid pressure, denotes (total) fluid sources and sinks, and are the water and oil sources and sinks, respectively, denotes the permeability, denotes the saturation of water, denotes the medium porosity, and denotes the Darcy velocity. Note that where denotes the discretization of (note that the values of are the log-permeability values).
We consider two flow problems with different injection and production conditions:
- Quarter-five spot problem:
In this problem, injection and production points are located at and of the unit square, respectively. No-flow boundary conditions are imposed on all sides of the square. We assume unit injection/production rates, i.e. and .
- Uniform flow problem:
In this problem, uniformly distributed inflow and outflow conditions are imposed on the left and right sides of the unit square, respectively. No-flow boundary conditions are imposed on the remaining top and bottom sides. A total inflow/outflow rate of is assumed. For the unit square, this means and on the left and right sides, respectively, where denotes the outward-pointing unit normal to the boundary.
In both conditions, a pressure value of is imposed at the center of the square to close the problem, the reservoir only contains oil at the beginning (i.e. ), and the medium porosity is homogeneous and constant, with value . The simulated time is from until . In reservoir engineering, it is common practice to express changes in terms of pore volume injected (PVI), defined as , where is the water injection rate and is the pore volume of the reservoir. For a constant injection rate, this is simply a scaling factor of time.
For each flow problem and for each permeability pattern, equations 9 and 8 were propagated on three sets of realizations: one set corresponding to data ( samples were randomly selected from the original samples), and the other two sets corresponding to realizations generated by PCA and GAN.
After propagation, we computed the first four statistical moments (mean, variance, skewness and kurtosis) of the saturation at time , based on the runs. The results are shown in Figures 9, 8, 7 and 6 for each test case. In the mean and variance maps, both PCA and GAN are very close to the true maps. The good performance of PCA is expected since this method aims to conserve the first two statistical moments. For the higher order moments (skewness and kurtosis), estimations by GAN are overall better than PCA, as seen from the map plots. These observations are general in all the test cases. To further compare the statistics of the saturation solutions, we plot the saturation histogram () at the point of maximum variance, shown in Figures 11 and 10. In accordance to the map plot results, we see that PCA fails to replicate the higher order characteristics of the saturation distribution. On the other hand, the distributions by GAN are surprisingly close to the true distributions, suggesting that it has generated samples that preserve the flow properties of the data.
Next, we look at the water-cut curves, which measure the water content in the produced fluid over time. We computed the mean and variance of the water-cut curves from the runs The results are shown in Figures 13 and 12 for each test case. We observe that the mean curves are hardly distinguishable in all cases. Regarding the variance curves, two cases favored PCA (quarter-five spot in the semi-straight pattern, and uniform flow in the meandering pattern) while the other two cases favored GAN. We emphasize again that PCA is expected to perform very well for these curves, yet the results of GAN are comparable.
Finally, we look at the histogram of the water breakthrough time (measured in PVI), defined as the time at which water-cut reaches 1% of the produced fluids. Figures 15 and 14 show the histogram and probability density function (PDF) of the water breakthrough times for the quarter-five spot problem and the uniform flow problem, respectively. The PDFs were obtained by kernel density estimation using Scott’s rule. In all cases, we found that the obtained PDFs using GAN-generated realizations approximate the true distributions relatively well, and clearly outperformed the distributions obtained from samples generated by PCA parameterization.
We presented one of the first applications of Generative Adversarial Networks (GAN) for the parametrization of geological models. The results of our study show the potential of GANs as a parametrization tool to capture complex structures in geological models. The method generated geological realizations that were visually plausible, reproducing the structures seen in the reference data. More importantly, the flow physics induced by the generated realizations were close to reference. This was empirically verified for different permeability patterns and subsurface flow problems. In particular, we performed uncertainty propagation and estimated distributions of several quantities of interest in each test case. In all cases, we found that GAN was quite effective in approximating the distributions of the estimated quantities even when the distributions were non-trivial. In contrast, PCA was not suitable for distributions that are far removed from the normal distribution. Furthermore, we note that GAN showed superior performance compared to PCA despite only employing parametrizing coefficients, in contrast to PCA where and coefficients were used for the semi-straight and meandering patterns, respectively.
We conclude that GANs present a promising method for the parametrization of geological models. In our future work, we will look into the practicality of GANs for inverse modeling where the differentiability of the generator could be exploited to build an efficient gradient-based inverse method.