# Creating Virtual Universes Using Generative Adversarial Networks

###### Abstract

Inferring model parameters from experimental data is a grand challenge in many sciences, including cosmology. This often relies critically on high fidelity numerical simulations, which are prohibitively computationally expensive. The application of deep learning techniques to generative modeling is renewing interest in using high dimensional density estimators as computationally inexpensive emulators of fully-fledged simulations. These generative models have the potential to make a dramatic shift in the field of scientific simulations, but for that shift to happen we need to study the performance of such generators in the precision regime needed for science applications. To this end, in this letter we apply Generative Adversarial Networks to the problem of generating cosmological weak lensing convergence maps. We show that our generator network produces maps that are described by, with high statistical confidence, the same summary statistics as the fully simulated maps.

^{1}

^{1}footnotetext: Corresponding author: mmustafa@lbl.gov

The scientific success of the next generation of sky surveys (e.g. [1, 2, 3, 4, 5]) to test the current “standard model” of cosmology (CDM), hinges critically on the success of underlying simulations. Answering questions in cosmology about the nature of cold dark matter, dark energy and the inflation of the early universe, requires relating observations of a large number of astrophysical objects which trace the underlying matter density field, to simulations of “virtual universes” with different cosmological parameters. Currently the creation of each virtual universe requires an extremely computationally expensive simulation on High Performance Computing resources. In order to make this inverse problem practically solvable, constructing a computationally cheap surrogate model or an emulator [6, 7] is imperative.

However, traditional approaches to emulators require the use of a summary-statistic which is to be emulated. An approach that does not require such mathematical templates of the simulation outcome would be of considerable value in the field. The ability to emulate these simulations with high fidelity, in a fraction of the computational time, would boost our ability to understand the fundamental nature of the universe. While in this letter we focus our attention on cosmology, and in particular weak lensing convergence maps, we believe that this approach is relevant to many areas of science and engineering.

Recent developments in deep generative modeling techniques open the potential to meet this need. The density estimators in these models are built out of neural networks which can serve as universal approximators [8], thus having the ability to learn the underlying distributions of data and emulate the observable without being biased by the choice of summary-statistics, as in the traditional approach to emulators.

In this letter, we study the ability of a recent variant of generative models - Generative Adversarial Networks (GANs) [9] to generate weak lensing convergence maps. The training and validation maps are produced using N-body simulations of CDM cosmology. We show that maps generated by the neural network exhibit, with high statistical confidence, the same power (Fourier) spectrum of the fully-fledged simulator maps, as well as higher order non-Gaussian features, thus demonstrating that such scientific data is amenable to a GAN treatment for generation. The very high level of agreement we achieve offers promise for building emulators out of deep neural networks. We first present our results and analysis then outline the future investigations which we think are critical to build such emulators in the Discussion section.

## Results

Gravitational lensing has potential to be one of the most sensitive probes of the nature of dark energy [10], and affects the shape and apparent brightness of every galaxy we observe. Convergence is the quantity that defines the brightness of an observed object as it is affected by the matter along the line of sight between that galaxy and the observer. It can be interpreted as a measure of the density of the universe observed from a particular direction. A full N-body simulation creates convergence maps corresponding to many random realizations of the same cosmological model. We set out to train a GAN model on pixels convergence maps taken from these simulations. A description of the simulations and data preparation methods is in the Methods section. Before we describe our results we first outline the objective of generative models and the GANs framework.

The central problem of generative models is the question: given a distribution of data can one devise a generator such that the distribution of model generated data ? Our information about comes from the training dataset, typically an independent and identically distributed random sample which is assumed to have the same distribution as . Achieving a high fidelity generation scheme amounts to the construction of a density estimator of the training data. In the GANs framework a generator function is optimized to generate samples that are indistinguishable from training data as judged by a discriminator function . is optimized to discriminate between training data and generated data. In the neural network formulation of this framework the generator network parametrized by network parameters and discriminator network parametrized by are simultaneously optimized using gradient-descent.

Of interest to us here is the generator . Its parameters are optimized to map a vector sampled from a prior to the support of . The only requirement on the generator is that it is differentiable with respect to its parameters and input (except at possibly finitely many points). For the convergence maps we study, we choose a normal prior, so:

The dimension of the vector needs to be commensurate with the support of the training convergence maps in . Because the underlying physics of the convergence maps is translation and rotation invariant [11], we chose to construct the generator and discriminator networks mainly from convolutional layers. To allow the network to learn the proper correlations on the components of the input early on, the first layer of the generator network needs to be a fully-connected layer. A well studied architecture that meets these criteria is the Deep Convolutional Generative Adversarial Networks (DCGAN) [12]. DCGAN is a set of empirically chosen architectural guidelines and hyper-parameters which have been shown to be robust to excel at a variety of tasks. We experimented with DCGAN architectural parameters and we found that most of the hyper-parameters optimized for natural images by the original authors perform well on the convergence maps, for example, changing the learning rates or the kernel sizes worsens the performance. We used DCGAN with slight modifications to meet our problem dimensions as described in the Methods section.

Figure 1 shows examples of maps from the validation and GAN generated datasets. The validation dataset has not been used in the training or tuning of the networks. By eye an expert cannot distinguish the generated maps from the full simulation ones. We now move to a quantitative comparison of these datasets.

Evaluation of generative models. Once one has a density estimator of the original data the first practical question would be to determine the goodness of the fit. Basically, how close is to ? This issue is critical to understanding and improving generative models formulation and training procedure and is still an active area of research [13]. Significant insights into the training dynamics of GANs from a theoretical point of view have been gained recently [14, 15]. We think that when it comes to practical applications of generative models, such as in the case of emulating scientific data, the criterion to evaluate generative models is to study their ability to reproduce the characteristic statistics which we can measure from the original dataset.

To this end, we calculate three statistics on the generated convergence maps: a first order statistic (pixel intensity), second order correlations and a measurement of the non-Gaussian corrections. Such measurements and several other corrections schemes are known as summary-statistics of the maps. The ability to reproduce such summary-statistics is a reliable metric to evaluate generative models from an information encoding point of view. To test our statistical confidence of the matching of the summary-statistics we perform a two-tailed Kolmogorov-Smirnov (KS) test of the null-hypothesis that the summary-statistics in the generated maps and the validation maps have been drawn from the same continuous distribution.

Figure 2 shows a histogram of the distribution of pixel intensities of an ensemble of generated images compared to that of a validation dataset. It is clear that the GAN generator has been able to learn the probabilities of pixel intensities in the real simulation dataset, the KS p-value is .

The second-order statistical moment in gravitational lensing is the shear power spectrum. This is a measure of the correlation in gravitational lensing at different distances, and characterizes the matter density fluctuations at different length scales. Assuming we have only Gaussian fluctuations at comoving distance , the matter density of the universe can be defined by its power spectrum :

where is the Dirac delta function, and is the Fourier mode [11]. The power spectrum (and its corresponding real-space measurement, the two-point correlation function) of convergence maps has been long used to constrain models of cosmology by comparing simulated maps to real data from sky surveys [16, 17, 18]. Numerically, the power spectrum is the amplitudes of the Fourier components of the map. We calculate the power spectrum at 248 Fourier modes of an ensemble of generated maps using LensTools [19], and compare them to the validation dataset. Since each map is a different random realization, comparison has to be done at the ensemble level. Figure 3(a) shows two bands representing the mean standard deviation of the ensemble of power spectra at each Fourier mode of the validation and a generated dataset. As is clear in the figure, the bands completely overlap with each other. To confirm that the range of the power spectrum at a given is completely reproduced in the generated maps we look differentially at the underlying 1-D distributions. Samples of such distributions at equally spaced Fourier modes are shown in Figure 3(b). The KS p-values for Fourier modes are , and for the remaining modes. The power spectrum is the figure of merit for evaluating the success of an emulator for reproducing the structures of the convergence map, and we have shown with statistical confidence that GANs are able to discover and reproduce such structures.

The power spectrum only captures the Gaussian components of matter structures in the universe. However, gravity produces non-Gaussian structures at small scales which are not described by the two-point correlation function [11]. There are many ways to access this non-Gaussian statistical information, including higher-order correlation functions, and topological features such as Minkowski functionals , , and [20, 21] which characterize the area, perimeter and genus of the convergence maps. We investigate the ability of the networks to discover and reproduce these non-Gaussian structures in the maps by evaluating the three Minkowski functionals. Figure 4 compares bands of of the three Minkowski Functionals (calculated using [19]) to those in the real dataset. Each functional is evaluated at 19 thresholds. As we did with the power spectrum, we show the Minkowski functionals at different thresholds in Figure 5. Out of the threshold histograms have p-values , are , are and is . Again, it is clear that the GANs have successfully reproduced those non-Gaussian structures in the maps.

## Discussion

Generative models for emulating scientific data and parameter inference. Without loss of generality, the generation of one random realization of a science dataset (simulation or otherwise) can be posited as a black-box model where is a vector of the physical model parameters and is a set of random numbers. The physical model can be based on first principles or effective theories. Different random seeds generate statistically independent mock data realizations of the model parameters . Such models are typically computationally expensive to evaluate at many different points in the space of parameters .

A common problem for doing statistical inference from observations is making accurate predictions for a given summary statistic, for example the weak lensing power spectrum, or Minkowski functionals. Even in the simplest case, when the likelihood of the observed data is a multivariate Gaussian, it is characterized by both the mean data vector and a covariance matrix :

Dealing with the non-linear problem of structure formation in the universe, we usually do not have analytic expressions for the mean and covariance in terms of the model parameters, and instead they have to be estimated from simulations mimicking the experiment. Modeling the mean of a target summary statistic can efficiently be done via existing emulator techniques [22, 23, 24]. That however is not true for the covariance matrix, where importantly any statistical noise in propagates to an error in cosmological parameter constraints. In fact, future surveys will require of order mock realizations per cosmological model just to prevent parameter degradation due to covariance uncertainty [25]. Alternative approaches to expensive simulations are therefore mandated for generating such large numbers of mock data sets.

The inference problem becomes even harder for cases where a multivariate Gaussian likelihood is not a good approximation (for an example, see [26]). In those cases, it is often the best to adopt a likelihood-free method such as Approximate Bayesian Computation [27], which approximates the likelihood function through a massive Monte Carlo simulation of mock data realizations, and a summary statistic is used to quantify distance between real data and any mock realization. Needless to say, a “generator” which cheaply produces those mock realizations is an essential tool for those methods.

The idea of applying deep generative models techniques to emulating scientific data and simulations has been gaining attention recently [28, 29, 30, 31]. In this letter we have shown that, with high statistical confidence, GANs are able to generate convergence maps of a particular CDM model. Not only those maps share the same mean summary statistics as simulation data, but even more importantly, they reproduce the statistical variety present in simulations.

In our minds, the most important next step to build on this foundation and achieve an emulator of model is the ability of generative models to generalize in the space of model parameters from datasets at a finite number of points in the parameter space. More specifically, can we use GANs to build parametric density estimators of the physical model ?. Such generalization rests on smoothness and continuity of the response function of the physical model in the parameter space, but such an assumption is the foundation of any surrogate modeling. Future extensions of this work will seek to to add this parameterization in order to enable the construction of robust and computationally inexpensive emulators of cosmological models.

Finally, the remarkable success of GANs in this work demonstrates that, unlike natural images which are the focus of the deep learning community, scientific data come with a suite of tools (statistical tests) which enable verifying and improving the fidelity of generative networks. The lack of such tools for natural images is one of the most serious challenges to understanding the relative performance of the different generative models architectures, loss functions and training procedures [13, 32]. As well as the impact within these fields, scientific data with its suite of tools have the potential to provide the grounds for benchmarking and improving on the state-of-the-art generative models.

## Methods

Dataset. We use the cosmological simulations described in [21], produced using the Gadget2 [33] N-Body simulation code and ray-traced to produce weak lensing shear and convergence maps. A total of 50 runs were produced, each consisting of 512 particles in a box of size of 240Mpc. The cosmological parameters used in these simulations were , , , , , . These simulation boxes were rotated and translated to produce 1000 ray-traced lensing maps covering 12 square degrees, with pixels. The gravitational lensing illustrated by these maps can be described by the Jacobian matrix

where is the convergence, and is the shear.

The data was augmented by randomly cropping maps from each of the original maps. The probability for a map to have a single pixel value outside [-1.0,1.0] range is less than so it was safe to use the data without any normalization.

Data availability. The dataset used in this work will be made available publicly by the publication time.

Network architecture and training. The architectures of the generator and discriminator networks follow the Deep Convolutional Generative Adversarial Networks (DCGAN) guidelines and architecture [12]. The generator takes a 64-dimensional vector sampled from a normal prior . The first layer is a fully connected layer whose output is reshaped into a stack of feature maps. The rest of the network is four layers of transposed convolutions that lead to a single channel image. A rectified linear unit (ReLU) activation [34] is used for all except the output layer where a hyperbolic-tangent () is used. Batch Normalization [35] is used for all layers except output layer. We list our choice for the number of feature maps in Table 1.

Operation | Output tensor size |
---|---|

Linear | |

Transpose Conv | |

Transpose Conv | |

Transpose Conv | |

Transpose Conv |

The discriminator utilizes four convolutional layers. The number of feature maps, stride and kernel sizes are the same as in the generator (Table 1). All convolutional layers are activated with LeakyReLU [36] with parameter . The output of the last convolutional layer is flattened and fed into a fully connected layer with a 1-dimensional output that is fed into a sigmoid. Batch Normalization is used for all except the first layer. Finally, we minimize the heuristic loss function [37] with batch-size of 64 maps and using an Adam optimizer [38] with the parameters suggested in the DCGAN paper, learning rate and . We flip the real and fake labels with probability to avoid the discriminator overpowering the generator too early into the training. We implement the networks in TensorFlow [39].

As is well experienced in practice, GANs with the heuristic loss function suffer from unstable updates towards the end of their training. This has been recently analyzed theoretically in [14] and shown to happen when the discriminator is close to optimality but has not yet converged. For the results shown in this letter we trained the network until the generated images pass the visual and pixel intensity Kolmogorov-Smirnov tests. This took 45 passes (or epochs) over all of the training data. Given the un-stability of the updates at this point, the performance of the generator on the summary-statistics tests starts varying uncontrollably. Therefore, to choose a generator, we train the networks for two extra epochs after epoch 45, and randomly generate 100 batches (6400 maps) of samples at every single training step. We evaluate the power spectrum on the generated samples and calculate Chi-squared distance measurement to the power spectrum histograms evaluated on a development subset of the validation dataset. We use the generator with the best Chi-squared distance.

Summary statistics analysis. To calculate the two-point correlation functions and Minkowski Functionals we use LensTools [19], a suite of analysis tools for Weak Gravitational Lensing. We evaluated the power spectrum spanning the range of to in step sizes. The Minkowski Functionals thresholds ranged from to in step sizes.

## References

- [1] DESI Collaboration, et al. The DESI Experiment Part I: Science,Targeting, and Survey Design. ArXiv e-prints (2016).
- [2] DESI Collaboration, et al. The DESI Experiment Part II: Instrument Design. ArXiv e-prints (2016).
- [3] Laureijs, R., et al. Euclid Definition Study Report. ArXiv e-prints (2011).
- [4] LSST Dark Energy Science Collaboration. Large Synoptic Survey Telescope: Dark Energy Science Collaboration. ArXiv e-prints (2012).
- [5] Spergel, D., et al. Wide-Field InfrarRed Survey Telescope-Astrophysics Focused Telescope Assets WFIRST-AFTA 2015 Report. ArXiv e-prints (2015).
- [6] Heitmann, K., et al. The Coyote Universe. II. Cosmological Models and Precision Emulation of the Nonlinear Matter Power Spectrum. The Astrophysical Journal, 705, 156–174 (2009). doi:10.1088/0004-637X/705/1/156.
- [7] Lawrence, E., et al. The Coyote Universe. III. Simulation Suite and Precision Emulator for the Nonlinear Matter Power Spectrum. The Astrophysical Journal, 713, 1322–1331 (2010). doi:10.1088/0004-637X/713/2/1322.
- [8] Csáji, B.C. Approximation with artificial neural networks. MSc Thesis, Eötvös Loránd University (ELTE), Budapest, Hungary (2001).
- [9] Goodfellow, I., et al. Generative adversarial nets. In Z. Ghahramani, M. Welling, C. Cortes, N.D. Lawrence, and K.Q. Weinberger (editors), Advances in Neural Information Processing Systems 27, pages 2672–2680. Curran Associates, Inc. (2014).
- [10] Albrecht, A., et al. Report of the Dark Energy Task Force. ArXiv Astrophysics e-prints (2006).
- [11] Kilbinger, M. Cosmology with cosmic shear observations: a review. Reports on Progress in Physics, 78(8), 086901 (2015).
- [12] Radford, A., Metz, L., and Chintala, S. Unsupervised representation learning with deep convolutional generative adversarial networks. CoRR, abs/1511.06434 (2015).
- [13] Theis, L., van den Oord, A., and Bethge, M. A note on the evaluation of generative models. ArXiv e-prints (2015).
- [14] Arjovsky, M. and Bottou, L. Towards Principled Methods for Training Generative Adversarial Networks. ArXiv e-prints (2017).
- [15] Arjovsky, M., Chintala, S., and Bottou, L. Wasserstein GAN. ArXiv e-prints (2017).
- [16] Liu, J., et al. Cosmology constraints from the weak lensing peak counts and the power spectrum in CFHTLenS data. Physics Review D, 91(6), 063507 (2015). doi:10.1103/PhysRevD.91.063507.
- [17] Abbott, T., et al. Cosmology from cosmic shear with Dark Energy Survey Science Verification data. Physics Review D, 94(2), 022001 (2016). doi:10.1103/PhysRevD.94.022001.
- [18] Jee, M.J., et al. Cosmic Shear Results from the Deep Lens Survey. I. Joint Constraints on and with a Two-dimensional Analysis. Astrophysics Journal, 765, 74 (2013). doi:10.1088/0004-637X/765/1/74.
- [19] Petri, A. Mocking the weak lensing universe: The lenstools python computing package. Astronomy and Computing, 17, 73 – 79 (2016). doi:https://doi.org/10.1016/j.ascom.2016.06.001.
- [20] Petri, A., et al. Emulating the CFHTLenS weak lensing data: Cosmological constraints from moments and Minkowski functionals. Physics Review D, 91(10), 103511 (2015). doi:10.1103/PhysRevD.91.103511.
- [21] Kratochvil, J.M., et al. Probing cosmology with weak lensing Minkowski functionals. Physics Review D, 85(10), 103513 (2012). doi:10.1103/PhysRevD.85.103513.
- [22] Kwan, J., et al. Cosmic Emulation: The Concentration-Mass Relation for wCDM Universes. The Astrophysical Journal, 768, 123 (2013). doi:10.1088/0004-637X/768/2/123.
- [23] Heitmann, K., et al. The Coyote Universe Extended: Precision Emulation of the Matter Power Spectrum. The Astrophysical Journal, 780, 111 (2014). doi:10.1088/0004-637X/780/1/111.
- [24] Kwan, J., et al. Cosmic Emulation: Fast Predictions for the Galaxy Power Spectrum. The Astrophysical Journal, 810, 35 (2015). doi:10.1088/0004-637X/810/1/35.
- [25] Dodelson, S. and Schneider, M.D. The effect of covariance estimator error on cosmological parameter constraints. Physical Review D, 88(6), 063537 (2013). doi:10.1103/PhysRevD.88.063537.
- [26] Davies, F.B., et al. A New Method to Measure the Post-Reionization Ionizing Background from the Joint Distribution of Lyman- and Lyman- Forest Transmission. ArXiv e-prints (2017).
- [27] Pritchard, J.K., et al. Population growth of human Y chromosomes: a study of Y chromosome microsatellites. Molecular Biology and Evolution (1999).
- [28] Ravanbakhsh, S., et al. Enabling Dark Energy Science with Deep Generative Models of Galaxy Images. arXiv:1609.05796" (2016).
- [29] de Oliveira, L., Paganini, M., and Nachman, B. Learning Particle Physics by Example: Location-Aware Generative Adversarial Networks for Physics Synthesis. arXiv:1701.05927 (2017).
- [30] Paganini, M., de Oliveira, L., and Nachman, B. CaloGAN: Simulating 3D High Energy Particle Showers in Multi-Layer Electromagnetic Calorimeters with Generative Adversarial Networks. arXiv:1705.02355 (2017).
- [31] Mosser, L., Dubrule, O., and Blunt, M.J. Reconstruction of three-dimensional porous media using generative adversarial neural networks. arXiv:1704.03225 (2017).
- [32] Salimans, T., et al. Improved techniques for training gans. CoRR, abs/1606.03498 (2016).
- [33] Springel, V. The cosmological simulation code GADGET-2. Monthly Notices of the Royal Astronomical Society, 364, 1105–1134 (2005). doi:10.1111/j.1365-2966.2005.09655.x.
- [34] Nair, V. and Hinton, G.E. Rectified linear units improve restricted Boltzmann machines. In J. Fürnkranz and T. Joachims (editors), Proceedings of the 27th International Conference on Machine Learning (ICML-10), pages 807–814. Omnipress, Haifa, Israel (2010).
- [35] Ioffe, S. and Szegedy, C. Batch normalization: Accelerating deep network training by reducing internal covariate shift. In F. Bach and D. Blei (editors), Proceedings of the 32nd International Conference on Machine Learning, volume 37, pages 448–456. PMLR, Lille, France (2015).
- [36] Maas, A.L., Hannun, A.Y., and Ng, A.Y. Rectifier nonlinearities improve neural network acoustic models. In S. Dasgupta and D. McAllester (editors), Proceedings of the 30th International Conference on Machine Learning, volume 28. PMLR, Atlanta, Georgia, USA (2013).
- [37] Goodfellow, I.J. NIPS 2016 tutorial: Generative adversarial networks. CoRR, abs/1701.00160 (2017).
- [38] Kingma, D.P. and Ba, J. Adam: A method for stochastic optimization. CoRR, abs/1412.6980 (2014).
- [39] Abadi, M., et al. Tensorflow: Large-scale machine learning on heterogeneous distributed systems (2015).

## Acknowledgements

The authors thank Jan M. Kratochvil for providing access to the suite of simulated convergence maps used in the this work. M. Mustafa would like to thank Evan Racah for his invaluable deep learning related discussions throughout this project. M. Mustafa would also like to thank Taehoon Kim for providing an open-sourced TensorFlow implementation of DCGAN which was used in early evaluations of this project. No network was hurt during the adversarial trainings performed in this work. M. Mustafa, D. Bard and W. Bhimji are supported by the U.S. Department of Energy, Office of Science, Office of High Energy Physics, under contract No. DE-AC02-05CH11231 through the National Energy Research Scientific Computing Center and Computational Center for Excellence programs.

## Author Contributions

M. Mustafa led the project, designed and carried out the experiments and statistical analysis and prepared the draft of this letter. D. Bard pioneered applications of machine learning techniques to convergence maps and guided the cosmology and statistical analysis, in addition to writing the introduction jointly with Z. Lukić and co-writing the Results section. Lukić also provided guidance on cosmology simulations and emulators in addition to co-writing the Discussion section and providing critical reviews of the early draft of this letter. R. Al-Rfou reviewed the manuscript and contributed to the discussion of the deep learning aspects of the project. W. Bhimji reviewed the manuscript, contributed to the discussions of all aspects of the project and helped in organizing the collaboration.