CMBGAN: Fast Simulations of Cosmic Microwave Background Anisotropy Maps Using Deep Learning
Abstract
The Cosmic Microwave Background (CMB) has been a cornerstone of many cosmology experiments and studies since its discovery in 1964. Traditional computational tools such as CAMB, which are used to generate CMB anisotropy maps, are extremely resource intensive and act as a bottleneck in cosmology experiments that require large amounts of CMB data for analysis. In this paper, we present a new approach to generating CMB anisotropy maps using a machine learning technique called the Generative Adversarial Network (GAN). We train our deep generative model to learn the complex distribution of CMB maps and efficiently generate new CMB data in the form of 2D patches of anisotropy maps. We limit our experiment to the generation of 56 and 112 patches of CMB maps. We have also trained a multilayer perceptron (MLP) model to estimate the baryon density from a CMB map, and we use this model to evaluate the performance of our generative model together with diagnostic measures such as the histogram of pixel intensities, the standard deviation of the pixel intensity distribution, the power spectrum, the cross power spectrum, the correlation matrix of the power spectrum and the peak count.
1 Introduction
The variations in temperature of the Cosmic Microwave Background (CMB) are like ripples on a cosmic pond and encode a great deal of information about the universe. To extract this information, we look at the angular scales on which these temperature fluctuations occur: the amplitude of the temperature fluctuations (in μK) is plotted against the multipole moment ℓ, which gives the angular power spectrum of a CMB temperature map. This spectrum contains several peaks that carry much of this information, and we exploit them in this work.
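For reference, these are the standard definitions behind such a plot (the notation a_ℓm, C_ℓ and D_ℓ is ours; the paper does not write them out): the temperature anisotropy is expanded in spherical harmonics, C_ℓ is the variance of the expansion coefficients at multipole ℓ, and the curve usually plotted is the rescaled D_ℓ. Larger ℓ corresponds to smaller angular scales, roughly θ ≈ 180°/ℓ.

```latex
\frac{\Delta T}{T}(\hat{n}) = \sum_{\ell=0}^{\infty} \sum_{m=-\ell}^{\ell} a_{\ell m}\, Y_{\ell m}(\hat{n}),
\qquad
C_\ell = \langle |a_{\ell m}|^2 \rangle \approx \frac{1}{2\ell + 1} \sum_{m=-\ell}^{\ell} |a_{\ell m}|^2,
\qquad
\mathcal{D}_\ell = \frac{\ell(\ell + 1)}{2\pi}\, C_\ell\, T_{\mathrm{CMB}}^2 .
```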
The first peak indicates the geometry of the universe, that is, whether it is flat or curved (Hu & White, 2004). Because CMB radiation reaches us from all directions of the visible universe, its apparent angular pattern is distorted by the curvature of space. If the universe is flat, the fluctuations appear undistorted; if it is positively curved, they appear magnified; and if it is negatively curved, they appear demagnified. The second peak reveals information about the amount of baryons in the universe. Owing to the initial density fluctuations, all matter tends to fall gravitationally towards regions of higher density. Baryonic matter, however, is coupled to light: as it clumps, it heats up, and the resulting radiation pressure pushes back against the infall. As a consequence, the second peak is more suppressed relative to the first when there are more baryons. The ratio of the heights of the first and second peaks therefore constrains the baryon density (Bucher, 2015).
The anisotropy of the cosmic microwave background consists of the small temperature fluctuations in the blackbody radiation left over from the Big Bang. CMB temperature maps are an incredible source of information for cosmological analysis, and the advent of big data methods (Krizhevsky, Sutskever & Hinton, 2012) has opened a new avenue for analyzing the CMB. Modern data analysis methods such as machine learning and deep learning require large amounts of data, and traditional tools such as CAMB and healpy (Górski et al., 2005) are computationally expensive and inefficient for generating large numbers of CMB maps. Here we demonstrate the use of deep generative models to generate synthetic patches of CMB all-sky maps that can be used for cosmological analysis. Deep generative models can learn complex distributions from a given dataset and then generate new, statistically consistent data samples (Goodfellow et al., 2014). We build the training dataset for our generative model by snipping 128x128 and 256x256 pixel patches from the CMB maps, which gave us 56 and 112 patches respectively. We also train a multilayer perceptron (MLP) to predict the baryon density of a given CMB patch; comparing these predictions lets us compare the samples produced by the generative model with those in our dataset. We use diagnostic metrics such as the histogram of pixel intensities, the standard deviation of the pixel intensity distribution, the power spectrum, the cross power spectrum, the correlation matrix of the power spectrum and the peak count to evaluate our generative model. The practical advantage of this method is that, once the model has been trained, generation is extremely fast, so a large number of samples can be produced for scientific study.
2 Methodology
2.1 CAMB and Data Generation
We use the standard cosmological code CAMB (Lewis & Challinor) to generate the CMB temperature maps used for training. CAMB computes CMB power spectra, CMB lensing and other related cosmological functions. It takes a set of cosmological parameters as input and produces a file containing the resulting angular power spectrum. We use the curved-sky correlation-function method for lensing and include reionization. The other physical parameters passed to CAMB include the Hubble constant, the CMB temperature, the baryon density, the cold dark matter density, the effective mass density of dark energy, the maximum multipole, the redshift and the helium fraction. This power spectrum file is in turn used by healpy to generate random Gaussian CMB temperature maps, which are used to train the neural networks.
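healpy turns the CAMB spectrum into Gaussian maps on the sphere (for example with `healpy.synfast`). The NumPy sketch below illustrates the same idea in the flat-sky approximation, which is an assumption and not the authors' pipeline: white Gaussian noise is filtered in Fourier space so that its power follows a chosen spectrum P(k). The function names and the power-law `toy_power` are hypothetical stand-ins; in practice the spectrum would come from CAMB.

```python
import numpy as np

def gaussian_random_field(n, power_fn, seed=None):
    """Draw a real n x n Gaussian random field whose Fourier power follows power_fn(k).

    Flat-sky sketch: k is the radial wavenumber in cycles per image.
    The overall normalization is arbitrary; only the spectral shape is imposed.
    """
    rng = np.random.default_rng(seed)
    # White noise has a flat spectrum; multiplying its FFT by sqrt(P(k)) imposes P(k).
    white_k = np.fft.fft2(rng.standard_normal((n, n)))
    freqs = np.fft.fftfreq(n) * n
    k = np.hypot(freqs[None, :], freqs[:, None])
    amplitude = np.zeros_like(k)
    nonzero = k > 0
    amplitude[nonzero] = np.sqrt(power_fn(k[nonzero]))  # k = 0 (the map mean) is zeroed
    # The filter depends only on |k|, so Hermitian symmetry is kept and the result is real.
    return np.fft.ifft2(white_k * amplitude).real

def toy_power(k):
    # Hypothetical power law standing in for a CAMB spectrum mapped to the flat sky.
    return k ** -2.0

field = gaussian_random_field(128, toy_power, seed=0)
print(field.shape)  # (128, 128)
```

The flat-sky shortcut is only reasonable for small patches; for full-sky maps the spherical-harmonic route through healpy is the appropriate one.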
The dipole anisotropy caused by the motion of the Earth relative to the CMB rest frame, and the galactic contamination along the equator corresponding to the galactic plane, are removed while generating the temperature maps. The generated full-sky maps have the galactic center at the center of the Mollweide projection.
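The snipping of fixed-size patches from a map can be sketched as below. This assumes the map has already been projected onto a regular 2D pixel grid and is cut into non-overlapping square tiles, with incomplete edge tiles discarded; the paper does not specify the projection or tiling scheme, and `extract_patches` and the 512 x 1024 map are hypothetical.

```python
import numpy as np

def extract_patches(image, patch_size):
    """Split a 2D map into non-overlapping square patches of side patch_size.

    Rows and columns that do not fill a whole patch are dropped.
    Returns an array of shape (num_patches, patch_size, patch_size),
    ordered row by row across the map.
    """
    h, w = image.shape
    ny, nx = h // patch_size, w // patch_size
    cropped = image[: ny * patch_size, : nx * patch_size]
    # (ny, p, nx, p) -> (ny, nx, p, p): group pixels by the patch they belong to.
    patches = cropped.reshape(ny, patch_size, nx, patch_size).swapaxes(1, 2)
    return patches.reshape(ny * nx, patch_size, patch_size)

# Hypothetical 512 x 1024 projected map cut into 128 x 128 patches.
sky = np.random.default_rng(1).standard_normal((512, 1024))
patches = extract_patches(sky, 128)
print(patches.shape)  # (32, 128, 128)
```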
2.2 Implementation and Training
The method proposed in this paper has two components: baryon density estimation with a traditional artificial neural network, and CMB data generation with a deep generative model. We use a Generative Adversarial Network trained on CMB patches obtained with CAMB to generate new CMB data, and a multilayer perceptron trained on labeled CMB data to estimate the baryon density; the latter is used to diagnose and evaluate the performance of the generative network. We first train an image classifier using a multilayer perceptron, with the baryon density obtained from the power spectrum of each CMB map as the class label. Because the dataset has a large number of classes, the classifier approximates a regression model, which lets us predict the baryon density of test images with higher precision. Convolutional neural networks (CNNs) are among the most widely used neural network architectures for image classification; they exploit the local spatial coherence of the input (Rippel, Snoek & Adams, 2015), which rests on the assumption that spatially close pixels are correlated. In the CMB dataset, however, the pixels behave like random noise following a Gaussian distribution, so a CNN cannot find common features in the inputs, and its training accuracy and test error are unfavorable. We tested the ResNet-101 and Inception-v2 CNN architectures: ResNet-101 reached only a very low training accuracy, whereas the Inception network suffered from a high-variance (overfitting) problem (Liu, Wei, Zhang & Yang, 2017). For this reason, we use a multilayer perceptron. A multilayer perceptron is one of the most commonly used feedforward artificial neural network architectures; it consists of three kinds of layers: an input layer, one or more hidden layers, and an output layer.
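Treating a continuous label as one of many classes requires mapping baryon density values to class indices and, at prediction time, mapping class probabilities back to a value. A minimal sketch, assuming uniform bins over a hypothetical range (0.020 to 0.025) with 50 classes, and decoding either by the most probable bin or by the probability-weighted mean of bin centers; none of these choices is stated in the paper.

```python
import numpy as np

def make_bins(low, high, n_classes):
    """Uniform bin edges and bin centers covering [low, high]."""
    edges = np.linspace(low, high, n_classes + 1)
    centers = 0.5 * (edges[:-1] + edges[1:])
    return edges, centers

def to_class(values, edges):
    """Map continuous labels to class indices 0..n_classes-1 (out-of-range values are clamped)."""
    return np.digitize(values, edges[1:-1])  # use interior edges only

def decode(probs, centers, mode="mean"):
    """Turn per-class probabilities back into a continuous estimate."""
    if mode == "argmax":
        return centers[np.argmax(probs, axis=-1)]
    return probs @ centers  # probability-weighted mean of the bin centers

# Hypothetical range and class count, for illustration only.
edges, centers = make_bins(0.020, 0.025, 50)
print(to_class(np.array([0.0200, 0.02455, 0.0250]), edges).tolist())  # [0, 45, 49]
```

The weighted-mean decoding uses the whole softmax output and so can resolve values between bin centers, which is one way a many-class classifier can behave like a regression model.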
Each node in a layer is connected to every node in the next layer through a weighted connection, and the hidden nodes apply a nonlinear activation function. A multilayer perceptron is trained with backpropagation, one of the best-known techniques of supervised learning (Goodfellow, Bengio & Courville, 2018). What distinguishes a multilayer perceptron from a linear perceptron is its use of multiple fully connected layers with nonlinear activations. This makes multilayer perceptrons suitable for data that are not linearly separable (Bullinaria, 2015), and the softmax output layer can be viewed as a logistic regression classifier acting on the features learned by the hidden layers. The weights of the fully connected layers are updated each time a batch of data has been passed through the network, by measuring the error between the output and the expected result (the predetermined labels). This is the essence of learning in neural networks and an example of supervised learning. Backpropagation computes the gradients that an iterative optimization algorithm, gradient descent (Goodfellow, Bengio & Courville, 2018), uses to update the weights of the network. We continue training until the training accuracy and the test cost saturate. We use the softmax cross-entropy as our loss function. Considering a mapping from an input x to a category y, we have
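Objective:
$$J(\theta) = -\mathbb{E}_{(x, y) \sim P(\mathrm{data})}\left[\log P(y \mid x; \theta)\right]$$
where,
E is the expectation function
P(data) is the true data distribution
P(y|x) is the distribution of our parametric model, with parameters θ.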
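For one batch, the objective above reduces to the mean negative log-probability that the softmax assigns to the true class. A NumPy sketch of that computation (the function name is ours; the authors used TensorFlow, where `tf.nn.sparse_softmax_cross_entropy_with_logits` computes the per-example term, but the paper does not say which call they used):

```python
import numpy as np

def softmax_cross_entropy(logits, labels):
    """Mean softmax cross-entropy, -E[log P(y|x)], over a batch.

    logits: (batch, n_classes) raw network outputs
    labels: (batch,) integer class indices
    """
    # Softmax is shift-invariant, so subtracting the row maximum avoids overflow.
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    # Select log P(y_i | x_i) for each example's true class and average.
    return -log_probs[np.arange(len(labels)), labels].mean()

# Two equally likely classes: the loss is log 2.
print(softmax_cross_entropy(np.array([[0.0, 0.0]]), np.array([0])))  # ≈ 0.6931
```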
We now train our generative model to generate CMB data. The primary difference between a discriminative and a generative algorithm is that a discriminative algorithm maps features to labels, whereas a generative algorithm tries to predict the features given a certain label. Discriminative models learn the boundary between classes, while generative models model the distribution of the individual classes. In this experiment, we use a Deep Convolutional Generative Adversarial Network (DCGAN), which can mimic complex data distributions. The primary goal of a Generative Adversarial Network is to generate new samples from the same distribution as the training data. The most notable feature of a GAN is that it consists of a pair of networks: a generative network (G) and a discriminative network (D). The two networks play a two-player game in which the Generator tries to fool the Discriminator by generating images that closely match the training data, and the Discriminator tries to distinguish real images from generated ones; the two are trained jointly in a minimax game. The Discriminator classifies a sample x and outputs the probability, in (0, 1), that x is a real image, whereas the Generator maps a random variable z drawn from a given prior distribution to a synthetic sample G(z).
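Objective:
$$\min_{G} \max_{D} V(D, G) = \mathbb{E}_{x \sim P(x)}\left[\log D(x)\right] + \mathbb{E}_{z \sim P(z)}\left[\log\left(1 - D(G(z))\right)\right]$$
where,
E is the expectation function
P(x) is the true data distribution
P(z) is the prior distribution (usually a Gaussian)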
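On a batch, the expectations in the objective become averages over discriminator outputs. A NumPy sketch of the value function and of the losses each player minimizes (function names are ours; the clipping constant is an assumption added to keep the logarithms finite):

```python
import numpy as np

EPS = 1e-12  # keeps log() finite if the discriminator saturates at 0 or 1

def gan_value(d_real, d_fake):
    """Batch estimate of V(D, G) = E_x[log D(x)] + E_z[log(1 - D(G(z)))].

    d_real: discriminator outputs D(x) on real patches, values in (0, 1)
    d_fake: discriminator outputs D(G(z)) on generated patches
    """
    d_real = np.clip(d_real, EPS, 1 - EPS)
    d_fake = np.clip(d_fake, EPS, 1 - EPS)
    return np.mean(np.log(d_real)) + np.mean(np.log(1 - d_fake))

def discriminator_loss(d_real, d_fake):
    # D maximizes V, i.e. minimizes -V.
    return -gan_value(d_real, d_fake)

def generator_loss(d_fake):
    # G minimizes V; only the second term depends on G (minimax form).
    d_fake = np.clip(d_fake, EPS, 1 - EPS)
    return np.mean(np.log(1 - d_fake))

# At the theoretical equilibrium D outputs 1/2 everywhere, so V = -log 4.
half = np.full(8, 0.5)
print(gan_value(half, half))  # ≈ -1.3863
```

Goodfellow et al. (2014) also suggest a "non-saturating" generator loss, -E[log D(G(z))], which gives stronger gradients early in training; the paper does not state which form was used.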
The Discriminator D tries to maximize the objective so that D(x) is close to 1 (real) and D(G(z)) is close to 0 (fake), while the Generator G tries to minimize it so that D(G(z)) is close to 1 (the discriminator is fooled into classifying the generated G(z) as real). When the discriminator is optimal, this training process amounts to minimizing the Jensen-Shannon divergence between the data distribution P(x) and the distribution of the generated samples G(z). We use the TensorFlow library to implement both the MLP and the GAN models. We use the Adam optimization algorithm (Kingma & Ba, 2014) instead of traditional stochastic gradient descent to update the weights of the network (Michelucci, 2018), and L2 regularization, also known as ridge regularization, to prevent our models from overfitting. In L2 regularization, a penalty proportional to the sum of the squared weights is added to the loss function (Goodfellow, Bengio & Courville, 2018). The networks are trained on Google Cloud Platform using a Tesla K80 GPU.
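A minimal NumPy sketch of the Adam update with an L2 penalty folded into the gradient, applied to a hypothetical one-parameter quadratic loss so the answer can be checked by hand: the penalized minimum of (w - 3)² + λw² is w = 3/(1 + λ). The learning rate 0.05 and the step count suit this toy problem only; they are not the paper's settings (0.001 for the MLP, 0.000001 for the GAN).

```python
import numpy as np

def adam_l2(grad_fn, w0, lr=0.05, lam=0.01, beta1=0.9, beta2=0.999, eps=1e-8, steps=2000):
    """Minimize loss(w) + lam * ||w||^2 with Adam.

    grad_fn(w) returns the gradient of the data loss only; the gradient of the
    L2 penalty, 2 * lam * w, is added here.
    """
    w = np.asarray(w0, dtype=float)
    m = np.zeros_like(w)  # running mean of gradients (first moment)
    v = np.zeros_like(w)  # running mean of squared gradients (second moment)
    for t in range(1, steps + 1):
        g = grad_fn(w) + 2.0 * lam * w
        m = beta1 * m + (1 - beta1) * g
        v = beta2 * v + (1 - beta2) * g ** 2
        m_hat = m / (1 - beta1 ** t)  # correct the bias from zero initialization
        v_hat = v / (1 - beta2 ** t)
        w = w - lr * m_hat / (np.sqrt(v_hat) + eps)
    return w

# Toy data loss (w - 3)^2, whose gradient is 2 (w - 3).
w_opt = adam_l2(lambda w: 2.0 * (w - 3.0), w0=[0.0])
print(w_opt)  # close to 3 / 1.01 ≈ 2.9703
```

Adding the L2 gradient before Adam's per-parameter scaling, as here, is the classic coupled form; decoupled weight decay (AdamW) behaves differently, and the paper does not say which form was used.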
2.3 Network Configuration
MLP Network:
| Hyperparameter | Value |
|---|---|
| No. of hidden layers | 5 |
| No. of nodes in each hidden layer | 3223 |
| Learning rate | 0.001 |
| Batch size | 512 |
| No. of epochs | 50000 |
| Regularization parameter | 0.01 |
The learning rate determines how fast the weights (coefficients) of the network are updated. An epoch is one complete pass of the algorithm over the entire training dataset, so an epoch is completed once every sample in the dataset has been seen. An iteration is one pass of a single batch of data through the algorithm, which for a multilayer perceptron means one forward pass and one backward pass; an iteration is therefore completed once a batch of data has passed through the network. The batch size is the number of training examples passed through the network at once (Shen, 2016; Svozil, Kvasnicka & Pospichal, 1997).
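The relation between dataset size, batch size, iterations and epochs can be sketched as follows. The dataset size of 10,000 is hypothetical (the paper does not state it); the batch size of 512 and the 50000 epochs are taken from the MLP table.

```python
import math
import numpy as np

def iterations_per_epoch(n_samples, batch_size):
    """Number of mini-batch iterations needed to see every sample once."""
    return math.ceil(n_samples / batch_size)

def minibatches(n_samples, batch_size, rng):
    """Yield index arrays for one epoch: a shuffled pass over the data in batches."""
    order = rng.permutation(n_samples)
    for start in range(0, n_samples, batch_size):
        yield order[start:start + batch_size]  # the last batch may be smaller

print(iterations_per_epoch(10_000, 512))           # 20
print(iterations_per_epoch(10_000, 512) * 50_000)  # 1000000 iterations in total
```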
GAN Network:
We use a modified version of the standard GAN architecture, following the deep convolutional GAN design of Radford et al. (2015), with convolutional layers that use a 5x5 kernel.
| Hyperparameter | Discriminator | Generator |
|---|---|---|
| No. of hidden layers | 5 (56 patches), 6 (112 patches) | 5 (56 patches), 6 (112 patches) |
| Operations | Conv / linear | linear / DeConv |
| Outputs | LeakyReLU-BatchNorm / sigmoid | ReLU-BatchNorm / tanh |
| Batch size | 50 | 50 |
| No. of epochs | 2000 | 2000 |
| Learning rate | 0.000001 | 0.000001 |
| Regularization parameter | 0.01 | 0.01 |

Dimension of the Gaussian prior distribution (linear input of the generator) = 200
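The tables give layer counts but not spatial sizes. One hypothetical layout that is consistent with them: the generator's linear layer maps the 200-dimensional prior sample to an 8x8 feature map, and each 5x5 transposed convolution with stride 2 and "same" padding doubles the spatial size, so four doublings reach 128x128 (5 layers) and five reach 256x256 (6 layers); the discriminator mirrors this with stride-2 convolutions. The 8x8 seed, the stride and the padding are our assumptions, not values from the paper.

```python
import math

def conv_same_out(size, stride=2):
    """Spatial size after a 'same'-padded convolution: ceil(size / stride)."""
    return math.ceil(size / stride)

def deconv_same_out(size, stride=2):
    """Spatial size after a 'same'-padded transposed convolution: size * stride."""
    return size * stride

def generator_sizes(target, seed=8):
    """Sizes from the linear layer's seed map up to the target patch size."""
    sizes = [seed]
    while sizes[-1] < target:
        sizes.append(deconv_same_out(sizes[-1]))
    return sizes

def discriminator_sizes(size, final=8):
    """Sizes from the input patch down to the final feature map."""
    sizes = [size]
    while sizes[-1] > final:
        sizes.append(conv_same_out(sizes[-1]))
    return sizes

print(generator_sizes(128))      # [8, 16, 32, 64, 128]
print(generator_sizes(256))      # [8, 16, 32, 64, 128, 256]
print(discriminator_sizes(128))  # [128, 64, 32, 16, 8]
```

Counting the linear layer's output as the first entry, the list lengths (5 and 6) match the hidden-layer counts in the table, which is why this layout was chosen for the illustration.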
3 Results
Here we present the results obtained after training the multilayer perceptron and the generative adversarial network. We focus our study on two classes of CMB simulations: 56 patches and 112 patches. We analyze the performance of our generative model by comparing the generated patches with CAMB patches whose baryon density matches the baryon density that the trained multilayer perceptron predicts for the generated patches. The diagnostic measures used are the histogram of pixel intensities, the standard deviation of the pixel intensity distribution, the power spectrum, the cross power spectrum, the correlation matrix of the power spectrum, the peak count and the total peak count. We obtain the peak count by counting all pixels in the map whose intensity is higher than that of their neighbors; this is done by converting our 2D image into a signal using the fast Fourier transform (FFT) and then finding the peaks in that signal. To obtain the power spectrum, we autocorrelate the image; the Fourier transform of this autocorrelation is the power spectral density image. We then compute its azimuthally averaged radial profile, which gives the power spectrum of the image.
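Two of these diagnostics can be sketched in NumPy as follows. The power spectrum uses |FFT|², which by the Wiener-Khinchin theorem equals the Fourier transform of the image's (circular) autocorrelation, followed by an azimuthal average in integer wavenumber bins; passing a second image gives the cross power spectrum. The peak counter compares each pixel directly with its 8 neighbors, which is a swapped-in technique and not the FFT-based route described above. Square images, wavenumbers in cycles per patch (not multipoles ℓ) and all function names are assumptions.

```python
import numpy as np

def radial_spectrum(a, b=None):
    """Azimuthally averaged auto (b=None) or cross power spectrum of square images.

    Returns the mean of Re[FFT(a) * conj(FFT(b))] in integer bins of radial
    wavenumber k = 0, 1, 2, ... (cycles per image).
    """
    b = a if b is None else b
    n = a.shape[0]
    fa = np.fft.fft2(a - a.mean())
    fb = np.fft.fft2(b - b.mean())
    psd2d = (fa * np.conj(fb)).real
    freqs = np.fft.fftfreq(n) * n
    k_bin = np.rint(np.hypot(freqs[None, :], freqs[:, None])).astype(int).ravel()
    power = np.bincount(k_bin, weights=psd2d.ravel())
    counts = np.bincount(k_bin)
    return power / np.maximum(counts, 1)

def count_peaks(image, threshold=-np.inf):
    """Count pixels strictly brighter than all 8 neighbors and above threshold."""
    image = np.asarray(image, dtype=float)
    h, w = image.shape
    # Pad with -inf so border pixels are compared only against real neighbors.
    padded = np.pad(image, 1, mode="constant", constant_values=-np.inf)
    is_peak = image > threshold
    for dy in (-1, 0, 1):
        for dx in (-1, 0, 1):
            if dy == 0 and dx == 0:
                continue
            neighbor = padded[1 + dy:1 + dy + h, 1 + dx:1 + dx + w]
            is_peak &= image > neighbor
    return int(is_peak.sum())

# A pure cosine with 8 cycles across a 64-pixel image puts all its power at k = 8.
x = np.arange(64)
wave = np.tile(np.cos(2 * np.pi * 8 * x / 64), (64, 1))
print(int(np.argmax(radial_spectrum(wave))))  # 8
```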
3.1 56 patches
Figures 5-11 show the diagnostic results for the 56 patches generated by the GAN model.
We now generate 100 samples with the trained GAN model and feed them to the trained MLP model to predict their baryon densities. We then extract the patches from the CAMB training dataset whose baryon densities match the MLP predictions and compare them with the GAN patches.
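The matching step and one of the ensemble diagnostics can be sketched as follows. Matching is done here by nearest label, and the "correlation matrix of the power spectrum" is interpreted as the correlation between wavenumber bins across an ensemble of patches; both are our assumptions, and the numbers below are hypothetical.

```python
import numpy as np

def match_by_label(predicted, reference_labels):
    """Index of the reference (CAMB) patch whose label is closest to each prediction."""
    predicted = np.asarray(predicted, dtype=float)[:, None]
    reference_labels = np.asarray(reference_labels, dtype=float)[None, :]
    return np.argmin(np.abs(predicted - reference_labels), axis=1)

def spectrum_correlation(spectra):
    """Correlation matrix between k-bins of an ensemble of power spectra.

    spectra: (n_samples, n_bins) array, one power spectrum per patch.
    """
    return np.corrcoef(spectra, rowvar=False)

# Hypothetical MLP predictions for two GAN samples and labels of three CAMB patches.
print(match_by_label([0.0221, 0.0238], [0.020, 0.022, 0.024]).tolist())  # [1, 2]
```

Computing `spectrum_correlation` separately for the GAN samples and for their matched CAMB patches gives two matrices that can be compared element by element.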
3.2 112 patches
Although the results of the GAN model trained on 56 patches are impressive, the assumption that the 2D images are sufficiently flat cannot be carried over to the 112 patches. The distribution of pixel intensities of the patches generated by the GAN model trained on 112 patches is almost perfectly Gaussian, signifying a loss of information. This could be addressed by training on spherical CMB patches instead of 2D images and replacing the convolutional layers of the GAN model with spherical convolutional layers.
4 Conclusion and future plans
We have demonstrated the ability of generative adversarial networks to learn the complex distribution behind flat CMB anisotropy maps. We trained a deep convolutional generative adversarial network on datasets of 56 and 112 CMB patches obtained with CAMB. The patches generated by the GAN model trained on 56 patches are very similar to our training data, the patches obtained with CAMB and healpy: the power spectra of the GAN-generated and CAMB patches are in very close agreement, and a similar trend can be seen in the other diagnostic metrics. In contrast, the distribution of pixel intensities of the patches generated by the GAN model trained on 112 patches is almost Gaussian, signifying a loss of information. We attribute this to our approximation that the 56 CMB patches are sufficiently flat, an assumption that cannot be extended to 112 patches. This issue could be solved by using spherical CMB maps and spherical convolutional layers in the generative adversarial network. We have shown that deep learning can be a viable alternative to traditional methods of CMB data generation and is computationally much more efficient for cosmological experiments that require large amounts of CMB data. Because we were constrained by limited computational power, we trained our deep generative model on patches of CMB maps rather than full-sky maps. In the future, we hope to improve our model and train it on full-sky maps using spherical convolutional layers, to present a viable method of CMB data generation for cosmological analysis.
Footnotes
 affiliation: BITS Pilani Hyderabad Campus
References
 Bucher, M. 2015, International Journal of Modern Physics D, 24, 1530004-1530303.
 Górski, K. M., Hivon, E., Banday, A. J., et al. 2005, ApJ, 622, 759.
 Goodfellow, Ian, et al. 2018, Deep Learning, MITP.
 Hastie, Trevor, et al. 2017, The Elements of Statistical Learning: Data Mining, Inference, and Prediction, Springer.
 Liddle, Andrew R 2015, An Introduction to Modern Cosmology, John Wiley and Sons.

 Hu, Wayne & White, Martin 2004, The Cosmic Symphony, Scientific American, 290, 44-53, doi:10.1038/scientificamerican0204-44.
 Bennett, C. L., Larson, D., Weiland, J. L., et al. 2013, The Astrophysical Journal Supplement Series, 208, 20.
 Koberlein, Brian, and David Meisel 2009, Astrophysics through Computation.
 Planck Collaboration, Ade, P. A. R., Aghanim, N., et al. 2011, A&A, 536, A1.
 Bennett, C. L., Banday, A. J., Gorski, K. M., et al. 1996, ApJ, 464, L1.
 Lewis & Challinor. Code for Anisotropies in the Microwave Background (CAMB), https://camb.info.
 Rippel, O., Snoek, J., & Adams, R. P. 2015, arXiv e-prints, arXiv:1506.03767.

 Kingma, D. P., & Ba, J. 2014, arXiv e-prints, arXiv:1412.6980.
 Shen, H. 2016, arXiv e-prints, arXiv:1611.05827.
 Michelucci, Umberto. (2018). Applied Deep Learning: A Case-Based Approach to Understanding Deep Neural Networks. doi:10.1007/978-1-4842-3790-8.
 Sifaoui, Amel & Abdelkrim, Afef & Benrejeb, Mohamed. (2008). On the Use of Neural Network as a Universal Approximator. International Journal of Sciences, 2, 386-399.
 Bo Liu, Ying Wei, Yu Zhang, Qiang Yang. Deep neural networks for high dimension, low sample size data. In Proceedings of the Twenty-Sixth International Joint Conference on Artificial Intelligence, IJCAI-17, pages 2287-2293, 2017.
 Daniel Svozil, Vladimír Kvasnicka, Jiří Pospichal. Introduction to multilayer feedforward neural networks, Chemometrics and Intelligent Laboratory Systems, Volume 39, Issue 1, 1997, Pages 43-62, ISSN 0169-7439.
 J. A. Bullinaria, Learning in multilayer perceptrons, backpropagation, Neural Comput. Lect., vol. 7, no. 8, pp. 1-16, 2015.
 Alex Krizhevsky, Ilya Sutskever, and Geoffrey E Hinton. Imagenet classification with deep convolutional neural networks. In Advances in neural information processing systems, pages 1097-1105, 2012.
 I. J. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. Courville, and Y. Bengio. Generative Adversarial Networks. ArXiv 1406.2661, June 2014.
 Paulina Grnarova, Kfir Y Levy, Aurelien Lucchi, Thomas Hofmann, and Andreas Krause. An online learning approach to generative adversarial networks. arXiv preprint arXiv:1706.03269, 2017.
 Akash Srivastava, Lazar Valkov, Chris Russell, Michael U Gutmann, and Charles Sutton. Veegan: Reducing mode collapse in gans using implicit variational learning. In Advances in Neural Information Processing Systems, pages 3308-3318, 2017.
 Alec Radford, Luke Metz, and Soumith Chintala. Unsupervised representation learning with deep convolutional generative adversarial networks. ArXiv 1511.06434, 2015.
 I. Gulrajani, F. Ahmed, M. Arjovsky, V. Dumoulin, and A. Courville. Improved Training of Wasserstein GANs. ArXiv 1704.00028, March 2017.