A volumetric deep Convolutional Neural Network for simulation of dark matter halo catalogues
Abstract
For modern large-scale structure survey techniques it has become standard practice to test data analysis pipelines on large suites of mock simulations, a task which is currently prohibitively expensive for full N-body simulations. Instead of calculating this costly gravitational evolution, we have trained a three-dimensional deep Convolutional Neural Network (CNN) to identify dark matter protohalos directly from the cosmological initial conditions. Training on halo catalogues from the Peak Patch semi-analytic code, we test various CNN architectures and find they generically achieve a Dice coefficient of \sim 92\% in only 24 hours of training. We present a simple and fast geometric halo-finding algorithm to extract halos from this powerful pixel-wise binary classifier and find that the predicted catalogues match the mass function and power spectra of the ground truth simulations to within \sim 10\%. We investigate the effect of long-range tidal forces on an object-by-object basis and find that the network's predictions are consistent with the non-linear ellipsoidal collapse equations used explicitly by the Peak Patch algorithm.
keywords:
large-scale structure of Universe – galaxies: haloes – dark matter – methods: numerical

1 Introduction
The fundamental observable in the study of the large-scale structure of the Universe is the nonlinear matter density field. In N-body simulations of collisionless cold dark matter (CDM) particles, initially overdense regions collapse under gravity to form virialized structures termed dark matter halos. In the standard model of cosmology these objects form the potential wells in which baryonic matter can collect to form galaxies, galaxy groups, and galaxy clusters (Rubin et al., 1980; Hopkins et al., 2014). An essential output of N-body simulations is a catalogue of positions, velocities, and masses of halos. These mock dark matter halo catalogues allow us to interpret the observations of galaxy surveys and constrain cosmological models.
Modern large-scale structure survey techniques like the Sunyaev-Zel'dovich effect (Planck Collaboration et al., 2016; George et al., 2015), weak lensing (Ade et al., 2016; Joudaki et al., 2017; Hildebrandt et al., 2017; DES Collaboration et al., 2017), or intensity mapping (Kovetz et al., 2017), hold incredible promise for constraining fundamental physics such as gravity on large scales (Masui et al., 2010), the equation-of-state of dark energy (Shaw et al., 2015), neutrino masses (Inman et al., 2016), or the physics of inflation (Alvarez et al., 2014). However, each technique is accompanied by complicated systematics which, if not understood, would wash out the sought-after signal. It has become standard practice, therefore, to test data analysis pipelines on large suites of mock simulations (Avila et al., 2017; Manera et al., 2013), which combine realistic models of the signal and instrument. However, the number of simulations required to accurately determine survey error bars and scan parameter space is currently prohibitively large for full N-body simulations. This has led to the development of many ‘approximate methods’ of large-scale structure which attempt to create simulations of a satisfactory accuracy at minimal computational cost (Bond & Myers, 1996a; Monaco et al., 2013; Tassev et al., 2013; Izard et al., 2016; Feng et al., 2016; Avila et al., 2015; Kitaura et al., 2014; Chuang et al., 2015; White et al., 2014).
In this work, we investigate the use of a Convolutional Neural Network (CNN) for fast generation of mock dark matter halo catalogues directly from the initial conditions. In recent years, CNNs have been lauded for their performance in computer vision tasks such as object detection or image segmentation (Krizhevsky et al., 2012). In cosmology, machine learning techniques have shown promise for the purposes of cosmological parameter estimation (Ravanbakhsh et al., 2017; Gupta et al., 2018; Gillet et al., 2018), simulating two-dimensional slices of the nonlinear density field (Rodriguez et al., 2018), initial conditions reconstruction (Modi et al., 2018), as well as classifying evolved structures in N-body simulations (Aragon-Calvo, 2018, who uses a similar CNN architecture to that of this work). Recently, Lucie-Smith et al. (2018) reported a study of a random forest classifier for halos formed in an N-body simulation, traced back to the initial conditions.
Here we report the first application of a three-dimensional CNN for simulation of mock halo catalogues. We formulate halo identification as a pixel-wise binary classification (or image segmentation) problem. The input of the CNN is therefore the initial, or Lagrangian space, density field and the output is a mask whose value is the network's certainty that a voxel ends up inside of a halo in the evolved simulation. The CNN is free to learn any spatial function (or feature) of the initial density field which allows it to distinguish between collapsed (halo) and uncollapsed voxels. This is unlike the random forest method of Lucie-Smith et al. (2018), where the input features are chosen by hand.
The pioneering work of Press & Schechter (1974) in the theoretical understanding of dark matter halo formation described the process statistically as a thresholding operation on the Gaussian random initial density field. Bardeen et al. (1986) added further constraints, noting that local maxima (or peaks) of the field should dictate the collapse dynamics, requiring information on both its first and second derivatives. Bond & Myers (1996a) formalized the relationship between tidal forces and triaxial (or ellipsoidal) collapse of the regions surrounding peaks (peak patches), giving rise to the Peak Patch algorithm. The latter is the method used to generate the so-called “ground truth” simulations that we train our network on. The relationship between halo masses and tidal forces is both a non-trivial and a well-defined property of (Peak Patch) halos, which can be evaluated on an object-by-object basis. CNNs quickly learn to find edges (Krizhevsky et al., 2012), for example, so it is interesting to ask whether more complicated combinations of derivatives can be learned as well.
In addition, producing large mocks of the universe with a CNN has two strong advantages over an N-body simulation: computational speed and orders-of-magnitude smaller memory requirements. The CNN by nature only considers the effects of pixels separated by less than the size of its largest filter (which can be quite large, 128 Mpc in this study). This approximation allows the cosmological density field to be subdivided into separate volumes, eliminating the need to compute costly long-range gravitational forces, which typically requires expensive message passing between nodes. When combined with a multiscale initial conditions generator (e.g. Hahn & Abel (2011)), the need to hold the full simulation in memory at once can also be eliminated, and a large volume of the universe can be simulated on any modest machine.
The paper is outlined as follows. In the following section (2), we provide some relevant background information on the V-Net architecture we have adopted. Then, in Section 3, we discuss the specifics of its implementation and the simulations that were used as training, validation, and test data. We tested several network variations in order to determine one that performed best after a reasonable amount of training. In Section 4, we suggest a simple and fast algorithm for extracting halo catalogues from the mask that is output by the CNN, with special attention to completeness. We call this method for producing mock catalogues “HaloNet”. We can then evaluate the accuracy of these catalogues with population and clustering statistics, which we describe in Section 5. Finally, in Section 6, we discuss the implications of our findings and the future prospects for fast and accurate mocks with Convolutional Neural Networks.
2 Convolutional Neural Networks for Image Segmentation
A standard CNN passes inputs through a series of layers which decrease in size along the input dimension, but increase in size along a new dimension which labels the identified features. Eventually, at output, these features are combined to (hopefully) identify the image as belonging to one of the requested classes. The output dimension is equal to the number of classes and its value represents the network's certainty that the input image is in that class. In image segmentation, however, the output should have the same dimensionality as the input, with values representing the certainty that an input element is part of a region of interest, termed the foreground. This problem was addressed for medical image segmentation by Ronneberger et al. (2015), whose U-Net architecture makes use of deconvolutions to return to the input image dimensionality. We strongly recommend that the unfamiliar reader consult Ronneberger et al. (2015) and Milletari et al. (2016) for detailed descriptions as well as beautifully constructed connectivity graphs of their networks. We do not reproduce those graphs here, but we nonetheless summarize the essential aspects of those architectures we have used in our implementation. The term U-Net refers to a graphical picture of the U-shaped flow of the data through the network. One first descends the left side of the U, identifying features on larger and larger scales. These features are then remapped into the image space through deconvolution. Ronneberger et al. (2015) introduced the concept of fine-grained feature forwarding where, as one re-ascends the right side of the U, the features output from the same-dimension level on the left are concatenated onto those coming from below. The final convolutional layer's output has the same dimensionality as the input but with two features, across which a softmax \sigma:\mathbb{R}^{D}\rightarrow[0,1]^{D} is applied to convert the final output to a probability,
\sigma(z_{i})=e^{z_{i}}/\sum\limits_{j=1,\dots,N}e^{z_{j}},  (1) 
where i=1,\dots,N, D is the dimensionality of the space, and N=2. The two features are then compared to the ground-truth foreground and background masks, respectively, to compute the loss.
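As a concrete sketch, the two-class voxel-wise softmax of Eq. 1 takes only a few lines of numpy; this is a toy stand-in for the network's final layer, and the array shapes here are purely illustrative:

```python
import numpy as np

def softmax(z, axis=-1):
    """Numerically stable softmax over the feature axis (Eq. 1)."""
    z = z - z.max(axis=axis, keepdims=True)  # shift for stability; result unchanged
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

# Two-feature output for a tiny 2x2x2 volume: the last axis holds the
# foreground/background logits (N = 2).
logits = np.random.randn(2, 2, 2, 2)
probs = softmax(logits)

# At every voxel the two class probabilities sum to one.
assert np.allclose(probs.sum(axis=-1), 1.0)
```

The subtraction of the per-voxel maximum before exponentiating leaves the ratio of Eq. 1 unchanged while avoiding overflow.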
This idea was then applied in three dimensions to MRI images by Milletari et al. (2016), whose V-Net architecture we adopt for this work. Milletari et al. (2016) further formulated the successive convolutions applied on each level (these do not change the dimensionality) as residual networks, which have been found to significantly improve the training of very deep networks (He et al., 2015). We refer the reader to Milletari et al. (2016) for further details on V-Net; in Section 3.2 we summarize our implementation and the variations thereof that we have tested.
3 Implementation and Training set
3.1 Peak Patch review
We train our network on dark matter halo catalogues generated with the Peak Patch semi-analytic code (Bond & Myers, 1996a, b, c). Peak Patch identifies dark matter halos in Lagrangian space by performing spherically averaged measurements of both the density and the tidal tensor at locations of candidate peaks of the density field. A Peak Patch halo is therefore the largest spherical region of Lagrangian space which satisfies the conditions for ellipsoidal collapse at the target redshift. A hierarchical exclusion and binary merging algorithm is then performed to determine the final catalogues. A candidate peak is excluded if its centre lies inside a larger halo. If two peaks overlap only slightly, the overlapping mass is subtracted from the smaller to conserve mass. The final halos are then moved to their evolved (Eulerian space) positions using second-order Lagrangian perturbation theory (although for this study we concentrate on the Lagrangian halo positions and masses, and so do not specify the details of the displacement). Peak Patch has passed extensive validations against many modern simulations, which will be outlined in a series of upcoming papers.
For our training data, we have simulated 256 (512~{\rm comoving\ Mpc})^{3} Peak Patch boxes with 1 Mpc resolution (a 512^{3} periodic grid). These are computed at redshift 0 using the following cosmological parameters: H_{0}=70.0 km s^{-1} Mpc^{-1}, \Omega_{b}=0.043, \Omega_{c}=0.207, \Omega_{\Lambda}=0.75, n_{s}=0.96, and \sigma_{8}=0.8. We place a lower mass cutoff at the radius of a 27-cell halo. We keep 32 of these simulations as our testing set (Section 5). The ground truth mask is generated at the same resolution from the Peak Patch catalogues by masking only voxels whose centres lie inside a halo. The masks and associated density fields are then divided into 128^{3} subvolumes to be input to the network. Our training set therefore consists of 14336 independent volumes, an eighth of which are saved for validation. This is augmented by another factor of 8 by random reflections. While the Peak Patch algorithm itself requires both the density and displacement (velocity) fields, only the density is provided as input to the network.
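The subdivision into subvolumes and the 8-fold reflection augmentation described above can be sketched as follows; the function names are ours rather than from any HaloNet code, and we demonstrate on a small array for brevity:

```python
import numpy as np

def subvolumes(field, size):
    """Tile a cubic field into non-overlapping cubes of side `size`."""
    n = field.shape[0] // size
    for i in range(n):
        for j in range(n):
            for k in range(n):
                yield field[i*size:(i+1)*size,
                            j*size:(j+1)*size,
                            k*size:(k+1)*size]

def random_reflection(vol, rng):
    """Flip each axis independently with probability 1/2: an 8-fold augmentation."""
    for axis in range(3):
        if rng.random() < 0.5:
            vol = np.flip(vol, axis=axis)
    return vol

# Toy 8^3 box standing in for a 512^3 density field.
box = np.arange(8**3, dtype=float).reshape(8, 8, 8)
chunks = list(subvolumes(box, 4))        # (8/4)^3 = 8 subvolumes
flipped = random_reflection(box, np.random.default_rng(0))
```

For the real training set (256 boxes of 512^3 at 128^3 per subvolume) the same tiling yields 64 subvolumes per box; reflections are applied on the fly during training, and the density mask pair must of course be flipped together.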
We note that our method is also applicable to protohalos traced back from the output of N-body simulations. While N-body simulations calculate the true dynamics of collisionless cold dark matter particles, dark matter halos are dynamic objects with complicated morphologies (Diemer & Kravtsov, 2014; Adhikari et al., 2014), and typical Eulerian space halo finders suffer from this uncertainty in their definition. We choose to perform this study on Peak Patch halos largely for the ease with which we can generate large numbers of statistically independent realizations, but also due to the unambiguous definition of a Peak Patch halo. In N-body simulations, disconnected regions of Lagrangian space can belong to the same Eulerian halo (unlike in Peak Patch), which can add another degree of difficulty when performing Lagrangian-space halo finding. Peak Patch has also been shown to have percent level accuracy for cluster and group sized halos, which are to a high degree spherical, and also reproduces a wide range of nonlinear effects related to tidal forces also observed in N-body simulations. See Section 6 for a discussion of how our method generalizes.
3.2 Implementation and training
We have coded a custom implementation of V-Net using keras (Chollet et al., 2015). Following Milletari et al. (2016), we raise and lower the level (halve and double the resolution) with three-dimensional cubic down- (Conv3D(2)) and up- (Conv3DT(2)) convolutions of size 2 with stride 2. On each level, successive cubic convolutions of size 5 with unit stride are applied (Conv3D(5)), bracketed by an identity “shortcut” (see He et al. (2015) for details) to obtain a residual block. After every convolution we apply a leaky rectified linear unit (LReLU) activation with \alpha=0.05. Appropriate zero padding is used throughout to obtain the required output dimensionality. The architecture whose output we analyze in the following sections has 5 levels and 3 Conv3D(5)s within each residual block. This is true for all levels except the input, which performs a single Conv3D(5) to set the initial number of filters (in this case 16) and then a single Conv3D(5) bracketed by a shortcut. As our simulations have 1 Mpc resolution, this network downsamples to 2^{5}=32 Mpc scales, but then learns (5\times 5\times 5) kernels on that level, meaning its largest filter could learn 160 Mpc features.
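As an illustrative, single-channel stand-in for the residual blocks described above (the actual network operates on many filter channels with learned kernels in keras), one level's chain of size-5, stride-1, zero-padded convolutions with LReLU activations and an identity shortcut might look like:

```python
import numpy as np
from scipy.ndimage import convolve

def lrelu(x, alpha=0.05):
    """Leaky rectified linear unit used after every convolution."""
    return np.where(x > 0, x, alpha * x)

def residual_block(x, kernels):
    """Successive size-5, stride-1, zero-padded 3D convolutions with
    LReLU activations, bracketed by an identity shortcut (He et al. 2015)."""
    y = x
    for k in kernels:
        # mode='constant', cval=0 gives the zero padding described above
        y = lrelu(convolve(y, k, mode='constant', cval=0.0))
    return x + y  # identity shortcut: the block learns a residual

# Three random 5x5x5 kernels standing in for learned weights.
kernels = [0.01 * np.random.randn(5, 5, 5) for _ in range(3)]
x = np.random.randn(16, 16, 16)
out = residual_block(x, kernels)
```

The output retains the input shape, as required for the shortcut addition; in the trained network the down/up convolutions between levels change the resolution, not these blocks.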
To test whether we are capturing large-scale environmental effects we train a 6 level V-Net as well. However, due to memory limitations we are forced to reduce the initial number of filters from 16 to 10 (which reduces the doubling on each successive level by that factor). Still, the total numbers of free parameters of the 5 and 6 level networks are comparable. To confirm that 5 levels are necessary to capture all information, we also train a 4 level network with 16 initial filters. This information is summarized in Table 1. All shallower networks, networks with fewer filters, and networks with smaller convolution kernels performed worse in training than the 5 level network. We also tested the use of batch normalization (Ioffe & Szegedy, 2015) before nonlinearities. While we observed gains when training smaller 64^{3} boxes with batch normalization at the input of every LReLU activation, the method has too large a memory overhead for the 128^{3} boxes. We tested several inhomogeneous placements of the normalizations but found that these were generally prone to overfitting.
We train using the stochastic gradient descent optimizer provided in keras with the TensorFlow (Abadi et al., 2016) backend, on a single Power 8 node with 4 Nvidia Tesla P100 GPUs. Following Milletari et al. (2016), we maximize the Dice coefficient \mathcal{D},
\mathcal{D}=\frac{2\vec{g}\cdot\vec{p}}{\vec{g}^{2}+\vec{p}^{2}},  (2) 
where \vec{g} and \vec{p} are D-dimensional vectors representing the ground truth and network outputs. The sum is performed over both foreground and background masks. Our training proceeds in two stages. For the first stage we use a learning rate of 0.1, momentum of 0.4, and dropout of 0.5 on the activations following the Conv3D(2) and Conv3DT(2) layers of the inner levels. We perform this stage on a very small (5\times 512^{3} box) sample of the dataset. This allows the training to proceed quickly past the local minimum corresponding to the collapse fraction (the fraction of Lagrangian space which ends up in halos above the minimum halo mass cutoff) of f_{\rm col}\simeq 0.27 (i.e. a Dice coefficient of \sim 73\%). For the second stage, we stop the training, change to a learning rate of 0.01 and momentum of 0.9, turn off dropout, and iterate through the entire training set. At all stages we use minibatches of size 3, due to memory limitations.
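The Dice objective of Eq. 2 is straightforward to express on flattened masks. A minimal numpy version (the small eps guard against empty masks is our addition) illustrates the quantity being maximized:

```python
import numpy as np

def dice(g, p, eps=1e-7):
    """Dice coefficient of Eq. 2 between flattened ground-truth (g) and
    predicted (p) masks; eps guards against division by zero."""
    g = g.ravel().astype(float)
    p = p.ravel().astype(float)
    return 2.0 * np.dot(g, p) / (np.dot(g, g) + np.dot(p, p) + eps)

truth = np.array([1, 1, 0, 0], dtype=float)
perfect = dice(truth, truth)        # ~1: complete overlap
disjoint = dice(truth, 1 - truth)   # 0: no overlap
```

In training, the same expression is evaluated on the soft (probability-valued) network outputs, so it remains differentiable, and the optimizer ascends its gradient.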
In Figure 1, we show the second stage of training for the architectures summarized in Table 1. We train each for a 24 hour period, and find that all achieve a Dice coefficient \geq 92\%. We find that the 5 level network achieves the largest Dice coefficient, despite the fact that its gradient updates (iterations) take the longest to compute, meaning the optimizer completes a smaller number of gradient updates in a fixed time. While the fraction-of-a-percent improvement of the 5 level over the 4 may seem marginal, the Dice coefficient is a pixel-based quantity and is difficult to interpret in terms of accuracy on halo properties (where the difference may be large). A better metric is to compare this increase to the fraction of pixels in true halos (f_{\rm col}\simeq 0.27). Interestingly, we find a marginal improvement when normalizing each simulation by the mean grid-level standard deviation of all simulations rather than by its own, indicating that the 5 level network has access to information on box scales. Both the individually and globally normalized 5 level networks are shown in Figure 1, while the 4 and 6 level networks are individually normalized. We therefore choose to train the globally normalized 5 level network for another 48 hours (72 hours total), and this final network is analyzed in the following sections.
3.3 Validating the Trained Neural Network
Having completed the training, we investigated the accuracy of the network's predictions for the set of 32 test simulations. HaloNet was trained on 128^{3} volumes of the density field, so to predict the probability mask for a full 512^{3} simulation (which represents the certainty that a voxel will end up in a dark matter halo) we partitioned it into subvolumes of the input size and predicted on each separately. In order to avoid edge effects we split the simulation into 8^{3} overlapping subvolumes of 128^{3} cells, where 32 cells on each edge are used as a buffer, as this is roughly the maximum halo size that we expect. Therefore, each pass to the network results in an effective volume of (128-2\times 32)^{3}=64^{3} cells. Combining the predicted subvolumes back together allows us to create the final output mask. Due to the speed of HaloNet, the rather large buffer region (by relative volume) is not a computational problem, as a full 512^{3} run takes only \sim 3 minutes to predict.
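The buffered tiling strategy can be sketched as below; `predict` stands in for the trained network, and the toy tile and buffer sizes are ours. With an identity map in place of the network, the tiling must reconstruct the input exactly:

```python
import numpy as np

def predict_tiled(field, predict, tile=128, buf=32):
    """Predict a full periodic box by tiling it with overlapping subvolumes
    and keeping only the central (tile - 2*buf)^3 core of each prediction."""
    n = field.shape[0]
    core = tile - 2 * buf               # e.g. 128 - 2*32 = 64 valid cells per pass
    out = np.empty_like(field, dtype=float)
    for i0 in range(0, n, core):
        for j0 in range(0, n, core):
            for k0 in range(0, n, core):
                # periodic wrap-around supplies the buffer cells at box edges
                idx = lambda s: np.arange(s - buf, s - buf + tile) % n
                sub = field[np.ix_(idx(i0), idx(j0), idx(k0))]
                pred = predict(sub)
                out[i0:i0+core, j0:j0+core, k0:k0+core] = \
                    pred[buf:buf+core, buf:buf+core, buf:buf+core]
    return out

# Identity "network" on a toy 8^3 box with tile=4, buf=1 (core of 2^3).
box = np.random.default_rng(2).random((8, 8, 8))
reassembled = predict_tiled(box, lambda v: v, tile=4, buf=1)
```

Because only the buffered core of each pass is kept, halos near subvolume edges are always predicted with their full surroundings visible to the network.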
In Figure 2 we see that the predicted mask visually resembles the true mask for the vast majority of halos, which is not explicitly guaranteed by the high Dice coefficient. The panels in the center and right are coloured by the probability that a voxel, or “dark matter particle”, ends up as part of a halo at redshift 0. Masked regions belonging to large halos are correctly predicted to a very high level of accuracy. The predicted probability begins to decrease for smaller halos, but even the smallest halos almost all have a region of probability above the minimum 50% cutoff shown. This is promising for the halo finding we implement in the following section.
To characterize the performance of our pixel-wise binary classifier directly we create receiver operating characteristic (ROC) curves, which graphically represent the balance between the true and false positive rates as a function of the specified probability threshold. For a given probability cut, the true positive rate (TPR) and false positive rate (FPR) are given in terms of the number of true positives (TP), true negatives (TN), false positives (FP), and false negatives (FN), as
TPR=\frac{TP}{TP+FN},  (3) 
FPR=\frac{FP}{FP+TN}.  (4) 
True positives (negatives) in our classification correspond to particles correctly identified as ending up inside (outside) of halos, while false positives (negatives) correspond to particles incorrectly identified as ending up inside (outside) of halos, all for a given probability cut.
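At a single probability cut, Eqs. 3-4 amount to simple counting over the voxel masks; a minimal sketch:

```python
import numpy as np

def roc_point(truth, prob, threshold):
    """True and false positive rates (Eqs. 3-4) at one probability cut."""
    pred = prob >= threshold
    tp = np.sum(pred & truth)    # in-halo voxels correctly flagged
    fp = np.sum(pred & ~truth)   # out-of-halo voxels wrongly flagged
    fn = np.sum(~pred & truth)   # in-halo voxels missed
    tn = np.sum(~pred & ~truth)  # out-of-halo voxels correctly rejected
    return tp / (tp + fn), fp / (fp + tn)

truth = np.array([True, True, False, False])
prob = np.array([0.9, 0.6, 0.4, 0.1])
tpr, fpr = roc_point(truth, prob, 0.5)
```

Sweeping the threshold from 0 to 1 and collecting these (FPR, TPR) pairs traces out the ROC curve; the AUC is then the integral under it.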
In Figure 3, we vary the probability threshold from 0 to 1 and calculate the TPR and FPR at each threshold in order to create a set of ROC curves. To quantify the performance of a classifier, a widely-used measure is the Area Under Curve (AUC). In the ideal case, the classifier would predict cells with 100% accuracy at any threshold, and the AUC would be equal to 1. We separately calculated the ROC curves for mass ranges and radial ranges, adopting the same halo definitions of inner (r<0.3R_{h}), middle (0.3R_{h}<r<0.6R_{h}), and outer (r>0.6R_{h}) as Lucie-Smith et al. (2018), and similar definitions of clusters (M_{h}>10^{14}M_{\odot}), groups (10^{13}M_{\odot}<M_{h}<10^{14}M_{\odot}), and galaxies (3.2\times 10^{12}M_{\odot}<M_{h}<10^{13}M_{\odot}). We find very high AUC values across all mass and radial splits, meaning HaloNet is a very accurate pixel-wise binary classifier for the problem in question.
The end goal of this work is to define halos in the predicted probability mask to create a final halo catalogue. As we roughly want to maximize the TPR while minimizing the FPR, we could use the ROC to inform the choice of a constant probability cut to define the boundary of halos. But, as seen in the ROC curves, clusters are predicted with more accuracy than galaxies, motivating the use of an adaptive probability threshold as a function of scale. We therefore perform measurements of the probability profile around true halo locations to determine the average P_{\rm cut}(R_{\rm halo}) as a function of halo radius, seen in Figure 4. The final probability thresholds we use typically lie in the range 0.5<P_{\rm cut}(R_{\rm halo})<0.7. From the ROC curves we see that this roughly corresponds to a false positive rate of 0.04 and a true positive rate of 0.8. We show in Section 5 that these numbers are not directly related to the accuracy of the final halo catalogue, as halo finding can use local information and average the probability in radial shells to reduce many unwanted fluctuations in the probability field.
4 Binary Classification to Halo Catalogue
In order to transform a three-dimensional mask of probabilities into a mock halo catalogue we need to partition volumes of the density field into individual dark matter halos, based on the probability values of the predicted mask. In Figure 2 we can clearly see that the predicted mask probabilities correspond with high accuracy to the true mask, but there remain some small differences, mostly due to overlapping halos in high density regions. It is also apparent that regions of Lagrangian space belonging to more massive halos have a greater central predicted probability in the HaloNet output when compared to smaller halos. We use these observations to design a simple, hierarchical, geometrical, Lagrangian halo finder to identify halos in the predicted mask, using three simple steps:

(i) Find connected regions in the field above a probability threshold P_{\rm cut}^{\rm peak}. The connected regions will in general be non-spherical, but their centres-of-mass will nearly correspond to the centres of the true halos, given that the predicted probability mask matches the true mask in those regions. The centre-of-mass of each connected region is then used as the centre of a new halo.

(ii) At each centre, proceed outwards and average the probability mask in spherical shells until the mean probability of the shell has dropped below P_{\rm cut}^{\rm shell}(R_{\rm halo}). To reduce the effects of halo clustering, only consider cells that do not already belong to other halos. The radius of the previous shell is then assigned as the radius of the halo, and the position and size of the halo are added to the final catalogue.

(iii) Descend to the next value in the list P_{\rm cut}^{\rm peak}=[p_{0},p_{1},\dots,p_{n}] and repeat steps (i)-(ii). Using multiple probability thresholds of decreasing value ensures that small regions of the probability mask are not removed before the large halos have been found.
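The three steps above can be sketched with scipy's connected-component labelling. For brevity this toy version uses a flat shell threshold in place of the radius-dependent one, and all function names are ours:

```python
import numpy as np
from scipy import ndimage

def ball_mask(shape, centre, r):
    """Boolean mask of the lattice ball of radius r around `centre`."""
    grids = np.ogrid[tuple(slice(0, s) for s in shape)]
    d2 = sum((g - c) ** 2 for g, c in zip(grids, centre))
    return d2 <= r * r

def shell_mask(shape, centre, r):
    """Cells with radius in (r-1, r]."""
    return ball_mask(shape, centre, r) & ~ball_mask(shape, centre, r - 1)

def find_halos(prob, peak_cuts=(0.9999, 0.99, 0.975), shell_cut=0.65):
    """Hierarchical geometric halo finder: steps (i)-(iii) of the text."""
    claimed = np.zeros(prob.shape, bool)          # cells already in halos
    halos = []
    for cut in peak_cuts:                         # step (iii): descend the cuts
        labels, n = ndimage.label(prob > cut)     # step (i): connected regions
        centres = ndimage.center_of_mass(prob > cut, labels, range(1, n + 1))
        for c in centres:                         # step (ii): grow radial shells
            c = np.rint(c).astype(int)
            r = 1
            while True:
                shell = shell_mask(prob.shape, c, r) & ~claimed
                if not shell.any() or prob[shell].mean() < shell_cut:
                    break
                r += 1
            if r > 1:                             # previous shell sets the radius
                claimed |= ball_mask(prob.shape, c, r - 1)
                halos.append((tuple(c), r - 1))
    return halos
```

Excluding already-claimed cells both implements the clustering correction of step (ii) and prevents a peak refound at a lower threshold from being added twice.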
To determine P_{\rm cut}^{\rm shell}(R_{\rm halo}) we stacked the HaloNet output on the true halo positions and measured the mean probability in radial shells outward from the origin, as seen in Figure 4. We found that the radius of the true halos corresponds roughly to P_{\rm cut}^{\rm shell}(R_{\rm halo})=0.65 for medium to large sized halos, but the predicted probability begins to drop for smaller halos. This is to be expected, as smaller halos have larger tidal forces acting upon them, and are more difficult to predict. Therefore, we choose an empirically determined piecewise linear function, seen in the inset on the left of Figure 4, which dictates approximately where the mean probability of a radial shell drops below P_{\rm cut}^{\rm shell}:
P_{\rm cut}^{\rm shell}(R)=\begin{cases}0.044R+0.31&R\leq 6.7~{\rm cells}\\ 0.0045R+0.57&R>6.7~{\rm cells}\end{cases} 
This set of radial probabilities is adopted as the definition of a HaloNet halo. We note that a flat probability cut of P_{\rm cut}^{\rm shell}=0.65 gives similar results, but in order to maximize accuracy we adopted the piecewise linear function, as measuring it takes a negligible amount of time compared to the training of the network.
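Written out as code, the piecewise-linear threshold as a function of halo radius in grid cells is simply (note that the two branches nearly meet at the 6.7-cell break, and that the large-radius branch stays inside the 0.5-0.7 range quoted earlier):

```python
def p_cut_shell(r_cells):
    """Empirical piecewise-linear shell threshold; radius in grid cells."""
    if r_cells <= 6.7:
        return 0.044 * r_cells + 0.31
    return 0.0045 * r_cells + 0.57
```

For example, a 10-cell halo is assigned a threshold of about 0.62, while a 2-cell halo gets a much more permissive 0.40, reflecting the lower predicted probabilities around small halos.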
In Figure 4 we also show the standard deviation in radial shells outward from the true halo centers. The standard deviation remains very close to 0 at small radii, meaning that the probability mask is very close to spherically symmetric around the locations of true halos. Closer to the radius of the halo the standard deviation of the shell starts to increase, meaning that the mask starts to become less spherical near the halo boundary. This increase is largely due to halo clustering, so the probability mask is spherical to a good approximation within the radius of the halo.
The values of P_{\rm cut}^{\rm peak} were determined by the probability contours of Figure 4. By choosing a first P_{\rm cut}^{\rm peak} of 0.9999 we will find all halos above a mass of 2\times 10^{4} cells. As halos of this mass are rare, these should be non-overlapping. Similarly, by choosing a second P_{\rm cut}^{\rm peak} of 0.99 we will find all halos above a mass of 3\times 10^{2} cells. We find that P_{\rm cut}^{\rm peak}=[0.9999,0.99,0.975] results in an accurate halo catalogue, finding the required halos across the whole mass range. We performed a simple convergence test, adding 3 extra P_{\rm cut}^{\rm peak} values between each of the ones listed above, and adding further values below, and found no improvement.
5 Mock Catalogue Results
Satisfied that our halo finder provides an accurate identification of the objects in the probability mask, we validate the HaloNet halo catalogues against the true halo catalogues in four main categories: visual inspection, halo abundance, halo clustering, and individual halo properties.
In Figure 5 we show a comparison of final halo catalogues. It is immediately apparent that HaloNet accurately reproduces the mass, position, and clustering of the dark matter halos.
5.1 Halo Masses
Correctly determining the abundance of halos of a given mass is crucial when creating a mock catalogue. Many observables, such as the Sunyaev-Zel'dovich effect (Planck Collaboration et al., 2016; George et al., 2015), weak lensing (Ade et al., 2016; Joudaki et al., 2017; Hildebrandt et al., 2017; DES Collaboration et al., 2017), or line intensity (Kovetz et al., 2017) can be directly related to the total mass and redshift of the cluster. Therefore, incorrect masses will inhibit the mock's ability to reproduce the statistics of true cosmological observations.
The mass of a Peak Patch halo is defined by the largest spherical region which collapses by the redshift of interest under the homogeneous ellipsoid collapse approximation. As the periodic grid of the simulation is defined in comoving coordinates, the radius of a halo is easily related to the mass through M_{\rm h}=\frac{4}{3}\pi R_{\rm h}^{3}\bar{\rho}_{M}, where \bar{\rho}_{M}=2.775\times 10^{11}\,\Omega_{M}h^{2}\ [{\rm M}_{\odot}/{\rm Mpc}^{3}] is the mean matter density of the universe. We adopt the same definition of mass for our HaloNet halos.
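The mass-radius conversion is a one-liner once the mean matter density is built from the cosmological parameters of Section 3.1 (h = 0.70, \Omega_M = \Omega_b + \Omega_c = 0.25):

```python
import math

h, omega_m = 0.70, 0.25
rho_mean = 2.775e11 * omega_m * h**2   # [M_sun / Mpc^3], comoving

def halo_mass(r_mpc):
    """M_h = (4/3) pi R_h^3 rho_mean for a comoving Lagrangian radius in Mpc."""
    return (4.0 / 3.0) * math.pi * r_mpc**3 * rho_mean
```

At this cosmology a Lagrangian radius of roughly 9 Mpc already corresponds to a cluster-scale halo above 10^{14} M_{\odot}, while a single 1 Mpc cell holds only \sim 3\times 10^{10} M_{\odot}.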
Each simulation belonging to the test set has \sim60,000 true halos and \sim55,000 HaloNet halos above the minimum mass cut of 100 cells. Figure 6 shows the abundance of HaloNet halos compared to the true halo abundance, and compared to the Tinker et al. (2008) and Press & Schechter (1974) halo mass functions. We chose to compare to the Tinker et al. (2008) mass function as it shares a similar definition of halo mass, measured by a spherical overdensity (SO) calculation and not by friends-of-friends (FoF) as is common. We find that our results agree very well with the Peak Patch mass function across the entire mass range, and by extension also with Tinker et al. (2008). A key result is that HaloNet does not simply reproduce the Press & Schechter (1974) mass function, which would be the case for a spherical collapse calculation using a simple density threshold, usually \delta_{c}=1.686.
The overprediction of mass for the largest HaloNet halos comes from the fact that our halo finder can result in centre-of-mass positions slightly miscentred towards overlapping halos. As we used the true halo locations to measure the probability cutoff P_{\rm cut}^{\rm shell}, a miscentred halo will now have increased probability when averaged in radial shells, as it now takes into account more voxels that belong to the neighbouring halos. We observed that this overprediction for the largest halos also affected the neighbouring intermediate mass halos, leading to a reduction of their mass.
To validate the Lagrangian halo finder (described in Section 4), we tested it by performing the halo-finding measurements from the known centres of the true halos. We found that the resulting catalogue was nearly identical, visually and quantitatively, to the original HaloNet catalogue, except for the highest mass halos where we found a better fit to the true mass function, meaning that our simple prescription for finding halos in the probability field works sufficiently well.
5.2 Halo Clustering
The clustering of halos is quantified through the spatial correlation (or two-point) function, defined as the excess probability of finding a pair of halos at a separation \bf{r} compared to that expected for a random distribution. Instead of calculating the correlation function directly, we used its Fourier transform, the halo power spectrum P(\bf{k}).
We calculate the halo power spectrum by binning halos into a 512{}^{3} grid (the same resolution as the initial density field) to create \delta_{h}(\bf{x}). The power spectrum is then defined as \langle\delta_{h}({\bf{k}})\delta_{h}({\bf{k^{\prime}}})\rangle=(2\pi)^{3}\delta_{D}^{3}({\bf{k}}+{\bf{k^{\prime}}})P_{hh}(k), which we can simply calculate for each linearly spaced bin k_{i} through
\displaystyle\begin{split}\displaystyle P(k_{i})&\displaystyle=\int_{{\bf{k}}\in k_{i}}\frac{d^{3}{\bf{k}}}{V_{k_{i}}}\,|\delta_{h}({\bf{k}})|^{2}\\ &\displaystyle=\frac{1}{V_{cell}}\sum_{{\bf{k}}\in k_{i}}\frac{1}{n_{k_{i}}}\,|\delta_{h}({\bf{k}})|^{2}\end{split}  (5) 
where the second equality holds when the calculation is performed on a discrete periodic grid such as we use in our analysis. V_{cell} is the volume of a cell of the grid, and V_{k_{i}}\approx 4\pi k_{i}^{2}\Delta k.
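The shell average of Eq. (5) can be sketched in a few lines of numpy. The normalisation convention below (the FFT scaling and the box-volume factor) is our assumption and should be checked against the estimator actually used; for simplicity the conjugate modes dropped by `rfftn` are not re-weighted, which slightly over-weights the k_z=0 plane.

```python
import numpy as np

def power_spectrum(delta, box_size, n_bins=64):
    # Monopole power spectrum of an overdensity grid: FFT, then average
    # |delta(k)|^2 over linearly spaced k-shells as in Eq. (5).
    n = delta.shape[0]
    kf = 2 * np.pi / box_size                       # fundamental mode
    kx = np.fft.fftfreq(n, d=1.0 / n) * kf          # k along a full axis
    kz = np.fft.rfftfreq(n, d=1.0 / n) * kf         # halved axis of rfftn
    kmag = np.sqrt(kx[:, None, None]**2 + kx[None, :, None]**2
                   + kz[None, None, :]**2)
    # per-mode estimator: |delta(k)|^2 * V_box, with delta(k) = FFT / n^3
    power = np.abs(np.fft.rfftn(delta) / n**3)**2 * box_size**3
    edges = np.linspace(0.0, kmag.max(), n_bins + 1)
    pk = np.zeros(n_bins)
    for i in range(n_bins):
        sel = (kmag >= edges[i]) & (kmag < edges[i + 1])
        if sel.any():
            pk[i] = power[sel].mean()               # shell average
    return 0.5 * (edges[:-1] + edges[1:]), pk
```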
Figure 7 shows the power spectrum and cross-correlation results for three number cuts, where the catalogues are first ranked in mass. The cross-correlation is defined as
r=\frac{P_{HN\times PP}}{\sqrt{P_{HN}\times P_{PP}}}  (6) 
where HN denotes HaloNet, and PP denotes Peak Patch.
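The coefficient in Eq. (6) can be evaluated per k-shell directly from the two gridded halo fields. A minimal sketch (the FFT normalisation constants cancel in the ratio, so none are applied):

```python
import numpy as np

def cross_correlation(delta_a, delta_b, n_bins=32):
    # Shell-binned cross-correlation r(k) = P_ab / sqrt(P_aa * P_bb)
    # between two overdensity grids of identical shape, as in Eq. (6).
    n = delta_a.shape[0]
    fa = np.fft.rfftn(delta_a)
    fb = np.fft.rfftn(delta_b)
    kx = np.fft.fftfreq(n) * n                  # mode indices, full axes
    kz = np.fft.rfftfreq(n) * n                 # halved axis of rfftn
    kmag = np.sqrt(kx[:, None, None]**2 + kx[None, :, None]**2
                   + kz[None, None, :]**2)
    edges = np.linspace(0.0, kmag.max(), n_bins + 1)
    r = np.full(n_bins, np.nan)
    for i in range(n_bins):
        sel = (kmag >= edges[i]) & (kmag < edges[i + 1])
        if sel.any():
            p_ab = np.real(fa[sel] * np.conj(fb[sel])).mean()
            r[i] = p_ab / np.sqrt((np.abs(fa[sel])**2).mean()
                                  * (np.abs(fb[sel])**2).mean())
    return 0.5 * (edges[:-1] + edges[1:]), r
```

By construction r = 1 in every populated shell when the two fields are identical, which provides a quick sanity check.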
We find that the power spectrum is within 5% for the first number cut (the most massive halos) and when including all halos. For all number cuts we find a roughly linear bias, meaning we reproduce the shape of the power spectrum well. We see that HaloNet performs well on large scales, including scales larger than the 64 Mpc subvolumes that tile the full box.
5.3 Halo Measurements
Dark matter halos provide a unique opportunity to study the predictions of Convolutional Neural Networks on an object-by-object basis. Based on the results of the previous sections, it is clear that HaloNet is somehow searching for halo-like objects, but the degree to which it considers complicated features such as tidal forces is not. The Peak Patch algorithm explicitly includes tidal information by solving the ellipsoidal collapse equations (Eq. 2.21 of Bond & Myers (1996a)) to find the point of virialization for each halo at the target redshift. The essential inputs determining the collapse dynamics are the eigenvalues \lambda_{i}, i=1\dots 3, of the strain tensor,
s_{ij}(\mathbf{x})=\frac{\nabla_{i}\nabla_{j}}{\nabla^{2}}\delta_{m}(\mathbf{x}),  (7) 
averaged within the radius of the peak. We can then compute the peak shear ellipticity, e_{v}, and prolaticity, p_{v}, as
\displaystyle e_{v}  \displaystyle=\frac{1}{2\bar{\delta}_{v}}(\lambda_{1}-\lambda_{3}),  (8)  
\displaystyle p_{v}  \displaystyle=\frac{1}{2\bar{\delta}_{v}}(\lambda_{1}-2\lambda_{2}+\lambda_{3}),  (9) 
where \bar{\delta}_{v}=\sum_{i}\lambda_{i} is the mean overdensity of the peak. The ordering \lambda_{1}\geq\lambda_{2}\geq\lambda_{3} imposes the constraints e_{v}\geq 0 and -e_{v}\leq p_{v}\leq e_{v} for each halo.
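Equations (7)-(9) are straightforward to evaluate in Fourier space, where \nabla_{i}\nabla_{j}/\nabla^{2} becomes k_{i}k_{j}/k^{2}. The sketch below computes the tensor per voxel rather than averaged within a peak radius as Peak Patch does, so it illustrates the definitions rather than reproducing the exact quantities of Figure 8.

```python
import numpy as np

def shear_parameters(delta):
    # Eigenvalues of the strain tensor s_ij = (grad_i grad_j / grad^2) delta
    # (Eq. 7), plus the ellipticity e_v and prolaticity p_v (Eqs. 8-9),
    # evaluated voxel-by-voxel on a periodic grid.
    n = delta.shape[0]
    k = np.fft.fftfreq(n) * 2 * np.pi
    kx, ky, kz = np.meshgrid(k, k, k, indexing="ij")
    k2 = kx**2 + ky**2 + kz**2
    k2[0, 0, 0] = 1.0                            # avoid 0/0 at the k=0 mode
    dk = np.fft.fftn(delta)
    kvec = (kx, ky, kz)
    s = np.empty((3, 3) + delta.shape)
    for i in range(3):
        for j in range(3):
            # grad_i grad_j / grad^2  <->  k_i k_j / k^2 in Fourier space
            s[i, j] = np.real(np.fft.ifftn(kvec[i] * kvec[j] / k2 * dk))
    # eigenvalues per voxel, sorted lambda_1 >= lambda_2 >= lambda_3
    lam = np.linalg.eigvalsh(np.moveaxis(s, (0, 1), (-2, -1)))[..., ::-1]
    delta_bar = lam.sum(axis=-1)                 # trace recovers delta
    with np.errstate(divide="ignore", invalid="ignore"):
        e_v = (lam[..., 0] - lam[..., 2]) / (2 * delta_bar)
        p_v = (lam[..., 0] - 2 * lam[..., 1] + lam[..., 2]) / (2 * delta_bar)
    return lam, e_v, p_v
```

The trace of s_{ij} recovers \delta (minus its mean, which the k=0 mode carries), a useful consistency check on the transform conventions.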
Figure 8 shows a comparison between the distributions of \bar{\delta}, e_{v}, and p_{v} as a function of halo mass, averaged over the 32 test simulations. We find that the HaloNet distributions closely resemble those of Peak Patch over the entire mass range of this study, except for small deviations at the high- and low-mass ends.
The low-mass end represents the population where tides have the largest effect, and so the properties there deviate the most from the spherical collapse approximation (red line in Figure 8); on average, only halos found in larger overdensities can collapse. While the predicted means and 1-standard-deviation regions track the ground truth closely there, HaloNet occasionally selects more random regions, as evidenced by the tendency towards \delta=0 and the larger scatter in e_{v} and p_{v}. This suggests that HaloNet is to some degree aware of tidal forces, but not to the accuracy of Peak Patch at this stage of training. Furthermore, we have performed a matching analysis between the predicted and simulated catalogues, defining a match to have its centre inside and its size within 25\% of the radius of its partner, which finds matches for 60\% of Peak Patch halos. While the unmatched HaloNet halos do account for the scatter in Figure 8, they maintain the same overall distributions. We note that the Peak Patch values of \bar{\delta}, e_{v}, and p_{v} shown are not identical to the ones which dictate the homogeneous ellipsoidal collapse equations of Peak Patch, as we have recalculated them after the merging and exclusion steps of the algorithm, but they are overall very similar.
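The matching criterion described above can be sketched as a brute-force search. This is a hypothetical re-implementation of the stated criterion (centre inside the partner, radius within 25\% of the partner's), not necessarily the matcher actually used.

```python
import numpy as np

def match_catalogues(pos_a, rad_a, pos_b, rad_b, tol=0.25):
    # For each halo in catalogue A, find a halo in B such that A's centre
    # lies inside B's radius and A's radius is within `tol` (25%) of B's.
    # Returns the index of the nearest such partner, or -1 when none exists.
    matches = np.full(len(pos_a), -1)
    for i in range(len(pos_a)):
        d = np.linalg.norm(pos_b - pos_a[i], axis=1)
        ok = (d < rad_b) & (np.abs(rad_a[i] - rad_b) < tol * rad_b)
        if ok.any():
            matches[i] = np.flatnonzero(ok)[np.argmin(d[ok])]
    return matches
```

For the \sim60,000 halos per simulation a tree-based neighbour search would be faster, but the brute-force loop keeps the criterion explicit.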
For the high-mass end we see that \bar{\delta} is slightly low with larger scatter, while e_{v} is high (and noisy as well). This is most likely due to the effect of the 2-halo term on our halo finder, discussed in Section 4, which biases the most massive halos towards larger sizes. Alternatively, since the high-mass end consists of the rarest events, to which the network is exposed only a handful of times during training, this could be the network attempting to extrapolate features learned on smaller spatial scales.
6 Discussion & Future Work
In this work, we presented the first application of a volumetric deep Convolutional Neural Network (CNN) to the simulation of dark matter halo catalogues. Our 5-level V-Net architecture is a powerful pixelwise binary classifier for particles in the initial conditions (Section 3), achieving a Dice coefficient of \sim0.93 in 72 hours of training. We showed how a simple geometric Lagrangian halo finder (Section 4) applied to the network output creates a dark matter halo catalogue that achieves high accuracy on the halo mass function, halo clustering, and individual halo properties across the entire mass range of the study, when compared to the true halo catalogue.
Halo clustering has a noticeable effect on the accuracy of our halo finder, even after modifications largely mitigated it (Section 4). While a more expensive halo finder could increase the mass function accuracy, we are confident that it would not have a large effect on the main findings of this work.
We found that the HaloNet catalogue was consistent with the predictions of ellipsoidal collapse (Sections 5.1, 5.3), the underlying principle of the ground truth Peak Patch simulations. Although we did observe increased scatter about this behaviour (Section 5.3), we stopped the training of our network before observing a complete flattening of the loss function. Increased training should therefore result in higher accuracy, although the rate of improvement was quite slow at the point we stopped, so the required training is beyond the scope of this work.
This method allows for the simulation of large volumes of the universe at a very small computational cost and memory requirement, orders of magnitude smaller than for an N-body simulation. The high accuracy of the HaloNet prediction for Peak Patch motivates training on N-body halos traced back to the initial conditions. While the network architecture and algorithms presented here could be applied directly in that context, the training set is more expensive to compute. Furthermore, N-body halos suffer from additional complications, such as complex morphologies and initially disconnected regions ending up in the same halo. This suggests that exact N-body halo identification is a problem in semantic segmentation, where each halo is treated as a different class, intractable with a naive generalization of our method.
We have trained HaloNet at a single redshift and set of cosmological parameters. In its current form, therefore, our method suggests a process where the network is trained on a small sample of exact simulations (at a target redshift and cosmology) and is then used to generate large boxes and sets of independent realizations. However, the eventual goal would be to modify the network architecture to include fully-connected combinations of these parameters alongside the CNN, trained at some grid sampling, to produce a cosmological realisation emulator.
Acknowledgments
We thank Erik Spence for a great course on “Advanced Neural Networks” where we initially came up with the idea for this project and further developed our skills. We also thank Pranai Vasudev, Mohamad Ali-Dib, Charles Zhu, Kristen Menou, Dick Bond, Xin Wang, I-Sheng Yang, and Benjamin Wandelt for useful discussions along the way. Research in Canada is supported by NSERC. These calculations were performed on the GPC supercomputer at the SciNet HPC Consortium. SciNet is funded by: the Canada Foundation for Innovation under the auspices of Compute Canada; the Government of Ontario; the Ontario Research Fund - Research Excellence; and the University of Toronto.
References
 Abadi et al. (2016) Abadi M., et al., 2016, preprint, (arXiv:1603.04467)
 Ade et al. (2016) Ade P. A. R., et al., 2016, Astron. Astrophys., 594, A15
 Adhikari et al. (2014) Adhikari S., Dalal N., Chamberlain R. T., 2014, JCAP, 1411, 019
 Alvarez et al. (2014) Alvarez M., et al., 2014, preprint, (arXiv:1412.4671)
 Aragon-Calvo (2018) Aragon-Calvo M. A., 2018, preprint, (arXiv:1804.00816)
 Avila et al. (2015) Avila S., Murray S. G., Knebe A., Power C., Robotham A. S. G., Garcia-Bellido J., 2015, MNRAS, 450, 1856
 Avila et al. (2017) Avila S., et al., 2017, preprint, (arXiv:1712.06232)
 Bardeen et al. (1986) Bardeen J. M., Bond J. R., Kaiser N., Szalay A. S., 1986, ApJ, 304, 15
 Bond & Myers (1996a) Bond J. R., Myers S. T., 1996a, Astrophys. J. S., 103, 1
 Bond & Myers (1996b) Bond J. R., Myers S. T., 1996b, ApJS, 103, 41
 Bond & Myers (1996c) Bond J. R., Myers S. T., 1996c, ApJS, 103, 63
 Chollet et al. (2015) Chollet F., et al., 2015, Keras, https://github.com/fchollet/keras
 Chuang et al. (2015) Chuang C.-H., Kitaura F.-S., Prada F., Zhao C., Yepes G., 2015, MNRAS, 446, 2621
 DES Collaboration et al. (2017) DES Collaboration et al., 2017, preprint, (arXiv:1708.01530)
 Diemer & Kravtsov (2014) Diemer B., Kravtsov A. V., 2014, ApJ, 789, 1
 Feng et al. (2016) Feng Y., Chu M.-Y., Seljak U., McDonald P., 2016, MNRAS, 463, 2273
 George et al. (2015) George E. M., et al., 2015, Astrophys. J., 799, 177
 Gillet et al. (2018) Gillet N., Mesinger A., Greig B., Liu A., Ucci G., 2018, preprint, (arXiv:1805.02699)
 Gupta et al. (2018) Gupta A., Zorrilla Matilla J. M., Hsu D., Haiman Z., 2018, preprint, (arXiv:1802.01212)
 Hahn & Abel (2011) Hahn O., Abel T., 2011, MNRAS, 415, 2101
 He et al. (2015) He K., Zhang X., Ren S., Sun J., 2015, preprint, (arXiv:1512.03385)
 Hildebrandt et al. (2017) Hildebrandt H., et al., 2017, MNRAS, 465, 1454
 Hopkins et al. (2014) Hopkins P. F., Keres D., Onorbe J., Faucher-Giguere C.-A., Quataert E., Murray N., Bullock J. S., 2014, Mon. Not. Roy. Astron. Soc., 445, 581
 Inman et al. (2016) Inman D., et al., 2016, preprint, (arXiv:1610.09354)
 Ioffe & Szegedy (2015) Ioffe S., Szegedy C., 2015, preprint, (arXiv:1502.03167)
 Izard et al. (2016) Izard A., Crocce M., Fosalba P., 2016, MNRAS, 459, 2327
 Joudaki et al. (2017) Joudaki S., et al., 2017, MNRAS, 465, 2033
 Kitaura et al. (2014) Kitaura F.-S., Yepes G., Prada F., 2014, MNRAS, 439, L21
 Kovetz et al. (2017) Kovetz E. D., et al., 2017, preprint, (arXiv:1709.09066)
 Krizhevsky et al. (2012) Krizhevsky A., Sutskever I., Hinton G. E., 2012, in Advances in Neural Information Processing Systems 25. Curran Associates, Inc., pp 1097–1105
 Lucie-Smith et al. (2018) Lucie-Smith L., Peiris H. V., Pontzen A., Lochner M., 2018, preprint, (arXiv:1802.04271)
 Manera et al. (2013) Manera M., et al., 2013, MNRAS, 428, 1036
 Masui et al. (2010) Masui K. W., Schmidt F., Pen U.-L., McDonald P., 2010, Phys. Rev. D, 81, 062001
 Milletari et al. (2016) Milletari F., Navab N., Ahmadi S.-A., 2016, preprint, (arXiv:1606.04797)
 Modi et al. (2018) Modi C., Feng Y., Seljak U., 2018, preprint, (arXiv:1805.02247)
 Monaco et al. (2013) Monaco P., Sefusatti E., Borgani S., Crocce M., Fosalba P., Sheth R. K., Theuns T., 2013, MNRAS, 433, 2389
 Planck Collaboration et al. (2016) Planck Collaboration et al., 2016, Astron. & Astrophys., 594, A27
 Press & Schechter (1974) Press W. H., Schechter P., 1974, ApJ, 187, 425
 Ravanbakhsh et al. (2017) Ravanbakhsh S., Oliva J., Fromenteau S., Price L. C., Ho S., Schneider J., Poczos B., 2017, preprint, (arXiv:1711.02033)
 Rodriguez et al. (2018) Rodriguez A. C., Kacprzak T., Lucchi A., Amara A., Sgier R., Fluri J., Hofmann T., Réfrégier A., 2018, preprint, (arXiv:1801.09070)
 Ronneberger et al. (2015) Ronneberger O., Fischer P., Brox T., 2015, preprint, (arXiv:1505.04597)
 Rubin et al. (1980) Rubin V. C., Ford Jr. W. K., Thonnard N., 1980, ApJ, 238, 471
 Shaw et al. (2015) Shaw J. R., Sigurdson K., Sitwell M., Stebbins A., Pen U.-L., 2015, Phys. Rev. D, 91, 083514
 Tassev et al. (2013) Tassev S., Zaldarriaga M., Eisenstein D. J., 2013, JCAP, 6, 036
 Tinker et al. (2008) Tinker J., Kravtsov A. V., Klypin A., Abazajian K., Warren M., Yepes G., Gottlöber S., Holz D. E., 2008, ApJ, 688, 709
 White et al. (2014) White M., Tinker J. L., McBride C. K., 2014, MNRAS, 437, 2594