EoR Models Classifier

Identifying Reionization Sources from 21cm Maps using Convolutional Neural Networks


Active Galactic Nuclei (AGN) and star-forming galaxies are leading candidates for being the luminous sources that reionized our Universe. Next-generation 21cm surveys are promising to break degeneracies between a broad range of reionization models, hence revealing the nature of the source population. While many current efforts are focused on a measurement of the 21cm power spectrum, some surveys will also image the 21cm field during reionization. This provides further information with which to determine the nature of reionizing sources. We create a Convolutional Neural Network (CNN) that is efficiently able to distinguish between 21cm maps that are produced by AGN versus galaxies scenarios with an accuracy of 92-100%, depending on redshift and neutral fraction range. When adding thermal noise from typical 21cm experiments, the classification accuracy depends strongly on the effectiveness of foreground removal. Our results show that if foregrounds can be removed reasonably well, both SKA and HERA should be able to discriminate between source models with greater accuracy than using the power spectrum alone.

dark ages, reionisation, first stars - galaxies: active - galaxies: high-redshift - galaxies: quasars - intergalactic medium

1 Introduction

Low-frequency radio interferometer experiments, such as the Low Frequency Array (LOFAR; van Haarlem et al., 2013), the Precision Array for Probing the Epoch of Reionization (PAPER; Parsons et al., 2010), the Hydrogen Epoch of Reionization Array (HERA; DeBoer et al., 2017), the Murchison Wide field Array (MWA; Bowman et al., 2013; Tingay et al., 2013), the Giant Metre-wave Radio Telescope (GMRT; Paciga et al., 2011), and the Square Kilometer Array (SKA; Mellema et al., 2015), are promising instruments for detecting reionization on cosmological scales in the near future. These growing observational efforts motivate the development of statistical tools for efficiently extracting astrophysical and cosmological information.

The nature of sources driving cosmic reionization remains a debated topic. While many studies (e.g. Shapiro & Giroux, 1987; Hopkins et al., 2007; Haardt & Madau, 2012; Hassan et al., 2018) conclude that reionization is mainly driven by star-forming galaxies, there remain some discrepancies that might be resolved by allowing Active Galactic Nuclei (AGN) to reionize the Universe. These discrepancies include the flat slope of ionising emissivity measurements (Becker & Bolton, 2013), the early and extended timing of Helium reionization (Worseck et al., 2016), and large scale intergalactic opacity fluctuations (Becker et al., 2015). In addition, the Giallongo et al. (2015) high-redshift AGN measurements indicate a flatter slope at the faint end of the luminosity function, hence implying the presence of sufficient AGN to complete reionization as opposed to what was previously believed.

In Hassan et al. (2018), we studied the role of galaxies versus AGN during reionization using an improved semi-numerical simulation. This simulation implements realistic and physically motivated recipes to include ionizing photons from galaxies, following the Finlator et al. (2015) radiative transfer simulations, and from AGN, following the Giallongo et al. (2015) observations and a fixed Quasar Halo Occupancy Distribution. We found that AGN-dominated scenarios are unable to simultaneously match current reionization constraints, namely the Planck (2016) optical depth, ionising emissivity measurement by Becker & Bolton (2013), and the Fan et al. (2006) neutral fraction constraints at the end of reionization. This shows that AGN are highly unlikely to be the sole drivers of cosmic reionization. However, a model with equal photon contributions from AGN and galaxies is still allowed given the large uncertainties of current observational constraints. The broad range of theoretical models allowed by current reionization observations is expected to narrow when data from 21cm surveys become available. We found in Hassan et al. (2018) that power spectrum measurements from LOFAR, HERA and SKA could potentially discriminate between galaxies-only models and those with AGN contributions. This is because the latter produce larger ionized bubbles, and hence boost the 21cm power spectrum by a factor of as compared to the former.

Future 21cm observations are also expected to provide huge datasets of large-scale 21cm images that will encode more information than just the 21cm power spectrum. These images might provide an alternative approach to constrain the contribution of different reionizing sources. Here we assess the viability of using Convolutional Neural Networks (CNNs) to discriminate between AGN and galaxy models. This approach has been successfully employed in astronomy and cosmology to perform regression (e.g. Gupta et al., 2018; Gillet et al., 2018) and classification (e.g. Aragon-Calvo, 2018; Huertas-Company et al., 2018; Schaefer et al., 2018).

In this paper, we first examine the CNN capability to discriminate between these models at several redshifts as a function of neutral fraction when the simulations are free of instrumental effects. We then include real-world survey effects, adding realistic instrumental noise and foreground mitigation cuts in order to test the CNN performance on datasets as they will be seen by the SKA, HERA, and LOFAR surveys. We finally consider the case where our different reionization models share similar power spectra. This is essentially a test of our CNN’s ability to break degeneracies inherent in a power spectral analysis.

This paper is organized as follows: Section 2 provides a brief summary of the simulations used for this study, a description of our two reionization models (galaxies, AGN), and details of our instrument simulations. We introduce Convolutional Neutral Networks and describe the classifier architecture in Section 3. We describe our training datasets in Section 4, present the results in Section 5 and summarize our conclusions in Section 6.

2 Simulations

We use the Time-integrated version of our semi-numerical code SimFast21 (Santos et al., 2010) that has been developed in Hassan et al. (2017). We describe here the general components of our simulation and refer to Santos et al. (2010) for the full description of the algorithm and to Hassan et al. (2016, 2017) for more details on the Time-integrated version development.

We first generate the dark matter density field using a Monte-Carlo Gaussian approach in the linear regime. We then dynamically evolve the linear density field into a non-linear field by applying the Zel’Dovich (1970) approximation. Using the underlying non-linear density realizations, the dark matter halos are identified using the well known excursion set formalism (ESF, Press & Schechter, 1974; Bond et al., 1991) in which a region is assumed to collapse into halos, if its mean overdensity is higher than an overdensity threshold of , where is the linear growth factor. Around each point in the density grid, the mean overdensity is obtained through a sequence of top-hat filters, starting with the box size and decreasing down to the cell size. The minimum halo mass is set to 10, which corresponds to the hydrogen cooling limit during reionization. In the Time-integrated model, the ionised regions are identified using a similar ESF that is based on comparing the time-integrated ionisation rate with that of the recombination rate and the local neutral hydrogen density within each spherical volume specified by the ESF. Regions are flagged as ionised if


where is the photon escape fraction, is the ionised fraction, and is the total number of hydrogen atoms. Equation (1) represents the ionisation condition used in the Time-integrated model. The recombination field is modelled using a derived parametrization taken from a high-resolution radiative simulation  (Finlator et al., 2015) to account for the small scale inhomogeneity below our cell size and clumping factor effects. The recombination rate, , is parametrized as a function of overdensity and redshift as follows:


where cms (proper units), , , . We refer to Hassan et al. (2016) for more details on the derivation of and the effect of this parameter on reionization observables.

The key difference between the galaxy and AGN models is in their prescriptions for computing . We dub this variable for the galaxy model and for the AGN model.

2.1 Galaxies

Our parameterization for the ionizing photon output from galaxies is motivated by a combination of radiative transfer simulations (Finlator et al., 2015) and larger hydrodynamic simulations (Davé et al., 2013) that have been calibrated to match wide range of observations. In particular, is parametrized as a function of halo mass and redshift as follows:


where s, , and . This ionization rate is computed directly from the star formation rate (SFR) of these hydrodynamic simulations galaxies catalogs. This parametrization has been dervied fully in Hassan et al. (2016). As previously considered in Hassan et al. (2017), we keep all other parameters fixed and vary the photon escape fraction , the emissivity amplitude and the emissivity-halo mass power dependence index to produce different ionizing emissivities , for our training samples, as follows:


where the proportionality constant is


2.2 Agn

We compute the ionizing photon output from AGN using the strong correlations seen in local Universe observations (Ferrarese, 2002; Tremaine et al., 2002) between black hole masses and circular velocities of their host halos. This correlation is written as


where the constant may be regarded as the black hole formation efficiency, which is a free parameter in the AGN source model besides . Assuming the Eddington luminosity in the B-band and a power law spectral energy distribution, the AGN ionization rate is given by:


where is Planck’s constant and

To study the effect of massive and less massive AGN, it is reasonable to assume a non-linear relation between and , similar to our Galaxies source model in which depends non-linearly on through . Thus, we introduce a new parameter in the above relations. This leads to an AGN model with three free parameters: AGN photon escape fraction , ionising emissivity-black hole mass dependence (the non-linear scale index) , and ionising emissivity amplitude (equivalent to the black hole formation efficiency). We keep all other parameters fixed and vary these parameters to generate different ionizing emissivities , for our training datasets, using:


where the proportionality constant is . This shows how the AGN ionizing emissivity is related to the host halo mass. We next use the Giallongo et al. (2015) luminosity function at to find the corresponding AGN duty cycle for each halo mass bin, which is the ratio between the number of AGN to the number of possible host halos. This ratio is alternatively called the Quasar Halo Occupancy distribution (QHOD). The QHOD then provides a probability for a particular host halos to have an AGN that is actually active, and we use the QHOD to randomly designate a given halo as one that produces ionizing photons according the prescription outlined above. Instead of using a fixed luminosity function, we use a fixed QHOD at higher redshifts to find the number of AGN. We defer to Hassan et al. (2018) for the full details of our AGN source model prescription.

Figure 1: Several 21cm images, each with a thickness of a single cell size (0.535 Mpc), as produced by Galaxies (first three rows) and AGN (last three rows) at , 9, 8, 7 at 0.25, 0.5, 0.75. We quote in subtitles the astrophysical parameters used for galaxies () and for AGN ().

In Figure 1, we show different 21cm images for Galaxies and AGN models at , 9, 8, 7 at different stages of reionization when x 0.25, 0.5, 0.75. Each image is extracted from an 21cm box that is generated from a different realization of the underlying density field (different seed) and different set of astrophysical parameters. These parameters are quotes above each image, in the format () for Galaxies and () for AGN. These images are randomly selected from our training datasets, whose assembly we explain in greater detail in Section 4. We see that Galaxies scenarios (first three rows) have more small scale bubbles than AGN scenarios (last three rows), which tend to have larger bubbles. The size of ionised bubbles in each model is basically controlled by the emissivity’s dependence on mass, namely and for Galaxies and AGN, respectively. Models with larger values of generally tend to produce larger ionised bubbles than models with smaller values. For Galaxies and AGN models that have the same and values, AGN models will typically still produce larger ionised bubbles at same neutral fraction and redshift. This can be seen in Figure 1 at , 9 and 0.25, 0.75 for , 0.67, respectively. On the other hand, if a Galaxies model uses a larger value for than an AGN model does for , the Galaxies model will produce larger ionised bubbles at the same redshift and neutral fraction as seen at and 50%. Of the remaining parameters, the photon escape fraction and the emissivity amplitude linearly increase the ionising emissivity by the same amount for every source, and hence they have a bigger effect on the reionization history than on the reionization topology.

To summarize, the main differences between our Galaxies and AGN models (which we hope to distinguish using our classifier) are:

  • Both models have ionizing emissivities that scale super-linearly with halo mass, but the Galaxies model includes an exponential cut-off at low masses; see Equations (8) and (4).

  • The AGN model accounts for redshift evolution by following the halos mass function (HMF), whereas the Galaxies model has an extra redshift evolution factor; see Equation (4).

  • In Galaxies model, the star-formation duty cycle is assumed to be unity, which means that all identified halos are forming stars and contributing to the total ionizing emissivity. On the other hand, AGN models implement a duty cycle following the Giallongo et al. (2015) luminosity function, which means that not all halos host AGN and contribute to the photon production rate.

2.3 21cm Instrument Simulations

We now describe our methodology for including realistic 21cm instrumental effects. Doing so requires accounting for the finite angular resolution of telescopes, the effects of foreground cleaning, and the presence of instrumental noise.

Angular resolution

The finite angular resolution of an 21cm experiment limits sensitivity on large scales, where is the comoving wavenumber in the directions perpendicular to our line of sight (i.e., the angular directions). Our recipe for adjusting the 21cm brightness temperature boxes to account for an instrument’s angular resolution is as follows:

  • We compute the maximum baseline length () from the antenna distribution of the 21cm experiment in question.

  • We compute the corresponding cut-off using:


    where is the comoving distance to and is the wavelength of the 21cm line at redshift .

  • We Fourier transform the 21cm box, arbitrarily picking two of the axes of the box to be the and axes (with Fourier dual wavenumbers of and , respectively). We then compute for every voxel of the box.

  • We zero out all Fourier modes with .

In this (somewhat simplistic) treatment, the angular resolution of a telescope is determined by the longest baseline of the array. In future work, it will be necessary to improve upon this by explicitly accounting for the detailed baseline distribution of different interferometers. For this paper, however, our simple prescription is sufficient for the purposes of qualitatively exploring the efficiency of our CNN classifier with different levels of angular resolution.

Foreground cleaning

We next adjust the 21cm brightness temperature boxes by removing modes that are contaminated by foregrounds. These modes come in the form of a “foreground wedge” in the - plane (see, e.g., Liu et al. 2014 for more details on the wedge and its complement, the reionization window). The wedge can be parameterized by saying that modes satisfying


are foreground contaminated, where is a wedge slope that is given by


where is the Hubble parameter, is the speed of light, and . A conservative way to fight foregrounds is simply to zero all modes within the wedge.

Using Equation (11), the wedge slope value is about during reionization redshifts. However, we also consider smaller values of to model foreground mitigation algorithms that are able to reduce the Fourier space footprint of the contaminants. Such algorithms reduce the number of modes that need to be zeroed out, preserving more of the information content of the data.

To mimic the effect of foreground contaminants (and the effect of their removal) in our boxes, we proceed as follows:

  • We compute the wedge slope at the appropriate redshift using Equation (11) and additionally consider wedge slopes that are smaller than this value.

  • We zero out all Fourier modes satisfying .

Experiment Design Diameter [m] Collecting area [m] Receiver temperature [mK]
LOFAR 48 tiles of bow-tie high band antennae 30.75 35,762 140,000
HERA 350 hexagonally packed antennae 14 50,953 100,000
SKA 866 compact core antennae 35 833,189 100 T + 40,000
Table 1: Summary of parameters used in 21cmSense package to obtain the thermal noise sensitivity for each experiment. Columns (from left to right) are experiments’ names, designs, antenna diameter, total collecting area, and the reciver temperature. The sky temperature is given by T = 60 K.

Figure 2: A summary of the 21cm instrumental effect pipeline. Bottom left: a 21cm slice with a thickness of a single cell size (0.535 Mpc) from our Galaxies source model of a simulation box of Mpc and 140 at . Second, third and fourth columns: the signal as seen by SKA, HERA and LOFAR, respectively. First row: the effect of the cutoff, corresponding to the maximum baseline of each experiment, to produce an angular resolution limited signal. Second row: the effect of wedge filtering using a wedge slope of from . Third row: instrumental noise realizations. Last row: the addition of the thermal noise to the wedge-filtered and angular resolution-limited signal. The large slope value () used in the wedge filtering step removes most of the large and small scale features.

Figure 3: Same 21cm signal as in Figure 2 but with different wedge filtering levels as quoted in subtitles. First, second and third rows are the signal as seen by the SKA, HERA and LOFAR, respectively. At a wedge slope , the signal features start to appear more clearly, particularly for the SKA and HERA. The thermal noise from LOFAR dominates the image.

Thermal noise

To add instrumental noise, we make use of 21cmSense3, which allows us to account for thermal noise based on the experiments’ designs (Pober et al., 2013, 2014). The sample variance contribution is already implicitly included in this study, since our training sets are generated from different random seeds of the underlying density field fluctuations. The 21cm thermal noise is generated as follows:

  • The thermal noise power spectrum is computed by the 21cmSense package using the expression


    where is a conversion factor from angle and frequency units to comoving cosmological distances, is the primary beam’s field-of-view, is the integration time on the mode with wavenumber , and is the total system (sky plus receiver) temperature. The final proposed configurations of our considered experiments is summarized in Table 1.

  • We generate a white noise box drawn from a Gaussian distribution with a zero mean and unit standard deviation.

  • We transform the white noise box to the Fourier space and scale it using the noise power spectrum.

  • We then inverse Fourier transform the scaled white noise box back to real space.

As a final step, we add the noise realization box to our cosmological signal box (which has been treated for angular resolution and foreground effects, as discussed above). This then forms our mock 21cm observations.

Figure 2 illustrates the impact of instrumental effects. There, we show a slice from a box based on our Galaxies model at (bottom left), the same slice but angular resolution limited (excluding -modes with ; first row), wedge-filtered slices using a wedge slope value (second row), the thermal noise contributions from each experiment (third row), and finally the image as observed by SKA, HERA, and LOFAR, respectively. In the first row, we see the effect of angular resolution, which is set by the maximum baseline. The maximum baseline lengths for the SKA, HERA and LOFAR are about , and (see Table 1), which correspond to of about , , and , respectively. The SKA and LOFAR have higher angular resolution, as compared with HERA, and hence their angularly limited versions (first row) are quite similar to the original signal (bottom left). The low angular resolution of HERA prevents the resolution of small scale features of the signal, but bubbles can still be detected on large scales. We next apply the wedge filtering on top of the angular resolution limited signal, which results in the images on the second row. Here, we used a wedge slope of , which is the conservative expectation at given Equation (11). It is quite clear that using destroys most of the signal features, on large scales along and on small scales along . Next, we produce thermal noise maps (third row). As expected, the SKA has a dimmer noise map than HERA and LOFAR do. Finally, we add the noise maps to the angular resolution-limited and wedge-filtered signal to produce mock images as seen by our experiments (last row). We see that HERA and SKA are able to see the most prominent features of the wedge-filtered images. The thermal noise from LOFAR, on the other hand, dominates the image, and so we expect that LOFAR may not be able to easily distinguish between images generated from the Galaxies and AGN scenarios.

In Figure 3, we explore the effect of different wedge filtering levels. We show the signal’s response to different wedge slopes , 2.0, 1.0, 0.5, and 0.25. Results for the SKA, HERA and LOFAR are shown in the first, second, and third row, respectively. As expected, the lower the wedge slope, the clearer the signal features are, since fewer -modes are removed due to foregrounds. At , the signal prominent features on large scales begin to appear more clearly, and hence we expect better classification performance at this limit of wedge filtering for the SKA and HERA. On the other hand, thermal noise from LOFAR limits the identification of signal features, even at very small wedge filtering levels (). In Section 5, we will explore the classifier ability to discriminate between Galaxies and AGN models as a function of the wedge slope.

3 Reionization Models Classifier Architecture

Figure 4: Reionization models classifier architecture. Each 21cm map is processed into four blocks of two convolutional and pooling layers that are followed by three fully connected layers to eventually output the image class (0-Galaxies, 1-AGN). Dropout regularizations are only applied on the fully connected layers during training, keeping 75% of the neurons. Batch normalization is applied before all layers except the final output layer.

We employ Convolutional Neural Networks (CNNs) to build up the reionization models classifier. CNNs are a class of deep learning models that are powerful for large scale image recognition (for a comprehensive review see Rawat & Wang, 2017). Similar to the classical Artificial Neural Networks (ANNs), CNNs are feedforward networks that take in an input () and eventually outputs a predicted label (). The connection between these is constructed by stacking neuron layers, each of which is a linear combination of the components of with weights and biases. Each layer can produce multiple outputs of the form


where is the number of inputs in the layer, is the th output (or “feature”), is the weighting of the th input in the th feature, and is the bias in the th feature. The function is a nonlinear function response function that is chosen by the architect of the neural network.

The CNN architecture is comprised of a series of different type of neuron layers that each have a specific form for the weights:

  • Convolutional layers extract different features by convolving the input image with learned weights of 2-dimensional kernels.

  • Pooling layers reduce the dimension and spatial resolution of features. They speed up the network performance and prevent over-fitting. Examples of pooling layers are average-pooling or max-pooling in which the input image is divided into non-overlapping sub-images where the values of each sub-image are the average or max of its original values from the input image, respectively.

  • Fully connected layers are the classical ANN layers in which the input is connected to all neurons.

Two other standard techniques are employed in our CNN:

  • Dropout regularizes the training process by randomly switching off neurons with a given probability. This prevents coadaptations of same features and overcomes the problem of overfitting.

  • Batch Normalization insures that the input images, before each layer, are uncorrelated while keeping the averages and variances fixed. Batch normalization scales and shifts the input by two new learned parameters and as follows:


    where is a small constant to avoid arithmetical instability. This technique has shown to improve training performance significantly (e.g. Roy et al., 2018; Schaefer et al., 2018).

Combining this together, our CNN architecture is as follows: the network is composed of four blocks of convolutional layers, three fully connected layers, and an output layer. Similar CNN architectures have been very successful to perform large-scale image recognition (e.g. Simonyan & Zisserman, 2015; Schaefer et al., 2018). In each block, there are two convolutional layers with convolutional kernels of size 33 and a single 22 max-pooling layer. The number of generated features increases from one block to another while the features’ dimension decreases through the max-pooling. The final output layer with two neurons is applied to classify the input 21cm map by providing the probabilities that the input images were generated from the Galaxies or AGN models. Dropout is applied on all fully connected layers (only during training) to keep 75% of the neurons. Before each layer, batch normalization is applied except for the final output layer where a softmax activation is applied instead. The Rectified Linear Unit (ReLU) activation function is applied on all convolutional and fully connected layers. The ReLU function is defined as:


which returns if the input is negative; otherwise it keeps the raw input. Figure 4 shows our classifier design and Table 2 summarizes the image processing steps through the classifier layers as outlined above.

Step Layer type Output dimension
1 33 Convolution 16140140
2 Batch Normalization + ReLU
3 33 Convolution 16140140
4 Batch Normalization + ReLU
5 22 max-pooling 167070
6 33 Convolution 327070
7 Batch Normalization + ReLU
8 33 Convolution 327070
9 Batch Normalization + ReLU
10 22 max-pooling 323535
11 33 Convolution 643535
12 Batch Normalization + ReLU
13 33 Convolution 643535
14 Batch Normalization + ReLU
15 22 max-pooling 641818
16 33 Convolution 1281818
17 Batch Normalization + ReLU
18 33 Convolution 1281818
19 Batch Normalization + ReLU
20 22 max-pooling 12899
21 Flattening 10368
22 Fully connected 1024
23 Batch Normalization+ ReLU
24 Dropout (25%)
25 Fully connected 1024
26 Batch Normalization+ ReLU
27 Dropout (25%)
28 Fully connected 1024
29 Batch Normalization + ReLU
30 Dropout (25%)
31 Fully connected + softmax 2
Table 2: Architectural summary of our classifier that is used to map the input 140140 21cm images into binary model classes (0-Galaxies, 1-AGN).

Figure 5: The evolution of a single 21cm map (top-middle) through the first 16 filters of the classifier layers as stored in the final step of training, for the SKA model in Section 5.3. Titles indicate the filters and the results of the convolutions (conv), including the application of the ReLU activation function. The numbers indicate the order of the convolutional layers in the network as shown in Figure 4. Some filters detect small scale features but most of them strongly highlight the large scale bubbles. It is evident that the classification is driven by the large-scale bubbles.

At the beginning of training, we need to initialize all layers weights and biases with different small numbers to break the symmetry between layers neurons. There are different initialization strategies, and the most common approach is the use of random numbers sampled from a Gaussian distribution of zero mean and finite standard deviation (e.g., = 0.1 or 0.01). However, in order to ensure faster training and keep the input data properties, particularly, the variance, we initialize all weights and biases with a generalized form of Xavier initializer (Xavier & Yoshua, 2010) that is called the Variance Scaling initializer. In such an initialization, the random numbers are drawn from a zero mean Gaussian distribution with a variance equal to the inverse of the average of the number of input and output neurons. This initializer insures that the variance of the input data is preserved through all the classifier layers.

At each training step, we first calculate the distance between the true and predicted labels via their cross entropy. This is our loss function. We next minimize the loss function using a mini-batch gradient descent momentum optimizer. With this optimizer, the loss gradients are computed with respect to the classifiers’ parameters (weights, biases and the batch normalization parameters) for a mini-batch of the training dataset to indicate in which directions of parameter space one should move in order to approach the global minimum. The update is controlled by the learning rate, which speeds/slows the learning process, and with the momentum term, which accelerates the update direction by keeping a fraction of the previous update direction. This fraction is called momentum and is usually set to a value close to unity. At zero momentum, the Momentum optimizer becomes the standard Gradient Decent optimizer. Here, we use a value of 0.9 for momentum and 0.01 for the initial learning rate, which then exponentially decays by 97% for every 20 training steps.

It is worthwhile to mention that the architecture in this paper is different from and more complex than previous architectures used to classify the same models in Hassan et al. (2018). In Hassan et al. (2018), the training datasets were obtained by varying the models’ astrophysical parameters while leaving the underlying density realization fixed. The treatment of instrumental effects was also considerably simpler. As a result, a very simple network architecture, with two convolution blocks and a single fully connected layer, could produce high accuracy ( > 95%) from the first few training epochs. With the more complicated training sets of this paper, deeper and more complex networks are necessary.

To illustrate how the classifier extracts different features from the input 21cm images, we show in Figure 5 an example of a single 21cm image as it passes through the different layers of the trained neural network. We show only the first 16 filters (i.e., convolution kernel) from each convolutional layer. We also show the convolution of each filter with the input image, together with the application of the ReLU activation function. We see that different filters extract subtly different features of the ionised bubbles from the input image. The input image responds differently to each filter and the convolution output shows different levels of bubbles detections. The ReLU activation function then clears away the less important features (negative pixels) and keeps the most prominent features (positive pixels). This process continues through all convolutional layers. The eighth convolutional layer is the last step before the two-dimensional information is collapsed into an abstract data vector. At this layer (Conv8; last row), we see that most of the filters strongly highlight the large scale bubbles while a few of the filters extract small scale features. In addition, the max-pooling layer is applied three times before the last convolutional layer, which reduces the input image dimension from 140140 to 1818, and hence smooths the small scale features more than the large scales ones. This indicates that the classification might be primarily determined by the sizes of large-scale bubbles. As we saw in Figure 3, the prominence of large-scale bubbles depends on the effectiveness of foreground removal. A conservative cut of foregrounds in the data (e.g., with ) makes ionised bubbles difficult to identify. These large scale features reappear when assuming a moderate level of foregrounds, with . At such levels of foreground cleaning, the signal features can be easily recognized even in the presence of noise levels representative of the SKA and HERA.

Figure 6: Classifier accuracy and loss as a function of training step for different redshifts and different neutral fraction range. Solid and dashed lines are the mean accuracy of training and testing samples. Red, blue, green, cyan shaded areas are the 1 level out of the 5 runs with different random number generator to initialize the classifier weights. The classifier converges after the first training steps ( 5 training epochs). Accuracy is greatest at highest redshifts, or when datasets are restricted to narrow neutral fraction ranges. Except at (when the universe is highly ionized), the classifier is able to discriminate between models at an accuracy of about 92-100%, depending on redshift and neutral fraction range.
range z=7 z=8 z=9 z=10
0.55 > x> 0.45 1.0000 0.0000 0.9971 0.0018 1.0000 0.0000 1.0000 0.0000
0.75 > x> 0.25 0.9680 0.0031 0.9875 0.0020 0.9925 0.0021 0.9969 0.0012
0.95 > x> 0.05 0.6662 0.0109 0.9159 0.0096 0.9608 0.0025 0.9642 0.0018
Table 3: Summary of the classifier accuracy and loss as a function of redshift and neutral fraction at the final training step as evaluated on the validation dataset. Results are quoted in terms of the mean and standard devation of the five classifier runs with different random number generator to initialize the weights. Instrumental effects are not taken into account.

4 Training dataset

We generate our training dataset from a simulation box of size 75 Mpc with 140 cells. We choose this box size since it covers the relevant scales that future 21cm surveys will be sensitive to. In addition, we have previously tested the 21cm power spectrum convergence with the box size and found that our simulations are well converged down to a box size of 75 Mpc (Hassan et al., 2016).

We run 1,000 reionization simulation realizations with 1,000 different random seed numbers for the initial density field fluctuations from each model. We choose 1,000 values out of the following parameters ranges:

  • Galaxies: (0.50.01), (10 10), (1.00.0).

  • AGN: (1.00.5), (10 10), (1.00.0).

These parameters range translate into a spread in the total ionizing emissivity of each source model. For instance with this range, at , Galaxies have a total ionization rate range of , whereas the AGN total ionization rate has a range from . These ranges are much larger than the level of the ionising emissivity measurements at by Becker & Bolton (2013), which is . By neglecting such additional constraints in this paper, we are conservatively considering a wider parameter space than is currently allowed by observations. In future work, one hopes that the job of a classifier can be made easier by using ionising emissivity observations to restrict the possible parameter space of models.

In addition to the variations in the astrophysical parameters, our training datasets also have varied underlying density fields (by initializing each box with a different random seed). Our training set is thus much more complex than the ones previously considered in Hassan et al. (2018) and more suitable for training classifiers that might be applied to realistic 21cm surveys.

The specific ranges chosen for the Galaxies parameters come from Hassan et al. (2017). The ranges are essentially the 1 level obtained in an MCMC parameter estimation to match various reionization observables. As for the AGN source model, the AGN photon escape fraction is most commonly assumed to be unity due to the observed hardness of AGN spectra. However, we relax this assumption and consider as lower values as 50%. In Hassan et al. (2018), we showed that the AGN model reproduces ionising emissivity constraints if 10, and hence we consider one order of magnitude above and below this value. The emissivity’s non-linear dependence on black hole mass follows our choice of the similar parameter from Galaxies models. However, as previously shown in Hassan et al. (2017), these parameters are degenerate, particularly between the photon escape fraction and emissivity amplitude, so simulation realizations with values lie outside these ranges can easily be extrapolated.

We note that, from each simulation, one may extract 140 images of 21 cm fluctuations, choosing one of the principal axes (x, y, or z) of the simulation as the line-of-sight axis. A two-dimensional image is generated by extracting a slice with the line-of-sight axis having a thickness of a single cell (0.535 Mpc), and the full extent of the box (75 Mpc) along the other two axes. This shows that 1,000 simulation realizations, in fact, contain 14031000 420,000 possible different 21cm images, which is sufficient for training a deep network. However, close 21cm slices from the same box might share similar features, and hence we jump at least for 2 Mpc comoving distance before we extract the following slice. Out of these 1,000 boxes, there are around 300 boxes that produce a neutral fraction between 95% 5% at different redshifts, and hence we limit our study to those boxes since we expect our two models to produce very similar 21cm fluctuations at the beginning (x > 95%) and end (x < 5%) of reionization.

After generating our training samples, we use 8% for testing, 8% for validation, and 86% for training. For each study, we use at least 6,000 different images of each class for training and at least 1,000 different images of each class for testing and validation. At each training step, we randomly select a batch of 32 labelled images for training and testing. We have tested the effect of selecting small versus large batches and the result remains unchanged.

Test Accuracy
No Instrumental Effects 0.990 0.002
Wedge slope (m)
m=3.4 m=2.0 m=1.0 m=0.5
SKA 0.858 0.007 0.919 0.006 0.976 0.005 0.990 0.003
HERA 0.797 0.011 0.852 0.008 0.917 0.006 0.967 0.002
LOFAR 0.545 0.010 0.544 0.021 0.608 0.029 0.699 0.009
Table 4: Summary of the classifier accuracy for different 21cm experiments at and neutral fraction range of 0.25 < x< 0.75 as a function of the wedge slope. Results are expressed in terms of the mean and standard deviation of the accuracy evaluated on the validation dataset at the final training step for the different five runs.

5 Results

We quantify the classifier performance by its accuracy, which is defined as the percentage of images that are recognized correctly. For each section, we run the classifier five times using different random seeds for initializing the learned parameters of the network. This examines the classifier ability to converge to same results irrespective of the random initializations of weights. For these five different runs, we use the validation sample to evaluate the classifier accuracy at the end of training. We then express all results in terms of the mean accuracy and standard deviation out of these five runs.

Figure 7: Classifier accuracy as a function of the wedge slope for different 21cm experiments at fixed redshift . The red solid, blue dashed, and green dash-dotted lines are the mean validation accuracy from the mock datasets as observed by the SKA, HERA, and LOFAR. Shaded areas are the 1 level of the validation accuracy for the different five runs. It is evident that smaller wedge values () are required for these experiments to achieve the desired accuracy.

5.1 Dependence on redshift and neutral fraction (without noise and foreground removal)

We first study the classifier accuracy and loss (similar to the error rate) as a function of redshift and neutral fraction. The results are shown in Figure 6. The red, blue, green, and cyan colours represent the redshifts 10, 9, 8, and 7, respectively. Solid lines indicate the mean accuracy/loss from training samples whereas the dashed lines are from testing samples. Shaded areas show the 1 regions out of the five runs with different random seeds. In all cases, the classifier converges within the first 2000 training steps ( 5 training epochs) to yield accuracies and losses that stay approximately constant. The 1 levels (shaded areas) are quite small when one is restricted to narrow neutral fraction ranges. This shows that our classifier is robust and insensitive to the random initialization of network weights. We note that the classifier accuracy increases as the neutral fraction range decreases and redshift increases. We summarize the accuracy on the whole validation set for the 5 different runs in Table 3. When the universe is highly ionized at , AGN-only and galaxies-only models tend to produce similar ionization and 21cm fields, and hence the accuracy is low (66%) on the testing and validation samples. In this case, we see that the loss increases slowly, which is an indication of slight over-fitting—with a large degree of similarity between these models at end of reionization, the network is resorting to simply memorizing the inputs and outputs of the training sets. Such an approach naturally fails for the test samples, so the loss rises. Excluding (when our Universe is highly ionized), the classifier is able to attain an accuracy of 92-100% from the testing and validation samples across a wide variety of neutral fractions.

5.2 Classification at a fixed redshift (with noise and foreground removal)

We now determine the classifier accuracy in the presence of instrumental effects. To illustrate the impact of these effects, we use the dataset from , restricted to the neutral fraction range of 0.75 > x> 0.25. We then add noise from SKA, HERA and LOFAR. For each experiment, we consider four datasets with different wedge slopes , 2.0, 1.0, 0.5 to remove foreground-contaminated -modes. We train the classifier on each of the four datasets five times with different random seeds for the generation of initial weights and biases.

Test Accuracy
No Instrumental Effects 0.999 0.001
Wedge slope (m)
m=3.4 m=2.0 m=1.0 m=0.5 m=0.25
SKA 0.864 0.012 0.876 0.015 0.873 0.008 0.880 0.006 0.943 0.003
HERA 0.736 0.015 0.794 0.009 0.799 0.007 0.821 0.006 0.893 0.005
LOFAR 0.566 0.011 0.558 0.014 0.576 0.010 0.578 0.010 0.619 0.022
Table 5: Summary of the classifier accuracy for different 21cm experiments at a fixed power spectrum as a function of the wedge slope. Results are expressed in terms of the mean and standard deviation of the accuracy evaluated on the validation dataset at the final training step for five different runs.

We show the classifier accuracy as a function of the wedge slope in Figure 7. The red solid, blue dashed and green dash-dotted lines are the mean validation accuracy on datasets as observed by SKA, HERA, and LOFAR respectively. The corresponding shaded areas are the 1 levels from five different runs. As previously noted, a conservative wedge slope at is at which all experiments are unable to score an accuracy . We see that the accuracy gradually increases as the wedge slope decreases. We find that the SKA and HERA both achieve 92% of accuracy at and , respectively. However, LOFAR is unable to discriminate between these models at even small values of the wedge slope . Compared to the case without noise, the noise from SKA and HERA at reduces the accuracy by 2% and 7%, respectively. It is quite clear that for lower wedge slopes (), these experiments will be able to achieve a more desirable accuracy. As seen from Figure 3, a wedge slope value of is roughly when large-scale bubbles begin to be recognizable. That the classifier achieves high accuracy () at this point is suggestive of the interpretation that the classifier determines the differences between these models based on large-scale bubble. We summarize the validation accuracy for the five different runs in Table 4.

Figure 8: 21cm power spectra from our Galaxies model at and our AGN model at in the neutral fraction range 0.45< x < 0.95. At these redshifts and neutral fractions, the two models have similar 21cm power spectra and the 21cm maps from these simulations test the classifier’s ability to discriminate between these models using information beyond the 21cm power spectrum.

Figure 9: Same as Figure 7 but for the case when the two models share similar power spectra (shown in Figure 8). Fixing the power spectra of the two models makes it more difficult to distinguish between the models, and foregrounds must be more efficiently removed () for successful classification.

5.3 Classification at a fixed power spectrum (with noise and foreground removal)

In order to determine whether our classification algorithm is taking advantage of information in the 21cm images beyond what is available in the power spectrum, we consider a special case when the two models produce similar 21cm power. A Galaxies-only model yields 21cm power spectra at that are almost identical (at least in shape) to those obtained by the AGN-only model at , within the neutral fraction range . This is shown in Figure 8. As the AGN-only models produce larger ionized bubbles that source a larger amplitude power spectrum, the Galaxies-only models require a greater contribution of power from the matter density field—available at lower redshifts due to structure growth—to compensate for this and to obtain a similar 21cm power spectrum. In any case, comparisons of different semi-numerical simulations reveal that different codes have better agreement when compared at matched neutral fractions rather than at matched redshifts (Zahn et al., 2011). This suggests considerable theoretical uncertainty in the ionization history, and thus it is not unreasonable to consider simulations shifted in redshift as we do here.

In artificially pairing boxes at different redshifts, it is important to also harmonize their respective noise simulations. The thermal noise depends on the system temperature (), which in turn depends on the redshift/frequency through the foreground sky model. Thus, adding a noise realization to our Galaxies boxes and a to our AGN boxes will result in different levels of signal to noise. As a result, the classifier might be able to discriminate between these models based on differences in . To exclude the possibility that the classifier doing this, we conservatively add the noise from to both two datasets.

We then re-do the same exercise as in the previous section and consider different wedge slopes with , 2.0, 1.0, 0.5, 0.25 to generate five different datasets for each of our experiments. We display the classifier accuracy as a function of the wedge slope in Figure 9. Just like before, LOFAR fails to discriminate between these models even with wedge slopes as low as . At this low wedge slope value of , the SKA and HERA obtain accuracies of 94% and 90%, corresponding to a degradation of 5% and 9% in accuracy compared to the case without instrumental effects, respectively. Still, this shows that SKA and HERA are promising telescopes for differentiating between our models (even with degenerate power spectra) when foreground removal is able to achieve wedge slopes with . We summarize the validation accuracy for the five different runs in Table 4.

Unlike Figure 7, we find that the accuracy evolution as a function of wedge slope is not gradual. Here, we see that the accuracy from all experiments is almost constant between and . This sets a limit of for the SKA and HERA to discriminate between Galaxies and AGN models. This limit is roughly a factor of two more stringent than the limit found in the previous section, suggesting that robust foreground mitigation is particularly important for breaking degeneracies that are present in a power spectral analysis alone.

6 Conclusion

In this work, we created a CNN-based classifier of reionization models that is able to discriminate between 21cm images produced by Galaxy- and AGN-driven reionization. In order to prepare this classifier for future 21cm surveys, we included instrumental effects for experiments such as the SKA, HERA, and LOFAR. We have studied the classifier performance as a function of redshift and neutral fraction. We further explored the impact of instrumental assumptions on the classification accuracy and the ability of these 21cm experiments to discriminate between our two models at fixed redshift and with power spectra tuned to be the same between the two scenarios.

Our key findings are as follows:

  • The classifier accuracy increases at high redshifts and/or for restricted neutral fraction ranges. Excluding the case when the universe is highly ionized at , the classifier is able to correctly classify more than 92-100% of the images, with some dependence on redshift and neutral fraction range (see Figure 6 and Table 3).

  • At a fixed redshift of and neutral fraction range , the SKA and HERA achieve 97% and 92% accuracy for a wedge slope of , respectively (see Figure 7 and Table 4).

  • If the power spectrum is constrained to be nearly identical between the two scenarios, the SKA and HERA are still able to achieve 94% and 90% of accuracy for a wedge slope of , respectively (see Figure 9 and Table 5).

  • LOFAR is typically unable to classify models with high accuracy (generally producing results in the regime), due to its relatively large thermal noise uncertainty.

  • None of the experiments are able to achieve a classification accuracy of for conservative values of the wedge slope, which is about at reionization redshifts (i.e. to ).

Our results show the preliminary promise of directly using 21cm images from future surveys to discriminate between Galaxy and AGN reionization scenarios. These direct constraints also have the potential to break degeneracies that would be present in a power spectrum analysis alone. For our proposed techniques to be maximally effective, however, it is important to be able to clean foreground contaminants beyond the most conservative wedge regions in Fourier space.

It is worth mentioning that our results might be limited by the relatively small box size ( 75 Mpc) and coarse resolution ( 0.5 Mpc) used in this study. The differences between Galaxies and AGN reionization models might be more prominent for larger cosmological volumes and higher resolutions, hence relaxing the requirement for smaller wedge slopes. We leave exploring larger volumes and higher resolutions for future work. The approximation implemented in these semi-numerical models of Galaxies and AGN place an additional limitation to the presented results. Nonetheless, this work provides a step forward in our quest to extract astrophysical and cosmological information from future 21cm surveys to reveal the nature of cosmic reionization.


The authors acknowledges helpful discussions with J. Aguirre, N. Gillet, A. Lidz, A. Mesinger, J. Pober, and M. Santos. SH acknowledges the financial assistance of the South Africa SKA Project (SKA SA) towards this research. SH also acknowledges the support received from CAMPARE-HERA Astronomy Minority Program (CHAMP/HERA), where part of this work was initially conducted at University of Pennsylvania. AL acknowledges support for this work by NASA through Hubble Fellowship grant # HST-HF2-51363.001-A awarded by the Space Telescope Science Institute, which is operated by the Association of Universities for Research in Astronomy, Inc., for NASA, under contract NAS5-26555. SAK is supported by a University of Pennsylvania SAS Dissertation Completion Fellowship.


  1. pubyear: 2017
  2. pagerange: Identifying Reionization Sources from 21cm Maps using Convolutional Neural NetworksIdentifying Reionization Sources from 21cm Maps using Convolutional Neural Networks
  3. https://github.com/jpober/21cmSense


  1. Aragon-Calvo A. M., 2018, arXiv:1804.00816
  2. Becker, G. D., Bolton, J. S. 2013, MNRAS, 436, 1023
  3. Becker, G. D. et al., 2015, MNRAS, 447, 3402
  4. Bond, J. R., Cole, S., Efstathiou, G., Kaiser, N., 1991, ApJ, 379, 440
  5. Bowman, J. D., et al. 2013, PASA, 30, 31
  6. Davé, R., Katz, N., Oppenheimer, B. D., Kollmeier, J. A., Weinberg, D. H. 2013, MNRAS, 434, 2645
  7. DeBoer, D. R., et al. 2017, PASP, 129, 974
  8. Fan, X., Carilli, C.L., Keating B. 2006, Annual Review of Astronomy and Astrophysics, 44, 415
  9. Finlator, K., Thompson, R., Huang, S., Davé, R., Zackrisson, E., Oppenheimer, B. D. 2015, MNRAS, 447, 2526
  10. Ferrarese L., 2002, ApJ, 578, 90
  11. Giallongo E., et al., 2015, A &A, 578, A83
  12. Gillet N. et al, 2018, arXiv:1805.02699
  13. Gupta, A. et al., 2018, Phys. Rev. D 97, 103515
  14. Haardt, F., Madau, P., 2012, ApJ, 746, 125
  15. Hassan, S. and Davé, R. and Finlator, K. and Santos, M. G., 2016, MNRAS, 457, 1550
  16. Hassan, S. and Davé, R. and Finlator, K. and Santos, M. G., 2017, MNRAS, 468, 122
  17. Hassan S., Davé R., Mitra S., Finlator K., Ciardi B., Santos M. G., 2018, MNRAS, 473, 227
  18. Hassan S., Liu A., Kohn S., Aguirre E. J., Plante La P., Lidz A., 2018, Proceedings IAU Symposium, No. 333, 1701083, V. Jelic & T. van der Hulst, eds.
  19. Hopkins P. F., Richards G. T., Hernquist L., 2007, ApJ, 654, 731
  20. Huertas-Company M. et al, 2018, ApJ, 858, 2
  21. Liu A. et al, 2014, Phys. Rev. D 90, 023018
  22. Mellema, G., Koopmans, L., Shukla, H., Datta, K. K., Mesinger, A., & Majumdar, S. 2015, Advancing Astrophysics with the Square Kilometre Array (AASKA14), 10
  23. Paciga, G., et al. 2011, MNRAS, 413, 1174
  24. Parsons, A. R., et al. 2010, AJ, 139, 1468
  25. Pober, J. C., Parsons, A. R. et al, 2013, AJ, 145, 65.
  26. Pober, J. C., Liu, A. et al, 2014, ApJ, 782, 66.
  27. Rawat W. and Wang Z., 2017, Neural Computation, vol. 29, no. 9, pp. 2352–2449.
  28. Roy D., Panda P., Roy K., 2018, arXiv:1802.05800
  29. Planck intermediate results. XLVII, Adam, R., Aghanim, N., et al 2016, arXiv:1605.03507
  30. Press, W. H., Schechter, P., 1974, ApJ, 187, 425.
  31. Santos M. G., Ferramacho L., Silva M. B., Amblard A., Cooray A. 2010, MNRAS, 406, 2421
  32. Shapiro, P. R., Giroux, M. L., 1987, ApJ, 321, 107
  33. Schaefer C., Geiger M., Kuntzer T., Kneib J., 2018, 611, 9.
  34. Simonyan, K. & Zisserman A., 2015, ICLR conference, arXiv:1409.1556
  35. Tingay, S. J., et al. 2013, PASA, 30, 7
  36. Thyagarajan N. et al, 2015, ApJ, 804, 1
  37. Tremaine S., et al., 2002, ApJ, 574, 740
  38. van Haarlem, M. P. et al, 2013, A&A, 556, 2.
  39. Worseck G., Prochaska J. X., Hennawi J. F., McQuinn M., 2016, ApJ, 825, 144
  40. Zel’Dovich Y. B., 1970, A&A, 5, 84
  41. Xavier Glorot, Yoshua Bengio ; Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics, PMLR 9:249-256, 2010.
  42. Zahn O., Mesinger A., McQuinn M., Trac H., Cen R., Hernquist L. E., 2011, MNRAS, 414, 727
Comments 0
Request Comment
You are adding the first comment!
How to quickly get a good reply:
  • Give credit where it’s due by listing out the positive aspects of a paper before getting into which changes should be made.
  • Be specific in your critique, and provide supporting evidence with appropriate references to substantiate general statements.
  • Your comment should inspire ideas to flow and help the author improves the paper.

The better we are at sharing our knowledge with each other, the faster we move forward.
The feedback must be of minimum 40 characters and the title a minimum of 5 characters
Add comment
Loading ...
This is a comment super asjknd jkasnjk adsnkj
The feedback must be of minumum 40 characters
The feedback must be of minumum 40 characters

You are asking your first question!
How to quickly get a good answer:
  • Keep your question short and to the point
  • Check for grammar or spelling errors.
  • Phrase it like a question
Test description