Identifying Reionization Sources from 21cm Maps using Convolutional Neural Networks
Abstract
Active Galactic Nuclei (AGN) and star-forming galaxies are leading candidates for being the luminous sources that reionized our Universe. Next-generation 21cm surveys promise to break degeneracies between a broad range of reionization models, hence revealing the nature of the source population. While many current efforts are focused on a measurement of the 21cm power spectrum, some surveys will also image the 21cm field during reionization. This provides further information with which to determine the nature of reionizing sources. We create a Convolutional Neural Network (CNN) that can efficiently distinguish between 21cm maps produced by AGN versus galaxy scenarios with an accuracy of 92–100%, depending on redshift and neutral fraction range. When adding thermal noise from typical 21cm experiments, the classification accuracy depends strongly on the effectiveness of foreground removal. Our results show that if foregrounds can be removed reasonably well, both SKA and HERA should be able to discriminate between source models with greater accuracy than using the power spectrum alone.
keywords:
dark ages, reionisation, first stars – galaxies: active – galaxies: high-redshift – galaxies: quasars – intergalactic medium

1 Introduction
Low-frequency radio interferometer experiments, such as the Low Frequency Array (LOFAR; van Haarlem et al., 2013), the Precision Array for Probing the Epoch of Reionization (PAPER; Parsons et al., 2010), the Hydrogen Epoch of Reionization Array (HERA; DeBoer et al., 2017), the Murchison Widefield Array (MWA; Bowman et al., 2013; Tingay et al., 2013), the Giant Metrewave Radio Telescope (GMRT; Paciga et al., 2011), and the Square Kilometre Array (SKA; Mellema et al., 2015), are promising instruments for detecting reionization on cosmological scales in the near future. These growing observational efforts motivate the development of statistical tools for efficiently extracting astrophysical and cosmological information.
The nature of the sources driving cosmic reionization remains a debated topic. While many studies (e.g. Shapiro & Giroux, 1987; Hopkins et al., 2007; Haardt & Madau, 2012; Hassan et al., 2018) conclude that reionization is mainly driven by star-forming galaxies, there remain some discrepancies that might be resolved by allowing Active Galactic Nuclei (AGN) to reionize the Universe. These discrepancies include the flat slope of ionising emissivity measurements (Becker & Bolton, 2013), the early and extended timing of Helium reionization (Worseck et al., 2016), and large-scale intergalactic opacity fluctuations (Becker et al., 2015). In addition, the Giallongo et al. (2015) high-redshift AGN measurements indicate a flatter slope at the faint end of the luminosity function, hence implying the presence of sufficient AGN to complete reionization, in contrast to what was previously believed.
In Hassan et al. (2018), we studied the role of galaxies versus AGN during reionization using an improved semi-numerical simulation. This simulation implements realistic and physically motivated recipes to include ionizing photons from galaxies, following the Finlator et al. (2015) radiative transfer simulations, and from AGN, following the Giallongo et al. (2015) observations and a fixed Quasar Halo Occupancy Distribution. We found that AGN-dominated scenarios are unable to simultaneously match current reionization constraints, namely the Planck (2016) optical depth, the ionising emissivity measurement by Becker & Bolton (2013), and the Fan et al. (2006) neutral fraction constraints at the end of reionization. This shows that AGN are highly unlikely to be the sole drivers of cosmic reionization. However, a model with equal photon contributions from AGN and galaxies is still allowed given the large uncertainties of current observational constraints. The broad range of theoretical models allowed by current reionization observations is expected to narrow when data from 21cm surveys become available. We found in Hassan et al. (2018) that power spectrum measurements from LOFAR, HERA and SKA could potentially discriminate between galaxies-only models and those with AGN contributions. This is because the latter produce larger ionized bubbles, and hence boost the 21cm power spectrum relative to the former.
Future 21cm observations are also expected to provide huge datasets of large-scale 21cm images that will encode more information than just the 21cm power spectrum. These images might provide an alternative approach to constrain the contribution of different reionizing sources. Here we assess the viability of using Convolutional Neural Networks (CNNs) to discriminate between AGN and galaxy models. This approach has been successfully employed in astronomy and cosmology to perform regression (e.g. Gupta et al., 2018; Gillet et al., 2018) and classification (e.g. Aragon-Calvo, 2018; Huertas-Company et al., 2018; Schaefer et al., 2018).
In this paper, we first examine the CNN's capability to discriminate between these models at several redshifts as a function of neutral fraction when the simulations are free of instrumental effects. We then include real-world survey effects, adding realistic instrumental noise and foreground mitigation cuts in order to test the CNN performance on datasets as they will be seen by the SKA, HERA, and LOFAR surveys. We finally consider the case where our different reionization models share similar power spectra. This is essentially a test of our CNN's ability to break degeneracies inherent in a power spectrum analysis.
This paper is organized as follows: Section 2 provides a brief summary of the simulations used for this study, a description of our two reionization models (galaxies, AGN), and details of our instrument simulations. We introduce Convolutional Neural Networks and describe the classifier architecture in Section 3. We describe our training datasets in Section 4, present the results in Section 5, and summarize our conclusions in Section 6.
2 Simulations
We use the Time-integrated version of our semi-numerical code SimFast21 (Santos et al., 2010), developed in Hassan et al. (2017). We describe here the general components of our simulation and refer to Santos et al. (2010) for the full description of the algorithm and to Hassan et al. (2016, 2017) for more details on the development of the Time-integrated version.
We first generate the dark matter density field using a Monte Carlo Gaussian approach in the linear regime. We then dynamically evolve the linear density field into a non-linear field by applying the Zel'dovich (1970) approximation. Using the underlying non-linear density realizations, dark matter halos are identified using the well-known excursion set formalism (ESF; Press & Schechter, 1974; Bond et al., 1991), in which a region is assumed to collapse into a halo if its mean overdensity exceeds the threshold $\delta_c(z) = 1.686/D(z)$, where $D(z)$ is the linear growth factor. Around each point in the density grid, the mean overdensity is obtained through a sequence of top-hat filters, starting with the box size and decreasing down to the cell size. The minimum halo mass is set to $10^{8}\,{\rm M_\odot}$, which corresponds to the hydrogen cooling limit during reionization. In the Time-integrated model, the ionised regions are identified using a similar ESF that is based on comparing the time-integrated ionisation rate with the recombination rate and the local neutral hydrogen density within each spherical volume specified by the ESF. Regions are flagged as ionised if
$$f_{\rm esc} \int R_{\rm ion}\, {\rm d}t \;\geq\; x_{\rm HII}\, N_{\rm H} + \int R_{\rm rec}\, {\rm d}t \quad\quad (1)$$

where $f_{\rm esc}$ is the photon escape fraction, $x_{\rm HII}$ is the ionised fraction, $N_{\rm H}$ is the total number of hydrogen atoms, $R_{\rm ion}$ is the ionisation rate, and $R_{\rm rec}$ is the recombination rate. Equation (1) represents the ionisation condition used in the Time-integrated model. The recombination field is modelled using a parametrization derived from a high-resolution radiative transfer simulation (Finlator et al., 2015) to account for the small-scale inhomogeneity below our cell size and clumping factor effects. The recombination rate, $R_{\rm rec}$, is parametrized as a function of overdensity $\Delta$ and redshift as follows:
$$R_{\rm rec} = A\,(1+z)^{D} \left[\frac{(\Delta/B)^{C}}{1 + (\Delta/B)^{C}}\right]^{4} \quad\quad (2)$$

where $A$ is a normalization in cm$^{-3}$ s$^{-1}$ (proper units) and $B$, $C$, and $D$ are fitted constants. We refer to Hassan et al. (2016) for more details on the derivation of this parametrization and its effect on reionization observables.
The key difference between the galaxy and AGN models is in their prescriptions for computing the ionization rate. We dub this variable $R_{\rm ion}$ for the galaxy model and $R_{\rm AGN}$ for the AGN model.
2.1 Galaxies
Our parameterization of the ionizing photon output from galaxies is motivated by a combination of radiative transfer simulations (Finlator et al., 2015) and larger hydrodynamic simulations (Davé et al., 2013) that have been calibrated to match a wide range of observations. In particular, $R_{\rm ion}$ is parametrized as a function of halo mass $M_h$ and redshift as follows:
$$\frac{R_{\rm ion}}{M_h} = A\,(1+z)^{D}\,\left(\frac{M_h}{B}\right)^{C} \exp\!\left[-\left(\frac{B}{M_h}\right)^{3}\right] \quad\quad (3)$$

where $A$ (in s$^{-1}\,{\rm M_\odot^{-1}}$), $B$, $C$ and $D$ are constants calibrated against the simulations. This ionization rate is computed directly from the star formation rates (SFRs) in the galaxy catalogs of these hydrodynamic simulations. This parametrization was derived fully in Hassan et al. (2016). As previously considered in Hassan et al. (2017), we keep all other parameters fixed and vary the photon escape fraction $f_{\rm esc}$, the emissivity amplitude $A$, and the emissivity–halo mass power dependence index $C$ to produce different ionizing emissivities $R_{\rm ion}$ for our training samples, as follows:
$$R_{\rm ion} \propto f_{\rm esc}\, A\, M_h^{\,C} \quad\quad (4)$$

where the proportionality constant absorbs the remaining fixed factors of Equation (3):

$$(1+z)^{D}\, M_h\, B^{-C} \exp\!\left[-\left(\frac{B}{M_h}\right)^{3}\right]. \quad\quad (5)$$
2.2 AGN
We compute the ionizing photon output from AGN using the strong correlations seen in local Universe observations (Ferrarese, 2002; Tremaine et al., 2002) between black hole masses and circular velocities of their host halos. This correlation is written as
$$M_{\rm bh} = g\, v_{\rm circ}^{\,5} \quad\quad (6)$$

where the constant $g$ may be regarded as the black hole formation efficiency, which is a free parameter in the AGN source model besides the photon escape fraction. Assuming the Eddington luminosity in the B-band and a power-law spectral energy distribution, the AGN ionization rate is given by:
(7) 
where $h_{\rm p}$ is Planck's constant.
To study the effect of massive and less massive AGN, it is reasonable to assume a non-linear relation between $R_{\rm AGN}$ and $M_{\rm bh}$, similar to our Galaxies source model in which $R_{\rm ion}$ depends non-linearly on halo mass through the index $C$. Thus, we introduce a new power-law index $\alpha$ in the above relations. This leads to an AGN model with three free parameters: the AGN photon escape fraction $f_{\rm esc}$, the ionising emissivity–black hole mass dependence (the non-linear scale index) $\alpha$, and the ionising emissivity amplitude $g$ (equivalent to the black hole formation efficiency). We keep all other parameters fixed and vary these parameters to generate different ionizing emissivities $R_{\rm AGN}$ for our training datasets, using:
$$R_{\rm AGN} \propto f_{\rm esc}\, g\, M_{\rm bh}^{\,\alpha} \quad\quad (8)$$

where $\alpha$ is the non-linear scale index and the proportionality constant converts black hole mass into an ionizing photon output. This shows how the AGN ionizing emissivity is related to the host halo mass. We next use the Giallongo et al. (2015) luminosity function to find the corresponding AGN duty cycle for each halo mass bin, which is the ratio between the number of AGN and the number of possible host halos. This ratio is alternatively called the Quasar Halo Occupancy Distribution (QHOD). The QHOD then provides a probability for a particular host halo to have an AGN that is actually active, and we use the QHOD to randomly designate a given halo as one that produces ionizing photons according to the prescription outlined above. Instead of using a fixed luminosity function, we use a fixed QHOD at higher redshifts to find the number of AGN. We refer to Hassan et al. (2018) for the full details of our AGN source model prescription.
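As an illustration of the duty-cycle step described above, the random assignment of active AGN to halos can be sketched as follows (a minimal NumPy sketch; `qhod` is a hypothetical callable interpolating the occupancy probability over halo mass, not part of the authors' code):

```python
import numpy as np

def assign_agn(halo_masses, qhod, seed=0):
    """Randomly flag halos as hosting an active AGN.

    halo_masses : 1D array of halo masses
    qhod        : callable giving the occupancy probability (duty cycle)
                  as a function of halo mass
    Returns a boolean array; True means the halo produces ionizing photons.
    """
    rng = np.random.default_rng(seed)
    prob = qhod(halo_masses)
    # A halo is "on" if a uniform random draw falls below its duty cycle
    return rng.random(halo_masses.size) < prob
```

Halos flagged `True` would then be assigned the ionization rate of Equation (8); the rest contribute no photons.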
In Figure 1, we show different 21cm images for the Galaxies and AGN models at z = 10, 9, 8, 7 at different stages of reionization, when the neutral fraction is 0.25, 0.5, or 0.75. Each image is extracted from a 21cm box that is generated from a different realization of the underlying density field (different seed) and a different set of astrophysical parameters. These parameters are quoted above each image for the Galaxies and AGN models, respectively. These images are randomly selected from our training datasets, whose assembly we explain in greater detail in Section 4. We see that Galaxies scenarios (first three rows) have more small-scale bubbles than AGN scenarios (last three rows), which tend to have larger bubbles. The size of ionised bubbles in each model is basically controlled by the emissivity's dependence on mass, namely the indices $C$ and $\alpha$ for Galaxies and AGN, respectively. Models with larger values of these indices generally tend to produce larger ionised bubbles than models with smaller values. For Galaxies and AGN models that have the same escape fraction and index values, AGN models will typically still produce larger ionised bubbles at the same neutral fraction and redshift, as can be seen in Figure 1. On the other hand, if a Galaxies model uses a larger value for $C$ than an AGN model does for $\alpha$, the Galaxies model will produce larger ionised bubbles at the same redshift and neutral fraction, as seen at the 50% neutral fraction stage. Of the remaining parameters, the photon escape fraction and the emissivity amplitude linearly increase the ionising emissivity by the same amount for every source, and hence they have a bigger effect on the reionization history than on the reionization topology.
To summarize, the main differences between our Galaxies and AGN models (which we hope to distinguish using our classifier) are:

The AGN model accounts for redshift evolution by following the halo mass function (HMF), whereas the Galaxies model has an extra redshift evolution factor; see Equation (4).

In the Galaxies model, the star-formation duty cycle is assumed to be unity, which means that all identified halos are forming stars and contributing to the total ionizing emissivity. On the other hand, the AGN model implements a duty cycle following the Giallongo et al. (2015) luminosity function, which means that not all halos host AGN and contribute to the photon production rate.
2.3 21cm Instrument Simulations
We now describe our methodology for including realistic 21cm instrumental effects. Doing so requires accounting for the finite angular resolution of telescopes, the effects of foreground cleaning, and the presence of instrumental noise.
Angular resolution
The finite angular resolution of a 21cm experiment limits its sensitivity at large $k_\perp$, where $k_\perp$ is the comoving wavenumber in the directions perpendicular to our line of sight (i.e., the angular directions). Our recipe for adjusting the 21cm brightness temperature boxes to account for an instrument's angular resolution is as follows:

We compute the maximum baseline length $b_{\rm max}$ from the antenna distribution of the 21cm experiment in question.

We compute the corresponding cutoff $k_\perp^{\rm max}$ using:
$$k_\perp^{\rm max} = \frac{2\pi\, b_{\rm max}}{\lambda(z)\, D_c(z)} \quad\quad (9)$$
where $D_c(z)$ is the comoving distance to redshift $z$ and $\lambda(z)$ is the redshifted wavelength of the 21cm line at redshift $z$.

We Fourier transform the 21cm box, arbitrarily picking two of the axes of the box to be the $x$ and $y$ axes (with Fourier dual wavenumbers $k_x$ and $k_y$, respectively). We then compute $k_\perp = \sqrt{k_x^2 + k_y^2}$ for every voxel of the box.

We zero out all Fourier modes with $k_\perp > k_\perp^{\rm max}$.
In this (somewhat simplistic) treatment, the angular resolution of a telescope is determined by the longest baseline of the array. In future work, it will be necessary to improve upon this by explicitly accounting for the detailed baseline distribution of different interferometers. For this paper, however, our simple prescription is sufficient for the purposes of qualitatively exploring the efficiency of our CNN classifier with different levels of angular resolution.
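The four steps above can be sketched as follows (a minimal NumPy sketch under the $2\pi$ Fourier convention; function and argument names are our own, not the authors'):

```python
import numpy as np

def apply_angular_resolution(box, cell_size, k_perp_max):
    """Zero out Fourier modes with k_perp above the angular-resolution cutoff.

    box        : 3D array of 21cm brightness temperatures
    cell_size  : comoving cell size in Mpc
    k_perp_max : cutoff wavenumber in Mpc^-1 (Equation 9)
    """
    n = box.shape[0]
    # Comoving wavenumbers along each axis
    k = 2 * np.pi * np.fft.fftfreq(n, d=cell_size)
    # Arbitrarily take the first two axes as the angular (perpendicular) directions
    kx, ky = np.meshgrid(k, k, indexing="ij")
    k_perp = np.sqrt(kx**2 + ky**2)[:, :, None]  # broadcasts over the third axis
    box_k = np.fft.fftn(box)
    box_k *= (k_perp <= k_perp_max)              # zero modes above the cutoff
    return np.fft.ifftn(box_k).real
```

With a very large cutoff the box is returned unchanged; with a cutoff of zero, only the $k_\perp = 0$ modes survive, leaving a field constant in the angular directions.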
Foreground cleaning
We next adjust the 21cm brightness temperature boxes by removing modes that are contaminated by foregrounds. These modes come in the form of a "foreground wedge" in the $k_\perp$–$k_\parallel$ plane (see, e.g., Liu et al. 2014 for more details on the wedge and its complement, the reionization window). The wedge can be parameterized by saying that modes satisfying
$$k_\parallel \leq m\, k_\perp \quad\quad (10)$$
are foreground contaminated, where $m$ is a wedge slope that is given by
$$m = \frac{D_c(z)\, H(z)}{c\,(1+z)} \quad\quad (11)$$
where $H(z)$ is the Hubble parameter, $c$ is the speed of light, and $D_c(z)$ is the comoving distance. A conservative way to fight foregrounds is simply to zero all modes within the wedge.
Using Equation (11), the wedge slope value is of order a few during reionization redshifts. However, we also consider smaller values of $m$ to model foreground mitigation algorithms that are able to reduce the Fourier-space footprint of the contaminants. Such algorithms reduce the number of modes that need to be zeroed out, preserving more of the information content of the data.
To mimic the effect of foreground contaminants (and the effect of their removal) in our boxes, we proceed as follows:

We compute the wedge slope $m$ at the appropriate redshift using Equation (11) and additionally consider wedge slopes that are smaller than this value.

We zero out all Fourier modes satisfying $k_\parallel \leq m\, k_\perp$.
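The wedge cut can be sketched in the same Fourier conventions as the angular-resolution step (a minimal NumPy sketch; the line of sight is taken along the third axis, and the naming is our own):

```python
import numpy as np

def wedge_filter(box, cell_size, m):
    """Zero out foreground-wedge modes with |k_parallel| <= m * k_perp.

    box       : 3D array of 21cm brightness temperatures
    cell_size : comoving cell size in Mpc
    m         : wedge slope (Equation 11), dimensionless
    """
    n = box.shape[0]
    k = 2 * np.pi * np.fft.fftfreq(n, d=cell_size)
    kx, ky, kz = np.meshgrid(k, k, k, indexing="ij")  # kz is the line of sight
    k_perp = np.sqrt(kx**2 + ky**2)
    box_k = np.fft.fftn(box)
    box_k *= (np.abs(kz) > m * k_perp)  # keep only modes outside the wedge
    return np.fft.ifftn(box_k).real
```

Note that this cut also removes the $k_\parallel = 0$ modes (including the mean), consistent with an interferometer's insensitivity to such modes.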
Experiment  Design  Diameter [m]  Collecting area [m²]  Receiver temperature [mK]

LOFAR  48 tiles of bow-tie high-band antennae  30.75  35,762  140,000
HERA  350 hexagonally packed antennae  14  50,953  100,000
SKA  866 compact core antennae  35  833,189  100 T_sky + 40,000
Thermal noise
To add instrumental noise, we make use of the publicly available 21cmSense package.

The thermal noise power spectrum is computed by the 21cmSense package using the expression
$$\Delta^2_{\rm N}(k) \approx X^2 Y\, \frac{k^3}{2\pi^2}\, \frac{\Omega}{2t}\, T_{\rm sys}^2 \quad\quad (12)$$
where $X^2 Y$ is a conversion factor from angle and frequency units to comoving cosmological distances, $\Omega$ is the primary beam's field-of-view, $t$ is the integration time on the mode with wavenumber $k$, and $T_{\rm sys}$ is the total system (sky plus receiver) temperature. The final proposed configurations of our considered experiments are summarized in Table 1.

We generate a white noise box drawn from a Gaussian distribution with zero mean and unit standard deviation.

We transform the white noise box to Fourier space and scale it using the noise power spectrum.

We then inverse Fourier transform the scaled white noise box back to real space.
As a final step, we add the noise realization box to our cosmological signal box (which has been treated for angular resolution and foreground effects, as discussed above). This then forms our mock 21cm observations.
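The noise-generation steps above can be sketched as follows (a minimal NumPy sketch; `p_noise` is a hypothetical callable standing in for the 21cmSense output, and overall normalization/volume factors are ignored):

```python
import numpy as np

def noise_realization(n, cell_size, p_noise, seed=0):
    """Draw a white-noise box and scale it in Fourier space by sqrt(P_N(k)).

    n         : number of cells per side
    cell_size : comoving cell size in Mpc
    p_noise   : callable returning the thermal noise power at wavenumber k
    """
    rng = np.random.default_rng(seed)
    white = rng.normal(size=(n, n, n))           # zero mean, unit variance
    k1d = 2 * np.pi * np.fft.fftfreq(n, d=cell_size)
    kx, ky, kz = np.meshgrid(k1d, k1d, k1d, indexing="ij")
    k = np.sqrt(kx**2 + ky**2 + kz**2)
    scaled_k = np.fft.fftn(white) * np.sqrt(p_noise(k))
    return np.fft.ifftn(scaled_k).real           # back to real space

# Mock observation = (resolution- and wedge-filtered signal box) + noise box
```

With a flat unit noise power the routine returns the white-noise box itself, which is a convenient sanity check.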
Figure 2 illustrates the impact of instrumental effects. There, we show a slice from a box based on our Galaxies model (bottom left), the same slice but angular-resolution limited (excluding modes with $k_\perp > k_\perp^{\rm max}$; first row), wedge-filtered slices using a conservative wedge slope value (second row), the thermal noise contributions from each experiment (third row), and finally the image as observed by SKA, HERA, and LOFAR, respectively (last row). In the first row, we see the effect of angular resolution, which is set by the maximum baseline. The maximum baseline lengths for the SKA, HERA and LOFAR (see Table 1) correspond to different values of $k_\perp^{\rm max}$. The SKA and LOFAR have higher angular resolution than HERA, and hence their angularly limited versions (first row) are quite similar to the original signal (bottom left). The low angular resolution of HERA prevents the resolution of small-scale features of the signal, but bubbles can still be detected on large scales. We next apply the wedge filtering on top of the angular-resolution limited signal, which results in the images on the second row. Here, we used the conservative wedge slope expected from Equation (11). It is quite clear that this conservative cut destroys most of the signal features, on large scales along the line-of-sight direction and on small scales in the perpendicular directions. Next, we produce thermal noise maps (third row). As expected, the SKA has a dimmer noise map than HERA and LOFAR do. Finally, we add the noise maps to the angular-resolution limited and wedge-filtered signal to produce mock images as seen by our experiments (last row). We see that HERA and SKA are able to see the most prominent features of the wedge-filtered images. The thermal noise from LOFAR, on the other hand, dominates the image, and so we expect that LOFAR may not be able to easily distinguish between images generated from the Galaxies and AGN scenarios.
In Figure 3, we explore the effect of different wedge filtering levels. We show the signal's response to wedge slopes decreasing from the conservative value down to $m =$ 2.0, 1.0, 0.5, and 0.25. Results for the SKA, HERA and LOFAR are shown in the first, second, and third row, respectively. As expected, the lower the wedge slope, the clearer the signal features are, since fewer modes are removed due to foregrounds. At the smallest slopes, the signal's prominent features on large scales begin to appear more clearly, and hence we expect better classification performance at this limit of wedge filtering for the SKA and HERA. On the other hand, thermal noise from LOFAR limits the identification of signal features, even at very small wedge filtering levels. In Section 5, we will explore the classifier's ability to discriminate between Galaxies and AGN models as a function of the wedge slope.
3 Reionization Models Classifier Architecture
We employ Convolutional Neural Networks (CNNs) to build up the reionization models classifier. CNNs are a class of deep learning models that are powerful for large-scale image recognition (for a comprehensive review see Rawat & Wang, 2017). Similar to classical Artificial Neural Networks (ANNs), CNNs are feed-forward networks that take in an input $x$ and eventually output a predicted label $y$. The connection between these is constructed by stacking neuron layers, each of which is a linear combination of the components of $x$ with weights and biases. Each layer can produce multiple outputs of the form
$$y_j = f\!\left(\sum_{i=1}^{n} w_{ij}\, x_i + b_j\right) \quad\quad (13)$$
where $n$ is the number of inputs in the layer, $y_j$ is the $j$th output (or "feature"), $w_{ij}$ is the weighting of the $i$th input in the $j$th feature, and $b_j$ is the bias in the $j$th feature. The function $f$ is a nonlinear response function that is chosen by the architect of the neural network.
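As a minimal NumPy sketch of Equation (13) (with an arbitrary choice of tanh for the nonlinearity; naming is our own):

```python
import numpy as np

def dense_layer(x, W, b, f=np.tanh):
    """One fully connected layer: y_j = f(sum_i W[i, j] * x[i] + b[j])."""
    return f(x @ W + b)
```

Convolutional layers (described next) are the special case in which the weights form small, shared, localized kernels rather than a dense matrix.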
The CNN architecture is comprised of a series of different types of neuron layers that each have a specific form for the weights:

Convolutional layers extract different features by convolving the input image with the learned weights of 2-dimensional kernels.

Pooling layers reduce the dimension and spatial resolution of features. They speed up the network and help prevent overfitting. Examples are average-pooling and max-pooling, in which the input image is divided into non-overlapping sub-images whose values are the average or maximum of the original values from the input image, respectively.

Fully connected layers are the classical ANN layers in which the input is connected to all neurons.
Two other standard techniques are employed in our CNN:

Dropout regularizes the training process by randomly switching off neurons with a given probability. This prevents co-adaptation of features and overcomes the problem of overfitting.

Batch Normalization ensures that the inputs to each layer are normalized, keeping their averages and variances fixed. Batch normalization then scales and shifts the normalized input by two new learned parameters $\gamma$ and $\beta$ as follows:
$$\hat{x} = \gamma\, \frac{x - \mu}{\sqrt{\sigma^2 + \epsilon}} + \beta \quad\quad (14)$$
where $\mu$ and $\sigma^2$ are the batch mean and variance and $\epsilon$ is a small constant to avoid arithmetical instability. This technique has been shown to improve training performance significantly (e.g. Roy et al., 2018; Schaefer et al., 2018).
Putting this together, our CNN architecture is as follows: the network is composed of four blocks of convolutional layers, three fully connected layers, and an output layer. Similar CNN architectures have been very successful at large-scale image recognition (e.g. Simonyan & Zisserman, 2015; Schaefer et al., 2018). In each block, there are two convolutional layers with convolutional kernels of size 3×3 and a single 2×2 max-pooling layer. The number of generated features increases from one block to the next while the features' dimension decreases through the max-pooling. The final output layer with two neurons is applied to classify the input 21cm map by providing the probabilities that the input images were generated from the Galaxies or AGN models. Dropout is applied on all fully connected layers (only during training) to keep 75% of the neurons. Before each layer, batch normalization is applied, except for the final output layer, where a softmax activation is applied instead. The Rectified Linear Unit (ReLU) activation function is applied on all convolutional and fully connected layers. The ReLU function is defined as:
$$f(x) = \max(0, x) \quad\quad (15)$$

which returns zero if the input is negative; otherwise it returns the raw input. Figure 4 shows our classifier design and Table 2 summarizes the image processing steps through the classifier layers as outlined above.
Step  Layer type  Output dimension

1  3×3 Convolution  16×140×140
2  Batch Normalization + ReLU  
3  3×3 Convolution  16×140×140
4  Batch Normalization + ReLU  
5  2×2 max-pooling  16×70×70
6  3×3 Convolution  32×70×70
7  Batch Normalization + ReLU  
8  3×3 Convolution  32×70×70
9  Batch Normalization + ReLU  
10  2×2 max-pooling  32×35×35
11  3×3 Convolution  64×35×35
12  Batch Normalization + ReLU  
13  3×3 Convolution  64×35×35
14  Batch Normalization + ReLU  
15  2×2 max-pooling  64×18×18
16  3×3 Convolution  128×18×18
17  Batch Normalization + ReLU  
18  3×3 Convolution  128×18×18
19  Batch Normalization + ReLU  
20  2×2 max-pooling  128×9×9
21  Flattening  10368
22  Fully connected  1024
23  Batch Normalization + ReLU  
24  Dropout (25%)  
25  Fully connected  1024
26  Batch Normalization + ReLU  
27  Dropout (25%)  
28  Fully connected  1024
29  Batch Normalization + ReLU  
30  Dropout (25%)  
31  Fully connected + softmax  2
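The processing steps in Table 2 can be sketched in code as follows (an assumed PyTorch translation, not the authors' original implementation; padding, pooling, and layer-ordering details are simplifications):

```python
import torch
import torch.nn as nn

class ReionClassifier(nn.Module):
    """Sketch of the CNN in Table 2: four conv blocks, three FC layers, two outputs."""
    def __init__(self):
        super().__init__()
        blocks, in_ch = [], 1
        for out_ch in (16, 32, 64, 128):
            blocks += [
                nn.Conv2d(in_ch, out_ch, 3, padding=1), nn.BatchNorm2d(out_ch), nn.ReLU(),
                nn.Conv2d(out_ch, out_ch, 3, padding=1), nn.BatchNorm2d(out_ch), nn.ReLU(),
                nn.MaxPool2d(2, ceil_mode=True),  # 140 -> 70 -> 35 -> 18 -> 9
            ]
            in_ch = out_ch
        self.features = nn.Sequential(*blocks)
        fc, in_f = [], 128 * 9 * 9  # flattened size: 10368, as in Table 2
        for _ in range(3):
            fc += [nn.Linear(in_f, 1024), nn.BatchNorm1d(1024), nn.ReLU(), nn.Dropout(0.25)]
            in_f = 1024
        fc.append(nn.Linear(1024, 2))  # softmax is applied with the loss
        self.classifier = nn.Sequential(*fc)

    def forward(self, x):
        return self.classifier(self.features(x).flatten(1))
```

Feeding a batch of single-channel 140×140 images yields one score per class, matching the two-neuron output layer of Table 2.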
At the beginning of training, we need to initialize all layers' weights and biases with different small numbers to break the symmetry between neurons. There are different initialization strategies; the most common approach is to use random numbers sampled from a Gaussian distribution with zero mean and finite standard deviation (e.g., 0.1 or 0.01). However, in order to ensure faster training and to preserve the input data properties, particularly the variance, we initialize all weights and biases with a generalized form of the Xavier initializer (Glorot & Bengio, 2010) called the Variance Scaling initializer. In such an initialization, the random numbers are drawn from a zero-mean Gaussian distribution with a variance equal to the inverse of the average of the number of input and output neurons. This initializer ensures that the variance of the input data is preserved through all the classifier layers.
At each training step, we first calculate the distance between the true and predicted labels via their cross entropy. This is our loss function. We next minimize the loss function using a mini-batch gradient descent Momentum optimizer. With this optimizer, the loss gradients are computed with respect to the classifier's parameters (weights, biases and the batch normalization parameters) for a mini-batch of the training dataset, to indicate in which directions of parameter space one should move in order to approach the global minimum. The update is controlled by the learning rate, which speeds/slows the learning process, and by the momentum term, which accelerates the update by keeping a fraction of the previous update direction. This fraction is called the momentum and is usually set to a value close to unity. At zero momentum, the Momentum optimizer becomes the standard Gradient Descent optimizer. Here, we use a value of 0.9 for the momentum and 0.01 for the initial learning rate, which then decays exponentially by a factor of 0.97 every 20 training steps.
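The optimizer update and learning-rate schedule described above can be sketched as follows (a minimal sketch using TensorFlow-style exponential decay; interpreting the decay as a factor of 0.97 per 20 steps is our assumption):

```python
import numpy as np

def learning_rate(step, base_lr=0.01, decay_rate=0.97, decay_steps=20):
    """Exponentially decayed learning rate: base_lr * decay_rate ** (step / decay_steps)."""
    return base_lr * decay_rate ** (step / decay_steps)

def momentum_step(params, grads, velocity, lr, momentum=0.9):
    """One momentum-SGD update; momentum=0 recovers plain gradient descent.

    velocity accumulates a decaying sum of past gradients:
        v <- momentum * v - lr * g ;  p <- p + v
    """
    velocity = momentum * velocity - lr * grads
    return params + velocity, velocity
```

The velocity term is what "keeps a fraction of the previous update direction"; with momentum 0.9, roughly the last ten updates contribute appreciably to each step.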
It is worthwhile to mention that the architecture in this paper is different from, and more complex than, the previous architectures used to classify the same models in Hassan et al. (2018). In Hassan et al. (2018), the training datasets were obtained by varying the models' astrophysical parameters while leaving the underlying density realization fixed. The treatment of instrumental effects was also considerably simpler. As a result, a very simple network architecture, with two convolution blocks and a single fully connected layer, could produce high accuracy (> 95%) from the first few training epochs. With the more complicated training sets of this paper, deeper and more complex networks are necessary.
To illustrate how the classifier extracts different features from the input 21cm images, we show in Figure 5 an example of a single 21cm image as it passes through the different layers of the trained neural network. We show only the first 16 filters (i.e., convolution kernels) from each convolutional layer. We also show the convolution of each filter with the input image, together with the application of the ReLU activation function. We see that different filters extract subtly different features of the ionised bubbles from the input image. The input image responds differently to each filter, and the convolution output shows different levels of bubble detection. The ReLU activation function then clears away the less important features (negative pixels) and keeps the most prominent features (positive pixels). This process continues through all convolutional layers. The eighth convolutional layer is the last step before the two-dimensional information is collapsed into an abstract data vector. At this layer (Conv8; last row), we see that most of the filters strongly highlight the large-scale bubbles while a few of the filters extract small-scale features. In addition, the max-pooling layer is applied three times before the last convolutional layer, which reduces the input image dimension from 140×140 to 18×18, and hence smooths the small-scale features more than the large-scale ones. This indicates that the classification might be primarily determined by the sizes of large-scale bubbles. As we saw in Figure 3, the prominence of large-scale bubbles depends on the effectiveness of foreground removal. A conservative cut of foregrounds in the data (i.e., a large wedge slope) makes ionised bubbles difficult to identify. These large-scale features reappear when assuming a moderate level of foregrounds, with smaller wedge slopes. At such levels of foreground cleaning, the signal features can be easily recognized even in the presence of noise levels representative of the SKA and HERA.
Accuracy

Neutral fraction range  z=7  z=8  z=9  z=10
0.55 > x > 0.45  1.0000 ± 0.0000  0.9971 ± 0.0018  1.0000 ± 0.0000  1.0000 ± 0.0000
0.75 > x > 0.25  0.9680 ± 0.0031  0.9875 ± 0.0020  0.9925 ± 0.0021  0.9969 ± 0.0012
0.95 > x > 0.05  0.6662 ± 0.0109  0.9159 ± 0.0096  0.9608 ± 0.0025  0.9642 ± 0.0018
4 Training dataset
We generate our training dataset from a simulation box of size 75 Mpc with 140 cells per side. We choose this box size since it covers the relevant scales to which future 21cm surveys will be sensitive. In addition, we have previously tested the convergence of the 21cm power spectrum with box size and found that our simulations are well converged down to a box size of 75 Mpc (Hassan et al., 2016).
We run 1,000 reionization simulation realizations, with 1,000 different random seed numbers for the initial density field fluctuations, for each model. We choose 1,000 values from the following parameter ranges:

Galaxies: the photon escape fraction (0.5–0.01), the emissivity amplitude (varied over a range of powers of ten), and the index $C$ (1.0–0.0).

AGN: the photon escape fraction (1.0–0.5), the emissivity amplitude (varied over a range of powers of ten), and the index $\alpha$ (1.0–0.0).
These parameter ranges translate into a spread in the total ionizing emissivity of each source model. The resulting ranges are much larger than the uncertainty of the ionising emissivity measurements by Becker & Bolton (2013). By neglecting such additional constraints in this paper, we are conservatively considering a wider parameter space than is currently allowed by observations. In future work, one hopes that the job of a classifier can be made easier by using ionising emissivity observations to restrict the possible parameter space of models.
In addition to the variations in the astrophysical parameters, our training datasets also have varied underlying density fields (by initializing each box with a different random seed). Our training set is thus much more complex than the ones previously considered in Hassan et al. (2018) and more suitable for training classifiers that might be applied to realistic 21cm surveys.
The specific ranges chosen for the Galaxies parameters come from Hassan et al. (2017); they are essentially the 1σ intervals obtained in an MCMC parameter estimation matching various reionization observables. As for the AGN source model, the AGN photon escape fraction is most commonly assumed to be unity owing to the observed hardness of AGN spectra. However, we relax this assumption and consider values as low as 50%. In Hassan et al. (2018), we showed that the AGN model reproduces the ionising emissivity constraints for values around 10, and hence we consider one order of magnitude above and below this value. The emissivity's nonlinear dependence on black hole mass follows our choice of the analogous parameter in the Galaxies model. However, as previously shown in Hassan et al. (2017), these parameters are degenerate, particularly the photon escape fraction with the emissivity amplitude, so simulation realizations with values lying outside these ranges can easily be extrapolated.
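The sampling of realizations described above can be sketched as follows. This is a minimal illustration only: the parameter names, bounds, and the `sample_parameters` helper are placeholders, not the exact quantities used in our pipeline.

```python
import numpy as np

# Illustrative (placeholder) parameter bounds for each source model;
# each model is described by three parameters drawn uniformly in range.
PARAM_RANGES = {
    "galaxies": [(0.01, 0.5), (0.1, 10.0), (0.0, 1.0)],
    "agn":      [(0.5, 1.0),  (0.1, 10.0), (0.0, 1.0)],
}

def sample_parameters(model, n_realizations=1000, seed=0):
    """Draw one parameter vector per realization, uniformly within the
    model's ranges. Each realization also receives its own random seed
    for the initial density field, so the underlying density varies."""
    rng = np.random.default_rng(seed)
    lo, hi = zip(*PARAM_RANGES[model])
    params = rng.uniform(lo, hi, size=(n_realizations, len(lo)))
    density_seeds = rng.integers(0, 2**31, size=n_realizations)
    return params, density_seeds

# 1,000 parameter triplets plus 1,000 density-field seeds for one model:
params, seeds = sample_parameters("galaxies")
```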
We note that, from each simulation, one may extract 140 images of 21cm fluctuations along each of the principal axes (x, y, or z) chosen as the line of sight. A two-dimensional image is generated by extracting a slice with a thickness of a single cell (0.535 Mpc) along the line-of-sight axis and the full extent of the box (75 Mpc) along the other two axes. Hence 1,000 simulation realizations in fact contain 140 × 3 × 1,000 = 420,000 possible different 21cm images, which is sufficient for training a deep network. However, nearby 21cm slices from the same box share similar features, and hence we skip at least 2 Mpc of comoving distance before extracting the next slice. Out of these 1,000 boxes, around 300 produce a neutral fraction between 95% and 5% at the redshifts of interest, and we limit our study to those boxes since we expect our two models to produce very similar 21cm fluctuations at the beginning (x > 95%) and end (x < 5%) of reionization.
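The slice extraction with the ≥2 Mpc separation can be sketched as follows (a minimal illustration; the `extract_slices` helper is ours, not part of the simulation code):

```python
import numpy as np

def extract_slices(box, cell_mpc=0.535, min_sep_mpc=2.0):
    """Extract 2D slices of single-cell thickness along each principal
    axis, skipping at least `min_sep_mpc` comoving Mpc between
    consecutive slices so that neighbouring slices do not share
    near-identical features."""
    step = max(1, int(np.ceil(min_sep_mpc / cell_mpc)))  # cells to skip
    n = box.shape[0]
    slices = []
    for axis in range(3):                 # x, y, or z as line of sight
        for i in range(0, n, step):
            slices.append(np.take(box, i, axis=axis))
    return np.array(slices)

box = np.zeros((140, 140, 140))           # one simulation realization
imgs = extract_slices(box)
# a 2 Mpc separation is a 4-cell step: 35 slices per axis, x3 axes
```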
After generating our training samples, we use 8% for testing, 8% for validation, and 84% for training. For each study, we use at least 6,000 different images of each class for training and at least 1,000 different images of each class for testing and validation. At each training step, we randomly select a batch of 32 labelled images. We have tested the effect of selecting small versus large batches and the results remain unchanged.
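A minimal sketch of this split-and-batch scheme, assuming a train/validation/test split of 84/8/8 and the batch size of 32 quoted above (`split_and_batch` is an illustrative helper, not our actual training code):

```python
import numpy as np

def split_and_batch(images, labels, batch_size=32, seed=0,
                    fractions=(0.84, 0.08, 0.08)):
    """Shuffle once, split into train/validation/test subsets, and
    return an infinite generator of random training batches."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(len(images))
    images, labels = images[order], labels[order]
    n_train = int(fractions[0] * len(images))
    n_val = int(fractions[1] * len(images))
    train = (images[:n_train], labels[:n_train])
    val = (images[n_train:n_train + n_val], labels[n_train:n_train + n_val])
    test = (images[n_train + n_val:], labels[n_train + n_val:])

    def batches(x, y):
        while True:  # one random batch per training step
            idx = rng.choice(len(x), size=batch_size, replace=False)
            yield x[idx], y[idx]

    return train, val, test, batches(*train)
```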
Table 4. Validation accuracy (mean ± 1σ over five runs) at fixed redshift with thermal noise added, as a function of the wedge slope m. Without instrumental effects the accuracy is 0.990 ± 0.002.

Experiment    m = 3.4          m = 2.0          m = 1.0          m = 0.5
SKA           0.858 ± 0.007    0.919 ± 0.006    0.976 ± 0.005    0.990 ± 0.003
HERA          0.797 ± 0.011    0.852 ± 0.008    0.917 ± 0.006    0.967 ± 0.002
LOFAR         0.545 ± 0.010    0.544 ± 0.021    0.608 ± 0.029    0.699 ± 0.009
5 Results
We quantify the classifier performance by its accuracy, defined as the percentage of images that are classified correctly. For each section, we run the classifier five times using different random seeds for initializing the learned parameters of the network; this tests the classifier's ability to converge to the same results irrespective of the random initialization of weights. For these five runs, we use the validation sample to evaluate the classifier accuracy at the end of training, and we express all results as the mean accuracy and standard deviation over the five runs.
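The evaluation protocol above amounts to the following sketch (the `train_fn`/`eval_fn` callables stand in for our network training and validation-set evaluation, which are not shown):

```python
import numpy as np

def accuracy(predictions, labels):
    """Fraction of images classified correctly."""
    return np.mean(predictions == labels)

def seeded_runs(train_fn, eval_fn, n_runs=5):
    """Train the network n_runs times with different random seeds for
    the weight initialization, evaluate each run on the validation set,
    and report the mean accuracy and its standard deviation."""
    accs = [eval_fn(train_fn(seed=s)) for s in range(n_runs)]
    return np.mean(accs), np.std(accs)
```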
5.1 Dependence on redshift and neutral fraction (without noise and foreground removal)
We first study the classifier accuracy and loss (similar to the error rate) as a function of redshift and neutral fraction. The results are shown in Figure 6. The red, blue, green, and cyan colours represent redshifts 10, 9, 8, and 7, respectively. Solid lines indicate the mean accuracy/loss on training samples, whereas dashed lines are for testing samples. Shaded areas show the 1σ regions over the five runs with different random seeds. In all cases, the classifier converges within the first 2,000 training steps (∼5 training epochs) to accuracies and losses that stay approximately constant thereafter. The 1σ levels (shaded areas) are quite small when one is restricted to narrow neutral fraction ranges. This shows that our classifier is robust and insensitive to the random initialization of network weights. We note that the classifier accuracy increases as the neutral fraction range narrows and the redshift increases. We summarize the accuracy on the whole validation set for the five runs in Table 3. When the universe is highly ionized at z = 7, AGN-only and galaxies-only models tend to produce similar ionization and 21cm fields, and hence the accuracy is low (66%) on the testing and validation samples. In this case, the loss increases slowly, which indicates slight overfitting: given the large degree of similarity between the models at the end of reionization, the network resorts to simply memorizing the inputs and outputs of the training set. Such an approach naturally fails on the test samples, so the loss rises. Excluding z = 7 (when our Universe is highly ionized), the classifier attains an accuracy of 92–100% on the testing and validation samples across a wide range of neutral fractions.
5.2 Classification at a fixed redshift (with noise and foreground removal)
We now determine the classifier accuracy in the presence of instrumental effects. To illustrate their impact, we use a dataset at a single fixed redshift, restricted to the neutral fraction range 0.75 > x > 0.25, and add thermal noise appropriate to SKA, HERA, and LOFAR. For each experiment, we consider four datasets with different wedge slopes m = 3.4, 2.0, 1.0, 0.5 used to remove foreground-contaminated modes. We train the classifier on each of the four datasets five times, with different random seeds for the generation of the initial weights and biases.
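The foreground-wedge excision can be sketched as follows, assuming the standard wedge criterion that modes with |k∥| ≤ m·k⊥ are contaminated and discarded. This is an illustration of the cut itself; the survey-specific noise and sampling details handled by tools such as 21cmSense are omitted.

```python
import numpy as np

def wedge_filter(box, m, box_size_mpc=75.0):
    """Zero out Fourier modes inside the foreground wedge,
    |k_parallel| <= m * k_perp, taking the last axis as the line of
    sight, and transform back to configuration space."""
    n = box.shape[0]
    k = 2 * np.pi * np.fft.fftfreq(n, d=box_size_mpc / n)  # Mpc^-1
    kx, ky, kpar = np.meshgrid(k, k, k, indexing="ij")
    kperp = np.sqrt(kx**2 + ky**2)
    keep = np.abs(kpar) > m * kperp       # modes above the wedge survive
    ft = np.fft.fftn(box) * keep
    return np.real(np.fft.ifftn(ft))
```

A larger wedge slope removes a larger region of Fourier space, so the filtered field retains less of the original signal.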
Table 5. Validation accuracy (mean ± 1σ over five runs) for the matched-power-spectrum datasets with thermal noise added, as a function of the wedge slope m. Without instrumental effects the accuracy is 0.999 ± 0.001.

Experiment    m = 3.4          m = 2.0          m = 1.0          m = 0.5          m = 0.25
SKA           0.864 ± 0.012    0.876 ± 0.015    0.873 ± 0.008    0.880 ± 0.006    0.943 ± 0.003
HERA          0.736 ± 0.015    0.794 ± 0.009    0.799 ± 0.007    0.821 ± 0.006    0.893 ± 0.005
LOFAR         0.566 ± 0.011    0.558 ± 0.014    0.576 ± 0.010    0.578 ± 0.010    0.619 ± 0.022
We show the classifier accuracy as a function of the wedge slope in Figure 7. The red solid, blue dashed, and green dash-dotted lines show the mean validation accuracy for datasets as observed by SKA, HERA, and LOFAR, respectively, with the corresponding shaded areas giving the 1σ levels over the five runs. As previously noted, a conservative wedge slope at these redshifts is m = 3.4, for which none of the experiments reaches high accuracy. The accuracy gradually increases as the wedge slope decreases: the SKA and HERA both achieve 92% accuracy at m = 2.0 and m = 1.0, respectively, whereas LOFAR is unable to discriminate between these models even at small values of the wedge slope. Compared to the case without noise, the noise from SKA and HERA at m = 1.0 reduces the accuracy by about 2% and 7%, respectively. It is quite clear that for lower wedge slopes these experiments will achieve a still more desirable accuracy. As seen in Figure 3, low wedge slope values are roughly where large-scale bubbles begin to be recognizable; that the classifier achieves high accuracy at this point suggests that it discriminates between these models based on large-scale bubbles. We summarize the validation accuracy for the five runs in Table 4.
5.3 Classification at a fixed power spectrum (with noise and foreground removal)
In order to determine whether our classification algorithm exploits information in the 21cm images beyond what is available in the power spectrum, we consider a special case in which the two models produce similar 21cm power. A Galaxies-only model at a lower redshift yields 21cm power spectra that are almost identical (at least in shape) to those obtained by the AGN-only model at a higher redshift within the same neutral fraction range; this is shown in Figure 8. Because the AGN-only models produce larger ionized bubbles that source a larger-amplitude power spectrum, the Galaxies-only models require a greater contribution of power from the matter density field (available at lower redshift due to structure growth) to compensate and obtain a similar 21cm power spectrum. In any case, comparisons of different semi-numerical simulations reveal that different codes agree better when compared at matched neutral fractions rather than at matched redshifts (Zahn et al., 2011). This suggests considerable theoretical uncertainty in the ionization history, and thus it is not unreasonable to consider simulations shifted in redshift as we do here.
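The matched-power comparison relies on spherically averaged power spectra, which can be estimated with the standard sketch below (binning choices and normalization conventions are illustrative assumptions, not necessarily those used in our figures):

```python
import numpy as np

def power_spectrum(box, box_size_mpc=75.0, n_bins=12):
    """Spherically averaged power spectrum P(k) of a 3D field:
    FFT the box, square the mode amplitudes, and average in |k| bins."""
    n = box.shape[0]
    vol = box_size_mpc**3
    ft = np.fft.fftn(box) * (box_size_mpc / n) ** 3   # continuum FT estimate
    power = (np.abs(ft) ** 2 / vol).ravel()
    k = 2 * np.pi * np.fft.fftfreq(n, d=box_size_mpc / n)
    kx, ky, kz = np.meshgrid(k, k, k, indexing="ij")
    kmag = np.sqrt(kx**2 + ky**2 + kz**2).ravel()
    # Bin from the fundamental mode to just past the corner mode,
    # excluding the k = 0 (DC) mode.
    bins = np.linspace(kmag[kmag > 0].min(), kmag.max() * 1.001, n_bins + 1)
    which = np.digitize(kmag, bins)
    pk = np.full(n_bins, np.nan)
    for i in range(1, n_bins + 1):
        sel = which == i
        if sel.any():
            pk[i - 1] = power[sel].mean()
    kc = 0.5 * (bins[1:] + bins[:-1])                 # bin centres
    return kc, pk
```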
In artificially pairing boxes at different redshifts, it is important to also harmonize their respective noise simulations. The thermal noise depends on the system temperature T_sys, which in turn depends on the redshift/frequency through the foreground sky model. Thus, adding noise realizations generated at each box's own redshift would give the Galaxies and AGN datasets different levels of signal to noise, and the classifier might then discriminate between the models based purely on differences in T_sys. To exclude the possibility that the classifier is doing this, we conservatively add noise from the same redshift to both datasets.
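The redshift dependence of T_sys can be illustrated with a commonly used power-law sky model (the coefficients here are generic illustrative values, not the exact ones adopted in our noise pipeline):

```python
# Rough system-temperature scaling often used in 21cm forecasts:
# T_sys = T_sky + T_rcvr, with a power-law sky model in frequency.
def t_sys_kelvin(z, t_rcvr=100.0):
    nu_mhz = 1420.4 / (1.0 + z)                  # redshifted 21cm frequency
    t_sky = 60.0 * (nu_mhz / 300.0) ** -2.55     # illustrative sky model
    return t_sky + t_rcvr

# The thermal noise rms scales with T_sys, so boxes at different
# redshifts would otherwise receive noticeably different noise levels:
ratio = t_sys_kelvin(8.0) / t_sys_kelvin(7.0)
```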
We then repeat the exercise of the previous section, considering wedge slopes m = 3.4, 2.0, 1.0, 0.5, 0.25 to generate five different datasets for each of our experiments. We display the classifier accuracy as a function of the wedge slope in Figure 9. Just as before, LOFAR fails to discriminate between these models even with wedge slopes as low as m = 0.25. At this lowest wedge slope, the SKA and HERA obtain accuracies of 94% and 90%, corresponding to degradations of 5% and 9% in accuracy relative to the case without instrumental effects, respectively. Still, this shows that SKA and HERA are promising telescopes for differentiating between our models (even with degenerate power spectra) provided foreground removal can achieve wedge slopes of m ≲ 0.25. We summarize the validation accuracy for the five runs in Table 5.
Unlike in Figure 7, the accuracy does not evolve gradually with wedge slope here: it is almost constant from m = 3.4 down to m = 0.5, improving only at m = 0.25. This sets a limit of m ≈ 0.25 for the SKA and HERA to discriminate between the Galaxies and AGN models, roughly a factor of two more stringent than the limit found in the previous section, suggesting that robust foreground mitigation is particularly important for breaking degeneracies that are present in a power spectral analysis alone.
6 Conclusion
In this work, we created a CNN-based classifier of reionization models that is able to discriminate between 21cm images produced by Galaxy- and AGN-driven reionization. In order to prepare this classifier for future 21cm surveys, we included instrumental effects for experiments such as the SKA, HERA, and LOFAR. We have studied the classifier performance as a function of redshift and neutral fraction. We further explored the impact of instrumental assumptions on the classification accuracy and the ability of these 21cm experiments to discriminate between our two models at fixed redshift and with power spectra tuned to be the same between the two scenarios.
Our key findings are as follows:

The classifier accuracy increases at higher redshifts and/or for restricted neutral fraction ranges. Excluding the case when the universe is highly ionized at z = 7, the classifier correctly classifies 92–100% of the images, with some dependence on redshift and neutral fraction range (see Figure 6 and Table 3).

LOFAR is typically unable to classify the models with high accuracy (generally producing accuracies in the 55–70% regime), due to its relatively large thermal noise uncertainty.

None of the experiments achieves a high classification accuracy for conservative values of the wedge slope, which is about m = 3.4 at reionization redshifts (i.e. z = 7 to 10).
Our results show the preliminary promise of directly using 21cm images from future surveys to discriminate between Galaxy and AGN reionization scenarios. These direct constraints also have the potential to break degeneracies that would be present in a power spectrum analysis alone. For our proposed techniques to be maximally effective, however, it is important to be able to clean foreground contaminants beyond the most conservative wedge regions in Fourier space.
It is worth mentioning that our results might be limited by the relatively small box size (75 Mpc) and coarse resolution (∼0.5 Mpc) used in this study. The differences between the Galaxies and AGN reionization models might be more prominent for larger cosmological volumes and higher resolutions, hence relaxing the requirement for small wedge slopes. We leave exploring larger volumes and higher resolutions to future work. The approximations implemented in these semi-numerical models of Galaxies and AGN place an additional limitation on the presented results. Nonetheless, this work provides a step forward in our quest to extract astrophysical and cosmological information from future 21cm surveys to reveal the nature of cosmic reionization.
Acknowledgements
The authors acknowledge helpful discussions with J. Aguirre, N. Gillet, A. Lidz, A. Mesinger, J. Pober, and M. Santos. SH acknowledges the financial assistance of the South Africa SKA Project (SKA SA) towards this research. SH also acknowledges the support received from the CAMPARE-HERA Astronomy Minority Program (CHAMP/HERA), where part of this work was initially conducted at the University of Pennsylvania. AL acknowledges support for this work by NASA through Hubble Fellowship grant #HST-HF2-51363.001-A awarded by the Space Telescope Science Institute, which is operated by the Association of Universities for Research in Astronomy, Inc., for NASA, under contract NAS5-26555. SAK is supported by a University of Pennsylvania SAS Dissertation Completion Fellowship.
Footnotes
 https://github.com/jpober/21cmSense
References
 AragonCalvo A. M., 2018, arXiv:1804.00816
 Becker, G. D., Bolton, J. S. 2013, MNRAS, 436, 1023
 Becker, G. D. et al., 2015, MNRAS, 447, 3402
 Bond, J. R., Cole, S., Efstathiou, G., Kaiser, N., 1991, ApJ, 379, 440
 Bowman, J. D., et al. 2013, PASA, 30, 31
 Davé, R., Katz, N., Oppenheimer, B. D., Kollmeier, J. A., Weinberg, D. H. 2013, MNRAS, 434, 2645
 DeBoer, D. R., et al. 2017, PASP, 129, 974
 Fan, X., Carilli, C.L., Keating B. 2006, Annual Review of Astronomy and Astrophysics, 44, 415
 Finlator, K., Thompson, R., Huang, S., Davé, R., Zackrisson, E., Oppenheimer, B. D. 2015, MNRAS, 447, 2526
 Ferrarese L., 2002, ApJ, 578, 90
 Giallongo E., et al., 2015, A&A, 578, A83
 Gillet N. et al, 2018, arXiv:1805.02699
 Gupta, A. et al., 2018, Phys. Rev. D 97, 103515
 Haardt, F., Madau, P., 2012, ApJ, 746, 125
 Hassan, S. and Davé, R. and Finlator, K. and Santos, M. G., 2016, MNRAS, 457, 1550
 Hassan, S. and Davé, R. and Finlator, K. and Santos, M. G., 2017, MNRAS, 468, 122
 Hassan S., Davé R., Mitra S., Finlator K., Ciardi B., Santos M. G., 2018, MNRAS, 473, 227
 Hassan S., Liu A., Kohn S., Aguirre J. E., La Plante P., Lidz A., 2018, Proceedings IAU Symposium, No. 333, 1701083, V. Jelic & T. van der Hulst, eds.
 Hopkins P. F., Richards G. T., Hernquist L., 2007, ApJ, 654, 731
 HuertasCompany M. et al, 2018, ApJ, 858, 2
 Liu A. et al, 2014, Phys. Rev. D 90, 023018
 Mellema, G., Koopmans, L., Shukla, H., Datta, K. K., Mesinger, A., & Majumdar, S. 2015, Advancing Astrophysics with the Square Kilometre Array (AASKA14), 10
 Paciga, G., et al. 2011, MNRAS, 413, 1174
 Parsons, A. R., et al. 2010, AJ, 139, 1468
 Pober, J. C., Parsons, A. R. et al, 2013, AJ, 145, 65.
 Pober, J. C., Liu, A. et al, 2014, ApJ, 782, 66.
 Rawat W., Wang Z., 2017, Neural Computation, 29, 2352–2449
 Roy D., Panda P., Roy K., 2018, arXiv:1802.05800
 Planck Collaboration (Adam R., Aghanim N., et al.), 2016, Planck intermediate results. XLVII, arXiv:1605.03507
 Press, W. H., Schechter, P., 1974, ApJ, 187, 425.
 Santos M. G., Ferramacho L., Silva M. B., Amblard A., Cooray A. 2010, MNRAS, 406, 2421
 Shapiro, P. R., Giroux, M. L., 1987, ApJ, 321, 107
 Schaefer C., Geiger M., Kuntzer T., Kneib J., 2018, 611, 9.
 Simonyan, K. & Zisserman A., 2015, ICLR conference, arXiv:1409.1556
 Tingay, S. J., et al. 2013, PASA, 30, 7
 Thyagarajan N. et al, 2015, ApJ, 804, 1
 Tremaine S., et al., 2002, ApJ, 574, 740
 van Haarlem, M. P. et al, 2013, A&A, 556, 2.
 Worseck G., Prochaska J. X., Hennawi J. F., McQuinn M., 2016, ApJ, 825, 144
 Zel’Dovich Y. B., 1970, A&A, 5, 84
 Glorot X., Bengio Y., 2010, Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics, PMLR 9:249–256
 Zahn O., Mesinger A., McQuinn M., Trac H., Cen R., Hernquist L. E., 2011, MNRAS, 414, 727