# Statistics of the epoch of reionization 21-cm signal – I. Power spectrum error-covariance

###### Abstract

The non-Gaussian nature of the epoch of reionization (EoR) 21-cm signal has a significant impact on the error variance of its power spectrum . We have used a large ensemble of semi-numerical simulations and an analytical model to estimate the effect of this non-Gaussianity on the entire error-covariance matrix . Our analytical model shows that has contributions from two sources. One is the usual variance for a Gaussian random field which scales inversely of the number of modes that goes into the estimation of . The other is the trispectrum of the signal. Using the simulated 21-cm signal ensemble, an ensemble of the randomized signal and ensembles of Gaussian random ensembles we have quantified the effect of the trispectrum on the error variance . We find that its relative contribution is comparable to or larger than that of the Gaussian term for the range , and can be even times larger at . We also establish that the off-diagonal terms of have statistically significant non-zero values which arise purely from the trispectrum. This further signifies that the error in different modes are not independent. We find a strong correlation between the errors at large values (), and a weak correlation between the smallest and largest values. There is also a small anti-correlation between the errors in the smallest and intermediate values. These results are relevant for the range that will be probed by the current and upcoming EoR 21-cm experiments.

###### keywords:

methods: statistical - cosmology: theory - dark ages, reionization, first stars - diffuse radiation.^{†}

^{†}pagerange: Statistics of the epoch of reionization 21-cm signal – I. Power spectrum error-covariance–References

^{†}

^{†}pubyear: 2015

## 1 Introduction

The epoch of reionization (EoR) is one of the least known but important periods in the history of our Universe. During this epoch the diffused hydrogen in our universe gradually changed from neutral to ionized. Our current knowledge about this epoch is very limited. The measurements of the Thomson scattering optical depth of the cosmic microwave background (CMB) photons from the free electrons in the intergalactic media (IGM) (e.g. see Komatsu et al. 2011; Planck Collaboration et al. 2014; Planck Collaboration 2015 etc.) and the observations of the Lyman- absorption spectra of the high redshift quasars (e.g. seeBecker et al. 2001; Fan et al. 2003; White et al. 2003; Goto et al. 2011; Becker et al. 2015 etc.) suggest that this epoch was probably extended over a wide redshift range (see e.g. Mitra, Choudhury & Ferrara 2011, 2015; Mitra, Ferrara & Choudhury 2013; Robertson et al. 2013, 2015). However, many fundamental issues such as the characteristics of the major ionizing sources, the precise duration and timing of reionization and the topology of the neutral hydrogen (Hi ) distribution etc. cannot be resolved using these indirect observations.

Observation of the redshifted Hi 21-cm signal which provides a direct
window to the state of the hydrogen in the IGM is a very promising
probe of the EoR. There is a considerable effort underway to detect
the EoR 21-cm signal through radio interferometry using e.g. the
GMRT^{1}^{1}1http://www.gmrt.ncra.tifr.res.in (Paciga et al., 2013),
LOFAR^{2}^{2}2http://www.lofar.org/ (van Haarlem et al., 2013; Yatawatta et al., 2013),
MWA^{3}^{3}3http://www.haystack.mit.edu/ast/arrays/mwa/
(Bowman et al., 2013; Tingay et al., 2013; Dillon et al., 2014) and
PAPER^{4}^{4}4http://eor.berkeley.edu/
(Parsons et al., 2014; Ali et al., 2015; Jacobs et al., 2015). Apart from these first-generation
radio interferometers, the detection of this signal is also one of the
key science goals of the future telescopes such as the
SKA^{5}^{5}5http://www.skatelescope.org/
(Mellema et al., 2013; Koopmans et al., 2015) and
HERA^{6}^{6}6http://reionization.org/ (Furlanetto et al., 2009). The
Hi 21-cm signal is expected to be very weak ( orders in
magnitude) compared to the enormous amount of foreground emission,
from our own galaxy and the extragalactic sources, in which it is
buried
(Di Matteo et al., 2002; Gleser, Nusser & Benson, 2008; Ali, Bharadwaj & Chengalur, 2008; Jelić et al., 2008; Bernardi et al., 2009; Ghosh et al., 2012; Pober et al., 2013; Moore et al., 2013, 2015).
Mainly these foregrounds, the system noise (Morales, 2005; McQuinn et al., 2006)
and the other sources of calibration errors together have kept the
cosmologists at bay from detecting this signal and till today only a
rather weak upper limit on it have been obtained
(Paciga et al., 2013; Dillon et al., 2014; Parsons et al., 2014; Ali et al., 2015). Due to these obstacles, it
is anticipated that the first detection of the signal will be through
statistical estimators such as the variance (Patil et al., 2014) and the
power spectrum (Pober et al., 2014), where one adds up the signal optimally
to enhance the signal-to-noise ratio (SNR).

Any statistical estimation of a signal comes with an intrinsic uncertainty of its own, which arises because of the uncertainties in the signal across its different statistically independent realizations. In cosmology, this uncertainty is more commonly known as the cosmic variance (in other words this is the uncertainty due to the fact that we have only one universe to estimate the signal). Apart from the cosmic variance there will be uncertainties due to the sensitivity of the instrument as well (e.g. system noise, non-uniform baseline distribution etc.). It is necessary to quantify the different possible uncertainties in these measurements to correctly interpret the signal once it has been detected. If the EoR 21-cm signal had the nature and properties similar to a Gaussian random field, the estimation of its cosmic variance would have been very straight forward, as it scales as the square root of the number of independent measurements. Almost all studies (e.g. Morales 2005; McQuinn et al. 2006; Beardsley et al. 2013; Jensen et al. 2013; Pober et al. 2014; Koopmans et al. 2015 etc.) that have been undertaken to quantify the detectability of the EoR 21-cm power spectrum using different instruments such as the MWA, LOFAR, PAPER, SKA etc. assume the signal to be a Gaussian random field while estimating its cosmic variance. This can be a reasonably good assumption at large length-scales during the early phases of reionization when the Hi is expected to trace the underlying dark matter distribution. However, during the intermediate and the later stages of the reionization, the signal only appears from the neutral hydrogen located on the periphery of the ionized (Hii ) regions which are gradually growing both in number and size. This makes the redshifted 21-cm signal from the later stages of EoR highly non-Gaussian.

The statistics of a Gaussian random field is completely specified by its power spectrum, whereas the higher order statistics like the bispectrum (Bharadwaj & Ali, 2005) and the trispectrum are also important for a highly non-Gaussian field like the EoR 21-cm signal. Though the power spectrum itself cannot capture the non-Gaussian nature of the signal, the non-Gaussianity however will significantly affect its error estimates (i.e. cosmic variance). This has been demonstrated in a recent work by Mondal et al. (2015) using a large ensemble of simulated EoR 21-cm signal. Mondal et al. (2015) have shown that for a fixed observation volume, it is not possible to obtain an SNR above a certain limiting value, even when one increases the number of Fourier modes that goes into the estimation of the power spectrum. The analytical model for the cosmic variance proposed in this work further indicates that this limiting value of the SNR is directly related to the trispectrum of the signal and the total survey volume under consideration.

In this follow-up work on Mondal et al. (2015), we extend their analytical model to derive a generic expression for the entire error covariance matrix of the binned 21-cm power spectrum. Using a large number of realizations of the simulated 21-cm signal from EoR we further attempt to quantify the error covariance of its power spectrum. We also interpret it in the light of this improved analytical model. Since, this study is limited by the finite number of realizations of the signal, thus we further check the statistical significance of this error covariance. Besides this, the entire analysis of this paper is based on the numerical simulations of 21-cm signal which have a finite comoving volume. We therefore test the convergence of our results by increasing our simulation volume. Finally, we have tried to extract the trispectrum of the signal from the non-Gaussian component of the error covariance of the power spectrum. It is also important to note that the nature of the results and the analytical model that we have presented here is not limited only to the EoR 21-cm signal but can be applied to the analysis of any non-Gaussian cosmological signal such as the galaxy redshift surveys (Feldman, Kaiser & Peacock, 1994; Neyrinck, 2011; Mohammed & Seljak, 2014; Carron, Wolk & Szapudi, 2015).

The structure of this paper is as follows. Starting from the basic definition of the 21-cm brightness temperature fluctuations we derive the expressions for the power spectrum and the trispectrum of the EoR redshifted 21-cm signal in Section 2. We next derive the error covariance of the binned power spectrum estimator and also show its relation to the trispectrum in Section 3. Section 4 describes the semi-numerical simulations that we have used to generate the realizations of the EoR 21-cm signal. In Section 5, we describe about the reference ensembles which are used to interpret the results. In Section 6, we describe our results i.e. the estimated error covariance of the power spectrum from the simulated data. Finally, in Section 7, we discuss and summarize our results.

Throughout this paper, we have used the Planck+WP best-fitting values of cosmological parameters , , , , and (Planck Collaboration et al., 2014).

## 2 The power spectrum and the trispectrum

The EoR 21-cm signal is quantified through the brightness temperature fluctuation

(1) |

In this paper we are interested in the statistical properties of which is assumed to be a statistically homogeneous random field. The two point statistics of is quantified through the two-point correlation function which is defined as

(2) |

where the angular brackets denote an ensemble average over many statistically independent realizations of . It follows from statistical homogeneity that the two-point correlation function is invariant if we apply a displacement a to both and , or equivalently depends only on the relative displacement vector between the two points and

(3) |

The EoR 21-cm signal is not statistically isotropic due to redshift space distortion (Bharadwaj & Ali, 2004). While several works have attempted to quantify this anisotropy (Majumdar, Bharadwaj & Choudhury, 2013; Jensen et al., 2013; Shapiro et al., 2013; Majumdar et al., 2014; Ghara, Choudhury & Datta, 2015; Fialkov, Barkana & Cohen, 2015; Majumdar et al., 2015), in this work we only consider , which is the monopole (isotropic) component of .

We now consider the four point statistics (see e.g. equation 35.3 of Peebles 1980)

(4) |

where the (reduced) four-point correlation function quantifies the excess over the product of s. Here, statistical homogeneity implies that is invariant if we apply a displacement a to , , and i.e.

(5) |

or equivalently depends only on three relative displacement vectors

(6) |

It is convenient to use the Fourier representation considering a cubic comoving volume with periodic boundary conditions. We then have

(7) |

where is the Fourier transform of . Note that the wave vector k assumes both positive and negative values, however these are not independent as we have the relation which holds for the Fourier transform of a real quantity. Further, we can equally well interpret as the Fourier transform of for all values of k barring .

We first consider the two-point statistics. Incorporating the Fourier representation equation (7) in equation (2), we have

(8) |

We see that the r.h.s. picks up an extra phase factor if we apply a displacement to both and . However, the assumption of statistical homogeneity (equation 3) requires equation (8) to be invariant under such a displacement. This implies that has non-zero values only when for which , and it is zero when , We than have

(9) |

where the Konecker delta is 1 if and 0 otherwise. Here the power spectrum is defined as

(10) |

Using equation (9) in equation (8), we have

(11) |

whereby we see that the power spectrum is the Fourier transform of the two-point correlation function

(12) |

Proceeding in exactly the same manner for the four-point statistics (equation 4), statistical homogeneity (equation 5) requires that

(13) |

where we have used the notation . Here the trispectrum is the Fourier transform of the four-point correlation function

(14) |

Note that equation (14) for the four point statistics is exactly analogous to equation (11) which has been discussed earlier for the two-point statistics. We can also carry out the sum over and express equation (14) as

(15) |

The entire analysis of this paper is based on numerical simulations which have a finite comoving volume . The various factors of that appear in equations (9), and (12) – (14) leave one wondering whether the power spectrum and particularly the trispectrum would vary if the volume were changed. To address this, we consider the limit . In this limit the power spectrum

(16) |

and the trispectrum

(17) |

have finite, well defined values provided the integrals

(18) |

and

(19) |

respectively converge.

We have assumed that and fall sufficiently rapidly at large separations so that the integrals in equations (18) and (19) both converge. The limiting power spectrum and trispectrum then have finite, well defined values, and the simulated and would respectively converge to and if the simulation volume were increased. In our analysis we assume that our simulations cover a sufficiently large volume of the universe whereby the simulated power spectrum and trispectrum are respectively sufficiently close to and for the range of our interest, and the simulated values would not change significantly if the volume were increased further.

## 3 The error-covariance of the power spectrum

The question here is ‘How accurately can we estimate the power spectrum from a given EoR data?’. In general, any observation will yield a combination of the EoR signal and instrumental noise, assuming that the foregrounds have been completely subtracted out. In this analysis, we only consider the statistical errors which are inherent to the EoR signal, and we do not consider the instrumental noise. The statistical errors which we have considered here are usually referred to as the cosmic variance.

We consider the binned power spectrum estimator which, for the th bin, is defined as

(20) |

where , and respectively refer to the sum, the number and the average comoving wavenumber of all the Fourier modes in the th bin. The bins here are spherical shells of width in Fourier space. We have used logarithmic binning which essentially implies that will vary from bin to bin. As the modes k and do not provide independent estimates of the power spectrum, we have restricted the sum to the upper half of the spherical shell which has volume in k space. To calculate , the number of Fourier modes in this volume, we note that the different wave vectors k are all equally spaced at a separation of in k space. We consequently have

(21) |

which we use to estimate .

The ensemble average of the estimator gives the bin-averaged power spectrum

(22) |

The error-covariance of the power spectrum estimator

(23) |

is the quantity of interest here. This can also be written as

(24) |

and the term in the square brackets of equation (24) can be expressed as

(25) |

Using eq. (13) to simplify eq. (25) we can express the error covariance as

(26) |

where

(27) |

is the square of the power spectrum averaged over the th bin, and

(28) |

is the average trispectrum where and are summed over the th and the th bins respectively.

We first discuss the results expected for a Gaussian random field for which the trispectrum is zero. In this case we can use equation (21) to express the covariance matrix as

(29) |

The first point here is that the covariance matrix is diagonal. This implies that the errors in the different bins are uncorrelated. The second point is that the covariance matrix scales as if we increase the observational volume or the bin width .

It is possible to interpret the diagonal elements as the error variance for the power spectrum. We can then express the error in the estimated power spectrum as

(30) |

which is analogous to the error estimate in the context of galaxy redshift surveys (e.g. equation 11.119 of Dodelson 2003). We see that the error falls as if we increase the observational volume. For a fixed observational volume, we expect the error to fall as until it reaches a minimum value which is achieved when all the Fourier modes are combined into a single bin.

The EoR signal becomes increasingly non-Gaussian as the reionization proceeds. This manifests itself as a non-zero trispectrum in the error-covariance (equation 26) which can be expressed as

(31) |

The covariance matrix still retains the dependence, similar to the Gaussian random field discussed earlier. Consequently we still expect the errors in the estimated power spectrum to go down as if the observational volume is increased. However, the covariance matrix now has two major differences from that of a Gaussian random field.

The first difference is that the covariance matrix is no longer diagonal. The average trispectrum quantifies the correlation between the EoR signal in two different bins ( and ). The off-diagonal elements of the covariance matrix () quantifies the correlation between the errors in the power spectrum estimated in the and bins respectively.

The second difference is that the diagonal terms of the covariance matrix deviate from the behaviour predicted for a Gaussian random filed. For small bin-widths (), we expect the error variance to fall as as the bin-width is increased. The error variance saturates as the bin-width approaches , and it does not fall below the limiting value even if all the Fourier modes are combined into a single bin.

For a Gaussian random field, we expect the signal to noise ratio to increase as if we increase the number of modes in the bin . The , however, will saturate at a limiting value when the EoR 21-cm signal becomes non-Gaussian. Semi-numerical simulations show (Mondal et al., 2015) that the behaviour only holds for small , and saturates at a limiting value as is increased. The limiting value is found to decrease (i.e. increases) as reionization proceeds.

The expected behaviour is a consequence of the fact that the signal in the different Fourier modes is independent for a Gaussian random field. The EoR signal at the different Fourier modes, however, become correlated as ionized bubbles develop and the Hi signal becomes non-Gaussian. The trispectrum quantifies this correlation between the signal at different Fourier modes. The fact that saturates and does not decrease beyond even if we increase is a consequence of the fact that we are not adding independent information by increasing the number of Fourier modes in the bin.

The trispectrum is, in general (equation 13), sensitive to correlations in both the amplitude and the phase of the signal at the different Fourier modes , , and . The average trispectrum (equation 28). which appears in our expression for the error covariance (equation 26), however, depends only on the term which is insensitive to correlations in the phase of the modes and . We therefore see that the error covariance (equation 31) is only affected by the correlations in the amplitude of and , it is insensitive to purely phase correlations between the signal at these two modes.

In summary of this section we note that the non-Gaussianity introduces an extra term in the error covariance (equation 31). As a consequence the error variance for the binned power spectrum saturates at a limiting value , it is not possible to decrease the error in the estimated power spectrum beyond by increasing the number of Fourier modes in the bin. Further, the error covariance matrix is not diagonal. The off-diagonal terms quantify the correlations between the errors in the power spectrum estimated at different bins.

## 4 Simulating the EoR redshifted 21-cm signal

We have used semi-numerical simulations to generate the EoR redshifted 21-cm signal. These simulations consist of three main steps. First, we use a particle mesh -body code to generate the dark matter distribution at the desired redshift. We have run simulations with two different comoving volumes and using grids of size and , respectively. The spatial resolution and the mass resolution is maintained the same for both and . In the next step we identify the mass and the location of collapsed haloes using the standard Friends-of-Friends (FoF) algorithm (Davis et al., 1985) with a fixed linking length of times the mean inter-particle distance. We have set the criterion that a halo should have at least dark matter particles whereby we have a minimum halo mass of

The final step generates the ionization map based on the excursion set formalism of Furlanetto, Zaldarriaga & Hernquist (2004). The basic assumption here is that the hydrogen traces the dark matter distribution and the dark matter haloes host the sources which emit ionizing radiation. It is assumed that the number of ionizing photons emitted by a source is proportional to the mass of the host halo, and it is possible to achieve different values of the mass averaged Hi neutral fractions by tuning this constant of proportionality. Our simulation closely follows Choudhury, Haehnelt & Regan (2009) to generate the ionization map, and the resulting Hi distribution is mapped onto redshift space to generate 21-cm brightness temperature maps following Majumdar, Bharadwaj & Choudhury (2013). The grid used to generate the ionization maps and the 21-cm brightness temperature maps is eight times coarser than that used for the -body simulation.

The redshift evolution of is, at present, largely unknown. Instead of assuming a particular model for , we have fixed the redshift and run our simulations for different values of at this fixed redshift. We have simulated Hi maps for values at an interval of in the range in addition to . For each simulation volume ( and ) and for each value of , we have run independent simulations to generate an ensemble of statistically independent realizations of the 21-cm signal. We refer to this ensemble as the Signal Ensemble (SE). The left-hand panel of Fig. 1 shows a section through one realization of the SE for . We have used the SE to estimate the bin-averaged power spectrum and the error covariance matrix for the two different simulation volumes and , and for the different values mentioned earlier.

## 5 Simulating reference ensembles

The previous section describes how we have estimated the power spectrum error-covariance . In summary, we have constructed an ensemble of statistically independent realizations of the simulated EoR 21-cm signal and used this to estimate . We refer to this ensemble as the SE. The question now is ‘How do we interpret the estimated ?’. We know that for a Gaussian random field we expect: (A.) the diagonal terms to have values as predicted by equation (29), and (B.) the off-diagonal terms to be zero. We may interpret any deviation from this as arising from non-Gaussianity, and then use these deviations to quantify the contribution from the trispectrum in equation (31). While this is straightforward in concept, several complications arise in practice.

### 5.1 The Randomized Signal Ensemble

The first complication arises when we try to interpret the diagonal terms . We expect these to have values as predicted by equation (29) if the signal were a Gaussian random field, and it is possible to interpret deviations from this relation in terms of the trispectrum which appears in equation (31) when the signal becomes non-Gaussian. The problem arises because it is not possible to use the SE to independently determine the value of which appears in equation (29). We have overcome this problem by constructing the Randomized Signal Ensemble (RSE).

Each realization of RSE contains the signal drawn from all the realizations in SE. We have labeled all the modes in the simulation volume as . Note that we are free to choose any arbitrary labeling scheme as long as it assigns an unique label to each distinct Fourier mode k. The Fourier modes are then divided into sets , ,… . For the first realization in RSE, the signal for all the modes in is drawn from the first realization in SE (i.e. [SE]), and the signal for all the modes in is drawn from the second realization in SE (i.e. [SE]), and so on. The first realization in RSE thus contains a mixture of the signal drawn from all the realizations in SE. For the second realization in RSE, the signal for all the modes in is drawn from [SE], and the signal for all the modes in is drawn from [SE] and so on. The second realization in RSE also contains signal drawn from all the realizations in SE. Further, there is no signal which is common between the first and second realization in RSE. The realizations in RSE have all been constructed in this fashion such that each realization of RSE contains a mixture of the signal from all the realizations in SE. Further, none of the realizations in RSE have any signal in common. The right-hand panel of Fig. 1 shows a section through one realization of the RSE for .

We do not expect the signal in the modes drawn from SE to be correlated with those drawn from SE, etc. We therefore expect the average trispectrum to be at least times smaller for RSE as compared to SE. For the purpose of this work we have assumed that for RSE. Further, since the entire signal in SE is also present in RSE, we expect and to have exactly the same value in both SE and RSE. The RSE, therefore, provides an independent estimates of . We have used this to estimate the values which the diagonal elements of (equation 31) are expected to have if the EoR signal were a Gaussian random field with . It thus becomes possible to interpret any deviations from this as arising from due to the non-Gaussianity in the EoR 21-cm signal.

### 5.2 Ensemble of Gaussian Random Ensembles

The second complication arises from the fact that the SE has a finite number of realizations. To appreciate this we construct the Gaussian Random Ensemble (GRE). The GRE, like the SE, contains realizations of the 21-cm signal, the signal in each realization however is a Gaussian random field. The signal at any mode k in the th bin is calculated using

(32) |

where and are two real valued independent Gaussian random variables of unit variance, and is the bin-averaged power spectrum calculated from SE. The middle panel of Fig. 1 shows a section through one realization of the GRE for .

The bin-averaged power spectrum estimated from any single realization in GRE will be different from . Further, the bin averaged power spectrum estimated using all members of GRE, which we refer to as , will also differ from because of the limited number of realizations. Similarly, the off -diagonal terms of the error-covariance estimated from GRE will not be zero but will have random fluctuations around zero due to the limited number of realizations. It is thus necessary to compare the estimated from SE against the random fluctuation of in order to determine whether estimated from SE is statistically significant or not. The issue now is to estimate the variance of the covariance . We have used independent GREs to construct an Ensemble of Gaussian Random Ensembles (EGRE) which we have used to estimate the variance of . In summary, we cannot straightaway interpret the non-zero off-diagonal terms in as arising from non-Gaussianity in the EoR 21-cm signal. It is necessary to assess the statistical significance of the non-zero values by comparing them against estimated from the EGRE.

## 6 Results

Fig. 1 shows three 21-cm maps corresponding to individual realizations drawn from the SE, GRE and RSE respectively. The simulations all correspond to the same neutral fraction and they all have the same bin averaged power spectrum . It is believed that at the length scales which will be probed by observations the EoR 21-cm signal (in terms of both power spectrum and variance) peaks around (see e.g. McQuinn et al. 2007; Lidz et al. 2008; Barkana 2009; Choudhury, Haehnelt & Regan 2009; Mesinger, Furlanetto & Cen 2011; Jensen et al. 2013; Majumdar, Bharadwaj & Choudhury 2013; Iliev et al. 2014; Patil et al. 2014; Watkinson & Pritchard 2014; Majumdar et al. 2015), and we have thus restricted the entire discussion of this section to the situation where . At this stage we expect a little less than of the volume to be occupied by ionized bubbles. These bubbles, which are quite distinctly visible in the left-hand panel, cause the EoR 21-cm signal to be significantly non-Gaussian at . This is quite apparent if we compare the EoR signal to the central panel which is a Gaussian random field. There are no bubbles visible in the central panel. The Randomized Signal (shown in the right most panel of the same figure), which has a much smaller trispectrum compared to the EoR signal, looks quite distinct from both the other cases.

Fig. 2 shows the mean squared brightness temperature fluctuation of the EoR 21-cm signal as a function of . This essentially is a measure of the bin averaged 21-cm power spectrum estimated from SE. The range to has been divided in 10 equally spaced logarithmic bins with . We have maintained the same bin widths for both the simulation volumes and . However, we notice that the value of , the average value corresponding to a particular bin, varies from to (Fig. 2). This variation arises because the exact number and values of the Fourier modes in a particular bin changes if we change the simulation volume even though is fixed. Comparing the results from the two simulation volumes, we see that there is very little change in the power spectrum between and . This indicates that the simulation volumes used here are sufficiently large so that the power spectrum has converged. The error bars shown in the figure correspond to the error estimated from SE. We notice that the error bars change from to , the errors being smaller for the larger simulation. This arises from the dependence (eq. 31) discussed earlier. A detailed analysis of the covariance matrix follows.

We now shift our attention to the error covariance matrix which is the main focus of this paper. Fig. 3 shows the diagonal elements as a function of for the two different simulation volumes and . We have also shown where the matrix elements determined for have been scaled to account for the dependence predicted by equation (31). We see that the scaled elements are in reasonable agreement with the results for , roughly indicating that the error-covariance has converged within the simulation volume which we have used here. We see that the values of the covariance matrix span a very large dynamical range, and it is not very convenient to analyse this if we are looking for relatively small changes in the values. We find that it is much more convenient to instead use the dimensionless covariance matrix which is defined as

(33) |

and which, using equation (31), can be expressed as

(34) |

where

(35) |

is the dimensionless bin-averaged trispectrum and

(36) |

is a number of order unity introduced in Mondal et al. (2015). The value of is expected to vary from bin to bin. We also expect its value to vary if we change the simulation volume. However, all these variations are expected to be small, and we may expect a value in most situations.

The left-hand panel of Fig. 4 shows , the diagonal elements of the dimensionless covariance matrix, as a function of . The volume dependence of has been scaled out in the definition of (equation 33), and we do not expect the values to change if we vary the volume provided that the error-covariance has converged within the simulation volume. We find that the values of obtained from the two different volumes and are consistent with each other over the range . The values obtained from , however, are times larger than those obtained from at larger values . The difference at small () may be attributed to the cosmic variance of the error-covariance and is possibly not statistically significant. However, the differences between and at large appears to be significant. We find that the smaller volume is under-estimating the error-covariance relative to , indicating that for the error-covariance has not converged within the simulation volume. One would naively expect convergence issues to be more important at large scales which are comparable to the simulation size. The fact that the error-covariance appears to have converged at large scales () while it seems to have not converged at small scales () is quite counter intuitive, and we shall address this a little later.

The right-hand panel of Fig. 4 shows estimated from the RSE for which we expect , whereby

(37) |

This gives an estimate of the error-covariance that would be expected if the EoR signal were a Gaussian random field. As expected, we see that the values of are below those estimated from SE. Mondal et al. (2015) have estimated the value of in a completely independent manner by fitting the behaviour of the as a function of . The latter method ignores the fact that varies from bin to bin, and returns just a single value of which is for . We have also plotted using this value and the values corresponding to the bins in . We note that , though the actual value changes somewhat from bin to bin. We find that estimated from the and RSE simulations, and also from equation (37) using the constant , are all consistent with one another.This consistency, in a sense, also validates the idea that the RSE allows us to independently estimate the error-covariance that would be expected if the EoR signal were a Gaussian random field (equation 37).

We further illustrate the idea behind the RSE and also validate this in Fig. 5. Recollect that each realization in the RSE contains a mixture of signal from independent realizations of the EoR signal, and we expect for RSE to be at least times smaller than estimated from SE. In addition to RSE, we also show results for RSE10 and RSE25. Each realization in RSE10 has signal drawn from independent realization from SE instead of . We expect for RSE10 and RSE25 to be respectively around and times smaller than estimated from SE. Starting from SE (equation 34) , we expect the values of to slowly approach equation (37) as we move from RSE10 to RSE25 and then to RSE. This transition is clearly seen in Fig. 5. There is very little change in the values of from RSE25 to RSE50 ( except possibly at the largest value). This validates the assumption that for the RSE.

The difference gives an estimate of the bin-averaged trispectrum. Here we have used to estimate the dimensionless bin-averaged trispectrum for which the results are shown in Fig. 6. We see that the results for the two different simulation volumes look quite similar, though there are some differences in the actual values. The values estimated from the larger volume are larger than those estimated from at . The values differ by a nearly constant ratio of at . The trend is reversed at where the values estimated from are larger than those from . Taken at face value, these discrepancies in the values of between the two different simulation volumes indicate that the trispectrum has not converged within the simulation volume. We note, however, that it is necessary to be cautious before arriving at such a conclusion because we have no estimate of the cosmic variance for . The discrepancy at large is possibly genuine, whereas the discrepancy at small is possibly influenced by the cosmic variance. For the subsequent discussion in this paper we focus on the larger volume assuming that the results are representative of what would be expected for an even larger volume.

We see (Fig. 6) that we have for , and it increases quite rapidly with and at and respectively. In contrast, we have for nearly the entire range (Fig. 4). We thus expect the error-covariance to be largely dominated by the trispectrum for nearly the entire range that we have considered here. Fig. 7 shows the ratio which quantifies the relative magnitudes of the two terms that contribute to (equation 34). We see that the two terms make roughly equal contributions in the range . The relative contribution from the trispectrum increases quite steeply with increasing . At the largest value (), the contribution from the trispectrum is times larger than the error-covariance that we would expect if the EoR signal were a Gaussian random field.

We now shift our focus to the off-diagonal elements of which quantify the correlation between the errors at different bins. Since the diagonal terms span a pretty large dynamical range, it is more convenient to consider the correlation coefficient

(38) |

instead of directly analysing the off-diagonal terms of . The values of are, by definition, restricted to lie in the range , the values and indicating that the errors in the and bin are fully correlated and anti-correlated respectively. Intermediate values () indicate partial correlation or anti-correlation, and indicates that the errors in the