# The Difference PDF of 21-cm Fluctuations: A Powerful Statistical Tool for Probing Cosmic Reionization

## Abstract

A new generation of radio telescopes is currently being built with the goal of tracing the cosmic distribution of atomic hydrogen at redshifts 6-15 through its 21-cm line. The observations will probe the large-scale brightness fluctuations sourced by ionization fluctuations during cosmic reionization. Since detailed maps will be difficult to extract due to noise and foreground emission, efforts have focused on a statistical detection of the 21-cm fluctuations. During cosmic reionization, these fluctuations are highly non-Gaussian and thus more information can be extracted than just the one-dimensional function that is usually considered, i.e., the correlation function. We calculate a two-dimensional function that, if measured observationally, would allow a more thorough investigation of the properties of the underlying ionizing sources. This function is the probability distribution function (PDF) of the difference in the 21-cm brightness temperature between two points, as a function of the separation between the points. While the standard correlation function is determined by a complicated mixture of contributions from density and ionization fluctuations, we show that the difference PDF holds the key to separately measuring the statistical properties of the ionized regions.

###### keywords:

galaxies:high-redshift – cosmology:theory – galaxies:formation

## 1 Introduction

The earliest generations of stars are thought to have transformed the
universe from darkness to light and to have reionized and heated the
intergalactic medium. Knowing how the reionization process happened is
a primary goal of cosmologists, because this would tell us when the
early stars formed and in what kinds of galaxies. The clustering of
these galaxies is particularly interesting since it is driven by
large-scale density fluctuations in the dark matter
(Barkana & Loeb, 2004). While the distribution of neutral hydrogen during
reionization can in principle be measured from maps of 21-cm emission
by neutral hydrogen, upcoming experiments such as the Mileura
Widefield Array

Studies of statistics of the 21-cm fluctuations have focused on the two-point correlation function (or power spectrum) of the 21-cm brightness temperature. This is true both for analytical and numerical studies and for analyses of the expected sensitivity of the new experiments (Bowman, Morales, & Hewitt, 2006; McQuinn et al., 2006). The power spectrum is the natural statistic at very high redshifts, as it contains all the available statistical information as long as Gaussian primordial density fluctuations drive the 21-cm fluctuations. However, during reionization the hydrogen distribution is a highly non-linear function of the distribution of the underlying ionizing sources. This follows most simply from the fact that the H I fraction is constrained to vary between 0 and 1, and this range is fully covered in any scenario driven by stars, in which the intergalactic medium is sharply divided between H I and H II regions. The resulting non-Gaussianity (Bharadwaj & Ali, 2005) raises the possibility of using complementary statistics to measure additional information that is not directly derivable from the power spectrum (Saiyad-Ali et al., 2006).

Numerical simulations have recently begun to reach the large scales (of order 100 Mpc) needed to capture the evolution of the IGM during reionization (Mellema et al., 2006; Zahn et al., 2007). These simulations account accurately for gravitational evolution on a wide range of scales but still crudely for gas dynamics, star formation, and the radiative transfer of ionizing photons. Analytically, Furlanetto et al. (2004) used the statistics of a random walk with a linear barrier to model the H II bubble size distribution during the reionization epoch. Schematic approximations were developed for the two-point correlation function (Furlanetto et al., 2004; McQuinn et al., 2005), but recently Barkana (2007) developed an accurate, self-consistent analytical expression for the full two-point distribution within the Furlanetto et al. (2004) model, and in particular used it to calculate the 21-cm correlation function.

Noting the expected non-Gaussianity and the importance of additional statistics, Furlanetto et al. (2004) also calculated the one-point probability distribution function (PDF) of the 21-cm brightness temperature at a point. The PDF has begun to be explored in numerical simulations as well (Ciardi & Madau, 2003; Mellema et al., 2006). Some of the additional information available in the PDF can be captured by the skewness (Wyithe & Morales, 2007) or bispectrum (Saiyad-Ali et al., 2006) statistics. Both the correlation function and the PDF are functions of a single variable (at each redshift): the two-point correlation function is a function of separation, and the PDF is a function of the brightness temperature $T_b$. It is possible to create a two-dimensional function by calculating the one-point PDF as a function of smoothing scale (or pixel size), but this quantity is difficult to interpret since it is not simply related to the 21-cm correlation function or to the ionization statistics.

In this paper we consider a two-dimensional function that generalizes both the one-point PDF and the correlation function and yields additional information beyond those statistics. In particular, the variance of this new statistic is simply related to the 21-cm correlation function that is usually considered. This function is the PDF of the difference of the 21-cm brightness temperatures at two points. We present in the next section our analytical model for predicting the difference PDF; its precise relation to the two-point correlation function is presented in section 2.3. We present illustrative predictions of the difference PDF in section 3, where in section 3.2 we emphasize that it can be used to separately measure ionization correlations. We summarize our conclusions in section 4.

## 2 Model

Analytical approaches to galaxy formation and reionization are based on the mathematical problem of random walks with barriers. The statistics of a single random walk can be used to calculate various one-point distributions; in particular, the statistics of a random walk with a linear barrier can be used to calculate the distribution of ionized bubble sizes during reionization (Furlanetto et al., 2004). However, calculating the correlation function and other two-point distributions requires us to solve for the simultaneous evolution of two correlated random walks at two different points. Scannapieco & Barkana (2002) found an approximate but quite accurate analytical solution in the case of constant barriers and used it to calculate the joint, bivariate mass function of halos forming at two redshifts; Scannapieco & Thacker (2005) showed that this solution describes well the two-point correlation function of halos in numerical simulations, particularly when expressed in Lagrangian coordinates (i.e., in terms of the initial comoving halo separation). Barkana (2007) generalized the two-point solution to the case of linear barriers, and applied it to calculate the correlation function of cosmological 21-cm fluctuations during reionization. In this section we first review the basic setup of the two-barrier problem in the context of reionization. We then briefly summarize the solution of Barkana (2007), except that we generalize it slightly to the case of measurements in the presence of additional smoothing (e.g., due to a limited instrumental resolution). Finally, we show how to apply this solution to calculate the difference PDF of 21-cm fluctuations.

### 2.1 Reionization: basic setup

The basic approach for using random walks with barriers in cosmology follows Bond et al. (1991), who used it to rederive and extend the halo formation model of Press & Schechter (1974). In this approach we work with the linear overdensity field $\delta(\mathbf{x}, z) = \rho/\bar{\rho} - 1$, where $\mathbf{x}$ is a comoving position in space, $z$ is the cosmological redshift and $\bar{\rho}$ is the mean value of the mass density $\rho$. In the linear regime, the overdensity grows in proportion to the linear growth factor $D(z)$ (defined relative to $z = 0$). This fact is used in order to extrapolate the linear density field to the present time, i.e., the initial density field at high redshift is extrapolated to the present by multiplication by the relative growth factor. We adopt this view, and throughout this paper quantities such as $\delta$ and the power spectrum $P(k)$ refer to their values linearly-extrapolated to the present. In each application there is in addition a barrier that signifies the critical value which the linearly-extrapolated $\delta$ must reach in order to achieve some physical milestone on some scale. In this work the milestone corresponds to having a sufficient number of galaxies within some region in order to fully reionize that same region.

At a given redshift $z$, we consider the smoothed density in a region around a fixed point in space. We begin by averaging over a large scale or, equivalently, by including only small comoving wavenumbers $k$. We then average over smaller scales (i.e., include larger $k$) until we find the largest scale on which the averaged overdensity is higher than the barrier; in the application to reionization, we then assume that the point belongs to an H II bubble of this size. Mathematically, if the initial density field is a Gaussian random field and the smoothing is done using sharp $k$-space filters, then the value of the smoothed overdensity $\delta$ undergoes a random walk as the cutoff value of $k$ is increased. Instead of using $k$, we adopt the (linearly-extrapolated) variance $S$ of density fluctuations as the independent variable. While the solutions are derived in reference to sharp $k$-space smoothing, we follow the traditional extended Press-Schechter approach and substitute real-space quantities in the final formulas. In particular, $S$ is calculated as the variance of the appropriate mass enclosed in a spatial sphere of comoving radius $r$.
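As a rough numerical illustration of the procedure just described, the following Python sketch steps a single random walk of the smoothed overdensity against the variance and reports the smallest variance (i.e., the largest scale) at which the walk first reaches a barrier. The function name `first_crossing`, the fixed step size, and the idea of passing the barrier as an arbitrary callable are assumptions made for this illustration; the analytical model uses closed-form first-crossing distributions rather than simulated walks, and a discretized walk slightly underestimates the crossing probability.

```python
import math
import random


def first_crossing(barrier, s_max, n_steps, rng):
    """Return the variance S at which a Gaussian random walk delta(S),
    started from delta = 0 at S = 0, first reaches barrier(S).

    Returns None if the walk stays below the barrier up to s_max.
    Each step of size dS adds an independent Gaussian increment of
    variance dS, as for sharp k-space smoothing of a Gaussian field.
    """
    d_s = s_max / n_steps
    step_sigma = math.sqrt(d_s)
    delta = 0.0
    for i in range(1, n_steps + 1):
        delta += rng.gauss(0.0, step_sigma)
        s = i * d_s
        if delta >= barrier(s):
            return s  # smallest S (largest scale) above the barrier
    return None
```

For example, `first_crossing(lambda s: 1.0 + 0.5 * s, 10.0, 1000, random.Random(0))` follows one walk against a hypothetical linear barrier; repeating this over many walks builds up a histogram of first-crossing variances.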

We apply mathematical random-walk statistics to the distribution of H II regions during reionization using the model of Furlanetto et al. (2004). According to this model, a given point is contained within a bubble of size given by the largest surrounding spherical region that contains enough ionizing sources to fully reionize itself. If we ignore recombinations, then the ionized fraction in a region is given by $\zeta F_{\rm coll}$, where $F_{\rm coll}$ is the collapse fraction (i.e., the gas fraction in galactic halos) and $\zeta$ is the overall efficiency factor, which is the number of ionizing photons that escape from galactic halos per hydrogen atom (or ion) contained in these halos. This simple version of the model remains approximately valid even with recombinations if the number of recombinations per hydrogen atom in the IGM is roughly uniform; in this case, the resulting reduction of the ionized fraction by a constant factor can be incorporated into the value of $\zeta$.

In the extended Press-Schechter model (Bond et al., 1991), in a region containing a mass corresponding to variance $S$,

$$F_{\rm coll} = {\rm erfc}\left[\frac{\delta_c(z) - \delta}{\sqrt{2\,(S_{\rm min} - S)}}\right], \qquad (1)$$

where $S_{\rm min}$ is the variance corresponding to the minimum mass of a halo that hosts a galaxy, $\delta$ is the mean density fluctuation in the given region, and $\delta_c(z)$ is the critical density for halo collapse at $z$. While this describes fluctuations in $F_{\rm coll}$ well, the cosmic mean collapse fraction (and thus the overall evolution of reionization with redshift) is better described by the halo mass function of Sheth & Tormen (1999) (with the updated parameters suggested by Sheth & Tormen (2002)). We thus use the latter mean mass function and adjust $F_{\rm coll}$ in different regions in proportion to the extended Press-Schechter formula; Barkana & Loeb (2004) suggested this hybrid prescription and showed that it fits a broad range of simulation results. With these assumptions, the exact ionized fraction in a region is given by

$$x^i = \zeta\, \frac{\bar{F}_{\rm ST}}{\bar{F}_{\rm PS}}\, {\rm erfc}\left[\frac{\delta_c(z) - \delta}{\sqrt{2\,(S_{\rm min} - S)}}\right], \qquad (2)$$

where $\bar{F}_{\rm ST}$ and $\bar{F}_{\rm PS}$ are the cosmic mean collapse fractions according to the Sheth-Tormen and Press-Schechter models, respectively.
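The two expressions above are simple enough to evaluate directly. The Python sketch below is a minimal illustration under stated assumptions: the function and parameter names are invented here, the ratio of Sheth-Tormen to Press-Schechter mean collapse fractions is passed in as a precomputed number (`mean_ratio`) rather than calculated from a mass function, and capping the ionized fraction at unity is a convenience of this sketch that stands in for "the region is fully ionized".

```python
import math


def collapse_fraction_eps(delta, s, s_min, delta_c):
    """Extended Press-Schechter collapse fraction in a region of mean
    linear overdensity `delta` and variance `s` (both linearly
    extrapolated to the present), for a minimum halo variance `s_min`
    and a critical collapse density `delta_c` at the redshift of interest.
    """
    if s >= s_min:
        raise ValueError("region variance must be below s_min")
    return math.erfc((delta_c - delta) / math.sqrt(2.0 * (s_min - s)))


def ionized_fraction(delta, s, s_min, delta_c, zeta, mean_ratio=1.0):
    """Ionized fraction of a region: efficiency times rescaled collapse
    fraction. `mean_ratio` is the assumed Sheth-Tormen / Press-Schechter
    ratio of cosmic mean collapse fractions. Capped at 1 (assumption).
    """
    f_coll = collapse_fraction_eps(delta, s, s_min, delta_c)
    return min(1.0, zeta * mean_ratio * f_coll)
```

Scanning `delta` at fixed `s` until `ionized_fraction` reaches 1 gives the barrier height at that variance, which is how the self-ionization condition turns into a barrier for the random walk.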

The resulting condition for having an ionized bubble of a given size, written as a condition for the overdensity $\delta$ vs. the variance $S$, is of the same form as in Furlanetto et al. (2004), at a given redshift, and thus (as they showed) yields a linear barrier to a good approximation (see also Furlanetto et al. 2006). We write the effective linear barrier in a general notation:

(3) |

where

(4) |

with ${\rm erfc}^{-1}$ denoting the inverse function of erfc. For consistency, in what follows we use a modified formula for the ionized fraction, replacing equation (2) with the expression that corresponds to the linear approximation of the barrier:

(5) |

This replacement ensures that the ionized fraction varies from 0 to 1 as the overdensity $\delta$ goes from $-\infty$ up to the barrier. We also denote the neutral fraction (one minus the ionized fraction) by $x^n$ and, in particular,

(6) |

The approximation of the linear barrier is quite accurate as long as the maximum variance $S$ that we consider is much smaller than $S_{\rm min}$ (the variance corresponding to the minimum mass of a galaxy-hosting halo), which is the case in the applications of the model in this paper, where the maximum $S$ is set by the resolution of the upcoming experiments (see the next subsection).

Because of some approximations in this model, the total ionized fraction as given by the model [see equation (15) in the next subsection] comes out slightly different from the direct result for the mean global ionized fraction, expressed in terms of the cosmic mean collapse fraction (note that the model of Furlanetto et al. (2004) suffers from a similar difficulty). To deal with this, we adopt the direct values of the mean ionized fraction versus redshift, and adjust the efficiency $\zeta$ within the model to an effective value at each redshift that gives a model mean ionized fraction equal to the desired one. This typically only requires an adjustment by a few percent or less.
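The adjustment of the efficiency described above is a one-dimensional root-finding problem. The sketch below shows one simple way to do it, by bisection; it is not claimed to be the procedure actually used. The callable `model_mean_xi` is a hypothetical stand-in for the model's mean ionized fraction as a function of the efficiency, and the sketch assumes that this function increases monotonically with the efficiency.

```python
def calibrate_efficiency(model_mean_xi, target, zeta_lo, zeta_hi, tol=1e-10):
    """Find the effective efficiency at which the model's mean ionized
    fraction equals `target`, by bisection on [zeta_lo, zeta_hi].

    `model_mean_xi` is a user-supplied function (hypothetical here) that
    returns the model mean ionized fraction for a given efficiency and
    is assumed to be monotonically increasing.
    """
    if not (model_mean_xi(zeta_lo) <= target <= model_mean_xi(zeta_hi)):
        raise ValueError("target is not bracketed by the given range")
    lo, hi = zeta_lo, zeta_hi
    while hi - lo > tol * max(1.0, abs(hi)):
        mid = 0.5 * (lo + hi)
        if model_mean_xi(mid) < target:
            lo = mid  # model still too neutral: raise the efficiency
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

With a toy model that underestimates the direct result by three percent, e.g. `lambda zeta: 0.97 * 0.01 * zeta`, and a target of 0.4, the routine returns an efficiency about three percent above 40, in line with the few-percent adjustments mentioned in the text.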

### 2.2 The 21-cm one-point PDF

Before considering two-point functions, we first calculate the 21-cm PDF around one point by following Furlanetto et al. (2004), except that we obtain the ionized fraction from eq. (5) for consistency with the barrier, and we also apply a non-linear correction to the density. We denote the PDF itself by , and the cumulative probability distribution (CPD) by .

During cosmic reionization, we assume that there are sufficient radiation backgrounds of X-rays and of Ly$\alpha$ photons so that the cosmic gas has been heated to well above the cosmic microwave background temperature and the 21-cm level occupations have come into equilibrium with the gas temperature. In this case, the observed 21-cm brightness temperature relative to the CMB is independent of the spin temperature and, for our assumed cosmological parameters, is given by (Madau et al., 1997)

(7) |

with , where is the neutral hydrogen fraction and is the linear overdensity at (as opposed to which denotes the density linearly-extrapolated to redshift 0). Under these conditions, the 21-cm fluctuations are thus determined by fluctuations in .

In the model, is determined by the halo abundance, which is in turn determined by the statistics of the linear density field, and thus naturally falls within the correct range of 0 to 1. However, also depends on the actual density, and the linear density can take on unphysical values below -1. While a full non-linear model would be difficult to solve, within the context of the model where statistics are averaged over spherical regions, we can make a simple, approximate correction in order to get reasonable values for the actual, nonlinear density . Mo & White (1996) developed such an approximate formula for as a function of , based on spherical collapse (of overdensities) or spherical expansion (of voids). In particular, they incorporated the asymptotic limits of and as well as the correct behavior near . Their formula is valid for the Einstein-de Sitter universe or more generally at high redshift (when the dynamical effect of the cosmological constant is negligible). We similarly develop and use here an accurate approximation for the inverse function,

(8) | |||||

where is the critical collapse overdensity in an Einstein-de Sitter cosmology. Thus, the expression we use for is

(9) |
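To make the non-linear density correction concrete, the sketch below evaluates the Mo & White (1996) fitting formula, which gives the linear overdensity as a function of the non-linear one, and inverts it numerically. The coefficients are those of the fitting formula as commonly quoted. The numerical bisection is a stand-in chosen for this illustration; it is not the analytic inverse approximation of equation (8), which is not reproduced here, and the function names are assumptions.

```python
import math


def _linear_from_ratio(y):
    """Mo & White (1996) fit: linear overdensity (at the same redshift)
    for a region whose non-linear density is y times the cosmic mean."""
    return (1.68647
            - 1.35 * y ** (-2.0 / 3.0)
            - 1.12431 * y ** (-0.5)
            + 0.78785 * y ** (-0.58661))


def linear_from_nonlinear(delta_nl):
    """Linear overdensity corresponding to non-linear overdensity delta_nl."""
    if delta_nl <= -1.0:
        raise ValueError("non-linear overdensity must exceed -1")
    return _linear_from_ratio(1.0 + delta_nl)


def nonlinear_from_linear(delta_lin, n_iter=200):
    """Invert the fit by bisection in u = ln(1 + delta_nl).

    The fit increases monotonically with density, so bisection is safe.
    The result always exceeds -1, even for linear overdensities below -1.
    """
    lo, hi = -60.0, 14.0  # search bounds on ln(1 + delta_nl)
    if delta_lin >= _linear_from_ratio(math.exp(hi)):
        raise ValueError("linear overdensity too close to collapse")
    for _ in range(n_iter):
        mid = 0.5 * (lo + hi)
        if _linear_from_ratio(math.exp(mid)) < delta_lin:
            lo = mid
        else:
            hi = mid
    return math.expm1(0.5 * (lo + hi))
```

A linear overdensity of -2, which would imply a negative density if taken literally, maps to a non-linear overdensity of roughly -0.7 in this fit, so the brightness temperature built from it stays non-negative.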

If we denote by the one-point PDF of at a point, then this is related to the 21-cm one-point PDF defined above by

(10) |

Also, the corresponding cumulative probability distributions are equal. We also assume that the PDF is considered on a resolution scale (corresponding to a variance ), i.e., that the density and ionization states are averaged on the scale around the point being considered. In other words, we are really considering the density and ionization distributions in a region of size centered at a point.

We must now consider the separate contributions to the PDF of from two possible cases. First, if the point lies within a fully ionized region, then is identically zero, so this case contributes a -function (Dirac delta function) at 0, containing the total probability that the region is ionized. This probability is given by the quantity in equation (15) in Barkana (2007), as first derived by McQuinn et al. (2005). The second case is if the point lies within a region that is still partially neutral. In this case, is given by equation (9), where for each value of we use from equation (6). We then distribute the total probability of this case into various values of using the conservation of probability, i.e., for each possible value of , the probability [equation (14) in Barkana (2007), again in agreement with McQuinn et al. (2005)] contributes to at the value of that corresponds to . As noted by Furlanetto et al. (2004), there is a maximum value of , , since in the two extreme limits, both when the density goes to zero (due to the density term in ) and when it goes up to the barrier (due to full ionization). Thus, two values of contribute to each value of , and is singular at .

### 2.3 The 21-cm difference PDF

We now consider two separate points, with the same assumptions as in the previous subsection. We wish to consider the PDF of the difference of the 21-cm brightness temperatures at two points (or, in fact, averaged over regions centered at each of the two points). We denote the PDF itself by , and the CPD by . If we denote by the PDF of the difference between the values of in the two regions, then this is related to the 21-cm difference PDF by

(11) |

Also, the corresponding cumulative probability distributions are equal.

We therefore wish to determine the PDF of the difference as a function of the comoving distance between two points being considered at redshift . As before, we also assume that the PDF is considered on a resolution scale , i.e., we are really considering the joint density and ionization distributions of two regions of size centered at two points separated by a distance . The model of Barkana (2007) provides the probability that either or both of these regions lie completely within H II bubbles and, when the regions are not fully ionized, the model provides the correlated distributions of their average overdensities.

We must now consider the separate contributions to the PDF of from three possible cases. First, if both points are within fully ionized regions, then is identically zero, so this case contributes a -function at 0, containing the total probability that both regions are ionized. This probability is given by the quantity in equation (40) in Barkana (2007), where is the effective real-space cross-correlation between the densities of the two regions [see section 4.1 of Barkana (2007)]:

(12) |

where is the Fourier transform of a spherical top-hat window function. The second case is where one of the regions is fully ionized, so, e.g., we assume that region 2 is in an H II bubble while region 1 is not, and then double the contribution in order to include the symmetric, opposite situation. In this case, , where for each value of we use from equation (6). We then distribute the total probability of this case into various values of using the conservation of probability, i.e., for each possible value of , the probability [equation (43) in Barkana (2007)] contributes to at the value of that corresponds to . The final case is where both regions are not fully ionized. In this case, is a function of and , and the conservation of probability turns [equation (36) in Barkana (2007)] into at the appropriate value of .
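The three-case bookkeeping described above can be illustrated with a deliberately simplified Monte Carlo toy. It is not the Barkana (2007) solution: here a region counts as fully ionized only if its own overdensity exceeds a fixed barrier, whereas in the model a region can also be swallowed by a larger bubble. All names, the erfc-shaped ionized fraction, and the use of the linear density clipped at zero are assumptions of this sketch.

```python
import math
import random


def toy_difference_samples(sigma, rho, barrier, width, n_pairs, seed=0):
    """Draw pairs of correlated Gaussian overdensities and return
    (counts, diffs): the number of pairs in each ionization case and the
    absolute difference of the dimensionless 21-cm signal for each pair.

    sigma   : standard deviation of the smoothed overdensity (assumed)
    rho     : correlation coefficient between the two regions (assumed)
    barrier : overdensity above which a region is fully ionized (toy)
    width   : scale controlling how fast the ionized fraction rises (toy)
    """
    rng = random.Random(seed)

    def signal(delta):
        if delta >= barrier:
            return 0.0  # fully ionized region: no 21-cm signal
        x_ion = math.erfc((barrier - delta) / width)  # rises to 1 at barrier
        return (1.0 - x_ion) * max(1.0 + delta, 0.0)

    cases = ("neither_ionized", "one_ionized", "both_ionized")
    counts = {name: 0 for name in cases}
    diffs = []
    for _ in range(n_pairs):
        g1, g2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        d1 = sigma * g1
        d2 = sigma * (rho * g1 + math.sqrt(1.0 - rho * rho) * g2)
        n_ionized = int(d1 >= barrier) + int(d2 >= barrier)
        counts[cases[n_ionized]] += 1
        diffs.append(abs(signal(d1) - signal(d2)))
    return counts, diffs
```

Pairs in the `both_ionized` case contribute exactly zero difference, which is the spike at zero; the other two cases fill in the continuous part of the distribution, and their relative weights change as `rho` is varied to mimic different separations.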

The variance of the 21-cm difference PDF (for two points separated by a distance $d$) is

(13) |

where the ordinary 21-cm correlation function is , and the notation denotes averaging on the resolution scale . The variance can be calculated from the PDF using one of these expressions:

(14) | |||||

where we integrated by parts to get the second expression.
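The link between this variance and the correlation function, and the integration by parts, can be sketched in a few lines. The notation below is assumed for illustration and may differ from the symbols of equations (13) and (14): $T_1$ and $T_2$ are the smoothed brightness temperatures of the two regions, $\bar{T}$ is their common mean, $\xi(d)$ is the smoothed correlation function, $p$ is the PDF of the non-negative difference $\Delta = |T_1 - T_2|$, and $C(>\Delta)$ is the CPD, taken here as the probability of exceeding $\Delta$, so that $p = -dC/d\Delta$.

```latex
\begin{align}
\langle (T_1 - T_2)^2 \rangle
  &= \langle T_1^2 \rangle + \langle T_2^2 \rangle - 2 \langle T_1 T_2 \rangle \\
  &= 2\left[\xi(0) + \bar{T}^2\right] - 2\left[\xi(d) + \bar{T}^2\right]
   = 2\left[\xi(0) - \xi(d)\right], \\
\int_0^{\Delta_{\max}} \Delta^2\, p(\Delta)\, d\Delta
  &= \Big[-\Delta^2\, C(>\Delta)\Big]_0^{\Delta_{\max}}
     + 2 \int_0^{\Delta_{\max}} \Delta\, C(>\Delta)\, d\Delta
   = 2 \int_0^{\Delta_{\max}} \Delta\, C(>\Delta)\, d\Delta .
\end{align}
```

The boundary term vanishes because $\Delta^2 = 0$ at the lower limit and $C = 0$ at the cutoff. The spike at $\Delta = 0$ from the case of two fully ionized regions does not contribute to either form, since it is weighted by $\Delta^2$.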

To calculate the correlation function of (or, equivalently, of ) without first calculating the PDF, we can calculate various expectation values using the Barkana (2007) solution, once again generalized to include a resolution/smoothing length. First, the mean ionized fraction in the model is

(15) | |||||

where and are given in section 3 of Barkana (2007), while the mean at a point is

(16) | |||||

For two points separated by a distance ,

(17) | |||||

which is a generalization of equation (49) of Barkana (2007) and reduces to that equation in the limit . Also, in the limit equation (17) simplifies to

(18) | |||||

## 3 Results

### 3.1 The full 21-cm difference PDF

In this subsection we use our model to predict the 21-cm difference PDF, plot it in full as a two-dimensional function of the separation and the brightness temperature difference of the two points, and explore its dependence on a number of the input parameters (including the redshift and the resolution scale). In the following subsection we then show that important information can be extracted using gross features of the PDF that are insensitive to its detailed shape; in particular, this information can be used to cleanly separate out and measure statistics of the ionization field that otherwise would be mixed in and convolved with the density field within the usually-considered two-point correlation function. Throughout this section, we illustrate our predictions in a $\Lambda$CDM universe that includes dark matter, baryons, radiation, and a cosmological constant. We assume cosmological parameters that match the three-year WMAP data together with weak lensing observations (Spergel et al., 2007), namely , , , , and .

Figure 1 shows an example of the 21-cm one-point PDF and CPD and the two-point 21-cm difference PDF and CPD. For the one-point function, there are two separate contributions; the case of a partly neutral region is shown as a function of the brightness temperature $T_b$, while the case where the region is ionized contributes an additional delta function to the PDF, or equivalently a step function to the CPD. In the CPD the size of this step function can be easily read off as the additional value needed to bring it up to unity at $T_b = 0$. Similarly, for the difference PDF and CPD there are three separate contributions; two of them – the cases of both regions being partly neutral or just one of them – are shown as functions of the brightness temperature difference $\Delta T_b$, while the third case – with both regions fully ionized – contributes an additional delta function to the PDF. Again, in the CPD the size of the step function equals the additional value needed to bring the CPD up to unity at $\Delta T_b = 0$. Another advantage of the one-point or two-point CPDs, pointed out by Furlanetto et al. (2004) in the one-point case, is that the PDF becomes singular (in the model) at the maximum value of the brightness temperature while the CPD does not diverge. Thus, for these two reasons, we henceforth prefer to plot the CPD instead of the PDF.

The model predicts characteristic shapes for the PDF and CPD during reionization. In particular, the one-point function cuts off at some maximum value of the brightness temperature $T_b$ (which corresponds in this case to ), and has most of the probability near the cutoff. The reason for this behavior is shown explicitly in Figure 2. Except very early in reionization, a small (or even somewhat negative) value of overdensity suffices in order to significantly ionize a region. In particular, if we increase the overdensity of a region in the model, eventually we reach a large-enough overdensity at which the region fully reionizes itself and $T_b$ drops to zero. Thus it is not possible to have arbitrarily large values of $T_b$. In practice, $T_b$ reaches a maximum at values of the overdensity $\delta$ near zero. Two factors then ensure that most of the probability of $T_b$ values is located around this maximum value. First, the function $T_b$ versus $\delta$ is flat near its maximum (particularly in the later stages of reionization), and second, the probability distribution of $\delta$ is centered at values of $\delta$ near zero. Note that we assume a Gaussian probability distribution for $\delta$, although the density is weakly non-linear and its distribution will thus be slightly modified. For instance, the standard deviation of $\delta$ in these examples is for a resolution, and half that for a resolution. Since reionization is driven by the distribution of halos, and the halo number density is strongly coupled to the mean density in each region, we expect the functional form of $T_b$ versus $\delta$ to be fairly robust. This means that the shape of the PDF will also be fairly robust even if the probability distribution of density becomes slightly non-Gaussian.

Returning to Figure 1, we see that in the plotted case, the contribution of the one-neutral-one-ionized case to the difference PDF is similar (though not identical) in shape to the one-point PDF, and in particular is centered near the maximum value of the brightness temperature difference $\Delta T_b$. This results from the fact that in this case, the value of $\Delta T_b$ is simply equal to the brightness temperature $T_b$ of the region that is not fully ionized. Note that in general the maximum value of $\Delta T_b$ for the difference PDF is equal to the maximum value of $T_b$ for the one-point PDF, since $T_b$ is always non-negative. The both-neutral contribution to the difference PDF is quite different, since $T_b$ in each of the two regions tends toward its maximum value, so the most likely difference between the two values is zero. This tendency is further strengthened by the correlation between the densities in the two regions. The outcome of all of this is a difference PDF that has two peaks, with a valley at intermediate values of $\Delta T_b$. As a result, the 21-cm difference CPD first declines at small values of $\Delta T_b$ (where it is dominated by the both-neutral case), then flattens at larger values, and finally cuts off sharply at the maximum value of $\Delta T_b$.

The bottom panels of Figure 3 again show the full CPD (with the non-linear correction of the density), but also compare it to the cumulative of a Gaussian (i.e., an error function) with the same variance. The CPD shape described above is clearly very different from the error function. As illustrated by the top panels, the non-linear correction that we have applied to the density is important in order to ensure that the brightness temperature is never negative and that there is a sharp cutoff at a maximum value (while a calculation with linear densities leads to the unphysical result of having negative values of the brightness temperature when the density fluctuation is more negative than -1). Other than this cutoff, the non-linear correction modifies the shape of the CPD only slightly, with the correction having a smaller effect in the case where the resolution angle is larger (and where fluctuations on the corresponding scale are more linear). Thus, while a non-linear correction of the density is required to ensure a physical result (with a non-negative density), we do not expect our results to depend strongly on the precise form of non-linear correction that we have used.
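A comparison of this kind is easy to set up for any set of difference samples. The sketch below assumes the CPD is defined as the probability of exceeding a given non-negative difference, and that the Gaussian reference is a zero-mean Gaussian difference with the same variance, whose magnitude then has a complementary-error-function CPD; both conventions and the function names are assumptions of this illustration.

```python
import math


def empirical_cpd(samples, x):
    """Fraction of samples strictly greater than x. Evaluated at x = 0,
    it excludes exact zeros, so 1 minus that value is the weight of the
    spike at zero (both regions fully ionized)."""
    return sum(1 for s in samples if s > x) / len(samples)


def gaussian_difference_cpd(x, variance):
    """P(|G| > x) for a zero-mean Gaussian G with the given variance,
    i.e. the reference curve for a Gaussian field with matched variance."""
    return math.erfc(x / math.sqrt(2.0 * variance))


def second_moment(samples):
    """Mean squared difference, used as the matched variance."""
    return sum(s * s for s in samples) / len(samples)
```

Plotting `empirical_cpd(samples, x)` against `gaussian_difference_cpd(x, second_moment(samples))` on a grid of `x` shows directly whether a flat portion and a sharp cutoff are present, which a Gaussian with the same variance cannot reproduce.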

Figure 4 shows the time evolution of the CPD during reionization, considered at two different values of the separation . Throughout the parameter range considered, the CPD clearly has the same characteristic shape as noted above (although for in the top panels, the flat portions occur at lower values of the CPD than are included in the plot). In the bottom-left panel, we show an example of two cases ( and 0.5) which have nearly identical variances (i.e., the corresponding Gaussian CPD curves are nearly indistinguishable), and yet the actual 21-cm difference CPDs differ substantially in these two cases. The Figure illustrates how the CPD evolves during reionization, declining with time at low values (since the probability associated with both regions being fully ionized increases), and cutting off at a lower value of (since the overdensity needed for full reionization of a given region declines as reionization progresses globally – see Figure 2).

The information on spatial correlations contained within the CPD is illustrated more clearly in Figure 5. In most of the cases shown, the Mpc and Mpc curves are nearly indistinguishable, since even at a 30 Mpc separation the two regions are nearly independent. The CPD drops rapidly as the separation is decreased, with the probability becoming concentrated near once the two regions become highly correlated. The decline with separation, which occurs at –10 Mpc early in reionization () but over a broader range of –30 Mpc later on (), indicates the relative importance of bubble and density correlations on various scales.

A full measurement of the CPD would yield a two-dimensional function of the separation and the brightness temperature difference at each redshift. This full function is illustrated with a contour plot in Figure 6. Regions that contain much of the probability – i.e., where the CPD changes rapidly and there are large spaces between consecutive contours – indicate both the characteristic scale of correlations and a corresponding characteristic value of the brightness temperature difference (which is related to the correlated distribution of densities in the two separated regions). While the full 21-cm difference PDF would be a great tool to study theoretically and to observe, in the next subsection we show that even if only the gross features of the PDF were measured, they would already reveal important information that is not directly available from measurements of the correlation function alone.

### 3.2 Using the difference PDF to separately measure ionization statistics

Next, we consider robust ways to extract information from the one-point PDF and the two-point, difference PDF. While further study may show that the detailed shape of the PDF can be used as a sensitive probe of the underlying astrophysics (such as properties of the population of ionizing sources), the analytical model we consider in this paper is approximate and neglects some non-linear corrections and other physical effects, so the precise shape would change somewhat in more complete models or numerical simulations. However, we expect our model to correctly capture the gross features of the PDF, which likely constitute more robust predictions. Also, on the observational side, while the full PDF may be difficult to measure with low signal-to-noise data, we expect it to be easier to extract just these gross features. We leave for future study the question of how these features can be extracted in a realistic scenario with noise (assuming that the PDF of the noise can first be measured accurately). Our goal here is to study what information can be extracted just from the gross features of the PDF.

We first consider the one-point PDF. Two gross quantities can be simply extracted from it: the probability that a region of the resolution size is fully ionized, and the value of . The first quantity can be extracted by measuring the size of the step function of the CPD, or (equivalently) by subtracting from unity the value of the CPD just above ; alternatively, this value equals the integrated area under the PDF curve, not including the -function at zero. Since most of the probability lies near the maximum value of (see Figure 1), measuring it depends mostly on pixels with the highest signal-to-noise ratios. Measuring the value of also depends on the same pixels; this value can be derived from the maximum value of using eq. (7). Note that while a fully realistic PDF may not feature such a total sudden drop at a maximum value of , there should nonetheless be a definite, sharp drop due to the strong dependence of ionization on density through variations in the halo number density. Note also that in simulations, while a region cannot be truly fully ionized (because of the presence of very high-density gas), the interpretation of in this case is where the bubbles within the region have fully overlapped and all the gas is highly ionized except for some gas at (which at high redshift generally makes up only a small volume and mass fraction). Indeed, in simulations by Mellema et al. (2006) the PDF as a function of has a fairly flat portion (though not always increasing) followed by a rather sharp cutoff. We note that the shape of such statistics as measured in simulations has not yet been physically justified or been subjected to numerical convergence tests. But it is generically expected that the two quantities and should clearly feature in the PDF. If models or simulations can reliably establish at least the approximate form of the PDF, then this would make it easier to measure these two quantities.

Now consider the difference PDF (see Figure 1). One gross feature is obviously a cutoff, which can also be used to measure the same quantity that is derived from the cutoff of the one-point PDF. The difference PDF can also be used to extract three other inter-related quantities, each as a function of the separation: the probability of joint full ionization; the probability of full ionization of only one of the two regions (with the factor of 2 accounting for the symmetry of selecting which region is the ionized one); and the probability that neither region is fully ionized. The first probability can be extracted from the size of the step function of the CPD at zero, while the other two can be extracted from the areas under the two peaks that are fairly well separated in the PDF during reionization (see Figure 1), or more accurately by modeling the two separate contributions to the PDF at . In reality, the three probabilities need not be measured separately, since they can easily be shown to be closely related to each other; in fact, the probability of joint full ionization, together with the one-point probability of full ionization, can be used to express the other two two-point probabilities:

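$$P(\text{only one region fully ionized}) = 2\,\bigl[P(\text{a region is fully ionized}) - P(\text{both regions fully ionized})\bigr] \qquad (19)$$

and

$$P(\text{neither region fully ionized}) = 1 - 2\,P(\text{a region is fully ionized}) + P(\text{both regions fully ionized}) \qquad (20)$$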

These relations should make it much easier to extract this information from even approximate measurements of the difference PDF.
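The bookkeeping behind these relations is elementary probability and can be sketched in a few lines. The function name `two_point_probabilities` and the numerical values are hypothetical and chosen only for illustration. The inputs are the one-point probability that a region is fully ionized and the joint probability that both regions are fully ionized at a given separation:

```python
def two_point_probabilities(p_one, p_both):
    """Return (P_only_one, P_neither) for two equivalent regions.

    p_one  : probability that a single region is fully ionized
    p_both : probability that both regions are fully ionized
    """
    # The joint probability cannot exceed the one-point probability, and
    # the "neither" probability below must not become negative.
    if not (0.0 <= p_both <= p_one <= 1.0) or p_both < 2.0 * p_one - 1.0:
        raise ValueError("inconsistent input probabilities")
    p_only_one = 2.0 * (p_one - p_both)    # factor 2: either region may be the ionized one
    p_neither = 1.0 - 2.0 * p_one + p_both
    return p_only_one, p_neither

# Illustrative numbers only
p_one, p_both = 0.4, 0.25
p_only_one, p_neither = two_point_probabilities(p_one, p_both)
total = p_both + p_only_one + p_neither  # the three cases exhaust all outcomes
```

Since the three two-point probabilities sum to unity by construction, a measurement of any one of them, combined with the one-point probability, fixes the other two.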

Thus, from the one-point and two-point PDFs, we can measure two independent probabilities that depend directly only on ionization statistics, not mixed in with the value of the density. These are the one-point probability of full ionization (a single quantity at each redshift) and the joint probability of full ionization (a function of the separation at each redshift). Even a rough measurement of the PDFs may yield a reasonable estimate of these gross quantities. Note also that in the limit of infinitely good resolution (i.e., a very small resolution scale), the one-point probability becomes the cosmic mean ionized fraction, and the joint probability becomes the ionization correlation function (after subtracting the square of the mean ionized fraction). In addition, the value derived from the cutoff of the PDF yields an interesting piece of information on the dependence of ionization on the density. All of these quantities are separate from the correlation function, which yields just one function of the separation that is a complicated convolution of fluctuations in density and ionization.
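The high-resolution limit can be made explicit, since each point is then either neutral or ionized. In the sketch below, the ionization indicator $x(\mathbf{r}) \in \{0,1\}$, the mean $\bar{x}$, and the symbol $\xi_{xx}$ are illustrative notation assumed here and may differ from the notation used in the rest of the paper:

$$
\begin{aligned}
P(\text{a point is ionized}) &= \langle x(\mathbf{r}) \rangle = \bar{x}, \\
P(\text{both points ionized}) &= \langle x(\mathbf{r}_1)\, x(\mathbf{r}_2) \rangle, \\
\xi_{xx} &\equiv \langle x(\mathbf{r}_1)\, x(\mathbf{r}_2) \rangle - \bar{x}^2 = P(\text{both points ionized}) - \bar{x}^2 .
\end{aligned}
$$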

Figure 7 shows predictions of our model for the two quantities and that are available from the 1-pt PDF. can be used as a rough estimate for the cosmic mean , although this works better with high resolution () and late in reionization. The robustness of theoretical predictions of the relation between and can be investigated with further models and simulations. Even for a fixed end-of-reionization redshift and when expressed as functions of , both and depend significantly on the characteristic halo mass of ionizing sources.

Figure 8 shows the three inter-related probabilities obtainable from the difference PDF, compared to the expectation value . The standard 21-cm 2-pt correlation function does not actually yield but rather subtracts off , which is the value of at . Thus, in the figures the information available from the 2-pt correlation function is not the absolute plotted values of , but just the values relative to the large-scale asymptotic value. Therefore, the ability to measure the 2-pt correlation function and extract useful information from it depends on the total change in from small to large scales. This total change is smaller than the overall change with scale of most of the curves that show the ionization probabilities. The probability goes from at to at , while goes from to . The largest change is seen in the probability that one region is ionized and the other is not; this varies from zero at small to at large .

The characteristic scales of the bubble correlations can be read off as the scales where the various quantities in Figure 8 change most rapidly as a function of . In order to focus on this important feature, we plot the derivatives with respect to in Figure 9, just for and for (since the other two probabilities can be derived from this quantity). In general, both and show roughly the same dominant scales, but shows a greater variation with scale during the central stages of reionization at which the variation is maximized. Figure 10 shows similar trends in the case of reionization by more massive and highly-biased halos, except that here the characteristic scale grows to even larger values by the end of reionization; early on in reionization, the characteristic scale is approximately the same as in the previous case of less-massive halos, but the magnitudes of the scale-derivatives are larger in the case of the more-massive halos (corresponding to stronger ionization and 21-cm fluctuations). Figure 11 demonstrates the importance of achieving high resolution in the upcoming experiments. In the scenario considered here, a cosmic mean ionization fraction of one half occurs at , at which the resolution of corresponds to a radius of 1.3 comoving Mpc (com Mpc), and to 6.7 com Mpc. The typical correlation scale grows with time and overtakes the scale only late in reionization (), so until then the peaks of the curves in Figure 11 indicate roughly the resolution scale (rather than the desired correlation scale). Observations should therefore achieve a resolution of a few arcminutes or better in order to make it possible to measure the evolution of the dominant correlation scale throughout most of the reionization era.
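Reading off a characteristic scale from the steepest change of a curve can be sketched numerically as follows. The derivative is taken here with respect to the logarithm of the separation, which is an assumption of this sketch, and the exponential toy form of the joint ionization probability, the scale `d0`, and the helper `characteristic_scale` are all hypothetical; none of them is a prediction of the model:

```python
import numpy as np

def characteristic_scale(d, p):
    """Separation at which p(d) changes most rapidly per unit ln(d)."""
    d = np.asarray(d, dtype=float)
    p = np.asarray(p, dtype=float)
    dp_dlnd = np.gradient(p, np.log(d))  # finite-difference derivative
    return d[np.argmax(np.abs(dp_dlnd))], dp_dlnd

# Toy joint probability: equals the one-point probability at small
# separations and its square (independent regions) at large separations.
d = np.logspace(-1, 2, 301)  # separations in comoving Mpc
p_one = 0.5                  # one-point probability of full ionization
d0 = 5.0                     # hypothetical correlation scale
p_both = p_one**2 + (p_one - p_one**2) * np.exp(-d / d0)

d_char, slope = characteristic_scale(d, p_both)
```

For this toy curve the logarithmic derivative is largest in magnitude at a separation equal to `d0`, so the recovered scale should lie close to the input value.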

## 4 Conclusions

We have presented and studied a new statistic for analyzing 21-cm fluctuations, namely the PDF of the difference of the 21-cm brightness temperatures of two regions, as a function of the separation between their centers. This two-dimensional statistic generalizes two one-dimensional functions, the one-point PDF and the two-point correlation function; the latter is simply related to the variance of the difference PDF (Eq. 13).
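The link between the variance of the difference and the correlation function can be checked numerically. The sketch below assumes the standard identity for a statistically homogeneous field, in which the variance of the difference equals twice the one-point variance minus twice the correlation function; the exact form of Eq. 13 is not reproduced here. The Gaussian toy field and all numerical values are illustrative assumptions (the real 21-cm field is strongly non-Gaussian, but the identity does not rely on Gaussianity):

```python
import numpy as np

rng = np.random.default_rng(1)

sigma2 = 4.0  # one-point variance (arbitrary units squared)
xi = 1.5      # correlation function at the chosen separation
mean = [10.0, 10.0]
cov = np.array([[sigma2, xi],
                [xi, sigma2]])

# Correlated pairs of brightness temperatures at the two points
t1, t2 = rng.multivariate_normal(mean, cov, size=200_000).T

var_diff = np.var(t1 - t2)             # variance of the difference PDF
xi_from_diff = sigma2 - 0.5 * var_diff  # invert var = 2 * (sigma2 - xi)
```

The recovered `xi_from_diff` agrees with the input correlation up to sampling noise, which illustrates that the correlation function carries only the second moment of the difference PDF.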

We have predicted the difference PDF based on the correlated two-point distribution of density and ionization (Barkana, 2007), generalized to the case of a finite observed resolution, and including a non-linear correction for the density (Eq. 8). The model predicts a characteristic shape during reionization for the PDF (or its cumulative form, the CPD) that can be understood from the various contributions to it depicted in Figure 1. The PDF contains information on the distribution of the density within neutral regions and on the dominant spatial scales of bubble and density correlations (Figures 4 and 5). The full PDF is a function of the separation and of the brightness temperature difference at each redshift (Figure 6).

While the usual correlation function is determined by a complicated mixture of contributions from density and ionization fluctuations, we have shown that the difference PDF (together with the one-point PDF) holds the key to separately measuring statistics of the ionization distribution. In particular, even an approximate measurement of the PDFs can generically be used to measure the ionization probability of a resolution-sized region, and the joint ionization probability of two such regions as a function of their separation. Within our model, the joint ionization probability shows the same characteristic correlation scale as the two-point 21-cm correlation function but has more power (Figures 9 and 10); this is because the contributions of density and ionization to 21-cm fluctuations are anticorrelated (a higher density implies less neutral gas), which reduces the 21-cm fluctuations relative to ionization fluctuations.

If quasars contribute significantly to reionization by producing large bubbles even early in reionization, or if quasars or supernovae emit significant X-ray photons (which have a long mean free path), then the density and ionization fluctuations will not be as simply related as we have assumed. In this case, it will be much more important to measure the ionization statistics separately, since it will be difficult to extract this information from the 21-cm statistics. Also in this case, the relation between the one-point and difference PDFs and the 21-cm correlation function should be significantly different from the case we have considered. The full quantitative details of such a scenario go beyond the scope of this paper and merit a separate study.

## Acknowledgments

The authors would like to acknowledge Israel - U.S. Binational Science Foundation grant 2004386 and Harvard University grants. RB is grateful for the kind hospitality of the Institute for Theory & Computation (ITC) at the Harvard-Smithsonian CfA, where this work began, and also acknowledges support by Israel Science Foundation grant 629/05.

### Footnotes

- pubyear: 2007
- http://www.haystack.mit.edu/ast/arrays/mwa/
- http://www.lofar.org/

### References

- Barkana R., 2007, MNRAS, 376, 1784
- Barkana R., Loeb A., 2004, ApJ, 609, 474
- Barkana R., Loeb A., 2007, Rep. Prog. Phys., 70, 627
- Bharadwaj S., Pandey S. K., 2005, MNRAS, 358, 968
- Bond J. R., Cole S., Efstathiou G., Kaiser N., 1991, ApJ, 379, 440
- Bowman J. D., Morales M. F., Hewitt J. N., 2006, ApJ, 638, 20
- Ciardi B., Madau P., 2003, ApJ, 596, 1
- Furlanetto S. R., McQuinn M., Hernquist L., 2006, MNRAS, 365, 115
- Furlanetto S. R., Oh S. P., Briggs F., 2006, Phys. Rep., 433, 181
- Furlanetto S. R., Zaldarriaga M., Hernquist L., 2004, ApJ, 613, 1
- Madau P., Meiksin A., Rees M. J., 1997, ApJ, 475, 429
- McQuinn M., Furlanetto S. R., Hernquist L., Zahn O., Zaldarriaga M., 2005, ApJ, 630, 643
- McQuinn M., Zahn O., Zaldarriaga M., Hernquist L., Furlanetto S. R., 2006, ApJ, 653, 815
- Mellema G., Iliev I. T., Pen U.-L., Shapiro P. R., 2006, MNRAS, 372, 679
- Mo H. J., White S. D. M., 1996, MNRAS, 282, 347
- Press W. H., Schechter P., 1974, ApJ, 187, 425
- Saiyad-Ali S., Bharadwaj S., Pandey S. K., 2006, MNRAS, 366, 213
- Scannapieco E., Barkana R., 2002, ApJ, 571, 585
- Scannapieco E., Thacker R. J., 2005, ApJ, 619, 1
- Sheth R. K., Tormen G., 1999, MNRAS, 308, 119
- Sheth R. K., Tormen G., 2002, MNRAS, 329, 61
- Spergel D. N., et al., 2007, astro-ph/0603449
- Wyithe S., Morales M., 2007, astro-ph/0703070
- Zahn O., Lidz A., McQuinn M., Dutta S., Hernquist L., Zaldarriaga M., Furlanetto S. R., 2007, ApJ, 654, 12