Studying the morphology of reionisation with the triangle correlation function of phases
Abstract
We present a new statistical tool, called the triangle correlation function, inspired by the earlier work of Obreschkow et al. (2013). It is derived from the 3point correlation function (3PCF) and aims to probe the characteristic scale of ionised regions during the Epoch of Reionisation (EoR) from 21cm interferometric observations. Unlike most works, which focus on power spectrum, i.e. amplitude information, our statistic is based on the information we can extract from the phases of the Fourier transform of the ionisation field. In this perspective, it may benefit from the wellknown interferometric concept of closure phases. We find that this statistical estimator performs very well on simple ionisation fields. For example, with welldefined fully ionised disks, there is a peaking scale, which we can relate to the radius of the actual ionised bubbles through a simple linear relation. We also explore its robustness when observational effects such as angular resolution and noise from the instrument are considered. The triangle correlation function also gives satisfying results on fields generated by more elaborate simulations such as 21CMFAST. Although the variety of sources and ionised morphologies in the early stages of the process make the interpretation of results more challenging, the nature of the signal can tell us about the stage of reionisation. Finally, and in contrast to other bubble size distribution algorithms, the triangle correlation function can resolve two different characteristic scales in a given map.
keywords:
methods: statistical – cosmology: dark ages, reionization, first stars – theory – largescale structure of Universe1 Introduction
During the Epoch of Reionisation (EoR), from to , early light sources ionised the hydrogen and helium atoms of the Intergalactic Medium (IGM). The information currently available on this period comes from indirect observations such as the redshift evolution of the density of starforming galaxies, thought to be a major source of reionisation and estimated through UV luminosity densities (Bouwens et al., 2015; Ishigaki et al., 2015; Robertson et al., 2015); the integrated Thomson optical depth from Cosmic Microwave Background (CMB) observations (Planck Collaboration et al., 2016b, 2018); and from estimates of the averaged neutral fraction of the IGM obtained via the damping wings of quasars (Greig et al., 2017; Greig et al., 2018), surveys of Lyman emitters (Konno et al., 2014; Schenker et al., 2014; Mason et al., 2018) or gammaray bursts afterglows (Totani et al., 2014). Although these observations keep improving, in terms of both redshift and precision, they are still not sufficient to draw a precise history of reionisation. Many uncertainties remain on vairous aspects of the reionisation process such as the emissivity of early galaxies or the level of clumpiness of the IGM (Bouwens et al., 2017; Gorce et al., 2018). Interferometric measurements of the 21cm signal will potentially allow us to map regions in the sky and get a sense of both the topology and morphology of the reionisation process. To optimise the signaltonoise ratio of current radio interferometers such as the Murchison Widefield Array (MWA)^{1}^{1}1http://www.mwatelescope.org, the Precision Array to Probe the Epoch of Reionization (PAPER)^{2}^{2}2http://eor.berkeley.edu and the Low Frequency Array (LOFAR)^{3}^{3}3http://www.lofar.org, many works focus on statistical estimators such as the point correlation functions (PCF) and in particular the power spectrum (). Whilst the infinite family of PCF is sufficient to fully describe a statistically homogeneous and isotropic field, the signal from the EoR is expected to be highly nonGaussian and these estimators will not suffice (e.g. Ciardi & Madau, 2003; Furlanetto et al., 2004b; Iliev et al., 2006; Harker et al., 2009; Mondal et al., 2015). We therefore need to develop new statistical tools to make the most out of upcoming observations, especially of 21cm tomographic imaging, which is a key science goal of the future Square Kilometer Array (SKA)^{4}^{4}4http://www.skatelescope.org/. Many works have already focused on the use of the bispectrum, i.e. the Fourier transform of the 3PCF, and particularly its amplitude, to learn about the nonGaussianity of the reionisation signal (Watkinson et al., 2017; Majumdar et al., 2018; Watkinson et al., 2019). However, PCF () only carry a limited amount of new information compared to the 2PCF since they still only make use of amplitude information. In this paper, we choose to look at the phases of the bispectrum rather than its amplitude, as the information extracted from phases will contain more information than the power spectrum alone.
1.1 Phase information
Consider an ionisation field . The visibilities observed by radio interferometers give information in Fourier space, so it is useful to expand it with a Fourier series,
(1) 
is real, but each is a complex number which can be decomposed into an amplitude and a phase term , so that . When we consider the power spectrum of the field, we lose all the information contained in the phases. If is statistically homogeneous and isotropic, i.e. a pure Gaussian random field, then the phases are uniformly distributed on the interval (Watts et al., 2003) and all the information about its structure lives in the power spectrum. Conversely, any form of anisotropy or inhomogeneity in the distribution must be traceable to the phases. In this work, we aim at evaluating the amount of information we could extract from phases to add to the usual power spectrum studies. Eventually, we want to use phases to learn about the structure of an ionisation field and extract a characteristic length, such as the average radius of ionised bubbles. In this perspective, we develop a new statistical estimator, derived from the 3point correlation function and solely based on phases, called the triangle correlation function. It is derived from the line correlation function introduced by Obreschkow et al. (2013) to study linear structures in dark matter fields. To consider phase information only, we compute statistics from the phase factor defined as
(2) 
where is the phase term of . By construction, the phase factor has an amplitude of one, so that its 2PCF – and consequently its power spectrum, will vanish. The simplest correlation function related to is then its 3PCF, which we will use to construct our estimator.
The top left panel of Fig. 1 shows an ionisation field made of randomly distributed spherical ionised bubbles on a neutral background. The top right panel corresponds to the inverse Fourier transform of the phase factor of this field, taken to be a tracer of phase information in real space. We see that phases mostly preserve the edges of the ionised bubbles – discs appear as circles, and so will provide information about the structure of the field. The same exercise is done in the lower panels of the Figure but for an ionisation field extracted from the seminumerical 21CMFAST simulation (Mesinger & Furlanetto, 2007) at and global ionised fraction (details on the simulation parameters can be found in Section 4.3). We see that the more complex structure of the ionisation field in this simulation is reflected in a more complex phase map. The boundaries between neutral and ionised regions are not as definite as they were for the toy model above; it will therefore be more difficult to extract information about the structure of this field from its phases.
To this day, little work has been done regarding Fourier phases, mostly because of the lack of a solid statistical framework to do so, and in particular, no work has been done on phase information during the EoR. Thyagarajan et al. (2018) have shown that the 21cm signal from EoR can be detected in bispectrum phase spectra, and mention a potential use of their results for intensity mapping experiments. However, most of the existing work is related to the study of galaxy clustering: although the initial conditions to matter distribution in the Universe are Gaussian, the growth of cosmic structures will lead to some nonlinearity and the distributions of both amplitudes and phases will be altered (Watts et al., 2003; Levrier et al., 2006). By characterising this alteration, one hopes to learn about the formation of cosmic structures and possibly about the properties of dark matter (Obreschkow et al. 2013, see also Wolstenhulme et al. 2015; Eggemeier et al. 2015). We will discuss the benefits of phases compared to amplitude information in more details in Section 6.
1.2 Current estimators of the size of ionised regions during reionisation
Characterising the morphology and topology of regions at different times through 21cm tomography is essential to the study of reionisation. For now, two types of metrics have been proposed to do so: Minkowski functionals (Gleser et al., 2006; Lee
et al., 2008; Bag
et al., 2018; Chen
et al., 2018) and bubble size distributions (BSD). Using Minkowski functionals on tomographic images, one can learn about the topology of the reionisation process at a given time, for example the level of percolation or the shape of ionised regions: when the ionised regions largely overlap, there will likely be a unique extended percolated ionised region pierced by neutral tunnels (Bag
et al., 2018; Elbers &
van de Weygaert, 2018).
Bubble size distributions give more quantitative results as they allow a direct estimation of the size of ionised regions. To derive BSD, three main methods can be found in the literature: the friendoffriends algorithm (FOF), based on the connections between ionised regions (Iliev et al., 2006); the spherical average method (SPA), which looks for the largest spherical volume which can cover an ionised region and exceed a given ionisation threshold (Zahn
et al., 2007); and the meanfreepath method (RMFP), based on MonteCarlo Markov Chain (MCMC) techniques and implemented in the seminumerical simulation 21CMFAST (Mesinger &
Furlanetto, 2007; Mesinger
et al., 2011). The RMFP algorithm proceeds as follows: it chooses randomly an ionised pixel, stores its distance to the closest neutral pixel in a random direction and builds an histogram from the distribution of these lengths for many ionised pixels. A thorough comparison of these methods is made in Lin
et al. (2016) for clean simulation images and in Giri
et al. (2018a) for simulated 21cm tomographic data, including observational effects. A new method, which the authors call granulometry, has been recently presented in Kakiichi
et al. (2017). The idea is to successively sieve a binary field with a spherical hole of increasing radius . Depending on how many ionised pixels are sieved, one can build the probability distribution of regions sizes. In their work, the authors prove that granulometry should perform well on future SKA observations, as long as the correct observing strategies are chosen. Lin
et al. (2016) have also extended to 3D the wellknown watershed algorithm: it connects same value pixels with each other to find isodensity lines i.e. edges of characteristic structures. Finally, Giri
et al. (2018b) applied the image processing method called superpixels, based on oversegmentation, to simulated noisy 21cm tomographic images. Their results seem more conclusive for the late stages of EoR than most other current techniques.
Note that all these methods require real space data: one of the main advantages of our method is that it can be used directly on data in Fourier space.
In this paper, we first describe the 21cm signal, on which the simulated observations we study here are based in Section 2. Then, in Section 3, we develop the mathematical formalism leading to the definition of the triangle correlation function, and in particular the definitions of point correlation functions and polyspectra. In Section 4, we apply triangle correlations to simulated ionisation fields, either filled with randomly distributed ionised bubbles or outputs of the 21CMFAST simulation. We relate our method to observation in Section 5. Finally, other aspects of the results are discussed: we first give evidence for the benefits of phase information in Section 6 and then consider computational performance in Section 7. Conclusions can be found in Section 8.
2 The 21cm signal
The neutral hydrogen 21cm line corresponds to the spinflip transition of an electron between the two hyperfine levels of the ground state of the atom. As a tracer of neutral hydrogen, it is naturally a very interesting observable to learn about the EoR and for the last decades, many efforts have been made to design experiments capable of detecting this signal, although it has proved to be more challenging than anticipated, due to a combination of the signal’s low amplitude in the presence of strong foregrounds and huge calibration challenges. The 21cm signal is seen with respect to a radio background, usually the CMB. We can write the evolution of the observed differential 21cm brightness temperature as (Pritchard & Loeb, 2012):
(3)  
where is the spin temperature, which characterises the relative populations of the two spin states, and the baryon overdensity. During the late stages of reionisation, it is expected that the spin temperature will dominate the CMB temperature because it is coupled to the kinetic temperature of gas, and we can make the approximation of the second line. With this approximation, we can map the 21cm signal on the sky and, thanks to Eq. 3, interpret cold spots, i.e. regions where the signal is weaker than average, either as ionised or underdense regions; and hot spots as neutral or overdense regions. Being able to differentiate underdense from ionised regions in brightness temperature measurements is a major challenge of 21cm tomography (Giri et al., 2018a). In this work however, we consider this separation has already been made on data and directly study binary ionisation fields. Because of redshift, a photon emitted with a wavelength of 21cm during the EoR () will reach us today with a frequency i.e. the frequency range that the new generation of radio interferometers, such as MWA, PAPER, LOFAR and SKA, are expected to probe.
3 The triangle correlation function of phases
For a real ionisation field of volume in dimension , the point correlation function (PCF) measures the correlations between points, described as a family of vectors . It is defined as:
(4) 
where and the integral is over the whole volume . If we average this definition over all possible rotations, we obtain the isotropic PCF . As can be seen in Eq. 4, the PCFs are convolutions and it will be easier to compute them in Fourier space. The Fourier transform of the PCF is called the polyspectrum such that
(5) 
and
(6)  
For , is called the power spectrum and for , we have the bispectrum such that
(7a)  
(7b) 
where in the first equation we have used the fact that, because is a real field, its Fourier transform will verify . Note that we can also write the bispectrum in terms of three vectors forming a closed triangle i.e .
Consider the 3point correlation function (3PCF), i.e. the inverse Fourier transform of the bispectrum:
(8) 
To study filamentary structures in matter fields, Obreschkow et al. (2013, hereafter O13) consider the 3PCF in Eq. 8 for i.e. three aligned points. Because we look for spherical structures, we will consider two vectors and forming an equilateral triangle, as it is the threepoint shape closest to a sphere. In this case, is just rotated by an angle :
(9) 
where the 2D case is limited to the first two equations^{5}^{5}5Note that in the actual computation of the triangle correlation function, we will consider rotations not only around the axis but also around the other two.. Let
(10) 
then we can write in Eq. 8. If we choose to consider the phase factor of the bispectrum rather than its full form (see Eq. 2), we find the modified 3PCF in dimension
(11) 
A rotational average of the above expression gives the corresponding isotropic modified 3PCF:
(12) 
where is the window function defined by
(13) 
for the Bessel function of the first kind and order 0. Numerically, we will need to discretise the integral as
(14) 
where we have considered our density field as a periodic box with physical side length , divided into cells. Each cell has a width . In Fourier space, this box transforms into a box of same dimensions () but of side length and spacing between each mode. For small scales (), we will have an infinite number of modes since the largest one is and the correlation function will diverge: to prevent this, we introduce a cutoff on the sums of Eq. 14. Similarly, when , and the discretisation no longer holds. As discussed in O13, we introduce the prefactor to Eq. 14 to solve this issue. These changes applied to the isotropic modified 3PCF define the triangle correlation function of phases:
(15) 
The imaginary part of is zero therefore we only consider its real part in the following sections.
All these derivations have been done with being the ionisation field . If we consider the neutral field , we have and which we can plug back into the equations above to find . Therefore when applied to a field mostly ionised, with remote neutral islands, the correlations dominating the signal will be related to regions and the signal will be negative. In particular, because 21cm tomographic images trace the neutral gas in the sky, if we apply our method on 21cm brightness temperature maps, we obtain the exact same signal as for the corresponding ionisation field, but with a reversed symmetry. This also implies that during the middle stages of the reionisation process, positive and negative (respectively, and ) correlations overlap, and the triangle correlation function flattens down.
4 Application to simulated ionisation fields
We generate boxes of cells filled with randomly distributed ionised disks. Although the derivations performed in Section 3 are valid for a two or threedimensional ionisation field, we now limit our work to 2D boxes, as they are closer to what one would observe with a radiotelescope. These boxes are binary: an ionised region will have pixels of value 1, whereas a neutral zone will be filled by zero pixels. Because the UV photons emitted by early galaxies have a very short mean free path in the surrounding neutral IGM, the boundary between ionised and neutral regions is expected to be sharp, and a binary model where ionised bubbles have and neutral regions is acceptable (Furlanetto et al., 2004a). Each simulated box has periodic boundary conditions and ionised regions are allowed to overlap^{6}^{6}6When they overlap, the maximum pixel value is set to 1, since it is a proxy for ionised level.. In a more realistic field, we would expect the ionised bubbles to be clustered around overdense regions rather than randomly distributed but we choose to first ignore this effect, as we will later consider it when applying our method to the 21CMFAST simulation. For each box, we compute the triangle correlation function as defined in Eq. 15 for a range of correlation scales . We compare the resulting plot with the known size of the ionised bubbles filling the box. The programme we developed to compute these correlations is publicly available online^{7}^{7}7https://github.com/adeliegorce/Triangle_correlations.
4.1 Picking up characteristic scales
Fig. 2 shows the triangle correlations computed for 30 realisations of a 2D box of pixels and side length filled with 70 binary bubbles of radius (in pixel units) i.e. about . The solid line is the mean value for all 30 boxes, and the shaded area represents the confidence interval. Although the signal is very similar for all realisations of the box at small correlation scales, there is more variance at larger scales, as gets closer to the box side length . To see if this scattering is a numerical or physical effect, we compute the triangle correlations of 30 boxes of same dimensions, but with random Fourier phases; the result is added in red to the Figure. In this case, the mean signal is close to zero but the same variance can be seen at large scales. More generally, we find that results are more reliable on small scales, when . Although there is more variance at larger scales, we see that when considering many realisations of the same box, the mean signal flattens out. This suggests that the power seen on large scales corresponds to correlations linking three points not belonging to the same ionised bubble, but three points located in separate bubbles and so traces a form of correlation in the bubble locations. Because our simulation randomly distributes bubbles in the box, it is then only natural that these correlations would average out when considering more realisations of the same box. As we will see in the next sections, the oscillations on the left side of the peak are due to the sharp edges of the bubbles: once the field is smoothed by a given angular resolution, they will vanish. The problems induced by solid spheres in the bispectrum are flagged up in other works, e.g. Watkinson et al. (2017) and Majumdar et al. (2018).
The triangle correlation function probes equilateral triangles inscribed in the ionised disks of the box, and simple geometry shows that these have a side length of , a scale at which we would expect to peak and which is represented as a dotted vertical line on the figure. On Figure 2, there is indeed a welldefined peak, but at . Because of this shift, we need to confirm that the peak is actually related to the structure of our field. To do so, we generate many pixels boxes with and fill each one of them with 50 binary bubbles of a given radius, i.e. one box with 50 bubbles of radius , another with 50 bubbles of radius etc. Results are gathered in Fig. 3 for 10 realisations of each box type: the correlation peak indeed shifts to larger scales as bubbles increase in size. We plot the peaking scale as a function of the bubble radius in the lower panel of Fig. 3, where the error bars correspond to the confidence interval for 10 realisations of the same box. Points are closely aligned, and a MonteCarlo Markov Chain (MCMC) fit to a linear relation gives the following results, where both and are in :
(16) 
The shift between peaking scale and radius size can actually be traced to the window function .
Here, we chose to work with a binary field, with no partially ionised regions where . To see if our results hold when such regions exist, as there would be if Xray photons significantly contributed to reionisation (Visbal & Loeb, 2012) or if some observed regions are unresolved, we perform the same test but for symmetric 2D Gaussian distributions with the bubble radius as variance. The box dimensions are the same as before. As for binary bubbles, we observe a peak in the triangle correlation function at scales slightly smaller than . A linear relation can still well describe the correlation between and but there is more scattering around the model, mostly because the peaks quickly become difficult to locate as the bubble radius increases: on average, the signal is much flatter for Gaussian than for binary bubbles. For both, the signal gets weaker as bubbles get larger (for a fixed number of bubbles in the same box), i.e. when the filling fraction of the box increases. Conversely, for a given radius, the amplitude of the signal will decrease as the number of bubbles – and so the filling fraction of the box, increases. In general, we find that for any radius size, as long as the filling fraction does not exceed , there is still a clear peak in the signal and we can infer a characteristic scale. Indeed, for larger global ionised fractions, there is no well defined characteristic ionised regions size anymore. We can compare these results to those of Bharadwaj & Pandey (2005) who, through a more observational approach, relate the bispectrum of fluctuations to the correlations between the visibilities measured at three different baselines of an interferometer. They generate the same kind of toy models as us and find that their signal increases with the size of the ionised regions. However their method cannot be used in the midstages of reionisation () because of overlaps. Their approach is quite similar to ours as they only consider combinations of baselines forming an equilateral triangle; however, it requires the bubble radius size as an input.
4.2 Further tests
The triangle correlation function seems to correctly pick up the characteristic scale of spherical regions in an ionisation field. Because we want to use this method directly on Fourier data, for which no real space image will be available, we need to ensure that the peaking scale corresponds to spherical regions only. In their work, O13 define the line correlation function, which is analogous to our triangle correlation function but for two vectors and aligned () so that it will probe linear rather than spherical structures in a field (see Sec. 3). It writes as:
(17) 
where the window function has not changed compared to our expression but is now a function of and not (see Eq. 15). O13 find that the line correlation function is almost flat when there are only spherical structures in the matter field considered (see their Figs. 3 and 4). Here, we compute the line correlation function of O13 for fields filled with bubbles of various sizes and compare the results with triangle correlations: we find that the line correlation function seems to also pick up the bubble size, with a peak located at roughly the same correlation scales as for triangle correlations, but with a much weaker amplitude. To estimate the efficiency of each function with respect to the sphericity of objects in the field, we analyse 2D boxes filled with more or less stretched binary ellipses. We define the prolateness of an ellipse as the ratio of its semiminor axis to its semimajor axis so that an ellipse with prolateness is a flat disk. Fig. 4 shows the evolution of the integral of the two correlation functions as a function of the prolateness of the objects in the 2D box considered^{8}^{8}8Note that we integrate only on the correlation range where the signal is more reliable (see Sec. 4.1).. As prolateness tends towards 1, the triangle correlations signal increases, while the line correlations signal decreases, even reaching negative values for . On the contrary, the signal for triangle correlations is close to zero for but very strong for line correlations. Note that triangle correlations overshoot line correlations as soon as i.e. when the ellipsoid is still very stretched. These results prove that the triangle correlation function is better at picking up spherical shapes, whereas line correlations are better at picking up elongated structures.
In reality, the reionisation process will not be as homogeneous as the fields studied so far: it is unlikely that all ionised bubbles should have the same radius at a given redshift. It would therefore be useful if the triangle correlation function could differentiate between bubbles of different radii in the same box. To test for this, we generate a box of pixels and side length , filled with 20 binary disks of radius and 60 of radius . Results are displayed on Fig. (a)a for 20 realisations of this box. We can clearly distinguish two peaks, which seem to match the bubble radii (dashdotted vertical lines), once scaled by the coefficients of Eq. 16 (dotted vertical lines) which relates peaking scale and bubble radius. Note that this result holds when we replace the binary bubbles by Gaussian disks. To ensure that the scales picked up by our function correspond to two different sizes of bubbles and not to the length and width of elongated structures, we compute the triangle correlations of a box filled with 60 binary ellipses of the same dimensions as the bubbles above i.e. a semimajor axis and a semiminor axis (bubbles were 4 and 14 in radius). The curve obtained, shown as an orange dashed line on the figure, differs largely from what we obtained for disks. There is only one clear peak, corresponding to scales close to the semiminor axis: because it probes equilateral triangles, the function picks up elongated ellipses as rows of disks of radius . Finally, we fill boxes with bubbles whose radii follow a Gaussian distribution. Because we generate the box, we know the exact distribution of bubble radii inside and we want to see if our estimator is able to retrace it. On Fig. (b)b, we compare the triangle correlation function (solid line) with the actual radius distribution (histogram) for 3 different bubble sizes: and . The number of bubbles in the box is adjusted so that they all have the same filling fraction . Overall, there is a reasonably good match between the two. We see that as the mean radius increases, for the same variance , the discrepancy between peaking scale and real bubble size grows in a way that the linear relation of Eq. 16 cannot fully compensate for.
4.3 Results on 21CMFAST simulation
Now that our method has been tested and characterised on toy models, we apply it to more physical ionisation fields, extracted from the seminumerical simulation of reionisation 21CMFAST^{9}^{9}9http://github.com/andreimesinger/21cmFAST. We choose 21CMFAST because, thanks to relatively short executing times, it gives a large flexibility in parametrisation. It also includes an implementation of the random mean free path algorithm to estimate the bubble size distribution of data cubes, which we can compare to our estimator. Note that an important difference with the toy models studied before is that the distribution of bubbles within the box is no longer random, and large scales correlations related to the bubble locations might appear in our signal. The 21CMFAST code first generates a highredshift linear density field which is then evolved to lower redshifts using linear theory and the Zel’dovich approximation. The ionisation field is extracted from this density field using excursionset theory for haloes with virial temperature . We use this code to generate a box with sufficient resolution to obtain a field of pixels and side length for the following cosmology: , and (Planck Collaboration et al., 2016a). The resulting reionisation history has its midpoint at for a duration of and gives an integrated Thomson optical depth of – the evolution of the global ionised fraction of the simulation with redshift is given on Fig. 7.
The output of the simulation is a 3D field but we choose to analyse 2D slices rather than the full 3D field to be closer to actual observations. We also convert the given field into a binary field in order to get positive triangle correlations when most of the field is ionised and negative correlations when the sky is mainly neutral (see Eq. 7b for and discussion at the end of Sec. 3), in continuity with previous sections. Fig. 7 presents the triangle correlations computed for 2D slices at redshifts and corresponding to global ionisation levels of and respectively, along with a picture of the corresponding field. At high redshift, in what is often referred to as the preoverlap phase, there is more power at small scales but there is no clear peak in our signal. This is likely due the variety of regions sizes: many small ionised regions around young sources parcel the neutral background out. However, by looking at the integrated signal, we can infer an upper limit on the sizes of the ionised regions: at , scales smaller than (after conversion with Eq. 16) contribute for of the signal – integrated over the range ; at and , this upper limit increases respectively to and . The structure of the ionisation field at high redshift directly relates to the way the 21CMFAST algorithm is constructed and particularly to the use of excursion set theory. In excursion set theory, when the average density of a region exceeds a given threshold, it collapses to form a halo. When applied to reionisation, the threshold additionally considers ionising photons production: if the region has produced a sufficient number of ionising photons with respect to its mass and volume, then it is considered ionised. Because of this, the ionisation field in the early stages of reionisation will always have many very small sources rather than a few efficient sources. It would be interesting to apply triangle correlations to other types of reionisation simulations, in particular simulations where reionisation is led by Active Galactic Nuclei (AGN), where ionising sources are more scarce and have a better ionising efficiency so that the topology might be closer to the toy models mentioned in Sec. 4.1. In this perspective, our method may be able to differentiate between different reionisation scenarios. Such a study goes beyond the scope of this work but we refer the interested reader to Watkinson et al. (2019), where the authors use the bispectrum as a probe for nonGaussianities due to Xray heating.
Later on, when the global ionisation fraction reaches values between and , negative signal coming from neutral regions overlaps with the positive signal linked to ionised regions. The triangle correlation function flattens, and therefore cannot give information about the morphology of the field. However, measuring a flat signal from actual data could be interpreted as the reionisation process being in its middle stages. For , most of the sky is ionised and only a few remote neutral islands remain. This is the postoverlap phase. We see on Fig. 7 that the sizes of these neutral islands are efficiently picked up by the triangle correlation function: for , there is a very clear negative peak at scales which correspond to a radius size of once the linear relation in Eq. 16 is applied as correction. If we estimate by hand the size of the neutral zones, we find that they are about in radius, which is very close to the estimation given by triangle correlations. For , we see two clear peaks in the signal, corresponding to the two sizes of ionised regions seen on the real space field (left panel): two large ones at the bottom and two smaller ones at the top. The first peak is spread over scales i.e. , while the second one is more narrow, centred around . For comparison, we find that we can sieve the two larger neutral islands in the field with disks of radius ; whereas the smaller ones can fit in disks whose radii range from to . This motivates a further study of how well our estimator performs on 21CMFAST compared to algorithms designed to derive bubble size distributions (BSDs).
RMFP  SPA  

13.0  0.036  < 22.5  2.01  < 5.9 
12.0  0.062  < 24.5  2.34  < 7.4 
11.0  0.010  < 26.3  3.12  < 8.9 
6.4  0.971  6.2 & 20.2  7.0  13.9 
6.2  0.994  7.7  7.0  9.4 
Because our method gives interesting results on outputs of the 21CMFAST simulation, we challenge it with other methods found in the literature to estimate the BSD of a 21cm tomographic image, namely the spherical average code (SPA) and the random mean free path method (RMFP), directly implemented in the 21CMFAST package. Note that both these methods need to be applied to real space images and not directly Fourier data as ours. We choose not to compare our results to the friendoffriends algorithm (Iliev et al., 2006) because it considers overlapping bubbles as a unique large ionised region, which is fundamentally different from our approach. We use a version of the SPA algorithm implemented by the authors of Giri et al. (2018a)^{10}^{10}10tools21cm are found on https://github.com/sambitgiri/tools21cm.. Recall that the SPA algorithm looks for the largest sphere around each ionised region whose ionisation level exceeds a given threshold, here chosen to be . RMFP, on the other side, relies on MCMC: it looks for the first neutral cell encountered in a random direction starting from an ionised pixel and records the length of the ray to build an histogram. The outputs of the RMFP and SPA algorithms are the probability distribution of radii and not a delta function at the exact scale: they are called diffusive estimators of the bubble size. Results can be seen on Fig. 8 for and 12.0 and are detailed in Table 1. For clarity, i.e. to have a positive signal for all redshifts, we have plotted in the first two panels. At low redshift, our method mostly agree with the other BSDs, although the triangle correlation function has a more narrow peak; we are also able to clearly distinguish between two characteristic scales for , whereas SPA et RMFP only give one peaking scale, apparently corresponding to the smaller neutral regions. At higher redshift, the SPA distributions tend to and we choose to compare values of quantiles rather than maximum likelihood values: at , according to SPA, there is a probability that the ionised regions have a radius smaller than ; similarly, scales represent of the integrated triangle correlations. These upper limits, computed for both and , are shown as vertical lines on the figure. RMFP gives a maximum likelihood radius of (the error corresponding to the bin size), therefore the most quantitative result. For all three methods, the upper limit increases as redshift decreases, which corresponds to ionised regions growing with time. The upper limit given by the SPA algorithm is the lowest which corroborates the results of Lin et al. (2016), who argue that SPA was heavily biased towards smaller bubble radii than the actual bubble sizes: for a single bubble of radius , they find that the SPA probability distribution peaks at . On the contrary, they find the RMFP method to be unbiased and to peak at the correct bubble size. Note that on the two right panels of Fig. 8, we see some signal at large scales. This can be related to statistical noise on one side – we analyse a unique slice of the simulation; and to the potential correlations between separate bubbles, since they are not randomly distributed anymore.
Overall, our estimator performs well with respect to comparable BSD algorithms. However, for this method to be an useful tool in the analysis of upcoming radio observations, a deeper analysis, including the study of different types of reionisation simulations, would be required. In practise, we would also use this method as a forward modelling process, and compare triangle correlations from observations to a set of predicted signals for different types of ionisation fields. We keep the construction of such a database to future work.
5 Relation to observations
5.1 Visibilities and closure phases
When gathering data with an interferometer, the signal will be measured for pairs of antennae separated by a baseline . The measurement is made in terms of a complex visibility , where and are the projection of the baseline in wavelength units on the plane perpendicular to the vector pointing to the phase reference position, i.e. the centre of the field to be imaged. In Eq. 3, the brightness temperature depends on the position in the sky where the signal was observed: . will be the intensity of the 21cm signal observed through the interferometer for a given frequency (i.e. for a given redshift, since )^{11}^{11}11We assume the instrument to have narrow bandpass filters and so to probe exactly rather than a frequency band centred on .. In what follows, we will use and interchangebly. If the interferometer probes a sufficiently small region of the sky compared to the beam width of the antennae, we can approximate this region by a flat plane. This is called the flatsky approximation. Then the source intensity distribution is a function of two real spatial variables and the vanCittert theorem tells us that the complex visibility is the 2D inverse Fourier transform of the intensity, corrected by the normalised average effective collecting area of the two antennae (Thompson et al., 2017):
(18) 
that we can rewrite as
(19) 
where denotes a convolution, and are respectively the 2D Fourier transforms of and and we define . Note that this approximation introduces a phase error that may need to be taken into account when applying our method, based on phase information. Because is a measurable instrumental characteristic, if we know our instrument sufficiently well, we can deconvolve the measured complex visibilities by and obtain, after correcting for the different prefactors, the terms to compute the triangle correlation function.
One of the interests methods based on phase information is that the phases of the measured visibilities, combined in a bispectrum, will not be sensitive to errors related to calibration (Jennison, 1958; Monnier, 2007). Consider the visibility measured between two antennae and at a given frequency . It will have contributions from the true visibility , coming from the cosmological signal, but also from the amplitude and phase errors of each antenna, modelled by a complex gain , such that:
(20)  
where denotes the complex conjugate. The amplitude of this gain will come from beam specific effects such as mirror reflectivity, detector sensitivity or local scintillations whereas the phase term can originate either from telescope errors or from outside effects such as atmospheric turbulence (Monnier, 2007). If we combine the signal from three antennae forming a closed triangle, we can avoid this phase error and we will be left only with what is called the closure phase. Indeed, consider three baselines , and observing at the same given frequency . We can write the bispectrum of their complex visibilities as
(21)  
where in the last step, the different phase terms cancel each other out: the phase of the measured bispectrum is the phase of the true bispectrum. Our definition of the bispectrum in Eq. 7b assumed a closed triangle configuration, the third vector being implicitly defined by . Because we choose to work in two dimensions, the three vectors lie in the same plane on the sky, perpendicular to the lineofsight, and are measured for the same frequency so that the closure relation holds. We will then be able to use our method on observational data without worrying about calibration errors. For an example of the use of bispectrum closure phases in interferometry from an observational point of view, we refer the reader to Thyagarajan et al. (2018). In this work, the authors compare the bispectrum phase spectra coming from different components of a simulated signal, i.e. a single point source, diffuse foregrounds and fluctuations from the EoR and demonstrate that a quantitative relationship exists between the EoR signal strength and the whole bispectrum phase power spectra.
These properties are a major benefit of our technique compared to other solutions found in the literature to estimate the characteristic size of ionised regions from interferometric data as these always require to reconstruct the realspace image corresponding to observations: our method can directly use as input the complex visibilities observed by an interferometer, and if we choose to use closure phases, results will be independent of antennabased calibration and calibration errors. However, in practise, there are some limitations to the use of the closure relation.
First, only a limited number of triangles can be constructed from the array of antennae of a telescope, therefore some information will be lost compared to simple baseline measurements. Monnier (2007) count that for an array made of antennae, there are possible closed triangles, independent Fourier phases and independent closure phases. Therefore the amount of phase information recovered from closure phases is
(22) 
With as little as 40 antennae, we are able to recover of the phase information but 1000 antennae are not enough to reach . Note that for the 296 antennae of the SKA1Low central array and the 48 antennae of LOFAR, we recover respectively and of the phase information from closure phases. However, even if almost all the phase information is recovered, Readhead et al. (1988) find that the noise level of a phaseonly observation will still be at least twice higher than a map made from full visibility data, because when ignoring amplitudes, half of the signal is lost. Additionally, Readhead et al. (1988) show that because the bispectrum is a triple product, there will be new sources of noise compared to single baseline observations. First, if the same signal is measured on different time intervals, the observed bispectrum will not only be the product of three complex numbers anymore, but of the sum of each observation on each time interval:
(23) 
where is the visibility measured for baseline on time interval . Then the cross terms combining signals integrated on different time intervals (e.g. and ) will give incoherent phase terms that can be assimilated to noise. The second potential source of noise mentioned by Readhead et al. (1988) corresponds to the same kind of reasoning, but in the spatial domain: if we consider the triangle formed by three baselines , then on a redundant array there will be contributions not only from the triangles but also from identical baselines who are not part of a triangle. See the example on Fig. 9 of a redundant array: all numbered vectors correspond to the same baseline but only 1, 2 and 4 form triangles. When probing triangles, the measured bispectrum will take the sum of the four signals as the visibility corresponding to this baseline (e.g. ) and additional cross terms, irrelevant to closure phases because they include , will arise.
Readhead et al. (1988) show however than these two types of noise can be reduced to a sensible signaltonoise ratio if enough frames are used in the integration. Similarly, the authors of Carilli et al. (2018) mention that some instrumental effects such as polarisation leakage or cross coupling of antennae, called "closure errors" can lead to a departure from the closure relation.
Finally, the limited number of triangles one can construct from given antennae will limit the sampling of space and worsen the sparsity of observations. Applying our method to sparse data goes beyond the scope of this work, but it would be interesting to see how triangle correlations perform with this additional difficulty. For now, we will limit ourselves to noisy data sets.
5.2 Instrumental effects
To see how our method performs when applied to actual observations, we now add instrumental effects to our ionisation maps: beam smoothing and noise corresponding to observations by the core of SKA1Low, its central area, and by LOFAR. We pick three simulated ionisation fields: the first comes from the toy models described in Section 4. It is made of 70 bubbles of radius , and assumed to correspond to a redshift . The two others are extracted from 21CMFAST at redshifts and 6.4. They are shown, in this order, on the left panels of Fig. 10. We first convert each one of them into a brightness temperature map according to Eq. 3 with the following cosmology: , and .
The resolution of 21cm tomographic data will be first limited by the angular resolution of the interferometer considered. The full width at half maximum (FWHM) of an interferometer is given by (in radians):
(24) 
where is the redshifted 21cm wavelength i.e. and is the maximum baseline of the interferometer. We have , and for LOFAR, SKA1Low core and SKA1Low central respectively. To account for this effect, we convolve the map with a Gaussian kernel of FWHM , with the comoving distance at the redshift (i.e. frequency) considered. For SKA central, we get the secondtoleft panels on the figure – note that because they have similar maximum baselines, the angular resolution of SKA central is close the one of LOFAR. For SKA core, because is much smaller, the smoothing blurs the shape of the ionised bubbles to an extreme point, and our method will perform poorly. Finally, we simulate realistic instrumental noise by using the measurement equation software OSKAR^{12}^{12}12https://github.com/OxfordSKA/OSKAR.
Clean  Smoothed  Smoothing  Smoothing  

signal  signal  + 1000h  scale  
LOFAR  6.7  6.6  7.4  4.5 
SKA1Low core  6.7  8.2  –  17.0 
SKA1Low central  6.7  6.7  6.9  5.3 
Fig. 10 presents the results: the first column corresponds to triangle correlations for a clean field. We identify the peaking scale for this clean field in order to compare it to the scale picked on corrupted data later on and indicate it as a dotted blue line on each plot. Note that for the third field (21CMFAST at ), there are two scales picked up. The larger one is well defined, whereas we use an interval for the smaller one. On the figure, the second column is for the field smoothed with the angular resolution of SKA1Low central; and the last three for the smoothed signal with instrumental noise from each of the three experiments considered (respectively SKA1Low core, central and LOFAR) and 1000 hours of integration time. The dashdotted lines on each plot mark the smoothing scale of the corresponding experiment, whose values are given in Table 2. One can see that results with SKA core are not satisfying because of a too low angular resolution that blurs the edges of the ionised regions and prevents the triangle correlation function from picking up any characteristic scale: the signal is mostly flat or seems to pick up the smoothing scale. However, results for SKA central and LOFAR prove very satisfactory on the toy model (first row), with a clear peak at a scale close to the one found with clean signal (see Table 2 for details): applying the linear relation of Eq. 16 to the peaking scale could give a good approximation of the actual size of ionised regions in the image. When the integration time is increased, we get for both LOFAR and SKA1Low central a peaking scale even closer to the one found for clean signal. Unfortunately, the LOFAR sensitivity does not allow to extract much information from the 21CMFAST maps: at the signal is mostly flat, and although there is a clear peak for the map, it cannot resolve the two characteristic scales, however picked up when accounting for SKA central sensitivity. Note that these results do not change when the integration time is increased to hours. We have also tried our statistic on 21CMFAST boxes at higher redshifts, and we find that, because there is no clear characteristic scale in the field (see Section 4.3), the triangle correlation function peaks at the smoothing scale corresponding to the telescope considered (listed in Table 2). There is however no risk of misinterpretation of this peak since real data would be deconvolved from telescope properties such as angular resolution before being analysed.
Based on Chapman et al. (2015), who presented efficient foreground removal techniques that will produce good quality 21cm maps, in all the work above, we assumed that foreground pollution was completely removed from the data cubes. However, foreground residuals could still impact our results. A precise instrumental calibration is usually required to separate the 21cm signal from foregrounds and is one of the main challenges of upcoming experiments although as mentioned before, it seems that bispectrum phases and therefore our results would be unaffected by calibration errors (Thyagarajan et al., 2018).
6 Why phase information?
In order to see if our phaseonly estimator truly supplements power spectrum information, we generate a 2D ionisation field composed of 70 binary disks of radius and compute its 2point correlation function and its triangle correlations. We then shuffle its Fourier phases – replace them by random phases ranging from 0 to and compute the corresponding 2PCF and triangle correlations. Results can be seen on the upper and lower panels of Fig. 11 respectively. On the left panels, we see that reshuffling the phases has made the field lose all its structure: there are no bubbles any more. Because we kept the absolute value of the field unchanged, the 2PCFs in the second column are exactly identical. However, the extra information carried by phases is clearly visible when comparing the triangle correlation functions on the righthand panels: there is almost no signal for the field with random phases whereas the field with structured phases exhibits a clear peak.
For completeness, we compute three different types of triangle correlations encompassing more or less phase information and compare results on Fig. 12. We do so for three boxes filled with 70 binary bubbles: the first (upper panel) with bubbles of radius , the second (middle panel) and the third (lower panel) .

The second includes the full bispectrum, amplitude included (purple dashed line on Fig. 12):
(25) 
The third only uses the amplitude of the bispectrum (green dotted line on Fig. 12):
(26)
We see that the main advantage of the phaseonly triangle correlation function is that, compared to statistics using bispectrum amplitude, the peaking scale is much better defined. It is also less sensitive to the filling fraction than the two other functions: in the bottom panel, where bubbles are larger and the box filling fraction higher, phase correlations still exhibit a welldefined peak whereas the other two start to flatten.
7 Computational performance
In Sec. 4.3 we compared the results of different methods to derive a bubble size distribution or equivalent, but we did not mention computational performance. Note that our code has been parallelised via OpenMP (OpenMP Architecture Review Board, 2013) in order to reduce computing times: because most of the code consists of sums, i.e. loops, the parallelisation is very efficient. Three parameters will play an essential role in determining the computing time necessary to evaluate triangle correlations:

The number of pixels in the box.

The range of correlations scales for which we compute the triangle correlations: because of the limits of our sum (, see Eq. 15), there will be more terms to sum over for smaller scales. Therefore, in this work, we have chosen to compute triangle correlations for .

The box side length , because it determines the sampling of cells: for smaller , there will be more modes with norm smaller than and hence more terms to sum over.
Most of the boxes analysed in this work had pixels for a side length and triangles correlations were computed for , for an average computation time of 180 minutes on 20 cores. For comparison, the SPA algorithm, as implemented by Giri et al. (2018a), takes about half an hour to run on a 3D box with same side length and sampling; whereas RMFP, as implemented in 21CMFAST, only takes a couple minutes. Therefore more work is required to improve the computational efficiency of our algorithm further and so its usability.
8 Conclusions
Following the work of O13, we have constructed a new statistical tool, based on phase information only and called the triangle correlation function, which can be used to determine the characteristic scale of ionised regions on 21cm interferometric data from the Epoch of Reionisation. Indeed, if we plot this function over a range of correlation scales for simple fields, made of perfectly spherical fully ionised regions on a neutral background, we find a peaked signal, and the peaking scale can be directly related to the actual size of the bubbles. From such toy models, we have derived important properties for our estimator: it can differentiate between ionised regions of different sizes, and distinguish spherical from elongated structures. We have also found its results to be more reliable on scales smaller than about a twentieth of the physical side length of the field studied, as the finite size of the box implies more sample variance at larger scales. To see if our method can be applied to observational data, we have confronted our estimator with ionisation fields corrupted by instrumental noise or angular resolution from LOFAR or SKA, and we have found that it still performs well at giving the characteristic scale of ionised regions, as long as the integration time is sufficient – here, we have worked with 1000 hours. By comparing results for 3PCF including amplitude information to , we also proved that a statistical tool using phase information only will be more efficient at picking up the characteristic scale of spherical structures in a field than a correlation function also using information from the amplitude of the bispectrum. Indeed, the phaseonly function is more peaked, allowing a better identification of the characteristic scale. Moving on to the more elaborate reionisation simulation 21CMFAST (Mesinger & Furlanetto, 2007), we have found that our method gives a satisfying estimation of the size of remote neutral islands at the very end of the reionisation process. In particular, and contrarily to other BSD algorithms such as RMFP and SPA, it is able to resolve two characteristic scales in a field. In the early stages of EoR, because there are many very small ionised regions covering the neutral background, the signal is dominated by Poisson noise, and there is no clear characteristic scale to pick up. Therefore, for fields with a very low global ionised fraction, we rather use our method to infer an upper limit on the size of ionised regions. Note that during the overlap phase, positive correlations from ionised regions overlap with negative signal from neutral zones and the signal flattens out: an absence of signal can be interpreted as reionisation being in its middle stages and so we can learn about the duration of the overlap phase. In general, it will be more difficult to extract phase information from non binary ionisation field because of the more complex structure of the field, which is reverberated in the phase information: Fig. 1 compares the realspace phase information of one of our toy models with for one of our 21CMFAST boxes. These results, on both toy models and more elaboration simulations, hold when instrumental effects such as telescope angular resolution and instrumental noise are added to clean maps.
Apart from noise and instrumental resolution, there will be some other important questions to solve before applying our method to true 21cm tomographic data. Kakiichi et al. (2017) proved that the cold spots in 21cm tomographic images trace regions more efficiently at low redshift () and high filling fraction (): earlier in the reionisation process, cold spots due to a local underdensity rather than ionisation are more frequent (see Sec. 2). Corroborated with the fact that our estimator performs well on remote neutral islands at the end of the reionisation process, we can expect to get good results on low redshift data. However, correctly identifying ionised regions in 21cm observations i.e. transforming a map of the differential brightness temperature into a binary field made of fully ionised and fully neutral regions remains a challenge – for an extended discussion of this issue, see Giri et al. (2018a). Note that this is not an issue for the friendsoffriends (Iliev et al., 2006; Friedrich et al., 2011) and watershed methods (Lin et al., 2016) since they segment the data themselves. Additionally, it remains to be seen exactly how to exploit closure phases in practice, and how the quality of the data, especially its sparsity, would impact our results. Finally, the practical application of triangle correlations to observational data would require forwardmodelling i.e. comparing measurements to the signal obtained for a number of simulations corresponding to various parametrisations of reionisation to infer the reionisation scenario corresponding to the observation. In this perspective, one would need to compute the triangle correlations of many more simulations.
Acknowledgements
The authors thank Catherine A. Watkinson, Tom Binnie, Marian Douspis and Benoit Semelin for useful discussions on various aspects of the analysis. They also thank Emma Chapman for kindly providing results from OSKAR runs to perform our noise analysis and Nithyanandan Thyagarajan for his helpful comments on a draft version of this paper. AG and JRP acknowledge financial support from the European Research Council under ERC grant number 638743FIRSTDAWN. AG’s work is supported by a PhD studentship from the UK Science and Technology Facilities Council (STFC).
The work presented in this paper made use of astropy, a communitydeveloped core Python package for astronomy (Astropy Collaboration et al., 2013, 2018); matplotlib, a Python library for publication quality graphics (Hunter, 2007); scipy, a Pythonbased ecosystem of opensource software for mathematics, science, and engineering (Jones et al., 2001) – including numpy (Oliphant, 2006), and emcee, an implementation of the affine invariant MCMC ensemble sampler (ForemanMackey et al., 2013).
References
 Astropy Collaboration et al. (2013) Astropy Collaboration et al., 2013, A&A, 558, A33
 Astropy Collaboration et al. (2018) Astropy Collaboration et al., 2018, preprint, (arXiv:1801.02634)
 Bag et al. (2018) Bag S., Mondal R., Sarkar P., Bharadwaj S., Sahni V., 2018, Monthly Notices of the Royal Astronomical Society, 477, 1984
 Bharadwaj & Pandey (2005) Bharadwaj S., Pandey S. K., 2005, MNRAS, 358, 968
 Bouwens et al. (2015) Bouwens R. J., Illingworth G. D., Oesch P. A., Caruana J., Holwerda B., Smit R., Wilkins S., 2015, ApJ, 811, 140
 Bouwens et al. (2017) Bouwens R. J., Oesch P. A., Illingworth G. D., Ellis R. S., Stefanon M., 2017, ApJ, 843, 129
 Carilli et al. (2018) Carilli C. L., et al., 2018, preprint, (arXiv:1805.00953)
 Chapman et al. (2015) Chapman E., et al., 2015, Advancing Astrophysics with the Square Kilometre Array (AASKA14), p. 5
 Chen et al. (2018) Chen Z., Xu Y., Wang Y., Chen X., 2018, arXiv eprints, p. arXiv:1812.10333
 Ciardi & Madau (2003) Ciardi B., Madau P., 2003, ApJ, 596, 1
 Eggemeier et al. (2015) Eggemeier A., Battefeld T., Smith R. E., Niemeyer J., 2015, MNRAS, 453, 797
 Elbers & van de Weygaert (2018) Elbers W., van de Weygaert R., 2018, arXiv eprints, p. arXiv:1812.00462
 ForemanMackey et al. (2013) ForemanMackey D., Hogg D. W., Lang D., Goodman J., 2013, PASP, 125, 306
 Friedrich et al. (2011) Friedrich M. M., Mellema G., Alvarez M. A., Shapiro P. R., Iliev I. T., 2011, MNRAS, 413, 1353
 Furlanetto et al. (2004a) Furlanetto S. R., Zaldarriaga M., Hernquist L., 2004a, ApJ, 613, 1
 Furlanetto et al. (2004b) Furlanetto S. R., Zaldarriaga M., Hernquist L., 2004b, ApJ, 613, 16
 Giri et al. (2018a) Giri S. K., Mellema G., Dixon K. L., Iliev I. T., 2018a, MNRAS, 473, 2949
 Giri et al. (2018b) Giri S. K., Mellema G., Ghara R., 2018b, MNRAS, 479, 5596
 Gleser et al. (2006) Gleser L., Nusser A., Ciardi B., Desjacques V., 2006, MNRAS, 370, 1329
 Gorce et al. (2018) Gorce A., Douspis M., Aghanim N., Langer M., 2018, A&A, 616, 113
 Greig et al. (2017) Greig B., Mesinger A., Haiman Z., Simcoe R. A., 2017, MNRAS, 466, 4239
 Greig et al. (2018) Greig B., Mesinger A., Bañados E., 2018, preprint, p. arXiv:1807.01593 (arXiv:1807.01593)
 Harker et al. (2009) Harker G. J. A., et al., 2009, MNRAS, 393, 1449
 Hunter (2007) Hunter J. D., 2007, Computing In Science & Engineering, 9, 90
 Iliev et al. (2006) Iliev I. T., Mellema G., Pen U.L., Merz H., Shapiro P. R., Alvarez M. A., 2006, MNRAS, 369, 1625
 Ishigaki et al. (2015) Ishigaki M., Kawamata R., Ouchi M., Oguri M., Shimasaku K., Ono Y., 2015, ApJ, 799, 12
 Jennison (1958) Jennison R. C., 1958, MNRAS, 118, 276
 Jones et al. (2001) Jones E., Oliphant T., Peterson P., et al., 2001, SciPy: Open source scientific tools for Python, http://www.scipy.org/
 Kakiichi et al. (2017) Kakiichi K., et al., 2017, MNRAS, 471, 1936
 Konno et al. (2014) Konno A., et al., 2014, ApJ, 797, 16
 Lee et al. (2008) Lee K.G., Cen R., Gott J. Richard I., Trac H., 2008, ApJ, 675, 8
 Levrier et al. (2006) Levrier F., Falgarone E., Viallefond F., 2006, A&A, 456, 205
 Lin et al. (2016) Lin Y., Oh S. P., Furlanetto S. R., Sutter P. M., 2016, MNRAS, 461, 3361
 Majumdar et al. (2018) Majumdar S., Pritchard J. R., Mondal R., Watkinson C. A., Bharadwaj S., Mellema G., 2018, MNRAS, 476, 4007
 Mason et al. (2018) Mason C. A., Treu T., Dijkstra M., Mesinger A., Trenti M., Pentericci L., de Barros S., Vanzella E., 2018, ApJ, 856, 2
 Mesinger & Furlanetto (2007) Mesinger A., Furlanetto S., 2007, ApJ, 669, 663
 Mesinger et al. (2011) Mesinger A., Furlanetto S., Cen R., 2011, MNRAS, 411, 955
 Mondal et al. (2015) Mondal R., Bharadwaj S., Majumdar S., Bera A., Acharyya A., 2015, MNRAS, 449, L41
 Monnier (2007) Monnier J. D., 2007, New Astron. Rev., 51, 604
 Obreschkow et al. (2013) Obreschkow D., Power C., Bruderer M., Bonvin C., 2013, ApJ, 762, 115
 Oliphant (2006) Oliphant T., 2006, NumPy: A guide to NumPy, USA: Trelgol Publishing, http://www.numpy.org/
 OpenMP Architecture Review Board (2013) OpenMP Architecture Review Board 2013, OpenMP Application Program Interface Version 4.0, https://www.openmp.org/wpcontent/uploads/OpenMP4.0C.pdf
 Planck Collaboration et al. (2016a) Planck Collaboration et al., 2016a, A&A, 594, A13
 Planck Collaboration et al. (2016b) Planck Collaboration et al., 2016b, A&A, 596, A108
 Planck Collaboration et al. (2018) Planck Collaboration et al., 2018, preprint, (arXiv:1807.06209)
 Pritchard & Loeb (2012) Pritchard J. R., Loeb A., 2012, Reports on Progress in Physics, 75
 Readhead et al. (1988) Readhead A. C. S., Nakajima T. S., Pearson T. J., Neugebauer G., Oke J. B., Sargent W. L. W., 1988, AJ, 95, 1278
 Robertson et al. (2015) Robertson B. E., Ellis R. S., Furlanetto S. R., Dunlop J. S., 2015, ApJ, 802, L19
 Schenker et al. (2014) Schenker M. A., Ellis R. S., Konidaris N. P., Stark D. P., 2014, The Astrophysical Journal, 795, 20
 Thompson et al. (2017) Thompson A. R., Moran J. M., Swenson George W. J., 2017, Interferometry and Synthesis in Radio Astronomy, 3rd Edition, doi:10.1007/9783319444314.
 Thyagarajan et al. (2018) Thyagarajan N., Carilli C. L., Nikolic B., 2018, Phys. Rev. Lett., 120, 251301
 Totani et al. (2014) Totani T., et al., 2014, Publications of the Astronomical Society of Japan, 66, 63
 Visbal & Loeb (2012) Visbal E., Loeb A., 2012, Journal of Cosmology and AstroParticle Physics, 2012, 007
 Watkinson et al. (2017) Watkinson C. A., Majumdar S., Pritchard J. R., Mondal R., 2017, MNRAS, 472, 2436
 Watkinson et al. (2019) Watkinson C. A., Giri S. K., Ross H. E., Dixon K. L., Iliev I. T., Mellema G., Pritchard J. R., 2019, MNRAS, 482, 2653
 Watts et al. (2003) Watts P., Coles P., Melott A., 2003, ApJ, 589, L61
 Wolstenhulme et al. (2015) Wolstenhulme R., Bonvin C., Obreschkow D., 2015, ApJ, 804, 132
 Zahn et al. (2007) Zahn O., Lidz A., McQuinn M., Dutta S., Hernquist L., Zaldarriaga M., Furlanetto S. R., 2007, ApJ, 654, 12
Appendix A Analytical derivation for a toy model
Consider a 2D box of volume filled with fully ionised bubbles of radius , randomly distributed throughout the box so that their centres are located at for . The ionisation field of the box is then
(27) 
where is the Heaviside step function worth 1 if and 0 otherwise. We need to take the Fourier transform of this field, derived as follows:
(28)  
where to go from the first to the second line, we have used the change of variables . From the fourth to the fifth, we have let to be able to use:
(29) 
If we define , the Fourier transform of the tophat window function in 2D, we have the final expression of the Fourier transform of our ionisation field:
(30) 
Indeed, if we let the mean number density of ionised bubbles such that and ignore overlapping^{13}^{13}13This will be a reasonable assumption for the filling fractions considered., . From this we deduce analytic expressions for the power spectrum
(31)  
which is a real number; and the bispectrum of the box
(32)  
To find the analytic expression of our triangle correlations function for this toy model, we then plug Eq. 32 into Eq. 15.
In Fig. 13, we compare the result of a numerical integration of the triangle correlation function computed from the bispectrum in Eq. 7b with a version where we compute analytically the triangle correlation function, knowing the locations of the ionised bubbles and the size of the box only, according to Eq. 32 and detailed above. We consider 20 realisations of a box of dimensions for pixels, filled with 20 bubbles of radius ^{14}^{14}14The choices of box size and bubble number are limited by the computational cost of the analytic method.. We see that there is a good match between both computational methods: we have the confirmation that our estimator indeed traces the bubble distribution in the ionisation field considered.