The Scale of the Problem : Recovering Images of Reionization with GMCA
Abstract
The accurate and precise removal of 21cm foregrounds from Epoch of Reionization redshifted 21cm emission data is essential if we are to gain insight into an unexplored cosmological era. We apply a nonparametric technique, Generalized Morphological Component Analysis or gmca, to simulated LOFAREoR data and show that it has the ability to clean the foregrounds with high accuracy. We recover the 21cm 1D, 2D and 3D power spectra with high accuracy across an impressive range of frequencies and scales. We show that gmca preserves the 21cm phase information, especially when the smallest spatial scale data is discarded. While it has been shown that LOFAREoR image recovery is theoretically possible using image smoothing, we add that wavelet decomposition is an efficient way of recovering 21cm signal maps to the same or greater order of accuracy with more flexibility. By comparing the gmca output residual maps (equal to the noise, 21cm signal and any foreground fitting errors) with the 21cm maps at one frequency and discarding the smaller wavelet scale information, we find a correlation coefficient of 0.689, compared to 0.588 for the equivalently smoothed image. Considering only the pixels in a central patch covering 50 of the total map area, these coefficients improve to 0.905 and 0.605 respectively and we conclude that wavelet decomposition is a significantly more powerful method to denoise reconstructed 21cm maps than smoothing.
keywords:
cosmology: theory – dark ages, reionization, first stars – diffuse radiation – methods: statistical.1 Introduction
When the first ionizing sources appeared 400 million years after the Big Bang, the Universe emerged from the ‘Dark Ages’ and began to be reionized. This Epoch of Reionization (EoR) is on the verge of being directly observed for the first time, with a new generation of radio telescopes beginning to see first light (e.g. Low Frequency Array (LOFAR)^{1}^{1}1http://www.lofar.org/ (van Haarlem et al., in preparation), Giant Metrewave Radio Telescope (GMRT)^{2}^{2}2http://gmrt.ncra.tifr.res.in/, Murchison Widefield Array (MWA)^{3}^{3}3http://www.mwatelescope.org/,Precision Array to Probe the Epoch of Reionization (PAPER)^{4}^{4}4http://astro.berkeley.edu/ dbacker/eor/, 21 Centimeter Array (21CMA)^{5}^{5}5http://21cma.bao.ac.cn/).
The vast majority of EoR observations will take advantage of the 21cm spectral line  produced by a spin flip in neutral hydrogen (van de Hulst, 1945; Ewen & Purcell, 1951; Muller & Oort, 1951). This 21cm radiation can be observed interferometrically at radio wavelengths as a deviation from the brightness temperature of the CMB (Field 1958; Field 1959; Madau, Meiksin & Rees 1997; Shaver et al. 1999).
Observationally, the 21cm signal will be accompanied by systematic effects due to the ionosphere and instrument response, system noise, extragalactic foregrounds and Galactic foregrounds (e.g. Jelić et al., 2008, 2010), the latter of which are orders of magnitude larger than the 21cm signal we wish to detect. The foregrounds must be accurately and precisely removed from the observed data as any error at this stage has the ability to strongly affect the EoR 21cm signal. Foreground removal and the implications for 21cm cosmology have been extensively researched over the past decade (e.g Di Matteo et al. 2002; Oh & Mack 2003; Di Matteo, Ciardi & Miniati 2004; Zaldarriaga, Furlanetto & Hernquist 2004; Morales & Hewitt 2004; Santos, Cooray & Knox 2005; Wang et al. 2006; McQuinn et al. 2006; Jelić et al. 2008; Gleser, Nusser & Benson 2008; Bowman, Morales & Hewitt 2006; Harker et al. 2009; Liu, Tegmark & Zaldarriaga 2009; Liu et al. 2009; Harker et al. 2010; Liu & Tegmark 2011; Petrovic & Oh 2011; Mao 2012; Liu & Tegmark 2012; Cho, Lazarian & Timbie 2012; Chapman et al. 2012). This paper concentrates on the spectral fitting of the foregrounds and assumes that bright sources have been accurately removed, for example via a flux cut (Di Matteo et al., 2004). This is a safe assumption since, as recently shown by Trott, Wayth & Tingay 2012, the residuals from the bright source subtraction are expected to be smaller than the thermal noise contribution and so are not a limiting factor.
In our first paper, Chapman et al. (2012), we introduced a method to successfully remove the foregrounds while making only minimal assumptions. By introducing another similarly successful method, we acknowledge the possibility that different methods will be well suited for the extraction of different information from the data and that there is an advantage in having several foreground cleaning methods to apply the data independently to confirm a statistical detection. In this paper we implement another nonparametric method, the sparsitybased blind source separation (BSS) technique Generalized Morphological Component Analysis, or gmca. gmca provides a complete basis set for foreground removal as opposed to a polynomial fitting method which is not a complete set unless we describe it with a polynomial of the order of the number of frequencies. Polynomial methods rarely utilise orders larger than five and so most polynomial fitting methods may leave the foregrounds incompletely described compared to gmca.
In Section 2 we summarise the various foreground removal pipelines which have been introduced in the literature so far and then go on to detail the method that we utilise, gmca. In Section 3 we introduce our data cube and the methods used to produce it before presenting the statistical results in Section 4. In Section 5 we explore the possibility of recovering images of reionization before we set out our conclusions in Section 6.
2 Foreground Removal Techniques
The statistical detection of the 21cm reionization signal depends on an accurate and robust method for removing the foregrounds from the total observed signal. For a brief summary of the recent 21cm foreground removal literature we ask the reader to refer to Section 2 of Chapman et al. (2012).
The majority of the literature involves parametric methods, whereby at some point a certain form for the foregrounds is assumed, for example polynomials (e.g. Santos et al. 2005; Wang et al. 2006; McQuinn et al. 2006; Bowman et al. 2006; Jelić et al. 2008; Gleser et al. 2008; Liu, Tegmark & Zaldarriaga 2009; Liu et al. 2009; Petrovic & Oh 2011). In contrast, nonparametric methods do not assume a specific form for the foregrounds, instead allowing the data to determine the foregrounds using more free parameters. While advantageous for poorly constrained data, results are often not as promising as parametric results, though recent methods have challenged this. Harker et al. (2009, 2010) preferentially considered foreground models with as few inflection points as possible, which when applied to simulated LOFAREoR data compared very favourably with parametric methods. Similarly, fastica as presented in Chapman et al. (2012) accurately recovered the 21cm power spectra by considering the statistically independent components of the foregrounds. gmca (Bobin et al. 2007, Bobin et al. 2008, Bobin, Starck, Moudden & Fadili 2008, Bobin et al. 2012) is another nonparametric method which has a greater flexibility through wavelet choice without the sacrifice of the blind nature of the approach. We will show that gmca not only recovers the power spectra to high accuracy but also that, using wavelet decomposition, the simulated 21cm signal maps can be recovered exceedingly well after the foreground removal process.
2.1 The GMCA method
The nonparametric method of removing the foregrounds is effectively a BSS problem. Chapman et al. (2012) utilised a statistical approach to source separation, namely fastica. This assumed that the components of the foregrounds were statistically independent and nonGaussian in order to reconstruct the smooth spectral form of the foregrounds and leave a residual signal from which we could identify the 21cm emission statistics. This statistical pursuit of independence is only one form that BSS techniques take, the other utilising morphological diversity and sparsity to separate the sources. Zibulevsky & Pearlmutter (2001) proposed a new method of BSS, where one could find a basis set in which the sources to be found would be sparsely represented, i.e. a basis set where only a few of the coefficients would be nonzero. With the sources being unlikely to have the same few nonzero coefficients one could then use this sparsity to more easily separate the mixture. For example, were the 21cm signal strong enough to detect directly using this method, we would expect it to be sparse on certain scales given the characteristic size of the ionized bubbles, much in the same way the SZ effect can be detected with this method when analysing CMB data (Bobin et al., 2008). These bubble sizes change as a function of redshift so the sparse signal from these would arise as a pattern as a function of wavelength. In comparison, the smooth frequency structure of the foregrounds implies that the few sparse nonzero coefficients describing the foregrounds at the same scales as the 21cm signal would be unchanging with frequency. Given our noise realisations, the 21cm signal is far too small for this technique to pick it out as a source in its own right. Instead, it is how the foregrounds can be described as different sparse components which enables us to obtain the 21cm signal and noise as a residual.
The idea of exploiting the sparseness of sources in different bases has evolved into a full and diverse field of applications. The method has evolved to allow sources to have different morphologies, exploit multichannel data and consider different bases for different sources in order to achieve the most sparse representations.
Consider an observation of maps each constituting pixels across channels of observation. The problem to be solved can be stated in the following manner:
(1) 
where X is the matrix representing the observed data, is the number of sources to be estimated, S is the signal matrix to be determined, A is the mixing matrix and N is the noise matrix.
As this is a BSS problem, we need to estimate both S and A. We seek to find the 21cm signal as a residual in the separation process, therefore S represents the foreground signal and, due to the extremely low signaltonoise of this problem, the 21cm signal is numerically ignored by the method and can be thought of as an insignificant part of the noise.
We can expand the sources, , in a wavelet basis which can mathematically be thought of as a matrix of wavelet waveforms, , such that . is defined to be sparse if only a few of the are significantly nonzero.
The objective of GMCA is to seek an unmixing scheme, through the estimation of A, which yields the sparsest sources S in the wavelet domain. This is expressed by the following optimization problem written in the augmented Lagrangian form :
(2) 
where typically ; sparsity is generally enforced for which measures the number of nonzero entries of (or its relaxed convex version with ) and is the Frobenius norm. The problem in Equation 2 is solved in an iterative twostep algorithm such that at each iteration :

Estimation of the S for A fixed to :
Solving the problem in Equation 2 for assuming A is fixed to , the sources are estimated as follows :where stands for the hardthresholding operator with threshold ; this operator puts to zero all coefficients with amplitudes lower than . In practice, the threshold is set to be equal to times the standard deviation of the noise level to exclude noise coefficients. The term denotes the Moore pseudoinverse of the matrix .

Estimation of the A for S fixed to :
Updating the mixing matrix assuming that the sources are known and fixed to is as follows :
For more technical details about GMCA, we refer the interested reader to (Bobin et al. 2007, Bobin et al. 2008, Bobin, Starck, Moudden & Fadili 2008, Bobin et al. 2012), where it is shown that sparsity, as used in GMCA, allows for a more precise estimation of the mixing matrix A and more robustness to noise than ICAbased techniques.
GMCA provides an efficient method of separating the foreground signal from the noise and 21cm signal by locating the most sparse components that the foreground signal could be made of in the wavelet basis . From a Bayesian point of view, using this sparsity method is equivalent to having an inbuilt prior in the model that the foregrounds are sparse over the basis chosen. While a Bayesian evidence analysis could be possible in order to compare different methods as well as different bases, and quantify the overfitting due to more free parameters, we consider it outside the scope of this project.
2.2 Wavelets
The set of basis functions, , used by gmca comprises wavelet functions.
The Fourier transform is a well known method of analysing data at different scales with a single set of basis functions  sines and cosines. In reality this confined basis set can obscure information, and so we instead consider an infinite set of basis functions localized in space  the wavelet functions. There are many types of wavelets  with some more localized in space, some smoother and some with fractal structures.
The most common form of wavelet used in astrophysics is the isotropic undecimated wavelet transform (IUWT) which we describe briefly below in reference to the more complete descriptions in literature (e.g. Starck, Murtagh & Bijaoui 1998, Starck & Murtagh 2006). Consider an image with pixels (where, using the previous section’s notation and we will refer to the pixel coordinates as ).
We can decompose an image at a particular frequency, , into a coarse version of itself, , along with a superposition of the original image at different wavelet scales:
(3) 
where the wavelet coefficient represents the data at scale 2.
The decomposition is typically achieved using lowpass 1D filters, which we call , implemented by the “à trous” algorithm:
(4)  
(5) 
When can be described by only a few significantly nonzero , we say that is sparse in that basis.
In this paper we utilise wavelets twice. The first time is within the gmca algorithm, where we have a choice of different wavelet types that can be used as the basis set. gmca uses wavelet decomposition to identify the sources, S, but then returns a data cube with all scales present. In our analysis in Section 5 we wish to look at these results on different scales and so we utilise wavelet decomposition to do this. We will use the IUWT both within gmca and later to analyse the images at different scales, though we briefly consider other wavelets when we question how much our results depend on this choice of wavelet.
For our gmca implementation we set the number of decomposition scales to be eight, the maximum allowed by the dimensions of our data cube.
3 Simulated EoR Data
We simulate 170 frequency maps between 115 and 199.5 MHz with spacings of 0.5 MHz. The maps consist of pixels representing a comoving 1734 Mpc area. At 10.8 this is equivalent to a field of view of 1010 or a resolution of 1.17 arcminutes per pixel. Since an interferometer like LOFAR is insensitive to the mean value of the brightness temperature, we use meansubtracted maps. We use the same set of simulations as Chapman et al. (2012) and the reader can refer there for more detailed information on the simulations.
The observable of the 21cm radiation, the brightness temperature , is simulated using the seminumeric modelling tool 21cmfast (Mesinger & Furlanetto 2007; Mesinger, Furlanetto & Cen 2011). The code was run using a standard cosmology, ()=(0.72,0.28,0.046,0.96,0.82,73)(Komatsu et al., 2011) and initialised at =300 on a grid. The velocity fields used to perturb the initial conditions as well as the resulting 21cm boxes were formed on a cruder grid of before being interpolated up to . We define halos contributing ionizing photons as having a minimum virial mass of .
The foreground simulations are obtained using the foreground models described in Jelić et al. (2008, 2010). We model Galactic diffuse synchrotron emission (GDSE), Galactic localised synchrotron emission, Galactic diffuse freefree emission and extragalactic foregrounds consisting of contributions from radio galaxies and radio clusters. While the Galactic foreground emission is simulated using Gaussian random fields, the total foreground signal is nonGaussian. We do not consider the polarisation of the foregrounds. The foregrounds simulated here can be up to five orders of magnitude larger than the signal we hope to detect but since interferometers such as LOFAR measure only fluctuations, foreground fluctuations dominate by ‘only’ three orders of magnitude.
To simulate the noise at each frequency, a LOFAR measurement set was filled with Gaussian noise in the uv plane. This was then imaged to create a realspace image, the root mean square of which can be normalized to the value as given by the prescription detailed in Chapman et al. (2012). For example the noise sensitivity at 150 MHz for an integration time of 600 h, a PSF of width 4 arcminutes and a frequency spacing of 0.5 MHz is 64 mK. The 170 noise maps were uncorrelated over frequency  i.e. a different noise realization was used to fill the measurement set for each frequency.
The success of an interferometer such as LOFAR is highly dependent on how uv space is sampled. The particular pattern of uv sampling forms a beam which affects how the components such as the foregrounds are seen by the interferometer. Dirty foreground and 21cm images were simulated by convolving with the PSF of the LOFAR setup used to simulate the noise. The PSF used for creating dirty images (and for creating the noise as described in the previous section) was chosen to be the worst in the observation bandwidth  i.e. the PSF at 115 MHz. In observations the synthesized beam decreases in size with increasing frequency, causing point source signals to oscillate with the beam, producing a foreground signal with an oscillatory signal very much like that of the 21cm signal (Vedantham, Shankar & Subrahmanyan, 2012). However, this modemixing contribution has been found not to threaten the 21cm recovery and have a power well below the 21cm level (Bowman et al. 2006; Liu, Tegmark & Zaldarriaga 2009; Trott, Wayth & Tingay 2012). As such we do not consider a frequency dependent PSF here.
Once the foregrounds and 21cm signal have been adjusted for uv sampling, the three component cubes are added together. The components of the total along a random line of sight are shown in Fig. 1.
4 Results
In the following section, the word ‘reconstructed’ refers to a component which has been estimated from the simulated data using gmca. The ‘residuals’ or (‘rest’ for short) are the difference between the total mixed signal and the reconstructed foregrounds and should therefore consist of the 21cm signal, noise and any fitting errors.
4.1 Source Number
gmca requires the specification of the number of sparse sources that the data can be defined by. We utilise this component separation technique to define the foregrounds in order to subtract them, treating the 21cm signal as noise. Therefore, the number of sources refers to the number of foreground contributions which can be described by unique sparse descriptions (not necessarily the number of different foreground components such as Galactic freefree).
As each foreground model might be best described by a different number of sources, we seek to define the number of sources by minimizing the leakage of foregrounds into the 21cm signal. Let us introduce the statistics and :
(6) 
(7) 
where A is the mixing matrix calculated by gmca and Y is a data cube, for example the foregrounds or the 21cm signal. is the amount of the data Y that contributes to the gmca sources (in our case the reconstructed foregrounds). Thus we can calculate the amount of leakage of simulated noise and 21cm signal, into the reconstructed foregrounds by allowing Y to equal the combined simulated 21cm and noise data cube. Conversely, is the amount of data Y which does not contribute to the gmca source model. For example, letting Y equal the simulated foreground cube, , will tell us how much of the foregrounds leak into the residuals.
We take the power spectra of and and compare them to the power spectra of the 21cm signal to see the number of sources for which leakage is minimized, Fig. 2. Note that, prior to our wavelet type analysis in Section 4.2, we adopt the default setting of gmca for this source number analysis, the IUWT using the à trous algorithm.
To be confident of our reconstructed 21cm signal, we ideally want both the power spectra of and to lie below that of the 21cm signal and to be as small as possible. We see that, while one source does not seem enough to accurately constrain the foregrounds, there is very little difference between the leakages resulting from a 2 or 3 source foreground model. Indeed, we find this holds true for 4 and 5 sources also. As more and more sources are added to the model, we might expect the 21cm signal itself to leak into the foreground model and be picked out as an individual source. However the magnitude of this (seen as part of ) will be small as the signal to noise of the foregrounds is much larger than that of the cosmological signal. When applying to real data we will have to rely on models of the foregrounds as observed by the data to estimate the level of leakage from the signal onto the foregrounds.
We choose to assume two foreground sources for the rest of this paper, though the reader should be aware that different foreground models may require different source numbers in order to optimise the method.
4.2 Choice of wavelet
In Fig. 3 we consider how the leakages and change depending on the choice of wavelet used by gmca. There are many different types of wavelets, but they can be broadly categorized by whether they are decimated (i.e. provide a redundant signal representation), isotropic and which filter they use for the separation of data at different scales. Here we consider the IUWT, a decimated wavelet transform (referred to as Mallat), a nondyadic and undecimated wavelet transform (referred to as Feauveau) and an undecimated wavelet transform using the Haar filter (referred to as Haar).
We can see that the choice of wavelet can affect how small the foreground leakages are, and therefore the success of the method. These differences are highly dependent on frequency as well, with one wavelet type outperforming others at certain frequencies. For our implementation we choose the IUWT as it consistently minimizes the leakages over the frequency range. We will show in Section 4.3.2 that we can potentially correct for , though not and due to the blind nature of the problem.
4.3 Power Spectra
EoR experiments aim to recover the power spectrum of the cosmological signal over a broad range of frequencies.
The power spectrum of a lineofsight/map/cube at a single frequency is calculated by 1D/2D/3D Fourier transforming that lineofsight/map/cube and binning the pixels according to Fourier scale, . The power at any particular , is the average power of all the uv cells in the bin centering on . The error on the point for a particular bin, , is calculated as where is the number of uv cells that reside in that bin. The power spectrum of the reconstructed 21cm signal is calculated by subtraction of the noise power spectrum from the gmca residuals power spectrum. The total error on the reconstructed 21cm power spectrum is calculated using the above error formula applied to the reconstructed 21cm power spectrum, added in quadrature with the formula applied to the noise power spectrum, in order to take into account both sample variance and the effect of any error in the noise estimate. Note that we assume Gaussianity whereas the 21cm signal is not Gaussian and also we calculate the error bars from the power of a single realization rather than over an ensemble of simulations. We ask the reader to bear in mind that these error bars might be considered incomplete because of this.
To explain a few graphical conventions: any points where the power of the residuals is below the power of the noise are omitted, as this leads to an unrealistic negative reconstructed 21cm power; any error bars extending to below the x axis in linear space are shown with a lower error bar of equal length to the upper error bar in log space.
4.3.1 1D Power Spectra
The 1D power spectra are calculated over frequency wedges of 8MHz to avoid evolution effects. Each line of sight produces a 1D power spectrum, one for each of the 512 512 pixels. These power spectra are then averaged over all the pixels. The frequencies mentioned correspond to the frequency in the middle of each 8MHz wedge and the quantity plotted in Fig. 4 is where is the comoving length of the wedge. The 1D power spectra are recovered to high accuracy across the frequency range and across the scales.
4.3.2 2D and 3D Power Spectra
For the 2D power spectra, the quantity plotted is where is the area of the simulation map. To calculate the 3D power spectra we divide the cube into subbands of 8 MHz to avoid signal evolution effects. The quantity plotted is where V is the volume of the subband.
We now consider the recovered 2D and 3D 21cm power spectra and how best we can minimise the effects of leakage. The noise leakage into the reconstructed foregrounds can be accurately quantified by creating an independent realization of the noise (no2) (for real data it is assumed we will know the statistics of the noise to high precision and can therefore create a second realization from the known noise power spectrum) and applying Equation 6 to find . We find that the power spectra of and are almost identical meaning that A is not a strong function of the original noise realization. We can use this information to find a better estimate of the 2D and 3D power spectra and compensate for the noise leakage using the following calculation:
(8) 
To calculate a leakage ratio we would generally have to include all the leakages, such that . However, since we can correct for we can instead calculate:
(9) 
which quantifies the amount of 21cm leakage into the foregrounds and vice versa. We can see the leakage ratio plotted for the 2D and 3D power spectra for multiple frequencies in Fig. 5. We see that we can expect an accurate 21cm power spectrum recovery at large once the noise leakage is compensated for. The smaller recovery is still very dependent on the foreground magnitude and therefore frequency of observation.
We can see the result of applying Equation 8 in Figs. 6 and 7. Wherever and/or exceeds the power of the simulated 21cm signal we see a degraded fit in Fig. 6. The 2D power spectra are recovered to excellent accuracy and we see that once the noise leakage is taken into account there is much less leakage at large scales, allowing a more complete power spectrum reconstruction.
For the 3D power spectra, a similarly accurate recovery is made across the frequency range. The recovery in 3D is more precise due to the larger amount of data in a box as opposed to a single slice. Again, once the noise leakage is taken into account the recovered spectra become much more complete on the larger scales.
5 Phase Conservation and Imaging
Recently it has been shown that imaging of the neutral hydrogen in the late stages of reionization is possible with the current generation of radio telescopes when angular scales larger than 0.5 are considered, independent of the type of reionization source (Zaroubi et al., 2012). Here, we compare the output residual maps with the simulated 21cm maps and consider how well the phases of the 21cm signal are conserved through the foreground removal process. The better the phases are conserved, the more correlation between maps we will observe. We will also consider the maps at different scales and as such we also decompose the output maps into 8 wavelet scales using the IUWT.
For a particular frequency, we calculate the phase of each uv point in a Fourier transformed map, F, as Phase[u,v] = (Im(F(u,v))/Re(F(u,v))).
In Fig. 8 we see the phase density relating to each wavelet scale of the 21cm and residual cubes. For each frequency we calculate the phase of each pixel in the 21cm and residual maps and use this as a coordinate on a phase density map where the x axis is the phase of the residuals and the y axis is the phase of the 21cm map. The more pixels with coordinates corresponding to a particular bin in the phases, the higher the phase density we will observe. If gmca perfectly preserves the phase of the 21cm signal we should see a diagonal phase density plot. For clarity, only the phase bins with a pixel count in the largest 67 of the pixel count distribution are plotted. We see that for the three crudest wavelet scales there is excellent phase recovery across the frequency range, while at 160 MHz this excellent recovery can be seen on even smaller wavelet scales. This is in line with expectations since the relative rms values (i.e. ratio of the rms of the 21cm and noise and ratio of the rms of the 21cm and foregrounds) for the 21cm signal both peak at just after 160 MHz.
In Fig. 9 we start with the crudest wavelet scale and then add the next crudest scale one at a time to see the effect on phase conservation, finding that this improves the recovery considerably.
In Fig. 10 we compare the reconstructed signal maps with the simulated signal maps at the different wavelet scales. It is clear that the foregrounds are reconstructed to a high accuracy at all scales. The residuals show clear correlation with the 21cm maps  especially at scales 54434 Mpc. By considering the data on different scales we compensate for the small scale noise leakage and can retrieve convincing reconstructed images in comparison to the map with all scale data included.
We plot the Pearson correlation coefficient between the simulated 21cm maps and the residual maps at different individual scales in Fig. 11. On an individual basis, we can clearly see that the finer the scale of structure we correlate, the less of a correlation there is between the residuals and simulated 21cm maps. The finer wavelet scale we look at, the more dominant noise leakage will be in the residual maps and so a smaller correlation is observed. By only comparing the crudest wavelet scales however, we risk losing a lot of the small scale 21cm structure and being increasingly dominated by the foreground signal.
We can add several of the wavelet scales together in order to balance having enough useful information without including too much noise leakage on the smaller wavelet scales. To compare with Zaroubi et al. (2012), we also include images of the full data which have been smoothed with a 20 arcminute Gaussian kernel. Wavelet decomposition has the advantage of providing a selection of scales on which one can analyse the images, as opposed to being restricted by filters such as a Gaussian kernel which simply remove all modes below a certain scale. However, the scales at which one can analyse images with wavelet decomposition are determined by the method itself  you cannot then ask what the data look like at a scale half way inbetween two wavelet scales.
In Fig. 12 we recover impressive images of the reionization signal at 165 MHz when the smallest scale information is discarded. Comparing the residual and 21cm maps on each row we find correlation coefficients of 0.689, 0.687 and 0.588 for the top, middle and bottom rows respectively. We therefore conclude that the wavelet decomposition more optimally removes the noise from the residuals than the smoothing technique (bottom row) employed by Zaroubi et al. (2012). In Fig. 10 we can see the wavelet decomposition has the effect of clustering the noise around the edges of the image (this effect is particularly pronounced in rows 3 and 4). To avoid this edge effect we also correlate the maps in Fig. 12 again but this time considering only the pixels in a central patch covering 50 of the total map area. We find correlation coefficients of 0.905, 0.788 and 0.605 for the top, middle and bottom rows respectively. With real data, we can assume that this edge effect would not be a problem as the fitting and decomposition can always be carried out over a slightly larger cube with only the central region being used for data analysis.
6 Conclusions
In this paper we have assessed the sparsitybased blind source separation technique gmca as a possible way of removing the foregrounds on a 21cm EoR signal. We recover the 1D, 2D and 3D power spectra to high accuracy across the frequency range. Since the mixing matrix calculated by gmca was shown not to be a strong function of the noise realization, we were able to compensate for leakage of noise power into the reconstructed foregrounds using an independent noise realization, leading to more complete 2D and 3D power spectra.
We also considered if images of reionization could be recovered from the LOFAREoR data once foreground removal with gmca has been carried out. Using wavelet decomposition, we considered the phase correlation between the gmca residuals and the simulated 21cm at different scales. We find strong correlations at the cruder wavelet scales and add several scales together to balance the amount of information in an image with the accuracy of the phase recovery. We find that when distance scales of below 54 Mpc are discounted, the gmca residuals images are highly correlated with the 21cm images, with correlation coefficients of just less than 0.7. In comparison, smoothing the images did not produce as strong a correlation.
gmca is a highly adaptable method and there remains the possibility that with careful tuning, the 21cm signal could be picked out as a separate source as opposed to being present as a residual of the process. We intend to explore this further and consider using different mixing matrices for each scale.
7 Acknowledgments
FBA acknowledges the support of the Royal Society via a University Research Fellowship.
GH is a member of the LUNAR consortium, which is funded by the NASA Lunar Science Institute (via Cooperative Agreement NNA09DB30A) to investigate concepts for astrophysical observatories on the Moon.
LVEK acknowledges the financial support from the European Research Council under ERCStarting Grant FIRSTLIGHT  258942.
The authors would like to acknowledge Mario Santos for useful discussion.
References
 Bobin et al. (2008) Bobin J., Moudden Y., Starck J.L., Fadili J., Aghanim N., 2008, Statistical Methodology, 5, 307
 Bobin et al. (2007) Bobin J., Starck J.L., Fadili J., Moudden Y., 2007, Image Processing, IEEE Transactions on, 16, 2662
 Bobin et al. (2008) Bobin J., Starck J.L., Moudden Y., Fadili M. J., 2008, in Hawkes P. W., ed., Advances in Imaging and Electron Physics, Vol. 152, Advances in IMAGING AND ELECTRON PHYSICS. Elsevier, pp 221 – 302
 Bobin et al. (2012) Bobin J., Starck J.L., Sureau F., Basak S., 2012, Sparse component separation for accurate CMB map estimation, preprint (astroph/1206.1773
 Bowman et al. (2006) Bowman J. D., Morales M. F., Hewitt J. N., 2006, ApJ, 638, 20
 Chapman et al. (2012) Chapman E., Abdalla F. B., Harker G., JeliÄ V., Labropoulos P., Zaroubi S., Brentjens M. A., de Bruyn A. G., Koopmans L. V. E., 2012, Monthly Notices of the Royal Astronomical Society, 423, 2518
 Cho et al. (2012) Cho J., Lazarian A., Timbie P. T., 2012, The Astrophysical Journal, 749, 164
 Di Matteo et al. (2004) Di Matteo T., Ciardi B., Miniati F., 2004, MNRAS, 355, 1053
 Di Matteo et al. (2002) Di Matteo T., Perna R., Abel T., Rees M. J., 2002, ApJ, 564, 576
 Ewen & Purcell (1951) Ewen H., Purcell E., 1951, Nature, 168, 356
 Field (1958) Field G., 1958, Proceedings of the IRE, 46, 240
 Field (1959) Field G., 1959, ApJ, 129, 536
 Gleser et al. (2008) Gleser L., Nusser A., Benson A. J., 2008, MNRAS, 391, 383
 Harker et al. (2010) Harker G., Zaroubi S., Bernardi G., Brentjens M. A., de Bruyn A. G., Ciardi B., Jelić V., Koopmans L. V. E., Labropoulos P., Mellema G., Offringa A., Pandey V. N., Pawlik A. H., Schaye J., Thomas R. M., Yatawatta S., 2010, MNRAS, 405, 2492
 Harker et al. (2009) Harker G., Zaroubi S., Bernardi G., Brentjens M. A., De Bruyn A. G., Ciardi B., Jelić V., Koopmans L. V. E., Labropoulos P., Mellema G., Offringa A., Pandey V. N., Schaye J., Thomas R. M., Yatawatta S., 2009, MNRAS, 397, 1138
 Jelić et al. (2010) Jelić V., Zaroubi S., Labropoulos P., Bernardi G., de Bruyn A. G., Koopmans L. V. E., 2010, MNRAS, 409, 1647
 Jelić et al. (2008) Jelić V., Zaroubi S., Labropoulos P., Thomas R. M., Bernardi G., Brentjens M. A., De Bruyn A. G., Ciardi B., Harker G., Koopmans L. V. E., Pandey V. N., Schaye J., Yatawatta S., 2008, MNRAS, 389, 1319
 Komatsu et al. (2011) Komatsu E., Smith K. M., Dunkley J., Bennett C. L., Gold B., Hinshaw G., Jarosik N., Larson D., Nolta M. R., Page L., Spergel D. N., Halpern M., Hill R. S., Kogut A., Limon M., Meyer S. S., Odegard N., Tucker G. S., Weiland J. L., Wollack E., Wright E. L., 2011, The Astrophysical Journal Supplement Series, 192, 18
 Liu & Tegmark (2011) Liu A., Tegmark M., 2011, Phys. Rev. D, 83, 103006
 Liu & Tegmark (2012) Liu A., Tegmark M., 2012, MNRAS, 419, 3491
 Liu et al. (2009) Liu A., Tegmark M., Bowman J., Hewitt J., Zaldarriaga M., 2009, MNRAS, 398, 401
 Liu et al. (2009) Liu A., Tegmark M., Zaldarriaga M., 2009, MNRAS, 394, 1575
 McQuinn et al. (2006) McQuinn M., Zahn O., Zaldarriaga M., Hernquist L., Furlanetto S. R., 2006, ApJ, 653, 815
 Madau et al. (1997) Madau P., Meiksin A., Rees M. J., 1997, ApJ, 475, 429
 Mao (2012) Mao X.C., 2012, The Astrophysical Journal, 744, 29
 Mesinger & Furlanetto (2007) Mesinger A., Furlanetto S., 2007, ApJ, 669, 663
 Mesinger et al. (2011) Mesinger A., Furlanetto S., Cen R., 2011, MNRAS, 411, 955
 Morales & Hewitt (2004) Morales M. F., Hewitt J., 2004, ApJ, 615, 7
 Muller & Oort (1951) Muller C., Oort J., 1951, Nature, 168, 357
 Oh & Mack (2003) Oh S. P., Mack K. J., 2003, MNRAS, 346, 871
 Petrovic & Oh (2011) Petrovic N., Oh S. P., 2011, MNRAS, 413, 2103
 Santos et al. (2005) Santos M. G., Cooray A., Knox L., 2005, ApJ, 625, 575
 Shaver et al. (1999) Shaver P. A., Windhorst R. A., Madau P., de Bruyn A. G., 1999, \aap, 345, 380
 Starck & Murtagh (2006) Starck J.L., Murtagh F., 2006, Astronomical image and data analysis. Springer
 Starck et al. (1998) Starck J.L., Murtagh F., Bijaoui A., 1998, Image Processing and Data Analysis:The Multiscale Approach. Cambridge University Press
 Trott et al. (2012) Trott C. M., Wayth R. B., Tingay S. J., 2012, The Astrophysical Journal, 757, 101
 van de Hulst (1945) van de Hulst H., 1945, Ned.Tijdschr.Natuuurk, 11, 201
 Vedantham et al. (2012) Vedantham H., Shankar N. U., Subrahmanyan R., 2012, The Astrophysical Journal, 745, 176
 Wang et al. (2006) Wang X., Tegmark M., Santos M. G., Knox L., 2006, ApJ, 650, 529
 Zaldarriaga et al. (2004) Zaldarriaga M., Furlanetto S. R., Hernquist L., 2004, ApJ, 608, 622
 Zaroubi et al. (2012) Zaroubi S., de Bruyn A. G., Harker G., Thomas R. M., Labropolous P., JeliÄ V., Koopmans L. V. E., Brentjens M. A., Bernardi G., Ciardi B., Daiboo S., Kazemi S., MartinezRubi O., Mellema G., Offringa A. R., Pandey V. N., Schaye J., Veligatla V., Vedantham H., Yatawatta S., 2012, Monthly Notices of the Royal Astronomical Society, 425, 2964
 Zibulevsky & Pearlmutter (2001) Zibulevsky M., Pearlmutter B. A., 2001, Neural Computation, 13, 863