The Fast Fourier Transform Telescope
Abstract
We propose an all-digital telescope for 21 cm tomography, which combines key advantages of both single dishes and interferometers. The electric field is digitized by antennas on a rectangular grid, after which a series of Fast Fourier Transforms recovers simultaneous multifrequency images of up to half the sky. Thanks to Moore’s law, the bandwidth up to which this is feasible has now reached about 1 GHz, and will likely continue doubling every couple of years. The main advantages over a single dish telescope are cost and orders of magnitude larger field of view, translating into dramatically better sensitivity for large-area surveys. The key advantages over traditional interferometers are cost (the correlator computational cost for an $N$-element array scales as $N\log_2 N$ rather than $N^2$) and a compact synthesized beam. We argue that 21 cm tomography could be an ideal first application of a very large Fast Fourier Transform Telescope, which would both provide massive sensitivity improvements per dollar and mitigate the off-beam point source foreground problem with its clean beam. Another potentially interesting application is cosmic microwave background polarization.
pacs: 98.80.Es
I Introduction
Since Galileo first pointed his telescope skyward, design innovations have improved attainable sensitivity, resolution and wavelength coverage by many orders of magnitude. Yet we are still far from the ultimate telescope that simultaneously observes light of all wavelengths from all directions, so there is still room for improvement.
From a mathematical point of view, telescopes are Fourier transformers. We want to know individual Fourier modes $\mathbf{k}$ of the electromagnetic field, as their direction $\hat{\mathbf{k}}$ encodes our image and their magnitude $k = \omega/c = 2\pi/\lambda$ encodes the wavelength, but the field at a given spacetime point $(\mathbf{r}, t)$ tells us only a sum of all these Fourier modes weighted by phase factors $e^{i[\mathbf{k}\cdot\mathbf{r} + \omega t]}$.
Traditional telescopes perform the spatial Fourier transform from $\mathbf{r}$-space to $\mathbf{k}$-space by approximate analog means using lenses or mirrors, which are accurate across a relatively small field of view, and perform the temporal Fourier transform from $t$ to $\omega$ using slits, gratings or band-pass filters. Traditional interferometers used analog means to separate frequencies and measure electromagnetic field correlations between different receivers, then Fourier-transformed to image space digitally, using computers. In the tradeoff between resolution, sensitivity and cost, single dish telescopes and interferometers are highly complementary, and which is best depends on the science goal at hand.
Thanks to Moore’s law, it has very recently become possible to build all-digital interferometers up to about 1 GHz, where the analog signal is digitized right at each antenna and subsequent correlations and Fourier transforms are done by computers. In addition to reducing various systematic errors, this digital revolution enables the “Fast Fourier Transform Telescope” or “omniscope” that we describe in this paper. We will show that it acts much like a single dish telescope with a dramatically larger field of view, yet is potentially much cheaper than a standard interferometer with comparable area. If a modern all-digital interferometer such as the MWA (1) is scaled up to a very large number of antennas $N$, its price becomes completely dominated by the computing hardware cost for performing of order $N^2$ correlations between all its antenna pairs. The key idea behind the FFT Telescope is that, if the antennas are arranged on a rectangular grid, this cost can be cut to scale merely as $N\log_2 N$ using Fast Fourier Transforms. As we will see, this design also eliminates the need for individual antennas that are pointable (mechanically or electronically), and has the potential to dramatically improve the sensitivity for some applications of future telescopes like the Square Kilometer Array without increasing their cost.
This basic idea is rather obvious, so when we had it, we wondered why nothing like the massive all-sky low-frequency telescope that we are proposing had ever been built. We have since found other applications of the idea in the astronomy and engineering literature dating as far back as the early days of radio astronomy (8); (9); (10); (12); (11); (13); (14); (15); (16); (17), and it is clear that the answer lies in lack of both computer power and good science applications. Moore’s law has only recently enabled A/D conversion up to the GHz range, so in older work, Fourier transforms were done by analog means and usually in only one dimension (e.g., using a so-called Butler matrix (8)), severely limiting the number of antennas that could be used. For example, the 45 MHz interferometer in (9) used six elements. Moreover, to keep the number of elements modest while maintaining large collecting area, the elements themselves would be dishes or interconnected antennas that observed only a small fraction of the sky at any one time. A Japanese group worked on an analog FFT Telescope about 15 years ago for studying transient radio sources (10); (11), and then upgraded it to digital signal processing aiming for a array with a field of view just under . Electronics from this effort are also used in the 1-dimensional 8-element Nasu Interferometer (14).
Most traditional radio astronomy applications involve mapping objects subtending a small angle surrounded by darker background sky, requiring only enough sensitivity to detect the object itself. For most such cases, conventional radio dishes and interferometers work well, and an FFT Telescope (hereafter FFTT) is neither necessary nor advantageous. For the emerging field of 21 cm tomography, which holds the potential to one day overtake the microwave background as our most sensitive cosmological probe (18); (19); (20); (21); (22); (23); (24), the challenge is completely different: it involves mapping a faint and diffuse cosmic signal that covers all of the sky and needs to be separated from foreground contamination that is many orders of magnitude brighter, requiring extreme sensitivity and beam control. This 21 cm science application and the major efforts devoted to it by experiments such as MWA (1), LOFAR (2), PAPER (4), 21CMA (3), GMRT (5); (6) and SKA (7) make our paper timely.
An interesting recent development is a North American effort (15); (16) to do 21 cm cosmology with a one-dimensional array of cylindrical telescopes that can be analyzed with FFT’s, in the spirit of the Cambridge 1.7m instrument from 1957, exploiting Earth rotation to fill in the missing two-dimensional information (15); (16). We will provide a detailed analysis of this design below, arguing that it is complementary to the 2D FFTT at higher frequencies while a 2D FFTT provides sharper cosmological constraints at low frequencies.
The rest of this paper is organized as follows. In Section II, we describe our proposed design for FFT Telescopes. In Section III, we compare the figures of merit of different types of telescopes, and argue that the FFT Telescope is complementary to both single dish telescopes and standard interferometers. We identify the regimes where each of the three is preferable to the other two. In Section IV, we focus on the regime where the FFT Telescope is ideal, which is when you have strong needs for sensitivity and beam cleanliness but not resolution, and argue that 21 cm tomography may be a promising first application for it. We also comment briefly on cosmic microwave background applications. We summarize our conclusions in Section V and relegate various technical details to a series of appendices.
II How the FFT Telescope works
In this section, we describe the basic design and data processing algorithm for the FFT Telescope. We first summarize the relevant mathematical formalism, then discuss data processing, and conclude by discussing some practical issues. For a comprehensive discussion of radio interferometry techniques, see e.g. (25).
II.1 Interferometry without the flat sky approximation
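Since the FFT Telescope images half the sky at once, the flat-sky approximation that is common in radio astronomy is not valid. We therefore start by briefly summarizing the general curved-sky formalism. Suppose we have a set of antennas at positions $\mathbf{r}_n$ with sky responses $\mathbf{B}_n(\hat{\mathbf{k}})$ at a fixed frequency $\omega$, $n = 1, \ldots, N$, and a sky signal $\mathbf{s}(\hat{\mathbf{k}})$ from the direction given by the unit vector $\hat{\mathbf{k}}$ (this radiation thus travels in the direction $-\hat{\mathbf{k}}$). The data $\mathbf{d}_n$ measured by each antenna in response to a sky signal $\mathbf{s}(\hat{\mathbf{k}})$ is then
(1) $\mathbf{d}_n = \int e^{-i[\mathbf{k}\cdot\mathbf{r}_n + \omega t]}\,\mathbf{B}_n(\hat{\mathbf{k}})\,\mathbf{s}(\hat{\mathbf{k}})\,d\Omega_k$,
where $\mathbf{k} \equiv (\omega/c)\,\hat{\mathbf{k}}$ is the wave vector.
Details related to polarization are covered below in Appendix A, but are irrelevant for the present section. For now, all that matters is that $\mathbf{s}(\hat{\mathbf{k}})$ specifies the sky signal, $\mathbf{d}_n$ specifies the data that is recorded, and $\mathbf{B}_n(\hat{\mathbf{k}})$ specifies the relation between the two.
The sky signal has a slow time dependence because the sky rotates overhead, because of variable astronomical sources, and because of distorting atmospheric/ionospheric fluctuations. However, since these changes are many orders of magnitude slower than the electric field fluctuation timescale $\omega^{-1}$, we can to an excellent approximation treat equation (1) as exact for a snapshot of the sky. Below we derive how to recover the snapshot sky image from these raw measurements; only when coadding different snapshots does one need to take sky rotation and other variability into account.
The statements above hold for any telescope array. For the special case of the FFT Telescope, all antennas have approximately identical beam patterns $\mathbf{B}_n = \mathbf{B}$ and lie in a plane, which we can without loss of generality take to be the $z = 0$ plane so that $\mathbf{r}_n = (x_n, y_n, 0)$. Using the fact that
(2) $d\Omega_k = \frac{dk_x\,dk_y}{k\sqrt{k^2 - k_\perp^2}}$,
where $k_\perp \equiv (k_x^2 + k_y^2)^{1/2}$ is the length of the component of the vector $\mathbf{k}$ perpendicular to the $z$ axis, we can rewrite equation (1) as a 2-dimensional Fourier transform
(3) $\mathbf{d}_n = e^{-i\omega t}\int e^{-i\mathbf{q}\cdot\mathbf{x}_n}\,\mathbf{s}_B(\mathbf{q})\,d^2q$,
where we have defined the 2-dimensional vectors
(4) $\mathbf{q} \equiv (k_x, k_y)$, $\mathbf{x}_n \equiv (x_n, y_n)$
and the function
(5) $\mathbf{s}_B(\mathbf{q}) \equiv \frac{\mathbf{B}(\mathbf{q})\,\mathbf{s}(\mathbf{q})}{k\sqrt{k^2 - q^2}}$.
Here the 2-dimensional function $\mathbf{s}(\mathbf{q})$ is defined to equal $\mathbf{s}(\hat{\mathbf{k}})$ when $q < k$, zero otherwise, and $\mathbf{B}(\mathbf{q})$ is defined analogously. $\mathbf{s}_B$ can therefore be thought of as the windowed, weighted and zero-padded sky signal. Equation (3) holds under the assumption that $\mathbf{B}(\hat{\mathbf{k}})$ vanishes for $k_z < 0$, i.e., that a ground screen eliminates all response to radiation heading up from below the horizon, so that we can limit the integration over solid angle to radiation pointed towards the lower hemisphere. Note that for our application, the simple Fourier relation of equation (3) is exact, and that none of the approximations that are commonly used in radio astronomy for the so-called “$w$-term” (see Equation 3.7 in (25)) are needed.
One usually models the fields arriving from different directions as uncorrelated, so that
(6) $\langle \mathbf{s}(\hat{\mathbf{k}})\,\mathbf{s}(\hat{\mathbf{k}}')^\dagger \rangle = \delta(\hat{\mathbf{k}}, \hat{\mathbf{k}}')\,\mathbf{S}(\hat{\mathbf{k}})$,
where $\mathbf{S}$ is the sky intensity Stokes matrix and the spherical $\delta$-function satisfies
(7) $\int \delta(\hat{\mathbf{k}}, \hat{\mathbf{k}}')\,d\Omega_{k'} = 1$,
so that $\int \delta(\hat{\mathbf{k}}, \hat{\mathbf{k}}')\,g(\hat{\mathbf{k}}')\,d\Omega_{k'} = g(\hat{\mathbf{k}})$ for any function $g$. Combining equation (3) with equation (6) implies that the correlation between two measurements, traditionally referred to as a visibility, has the expectation value
(8) $\langle \mathbf{d}_m \mathbf{d}_n^\dagger \rangle = \int e^{-i\mathbf{q}\cdot(\mathbf{x}_m - \mathbf{x}_n)}\,\mathbf{S}_B(\mathbf{q})\,d^2q = \widehat{\mathbf{S}}_B(\mathbf{x}_m - \mathbf{x}_n)$,
where $\widehat{\mathbf{S}}_B$ is the Fourier transform of
(9) $\mathbf{S}_B(\mathbf{q}) \equiv \frac{\mathbf{B}(\mathbf{q})\,\mathbf{S}(\mathbf{q})\,\mathbf{B}(\mathbf{q})^\dagger}{k\sqrt{k^2 - q^2}}$.
$\mathbf{S}_B$ is the beam-weighted, projection-weighted and zero-padded sky brightness map.
In summary, things are not significantly more complicated than in standard interferometry in small sky patches (where the flat sky approximation is customarily made). One can therefore follow the usual radio astronomy procedure with minimal modifications: first measure $\widehat{\mathbf{S}}_B$ at a large number of baselines corresponding to different antenna separations $\mathbf{x}_m - \mathbf{x}_n$, then use these measurements to estimate the Fourier transform of this function, $\mathbf{S}_B(\mathbf{q})$, and finally recover the desired sky map $\mathbf{S}$ by inverting equation (9):
(10) $\mathbf{S}(\mathbf{q}) = k\sqrt{k^2 - q^2}\;\mathbf{B}(\mathbf{q})^{-1}\,\mathbf{S}_B(\mathbf{q})\,[\mathbf{B}(\mathbf{q})^\dagger]^{-1}$.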
II.2 FFTT analysis algorithm
Equation (8) shows that the Fourier-transformed beam-convolved sky is measured at each baseline, i.e., at each separation vector for an antenna pair. A traditional correlating array with $N$ antennas measures all $N(N-1)/2$ such pairwise correlations, and optionally fills in more missing parts of the Fourier plane exploiting Earth rotation. Since the cost of antennas, amplifiers, A/D converters, etc. scales roughly linearly with $N$, this means that the cost of a truly massive array (like what may be needed for precision cosmology with 21 cm tomography (24)) will be dominated by the cost of the computing power for calculating the correlations, which scales like $N^2$.
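To make the scaling argument concrete, the following minimal sketch compares the number of antenna pairs with an order-of-magnitude $N\log_2 N$ operation count. The function names are illustrative only, and constant prefactors (which depend on the hardware and the FFT implementation) are deliberately ignored.

```python
import math


def correlator_pairs(n_antennas: int) -> int:
    """Number of distinct antenna pairs, N(N-1)/2, that a correlator handles."""
    return n_antennas * (n_antennas - 1) // 2


def fft_operations(n_antennas: int) -> float:
    """Order-of-magnitude operation count N*log2(N) for a spatial FFT."""
    return n_antennas * math.log2(n_antennas)


# Ratio of the two counts for a few illustrative array sizes.
for n in (10**2, 10**4, 10**6):
    ratio = correlator_pairs(n) / fft_operations(n)
    print(f"N = {n:>8d}: pairs / (N log2 N) ~ {ratio:.1f}")
```

The ratio grows roughly as $N/(2\log_2 N)$, which is why the saving only becomes decisive for very large arrays.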
For the FFT Telescope, the antenna positions are chosen to form a rectangular grid. This means that all the baselines also fall on a rectangular grid, typically with any given baseline being measured by many different antenna pairs.
The sums of the antenna-pair correlations for each baseline can be computed with only of order $N\log_2 N$ (as opposed to $N^2$) operations by using Fast Fourier Transforms. Essentially, what we wish to measure in the Fourier plane are the antenna measurements (laid out on a 2D grid) convolved with themselves, and this naively $N^2$ convolution can be reduced to an FFT, a squaring, and an inverse FFT.
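The equivalence between summing pairwise products per baseline and the "FFT, squaring, inverse FFT" route can be checked numerically. The sketch below is an illustration under stated assumptions, not the pipeline of any particular instrument: antenna data are complex scalars on a small grid (polarization ignored), the grid is zero-padded to $(2N_x-1)\times(2N_y-1)$ so that the circular correlation does not wrap around, and negative baselines are stored at wrapped indices. The function names are hypothetical.

```python
import numpy as np


def baseline_sums_direct(d):
    """Sum d[x+dx, y+dy] * conj(d[x, y]) over all antenna pairs, per baseline.

    Brute-force reference implementation; cost grows as the square of the
    number of antennas.
    """
    nx, ny = d.shape
    px, py = 2 * nx - 1, 2 * ny - 1
    out = np.zeros((px, py), dtype=complex)
    for dx in range(-(nx - 1), nx):
        for dy in range(-(ny - 1), ny):
            total = 0.0
            for x in range(nx):
                for y in range(ny):
                    if 0 <= x + dx < nx and 0 <= y + dy < ny:
                        total += d[x + dx, y + dy] * np.conj(d[x, y])
            # Negative baselines are stored at wrapped (modular) indices.
            out[dx % px, dy % py] = total
    return out


def baseline_sums_fft(d):
    """Same quantity via zero-padded FFT, squaring, and inverse FFT."""
    nx, ny = d.shape
    f = np.fft.fft2(d, s=(2 * nx - 1, 2 * ny - 1))  # zero-pads the grid
    return np.fft.ifft2(f * np.conj(f))


rng = np.random.default_rng(0)
data = rng.normal(size=(4, 5)) + 1j * rng.normal(size=(4, 5))
print(np.allclose(baseline_sums_direct(data), baseline_sums_fft(data)))
```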
In fact, equation (3) shows that after FFTing the 2D antenna grid of data $\mathbf{d}_n$, one already has the two electric field components from each sky direction, and can multiply them to measure the sky intensity from each direction (Stokes $I$, $Q$, $U$ and $V$) without any need to return to Fourier space, as illustrated in Figure 1. This procedure is then repeated for each time sample and each frequency, and the many intensity maps at each frequency are averaged (after compensating for sky rotation, ionospheric motion, etc.) to improve signal-to-noise.
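A toy illustration of this imaging step, assuming a single polarization, a noiseless monochromatic point source, an ideal isotropic antenna response, and the sign convention $d_n \propto e^{-i\mathbf{q}\cdot\mathbf{x}_n}$: a plane wave arriving at a square grid is mapped to a single bright pixel by one 2D FFT followed by squaring. The source position is deliberately chosen to fall exactly on an FFT pixel, and all names are illustrative.

```python
import numpy as np


def snapshot_image(antenna_data):
    """Intensity per sky pixel from one snapshot of a rectangular antenna grid.

    The inverse FFT undoes the assumed phase convention d_n ~ exp(-i q.x_n);
    squaring the modulus then gives the intensity from each direction.
    """
    field = np.fft.ifft2(antenna_data)
    return np.abs(field) ** 2


n = 16            # antennas per side (hypothetical)
mx, my = 3, -5    # source pixel; q * spacing = 2*pi*(mx, my)/n
ix, iy = np.meshgrid(np.arange(n), np.arange(n), indexing="ij")

# Phase of a unit-amplitude plane wave sampled at each antenna position.
antenna_data = np.exp(-2j * np.pi * (mx * ix + my * iy) / n)

image = snapshot_image(antenna_data)
peak = np.unravel_index(np.argmax(image), image.shape)
print("brightest pixel:", peak)
```

Negative pixel indices wrap around modulo `n`, so a source at `my = -5` appears in column `n - 5`.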
It should be noted that the computational cost for the entire FFT Telescope signal processing pipeline is (up to some relatively unimportant log factors) merely proportional to the total number of numbers measured by all antennas throughout the duration of the observations. In particular, the time required for the spatial FFT operations is of the same order as the time required for the time-domain FFT’s that are used to separate out the different frequencies from the time signal using standard digital filtering. If the antennas form an $N_x \times N_y$ rectangular array, so that $N = N_x N_y$, and each antenna measures $N_t$ different time samples (for a particular polarization), then it is helpful to imagine this data arranged in a 3-dimensional $N_x \times N_y \times N_t$ block. The temporal and spatial FFT’s (left branch in Figure 1) together correspond to a 3D FFT of this block, performed by three 1-dimensional FFT operations:

1. For each antenna, FFT in the $t$ direction.
2. For each time and antenna row, FFT in the $x$ direction.
3. For each time and antenna column, FFT in the $y$ direction.
One processes one such block for each of the two polarizations. These three steps each involve of order $N_x N_y N_t$ multiplications (up to order-of-unity factors $\log N_t$, $\log N_x$ and $\log N_y$), and it is easy to show that the number of operations for the three steps combined scales as $N_x N_y N_t \log(N_x N_y N_t)$, i.e., depends only on the total amount of data $N_x N_y N_t$. After step 3, one has the two electric field components from each direction at each frequency. Phase and amplitude calibration of each antenna/amplifier system is normally performed after step 1. If one is interested in sharp pulses that are not well-localized in frequency, one may opt to skip step 1 or perform a broad band-pass filtering rather than a full spectral separation.
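The three passes described above can be sketched with NumPy to confirm that they reproduce a single 3D FFT and that the per-step log factors add up to the logarithm of the total data volume. The array sizes and the assignment of axes to $x$, $y$ and $t$ are assumptions made only for this illustration.

```python
import math

import numpy as np

rng = np.random.default_rng(0)
nx, ny, nt = 4, 8, 16  # hypothetical grid size and number of time samples
block = rng.normal(size=(nx, ny, nt)) + 1j * rng.normal(size=(nx, ny, nt))

# Step 1: for each antenna, FFT along the time axis.
step1 = np.fft.fft(block, axis=2)
# Step 2: for each time and antenna row, FFT along x.
step2 = np.fft.fft(step1, axis=0)
# Step 3: for each time and antenna column, FFT along y.
step3 = np.fft.fft(step2, axis=1)

# The three 1D passes together are exactly one 3D FFT of the block.
print(np.allclose(step3, np.fft.fftn(block)))


def three_pass_operation_count(nx: int, ny: int, nt: int) -> float:
    """Order-of-magnitude count: each pass costs N_tot * log2(axis length)."""
    n_tot = nx * ny * nt
    return n_tot * (math.log2(nt) + math.log2(nx) + math.log2(ny))


n_tot = nx * ny * nt
print(three_pass_operation_count(nx, ny, nt), n_tot * math.log2(n_tot))
```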
The FFT Telescope cuts down not only on CPU time, but also on data storage costs, since the amount of data obtained at each snapshot scales as the number of time samples taken times $N$ rather than $N^2$.
In a conventional interferometer, antennas are correlated only with other antennas and not with themselves, to eliminate noise bias. This can be trivially incorporated in the FFTT analysis pipeline as well by setting the pixel at the origin of the UV plane (corresponding to zero baseline) to zero, and is mathematically equivalent to removing the mean from the recovered sky map.
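The stated equivalence between zeroing the zero-baseline pixel and subtracting the map mean follows directly from the definition of the discrete Fourier transform, and can be verified in a few lines. This sketch assumes a real-valued toy map and NumPy's FFT conventions; the function name is illustrative.

```python
import numpy as np


def remove_zero_baseline(sky_map):
    """Zero the UV-plane origin and transform back to the sky."""
    uv = np.fft.fft2(sky_map)
    uv[0, 0] = 0.0  # discard the zero-baseline (autocorrelation) pixel
    return np.fft.ifft2(uv).real


rng = np.random.default_rng(1)
toy_map = rng.random((8, 8))
filtered = remove_zero_baseline(toy_map)

# Identical to subtracting the mean of the map.
print(np.allclose(filtered, toy_map - toy_map.mean()))
```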
II.3 Practical considerations
Although we have laid out the mathematical and computational framework for an FFT Telescope above, there are a number of practical issues that require better understanding before building a massive scale FFT Telescope.
As we will quantify in Section III below, the main advantages of an FFT Telescope relative to single dish telescopes and conventional interferometers emerge when the number of antennas is very large. A successful FFTT design should therefore emphasize simplicity and mass production, and minimize hardware costs. To exploit the FFT data processing speedup, care must be taken to make the antenna array as uniform as possible. The locations of the antennas need to be kept in a planar rectangular grid to within a small fraction of a wavelength, so when selecting the construction site, it is important that the land is quite flat to start with, that bulldozing is feasible, and that there are no immovable obstacles. It is equally important that the sky response be close to identical for all antennas. A ground screen, which can simply consist of cheap wire mesh laid out flat under the entire array, should therefore extend sufficiently far beyond the edges of the array that it can to reasonable accuracy be modeled as an infinite reflecting plane, affecting all antennas in the same way. The sky response of an antenna will also be affected by the presence of neighbors: whereas the response of antennas in the central parts of a large array will be essentially identical to one another (and essentially identical to that for an antenna in the middle of an infinite array), antennas near the edges of the array will have significantly different response. Instead of complicating the analysis to incorporate this, it is probably more cost effective to surround the desired array with enough rows of dummy antennas that the active ones can be accurately modeled as being in an infinite array. These dummy antennas could be relatively cheap, as they need not be equipped with amplifiers or other electronics (merely with an equivalent impedance), and no signals are extracted from them.
The FFT algorithm naturally lends itself to a rectangular array of antennas. However, this rectangle need not be square; we saw above that the processing time is independent of the shape of the rectangle, depending only on the total number of antennas, and below we will even discuss the extreme limit where the telescope is one-dimensional. Another interesting alternative to a square FFTT is a circular one, consisting of only those of the antennas in the square grid that lie within a circle inscribed in the square. This in no way complicates the analysis algorithm, as the FFT’s need to be zero-padded in any case, and increases the computational cost for a given collecting area by only about a quarter. The main advantage is a simple rotationally invariant synthesized beam as discussed below. Antennas can also be weighted in software before the spatial FFT to create beams with other desired properties; for example, edge tapering can be used to make the beam even more compact. A third variant is to place the antennas further apart to gain resolution at the price of undersampling the Fourier plane and picking up sidelobes.
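The "about a quarter" figure for the circular variant can be checked by counting grid points inside the inscribed circle: the filled fraction approaches $\pi/4$, so the square FFT grid is larger than the instrumented area by a factor $4/\pi \approx 1.27$. The grid size below is an arbitrary choice for the illustration, and log factors in the FFT cost are ignored.

```python
import numpy as np


def inscribed_circle_mask(n: int) -> np.ndarray:
    """Boolean mask of the grid points of an n x n grid inside the inscribed circle."""
    center = (n - 1) / 2
    ix, iy = np.meshgrid(np.arange(n), np.arange(n), indexing="ij")
    return (ix - center) ** 2 + (iy - center) ** 2 <= (n / 2) ** 2


mask = inscribed_circle_mask(512)
filled_fraction = mask.mean()            # fraction of grid points instrumented
extra_cost = 1.0 / filled_fraction - 1.0  # extra grid size per unit collecting area

print(f"filled fraction ~ {filled_fraction:.3f}, extra cost ~ {extra_cost:.2f}")
```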
Table 1 – How telescope properties scale with dish size $D$, collecting area $A$ and wavelength $\lambda$. We assume that the standard interferometer has $N$ separate dishes with a maximum separation $D_{\max}$ that together cover a fraction $f^{\rm cover}$ of the total array region rather uniformly.
Property  Single Receiver Telescope (single dish)  Maximal Focal Plane Telescope (single dish)  FFT Telescope (interferometer)  Standard Interferometric Telescope (interferometer)
Resolution  
Field of view  
Resolution elements  
Étendue  
Sensitivity  
Cost 
III Comparison of different types of telescopes
III.1 Telescopes generalized
In this section, we compare the figures of merit (resolution, sensitivity, cost, etc.) of different types of telescopes, summarized in Table 1, and argue that the FFT Telescope is complementary to both single dish telescopes and standard interferometers. We identify the regimes where each of the three is preferable to the other two, as summarized in Figure 2.
It is well-known that all telescopes can be analyzed within a single unified formalism that characterizes their linear response to sky signals and their noise properties. In particular, a single dish telescope can be thought of as an interferometer, where every little piece of the collecting area is an independent antenna, and the correlation is performed by approximate analog means using curved mirrors. This eliminates the costly computational step, but the approximations involved are only valid in a limited field of view (Table 1). Traditional interferometers can attain larger field of view and better resolution for a given collecting area, but at a computational cost. The FFT Telescope is a hybrid of the two in the sense that it combines the resolution of a single dish telescope with the all-sky field of view of a dipole interferometer, at a potentially much lower cost than either a single dish or a traditional interferometer of the same collecting area. Let us now quantify these statements, starting with angular resolution and its generalization and then turning to sensitivity and cost. We first briefly review some well-known radio astronomy formalism that is required for our applications.
III.2 Angular resolution and the beam function
The angular resolution of each of the telescopes we will compare is much better than a radian, so we can approximate the sky as flat for the purposes of this section.
If we ignore polarization, then it is well-known that the response of an interferometer to radiation intensity coming from near the local zenith
(11) 
the famous Airy pattern plotted in Figure 3.
Here , where is the angle to the zenith.
When the beam is asymmetric, we will mainly be interested in the azimuthally averaged beam which again depends only on ;
the result for a square telescope like the fully instrumented FFTT plotted
for comparison
For these three cases, the shapes are seen to be sufficiently similar that, for many purposes, all one needs to know about the beam can be encoded in a single number specifying its width. The most popular choices in astronomy are summarized in Table 2: the rms (the root-mean-squared value of $\theta$ averaged across the beam), the FWHM (twice the $\theta$-value where W̌ has dropped to half its central value) and the first null (the smallest $\theta$ at which W̌ vanishes). We will mainly focus on the FWHM in our cost comparison below.
Table 2 – Different measures of angular resolution, measured in units of $\lambda/D$.
rms  FWHM  First null  
Disk of diameter $D$  0.53  1.22  
Square of side $D$  0.49  1.07  
Gaussian with  1 
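As a cross-check on the disk entry, the sketch below evaluates the standard Airy intensity pattern $[2J_1(x)/x]^2$ numerically and extracts its FWHM and first null. It assumes the small-angle relation $x = \pi D\theta/\lambda$ for a uniformly filled circular aperture of diameter $D$ (the definition of $x$ used in the text is not recoverable here), and it evaluates $J_1$ from its integral representation so that only NumPy is needed. The first null comes out near $1.22\,\lambda/D$, matching the tabulated value.

```python
import numpy as np


def bessel_j1(x, n_steps=400):
    """J_1(x) from the integral (1/pi) * int_0^pi cos(tau - x sin tau) dtau."""
    tau = np.linspace(0.0, np.pi, n_steps + 1)
    integrand = np.cos(tau[None, :] - x[:, None] * np.sin(tau[None, :]))
    step = np.pi / n_steps
    # Trapezoid rule along the tau axis.
    integral = step * (integrand.sum(axis=1)
                       - 0.5 * (integrand[:, 0] + integrand[:, -1]))
    return integral / np.pi


# Angle theta in units of lambda/D (assumed small-angle scaling x = pi * u).
u = np.linspace(1e-6, 2.0, 4001)
x = np.pi * u
j1 = bessel_j1(x)
beam = (2.0 * j1 / x) ** 2  # Airy pattern, normalized to 1 at the center

# Half-maximum crossing, refined by linear interpolation between samples.
i_half = np.argmax(beam < 0.5)
u_half = np.interp(0.5, [beam[i_half], beam[i_half - 1]],
                   [u[i_half], u[i_half - 1]])
fwhm = 2.0 * u_half

# First null: first sign change of J_1 away from the origin.
i_null = np.argmax(j1 < 0.0)
first_null = np.interp(0.0, [j1[i_null], j1[i_null - 1]],
                       [u[i_null], u[i_null - 1]])

print(f"FWHM ~ {fwhm:.3f} lambda/D, first null ~ {first_null:.3f} lambda/D")
```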
The primary beam $\mathbf{B}$ that was introduced in Section II.1 can itself be derived from this same formalism by considering each piece of an antenna as an independent element. For example, the primary beam of a single radio dish is described by W̌ as given by equation (11), modulo polarization complications. To properly compute the polarization response that is encoded in the matrix $\mathbf{B}$, the full 3-dimensional structure of the antenna and how it is connected to the two amplifiers must be taken into account, and the presence of nearby conducting objects affects $\mathbf{B}$ as well.
For applications like CMB and 21 cm mapping, where one wishes to measure a cosmological power spectrum, the key aspect of the synthesized beam that matters is how sensitive it is to different angular scales $\ell$ and their associated spherical harmonic coefficients. This response to different angular scales is encoded in the spherical harmonic expansion of the synthesized beam W̌. If the synthesized beam is rotationally symmetric (or made symmetric by averaging observations with different orientations as Earth rotates), then its spherical harmonic coefficients vanish except for $m = 0$, and we only need to keep track of the so-called beam function, the coefficients $B_\ell$ plotted in Figure 4. In the flat-sky approximation, this beam function for a rotationally symmetric synthesized beam reduces to the two-dimensional Fourier transform of W̌, which is simply the baseline distribution $W$:
(15) 
Figure 4 shows $B_\ell$ for the circular, square and Gaussian aperture cases mentioned above. WMAP and many other CMB experiments have published detailed measurements of their beam functions (e.g., (26)), many of which are fairly well approximated by Gaussians. For interferometers, the beam functions can be significantly more interesting. Since $B_\ell$ scales simply as the number of baselines at different separations, more complicated synthesized beams involving more than one scale can be designed if desirable.
III.3 Sensitivity
How the noise power spectrum is defined and normalized
The sensitivity of an arbitrary telescope to signals on various angular scales is quantified by its noise power spectrum $C_\ell^{\rm noise}$. If the telescope were to make a uniformly observed all-sky map, then $C_\ell^{\rm noise}$ would be the variance (due to detector noise) with which a spherical harmonic coefficient $a_{\ell m}$ could be measured. For a map that covers merely a small sky patch, the corresponding noise power spectrum is the $C_\ell^{\rm noise}$ that would result if the whole sky were observed with this same sensitivity. Without loss of generality, we can factor the noise power spectrum as (27); (28)
(16) 
where $B_\ell$ is the beam function from the previous section, and the other factor is an overall normalization constant. To avoid ambiguity in this factorization, we normalize the beam function so that its maximum value equals unity. For a single dish, the maximum is always at $\ell = 0$. This gives the normalization $B_0 = 1$, which, given equation (15), means that the synthesized beam integrates to unity and that we can interpret the signal as measuring a weighted average of the true sky map. Most interferometers have $B_0 = 0$ and thus no sensitivity to the mean; in many such cases, $B_\ell$ is roughly constant on angular scales much larger than the synthesized beam but much smaller than the primary beam, taking its maximum on these intermediate scales.
This seemingly annoying lack of sensitivity to the mean is a conscious choice and indeed a key advantage of interferometers. The mean sensitivity can optionally be retained by simply including the antenna autocorrelations in the analysis (i.e., not explicitly setting the pixel at the origin of the UV plane equal to zero), but this pixel normally contains a large positive bias due to noise that is difficult to accurately subtract out. In contrast, the noise in all other pixels normally has zero mean, because the noise in different antennas is uncorrelated. Since single-dish telescopes cannot exclude this zero mode, they often require other approaches to mitigate this noise bias, such as rapid scanning or beam switching.
How it depends on experimental details
Consider a telescope with total collecting area $A$ observing for a time $\tau$ with a bandwidth $\Delta\nu$ around some frequency $\nu$. If this telescope performs a single pointing in some direction, then the noise power spectrum for this observed region is (19):
(17) 
Here $\gamma$ is a dimensionless factor of order unity that depends on the convention used to define the telescope system temperature $T_{\rm sys}$; below we simply adopt the convention where $\gamma = 1$. For a single-dish telescope and for a maximally compact interferometer like the FFTT, $f^{\rm cover} = 1$. For an interferometer where the antennas are rather uniformly spread out over a larger circular area, $f^{\rm cover}$ is the fraction of this area that they cover; if there are $N$ antennas with diameter $D$ in this larger area of diameter $D_{\max}$, we thus have $f^{\rm cover} = N(D/D_{\max})^2$ and total collecting area $A = N\pi(D/2)^2$. For a general interferometer the noise power spectrum depends on the distribution of baselines and could be a complicated function of $\ell$. We are absorbing all $\ell$-dependence into the beam function $B_\ell$ as per equation (16).
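As a small worked example of the covering fraction, consider identical circular dishes spread over a circular region: the fraction of the region they cover is the dish count times the squared ratio of dish diameter to region diameter. The function names and the numbers below are hypothetical and are not taken from any instrument discussed in the text.

```python
import math


def covering_fraction(n_dishes: int, dish_diameter: float,
                      array_diameter: float) -> float:
    """Fraction of a circular region covered by n identical circular dishes."""
    return n_dishes * (dish_diameter / array_diameter) ** 2


def collecting_area(n_dishes: int, dish_diameter: float) -> float:
    """Total geometric collecting area of n identical circular dishes."""
    return n_dishes * math.pi * (dish_diameter / 2.0) ** 2


# Hypothetical array: 500 dishes of 4 m diameter within a 1500 m region.
f_cover = covering_fraction(500, 4.0, 1500.0)
area = collecting_area(500, 4.0)
print(f"f_cover = {f_cover:.2e}, A = {area:.0f} m^2")
```

A maximally compact array corresponds to a covering fraction of order unity, which is the FFTT limit described above.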
If instead of just pointing at a fixed sky patch, the telescope scans the sky (using Earth rotation and/or pointing) to map a solid angle $\Omega_{\rm map}$ that exceeds its field of view $\Omega$, and spends roughly the same amount of time covering all parts of the map, then a given point in the map is observed a fraction $\Omega/\Omega_{\rm map}$ of the time. The resulting noise power spectrum for the map is then
(18) 
Here is the fraction of the sky covered by the map, and we have introduced the dimensionless parameter to denote the relative bandwidth.
III.4 The 3D noise power spectrum
For 21 cm applications, it is also important to know the three-dimensional noise power spectrum of the “data cube” mapped by treating the frequency as the radial direction (the lower the frequency, the larger the redshift and hence the larger the distance to the hydrogen gas responsible for the 21 cm signal). In a comoving volume of space subtending a small angle and a small redshift range centered around $z_*$, we can linearize the relation between the comoving coordinate $\mathbf{r}$ and the observed quantities (e.g., (24)):
(19) $\mathbf{r}_\perp = d_A(z_*)\,\mathbf{\Theta}$,
(20) $\Delta r_\parallel = y(z_*)\,\Delta\nu$.
Here $\mathbf{\Theta}$ gives the angular distance away from the center of the field being imaged, and $\mathbf{r}_\perp$ is the corresponding comoving distance transverse to the line of sight; $\Delta r_\parallel$ is the comoving separation along the line of sight corresponding to a frequency difference $\Delta\nu$. $d_A(z)$ is the comoving angular diameter distance to redshift $z$, and
(21) $y(z) = \frac{\lambda_{21}(1+z)^2}{H(z)}$,
where $\lambda_{21} \approx 21$ cm is the rest-frame wavelength of the 21 cm line, and $H(z)$ is the cosmic expansion rate at redshift $z$. In Appendix B, we show that these two conversion functions can be accurately approximated by
(22)  
(23) 
for the regime most relevant to 21 cm tomography given the flat concordance cosmological parameter values and kmsMpc (29); (30).
If a 2-dimensional map is subdivided into pixels of area $\Omega_{\rm pix}$ and the noise is uncorrelated with variance $\sigma^2$ in these pixels, then
(24) $C_\ell^{\rm noise} = \sigma^2\,\Omega_{\rm pix}$
for angular scales well above the pixel scale. Analogously, if a 3-dimensional map is subdivided into pixels (voxels) of volume $V_{\rm pix}$ and the noise is uncorrelated with variance $\sigma^2$ in them, then
(25) $P^{\rm noise} = \sigma^2\,V_{\rm pix}$
on length scales well above the pixel scale. Since the volume of a 3D pixel is $V_{\rm pix} = (d_A^2\,\Omega_{\rm pix}) \times (y\,\Delta\nu)$, i.e., its area times its depth, combining equations (18), (24) and (25) gives the large-scale noise power spectrum
(26) 
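The statement that uncorrelated pixel noise of variance $\sigma^2$ in pixels of solid angle $\Omega_{\rm pix}$ gives a flat noise power spectrum $\sigma^2\Omega_{\rm pix}$ can be illustrated in the flat-sky limit with a simulated white-noise map. The estimator normalization below (pixel area times squared FFT modulus divided by the number of pixels) is a common flat-sky convention adopted here as an assumption, and the map size, noise level and pixel area are arbitrary.

```python
import numpy as np


def flat_sky_noise_power(noise_map, pixel_area):
    """Flat-sky power estimate per Fourier mode of a pixelized map."""
    n_pix = noise_map.size
    fourier = np.fft.fft2(noise_map)
    return pixel_area * np.abs(fourier) ** 2 / n_pix


rng = np.random.default_rng(2)
sigma = 3.0          # rms noise per pixel (arbitrary units)
pixel_area = 1.0e-6  # pixel solid angle in steradians (hypothetical)

noise = rng.normal(0.0, sigma, size=(256, 256))
power = flat_sky_noise_power(noise, pixel_area)

measured = power.mean()
expected = sigma**2 * pixel_area
print(f"measured / expected = {measured / expected:.3f}")
```

The same argument carries over to three dimensions with the pixel area replaced by the voxel volume.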
When 2D and 3D power spectra are discussed in the cosmology literature, it is popular to introduce corresponding quantities
(27)  
(28) 
which give the variance contribution per logarithmic interval in scale. One typically has when both the angular scale and the bandwidth are chosen to match the length scale , i.e., when and . Beware that here (and only here) we use to denote the wavenumber of cosmic fluctuations, while everywhere else in this paper, we use it to denote the wave vector of electromagnetic radiation.
Sensitivity to point sources
It is obviously good to have a small noise power spectrum and a large field of view. However, the tradeoff between these two differs depending on the science goal at hand. Below we mention two cases of common interest.
If one wishes to measure the flux from an isolated point source, it is easy to show that the attainable accuracy is
(29) 
In the approximation of a Gaussian beam with rms width , this simplifies to
(30)  
In the last step, we used the fact that the angular resolution . The total information (inverse variance) in the map about the point source flux thus scales as . That this information is proportional to the field of view, the observing time and the bandwidth is rather obvious. That it scales with the collecting area as $A^2$ rather than $A$ is because every baseline carries an equal amount of information about the flux, and the number of baselines scales quadratically with the area. It is independent of $f^{\rm cover}$ because it does not matter how long the baselines are; therefore the result is the same regardless of where the antennas are placed. This last result also provides intuition for the factor $f^{\rm cover}$ in equation (17): since is independent of and , we must have . As $f^{\rm cover}$ drops and the same total amount of information is spread out over an area in the UV plane that is a factor $1/f^{\rm cover}$ larger, the information in any given $\ell$-bin that was previously observed must drop by the same factor, increasing its variance by a factor $1/f^{\rm cover}$.
Power spectrum sensitivity
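For CMB and 21 cm applications, one is interested in measuring the power spectrum $C_\ell$ of the sky signal. The accuracy with which this can be done depends not only on $C_\ell^{\rm noise}$, but also on the signal $C_\ell$ itself (which contributes sample variance) and on the mapped sky fraction $f_{\rm sky}$. The average power spectrum across a band consisting of $\Delta\ell$ multipoles centered around $\ell$ can be measured to precision (31); (32)
(31) $\Delta C_\ell \approx \sqrt{\frac{2}{(2\ell+1)\,\Delta\ell\,f_{\rm sky}}}\,\left(C_\ell + C_\ell^{\rm noise}\right)$.
Since $C_\ell^{\rm noise} \propto f_{\rm sky}$ for a fixed total observing time, there is an optimal choice of $f_{\rm sky}$ that minimizes $\Delta C_\ell$. In cases where $f_{\rm sky} < 1$ is optimal, this best choice corresponds to $C_\ell^{\rm noise} \approx C_\ell$, so that sample variance and noise make comparable contributions (31); (32). This means that optimized measurements tend to fall into one of three regimes: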

No detection: even when is made as small as the telescope permits. Upper limit .

Improvable detection: , and .

Cosmic variance limited detection: , and further noise reductions do not help.
The regime normally depends on , since and tend to have different shapes. For example, the WMAP measurement of the unpolarized CMB is in regimes 1, 2 and 3 at , and , respectively.
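As a rough numerical illustration of these three regimes, the sketch below assumes the widely used Knox-type band-power error estimate, $\Delta C_\ell \approx \sqrt{2/[(2\ell+1)\,\Delta\ell\,f_{\rm sky}]}\,(C_\ell + C_\ell^{\rm noise})$, which may differ in detail from equation (31). The function names, the `dominance` threshold and all input numbers are illustrative assumptions rather than values taken from the text.

```python
import math

def delta_cl(ell, delta_ell, f_sky, c_signal, c_noise):
    """Assumed Knox-type error on a band power centered on multipole ell."""
    n_modes = (2 * ell + 1) * delta_ell * f_sky  # independent modes in the band
    return math.sqrt(2.0 / n_modes) * (c_signal + c_noise)

def regime(ell, delta_ell, f_sky, c_signal, c_noise, dominance=10.0):
    """Classify one band into the three regimes (threshold is an assumption)."""
    err = delta_cl(ell, delta_ell, f_sky, c_signal, c_noise)
    if err > c_signal:
        return "no detection"
    if c_signal > dominance * c_noise:
        return "cosmic variance limited"
    return "improvable detection"

if __name__ == "__main__":
    # Arbitrary units: signal power fixed at 1, noise power varied.
    for noise in (100.0, 1.0, 0.01):
        print(noise, regime(100, 10, 0.5, 1.0, noise))
```

In this toy setup, lowering the noise power at fixed signal moves a band from the first regime through the second to the third, after which further noise reduction no longer shrinks the error bar appreciably.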
III.5 Field of view
The field of view of a telescope is the solid angle that it can map in a single pointing. For a telescope with a single dish of diameter and a single receiver/detector pixel in its focal plane (a dish for satellite TV reception, say), the receiver will simply map a sky patch corresponding to the angular resolution , giving . The opposite extreme is to fill the entire focal plane with receivers, as is often done for, e.g., microwave and optical telescopes. In Appendix C, we show that the largest focal plane possible covers an angle of order , corresponding to . This upper bound comes from the fact that the analog Fourier transform performed by telescope optics is only approximate. Many actual multireceiver telescopes fall somewhere between these two extremes. In summary, singledish telescopes have a field of view somewhere in the range
(32) 
We refer to the two extreme cases in this inequality as the single receiver telescope (SRT) and the maximal focal plane telescope (MFPT), respectively.
Since the performs its Fourier transform with no approximations, it can in principle observe the entire sky above the horizon, corresponding to . However, the useful field of view is only of order half of this, because the image quality degrades near the horizon: viewed from a zenith angle , one dimension of the telescope appears foreshortened by a factor , causing loss of both angular resolution and collecting area (and thus sensitivity) near the horizon.
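The foreshortening argument can be made concrete with a few lines of geometry. The sketch below assumes a flat array seen from zenith angle θ, so that the projected length along one dimension (and hence the projected area) scales as cos θ while the beam along that dimension broadens as 1/cos θ. The function names are illustrative.

```python
import math

def foreshortening(zenith_angle_deg):
    """Projected-length factor cos(theta) for a flat array viewed at zenith angle theta."""
    if not 0.0 <= zenith_angle_deg < 90.0:
        raise ValueError("zenith angle must be in [0, 90) degrees")
    return math.cos(math.radians(zenith_angle_deg))

def projected_area_and_beam(zenith_angle_deg):
    """Return (relative projected area, relative beam width along the foreshortened axis)."""
    f = foreshortening(zenith_angle_deg)
    return f, 1.0 / f

if __name__ == "__main__":
    for angle in (0, 30, 60, 80):
        area, beam = projected_area_and_beam(angle)
        print(f"zenith angle {angle:2d} deg: area x{area:.2f}, beam width x{beam:.2f}")
```

At a zenith angle of 60 degrees, for example, the projected area is halved and the beam is twice as wide in one direction, which is why only about half of the sky above the horizon is useful.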
III.6 Cost
Detailed cost estimates for telescopes are notoriously difficult to make, and will not be attempted here. We will instead limit our analysis to the approximate scaling of cost with collecting area, as summarized in Table 1, which qualitatively determines which telescopes are cheapest in the different parts of the parameter space of Figure 2.
For a singledish telescope, the cost usually grows slightly faster than linearly with area. Specifically, it has been estimated that the cost for radio telescopes (33).
For a standard interferometric telescope consisting of separate dishes, the total cost for the dishes themselves is of course proportional to . However, the cost for the correlator hardware that computes the correlations between all the pairs of dishes scales as , and thus completely dominates the total cost in the large limit that is the focus of the present paper (already at the modest scale of the MWA experiment, where , the and parts of the hardware cost are comparable). For fixed dish size, the total collecting area so that the cost .
For an FFT Telescope, the costs of antennas, ground screen and amplifiers are all proportional to the number of antennas and hence to the area. As described in Section II, the cost of the computational hardware is also proportional to the area, up to some small logarithmic factors that we can ignore to first approximation.
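To give a feeling for how quickly the two computational scalings diverge, the sketch below compares the number of antenna pairs that a conventional correlator must process with an N log2 N operation count for an FFT-based design. Constant factors, per-operation costs and the array sizes used are assumptions made purely for illustration.

```python
import math

def correlator_pairs(n_antennas):
    """Number of distinct antenna pairs a pairwise correlator must handle."""
    return n_antennas * (n_antennas - 1) // 2

def fft_operations(n_antennas):
    """Order-of-magnitude operation count for an FFT over n_antennas samples."""
    return n_antennas * math.log2(n_antennas)

def cost_ratio(n_antennas):
    """Pairwise-correlation cost relative to FFT cost, ignoring constant factors."""
    return correlator_pairs(n_antennas) / fft_operations(n_antennas)

if __name__ == "__main__":
    for n in (2**10, 2**14, 2**20):
        print(f"N = {n:>8d}: pairs/FFT ops ~ {cost_ratio(n):.0f}")
```

For N = 1024 the ratio is already about 51, and it grows nearly linearly with N, which is the sense in which the correlator dominates the cost of a large conventional interferometer.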
The abovementioned approximate scalings are of course only valid over a certain range.
All telescopes must have . The cost of single dishes grows more rapidly once their structural integrity becomes an issue; for example, engineering challenges appear to make a single-dish radio telescope with km daunting with current technology.
III.7 Which telescope is best for what?
Let us now put together the results from the previous subsections to investigate which telescope design is most cost effective for various science goals.
We will use the noise power spectrum to quantify sensitivity. We will begin our discussion focusing on only two parameters, the largescale sensitivity and the angular resolution , since the parametrization is a reasonable approximation for many of the telescope designs that we have discussed. We then turn to more general noise power spectra when discussing elongated FFTs, general interferometers and the issue of point source subtraction.
Complementarity
If we need a telescope with angular resolution and large-scale sensitivity , then which design will meet our requirements at the lowest cost? The answer is summarized in Figure 2 for a MHz example. First of all, we see that SDTs, FFTTs and SITs are highly complementary: the cheapest solution is offered by SDTs for low resolution, FFTTs for high sensitivity K, and standard interferometers or elongated FFTTs for high resolution K, .
Calculational details
A few comments are in order about how these results were obtained.
For a single SRT, MFPT or FFTT, both the resolution and the sensitivity are determined by their area alone, so as the area is scaled up, they each trace a line through the parameter space of Figure 2. The cheapest way to attain a better sensitivity at the same resolution is simply to build multiple telescopes of the same area (except for the FFTT, where cost, so that one might as well build a single larger telescope instead and get extra resolution for free). Since , where is the number of telescopes whose images are averaged together, the sensitivity of an FFTT with a given resolution can be matched by building telescopes, where for the MFPT and for the SRT. The cost relative to an FFTT of the same resolution and sensitivity thus grows as for MFPTs and as for SRTs. The area below which single-dish telescopes are cheaper depends strongly on wavelength; for the illustrative purposes of Figure 2, we have taken this to be at 150 GHz based on crude hardware cost estimates for the GMRT (5) and MWA (1) telescopes.
For regions to the right of the FFTT line in Figure 2, one has the option of either building a square (or circular) FFTT with unnecessarily high sensitivity to attain the required resolution, or to build an elongated FFTT or a conventional interferometer — we return to this below, and argue that the latter is generally cheaper.
How the results depend on frequency and survey details
Although Figure 2 is for a specific example, these qualitative results hold more generally. Survey duration, bandwidth, system temperature and sky coverage all merely rescale the numbers on the vertical axis, leaving the figure otherwise unchanged. As one alters the observing wavelength, the resolution and sensitivity remain the same if one alters the other scales accordingly: , , except that grows rapidly towards very low frequencies as the brightness temperature of synchrotron radiation exceeds the instrument temperature. The cost depends strongly and nonlinearly on frequency. As discussed in Section III.6, both the FFTT and digital SITs are currently feasible only below about 1 GHz, and analog interferometry has not yet been successfully carried out above optical frequencies (?).
The advantage of an FFTT over a single dish telescope
The results above show that the FFTT can be thought of as simply a cheap single-dish telescope with a field of view. Compared to a single-dish telescope, the FFTT has two important advantages:

It is cheaper in the limit of a large collecting area, with the cost scaling roughly like rather than or more.

It has better power spectrum sensitivity even for fixed area , because of a field of view that is larger by a factor between and .
An important disadvantage of the FFTT is that it currently only works below about 1 GHz. Even if it were not for this limitation, since the computational cost of interferometry depends on the number of resolution elements , which grows fast toward higher frequencies (as for the MFPT and as for the FFTT), single-dish telescopes become comparatively more advantageous at higher frequencies. However, as Moore’s law marches on, the critical frequency where an FFTT loses out to an SDT should grow exponentially over time.
The advantage of an FFT Telescope over a traditional correlating interferometer
The results above also show that the FFTT can be thought of as a cheap maximally compact interferometer with a full-sky primary beam. To convert a state-of-the-art interferometer such as MWA (1), LOFAR (2), PAPER (4) or 21CMA (3) into an FFTT, one would need to do three things:

Move all antenna tiles together so that they nearly touch.

Get rid of any beamformer that “points” tiles towards a specific sky direction by adding relative phases to its component antennas, and treat each antenna as independent instead, thus allowing the array to image all sky directions simultaneously.

Move the antennas onto a rectangular grid to cut the correlator cost from $N^2$ to $N\log_2 N$.
This highlights both advantages and disadvantages of the FFTT compared to traditional interferometers. There are three important advantages:

It is cheaper in the limit of a large collecting area, with the cost scaling roughly like rather than .

It has better power spectrum sensitivity even for fixed area , because of a field of view that is larger than for an interferometer whose primary beam is not full sky (because its array elements are either singledish radio telescopes or antenna tiles that are pointed with beamformers).

The synthesized beam is as clean and compact as for a SDT, corresponding to something like a simple Airy pattern. This has advantages for multifrequency point source subtraction as discussed below, and also for high fidelity mapmaking.
The most obvious drawback of a square or circular FFTT is that the angular resolution is much poorer than what a traditional interferometer can deliver. This makes it unsuitable for many traditional radio astronomy applications. We discuss below how this drawback can be partly mitigated by a rectangular rather than square design.
A second drawback is the lack of flexibility in antenna positioning. Whereas traditional interferometry allows one to place the antennae wherever it is convenient given the existing terrain, the construction of a large FFTT requires bulldozing.
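The reason a regular grid removes the need for a pairwise correlator can be checked numerically in one dimension. The sketch below is a toy model, not the pipeline described in Section II: it takes a single time sample from antennas on a uniform 1D grid, forms the image as the squared modulus of a zero-padded FFT, and confirms that inverse-transforming this image reproduces the products s_i s_j* summed over all antenna pairs sharing the same baseline. The small radix-2 FFT is written out so that the example needs only the standard library, and all function names are illustrative.

```python
import cmath

def fft(x):
    """Recursive radix-2 FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return list(x)
    even, odd = fft(x[0::2]), fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k], out[k + n // 2] = even[k] + t, even[k] - t
    return out

def ifft(x):
    """Inverse FFT via the conjugation identity."""
    n = len(x)
    return [v.conjugate() / n for v in fft([u.conjugate() for u in x])]

def padded_length(n):
    """Smallest power of two that holds all baselines -(n-1)..(n-1)."""
    size = 1
    while size < 2 * n - 1:
        size *= 2
    return size

def baseline_sums_direct(s):
    """O(N^2): sum s_i * conj(s_j) over all pairs, binned by baseline i - j."""
    size = padded_length(len(s))
    out = [0j] * size
    for i, si in enumerate(s):
        for j, sj in enumerate(s):
            out[(i - j) % size] += si * sj.conjugate()
    return out

def baseline_sums_fft(s):
    """O(N log N): FFT the grid, square the modulus, transform back."""
    size = padded_length(len(s))
    padded = list(s) + [0j] * (size - len(s))
    image = [abs(v) ** 2 for v in fft(padded)]
    return ifft(image)

if __name__ == "__main__":
    signal = [complex(k + 1, 0.5 * k * (-1) ** k) for k in range(8)]
    direct, fast = baseline_sums_direct(signal), baseline_sums_fft(signal)
    print(max(abs(a - b) for a, b in zip(direct, fast)))
```

The two routes agree to rounding error, but only the first requires touching every antenna pair. This equivalence relies on the antennas sharing a common grid spacing, which is the origin of the positioning constraint noted above.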
The advantage of a 2D FFTT over a 1D FFTT exploiting Earth Rotation
There are two fundamentally different approaches to fully sampling a disk around the origin of the Fourier plane (usually referred to as the UV plane in the radio astronomy terminology): build a twodimensional array (like a square FFTT) whose baselines cover this disk, or build a more sparse array that fills the disk gradually, after adding together observations made at multiple times, when Earth rotation has rotated the available baselines. Equation (18) shows that, given a fixed number of antennas and hence a fixed collecting area, the former option gives lower and hence more accurate power spectrum measurements as long as the angular resolution is sufficient. The reason is that the factor in the denominator equals unity for the former case, and is otherwise smaller. For a rectangular FFTT of dimensions , depends on the angular scale and it is easy to show that
(34) 
In essence, making the telescope more oblong simply dilutes the same total amount of information out over a broader range of space, thus giving poorer sensitivity on the angular scales originally probed.
What telescope configuration is desirable depends on the science goal at hand. It has been argued (35); (24) that for doing cosmology with 21 cm tomography in the near term, it is best to make the telescope as compact as possible, i.e., to build a square or circular telescope. The basic origin of this conclusion is the result “a rolling stone gathers no moss” mentioned in Section III.4.2: for power spectrum measurement, it is optimal to focus the efforts to make the signaltonoise of order unity. The first generation of experiments have much lower signaltonoise than this, and thus benefit from focusing on large angular scales and measuring them as accurately as possible rather than measuring a larger range of angular scales with even poorer sensitivity. Of course, none of these 1st generation telescopes were funded for 21cm cosmology alone, and their ability to perform other science hinges on having better angular resolution, explaining why they were designed with less compact configurations. Better angular resolution can also aid point source removal.
For other applications where high angular resolution is required, an oblong telescope is preferable. An interesting example of this type is the proposed Pittsburgh Cylinder Telescope for higher-frequency mapping (15); (16), which is one-dimensional. Instead of rather omnidirectional antennas, it takes advantage of its one-dimensional nature by having a long cylindrical mirror, which increases the collecting area at higher frequencies. This is advantageous because its goal is to map 21 cm emission at the lower redshifts (higher frequencies 200 MHz) corresponding to the period after cosmic reionization, to detect neutral hydrogen in galaxies and use this to measure the baryon acoustic oscillation scale as a function of redshift. If one wishes to perform rotation synthesis with an oblong or 1D FFTT, it will probably be advantageous to build multiple telescopes rotated relative to one another (say in an L-shaped layout, or like spokes on a wheel), to reduce the amount of integration time needed to fill the UV plane. Cross-correlating the antennas between the telescopes would incur a prohibitive computational cost, so such a design with separate telescopes would probably need to discard all but a fraction of the total information, corresponding to the intra-telescope baselines.
Another array layout giving higher resolution is to build an array whose elements consist of FFTTs placed far apart. After performing a spatial FFT of their individual outputs, these can then be multiplied and inversetransformed pairwise, and the resulting block coverage of the UV plane can be filled in by Earth rotation. As long as the number of separate FFTTs is modest, the extra numerical cost for this may be acceptable.
Above we discussed the tradeoff between different shapes for fixed collecting area. If one instead replaces a twodimensional FFTT by a onedimensional FFTT of length using rotation synthesis, then equation (18) shows that one loses sensitivity in two separate ways: at the angular scale where the power spectrum error bar from equation (31) is the smallest, one loses one factor of from the drop in , and a second factor of from the drop in collecting area . Another way of seeing this is to note that the available information scales as the number of baselines, which scales as the square of the number of antennas and hence as . This quadratic scaling can also be seen in equation (30): the total amount of information scales as , so whereas field of view, observing time and bandwidth help only linearly, area helps quadratically. This is because we can correlate electromagnetic radiation at different points in the telescope, but not at different times, at different frequencies or from different points in the sky. The common statement that the information gathered scales as the etendu is thus true only at fixed ; when all angular scales are counted, the scaling becomes .
If, in the quest for more sensitivity, one keeps lengthening an oblong or one-dimensional FFTT to increase the collecting area, one eventually hits a limit: the curvature of Earth’s surface makes a flat telescope exceedingly costly, requiring instead a telescope curving along Earth’s surface and the alternative analysis framework mentioned above in Section III.6. If one desires maximally straightforward data analysis, one thus wants to grow the telescope in the other dimension to make it less oblong, as discussed in Section III.6. This means that if one needs antennas for adequate 21 cm cosmology sensitivity, one is forced to build a 2D rather than 1D telescope. For comparison, even the currently funded MWA experiment with its antennas is close to this number.
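One simple way to quantify the curvature limit is the sagitta of a chord: the maximum vertical gap between a perfectly flat structure of a given length and a spherical Earth. The sketch below assumes a mean Earth radius of 6371 km and ignores local topography. The function name and the example lengths are illustrative choices, not values from the text.

```python
import math

EARTH_RADIUS_M = 6.371e6  # assumed mean Earth radius

def sagitta_m(length_m, radius_m=EARTH_RADIUS_M):
    """Maximum gap between a flat chord of the given length and the sphere."""
    half = length_m / 2.0
    return radius_m - math.sqrt(radius_m ** 2 - half ** 2)

if __name__ == "__main__":
    for length_km in (1, 3, 10, 30):
        gap = sagitta_m(length_km * 1000.0)
        print(f"{length_km:3d} km flat array: gap of about {gap:.2f} m")
```

A 1 km array departs from the curved surface by only about 2 cm, whereas a 10 km array departs by about 2 m, which is comparable to the observing wavelength at frequencies near 150 MHz.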
One final science application where 2D is required is the study of transient phenomena that vary on a time scale much shorter than a day, invalidating the static sky approximation that underlies rotation synthesis. This was the key motivation behind the aforementioned Waseda telescope (10); (11); (12).
IV Application to 21 cm tomography
In the previous section we discussed the pros and cons of the FFT Telescope, and found that its main strength is for mapping below about 1 GHz when extreme sensitivity is required. This suggests that the emerging field of 21 cm tomography is an ideal first science application of the FFTT: it requires sky mapping in the sub-GHz frequency range, and the sensitivity requirements, especially to improve cosmic microwave background constraints on cosmological parameters, are far beyond what has been achieved in the past (38); (37); (24); (39).
IV.1 21 cm tomography science
It is becoming increasingly clear that 21 cm tomography has great scientific potential for both astrophysics (18); (19); (20); (21); (35) and fundamental physics (36); (38); (37); (24); (39). The basic idea is to produce a threedimensional map of the matter distribution throughout our Universe through precision measurements of the redshifted 21 cm hydrogen line. For astrophysics, much of the excitement centers around probing the cosmic dark ages and the subsequent epoch of reionization caused by the first stars. Here we will focus mainly on fundamental physics, as this arguably involves both the most extreme sensitivity requirements and the greatest potential for funding extremely sensitive measurements.
Three physics frontiers
Future measurements of the redshifted 21 cm hydrogen line have the potential to probe hitherto unexplored regions of parameter space, pushing three separate frontiers: time, scale, and sensitivity. Figure 5 shows a scaled sketch of our observable Universe, our Hubble patch. It serves to show the regions that can be mapped with various cosmological probes, and illustrates that the vast majority of our observable universe is still not mapped. We are located at the center of the diagram. Galaxies (from the Sloan Digital Sky Survey (SDSS) in the plot) map the distribution of matter in a three-dimensional region at low redshifts. Other popular probes like gravitational lensing, supernovae Ia, galaxy clusters and the Lyman α forest are currently also limited to the small volume fraction corresponding to redshifts or less, and in many cases much less. The CMB can be used to infer the distribution of matter in a thin shell at the so-called “surface of last scattering”, whose thickness corresponds to the width of the black circle at and thus covers only a tiny fraction of the total volume. The region available for observation with the 21 cm line of hydrogen is shown in light blue/grey. Clearly the 21 cm line of hydrogen has the potential of allowing us to map the largest fraction of our observable universe and thus obtain the largest amount of cosmological information.
At the high redshift end the 21 cm signal is relatively simple to model as perturbations are still linear and “gastrophysics” related to stars and quasars is expected to be unimportant. At intermediate times, during the epoch of reionization (EOR) around redshift , the signal is strongly affected by the first generation of sources of radiation that heat the gas and ionize hydrogen. Modeling this era requires understanding a wide range of astrophysical processes. At low redshifts, after the epoch of reionization, the 21 cm line can be used to trace neutral gas in galaxies and map the large scale distribution of those galaxies.
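Because each redshift range discussed above maps onto a specific observing band, it can help to convert between the two. The sketch below uses the rest frequency of the hydrogen hyperfine transition (about 1420.406 MHz) and the relation ν_obs = ν_rest/(1+z). The function names and the example redshifts are illustrative.

```python
REST_FREQUENCY_MHZ = 1420.405751768   # rest frequency of the 21 cm hyperfine line
SPEED_OF_LIGHT_M_S = 299792458.0

def observed_frequency_mhz(redshift):
    """Observed frequency of the 21 cm line emitted at the given redshift."""
    return REST_FREQUENCY_MHZ / (1.0 + redshift)

def redshift_from_frequency(frequency_mhz):
    """Redshift at which the 21 cm line is observed at the given frequency."""
    return REST_FREQUENCY_MHZ / frequency_mhz - 1.0

def observed_wavelength_m(redshift):
    """Observed wavelength, i.e. the rest wavelength stretched by (1 + z)."""
    return SPEED_OF_LIGHT_M_S / (observed_frequency_mhz(redshift) * 1.0e6)

if __name__ == "__main__":
    for z in (1, 6, 9, 20, 50):
        print(f"z = {z:2d}: {observed_frequency_mhz(z):7.1f} MHz, "
              f"{observed_wavelength_m(z):5.2f} m")
```

Emission from redshift 9, for instance, arrives at roughly 142 MHz, well inside the sub-GHz range where an FFTT is argued to be feasible.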
The time frontier
Figure 5 illustrates that observations of the 21 cm line from the EOR and higher redshifts would map the distribution of hydrogen at times where we currently have no other observational probe, pushing the redshift frontier. Measurements of the 21 cm signal as a function of redshift will constrain the expansion history of the universe, the growth rate of perturbations and the thermal history of the gas during an epoch that has yet to be probed.

Tests of the standard model predictions for our cosmic thermal history , expansion history (which can be measured independently using both expansion and the angular diameter distances), and linear clustering growth.

Constraints on modified gravity from the abovementioned measurements of and clustering growth.

Constraints on decay or annihilation of dark matter particles, or any other longlived relic, from the abovementioned measurement of our thermal history (40); (41); (42). Here 21cm is so sensitive that even the expected annihilation of “vanilla” neutralino WIMP cold dark matter may be detectable (42).

Constraints on evaporating primordial black holes from the thermal history measurement (43).

Constraints on timevariation of fundamental physical constants such as the fine structure constant (44).
The scale frontier
These observations can potentially push the “scale frontier”, significantly extending the range of scales that are accessible to do cosmology. This is illustrated in figure 6, where the scales probed by different techniques are compared to what is available in 21 cm. Neutral hydrogen is a good probe of the small scales for two separate but related reasons. First, one can potentially make observations at higher redshifts, where more of the scales of interest are in the linear regime and thus can be better modeled. Second, at early times in the history of our Universe, hydrogen is still very cold and thus its distribution is expected to trace that of the dark matter up to very small scales, the socalled Jeans scale, where pressure forces in the gas can compete with gravity (45).

Precision tests of inflation by constraining smallscale nonGaussianity (46).

Precision constraints on noncold dark matter from probing galactic scales while they were still linear.
The sensitivity frontier
This combination of a large available volume with the presence of fluctuations on small scales that can be used to constrain cosmology implies that the amount of information that at least in principle can be obtained with the 21 cm line is extremely large. This can be illustrated by calculating the number of Fourier modes available to do cosmology that can be measured with this technique. This number can be compared with the number of modes measured to date with various other techniques such as galaxy surveys, the CMB, etc. In figure 7, we show the number of modes measured by past surveys and some planned probes including 21 cm experiments.
The FFTT sensitivity improvement translates into better measurement accuracy for many of the usual cosmological parameters. It has been shown that even the limited redshift range (dark shading in Figure 5) has the potential to greatly improve on cosmic microwave background constraints from WMAP and Planck: it could improve the sensitivity to spatial curvature and neutrino masses by up to two orders of magnitude, to and eV, and give a detection of the spectral index running predicted by the simplest inflation models (24). Indeed, it may even be possible to measure three individual neutrino masses from the scale and time dependence of clustering (24); (47).
Measuring the 21 cm power spectrum and using it to constrain physics and astrophysics does not require pushing the noise level down to the signal level, since the noise can be averaged down by combining many Fourier modes probing the same range of scales. This is analogous to how the COBE satellite produced the first measurement of the CMB power spectrum even though individual pixels in its sky maps were dominated by noise rather than signal (48). Further boosting the sensitivity to allow imaging (with signaltonoise per pixel exceeding unity) allows a number of improvements:

Improving quantification, modeling and understanding of foregrounds and systematic errors

Pushing down residual foregrounds with better cleaning (like in the CMB field, the residual foreground level after cleaning is likely to be comparable to the noise level)

Enabling power spectrum and nonGaussianity estimation after masking out ionized bubbles, thus greatly reducing the hardtomodel “gastrophysics” contribution

Pushing to higher redshift where the physics is simpler
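The earlier statement that noise can be averaged down by combining many Fourier modes can be illustrated with a small Monte Carlo experiment. The sketch below is a toy model that assumes real Gaussian modes with a known noise power and a much smaller signal power. It estimates the signal power as the mean square amplitude minus the noise power and compares the outcome with the expected scatter, sqrt(2/N) times the total power. All numbers and names are illustrative.

```python
import math
import random

def estimate_signal_power(n_modes, signal_power, noise_power, seed=0):
    """Noise-debiased power estimate from n_modes noisy Gaussian modes."""
    rng = random.Random(seed)
    signal_sd, noise_sd = math.sqrt(signal_power), math.sqrt(noise_power)
    total = 0.0
    for _ in range(n_modes):
        amplitude = rng.gauss(0.0, signal_sd) + rng.gauss(0.0, noise_sd)
        total += amplitude * amplitude
    return total / n_modes - noise_power

def expected_error(n_modes, signal_power, noise_power):
    """Standard deviation of the estimate for Gaussian modes."""
    return math.sqrt(2.0 / n_modes) * (signal_power + noise_power)

if __name__ == "__main__":
    signal, noise = 0.04, 1.0   # signal-to-noise per mode of only 4 percent
    for n in (100, 10_000, 200_000):
        est = estimate_signal_power(n, signal, noise)
        print(n, round(est, 4), round(expected_error(n, signal, noise), 4))
```

With a few hundred thousand modes, the expected scatter falls well below the signal power even though every individual mode is noise dominated, which is the situation described for COBE above.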
IV.2 The cost of sensitivity
There is thus little doubt that sensitivity improvements can be put to good use. Equation (26) implies that the highredshift frontier in particular has an almost insatiable appetite for sensitivity: since , , depends only weakly on , and the diffuse synchrotron foreground that dominates at low frequencies scales roughly as in the cleanest parts of the sky for MHz (52), equation (26) gives a sensitivity
(35) 
if the observing time and field of view are held fixed (like for the FFTT). Pushing from to with the same sensitivity thus requires increasing the collecting area by a factor around 300. This would keep the signal-to-noise level roughly the same if the 21 cm fluctuation amplitude is comparable and peaks at similar angular scales at the two redshifts, as suggested by the calculations of (23). Equation (35) shows that imaging smaller scales is expensive too, with an order of magnitude smaller scales (multiplying by 10) requiring a thousandfold increase in collecting area.
Figures 8 and 9 illustrate the rough cost of attaining the sensitivity levels required for various physics milestones mentioned above. Our cost estimates are very crude, and making more accurate ones would go beyond the scope of the present paper, but the qualitative scalings seen in the figures should nonetheless give a good indication of how the different telescope designs complement each other.
Figure 8 is for the case when all we care about is sensitivity, not how large a sky area is mapped with this sensitivity. We thus keep the telescope pointing at the same sky patch and get , so equation (35) gives a sensitivity . For a fixed spatial scale and redshift , the sensitivity thus depends only on the effective collecting area plotted on the horizontal axis. The solid curves in the figure all have maximally compact configurations with , corresponding to angular resolution . The lines are dotted where this resolution for the baseline wavelength m. If we insist on the higher resolution , we can achieve this goal by making the FFTT or SIT oblong or otherwise sparse, with , so in this regime, and hence , and this area in turn determines the cost; this is why the solid curves in Figure 8 lie above the corresponding dotted ones.
Figure 9 is for the case when we want a map of a fixed area (WMAP-style), in this case covering half the sky (), so equation (35) gives a sensitivity . For a fixed spatial scale and redshift, the sensitivity thus depends only on the effective collecting etendue plotted on the horizontal axis. Since drops with area for both the SRT and MFPT, in order to boost sensitivity, these telescopes now need an extra area boost to make up for the drop in . Although an MFPT has cost, it also has , so that and the cost . Once is large enough to give sufficient resolution () it becomes smarter to simply build multiple telescopes, giving cost .
For comparison, we have indicated some sensitivity benchmarks as vertical lines. Equation (35) shows that ; this redshift scaling is illustrated by these vertical lines. Additional sensitivity can also be put to good use for probing smaller scales, since an order of magnitude change in corresponds to three orders of magnitude on the horizontal axis.
IV.3 21 cm foregrounds
Aside from its extreme sensitivity requirements, another unique feature of 21 cm cosmology is the magnitude of its foreground problem: it involves mapping a faint and diffuse cosmic signal that needs to be separated from foreground contamination that is many orders of magnitude brighter (20); (22); (52), requiring extreme sensitivity and beam control. Fortunately, the foreground emission (mainly synchrotron radiation) has a rather smooth frequency spectrum, while the cosmological signal varies rapidly with frequency (corresponding to variations in physical conditions along the line of sight). Early work on 21cm foregrounds (53); (54); (55) has indicated that this can be exploited to clean out the foregrounds down to an acceptable level, effectively by highpass filtering the data cube in the frequency direction.
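The idea of exploiting spectral smoothness can be sketched for a single line of sight. The toy example below assumes a pure power-law foreground that is tens of thousands of times brighter than a rapidly oscillating stand-in for the cosmological signal. It fits a straight line in log-frequency versus log-temperature and subtracts the fit. Real analyses typically use more flexible smooth functions, and the amplitudes, spectral index and band used here are illustrative assumptions rather than values from the cited papers.

```python
import math

def fit_power_law(freqs_mhz, temps_k):
    """Least-squares straight line in (log frequency, log temperature)."""
    xs = [math.log(f) for f in freqs_mhz]
    ys = [math.log(t) for t in temps_k]
    n = len(xs)
    x_mean, y_mean = sum(xs) / n, sum(ys) / n
    slope = (sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
             / sum((x - x_mean) ** 2 for x in xs))
    return slope, y_mean - slope * x_mean

def clean_line_of_sight(freqs_mhz, temps_k):
    """Subtract the fitted smooth spectrum, leaving the rapidly varying part."""
    slope, intercept = fit_power_law(freqs_mhz, temps_k)
    return [t - math.exp(intercept + slope * math.log(f))
            for f, t in zip(freqs_mhz, temps_k)]

def toy_spectrum():
    """Hypothetical band: 100 to 200 MHz in 0.5 MHz channels."""
    freqs = [100.0 + 0.5 * i for i in range(201)]
    foreground = [300.0 * (f / 150.0) ** -2.5 for f in freqs]   # smooth, bright
    signal = [0.01 * math.sin(2.0 * math.pi * (f - 100.0) / 5.0) for f in freqs]
    return freqs, foreground, signal

if __name__ == "__main__":
    freqs, foreground, signal = toy_spectrum()
    total = [a + b for a, b in zip(foreground, signal)]
    residual = clean_line_of_sight(freqs, total)
    worst = max(abs(r - s) for r, s in zip(residual, signal))
    print(f"largest deviation of residual from input signal: {worst:.4f} K")
```

In this idealized case the residual tracks the injected signal to within a fraction of its amplitude. The sketch deliberately ignores instrumental effects such as a frequency-dependent beam.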
However, these papers have generally not treated the additional complication that the synthesized beam is frequency dependent, dilating like , which means that raw sky maps at two different frequencies cannot be readily compared. For a single-dish telescope or an FFTT, the synthesized beam is compact and simple enough that this complication can be modeled and remedied exactly (say by convolving maps at all frequencies to have the same resolution before foreground cleaning), but for a standard interferometer, complicated low-level “frizz” extending far from the central parts of the synthesized beam appears to make this unfeasible at the present time. Recent work (56); (57); (58) has indicated that this is a serious problem: whereas the foreground emission from our own galaxy is smooth enough that these off-beam contributions average down to low levels, emission from other galaxies appears as point sources to which the telescope response varies rapidly with frequency because of the beam dilation effect. The ability to mitigate this problem is still subject to significant uncertainty (58), and may therefore limit the ultimate potential of 21 cm cosmology with a conventional interferometer. The ability to deal with foreground contamination is thus another valuable advantage of the FFT Telescope.
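The remedy of convolving all maps to a common resolution is easy to sketch when the beam is compact. The example below makes two simplifying assumptions that are not taken from the text: the beam is approximated as a Gaussian whose width is of order λ/D, and widths add in quadrature under convolution (which is exact only for Gaussians, whereas the true beam of a filled aperture is closer to an Airy pattern). The aperture size and frequencies are illustrative.

```python
import math

SPEED_OF_LIGHT_M_S = 299792458.0

def beam_width_rad(frequency_hz, aperture_m):
    """Order-of-magnitude beam width lambda / D, which dilates as 1 / frequency."""
    return SPEED_OF_LIGHT_M_S / frequency_hz / aperture_m

def matching_kernel_width(native_width, target_width):
    """Width of the Gaussian kernel that broadens native_width to target_width."""
    if target_width < native_width:
        raise ValueError("cannot sharpen a map by convolution")
    return math.sqrt(target_width ** 2 - native_width ** 2)

def kernels_for_band(frequencies_hz, aperture_m):
    """Kernel width per channel, matching every map to the lowest-frequency beam."""
    target = beam_width_rad(min(frequencies_hz), aperture_m)
    return [matching_kernel_width(beam_width_rad(f, aperture_m), target)
            for f in frequencies_hz]

if __name__ == "__main__":
    band = [120e6, 140e6, 160e6, 180e6]          # hypothetical channels
    for f, k in zip(band, kernels_for_band(band, aperture_m=1000.0)):
        print(f"{f / 1e6:5.0f} MHz: smooth with kernel of {math.degrees(k) * 60:.2f} arcmin")
```

The lowest-frequency channel needs no smoothing, and higher-frequency channels need progressively wider kernels. No comparably simple prescription exists when the beam has extended sidelobe structure, which is the difficulty described above for conventional interferometers.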
V Conclusions
We have presented a detailed analysis of an all-digital telescope design where mirrors are replaced by fast Fourier transforms, showing how it complements conventional telescope designs. The main advantages over a single-dish telescope are cost and orders of magnitude larger field of view, translating into dramatically better sensitivity for large-area surveys. The key advantages over traditional interferometers are cost (the correlator computational cost for an $N$-element array scales as $N\log_2 N$ rather than $N^2$) and a compact synthesized beam. These traits make the FFT Telescope ideal for applications where the angular resolution requirements are modest while those on sensitivity are extreme. We have argued that the emerging field of 21 cm tomography could provide an ideal first application of a very large FFT Telescope, since it could provide massive sensitivity improvements per dollar as well as mitigate the off-beam point source foreground problem with its clean beam.
V.1 Outstanding challenges
There are a number of interesting challenges and design questions that would need to be addressed before building a massive FFT Telescope for 21 cm cosmology. For example:

To what extent can the massive redundancy of an FFT Telescope (where the same baseline is typically measured by independent antenna pairs) be exploited to calibrate the antennas against one another in a computationally feasible way?

To what extent, if any, are more distant antennas outside the FFTT needed to resolve bright point sources and calibrate the FFTT antennas?

After calibration, how do gain fluctuations in the individual array elements affect the noise properties of the recovered sky map?

How do variations in the primary beam from equation (1) between individual antennas affect the properties of the recovered sky map?

How many layers of dummy antennas are needed around the active instrumented part of the array to ensure that the beam patterns of all utilized antennas are sufficiently identical?

What antenna design is optimal for a particular FFT Telescope science application, maximizing gain in the relevant frequency range? The limit of an infinite square grid of antennas on an infinite ground screen is quite different from the limit of a single isolated antenna, and modeling mutual coupling effects becomes crucial when computing the primary beam from equation (1).

What unforeseen challenges does the FFT Telescope entail, and how can they be overcome?

Can performing the first stages of the spatial FFT by analog means (say by connecting small blocks of adjacent antennas with Butler matrices (8)) lower the effective system temperature in parts of the sky with overall lower levels of synchrotron emission?
Answering these questions will require a combination of theoretical and experimental work. The authors are currently designing a small FFTT prototype with a group of radio astronomy colleagues to address these questions and to identify unforeseen obstacles.
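The redundancy mentioned in the first question can be made concrete with a small counting exercise. The sketch below, which assumes a hypothetical $n \times n$ square grid with unit spacing, counts by brute force how many ordered antenna pairs share each baseline and compares the result with the closed form $(n-|\Delta x|)(n-|\Delta y|)$. The function names and the grid size are illustrative assumptions.

```python
from collections import Counter
from itertools import product

def baseline_redundancy(n):
    """Count ordered antenna pairs per baseline on an n x n unit grid.

    A baseline (dx, dy) and its reverse (-dx, -dy) are counted separately.
    """
    antennas = list(product(range(n), repeat=2))
    counts = Counter()
    for (x1, y1) in antennas:
        for (x2, y2) in antennas:
            if (x1, y1) != (x2, y2):
                counts[(x2 - x1, y2 - y1)] += 1
    return counts

def expected_redundancy(n, dx, dy):
    """Closed-form number of ordered pairs separated by (dx, dy)."""
    return (n - abs(dx)) * (n - abs(dy))

n = 8  # hypothetical grid size, small enough for brute force
counts = baseline_redundancy(n)
print("antennas:", n * n)
print("ordered pairs:", sum(counts.values()))
print("distinct baselines:", len(counts))
print("pairs sharing the shortest baseline (1, 0):", counts[(1, 0)])
```

The number of distinct baselines grows only linearly with the number of antennas, whereas the number of pairs grows quadratically, so the typical redundancy per baseline grows in proportion to the number of antennas.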
V.2 Outlook
Looking further ahead, we would like to encourage theorists to think big and look into what additional physics may be learned from the sort of massive sensitivity gains that an FFTT could offer, as this can in turn increase the motivation for hard work on experimental challenges like those listed above.
Perhaps in a distant future, almost all telescopes will be FFT Telescopes, simultaneously observing light of all wavelengths from all directions. In the more immediate future, as Moore’s law enables FFTTs with higher bandwidth, cosmic microwave background polarization may be an interesting application besides 21 cm cosmology. By using an analog frequency mixer to extract of order a GHz of bandwidth in the CMB frequency range (around, say, 30 GHz or 100 GHz), it would be possible to obtain a much greater instantaneous sky coverage than current CMB experiments provide, and this gain in sky coverage could outweigh the disadvantage of lower bandwidth in equation (18) to provide overall better sensitivity. The fact that extremely high spectral resolution would be available essentially for free may also help ground-based measurements, allowing exploitation of the fact that some atmospheric lines are rather narrow.
Acknowledgements: The authors wish to thank Michiel Brentjens, Don Backer, Angelica de Oliveira-Costa, Ron Ekers, Jacqueline Hewitt, Mike Jones, Avi Loeb, Adrian Liu, Ue-Li Pen, Jeff Peterson, Miguel Morales, Daniel Mitchell, James Moran, Jonathan Rothberg, Irwin Shapiro, Richard Thompson and an anonymous referee for helpful comments, and Avi Loeb in particular for encouragement to finish this manuscript after a year of procrastination. This work was supported by NASA grants NAG5-11099 and NNG 05G40G, NSF grants AST-0134999 and AST-0506556, a grant from the John Templeton Foundation and fellowships from the David and Lucile Packard Foundation and the Research Corporation.
Appendix A Polarization issues
The Stokes matrix defined by equation (6) is related to the usual Stokes parameters by
(36) 
In the dot product,
(37) 
and
(38) 
contains the four Pauli matrices. As usual, $I$ denotes the total intensity, $Q$ and $U$ quantify the linear polarization and $V$ the circular polarization (which normally vanishes for astrophysical sources). It is easy to invert equation (36) to solve for the Stokes parameters:
(39) 
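For orientation, the relations described in this appendix can be sketched in one common convention. The symbols $\mathbf{S}$, $\mathbf{s}$ and $\boldsymbol{\sigma}$, the ordering of the Pauli matrices and the sign attached to $V$ are assumptions here; conventions differ between references, so this is one self-consistent choice rather than necessarily the one used in equations (36)-(39).

```latex
% Assumed convention: linear-polarization basis, \sigma_0 = identity.
\mathbf{S} = \frac{1}{2}
\begin{pmatrix} I + Q & U - iV \\ U + iV & I - Q \end{pmatrix}
= \frac{1}{2}\,\mathbf{s}\cdot\boldsymbol{\sigma},
\qquad
\mathbf{s} \equiv (I, Q, U, V),
\qquad
\boldsymbol{\sigma} \equiv
\left\{
\begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix},
\begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix},
\begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix},
\begin{pmatrix} 0 & -i \\ i & 0 \end{pmatrix}
\right\}.

% Inversion, using \mathrm{tr}(\sigma_j \sigma_k) = 2\delta_{jk}:
s_k = \mathrm{tr}\left(\sigma_k \mathbf{S}\right), \qquad k = 0, \ldots, 3.
```

The inversion follows directly from the orthogonality of the Pauli matrices under the trace, independently of the ordering or sign convention chosen.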
An annoying but harmless nuisance when dealing with large-area polarization maps is the well-known fact that “you can’t comb a sphere”, i.e., that there is no global choice of reference vector to define the Jones vector and the Stokes parameters all across the sky. In practice, this does not matter until the very last analysis step, since one can collect the data and reconstruct both and without worrying about this issue. To compute and solve for the Stokes parameters, any convention for defining the Stokes parameters will suffice, even one involving separate schemes for a number of partially overlapping sky patches; it is easy to see that the choice of convention has no effect on the accuracy or numerical stability of the inversion method.
Appendix B Cosmic geometry
In this Appendix, we derive equations (22) and (23). For a flat universe (which is an excellent approximation for ours (29); (30)), the comoving angular diameter distance is given by (59)
(40) 
where
(41) 
where . The second term in the square root becomes negligible for for (30), which gives equation (23). The dark energy density is completely negligible at the high redshift regime relevant to 21 cm cosmology also in most models where this density evolves with time. For such high redshifts, we can therefore approximate equation (40) as follows:
(42)  
for , which gives equation (22). The accuracy of equation (42) is better than 1% for , i.e., better than that with which the relevant cosmological parameters have currently been measured.
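The quality of this kind of high-redshift approximation can be checked numerically. The sketch below assumes a flat universe with a cosmological constant, $E(z) = \sqrt{\Omega_m(1+z)^3 + \Omega_\Lambda}$ with $\Omega_\Lambda = 1 - \Omega_m$, and an illustrative value $\Omega_m = 0.25$; distances are in units of $c/H_0$ so that no value of the Hubble constant needs to be assumed. The function names are hypothetical. Neglecting $\Omega_\Lambda$ at high redshift gives an approximation of the form "constant minus $2/\sqrt{\Omega_m(1+z)}$", which is compared with the numerically integrated distance.

```python
import math

def simpson(f, a, b, n=2000):
    """Composite Simpson's rule on [a, b] with an even number n of intervals."""
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3

def _integrand(omega_m):
    # Substituting u = (1+z)**-0.5 turns dz/E(z) into
    # 2 du / sqrt(omega_m + omega_lambda * u**6), which is smooth on [0, 1].
    omega_l = 1.0 - omega_m
    return lambda u: 2.0 / math.sqrt(omega_m + omega_l * u ** 6)

def comoving_distance(z, omega_m=0.25):
    """Comoving distance in units of c/H0 (flat universe, assumed omega_m)."""
    u_z = 1.0 / math.sqrt(1.0 + z)
    return simpson(_integrand(omega_m), u_z, 1.0)

def high_z_approximation(z, omega_m=0.25):
    """Distance to infinite redshift minus the matter-dominated tail."""
    d_infinity = simpson(_integrand(omega_m), 0.0, 1.0)
    return d_infinity - 2.0 / math.sqrt(omega_m * (1.0 + z))

for z in (1, 3, 7, 20):
    exact = comoving_distance(z)
    approx = high_z_approximation(z)
    print(f"z = {z:>2d}: exact = {exact:.4f}, approx = {approx:.4f}, "
          f"relative error = {abs(approx - exact) / exact:.2e}")
```

The change of variable is a design choice that avoids integrating out to very large redshift; the approximation error falls rapidly with increasing redshift because the neglected dark energy term is suppressed by $(1+z)^{-3}$ relative to the matter term.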
Appendix C Field-of-view estimates
In this appendix, we derive the restriction on the field of view for a single dish telescope. Consider a parabolic mirror of height given by:
(43) 
where and are the coordinates in the plane of the ground and determines the radius of curvature. The mirror has a diameter such that
(44) 
We consider radiation initially traveling with wave vector with . We will calculate the phase of the radiation that scatters at the location