The information content of cosmic microwave background anisotropies
Abstract
The cosmic microwave background (CMB) contains perturbations that are close to Gaussian and isotropic. This means that its information content, in the sense of the ability to constrain cosmological models, is closely related to the number of modes probed in CMB power spectra. Rather than making forecasts for specific experimental setups, here we take a more pedagogical approach and ask how much information we can extract from the CMB if we are only limited by sample variance. We show that, compared with temperature measurements, the addition of $E$-mode polarization doubles the number of modes available out to a fixed maximum multipole, provided that all of the $TT$, $TE$, and $EE$ power spectra are measured. However, the situation in terms of constraints on particular parameters is more complicated, as we explain and illustrate graphically. We also discuss the enhancements in information that can come from adding $B$-mode polarization and gravitational lensing. We show how well one could ever determine the basic cosmological parameters from CMB data compared with what has been achieved with Planck, which has already probed a substantial fraction of the information. Lastly, we look at constraints on neutrino mass as a specific example of how lensing information improves future prospects beyond the current 6-parameter model.
Douglas Scott (a, corresponding author), Dagoberto Contreras (a), Ali Narimani (a), Yin-Zhe Ma (b)
Prepared for submission to JCAP

(a) Department of Physics and Astronomy,
University of British Columbia,
Vancouver, BC, Canada V6T 1Z1
(b) Astrophysics and Cosmology Research Unit,
School of Chemistry and Physics,
University of KwaZulu-Natal,
Durban 4041, South Africa
Keywords: CMB theory – cosmological parameters from CMB
1 Introduction
Planck [1], the Wilkinson Microwave Anisotropy Probe (WMAP [2]), the Atacama Cosmology Telescope (ACT [3]), the South Pole Telescope (SPT [4]) and other cosmic microwave background (CMB) experiments have measured the CMB with high sensitivity, covering angular scales from essentially the whole sky down to the arcminute regime [5]. It is well known that the precision with which these measurements are made (together with an understanding of the physics generating the anisotropies) allows us to place very tight constraints on cosmological parameters. A large number of studies have focused on predicting how well the parameters can be constrained using existing or future CMB data [6, 7, 8, 9, 10, 11, 12, 13, 14, 15]. In this paper we want to take a rather more global view, and ask just how much constraining power there is to be mined, or in other words, how much cosmological information there is to extract from the CMB anisotropies. This will lead us to address questions like:

How is the overall information content related to the number of CMB modes measured?

What does polarization add to temperature information?

How do power spectrum measurements relate to constraints on parameters?

Is the information finite, and how far have we progressed towards the goal of measuring all that is available?
The basic 2015 Planck data set (including fairly conservative masking of the sky, as well as fitting of foreground signals) gives a temperature power spectrum that is measured to approximately and polarization-related power spectra that are measured to around [1]. We might therefore naively expect that there is about worth of constraints to be shared out among the cosmological parameters. However, a check of the Planck-derived constraints shows that this same data set yields a value for $\theta_{\rm MC}$ (a parameterization of the ratio of the sound horizon to the last-scattering surface distance) that is [16], which corresponds to an almost measurement of $\theta_{\rm MC}$. At the same time, the constraints on the other five parameters in the usual set give a quadrature sum of about (which is fairly negligible compared to the $\theta_{\rm MC}$ constraint).
The original motivation for this paper was to ask “how is it that an measurement of anisotropy power leads to a roughly combined constraint on cosmological parameters?” In attempting to answer this question, we hope to illuminate some issues concerning cosmological parameter constraints in general, and how they might relate to experimental design in the future.
In this paper we focus on the conventional cosmological-constant-dominated cold dark matter model, ΛCDM, and a standard 6-parameter set of cosmological parameters: $A_s$, the amplitude of the initial power spectrum; $n_s$, the power-law slope of the initial conditions; $\Omega_b h^2$, the baryonic density; $\Omega_c h^2$, the cold dark matter density; $\theta_{\rm MC}$, which we have already defined; and $\tau$, the optical depth to reionization. Here $h$ is the Hubble parameter today, $H_0$, in units of $100\,{\rm km\,s^{-1}\,Mpc^{-1}}$. Within this model we use the code CAMB [17] to calculate CMB power spectra.
2 CMB anisotropy information
The word “information” has many different meanings. Here, we use the word to mean the strength of our ability to constrain cosmologies. The total amount of information available for cosmological surveys is related to the number of observable modes [18, 19]. This has been described in many papers on measuring the 3-dimensional power spectrum in order to constrain cosmological parameters [20, 21, 22, 23, 24].
The situation for CMB temperature anisotropies is simpler, since it only involves assessing the information content of a purely 2-dimensional sky. On the other hand, as we shall see, the relationship between the modes and the constraints on cosmological parameters is non-trivial.
Let us start by recalling that the temperature field on the sky is usually expanded in terms of spherical harmonics, i.e.,
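$$ \frac{\Delta T}{T}(\hat{\mathbf n}) = \sum_{\ell=2}^{\infty}\sum_{m=-\ell}^{\ell} a_{\ell m}\, Y_{\ell m}(\hat{\mathbf n}), \tag{2.1} $$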
where we have removed the monopole (average CMB temperature) and dipole (which is dominated by our local velocity). Then, provided one goes to sufficiently high multipoles, and ignoring effects of beams and masking, one can use the set of $a_{\ell m}$s as an alternative representation of the pixels in the map. The power spectrum $C_\ell$ is the expectation value of the variance of the $a_{\ell m}$s as a function of $\ell$, with each $m$ being equivalent since there are no cosmologically preferred directions.
If the perturbations are Gaussian then each sky is a realization of this power spectrum. The scatter among these realizations is known as “cosmic variance.” The cosmic variance in estimates of the $C_\ell$s on the full sky is
$$ \Delta C_\ell = \sqrt{\frac{2}{2\ell+1}}\; C_\ell \tag{2.2} $$
(e.g., Ref. [25]). The factor of 2 here is because this is effectively the “variance of the variance,” and for a Gaussian distribution that is twice the square of the variance. The factor of $2\ell+1$ is the number of modes for each $\ell$, and if only a fraction $f_{\rm sky}$ of the sky is observed, then the approximate effect is to increase the uncertainty in $C_\ell$ by a factor of $f_{\rm sky}^{-1/2}$, so that the “sample variance” is larger [26]. A more realistic expression can also be written that includes the instrumental noise and beam, as described in Ref. [27].
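As a rough numerical illustration of Eq. (2.2) and of the partial-sky scaling, the Python sketch below evaluates the fractional uncertainty $\Delta C_\ell/C_\ell$. The $f_{\rm sky}^{-1/2}$ inflation is the approximate treatment mentioned above (mask-induced mode coupling is ignored), and the function name is ours, not from any library.

```python
import numpy as np

def fractional_sample_variance(ell, f_sky=1.0):
    """Approximate Delta C_ell / C_ell for a noise-free measurement.

    Full sky: sqrt(2 / (2*ell + 1)), as in Eq. (2.2). For partial sky
    coverage the error is inflated by 1/sqrt(f_sky), which is only the
    usual approximation (mode coupling from the mask is ignored).
    """
    ell = np.asarray(ell, dtype=float)
    return np.sqrt(2.0 / ((2.0 * ell + 1.0) * f_sky))

ells = np.array([2, 10, 100, 1000])
full_sky = fractional_sample_variance(ells)
half_sky = fractional_sample_variance(ells, f_sky=0.5)
print(full_sky)   # roughly 63%, 31%, 10% and 3%
print(half_sky)   # larger by a factor sqrt(2)
```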
For the simple case of an all-sky, noise-free experiment, which measures multipoles perfectly up to $\ell_{\max}$, the total square of the signal-to-noise ratio (using Eq. 2.2) is
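$$ \left(\frac{S}{N}\right)^2 = \sum_{\ell=2}^{\ell_{\max}} \left(\frac{C_\ell}{\Delta C_\ell}\right)^2 = \sum_{\ell=2}^{\ell_{\max}} \frac{2\ell+1}{2} = \frac{(\ell_{\max}+1)^2 - 4}{2} \simeq \frac{\ell_{\max}^2}{2}. \tag{2.3} $$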
Note that this is exactly half the total number of modes, i.e., half of $\sum_{\ell=2}^{\ell_{\max}}(2\ell+1) = (\ell_{\max}+1)^2 - 4$. This means that in terms of constraints on the power spectrum, each mode contributes $1/2$ to the square of the total signal-to-noise ratio (SNR). In other words, estimating the information in the power spectrum is effectively the same thing as counting modes.
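The equivalence between Eq. (2.3) and mode counting can be checked directly. In this minimal sketch, $\ell_{\max} = 2000$ is an arbitrary illustrative choice and the helper names are hypothetical.

```python
def snr_squared_one_spectrum(ell_max, ell_min=2):
    """(S/N)^2 for an all-sky, noise-free scaling parameter: sum of (2l+1)/2."""
    return sum((2 * ell + 1) / 2 for ell in range(ell_min, ell_max + 1))

def mode_count(ell_max, ell_min=2):
    """Number of a_lm modes with ell_min <= ell <= ell_max."""
    return sum(2 * ell + 1 for ell in range(ell_min, ell_max + 1))

ell_max = 2000                                 # illustrative value only
snr2 = snr_squared_one_spectrum(ell_max)
print(snr2, mode_count(ell_max) / 2)           # identical: half the modes
print(((ell_max + 1) ** 2 - 4) / 2)            # closed form of Eq. (2.3)
print(snr2 ** 0.5, ell_max / 2 ** 0.5)         # S/N is close to ell_max / sqrt(2)
```

Each term $(2\ell+1)/2$ is an exact half-integer, so the floating-point sum reproduces the closed form exactly at this size.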
To be clear, we are distinguishing here between trivial information that tells us about the particular realization of our Universe (which we can continue to measure as precisely as we wish) and the more valuable information we can extract from our Hubble patch, which gives us constraints on the background cosmological model. The fact that the CMB sky is remarkably close to Gaussian allows us to reduce the information contained in the individual $a_{\ell m}$s (or equivalently in the particular hot and cold spots on our sky) to estimates of the power at each $\ell$ (or equivalently the variance among pixels as a function of angular separation). For Gaussian skies, the amount of information in the power spectrum is directly proportional to the number of independent modes that can be measured.
Certainly one could regard all the data in a map as being “information,” i.e., the fact that there is a CMB hot spot in a particular direction is of some consequence, just as it matters that we live in the Milky Way galaxy, rather than M31. However, we discount these particulars about our realization, since they do not tell us about our overall cosmological model. So here, when we say “information,” we are referring to the constraining power for cosmological models, or more specifically the signal-to-noise ratio. Clearly there is a relationship between this “information” and the amount of computer memory required to store the related data, i.e., the number of bits needed. In a more formal information-theoretic sense, the number of bits required corresponds to the base-2 logarithm of what we are defining as information (e.g., Ref. [28]), but here we are talking about the signal-to-noise ratio for power spectra, since that is where parameter constraints come from.
For the case of the CMB, each $a_{\ell m}$ is a random number drawn from a Gaussian distribution with mean zero and variance $C_\ell$. To estimate $C_\ell$ from a set of observed $a_{\ell m}$s, it is sufficient to have only a very few bits of information for each $a_{\ell m}$, since we only need each one to help us obtain an estimate of the variance. Hence the amount of cosmological information (i.e., the ability to eventually constrain parameters) is determined by the number of measurable modes times a (roughly) constant, but parameterization-dependent, numerical factor. However, for a particular parameter it may be that some modes are more important than others; to understand how the information in the power spectra maps onto the information about the cosmological model, we can perform a more rigorous calculation by considering the Fisher matrix, as we do in the following sections.
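A quick Monte Carlo makes the “variance of the variance” concrete. This sketch assumes a real-valued sky, so each multipole carries $2\ell+1$ real degrees of freedom. It draws them as independent $N(0, C_\ell)$ variables (a simplified stand-in for the complex $a_{\ell m}$), estimates $C_\ell$ as their mean square, and compares the scatter with Eq. (2.2). The values $\ell = 50$ and $C_\ell = 1$ are arbitrary.

```python
import numpy as np

def simulate_cl_estimates(cl_true, ell, n_realizations, rng):
    """Estimate C_ell from simulated Gaussian skies at a single multipole.

    A real-valued sky at multipole ell has 2*ell + 1 real degrees of
    freedom; here each is drawn as an independent N(0, C_ell) variable
    (a simplified stand-in for the complex a_lm), and C_ell is estimated
    as their mean square.
    """
    n_modes = 2 * ell + 1
    a = rng.normal(0.0, np.sqrt(cl_true), size=(n_realizations, n_modes))
    return np.mean(a**2, axis=1)

rng = np.random.default_rng(42)
ell, cl_true = 50, 1.0
estimates = simulate_cl_estimates(cl_true, ell, 20000, rng)

# The mean recovers C_ell; the scatter matches sqrt(2 / (2 ell + 1)) C_ell.
print(estimates.mean(), estimates.std(), np.sqrt(2.0 / (2 * ell + 1)))
```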
3 CMB Fisher information
The previous section considered CMB temperature anisotropies only, but since the CMB sky can be linearly polarized, there is additional information in each pixel of a CMB map. If an experiment can measure the $Q$ and $U$ Stokes parameters (in addition to $T$), then in principle there are two additional pieces of information for each pixel. The most useful way of describing these additional degrees of freedom is through a geometrical approach, defining a divergence-like combination, usually called “$E$,” and a curl-like combination, usually called “$B$” [29, 30, 31]. The $T$, $E$, and $B$ fields can be used to determine auto- and cross-power spectra, and with parity considerations (at least for cosmological signals) making $B$ uncorrelated with $T$ and $E$, we are left with four CMB power spectra from which we can constrain parameters, namely $C_\ell^{TT}$, $C_\ell^{EE}$, $C_\ell^{BB}$, and $C_\ell^{TE}$.
The $T$- and $E$-mode maps have now been well measured [32, 33, 34, 35, 36, 37, 38], but estimates of $B$ modes are still in their infancy [39]. Moreover, even when primordial $B$ modes become detectable, we expect them to be small, and hence one will need an experiment with an entirely different sensitivity range to probe the information contained in those modes. For these reasons we will neglect $B$ modes in most of the discussion of this paper. However, the $B$ modes caused by the effects of gravitational lensing have now been detected by several experiments [40, 41, 42, 43, 38, 44], and we will discuss this later in Section 4.
Focusing only on $T$ and $E$, an important fact is that the two fields on the sky are not independent, but are correlated, and this correlation can be measured through the cross-power spectrum $C_\ell^{TE}$. Because of this, it may not be obvious how much additional information is provided by measurements of CMB polarization: does a measurement of $T$ and $E$ provide twice as much information as either of them alone? We will answer this question in the following subsections.
3.1 The Fisher matrix
The Fisher matrix gives a powerful formalism for describing the information content coming from observables in terms of underlying parameters (see e.g., Ref. [45]). Under the assumption of Gaussian perturbations and with negligible instrumental noise, the Fisher information matrix for CMB temperature and polarization anisotropies is [46]
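$$ F_{ij} = \sum_{\ell} \sum_{X,Y} \frac{\partial C_\ell^X}{\partial \theta_i} \left[\mathrm{Cov}_\ell^{-1}\right]_{XY} \frac{\partial C_\ell^Y}{\partial \theta_j}, \tag{3.1} $$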
where $C_\ell^X$ and $C_\ell^Y$ are the power in the $\ell$th multipole for $X, Y = TT$, $EE$, or $TE$ (temperature, $E$-mode polarization, and their correlation, respectively), and the $\theta_i$ are cosmological parameters. Here we are ignoring the $B$ modes, as already explained; however, in principle one could easily extend Eq. (3.1) to include them.
We now define the vector $\mathbf{C}_\ell$ as
$$ \mathbf{C}_\ell = \left(C_\ell^{TT},\, C_\ell^{EE},\, C_\ell^{TE}\right)^{\mathsf T}, \tag{3.2} $$
and $\mathbf{Cov}_\ell$ as the covariance matrix
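$$ \mathbf{Cov}_\ell = \begin{pmatrix} \mathrm{Cov}^{TT,TT} & \mathrm{Cov}^{TT,EE} & \mathrm{Cov}^{TT,TE} \\ \mathrm{Cov}^{TT,EE} & \mathrm{Cov}^{EE,EE} & \mathrm{Cov}^{EE,TE} \\ \mathrm{Cov}^{TT,TE} & \mathrm{Cov}^{EE,TE} & \mathrm{Cov}^{TE,TE} \end{pmatrix}_\ell . \tag{3.3} $$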
We can then write Eq. (3.1) as a matrix product,
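$$ F_{ij} = \sum_\ell \left(\frac{\partial \mathbf{C}_\ell}{\partial\theta_i}\right)^{\mathsf T} \mathbf{Cov}_\ell^{-1}\, \frac{\partial \mathbf{C}_\ell}{\partial\theta_j}, \tag{3.4} $$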
where the entries for the noise-free case are
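$$ \begin{aligned}
\mathrm{Cov}^{TT,TT} &= \frac{2}{2\ell+1}\left(C_\ell^{TT}\right)^2, &
\mathrm{Cov}^{EE,EE} &= \frac{2}{2\ell+1}\left(C_\ell^{EE}\right)^2, \\
\mathrm{Cov}^{TT,EE} &= \frac{2}{2\ell+1}\left(C_\ell^{TE}\right)^2, &
\mathrm{Cov}^{TT,TE} &= \frac{2}{2\ell+1}\,C_\ell^{TT}C_\ell^{TE}, \\
\mathrm{Cov}^{EE,TE} &= \frac{2}{2\ell+1}\,C_\ell^{EE}C_\ell^{TE}, &
\mathrm{Cov}^{TE,TE} &= \frac{1}{2\ell+1}\left[\left(C_\ell^{TE}\right)^2 + C_\ell^{TT}C_\ell^{EE}\right],
\end{aligned} \tag{3.5} $$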
following Ref. [46]. For convenience we will also define $r_\ell \equiv C_\ell^{TE}/\sqrt{C_\ell^{TT} C_\ell^{EE}}$, which is the correlation coefficient between $T$ and $E$ (see appendix A6 in Ref. [36]).
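The following Python sketch builds the noise-free covariance of Eq. (3.5) at a single multipole and evaluates that multipole's contribution to the Fisher matrix, Eq. (3.4). The spectra are made-up numbers, not a ΛCDM model, and the function names are hypothetical. For a pure scaling parameter ($\partial\mathbf{C}_\ell/\partial\ln\theta = \mathbf{C}_\ell$) the result should be $2\ell+1$ whatever the correlation, which is a convenient sanity check.

```python
import numpy as np

def cov_tt_ee_te(ell, ctt, cee, cte):
    """Noise-free, full-sky covariance of (C^TT, C^EE, C^TE) at one ell (Eq. 3.5)."""
    pref = 2.0 / (2 * ell + 1)
    return pref * np.array([
        [ctt**2,    cte**2,    ctt * cte],
        [cte**2,    cee**2,    cee * cte],
        [ctt * cte, cee * cte, 0.5 * (cte**2 + ctt * cee)],
    ])

def fisher_one_ell(ell, cl, dcl_i, dcl_j=None):
    """Contribution of one multipole to F_ij, i.e. dC_i^T Cov^{-1} dC_j (Eq. 3.4)."""
    if dcl_j is None:
        dcl_j = dcl_i
    cov = cov_tt_ee_te(ell, *cl)
    return float(np.asarray(dcl_i) @ np.linalg.solve(cov, np.asarray(dcl_j)))

# Made-up spectra at one multipole, with correlation coefficient r = 0.5.
ell, ctt, cee, r = 500, 1.0, 0.04, 0.5
cl = np.array([ctt, cee, r * np.sqrt(ctt * cee)])

# For a pure scaling parameter dC/dln(theta) = C, so this is (S/N)^2 at this ell.
print(fisher_one_ell(ell, cl, cl), 2 * ell + 1)
```

Using `np.linalg.solve` rather than forming the explicit inverse is the usual numerically safer choice.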
The Cramér–Rao bound states that the statistical uncertainties satisfy $\sigma(\theta_i) \geq \sqrt{(F^{-1})_{ii}}$; this gives the smallest possible errors achievable (for a single parameter, $\sigma(\theta) \geq F_{\theta\theta}^{-1/2}$). Now we will consider the simplest situation, where there is only one cosmological parameter to determine. This scenario, though simple, will explain the effects of including polarization along with temperature data, given that the maps are correlated. In Section 5 we will consider larger parameter sets, which will explain the effects of correlations between parameters.
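The difference between the single-parameter bound used in this section and the marginalized bound relevant for Section 5 can be seen with a toy $2\times2$ Fisher matrix. The numbers below are invented purely to show a degeneracy.

```python
import numpy as np

def cramer_rao_errors(fisher):
    """Return (marginalized, conditional) 1-sigma Cramer-Rao bounds.

    marginalized: sqrt of the diagonal of F^{-1} (other parameters free);
    conditional:  1/sqrt(F_ii) (all other parameters held fixed).
    """
    fisher = np.asarray(fisher, dtype=float)
    marginalized = np.sqrt(np.diag(np.linalg.inv(fisher)))
    conditional = 1.0 / np.sqrt(np.diag(fisher))
    return marginalized, conditional

# Toy two-parameter Fisher matrix with a strong correlation (made-up numbers).
F = np.array([[4.0, 1.8],
              [1.8, 1.0]])
marg, cond = cramer_rao_errors(F)
print("marginalized:", marg)   # larger, because of the degeneracy
print("conditional: ", cond)   # [0.5, 1.0]
```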
3.2 TT only
Let us first consider the case in which only the CMB temperature is mapped, i.e., we have not measured $C_\ell^{EE}$ and $C_\ell^{TE}$ in this scenario. If we focus on a single parameter $\theta$, then the SNR can be written as
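$$ \left(\frac{S}{N}\right)^2 = \frac{\theta^2}{\sigma^2(\theta)} = \theta^2 F_{\theta\theta} = \sum_\ell \theta^2 \left(\frac{\partial \mathbf{C}_\ell}{\partial\theta}\right)^{\mathsf T} \mathbf{Cov}_\ell^{-1}\, \frac{\partial \mathbf{C}_\ell}{\partial\theta}, \tag{3.6} $$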
where we sum over all the multipoles $\ell$. For the temperature power spectrum alone, the covariance matrix has a single entry, $\mathrm{Cov}^{TT,TT}$, and we can write the squared SNR at each multipole as
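$$ \left(\frac{S}{N}\right)^2_\ell = \theta^2\, \frac{\left(\partial C_\ell^{TT}/\partial\theta\right)^2}{\mathrm{Cov}^{TT,TT}} = \frac{2\ell+1}{2}\left(\frac{\partial \ln C_\ell^{TT}}{\partial \ln\theta}\right)^2. \tag{3.7} $$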
If we have a simple scaling parameter (similar to $A_s$), then $C_\ell^{TT} \propto \theta$ and the logarithmic derivative is unity, so we just have
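$$ \left(\frac{S}{N}\right)^2_\ell = \frac{2\ell+1}{2}, \tag{3.8} $$
$$ \left(\frac{S}{N}\right)^2 = \sum_{\ell=2}^{\ell_{\max}} \frac{2\ell+1}{2} \simeq \frac{\ell_{\max}^2}{2}. \tag{3.9} $$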
Hence we see that in the simple case of a single parameter that is proportional to the amplitude of the power spectrum, the constraining power on this parameter from the power spectrum is the same as the mode counting presented in the previous section.
Things are more complicated when we consider general parameters. Figure 1 shows the squared SNR per multipole for the six parameters of the standard ΛCDM model following Eq. (3.7), i.e., with all other parameters held fixed.¹ The red curves in each panel show the $TT$-only case. We can see many details from these curves, illustrating how different multipole ranges affect constraints on each parameter.
¹ Gaussianity is not a valid assumption for the low multipoles, and hence the low-$\ell$ part of the plots should be taken as approximate estimates only.
Variations due to the parameter $A_s$ are similar to those we have described for a “scaling parameter,” but not quite the same. If we looked at the $TT$ spectrum without the effects of gravitational lensing, then we would find that the spectrum scales exactly like $A_s$, and hence this is the scaling parameter we referred to in Eq. (3.7). Since for Fig. 1 all of the parameters are held fixed except each one individually, both $A_s$ and $\tau$ serve effectively as scaling parameters in the sense of Eq. (3.7). Because of this, the curve for $\tau$ looks like the one for $A_s$, multiplied by $4\tau^2$ (since $C_\ell^{TT} \propto e^{-2\tau}$ above the reionization scale), except for an extra variation at the lowest $\ell$s. When we include lensing in the usual way (as we have done in Fig. 1) then there are also small wiggles in the $A_s$ curve, which come from the smoothing effect of lensing on the peaks and troughs.
For the slope, $n_s$, we see the effect of the “pivot” point at which the initial power spectrum is normalized, which projects to a particular multipole for $TT$; at multipoles around this point there is no constraint on $n_s$. The variations for most of the other parameters reflect the structure of the $C_\ell$s themselves. For example, for $\theta_{\rm MC}$ the curve goes to zero near the positions of peaks and troughs (because the gradient of $C_\ell$ is zero there).
The $A_s$ panel in Figure 1 can serve as a guideline for inferring the sensitivity of $C_\ell$ to a particular parameter. One can approximately describe the dependence of $C_\ell$ on a generic parameter $\theta$ by a power law, $C_\ell \propto \theta^n$, with $n = 1$ then representing linear dependence; Eq. (3.7) then gives roughly $n^2$ times the $A_s$ curve. We might speak of “less than linear” and “non-linear” as corresponding to $|n| < 1$ and $|n| > 1$, respectively. Comparing each of the panels of Figure 1 with the $A_s$ panel shows that the set all have a less than linear relation with the $C_\ell$s at , while exhibits a non-linear relationship. The parameter $\theta_{\rm MC}$ is non-linear over almost the entire multipole range (note the different axis range for this panel). In the high-$\ell$ region, , all of the parameters (except ) show a mildly less than linear relation, with oscillations close to zero at some multipoles (and this is even true for ), while has a close to linear relation at .
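The power-law picture can be made quantitative with a finite-difference logarithmic derivative. This is a sketch: `cl_of_theta` is a hypothetical stand-in for any function returning a spectrum value, e.g. a call into a Boltzmann code, which we do not reproduce here. For $C \propto \theta^n$ the derivative is exactly $n$, and Eq. (3.7) then scales the per-multipole SNR by $n^2$.

```python
import numpy as np

def log_derivative(cl_of_theta, theta, rel_step=1e-4):
    """Central finite-difference estimate of d ln C / d ln theta."""
    up = cl_of_theta(theta * (1.0 + rel_step))
    down = cl_of_theta(theta * (1.0 - rel_step))
    return (np.log(up) - np.log(down)) / (np.log1p(rel_step) - np.log1p(-rel_step))

def snr2_per_ell(ell, dlncl):
    """Eq. (3.7): (2 ell + 1)/2 times the squared logarithmic derivative."""
    return (2 * ell + 1) / 2 * dlncl**2

for n in (0.5, 1.0, 4.0):                 # less than linear, linear, non-linear
    d = log_derivative(lambda t: 3.0 * t**n, 2.0)
    print(n, d, snr2_per_ell(100, d) / snr2_per_ell(100, 1.0))
```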
We will discuss the blue and green ($EE$ and $TE$) curves in the following subsections.
3.3 EE only
Now consider the situation where only the $E$ modes are mapped, and hence we only have access to the $C_\ell^{EE}$ power spectrum for constraining cosmology. Here the situation is exactly the same as it was for the $TT$-only case.
We have the squared SNR for a single parameter being
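$$ \left(\frac{S}{N}\right)^2_\ell = \frac{2\ell+1}{2}\left(\frac{\partial \ln C_\ell^{EE}}{\partial \ln\theta}\right)^2, \tag{3.10} $$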
and the total for a scaling parameter is
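$$ \left(\frac{S}{N}\right)^2 = \sum_{\ell=2}^{\ell_{\max}} \frac{2\ell+1}{2} \simeq \frac{\ell_{\max}^2}{2}. \tag{3.11} $$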
Again, the result is just what we expect from mode counting. If we have polarization data out to some $\ell_{\max}$, then it provides the same constraints on a scaling parameter as having temperature data out to the same $\ell_{\max}$.
The situation for more general parameters is presented by the blue curves in the panels of Figure 1. One can see several effects that are similar to the $TT$ case. The situation for $A_s$ and $\tau$ is essentially the same as for the $TT$ case, with a dramatic improvement for $\tau$ at low $\ell$ because of the sensitivity of large-scale polarization to reionization. The constraining power for $n_s$ also has a zero in the $EE$ case, but the pivot projects to a slightly different $\ell$ (reflecting the slightly different scales at which polarization is sourced compared to temperature). For the parameters $\Omega_b h^2$, $\Omega_c h^2$, and $\theta_{\rm MC}$ we see that the constraining power from $EE$ is generally higher than that from $TT$ (as recently pointed out in Ref. [15] and explained in the next section). This illustrates the improved parameter constraints from polarization, essentially because of the sharper acoustic features in polarization.
3.4 TE correlation only
Now let us examine the case where we measure the $C_\ell^{TE}$ power spectrum only. This is clearly not a realistic situation, but evaluating it will elucidate some interesting points. The Fisher matrix again has a single entry, coming from the term $\mathrm{Cov}^{TE,TE}$, and for a single parameter $\theta$ we obtain
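$$ \left(\frac{S}{N}\right)^2_\ell = \theta^2\, \frac{\left(\partial C_\ell^{TE}/\partial\theta\right)^2}{\mathrm{Cov}^{TE,TE}} = (2\ell+1)\,\frac{r_\ell^2}{1+r_\ell^2}\left(\frac{\partial \ln C_\ell^{TE}}{\partial \ln\theta}\right)^2. \tag{3.12} $$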
As before, $r_\ell$ is the correlation coefficient between $T$ and $E$, which is plotted in Figure 2. For a scaling parameter we can simply replace the logarithmic derivative with unity.
For a general parameter the situation can differ from what we saw for $TT$ and $EE$. In the $TE$ power spectrum, the amount of correlation and (possibly surprisingly) how it changes under the influence of $\theta$ are important. To see this we can rewrite the square of the SNR of $TE$ in terms of $C_\ell^{TT}$, $C_\ell^{EE}$, and $r_\ell$ as
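$$ \left(\frac{S}{N}\right)^2_\ell = \frac{2\ell+1}{1+r_\ell^2}\left[\frac{r_\ell}{2}\left(\frac{\partial \ln C_\ell^{TT}}{\partial \ln\theta} + \frac{\partial \ln C_\ell^{EE}}{\partial \ln\theta}\right) + \frac{\partial r_\ell}{\partial \ln\theta}\right]^2. \tag{3.13} $$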
We can then look at the two limiting cases $r_\ell \to 1$ and $r_\ell \to 0$:
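$$ \left(\frac{S}{N}\right)^2_\ell \to \frac{2\ell+1}{2}\left[\frac{1}{2}\left(\frac{\partial \ln C_\ell^{TT}}{\partial \ln\theta} + \frac{\partial \ln C_\ell^{EE}}{\partial \ln\theta}\right)\right]^2 \quad \left(r_\ell \to 1,\ \partial r_\ell/\partial\ln\theta \to 0\right), \tag{3.14} $$
$$ \left(\frac{S}{N}\right)^2_\ell \to (2\ell+1)\left(\frac{\partial r_\ell}{\partial \ln\theta}\right)^2 \quad \left(r_\ell \to 0\right). \tag{3.15} $$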
The first thing to notice is that in the limit of full correlation the information in $TE$ is directly given by the information content of $TT$ and $EE$, as expected. However, we also see that even in the case of a vanishing $r_\ell$, there is still information to be obtained from measuring $TE$ (the SNR does not vanish as $r_\ell \to 0$). How much depends on the behaviour of the parameter, and specifically on how $r_\ell$ varies as the parameter changes. For a scaling parameter $r_\ell$ does not change, so the information content of $TE$ vanishes when $r_\ell$ vanishes, but this is not true in general.
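The $r_\ell \to 0$ behaviour is easy to check numerically. This sketch evaluates the $TE$-only expression obtained by writing $C_\ell^{TE} = r_\ell\,(C_\ell^{TT} C_\ell^{EE})^{1/2}$ in Eq. (3.12), and cross-checks it against a direct evaluation of $(\partial C_\ell^{TE}/\partial\ln\theta)^2/\mathrm{Cov}^{TE,TE}$. The derivative values passed in are arbitrary illustrative numbers, not ΛCDM derivatives.

```python
import numpy as np

def snr2_te_only(ell, r, dln_ctt, dln_cee, dr):
    """Per-ell (S/N)^2 from C^TE alone for a single parameter theta.

    Uses C^TE = r * sqrt(C^TT C^EE) in the TE-only Fisher term; all
    derivatives are taken with respect to ln(theta).
    """
    return (2 * ell + 1) / (1 + r**2) * (0.5 * r * (dln_ctt + dln_cee) + dr) ** 2

def snr2_te_direct(ell, ctt, cee, r, dln_ctt, dln_cee, dr):
    """Same quantity from (dC^TE/dln theta)^2 / Cov^{TE,TE}, with Cov from Eq. (3.5)."""
    cte = r * np.sqrt(ctt * cee)
    dcte = 0.5 * cte * (dln_ctt + dln_cee) + dr * np.sqrt(ctt * cee)
    cov_te_te = (cte**2 + ctt * cee) / (2 * ell + 1)
    return dcte**2 / cov_te_te

ell = 300
print(snr2_te_only(ell, 0.0, 1.0, 1.0, 0.0))   # scaling parameter, r = 0: no information
print(snr2_te_only(ell, 0.0, 0.0, 0.0, 0.2))   # r = 0, but r changes with theta
```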
Details of the squared SNR per multipole are shown in Figure 1, with the green curves being for the $TE$ case. Once again, the $A_s$ panel is helpful here, because we can see that the dramatic drops in the green curve correspond to the points with $r_\ell = 0$. Checking the value of the green curve at these locations in the other panels reveals that one gains information by measuring $TE$ even where $r_\ell = 0$ at a specific angular scale.
The $TE$ curves are usually lower than the $TT$ and $EE$ curves, simply because the $T$- and $E$-mode fields are only partially correlated. This is manifested particularly strongly in the $\tau$ panel of Figure 1, where it is clear that $TE$ is much less sensitive to the reionization bump than $EE$. However, we see that $TE$ can be higher in some multipole ranges for some parameters, particularly for and . For example, is more sensitive to than in the range and more sensitive than in the range . In fact Ref. [15] already pointed out that can constrain better than by around – this would be hard to determine directly from Figure 1, since the figure does not account for correlations among parameters (although this is something we do consider in Section 5).
3.5 TT and EE power spectra, no correlation
We would like to understand the basic way that polarization information combines with temperature information. So let us now consider the simple (although hypothetical) situation in which we have mapped out $E$-mode polarization, but this polarization is uncorrelated with the temperature anisotropies. Under these conditions the covariance matrix of Eq. (3.3) becomes diagonal. In this case it is easy to verify that the two spectra contribute independently, so that for a scaling parameter the information is exactly doubled compared with the temperature-only case:
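$$ \left(\frac{S}{N}\right)^2_\ell = \frac{2\ell+1}{2}\left[\left(\frac{\partial \ln C_\ell^{TT}}{\partial \ln\theta}\right)^2 + \left(\frac{\partial \ln C_\ell^{EE}}{\partial \ln\theta}\right)^2\right]. \tag{3.16} $$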
Therefore for a scaling parameter the total SNR is
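$$ \left(\frac{S}{N}\right)^2 = \sum_{\ell=2}^{\ell_{\max}} (2\ell+1) = (\ell_{\max}+1)^2 - 4 \simeq \ell_{\max}^2, \tag{3.17} $$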
i.e., we obtain a factor of $\sqrt{2}$ improvement in the SNR over the temperature-only case. This makes sense, because uncorrelated $E$-mode polarization adds precisely one additional piece of information for every pixel on the sky, or equivalently, adds an independent set of $E$ modes to the $T$ modes.
3.6 TT and EE correlated, but ignoring TE
The situation in the previous subsection is of course not realistic, because in reality there is a $TE$ correlation in the CMB anisotropies, and hence there are three distinct power spectra to determine, $C_\ell^{TT}$, $C_\ell^{EE}$, and $C_\ell^{TE}$. So how does this affect the total information content?
To see how this works, let us first imagine that although the temperature and polarization fields are correlated, we have not measured this correlation (an impractical scenario in which the cosmologist has carelessly ignored $C_\ell^{TE}$). We will again treat the simple case of a scaling parameter. If we just use the $2\times2$ matrix in the upper left part of Eq. (3.3), we find
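$$ \left(\frac{S}{N}\right)^2_\ell = \left(C_\ell^{TT},\, C_\ell^{EE}\right) \begin{pmatrix} \mathrm{Cov}^{TT,TT} & \mathrm{Cov}^{TT,EE} \\ \mathrm{Cov}^{TT,EE} & \mathrm{Cov}^{EE,EE} \end{pmatrix}^{-1} \begin{pmatrix} C_\ell^{TT} \\ C_\ell^{EE} \end{pmatrix} = \frac{2\ell+1}{1+r_\ell^2} \equiv (2\ell+1)\,\gamma_\ell . \tag{3.18} $$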
Here we have defined
$$ \gamma_\ell \equiv \frac{1}{1+r_\ell^2}, \tag{3.19} $$
giving the ratio between this case and the (previously considered) case where $T$ and $E$ are uncorrelated. Since $r_\ell^2 \leq 1$, this ratio always lies between 1/2 and 1. The lower limit of 1/2 (equivalent to $|r_\ell| = 1$) corresponds to a perfect correlation between temperature and polarization and is therefore the same as the temperature-only case. On the other hand, a ratio of unity (corresponding to $r_\ell = 0$) is obtained when $T$ and $E$ are uncorrelated.
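The penalty for ignoring $C_\ell^{TE}$ can be verified directly by inverting only the $TT$/$EE$ block of the covariance. The spectra in this sketch are made-up values, and the ratio to the uncorrelated result should follow $1/(1+r_\ell^2)$.

```python
import numpy as np

def fisher_tt_ee_only(ell, ctt, cee, cte):
    """(S/N)^2 at one ell for a scaling parameter, using only the TT/EE block
    of the covariance (i.e. C^TE exists in the sky but is not used)."""
    pref = 2.0 / (2 * ell + 1)
    cov = pref * np.array([[ctt**2, cte**2],
                           [cte**2, cee**2]])
    d = np.array([ctt, cee])        # dC/dln(theta) = C for a scaling parameter
    return float(d @ np.linalg.solve(cov, d))

ell, ctt, cee = 1000, 1.0, 0.05     # made-up spectra
for r in (0.0, 0.3, 0.6):
    cte = r * np.sqrt(ctt * cee)
    ratio = fisher_tt_ee_only(ell, ctt, cee, cte) / (2 * ell + 1)
    print(f"r = {r:.1f}: ratio = {ratio:.4f}, 1/(1 + r^2) = {1 / (1 + r**2):.4f}")
```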
The signal-to-noise ratio for our hypothetical scaling parameter when $C_\ell^{TE}$ is unmeasured is then
$$ \left(\frac{S}{N}\right)^2 = \sum_{\ell=2}^{\ell_{\max}} \frac{2\ell+1}{1+r_\ell^2}. \tag{3.20} $$
For the standard ΛCDM cosmology, we show the correlation coefficient $r_\ell$ in Figure 2, illustrating that for some multipoles the magnitude of the correlation between $T$ and $E$ can be as high as 60%. We also plot the corresponding ratio as a function of $\ell$; this effectively shows the amount of information that would be lost by neglecting the cross-correlation power spectrum. The plot shows that this can be as much as 20%, but for high $\ell$ is a little under 10%.
3.7 Full TT, EE, and TE
Now let us consider the case when $C_\ell^{TT}$, $C_\ell^{EE}$, and $C_\ell^{TE}$ are all measured (for a simple scaling parameter first). We then need to consider the full covariance matrix when calculating the Fisher information, and we recover
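$$ \left(\frac{S}{N}\right)^2 = \sum_{\ell=2}^{\ell_{\max}} (2\ell+1) = (\ell_{\max}+1)^2 - 4 \simeq \ell_{\max}^2, \tag{3.21} $$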
i.e., the same as in the situation where $T$ and $E$ are assumed (unrealistically) to be uncorrelated.
This can be seen explicitly by inverting Eq. (3.3):
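$$ \mathbf{Cov}_\ell^{-1} = \frac{2\ell+1}{2\,p_\ell^4\,(1-r_\ell^2)^2 \left(C_\ell^{TT}\right)^2} \begin{pmatrix} p_\ell^4 & r_\ell^2 p_\ell^2 & -2 r_\ell p_\ell^3 \\ r_\ell^2 p_\ell^2 & 1 & -2 r_\ell p_\ell \\ -2 r_\ell p_\ell^3 & -2 r_\ell p_\ell & 2 p_\ell^2 (1+r_\ell^2) \end{pmatrix}, \tag{3.22} $$
where the quantity $p_\ell$ has been introduced as the ratio of the polarization to temperature anisotropy, and the spectra have been rewritten in terms of $p_\ell$, $r_\ell$, and $C_\ell^{TT}$ to simplify the expression above, specifically with
$$ p_\ell \equiv \sqrt{C_\ell^{EE}/C_\ell^{TT}}, \tag{3.23} $$
$$ C_\ell^{EE} = p_\ell^2\, C_\ell^{TT}, \tag{3.24} $$
$$ C_\ell^{TE} = r_\ell\, p_\ell\, C_\ell^{TT}, \tag{3.25} $$
$$ C_\ell^{TT} C_\ell^{EE} - \left(C_\ell^{TE}\right)^2 = p_\ell^2\,(1-r_\ell^2) \left(C_\ell^{TT}\right)^2. \tag{3.26} $$
The ratio $p_\ell$ is also plotted in Figure 2, showing that polarization anisotropies are only a few percent of temperature anisotropies at large angular scales, and asymptote to a value close to 20% at higher multipoles.
Following the algebra introduced above, and using $\partial\mathbf{C}_\ell/\partial\ln\theta = \mathbf{C}_\ell = C_\ell^{TT}\,(1,\, p_\ell^2,\, r_\ell p_\ell)^{\mathsf T}$ for a scaling parameter, the Fisher matrix becomes
$$ \left(\frac{S}{N}\right)^2 = \sum_\ell \left(\frac{\partial \mathbf{C}_\ell}{\partial \ln\theta}\right)^{\mathsf T} \mathbf{Cov}_\ell^{-1}\, \frac{\partial \mathbf{C}_\ell}{\partial \ln\theta} \tag{3.27} $$
$$ = \sum_\ell \left(C_\ell^{TT}\right)^2 \left(1,\, p_\ell^2,\, r_\ell p_\ell\right) \mathbf{Cov}_\ell^{-1} \left(1,\, p_\ell^2,\, r_\ell p_\ell\right)^{\mathsf T} \tag{3.28} $$
$$ = \sum_\ell \frac{2\ell+1}{2\,p_\ell^4 (1-r_\ell^2)^2}\left[2p_\ell^4 + 2r_\ell^2 p_\ell^4 + 2r_\ell^2 p_\ell^4 (1+r_\ell^2) - 8 r_\ell^2 p_\ell^4\right] \tag{3.29} $$
$$ = \sum_\ell \frac{2\ell+1}{2\,p_\ell^4 (1-r_\ell^2)^2}\; 2 p_\ell^4 (1-r_\ell^2)^2 \tag{3.30} $$
$$ = \sum_{\ell=2}^{\ell_{\max}} (2\ell+1) \simeq \ell_{\max}^2, \tag{3.31} $$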
which is the result of the previous section with the factor of Eq. (3.19) set to unity, i.e., the uncorrelated result of Eq. (3.17). Therefore, measurements to some $\ell_{\max}$ of the full $TT$, $EE$, and $TE$ spectra have exactly twice as much information as a temperature-only (or polarization-only) experiment. In appendix A we give a conceptually simpler derivation of this same result, by transforming to fields that are uncorrelated by construction.
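One simple way to build fields that are uncorrelated by construction is a Gram–Schmidt step, $E_\perp = E - (C_\ell^{TE}/C_\ell^{TT})\,T$. This is in the spirit of the appendix A argument, though not necessarily the same transformation used there. The sketch below uses made-up spectra and checks that $E_\perp$ has zero cross-power with $T$ and power $C_\ell^{EE}(1-r_\ell^2)$.

```python
import numpy as np

def decorrelate(ctt, cee, cte):
    """Spectra of E_perp = E - (C^TE / C^TT) T.

    E_perp is uncorrelated with T by construction and has power
    C^EE (1 - r^2); the returned cross-spectrum should vanish.
    """
    alpha = cte / ctt
    cee_perp = cee - alpha * cte
    cross = cte - alpha * ctt
    return cee_perp, cross

ell, ctt, cee, r = 800, 1.0, 0.05, 0.6          # made-up values
cte = r * np.sqrt(ctt * cee)
cee_perp, cross = decorrelate(ctt, cee, cte)
print(cee_perp, cee * (1 - r**2), cross)

# For a scaling parameter, alpha = C^TE/C^TT does not change with theta, so the
# transformation is fixed and both C^TT and C^EE_perp scale like theta.
# Two uncorrelated spectra with unit log-derivatives give (2l+1)/2 each.
snr2 = (2 * ell + 1) / 2 * (1.0**2 + 1.0**2)
print(snr2, 2 * ell + 1)
```

Because the transformation is fixed and invertible for a scaling parameter, it cannot change the Fisher information, which is why the decorrelated fields reproduce the factor of two.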
What about more complicated parameters? For power spectra with arbitrary dependence on a single parameter $\theta$, the total squared SNR is
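$$ \left(\frac{S}{N}\right)^2 = \sum_\ell (2\ell+1)\left\{\frac{a_\ell^2+b_\ell^2}{4} + \frac{a_\ell^2+b_\ell^2-2r_\ell^2 a_\ell b_\ell}{4(1-r_\ell^2)} - \frac{r_\ell\, c_\ell\, (a_\ell+b_\ell)}{1-r_\ell^2} + \frac{c_\ell^2\,(1+r_\ell^2)}{(1-r_\ell^2)^2}\right\}, \tag{3.32} $$
where $a_\ell \equiv \partial\ln C_\ell^{TT}/\partial\ln\theta$, $b_\ell \equiv \partial\ln C_\ell^{EE}/\partial\ln\theta$, and $c_\ell \equiv \partial r_\ell/\partial\ln\theta$.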
In appendix A we present an interpretation of the contributions to the total SNR, which come from temperature (uncorrelated with polarization), polarization (uncorrelated with temperature), and the correlation itself (defined in Eq. A.3). Equation (3.32), however, allows us to consider the situation at a scale for which $r_\ell = 0$. In this case we obtain
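$$ \left(\frac{S}{N}\right)^2_\ell = \frac{2\ell+1}{2}\left[\left(\frac{\partial \ln C_\ell^{TT}}{\partial \ln\theta}\right)^2 + \left(\frac{\partial \ln C_\ell^{EE}}{\partial \ln\theta}\right)^2\right] + (2\ell+1)\left(\frac{\partial r_\ell}{\partial \ln\theta}\right)^2, \tag{3.33} $$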
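where the final term is the contribution from the correlation itself (compare the $r_\ell \to 0$ limit of the $TE$-only case, Eq. 3.15). We see here (somewhat surprisingly) that the correlation contributes information even when $r_\ell = 0$, provided that $\partial r_\ell/\partial\theta \neq 0$.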
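A general single-parameter result of this kind can be cross-checked numerically. The sketch below evaluates a closed form written in terms of $a = \partial\ln C_\ell^{TT}/\partial\ln\theta$, $b = \partial\ln C_\ell^{EE}/\partial\ln\theta$, and $c = \partial r_\ell/\partial\ln\theta$ (our notation). It compares that closed form with the standard Gaussian-field expression $\frac{2\ell+1}{2}\,\mathrm{Tr}[(S^{-1}\partial S)^2]$ for the $2\times2$ $T$–$E$ covariance $S$, which is equivalent to using the full covariance of Eq. (3.3). The numerical inputs are arbitrary.

```python
import numpy as np

def snr2_closed_form(ell, r, a, b, c):
    """Per-ell (S/N)^2 from TT, EE and TE for one parameter theta.

    a = dlnC^TT/dln(theta), b = dlnC^EE/dln(theta), c = dr/dln(theta),
    where r is the T-E correlation coefficient (|r| < 1 assumed).
    """
    n = 2 * ell + 1
    k = 1.0 / (1.0 - r**2)
    return (n / 4 * (a**2 + b**2)
            + n / 4 * k * (a**2 + b**2 - 2 * r**2 * a * b)
            - n * k * r * c * (a + b)
            + n * k**2 * c**2 * (1 + r**2))

def snr2_trace(ell, ctt, cee, r, a, b, c):
    """Same quantity from the Gaussian-field Fisher formula
    (2 ell + 1)/2 * Tr[(S^-1 dS)^2] for the 2x2 (T, E) covariance S."""
    root = np.sqrt(ctt * cee)
    cte = r * root
    s = np.array([[ctt, cte], [cte, cee]])
    dcte = 0.5 * cte * (a + b) + c * root          # d C^TE / d ln(theta)
    ds = np.array([[a * ctt, dcte], [dcte, b * cee]])
    m = np.linalg.solve(s, ds)                      # S^-1 dS
    return (2 * ell + 1) / 2 * np.trace(m @ m)

ell = 400
for r, a, b, c in [(0.0, 1.0, 2.0, 0.3), (0.5, 1.0, 1.0, 0.0), (0.7, -0.4, 1.8, 0.25)]:
    print(snr2_closed_form(ell, r, a, b, c), snr2_trace(ell, 1.3, 0.04, r, a, b, c))
```

Setting $a = b = 1$ and $c = 0$ recovers $2\ell+1$ for any $r$, the doubling discussed above. Setting $r = 0$ isolates the $(2\ell+1)\,c^2$ correlation term.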
3.8 Total SNR for a single parameter
To complete this section, let us be more explicit and give a quantitative example. For a cosmic-variance-limited experiment up to $\ell_{\max}$ and covering the entire sky, the total signal-to-noise ratio in a single scaling parameter is $\sqrt{[(\ell_{\max}+1)^2-4]/2} \simeq \ell_{\max}/\sqrt{2}$. If we add $E$-mode polarization information, also cosmic-variance-limited to the same $\ell_{\max}$, we obtain $\sqrt{(\ell_{\max}+1)^2-4} \simeq \ell_{\max}$.
An experiment that makes ideal measurements of $T$ and $E$ out to some $\ell_{\max}$ has precisely twice as much information (i.e., constraining power for a scaling parameter) as a $T$-only experiment, provided that all of $TT$, $EE$, and $TE$ are measured. A parameter with more complicated dependence will generally have more information from $EE$, due to the stronger contrast between peaks and troughs in polarization. Such a parameter will also have part of its constraint coming from how the correlation itself changes (an effect that cannot be seen with a simple scaling parameter).
In terms of constraints on specific parameters, we know that the situation is more complicated still. For example, polarization data are important for breaking particular degeneracies [47, 48] (especially for determining the reionization optical depth, $\tau$), and so polarization may constrain some parameters much better than expected from simply having twice as much information. We have already seen this in Figure 1. In Section 5 we will focus on several ΛCDM parameters and how their correlations affect parameter constraints.
4 Additional CMB information
Before investigating parameter dependence in detail, it is worth noting some other information that could come from the CMB, in addition to the three power spectra we have been considering.
If one could measure $C_\ell^{BB}$ to the same $\ell_{\max}$, then instead of the full $TT$ + $EE$ + $TE$ measurement giving twice as many modes as come from $TT$ alone, we would have three times as many modes. In practice we expect primordial $B$ modes to be quite weak, and it is extremely unlikely that we could measure this power spectrum beyond the first few hundred multipoles (e.g., Ref. [49]). Hence $B$ modes are never going to add substantially to the mode count. On the other hand, any measurement of primordial $B$ modes would provide a direct constraint on the tensor-to-scalar ratio that would be better than the indirect constraints from other power spectra. Hence (as is well known) the constraints on this additional parameter are dramatically improved through better $B$-mode experiments.
We have been assuming that the CMB sky contains Gaussian perturbations, but we know that this cannot be exactly true. Certainly there is hope that we may one day detect non-Gaussianity from higher-order correlations in the CMB (e.g., Ref. [50]), and here polarization offers the promise of pushing the uncertainties down. However, we do not expect such a signal to give a very high SNR (at least compared to the power spectra), since the CMB is clearly very close to Gaussian.
One exception to this is that the 4-point function of the CMB sky contains correlations from the effects of gravitational lensing. This signature allows us to estimate an additional power spectrum, that of the lensing potential, $C_\ell^{\phi\phi}$, which has already been done by Planck [43]. If we could measure this power spectrum to the same maximum multipole as for the temperature and polarization power spectra, then it would add the same number of modes again. However, things are not so simple, because the $\ell$ index for lensing comes from the coupling of modes at different scales, and hence making a noise-free temperature map out to $\ell_{\max}$ will not give a cosmic-variance-limited measurement of $C_\ell^{\phi\phi}$ to $\ell_{\max}$. But putting that aside, lensing will effectively add as many modes as the temperature (or polarization) field. Hence there is in principle three times as much information contained in a full CMB mapping experiment (plus some additional information from $B$ modes) as there is in a measurement purely of $TT$. On the other hand, in terms of parameter constraints, the lensing power spectrum has little dependence on cosmological parameters other than amplitude [51, 52].
Since $\phi$ is a Gaussian random field, its power spectrum has the same statistical properties as the $TT$ or $EE$ spectra. Therefore, adding lensing to the covariance matrix is similar to adding $E$ to $T$. One could also consider adding the information coming from lensed $B$ modes, but then one would have to account for the fact that these modes are not Gaussian (since they come from a convolution of $E$ modes with lensing modes [53]). However, the addition of lensed $B$ modes provides little to no improvement on parameters if $E$ modes and lensing modes are already accounted for. The reason is an intuitive one: the lensed $B$ modes come directly from unlensed $E$ modes and $\phi$ modes. Thus adding $B$ modes simply double counts some combination of the $E$ and $\phi$ modes (albeit at different scales), and so adding the lensed $B$ modes only helps in the noise-dominated case. In Section 5.3 we consider including lensing modes in our data vector (Eq. 3.2); for the reasons stated above we do not simultaneously consider lensed $B$ modes.
5 CMB parameter information
5.1 Relationship with overall SNR
We now discuss the connection between the information in the power spectra and the constraints on the six parameters of the standard ΛCDM model. One might expect the total SNR in $C_\ell$ space to be very crudely of order the SNR in parameter space; however, in detail we do not expect them to be the same. This is due to the degeneracies between parameters and the sensitivity of subsets of the data to changes in specific parameters, as well as the important fact that the power spectra do not depend linearly on the parameters, as was discussed already (see Figure 1).
One can think of the $N$-dimensional parameter space as a data compression scheme. This compression reduces the measured power spectrum values and their uncertainties into $N$ numbers and their corresponding uncertainties. The power spectra depend linearly on the parameter $A_s$ (apart from small lensing effects), and hence if that were the only parameter, then its SNR would be the same as the SNR for the power spectra, and hence would be the same as in the mode-counting exercise discussed in Section 3. However, other parameters are not “linear” in this sense, and hence can be constrained better or worse than seen for the simple case of $A_s$.
As a simple example, if someone wanted to treat as a parameter instead of , then the SNR would be better by a factor of 2. A more dramatic change will come from considering rather than . And in general the standard parameters affect the power spectra in ways that are fairly different from those of a scaling parameter. The most non-linear of the parameters is $\theta_{\rm MC}$, as was already discussed, since fairly small changes in $\theta_{\rm MC}$ can result in large changes to the power spectra, because of the relative sharpness of the adiabatic peaks and troughs. The best way to understand the constraints on parameters in general is to use the Fisher matrix to investigate what happens for the standard 6-parameter cosmology. In general, however, we should not be surprised to find that the SNR values for some cosmological parameters only differ by factors of order unity from those of a linearly scaling parameter.
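The dependence of the SNR on the choice of parameterization follows from linear error propagation. For a power-law reparameterization $q = \theta^k$ one has $\sigma_q = |k|\,\theta^{k-1}\sigma_\theta$, so $q/\sigma_q = (\theta/\sigma_\theta)/|k|$. Taking the square root of an amplitude-like parameter therefore doubles its SNR. The numbers below are purely illustrative and not taken from any data set.

```python
import numpy as np

def snr_power_law_reparam(theta, sigma_theta, k):
    """SNR of q = theta**k when the error on theta is propagated linearly:
    sigma_q = |k| * theta**(k - 1) * sigma_theta, so q/sigma_q = (theta/sigma_theta)/|k|."""
    q = theta**k
    sigma_q = abs(k) * theta ** (k - 1) * sigma_theta
    return q / sigma_q

theta, sigma = 2.1e-9, 0.05e-9          # illustrative amplitude-like numbers
base = theta / sigma
print(snr_power_law_reparam(theta, sigma, 0.5) / base)   # square root: SNR doubles
print(snr_power_law_reparam(theta, sigma, 2.0) / base)   # square: SNR halves
```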
A comparison of the total SNR in the parameters with the total SNR in the power spectra is shown in Figure 3. Here the error bars for each parameter are derived from a Fisher matrix calculation. The calculation assumes a noise-free experiment, limited only by sample variance, with a sky coverage of 50% (chosen to approximately match that of Planck). The fiducial model used for the Fisher matrix calculation is ΛCDM, with parameters equal to the best-fit values of the Planck 2015 “TT+lowTEB+lensing” combination [16]. The total SNR in the parameters is
(5.1) 
and Eq. 2.3 (including an f_sky factor) is used to calculate the total SNR in the power spectra.
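For reference, the standard sample-variance-limited expressions that calculations of this kind rely on are sketched below for a single spectrum (TT, say). We assume, without having checked against Section 2, that Eq. 2.3 and the Fisher calculation take this form. Here (2ℓ+1) f_sky counts the effective number of modes per multipole. The f_sky^{-1/2} scaling of the parameter errors follows directly.

```latex
\sigma^2\!\left(\hat C_\ell\right) = \frac{2\,C_\ell^2}{(2\ell+1)\,f_{\rm sky}},
\qquad
\left(\frac{S}{N}\right)^2_{\rm spec}
  = \sum_{\ell=2}^{\ell_{\max}} \frac{C_\ell^2}{\sigma^2(\hat C_\ell)}
  = \sum_{\ell=2}^{\ell_{\max}} \frac{(2\ell+1)\,f_{\rm sky}}{2}
  \simeq \frac{f_{\rm sky}\,\ell_{\max}^2}{2},
\\[4pt]
F_{ij} = \sum_{\ell=2}^{\ell_{\max}} \frac{(2\ell+1)\,f_{\rm sky}}{2\,C_\ell^2}\,
  \frac{\partial C_\ell}{\partial p_i}\,\frac{\partial C_\ell}{\partial p_j},
\qquad
\sigma(p_i) = \sqrt{\left(F^{-1}\right)_{ii}} \;\propto\; f_{\rm sky}^{-1/2}.
```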
Figure 3 (top panel) shows that the total in parameters can be as much as a factor of 100 larger than the total in the power spectra. However, this number depends strongly on the choice of parameter set and can vary dramatically, e.g., if is replaced by , as plotted in the bottom panel of Figure 3. The ratio of the total SNR in parameters to the SNR in the power spectra (temperature and polarization) is close to unity when is substituted for . This suggests that the parameter combination in the bottom panel of Figure 3 is more “linearly” related to the power spectra than the set in the top panel. The behaviour seen in this figure explains the observation made in Section 1 that the SNR on θ_* from Planck exceeds the total SNR in the power spectrum.
A closer look at Figure 3 reveals more about the parameter constraints. Firstly, the relatively poor constraints on the 6-parameter set at low multipoles are a result of parameter degeneracies, which are not broken until higher-multipole data are included. Secondly, adding polarization data substantially improves the overall constraint on the parameters. The more dramatic improvement in the total constraint (green lines in the figure) at low multipoles results from the well-known ability of CMB polarization data to break the A_s–τ degeneracy. Thirdly, the polarization data on their own are more constraining than the temperature data (for the same ℓ_max). This arises essentially because the polarization power spectra are “sharper” than the temperature spectrum, as has been stressed in other studies (e.g., Ref. [15]). We also see structure in these SNR curves that clearly reflects the shape of the C_ℓs; we discuss this further in the next subsection.
5.2 Parameter constraints from power spectra
Although the SNR in the power spectra gets effectively shared out among the parameters, we have seen that this is only very crudely correct. In practice the detailed parameter constraints depend on many factors: in particular, on which power spectra are used, which multipole range is measured, and which set of parameters was chosen in the first place. We also need to appreciate that some information is special to particular parameters (e.g., large-angle polarization for τ). Thus we should focus on the ℓ range that is important for each parameter. We now describe this connection more comprehensively, showing some examples and comparing with the current uncertainties. It is important to realize that we do not intend here to make forecasts for specific experiments (with particular assumptions about beam size, noise, foreground contamination, etc.), since this has been done before. Instead we ask the more general question of how good the parameter constraints could ever become, comparing the ideal values with the current constraints from Planck.
Figure 4 shows the results of a Fisher calculation for a cosmic-variance-limited, CMB-only experiment. We have chosen a sky coverage of 50%, which approximately matches the effective area used by Planck for the main parameter constraints (although this is frequency dependent; see Ref. [16]). Note that the errors would simply scale as f_sky^{-1/2} for other sky fractions. The horizontal axis here indicates the maximum ℓ used in the calculation. The red line is for a temperature-only experiment, while the blue and green lines are for polarization-only and the full set of three spectra, respectively. The error bars for each parameter are compared with the Planck 2015 “TT+lowTEB+lensing” 68% confidence limits. The vertical solid and dashed lines show the temperature peak and trough positions, respectively, which almost coincide with the polarization troughs and peaks [36].
The mode-counting argument (of Section 2) tells us that every mode has equal weight in contributing to the SNR of the power spectrum. As we have seen, however, this is not true for the parameter SNR, since different multipoles are not equally useful in constraining individual parameters. For example, the multipoles around the third trough, –1100, are particularly useful for reducing the uncertainty on , whereas the information gain from the –1650 range (from the fourth peak to the fifth trough) is almost negligible.
A careful examination of Figure 4 shows that in general the troughs are more important than the peaks for reducing the error bars (particularly clear when focusing on and , for example). We also see (in the green lines) that the curves become much smoother when temperature is added to polarization. This is because the effects coming from the troughs and peaks of TT more or less cancel those coming from the troughs and peaks of EE.
This leaves the question of why the troughs should give stronger constraints, since they are obviously lower than the peaks and therefore (one might expect) carry less information. Figure 5 shows the same predictions as Figure 4 but with lensing effects turned off. Comparing the two, one can see that the importance of the troughs is due to the effect of lensing on the spectrum. Lensing smooths the peaks and troughs while preserving the total power, so the relative change from lensing is larger around the troughs than around the peaks. Hence the troughs can give better constraints on parameters.
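The argument above can be checked with a deliberately crude toy, sketched below under stated assumptions. A hypothetical cosine "peak and trough" spectrum is convolved with a normalized Gaussian kernel in multipole space. This is not a lensing calculation. It only mimics the two properties the argument uses: features are smoothed, and total power is preserved. The same absolute shift then produces a much larger fractional change at the trough than at the peak.

```python
import numpy as np

ells = np.arange(0, 3000)
PERIOD = 300.0   # hypothetical peak spacing
WIGGLE = 0.6     # hypothetical peak contrast
cl_sharp = 1.0 + WIGGLE * np.cos(2.0 * np.pi * ells / PERIOD)

# Crude stand-in for lensing: a normalized Gaussian kernel redistributes power
# between neighbouring multipoles without changing the total.
sigma_ell = 30.0
offsets = np.arange(-150, 151)            # odd length, so mode="same" stays centred
kernel = np.exp(-0.5 * (offsets / sigma_ell) ** 2)
kernel /= kernel.sum()
cl_smoothed = np.convolve(cl_sharp, kernel, mode="same")

peak, trough = 1500, 1650                 # cos = +1 and -1, far from the array edges
frac_change = (cl_smoothed - cl_sharp) / cl_sharp
print(f"peak:   {frac_change[peak]:+.4f}")
print(f"trough: {frac_change[trough]:+.4f}")
```

For a pure cosine the absolute shift has the same size at peaks and troughs, so the ratio of fractional changes is just the unsmoothed peak-to-trough ratio, (1 + 0.6)/(1 − 0.6) = 4 here.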
Comparing Figure 4 with Figure 5 also shows that lensing of the power spectra is useful for breaking the A_s–τ degeneracy, but it makes the constraints on the set weaker. This is because the peaks and troughs are “sharper” in the unlensed spectra, and are therefore a better source of information for constraining θ_*. The uncertainties on the matter densities are also improved as a result of the better constraints on θ_* and the correlations among the parameters.
What we have highlighted here is the simple observation that constraints on cosmological parameters come from two basic factors. The first is the sensitivity of the power spectra to changes in the parameters (see Eq. 3.32 and Figure 1). The second is the ability of the power spectra to break degeneracies between parameters. In the following subsection we consider the effect of adding lensing modes, which generally depend only weakly on the cosmological parameters. Nevertheless, these modes can break parameter degeneracies, which is crucial for going beyond the 6-parameter model.
5.3 Extended parameters – neutrino mass
We now choose a specific example to illustrate how we can think about the relationship between information and parameter constraints. Future CMB observations will target extensions to the 6-parameter ΛCDM model. In particular, detecting the sum of the neutrino masses, Σm_ν, seems a realistic (although challenging) possibility in the near future [54, 55, 56]. This is because the current upper limit on Σm_ν [16] is only a factor of a few higher than the lower limit imposed by direct measurements of the neutrino mass differences (e.g., Ref. [57]). However, the left panel of Figure 6 shows that such a measurement is not feasible with CMB temperature or polarization maps alone, because there is not enough information in these power spectra out to .
Besides the temperature and polarization fluctuations, one can also map the fluctuations of the gravitational potential or, equivalently, of the lensing deflection angle. The inclusion of the lensing power spectrum in the Fisher formalism is discussed in Ref. [58], assuming Gaussianity and ignoring correlations between different multipoles. Making similar assumptions, we find that CMB lensing can improve the constraints on the total neutrino mass dramatically, as shown in the right panel of Figure 6. Here we have assumed that the CMB lensing power spectrum and its correlations with temperature and E-mode polarization can be measured with sample-variance accuracy over of the entire sky (see, e.g., Ref. [58]). Given these assumptions, the full set of CMB power spectra reaches the fiducial sensitivity (corresponding to meV in mass) at , and can ultimately measure the mass at about the 2σ level. Here we use the same multipole symbol, ℓ, for lensing as for temperature and polarization. However, there is a very strong degeneracy between the neutrino mass and the dark matter density (with a correlation coefficient at ). There is also a fairly strong degeneracy between the neutrino mass and (with a correlation coefficient of ). Hence, any additional independent measurement of these parameters (e.g., via galaxy weak lensing, baryon acoustic oscillations, or redshift-space distortions) that can break these degeneracies will lead to a substantial improvement.
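The benefit of an external measurement can be illustrated with a two-parameter Fisher toy, sketched below. The numbers are hypothetical: errors are in units of each parameter's CMB-only error, and the correlation coefficient is −0.9. They are not the values behind Figure 6. The point is only that independent Fisher information adds. For strongly correlated parameters, a measurement of one tightens the marginalized error on the other. That error can never drop below the conditional error, σ√(1 − ρ²).

```python
import numpy as np

def fisher_from_errors(sigmas, rho):
    """2x2 Fisher matrix (inverse covariance) from marginalized errors and a correlation."""
    off = rho * sigmas[0] * sigmas[1]
    cov = np.array([[sigmas[0]**2, off], [off, sigmas[1]**2]])
    return np.linalg.inv(cov)

def marginalized_errors(fisher):
    """1-sigma marginalized errors: square roots of the diagonal of F^-1."""
    return np.sqrt(np.diag(np.linalg.inv(fisher)))

RHO = -0.9                                  # hypothetical strong degeneracy
cmb = fisher_from_errors((1.0, 1.0), RHO)

# Hypothetical independent measurement of the second parameter only, with 30% of
# its CMB-only error. Fisher matrices of independent data sets simply add.
sigma_ext = 0.3
external = np.diag([0.0, 1.0 / sigma_ext**2])
combined = cmb + external

before = marginalized_errors(cmb)
after = marginalized_errors(combined)
floor = np.sqrt(1.0 - RHO**2)   # error on parameter 0 if parameter 1 were known exactly
print(before, after, floor)
```

With these made-up inputs the error on the first parameter falls from 1 to about 0.5, even though only the second parameter was measured externally.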
6 Discussion
In terms of mode counting, the information contained in the CMB anisotropies is clearly finite. It is limited by cosmic variance and by the damping of the power spectra at the highest multipoles. For temperature information alone the total SNR in the power spectrum is . Planck has measured most of what is available out to . ACT and SPT have extended this to higher ℓ, but over relatively small sky fractions, and foregrounds make it seem unrealistic to push beyond , say. This means that although we can continue to measure the CMB sky to ever more sensitive levels, we have reached a point where the bulk of the useful information has already been extracted from the temperature anisotropies.
However, the situation is different for polarization information, since the foregrounds (from galaxies and clusters of galaxies) are very weakly polarized, and hence there is hope that we should be able to measure the primary polarization anisotropies to much higher multipoles, and perhaps considerably higher [59, 35]. This means that there is at least an order of magnitude more polarization information to extract from the CMB sky. Moreover, as we have seen, the polarization data can place tighter constraints on parameters in general, and on some parameters in particular. The and power spectra add additional information, but in practice this is probably a small fraction of what is available from the  and modes.
Despite the dramatic improvement still expected from CMB polarization, the maximal SNR in the CMB power spectra is from primary CMB anisotropies. To improve cosmological parameter constraints further we therefore need to go to 3-dimensional surveys (such as high-redshift 21-cm fluctuations), where there are considerably more modes [19]. As an illustration of what this CMB limitation means, consider the determination of the curvature of space, Ω_K. The cosmic-variance limit is at the level, since this is the amplitude of the curvature perturbation on the Hubble scale [60]. The current uncertainty from CMB data is at the level, and given the above argument, we expect this to decrease by only about another order of magnitude. Hence, if the Universe is sufficiently close to spatially flat, we will never be able to determine whether it is flat or curved using CMB data alone. There is simply not enough information in the CMB anisotropies to achieve the required SNR on Ω_K. Ambitious future experiments may probe 3D modes, which would in fact allow us to reach below the cosmic-variance limit for the measurement of Ω_K within our observable volume.
The discussion here has focused on the Gaussian primary anisotropies, supplemented by CMB lensing. However, we should acknowledge that there is also some cosmological information in the secondary anisotropies (e.g., the cosmic IR background, the integrated Sachs–Wolfe effect and the Sunyaev–Zeldovich effects). These effects certainly enable further cosmological information to be measured, not just on the last-scattering surface, but also at other epochs along the light cone. Nevertheless, the additional information seems limited in its scope for constraining background parameters. Either only a modest amount of information is available at all (as for the ISW effect), or the additional information is still effectively on a 2D surface. The only way to obtain a dramatic improvement in constraining power will be to pursue methods that are fundamentally 3D.
7 Conclusions
We have taken a pedagogical approach to investigating the information content in CMB anisotropies, in the sense of constraining the cosmological model. It is clear that for temperature anisotropies, we have already mined a substantial part of what is available, and we are effectively running out of information. However, for CMB polarization we still have a way to go, and there may be an order of magnitude more constraining power still to extract from the CMB sky.
The CMB power spectrum has an SNR of approximately , which can be thought of as the result of a simple mode-counting calculation. The SNR on a scaling parameter (like A_s) is the same. Some parameters (such as θ_*), however, have a “nonlinear” dependence, which allows them to be constrained more tightly than the total SNR for the power spectrum as a whole.
We have shown that the mapping from information about the CMB power spectra to information about the cosmological parameters has two ingredients, namely the sensitivity of the power spectra to parameters (Eq. 3.32 and Figure 1), and the ability to break parameter degeneracies (Figures 4–6). The latter concept will likely become more important as we explore further data sets to constrain cosmology within and beyond the 6-parameter ΛCDM paradigm.
Temperature and polarization anisotropies are correlated, and hence TE contains information that enhances what is there in TT and EE alone. A full measurement of TT, EE, and TE yields one additional quantity per pixel in the map (compared with measuring just T). Hence the total SNR is √2 times larger than for TT alone. Measuring all three power spectra recovers the information lost due to the correlation. It also gives additional information about how the correlation itself changes (most easily seen in Eqs. A.12–A.13). In fact the information gained from the correlation can sometimes be greater than that from temperature (Ref. [15]) or polarization (Figure 1) alone.
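The √2 statement can be verified numerically with a minimal sketch. It assumes the standard Gaussian covariance of the TT, EE and TE estimators at each multipole, with ν = (2ℓ+1) f_sky modes. For arbitrary, hypothetical spectra with any allowed TE correlation, the (S/N)² from all three spectra is exactly twice that from TT alone, so the SNR is √2 larger.

```python
import numpy as np

def snr2_three_spectra(tt, ee, te, nu):
    """(S/N)^2 from one multipole when TT, EE and TE are measured (sample variance only).

    Uses Cov(C^XY, C^WZ) = (C^XW C^YZ + C^XZ C^YW) / nu for Gaussian fields.
    """
    cov = (2.0 / nu) * np.array([
        [tt**2,   te**2,   tt * te],
        [te**2,   ee**2,   ee * te],
        [tt * te, ee * te, 0.5 * (tt * ee + te**2)],
    ])
    data = np.array([tt, ee, te])
    return data @ np.linalg.solve(cov, data)

def snr2_tt_only(tt, nu):
    """(S/N)^2 from one multipole of TT alone: C^2 / (2 C^2 / nu) = nu / 2."""
    return tt**2 / (2.0 * tt**2 / nu)

rng = np.random.default_rng(1)
ratios = []
for ell in (10, 200, 1500):
    nu = (2 * ell + 1) * 0.5                 # f_sky = 0.5
    tt, ee = rng.uniform(0.1, 10.0, size=2)  # hypothetical spectra
    r = rng.uniform(-0.95, 0.95)             # hypothetical TE correlation coefficient
    te = r * np.sqrt(tt * ee)
    ratios.append(snr2_three_spectra(tt, ee, te, nu) / snr2_tt_only(tt, nu))

print(ratios)   # each entry equals 2 up to rounding, so S/N improves by sqrt(2)
```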
Adding B modes could in principle give one more quantity for each pixel, although in practice the primordial signal is expected to be weak. On the other hand, CMB lensing provides an additional map, of the lensing potential φ, which gives a whole other set of modes that can be used to constrain parameters. For the standard ΛCDM model lensing helps to break the A_s–τ degeneracy. For extensions to the standard model (e.g., with neutrino mass included as an additional parameter), these data could be even more useful in the future.
Constraints from the CMB will continue to improve as we measure more modes from polarization and from lensing. There is certainly a bright near-term future ahead as these measurements become sample-variance-limited down to small angular scales. In the longer term there will be other secondary signals extractable from CMB measurements. Ultimately, however, dramatically increasing the number of modes probed will require other observables (such as redshifted 21-cm maps), which can provide 3D surveys of our past light cone.
Appendix A Decorrelating T and E
As an alternative to the derivation of the total CMB information content in Section 3, one can define new variables that are uncorrelated, so that the covariance matrix becomes diagonal. Doing this makes it clear that the improvement in SNR from including E-mode polarization information is exactly √2, regardless of how T and E are correlated. The change from adding B modes is then trivial (since they are uncorrelated with both T and E). The approach we describe here applies to any two correlated data sets; the specific example will be temperature and polarization data. Whether they are lensed or not does not change the arguments. However, adding the lensing data themselves would require a generalization of the method to deal with the associated correlations.
We can decorrelate temperature and polarization simply by rotating the data into a new basis, designated as , through an appropriate angle :
(A.1) 
We denote the power spectra derived from this new set of variables as and they are given by
(A.2) 
or , with defined as the transformation above. By demanding that and be uncorrelated (equivalently that ) we fix the angle to be
(A.3) 
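Since Eqs. (A.1)–(A.3) did not survive in this version of the text, the following is a hedged sketch of the algebra in our own notation. X and Y for the rotated fields and α_ℓ for the angle are names we assume, and the sign convention of the rotation may differ from the one actually used. Rotating the pair (T, E) gives

```latex
\begin{pmatrix} X_{\ell m} \\ Y_{\ell m} \end{pmatrix}
 = \begin{pmatrix} \cos\alpha_\ell & \sin\alpha_\ell \\ -\sin\alpha_\ell & \cos\alpha_\ell \end{pmatrix}
   \begin{pmatrix} T_{\ell m} \\ E_{\ell m} \end{pmatrix},
\\[4pt]
C_\ell^{XX} = \cos^2\!\alpha_\ell\, C_\ell^{TT} + \sin 2\alpha_\ell\, C_\ell^{TE} + \sin^2\!\alpha_\ell\, C_\ell^{EE},
\qquad
C_\ell^{YY} = \sin^2\!\alpha_\ell\, C_\ell^{TT} - \sin 2\alpha_\ell\, C_\ell^{TE} + \cos^2\!\alpha_\ell\, C_\ell^{EE},
\\[4pt]
C_\ell^{XY} = \cos 2\alpha_\ell\, C_\ell^{TE} - \tfrac{1}{2}\sin 2\alpha_\ell \left(C_\ell^{TT} - C_\ell^{EE}\right) = 0
\quad\text{when}\quad
\tan 2\alpha_\ell = \frac{2\,C_\ell^{TE}}{C_\ell^{TT} - C_\ell^{EE}} .
```

The rotation preserves the trace, C_ℓ^{XX} + C_ℓ^{YY} = C_ℓ^{TT} + C_ℓ^{EE}, so no power is lost, only redistributed between the two new fields.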
Note that there are alternative approaches to decorrelating T and E, e.g., leaving T unaltered while removing the correlated part from E [61]; the approach we describe here is easy to picture as a rotation. The covariance matrix for these new power spectra can be derived by computing the 4-point functions of and . However, a much simpler method is to take the previous covariance matrix (i.e., ) and make the following replacements: ; ; and . The covariance and inverse covariance matrices then become
(A.4)  
(A.5) 
respectively (or equivalently and ). The data vector takes the simple form
(A.6) 
Note that even though is zero, this does not imply that, for a general parameter , will vanish.
The transformation performed here leaves the Fisher matrix unchanged. This is because
(A.7)  
(A.8)  
(A.9)  
(A.10) 
Hence, for a single scaling parameter , such that , we trivially find
(A.11) 
which is the same as the result of Eq. (3.31).
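The invariance of the Fisher matrix under a parameter-independent rotation can also be checked numerically. The sketch below assumes the standard Gaussian-field Fisher element for one multipole, F = (ν/2) Tr[C⁻¹ ∂C C⁻¹ ∂C], with C the 2×2 (T, E) covariance. It uses the rotation convention X = cos α T + sin α E, Y = −sin α T + cos α E. The covariance and its derivative are hypothetical numbers chosen only for illustration.

```python
import numpy as np

def rotation(alpha):
    """Rotation taking (T, E) to (X, Y)."""
    c, s = np.cos(alpha), np.sin(alpha)
    return np.array([[c, s], [-s, c]])

def decorrelating_angle(cov):
    """Angle alpha with tan(2 alpha) = 2 C^TE / (C^TT - C^EE)."""
    return 0.5 * np.arctan2(2.0 * cov[0, 1], cov[0, 0] - cov[1, 1])

def fisher_element(cov, dcov_i, dcov_j, nu):
    """Gaussian-field Fisher element for one multipole: (nu/2) Tr[C^-1 dC_i C^-1 dC_j]."""
    cinv = np.linalg.inv(cov)
    return 0.5 * nu * np.trace(cinv @ dcov_i @ cinv @ dcov_j)

# Hypothetical T/E covariance at one multipole and its derivative w.r.t. some parameter.
cov_te = np.array([[3.0, 0.8], [0.8, 0.5]])
dcov = np.array([[0.4, -0.1], [-0.1, 0.05]])
nu = (2 * 1000 + 1) * 0.5

rot = rotation(decorrelating_angle(cov_te))   # held fixed: independent of the parameter
cov_xy = rot @ cov_te @ rot.T
dcov_xy = rot @ dcov @ rot.T

f_te = fisher_element(cov_te, dcov, dcov, nu)
f_xy = fisher_element(cov_xy, dcov_xy, dcov_xy, nu)
f_amp = fisher_element(cov_xy, cov_xy, cov_xy, nu)   # pure amplitude: dC = C
print(cov_xy[0, 1], f_te, f_xy, f_amp)
```

The off-diagonal element vanishes to rounding, the two Fisher values agree, and for a pure amplitude (∂C = C) the result is ν per multipole. That is twice the ν/2 obtained from T alone, which is the √2 gain in SNR once more.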
We can also consider a general parameter like in Section 3.7, in which case we will have a fixed rotation matrix and hence