An improved estimator for non-Gaussianity in cosmic microwave background observations
An improved estimator for the amplitude of local-type non-Gaussianity from the cosmic microwave background (CMB) bispectrum is discussed. The standard estimator is constructed to be optimal in the zero-signal (i.e., Gaussian) limit. When applied to CMB maps which have a detectable level of non-Gaussianity the standard estimator is no longer optimal, possibly limiting the sensitivity of future observations to a non-Gaussian signal. Previous studies have proposed an improved estimator by using a realization-dependent normalization. Under the approximations of a flat sky and a vanishingly thin last-scattering surface, these studies showed that the variance of this improved estimator can be significantly smaller than the variance of the standard estimator when applied to non-Gaussian CMB maps. Here this technique is generalized to the full sky and to include the full radiation transfer function, yielding expressions for the improved estimator that can be directly applied to CMB maps. The ability of this estimator to reduce the variance as compared to the standard estimator in the face of a significant non-Gaussian signal is re-assessed using the full CMB transfer function. As a result of the late time integrated Sachs-Wolfe effect, the performance of the improved estimator is degraded. If CMB maps are first cleaned of the late-time ISW effect using a tracer of foreground structure, such as a galaxy survey or a measurement of CMB weak lensing, the new estimator does remove a majority of the excess variance, allowing a higher significance detection of .
Over the past two decades our understanding of the physics of the early universe has gone from speculative to precise. We have numerous data sets which probe both the overall expansion of the universe as well as the statistics of the large-scale structure we see today. In addition to these observations, our standard cosmological model, with only six free parameters, provides a good fit to all of these data sets (see, e.g. Komatsu et al. (2011)). The standard cosmological model relies on some basic assumptions about the origin and evolution of the universe: namely that soon after the big bang, the universe underwent a period of cosmic inflation during which nearly Gaussian perturbations were produced in an otherwise isotropic and homogeneous universe. After this period the universe was ‘reheated’– i.e., populated with a thermal plasma of standard model particles. Particle physics dictates the interactions between the constituents of this primordial fluid, while general relativity dictates how the perturbations grow to form the structures we observe in both the clustering of galaxies, as well as in the anisotropies of the cosmic microwave background (CMB).
With increasingly precise observations, we may test the foundations of the standard cosmological model. Here we will be concerned with testing the assumption that the statistics of CMB fluctuations are Gaussian. Although small levels of non-Gaussianity may develop through non-linearities in the standard cosmological model Salopek and Bond (1990); Pyne and Carroll (1996); Mollerach and Matarrese (1997); Mollerach et al. (1995); Munshi et al. (1995), significant departures from Gaussianity can only be explained by changes to the fundamental physics of the early universe. For example, a detection of primordial non-Gaussianity could yield information on the interactions of the field (or fields) that seed the primordial curvature perturbation. If this field is the one that drives inflation (the inflaton), the measurement could yield insight into the detailed physics of inflation. If the curvature perturbation is seeded by a field that is sub-dominant during inflation (as in the curvaton scenario Mollerach (1990); Lyth et al. (2003); Lyth and Wands (2002); Gordon and Lewis (2003)), the primordial fluctuations may also be significantly non-Gaussian. Some more exotic possibilities, such as a non-canonical kinetic term for the inflaton Cheung et al. (2008); Alishahiha et al. (2004); Arkani-Hamed et al. (2004), spatially modulated reheating Dvali et al. (2004), and novel initial vacuum states for the fluctuations Martin et al. (2000), could also be probed by a detection or limit to non-Gaussianity in the CMB.
There are an infinite number of ways in which the statistics of the CMB may be non-Gaussian, although the effective field theory approach shows that inflationary theories can only produce three forms for the non-Gaussian primordial three-point correlation function Cheung et al. (2008). In this study we restrict our attention to the ‘local model’ of non-Gaussianity, in which the primordial curvature perturbation, , can be written in terms of an auxiliary Gaussian field as Salopek and Bond (1990); Gangui et al. (1994); Gangui (1994); Komatsu and Spergel (2001); Verde et al. (2000)
where the amplitude parameterizes the level of non-Gaussianity. This model is particularly important because if were detected then all single-field slow-roll inflation models would be ruled out Creminelli and Zaldarriaga (2004); Smith et al. (2011a). In addition, the local-type non-Gaussianity is predicted by the curvaton model Lyth et al. (2003); Lyth and Wands (2002); Gordon and Lewis (2003), and so a detection could provide support for, or constraints to, this model.
Although there are several ways to estimate the level of non-Gaussianity in the CMB, an estimate of the harmonic three-point function (known as the CMB bispectrum) is the most sensitive Komatsu and Spergel (2001); Smith and Kamionkowski (2012). Given the large number of modes in the bispectrum (after restrictions due to statistical isotropy there are terms in the bispectrum, where is the maximum multipole measured by a given experiment) a full exploration of the likelihood surface is computationally prohibitive, and so attempts to constrain the level of non-Gaussianity are made through estimators which have been constructed to minimize variance under the null hypothesis– i.e., when applied to CMB maps which are purely Gaussian Babich and Zaldarriaga (2004); Babich (2005).
A direct application of the minimum-variance null-hypothesis (MVNH) estimator of from the CMB bispectrum is also computationally expensive since the computation involves a very large number of terms (). The Planck satellite The Planck Collaboration (2006) will measure multipole moments so a blind application of this estimator will take operations to compute! The real computational expense is even higher, as thousands of simulations must be run to characterize the statistics of this estimator. For a special set of non-Gaussian models (of which the local model is one) fast Fourier transforms (FFTs) may be used to greatly reduce this computational expense Komatsu et al. (2005); Smith and Zaldarriaga (2011). Using FFTs the estimator may then be evaluated with a computation time that scales as , considerably reducing the computational expense Komatsu et al. (2005); Munshi and Heavens (2010). CMB data are currently consistent with vanishing non-Gaussianity Smith et al. (2009); Yadav and Wandelt (2008); analysis of the WMAP 7-year data yields the 2- constraint Komatsu et al. (2011).
If future data remain consistent with the null hypothesis, then the standard MVNH estimator should have the minimum variance. As the data improves, if is found to be significantly different from zero, then the standard MVNH estimator is non-optimal and there may be other estimators which have a smaller variance (and hence increased signal-to-noise, ). This is because the MVNH estimator was constructed to have the minimum variance when and when applied to data with the variance is no longer minimized. In particular, when applied to the local model, for the MVNH estimator exhibits a variance which is proportional to , leading to a saturation of the of the estimator, even for large values of Creminelli et al. (2007); Liguori et al. (2007); Smith et al. (2011b). This indicates that a new method to estimate may extract a higher from the measurements.
In one approach, estimators are abandoned, and the full likelihood surface is explored using Bayesian methods Elsner and Wandelt (2010). Although this approach has been shown to give improved constraints on , it is computationally expensive, taking over 150,000 CPU hours to compute a best fit-value and confidence interval for with . In another approach, an improved estimator is constructed by defining a new realization-normalized estimator (RNE) Creminelli et al. (2007); Smith et al. (2011b). This method normalizes the MVNH estimator using a particular combination of observed multipoles which are chosen so as to divide out the extra variance present. In Ref. Creminelli et al. (2007) the RNE was derived in the flat sky and Sachs-Wolfe limit (in which the temperature anisotropies are given by fluctuations in the gravitational potential at the surface of last scattering, White and Hu (1997)). In these limits, it was shown in Ref. Creminelli et al. (2007) that the RNE successfully removes all the variance proportional to of the MVNH estimator when .
In this work we generalize the RNE to include the full radiation transfer function and provide expressions that are valid on all-sky CMB maps. We compute the variance of the RNE and find that, unlike in the Sachs-Wolfe limit, only of the -dependent variance is removed.
The residual variance is a result of the late-time, large-scale integrated Sachs-Wolfe (ISW) effect, in which CMB anisotropies are generated by gravitational potential decay along the line-of-sight Rees and Sciama (1968); Crittenden and Turok (1996); Boughn and Crittenden (2005); Giannantonio et al. (2006); Ho et al. (2008); Giannantonio et al. (2008); Francis and Peacock (2010). To understand why the late-time ISW effect reduces our ability to remove this extra variance, we note that the form of the local-model non-Gaussianities implies that the bispectrum signal is dominated by harmonic-space triangles which correlate one large-scale mode with two small scale modes (i.e., ‘squeezed’ triangles). If the large-scale anisotropies are generated only by the Sachs-Wolfe effect then they can be inverted to give a direct measurement of the value of the large-scale gravitational potential at the surface of last scattering. These measurements, in turn, allow us to directly infer the non-Gaussian contribution to the large-scale anisotropies, leading to a complete removal of all of the excess variance when . However, in the presence of both the late-time ISW and Sachs-Wolfe effect, this inversion is not possible, leading to a degradation of the performance of the RNE. We find, however, that if a tracer of foreground structure (such as a wide-field galaxy survey or the deflection field responsible for weak lensing of the CMB) is first applied to ‘clean’ a CMB map of the late-time ISW effect then of the -dependent variance may be removed by using the RNE.
We begin by summarizing the non-Gaussian local model and its bispectrum in Sec. II. We provide an intuitive, geometric argument for the origin of the -dependent variance that afflicts the local-model MVNH estimator in Sec. III. The realization-dependent normalization (RNE) is derived in Sec. IV using a method that applies in the presence of the full radiation transfer function. In Sec. V we compute the variance of the RNE using the full radiation transfer function, finding that the -dependent variance of the MVNH estimator is only partially removed. In Sec. VI, we show that the -dependent variance is further reduced by first cleaning the CMB of the late-time ISW contribution using a large-scale structure survey [such as the NRAO VLA Sky Survey (NVSS), or the next-generation Joint Dark Energy Mission (WFIRST) proposal] or a measurement of CMB lensing. Conventions for flat-sky approximations employed in numerical calculations are stated in Appendix A. Detailed expressions needed to obtain the variance are derived in Appendix B. In Appendix C we present a computationally efficient algorithm (using fast Fourier transforms) to compute our estimator on full-sky CMB maps.
Ii The non-Gaussian local model
First we will review the basic equations that relate to the definition and estimation of the CMB bispectrum, restricting attention to the local model as defined in Eq. (1). The CMB bispectrum is defined by
where the are the usual multipole moments of the temperature map, is the reduced CMB bispectrum, and the Gaunt integral is given by
and is a Wigner 3-coefficient. A product of three multipoles form an unbiased estimator for the angle-averaged CMB bispectrum:
and is the primordial (i.e., theoretical) angle-averaged bispectrum. The normalization of this estimator is given by the variance under the null hypothesis, Yadav et al. (2008):
where if , if , and otherwise. Finally, the angle-averaged primordial bispectrum, , is given in terms of the reduced bispectrum
Now restricting attention to the local-model bispectrum, at any radial location along the line of sight the primordial curvature potential, , can be decomposed into spherical harmonics. The local model ansatz in Eq. (1) then implies that
This allows us to write the reduced bispectrum in a line-of-sight integral:
where the two filter-functions are given in terms of the transfer functions
and is the CMB temperature anisotropy transfer function Seljak and Zaldarriaga (1996). The generalization to polarization is straightforward Yadav et al. (2008); in this work we limit our attention to the signature of primordial non-Gaussianity in the CMB temperature.
Finally, we will need expressions for various auto and cross-correlations in terms of the distance along the line-of-sight . The power-spectrum for the auxiliary Gaussian field is
These expressions lead to the line-of-sight autocorrelation Liguori et al. (2003)
With this we can write the line-of-sight cross-correlation
These expressions will be useful in the following discussion.
Iii The origin of the increased variance
The standard non-Gaussian estimator written in Eq. (5) is only optimal under the null hypothesis: . In the case where the estimator becomes suboptimal (in the sense that it no longer saturates minimum variance attainable according to the Cramer-Rao bound, see Refs. Babich and Zaldarriaga (2004); Creminelli et al. (2007)) and has a variance which depends on . Specifically, in the flat-sky Sachs-Wolfe limit the variance scales as Creminelli et al. (2007); Smith et al. (2011b); Liguori et al. (2007)
where is the amplitude of the power-spectrum of the primordial curvature perturbations, . From this expression we can see that for that in the large limit the variance flattens out and scales as .
To understand the origin of this -dependent variance let us first consider a simple toy model. Let be a random variable with and . As we will discuss further, the -dependent variance in the estimator comes from terms which look like
The mean and variance of are
where is the number of data points. From this, it is clear that the of is given by
We can see that the is a constant. The reason is simple: since the are
uncorrelated, the off diagonal terms in with do not contribute to the
signal; on the other hand, they do contribute to the noise, leading to a constant
To understand the origin of the increased variance we expand the MVNH estimator in Eq. (5) in powers of
The total observed multipole moment is
We now shed light on the origin of the increased variance by considering the flat-sky and Sachs-Wolfe limit of Eq. (25). In this limit the radiation transfer function is given by where is the conformal distance to the surface of last scattering. We refer the reader to Appendix A for expressions which relate the full-sky to flat-sky expressions. In these limits we have Creminelli et al. (2007); Smith et al. (2011b):
where the primordial power-spectrum is taken to be scale invariant, . Rewriting in this way we can see that it is a product of two triangles with a shared side . We show a graphical representation of this in the top section of Fig. 1. We can rewrite schematically as
with labeling the triangle and labeling the triangle .
The mean is given by a sum over a single triangle, as shown in the right panel of Fig. 1:
Therefore, we can see that the mean is composed of a sum of terms. On the other hand, Fig. 1 shows that the variance of is computed from a sum of a product of two triangles which gives . As in the simple example given above, this shows that there are a large number of terms which do not contribute to the mean of but do contribute to the variance, leading to a signal-to-noise () which does not decrease as fast as one might have expected with increasing .
It might seem that this can be corrected by choosing a different weight function . Since this weight was chosen to optimize the estimator under the null (i.e., ) hypothesis it is natural to think that a different weighting will be needed when . Unfortunately, since the terms in which contribute to the variance but not the mean involve contractions between (,) and (,), it is clear that no choice of weight, which depends only on the ‘observed’ indices , will remove these terms. Therefore, a new weighting cannot decrease the variance of this estimator in the high regime. Another way to remove the terms which do not contribute to the signal but do contribute to the variance is to use a realization dependent normalization.
Before we discuss how to construct a realization dependent normalization that reduces the variance in let us first demonstrate in what sense the squeezed limit dominates the variance in . In the flat-sky Sachs-Wolfe limits, it has been shown that the variance is well approximated by Smith et al. (2011b)
where indicates the sum is over . We can calculate this by first writing it in a less compact form: In these limits the bispectrum is given by Babich and Zaldarriaga (2004)
The second sum in Eq. (33) is given by
and the summand is dominated by triangles where one index is much less than the other two (i.e., the squeezed limit) so that
We then have that
We can see from this expression that it will be dominated by the terms where is smallest. Therefore, taking the squeezed limit again we write and find that
where we have taken the limit and used the expression for found in Eq. (18).
The numerical calculation found in Ref. Smith et al. (2011b) shows that the variance decreases slightly faster with - non-squeezed configurations do make some contribution to the estimator which causes the variance fall off more rapidly with . For other forms of non-Gaussianity, the term in the expansion of the estimator can not be written simply as in Eq. (25). An expression analogous to Eq. (30), however, may still be written though with a different weighting due to the shapes of the Fourier-space triangles which dominate the in these cases. Since these models are not dominated by the squeezed limit, they are not afflicted by the slow scaling with of the that occurs for the local model.
Iv Realization normalized estimator
In order to improve the estimator when , we will search for a realization dependent normalization which removes some of the variance in those terms in the estimator which do not contribute to the signal but do contribute to the noise. To do this we follow the approach presented in Ref. Creminelli et al. (2007) and construct a new realization-dependent normalization which will remove as much of the -dependent variance of as possible. In other words, we wish to define a realization-dependent normalization that is highly correlated with .
We write the new realization-dependent normalization as an estimator for , , so that we obtain a new estimator for , the realization-normalized estimator (RNE):
The extent to which the new realization-dependent normalization decreases the variance of this estimator is determined by the correlation between and ; in the limit that these two terms are fully correlated the variance is completely removed.
Looking at the equation for [Eq. (25)] the only term that cannot be immediately written in terms of observables is . Therefore, when constructing the estimator we must find a minimum variance estimator for . To do this we define a weighted sum,
and demand that the weight minimizes the variance,
Choosing the weight
satisfies Eq. (41) and thus leads to a minimum variance estimator for . With this, the estimator for can be written
where we have made the approximation , which is true to lowest order in .
We note that there are several other ways of arriving at the same estimator for . In particular, if we were to instead search for a minimum variance estimator for as in Ref. Komatsu et al. (2005), then we would again be lead to the estimator for in Eq. (44). Furthermore, the same relation between the underlying gravitational potential and the observed multipoles appears when calculating the shape of the likelihood surface for the MVNH estimator as discussed in Ref. Babich (2005).
As we now discuss, the improved estimator given by Eqs. (39) and (44) (the RNE), has two important properties. First we show that in the case where the transfer function is just the Sachs-Wolfe transfer function, the RNE is the same as the estimator presented in Ref. Creminelli et al. (2007). Our discussion then shows how the RNE, in this limit, is able to remove all of the -dependent variance from the standard MVNH estimator. Second, when the full transfer-function is used, the combination of both the Sachs-Wolfe and late-time integrated Sachs-Wolfe (ISW) effects on large angular scales causes the RNE to be less effective; it only removes part of the -dependent variance. Using other tracers of large-scale structure, as discussed in Ref. Mead et al. (2011), allows us to improve the performance of the RNE, so that it can remove most of the -dependent variance. Finally, in Appendix C we discuss how our estimator can be rewritten in terms of real-space quantities, so as to be computationally efficient.
V Properties of the realization-normalized estimator
To simplify the calculations in this section we work in the flat-sky approximation. Although the results presented will differ when compared to full-sky calculations, since we are interested in calculating the fractional reduction of the -dependent variance (relative to the -dependent variance of the standard MVNH estimator), we expect the differences to be small.
To quantify the statistical properties of the RNE we must calculate the mean and variance of the ratios and . To do this we will use an approximate formula for the variance of the ratio of two stochastic variables. Let and be two stochastic variables with means and , variances and , and covariance . We wish to calculate the mean and variance of
The probability density function (PDF) of may be computed analytically if and are normally distributed Fieller (1932). However, the formula for the PDF is quite complicated and it does not immediately yield analytic formulae for the mean, , and variance . Instead, we will derive an approximate expression. To do this we write and . We then assume ; this will be true near the peak of the normal distribution if (we have verified that the stochastic quantities discussed we are interested in satisfy this condition). We can write
With this we can easily compute
v.1 The mean
We wish to show that . To leading order in in the flat-sky approximation,
The fact that requires that the Wick contraction which contains does not contribute to the mean. Therefore we are left with
Now to calculate the equivalent expression for . Again, as with only the ‘cross’ terms survive and we have
so we have that . With this we can conclude that the RNE is unbiased.
v.2 The variance
The variance of the first term can be approximated by
The variance of the second term can be written as
A tedious yet straight-forward calculation shows that the variance of the RNE is composed of four terms (we show the details of the calculation in Appendix A). Of those four terms one dominates in the limit so that we have