Bayesian power spectrum inference for large scale structure data
Abstract
We describe an exact, flexible, and computationally efficient algorithm for joint estimation of the large scale structure and its power spectrum, building on a Gibbs sampling framework, and present its implementation ARES (Algorithm for REconstruction and Sampling). ARES is designed to reconstruct the 3D power spectrum together with the underlying dark matter density field in a Bayesian framework, under the reasonable assumption that the long wavelength Fourier components are Gaussian distributed. As a result ARES does not only provide a single estimate but samples from the joint posterior of the power spectrum and density field conditional on a set of observations. This enables us to calculate any desired statistical summary; in particular we are able to provide joint uncertainty estimates. We apply our method to mock catalogs with highly structured observational masks and selection functions, in order to demonstrate its ability to reconstruct the power spectrum from real data sets, while fully accounting for any mask induced mode coupling.
keywords:
large scale – reconstruction – Bayesian inference – cosmology – observations – methods – numerical

1 Introduction
Throughout cosmic history a wealth of information on the origin and evolution of our Universe has been imprinted on the large scale structure via the gravitational amplification of primordial density perturbations. Harvesting this information from probes of the large scale structure, such as large galaxy surveys, therefore is an important scientific task to further our knowledge and to establish a conclusive cosmological picture. In recent years great advances have been made, both in retrieving huge amounts of data and in increasing sensitivity in galaxy redshift surveys. Especially the recent galaxy surveys, the 2dF Galaxy Redshift Survey (Colless et al., 2001) and the Sloan Digital Sky Survey (Abazajian et al., 2008), provide sufficient redshifts to probe the 3D galaxy distribution on large scales. In particular, the two point statistics of the matter distribution contains valuable information to test standard inflation and cosmological models, which describe the origin and evolution of all observed structure in the Universe. Measuring the power spectrum from galaxy observations therefore has always attracted great interest. Precise determination of the overall shape of the power spectrum can for instance place important constraints on neutrino masses, help to identify the primordial power spectrum, and break degeneracies for cosmological parameter estimation from CMB data (e.g. Hu et al., 1998; Spergel et al., 2003; Hannestad, 2003; Efstathiou et al., 2002; Percival et al., 2002; Verde et al., 2003). In addition, several characteristic length scales have been imprinted on the matter distribution throughout cosmic history, which can serve as new standard rulers to measure the Universe. A prominent example of these length scales is the sound horizon, which yields oscillatory features in the power spectrum, the so-called baryon acoustic oscillations (BAO) (e.g. Silk, 1968; Peebles & Yu, 1970; Sunyaev & Zeldovich, 1970).
Since the physics governing these oscillatory features is well understood, precise measurements of the BAO will allow us to establish a new, precise standard ruler to measure the Universe through the distance redshift relation (Blake & Glazebrook, 2003; Seo & Eisenstein, 2003). Precision analysis of large scale structure data therefore is a crucial step in developing a conclusive cosmological theory.
Unfortunately, contact between theory and observations cannot be made directly, since observational data is subject to a variety of systematic effects and statistical uncertainties. Such systematics and uncertainties arise either from the observational strategy or are due to intrinsic clustering behavior of the galaxy sample itself (Sánchez & Cole, 2008). Some of the most prominent uncertainties and systematics are:

– survey geometry and selection effects
– close pair incompleteness due to fiber collisions
– galaxy biases
– redshift space distortions
The details of galaxy clustering, and how galaxies trace the underlying density field, are in general very complicated. The bias between galaxies and mass density is most likely nonlinear and stochastic, so that the estimated galaxy spectrum is expected to differ from that of the mass (Dekel & Lahav, 1999). Even in the limit where a linear bias could be employed, it still differs for different classes of galaxies (see e.g. Cole et al., 2005). In addition, the apparent density field, obtained from redshift surveys, will generally be distorted along the line of sight due to the existence of peculiar velocities.
However, the main cause of the systematic uncertainties in large scale power spectrum estimation is the treatment of the survey geometry (Tegmark, 1995; Ballinger et al., 1995). Due to the survey geometry the raw power spectrum yields an expectation value which is the true cosmic power spectrum convolved with the survey mask (Cole et al., 2005). This convolution causes an overall distortion of the power spectrum shape, and drastically decreases the visibility of the baryonic features.
The problems mentioned above have been discussed extensively in the literature, and many different approaches to power spectrum analysis have been proposed. Some of the main techniques to recover the power spectrum from galaxy surveys are Fourier transform based, such as the optimal weighting scheme, which assigns a weight to the galaxy fluctuation field in order to reduce the error in the estimated power (see e.g. Feldman et al., 1994; Tegmark, 1995; Hamilton, 1997a; Yamamoto, 2003; Percival et al., 2004). Alternative methods rely on Karhunen-Loève decompositions (Tegmark et al., 1997; Tegmark et al., 2004; Pope et al., 2004) or decompositions into spherical harmonics, which are especially suited to address the problem of redshift space distortions (Fisher et al., 1994; Heavens & Taylor, 1995; Tadros et al., 1999; Percival et al., 2004; Percival, 2005). In addition to these deconvolution methods there exists a variety of likelihood methods to estimate the real space power spectrum (Ballinger et al., 1995; Hamilton, 1997a, b; Tadros et al., 1999; Percival, 2005). In order to provide not just the maximum likelihood estimate but also conditional errors, Percival (2005) proposed a Markov Chain Monte Carlo method to map out the likelihood surface.
Nevertheless, as the precision of large scale structure experiments has improved, the requirements on the control and characterization of systematic effects, as discussed above, also steadily increase. It is of critical importance to properly propagate the uncertainties caused by these effects through to the matter power spectrum and cosmological parameter estimates, in order not to underestimate the final uncertainties and thereby draw incorrect conclusions on the cosmological model.
We therefore developed a new Bayesian approach to extract information on the two point statistics from a given large scale structure dataset. We prefer Bayesian methods to conventional likelihood methods, as they yield more general and profound statements about measurements. For example, conventional likelihood methods can only answer questions of the form: "Given the true value of a signal, what is the probability distribution of the measured values?" A Bayesian method, on the other hand, answers questions of the type: "Given the observations, what is the probability distribution of the true underlying signal?" For this reason, Bayesian statistics answers the underlying question of every measurement problem, namely how to estimate the true value of the signal from observations, while conventional likelihood methods do not (Michel & Kirchhoff, 1999). Since the result of any Bayesian method is a complete probability distribution, it permits fully global analyses, taking into account all systematic effects and statistical uncertainties. In particular, here, we aim at evaluating the power spectrum posterior distribution $P(\{P(k_m)\}|\{d_i\})$, with $P(k_m)$ being the power spectrum coefficient of the $m$th mode and $d_i$ being an observation at position $\vec{x}_i$ in three dimensional configuration space. This probability distribution would then contain all information on the two point statistics supported by the data. In order to explore this posterior distribution we employ a Gibbs sampling method, previously applied to CMB data analysis (see e.g. Wandelt, 2004; Eriksen et al., 2004; Jewell et al., 2004).
Since direct sampling from $P(\{P(k_m)\}|\{d_i\})$ is impossible or at least difficult, these works propose instead to draw samples from the full joint posterior distribution $P(\{P(k_m)\},\{s_i\}|\{d_i\})$ of the power spectrum coefficients $P(k_m)$ and the 3D matter density contrast amplitudes $s_i$ conditional on a given set of data points $\{d_i\}$. This is achieved by iteratively drawing density samples from a Wiener posterior distribution and power spectrum samples via an efficient Gibbs sampling scheme (see figure 1 for an illustration). Here, artificial mode coupling, as introduced by survey geometry and selection function, is resolved by solving the Wiener filtering equation, which naturally regularizes inversions of the observational response operator in unobserved regions. In this fashion, we obtain a set of Monte Carlo samples from the joint posterior, which allows us to compute any desired property of the joint posterior density, with the accuracy only limited by the sample size. In particular, we obtain the power spectrum posterior $P(\{P(k_m)\}|\{d_i\})$ by simply marginalizing the joint posterior over the auxiliary density amplitudes $\{s_i\}$, which is trivially achieved by ignoring the $\{s_i\}$ samples.
The Gibbs sampler also offers unique capabilities for propagating systematic uncertainties end-to-end. Any effect for which we can define a sampling algorithm, either jointly with or conditionally on other quantities, can be propagated seamlessly through to the final posterior.
It is worth noting that our method differs from traditional methods of analyzing galaxy surveys in a fundamental aspect. Traditional methods consider the analysis task as a set of steps, each of which arrives at intermediate outputs which are then fed as inputs to the next step in the pipeline. Our approach is a truly global analysis, in the sense that the statistics of all science products are computed jointly, respecting and exploiting the full statistical dependence structure between the various components.
In this paper we present ARES (Algorithm for REconstruction and Sampling), a computer algorithm to perform a full Bayesian data analysis of 3D redshift surveys. In section 3 we give an introduction to the general idea of the large scale structure Gibbs sampling approach, followed by sections 4 and 5, where we describe and derive in detail the necessary ingredients to sample the 3D density distribution and the power spectrum, respectively. The choice of the prior and its relevance for the cosmic variance are discussed in section 6. Details concerning the numerical implementation are discussed in section 7. We then test ARES thoroughly in section 8, particularly focusing on the treatment of survey masks and selection functions. In section 9 we demonstrate a running median filter, and use it as an example to demonstrate how uncertainties can be propagated to all inferences based on the set of Gibbs samples. Finally we conclude in section 10, by discussing the results of the method and giving an outlook for future extensions and applications of our method.
2 Notation
In this section, we describe the basic notation used throughout this work. Let the quantity $d_i$ be the field amplitude of the three dimensional field $d(\vec{x})$ at position $\vec{x}_i$. Then the index $i$ has to be understood as a multi index, which labels the three components of the position vector:
$$\vec{x}_i = \left[x^{0}_{i},\, x^{1}_{i},\, x^{2}_{i}\right] \qquad (1)$$
where $x^{j}_{i}$ is the $j$th component of the $i$th position vector. Alternatively one can understand the index $i$ as a set of three indices $\{p,q,r\}$, so that for an equidistant grid along the three axes the position vector can be expressed as:
$$\vec{x}_{p,q,r} = \left[p\,\Delta x,\; q\,\Delta y,\; r\,\Delta z\right] \qquad (2)$$
with $\Delta x$, $\Delta y$ and $\Delta z$ being the grid spacing along the three axes. With this definition we obtain:
$$d_i \equiv d_{p,q,r} \qquad (3)$$
Also note that any summation running over the multi index $i$ is defined as the three sums over the three indices $p$, $q$ and $r$:
$$\sum_i \equiv \sum_p \sum_q \sum_r \qquad (4)$$
Further, we will frequently use the notation $\{d_i\}$, which denotes the set of field amplitudes $d_i$ at the different positions $\vec{x}_i$. In particular:
$$\{d_i\} \equiv \{d_0, d_1, d_2, \dots, d_{N-1}\} \qquad (5)$$
where $N$ is the total number of position vectors.
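To make this convention concrete, the flattening of the index triple $\{p,q,r\}$ into a single multi index $i$ (and its inversion) can be sketched as below; the grid dimensions and the row-major ordering are illustrative assumptions, not prescriptions of this work:

```python
import numpy as np

# Hypothetical grid dimensions, chosen only for illustration
n1, n2, n3 = 4, 5, 6

# Flatten the index triple {p, q, r} into a single multi index i
# using row-major (C) ordering, and invert the mapping again.
p, q, r = 2, 3, 1
i = np.ravel_multi_index((p, q, r), (n1, n2, n3))
assert i == (p * n2 + q) * n3 + r        # explicit form of the flattening

# recover the index triple from the flat index
assert np.unravel_index(i, (n1, n2, n3)) == (p, q, r)

# a sum over the multi index i is then the triple sum over p, q and r
field = np.arange(n1 * n2 * n3, dtype=float).reshape(n1, n2, n3)
triple_sum = sum(field[p, q, r]
                 for p in range(n1)
                 for q in range(n2)
                 for r in range(n3))
assert np.isclose(field.sum(), triple_sum)
```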
3 The Large scale structure Gibbs sampler
As already described in the introduction, we seek to sample from the joint posterior distribution $P(\{P(k_m)\},\{s_i\}|\{d_i\})$ of the power spectrum coefficients $P(k_m)$ and the 3D matter density contrast amplitudes $s_i$ given a set of observations $\{d_i\}$.
In principle, this joint posterior distribution could be mapped out over a grid in the multidimensional space of the signal amplitudes $s_i$ and the power spectrum coefficients $P(k_m)$. But since the number of grid points required for such an analysis scales exponentially with the number of free parameters, this approach cannot be realized efficiently. For this reason, we propose a Gibbs sampling approach to this problem.
The theory of Gibbs sampling (Gelfand & Smith, 1990; Tanner, 1996; O'Hagan, 2000) states that if it is possible to sample from the conditional densities $P(\{s_i\}|\{P(k_m)\},\{d_i\})$ and $P(\{P(k_m)\}|\{s_i\},\{d_i\})$, then iterating the following two sampling equations will, after an initial burn-in period, lead to samples from the joint posterior $P(\{s_i\},\{P(k_m)\}|\{d_i\})$:
$$\{s_i\}^{(j+1)} \curvearrowleft P\left(\{s_i\}\,|\,\{P(k_m)\}^{(j)}, \{d_i\}\right) \qquad (6)$$
$$\{P(k_m)\}^{(j+1)} \curvearrowleft P\left(\{P(k_m)\}\,|\,\{s_i\}^{(j+1)}, \{d_i\}\right) \qquad (7)$$
where the symbol $\curvearrowleft$ denotes a random draw from the probability density on its right.
Once a set of samples from $P(\{s_i\},\{P(k_m)\}|\{d_i\})$ has been obtained, the properties of this probability density can be summarized in terms of any preferred statistic, such as its multivariate mean, mode or variance.
As our approach probes the joint distribution, we are able to quantify joint uncertainties of the signal amplitudes and the powerspectrum conditional just on the data. For this reason, the Gibbs sampling approach should not be considered as yet another maximum likelihood technique, although it certainly is able to produce such an estimate.
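The two-step iteration of equations (6) and (7) can be illustrated on a toy problem where both conditionals are known in closed form; the bivariate Gaussian target and all parameter values below are illustrative assumptions, not part of ARES:

```python
import numpy as np

rng = np.random.default_rng(42)
rho = 0.8                    # correlation of the toy joint Gaussian (assumed)
n_samples, burn_in = 20000, 500

x, y = 0.0, 0.0              # arbitrary starting point of the chain
samples = []
for j in range(n_samples + burn_in):
    # step 1: draw x from the conditional P(x | y) = N(rho*y, 1 - rho^2)
    x = rng.normal(rho * y, np.sqrt(1.0 - rho**2))
    # step 2: draw y from the conditional P(y | x) = N(rho*x, 1 - rho^2)
    y = rng.normal(rho * x, np.sqrt(1.0 - rho**2))
    if j >= burn_in:         # discard the initial burn-in period
        samples.append((x, y))

samples = np.asarray(samples)
# after burn-in the pairs are draws from the joint zero-mean Gaussian
# with unit variances and correlation rho
emp_rho = np.corrcoef(samples[:, 0], samples[:, 1])[0, 1]
```

The same structure carries over to the large scale structure problem, with step 1 replaced by the signal sampling of section 4 and step 2 by the power spectrum sampling of section 5.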
4 Sampling the signal maps
Assuming a Gaussian signal posterior distribution $P(\{s_i\}|S,\{d_i\})$, the task of drawing a random signal sample can be split into two steps.
First, we estimate the maximum a posteriori values for the signal amplitudes $s_i$, which in the Gaussian case coincide with the mean values. Then a fluctuation term, consistent with the correct covariance, is added to the mean signal. The sum of the mean and the fluctuation term will then yield a sample from the conditional posterior.
The most challenging procedure in this signal sampling step is to calculate the a posteriori values for the signal amplitudes $s_i$. Assuming a Gaussian posterior directly leads to a Wiener filtering procedure, described below. However, this method requires the inversion of huge matrices consisting of the sum of the inverse signal and inverse noise covariance matrices. The matrix inversion is a numerically very demanding step, and at the same time presents the bottleneck for our method, as it defines the computational speed with which a signal sample can be produced. The efficient implementation of this matrix inversion step, as described by Kitaura & Enßlin (2008), allows for the production of many thousands of samples in computationally feasible times. In the following sections we will describe the details of the signal sampling procedure.
4.1 The Wiener filter
As already described above, the main task of the signal sampling step is to derive the maximum a posteriori values for the signal amplitudes $s_i$. According to Bayes' theorem the conditional signal posterior can be written as the product of a signal prior and a likelihood, normalized by the so-called evidence. Further, here we will use the signal covariance matrix $S$ rather than the power spectrum $P(k)$. It is well known that the power spectrum is just the Fourier transform of the signal covariance in configuration space. Since the Fourier transform is a basis transformation with a unitary transformation matrix, the signal covariance matrix and the power spectrum can be used interchangeably for a normalized Fourier transform (see section 5.1 and Appendix B for more details).
We can therefore write the signal posterior as:
$$P\left(\{s_i\}\,|\,S,\{d_i\}\right) = \frac{P\left(\{s_i\}\,|\,S\right)\, P\left(\{d_i\}\,|\,\{s_i\}, S\right)}{P\left(\{d_i\}\,|\,S\right)} = \frac{P\left(\{s_i\}\,|\,S\right)\, P\left(\{d_i\}\,|\,\{s_i\}\right)}{P\left(\{d_i\}\,|\,S\right)} \qquad (8)$$
where we assume that the data amplitudes $d_i$ are conditionally independent of the signal covariance matrix $S$, once the signal amplitudes $s_i$ are given. Following Bardeen et al. (1986), we describe the signal prior for the large scale matter distribution as a multivariate Gaussian, with zero mean and the signal covariance $S$. We can then write:
$$P\left(\{s_i\}\,|\,S\right) = \frac{1}{\sqrt{\det\left(2\pi S\right)}}\; e^{-\frac{1}{2}\sum_{ij} s_i\, S^{-1}_{ij}\, s_j} \qquad (9)$$
The signal covariance matrix has an especially appealing form in Fourier space. It is well known that in a homogeneous and isotropic universe the Fourier transform of the signal covariance is a diagonal matrix, with the diagonal elements being the power spectrum. Hence, we can express the Fourier representation of the signal covariance as:
$$\hat{S}_{mn} = \delta^{K}_{mn}\, P(k_m) \qquad (10)$$
where the hat denotes a Fourier transform, $\delta^{K}_{mn}$ is the Kronecker delta and $P(k_m)$ is the power spectrum coefficient at the Fourier mode $\vec{k}_m$ in three dimensional pixel space (see e.g. Padmanabhan, 1993; Lahav & Suto, 2004).
The choice of the Gaussian prior can be justified by inflationary theories, which predict the matter field amplitudes to be Gaussian distributed in the linear regime on large scales (Peacock & Dodds, 1994; Percival et al., 2001).
At nonlinear scales the Gaussian prior does not represent the full statistical behavior of the matter field anymore. During nonlinear gravitational structure formation the statistics of the initial density field evolves from a Gaussian distribution towards a log-normal distribution, as commonly assumed in the literature (Coles & Jones, 1991; Colombi, 1994; Kayo et al., 2001).
However, note that in this case the Gaussian prior still describes the two point statistics of the underlying density field even in the nonlinear regime. The Gaussian prior should therefore be regarded as our a priori knowledge of the matter distribution, which is formulated only up to the two point statistics. Next, we discuss the likelihood given in equation (8).
As we seek to recover the maximum a posteriori signal $s_i$ from the set of observations $\{d_i\}$, we must assume a model which relates these two quantities. The most straightforward data model is linear, and can be written as:
$$d_i = \sum_j R_{ij}\, s_j + \epsilon_i \qquad (11)$$
where $R_{ij}$ is an observation response operator and $\epsilon_i$ is an additive noise contribution, which will be defined in more detail in the next section. If we assume the noise to be Gaussian distributed, with zero mean and covariance $N$, we can express the likelihood as:
$$P\left(\{d_i\}\,|\,\{s_i\}\right) = \frac{1}{\sqrt{\det\left(2\pi N\right)}}\; e^{-\frac{1}{2}\sum_{ij}\left(d_i - \sum_k R_{ik} s_k\right) N^{-1}_{ij} \left(d_j - \sum_l R_{jl} s_l\right)} \qquad (12)$$
where we simply inserted the data model given in equation (11) into the Gaussian noise distribution.
With these definitions the signal posterior is a multivariate Gaussian distribution and can be written as:
$$P\left(\{s_i\}\,|\,S,\{d_i\}\right) \propto e^{-\frac{1}{2}\sum_{ij} s_i\, S^{-1}_{ij}\, s_j}\; e^{-\frac{1}{2}\sum_{ij}\left(d_i - \sum_k R_{ik} s_k\right) N^{-1}_{ij} \left(d_j - \sum_l R_{jl} s_l\right)} \qquad (13)$$
where we omitted the normalization factor, which is of no interest in the following.
Note that omitting the normalization of the likelihood requires that the additive noise term is independent of any signal contribution, as otherwise the noise covariance matrix would carry signal information and could not be neglected in the following. This assumption, however, is in general not true for the Poissonian noise, as described in the next section.
The maximum of this signal posterior can then easily be found by either completing the square in the exponent of equation (13), or by extremizing with respect to the signal amplitudes $s_i$. The latter approach allows us to directly read off the Wiener filter equation from equation (13), by simply differentiating the exponent with respect to $s_i$ and setting the result to zero. The result is the famous Wiener filter equation, which is given as:
$$\sum_j \left(S^{-1} + R^{T} N^{-1} R\right)_{ij}\, m_j = \sum_j \left(R^{T} N^{-1}\right)_{ij}\, d_j \qquad (14)$$
where we denoted the variable $m_j$ as a Wiener mean signal amplitude, to clarify that this reconstruction is the mean and not a typical sample of the distribution. The solution of this equation requires the inversion of the matrix:
$$D^{-1} = S^{-1} + R^{T} N^{-1} R \qquad (15)$$
which leads to the solution for the signal amplitudes
$$m_i = \sum_j \left(D\, R^{T} N^{-1}\right)_{ij}\, d_j = \sum_j \left[\left(S^{-1} + R^{T} N^{-1} R\right)^{-1} R^{T} N^{-1}\right]_{ij}\, d_j \qquad (16)$$
This result demonstrates that estimating the maximum a posteriori values for the signal amplitudes involves inversions of the Wiener filter operator $D^{-1}$. Therefore, the signal sampling operation is by far the most demanding step of our Gibbs sampling scheme, as it requires the solution of a very large linear system.
Formally speaking, in practice, this corresponds to inverting matrices whose rank equals the number of volume elements of the analysis grid, which clearly is not computationally feasible through brute-force methods. For example, matrix inversion algorithms based on usual linear algebra methods have a numerically prohibitive $\mathcal{O}(N^3)$ scaling, in order to transform to the eigenspace of the system, which bars sampling from the signal posterior.
This is the situation in which Kitaura & Enßlin (2008) proposed a particular operator based inversion technique to allow for computationally efficient calculation of the Wiener filter equation in three dimensional space.
In this implementation, the system of equations (16) can be solved by means of conjugate gradients (CGs). The computational cost of this method is thus reduced to that of the most expensive step in applying the operator on the right-hand side of the equations, which in our case is the fast Fourier transform, which scales as $\mathcal{O}(N \log N)$.
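The operator-based strategy can be sketched in a one-dimensional toy setting: the Wiener filter system of equation (16) is solved by conjugate gradients, with the inverse signal covariance applied in Fourier space and the mask and inverse noise covariance applied in configuration space, so the full matrix is never constructed. The power spectrum, mask, and noise level below are assumptions for illustration only:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 256
k = np.fft.fftfreq(n) * n
P = 1.0 / (1.0 + np.abs(k)) ** 2               # toy power spectrum (assumed)
P[0] = P[1]                                    # regularize the zero mode
sigma2 = 0.05                                  # noise variance (assumed)
mask = (np.arange(n) % 5 != 0).astype(float)   # toy survey mask

# Gaussian "true" signal and masked, noisy data d = R s + eps
s_true = np.fft.ifft(np.sqrt(P) * np.fft.fft(rng.standard_normal(n))).real
d = mask * (s_true + np.sqrt(sigma2) * rng.standard_normal(n))

def apply_D_inv(x):
    """Apply (S^-1 + R^T N^-1 R) to x without ever forming the matrix."""
    s_inv_x = np.fft.ifft(np.fft.fft(x) / P).real   # S^-1 via FFT, O(n log n)
    return s_inv_x + mask * x / sigma2              # + R^T N^-1 R, diagonal

def conjugate_gradients(apply_A, b, tol=1e-8, maxiter=4000):
    """Plain conjugate gradient solver for a symmetric positive definite
    operator given only as a matrix-vector product."""
    x = np.zeros_like(b)
    r = b - apply_A(x)
    p = r.copy()
    rs = r @ r
    for _ in range(maxiter):
        Ap = apply_A(p)
        alpha = rs / (p @ Ap)
        x += alpha * p
        r -= alpha * Ap
        rs_new = r @ r
        if np.sqrt(rs_new) < tol:
            break
        p = r + (rs_new / rs) * p
        rs = rs_new
    return x

# Wiener mean of equation (16): solve (S^-1 + R^T N^-1 R) m = R^T N^-1 d
m = conjugate_gradients(apply_D_inv, mask * d / sigma2)
residual = np.linalg.norm(apply_D_inv(m) - mask * d / sigma2)
```

Replacing the one-dimensional FFT by a three-dimensional one and the uniform noise variance by a structured noise model gives the structure of the actual reconstruction step.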
4.2 The galaxy data model
In order to adapt the Wiener filter procedure for the specific application to galaxy observations, we are going to present the galaxy data model together with the corresponding Poissonian noise covariance matrix.
It is possible to describe the observed galaxy distribution as a realization of an inhomogeneous Poissonian process (Martínez & Saar, 2002). We can therefore assume the observed galaxy number $N^{\rm obs}_i$ at position $\vec{x}_i$ in three dimensional configuration space to be drawn from a Poissonian distribution (Martínez & Saar, 2002; Kitaura et al., 2009):
$$N^{\rm obs}_i \curvearrowleft P\left(N^{\rm obs}_i\,|\,\lambda_i\right) = \frac{\lambda_i^{N^{\rm obs}_i}\, e^{-\lambda_i}}{N^{\rm obs}_i!} \qquad (17)$$
where the arrow denotes a random draw from the probability distribution and $\lambda_i$ is the mean observable galaxy number at position $\vec{x}_i$. We can then write the observed galaxy numbers at the discrete positions as:
$$N^{\rm obs}_i = \lambda_i + \epsilon_i \qquad (18)$$
where the noise term $\epsilon_i$ denotes the difference between the observed galaxy number and the mean observable galaxy number. The Poissonian noise covariance matrix can then easily be obtained as:
$$N_{ij} = \left\langle \epsilon_i\, \epsilon_j \right\rangle = \delta^{K}_{ij}\, \lambda_i \qquad (19)$$
where we simply calculated the Poissonian variance for the observed galaxy number, assuming the galaxies to be independent and identically distributed (i.i.d.). The mean observable galaxy number $\lambda_i$ can be related to the true mean galaxy number $\bar{\lambda}_i$ by applying the observation response operator $R_{ij}$ as:
$$\lambda_i = \sum_j R_{ij}\, \bar{\lambda}_j \qquad (20)$$
The true mean galaxy number, on the other hand, can be related to the dark matter over density field, the signal $s_j$, by introducing a physical model in the form of a bias operator $B_{ij}$, e.g. a scale dependent bias:
$$\bar{\lambda}_i = \bar{N}\left(1 + \sum_j B_{ij}\, s_j\right) \qquad (21)$$
where $\bar{N}$ is the mean galaxy number.
By inserting equations (20) and (21) into equation (18) and applying trivial algebraic conversions, we obtain the data model:
$$N^{\rm obs}_i = \sum_j R_{ij}\, \bar{N} + \sum_{jk} R_{ij}\, \bar{N}\, B_{jk}\, s_k + \epsilon_i \qquad (22)$$
For the case of galaxy redshift surveys the response operator is the product of the sky mask and the selection function, which are both local in configuration space, and hence the response operator becomes:
$$R_{ij} = \delta^{K}_{ij}\, M_i\, F_i \qquad (23)$$
where $M_i$ is the value of the sky mask and $F_i$ is the value of the selection function at position $\vec{x}_i$. We therefore arrive at the data model already described in equation (11), which reads:
$$d_i \equiv N^{\rm obs}_i - M_i\, F_i\, \bar{N} = \sum_j \tilde{R}_{ij}\, s_j + \epsilon_i \qquad (24)$$
where we introduced the effective observation response operator $\tilde{R}_{ij} = M_i\, F_i\, \bar{N}\, B_{ij}$ and the noise contribution $\epsilon_i$. This is the galaxy data model which we derived from the assumption of the Poissonian distribution of galaxies.
The Wiener filter operator requires the definition of the noise covariance matrix $N_{ij}$, which for the Poissonian noise can be expressed as:
$$N_{ij} = \delta^{K}_{ij}\, \lambda_i \qquad (25)$$
where we used the Poissonian noise covariance matrix given in equation (19).
However, introducing equations (20) and (21) yields the noise covariance matrix:
$$N_{ij} = \delta^{K}_{ij}\, M_i\, F_i\, \bar{N}\left(1 + \sum_k B_{ik}\, s_k\right) \qquad (26)$$
which immediately reveals that there is a correlation between the underlying signal amplitudes and the level of shot noise produced by the discrete distribution of galaxies (see e.g. Seljak, 1998).
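The signal dependence of the shot noise expressed by equation (26) can be verified with a short numerical experiment: for a Poisson process the noise variance equals the local mean, and hence tracks the underlying density contrast. The mean galaxy number and contrast values below are arbitrary illustrative choices:

```python
import numpy as np

rng = np.random.default_rng(7)
n_draws = 200_000
nbar = 8.0                          # mean galaxy number per cell (assumed)

for delta in (-0.5, 0.0, 1.0):      # example values of the density contrast
    lam = nbar * (1.0 + delta)      # local Poisson mean, cf. equation (26)
    n_obs = rng.poisson(lam, size=n_draws)
    eps = n_obs - lam               # noise term of equation (18)
    # for a Poisson process the noise variance equals the local mean,
    # so the shot noise level depends on the underlying signal
    assert abs(eps.mean()) < 0.05
    assert abs(eps.var() / lam - 1.0) < 0.05
```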
Nevertheless, as pointed out in the previous section, the Wiener filter relies on the fact that the additive noise contribution is uncorrelated with the signal. Hence, we have to assume the noise covariance to be uncorrelated with the signal, although it may have some structure.
Therefore, we provide two approaches to effectively approximate the noise covariance matrix given in equation (26).
In the first approach we calculate an effective noise covariance matrix by averaging the noise covariance matrix given in equation (26) over the signal. We then obtain:
$$\bar{N}_{ij} = \left\langle N_{ij} \right\rangle_{\{s\}} = \delta^{K}_{ij}\, M_i\, F_i\, \bar{N} \qquad (27)$$
where we used the fact that the ensemble mean of the signal amplitudes $s_i$ for the density contrast vanishes. Note that this model also arises when pursuing a least squares approach to matter field reconstructions rather than the Bayesian approach described in this work (for details see Kitaura & Enßlin, 2008).
In the other approach we introduce a noise structure function given as:
(28) 
With this definition the noise is approximated as being uncorrelated with the signal, but nonuniform. The noise covariance matrix then reads:
$$N_{ij} = \delta^{K}_{ij}\, \lambda_i \qquad (29)$$
In order to use this noise structure function we have to estimate $\lambda_i$ from the observed galaxy numbers $N^{\rm obs}_i$. By applying Bayes' theorem to the Poissonian distribution given in relation (17) we obtain:
$$P\left(\lambda_i\,|\,N^{\rm obs}_i\right) = \frac{P\left(N^{\rm obs}_i\,|\,\lambda_i\right)\, P\left(\lambda_i\right)}{P\left(N^{\rm obs}_i\right)} \qquad (30)$$
In the absence of any further a priori knowledge on $\lambda_i$ we assume a flat prior and search for the maximum of:
$$P\left(\lambda_i\,|\,N^{\rm obs}_i\right) = \frac{\lambda_i^{N^{\rm obs}_i}\, e^{-\lambda_i}}{N^{\rm obs}_i!} \qquad (31)$$
which is normalized to yield unity when integrated over all $\lambda_i$.
The noise structure function can then be estimated by searching for the most likely value of $\lambda_i$ from equation (31). This yields:
$$\lambda_i = N^{\rm obs}_i \qquad (32)$$
Another estimator for $\lambda_i$ is based on evaluating the mean of the probability distribution given in equation (31). The ensemble mean is calculated as:
$$\left\langle \lambda_i \right\rangle = \int_0^{\infty} \mathrm{d}\lambda_i\; \lambda_i\, \frac{\lambda_i^{N^{\rm obs}_i}\, e^{-\lambda_i}}{N^{\rm obs}_i!} = N^{\rm obs}_i + 1 \qquad (33)$$
In this case the noise structure function can be written as:
$$\lambda_i = N^{\rm obs}_i + 1 \qquad (34)$$
A more thorough discussion on Poissonian noise models and their numerical implications for matter field reconstructions can be found in Kitaura et al. (2009).
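The two estimators of equations (32) and (34) are easy to verify numerically: with a flat prior the posterior of equation (31) peaks at $\lambda_i = N^{\rm obs}_i$, while its mean is $N^{\rm obs}_i + 1$. A minimal sketch, with an arbitrary example count:

```python
import math
import numpy as np

n_obs = 4                                  # example observed galaxy count
lam = np.linspace(0.0, 60.0, 600_001)      # fine integration grid for lambda
posterior = lam**n_obs * np.exp(-lam) / math.factorial(n_obs)

dx = lam[1] - lam[0]
norm = posterior.sum() * dx                # flat-prior posterior integrates to 1
mode = lam[np.argmax(posterior)]           # maximum, cf. equation (32)
mean = (lam * posterior).sum() * dx        # posterior mean, cf. equation (33)
```

With `n_obs = 4` the numerical mode comes out at 4 and the numerical mean at 5, matching equations (32) and (34).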
4.3 Drawing signal samples
In the previous sections we described the Wiener filter and the galaxy data model, which are required to estimate the mean of the signal posterior. However, this mean signal is not yet a sample from the signal posterior, nor does it represent a physical density field, as it lacks power in the low signal to noise regions. To create a true sample from the signal posterior, one must therefore add a fluctuation term $y_i$, which compensates the power lost due to noise filtering. The signal sample can then be written as the sum of the signal mean and a fluctuation term:
$$s_i = m_i + y_i \qquad (35)$$
In our approach we realize the fluctuation term by generating a mock signal $\hat{s}_i$ and a mock observation $\hat{d}_i$ consistent with the data model given in equation (24). This kind of mock observation generation is well known in the literature and has been applied to various scientific applications, as for instance the generation of constrained initial conditions for N-body simulations (see e.g. Bertschinger, 1987; Hoffman & Ribak, 1991; Ganon & Hoffman, 1993; Kitaura & Enßlin, 2008). The fluctuation term can then simply be calculated as:
$$y_i = \hat{s}_i - \sum_j \left(D\, R^{T} N^{-1}\right)_{ij}\, \hat{d}_j \qquad (36)$$
The interpretation of this equation is simple. In the high signal to noise regime, the Wiener filter is nearly a pass-through operator, meaning the reconstructed signal is nearly identical to the true underlying signal. Therefore, as the variance is low, the fluctuation term tends towards zero. In the low signal to noise regime, on the other hand, the Wiener filter will block, and no signal can be reconstructed. The fluctuation term will therefore be nearly identical to the underlying mock signal $\hat{s}_i$.
In this fashion we add the correct power to the Wiener mean reconstruction. The effect of adding the fluctuation term to the Wiener mean is presented in figure 2, where we see the Wiener mean reconstruction, the fluctuation term and the sum of both.
The mock data is generated to obey the data model described in equation (24) and the Wiener variance.
We therefore first draw a mock signal $\hat{s}_i$ with correct statistics from the multivariate Gaussian signal prior given in equation (9). Such a mock signal is best generated in Fourier space, following the description of Martel (2005). One first draws two Gaussian random numbers, $\chi_1$ and $\chi_2$, with zero mean and unit variance, and then calculates the real and imaginary parts of the signal in Fourier space as:
$$\mathrm{Re}\left(\hat{s}_m\right) = \sqrt{\frac{P(k_m)}{2}}\; \chi_1\,, \qquad \mathrm{Im}\left(\hat{s}_m\right) = \sqrt{\frac{P(k_m)}{2}}\; \chi_2 \qquad (37)$$
where $P(k_m)$ is the power spectrum coefficient at the $m$th position in Fourier space. Note that the mock signal is supposed to be a real quantity, and therefore hermiticity has to be imposed in Fourier space before performing the inverse Fourier transform (for details see Martel, 2005).
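A numerically convenient route to such a mock signal, equivalent in distribution to the prescription of equation (37), is to filter real white noise in Fourier space; the required hermitian symmetry is then inherited automatically. The one-dimensional grid and toy power spectrum below are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(3)
n = 256                    # 1D grid size, for illustration only
n_real = 400               # number of mock realizations

k = np.fft.fftfreq(n) * n
P = 1.0 / (1.0 + np.abs(k)) ** 2    # toy power spectrum (assumed)
P[0] = 0.0                          # no power in the mean mode

variances = []
for _ in range(n_real):
    # filtering real white noise in Fourier space imposes the required
    # hermitian symmetry automatically, so the mock signal comes out real
    white = rng.standard_normal(n)
    s_hat = np.sqrt(P) * np.fft.fft(white)
    s_mock = np.fft.ifft(s_hat).real
    variances.append(s_mock.var())

# the field variance must match the power spectrum integral:
# Var(s) = (1/n) * sum_k P(k) for this FFT convention
expected = P.sum() / n
measured = np.mean(variances)
```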
Next we have to generate the additive noise contribution. In order to draw a noise term with the correct Poissonian statistics, we first draw a random number $\hat{N}_i$ from the Poissonian distribution:
$$\hat{N}_i \curvearrowleft \frac{\lambda_i^{\hat{N}_i}\, e^{-\lambda_i}}{\hat{N}_i!} \qquad (38)$$
where we choose the mean observed galaxy number to be $\lambda_i = M_i\, F_i\, \bar{N}$. According to equations (18) and (24) the mock noise term can be calculated as:
$$\hat{\epsilon}_i = \hat{N}_i - M_i\, F_i\, \bar{N} \qquad (39)$$
It is clear by construction that this mock noise term has vanishing mean and the correct noise covariance matrix. Then, according to equation (24), the mock observation is given as:
$$\hat{d}_i = \sum_j \tilde{R}_{ij}\, \hat{s}_j + \hat{\epsilon}_i \qquad (40)$$
The proof that the fluctuation term as generated by equation (36) truly generates the correct variance is given in Appendix C.
Note that the application of the Wiener operator is a linear operation, and we can therefore rewrite equation (35) as:
$$s_i = \hat{s}_i + \sum_j \left(D\, R^{T} N^{-1}\right)_{ij} \left(d_j - \hat{d}_j\right) \qquad (41)$$
where the Wiener operator is applied to the true data and the mock observation simultaneously. This greatly reduces the CPU time required for the generation of one signal sample.
5 Sampling the Power Spectrum
As described above, the signal sampling step provides a noiseless full sky signal sample consistent with the data. The next step in the Gibbs sampling iteration scheme requires drawing power spectrum samples from the conditional probability distribution $P(\{P(k_m)\}|\{s_i\},\{d_i\})$. Since in this Gibbs sampling step the perfect sky signal amplitudes $s_i$ are known, the power spectrum is conditionally independent of the data amplitudes $d_i$. Hence, in this Gibbs sampling step, we can sample the power spectrum from the probability distribution $P(\{P(k_m)\}|\{s_i\})$. In the following we will show that the power spectrum can easily be drawn from an inverse gamma distribution.
5.1 Drawing power spectrum samples
According to Bayes' theorem, we can rewrite the conditional probability as:
$$P\left(S\,|\,\{s_i\}\right) = \frac{P\left(S\right)\, P\left(\{s_i\}\,|\,S\right)}{P\left(\{s_i\}\right)} \qquad (42)$$
where $P(S)$ is the prior for the signal covariance, $P(\{s_i\}|S)$ is given by equation (9), and $P(\{s_i\})$ is a normalization constant in this Gibbs sampling step.
More specifically, we are interested in the set of matrix coefficients $S_{ij}$ of the covariance matrix $S$. As already pointed out in section 4.1, the signal covariance matrix of a homogeneous and isotropic universe has an especially appealing form in Fourier space, where it takes a diagonal form. In our application the real space covariance matrix coefficients are related to their Fourier representation $\hat{S}_{mn}$ via the fast Fourier transform, as defined in Appendix A. We can therefore write:
(43)  
Then we can express the conditional distribution for the Fourier signal covariance coefficients $\hat{S}_{mn}$ as:
$$P\left(\{\hat{S}_{mn}\}\,|\,\{s_i\}\right) = J\; P\left(\{S_{ij}\}\,|\,\{s_i\}\right) \qquad (44)$$
where
$$J = \left|\frac{\partial \{S_{ij}\}}{\partial \{\hat{S}_{mn}\}}\right| \qquad (45)$$
is the Jacobian determinant for this coordinate transformation. As the discrete Fourier transform is proportional to a unitary matrix, this Jacobian determinant only amounts to a normalization constant, as has been demonstrated in Appendix B.
With this definition we can rewrite the conditional probability in equation (42), replacing all the real space covariance matrix coefficients $S_{ij}$ by their Fourier representation $\hat{S}_{mn}$, and normalizing it with the constant obtained from the coordinate transformation. We can therefore write:
$$P\left(\{\hat{S}_{mn}\}\,|\,\{\hat{s}_m\}\right) \propto P\left(\{\hat{S}_{mn}\}\right)\, \frac{1}{\sqrt{\det\left(2\pi \hat{S}\right)}}\; e^{-\frac{1}{2}\sum_{mn} \hat{s}^{*}_{m}\, \hat{S}^{-1}_{mn}\, \hat{s}_{n}} \qquad (46)$$
where we used the discrete Fourier transform definition, given in Appendix A, to replace the real space signal amplitudes $s_i$ by their Fourier counterparts $\hat{s}_m$. Introducing equation (10) then allows us to rewrite equation (46) in terms of the power spectrum coefficients $P(k_m)$ as:
$$P\left(\{P(k_m)\}\,|\,\{\hat{s}_m\}\right) \propto P\left(\{P(k_m)\}\right)\, \prod_m \frac{1}{\sqrt{2\pi P(k_m)}}\; e^{-\frac{1}{2}\,\frac{|\hat{s}_m|^2}{P(k_m)}}$$
where the determinant factorizes due to the diagonal form of the signal covariance matrix in Fourier space.
Note that due to isotropy the powerspectrum is independent of direction in Fourier space, meaning the powerspectrum coefficients depend only on the modulus of the mode vector $\vec{k}$:
(47) $P(\vec{k}) = P(|\vec{k}|) = P(k)$
For this reason, the angular dependence in Fourier space can be summed over.
To do so we remark that the mode vector $\vec{k}$, as a geometrical object, does not change whether we express it in the basis of cartesian coordinates $(k_x,k_y,k_z)$ or in the basis of spherical coordinates $(k,\varphi,\vartheta)$. We can therefore split the multi index summation in the exponent into a sum over spherical shells as:
(48) $\sum_{\vec{k}}\frac{\hat{s}_{\vec{k}}\,\hat{s}^{*}_{\vec{k}}}{P(\vec{k})} = \sum_{m}\frac{1}{P(k_m)}\sum_{|\vec{k}|=k_m}\hat{s}_{\vec{k}}\,\hat{s}^{*}_{\vec{k}} = \sum_{m}\frac{\sigma_m}{P(k_m)}$
where we introduced $\sigma_m = \sum_{|\vec{k}|=k_m}\hat{s}_{\vec{k}}\,\hat{s}^{*}_{\vec{k}}$, which is the summed signal power on spherical shells around the origin in Fourier space, and the index $m$ labels each of the shells belonging to the different mode vector moduli $k_m$ in the Fourier box.
Several different mode vectors $\vec{k}$ may have the same vector modulus $|\vec{k}| = k_m$, and therefore belong to the same shell. To account for this we introduce the number $n_m$, which counts the number of different mode vectors $\vec{k}$ belonging to the $m$th shell in Fourier space. This number $n_m$ therefore counts the degrees of freedom for each of the modes $k_m$. We can then express the product in equation (46) in terms of $n_m$ as:
(49) $\prod_{\vec{k}}\frac{1}{\sqrt{2\pi C\,P(\vec{k})}} = \prod_{m}\left(2\pi C\,P(k_m)\right)^{-\frac{n_m}{2}}$
With these definitions equation (46) turns into:
(50) $\mathcal{P}(\{P(k_m)\}|\{\hat{s}_k\}) \propto \mathcal{P}(\{P(k_m)\})\,\prod_{m}\left(2\pi C\,P(k_m)\right)^{-\frac{n_m}{2}}\,\mathrm{e}^{-\frac{\sigma_m}{2\,C\,P(k_m)}}$
When ignoring the powerspectrum prior in the above equation (50), we see that the probability distribution factorizes in the different $P(k_m)$, meaning they could be sampled independently.
If the prior for the different $P(k_m)$ also factorizes as:
(51) $\mathcal{P}(\{P(k_m)\}) = \prod_{m}\mathcal{P}(P(k_m))$
then it is possible to sample each mode of the powerspectrum independently.
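The shell quantities $\sigma_m$ and $n_m$ introduced above are simple sums over a gridded Fourier field. The following sketch illustrates how they might be computed with NumPy for a cubic box; the function name `shell_bin` and the choice of equally spaced shell edges in grid units are our own illustrative assumptions, not part of the ARES implementation.

```python
import numpy as np

def shell_bin(s_hat, n_bins):
    """Bin the squared Fourier amplitudes |s_k|^2 into spherical shells.

    Returns sigma[m], the summed signal power on shell m, and n[m],
    the number of mode vectors belonging to shell m, for a cubic grid
    of Fourier amplitudes s_hat.  Mode vectors are taken in FFT grid
    units; illustrative sketch only.
    """
    n_grid = s_hat.shape[0]
    # FFT frequencies in grid units along each axis
    freq = np.fft.fftfreq(n_grid) * n_grid
    kx, ky, kz = np.meshgrid(freq, freq, freq, indexing="ij")
    k_mod = np.sqrt(kx**2 + ky**2 + kz**2)
    # Assign every mode to a shell by its modulus |k| (hypothetical
    # equal-width binning; any shell definition could be used here)
    edges = np.linspace(0.0, k_mod.max() + 1e-9, n_bins + 1)
    shell = np.digitize(k_mod.ravel(), edges) - 1
    power = (np.abs(s_hat) ** 2).ravel()
    sigma = np.bincount(shell, weights=power, minlength=n_bins)
    n_modes = np.bincount(shell, minlength=n_bins)
    return sigma, n_modes
```

Since every mode lands in exactly one shell, the $\sigma_m$ sum to the total signal power and the $n_m$ sum to the number of grid cells, which provides a simple consistency check.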
On large scales, or in the linear regime, the theory of gravitational structure formation tells us that the different Fourier modes evolve independently of each other. In these regimes the proposed powerspectrum prior is the adequate choice. However, we also know that nonlinear structure formation couples the different Fourier modes, the more strongly the deeper we reach into the nonlinear regime. In these regimes another prior would be more adequate, but also harder to sample.
In any case, as already described in section 3, the entire powerspectrum sampling method requires two steps. While the different powerspectrum modes are assumed to be independent in the powerspectrum sampling step, they are not in the signal sampling step: there the different modes are coupled via the observation mask and selection function, and furthermore the physical coupling of the different modes is represented in the data.
Therefore, in the following we assume a powerspectrum prior, as proposed in equation (51), and defer a more thorough investigation of adequate prior choices in the nonlinear regime to future work.
With this prior choice each mode $P(k_m)$ can be sampled independently from the following probability density distribution:
(52) $\mathcal{P}(P(k_m)|\{\hat{s}_k\}) \propto \mathcal{P}(P(k_m))\,\left(2\pi C\,P(k_m)\right)^{-\frac{n_m}{2}}\,\mathrm{e}^{-\frac{\sigma_m}{2\,C\,P(k_m)}}$
Further, we will assume a powerlaw behavior for the individual mode prior, $\mathcal{P}(P(k_m)) \propto P(k_m)^{-\alpha}$, where $\alpha$ is a power law index. Note that the powerlaw index $\alpha = 0$ describes the flat prior, while $\alpha = 1$ amounts to the Jeffreys prior. The Jeffreys prior is a solution to a measure invariant scale transformation of the form $\mathcal{P}(P(k_m))\,\mathrm{d}P(k_m) = \mathcal{P}(\gamma\,P(k_m))\,\mathrm{d}\left(\gamma\,P(k_m)\right)$ (Wandelt, 2004), and therefore is a scale independent prior, as different scales have the same probability.
Inserting this power law prior into equation (52) and imposing the correct normalization reveals that the powerspectrum coefficients $P(k_m)$ have to be sampled from an inverse gamma distribution given as:
(53) $\mathcal{P}(P(k_m)|\{\hat{s}_k\}) = \frac{\left(\frac{\sigma_m}{2C}\right)^{\frac{n_m}{2}+\alpha-1}}{\Gamma\!\left(\frac{n_m}{2}+\alpha-1\right)}\,P(k_m)^{-\left(\frac{n_m}{2}+\alpha\right)}\,\mathrm{e}^{-\frac{\sigma_m}{2\,C\,P(k_m)}}$
By introducing the new variable $x_m = \frac{\sigma_m}{C\,P(k_m)}$ and performing the change of variables we yield the $\chi^2$-distribution as:
(54) $\mathcal{P}(x_m|\{\hat{s}_k\}) = \frac{x_m^{\frac{\nu_m}{2}-1}\,\mathrm{e}^{-\frac{x_m}{2}}}{2^{\frac{\nu_m}{2}}\,\Gamma\!\left(\frac{\nu_m}{2}\right)}$
where $\nu_m = n_m + 2\alpha - 2$. Sampling the powerspectrum coefficients $P(k_m)$ is now an easy task, as it reduces to drawing random samples from the $\chi^2$-distribution. A random sample from the $\chi^2$-distribution for an integer $\nu_m$ can be drawn as follows.
Let $y_i$ be $\nu_m$ independent, normally distributed random variates with zero mean and unit variance; then:
(55) $z = \sum_{i=1}^{\nu_m} y_i^2 = |\tilde{y}|^2$
is $\chi^2$-distributed with $\nu_m$ degrees of freedom, where $\tilde{y}$ is a $\nu_m$-element vector, with each element being normally distributed. The powerspectrum coefficient sample is then obtained by:
(56) $P(k_m) = \frac{\sigma_m}{C\,|\tilde{y}|^2}$
It is easy to see that each spectrum coefficient sample $P(k_m)$ is a positive quantity; this ensures that the signal covariance matrix is positive definite, as it has to be by definition.
To summarize, we provide an optimal estimator for the powerspectrum coefficients and their uncertainties.
It is also worth mentioning that the inverse gamma distribution is a highly non-Gaussian distribution, and that for this reason the joint estimates of signal amplitudes and powerspectrum coefficients are drawn from a non-Gaussian distribution.
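The $\chi^2$ sampling trick of equations (55) and (56) can be sketched in a few lines. The function name `sample_power` and the symbol `c`, standing in for the FFT normalization constant $C$, are our own illustrative choices.

```python
import numpy as np

def sample_power(sigma_m, n_m, alpha=1, c=1.0, rng=None):
    """Draw one powerspectrum sample P(k_m) for a single shell.

    Implements the chi-squared trick of equations (55)-(56): draw
    nu_m = n_m + 2*alpha - 2 unit normal variates y_i, form
    z = sum_i y_i**2 (chi-squared distributed with nu_m degrees of
    freedom), and return P(k_m) = sigma_m / (c * z).  alpha=1
    corresponds to the Jeffreys prior, alpha=0 to the flat prior.
    Illustrative sketch only.
    """
    rng = np.random.default_rng() if rng is None else rng
    nu = n_m + 2 * alpha - 2      # degrees of freedom
    y = rng.normal(size=nu)       # independent N(0, 1) variates
    z = np.sum(y**2)              # chi-squared(nu) variate
    return sigma_m / (c * z)      # inverse-gamma distributed, positive
```

Because $z$ is a sum of squares, every sample is positive by construction, which is exactly the property that keeps the signal covariance matrix positive definite.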
5.2 Blackwell-Rao estimator
As described in the introduction, we seek to estimate the probability distribution $\mathcal{P}(P(k_m)|\{d_i\})$, which we can now simply obtain by marginalizing over the signal samples:
(57) $\mathcal{P}(P(k_m)|\{d_i\}) = \int \mathcal{P}(P(k_m)|\{\hat{s}_k\})\,\mathcal{P}(\{\hat{s}_k\}|\{d_i\})\,\mathrm{d}\{\hat{s}_k\} \approx \frac{1}{N}\sum_{i=1}^{N}\mathcal{P}(P(k_m)|\{\hat{s}_k\}^{(i)})$
where the $\{\hat{s}_k\}^{(i)}$ are the signal Gibbs samples, and $N$ is the total number of Gibbs samples.
This result is known as the Blackwell-Rao estimator of $\mathcal{P}(P(k_m)|\{d_i\})$, which is guaranteed to have a lower variance than a binned estimator (Wandelt, 2004).
It is worth noting that $\mathcal{P}(P(k_m)|\{\hat{s}_k\})$ has a very simple analytic form, and therefore equation (57) provides an analytic approximation to $\mathcal{P}(P(k_m)|\{d_i\})$ based on the Gibbs samples. All the information on $\mathcal{P}(P(k_m)|\{d_i\})$ is therefore contained in the $\sigma_m$ of the individual Gibbs steps, which generate a data set of size $N\,m_{max}$, where $m_{max}$ is the maximal number of independent modes. In addition to being a faithful representation of $\mathcal{P}(P(k_m)|\{d_i\})$, the Blackwell-Rao estimator is also a computationally efficient representation, which allows us to calculate any moment of $P(k_m)$ as:
(58) $\langle P(k_m)^n\rangle \approx \frac{1}{N}\sum_{i=1}^{N}\int\mathrm{d}P(k_m)\,P(k_m)^n\,\mathcal{P}(P(k_m)|\sigma_m^{(i)})$
where each of the terms on the right hand side can be calculated analytically.
For the inverse gamma distribution given in equation (53) we can then simply calculate the mean of the probability distribution as:
(59) $\langle P(k_m)\rangle \approx \frac{1}{N}\sum_{i=1}^{N}\frac{\sigma_m^{(i)}}{2C}\,\frac{1}{\frac{n_m}{2}+\alpha-2}$
and in analogy the variance as:
(60) $\langle P(k_m)^2\rangle - \langle P(k_m)\rangle^2 \approx \frac{1}{N}\sum_{i=1}^{N}\left(\frac{\sigma_m^{(i)}}{2C}\right)^2\frac{1}{\left(\frac{n_m}{2}+\alpha-2\right)\left(\frac{n_m}{2}+\alpha-3\right)} - \langle P(k_m)\rangle^2$
The Blackwell-Rao estimator also allows us to demonstrate another remarkable property of the Gibbs sampling approach. Although a specific powerspectrum prior has to be employed during the Gibbs analysis of the data, a post processing analysis of the powerspectrum can be performed with any desired powerspectrum prior. Let us assume one prefers to perform a post processing analysis with a powerspectrum prior $\mathcal{P}_{new}(\{P(k_m)\})$, rather than with the prior $\mathcal{P}(\{P(k_m)\})$ which was employed during Gibbs sampling. We therefore want to estimate the powerspectrum from the following posterior:
(61) $\mathcal{P}_{new}(\{P(k_m)\}|\{\hat{s}_k\}) = \frac{\mathcal{P}_{new}(\{P(k_m)\})\,\mathcal{P}(\{\hat{s}_k\}|\{P(k_m)\})}{\mathcal{P}_{new}(\{\hat{s}_k\})} \propto \frac{\mathcal{P}_{new}(\{P(k_m)\})}{\mathcal{P}(\{P(k_m)\})}\,\mathcal{P}(\{P(k_m)\}|\{\hat{s}_k\})$
where we simply made use of Bayes’ theorem. Since $\mathcal{P}(\{\hat{s}_k\}|\{P(k_m)\})$ is a simple Gaussian distribution, and therefore given analytically, the posterior $\mathcal{P}_{new}(\{P(k_m)\}|\{\hat{s}_k\})$ can be calculated with any desired powerspectrum prior in a post-processing step.
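In practice this prior swap amounts to importance reweighting the stored analytic posterior on a grid of $P(k)$ values. The sketch below, with the hypothetical function name `reweight_posterior`, assumes all densities are tabulated on a common one-dimensional grid and uses a manual trapezoidal rule for renormalization.

```python
import numpy as np

def reweight_posterior(p_grid, posterior_old, prior_old, prior_new):
    """Re-analyse a powerspectrum posterior under a new prior.

    The posterior obtained under prior_old is converted to the one
    implied by prior_new by multiplying with the prior ratio and
    renormalizing, following the proportionality in equation (61).
    All arguments are 1-d arrays evaluated on p_grid; illustrative
    sketch only.
    """
    weights = prior_new / prior_old
    posterior_new = posterior_old * weights
    # Renormalize with the trapezoidal rule on the grid
    norm = np.sum(0.5 * (posterior_new[1:] + posterior_new[:-1])
                  * np.diff(p_grid))
    return posterior_new / norm
```

Since the Gibbs chain stores the $\sigma_m$ and hence the full analytic conditionals, no resampling of the data is needed for this step.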
6 The prior and the cosmic variance
The Gibbs sampling procedure consists of the two basic steps of first sampling perfect noiseless full sky signal samples $\{s_i\}$ and then sampling the powerspectrum coefficients $P(k_m)$ given $\{s_i\}$.
Therefore, the probability distribution $\mathcal{P}(P(k_m)|\{\hat{s}_k\})$, given in equation (53), encodes our knowledge on the powerspectrum coefficients $P(k_m)$, if we had perfect knowledge of the true signal amplitudes