Methods for Bayesian power spectrum inference with galaxy surveys
We derive and implement a full Bayesian large scale structure inference method aiming at precision recovery of the cosmological power spectrum from galaxy redshift surveys. Our approach improves over previous Bayesian methods by performing a joint inference of the three dimensional density field, the cosmological power spectrum, luminosity dependent galaxy biases and corresponding normalizations. We account for all joint and correlated uncertainties between all inferred quantities. Classes of galaxies with different biases are treated as separate sub samples. The method therefore also allows the combined analysis of more than one galaxy survey. In particular, it solves the problem of inferring the power spectrum from galaxy surveys with non-trivial survey geometries by exploring the joint posterior distribution with efficient implementations of multiple block Markov chain and Hybrid Monte Carlo methods. Our Markov sampler achieves high statistical efficiency in low signal to noise regimes by using a deterministic reversible jump algorithm. This approach reduces the correlation length of the sampler by several orders of magnitude turning the otherwise numerical unfeasible problem of joint parameter exploration into a numerically manageable task. We test our method on an artificial mock galaxy survey, emulating characteristic features of the Sloan Digital Sky Survey data release 7, such as its survey geometry and luminosity dependent biases. These tests demonstrate the numerical feasibility of our large scale Bayesian inference frame work when the parameter space has millions of dimensions. The method reveals and correctly treats the anti-correlation between bias amplitudes and power spectrum, which are not taken into account in current approaches to power spectrum estimation, a 20 percent effect across large ranges in k-space. In addition, the method results in constrained realizations of density fields obtained without assuming the power spectrum or bias parameters in advance.
Subject headings:cosmology: observations: methods: numerical
With the advent of large three dimensional galaxy surveys the three dimensional large scale structure has become a major source of knowledge to understand the homogeneous and inhomogeneous evolution of our Universe. In particular, two-point statistics of the three dimensional matter distribution, are important tools to test our current picture of inflation and evaluate cosmological models describing the origin and evolution of the Universe. For example, detailed knowledge on the overall shape of the matter power spectrum can provide constraints on neutrino masses or the primordial power spectrum and break degeneracies in cosmological parameter estimation from Cosmic Microwave Background (CMB) data when measuring the parameter combination (e.g. W. Hu, D.J. Eisenstein and M. Tegmark (1998); D.N. Spergel, L. Verde, H.V. Peiris, E. Komatsu, M.R. Nolta, C.L. Bennett, M. Halpern, G. Hinshaw, N.C. Jarosik, A. Kogut, M. Limon, S.S. Meyer, L. Page, G.S. Tucker, J.L. Weiland, E. Wollack and E.L. Wright (2003); S. Hannestad (2003); 14; W. J. Percival (2002); D.N. Spergel, L. Verde, H.V. Peiris, E. Komatsu, M.R. Nolta, C.L. Bennett, M. Halpern, G. Hinshaw, N.C. Jarosik, A. Kogut, M. Limon, S.S. Meyer, L. Page, G.S. Tucker, J.L. Weiland, E. Wollack and E.L. Wright (2003); L. Verde (2003)). Since the physics governing the baryon acoustic oscillations (BAO) in the power spectrum is well understood (e.g. Silk, 1968; Peebles and Yu, 1970; Sunyaev and Zeldovich, 1970), precise measurements of the BAO will allow us to establish this length scale as a new standard ruler to measure the geometry of our Universe through the redshift distance relation (Blake and Glazebrook, 2003; Seo and Eisenstein, 2003). Inferring the BAOs from observation therefore constitutes an important task for determining the detailed nature of a possible dynamic dark energy component, which may explain the currently observed accelerated expansion of the Universe.
Precision cosmology of this kind requires not only the large and informative data sets which are becoming available, such as the Sloan Digital Sky Survey (SDSS), Dark Energy Survey (DES), Canada-France-Hawaii Telsecope Legacy Survey (CFHTLS) and EUCLID, but also fair and accurate data analysis methods (York, 2000; Coupon, 2009; Refregier et al., 2010; Frieman, 2011). Establishing contact between observations and theoretical predictions in large scale structure is non-trivial. This is due to the fact that generally galaxy redshift surveys are subject to a variety of systematic and statistical uncertainties that arise in the observational process such as, noise, survey geometry, selection effects, and close pair incompleteness due to fiber collisions.
Furthermore, the unknown clustering bias of any one considered galaxy sample compared to the overall mass distribution adds an intrinsic level of uncertainty (Sánchez and Cole, 2008). Redshift space distortions can be considered a systematic for some analyses but also contain cosmological information.
A careful treatment of the survey geometry is essential for the analysis of LSS surveys (Tegmark, 1995; Ballinger et al., 1995). Generally, the raw power spectrum, estimated from a galaxy redshift survey, yields an expectation value that is the true power spectrum convolved with the survey mask (Cole and Percival, 2005). This convolution distorts the shape of the power spectrum and dramatically reduces the detectability of subtle features such as the BAOs. Also the details of galaxy clustering can have a major impact on the shape of the power spectrum (Tegmark, 2004). The details of how galaxies cluster and trace the gravitational potentials of dark matter are complicated and not conclusively understood at present. Most likely the relation between galaxies and matter is essentially non-linear and possibly stochastic in nature, such that the inferred galaxy power spectrum is expected to deviate from that of mass (Dekel and Lahav, 1999). The issue of the galaxy bias remains complicated even in the limit of a linear bias, since different types of galaxies exhibit different clustering behavior (see e.g. Cole and Percival, 2005). As an example, luminosity dependent clustering of galaxies introduces scale dependent biases, when inferring power-spectra from flux limited galaxy surveys (Tegmark, 2004).
1.1. The state of the art
To address these problems, a large variety of different power spectrum inference methods have been proposed in the literature. Substantial work has been done in respect to Fourier transform based methods, such as the optimal weighting scheme. In this approach galaxies are assigned a weight, in order to reduce the error in the estimated power (see e.g. Feldman et al., 1994; Tegmark, 1995; Hamilton, 1997a; Yamamoto, 2003; Percival et al., 2004). As alternatives to Fourier space methods, there exist methods relying on Karhunen-Loève or spherical harmonics decompositions (see e.g. Tegmark et al., 1997; Tegmark, 2004; Pope, 2004; Fisher et al., 1994; Heavens and Taylor, 1995; Tadros, 1999; Percival et al., 2004; Percival, 2005). Furthermore, there exists a rich body of research based on a variety of likelihood methods aiming at inferring the real space power spectrum (Ballinger et al., 1995; Hamilton, 1997a, b; Tadros, 1999; Percival, 2005). To not only provide maximum likelihood estimates but also corresponding conditional errors, Percival (2005) proposed a Markov Chain Monte Carlo approach.
1.2. Bayesian analysis
Inference of the three dimensional density field and the corresponding cosmological power spectrum from galaxy redshift surveys with highly structured survey geometries and statistical uncertainties is an ill-posed problem. This means, there generally exists a large range of feasible solutions consistent with observations. The Bayesian approach solves this problem by providing numerical representations of the joint three dimensional density and power spectrum posterior distribution (Jasche et al., 2010). Similar approaches have also been applied to Bayesian inference with CMB data (see e.g. Wandelt et al., 2004; O’Dwyer et al., 2004; Eriksen et al., 2004; Jewell et al., 2004; Larson et al., 2007; Eriksen et al., 2007; Jewell et al., 2009).
In the age of precision cosmology the goal is to assess joint and correlated uncertainties between quantities inferred from a given galaxy redshift survey. A particularly important example are galaxy biases that, if measured incorrectly, have the potential to distort the shape of the inferred power spectrum over a large range of modes (Tegmark, 2004). Traditionally, biases are accounted for in the following sequential pipeline process: first, measure the bias amplitudes from power-spectra of different galaxy populations fixing to a fiducial power spectrum; then infer the power spectrum from the entire catalog (see e.g. Tegmark, 2004). This approach leaves open the charge of overusing the information content of the data because biases and the power spectrum inferences are treated as independent when they are not. Alternative approaches like the optimal weighting scheme or the Bayesian power spectrum inference method assume fixed bias models (see e.g. Percival et al., 2004; Jasche et al., 2010).
In all these methods uncertainties in the inferred bias models do not feed back to the power spectrum. It is therefore important to quantify these joint uncertainties and to devise methods to account for these effects. In this paper we report on the inclusion of these effects in the inference framework described in Jasche et al. (2010).
Our approach accounts for joint uncertainties by exploring the joint posterior distribution of the three dimensional density field, the power spectrum, galaxy biases and corresponding normalizations conditioned to observations. Numerical and statistical efficient exploration of the corresponding high dimensional parameter space is achieved via new implementations of multiple block Markov chain and Hybrid Monte Carlo methods. The schematic outline of the iterative block sampling scheme is given in figure 1. Running through the analysis cycle generates a sampled representation of the full joint posterior distribution, which naturally accounts for joint and correlated uncertainties involved in the inference process.
This paper also describes radical methodological changes in the implementation of the sampling scheme underlying the exploration of this joint posterior density. Using a combination of an efficient Hybrid Monte Carlo sampling method for all the parameters, together with the reversible jump algorithm described by Jewell et al. (2009) we obtained an algorithm that proves to sample all the parameters efficiently both in the high and in the low signal-to-noise regimes.
The paper is structured as follows. In section 2, we summarize the notation used in this work. Section 3 describes the large scale structure posterior distribution, employed to infer the large scale three dimensional density field, power spectrum, galaxy biases and normalizations from galaxy observations. Here we also describe how the complex task of sampling the full joint posterior distribution can be dissected into a sequence of smaller sub problems within the framework of multiple block Metropolis-Hastings sampling. In Section 4, we discuss the numerical implementation of our method. The following section 5 describes the generation of an artificial galaxy mock observation which emulates characteristic features of the Sloan Digital Sky Survey Data Release 7 (Abazajian, 2009). In section 6, we perform a test of our method by applying it to artificial galaxy data. These tests particularly aim at estimating the efficiency of the algorithm in a realistic scenario. Some results of the performed test are discussed in section 7. In particular, here we show, that the posterior correlations between inferred power-spectra and galaxy biases are strongly anti-correlated over large ranges in Fourier-space. The paper is concluded in section 8 by a discussion of the results obtained in this work and some concluding remarks.
In this section, we describe the basic notation used throughout this work. Let the quantity be the field amplitude of the three dimensional field at position . Then the index has to be understood as a multi index, which labels the three components of the position vector:
where is the th component of the th position vector. Alternatively one can understand the index as a set of three indices so that for an equidistant grid along the three axes the position vector can be expressed as:
with , and being the grid spacing along the three axes. With this definition we yield:
Also note that any summation running over the multi index is defined as the three sums over the three indices , and :
Further, we will frequently use the notation , which denotes the set of field amplitudes at different positions . In particular:
where is the total number of position vectors or voxels in the three dimensional volume.
3. The Large scale structure Posterior
In this section we will describe the Large scale structure posterior distribution employed in this work. As will be demonstrated, the multiple block sampling approach permits to dissect the complex problem of exploring a joint posterior distribution into a set of smaller subtasks.
3.1. The joint posterior distribution
The aim of this work is to update and complement the previously presented power spectrum sampling framework described in Jasche et al. (2010) to be numerically more efficient and also to account for additional systematics and statistical uncertainties of the galaxy distribution from which the density field and the power spectrum shall be inferred. In particular, the previously developed algorithm accounted for systematics such as arising from survey geometry and selection effects as well as statistical uncertainties due to noise in the galaxy distribution and cosmic variance (see Jasche et al., 2010, for details). Additional uncertainties arise from the galaxy bias and the correct normalization for the galaxy density contrast. Both of these effects can greatly effect the recovery of the large scale power spectrum and may yield erroneous results when not accounted for accurately. In particular, a galaxy sample consisting of many galaxy types, all tracing the underlying density field differently, does not provide a homogeneous sample of the underlying density field. This is because luminous galaxies will dominate the galaxy sample at larger distances while fainter galaxies are found in regions closer to the observer. Due to the different clustering behavior of luminous and faint galaxies, this observational effect introduces a spatially varying bias, which translates to a scale depended bias in Fourier-space (Tegmark, 2004; Percival et al., 2004). In addition, a wrong assumption on the galaxy density contrast normalization will yield a non-vanishing mean resulting in erroneous large scale power when estimating power-spectra. It is therefore generally crucial to account for these effects when inferring the density field and power spectrum from realistic galaxy surveys.
Traditional approaches generally measure the power spectrum and the galaxy bias parameters in separate steps by either measuring it from the same data set or employing fitting functions calibrated on other galaxy catalogs (see e.g. Percival et al., 2004; Tegmark, 2004; Cole and Percival, 2005; Percival et al., 2007). The risk of such approaches is that they may over-use the data by ignoring joint and correlated uncertainties for inferred biases and power-spectra, leading to wrong conclusions on the accuracy of inferred quantities and possibly biased results because not all parameters are explored jointly.
In principle there may also be cases where a joint inference of power spectrum and galaxy biases may be mutually beneficial. Traditional approaches would miss such opportunities.
In this paper we aim for the full characterization of the joint posterior of different tracer populations labeled by the index . We will achieve this by simultaneously exploring the three dimensional density field , the corresponding power spectrum , the galaxy biases and the galaxy density contrast normalizations given the data – galaxy observations in the form of three dimensional number counts .
3.2. Dissecting the posterior distribution
Direct analysis of the joint posterior is challenging: it is non-Gaussian and very high-dimensional. From a technical point of view, it is also not advisable to directly generate random variates from the joint posterior distribution since this may result in a numerically unfeasible algorithm preventing rapid exploration of the parameter space. In such a situation one usually has to rely on Metropolis-Hastings algorithms, which reduce the problem of numerical parameter space exploration to the design of suitable proposal distribution for the generation of candidate solutions. A particular important theorem on Metropolis Hastings block sampling permits to dissect the high dimensional sampling problem into a sequence of lower dimensional problems once conditional distributions for the subtasks can be formulated (Hastings, 1970). It is therefore possible to subdivide the exploration of the full joint parameter space into the following sequence of conditional sampling procedures:
where labels the sample number. Iterating these individual sampling steps will then provide samples from the full joint posterior distribution (Hastings, 1970). Also see figure 1 where we have illustrated this multiple block sampling procedure, also clarifying the flow of information in subsequent iterations. In the following we will discuss the individual contributions to the full joint posterior distribution as listed in equation (6).
3.3. The density posterior
We now derive the posterior distribution to infer the three dimensional density field (for further details see Jasche et al., 2010).
3.3.1 The density prior distribution
This work mainly concerns the inference of the largest scales, where the density field can be reasonably well described by Gaussian statistics. A Gaussian prior will therefore be an adequate description of the statistical behavior of the large scale density field. It should nevertheless be remarked, that a Gaussian prior resembles the least informative prior, once the mean and the covariance of the density field are specified. For this reason, from a Bayesian perspective, the Gaussian prior is well justified approximation even for inference of the density field in the non-linear regime. Therefore, in the following, we will assume the prior distribution for the density contrast to be a multivariate normal distribution with zero mean and the Fourier transform of the covariance matrix being the cosmological power spectrum . With these definitions the prior distribution for the density contrast can be written as:
For further discussions the interested reader is referred to Jasche et al. (2010).
3.3.2 The density likelihood and the data model
As already discussed above, it is crucial to account for different clustering behavior of galaxy populations in the large scale structure sample from which density fields and power-spectra are inferred. In order to do so, we split the galaxy sample into sub samples for which we can treat the respective systematic and statistical uncertainties. The aim of this work is to present a method which can extract joint information on the three dimensional density contrast field , the power spectrum , the bias and the mean number of galaxies in the survey from a set of galaxy samples, all tracing the underlying density field differently, while properly accounting for the individual systematic and stochastic uncertainties. We present the method in a general fashion that includes joint cosmological analyses from two or more different galaxy surveys or even different probes of the large scale structure, while accounting for their respective systematic and stochastic uncertainties. In the case of sub samples or galaxy observations, a linear data model for each of the subsets can be formulated as:
where are the observed number counts of galaxies in the th sample and in the th volume element, is the corresponding expected number of galaxies, is the corresponding survey response function, which accounts for survey geometry and selection function, is a linear galaxy bias, is the linear growth function and is a random noise component. Also note, that here we are mainly interested in recovering the largest scales and so a linear bias should suffice. We will defer the inclusion of more complicated non-linear and non-local biases to a future work. The survey response operator is given by the product of the angular mask or completeness function and the radial selection function at the th volume element:
By following the arguments in Jasche et al. (2010), we will assume the random noise component to follow a Gaussian distribution with zero mean and noise covariance matrix , given as:
where is the Kronecker delta. Equation (10) resembles the Covariance matrix of a Poisson process when averaged over realizations of the density contrast , since the average of vanishes (see Jasche et al., 2010, for some discussions of the noise covariance matrix). Alternatively, the assumed Gaussian uncertainty model can be derived by a Taylor-expansion of the Poissonian log-likelihood to second order in the density field. This demonstrates, that the Gaussian likelihood model is correct at large scales, where density amplitudes are small, while it describes an approximation to small scale uncertainties. Although this Gaussian model assumes independence of the noise from the underlying density field, the number counts of the tracer populations are correlated in spatially overlapping regions as they all refer to the same underlying three dimensional density field . In this fashion, the data model given in equation (8) emulates the expected behavior of galaxy formation models. In this work, we are focusing at inferring the largest scales from observations, for which our data model is adequate. Note, that precise inference of the large scale structure in the non-linear regime while accounting for the full Poissonian noise statistics can be performed with the method presented in Jasche and Kitaura (2010) and Jasche et al. (2010).
With these definitions the likelihood for the dataset can be expressed as a multivariate normal distribution:
The joint likelihood from all datasets is then obtained as:
where we assumed conditional independence of the individual likelihoods, once the density contrast , the mean galaxy numbers and the biases are given.
3.4. The Power-spectrum posterior distribution
As described above, sampling from the Wiener posterior distribution will provide realizations of full three dimensional density fields conditioned on the observations. Basing on these density fields, in subsequent steps realizations of the corresponding power spectrum can be generated by sampling from the conditional power spectrum posterior distribution:
where we assumed conditional independence of the data on the power spectrum, once the full three dimensional density field is given, and in the last line we used the density prior given in equation (7). The result provided by equation (13) demonstrates that once a realization of the three dimensional density field is given, inferring a realization of the power spectrum is conditionally independent on the data. This fact was first pointed out by Wandelt et al. (2004) in case of CMB analysis and was later adapted for large scale structure inference by Jasche et al. (2010). This is a particularly important result for the power spectrum sampling procedure. Since all observational systematics and uncertainties have already been accounted for in inferring a density field realization, this considerably simplifies conditional power spectrum inference (also see Jasche et al., 2010, for a deeper discussion). Also note, that equation (13) is a standard result for Bayesian density and power spectrum inference with Wiener posteriors in the case of CMB and large scale structure analyses (see e.g. Wandelt et al., 2004; Eriksen et al., 2004; Jasche et al., 2010). It is interesting to remark that equation (13) provides a direct sampling procedure to generate power spectrum realizations. Following Jasche et al. (2010), equation (13) can be re expressed as an inverse Gamma distribution, when employing a power-law prior for the power spectrum amplitudes. Thus, inserting a power law prior in equation (13) and imposing the correct normalization, the conditional power spectrum posterior can be written as (Jasche et al., 2010):
where is the power spectrum amplitude of the th bin corresponding to the Fourier mode , is the number of modes in that bin, is the power-law index of the power-law prior, is the Gamma function and is given as the squared isotropic average over the mth mode bin in Fourier space :
with being the Fourier transform of the density field. Note, that the power-law prior with describes a ”flat” prior on a unit scale, while yields a Jeffreys Prior. Jeffreys prior is a solution to a measure invariant scale transformation, and hence is a scale independent prior, as different scales have the same probability. For this reason, Jeffreys prior is optimal to infer cosmological power-spectra, which constitute scale measurements, as it does not introduce any bias on a logarithmic scale. For a more detailed derivation and discussion of the Inverse Gamma distribution in case of LSS inference the reader is referred to our previous work presented in Jasche et al. (2010).
3.5. The bias posterior distribution
Accounting for galaxy biases is particularly important when analyzing galaxy samples exhibiting widely varying clustering behavior across galaxy types. These galaxy biases give rise to additional systematics, which when ignored, have the potential to distort the shape of inferred power-spectra. We extend the power spectrum sampling framework described in Jasche et al. (2010) with a galaxy bias sampling procedure. To be maximally agnostic about the functional shape of the galaxy bias as a function of galaxy properties such as luminosity or color, we will assign a galaxy bias to each of the individual galaxy sub samples. Another approach would be to assume a functional shape for the galaxy bias as a function of some galaxy properties such as luminosity and just fit the corresponding function parameters. Such an approach would be a posteriori and hence overuse the data since knowledge of such a functional shape is generally extracted from the galaxy sample under consideration itself. We will instead pursue an approach that only assumes that bias properties remain approximately constant within luminosity bins.
Realizations of the bias factors can then be obtained by sampling from the conditional bias posterior distribution given as:
where the bias factors are conditionally independent of the power spectrum, once the full three dimensional density field is known. Using Bayes theorem we can rewrite the conditional bias posterior as:
where we introduced a prior factorizing in the different galaxy samples and assumed conditional independence . Since the probability distribution given in equation (17) factorizes in the galaxy sub samples, one can estimate the bias factors for each galaxy sample separately by sampling from the following distribution:
In order to follow the maximum ignorance principle also with regard to the amplitude of the bias, we assume a flat prior . With these assumptions the conditional bias posterior distribution is simply proportional to the density likelihood given in equation (3.3.2):
with given as:
and the mean given as:
A random realization for the galaxy bias can thus be obtained by simply sampling from a univariate normal distribution. Note, that while previous methods such as presented in Tegmark (2004) measure the relative bias from estimated power-spectra, our method infers the bias parameters directly from the relation between the data and the three dimensional density field. Further, since the density field contains information inferred jointly from all galaxy sub samples, inference of individual bias factors depends on all other biases via the density field, enabling us to measure relative biases and thus the shape of the power spectrum up to an overall normalization.
3.6. The posterior distribution
Of similar importance as galaxy biases are the mean numbers of galaxies for the sub samples. These mean numbers of galaxies are crucial to define the true underlying density contrast of the galaxy distribution. A false value of will yield a non-vanishing mean and result in erroneous large scale power in the inferred power-spectra. Since the are not known a priorly, they will also have to be inferred jointly, in order not to miss possible cross-correlations and interdependencies. The parameters for the sub samples can then be inferred from a conditional distribution given as:
where, in a similar fashion as described in section 3.5, we assume conditional independence of the power spectrum once the full three dimensional density field is given. Further, following similar arguments as described in section 3.5, it is possible to infer each parameter separately, since the corresponding probability distribution factorizes. As a result we obtain the following probability distribution for the parameter :
Note, that also here we followed a maximum ignorance approach by setting the prior . As can be seen the resultant probability distribution for the parameter is proportional to the density likelihood given in equation (LABEL:eq:LH_per_dataset). Following the discussion presented in appendix A, the probability distribution for the given in equation (LABEL:eq:nmean_post) is a generalized inverse Gaussian distribution (GIG) given as:
where is a modified Bessel function of the second kind, , is given as:
The number of observed voxels is given by:
where the Heaviside function counts the non zero elements of the three dimensional survey mask. Random realizations for the parameters can then be provided by sampling from the GIG distribution. It is interesting to note, that the inferred also depend on the galaxy bias parameters and the three dimensional density field. This indicates, that a joint estimation of all these parameters is required in order to correctly account for joint information and correlated uncertainties. Algorithms to sample from the GIG distribution are described in literature (see e.g. Dagpunar, 1988).
4. Numerical implementation
In this section we will describe the numerical implementation of the large scale structure sampler.
4.1. Sampling the density field
As is manifest from the above discussion, the three dimensional density sampling step resembles the core of our method. The Wiener posterior for the density sampling step described in section 3 is a multivariate normal distribution, and as such possesses a clear sampling procedure, by first calculating the Wiener-mean and adding a variance term corresponding to the Wiener-variance (see e.g. Wandelt et al., 2004; O’Dwyer et al., 2004; Eriksen et al., 2004; Jewell et al., 2004; Larson et al., 2007; Eriksen et al., 2007; Jewell et al., 2009; Jasche et al., 2010). Nevertheless, calculating the Wiener mean and variance requires matrix inversions for large matrices with typically on the order of entries. These matrix inversions can be performed with efficient operator based implementations of conjugate gradient methods (see e.g. Wandelt et al., 2004; Eriksen et al., 2004; Kitaura et al., 2009; Jasche et al., 2010). Alternatively, the Wiener posterior in case of CMB data has also been explored using a Hybrid Monte Carlo (HMC) algorithm (Taylor et al., 2008). Furthermore, the HMC has been successfully applied for the inference of three dimensional density fields in the non-linear regime (see e.g. Jasche and Kitaura, 2010; Jasche et al., 2010; Jasche and Wandelt, 2012). Since these references have found the HMC algorithm to be very efficient in large scale structure problems we implement an HMC algorithm for the exploration of the large scale structure density field. Compared to the direct sampling approach based on iterative techniques, HMC is less sensitive to the effects of numerical inaccuracies, e.g. due to insufficiently strict tolerances in the inversion scheme, because the final accept/reject step of the Metropolis Hastings sampler ensures correctness of the target density. Any such inaccuracies would only impact efficiency but not correctness.
In the following we will review the basics of the HMC algorithm required for this work.
4.1.1 The HMC algorithm
The numerical efficiency of the Hybrid Monte Carlo algorithm originates from the fact that it exploits techniques developed to follow classical dynamical particle motion in potentials to deterministically provide new proposals to the Metropolis-Hastings algorithm (Duane et al., 1987; Neal, 1993, 1996). Assuming we wish to generate random variates from a probability distribution , with being a set consisting of the elements , we may interpret the negative logarithm of this distribution as a potential :
Further, by introducing a ’momentum’ variable and a ’mass matrix’ , one can add a kinetic term to the potential in order to obtain a Hamiltonian function, which, similar to ordinary mechanics, describes the dynamics in a high dimensional parameter space. Such a Hamiltonian can be written as:
The new target probability distribution is then obtained by exponentiating the Hamiltonian given in equation (29):
As can be seen from equation (30), the form of the Hamiltonian was chosen such, that the new target distribution is separable into a Gaussian distribution in the momenta and the target distribution . This immediately clarifies that the two sets of variables and are independent and marginalization over the artificial momenta yields the original target distribution .
Given an initial point in this high dimensional phase space it is possible to deterministically propose a new sample from the target distribution by following persistent trajectories for some fixed time according to Hamilton’s equations:
A new point in phase space will be accepted according to the standard Metropolis-Hastings acceptance rule:
Since Hamiltonian dynamics conserve the Hamiltonian, the HMC approach results in an acceptance rate of unity. In practical applications, however, the acceptance rate may be lower due to numerical inaccuracies in the integration scheme. Once a new sample has been accepted the momentum variable is discarded and the process restarts by randomly drawing a new set of momenta. The individual momenta will not be stored, and therefore discarding them amounts to marginalizing over this auxiliary quantity. Hence, the Hamiltonian sampling procedure basically consists of two steps. The first step is a Gibbs sampling step, which yields a new set of Gaussian distributed momenta. The second step, on the other hand amounts to solving a deterministic dynamical trajectory on the posterior surface.
4.1.2 Equations of motion for the Wiener Posterior
As described above, in this work we will employ a Wiener posterior for the inference of the three dimensional density field. In this case the Hamiltonian potential can be written as:
where we omitted terms corresponding to normalization constants. Variation of with respect to the three dimensional density field then yields the Hamiltonian forces which govern the dynamics in parameter space:
With this definition the equations of motion can be given as:
As can be seen from equation (4.1.2), in the framework of the HMC the quadratic form of the Wiener posterior yields a multidimensional coupled harmonic oscillator whose zero points are defined by the data. Also note, that equations (36) and (4.1.2) can be combined to yield a second order differential equation:
This clarifies that an ideal choice for the mass matrix would be given by:
which would decouple the harmonic oscillators and permit a trivial analytic solution to the equations of motion for a set of independent harmonic oscillators. Nevertheless, this approach would again require matrix inversions and hence be identical to a Gibbs sampling approach, which we try to avoid here (see e.g. Wandelt et al., 2004; O’Dwyer et al., 2004; Eriksen et al., 2004; Jewell et al., 2004; Larson et al., 2007; Eriksen et al., 2007; Jewell et al., 2009; Jasche et al., 2010). In order to provide samples from the target distribution, we will therefore integrate the equations of motion (36) and (4.1.2) numerically over a pseudo time interval via a leapfrog scheme. In order to avoid resonant trajectories, the pseudo time interval is drawn from a uniform distribution. It is also important to remark, that the leapfrog scheme is a symplectic integrator, and as such ensures detailed balance of the Markov chain. For a detailed description of the numerical implementation of the HMC in general and in cosmology in particular the reader is referred to the literature (in particular see Duane et al., 1987; Neal, 1993, 1996; Taylor et al., 2008; Jasche and Kitaura, 2010). In general the choice of the mass matrix does not affect the validity but only the numerical efficiency of the sampling method. We choose a diagonal form consisting of the diagonal elements of the full mass matrix, given in equation (39), in its Fourier representation. This choice enhances performance while avoiding matrix inversions, analogous to using a diagonal preconditioner in iterative conjugate gradients solvers.
4.2. Sample the power spectrum
The conditional power spectrum posterior, given by the inverse gamma distribution in equation (14) , facilitates a direct sampling scheme (see e.g. Eriksen et al., 2004; Wandelt et al., 2004; Jasche et al., 2010). In particular, by performing a change of variables one obtains the -distribution
where . Consequently, generating random variates for the power spectrum reduces to the task of drawing independent random samples from the -distribution for each Fourier mode . Let be independent, normally distributed random variates with zero mean and unit variance. Then (Jasche et al., 2010):
is -distributed, with a element vector with elements . A sample of the power spectrum amplitude at Fourier mode is then obtained by:
It should be remarked that each is a positive quantity, ensuring the requirement that the signal covariance matrix be a positive definite quantity.
4.3. Exploring the low signal to noise regime
In a Poissonian picture of galaxy formation the signal-to-noise ratio can be expressed as the square root of the number of galaxies found in a given volume . Consequently, splitting a galaxy sample into a number of sub samples, to accurately account for the respective systematics will yield a lower signal-to-noise ratio for each of these galaxy sub samples. Although, the Likelihood given in equation (12) correctly accounts for the joint recovery of information from all galaxy samples by correctly weighting them corresponding to their respective noise properties, the overall signal-to-noise ratio may still be low, depending on the amount of sub samples. This effect may influence the numerical efficiency for the power spectrum sampling procedure as described above.
While the approach described in sections 4.1 and 4.2 provably converges to the target posterior on all scales probed by the data, it may take prohibitive computational time to generate a sufficient amount of independent samples in low signal to noise regime (Jewell et al., 2009). In particular, the variations in subsequent power spectrum samples are solely determined by cosmic variance, whereas the full posterior distribution is governed by cosmic variance and noise. This permits a rapid exploration of parameters at the largest scales, where cosmic variance is dominant, but yields poor statistical efficiency for the small scale regime, where cosmic variance is typically much smaller than the noise variance. This results in a prohibitively long correlation length of the sequence of spectra in the low-signal to noise regime, requiring unfeasible long Markov Chains to generate sufficient numbers of independent samples (Jewell et al., 2009). The above discussion shows that the variables chosen to explore the parameter space are optimal for the cosmic variance dominated regime, but it may also suggest that a change of variables is advantageous for the exploration of the low signal to noise regime.
In fact, a change of variables can greatly improve the mixing behavior of the Markov Chain. Following Jewell et al. (2009), we introduce a change of variables which corresponds to the following deterministic transition for the three dimensional density field in Fourier space:
while, as demonstrated in appendix C, a new power spectrum sample has to be generated from the distribution:
with and labeling the sample number in the Markov chain. Unfortunately there exists no immediate way to generate power spectrum samples from this distribution. While Jewell et al. (2009) propose a simple Metropolis-Hastings sampler to generate samples from the distribution given in equation (4.3), we will employ a HMC to solve the problem. As already discussed above, the HMC has the advantage that it greatly improves the acceptance probability compared to standard implementations of a Metropolis-Hastings algorithm. Furthermore, the entire procedure of sampling a power spectrum from the distribution given in equation (4.3) followed by the deterministic transition, described by equation (43), for the density field is numerically as expensive as the sampling procedure outlined in sections 4.1 and 4.2. Alternating these two sampling procedures will thus not decrease the overall numerical efficiency of the algorithm. On the other hand, alternated sampling in these two variable bases greatly improves the statistical efficiency and hence the overall performance of our algorithm.
5. Generating artificial observations
Above we discussed the derivation and numerical implementation of our method. In this section we will describe the generation of an artificial galaxy survey, used to test our method. Following closely the description in Jasche et al. (2010), at first a realization for the density contrast is generated from a normal distribution with zero mean and a covariance matrix corresponding to a cosmological power spectrum. The density field will be stored on a three dimensional equidistant Cartesian grid with , corresponding to volume elements, and a co moving box length of . The cosmological power spectrum for the underlying matter distribution, with baryonic wiggles, is calculated according to the prescription described in Eisenstein and Hu (1998) and Eisenstein and Hu (1999). Further, a standard CDM cosmology with a set of cosmological parameters (, , , , , ) is assumed.
On top of this three dimensional density contrast realization , artificial galaxy sub samples will be generated corresponding to the data model described in section 3.3.2. For each subsample the noise term is generated from a multivariate normal distribution with zero mean and a diagonal covariance matrix .
In particular, the aim of the present work is to test our method in a realistic scenario with largely inhomogeneous galaxy samples with respect to their clustering behavior. As already discussed above, such a sample would not represent a spatially homogeneous sample of the three dimensional density field, and hence introduces scale dependent biases in Fourier space (Tegmark, 2004). This situation is usually faced when analyzing real galaxy observations (see e.g. Tegmark, 2004). Subdividing the galaxy sample into sub samples permits to account for the respective systematic effects while jointly extracting information on the cosmological power spectrum from a galaxy survey.
In setting up a realistic test scenario, we emulate the main features of the Sloan Digital Sky survey data release 7 as closely as possible (Abazajian, 2009). In particular, we use the completeness mask of the Sloan Digital Sky Survey data release 7 depicted in the right panel of figure 3. This mask was evaluated with the MANGLE code provided by Swanson et al. (2008) and has been stored on a HEALPIX map with (Górski et al., 2005). Further, we assume that the radial selection function follows from a standard Schechter luminosity function with standard r-band parameters ( , ), and we only include galaxies within an apparent Petrosian r-band magnitude range and within the absolute magnitude ranges to . The corresponding radial selection function is then obtained by integrating the Schechter luminosity function over the range in absolute magnitude:
where is given in appendix D. The selection functions for the galaxy sub samples are depicted in the left panel of figure 3. The product of the survey geometry and the selection function at each point in the three dimensional volume yields the survey response operator:
where and are the right ascension and declination coordinates corresponding to the th volume element, and is the corresponding redshift. The functions and are the continuous survey completeness and selection function, respectively, sampled at the th grid position. Further, we will model a luminosity dependent bias, in terms of absolute magnitudes M, as:
with the fitting parameters , and (see Tegmark, 2004, for details). Since equation (47) formulates only the relative bias with respect to a reference point in absolute magnitude , we will arbitrarily set and for our test scenario. Finally we have to define a artificial set of mean galaxy numbers . In the case studied here, the different are directly related to the Schechter luminosity function by integrating over the width of the th absolute magnitude bin:
The individual galaxy observations are then obtained by evaluating equation 8 with the quantities as specified above.
6. Testing the algorithm
The aim of this section is to conduct a series of tests to judge the performance of our method in a realistic scenario by applying the algorithm to the simulated galaxy survey described in the previous section. We particularly focus on studying the burn-in and convergence behavior of our method, which determines its overall numerical feasibility and efficiency in real-world applications.
6.1. Testing burn-in and statistical efficiency
Metropolis Hastings samplers are designed to have the target distribution as their stationary distribution (see e.g. Metropolis et al., 1953; Hastings, 1970; Neal, 1993). The sampling process will provide samples from the specified joint large scale structure posterior distribution after a sufficiently long burn-in phase. The theory of Metropolis Hastings sampling by itself does not provide any criterion to determine the length of the burn-in phase, so we study it in numerical experiments.
Typically, burn-in manifests itself as a systematic drift of sampled parameter values towards their true underlying values from which the artificial data was generated. This behavior can be monitored by following the evolution of parameters in subsequent samples (see e.g. Eriksen et al., 2004; Jasche et al., 2010). To test this initial burn-in behavior we will arbitrarily set the initial values for the parameters and to unity , while the initial power spectrum amplitudes for are assumed to be two orders of magnitude larger than their true underlying values. One can then observe the successive drift towards the true values of the respective parameters. The 1000 first successive samples of the Markov chain are presented in figure 4, which clearly demonstrates the drift towards the respective true solutions.
Figure 4 already illustrates the interplay between bias and power spectrum samples. While bias parameters approach their true values from the top, the power spectrum approaches its fiducial value from below. This effect originates from the anti-correlation between biases and power-spectra for given data. In particular high bias values will result in lower power spectrum amplitudes and vice versa.
More quantitatively the initial burn-in behavior is studied by comparing the th sample of a parameter in the chain to its true underlying value via:
where for our case, the parameter sample has to be replaced by one of the parameters sampled by the Markov chain. The results for the , and are shown in the lower panels of figure 4. It can be seen that the algorithm drifts towards the preferred region in parameter space and starts exploring the parameters corresponding to their joint uncertainties. Also note, that figure 4 already indicates the degree of scatter which can be expected in the final analysis of the Markov run.
In this test we do not observe any particular hysteresis for the poorly constrained large scale modes, meaning they do not remain at their initially set values but efficiently explore the parameter space. This reflects the ability of our algorithm to account for artificial mode coupling as introduced by the survey geometry, selection effects and biases.
These tests show that the algorithm requires an initial phase of on the order of samples before providing typical samples from the joint posterior. This initial burn-in period can be shortened by initializing the algorithm with a set of parameters closer to the true solutions, contrary to the test presented here where the initialization was intentionally chosen to test the burn-in behavior.
Statistical efficiency is a major design goal of Markov Chain Monte Carlo algorithms. Generally, successive samples in the chain are not independent but correlated with their predecessors. Consequently the degree of correlation between successive samples determines the amount of independent samples which can be drawn from a chain with given length. To study this correlation effect we follow a similar approach as described in (Eriksen et al., 2004) or (Jasche et al., 2010).
Assuming all parameters in the Markov chain to be independent of each other, the correlation between subsequent samples for the parameter can be quantified in terms of the autocorrelation function
where is the distance in the chain measured in iterations (also see e.g. Eriksen et al., 2004; Jasche et al., 2010, for a similar discussion). The results for this analysis are presented in figure 5 which shows the auto-correlation functions for the parameters , and .
One can define a correlation length of the Markov sampler as the distance in the chain beyond which the correlation coefficient has dropped below a threshold of . As can be seen in figure 5 the correlation length is typically less than 200 samples, except for a few power-spectrum modes in the Nyquist regime of the box, which are correlated across a few hundred samples. This demonstrates the overall high statistical efficiency of the sampler. Note, that these Nyquist modes are under-sampled due to the finite resolution of the Fast Fourier Transform. Even though our method treats these eight Nyquist frequencies correctly, they will not be considered for a typical cosmological analyses where usually only modes with frequencies less than seventy percent of the Nyquist frequency are considered (see e.g. Jing, 2005).
These tests demonstrate the numerical feasibility of our method to explore the full joint posterior distribution despite the high dimensionality of the problem presented here.
6.2. Joint uncertainties and correction of systematics
The previous section demonstrates the numerical and statistical efficiency of our algorithm to explore the joint large scale structure posterior in high dimensional spaces. As will be shown in this section, the algorithm also accurately accounts for systematics and joint stochastic uncertainties in the inference of the different parameter. Of particular importance for the measurement of cosmological parameters from power-spectra are the effects of survey geometry and selection effects. Survey geometry and selection effects introduce artificial mode coupling to the Fourier modes of the three-dimensional density field, and can greatly reduce the visibility of features in the power spectrum, such as the baryonic acoustic oscillations (Cole and Percival, 2005). Correcting these effects is hence of utmost importance for any method to estimate the cosmological power spectrum from observations.
The information on how well our method corrects for survey geometry effects is naturally contained in the correlation structure of the individual power spectrum samples. To examine this effect, we calculate the correlation matrix of the power spectrum samples
where is the power spectrum amplitude at the mode . The result of this exercise is presented in figure 6, where the ensemble average has been taken over power spectrum samples. As can be clearly seen, the resultant correlation matrix exhibits a well defined diagonal structure, as expected from theory. Also note the slight red band of anti-correlation around the diagonal, particularly at the highest frequencies in figure 6. This anti-correlation indicates that the power spectrum frequency resolution is higher than supported by the data. Note that these effects are accurately accounted for by our method, since it provides the full probability distribution for each mode. If one wishes, it is possible to reduce frequency resolution in a post-processing step until anti-correlation vanishes (Jasche et al., 2010).
It should be noted, that generally the posterior distributions for the individual are non-Gaussian. Consequently, two-point statistics of the power spectrum, as presented in figure 6, do not provide conclusive information on the full statistical behavior of the . Since our method provides a numerical representation of the large scale structure posterior higher order statistics for the power spectrum can naturally be taken into account. To illustrate this fact, we plot two dimensional marginalized posterior distributions for a few arbitrarily chosen pairs of s in figure 7.
In the previous section 6.1 we saw burn-in behavior that suggested bias parameters being anti-correlated with inferred power-spectra . This behavior is easily understood: a higher bias parameter requires a lower amplitude of the power spectrum to match observations. The details and strength of these effects generally depend on the magnitude bin for the bias parameter and the mode of the power spectrum. To quantify these effects we now study the posterior correlations between the inferred bias parameters , the and the power spectrum . Figure 8 shows the correlation matrices
where and have to be replaced by the parameter under consideration. The scale dependent anti-correlation between bias parameters and power-spectra is clearly visible in the left panel of figure 8. Note, that we are measuring relative biases and for that reason the bias amplitude at a magnitude is fixed to 1.
The maximum anti-correlation amounts to about 20 per cent, and significant anti-correlation can be observed at all scales and for all magnitudes. This result emphasizes our claim, that even simple linear bias amplitudes cannot be estimated independently from power-spectra and vice versa.
The middle panel of figure 8 shows the correlations between the and different modes of the power spectrum. Generally correlations and anti-correlations amount maximally up to a few percent. However, at the magnitude bin of the fixed bias amplitude is strongly anti-correlated with the modes of the power spectrum. This indicates, that carries additional information on the bias amplitude at the magnitude bin .
Finally, the right panel of figure 8 shows the correlations between inferred