Bayesian non-linear large-scale structure inference of the Sloan Digital Sky Survey Data Release 7
Abstract
In this work we present the first non-linear, non-Gaussian fully Bayesian large scale structure analysis of the cosmic density field conducted so far. The density inference is based on the Sloan Digital Sky Survey Data Release 7, which covers the northern Galactic cap. We employ a novel Bayesian sampling algorithm, which enables us to explore the extremely high dimensional non-Gaussian, non-linear lognormal Poissonian posterior of the three dimensional density field conditional on the data. These techniques are efficiently implemented in the HADES computer algorithm and permit the precise recovery of poorly sampled objects and non-linear density fields. The non-linear density inference is performed on a 750 Mpc cube with roughly 3 Mpc grid resolution, while accounting for systematic effects introduced by the survey geometry and selection function of the SDSS, and with a correct treatment of the Poissonian shot noise contribution. Our high resolution results represent remarkably well the cosmic web structure of the cosmic density field: filaments, voids and clusters are clearly visible. Further, we also conduct a dynamical web classification and estimate the web type posterior distribution conditional on the SDSS data.
keywords:
large scale – reconstruction – Bayesian inference – cosmology – observations – methods – numerical

1 Introduction
Observations of the large scale structure have always attracted enormous interest, since they contain a wealth of information on the origin and evolution of our Universe. The details of structure formation are very complicated and involve many different physical disciplines ranging from quantum field theory, general relativity or modified gravity to the dynamics of collisionless matter and the behavior of the baryonic sector. Throughout cosmic history the interplay of these different physical phenomena therefore has left its imprints in the matter distribution surrounding us. Probes of the large scale structure, such as large galaxy surveys, hence enable us to test current physical and cosmological theories and will generally further our understanding of the Universe.
In particular, a cosmographical description of the matter distribution will permit us to study details of structure formation mechanisms and the clustering behavior of galaxies, and will provide information on the initial fluctuations and large scale cosmic flows. For this reason, several different methods to recover the three dimensional density or velocity field from galaxy observations have been developed and applied to existing galaxy surveys (Ebeling & Wiedenmann, 1993; Hoffman, 1994; Lahav, 1994; Lahav et al., 1994; Zaninetti, 1995; Fisher et al., 1995; Zaroubi et al., 1995; Webster et al., 1997; Zaroubi et al., 1999; van de Weygaert & Schaap, 2001; Erdoğdu et al., 2004; Erdoğdu et al., 2006; Kitaura et al., 2009). In particular, Kitaura et al. (2009) recently presented a high resolution three dimensional Wiener reconstruction of the Sloan Digital Sky Survey Data Release 6 data, which demonstrated the feasibility of high precision density field inference from galaxy redshift surveys. These three dimensional density maps are interesting for a variety of scientific applications, such as studying the dependence of galaxy properties on their cosmic environment, increasing the detectability of the integrated Sachs-Wolfe effect in the CMB, or performing constrained simulations (see e.g. Bistolas & Hoffman, 1998; Lee & Lee, 2008; Lee & Li, 2008; Frommert et al., 2008; Klypin et al., 2003; Libeskind et al., 2009; Martinez-Vaquero et al., 2009).
However, modern precision cosmology demands increasing control of observational systematic and statistical uncertainties, and the means to propagate them to any finally inferred quantity, in order not to draw wrong conclusions about the theoretical model to be tested. For this reason, here we present the first application to data of the new Bayesian large scale structure inference computer algorithm HADES (HAmiltonian Density Estimation and Sampling) (see Jasche & Kitaura, 2009, for a description of the algorithm). HADES performs a full-scale non-linear, non-Gaussian Markov Chain Monte Carlo analysis by drawing samples from the lognormal Poissonian posterior of the three dimensional density field conditional on the data. This extremely high dimensional posterior distribution, comprising on the order of $10^7$ free parameters (one per volume element), is explored via a numerically efficient Hamiltonian sampling scheme, which suppresses the random walk behavior of conventional Metropolis-Hastings algorithms by following persistent trajectories through the parameter space (Duane et al., 1987; Neal, 1993, 1996). The advantages of this method are manifold. Besides correcting observational systematics introduced by survey geometry and selection effects, the exact treatment of the non-Gaussian behavior and structure of the Poissonian shot noise contribution of discrete galaxy distributions permits very accurate recovery of poorly sampled objects, such as voids and filaments. In addition, the lognormal prior has been demonstrated to be an adequate statistical description of the present density field and hence enables us to infer the cosmic density field deep into the non-linear regime (see e.g. Hubble, 1934; Peebles, 1980; Coles & Jones, 1991; Gaztanaga & Yokoyama, 1993; Kayo et al., 2001). It is important to remark that HADES does not yield only a single estimate, such as a mean, mode or variance; it provides a sampled representation of the full non-Gaussian density posterior.
This posterior encodes the full nonlinear and nonGaussian observational uncertainties, which can easily be propagated to any finally inferred quantity.
The application of HADES to Sloan Digital Sky Survey data (SDSS; York et al., 2000) therefore constitutes the first non-linear, non-Gaussian fully Bayesian large scale structure analysis conducted so far. In particular, we applied our method to the recent SDSS Data Release 7 (DR7; Abazajian et al., 2009) and produced about 3 TB of valuable scientific information in the form of high resolution non-linear density samples. The density inference is conducted on an equidistant cubic grid with side length 750 Mpc consisting of $256^3$ volume elements. The recovered density field clearly reveals the cosmic web structure, consisting of voids, filaments and clusters, of the large scale structure surrounding us.
These results provide the basis for forthcoming investigations of the clustering behavior of galaxies in relation to their large-scale environment. Such analyses yield valuable information about the formation and evolution of galaxies. For example, it has long been known that physical properties such as morphological type, color, luminosity, spin parameter, star formation rate, concentration parameter, etc., are functions of the cosmic environment (see e.g. Dressler, 1980; Postman & Geller, 1984; Whitmore et al., 1993; Lewis et al., 2002; Gómez et al., 2003; Goto et al., 2003; Rojas et al., 2005; Kuehn & Ryden, 2005; Blanton et al., 2005; Bernardi et al., 2006; Choi et al., 2007; Park et al., 2007; Lee & Lee, 2008; Lee & Li, 2008).
In this work we already conduct a preliminary examination of the dependence of the stellar mass and color of galaxies on their large-scale environment; more thorough investigations will be presented in following works. Analyzing galaxy properties in the large-scale environment also requires classifying the large scale structure into different cosmic web types. We do so by following the dynamic cosmic web type classification procedure proposed by Hahn et al. (2007), with the extension of Forero-Romero et al. (2009). The application of this procedure to our results yields the cosmic web type posterior, which provides the probability of finding a certain web type (void, sheet, filament, halo) at a given position in the volume conditional on the SDSS data. This permits simple propagation of all observational uncertainties to the final analysis of galaxy properties.
The paper is structured as follows. We start with a brief review of the methodology in section 2, particularly describing the lognormal Poissonian posterior and the Bayesian computer algorithm HADES. Additionally, we describe the dynamic web classification procedure mentioned above. In section 3 we give a description of the SDSS DR7 data and present the necessary data preparation steps required to apply the analysis procedure. Specifically, we describe the preparation of the linear observation response operator and the creation of the three dimensional data cube. In section 4 we present the results obtained from the non-linear, non-Gaussian sampling procedure. We start by analyzing the convergence behavior of the Markov chain via a Gelman & Rubin diagnostic, followed by a discussion of the properties of individual Hamiltonian samples. Here we also provide estimates for the ensemble mean density field and the according variance. These fields depict remarkably well the properties of the cosmic web, consisting of voids, filaments and halos. Based on these results we perform a simple cosmic web classification in section 5. In section 6 we present a preliminary examination of the correlation between the large-scale environment of galaxies and their physical properties. In particular, we study the stellar mass and color of galaxies in relation to the density contrast $\delta$. We conclude the paper in section 7 by summarizing and discussing the results.
2 Methodology
In this section we give a brief review of the methods used for the large scale structure inference. In particular, we discuss the lognormal Poissonian posterior and the corresponding data model. Further, we give a description of the HADES algorithm and of a dynamic cosmic web classification procedure.
2.1 Lognormal Poissonian posterior
Precision inference of the large scale structure in the mildly and strongly non-linear regime requires a detailed treatment of the non-Gaussian behavior of the large scale structure posterior. Although the exact probability distribution for the density field in these regimes is not known, it has long been suggested that the fully evolved non-linear matter field can be well described by lognormal statistics (see e.g. Hubble, 1934; Peebles, 1980; Coles & Jones, 1991; Gaztanaga & Yokoyama, 1993; Kayo et al., 2001). This phenomenological guess has been justified by the theoretical considerations of Coles & Jones (1991), who argue that assuming Gaussian initial conditions in the density and velocity distributions leads to a lognormally distributed density field, as a direct consequence of the continuity equation, i.e. the conservation of mass. In addition, the validity of the lognormal distribution as a description of the statistical properties of non-linear density fields has been evaluated by Kayo et al. (2001), who studied the probability distribution of cosmological non-linear density fluctuations in N-body simulations with Gaussian initial conditions. They found that the lognormal distribution accurately describes the non-linear density field even up to large values of the density contrast. In addition, recently Kitaura et al. (2009) analyzed the statistical properties of the SDSS DR6 Wiener reconstructed density field and confirmed a lognormal behavior.
For all these reasons, we believe that the statistical behavior of the non-linear density field can be well described by a multivariate lognormal distribution, as given by:
$$\mathcal{P}(\{s_k\}|S)=\frac{1}{\sqrt{(2\pi)^{N}\,\det(S)}}\,\exp\!\left(-\frac{1}{2}\sum_{ij}\left[\ln(1+s_i)+\mu_i\right]S^{-1}_{ij}\left[\ln(1+s_j)+\mu_j\right]\right)\prod_k\frac{1}{1+s_k}\,,\qquad(1)$$
where $s_i$ is the density signal at the three dimensional Cartesian position $\vec{x}_i$, $S_{ij}$ is the covariance matrix of the lognormal distribution and $\mu_i$ describes a constant mean field, given by:
$$\mu_i=\frac{1}{2}\,S_{ii}\,,\qquad(2)$$

which ensures a vanishing mean of the density contrast, $\langle s_i\rangle=0$.
This probability distribution seems to be an adequate prior choice for reconstructing the present density field.
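As an illustration, the logarithm of the prior in equation (1) can be evaluated directly. The following sketch is our own simplified code, not part of HADES; the function name and the explicit dense covariance matrix are assumptions for small toy problems:

```python
import numpy as np

def lognormal_prior_logpdf(delta, S):
    """Log of the multivariate lognormal prior of eq. (1).

    delta : 1-d array of density contrasts s_i > -1
    S     : covariance matrix of ln(1 + delta)
    The constant mean field mu_i = S_ii / 2 of eq. (2) enforces <delta> = 0.
    """
    y = np.log1p(delta)                 # ln(1 + s_i)
    mu = 0.5 * np.diag(S)               # eq. (2)
    r = y + mu
    N = delta.size
    _, logdet = np.linalg.slogdet(S)
    quad = r @ np.linalg.solve(S, r)    # sum_ij r_i S^-1_ij r_j
    # Gaussian normalisation, kernel, and the Jacobian product 1/(1+s_k)
    return -0.5 * (N * np.log(2.0 * np.pi) + logdet + quad) - y.sum()
```

In a sampler this quantity (and its gradient) is what enters the Hamiltonian; here it merely illustrates the structure of equation (1).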
Studying the actual matter distribution of the Universe requires drawing inference from some observable tracer particles, such as a set of observed galaxies. Assuming galaxies to be discrete particles, their distribution can be described as a specific realization drawn from an inhomogeneous Poisson process (see e.g. Layzer, 1956; Peebles, 1980; Martínez & Saar, 2002). The corresponding probability distribution is given by:
$$\mathcal{P}(\{N_k\}|\{\lambda_k\})=\prod_k\frac{\lambda_k^{N_k}\,e^{-\lambda_k}}{N_k!}\,,\qquad(3)$$
where $N_k$ is the observed galaxy number at position $\vec{x}_k$ in the sky and $\lambda_k$ is the expected number of galaxies at this position. The mean galaxy number is related to the signal $s_k$ via:
$$\lambda_k=R_k\,\bar{N}\left(1+B(s)_k\right)\,,\qquad(4)$$
where $R_k$ is a linear response operator incorporating survey geometries and selection effects, $\bar{N}$ is the mean number of galaxies in the volume and $B(s)_k$ is a non-linear, non-local bias operator at position $\vec{x}_k$. The lognormal prior given in equation (1), together with the Poissonian likelihood given in equation (3), yields the lognormal Poissonian posterior for the density contrast $s_k$ given some galaxy observations $N_k$:
$$\mathcal{P}(\{s_k\}|\{N_k\})\propto\exp\!\left(-\frac{1}{2}\sum_{ij}\left[\ln(1+s_i)+\mu_i\right]S^{-1}_{ij}\left[\ln(1+s_j)+\mu_j\right]\right)\prod_k\frac{1}{1+s_k}\,\frac{\left(R_k\,\bar{N}(1+B(s)_k)\right)^{N_k}e^{-R_k\,\bar{N}(1+B(s)_k)}}{N_k!}\,.\qquad(5)$$
It is important to note that this is a highly non-Gaussian distribution, and non-linear reconstruction methods are required in order to perform accurate matter field reconstructions in the non-linear regime. For example, estimating the maximum a posteriori values from the lognormal Poissonian distribution involves the solution of implicit equations. Several attempts to use a lognormal Poissonian posterior for density inference have been presented in the literature. These attempts date back to Sheth (1995), who proposed to use a variable transformation in order to derive a generalized Wiener filter for the lognormal distribution. This approach, however, yielded a very complex form for the noise covariance matrix, making applications to real data sets impractical. The first successful application of the lognormal Poissonian distribution to density inference was presented by Saunders et al. (2000). Their method is based on the expansion of the density logarithm into spherical harmonics (Saunders & Ballinger, 2000). More accurate schemes based on maximum and mean posteriori principles were derived by Enßlin et al. (2008). Recently, an implementation of the maximum a posteriori scheme was presented and thoroughly tested by Kitaura et al. (2009). They found that, assuming a linear bias, the lognormal Poissonian posterior permits recovery of the density field deep in the non-linear regime, up to large values of the density contrast. Finally, Jasche & Kitaura (2009) developed the Hamiltonian density estimation and sampling scheme to map out the posterior probability distribution.
2.2 HADES
As already described above, Bayesian non-linear large scale structure inference requires sampling from non-Gaussian posterior distributions. In order to do so, we developed HADES (see Jasche & Kitaura, 2009, for more details). HADES explores the very high dimensional parameter space of the three dimensional density field via a Hamiltonian Monte Carlo (HMC) sampling scheme. Unlike conventional Metropolis-Hastings algorithms, which move through the parameter space by a random walk and therefore require prohibitive numbers of steps to explore high dimensional spaces, the HMC sampler suppresses random walk behavior by introducing a persistent motion of the Markov chain through the parameter space (Duane et al., 1987; Neal, 1993, 1996). In this fashion, the HMC sampler maintains a reasonable efficiency even for high dimensional problems (Hanson, 2001). Since it is a fully Bayesian method, the scientific output is not a single estimate but a sampled representation of the multidimensional lognormal Poissonian posterior distribution given in equation (5). Given this representation of the posterior, any desired statistical summary, such as mean, mode or variance, can easily be calculated. Further, any uncertainty can seamlessly be propagated to the finally estimated quantities by simply applying the corresponding estimation procedure to all Hamiltonian samples. For a detailed description of the theory behind the large scale structure sampler and its numerical implementation see Jasche & Kitaura (2009).
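The HMC scheme described above can be illustrated on a toy problem. The sketch below is our own illustrative code, not the HADES implementation: it assumes a unit mass matrix and a fixed step size, and shows the leapfrog integration of Hamilton's equations followed by a Metropolis acceptance of the trajectory endpoint:

```python
import numpy as np

def hmc_sample(logp_grad, x0, n_samples, eps=0.1, n_leap=20, rng=None):
    """Toy Hamiltonian Monte Carlo sampler (illustrative, not HADES).

    logp_grad(x) -> (log posterior, gradient); unit mass matrix assumed.
    """
    rng = np.random.default_rng(rng)
    x = np.array(x0, dtype=float)
    logp, grad = logp_grad(x)
    samples = []
    for _ in range(n_samples):
        p = rng.standard_normal(x.shape)            # draw auxiliary momenta
        x_new, grad_new = x.copy(), grad
        H_old = -logp + 0.5 * p @ p                 # Hamiltonian at start point
        # leapfrog integration: persistent trajectory through parameter space
        p = p + 0.5 * eps * grad_new
        for i in range(n_leap):
            x_new = x_new + eps * p
            logp_new, grad_new = logp_grad(x_new)
            if i < n_leap - 1:
                p = p + eps * grad_new
        p = p + 0.5 * eps * grad_new
        H_new = -logp_new + 0.5 * p @ p
        # Metropolis acceptance of the trajectory endpoint
        if np.log(rng.uniform()) < H_old - H_new:
            x, logp, grad = x_new, logp_new, grad_new
        samples.append(x.copy())
    return np.array(samples)
```

Because the leapfrog integrator nearly conserves the Hamiltonian, the acceptance rate stays high even when the trajectory moves far from the starting point, which is exactly the property that suppresses random walk behavior.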
2.3 Classification of the cosmic web
The results generated by the Hamiltonian sampler HADES will permit a variety of scientific analyses of the large scale structure in the observed Universe. An interesting example is the classification of the cosmic web, in particular the identification of different types of structures in the density field. Such an analysis, for example, is valuable for studying the environmental dependence of galaxy formation and evolution (see e.g. Lee & Lee, 2008; Lee & Li, 2008). Since the structure classification is not always unique, we provide the full Bayesian posterior distribution of the structure type at a given position conditional on the observations.
However, to do so we first need a means of identifying different structure types from the density field. Numerous methods for structure analysis have been presented in the literature (see e.g. Lemson & Kauffmann, 1999; Colberg et al., 2005; Novikov et al., 2006; Hahn et al., 2007; Aragón-Calvo et al., 2007; Colberg et al., 2008; Forero-Romero et al., 2009). In principle, all of these methods can be applied to the analysis of the Hamiltonian samples; however, for the purpose of this paper we follow the dynamical cosmic web classification procedure proposed by Hahn et al. (2007). They propose to classify the large scale structure environment into four web types (voids, sheets, filaments and halos) based on a local-stability criterion for the orbits of test particles. The basic idea of this dynamical classification approach is that the eigenvalues of the deformation tensor characterize the geometrical properties of each point in space. The deformation tensor $T_{ij}$ is given by the Hessian of the gravitational potential $\Phi$:
$$T_{ij}=\frac{\partial^2\Phi}{\partial x_i\,\partial x_j}\,,\qquad(6)$$
with $\Phi$ being the rescaled gravitational potential, given as (see Forero-Romero et al., 2009):
$$\nabla^2\Phi=\delta\,,\qquad(7)$$

where $\delta$ is the density contrast.
It is important to note that the deformation tensor and the rescaled gravitational potential are both physical quantities, and hence their calculation requires the availability of a full physical density field, in contrast to a smoothed mean reconstruction of the density field. As was already mentioned above, and as will be clarified in section 4.2, the Hamiltonian samples provide such full physical density fields. The deformation tensor can therefore easily be calculated for each Hamiltonian sample from the Fourier space representation of equation (6). Each spatial point can then be classified as a specific web type by considering the three eigenvalues, $\lambda_1\geq\lambda_2\geq\lambda_3$, of the deformation tensor. Namely, a void point corresponds to no positive eigenvalue, a sheet to one, a filament to two and a halo to three positive eigenvalues (Forero-Romero et al., 2009). The interpretation of this rule is straightforward, as the sign of a given eigenvalue at a given position defines whether the gravitational force along the principal direction of the corresponding eigenvector is contracting (positive eigenvalue) or expanding (negative eigenvalue). However, Forero-Romero et al. (2009) found that, rather than using a threshold value of zero, different positive values can yield better web classifications. For this reason, in this work we use the extended classification procedure proposed by Forero-Romero et al. (2009). The structures are then classified according to the rules given in table 1. By applying this classification procedure to all Hamiltonian samples we are able to estimate the posterior of the four different web types (void, sheet, filament, halo) conditional on the observations and the threshold criterion $\lambda_{\rm th}$.
Table 1. Web type classification rules, for eigenvalues ordered as $\lambda_1\geq\lambda_2\geq\lambda_3$ and threshold $\lambda_{\rm th}$.

Structure type    rule
Void              $\lambda_1<\lambda_{\rm th}$
Sheet             $\lambda_1\geq\lambda_{\rm th}$ and $\lambda_2<\lambda_{\rm th}$
Filament          $\lambda_2\geq\lambda_{\rm th}$ and $\lambda_3<\lambda_{\rm th}$
Halo              $\lambda_3\geq\lambda_{\rm th}$
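The classification can be sketched concretely: solve equation (7) for the rescaled potential in Fourier space, form the deformation tensor of equation (6), and count eigenvalues above the threshold. This is our own illustrative implementation; the function name, periodic-grid conventions and normalization are assumptions, not the code used in the paper:

```python
import numpy as np

def web_classify(delta, boxsize, lambda_th=0.0):
    """Classify each voxel as void/sheet/filament/halo (0/1/2/3 eigenvalues
    of the deformation tensor above lambda_th; Forero-Romero et al. 2009).

    Solves nabla^2 Phi = delta (eq. 7) on a periodic grid, so that in
    Fourier space T_ij(k) = k_i k_j delta(k) / k^2 (eq. 6).
    """
    n = delta.shape[0]
    dk = np.fft.fftn(delta)
    k1 = 2.0 * np.pi * np.fft.fftfreq(n, d=boxsize / n)
    kx, ky, kz = np.meshgrid(k1, k1, k1, indexing="ij")
    k2 = kx**2 + ky**2 + kz**2
    k2[0, 0, 0] = 1.0                       # avoid division by zero; DC mode dropped
    kvec = (kx, ky, kz)
    T = np.empty(delta.shape + (3, 3))
    for i in range(3):
        for j in range(3):
            tij_k = kvec[i] * kvec[j] / k2 * dk
            tij_k[0, 0, 0] = 0.0
            T[..., i, j] = np.fft.ifftn(tij_k).real
    eigvals = np.linalg.eigvalsh(T)         # per-voxel eigenvalues, ascending
    return (eigvals > lambda_th).sum(axis=-1)
```

Applied to every Hamiltonian sample, the per-voxel counts can then be histogrammed into the web type posterior described above.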
3 Data
In this section we describe the SDSS galaxy sample used for the analysis. Additionally, we discuss the data preparation steps required to perform the three dimensional density inference procedure.
3.1 The SDSS galaxy sample
We use data from Sample dr72 of the New York University Value Added Galaxy Catalogue (NYU-VAGC).
The NYU-VAGC also provides the necessary information to correct for incompleteness in our spectroscopic sample. This includes two parts: a mask, which shows which areas of the sky have been targeted and which have not, and a radial selection function, which gives the fraction of galaxies in the absolute magnitude range being considered that are within the apparent magnitude range of the sample at a given redshift. The mask defines the effective area of the survey on the sky, which is 6437 deg$^2$ for the sample we use here. This survey area is divided into a large number of smaller subareas, called polygons, for each of which the NYU-VAGC lists a spectroscopic completeness, defined as the fraction of photometrically identified target galaxies in the polygon for which usable spectra were obtained. Over our sample the average completeness is 0.92.
3.2 Completeness and selection function
Three dimensional density field inference requires the definition of the linear observation response operator $R_i$, as given in section 2.1. This response operator describes to what percentage each volume element of the three dimensional domain has been observed. It is hence a projection of the product of the radial and angular selection functions onto the three dimensional voxelized space. In particular, we have to solve the convolution integral:
$$R_i=\int{\rm d}^3x\;K\!\left(\vec{x}_i,\vec{x}\right)\,f(r)\,g(\alpha,\delta)\,,\qquad(8)$$
where $K(\vec{x}_i,\vec{x})$ is the voxel kernel, $f(r)$ is the radial selection function, with $r$ being the distance from the observer, and $g(\alpha,\delta)$ is the angular selection function, where $\alpha$ and $\delta$ are right ascension and declination, respectively. We evaluate this integral numerically for the nearest grid point kernel by following different lines of sight and calculating the contribution of the product of the angular and radial selection functions to each voxel.
As mentioned above, in this work we used the two dimensional sky mask and the radial selection function provided by the NYU-VAGC.
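A simplified numerical evaluation of equation (8) can be sketched by subsampling each voxel and averaging the product of the selection functions over the sub-points (a simpler variant of the line-of-sight integration used here; the function name and the interfaces of the user-supplied selection functions `f_radial` and `g_angular` are our own assumptions):

```python
import numpy as np

def response_operator(n, boxsize, f_radial, g_angular, nsub=4, origin=(0.0, 0.0, 0.0)):
    """Response operator R_i of eq. (8) for a nearest-grid-point kernel.

    Each voxel is subsampled with nsub^3 points; the product of the radial
    selection f_radial(r) and angular selection g_angular(alpha, dec) is
    averaged over the sub-points. Illustrative sketch only.
    """
    h = boxsize / n
    off = (np.arange(nsub) + 0.5) * h / nsub        # sub-point offsets in a voxel
    edges = np.arange(n) * h                        # voxel lower edges
    R = np.zeros((n, n, n))
    for ox in off:
        for oy in off:
            for oz in off:
                x = edges[:, None, None] + ox - origin[0]
                y = edges[None, :, None] + oy - origin[1]
                z = edges[None, None, :] + oz - origin[2]
                r = np.sqrt(x**2 + y**2 + z**2)
                alpha = np.arctan2(y, x) * np.ones_like(r)    # right ascension
                dec = np.arcsin(z / np.where(r > 0, r, 1.0))  # declination
                R += f_radial(r) * g_angular(alpha, dec)
    return R / nsub**3
```

With trivial selection functions equal to one everywhere, the operator reduces to unity in every voxel, which provides a simple consistency check.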
3.3 Creating the three dimensional data cube
The large scale structure sampler operates on a three dimensional equidistant grid. In particular, in this work we set up a cubic grid with a side length of 750 Mpc and $256^3$ voxels, which amounts to a voxel side length of roughly 3 Mpc. Since our algorithm relies on the correlation function in comoving space, all calculations are performed with comoving length units rather than with redshift distances. For this reason, we transform all galaxy redshifts to comoving distances via the relation:
$$d(z)=\int_0^{z}\frac{c\,{\rm d}z'}{H(z')}\,,\qquad(9)$$
where $z$ is the estimated galaxy redshift, $c$ is the speed of light and $H(z)$ is the Hubble parameter, given as:
$$H(z)=H_0\sqrt{\Omega_m(1+z)^3+\Omega_\Lambda}\,.\qquad(10)$$
Further, we choose a concordance $\Lambda$CDM model with cosmological parameters taken from Spergel et al. (2007). With these definitions we can calculate the three dimensional Cartesian coordinates of each galaxy as:
$$\vec{x}=d(z)\begin{pmatrix}\cos(\delta)\cos(\alpha)\\ \cos(\delta)\sin(\alpha)\\ \sin(\delta)\end{pmatrix}\,,\qquad(11)$$
where $\alpha$ and $\delta$ are the right ascension and declination, respectively. We then sort the galaxy distribution onto the three dimensional equidistant grid via a nearest grid point procedure (see e.g. Hockney & Eastwood, 1988). An estimate for the expected number of galaxies can then be calculated as:
$$\lambda_i=R_i\,\bar{N}=R_i\,\frac{\sum_k N_k}{\sum_k R_k}\,,\qquad(12)$$
(see e.g. Kitaura et al., 2009; Jasche et al., 2009, for details).
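The coordinate transformation of equations (9)-(11) and the nearest grid point assignment can be sketched as follows. This is illustrative code; the cosmological parameter values are placeholders, not those used in the paper, and the function names are ours:

```python
import numpy as np

# placeholder flat LambdaCDM parameters, for illustration only
H0, OMEGA_M, OMEGA_L, C_KMS = 70.0, 0.25, 0.75, 299792.458

def comoving_distance(z, nstep=512):
    """d(z) = integral_0^z c dz' / H(z'), eqs. (9)-(10); trapezoidal rule, Mpc."""
    zp = np.linspace(0.0, z, nstep)
    Hz = H0 * np.sqrt(OMEGA_M * (1.0 + zp)**3 + OMEGA_L)   # eq. (10), flat case
    integrand = C_KMS / Hz
    return float(np.sum(0.5 * (integrand[1:] + integrand[:-1]) * np.diff(zp)))

def to_cartesian(z, alpha, dec):
    """Eq. (11): Cartesian position from redshift and sky angles (radians)."""
    d = comoving_distance(z)
    return np.array([d * np.cos(dec) * np.cos(alpha),
                     d * np.cos(dec) * np.sin(alpha),
                     d * np.sin(dec)])

def ngp_counts(positions, n, boxsize, origin):
    """Nearest-grid-point assignment of galaxy positions to an n^3 grid."""
    counts = np.zeros((n, n, n))
    idx = np.floor((np.atleast_2d(positions) - origin) / (boxsize / n)).astype(int)
    for i, j, k in idx:
        if 0 <= i < n and 0 <= j < n and 0 <= k < n:
            counts[i, j, k] += 1
    return counts
```

At low redshift the comoving distance reduces to the familiar $cz/H_0$, which is a convenient sanity check for the integrator.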
3.4 Physical model
Observations of galaxy redshifts do not permit direct inference of the underlying matter distribution. Various physical effects, such as galaxy biasing and redshift space distortions, must be taken into account for proper analyses. This is of particular relevance for the choice of power spectrum required for the sampling procedure (see equation (1)). However, according to the discussion in Erdoğdu et al. (2004) and Kitaura et al. (2009), these effects can to a large degree be accounted for in a separate postprocessing step, once the continuous expected galaxy density field in redshift space has been obtained. For this reason, here we seek to recover the density field in redshift space, permitting us to test various bias models and redshift space distortion correction methods in a subsequent step.
In particular, the relation between the true underlying dark matter density field and the expected continuous galaxy density contrast is generally very complicated and involves non-local and non-linear bias operators. Several non-local bias models have been presented, which mostly aim at correcting the large scale power in power spectrum estimation procedures (see e.g. Tegmark et al., 2004; Seljak, 2000; Peacock & Smith, 2000; Hamann et al., 2008). As described in sections 2.1 and 2.2, the Hamiltonian sampler is able to account for such bias models. Note, however, that a specific bias model also fixes the model for the underlying dark matter distribution. Therefore, here we prefer to follow the approach of previous works and set the bias operator to a constant linear factor equal to unity (Erdoğdu et al., 2004; Kitaura et al., 2009). In this fashion, one obtains the expected continuous galaxy density contrast. As discussed in Kitaura et al. (2009), the corresponding underlying dark matter distribution can then simply be obtained by deconvolving the results with a specific scale dependent bias model, permitting us to explore various different bias models.
In a similar manner, one can treat redshift space distortion effects. These are mainly due to the peculiar velocities of galaxies, which introduce Doppler shifts in the redshift measurement (see e.g. Kaiser, 1987; Peacock & Dodds, 1994; Hamilton, 1998; Davis & Peebles, 1983). This effect leads to a radial smearing of the observed density field in redshift space and yields elongated structures along the line of sight, the so-called finger-of-god effect.
Additionally, there exists a cosmological redshift space effect which is sensitive to the global geometry of the Universe. In particular, the comoving separation of a pair of galaxies cannot be determined from their observable angular and redshift separations alone without specifying the geometry, or equivalently the matter content, of the Universe (Magira et al., 2000). This effect yields anisotropies in the matter distribution, especially at high redshift (see e.g. Alcock & Paczyński, 1979; Matsubara & Suto, 1996; Ballinger et al., 1996; Popowski et al., 1998). However, for the volume considered in this work, the dominant redshift space distortions are due to non-linear peculiar motions of galaxies in large overdensities. This effect has pronounced consequences for the power spectrum in redshift space, since it suppresses power on small scales. As demonstrated in Erdoğdu et al. (2004), the redshift space power spectrum of a fully evolved non-linear matter distribution is very similar to a linear power spectrum at the scales relevant for this work. There, they used the non-linear power spectrum fitting formula provided by Smith et al. (2003). However, the exact galaxy power spectrum in redshift space is not known. The work of Tegmark et al. (2006) indicates that the recovered power spectrum of the SDSS main sample is close to a linear power spectrum, which may be due to the fact that this galaxy sample is not strongly clustered. In this case, the redshift space power spectrum would be even closer to a linear power spectrum. In any case, assuming a linear power spectrum will still permit physically accurate matter field inference in redshift space (Erdoğdu et al., 2004). For this reason, in the absence of more precise information on the galaxy power spectrum in redshift space, here we will assume a linear power spectrum, calculated according to the prescription provided by Eisenstein & Hu (1998) and Eisenstein & Hu (1999).
One should also bear in mind that the data itself governs the inference process. For this reason, power spectra measured from the Hamiltonian samples will only be partially determined by the a priori power spectrum guess, and mostly by the data. However, we defer a more careful treatment of all physical effects, including a joint inference of density field and power spectrum, to a future work.
It is clear that precise correction of these redshift space effects requires knowledge of the peculiar velocities of all observed galaxies, which is usually not provided by galaxy redshift surveys. Therefore, precise correction of redshift space distortions is very complicated and the subject of ongoing research. In the linear regime, the theory behind the observed redshift space distortions is well developed (Kaiser, 1987; Hamilton, 1998). However, in the quasi-linear and non-linear regimes, we instead have to resort to making approximations or using fitting formulae based on numerical simulations (Percival & White, 2009). The literature provides numerous approaches to alleviate these redshift space distortions, particularly in power spectrum estimation. Most of these approaches aim at restoring the correct power by deconvolution with a redshift space convolution kernel which takes into account the random pair velocities of galaxies in collapsed objects (see e.g. Peacock & Dodds, 1994; Ballinger et al., 1996; Jing et al., 1998; Hamilton, 1998; Kang et al., 2002; Jing & Börner, 2004; Erdoğdu et al., 2004; Scoccimarro, 2004; Cabré & Gaztañaga, 2009; Percival & White, 2009). Such techniques have been adopted to correct Wiener density reconstructions by applying a redshift distortion operator to the final result, in order to restore the correct power (Erdoğdu et al., 2004; Kitaura et al., 2009). However, it must be noted that this method does not account for the correction of phase information, and therefore only corrects the two-point statistics of the recovered density field.
Three dimensional density inference hence requires redshift space distortion corrections which also account for phase information and which depend on the density or gravitational potential. In the linear regime it is possible to apply an inverse redshift space operator which transforms the redshift space density to its real space counterpart (Taylor & Valentine, 1999; D'Mellow & Taylor, 2000). However, this does not account for the strongly non-linear regime, which mostly generates the finger-of-god effect. For this reason Tegmark et al. (2004) proposed a finger-of-god compression method. They use a standard friends-of-friends algorithm to identify clusters, taking into account different density thresholds, which set the linking length. They then measure the dispersion of galaxies about the cluster center along the line of sight and in the transverse direction. If the radial dispersion exceeds the transverse dispersion, the cluster is compressed radially until the radial dispersion equals the transverse dispersion (Tegmark et al., 2004). However, it is not clear to what degree such a method would falsely isotropize filaments or underdense objects along the line of sight into spherical clusters. Such a method of isotropizing the density field can, however, also be applied in a postprocessing step, by noting that a density threshold corresponds to a linking length in the friends-of-friends algorithm.
Nevertheless, the above correction methods mask the fact that redshift space distortions introduce statistical uncertainties, and thus unique recovery of the real space density field is generally not possible. A full characterization of the joint uncertainties of the real space density hence requires carefully taking into account the uncertainties introduced by redshift space distortions, or the lack of knowledge of peculiar velocities. This can be achieved by introducing a density dependent peculiar velocity sampling scheme into our method, as proposed by Kitaura & Enßlin (2008). However, we defer sampling of the peculiar velocities to a future work.
4 Results
In this section we describe the results obtained from the large scale structure inference procedure.
4.1 Convergence test
HADES is a Markov Chain Monte Carlo sampler, and hence we have to test whether the individual Hamiltonian samples really represent the lognormal Poissonian posterior. Convergence diagnostics of Markov chains are the subject of many discussions in the literature (see e.g. Heidelberger & Welch, 1981; Gelman & Rubin, 1992; Geweke, 1992; Raftery & Lewis, 1995; Cowles & Carlin, 1996; Hanson, 2001; Dunkley et al., 2005). Here we apply the widely used Gelman & Rubin diagnostic, which is based on multiple simulated chains and compares the variances within each chain and the variance between chains (Gelman & Rubin, 1992). In particular, we calculate the potential scale reduction factor (PSRF) (see Jasche & Kitaura, 2009). A large PSRF indicates that the inter-chain variance is substantially greater than the intra-chain variance and that longer chains are required. Once the PSRF approaches unity, one can conclude that each chain has reached the target distribution. We calculated the PSRF for each voxel in our calculation domain. The result of this test is presented in figure 1; it indicates convergence of the Markov chain. However, it can be seen that some regions of the domain converge faster than others. This is due to the fact that not all regions of the cubical volume have been observed equally. Regions which contain good observations converge faster, since there the probability distribution is narrower, while poorly observed or unobserved regions converge more slowly, since the space of possible solutions is larger. Also note that the Gelman & Rubin diagnostic is generally a conservative test, and other tests might indicate convergence much earlier. Nevertheless, this test clearly demonstrates that the quality and amount of observational data can have a strong impact on the convergence behavior of the chain.
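The per-voxel PSRF can be computed as in the following sketch, a minimal implementation of the Gelman & Rubin (1992) statistic (our own illustrative code; the paper's implementation may differ in detail):

```python
import numpy as np

def gelman_rubin_psrf(chains):
    """Potential scale reduction factor (Gelman & Rubin 1992).

    chains : array of shape (m, n, ...) -- m chains of n samples each;
             the PSRF is computed independently for every trailing
             (e.g. voxel) index.
    """
    m, n = chains.shape[0], chains.shape[1]
    chain_means = chains.mean(axis=1)
    grand_mean = chain_means.mean(axis=0)
    # between-chain variance B and mean within-chain variance W
    B = n / (m - 1) * ((chain_means - grand_mean)**2).sum(axis=0)
    W = chains.var(axis=1, ddof=1).mean(axis=0)
    # pooled posterior variance estimate
    var_hat = (n - 1) / n * W + B / n
    return np.sqrt(var_hat / W)
```

Values near unity indicate that the chains have mixed; markedly larger values flag voxels where longer chains are required.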
4.2 Hamiltonian samples
Since the Markov chain converges, we can conclude that the individual samples are really samples from the large scale structure posterior. At this point it is important to emphasize that the Hamiltonian samples are not the result of a filtering procedure. A filter generally suppresses the signal in low signal-to-noise regions, and therefore produces biased estimates of the physical density field. This is not the case for the individual Hamiltonian samples. Since they are random realizations of the lognormal Poissonian posterior, they are unbiased density fields in the sense that they possess correct physical power throughout the entire cubical volume. As an example, we present slices through an arbitrary density sample in figure 2. Already visually, one has the impression that the density field has equal power throughout the entire domain, even in the unobserved regions. This is because the Hamiltonian sampler nonlinearly augments the poorly or not observed regions with statistically correct information. Each density sample therefore is a proper physical density field, from which physical quantities can be derived. To demonstrate this, we measure the power spectra of some of these Hamiltonian samples. The result is presented in figure 3. As can be seen, the power spectra of the individual samples are very close to the assumed linear ΛCDM power spectrum. The deviations at large and small scales are due to the impact of the data: at small scales the deviation can be explained by redshift-space distortions, while at the largest scales cosmic variance dominates. There is clearly no sign of artificial power loss due to the survey geometry. Since the individual samples are valid density field realizations, it is easy to derive meaningful physical quantities, such as the gravitational potential, cosmic flows or the tidal shear tensor, as demonstrated in the remainder of this paper.
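Measuring the power spectrum of an individual density sample amounts to a Fourier transform followed by a spherical average over shells in |k|. A minimal sketch under common conventions (the normalization and names are illustrative, not the pipeline actually used here):

```python
import numpy as np

def power_spectrum(delta, box_size, n_bins=32):
    """Spherically averaged power spectrum of a density contrast cube.

    delta    : (n, n, n) array of density contrast values
    box_size : side length of the cubic volume (e.g. in Mpc)
    Returns (k_bin_centers, P(k)).
    """
    n = delta.shape[0]
    cell_vol = (box_size / n) ** 3
    # Fourier transform, normalized so that P(k) carries units of volume
    delta_k = np.fft.fftn(delta) * cell_vol
    power = np.abs(delta_k) ** 2 / box_size ** 3
    # magnitude of the wavevector on the FFT grid
    k1d = 2.0 * np.pi * np.fft.fftfreq(n, d=box_size / n)
    kx, ky, kz = np.meshgrid(k1d, k1d, k1d, indexing="ij")
    k_mag = np.sqrt(kx**2 + ky**2 + kz**2)
    # spherical average in |k| bins, skipping the k = 0 mode
    bins = np.linspace(k_mag[k_mag > 0].min(), k_mag.max(), n_bins + 1)
    which = np.digitize(k_mag.ravel(), bins) - 1
    kc = 0.5 * (bins[1:] + bins[:-1])
    pk = np.zeros(n_bins)
    for i in range(n_bins):
        sel = (which == i) & (k_mag.ravel() > 0)
        if sel.any():
            pk[i] = power.ravel()[sel].mean()
    return kc, pk
```

With this normalization a white-noise field of unit variance per cell has a flat spectrum at the level of the cell volume, which makes a convenient sanity check.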
4.3 Ensemble mean and variance
Here we present the ensemble mean and variance for the set of Hamiltonian samples, each of which is a full three-dimensional voxelized density field. For comparison with a single density sample, the middle panels of figure 2 show the corresponding slices through the ensemble mean density field, which exhibits many interesting features. First of all, it renders remarkably well the filamentary structure of our cosmic neighborhood. Many clusters, filaments and voids can clearly be seen by visual inspection. In the unobserved regions the ensemble mean density amplitudes drop to the cosmic mean, i.e. to vanishing density contrast, just as required by construction. Structures close to the observer are more clearly visible than structures at larger distances. In particular, filaments and voids are less prominent at larger distances. This is due to the observational response operator, which, owing to the radial selection function, drops to very low values at large distances. Therefore, once a galaxy is detected far away from the observer, it must reside inside a large overdensity, and hence inside a cluster. This expectation is clearly represented by the ensemble mean density field. Another interesting point to remark is that the borders of the unobserved regions are not very sharp. Some of the observed information is nonlinearly propagated into the unobserved regions, since our method takes into account the correlation structure of the underlying signal. It can therefore be seen that some clusters and voids are interpolated further out into the unobserved regions. In comparison to the Wiener filter as previously applied to SDSS data by Kitaura et al. (2009), the Hamiltonian sampler appears more conservative and less optimistic in extrapolating information into the unobserved region. This may be because here we take into account the full Poissonian noise statistics rather than restricting the noise to a Gaussian approximation.
Besides the ensemble mean, we also calculate the ensemble variance per voxel, which is the diagonal of the full ensemble covariance matrix. Some slices through the ensemble mean, the ensemble variance and the corresponding slices through the observational response operator are presented in figure 4, where the middle panels correspond to the ensemble variance. At first glance, one can nicely see the Poissonian nature of the galaxy shot noise. High density peaks in the ensemble mean map correspond to high variance regions in the ensemble variance map, as expected for Poissonian noise. One can clearly see that the Hamiltonian sampler took into account the full three dimensional noise structure of the galaxy distribution. Additionally, with larger distance to the observer the average variance increases, as expected due to the radial selection function. It is also interesting to remark that some voids have been detected with quite low variance, hence with high confidence. Note that although here we only plotted the diagonal of the density covariance matrix, the full non-diagonal covariance structure is completely encoded in the set of Hamiltonian samples, and can be taken into account in future analyses. Also note that the variance slices show high variances in regions where many galaxies have been observed. This is a key feature of Poisson statistics, because the standard deviation is equal to the square root of the number of individual galaxies. That is, if there are N galaxies in each voxel, the mean is equal to N and the standard deviation is equal to √N. This makes the signal-to-noise ratio equal to N/√N = √N for such a homogeneous case. To emphasize the fact that regions which show high variances also have high signal-to-noise ratios, we calculate the density to variance ratio:
\[
\frac{\langle 1+\delta_i \rangle}{\sqrt{\big\langle \left( 1+\delta_i - \langle 1+\delta_i \rangle \right)^2 \big\rangle}}\,, \tag{13}
\]
The result of this calculation is presented in figure 5 for the case of the lower slices of figure 4. It clearly indicates high signaltonoise ratios in high density regions.
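The ensemble statistics discussed above reduce to simple per-voxel operations over the stack of samples. The following hypothetical helper (names are our own) illustrates the Poissonian N/√N = √N scaling from the text:

```python
import numpy as np

def ensemble_statistics(samples):
    """Per-voxel ensemble mean, variance and density-to-variance ratio.

    samples : array of shape (n_samples, nx, ny, nz) holding the
              density (1 + delta) of each Hamiltonian sample
    """
    mean = samples.mean(axis=0)
    var = samples.var(axis=0)          # diagonal of the ensemble covariance
    # signal-to-noise per voxel; zero where the variance vanishes
    snr = np.where(var > 0, mean / np.sqrt(np.where(var > 0, var, 1.0)), 0.0)
    return mean, var, snr
```

For a stack of Poisson draws with mean N per voxel, the recovered ratio approaches √N, mirroring the homogeneous case described above.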
In addition, we also estimate the cumulative probabilities P(δ_i < δ_th | d), at twenty different density threshold values δ_th, for the density found at each voxel. These cumulative probabilities are estimated from the Hamiltonian samples by:
\[
P(\delta_i < \delta_{\mathrm{th}} \mid d) = \frac{1}{N} \sum_{k=1}^{N} \Theta\!\left(\delta_{\mathrm{th}} - \delta_i^{\,k}\right)\,, \tag{14}
\]
where k labels the individual Hamiltonian samples, N is the total number of samples and Θ(x) is the Heaviside function. These cumulative probabilities allow us, for example, to estimate the median density at each voxel, and can be useful when analyzing galaxies in their cosmic environment, as will be done in a following project. Some such cumulative probability distributions, chosen randomly, are shown in figure 6. As can be seen, the recovered density amplitudes extend over a large range, from small linear to very high nonlinear values.
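Equation (14) is just a per-voxel counting operation over the sample stack, and the median follows directly from the sampled distribution. A minimal sketch (names are illustrative):

```python
import numpy as np

def cumulative_probability(samples, thresholds):
    """Per-voxel cumulative probabilities P(delta_i < delta_th | data).

    samples    : (n_samples, n_voxels) array of density contrast samples
    thresholds : 1-d array of density threshold values delta_th
    Returns an array of shape (len(thresholds), n_voxels): the fraction
    of samples below each threshold, i.e. the Heaviside count of eq. (14).
    """
    n_samples = samples.shape[0]
    return np.array([(samples < t).sum(axis=0) / n_samples
                     for t in thresholds])

def median_density(samples):
    """Median density per voxel, read off the sampled distribution."""
    return np.median(samples, axis=0)
```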
5 Web classification
Already in the introduction we mentioned that the results presented in section 4 are to be used for analyzing galaxy properties in the large-scale environment in a future work. Such analyses also require the classification of the large scale density field into different web type objects. Therefore, in order to characterize the environment of our SDSS galaxy population, here we apply the dynamic web classification procedure, as described in section 2.3, to the set of Hamiltonian samples. A similar analysis has been previously carried out by Lee & Erdogdu (2007) and Lee & Lee (2008), based on a Wiener mean density reconstruction of the 2MASS Redshift Survey, to study alignments of galaxy spins with the tidal field and the variation of galaxy morphological type with environmental shear.
Here we will follow a similar procedure to classify each individual voxel of a given Hamiltonian sample as one of the four web types T ∈ {void, sheet, filament, halo}. To do so, we perform the following three steps for an individual Hamiltonian sample:

Solve equation (6) for the deformation tensor by means of Fast Fourier Transform techniques

Solve the cubic characteristic equation for the three eigenvalues of the deformation tensor at each spatial position

Apply the rules given in table 1 to classify the web type at each spatial position for a given threshold value λ_th.
The result of this procedure for the kth sample is then a unit four-vector at each voxel position, all of whose entries are zero except for one, which indicates the web type. Applying the above method to all Hamiltonian samples yields a set of classification four-vectors, which encodes the information and uncertainty of the observations. Additionally, as an intermediate result, we obtain the set of the three eigenvalues of the deformation tensor for each individual Hamiltonian sample. Slices through their ensemble mean estimates are presented in figure 7.
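The three steps above can be sketched as follows, assuming the rescaled Poisson equation ∇²Φ = δ (so that the trace of the deformation tensor equals the density contrast) and the counting rule that the web type is the number of eigenvalues above λ_th (0 → void, 1 → sheet, 2 → filament, 3 → halo). This is an illustrative sketch, not the exact implementation used here:

```python
import numpy as np

def deformation_eigenvalues(delta, box_size):
    """Eigenvalues of the deformation tensor at each voxel via FFT.

    Solves the rescaled Poisson equation for the potential in Fourier
    space and returns the Hessian's three eigenvalues, sorted in
    descending order, at every grid position.
    """
    n = delta.shape[0]
    k1d = 2.0 * np.pi * np.fft.fftfreq(n, d=box_size / n)
    kx, ky, kz = np.meshgrid(k1d, k1d, k1d, indexing="ij")
    k2 = kx**2 + ky**2 + kz**2
    k2[0, 0, 0] = 1.0                  # avoid 0/0; the k = 0 mode is zeroed below
    delta_k = np.fft.fftn(delta)
    kvec = (kx, ky, kz)
    tensor = np.empty(delta.shape + (3, 3))
    for i in range(3):
        for j in range(3):
            tij_k = kvec[i] * kvec[j] / k2 * delta_k   # T_ij(k) = k_i k_j delta_k / k^2
            tij_k[0, 0, 0] = 0.0
            tensor[..., i, j] = np.fft.ifftn(tij_k).real
    eigvals = np.linalg.eigvalsh(tensor)    # ascending order
    return eigvals[..., ::-1]               # lambda1 >= lambda2 >= lambda3

def classify_web(eigvals, lam_th=0.0):
    """Web type per voxel: count of eigenvalues above lam_th
    (0 = void, 1 = sheet, 2 = filament, 3 = halo)."""
    return (eigvals > lam_th).sum(axis=-1)
```

A useful consistency check of this convention is that the three eigenvalues sum to the (mean-subtracted) density contrast at every voxel.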
However, rather than summarizing the results in terms of mean and variance, here we want to estimate the full cosmic web posterior. This is achieved by counting the relative frequencies for each web type at each individual spatial coordinate within the set of Hamiltonian samples. With these definitions we obtain the cosmic web posterior for each web type as:
\[
P(T_i = T \mid d) = \frac{1}{N} \sum_{k=1}^{N} \delta^{K}_{T,\,T_i^{k}}\,, \tag{15}
\]
where k labels the individual Hamiltonian samples, N is the total number of samples and δ^K is the Kronecker delta. The cosmic web posterior incorporates all observational information and uncertainties, and enables us to determine how well different structures can be classified with respect to observational uncertainties.
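Equation (15) reduces to counting relative frequencies over the classified samples; a minimal sketch (the function name is our own):

```python
import numpy as np

def web_type_posterior(type_samples, n_types=4):
    """Cosmic web posterior per voxel from classified samples (eq. 15).

    type_samples : (n_samples, n_voxels) integer array giving the web
                   type of every voxel in every Hamiltonian sample
    Returns a (n_types, n_voxels) array of relative frequencies.
    """
    n_samples = type_samples.shape[0]
    return np.array([(type_samples == t).sum(axis=0) / n_samples
                     for t in range(n_types)])
```

By construction the four probabilities sum to unity at every voxel.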
We evaluate the cosmic web posterior for four different values of the threshold λ_th. Slices through the cosmic web posteriors for the four cases are presented in figure 8. It can clearly be seen that the properties of the survey geometry are represented by the four posterior distributions. While the web classification in the observed regions clearly follows the structure of the underlying density field, it obviously cannot provide a clear classification of unobserved regions. Also, with distance to the observer the web classification becomes more and more uncertain. In this fashion, the cosmic web posterior renders the uncertainties introduced by the radial selection function and the resulting higher shot noise contribution at larger distances. The impact of the threshold can be observed by comparing the four cosmic web posteriors. For the lowest threshold, the cosmic web consists of many small isolated voids, which occupy only a small fraction of the total area of the slice. With increasing threshold λ_th, voids become bigger and more connected, until they completely dominate the cosmic web at the largest threshold. The opposite behavior can be observed for the halo posteriors, as the number of clearly detected halos declines with increasing threshold λ_th. Following Forero-Romero et al. (2009), we also calculate the volume occupied by each web type (the volume filling fraction, VFF) and the fraction of mass contained in such a volume (the mass filling fraction, MFF). The results are presented in figure 9, and show the same behavior as described in Forero-Romero et al. (2009). Figure 9 supports the visual impression, gained by inspection of figure 8, that the VFF and MFF for voids in particular depend strongly on the threshold value λ_th. This shows that voids can serve as a sensitive monitor and indicator of the cosmic web (Forero-Romero et al., 2009). Unfortunately, Forero-Romero et al. (2009) do not provide an explicit gauging of the λ_th values from simulations.
Such a gauging and hence a clear definition of the different cosmic web types would be very valuable for these types of analysis.
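The VFF and MFF can be computed directly from a classified density field; a hypothetical helper illustrating the definitions used above:

```python
import numpy as np

def filling_fractions(web_types, density, n_types=4):
    """Volume and mass filling fractions per web type.

    web_types : integer array of web types per voxel
    density   : array of densities (1 + delta) per voxel, same shape
    Returns (vff, mff), each of length n_types.
    """
    total_mass = density.sum()
    # VFF: fraction of voxels of each type (all voxels have equal volume)
    vff = np.array([(web_types == t).mean() for t in range(n_types)])
    # MFF: fraction of the total mass residing in voxels of each type
    mff = np.array([density[web_types == t].sum() / total_mass
                    for t in range(n_types)])
    return vff, mff
```

Both sets of fractions sum to one by construction, which makes a simple consistency check.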
Having a representation of the web type posterior, we can for example calculate the odds ratio, given as:
\[
O(T_i = T \mid d) = \frac{P(T_i = T \mid d)}{1 - P(T_i = T \mid d)}\,, \tag{16}
\]
which tells us how much a specific web type is favored over all others. A global probability for each web type can be obtained by averaging over all voxels in the volume. For example, this permits us to build a simple structure type map which can be used for visual analyses, as presented in the next section. Such a map can be defined as:
\[
M_i = T \quad \mathrm{if} \quad O(T_i = T \mid d) > O_{\mathrm{th}}\,, \tag{17}
\]
where O_th is an odds threshold, usually chosen larger than unity.
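Equations (16) and (17) can be sketched as follows; marking voxels whose most probable type fails the odds cut as undecided (−1) is our own illustrative choice:

```python
import numpy as np

def structure_type_map(web_posterior, odds_th=1.0):
    """Structure type map from the web posterior via an odds threshold.

    web_posterior : (n_types, n_voxels) array of P(T | data)
    A voxel is assigned type T when the odds P(T)/(1 - P(T)) of its
    most probable type exceed odds_th; otherwise it is marked -1.
    """
    best = web_posterior.argmax(axis=0)          # most probable type per voxel
    p_best = web_posterior.max(axis=0)
    odds = p_best / np.clip(1.0 - p_best, 1e-12, None)   # eq. (16)
    return np.where(odds > odds_th, best, -1)            # eq. (17)
```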
6 Galaxy properties versus LSS
In this section we present a preliminary but intuitive examination of the correlations between the large-scale environment of galaxies and their physical properties. Here we consider two properties of galaxies, stellar mass and color, and study how these are correlated with the overdensity of the large-scale environment and its type, which is one of the four web types halo, filament, sheet and void. We will come back to this topic in a separate paper, considering more physical properties of galaxies and performing more careful and quantitative analyses.
Our results are shown in figures 10 and 11, where we plot the galaxies in our sample with different stellar masses and colors on top of a slice through the ensemble mean density field. In each figure the four panels correspond to four stellar-mass intervals as indicated. The galaxies falling into a given mass range are plotted in the corresponding panel, with red (blue) galaxies shown as red (blue) dots. Here we classify each galaxy into the red or blue population using its color and the luminosity-dependent divider as described in Li et al. (2006) (see their Eq. 7 and Table 4). The observer on Earth is at the bottom right-hand corner of the slice. The density field, averaged over the thickness of the slice, is projected onto the plane and repeated in every panel. In figure 10 the background density field is coded by the mean overdensity, averaged for each pixel over the range probed and the 40,000 Hamiltonian samples. In figure 11 we present a structure type map as defined in equation (17), for a fixed choice of the odds threshold and the web classification threshold. Each pixel of this map is color-coded by the web type determined by our classification algorithm described above, with the types halo, filament, sheet and void being plotted in black, light grey, dark grey and white, respectively.
Qualitatively, the galaxies plotted in these figures appear to closely trace the underlying large-scale structure. This is not surprising because, by construction, the latter is reconstructed from the former. However, careful comparison of the different panels reveals a number of interesting trends. First, there exists a clear correlation between galaxy mass and the large-scale environment, regardless of how the environment is quantified. More massive galaxies tend to reside in regions with higher densities and more halo-like structures. At the highest masses, almost all galaxies are confined within regions of high densities, or those of halo and filament types. As the stellar mass decreases, more and more galaxies are found in void-like regions. Second, at fixed stellar mass, galaxy color also appears to be correlated with the large-scale environment. Red galaxies trace the density field more closely than blue galaxies. At all masses, the distribution of blue galaxies is more extended across the different types of structures. At low masses, the blue population dominates the galaxies in void-like environments.
These trends are consistent with recent similar studies by Lee & Lee (2008) and Lee & Li (2008), which were based on much shallower galaxy samples (and thus smaller volumes), and also with the clustering analyses of Li et al. (2006). More work is needed to obtain a more quantitative characterization of the relationships between galaxy properties and the large-scale environment, and thus more powerful constraints on galaxy formation models. These results, in turn, can be fed back to the large scale structure inference and help to improve our cosmographical description of the Universe.
7 Summary and Conclusion
In this work we present the first application of the nonlinear, non-Gaussian Bayesian large scale structure inference algorithm HADES to SDSS DR7 data.
HADES is a numerically efficient implementation of a Hamiltonian Markov Chain sampler, which performs sampling in extremely high-dimensional parameter spaces. In particular, HADES explores the lognormal Poissonian density posterior, which permits precision recovery of poorly sampled objects and density field inference deep into the nonlinear regime (Jasche et al., 2009).
The large scale structure inference was conducted on a cubic equidistant grid with a side length of 750 Mpc, yielding a grid resolution of about 3 Mpc. The large scale structure inference procedure correctly accounts for the survey geometry, completeness and radial selection effects, as well as for the correct treatment of Poissonian noise. The analysis yielded about 3 TB of valuable scientific information in the form of full three dimensional density samples of the lognormal Poissonian density posterior. This set of density samples is thus a sampled representation of the full non-Gaussian density posterior distribution and therefore encodes all observational systematics and statistical uncertainties. Hence, all uncertainties and systematics can seamlessly be propagated to any finally inferred quantity, by simply applying the corresponding inference procedure to the set of samples. In this fashion, the results permit us to make precise and quantitative statements about the large scale density field and any derived quantity.
We stress that our Hamiltonian samples are not the result of a filtering procedure. A filter generally suppresses the power of the signal in low signal-to-noise regions and therefore does not yield a physically meaningful density, since it lacks power in poorly observed or unobserved regions. Each Hamiltonian density sample, in contrast, represents a complete physical matter field realization conditional on the observations, in the sense that it possesses correct physical power throughout the entire volume. Already visual inspection of these density samples shows a homogeneous distribution of power throughout the entire volume. This fact was emphasized by the power spectra measured from these density samples, which show no sign of being affected by a lack of power or artificial mode coupling, nor do they show any sign of being affected by an adaptive smoothing kernel, as would be expected for filter applications. This marks the crucial difference between our method and previous filter based density estimation procedures.
In section 4.3, we estimated the ensemble mean and the corresponding variance from the density samples. The estimated ensemble mean nicely depicts the cosmic web consisting of filaments, voids and clusters extracted from the SDSS data. The ensemble mean represents the mean estimated from the lognormal Poissonian posterior conditional on the SDSS data, and therefore encodes the observational uncertainties and systematics. This can be seen from the fact that the ensemble mean approaches the cosmic mean density in poorly or not observed regions. Further, we plotted the corresponding variance, which demonstrates that the non-Gaussian behavior and structure of the Poissonian shot noise was correctly taken into account in our analysis. In particular, the expected correlation between high mean density and high variance regions was clearly visible. We also estimated the cumulative probabilities for the density amplitude at each volume element, and demonstrated that the recovered density fields truly cover the broad range from linear to nonlinear density amplitudes.
To characterize the environment of our galaxy sample, but also to demonstrate the advantages of the Hamiltonian samples, we performed an example cosmic web type classification in section 5. In particular, we followed the dynamical cosmic web classification approach of Hahn et al. (2007) with the extensions proposed by Forero-Romero et al. (2009). This procedure involves the calculation of the cosmic deformation tensor and its eigenvalues. We demonstrated that this procedure can easily be applied to the set of samples, since they represent full physical matter field realizations. As a byproduct of this procedure we estimated the ensemble mean of the three eigenvalues of the cosmic deformation tensor. Further, we classified the individual volume elements as one of the four different web types void, sheet, filament and halo. The classification into four discrete web types enabled us to explicitly estimate the cosmic web posterior, which provides the probability of finding a specific web type at a given point in the volume conditional on the SDSS data. This result is especially appealing from a Bayesian point of view, since it emphasizes the fact that the result of a Bayesian method is a complete probability distribution rather than just a single estimate. Here we saw that voids in particular are a sensitive measure of the cosmic web. Of course, it is possible to repeat the cosmic web classification in a similar manner with any other classification procedure.
In section 6, we presented a preliminary examination of the correlation between the large-scale environment and the physical properties of galaxies. In particular, we considered the stellar mass and color of galaxies in relation to the density contrast. A qualitative analysis revealed that correlations exist between these galaxy properties and the large scale structure. In particular, massive galaxies are more likely to be found in massive structures, while low mass galaxies reside in void-like structures. The plots also demonstrate the different clustering behavior of red and blue galaxies. These observed trends are consistent with previous works (Lee & Lee, 2008; Lee & Li, 2008; Li et al., 2006). However, more work is required to provide quantitative statements. This will be done in a forthcoming publication.
The results presented in this work will be valuable for many subsequent scientific analyses of the dependence of galaxy properties on their cosmic environment. In particular, the Hamiltonian samples allow for a more intuitive handling of observational data, since they can be understood as full matter field realizations, or different multiverses, consistent with our data of the Universe we live in. Besides providing quantitative characterizations of the large scale structure, the results also give us an intuitive understanding of the three dimensional matter distribution in our cosmic neighborhood. We intend to make our data publicly available to the community.
Future applications will also take into account nonlinear bias models and peculiar velocity sampling procedures, to provide even more accurate density analyses.
We hope that this work demonstrates the potential of Bayesian large scale structure inference and its contribution to current and future precision analyses of our Universe.
Acknowledgments
We would like to thank Ofer Lahav and Benjamin D. Wandelt for suggesting the use of the lognormal Poissonian posterior for large scale structure inference. We also thank Simon D. M. White for encouraging discussions. Particular thanks also to Rainer Moll and Björn Malte Schäfer for useful discussions and support with many valuable numerical gadgets. The authors also thank Benton R. Metcalf for many interesting discussions and comments on this project, and Andreas Faltenbacher for suggesting the estimation of the cosmic deformation tensor. Special thanks also to María Ángeles Bazarra Castro for helpful assistance during the course of this project. Further, we thank the "Transregional Collaborative Research Centre TRR 33 - The Dark Universe" for the support of this work.
Funding for the SDSS and SDSSII has been provided by the Alfred P. Sloan Foundation, the Participating Institutions, the National Science Foundation, the U.S. Department of Energy, the National Aeronautics and Space Administration, the Japanese Monbukagakusho, the Max Planck Society, and the Higher Education Funding Council for England. The SDSS Web Site is http://www.sdss.org/.
The SDSS is managed by the Astrophysical Research Consortium for the Participating Institutions. The Participating Institutions are the American Museum of Natural History, Astrophysical Institute Potsdam, University of Basel, University of Cambridge, Case Western Reserve University, University of Chicago, Drexel University, Fermilab, the Institute for Advanced Study, the Japan Participation Group, Johns Hopkins University, the Joint Institute for Nuclear Astrophysics, the Kavli Institute for Particle Astrophysics and Cosmology, the Korean Scientist Group, the Chinese Academy of Sciences (LAMOST), Los Alamos National Laboratory, the Max-Planck-Institute for Astronomy (MPIA), the Max-Planck-Institute for Astrophysics (MPA), New Mexico State University, Ohio State University, University of Pittsburgh, University of Portsmouth, Princeton University, the United States Naval Observatory, and the University of Washington.
Footnotes
 http://sdss.physics.nyu.edu/vagc/
References
 Abazajian K. N., et al., 2009, ApJS, 182, 543
 Alcock C., Paczyński B., 1979, Nature, 281, 358
 Aragón-Calvo M. A., Jones B. J. T., van de Weygaert R., van der Hulst J. M., 2007, A&A, 474, 315
 Ballinger W. E., Peacock J. A., Heavens A. F., 1996, MNRAS, 282, 877
 Bernardi M., Nichol R. C., Sheth R. K., Miller C. J., Brinkmann J., 2006, AJ, 131, 1288
 Bistolas V., Hoffman Y., 1998, ApJ, 492, 439
 Blanton M. R., Brinkmann J., Csabai I., Doi M., Eisenstein D., Fukugita M., Gunn J. E., Hogg D. W., Schlegel D. J., 2003, AJ, 125, 2348
 Blanton M. R., Eisenstein D., Hogg D. W., Schlegel D. J., Brinkmann J., 2005, ApJ, 629, 143
 Blanton M. R., Lin H., Lupton R. H., Maley F. M., Young N., Zehavi I., Loveday J., 2003, AJ, 125, 2276
 Blanton M. R., Roweis S., 2007, AJ, 133, 734
 Cabré A., Gaztañaga E., 2009, MNRAS, 396, 1119
 Choi Y., Park C., Vogeley M. S., 2007, ApJ, 658, 884
 Colberg J. M., et al., 2008, MNRAS, 387, 933
 Colberg J. M., Sheth R. K., Diaferio A., Gao L., Yoshida N., 2005, MNRAS, 360, 216
 Coles P., Jones B., 1991, MNRAS, 248, 1
 Cowles M. K., Carlin B. P., 1996, Journal of the American Statistical Association, 91, 883
 Davis M., Peebles P. J. E., 1983, ApJ, 267, 465
 D’Mellow K. J., Taylor A. N., 2000, in S. Courteau & J. Willick ed., Cosmic Flows Workshop Vol. 201 of Astronomical Society of the Pacific Conference Series, Generalising the Inverse Redshift-Space Operator: Vorticity in Redshift-Space. p. 298
 Dressler A., 1980, ApJ, 236, 351
 Duane S., Kennedy A. D., Pendleton B. J., Roweth D., 1987, Physics Letters B, 195, 216
 Dunkley J., Bucher M., Ferreira P. G., Moodley K., Skordis C., 2005, MNRAS, 356, 925
 Ebeling H., Wiedenmann G., 1993, PRE, 47, 704
 Eisenstein D. J., Hu W., 1998, ApJ, 496, 605
 Eisenstein D. J., Hu W., 1999, ApJ, 511, 5
 Enßlin T. A., Frommert M., Kitaura F. S., 2008, ArXiv eprints
 Erdoğdu P., et al., 2004, MNRAS, 352, 939
 Erdoğdu P., Lahav O., Huchra J., et al. 2006, MNRAS, 373, 45
 Fisher K. B., Lahav O., Hoffman Y., Lynden-Bell D., Zaroubi S., 1995, MNRAS, 272, 885
 Forero-Romero J. E., Hoffman Y., Gottlöber S., Klypin A., Yepes G., 2009, MNRAS, 396, 1815
 Frommert M., Enßlin T. A., Kitaura F. S., 2008, MNRAS, 391, 1315
 Gaztanaga E., Yokoyama J., 1993, ApJ, 403, 450
 Gelman A., Rubin D., 1992, Statistical Science, 7, 457
 Geweke J., 1992, Evaluating the Accuracy of Sampling-Based Approaches to the Calculation of Posterior Moments
 Gómez P. L., et al., 2003, ApJ, 584, 210
 Goto T., Yamauchi C., Fujita Y., Okamura S., Sekiguchi M., Smail I., Bernardi M., Gomez P. L., 2003, MNRAS, 346, 601
 Hahn O., Porciani C., Carollo C. M., Dekel A., 2007, MNRAS, 375, 489
 Hamann J., Hannestad S., Melchiorri A., Wong Y. Y. Y., 2008, Journal of Cosmology and AstroParticle Physics, 7, 17
 Hamilton A. J. S., 1998, in Hamilton D., ed., The Evolving Universe Vol. 231 of Astrophysics and Space Science Library, Linear Redshift Distortions: a Review. p. 185
 Hanson K. M., 2001, in Sonka M., Hanson K. M., eds, Society of Photo-Optical Instrumentation Engineers (SPIE) Conference Series Vol. 4322, Markov chain Monte Carlo posterior sampling with the Hamiltonian method. pp 456–467
 Heidelberger P., Welch P. D., 1981, Commun. ACM, 24, 233
 Hockney R. W., Eastwood J. W., 1988, Computer simulation using particles. Taylor & Francis, Inc., Bristol, PA, USA
 Hoffman Y., 1994, in Balkowski C., Kraan-Korteweg R. C., eds, Unveiling Large-Scale Structures Behind the Milky Way Vol. 67 of Astronomical Society of the Pacific Conference Series, Wiener Reconstruction of the Large-Scale Structure in the Zone of Avoidance. p. 185
 Hubble E., 1934, ApJ, 79, 8
 Jasche J., Kitaura F. S., , 2009, Fast Hamiltonian Sampling for large scale structure inference, submitted to MNRAS
 Jasche J., Kitaura F. S., Wandelt B. D., Enßlin T. A., , 2009, Bayesian powerspectrum inference for Large Scale Structure data, submitted to MNRAS
 Jing Y. P., Börner G., 2004, ApJ, 617, 782
 Jing Y. P., Mo H. J., Boerner G., 1998, ApJ, 494, 1
 Kaiser N., 1987, MNRAS, 227, 1
 Kang X., Jing Y. P., Mo H. J., Börner G., 2002, MNRAS, 336, 892
 Kayo I., Taruya A., Suto Y., 2001, ApJ, 561, 22
 Kitaura F. S., Enßlin T. A., 2008, MNRAS, 389, 497
 Kitaura F. S., Jasche J., Li C., Enßlin T. A., Metcalf R. B., Wandelt B. D., Lemson G., White S. D. M., 2009, ArXiv eprints
 Kitaura F. S., Jasche J., Metcalf R. B., 2009, ArXiv eprints
 Klypin A., Hoffman Y., Kravtsov A. V., Gottlöber S., 2003, ApJ, 596, 19
 Kuehn F., Ryden B. S., 2005, ApJ, 634, 1032
 Lahav O., 1994, in Balkowski C., Kraan-Korteweg R. C., eds, ASP Conf. Ser. 67: Unveiling Large-Scale Structures Behind the Milky Way, Wiener Reconstruction of All-Sky Spherical Harmonic Maps of the Large-Scale Structure. p. 171
 Lahav O., Fisher K. B., Hoffman Y., Scharf C. A., Zaroubi S., 1994, ApJL, 423, L93+
 Layzer D., 1956, AJ, 61, 383
 Lee J., Erdogdu P., 2007, ApJ, 671, 1248
 Lee J., Lee B., 2008, ApJ, 688, 78
 Lee J., Li C., 2008, ArXiv eprints
 Lemson G., Kauffmann G., 1999, MNRAS, 302, 111
 Lewis I., et al., 2002, MNRAS, 334, 673
 Li C., Kauffmann G., Jing Y. P., White S. D. M., Börner G., Cheng F. Z., 2006, MNRAS, 368, 21
 Libeskind N. I., Yepes G., Knebe A., Gottloeber S., Hoffman Y., Knollman S. R., 2009, ArXiv eprints
 Magira H., Jing Y. P., Suto Y., 2000, ApJ, 528, 30
 Martínez V. J., Saar E., 2002, Statistics of the Galaxy Distribution. Chapman
 MartinezVaquero L. A., Yepes G., Hoffman Y., Gottlöber S., Sivan M., 2009, MNRAS, 397, 2070
 Matsubara T., Suto Y., 1996, ApJL, 470, L1
 Neal R. M., 1993, Technical Report CRG-TR-93-1, Probabilistic inference using Markov chain Monte Carlo methods. University of Toronto
 Neal R. M., 1996, Bayesian Learning for Neural Networks (Lecture Notes in Statistics), 1 edn. Springer
 Novikov D., Colombi S., Doré O., 2006, MNRAS, 366, 1201
 Park C., Choi Y., Vogeley M. S., Gott J. R. I., Blanton M. R., 2007, ApJ, 658, 898
 Peacock J. A., Dodds S. J., 1994, MNRAS, 267, 1020
 Peacock J. A., Smith R. E., 2000, MNRAS, 318, 1144
 Peebles P. J. E., 1980, The largescale structure of the universe
 Percival W. J., White M., 2009, MNRAS, 393, 297
 Popowski P. A., Weinberg D. H., Ryden B. S., Osmer P. S., 1998, ApJ, 498, 11
 Postman M., Geller M. J., 1984, ApJ, 281, 95
 Raftery A. E., Lewis S. M., 1995, in Gilks W. R., Richardson S., Spiegelhalter D. J., eds, Practical Markov Chain Monte Carlo, The number of iterations, convergence diagnostics and generic Metropolis algorithms. Chapman and Hall, pp 115–130
 Rojas R. R., Vogeley M. S., Hoyle F., Brinkmann J., 2005, ApJ, 624, 571
 Saunders W., Ballinger W. E., 2000, in Kraan-Korteweg R. C., Henning P. A., Andernach H., eds, Mapping the Hidden Universe: The Universe behind the Milky Way - The Universe in HI Vol. 218 of Astronomical Society of the Pacific Conference Series, Interpolation of Discretely-Sampled Density Fields. p. 181
 Saunders W., et al., 2000, in Kraan-Korteweg R. C., Henning P. A., Andernach H., eds, Mapping the Hidden Universe: The Universe behind the Milky Way - The Universe in HI Vol. 218 of Astronomical Society of the Pacific Conference Series, The IRAS View of the Local Universe. p. 141
 Scoccimarro R., 2004, Phys. Rev. D, 70, 083007
 Seljak U., 2000, MNRAS, 318, 203
 Sheth R. K., 1995, MNRAS, 277, 933
 Smith R. E., Peacock J. A., Jenkins A., White S. D. M., Frenk C. S., Pearce F. R., Thomas P. A., Efstathiou G., Couchman H. M. P., 2003, MNRAS, 341, 1311
 Spergel D. N., et al., 2007, ApJS, 170, 377
 Taylor A., Valentine H., 1999, MNRAS, 306, 491
 Tegmark M., et al., 2004, Phys. Rev. D, 69
 Tegmark M., et al., 2006, Phys. Rev. D, 74, 123507
 van de Weygaert R., Schaap W., 2001, in Banday A. J., Zaroubi S., Bartelmann M., eds, Mining the Sky Tessellation Reconstruction Techniques. p. 268
 Webster M., Lahav O., Fisher K., 1997, MNRAS, 287, 425
 Whitmore B. C., Gilmore D. M., Jones C., 1993, ApJ, 407, 489
 York D. G., et al., 2000, AJ, 120, 1579
 Zaninetti L., 1995, A&A Suppl. Ser., 109, 71
 Zaroubi S., Hoffman Y., Dekel A., 1999, ApJ, 520, 413
 Zaroubi S., Hoffman Y., Fisher K. B., Lahav O., 1995, ApJ, 449, 446