Bayesian inference of cosmic density fields from non-linear, scale-dependent, and stochastic biased tracers
Abstract
We present a Bayesian reconstruction algorithm to generate unbiased samples of the underlying dark matter field from halo catalogues. Our new contribution consists of implementing a non-Poisson likelihood including a deterministic non-linear and scale-dependent bias. In particular, we present the Hamiltonian equations of motion for the negative binomial (NB) probability distribution function. This permits us to efficiently sample the posterior distribution function of density fields given a sample of galaxies using the Hamiltonian Monte Carlo technique implemented in the argo code. We have tested our algorithm with the Bolshoi N-body simulation, inferring the underlying dark matter density field from subsamples of the halo catalogue with biases smaller and larger than one. Our method shows that we can draw closely unbiased samples (compatible within 1σ) from the posterior distribution up to scales of about k ∼ 1 h Mpc⁻¹ in terms of power spectra and cell-to-cell correlations. We find that a Poisson likelihood including a scale-dependent non-linear deterministic bias can yield reconstructions with power spectra deviating by more than 10% on Mpc scales. Our reconstruction algorithm is especially suited for emission line galaxy data, for which a complex non-linear stochastic biasing treatment beyond Poissonity becomes indispensable.
keywords:
large-scale structure of Universe – cosmology: theory – catalogues – galaxies: statistics

1 Introduction
The large-scale structure encodes the key information to understand structure formation and the expansion of the Universe. However, the luminous objects, i.e., the galaxies tracing the large-scale structure, represent only a biased fraction of the total underlying matter governed by the laws of gravity.
In the era of precision cosmology, data analysis methods need to account for the non-linear, biased and discrete nature of the distribution of galaxies to accurately extract any valuable cosmological information. This becomes even more important with the advent of the new generation of galaxy surveys. Many of these surveys rely on emission line galaxies, see WiggleZ (http://wigglez.swin.edu.au/site/; Drinkwater et al., 2010), VIPERS (http://vipers.inaf.it/; Guzzo & The Vipers Team, 2013), DESI (http://desi.lbl.gov//BigBOSS; Schlegel et al., 2011), DES (http://www.darkenergysurvey.org; Frieman & Dark Energy Survey Collaboration, 2013), LSST (http://www.lsst.org/lsst/; LSST Dark Energy Science Collaboration, 2012), J-PAS (http://jpas.org/; Benitez et al., 2014), 4MOST (http://www.aip.de/en/research/researchareaea/researchgroupsandprojects/4most; de Jong et al., 2012) or Euclid (http://www.euclid-ec.org; Cimatti et al., 2009; Laureijs, 2009). These objects provide denser sampled volumes, deeply tracing the non-linear cosmic web structure. Moreover, they introduce a more complex biasing, covering a wider range of galaxy masses, as compared to, e.g., luminous red galaxies.
We present in this work a Bayesian approach designed to deal with non-linear stochastic biased tracers. We consider non-Poisson probability distribution functions (PDFs) for the likelihood modeling the distribution of galaxies. In this way we account for the overdispersion (larger dispersion than Poisson) of galaxy counts. We model the expected galaxy number density by relating it to the dark matter density through a non-linear scale-dependent expression extracted from N-body simulations (see Cen & Ostriker 1993; de la Torre & Peacock 2013; Kitaura, Yepes & Prada 2014; Neyrinck et al. 2014; Ahn et al. 2014). In this way we extend the works based on the Poisson and linear bias models (Kitaura & Enßlin 2008; Kitaura, Jasche & Metcalf 2010; Jasche & Kitaura 2010; Jasche & Wandelt 2013) following the ideas presented in Kitaura (2012a). In particular, we implement these improvements in the argo Hamiltonian-sampling code, which is able to jointly infer density, peculiar velocity fields and power spectra (Kitaura, Gallerani & Ferrara 2012). For the prior distribution describing structure formation of the dark matter field we use the lognormal assumption (Coles & Jones 1991). We note, however, that this prior can be substituted by another one, e.g., based on Lagrangian perturbation theory (see Kitaura 2013; Jasche & Wandelt 2013; Wang et al. 2013; Heß, Kitaura & Gottlöber 2013). Alternatively, one can extend the lognormal assumption in an Edgeworth expansion to include higher order correlation functions (Colombi 1994; Kitaura 2012b). We show in this work that our likelihood model is able to yield unbiased dark matter field reconstructions on Mpc scales based on N-body simulations.
2 Method
To infer the dark matter density field from biased tracers such as galaxies or haloes, one has to define a target distribution, called the posterior PDF. We use the Bayesian framework to express the posterior distribution based on a model PDF for the data, the likelihood, and a model PDF for the signal, the prior. We then recap the Hamiltonian sampling technique used in this work to sample from such a PDF, and present the Hamiltonian equations of motion for our model.
2.1 Bayesian approach: the posterior distribution function
In this work we will restrict ourselves to the reconstruction of dark matter fields given a set of biased tracers in real-space. We note that redshift-space distortions can be corrected within a Gibbs-sampling scheme and leave this additional complication for a later work (see Kitaura & Enßlin, 2008; Kitaura, Gallerani & Ferrara, 2012).
Let us divide the volume under consideration into a grid with N_c cells. Our input data vector is given by the number counts of haloes or galaxies per cell, N_i, and the desired signal is the dark matter density field. In addition, we need to assume some model for the dark matter distribution and for the bias relating the number counts of galaxies to the underlying dark matter distribution, including the non-linear deterministic and stochastic bias parameters (see §2.2 and §2.3). We will assume in this work that there is one population of tracers and that they can be described with a single set of bias parameters. Nevertheless, we show in an appendix (see §A) that we could subdivide the sample into various tracer types, each one with its own bias parameters, and combine them in a multi-tracer analysis in a straightforward way within the methodology presented here. The posterior distribution function of dark matter fields given the data and the deterministic and stochastic bias parameters can be expressed within the Bayesian framework as the product between the prior and the likelihood, up to a normalization.
2.2 The Likelihood: stochastic bias
Let us start by defining the likelihood, which models the statistical nature of the data. First we assume that the biasing relation is known and define the expected number count λ_i ≡ ⟨N_i⟩, where ⟨·⟩ denotes the ensemble average over the halo or galaxy realization. At this point we need to introduce the stochastic biasing parameters, necessary to model the deviation from the Poisson distribution. The positive (negative) correlation of haloes at sub-grid resolution introduces over- (under-) dispersed distributions depending on the halo population and density regime (see Somerville et al., 2001; Casas-Miranda et al., 2002; Neyrinck et al., 2014). This effect was already predicted by Peebles (1980). We focus on modeling overdispersion, as underdispersion is a subdominant effect, only present for very massive objects (see Baldauf et al., 2012, 2013). We note that stochastic bias has been studied in a number of works, more or less explicitly (see, e.g., Press & Schechter, 1974; Peacock & Heavens, 1985; Bardeen et al., 1986; Fry & Gaztanaga, 1993; Mo & White, 1996; Dekel & Lahav, 1999; Sheth & Lemson, 1999; Seljak, 2000; Berlind & Weinberg, 2002; Smith, Scoccimarro & Sheth, 2007; Desjacques et al., 2010; Beltrán Jiménez & Durrer, 2011; Valageas & Nishimichi, 2011; Elia, Ludlow & Porciani, 2012; Chan, Scoccimarro & Sheth, 2012; Baldauf et al., 2012, 2013).
For a given distribution function P with expectation value λ_i, observed number count N_i and the set of stochastic bias parameters {β}, the likelihood can be written as follows

(2) \mathcal{L}(\{N_i\}|\{\lambda_i\},\{\beta\}) = \prod_{i=1}^{N_{\rm c}} P(N_i|\lambda_i,\{\beta\})

The product is computed over the number of cells N_c, which corresponds to the number of dimensions of the problem.
Let us consider the negative binomial distribution (NB) and the gravitational thermodynamics distribution by Saslaw & Hamilton (1984) (GT) to describe the deviation from Poissonity; these require a single stochastic bias parameter, β and b, respectively. The Poisson, the NB and the GT distributions are written as

(3) P(N_i|\lambda_i) = \frac{\lambda_i^{N_i}}{N_i!}\,{\rm e}^{-\lambda_i}

(4) P(N_i|\lambda_i,\beta) = \frac{\Gamma(\beta+N_i)}{N_i!\,\Gamma(\beta)}\,\frac{\beta^{\beta}\,\lambda_i^{N_i}}{(\beta+\lambda_i)^{\beta+N_i}}

(5) P(N_i|\lambda_i,b) = \frac{\lambda_i(1-b)}{N_i!}\left[\lambda_i(1-b)+N_i b\right]^{N_i-1}{\rm e}^{-\lambda_i(1-b)-N_i b}

respectively. The parameters β and b are connected to the expectation value λ_i and the variance σ_i² by σ_i² = λ_i + λ_i²/β and σ_i² = λ_i/(1−b)², respectively. This implies that the overdispersion term shows a quadratic and a linear dependence on the expected halo number count for the NB and the GT case, respectively. To obtain a different dependence, one could take the NB expression and include a dependence of β on λ. For β ∝ λ we find that the NB and the GT PDFs are equivalent in terms of overdispersion. For this reason we will focus in the following on the NB PDF and leave a study of the β(λ) dependence for later work. One could investigate these dependencies for different populations of haloes by taking, for instance, ensembles of high resolution simulations like those in Aragon-Calvo (2012).
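As a quick numerical illustration (not part of the original analysis), the quadratic overdispersion of the NB PDF can be checked directly from its probability mass function; the parameter values below are arbitrary:

```python
import math

def nb_pmf(N, lam, beta):
    """Negative binomial pmf with mean lam and stochastic bias parameter beta:
    P(N) = Gamma(beta+N)/(N! Gamma(beta)) * (beta/(beta+lam))**beta * (lam/(beta+lam))**N
    """
    logp = (math.lgamma(beta + N) - math.lgamma(N + 1) - math.lgamma(beta)
            + beta * math.log(beta / (beta + lam))
            + N * math.log(lam / (beta + lam)))
    return math.exp(logp)

def moments(lam, beta, nmax=2000):
    # Brute-force first two moments; nmax is generous for these parameters.
    m1 = sum(N * nb_pmf(N, lam, beta) for N in range(nmax))
    m2 = sum(N * N * nb_pmf(N, lam, beta) for N in range(nmax))
    return m1, m2 - m1 * m1

mean, var = moments(4.0, 2.0)
# mean -> 4.0 and var -> 4.0 + 4.0**2 / 2.0 = 12.0: quadratic overdispersion
```

For β → ∞ the variance tends to the Poisson value λ, so the NB contains the Poisson case as a limit.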
2.3 Link between prior and likelihood: deterministic bias
Let us now define the link between the likelihood and the desired signal, i.e., the dark matter density field. This is given by the deterministic bias relating the expected number counts to the dark matter overdensity field, which is in general non-linear, scale-dependent and non-local. We note that non-locality introduces a scatter which can be absorbed in the stochastic bias (see discussion in Kitaura et al., 2014).
One could expand the dark matter overdensity field δ ≡ ρ/ρ̄ − 1 (with ρ̄ being the mean dark matter density) in a Taylor series (Fry & Gaztanaga, 1993). Alternatively, one could follow Cen & Ostriker (1993) and expand the series based on the logarithm of the density field (avoiding in this way the negative densities allowed in the previous expansion). This model to linear order, corresponding to a power law of the dark matter field, has been proposed for resolution augmentation of N-body simulations (see de la Torre & Peacock, 2013). The power law can be interpreted as a linear bias factor in Lagrangian space which undergoes gravitational evolution within the lognormal approximation. It has recently been found that the bias is very well fit by a compact relation including an exponential cut-off (see Neyrinck et al., 2014), which can be approximated by a Heaviside step function suppressing halo formation below a density threshold ρ_th (see Kitaura, Yepes & Prada, 2014). This model is in concordance with the Press & Schechter (1974) and the peak background split formalisms (see, e.g., Kaiser, 1984; Bardeen et al., 1986; Cole & Kaiser, 1989; Mo & White, 1996; Sheth, Mo & Tormen, 2001), which permit the formation of haloes only above a certain density threshold. It has recently been shown to be a crucial ingredient in the halo three-point statistics (Kitaura et al., 2014). The deterministic bias model including the density power law and the density cut-off is given by the following expression for the expected halo/galaxy density:
(6) \lambda_i = w_i\,\hat{N}\,\theta(\rho_i-\rho_{\rm th})\,\rho_i^{\alpha}

where w_i is the completeness at each cell i. In this way our method is able to deal with incomplete data samples as well (see also Kitaura, Jasche & Metcalf, 2010; Jasche & Kitaura, 2010; Kitaura, Gallerani & Ferrara, 2012). The normalization \hat{N} ensuring a particular mean number count is given by

(7) \hat{N} = \langle N \rangle \,/\, \langle \theta(\rho-\rho_{\rm th})\,\rho^{\alpha} \rangle

where ⟨·⟩ denotes the ensemble average over all possible dark matter realizations (cosmic variance).
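The deterministic bias of Eqs. 6 and 7 can be sketched as follows; the density field, the bias parameters α and ρ_th, the target mean count and the helper name `expected_counts` are all illustrative assumptions, with the cell average standing in for the ensemble average of Eq. 7:

```python
import numpy as np

# Hedged sketch of the threshold + power-law deterministic bias (Eqs. 6-7).
# rho: dark matter density field (here flattened), alpha: power-law exponent,
# rho_th: density threshold, nbar: target mean number count per cell.
def expected_counts(rho, alpha, rho_th, nbar, completeness=1.0):
    kernel = np.where(rho >= rho_th, rho**alpha, 0.0)  # theta(rho - rho_th) * rho^alpha
    norm = nbar / kernel.mean()                        # Eq. (7): fixes <lambda> = nbar
    return completeness * norm * kernel

rng = np.random.default_rng(0)
rho = rng.lognormal(mean=0.0, sigma=1.0, size=64**3)   # toy lognormal density field
lam = expected_counts(rho, alpha=1.4, rho_th=0.3, nbar=2.0)
# lam.mean() equals 2.0 by construction (for completeness = 1)
```

Cells below the threshold get λ_i = 0, so no haloes are expected there, mimicking the cut-off bias.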
2.4 The Prior: statistics of dark matter fields
Let us now define the model describing the distribution of dark matter density fields. The Gaussian assumption has long been commonly used in the literature. This assumption is for instance present in the Wiener reconstruction method (see, e.g., Zaroubi et al., 1995; Kitaura et al., 2009). Nevertheless, it is well known that gravity induces deviations from Gaussianity. For this reason a non-Gaussian model is essential to exploit the accuracy of our likelihood model. For the sake of simplicity we will restrict this work to the lognormal assumption (see Coles & Jones 1991, and its implementation within a Bayesian context in Kitaura, Jasche & Metcalf 2010). We note, however, that any structure formation model can be implemented at this point, as has been shown in a series of recent works (see Kitaura, 2013; Jasche & Wandelt, 2013; Wang et al., 2013; Heß, Kitaura & Gottlöber, 2013; Wang et al., 2014). Moreover, the lognormal transformation has recently been shown to yield sufficient statistics (Carron & Szapudi, 2014), and thereby to extract the maximal cosmological information. It is known, however, that the lognormal approximation fails at modeling the large-scale structure in the low density regime (see, e.g., Colombi, 1994), while it has been shown to be a good approximation in the moderate to high density regime (Kitaura, 2012a). We leave an extension with more complex models, like those including higher order statistics, for later work (Kitaura, 2012b).
Let us define the signal as a logarithmic transformation of the matter density field

(8) s_i = \ln(1+\delta_i) - \mu_i

with the mean field μ_i ≡ ⟨ln(1+δ_i)⟩ ensuring that the mean of the signal vanishes, ⟨s⟩ = 0. The Gaussian prior for the logarithmically transformed density field is then given as

(9) \pi(s|S) = \frac{1}{\sqrt{(2\pi)^{N_{\rm c}}\det(S)}}\,\exp\!\left(-\frac{1}{2}\,s^{\dagger}S^{-1}s\right)

where the covariance matrix S ≡ ⟨s s^†⟩ (or power spectrum in Fourier space) depends on the set of cosmological parameters, and ⟨·⟩ denotes the ensemble average over all possible lognormal fields. In practice, we assume a linear power spectrum and the corresponding covariance matrix.
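The logarithmic transform of Eq. 8 can be illustrated with a toy field; the diagonal covariance and all numerical values are assumptions for the sketch, whereas the paper employs the full covariance of the linear power spectrum:

```python
import numpy as np

# Sketch of the lognormal transform (Eq. 8): s_i = ln(1 + delta_i) - mu, with mu
# chosen so that <s> = 0. The toy overdensity field and the diagonal covariance
# below are illustrative assumptions.
rng = np.random.default_rng(1)
delta = rng.lognormal(0.0, 0.8, size=10000) - 1.0   # toy overdensity, delta > -1
mu = np.mean(np.log1p(delta))                       # sample estimate of <ln(1+delta)>
s = np.log1p(delta) - mu                            # mean of s vanishes by construction

def minus_log_prior(s, var):
    """Gaussian prior (Eq. 9) for a diagonal covariance S = var * I, up to a constant."""
    return 0.5 * np.sum(s**2) / var

U0 = minus_log_prior(s, np.var(s))  # equals 0.5 * s.size when var matches the field
```

The `log1p` call keeps the transform accurate for cells close to the empty-cell limit δ → −1⁺.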
2.5 Sampling from the posterior distribution function
To sample from the posterior distribution we rely on the Hamiltonian Monte Carlo (HMC) sampling technique, which is a hybrid Markov Chain Monte Carlo (MCMC) approach that uses physically motivated equations of motion to explore the phase space, avoiding inefficient random walks. Let us recap here the basics of Hamiltonian sampling. For more details we refer to Duane et al. (1987), Neal (1993) and, more recently, Neal (2012). For applications to astronomy we refer to, e.g., Taylor, Ashdown & Hobson (2008), Jasche et al. (2010), and Kitaura et al. (2012). Here we briefly point to the main features and also emphasize the modifications we implemented. The HMC approach treats the problem as a thermodynamical system. The contact with a heat bath moves the system into equilibrium, i.e., to the statistical space in which samples are drawn from the posterior distribution. The key idea of the HMC approach consists of moving the system by solving the Hamiltonian equations of motion, which involve a stochastic kinetic term. Let us define in this physical analogy the potential (U), kinetic (K) and Hamiltonian (H) energy through the following relations
(10) U(s) = -\ln P(s|N)

(11) K(p) = \frac{1}{2}\,p^{\dagger}M^{-1}p\,, \qquad H(s,p) = U(s) + K(p)

where U(s) is the potential function at the coordinate vector s and K(p) is the kinetic term of the momentum p, with M the mass matrix. Combining Eqs. 10 & 11, we see that the target distribution can be inferred from
(12) {\rm e}^{-H(s,p)} = P(s|N)\,{\rm e}^{-K(p)}
This equation shows that drawing samples from e^{-H} yields the desired distribution once the kinetic term is discarded by marginalizing over the momenta. The momenta are drawn from a multivariate Gaussian distribution with covariance matrix M. In order to explore the high dimensional phase space we need to evolve the system. Therefore we solve the Hamiltonian equations of motion
(13) \frac{{\rm d}s}{{\rm d}t} = \frac{\partial H}{\partial p} = M^{-1}p

(14) \frac{{\rm d}p}{{\rm d}t} = -\frac{\partial H}{\partial s} = -\frac{\partial U}{\partial s}
Having fully evolved the initial position of the system, we apply a criterion to accept or reject the new point in phase space, which writes
(15) P_{\rm acc} = \min\left[1,\,\exp\left(-\left(H(s',p')-H(s,p)\right)\right)\right]
2.6 Hamiltonian equations of motion
As shown in Eqs. 10, 13, 14 and 15, we need to calculate the logarithm of the target distribution and its gradient. Let us summarize below the potential energy terms and the corresponding analytic gradient expressions required in our work.
2.6.1 Gaussian prior
We calculate the negative logarithm of Eq. 9 as

(16) U_{\rm prior}(s) = \frac{1}{2}\,s^{\dagger}S^{-1}s + c

where c incorporates all constant terms of the normalization.

The derivative w.r.t. the signal writes

(17) \frac{\partial U_{\rm prior}}{\partial s_j} = \left(S^{-1}s\right)_j
Here we substituted the dummy variable with the logarithmic signal variable s we want to evaluate. We note that more complex schemes connecting the initial conditions with the final gravitationally evolved density fields assume this prior for the primordial fluctuations, but include the Lagrangian to Eulerian mapping in the likelihood (see Kitaura 2013; Jasche & Wandelt 2013; Wang et al. 2013; Heß, Kitaura & Gottlöber 2013). We will show below the likelihood expressions for the lognormal case, but include more general expressions in the appendix (see §B).
2.6.2 Poisson Likelihood
The terms for the Poisson likelihood write as follows

(18) U_{\rm like}(s) = \sum_{i}\left(\lambda_i - N_i\ln\lambda_i\right) + c

Finally, the derivative w.r.t. the signal variable writes as

(19) \frac{\partial U_{\rm like}}{\partial s_j} = \alpha\left(\lambda_j - N_j\right)
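A finite-difference check (not from the paper) makes the Poisson gradient concrete; here λ_j ∝ exp(α s_j), i.e. the power-law bias expressed in the log-density variable, and the amplitude, α and helper names are illustrative assumptions:

```python
import math

# Hedged numerical check of the Poisson gradient: with lambda_j proportional to
# exp(alpha * s_j), the derivative of -ln L w.r.t. s_j is alpha * (lambda_j - N_j).
def lam(s, alpha=1.5, A=3.0):
    return A * math.exp(alpha * s)   # toy power-law bias in the log-density variable

def neg_log_like(s, N, alpha=1.5):
    l = lam(s, alpha)
    return l - N * math.log(l)       # Eq. (18), dropping constants

s0, N, alpha, h = 0.2, 4, 1.5, 1e-6
numeric = (neg_log_like(s0 + h, N, alpha) - neg_log_like(s0 - h, N, alpha)) / (2 * h)
analytic = alpha * (lam(s0, alpha) - N)
# numeric and analytic agree to well below 1e-4
```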
2.6.3 Non-Poisson Likelihood: Negative Binomial
We calculate the corresponding terms for the NB distribution:

(21) U_{\rm like}(s) = \sum_{i}\left[-N_i\ln\lambda_i + (\beta+N_i)\ln(\beta+\lambda_i)\right] + c

Corresponding to Eq. 19 we can write the derivative of this likelihood function as

(22) \frac{\partial U_{\rm like}}{\partial s_j} = \alpha\,\beta\,\frac{\lambda_j - N_j}{\beta+\lambda_j}
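The same finite-difference test can be applied to the NB gradient; again the amplitude, α, β and the helper names are illustrative assumptions:

```python
import math

# Hedged numerical check of the NB gradient: dropping s-independent terms,
# -ln L_NB = -N ln(lambda) + (beta + N) ln(beta + lambda); with
# lambda proportional to exp(alpha * s), its derivative w.r.t. s is
# alpha * beta * (lambda - N) / (beta + lambda).
def lam(s, alpha=1.5, A=3.0):
    return A * math.exp(alpha * s)

def neg_log_like_nb(s, N, beta, alpha=1.5):
    l = lam(s, alpha)
    return -N * math.log(l) + (beta + N) * math.log(beta + l)

s0, N, beta, alpha, h = 0.2, 4, 2.0, 1.5, 1e-6
numeric = (neg_log_like_nb(s0 + h, N, beta, alpha)
           - neg_log_like_nb(s0 - h, N, beta, alpha)) / (2 * h)
analytic = alpha * beta * (lam(s0, alpha) - N) / (beta + lam(s0, alpha))
# the NB gradient reduces to the Poisson one in the limit beta -> infinity
```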
2.7 Leapfrog scheme
To explore the phase space and solve the Hamiltonian equations of motion iteratively, we use the leapfrog scheme

(23) p_j\left(t+\frac{\epsilon}{2}\right) = p_j(t) - \frac{\epsilon}{2}\,\frac{\partial U}{\partial s_j}\Big|_{s(t)}

(24) s_j(t+\epsilon) = s_j(t) + \epsilon\left[M^{-1}p\left(t+\frac{\epsilon}{2}\right)\right]_j

(25) p_j(t+\epsilon) = p_j\left(t+\frac{\epsilon}{2}\right) - \frac{\epsilon}{2}\,\frac{\partial U}{\partial s_j}\Big|_{s(t+\epsilon)}

with n being the number of steps, ε the step size and the total pseudo time given by τ = n ε.
3 Validation of the method
To validate our algorithm we take the Bolshoi dark matter N-body simulation (Klypin, Trujillo-Gomez & Primack, 2011).
We consider two different subsamples of the halo catalogue, created with the bound-density-maxima (BDM) halo finder (Klypin & Holtzman, 1997), as inputs to our method.
For the first subsample we randomly select haloes across the full mass range of the Bolshoi simulation (see appendix D); this subsample yields a bias lower than one (see Fig. 2).
Additionally, we consider a second subsample with a mass cut (see appendix D), resulting in a halo sample with a bias larger than one (see Fig. 3). This second subsample permits us to test the threshold bias described in Eq. 6, which is negligible for the first subsample. To technically overcome divergences caused by the threshold bias we introduce a Gibbs sampling procedure, described in appendix B.3.
We use the nearest-grid-point (NGP) scheme to compute the number density in each cell of the mesh. We use the dark matter distribution from the Bolshoi simulation as a reference to estimate the accuracy of our reconstructions and define it as the true dark matter field.
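A minimal sketch of NGP gridding (grid size, box length and halo positions are toy assumptions):

```python
import numpy as np

# Hedged sketch of nearest-grid-point (NGP) mass assignment: each halo
# contributes only to the single cell containing it.
def ngp_counts(positions, boxsize, ngrid):
    idx = np.floor(positions / boxsize * ngrid).astype(int) % ngrid
    counts = np.zeros((ngrid,) * 3)
    np.add.at(counts, (idx[:, 0], idx[:, 1], idx[:, 2]), 1)  # unbuffered accumulate
    return counts

rng = np.random.default_rng(3)
pos = rng.uniform(0.0, 250.0, size=(100000, 3))  # toy halo positions in a 250 Mpc/h box
counts = ngp_counts(pos, 250.0, 32)
# counts.sum() recovers the total number of haloes
```

`np.add.at` is used instead of fancy-indexed `+=` so that haloes landing in the same cell are all counted.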
To test the different models described in §2, we apply our novel argo code, running a series of Hamiltonian Markov chains with different likelihoods and bias assumptions:

(i) Poisson likelihood and unity bias using the first subsample,

(ii) Poisson likelihood and power-law bias using the first subsample,

(iii) NB likelihood and power-law bias using the first subsample,

(iv) NB likelihood and power-law bias including thresholding using the second subsample.
We disregard the first 2000 iterations of the chains, until the power spectra have converged, and use a total of 10000 iterations for our analysis. The convergence behavior is estimated through the Gelman & Rubin (1992) test in the appendix (see §C).
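The Gelman & Rubin (1992) statistic for a scalar chain quantity can be sketched as follows; the synthetic chains below are stand-ins for the actual argo chains:

```python
import numpy as np

# Hedged sketch of the Gelman & Rubin (1992) convergence statistic R-hat for a
# scalar quantity traced by several chains after burn-in.
def gelman_rubin(chains):
    chains = np.asarray(chains)            # shape: (m chains, n samples)
    m, n = chains.shape
    means = chains.mean(axis=1)
    B = n * means.var(ddof=1)              # between-chain variance
    W = chains.var(axis=1, ddof=1).mean()  # mean within-chain variance
    V = (n - 1) / n * W + B / n            # pooled posterior variance estimate
    return np.sqrt(V / W)

rng = np.random.default_rng(4)
converged = rng.standard_normal((4, 2000))  # toy, well-mixed chains
r = gelman_rubin(converged)
# r is close to 1 for well-mixed chains; values well above 1 signal non-convergence
```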
Fig. 1 shows a slice through the volume of the Bolshoi simulation. This figure illustrates the problem, as the haloes (middle panel) represent in our case study 4 to 5 orders of magnitude fewer matter tracers than the dark matter particles used for the simulation (left panel). We have plotted the means of an ensemble of 10000 reconstructions using the NB model including a power-law bias (upper right panel without and lower right panel with thresholding bias) and find that the relevant structures are in general terms very well recovered. The filaments in the low density regions are less accurately reconstructed (see upper and lower right panels). This is expected from the low signal-to-noise ratio in those regions. Moreover, the models we are using for the low density regime are not optimal: we lack a good description of the dark matter field in the low density regime due to the lognormal approximation. We also find that the high density peaks are less pronounced than in the true dark matter field. This is caused by the smoothing introduced by averaging the reconstructed samples, i.e., the mean estimator yields, as expected, a conservative reconstruction.
3.1 Two-point statistics: power spectra
Let us further investigate the two-point statistics of the reconstructed fields in Fourier space, i.e., the power spectrum. This is shown in Fig. 2 and Fig. 3. In these plots we can clearly see the scale-dependent bias of the halo samples (solid green line) with respect to the dark matter field (solid black line). The power spectra of the halo fields have been corrected for the mass assignment kernel and shot noise following Jing (2005). Therefore, the halo power spectra can be trusted up to approximately 50% of the Nyquist frequency. In Fig. 2 the reconstruction using the Poisson likelihood with unity bias (dashed magenta line) is very close to the halo power spectrum, essentially confirming that the shot-noise correction follows a Poisson model, as the deterministic bias is neglected in this run. We find a considerable improvement by adjusting the power-law bias (dotted red line). Nevertheless, the power spectrum lacks power towards high k, since the dispersion has been underestimated: we register a 10% deviation on Mpc scales. Only when also modeling overdispersion with the NB likelihood do we find an excellent agreement with the dark matter power spectrum up to high k of nearly 1 h Mpc⁻¹. This is in agreement with the findings of Kitaura, Yepes & Prada (2014), here, however, obtained by performing an inference analysis. We also find that the power spectra show a larger scatter for the NB than for the Poisson case (compare the cyan and yellow shaded regions, respectively). This is expected from the larger dispersion of the NB model.
3.2 Cell-to-cell cross correlation
To further assess the accuracy of our reconstructions, we compare them to the true dark matter field within a cell-to-cell correlation (see Fig. 4). To have a fair comparison we smooth each catalogue with a Gaussian kernel (see left panels in Fig. 4). This should compensate for the different number densities of haloes and dark matter particles. For the argo reconstructions the average over 10000 samples is shown. We find that the haloes and the Poisson unity bias case show very similar cell-to-cell correlations, which are strongly biased towards high densities. This could have interesting applications to reconstruct the expected halo density field (see appendix E, Fig. 7). In the low density regime we can see that the reconstruction is more biased than the halo sample. This must be caused by the lognormal assumption, as the bias was set to one in this run. The negative binomial reconstruction shows unbiased cell-to-cell correlations towards high densities, and the same bias at low densities. We can see that the scatter is larger for the NB than for the Poisson unity bias case or the halo distribution. This is expected from the stochasticity included in the model. Non-local contributions, which may be deterministic, are absorbed in our stochastic bias model, thereby possibly enhancing the scatter. Interestingly, we see in our run (iv) including the thresholding bias that the biased low halo density (see lower left panel in Fig. 4) is corrected in the low density region (see lower right panel in Fig. 4), achieving a similar accuracy to run (iii), where haloes are included in the low density regions (see upper right panel in Fig. 4). We also find in run (iv), using the second subsample of moderately massive objects, that the deviation from Poissonity is negligible. We expect an underdispersed distribution for more massive haloes, clusters or quasars. The method presented in this work is therefore optimal for overdispersed tracers like emission line galaxies.
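The cell-to-cell comparison with Gaussian smoothing can be sketched as follows; the grid, box size, smoothing length and toy noise level are all illustrative assumptions:

```python
import numpy as np

# Hedged sketch of the cell-to-cell comparison: smooth both fields with a
# Gaussian kernel applied in Fourier space, then correlate cell values.
def gaussian_smooth(field, boxsize, r_s):
    ngrid = field.shape[0]
    k = 2.0 * np.pi * np.fft.fftfreq(ngrid, d=boxsize / ngrid)
    kx, ky, kz = np.meshgrid(k, k, k, indexing="ij")
    window = np.exp(-0.5 * (kx**2 + ky**2 + kz**2) * r_s**2)
    return np.real(np.fft.ifftn(np.fft.fftn(field) * window))

rng = np.random.default_rng(5)
truth = rng.standard_normal((32, 32, 32))                # toy "true" field
recon = truth + 0.3 * rng.standard_normal((32, 32, 32))  # toy "reconstruction"
a = gaussian_smooth(truth, 250.0, 10.0).ravel()
b = gaussian_smooth(recon, 250.0, 10.0).ravel()
r = np.corrcoef(a, b)[0, 1]   # cell-to-cell correlation coefficient
```

A scatter plot of `a` against `b` would correspond to one panel of a figure like Fig. 4.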
4 Conclusions
We have presented a Bayesian reconstruction algorithm, which is able to produce unbiased samples of the underlying dark matter field from non-linear stochastic biased tracers.
We find in this study that an accurate reconstruction of the dark matter field requires modeling both the stochastic and the deterministic bias. Our results using a power-law bias model and the negative binomial distribution function are very encouraging, as they produce unbiased statistics over a wide range of scales and density regimes.
We have focused on the negative binomial distribution function and discussed for which parameter dependencies it can be equivalent to the gravitational thermodynamics PDF. Furthermore, we model the deterministic bias relating the expected galaxy number density to the dark matter density through a non-linear scale-dependent expression.
We have presented the Hamiltonian equations of motion for our model and implemented them in the argo code. We have shown that this permits us to efficiently sample the posterior distribution function of density fields given a set of biased tracers. We have also introduced a Gibbs-sampling scheme to deal with strongly biased objects tracing the high density peaks.
In particular, we have tested our algorithm with the Bolshoi N-body simulation, inferring the underlying dark matter density field from a subsample of the corresponding halo catalogue. We found that a Poisson likelihood yields reconstructions with power spectra deviating by more than 10% on Mpc scales. Our method shows that with the NB PDF we can draw closely unbiased samples (compatible within 1σ) from the posterior distribution up to scales of about k ∼ 1 h Mpc⁻¹ in terms of power spectra and cell-to-cell correlations.
We have furthermore analytically shown that our method can deal with incomplete data and perform a multi-tracer analysis. Further investigations need to be done here to demonstrate the level of accuracy of such approaches.
We will demonstrate in a forthcoming publication that we can also correct for redshift-space distortions in an iterative way. Our work represents the first attempt to deal with non-linear stochastic bias in a reconstruction algorithm and will contribute towards an optimal analysis of galaxy surveys.
Acknowledgments
MA thanks the Friedrich-Ebert-Foundation. The authors thank Benjamin Granett for encouraging discussions on the multi-tracer analysis. The MultiDark Database used in this paper and the web application providing online access to it were constructed as part of the activities of the German Astrophysical Virtual Observatory as a result of a collaboration between the Leibniz-Institute for Astrophysics Potsdam (AIP) and the Spanish MultiDark Consolider Project CSD2009-00064.
References
 Ahn et al. (2014) Ahn K., Iliev I. T., Shapiro P. R., Srisawat C. B., 2014, ArXiv eprints
 Aragon-Calvo (2012) Aragon-Calvo M. A., 2012, ArXiv e-prints
 Baldauf et al. (2012) Baldauf T., Seljak U., Desjacques V., McDonald P., 2012, Phys.Rev.D, 86, 083540
 Baldauf et al. (2013) Baldauf T., Seljak U., Smith R. E., Hamaus N., Desjacques V., 2013, ArXiv eprints
 Bardeen et al. (1986) Bardeen J. M., Bond J. R., Kaiser N., Szalay A. S., 1986, ApJ, 304, 15
 Beltrán Jiménez & Durrer (2011) Beltrán Jiménez J., Durrer R., 2011, Phys.Rev.D, 83, 103509
 Benitez et al. (2014) Benitez N. et al., 2014, ArXiv eprints
 Berlind & Weinberg (2002) Berlind A. A., Weinberg D. H., 2002, ApJ, 575, 587
 Carron & Szapudi (2014) Carron J., Szapudi I., 2014, ArXiv eprints
 Casas-Miranda et al. (2002) Casas-Miranda R., Mo H. J., Sheth R. K., Boerner G., 2002, MNRAS, 333, 730
 Cen & Ostriker (1993) Cen R., Ostriker J. P., 1993, ApJ, 417, 415
 Chan, Scoccimarro & Sheth (2012) Chan K. C., Scoccimarro R., Sheth R. K., 2012, Phys.Rev.D, 85, 083509
 Cimatti et al. (2009) Cimatti A., Robberto M., Baugh C., Beckwith S. V. W., Content R., Daddi E., De Lucia G., et al, 2009, Experimental Astronomy, 23, 39
 Cole & Kaiser (1989) Cole S., Kaiser N., 1989, MNRAS, 237, 1127
 Coles & Jones (1991) Coles P., Jones B., 1991, MNRAS, 248, 1
 Colombi (1994) Colombi S., 1994, ApJ, 435, 536
 de Jong et al. (2012) de Jong R. S. et al., 2012, in Society of PhotoOptical Instrumentation Engineers (SPIE) Conference Series, Vol. 8446, Society of PhotoOptical Instrumentation Engineers (SPIE) Conference Series
 de la Torre & Peacock (2013) de la Torre S., Peacock J. A., 2013, MNRAS, 435, 743
 Dekel & Lahav (1999) Dekel A., Lahav O., 1999, ApJ, 520, 24
 Desjacques et al. (2010) Desjacques V., Crocce M., Scoccimarro R., Sheth R. K., 2010, Phys.Rev.D, 82, 103529
 Drinkwater et al. (2010) Drinkwater M. J., Jurek R. J., Blake C., Woods D., Pimbblet K. A., Glazebrook K., Sharp R., et al, 2010, MNRAS, 401, 1429
 Duane et al. (1987) Duane S., Kennedy A. D., Pendleton B. J., Roweth D., 1987, Physics Letters B, 195, 216
 Elia, Ludlow & Porciani (2012) Elia A., Ludlow A. D., Porciani C., 2012, MNRAS, 421, 3472
 Frieman & Dark Energy Survey Collaboration (2013) Frieman J., Dark Energy Survey Collaboration, 2013, in American Astronomical Society Meeting Abstracts, Vol. 221, American Astronomical Society Meeting Abstracts, p. 335.01
 Fry & Gaztanaga (1993) Fry J. N., Gaztanaga E., 1993, ApJ, 413, 447
 Gelman & Rubin (1992) Gelman A., Rubin D., 1992, Statistical Science, 7, 457, http://www.stat.columbia.edu/~gelman/research/published/itsim.pdf
 Guzzo & The Vipers Team (2013) Guzzo L., The Vipers Team, 2013, The Messenger, 151, 41
 Heß, Kitaura & Gottlöber (2013) Heß S., Kitaura F.-S., Gottlöber S., 2013, MNRAS
 Jasche & Kitaura (2010) Jasche J., Kitaura F.-S., 2010, MNRAS, 407, 29
 Jasche et al. (2010) Jasche J., Kitaura F.-S., Li C., Enßlin T. A., 2010, MNRAS, 1638
 Jasche & Wandelt (2013) Jasche J., Wandelt B. D., 2013, MNRAS, 432, 894
 Jing (2005) Jing Y. P., 2005, ApJ, 620, 559
 Kaiser (1984) Kaiser N., 1984, ApJ, 284, L9
 Kitaura (2012a) Kitaura F.-S., 2012a, Bayesian Analysis of Cosmic Structures, Sarro L. M., Eyer L., O’Mullane W., De Ridder J., eds., p. 143
 Kitaura (2012b) Kitaura F.-S., 2012b, MNRAS, 420, 2737
 Kitaura (2013) Kitaura F.-S., 2013, MNRAS, 429, L84
 Kitaura & Enßlin (2008) Kitaura F.-S., Enßlin T. A., 2008, MNRAS, 389, 497
 Kitaura et al. (2012) Kitaura F.-S., Erdoǧdu P., Nuza S. E., Khalatyan A., Angulo R. E., Hoffman Y., Gottlöber S., 2012, MNRAS, 427, L35
 Kitaura, Gallerani & Ferrara (2012) Kitaura F.-S., Gallerani S., Ferrara A., 2012, MNRAS, 420, 61
 Kitaura et al. (2014) Kitaura F.-S., Gil-Marín H., Scoccola C., Chuang C.-H., Müller V., Yepes G., Prada F., 2014, ArXiv e-prints
 Kitaura et al. (2009) Kitaura F.-S., Jasche J., Li C., Enßlin T. A., Metcalf R. B., Wandelt B. D., Lemson G., White S. D. M., 2009, MNRAS, 400, 183
 Kitaura, Jasche & Metcalf (2010) Kitaura F.-S., Jasche J., Metcalf R. B., 2010, MNRAS, 403, 589
 Kitaura, Yepes & Prada (2014) Kitaura F.-S., Yepes G., Prada F., 2014, MNRAS, 439, L21
 Klypin & Holtzman (1997) Klypin A., Holtzman J., 1997, ArXiv Astrophysics eprints
 Klypin, Trujillo-Gomez & Primack (2011) Klypin A. A., Trujillo-Gomez S., Primack J., 2011, ApJ, 740, 102
 Laureijs (2009) Laureijs R., 2009, ArXiv eprints
 LSST Dark Energy Science Collaboration (2012) LSST Dark Energy Science Collaboration, 2012, ArXiv eprints
 Mo & White (1996) Mo H. J., White S. D. M., 1996, MNRAS, 282, 347
 Neal (1993) Neal R. M., 1993, in Technical Report CRGTR931, Dept. of Computer Science, University of Toronto
 Neal (2012) Neal R. M., 2012, ArXiv eprints
 Neyrinck et al. (2014) Neyrinck M. C., Aragón-Calvo M. A., Jeong D., Wang X., 2014, MNRAS, 441, 646
 Peacock & Heavens (1985) Peacock J. A., Heavens A. F., 1985, MNRAS, 217, 805
 Peebles (1980) Peebles P. J. E., 1980, The largescale structure of the universe. Research supported by the National Science Foundation. Princeton, N.J., Princeton University Press, 1980. 435 p.
 Press & Schechter (1974) Press W. H., Schechter P., 1974, ApJ, 187, 425
 Saslaw & Hamilton (1984) Saslaw W. C., Hamilton A. J. S., 1984, ApJ, 276, 13
 Schlegel et al. (2011) Schlegel D., Abdalla F., Abraham T., Ahn C., Allende Prieto C., Annis J., Aubourg E., et al., 2011, ArXiv e-prints
 Seljak (2000) Seljak U., 2000, MNRAS, 318, 203
 Sheth & Lemson (1999) Sheth R. K., Lemson G., 1999, MNRAS, 304, 767
 Sheth, Mo & Tormen (2001) Sheth R. K., Mo H. J., Tormen G., 2001, MNRAS, 323, 1
 Smith, Scoccimarro & Sheth (2007) Smith R. E., Scoccimarro R., Sheth R. K., 2007, Phys.Rev.D, 75, 063512
 Somerville et al. (2001) Somerville R. S., Lemson G., Sigad Y., Dekel A., Kauffmann G., White S. D. M., 2001, MNRAS, 320, 289
 Taylor, Ashdown & Hobson (2008) Taylor J. F., Ashdown M. A. J., Hobson M. P., 2008, MNRAS, 389, 1284
 Valageas & Nishimichi (2011) Valageas P., Nishimichi T., 2011, Astr.Astrophy., 527, A87
 Wang et al. (2014) Wang H., Mo H. J., Yang X., Jing Y. P., Lin W. P., 2014, ArXiv e-prints
 Wang et al. (2013) Wang H., Mo H. J., Yang X., van den Bosch F. C., 2013, ApJ, 772, 63
 Zaroubi et al. (1995) Zaroubi S., Hoffman Y., Fisher K. B., Lahav O., 1995, ApJ, 449, 446
Appendix A Multi-tracer analysis
Let us assume that we have a set of halo or galaxy samples $\{N_l\}$ with $l = 1, \ldots, n$. We can write the joint problem of inferring the dark matter field $\delta_{\rm m}$ conditioned on the different halo/galaxy samples through the following posterior PDF

(26)  $P(\delta_{\rm m}|\{N_l\},\{b_l\}) \propto P(\delta_{\rm m})\, P(\{N_1,\ldots,N_n\}|\delta_{\rm m},\{b_l\})\,,$

with $\{b_l\}$ being the joint set of bias parameters.
If the samples have distinct biasing parameters, we can assume that each of the samples is conditioned on the underlying dark matter field only:

(27)  $P(\{N_l\}|\delta_{\rm m},\{b_l\}) = \prod_{l=1}^{n} P(N_l|\delta_{\rm m},b_l)\,.$

Hence, we can write the posterior PDF as

(28)  $P(\delta_{\rm m}|\{N_l\},\{b_l\}) \propto P(\delta_{\rm m}) \prod_{l=1}^{n} P(N_l|\delta_{\rm m},b_l)\,.$

For the Hamiltonian sampler (see § 2.6) we need to compute the potential energy, which is defined as

(29)  $E(\delta_{\rm m}) = -\ln P(\delta_{\rm m}) - \sum_{l=1}^{n} \ln P(N_l|\delta_{\rm m},b_l)\,.$
This expression permits us to incorporate any additional galaxy sample and combine different galaxy catalogues with the method presented in this work. The above calculations demonstrate that the dark matter field serves as a common denominator for different halo/galaxy samples and allows one to perform a multi-tracer analysis.
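The additive structure of the potential energy can be illustrated in a few lines of Python. This is a minimal sketch, not the argo implementation: for simplicity it assumes a plain Poisson likelihood per tracer (the negative binomial likelihood of this work would replace `poisson_log_like`), and the function names are our own.

```python
import math

def poisson_log_like(counts, lam):
    # ln P(N | lambda) summed over cells, for one tracer sample,
    # assuming a simple Poisson likelihood for illustration.
    return sum(n * math.log(l) - l - math.lgamma(n + 1)
               for n, l in zip(counts, lam))

def joint_potential_energy(log_prior, samples, expected):
    # E = -ln P(delta) - sum_l ln P(N_l | delta, b_l):
    # each tracer contributes an independent likelihood term,
    # so adding a new catalogue just adds one more summand.
    return -log_prior - sum(poisson_log_like(n, lam)
                            for n, lam in zip(samples, expected))
```

Because the samples enter only as a sum of log-likelihoods, combining catalogues leaves the rest of the sampler unchanged.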
Appendix B Gradient calculations
Here we calculate the derivatives of the prior and the likelihood (see Section 2.5) separately with respect to the signal vector $s$:

(30)  $E(s) = -\ln P(s) - \ln P(N|s)\,,$

(31)  $\frac{\partial E(s)}{\partial s_k} = -\frac{\partial \ln P(s)}{\partial s_k} - \frac{\partial \ln P(N|s)}{\partial s_k}\,.$
B.1 Prior
The derivative of the Gaussian prior for the linearized Gaussian field $s$ reads

(32)  $-\frac{\partial \ln P(s)}{\partial s_k} = \left(S^{-1} s\right)_k\,,$

with $S$ the covariance matrix of the signal.
B.2 Likelihood
We calculate the gradient of the likelihood
(33) 
as follows
(34)  
(35)  
(36) 
We can now calculate the derivative of the likelihood w.r.t. the expected number counts $\lambda_i$ and obtain

(37)  $\frac{\partial \ln P(N|\lambda,\beta)}{\partial \lambda_i} = \frac{N_i}{\lambda_i} - \frac{\beta + N_i}{\beta + \lambda_i}\,.$
Taking this into account, the derivative of the likelihood reads
(38) 
We note that for $\beta \rightarrow \infty$ the above expression reduces to the Poisson case, i.e., it becomes identical to the corresponding solution for the Poisson likelihood.
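This Poisson limit can be verified numerically. The sketch below is a standalone illustration (not code from argo): it evaluates the negative binomial PMF, parametrized by its mean `lam` and dispersion parameter `beta`, and compares it to the Poisson PMF for large `beta`.

```python
import math

def nb_pmf(n, lam, beta):
    # Negative binomial PMF with mean lam and dispersion parameter beta;
    # evaluated in log space via lgamma for numerical stability.
    return math.exp(math.lgamma(beta + n) - math.lgamma(beta)
                    - math.lgamma(n + 1)
                    + n * (math.log(lam) - math.log(lam + beta))
                    + beta * (math.log(beta) - math.log(lam + beta)))

def poisson_pmf(n, lam):
    # Poisson PMF, the beta -> infinity limit of the negative binomial.
    return math.exp(n * math.log(lam) - lam - math.lgamma(n + 1))
```

For `beta` of order 10^7 the two distributions agree to well below one part in a million, while small `beta` produces the heavy-tailed, over-dispersed behaviour exploited in this work.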
B.3 Gibbs sampling of the thresholding
The density-cutoff bias component introduces a numerical instability, since the additional gradient terms diverge around the density threshold. We therefore follow a Gibbs-sampling strategy, considering the step function to be constant with respect to the signal $s$. This permits us to neglect all terms in which gradients and logarithms of the step function appear (see the previous section). The scheme can be described as follows:
(39)  $s^{(j+1)} \curvearrowleft P\!\left(s \,|\, N, \theta^{(j)}\right)\,,$

(40)  $\theta^{(j+1)} \curvearrowleft P\!\left(\theta \,|\, s^{(j+1)}\right) = \delta^{\rm D}\!\left(\theta - \theta(s^{(j+1)})\right)\,.$
We consider a vanishing uncertainty for the PDF of the step function given the signal, which is equivalent to a Dirac delta density function. We have tested this scheme and found it very stable. Nevertheless, we plan to investigate introducing some uncertainty in the PDF of the step function in future work. This may yield even better agreement in the power spectra.
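The blocked scheme above can be sketched as follows. This is an illustration only, not the argo code: `sample_density` is a hypothetical stand-in for the full HMC density update at fixed step function.

```python
def step(rho, rho_th):
    # theta(rho - rho_th): 1 above the density threshold, 0 below.
    return [1.0 if r > rho_th else 0.0 for r in rho]

def gibbs_threshold_sampler(rho0, rho_th, sample_density, n_steps):
    # Blocked Gibbs scheme: alternate
    #  (i)  a density update with the step function held fixed
    #       (sample_density is a user-supplied stand-in for HMC), and
    #  (ii) a deterministic, vanishing-uncertainty (Dirac-delta)
    #       update of the step function given the new density.
    rho, theta = rho0, step(rho0, rho_th)
    for _ in range(n_steps):
        rho = sample_density(rho, theta)   # theta treated as constant
        theta = step(rho, rho_th)          # Dirac-delta conditional
    return rho, theta
```

Because step (ii) is deterministic, the divergent gradient terms of the step function never enter the density update.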
Appendix C Convergence of the Hamiltonian sampler with nonlinear non-Poisson likelihoods
There is no unique way in the literature to estimate the convergence of a Markov chain, nor a unique stopping criterion. In principle, multiple chains should converge to the same stationary distribution, and thus all samples should be consistent. Within a single chain, the drawn samples should likewise be consistent after the burn-in phase. Comparing the means and variances within one converged chain to those of independently run chains therefore probes the convergence of the chains. Such a test was introduced by Gelman & Rubin (1992) and can be applied to any parameter without loss of generality.
We assume $M$ chains of length $N$. The output of the chains is denoted as $x_m^t$, with $m = 1, \ldots, M$ and $t = 1, \ldots, N$. In our case $x$ would be the multi-dimensional ensemble of the overdensity of each cell in our reconstructed volume. For simplicity we show the calculations for a one-dimensional observable $x$. Starting from an identical proposal distribution we calculate the following quantities.

Calculate each chain’s mean value: $\bar{x}_m = \frac{1}{N}\sum_{t=1}^{N} x_m^t$.

Calculate each chain’s variance: $s_m^2 = \frac{1}{N-1}\sum_{t=1}^{N}\left(x_m^t - \bar{x}_m\right)^2$.

Calculate the mean over all chains: $\bar{x} = \frac{1}{M}\sum_{m=1}^{M}\bar{x}_m$.

Calculate the between-chain variance from the chain means: $B = \frac{N}{M-1}\sum_{m=1}^{M}\left(\bar{x}_m - \bar{x}\right)^2$.

Calculate the average variance within the chains: $W = \frac{1}{M}\sum_{m=1}^{M} s_m^2$.

The potential scale reduction factor (PSRF) then is defined as

$\hat{R} = \sqrt{\dfrac{\frac{N-1}{N}\,W + \frac{1}{N}\,B}{W}}\,.$

If all chains converge to the same target distribution, we expect the variance within one chain to be close to the variance between the chains, so that the PSRF should be close to one.
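The diagnostic steps above translate directly into code. The following is a minimal sketch of the PSRF computation under the standard Gelman & Rubin (1992) definitions; the function name `psrf` is our own.

```python
from statistics import mean, variance

def psrf(chains):
    # Potential scale reduction factor (Gelman & Rubin 1992) for a
    # one-dimensional observable, given M chains of equal length N.
    n = len(chains[0])                           # chain length N
    chain_means = [mean(c) for c in chains]      # per-chain means
    chain_vars = [variance(c) for c in chains]   # per-chain variances (1/(N-1))
    b = n * variance(chain_means)                # between-chain variance B
    w = mean(chain_vars)                         # within-chain variance W
    var_hat = (n - 1) / n * w + b / n            # pooled variance estimate
    return (var_hat / w) ** 0.5                  # PSRF
```

Converged chains sampling the same target yield a PSRF close to one, while a chain stuck at an offset mode inflates the between-chain term and drives the PSRF well above one.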
Appendix D Mass ranges of the Bolshoi simulation
In Fig. 6 we show the mass function of the Bolshoi simulation as a cumulative histogram. Our mass cut creates a subsample with 13000 tracers.
Appendix E Reconstruction of the halo density field
We also run argo with a Poisson likelihood and unity bias. We can see in Fig. 7 that this model creates smoothed reconstructions of the biased halo field (compare Fig. 4). The averaged celltocell correlation is in excellent agreement with the outcome of the halo field.