# A new method to measure galaxy bias

###### Abstract

We present a new approach for modelling galaxy/halo bias that utilizes the full non-linear information contained in the moments of the matter density field, which we derive using a set of numerical simulations. Although our method is general, we perform a case study based on the local Eulerian bias scheme truncated to second-order. Using 200 -body simulations covering a total comoving volume of , we measure several - and -point statistics of the halo distribution to unprecedented accuracy. We use the bias model to fit the halo-halo power spectrum, the halo-matter cross spectrum and the corresponding three bispectra for wavenumbers in the range . We find the constraints on the bias parameters obtained using the full non-linear information differ significantly from those derived using standard perturbation theory at leading order. Hence, neglecting the full non-linear information leads to biased results for this particular scale range. We also test the validity of the second-order Eulerian local biasing scheme by comparing the parameter constraints derived from different statistics. Analysis of the halo-matter cross-correlation coefficients defined for the 2- and 3-point statistics reveals further inconsistencies contained in the second-order Eulerian bias scheme, suggesting it is too simple a model to describe halo bias with high accuracy.

###### keywords:

cosmology: theory, large-scale structure## 1 Introduction

The clustering statistics of the galaxy distribution contain a wealth of information about the cosmological model. However, in the absence of a robust theory for galaxy formation, extracting this information can only be achieved in part. In practice, to do this requires us to assume a specific phenomenological relationship between the density field of galaxies and that of the underlying matter, more commonly referred to as galaxy bias. Whilst still incomplete, our leading theories of galaxy formation, do provide a great deal of insight about the distribution of galaxies. For instance they predict that galaxies should only reside in dark-matter haloes and be strongly associated with the distribution of sub-structures (for a detailed review of galaxy formation see Mo, van den Bosch & White, 2010). This greatly simplifies our ability to construct a phenomenological model for the galaxy distribution on large scales: it should be closely related to a weighted average of the dark-matter-halo overdensities (e.g. Smith, Scoccimarro & Sheth, 2007).

There are a number of detailed analytical approaches for characterizing the bias of dark-matter haloes with respect to the mass distribution (for a recent review see Porciani, 2013). However, it has yet to be determined which model provides the most accurate description of galaxy bias. In the simplest method, the local Eulerian bias model (hereafter LEB), one assumes that the overdensities of the biased tracers can be written as some function of the matter-density field at the same location. If both densities are smoothed over the patch scale , then the biased field may be written as a Taylor-series expansion (Fry & Gaztanaga, 1993). If one considers sufficiently large patches, then high-order corrections are guaranteed to be small and the series may be truncated after a finite number of terms.

Halo-clustering predictions of the LEB expressed in terms of standard perturbation theory (hereafter SPT, for a review see Bernardeau et al., 2002) have been examined in numerous works (Scoccimarro et al., 2001; Smith, Scoccimarro & Sheth, 2007; Guo & Jing, 2009; Roth & Porciani, 2011; Manera & Gaztañaga, 2011; Pollack, Smith & Porciani, 2012; Chuen Chan & Scoccimarro, 2012). One of the results to emerge from these studies is that, when the model is applied to halo counts within finite volumes of linear size , the coefficients of the bias expansion show a running with the “cell” size. However, halo-clustering statistics such as the -point correlation functions (or the corresponding -spectra) do not contain any smoothing scale and should not depend on . There has been much debate in the literature on how to reconcile these seemingly contrasting results (see Porciani, 2013, for a concise summary).

This has led some to discuss an “effective” or “renormalized” bias approach where the scale-dependence of the bias coefficients is compensated by the contribution of small-scale perturbations in the matter density (Heavens, Matarrese & Verde, 1998; McDonald, 2006; Schmidt, Jeong & Desjacques, 2012). Whilst such a scheme may be plausible (Jeong & Komatsu, 2008; Smith, Hernández-Monteagudo & Seljak, 2009), the development of a unique renormalization method is still ongoing, especially for dynamically evolved configurations in Eulerian space. On the other hand, it was recently proposed by Chuen Chan & Scoccimarro (2012) that the bias parameters obtained counting halos within cells of size are only relevant for describing perturbations of wavenumber in the halo distribution. While there is no challenge to their argument when analyzing power-spectrum data, it does present a complication when using higher-order statistics such as the bispectrum. In order to interpret the galaxy bispectrum one would be required to compute bias coefficients separately for each configuration of wavevectors. This approach appears somewhat cumbersome to implement.

Currently, most observational analyses of galaxy clustering assume that galaxy bias can be described by the truncated LEB and that the statistical properties of the non-linear matter density field can be modelled using SPT. To leading order in the perturbations, this requires only one bias parameter for 2-point statistics of the tracers and two parameters for 3-point statistics. Present-day galaxy surveys, however, do not cover enough comoving volume to accurately sample the spatial scales at which tree-level results provide an accurate description of galaxy clustering. The presence of rare large-scale structures, for instance, significantly alters the measurements of three-point statistics (e.g. Nichol et al., 2006). On smaller scales, where data are more robust, dynamical non-linearities pose a serious challenge to the models. Adopting the simplified LEB+SPT model may therefore generate systematic errors and thus influence the characterisation of the bias or the estimation of the cosmological parameters.

The LEB truncated to second order is the standard workhorse for studying three-point statistics of galaxy clustering. Its predictions to leading perturbative order have been used to interpret measurements from the two-degree field galaxy redshift survey (Verde et al., 2002; Jing & Börner, 2004; Wang et al., 2004; Gaztañaga et al., 2005), the Sloan Digital Sky Survey (Kayo et al., 2004; Hikage et al., 2005; Pan & Szapudi, 2005; Kulkarni et al., 2007; Nishimichi et al., 2007; Marín, 2011; McBride et al., 2011, 2011; Guo et al., 2013), and the WiggleZ Dark Energy Survey (Marín et al., 2013). In our previous study (Pollack, Smith, & Porciani, 2012), we demonstrated that, in order to robustly model three-point statistics with the LEB, one must necessarily have an accurate model for the clustering statistics of the non-linear matter density on the relevant scales. This is imperative to recover the correct values of the bias parameters in controlled numerical experiments. Therefore, it is not surprising that past investigations based on the LEB+SPT model reached inconsistent conclusions. For example, studying the galaxy bispectrum on scales , Verde et al. (2002) concluded that 2dF galaxies are unbiased tracers of the mass distribution. On the other hand, using the complete 2dF sample, Gaztañaga et al. (2005) found strong evidence for non-linear biasing from the analysis of the three-point correlation function with triangle configurations that probe separations between 9 and (see also Jing & Börner, 2004; Wang et al., 2004).

In this paper, we build upon our past experience and present a general method to model the clustering of biased tracers of the mass distribution on mildly non-linear scales . This is key to extend studies of galaxy clustering to smaller spatial separations where observational data are less uncertain. Our method relies on using N-body simulations to measure the relevant statistics for the clustering of the underlying mass distribution. Related approaches have been presented by Sigad, Branchini & Dekel (2000) and Szapudi & Pan (2004) for galaxy counts in cells (see also Pan & Szapudi, 2005, for an application to correlation functions). We apply our general framework to the modelling of -point clustering statistics of non-linear, Eulerian, locally biased tracers. In our framework, bias parameters run with the patch scale . We address the running of the bias by treating the filter scale as a nuisance parameter to be marginalized over. The major advantage of our scheme is that we exactly recover the matter poly-spectra used in the bias model at every order. The only truncation necessary in the model is the choice as to what level to truncate the bias expansion, and this may be selected by the data in a Bayesian model comparison. We test our modelling framework up to quadratic order in the local bias expansion (as commonly done in recent observational studies), for the power- and bi-spectra of haloes and their cross-spectra with matter measured from a large ensemble (200 realizations) of measurements from a series of large CDM -body simulations. This ensemble of simulations resolves the halos that should host luminous red galaxies over a total comoving volume of , and so provides us with a very stringent statistical test ground for our model.

The sections are organized as follows. In §2 we set our mathematical notation and introduce the LEB. The numerical simulations used in this work are briefly described in §3 and used in §4 to measure several statistical quantities for the matter and halo distributions. In §5 we use Bayesian statistics to estimate the free parameters of the LEB and describe our main results. Finally, in §6 and 7 we further discuss our findings and present our conclusions.

## 2 A new framework for modelling the clustering of biased tracers

### 2.1 General formalism

Consider some discrete tracers of the large-scale structure (dark-matter haloes or galaxies) with mean density and physical density . We want to relate this random field to the underlying distribution of matter with local density . If we assume that the density contrast of the tracers averaged over some patch of linear size , , is locally related to the density of matter in the same patch, then we may write

(1) |

where denotes a generic function and the symbols

(2) |

(where stands for haloes or matter) denote smoothed overdensity fields, being a rotation-invariant filter function with size .

Since we are dealing with smooth mathematical functions we may Taylor expand Eq. (1) to obtain (Fry & Gaztanaga, 1993):

(3) |

where the terms are the Eulerian bias coefficients of order , which depend on both the smoothing scale and the exact definition of the tracers (e.g. halo mass, etc.). Note that the subtraction of the terms at each order ensures that , where denote an ensemble average. On Fourier transforming the above relation one finds, for ,

(4) |

where can be written as

(5) |

In the last expression denotes the Dirac-delta distribution and we have made use of the compact notation and .

We now define the power spectrum of the biased tracers and their cross-spectrum with the matter in terms of the correlators:

(6) |

Similarly, the corresponding bispectra can be defined as

(7) | |||||

where we have suppressed the dependence of the bispectrum on the third wavevector, since the Dirac-delta distribution imposes . On inserting Eq. (4) into Eq. (6), we find:

with and (for haloes and matter, respectively) where denotes the Kronecker-delta function. Similarly for Eq. (7) we have:

It is convenient to introduce the functions and such that

(10) |

and

(11) | |||||

In simple words, denotes the cross power spectrum between the smoothed random fields and , while is the corresponding bispectrum. Thus for the halo and matter power and bispectra we have:

(12) | |||||

(13) |

The above sets of equations provide us with models for the power spectra and the bispectra of halo counts in cells of size . However, what we really want to model is the halo 2- and 3-point functions, and . We assume that these quantities can be approximately recovered by “de-smoothing” and (Smith, Scoccimarro & Sheth, 2007; Smith, Sheth & Scoccimarro, 2008; Sefusatti, 2009):

(14) | |||||

(15) |

Note that when considering a model of halo bias beyond linear order this operation does not fully remove the dependence of the theory on . In Section 5, we will use the models presented in Eq. (14) and Eq. (15) to fit simulation data. Nevertheless, our choice to “de-smooth” the theoretical model is equivalent to analyzing counts in cell data with a smoothed model. This is due to the fact that in Fourier-space the smoothing kernels can be treated as multiplicative factors, which means that if we factorize the expressions by dividing out the product of the window functions the relation between the model and the data still holds. Hence, fitting counts in cells data with a smoothed model is indifferent to analyzing unsmoothed data with a “de-smoothed” or factorized model.

The smoothing scale must therefore be considered as a free parameter of the model, and so it must be either determined by fitting a set of data or marginalized over.

In §A.1 and §A.2 we show how the terms and are related to the -point matter spectra, where or , respectively. In §A.3 we prove that the functions are totally symmetric in and . For , the functions are not in general symmetric in , , and , unless the wavevectors are also exchanged, i.e. whilst , . Note that in this study we choose to work with -point spectra, , that are symmetric to an exchange of their vectorial arguments, and we accomplish this through the symmetrization operation:

(16) |

where denotes the -dimensional generalization of the Levi-Civita symbol and we take its absolute value.

In previous studies, the functions and have been modelled through the use of a combination of perturbation theory and semi-empirical models. In Pollack, Smith & Porciani (2012) we recovered these functions exactly from an -body simulation and demonstrated that they are essential to measure the bias parameters in an unbiased way. We will revisit these issues in §4 and §5.

### 2.2 Case study: biasing to second order

As an example, let us evaluate the case when the bias is taken to second order and all higher-order bias coefficients are vanishing. This is a widespread assumption often used to interpret observational data from massive redshift surveys (see §1 for a long list of references). We will consider a unique set of dark-matter haloes. For the case where we have multiple halo bins (e.g. mass selected), the expressions are more cumbersome but no more complicated. Starting with the two-point statistics, one can formulate the halo auto- and cross-power spectra with the total mass up to second-order in the LEB:

(17) | |||||

where from §A.1, we see that

(19) | |||||

(20) |

Note that the functions are -dimensional integrals over the smoothed matter correlators of order , . These include connected and disconnected terms (see §B).

For the three-point statistics, the symmetrized auto- halo and cross-bispectra with respect to the matter, up to second order in the bias model, may be written:

(21) | |||||

(22) | |||||

(23) | |||||

where for brevity we suppressed the dependence of the bispectra on . In §A.2, the functions are -dimensional integrals of the polyspectra of order . Specifically:

(24) | |||||

(25) | |||||

In §4 we show how one may estimate and directly from an -body simulation.

## 3 N-body Simulations

In order to test the LEB and also to determine the covariance matrices of the various spectra we have simulated 200 realizations of a flat CDM cosmological model. The specific cosmological parameters that we have adopted are: where: is the variance of linear mass fluctuations in top-hat spheres of radius ; and are the matter and baryon density parameters; is the dimensionless Hubble parameter in units of 100 km s Mpc; and is the power-law index of the primordial density power spectrum. Our adopted values were inspired by the results from the WMAP experiment (Komatsu et al., 2009).

All of the -body simulations were run using the publicly available Tree-PM code GADGET-2 (Springel, 2005). This code was used to follow with high accuracy the non-linear evolution under gravity of equal mass particles in a periodic comoving cube of length , giving a total sample volume of . Newtonian two-body forces were softened below scales . The transfer function for the simulations was generated using the publicly available cmbfast code (Seljak & Zaldarriaga, 1996), with high sampling of the spatial frequencies on large scales. Initial conditions were laid down at redshift using the serial version of the publicly available 2LPT code (Crocce, Pueblas & Scoccimarro, 2006).

We use only the simulation outputs at redshift for analysis and identify dark matter haloes using the code BFoF. This is a Friends-of-Friends algorithm (Davis et al., 1985), where we adopted a linking length corresponding to times the mean inter-particle spacing. The minimum number of particles an object must contain to be considered a bound halo was set to 20. This implies a minimum halo mass of and a mean number density of . Further details regarding this set of -body simulations can be found in Smith (2009) and Smith et al. (2012).

## 4 Estimating the spectra

In this section we describe how we estimate all the halo and matter polyspectra that enter the second-order LEB from the -body simulations at redshift .

### 4.1 The halo auto- and cross-power and bispectra

To begin, the halo and matter density fields are interpolated onto a cubical Cartesian mesh using the cloud-in-cell (CIC) algorithm. Throughout we use mesh sizes corresponding to . We then Fourier transform these grids using the Fast Fourier Transform technique and correct each mode for the CIC assignment. The three power spectra , , , and the four bispectra, , , , , are then estimated using the expressions:

(27) | |||||

(28) | |||||

where is the number of Fourier modes in a narrow shell centred on , represents the pair of vectors which lie in thin shells centred on and , whose angular separation lies in the angular bin centred on , and is the total number of triangles with this configuration in Fourier space. The superscript “d” denotes that these are spectra of a discrete distribution of points (i.e. haloes) and must be corrected for shot noise. The forms of the Poissonian shot-noise corrections we adopt were presented in Pollack, Smith & Porciani (2012):

(29) | |||||

(30) | |||||

where denotes the mean number density of either the matter particles or the halo population.

Figure 1 presents the various power- and bi-spectra averaged over the 200 realizations with the corresponding standard errors on the mean. All spectra were corrected for shot noise using Eqs (29) and (30). The bispectra were measured for triangle configurations with fixed lengths and , but with varying angle . We adopt the convention for and parallel. In order to use the same range of wavenumbers, the power spectra were measured over the scale range .

### 4.2 Estimating and

The polyspectra and that enter the expressions for the halo power- and bi-spectra in the LEB are affected by the non-linear evolution of the matter fluctuations. While these terms are usually approximated with perturbative techniques, we measure them directly from our -body simulations. We do this as follows. First, we correct each Fourier mode of the mass-density field for convolution with the CIC grid. Then we multiply the result by a Gaussian smoothing function and inverse transform back to real space. Next, we generate the fields for the relevant values of and re-transform them into Fourier space. We then deconvolve these fields for the original smoothing, which means simply multiplying each Fourier mode by . Finally, the required and terms, defined in terms of (see Eq. (5)), can be estimated as follows

(31) |

and

(32) | |||||

We note that the functions and slowly vary with and so can be smoothly interpolated. Based on this knowledge, we measure the spectral functions over the range: , in increments of , but including an additional measurement at . The lower limit was adopted because we do not wish to smooth below the Lagrangian size of haloes, which for our sample is of the order of . The upper bound of we justify by noting that we do not want the largest -mode entering our computations of the halo power- and bi-spectra to be too heavily smoothed.

Before inspecting the functions and , we first report the level of non-linearity present in the smoothed matter-density field, . We quantify this by measuring the variance of the density perturbations, and the fraction of cells where the density contrast exceeds unity, , as a function of the filter scale (see Table 1). Our results show that for . We therefore expect the quadratic bias model to be a poor description for smaller values of . However, we note that the fraction of the cells with is for all of the filter scales considered. Furthermore, in our previous work (Pollack, Smith & Porciani, 2012), we evaluated the scatter plots of versus measured from our -body simulations for different smoothing radii. We found that expressing as a polynomial function at second-order in can describe reasonably well the mean trend of the scatter.

2 | 2.44 | 10.0 |
---|---|---|

4 | 0.71 | 8.4 |

6 | 0.38 | 6.0 |

7 | 0.28 | 4.9 |

8 | 0.22 | 3.8 |

10 | 0.15 | 2.2 |

12 | 0.10 | 1.1 |

14 | 0.08 | 0.5 |

16 | 0.06 | 0.2 |

18 | 0.05 | 0.1 |

### 4.3 Results for and

Figures 2 and 3 show the results for the ensemble-averaged de-smoothed power and bispectra, and , respectively. Focusing on the power spectrum, the panels show (from left to right) the matter power spectrum followed by the terms , and .

For the bispectra, the panels show: the matter bispectrum (top left), (top right), (bottom left), and (bottom right). We have restricted the triangle configurations to those which enter the auto- and cross- halo bispectra shown in Figure 1. Each panel shows six sets of points with errorbars which denote the results obtained for different smoothing scales. The red crosses denote the resulting polyspectra when no Gaussian smoothing (and de-smoothing) is applied on top of the CIC assignment.

Comparing the different panels reveals how the amplitudes of the de-smoothed quantities vary. Obviously, for the matter power and bispectra, and , all of the spectra overlap with the CIC result as the smoothing and the de-smoothing procedures perfectly cancel each other out. However, for the remaining and functions, the de-smoothed quantities vary with the scale . In particular, as decreases, the overall amplitude of the spectra increases due to the contributions of small-scale modes. For the largest smoothing scales, the configuration dependence of the spectra is also modified. In order to gain some insight into the origin of this behaviour, let us consider, for instance, the term

(33) |

where is a generic weighting function defined as

(34) |

For Gaussian smoothing, the weighting function in Eq. (33) can be re-expressed as , with the cosine of the angle between and . The contribution to the integral from all modes with is exponentially suppressed (i.e. ).

The contribution to the integral from all modes with is exponentially suppressed (i.e. ). However, at fixed , assumes values larger than unity for and (independently of ) and presents an absolute maximum for and where it takes the value . Note that, when , so that all configurations where receive nearly the same weight. In this case, the parameter regulates how quickly the function drops when moves away from the region where . In other words, behaves nicely as a smoothing function. This is not true, however, when and the value of grows very large. In this case, receives dominant contributions from a narrow shell of modes located at and . This effect is clearly seen in Figure 2 for where the over-smoothing (i.e. the fact that is significantly larger than unity for ) leads to a change in shape for which is particularly evident for the configurations with the largest wavenumbers.

It is interesting to investigate why, for , the configuration dependence of changes very little with and only the overall normalisation appears to depend on the smoothing scale. If we assume that the amplitude of the bispectrum keeps nearly constant at all scales assuming a value , Equation (33) then gives

(35) |

The first term on the right-hand-side gives the -space volume over which the bispectrum is averaged to get . At fixed , this expression diverges as when and exponentially as while it shows broad minimum around .

Clearly, had we not smoothed the density field, the resulting and would be divergent in any CDM cosmology.

### 4.4 Modelling and with SPT

In order to better understand what drives the amplitude and functional form of the and terms we have attempted to model their signal with SPT. For simplicity, we have focused on the lowest-order non-trivial terms and .

To leading order in the perturbations, the matter bispectrum can be written as with the second-order SPT kernel (see Appendix B) and the linear power spectrum. In Figure 4 (left panel) we show the results obtained after inserting this expression into Equation (33) in comparison with the measurements from the -body simulations. The SPT-based model displays the same scaling behaviour with and as the data. However, for the SPT predictions are accurate to better than per cent, which is still not at the level of precision required for future galaxy clustering datasets; the deviations become larger with smaller . It follows from the definition of the term that (see Appendix B for )

(36) | |||||

where denotes the matter trispectrum (i.e. the connected part of the -point correlator). The SPT contribution to lowest non-vanishing order is simply:

(37) |

In the right panel of Figure 4 we show that this approximation (star-shaped points) strongly underestimates the outcome from the -body simulations (solid symbols with errorbars) and does not display the same scaling behaviour with and as the data. A common approach performed during observational data analysis is to substitute in place of the linear power spectrum, , shown in Eq. 37, the fully non-linear power spectrum, . We found that performing this substitution has little effect on the resulting amplitudes, remaining roughly equivalent as the lowest nonvanishing contributions. We then go one step further and compute the next-to-leading-order corrections to which are of sixth-order in terms of the linear density field. This gives

(38) | |||||

where denotes the first loop correction to the power spectrum (i.e. ) and the term represents the tree-level contribution to the connected trispectrum. In Appendix B we provide the expressions needed for evaluating all these quantities, which are de-smoothed according to Eq. (15). Our final results are shown in Figure 4 (solid lines). The SPT approximation shows the correct scaling with , but for it tends to overpredict the amplitude for collinear (i.e. and ) configurations. For it also underpredicts the amplitude for triangles in which and are nearly perpendicular. However, as increases the discrepancy lessens and at SPT performs better. This suggests that using SPT to fit galaxy bispectra in the scale range may possibly lead to seriously biased estimates for the parameters of the LEB.

Nevertheless, whilst the analytic calculations of and are feasible, computing higher-order terms becomes increasingly challenging. However, estimating these quantities from simulations is no more demanding than measuring the low-order terms and so our approach offers a distinct advantage over the classical SPT calculations.

## 5 Estimation of halo bias

### 5.1 Bayesian parameter estimation

The second-order LEB contains three parameters: . In this section, we use Bayesian statistics to determine their values that best represent the halo power and bispectra extracted from our simulations. For simplicity, we assume that the cosmological parameters are perfectly known and that the measurement errors are Gaussian distributed, i.e.