1 Introduction

Halo clustering and -type primordial non-Gaussianity

Kendrick M. Smith, Simone Ferraro, and Marilena LoVerde

Princeton University Observatory, Peyton Hall, Ivy Lane, Princeton, NJ 08544 USA Institute for Advanced Study, Einstein Drive, Princeton, NJ 08540, USA

A wide range of multifield inflationary models generate non-Gaussian initial conditions in which the initial adiabatic fluctuation is of the form . We study halo clustering in these models using two different analytic methods: the peak-background split framework, and brute force calculation in a barrier crossing model, obtaining agreement between the two. We find a simple, theoretically motivated expression for halo bias which agrees with -body simulations and can be used to constrain from observations. We discuss practical caveats to constraining using only observable properties of a tracer population, and argue that constraints obtained from populations whose observed bias is are generally not robust to uncertainties in modeling the halo occupation distribution of the population.  

1 Introduction

In the last few decades, increasingly precise observations (e.g. [1, 2, 3, 4, 5, 6]) have led to a standard cosmological model in which small initial fluctuations evolve in a CDM background to give rise to the observed universe. Current data are consistent with initial fluctuations which are adiabatic, scalar, Gaussian, with weak deviations from scale invariance ( at 3).

The statistics of the initial fluctuations, i.e. deviations from Gaussian initial conditions, provide a powerful probe of the physics of the early universe. In the context of inflation [7, 8, 9, 10, 11, 12, 13], the simplest models (single-field, minimally coupled slow-roll) predict initial curvature perturbations with negligible deviations from Gaussianity. However, there is a rich phenomenology of non-Gaussian initial conditions in models with multiple fields, self-interactions near horizon crossing, or speed of sound during inflation. In this paper, we will focus on non-Gaussianity of the so-called local type [14, 15, 16, 17], in which the primordial potential111In studies of primordial non-Gaussianity, it is conventional to define a primordial potential , where is the initial adiabatic curvature perturbation. Note that is not the conformal Newtonian potential, which is given by deep in the radiation-dominated epoch where Eq. (1) applies. is of the form


where is a Gaussian field and , are free parameters.222We define -type non-Gaussianity including the term ; this term simply renormalizes so that its power spectrum is equal to the observed power spectrum (to first order in ).

Local non-Gaussianity can be generated by physical mechanisms involving multiple fields, such as light spectator fields during inflation which evolve to generate the initial adiabatic fluctuations (the curvaton scenario) [18, 19, 20], or models where the inflaton decay rate is modulated by a second field [21, 22]. Non-Gaussianity of local type is also naturally generated in non-inflationary models of the early universe such as the new ekpyrotic/cyclic scenario [23, 24, 25]. There is a theorem [26, 27] which states that any single-field model of inflation cannot generate detectable levels of local non-Gaussianity without violating observed limits on deviation from a scale-invariant power spectrum. Thus, detection of either or would rule out all single field models of inflation and place powerful constraints on the physics of the early universe. Current observational constraints on these parameters are consistent with zero [1, 28, 29, 30], but are expected to improve by an order of magnitude or more in the near future.

In models of inflation in which , it is unlikely that observational constraints on will be competitive with constraints on . However, there are a number of examples where , making the term in Eq. (1) the dominant source of primordial non-Gaussianity. This situation arises in curvaton models where non-quadratic terms in the potential are important [31, 32, 33, 34, 35] or in multifield models in which varies rapidly at the end of inflation [36, 37]. The existence of these scenarios makes searching for just as important as and measurements provide important constraints on the microphysical parameter space.

In a pioneering paper [38], Dalal et al showed that large-scale clustering of halos depends sensitively on . More precisely, a sample of halos (or tracers such as galaxies or quasars) with constant bias in a Gaussian cosmology will have scale-dependent bias given by


in an cosmology. Here, is the spherical collapse threshold and is defined by


so that the linear density field and the primordial potential are related by . Large-scale structure constraints on from scale-dependent bias are currently competitive with the CMB (e.g. [28, 39]) and may ultimately provide constraints which are stronger (e.g. [40, 41]). The key identity (2) has been derived using several different analytic frameworks [42, 28, 43] and agrees with -body simulations (e.g. [38, 44, 45, 30]).

In this paper we study the related issue of large-scale halo-clustering in a cosmology. We consider the large-scale halo bias in two analytic frameworks: the peak-background split (§3) and a barrier crossing model (§4). We find consistency between the two formalisms (in disagreement with [46]) and obtain an expression analogous to Eq. (2) for the scale-dependent halo bias in a cosmology. Our main results are a universal relation between the scale-dependent halo bias in a cosmology and the mass function in an cosmology,


and expressions for (Eqs. (50), (51)) which can be used in practice to constrain from data. We also discuss caveats when estimating the bias from observable quantities (§5.4) and argue that constraints obtained from tracer populations which are not highly biased () are generally not robust to uncertainties in HOD modeling.

Throughout this paper we use the WMAP5+BAO+SN fiducial cosmology [47], with baryon density , CDM density , Hubble parameter , spectral index , optical depth , and power-law initial curvature power spectrum where and Mpc. All power spectra and transfer functions have been computed using CAMB [48].

2 Definitions and notation

We will sometimes model halos of mass with peaks in a smoothed density field defined as follows. Let be the linear density field smoothed by a tophat filter with radius , i.e.




Let be the RMS amplitude of the smoothed density field, and let be its -th non-Gaussian cumulant, defined by:


Since and are defined via linear theory, is independent of redshift as implied by the notation. To first order in and , we have


with higher cumulants equal to zero, where are the values of the cumulants at and respectively. These values are given explicitly by:


where was defined previously in Eq. (3) and is the power spectrum of the linear density field, . For numerical calculation, the following fitting functions (from [49]) are convenient:


This paper is mainly concerned with calculating halo bias to first order in and , so let us establish notation from the outset, by writing the large-scale bias in the general form:


where unlike Eq. (2) and Eq. (4) we have allowed for scale-independent corrections and from and primordial non-Gaussianity. Equation (14) defines the coefficients . This equation assumes that the -dependence is of the functional form , but we will derive this analytically (Eq. (35)) and show that it agrees with simulations (§5.1). In this notation, the Dalal et al formula (2) can be written as .

3 Peak-background split

The peak-background split formalism is a procedure for predicting halo clustering statistics on large scales. The basic idea is that a long-wavelength fluctuation in the initial curvature alters the local abundance of halos in a way which is equivalent to perturbing parameters of the background cosmology, e.g. the matter density or the amplitude of the initial fluctuations. The use of this formalism to study halo bias in non-Gaussian cosmologies was pioneered in [28]; we will review this calculation of the bias in an cosmology (§3.1) and then perform an analogous calculation in the case (§3.2).

3.1 cosmology

In an cosmology, the initial conditions are given by:


To analyze the effect of a long-wavelength mode, let us decompose the Gaussian potential as a sum of long-wavelength and short-wavelength contributions. The long/short-wavelength decomposition of the non-Gaussian potential is then


and contains explicit coupling between long and short wavelength modes of the Gaussian potential.

Let us consider how the term in Eq. (16) affects , the long-wavelength part of the halo number density field. In a local region where the long-wavelength potential takes some value , the amplitude of the small-scale modes is perturbed: . This modifies the local halo abundance, in the same way that the global abundance would be modified if the cosmological parameter were perturbed, i.e. we get a term in the long-wavelength halo density of the form . In addition, even in a Gaussian cosmology, there is a perturbation to the local halo abundance which is proportional to the long-wavelength part of the density fluctuation, i.e. a term of the form . Putting this together, the long-wavelength part of the halo density is given by:333In this derivation, we have swept two terms in Eq. (15) under the rug; let us now argue that these are indeed negligible. The term alters the statistics of the small scale modes; this does perturb the halo abundance (by generating skewness in the density field) but the perturbation is independent of the long-wavelength fluctuation . Therefore, this term does not contibute to the large-scale halo bias. The term perturbs the long-wavelength modes and decorrelates them (to order ) from both the linear density fluctuation and the field which modulates the local power spectrum amplitude . In principle, this should generate stochastic bias at order , but we will neglect this, since we are only calculating to order .




Intuitively, in an cosmology, the local power spectrum amplitude is not spatially constant, but varies throughout the universe in a way which is proportional to the long-wavelength potential .

Computing the halo bias from Eq. (17) for , we get:


From the preceding argument, we predict that the scale-dependent bias is given by . We will refer to this as a “weak” prediction for the bias: it cannot be used to constrain from real data, since has not been expressed in terms of observable quantities.

To make further progress, we need to evaluate the derivative , by making additional assumptions. If we assume that the halo mass function is universal, then one can calculate the derivative, obtaining , where is the Gaussian bias [28], so that:


We will refer to this as a “strong” prediction for the scale-dependent bias in an cosmology, since has been expressed in terms of the observable quantity . The strong form is essential for constraining from observations.

3.2 cosmology

Let us now generalize the analysis of large-scale clustering in the previous subsection to the case of a cosmology, with initial conditions given by:


Separating the Gaussian field into long and short wavelength pieces , we decompose as follows:

As in the case, we’ll consider the perturbation to the long-wavelength halo density generated by each of these terms.

The term can be interpreted as a local modulation in the small-scale power spectrum amplitude, given by . This generates a term in the long-wavelength halo density, in close analogy with the case (the modulation is proportional to in this case, rather than ).

The term can be interpreted as follows. In a local region where the long-wavelength potential takes the value , the small-scale modes are perturbed in the same way as in an cosmology where the global value of is given by . This generates a term in the long-wavelength halo density.

Finally, there is the usual term due to changes in mean background density (as in the Gaussian case).

Putting this all together, we find that the long-wavelength halo density field in a cosmology is given by:444Analogously to the case, we have neglected two terms in Eq. (3.2). The term only alters power spectra at order , and we will neglect terms of this order. The term generates kurtosis in the density field and modifies the halo mass function [49], but in a way which is independent of and therefore does not contribute to large-scale clustering.


where and were defined previously (Eqs. (18), (19)), and:


The large-scale halo bias is given by:


Note that the term in Eq. (24) does not contribute to the bias, since the field and the long-wavelength density field are uncorrelated (their cross correlation is a three-point function of Gaussian fields, which vanishes). This term should generate stochastic bias, but we defer a systematic study of halo stochasticity in non-Gaussian cosmologies to a future paper [50].

We have now arrived at the peak-background split prediction (26) for halo bias in a cosmology, which relates the scale-dependent bias to the derivative of the halo mass function in an cosmology. In the terminology of the previous subsection, this is a “weak” prediction: we have shown that the problem of computing the bias is naturally related to the problem of understanding the mass function in an cosmology, but the coefficient has not been expressed in terms of observable quantities.

To obtain a “strong” prediction, we need to evaluate the derivative , which requires making additional assumptions. This has been done in [49], assuming a barrier crossing model for the mass function and using the Edgeworth expansion to calculate the derivative (see also [51, 52, 53, 54, 55, 56, 57]). The result is:


where , and and are Hermite polynomials. We will compare this prediction with -body simulations in §5.

4 Barrier crossing model

In this section, we will study large-scale bias using a barrier crossing model, obtaining results which are consistent with the peak-background split formalism from the previous section. The two approaches are complementary: the barrier model has the advantage that it generates complete predictions for halo statistics (such as the mass function or bias) via an algorithmic calculational procedure, but obscures the physical intuition of the peak-background split. For completeness, the calculations in this section will be sufficiently general to include the cases of Gaussian, -type, and -type initial conditions.

4.1 Setting up the calculation

The barrier crossing model is an old, widely influential idea in cosmology, in which halos of mass are identified with peaks in the smoothed linear density field [58]. Although more complex versions have been proposed, we will use the simplest version: a spherical collapse model with constant barrier height, defined formally as follows.

We model halos of mass as regions where the smoothed linear density field (defined in Eq. (5)) exceeds the threshold value , i.e. the halo number density is given by:


where is the step function


Throughout this paper, we take ; this value produces somewhat improved agreement between the barrier model and simulations, compared to the Press-Schechter value .555We experimented with using a mass-dependent barrier chosen for consistency with a universal mass function such as Sheth-Tormen [59] or Warren [60], but found that this did not result in further improvement.

To study halo bias in this model, we define the following notation. Let , be two points separated by distance , let denote the (unsmoothed) linear density field at , and let denote the smoothed linear density field at . We denote the joint PDF of these random variables by , and denote the 1-variable PDF of by . We define


These quantities are related to the halo mass function and matter-halo correlation function , but there is one wrinkle. In the barrier crossing model, the field defined in Eq. (28) represents the number density of halos with mass , whereas we want to consider a sample of halos with mass in a narrow mass range . Thus and are obtained by taking derivatives as follows:


4.2 Mass function, halo bias, and interpretation

In principle, calculation of the halo mass function and large-scale bias in the barrier crossing model has now been reduced to evaluation of Eqs. (30)–(33). We defer details of the calculation to Appendix A and quote the final results. The halo mass function is given by:


The halo bias is given by (in the large-scale limit ):




Although the above expressions are the result of a purely formal calculation, we will now show that each term has a natural interpretation.

Considering first the halo mass function (34), we have found a Press-Schechter mass function (with ) in the Gaussian case, plus first-order corrections in and which agree with those found in [52, 49] using the Edgeworth expansion. This agreement is expected since the two calculations are based on the same barrier crossing model.

Moving on to halo bias, in the Gaussian case, we predict that is constant on large scales, with value given by Eq. (36). The peak-background split argument suggests a general relation between the large-scale halo bias and the halo mass function which applies generally to a universal mass function of the form:


On large scales, the bias is predicted to be scale-independent and given by [61]:


Comparing our predictions (34), (36) for and , we find agreement, i.e. Eq. (36) for can be interpreted as the general peak-background split expression for halo bias, specialized to the Press-Schechter mass function.

More generally, the and contributions to the bias (Eqs. (37), (38)) represent shifts in the scale-independent part of the bias due to primordial non-Gaussianity. It is straightforward to check that these terms can be obtained by plugging the non-Gaussian mass function in Eq. (34) into the peak-background split prediction (42) for scale-independent bias, i.e. the and terms can be interpreted as changes to the bias which are entirely due to the mass function being perturbed in a non-Gaussian cosmology. This type of term (scale-independent bias proportional to ) was first found for cosmologies in [30]. Note that a scale-independent shift is unobservable in practice, and cannot be used to constrain non-Gaussianity, since the bias of a real tracer population, such as galaxies or quasars, is a free parameter.

The contribution to the bias is the well-known scale-dependent bias in an cosmology. Comparing Eq. (39) for with Eqs. (34), (36), this term can be written either as or . (In §3, we referred to these as “weak” and “strong” predictions.)

The contribution to the bias is the focus of this paper: scale-dependent bias in a cosmology. Eq. (40) gives this term in the “strong” form that was found previously (Eq. (27)) using the peak-background split argument. Alternately, we can write this term in the “weak” form using Eq. (34).

In summary, we have found that the complete expression for large-scale halo bias in the barrier crossing model (Eq. (35)) agrees perfectly with the peak-background split calculation from §3. The bias contains a scale-independent part which can be obtained from the halo mass function, via the general relation (42). The scale-independent bias depends on and , because the halo mass function depends on these parameters. The bias also contains a scale-dependent part whose coefficients can be calculated explicitly and agree with the peak-background split predictions.

4.3 Comparison with previous work

It is interesting to compare the above calculations with the results of [62] (see also [43]), where was calculated using the MLB formula [63], which gives -point functions of halos as an asymptotic series in . The scale-dependent bias was found to be (in our notation):


When this prediction was compared to -body simulations, it was found to be a poor fit.

Comparing with our calculation (40) for , it is seen that the two agree in the high-mass limit , but disagree in subleading terms. This is expected since the MLB formula is based on the same barrier crossing model that we have used, but it is an asymptotic result, whereas we have done an exact calculation (to first order in , ). For realistic halo masses, the “subleading” terms neglected in the MLB formula are of order one (to quantify this better, and agree to 10% only when the halo bias ), so in practice the two predictions are quite different.

Recently, ref. [46] argued that the barrier crossing model cannot generate correct predictions for general non-Gaussian initial conditions such as the model, but we found the opposite conclusion: brute-force calculation in the barrier crossing model, collecting all terms of order , agrees precisely (i.e. to all orders in ) with the peak-background split. It seems intuitively plausible that two must be consistent, since the peak-background split argument depends only on the assumption that halo formation is determined by the local density field, and the barrier crossing model is a concrete example of a model in which this assumption is satisfied.

5 Results from -body simulations

In the last two sections, we have obtained complete analytic predictions for large-scale bias in a cosmology, finding agreement between the peak-background split formalism (§3) and a barrier crossing model based on spherical collapse (§4).

To compare these predictions with simulation, we performed collisionless -body simulations using the GADGET-2 TreePM code [64]. Simulations were done using periodic box size  Mpc, particle count , and force softening length . With these parameters and the fiducial cosmology from §1, the particle mass is  .

We generate initial conditions by simulating a Gaussian primordial potential , and applying or corrections by straightforward use of Eq. (1). We linearly evolve to redshift using the transfer function666One subtlety here: straightforward use of CAMB’s transfer function at redshift 100 leads to inconsistencies since CAMB includes radiation (which is not negligible at ) in its expansion history, while GADGET does not. For this reason we use CAMB’s linear transfer function at low redshift and extrapolate back to using the growth function in an universe. from CAMB [48], and obtain initial particle positions at this redshift using the Zeldovich approximation [65]. (At , transient effects due to use of this approximation should be negligible [66].)

After running the -body simulation, we group particles into halos using an MPI parallelized implementation of the friends-of-friends algorithm [67] with link length . For a halo containing particles, we assign a halo position given by the mean of the individual particle positions. We estimate halo bias using the procedure described in Appendix A of [68]. The statistical error obtained using this procedure is smaller than the error that would be obtained assuming uncorrelated estimates of the power spectra and , since shared sample variance is taken into account.

Results in this paper are based on 4 simulations with Gaussian initial conditions, 5 simulations with , and 3 simulations with (for a total of 20 simulations).

5.1 Fitting the functional form

We now compare our analytic prediction for to simulation in several steps, corresponding to increasingly strong versions of the prediction.

Figure 1: An example to illustrate that halo bias in a cosmology takes the functional form form . This figure corresponds to redshift and halo mass range , but we find the same functional form for all redshifts and halo masses.

First, consider the weakest possible question: our analytic prediction for the bias is of the functional form


Is this is a good fit to simulation, if we treat the coefficients and as free parameters? (We will compare our analytic prediction for to simulation in the next subsection; for now we are just testing whether the functional form (44) is correct.)

In Fig. 1, we show some example fits of this form, for redshift and halo mass range . Each fit was performed using bias estimates from 4 independent simulations with Mpc and wavenumbers Mpc. We find good values for the fits, with recovered parameters:


We note that the recovered bias parameters (45) in this example show that both and are -dependent. In the barrier crossing model, we made a prediction for the dependence of (Eq. (38)). We find good agreement between this prediction and our simulations. Note that in practice, the dependence of is unobservable since for a real tracer population, the halo occupation distribution is not known precisely and must be treated as a free parameter to be determined from data.

The observed dependence of corresponds to scale-dependent bias of order or higher (note that is defined in such a way that constant corresponds to scale-dependent bias which is linear in ). This complicates comparison with our analytic predictions, since we have only calculated the bias to order . We address this by estimating by averaging the estimates obtained from simulations with , thus removing contributions to which are proportional to . Note that this does not remove contributions to the bias, but we have checked that such contributions are negligible for , by comparing with simulations with halved step size.

Repeating this fitting procedure for redshifts and a range of halo masses (the precise set of halo mass bins used is shown in Fig. 2 below), we find values which are consistent with their expected distribution, i.e. we find that the functional form (44) is a good fit to the simulations for a wide range of redshifts and halo masses. For this reason, in subsequent sections, we will “compress” the estimates of in each simulation (as shown in Fig. 1) to two numbers ( and ), with statistical errors given by the fitting procedure.

5.2 Comparison with analytic predictions

Now that we have established the functional form of the bias, and a procedure for estimating from simulation as a function of redshift and halo mass, we would like to compare with our analytic predictions for .

Figure 2: Comparison of the “weak” and “strong” predictions for the scale-dependent bias in a cosmology. Blue squares: Direct estimates of the bias, extracted from simulations with as described in §5.1. Green circles: “Weak” analytic prediction for the bias () from the peak-background split formalism, showing perfect agreement. The estimates of shown in the figure were obtained directly from simulations with . Red dotted curve: Edgeworth prediction for the bias (Eq. (40)). Good agreement is seen at high mass, but at low masses Edgeworth underpredicts . We will find an improvement in §5.3.

First, consider the “weak” form of the prediction () obtained from the peak-background split argument. We can test this prediction by estimating the derivative directly from simulations, by taking finite differences of in simulations with . (We checked convergence in the step size.) We find that the prediction holds perfectly (within the statistical errors of the simulations) for all redshifts and halo masses (Fig. 2).

Second, consider the “strong” Edgeworth prediction (Eq. (40)), in which an explicit formula for is given. In this case, we find reasonable agreement at high mass ( ), but the prediction breaks down at low halo mass (Fig. 2).

Our interpretation is as follows. The peak-background split prediction is a universal relation between bias in a cosmology and the mass function in an cosmology. Although “weak” in the sense that it does not supply a closed-form expression for , the derivation makes few assumptions, and one expects it to be exact. In order to constrain from real data, we need a “strong” prediction which expresses in closed form, using only observable quantities (i.e. the analog of the Dalal et al formula for an cosmology). Using the Edgeworth expansion, one can make such a prediction in the context of the barrier crossing model (Eq. (40)), and obtain rough agreement with simulations, but the level of agreement is not really good enough for doing precision cosmology. Therefore, we next propose a slightly modified version of the Edgeworth prediction.

5.3 A simple universal formula for the bias in a cosmology

We would like to slightly modify the Edgeworth prediction (40) for so that it agrees better with -body simulations. It is also convenient to have a prediction in which is given as a function of observable quantities: Gaussian bias (rather than halo mass, which is unobservable) and redshift .

We start by rewriting the Edgeworth prediction (40) for in terms of variables . The following fitting functions for and are convenient:


For purposes of this subsection, we define the quantity to be given in terms of and by:


The Edgeworth prediction for can be written in the following form:


Empirically, we find that if we tweak the Edgeworth prediction by changing the coefficients of the polynomial in brackets as follows: