# An exploration of galaxy-galaxy lensing and galaxy clustering in the Millennium-XXL simulation

###### Abstract

The combination of galaxy-galaxy lensing and galaxy clustering data has the potential to simultaneously constrain both the cosmological and galaxy formation models. In this paper we perform a comprehensive exploration of these signals and their covariances through a combination of analytic and numerical approaches. First, we derive analytic expressions for the projected galaxy correlation function and stacked tangential shear profile and their respective covariances, which include Gaussian and discreteness noise terms. Secondly, we measure these quantities from mock galaxy catalogues obtained from the Millennium-XXL simulation and semi-analytic models of galaxy formation. We find that on large scales (), the galaxy bias is roughly linear and deterministic. On smaller scales () the bias is a complicated function of scale and luminosity, determined by the different spatial distribution and abundance of satellite galaxies present when different magnitude cuts are applied, as well as by the mass dependence of the host haloes on magnitude. Our theoretical model for the covariances provides a reasonably good description of the measured ones on small and large scales. However, on intermediate scales , the predicted errors are 2–3 times smaller, suggesting that the inclusion of higher-order, non-Gaussian terms in the covariance will be required for further improvements. Importantly, both our theoretical and numerical methods show that the galaxy-galaxy lensing and clustering signals have a non-zero cross-covariance matrix with significant bin-to-bin correlations. Future surveys aiming to combine these probes must take this into account in order to obtain unbiased and realistic constraints.

###### keywords:

Cosmology: theory. Gravitational lensing: weak## 1 Introduction

Since the first pioneering attempt to measure the galaxy-galaxy lensing (hereafter GGL) signal by Tyson et al. (1984), there have been significant technological developments in deep and wide-field astronomy, which have lead to the emergence of GGL as one of the most promising probes for simultaneously constraining both the cosmological and galaxy formation models.

The first robust detection of the GGL signal was made by Brainerd, Blandford & Smail (1996) using 90 arcmin of imaging data from the 5m Palomar telescope. They showed that whilst the tangential shear profile around individual galaxies was too weak to be measured, the stacked signal around all lens galaxies could be detected with high signal-to-noise. Since then there has been a rapid explosion in the field: the addition of the Wide-Field Camera to the Hubble Space Telescope enabled a number of key GGL studies (Griffiths et al., 1996; Hudson et al., 1998; dell’Antonio & Tyson, 1996; Leauthaud et al., 2012). The improvement of ground-based facilities such as the Canada-France-Hawaii Telescope has also led to significant developments (Wilson et al., 2001; Hoekstra, Yee & Gladders, 2004; Parker et al., 2007; van Uitert et al., 2011; Hudson et al., 2015; Velander et al., 2014). However, perhaps the most prolific work in this area comes from the analysis of the data from the Sloan Digital Sky Survey (hereafter SDSS) (McKay et al., 2001; Guzik & Seljak, 2001, 2002; Sheldon et al., 2004; Hirata et al., 2004; Mandelbaum et al., 2005, 2006c, 2006a, 2006b; Johnston et al., 2007; Sheldon et al., 2009b, a; Nakajima et al., 2012; Mandelbaum et al., 2013). All these works have revealed that the GGL signal is a complex function depending on a number of galaxy properties, such as luminosity, colour, spectral type etc. The key importance of GGL is that it enables one to make a direct link from galaxy properties to the underlying dark matter distribution. Indeed, these works have also constrained the mass, density profiles, ellipticity of the dark matter haloes hosting the lens galaxies.

One of the first to realize that cosmic shear could help to simultaneously constrain galaxy formation and cosmology, through directly measuring the bias, was Schneider (1998). Schneider’s approach of using aperture mass filters was implemented by Hoekstra et al. (2002) who directly measured galaxy bias, establishing that it was a complicated function of scale. This approach was further theoretically developed for the Halo Occupation Distribution (hereafter HOD) framework by Guzik & Seljak (2001) and later Seljak et al. (2005) and Yoo et al. (2006). More recent work has been performed by Cacciato et al. (2009, 2013) who have combined the results from GGL and galaxy clustering (hereafter GC) studies, along with measurements of the galaxy luminosity function (hereafter GLF) from SDSS to constrain the parameters of the Conditional Luminosity Function (hereafter CLF) model – which fully specifies the link between a given dark matter halo and the galaxies it hosts, albeit with assumptions about the functional form of the CLF (Yang, Mo & van den Bosch, 2003; van den Bosch et al., 2013). An interesting result to emerge from this work, was that if one did not include the GGL measurements in the analysis, then equally good fits to the CLF model parameters could be obtained for either WMAP1 or WMAP3 cosmology (Spergel et al., 2003, 2007). Including the GGL data broke this degeneracy and identified the WMAP3 parameters as the preferred cosmological model. See also the very recent works of Miyatake et al. (2013); More et al. (2014).

Whilst there has been significant progress in attempting to understand and interpret the GGL signal (see also Baldauf et al., 2010; Saghiha et al., 2012), our understanding of how to perform a robust likelihood analysis with such data sets has been lacking. For example, in the recent development of the CLF framework (van den Bosch et al., 2013; Cacciato et al., 2013; More et al., 2013), the GGL and GLF measurements were taken to have diagonal covariance matrices, and the GC covariance matrix was obtained from jackknife estimation. Moreover, these probes were assumed to have zero cross-covariance. This is clearly a simplification that future large data sets analyses should improve on. A better analysis of the errors was performed by Leauthaud et al. (2012) who used numerical simulations to estimate the covariance matrices of the GGL and GC measurements, while Mandelbaum et al. (2013) used the jackknife approach. The very recent works of Miyatake et al. (2013); More et al. (2014) also had an improved error analysis. However, again in their analyses the cross-covariance of the two measurements was assumed to be negligible.

If upcoming surveys such as the Dark Energy Survey (DES), the Javalambre-Physics of the Accelerated Universe Astrophysical Survey (J-PAS), Euclid, and the Large Synoptic Survey Telescope (LSST) are to optimally constrain the cosmological model, then it is inevitable that they must also jointly constrain the model of galaxy formation. The best way to do this will be to combine the GGL, GC and GLF measurements. This will require not only accurate models for the signals themselves, but also accurate modelling of the covariance and cross-covariance matrices of these probes.

In this paper we develop an analytical framework to compute both the covariance and cross-covariance of GC and GGL. Our work builds upon the analysis of the earlier work of Jeong, Komatsu & Jain (2009) for GGL and that of Smith & Marian (2014, 2015) for the GC signal. We then use the semi-analytic galaxy catalogues and dark matter distribution from the Millennium-XXL (hereafter MXXL) simulation (Angulo et al., 2012) to directly measure these observables and their associated auto- and cross-covariances, for several bins in luminosity. Unlike the CLF approach, semi-analytic models (hereafter SAM) make no direct assumption on how galaxies populate dark matter haloes. Instead, they attempt to model the relevant physical processes for galaxy formation and evolution, and how these are affected by environment and assembly history. Thus, the nonlinearity and stochasticity of galaxy bias are predictions, not assumptions in our study.

The paper is structured as follows. In §2 we present the necessary theoretical expressions for modelling the stacked tangential shear profiles and projected galaxy clustering signals. In §3 we present expressions for their associated auto- and cross-covariances. In §4 we provide an overview of the MXXL-simulation and the SAM galaxy catalogues that we use. In §5 we present the measurements of the GGL and GC signals for a set of luminosity bins, and compare them with the predictions from the theory. In §6 we present our results for the GC and GGL covariance and cross-covariance matrices. In §7 we summarize our findings and draw our conclusions.

## 2 Theoretical predictions for the GGL and GC signals

In this section we present theoretical expressions for the stacked tangential shear signal of a population of galaxy lenses and the signal for the projected galaxy correlation function.

### 2.1 Overview of required lensing ingredients

In the flat-sky approximation, the complex shear is written in terms of the convergence as (Bartelmann & Schneider, 2001)

(1) |

where the lensing kernel is defined to be,

(2) |

with and being the Cartesian components of the Euclidean vector . These equations can be written in Fourier space as:

(3) |

where and are the Cartesian components of the vector ; and and are the Fourier transforms of and respectively. From Eq. (3), the complex shear can also be written using the polar coordinates of the Fourier vector as:

(4) |

The tangential shear at position with respect to position (where and are defined with respect to the same origin), is:

(5) |

where is the polar angle of the relative position vector . Combining the last two equations, one can write:

(6) |

Finally, we define the azimuthal average of the tangential shear as:

(7) |

With the help of the very useful Bessel relation

(8) |

one obtains the following equation which will recur many times in this section:

(9) |

Therefore the azimuthal average of the tangential shear is:

(10) |

### 2.2 Estimator for the stacked tangential shear

In galaxy-galaxy lensing, the signal of individual lenses is very weak, so in order to improve the signal-to-noise of this probe, one has to stack several lenses. Suppose there is a population of galaxy lenses at positions in a chosen coordinate system. Suppose also the total area of the survey to be . The number density of these lenses can be written as a sum over their positions:

(11) |

where is a 2D vector in the survey area. Using the above equation, an estimator for the tangential shear of such a lens population at an arbitrary position with respect to the location of the galaxy centres is (Jeong, Komatsu & Jain, 2009):

(12) | |||||

We define the fluctuation in the number density of lenses as , where the mean angular density of lens galaxies is . At this point we also introduce the definitions of the convergence and galaxy density auto- and cross-power spectra, which shall be used throughout this paper:

(13) |

Since the tangential shear with respect to an origin vanishes on average , the ensemble average of the estimator in Eq. (12) is

(14) | |||||

In the above we have also used the homogeneity of the Universe, which makes the ensemble average of two cosmological fields to be invariant under translations. We shall take advantage extensively of this property throughout this work. With Eq. (9), we arrive at the azimuthally-averaged expression for the ensemble average of the stacked shear estimator:

(15) |

Our goal is to compare theory predictions with estimates from simulations, so we must take into account that the measured tangential shear is bin averaged and not just azimuthally averaged. We introduce the bin area:

(16) |

with and being the lower and upper bounds of the radial bin . The bin-averaged stacked tangential shear estimator is defined by

(17) |

Defining the bin-averaged Bessel function of order to be:

(18) |

and using Eq. (15), we write the final expression for the bin-averaged stacked tangential shear estimator

(19) |

### 2.3 The projected galaxy correlation function

We define the estimator for the projected galaxy correlation function to be:

(20) |

where is an estimator for the 3D galaxy correlation function, is the comoving projection length, and the position vector has the components in cylindrical coordinates. The estimator for the galaxy correlation function is discussed in section §A.2 of the appendix, here we just mention that it is unbiased, i.e. . Note that in this study we shall ignore redshift-space distortions, as for our chosen projection length , their contribution to is of on the largest scales that we consider and less otherwise (Baldauf et al., 2010). The galaxy correlation function can be written in terms of its Fourier transform, the galaxy power spectrum:

(21) | |||||

where the second line follows from expressing the wavevector in cylindrical coordinates, with components . is the component along the line-of-sight and the magnitude is defined in the standard way . The expectation of the projected galaxy correlation function estimator may therefore be written

where is the zero order spherical Bessel function. Note that we can also obtain an expression for the ensemble average of our projected correlation function estimator in spherical coordinates, choosing a particular frame where

(24) | |||||

where in the last equality we took advantage of the fact that the result did not depend on our particular choice of frame, and switched back to cylindrical coordinates. Whilst Eqs. (LABEL:eq:wgg_ave) and (24) are expected to yield the same result, the evaluation of the latter should more accurate since it involves a single Bessel function integral.

Finally, we may apply the Limber approximation to simplify Eq. (LABEL:eq:wgg_ave). In this approximation it is only modes that are transverse to the line-of-sight which contribute to the power spectrum integral, and so the second integral in Eq. (2.3)) becomes,

(25) | |||||

Using this relation in Eq. (2.3) we find that the
Limber-approximated ensemble average of the projected galaxy
correlation function estimator is therefore^{1}^{1}1Note that this
result can also be obtained if in Eq. (LABEL:eq:wgg_ave) one takes
.:

(26) |

## 3 Signal covariance matrices

In this section we compute the auto- and cross-covariance matrices of the stacked tangential shear signal and the projected galaxy correlation function.

### 3.1 The covariance matrix of the stacked tangential shear estimator

The definition of the covariance of the estimator in Eq. (12) is

(27) |

The bin-averaged estimate of the GGL covariance is defined according to Eq. (17) as:

(28) |

In appendix A.1 we provide the complete details of the derivation of the covariance of the stacked tangential shear profiles. The main result is (c.f. Eq. (73)):

(29) | |||||

where is the variance per shear component in the measurement of one source galaxy, and is the mean angular density of the source galaxies.

### 3.2 The covariance matrix of the projected galaxy correlation function estimator

The azimuthally-averaged covariance of the projected correlation function estimator can be written as a projection of the covariance of the estimator for the 3D galaxy correlation function, which we denote . Hence,

In the appendix A.2 we provide complete details of the derivation of the covariance matrix of and the final result is given by Eq. (84). On combining the equation above and Eq. (84), we arrive at the expression for the azimuthally-averaged covariance of the projected galaxy correlation function estimator:

(30) | |||||

where is a constant quantifying how dense the random catalogue used to estimate the correlation function is relative to the galaxy data (see §A.2 for more details). Note that in the above is the mean galaxy volume density, whereas in §3.1 the same notation is used for the mean angular density of lens galaxies. As in the case of the ensemble-averaged projected galaxy correlation function estimator, here too we apply the Limber approximation to simplify the above result. The Limber-approximated covariance is given by:

(31) | |||||

Note that the survey volume can be expressed as , where denotes the transverse area of the survey. If , then we also have and the Limber-approximated covariance becomes:

Therefore, the Limber covariance is well-behaved in the limit where . Finally, using the definitions in Eqs. (16) and (17), we write the expression for the bin-averaged, Limber-approximated covariance of the projected galaxy correlation function estimator:

(32) | |||||

### 3.3 The cross-covariance of the stacked tangential shear and the projected galaxy correlation function estimators

In this section our goal is to compute the cross-covariance of the estimators for the stacked tangential shear and projected galaxy correlation functions, defined by Eqs. (12) and (20). To this avail, we follow the same procedure as before, and define the cross-covariance as:

(33) | |||||

where and are two 2D position vectors. To simplify this calculation, we shall use the angular correlation function instead of the projected correlation function. This is justified by the fact that our lenses are in a thin redshift slice, in which case the two correlation functions are equivalent. In appendix A.3 we provide complete details of the derivation of the cross-covariance matrix.

The final expression is given by (c.f. Eq. (98)):

Just like before, we are interested in the cross-covariance matrix of the bin-averaged measurements. In a similar fashion to the analysis of the previous sections, we see that under binning the above equation becomes:

(34) | |||||

Eq. (34) represents the bin-averaged cross-covariance of the estimator for the angular galaxy correlation function and the stacked tangential shear estimator. As expected, it has no dimensions.

## 4 The MXXL simulation

The MXXL is the largest simulation in the Millennium series, with a volume of and dark matter particles of mass . The cosmological model corresponds to a flat CDM universe with: the matter density parameter ; the dimensionless Hubble parameter ; the amplitude of matter fluctuations ; the primordial spectral index ; and a constant dark energy equation of state with . For a complete description of the MXXL we refer the reader to Angulo et al. (2012).

Halo and subhalo catalogues were stored for snapshots. The smallest object in these catalogues has a mass . Merger trees were built by identifying for every subhalo in each snapshot the most likely descendant in the next snapshot. The trees were then used to build a galaxy catalogue with the SAM galaxy formation code L-Galaxies (Springel et al., 2005).

The L-Galaxies code corresponds to a set of differential equations that couple with the above-mentioned merger trees and that encode the key physical mechanisms for galaxy formation. Processes such as gas cooling, star formation, feedback from SN and AGN, galaxy mergers, black hole formation and growth, and generation of metals are all implemented in a self-consistent manner. We refer the interested reader to Guo et al. (2011); Henriques et al. (2012) and references therein for specific details on the method, and to Angulo et al. (2014) for details on the implementation in the MXXL simulation. Here, we just highlight that the galaxy population of a given halo does not depend on its mass alone, as commonly assumed in many models, but also on the details of the halo assembly history and environment. For each galaxy, the full star formation history is stored, and when coupled with population synthesis models and an assumed initial stellar mass function, it allows us to compute the expected luminosity for each of the five SDSS filters.

In Figure 1 we present the luminosity function for the five SDSS bands. We find that all of the luminosity functions show qualitatively similar behaviour: a steep fall-off at bright magnitudes and a turn-over followed by a power-law-like tail at intermediate and faint magnitudes. We also see that for a given magnitude band there are greater abundances of galaxies at red wavelengths than at blue. This is qualitatively consistent with observational results from the SDSS (Blanton et al., 2003). For the faintest magnitudes we notice artifacts produced by the finite mass resolution of the MXXL simulation. We elaborate on this next.

In the upper left panel of Figure 2 we present the relative abundance of central, satellite and orphan galaxies as a function of their red-band absolute magnitude. ‘Central’ galaxies reside at the centres of the main halo(subhalo), and are therefore the main galaxies of the FoF haloes. ‘Satellite’ galaxies inhabit satellite subhaloes within the FoF haloes. ‘Orphan’ galaxies are satellite galaxies whose dark matter subhalo has been stripped down below the resolution limit of the simulation. The figure clearly shows that the brightest galaxies () are mostly centrals. The satellites with a resolved dark matter subhalo are sub-dominant for all magnitude bins, but dominate among satellite galaxies with . These features depend on the mass resolution of the simulation (see Figure 17 for an analogous figure constructed from the higher-resolution Millennium simulation). With a much higher mass resolution, central galaxies would dominate at any luminosity and there would be no orphan galaxies.

The upper right panel of Figure 2 shows how the mass of the host haloes evolves with the galaxy luminosity. Independent of the brightness of satellite and orphan galaxies, the haloes hosting them are quite massive (), whereas the haloes inhabited by central galaxies display a substantial decrease in mass with decreasing luminosity. This drop in halo mass spans more than two orders of magnitude, starting at .

The bottom left panel of Figure 2 shows the average distance of the galaxies from the central galaxy, as a function of magnitude. By definition the distance of central galaxies is zero. Satellite galaxies are on average most distant from the halo centre: the brightest have a separation of , and the faintest of about . On the other hand, orphan galaxies are on average within from the halo centre, which is a consequence of tidal forces being stronger closer to the halo centre where tidal disruption and mass loss happen.

Finally, the bottom right panel of Figure 2 presents the evolution of the host subhalo mass with luminosity. For centrals this is in fact the same as shown in the upper right panel; the other two types follow the same trend as the centrals. Note that the subhalo mass associated with the orphan galaxies is defined to be the mass of the last subhalo tracked before it fell below the mass resolution of the simulation. We can see that there are no strong differences among different types which is a consequence of the fact that the -band magnitude mostly traces the total amount of mass in stars, which in turn depends primarily on the total amount of gas available for star formation and thus on the mass of the host dark matter structure.

In this paper we shall use only the red band, since most of the GGL and GC studies to date have focused on this band. We shall only consider galaxies with , split into four absolute magnitude bins, with each bin spanning a single unit of magnitude, except for the brightest bin for which we take all galaxies with . We have ensured that above the chosen limit our results are qualitatively insensitive to the finite mass resolution of the MXXL by explicitly comparing the GGL and GC signals with those derived from the higher-resolution Millennium simulation.

## 5 Results I: clustering and lensing measurements in MXXL

### 5.1 Methodology for estimating the projected correlation functions

In order to estimate the GGL and GC signals and their covariance from the MXXL data, we divide the simulation box into 216 subcubes of volume . Each subcube therefore contains dark matter particles, as well as galaxies from the catalogues described in section §4. For our analysis we assume both the Born and Limber approximations, which allow us to perform all of the computations at the fixed redshift . For each of the subcubes, we measure the projected matter-matter, galaxy-matter, and galaxy-galaxy correlation functions. In order to handle the huge data volume, we have developed a fast k-D tree code in C++ with MPI parallelization. Our algorithm is similar to that described by Moore et al. (2001) and Jarvis, Bernstein & Jain (2004). However, rather than invoking an approximate scheme for binning the pair counts as is done in these algorithms, we place every particle exactly into the correct radial bin. We have carefully tested that our code obeys the pair counting scaling and that it reproduces exactly the answer obtained from a brute-force pair summation code.

For the particular problem of computing the correlation functions in the MXXL simulation, we count pairs in logarithmic bins of the transverse distance , and linear bins of the line-of-sight distance . Since the subcubes do not have periodic boundary conditions, we also cross-correlate the data with a random catalogue to account for boundary effects on the pair counts. With the pair counts for the bins, the 3D correlation function can be estimated by using the unbiased and minimum-variance (in the limit of no-clustering) estimator of Landy & Szalay (1993):

(35) |

where and represent the first and second data catalogues, and is the random catalogue. , , , and represent the respective pair counts. For auto-correlations, , while for cross-correlations e.g. of galaxies and matter, they differ. Note that this estimator perfectly matches the one we defined in Eqs. (74) and (77), since we took the weights there to be constant. The ratio of the number of data particles to the number of particles from the random catalogue represents the from Eq. (74).

Some details of how we estimate these correlations are as follows. In order to obey computing time constraints, we limit the random catalogues to particles. To maintain a value as low as possible for , we subsample both the matter and the galaxy data. The number of subsamples is , and for each of them we generate a random catalogue. The rate of sampling for matter is , which gives us about dark matter particles per subsample, while for each luminosity bin we randomly select no more than galaxies. These values correspond to for matter, and for galaxies in each luminosity bin. The only exception is the first luminosity bin, containing the brightest galaxies, which has only about galaxies per subcube, and is therefore not subsampled. The matter particles and galaxies from every subsampled subcube are correlated with the random catalogue generated for that subsample. Thus for all the subcubes, the ith subsample is correlated with the ith random catalogue. We have checked that given the number of data particles, the number of random particles is sufficiently large not to yield significant errors in the measured projected correlations.

We found that it was crucial to correct the projected correlation functions for the integral constraint, otherwise the results exhibited a strong dependence on the projection length . This owes to the fact that the total density of objects in each subcube is not guaranteed to reach the universal mean.

We implement the integral constraint in the estimates of the 3D galaxy correlation function in a similar fashion to the procedure described in Landy & Szalay (1993): To begin, we define the ‘geometric factor’:

(36) |

where is the number of random pair counts for the bin , and is the cylindrical volume of the respective bin, i.e. , with . Thus is the number of pairs possible in the bin relative to the total number of pairs in the survey volume. It is normalized to unity over the survey volume, i.e. . The integral constraint is defined by the equation:

(37) |

where is the estimator introduced in Eq. (35) at the respective bin. The integral-constraint-corrected estimator for the projected galaxy correlation function is given by:

(38) |

where is the number of bins in . We apply this to all of the measurements.

Finally, we mention the choice of bins: we have 25 logarithmic bins in , spanning the interval , and 10 linear bins in , with a chosen of . We checked that the number of line-of-sight bins is sufficiently large to obtain an accurate estimation of the projected correlation function.

To summarize: the projected correlation functions are estimated through the following steps: i) use the tree code to evaluate the 3D pair counts for each bin, and for every subsample; ii) build the Landy-Szalay estimator for the 3D correlation function from the pair counts and determine the integral constraint factor; iii) add the line-of-sight bins to obtain the projected correlation functions, i.e. compute following Eq. (38); iv) calculate the average of the subsamples.

### 5.2 Galaxy and matter correlation functions

Figure 3 presents the projected matter-matter, galaxy-matter, and galaxy-galaxy correlation functions – red, green, blue solid lines respectively – from the 216 subcubes of the MXXL. Each line corresponds to the estimate from one subcube, and each panel to a magnitude bin, starting with the brightest galaxies in the upper left corner, and down to the faintest in the lower right corner. The measurements have some scatter for the two brightest magnitude bins, but are relatively tight otherwise. This plot also provides a check that none of the subcubes displays any anomalous behaviour.

### 5.3 Galaxy bias and cross-correlation coefficient

We may obtain the galaxy bias parameter either from the galaxy-galaxy or galaxy-matter projected correlation functions through:

(39) | |||||

(40) |

Figure 5 presents the two bias estimates for all magnitude bins and averaged over all 216 subcubes of the MXXL simulation. There are several points worth noticing in these figures. Firstly, on large scales, we see that both and appear to be constant and qualitatively consistent with one another. We also note that the bias is relatively similar for the fainter bins, but it increases sharply for the brightest galaxies. This luminosity dependence of the bias in SAM models is consistent with earlier studies (see Smith, 2012, and references therein), and can be understood from Figure 2. There we see that it is only for the brightest magnitude bin that the host haloes of both the central and satellite galaxies are very massive. For the fainter bins, the mass of the host haloes decreases with luminosity in the case of central galaxies, while remaining relatively constant in the case of the satellites. Thus the bias of the latter is boosted. This explanation is consistent with the picture where an individual galaxy inherits the bias of the halo hosting it.

On smaller scales, we notice that : the scale where this transition occurs decreases with increasing absolute magnitude, ranging from for the brightest galaxies to for the faintest bin. The fact that can be qualitatively understood from the following reasons. The 3D galaxy-galaxy correlations in SAM models obey an exclusion condition, which is that individual galaxies cannot come closer than the separation of the sum of their individual subhalo virial radii. On scales smaller than this, the correlation function drops to -1. Whereas for the galaxy-matter cross-correlation function no such exclusion is present and one simply probes the density profile of matter around galaxies – which is known to be cuspy (Hayashi & White, 2008). However, in projection these effects are less significant, but nevertheless still operate and lead to the shape shown in Figure 5. The transition scale varies with magnitude because the halo mass of central galaxies decreases with increasing absolute magnitude (c.f. Figure 2).

The cross-correlation coefficient can be defined as:

(41) |

Note that this is not the same as is usually understood in statistics, where the cross-correlation coefficient of two variables and is constrained to be . Eq. (41) is defined in terms of correlation functions, and hence provided it is not required to obey the condition . Indeed if either the galaxy-galaxy or matter-matter correlation functions cross zero, which they most certainly do, then is formally divergent. Nevertheless, the diagnostic properties of are key: if the galaxy bias were linear and deterministic then , and measurements of either or may be directly related to the underlying matter distribution, modulo an amplitude factor. However, any departure from unity indicates that the bias is either nonlinear or stochastic, or both (for more discussion see Dekel & Lahav, 1999; Seljak & Warren, 2004).

Figure 5 shows the cross-correlation coefficient as a function of the transverse scale . We find that on large scales, the correlation coefficient approaches unity for the four magnitude bins that we have considered. This implies that, at least for the SAM galaxies in MXXL, the large-scale bias is linear and deterministic and that it describes both the galaxy-galaxy and galaxy-matter correlation functions. On small scales we see that the correlation coefficient decreases sharply and then shoots up above unity. This is consistent with the nonlinear scale-dependent bias presented in Figure 5. Note that the small-scale clustering of galaxies is very sensitive to the treatment of dynamical friction of orphan galaxies, so we do not wish to over-interpret the results of Figures 5 and 5 for .

### 5.4 Comparison of the measured and theoretical projected galaxy correlation function

In Figure 6 we show again the projected galaxy correlation functions for the four magnitude bins, but this time we have averaged the data over the 216 MXXL subcubes. The error bars are plotted on the mean. The solid lines present the theoretical predictions. We evaluate the theory as follows: instead of a direct numerical evaluation of Eq. (26), which would require a model for , we determine the projected nonlinear matter correlation function under the Limber approximation by replacing . We then simply multiply this quantity by the measured bias, e.g. Eq. (39):

(42) |

In computing we use the nonlinear matter power spectrum fitting formula halofit (Smith et al., 2003).

In Figure 6 we see that on small scales , the predictions underestimate the measured by roughly . These discrepancies are attributed to the fact that halofit underpredicts the true nonlinear matter power spectrum on small-scales (Takahashi et al., 2012). The more recent work of (Fosalba et al., 2015a, b) shows however that the revised halofit of (Takahashi et al., 2012) overpredicts the nonlinear power spectrum and consequently also the convergence power spectrum on small scales. Since the agreement between halofit models and simulation measurements depends on both the mass resolution and the volume of the simulations – the closer the simulations’ specifications to those used for the calibration of the semi-analytical prescriptions, the better the agreement – we defer more sophisticated theory predictions to a future work and are satisfied for now that the original halofit of (Smith et al., 2003) gives reasonable enough answers for our purpose here.

It is also interesting to note that the faintest galaxies yield high projected correlation functions around , most likely due to contributions from the satellite galaxies, which inhabit higher-mass haloes and are therefore more biased.

### 5.5 The stacked tangential shear of galaxies

As mentioned earlier, since the full particle data was not available for a large set of redshifts, we were not able to perform ray-tracing simulations. Instead, we use the Born approximation to make lensing observables from the MXXL data. In real terms, this means that the convergence is obtained as a weighted line-of-sight integration of the matter density fluctuations (Bartelmann & Schneider, 2001):

(43) |

where we have assumed a flat space-time geometry and is the Hubble constant, is the speed of light, is the linear matter density perturbation and is the expansion factor.

If we now reexamine Eq. (15) it can be proven that the azimuthally-averaged tangential shear about a randomly-selected point which we take to be , may be written in real space as (Schneider, 2005):

(44) |

If instead of random points we consider the centre of a lens galaxy as the reference point, then on averaging over all galaxies the above expression becomes (Guzik & Seljak, 2001):

(45) |

where we used the relation and the differential surface mass density is given by

(46) |

In the above is the comoving matter density of the Universe, and we have assumed that the circularly-averaged tangential shear is sourced only by the matter associated with a single lens galaxy. The critical surface-density for lensing is given by:

where , , and represent the observer-source, observer-lens, and lens-source angular diameter distances, respectively. To keep the notation compact, we shall omit the dependence on and write the critical density simply as .

From our earlier discussion in §2.2 we see that an alternative way to compute the stacked tangential shear is through Eq. (15). In the Limber approximation and assuming once again that only matter associated with the lens galaxy creates the shear signal we have:

(47) |

We shall use Eq. (46) to compute the excess surface density both analytically and from the simulation data. For the theoretical predictions, we choose to compute in the same way as we computed in Eq. (42), only this time using from Eq. (40). We prefer this approach to that offered by Eq. (47), because it allows us to obtain in the same way for both simulations and the theory. At larger this choice is unimportant, however at smaller radii, owing to the fact that the radial binning does not start at , it plays a more important role and the results from Eqs. (45) and (47) differ systematically.

The right panel of Figure 6 presents a comparison between the theoretical predictions and measurements of as a function of the transverse spatial scale , for the four magnitude bins considered. The symbols correspond to the simulations and the lines to the theory predictions. On large scales, , we see that similar to , the shear amplitudes in the three faintest bins are comparable, whereas the brightest galaxies have a higher amplitude. On smaller scales, , we find a systematic trend: the brighter the galaxies the larger the amplitude of the tangential shear profile. This finding is in accord with the GGL measurements from the SDSS by Mandelbaum et al. (2006c).

On closer inspection of the faintest luminosity bin, we see that it appears to have a somewhat broad, flattish shear profile, with a very slight second peak at , and higher amplitude than the brighter bins (excepting the brightest bin) on intermediate scales . This might be due to the increased relative abundance of satellite galaxies compared to centrals. The satellite galaxies are mainly hosted by high-mass haloes and have an average distance from the centre of the main halo that is roughly on the order of , and so we expect that their tangential shear profiles receive two significant contributions: the first from the dark matter associated with their own subhalo; the second comes as the shear profile radius encompasses the central cusp of the main halo. This qualitatively explains the broadening of the profile of the faintest bin. We also note that the theory underpredicts the measurements more significantly than for . This can be attributed to the small-scale inaccuracies of halofit contributing to at all radii.

## 6 Results II: covariance matrices from the MXXL

### 6.1 Covariance of the projected correlation function

In Figure 8 we present the errors on the mean projected galaxy correlation function, divided through by the signal, as a function of the transverse scale , and for the magnitude bins discussed previously. The blue pentagons in the figure represent the noise-to-signal ratio estimated directly from the subcubes. The unbiased estimator of the mean and covariance is:

(48) |

(49) | |||||

where is the bin-averaged estimate of the projected correlation function from the th subcube. The covariance and error on the mean are then simply obtained by further dividing the right-hand side of Eq. (49) by the number of subcubes . The red triangles den