Clustering of \rm{H}\alpha emitters

The luminosity-dependent clustering of star-forming galaxies from to with HiZELS


We present clustering analyses of identically-selected star-forming galaxies in 3 narrow redshift slices (at , and ), from HiZELS, a deep, near-infrared narrow-band survey. The HiZELS samples span the peak in the cosmic star-formation rate density, identifying typical star-forming galaxies at each epoch. Narrow-band samples have well-defined redshift distributions and are therefore ideal for clustering analyses. We quantify the clustering of the three samples, and of luminosity-selected subsamples, initially using simple power law fits to the two-point correlation function. We extend this work to link the evolution of star-forming galaxies and their host dark matter halos over cosmic time using sophisticated dark matter halo models. We find that the clustering strength, , and the bias of galaxy populations relative to the clustering of dark matter increase linearly with luminosity (and, by implication, star-formation rate) at all three redshifts, as do the host dark matter halo masses of the HiZELS galaxies. The typical galaxies in our samples are star-forming centrals, residing in halos of mass a few times . We find a remarkably tight redshift-independent relation between the luminosity scaled by the characteristic luminosity, , and the minimum host dark matter halo mass of central galaxies. This reveals that the dark matter halo environment is a strong driver of galaxy star-formation rate and therefore of the evolution of the star-formation rate density in the Universe.

galaxies: evolution – galaxies: high-redshift – galaxies: halo – cosmology: large-scale structure of Universe

1 Introduction

The galaxies we observe exist in a wide range of environments, from rich clusters to underdense void regions. They are thought to trace an underlying distribution of dark matter, with more highly clustered galaxies occupying massive dark matter overdensities (Zwicky, 1933; Peebles, 1982). This is commonly explained via the paradigm of hierarchical growth: weak density fluctuations in an expanding, homogeneous Universe are amplified by gravitational instabilities, with smaller structures forming first. Galaxies form due to the collapse of baryonic matter under the gravity of dark matter halos (White & Frenk, 1991), with the progenitors of the most massive clusters starting to form earliest. Dark matter halos assemble via successive mergers and accretion of small halos, which naturally leads to the formation of galaxy groups and clusters, with a single dark matter halo capable of hosting many galaxies.
While the observed ‘cosmic web’ spatial distribution of dark matter in the Lambda Cold Dark Matter (CDM) paradigm can be successfully modelled using N-body simulations (Davis et al., 1985) as advanced as the Millennium Simulation (Springel et al., 2005), resolution is limited and the evolution of galaxies within this web is harder to model. This complexity reflects the additional baryonic processes present: we must consider not only the underlying distribution of dark matter but also the non-linear physics of galaxy formation and evolution. Key processes such as gas cooling, star-formation, and the physics of feedback due to star-formation and black hole accretion all act on different timescales with different galaxy mass and environment dependencies. The latest generation of hydrodynamical simulations such as Illustris (Vogelsberger et al., 2014) and EAGLE (Crain et al., 2015; Schaye et al., 2015) and semi-analytic models (e.g. Baugh, 2006) currently do fairly well in modelling such processes, broadly reproducing key observed relations such as galaxy luminosity and stellar mass functions, and the bimodal galaxy colour distribution, but a wealth of observational data is required to fine-tune parameters.
Many details of the environmental drivers of galaxy evolution, and how they relate to galaxy mass, remain poorly understood. It has long been known that at low redshifts, galaxies in rich clusters are preferentially passive ellipticals (Oemler, 1977; Dressler, 1980), whereas field galaxies tend to be star-forming and disk-like, with increasing star-formation rates and star-forming fractions further from cluster centres (Lewis et al., 2002; Gomez et al., 2003). High mass galaxies are also far less likely to be star-forming than their low mass counterparts (Baldry et al., 2006). Despite these well-established observational trends, the effects of mass and environment have remained hard to distinguish, given the inter-dependence of the two quantities (galaxies of higher masses tend to reside in higher density environments).
The latest observational data at both low and high redshifts has provoked a flurry of recent work aiming to understand the relationships between stellar mass, star-formation rate and environment (e.g. Peng et al., 2010; Sobral et al., 2011; Scoville et al., 2013; Darvish et al., 2015). Both mass and environment are associated with transformations in colour, star-formation rate and morphology, popularly known as ‘quenching’. Supplementing low redshift data from the SDSS (York & Adelman, 2000) with higher redshift data from the zCOSMOS survey (Lilly et al., 2007), Peng et al. (2010) proposed that two primary quenching mechanisms, ‘mass quenching’ and ‘environment quenching’, act independently and dominate at different epochs and galaxy masses. ‘Environment quenching’, which primarily affects satellite galaxies (Peng et al., 2012), is attributed to some combination of gas stripping (due to ram pressure (Boselli & Gavazzi, 2006) or tidal effects) and ‘strangulation’ (Larson, 1980; Peng, Maiolino & Cochrane, 2015), whereby gas is prevented from cooling onto the galaxy from its hot halo, perhaps upon accretion onto a massive halo. Mass quenching, which dominates the cessation of star-formation for massive galaxies, is also attributed to a shut-down of cold gas accretion, via shock heating by the hot halo (Dekel & Birnboim, 2006), possibly in combination with AGN heating (Croton et al., 2006; Best et al., 2006).
There is evidence that the trends observed at low redshift hold to at least . At , Sobral et al. (2011) and Muzzin et al. (2012) both find that the fraction of galaxies that are star-forming decreases once we reach group densities and at high galaxy masses. However, things become less clear at even higher redshifts. Scoville et al. (2013) find a flattening in the relationship between environmental overdensity and both star-forming fraction and star-formation rate above for galaxies in the COSMOS field, and note that this flattening holds out to their highest redshift galaxies at . Other studies have found an apparent reversal of the low-z star-formation rate (or morphology)-density relation at higher redshifts (Butcher & Oemler, 1978). Both Sobral et al. (2011) and Elbaz et al. (2007) found that at , median galaxy star-formation rates increase with overdensity until cluster densities are reached, at which point star-formation rates decrease with overdensity, as in the local universe. Attempting to explain these opposing trends, McGee et al. (2009) propose that the pressure of the intra-cluster medium on infalling galaxies in the outskirts of galaxy clusters actually compresses gas and enhances star-formation prior to stripping in the denser environment of the cluster core. Increased galaxy-galaxy interactions may also trigger intense star-formation via the disruption of gas disks. At high redshifts, high gas fractions (e.g. Tacconi et al., 2010) permit more efficient starburst responses. Thus at high redshifts, the richest environments may provide the combination of large gas reservoirs and ICM pressures which fuel high star-formation rates and later lead to quenching via gas exhaustion and stripping (Smail et al., 2014).
Quantifying the environmental dependence of star-formation activity at high redshift directly is inherently challenging. An alternative approach to studying this is through auto-correlation functions of star-forming galaxies. The dark matter correlation function is the inverse fourier transform of the dark matter power spectrum. Observing the projected real-space galaxy correlation function, which is a linear scaling of the dark matter correlation function, provides a natural connection between galaxies and the underlying matter distribution which determines their large-scale environments. Modelling these correlation functions using Halo Occupation Distribution model frameworks (Peacock & Smith, 2000) can yield more information about galaxy host halos, in particular their masses. It also provides a powerful technique for exploring the central/satellite dichotomy in galaxy populations. The ‘one-halo’ term represents clustering on small scales, within a single dark matter halo, and is determined by the spatial separation of central galaxies and their satellites. The ‘two-halo’ term, in contrast, is controlled by the larger-scale clustering of galaxies in different dark matter halos (driven primarily by the halo mass), and incorporates central-central pairs as well as satellite-satellite and central-satellite correlations. A consistent picture has emerged in which more luminous and more massive star-forming galaxies tend to be more strongly clustered, as a result of lying preferentially in high mass dark matter halos. This holds at both low redshifts (e.g. Norberg et al., 2001; Zehavi et al., 2011) and at high redshifts (e.g. Sobral et al., 2010; Wake et al., 2011; Geach et al., 2012; Hatfield et al., 2016).

Figure 1: Distribution of dust-corrected luminosities of HiZELS emission line-selected galaxies in our samples at the three epochs. Vertical dashed lines show the characteristic luminosity, , at each redshift. HiZELS galaxies span a large luminosity range at each epoch, probing well below .

In this paper we build upon the work presented in Sobral et al. (2010), which studied the clustering of emitters at from the High-Redshift(Z) Emission Line Survey (HiZELS, see Section 2). Narrow band surveys such as HiZELS select only those galaxies with emission lines within a very narrow redshift range (), and with a well-defined redshift distribution. For clustering measurements, these types of survey are therefore superior to photometric ones, which are often hampered by systematic uncertainties and require a more complex treatment of the spatial distribution in the clustering analysis. Furthermore, unlike many spectroscopic surveys, the narrow band approach provides a clean selection function down to a known flux (star-formation rate) limit. Sobral et al. (2010) found evidence for a strong luminosity dependence of the clustering strength of emitters at , along with evidence for a single relation with from to . Geach et al. (2008, 2012) supplemented this work with the first analyses of the clustering of HiZELS galaxies at , though the sample was not sufficiently large to permit binning by luminosity.
Here we analyse a larger sample of emitters at spanning three fields: COSMOS, UDS and SA22. Crucially, we also use larger samples of emitters at , and include new data at (Sobral et al., 2012, 2013). Our samples, which span large ranges in luminosity and redshift, provide optimal data for revealing the drivers of galaxy evolution over cosmic time. We provide details of the HiZELS sample selection in Section 2. In Section 3, we lay out our approach to quantifying the clustering of these sources via two-point correlation functions, and in Section 4 we present the results of simple power-law fits to these. Given the high quality of the correlation functions obtained, we extend these analyses to incorporate a sophisticated Halo Occupation Distribution (HOD) modelling treatment. In Section 5 we set up the HOD framework and present derived halo properties for our HiZELS galaxies, in particular typical halo masses and galaxy central/satellite fractions. We discuss the implications of these results in Section 6.
We use a , and cosmology throughout this paper.

Field No. emitters Area ()
Table 1: Numbers and mean redshifts of H emitters identified by the HiZELS survey and selected for this analysis. \textcolorblackHiZELS uses standard and custom-made narrow-band filters, complemented by broad-band imaging, over well-studied fields. Only emitters which exceed the limiting flux, , of their frames are included.
Figure 2: Left: the completeness curve used to place sources in frames with flux limit . We account for a small number of excess sources due to flux boosting around the detection limit. Right: example of random sources in the COSMOS field, colour coded by the limiting flux of their frame, with real sources shown by stars overlaid. Fluxes are given in units of .

2 The HiZELS survey and sample selection

2.1 Sample of emitters

HiZELS (Geach et al. 2008; Sobral et al. 2009; Sobral et al. 2012, 2013) used the United Kingdom Infra-Red Telescope (UKIRT)’s Wide Field CAMera (WFCAM), the Subaru Telescope’s Suprime-Cam with the NB921 filter, the Very Large Telescope (VLT)’s HAWK-I camera and the Canada France Hawaii Telescope (CFHT) with MegaCam (CFHiZELS; Sobral et al. 2015) to detect line emitters over large areas within well-studied fields. We present only a brief overview of the survey here, referring the curious reader to Sobral et al. (2013) for a full description of the HiZELS COSMOS and UDS data, and to Sobral et al. (2015) for details of the SA22 CFHiZELS campaign.
HiZELS uses standard and custom-made narrow-band (NB) filters, complemented by broad-band (BB) imaging. Sources identified by the narrow-band filters are matched to those in the broad-band images by using the same aperture size and a search radius of . True emitters are selected based on their NB-BB colour excess, with a signal-to-noise cut of and an equivalent width selection corresponding to for . High quality photometric redshifts derived from data spanning from optical to mid-IR wavelengths (e.g. Cirasuolo et al. 2010; Ilbert et al. 2009; Lawrence et al. 2007) were used to identify which emission line is being selected for each emitter, and thus select a clean sample of H emitters. This technique enables the identical selection of H emitting galaxies at (NBJ: COSMOS, UDS, SA22), (NBH: COSMOS, UDS) and (NBK: COSMOS, UDS); see Table 1 for details. Spectroscopic redshifts confirmed that the large sample of galaxies we obtain lies within well-defined redshift ranges (see also Sobral et al. 2016b; Stott et al. 2016).
fluxes are corrected for contamination by the adjacent lines within the NB filter using the relationship between and derived by Sobral et al. (2013) and confirmed spectroscopically in Sobral et al. (2015). They are also corrected for dust attenuation assuming (Garn et al., 2010; Ibar et al., 2013). The median combined correction is at , at and at .

2.2 Generating random samples

We generated unclustered random samples in order to quantify the clustering of the observed emitters. Variations in coverage and observing conditions have resulted in individual HiZELS frames having different depths, meaning that robustly-constructed random samples are essential to differentiate between true clustering and that introduced by the observing strategy. In this section we describe the construction of random samples which reflect these depths.
Most simply, random sources may be generated by calculating a limiting flux at which each frame is essentially complete, drawing sources from the luminosity function down to this flux, and distributing these randomly across the frame. For this analysis we aim to push further in flux, so as to include as many sources as possible. We include sources down to luminosities corresponding to the completeness flux, , as calculated by Sobral et al. (2013); Sobral et al. (2015) for each frame using Monte Carlo simulations. To study source detection as a function of the limiting flux (taking account of both incompleteness and flux boosting biases), we have calculated the ratio of the number of sources recovered, , to the number of sources expected from the luminosity function, , as a function of in each frame. We found a small boost in the number of sources with recovered fluxes around the flux limit, suggesting that flux-boosting effects dominate over incompleteness. We tested different filters, and both deep and shallow fields separately, and found that all show the same general form. We have therefore fitted a single empirically-derived effective completeness curve (Figure 2, left) and taken this into account when generating the random catalogues. Numerous tests have confirmed that our results are qualitatively unchanged if the random sources are simply drawn from the luminosity function down to or constructed using a slightly different completeness curve.
In this paper, we use luminosity functions of the form:


Here, represents the characteristic luminosity ‘break’ of the LF, is the corresponding characteristic comoving space density, and is the ‘faint-end’ slope of the power law, dominant at low luminosities. The parameters we adopt, given in Table 2, were derived using the samples of emitters from Sobral et al. (2013); Sobral et al. (2015).

Table 2: LF parameters used in this paper, derived in Sobral et al. (2013); Sobral et al. (2015). At , we use the Schechter function fit to the much larger sample by Sobral et al. (2015), which is more accurate than that presented by Sobral et al. (2013) and is also a good fit for the data.

We generated a random position for each random source, carefully taking into account the boundaries of each frame and the masked regions due to bright stars and artefacts. The final number of sources generated within a frame depends on both its unmasked chip area and its depth. All random samples are substantially larger (e.g. ) than the real samples. When constructing correlation functions for samples binned by flux, we also require knowledge of the fluxes of the random sources, to account for faint sources being preferentially detected in the deepest frames. The fluxes of random sources are drawn from the luminosity functions given in Table 2, scaled by the fitted completeness curve (Figure 2) for a given . We have also incorporated average corrections for dust and [NII] emission line contamination. We did not include any real or random sources with flux in this analysis.

2.3 Effects of potential contaminants

Here we discuss three classes of possible contaminants: sources that are not true emitters; true emitters which are different lines misclassified as ; and AGN interlopers. As discussed in Section 2.1, HiZELS emitters are selected based on their NB-BB colour excess, with a signal-to-noise cut of . To check the possibility of including false emitters, we have repeated the clustering measurements using a more conservative cut of for various luminosity bins. We find no significant differences in the clustering strengths. We also note that the exclusion of sources with fluxes below their frame’s serves to remove some potential low-flux contaminants. Contamination from misclassified lines is also estimated to be small, at , as estimated by Sobral et al. (2013). Such contaminants will generally have the effect of a small decrease in , with much smaller effects than our observed trends.
Our sample could suffer from contamination from AGN, for which emission is not a reliable tracer of star-formation rate. Using extensive multi-wavelength data to identify AGN candidates within HiZELS samples in the COSMOS and UDS fields, Garn & Best (2010) estimate an AGN fraction of , but Sobral et al. (2016a) find that this can be much higher at very high luminosities. We expect that the effect of AGN contamination may only be very important in the highest luminosity bins. However, these bins show no evidence of deviation from the linear trend of the low-luminosity regime (see Section 4.2). Given that it is difficult to exclude these individual sources from our analyses, we present all results using luminosity rather than converting to star-formation rate explicitly. We invoke star-formation rate only in our gas-regulator model interpretation in Section 6.2.

3 Quantifying galaxy clustering using the two-point correlation function

Broadly, the two-point correlation function compares the clustering of an observed sample to a uniformly distributed random sample with the same areal coverage. It quantifies overdensities on a large range of scales; unlike nearest-neighbour estimators, it can yield insights into both the local environment within halos and the large scale environment. When quantifying galaxy clustering, we construct correlation functions based on angular or projected distances between pairs of galaxies on the sky.

3.1 Angular two-point clustering statistics

The angular two-point correlation function, , is a popular estimator of the clustering strength of galaxies. It is defined as the excess probability of finding a pair of galaxies separated by a given angular distance, relative to that probability for a uniform (unclustered) distribution. The probability of finding objects in solid angles and separated by angular distance is:


where N is the surface density of objects.

Many estimators of have been proposed. We use the minimum variance estimator proposed by Landy & Szalay (1993), which was shown to be less susceptible to bias from small sample sizes and fields:


where and are the total number of random and data galaxies in the sample, and , and correspond to the number of random-random, data-data, and data-random pairs separated by angle . is normally fitted with a power law, , where . Traditionally, Poissonian errors are used:


However, these errors are underestimates (e.g. see Norberg et al., 2009), since they do not account for cosmic variance or correlations between adjacent bins. Using these errors also gives unjustifiably large weightings to the largest angular separations, where large DD pair counts result in very low .
Norberg et al. (2009) conclude that while no internal estimator reproduces the error of external estimators faithfully, jackknife and bootstrap resampling methods perform reasonably well, although both overestimate the errors. They note that jackknife resampling estimates the large-scale variance accurately but struggles on smaller scales (), with the resulting bias strongly dependent on the number of sub volumes. Bootstrap resampling, meanwhile, overestimates the variance by approximately 50% on all scales, which may be minimised by oversampling the sub-volumes. In this paper, we use the bootstrap resampling method with each correlation function constructed from 1000 bootstraps, taking the error on each bin as the diagonal element of the bootstrap covariance matrix.
We also implement the integral constraint, IC, (Groth & Peebles, 1977), a small correction to account for the underestimation of clustering strength due to the finite area surveyed.


IC is small where fields are large. HiZELS fields reach square-degree scales, and so IC corrections are largely negligible.

3.2 Obtaining a real-space correlation length

In order to compare the clustering strengths of populations of star-forming galaxies at different redshifts quantitatively, we convert the angular correlation function to a spatial one. This conversion is often performed using Limber’s approximation (Limber, 1953), which assumes that spatial correlations which follow are projected as angular correlation functions with slopes . This results in the approximate relation between and :


where , and are the filter profiles for projected fields 1&2. Substituting yields:



blackwhere is the gamma function. This is a good approximation for small angular scales, where , and can thus be used to evaluate from the fitted profile. However, the integral diverges for narrow filters. Simon (2007) shows that in the limiting case of a delta function filter, the observed is no longer a projection, but simply a rescaled (thus at large separations). Since Limber’s approximation is not reliable for our samples of galaxies, which span fields with separations of degrees and use very narrow filters, we perform a numerical integration of the exact equation:


Here, , , , and is the profile of the filter, fitted as a Gaussian profile with and that depend on the filter being considered (see Table 3 for the parameters of our filters). We assume the standard value of . fitting of observed against modelled , generated using different , allows us to estimate and its error (following Sobral et al., 2010).

Projected-space two-point clustering statistics

The clustering statistic required as input for the halo fitting routine we use in Section 5.3 is the projected-space () two-point correlation function, . We therefore transform our measured to . is defined by first considering the spatial two-point correlation function along the line of sight () and perpendicular to the line of sight ():


is then integrated over to obtain :


This is related to the real-space correlation function by:


in the limit of a wide filter, and the solution tends to:


In the case of a narrower top-hat filter, we integrate over a finite range of , using as the upper limit to the integral in Equation 11.

In this paper, we calculate from our observed . However, our filter profiles are not top-hat (as assumed for the integral in Equation 11) but are better approximated by Gaussian profiles (see Table 3 for parameters). To account for this difference, we perform numerical integrations to determine the factor by which differs (for a given ) if observed over a top-hat of width as opposed to a Gaussian of width (changing in Equation 8); we find a required correction of . Using this, and combining Equations 39,  10, with , we then obtain:

0.011 1970 14
0.016 3010 18
0.016 3847 18
Table 3: Parameters of \textcolorblackgaussian filter profile fits for the three HiZELS redshifts studied.

4 Results from power-law fits to the angular correlation function

Figure 3: Top: power law fits (with the correction to Limber’s approximation at large scales) to the measured angular correlation functions at three redshifts, each over the same range in . Bottom: derived clustering strength, , for luminosity-binned and luminosity-limited samples. We also show alternative binning. The plotted luminosity value is the mean value of for the luminosity-binned samples, and the lower limit for the luminosity-limited samples. The clustering strength increases with for all three redshifts surveyed in a broadly linear manner.

4.1 Whole samples at well-defined redshifts

We have derived angular correlation functions for large samples of emitters at each redshift and fitted these with power-law models (see Figure 3). The exact luminosity ranges of these samples, given in Table 4, are chosen to compare similar samples at each redshift, and span the same range in : \textcolorblack(albeit with non-matched distributions within this range). The fits shown are those described in Section 3.2, with a power-law of fixed gradient for the spatial correlation function, leading to a slope of in the angular correlation function on small scales and the correction to Limber’s approximation at large scales where the angular separation is much greater than the separation along the line of sight. This parameterisation is sufficient to derive indicative clustering strengths. However, the correlation functions of all three samples do show clear departures from the traditional power-law relation fitted here. At angular scales of order 10s of arcseconds the power-law fit consistently overestimates the observed , indicative of a dominant contribution from a separate 1-halo term at small angular separations. We explore this further in Section 5.

4.2 Clustering strength as a function of galaxy H luminosity

We have fitted both luminosity-binned data and luminosity-limited data with the same power-law models (see Table 4 \textcolorblackand Appendix A). As shown in the lower panels of Figure 3, the clustering strength, , increases roughly linearly with galaxy luminosity for the luminosity-binned samples, showing that more highly star-forming galaxies are more strongly clustered, and hence may live in more massive dark matter halo environments. The trends are similar for the luminosity-limited samples: these also show an increase in clustering strength with galaxy luminosity. The results for the two sample types do not agree exactly because luminosity-limited samples of galaxies with faint limits have their clustering increased by the inclusion of a small number of bright sources, and therefore have a greater clustering strength than that of galaxies entirely within a faint luminosity bin.
Although the absolute values of agree (within errors) with the previous HiZELS study of a smaller sample of emitters at , the apparently linear relationship is at odds with the results of Sobral et al. (2010), who found tentative hints of a more step-like behaviour around the characteristic luminosity. With our much larger sample of emitters, there is no longer evidence for a break in the vs relationship, and a linear relation provides a far better fit. The trends at and also show no clear departure from a simple linear trend, albeit that the results are noisier. \textcolorblackThese results are also broadly consistent with previous studies. We find

Redshift range/ Mean
‘Full’ samples:
0.8 41.72-42.42
1.47 42.16-42.86
2.23 42.47-43.17
Luminosity-selected subsamples
0.8 41.7-41.85
0.8 41.775-41.925
0.8 41.85-42.0
0.8 41.925-42.075
0.8 42.0-42.15
0.8 42.075-42.25
0.8 42.15-42.35
0.8 42.25-42.475
0.8 42.35-42.6
0.8 >41.775
0.8 >41.85
0.8 >41.925
0.8 >42.0
0.8 >42.075
0.8 >42.15
0.8 >42.25
0.8 >42.4
1.47 42.3-42.45
1.47 42.375-42.525
1.47 42.45-42.6
1.47 42.525-42.675
1.47 42.6-42.75
1.47 42.675-42.85
1.47 42.75-43.3
1.47 >42.2
1.47 >42.375
1.47 >42.45
1.47 >42.525
1.47 >42.6
1.47 >42.675
1.47 >42.75
2.23 42.2-42.5
2.23 42.35-42.6
2.23 42.5-42.7
2.23 42.6-42.8
2.23 42.7-42.9
2.23 42.8-43.0
2.23 42.9-43.6
2.23 >42.2
2.23 >42.3
2.23 >42.4
2.23 >42.5
2.23 >42.6
2.23 >42.7
2.23 >42.8
2.23 >42.9
Table 4: values and key parameters derived from HOD fitting, for samples of emitters at different redshifts and luminosities. We find a clear trend towards increasing , , and for samples of galaxies with higher luminosities at all redshifts, but little evidence for changing satellite fractions for these SFR-selected samples.

for our sample at , while Kashino et al. (2017) obtain for emitters at . We find for the full sample at , which is slightly higher than Geach et al. (2012) found using a smaller sample at the same redshift (), but this depends critically on the luminosity range studied.
\textcolorblackIn Figure 4, we show the luminosity-dependent clustering of HiZELS emitters split into two observed K-band magnitude bins. Observed K-band magnitude is believed to be a rough proxy for galaxy stellar mass. We find that the clustering strength increases broadly linearly with within each of the broad K-band magnitude bins, and that this trend is much larger than any differences between the two K-band magnitude bins. We will explore the stellar mass-dependence of the clustering of star-forming galaxies more thoroughly in a subsequent paper, but we stress here that the strong trends of clustering strength with luminosity presented in this paper are not driven primarily by galaxy stellar mass.

Figure 4: \textcolorblackTo investigate whether trends with are driven by stellar mass, we plot against for observed K-band magnitude-selected subsamples of the HiZELS emitters. We find that the strong trends of clustering strength with luminosity hold for these subsamples. This indicates that trends with are not driven primarily by stellar mass.
Figure 5: Left: \textcolorblackHalo Occupation Model fit to the correlation function of the whole sample \textcolorblackusing HALOMOD. This multi-parameter model provides a better fit to data than the single power law model and shows the separate contributions of satellite and central galaxies. Right: the best-fitting halo occupation distribution model. \textcolorblackThe contribution from satellite galaxies becomes significant only in halos more massive than .

5 Modelling galaxy populations via Halo Occupation Distribution fitting

The Halo Occupation Distribution (HOD) formalism extends dark matter halo models to galaxy populations: given a set of input parameters, we can predict the average number of galaxies of a certain type as a function of dark matter halo mass, . A combination of a cosmological model and an HOD enables us to predict any clustering statistic on any scale; usually observations of galaxy clustering (or weak lensing) are then used to constrain cosmological or galaxy evolution models. Here, HOD modelling enables us to estimate typical host halo masses for HiZELS galaxies. We can also do better than the straight-line fit; HOD fitting takes into account the small dip observed on angular scales of order 10s of arcseconds, below which the clustering is dominated by correlations between galaxies within a single dark matter halo. We can now include the effects of the satellite galaxy population on the observed clustering, no longer assuming that a power-law relationship holds on the smallest scales.
A number of different halo occupation parameterisations have been used to fit 2-point galaxy correlation functions. Typically, 3 or 5-parameter fits of Zehavi et al. (2005) and Zheng et al. (2005) are used. While these do well for stellar mass-selected samples (e.g. Wake et al. 2011; Hatfield et al. 2016), they \textcolorblackmay not be suitable for our sample. As noted by Contreras et al. (2013), HODs for stellar-mass selected samples are very different to the HODs of SFR or cold gas mass selected samples. In particular, HODs for mass-selected samples sensibly assume that above a given halo mass, all halos contain a central galaxy. However, in not all cases does this central galaxy fall within a star-formation rate or cold gas selected sample (e.g. due to the suppression of gas cooling in high mass halos via AGN feedback), so for star-formation rate limited samples the HOD for central galaxies may be peaked rather than a step function (Contreras et al., 2013).

5.1 An 8-parameter HOD model

Studying the clustering of emitters at , Geach et al. (2012) developed an 8-parameter model suitable for star-formation selected samples via comparison to the predictions of the semi-analytic model GALFORM (Cole et al., 2000; Bower et al., 2006). In this parameterisation, the mean numbers of central3 and satellite galaxies in a halo of mass M are given by:


The key parameters are:

  • : the halo mass at which the probability of hosting a central galaxy peaks.

  • : the width of the Gaussian distribution of centrals around its peak, .

  • : the threshold halo mass for satellite galaxies, above which the distribution follows a power law .

  • : characterises the width of the transition to around .

  • : the slope of the power law for in halos with .

  • : normalisation factors, in range [0,1].

  • : the mean number of satellite galaxies per halo, at .

Geach et al. (2012) did not have large enough samples to fit all 8 parameters simultaneously, so fixed the following parameters:

  • . The minimum mass halo hosting a satellite galaxy is the mass at which the central HOD peaks.

  • . The smoothing of the low-mass cut-off for satellite galaxies is not critical, as satellites in low mass halos contribute little to the overall HOD.

  • . This is consistent with the literature for mass-selected samples.

The total number of galaxies is given by:


Some implementations use , requiring a central for every satellite galaxy. Given that our sample is essentially star-formation rate limited, some of our galaxies could be star-forming satellites around less highly star-forming centrals which are not included in our sample. Therefore we do not impose this condition.
We have performed a number of tests with different HOD parameterisations (e.g. allowing to vary, fitting a full 8-parameter model), and confirm that neither the reproduction of the correlation function nor the values of the derived parameters are dependent on our choice (see Appendix B.1). We base our parameterisation on the 5-parameter model of Geach et al. (2012), but truncate the halo occupation sharply at , allowing only halos more massive than this to host galaxies. As detailed in Appendix B.2, we have found that allowing the HOD to reach lower halo masses results in values of which are strongly dependent on the lower limit of the HOD integral, and which are poorly constrained. is now the minimum mass of halo hosting central galaxies, and, due to the shape of the halo mass function, also the most common host halo mass. Reassuringly, all other derived parameters are robust against the choice of lower limit.

[“unif”, 10, 13.0, 11.5] [“unif”, 0.001, 1.0, 0.01] [“unif”, 0.001, 1.0, 0.9] [“unif”, 0.001, 1.0, 0.4] [“log”, 0.05, 1.0, 0.5]
Table 5: Fitted HOD parameters, with MCMC priors used (form, minimum, maximum, starting point). We show here the derived parameters for the large samples of galaxies within a fixed range at each redshift. is the minimum mass halo hosting a galaxy, determines the number of satellite galaxies per halo, are normalisation factors, and is the width of the Gaussian distribution of centrals around its peak, .

5.2 Physical parameters from HOD models

When fitting the models to data, we use the observed number density of galaxies as a constraint. For a given output from the halo model, the predicted number density of galaxies is:


where n(M) is the halo mass function. Here we use that of Tinker et al. (2010). The observed number density of galaxies used is the integral of the luminosity function between the same limits used to select the real and random galaxy sample (using the luminosity functions derived by Sobral et al. (2013); Sobral et al. (2015) for the same data). We assume a error on the number density in the fitting.

For each set of HOD parameters, we may derive a number of parameters of interest for galaxy evolution. The satellite fraction is:


with the corresponding central fraction .
The effective halo mass, the typical mass of galaxy host halo is:


The average effective bias factor, which characterises the clustering of galaxies relative to dark matter, is:


where is the halo bias, a function of halo mass M. We use from Tinker et al. (2010).

5.3 Fitting HOD models to HiZELS -emitting galaxies

We use the HMF (Murray et al., 2013) and HALOMOD codes (Murray, in prep.) to fit HOD models to the correlation functions. These take an HOD parameterisation and construct real-space correlation functions for a range of parameter inputs. For each set of parameter inputs, we compare the projection of the modelled real-space correlation function with that observed, and calculate the log likelihood. We use emcee (Foreman-Mackey et al., 2013), a fast python implementation of an affine-invariant Markov Chain Monte Carlo (MCMC) ensemble sampler, to sample the parameter space of our 5 fitted parameters and optimise the fit to the correlation function. As discussed, we fit the number density of galaxies in the log-likelihood fitting as a further constraint. We use 500 walkers, each with 1000 steps.
We present examples of the best-fit modelled correlation function and its HOD occupation, decomposed into the central and satellite galaxy terms, in Figure 5. The parameterisation, shown here for a correlation function constructed using the full sample of galaxies at , provides a good fit to the data, and clearly shows the separate contributions of the clustering within a single halo and between dark matter halos.
For each correlation function to which an HOD model is fitted, we estimate the following parameters: , , , . We take the 50th, 16th and 84th percentiles of the posterior distribution of each of these derived parameters, to obtain an estimate of the median and associated errors. The individual HOD input parameters , \textcolorblackand , tend to be individually less well constrained due to correlations between them. In Table 5, we present the five HOD parameters fitted to the correlation functions of large samples of galaxies within a fixed range at each redshift. In Appendix B.3, we show an example of the MCMC output for one of our HOD fits.
\textcolorblackThe selection of galaxies within a fixed range, as in Section 4.1 (see Table 4), allows the comparison of similar galaxies across cosmic time. Interestingly, the derived galaxy occupations as a function of halo mass are similar, consistent within their errors (see Figure 6). Although the distributions are not exactly the same across the different redshift ranges, we deduce from this that samples of galaxies selected from HiZELS at similar trace similar dark matter halos across redshift. Intrigued by this, we compare galaxies within narrower bins in Section 5.4.

Figure 6: HOD parameterisations of samples of galaxies at , and , within fixed ranges of line up closely. \textcolorblackAlthough the distributions are not exactly the same across the different redshift ranges, galaxies selected at similar seem to trace similar dark matter halos across redshift.
Figure 7: Fitted halo occupation distributions for luminosity-binned (left) and luminosity-limited (right) samples at . Higher luminosity emitters occupy higher mass dark matter halos. \textcolorblackOur results are qualitatively consistent between the luminosity-binned and luminosity-limited samples, but trends are cleaner for the luminosity-limited samples, which are larger.
Figure 8: Derived properties of galaxy populations of HiZELS galaxies binned by luminosity. We find a linear, broadly redshift-independent relationship between halo mass and luminosity. As in Figure 3, the paler colours denote alternative binning. The lines of best fit derived in Section 5.4 are overplotted: