Halo models of HI selected galaxies

Halo models of HI selected galaxies

Niladri Paul, Tirthankar Roy Choudhury & Aseem Paranjape
Inter-University Centre for Astronomy and Astrophysics, Ganeshkhind, Post Bag 4, Pune 411007, India.
National Centre for Radio Astrophysics, Tata Institute of Fundamental Research, Ganeshkhind, Post Bag 3, Pune 411007, India.
E-mail: npaul@iucaa.inE-mail: tirth@ncra.tifr.res.inE-mail: aseem@iucaa.in

Modelling the distribution of neutral hydrogen (Hi) in dark matter halos is important for studying galaxy evolution in the cosmological context. We compute the abundance and clustering properties of Hi-selected galaxies using halo models constrained by data from the ALFALFA survey. We apply an MCMC-based statistical analysis to constrain the model parameters through two different approaches. In the first approach, we describe the Hi content of galaxies in dark matter halos directly, using a halo occupation distribution (HOD) for the number counts of Hi galaxies. We find that a significant number of low mass () galaxies must be satellites. In the second, more novel approach, we infer the Hi-dark matter connection at the massive end () indirectly, using optical properties of low-redshift galaxies as an intermediary. In particular, we use a previously calibrated optical HOD describing the luminosity- and colour-dependent clustering of SDSS galaxies and describe the Hi content using a statistical scaling relation between the optical properties and Hi mass. The resulting best-fit scaling relation identifies massive Hi galaxies primarily with optically faint blue centrals, consistent with expectations from galaxy formation models. Our results will be useful in making forecasts for future observations of Hi galaxies with upcoming radio telescopes like the SKA, as well as in exploring synergies between SKA and optical surveys such as Euclid and LSST.

galaxies: formation – cosmology: large-scale structure of the Universe – methods: numerical, analytical
pagerange: Halo models of HI selected galaxiesC

1 Introduction

In the current understanding of Lambda-cold dark matter (CDM) cosmology, the large scale structure of the universe forms in a hierarchical manner driven by the collisionless cold dark matter. The gravitational instability leads to formation of high density collapsed objects (or haloes) that are decoupled from the cosmic expansion. The baryons, being less abundant, follow the dark matter distribution and settle in the gravitational potential well formed by the dark matter haloes and eventually form stars and galaxies (Mo et al., 2010). Perhaps the most effective observational probes of structure formation, particularly at low redshifts (say ), are related to the stars and the interstellar medium of these galaxies. For example, the environmental dependence of star formation in galaxies can be probed using optical and UV (ultra-violet) luminosities through large surveys like the 2dF Galaxy Redshift Survey (2dFGRS) (Norberg et al., 2001; Lewis et al., 2002) and Sloan Digital Sky Survey (SDSS) (Baldry et al., 2004; Zehavi et al., 2005; Skibba & Sheth, 2009). Alternatively, one can also probe the low-density gas residing in the interstellar medium using atomic and molecular lines, one of the most prominent lines being the 21 cm transition of neutral hydrogen (Hi) (Draine, 2011). At low redshifts, this line is detected in emission through surveys like the HI Parkes All-Sky Survey (HIPASS) (Zwaan et al., 2003; Zwaan et al., 2005) and Arecibo Legacy Fast ALFA (ALFALFA) (Giovanelli et al., 2007) which can provide us with a robust estimate of the Hi mass of the galaxy. The cold gas traced by the Hi 21 cm line can act as a natural source for star formation and thus one expects some correlation between the Hi and optical properties of the galaxies. Such correlations have been explored by cross-matching the Hi galaxy catalogues with samples of optical and ultra-violet selected galaxies and studying physically interesting quantities such as the Hi-to-stellar mass ratio (Catinella et al., 2010; Maddox et al., 2015).

On the theoretical front, the formation and evolution of dark matter haloes are reasonably well understood, thanks to both -body simulations and analytical calculations. Since these dark matter haloes cannot be probed directly via observations, one needs to include some basic understanding of the galaxy formation process for connecting the theoretical calculations to the observational data. Unfortunately, the theoretical modelling of galaxy formation consists of various complex astrophysical processes, many of which are relatively poorly understood. These include, e.g., gas cooling, fragmentation, formation of molecular gas, star formation, feedback processes etc. These processes are highly non-linear and cover a wide range of length and temporal scales. Hence it is almost impossible to model them self-consistently in analytical calculations or numerical simulations. The most common approach in modelling the galaxy formation has been to make approximations in the calculations by introducing a number of free parameters. These parameters can subsequently be constrained by matching with observations over a wide range of wavebands, thus gaining insights into the physical processes involved (see Somerville & Davé, 2015, for a review).

An alternate, statistical approach in connecting the observations of galaxies to the dark matter haloes is to use a phenomenological prescription to populate the haloes with galaxies. These prescriptions consist of free parameters which are constrained using observations. These include, e.g., the Halo Occupation Distribution (HOD; see, e.g., Cooray & Sheth, 2002, for a review) and SubHalo Abundance Matching (SHAM) based approaches (see, e.g., Reddick et al., 2013; Hearin et al., 2013), both of which have been widely used while studying the optical/UV-selected galaxies. The main advantage of these methods is that they do not require any assumptions about the astrophysical processes (say, cooling or feedback) inside the halo, rather the physical insights can be inferred indirectly once the model parameters are reliably constrained (Kravtsov et al., 2004; Zheng et al., 2005).

Since the number of Hi-selected galaxies has become quite substantial in recent years, it is worth exploring whether such statistical models can be naturally extended to understand the Hi properties of galaxies (Wyithe et al., 2009; Wyithe & Brown, 2010; Padmanabhan et al., 2016; Padmanabhan & Refregier, 2017; Padmanabhan et al., 2017; Padmanabhan & Kulkarni, 2017). Studies along these lines have been attempted recently by Guo et al. (2017) who have found that a SHAM-based model can match both the mass function and two-point correlation of the Hi galaxies.

In this paper, we perform an independent investigation of HOD-based models of the Hi distribution, using the abundance and clustering data of resolved Hi galaxies at low redshifts. Our analysis, based on the Markov chain Monte Carlo (MCMC) method, consists of two separate approaches: (a) In the first case, we use an HOD-based prescription to model the observed properties of the Hi-selected galaxies, exactly similar to that done for the optical SDSS galaxies by Zehavi et al. (2011). This will allow us to explore the relation between Hi content and the host dark matter halo mass. (b) In the second method, we use a novel treatment where we combine pre-calibrated optical HODs – these describe the luminosity- and colour-dependent clustering of SDSS galaxies (Zehavi et al., 2011; Skibba & Sheth, 2009) – with a modelled scaling relation between the optical properties and Hi mass of the galaxies. The free parameters of the scaling relation can be constrained using observations of Hi galaxies, thus allowing us to infer the correlation between the optical properties and Hi content. Note that this analysis does not require the Hi and optical samples to be necessarily matched and thus can contain larger number of objects. One can, of course, use the results of the matched samples as additional constraints in our analysis which may further reduce the uncertainties in the values of the free parameters.

The article is organized as follows. We describe the data set used in this work in section 2. In section 3, we present the results of our HOD-based analysis. This is followed by the analysis based on scaling relation between the Hi mass and optical properties of the galaxies in section 4. We present a detailed discussion of our results in section 5, and conclude in section 6. Some of the details of our calculations are given in the Appendices.

Throughout this article, we have chosen a flat CDM cosmology with matter density parameter , baryon density parameter , Hubble parameter with , primordial r.m.s. density fluctuations at a scale of , and an inflationary spectral index, , consistent with the 5-year results of the Wilkinson Microwave Anisotropy Probe experiment (Komatsu et al., 2009), and also with the values assumed by Zehavi et al. (2011) in calibrating their HOD, which we will use in our scaling analysis. Wherever needed, we have used the Eisenstein & Hu (1999) fitting function for the linear theory matter power spectrum. We will quote halo masses in , neutral hydrogen masses in and stellar masses in . Galaxy luminosities are quoted in terms of , where is SDSS -band absolute magnitude, K-corrected and evolution corrected to (Blanton et al., 2003).

2 Data set

In this work we use the abundance and clustering data of individually resolved galaxies based on their Hi mass from ALFALFA survey (Papastergis et al., 2013) to constrain our halo model. The sample consists of Hi-selected galaxies from the catalogue of the ALFALFA survey (Giovanelli et al., 2007). In particular, we use the projected correlation function for samples defined using six different thresholds of neutral hydrogen mass: with .

While computing the likelihood for our analysis, we use the full covariance error matrix which was obtained by bootstrapping the data sample (Papastergis et al., 2013). Clustering data and bootstrap covariance matrices were available for projected separations between - Mpc. For three of the thresholds – , and – the quality of the covariance matrix for a few values of separation was sub-optimal and led to non-converged MCMC chains. In these cases, we have excluded the correlation function measurements at these separations, along with the corresponding row and column for each in the covariance matrix. We have indicated the results affected by this choice below.

For the sake of brevity, we will display results below for the thresholds and . The summary of all other results will be given in a Table and a summary plot.

3 HOD analysis

Our first approach for studying the Hi content in galaxies is based on populating the dark matter haloes using the HOD approach.

A simplifying assumption in this approach is that the galaxy distribution is determined only by the halo mass and no other property of the halo (Berlind & Weinberg, 2002). In principle, the method can be extended to account for the dependence on other halo properties as well (Cooray & Sheth, 2002; Wechsler et al., 2006). In this work, however, we will focus on the simple “halo mass only” parametrisation and comment on possible extensions (see, e.g., Paranjape et al., 2015; Zu & Mandelbaum, 2015; Hearin et al., 2016) towards the end. We will adopt the standard convention and assume a “central” galaxy living at the center-of-mass of the halo, along with other “satellites” distributed in a sphere around the central. Since we will focus on the projected 2-point clustering signal below, we ignore velocity offsets between galaxies and the dark matter potential well which can be important in modelling redshift space clustering (Guo et al., 2015).

We parametrize the fraction of -halos (i.e. halos having mass in the range and ) which contain a central galaxy having neutral hydrogen mass more than as


where is the maximum value of the fraction of haloes hosting a central, is the cutoff mass where the halo fraction is and is the width of the cutoff profile. Note that the above parametrization implies when , i.e., only a fraction of large mass haloes are guaranteed to host a central. This form is very similar to the most common parametrization used while modelling the optical galaxies (Zheng et al., 2007; Zehavi et al., 2011), the only difference being the additional parameter . This parameter is included to allow for the possibility that a fraction of haloes may not contain any Hi galaxies at all. Note that the parameter is constrained to have values on the range .

For the satellites we assume that their number in a halo follows a Poisson distribution with a mean that depends on halo mass in the form of a power law. The clustering data of Hi-selected galaxies we use in this work (see below) is not sufficiently constraining to explore deviations from these assumptions, such as those described by van den Bosch et al. (2013), for example. The mean number of satellites in -halos having Hi mass more than is assumed to be of the form


where is the mean number of satellites in haloes of mass and is the power-law index. Finally, throughout the analysis we impose the numerical restriction and never consider halos smaller than the smallest Hi content of the galaxies they are expected to host.

Figure 1: This plot shows the 1, 1.5 and 2- parameter contours, and the corresponding marginalized 1-D distributions, of our parametric HOD (c.f. equations 1 and 2) for . The values of are in . The orange horizontal and vertical lines denote the best-fit values of the parameters. The values above the 1-D distributions indicate the median and range (i.e., the and percentiles) of the respective parameter.

We thus have five free parameters in our model , , and . Having populated the haloes with galaxies in this manner, we can compute their mass function and two-point correlation function in the standard halo model framework (Cooray & Sheth, 2002; Berlind & Weinberg, 2002; Zehavi et al., 2005) which we briefly describe in Appendix A.1. The computed Hi mass function and correlation function can then be matched with observation and the free parameters in our model are constrained with a Bayesian MCMC analysis using the package EMCEE (Foreman-Mackey et al., 2013).

3.1 Constraints on HOD parameters

The lowest threshold contains all the Hi-selected galaxies in the sample. The likelihood contours for the four parameters for this threshold are shown in Figure 1. We see that the parameter is consistent with being unity, i.e., almost all large mass haloes are guaranteed to host a galaxy having Hi mass . The most interesting conclusion drawn from this figure is that the value of is inconsistent with zero with a high significance (). This implies that the current data requires Hi to be distributed in satellites and simple prescriptions where dark matter haloes are assumed to contain not more than one Hi galaxy are ruled out.

Figure 2: Projected correlation function (see Appendix A.1) for the threshold . The red points with error bars show the projected correlation function measured from the ALFALFA survey (Papastergis et al., 2013). The open circle indicates a data point that had to be excluded from the analysis; see text for a discussion. The black solid curve shows the correlation function computed using our best-fit parametric HOD. The long-dashed (short-dashed) curve shows the -halo (-halo) term separately. The reduced Chi-squared dof and -value of the fit are quoted in the labels (see also Table 1). The inset shows , the average number of galaxies per unit comoving volume for this threshold; the red point with error bar shows measurement from ALFALFA survey (Papastergis et al., 2013) and the black circle shows the abundance as computed from our best-fit model. Since Papastergis et al. (2013) do not quote any uncertainty on the abundance data, we used binned measurements from Martin et al. (2010) to compute the error on the thresholded abundances assuming independent Poisson errors.
8.0 44.21/18
8.5 15.71/18
9.5 14.06/18
0 0 7.65/8
Table 1: Summary of results for the parametric HOD analysis based on equations (1) and (2). The first column lists the threshold of neutral hydrogen masses for which the analysis is carried out. The last column quotes the total Chi-squared and number of degrees of freedom (dof) in the problem. The other five columns quote the parameter values; the best-fit values are quoted outside the parentheses while the numbers inside parentheses are the median value with range (i.e., the and percentiles) for each parameter. Rows marked with an asterisk on the  threshold indicate cases where we excluded some data due to the quality of the covariance matrix (see section 2). For and , we excluded the single data point with  Mpc (marked as the open circle in Figure 2), while for we excluded all data points having  Mpc and performed a 3-parameter fit assuming zero satellites. We find good fits for all thresholds except ; see text for a discussion.

The reason why satellites are necessary to match the data can be understood from Figure 2 where we show the projected correlation and abundance of Hi galaxies (see Appendix A.1) for the model with best-fit parameter values. For , we show the contributions from the -halo and -halo terms separately. It is clear that the correlation at large separations Mpc is consistent with the -halo term. However, the data points at the smaller scales require invoking the -halo contribution which in turn leads to the necessary requirement of satellites. In order to check whether this conclusion is an artefact of the HOD parametrization chosen, we carried out a separate MCMC analysis for models without satellites, i.e., by fixing the two parameters . In that case, we found that the best-fit model has a , significantly worse than the best-fit model with satellites included for which (see Table 1). We also note that, for this threshold, the data point at  Mpc (indicated by the open circle in Figure 2) was excluded along with the corresponding row and column of the error covariance matrix (see section 2). To ensure that our conclusion of non-zero satellite number is not then driven by the single data point at  Mpc, we also repeated the analysis additionally excluding this data point as well (i.e., excluding both data points with Mpc). Although the constraints are understandably weaker in this case, we still find with a best-fit value of .

Figure 3: Mean number of galaxies per halo in halos of mass for the threshold as computed from our best-fit parametric HOD model. The dashed red (dotted blue) line shows the abundances of central (satellite) galaxies as a function of halo mass, and the solid black curve shows the sum of these two. The dotted black line shows the total galaxy abundance using the median values of our parameter chains, and the green band shows the error around the median. We see that there is a substantial number of satellite galaxies contributing neutral hydrogen for this threshold and their number increases with halo mass.

Further insights on the Hi-hosting galaxies can be gained from Figure 3 where we plot the mean number of Hi galaxies per halo as a function of halo mass. We can see that the galaxies are expected to reside in haloes of mass . The mean central fraction saturates to almost unity at (determined by the value of ) while the satellite number increases with halo mass.

We repeat the analysis for other Hi mass thresholds, the results of which are summarized in Table 1. From the values of the reduced Chi-squared for the best-fit model (last column), we find that our model provides acceptable fits to the data for all thresholds except for the case of . The in this case is , considerably larger than the other cases. We suspect that this may be related to the non-convergence of the error covariance matrix of the observational data obtained by bootstrap method. For the subsequent discussions, we will thus ignore this threshold.

The next point to be noted from the Table is that the median value of decreases to for the higher thresholds. This implies that only half of the large mass haloes can contain centrals of Hi mass . We also find that the the necessity of including satellites in the model holds for the and cases as well, as can be seen from the constraints on . Interestingly, the number of satellites drops significantly for the next threshold , where we find that the value of is consistent with zero within and the best-fit value is also small. To understand the reason, we plot for this threshold in Figure 4 showing the contributions from the -halo and -halo terms separately. It is clear from the plot that all the data points, including the ones at smaller separations, are well-fit by the -halo term alone. This indicates that the presence of any significant number of satellites is not necessary to explain the high mass Hi galaxies. The mean galaxy number per halo for this Hi threshold as a function of halo mass is shown in Figure 5. It is again obvious that the data does not require many satellites to be present except at low halo masses .

Figure 4: Same as Figure 2 but for the threshold .

For the highest mass threshold , we find that the five-parameter model leads to non-converging MCMC chains. Given the results for the previous threshold, we anticipate that high mass Hi galaxies are unlikely to be related to the satellites. Hence we set , and fit for only three parameters , and . The chains converge satisfactorily in this case and we obtain a good fit to the data, as can be seen from the last row of Table 1. 111The reader may have noticed that, once we set , the 2-halo term depends only on the combination at large scales (see equation (9) in Appendix A). Strictly speaking, there is a strong degeneracy between and . We therefore performed a test in which we fixed and varied . We found that, although we achieve a formally good fit with , the model is forced towards unphysically large values of () accompanied by large values of . In other words, the model tries to compensate for the fact that in this case, by placing galaxies in very large halos with a large scatter, thus mimicking in the range . Something similar happens if both and are allowed to vary; the degeneracy between and again forces the model into unphysical regimes of . We thus conclude that a model with , with allowed to vary, is in fact the natural and self-consistent choice for this threshold.

We have therefore demonstrated that a simple halo mass-based HOD can in fact produce a reasonable description of the data. We should mention here that Guo et al. (2017) have attempted a similar HOD-based analysis and found that it leads to results that are physically not justifiable (e.g., they found Hi-to-halo mass ratios larger than unity in some cases). It is not straightforward to identify the precise reason why our conclusions differ with theirs. One possible reason could be that the clustering measurements we use are weighted by inverse number density (Papastergis et al., 2013), while Guo et al. (2017) have used pairwise inverse weights. In Appendix C we study the Hi-to-halo mass relation implied by our HOD analysis in some detail, showing that our best-fit HODs place physically reasonable amounts of Hi in the vast majority of Hi-occupied halos. We will return to a discussion of the implications of our results in section 5, where we will also compare them with the results of another, more novel, analysis described in the next section.

Figure 5: Same as Figure 3 but for the threshold . Additionally, the brown curves with orange band are formatted similarly to the black curves and green band, and show the HOD inferred from the scaling relation analysis in section 4. For this threshold, the Hi contribution in the best-fit parametric HOD comes mainly from the central galaxies. The scaling relation analysis, on the other hand, assumed zero satellites at the outset. See text for a discussion.

4 Scaling between HI and optical properties

In this section, we discuss a different approach to understand the relation between Hi content and the halo mass, by additionally exploring the relation between Hi and optical properties of the galaxies. To this end, we make use of existing halo model calibrations for the luminosity- (Zehavi et al., 2011) and colour-dependent clustering (Skibba & Sheth, 2009; Paranjape et al., 2015; Pahwa & Paranjape, 2017) of optical SDSS galaxies and prescribe a scaling relation between the optical and Hi properties. We focus on the massive end of the Hi galaxy population which is expected to have substantial overlap with the optically faint end of the galaxy population observed by SDSS. Indeed, cross-matching analyses have found large numbers of galaxies that can be paired between ALFALFA, SDSS and GALEX (Catinella et al., 2010; Toribio et al., 2011; Maddox et al., 2015). Our approach to connecting the optical properties with Hi content, however, is statistical, inspired by similar analyses routinely performed in the galaxy cluster community.

4.1 Setup

We assume that Hi mass obeys a scaling relation in terms of optical luminosity and colour , defined by a mean relation and a scatter around this mean. For definiteness, we assume that the distribution of  at fixed luminosity and colour is Lognormal, so that this mean and width describe the full distribution. We then tie together this scaling with colour-luminosity relations and an optical HOD calibrated by previous analyses, to produce a model for the Hi mass function and 2-point correlation function. Schematically, we write a conditional Hi-mass function in terms of a conditional luminosity-function , colour-luminosity distribution and the assumed Hi-optical scaling distribution as


which is then used to build a halo model of Hi clustering. As in the HOD analysis of section 3, we impose the numerical restriction and never consider halos smaller than the smallest Hi content of the galaxies they are expected to host.

In principle, the optical (and also Hi) properties of centrals and satellites should be modelled separately (Skibba & Sheth, 2009; Zehavi et al., 2011). We argue below, however, that our analysis must be restricted to the massive end of the Hi mass function, which allows us to set the number of satellites to zero and only focus on the scaling relation for centrals. This restriction could be relaxed in the future by modelling the clustering of optically faint galaxies. Appendix A.2 gives the details of our implementation, and also discusses the motivation behind some of our technical choices (e.g., we assume that the colour-luminosity relation and Hi-optical scaling are independent of halo mass ).

After exploring various choices for the functional dependence of the mean relation on luminosity and colour, we finally settled on


where we use the SDSS -band absolute magnitude in place of luminosity, and set to be the colour. is a luminosity threshold that we discuss later. The sign convention for and is such that implies a positive correlation between Hi mass and luminosity and places more Hi in blue galaxies.

The motivation for the complicated-looking logarithm in equation (4) involving colour is the fact that the quantity inside the logarithm gives a good description of the stellar mass-to-light ratio of SDSS galaxies (Wang & White, 2012; Paranjape et al., 2015), so that


(see equation 7 of Paranjape et al., 2015) where and is the -band absolute magnitude of the Sun. It follows that setting would enforce a power-law scaling between Hi mass and stellar mass, . The model in equation (4) is flexible enough to allow for deviations from a strict dependence of Hi mass on stellar mass, while still stable enough to the dynamic range of optical colours. (Other choices such as linear or quadratic scalings with colour led to unstable results with non-convergent Monte Carlo chains.)

We then use the MCMC method to constrain the four free parameters that define the scaling, i.e., , , and . The Hi galaxy clustering data we use for obtaining the constraints is essentially the same as the one used in the previous sections. For the Hi mass function, we use the binned results given in Figure  of Martin et al. (2010). As compared to the results of Papastergis et al. (2013), this gives us more leverage in the MCMC analysis by probing the shape of the high-mass Hi mass function in some detail. We assume independent errors for these binned measurements.

4.2 Sample completeness

Since we are relying on an optical HOD calibrated by Zehavi et al. (2011), we are restricted to using the luminosity range explored by those authors in their clustering measurements and HOD analysis, which was . Our luminosity threshold in all integrals is therefore set to . Consequently, our analysis must also be restricted to the massive end of the Hi distribution, since we expect at least some positive correlation between luminosity and Hi mass. Exactly which threshold of  we can reliably probe, however, cannot be decided a priori since the scaling between Hi mass and optical properties that would give us this information is the very quantity we are trying to constrain. To break this circularity and produce stable and converged results, we proceed as follows.

Let us assume that the true scaling is unique, i.e., independent of the  threshold. In the following we focus on the two highest  thresholds we have measurements for, namely and . The faintest luminosity we can access is , and we will build the argument below using this and one additional brighter threshold, which we choose to be . The argument is conceptually insensitive to the value of the second threshold, and can also accommodate a larger number of thresholds brighter than .

The following possibilities then exist:

  • Case I: Galaxies with all correspond to , and galaxies with correspond to .

  • Case II: Galaxies with all correspond to , but a substantial number of galaxies with correspond to , i.e., very faint SDSS galaxies.

  • Case III: A substantial number of galaxies with correspond to .

This exhausts all relevant possibilities for the correlation between Hi mass and luminosity in this example.

9.5 28.81/26
Table 2: Summary of results for the scaling relation analysis based on equation (4) (see also Appendix A.2). The columns refer to the  threshold used in the analysis, followed by the best-fit (outside parentheses) and median values (in parentheses) of the four parameters , , , , and finally the total Chi-squared and number of degrees of freedom (dof) in the problem. The parameter has units of . See text for a discussion.
Figure 6: Same as Figure 1, showing results for the four parameters in our scaling relation analysis (equation 4; see also Appendix A.2) obtained for the threshold .

To test which of these cases is consistent with the data, consider performing MCMC analyses using the following cuts in Hi mass and luminosity: (i) {, }, (ii) {, }, (iii) {, }, (iv) {, }. If Case I holds, then (i) and (ii) will give consistent results, (iii) will give biased results, but (iv) will give results consistent with (i) and (ii). If Case II holds, then (i) and (ii) will be consistent with each other, but neither (iii) nor (iv) will be consistent with these or with each other. If Case III holds, then (i) and (ii) will not be consistent with each other.

We performed all four analyses (i)-(iv) and found that the results are in fact consistent with Case I. We also checked that using as the second threshold instead of leads to similar conclusions. One can therefore safely say that most galaxies with correspond to luminosities . Below, we will therefore display results for , setting .

4.3 Excluding satellites

Finally, motivated by the HOD-based analysis of the previous section, we set the number of Hi-containing satellites in the model to zero for our chosen threshold of . We have also run a model where the number of satellites is taken to be a free parameter. It turns out that the best-fit parameter values in this case correspond to Hi being hosted in roughly equal parts in optically red and blue galaxies at the massive end, which in turn is inconsistent with the colour distributions of Hi-selected galaxies (Maddox et al., 2015). For completeness, we present the details of this analysis in Appendix B.

Figure 7: The Hi mass function. The cyan circles with error bars are measurements from the ALFALFA survey (Martin et al., 2010). The solid black curve shows the mass function computed using our best-fit scaling relation constrained using abundance and clustering data for . The turn over at the low Hi mass end is due to the incompleteness of the optical HOD based on SDSS data used in our scaling relation. The red (blue) dashed curve shows the mass function computed using the scaling relation for red (blue) galaxies. Our model predicts that blue galaxies dominate the Hi content for our chosen threshold.

4.4 Results

We find a good fit for our four-parameter model constrained by the Hi mass function and correlation function, with a Chi-squared/dof of (see Table 2 for the best-fit parameter values). The corresponding likelihood contours are shown in Figure 6. We see that the best-fit value of corresponds to a small but positive correlation between Hi mass and luminosity, and similarly the best-fit value of implies blue galaxies being preferred in Hi content. To better understand the characteristics of the best-fit model, we show the corresponding Hi mass function in Figure 7, with the contribution of the red and blue galaxies shown separately. Unsurprisingly, the mass function is dominated by the blue galaxies, with the red galaxies contributing only . The turn-over of the predicted mass function at small is due to the hard luminosity cut of imposed, as we mentioned before, by the HOD calibration of Zehavi et al. (2011).

Figure 8: Projected correlation function for the threshold . The red points with error bars (repeated from Figure 4) show the measured correlation function from the ALFALFA survey (Papastergis et al., 2013). The solid black curve shows the correlation function computed using the best-fit scaling relation parameters for all galaxies whereas the dotted red and blue curves show the same for red and blue galaxies, respectively.
Figure 9: Total galaxy number density, defined as the mean number of galaxies per halo multiplied by the halo mass function, as a function of halo mass. The curves and bands are the counterparts of the ones in Figure 5.

The corresponding predictions for the projected Hi correlation function for the best-fit model are shown in Figure 8. The clustering data, as we can see from the plot, do not require any contribution from the satellites (i.e., the -halo term is sufficient to explain the data). Our model predicts the clustering amplitude of Hi-hosting red galaxies to be slightly higher than the blue ones (c.f. the colour-dependent clustering of optical galaxies, Skibba & Sheth, 2009; Zehavi et al., 2011); the data, of course, do not distinguish between these.

The mean number of galaxies per halo as a function of halo mass – i.e., an inferred HOD – obtained from the scaling analysis was shown in Figure 5 as the brown curves with the orange band. Figure 9 shows an alternative representation of these results, where we have multiplied the respective HODs with the halo mass function (we use the fitting function from Tinker et al., 2008, for the latter; see Appendix A.1); the result is the total number density of galaxies per logarithmic halo mass interval, as a function of halo mass. We discuss these results further in section 5.

5 Discussion

In this section, we discuss some implications of our results, comparing the results obtained using the two different methods discussed in the previous two sections with each other and with results from the literature. We also highlight some of the caveats in our analysis and mention a few possibilities for future improvement of our technique.

5.1 Implications of scaling relation analysis

Our scaling relation analysis, which is built on the success of optical HODs calibrated using SDSS DR7 data, allows us to address some interesting questions which the more traditional parametric HOD analysis based only on ALFALFA data cannot.

For example, our choice of  scaling in equation (4) allows us to calculate the mean Hi mass density contributed by low-redshift massive galaxies with different luminosities and colours. This is shown in Figure 10, where the coloured region indicates the value of defined as the mean Hi mass per unit comoving volume (in units of ), per unit optical () colour, per unit -band absolute magnitude, as a function of optical properties. We have used the best-fit values for the scaling relation (Table 2) in making this plot, which clearly shows that our model places the bulk of Hi in SDSS galaxies with in faint blue centrals, with a small amount in red centrals as well.

Figure 10: Spatial density of  evaluated using our best-fit scaling relation, per unit colour, per unit -band magnitude , as a function of and . In particular, the colour indicates the value of , where and are the luminosity function and colour distribution at fixed luminosity, respectively, of central galaxies (see Appendix A.2). We see that most of the neutral hydrogen in SDSS galaxies with in our model resides in faint blue centrals except a small amount in red centrals. For comparison, the dashed line shows the relation which is often used as an empirical separator of red and blue galaxies in optical analyses (see, e.g., Skibba & Sheth, 2009).
Figure 11:  at fixed stellar mass predicted by our best-fit scaling relation, as a function of . The green points and the green band show the median and relation, respectively. The gray shaded area indicates values of that are affected by incompleteness due to the hard luminosity cut . For comparison, the black points with error bars show the mean and scatter of the measurements from Maddox et al. (2015, their Table 1); note that these were not used in constraining our model. Our model predicts a systematically smaller slope than is observed, although the scatter agrees well with that in the measurements.
Figure 12: This figure shows a summary of our analysis. The left panel shows the result for all galaxies (i.e., centrals + satellites) both from our parametric HOD (golden yellow markers) and scaling relation analysis (brown markers). We show this for two different ranges of halo mass (diamonds denoting the lower range and stars the higher range) and over the whole range of halo mass (open up-down triangle markers). The cyan circles denote the mass function from Martin et al. (2010). We see from this plot that almost all the contribution arises from the lower halo mass range. Therefore in the right panel we show the contribution from central (cyan inverted triangles) and satellite galaxies (green upward triangles) separately for the entire range of halo mass as found from our parametric halo model study. The points labelled ‘All galaxies’ are the same as the ‘Entire halo mass’ points from the left panel. In both the panels, the mass function computed from Papastergis et al. (2013) has been shown as open red squares joined by black lines.

The analysis above has been observationally motivated, with all variables and relations defined exclusively in terms of directly observable quantities such as Hi mass and optical absolute magnitudes. More physically, one might expect the amount of cold gas represented by  to be correlated with the stellar content of galaxies represented by the stellar mass . As discussed in section 4.1, a good description of the correlation between and optical observables is given by equation (5). We can, therefore, use our best-fit scaling relation to calculate the mean Hi mass in galaxies containing a fixed amount of stellar mass. This Hi-to-stellar mass relation has been well-explored in the literature, both observationally (see, e.g., Catinella et al., 2010; Toribio et al., 2011; Maddox et al., 2015) and theoretically (see, e.g., Davé et al., 2013; Popping et al., 2015). Figure 11 shows our best-fit prediction for this relation, compared with observational constraints from Maddox et al. (2015). The gray shaded area indicates values of that are affected by incompleteness due to the hard luminosity cut , calculated by comparing the stellar mass function predicted by the optical HOD (Appendix A.2) and equation (5) with Schechter-function fits to the measured SDSS stellar mass function from Peng et al. (2012). We see that our model systematically underpredicts the mean Hi content of massive galaxies.222In detail, our model prediction can be understood as follows. Combining equations (4) and (5), we can write , where the constant depends on the model parameters but not on luminosity and stellar mass, and is therefore irrelevant for this argument. To obtain , we must average over the conditional luminosity function of central galaxies (recall that we have set the satellite number to zero in this model). Since there is a positive correlation between luminosity and stellar mass for central galaxies, and since the best-fit and are both positive, the two terms involving and compete with each other. The result is a near cancellation which leaves almost no dependence of  on in our best-fit model (Figure 11). It is interesting to note that the same effect which puts Hi in blue galaxies in our model, namely the positivity of , also contributes to making the - correlation less positive. Since the discrepancy occurs at high stellar masses which are not affected by incompleteness, we believe it may be related to systematic effects in the correlation function measurements, which we discuss below. The error bars on the Maddox et al. data show the measured scatter of the  distribution at fixed . Correspondingly, the error band on the green symbols shows the prediction for this scatter from our model; this agrees well with the measurements.

We emphasize, however, that although our scaling relation analysis is restricted to the high Hi mass end, it provides a new statistical method to connect the Hi distribution to optical properties of galaxies, which is complementary to direct observational analyses of matched galaxy samples. A novelty of our method is that, since it is a halo model by construction, it also provides a simultaneous theoretical connection of the Hi and optical properties with the underlying dark matter distribution. In section 6, we mention a few interesting avenues for future work along similar lines.

5.2 Comparison of parametric HOD and scaling relation results

We now turn to a comparison of the results obtained using our two techniques. In Figure 12, we show the binned Hi mass function for our best-fit models, with contributions from different halo mass ranges shown separately. The mass function measured by Papastergis et al. (2013) (computed as differences of the thresholded number counts) and by Martin et al. (2010) are shown for comparison. We find that for both the analyses (parametric HOD and scaling relation) the Hi galaxies are primarily hosted in haloes of masses . It is obvious that the HOD-based analysis probes mainly the low Hi mass range, while the scaling-based method is restricted to more massive () galaxies.

At this point, it would be interesting to discuss the relative contribution of centrals and satellites to the Hi mass function, which is shown in the right panel of Figure 12. We show the number density of centrals and satellites as a function of Hi mass (we remind the reader that the scaling relation analysis, restricted to , does not include any satellites). We see that satellites dominate the Hi contribution by number at all but the highest values. While the number of the centrals remain roughly constant (allowing for the error-bars) throughout the mass range, the number of satellites decrease at high Hi masses.

Our findings are somewhat in disagreement with mass function predictions from recent SAMs of galaxy formation (Kim et al., 2015, 2017). The SAMs, which incorporate the photoionization feedback from reionization to probe the low-mass end of the Hi mass function, predict that the mass function is dominated by centrals for . In these models, the satellites contribute to only very small mass Hi galaxies. A careful look at our calculations would show that the satellites are necessary to explain the clustering measurements at small scales for . Hence it would be interesting to compare the SAMs rigorously against the data obtained from the Hi surveys.

Interestingly, using a SHAM-based analysis, Guo et al. (2017) also find that the Hi mass function is dominated by centrals across all the Hi mass range of interest. One reason for the difference could be the weighting scheme used for calculating the correlation functions (weighted by inverse number density in Papastergis et al. (2013), while weighted by inverse in Guo et al. (2017)). In particular, Guo et al. (2017) found that the large-scale clustering of Hi galaxies larger than in their analysis showed a substantial dependence on the  threshold, in stark contrast to the analysis of Papastergis et al. (2013) which showed essentially no -dependence of clustering.

Because of these differences, it is possible that some our conclusions, particularly in the high  range, could change if we use the measurements of Guo et al. (2017). E.g., the discrepancy we found between our model’s prediction for the - relation and observational estimates of this quantity in Figure 11 could plausibly be due to this systematic effect in the measured 2-point correlation function. If we were to instead use the measurements of Guo et al. (2017), the -dependent clustering they observe would force our model to place high  galaxies in increasingly more massive halos than it currently does. E.g., this could be accomplished by making the value of larger, thus more closely tying together the Hi-dependence of clustering in ALFALFA and the luminosity-dependence of clustering in SDSS. Since a similar argument applies to high galaxies as well (see, e.g., Zu & Mandelbaum, 2015, for a recent stellar mass-based analysis of SDSS clustering), it is conceivable that our model would now predict a stronger positive correlation between  and . We intend to explore this in the near future, along with implementing some improvements in our model as we describe in the next subsection.

It is nevertheless encouraging to note that our two different analyses lead to similar results in the high  regime where they can be compared (Figures 5 and 9). The total galaxy number density in Figure 9, in particular, peaks at in both methods. The best-fit and the median models of the scaling relation are within the range of the HOD-based method for almost the entire halo mass range. This is a non-trivial result which, combined with the discussion in section 5.1, emphasizes the power of joint optical-Hi statistical analyses in building a complete picture of galaxy evolution. Additionally, the analysis in Appendix C shows that our best fit models in both cases produce physically meaningful results for the ratio of Hi mass to halo mass in the majority of halos in which they place Hi.

5.3 Caveats

In addition to the potential improvement in clustering measurements mentioned above, our scaling relation model itself could benefit from a complete joint analysis of the optical HOD together with the optical-Hi scaling relation. This is primarily because the colour-luminosity relations we have used in this work (Paranjape et al., 2015; Pahwa & Paranjape, 2017, see Appendix A.2 for details) have all been calibrated separately and independently of the luminosity-dependent HOD of Zehavi et al. (2011). In particular, the calibration of the satellite red fraction (equation 17) taken from Pahwa & Paranjape (2017) is the most susceptible to systematic effects, since it was not based on a rigorous Monte Carlo fit to clustering data (see also Skibba & Sheth, 2009). Although our scaling relation analysis assumed no Hi-carrying satellites with , the corresponding split between red and blue centrals depends on the calibration of through equation (18). This aspect of our analysis can, therefore, be improved by simultaneously using clustering information from SDSS and ALFALFA in a joint analysis.

In the same same spirit, several ingredients of the analytical model described in Appendix A.1 – particularly those in the -halo regime, such as our treatment of scale dependence of halo bias, or the fact that we ignore halo exclusion – are affected by systematic uncertainty (see, e.g., van den Bosch et al., 2013, for a comprehensive study) which would result in biases on the parameter values being constrained. This affects both the scaling relation as well as the parametric HOD analysis. By comparing with mock galaxy catalogs, Pahwa & Paranjape (2017) had estimated these effects to be of the order of in the predictions of SDSS colour-dependent clustering. These uncertainties can also be largely mitigated by using numerical calibrations of halo clustering based directly on -body simulations (Zheng & Guo, 2016). We are currently incorporating such calibrations in our MCMC routines, and the results will be presented in forthcoming work (Pahwa et al., in preparation).

6 Conclusion

The study of Hi in galaxies is important because of its connection with star formation and galaxy evolution. In particular, connecting these galaxies to the properties of the host dark matter haloes can be of great help in comparing theoretical models with observations. In this work, we have used two different approaches – a traditional one based on a parametric HOD and another, more novel approach, based on an assumed scaling relation between Hi content and optical properties of galaxies – to model the mass function and clustering properties of Hi galaxies as measured in the ALFALFA survey.

Our two main findings are:

  • At low Hi masses (), the clustering data demand that Hi be placed in a significant number of satellite galaxies (Figure 3 and Table 1), with the number increasing towards lower  (Figure 12).

  • At higher masses , most Hi must be in central galaxies (Table 1). The optical-Hi scaling analysis in section 4 for this sample further implies that this Hi is primarily in optically faint blue centrals (Figure 10), which is consistent with observations from matched samples and expectations from galaxy formation models.

As mentioned previously, some of our conclusions regarding the high Hi mass end (e.g., the discrepancy between the Hi-to-stellar mass ratio from observations and our best-fit scaling relation seen in Figure 11) might change upon using an updated version of the ALFALFA clustering data (Guo et al., 2017); we intend to explore this in the near future.

Our work can be extended in several directions. For example, using our low-redshift results and making reasonable assumptions regarding the evolution of the Hi-to-dark matter mapping would allow us to make predictions for future high-redshift observations of Hi such as those expected from the SKA. The optical-Hi scaling analysis of section 4 is particularly interesting in this regard, since it can be combined with the known evolution of the optical HOD (Contreras et al., 2017) to make predictions for matched samples between SKA and upcoming optical surveys such as Euclid and LSST (Kitching et al., 2015; Bacon et al., 2015) .

Finally, although our models have been restricted to the simplest ‘halo mass only’ flavour of HOD analysis, they can easily be extended to include effects such as ‘galactic conformity’ (Weinmann et al., 2006), in which satellite galaxies ‘know’ about the star formation properties (or optical colour) of their central, with blue centrals being preferentially surrounded by blue satellites at fixed halo mass. The optical-Hi scaling analysis of section 4 is particularly suited to such an extension, since it can ride on previously developed formalism (Paranjape et al., 2015; Pahwa & Paranjape, 2017) which allows for galactic conformity to be included in the analysis in a tunable manner. The addition of Hi abundance and clustering data will add new leverage in the problem and is expected to be very useful in constraining the nature of galactic conformity, particularly at large separations (Kauffmann et al., 2013; Hearin et al., 2015). We will return to these questions in future work.


We are grateful to E. Papastergis for kindly providing us with the covariance matrix estimates for the clustering data from Papastergis et al. (2013). We thank Neeraj Gupta for useful discussions in the initial stages of this work, and R. Srianand for discussions throughout. We gratefully acknowledge the use of high performance computing facilities at IUCAA, Pune (http://hpc.iucaa.in). This work used the open source computing packages NumPy (Van Der Walt et al., 2011, http://www.numpy.org), SciPy (Jones et al., 01) and the plotting softwares Veusz (https://veusz.github.io/) and corner.py (Foreman-Mackey, 2016). NP acknowledges the financial support from the Council of Scientific and Industrial Research (CSIR), India as a Shyama Prasad Mukherjee Fellow. The research of AP is supported by the Associateship Scheme of ICTP, Trieste and the Ramanujan Fellowship awarded by the Department of Science and Technology, Government of India.


  • Bacon et al. (2015) Bacon D., et al., 2015, Advancing Astrophysics with the Square Kilometre Array (AASKA14), p. 145
  • Baldry et al. (2004) Baldry I. K., Glazebrook K., Brinkmann J., Ivezić Ž., Lupton R. H., Nichol R. C., Szalay A. S., 2004, ApJ, 600, 681
  • Berlind & Weinberg (2002) Berlind A. A., Weinberg D. H., 2002, ApJ, 575, 587
  • Blanton et al. (2003) Blanton M. R., et al., 2003, ApJ, 592, 819
  • Catinella et al. (2010) Catinella B., et al., 2010, MNRAS, 403, 683
  • Contreras et al. (2017) Contreras S., Zehavi I., Baugh C. M., Padilla N., Norberg P., 2017, MNRAS, 465, 2833
  • Cooray & Sheth (2002) Cooray A., Sheth R., 2002, Physics Reports, 372, 1
  • Davé et al. (2013) Davé R., Katz N., Oppenheimer B. D., Kollmeier J. A., Weinberg D. H., 2013, MNRAS, 434, 2645
  • Davis & Peebles (1983) Davis M., Peebles P. J. E., 1983, ApJ, 267, 465
  • Draine (2011) Draine B. T., 2011, Physics of the Interstellar and Intergalactic Medium. Princeton University Press
  • Eisenstein & Hu (1999) Eisenstein D. J., Hu W., 1999, ApJ, 511, 5
  • Foreman-Mackey (2016) Foreman-Mackey D., 2016, The Journal of Open Source Software, 24
  • Foreman-Mackey et al. (2013) Foreman-Mackey D., Hogg D. W., Lang D., Goodman J., 2013, PASP, 125, 306
  • Giovanelli et al. (2007) Giovanelli R., et al., 2007, AJ, 133, 2569
  • Guo et al. (2015) Guo H., et al., 2015, MNRAS, 446, 578
  • Guo et al. (2017) Guo H., Li C., Zheng Z., Mo H. J., Jing Y. P., Zu Y., Lim S. H., Xu H., 2017, ApJ, 846, 61
  • Hearin et al. (2013) Hearin A. P., Zentner A. R., Berlind A. A., Newman J. A., 2013, MNRAS, 433, 659
  • Hearin et al. (2015) Hearin A. P., Watson D. F., van den Bosch F. C., 2015, MNRAS, 452, 1958
  • Hearin et al. (2016) Hearin A. P., Zentner A. R., van den Bosch F. C., Campbell D., Tollerud E., 2016, MNRAS, 460, 2552
  • Jones et al. (01 ) Jones E., Oliphant T., Peterson P., et al., 2001–, SciPy: Open source scientific tools for Python, http://www.scipy.org/
  • Kauffmann et al. (2013) Kauffmann G., Li C., Zhang W., Weinmann S., 2013, MNRAS, 430, 1447
  • Kim et al. (2015) Kim H.-S., Wyithe J. S. B., Power C., Park J., Lagos C. d. P., Baugh C. M., 2015, MNRAS, 453, 2315
  • Kim et al. (2017) Kim H.-S., Wyithe J. S. B., Baugh C. M., Lagos C. d. P., Power C., Park J., 2017, MNRAS, 465, 111
  • Kitching et al. (2015) Kitching T., et al., 2015, Advancing Astrophysics with the Square Kilometre Array (AASKA14), p. 146
  • Komatsu et al. (2009) Komatsu E., et al., 2009, ApJS, 180, 330
  • Kravtsov et al. (2004) Kravtsov A. V., Berlind A. A., Wechsler R. H., Klypin A. A., Gottlöber S., Allgood B., Primack J. R., 2004, ApJ, 609, 35
  • Lewis et al. (2002) Lewis I., et al., 2002, MNRAS, 334, 673
  • Ludlow et al. (2014) Ludlow A. D., Navarro J. F., Angulo R. E., Boylan-Kolchin M., Springel V., Frenk C., White S. D. M., 2014, MNRAS, 441, 378
  • Maddox et al. (2015) Maddox N., Hess K. M., Obreschkow D., Jarvis M. J., Blyth S.-L., 2015, MNRAS, 447, 1610
  • Martin et al. (2010) Martin A. M., Papastergis E., Giovanelli R., Haynes M. P., Springob C. M., Stierwalt S., 2010, ApJ, 723, 1359
  • Mo et al. (2010) Mo H., van den Bosch F. C., White S., 2010, Galaxy Formation and Evolution. UK: Cambridge University Press
  • Navarro et al. (1997) Navarro J. F., Frenk C. S., White S. D. M., 1997, ApJ, 490, 493
  • Norberg et al. (2001) Norberg P., et al., 2001, MNRAS, 328, 64
  • Padmanabhan & Kulkarni (2017) Padmanabhan H., Kulkarni G., 2017, MNRAS, 470, 340
  • Padmanabhan & Refregier (2017) Padmanabhan H., Refregier A., 2017, MNRAS, 464, 4008
  • Padmanabhan et al. (2016) Padmanabhan H., Choudhury T. R., Refregier A., 2016, MNRAS, 458, 781
  • Padmanabhan et al. (2017) Padmanabhan H., Refregier A., Amara A., 2017, MNRAS, 469, 2323
  • Pahwa & Paranjape (2017) Pahwa I., Paranjape A., 2017, MNRAS, 470, 1298
  • Papastergis et al. (2013) Papastergis E., Giovanelli R., Haynes M. P., Rodríguez-Puebla A., Jones M. G., 2013, ApJ, 776, 43
  • Paranjape (2014) Paranjape A., 2014, Phys. Rev. D, 90, 023520
  • Paranjape et al. (2015) Paranjape A., Kovač K., Hartley W. G., Pahwa I., 2015, MNRAS, 454, 3030
  • Peng et al. (2012) Peng Y.-j., Lilly S. J., Renzini A., Carollo M., 2012, ApJ, 757, 4
  • Popping et al. (2015) Popping G., Behroozi P. S., Peeples M. S., 2015, MNRAS, 449, 477
  • Power et al. (2010) Power C., Baugh C. M., Lacey C. G., 2010, MNRAS, 406, 43
  • Reddick et al. (2013) Reddick R. M., Wechsler R. H., Tinker J. L., Behroozi P. S., 2013, ApJ, 771, 30
  • Skibba & Sheth (2009) Skibba R. A., Sheth R. K., 2009, MNRAS, 392, 1080
  • Smith et al. (2003) Smith R. E., et al., 2003, MNRAS, 341, 1311
  • Somerville & Davé (2015) Somerville R. S., Davé R., 2015, ARA&A, 53, 51
  • Tinker et al. (2005) Tinker J. L., Weinberg D. H., Zheng Z., Zehavi I., 2005, ApJ, 631, 41
  • Tinker et al. (2008) Tinker J., Kravtsov A. V., Klypin A., Abazajian K., Warren M., Yepes G., Gottlöber S., Holz D. E., 2008, ApJ, 688, 709
  • Tinker et al. (2010) Tinker J. L., Robertson B. E., Kravtsov A. V., Klypin A., Warren M. S., Yepes G., Gottlöber S., 2010, ApJ, 724, 878
  • Toribio et al. (2011) Toribio M. C., Solanes J. M., Giovanelli R., Haynes M. P., Martin A. M., 2011, ApJ, 732, 93
  • Van Der Walt et al. (2011) Van Der Walt S., Colbert S. C., Varoquaux G., 2011, preprint, (arXiv:1102.1523)
  • Wang & White (2012) Wang W., White S. D. M., 2012, MNRAS, 424, 2574
  • Wechsler et al. (2006) Wechsler R. H., Zentner A. R., Bullock J. S., Kravtsov A. V., Allgood B., 2006, ApJ, 652, 71
  • Weinmann et al. (2006) Weinmann S. M., van den Bosch F. C., Yang X., Mo H. J., 2006, MNRAS, 366, 2
  • Wyithe & Brown (2010) Wyithe J. S. B., Brown M. J. I., 2010, MNRAS, 404, 876
  • Wyithe et al. (2009) Wyithe S., Brown M. J. I., Zwaan M. A., Meyer M. J., 2009, preprint, (arXiv:0908.2854)
  • Zehavi et al. (2005) Zehavi I., et al., 2005, ApJ, 630, 1
  • Zehavi et al. (2011) Zehavi I., et al., 2011, ApJ, 736, 59
  • Zheng & Guo (2016) Zheng Z., Guo H., 2016, MNRAS, 458, 4015
  • Zheng et al. (2005) Zheng Z., et al., 2005, ApJ, 633, 791
  • Zheng et al. (2007) Zheng Z., Coil A. L., Zehavi I., 2007, ApJ, 667, 760
  • Zu & Mandelbaum (2015) Zu Y., Mandelbaum R., 2015, MNRAS, 454, 1161
  • Zwaan et al. (2003) Zwaan M. A., et al., 2003, AJ, 125, 2842
  • Zwaan et al. (2005) Zwaan M. A., Meyer M. J., Staveley-Smith L., Webster R. L., 2005, MNRAS, 359, L30
  • van den Bosch et al. (2013) van den Bosch F. C., More S., Cacciato M., Mo H., Yang X., 2013, MNRAS, 430, 725

Appendix A Implementation details

In this Appendix, we give the details of our implementation of the analytical model describing the abundance and clustering of Hi galaxies.

a.1 Basic halo model

In the halo model framework (Cooray & Sheth, 2002), it is assumed that all galaxies live inside dark matter halos. In the standard approach, galaxies in a given halo are split into a central galaxy that lives close to the center of the gravitational potential and satellites that orbit around the central. The key ingredient to any halo model framework is the prescription by which one populates the halos with galaxies based on some quantity , where could be, e.g., luminosity, stellar mass, mass of the gas content, etc. We denote by the fraction of -halos (i.e. halos which have their masses in the range to ) which contain a central galaxy having that particular quantity larger than a threshold value . Similarly, denotes the average number of satellites having that quantity greater than and residing in a -halo that hosts a central galaxy having the same quantity more than . Our model thus assumes that the central is the ‘first’ galaxy in a group, i.e., groups cannot be devoid of centrals. As mentioned in the Introduction, our model will place each central at exactly the center-of-mass of its parent halo, ignoring the effects of any offsets in position or velocity with respect to the dark matter.

The average number of galaxies in a -halo with the chosen property larger than can then be expressed as


Then the average comoving number density of such galaxies can be computed as,


In the above expression, is the differential comoving number density of -halos, i.e. the halo mass function. For the present work, we have used the halo mass function as prescribed in Tinker et al. (2008).

The galaxy -point correlation function can be decomposed into the so-called -halo and -halo terms such that,


The -halo term of the correlation function gives the correlation between galaxies living in two different halos and can be written in Fourier space as


where is the non-linear power spectrum of the underlying dark matter density field and is the scale-dependent bias of dark matter halos. While writing the above expression we assumed that the -point correlation function between halos of mass and can be written as where is the two-point correlation function of the dark matter density field (i.e., the Fourier transform of ). For computing , we have used the prescription presented in Smith et al. (2003). This prescription takes the linear theory matter power spectrum as an input for which we have used the fitting function given in Eisenstein & Hu (1999). To compute the halo bias we have used the fitting form of linear scale-independent halo bias as given in Tinker et al. (2010) and the prescription for including scale dependence as given by Tinker et al. (2005). The -halo term in real space is then obtained as .

The quantity in equation (9) is the normalized Fourier transform of the profile with which satellites are distributed around their central galaxy. In this work, we have used an NFW profile (Navarro et al., 1997) for , with a mean concentration-mass relation as prescribed in the first half of equation (A5) of Paranjape (2014) which is a slightly modified version of the relation given in Ludlow et al. (2014). Scatter around this mean relation has been neglected. Having defined all these quantities, we can compute the correlation between the galaxies inside the same halo, i.e., the -halo term of the correlation function, as


In the above equation, is the convolution of with itself. We have assumed Poisson distribution for satellite counts to replace the second factorial moment with in the second line.

In principle, to gain a few percent level accuracy in the model, a few other aspects have to be considered too. These are the halo exclusion in the 2-halo term, non-Poissonian effects in satellite pair counts, scatter in the concentration-mass relation in the 1-halo term, the effects of redshift space distortions (see, e.g., van den Bosch et al., 2013) etc. Given the quality of the Hi selected galaxy data and for the sake of simplicity, we ignore these additional effects. As mentioned in the main text, however, we are in the process of improving upon these systematic uncertainties using calibrations of halo clustering directly from -body simulations (Zheng & Guo, 2016).

To reduce the redshift space effects, correlation measurements are often quoted in terms of the projected correlation function (Davis & Peebles, 1983)


Here () is the separation between two galaxies perpendicular (parallel) to the line of sight. All the quantities , and defined above depend on the property which we have not shown explicitly for brevity.

a.2 Optical and HI scaling relation

In the following we will consistently denote SDSS galaxy colour as . Since we are considering only central galaxies in our scaling model, equation (3) can be rewritten as,


The quantity is determined by the optical HOD analysis of Zehavi et al. (2011), and we use an interpolation kindly provided by Ramin Skibba. The reader may note that we have not accounted for any halo mass dependence in the remaining functions and , which respectively refer to the colour distribution of centrals at fixed luminosity and the distribution of Hi in centrals of fixed luminosity and colour. We discuss the reasons for this below.

The observed colour-luminosity distribution of SDSS galaxies has a well-known bimodality (ultimately related to the star-formation properties of galaxies) which can be modelled as a double-Gaussian in colour at fixed luminosity (see, e.g., Baldry et al., 2004), which we can write as


where is the ‘red fraction’ of galaxies with luminosity , and are the red/blue modes of the distribution.

When constructing a halo model that includes colour information, one must additionally model the colour distribution of satellites and centrals. Skibba & Sheth (2009) showed that the data allow us to think of each of these as simply weighted versions of the same two Gaussian modes used for modelling the observed colour distribution. Since the all-galaxy sample leads to very accurate double-Gaussian fits, this means that the uncertainty in the individual shapes of the central and satellite colour distributions is relegated to two functions that describe the red fraction of satellites/centrals of luminosity in -halos. These functions can be constrained by comparing with the colour-dependence of clustering in SDSS.

As Skibba & Sheth (2009) further demonstrated using SDSS DR5 data (this has also been confirmed by Paranjape et al., 2015, for SDSS DR7 data), a final simplification is that these satellite/central red fractions can be safely assumed to be independent of halo mass. This also implies that the full colour distributions of centrals (and satellites) are independent of halo mass. Since the centrals and satellites must add up to the full galaxy population whose (halo mass independent) colour distribution is directly observed, this reduces the problem to calibrating a single function of luminosity, which can be chosen to be the satellite red fraction .

In detail then, the quantity we require in our model is built as follows (see also Appendix A of Paranjape et al., 2015):


The distributions and are Gaussians with mean and and standard deviation and . These were calibrated by Paranjape et al. (2015) to have the forms


We will also need expressions for the all-galaxy red fraction and the satellite red fraction . The former was calibrated by Paranjape et al. (2015) to have the form


while Pahwa & Paranjape (2017) showed that the satellite red fraction given by


gives a good description of SDSS colour-dependent clustering, although this conclusion was not based on a rigorous statistical analysis. We discuss possible improvements on calibrating this ingredient in the main text.

Putting things together, the central red fraction appearing in equation (14) can be written as




and .

As regards the halo mass dependence of the Hi-optical scaling distribution , the quality of the clustering data force us to minimise the number of free parameters in the model. For simplicity, therefore, we assume this distribution to be independent of halo mass. Inspired by the near-symmetric distributions for found by matched analyses such as Maddox et al. (2015) or Catinella et al. (2010), we model to be Gaussian with mean (c.f. equation 4) and standard deviation as discussed in section 4. The good quality of the fit we achieve (see Table 2) is a post hoc justification that this model is indeed acceptable.

Finally, equation (12) describes the scaling relation for galaxies in bins of Hi mass. To get the HOD for thresholds of Hi mass (this is essential since the clustering data is only available for such thresholds), we need the quantity which is simply given by the expression,


where is the error function.

Given a scaling relation (i.e., values for the four parameters in equation 4), we can also use equation (12) to calculate the binned Hi mass function by averaging over halo mass,