Halo models of HI selected galaxies
Abstract
Modelling the distribution of neutral hydrogen (Hi) in dark matter halos is important for studying galaxy evolution in the cosmological context. We compute the abundance and clustering properties of Hiselected galaxies using halo models constrained by data from the ALFALFA survey. We apply an MCMCbased statistical analysis to constrain the model parameters through two different approaches. In the first approach, we describe the Hi content of galaxies in dark matter halos directly, using a halo occupation distribution (HOD) for the number counts of Hi galaxies. We find that a significant number of low mass () galaxies must be satellites. In the second, more novel approach, we infer the Hidark matter connection at the massive end () indirectly, using optical properties of lowredshift galaxies as an intermediary. In particular, we use a previously calibrated optical HOD describing the luminosity and colourdependent clustering of SDSS galaxies and describe the Hi content using a statistical scaling relation between the optical properties and Hi mass. The resulting bestfit scaling relation identifies massive Hi galaxies primarily with optically faint blue centrals, consistent with expectations from galaxy formation models. Our results will be useful in making forecasts for future observations of Hi galaxies with upcoming radio telescopes like the SKA, as well as in exploring synergies between SKA and optical surveys such as Euclid and LSST.
keywords:
galaxies: formation – cosmology: largescale structure of the Universe – methods: numerical, analytical1 Introduction
In the current understanding of Lambdacold dark matter (CDM) cosmology, the large scale structure of the universe forms in a hierarchical manner driven by the collisionless cold dark matter. The gravitational instability leads to formation of high density collapsed objects (or haloes) that are decoupled from the cosmic expansion. The baryons, being less abundant, follow the dark matter distribution and settle in the gravitational potential well formed by the dark matter haloes and eventually form stars and galaxies (Mo et al., 2010). Perhaps the most effective observational probes of structure formation, particularly at low redshifts (say ), are related to the stars and the interstellar medium of these galaxies. For example, the environmental dependence of star formation in galaxies can be probed using optical and UV (ultraviolet) luminosities through large surveys like the 2dF Galaxy Redshift Survey (2dFGRS) (Norberg et al., 2001; Lewis et al., 2002) and Sloan Digital Sky Survey (SDSS) (Baldry et al., 2004; Zehavi et al., 2005; Skibba & Sheth, 2009). Alternatively, one can also probe the lowdensity gas residing in the interstellar medium using atomic and molecular lines, one of the most prominent lines being the 21 cm transition of neutral hydrogen (Hi) (Draine, 2011). At low redshifts, this line is detected in emission through surveys like the HI Parkes AllSky Survey (HIPASS) (Zwaan et al., 2003; Zwaan et al., 2005) and Arecibo Legacy Fast ALFA (ALFALFA) (Giovanelli et al., 2007) which can provide us with a robust estimate of the Hi mass of the galaxy. The cold gas traced by the Hi 21 cm line can act as a natural source for star formation and thus one expects some correlation between the Hi and optical properties of the galaxies. Such correlations have been explored by crossmatching the Hi galaxy catalogues with samples of optical and ultraviolet selected galaxies and studying physically interesting quantities such as the Hitostellar mass ratio (Catinella et al., 2010; Maddox et al., 2015).
On the theoretical front, the formation and evolution of dark matter haloes are reasonably well understood, thanks to both body simulations and analytical calculations. Since these dark matter haloes cannot be probed directly via observations, one needs to include some basic understanding of the galaxy formation process for connecting the theoretical calculations to the observational data. Unfortunately, the theoretical modelling of galaxy formation consists of various complex astrophysical processes, many of which are relatively poorly understood. These include, e.g., gas cooling, fragmentation, formation of molecular gas, star formation, feedback processes etc. These processes are highly nonlinear and cover a wide range of length and temporal scales. Hence it is almost impossible to model them selfconsistently in analytical calculations or numerical simulations. The most common approach in modelling the galaxy formation has been to make approximations in the calculations by introducing a number of free parameters. These parameters can subsequently be constrained by matching with observations over a wide range of wavebands, thus gaining insights into the physical processes involved (see Somerville & Davé, 2015, for a review).
An alternate, statistical approach in connecting the observations of galaxies to the dark matter haloes is to use a phenomenological prescription to populate the haloes with galaxies. These prescriptions consist of free parameters which are constrained using observations. These include, e.g., the Halo Occupation Distribution (HOD; see, e.g., Cooray & Sheth, 2002, for a review) and SubHalo Abundance Matching (SHAM) based approaches (see, e.g., Reddick et al., 2013; Hearin et al., 2013), both of which have been widely used while studying the optical/UVselected galaxies. The main advantage of these methods is that they do not require any assumptions about the astrophysical processes (say, cooling or feedback) inside the halo, rather the physical insights can be inferred indirectly once the model parameters are reliably constrained (Kravtsov et al., 2004; Zheng et al., 2005).
Since the number of Hiselected galaxies has become quite substantial in recent years, it is worth exploring whether such statistical models can be naturally extended to understand the Hi properties of galaxies (Wyithe et al., 2009; Wyithe & Brown, 2010; Padmanabhan et al., 2016; Padmanabhan & Refregier, 2017; Padmanabhan et al., 2017; Padmanabhan & Kulkarni, 2017). Studies along these lines have been attempted recently by Guo et al. (2017) who have found that a SHAMbased model can match both the mass function and twopoint correlation of the Hi galaxies.
In this paper, we perform an independent investigation of HODbased models of the Hi distribution, using the abundance and clustering data of resolved Hi galaxies at low redshifts. Our analysis, based on the Markov chain Monte Carlo (MCMC) method, consists of two separate approaches: (a) In the first case, we use an HODbased prescription to model the observed properties of the Hiselected galaxies, exactly similar to that done for the optical SDSS galaxies by Zehavi et al. (2011). This will allow us to explore the relation between Hi content and the host dark matter halo mass. (b) In the second method, we use a novel treatment where we combine precalibrated optical HODs – these describe the luminosity and colourdependent clustering of SDSS galaxies (Zehavi et al., 2011; Skibba & Sheth, 2009) – with a modelled scaling relation between the optical properties and Hi mass of the galaxies. The free parameters of the scaling relation can be constrained using observations of Hi galaxies, thus allowing us to infer the correlation between the optical properties and Hi content. Note that this analysis does not require the Hi and optical samples to be necessarily matched and thus can contain larger number of objects. One can, of course, use the results of the matched samples as additional constraints in our analysis which may further reduce the uncertainties in the values of the free parameters.
The article is organized as follows. We describe the data set used in this work in section 2. In section 3, we present the results of our HODbased analysis. This is followed by the analysis based on scaling relation between the Hi mass and optical properties of the galaxies in section 4. We present a detailed discussion of our results in section 5, and conclude in section 6. Some of the details of our calculations are given in the Appendices.
Throughout this article, we have chosen a flat CDM cosmology with matter density parameter , baryon density parameter , Hubble parameter with , primordial r.m.s. density fluctuations at a scale of , and an inflationary spectral index, , consistent with the 5year results of the Wilkinson Microwave Anisotropy Probe experiment (Komatsu et al., 2009), and also with the values assumed by Zehavi et al. (2011) in calibrating their HOD, which we will use in our scaling analysis. Wherever needed, we have used the Eisenstein & Hu (1999) fitting function for the linear theory matter power spectrum. We will quote halo masses in , neutral hydrogen masses in and stellar masses in . Galaxy luminosities are quoted in terms of , where is SDSS band absolute magnitude, Kcorrected and evolution corrected to (Blanton et al., 2003).
2 Data set
In this work we use the abundance and clustering data of individually resolved galaxies based on their Hi mass from ALFALFA survey (Papastergis et al., 2013) to constrain our halo model. The sample consists of Hiselected galaxies from the catalogue of the ALFALFA survey (Giovanelli et al., 2007). In particular, we use the projected correlation function for samples defined using six different thresholds of neutral hydrogen mass: with .
While computing the likelihood for our analysis, we use the full covariance error matrix which was obtained by bootstrapping the data sample (Papastergis et al., 2013). Clustering data and bootstrap covariance matrices were available for projected separations between  Mpc. For three of the thresholds – , and – the quality of the covariance matrix for a few values of separation was suboptimal and led to nonconverged MCMC chains. In these cases, we have excluded the correlation function measurements at these separations, along with the corresponding row and column for each in the covariance matrix. We have indicated the results affected by this choice below.
For the sake of brevity, we will display results below for the thresholds and . The summary of all other results will be given in a Table and a summary plot.
3 HOD analysis
Our first approach for studying the Hi content in galaxies is based on populating the dark matter haloes using the HOD approach.
A simplifying assumption in this approach is that the galaxy distribution is determined only by the halo mass and no other property of the halo (Berlind & Weinberg, 2002). In principle, the method can be extended to account for the dependence on other halo properties as well (Cooray & Sheth, 2002; Wechsler et al., 2006). In this work, however, we will focus on the simple “halo mass only” parametrisation and comment on possible extensions (see, e.g., Paranjape et al., 2015; Zu & Mandelbaum, 2015; Hearin et al., 2016) towards the end. We will adopt the standard convention and assume a “central” galaxy living at the centerofmass of the halo, along with other “satellites” distributed in a sphere around the central. Since we will focus on the projected 2point clustering signal below, we ignore velocity offsets between galaxies and the dark matter potential well which can be important in modelling redshift space clustering (Guo et al., 2015).
We parametrize the fraction of halos (i.e. halos having mass in the range and ) which contain a central galaxy having neutral hydrogen mass more than as
(1) 
where is the maximum value of the fraction of haloes hosting a central, is the cutoff mass where the halo fraction is and is the width of the cutoff profile. Note that the above parametrization implies when , i.e., only a fraction of large mass haloes are guaranteed to host a central. This form is very similar to the most common parametrization used while modelling the optical galaxies (Zheng et al., 2007; Zehavi et al., 2011), the only difference being the additional parameter . This parameter is included to allow for the possibility that a fraction of haloes may not contain any Hi galaxies at all. Note that the parameter is constrained to have values on the range .
For the satellites we assume that their number in a halo follows a Poisson distribution with a mean that depends on halo mass in the form of a power law. The clustering data of Hiselected galaxies we use in this work (see below) is not sufficiently constraining to explore deviations from these assumptions, such as those described by van den Bosch et al. (2013), for example. The mean number of satellites in halos having Hi mass more than is assumed to be of the form
(2) 
where is the mean number of satellites in haloes of mass and is the powerlaw index. Finally, throughout the analysis we impose the numerical restriction and never consider halos smaller than the smallest Hi content of the galaxies they are expected to host.
We thus have five free parameters in our model , , and . Having populated the haloes with galaxies in this manner, we can compute their mass function and twopoint correlation function in the standard halo model framework (Cooray & Sheth, 2002; Berlind & Weinberg, 2002; Zehavi et al., 2005) which we briefly describe in Appendix A.1. The computed Hi mass function and correlation function can then be matched with observation and the free parameters in our model are constrained with a Bayesian MCMC analysis using the package EMCEE (ForemanMackey et al., 2013).
3.1 Constraints on HOD parameters
The lowest threshold contains all the Hiselected galaxies in the sample. The likelihood contours for the four parameters for this threshold are shown in Figure 1. We see that the parameter is consistent with being unity, i.e., almost all large mass haloes are guaranteed to host a galaxy having Hi mass . The most interesting conclusion drawn from this figure is that the value of is inconsistent with zero with a high significance (). This implies that the current data requires Hi to be distributed in satellites and simple prescriptions where dark matter haloes are assumed to contain not more than one Hi galaxy are ruled out.
26.05/17  
8.0  44.21/18  
8.5  15.71/18  
11.42/17  
9.5  14.06/18  
0  0  7.65/8 
The reason why satellites are necessary to match the data can be understood from Figure 2 where we show the projected correlation and abundance of Hi galaxies (see Appendix A.1) for the model with bestfit parameter values. For , we show the contributions from the halo and halo terms separately. It is clear that the correlation at large separations Mpc is consistent with the halo term. However, the data points at the smaller scales require invoking the halo contribution which in turn leads to the necessary requirement of satellites. In order to check whether this conclusion is an artefact of the HOD parametrization chosen, we carried out a separate MCMC analysis for models without satellites, i.e., by fixing the two parameters . In that case, we found that the bestfit model has a , significantly worse than the bestfit model with satellites included for which (see Table 1). We also note that, for this threshold, the data point at Mpc (indicated by the open circle in Figure 2) was excluded along with the corresponding row and column of the error covariance matrix (see section 2). To ensure that our conclusion of nonzero satellite number is not then driven by the single data point at Mpc, we also repeated the analysis additionally excluding this data point as well (i.e., excluding both data points with Mpc). Although the constraints are understandably weaker in this case, we still find with a bestfit value of .
Further insights on the Hihosting galaxies can be gained from Figure 3 where we plot the mean number of Hi galaxies per halo as a function of halo mass. We can see that the galaxies are expected to reside in haloes of mass . The mean central fraction saturates to almost unity at (determined by the value of ) while the satellite number increases with halo mass.
We repeat the analysis for other Hi mass thresholds, the results of which are summarized in Table 1. From the values of the reduced Chisquared for the bestfit model (last column), we find that our model provides acceptable fits to the data for all thresholds except for the case of . The in this case is , considerably larger than the other cases. We suspect that this may be related to the nonconvergence of the error covariance matrix of the observational data obtained by bootstrap method. For the subsequent discussions, we will thus ignore this threshold.
The next point to be noted from the Table is that the median value of decreases to for the higher thresholds. This implies that only half of the large mass haloes can contain centrals of Hi mass . We also find that the the necessity of including satellites in the model holds for the and cases as well, as can be seen from the constraints on . Interestingly, the number of satellites drops significantly for the next threshold , where we find that the value of is consistent with zero within and the bestfit value is also small. To understand the reason, we plot for this threshold in Figure 4 showing the contributions from the halo and halo terms separately. It is clear from the plot that all the data points, including the ones at smaller separations, are wellfit by the halo term alone. This indicates that the presence of any significant number of satellites is not necessary to explain the high mass Hi galaxies. The mean galaxy number per halo for this Hi threshold as a function of halo mass is shown in Figure 5. It is again obvious that the data does not require many satellites to be present except at low halo masses .
For the highest mass threshold , we find that the fiveparameter model leads to nonconverging MCMC chains. Given the results for the previous threshold, we anticipate that high mass Hi galaxies are unlikely to be related to the satellites. Hence we set , and fit for only three parameters , and . The chains converge satisfactorily in this case and we obtain a good fit to the data, as can be seen from the last row of Table 1. ^{1}^{1}1The reader may have noticed that, once we set , the 2halo term depends only on the combination at large scales (see equation (9) in Appendix A). Strictly speaking, there is a strong degeneracy between and . We therefore performed a test in which we fixed and varied . We found that, although we achieve a formally good fit with , the model is forced towards unphysically large values of () accompanied by large values of . In other words, the model tries to compensate for the fact that in this case, by placing galaxies in very large halos with a large scatter, thus mimicking in the range . Something similar happens if both and are allowed to vary; the degeneracy between and again forces the model into unphysical regimes of . We thus conclude that a model with , with allowed to vary, is in fact the natural and selfconsistent choice for this threshold.
We have therefore demonstrated that a simple halo massbased HOD can in fact produce a reasonable description of the data. We should mention here that Guo et al. (2017) have attempted a similar HODbased analysis and found that it leads to results that are physically not justifiable (e.g., they found Hitohalo mass ratios larger than unity in some cases). It is not straightforward to identify the precise reason why our conclusions differ with theirs. One possible reason could be that the clustering measurements we use are weighted by inverse number density (Papastergis et al., 2013), while Guo et al. (2017) have used pairwise inverse weights. In Appendix C we study the Hitohalo mass relation implied by our HOD analysis in some detail, showing that our bestfit HODs place physically reasonable amounts of Hi in the vast majority of Hioccupied halos. We will return to a discussion of the implications of our results in section 5, where we will also compare them with the results of another, more novel, analysis described in the next section.
4 Scaling between HI and optical properties
In this section, we discuss a different approach to understand the relation between Hi content and the halo mass, by additionally exploring the relation between Hi and optical properties of the galaxies. To this end, we make use of existing halo model calibrations for the luminosity (Zehavi et al., 2011) and colourdependent clustering (Skibba & Sheth, 2009; Paranjape et al., 2015; Pahwa & Paranjape, 2017) of optical SDSS galaxies and prescribe a scaling relation between the optical and Hi properties. We focus on the massive end of the Hi galaxy population which is expected to have substantial overlap with the optically faint end of the galaxy population observed by SDSS. Indeed, crossmatching analyses have found large numbers of galaxies that can be paired between ALFALFA, SDSS and GALEX (Catinella et al., 2010; Toribio et al., 2011; Maddox et al., 2015). Our approach to connecting the optical properties with Hi content, however, is statistical, inspired by similar analyses routinely performed in the galaxy cluster community.
4.1 Setup
We assume that Hi mass obeys a scaling relation in terms of optical luminosity and colour , defined by a mean relation and a scatter around this mean. For definiteness, we assume that the distribution of at fixed luminosity and colour is Lognormal, so that this mean and width describe the full distribution. We then tie together this scaling with colourluminosity relations and an optical HOD calibrated by previous analyses, to produce a model for the Hi mass function and 2point correlation function. Schematically, we write a conditional Himass function in terms of a conditional luminosityfunction , colourluminosity distribution and the assumed Hioptical scaling distribution as
(3) 
which is then used to build a halo model of Hi clustering. As in the HOD analysis of section 3, we impose the numerical restriction and never consider halos smaller than the smallest Hi content of the galaxies they are expected to host.
In principle, the optical (and also Hi) properties of centrals and satellites should be modelled separately (Skibba & Sheth, 2009; Zehavi et al., 2011). We argue below, however, that our analysis must be restricted to the massive end of the Hi mass function, which allows us to set the number of satellites to zero and only focus on the scaling relation for centrals. This restriction could be relaxed in the future by modelling the clustering of optically faint galaxies. Appendix A.2 gives the details of our implementation, and also discusses the motivation behind some of our technical choices (e.g., we assume that the colourluminosity relation and Hioptical scaling are independent of halo mass ).
After exploring various choices for the functional dependence of the mean relation on luminosity and colour, we finally settled on
(4) 
where we use the SDSS band absolute magnitude in place of luminosity, and set to be the colour. is a luminosity threshold that we discuss later. The sign convention for and is such that implies a positive correlation between Hi mass and luminosity and places more Hi in blue galaxies.
The motivation for the complicatedlooking logarithm in equation (4) involving colour is the fact that the quantity inside the logarithm gives a good description of the stellar masstolight ratio of SDSS galaxies (Wang & White, 2012; Paranjape et al., 2015), so that
(5) 
(see equation 7 of Paranjape et al., 2015) where and is the band absolute magnitude of the Sun. It follows that setting would enforce a powerlaw scaling between Hi mass and stellar mass, . The model in equation (4) is flexible enough to allow for deviations from a strict dependence of Hi mass on stellar mass, while still stable enough to the dynamic range of optical colours. (Other choices such as linear or quadratic scalings with colour led to unstable results with nonconvergent Monte Carlo chains.)
We then use the MCMC method to constrain the four free parameters that define the scaling, i.e., , , and . The Hi galaxy clustering data we use for obtaining the constraints is essentially the same as the one used in the previous sections. For the Hi mass function, we use the binned results given in Figure of Martin et al. (2010). As compared to the results of Papastergis et al. (2013), this gives us more leverage in the MCMC analysis by probing the shape of the highmass Hi mass function in some detail. We assume independent errors for these binned measurements.
4.2 Sample completeness
Since we are relying on an optical HOD calibrated by Zehavi et al. (2011), we are restricted to using the luminosity range explored by those authors in their clustering measurements and HOD analysis, which was . Our luminosity threshold in all integrals is therefore set to . Consequently, our analysis must also be restricted to the massive end of the Hi distribution, since we expect at least some positive correlation between luminosity and Hi mass. Exactly which threshold of we can reliably probe, however, cannot be decided a priori since the scaling between Hi mass and optical properties that would give us this information is the very quantity we are trying to constrain. To break this circularity and produce stable and converged results, we proceed as follows.
Let us assume that the true scaling is unique, i.e., independent of the threshold. In the following we focus on the two highest thresholds we have measurements for, namely and . The faintest luminosity we can access is , and we will build the argument below using this and one additional brighter threshold, which we choose to be . The argument is conceptually insensitive to the value of the second threshold, and can also accommodate a larger number of thresholds brighter than .
The following possibilities then exist:

Case I: Galaxies with all correspond to , and galaxies with correspond to .

Case II: Galaxies with all correspond to , but a substantial number of galaxies with correspond to , i.e., very faint SDSS galaxies.

Case III: A substantial number of galaxies with correspond to .
This exhausts all relevant possibilities for the correlation between Hi mass and luminosity in this example.
9.5  28.81/26 
To test which of these cases is consistent with the data, consider performing MCMC analyses using the following cuts in Hi mass and luminosity: (i) {, }, (ii) {, }, (iii) {, }, (iv) {, }. If Case I holds, then (i) and (ii) will give consistent results, (iii) will give biased results, but (iv) will give results consistent with (i) and (ii). If Case II holds, then (i) and (ii) will be consistent with each other, but neither (iii) nor (iv) will be consistent with these or with each other. If Case III holds, then (i) and (ii) will not be consistent with each other.
We performed all four analyses (i)(iv) and found that the results are in fact consistent with Case I. We also checked that using as the second threshold instead of leads to similar conclusions. One can therefore safely say that most galaxies with correspond to luminosities . Below, we will therefore display results for , setting .
4.3 Excluding satellites
Finally, motivated by the HODbased analysis of the previous section, we set the number of Hicontaining satellites in the model to zero for our chosen threshold of . We have also run a model where the number of satellites is taken to be a free parameter. It turns out that the bestfit parameter values in this case correspond to Hi being hosted in roughly equal parts in optically red and blue galaxies at the massive end, which in turn is inconsistent with the colour distributions of Hiselected galaxies (Maddox et al., 2015). For completeness, we present the details of this analysis in Appendix B.
4.4 Results
We find a good fit for our fourparameter model constrained by the Hi mass function and correlation function, with a Chisquared/dof of (see Table 2 for the bestfit parameter values). The corresponding likelihood contours are shown in Figure 6. We see that the bestfit value of corresponds to a small but positive correlation between Hi mass and luminosity, and similarly the bestfit value of implies blue galaxies being preferred in Hi content. To better understand the characteristics of the bestfit model, we show the corresponding Hi mass function in Figure 7, with the contribution of the red and blue galaxies shown separately. Unsurprisingly, the mass function is dominated by the blue galaxies, with the red galaxies contributing only . The turnover of the predicted mass function at small is due to the hard luminosity cut of imposed, as we mentioned before, by the HOD calibration of Zehavi et al. (2011).
The corresponding predictions for the projected Hi correlation function for the bestfit model are shown in Figure 8. The clustering data, as we can see from the plot, do not require any contribution from the satellites (i.e., the halo term is sufficient to explain the data). Our model predicts the clustering amplitude of Hihosting red galaxies to be slightly higher than the blue ones (c.f. the colourdependent clustering of optical galaxies, Skibba & Sheth, 2009; Zehavi et al., 2011); the data, of course, do not distinguish between these.
The mean number of galaxies per halo as a function of halo mass – i.e., an inferred HOD – obtained from the scaling analysis was shown in Figure 5 as the brown curves with the orange band. Figure 9 shows an alternative representation of these results, where we have multiplied the respective HODs with the halo mass function (we use the fitting function from Tinker et al., 2008, for the latter; see Appendix A.1); the result is the total number density of galaxies per logarithmic halo mass interval, as a function of halo mass. We discuss these results further in section 5.
5 Discussion
In this section, we discuss some implications of our results, comparing the results obtained using the two different methods discussed in the previous two sections with each other and with results from the literature. We also highlight some of the caveats in our analysis and mention a few possibilities for future improvement of our technique.
5.1 Implications of scaling relation analysis
Our scaling relation analysis, which is built on the success of optical HODs calibrated using SDSS DR7 data, allows us to address some interesting questions which the more traditional parametric HOD analysis based only on ALFALFA data cannot.
For example, our choice of scaling in equation (4) allows us to calculate the mean Hi mass density contributed by lowredshift massive galaxies with different luminosities and colours. This is shown in Figure 10, where the coloured region indicates the value of defined as the mean Hi mass per unit comoving volume (in units of ), per unit optical () colour, per unit band absolute magnitude, as a function of optical properties. We have used the bestfit values for the scaling relation (Table 2) in making this plot, which clearly shows that our model places the bulk of Hi in SDSS galaxies with in faint blue centrals, with a small amount in red centrals as well.
The analysis above has been observationally motivated, with all variables and relations defined exclusively in terms of directly observable quantities such as Hi mass and optical absolute magnitudes. More physically, one might expect the amount of cold gas represented by to be correlated with the stellar content of galaxies represented by the stellar mass . As discussed in section 4.1, a good description of the correlation between and optical observables is given by equation (5). We can, therefore, use our bestfit scaling relation to calculate the mean Hi mass in galaxies containing a fixed amount of stellar mass. This Hitostellar mass relation has been wellexplored in the literature, both observationally (see, e.g., Catinella et al., 2010; Toribio et al., 2011; Maddox et al., 2015) and theoretically (see, e.g., Davé et al., 2013; Popping et al., 2015). Figure 11 shows our bestfit prediction for this relation, compared with observational constraints from Maddox et al. (2015). The gray shaded area indicates values of that are affected by incompleteness due to the hard luminosity cut , calculated by comparing the stellar mass function predicted by the optical HOD (Appendix A.2) and equation (5) with Schechterfunction fits to the measured SDSS stellar mass function from Peng et al. (2012). We see that our model systematically underpredicts the mean Hi content of massive galaxies.^{2}^{2}2In detail, our model prediction can be understood as follows. Combining equations (4) and (5), we can write , where the constant depends on the model parameters but not on luminosity and stellar mass, and is therefore irrelevant for this argument. To obtain , we must average over the conditional luminosity function of central galaxies (recall that we have set the satellite number to zero in this model). Since there is a positive correlation between luminosity and stellar mass for central galaxies, and since the bestfit and are both positive, the two terms involving and compete with each other. The result is a near cancellation which leaves almost no dependence of on in our bestfit model (Figure 11). It is interesting to note that the same effect which puts Hi in blue galaxies in our model, namely the positivity of , also contributes to making the  correlation less positive. Since the discrepancy occurs at high stellar masses which are not affected by incompleteness, we believe it may be related to systematic effects in the correlation function measurements, which we discuss below. The error bars on the Maddox et al. data show the measured scatter of the distribution at fixed . Correspondingly, the error band on the green symbols shows the prediction for this scatter from our model; this agrees well with the measurements.
We emphasize, however, that although our scaling relation analysis is restricted to the high Hi mass end, it provides a new statistical method to connect the Hi distribution to optical properties of galaxies, which is complementary to direct observational analyses of matched galaxy samples. A novelty of our method is that, since it is a halo model by construction, it also provides a simultaneous theoretical connection of the Hi and optical properties with the underlying dark matter distribution. In section 6, we mention a few interesting avenues for future work along similar lines.
5.2 Comparison of parametric HOD and scaling relation results
We now turn to a comparison of the results obtained using our two techniques. In Figure 12, we show the binned Hi mass function for our bestfit models, with contributions from different halo mass ranges shown separately. The mass function measured by Papastergis et al. (2013) (computed as differences of the thresholded number counts) and by Martin et al. (2010) are shown for comparison. We find that for both the analyses (parametric HOD and scaling relation) the Hi galaxies are primarily hosted in haloes of masses . It is obvious that the HODbased analysis probes mainly the low Hi mass range, while the scalingbased method is restricted to more massive () galaxies.
At this point, it would be interesting to discuss the relative contribution of centrals and satellites to the Hi mass function, which is shown in the right panel of Figure 12. We show the number density of centrals and satellites as a function of Hi mass (we remind the reader that the scaling relation analysis, restricted to , does not include any satellites). We see that satellites dominate the Hi contribution by number at all but the highest values. While the number of the centrals remain roughly constant (allowing for the errorbars) throughout the mass range, the number of satellites decrease at high Hi masses.
Our findings are somewhat in disagreement with mass function predictions from recent SAMs of galaxy formation (Kim et al., 2015, 2017). The SAMs, which incorporate the photoionization feedback from reionization to probe the lowmass end of the Hi mass function, predict that the mass function is dominated by centrals for . In these models, the satellites contribute to only very small mass Hi galaxies. A careful look at our calculations would show that the satellites are necessary to explain the clustering measurements at small scales for . Hence it would be interesting to compare the SAMs rigorously against the data obtained from the Hi surveys.
Interestingly, using a SHAMbased analysis, Guo et al. (2017) also find that the Hi mass function is dominated by centrals across all the Hi mass range of interest. One reason for the difference could be the weighting scheme used for calculating the correlation functions (weighted by inverse number density in Papastergis et al. (2013), while weighted by inverse in Guo et al. (2017)). In particular, Guo et al. (2017) found that the largescale clustering of Hi galaxies larger than in their analysis showed a substantial dependence on the threshold, in stark contrast to the analysis of Papastergis et al. (2013) which showed essentially no dependence of clustering.
Because of these differences, it is possible that some our conclusions, particularly in the high range, could change if we use the measurements of Guo et al. (2017). E.g., the discrepancy we found between our model’s prediction for the  relation and observational estimates of this quantity in Figure 11 could plausibly be due to this systematic effect in the measured 2point correlation function. If we were to instead use the measurements of Guo et al. (2017), the dependent clustering they observe would force our model to place high galaxies in increasingly more massive halos than it currently does. E.g., this could be accomplished by making the value of larger, thus more closely tying together the Hidependence of clustering in ALFALFA and the luminositydependence of clustering in SDSS. Since a similar argument applies to high galaxies as well (see, e.g., Zu & Mandelbaum, 2015, for a recent stellar massbased analysis of SDSS clustering), it is conceivable that our model would now predict a stronger positive correlation between and . We intend to explore this in the near future, along with implementing some improvements in our model as we describe in the next subsection.
It is nevertheless encouraging to note that our two different analyses lead to similar results in the high regime where they can be compared (Figures 5 and 9). The total galaxy number density in Figure 9, in particular, peaks at in both methods. The bestfit and the median models of the scaling relation are within the range of the HODbased method for almost the entire halo mass range. This is a nontrivial result which, combined with the discussion in section 5.1, emphasizes the power of joint opticalHi statistical analyses in building a complete picture of galaxy evolution. Additionally, the analysis in Appendix C shows that our best fit models in both cases produce physically meaningful results for the ratio of Hi mass to halo mass in the majority of halos in which they place Hi.
5.3 Caveats
In addition to the potential improvement in clustering measurements mentioned above, our scaling relation model itself could benefit from a complete joint analysis of the optical HOD together with the opticalHi scaling relation. This is primarily because the colourluminosity relations we have used in this work (Paranjape et al., 2015; Pahwa & Paranjape, 2017, see Appendix A.2 for details) have all been calibrated separately and independently of the luminositydependent HOD of Zehavi et al. (2011). In particular, the calibration of the satellite red fraction (equation 17) taken from Pahwa & Paranjape (2017) is the most susceptible to systematic effects, since it was not based on a rigorous Monte Carlo fit to clustering data (see also Skibba & Sheth, 2009). Although our scaling relation analysis assumed no Hicarrying satellites with , the corresponding split between red and blue centrals depends on the calibration of through equation (18). This aspect of our analysis can, therefore, be improved by simultaneously using clustering information from SDSS and ALFALFA in a joint analysis.
In the same same spirit, several ingredients of the analytical model described in Appendix A.1 – particularly those in the halo regime, such as our treatment of scale dependence of halo bias, or the fact that we ignore halo exclusion – are affected by systematic uncertainty (see, e.g., van den Bosch et al., 2013, for a comprehensive study) which would result in biases on the parameter values being constrained. This affects both the scaling relation as well as the parametric HOD analysis. By comparing with mock galaxy catalogs, Pahwa & Paranjape (2017) had estimated these effects to be of the order of in the predictions of SDSS colourdependent clustering. These uncertainties can also be largely mitigated by using numerical calibrations of halo clustering based directly on body simulations (Zheng & Guo, 2016). We are currently incorporating such calibrations in our MCMC routines, and the results will be presented in forthcoming work (Pahwa et al., in preparation).
6 Conclusion
The study of Hi in galaxies is important because of its connection with star formation and galaxy evolution. In particular, connecting these galaxies to the properties of the host dark matter haloes can be of great help in comparing theoretical models with observations. In this work, we have used two different approaches – a traditional one based on a parametric HOD and another, more novel approach, based on an assumed scaling relation between Hi content and optical properties of galaxies – to model the mass function and clustering properties of Hi galaxies as measured in the ALFALFA survey.
Our two main findings are:

At higher masses , most Hi must be in central galaxies (Table 1). The opticalHi scaling analysis in section 4 for this sample further implies that this Hi is primarily in optically faint blue centrals (Figure 10), which is consistent with observations from matched samples and expectations from galaxy formation models.
As mentioned previously, some of our conclusions regarding the high Hi mass end (e.g., the discrepancy between the Hitostellar mass ratio from observations and our bestfit scaling relation seen in Figure 11) might change upon using an updated version of the ALFALFA clustering data (Guo et al., 2017); we intend to explore this in the near future.
Our work can be extended in several directions. For example, using our lowredshift results and making reasonable assumptions regarding the evolution of the Hitodark matter mapping would allow us to make predictions for future highredshift observations of Hi such as those expected from the SKA. The opticalHi scaling analysis of section 4 is particularly interesting in this regard, since it can be combined with the known evolution of the optical HOD (Contreras et al., 2017) to make predictions for matched samples between SKA and upcoming optical surveys such as Euclid and LSST (Kitching et al., 2015; Bacon et al., 2015) .
Finally, although our models have been restricted to the simplest ‘halo mass only’ flavour of HOD analysis, they can easily be extended to include effects such as ‘galactic conformity’ (Weinmann et al., 2006), in which satellite galaxies ‘know’ about the star formation properties (or optical colour) of their central, with blue centrals being preferentially surrounded by blue satellites at fixed halo mass. The opticalHi scaling analysis of section 4 is particularly suited to such an extension, since it can ride on previously developed formalism (Paranjape et al., 2015; Pahwa & Paranjape, 2017) which allows for galactic conformity to be included in the analysis in a tunable manner. The addition of Hi abundance and clustering data will add new leverage in the problem and is expected to be very useful in constraining the nature of galactic conformity, particularly at large separations (Kauffmann et al., 2013; Hearin et al., 2015). We will return to these questions in future work.
Acknowledgments
We are grateful to E. Papastergis for kindly providing us with the covariance matrix estimates for the clustering data from Papastergis et al. (2013). We thank Neeraj Gupta for useful discussions in the initial stages of this work, and R. Srianand for discussions throughout. We gratefully acknowledge the use of high performance computing facilities at IUCAA, Pune (http://hpc.iucaa.in). This work used the open source computing packages NumPy (Van Der Walt et al., 2011, http://www.numpy.org), SciPy (Jones et al., 01) and the plotting softwares Veusz (https://veusz.github.io/) and corner.py (ForemanMackey, 2016). NP acknowledges the financial support from the Council of Scientific and Industrial Research (CSIR), India as a Shyama Prasad Mukherjee Fellow. The research of AP is supported by the Associateship Scheme of ICTP, Trieste and the Ramanujan Fellowship awarded by the Department of Science and Technology, Government of India.
References
 Bacon et al. (2015) Bacon D., et al., 2015, Advancing Astrophysics with the Square Kilometre Array (AASKA14), p. 145
 Baldry et al. (2004) Baldry I. K., Glazebrook K., Brinkmann J., Ivezić Ž., Lupton R. H., Nichol R. C., Szalay A. S., 2004, ApJ, 600, 681
 Berlind & Weinberg (2002) Berlind A. A., Weinberg D. H., 2002, ApJ, 575, 587
 Blanton et al. (2003) Blanton M. R., et al., 2003, ApJ, 592, 819
 Catinella et al. (2010) Catinella B., et al., 2010, MNRAS, 403, 683
 Contreras et al. (2017) Contreras S., Zehavi I., Baugh C. M., Padilla N., Norberg P., 2017, MNRAS, 465, 2833
 Cooray & Sheth (2002) Cooray A., Sheth R., 2002, Physics Reports, 372, 1
 Davé et al. (2013) Davé R., Katz N., Oppenheimer B. D., Kollmeier J. A., Weinberg D. H., 2013, MNRAS, 434, 2645
 Davis & Peebles (1983) Davis M., Peebles P. J. E., 1983, ApJ, 267, 465
 Draine (2011) Draine B. T., 2011, Physics of the Interstellar and Intergalactic Medium. Princeton University Press
 Eisenstein & Hu (1999) Eisenstein D. J., Hu W., 1999, ApJ, 511, 5
 ForemanMackey (2016) ForemanMackey D., 2016, The Journal of Open Source Software, 24
 ForemanMackey et al. (2013) ForemanMackey D., Hogg D. W., Lang D., Goodman J., 2013, PASP, 125, 306
 Giovanelli et al. (2007) Giovanelli R., et al., 2007, AJ, 133, 2569
 Guo et al. (2015) Guo H., et al., 2015, MNRAS, 446, 578
 Guo et al. (2017) Guo H., Li C., Zheng Z., Mo H. J., Jing Y. P., Zu Y., Lim S. H., Xu H., 2017, ApJ, 846, 61
 Hearin et al. (2013) Hearin A. P., Zentner A. R., Berlind A. A., Newman J. A., 2013, MNRAS, 433, 659
 Hearin et al. (2015) Hearin A. P., Watson D. F., van den Bosch F. C., 2015, MNRAS, 452, 1958
 Hearin et al. (2016) Hearin A. P., Zentner A. R., van den Bosch F. C., Campbell D., Tollerud E., 2016, MNRAS, 460, 2552
 Jones et al. (01 ) Jones E., Oliphant T., Peterson P., et al., 2001–, SciPy: Open source scientific tools for Python, http://www.scipy.org/
 Kauffmann et al. (2013) Kauffmann G., Li C., Zhang W., Weinmann S., 2013, MNRAS, 430, 1447
 Kim et al. (2015) Kim H.S., Wyithe J. S. B., Power C., Park J., Lagos C. d. P., Baugh C. M., 2015, MNRAS, 453, 2315
 Kim et al. (2017) Kim H.S., Wyithe J. S. B., Baugh C. M., Lagos C. d. P., Power C., Park J., 2017, MNRAS, 465, 111
 Kitching et al. (2015) Kitching T., et al., 2015, Advancing Astrophysics with the Square Kilometre Array (AASKA14), p. 146
 Komatsu et al. (2009) Komatsu E., et al., 2009, ApJS, 180, 330
 Kravtsov et al. (2004) Kravtsov A. V., Berlind A. A., Wechsler R. H., Klypin A. A., Gottlöber S., Allgood B., Primack J. R., 2004, ApJ, 609, 35
 Lewis et al. (2002) Lewis I., et al., 2002, MNRAS, 334, 673
 Ludlow et al. (2014) Ludlow A. D., Navarro J. F., Angulo R. E., BoylanKolchin M., Springel V., Frenk C., White S. D. M., 2014, MNRAS, 441, 378
 Maddox et al. (2015) Maddox N., Hess K. M., Obreschkow D., Jarvis M. J., Blyth S.L., 2015, MNRAS, 447, 1610
 Martin et al. (2010) Martin A. M., Papastergis E., Giovanelli R., Haynes M. P., Springob C. M., Stierwalt S., 2010, ApJ, 723, 1359
 Mo et al. (2010) Mo H., van den Bosch F. C., White S., 2010, Galaxy Formation and Evolution. UK: Cambridge University Press
 Navarro et al. (1997) Navarro J. F., Frenk C. S., White S. D. M., 1997, ApJ, 490, 493
 Norberg et al. (2001) Norberg P., et al., 2001, MNRAS, 328, 64
 Padmanabhan & Kulkarni (2017) Padmanabhan H., Kulkarni G., 2017, MNRAS, 470, 340
 Padmanabhan & Refregier (2017) Padmanabhan H., Refregier A., 2017, MNRAS, 464, 4008
 Padmanabhan et al. (2016) Padmanabhan H., Choudhury T. R., Refregier A., 2016, MNRAS, 458, 781
 Padmanabhan et al. (2017) Padmanabhan H., Refregier A., Amara A., 2017, MNRAS, 469, 2323
 Pahwa & Paranjape (2017) Pahwa I., Paranjape A., 2017, MNRAS, 470, 1298
 Papastergis et al. (2013) Papastergis E., Giovanelli R., Haynes M. P., RodríguezPuebla A., Jones M. G., 2013, ApJ, 776, 43
 Paranjape (2014) Paranjape A., 2014, Phys. Rev. D, 90, 023520
 Paranjape et al. (2015) Paranjape A., Kovač K., Hartley W. G., Pahwa I., 2015, MNRAS, 454, 3030
 Peng et al. (2012) Peng Y.j., Lilly S. J., Renzini A., Carollo M., 2012, ApJ, 757, 4
 Popping et al. (2015) Popping G., Behroozi P. S., Peeples M. S., 2015, MNRAS, 449, 477
 Power et al. (2010) Power C., Baugh C. M., Lacey C. G., 2010, MNRAS, 406, 43
 Reddick et al. (2013) Reddick R. M., Wechsler R. H., Tinker J. L., Behroozi P. S., 2013, ApJ, 771, 30
 Skibba & Sheth (2009) Skibba R. A., Sheth R. K., 2009, MNRAS, 392, 1080
 Smith et al. (2003) Smith R. E., et al., 2003, MNRAS, 341, 1311
 Somerville & Davé (2015) Somerville R. S., Davé R., 2015, ARA&A, 53, 51
 Tinker et al. (2005) Tinker J. L., Weinberg D. H., Zheng Z., Zehavi I., 2005, ApJ, 631, 41
 Tinker et al. (2008) Tinker J., Kravtsov A. V., Klypin A., Abazajian K., Warren M., Yepes G., Gottlöber S., Holz D. E., 2008, ApJ, 688, 709
 Tinker et al. (2010) Tinker J. L., Robertson B. E., Kravtsov A. V., Klypin A., Warren M. S., Yepes G., Gottlöber S., 2010, ApJ, 724, 878
 Toribio et al. (2011) Toribio M. C., Solanes J. M., Giovanelli R., Haynes M. P., Martin A. M., 2011, ApJ, 732, 93
 Van Der Walt et al. (2011) Van Der Walt S., Colbert S. C., Varoquaux G., 2011, preprint, (arXiv:1102.1523)
 Wang & White (2012) Wang W., White S. D. M., 2012, MNRAS, 424, 2574
 Wechsler et al. (2006) Wechsler R. H., Zentner A. R., Bullock J. S., Kravtsov A. V., Allgood B., 2006, ApJ, 652, 71
 Weinmann et al. (2006) Weinmann S. M., van den Bosch F. C., Yang X., Mo H. J., 2006, MNRAS, 366, 2
 Wyithe & Brown (2010) Wyithe J. S. B., Brown M. J. I., 2010, MNRAS, 404, 876
 Wyithe et al. (2009) Wyithe S., Brown M. J. I., Zwaan M. A., Meyer M. J., 2009, preprint, (arXiv:0908.2854)
 Zehavi et al. (2005) Zehavi I., et al., 2005, ApJ, 630, 1
 Zehavi et al. (2011) Zehavi I., et al., 2011, ApJ, 736, 59
 Zheng & Guo (2016) Zheng Z., Guo H., 2016, MNRAS, 458, 4015
 Zheng et al. (2005) Zheng Z., et al., 2005, ApJ, 633, 791
 Zheng et al. (2007) Zheng Z., Coil A. L., Zehavi I., 2007, ApJ, 667, 760
 Zu & Mandelbaum (2015) Zu Y., Mandelbaum R., 2015, MNRAS, 454, 1161
 Zwaan et al. (2003) Zwaan M. A., et al., 2003, AJ, 125, 2842
 Zwaan et al. (2005) Zwaan M. A., Meyer M. J., StaveleySmith L., Webster R. L., 2005, MNRAS, 359, L30
 van den Bosch et al. (2013) van den Bosch F. C., More S., Cacciato M., Mo H., Yang X., 2013, MNRAS, 430, 725
Appendix A Implementation details
In this Appendix, we give the details of our implementation of the analytical model describing the abundance and clustering of Hi galaxies.
a.1 Basic halo model
In the halo model framework (Cooray & Sheth, 2002), it is assumed that all galaxies live inside dark matter halos. In the standard approach, galaxies in a given halo are split into a central galaxy that lives close to the center of the gravitational potential and satellites that orbit around the central. The key ingredient to any halo model framework is the prescription by which one populates the halos with galaxies based on some quantity , where could be, e.g., luminosity, stellar mass, mass of the gas content, etc. We denote by the fraction of halos (i.e. halos which have their masses in the range to ) which contain a central galaxy having that particular quantity larger than a threshold value . Similarly, denotes the average number of satellites having that quantity greater than and residing in a halo that hosts a central galaxy having the same quantity more than . Our model thus assumes that the central is the ‘first’ galaxy in a group, i.e., groups cannot be devoid of centrals. As mentioned in the Introduction, our model will place each central at exactly the centerofmass of its parent halo, ignoring the effects of any offsets in position or velocity with respect to the dark matter.
The average number of galaxies in a halo with the chosen property larger than can then be expressed as
(6) 
Then the average comoving number density of such galaxies can be computed as,
(7) 
In the above expression, is the differential comoving number density of halos, i.e. the halo mass function. For the present work, we have used the halo mass function as prescribed in Tinker et al. (2008).
The galaxy point correlation function can be decomposed into the socalled halo and halo terms such that,
(8) 
The halo term of the correlation function gives the correlation between galaxies living in two different halos and can be written in Fourier space as
(9) 
where is the nonlinear power spectrum of the underlying dark matter density field and is the scaledependent bias of dark matter halos. While writing the above expression we assumed that the point correlation function between halos of mass and can be written as where is the twopoint correlation function of the dark matter density field (i.e., the Fourier transform of ). For computing , we have used the prescription presented in Smith et al. (2003). This prescription takes the linear theory matter power spectrum as an input for which we have used the fitting function given in Eisenstein & Hu (1999). To compute the halo bias we have used the fitting form of linear scaleindependent halo bias as given in Tinker et al. (2010) and the prescription for including scale dependence as given by Tinker et al. (2005). The halo term in real space is then obtained as .
The quantity in equation (9) is the normalized Fourier transform of the profile with which satellites are distributed around their central galaxy. In this work, we have used an NFW profile (Navarro et al., 1997) for , with a mean concentrationmass relation as prescribed in the first half of equation (A5) of Paranjape (2014) which is a slightly modified version of the relation given in Ludlow et al. (2014). Scatter around this mean relation has been neglected. Having defined all these quantities, we can compute the correlation between the galaxies inside the same halo, i.e., the halo term of the correlation function, as
(10) 
In the above equation, is the convolution of with itself. We have assumed Poisson distribution for satellite counts to replace the second factorial moment with in the second line.
In principle, to gain a few percent level accuracy in the model, a few other aspects have to be considered too. These are the halo exclusion in the 2halo term, nonPoissonian effects in satellite pair counts, scatter in the concentrationmass relation in the 1halo term, the effects of redshift space distortions (see, e.g., van den Bosch et al., 2013) etc. Given the quality of the Hi selected galaxy data and for the sake of simplicity, we ignore these additional effects. As mentioned in the main text, however, we are in the process of improving upon these systematic uncertainties using calibrations of halo clustering directly from body simulations (Zheng & Guo, 2016).
To reduce the redshift space effects, correlation measurements are often quoted in terms of the projected correlation function (Davis & Peebles, 1983)
(11) 
Here () is the separation between two galaxies perpendicular (parallel) to the line of sight. All the quantities , and defined above depend on the property which we have not shown explicitly for brevity.
a.2 Optical and HI scaling relation
In the following we will consistently denote SDSS galaxy colour as . Since we are considering only central galaxies in our scaling model, equation (3) can be rewritten as,
(12) 
The quantity is determined by the optical HOD analysis of Zehavi et al. (2011), and we use an interpolation kindly provided by Ramin Skibba. The reader may note that we have not accounted for any halo mass dependence in the remaining functions and , which respectively refer to the colour distribution of centrals at fixed luminosity and the distribution of Hi in centrals of fixed luminosity and colour. We discuss the reasons for this below.
The observed colourluminosity distribution of SDSS galaxies has a wellknown bimodality (ultimately related to the starformation properties of galaxies) which can be modelled as a doubleGaussian in colour at fixed luminosity (see, e.g., Baldry et al., 2004), which we can write as
(13) 
where is the ‘red fraction’ of galaxies with luminosity , and are the red/blue modes of the distribution.
When constructing a halo model that includes colour information, one must additionally model the colour distribution of satellites and centrals. Skibba & Sheth (2009) showed that the data allow us to think of each of these as simply weighted versions of the same two Gaussian modes used for modelling the observed colour distribution. Since the allgalaxy sample leads to very accurate doubleGaussian fits, this means that the uncertainty in the individual shapes of the central and satellite colour distributions is relegated to two functions that describe the red fraction of satellites/centrals of luminosity in halos. These functions can be constrained by comparing with the colourdependence of clustering in SDSS.
As Skibba & Sheth (2009) further demonstrated using SDSS DR5 data (this has also been confirmed by Paranjape et al., 2015, for SDSS DR7 data), a final simplification is that these satellite/central red fractions can be safely assumed to be independent of halo mass. This also implies that the full colour distributions of centrals (and satellites) are independent of halo mass. Since the centrals and satellites must add up to the full galaxy population whose (halo mass independent) colour distribution is directly observed, this reduces the problem to calibrating a single function of luminosity, which can be chosen to be the satellite red fraction .
In detail then, the quantity we require in our model is built as follows (see also Appendix A of Paranjape et al., 2015):
(14) 
The distributions and are Gaussians with mean and and standard deviation and . These were calibrated by Paranjape et al. (2015) to have the forms
(15) 
We will also need expressions for the allgalaxy red fraction and the satellite red fraction . The former was calibrated by Paranjape et al. (2015) to have the form
(16) 
while Pahwa & Paranjape (2017) showed that the satellite red fraction given by
(17) 
gives a good description of SDSS colourdependent clustering, although this conclusion was not based on a rigorous statistical analysis. We discuss possible improvements on calibrating this ingredient in the main text.
Putting things together, the central red fraction appearing in equation (14) can be written as
(18) 
where
(19) 
and .
As regards the halo mass dependence of the Hioptical scaling distribution , the quality of the clustering data force us to minimise the number of free parameters in the model. For simplicity, therefore, we assume this distribution to be independent of halo mass. Inspired by the nearsymmetric distributions for found by matched analyses such as Maddox et al. (2015) or Catinella et al. (2010), we model to be Gaussian with mean (c.f. equation 4) and standard deviation as discussed in section 4. The good quality of the fit we achieve (see Table 2) is a post hoc justification that this model is indeed acceptable.
Finally, equation (12) describes the scaling relation for galaxies in bins of Hi mass. To get the HOD for thresholds of Hi mass (this is essential since the clustering data is only available for such thresholds), we need the quantity which is simply given by the expression,
(20) 
where is the error function.