Assessing the reliability of Friends-of-Friends groups on the future Javalambre Physics of the Accelerating Universe Astrophysical Survey
Key words: Methods: numerical – Methods: statistical – Galaxies: groups: general
Aims: We have performed a detailed analysis of the ability of the friends-of-friends algorithm to identify real galaxy systems in deep surveys such as the future Javalambre Physics of the Accelerating Universe Astrophysical Survey. Our approach is two-fold: we assess the reliability of the algorithm in both real and redshift space. In the latter, we also aim to determine the degree of accuracy that can be achieved when using spectroscopic or photometric redshift determinations as distance indicators.
Methods: We have built a light-cone mock catalogue using synthetic galaxies constructed from the Millennium Run Simulation I plus a semi-analytical model of galaxy formation. We have explored different ways to define the linking length parameters of the algorithm in order to make the identification of galaxy groups as suitable as possible in each case.
Results: We find that, when identifying systems in redshift space using spectroscopic information, the linking lengths should take into account the variation of the luminosity function with redshift as well as the linear redshift dependence of the radial fiducial velocity in the line of sight direction. When testing purity and completeness of the group samples, we find that the best resulting group sample reaches values of and of systems with high levels of purity and completeness, respectively, when using spectroscopic information. When identifying systems using photometric redshifts, we adopted a probabilistic approach to link galaxies in the line of sight direction. Our results suggest that it is possible to identify a sample of groups with less than false identification at the same time as we recover around of the true groups.
Conclusions: This modified version of the algorithm can be applied to deep surveys provided that the linking lengths are selected appropriately for the science to be done with the data.
The study of galaxy systems is one of the most important topics of extragalactic astronomy because the history of galaxy formation and evolution is encrypted in these density peaks. Analysing the properties of galaxies in groups at different times is a direct probe of how the local environment shapes the galaxies inside of them, offering a direct insight into the physics that has occurred within the halos.
In order to use these great laboratories to improve our understanding of the Universe, it is crucial to define them properly. It is therefore necessary to implement an identification criterion to define galaxy systems. Over the decades, defining a suitable algorithm to identify galaxy systems has troubled scientists. Many attempts have been made in the search for the most suitable method to identify galaxy systems using optical properties (see Gal 2006 for a review of different identification methods). Among them, we can highlight the following: methods that use positional information of galaxies to detect density peaks over a background density (e.g. Couch et al. 1991; Dalton et al. 1997; Ramella et al. 2001; Merchán & Zandivarez 2002; Trevese et al. 2007; Gillis & Hudson 2011; Farrens et al. 2011); methods that include some observational restrictions for a given type of galaxy, like their colours, magnitudes and their membership to a red sequence (e.g. Gladders & Yee 2000; Goto et al. 2002; Miller et al. 2005; Koester et al. 2007); and finally, methods that model cluster properties such as luminosity and density profiles through different probability approaches (e.g. Shectman 1985; Postman et al. 1996; Kepner et al. 1999; Gal et al. 2000; Milkeraitis et al. 2010; Ascaso et al. 2012).
Among all these different methods, those based only on the geometric positional information of galaxies have the advantage that they do not bin the data or impose any constraints on the physical properties of the systems, which avoids selection biases. The most extensively used finding algorithm that follows this criterion is the Friends-of-Friends (FoF) technique, which detects density enhancements in three dimensions by searching for galaxy pairs that are closer than a given separation. When applied to an observational catalogue, the FoF algorithm makes use of the angular coordinates and the spectroscopic redshifts of the galaxies. Nevertheless, to identify groups in redshift space one has to deal with certain difficulties. One of them is the fact that in most cases the observational samples are flux limited, so the observed decrease in galaxy number density with redshift must be taken into account. Another important issue is the peculiar velocities of galaxies in groups, since they elongate groups in the redshift (line-of-sight) direction, making them harder to detect, and may cause group members to be linked with field galaxies or even to merge into another group. Although the FoF technique has been widely used to find groups and clusters in galaxy surveys, it has not been tested properly at intermediate and high redshifts. Therefore, it is very important to subject the method to highly detailed testing to clearly determine its capability to recover real systems.
In recent years, several medium band photometric surveys (e.g., COMBO-17 - Wolf et al. 2004, COSMOS 21 - Ilbert et al. 2009, ALHAMBRA Survey - Moles et al. 2008; Molino et al. 2013, SHARDs - Pérez-González et al. 2013) have become available. These surveys provide photometric redshift resolution, and very valuable datasets to identify galaxy concentrations. Future surveys will provide hundreds of millions of galaxies with this photo-z resolution, which makes it especially important to study and develop the application of FoF algorithms to photometric redshift datasets. This task is not straightforward, due to the pronounced blurring of galaxy systems in redshift space and the sometimes complex shape of the photometric redshift error distributions. Several authors have proposed a modified FoF algorithm to be applied to photometric surveys (e.g. Botzler et al. 2004; Liu et al. 2008; Li & Yee 2008; van Breukelen & Clewley 2009). Beyond the chosen method, all the parameters and scaling relations of any algorithm should be carefully tested before applying one of these methods to a given deep photometric survey.
One of the most promising international projects to build a wide field photometric survey is the Javalambre Physics of the Accelerating Universe Astrophysical Survey (J-PAS, jpas.org; Benítez et al. 2009; Benítez et al. 2013, in prep.), which will cover more than 8000 square degrees in 54 narrow bands and 5 broad bands in the optical frequency range. The survey, which is an international collaboration mainly between Spain and Brazil, will be carried out using two telescopes of 2.5 m and 0.8 m apertures, which are being built at Sierra de Javalambre, in Spain (Benítez et al. 2009; Moles et al. 2010). The survey is planned to take 4-5 years to complete and it is expected to map down to an apparent magnitude of .
The advent of deep photometric surveys with reliable estimates of photometric redshifts, such as the future J-PAS, will demand a well-tailored set of tools in order to perform different statistical studies. Among them, the availability of different algorithms to extract reliable samples of galaxy systems is quite important. However, in order to test the different observational restrictions in the identification procedure, we must use reliable mock galaxy catalogues built from cosmological numerical simulations with all the 3-d positional information. One of the largest cosmological numerical simulations is the Millennium Simulation (Springel et al. 2005). When combined with semi-analytic models of galaxy formation, this simulation constitutes a very useful tool to mimic the observational constraints of a given catalogue under study. The several snapshots available for this numerical simulation at different times allow the construction of very detailed light-cone mock catalogues that include the corresponding effects of galaxy evolution up to redshift values similar to those expected to be achieved with the future J-PAS ().
The aim of this work is to perform a detailed analysis of the capability of a modified FoF algorithm to identify galaxy systems in a deep photometric redshift survey like the future J-PAS. The adopted modified FoF algorithm is the one developed by Liu et al. (2008), known as Probability FoF. This method uses a probability distribution function to model the photometric redshift uncertainties, obtaining a very realistic way to deal with the radial linking length without introducing artificial slices in the survey. Our work involves testing each observational restriction in order to disentangle possible problems introduced in the identification process. This task is performed on a J-PAS light-cone mock galaxy catalogue constructed using the semi-analytical galaxies extracted from the Millennium Simulation (Guo et al. 2011). Our study intends to determine the purity and completeness of a galaxy group sample obtained from a group identification algorithm that only uses positional galaxy information (angular coordinates + redshifts), and the usefulness of this sample as an input catalogue for further refinements that add other observational properties.
The layout of this paper is as follows: in section 2, we describe the N-body simulation and the semi-analytic model of galaxy formation used to build the mock catalogue. In section 3 we describe the implementation of the FoF algorithm and the modifications needed in order to identify groups in deep redshift surveys as well as photometric ones. We also include in this section the percentage of purity and completeness of the resulting finder algorithm as a function of redshift. Finally, in section 4 we summarise our results and discuss the statistical implications of using this type of algorithm in deep photometric surveys.
2 The mock catalogue
We build a light-cone mock catalogue using a simulated set of galaxies extracted from the Guo et al. (2011) semi-analytic model of galaxy formation applied on top of the Millennium Run Simulation I.
2.1 The N-body simulation
The Millennium Simulation is a cosmological Tree-Particle-Mesh (TPM; Xu 1995) N-body simulation (Springel et al. 2005), which evolves 10 billion ($2160^3$) dark matter particles in a periodic box of 500 $h^{-1}$ Mpc on a side, using a comoving softening length of 5 $h^{-1}$ kpc. The cosmological parameters of this simulation are consistent with WMAP1 data (Spergel et al. 2003), i.e., a flat cosmological model with a non-vanishing cosmological constant ($\Lambda$CDM): $\Omega_m$=0.25, $\Omega_b$=0.045, $\Omega_\Lambda$=0.75, $\sigma_8$=0.9, $n$=1 and $h$=0.73. The simulation was started at $z$=127, with the particles initially positioned in a glass-like distribution according to the primordial density fluctuation power spectrum. The particles of mass $8.6\times10^{8}\,h^{-1}\,M_\odot$ are then advanced with the TPM code, using 11,000 internal time-steps, on a 512-processor supercomputer. The full particle data (positions and velocities) between $z=20$ and $z=0$ were stored at 60 output times spaced in expansion factor according to $\log(1+z_N) = N(N+35)/4200$. Additional outputs were added at $z$ = 30, 50, 80, 127 to produce a total of 64 snapshots in all.
2.2 The semi-analytic model
In order to obtain a simulated galaxy set we adopt the Guo et al. (2011) semi-analytic model, which fixed several open issues present in some of its predecessors. For instance, the authors increased the efficiency of supernova feedback by introducing a direct dependence of the amount of gas reheated and ejected on the virial mass of the host halo. Although the resulting model fits the stellar mass function of galaxies well at low redshifts, it still overproduces low-mass galaxies at . Guo et al. (2011) also introduced a more realistic treatment of satellite galaxy evolution and of mergers, allowing satellites to continue forming stars for a longer period of time and reducing the excessively rapid reddening of satellites. The model also includes a treatment of the tidal disruption of satellite galaxies.
This model produces a complete sample when considering galaxies with rest frame absolute magnitude in the SDSS -band brighter than -16.4, which implies galaxies with stellar masses larger than .
Since different cosmological parameters have been found from WMAP7 (Komatsu et al. 2011), one may argue that the studies carried out in the present simulation could produce results that do not agree with the current cosmological model. However, Guo et al. (2013) have demonstrated that the abundance and clustering of dark halos and galaxy properties, including clustering, in WMAP7 are very similar to those found in WMAP1 for , which is the redshift range of interest in this work (see Sect. 2.3).
2.3 Mock catalogue construction
We present mock observations of the artificial Universe constructed from the Millennium Simulation, by positioning a virtual observer at zero redshift and finding those galaxies which lie on the observer’s backward light-cone. In order to do this, we build a mock sample of galaxies within an octant (solid angle= sr), made up of shells taken from different snapshots corresponding to the epoch of the lookback time at their corresponding distance. This method is commonly used to construct mock galaxy catalogues and it takes into account gravitational evolution as well as the evolution of the astrophysical properties (Díaz-Giménez 2002; Blaizot et al. 2005; Kitzbichler & White 2007; Henriques et al. 2011; Wang & White 2012). We use the last 27 snapshots, which reach a maximum redshift of z=1.5. Given that the simulation box is only 500 on a side, in order to reach a greater distance, it is necessary to use the periodicity of the simulation box and build a “super-box”, which is by construction several simulations put together side by side. The cosmological redshift (or redshift in real space) is obtained from the comoving distance of the galaxies in the super-box by using , where is the comoving distance and .
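The conversion between comoving distance and cosmological redshift can be illustrated with a minimal sketch. It assumes the flat cosmology quoted for the simulation (matter density 0.25, cosmological constant 0.75) and the standard line-of-sight integral of the inverse Hubble rate; the function names, the trapezoidal integration and the bisection inversion are illustrative choices, not the procedure actually used to build the mock.

```python
import math

C_KM_S = 299792.458  # speed of light in km/s


def comoving_distance(z, omega_m=0.25, omega_l=0.75, n_steps=2000):
    """Comoving distance in h^-1 Mpc for a flat model with H0 = 100 h km/s/Mpc."""
    if z <= 0.0:
        return 0.0

    def inv_e(zz):
        # 1 / E(z), with E(z) = H(z) / H0 for a flat matter + Lambda model
        return 1.0 / math.sqrt(omega_m * (1.0 + zz) ** 3 + omega_l)

    dz = z / n_steps
    # Composite trapezoidal rule between 0 and z
    total = 0.5 * (inv_e(0.0) + inv_e(z))
    for i in range(1, n_steps):
        total += inv_e(i * dz)
    return (C_KM_S / 100.0) * total * dz


def redshift_from_distance(r, z_max=5.0, tol=1e-8):
    """Invert comoving_distance by bisection; r is in h^-1 Mpc."""
    lo, hi = 0.0, z_max
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if comoving_distance(mid) < r:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

In a production setting the distance-redshift relation would normally be tabulated once and interpolated, since bisection per galaxy is slow for millions of objects.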
To mimic the observations, we introduce the distorted or spectroscopic redshift, , by considering the peculiar velocities of the galaxies in the radial direction, therefore:
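A minimal sketch of this step, assuming the usual non-relativistic composition of the cosmological redshift with the Doppler shift produced by the radial peculiar velocity. The function and argument names are hypothetical, and the velocity is assumed to be in km/s and positive when the galaxy recedes from the observer.

```python
C_KM_S = 299792.458  # speed of light in km/s


def spectroscopic_redshift(z_cos, v_radial):
    """Redshift distorted by the radial peculiar velocity (km/s).

    Uses (1 + z_s) = (1 + z_cos) * (1 + v_r / c), which is algebraically
    the same as z_s = z_cos + (v_r / c) * (1 + z_cos).
    """
    return (1.0 + z_cos) * (1.0 + v_radial / C_KM_S) - 1.0
```

For example, a galaxy at cosmological redshift 0.5 receding at 300 km/s is shifted by about 0.0015 in redshift, which is why peculiar velocities elongate groups along the line of sight.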
Given that the method to construct the light-cone uses shells at different snapshots, it introduces differences when compared with the observed Universe:
The first problem arises because all galaxies at a given shell have the same evolutionary stage corresponding to the output simulation time. Therefore, the mock galaxies show a discrete magnitude evolution which turns out to be more abrupt at higher redshifts (since the snapshots are spaced logarithmically with time). However, observationally, the properties of the galaxies vary continuously with redshift. This issue introduces a bias in the galaxy density distribution of the resulting mock catalogue. Also, the clustering of galaxies changes from snapshot to snapshot due to their proper movements: the larger the time-spacing between subsequent snapshots, the larger the variation in the structures.
The second problem arises because at the edges of the imaginary shells, galaxies come from two different evolutionary stages. Just considering the movement in the simulation box, if the spacing between outputs is too large, the positions of the galaxies could have changed dramatically from one output to the next, causing a galaxy to be observed either twice or not at all, depending on the direction of its motion (see Fig. 1).
To deal with these issues we introduced the following corrections during the mock construction procedure:
Positions and velocities are interpolated between the outputs in the and shells, according to their distance to the shell edges. We recompute the rest-frame absolute magnitudes of the galaxies within a given shell at cosmic time, , by interpolating linearly between the values corresponding to the current shell and the previous snapshot at (early time), but using the previously interpolated galaxy position inside the shell. It has been argued in previous works that using interpolated positions and velocities could produce dynamically incorrect velocities and could diffuse structures (Kitzbichler & White 2007). In appendix A we show that using a mock catalogue with interpolated galaxy positions and velocities does not introduce any particular bias in the results obtained in this work.
We considered two possible cases. First, the repeated galaxies case, where galaxies near the low redshift side of the shell are moving towards lower redshifts (top right panel of Fig. 1) also appear in the shell (top left panel of Fig. 1). Second, the missing galaxies case, where galaxies close to the low redshift side of the shell, below the boundary, are moving towards higher redshifts (bottom right panel of Fig. 1), and do not appear in the shell either (bottom left panel of the Fig. 1). In the first case we just discarded the galaxy positioned at the shell, since it will appear at the consecutive shell. In the second case, we reassigned the position of the galaxy in the shell with the interpolated position of the galaxy in the shell.
As previously stated, in order to reach the desired depth of the catalogue we have filled the space with a required number of replications of the fundamental volume, leading us to obvious artefacts if the simulation is viewed along one of its preferred axes. Although we can not avoid this behaviour in the octant light-cone, we could minimise this kaleidoscopic effect in a smaller light-cone by orienting the survey field appropriately following the procedure described by Kitzbichler & White (2007). According to that work, if we select an observational field defined by the lines-of-sight to the four points with Cartesian coordinates given by where is the side of the cube, and and are arbitrary numbers, we obtain a near rectangular light-cone survey of angular size x sr with the first duplicate point at comoving distance . In this way, we select the parameters in order to obtain a light-cone with a solid angle of and without repetitions out to .
The volume limited sample with absolute magnitudes brighter than contained in the selected light-cone comprises 6,756,097 galaxies up to . Finally, we compute the observer-frame galaxy apparent magnitudes from the publicly available rest-frame absolute magnitudes provided by the semi-analytic model: , where is the comoving distance computed from the spectroscopic redshift. The k-corrections are obtained as a byproduct of the method that computes the photometric redshifts (see Sect. 2.4). We set an observer-frame apparent magnitude limit of .
The final spectroscopic mock catalogue (sp-mock) comprises 793,559 galaxies with a median redshift of within a solid angle of 17.6 . In Fig. 2 we show an illustration of the galaxy distribution as a function of redshift (upper panel) and the redshift distribution of galaxies with in the selected light-cone (lower panel).
2.4 Photometric redshift assignment
We assigned photometric redshifts to the mock catalogue previously built. In order to do this, we first obtained spectral types from the original rest-frame photometry and spectroscopic redshifts by running the Bayesian Photometric Redshift package (BPZ; Benítez 2000) with the ONLY_TYPE yes option. Then, we transformed the given photometry in the mock catalogue to the photometry of the J-PAS. This transformation uses the filter response curves and the spectral types obtained. Finally, we ran BPZ again on this new photometry, obtaining the photometric redshift associated with it. As a byproduct of this method, we compute the observer-frame apparent magnitudes of the mock galaxies (and therefore, their corresponding k-corrections). All the details can be found in Ascaso et al. (2013, in prep.).
3 The Friends-of-Friends algorithm and the tuning of the linking length parameters
The Friends-of-Friends algorithm was initially developed to identify galaxy systems in redshift space considering a flux limited catalogue (Huchra & Geller 1982). Since then, several adaptations of this percolation algorithm have been used (Merchán & Zandivarez 2002; Eke et al. 2004; Knobel et al. 2009), modified to identify halos in 3-D from simulations (Davis et al. 1985; for a compilation of algorithms see Knebe et al. 2011), or adapted to identify groups through photometric redshifts (Botzler et al. 2004; Li & Yee 2008; Liu et al. 2008).
The FoF algorithm links galaxies that share common neighbours (friends). It starts by looking for the “friends” of an initial galaxy, i.e., those with separations smaller than a given threshold. Groups are defined as sets of galaxies that are connected by one or more friendship relations, i.e., friends of friends. For each galaxy not assigned to a group, the algorithm searches around it for companions with projected separation from the first galaxy:
where is the angular separation between a pair of galaxies, and refer to their radial velocities (or redshifts), and is the mean of their comoving distances. All friends of a galaxy are added to the list of group members. The surroundings of each friend are then examined. This process is repeated until no further neighbours are found.
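The percolation procedure described above can be sketched as follows. This is a minimal illustration, not the implementation used in this work: it assumes constant linking lengths (`d_link` for the projected separation and `v_link` for the radial velocity difference), approximates the projected separation as the angular separation times the mean comoving distance, and uses a brute-force pair loop with a union-find structure. The dictionary keys (`ra`, `dec`, `dist`, `vel`) are hypothetical.

```python
import math
from collections import defaultdict


def angular_separation(ra1, dec1, ra2, dec2):
    """Angle in radians between two sky positions given in radians."""
    cos_t = (math.sin(dec1) * math.sin(dec2)
             + math.cos(dec1) * math.cos(dec2) * math.cos(ra1 - ra2))
    return math.acos(max(-1.0, min(1.0, cos_t)))


def are_friends(g1, g2, d_link, v_link):
    """True if the pair satisfies both the transverse and the radial criteria."""
    theta = angular_separation(g1["ra"], g1["dec"], g2["ra"], g2["dec"])
    d_mean = 0.5 * (g1["dist"] + g2["dist"])  # mean comoving distance
    return theta * d_mean <= d_link and abs(g1["vel"] - g2["vel"]) <= v_link


def friends_of_friends(galaxies, d_link, v_link, min_members=4):
    """Return lists of galaxy indices for groups with at least min_members."""
    parent = list(range(len(galaxies)))

    def find(i):
        # Root lookup with path halving
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Brute-force O(N^2) pair search; real catalogues need a spatial index
    for i in range(len(galaxies)):
        for j in range(i + 1, len(galaxies)):
            if are_friends(galaxies[i], galaxies[j], d_link, v_link):
                ri, rj = find(i), find(j)
                if ri != rj:
                    parent[rj] = ri

    groups = defaultdict(list)
    for i in range(len(galaxies)):
        groups[find(i)].append(i)
    return [sorted(m) for m in groups.values() if len(m) >= min_members]
```

Because membership is transitive, two galaxies that are not direct friends still end up in the same group when a chain of friends connects them. In the flux limited and redshift-space cases discussed in the text, the two linking lengths would be evaluated per pair rather than held fixed.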
When working with observational samples, there are two main characteristics inherent to the observations that make the group finding difficult. One of them is the flux limit of the catalogue, and the other is the redshift space distortion. In order to adopt the best linking length parameters, and , both issues must be treated separately.
| Sample | Flux limit | Space | Linking lengths | Total number of gals in groups | Groups with | Groups with |
| flux limited-LF variable | 23 | real | | 138,675 | 14,317 | 2,980 |
| flux limited-LF fixed | 23 | real | | 159,484 | 16,641 | 3,414 |
3.1 Reference sample: volume limited sample in real space
We define a sample of galaxies without the two mentioned issues, i.e., we created a volume limited sample of galaxies in real space. This sample is complete down to absolute magnitude . Avoiding the observational constraints, the identification of groups in this sample can be performed straightforwardly. The linking length parameters are defined as follows:
where is the Hubble constant as a function of redshift and takes into account the overdensity of virialised structures in the Universe at a given time:
where is the luminosity function, and is the contour overdensity contrast. Similar to other authors in previous works (see for instance, Snaith et al. 2011), in order to model the , we assume that galaxies are unbiased mass tracers. Analysing the mass function of halos identified with FoF algorithms, Courtin et al. (2011) found deviations from universality in the mass function due to the use of halo parameters not adjusted for different virialisation overdensities in different cosmologies and redshifts. More et al. (2011) showed that the boundary of FoF halos does not correspond to a single local overdensity, but rather to a range of overdensities, and that the enclosed overdensities of the FoF halos are significantly larger than commonly thought. Courtin et al. (2011) showed that deviations from universality are not random but are correlated with the nonlinear virialisation overdensity, , expected from the spherical collapse model for a given cosmology and redshift. In particular, they showed that the linking length required to minimise deviations of the FoF mass function from universal form for a given cosmology and redshift is correlated with the corresponding as:
where is the linking length parameter commonly used for identifying dark matter halos and is set to a value of . From Weinberg & Kamionkowski (2003), the enclosed overdensity of virialised halos is
with . For a Universe with cosmological parameters (0.3, 0.7), the last equation leads to the known value of an enclosed overdensity of virialised halos at z=0 of . It is worth reminding that for the Millennium simulation the cosmological parameters are (0.25, 0.75) which implies that the virialised overdensity at z=0 is .
Even though we are adopting a redshift dependent contour overdensity contrast for our algorithm, it is worth noting that, for the cosmology of the Millennium Simulation, the empirical relation produces a variation of of only in the whole redshift range under study. On the other hand, in appendix B we introduce a variation in the eq. 3 in order to investigate the effect in our results of using a higher contour overdensity contrast, as expected from the analyses of galaxy group density profiles.
Before applying the identifier, it is necessary to compute the luminosity function of the galaxies in the catalogue. To this end, we made use of the information from the semi-analytic model, and computed the LF for every snapshot of the simulation. Then, we fitted double-Schechter functions to the distributions of rest-frame absolute magnitudes:
The best fitting parameters are shown in Fig. 3 as a function of the redshift.
The variation of used in this section as a function of redshift can be seen as the solid line in the left panel of Fig. 4.
This algorithm produces a sample of groups with 4 or more galaxy members, within a solid angle of up to redshift (see Table 1). These groups constitute the reference sample against which the algorithm will be tested as we introduce the observational constraints in the mock catalogue.
It is also worth selecting from the reference groups those that have 4 or more members with observer-frame magnitude brighter than 23, i.e., those groups that could be identified in the flux limited catalogue. We will refer to this subsample of reference groups as restricted-reference group sample, which comprises groups (see Table 1).
3.2 Flux limited sample in real space
We first tested the algorithm against a flux limited sample. Now, both linking lengths have to take into account the flux limit of the catalogue, so besides being related to the overdensity contrast they have to include the variation of the sampling of the luminosity function produced by the different distances of the groups to the observers, which is introduced, following Huchra & Geller (1982), by the scale factor (we kept the notation introduced by Huchra & Geller (1982), although the parameters in this work also depend on the redshifts):
where , and , with the mean luminosity distance for the galaxy pair.
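A minimal sketch of a Huchra & Geller (1982) style scale factor is given here for illustration. It assumes the classic form in which the factor is the inverse cube root of the ratio between the luminosity function integrated down to the faintest absolute magnitude visible at the pair distance and the same integral down to a fiducial limit. The single Schechter function, its parameter values, the neglect of k-corrections, the distance in Mpc, and the bright cut-off at magnitude -26 that stands in for minus infinity are all assumptions made for this example.

```python
import math


def schechter_mag(M, phi_star=0.01, M_star=-20.5, alpha=-1.2):
    """Schechter luminosity function per unit magnitude (illustrative values)."""
    x = 10.0 ** (0.4 * (M_star - M))
    return 0.4 * math.log(10.0) * phi_star * x ** (alpha + 1.0) * math.exp(-x)


def integrated_lf(lf, M_faint, M_bright=-26.0, n_steps=2000):
    """Trapezoidal integral of lf(M) from M_bright up to M_faint."""
    dM = (M_faint - M_bright) / n_steps
    total = 0.5 * (lf(M_bright) + lf(M_faint))
    for i in range(1, n_steps):
        total += lf(M_bright + i * dM)
    return total * dM


def scale_factor(d_lum_mpc, m_lim, M_lim, lf=schechter_mag):
    """Scale factor for a pair at mean luminosity distance d_lum_mpc (Mpc)."""
    # Faintest absolute magnitude observable at this distance (no k-correction)
    M_12 = m_lim - 25.0 - 5.0 * math.log10(d_lum_mpc)
    ratio = integrated_lf(lf, M_12) / integrated_lf(lf, M_lim)
    return ratio ** (-1.0 / 3.0)
```

The factor grows with distance because fewer galaxies remain above the flux limit, so the linking lengths are stretched to compensate for the sparser sampling. Using a luminosity function that varies with redshift amounts to passing a different `lf` for each redshift bin.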
[Table 2: percentage of groups in each category for the whole samples; columns: LF fixed, LF variable]
Usually, for low redshift samples, the luminosity function of galaxies included in the factor is computed for the whole sample, and it is assumed that there is no evolution in the luminosities up to the maximum depth of the catalogue. Since we intend to reach higher redshift groups, it is worth introducing the evolution of the luminosities of the catalogued galaxies. Therefore, it is important to compute the luminosity function of the galaxies in bins of redshift, as we did in the previous section, to account for the variation of the density of galaxies as well as their internal luminosity evolution. However, in this section we will also select a sample of groups without using the luminosity evolution of galaxies, i.e., by using a fixed luminosity function determined at redshift zero, to assess the importance that it could have in the resulting sample. In Fig. 4, the variation of , and the linking length are shown as a function of redshift. Solid lines correspond to the values obtained from an LF that varies with redshift, while dashed lines correspond to a fixed luminosity function.
We use an observer-frame apparent magnitude to limit our mock galaxies. The number of groups with 4 or more members identified with a fixed LF is , while when varying the LF with redshift, it is (see Table 1).
In order to compare the sample of groups identified in this flux limited catalogue to the reference sample, we use the restricted-reference sample to analyse the purity and completeness of the flux-limited groups.
We define purity and completeness based on a member-to-member comparison. As purity, we consider the fraction of members in the flux-limited groups that belong to any restricted-reference group, i.e., we want to quantify how good the identified groups are. As completeness, we consider the fraction of members in the restricted-reference groups that are part of the flux-limited groups; this quantity indicates the fraction of the true groups that we are able to identify.
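These two member-to-member definitions can be written compactly. The sketch below assumes each group is given as a collection of unique galaxy identifiers; the function names are illustrative.

```python
def member_set(groups):
    """Union of the galaxy identifiers of all groups in a sample."""
    members = set()
    for group in groups:
        members.update(group)
    return members


def purity_completeness(identified_groups, reference_groups):
    """Member-to-member purity and completeness of an identified sample.

    purity:       fraction of identified members found in any reference group
    completeness: fraction of reference members found in any identified group
    """
    identified = member_set(identified_groups)
    reference = member_set(reference_groups)
    common = identified & reference
    purity = len(common) / len(identified) if identified else 0.0
    completeness = len(common) / len(reference) if reference else 0.0
    return purity, completeness
```

Note that these global fractions do not say whether the members of one identified group come from a single reference group or from several, which motivates a per-group classification.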
Regarding the purity of the flux-limited sample, in the upper panels of Fig. 5 we show the fraction of galaxies belonging to a flux-limited group that are associated to one restricted-reference group which possesses the largest matching rate (solid lines), and the fraction of flux-limited group galaxy members that are not associated to any restricted-reference group (interlopers, dashed lines), both as a function of their real-space redshifts. The left boxes correspond to the flux-limited sample identified with a fixed LF, while the right boxes correspond to the sample identified with LF variable. From these plots, it is clear that the effect of assuming no-evolution in the luminosities leads to a more contaminated sample towards higher redshifts. It can be seen that, when considering evolution in the luminosity function, the purity of our flux-limited groups is high, or in other words, the fraction of interlopers is really low (less than 20%).
However, quantifying the fraction of member galaxies in a flux-limited group that belong to some restricted-reference group is not enough to understand the real nature of the identified groups. For instance, one single flux-limited group could be formed by members that originally belonged to more than one restricted-reference group. In order to disentangle the different galaxy contributions to a given galaxy group, six group categories could be defined when comparing two samples of groups: and .
(perfect match): Groups in sample having 100% of their members associated with only one group in the control sample (red solid lines)
(quasi-perfect match): Groups in sample having between 70% and 100% of their members associated with only one group in the control sample , and the remaining galaxies are interlopers (0% < interlopers ≤ 30%) (blue long dashed lines)
(merging): Groups in sample having between 70% and 100% (inclusive) of their members associated with more than one group in the control sample . This category may accept interlopers (0% ≤ interlopers ≤ 30%) (green dot short dashed lines)
(group+interlopers): Groups in sample having less than 70% of their members belonging to only one group in the control sample . The remaining members are interlopers (interlopers > 30%) (cyan dot long dashed lines)
(merging+interlopers): Groups in sample having less than 70% of their members belonging to more than one group in the control sample , the remaining galaxies are interlopers (brown short dash-long dashed lines)
(false): Groups in sample having 100% of their members not belonging to any group in the control sample (100% interlopers) (black dotted lines)
In this case, to examine the purity of the flux-limited groups, they are split into the six categories defined above taking the sample as the flux-limited sample, while the control sample is the restricted-reference sample.
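The six categories can be expressed as a small decision rule. The sketch below is one reading of the definitions above: it takes the members of one group and a dictionary (hypothetical name `control_membership`) mapping each galaxy in the control sample to its control group, and treats galaxies absent from the dictionary as interlopers. The returned labels are the descriptive names given in parentheses in the list.

```python
def classify_group(members, control_membership, threshold=0.7):
    """Assign one group to a purity category by comparing with a control sample."""
    members = list(members)
    # Control groups hosting each matched member; unmatched ones are interlopers
    hosts = [control_membership[g] for g in members if g in control_membership]
    if not hosts:
        return "false"

    fraction = len(hosts) / len(members)
    single_host = len(set(hosts)) == 1

    if single_host:
        if len(hosts) == len(members):
            return "perfect match"
        if fraction >= threshold:
            return "quasi-perfect match"
        return "group+interlopers"
    if fraction >= threshold:
        return "merging"
    return "merging+interlopers"
```

For the purity analysis the groups being classified are the flux-limited ones and the control sample is the restricted-reference sample. Swapping the two roles, and reading unmatched galaxies as missing members rather than interlopers, gives the analogous completeness classification.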
The fractions of flux-limited groups within each category of purity per redshift bin are shown in the bottom panels of the upper boxes of Fig. 5. The “perfect match” between flux-limited and restricted-reference groups are those in the , in which all the group members of the flux-limited sample belong to a unique restricted-reference group (still, the restricted-reference group might have more extra members). As expected, the higher the redshifts, the lower the fraction of perfectly matched groups. Even though this behaviour is common for both identifications, the sample when using a variable LF has a higher percentage of groups along the whole redshift range than the corresponding values for the fixed LF. The sample includes “quasi”-perfectly matched groups. The fraction of these groups is similar in both identifications.
The green dot short dashed lines () involve flux-limited groups that are the result of merging true groups plus few interlopers. For both identifications, this category is almost nonexistent.
The and contain part of real groups, but interlopers are also an important fraction of the galaxies in these groups. In both identifications they sum up to less than in the whole redshift range.
The least desired category is , those are completely false groups. It can be seen that identifying with a fixed LF produces a higher percentage of false groups at higher redshifts. The magenta short dashed lines are the complement of the class, therefore represent all other groups except the least desirable class, , or in other words, groups that contain at least part of the true groups.
In Table 2 we quote the percentage of groups in each of these classes for the whole samples. It can be seen that the sample of flux-limited groups obtained from a variable LF contains a higher percentage () of groups, a lower percentage () of , and very similar percentages of the remaining classes to those obtained when identifying groups with a fixed LF.
Regarding the completeness of the sample, the lower plots of Fig. 5 show the results as a function of redshift for both samples, fixed LF (left panels) and variable LF (right panels). To define completeness, we quantify how many of the restricted-reference groups were identified in the flux-limited sample. From the upper panels of the completeness plots, it can be seen that more than of the members of the reference-sample are included in a given group of the flux-limited samples.
Following a similar procedure as used for the purity analysis, we split groups into six completeness categories:
(perfect match): Groups in the control sample having 100% of their members identified within only one group in the sample (red solid lines)
(quasi-perfect match): Groups in the control sample having between 70% and 100% of their members identified within only one group in the sample, with the remaining galaxies missing in the new identification (0% < missing ≤ 30%, blue long dashed lines)
(split): Groups in the control sample having between 70% and 100% (inclusive) of their members identified within more than one group in the sample. This category may accept missing galaxies (0% ≤ missing ≤ 30%, green dot short dashed lines)
(group+missing galaxies): Groups in the control sample having less than 70% of their members identified within only one group in the sample. The remaining members are not identified in any group in the new identification (missing > 30%, cyan dot long dashed lines)
(split+missing galaxies): Groups in the control sample having less than 70% of their members identified within more than one group in the sample; the remaining galaxies are lost (brown short dash long dashed lines)
(missing group): Groups in the control sample having 100% of their members not identified in any group in the sample (100% missing galaxies, black dotted lines)
In this case, the control sample is the restricted-reference sample, while the sample is the flux-limited group sample. The completeness as a function of redshift based on the different categories is shown in the lower plots of the bottom boxes of Fig. 5. We find that both algorithms are able to identify most of the members of the restricted-reference sample, i.e., the and categories are dominant at all redshifts. We observe that the variable LF identification shows a more pronounced decay of the fractions of groups towards higher redshift than that observed for the fixed LF case; however, this behaviour is almost fully compensated by an increasing fraction of groups. The fraction of groups in the other categories is almost negligible, with a slight increase of groups towards higher redshifts in the variable LF identification, remaining lower than at the highest redshifts.
In Table 2 we quote the total percentages of restricted-reference groups belonging to each of the completeness categories. The class is lower and the is higher in the variable LF identification than in the fixed LF identification, while the are quite similar in both . One might be tempted to think that the identification with fixed LF produced a better result since the fraction of groups in this identification is slightly higher and the fraction of slightly lower than when using a variable LF. However, it is not worth recovering most of the true group members if the identified groups are contaminated by a larger number of interlopers that could change the intrinsic properties of the groups, or if the sample includes many false groups. Therefore, it is important to analyse the combination of purity and completeness. The categories and represent the highly pure and complete groups. Analysing Table 2, the percentages of highly complete groups of both identifications are quite similar ( vs. ), while the percentage of highly pure groups when identified with a variable LF is higher. Also, the fixed LF produces of false groups compared with for the variable LF. Therefore, using a variable LF to identify groups is the most appropriate procedure to recover as well as possible most of the restricted-reference group sample.
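The six-way split used for both purity and completeness can be summarised in a few lines of code. The following is a minimal Python sketch under one reading of the category definitions: a group is characterised by the fraction of its members found in any group of the comparison catalogue and by the number of distinct comparison groups involved. The function name, the argument names and the integer category codes (1 to 6, in the order of the lists) are hypothetical and are not taken from the paper.

```python
from collections import Counter


def classify_group(member_ids, other_membership, threshold=0.7):
    """Return a category code 1..6 for a single group.

    member_ids       : galaxy identifiers of the group being classified.
    other_membership : dict mapping galaxy id -> group id in the comparison
                       catalogue; galaxies absent from the dict count as
                       interlopers (purity) or missing galaxies (completeness).
    threshold        : 0.7 reproduces the 70% boundary of the categories.
    """
    # Count how many members fall in each group of the comparison catalogue.
    hosts = Counter(other_membership[g] for g in member_ids
                    if g in other_membership)
    n_matched = sum(hosts.values())
    n_total = len(member_ids)

    if n_matched == 0:
        return 6                      # false group / missing group
    single = len(hosts) == 1          # all matched members share one group
    if single and n_matched == n_total:
        return 1                      # perfect match
    if n_matched / n_total >= threshold:
        return 2 if single else 3     # quasi-perfect match / merging (split)
    return 4 if single else 5         # dominated by interlopers (missing)
```

For purity, `member_ids` would be the members of an identified group and `other_membership` the galaxy-to-group map of the control sample; for completeness the two roles are swapped.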
3.3 Redshift distortions: volume limited sample in spectroscopic redshift space
The other observational constraint that needs to be addressed in order to choose the best linking length parameters is the redshift space distortion. It is necessary to modify the radial linking length when working in redshift space, since the structures appear elongated along the line of sight due to the peculiar velocities of galaxies within virialised galaxy groups. These elongated structures are commonly called Fingers of God. Therefore, we built a volume-limited sample complete down to absolute magnitude , just like the reference sample, but in this case the positions of galaxies are distorted according to Eq. 1. The linking length parameters are:
The value of is defined above for the reference sample, the value is taken equal to 1, since there is no flux limit, while here we investigate different options for the value of . Usually, for low redshift samples, is defined as a constant that is tuned to produce the most reliable sample of groups in terms of purity and/or completeness.
To determine the best value of in this work, we analysed the sample of reference groups, and computed the velocity differences in redshift space along the line of sight among the group members. Our goal is to find the most appropriate value that satisfies the requirement of being the minimum velocity value needed to link most of the galaxy members of a given group in redshift space. Therefore, for each group we looked for the maximum velocity difference of the members in the line-of-sight to their closest neighbours. These maximum values are shown in the left upper panel of Fig. 6. Dots represent the median values per bin of redshift while the error bars are their semi-interquartile ranges. It can be seen that the maximum velocity difference to the closest neighbour increases towards higher redshifts. In the left lower panel, we divided the y-axis by . The medians of these points determine a roughly constant value of km/s (solid line). Hence, we test the identification algorithm using both a constant value of km/s and a value that varies with redshift as km/s. Moreover, in order to test the influence of the choice of , we also examined a second value. Instead of looking for the maximum of the velocity differences to the closest neighbours, we also investigated the second maximum of those differences. The results are shown in the right panels of Fig. 6. In this case, the values of to be analysed are km/s and km/s. This second approach, with a lower value for , is made in order to test whether a lower value could improve the resulting group sample in both purity and completeness.
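The per-group statistic described above can be illustrated with a short Python sketch. It assumes that the "closest neighbour" of a member is the one closest in line-of-sight velocity, which is an interpretation and may differ from the definition actually used; all function names are hypothetical.

```python
import statistics


def nearest_neighbour_gaps(velocities):
    """Line-of-sight velocity difference of each member to its closest
    neighbour (closest in velocity, which is an assumption of this sketch)."""
    v = sorted(velocities)
    gaps = []
    for i, vi in enumerate(v):
        left = vi - v[i - 1] if i > 0 else float("inf")
        right = v[i + 1] - vi if i < len(v) - 1 else float("inf")
        gaps.append(min(left, right))
    return gaps


def largest_gaps(velocities):
    """Maximum and second maximum of the nearest-neighbour differences
    of one group (requires at least two members)."""
    gaps = sorted(nearest_neighbour_gaps(velocities), reverse=True)
    return gaps[0], gaps[1]


def median_and_siqr(values):
    """Median and semi-interquartile range of the per-group values that
    fall in one redshift bin (requires at least two values)."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    return statistics.median(values), 0.5 * (q3 - q1)
```

Applying `largest_gaps` to every reference group and `median_and_siqr` to the values in each redshift bin would give points and error bars of the kind described for Fig. 6.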
Therefore, we performed four different identifications. We find groups with more than 4 members when using km/s, and when km/s, we find . With the shorter linking length, we identify and groups, with km/s and km/s respectively (see Table 1).
As in the previous subsection, we analyse and compare the purity and completeness of these samples in order to choose the best radial linking length parameter. The purity is defined considering the members of the newly redshift-space identified groups (four samples ) in comparison with the reference sample (control sample ); while completeness is defined taking the members in the reference sample (control sample ) and looking for their counterparts in the redshift-space groups (four samples ). The results as a function of redshifts are shown in Figs. 7 and 8.
The effect of using either a constant or variable value of can be seen by comparing the left to the right boxes of these figures. Firstly, analysing the purity in Fig. 7, it can be seen that the purity of the groups is barely affected, i.e., modulating the linking length by or keeping it constant produces similar results as a function of redshift. We observe that roughly of galaxies are associated with the group in the reference sample with the highest matching rate, while of galaxies are interlopers. From the six-category analysis, for both identifications we find and of and groups, respectively. When using a constant , there are of misidentified groups () for any redshift, while this percentage is slightly higher when using a variable .
Now, when including the completeness analysis for both identifications, remarkable differences arise. For the constant km/s, the fraction of galaxies in the reference sample associated with the group in the redshift-space sample with the highest matching rate drops drastically as a function of redshift, declining to as low as at higher redshifts (top panels of left bottom box of Fig. 7). Also at high redshifts, the groups (completely missing) reach and the contribution of is in the whole redshift range. On the other hand, when analysing the completeness of the sample identified with variable , we observe that more than of galaxies in the reference sample are recovered at all redshifts, with only galaxies missing (top panels of right bottom box). Moreover, the completeness is highly improved, obtaining of groups and more than of groups, and less than of missing groups at all redshifts.
From Table 2, based on the combined percentages of the classes 1 and 2, it can be seen that while the percentage of highly pure groups for the identification performed with variable is lower, the percentage of highly complete groups of this sample is significantly higher (). Therefore, the best choice for the radial linking length is such that it varies with redshift.
By comparing Figs. 7 and 8, the effect of the amplitude of can be seen. Using fixed or variable with km/s, all the fractions observed in the purity analysis are slightly higher than those observed and described above when using km/s. Thus, a shorter radial linking length (Fig. 8) seems better in terms of purity, i.e., it is able to identify more groups whose members belong to some reference group ( higher in the total percentage of class for both constant and variable , see Table 2). However, the results from the completeness analysis help in choosing the appropriate value. For both of the -identifications, the resulting samples are highly incomplete regardless of redshift. In the best scenario (considering variable ), the fraction of reference members that are included in the redshift-space groups reaches only . This result implies that shortening the radial linking length makes the algorithm identify fewer of the true groups, resulting in a completeness for the sample which is quite low. This result is clearer when inspecting the total percentages of classes in Table 2. By analysing the identification with variable , it can be seen that the percentage of groups drastically drops from obtained for to for . Moreover, the resulting group samples obtained when using km/s are not only incomplete, but dominated by groups of category , the least desired.
It has also been corroborated that using a value higher than , besides not having any physical motivation, increases the completeness of the sample at the cost of the purity, making it lower than .
Therefore, our choice for the radial linking length in redshift space catalogues is (right plots of Fig. 7). The redshift space distortions make it difficult to recover “perfectly matched” groups ( and ), although they are the most common categories that we identify at all redshifts, followed by and . There are of false groups, while the algorithm fails to recover only of the true groups. All in all, the resulting sample has more than of highly pure groups while we are able to identify of the highly complete groups.
3.4 Spectroscopic sample: flux limited sample in spectroscopic-redshift space
After having chosen the best linking length parameters, we identify groups in the mock galaxy catalogue described in Sect. 2.3. The identification is therefore performed with the following linking lengths:
The algorithm produces a sample of mock groups with 4 or more members (see Table 1). The purity and completeness as a function of redshift for this sample are shown in Fig. 9. Both statistics are computed using the restricted-reference groups as the control sample. The combined effect of both observational constraints, the flux limit and the redshift space distortions, can be seen. Regarding the purity, the fraction of members in the spectroscopic groups that also belong to the restricted-reference group with the highest matching rate (top panels in the left box) drastically decreases towards higher redshifts, ranging from to . When analysing the six categories of groups defined above, an increase in false identification ( groups) can be seen towards higher redshifts, with the sample at redshifts higher than being dominated by these false groups. The “perfectly matched” groups () and “quasi-perfectly matched” groups () are the most frequent among the other categories. Groups associated with a single real group plus more than of interlopers () represent at all redshifts.
From the completeness analysis (right box), the fraction of members in the restricted-reference sample that we have been able to identify in the spectroscopic group with highest matching rate (top panels) decreases with redshift, i.e., it is more likely to lose some of the true members at high redshift.
The “perfectly recovered” groups () are dominant at all redshifts, followed by those groups where only a few members are missing (). The fraction of completely missing groups is almost constant at up to , and then increases towards higher redshifts.
To deepen our study, we analysed the purity of the spectroscopic groups splitting the sample into low () and high () membership groups. The results are shown in Fig. 10. It can be seen that the low membership groups are more prone to include false identifications (), while this category is almost non-existent at low redshifts among the high membership groups, and it increases towards higher redshifts. The “perfectly-matched” groups are not frequent in the high membership groups; however, this sample is dominated by the “quasi-perfectly-matched” groups until , and groups with more than of interlopers (). The groups (merging) are at all redshifts. These results indicate that the low membership group sample is highly contaminated, and it is strongly recommended not to use it for statistical purposes.
Analysing the total percentages within each of the purity and completeness classes (Table 2), we find that the spectroscopic group catalogue has of groups of high-quality purity (), while the of the restricted-reference sample is well recovered (). The false groups () sum up to , mainly due to low membership false groups, while we completely lose of the true groups (). A closer inspection of the lower panels of Fig. 10 reveals that at low redshifts the percentage of false groups is lower than for low membership groups, while it is negligible for high membership groups, which means that our choices of the linking lengths produce similar results to those that were found in low redshift catalogues by Merchán & Zandivarez (2002).
3.5 Photometric sample: flux limited sample in photometric-redshift space
In this section we perform an analysis similar to that of the previous section, but focusing on observational catalogues with distances calculated using only photometric information, i.e., by means of photometric redshifts.
3.5.1 The probability Friend-of-Friends: PFOF
To take into account the uncertainties of using photometric redshifts, we modify the identification algorithm in the line of sight direction using the method developed by Liu et al. (2008).
Instead of just computing the absolute difference between the velocities of a galaxy pair () and restricting it to be smaller than , the definition of a galaxy pair has to take into account the probabilistic nature of the photometric redshifts. The algorithm therefore has to compute the probability that the distance between two galaxies is less than the linking length, and then restrict such probability with an artificial threshold. Following Liu et al. (2008), the probability of two galaxies being closer than is:
where and are the probability distribution functions for the two galaxies in the line of sight direction. Therefore, the line of sight criterion to determine that two galaxies are physically associated is
where is an appropriate probability threshold. This threshold will be determined in the sections below in order to obtain a sample of groups with the suitable balance between purity and completeness.
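To make the line-of-sight criterion concrete, the following Python sketch evaluates the pair probability for the special case of two independent Gaussian redshift distributions. In that case the difference of the two true redshifts is itself Gaussian, with variance equal to the sum of the two variances, so the probability has a closed form in terms of the error function. The sketch assumes that the Gaussians extend over the whole real line and that the linking length is expressed in redshift units; the function and parameter names are hypothetical.

```python
import math


def _norm_cdf(x):
    """Cumulative distribution function of the standard normal."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def pair_probability(z1, sigma1, z2, sigma2, v_link):
    """Probability that two galaxies with independent Gaussian redshift
    distributions (means z1, z2; widths sigma1, sigma2) lie within
    v_link of each other along the line of sight."""
    s = math.hypot(sigma1, sigma2)   # width of the difference z1 - z2
    d = z1 - z2
    return _norm_cdf((v_link - d) / s) - _norm_cdf((-v_link - d) / s)


def linked(z1, sigma1, z2, sigma2, v_link, p_threshold):
    """Line-of-sight linking test: probability above a chosen threshold."""
    return pair_probability(z1, sigma1, z2, sigma2, v_link) > p_threshold
```

For non-Gaussian distributions, such as the Lorentzian, the double integral would have to be evaluated for the adopted distribution instead of using this shortcut.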
3.5.2 Testing the PFOF algorithm
In order to apply this modification to our algorithm, we have to adopt a probability distribution for the photometric redshifts.
The most common model used in the literature when working with photometric redshifts is a Gaussian probability distribution (Liu et al. 2008; Ascaso et al. 2012). Therefore, we follow that approach and model the probability distribution function associated with each galaxy by a Gaussian function, i.e.:
where is the photometric redshift and the photometric redshift error of galaxy .
We also adopt a different probability distribution, a Lorentzian function, to test the behaviour of the method against different distributions. A Lorentzian function is given by:
|Probability||Gals in||Groups with||Groups with|
Firstly, we test the PFOF algorithm in the case where the galaxy redshifts have small uncertainties, as is true in the case of spectroscopic redshifts. We adopt km/s (the typical error in SDSS), and apply the PFOF to the mock galaxy catalogue described in Sect. 2.3 using a Gaussian probability distribution in Eq. 5. We identify groups having 4 or more members using a probability threshold of . Choosing as control sample the groups identified in Sect. 3.4, the analyses of completeness and purity revealed that the new identification is pure and complete, considering just the combined fractions and , defined in the previous sections. This means that, in the limit of small uncertainties, the PFOF algorithm behaves as the original FOF algorithm.
As a second test, the value of is adopted in order to mimic the difference between the BPZ photometric redshifts and the spectroscopic redshifts shown in the upper right panel of Fig. 11 (grey histogram). Choosing a Gaussian function to fit the differences, we adopt as the best fitting redshift error for all galaxies (we used the Levenberg-Marquardt method to fit non-linear functions). We also adopt a Lorentzian probability distribution to fit the differences. The best fitting redshift error for the Lorentzian function is .
Then, we modify the redshifts of the galaxies in the mock catalogue by randomly shifting the spectroscopic redshifts according to the previously fitted probability distributions: we generate a sample with the Gaussian distribution and a sample with the Lorentzian distribution. The distributions of differences for the resulting random samples are shown in Fig. 11. The sample generated with the Gaussian distribution is shown as the black histogram in the upper right panel. It can be seen that this distribution reproduces the mean of that obtained from a more realistic determination of photometric redshifts (BPZ). However, it is not possible to reproduce the tails of the realistic distribution when using a simple Gaussian function. The resulting redshift differences for the random Lorentzian sample are shown as the black histogram in the lower right panel of Fig. 11. In this case, the mean and the tails of the original distribution are well recovered.
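The difference in the tails can be reproduced with a small numerical experiment. The Python sketch below shifts a redshift with Gaussian and with Lorentzian (Cauchy) random offsets; the Lorentzian draw uses inverse-CDF sampling with the width taken as the half width at half maximum, which may not match the parametrisation used in the fits. The width of 0.003 and the function names are illustrative assumptions, not values from this work.

```python
import math
import random


def shift_gaussian(z_spec, sigma, rng):
    """Spectroscopic redshift plus a Gaussian random offset of width sigma."""
    return z_spec + rng.gauss(0.0, sigma)


def shift_lorentzian(z_spec, gamma, rng):
    """Spectroscopic redshift plus a Lorentzian (Cauchy) random offset.

    gamma is taken as the half width at half maximum (an assumption);
    the draw uses the inverse cumulative distribution of the Cauchy law.
    """
    return z_spec + gamma * math.tan(math.pi * (rng.random() - 0.5))


def tail_fraction(shifted, z_spec, width, n_widths=5.0):
    """Fraction of shifted redshifts further than n_widths * width away."""
    far = sum(abs(z - z_spec) > n_widths * width for z in shifted)
    return far / len(shifted)


rng = random.Random(1)
width = 0.003   # illustrative value only
gauss_sample = [shift_gaussian(0.5, width, rng) for _ in range(20000)]
lorentz_sample = [shift_lorentzian(0.5, width, rng) for _ in range(20000)]
```

Comparing `tail_fraction(gauss_sample, 0.5, width)` with `tail_fraction(lorentz_sample, 0.5, width)` shows the much heavier tails of the Lorentzian sample.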
We test the PFOF algorithm in both samples, one with photometric redshifts generated from a Gaussian function, and the other where the photometric redshifts come from a Lorentzian function. The application of the PFOF is straightforward, just using for each galaxy the input distribution from which their redshifts have been generated to compute the probability of Eq. 5.
We tested different probability thresholds to perform the identification on the different samples. These thresholds are defined as being a percentage (99, 95, 90, 80, 70, 60 and 30%) of the maximum probability obtained from Eq. 5. The effect of choosing different thresholds will be described in the analyses of purity and completeness of the resulting group samples.
The number of groups identified in each sample is shown in Table 3.
We analyse the purity and completeness of these samples of groups taking as control sample the restricted-reference group sample, defined in Sect. 3.1. In Fig. 12 we show the percentage of groups identified with PFOF and classified as having purity (blue solid lines), and the percentage of groups of the restricted-reference sample that have been lost by the PFOF algorithm (, red dashed lines), both as a function of the probability threshold. We chose to show here only these categories since they show how bad the identification was. In this figure, the top panel corresponds to the identifications performed on samples of galaxies with photometric redshifts assigned randomly according to a Gaussian distribution, while the bottom panel shows the results for the samples where the photometric redshifts come from a Lorentzian distribution. It can be seen that the percentage of false identifications decreases towards higher probability thresholds, while the opposite happens with the percentage of the missing groups. An appropriate choice of the probability threshold would be the value where both trends overlap, i.e., for the Gaussian distributions, and for the Lorentzian distribution. Having chosen the probability threshold, in both samples, the false groups will sum up to as well as the missing groups.
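The choice of the threshold at the point where the two trends overlap can be sketched as a simple search for the crossing of two sampled curves. The Python function below interpolates linearly between the tested thresholds; the function name is hypothetical, and the percentages in the usage note are made-up toy numbers, not results from this work.

```python
def crossing_threshold(thresholds, false_pct, missing_pct):
    """Threshold at which the false-group and missing-group curves cross.

    thresholds  : tested probability thresholds, in increasing order.
    false_pct   : percentage of false groups at each threshold.
    missing_pct : percentage of missing groups at each threshold.
    Returns None when the two curves do not cross in the sampled range.
    """
    diffs = [f - m for f, m in zip(false_pct, missing_pct)]
    for i in range(len(diffs) - 1):
        d0, d1 = diffs[i], diffs[i + 1]
        if d0 == 0:
            return thresholds[i]           # curves touch at a sampled point
        if d0 * d1 < 0:
            # Sign change: interpolate linearly inside the interval.
            t = d0 / (d0 - d1)
            return thresholds[i] + t * (thresholds[i + 1] - thresholds[i])
    if diffs[-1] == 0:
        return thresholds[-1]
    return None
```

For example, with the tested thresholds `[30, 60, 70, 80, 90, 95, 99]` and toy curves `[60, 50, 40, 30, 20, 10, 5]` (false) and `[5, 10, 20, 30, 40, 50, 60]` (missing), the function returns 80.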
3.5.3 Application of PFOF to mock galaxies with BPZ photometric redshift
We now test the PFOF algorithm when applied to mock galaxies whose photometric redshifts have been computed in a realistic way (see Sect. 2.4). We identify two samples of groups: (i) the algorithm works with a Gaussian probability function with , and (ii) the algorithm works with a Lorentzian probability function with .
The numbers of groups identified for the different probability thresholds are shown in Table 3. In order to determine the purity and completeness of these samples, we take as control sample the restricted-reference sample of groups. In Fig. 13, the percentages of false groups () and missing groups () are shown as a function of the probability thresholds. The global behaviour of the trends is similar to what we found when the photometric redshifts were assigned randomly. It can be seen that there is little difference in the identifications when using a Gaussian function to describe the distribution of the photometric redshifts or a Lorentzian function, although the Lorentzian distribution is a better description for the data in a wider range (Fig. 11). The appropriate probability thresholds are when using Gaussian functions in the algorithm, and when using Lorentzian functions. The percentages of false and missing groups are .
The total percentages of purity and completeness within each category for the different probability threshold when using a Lorentzian function in the PFOF algorithm are quoted in Table 4. is the best compromise to obtain higher percentages of purity and completeness (or lower fractions of false and missing groups).
|Class||BPZ - Lorentzian -|
We also investigate the variation of the fraction of groups within each of the six categories of purity and completeness as a function of redshifts. We choose as our main sample that obtained when using a Lorentzian function in the PFOF algorithm and a probability threshold of . The resulting trends are shown in Fig. 14.
Regarding the purity (top panel), it can be seen that the resulting sample of groups is dominated by false groups () at all redshifts, followed by groups having less than 70% of galaxies that belong to one true group (). Perfect or quasi-perfect matched groups are less than in the whole redshift range, which is to be expected given the probabilistic nature of the identification. In this figure the magenta short dashed line represents the sum of all the categories but the , resembling groups that contain at least part of the true groups. At redshifts lower than the contribution of all these categories together is higher than the contribution of the false groups, while this behaviour reverses at higher redshifts.
The analysis of the completeness is shown in the bottom panel of Fig. 14. Most of the true groups are missing at redshifts higher than , which is shown with the black dotted line (). At lower redshifts, groups with less than of their members identified in the photometric sample are dominant. The contribution of all true groups whose members have been included totally or partially in any photometric group (the sum of all the categories but C6) is higher than at redshifts lower than .
We also split the sample of photometric groups into low and high membership groups (for groups with and , respectively). The six category analysis of purity for low and high membership groups is shown in Fig. 15. The top panel of this figure shows that the low membership groups are responsible for the high contamination by false groups () in the sample in the whole redshift range. High membership false groups are less than at all redshifts, indicating that groups that contain at least part of the true groups sum up to roughly . Therefore, we suggest not using the low membership sample identified with this algorithm to perform statistical studies.
In this work we have performed a detailed analysis in order to assess the reliability of a FoF algorithm to obtain real galaxy systems in deep spectroscopic/photometric redshift surveys. To achieve this goal, we have constructed a synthetic galaxy catalogue using one of the largest simulated galaxy samples available at present, the semi-analytical galaxies built by Guo et al. (2011) on top of the Millennium Simulation I. We note that adopting any specific semi-analytical model could introduce a dependence of the results on the particular set of parameters and physical processes that were used in the model construction. Nevertheless, analysing the differences caused by using different semi-analytical models is beyond the scope of this work.
To build a light-cone mock catalogue we use the information available at different evolutionary stages in order to reproduce temporal galaxy evolution. We have applied several recipes into the mock catalogue construction procedure to avoid different problems that arise from the construction technique, like missing/duplicate galaxies and repetition of structures in the survey. The mock catalogue is tailored to the future J-PAS apparent magnitude limit and photometric band. The technique to compute photometric redshifts for each mock galaxy is also the same that will be applied to that future photometric all-sky survey. The resulting light-cone mock catalogue comprises roughly 800,000 galaxies down to an observer frame apparent magnitude of in the SDSS -band, with a median redshift of and a maximum of within a solid angle of 17.6 .
Firstly, we sought the proper linking lengths to apply in a FoF algorithm in order to identify galaxy groups in a deep spectroscopic redshift survey. We analysed completeness and purity of the sample on the basis of a member-to-member comparison between the identified groups and a reference sample. The analyses of completeness and purity of the resulting sample revealed that the best identification is obtained when the algorithm takes into account the variation of the galaxy luminosity function with redshift, as well as a linear redshift dependence of the radial fiducial velocity in the line of sight direction. The best linking lengths are those that lead to a compromise between the completeness and the purity of the resulting sample. In the best scenario, we are able to identify a galaxy group sample in the spectroscopic catalogue that contains more than of highly pure groups (completely pure or with a few interlopers), at the same time that we are able to recover of highly complete groups (completely recovered or with only a few missing galaxies). The percentage of groups that contain at least part of a true group is (in other words, of the groups are completely false identifications), while of the true groups are recovered in the identification process either in one or several groups (only of the true groups are completely lost).
Secondly, using the procedure developed by Liu et al. (2008), we have adapted the FoF algorithm in the line of sight direction into a probabilistic algorithm (pFoF) to work with photometric redshifts as distance estimators. Our analyses were performed in order to determine which probability distribution function best describes the data and leads to the most reliable group identification. We also determined the probability threshold that produces the most complete and pure sample of groups. By comparing the spectroscopic and photometric information of the mock galaxies, we observe that a Lorentzian probability distribution function performs better than a Gaussian function to quantify the discrepancies between the photometric and spectroscopic redshifts. However, after using both distribution functions in the identification procedure for different probability thresholds, we observe that the percentages of completely false and missing groups show little difference as a function of the adopted distribution function. Adopting a compromise between the completeness and purity of the resulting sample, we have determined that the best identification is obtained when using a probability threshold of of the maximum value. The resulting sample includes less than of false identifications while it is able to recover around of the true groups.
We have also observed that, regardless of whether the redshifts are spectroscopic or photometric, the group samples are strongly improved (in terms of purity) when using only groups with more than 10 galaxy members.
This work may be used to predict the number of groups that the algorithm described in this paper might find when applied to the future J-PAS survey. Taking into account the survey geometry, we expect to obtain a sample of groups with low membership () and groups with high membership () when applying the pFoF algorithm with a Lorentzian probability function, an overdensity contrast that resembles the one used for DM halos, and a probability threshold of , out to . On the other hand, if we adopt a higher contour overdensity contrast, assuming that galaxies are more concentrated than dark matter, we would obtain a galaxy group sample for the future J-PAS survey of low membership groups and groups with high membership when applying the pFoF algorithm with a Lorentzian probability function and a probability threshold of , out to (see Appendix B for details).
However, it is worth noting that the choice of the probability threshold should be made according to the final purpose of the group sample: if the obtained groups will be used as proxies for other group searching algorithms, then one may choose a low probability threshold, which would imply a sample with a high completeness level (low purity); if the groups will be used for performing analyses of group properties, then it is better to choose a high probability threshold, which would imply a high purity level (low completeness).
Finally, given that our criteria to define purity and completeness of groups are very detailed and restrictive, we were able to assess the different types of groups that contribute to the resulting identified sample. Using more relaxed criteria to define pure and complete groups, as well as different reference samples, could lead to higher percentages compared to those found in this work.
During the latest stages of this work, we became aware of a recently submitted work by Jian et al. (2013). In that work, the authors performed an analysis similar to the one presented here, i.e., using the Liu et al. (2008) adaptation of the FoF algorithm to identify galaxy groups, but in the Pan-STARRS1 Medium Deep Surveys. Even though both works pursue similar objectives in assessing the reliability of galaxy group identification in photometric redshift surveys, the approaches adopted are quite different. For instance, the semi-analytical galaxies, the procedures for determining the proper linking length parameters, the reference samples, the criteria to compute purity and completeness of identified groups, as well as the way to compute the photometric redshifts in mock catalogues are some of the points where the two works clearly differ. Although it is difficult to perform a fair comparison between the two works, we note that our values of purity and completeness are overall consistent with those obtained by Jian et al. (2013).
Acknowledgements. We thank Manuel Merchán, Mario Sgró and Raúl Angulo for useful discussions and suggestions. The Millennium Simulation databases used in this paper and the web application providing online access to them were constructed as part of the activities of the German Astrophysical Virtual Observatory (GAVO). We thank Qi Guo for allowing public access to the outputs of her very impressive semi-analytical model of galaxy formation. This work has been partially supported by Consejo Nacional de Investigaciones Científicas y Técnicas de la República Argentina (CONICET, PIP2011/2013 11220100100336), Secretaría de Ciencia y Tecnología de la Universidad de Córdoba (SeCyT) and Fundação de Amparo à Pesquisa do Estado de São Paulo (FAPESP), through grants 2011/50471-4 and 2011/50002-4. CMdO acknowledges support of FAPESP (grant #2006/56213-9) and Conselho Nacional de Pesquisas (CNPq). AZ and EDG wish to thank the IAG staff for the hospitality during the extended visit, when part of this work was done.
- Ascaso et al. (2012) Ascaso, B., Wittman, D., & Benítez, N. 2012, MNRAS, 420, 1167
- Benítez (2000) Benítez, N. 2000, ApJ, 536, 571
- Benítez et al. (2009) Benítez, N., Gaztañaga, E., Miquel, R., et al. 2009, ApJ, 691, 241
- Berlind et al. (2006) Berlind, A. A., Frieman, J., Weinberg, D. H., et al. 2006, ApJS, 167, 1
- Blaizot et al. (2005) Blaizot, J., Wadadekar, Y., Guiderdoni, B., et al. 2005, MNRAS, 360, 159
- Botzler et al. (2004) Botzler, C. S., Snigula, J., Bender, R., & Hopp, U. 2004, MNRAS, 349, 425
- Couch et al. (1991) Couch, W. J., Ellis, R. S., MacLaren, I., & Malin, D. F. 1991, MNRAS, 249, 606
- Courtin et al. (2011) Courtin, J., Rasera, Y., Alimi, J.-M., et al. 2011, MNRAS, 410, 1911
- Dalton et al. (1997) Dalton, G. B., Maddox, S. J., Sutherland, W. J., & Efstathiou, G. 1997, MNRAS, 289, 263
- Davis et al. (1985) Davis, M., Efstathiou, G., Frenk, C. S., & White, S. D. M. 1985, ApJ, 292, 371
- Díaz-Giménez (2002) Díaz-Giménez, E. 2002, Master’s thesis, FaMAF, Universidad Nacional de Córdoba
- Eke et al. (2004) Eke, V. R., Baugh, C. M., Cole, S., et al. 2004, MNRAS, 348, 866
- Farrens et al. (2011) Farrens, S., Abdalla, F. B., Cypriano, E. S., Sabiu, C., & Blake, C. 2011, MNRAS, 417, 1402
- Gal (2006) Gal, R. R. 2006, ArXiv Astrophysics e-prints
- Gal et al. (2000) Gal, R. R., de Carvalho, R. R., Odewahn, S. C., Djorgovski, S. G., & Margoniner, V. E. 2000, AJ, 119, 12
- Geller & Huchra (1983) Geller, M. J. & Huchra, J. P. 1983, ApJS, 52, 61
- Gillis & Hudson (2011) Gillis, B. R. & Hudson, M. J. 2011, MNRAS, 410, 13
- Gladders & Yee (2000) Gladders, M. D. & Yee, H. K. C. 2000, AJ, 120, 2148
- Goto et al. (2002) Goto, T., Sekiguchi, M., Nichol, R. C., et al. 2002, AJ, 123, 1807
- Guo et al. (2013) Guo, Q., White, S., Angulo, R. E., et al. 2013, MNRAS, 428, 1351
- Guo et al. (2011) Guo, Q., White, S., Boylan-Kolchin, M., et al. 2011, MNRAS, 413, 101
- Henriques et al. (2011) Henriques, B., White, S., Lemson, G., et al. 2011, ArXiv e-prints
- Huchra & Geller (1982) Huchra, J. P. & Geller, M. J. 1982, ApJ, 257, 423
- Ilbert et al. (2009) Ilbert, O., Capak, P., Salvato, M., et al. 2009, ApJ, 690, 1236
- Jian et al. (2013) Jian, H.-Y., Lin, L., Chiueh, T., et al. 2013, ArXiv e-prints
- Kepner et al. (1999) Kepner, J., Fan, X., Bahcall, N., et al. 1999, ApJ, 517, 78
- Kitzbichler & White (2007) Kitzbichler, M. G. & White, S. D. M. 2007, MNRAS, 376, 2
- Knebe et al. (2011) Knebe, A., Knollmann, S. R., Muldrew, S. I., et al. 2011, MNRAS, 415, 2293
- Knebe et al. (2013) Knebe, A., Pearce, F. R., Lux, H., et al. 2013, MNRAS
- Knobel et al. (2009) Knobel, C., Lilly, S. J., Iovino, A., et al. 2009, ApJ, 697, 1842
- Koester et al. (2007) Koester, B. P., McKay, T. A., Annis, J., et al. 2007, ApJ, 660, 221
- Komatsu et al. (2011) Komatsu, E., Smith, K. M., Dunkley, J., et al. 2011, ApJS, 192, 18
- Li & Yee (2008) Li, I. H. & Yee, H. K. C. 2008, AJ, 135, 809
- Liu et al. (2008) Liu, H. B., Hsieh, B. C., Ho, P. T. P., Lin, L., & Yan, R. 2008, ApJ, 681, 1046
- Merchán & Zandivarez (2002) Merchán, M. & Zandivarez, A. 2002, MNRAS, 335, 216
- Merchán & Zandivarez (2005) Merchán, M. E. & Zandivarez, A. 2005, ApJ, 630, 759
- Milkeraitis et al. (2010) Milkeraitis, M., van Waerbeke, L., Heymans, C., et al. 2010, MNRAS, 406, 673
- Miller et al. (2005) Miller, C. J., Nichol, R. C., Reichart, D., et al. 2005, AJ, 130, 968
- Mo et al. (2010) Mo, H., van den Bosch, F. C., & White, S. 2010, Galaxy Formation and Evolution
- Moles et al. (2008) Moles, M., Benítez, N., Aguerri, J. A. L., et al. 2008, AJ, 136, 1325
- Moles et al. (2010) Moles, M., Sánchez, S. F., Lamadrid, J. L., et al. 2010, PASP, 122, 363
- Molino et al. (2013) Molino, A., Benítez, N., Moles, M., et al. 2013, ArXiv e-prints
- More et al. (2011) More, S., Kravtsov, A. V., Dalal, N., & Gottlöber, S. 2011, ApJS, 195, 4
- Peacock (1999) Peacock, J. A. 1999, Cosmological Physics
- Pérez-González et al. (2013) Pérez-González, P. G., Cava, A., Barro, G., et al. 2013, ApJ, 762, 46
- Postman et al. (1996) Postman, M., Lubin, L. M., Gunn, J. E., et al. 1996, AJ, 111, 615
- Ramella et al. (2001) Ramella, M., Boschin, W., Fadda, D., & Nonino, M. 2001, A&A, 368, 776
- Ramella et al. (1989) Ramella, M., Geller, M. J., & Huchra, J. P. 1989, ApJ, 344, 57
- Shectman (1985) Shectman, S. A. 1985, ApJS, 57, 77
- Snaith et al. (2011) Snaith, O. N., Gibson, B. K., Brook, C. B., et al. 2011, MNRAS, 415, 2798
- Spergel et al. (2003) Spergel, D. N., Verde, L., Peiris, H. V., et al. 2003, ApJS, 148, 175
- Springel et al. (2005) Springel, V., White, S. D. M., Jenkins, A., et al. 2005, Nature, 435, 629
- Trevese et al. (2007) Trevese, D., Castellano, M., Fontana, A., & Giallongo, E. 2007, A&A, 463, 853
- van Breukelen & Clewley (2009) van Breukelen, C. & Clewley, L. 2009, MNRAS, 395, 1845
- Wang & White (2012) Wang, W. & White, S. D. M. 2012, MNRAS, 424, 2574
- Weinberg & Kamionkowski (2003) Weinberg, N. N. & Kamionkowski, M. 2003, MNRAS, 341, 251
- Wolf et al. (2004) Wolf, C., Meisenheimer, K., Kleinheinrich, M., et al. 2004, A&A, 421, 913
- Xu (1995) Xu, G. 1995, ApJS, 98, 355
- Zandivarez & Martínez (2011) Zandivarez, A. & Martínez, H. J. 2011, MNRAS, 415, 2553
Appendix A Testing the mock catalogue: non-interpolated galaxy positions and velocities
We performed an additional test using a different galaxy lightcone mock catalogue constructed using the original galaxy positions and peculiar velocities obtained from each simulation snapshot (hereafter, Non-Interpolated Positions and Velocities, NIPV).
The new mock catalogue comprises galaxies with absolute magnitudes brighter than up to , i.e., more galaxies than in the interpolated positions and velocities (hereafter, IPV) mock catalogue.
Following the procedure described in Sect. 3.1, we identified a new reference sample for the NIPV mock catalogue. A comparison between the resulting group sample for the NIPV mock catalogue and the original IPV mock catalogue is shown in Table 5.
From Table 5, it can be seen that the new reference group sample is only larger than the IPV group sample, and comprises more galaxies. To investigate intrinsic differences among the groups of both reference samples, we performed a member-by-member comparison. If we use the IPV group sample as reference, our comparison shows that of the NIPV groups are directly correlated with the IPV group sample, while only of NIPV groups are intrinsically different. On the other hand, using the NIPV group sample as reference, of the IPV groups are directly correlated with the NIPV group sample, while only of IPV groups are missing in the NIPV group sample. Therefore, from this two-way comparison, we conclude that both reference samples show a high level of statistical agreement.
|Mock||Total number||Total number of groups|
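A two-way, member-by-member comparison of the kind described above can be sketched as follows. This is only an illustration under an assumed matching rule (a group is counted as matched when more than half of its members belong to a single group of the other catalogue); the criterion actually adopted in this work may differ, and the function name `matched_fraction`, the parameter `min_shared` and the toy catalogues are hypothetical.

```python
from collections import Counter

def matched_fraction(groups_a, groups_b, min_shared=0.5):
    """Fraction of groups in catalogue A that have more than `min_shared`
    of their members inside a single group of catalogue B.

    Both catalogues map a group id to a set of galaxy ids.
    """
    # Look-up table: galaxy id -> id of the group that hosts it in B.
    owner = {gal: gid for gid, members in groups_b.items() for gal in members}
    matched = 0
    for members in groups_a.values():
        counts = Counter(owner[gal] for gal in members if gal in owner)
        if counts and counts.most_common(1)[0][1] > min_shared * len(members):
            matched += 1
    return matched / len(groups_a) if groups_a else 0.0

# Toy catalogues with made-up galaxy ids (not data from this work).
ipv = {1: {1, 2, 3, 4}, 2: {5, 6, 7, 8}}
nipv = {10: {1, 2, 3, 90}, 20: {5, 6, 91, 92}, 30: {93, 94, 95, 96}}

frac_nipv_in_ipv = matched_fraction(nipv, ipv)  # NIPV groups recovered in IPV
frac_ipv_in_nipv = matched_fraction(ipv, nipv)  # IPV groups recovered in NIPV
```

Running the comparison in both directions matters because the two fractions are not symmetric: one measures how many groups of the new catalogue have a counterpart, the other how many groups of the original catalogue are lost.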
Nevertheless, small differences in the positions/velocities of galaxies in both mock catalogues could still have an impact on the resulting computations of purity and completeness of the different group identifications carried out in this work. Hence, we have performed a second test in order to quantify the impact of using an NIPV mock catalogue on the results obtained in our work. On the NIPV mock catalogue, we performed the same procedure described in Sect. 3.3. First, we used the NIPV group reference sample and computed the maximum (and second maximum) velocity difference of the members in the line-of-sight to their closest neighbours. As expected from the very good statistical agreement among the reference samples, the values previously obtained in Sect. 3.3 are also the best values for the NIPV group sample, i.e., and . Second, we reproduced the test previously performed on the volume limited IPV mock catalogue to analyse the effect of distortions in redshift space, by performing an identification of groups in redshift space on the volume limited NIPV mock catalogue using four different linking length parameters in the line-of-sight direction: , , and . The percentages of purity and completeness of groups split into six categories obtained for the NIPV group samples are shown in Table 6. For a direct comparison, we also included the previous findings associated with the IPV samples.
|Class||Redshift Space -|
From the comparison with the values obtained for the IPV group samples, it is quite clear that identifying groups on an NIPV mock catalogue does not introduce statistically significant differences in the corresponding percentages of purity and completeness of groups. Therefore, we conclude that the adopted IPV mock catalogue used throughout our work does not introduce any particular bias in our results.
Appendix B Groups identified with higher contour overdensity contrast
Properties of groups of galaxies depend sensitively on the algorithm for group selection. In the past, groups of galaxies have been identified in observational catalogues with FoF linking lengths corresponding to different contour overdensity contrasts: 20 (Geller & Huchra 1983), 80 (Ramella et al. 1989; Merchán & Zandivarez 2002, 2005), 200 (Zandivarez & Martínez 2011) or 365 (Berlind et al. 2006). According to Knebe et al. (2013), it must be stressed that there is no right or wrong way; users of halo finder catalogues just need to be aware that several alternative definitions exist and which one of these has been used, especially when computing masses and other group properties.
In this section we apply a different contour overdensity contrast to identify groups in the mock galaxy catalogues. Some authors have argued that, since galaxies are more concentrated than dark matter, a higher contour overdensity contrast should be used (Eke et al. 2004; Berlind et al. 2006). Therefore, following these works, we modify the empirical contour overdensity contrast of Courtin et al. (2011), shown in Eq. 3, by lowering the original linking length parameter from 0.2 to 0.14. Note that at redshift z=0, this formula leads to compared to that has been used in the main body of this work. As stated previously in Sect. 3.1, it should be recalled that the redshift dependence only introduces a variation of the linking length parameter of in the whole redshift range.
With the aim of analysing the effect of a different overdensity on the performance of the group finder, we repeated all the stages of this work for this new identification. The new reference sample identified in real space comprises groups with more than 4 members. This sample has fewer groups than the sample identified with a lower contour overdensity contrast.
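As a rough guide to the size of this change, the short calculation below assumes the usual FoF scaling in which the contour overdensity contrast is proportional to the inverse cube of the linking length parameter. This scaling is an assumption made for illustration, not a restatement of Eq. 3, and the function name `contrast_ratio` is hypothetical.

```python
def contrast_ratio(b_old, b_new):
    """Factor by which the contour overdensity contrast changes when the
    linking length parameter goes from b_old to b_new, assuming the
    contrast scales as b**-3 (illustrative assumption)."""
    return (b_old / b_new) ** 3

# Lowering the linking length parameter from 0.2 to 0.14.
ratio = contrast_ratio(0.2, 0.14)
print(f"contrast increases by a factor of about {ratio:.2f}")
```

Under this assumption, reducing the linking length parameter by 30 per cent raises the contour overdensity contrast by nearly a factor of three, which is why a noticeably smaller group sample is expected.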
|Class||Flux limited||Redshift -||Sp-mock|
|Class||BPZ - Lorentzian -|
We tested the FoF algorithm against redshift space distortions and the flux limit. The appropriate linking length in the line of sight direction was determined in the same way as before: by measuring the maximum separation in the distorted radial direction to the closest neighbour (and second closest neighbour). We found no differences in the result. In Table 7, we quote the percentages of groups split into the different categories of purity and completeness. By comparing with Table 2, it can be seen that there is no change in the behaviour of the group finder. The samples obtained with a higher contour overdensity contrast exhibit the same purity and completeness as the sample obtained with a lower overdensity contrast. The final spectroscopic sample of galaxy groups comprises groups with more than 4 members, i.e., it has fewer groups than the sample obtained with the lower overdensity.
We also tested the algorithm in photometric redshift space and determined the best probability threshold in terms of purity and completeness. Table 8 shows the percentage of groups in the different categories of purity and completeness for different probability thresholds. It can be seen that the best compromise between purity and completeness is reached when adopting a probability threshold lower than in the identification with lower overdensity contrast. In this case, the best choice in terms of purity and completeness is , which produces a sample of groups with more than 4 members, which has fewer groups than the best sample identified with the lower overdensity contrast and . In this sample, the percentage of false groups is while the percentage of missing groups is .
In Fig. 16 we show the variation of the six categories of purity and completeness for the sample identified with as a function of redshift. This figure can be directly compared to Fig. 14 to observe that changing the contour overdensity contrast does not introduce major differences in the purity/completeness of the resulting sample provided the probability threshold is also changed.
Finally, in Fig. 17 we also show the behaviour of the purity of the sample when groups, identified with , are split into low (9323 groups) and high (1417 groups) membership. It can be seen that the low membership sample introduces the highest percentage of false identifications (P6), and therefore we recommend avoiding those groups when performing statistical analyses of group properties. This result is very similar to the one previously shown in Fig. 15 using a lower overdensity contrast in the identification process.