Robust identification of galaxies in simulations

Ghost busting: introducing a new, robust galaxy finder algorithm for simulations

Rodrigo Cañas, Pascal J. Elahi, Charlotte Welker, Claudia del P. Lagos, Chris Power, Yohan Dubois and Christophe Pichon
International Centre for Radio Astronomy Research, University of Western Australia, 35 Stirling Highway, Crawley, WA 6009, Australia
ARC Centre of Excellence for All Sky Astrophysics in 3 Dimensions (ASTRO 3D)
CNRS and UPMC Univ. Paris 06, UMR 7095, Institut d’Astrophysique de Paris, 98 bis Boulevard Arago, F-75014 Paris, France
Institute for Astronomy, University of Edinburgh, Royal Observatory, Blackford Hill, Edinburgh, EH9 3HJ, United Kingdom
Korea Institute of Advanced Studies (KIAS) 85 Hoegiro, Dongdaemun-gu, Seoul, 02455, Republic of Korea
E-mail: (RC)
Accepted XXX. Received YYY; in original form ZZZ

Identifying galaxies in hydrodynamical simulations is a difficult task, particularly in regions of high density such as galaxy groups and clusters. We present a new scale-free shape-independent algorithm to robustly and accurately identify galaxies in simulation, implemented within the phase-space halo-finder code VELOCIraptor. This is achieved by using the full phase-space dispersion tensor for particle assignment and an iterative adjustment of search parameters, which help us overcome common structure finding problems. We apply our improved method to the Horizon-AGN simulation and compare galaxy stellar masses (), star formation rates (SFR) and sizes with the elaborate configuration-space halo finder, HaloMaker. Galaxies living in halos with galaxy are the most affected by the shortcomings of real-space finders, with their mass, SFR, and sizes being times larger (smaller) in the case of host (satellite) galaxies. Thus, our ability to measure minor/major merger rates and disentangle environmental effects in simulations can be generally hindered if the identification of galaxies is not treated carefully. Though large systematic differences are obtained on a one-to-one basis, the overall Galaxy Stellar Mass Function, the Star Formation Rate Function and mass-size relations are not greatly affected. This is due to isolated galaxies being the most abundant population, dominating broad statistics.

methods: numerical – galaxies: evolution – cosmology: theory-dark matter
pubyear: 2018pagerange: Ghost busting: introducing a new, robust galaxy finder algorithm for simulationsB

1 Introduction

Galaxies are the result of a wide variety of physical processes. Their evolution and properties are determined by both their hierarchical assembly and the complex interplay between many multi-scale non-linear processes such as star formation, radiative cooling, and feedback loops (see Somerville & Davé, 2015, for a recent review). Cosmological hydro-dynamical simulations of galaxy formation are ideal laboratories to explore and isolate the effects of these physical processes on the evolution of galaxies in realistic environments (Dubois et al., 2014; Vogelsberger et al., 2014; Schaye et al., 2015). The advantage of these simulations over other numerical methods, such as abundance matching (e.g. Berlind & Weinberg, 2002; Berlind et al., 2001) and semi-analytic models of galaxy formation (Lacey & Cole, 1993; Cole et al., 2000; Kauffmann & Charlot, 1998) is the ability to predict the internal structure of galaxies, as the hydrodynamics that give rise to it is resolved through direct resolution of the equations of physics down to sub-galactic scales.

In recent years a major breakthrough in the capability of cosmological hydrodynamical simulations to produce realistic galaxy populations has taken place. This has been achieved thanks to the combined results of major improvements in numerical algorithms, availability of computing resources, improved subgrid models for unresolved feedback processes, and the calibration of subgrid feedback parameters to match key observables. Examples of this new generation of simulations include Horizon-AGN (Dubois et al., 2014), EAGLE (Schaye et al., 2015), Illustris (Vogelsberger et al., 2014) and IllustrisTNG (Pillepich et al., 2018). Simulated boxes of with sub-kpc resolution are becoming common. These simulations reproduce observables beyond those they were tuned for, with various degrees of success. For example, these simulations produce reasonable morphological diversity of galaxies, the colour bimodality of galaxies, the SFR-stellar mass relation, the stellar mass function and the cosmic SFR density evolution (e.g. Furlong et al., 2015; Genel et al., 2014; Trayford et al., 2015; Trayford et al., 2016; Snyder et al., 2015; Dubois et al., 2016; Nelson et al., 2018).

In order to understand the physics involved in the formation of galaxies through simulations, we first need to understand and test the extent to which such results depend on numerical effects rather than on the physics (e.g. Klypin et al., 1999). This issue has been pointed out over the years by several studies which have shown that properties of galaxies and galaxy populations sensitively depend on the specific code used, the implemented subgrid physics and their respective tuning, as well as numerical resolution (see e.g. Frenk et al., 1999; Kim et al., 2014; Power et al., 2014; Knebe et al., 2015; Scannapieco et al., 2012; Schaye et al., 2015; Elahi et al., 2016; Sembolini et al., 2016a, b).

Often overlooked is the issue of the robustness with which we can measure galaxy properties in these simulations that can affect the conclusions reached. The latter ultimately depends on how well we identify structures in the simulations (Knebe et al., 2011, 2013b). These issues are of particular interest for the new and coming generation of hydrodynamical simulations, which have taken the route of fine tuning the free parameters of the subgrid physics modules (i.e. which describe the processes that are expected to take place at scales below the resolution limit) against a desired observable (e.g. the galaxy stellar mass function, GSMF, and the size-mass relation, Crain et al. 2015). Robustly measuring the desired galaxy property to perform the tuning in simulations is therefore crucial.

In the first studies of hierarchical formation, simple structure finding algorithms, such as spherical over-density (SO, Press & Schechter, 1974) and Friends-of-Friends (FOF, Davis et al., 1985), were able to give a reasonable estimation of “condensed” structures in simulations. However, with the ever increasing size of simulations and the need of higher accuracy in measurements, such simple approaches are not necessarily optimal, and a large number of codes have appeared in the literature addressing the finding of structures in simulations (see Knebe et al., 2011, 2013b, and references therein). Early approaches have been characterised by using solely configuration-space information (e.g. bdm, Klypin & Holtzman 1997; hop, Eisenstein & Hut 1998; skid, Stadel 2001; subfind, Springel et al. 2001; ahf, Gill et al. 2004), while more recent sophisticated algorithms have addressed the problem adding the velocity-space information (e.g. 6dfof, Diemand et al. 2006; hsf, Maciejewski et al. 2009; VELOCIraptor, Elahi et al. 2011; rockstar, Behroozi et al. 2013) Although all these algorithms attempt to solve the same problem, the specific details of each implementation can introduce artifacts in the final results.

It is essential that we understand the reliability of measurements and the associated systematic uncertainties. This has been addressed by many comparison projects in which structure finding codes are tested against the same data to study the similarities and differences on the measurements of the properties of dark matter haloes (Knebe et al., 2011), subhaloes (Onions et al., 2012), galaxies (Knebe et al., 2013a) and tidal structures (Elahi et al., 2013). Such studies have found overall agreement when analyzing dark matter halo populations (Knebe et al., 2011). However, large differences are obtained on the overall mass recovered for dark matter subhaloes, satellite galaxies and tidal streams (Knebe et al., 2013b; Onions et al., 2012; Knebe et al., 2013a; Elahi et al., 2013). While the identification of substructures depends on the identification of density peaks, the major challenge is to assign the “background” particles to statistically significant density peaks which can affect drastically the properties of the structures. For this reason, algorithms that only use configuration space information, although fast, struggle to identify appropriately subhaloes in dense environments (e.g. galaxy groups and clusters, and merging systems), while finders that include also include velocity-space information obtain better results in these regimes (Knebe et al., 2011).

This paper presents a new galaxy finding algorithm which makes use of the full configuration and velocity space information, and presents a thorough study of the effects that the identification method has on the properties of individual galaxies and galaxy populations. This implementation is an extension of the halo-finder code VELOCIraptor (Elahi et al., 2011, Elahi et al. in prep). We pay special attention to two regimes that have been traditionally challenging for galaxy finding algorithms: (i) mergers and interactions and (ii) identification of substructures in high density environments. The main problem in both of these regimes is that the outskirts of hosts and satellite structures can have similar densities, making it difficult to distinguish to which structure they belong. This is even harder if only configuration space information is taken into account. These problems are equally valid for dark matter haloes and galaxies, while there is a plethora of literature that addresses the former (see for reference Knebe et al., 2013b), the latter has not yet been thoroughly addressed. Galaxies have a range of morphologies which during interactions produce complex stellar structures that form on an already significant density peak. Thus, the problem of identifying galaxies cannot be solved using dark matter halo finding tools. We show that the undesirable consequences of poor identification affect radial mass profiles, sizes and total masses. We apply our new galaxy finding algorithm to the state-of-the-art cosmological hydrodynamical simulation Horizon-AGN (Dubois et al., 2014) and compare our results with the original galaxy catalog, which was obtained by applying the configuration space finder HaloMaker (Aubert et al., 2004; Tweed et al., 2009).

This work is organised as follows: In Section 2 we provide a general and brief description of the code VELOCIraptor and the Horizon-AGN simulation. In Section 3 we describe in detail the improved algorithm to identify galaxies in simulations and implemented in VELOCIraptor. In Section 4 we present examples of the performance of our new algorithm on strongly interacting scenarios. In Section 5 we compare results obtained with VELOCIraptor and the original Horizon-AGN galaxy catalog on a galaxy-to-galaxy basis, as well as comparing the entire galaxy populations. Discussion is presented in Section 6, and summary and conclusions are presented in Section 7. Lastly, in Appendix A we show how configuration space linking length affect galaxy delimitation, and in Appendix B we show different weights affect particle assignment.

2 Numerical Methods

In this section, we briefly describe the Horizon-AGN simulation, and the structure finding code VELOCIraptor. For further details the interested reader is referred to Dubois et al. (2014), where the Horizon-AGN simulation was presented, and to Elahi et al. (2011) for a detailed description of VELOCIraptor.

2.1 Horizon-AGN Simulation

Horizon-AGN is a state-of-the-art hydrodynamical simulation, presented in Dubois et al. (2014). It follows the formation and evolution of galaxies in a standard cold dark matter (CDM) cosmology, adopting values of a total matter density , dark energy density , amplitude of the linear power spectrum , baryon density , Hubble constant km s  Mpc , and spectral index , in concordance to results from the Wilkinson Microwave Anisotropy Probe 7 (WMAP7, Komatsu et al., 2011).

The simulation was run using the adaptive mesh refinement (AMR) code ramses (Teyssier, 2002), and it has a comoving box size of Mpc, a total of 1024 dark matter particles with mass ; and an initial number of 1024 gas cells, which are refined up to seven times to a maximum physical resolution of kpc.

Implemented subgrid physics include: gas cooling, heating from a uniform redshift-dependent UV background, star formation, stellar feedback driven by supernovae (SNe) Type Ia, II and stellar winds, and black hole (BH) accretion and its associated active galactic nucleus (AGN) feedback. Following Dubois et al. (2012), BHs are created with a seed mass of , and grow according to a Bondi-Hoyle-Lyttleon accretion scheme capped at Eddington accretion rate. A two-mode AGN feedback is explicitly implemented as a bipolar outflow at accretion rates smaller than 1% the Eddington accretion (Dubois et al., 2010), and as an isotropic thermal energy injection otherwise (see Dubois et al., 2014; Volonteri et al., 2016, for further details).

Galaxies in Horizon-AGN were originally identified with HaloMaker, which uses AdaptaHOP structure finder based on identification of saddle points in the density field (Aubert et al., 2004; Tweed et al., 2009). Using star particles information only, the local density for each particle is calculated using 20 neighbours, and a local threshold of times the average total matter density is applied to select relevant densities. A minimum physical size above which substructures are considered relevant of 2 kpc is also applied. Only galactic structures with more than 50 star particles are considered.

Horizon-AGN has been used to study the alignments between the spin of galaxies and the cosmic web filaments, and how mergers change the spin orientation of galaxies (Dubois et al., 2014; Welker et al., 2014). Its BH growth and AGN feedback implementations have succeeded in producing a BH population whose overall properties agree with observations (Volonteri et al., 2016), and have shown the importance of AGN feedback in helping the simulation to reproduce the observed morphology and kinematic properties of massive galaxies (Dubois et al., 2016). Additionally, the simulation has also been used to study the evolution of the galaxy luminosity and stellar mass functions, star formation main sequence and galaxy colours (Kaviraj et al., 2017)111Further research projects and publications can be found in the Horizon-AGN simulation website (

2.2 VELOCIraptor

VELOCIraptor (also known as stf, Elahi et al., 2011) is a structure finding algorithm capable of identifying dark matter haloes, galaxies and substructures such as satellite subhaloes and streams in simulations. Here we briefly summarize the algorithm presented in Elahi et al. (2011).

The standard VELOCIraptor’s algorithm is based on the assumption that the velocity distribution of a system composed by many objects can be split into a smooth background component with overdense features in it. The former would correspond to the main halo, and the latter to the substructures embedded in it. Hence, substructures are found by identifying the particles whose local velocity density stands out from the expected background velocity density , effectively looking for clustering in orbit space.

In order to calculate these quantities for each particle, the main halo is split in volume cells using the KD-tree algorithm (Bentley, 1975). This is done such that each cell contains enough particles to minimize statistical errors, but not too many to avoid variations in the gravitational potential and velocity density in each cell. The expected background velocity density, , is estimated as a multivariate Gaussian. Hence, for a particle with velocity


where and are respectively the local average velocity and velocity dispersion tensor about , at the particle’s position . To accurately determine and , the and of each cell are calculated, and these quantities are linearly interpolated to the particle’s position using the cell containing the particle and six neighboring cells. For each cell




where and are the particle ’s mass and velocity respectively, and and are the number of particles and mass of the cell , respectively. Finally, the local velocity density is calculated using a smoothing kernel scheme from velocity-space nearest neighbours.

For each particle , the logarithmic ratio of the local and background velocity distributions


is calculated. Particles with above a threshold are kept and classified as potential substructures.

Once the outlying particles are found, they are clustered into groups using a Friends-of-Friends (FOF, Davis et al., 1985) motivated algorithm. Particles and are grouped if:


where and are a particle’s position and velocity respectively, is the configuration-space linking length, is the velocity ratio threshold determining the range in which the norm of the particles’ velocities are considered to be similar, and is an opening angle threshold within which directions of the particles’ velocity vector must align. This effectively means that particles in a group need not only to be physically close, but they also need to be close in orbital space.

VELOCIraptor has been employed in several comparison projects that have confirmed its versatility and ability to accurately find structures and substructures in -body and hydrodynamical simulations (e.g. Knebe et al., 2013a; Elahi et al., 2013; Behroozi et al., 2015; Onions et al., 2012). An updated version of the code along with new features and tools will be presented in Elahi et al. in prep.

3 Robust Identification of Galaxies

VELOCIraptor was originally designed to find dark matter structures in simulations, including haloes, subhaloes and dark matter streams. While it has also been used to identify galaxies in hydrodynamical simulations (Knebe et al., 2013a), the treatment of the baryonic component was limited to first identifying dark matter (sub-)haloes, and then linking gas and stellar particles to the nearest dark matter particle in phase-space. Though this procedure in principle provides a phase-space assignment of baryons to dark matter haloes, there were two key aspects that needed improvement. First, the metric used for baryon assignment was quite simple, which could cause incorrect assignment of particles especially for non-spherical or complex geometries, which are particularly present in interacting galaxies. Secondly, for some interacting galaxies, the dark matter haloes might be indistinguishable, assigning the merging galaxies to the a single halo.

These problems could be solved by running VELOCIraptor independently over stellar particles to identify galaxies. However, the original VELOCIraptor algorithm assumes the existence of a smooth, semi-virialised background. The code was not optimised to find substructures in any system where the background is sparsely sampled.

Here, we describe a new algorithm that uses the tools already implemented in VELOCIraptor to perform fast and efficient phase-space FOF searches, but modifying several search and assignment criteria to get the desired robustness in the identification of galaxies.

Figure 1: Summary of the algorithm to find galaxies with VELOCIraptor introduced in Section 3. Structures are search by separating particles in the simulation in 3DFOF objects (1), and posteriorly doing a 6DFOF search (2). Then, an iterative 6DFOF search is done in each of these objects to look for dense cores of galaxies in close interactions or mergers (3). Once cores are found they are grown by assigning particles in the original 6DFOF object (4). Finally, properties of all objects found are calculated and galaxies are selected according to these properties (5). A key aspect of this algorithm is the particle assignment procedure (core growth), as even in the presence of satellites close to the host centre (6DFOF Core objects in purple), we can obtain smooth profiles (see central galaxy in orange). See text for further details.

3.1 An Improved Algorithm to Identify Galaxies

Here we present the improved algorithm to identify galaxies in simulations. A schematic representation is shown in Fig. 1.

3.1.1 Step 1 - 3DFOF

First we perform, a configuration-space FOF search (3DFOF), described by equation (5), on the star particles. By grouping particles that are close in physical space, this search provides us with a ‘first guess’ of where the overdense structures of interest (i.e. galaxies) are located. Since its introduction in Davis et al. (1985), this first step is commonly used by many finding algorithms (e.g. Subfind, HaloMaker, Rockstar, Springel et al., 2001; Aubert et al., 2004; Tweed et al., 2009; Behroozi et al., 2013) due to its simplicity and versatility. For cosmological simulations, a widely adopted scheme is


where is the configuration-space linking length, is the simulation’s mean inter-particle spacing, and . We adopt a value of which is a common choice used in dark matter only cosmological simulations (e.g. Boylan-Kolchin et al., 2009; Schaye et al., 2015; Vogelsberger et al., 2014).

3.1.2 Step 2 - 6DFOF

Galaxies are centrally concentrated distributions of stars in configuration and velocity space. In simulations, the positions and velocities of the constituent particles are expected to be found close in phase-space. Galaxies are identified by performing a phase-space FOF (6DFOF) search separating each 3DFOF object into kinematically distinct substructures. Particles and are linked into 6DFOF groups if and only if


where and are the configuration space and velocity linking lengths, respectively.

We stress that appropriate values of and have to be chosen in 6DFOF searches. If a very large value of is adopted, this would result in a velocity-only FOF search and vice-versa; while very small values of linking lengths would result in either splitting single structures into multiple components, or missing structures.

At this point we are interested in separating structures that have been found in a common 3DFOF envelope. For this purpose is chosen to be a function of and is estimated from the velocity dispersion of the full 3DFOF object




Here , is the velocity dispersion in the direction, and is a user defined parameter which should be of order unity. As local properties of each 3DFOF object are used for its 6DFOF search, we are effectively performing a ‘tailored’ 6DFOF search 222Consider trying to link particles belonging to a Gaussian distribution. Its dispersion, , provides a good starting point for linking length..

Intuitively it would be more consistent to compute using similar arguments as for . However, due to the complexity of the environment in which some galaxies reside, measurements of position dispersion of the particles would actually result in very large values of . This is especially the case for galaxy groups and clusters where particle bridges between galaxies make 3DFOF structures too extended. By shrinking the configuration-space linking length, the 6DFOF search will only find structures with higher over-densities than the one of the 3DFOF object. A similar argument can be stated against using equation (11), as large 3DFOF objects are expected to have very large velocity dispersion, and consequently very large values of . However, in this case we do not have a priori knowledge of what the scale of the velocity linking length should be, as this is the first 6DFOF search, provides a good first estimation of . A more extensive discussion on the choice of configuration space and velocity space linking lengths at this step of the algorithm can be found on Appendix A.

3.1.3 Step 3 - Iterative 6DFOF core search

Although the 6DFOF search should already have separated galaxies with distinct phase-space distributions, multiple galaxies can still be found in single 6DFOF groups. This is the case of merging galaxies whose outskirts have phase-mixed to some degree but whose cores (dense kinematically cold galactic centres) have not yet fully merged, or satellites that orbit close to the centre of a much bigger galaxy. Instead of trying to recover a group in its entirety, we adopt a different approach and attempt to isolate their cores. In order to separate galaxies in these structures we perform an iterative 6DFOF core search for each preliminary 6DFOF group. For this iterative 6DFOF core search we use the same criteria as equation (9) to link particles, but using a different choice of linking lengths, which for clarity will be identified with the subscript . These linking lengths scale with the dispersion of the system being searched.

FOF algorithms, particularly when used in an iterative fashion, are sensitive to the choice of linking parameters: too large and separate structures can be joined; too small and structures can be fragmented. Rockstar (Behroozi et al., 2013), which uses a 6DFOF to recover groups in full, addresses the latter problem by merging groups if their centres are closer than a phase-space distance threshold to clean for false positives. Although useful, our approach is oriented towards a robust search of the densest portions of groups, followed by carefully growing candidate cores, and does not solely rely on the effectiveness of cleaning procedures. Therefore, we first set appropriately the search parameters, which are then modified in each iteration.

For the initial velocity space linking length we adopt


where is the length of the largest principal axis of the velocity dispersion tensor, . As for the first 6DFOF search, equation (12) sets the scale for the initial velocity space linking length. For the following iterations is iteratively shrunk, i.e.


where the super-script indicates the iteration level, and is a user-defined shrinking factor. Small values of lead to fewer iterations, hence less use of computational resources, but can result in missing the identification of objects; while despite being a conservative value and more likely to find all structures, can lead to a large number of iterations to separate structures, and consequently more use of computational resources, especially for major mergers. We find values of separate structures while minimizing the number of iterations. Smaller values often lead to unidentified structures. By shrinking the velocity space linking lengths this way, we remove the wings and bridges in the distribution, because in each iteration we truncate the original distribution towards the coldest regions, separating cores.

The adopted configuration space linking length here is


where is the length of the largest principal axis of the configuration dispersion tensor, , and is the number of particles in the 6DFOF group. Equation (14) is then the mean inter-particle spacing in a radius sphere. This linking length scales with configuration-space dispersion and the extent to which the distribution is well sampled. The logic of including a scaling that decreases the linking length with increasing number of particles is as follows. With a well-sampled distribution, the 3 scaling used will link not only the central region but the outskirts as well, possibly joining this distribution with neighbouring ones. Decreasing the linking length, if well sampled, reduces the likelihood of artificially joining structures. Conversely, if poorly sampled, the measured dispersion will underestimate the true one. Therefore, relative to a well sampled system, we scale up the linking length.

Although at this stage the iterative 6DFOF search is done to separate structures, configuration space linking length is kept fixed through iterations. We could in principle modify by some factor at each iteration as is done for . However, equation (14) already includes the information on how concentrated the distribution (6DFOF object) is in configuration space. Reducing value will likely cause that we either miss or fragment structures. Our approach requires a fixed short enough to separate structures in configuration space, and a long enough to gather statistically significant groups of particles. In each iteration is shrunk to separate structures that might be linked by their velocity-space outskirts.

Parameter Value Reference
0.2 Equation  8
0.2 Equation  10
1.0 Equation  11
0.8 Equation  13
1.5 Equation  15
8 Section  3.1.3
50 Section  3.1.3
0.5 Equation  20
Table 1: Suggested values for the parameters used for galaxy identification with VELOCIraptor.

For each FOF search, a minimum particle number, has to be set to define statistically significant structures. For steps 1 and 2 (Sections 3.1.1 and 3.1.2) we suggest a . For the iterative search, however, is updated after each iteration as


where is the minimum number of particles, and superscript indicates the iteration level. Increasing the minimum number of particles while shrinking linking lengths may sound non-intuitive at first as we expect to link fewer particles per group in each iteration. However, as the linking length becomes smaller, it also becomes easier to identify small phase-space overdense (noisy) patches in the distribution, which can result in finding multiple spurious structures. Iteratively increasing reduces the likelihood of finding noisy patches. Similarly to other parameters, poorly chosen can either lead to the identification of many small structures if , or miss the identification if large values of are chosen. Values of are recommended.

A more intuitive choice of would be one that scales with the number of particles in a given group or iteration level, instead of choosing a fixed for all searches. However, bearing in mind that the number of particles can differ by orders of magnitude between galaxies in the same system, even using a logarithmic scale of the number of particles can lead to , and consequently to very large in a couple of iterations.

This iterative 6DFOF search starts with the entire 6DFOF object. For subsequent iterations the 6DFOF search is done only for the largest core. This prevents the loss of an already found structure due to the increment of . These cores are kept for particle assignment (core growth, Section 3.1.4) and are revisited later to look for possible mergers or close interactions. Iterations on the largest core stop when a user-defined maximum number of iterations, , has been reached, or when no more structures are found with the current iteration level search parameters.

3.1.4 Step 4 - Core growth

The critical step once cores are identified is assigning particles to these cores, reconstructing the galaxies. We assign particles that belong to the original 6DFOF structure (step 2, Section 3.1.2) that are not member of a core. This process is crucial as the final product of structure searches (either galaxies or dark matter halos) can be severely affected by how this is done.

Given the phase-space nature of the 6DFOF searches, the obvious criteria would be to assign a given particle to the closest core in phase-space. This concept has been previously used by other algorithms, but several implementations can exist. A naive 6D phase-space distance as implied by Behroozi et al. (2013), implicitly assumes a spherical morphology. This might work well for dark matter haloes but can lead to systematic effects due to the complex morphologies of galaxies.

Instead, starting at level , we characterize the phase-space distribution of each core , by calculating its mean (phase-space centre-of-mass vector), and phase-space dispersion tensor (distribution’s covariance matrix),


Here, and are the total mass and the total number of particles in the core , respectively; is the coordinate of the phase-space coordinate vector of particle with mass , that belongs to core . Then, for all the particles at that were not assigned to any core at level , we calculate


Here is the phase-space distance from untagged particle to core and is a weighting constant. A weighting scheme is necessary to avoid assigning too many particles to tidal streams and shells. Without a weighting, this could happen as these structures can be quite extended and have large position and velocity dispersion compared to those of galaxies (compact centrally concentrated distributions). To compensate for this, we adopt


with a free parameter. Taking can cause all particles to be assigned to the largest object, again, as galaxy masses in the same system can differ by orders of magnitude. Values of give a that scales with tidal radius. We have found that leads to good results; we justify this choice of in Appendix B.

After calculating these distances, particles are assigned to the closest core in phase-space. When a single core is found at level , all untagged particles at the previous level, are assigned to that single core. Then, and are recalculated for all the cores in the following levels and the process is repeated until all particles in the original 6DFOF group have been assigned to a core.

This approach is particularly powerful for many reasons: (i) it effectively takes into account the shape and orientation of the distribution; (ii) it allows the shape of the distribution to change from the inner to the outer parts; (iii) this produces smooth density profiles for the galaxies even when galaxies are passing through the inner radii of larger galaxies. Hence, galaxies will not have missing holes or bubble-like structures (see Figs  1 and 3 for some examples). This is essential when measuring galaxy properties’ radial profiles.

For each 6DFOF object (step 2, Section 3.1.2) the algorithm continues as follows. After performing step 3 (Section 3.1.3) on the largest core, particles are assigned to all cores inside following step 4. The top hierarchy level, , is assigned to the largest core (candidate central galaxy). The rest of the cores will have hierarchy level . Steps 3 and 4 are then repeated for all substructures. If any sub-substructures are found they are assigned a hierarchy level , and so on. The algorithm finishes when all (sub)structures have been iteratively searched.

We show in Table 1 suggested values for free parameters of the algorithm.

3.1.5 Step 5 - Selecting galaxies

Once all (sub)structures have passed through the iterative core search and their respective core growth, bulk properties of the structures are calculated to determine if they are galaxies or not. This is necessary because the versatility of the algorithm allows us to identify not only galaxies but also tidal features such as streams and shells. This catalogue can be cleaned if only galaxies are desired.

We classify objects as galaxies or streams following Elahi et al. (2013). We calculate the ratios and of the eigenvalues, , of the position and velocity dispersion tensors for all the structures, as well as the bound fraction of particles . A structure is not considered as a galaxy if


that is, galaxies are expected to be bound ellipsoidal distributions of stars. Structure with less than 1% of bound particles are unlikely to be galaxies. Highly elongated structures either in configuration or velocity space (i.e. low values of , , , and ), which can be bound to some degree, are likely to be streams or shells. The fraction of bound particles is kept to such low thresholds, as neither gas nor dark matter information is taken into account when computing the gravitational potential.

Figure 2: Projected stellar density for a major merger in Horizon-AGN simulation in configuration space and velocity space, as labelled. shows the galaxies before the merging occurs, and during the merger. For clarity, for the velocity space visualization at , galaxies are shown both individually and as part of the same velocity space. Galaxies identified with HaloMaker and VELOCIraptor at are shown. We show for each space the projection in which particle distributions are most distinguishable. It can be seen that although HaloMaker is able to identify two galaxies, it appears that for the small galaxy only the core of it is identified as an individual one, while its outer regions are assigned to its companion. Due to its phase-space implementation for search and core growth, VELOCIraptor is able to find both galaxies and provide a better estimate of their mass and size. The horizontal line in the merger inset shows a length of 20 kpc (200 km s), which is the same for all the configuration (velocity) space insets.

3.1.6 Intra-halo stellar component

Once galaxies have been identified inside a 3DFOF object, the remaining stellar particles are kept and labelled as Intra-Halo Stellar Component (IHSC). The extent, distribution and shape of this component relies on the definition itself of galaxies (see Appendix A and Fig. 11). The IHSC is therefore all the material that is kinematically different enough from the distribution of any structure in the 3DFOF object. This diffuse component can be associated to either extended stellar haloes on Milky Way like systems, up to Intra-Cluster Light in densely populated environments. In-depth analysis of the IHSC is beyond the scope of this work; thus, we address this in upcoming studies (Cañas et al, in prep).


This core growth method has been also implemented in the VELOCIraptor algorithm to find dark matter haloes.

We note that none of the finding algorithms is exempt from finding undesired (spurious) structures. Although for this study most of such structures are removed from our galaxy catalogue with the criteria described in equation (21), some spurious structures can still be present if they happen to be not very elongated in phase-space and are marginally bound. We leave methods and discussions on this matter for the upcoming VELOCIraptor paper (Elahi et al., in prep).

As it is shown in the following sections, this algorithm is quite efficient and powerful to find galaxies at all simulation-resolved mass scales, in all environments. We note, however, that this is not the definitive method for finding simulated galaxies because we do not include baryons in the form of gas. Hence, we may miss gas-dominated dwarf galaxies, which would have very few stellar particles or with a bound fraction of particles below our adopted threshold. This is anyway solved by applying conservative particle number thresholds when selecting galaxies. In the future we plan to link gas to galaxies in a similar fashion as we do in the core growth, but to do this properly we need to take into account the thermal energy of the gas. This needs to be carefully implemented to include both particle-based and mesh-based algorithms. Further discussion on this matter is beyond the scope of this paper and is left for future studies.

Figure 3: Projected stellar density of the two most massive galaxies found by HaloMaker (middle row), their respective galaxy cluster (top row), and their VELOCIraptor counterparts (bottom row). Although both codes are able to identify the central galaxy, HaloMaker fails to separate stellar content that belongs to other galaxies. To emphasize the full extension of the galaxies, insets show a zoomed-out visualization of the same objects. Panels (insets) have a box size of 600 (2000) kpc.

4 Case Studies

Here we present two case studies in which we compare the results of the improved algorithm of VELOCIraptor and the galaxies from the original catalogue identified with HaloMaker. With these case studies we address the most challenging cases for galaxy identifications, which our new algorithm solves well: (i) strongly interacting and merging galaxies and (ii) robust identification and particle assignment in dense environments, such as galaxy groups and clusters.

4.1 Close interactions

Structure finding algorithms have been known to struggle to produce robust results when trying to separate dark matter haloes and galaxies in the process of merging (Knebe et al., 2011; Behroozi et al., 2013, 2015). The reason behind this problem is that as structures start to get closer, the particle distributions that describe them start to mix, and separating them becomes a complicated task. For FOF finders, particle mixing creates bridges between the centres of the structures that link them together; while for density threshold algorithms, the mixture of the distribution reduces the contrast between peaks and saddle points in the density field, making it more difficult to identify correctly the components. As particle distributions also mix in phase-space, even iterative procedures can struggle to find peaks, and to assign particles correctly to structures, hence host and substructure identities can be swapped between snapshots (see e.g. Behroozi et al., 2015; Poole et al., 2017). Here we show how our improved galaxy finding algorithm performs in such cases.

We show an example of a close merger in Fig. 2. At a given snapshot, , the galaxies are still separated, and have masses of and respectively, giving a merger ratio of . In a subsequent snapshot, , HaloMaker identifies two galaxies with very different masses of and respectively, corresponding to a merger ratio of . During a merger, we expect some of the mass of one galaxy to be accreted by the other. However, from visual inspection we can tell that the galaxies have not been well separated by HaloMaker, as it seems that only the core of one of them is identified as an individual galaxy, while its outer parts have been assigned to its companion. Although two galaxies are identified, the mass of the smallest galaxy is underestimated, while the mass of the larger one is overestimated.

We ran VELOCIraptor on the same merger and it can be seen from simple visual inspection that a better result is obtained, despite the complexity of the interaction. The recovered masses of the galaxies are and , giving merger ratio of . This is in much better agreement with what is measured at , when galaxies were far enough as to be easily identified by a 3DFOF algorithm. It can be seen that not only both the galaxy centres are found, but also the shapes of the galaxies are well recovered thanks to the improved particle assignment (core growth) implementation. Correctly assigning particles to galaxy centres is crucial for an accurate estimation of the overall properties of galaxies. It affects the ratio of the merger, which in turn can affect the overall minor and major merger rate estimates, especially when only single snapshots are taken into account.

4.2 Groups and Clusters of Galaxies

Galaxy identification can be a complex task in galaxy groups and clusters. Stripped material from multiple interactions generates particle bridges and decreases the contrast in the density field, causing similar problems as the ones discussed in Section 4.1. Robust identification of galaxies in such systems is crucial as it can affect a very large number of galaxies. This can in principle affect environmental studies, as well as impact on galaxy population measurements.

We show in Fig. 3 two galaxy clusters in Horizon-AGN that host the two most massive galaxies identified by HaloMaker. We show projected stellar density of the full 3DFOF structure (step 1, Section 3.1.1), the central galaxy identified with HaloMaker, and the VELOCIraptor counterpart; a zoomed-out visualization of the objects is shown in the insets.

We can see that both codes are able to identify correctly a single peak in the central galaxy, meaning that there is no contamination from undetected satellite galaxies. In the zoomed-out images it can be seen that HaloMaker tends to assign a large number of particles to the central galaxy that belong to other galaxies in the cluster. This leads to the odd bubble shapes observed for the second HaloMaker galaxy on its top, and bottom-right in the zoom-out inset. This problem causes the mass and size of the central galaxy to be overestimated. On the other hand, because it searches for structure in phase space, VELOCIraptor is able to identify kinematically distinct structures, which results in a better delimitation of the galaxy’s boundaries.

This example also demonstrates how crucial the particle assignment is for the robust identification of structures. Although both finders are capable of identifying the cores of the central and satellite galaxies, galaxies are greatly different due to particle assignment. HaloMaker produces a central galaxy that has stars belonging to an orbiting satellite (bubble-shaped feature). On the other hand, due to the improvements of particle assignment using phase-space dispersion tensors, VELOCIraptor is able to separate distinct components even if their distributions overlap. This produces not only a better estimation of the masses of the galaxies, but also allows us to recover smooth density profiles of the galaxies, which is important if we are interested in studying radial profiles of galaxy properties.

This problem is not unique of HaloMaker, but of structure finding codes in general. Structure finding codes tend to focus on the identification of peaks in density fields (either configuration, velocity or phase-space), but the assignment has not been well addressed (Knebe et al., 2011). This problem can be tackled in many ways, the most widely implemented one involves unbinding procedures in which particles are removed from the satellites and returned to host structures. This procedure would need to take into account all mater contribution (gas, stars, and dark matter), while here we are only dealing with stars. Although the latter is physically motivated, it is highly probable that particles belonging to a background whose position overlaps with that of a substructure have the right velocity to be considered as bounded to the satellite. This will add mass to the satellite galaxies, and remove particles from the central one, producing the bubble-like features seen in the outskirts of central galaxies.

5 Results

In this section we study the differences between the HaloMaker and VELOCIraptor. For this purpose we generate a new galaxy catalogue for the Horizon-AGN simulation using our improved algorithm. In Section 5.1 we compare the catalogues on a galaxy-to-galaxy basis to study how much galaxy properties can be affected by the identification method. In Section 5.2 we investigate how differences in the identification can affect measurements of galaxy population properties.

5.1 Galaxy-to-Galaxy Comparison

We investigate differences between the finders by performing a galaxy-to-galaxy comparison. We matched galaxies using TreeFrog (Elahi et al. in prep, Poulton et al. in prep), a tool included in the VELOCIraptor repository to construct merger trees for simulations, which can also be used as a catalogue correlator. Galaxies in a reference catalogue are matched by finding the structure in the second catalogue which shares the most particles. We do this by looking at the individual particle IDs that belong to the galaxies and computing a merit function


Here, and are the total number of particles in structures and respectively, and is the number of shared particles, i.e. that exist both in and . This method ensures that galaxies in one catalogue are matched to the galaxy in the other catalogue that is most similar in particle members and that shares a large fraction of those.

We compare the total stellar mass , SFR, and sizes of matched galaxies by computing


which is the ratio between the above mentioned quantities, , as measured for the HaloMaker galaxy, over the one measured for its VELOCIraptor counterpart. In order to make a proper comparison and avoid resolution effects, we only show of galaxies whose total stellar mass is greater than in both catalogues, and only matches with (galaxies sharing % of particles) are shown.

Figure 4: Distributions of the mass ratio at (left panels), (middle panels), and (right panels) for different stellar mass ranges, as labelled. The contribution from isolated galaxies and loosely interacting galaxies is shown as a dashed line, from interacting galaxies as a dotted line, and the combined distribution as a solid line. For strongly interacting galaxies the contribution from hosts is shown as shaded blue region, while for satellites as shaded red region. Vertical dashed lines are shown as reference at dex from an exact match ().

To properly account for the cases shown in Section 4, we labelled galaxies depending on their degree of interaction as:

  • Isolated - The galaxy is the only structure found in the initial 3DFOF envelope.

  • Loosely interacting - The galaxy belongs to a 3DFOF object with multiple structures, and no structures were found in its iterative search, i.e. a single structure in a 6DFOF object.

  • Strongly interacting - The galaxy belongs to a 3DFOF object with multiple structures, and one or more additional structures were found in the 6DFOF object iterative search. The most massive galaxy in the 6DFOF object will be referred to as host, otherwise as satellite. Note that strongly interacting hosts are not necessarily central galaxies of the 3DFOF object (e.g. most massive galaxy of an interacting pair falling into a galaxy cluster).

A total of 81,583 matches fulfil the above criteria at . Of those, a total of 54,113 are isolated; from the remaining 27,470 interacting galaxies, 16,851 (10,619) are loosely (strongly) interacting. Only 0.096% of HaloMaker galaxies with do not have a counterpart identified in VELOCIraptor.

5.1.1 Total Stellar Mass

We measure the impact of identification on one of the most fundamental properties of a galaxy, the total stellar mass, . We show in Fig. 4 the distributions of the ratio for galaxies in VELOCIraptor in the mass ranges of , , and , which we will refer to as M09, M10, and M11 respectively. The distribution of all galaxies is shown as a solid line; dashed lines show the contribution from both isolated and loosely interacting galaxies (i.e. not strongly interacting); dotted lines show the contribution from strongly interacting galaxies. For the latter we show the contribution from host and satellite galaxies as shaded blue and red regions, respectively. Vertical dashed lines are shown as reference at dex with respect to an identical estimated mass, i.e. .

At , the distributions of the M09, M10 and M11 samples show that galaxies that are not strongly interacting are overall distributed around . This is something we would expect because any finder in the literature should not have any problem with the identification of isolated density peaks or particle distributions. Looking closely we see that the peak is slightly skewed towards , as a result of differences in the initial galaxy identification steps taken by the codes. HaloMaker assigns all particles inside a 3DFOF object to identified structures, but VELOCIraptor performs an additional 6DFOF search which delimits galaxies by grouping only phase-space close particles. This procedure effectively gets rid of the furthest phase-space particles reducing the ‘available’ mass to distribute between galaxies inside the original 3DFOF object; the mass excess observed for HaloMaker galaxies is the mass we consider to be part of the IHSC (Section 3.1.6). The reason why the M09 distribution is not as narrow as the other sample is likely due to all galaxies in M10 and M11 being well resolved, while some galaxies in M09 could still be affected by resolution effects. In Fig. 5 the distributions for all not strongly interacting galaxies in the M09 sample are shown. The majority of isolated and loosely interacting hosts have , consistent with the above description. On the other hand, loosely interacting satellites describe a symmetric distribution centred at , suggesting that their mass can either be over- or underestimated by HaloMaker compared to VELOCIraptor.

Figure 5: Distributions of the mass ratio for not strongly interacting galaxies with at (top panel) and (bottom panel). The contribution from isolated galaxies is shown as a solid green line, and from loosely interacting hosts and satellites as shaded blue and red regions, respectively. Vertical dashed lines are shown as reference at dex from an exact match ().

The greatest difference between the catalogues is seen in the strongly interacting population (dotted lines in Fig. 4). Their distributions at all stellar masses are broader than for the rest of the population. This shows that there is a non-negligible number of resolved galaxies that are affected by the artificial transfer of mass in interacting systems due to the particle assignment criteria of the finder (see Fig. 2 for an example). Host galaxies display a significant preference for , while the part of the distribution is predominantly dominated by satellite galaxies, which is more evident for the M11 sample. This picture is consistent with the examples shown in Section 4 (Figs. 2 and 3), meaning that the behaviour observed in those examples are not simple HaloMaker outliers, but are a recurrent phenomenon in the simulation.

For a considerable amount of strongly interacting satellites HaloMaker overestimates their mass compared to VELOCIraptor, i.e. they have . As these are interacting systems, a fraction of these cases can be explained by host-satellite swapping, where a galaxy that is considered a satellite by VELOCIraptor is in fact the host galaxy in HaloMaker. This artificially increases their mass compared to what VELOCIraptor estimates. It is not uncommon to see this phenomenon across catalogues from different finders, and it is even present for the same finder between different snapshots, especially in the case of major mergers (see e.g. Behroozi et al., 2015; Poole et al., 2017).

At higher redshifts, the distributions display an amplified version of the behaviour observed at . Distributions for all galaxies at all masses become wider, and peak further from . The distributions at , shown in the middle panels of Fig. 4, are wider, with more prominent wings compared to . In addition, the M09 at peaks at for not strongly interacting galaxies. Strongly interacting galaxies show a similar behaviour to the ones, with the region being dominated by host galaxies, and the region by satellites, with some fraction of satellites also having , likely due to the host-satellite swapping. The distributions at (right panels in Fig. 4) show an even wider distribution than that at with a less prominent peak in the case of isolated galaxies.

In order to understand some of the differences at and we have to bear in mind the nature of the AMR calculation, and the properties of high-redshift galaxies. Horizon-AGN was run using an AMR code for which the grid cells used to compute gravity and hydrodynamics change as the simulation evolves depending on local density, affecting the effective resolution of the simulation. Cells are allowed to be refined when the universe has an expansion factor of (), in order to keep the physical size of the cell somewhat constant. This affects the spatial and mass scales at which gas forms stars, hence the scales on which galaxies are resolved, impacting also on their identification. Additionally, such large differences for the same galaxy can be explained by the fact that at high redshifts, galaxies are clumpier and more compact than at the present time. Bursts of star formation within the same galaxy could be easily identified as separate structures, by either of the finders. However, we expect that VELOCIraptor is capable of joining structures that are kinematically similar that might appear as separate structures in configuration space. Lastly, at high redshifts we also expect the number of mergers to increase (e.g. Fakhouri et al., 2010), hence we expect that the example analysed in Section 4 becomes more frequent.

5.1.2 Star Formation Rate

Another fundamental quantity measured for galaxies is their star formation rate (SFR). We calculate the SFR for each galaxy by adding up the mass of all stellar particles with age smaller than a given threshold, and dividing the sum over that period of time. Results presented here were obtained adopting , and were corrected for a recycling fraction of 0.44 for a Kroupa initial mass function following Courteau et al. (2014), implicitly assuming instantaneous recycling. We have also calculated SFRs using windows of and Myrs, find that results are robust.

Figure 6: Distribution of the SFR ratio at for different mass ranges, as labelled. The contribution from isolated galaxies is shown as a dashed line, interacting galaxies as a dotted line, and the combined distribution as a solid line. For interacting galaxies the contribution from host and satellite galaxies is shown as shaded blue and red region, respectively. For reference vertical dashed lines are shown at dex from an exact match .

We measure to quantify the galaxy-to-galaxy difference in the estimated SFR. Figure 6 shows the distribution at for samples M09, M10, and M11 as green, red and blue lines, respectively. Similarly to Fig. 4, we show the total distribution, and the contribution from not strongly interacting galaxies, and strongly interacting hosts and satellites, as labelled. peaks close to for not strongly interacting galaxies in all mass ranges.

In this case the distribution’s peak is slightly more narrow compared to the one displayed by the distribution (Fig. 4), and is also slightly shifted towards due to the 6DFOF search of VELOCIraptor that ‘removes’ the phase-space outermost particles of the initial 3DFOF object, as discussed in Section 5.1.1. The spread in the distribution comes mostly from strongly interacting galaxies, with () corresponding to host (satellite) galaxies. It is interesting though that for host galaxies the tail is quite prominent and extends to at all stellar masses. For satellites, on the other hand, tails are more prominent at lower stellar masses. Galaxies whose mass is overestimated via the spurious acquisition of outer material of an orbiting satellite, will for instance increase their SFR, and vice versa. Moreover, the satellite galaxies affected by this are likely to have only the inner non-star-forming core as the galaxy (see for example Fig. 2). This will drastically reduce their estimated SFR, while for their hosts it will be enhanced by the incorrect assignment of the star-forming outskirts of the satellite.

Figure 7: Distribution of the , , and , , ratios for different mass ranges at , as labelled. The contribution from not strongly interacting and strongly interacting host and satellite galaxies is shown, as labelled. For reference, vertical dashed lines show 0.2 dex from .

5.1.3 Sizes - Enclosed Mass Radius

Accurate estimation of galaxy sizes is crucial as they are used not only to test how well galaxy formation models agree with observations, but have also been used for calibration of subgrid physics parameters by some of the present-day hydrodynamical simulations (e.g Crain et al., 2015). We calculate spherical radii and , which enclose 50% and 90%, respectively, of the total stellar mass of a galaxy. We show in Fig. 7 the distribution, for and at . Galaxy samples are colour coded as in Figs. 4 and 6.

At all stellar masses the total distribution peaks close to , and its tails extend beyond 0.3 dex from this value. At the peak of comes from not strongly interacting galaxies, while at isolated and interacting host galaxies contribute equally. Isolated and loosely interacting galaxies display a narrow distribution close to at all stellar masses, with its peak being slightly shifted towards . Both of these behaviours are similar to those seen for and obey to the same reasons (see Section 5.1.1). Not strongly interacting galaxies, however, at show a larger spread on , as galaxies can have values from up to ; while the former does not contribute largely to the low- tail, isolated and loosely interacting galaxies do contribute to the spread at the high- end.

Similarly to and (Figs. 4 and 6, respectively), strongly interacting galaxies contribute the most to the spread in the distribution. Satellite galaxies comprise the majority of the low- population at all stellar masses, whereas the high- population tail arises from host galaxies, consistent with previous comparisons. This is most evident for galaxies with (top panel) where strongly interacting host galaxies comprise the bulk of the total distribution, which is skewed towards . The latter can be seen in the bottom row of Fig. 3, where the zoomed-out insets clearly show that the extension of the galaxy is expanded by HaloMaker as it also considers other galaxies’ outskirts as part of the central.

Figure 8: Galaxy Stellar Mass Function (GSMF) of the Horizon-AGN simulation calculated using VELOCIraptor (solid blue) and HaloMaker (solid green), measured at redshifts , as labelled in each panel. The GSMF of all galaxies is shown as a solid thick lines; the contribution from not strongly interacting galaxies is shown as a solid thin line; and from strongly interacting hosts and satellites as dashed and dotted lines, respectively. Bottom panels show the relative difference, of the total GSMF. A dotted thick line is shown for HaloMaker’s total GSMF at stellar masses where VELOCIraptor does not find any structures. A vertical dashed grey line is shown at the mass equivalent to structures composed of 100 particles for reference. Observations of the GSMF from Wright et al. (2017), Moustakas et al. (2013), Muzzin et al. (2013) and Ilbert et al. (2013) are shown as symbols, as labelled.

The overall behaviour of (right panels in Fig. 7) is similar to , however the spread is much larger at all stellar masses, for all the galaxy samples. For not strongly interacting galaxies a narrow peak is no longer visible and extends well above 0.2 dex from . Although encloses almost all the mass of the galaxy, the distributions do not resemble to those of at the same redshift (see Fig. 4), especially for galaxies, suggesting that sizes are more sensitive to finder systematics than the stellar mass is. Interacting host galaxies peak at , showing that the size of central galaxies of groups and clusters are greatly increased by HaloMaker, which is consistent with the examples shown in Fig. 3.

For completeness purposes we repeated the above analysis for radius enclosing 20% and 100% of total stellar mass, and results are consistent to those for and , respectively.

5.2 Galaxy Population Statistics

In this section we study the impact of the identification method on the statistical properties of the galaxy population of Horizon-AGN. We measure standard galaxy properties and compare VELOCIraptor and HaloMaker catalogues. Our main objective here is not to test how well Horizon-AGN reproduces the observed galaxy population, but to compare how statistical measurements of galaxy population can be affected by identification and the resulting consequences of the biases discussed in Section 5.1.

5.2.1 Galaxy Stellar Mass Function

We start with the simplest measurement, the Galaxy Stellar Mass Function (GSMF). Simulations are often tuned to reproduce this quantity (see review of Somerville & Davé, 2015). Moreover, it has been demonstrated that a GSMF consistent with observations can be obtained by tuning subgrid physical model parameters (see e.g. Crain et al., 2015). Therefore, it is essential that we understand and control for all the systematic effects behind measuring the GSMF in our simulations.

We show in Fig. 8 the GSMF, , of Horizon-AGN measured with VELOCIraptor (solid blue line), and HaloMaker (solid green line) at redshifts . To quantify the agreement between the catalogues, we compute the relative difference


We also show the contribution to the GSMF from isolated (solid thin line) and strongly interacting hosts and satellites (dashed and dotted lines, respectively) from the matched galaxies as described in Section 5.1.

At the overall shape of the GSMF measured by both catalogues is similar. However, differences can be seen at both the low and high mass ends. At HaloMaker finds fewer galaxies than VELOCIraptor. The GSMF predicted by HaloMaker displays a declining curve towards lower stellar masses, while VELOCIraptor’s GSMF shows a plateau. This can be attributed to two factors. First at low number of particles, the density field used by HaloMaker is likely to be poorly sampled, making it possible that structures are not dense enough in configuration space to be identified. VELOCIraptor is better at picking up these structures as they are dense in velocity-space as well. This is consistent with comparison studies which have found that in general 6D-based finders tend to perform better at identifying structures with low number of particles (Knebe et al., 2011, 2013b). The second reason is attributed to the specific particle and density thresholds used by HaloMaker to define relevant structures. At this mass range, galaxies are composed of particles, close to the resolution limit of the simulation, making the identification of its peaks and saddle points challenging.

It has been pointed out by several studies that structures need to be composed by at least hundreds or thousands of particles in order to have reliable measurements of their internal properties, as well as resolved merger histories, (e.g Knebe et al., 2013b; van den Bosch et al., 2018; Chisari et al., 2015; van den Bosch, 2017; Elahi et al., 2018). We argue that VELOCIraptor is capable of robustly identifying structures at very low particle numbers, as has been shown in other studies (Elahi et al., 2018). However, we leave further analysis on structures with small number of particles for an upcoming study (Elahi et al. in prep). Finally, it is worth mentioning that for Horizon-AGN, only galaxies with are considered as resolved structures due to resolution.

At , the GSMF of HaloMaker predicts between 20% up to 100% more galaxies than VELOCIraptor. This difference is a result of the IHSC being assigned to the central galaxy in HaloMaker. Therefore it is not that VELOCIraptor is unable to find such big galaxies, but the fact that the mass of central galaxies in HaloMaker is systematically increased by the finder. Note that HaloMaker’s GSMF extends to higher masses than the most massive galaxy obtained by VELOCIraptor, shown as a dotted thick green line in Fig 8.

Despite the wide distributions shown in Fig. 4, the GSMFs practically overlap in the mass range between . This is partially explained by the peak of the total distribution (Fig. 4), located close to , which mostly comes from isolated galaxies (see Fig. 5). The latter are the galaxies that contribute the most to the GSMF at . Another factor is that the over- and under-estimation of the stellar mass by HaloMaker is compensated between systems of different masses. This effect can be seen at from the mass functions of different galaxy populations, where VELOCIraptor predicts more isolated and strongly interacting host galaxies than HaloMaker. The opposite happens for strongly interacting satellites, giving a total GSMF that agrees at those stellar masses. This is even more evident at at , where catalogues predict different numbers of galaxies for different populations, and still the total GSMFs agree relatively well.

At higher redshifts, the GSMF has a different behaviour than at . Although the overall shape of the GSMF is roughly similar, there is a clear offset in the normalisation. At (middle panel of Fig. 8) the GSMFs at behave similarly to the ones, except for a slightly higher number density obtained by HaloMaker compared to VELOCIraptor at . At , both GSMF agree well, however as we go to higher masses, HaloMaker’s GSMF starts to deviate from the one measured by VELOCIraptor, with up to % more galaxies at . At (right panel in Fig. 8), the GSMF of VELOCIraptor and HaloMaker are completely offset at all masses. For galaxies with , HaloMaker predicts between 30% and 40% more galaxies than VELOCIraptor. This difference increases to 50% up to more than 100% at . We can see from the mass function of isolated galaxies and strongly interacting hosts that HaloMaker predicts more galaxies at all stellar masses. This difference, however, to some degree can be explained by the IHSC that VELOCIraptor is able to separate, but HaloMaker includes as part of the galaxy. By adding the IHSC mass to their respective central galaxy we could in principle shift these mass functions to the right, matching those obtained by HaloMaker. As discussed for Fig. 4, differences between the catalogues at high redshift can also be attributed to (i) higher merger rates, (ii) bursty star formation (iii) AMR resolution implementation.

5.2.2 Star Formation Rate Function

Figure 9: Star Formation Rate Function (SFRF) of Horizon-AGN using VELOCIraptor (blue line) and HaloMaker (green line) at . The SFRF from all galaxies is shown as a solid thick line; the contribution from isolated galaxies is shown as a solid thin line; and from strongly interacting hosts and satellites as a dashed and dotted lines, respectively. Bottom panels show the . The vertical dashed grey line shows a SFR equivalent to 10 star particles formed in the last 50 Myr for reference. From the compilation of observations presented by Katsianis et al. (2017), we show estimations of the SFRF derived from IR LF from Magnelli et al. (2011), and UV LF from Alavi et al. (2016) and Parsa et al. (2016), are shown as symbols, as labelled.
Figure 10: Galaxy size-mass relation as measured by VELOCIraptor (green) and HaloMaker (blue). The relation for and is shown as circles and triangles, respectively. The mass-size relations are shown for all the matched galaxies (first panel), isolated galaxies (second panel), and for strongly interacting host and satellite galaxies (third and fourth panel, respectively). Dashed vertical lines, coloured according to the finder, delimit the edge of mass where bins have less than 10 galaxies. -band linear fit from Lange et al. (2015) (corrected for the observational 2D projection) for early (late) type galaxies shown in red (magenta).

In Fig. 9 we show the estimated SFRF, , of Horizon-AGN using VELOCIraptor (blue line) and HaloMaker (green line) at . SFRFs from isolated galaxies, as well as strongly interacting hosts and satellites are shown, as labelled. Bottom panel shows the relative difference , for the total SFRF. The dashed vertical line shows a SFR equivalent to 10 new star particles formed in the last 50 Myr for reference.

The overall shape of the total SFRF is in good agreement between the finders, as well as with the estimated from observations. At , HaloMaker predicts 50% less galaxies than VELOCIraptor. These are, however, SFR values close to star particle formed in the last 50 Myr. At higher SFRs, the values SFRFs start to become more similar, reaching a negligible difference at . At these SFRs the total SFRF is principally dominated by isolated galaxies whose SFRF agree between the catalogues; however, similarly to the GSMF, there is also ‘compensation’ from different galaxy samples, as VELOCIraptor predicts more strongly interacting hosts than HaloMaker, but less strongly interacting satellites at SFR , and vice versa at SFR . At SFRs , HaloMaker predicts more galaxies than VELOCIraptor, reaching a maximum difference of 50% at . This excess on number density predicted by HaloMaker compared to VELOCIraptor is caused by the addition of mass from other galaxies to the central (see Fig. 3 and the IHSC that is separated in VELOCIraptor). As we showed in Section 5.1.1, systematics affect most of the systems composed by multiple galaxies, and not only contact mergers, hence the differences observed in the estimated SFRFs for different galaxy samples.

5.2.3 Galaxy Mass-Size Relation

We show in Fig. 10 the estimated galaxy mass-size relation using (circles) and (triangles) for the galaxies found by VELOCIraptor (blue) and HaloMaker (green) in the sample of matched galaxies (as described in Section 5.1), at . Symbols show the median calculated in equal size bins in logarithmic scale, using the same bins for both catalogues. The four panels show the mass-size relation for all galaxies (first panel), isolated galaxies (second panel), and strongly interacting host and satellite galaxies (third and fourth panel, respectively). Vertical dashed lines are shown at the high-mass end where bins have less than 10 galaxies. For reference, the -band mass-size relation linear fit from Lange et al. (2015) is shown as a solid red (magenta) line for early (late) type galaxies. We apply an average correction to the observations due to the fact that we are measuring 3D sizes in the simulation, while observations measure projected sizes. The latter is a simple scaling of applied to the observations (which comes from the fact that galaxies have minor to major axis ratios of and are inclined by  degrees, on average).

The overall shape for and mass-size relation of the whole sample of matched galaxies is roughly similar for both finders. At , () of all the galaxies in the sample are on average 10% (20%) larger in HaloMaker than in VELOCIraptor. At this difference reaches a minimum for both radii, being almost negligible for , and 5% for . At , the difference in the sizes of galaxies between the catalogs starts to increase. The difference on the estimated radii of galaxies peaks at where on average HaloMaker galaxies have () values up to 20% (%) larger than VELOCIraptor. Although there is agreement between both finders at , the number of galaxies is very low. As discussed in Sections 5.1.1 and  5.2.1, high-mass galaxies are more massive in HaloMaker than in VELOCIraptor, extending the mass-size relation to larger values. It is interesting though, that despite the individual differences seen in Figs. 4 and  7, a simple extrapolation of the VELOCIraptor’s relation would agree with the one described by HaloMaker.

The mass-size relation for isolated and loosely interacting galaxies, as well as for strongly interacting host galaxies (second and third panel of Fig. 10, respectively) has a similar behaviour as the complete sample at all stellar masses. Smaller and in VELOCIraptor are expected for isolated galaxies due to its 6DOF implementation, which, as discussed in Section 5.1.1, reduces the stellar particle budget for galaxies and is kept as the IHSC. Although the latter also affects sizes of interacting host galaxies, their sizes are again artificially increased because of how particles in a common 3DFOF object are distributed as was shown in Fig. 3. Both effects are evident at for , and to a greater extent for . Although at the very high-mass end there seems to be agreement, as discussed above, the number of galaxies for both finders is very low preventing us from reaching any conclusion.

The difference in the mass-size relation for strongly interacting satellites, however, has a different behaviour compared to other samples. At , interacting satellite galaxies are on average more compact in the HaloMaker catalog; both and have on average lower values than their VELOCIraptor counterparts. Similarly to other samples, differences are larger for than for in the same stellar mass bin. At , () is on average 10% ( 20%) smaller in HaloMaker than in VELOCIraptor. At higher stellar masses the difference increases reaching a maximum at with galaxies being on average 20% and 30% smaller for and , respectively, in HaloMaker compared to VELOCIraptor. At , there are two mass bins where and of interacting satellites are on average similar and even larger in HaloMaker, contrary to what would be expected. This is likely to be caused by host-satellite swapping, and can be seen in Fig. 7 as a small bump at At () is on average 15% ( 20%) smaller for for HaloMaker satellites.

6 Discussion

We have presented an improved algorithm for identifying galaxies in simulations and showed how galaxy properties are affected by the finding algorithm. In this section we discuss implications of our algorithm, as well as possible consequences that non-robust identification of galaxies can have in cosmological hydrodynamical simulations.

6.1 Identifying galaxies vs. dark matter halos

Many structure finding codes are capable of finding galaxies in simulations (see for reference Knebe et al., 2013a). The vast majority of them are generally limited to either taking all bound baryons inside a dark matter (sub)halo and label them as the galaxy, or use the same algorithm and parameters adopted for dark matter haloes to identify galaxies. Although both approaches are valid for the identification of galaxies, there are important differences between dark matter halos and galaxies to keep in mind: (i) Stars, that make up galaxies, are formed from gas elements at the bottom of potential wells, hence galaxies are expected to be more compact than dark matter haloes; such gas elements can also cool into discs, very different than the geometry of dark matter halos. Consequently, for a large fraction of the galaxies, the stellar density profiles do not resemble those of their dark matter counterpart. (ii) Taking into account this variety of shapes and distributions is extremely important for the identification of merging and interacting galaxies, as such morphologies can distort and become quite complex, making their identification a non-trivial task. (iii) During mergers, the outer dark matter component will at some point phase-mix, but the stars in its centre do that on a different timescale, with some features being long living (such as streams and shells). This makes it important to analyse them separately. Even if stars and dark matter are both collisionless in simulations and interact solely through gravity, we should not use the same approach if codes were designed under assumptions that are valid only for dark matter haloes. Although for some galaxies the above approaches might work, that is not expected to be the case for the entire galaxy population in cosmological simulations, as we have shown in this study.

The algorithm presented here is a solution to tackle this problem. It is particularly powerful as it was designed to work without any a priori assumption on shape or distribution, and resolution. It is therefore a generalised solution that can be also easily applied to other components in simulations, which we explore in upcoming studies (Elahi et al. in prep).

6.2 Impact of identification

Simulation results

We have shown in this study how the total mass, size and star formation rate of galaxies can be affected by the assumptions and sometimes oversimplification of the finder. Additionally, as seen in the case studies (Section 4), misestimation of masses of merging galaxies impact the estimated merger ratio. This has several consequences as galaxy mergers are essential for the growth of massive galaxies (Robotham et al., 2014). Inability to resolve galaxies in interactions can affect estimated merger ratios and therefore estimated minor and major merger rates, and the impact they have on the build up of galaxies. A related area of great interest is whether interactions enhance/suppress the star formation activity in galaxies (Ellison et al., 2008; Davies et al., 2015; Kaviraj et al., 2015; Martin et al., 2017, Davies et al. submitted). As shown in this work the SFRs of galaxies that are strongly interacting have their SFRs affected within a fraction of 2 to 3 by the misidentification. The latter is comparable to the enhancements inferred observationally, showing that it is critical to robustly measure SFRs in simulated galaxies if we want to use them to offer physical interpretations at these environmental trends. Misestimation of masses and sizes can impact the interpretation that we can give to galaxies in dense environments such as groups and clusters, where both central and satellites can be largely affected. This implies that we could get misleading results when studying environmental effects on galaxy quenching in hydrodynamical simulations.

These are not the only possible consequences. Our case studies also showed that radial profiles can be affected, such as angular momentum or inertia tensors, both used for alignment studies. Regarding angular momentum, it has been recently shown by e.g. Cortese et al. (2016) in observations and Lagos et al. (2017) in simulations, that the estimated specific angular momentum can be up to 2.5 times (0.4 dex) higher if measured at two effective radius rather than one. It is therefore important for related studies in simulations to account for systematic effects that can severely affect the estimated sizes (e.g ) of interacting galaxies. Taking into account the offset we found when comparing HaloMaker with our new algorithm in Section 5.1.3, we would expect 3D finders to bias the specific angular momentum of satellites (centrals) towards low (high) values.

The effects on our understanding of Galaxy Formation

We showed in this study that despite the large differences seen in individual galaxies, especially the interacting ones, the overall galaxy population statistics are not severely affected by finder systematics. This has important implications for the way the galaxy formation is modelled and understood. We have already stated that population statistics are used to tune free parameters of subgrid physics models in simulations. To a certain extent, through tuning we can learn how recipes affect galaxies as well as the impact of different models, e.g. star formation, stellar and AGN feedback. However, we have shown that we can obtain the right amplitude of a relation or function (i.e. stellar mass function, or mass-size relation) using vastly different finders but for different reasons. We argue that the study of subsamples of the galaxy population (e.g. satellite galaxies, galaxy groups/clusters) can unveil such differences, and therefore provide key information to estimate the systematic effects introduced by the choice of finder. Subgrid physics often model unresolved and generally not-so-well understood physical processes that can affect the large-scale properties of galaxies. This is the case of BH growth and its corresponding AGN feedback. A major growth channel of BHs are mergers, and we have shown that the choice of finder affects the derived merger ratio. This in turn affects our estimates of BH merging timescales, possibly causing the existence of multiple BHs in merger remnants, and thus changing the associated effect of AGN feedback on the galaxy properties.

This all shows that perfecting our ability to identify galaxies and measure their properties in simulations is a key task that cannot be overlooked. Our new algorithm offers a new, robust and accurate way of doing this, yielding smoother stellar profiles (see Figs. 2 and 3) and more robust stellar mass estimates than widely used 3D finders. This implementation of our algorithm in VELOCIraptor will be made public in the next release of the code (Elahi et al. in prep).

7 Summary and Conclusions

We have extended the halo-finder code VELOCIraptor (Elahi et al., 2011, Elahi et al. in prep) to robustly identify galaxies in state-of-the-art simulations of galaxy formation. This new implementation overcomes many common problems that even state-of-the-art structure finding codes struggle with, such as particle assignment and accurate identification of strongly interacting systems. We have paid special attention to the appropriate selection and iterative adjustment of search parameters, to account for the wide dynamical range that simulations can have. Particle assignment (core growth) was improved by using the full phase-space dispersion tensor, allowing us not only to recover arbitrary galaxy shapes, but also to obtain smooth density profiles even for galaxies with satellites embedded within it.

With our improved code, we built an additional galaxy catalogue for the state-of-the-art cosmological hydrodynamical simulation Horizon-AGN, and compared its outcomes with those of the complex configuration-space based finder, HaloMaker. Case studies confirmed the versatility and robustness of our algorithm, and provided insight into how identification tools can affect galaxy properties (e.g. mass and sizes), as well as the estimates of merger ratios. Below we summarize our main results.

Galaxy-to-galaxy comparison.

We matched the galaxy catalogues to quantify how the total , SFR and sizes ( and ) can be affected by the chosen finder. We built distributions of , where corresponds to each of the properties above, and separate the contribution from isolated and interacting galaxies. Interacting galaxies are those hosted by halos with more than 1 substructure, otherwise galaxies are considered as isolated.

  • Isolated galaxies are in general narrowly distributed close to for , SFR, and . Such similarities between catalogues are expected as the identification of isolated galaxies should be straightforward. For , however, the peak is not narrow and a considerable amount of isolated galaxies have values around  0.3 dex from . This suggests that is highly dependent on the finding algorithm.

  • Interacting galaxies show a very wide distribution for all quantities studied. There is an evident difference between host and satellite galaxies, which peak at and , respectively. These differences are mainly caused by inadequate particle assignment in HaloMaker, which we show our improved version of VELOCIraptor handles better. HaloMaker artificially increases (decreases) the estimated values of , SFR and for host (satellite) galaxies in interacting systems.

  • Differences between the catalogues are amplified at higher redshifts, where the distributions of interacting and isolated galaxies widen.

Galaxy population statistics.

We investigate how the choice of finder affects the overall galaxy population statistics. We explore the GSMF, SFRF, as well as the Mass-Size relation for and .

  • At , the GSMFs of HaloMaker and VELOCIraptor agree well, while at higher stellar masses the former predicts from 20% to 100% more galaxies than the latter. At higher redshifts differences are amplified. At , HaloMaker’s GSMF predicts a number density of galaxies at least 30% higher than VELOCIraptor over the whole mass range, increasing to 100% at .

  • The SFRF at is also fairly similar between the finders, with differences increasing with redshift. At , the peak of the cosmic star formation history, HaloMaker predicts up to 50% more galaxies of than VELOCIraptor. This is important as these galaxies are expected to dominate the cosmic SFR.

  • We compare the and size-mass relation predicted by both finders. We find that the mass-size relation resulting from the two finders are similar, except at the high mass end, , where HaloMaker’s galaxies are 20% larger than VELOCIraptor’s. These differences increase by 30% when we study . This results from the fact that the stellar content and structure in the outskirts of galaxies is very sensitive to the choice of finder.

Although we see that the overall galaxy statistics are not greatly impacted by the choice of finder, individual galaxies can display differences in mass and size of more than a factor of between the two finders studied here. We suggest that the tuning of simulations of galaxy formation is relatively robust as it has consistently focused on population statistics. However, comparisons of galaxy sub-populations with observations, specifically in the context of pairs, groups and clusters, can be greatly affected by the choice of finder. We showed that our new algorithm outperforms 3D finders and provided extensive evidence of this.

One of our key findings is that the stellar outskirts of galaxies is greatly affected by the choice of finder. In upcoming studies we will explore in detail the diffuse Intra Halo Stellar Component, stellar streams and the outer stellar profiles of galaxies.


The authors thank Aaron Robotham, Matthieu Schaller and the theory and computing group at ICRAR for helpful discussions. RC is supported by the MERAC foundation postdoctoral grant awarded to CL and by the Consejo Nacional de Ciencia y Tecnología CONACYT CVU 520137 Scholar 290609 Overseas Scholarship 438594. CL is funded by a Discovery Early Career Researcher Award (DE150100618). Parts of this research were conducted by the Australian Research Council Centre of Excellence for All Sky Astrophysics in 3 Dimensions (ASTRO 3D), through project number CE170100013. This research is part of Spin(e) (ANR-13- BS05-0005, This work has made use of the Horizon Cluster hosted by Institut d’Astrophysique de Paris. We thank Stephane Rouberol for running smoothly this cluster for us.


  • Alavi et al. (2016) Alavi A., et al., 2016, ApJ, 832, 56
  • Aubert et al. (2004) Aubert D., Pichon C., Colombi S., 2004, MNRAS, 352, 376
  • Behroozi et al. (2013) Behroozi P. S., Wechsler R. H., Wu H.-Y., 2013, ApJ, 762, 109
  • Behroozi et al. (2015) Behroozi P., et al., 2015, MNRAS, 454, 3020
  • Bentley (1975) Bentley J. L., 1975, Commun. ACM, 18, 509
  • Berlind & Weinberg (2002) Berlind A. A., Weinberg D. H., 2002, ApJ, 575, 587
  • Berlind et al. (2001) Berlind A. A., Narayanan V. K., Weinberg D. H., 2001, ApJ, 549, 688
  • Binney & Tremaine (2008) Binney J., Tremaine S., 2008, Galactic Dynamics: Second Edition. Princeton University Press
  • Boylan-Kolchin et al. (2009) Boylan-Kolchin M., Springel V., White S. D. M., Jenkins A., Lemson G., 2009, MNRAS, 398, 1150
  • Chisari et al. (2015) Chisari N., et al., 2015, MNRAS, 454, 2736
  • Cole et al. (2000) Cole S., Lacey C. G., Baugh C. M., Frenk C. S., 2000, MNRAS, 319, 168
  • Cortese et al. (2016) Cortese L., et al., 2016, MNRAS, 463, 170
  • Courteau et al. (2014) Courteau S., et al., 2014, Reviews of Modern Physics, 86, 47
  • Crain et al. (2015) Crain R. A., et al., 2015, MNRAS, 450, 1937
  • Davies et al. (2015) Davies L. J. M., et al., 2015, MNRAS, 452, 616
  • Davis et al. (1985) Davis M., Efstathiou G., Frenk C. S., White S. D. M., 1985, ApJ, 292, 371
  • Diemand et al. (2006) Diemand J., Kuhlen M., Madau P., 2006, ApJ, 649, 1
  • Dubois et al. (2010) Dubois Y., Devriendt J., Slyz A., Teyssier R., 2010, MNRAS, 409, 985
  • Dubois et al. (2012) Dubois Y., Devriendt J., Slyz A., Teyssier R., 2012, MNRAS, 420, 2662
  • Dubois et al. (2014) Dubois Y., et al., 2014, MNRAS, 444, 1453
  • Dubois et al. (2016) Dubois Y., Peirani S., Pichon C., Devriendt J., Gavazzi R., Welker C., Volonteri M., 2016, MNRAS, 463, 3948
  • Eisenstein & Hut (1998) Eisenstein D. J., Hut P., 1998, ApJ, 498, 137
  • Elahi et al. (2011) Elahi P. J., Thacker R. J., Widrow L. M., 2011, MNRAS, 418, 320
  • Elahi et al. (2013) Elahi P. J., et al., 2013, MNRAS, 433, 1537
  • Elahi et al. (2016) Elahi P. J., et al., 2016, MNRAS, 458, 1096
  • Elahi et al. (2018) Elahi P. J., Welker C., Power C., Lagos C. d. P., Robotham A. S. G., Cañas R., Poulton R., 2018, MNRAS, 475, 5338
  • Ellison et al. (2008) Ellison S. L., Patton D. R., Simard L., McConnachie A. W., 2008, AJ, 135, 1877
  • Fakhouri et al. (2010) Fakhouri O., Ma C.-P., Boylan-Kolchin M., 2010, MNRAS, 406, 2267
  • Frenk et al. (1999) Frenk C. S., et al., 1999, ApJ, 525, 554
  • Furlong et al. (2015) Furlong M., et al., 2015, MNRAS, 450, 4486
  • Genel et al. (2014) Genel S., et al., 2014, MNRAS, 445, 175
  • Gill et al. (2004) Gill S. P. D., Knebe A., Gibson B. K., 2004, MNRAS, 351, 399
  • Ilbert et al. (2013) Ilbert O., et al., 2013, A&A, 556, A55
  • Katsianis et al. (2017) Katsianis A., et al., 2017, MNRAS, 472, 919
  • Kauffmann & Charlot (1998) Kauffmann G., Charlot S., 1998, MNRAS, 294, 705
  • Kaviraj et al. (2015) Kaviraj S., Devriendt J., Dubois Y., Slyz A., Welker C., Pichon C., Peirani S., Le Borgne D., 2015, MNRAS, 452, 2845
  • Kaviraj et al. (2017) Kaviraj S., et al., 2017, MNRAS, 467, 4739
  • Kim et al. (2014) Kim J.-h., et al., 2014, ApJS, 210, 14
  • Klypin & Holtzman (1997) Klypin A., Holtzman J., 1997, ArXiv Astrophysics e-prints,
  • Klypin et al. (1999) Klypin A., Gottlöber S., Kravtsov A. V., Khokhlov A. M., 1999, ApJ, 516, 530
  • Knebe et al. (2011) Knebe A., et al., 2011, MNRAS, 415, 2293
  • Knebe et al. (2013a) Knebe A., et al., 2013a, MNRAS, 428, 2039
  • Knebe et al. (2013b) Knebe A., et al., 2013b, MNRAS, 435, 1618
  • Knebe et al. (2015) Knebe A., et al., 2015, MNRAS, 451, 4029
  • Komatsu et al. (2011) Komatsu E., et al., 2011, ApJS, 192, 18
  • Lacey & Cole (1993) Lacey C., Cole S., 1993, MNRAS, 262, 627
  • Lagos et al. (2017) Lagos C. d. P., Theuns T., Stevens A. R. H., Cortese L., Padilla N. D., Davis T. A., Contreras S., Croton D., 2017, MNRAS, 464, 3850
  • Lange et al. (2015) Lange R., et al., 2015, MNRAS, 447, 2603
  • Maciejewski et al. (2009) Maciejewski M., Colombi S., Springel V., Alard C., Bouchet F. R., 2009, MNRAS, 396, 1329
  • Magnelli et al. (2011) Magnelli B., Elbaz D., Chary R. R., Dickinson M., Le Borgne D., Frayer D. T., Willmer C. N. A., 2011, A&A, 528, A35
  • Martin et al. (2017) Martin G., Kaviraj S., Devriendt J. E. G., Dubois Y., Laigle C., Pichon C., 2017, MNRAS, 472, L50
  • Moustakas et al. (2013) Moustakas J., et al., 2013, ApJ, 767, 50
  • Muzzin et al. (2013) Muzzin A., et al., 2013, ApJ, 777, 18
  • Nelson et al. (2018) Nelson D., et al., 2018, MNRAS, 475, 624
  • Onions et al. (2012) Onions J., et al., 2012, MNRAS, 423, 1200
  • Parsa et al. (2016) Parsa S., Dunlop J. S., McLure R. J., Mortlock A., 2016, MNRAS, 456, 3194
  • Pillepich et al. (2018) Pillepich A., et al., 2018, MNRAS, 473, 4077
  • Poole et al. (2017) Poole G. B., Mutch S. J., Croton D. J., Wyithe S., 2017, MNRAS, 472, 3659
  • Power et al. (2014) Power C., Read J. I., Hobbs A., 2014, MNRAS, 440, 3243
  • Press & Schechter (1974) Press W. H., Schechter P., 1974, ApJ, 187, 425
  • Robotham et al. (2014) Robotham A. S. G., et al., 2014, MNRAS, 444, 3986
  • Scannapieco et al. (2012) Scannapieco C., et al., 2012, MNRAS, 423, 1726
  • Schaye et al. (2015) Schaye J., et al., 2015, MNRAS, 446, 521
  • Sembolini et al. (2016a) Sembolini F., et al., 2016a, MNRAS, 457, 4063
  • Sembolini et al. (2016b) Sembolini F., et al., 2016b, MNRAS, 459, 2973
  • Snyder et al. (2015) Snyder G. F., et al., 2015, MNRAS, 454, 1886
  • Somerville & Davé (2015) Somerville R. S., Davé R., 2015, ARA&A, 53, 51
  • Springel et al. (2001) Springel V., White S. D. M., Tormen G., Kauffmann G., 2001, MNRAS, 328, 726
  • Stadel (2001) Stadel J. G., 2001, PhD thesis, UNIVERSITY OF WASHINGTON
  • Teyssier (2002) Teyssier R., 2002, A&A, 385, 337
  • Trayford et al. (2015) Trayford J. W., et al., 2015, MNRAS, 452, 2879
  • Trayford et al. (2016) Trayford J. W., Theuns T., Bower R. G., Crain R. A., Lagos C. d. P., Schaller M., Schaye J., 2016, MNRAS, 460, 3925
  • Tweed et al. (2009) Tweed D., Devriendt J., Blaizot J., Colombi S., Slyz A., 2009, A&A, 506, 647
  • Vogelsberger et al. (2014) Vogelsberger M., et al., 2014, MNRAS, 444, 1518
  • Volonteri et al. (2016) Volonteri M., Dubois Y., Pichon C., Devriendt J., 2016, MNRAS, 460, 2979
  • Welker et al. (2014) Welker C., Devriendt J., Dubois Y., Pichon C., Peirani S., 2014, MNRAS, 445, L46
  • Wright et al. (2017) Wright A. H., et al., 2017, MNRAS, 470, 283
  • van den Bosch (2017) van den Bosch F. C., 2017, MNRAS, 468, 885
  • van den Bosch et al. (2018) van den Bosch F. C., Ogiya G., Hahn O., Burkert A., 2018, MNRAS, 474, 3043

Appendix A 6DFOF linking length

As described in Section 3.1.2, we use a fraction of for the configuration space linking length, described in equation (10). We show in Fig. 11, how the particular choice of can affect how galaxies are delimited in VELOCIraptor, and how its IHSC changes with it.

Figure 11: Projected stellar density of the IHSC, 6DFOF object (step 2), and the final central galaxy, for different choices of (row labels). The 3DFOF object (step 1) is shown in grey in the middle panels. For this work our standard choice is

Values of link a large number of substructures to the 6DFOF object. Though smaller values get rid of some of the substructures in the outskirts of the central/group, they can also shrink the size of the central galaxy to only the inner parts as is seen for . The latter is appreciated in the IHSC, where bubble-like features can be seen at the position of galaxies, as if the central parts of the galaxies were carved leaving the outer parts unassigned to any galaxy. Mid range values of leave out some of the outskirts of the group without leaving bubble-like structures in the IHSC. For this study our preferred choice is as it leaves out most of the satellites, while preserving structures that can only be separated using the iterative search in the 6DFOF object333It is important to notice that even a small satellite still remains as part of the 6DFOF object (this can be seen by zooming the figure in the electronic version), and that independently of the value used, the inner profile of the galaxy is smooth showing the power of the iterative search and core growth..

Though, it can be seen that the estimated IHSC depends on the used, this example is illustrative of upcoming studies focused on the IHSC (Cañas et al, in prep).

Appendix B Core growth weighting

In Section 3.1.4 we stated that when calculating phase-space distances from cores, equation (19), a weight , was needed to compensate for the compactness of galaxies and extension of streams and shells. For this we use a mass-dependent weight, as this would reduced the phase-space distance from particles to phase-space compact high-mass objects (i.e. galaxy candidates), compared to low-density extended objects (streams and shells). We show in Fig. 12 surface density projections of how the growth of phase-space cores changes for , where is the total mass of the core at a given iteration level.

We show how the central galaxy is affected by in the first row. For and it can be seen that some of the outer parts of the galaxy are missing. The reason behind this is the fact that some streams can have very large dispersion so that ‘weak’ weights do not compensate, as is seen in the last row. On the other hand ‘strong’ weights can in fact make that the central galaxy absorbs the vast majority of the particles, as they would have to be close to any structure by the same orders of magnitude difference of their masses. A weight of does a slightly better job than , as for the largest satellite (second row) it is able to return correctly its outskirts. For satellites with leading or trailing tails, and some streams (rows 3 to 5), it can be seen that weak weights recover a little better outer material than strong ones, as for the latter streams seem to look a little fragmented. The last row shows an example of a tidal structure which grows drastically when weak weights are used, due to their large dispersions and the metric used to assign particles.

VELOCIraptor can also identify tidal features with the same algorithm. Though the ability of identifying streams is important for many studies, our priority is first to identify galaxies as cleanly as possible. In order to avoid the extreme growth of streams, we incline for stronger weights. Both and give similar results, however the latter does a better job at preventing that all particles are assigned to the central, and recovers better outskirts for satellites. We further compare these two weights by taking a closer look to the streams from the last rows of Fig. 12. In Fig. 13 we show zoomed-in projected surface densities and velocity field for both structures. The left panel shows how for an arc stream the velocity field smoothly changes along it. The same is recovered for both weights. However, for the last stream it can be seen that for a weight of , the velocity field is not continuous along the structure, and are in fact distinct structures, while for a weight of the stream (or tidal feature) has a smooth the velocity field.

Weights proportional to other powers of , or to any arbitrary quantity can be easily applied. However, a weight of is used as it gives the desired results for the purposes of this study.

Physically, for is the proportional to a galaxy’s tidal the radius, . In the case of point masses, the tidal radii can be approximated as the Roche lobe radii, i.e. . For a King’s profile the tidal radii is approximately (see e.g. Binney & Tremaine, 2008). For more realistic profiles , which for the spherical collapse model gives . Therefore, , give us , with . Our choice of is then appropriate to properly account for the size of galaxies for particle assignment.

Figure 12: Projected stellar density of structures found by VELOCIraptor inside a galaxy cluster in Horizon-AGN using different weights for the core growth.
Figure 13: Projected stellar density (color) and velocity field (arrows), for two streams identified by VELOCIraptor and grown with weights proportional to and , as labelled. For the left one it can be seen that the velocity field changes smoothly along the structure for both weights. For the right one it can be seen that has assigned particles of a distinct structure to the stream, while for the velocity field is much smoother.
Comments 0
Request Comment
You are adding the first comment!
How to quickly get a good reply:
  • Give credit where it’s due by listing out the positive aspects of a paper before getting into which changes should be made.
  • Be specific in your critique, and provide supporting evidence with appropriate references to substantiate general statements.
  • Your comment should inspire ideas to flow and help the author improves the paper.

The better we are at sharing our knowledge with each other, the faster we move forward.
The feedback must be of minimum 40 characters and the title a minimum of 5 characters
Add comment
Loading ...
This is a comment super asjknd jkasnjk adsnkj
The feedback must be of minumum 40 characters
The feedback must be of minumum 40 characters

You are asking your first question!
How to quickly get a good answer:
  • Keep your question short and to the point
  • Check for grammar or spelling errors.
  • Phrase it like a question
Test description