A Quantification of the Butterfly Effect in Cosmological Simulations and Implications for Galaxy Scaling Relations
Abstract
We study the chaoticlike behavior of cosmological simulations by quantifying how minute perturbations grow over time and manifest as macroscopic differences in galaxy properties. When we run the same setup multiple times, the results produced by our code, AREPO, are binary identical. However, when we run pairs of ‘shadow’ simulations that are identical except for random minute initial displacements to particle positions (e.g. of order ), the results diverge from each other at the individual galaxy level (while the statistical properties of the ensemble of galaxies are unchanged). After cosmological times, the global properties of pairs of ‘shadow’ galaxies that are matched between the simulations differ from each other generally at a level of , depending on the considered physical quantity. We perform these experiments using cosmological volumes of evolved either purely with dark matter, or with baryons and starformation but no feedback, or using the full feedback model of the IllustrisTNG project. The runs cover four resolution levels spanning a factor of 512 in mass. We find that without feedback the differences between shadow galaxies generally become smaller as the resolution increases, but with the IllustrisTNG model the results are mostly converging towards a ‘floor’. This hints at the role of feedback in setting the chaotic properties of galaxy formation. Importantly, we compare the macroscopic differences between shadow galaxies to the overall scatter in various galaxy scaling relations, and conclude that for the star formationmass and the TullyFisher relations the chaotic behavior of our simulations contributes significantly to the overall scatter. We discuss the implications for galaxy formation theory in general and for cosmological simulations in particular.
0000000231851540]Shy Genel \move@AU\move@AF\@affiliationCenter for Computational Astrophysics, Flatiron Institute, 162 Fifth Avenue, New York, NY 10010, USA \move@AU\move@AF\@affiliationColumbia Astrophysics Laboratory, Columbia University, 550 West 120th Street, New York, NY 10027, USA \move@AU\move@AF\@affiliationCenter for Computational Astrophysics, Flatiron Institute, 162 Fifth Avenue, New York, NY 10010, USA \move@AU\move@AF\@affiliationDepartment of Astronomy, Columbia University, 550 West 120th Street, New York, NY 10027, USA \move@AU\move@AF\@affiliationMaxPlanckInstitut für Astrophysik, KarlSchwarzschildStraÃe 1, 85740 Garching bei München, Germany \move@AU\move@AF\@affiliationInstitute for Theory and Computation, HarvardSmithsonian Center for Astrophysics, 60 Garden Street, Cambridge, MA 02138, USA \move@AU\move@AF\@affiliationMaxPlanckInstitut für Astrophysik, KarlSchwarzschildStraÃe 1, 85740 Garching bei München, Germany \move@AU\move@AF\@affiliationMaxPlanckInstitut für Astronomie, Königstuhl 17, 69117 Heidelberg, Germany \move@AU\move@AF\@affiliationHeidelberg Institute for Theoretical Studies, SchlossWolfsbrunnenweg 35, 69118 Heidelberg, Germany \move@AU\move@AF\@affiliationHeidelberg Institute for Theoretical Studies, SchlossWolfsbrunnenweg 35, 69118 Heidelberg, Germany \move@AU\move@AF\@affiliationInstitute for Theory and Computation, HarvardSmithsonian Center for Astrophysics, 60 Garden Street, Cambridge, MA 02138, USA \move@AU\move@AF\@affiliationDepartment of Physics, Kavli Institute for Astrophysics and Space Research, MIT, Cambridge, MA 02139, USA
1 Introduction
Cosmological simulations are the most general tool for theoretical studies of galaxy formation. Significant progress is continuously being made on their physical fidelity, numerical accuracy and computing power, and as a result, also on their realism. However, several factors hinder the prospect of accurately simulating our Universe. One wellknown limitation stems from small scales: the need to model processes occurring on scales below the resolution of any given simulation using approximations (usually called ‘subgrid’ models). Another limitation that is widely appreciated originates from large scales: our ignorance about the initial conditions of cosmological systems, whether our own Galaxy or our Universe as a whole (often referred to as ‘cosmic variance’). In this work we consider for the first time the possible consequences of a limitation that in some sense is a combination of the two: our ignorance about initial conditions on small rather than large scales.
The butterfly effect is the phenomenon whereby a dynamical system evolves in a macroscopically different manner due to a minute change in initial conditions. Systems that possess this property are often loosely referred to as chaotic. In this work we use the term ‘chaoticlike’ to refer to phenomena related to the butterfly effect. A more formal definition of a chaotic system may involve the existence of a positive Lyapunov exponent, namely the exponential divergence of trajectories that are initially only infinitesimally separated. In regimes where we do identify an exponential growth of initially small differences, we refer to the time scale associated with this growth as the Lyapunov time scale, but in many cases the divergence we observe is not exponential and is therefore ‘chaoticlike’. Simulations that start from almost identical initial conditions are referred to here, following standard nomenclature in the context of chaos studies, as ‘shadow’ simulations, and matched systems within these simulations, such as particles or galaxies, are also referred to as ‘shadow’ versions of each other.
Chaoticlike systems can be found in diverse contexts in Astrophysics. Examples include the dynamics of planetary systems (Laskar, 1989), Nbody systems such as star clusters or dark matter halos (Heggie, 1991; ElZant et al., 2018) as well as galactic disks and bars (Fux, 2001; Sellwood & Debattista, 2009), starformation in turbulent molecular clouds (Adams, 2004; Bate et al., 2010), and the orbits of satellite galaxies, stellar streams and halo stars (Maffione et al., 2015; PriceWhelan et al., 2016a, b). Here we study the butterfly effect in a context that has hitherto been largely neglected: the galaxy formation process from cosmological initial conditions in the paradigm. To this end, we employ stateoftheart cosmological hydrodynamical simulations and study the growth over cosmological time scales of minute perturbations applied to them. We also discuss the applicability of our results and conclusions beyond the realm of simulations, namely for the real Universe.
Chaoticlike sensitivity to initial conditions has been considered for cosmological systems in a few cases before. Thiébaut et al. (2008) measured the characteristic growth (Lyapunov) time scales of small differences between initial conditions in sets of otherwise identical cosmological pure Nbody boxes. They found that chaoslike behavior appears on small, nonlinear scales, but is absent on large, linear scales. Interestingly, some global properties of dark matter halos were found to be robust and stable to these magnified differences on the particle level, but not all. Thiébaut et al. (2008) identified several global halo properties that differed significantly between shadow versions of the same halos, such as spin and orientation of the velocity dispersion tensor. ElZant et al. (2018) recently found that global properties of noncosmological, equilibrium spherical Nbody systems show an initial exponential growth of errors but then a saturation that converges towards zero as the number of particles is increased towards the collisionless limit. The direct relevance of this result to the case of halos developing from cosmological initial conditions is unknown and merits further research (see also Benhaiem et al., 2018). Kaurov (2017) found that smallscale modifications to cosmological initial conditions propagate to much larger scales by the epoch of reionization, dramatically affecting simulation results such as the escape fraction. In this paper we perform measurements that are similar in spirit to those of Thiébaut et al. (2008), but on cosmological simulations that include baryons, hydrodynamics and galaxy formation models, and using different methods for introducing differences and measuring their growth.
Recently, while this paper was in preparation, Keller et al. (2018) investigated chaoticlike behavior seeded by roundoff errors in gravitohydrodynamical simulations of a few individual galaxies, both from idealized and from cosmological initial conditions. With the codes they used, GASOLINE2 (Wadsley et al., 2017) and RAMSES (Teyssier, 2002), repeated runs of the same setup resulted in different outcomes. They showed that the results of these different runs have normal distributions. In cases where the difference between two such shadow simulations grows to large values (even up to order unity), which often are associated with galaxy mergers, it tends to later converge back to the mean, a behavior they interpret as a result of negative feedback loops and global physical constraints on the system. They conclude that in order to determine the degree to which the results from simulations with different physical models truly differ from one another, the measured differences between them must be assessed keeping in mind the butterfly effect, namely with respect to differences that would occur merely by repeating runs with the same model.
In this work we quantify the differences between shadow hydrodynamical simulations of galaxies in the cosmological context. In contrast with Keller et al. (2018), who have studied just a few individual galaxies and a large number of shadow simulations for each of them, we use ‘largescale’ (tens of ) cosmological boxes that contain thousands of galaxies, and use a small number of shadow simulations for each set of initial conditions. This allows us to quantify the average magnitude of the butterfly effect for a statistically representative galaxy population. In addition, we study how galaxies move due to the butterfly effect in parameter spaces combining several physical quantities that are related to each other through ‘scaling relations’, and thereby quantify how much of the scatter in those relations is affected by the butterfly effect. The width, or scatter, in scaling relations is often considered to be no less fundamental than their shape parameters such as mean normalization and slope. For example, McGaugh (2012); McGaugh & Schombert (2015) consider the very small scatter in the TullyFisher relation between galaxy luminosity and rotation speed (Tully & Fisher, 1977) as evidence towards modified gravity. The scatter around the mean relation between galaxy mass and starformation rate has also been studied extensively (e.g. Tacchella et al., 2016; Matthee & Schaye, 2018), and it is believed to encode a variety of key processes in galaxy formation.
This paper is organized as follows. Section 2 describes the simulations we use and the analysis methods applied to them. Section 3 presents results for several individual galaxy properties from hydrodynamical cosmological simulations. Section 4 lays out the main results of this work, which concern several combinations of properties, namely scaling relations. Section 5 contains a summary and an extensive discussion. Finally, Appendix A briefly presents results from dark matteronly cosmological simulations, and Appendix B discusses several special sets of simulations run for numerical verification purposes.
2 Methods
2.1 Simulations
2.1.1 Code and Setup
We employ the MPIparallel TreePMmovingmesh code AREPO (Springel, 2010) to run three series of cosmological simulations, distinguished by different sets of physical components and models they include. Specifically, the DMonly series represents pure Nbody simulations of cold dark matter; the NoFeedback series adds baryons, hydrodynamics, radiative cooling, and star formation, utilizing the methods presented in Vogelsberger et al. (2012); and the TNG series employs a more comprehensive treatment of the physics of galaxy formation, including in particular supermassive black holes as well as various feedback processes, utilizing the same models (Weinberger et al., 2017a; Pillepich et al., 2018a) used for the IllustrisTNG project (Marinacci et al., 2017; Naiman et al., 2018; Nelson et al., 2018; Pillepich et al., 2018b; Springel et al., 2018).
Each of these series is comprised of simulations at four resolution levels, the basic parameters of which are provided in Table 2.1.1. The naming convention we use to distinguish the resolution levels is related to the spatial resolution. The resolution level, for example, is similar to (slightly worse than) the Illustris simulation (Genel et al., 2014; Vogelsberger et al., 2014a, b), while the level has a mass resolution that is nearly five times better than Illustris. For the higher resolution levels we are limited by computational power to volumes of , but for the lower resolution levels we can afford to run larger volumes of , which is helpful for statistical power. The initial conditions for some of our cosmological boxes have been generated with NGenIC (Springel et al., 2005) and are adopted from Vogelsberger et al. (2013), and some generated with MUSIC (Hahn & Abel, 2011) especially for this study. We uniformly use a cosmology with , , , , and (except for the DMonly series) .
resolution  dark matter gravitational  baryonic  dark matter  box size  number of 

level  softening [comoving ]  particle mass []  particle mass []  []  dark matter particles 
table \hyper@makecurrenttable
Table 0. \Hy@raisedlink\hyper@@anchor\@currentHrefProperties of the different simulation resolution levels used in this study. Our simulations are comprised of four resolution levels that span a factor of in spatial resolution and in mass resolution. Throughout the paper, they are referred to using the notation in the leftmost column, based on their spatial resolution. In comparison to the IllustrisTNG simulations, the level is similar to (slightly worse than) the resolution of the TNG100, and so is the level with respect to TNG300. In addition to dark matter particles, whose number is provided in the rightmost column, the initial conditions of hydrodynamical simulations include an identical number of gas cells.
2.1.2 Creating Shadow Simulations Using Minute Perturbations
Each cosmological box is first evolved from its initial conditions at down to some final redshift, producing several snapshots at intermediate times. These snapshots are then used as initial conditions for what we call sets of shadow simulations, up to a unique minute perturbation that is applied to each of the shadow simulations in the set (described in the next paragraph). A set consisting of simulations contains then pairs of shadow simulations, for which the setup and initial conditions are identical up to a minute perturbation. An overview of these sets is provided in Table 2.1.2, including the number of simulations and pairs in each set, as well as the perturbation (namely, initial) redshift and the final one. In most cases, unless otherwise noted, the shadow simulations produce snapshots at prescribed times starting after their initial time, namely the time the perturbations are introduced, in intervals increasing by a factor of two up to past the perturbation time^{2}^{2}2Since snapshots can only be written by our code at time steps when all particles are active, the snapshot times cannot be prescribed exactly, but are rounded to those special time steps. This implies that simulations with lower resolutions have a lesser ability to produce snapshots at very fine intervals. Accordingly, only the resolution level simulations can produce a snapshot as early as after their initial time, while for the resolution level the first snapshot is only written into the run. This can easily be changed by imposing a maximum time step, but that undesirably affects also the integration itself, as shown in Appendix B and is therefore not done in the main body of this work.. This achieves high time resolution for following the early stages of the evolution of the perturbations. Thereafter, the snapshot separation is approximately equal in the logarithm of the cosmological scale factor. The total number of snapshots written by each shadow simulation between and is . In addition to the sets presented in Table 2.1.2, several special sets have been run for numerical verification reasons. These are described and discussed in Appendix B.
series  resolution level  number of sets (volumes)  number of simulations  resulting number of pairs  perturbation  final 

DMonly  1  3  3  5  0  
1  3  3  5  0  
1  3  3  5  0  
1  2  1  5  0  
NoFeedback  1  3  3  5  0  
1  2  1  5  0  
1  2  1  4  0  
1  3  3  5  0  
2  5  0.5  
TNG model  1  3  3  5  0  
1  2  1  5  0  
1  3  3  5  0  
1  2  1  5  0 
table \hyper@makecurrenttable
Table 0. \Hy@raisedlink\hyper@@anchor\@currentHrefAn overview of the simulation suite used in this study. We use three series of simulations (first column), each with a different physical model: simulations including only dark matter (DMonly), simulations with baryons and starformation (NoFeedback), and simulations with a full galaxy formation model (TNG model). In each series, there are four resolution levels (second column), most of which employ a single cosmological box, except for the highresolution NoFeedback case that uses two distinct boxes, providing two sets of shadow simulations (third column). The total number of simulations comprising each set is reported in the fourth column, and the resulting number of pairs of shadow simulations is reported in the fifth column (in the NoFeedback case, the two numbers correspond to the two sets). The penultimate column reports the redshift at which the shadow simulations are perturbed and resumed, and the last one the final redshift to which they are evolved.
The ‘minute perturbation’ applied to every simulation in every set is in most cases (unless noted otherwise) implemented as a displacement in the position of each and every particle in the snapshot that serves as the common initial conditions of the set. These displacements are applied only once, immediately after reading the snapshot data into memory and before any calculations are done to evolve the system. These displacements are applied in all three Cartesian spatial directions, and their magnitudes in each direction are drawn from a uniform distribution between (unless noted otherwise) and , with the coordinate of the particle in the Cartesian direction . Since particle positions are handled with double precision floating points, whose significand has a precision of 53 bits or decimal digits, this range of possible displacements spanning translates into possible values for the displacement of any given particle along each Cartesian axis.
With this design choice of limiting the displacements to a constant, small number of bits representing the position of each particle, the typical physical size of the displacement scales with the position in the box . Given the box sizes we use, the maximal particle coordinates are of order tens of , and the displacements are hence at most of order (comoving). An alternative possible design choice of keeping a constant physical displacement size across the box rather than a constant relative displacement size would be inconsequential to the results we present, for two reasons. First, due to the fact that throughout the vast majority of the simulation volume (except very close to the origin) the initial displacements are within the same order of magnitude. Second, due to the fact that our results are largely insensitive to the magnitude of the initial perturbations, as demonstrated in Appendix B.1.
2.1.3 Discussion of Numerical Nuisance Parameters
Other than the application of a unique realization of displacements to each simulation, all shadow simulations in a given set are evolved identically, in terms of, e.g. the operating system, the executable, the number of compute cores and MPI tasks, the seed for the random number generator, and so on^{3}^{3}3This does not include, however, the specific nodes on which the computation is done.. We choose to introduce perturbations directly so that we have full control over them. We could have, however, introduced them in a less direct way by, for example, running each shadow simulation in a set using a different number of MPI tasks. Such a choice would immediately introduce a different realization of roundoff errors in the calculation of, e.g. forces, fluxes, and physical displacements, generating a very similar outcome to perturbations we introduce ‘by hand’ close to the machine precision level. The number of MPI tasks serves hence as a nuisance parameter that affects the results of a simulation through the arbitrary ‘choice’ of roundoff errors. This is true even with the simplest set of physics, namely in pure Nbody simulations, as well as in pure hydrodynamical simulations, let alone in a combined gravity and hydrodynamics case.
It is worth commenting, however, that when we use the exact same setup, keeping all factors described above fixed, and do not introduce any perturbation, namely running ‘the same simulation’ more than once, our code produces results that are binary identical, remaining so even over integrations of billions of years of cosmic time^{4}^{4}4Note that the specific nodes on which the computations are done are not required to be kept fixed for the results of the calculations to be binary identical.. This is achieved by a deterministic order of operations that is independent of machine noise such as communication speeds between different nodes, accounting for deterministic roundoff errors. It is nevertheless important to realize that this feature of exact reproducibility has nothing to do with accuracy: the reproducible realization of roundoff errors with a particular setup of our code is arbitrary, and is no more accurate than any other one. For example, the different arbitrary realization of roundoff errors that our exact same code and setup would obtain if only the number of MPI tasks was modified is just as correct.
In our simulations that on top of gravity and hydrodynamics include also starformation there is an additional nuisance parameter that is worth discussing, which is the seed for the random number generator. Random numbers are used in our model^{5}^{5}5Specifically, by employing the gsl_rng_ranlxd1 random number generator from the GNU Scientific Library. in the starformation and feedback process to determine where stars will form or galactic winds be launched (Springel & Hernquist, 2003). This is necessary since the time scales associated with these processes are of order , while simulation time steps can be as short as . Therefore, starforming gas cells have typically very low probabilities during individual time steps to be converted into stellar or ‘wind’ particles. The realization of these probabilities into actual starformation or windlaunching events is controlled by random numbers. With a fixed seed for the random number generator, two identical setups result in identical results. However, if the seed for the random number generator is modified, a different sequence of random numbers is generated, and stars will form at different times and positions. This will also be the case if the same seed, and hence random numbers sequence, is used but with a time or cell offset between two simulations. It is important to realize that differences in roundoff errors, or the introduction of minute displacements as described above, will quickly develop into effective offsets in the random number sequence, and hence have the same effect. This is because once the roundoff errors develop into a situation where the number of starforming gas cells in one simulation is different from its shadow simulation, each individual cell will be affected by a modified series of random numbers. For further discussion of the role of random numbers in our results, and the relation to the real Universe, see Appendix B and Section 5.
2.2 Analysis
2.2.1 Matching between Shadow Simulations
The first analysis task given a set of shadow simulations is to match individual objects – galaxies or dark matter halos – between these simulations and thereby obtain a catalog of ‘shadow objects’. The type of objects that we match in practice is SUBFIND subhalos (Springel et al., 2001). These objects are matched between each pair of shadow simulations by identifying subhalos across simulations that have common dark matter particle IDs, namely according to commonalities of their Lagrangian patches. Specifically, the shadow subhalo in simulation B of a subhalo in simulation A is the subhalo in simulation B that contains the largest number of dark matter particles that are among the most bound dark matter particles in the subhalo in question from simulation A. The number is set to of the total number of dark matter particles in the subhalo in question, bounded by from below and from above. Further, if multiple halos from simulation A find the same match in simulation B, then only the most massive of them is kept as a valid match, and the rest are discarded^{6}^{6}6The determination of which simulation in a pair of shadow simulations is ‘A’ and which is ‘B’ is arbitrary. We also checked an alternative method: enforcing a bidirectional match by discarding all galaxies whose match’s match is not themselves. This resulted in discarding of the galaxies, and had virtually no effect on our results.. We perform these matches for all subhalos with a stellar mass larger than in the hydrodynamical simulations or total mass larger than in the DMonly simulations. This procedure results typically in a matched fraction of . Figure (2.2.1) presents mock stellar light images of a pair of matched shadow galaxies from a series of snapshots starting from shortly after the perturbation is applied and covering most of cosmic time.
In our analysis we narrow these matches down to include only those that are between two subhalos that are both the main subhalos of their FriendsOfFriends (Davis et al., 1985) halos, namely central subhalos, or central galaxies in the case of the hydrodynamical simulations series. This is a conservative choice, as differences between shadow subhalos where one is a central and one is a satellite tend to be larger, due to the strong environmentdriven evolution of satellites. Such cases occur when timing differences appear between shadow systems, for example if one, in which the subhalo is still a central, lags behind the other, in which the subhalo is already a satellite. Such cases are quite rare, and the galaxy populations in our simulations are not large enough to sample them well, which is another reason for our choice to exclude them from the main analysis.
2.2.2 Quantifying the Differences between Shadow Galaxies
Once we have a catalog of shadow subhalos between each pair of shadow simulations in a set, we calculate logarithmic differences, namely ratios, in the properties of those shadow subhalos. We focus on the following quantities: total bound stellar mass and dark matter mass , the maximum of the circular velocity profile ( as a function of radius ), the halfmass radius of the stellar distribution , the instantaneous starformation rate based on the gas distribution in the subhalo , and the starformation rate (SFR) averaged over a time window of , . All of these quantities are calculated by SUBFIND during the run, except for , which we calculate in postprocessing based on the formation times of the stellar particles belonging to the subhalo (in Appendix C we verify that our results are not significantly affected by the particularities of the SUBFIND algorithm).
The logarithmic differences of these quantities between shadow subhalos are studied in Section 3. We show that their distributions are wellfit by Gaussians, and quantify the standard deviations of these distributions, namely the typical pairwise differences, as a function of time since the perturbation and of subhalo mass. It is important to realize that the distribution of pairwise differences is wider by than the distribution of actual values among many perturbed realizations. This is simply because each realization is drawn from the normal distribution of actual values, and the distribution of pairwise differences is then a distribution of the differences between two identical normal random variables, which is indeed in itself a normal distribution that is wider than the original one. In our case, we have a small number of pairwise differences per subhalo, or even just a single one, so we cannot reliably quantify the distribution of actual values. However, we do have a large statistical sample of many galaxies, and therefore many pairwise differences for a population, whose distribution can be robustly quantified and fit with a Gaussian. We therefore present examples of these distributions in and of themselves in Figure (3.1), as discussed in the next Section. However, it is important to keep in mind that when using the standard deviations of these distributions to quantitatively compare to distributions of values, rather than of differences, as done in Section 4, the width of the pairwise shadow differences have to be divided by for a meaningful comparison, and so this is the way they are presented throughout the paper, with the exception of Figure (3.1).
In Section 4 we go beyond the individual quantities, and study the evolution of differences between shadow galaxies in the context of scaling relations. Specifically, we quantify the extent to which differences in various individual quantities between shadow galaxies move them perpendicular to, versus along, certain scaling relations. To this end, for a given pair of physical quantities, e.g. stellar mass and halo mass, we perform a piecewise linear fit to all the galaxies in all the simulations of a given set. These fits then define the scaling relation between these quantities, as well as the (piecewise) perpendicular direction to the relation, namely the direction in which the scatter of the relation is minimal. We then calculate the difference between each pair of shadow galaxies in that perpendicular direction^{7}^{7}7The differences between a pair of shadow galaxies in an other direction, and in particular in the two directions that are defined by the two physical quantities that define the relation themselves, are related to each other directly via the slope of the relation. We consider the distances perpendicular to the relation as the least arbitrary and the most meaningful quantity of these possibilities.. The standard deviation of these pairwise perpendicular differences (divided by for reasons discussed in the previous paragraph) is compared to the total scatter (among all galaxies) perpendicular to the scaling relation, in order to assess the contribution of the butterfly effect to the total scaling relation scatter.
3 Results: Individual Quantities
3.1 Distributions of Shadow Pairwise Differences
We find that the distributions of pairwise logarithmic differences between the properties of shadow galaxies are well fit by Gaussians whose centers are consistent with zero. This is a general result, which we demonstrate in Figure (3.1) in a particular regime. There, we present the probability density functions of all pairwise logarithmic differences between the values of the maximum circular velocity profile of shadow galaxies, for all central galaxies with stellar mass in our NoFeedback (top) and TNG model (bottom) simulation series at the last available snapshot, separated by resolution level. In each case, the actual probability density function (thick stepwise curves), which comprises of of pairwise differences, can be described well by a bestfit Gaussian (thin curves). This shape probably arises due to the central limit theorem, as a large number of individual factors (resolution elements) contribute to the quantity . As mentioned, this is a general result that we find holds for other quantities and for other galaxy selections.
The dependence on resolution seen in Figure (3.1) is illuminating. In the NoFeedback series, the width of the distribution decreases with increasing resolution: at higher resolution the minute perturbations that are introduced at grow less by than they do at lower resolution. That the result is not converged implies that the magnitude to which these perturbations grow in the lowerresolution cases is not physical, and possibly that their growth is altogether a numerical artifact even in the highest resolution that is available to us, rather than an intrinsic property of the simulated physical system. In contrast, the results when the TNG model feedback processes are turned on show no meaningful dependence on resolution. At all resolution levels, the standard deviation of the distribution is , namely a typical difference of between the values of shadow galaxies. Note that galaxies in the considered mass bin of are resolved at the resolution level with only stellar particles, rendering the invariance of the result between all resolution levels quite striking.
This convergence suggests that the growth of the initial perturbations, on a scale of one part in , to percentlevel differences is inherent to (the numerical realization of) the physical system, namely a system evolving from cosmological initial conditions according to the physical processes included in the TNG model and their particular implementation in this model. In particular, at the resolution level, the distribution of pairwise differences is significantly broader than it is at the same resolution level in the NoFeedback case, indicating that the final level of differences is not inherent to the code in general, but is related to the particular physical processes that it implements. Specifically, that the pairwise differences do not keep shrinking with increasing resolution as in the NoFeedback case is an indication that the form of feedback implemented in the TNG model increases the sensitivity of the system to small perturbation, or in other words the degree of chaoticlike behavior it manifests.
After establishing that the pairwise differences distributions are Gaussian, throughout the rest of this paper we characterize them with a simple summary statistic: their standard deviation. However, as discussed in Section 2.2, for each individual galaxy, the standard deviation of the pairwise differences between its various shadow versions is a factor of larger than the standard deviation of the values themselves. Since here we have only a few pairs per galaxy, we cannot sample the distribution of the values themselves well. However, we have a large number of galaxies, and hence do have a robust estimate of the standard deviation of the distribution of pairwise distances. We hereafter use this robust estimate and divide it by in order to obtain a robust estimate of the standard deviation of the values themselves even in the absence of a direct probe into their distribution. As discussed in Section 2.2, the standard deviation of the latter is the more meaningful quantity.
3.2 Growth of Differences Over Time
3.2.1 Results from NoFeedback Simulations
Figure (3.2.1) presents the standard deviations of distributions like the ones discussed so far (divided by , as discussed above) as a function of time, where is defined to be the time the perturbations were introduced, namely in this case . These are shown for four physical quantities, one per panel as indicated in the figure, and for four resolution levels via different line styles, as indicated in the legend, all for galaxies from the NoFeedback series in a fixed mass bin of (in the bottomright panel: ). The differences between shadow galaxies have a generic evolution as a function of time for all explored quantities at all resolution levels: an initial growth that can be described reasonably well by a power law , which then plateaus approximately after the perturbation. In other words, after a transition period lasting about after the perturbation, galaxies of have a certain (resolutiondependent) degree of random variation between shadow simulations that is independent of cosmic epoch. For most quantities, in accordance with the top panel in Figure (3.1), the results are not converged, as the differences are smaller at higher resolution, both in the growth phase as well as after reaching a plateau. At the resolution level, the plateau levels are for the various quantities, while for the highest, resolution level, they are .
For the two quantities shown on the right column of Figure (3.2.1), stellar mass (top) and dark matter mass (bottom), there is one source of randomness that is easy to estimate: Poisson noise. Since both stellar and dark matter particles are numerical constructs that discretely sample an underlying smooth field, we can expect random variations on the masses of collections of them, such as subhalos, to scale as , where is the typical particle mass and is the number of particles in a given subhalo. Hence, the relative random scatter in the mass of a subhalo is expected to have a lower limit at . These lower limits are shown in the right column of Figure (3.2.1) as horizontal dashed lines. Indeed, this expectation is confirmed, as for the lower resolution levels, the ‘chaotic’ differences between shadow subhalos plateau exactly to the values expected from this Poisson noise estimate. It takes about for the initial perturbations to evolve to that level, since at shorter times after the perturbation the masses of the subhalos still mostly consist of their components that formed prior to the perturbation, and hence is in common to all shadow realizations. In other words, in this context applies to the number of particles added since the perturbation. It is therefore expected that the time to reach the plateau corresponds roughly to the growth timescale of the mass itself, and this is consistent with the observed timescale of . Moreover, for a constant mass growth rate , which is a reasonable approximation for a relatively short window of , is roughly linear with time, and hence the relative error (where is a constant by selection) scales roughly as , as indeed observed.
Importantly, the expected Poisson noise diminishes as the square root of the mass resolution, namely by a factor of with every step in resolution level. In the case of the stellar mass, the measured ‘chaotic’ differences indeed diminish at that rate for the lowest three resolution levels, indicating that Poisson noise is the dominant factor in those regimes. However, for the level this is no longer the case, as the measured differences are larger than expected from Poisson noise. This indicates that at this high resolution there exists a different origin to the ‘chaotic’ differences that is not just sampling noise. In the case of the dark matter mass, this is even more pronounced, as the differences are larger than expected from Poisson noise at all resolution levels but the lowest one, and in fact the differences appear to be converged between and . This, again, indicates that there is something beyond the simple randomness of the sampling of the mass field that gives rise to mass differences between shadow simulations.
For the quantities shown on the left column of Figure (3.2.1), maximum circular velocity (top) and stellar halfmass radius (bottom), it is not clear whether a simple analytic estimate can be devised. It is to be expected that there is an initial growth phase during the time that there still exists a significant component that formed before the perturbation. For reasons that are unknown to us, the differences in grow at a similar rate to those of the masses, roughly , but the growth of the differences begins slower than that and then accelerates around after the perturbation^{8}^{8}8For a possible connection between a rough divergence of integrated quantities of Nbody systems and diffusion, see ElZant et al., 2018).. It is also then not entirely expected or straightforward that the differences between the shadow galaxies in these two quantities reach a plateau around the same time the masses do, , suggesting that the differences may be massdependent but not timedependent. Importantly, and curiously, these structural properties that are on the left column show worse convergence than the masses on the right column, suggesting that they might be driven by numerical discreteness that will continue diminishing with increasing resolution.
3.2.2 Results from TNG model Simulations
Figure (3.2.2) is analogous to Figure (3.2.1) except that it presents the results for the TNG model simulation series, and that it includes four additional panels for additional physical quantities. Several important qualitative differences exist between Figures (3.2.1) and (3.2.2).
First and foremost, the results for the common four physical quantities (top four panels) appear to be wellconverged with the TNG model, as opposed to the case without feedback, extending a similar result discussed around Figure (3.1). In particular, at the highest resolution level, , the typical differences among shadow galaxies close to are much larger with the TNG model than without feedback: (or ) versus for (topleft), (or ) versus for (topright), (or ) versus for (middleleft), and (or ) versus for (middleright). It appears, then, that the introduction of feedback in the TNG model gives rise to a much stronger amplification of the initial perturbations.
Second, for all the baryonic properties we examine (i.e. except for ), the differences appear to be rising with the TNG model at all cosmic times, and in particular still be rising at , rather than reaching a plateau as in the NoFeedback case. In other words, galaxy mass is no longer the sole determinant of the differences between shadow simulations; instead, galaxies at a fixed mass tend to show a larger effect of the initial perturbations at later epochs.
Third, the evolution of the differences in the stellar and dark matter masses (topright and secondright panels) is with the TNG model not strongly affected by Poisson noise (the only exception being the resolution level for ), but instead continues growing to much higher levels than that, implying that actual physical processes generate these differences rather than effects of discrete sampling. Interestingly, the growth keeps its approximate powerlaw dependence on time even after crossing the Poisson noise level, which is curious as the explanation we suggested above for this dependence applied only to the early regime, before reaching that level.
The bottom four panels in Figure (3.2.2) present four additional quantities that were not included in Figure (3.2.1) for the NoFeedback model. In the third row on the left are the logarithmic differences between shadow galaxies in stellar metallicities. The results are systematically converging and appear very well converged between the two highest resolution levels after , at a level of (or ) at . In the third row on the right are the differences in black hole masses, which hover around (or ) at for the various resolution levels, which however do not show a monotonic behavior, as discussed below.
The two quantities examined in the bottom row of Figure (3.2.2) are measurements of the starformation rate (SFR), but on different timescales. In the bottomleft, it is the instantaneous SFR as measured from the gas cells, which is determined by their density based on the Springel & Hernquist (2003) model, . In the bottomright, it is the SFR averaged over the past , as measured from the number of stellar particles that actually formed during this time window, . For both quantities, an estimate of Poisson errors can be made based on the number of resolution elements that contribute to the calculation. For this is somewhat less accurate, as the instantaneous SFRs of individual cells can vary greatly, than for , which is based on the almostconstant masses of stellar particles. Nevertheless, both quantities show a similar picture indicating that Poisson noise^{9}^{9}9Unlike mass, which is constant by selection, the SFRs change over cosmic time, and hence the Poisson noise level is not constant. The dashed horizontal lines in the bottom two panels of Figure (3.2.2) are calculated based on the SFRs, which are at their nadir at that time, resulting in larger Poisson noise levels than at any other cosmic epoch. does not dominate, except perhaps at the lowest resolution level^{10}^{10}10Note that the feature at that appears for is there essentially by construction, as at all times shorter than past the perturbation, the measurement of is based partially on stellar particles that were formed prior to the perturbation, namely ones that are by construction in common between all the shadow simulation in a set. Only after longer times can and do the differences grow substantially to (and even beyond) the indicated Poisson noise level, which is calculated assuming that all the particles are independent draws from some underlying smooth field.. The effect of the perturbations is clearly still rising for the SFRs as a function of cosmic time even at , for galaxies in this fixed mass bin. Perhaps surprisingly, the differences between shadow simulation in are quite close to those in the instantaneous , both being at . This indicates that the star formation histories of shadow galaxies diverge from one another in a significant way not only on short time scales, but rather even when averaged over time windows much longer than, e.g., a galactic dynamical time. It is also worth pointing out, for context, that this level of differences between shadow galaxies is comparable to the overall scatter of SFRs between galaxies in this stellar mass bin, a point discussed in more detail in Section 4.
The results in the bottom row of Figure (3.2.2) show a curious behavior with respect to dependence on resolution. They appear essentially converged (at ) between the two intermediate resolution levels of and , but then diverge towards smaller values for the highest level, . We interpret this as evidence that star formation itself proceeds in a different way in the set, affecting the process of perturbation amplification. This is in accordance with the findings of Sparre et al. (2014) that a new, more bursty mode of starformation appears at resolution levels beyond that of Illustris. In other words, not only the process we study here, namely the perturbation amplification, is affected by changing resolution, but also the results of the simulation itself and thereby also the dominance and effect of various physical processes that occur within the simulation and which drive the perturbation amplification. It is hard to separate the direct effect of numerical resolution on the perturbation amplification from its indirect effect through changes to the simulation results themselves and to the relevant physical processes. In fact, this indirect effect of resolution is not guaranteed to act in the direction of decreasing the amplification. Indeed, a nonmonotonic dependence on resolution appears for the case of black hole masses (third from top panel on the right), and a tentative hint for an opposite effect can be seen in the topleft panel for , where at late times the growth of the differences is faster, and their amplitude is larger, in the case than in the other resolution levels, which in themselves appear quite converged. A careful examination of Figure (3.1) reveals that, at least in the last snapshot, this is driven by the larger number of outliers in the case, which one may speculate to, as well, be driven by the more bursty mode of star formation at this resolution level. While this particular case is not conclusive due to small number statistics, this general point is discussed further in Section 3.3.
3.2.3 Comparing Shadow Differences to Overall Scatter
We close this subsection with a study of one additional quantity, the angle between the angular momentum vectors of the stellar component of subhalos and of their total mass content (including the dark matter and gas), which we denote . Pairwise differences of between shadow galaxies are presented in Figure (3.2.3) (for this quantity, we find no mass dependence, hence this figure is based on all galaxies with ). In the top panel, the solid curves are analogous to those in Figures (3.2.1) and (3.2.2), and they present a similar picture of initial power lawlike growth and a plateau reached at , which is resolutiondependent, but possibly close to converged in the highest resolution level.
In addition to the typical differences between shadows (solid curves), the top panel in Figure (3.2.3) also shows the standard deviations of the overall distributions of among all galaxies with (dashed curves). These are seen to be rather constant as a function of time and for the most part between the four resolution levels, at . At low resolution, however, the overall scatter is larger than at higher resolutions, and is not much larger than the typical difference between shadow galaxies. This suggests that it is the butterfly effect itself that affects, namely enhances, the overall scatter at the level. When the former drops, at higher resolution, so does the latter.
Shown in the bottom panel of Figure (3.2.3) is the ratio between the solid and dashed curves of the top panel, which can be interpreted as the fractional contribution of the butterfly effect to the total scatter in this quantity. Since we compare standard deviations of distributions, and plausibly other contributions of scatter would be independent and hence add quadratically, it is the square of the ratio shown in the bottom panel that is the more meaningful quantity, namely the contribution of the butterfly effect to the variance of among the overall galaxy population. That in the highestresolution case it appears possibly converged at implies that about of the variance among galaxies in the misalignment between these two vectors cannot be derived from deterministic macroscopic arguments, namely cannot be predicted and explained. In Section 4 we will make similar comparisons, but instead of for the scatter in an individual quantity, for that of a scaling relation between two quantities.
3.3 Differences Versus Mass
In Section 3.2 we have shown how differences between shadow simulations grow as a function of time for a fixed selected mass bin. Here, we present a complementary view, of the latetime differences between shadow subhalos of various mass bins (all perturbed with respect to one another at ). These results are shown in Figure (3.3) for the NoFeedback series, in an analogous organization to that of Figure (3.2.1), with a different physical quantity in each panel and a different color for each resolution level. The middle of the five stellar mass bins, , is the one for which results were discussed in Section 3.2, and so is the first of the four dark matter mass bins in the lowerleft panel, . In order to increase the statistical significance, the results are shown for the average of six snapshots corresponding to the redshifts of the last six snapshots available for the highest resolution level, which in the case of the NoFeedback series corresponds to .
The topright panel in Figure (3.3) shows that the logarithmic differences between the stellar masses of shadow galaxies in the NoFeedback series is strongly massdependent, and specifically smaller for more massive galaxies. This is easy to understand, as the close match is apparent between the actual data (solid steps) and the estimates based on Poisson noise (dotted curves). Hence, the conclusion from the topright panel of Figure (3.2.1) regarding the Poisson noise origin of the differences holds generally for all mass bins. The exception to this conclusion, which is also in alignment with Figure (3.2.1), is the highest resolution level, and in particular so, for higher mass bins. In particular, for galaxies with , the stellar mass differences between the shadow simulations are several times larger than expected based purely on sampling noise given the number of particles comprising these galaxies. Still, the standard deviations between the stellar masses in shadow galaxies at this high resolution level is rather small, , across the mass range.
The result is quite different for the dark matter mass of these subhalos, as shown in the lowerright panel of Figure (3.3). It is not clear if the results show convergence towards a value larger than zero, however they are definitely larger than expected purely due to Poisson noise, at all resolution levels and all masses, except at the combination of lowest mass and lowest resolution. Nevertheless, the lower resolution levels, and in particular , are clearly affected by the Poisson noise to a certain degree. At the end of the day, the magnitude of the result at the highest resolution level may be considered small: it is across the full mass range explored.
The results on the left column of Figure (3.3) do suggest convergence toward massindependent values of for the maximum circular velocity (top) and for the stellar halfmass radius (bottom). We do not have an analytical estimate analogous to the one we have for the massbased quantities shown on the right, but it is nevertheless clear that at lower simulation resolution levels, lowermass galaxies are more strongly affected by the butterfly effect, and that this mass dependence becomes weaker at higher resolutions. This suggests that there is a role here too for discreteness or sampling effects, but those however appear to be largely mitigated at the resolution level, where a massindependent ‘floor’ is reached.
In Figure (3.3) we present a similar study, but for the TNGmodel simulation series, where again the structure is analogous to that of the timedependent Figure (3.2.2). The phenomenology seen in Figure (3.3) is quite rich, and we here discuss the aspects we find most significant and illuminating.

As seen in the topleft panel of Figure (3.2.2) for the middle mass bin shown here, the dependence of the differences on resolution is not monotonic, and while the results for three resolution levels are very close to each other, those for the highest one is markedly different. The examination here of additional mass bins reveals a more general picture: in the low mass bins, higher resolution results in lower differences, while in the high mass bins, highest resolution results in larger differences. This highlights an argument made in the discussion of Figure (3.2.2), namely that changes with resolution may arise due to the appearance of new physical processes or phenomena at higher resolution levels, for example in the mode of star formation, or the galactic dynamics. This highlights further the idea that our quantitative results are idiosyncratic to the particular physical model that is employed, in the broadest sense that involves also the numerical resolution, and cannot immediately be generalized to other numerical or physical setups, or to the real Universe. A similar discussion is relevant for the behavior of black hole mass differences (third row, right column).

The results for stellar mass (topright) are similar for all mass bins we consider except for the highest one. The stellar mass differences are significantly larger than those expected purely from Poisson noise, and instead decrease with increasing resolution in a way that appears to converge towards a finite value that is only mildly massdependent. Specifically, in all mass bins and resolution levels, when galaxies are represented by more than roughly stellar particles, the differences between shadow simulations become almost independent of the number of particles (even up to particles), and are typically . An exception to the appearance of convergence is the highest mass bin, which shows a rather sharp decrease between the three low resolution levels and the highest one. This is possibly related to the decreased scatter in the highmass, highresolution case shown in the bottom two panels, discussed next.

At the highest resolution level, the differences between shadow simulations show a nearly massindependent value, for (bottomleft) and for (bottomright). Perhaps surprisingly at first look, in the lower mass bins this appears to be a value towards which the lower resolution levels are converging, while in the higher mass bins the results are nonmonotonic with resolution, and in particular show a large decrease between the three lower resolution levels and the highest one. We hypothesize that this has to do with the onset of quenching in the high mass bins, and its sensitivity to resolution. In particular, if it is the case that the butterfly effect can determine whether a galaxy is quenched or not, large shadow pairwise differences are to be expected. Since the quenched fraction is high in high mass bins (e.g. Nelson et al., 2018), it should not be surprising that the differences are indeed seen to increase with mass. This is not the case, however, for the highest resolution level, where the results are more in line with the lower mass bins, potentially indicating weaker quenching at this high resolution. To test this hypothesis, we calculate the mean and overall scatter between the SFRs of galaxies (all galaxies, not between shadow galaxies) in the two highest mass bins at the resolution level. We find and for the means and and for the scatters, for the two bins respectively. For the resolution level, in contrast, strong quenching exists in these high mass bins, where the means are and and standard deviations and , respectively. This indeed serves as evidence in support of our hypothesis.

The results for the halfmass radius (second row, left) are remarkably insensitive to resolution variations and show little mass dependence, with a standard deviation of across this parameter space. The exceptions are lowmass bins at the lowest resolution, which contain only a few dozen stellar particles and hence show larger differences. Those however quickly reach their converged values already at the resolution level. Namely, all galaxies at all resolution levels, which are resolved by more than particles, show a roughly converged result. The results for the stellar metallicities (third row, left) are the most wellbehaved with resolution, showing both a monotonic and converging trend of decreasing differences as the resolution increases, and in particular results that are very similar between the two highest resolution levels.
4 Results: Scaling Relations
While so far we have quantified and discussed the differences that develop between shadow simulations one physical quantity at a time, we now turn to study relations between the differences in pairs of quantities, and the implications of those for our general understanding of ‘galaxy scaling relations’, namely correlations between several quantities within a population of galaxies. We begin by presenting an extension into two dimensions of Figure (3.1), which presented examples of onedimensional distributions of pairwise logarithmic differences between shadow galaxies. In Figure (4) we show several examples of how these differences in one quantity are related to those in another, using heat maps that represent the twodimensional distributions of differences in several such pairs of quantities. These are all based on the snapshot in the TNGmodel series of simulations that have been perturbed at . Each row shows a different combination of two quantities, with one panel per resolution level, increasing from left () to right ().
The first row of Figure (4) demonstrates that the differences between shadow galaxies in stellar mass and maximum circular velocity are positively correlated with substantial scatter. The situation is similar between stellar mass and SFR (second row). On the other hand, there appears to be a very mild anticorrelation between stellar mass and halfmass radius differences (third row), and no significant correlation between stellar mass and dark matter mass (bottom row). These (non/)correlations appear to be stable with resolution variation even as the magnitude of the differences themselves vary significantly in some cases (in particular, and ). We do not aim here to explain these results, but we discuss their implications.
If the differences between shadow galaxies in a pair of quantities relate to each other in a similar way to the mean relation between those quantities for a large galaxy population, then we can say that this pair of shadow galaxies are displaced with respect to one another ‘along’ the overall ‘scaling relation’ between those quantities. This holds also for the case of an anticorrelation that goes exactly in the opposite direction. If, however, the differences relate to each other in a different way, then the line connecting the two shadow galaxies is not parallel to the overall scaling relation, and there is a component that is perpendicular to it and parallel to its scatter.
Given uncorrelated differences between two quantities, on a galaxy population level, individual pairs of shadow galaxies would necessarily be displaced with respect to one another to some extent in a way that is perpendicular to the scaling relation between those two quantities. Some pairs would be displaced perpendicular to the relation, some parallel to it, and most in some intermediate direction. Figure (4) clearly indicates that that is the case for the pairs of quantities shown in the third and fourth rows, where there is no significant correlation between the differences, even though the quantities themselves certainly are correlated. But even in the case of the first and second rows, which do show some overall positive correlation that is indeed similar to the overall scaling relation between the two quantities, there is significant scatter. Hence, also in the case of the  plane, individual pairs of shadow galaxies are expected to show significant displacements in all directions.
This is demonstrated explicitly in Figure (4), which shows a scatter plot of the stellar mass and the maximum circular velocity of galaxies in the simulation set of the TNGmodel series. The full galaxy population in each of the three simulations in this set is shown with small dots of a different color, clearly delineating (a version of) the wellknown TullyFisher relation and its scatter. In addition, twelve triplets of shadow galaxies are shown using large black symbols, each with a unique symbol. Some of them (crosses, hexagrams) are displaced roughly in parallel to the overall slope of the mean scaling relation. Some, however, are displaced roughly in the perpendicular direction (asterisks, diamonds). Some do not have a strong preferred direction (triangles), while some are displaced mostly along one of the axes (pentagrams, circles). It is possible that shadow versions of certain galaxies indeed intrinsically tend to be displaced in certain preferred directions, or perhaps these particular cases are just random draws from an underlying distribution of displacements that is similar for all galaxies. To distinguish these two possibilities would require having a large number of shadow versions for a sizable number of galaxies, but since we only have a small number of shadow versions for each galaxy (though for a large number of galaxies), our setup does not allow addressing this specific question further.
Figure (4) suggests visually that the scatter between shadow versions of individual galaxies may be comparable to the overall scatter in certain scaling relations in our simulations. In Figure (4) this notion is quantified (using the procedure described in Section 2) for a selection of six scaling relations with the TNGmodel series: the TullyFisher relation  (top left), the black hole massstellar mass relation  (middle left), the star formation main sequence  (bottom left), the sizemass relation  (top right), and the baryonic conversion efficiency  (middle right), and the massmetallicity relation  (bottom right). In particular, what is shown as a function of postperturbation time is the ratio between the inferred standard deviations between shadow galaxies in the direction perpendicular to the various scaling relations and the standard deviations of the full galaxy population in that same direction, namely the intrinsic scatter of the relations. This can be thought of as the fractional contribution of the butterfly effect to the total scatter of the relations. More precisely, under the reasonable assumption that the butterfly effect and additional effects contribute to the scatter independently, and hence contributions should be summed in squares, the square of the quantity shown on the vertical axes is the fractional contribution of the butterfly effect to the variance of the scaling relations.
Figure (4) presents what we regard as the central result of this work. It shows that with the TNG model the butterfly effect contributes at late cosmic epochs of the variance (the square of the scatter) around the TullyFisher relation, the  relation, and the the star formation main sequence (three left panels). For the former relation, this contribution is for most of cosmic time (), while for the latter, it is throughout this time window. These results are very convincingly converged. In contrast, the contributions of the butterfly effect to the sizemass, baryonic conversion efficiency and stellar massmetallicity scaling relations (three right panels) are much smaller and less clearly converged with increasing resolution. At our highest resolution, the contribution at late times to the variance around the these relations is .
It is interesting to consider which of the two quantities making up each of the relations contributes more significantly to these results. The cases of the ,  and  relations are all similar: the relations themselves are roughly linear, the differences between shadow galaxies in the two quantities making up the relation are uncorrelated, and one of them is larger than than the other, making it the dominant contribution. As can be seen in Figure (3.2.2), the differences in stellar mass are larger than those in dark matter mass, making the butterfly effect for the stellar mass dominate its relative contribution to the scatter in the  relation. Similarly, the differences in SFR and those in black hole mass are larger than those in stellar mass, making the former two dominate the total contribution of the butterfly effect to the scatter in the star formation main sequence and the  relation, respectively. For the sizemass and massmetallicity relations, the picture is slightly different, since the relations themselves are not linear but sublinear. This means that differences in stellar mass contribute less significantly to the scatter in these relations than equal differences in size or metallicity, since those in stellar mass displace galaxies more parallel to the relation than perpendicular to it. The implication of this is that the differences in size (metallicity) dominate the butterfly effect contribution to the scatter in the sizemass (massmetallicity) relation. All these results hold true at all the resolution levels we probed.
The case of the TullyFisher relation is most involved in this respect. At low resolution, the differences in stellar mass are an order of magnitude larger than those in maximum circular velocity (as can be seen in Figure (3.2.2)), hence in spite of the flatness of the  relation, the differences in stellar mass dominate the contribution to the overall scatter in the relation. This is however driven by the nonconvergence of the stellar mass differences at the resolution level. At the highest resolution, in comparison, the differences in are smaller, while the differences in are similar, rendering the latter the dominant contributor to the overall butterfly effect contribution to the scatter in the relation. Note that it is still the case even at the highest resolution that the differences in are smaller than those in , but since the  relation is sublinear, the former are nevertheless dominant in the total scatter of the relation.
5 Summary and Discussion
5.1 Summary
In this paper we investigate the response of cosmological simulations, in particular hydrodynamical ones that include models for galaxy formation, to minute perturbations to their initial conditions. The main metrics we use are global, integrated properties of galaxies such as their mass, peak circular velocity, or starformation rate, and our samples contain hundreds to thousands of galaxies since our simulations are of uniformresolution cosmological boxes. We find that minute differences, close to the machine precision, that we introduce between sets of otherwise identical ‘shadow’ simulations at early cosmic times grow over billions of years by many orders of magnitude. We hence determine that ‘the butterfly effect’ is present in our cosmological hydrodynamical simulations. To understand whether the magnitude of the effect is large enough to be of general interest for galaxy formation theory, we quantify the typical uncertainty on various simulated galaxy properties that the effect induces, and moreover quantify the contribution of the effect to the scatter in various galaxy scaling relations. Before further discussing our results as well as their relation to the real Universe, we summarize them as follows.

Figures (3.2.1) and (3.2.2): The divergence rate between shadow simulations, and in particular the existence of a saturation level and its magnitude, is not universal but varies with the considered quantity, the physics included in the simulation, and numerical resolution. Generally speaking, the resolution dependence of the results is much weaker in simulations that include stellar and black hole feedback than in those that include no feedback. This implies that at the highest resolution we consider, which is better than that of the Illustris and TNG100 simulations, differences between shadow simulations that include feedback are larger than between those that do not.

Figure (3.2.2): After of cosmic evolution, the differences between shadow simulations with our fiducial feedback model in terms of all the baryonic galaxy properties that we explored are still growing. At our highest resolution, by they reach a level of (namely, a few percents) for peak circular velocity, (namely, tens of percents) for stellar halfmass size, starformation rate and angle between halo and galaxy angular momentum vectors, and in between those values for stellar mass. The differences of dark matter mass, on the other hand, reach a constant level of already after of evolution.

Figures (4) and (4): On a galaxybygalaxy basis, the differences between shadow galaxies in the values of different properties are largely uncorrelated. This means that for the scaling relations between two distinct galaxy properties that we examined, for example the TullyFisher relation, the separations between sets of shadow galaxies are sometimes roughly aligned with the relation but sometimes are roughly perpendicular to it. In other words, the scatter about (i.e. perpendicular to) scaling relations arises not only due to macroscopic differences in initial conditions between different galaxies, which determine e.g. the largescale tidal field and the timings and mass ratios of mergers in their formation history, but also due to the sensitivity of the final galaxy properties to the ‘microscopic’ initial conditions.

Figure (4): Quantifying the previous point, we find that the scatter perpendicular to the TullyFisher relation between shadow galaxies with reaches, at late cosmic times, a value that is approximately of the total scatter in the relation. This means that about one half of the variance around the mean relation arises from the chaoticlike behavior of the simulation. Similar or even higher values are found for the sequence of starforming galaxies between their SFR and their stellar mass and for the relation between black hole mass and host galaxy stellar mass. In contrast, for the relations between stellar mass and halo mass as well as between stellar size or metallicity and stellar mass, the contribution of the butterfly effect to the overall relation scatter is not converged and is lower at higher resolutions. In particular, at our highest resolution, the butterfly effect only contributed a few percents of the variance about these relations.
5.2 Implications for Interpretation of Simulations
First, we should emphasize that, as a result of sample variance, the effect we have studied, namely differences between shadow galaxies, in principle propagates into differences of the properties of the ensemble of galaxies between shadow simulations. Namely, ensemble properties such as the mean and scatter of scaling relations between two quantities or the total stellar mass or SFR in the simulation box may differ between two shadow simulations simply because each and every galaxy is different to a certain extent from its shadow. However, we consider this effect to be unimportant in most cases, since ensemble differences shrink towards zero as the number of galaxies increases, due to the central limit theorem. In other words, if the simulation volume is small and contains only a small number of galaxies, ensemble properties of those galaxies will be sensitive to the individual galaxies; instead, for larger and larger number of galaxies in the ensemble, the differences between the shadows will tend to cancel out more and more completely, leaving the average statistical properties of the ensemble of galaxies less and less affected. Nevertheless, in regimes in which a small number of galaxies is considered, for example by applying some cuts in a multidimensional parameter space, the ensemble properties may be strongly enough affected by the uncertainty of the properties of the individual galaxies the ensemble is comprised of. For example, the stellar mass function at the highestmass end of any simulation box is by definition based on a small number of the most massive galaxies in the simulation. The uncertainty on those masses implied by the butterfly effect (e.g. at a level of a few percent, Figure (3.3)) will then translate to a similar level of horizontal uncertainty of the mass function itself. This in turn, if the mass function is steep, will translate into a larger vertical uncertainty, which may be needed to be taken into consideration.
A distinct implication of this work is on our ability to explain the scatter in scaling relations or in galaxy properties using deterministic considerations. If for given initial conditions and a given physical model, each galaxy may occupy a finite rather than infinitesimal region in property space, then its properties can only partially be predicted based on its initial conditions and a set of physical arguments, or in other words, there is a limit to the degree to which one can ‘understand’ the properties of that galaxy. When applied to a galaxy population, this kind of argument implies that only a fraction of the scatter in galaxy properties or in correlations between them can be understood, and once a correct model explains that fraction, the understanding is in fact complete. If the butterfly effect exists in real galaxy formation as it does in our simulations, a possibility discussed below, then these arguments apply to the scatter and scaling relations of galaxies in the real Universe.
If, however, the effect we measure in the simulations does not exist or is much larger than in the real Universe, then the implication may be that some of the simulated scatter is artificially inflated by the numerics. In this case, care should be taken when comparing the simulated scatter to the observationallyinferred one. For example, if the simulated scatter is smaller than the intrinsic scatter inferred from observations (as has been argued to be the case for the TullyFisher relation, e.g. McGaugh, 2012, and as is probably the case for the black holestellar mass relation, e.g. Weinberger et al., 2017b), then the tension between the two may in fact be even starker than it might appear without considering the numerical butterfly effect. Conversely, if the simulated scatter is larger than observed, the numerical butterfly effect could account for the discrepancy.
In these considerations we have implicitly assumed that scatter driven by the butterfly effect is independent of, and can be added e.g. in quadrature, to other sources of scatter, namely scatter arising from ‘macroscopic’ differences between the environments and initial conditions of different galaxies. However, this does not necessarily have to be the case. If all the butterfly effect did was ‘shuffling’ galaxy properties between different galaxies, the methods used in this work would detect a nonzero butterfly effect, while the overall scatter between galaxies would not increase.
A more specific scenario where such a situation could arise is one where the scatter between galaxies is associated with short time scale oscillations of galaxy properties. These oscillations could be driven by some physical process regardless of the butterfly effect. In this case, the butterfly effect can be thought of as merely determining the ‘phase’ of each galaxy within the oscillation pattern, but neither as the driver of the pattern itself nor of the scatter of galaxy properties associated with it. In such a scenario, the evolution paths of shadow galaxies in some physical property space will be oscillating around some mean path and possibly recurrently crossing each other (as opposed to monotonically drifting away from each other). In this case, measuring the butterfly effect on timeaveraged galaxy properties will result in a diminished effect compared to instantaneous properties. We leave further considerations along these lines to future work, but point out that in the single case we have examined using a timeaveraged measurement, namely that of , the magnitude of the butterfly effect we found was very close to that of the instantaneous property . Note that even in the oscillatory scenario, our measurement of the magnitude of the butterfly effect is an indication of the level to which the properties of individual galaxies can(not) be predicted from first principles, even if the level of scatter between galaxies can be attributed to the physical processes driving the oscillations rather than to the butterfly effect itself.
Our work has implications also for the interpretation of differences between simulations with different physical models or numerical schemes on an objectbyobject basis, most notably in the context of ‘zoomin’ simulations. For a conclusion to be drawn that the properties of a certain simulated galaxy differs between two simulations due to changes to the numerical scheme (including thereby both physical processes and their particular implementation), it has to first be determined that these differences do not arise due to the butterfly effect alone. Unless a large ensemble of shadow simulations is available, which is normally not the case, this implies that the changes have to be significantly larger than the typical magnitude of the butterfly effect in order to be considered ‘real’. A complication that arises is that the magnitude of the butterfly effect on the considered quantity it is not apriori known, since as we have demonstrated here, that magnitude itself varies with the physical model as well as with numerical resolution. That Keller et al. (2018) found an opposite effect of feedback to the one we found, namely that in their simulations feedback acts to reduce the magnitude of the butterfly effect rather than enhance it as in ours, serves as further evidence that the dependence on physical and numerical approach may be significant, and is complicated.
5.3 Implications for Galaxy Formation in the Real Universe
A fundamental question that underlies the work presented here is whether the effect we identified is purely numerical, namely applies only to the simulated systems, or physical, in the sense that it exists in the real Universe as well. We do not have a clear answer to this question, but we discuss several interesting aspects of it.
First, one can ask whether the inherent limited accuracy of the integration of the gravity and hydrodynamics equations may be introducing chaos into the system. For example, an infinitesimal change in the position of a dark matter particle or a gas meshgenerating point may result in a finite change to the forces or the fluxes due to a finite change in the structure of the gravity tree or the geometry of the mesh. This clearly results in the amplification of some differences, at least locally and on short time scales. The fundamental question is however whether it is this kind of amplification that builds up gradually towards the macroscopic effect we have quantified, or whether those purely numerical effects tend to cancel out.
A second, related question is whether the use of probabilistic modeling in the simulations (which cannot be trivially converted into continuous/nonprobabilistic formulations) introduces chaos into the numerical system that does not exist in the physical reality. The probabilistic algorithmic implementation uses random number generators to control several physical processes. We find that infinitesimal changes in our simulations may result in discrete changes of finite magnitude within a single simulation time step due to changes in the ‘field’ of random numbers as a function of space and time. Indeed, we show in Appendix B.2 that differences between shadow simulations grow faster once their respective random number sequences are effectively no longer the same.
We believe that it is at least plausible that while these numerical drivers of the butterfly effect exist in the simulations but not in physical reality, our results do largely apply to galaxy formation in the real Universe as well, due to physical drivers of the butterfly effect. This is because galaxies contain chaotic systems of various natures and scales, which inject chaos into the galactic scale in analogy with the purely numerical factors described above. For example, even in a purely gravitational system and in the absence of discrete effects in the force calculation, some satellite galaxies and stars are on truly chaotic orbits within their dark matter halos. Further, if turbulence in molecular clouds is truly chaotic (e.g. Deissler, 1986; Bohr et al., 2005) then chaos determines where and when individual stars form and hence where and when they explode. These are ‘discrete’ events that are analogs to the choice of a random number to determine, e.g., the birth time and place of a star in the unresolved interstellar medium in our simulations. In some aspects our simulations are most likely to actually suppress chaos that exists in reality. For example, the flow in the interstellar medium in our simulations is less turbulent than in reality due to numerical viscosity. Another example is the lack of stochasticity in the sampling of the initial mass function (IMF) in our simulations, in which each stellar population is comprised of a ‘smooth’, idealized IMF.
While the nature of chaos injection from small scales into the galactic scale differs between our simulations and reality, as discussed, it is possible that the growth of differences in macroscopic galaxy properties that is exhibited by our simulations captures a real phenomenon. This is the third, ‘dynamical’, phase discussed at the end of Appendix B.2, during which the growth of differences is no longer exponential but instead powerlaw or slower, but during which most of the growth in absolute terms is achieved. It is instructive in this context that we find a significant difference in the characteristics of the butterfly effect between our simulations with and without feedback. In spite of having the same smallscale chaos drivers such as roundoff errors, discreteness effects and random numbers as the simulations without feedback, those with feedback result in a stronger butterfly effect. This suggests that it is the nature of the dynamics on galactic scales that determines the degree to which ‘random’ differences e.g. in the formation sites of stars develop into global differences in galaxy properties.
Even under the assumption that this is indeed the case, our work can nevertheless not yet determine with great certainty what is the magnitude of the butterfly effect on galaxy formation in the real Universe. Additional work would be required in order to characterize and understand the dependence of this magnitude on the physical models used in the simulation. It is possible that eventually only an accurate simulation of galaxy formation, perhaps much more accurate than ours, will be reliable enough to parallel the real Universe in this respect.
We thank Paul Torrey for comments on the manuscript. The simulations analyzed in this study were run on the Iron cluster at the Flatiron Institute and the GordonSimons system at the San Diego Supercomputer Center, as well as on the Stampede supercomputer at the Texas Advanced Computing Center through XSEDE allocation AST160026. We are grateful to the scientific computing teams at all these three institutions for their continual and dedicated technical assistance and support. The Flatiron Institute is supported by the Simons Foundation. GLB acknowledges support from NSF grant NNX15AB20G and NSF grant AST1615955. RW acknowledges support through the European Research Council under ERCStG grant EXAGAL308037, and would like to thank the Klaus Tschira Foundation.
References
 Adams (2004) Adams, F. C. 2004, in Astronomical Society of the Pacific Conference Series, Vol. 323, Star Formation in the Interstellar Medium: In Honor of David Hollenbach, ed. D. Johnstone, F. C. Adams, D. N. C. Lin, D. A. Neufeeld, & E. C. Ostriker, 357
 Bate et al. (2010) Bate, M. R., Lodato, G., & Pringle, J. E. 2010, MNRAS, 401, 1505, doi: 10.1111/j.13652966.2009.15773.x
 Benhaiem et al. (2018) Benhaiem, D., Joyce, M., Sylos Labini, F., & Worrakitpoonpon, T. 2018, MNRAS, 473, 2348, doi: 10.1093/mnras/stx2444
 Bohr et al. (2005) Bohr, T., Jensen, M. H., Paladin, G., & Vulpiani, A. 2005, Dynamical Systems Approach to Turbulence (Cambridge Nonlinear Science Series) (Cambridge University Press)
 Davis et al. (1985) Davis, M., Efstathiou, G., Frenk, C. S., & White, S. D. M. 1985, ApJ, 292, 371, doi: 10.1086/163168
 Deissler (1986) Deissler, R. G. 1986, The Physics of Fluids, 29, 1453, doi: 10.1063/1.865663
 ElZant et al. (2018) ElZant, A., Everritt, M., & Kassem, S. 2018, ArXiv eprints. https://arxiv.org/abs/1804.06920
 Fux (2001) Fux, R. 2001, A&A, 373, 511, doi: 10.1051/00046361:20010561
 Genel et al. (2014) Genel, S., Vogelsberger, M., Springel, V., et al. 2014, MNRAS, 445, 175, doi: 10.1093/mnras/stu1654
 Gunn & Gott (1972) Gunn, J. E., & Gott, III, J. R. 1972, ApJ, 176, 1, doi: 10.1086/151605
 Hahn & Abel (2011) Hahn, O., & Abel, T. 2011, MNRAS, 415, 2101, doi: 10.1111/j.13652966.2011.18820.x
 Heggie (1991) Heggie, D. C. 1991, in Predictability, Stability, and Chaos in NBody Dynamical Systems, ed. S. Roeser & U. Bastian, 47–62
 Hernquist (1987) Hernquist, L. 1987, ApJS, 64, 715
 Kaurov (2017) Kaurov, A. A. 2017, ArXiv eprints. https://arxiv.org/abs/1709.04353
 Keller et al. (2018) Keller, B. W., Wadsley, J. W., Wang, L., & Kruijssen, J. M. D. 2018, ArXiv eprints. https://arxiv.org/abs/1803.05445
 Laskar (1989) Laskar, J. 1989, Nature, 338, 237, doi: 10.1038/338237a0
 Maffione et al. (2015) Maffione, N. P., Gómez, F. A., Cincotta, P. M., et al. 2015, MNRAS, 453, 2830, doi: 10.1093/mnras/stv1778
 Marinacci et al. (2017) Marinacci, F., Vogelsberger, M., Pakmor, R., et al. 2017, ArXiv eprints, 1707.03396. https://arxiv.org/abs/1707.03396
 Matthee & Schaye (2018) Matthee, J., & Schaye, J. 2018, ArXiv eprints. https://arxiv.org/abs/1805.05956
 McGaugh (2012) McGaugh, S. S. 2012, AJ, 143, 40, doi: 10.1088/00046256/143/2/40
 McGaugh & Schombert (2015) McGaugh, S. S., & Schombert, J. M. 2015, ApJ, 802, 18, doi: 10.1088/0004637X/802/1/18
 Naiman et al. (2018) Naiman, J. P., Pillepich, A., Springel, V., et al. 2018, MNRAS, doi: 10.1093/mnras/sty618
 Nelson et al. (2018) Nelson, D., Pillepich, A., Springel, V., et al. 2018, MNRAS, 475, 624, doi: 10.1093/mnras/stx3040
 Pillepich et al. (2018a) Pillepich, A., Springel, V., Nelson, D., et al. 2018a, MNRAS, 473, 4077, doi: 10.1093/mnras/stx2656
 Pillepich et al. (2018b) Pillepich, A., Nelson, D., Hernquist, L., et al. 2018b, MNRAS, 475, 648, doi: 10.1093/mnras/stx3112
 PriceWhelan et al. (2016a) PriceWhelan, A. M., Johnston, K. V., Valluri, M., et al. 2016a, MNRAS, 455, 1079, doi: 10.1093/mnras/stv2383
 PriceWhelan et al. (2016b) PriceWhelan, A. M., Sesar, B., Johnston, K. V., & Rix, H.W. 2016b, ApJ, 824, 104, doi: 10.3847/0004637X/824/2/104
 Sellwood & Debattista (2009) Sellwood, J. A., & Debattista, V. P. 2009, MNRAS, 398, 1279, doi: 10.1111/j.13652966.2009.15219.x
 Sparre et al. (2014) Sparre, M., Hayward, C. C., Springel, V., et al. 2014, ArXiv eprints. https://arxiv.org/abs/1409.0009
 Springel (2010) Springel, V. 2010, MNRAS, 401, 791, doi: 10.1111/j.13652966.2009.15715.x
 Springel & Hernquist (2003) Springel, V., & Hernquist, L. 2003, MNRAS, 339, 289, doi: 10.1046/j.13658711.2003.06206.x
 Springel et al. (2001) Springel, V., White, S. D. M., Tormen, G., & Kauffmann, G. 2001, MNRAS, 328, 726, doi: 10.1046/j.13658711.2001.04912.x
 Springel et al. (2005) Springel, V., White, S. D. M., Jenkins, A., et al. 2005, Nature, 435, 629, doi: 10.1038/nature03597
 Springel et al. (2018) Springel, V., Pakmor, R., Pillepich, A., et al. 2018, MNRAS, 475, 676, doi: 10.1093/mnras/stx3304
 Tacchella et al. (2016) Tacchella, S., Dekel, A., Carollo, C. M., et al. 2016, MNRAS, 457, 2790, doi: 10.1093/mnras/stw131
 Teyssier (2002) Teyssier, R. 2002, A&A, 385, 337, doi: 10.1051/00046361:20011817
 Thiébaut et al. (2008) Thiébaut, J., Pichon, C., Sousbie, T., Prunet, S., & Pogosyan, D. 2008, MNRAS, 387, 397, doi: 10.1111/j.13652966.2008.13250.x
 Tully & Fisher (1977) Tully, R. B., & Fisher, J. R. 1977, A&A, 54, 661
 Vogelsberger et al. (2013) Vogelsberger, M., Genel, S., Sijacki, D., et al. 2013, MNRAS, 436, 3031, doi: 10.1093/mnras/stt1789
 Vogelsberger et al. (2012) Vogelsberger, M., Sijacki, D., Kereš, D., Springel, V., & Hernquist, L. 2012, MNRAS, 425, 3024, doi: 10.1111/j.13652966.2012.21590.x
 Vogelsberger et al. (2014a) Vogelsberger, M., Genel, S., Springel, V., et al. 2014a, Nature, 509, 177, doi: 10.1038/nature13316
 Vogelsberger et al. (2014b) —. 2014b, MNRAS, 444, 1518, doi: 10.1093/mnras/stu1536
 Wadsley et al. (2017) Wadsley, J. W., Keller, B. W., & Quinn, T. R. 2017, MNRAS, 471, 2357, doi: 10.1093/mnras/stx1643
 Weinberger et al. (2017a) Weinberger, R., Springel, V., Hernquist, L., et al. 2017a, MNRAS, 465, 3291, doi: 10.1093/mnras/stw2944
 Weinberger et al. (2017b) Weinberger, R., Springel, V., Pakmor, R., et al. 2017b, ArXiv eprints. https://arxiv.org/abs/1710.04659
APPENDIX
Appendix A Dark Matteronly simulations
In Figure (A) we show that the initial minute perturbations we apply to particle positions at evolve into percentlevel differences in halo properties even in the DMonly case. At early times the various resolution levels evolve similarly. At late times, however, lower resolution levels show continuously increasing differences, while higher resolution levels show a weaker growth rate, which for the highest level, , appears already as a plateau at . For mass (right panel), this plateau level is however still very close to the one expected just from shot noise given the finite number of dark matter particles in the halos. Hence, the four resolution levels we have are not enough to clearly determine whether the result is converged or will continue shrinking with even higher resolution. This is different from the cases with hydrodynamics and galaxy formation models discussed in the main part of the paper.
Appendix B Special simulations for numerical verification
This Appendix has several goals: explore the sensitivity of the results to several numerical nuisance parameters, briefly explore the butterfly effect in pure Nbody cosmological simulations containing only dark matter, and support the interpretation of our main results with regards to the role of the usage of random numbers in the baryonic physics models.
b.1 The Case of Pure Dark Matter Simulations
Table B.1 provides an overview of the additional sets of simulations shown in this Appendix. The special features distinguishing these simulations from the fiducial ones fall into three categories: (i) greater numerical integration accuracy through the usage of smaller simulation time steps, (ii) variations of the magnitudes of the initial perturbations applied to the initial conditions of the shadow simulations, and (iii) a different usage of random numbers. We run several sets of DMonly verification simulations, all at resolution level , as well as several with the TNG model, mostly at resolution level .
Simulation type  Physics model  Resolution level  Line style 

x smaller simulation time step (individually)  DMonly  black asterisks, Figures (B.1) and (B.1)  
x smaller gravity tree opening angle  DMonly    
x smaller maximum simulation time step (globally)  DMonly  magenta triangles, Figures (B.1) and (B.1)  
x smaller maximum simulation time step (globally)  DMonly  magenta dots, Figures (B.1) and (B.1)  
x smaller maximum simulation time step (globally), and only a single particle is initially perturbed  DMonly  red circles, Figures (B.1) and (B.1)  
x larger initial perturbation  DMonly  cyan squares, Figures (B.1) and (B.1)  
x larger initial perturbation  TNG model    
x smaller maximum simulation time step (globally)  TNG model  magenta triangles, Figure (B.2)  
x smaller maximum simulation time step (globally) and different usage of random numbers, method 1  TNG model  purple pentagrams, Figure (B.2)  
x smaller maximum simulation time step (globally) and different usage of random numbers, method 2  TNG model  gray crosses, Figure (B.2) 
table \hyper@makecurrenttable
Table 0. \Hy@raisedlink\hyper@@anchor\@currentHrefAn overview of the numerical verification simulations. We have generated six numerical verification pairs of shadow DMonly simulations at resolution level , five of which are shown in Figures (B.1) and (B.1). Each of these pairs has a unique difference in its setup with respect to the fiducial simulations discussed throughout the paper, as briefly summarized in the first column and discussed in detail in the Appendix. One unique pair was run with the TNG model, which produces virtually identical results to the fiducial case and is therefore not explicitly shown. Three numerical verification pairs were run at resolution level with the TNG model, which are presented in Figure (B.2).
Figure (B.1) presents the standard deviations of the differences distributions between shadow subhalos in DMonly simulations, similarly to Figure (A), but for the numerical verification sets. The left panel shows the usual loglog view, while the right panel shows linear time on the horizontal axis, and only up to . It becomes clear from examination of the right panel that at early times, , the evolution can be well fit by an exponential growth of the differences in time. This is a characteristic of chaotic systems, and the thick blue lines in the topleft of that panel indicate exponential growth rates with Lyapunov time scales of and , roughly bracketing the slopes seen for the various cases in their initial phases. It is worth noting that these time scales are perhaps not surprising given that the dynamical time of dark matter halos at the time the perturbations are applied is .
The fiducial case (at the level, as all other simulations in Figure (B.1)) is shown in green, and on the left panel it is identical to the line with the same style in Figure (A). One of the special cases, where the simulation time step was decreased uniformly by a factor of for all particles (black asterisks), appears to behave essentially just like the fiducial case. The same holds for an additional case, which is listed in Table B.1 but not shown in Figure (B.1) for visual clarity, where the force calculation accuracy of the tree algorithm was significantly increased by decreasing the node opening angle threshold (Hernquist, 1987). However, when forcing the time steps of all particles to a common, smaller maximum time step (magenta triangles and dots for factors of and compared to the fiducial simulations, respectively), then the results are affected. In particular, a very similar evolution occurs, namely exponential with a similar time scale, but it is delayed with respect to the fiducial case. The origin of this behavior will be elucidated when discussing the next figure.
Before doing so, we point out the two sets where the nature of the initial perturbation has been modified. In one case (cyan squares), each particle is displaced initially at its eighth significant digit instead of the fifteenth, namely by up to instead of (corresponding roughly to a ‘single precision’ perturbation). This results in differences that are initially about two orders of magnitude larger than in the fiducial case, but the initial growth is still exponential with a similar time scale, such that a plateau is reached earlier – but to the same level as in the fiducial case. In the second case (red circles), the initial perturbation (at the fiducial magnitude) is applied only to a single particle in the whole simulation box. In this case, nonzero differences take to appear, but thereafter evolve similarly and after several more billions of years reach again the same level as the fiducial case. Next we discuss the nature of this delay as well.
In Figure (B.1) we present the time evolution of a different kind of quantity from the ones discussed so far. In this case, it is the rootmeansquare (RMS) of the distances between the positions of matched individual dark matter particles between pairs of shadow simulations. This is a useful quantity because unlike differences in global subhalo properties, these distances are continuous and are directly related to the initial perturbation, which is implemented as a displacement of particle positions. Indeed, the left panel shows that the cases with the fiducial kind of perturbation begin at a level of , as prescribed, and the case with the larger initial perturbation is correspondingly at . Finally, the case where only a single particle is perturbed appears initially at a level of , which is indeed what is expected given that the number of particles in our DMonly simulations is .
The left panel of Figure (B.1) shows the usual loglog view, while the middle and right panels show time on a linear axis, up to in the former, and further zooming in on in the latter, which also focuses on a particular range on the vertical axis (as symbolized by long dashed brown lines) that includes only the simulations with the fiducial kind of perturbations. In the middle panel it is seen that all simulation types show in the first of evolution an exponential growth of the RMS distance between shadow particles, with a Lyapunov time consistent with the dynamical time of dark matter halos at (, defined as of the contemporaneous Hubble time), as indicated by the thick blue line at the bottom of the panel^{11}^{11}11We have confirmed that it is indeed dark matter particles that are part of dark matter halos that drive the growth, rather than particles outside of halos, see also Thiébaut et al. (2008).. This is very similar to the evolution of the differences seen in Figure (B.1). Also similar is the transition to an approximate powerlaw growth at and the convergence by to a very similar result in all cases despite their vastly different initial stages and evolutions^{12}^{12}12We note that the initial imposed correlation between the magnitude of the perturbation and the position in the box is gradually erased over time, as the results converge towards a value that is independent of the magnitude of the initial perturbation (this happens even faster in the TNG model case than with DM only). This is analogous to the shrinking differences between simulations with different overall magnitudes of initial perturbations..
However, a close examination of the first in the right panel of Figure (B.1) demonstrates that in the very initial phase, the ‘effective’ Lyapunov time actually differs between the different simulations. In particular, the growth in the fiducial case (green) in the first few million years has no exponential form altogether, but instead can be seen in the left panel to be a powerlaw with an index of unity. The ‘effective’ growth time scales when exponential fits are forced, indicated with blue thick lines and associated time scales, become longer as the maximum simulation time step is reduced. Importantly, in the most aggressive case (magenta dots), the growth time scale appears to converge to the level that then continues throughout the first , as seen in the middle panel. We interpret these results to imply that the fiducial case is affected by numerical accuracy errors that present themselves as an early powerlaw growth with an artificially short associated growth time scale, but that these can be mitigated by using shorter time steps, in which case the evolution from the very beginning is physical and is on the expected, dynamical time scale^{13}^{13}13In the case where only a single particle is perturbed, the growth rate of the RMS distance before the exponential growth sets in is a power law versus time with an index of . The same power law holds for the number of particles with a nonzero distance between the shadows (not shown). A possible interpretation of this growth rate is that it corresponds to the growth of the volume of a perturbation in the spherical collapse model (Gunn & Gott, 1972)..
The lack of an initial artificially fast growth rate in the simulations that use aggressively short time steps, seen clearly in the right panel, results in an effective delay with respect to the other simulations, which is clearly seen in the middle and left panels. The intermediate cases too (magenta triangles and black asterisks) are delayed with respect to the fiducial case, by an intermediate amount^{14}^{14}14This is more significant and seen clearly in the case of the factor smaller maximum time step (magenta triangles), where the initial growth rate is intermediate too (). In the case of individually smaller time steps by a factor of (black asterisks), the generation of the ‘delay’ is unresolved by the snapshot time separations we have available, i.e. it occurs at .. This brings us back to the delay in the appearance of differences that we observed in Figure (B.1). Simulations with smaller time steps first show nonzero differences at a later time because it takes them longer to develop substantial RMS particle position differences, which are necessary to give rise to differences. A similar case applies for the large delay of the simulation where only a single particle is perturbed. In particular, we observe that it is common to all simulations that nonzero differences appear once and only once their RMS particle differences reach a level of , which occurs at different times in the various cases.
We conclude that as long as the initial perturbation of positions is large enough such that given a growth rate that is roughly of the Hubble time it has enough time to grow to a level of , then it is large enough to develop into macroscopic (percentlevel) differences in global subhalo properties such as .
b.2 The Case of Hydrodynamical Simulations with the TNG Model
Figure (B.2) is a combination analogous to Figures (B.1) and (B.1), but for test simulations based on the TNG model rather than on dark matter only. Figure (B.2) presents differences and Figure (B.2) RMS particle position differences, in both cases the left panel on a logarithmic time axis and the right panel on a linear time axis limited to . First we note that in the fiducial case (orange squares) we do not see a gradual growth of the differences as in Figure (B.1) but they appear directly at a level of . The same holds even for the case with shorter time steps (magenta triangles), where earlier snapshots are available, and where this level of differences is then seen as early as after the perturbation. Also, the RMS particle position differences show a powerlaw growth with time as early as can be probed, instead of the exponential growth seen in Figure (B.1). This suggests that a different mechanism is at play in generating the differences with respect to the DMonly case.
We can gain significant insight from an additional set that has no analog in the DMonly case, in which the treatment of random numbers is modified compared to the fiducial model (and at the same time the maximum time step is limited in the same way as discussed above, in order to have high accuracy and high time resolution). In the fiducial TNG model^{15}^{15}15Note that this aspect of the TNG model is inherited from the original starformation subgrid model of Springel & Hernquist (2003)., there is a single stream of random numbers that is used on each MPI^{16}^{16}16Message Passing Interface. task during the calculation, and each time a starforming gas cell requires a random number to determine whether to turn into a star or wind particle, it draws the next number from that stream. This means that changes such as to the number of starforming cells in the simulation, the way they are distributed between MPI tasks, or the sizes of their individual time steps, all necessarily affect the random number series that each and every gas cell in the simulation is using. This in turn means that, for example, changing one cell in the simulation box from nonstarforming to starforming will immediately result in star formation and wind ejection events occurring in modified positions and times throughout the whole simulation volume (and hence, clearly, involving a superluminal flow of information).
In an attempt to control and mitigate this unphysical effect, which normally has no adverse consequences but critically pertains to our study here, we introduced a change where the random numbers are a deterministic function of time step and of coarsegrained position (‘method 1’ in Table B.1). In other words, for each simulation time step and coarsegrained spatial position, there exists a single, welldefined random number. The level of coarsegraining used is . In this way, a change to a single cell as in the example above only propagates through more ‘local’ effects. For example, a change in one cell will affect the dynamics of its neighboring cells, some of which then might move from one coarsegrained ‘pixel’ to the other and hence have their random number series changed, inducing further changes around them. Such cascade is however expected to take longer to develop than in the fiducial case. However, once the changes are significant enough to induce a change in the overall time stepping sequence of the simulation, by skipping even a single time step in one of the simulations with respect to the other, due to even a single particle requiring a shorter time step, then in subsequent times the behavior will be identical to the fiducial case, in that every cell in the simulation will be affected by a different set of random numbers between the two shadow simulations.
The results from the set with modified random number treatment according to this ‘method 1’ are presented in Figure (B.2) with purple pentagrams. In Figure (B.2) we indeed find that initially the differences are at a very small level of , similarly to the very early times in the DMonly case. They then show a step function to the level of the fiducial case at . We interpret this delay to imply that for the first after the perturbation, the random number sequences that determine the evolution of individual cells are for the most part identical between the two shadow simulations. This is confirmed by examining the time step sequences of the two, which indeed are found to be identical in this case for steps representing of evolution, at which point one of the two simulations introduces one additional short time step as required for evolving one single cell at an earlier time than its shadow simulation, thereby decoupling the random number sequences of the two simulations from each other.
The decoupling of the random number sequences between the two shadow simulations has a strong effect on the evolution of the RMS particle position differences, seen in Figure (B.2). Following that critical time, which in the fiducial case occurs essentially immediately at the perturbation time, and in the modified case occurs at , the RMS distance between shadow particles evolves very accurately like a power law with index for approximately . We do not yet have an explanation for this behavior. Note that in the case of the set with modified random numbers, this quantity goes as , which accounts for the steep transition region seen in the left panel around .
In a further numerical experiment (‘method 2’ in Table B.1), we make the random numbers a deterministic function of both coarsegrained space and coarsegrained cosmological scale factor, such that they are independent of the nuisance parameter that is the sequential time step number of the simulation. In this case, the transition to a regime where all cells see different random number sequences between the two simulations occurs more gradually. This can be seen in all panels of Figure (B.2) (gray crosses): the growth of the RMS distance (Figure (B.2)) does not show as sharp a transition as does ‘method 1’ at . This is because the number of cells that develop different coarsegrained spatiotemporal evolution tracks grows gradually, with only local influences between cells^{17}^{17}17The shorter maximum time step we applied in ‘method 2’ compared to ‘method 1’ accounts for the slower initial growth, analogously to the DMonly case discussed in Appendix B.1.. This results also in a more gradual evolution of differences (Figure (B.2)), which in this case take almost to grow from to (compared to with ‘method 1’). At very early times after the perturbation, , before random number differences start to accumulate in a significant way, the evolution is very accurately fit with an exponential with a time scale of (not shown). This time scale is shorter than the found in the DMonly case with the same small time steps, possibly due to the shorter dynamical times of galaxies compared to dark matter halos, but a more thorough investigation of this difference is left beyond this work.
We conclude that the use of random numbers in our simulations (and possibly by extension cosmological simulations produced by other codes) injects instantaneously a relatively high level of differences between shadow simulations, levels that may take of order billions of years to develop in DMonly simulations where no such random number issues are relevant. While the particular workings of these random numbers is clearly a numerical construct, it is an interesting – and relevant – question whether they have analogs in the real Universe. This is discussed in Section 5. It is also worth pointing out that once such differences appear, they continue developing more slowly over cosmic time, as described in the main part of the paper. In this sense, the final differences are only seeded by the random number differences, but not directly determined by them, and indeed they generally develop to very different levels depending on whether feedback is present or not. Hence, we identify three regimes to the development of the initial perturbations. First, the ‘chaotic’ regime, which is similar to the DMonly case, in which the growth of perturbation is exponential. Second, the ‘injection’ regime, in which the very small perturbations are very rapidly blown up by the injection of randomness into the starformation process through the random numbers. This phase does not exist in the DMonly case. Third is the slower ‘dynamical’ regime, during which the perturbations continue growing as a powerlaw or slower, in some cases reaching a plateau after several . They typically grow into percentlevel or even larger differences, which often constitute a sizable fraction of the overall variation within the galaxy population,
Appendix C (In)sensitivity to the group finder
To mitigate a potential concern that the results we present in the main text are significantly affected by properties of the SUBFIND group structure algorithm, here we present results that are not based on SUBFIND in any way. Such a concern may arise because SUBFIND is not fully translationally and rotationally invariant. It may return different results in response to small changes in particle coordinates due to its use of a tree structure, and for the same reason its results are also generally not completely invariant to certain nuisance parameters. By calculating quantities only based on the raw particle data and on the FriendsOfFriends algorithm, as we do in this Appendix, we verify that in fact these properties of SUBFIND do not affect our results in any significant way.
Figure (C) is analogous to the top row in Figure (3.2.2), and shows the growth over time of pairwise differences between the maximum circular velocities and the masses of shadow galaxies in our TNG model series. Here, however, these two quantities are calculated in a different way from that used for Figure (3.2.2), avoiding the use of SUBFIND. The mass (right panel) is simply the full stellar mass assigned to FriendsOfFriends halos. The maximum circular velocity (left panel) is calculated within a fixed aperture of around the stellar particle with the lowest potential energy.
Quantitatively, the results in Figure (C) are similar to those in Figure (3.2.2). In fact, the difference between the two is such that the pairwise differences are somewhat larger in the SUBFINDindependent case shown here. We interpret that to be a result of the difference aperture that is used here (which itself is chosen so as to avoid the use of SUBFIND), rather than a direct consequence of the group finding algorithm. This also means that we believe that the results presented in the main body of the paper are not driven or dominated by SUBFIND.
It is worth pointing out, however, a general artifact of group finding, not specifically of SUBFIND, that occurs in rare cases. This is where the timing of a merger is different between two shadow simulations such that in one shadow simulation there are two nearby galaxies that are still considered separate objects while in the other they are already considered as merged. In such a situation, the galaxy properties calculated by the group finder may be even of order unity different between the two shadows, while the state of the physical system itself is almost identical. In these cases, the large pairwise difference is an outlier to the Gaussianlike distribution of pairwise differences such as those shown in Figure (3.1), but it can affect the overall standard deviation of the distribution. An outcome of this can be seen most prominently in the and panels of Figure (3.2.1) at , where the outlier point of the level is affected by a single galaxy, which in two of the shadows have just ‘absorbed’ a satellite while in the other two has not yet.