SELGIFS data challenge

The SELGIFS data challenge: generating synthetic observations of CALIFA galaxies from hydrodynamical simulations

G. Guidi, J. Casado, Y. Ascasibar, L. Galbany, P. Sánchez-Blázquez,
S. F. Sánchez, F. F. Rosales-Ortega and C. Scannapieco
Leibniz-Institut für Astrophysik Potsdam (AIP), D-14482, Potsdam, Germany
Universidad Autónoma de Madrid, 28049 Madrid, Spain
Astro-UAM, UAM, Unidad Asociada CSIC
Pittsburgh Particle Physics, Astrophysics, and Cosmology Center (PITT PACC).
Physics and Astronomy Department, University of Pittsburgh, Pittsburgh, PA 15260, USA.
Instituto de Astronomía, Universidad Nacional Autónoma de México, A.P. 70-264, 04519, México
Instituto Nacional de Astrofísica, Óptica y Electrónica (INAOE), 72840 Tonantzintla, Puebla, México
Instituto de Astrofísica, Pontificia Universidad Católica de Chile, Av. Vicuña Mackenna 4860, 7820436 Macul, Santiago, Chile
Accepted July 14, 2019 Received …; in original form …

In this work we present a set of synthetic observations that mimic the properties of the Integral Field Spectroscopy (IFS) survey CALIFA, generated by applying radiative transfer techniques to hydrodynamical simulations of galaxies in a cosmological context. The simulated spatially-resolved spectra include stellar and nebular emission, kinematic broadening of the lines, and dust extinction and scattering. The results of the radiative transfer simulations have been post-processed to reproduce the main properties of the CALIFA V500 and V1200 observational setups. The data have been further formatted to mimic the CALIFA survey in terms of field-of-view size, spectral range and sampling. We have included the effect of the spatial and spectral Point Spread Functions affecting CALIFA observations, and added detector noise after characterizing it on a sample of 20 galaxies. The simulated datacubes are suited to be analyzed with the same algorithms used on real IFS data. In order to provide a benchmark against which to compare the results obtained by applying IFS observational techniques to our synthetic datacubes, and to test the calibration and accuracy of the analysis tools, we have computed the spatially-resolved properties of the simulations. Hence, we provide maps derived directly from the hydrodynamical snapshots or from the noiseless spectra, in a way that is consistent with the values recovered by the observational analysis algorithms. Both the synthetic observations and the product datacubes are public and can be found on the collaboration website.

hydrodynamics - radiative transfer - galaxies: formation - galaxies: evolution - methods: numerical - techniques: imaging spectroscopy

1 Introduction

Over the last two decades, Integral-Field Spectroscopy (IFS) has become a standard technique to study galaxy formation and evolution over cosmic time. Compared to single-fibre or long-slit spectroscopy, IFS makes it possible to recover simultaneously the full spatial and spectral information of the target object. The optical spectrum of a galaxy, or of a part of it, contains information about the different components that emit or absorb light within the observed region; therefore, spatially-resolved spectroscopy over a significant extent of a galaxy provides an unprecedented level of detail on the local physical properties of its gas, dust and stars, as well as valuable constraints on other important quantities, such as its dark matter content or the evolutionary path that the system may have followed to reach its state at the time of observation.

Several observational programmes have produced, or will soon provide, systematic IFS surveys targeted at different galaxy populations, both in the local Universe, e.g. SAURON (Bacon01), DiskMass (Bershady10), PINGs (Rosales-Ortega10), Atlas3D (Cappellari11), CALIFA (Sanchez12), SAMI (Croom12), MaNGA (Bundy15), MUSE (Bacon04) or AMUSING (Galbany16), and at high redshift, e.g. SINS (Foerster-Schreiber09), KMOS (Wisnioski15) or KROSS (Stott16). Although these datasets differ widely both in the number of galaxies observed and in the number of spaxels sampling each object, the total number of spectra is in most cases so large that a significant part of the analysis must necessarily rely on fully automated procedures. In the near future, instruments such as WEAVE (Dalton14) or HARMONI (Thatte14) will routinely produce even larger datasets for a single galaxy, and their likely use in survey mode will increase the number of spectra to be analysed by several orders of magnitude.

Although spectroscopic data in principle allow one to infer the physical properties of the observed galaxies in great detail, the reliability of these inferences depends strongly on the accuracy of the tools and procedures applied in the analysis of the spectra. Hence, the calibration of the analysis procedures, together with a rigorous assessment of the associated biases, uncertainties, model dependencies and degeneracies, is of paramount importance. Many of the tools developed for traditional spectroscopy, often aimed at disentangling the emission of gas and stars, determining the kinematics of either or both components, and/or reconstructing the star formation history by means of stellar population synthesis, include a discussion of these issues in the description of their methodology (e.g. Cappellari04; Cid_Fernandes05; Ocvirk06; Sarzi06; Koleva09; MacArthur09; Walcher11; Walcher15; Sanchez16).

Compared to traditional spectroscopy, the IFS technique provides much more information about the observed galaxies, at the price of a more complex data analysis. In particular, the way in which the spatial information is treated may critically determine the feasibility of a given scientific case as a function of the signal-to-noise ratio (S/N) of the observations. Ideally, one would like to exploit the full spatial resolution provided by the instrument, analysing every spaxel as an independent spectrum, but the S/N quickly decreases to potentially unacceptable levels as the incoming light is divided among many wavelengths and spaxels. In order to find a trade-off between spatial resolution and S/N, several algorithms have been developed to carry out a spatial segmentation (binning) of the IFS datacubes based on a variety of different approaches (see e.g. Stetson87; Bertin96; Sanders01; Papaderos02; Cappellari03; Diehl06; Sanchez12_HIIexplorer; Sanchez16; Casado16). In fact, one of the advantages of IFS over traditional spectroscopy is that large areas may be combined in order to properly characterize weak signals. As pointed out by e.g. Casado16, the optimal segmentation strategy depends entirely on the specific problem under consideration, and must be studied thoroughly on a case-by-case basis.
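As a toy illustration of why combining spaxels raises the S/N, the Python sketch below assumes uncorrelated Gaussian noise, so that the S/N of a co-added spectrum is the summed signal divided by the quadrature sum of the noise. The greedy accumulation function is a naive illustration only: it is not any of the published segmentation algorithms cited above (e.g. the Voronoi binning of Cappellari03), and it ignores the correlated noise between adjacent spaxels that real IFS data present.

```python
import math


def binned_snr(signals, noises):
    """S/N of the co-added spectrum of several spaxels.

    Assumes uncorrelated Gaussian noise, so the noise adds in quadrature.
    """
    total_signal = sum(signals)
    total_noise = math.sqrt(sum(n * n for n in noises))
    return total_signal / total_noise


def accumulate_until(signals, noises, target_snr):
    """Add spaxels in the given order (e.g. by distance from a seed spaxel)
    until the co-added S/N reaches target_snr.

    Returns the number of spaxels used, or None if the target is never
    reached. Noise values must be positive.
    """
    s_sum, n2_sum = 0.0, 0.0
    for k, (s, n) in enumerate(zip(signals, noises), start=1):
        s_sum += s
        n2_sum += n * n
        if s_sum / math.sqrt(n2_sum) >= target_snr:
            return k
    return None


# Four spaxels with S/N = 3 each combine to S/N = 12 / 2 = 6
print(binned_snr([3.0] * 4, [1.0] * 4))             # 6.0
print(accumulate_until([3.0] * 4, [1.0] * 4, 5.0))  # 3
```

The S/N grows only as the square root of the number of spaxels for equal-quality spectra, which is why the spatial resolution lost to binning must be weighed against the science case.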

One possible approach to test the analysis tools and methodology used with both traditional and IFS data is to apply them to simulated spectra created from analytical models of the stellar and dust content of galaxies, or from cosmological hydrodynamical simulations of galaxy formation. The main advantage of this kind of experiment over a purely observational approach is that the correct solution to be recovered (i.e. the physical properties of the galaxies) is accurately known, which makes it possible to detect, quantify, and perhaps even correct systematic errors. Current hydrodynamical codes (Governato10; Aumer13; Vogelsberger14; Wang15; Governato07; Scannapieco08; Nelson15; Schaye15 among many others) self-consistently follow the intertwined evolution of gas, dark matter and stars over cosmic time, implementing a significant part of the relevant physics at the sub-resolution level through simple numerical schemes (whose details have a significant influence on the results, see e.g. Scannapieco12). They are therefore able to connect the observable properties of galaxies with their merger and accretion history, providing useful initial conditions for the creation of simulated spectra with a complexity similar to that of real galaxies.

When simulations are compared with data from a particular instrument or survey, a crucial step after computing the spectra of the simulated galaxies is to generate a full ‘synthetic observation’ that mimics as closely as possible all the known selection effects and biases of that instrument, and then to process these data with the same algorithms and techniques applied to the actual observations, as done by e.g. Scannapieco10; Belovary14; Michalowski14; Hayward14; Smith15; Hayward15; Guidi15; Guidi16. Mock IFS data, with galaxy spectra modelled using simple recipes for the stellar population content, have recently been produced by Kendrew16, who used the hsim pipeline (Zieleniewski15) to create synthetic observations of simulated high-redshift galaxies reproducing the conditions of the HARMONI instrument (Thatte14), in order to test its capability to recover the stellar kinematics (see Wild14 for a similar study within CALIFA).

In this work we have developed a pipeline to generate IFS synthetic observations mimicking the Calar Alto Legacy Integral-Field Area (CALIFA) survey (Sanchez12) from hydrodynamical simulations of galaxies. We have used the radiative transfer code sunrise (Jonsson06; Jonsson10) to calculate the spatially-resolved spectral energy distributions of different cosmological hydrodynamical simulations, carried out with the galaxy formation codes by Scannapieco05; Scannapieco06; Aumer13. We have post-processed the output of the radiative transfer code in order to reproduce the CALIFA observations in terms of field of view, spatial and spectral resolution, noise statistics, and data format. Our final products, publicly available through a web interface, consist of CALIFA-like synthetic datacubes and of resolved maps of several physical properties of our simulated galaxies (masses, ages, metallicities, …), as well as maps of the emission line intensities and absorption line indices. One of the long-term goals of the SELGIFS collaboration is to use the proposed ‘Data Challenge’ to carefully evaluate the merits and drawbacks of the different strategies that may be followed to infer physical properties from real IFS data.

The structure of the paper is as follows. In Section 2 we describe the set of hydrodynamical simulations used in this project and present the calculation of their spatially-resolved properties. In Section 3 we describe the procedure followed to generate the spectra of the simulations with the radiative transfer code, and explain how we calculate some of the resolved properties from the simulated spectra. In Section 4 we describe the main features of the CALIFA survey, as well as the technical properties of the CALIFA observational dataset reproduced in our synthetic datacubes. We present the data format of the simulated dataset and of the resolved maps in Section 5, and we summarize our work in Section 6.

2 Hydrodynamical simulations

To produce our mock data sample we use three hydrodynamical simulations of galaxies in a ΛCDM Universe, generated from a dark-matter simulation with the zoom-in technique (Tormen97). The initial conditions for the hydrodynamical simulations are taken from the Aquarius dark-matter-only simulation (Springel08): at redshift zero we identify halos (as defined by the subfind halo finder algorithm, Springel01) that are possible candidates for the formation of galaxies with properties similar to the Milky Way, with virial mass (calculated as the mass within the radius where the density is ) between and M⊙, and a quiet merger history in the recent past, excluding halos with neighbours more massive than half of their mass within a spherical region of 1.4 Mpc radius at (see Scannapieco09 for details). The cosmological parameters assumed are Ω_m = 0.25, Ω_Λ = 0.75, Ω_b = 0.04, σ_8 = 0.9, n_s = 1 and H_0 = 100 h km s⁻¹ Mpc⁻¹ with h = 0.73. The simulations have, at redshift , a mass resolution of M⊙ for dark matter particles and of M⊙ for stellar/gas particles, and a gravitational softening of pc.
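The halo selection described above can be sketched as a simple filter. In this minimal Python sketch the virial-mass window (m_min, m_max) is left as a parameter, since its numerical values are not reproduced here; the function name, the (mass, position) tuple layout and the neglect of periodic boundary conditions are illustrative assumptions, and the "quiet merger history" criterion is not modelled. It is not the actual selection code of Scannapieco09.

```python
import math


def is_milky_way_candidate(halo_mass, halo_pos, neighbours, m_min, m_max,
                           radius_mpc=1.4, max_mass_ratio=0.5):
    """Return True if a halo passes the mass window and isolation criterion.

    halo_pos   : (x, y, z) position in Mpc
    neighbours : iterable of (mass, (x, y, z)) for all other halos
    m_min/m_max: virial-mass window (hypothetical placeholders here)
    Periodic boundary conditions are ignored in this sketch.
    """
    if not (m_min <= halo_mass <= m_max):
        return False
    for mass, pos in neighbours:
        close = math.dist(halo_pos, pos) < radius_mpc
        if close and mass > max_mass_ratio * halo_mass:
            return False  # a too-massive neighbour lies within the sphere
    return True
```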

Two halos, which we name C-CS and E-CS (where the first letter identifies the Aquarius halo according to the convention of Springel et al.), have been simulated with a new version of the model by Scannapieco05; Scannapieco06 (CS hereafter), which implements chemical enrichment and Supernova (SN) feedback in the Tree-PM SPH code Gadget-3 (Springel05). The main changes in the updated version (CS model, Poulhazan et al., in prep.) concern the use of new metal yields (from Portinari98), including chemical enrichment from AGB stars (Portinari et al. 1998; Marigo01), and of a new IMF (Chabrier03), while the cooling function is the one by Sutherland93, as in the original Scannapieco et al. model.

The third halo (D-MA) is simulated with the independent update of the CS model by Aumer13 (MA hereafter). The MA model differs from CS in the chemical yields (which also include the contribution of AGB stars), in the additional modelling of metal diffusion in the ISM, in the use of a Kroupa IMF, and in the cooling function, which is taken from Wiersma09. More importantly, unlike in the CS model, where SN feedback is purely thermal, the SN energy feedback is divided into a thermal and a kinetic part, and the code also includes the feedback on the ISM from the radiation pressure of massive young stars. The MA model generally yields stronger feedback than the CS model, and hence younger, more metal-rich and more disk-dominated galaxies (for details on this model we refer the reader to Aumer13).

2.1 Properties of the simulated galaxies

Table 1 lists some of the integrated properties of the simulated objects. These global properties (already derived in Guidi15; Guidi16) have been computed considering the particles belonging to the main halo in a 60 kpc × 60 kpc region centred on the galaxy. The galaxies are oriented face-on according to the direction of the total angular momentum.

We calculate the following global properties:

  • Total stellar mass: the stellar mass is computed by summing the masses of the stellar particles inside the central 60 kpc × 60 kpc region; Table 1 lists its logarithm in units of M⊙.

  • Absolute magnitude: we calculate the absolute magnitude of each galaxy by convolving the total (face-on) spectrum generated with the sunrise radiative transfer code (Sec. 3) with the corresponding SDSS filter bandpass (Gunn98; Gunn06).

  • Mean stellar age: we derive the global mean stellar age weighting the ages both by the masses of the stellar particles (mass-weighted age) and by their luminosities calculated with the Bruzual03 SPS model (luminosity-weighted age); Table 1 gives the logarithm of the mean age in years.
    Note: in this work we use arithmetic means both for the ages and for the metallicities (Asari07; Cid_Fernandes13). A different definition often found in the literature is the geometric mean (e.g. Gallazzi05; Gonzalez_Delgado14; Gonzalez_Delgado15; Sanchez16_FIT3D). Since we will provide these quantities smoothed by the CALIFA spatial PSF (Sec. 4.3), we choose to weight the linear quantities in order to avoid biases in the calculation of the smoothed properties.

  • Mean stellar metallicity: the global mean stellar metallicity is calculated weighting by the mass and by the luminosity of the stellar particles; both values are given in logarithmic solar units, log(Z/Z⊙).

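  • Velocity dispersion: we compute the velocity dispersion in the face-on projection as

      $$\sigma = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(v_i - \bar{v}\right)^2},$$

    where v_i is the line-of-sight velocity of the i-th of the N stellar particles and \bar{v} is their mean velocity. The units are km s⁻¹.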

  • Mean gas metallicity: we derive the mean oxygen abundance of the gas, 12 + log(O/H), where (O/H) is the mean of the oxygen-to-hydrogen ratios of the individual gas particles.
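The global quantities listed above can be sketched with NumPy as follows. This is a minimal illustration, assuming the particle arrays have already been restricted to the main halo within the central 60 kpc × 60 kpc region and projected face-on; the function and argument names are hypothetical. Following the note on mean stellar age, linear quantities are averaged first and the logarithm is taken afterwards; the unweighted velocity dispersion mirrors the formula given for that quantity.

```python
import numpy as np


def global_properties(mass, age_yr, z_rel, v_los, lum, gas_o_h):
    """Global quantities in the spirit of Table 1 from particle arrays.

    mass    : stellar particle masses [Msun]
    age_yr  : stellar ages [yr]
    z_rel   : stellar metallicities in solar units (Z / Zsun)
    v_los   : line-of-sight velocities [km/s] (face-on projection)
    lum     : stellar luminosities in the chosen band (only ratios matter)
    gas_o_h : O/H ratio of each gas particle
    """
    # Arithmetic (linear) weighted means first, logarithm afterwards
    log_age_m = np.log10(np.average(age_yr, weights=mass))
    log_age_l = np.log10(np.average(age_yr, weights=lum))
    log_z_m = np.log10(np.average(z_rel, weights=mass))
    log_z_l = np.log10(np.average(z_rel, weights=lum))
    # Unweighted scatter of the line-of-sight velocities around their mean
    sigma_v = np.sqrt(np.mean((v_los - v_los.mean()) ** 2))
    return {
        "log_mstar": np.log10(mass.sum()),
        "log_age_mass": log_age_m,
        "log_age_lum": log_age_l,
        "log_z_mass": log_z_m,
        "log_z_lum": log_z_l,
        "sigma_v": sigma_v,
        "gas_12_log_oh": 12.0 + np.log10(np.mean(gas_o_h)),
    }
```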

Name   log M* [M⊙]   M_abs (band)   log age_M [yr]   log age_L [yr]   log(Z_M/Z⊙)   log(Z_L/Z⊙)   σ_v [km/s]   12+log(O/H)
C-CS   10.66         -21.22         10.01            9.93             -0.39         -0.37         94.2         8.52
E-CS   10.21         -20.13         10.00            9.91             -0.44         -0.49         62.4         8.24
D-MA   10.75         -21.83         9.84             9.68             -0.19         -0.05         65.8         9.09
Table 1: Global properties of the simulated galaxies used to generate the CALIFA mock datacubes, calculated in a 60 kpc × 60 kpc region for the face-on orientation. Subscripts M and L denote mass- and luminosity-weighted means. Edge-on values differ from those presented here and can be found in Guidi15, together with several other physical properties, while in Guidi16 these galaxies have been compared with the Sloan Digital Sky Survey dataset (Abazajian09).

Together with the global properties described above, we also calculate spatially-resolved properties, i.e. considering the particles enclosed in the physical area covered by each spaxel of our “virtual” CALIFA observations (see Section 4.1). We refer to these maps of directly measured properties (listed in Table 2) as product datacubes (some examples are shown in Figure 1); they represent the ‘solutions’ to be recovered by the observational algorithms.

These calculations are described below:

  • Stellar mass/Stellar mass density: the stellar mass and stellar mass density maps have been derived from the simulation snapshots, considering the stellar mass in the region corresponding to each spaxel; the density is obtained by dividing this mass by the physical area of the spaxel.

  • Mass-weighted mean stellar age: we compute the mass-weighted mean stellar age in each spaxel (⟨age⟩_M, eq. 1) and store its logarithm.

  • Mass-weighted mean stellar metallicity: we derive the mass-weighted mean stellar metallicity (⟨Z⟩_M, eq. 3), where the metallicity of each stellar particle is expressed in solar units, and we provide the logarithm of the mean metallicity in each spaxel, log(⟨Z⟩_M/Z⊙).

  • Luminosity-weighted mean stellar age: we compute the luminosity-weighted mean stellar age (⟨age⟩_L, eq. 2) weighting the ages by the flux of each stellar particle at a fixed reference wavelength, calculated with starburst99, and store the logarithm of the mean age.
    Note: to calculate luminosity-weighted quantities in the product datacubes we use the flux at this reference wavelength, following the choice made in several studies of the CALIFA galaxies, e.g. Gonzalez_Delgado14; Sanchez16; Ruiz_Lara16.

  • Luminosity-weighted mean stellar metallicity: to derive the luminosity-weighted mean stellar metallicity (⟨Z⟩_L, eq. 4), we weight the metallicity of each stellar particle (in solar units) by its luminosity at the same reference wavelength, calculated with the starburst99 SPS model. It is given as log(⟨Z⟩_L/Z⊙).

  • Mean velocity/Velocity dispersion: the mean velocity and velocity dispersion maps (both in units of km s⁻¹) have been derived weighting the line-of-sight velocity of the stellar particles inside the region sampled by each spaxel by their luminosity at the same reference wavelength, calculated with starburst99.

  • Star formation rate: the maps of the spatially-resolved SFR are generated from the simulation snapshots as the stellar mass formed in the previous 10 Myr divided by this time interval, a timescale similar to the one sampled by most observational indicators (Kennicutt98); the units are M⊙ yr⁻¹.
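The per-spaxel maps above amount to weighted 2D histograms of the projected particles. The sketch below is a minimal illustration, assuming square spaxels with the same edges along both axes, positions already projected onto the sky plane in kpc, and the current mass of particles younger than 10 Myr as a proxy for the mass formed in that interval; the function name is hypothetical. Empty spaxels return NaN for the mean ages.

```python
import numpy as np


def spaxel_maps(x_kpc, y_kpc, mass, age_yr, lum, edges_kpc, sfr_window_yr=1e7):
    """Grid stellar particles onto square spaxels.

    Returns (mass map [Msun], surface density [Msun/kpc^2],
    log10 mass-weighted mean age, log10 luminosity-weighted mean age,
    SFR map [Msun/yr]).
    """
    bins = [edges_kpc, edges_kpc]

    def grid(weights):
        h, _, _ = np.histogram2d(x_kpc, y_kpc, bins=bins, weights=weights)
        return h

    m = grid(mass)
    l = grid(lum)
    area = np.diff(edges_kpc)[:, None] * np.diff(edges_kpc)[None, :]
    with np.errstate(invalid="ignore", divide="ignore"):
        # Linear weighted means per spaxel, then logarithm (NaN where empty)
        log_age_m = np.log10(grid(mass * age_yr) / m)
        log_age_l = np.log10(grid(lum * age_yr) / l)
    # SFR: mass in particles younger than the window, divided by the window
    young_mass = np.where(age_yr < sfr_window_yr, mass, 0.0)
    sfr = grid(young_mass) / sfr_window_yr
    return m, m / area, log_age_m, log_age_l, sfr
```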

Figure 1: Spatially-resolved stellar properties of two simulated galaxies, D-MA_0 (face-on) on the left and D-MA_2 (edge-on) on the right. These maps show, from top to bottom, the stellar mass density, the mean luminosity-weighted ages and metallicities, the mean velocity along the line of sight, and the number of stellar particles in each spaxel (in logarithmic colour scale).
Stellar property                         Units
Mass density
Mean age, mass-weighted
Mean metallicity, mass-weighted
Mean age, luminosity-weighted
Mean metallicity, luminosity-weighted
Mean velocity                            km/s
Velocity dispersion                      km/s
Star formation rate                      M⊙/yr
Table 2: List of the spatially-resolved stellar properties provided in the product datacubes.

3 Simulated spectra

In this section we describe the procedure followed to generate the spatially-resolved spectral energy distributions of our simulated galaxies. To this end, we post-process the simulation snapshots at redshift zero with the Monte Carlo radiative transfer code sunrise (Jonsson06; Jonsson10), considering as input for the radiative transfer post-processing only the particles belonging to the main halo in the hydrodynamical simulations, as defined by the subfind algorithm. sunrise is a 3-D polychromatic Monte Carlo radiative transfer code that self-consistently simulates the emission and propagation of light in a dusty interstellar medium (ISM) from hydrodynamical snapshots, yielding the full spatially-resolved UV-to-submillimetre Spectral Energy Distributions (SEDs). The resulting SEDs include the contribution of stellar and nebular emission, dust absorption and scattering, and hence show stellar absorption features and emission lines, as well as the effects of kinematics.

The procedure followed to obtain the spectra of the hydrodynamical simulations consists of three main steps:

  1. sunrise assigns a spectrum to every stellar particle according to its age and metallicity, normalized by the mass of the particle. Depending on the age of the particle, two different families of model spectra are used:

    • age > 10 Myr: spectra from the starburst99 Stellar Population Synthesis (SPS) model (SB99, Leitherer99) are assigned to the stellar particles. To create the input stellar model we have selected the Padova 1994 stellar tracks (Fagotto94; Fagotto94_1), assuming a Kroupa IMF (Kroupa02) with slope α = 1.3 below 0.5 M⊙ and α = 2.3 above 0.5 M⊙. The low-resolution spectra have been computed with the Pauldrach/Hillier stellar atmospheres, while for the high-resolution part of the starburst99 spectra (available only in a limited wavelength range, with finer sampling) we have used the fully theoretical atmospheres by Martins05. The final input stellar model combines the low-resolution spectra outside this wavelength range with the high-resolution spectra inside it.

    • age < 10 Myr: young stellar particles are assumed to be the source of a significant amount of ionizing photons, which are efficiently absorbed by the surrounding gas, producing recombination lines and forming an HII region. These young stellar particles are assigned a modified spectrum that takes into account the effects of photo-dissociation and recombination of the gas. The HII-region spectra are pre-computed with the 1D photo-ionization code mappings III (Groves04; Groves08) assuming spherical geometry for the surrounding gas, and depend on the metallicity of the stellar particle and of the gas around it, on the compactness parameter C (which in turn depends on the ISM pressure and on the chosen value of the cluster mass M_cl, through equation 13 in Groves08), and on the covering fraction f_PDR, i.e. the time-averaged fraction of the stellar cluster solid angle covered by the Photo-Dissociation Region (PDR). We set the free mappings parameters f_PDR and M_cl to the fiducial values given by Jonsson10.

  2. After sunrise has assigned stellar or nebular spectra to all stellar particles, in the radiative transfer stage randomly generated photon packets are propagated with a Monte Carlo approach through the dusty ISM, assuming a constant dust-to-metals ratio of 0.4 (Dwek98). Since in the multi-phase ISM model implemented in our hydrodynamical code (Scannapieco06) each gas particle has a single temperature, density and entropy (whereas other ISM models may include cold and hot phases within a given gas particle, e.g. SH03), the amount of dust is directly linked to the total amount of metals in each gas particle (for a discussion of the effects of the sub-resolution structure of the ISM on radiative transfer calculations see Hayward11; Snyder13; Lanz14). Dust extinction is described by a Milky Way-like curve with R_V = 3.1 (Cardelli89; Draine03), while dust scattering follows the Henyey41 phase function. The Monte Carlo rays generated by sunrise are traced on an adaptive grid whose cells cover a region of (120 kpc)³; the minimum cell size (in pc) is set by the values assumed in sunrise for the refinement tolerance tol and for the V-band opacity per unit mass of metals (in kpc² M⊙⁻¹) (see Jonsson06 for details).

  3. Model cameras placed around the simulated galaxies record the SED in each pixel. For each galaxy we place cameras at three different orientations, face-on, intermediate and edge-on, defined with respect to the direction of the total angular momentum of the stars. The flux in the cameras may be convolved with bandpass filters to obtain broadband magnitudes and images, as in Fig. 2, which shows colour-composite images of the simulated galaxies in the three orientations; the region of each simulation observed within the CALIFA hexagonal field of view is outlined in red.
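The age-based assignment in step 1 can be sketched as a simple branch on the particle age. In this minimal Python sketch, stellar_sed and hii_sed are hypothetical callables standing in for the starburst99 and mappings III tables that sunrise interpolates internally; the real HII-region spectra also depend on the gas metallicity, C and f_PDR, which this toy ignores.

```python
import numpy as np

YOUNG_AGE_YR = 1.0e7  # younger particles are treated as HII regions


def assign_spectra(mass, age_yr, metallicity, stellar_sed, hii_sed):
    """Return an (n_particles, n_lambda) array of particle SEDs.

    stellar_sed(age, Z) and hii_sed(age, Z) are hypothetical callables that
    return a per-unit-mass SED on a common wavelength grid.
    """
    seds = []
    for m, age, z in zip(mass, age_yr, metallicity):
        template = hii_sed if age < YOUNG_AGE_YR else stellar_sed
        # Each template is normalized by the particle mass
        seds.append(m * np.asarray(template(age, z), dtype=float))
    return np.array(seds)


# Usage with dummy templates on a 3-point wavelength grid
stellar = lambda age, z: np.ones(3)
hii = lambda age, z: np.full(3, 2.0)
print(assign_spectra([1.0, 3.0], [5e6, 1e9], [0.02, 0.02], stellar, hii))
```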

In order to reduce the random noise introduced by the sunrise Monte Carlo algorithm, the radiative transfer process described above is run ten times for each galaxy, changing only the random seeds, and the resulting spectra are averaged over the ten realizations. The resulting Monte Carlo ‘signal-to-noise’ S/N (where N is the standard deviation over the ten realizations) is highest in the central spaxels and decreases towards the outskirts, but in both regions the Monte Carlo noise is negligible compared to the typical noise of the CALIFA spectrograph that we aim to mimic (Sanchez12).
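The averaging over random seeds and the associated Monte Carlo S/N can be sketched as below, assuming the ten runs are stacked along a leading axis of a NumPy array. The choice of the sample standard deviation (ddof=1) is an assumption; the text only specifies "standard deviation over the ten realizations". Note that the uncertainty of the averaged spectrum itself is smaller than N by a factor √10.

```python
import numpy as np


def average_realizations(cubes):
    """Average independent Monte Carlo runs of the same datacube.

    cubes: array of shape (n_real, ny, nx, n_lambda).
    Returns (mean cube, S/N cube) with S/N = mean / std over realizations.
    """
    cubes = np.asarray(cubes, dtype=float)
    mean = cubes.mean(axis=0)
    std = cubes.std(axis=0, ddof=1)  # sample standard deviation (assumption)
    with np.errstate(divide="ignore", invalid="ignore"):
        snr = mean / std
    return mean, snr
```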

Figure 2: Composite synthetic broadband images created with the Lupton04 colour-composition algorithm for our three simulated galaxies (from top to bottom, C-CS, E-CS and D-MA). The orientations are, from left to right, face-on, intermediate and edge-on, labelled in the synthetic datacubes as _0, _1 and _2, respectively. The red hexagon marks the region of the simulations sampled by the CALIFA field of view, whose physical size differs for the C-CS, E-CS and D-MA galaxies.
Figure 3: Left panel: RGB image of the [OIII]5007, Hα and [NII]6584 emission lines for the galaxy C-CS_1. Right panels: synthetic spectra in the spaxels corresponding to the red and green squares in the RGB image. In the upper right panel, the spaxel samples a nebular region (red square), while the lower right panel shows a V500 spectrum containing only stellar emission (green square). The part of the spectrum generated with the low-resolution stellar model (Sec. 3) is marked in red.

3.1 Measurements on the simulated spectra

As part of the product datacubes we also provide resolved maps of some spectral features as derived from noiseless datacubes (i.e. prior to the addition of the detector noise, see Section 4.4). To obtain measurements of the properties of the stellar and nebular spectra without using any observational algorithm to separate the two spectral components (which may introduce many caveats and uncertainties), we additionally generate stellar-only synthetic datacubes following the same procedure described in Sec. 3, but switching off the nebular contribution. These stellar-only synthetic datacubes are then subtracted from the full datacubes, to obtain the spatially-resolved nebular-only spectra.

  • Lick indices: we derive the strength of the Lick stellar absorption features from the stellar-only datacubes. The spectra have the spectral sampling of the V500 setup (Section 4.2), while the CALIFA spectral PSF has not been included in this calculation. The list of the absorption features provided in the product datacubes is given in Table 3.
    Note: the Lick indices also depend on the velocity dispersion at which they are measured (see e.g. Sanchez-Blazquez06; Oliva-Altamirano15). In IFS observational studies the spectra in each spaxel are usually broadened to a single velocity dispersion before measuring the Lick indices, in order to compare them consistently with models at the same dispersion (e.g. Wild14). In our product datacubes we do not change the broadening of the absorption lines, since this procedure introduces additional uncertainties in the analysis. The spaxel-by-spaxel velocity dispersion is provided in the GALNAME.stellar.fits files (Sec. 5.1) and can be used to tune the fitted models.

  • Nebular emission line intensities: the fluxes of the emission lines listed in Table 4 are measured from the nebular-only synthetic datacubes (see above), as the total flux in the nebular-only spectra between the lower and upper bounds of each line (given in Table 4). Note that the spectral PSF is not applied to the nebular-only datacubes. It is also important to emphasize that the nebular emission in the datacubes is limited to the stellar particles younger than 10 Myr (HII regions), and no other sources of ionizing photons are included. The units in which the line intensities are stored are described in Section 5.1.
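The stellar-only subtraction and line-flux measurement described above can be sketched as follows. This is a minimal illustration, assuming the full and stellar-only spectra share the same wavelength grid (in Å) and that the line window is integrated with the trapezoidal rule; the text only states that the total flux between the bounds is used, so the integration scheme and the function names are assumptions. It works on single spectra or on whole cubes with wavelength along the last axis.

```python
import numpy as np


def trapezoid_last_axis(y, x):
    """Trapezoidal integral of y(x) along the last axis of y."""
    return np.sum(0.5 * (y[..., 1:] + y[..., :-1]) * np.diff(x), axis=-1)


def nebular_line_flux(wave, full_spec, stellar_spec, lo, hi):
    """Flux of one emission line from the nebular-only spectrum.

    nebular = full - stellar-only; integrated between the line bounds
    lo and hi (e.g. from Table 4). Spectra are per unit wavelength.
    """
    nebular = np.asarray(full_spec, float) - np.asarray(stellar_spec, float)
    sel = (wave >= lo) & (wave <= hi)
    return trapezoid_last_axis(nebular[..., sel], wave[sel])
```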

In Figure 3 we show an RGB image of the intensities (derived from the nebular maps) of the [OIII]5007, Hα and [NII]6584 emission lines, together with the spectra of two spaxels of the synthetic datacubes, one containing nebular emission and the other only stellar light. Figure 4 shows, for one of our simulated galaxies, the intensity maps of the BPT (Baldwin81) emission lines (Hβ, Hα, [OIII]5007, [NII]6584) together with the corresponding signal-to-noise maps (see Section 4.4 for the implementation of the detector noise in the synthetic datacubes).

Figure 4: Line intensity (left) and Signal-to-Noise (right) maps of the four BPT lines (Baldwin81) for the galaxy C-CS_1. The S/N in every spaxel is obtained as the ratio between the mean signal and noise in the wavelength range of the corresponding emission line given in Table 4.
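The S/N definition in the caption of Figure 4 can be written in a few lines. This sketch assumes that the signal and noise cubes share the wavelength grid along their last axis; which signal cube is used and the example bounds in the test are illustrative assumptions, not the Table 4 values.

```python
import numpy as np


def line_window_snr(wave, signal, noise, lo, hi):
    """Per-spaxel S/N: mean signal over mean noise within [lo, hi] (Angstrom).

    signal, noise: arrays of shape (..., n_lambda) on the grid `wave`.
    """
    sel = (wave >= lo) & (wave <= hi)
    return signal[..., sel].mean(axis=-1) / noise[..., sel].mean(axis=-1)
```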
Name      Index bandpass [Å]    Blue continuum [Å]    Red continuum [Å]     Units   Reference
CN1       4142.125 - 4177.125   4080.125 - 4117.625   4244.125 - 4284.125   mag     Worthey94
CN2       4142.125 - 4177.125   4083.875 - 4096.375   4244.125 - 4284.125   mag     Worthey94
Ca4227    4222.250 - 4234.750   4211.000 - 4219.750   4241.000 - 4251.000   Å       Worthey94
G4300     4281.375 - 4316.375   4266.375 - 4282.625   4318.875 - 4335.125   Å       Worthey94
Fe4383    4369.125 - 4420.375   4359.125 - 4370.375   4442.875 - 4455.375   Å       Worthey94
Ca4455    4452.125 - 4474.625   4445.875 - 4454.625   4477.125 - 4492.125   Å       Worthey94
Fe4531    4514.250 - 4559.250   4504.250 - 4514.250   4560.500 - 4579.250   Å       Worthey94
Fe4668    4634.000 - 4720.250   4611.500 - 4630.250   4742.750 - 4756.500   Å       Worthey94
Hβ        4847.875 - 4876.625   4827.875 - 4847.875   4876.625 - 4891.625   Å       Worthey94
Fe5015    4977.750 - 5054.000   4946.500 - 4977.750   5054.000 - 5065.250   Å       Worthey94
Mg1       5069.125 - 5134.125   4895.125 - 4957.625   5301.125 - 5366.125   mag     Worthey94
Mg2       5154.125 - 5196.625   4895.125 - 4957.625   5301.125 - 5366.125   mag     Worthey94
Mgb       5160.125 - 5192.625   5142.625 - 5161.375   5191.375 - 5206.375   Å       Worthey94
Fe5270    5245.650 - 5285.650   5233.150 - 5248.150   5285.650 - 5318.150   Å       Worthey94
Fe5335    5312.125 - 5352.125   5304.625 - 5315.875   5353.375 - 5363.375   Å       Worthey94
Fe5406    5387.500 - 5415.000   5376.250 - 5387.500   5415.000 - 5425.000   Å       Worthey94
Fe5709    5696.625 - 5720.375   5672.875 - 5696.625   5722.875 - 5736.625   Å       Worthey94
Fe5782    5776.625 - 5796.625   5765.375 - 5775.375   5797.875 - 5811.625   Å       Worthey94
Na D      5876.875 - 5909.375   5860.625 - 5875.625   5922.125 - 5948.125   Å       Worthey94
TiO1      5936.625 - 5994.125   5816.625 - 5849.125   6038.625 - 6103.625   mag     Worthey94
TiO2      6189.625 - 6272.125   6066.625 - 6141.625   6372.625 - 6415.125   mag     Worthey94
HδA       4083.500 - 4122.250   4041.600 - 4079.750   4128.500 - 4161.000   Å       Worthey97
HγA       4319.750 - 4363.500   4283.500 - 4319.750   4367.250 - 4419.750   Å       Worthey97
HδF       4091.000 - 4112.250   4057.250 - 4088.500   4114.750 - 4137.250   Å       Worthey97
HγF       4331.250 - 4352.250   4283.500 - 4319.750   4354.750 - 4384.750   Å       Worthey97
D4000_n   -                     3850.000 - 3950.000   4000.000 - 4100.000   -       Balogh99
Table 3: List of the absorption line indices whose strength is provided in each spaxel, together with the definitions of the index bandpass and of the blue and red pseudo-continuum bandpasses.
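The index definitions in Table 3 can be applied with the usual Lick recipe: a straight pseudo-continuum through the mean fluxes of the blue and red sidebands, evaluated at their central wavelengths, then an equivalent width (Å) for atomic indices or a magnitude for molecular ones. The sketch below assumes a uniformly sampled spectrum in F_λ and simply selects whole pixels inside each band (production codes weight partial pixels at the band edges); for D4000_n it uses the narrow Balogh99 bands with F_ν ∝ λ² F_λ. The function names are hypothetical, and this is not the code used to build the product datacubes.

```python
import numpy as np


def _band_mean(wave, flux, band):
    sel = (wave >= band[0]) & (wave <= band[1])
    return flux[sel].mean()


def _trapz(y, x):
    return float(np.sum(0.5 * (y[1:] + y[:-1]) * np.diff(x)))


def lick_index(wave, flux, index_band, blue_band, red_band, molecular=False):
    """Lick index with a linear pseudo-continuum; Angstrom or mag."""
    wb, wr = np.mean(blue_band), np.mean(red_band)
    fb = _band_mean(wave, flux, blue_band)
    fr = _band_mean(wave, flux, red_band)
    sel = (wave >= index_band[0]) & (wave <= index_band[1])
    w = wave[sel]
    cont = fb + (fr - fb) * (w - wb) / (wr - wb)
    ratio = flux[sel] / cont
    if molecular:
        width = index_band[1] - index_band[0]
        return -2.5 * np.log10(_trapz(ratio, w) / width)
    return _trapz(1.0 - ratio, w)


def d4000_n(wave, flux_lambda):
    """Narrow 4000 A break: mean F_nu in 4000-4100 A over 3850-3950 A."""
    f_nu = flux_lambda * wave ** 2  # F_nu is proportional to lambda^2 F_lambda
    return (_band_mean(wave, f_nu, (4000.0, 4100.0))
            / _band_mean(wave, f_nu, (3850.0, 3950.0)))


# Ca4227 on a flat spectrum with an absorption dip between 4225 and 4230 A
wave = np.arange(4000.0, 4300.0, 0.5)
flux = np.ones_like(wave)
flux[(wave >= 4225.0) & (wave <= 4230.0)] = 0.5
print(lick_index(wave, flux, (4222.25, 4234.75), (4211.0, 4219.75), (4241.0, 4251.0)))
```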
Species        Line center [Å]   Lower/upper bounds [Å]
[Ne III]3869   3869.060
Hδ             4101.734
Hγ             4340.464
[O III]4363    4363.210
Hβ             4861.325
[O III]4959    4958.911
[O III]5007    5006.843
HeI 5876       5875.670
[N II]6548     6548.040
Hα             6562.800
[N II]6584     6583.460
[S II]6717     6716.440
[S II]6731     6730.810
Table 4: List of the emission line intensities provided in the product datacubes. Line centers and lower and upper bounds are taken from the Sloan Digital Sky Survey-Garching DR7 analysis (available online).

4 CALIFA mock observations

In this section we describe how we convert the output of the sunrise radiative transfer code into synthetic IFS observations mimicking the CALIFA survey (Sanchez12; Garcia_Benito15). The CALIFA observations were taken with the Potsdam Multi Aperture Spectrograph (PMAS, Roth05), mounted on the Calar Alto 3.5 m telescope, using the large hexagonal Field-of-View offered by the PPak fiber bundle (Verheijen+04; Kelz+06). The final CALIFA Public Data Release (DR3, Sanchez16_1) consists of 667 galaxies, and its data are publicly available online.

Below we summarize the technical properties of the CALIFA survey and describe how we reproduce them in our synthetic datacubes.

4.1 Field-of-View and spaxel size

The CALIFA sample selection criteria (Walcher+14) and observing strategy (a three-point dither pattern, Sanchez12) were designed to reach a filling factor of 100% across the Field-of-View (FoV) and to guarantee coverage of the entire optical extent of the galaxies up to effective radii. The CALIFA datacubes (i.e. three-dimensional data) have a typical spaxel physical size of kpc and a 2D distribution of spectra with spatial sampling and 2.5” Full Width at Half Maximum (FWHM) spatial resolution (Sanchez12). The number of spaxels differs among objects because of the observing conditions and the layout of the dithering pattern; it is usually in the range and in the right ascension and declination axes, respectively. We have chosen a 77 × 73 scheme for our simulated datacubes. To reproduce the CALIFA FoV size in our simulated observations, we first derive for each object the half-light radius in the -band from low-resolution radiative transfer simulations with a FoV of kpc. We then calculate the physical size of the spaxels with the cosmological calculator of Wright06, assuming that a FoV of with spatial sampling covers a region up to of each simulated galaxy. We also compute the redshift and luminosity distance that correspond to this physical spaxel size for the CALIFA aperture. These quantities are given in Table 5.

Object Redshift Luminosity distance [Mpc] Physical size of the spaxels [kpc]
C-CS 0.013 58.1 0.25
E-CS 0.018 80.8 0.35
D-MA 0.024 108.2 0.45
Table 5: Redshift, luminosity distance and physical size covered by the spaxels in our synthetic CALIFA observations.
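The link between physical spaxel size, redshift and luminosity distance in Table 5 can be illustrated with a short sketch that inverts the angular-diameter-distance relation of a flat ΛCDM cosmology by bisection. The cosmological parameters (H0 = 67.74 km/s/Mpc, Ωm = 0.3075), the 1 arcsec spaxel size and all function names are assumptions made for illustration. They are not necessarily the settings used with the Wright06 calculator, so the output need not reproduce Table 5 exactly.

```python
import math

C_KM_S = 299792.458            # speed of light [km/s]
H0 = 67.74                     # assumed Hubble constant [km/s/Mpc]
OMEGA_M = 0.3075               # assumed matter density (flat universe)
OMEGA_L = 1.0 - OMEGA_M
ARCSEC = math.pi / (180.0 * 3600.0)  # one arcsecond in radians


def comoving_distance_mpc(z, n_steps=2000):
    """Line-of-sight comoving distance [Mpc], Simpson's rule (n_steps must be even)."""
    if z <= 0.0:
        return 0.0

    def inv_e(zp):
        return 1.0 / math.sqrt(OMEGA_M * (1.0 + zp) ** 3 + OMEGA_L)

    h = z / n_steps
    total = inv_e(0.0) + inv_e(z)
    for i in range(1, n_steps):
        total += (4.0 if i % 2 else 2.0) * inv_e(i * h)
    return (C_KM_S / H0) * total * h / 3.0


def luminosity_distance_mpc(z):
    return (1.0 + z) * comoving_distance_mpc(z)


def kpc_per_arcsec(z):
    """Proper size [kpc] subtended by one arcsecond at redshift z."""
    d_a = comoving_distance_mpc(z) / (1.0 + z)  # angular diameter distance [Mpc]
    return d_a * 1.0e3 * ARCSEC


def redshift_for_spaxel(size_kpc, spaxel_arcsec=1.0, z_lo=1e-4, z_hi=1.0, tol=1e-8):
    """Bisection for the z at which one spaxel of `spaxel_arcsec` spans `size_kpc`.

    Valid while kpc_per_arcsec grows with z (true below z ~ 1.5 for this cosmology).
    """
    target = size_kpc / spaxel_arcsec
    while z_hi - z_lo > tol:
        z_mid = 0.5 * (z_lo + z_hi)
        if kpc_per_arcsec(z_mid) < target:
            z_lo = z_mid
        else:
            z_hi = z_mid
    return 0.5 * (z_lo + z_hi)


if __name__ == "__main__":
    for size in (0.25, 0.35, 0.45):  # physical spaxel sizes listed in Table 5
        z = redshift_for_spaxel(size)
        print(f"{size:.2f} kpc -> z = {z:.4f}, D_L = {luminosity_distance_mpc(z):.1f} Mpc")
```

Bisection is used because the scale per arcsec is monotonic over the low-redshift range relevant here, which makes the inversion robust without any derivative.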

4.2 Spectral properties

The CALIFA datasets provide two different, overlapping spectral setups: the low-resolution V500 mode, covering the range with 2.0 Å sampling and 6.0 Å FWHM spectral resolution, and the blue mid-resolution V1200 setup, covering the range with 0.7 Å sampling and 2.3 Å FWHM resolution (Sanchez12; Garcia_Benito15). To generate the synthetic datacubes in both setups we must cover their respective spectral ranges. The high-resolution stellar model (see Sec. 3) is available only in the wavelength range . Therefore, to cover the full spectral range of the CALIFA V500 configuration, we generate for each galaxy two different sets of radiative transfer simulations: one at higher resolution including the kinematics between , and one at lower resolution from to without kinematics. We then paste the two SEDs together at . In this way we exactly match the CALIFA V500 wavelength range (the V1200 range is fully covered by the high-resolution stellar model), although the redder part of the spectrum has a spectral resolution lower than the CALIFA sampling. The regions of the spectra generated with the low-resolution stellar model are flagged as bad pixels in the datacubes (see Sec. 5.2) and should not be used for SED fitting. Note that redshifting the synthetic spectra reduces the range of bad pixels, which then starts at a wavelength that depends on the redshift of the object.
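The pasting and flagging step can be sketched as follows for a single spaxel. The wavelength grids, the paste wavelength and the function name are hypothetical placeholders; the sketch only shows the logic of keeping the high-resolution SED blueward of the paste wavelength and the low-resolution SED redward of it, and flagging the latter with BADPIX = 1.

```python
import numpy as np


def stitch_seds(wave_hi, flux_hi, wave_lo, flux_lo, lambda_paste):
    """Join a high-resolution SED (blueward of lambda_paste) and a low-resolution one.

    Returns the joined wavelength and flux arrays plus a BADPIX-like flag array:
    0 for pixels from the high-resolution model, 1 for the low-resolution ones.
    """
    keep_hi = wave_hi < lambda_paste
    keep_lo = wave_lo >= lambda_paste
    wave = np.concatenate([wave_hi[keep_hi], wave_lo[keep_lo]])
    flux = np.concatenate([flux_hi[keep_hi], flux_lo[keep_lo]])
    badpix = np.concatenate([
        np.zeros(np.count_nonzero(keep_hi), dtype=np.uint8),
        np.ones(np.count_nonzero(keep_lo), dtype=np.uint8),
    ])
    return wave, flux, badpix


if __name__ == "__main__":
    # Hypothetical grids and paste wavelength, for illustration only.
    wave_hi = np.arange(3500.0, 7000.0, 0.5)
    wave_lo = np.arange(3000.0, 9000.0, 5.0)
    wave, flux, badpix = stitch_seds(wave_hi, np.ones_like(wave_hi),
                                     wave_lo, np.ones_like(wave_lo), 6800.0)
    z = 0.02
    print("first flagged pixel, observed frame:", wave[badpix == 1].min() * (1 + z))
```

After redshifting, the first flagged pixel moves to the paste wavelength times (1 + z), which is why the bad-pixel range shrinks inside a fixed observed-frame V500 window.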

Once these datacubes covering the V500 and V1200 spectral ranges are created, we shift the spectra to the redshift of each object and resample them to a spacing of 2.0 Å and 0.7 Å, matching the spectral sampling of the V500 and V1200 configurations, respectively. Finally, we remove the wavelengths outside the V500 and V1200 spectral ranges, as well as the spaxels that fall outside a hexagonal FoV.
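A minimal sketch of the redshifting, resampling and hexagonal cropping steps is given below. It assumes that the input flux density already includes the 1/(4πD_L²) dilution, uses plain linear interpolation (which, unlike a flux-conserving rebinning, does not preserve integrated flux exactly) and describes the hexagon by a hypothetical centre-to-vertex radius in spaxels. None of these choices is taken from the original pipeline.

```python
import numpy as np


def redshift_and_resample(wave_rest, flux_rest, z, wave_start, wave_end, step):
    """Move a rest-frame spectrum to redshift z and resample it onto a linear grid.

    Assumes flux_rest is f_lambda already divided by 4*pi*D_L**2, so the only
    remaining factor is 1/(1+z). Linear interpolation is used for simplicity;
    it is not a flux-conserving rebinning. Pixels outside the input range are NaN.
    """
    wave_obs = wave_rest * (1.0 + z)
    flux_obs = flux_rest / (1.0 + z)
    grid = np.arange(wave_start, wave_end + 0.5 * step, step)
    return grid, np.interp(grid, wave_obs, flux_obs, left=np.nan, right=np.nan)


def hexagon_mask(nx, ny, radius):
    """True inside a regular hexagon centred on an (ny, nx) grid.

    The hexagon has vertices on the x axis (flat top and bottom edges);
    `radius` is the centre-to-vertex distance in spaxels.
    """
    y, x = np.mgrid[0:ny, 0:nx]
    dx = np.abs(x - 0.5 * (nx - 1))
    dy = np.abs(y - 0.5 * (ny - 1))
    s3 = np.sqrt(3.0)
    return (dy <= 0.5 * s3 * radius) & (s3 * dx + dy <= s3 * radius)
```

Spaxels outside the mask can then be blanked with, for example, `cube[:, ~mask] = np.nan` for a cube ordered (wavelength, y, x).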

4.3 Point Spread Functions

In our datacubes we account for both the spatial and spectral Point Spread Functions (PSFs) affecting CALIFA observations. We convolve the two spatial dimensions of our synthetic 3-dimensional datacubes with a two-dimensional Gaussian PSF to reproduce the 2.5” FWHM spatial resolution of CALIFA, and we convolve the spectral dimension with a one-dimensional Gaussian kernel to reproduce the known FWHM spectral resolution of PMAS/PPak, 6.0 Å and 2.3 Å for the V500 and V1200 setups, respectively.
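The two convolutions can be sketched with separable Gaussian kernels using only NumPy. The cube is assumed to be ordered (wavelength, y, x), as when a FITS cube is read into NumPy, and the spectral kernel width is taken as the quadrature difference between the target FWHM and an assumed Gaussian intrinsic FWHM of the input spectra; whether the original pipeline subtracts the intrinsic resolution in this way is not stated in the text, and the function names are hypothetical.

```python
import numpy as np

FWHM_TO_SIGMA = 1.0 / (2.0 * np.sqrt(2.0 * np.log(2.0)))


def gaussian_kernel(sigma_pix, truncate=4.0):
    """Normalized 1D Gaussian kernel sampled on integer pixel offsets."""
    half = max(1, int(np.ceil(truncate * sigma_pix)))
    x = np.arange(-half, half + 1)
    k = np.exp(-0.5 * (x / sigma_pix) ** 2)
    return k / k.sum()


def convolve_axis(cube, kernel, axis):
    """Convolve every 1D slice of `cube` along `axis` (zero padding at the edges)."""
    if kernel.size > cube.shape[axis]:
        raise ValueError("kernel longer than the axis; reduce `truncate`")
    return np.apply_along_axis(lambda v: np.convolve(v, kernel, mode="same"), axis, cube)


def apply_psfs(cube, spaxel_arcsec, fwhm_spatial_arcsec, dlambda, fwhm_target, fwhm_input=0.0):
    """Spatial + spectral Gaussian smoothing of a cube ordered (wavelength, y, x)."""
    k_sp = gaussian_kernel(fwhm_spatial_arcsec / spaxel_arcsec * FWHM_TO_SIGMA)
    # A 2D Gaussian is separable: smooth along y, then along x.
    out = convolve_axis(convolve_axis(cube, k_sp, axis=1), k_sp, axis=2)
    # Spectral kernel: quadrature difference between target and intrinsic FWHM.
    fwhm_kernel = np.sqrt(max(fwhm_target ** 2 - fwhm_input ** 2, 0.0))
    if fwhm_kernel > 0:
        out = convolve_axis(out, gaussian_kernel(fwhm_kernel / dlambda * FWHM_TO_SIGMA), axis=0)
    return out
```

For the V500 setup this would be called as `apply_psfs(cube, 1.0, 2.5, 2.0, 6.0, fwhm_input)`, where the 1 arcsec spaxel size and `fwhm_input` are assumptions.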

4.4 CALIFA detector noise

We include the characteristic noise of the detector in our simulated CALIFA dataset by adding random Gaussian noise to the synthetic datacubes. To characterize the noise of the PMAS/PPak instrument, we have considered a sample of 20 objects from the CALIFA survey and performed a spatial and spectral analysis of the noise in both the V500 and V1200 setups. The analysed sample covers a range of visually classified morphological types (Walcher+14; the morphological classification is part of DR3 and is publicly available online): 9 spiral galaxies (3 face-on, 3 edge-on and 3 with intermediate inclination), 7 ellipticals and 4 objects in the process of merging. The analysis shows that the dependence of the noise on the signal intensity, characteristic of charge-coupled devices, can be modelled with a simple two-parameter formula (equation 8) combining a constant term and a signal-dependent term.


In this formula, the intensity and the noise are those provided in the CALIFA datacube, both normalized to the median intensity of that datacube. We use normalized fluxes and errors in order to obtain a uniform, object-independent and unit-free characterization of the noise. The two free parameters of the formula are fitted to the data. The first term in the equation is associated with the ‘white noise’ in the detector, while the second accounts for the Poisson (or shot) noise characteristic of photon-counting detectors.
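The sketch below shows how noise of this kind can be added to a noiseless cube. It assumes a standard CCD-style parametrization, σ̃² = a² + b Ĩ (with Ĩ and σ̃ the intensity and noise normalized to the median intensity of the cube), i.e. a constant "white" term plus a Poisson-like variance that grows linearly with the signal. This form may differ in detail from equation 8, and the names a and b are hypothetical, so the Table 6 values should only be used once matched to the actual form of equation 8.

```python
import numpy as np


def noise_sigma(intensity, a, b):
    """Assumed noise model in median-normalized units: sigma_n**2 = a**2 + b * I_n.

    Returns sigma in the same (un-normalized) units as `intensity`.
    """
    median = np.nanmedian(intensity)
    i_norm = np.clip(intensity / median, 0.0, None)  # negative fluxes get only the white term
    return median * np.sqrt(a ** 2 + b * i_norm)


def add_noise(cube, a, b, seed=None):
    """Return a noisy copy of `cube` and the per-pixel sigma used to generate it."""
    rng = np.random.default_rng(seed)
    sigma = noise_sigma(cube, a, b)
    return cube + rng.normal(0.0, 1.0, size=cube.shape) * sigma, sigma
```

The returned `sigma` array is what would naturally populate an ERROR-like extension, since it is the standard deviation actually used to draw the noise.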

Our analysis also shows that the detector noise does not depend on wavelength, except for the expected edge effects. We also find some spatial artifacts related to specific observational issues (e.g. misalignments of the different pointings); none of them is modelled or included in our synthetic datacubes. Finally, we note the well-known noise correlation caused by the CALIFA three-point dithering scheme (Sanchez07), already described by Husemann+13 and Garcia_Benito15. Since our goal is to lay the basis for generating state-of-the-art synthetic IFS data, rather than to address the specific problem of combining PMAS/PPak pointings within the CALIFA observing strategy (not all instruments and surveys rely on this kind of dithering), we have not considered this source of error.

We fit equation 8 to the data of each of the 20 objects considered in the analysis and average the best-fitting parameters. The values obtained for both setups are summarized in Table 6, together with the main properties of the simulated datacubes: spatial and spectral dimensions, spectral sampling and spectral resolution. Fig. 5 shows an example of the fitted noise-intensity relation for three of the galaxies; the white solid lines correspond to the best fit of our detector noise model.
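Under the assumed parametrization σ̃² = a² + b Ĩ, with Ĩ and σ̃ the median-normalized intensity and noise, the per-galaxy fit reduces to a straight-line fit in the (Ĩ, σ̃²) plane, and the sample values are simple averages, as in the sketch below. This is an illustrative choice with hypothetical names: an unweighted linear fit to squared errors is not necessarily the fitting procedure used for Table 6.

```python
import numpy as np


def fit_noise_model(flux, error):
    """Fit sigma_n**2 = a**2 + b * I_n to one datacube (assumed model form).

    Fluxes and errors are normalized by the median flux of the cube, and the
    fit is an ordinary least-squares line in the (I_n, sigma_n**2) plane.
    """
    good = np.isfinite(flux) & np.isfinite(error) & (error > 0)
    median = np.median(flux[good])
    i_norm = flux[good] / median
    s2_norm = (error[good] / median) ** 2
    b, a2 = np.polyfit(i_norm, s2_norm, 1)  # slope, intercept
    return np.sqrt(max(a2, 0.0)), b


def average_fits(cubes):
    """Average the per-galaxy best-fitting (a, b) over a list of (flux, error) pairs."""
    params = np.array([fit_noise_model(f, e) for f, e in cubes])
    return params.mean(axis=0)
```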


Setup N_x N_y N_λ Sampling [Å] FWHM resolution [Å] Noise parameters (Eq. 8)
V500 77 73 1877 2.0 6.0 0.23 12.10
V1200 77 73 1701 0.7 2.3 0.6 11.26

Table 6: Sizes of the simulated datacubes in the spatial and spectral dimensions, spectral sampling, spectral resolution (FWHM), and the best-fitting noise parameters of equation 8.
Figure 5: Colour maps showing the distribution of the noise relative to the signal, normalized to the median value of the signal (in logarithmic scale), for three of the twenty galaxies used to characterize the properties of the noise. The left (right) panels display the values for the V500 (V1200) setup. Black solid lines correspond to the median and value of the noise distribution at a given intensity. White solid lines show the best-fit curves, obtained by averaging the parameters fitted for each galaxy over the full sample of 20 objects; the parameters of the white curves are given in Table 6. The linear colour scale indicates the number of pixels in the CALIFA datacubes at a given signal and noise.

5 The SELGIFS data challenge

The main goal of this work is to provide the scientific community with a reliable set of synthetic IFS observations, together with the corresponding maps of directly measured properties, which can be used to test existing (and future) dedicated analysis tools and as a benchmark for verifying hypotheses and/or preparing observations.

The data are distributed through a web page hosted by the Universidad Autónoma de Madrid. The different files and their data format are described in the following sections.

5.1 Product datacubes

The resolved (spaxel-by-spaxel) galaxy properties, computed directly as described in Sections 2 and 3, are provided in separate files. These maps have been calculated directly from the simulation output, or from the noiseless synthetic spectra before any observational effect is added.

The file names and data formats are listed below:

  • GALNAME.stellar.fits: the file contains the resolved maps obtained directly from the hydrodynamical simulations as described in Section 2.1. These FITS files have a single Header Data Unit (HDU) holding a 9-layer matrix, containing the nine maps of the stellar properties in the order given in Table 2. The header includes the information about the physical property stored in every layer (DESC_*) and its units (UNITS_*), where * refers to the layer number.

  • GALNAME.Lick_indices.fits: this file stores the resolved maps of the 26 Lick indices measured from the noiseless stellar-only datacube (see Section 3.1). Each file consists of a single HDU holding a 26-layer matrix, containing the maps of the absorption features listed in Table 3. The header provides, for each layer, the name of the index (DESC_*) and its units (UNITS_*), where * indicates the layer number.

  • GALNAME.nebular.fits: this file contains the resolved maps of the 13 nebular line intensities measured from the noiseless nebular-only datacube (Sec. 3.1). The data are stored in a single HDU holding a 13-layer matrix, containing the maps of all the nebular lines listed in Table 4. The header stores the line names (DESC_*), rest-frame wavelengths (LAMBDA_*) and units (UNITS_*) for each layer * in the file.

  • GALNAME.particle_number.fits: these files provide the number of stellar particles in the region covered by each spaxel of the simulated datacubes.
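The DESC_* and UNITS_* keywords make it possible to select a layer by name rather than by position. The helpers below are a hypothetical sketch: they accept a plain dict or an astropy.io.fits.Header, and they assume that the layer numbers increase in the same order as the first axis of the data array.

```python
import re

import numpy as np


def layer_table(header):
    """Map layer number -> (description, units) from DESC_* / UNITS_* keywords.

    Works with any mapping that has .keys() and .get(), such as a plain dict
    or an astropy.io.fits.Header.
    """
    layers = {}
    for key in header.keys():
        m = re.fullmatch(r"DESC_(\d+)", key)
        if m:
            n = int(m.group(1))
            layers[n] = (header.get(key), header.get(f"UNITS_{n}"))
    return dict(sorted(layers.items()))


def get_map(data, header, description):
    """Return the 2D map whose DESC_* value equals `description`.

    Assumes layers are numbered in the same order as the first axis of
    `data`, whose shape is (n_layers, ny, nx).
    """
    for pos, n in enumerate(layer_table(header)):
        if header.get(f"DESC_{n}") == description:
            return data[pos]
    raise KeyError(description)


# Possible usage with astropy (file name follows the GALNAME convention above):
# from astropy.io import fits
# with fits.open("GALNAME.stellar.fits") as hdul:
#     print(layer_table(hdul[0].header))
```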

To provide results directly comparable with those obtained by applying observational algorithms to the synthetic datacubes, we additionally provide maps at the same spatial resolution as the synthetic datacubes. These were obtained by convolving the stellar maps with a 2.5” FWHM Gaussian kernel, and by convolving the synthetic spectra with a 2.5” FWHM PSF before extracting the Lick indices and the nebular line intensities as described in Section 3.1. Note that, for the logarithmic quantities in the stellar maps, the PSF convolution is applied before taking the logarithm.
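The note about logarithmic quantities matters because averaging and taking logarithms do not commute: for any positive map, the logarithm of a Gaussian-weighted mean is at least as large as the Gaussian-weighted mean of the logarithms (Jensen's inequality). The purely illustrative 1D profile below shows the difference.

```python
import numpy as np


def gaussian_kernel(fwhm_pix, truncate=4.0):
    """Normalized 1D Gaussian kernel for a FWHM given in pixels."""
    sigma = fwhm_pix / (2.0 * np.sqrt(2.0 * np.log(2.0)))
    half = max(1, int(np.ceil(truncate * sigma)))
    x = np.arange(-half, half + 1)
    k = np.exp(-0.5 * (x / sigma) ** 2)
    return k / k.sum()


# Toy 1D cut through a map of a positive, linear quantity (values are arbitrary).
profile = np.array([1.0] * 10 + [100.0] * 3 + [1.0] * 10)
kernel = gaussian_kernel(2.5)  # 2.5-pixel FWHM, e.g. 2.5 arcsec for 1 arcsec spaxels

log_of_smoothed = np.log10(np.convolve(profile, kernel, mode="same"))  # convolve, then log
smoothed_of_log = np.convolve(np.log10(profile), kernel, mode="same")  # log, then convolve

centre = len(profile) // 2
print(log_of_smoothed[centre], smoothed_of_log[centre])  # the first is larger
```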

5.2 Synthetic observations

HDU Extension name Format Content
0 PRIMARY 32-bit float flux density in units
1 ERROR 32-bit float error on the flux density
2 ERRWEIGHT 32-bit float error weighting factor
3 BADPIX 8-bit integer bad pixel flags (1=bad, 0=good)
4 FIBCOVER 8-bit integer number of fibers used to fill each spaxel
Table 7: Structure of the CALIFA FITS files in DR2 (from Garcia_Benito15).

Our synthetic CALIFA datacubes in the V500 and V1200 setups (Section 4) are provided in separate files, named following the CALIFA DR2 convention GALNAME.V500.rscube.fits.gz and GALNAME.V1200.rscube.fits.gz for the V500 and V1200 setups, respectively. The structure of these simulated data closely follows that adopted in CALIFA, namely datacubes in the standard FITS file format.

The FITS header of the simulated datacubes stores only the most relevant keywords of the DR2 header; most DR2 keywords containing information about the pointing, the reduction pipeline, Galactic extinction, sky brightness, etc. have been removed. In addition to the mandatory FITS keywords, the primary HDU stores the right ascension and declination of the object (following the Greisen02 standard). Since the simulated galaxies have no real position on the sky, these two coordinates are set to arbitrary values, included only to avoid problems with visualization tools. The flux unit is stored under the keyword PIPE UNITS, as in the CALIFA datacubes, and also under the keyword BUNIT, following the FITS standard definition of Pence10.

Each FITS file contains the data for a single galaxy, stored in five HDUs (see Table 7), each providing different information according to the data format of the V1.5 pipeline used in DR2 (Garcia_Benito15). The first two axes of the datacubes correspond to the spatial dimensions (along right ascension and declination) with a sampling. The third axis is the wavelength axis, with the ranges and samplings described in Section 4.2 and Table 6.
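Assuming the synthetic cubes keep the linear spectral WCS keywords CRVAL3, CDELT3 and CRPIX3, as CALIFA datacubes do (this is not stated explicitly here), the wavelength of each spectral pixel can be rebuilt from the header as in the hypothetical helper below. When a cube is read with astropy, the NumPy array is ordered (wavelength, y, x), so the spectrum of spaxel (x, y) is `data[:, y, x]`.

```python
import numpy as np


def wavelength_axis(header, n_lambda=None):
    """Wavelengths of the third (spectral) axis from linear FITS WCS keywords.

    FITS convention: value = CRVAL3 + (pixel - CRPIX3) * CDELT3, with 1-based
    pixel numbers; CRPIX3 is taken as 1 if absent.
    """
    n = n_lambda if n_lambda is not None else header["NAXIS3"]
    pix = np.arange(1, n + 1)
    return header["CRVAL3"] + (pix - header.get("CRPIX3", 1.0)) * header["CDELT3"]
```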

Here we summarize the content of each HDU:

0) Primary (PRIMARY)

The primary HDU contains the synthetic flux densities, in the same units as the CALIFA datacubes.

1) Error (ERROR)

This extension provides the values of the noise level in each pixel, calculated according to Eq. 8. In the case of bad pixels, we store a value of following the CALIFA data structure.

2) Error weight (ERRWEIGHT)

In the CALIFA datacubes, this HDU gives the error scaling factor for each pixel, in the case that all valid pixels of the cube are co-added; in our case we set all the values to 1.

3) Bad pixel (BADPIX)

This extension stores a flag warning of potential problems in a pixel; in the CALIFA dataset these may arise, for instance, from cosmic-ray contamination, bad CCD columns or vignetting. In our datacubes we flag as bad (i.e. set to 1) the regions of the spectra generated with the lower-resolution stellar model (see Section 4.2).
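When analysing the synthetic cubes, the BADPIX extension can be combined with the PRIMARY and ERROR extensions through NumPy masked arrays, so that the pixels generated with the low-resolution stellar model are excluded automatically. The per-spaxel mean S/N below is only a hypothetical example of such a masked computation, for cubes ordered (wavelength, y, x).

```python
import numpy as np


def masked_cube(flux, error, badpix):
    """Wrap flux and error in masked arrays so that BADPIX == 1 pixels are ignored."""
    mask = badpix.astype(bool)
    return np.ma.masked_array(flux, mask=mask), np.ma.masked_array(error, mask=mask)


def mean_snr_map(flux, error, badpix):
    """Per-spaxel mean S/N over good pixels only; fully masked spaxels become NaN."""
    f, e = masked_cube(flux, error, badpix)
    return (f / e).mean(axis=0).filled(np.nan)
```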

4) Fiber coverage (FIBCOVER)

This HDU, available from DR2 onwards, gives the number of fibers used to recover the flux in each spaxel; it is set to 3 in our mock datacubes.

6 Summary

We use hydrodynamical simulations of galaxies formed in a cosmological context to generate mock data mimicking the Integral Field Spectroscopy (IFS) survey CALIFA (Sanchez12). In addition to gravity and hydrodynamics, the hydrodynamical code follows many other relevant galactic-scale physical processes, such as energy feedback and chemical enrichment from supernova explosions, a multi-phase interstellar medium (ISM), and metal-dependent gas cooling. Our hydrodynamical simulations have been post-processed with the radiative transfer code sunrise to obtain their spatially resolved spectral energy distributions. These spectra contain the light emitted by the stars and the nebular emission associated with young stars in the simulations, and include the broadening of the absorption and emission lines due to kinematics, as well as extinction and scattering by the dust in the ISM.

The input parameters of sunrise have been tuned to reproduce the properties of the CALIFA instrument in terms of field-of-view size, number of spaxels and spectral range. After running the radiative transfer with sunrise, we redshift the simulated spectra so that the physical size covered by the spaxels in the radiative transfer stage matches the angular size of the CALIFA spaxels, and we resample and cut the spectra according to the sampling and wavelength range of the low-resolution V500 and blue mid-resolution V1200 CALIFA setups. We convert our spatially resolved spectra into the V500 and V1200 data format of CALIFA DR2, and convolve these 3-dimensional datasets with Gaussian Point Spread Functions in both the spatial and spectral dimensions, mimicking the spatial and spectral resolution of the CALIFA observations. Finally, after parametrizing the noise properties of a sample of 20 galaxies in both the CALIFA V500 and V1200 datasets, we add noise with the same properties to the simulated V500 and V1200 data.

Our final sample of 18 datacubes (3 objects, each with 3 inclinations, in both the V500 and V1200 setups) provides observers with a powerful benchmark to test the accuracy and calibration of their analysis tools, and lays the basis for a reliable comparison between simulations and IFS observational data. For this purpose, together with the synthetic IFS observations, we generate a corresponding set of product datacubes, i.e. resolved maps of several properties computed directly from the simulations and/or from the simulated noiseless datacubes.

Although this work is specifically designed to reproduce the properties of the CALIFA observations, the method illustrated in this paper can be easily extended to mimic other integral field spectrographs such as MUSE (Bacon04), WEAVE (Dalton14), MaNGA (Bundy15) or SAMI (Allen15), by changing some of the input parameters in the radiative transfer stage and performing a similar study of the detector noise. The same procedure can therefore be used to generate synthetic observations for other IFS instruments, or to study a specific science case before applying for observing time. The project can also be extended to other hydrodynamical simulations, which will be essential to enlarge the dataset and to build a more complete sample of galaxies in terms of morphology, total mass, stellar age and metallicity, gas content and merger history.

We therefore encourage researchers to contact the authors if they are interested in obtaining simulated data that mimic the properties of other IFS surveys, or in converting their own hydrodynamical simulations into CALIFA-like datacubes. We hope that this work will foster collaboration between observers and simulators, which will be crucial in view of the several ongoing and future IFS surveys. These surveys will provide the community with large datasets of spatially resolved properties of galaxies at different cosmic times, allowing galaxy formation physics to be studied in unprecedented detail.


We thank Michael Aumer for providing his simulations, P.-A. Poulhazan and P. Creasey for sharing the new chemical code, and Elena Terlevich for useful comments on the manuscript. GG and CS acknowledge support from the Leibniz Gemeinschaft, through SAW-Project SAW-2012-AIP-5 129, and from the High Performance Computer in Bavaria (SuperMUC) through Project pr94zo. YA, CS, JC and GG acknowledge support from the DAAD through the Spain-Germany Collaboration programme PPP-Spain-57050803. The ’Study of Emission-Line Galaxies with Integral-Field Spectroscopy’ programme (SELGIFS, FP7-PEOPLE-2013-IRSES-612701) is funded by the Research Executive Agency (REA, EU). YA is supported by contract RyC-2011-09461 of the Ramón y Cajal programme (Mineco, Spain). JC has been financially supported by the “Research Grants - Short-Term Grants, 2016” (57214227) promoted by the DAAD. JC and YA also acknowledge financial support from grant AYA2013-47742-C4-3-P (Mineco, Spain). LG was supported in part by the US National Science Foundation under Grant AST-1311862. PSB acknowledges support from the BASAL Center for Astrophysics and Associated Technologies (PFB-06). JC would like to thank the ’Galaxy and Quasars’ research group at the Leibniz-Institut für Astrophysik Potsdam (AIP) for the useful discussions and constructive feedback. This study uses data provided by the Calar Alto Legacy Integral Field Area (CALIFA) survey, based on observations collected at the Centro Astronómico Hispano Alemán (CAHA) at Calar Alto, operated jointly by the Max-Planck-Institut für Astronomie and the Instituto de Astrofísica de Andalucía (CSIC).

