Forward Modeling of Spectroscopic Galaxy Surveys: Application to SDSS
Galaxy spectra are essential to probe the spatial distribution of galaxies in our Universe. To better interpret current and future spectroscopic galaxy redshift surveys, it is important to be able to simulate these data sets. We describe Uspec, a forward modeling tool to generate galaxy spectra taking into account intrinsic galaxy properties as well as instrumental responses of a given telescope. The model for the intrinsic properties of the galaxy population was developed in an earlier work for broad-band imaging surveys . We apply Uspec to the SDSS/CMASS sample of Luminous Red Galaxies (LRGs). We construct selection cuts that match those used to build this LRG sample, which we then apply to data and simulations in the same way. The resulting real and simulated average spectra show a very good agreement overall, with the simulated one showing a slightly bluer galaxy population. For a quantitative comparison, we perform Principal Component Analysis (PCA) of the sets of spectra. By comparing the PCs constructed from simulations and data, we find very good agreement for the first four components, and moderate for the fifth. The distributions of the eigencoefficients also show an appreciable overlap. We are therefore able to properly simulate the LRG sample taking into account the SDSS/BOSS instrumental responses. The small residual differences between the two samples can be ascribed to the intrinsic properties of the simulated galaxy population, which can be reduced by adjusting the model parameters in the future. This provides good prospects for the forward modeling of upcoming large spectroscopic surveys.
Martina Fagioli [a], Julian Riebartsch [a], Andrina Nicola [a], Jörg Herbel [a], Adam Amara [a], Alexandre Refregier [a], Chihway Chang [b] and Laurenz Gamper [a,c]
[a] Institute for Particle Physics and Astrophysics, ETH Zürich, 8093 Zürich, Switzerland
[b] Kavli Institute for Cosmological Physics, University of Chicago, Chicago, IL 60637, USA
[c] uSystems, Technoparkstrasse 2, 8406 Winterthur, Switzerland
E-mail: martina.firstname.lastname@example.org
Keywords: spectroscopic surveys, spectra simulations, principal components analysis
1 Introduction
Although the cosmological principle ensures that the universe is homogeneous and isotropic when studied at sufficiently large scales, observations tell us that at smaller scales galaxies are not randomly and evenly distributed in space. Galaxies clump into clusters, and create voids, large areas of the universe which are empty, forming also complicated structures like filaments and sheets. This large scale structure depends both on the cosmology which describes the universe, and on galaxy properties. Three-dimensional maps that take into account the angular positions in the sky and the redshifts of galaxies are therefore a very powerful cosmological probe [2, 3, 4, 5]. This, combined with measurements of intrinsic galaxy properties such as colors, luminosities, morphologies, spectral types, or stellar masses, can also provide clues about galaxy formation and evolution [6, 7, 8].
A large sample of tracers of such large scale structures is needed to extract information about the galaxy clustering pattern and its relation with galaxy properties [9, 10, 11, 12]. Ongoing and upcoming wide-field galaxy surveys such as the Dark Energy Survey (DES; https://www.darkenergysurvey.org), the Kilo-Degree Survey (KiDS; http://kids.strw.leidenuniv.nl) and the survey of the Large Synoptic Survey Telescope (LSST; https://www.lsst.org) provide a wealth of photometric data. However, the errors associated with photometric redshift measurements make spectroscopic redshift surveys also necessary. Galaxy redshift surveys such as the Deep Extragalactic Evolutionary Probe (DEEP2), the VIMOS VLT Deep Survey (VVDS) and the Baryon Oscillation Spectroscopic Survey (BOSS) within the Sloan Digital Sky Survey (SDSS) III already provide measurements of the galaxy clustering power spectrum.
In light of upcoming large spectroscopic redshift surveys, such as the extended BOSS (eBOSS; http://www.sdss.org/surveys/eboss/) [17, 18], the Dark Energy Spectroscopic Instrument (DESI; http://desi.lbl.gov/) [19, 20], the Wide Field Infrared Survey Telescope (WFIRST; https://wfirst.gsfc.nasa.gov), and ESA's Euclid satellite (http://www.euclid-ec.org/), it is necessary to be able to interpret the increasingly precise and numerous data from such cosmological surveys and to forecast the science performance of the experiments, for example by understanding redshift fitting routines or the reliability of photometric redshift estimates. Simulations will play a key role in this scenario. Spectroscopic surveys such as SDSS/BOSS, the 6dF Galaxy Survey (6dFGS), the Big Baryon Oscillation Spectroscopic Survey (BigBOSS) and the 4m Multi-Object Spectroscopic Telescope (4MOST) have developed simulation tools to prepare their observing strategies and improve their data reduction performance (see e.g., [23, 24, 25, 26]). The SPectrOscopic KEn Simulation (SPOKES), an end-to-end simulation facility for spectroscopic cosmological surveys, has also been developed.
In this paper, we describe Uspec and its performance in simulating realistic galaxy spectra. Uspec takes as inputs a galaxy model, described in detail in , and the instrumental setup of a given telescope, and outputs redshifted, noisy galaxy spectra. This allows us to forward model a spectroscopic galaxy sample, and to compare it to an existing one after having performed the same cuts on both. The forward modeling approach is becoming a widely used technique [28, 1, 29], and consists of generating observable quantities from an astrophysical model containing, for instance, the evolution of the luminosity functions of red and blue galaxies with cosmic time (see e.g., ). A similar approach has also been applied recently to image simulations with narrow band photometric data from the Physics of the Accelerating Universe Survey (PAUS; https://www.pausurvey.org), as shown in Tortorelli et al. (in prep.).
Here, we choose to simulate a sample of Luminous Red Galaxies (LRGs) from the SDSS surveys. LRGs are a widely used tracer of large scale structure [30, 31, 32, 33], and their spectra can be well approximated by a linear combination of templates and coefficients [34, 35, 36]. We compare our simulated sample to the real one from SDSS, after applying the appropriate cuts to both. The resulting stacked galaxy spectrum is compared to a stacked galaxy spectrum coming from the SDSS/BOSS survey. We also perform a principal component analysis, which shows the agreement between the simulations and the data and indicates that the method is able to reproduce the variety of properties of the LRGs we aimed to study.
The paper is structured as follows. In Section 2, we describe the model from which the basic ingredients to simulate galaxies are drawn. In Section 3, we describe the data set and the reasons behind the cuts applied to the data. In Section 4, we describe Uspec and the instrumental setup added to the model galaxies in order to generate realistic spectra. In Section 5, we compare our simulated data to real galaxy spectra, both through comparing stacked real and simulated spectra and through a Principal Component Analysis of the two populations. In Section 6, we present our conclusions.
Throughout this work, we use a standard $\Lambda$CDM cosmology with $\Omega_\mathrm{m} = 0.3$, $\Omega_\Lambda = 0.7$ and $H_0 =$ km s$^{-1}$ Mpc$^{-1}$.
2 Galaxy population model
To simulate galaxy spectra, we need basic galaxy properties as inputs. The model from which such properties are drawn is fully described in , and the model parameter values that we use in this study are given in Tortorelli et al. (in prep.). In this section, we review the aspects of the model necessary for simulating galaxy spectra given an input galaxy population. For a more complete description of the model we refer the reader to .
2.1 Galaxy luminosity functions
The galaxy population is described by luminosity functions that depend on absolute magnitude and on redshift $z$. Their functional form is taken to be a Schechter function. The galaxies are drawn from separate and evolving luminosity functions for blue and red galaxies. The distinction between red and blue galaxies is made through their Specific Star Formation Rates (SSFRs). The redshifts and absolute magnitudes are obtained by sampling from the corresponding luminosity function.
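As an illustration of sampling absolute magnitudes from a Schechter function, the following minimal sketch uses the standard magnitude form of the Schechter function and inverse-transform sampling on a grid. The parameter values (`M_star`, `alpha`, the magnitude limits) are hypothetical and are not the model parameters used in this work; the redshift evolution of the parameters is not included.

```python
import numpy as np

def schechter_mag(M, phi_star, M_star, alpha):
    """Schechter luminosity function per unit absolute magnitude."""
    x = 10.0 ** (0.4 * (M_star - M))
    return 0.4 * np.log(10.0) * phi_star * x ** (alpha + 1.0) * np.exp(-x)

def sample_magnitudes(n, M_star, alpha, M_min=-25.0, M_max=-18.0, seed=0):
    """Draw n absolute magnitudes by inverse-transform sampling on a grid."""
    rng = np.random.default_rng(seed)
    grid = np.linspace(M_min, M_max, 2001)
    pdf = schechter_mag(grid, 1.0, M_star, alpha)  # normalization cancels out
    cdf = np.cumsum(pdf)
    cdf = (cdf - cdf[0]) / (cdf[-1] - cdf[0])      # rescale to [0, 1]
    return np.interp(rng.random(n), cdf, grid)

# Hypothetical parameters, chosen only for illustration.
mags = sample_magnitudes(10000, M_star=-20.5, alpha=-0.5)
```

In a full model the same procedure would be repeated per redshift slice, with `M_star` and the normalization evolving with redshift.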
2.2 Spectral energy distributions
The next step is to model the Spectral Energy Distributions (SEDs). We model the SEDs of galaxies as linear combinations of templates weighted by coefficients, where the templates are suitably chosen and the coefficients come from a Dirichlet distribution of order five, as described in . This model was constructed using the NYU Value-Added Galaxy Catalog (NYU-VAGC; http://sdss.physics.nyu.edu/vagc/) based on SDSS, with galaxies mostly at . The templates used are those presented in . They are based on the Bruzual & Charlot stellar evolution synthesis models. For the coefficients, different Dirichlet distributions are used for blue and red galaxies, and the parameters describing these distributions are chosen to be redshift dependent, such that statistically different coefficients are assigned to galaxies at different redshifts. The coefficients, together with redshifts and magnitudes, are stored and given as inputs for the spectra simulations.
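The construction of an SED as a Dirichlet-weighted combination of five templates can be sketched as follows. The templates here are smooth placeholder curves standing in for the real template set, and the Dirichlet concentration parameters `alpha_red` are hypothetical; neither is taken from the model described above.

```python
import numpy as np

rng = np.random.default_rng(1)
n_gal, n_templates, n_wave = 4, 5, 1000
wave = np.linspace(3000.0, 9000.0, n_wave)  # rest-frame wavelength in Angstrom

# Placeholder templates: smooth positive curves, NOT the real template set.
templates = np.array([np.exp(-0.5 * ((wave - c) / 1500.0) ** 2)
                      for c in np.linspace(3500.0, 8500.0, n_templates)])

# Hypothetical Dirichlet concentration parameters, one per template.
alpha_red = np.array([1.0, 1.0, 2.0, 4.0, 8.0])
coeffs = rng.dirichlet(alpha_red, size=n_gal)  # shape (n_gal, 5), rows sum to 1

# Each SED is the coefficient-weighted sum of the templates.
seds = coeffs @ templates                      # shape (n_gal, n_wave)
```

A redshift-dependent model would simply make `alpha_red` a function of redshift and galaxy type before drawing the coefficients.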
2.3 Catalog generator
The basic galaxy properties described above are generated and stored as described in . In , the galaxy catalogs were used in order to simulate astronomical images, with the Ultra Fast Image Generator (Ufig) [44, 45, 46, 47, 1, 29]. Galaxy catalogs can be generated modifying the filters to compute the output magnitudes, and the extinction given by Milky Way dust can be added. This is the first time these catalogs are used to generate galaxy spectra.
3 Data: SDSS/BOSS
As a starting point, we choose a red galaxy sample that is widely used for large scale structure studies and whose properties are expected to be easier to model. We thus select LRGs from Data Release 13 (DR13) of the SDSS/BOSS survey . The imaging and spectroscopic data of this survey were obtained at the 2.5m telescope of the Apache Point Observatory (APO) in Sunspot, New Mexico , with respectively a wide field mosaic CCD camera  and a twin multi-object fiber spectrograph. The BOSS survey (Baryon Oscillation Spectroscopic Survey)  within the SDSS III  uses an upgrade of the SDSS spectrograph. We describe its relevant features in the sections below.
3.1 SDSS photometry
We make use of the petroMag magnitudes of the SDSS catalog. These magnitudes are computed using a modified form of the Petrosian system, as described in  and . Petrosian magnitudes contain a constant fraction of the total light of an object. Furthermore, these magnitudes are model-independent (see http://www.sdss.org/dr12/algorithms/magnitudes/). They are therefore the most suitable choice to describe the photometry of bright galaxies with high signal-to-noise ratio. In the following analysis, we do not consider the u-band, as the u-band magnitude measurements have large photometric errors for SDSS red galaxies (see e.g., ). Note that we do not correct the magnitudes used here for the reddening caused by Galactic dust, but we include this effect in our modeling of the LRGs.
3.2 SDSS spectroscopy
The spectra used in this analysis are taken from the BOSS survey. Here we highlight the relevant features of the spectra we employ; for a full description of the characteristics of these data we refer the reader to .
In the BOSS survey, the number of fibers per spectrograph has been increased to 500 (from the previous 320 of SDSS) so that a total of 1000 objects is observed per exposure. Of these fibers, 895 are dedicated to science targets, 100 to sky and standard stars, and 5 to repeated targets. The BOSS fiber size is 2 arcsec in diameter. This is the most suitable choice for high redshift galaxies (as in this case, up to ) in order to maximize the signal-to-noise while keeping the sky background contamination low. We use spectroscopic redshifts from . The wavelength range extends from 3,650 to 10,400 Å. However, part of the blue wavelengths (Å) is excluded from the analysis below due to fringing. The same applies to the reddest wavelengths of the spectrum (Å) due to the prominent sky background residuals. For a more detailed description of noise effects, such as read-out and shot noise, and effects of flux loss such as atmospheric transmission, see Section 4.1. For a more detailed description of the sky background and resolving power effects, see Section 4.2.
3.3 Sample selection
Here, we describe how we selected the final sample of galaxies that we analyzed. We emulate the cuts of the SDSS/BOSS CMASS sample, which aims at selecting a stellar-mass limited sample of galaxies (see http://www.sdss3.org/dr9/algorithms/boss_galaxy_ts.php). Color cuts are applied in the $g-r$ and $r-i$ plane in order to isolate high redshift galaxies, in the approximate redshift range . The sample is aimed at including red galaxies only; however, the color cuts explicitly applied in SDSS-I/II Cut-II and 2SLAQ to select red galaxies are not applied in CMASS. This is why galaxies with signs of gas emission (e.g., and emission lines) are included in our final sample. However, galaxies with visible emission lines only account for about 4% of the total CMASS sample (see e.g.,  and ).
First, we exclude galaxies with or band magnitudes above the SDSS magnitude limit in those bands, namely:
Cuts on the $i$ band magnitude are already included in the final CMASS color/magnitude cuts applied below. No cuts are applied in the $u$ band (see Section 3.1).
In order to emulate the CMASS sample, the following cuts in magnitude, color and redshift are applied:
where $d_\perp = (r - i) - (g - r)/8$ is the distance perpendicular to the locus of the galaxy colors in the $g-r$ vs. $r-i$ color plane. This ensures the exclusion of low redshift galaxies from the sample. The cuts in the $i$ band define the faint and bright limits. The fiber2Mag is the measurement of the flux contained within the aperture of a spectroscopic fiber in the $i$ band (2 arcsec in the case of the BOSS spectrograph).
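A CMASS-like selection of this kind can be expressed as a boolean mask. The sketch below uses the standard CMASS definition of $d_\perp$; the default thresholds are the published BOSS CMASS target selection values and are an assumption here, since the emulated cuts used in this work (which are based on different magnitude types and include an additional redshift cut) may differ. The function and argument names are hypothetical.

```python
import numpy as np

def d_perp(g, r, i):
    """Distance perpendicular to the galaxy locus in the g-r vs. r-i plane."""
    return (r - i) - (g - r) / 8.0

def cmass_like_mask(g, r, i, i_fib2, i_min=17.5, i_max=19.9,
                    ri_max=2.0, dperp_min=0.55, fib2_max=21.5):
    """Boolean mask for a CMASS-like cut (default thresholds are assumptions)."""
    dp = d_perp(g, r, i)
    return ((i > i_min) & (i < i_max)              # bright and faint limits
            & (r - i < ri_max)                     # color limit
            & (dp > dperp_min)                     # removes low redshift galaxies
            & (i_fib2 < fib2_max)                  # fiber magnitude limit
            & (i < 19.86 + 1.6 * (dp - 0.8)))      # sliding magnitude-color cut

# Two mock galaxies: the first passes, the second fails the d_perp cut.
g = np.array([21.5, 19.0])
r = np.array([19.9, 18.0])
i = np.array([19.0, 17.6])
i_fib2 = np.array([20.5, 19.0])
mask = cmass_like_mask(g, r, i, i_fib2)
```

Applying the same function to the data and to the simulated catalog keeps the two selections identical by construction.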
A sharp cut in redshift () is also introduced by us in order to ensure a closer match between the redshift distribution of simulated and real galaxies (see Section 3.4.2 below). This cut is introduced to focus on the performances of the spectra simulations. In our future work, we will rely only on photometric quantities, as spectroscopic properties such as redshifts need to be measured on spectra themselves, which are the data products we seek to simulate.
3.4 Catalogs comparison
In this section, we present the comparison between the simulated and real galaxy catalogs. The simulated properties are derived from the model described in Section 2 and are given as input in order to simulate galaxy spectra. Here we compare those simulated input properties and the data. In Section 3.3 we listed the cuts in magnitude and redshift spaces applied to SDSS galaxies to mimic the CMASS sample. The same cuts have been applied to the simulated galaxies with the appropriate modifications (see below).
The magnitudes employed here for the simulated galaxies have been computed using SDSS filters. These magnitudes are noise free. To add realistic uncertainties to our magnitudes, we take those from the real SDSS magnitudes we employ. We find a correlation between SDSS magnitudes and their associated uncertainties, which we model with a linear relationship. As expected, fainter magnitudes have larger uncertainties. We construct Gaussians centered on the fitted value of the uncertainty in each magnitude bin, with a standard deviation equal to the scatter around the fitted values. We randomly draw uncertainties from these distributions. These uncertainties are then added to our noise-free simulated magnitudes.
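The procedure above can be sketched as follows, under two simplifying assumptions that are not taken from the text: a single global scatter is used instead of one per magnitude bin, and "adding" an uncertainty is interpreted as perturbing each magnitude by a Gaussian deviate of that width. The mock "real" catalog and all numerical values are invented for the example.

```python
import numpy as np

def fit_error_model(mag, mag_err):
    """Linear fit of uncertainty vs. magnitude, plus the scatter around the fit."""
    slope, intercept = np.polyfit(mag, mag_err, 1)
    scatter = np.std(mag_err - (slope * mag + intercept))
    return slope, intercept, scatter

def add_noise(sim_mag, slope, intercept, scatter, rng):
    """Draw an uncertainty per galaxy and perturb the noise-free magnitude."""
    sigma = rng.normal(slope * sim_mag + intercept, scatter)
    sigma = np.clip(sigma, 1e-4, None)  # keep the drawn uncertainties positive
    return sim_mag + rng.normal(0.0, sigma), sigma

rng = np.random.default_rng(2)
# Mock "real" catalog with an invented magnitude-uncertainty relation.
real_mag = rng.uniform(17.5, 20.0, 5000)
real_err = 0.02 * (real_mag - 17.0) + 0.01 + rng.normal(0.0, 0.002, 5000)
slope, intercept, scatter = fit_error_model(real_mag, real_err)

sim_mag = rng.uniform(17.5, 20.0, 1000)  # noise-free simulated magnitudes
noisy_mag, sigma = add_noise(sim_mag, slope, intercept, scatter, rng)
```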
The data magnitudes we employ are affected by reddening caused by dust. Therefore, we also add this effect on simulated magnitudes using reddening maps from . With regard to the cut presented in Section 3.3 on the fiber2Mag magnitude, the fiber2Mag is assumed to be the same as the band magnitude for the simulated galaxies.
Figure 1 shows the comparison between real (green) and simulated (red) magnitudes in four ($g$, $r$, $i$, $z$) of the five SDSS bands. The $u$ band, although present in the simulated galaxy catalog, has been excluded from the analysis for the reasons described in Section 3.1. It can be seen from Figure 1 that the distributions of the real and simulated galaxy population magnitudes are similarly centered and occupy the same region in magnitude space. However, the band magnitude distributions show a small disagreement between the two galaxy populations, with the simulated galaxies being on average fainter than the SDSS ones.
Redshifts are also assigned to simulated galaxies during the galaxy catalog generation. Figure 2 shows the comparison between the real (green) and the simulated (red) spectroscopic redshift distributions. The simulated redshift distribution is shifted towards lower redshifts, with a median redshift of 0.54 for SDSS galaxies and a median redshift of 0.50 for simulated galaxies (i.e., a difference of 0.04 in median redshift). In future work, the parameters of the input model can be adjusted to improve the match of the two distributions.
In this first analysis of Uspec, we force a match between the redshift distributions of the real and the simulated samples. The matching is done such that every redshift bin contains the same number of objects for both the real and the simulated galaxies, randomly chosen from a parent sample. However, it is worth noting that the difference in median redshift is related to the parameters controlling the redshift evolution of the luminosity functions. For a more detailed discussion of this aspect, see Appendix A. After this matching, the total number of galaxies to be analyzed is 1617 for both the real and the simulated sample.
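A bin-by-bin matching of two redshift distributions can be sketched as below: in each bin, the same number of objects is drawn at random from both parent samples. The bin edges and the mock redshift distributions are hypothetical and chosen only to mimic the offset in median redshift discussed above.

```python
import numpy as np

def match_redshift_distributions(z_a, z_b, bins, rng):
    """Return index arrays selecting equal numbers of objects per redshift bin."""
    idx_a, idx_b = [], []
    bin_a = np.digitize(z_a, bins)
    bin_b = np.digitize(z_b, bins)
    for k in range(1, len(bins)):
        in_a = np.flatnonzero(bin_a == k)
        in_b = np.flatnonzero(bin_b == k)
        n = min(in_a.size, in_b.size)  # largest common count in this bin
        if n == 0:
            continue
        idx_a.append(rng.choice(in_a, size=n, replace=False))
        idx_b.append(rng.choice(in_b, size=n, replace=False))
    return np.concatenate(idx_a), np.concatenate(idx_b)

rng = np.random.default_rng(3)
z_data = rng.normal(0.54, 0.05, 3000)  # mock "real" redshifts
z_sim = rng.normal(0.50, 0.05, 3000)   # mock simulated redshifts
bins = np.linspace(0.4, 0.7, 16)       # hypothetical bin edges
ia, ib = match_redshift_distributions(z_data, z_sim, bins, rng)
```

After the matching, `z_data[ia]` and `z_sim[ib]` have identical histograms on the chosen bins, at the cost of discarding objects from the better-populated sample in each bin.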
4 Spectra simulations: Uspec
Uspec simulates galaxy spectra of an experiment given its location, instrument and a cosmological model. All the steps taken in order to construct galaxy spectra are illustrated in the flowchart in Figure 3.
First, the spectrum of a galaxy is constructed as a linear combination
where the templates are the 5 kcorrect templates from , and the coefficients are described in Section 2.2. This produces a noise-free, rest-frame galaxy model spectrum. The spectrum is then shifted to the observed frame given the redshift provided by the simulated galaxy catalog. The dimming effect due to the shift in wavelength, i.e., the fact that the flux enclosed in a wavelength bin $\Delta\lambda$ must be assigned to a broader wavelength bin $\Delta\lambda\,(1+z)$, and the dimming due to the luminosity distance of the sources are then applied.
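The two dimming effects can be written compactly as $f_\lambda^{\rm obs} = L_\lambda^{\rm rest} / [4\pi d_L^2 (1+z)]$ at $\lambda_{\rm obs} = \lambda_{\rm rest}(1+z)$. The following minimal sketch implements this for a flat $\Lambda$CDM cosmology; the value `h0=70.0` is an assumption made for the example, and the function names are hypothetical rather than part of Uspec.

```python
import numpy as np

C_KM_S = 299792.458  # speed of light in km/s
MPC_CM = 3.0857e24   # one megaparsec in cm

def luminosity_distance_mpc(z, h0=70.0, omega_m=0.3, omega_l=0.7, n=2048):
    """Luminosity distance in Mpc for a flat LCDM cosmology (h0 is assumed)."""
    zz = np.linspace(0.0, z, n)
    inv_e = 1.0 / np.sqrt(omega_m * (1.0 + zz) ** 3 + omega_l)
    # Trapezoidal integration of 1/E(z) gives the comoving distance in c/H0 units.
    integral = np.sum(0.5 * (inv_e[1:] + inv_e[:-1]) * np.diff(zz))
    return (1.0 + z) * (C_KM_S / h0) * integral

def to_observed_frame(wave_rest, lum_rest, z):
    """Convert L_lambda [erg/s/A] at rest to f_lambda [erg/s/cm^2/A] observed."""
    d_l = luminosity_distance_mpc(z) * MPC_CM
    wave_obs = wave_rest * (1.0 + z)
    # The extra (1+z) accounts for the stretching of each wavelength bin.
    flux_obs = lum_rest / (4.0 * np.pi * d_l ** 2 * (1.0 + z))
    return wave_obs, flux_obs

wave_rest = np.linspace(3000.0, 6000.0, 3001)  # 1 A sampling
lum_rest = np.full_like(wave_rest, 1e40)       # flat mock spectrum
wave_obs, flux_obs = to_observed_frame(wave_rest, lum_rest, z=0.5)
```

With this convention the wavelength-integrated flux equals the bolometric luminosity divided by $4\pi d_L^2$, which is a convenient consistency check.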
4.1 Instrumental response
Noise is added to the constructed galaxy model spectra. The instrumental effects which are included into the simulated SDSS-like spectra are listed below:
Read-out noise: A read-out noise of /pixel is assumed (see  for the read-out noise of the BOSS spectrograph). The read-out noise is computed per unit wavelength, taking into account the instrumental resolution, and a random normal realization of it is added to the model galaxy and sky spectra in photons (see the flowchart in Figure 3).
Shot noise: The shot noise is the Poisson random realization of the model galaxy or sky spectrum. Specifically, a Poisson realization of the sum of the model galaxy and sky spectra, with fluxes expressed in photons, is created. A different random Poisson realization of the sky spectrum is then subtracted in order to simulate a realistic sky background subtraction, as sky from a different fiber in the same plate is subtracted from the galaxy in the SDSS survey. We do not account for differences in the sky background given by different locations in the SDSS plate, which we assume to be negligible. For a detailed description of the sky model, see Section 4.2.
Transmission curve: The transmission loss due to atmosphere and instrument is taken into account. We have used the 'spthroughput' routine of the 'idlspec2D' spectroscopic reduction pipeline (http://www.sdss3.org/dr8/software/products.php) built by Princeton University, together with flux calibration files, to create the throughput of the instrument. The galaxy and sky model spectra are first multiplied by the atmospheric transmission. The final, sky-subtracted spectrum is then divided by the atmospheric transmission so as to mimic the steps of the data reduction (see flowchart in Figure 3).
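The three effects listed above can be combined as in the following minimal sketch. It assumes a constant read-out noise per pixel (the wavelength dependence through the instrumental resolution is ignored), and the read-out noise value, the transmission and the flat mock spectra are hypothetical. The function is an illustration of the steps, not the Uspec implementation.

```python
import numpy as np

def simulate_observed_spectrum(gal_phot, sky_phot, transmission, read_noise, rng):
    """Return a noisy, sky-subtracted, transmission-corrected spectrum in photons."""
    # Attenuate both model spectra by the transmission.
    gal_t = gal_phot * transmission
    sky_t = sky_phot * transmission
    # Science fiber: Poisson realization of galaxy + sky, plus Gaussian read-out noise.
    science = rng.poisson(gal_t + sky_t) + rng.normal(0.0, read_noise, gal_t.shape)
    # Sky fiber: an independent Poisson realization of the sky, plus read-out noise.
    sky_obs = rng.poisson(sky_t) + rng.normal(0.0, read_noise, sky_t.shape)
    # Subtract the sky and divide by the transmission, as in the data reduction.
    return (science - sky_obs) / transmission

rng = np.random.default_rng(4)
n_pix = 20000
gal = np.full(n_pix, 200.0)   # flat mock galaxy spectrum in photons
sky = np.full(n_pix, 500.0)   # flat mock sky spectrum in photons
trans = np.full(n_pix, 0.3)   # hypothetical total transmission
spec = simulate_observed_spectrum(gal, sky, trans, read_noise=3.0, rng=rng)
```

Because the subtracted sky is an independent realization, the sky contributes noise to the final spectrum twice, which is what makes the sky model relevant even after subtraction.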
4.2 Sky model
The emission lines of the night sky are important contaminants for astronomical observations. It is therefore important to properly model the sky spectrum in order to estimate its effective impact on the noise associated to the galaxies simulated by Uspec.
The night sky spectrum is recorded in every single astronomical observation in SDSS. Although its variability makes it difficult to model if one wants to simulate the precise intensity of the features of the night sky at a given moment of the night and position in the sky, the central wavelengths of the emission lines are constant in time and space, and can therefore be easily modeled. For the purpose of forward modeling the sky, a sky spectrum at a given moment of time or position in space is not needed. The Poisson realization of a night sky background which includes the common sky lines and continuum is sufficient for the purpose of mimicking a realistic sky subtraction. Below we list the input parameters for our simulated sky model; in Figure 4 we show our (noise-free) model sky spectrum, its individual components and the comparison with a real (noisy) random BOSS sky spectrum.
Line Spread Function (LSF): We compute the LSF (i.e., the broadening due to the instrumental resolving power as a function of wavelength ) on randomly observed sky spectra in the BOSS survey. We fit Gaussians to a series of evenly distributed sky lines along the wavelength direction. We then derive a linear relation between the FWHM of those lines and their central wavelengths. The relation we derive is in agreement with the resolving power provided by BOSS ( in the blue range, in the red range ). We use this relation to determine the width of the sky lines in our model spectrum.
UVES Atlas of Paranal sky lines (https://www.eso.org/observing/dfo/quality/UVES/pipeline/sky_spectrum.html): The central wavelengths and intensities of the sky lines are taken from . The atlas of lines in the optical and near-IR wavelength range has been acquired by UVES, the echelle spectrograph at the 8.2-m UT2 telescope of the Very Large Telescope (VLT). While the absolute intensities of sky line emission depend on the time and location of the observations, the UVES line intensities are used here as a reference in order to construct a realistic sky model spectrum.
Light pollution emission lines: Emission lines tracing light pollution include HgI 5461, 5770, 5791 Å and components of the NaI 5890, 5896 Å lines, indicative of both high-pressure and low-pressure sodium lamps. These lines are not included in the UVES atlas of sky lines, as in a dark site like Paranal there is no trace of such elements in the atmosphere. For this reason, we fit Gaussians to these sky lines in a real SDSS spectrum, deriving their peak intensities. We then construct a mock spectrum using these sky lines only, with the LSF described above and the intensities described here, and add it to the total spectrum.
Continuum: The continuum for our model sky has been computed with the SkyCalc Web Application (https://www.eso.org/observing/etc/bin/gen/form?INS.MODE=swspectr+INS.NAME=SKYCALC) from the ESO Sky Calculator [69, 70]. This includes:
Scattered Moonlight: Scattered moonlight has a stronger effect on the continuum if the observing date is close to full Moon. This is not the case for the observations of CMASS, which have been taken during dark time. However, as the blue wavelengths in particular are affected by it, it cannot be neglected.
Zodiacal Light: Zodiacal light comes from interplanetary dust grains scattering sunlight. Here we choose values for ecliptic latitude and heliocentric ecliptic longitude for targets at the zenith, with airmass = 1. A strong continuum coming from zodiacal light would be expected for low absolute values of such coordinates.
Scattered Starlight: Starlight is scattered in the atmosphere. The distribution of stars reaches a peak close to the centre of the Milky Way. Therefore, the scattering model required for this kind of distribution is that for extended sources. This component of the continuum is minor compared to the other two main components mentioned above. As a consequence, computing a mean continuum spectrum is sufficient for an exposure time calculator application, such as the one used here.
A demonstration of the impact of the sky model on the data analysis is given in Appendix B.
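The sky-line part of the model described above can be sketched as a sum of Gaussians whose widths follow a linear FWHM-wavelength relation. In the example below, the "measured" FWHM values, the line intensities and the treatment of the intensities as integrated line fluxes are all assumptions made for illustration; the line centers are three of the strong sky lines masked later in the analysis.

```python
import numpy as np

FWHM_TO_SIGMA = 1.0 / (2.0 * np.sqrt(2.0 * np.log(2.0)))

def sky_line_spectrum(wave, centers, intensities, slope, intercept):
    """Sum of Gaussian sky lines with FWHM = slope * center + intercept."""
    centers = np.asarray(centers, dtype=float)[:, None]
    amps = np.asarray(intensities, dtype=float)[:, None]
    sigma = (slope * centers + intercept) * FWHM_TO_SIGMA
    # Intensities are treated as integrated line fluxes (an assumption).
    profiles = amps / (sigma * np.sqrt(2.0 * np.pi)) * np.exp(
        -0.5 * ((wave[None, :] - centers) / sigma) ** 2)
    return profiles.sum(axis=0)

# Hypothetical FWHM measurements used to derive the linear relation.
line_centers = np.array([4000.0, 5578.5, 6301.7, 8000.0])
measured_fwhm = 1.0 + 3e-4 * line_centers
slope, intercept = np.polyfit(line_centers, measured_fwhm, 1)

wave = np.arange(5000.0, 7000.0, 0.1)
sky = sky_line_spectrum(wave, [5578.5, 6301.7, 6364.5], [10.0, 3.0, 1.0],
                        slope, intercept)
```

A continuum component and the light pollution lines would be added to `sky` in the same way before drawing the Poisson realization.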
4.3 Construction of the final Uspec spectrum
As described in the flowchart in Figure 3 and in the previous sections, a model sky-subtracted galaxy spectrum is built starting from a linear combination of 5 templates and coefficients assigned to the galaxies, as described in Section 2. The simulated spectrum includes read-out and shot noise and the effects of the atmospheric transmission. At this point, the magnitude in the band is computed and compared to the input band. A warning is generated in case the difference between the two exceeds . This happens for of the generated galaxy spectra, which are nevertheless included in the final sample. The final fluxes, input magnitudes, redshifts and output band magnitudes are stored in order to be compared with those from real data.
5.1 Stacked spectra comparison
In order to compare real and simulated galaxy spectra, we compute the average stacked galaxy spectra for both samples. A total of 1617 galaxy spectra is used for both the SDSS and the Uspec samples. As in , the SDSS spectra are corrected for Galactic extinction following the extinction curve for diffuse gas from  with , and using the Galactic values from the maps of . This is needed as simulated galaxy spectra do not include Galactic dust. We do not correct for any internal dust extinction, since passive galaxies are expected to have negligible intrinsic dust. Afterwards, the spectra are shifted to the rest frame and normalized by the mean flux at Å wavelengths. In this region, the spectra of red galaxies are flat and no strong features are expected. The spectra are then interpolated onto a 1 Å linearly spaced wavelength grid. The normalization and interpolation steps are applied to both SDSS and Uspec spectra. Figure 5 shows the comparison between the two spectra. The average spectra for both galaxy samples are clearly those of red galaxies with an old stellar population, revealing features such as the Ca II H and K lines, the G-band at 4300 Å, the Balmer absorption lines and the break at 4000 Å. The Mg lines are also clearly visible in both spectra.
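The stacking steps (shift to the rest frame, normalization, interpolation onto a common 1 Å grid, averaging) can be sketched as follows. The normalization window, the wavelength grid and the mock spectra are hypothetical choices made for the example; the Galactic extinction correction is not included.

```python
import numpy as np

def stack_spectra(waves_obs, fluxes, redshifts, grid, norm_window):
    """Average rest-frame spectra after normalizing each in a common window."""
    lo, hi = norm_window
    rows = []
    for wave, flux, z in zip(waves_obs, fluxes, redshifts):
        wave_rest = wave / (1.0 + z)                       # shift to rest frame
        in_win = (wave_rest >= lo) & (wave_rest <= hi)
        flux_norm = flux / flux[in_win].mean()             # normalize
        rows.append(np.interp(grid, wave_rest, flux_norm,  # common 1 A grid
                              left=np.nan, right=np.nan))
    return np.nanmean(np.vstack(rows), axis=0)

def shape(w):
    """Mock rest-frame spectral shape, linear in wavelength."""
    return w / 4000.0

grid = np.arange(3800.0, 4801.0, 1.0)
z1, z2 = 0.5, 0.6
w1 = np.linspace(5000.0, 8000.0, 3001)
w2 = np.linspace(5500.0, 8500.0, 3001)
f1 = 3.0 * shape(w1 / (1.0 + z1))  # same shape, different amplitudes
f2 = 7.0 * shape(w2 / (1.0 + z2))
stacked = stack_spectra([w1, w2], [f1, f2], [z1, z2], grid, (4000.0, 4200.0))
```

Since both mock spectra share the same rest-frame shape, the stack recovers that shape regardless of their amplitudes and redshifts.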
Overall, the agreement between the two average spectra is good. Nonetheless, the Uspec average spectrum shows a somewhat flatter shape and stronger emission in [O II] and [O III], an indication of an overall bluer population than the SDSS sample. This difference in the overall population can be explained by looking at the difference in color-color space between the two populations. Figure 6 shows the comparison between SDSS and Uspec galaxies in the vs. color-color space. The figure shows that the centroids of the two distributions have a small offset. This is due to the differences in magnitude which are also visible in Figure 1, and which become especially relevant for the band. Figure 6 shows that the real data galaxy population has a tail in the distribution towards redder colors in both the and the planes. This explains the difference in the overall shape of the stacked spectra and the stronger emission in [O II] for the simulated galaxies with respect to the SDSS ones, and also shows the ability of  and Uspec to generate realistic spectra given some input properties. This difference in the spectra can be used as a diagnostic to fix the differences in the input galaxy properties in . This can be achieved in future work with an ABC (Approximate Bayesian Computation) optimization of the model parameters, as discussed in Section 6. These differences, however, highlight the need for further quantifying the agreement of the two populations studied, which we now turn to.
5.2 Principal Component Analysis (PCA)
As discussed above, we need to quantify the differences between the two populations of real and simulated galaxies. Galaxy spectra contain a large amount of information, as each galaxy spectrum is described by 3469 data points. A useful approach to this problem is therefore to reduce its dimensionality.
The Karhunen-Loève transform, also commonly called Principal Component Analysis (PCA), is a technique which is widely used to reduce the dimensionality of big data sets. Its application in astronomy has been explored in detail (see e.g., [72, 73, 74]). Applying PCA to spectroscopy essentially consists of representing the spectra as a lower dimensional set of eigenspectra. The eigenspectra are obtained by finding a matrix such that
where the diagonal matrix contains the eigenvalues of the correlation matrix constructed from the spectra. No weights are included in this analysis. We solve this problem through Singular Value Decomposition (SVD). In both real and simulated spectra, we mask the regions where strong sky lines are expected, namely at 5578.5 Å, 5894.6 Å, 6301.7 Å, 6364.5 Å and 7246 Å, with a FWHM of 15 Å for each sky line. See Appendix B for a discussion on how PCA performs without masking those lines. The first 200 Å in the blue and about 2500 Å in the infrared regions of the spectra have also been excluded from the analysis, as they are severely dominated by residual sky features. The same analysis is applied to both the SDSS spectral sample and the Uspec simulated spectra. Each galaxy spectrum can be constructed as follows:
where a (b) are the expansion coefficients, or eigencoefficients (see Section 5.2.2 below) and () are the eigenspectra for the data (simulations).
The comparison of the first five principal components for both samples is shown in Figure 7. The grey areas show the masked regions where strong sky lines are expected. Overplotted are the and bands from SDSS. The PCA conducted independently for the data and the simulations shows good agreement in the first four components. The choice of using five components is motivated by the initial construction of the Uspec spectra, which are built from the five templates (Section 4). As can be seen in the figure, the first two components capture most of the physical information coming from the spectra. Even if individual features are not visible, as the spectra are analyzed in the observed frame, the overall shape of red galaxy spectra is clearly visible in the first two PCA components. The three higher PCA components all show similar patterns, for both the SDSS and the Uspec sample. In those, the characteristic bump at Å, which is the position of the 4000 Å break shifted by the median redshifts of the two populations, is clearly visible, together with other redshifted features such as [O II] emission, the G-band and the H absorption, all broadened due to the variety of the redshifts of the two samples.
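An unweighted PCA through SVD, with the sky-line regions masked, can be sketched as follows. Two choices here are assumptions rather than details taken from the text: the mean spectrum is not subtracted before the decomposition, and the 15 Å FWHM mask is interpreted as removing pixels within 7.5 Å of each line center. The wavelength grid and the mock spectra are invented for the example.

```python
import numpy as np

def sky_mask(wave, centers, fwhm=15.0):
    """Boolean mask that is False within fwhm/2 of each sky line center."""
    keep = np.ones(wave.shape, dtype=bool)
    for c in centers:
        keep &= np.abs(wave - c) > fwhm / 2.0
    return keep

def pca_eigenspectra(spectra, n_components=5):
    """Eigenspectra (rows), eigencoefficients and eigenvalues from an SVD."""
    # spectra has shape (n_spectra, n_pixels); rows of vt are orthonormal.
    u, s, vt = np.linalg.svd(spectra, full_matrices=False)
    eigenspectra = vt[:n_components]
    coeffs = spectra @ eigenspectra.T
    return eigenspectra, coeffs, s[:n_components] ** 2

rng = np.random.default_rng(5)
wave = np.linspace(3850.0, 7900.0, 3469)  # hypothetical observed-frame grid
templates = np.array([np.exp(-0.5 * ((wave - c) / 800.0) ** 2)
                      for c in np.linspace(4000.0, 7800.0, 5)])
spectra = rng.random((300, 5)) @ templates + 0.01 * rng.normal(size=(300, wave.size))

keep = sky_mask(wave, [5578.5, 5894.6, 6301.7, 6364.5, 7246.0])
eig, coeffs, eigvals = pca_eigenspectra(spectra[:, keep])
```

Running the same two functions on the real and on the simulated sample yields the two independent bases that are compared in the following subsection.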
5.2.1 Mixing matrix
By definition, solving the eigenvalue problem of Equation 3 means that the basis sets obtained for the data and for the simulations are two orthonormal bases. This means that we can define a mixing matrix M such that
so that, if , reduces to
In other words, in the ideal case, i.e., if real and simulated data were described by the same basis set, the mixing matrix would be the identity matrix. Figure 8 shows a graphical representation of the mixing matrix between the real and the simulated spectra presented above. In the figure, the numbers in the boxes show the scalar products between the different components. As expected, the first components show better agreement than higher order components, with a significant drop for the fifth component. The off-diagonal elements of the matrix are significantly smaller than the diagonal ones up to the fourth component. A distance metric can be defined in order to assess how far the mixing matrix is from being diagonal. We choose to use the ratio between the product of the diagonal elements of matrix M and its determinant, $\prod_i M_{ii} / \det(M)$. For a diagonal matrix, such a ratio should be one. In our case
which shows how similar the two basis sets independently used to describe real and simulated spectra are. In Appendix A, we describe how the mixing matrix changes when we do not match the redshift distributions of the two populations. In particular, it is worth noting how varies for a difference in the median redshift between the two populations of , going from 0.872 to 0.827. In Appendix B, we show how changes with not masking the strong sky emission lines, dominating all principal components except for the first one. The effect of sky lines should be taken into account as they might strongly influence the distance metrics introduced here.
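The mixing matrix and the ratio used as a distance metric can be sketched in a few lines. This is a minimal illustration under stated assumptions: the two bases are stored as rows of NumPy arrays, and the toy bases below (an identity-like basis and a slightly rotated copy) are invented for the example; the function names are hypothetical.

```python
import numpy as np

def mixing_matrix(basis_a, basis_b):
    """M[i, j] is the scalar product of component i of basis_a
    with component j of basis_b (components stored as rows)."""
    return basis_a @ basis_b.T

def diagonal_ratio(m):
    """Product of the diagonal elements of m divided by its determinant."""
    return np.prod(np.diag(m)) / np.linalg.det(m)

# Toy example: five orthonormal "eigenspectra" sampled on eight pixels.
basis_a = np.eye(5, 8)

# Second basis: the same components, with the first two rotated into
# each other by a small angle (an arbitrary illustrative choice).
theta = 0.1
rotation = np.eye(5)
rotation[:2, :2] = [[np.cos(theta), -np.sin(theta)],
                    [np.sin(theta), np.cos(theta)]]
basis_b = rotation @ basis_a

m_same = mixing_matrix(basis_a, basis_a)   # identity matrix
m_mixed = mixing_matrix(basis_a, basis_b)  # off-diagonal terms appear

ratio_same = diagonal_ratio(m_same)
ratio_mixed = diagonal_ratio(m_mixed)
```

In this toy case the ratio is one for identical bases and equals cos²(θ) for the rotated basis, so it decreases as components mix. The ratio is also unchanged when the sign of any single component is flipped, which is convenient because principal components are only defined up to a sign.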
We can determine the relative contribution of each eigenspectrum to an observed spectrum by calculating the respective eigencoefficients defined in Equations 4 and 5. These are simply the scalar products of the eigenspectra with the normalized spectra. Figures 9 and 10 show the distributions of the five eigencoefficients. Figure 9 shows the real spectra from SDSS (green) and the simulated Uspec spectra (red) projected onto the SDSS principal components (i.e., onto the basis set from Equation 4). The coefficient distributions of the real and simulated spectra overlap, and the coefficients are all correlated with each other. However, the Uspec eigencoefficients occupy a larger region of parameter space in all five coefficients. This is even more evident in Figure 10, where the spectra are projected onto the Uspec principal components (i.e., onto the basis set from Equation 5). The coefficient distributions still overlap and correlations between different coefficients are visible, but the regions occupied by the Uspec spectra are clearly larger than those of SDSS. This is due to the higher signal-to-noise ratio of the simulated Uspec spectra, which can also be seen in Figure 7. The higher signal-to-noise ratio makes the eigencoefficients sensitive to a wider variety of properties in the simulated galaxy population. This effect can be accounted for by matching the noise properties of the SDSS spectra. Furthermore, the differences in the eigencoefficients can also be used as distance measures to check the correctness of the inputs of the simulations. This is particularly evident when comparing the projected coefficients described here with those shown in Appendix A. In an upcoming publication, we will use the eigencoefficients defined here to better constrain the input redshift distribution, in addition to the distance measures already outlined in earlier work.
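The projection described above can be sketched as follows. The example assumes that "normalized" means scaling each spectrum to unit Euclidean norm, which is an assumption about the convention of Equations 4 and 5; the function name and the toy spectra are hypothetical.

```python
import numpy as np

def eigencoefficients(spectra, basis):
    """Project unit-norm spectra onto a set of eigenspectra.

    spectra: array of shape (n_galaxies, n_wavelengths)
    basis:   array of shape (n_components, n_wavelengths), rows orthonormal
    Returns an array of shape (n_galaxies, n_components).
    """
    # Assumption: normalization to unit Euclidean norm.
    norms = np.linalg.norm(spectra, axis=1, keepdims=True)
    return (spectra / norms) @ basis.T

# Toy basis of five orthonormal components on eight pixels.
basis = np.eye(5, 8)

# Two toy spectra built from the basis components.
spectra = np.array([
    2.0 * basis[0],
    3.0 * basis[1] + 4.0 * basis[2],
])
coeffs = eigencoefficients(spectra, basis)
```

Here the first spectrum projects entirely onto the first component, while the second has coefficients 0.6 and 0.8 on the second and third components. Projecting both the real and the simulated spectra onto the same basis, as done for Figures 9 and 10, amounts to calling the function twice with the same `basis` argument.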
In this paper we describe Uspec, a tool to simulate galaxy spectra for cosmological surveys. Uspec builds galaxy spectra as linear combinations of templates weighted by coefficients. The coefficients, together with magnitudes and spectroscopic redshifts, are drawn from luminosity functions which evolve with redshift.
To compare our simulations to real data, we considered LRG samples selected with the same redshift and color cuts as the CMASS sample from SDSS/BOSS. We applied these cuts to both real and simulated galaxies. We then modeled the noise and instrumental properties and included them in the simulated spectra. In particular, we modeled the read-out noise, the shot noise, and the instrumental and atmospheric throughput. We also constructed and included in Uspec a night-sky background model spectrum with the same characteristics as those observed at APO in New Mexico. The LSF for the line broadening was derived from sky spectra observed in the BOSS survey.
We compared the average spectra of the real and simulated galaxy populations, finding good agreement. A small residual difference between the average spectra is visible and can be ascribed to the intrinsic properties of the simulated galaxy population. This difference can be reduced by adjusting the model parameters in the future.
To further quantify the level of agreement between the real and the simulated galaxy samples, we also performed a PCA. We find remarkably good agreement between the two populations for the first four principal components, and moderate agreement for the fifth component. The comparison of the eigencoefficients of the two galaxy populations also shows that we are able to reproduce the variety of properties of LRGs in the SDSS survey. The distributions of the eigencoefficients obtained by projecting real SDSS spectra and simulated Uspec spectra onto the SDSS principal components overlap, as do those obtained by projecting onto the Uspec principal components. However, the distributions of the Uspec galaxies are somewhat broader than those of the data. This can be explained by the higher signal-to-noise ratio of the simulated galaxies, which results in a wider variety of projected coefficients.
We define a distance measure as the ratio between the product of the diagonal elements of the mixing matrix M of the PCAs and its determinant. The mixing matrix is expected to be the identity matrix if real and simulated data have the same principal components. We find a value of 0.872 for this ratio. In the course of our analysis, we match the redshift distributions of real and simulated data. It is worth noting, however, that if the median redshifts of the two galaxy samples differ, the value of the ratio decreases to 0.827. The eigencoefficient distributions also change when such a difference is introduced. This is an interesting result, as it indicates that the distance measure, as well as the eigencoefficient distributions, are sensitive to the parameters that control the redshift evolution of the input luminosity function. For a detailed discussion of this aspect, see Appendix A.
The results presented here are promising and offer good prospects for applying our method to large upcoming spectroscopic surveys such as DESI. In future work, we plan to incorporate Uspec into a full Approximate Bayesian Computation (ABC) framework, to better match the intrinsic and noise properties of our simulated galaxies to those of real data. Furthermore, we will seek to simulate the population of blue star-forming galaxies, in order to also test the population of emitters, which offers new insights into the clustering of different galaxy populations.
MF would like to thank Luca Tortorelli for useful discussions on galaxy properties. This research made use of IPython, NumPy, SciPy, and Matplotlib. We acknowledge support by SNF grant .
-  J. Herbel, T. Kacprzak, A. Amara, A. Refregier, C. Bruderer and A. Nicola, The redshift distribution of cosmological samples: a forward modeling approach, Journal of Cosmology and Astroparticle Physics 8 (Aug., 2017) 035, [1705.05386].
-  P. J. E. Peebles and J. T. Yu, Primeval Adiabatic Perturbation in an Expanding Universe, The Astrophysical Journal 162 (Dec., 1970) 815.
-  R. A. Sunyaev and Y. B. Zeldovich, Small scale entropy and adiabatic density perturbations - Antimatter in the Universe, Astrophysics and Space Science 9 (Dec., 1970) 368–382.
-  J. R. Bond and G. Efstathiou, Cosmic background radiation anisotropies in universes dominated by nonbaryonic dark matter, Astrophysical Journal, Letters 285 (Oct., 1984) L45–L48.
-  A. L. Coil, The Large-Scale Structure of the Universe, p. 387. 2013. 10.1007/978-94-007-5609-0_8.
-  D. S. Madgwick, E. Hawkins, O. Lahav, S. Maddox, P. Norberg, J. A. Peacock et al., The 2dF Galaxy Redshift Survey: galaxy clustering per spectral type, Monthly Notices of the Royal Astronomical Society 344 (Sept., 2003) 847–856, [astro-ph/0303668].
-  I. Zehavi, D. J. Eisenstein, R. C. Nichol, M. R. Blanton, D. W. Hogg, J. Brinkmann et al., The Intermediate-Scale Clustering of Luminous Red Galaxies, The Astrophysical Journal 621 (Mar., 2005) 22–31, [astro-ph/0411557].
-  I. Zehavi, Z. Zheng, D. H. Weinberg, M. R. Blanton, N. A. Bahcall, A. A. Berlind et al., Galaxy Clustering in the Completed SDSS Redshift Survey: The Dependence on Color and Luminosity, The Astrophysical Journal 736 (July, 2011) 59, [1005.2413].
-  M. Tegmark, Measuring Cosmological Parameters with Galaxy Surveys, Physical Review Letters 79 (Nov., 1997) 3806–3809, [astro-ph/9706198].
-  D. M. Goldberg and M. A. Strauss, Determination of the Baryon Density from Large-Scale Galaxy Redshift Surveys, The Astrophysical Journal 495 (Mar., 1998) 29–43, [astro-ph/9707209].
-  D. J. Eisenstein, W. Hu and M. Tegmark, Cosmic Complementarity: $H_0$ and $\Omega_m$ from Combining Cosmic Microwave Background Experiments and Redshift Surveys, Astrophysical Journal, Letters 504 (Sept., 1998) L57–L60, [astro-ph/9805239].
-  T. Hong, J. L. Han, Z. L. Wen, L. Sun and H. Zhan, The Correlation Function of Galaxy Clusters and Detection of Baryon Acoustic Oscillations, The Astrophysical Journal 749 (Apr., 2012) 81, [1202.0640].
-  B. J. Weiner, A. C. Phillips, S. M. Faber, C. N. A. Willmer, N. P. Vogt, L. Simard et al., The DEEP Groth Strip Galaxy Redshift Survey. III. Redshift Catalog and Properties of Galaxies, The Astrophysical Journal 620 (Feb., 2005) 595–617, [astro-ph/0411128].
-  B. Garilli, O. Le Fèvre, L. Guzzo, D. Maccagni, V. Le Brun, S. de la Torre et al., The Vimos VLT deep survey. Global properties of 20,000 galaxies in the I 22.5 WIDE survey, Astronomy and Astrophysics 486 (Aug., 2008) 683–695, [0804.4568].
-  D. Schlegel, M. White and D. Eisenstein, The Baryon Oscillation Spectroscopic Survey: Precision measurement of the absolute cosmic distance scale, in astro2010: The Astronomy and Astrophysics Decadal Survey, vol. 2010 of ArXiv Astrophysics e-prints, 2009, 0902.4680.
-  D. J. Eisenstein, D. H. Weinberg, E. Agol, H. Aihara, C. Allende Prieto, S. F. Anderson et al., SDSS-III: Massive Spectroscopic Surveys of the Distant Universe, the Milky Way, and Extra-Solar Planetary Systems, Astronomical Journal 142 (Sept., 2011) 72, [1101.1529].
-  K. S. Dawson, J.-P. Kneib, W. J. Percival, S. Alam, F. D. Albareti, S. F. Anderson et al., The SDSS-IV Extended Baryon Oscillation Spectroscopic Survey: Overview and Early Data, Astronomical Journal 151 (Feb., 2016) 44, [1508.04473].
-  G.-B. Zhao, Y. Wang, A. J. Ross, S. Shandera, W. J. Percival, K. S. Dawson et al., The extended Baryon Oscillation Spectroscopic Survey: a cosmological forecast, Monthly Notices of the Royal Astronomical Society 457 (Apr., 2016) 2377–2390, [1510.08216].
-  DESI Collaboration, A. Aghamousa, J. Aguilar, S. Ahlen, S. Alam, L. E. Allen et al., The DESI Experiment Part I: Science,Targeting, and Survey Design, ArXiv e-prints (Oct., 2016) , [1611.00036].
-  DESI Collaboration, A. Aghamousa, J. Aguilar, S. Ahlen, S. Alam, L. E. Allen et al., The DESI Experiment Part II: Instrument Design, ArXiv e-prints (Oct., 2016) , [1611.00037].
-  R. Laureijs, J. Amiaux, S. Arduini, J. . Auguères, J. Brinchmann, R. Cole et al., Euclid Definition Study Report, ArXiv e-prints (Oct., 2011) , [1110.3193].
-  R. S. de Jong, O. Bellido-Tirado, C. Chiappini, É. Depagne, R. Haynes, D. Johl et al., 4MOST: 4-metre multi-object spectroscopic telescope, in Ground-based and Airborne Instrumentation for Astronomy IV, vol. 8446 of Proceedings of the SPIE, p. 84460T, Sept., 2012, 1206.6885, DOI.
-  L. Campbell, W. Saunders and M. Colless, The tiling algorithm for the 6dF Galaxy Survey, Monthly Notices of the Royal Astronomical Society 350 (June, 2004) 1467–1476, [astro-ph/0403502].
-  M. R. Blanton, H. Lin, R. H. Lupton, F. M. Maley, N. Young, I. Zehavi et al., An Efficient Targeting Strategy for Multiobject Spectrograph Surveys: the Sloan Digital Sky Survey “Tiling” Algorithm, Astronomical Journal 125 (Apr., 2003) 2276–2286, [astro-ph/0105535].
-  D. Schlegel, F. Abdalla, T. Abraham, C. Ahn, C. Allende Prieto, J. Annis et al., The BigBOSS Experiment, ArXiv e-prints (June, 2011) , [1106.1706].
-  T. Boller and T. Dwelly, The 4MOST facility simulator: instrument and science optimisation, in Observatory Operations: Strategies, Processes, and Systems IV, vol. 8448 of Proceedings of the SPIE, p. 84480X, Sept., 2012, 1208.4733, DOI.
-  B. Nord, A. Amara, A. Réfrégier, L. Gamper, L. Gamper, B. Hambrecht et al., SPOKES: An end-to-end simulation facility for spectroscopic cosmological surveys, Astronomy and Computing 15 (Apr., 2016) 1–15, [1602.01480].
-  A. Refregier and A. Amara, A way forward for Cosmic Shear: Monte-Carlo Control Loops, Physics of the Dark Universe 3 (Apr., 2014) 1–3, [1303.4739].
-  C. Bruderer, A. Nicola, A. Amara, A. Refregier, J. Herbel and T. Kacprzak, Cosmic shear calibration with forward modeling, ArXiv e-prints (July, 2017) , [1707.06233].
-  D. J. Eisenstein, I. Zehavi, D. W. Hogg, R. Scoccimarro, M. R. Blanton, R. C. Nichol et al., Detection of the Baryon Acoustic Peak in the Large-Scale Correlation Function of SDSS Luminous Red Galaxies, The Astrophysical Journal 633 (Nov., 2005) 560–574, [astro-ph/0501171].
-  G. Hütsi, Power spectrum of the SDSS luminous red galaxies: constraints on cosmological parameters, Astronomy and Astrophysics 459 (Nov., 2006) 375–389, [astro-ph/0604129].
-  N. Padmanabhan, D. J. Schlegel, U. Seljak, A. Makarov, N. A. Bahcall, M. R. Blanton et al., The clustering of luminous red galaxies in the Sloan Digital Sky Survey imaging data, Monthly Notices of the Royal Astronomical Society 378 (July, 2007) 852–872, [astro-ph/0605302].
-  C. Almeida, C. M. Baugh, D. A. Wake, C. G. Lacey, A. J. Benson, R. G. Bower et al., Luminous red galaxies in hierarchical cosmologies, Monthly Notices of the Royal Astronomical Society 386 (June, 2008) 2145–2160, [0710.3557].
-  P. Norberg, C. M. Baugh, E. Hawkins, S. Maddox, D. Madgwick, O. Lahav et al., The 2dF Galaxy Redshift Survey: the dependence of galaxy clustering on luminosity and spectral type, Monthly Notices of the Royal Astronomical Society 332 (June, 2002) 827–838, [astro-ph/0112043].
-  M. Cappellari and E. Emsellem, Parametric Recovery of Line-of-Sight Velocity Distributions from Absorption-Line Spectra of Galaxies via Penalized Likelihood, Publications of the ASP 116 (Feb., 2004) 138–147, [astro-ph/0312201].
-  M. Cappellari, Improving the full spectrum fitting method: accurate convolution with Gauss-Hermite functions, Monthly Notices of the Royal Astronomical Society 466 (Apr., 2017) 798–811, [1607.08538].
-  R. Johnston, Shedding light on the galaxy luminosity function, Astronomy and Astrophysics Reviews 19 (Aug., 2011) 41, [1106.2039].
-  R. Beare, M. J. I. Brown, K. Pimbblet, F. Bian and Y.-T. Lin, The z < 1.2 Optical Luminosity Function from a Sample of 410,000 Galaxies in Boötes, The Astrophysical Journal 815 (Dec., 2015) 94, [1511.01580].
-  P. Schechter, An analytic expression for the luminosity function for galaxies., The Astrophysical Journal 203 (Jan., 1976) 297–306.
-  N. Balakrishnan, Handbook of the Logistic Distribution. Statistics: A Series of Textbooks and Monographs. Taylor & Francis, 2013.
-  M. R. Blanton, D. J. Schlegel, M. A. Strauss, J. Brinkmann, D. Finkbeiner, M. Fukugita et al., New York University Value-Added Galaxy Catalog: A Galaxy Catalog Based on New Public Surveys, Astronomical Journal 129 (June, 2005) 2562–2578, [astro-ph/0410166].
-  M. R. Blanton and S. Roweis, K-Corrections and Filter Transformations in the Ultraviolet, Optical, and Near-Infrared, Astronomical Journal 133 (Feb., 2007) 734–754, [astro-ph/0606170].
-  G. Bruzual and S. Charlot, Stellar population synthesis at the resolution of 2003, Monthly Notices of the Royal Astronomical Society 344 (Oct., 2003) 1000–1028, [astro-ph/0309134].
-  J. Bergé, L. Gamper, A. Réfrégier and A. Amara, An Ultra Fast Image Generator (UFIG) for wide-field astronomy, Astronomy and Computing 1 (Feb., 2013) 23–32, [1209.1200].
-  C. Bruderer, C. Chang, A. Refregier, A. Amara, J. Bergé and L. Gamper, Calibrated Ultra Fast Image Simulations for the Dark Energy Survey, The Astrophysical Journal 817 (Jan., 2016) 25, [1504.02778].
-  C. Bonnett, M. A. Troxel, W. Hartley, A. Amara, B. Leistedt, M. R. Becker et al., Redshift distributions of galaxies in the Dark Energy Survey Science Verification shear catalogue and implications for weak lensing, Physical Review D 94 (Aug., 2016) 042005, [1507.05909].
-  B. Leistedt, H. V. Peiris, F. Elsner, A. Benoit-Lévy, A. Amara, A. H. Bauer et al., Mapping and Simulating Systematics due to Spatially Varying Observing Conditions in DES Science Verification Data, Astrophysical Journal, Supplement 226 (Oct., 2016) 24, [1507.05647].
-  F. D. Albareti, C. Allende Prieto, A. Almeida, F. Anders, S. Anderson, B. H. Andrews et al., The 13th Data Release of the Sloan Digital Sky Survey: First Spectroscopic Data from the SDSS-IV Survey Mapping Nearby Galaxies at Apache Point Observatory, Astrophysical Journal, Supplement 233 (Dec., 2017) 25, [1608.02013].
-  J. E. Gunn, W. A. Siegmund, E. J. Mannery, R. E. Owen, C. L. Hull, R. F. Leger et al., The 2.5 m Telescope of the Sloan Digital Sky Survey, Astronomical Journal 131 (Apr., 2006) 2332–2359, [astro-ph/0602326].
-  J. E. Gunn, M. Carr, C. Rockosi, M. Sekiguchi, K. Berry, B. Elms et al., The Sloan Digital Sky Survey Photometric Camera, Astronomical Journal 116 (Dec., 1998) 3040–3081, [astro-ph/9809085].
-  C. Stoughton, R. H. Lupton, M. Bernardi, M. R. Blanton, S. Burles, F. J. Castander et al., Sloan Digital Sky Survey: Early Data Release, Astronomical Journal 123 (Jan., 2002) 485–548.
-  V. Petrosian, Surface brightness and evolution of galaxies, Astrophysical Journal, Letters 209 (Oct., 1976) L1–L5.
-  M. R. Blanton, J. Dalcanton, D. Eisenstein, J. Loveday, M. A. Strauss, M. SubbaRao et al., The Luminosity Function of Galaxies in SDSS Commissioning Data, Astronomical Journal 121 (May, 2001) 2358–2380, [astro-ph/0012085].
-  N. Yasuda, M. Fukugita, V. K. Narayanan, R. H. Lupton, I. Strateva, M. A. Strauss et al., Galaxy Number Counts from the Sloan Digital Sky Survey Commissioning Data, Astronomical Journal 122 (Sept., 2001) 1104–1124, [astro-ph/0105545].
-  S. A. Smee, J. E. Gunn, A. Uomoto, N. Roe, D. Schlegel, C. M. Rockosi et al., The Multi-object, Fiber-fed Spectrographs for the Sloan Digital Sky Survey and the Baryon Oscillation Spectroscopic Survey, Astronomical Journal 146 (Aug., 2013) 32, [1208.2233].
-  A. S. Bolton, D. J. Schlegel, É. Aubourg, S. Bailey, V. Bhardwaj, J. R. Brownstein et al., Spectral Classification and Redshift Measurement for the SDSS-III Baryon Oscillation Spectroscopic Survey, Astronomical Journal 144 (Nov., 2012) 144, [1207.7326].
-  M. Fagioli, C. M. Carollo, A. Renzini, S. J. Lilly, M. Onodera and S. Tacchella, Minor Mergers or Progenitor Bias? The Stellar Ages of Small and Large Quenched Galaxies, The Astrophysical Journal 831 (Nov., 2016) 173, [1607.03493].
-  J. E. O’Donnell, $R_V$-dependent optical and near-ultraviolet extinction, The Astrophysical Journal 422 (Feb., 1994) 158–163.
-  D. J. Schlegel, D. P. Finkbeiner and M. Davis, Maps of Dust Infrared Emission for Use in Estimation of Reddening and Cosmic Microwave Background Radiation Foregrounds, The Astrophysical Journal 500 (June, 1998) 525–553, [astro-ph/9710327].
-  K. S. Dawson, D. J. Schlegel, C. P. Ahn, S. F. Anderson, É. Aubourg, S. Bailey et al., The Baryon Oscillation Spectroscopic Survey of SDSS-III, Astronomical Journal 145 (Jan., 2013) 10, [1208.0022].
-  D. J. Eisenstein, J. Annis, J. E. Gunn, A. S. Szalay, A. J. Connolly, R. C. Nichol et al., Spectroscopic Target Selection for the Sloan Digital Sky Survey: The Luminous Red Galaxy Sample, Astronomical Journal 122 (Nov., 2001) 2267–2280, [astro-ph/0108153].
-  R. Cannon, M. Drinkwater, A. Edge, D. Eisenstein, R. Nichol, P. Outram et al., The 2dF-SDSS LRG and QSO (2SLAQ) Luminous Red Galaxy Survey, Monthly Notices of the Royal Astronomical Society 372 (Oct., 2006) 425–442, [astro-ph/0607631].
-  D. Thomas, O. Steele, C. Maraston, J. Johansson, A. Beifiori, J. Pforr et al., Stellar velocity dispersions and emission line properties of SDSS-III/BOSS galaxies, Monthly Notices of the Royal Astronomical Society 431 (May, 2013) 1383–1397, [1207.6115].
-  B. Reid, S. Ho, N. Padmanabhan, W. J. Percival, J. Tinker, R. Tojeiro et al., SDSS-III Baryon Oscillation Spectroscopic Survey Data Release 12: galaxy target selection and large-scale structure catalogues, Monthly Notices of the Royal Astronomical Society 455 (Jan., 2016) 1553–1573, [1509.06529].
-  I. Appenzeller, High-Redshift Galaxies - Light from the Early Universe. 2009, 10.1007/978-3-540-75824-2.
-  D. R. Law, B. Cherinka, R. Yan, B. H. Andrews, M. A. Bershady, D. Bizyaev et al., The Data Reduction Pipeline for the SDSS-IV MaNGA IFU Galaxy Survey, Astronomical Journal 152 (Oct., 2016) 83, [1607.08619].
-  R. W. Hanuschik, A flux-calibrated, high-resolution atlas of optical sky emission from UVES, Astronomy and Astrophysics 407 (Sept., 2003) 1157–1164.
-  D. E. Osterbrock, J. P. Fulbright, A. R. Martel, M. J. Keane, S. C. Trager and G. Basri, Night-Sky High-Resolution Spectral Atlas of OH and O2 Emission Lines for Echelle Spectrograph Wavelength Calibration, Publications of the ASP 108 (Mar., 1996) 277.
-  S. Noll, W. Kausch, M. Barden, A. M. Jones, C. Szyszka, S. Kimeswenger et al., An atmospheric radiation model for Cerro Paranal. I. The optical spectral range, Astronomy and Astrophysics 543 (July, 2012) A92, [1205.2003].
-  A. Jones, S. Noll, W. Kausch, C. Szyszka and S. Kimeswenger, An advanced scattered moonlight model for Cerro Paranal, Astronomy and Astrophysics 560 (Dec., 2013) A91, [1310.7030].
-  I. T. Jolliffe and J. Cadima, Principal component analysis: a review and recent developments, Philosophical Transactions of the Royal Society of London A: Mathematical, Physical and Engineering Sciences 374 (2016) , [http://rsta.royalsocietypublishing.org/content/374/2065/20150202.full.pdf].
-  G. Efstathiou and S. M. Fall, Multivariate analysis of elliptical galaxies, Monthly Notices of the Royal Astronomical Society 206 (Jan., 1984) 453–464.
-  F. Murtagh and A. Heck, eds., Multivariate Data Analysis, vol. 131 of Astrophysics and Space Science Library, 1987. 10.1007/978-94-009-3789-5.
-  C. W. Yip, A. J. Connolly, D. E. Vanden Berk, Z. Ma, J. A. Frieman, M. SubbaRao et al., Spectral Classification of Quasars in the Sloan Digital Sky Survey: Eigenspectra, Redshift, and Luminosity Effects, Astronomical Journal 128 (Dec., 2004) 2603–2630, [astro-ph/0408578].
-  A. J. Connolly, A. S. Szalay, M. A. Bershady, A. L. Kinney and D. Calzetti, Spectral Classification of Galaxies: an Orthogonal Approach, Astronomical Journal 110 (Sept., 1995) 1071, [astro-ph/9411044].
Appendix A Sensitivity analysis for the redshift distribution
This paper is aimed at presenting Uspec and its capability to simulate realistic galaxy spectra. The PCA quantifies the differences between the real and simulated spectra. Here, however, we describe how the PCA can also be used as an additional constraint on the input redshift distribution.
Figures 11 and 12 show the PCA results when keeping all the galaxies that pass the selection criteria described in Section 3.3, i.e., without matching the redshift distributions. This brings the total number of galaxies analyzed to 2126. The left panel of Figure 11 shows the comparison between the five principal components of real and simulated spectra. The first two components are only sensitive to the overall shape of the spectra, and appear almost unchanged with respect to those in Figure 7. It is visible, however, that the position of the redshifted 4000 Å break has changed for the Uspec galaxies. This is especially evident in the first component. The effect of the redshift difference (reported in the main text in Section 3.4.2) is more evident in the higher-order components, where the spectral features of the Uspec sample are shifted towards bluer wavelengths, following the lower median redshift of the Uspec sample. The difference is reflected in the mixing matrix M, shown in the right panel of Figure 11. If we evaluate the distance metric based on the mixing matrix, we find a ratio of 0.827.
Compared with the value obtained in the main text (0.872), where the redshift distributions of real and simulated data are matched, this shows how the ratio can be used as a distance metric to constrain the input redshift distribution.
Figure 12 shows the eigencoefficients projected onto the SDSS principal components (left panel) and onto the Uspec principal components (right panel). The distributions of the higher-order Uspec coefficients are clearly different from those of SDSS, and also from those shown in the main text (Figures 9 and 10). This indicates that the coefficient distributions can be used as a distance metric in refining the input redshift distribution of the simulated galaxies.
Appendix B Impact of the sky model
As discussed in Section 4.2, the background sky is an important source of systematics in any astronomical observation. The impossibility of predicting the strength of sky lines can also have important effects on our PCA. However, the positions of the strong sky lines are easily predictable, as are their widths given the instrumental resolution R. Throughout our analysis, we masked the regions where the strongest sky lines are expected, at 5578.5 Å, 5894.6 Å, 6301.7 Å, 6364.5 Å and 7246 Å, with a FWHM of 15 Å for each sky line. Figure 13 shows what the PCA looks like when these sky lines are not masked. The redshift distributions are matched here. The first components appear mostly unchanged, except for the presence of the [O I] sky line at 5578.5 Å. This sky line completely dominates the higher-order principal components. This is also reflected in the mixing matrix shown in the right panel of Figure 13 and in the resulting value of the distance metric. As sky lines are not redshifted, they are not washed out like intrinsic galaxy emission lines when the PCA is performed in the observed frame. This shows the impact of strong sky emission lines on the real spectra and the importance of properly masking them.
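The masking step can be illustrated with a short sketch. The line positions and the 15 Å width are taken from the text; treating the 15 Å as the full width of a window centered on each line is an assumption, and the function name and wavelength grid are hypothetical.

```python
import numpy as np

# Sky line positions in Angstrom, as listed in the text.
SKY_LINES = np.array([5578.5, 5894.6, 6301.7, 6364.5, 7246.0])
# Assumption: the quoted 15 Angstrom is the full width of each masked window.
MASK_WIDTH = 15.0

def sky_line_mask(wavelength, lines=SKY_LINES, width=MASK_WIDTH):
    """Return a boolean array that is True for pixels to keep."""
    # Distance of every pixel to every sky line.
    distance = np.abs(wavelength[:, None] - lines[None, :])
    # Keep a pixel only if it lies outside the window of every line.
    return np.all(distance > width / 2.0, axis=1)

# Illustrative observed-frame wavelength grid with 1 Angstrom pixels.
wavelength = np.arange(5500.0, 7500.0, 1.0)
keep = sky_line_mask(wavelength)

# Mock spectra restricted to the unmasked pixels before running the PCA.
rng = np.random.default_rng(1)
mock_spectra = rng.normal(loc=1.0, scale=0.1, size=(10, wavelength.size))
masked_spectra = mock_spectra[:, keep]
```

Applying the same mask to both the real and the simulated spectra keeps the two PCAs defined on identical pixels, so that the scalar products entering the mixing matrix remain meaningful.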