Photo-zs and Model SEDs of BOSS CMASS Galaxies

Photometric Redshifts and Model Spectral Energy Distributions of Galaxies From the SDSS-III BOSS DR10 Data


We construct a set of model spectra specifically designed to match the colours of the BOSS CMASS galaxies and to be used with photometric redshift template fitting techniques. As a basis we use a set of spectral energy distributions (SEDs) of single and composite stellar population models. These models cannot describe well the whole colour range populated by the CMASS galaxies at all redshifts. We therefore modify them by multiplying the SEDs with for for different values of and . When fitting these SEDs to the colours of the CMASS sample, with burst and dust components in superposition, we can recreate the location in colour spaces inhabited by the CMASS galaxies. From the best fitting models we select a small subset in a two-dimensional plane, onto which the galaxies were mapped by a self-organizing map. These models are used for the estimation of photometric redshifts with a Bayesian template fitting code. The photometric redshifts with the novel templates have a very small outlier rate of , a low bias , and scatter of in the restframe. Using our models, the galaxy colours are reproduced to a better extent with the photometric redshifts of this work than with photometric redshifts of SDSS.

galaxies: distances and redshifts, galaxies: evolution, galaxies: fundamental parameters (colours)


1 Introduction

Spectroscopic surveys provide very precise measurements of the cosmological redshift, but they are time consuming and cannot be applied to fainter galaxies. Significantly larger volumes of the cosmos can be probed on shorter timescales with photometric surveys by observation through selected filter bands. Since spectral features cannot be resolved by medium- or broadband photometry, one has to apply statistical methods to derive the photometric redshift (photo-z). Redshift measurements are necessary in many cosmological contexts, whenever information about the redshift tomography or the distribution of galaxies as a function of redshift is required. The results of cosmological applications strongly depend on the photometric redshift accuracies. Large biases and scatter in the photo-zs can degrade any such study, e.g., the dark energy constraints from shear tomography (Ma, Hu & Huterer, 2006), or the baryonic acoustic oscillation scale (Benítez et al., 2009; Sánchez et al., 2011). However, statistical errors can be accounted for if they are well known (e.g., Huterer et al., 2006). Furthermore, compared with using a single photo-z estimate, including the probability density function (PDF) in the analysis of the photo-z uncertainties enhances the accuracy of cosmological measurements (e.g., Mandelbaum et al., 2008; Hildebrandt et al., 2012).
The techniques for photo-z estimation are commonly divided into two categories: template fitting and empirical methods. Empirical methods learn a relation between the photometric observables and the spectroscopic redshift (spec-z) of a training set of galaxies in order to apply that relation to objects without spectroscopic information (e.g., Collister & Lahav, 2004; Gerdes, 2009; Carrasco Kind & Brunner, 2013, 2014). These techniques have the great advantage that they take calibration errors explicitly into account, and that they can include photometric observables other than magnitudes and colours. Nevertheless, they strongly rely on the training sample, which has to be a good representation of the query galaxies to yield accurate photo-zs. Furthermore, empirical methods have to be trained anew when they are applied to different surveys, filter systems, or extracted magnitudes or fluxes. Also, many empirical methods do not take photometric measurement uncertainties into account.
In contrast to empirical methods, template fitting techniques essentially perform a maximum likelihood fit to the data (e.g., Arnouts et al., 1999; Benítez, 2000; Bolzonella, Miralles & Pelló, 2000; Ilbert et al., 2006; Feldmann et al., 2006; Brammer, van Dokkum & Coppi, 2008). A set of template spectral energy distributions (SEDs) is shifted to several redshift steps, where the SEDs are multiplied with the filter functions and integrated. The fluxes of the templates predicted in this way are then fitted to the photometry of the data. Therefore, template fitting techniques can be applied to data from any photometric system and, concurrently with the photo-z estimation, also provide restframe properties. In order to predict the photo-zs to a high precision, the data have to be well calibrated (or a catalogue with precise redshift measurements has to be available to re-calibrate the zeropoints). Furthermore, it is of paramount importance that the underlying model SEDs represent the data in question. In order to achieve that, some codes use combinations of model SEDs, or “repair” the templates (e.g., Csabai et al., 2003; Feldmann et al., 2006; Brammer, van Dokkum & Coppi, 2008). However, the underlying SEDs already have to be well selected if a modification of them is to succeed as a representation of the data. Moreover, if one introduces templates that do not match the data, the quality of the photometric redshifts deteriorates. Therefore, in this work we generate a model set designed to match the data in question.
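The template fitting loop described above can be illustrated with a minimal sketch. It is not the implementation of any of the cited codes: the function names, the top-hat style filter handling, the simple transmission-weighted mean flux, and the analytic fit of the template normalisation are all assumptions made for illustration.

```python
def trapz(x, y):
    """Trapezoidal integral of y over x."""
    return sum(0.5 * (y[i] + y[i + 1]) * (x[i + 1] - x[i]) for i in range(len(x) - 1))

def interp(x, xs, ys):
    """Linear interpolation; returns zero outside the tabulated range."""
    if x < xs[0] or x > xs[-1]:
        return 0.0
    for i in range(len(xs) - 1):
        if xs[i] <= x <= xs[i + 1]:
            w = (x - xs[i]) / (xs[i + 1] - xs[i])
            return ys[i] + w * (ys[i + 1] - ys[i])
    return ys[-1]

def band_flux(wave_rest, flux_rest, filt_wave, filt_trans, z):
    """Transmission-weighted mean flux of a redshifted template in one band.
    Overall (1 + z) factors are absorbed by the free normalisation below."""
    wave_obs = [w * (1.0 + z) for w in wave_rest]
    f = [interp(w, wave_obs, flux_rest) * t for w, t in zip(filt_wave, filt_trans)]
    return trapz(filt_wave, f) / trapz(filt_wave, filt_trans)

def chi2(model, data, err):
    """Chi-square with the template amplitude fitted analytically."""
    num = sum(m * d / e**2 for m, d, e in zip(model, data, err))
    den = sum(m * m / e**2 for m, e in zip(model, err))
    a = num / den
    return sum(((d - a * m) / e) ** 2 for m, d, e in zip(model, data, err))

def best_redshift(templates, filters, data, err, z_grid):
    """Return (chi2, template name, redshift) of the best template-redshift pair."""
    best = None
    for name, (wave, flux) in templates.items():
        for z in z_grid:
            model = [band_flux(wave, flux, fw, ft, z) for fw, ft in filters]
            c = chi2(model, data, err)
            if best is None or c < best[0]:
                best = (c, name, z)
    return best
```

Here `templates` maps a name to a pair of restframe wavelength and flux lists, and `filters` is a list of (wavelength, transmission) pairs on the observed-frame grid. A production code would integrate photon counts rather than a plain weighted mean and would keep the full likelihood over the redshift grid instead of only the minimum.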
In Greisel et al. (2013, hereafter G13) we created model SEDs for spectroscopically observed luminous red galaxies (LRGs) from the Sloan Digital Sky Survey data release 7 (SDSS-II, York et al., 2000; Eisenstein et al., 2001; Abazajian et al., 2009). These models were then used for the estimation of photo-zs and yielded accurate results. The LRG sample of SDSS-II included LRGs only up to redshift . In this paper we extend the work done in G13 and generate a set of model SEDs on the basis of the colours of the CMASS sample of the Baryon Oscillation Spectroscopic Survey (BOSS, Eisenstein et al., 2011; Dawson et al., 2013; Ahn et al., 2014). The templates are specifically designed and selected to produce accurate photometric redshifts with template fitting techniques.
This paper is organized as follows. We present the template fitting photometric redshift code used in this work in Sec. 2. In Sec. 3 we describe the data used in this work. We briefly explain the SED fitting routine in Sec. 4. After that, we go into detail about the generation of model SEDs and how we modify them in order to fit the data colours better. At the end of Sec. 4 we select a model set which is to be used with template fitting photometric redshift codes. In Sec. 5 we present the photometric redshift results of our new models and compare them with the two different photo-zs available in the SDSS database. We also compare the colours predicted by our models when fitted at the more accurate of the two SDSS photo-zs with those predicted at the redshifts of this work. Finally, we give a summary and conclusions in Sec. 6.
Throughout this paper we assume a ΛCDM cosmology with , and . Magnitudes are given in the AB system (Oke & Gunn, 1983).

2 Photometric Redshifts

For photometric redshift estimation in this paper we use the Bayesian template fitting code PhotoZ (Bender et al., 2001). Template fitting codes essentially determine the photometric redshift by performing a maximum likelihood fit of the predicted colours of a set of template SEDs to the observed galaxy colours at varying redshifts. The predicted colours are calculated by multiplying the SEDs with the survey filter functions and integrating. In order to lift degeneracies in the colours of model SEDs, Bayesian codes include the possibility to assign prior probabilities in redshift and luminosity to individual models. The resulting probability of a model-redshift combination then reads

where and are the colours and magnitudes of the photometric data, while denotes the absolute magnitude and the redshift . The second factor is the prior probability , with the PDFs of and for a template . In the case of the PhotoZ code, the prior functions follow

with . , , and can be chosen for each template separately, while is even. The individual photometric redshift value of a galaxy is determined by the mode of the distribution of the best fitting model. We estimate the error of the individual photo-z prediction as


Furthermore, we determine the stacked probability densities for all models in the template set, which we will use in Sec. 5 in the calculation of photo-z quality metrics.
The PhotoZ code has been applied in the past to a variety of photometric catalogues (Drory et al., 2001; Gabasch et al., 2004; Drory et al., 2005; Feulner et al., 2005; Brimioulle et al., 2008; Gabasch et al., 2008; Greisel et al., 2013; Brimioulle et al., 2013; Gruen et al., 2013, 2014; Sánchez et al., 2014). It is also part of the PS1 Photometric Classification Server (Saglia et al., 2012).
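The way a likelihood and a prior combine into a redshift probability distribution, as described in this section, can be sketched as follows. This is only an illustration under stated assumptions: the exact functional form of the prior (including the factor ln 2, which makes `width` the half width at half maximum), the second-moment error estimate, and all function and parameter names are hypothetical and are not taken from the PhotoZ code. The only property taken from the text is that the exponent is even and that the prior parameters can differ per template.

```python
import math

def prior(x, centre, width, power):
    """Flat-topped, bell-shaped prior in redshift or luminosity (assumed form).
    `power` must be even so that the prior is symmetric and bounded."""
    if power % 2:
        raise ValueError("power must be even")
    return math.exp(-math.log(2.0) * ((x - centre) / width) ** power)

def posterior(chi2_values, z_grid, centre, width, power):
    """Normalised P(z) proportional to exp(-chi2 / 2) * prior(z) on a regular grid."""
    p = [math.exp(-0.5 * c) * prior(z, centre, width, power)
         for c, z in zip(chi2_values, z_grid)]
    norm = sum(p) * (z_grid[1] - z_grid[0])
    return [v / norm for v in p]

def mode(pdf, z_grid):
    """Redshift at which the distribution peaks."""
    return z_grid[max(range(len(pdf)), key=pdf.__getitem__)]

def spread(pdf, z_grid, z_best):
    """Root of the second moment of P(z) about z_best (one possible error estimate)."""
    dz = z_grid[1] - z_grid[0]
    return math.sqrt(sum((z - z_best) ** 2 * p for z, p in zip(z_grid, pdf)) * dz)
```

Stacking the distributions of several galaxies then amounts to summing the normalised `pdf` lists element by element on a common redshift grid.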

3 The BOSS CMASS Sample

In this paper we use the spectroscopic data from the Baryon Oscillation Spectroscopic Survey (BOSS, Eisenstein et al., 2011; Dawson et al., 2013). The latest public data release of BOSS (DR10, Ahn et al., 2014) targets 14,555 square degrees of the sky, obtaining spectra for 1,848,851 galaxies. The two spectrographs used in BOSS are rebuilt from the original SDSS spectrographs (Smee et al., 2013) and cover a wavelength range of 3600 to 10,400 Å at a resolving power of 1560 to 2650. BOSS was designed to measure the spatial distribution of LRGs and quasars to investigate the baryon acoustic oscillations that are imprinted on the large scale structure of today’s universe. Galaxies surveyed by BOSS have redshifts up to . The target selection is discussed in Eisenstein et al. (2011). BOSS target galaxies are selected from SDSS-II imaging data in such a way that they have high luminosities and masses. Furthermore, the BOSS target selection requires approximately uniform stellar masses throughout the redshift range of . Above , the BOSS sample is magnitude limited. Similar to the cuts of the SDSS-II LRG sample (Eisenstein et al., 2001; Padmanabhan et al., 2005), a number of magnitude and colour cuts are applied to ensure the above requirements are fulfilled. The cuts are chosen so as to track the colours of a passively evolving galaxy model from Maraston et al. (2009, M09 hereafter). The BOSS sample is divided into two subsamples, a lower redshift sample tagged LOWZ at , and a higher redshift sample, dubbed CMASS for the constant mass requirement (Eisenstein et al., 2011).
The SDSS-III data can be acquired from the SDSS CasJobs website. For the selection of our sample we employ the SDSS clean photometry flag. This flag ensures that we do not have duplicates in our sample by removing multiple detections on different frames. Objects with deblending problems are removed as well, as are those where more than 20 % of the PSF flux is interpolated over (that is, less than 80 % of the flux is actually detected). In addition, we demand that objects were detected in the first pass (unbinned image), are not saturated, and that a radial profile could be constructed.
We furthermore require that galaxies in our sample have spectroscopic redshifts determined to a high accuracy. Therefore, we choose only objects for our catalogue whose spec-z warning flag is equal to zero. We want to create models specifically designed for galaxies at higher redshifts than in G13. Also, CMASS galaxies are very sparse at redshifts , which is why we reduce the sample to galaxies at . The resulting catalogue then contains CMASS galaxies. The CasJobs SQL query used to acquire this catalogue is given in App. B.

Figure 1: Upper panel: Spectroscopic redshift versus absolute magnitude in the band derived from a fit of G13 LRG models to the CMASS catalogue of this work. Lower panel: Normalized frequency in spectroscopic redshift of the catalogue.

Fig. 1 shows the redshift distribution of the CMASS galaxies selected in this way. In the upper panel we plot the density of the sample in versus absolute magnitude in the band. is derived by fitting the LRG models of G13 to the data at known spectroscopic redshifts. The lower panel of Fig. 1 presents the frequency in spec-z, normalised to an integral of one.

3.1 Colours of SED Templates versus Colours of BOSS Galaxies

Figure 2: In the upper panels of the four major panels the CMASS colours are plotted as a function of redshift in grey, where the median is shown by a black dashed line. The lower panels are normalised to the median colour and show the residuals. Error bars show the median data errors in five redshift bins. On top of that, the predicted colours of the M09 model are plotted for several ages and formation redshifts. The purple and blue lines show the M09 model at constant ages of and . Furthermore, the green and red lines show a passively evolving version of M09 at formation redshifts of and .

In this section we compare the predicted colours of model SEDs to the colours of the BOSS CMASS sample. In Fig. 2 we show the case of the M09 LRG model which is used in the definition of the colour cuts of the BOSS galaxy sample. This model was created by M09 by adding a mass fraction of of metal poor stars () to a single stellar population (SSP) model with solar metallicity from the Pickles (1998) stellar library in order to match the and colours of SDSS-II LRGs. We sample the M09 model at ages of and and predict the colours while redshifting the SED (keeping the age fixed). Additionally, we consider two passively evolving variations of the M09 model with formation redshifts of and . Since the M09 model is available only at distinct ages, we have to interpolate between them to determine the SEDs as a function of redshift. The passive evolution was computed with the EzGal software (Mancone & Gonzalez, 2012). M09 specifically created this model to match the colours of the SDSS-II LRGs, and it fits their median colours well (see also Fig. 5 in G13, or Figs. 1 and 2 in M09). This is also true for the colours of the CMASS sample, where we find that the model with fits the , , and colours best in the observed cases. In the case of the and colours there are deviations from the data for , but the sample size of the CMASS galaxies in this region also decreases significantly. However, the predicted colours in are too blue up to a redshift of and lie outside the median uncertainties of the data. Furthermore, the colour errors are too small to be responsible for the spread in colour. The spread in the colour values of the data is therefore due to the galaxy population, not the photometric uncertainties.
Therefore, the data cannot be matched by the M09 model simultaneously in all colours, not even when taking evolution effects into account. The spread in colour can be explained by differences in galaxy ages, but could also be due to different stellar populations, i.e., the metallicity and the distribution of stellar ages.
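For a passively evolving model, the age of the stellar population at a given redshift follows from the formation redshift and the adopted cosmology. A minimal sketch of this bookkeeping is given below, assuming a flat ΛCDM cosmology. The default parameter values (a Hubble constant of 70 km/s/Mpc and a matter density of 0.3) are illustrative assumptions and not necessarily those adopted in this paper; the function names are hypothetical and unrelated to EzGal.

```python
import math

H0_INV_GYR = 977.8  # 1 / (1 km/s/Mpc) expressed in Gyr

def cosmic_time(z, h0=70.0, omega_m=0.3):
    """Age of a flat LambdaCDM universe at redshift z in Gyr (analytic solution)."""
    omega_l = 1.0 - omega_m
    pref = 2.0 / (3.0 * math.sqrt(omega_l)) * H0_INV_GYR / h0
    return pref * math.asinh(math.sqrt(omega_l / omega_m) * (1.0 + z) ** -1.5)

def model_age(z, z_form, h0=70.0, omega_m=0.3):
    """Age in Gyr at redshift z of a population formed at redshift z_form."""
    age = cosmic_time(z, h0, omega_m) - cosmic_time(z_form, h0, omega_m)
    if age <= 0.0:
        raise ValueError("z must be smaller than the formation redshift")
    return age
```

The model SED at each redshift step would then be obtained by interpolating between the two tabulated model ages that bracket `model_age(z, z_form)`.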

4 New SED Templates

We aim to create a set of templates that can be used for photometric redshift estimation of galaxies with similar properties as the CMASS sample. To create models specifically designed to match the colours of the BOSS data, we fit a number of model SEDs to the data at their known spectroscopic redshift and select from the best fitting models a subset that should represent the data in terms of colours, while yielding accurate photometric redshifts.
We expect the galaxy population to vary as a function of redshift. In the process of photo-z estimation with template fitting we can account for that by assigning different redshift prior probabilities to individual model SEDs. One could be tempted to use a very large number of SEDs with different properties, expecting them to match the data in question and to yield reliable photo-z results (at the cost of heavily increased computation time). However, this does not work, since one has to deal with degeneracies in colours resulting from different galaxy properties, e.g., age and metallicity. Also, introducing peculiar templates can deteriorate the of a galaxy. Therefore, one has to carefully select a small set of templates able to match the galaxy catalogue in question. In order to create a set of models that match the data at different redshifts we fit a variety of model SEDs to the CMASS galaxies within four redshift bins. The bins are centred on (continuing the sequence from G13) with interval widths of .
In the following we first give a short introduction to the SED fitting procedure and then turn to the description of the models we used as a basis to construct new model SEDs. After that we explain how we select the models that represent the data and that then serve as a template set for photometric redshift estimation.

4.1 Generating Model SEDs for Objects in the BOSS Catalogue by SED Fitting

To fit model SEDs to the data we use the SED fitting software SEDfit (Drory, Bender & Hopp, 2004). The code fits a number of model SEDs to the colours of the data by performing a maximum likelihood fit. In addition to the models, one can define a burst model which is then fitted in superposition to the main component at several mass fractions. Furthermore, the SEDfit code applies dust extinction to both the main and the starburst component, following the extinction law of Calzetti et al. (2000). We perform the SED fitting procedure in such a way that signal-to-noise ratios smaller than are considered upper limits.
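The superposition of an attenuated main component and an attenuated burst can be sketched as below. The attenuation curve uses the published coefficients of Calzetti et al. (2000); everything else is an assumption for illustration and does not describe the internals of SEDfit: the function names, the use of separate extinction values for the two components, and the convention that both SEDs are normalised per unit stellar mass so that they can be combined with a burst mass fraction.

```python
R_V = 4.05  # total-to-selective extinction ratio of the Calzetti law

def calzetti_k(wave_um):
    """Calzetti et al. (2000) curve k(lambda) for 0.12 to 2.2 micron."""
    x = 1.0 / wave_um
    if 0.63 <= wave_um <= 2.2:
        return 2.659 * (-1.857 + 1.040 * x) + R_V
    if 0.12 <= wave_um < 0.63:
        return 2.659 * (-2.156 + 1.509 * x - 0.198 * x**2 + 0.011 * x**3) + R_V
    raise ValueError("wavelength outside the validity range of the curve")

def attenuate(flux, wave_um, a_v):
    """Apply dust attenuation of V-band extinction a_v (mag) to a flux list."""
    ebv = a_v / R_V
    return [f * 10 ** (-0.4 * calzetti_k(w) * ebv) for f, w in zip(flux, wave_um)]

def composite(main, burst, wave_um, burst_frac, av_main, av_burst):
    """Main component plus burst at mass fraction burst_frac, each attenuated."""
    m = attenuate(main, wave_um, av_main)
    b = attenuate(burst, wave_um, av_burst)
    return [(1.0 - burst_frac) * x + burst_frac * y for x, y in zip(m, b)]
```

A fit would evaluate `composite` on a grid of burst fractions and extinction values for every model and keep the combination with the highest likelihood.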
The 4000 Å break is the most significant feature in the spectra of red galaxies and the reason why we can estimate photo-zs from broad band photometry to a high accuracy. At redshifts populated by the CMASS galaxies the 4000 Å break lies within the band, so that the band is the bluest band needed to derive the position of the break in wavelength. The band is very shallow and it may deteriorate the quality of the SED fitting results and the photometric redshifts if the errors are not determined accurately. For these reasons we decide to omit it in the following SED fitting and photometric redshift estimation. Furthermore, because of the high computation time, the SED fits are not performed on all the data of a bin. Instead, we randomly select subsamples containing objects from within each bin.

SED Fitting Results with BC03 Models

We tried a variety of available model SEDs in the SED fitting procedure. The most extensive public libraries originate from Bruzual & Charlot (2003, hereafter dubbed BC03), Maraston (1998, 2005, hereafter M05), and Maraston & Strömbäck (2011, hereafter M11). We fit all of these models in separate trial runs in order to select a basic model set to proceed further, since large sets of models require enormous computation times.
The BC03 models can be generated by the software GALAXEV. We create synthetic models from the BaSeL 3.1 library, using the Padova 1994 evolutionary tracks recommended by BC03 and the initial mass function (IMF) by Chabrier (2003). We generate models with four different metallicities, , , , and . From these we produce SSP models, as well as composite stellar population (CSP) models. SSPs assume that all stars are formed instantaneously in a delta-function starburst at the birth of the galaxy (age zero) and evolve passively afterwards. To create CSPs one can assign essentially any function for the star formation history (SFH), such that star formation takes place for longer periods of time. Usually, a star formation rate (SFR) is assumed that is proportional to (at least for galaxies at lower redshifts ), where is the age of the galaxy and is the e-folding time scale (e.g., Shapley et al., 2005; Longhetti & Saracco, 2009). We produce models with different values, , where the latter simulate an almost constant SFR. We sample the models at ages ranging from to . In order to exploit the maximum freedom available, we also create model SEDs with increasing SFR, hence . This kind of SFH is mostly important for high(er) redshifts () and yields more physical results (e.g., Maraston et al., 2010; Monna et al., 2014), but was also used for SED fitting at lower redshifts (e.g., Pforr, Maraston & Tonini, 2012). In our case the increasing SFR models do not significantly increase the range in colour space needed to match the CMASS galaxies, which is why we omit them in the following to save computation time.
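The relation between an SSP and a CSP with an exponential star formation rate can be made concrete with a small numerical sketch. A CSP flux is the SFR-weighted integral over SSP fluxes of the corresponding stellar ages. The code below assumes this standard convolution, evaluated with a simple midpoint rule, and normalises by the mass formed up to the model age; the function names and the toy SSP used in the usage note are hypothetical, and this is not how GALAXEV or EzGal are implemented.

```python
import math

def sfr(t, tau):
    """Unnormalised SFR at time t after the onset of star formation.
    tau > 0 gives a declining SFR, tau < 0 an increasing one."""
    return math.exp(-t / tau)

def csp_flux(ssp_flux, age, tau, steps=2000):
    """Flux per unit formed mass of a CSP of the given age.
    ssp_flux(a) returns the SSP flux per unit mass at stellar age a."""
    dt = age / steps
    num = 0.0
    den = 0.0
    for i in range(steps):
        t = (i + 0.5) * dt          # time since onset of star formation
        w = sfr(t, tau) * dt        # mass formed in this time step
        num += w * ssp_flux(age - t)  # those stars now have age (age - t)
        den += w
    return num / den
```

For example, with a toy SSP that simply fades with age, `csp_flux(lambda a: math.exp(-a), 5.0, 0.1)` is close to the SSP value at age 5, while larger or negative values of `tau` give brighter models because more of the stars are young.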
We create the same variety of CSPs for the M05 and M11 models (at least where possible) using the stellar population synthesis code EzGal (Mancone & Gonzalez, 2012), and fit them to the data as well. We show the distribution in likelihoods in Fig. 29 in App. A for the SED fits of BC03, M05, and M11 models to the CMASS data in the four redshift bins. The BC03 models outperform the M05 and M11 models in terms of their values marginally at most. Fig. 29 shows that the fits with BC03 models have a slightly higher frequency of lower values than M05/11. Furthermore, considering Fig. 20 in G13, the BC03 models are a better match to the SDSS-II LRG data in terms of values (this is not so evident here, as seen in Fig. 29). Because of the versatility in the model creation with GALAXEV, and the variety of provided ages and metallicities, we choose to use the BC03 models in the remainder of this work.

Figure 3: Colour-colour plots for CMASS galaxies at . The BOSS data is shown by grey shades and accompanying grey contours. Error bars denote the median error in colour in this redshift bin. The density of the resulting colours of the best fitting unmodified BC03 models is shown by coloured contours. Both sets of contours are drawn at the same frequency levels, , , , and .
Figure 4: Colour-colour plots for CMASS galaxies and best fitting BC03 models at . See Fig. 3 for a detailed description.
Figure 5: Colour-colour plots for CMASS galaxies and best fitting BC03 models at . See Fig. 3 for a detailed description.
Figure 6: Colour-colour plots for CMASS galaxies and best fitting BC03 models at . See Fig. 3 for a detailed description. The contours had to be smoothed to make them visible due to the small sample size in this redshift bin.

In Figs. 3 to 6 we present the SED fitting results with BC03 SSP and CSP models with increasing and decreasing SFRs in colour space. The data colour is plotted in grey scales with corresponding contours, and the median colour errors are shown by error bars in each panel. The density of the predicted colours of the best fitting models is plotted over the data distribution in coloured contours. Both sets of contours are plotted at the same frequency levels.
The bluer colour range populated by the CMASS galaxies in the redshift bins at and is not populated by the best fitting BC03 models. Additionally, the redder colours at are also not matched by the BC03 models. None of these offsets in colour can be accounted for by the photometric errors, which are also given in Figs. 3 to 6. We investigate the colour mismatch in the next section and modify the model SEDs to fit the data better.

Modification of the Red Continuum Slope

In G13 we already showed in Fig. 19 that the predicted colour of the SDSS-II LRG sample cannot be matched by the models for the highest redshift bin , at least not while the other three colours are fitted simultaneously. The model colours were too red in comparison to the data, which means that the decrease in the continuum slope redwards of the 4000 Å break of the model SEDs is not strong enough. The mismatch in is also present for the CMASS sample, which is located at even higher redshifts than the SDSS-II LRGs.
The slope of the continuum is changed by variations in the physical properties of the models, which we investigate in this paragraph. It is unlikely that the choice of the IMF could affect the model SEDs in such a way as to produce the colour mismatch we observe. Changing the Chabrier IMF to a Salpeter (1955) or Kroupa (2001) IMF would only result in a change of the abundance of low mass stars. These should not have a great impact on the continuum slope, since the light in this part of the spectrum is dominated by red giants. Optical colours are not sensitive to the choice of the IMF, which is (when stars are not resolved) often derived from spectroscopy or IR photometry (e.g., Cenarro et al., 2003; Meidt et al., 2012; Conroy & van Dokkum, 2012). Hence, the choice of the particular IMF cannot cause the observed mismatch in colours. Different metallicities change the slope in the continuum as well, but we have considered sub- to super-solar values. Furthermore, we have explored the model age and extinction as further parameters up to still reasonable values, i.e., and . The burst component significantly affects only the bluer parts of the spectrum and has only marginal impact on the red part, which is dominated by the main stellar population, and not by the small (in total mass) burst fractions. Lastly, the SFH has to be considered. We analysed CSPs with nearly constant SFR and SSPs which create all their stars at one instant in time. We even considered exponentially increasing SFRs. If the too shallow red continuum slope were a result of a poorly chosen SFH, we would at least expect the data to be bracketed by the considered models in colour space.
Since we cannot isolate physical model parameters that cause the colour mismatch, we can only assume that the issue arises due to inaccuracies in the modelling of the stellar evolution phase most difficult to follow theoretically, i.e., the asymptotic giant branch (AGB). The wavelength range covered by the colour mismatch hints at an incorrect modelling of AGB stars, since these dominate the SED in the red optical and NIR parts of the spectrum. For example, Zibetti et al. (2013) show through NIR spectroscopy (comparing BC03 and M05 models in their Fig. 1) that the chemical composition influences the slope of the model SEDs in the NIR, i.e., that these differences are caused by different C/O abundances. Furthermore, the different composition of dust in the circumstellar envelopes of carbon and oxygen-rich AGB stars affects the efficiency of dust absorption and emission, which can therefore greatly influence the colours (e.g., Marigo et al., 2008; Cassarà et al., 2013; Salaris et al., 2014). Other explanations could be that the simple mixing-length theory models describing the convection in stars in the AGB phase are insufficient to reproduce the colours, or that incorrect assumptions on mass loss lead to the deviation.
A close investigation of the modelling of the AGB phase is needed to resolve the problem, which is beyond the scope of this paper. However, we explain in the next paragraph how we modify the red continuum slope to better match the data.
The red SED continuum follows a function proportional to , heavily modified by absorption lines. Therefore, we can also change the steepness of the continuum by varying . We do this by multiplying the SED redwards of a wavelength with , where is chosen such that the fluxes of the underlying model SED and the modified SED coincide at . In this way we can change the continuum slope for , so that the created SED on average follows . We will term the models modified in this way “ models” in the remainder of the text.
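The slope modification described above can be sketched in a few lines. The sketch assumes that the multiplicative factor is a power law in wavelength whose normalisation is fixed by requiring the original and the modified SED to coincide at the pivot wavelength; the names `wave_i` (pivot wavelength) and `beta` (additional power-law exponent) are hypothetical labels introduced here for illustration.

```python
def modify_red_slope(wave, flux, wave_i, beta):
    """Steepen (beta > 0) or flatten (beta < 0) an SED redwards of wave_i.

    Redwards of the pivot the flux is multiplied by a * wave**(-beta) with
    a = wave_i**beta, written here as (wave / wave_i)**(-beta) so that the
    SED is continuous at wave_i. Bluewards of the pivot it is unchanged.
    """
    return [f * (w / wave_i) ** (-beta) if w > wave_i else f
            for w, f in zip(wave, flux)]
```

Calling the function for every combination of a small grid of pivot wavelengths and exponents turns each base model into a family of modified SEDs that can be passed to the fit together with the originals.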

Figure 7: The black line is an LRG model SED from G13. The dotted blue, green, and red lines show the function with for . The solid lines are the above SED multiplied by , also for . In grey (dark grey dashed line, and light grey solid line) the SDSS filter curves are shown for a galaxy at , and respectively.

As an example, we show in Fig. 7 the resulting SEDs of an LRG model from G13 when modified by for and . We also plot the SDSS filters as positioned in wavelength in the observed frame at and , the approximate extreme redshift values of the CMASS sample. We can see that for the change in slope mostly affects the and band at lower redshifts. For higher redshifts has moved into the band. Therefore, we create variations of the BC03 SSP and the CSP models with decreasing SFR explained above, with , such that the variation affects the fluxes in different filters for similar redshifts. Furthermore, the values span a range from to with a step size of . Larger ranges for values of and/or do not improve the results further, since the location in colour spaces they would inhabit is already covered by the other models or not populated by the CMASS galaxies.

Figure 8: Colour-colour plots of the CMASS data and model SEDs. The BOSS data is split into four equally sized redshift bins. From these bins we calculate the mean colour and plot it as points in the panel. The error bars denote the accompanying root mean square values of the colour in each bin. The points and the error bars are colour-coded with redshift (see the colour bar on the right, where white lines indicate the mean redshift in each bin). The red shaded track shows the colours of the model SED of Fig. 7 with varying redshift, where the crosses are at the same as the data. Again, the redshift of the track is colour coded and shown in the colour bar. The green and blue shaded tracks originate from the same model, when multiplied by with for . A more detailed description is given in the text.

In Fig. 8 we show the colours of the CMASS data (grey). We furthermore plot the colours of an exemplary SED from G13 (red), and the colours of the same model when modified by , for and (green and blue). The data in Fig. 8 is split into four redshift bins, and we plot the mean colour and corresponding root mean square values of each bin. The points are colour-coded in such a way that they darken with increasing redshift, which is shown by the grey colour bar on the right of Fig. 8, where white lines denote the mean redshifts within the bins. On top of the data, the colour tracks as functions of redshift are plotted for the models. The model colours at the values of the data are highlighted by crosses, and the tracks in the colour spaces are again colour-coded as a function of redshift. The track of the original model is plotted by varying intensities of red, whereas those of the corresponding models are plotted in green shades for , and blue shades for respectively. All plotted models have . While the data colours (grey) are well matched by the original model (red track) at lower redshifts, the deviations from the mean colour increase with increasing . The modification by produces a better agreement with the mean data colours. The value of necessary to fit the data best is somewhat ambiguous in this plot, and has to be decided individually for each galaxy together with the best fitting model.

SED Fitting Results with modified BC03 Models

Figure 9: Colour-colour plots for CMASS galaxies at (analogous to Fig. 6). The \glsBOSS data is shown by grey shades and accompanying grey contours. Error bars denote the median error in colour in this redshift bin. The density of the resulting colours of the best fitting BC03 and BC03  models are shown by coloured contours. Both sets of contours are drawn at the same frequency levels, , , , and .
Figure 10: Colour-colour plots for CMASS galaxies and best fitting BC03  models at (analogous to Fig. 4). See Fig. 9 for a detailed explanation of the plot.
Figure 11: Colour-colour plots for CMASS galaxies and best fitting BC03  models at (analogous to Fig. 5). See Fig. 9 for a detailed explanation of the plot.
Figure 12: Colour-colour plots for CMASS galaxies and best fitting BC03  models at (analogous to Fig. 6). See Fig. 9 for a detailed explanation of the plot. The contours had to be smoothed to make them visible due to the small sample size in this redshift bin. This is done in the same way as in Fig. 6.

We now introduce the modified versions of the BC03 models together with the original \glsplSSP and \glsplCSP with decreasing \glsplSFH from Sec. 4.1.1 in the SEDfit code with the same fitting parameters as in Sec. 4.1.1. Figs. 9 through 12 show the \glsSED fitting results in colour spaces of the CMASS data in the four bins. We see that the contours of the data and the best fitting models (which are located at the same steps in frequency) almost coincide in Figs. 9, 10, and 11.

Comparing Figs. 3 to 6 with Figs. 9 to 12, we can observe an improvement in the match between the locations in colour space populated by the best fitting models and the data. This is additionally confirmed by Fig. 29 in App. A, where we see that the resulting values of the fits are more frequently found at lower values for the BC03  models than for the original ones, hence improving the goodness of fit. Here, we want to specifically analyse the offsets in the , , , and colours of the best fitting BC03 models and their modified variations from the data. We show in Figs. 13 to 16 the deviations of the model colours, predicted by the best fitting \glsplSED, from the data, , for both setups and in all four considered bins. The distribution in of the original BC03 models (\glsplSSP, as well as decreasing and increasing \glsSFR \glsplCSP) is presented by red histograms, where the Gaussian curve best fitting the histogram is shown by a dashed orange line. The colour offsets of the BC03  models are plotted as a grey histogram, with the corresponding Gaussian drawn in black. is shown by a dashed black line, and we print the parameter values of the Gaussians in the plots.
While the mean deviations presented in Figs. 13 to 16 are more or less the same in all considered redshift bins, the standard deviation of is about to times higher for the original BC03 \glsplSED. Furthermore, the BC03  distributions of resemble a Gaussian much more closely than that of the original BC03, especially in the  and  colours, but also in . We already pointed out that the  colour is overestimated by the BC03 models analysed in G13 for . This is still true for to , observable in Figs. 13 to 15 (cf. Figs. 9 to 11). Concurrently, the predicted  colours of the unmodified BC03 models are too blue up to , and too red in . In contrast to that, the colours of the BC03  \glsplSED yield very symmetric distributions, although with increased flanks in comparison to a Gaussian for higher redshifts.

In summary, we investigated two additional degrees of freedom, apart from the model \glsSED parameters metallicity, \glsSFH, age, burst, and dust, by modification of the red continuum slope through multiplying with at to match the colours of the CMASS galaxies to a better extent. When fitted to the data, the  \glsplSED predict colours that deviate less from the data than the unmodified models they originate from, and yield lower values. We will use the variety of best fitting \glsplSED (including the additional burst and dust components) as a basis to select from when we define a model set for photometric redshift template fitting codes.

Figure 13: Deviations in , ,  and colours for BC03 models and BC03  models from the data within . The distribution for the BC03  models are plotted in grey, and the best fitting Gaussian is indicated by a solid black line. The same distribution, but for unmodified BC03 models is given by a dark red line, where we plot the best fitting Gaussian by a dashed orange line. The black dashed vertical line highlights . The best fitting parameters of the Gaussian curves are given in the plots.
Figure 14: Deviations in , ,  and colours for BC03 models and BC03  models from the data within . A detailed description of the plot is given in Fig. 13.
Figure 15: Deviations in , ,  and colours for BC03 models and BC03  models from the data within . A detailed description of the plot is given in Fig. 13.
Figure 16: Deviations in , ,  and colours for BC03 models and BC03  models from the data within . A detailed description of the plot is given in Fig. 13.

4.2 Selection of Best Fitting SEDs for the New Template Set

Figure 17: \glsSOM with bins for , , , and for galaxies within , where the parameter values are colour-encoded (see colourbars on the right of each panel). is computed by fitting templates from G13 to the spectroscopic redshift. Contours show the number density of the data in the map (note that the contours are smoothed to improve the clarity of the figure) and are drawn at six equally distributed frequency levels from (black) to (white).

We want to select a set of model \glsplSED from the best fitting models of the previous section that represent the data in terms of colours for each redshift bin. The space we can construct from the \glsSDSS colours is many-dimensional, and we want to reduce the dimensions to simplify the selection process. We therefore decided that the target space should have two dimensions, a compromise between the loss of information (which is greater for fewer dimensions) and the simplification gained through the reduction of dimensions. We could, in principle, perform a \glsPCA, and reduce the dimensions by concentrating on the space which is spanned by the first two eigenvectors of the \glsPCA that have the highest variance. Another possibility for the reduction of dimensions is a self-organising map (SOM or Kohonen-map, Kohonen, 1982, 2001). A \glsSOM is an artificial neural network (ANN) that provides a discrete representation of a set of higher dimensional data values in a lower dimensional space (most often two-dimensional). The network is trained using unsupervised learning to map the data onto the surface in such a way that data points with similar properties (i.e., data values) are located in close neighbourhoods. Unlike a \glsPCA, where neglecting the third and later components yields a complete loss of the information carried by them, the two-dimensional \glsSOM still retains this information in its points. This is why we chose a \glsSOM over a \glsPCA for the mapping onto a two-dimensional plane in the model selection below.
We create maps of the galaxy catalogues from the , , and  colours of the four considered redshift bins. Furthermore, we can easily also introduce the absolute magnitudes of the galaxies as a fourth quantity to be mapped, since can also hold information about the colour (e.g., Baldry et al., 2004). We have to take care of the errors in the data by normalising the colours to their mean value and dividing by the colour errors determined by the uncertainties in the photometry. The absolute magnitudes of the galaxies are calculated by fitting the \glsLRG model \glsplSED of G13 to the data at their spectroscopic redshifts. From the best fitting models we extract the absolute magnitude in the \glsSDSS band, which has to be normalised to a standard normal distribution to be comparable to the colour values. Once this is done we train the \glsSOM and create a surface with discrete - and -values with assigned input values (i.e., colours and ). The positions of data points in the \glsSOM are then determined by performing a nearest neighbour search.
We use a \glsSOM algorithm implemented in Python and provided in the PyMVPA package (Hanke et al., 2009). Fig. 17 presents the \glsSOM trained on the catalogue with on the , , and  colours and on . The four panels are representations of the same \glsSOM, but with the values of the four mapped parameters , , and  colours and encoded in colour. The density of the mapped underlying data is shown by contours. We can see from Fig. 17 that the extreme values of  and  have the greatest separation in the map, which is due to these colours having the largest spread in values. This is not a result of measurement errors, since we took these into account through the normalisation of the colours explained in the last paragraph. We can see for example that the dense region in the middle of the panels corresponds to a higher concentration of the data at the respective colour values. These are , , and , which also represent the areas of highest density in the colour-colour plots of Fig. 9.
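To illustrate the mapping step described above, the following is a minimal pure-Python sketch of a rectangular \glsSOM with a Gaussian neighbourhood and a nearest neighbour search for the position of a data point. It is not the PyMVPA implementation used in this work; the function names, the linear decay of learning rate and radius, and all default values are assumptions made for the example only.

```python
import math
import random


def best_matching_unit(weights, x):
    """Nearest-neighbour search: grid node whose weight vector is closest to x."""
    return min(weights,
               key=lambda node: sum((w - v) ** 2
                                    for w, v in zip(weights[node], x)))


def train_som(data, rows, cols, n_iter=2000, lr0=0.5, seed=0):
    """Train a rows x cols SOM on a list of equal-length feature vectors.

    All defaults are illustrative assumptions, not values from the paper.
    """
    rng = random.Random(seed)
    # Initialise every node with a copy of a randomly drawn data vector.
    weights = {(r, c): list(rng.choice(data))
               for r in range(rows) for c in range(cols)}
    radius0 = max(rows, cols) / 2.0
    for t in range(n_iter):
        decay = 1.0 - t / n_iter
        lr = lr0 * decay                    # learning rate decays linearly
        radius = max(radius0 * decay, 0.5)  # neighbourhood shrinks with time
        x = rng.choice(data)
        br, bc = best_matching_unit(weights, x)
        for (r, c), w in weights.items():
            grid_d2 = (r - br) ** 2 + (c - bc) ** 2
            h = math.exp(-grid_d2 / (2.0 * radius ** 2))  # Gaussian neighbourhood
            for k in range(len(w)):
                w[k] += lr * h * (x[k] - w[k])
    return weights
```

In such a sketch each input vector would hold the normalised colours and the normalised absolute magnitude of one galaxy, and `best_matching_unit` returns its cell in the two-dimensional map.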

As previously mentioned, we aim to select a sample of model \glsplSED from the best fitting models of the last section, where we want to take these density variations in the population of the data into account. Therefore, we want to identify clusters in the mapped data to select a model from each cluster that should represent the galaxies within the same cluster cell in terms of colour and absolute band magnitude. To perform the cluster search we employ a k-means clustering algorithm (Steinhaus, 1957) that partitions the two-dimensional maps into cluster cells. Each cell is a Voronoi bin (Voronoi, 1908), where two bins or clusters are separated from one another by a border orthogonal to the line connecting the cluster centres. Every data point belongs to the cluster with the nearest centre. Specifically, we use the Python k-means clustering algorithm included in the scikit-learn package (Pedregosa et al., 2011).
Fig. 18 presents the bins of the k-means clustering algorithm with six clusters applied to the \glsSOM of Fig. 17, where the cluster centres are indicated by black crosses. The density of the data in the \glsSOM is shown by contours in the plot. The data exhibit a number of clusters in all redshift bins which does not exceed six, which is why six clusters were chosen.
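The partition into Voronoi bins can be sketched with a few lines of Lloyd's algorithm. This is a simplified pure-Python stand-in for the scikit-learn routine used in this work, written only to show that every point ends up in the cell of its nearest centre; the function names and the random initialisation are assumptions of the example.

```python
import random


def nearest(centres, p):
    """Index of the centre closest to point p (squared Euclidean distance)."""
    return min(range(len(centres)),
               key=lambda j: sum((a - b) ** 2 for a, b in zip(centres[j], p)))


def kmeans(points, k, n_iter=100, seed=0):
    """Lloyd's algorithm: returns (centres, labels) for a list of points."""
    rng = random.Random(seed)
    # Start from k distinct data points (an assumption of this sketch).
    centres = [list(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(n_iter):
        new_labels = [nearest(centres, p) for p in points]
        if new_labels == labels:  # assignments stable: converged
            break
        labels = new_labels
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old centre if a cell is empty
                centres[j] = [sum(col) / len(members) for col in zip(*members)]
    return centres, labels
```

Applied to the map coordinates of the galaxies with `k=6`, the returned labels would correspond to the cluster cells from which the models are selected.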
We want to select a set of \glsplSED from the best fitting model \glsplSED which we will use in the following as template set for \glsplphotoz. Therefore, for each cluster bin we take all models into account whose corresponding data points also lie in the same bin. Then, for each of these models separately, we estimate photometric redshifts on all objects within the cell and determine resulting quality parameters, i.e., mean error, scatter and catastrophic failures of the \glsplphotoz. For each cluster we choose the model that yields the best \glsphotoz results in terms of these parameters. Thereby, we have six models for each of the four redshift bins, hence 24 models in total. The \glsplSED selected in this way will be used in the following to estimate \glsplphotoz.

Figure 18: k-means clustering of the \glsSOM for from Fig. 17. The Voronoi bins are highlighted by different colours, and cluster centres are marked by black crosses. Contours show the number frequency of the data (note that the contours are smoothed to increase the clarity of the figure) and are drawn at intervals between (black) and (white).

5 Photometric Redshifts

In this section we analyse the photometric redshift results we get using our novel templates in combination with the PhotoZ code on the whole CMASS sample detailed in Sec. 3. Afterwards, we compare our \glsplphotoz with the photometric redshifts provided by the \glsSDSS database.

5.1 Photometric Redshifts with the Novel Template SEDs

To introduce the models created in the previous section into the PhotoZ code, we define the prior such that of a model \glsSED is the central value of the bin of the catalogue from which the model originates, wherefore . As , we set a default value of which leads to a prior function wide enough to avoid focusing effects at certain redshift bin centres, while ensuring a smooth transition between them. The resolution in redshift of a \glsphotoz run is in the range of . The allowed redshift range is much wider than that which is populated by the CMASS galaxies, such that we can analyse if the \glsphotoz accuracy is diminished by values that are highly over- or underestimated. This is done because we would like to be able to run \glsphotoz codes with the new models on catalogues with galaxies from larger redshift ranges and with more variations in \glsSED type. We therefore want to make sure that small errors in are not due to a restriction in the redshift range.
To improve the priors by adapting them iteratively, we analyse the outcome for subsamples of objects which are fitted best by a specific model. Thereby we can adjust the redshift and luminosity priors for each model \glsSED in order to reduce outliers and bias. Essentially, we decrease the value of whenever a model yields lower accuracies for redshifts further away from its bin centre. Furthermore, we allow to vary if the \glsphotoz performance of a specific model can be enhanced. If we observe that a specific template provides very bad redshifts which cannot be resolved by adjusting the respective prior, we omit this template completely in the following runs. This is mostly the case for models created from the highest two redshift bins. Since we chose the models only on account of their \glsphotoz performance on a redshift bin, they might still yield a bad estimate in redshift ranges outside the bin. The resulting model set then consists of nine \glsplSED with adapted redshift priors. The luminosity priors were set initially to , , and , to allow for a wide range of higher luminosities. The high exponent leads to a very flat functional behaviour within , and to steep decreases in at . Adjusting in the iteration is not necessary, since we cannot detect outliers which could be avoided through a different luminosity prior.
The model \glsplSED are shown in Fig. 19, where we plot them in the wavelength range covered by \glsSDSS at the redshifts of our catalogue . In the lower panel we present the redshift prior parameters , and which correspond to the model \glsplSED. Fig. 20 shows the colours of the nine models as a function of redshift. With the nine models we can account for the large spread of the data in most cases. Furthermore, we cover also the bluer parts in , , and  (cf. Fig. 2). For lower redshifts, our models produce colours that cover only the bluer  and  ranges of the data. This is not because the BC03  do not fit the colours of the data (cf. Figs. 9 to 12), but accidental, since the models were selected (from within their cluster bins) on account of their \glsphotoz performance in Sec. 4.2. The model SEDs with designations to cover only peculiar blue colours of the data. These models were created on the basis of the higher samples, which is why they match the data better at higher redshifts (especially the  colour). The bluer colours of the models can only be observed from a small number of galaxies. Therefore, when all models are fitted to the spectroscopic redshift (but also in the \glsphotoz estimation below), the models - are best fitting only for to of galaxies.
The physical parameters and the and values of the nine model \glsplSED are summarised in Tab. 1. They explain the behaviour of the \glsplSED. From Fig. 19 we see that the model \glsplSED roughly follow a trend and become bluer with increasing redshift (except for the red coloured \glsSED with ), which is mirrored in the values of Tab. 1. The e-folding time scale roughly increases, as well as the burst fraction, making the resulting \glsplSED bluer on average. The red highlighted \glsSED has lower fluxes than the orange and the yellow-green \glsplSED in the UV part of the spectrum, not continuing the sequence. It is redder because of the high extinction values and because of its (relatively) high metallicity, .
In summary, although we can see a qualitative trend in the \glsplSED as a function of redshift prior (which originates from the spectroscopic redshifts of the underlying bin with small adaptions), the trends in the physical parameters are not that evident. This is because they are degenerate and changes in one parameter can yield similar results in the SEDs as a variation in another parameter (e.g., the well-known age-metallicity degeneracy).

Figure 19: Upper panel: The nine surviving model \glsplSED within the range of the \glsSDSS filter system at redshifts (the range of our galaxy sample). Lower panel: Corresponding prior parameters, and . The colour code is the same as in the upper panel.
Figure 20: In the upper panels of the four major panels the CMASS colours are plotted as a function of redshift in grey, and the median is shown by a black dashed line. The lower panels are normalised to the median colour and show the residuals. Error bars present the median data errors in five redshift bins. On top of that the predicted colours of the nine selected models are drawn. The colour code matches that of Fig. 19.
model # age burst
1 (black) 0.008 SSP 2.0 1.2 5000 2.0
2 (violet) 0.004 SSP 6.0 0.6 1.0 % 2.0 5500 1.5
3 (blue) 0.008 1.0 6.0 1.0 1.0 % 0.0 5000 1.5
4 (turquoise) 0.05 SSP 8.0 0.7 1.0 % 0.0 3500 2.0
5 (dark green) 0.02 SSP 4.0 0.0 1.0 % 1.0 5000 1.5
6 (green) 0.05 SSP 3.0 1.3 2.0 % 0.0 3500 2.0
7 (yellow-green) 0.05 3.0 5.0 1.4 1.0 % 0.0 5000 2.0
8 (orange) 0.004 50.0 4.0 2.7 1.0 % 0.0 3000 1.5
9 (red) 0.02 3.0 4.0 2.2 2.0 % 2.0 4500 2.0
Table 1: Physical parameters of the nine surviving templates. The first column gives the numbering of the models and the plot colour according to Figs. 19 and 20. The column “burst” is the mass fraction of the burst, whereas is the burst extinction.

Before analysing the \glsphotoz performance of the new models, we want to introduce several metrics which we use to assess the photometric redshift quality. We decide to provide a large number of metrics to enable the reader to compare with other publications. The photometric redshift error is , and in the rest frame. Catastrophic outliers are defined such that (cf. Ilbert et al., 2006). The mean errors are characterized by the bias and the mean absolute error , as well as by their corresponding values in the rest frame, and . The root of the sample variance is denoted , and is the half of the width of the distribution where of the sample is located, corresponding to a confidence interval. Finally, in terms of scatter we also calculate the normalised median absolute deviation (Ilbert et al., 2006) which is calculated for non-outliers only, and gives a clue about the width of the distribution without regarding the flanks. We calculate the fractions , of galaxies within , where (cf. Carrasco Kind & Brunner, 2014). If behaved as a perfect Gaussian, then and would be the and confidence intervals and therefore hold and of the objects. But the distribution of can be non-Gaussian and still yield reasonable values for and (compare Figs. 22 and 27 later on).
To evaluate the precision of the \glsphotoz errors estimated by the code (Eq. 1) from the \glsPDF (cf. Sec. 2), we introduce . If the errors are estimated correctly, the distribution of should resemble a standard normal distribution (cf. Sánchez et al., 2014). We will therefore analyse the values of the mean and the standard deviation of the distribution of .
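The point-estimate metrics introduced above can be summarised in a short sketch. The threshold value and the prefactor of the normalised median absolute deviation are not reproduced in this text; the defaults below (an outlier cut of 0.15 in the rest frame error and a factor of 1.48) follow a common convention in the literature and are assumptions of this example, as are the function and key names.

```python
import statistics


def photoz_metrics(z_phot, z_spec, dz_est, outlier_cut=0.15):
    """Point-estimate photo-z metrics; outlier_cut and 1.48 are assumed values."""
    dz = [zp - zs for zp, zs in zip(z_phot, z_spec)]        # redshift error
    dz_rf = [d / (1.0 + zs) for d, zs in zip(dz, z_spec)]   # rest frame error
    non_outliers = [d for d in dz_rf if abs(d) <= outlier_cut]
    dz_norm = [d / e for d, e in zip(dz, dz_est)]           # error / estimated error
    return {
        "outlier_fraction": 1.0 - len(non_outliers) / len(dz_rf),
        "bias": statistics.mean(dz),
        "bias_rf": statistics.mean(dz_rf),
        "mean_abs_error": statistics.mean(abs(d) for d in dz),
        "mean_abs_error_rf": statistics.mean(abs(d) for d in dz_rf),
        # Normalised median absolute deviation, non-outliers only.
        "nmad": 1.48 * statistics.median([abs(d) for d in non_outliers]),
        # Should be close to 0 and 1 if the estimated errors are reliable.
        "norm_mean": statistics.mean(dz_norm),
        "norm_std": statistics.pstdev(dz_norm),
    }
```

The last two entries are the mean and standard deviation of the error normalised by its estimate, which are compared to the values of a standard normal distribution.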
Lastly, we perform Kolmogorov-Smirnov (KS) tests on the distributions and (cf. Carrasco Kind & Brunner, 2014), where the latter is the distribution derived by stacking the \glsplPDF for all objects. The KS test value is the maximum absolute difference between the cumulative distribution functions of the probability density of , or the stacked , to the cumulative distribution of .
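As a sketch of the KS value defined above, the maximum absolute difference between two cumulative distributions can be computed as follows. The first function compares two samples of redshifts; the second treats a stacked \glsPDF tabulated on a redshift grid as discrete probability masses, which is a simplifying assumption of this example. Function names are hypothetical.

```python
import bisect


def ks_two_sample(sample_a, sample_b):
    """Maximum absolute difference between two empirical CDFs."""
    a, b = sorted(sample_a), sorted(sample_b)
    return max(abs(bisect.bisect_right(a, x) / len(a)
                   - bisect.bisect_right(b, x) / len(b))
               for x in a + b)


def ks_stacked_pdf(z_grid, stacked_pdf, z_spec):
    """KS value between a stacked P(z) on an ascending z_grid and a spec-z sample.

    The PDF is treated as point masses at the grid values (an approximation).
    """
    total = sum(stacked_pdf)
    spec = sorted(z_spec)
    cumulative, ks = 0.0, 0.0
    for z, p in zip(z_grid, stacked_pdf):
        cumulative += p / total
        ecdf = bisect.bisect_right(spec, z) / len(spec)
        ks = max(ks, abs(cumulative - ecdf))
    return ks
```

A smaller value indicates that the reconstructed redshift distribution follows the spectroscopic one more closely.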

Figure 21: Photo-z results with novel templates, based on the BC03 models and selected through a \glsSOM and a k-means clustering algorithm. Left Panels: The upper panel shows the spectroscopic versus the photometric redshift, where the dashed line is at . The middle panel is the distribution of the rest frame \glsphotoz error as a function of the spectroscopic redshift, where we plot the bias (solid line) as well as the mean absolute error (dashed line). Finally, the lower panel shows versus the absolute rest frame error, where we plot (solid) and (dashed) on top. Right Panel: The relative frequency distributions of (grey filled histogram), (dark grey line), and the stacked \glsPDF of all objects (black line). Additionally, the \glsphotoz quality metrics are printed in the figure. We provide two values for the test, for the predictions of single values (from the mode of the \glsPDF, grey), and for the \glsPDF (black).

After this introduction of the \glsphotoz quality metrics we present the photometric redshift results of the CMASS sample with the novel template set and priors of Sec. 4.2 and Fig. 19 in Fig. 21. The upper left panel shows versus , the middle left panel shows the photometric redshift rest frame error as a function of spec-z, and the lower left panel presents versus . We indicate the median and values in the middle left panel by solid and dashed black lines, whereas in the lower panel we highlight the and values as a function of the spectroscopic redshift also by solid and dashed black lines. Finally, the right panel presents the normalised redshift distributions for (grey filled histogram), the distribution of the single-value photometric redshifts (grey line) obtained from the mode of , and the stacked \glsplPDF (black line). The photometric redshift quality metrics discussed above are printed in the plot as well.
The bias has a small positive value for the lowest considered redshifts, then is close to zero, and decreases to negative values for higher redshifts . The overall mean value is still positive due to the small sample size at higher redshifts, visible by the number density in Fig. 1 or the right panel of Fig. 21. The scatter (lower left panel in Fig. 21) increases slightly with increasing spec-z, while the value of stays more or less the same. This means that the outliers (which are not considered in the calculation of ) are predominantly responsible for an increase in . The fraction of catastrophic outliers however is very small . In the right panel of Fig. 21 we observe that the \glsphotoz predictions from the mode of the \glsPDF yield deviations from larger than for the case where the whole \glsplPDF are considered. The excesses observed in are due to the overestimation mentioned previously. They are no longer visible when we use the posterior distributions in the reconstruction, which means that the \glsPDF should be favoured for science analyses (cf., e.g., Mandelbaum et al., 2008; Hildebrandt et al., 2012).
As for the error estimation of the PhotoZ code, we see from the values of and , that behaves very close to a standard normal distribution, which is the goal in the photometric redshift error predictions. This means that not only the approach for the calculation of is legitimate, but also that the models and priors create a reasonable (from which is extracted).
Tab. 2 in Sec. 5.2 holds a summary of the derived quality metrics, together with results of public \glsplphotoz from the \glsSDSS, which we will analyse in Sec. 5.2.
To investigate if the estimated errors are reliable in identifying outliers, we assume the null hypothesis that an object is a \glsphotoz outlier. The probability of an outlier being falsely classified as a non-outlier on account of is then the type I error . Additionally, the type II error gives the probability of a non-outlier being misclassified as an outlier by the estimated error. For the \glsplphotoz of this work we get and . This means that an outlier is falsely classified as a non-outlier with probability , but also that a non-outlier is almost never classified as an outlier (). Although is close to, but slightly smaller than, , the deviation is probably caused by too shallow peaks around the mode of for outliers, such that their errors are underestimated. The and values are summarised together with the results of the \glsSDSS (Sec. 5.2) in Tab. 3.
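The two error rates of this significance test can be written down compactly. The sketch below takes two boolean lists: whether an object truly is an outlier, and whether it is flagged as one on account of its estimated error. How the flag is derived from the estimated error is deliberately left to the caller, since the criterion is not reproduced in this text; the function name is hypothetical.

```python
def outlier_classification_errors(is_outlier, flagged):
    """Type I and II error rates under the null hypothesis 'object is an outlier'.

    is_outlier: True where the object is a real photo-z outlier.
    flagged:    True where the estimated error classifies it as an outlier.
    """
    pairs = list(zip(is_outlier, flagged))
    n_out = sum(1 for o, _ in pairs if o)
    n_in = len(pairs) - n_out
    # Type I: real outlier that is not flagged (null hypothesis wrongly rejected).
    missed = sum(1 for o, f in pairs if o and not f)
    # Type II: non-outlier that is flagged as an outlier.
    false_alarm = sum(1 for o, f in pairs if not o and f)
    alpha = missed / n_out if n_out else float("nan")
    beta = false_alarm / n_in if n_in else float("nan")
    return alpha, beta
```

For example, with two real outliers of which one is missed, and four non-outliers of which one is flagged, the function returns the rates 0.5 and 0.25.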

Figure 22: Photometric redshift rest frame error distribution (grey filled histogram) estimated with the novel templates and priors. is highlighted by a solid black line, whereas the bias is shown by a solid red line. The steps at , , and are represented by dashed, dash dotted, and dotted red lines, and the corresponding number fractions are given in the legend. We also plot a Gaussian with , and in grey. Furthermore, dash dotted and dotted grey vertical lines show the real and intervals. Since we cannot discern the red and the grey dash dotted lines. The derived from the stacked \glsplPDF is shown by a green solid histogram, to which we fit a Gaussian highlighted by a dashed green curve.

The distribution of the photometric redshift errors in the rest frame is shown in Fig. 22 by a grey filled histogram. In red we indicate the bias and the ranges of , where . Furthermore, we calculate the real interval widths where , and of the galaxies are located and introduce them in the plot (black dash dotted and dotted vertical lines). Finally, we calculate the distribution from the stacked \glsplPDF and fit a Gaussian to the histogram (green lines).
The peak of the distribution is slightly shifted to the right, which is again due to the overestimated \glsplphotoz at the lowest redshifts. Concerning the resemblance to a Gaussian, we observe from the values given in the plot that (which is why we cannot discern the lines in Fig. 22) and . The latter is due to the outliers which can be scattered far from the spec-z value due to the allowed fitting range. The fractions and of objects within and have reasonable values. While is very close to the desired value of , is slightly smaller by only . The distribution derived from the \glsplPDF is slightly broader than the distribution of the individual \glsphotoz results. This is a consequence of the asymmetry of the \glsplPDF that often have higher probabilities for redshifts greater than the most probable .

Figure 23: Deviations in magnitudes of the predicted magnitudes of the novel model \glsplSED to the data.

Fig. 23 presents the deviations between the magnitudes predicted by the models in a \glsphotoz run and the data. A Gaussian is fitted to the histograms whose best fitting parameters are printed in the panels. For the and band magnitudes the distributions are very narrow, while the expectation values of the Gaussians are near zero. This is thanks to the depth of the photometry in these filters, which is greatest in and , and the photometry has the smallest measurement uncertainties. For the and the band magnitudes however, the width of the distributions increases, which is mostly an effect of the shallower photometry in these bands, but also of the respective magnitude ranges of the galaxies. Furthermore, the mean magnitude deviations in these two bands are higher than in and . While of the best fitting Gaussian is still small in comparison to for the band, it is relatively high in the band. We selected the models for the \glsphotoz estimation only on account of their \glsphotoz performance, and not on how well they match the data in terms of magnitudes. However, this does not necessarily imply that the BC03  models are not able to fit the data in terms of magnitudes. Indeed, calculating the average magnitude offsets of the \glsSED fitting of Sec. 4.1.3 yields maximum values of in , , , and for the four redshift bins.
In the next section we want to compare the photometric redshift results of this section with the \glsplphotoz provided by \glsSDSS.

5.2 Comparison to SDSS Photometric Redshifts

Figure 24: Photo-z results from the KF approach published by \glsSDSS. See Fig. 21 for a detailed description.
Figure 25: Photometric redshift rest frame error distribution (grey filled histogram) estimated by the SDSS-KF method. A detailed explanation is given in Fig. 22.

We present the photometric redshifts from the \glsSDSS-III database in comparison to the results of our code-template combination. The \glsSDSS database provides \glsplphotoz from two different empirical methods. One code uses a k-d tree nearest neighbour fit to derive the redshifts (hereafter KF after Csabai et al., 2007). The SDSS-KF results for our catalogue are shown in Fig. 24, which is equivalent to Fig. 21 and also shows the quality metrics. From the middle panel of Fig. 24 we see that the location of the densest part of the population lies close to the line for low redshifts . In all higher regions the photometric redshift is systematically underestimated which leads to the low negative value in the bias. An underestimation is also present in the results from Sec. 5.1, but it is slightly higher in the SDSS-KF case. From the lower panel of Fig. 24 we see that also here the rise in is mostly caused by outliers, which are not considered in the calculation of (which stays more or less constant). Taking the estimated photometric redshift errors into account, the relatively high bias does not decrease but even increases, which is mirrored in the higher absolute value of . Hence, the bad estimates of the \glsplphotoz are not recognized by the error estimation. This can also be seen in the value of which is well beyond and means that the errors are on average underestimated. This also affects the results of a significance test on outlier classification (cf. Sec. 5.1). The type I and II errors read (at two significant figures) and (see also Tab. 3 for comparison with the other results), confirming that the errors are underestimated.
In terms of the low \glsphotoz values at higher do not change the shape significantly, which is due to the low sample size in these regions, and results in the low value of KS. We cannot compare to the results for \glsplPDF, since they are not provided.
As before, we show in Fig. 25 the distribution. Comparing with our case from Fig. 22, we see that here the results are slightly shifted to the left, an effect due to the underestimated \glsplphotoz at higher redshifts. Also in the case of the SDSS-KF redshifts, the and values do not perfectly coincide with and , but are elevated. is very close to the desired value, while deviates by .
In comparison to the redshifts we get with our novel templates and code, the quality metrics for bias, error, and scatter are all higher in the case of the SDSS-KF \glsplphotoz. Furthermore, the estimated redshift errors are more reliable in our code-template combination. The KS test yields better results in the case of this work, since the SDSS-KF \glsplphotoz are systematically underestimated.

Figure 26: Photo-z results from the RF approach published by the \glsSDSS. See Fig. 21 for a detailed description.
Figure 27: Photometric redshift rest frame error distribution of the SDSS-RF published by the \glsSDSS. A detailed explanation is given in Fig. 22.

The second photometric redshift results published by the \glsSDSS are from another empirical code, which uses random forests to predict \glsplphotoz (Carliles et al., 2010, hereafter SDSS-RF). Fig. 26 presents the results of the SDSS-RF code. In this case, the outlier rate is significantly higher than in both previous cases, as well as the mean and the absolute rest frame errors. Above the \glsplphotoz are systematically underestimated to a greater extent than in the former two cases, yielding the low bias. Furthermore, the scatter values , , and have also increased in comparison to the results of this work, as well as compared to the SDSS-KF redshifts.
In the distribution of we can observe the lack of higher photo-z values which accumulate around . Because of that, the KS test value is higher. In terms of their estimated photometric redshift errors, the SDSS-RF method outperforms the SDSS-KF approach, but still produces a larger bias in . The errors are furthermore overestimated, visible in the value which is lower than the desired value of . Again, this also affects the outlier classification derived from . In the case of SDSS-RF photo-zs, the null hypothesis of an object being an outlier yields and (shown together with the previous results in Tab. 3). The higher values of yield a smaller probability of an object being misclassified as a non-outlier, but also raise (although only to a small value). So although the errors are overestimated on average, outliers are still misclassified at a significance level of .
In Fig. 27 we plot the distribution. The shift to the left is present to an even greater extent than in the previous case, and the distribution deviates strongly from a Gaussian with and , which is also plotted in Fig. 27. This is a result of the underestimation of the photo-zs. Furthermore, as in the previous cases, the value of is marginally higher than , while the deviation is greater in . On the other hand, is even closer to than in both previous cases, while deviates more strongly, .
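
The comparison of an error distribution with a unit Gaussian can be illustrated with a short sketch. It assumes that the quantity of interest is the redshift error divided by its estimated uncertainty, and it counts the fraction of objects within one and two estimated standard deviations; for a unit Gaussian the expected fractions follow from the error function. The function names are hypothetical, and the exact normalisation used for the figures is not restated in this section.

```python
import math


def sigma_fractions(z_phot, z_spec, z_err):
    """Fractions of objects whose true error lies within 1 and 2 estimated sigma."""
    # Error in units of the estimated uncertainty (assumed normalisation).
    pulls = [(zp - zs) / dz for zp, zs, dz in zip(z_phot, z_spec, z_err)]
    n = len(pulls)
    within_1 = sum(abs(p) <= 1.0 for p in pulls) / n
    within_2 = sum(abs(p) <= 2.0 for p in pulls) / n
    return within_1, within_2


def gaussian_fraction(k):
    """Expected fraction within k sigma for a Gaussian distribution."""
    return math.erf(k / math.sqrt(2.0))
```

Fractions above the Gaussian expectation are what one would expect from overestimated errors, fractions below it from underestimated errors. As noted in the text, matching fractions alone do not guarantee that the distribution is Gaussian, since a shifted or skewed distribution can yield similar numbers.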

Table 2: Summary of photometric redshift quality metrics of the template fitting results with the novel templates and priors used with the PhotoZ code, the SDSS photo-zs of the KF code, and the random forest code (SDSS-RF). Values for KS in the case of this work are derived from , whereas the values in brackets are calculated using only the predictions.
Table 3: Significance test for outlier classification. The null hypothesis is that an object is an outlier, and and are the type I and II errors respectively (rounded to two significant figures).

We present a summary of the photometric redshift quality metrics of the results from the template fitting of this work and the two sets of SDSS photo-zs in Tab. 2. The model-prior combination of this work produces the lowest outlier fraction, bias, and mean absolute error compared to the SDSS photo-z results. When considered as a function of spec-z, the photometric redshifts derived in this work are slightly biased to lower values at higher spec-z, but not to the same extent as those of the SDSS codes. Furthermore, the different scatter values calculated (i.e., , , and ) are also lower for the photo-zs of this work. The distribution of redshift errors is more similar to a Gaussian for the models, priors, and code of this work, which can be observed in Figs. 22, 25, and 27. Considering the number fractions within and , all three codes perform similarly well. However, this does not mean that the distributions are necessarily good approximations of Gaussian distributions with and , which can be observed in Figs. 22, 25, and 27.
When we evaluate, through a KS test, the similarity between the redshift distributions from which the spectroscopic and photometric redshifts are sampled, the photo-zs of this work yield the best results for predictions of the mode, and these improve further when the whole distribution is considered.
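
For readers unfamiliar with the KS test, the statistic behind such a comparison can be sketched as follows. This is a generic two-sample version operating on point estimates, given here as an illustration only: it returns the largest distance between the two empirical cumulative distributions, where smaller values indicate more similar redshift distributions. How the stacked PDFs enter the test in this work is not reproduced here, and the function name is hypothetical.

```python
from bisect import bisect_right


def ks_statistic(sample_a, sample_b):
    """Two-sample KS statistic: largest gap between the two empirical CDFs."""
    a, b = sorted(sample_a), sorted(sample_b)
    d_max = 0.0
    # The empirical CDFs only change at observed values, so it is
    # sufficient to evaluate the gap at every value of both samples.
    for x in a + b:
        cdf_a = bisect_right(a, x) / len(a)
        cdf_b = bisect_right(b, x) / len(b)
        d_max = max(d_max, abs(cdf_a - cdf_b))
    return d_max
```

Calling `ks_statistic(z_spec, z_phot)` on the two redshift lists gives a value between 0 (identical empirical distributions) and 1 (no overlap).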
The results of a significance test with the null hypothesis that an object is a photometric redshift outlier are shown in Tab. 3. None of the codes produces errors that can reliably predict outliers. Although the SDSS-RF produces the best , this is mostly due to the overestimated errors (cf. Tab. 2).
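
The two error rates of such a test can be written down compactly. The sketch below takes, for every object, whether it truly is an outlier and whether it was flagged as one on the basis of its estimated error. With the null hypothesis "the object is an outlier", a type I error is a true outlier that is not flagged, and a type II error is a good photo-z that is flagged. The flagging rule in `flag_by_error` (estimated rest-frame error above a threshold) is a hypothetical stand-in for the criterion of Sec. 5.1, which is not restated here, and both function names are illustrative.

```python
def flag_by_error(z_phot, z_err, max_rest_frame_err):
    """Hypothetical rule: flag objects whose estimated rest-frame error is large."""
    return [dz / (1.0 + zp) > max_rest_frame_err for zp, dz in zip(z_phot, z_err)]


def outlier_test_errors(is_outlier, flagged_outlier):
    """Type I and II error rates for the null hypothesis 'object is an outlier'."""
    pairs = list(zip(is_outlier, flagged_outlier))
    n_out = sum(1 for truth, _ in pairs if truth)
    n_good = len(pairs) - n_out
    # Type I: the null is true (real outlier) but rejected (not flagged).
    missed = sum(1 for truth, flag in pairs if truth and not flag)
    # Type II: the null is false (good photo-z) but retained (flagged).
    false_alarm = sum(1 for truth, flag in pairs if flag and not truth)
    alpha = missed / n_out if n_out else float("nan")
    beta = false_alarm / n_good if n_good else float("nan")
    return alpha, beta
```

Under this rule, larger estimated errors flag more objects, which lowers the type I rate and raises the type II rate, in line with the behaviour described for the SDSS-RF errors.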

5.3 Deviations in Colour Predictions

In the last section we analysed the photometric redshift results with our code in comparison to the SDSS photo-zs. The redshifts from the SDSS-RF method were outperformed by those derived with the SDSS-KF code. We now analyse whether the photo-zs of this work can reproduce the CMASS galaxy colours to a better extent than when we fit the same models to the data at the photometric redshifts of the SDSS-KF method.

Figure 28: Deviation in , , , and colours for photometric redshift results with the novel templates (grey filled histogram) and model predictions when fitting the same templates to the SDSS-KF photo-zs (blue histogram). Gaussians are fitted to both distributions, and are shown by a solid black line and a dashed green line respectively. The best fitting parameters of the Gaussians are printed in the panels.

Fig. 28 presents the differences between the model predictions and the data in the , , , and colours. The predictions for the photo-zs of this work are plotted in grey, the best fitting Gaussians are presented by black curves, and their parameters are given in the plot. When fitted to the photometric redshifts of the SDSS-KF approach, the novel modified BC03 models from Sec. 4.2 predict colours which are represented by blue histograms in Fig. 28, to which we also fit Gaussian curves. In all four colours, the standard deviation of the distribution is larger for SDSS-KF. Furthermore, the mean deviation from the data colours is greater for SDSS-KF results in the and colours. is of the same order in for both cases. It is smaller for SDSS-KF photo-zs in , but there both values are very low. The fact that the SDSS-KF , , and colours are shifted bluewards from the distribution in colours of this work is a result of the SDSS-KF photo-zs being more heavily underestimated. All considered distributions resemble their best fitting Gaussians well in the inner parts, with increased flanks. However, this increase is more significant in for the SDSS-KF photo-zs in .
Therefore, the photometric redshifts derived in this paper not only generally have better quality metrics than the SDSS photo-zs, but can furthermore recreate the colours of the CMASS galaxies to a better extent.

6 Summary and Conclusions

In this work we created a set of model SEDs that are designed to match the colours of the BOSS CMASS sample and provide accurate photometric redshifts. We first analysed the colours of the LRG model of Maraston et al. (2009), which was created to match the and colours of the LRG sample of SDSS-II (Eisenstein et al., 2001), and found that no single age or evolution configuration matches the data in all colours. Therefore, we created models for four redshift bins of widths centred on with the stellar population synthesis code of Bruzual & Charlot (2003, BC03). We generated SSPs and CSPs with decreasing SFHs at various metallicities, and sampled the models from a wide age range. These models were then fitted, in superposition with a burst component and dust extinction, to the data at known spectroscopic redshifts.
We observe a mismatch between the colours of the data and the models, which is due to a red continuum slope that is too shallow in the model SEDs. Large variations in the model parameters (i.e., metallicity, SFH, age, burst, dust extinction, IMF) do not produce models that better fit the data. We speculate that inaccuracies in the modelling of the AGB phase could be the reason for the observed colour mismatch. In order to better recreate the colours of the CMASS galaxies we introduce additional degrees of freedom by modifying the model SEDs by multiplication with for with several values for and . We showed that the BC03 models modified in this way are indeed a better match to the colours of the CMASS galaxies (Figs. 3 to 6, 9 to 12, and 13 to 16) and also yield better values from the fitting (Fig. 29).
From these best fitting SEDs we selected a small subset that should cover the region in colour space and absolute magnitude in of the CMASS sample. We therefore projected the CMASS galaxy colours and of the four bins onto two-dimensional planes using a self-organizing map. Afterwards we partitioned the plane into six clusters for each redshift bin with a k-means clustering algorithm and selected one model SED per cluster cell that produces the best photo-zs for galaxies within the same cell. We estimated photometric redshifts with a template fitting code and the selected models, and analysed their individual performance. In doing so we modified their redshift priors to improve the photo-zs, but also decided to omit some of the templates which do not yield accurate photo-zs on the whole sample, regardless of how the priors are modified. We then compared the photometric redshift results with the photo-zs of two empirical methods published by SDSS and calculated several metrics that assess the quality of the photo-zs, their estimated errors, and their distribution. We found that the photo-zs with the generated models of this work produce better values in all quality metrics. Furthermore, we observed that including the stacked PDFs yields better results in the reconstruction of , mirrored in the results of a KS test. Concerning the estimated errors , a significance test shows that none of the three considered results provides a reliable classification of outliers. However, the probability of a non-outlier being misclassified is very small for all three considered cases.
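
The partitioning step summarised above can be illustrated with a plain k-means loop. This is a sketch under stated assumptions, not the implementation used for this work: it operates on generic two-dimensional points standing in for positions on the map plane, the self-organizing map itself is not reproduced, and the function name, the fixed number of iterations, and the random initialisation are illustrative choices.

```python
import random


def kmeans(points, k, n_iter=50, seed=0):
    """Partition points (tuples of floats) into k clusters; returns labels and centres."""
    rng = random.Random(seed)
    # Initialise the centres with k distinct input points.
    centres = rng.sample(points, k)
    labels = [0] * len(points)
    for _ in range(n_iter):
        # Assignment step: nearest centre in squared Euclidean distance.
        labels = [
            min(range(k), key=lambda c: sum((p - q) ** 2 for p, q in zip(pt, centres[c])))
            for pt in points
        ]
        # Update step: move each centre to the mean of its members.
        for c in range(k):
            members = [pt for pt, lab in zip(points, labels) if lab == c]
            if members:
                centres[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels, centres
```

With six clusters per redshift bin, as described above, one would call `kmeans(map_positions, 6)` for each bin and then pick one model SED per resulting cell.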
Finally, we compared the predicted colours of the novel model SEDs, when fitted to the photo-zs of this work and to the better of the two SDSS redshifts, to the data. We found that the deviations from the data are smaller for the photo-zs of this work.

The models of this work and G13 can be downloaded together with the priors at We would be happy to provide the photometric redshifts on request.


We would like to thank the referee for her/his suggestions, which improved the manuscript. We also thank Achim Weiss for discussions about AGB modelling.

This work was supported by SFB-Transregio 33 (TR33) “The Dark Universe” by the Deutsche Forschungsgemeinschaft (DFG) and the DFG cluster of excellence “Origin and Structure of the Universe”.

Funding for SDSS-III has been provided by the Alfred P. Sloan Foundation, the Participating Institutions, the National Science Foundation, and the U.S. Department of Energy Office of Science. The SDSS-III web site is SDSS-III is managed by the Astrophysical Research Consortium for the Participating Institutions of the SDSS-III Collaboration including the University of Arizona, the Brazilian Participation Group, Brookhaven National Laboratory, Carnegie Mellon University, University of Florida, the French Participation Group, the German Participation Group, Harvard University, the Instituto de Astrofisica de Canarias, the Michigan State/Notre Dame/JINA Participation Group, Johns Hopkins University, Lawrence Berkeley National Laboratory, Max Planck Institute for Astrophysics, Max Planck Institute for Extraterrestrial Physics, New Mexico State University, New York University, Ohio State University, Pennsylvania State University, University of Portsmouth, Princeton University, the Spanish Participation Group, University of Tokyo, University of Utah, Vanderbilt University, University of Virginia, University of Washington, and Yale University.

Appendix A SED Fitting Results

In this paper we create models that should match the BOSS CMASS galaxies in terms of their colours in redshift bins with widths centred on . We do so by fitting sets of model SEDs with different properties like metallicity, SFHs, and ages to the data (Sec. 4.1.3). In superposition to the SEDs, a burst model and additional dust extinctions for the burst and the main component are further degrees of freedom. In Fig. 29 we show the distributions of the values returned by the SED fitting code, when we fit the BC03, M05, and M11 models in equal setups to the data. The setup of the SED fitting and the code are detailed in Sec. 4.1 and are the same for all cases.

Figure 29: distributions of SED fitting runs (at ) with BC03 (grey), modified BC03 (orange), M05 (blue), and M11 (green) models for the four considered redshift regions. The frequency is normalised and plotted logarithmically. We can see that although the distributions for BC03, M05, and M11 are similar, the values are improved by including the modified BC03 models.

In Sec. 4.1.3 we modify the BC03 models, changing their continuum slope by multiplying with for with various and values. This is done to introduce further degrees of freedom and to create models that match the colours of the data to a better extent. We fit the modifications of BC03 SSPs and CSPs with decreasing SFRs to the CMASS galaxy colours with the same setup as mentioned above and as detailed in Sec. 4.1.3. The resulting values of the fit are shown together with the previous results of the BC03, M05, and M11 models in Fig. 29, where they are highlighted by orange histograms.
While the original BC03, M05, and M11 models produce similar results in terms of , the fits are significantly improved by using the modified BC03 models. This is also shown in terms of colours in Figs. 13 to 16 of Sec. 4.1.3.

Appendix B Query for CMASS Sample

We present here the SQL query executed on the SDSS-III CasJobs website to download the data used in this work and detailed in Sec. 3. In the following, <fil> denotes any of the five Sloan filters and is to be replaced by u, g, r, i, and z.

select p.ObjID, s.z as zSpec,
       s.CModelMag_<fil>, s.CModelMagErr_<fil>,
       p.CModelFlux_<fil>, p.CModelFluxIvar_<fil>,
       pz.z as zPhot, pz.zErr as zPhotErr,
       pzrf.z as zPhotRF, pzrf.zErr as zPhotRFErr
from specPhotoall s, spa a, Photoobjall p,
     Photoz pz, PhotozRF pzrf
where s.SpecObjID=a.SpecObjID and
      s.ObjID=p.ObjID and pz.ObjID=p.ObjID and
      pzrf.ObjID=p.ObjID and
      s.zWarning=0 and s.z>0 and
      (a.BOSS_TARGET1 & 2>0 or
      (a.BOSS_TARGET1 & 1>0 and s.Tile>=10324)) and
      a.BOSSPrimary=1 and a.zWarning_NoQSO=0 and
      a.Chunk!='BOSS1' and a.Chunk!='BOSS2' and
      p.Fiber2Mag_i<21.5 and p.Clean=1 and
      (p.CalibStatus_<fil> & 1)!=0 and
      s.z>=0.45 and s.z <=0.9
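
Since the query is written as a template, the <fil> placeholder has to be expanded before submission. A minimal sketch of that expansion is given below. The helper name `expand_filters` is hypothetical, and joining the `CalibStatus` condition over all five filters with `and` is an assumption about how the template is meant to be read.

```python
FILTERS = ("u", "g", "r", "i", "z")


def expand_filters(fragment, joiner):
    """Repeat a query fragment containing '<fil>' once per Sloan filter."""
    return joiner.join(fragment.replace("<fil>", band) for band in FILTERS)


if __name__ == "__main__":
    # Column list: one pair of magnitude columns per filter.
    print(expand_filters("s.CModelMag_<fil>, s.CModelMagErr_<fil>", ",\n       "))
    # Selection: the condition is required in every filter (assumed).
    print(expand_filters("(p.CalibStatus_<fil> & 1)!=0", " and\n      "))
```

The expanded fragments can then be pasted in place of the corresponding template lines of the query.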




  1. Abazajian K. N. et al., 2009, ApJS, 182, 543
  2. Ahn C. P. et al., 2014, ApJS, 211, 17
  3. Arnouts S., Cristiani S., Moscardini L., Matarrese S., Lucchin F., Fontana A., Giallongo E., 1999, MNRAS, 310, 540
  4. Baldry I. K., Glazebrook K., Brinkmann J., Ivezić Ž., Lupton R. H., Nichol R. C., Szalay A. S., 2004, ApJ, 600, 681
  5. Bender R. et al., 2001, in Deep Fields, S. Cristiani, A. Renzini, & R. E. Williams, ed., p. 96
  6. Benítez N., 2000, ApJ, 536, 571
  7. Benítez N. et al., 2009, ApJ, 691, 241
  8. Bolzonella M., Miralles J.-M., Pelló R., 2000, A&A, 363, 476
  9. Brammer G. B., van Dokkum P. G., Coppi P., 2008, ApJ, 686, 1503
  10. Brimioulle F., Lerchster M., Seitz S., Bender R., Snigula J., 2008, ArXiv e-prints
  11. Brimioulle F., Seitz S., Lerchster M., Bender R., Snigula J., 2013, MNRAS, 432, 1046
  12. Bruzual G., Charlot S., 2003, MNRAS, 344, 1000
  13. Calzetti D., Armus L., Bohlin R. C., Kinney A. L., Koornneef J., Storchi-Bergmann T., 2000, ApJ, 533, 682
  14. Carliles S., Budavári T., Heinis S., Priebe C., Szalay A. S., 2010, ApJ, 712, 511
  15. Carrasco Kind M., Brunner R. J., 2013, MNRAS, 432, 1483
  16. Carrasco Kind M., Brunner R. J., 2014, MNRAS, 438, 3409
  17. Cassarà L. P., Piovan L., Weiss A., Salaris M., Chiosi C., 2013, MNRAS, 436, 2824
  18. Cenarro A. J., Gorgas J., Vazdekis A., Cardiel N., Peletier R. F., 2003, MNRAS, 339, L12
  19. Chabrier G., 2003, PASP, 115, 763
  20. Collister A. A., Lahav O., 2004, PASP, 116, 345
  21. Conroy C., van Dokkum P. G., 2012, ApJ, 760, 71
  22. Csabai I. et al., 2003, AJ, 125, 580
  23. Csabai I., Dobos L., Trencséni M., Herczegh G., Józsa P., Purger N., Budavári T., Szalay A. S., 2007, Astronomische Nachrichten, 328, 852
  24. Dawson K. S. et al., 2013, AJ, 145, 10
  25. Drory N., Bender R., Hopp U., 2004, ApJ, 616, L103
  26. Drory N., Feulner G., Bender R., Botzler C. S., Hopp U., Maraston C., Mendes de Oliveira C., Snigula J., 2001, MNRAS, 325, 550
  27. Drory N., Salvato M., Gabasch A., Bender R., Hopp U., Feulner G., Pannella M., 2005, ApJ, 619, L131
  28. Eisenstein D. J. et al., 2001, AJ, 122, 2267
  29. Eisenstein D. J. et al., 2011, AJ, 142, 72
  30. Feldmann R. et al., 2006, MNRAS, 372, 565
  31. Feulner G., Gabasch A., Salvato M., Drory N., Hopp U., Bender R., 2005, ApJ, 633, L9
  32. Gabasch A. et al., 2004, A&A, 421, 41
  33. Gabasch A., Goranova Y., Hopp U., Noll S., Pannella M., 2008, MNRAS, 383, 1319
  34. Gerdes D. W., 2009, in Bulletin of the American Astronomical Society, Vol. 41, American Astronomical Society Meeting Abstracts #213, p. #483.03
  35. Greisel N., Seitz S., Drory N., Bender R., Saglia R. P., Snigula J., 2013, ApJ, 768, 117
  36. Gruen D. et al., 2013, MNRAS, 432, 1455
  37. Gruen D. et al., 2014, MNRAS, 442, 1507
  38. Hanke M., Halchenko Y. O., Sederberg P. B., Hanson S. J., Haxby J. V., Pollmann S., 2009, Neuroinformatics, 7, 37
  39. Hildebrandt H. et al., 2012, MNRAS, 421, 2355
  40. Huterer D., Takada M., Bernstein G., Jain B., 2006, MNRAS, 366, 101
  41. Ilbert O. et al., 2006, A&A, 457, 841
  42. Kohonen T., 1982, Biological Cybernetics, 43, 59
  43. Kohonen T., 2001, Self-Organizing Maps
  44. Kroupa P., 2001, MNRAS, 322, 231
  45. Longhetti M., Saracco P., 2009, MNRAS, 394, 774
  46. Ma Z., Hu W., Huterer D., 2006, ApJ, 636, 21
  47. Mancone C. L., Gonzalez A. H., 2012, PASP, 124, 606
  48. Mandelbaum R. et al., 2008, MNRAS, 386, 781
  49. Maraston C., 1998, MNRAS, 300, 872
  50. Maraston C., 2005, MNRAS, 362, 799
  51. Maraston C., Pforr J., Renzini A., Daddi E., Dickinson M., Cimatti A., Tonini C., 2010, MNRAS, 407, 830
  52. Maraston C., Strömbäck G., 2011, MNRAS, 418, 2785
  53. Maraston C., Strömbäck G., Thomas D., Wake D. A., Nichol R. C., 2009, MNRAS, 394, L107
  54. Marigo P., Girardi L., Bressan A., Groenewegen M. A. T., Silva L., Granato G. L., 2008, A&A, 482, 883
  55. Meidt S. E. et al., 2012, ApJ, 744, 17
  56. Monna A. et al., 2014, MNRAS, 438, 1417
  57. Oke J. B., Gunn J. E., 1983, ApJ, 266, 713
  58. Padmanabhan N. et al., 2005, MNRAS, 359, 237
  59. Pedregosa F. et al., 2011, Journal of Machine Learning Research, 12, 2825
  60. Pforr J., Maraston C., Tonini C., 2012, MNRAS, 422, 3285
  61. Pickles A. J., 1998, PASP, 110, 863
  62. Saglia R. P. et al., 2012, ApJ, 746, 128
  63. Salaris M., Weiss A., Cassarà L. P., Piovan L., Chiosi C., 2014, A&A, 565, A9
  64. Salpeter E. E., 1955, ApJ, 121, 161
  65. Sánchez C. et al., 2014, MNRAS, 445, 1482
  66. Sánchez E. et al., 2011, MNRAS, 411, 277
  67. Shapley A. E., Steidel C. C., Erb D. K., Reddy N. A., Adelberger K. L., Pettini M., Barmby P., Huang J., 2005, ApJ, 626, 698
  68. Smee S. A. et al., 2013, AJ, 146, 32
  69. Steinhaus H., 1957, Bull. Acad. Pol. Sci., Cl. III, 4, 801
  70. Voronoi G., 1908, Journal für die Reine und Angewandte Mathematik, 133, 97–178
  71. York D. G. et al., 2000, AJ, 120, 1579
  72. Zibetti S., Gallazzi A., Charlot S., Pierini D., Pasquali A., 2013, MNRAS, 428, 1479