Do spectra improve distance measurements of Type Ia supernovae?
Key words. supernovae: general - cosmology: observations

We investigate the use of a wide variety of spectroscopic measurements to determine distances to low-redshift Type Ia supernovae (SN Ia) in the Hubble flow observed through the CfA Supernova Program. We consider linear models for predicting distances to SN Ia using light-curve width and color parameters (determined using the SALT2 light-curve fitter) and a spectroscopic indicator, and evaluate the resulting Hubble diagram scatter using a cross-validation procedure. We confirm the ability of spectral flux ratios alone at maximum light to reduce the scatter of Hubble residuals by % [weighted rms, or mag for the flux ratio ] with respect to the standard combination of light-curve width and color, for which mag. When used in combination with the SALT2 color parameter, the color-corrected flux ratio at maximum light leads to an even lower scatter ( mag), although the improvement has low statistical significance () given the size of our sample (26 SN Ia). We highlight the importance of an accurate relative flux calibration and the failure of this method for highly-reddened objects. Comparison with synthetic spectra from 2D delayed-detonation explosion models shows that the correlation of with SN Ia absolute magnitudes can be largely attributed to intrinsic color variations and not to reddening by dust in the host galaxy. We consider flux ratios at other ages, as well as the use of pairs of flux ratios, revealing the presence of small-scale intrinsic spectroscopic variations in the iron-group-dominated absorption features around Å and Å. The best flux ratio overall is the color-corrected at d from maximum light, which leads to % lower scatter ( mag) with respect to the standard combination of light-curve width and color, at significance.
We examine other spectroscopic indicators related to line-profile morphology (absorption velocity, pseudo-equivalent width, etc.), but none appears to lead to a significant improvement over the standard light-curve width and color parameters. We discuss the use of spectra in measuring more precise distances to SN Ia and the implications for future surveys which seek to determine the properties of dark energy.
1 Introduction
Precise distances to Type Ia supernovae (SN Ia) formed the cornerstone of the discovery of cosmic acceleration (Riess et al. 1998; Perlmutter et al. 1999). These measurements use the shape of supernova light curves and their colors to distinguish intrinsically bright supernovae from intrinsically dim ones (Phillips 1993; Riess et al. 1996; Prieto et al. 2006; Jha et al. 2007; Guy et al. 2007; Conley et al. 2008; Mandel et al. 2009). In this paper we explore the suggestion of Bailey et al. (2009) that spectra can contribute to improved distance measurements. We apply statistical tests to a subset of the SN Ia for which we have good light curves and spectra based on the ongoing program of supernova observations at the Harvard-Smithsonian Center for Astrophysics (CfA; Matheson et al. 2008; Hicken et al. 2009a).
It is important to construct the best possible distance indicators to extract the maximum cosmological information from supernova surveys. The present state of the art gives distances to well-observed individual objects with uncertainties of order 10%, so that samples of nearby (Hicken et al. 2009b) and distant SN Ia (ESSENCE, Miknaitis et al. 2007; SNLS, Astier et al. 2006) can be combined to constrain the equation-of-state parameter for dark energy, noted $w$. The first results show that for a flat universe with constant $w$, the dark energy is compatible with a cosmological constant (for which $w = -1$) within about 10% (Astier et al. 2006; Wood-Vasey et al. 2007). Constraints on the variation of $w$ with redshift come from high-redshift observations with the Hubble Space Telescope (Riess et al. 2004, 2007). Present-day limits are weak, but future work with large, carefully calibrated samples from the ground (Pan-STARRS, Dark Energy Survey, LSST) and from space (Euclid, WFIRST) will contribute to distinguishing the nature of dark energy (Albrecht et al. 2009). In designing the follow-up observations for these enterprises, it is worth knowing whether spectra will be useful only for classification and precise redshifts, or whether the spectra of the supernovae themselves can be used to improve the precision of the distances. The way we explore this is to analyze the CfA sample, using the difference between the distance derived from Hubble expansion and the distance predicted from our various models. This difference is the Hubble residual, which we use as a measure of the power of a particular model to predict the supernova distance. As described below, we explore models that combine quantitative information from the spectrum with information on light-curve shape and color.
Spectroscopic information is fundamental to the success of employing SN Ia as distance indicators in large surveys. Cleanly separating Type Ia supernovae from core-collapse events like SN Ib and SN Ic improves the purity of the sample. More directly, Nugent et al. (1995) showed that some easily-measured line ratios in SN Ia spectra are correlated with the luminosity. Measurements of line velocities (and gradients thereof), strengths, and widths and their relation to supernova luminosity have been explored recently by several authors (Benetti et al. 2005; Blondin et al. 2006; Bongard et al. 2006; Hachinger et al. 2006; Bronder et al. 2008). Likewise, Matheson et al. (2008) revealed spectroscopic variability amongst SN Ia of similar luminosity. But the first application of spectroscopic clues to improve distance estimates has come from Bailey et al. (2009). Using spectra of 58 SN Ia from the Nearby Supernova Factory, they showed that the ratio of fluxes in selected wavelength bins (flux ratios) could reduce the scatter of Hubble residuals by % compared to the usual combination of light-curve width and color parameters ( mag cf. mag). By using a flux ratio measured on a dereddened spectrum in combination with a color parameter they found a further % improvement ( mag). We have sought first to see if we can reproduce their results using the CfA data set, and then to test additional ideas about ways to use spectra to improve the estimates of supernova distances.
In practice, the standardization of SN Ia magnitudes involves a term related to the width of the light curve and a correction due to color. While some methods attempt to separate intrinsic color variations from reddening by dust in the host galaxy (e.g., MLCS2k2; Jha et al. 2007) others use a single parameter for both effects (e.g. SALT2; Guy et al. 2007), exploiting the degeneracy between the two: underluminous SN Ia are also intrinsically redder than overluminous SN Ia (e.g. Tripp 1998). We adopt the latter approach in this paper, to match the method used by Bailey et al. (2009). An active area of research involves the use of SN Ia spectra to provide independent or complementary information on SN Ia luminosities that would help improve their use as distance indicators.
We consider models for predicting distances to SN Ia of the form:
$$\mu = m_B - M + (\alpha \times \mathrm{width}) - (\beta \times \mathrm{color}) - (\gamma \times \mathrm{spec}), \tag{1}$$
where $m_B$ is the apparent rest-frame $B$-band magnitude at peak, $M$ is a reference absolute magnitude, “width” and “color” are the usual light-curve parameters, and “spec” is some spectroscopic indicator; $(\alpha, \beta, \gamma)$ are fitting constants. We study the following five models:

1. only a spectroscopic indicator is used [i.e. $\alpha = \beta = 0$],

2. both a spectroscopic indicator and a light-curve width parameter are used, but no color parameter (i.e. $\beta = 0$),

3. both a spectroscopic indicator and a color parameter are used, but no light-curve width parameter (i.e. $\alpha = 0$),

4. a spectroscopic indicator is used in addition to the light-curve width and color parameters,

5. both light-curve width and color parameters are used, but no spectroscopic indicator (i.e. $\gamma = 0$). We refer to this as the “standard” model.

We refer to the set of light-curve parameters and spectroscopic indicators in a given model as the “predictors” for that model, as is common practice in the field of statistics. We can evaluate the benefit of including a spectroscopic indicator (models 1-4) by comparing the resulting scatter of Hubble diagram residuals with that from the standard model (No. 5).
The paper is organized as follows: in § 2 we present our lightcurve fitting and training method, as well as a crossvalidation procedure to evaluate the impact of each spectroscopic indicator. We present the CfA data set in § 3. In § 4 we study the flux ratios of Bailey et al. (2009), while in § 5 we consider other spectroscopic indicators. We discuss the use of SN Ia spectra for distance measurements in § 6 and conclude in § 7.
2 Methodology
2.1 Lightcurve fitting
We use the SALT2 light-curve fitter of Guy et al. (2007) to determine the width and color parameters for each SN Ia in our sample. A model relating distance, apparent magnitude, and linear dependencies of the absolute magnitude is:
$$\mu = m_B - M + \alpha x_1 - \beta c - \gamma \mathcal{S}, \tag{2}$$
where $(x_1, c)$ are the SALT2 light-curve width and color parameters, and $\mathcal{S}$ is some spectroscopic indicator. The rest-frame peak apparent $B$-band magnitude is $m_B$, also obtained from the SALT2 fit to a supernova’s light curve. The distance modulus predicted from the light curve and spectral indicators is $\mu$, and the constant $M$ is a reference absolute magnitude. The distance modulus estimated from the redshift is $\mu(z) = 25 + 5\log_{10}[D_L(z)/\mathrm{Mpc}]$ under a fixed cosmology, where $D_L$ is the luminosity distance.
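To make the roles of the two distance moduli concrete, here is a minimal sketch of the linear model and of the redshift-based distance modulus. The sign convention for the spectroscopic term, the cosmological values (`h0`, `omega_m`) and all function names are assumptions made for illustration; they are not taken from the text.

```python
import numpy as np

C_KMS = 299792.458  # speed of light in km/s


def lum_dist_mpc(z, h0=70.0, omega_m=0.3, n_steps=1000):
    """Luminosity distance in Mpc for a flat Lambda-CDM model.

    h0 and omega_m are illustrative assumptions, not values from the paper.
    """
    zz = np.linspace(0.0, z, n_steps + 1)
    inv_e = 1.0 / np.sqrt(omega_m * (1.0 + zz) ** 3 + (1.0 - omega_m))
    dz = zz[1] - zz[0]
    # Trapezoidal rule for the integral of dz / E(z) from 0 to z.
    integral = dz * (inv_e.sum() - 0.5 * (inv_e[0] + inv_e[-1]))
    return (1.0 + z) * (C_KMS / h0) * integral


def mu_from_redshift(z):
    """Distance modulus from redshift: mu(z) = 25 + 5 log10(D_L / Mpc)."""
    return 25.0 + 5.0 * np.log10(lum_dist_mpc(z))


def mu_predicted(m_b, x1, c, spec, M, alpha, beta, gamma):
    """Linear model of Eq. 2 (assumed signs): m_B - M + alpha*x1 - beta*c - gamma*S."""
    return m_b - M + alpha * x1 - beta * c - gamma * spec


def hubble_residual(z, m_b, x1, c, spec, M, alpha, beta, gamma):
    """Predicted distance modulus minus the redshift-based one."""
    return mu_predicted(m_b, x1, c, spec, M, alpha, beta, gamma) - mu_from_redshift(z)
```

Setting `gamma=0` gives the standard model, while setting `alpha=0` and `beta=0` gives the model that uses a spectroscopic indicator only.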
We use the exact same SALT2 options as Guy et al. (2007) to fit the SN Ia light curves in our sample, and only trust the result when the following conditions are met: reduced ; at least one band point before +5 d from band maximum, and one after +10 d; at least 5  and band points in the age range d; finally, we impose a cut on the SALT2 parameter, namely . This last condition is equivalent to considering SN Ia in the range (i.e. subluminous 1991bglike SN Ia are excluded). We examined all the lightcurve fits by eye to ensure they were satisfactory given this set of conditions. Approximately 170 of the SN Ia with light curves from the CfA SN program pass these requirements.
2.2 Training
For estimating the coefficients of the model (training), we use a custom version of the luminosity distance fitter simple_cosfitter (http://qold.astro.utoronto.ca/conley/simple_cosfitter; A. Conley 2009, private communication) based on the Minuit function minimization package (James & Roos 1975). This code minimizes the following expression with respect to the parameters $(\alpha, \beta, \gamma, \mathcal{M})$:
$$\chi^2 = \sum_{s=1}^{N} \frac{\left[m_{B,s} - m_{\mathrm{pred},s}(z_s; \alpha, \beta, \gamma, \mathcal{M})\right]^2}{\sigma_s^2}, \tag{3}$$
where $m_{B,s}$ is the rest-frame peak apparent $B$-band magnitude of the $s$-th SN Ia, and $m_{\mathrm{pred},s}$ is the predicted peak apparent $B$-band magnitude, given by:
$$m_{\mathrm{pred},s} = 5\log_{10} D_L(z_s; w, \Omega_m, \Omega_\Lambda) - \alpha x_{1,s} + \beta c_s + \gamma \mathcal{S}_s + \mathcal{M}, \tag{4}$$
where $D_L$ is the luminosity distance at redshift $z_s$ for a given cosmological model described by the standard parameters $(w, \Omega_m, \Omega_\Lambda)$. Since our analysis only includes objects at low redshifts (), we do not solve for these parameters and simply assume a flat, cosmological-constant-dominated model with . The term $\mathcal{M}$ is a collection of constants including the reference $M$.
The variance $\sigma_s^2$ that appears in the denominator of Eq. 3 includes an error on the corrected magnitude (Eq. 2), $\sigma_{m,s}^2$, using the estimation error covariance of the light-curve parameters and spectroscopic indicators, a variance due to peculiar velocities [$\sigma_{\mathrm{pec},s}^2$, where we take the rms peculiar velocity km s$^{-1}$], and an intrinsic dispersion of SN Ia magnitudes, $\sigma_{\mathrm{int}}$:
$$\sigma_s^2 = \sigma_{m,s}^2 + \sigma_{\mathrm{pec},s}^2 + \sigma_{\mathrm{int}}^2, \tag{5}$$
where $\sigma_{\mathrm{int}}$ is adjusted iteratively until the reduced $\chi^2$ of the fit is unity [typically mag for the standard model]. For any particular model, the intrinsic variance accounts for deviations in magnitude in the Hubble diagram beyond that explained by measurement error or random peculiar velocities, and hence represents a floor to how accurately the model can predict distances. To limit the impact of the peculiar velocity error we restrict our analysis to SN Ia at redshifts $z > 0.015$ ( mag). Of the 170 SN Ia with satisfactory SALT2 fits, 114 are at redshifts greater than 0.015.
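The training step can be sketched as a weighted linear least-squares fit wrapped in a one-dimensional search for the intrinsic dispersion. This is an illustration only: it is not simple_cosfitter or Minuit, it treats the measurement and peculiar-velocity variances as fixed inputs (in practice the measurement term can depend on the fitted coefficients), and all names and the synthetic data are assumptions.

```python
import numpy as np


def fit_weighted(design, y, var):
    """Weighted linear least squares; returns coefficients and chi^2."""
    sw = 1.0 / np.sqrt(var)
    coeffs, *_ = np.linalg.lstsq(design * sw[:, None], y * sw, rcond=None)
    chi2 = np.sum((y - design @ coeffs) ** 2 / var)
    return coeffs, chi2


def train(design, y, var_fixed, tol=1e-6):
    """Fit the coefficients, adjusting sigma_int until reduced chi^2 = 1."""
    dof = len(y) - design.shape[1]

    def reduced_chi2(sigma_int):
        return fit_weighted(design, y, var_fixed + sigma_int ** 2)[1] / dof

    lo, hi = 0.0, 1.0
    if reduced_chi2(lo) <= 1.0:
        sigma_int = 0.0  # errors already explain the scatter
    else:
        while reduced_chi2(hi) > 1.0:  # bracket the solution
            hi *= 2.0
        while hi - lo > tol:  # bisection; reduced chi^2 decreases with sigma_int
            mid = 0.5 * (lo + hi)
            if reduced_chi2(mid) > 1.0:
                lo = mid
            else:
                hi = mid
        sigma_int = 0.5 * (lo + hi)
    coeffs, chi2 = fit_weighted(design, y, var_fixed + sigma_int ** 2)
    return coeffs, sigma_int, chi2 / dof


# Synthetic example (all values are made up for illustration).
rng = np.random.default_rng(0)
n = 200
x1 = rng.normal(size=n)
c = rng.normal(scale=0.1, size=n)
spec = rng.normal(size=n)
# Columns ordered so that the coefficients are (alpha, beta, gamma, script-M),
# with y = m_B - 5 log10 D_L(z) as in Eq. 4.
design = np.column_stack([-x1, c, spec, np.ones(n)])
truth = np.array([0.12, 2.8, 0.5, -3.5])
var_fixed = np.full(n, 0.02 ** 2 + 0.05 ** 2)  # measurement + peculiar velocity
y = design @ truth + rng.normal(scale=np.sqrt(var_fixed + 0.15 ** 2))
coeffs, sigma_int, red_chi2 = train(design, y, var_fixed)
```

In this synthetic case the recovered `sigma_int` should land close to the injected 0.15 mag scatter.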
2.3 Crossvalidation
We consider several models described by Eq. 2 that use different subsets of the predictors $(x_1, c, \mathcal{S})$. If we train a model on the data of all the SN in the sample to estimate the coefficients $(\alpha, \beta, \gamma)$, we can evaluate the fit of the model by computing the training error, e.g. the mean squared distance modulus residual over all SN in the training set.
For finite samples, the average Hubble diagram residual of the training set SN is an optimistic estimate of the ability of the statistical model, Eq. 2, to make accurate predictions given the supernova observables. This is because it uses the supernova data twice: first for estimating the model parameters (training), and second in evaluating the residual error. Hence, the training set residuals underestimate the prediction error, which is the expected error in estimating the distance of a SN that was not originally in the finite training set. We refer to these data as “out-of-sample”. Furthermore, with a fixed, finite, and noisy training data set, it is always possible to reduce the residual, or training, error of the fit by introducing more predictors to the model. However, this may lead to overfitting, in which apparently significant predictors are found in noisy data, even though in reality there was no trend. These relationships are sensitive to the finite training set and would not generalize to out-of-sample cases. To evaluate predictive performance and guard against overfitting with a statistical model based on finite data, we should estimate the prediction error for out-of-sample cases. To do so, we use a cross-validation (CV) procedure to evaluate the impact of using a spectroscopic indicator $\mathcal{S}$, alone and in conjunction with standard light-curve parameters, on the accuracy of distance predictions in the Hubble diagram.
Cross-validation seeks to estimate prediction error and to test the sensitivity of the trained statistical model to the data set by partitioning the full data set into smaller subsets. One subset is held out for testing predictions of the model, while its complement is used to train the model. This process is repeated over partitions of the full data set. This method avoids using the same data simultaneously for training the model and for estimating its prediction error. Cross-validation was used before for statistical modeling of SN Ia by Mandel et al. (2009), who applied the .632 bootstrap method to evaluate distance predictions for SN Ia using near-infrared light curves. A careful implementation of a cross-validation method is particularly important for small samples, as is the case in this paper (e.g. 26 SN Ia at maximum light; see § 4.3).
In this paper, the cross-validation method we use is known as $K$-fold CV. The idea is to divide our SN Ia sample into $K$ subsets, train a given model on $K-1$ subsets, and validate it on the remaining subset. This procedure is repeated $K$ times, at which point all SN Ia have been part of a validation set once. Typical choices of $K$ are 5 or 10 (e.g., Hastie et al. 2009). The case $K = N$, where $N$ is the number of SN Ia in our sample, is known as “leave-one-out” CV. In this case, each SN Ia in turn is used as a validation set, and the training is repeated $N$ times on $N-1$ SN Ia.
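In practice, we run $K$-fold CV as follows:

1. The sample of $N$ SN Ia is randomly divided into $K$ subsets of equal size (when $N$ is not a multiple of $K$, the number of SN Ia in any two subsets differs by at most one).

2. Looping over each fold $k = 1, \ldots, K$:

   (a) all the SN Ia in the $k$-th subset are removed from the sample: they form the validation set. The remaining SN Ia define the training set.

   (b) the model is trained on the training set (§ 2.2) to determine the best-fit parameters $(\hat{\alpha}, \hat{\beta}, \hat{\gamma}, \hat{\mathcal{M}})$.

   (c) using this set of parameters we predict the magnitudes of the SN Ia in the validation set (indexed $j$):
   $$m_{\mathrm{pred},j} = 5\log_{10} D_L(z_j) - \hat{\alpha} x_{1,j} + \hat{\beta} c_j + \hat{\gamma} \mathcal{S}_j + \hat{\mathcal{M}}. \tag{6}$$
   The Hubble residual, or error, of the predicted distance modulus is then
   $$\Delta\mu_j = m_{B,j} - m_{\mathrm{pred},j}. \tag{7}$$

3. When the magnitude or distance of each SN Ia has been predicted once using the above scheme, we analyze the prediction errors (§ 2.4). When doing so, we check that the sets of best-fit $(\alpha, \beta, \gamma)$ are consistent amongst all training sets.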
For all the spectroscopic indicators we consider in this paper, we run $K$-fold CV with , and to make sure our results are not sensitive to the exact choice of $K$ (the impact on the weighted rms of prediction Hubble residuals is mag). Moreover, we run each $K$-fold CV 10 times to check the outcome is insensitive to how the starting SN Ia sample is divided into $K$ subsets (the impact on the weighted rms of prediction Hubble residuals is mag). In what follows we report our results based on .
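The $K$-fold procedure of this section can be sketched in a few lines. This is a simplified illustration, assuming a model that is linear in its coefficients and fixed per-object variances; the function names and the noise-free synthetic data are hypothetical.

```python
import numpy as np


def kfold_indices(n, k, rng):
    """Randomly split indices 0..n-1 into k subsets whose sizes differ by at most one."""
    return np.array_split(rng.permutation(n), k)


def fit_wls(design, y, var):
    """Weighted linear least-squares coefficients."""
    sw = 1.0 / np.sqrt(var)
    coeffs, *_ = np.linalg.lstsq(design * sw[:, None], y * sw, rcond=None)
    return coeffs


def kfold_residuals(design, y, var, k, rng):
    """Out-of-sample residual for every object, each predicted exactly once."""
    n = len(y)
    resid = np.empty(n)
    for val_idx in kfold_indices(n, k, rng):
        train = np.ones(n, dtype=bool)
        train[val_idx] = False  # hold out the validation set
        coeffs = fit_wls(design[train], y[train], var[train])
        resid[val_idx] = y[val_idx] - design[val_idx] @ coeffs
    return resid


# Noise-free synthetic data: a correct linear model must predict it exactly.
rng = np.random.default_rng(1)
n = 26
design = np.column_stack([rng.normal(size=n), rng.normal(size=n), np.ones(n)])
y = design @ np.array([0.1, 3.0, -19.0])
var = np.full(n, 0.02 ** 2)
resid = kfold_residuals(design, y, var, k=10, rng=rng)
```

Choosing `k` equal to the sample size turns the same function into leave-one-out cross-validation.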
2.4 Comparing model predictions
For each model, which we label by its predictors, e.g. $(x_1, c, \mathcal{S})$, cross-validation gives us a prediction error $\Delta\mu_s$ for each SN $s$. To summarize the total dispersion of predictions, we compute the weighted mean squared error,
$$\mathrm{WMSE} = \frac{\sum_s w_s\, \Delta\mu_s^2}{\sum_s w_s}, \tag{8}$$
the square root of which is the weighted rms (WRMS). We weight the contribution from each SN by the inverse of its expected total variance (the precision), $w_s$. We prefer to use the rms of the prediction residuals rather than the sample standard deviation, since the former measures the average squared deviation of the distance prediction from the Hubble distance $\mu(z)$, whereas the latter measures the average squared deviation of prediction errors from the mean prediction error. Note that the mean squared error is equal to the sample variance plus the square of the mean error. Thus, the mean squared error will be larger than the sample variance if the mean error, or bias, is significant, but the two statistics will be the same if it is not. Since the mean prediction error is not guaranteed to be zero, we use the WRMS statistic to assess the total dispersion of distance prediction errors. We also estimate the sampling variance of this statistic (see Appendix A).
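A short numerical sketch of the weighted rms, and of the identity between the mean squared error and the variance plus the squared mean error, may help here. The residuals and variances are made-up numbers, and the function names are illustrative.

```python
import numpy as np


def weighted_mse(resid, var):
    """Weighted mean squared error with weights w = 1 / variance."""
    w = 1.0 / np.asarray(var, dtype=float)
    return np.sum(w * np.asarray(resid, dtype=float) ** 2) / np.sum(w)


def weighted_rms(resid, var):
    """Square root of the weighted mean squared error."""
    return np.sqrt(weighted_mse(resid, var))


def weighted_mean_and_variance(resid, var):
    """Weighted mean error (bias) and weighted variance about that mean."""
    w = 1.0 / np.asarray(var, dtype=float)
    resid = np.asarray(resid, dtype=float)
    mean = np.sum(w * resid) / np.sum(w)
    variance = np.sum(w * (resid - mean) ** 2) / np.sum(w)
    return mean, variance


resid = np.array([0.1, -0.2, 0.3])   # prediction errors (mag), made up
var = np.array([0.01, 0.04, 0.01])   # expected total variances (mag^2), made up
wmse = weighted_mse(resid, var)
mean, variance = weighted_mean_and_variance(resid, var)
# wmse equals variance + mean**2, so a non-zero bias inflates the WRMS
# relative to the weighted standard deviation.
```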
The WRMS measures the total dispersion in the Hubble diagram. However, we expect that some of that scatter is due to random peculiar velocities [influencing $\mu(z_s)$ with variance $\sigma_{\mathrm{pec},s}^2$], and some due to measurement error ($\sigma_{m,s}^2$). Using the cross-validated distance errors, we also estimate how precisely we can expect a particular model to predict the distance to a SN Ia when these other sources of error are negligible. We call this variance estimate the rms intrinsic prediction error, a property of the model itself, and label it $\sigma_{\mathrm{pred}}$. Intuitively, this is the result of subtracting from the total dispersion the expected contributions of peculiar velocities and measurement uncertainties. It is similar to the intrinsic variance $\sigma_{\mathrm{int}}^2$ discussed in § 2.2, in that it represents a floor to how accurately the model can predict distances. It is not strictly equivalent, however, since $\sigma_{\mathrm{int}}$ is adjusted during the training process so that the reduced $\chi^2$ is unity, while $\sigma_{\mathrm{pred}}$ is estimated using the cross-validated distance modulus prediction errors. In Appendix B, we describe a maximum likelihood estimate for $\sigma_{\mathrm{pred}}$ and its standard error from the set of distance predictions.
We are also interested in the intrinsic covariance of the distance prediction errors generated by two different models. Imagine that peculiar velocities and measurement error were negligible, and model $P$ and model $Q$ predict distances to the same set of SN Ia. We calculate the prediction errors, $\Delta\mu^P$ and $\Delta\mu^Q$, from each model. There is a positive intrinsic covariance if $\Delta\mu^P$ tends to be positive when $\Delta\mu^Q$ is positive, and a negative intrinsic covariance if they tend to make errors in opposite directions. The intrinsic correlation is important because it suggests how useful it would be to combine the distance predictions of two models. If two models tend to make prediction errors in the same direction (positive correlation), then the combined model is not likely to do much better than the most accurate of the two original models. However, if two models tend to make prediction errors that are wrong in different ways (zero or negative correlation), then we expect to see a gain from averaging the two models.
Even if two models make prediction errors that are intrinsically uncorrelated, random peculiar velocities will tend to induce a positive correlation in the realized errors if the methods are used on the same set of SN. This is because the unknown peculiar velocity for a given SN is the same regardless of the model we use to generate its distance prediction. Hence, the expected contribution of random peculiar velocities to the sample covariance of predictions must be removed to estimate the intrinsic covariance between two models. In Appendix B, we describe a maximum likelihood estimator for the intrinsic covariance and its standard error using the set of distance predictions.
We use the maximum likelihood estimation method to estimate the intrinsic prediction error and intrinsic covariance of each model compared to the reference model that uses only light curve information.
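The idea of removing the expected peculiar-velocity and measurement contributions can be illustrated with a simple method-of-moments sketch. This is not the maximum likelihood estimator of Appendix B: it assumes zero-mean prediction errors, measurement errors that are independent between the two models, and a peculiar-velocity term shared by both. All names and numbers are hypothetical.

```python
import numpy as np


def intrinsic_variance(resid, var_pec, var_meas):
    """Mean squared error minus the mean expected non-intrinsic variance."""
    excess = np.mean(resid ** 2) - np.mean(var_pec + var_meas)
    return max(excess, 0.0)  # a variance cannot be negative


def intrinsic_covariance(resid_p, resid_q, var_pec):
    """Mean cross-product of errors minus the shared peculiar-velocity variance."""
    return np.mean(resid_p * resid_q) - np.mean(var_pec)


# Made-up prediction errors (mag) from two models for the same four SN Ia.
resid_p = np.array([0.2, -0.2, 0.1, -0.1])
resid_q = np.array([0.2, -0.1, 0.1, -0.2])
var_pec = np.full(4, 0.0025)   # peculiar-velocity variance (mag^2)
var_meas = np.full(4, 0.0025)  # measurement variance (mag^2)

cov_int = intrinsic_covariance(resid_p, resid_q, var_pec)
var_int_p = intrinsic_variance(resid_p, var_pec, var_meas)
```

Because the same peculiar velocity enters both residuals, the raw cross-product is biased upward by the mean of `var_pec`, which is why it is subtracted.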
3 Spectroscopic data
We have used a large spectroscopic data set obtained through the CfA Supernova Program. Since 1994, we have obtained optical spectra of low-redshift () SN Ia with the 1.5 m Tillinghast telescope at FLWO using the FAST spectrograph (Fabricant et al. 1998). Several spectra were published in studies of specific supernovae (e.g., SN 1998bu; Jha et al. 1999), while 432 spectra of 32 SN Ia have recently been published by Matheson et al. (2008). We also have complementary multi-band optical photometry for a subset of SN Ia (Riess et al. 1999; Jha et al. 2006; Hicken et al. 2009a), as well as NIR photometry for the brighter ones (Wood-Vasey et al. 2008). All published data are available via the CfA Supernova Archive (http://www.cfa.harvard.edu/supernova/SNarchive.html).
All the spectra were obtained with the same telescope and instrument, and reduced in a consistent manner (see Matheson et al. 2008 for details). The uniformity of this data set is unique and enables an accurate estimate of our measurement errors.
4 Spectral flux ratios
4.1 Measurements
Bailey et al. (2009) introduced a new spectroscopic indicator, calculated as the ratio of fluxes in two wavelength regions of a SN Ia spectrum binned on a logarithmic wavelength scale. This ratio, noted $\mathcal{R}(X/Y)$ [$X$ and $Y$ being the rest-frame wavelength coordinates in Å of a given bin center], is measured on a de-redshifted spectrum corrected for Galactic reddening using the Cardelli et al. (1989) extinction law with in combination with the dust maps of Schlegel et al. (1998). A color-corrected version of this flux ratio, noted $\mathcal{R}^c(X/Y)$, is measured on a spectrum additionally corrected for the SALT2 color parameter using the color law of Guy et al. (2007). Figure 1 illustrates both measurements.
We use the same binning as Bailey et al. (2009), namely 134 bins equally spaced in $\ln\lambda$ between 3500 Å and 8500 Å (rest frame), although most of the CfA spectra used here do not extend beyond Å (see § 4.3). The resulting $\sim$2000 km s$^{-1}$ bin size is significantly less than the typical width of a SN Ia feature ( km s$^{-1}$). The error on $\mathcal{R}^c$ includes a flux error (from the corresponding variance spectrum), an error due to the relative flux calibration accuracy (see § 4.2), and an error due to the SALT2 color precision. When there are several spectra of a given SN Ia within d of the age we consider (see § 4.3 for spectra at maximum light; § 4.4 for spectra at other ages), we use the error-weighted mean and standard deviation of all flux ratios as our measurement and error, respectively. Bailey et al. (2009) also chose d in their analysis, and we find that increasing this age window worsens the results while decreasing it leads to too small a sample.
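The binning and the flux-ratio measurement can be sketched as follows. The 134 logarithmic bins between 3500 Å and 8500 Å come from the text; averaging the flux inside each bin, using the geometric bin center, and the power-law test spectrum are assumptions made for illustration, as are all function names.

```python
import numpy as np

C_KMS = 299792.458  # speed of light in km/s


def log_bin_edges(lam_min=3500.0, lam_max=8500.0, n_bins=134):
    """Bin edges equally spaced in ln(wavelength)."""
    return np.exp(np.linspace(np.log(lam_min), np.log(lam_max), n_bins + 1))


def bin_velocity_width(lam_min=3500.0, lam_max=8500.0, n_bins=134):
    """Bin width in velocity units: c times the step in ln(wavelength)."""
    return C_KMS * np.log(lam_max / lam_min) / n_bins


def bin_spectrum(wave, flux, edges):
    """Mean flux in each bin (NaN where a bin contains no pixels)."""
    n_bins = len(edges) - 1
    idx = np.digitize(wave, edges) - 1
    binned = np.full(n_bins, np.nan)
    for i in range(n_bins):
        in_bin = idx == i
        if in_bin.any():
            binned[i] = flux[in_bin].mean()
    return binned


def flux_ratio(binned, edges, lam_x, lam_y):
    """Ratio of binned fluxes in the bins whose centers are closest to X and Y."""
    centers = np.sqrt(edges[:-1] * edges[1:])  # geometric bin centers
    ix = np.argmin(np.abs(centers - lam_x))
    iy = np.argmin(np.abs(centers - lam_y))
    return binned[ix] / binned[iy]


edges = log_bin_edges()
wave = np.linspace(3500.0, 8500.0, 5001)   # 1 Angstrom sampling
flux = (wave / 5000.0) ** -1.0             # toy power-law spectrum
binned = bin_spectrum(wave, flux, edges)
ratio = flux_ratio(binned, edges, 6630.0, 4400.0)
dv = bin_velocity_width()                  # close to 2000 km/s
```

The value of `dv` follows directly from the stated binning, which is why the bin size is quoted as roughly 2000 km s$^{-1}$.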
Bailey et al. (2009) cross-checked the results for their best single flux ratio using the sample of SN Ia spectra published by Matheson et al. (2008) [and available through the CfA SN Archive]. We checked the validity of our flux ratio measurements by comparing the values of in the Matheson et al. (2008) sample with those reported in Table 2 of Bailey et al. (2009). In all cases, our measurements agree well within the errors. This also holds for SN 1998bu, accidentally removed from the Matheson et al. (2008) sample by Bailey et al. (2009) [H. Fakhouri 2010, private communication]. We note that we were unable to cross-check the flux ratio measurements of Bailey et al. (2009) in a similar fashion, since none of their 58 SN Ia spectra are publicly available.
4.2 Impact of relative flux calibration and SALT2 color
When the $X$ and $Y$ wavelength bins have a large separation ( Å), $\mathcal{R}(X/Y)$ is essentially a color measurement. We therefore expect flux ratios to be sensitive to the relative flux calibration accuracy of the spectra. Fig. 2 shows the relation between uncorrected Hubble residuals [i.e. $m_B - M - \mu(z)$] and our most highly-ranked flux ratio at maximum light (see § 4.3). There is one data point per SN Ia, color-coded according to the absolute difference in color at maximum light derived from the spectrum and that derived from the photometry, which we use as a proxy for relative flux calibration accuracy. The bulk of the sample defines a highly correlated relation (dashed line), with several outliers all having a color difference larger than 0.1 mag. We therefore restrict our analysis to SN Ia with spectra that have a relative flux calibration better than 0.1 mag.
Bailey et al. (2009) noted that the highly-reddened SN 1999cl was a large outlier in their analysis, and attributed this to the non-standard nature of the extinction towards this SN (; Krisciunas et al. 2006). To explore the effects of reddening, in Fig. 3 (left), we show the relation between uncorrected Hubble residual and $\mathcal{R}(6630/4400)$ at maximum light, for SN Ia at redshifts that satisfy our requirement on the relative flux calibration accuracy. Using this lower redshift bound has the effect of including several highly-reddened SN Ia (including SN 1999cl; see Fig. 4), which are otherwise excluded based on the redshift cut we use elsewhere in this paper ($z > 0.015$). For SN Ia with , $\mathcal{R}(6630/4400)$ is highly correlated with uncorrected Hubble residuals, but those with red colors () tend to deviate significantly from this relation (dashed line; this is not the case for SN 1995E, for which ), the two largest outliers corresponding to the reddest SN Ia (SN 1999cl and SN 2006X). Both are subject to high extinction by non-standard dust in their respective host galaxies ( mag for ; Krisciunas et al. 2006; Wang et al. 2008) and display time-variable Na I D absorption, whose circumstellar or interstellar origin is still debated (Patat et al. 2007; Blondin et al. 2009). The reddening curves in Fig. 3 (dotted lines) seem to corroborate the fact that the non-linear increase of flux ratios at high values of the SALT2 color parameter is mainly due to reddening by dust with low $R_V$. Nonetheless, SN 1999cl still stands out in this respect as it would require a value of $R_V$ inconsistent with that found by Krisciunas et al. (2006). Moreover, while we obtain consistent $R_V$ estimates for SN 2006X using other flux ratios, this is not the case for SN 2006br, for which some flux ratios are consistent with .
The right panel of Fig. 3 shows the relation between color-corrected Hubble residual [i.e. $m_B - M - \beta c - \mu(z)$] and our most highly-ranked color-corrected flux ratio at maximum light (see § 4.3) for the same sample. SN Ia with a SALT2 color are again outliers. As noted by Bailey et al. (2009), this shows that a single color parameter cannot encompass the variety of SN Ia intrinsic colors and extinction by non-standard dust. We therefore impose a cut on SALT2 color in our analysis, only considering SN Ia with . Four of the five SNe with in Fig. 3 are rejected anyway based on our redshift cut. The remaining one, SN 2006br, is then rejected based on our color cut.
4.3 Results on Maximumlight Spectra
4.3.1 Selecting the best flux ratios
After selecting SN Ia that satisfy both requirements on relative flux calibration accuracy and SALT2 color parameter, we are left with 26 SN Ia at $z > 0.015$ with spectra within d from maximum light (see Table 1, where we also present selected flux ratio measurements). The spectra show no sign of significant contamination by host-galaxy light, which can also bias the flux ratio measurements. We make no cut based on the signal-to-noise ratio (S/N) of our spectra, as they are generally well in excess of 100 per log-wavelength bin. We only consider flux ratios for wavelength bins represented in all the spectra. This leads to 98 bins between Å and Å, i.e. $98 \times 97 = 9506$ independent flux ratios.
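Table 1. SN Ia sample at maximum light, with selected flux ratio measurements (uncertainties in parentheses).
SN  $z$  $m_B$  $x_1$  $c$  selected flux ratios (five columns)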
1998V  0.0170  15.085 (0.020)  (0.161)  (0.015)  0.330 (0.005)  0.744 (0.006)  1.045 (0.004)  1.485 (0.006)  0.933 (0.004) 
1998dx  0.0539  17.536 (0.037)  (0.457)  (0.027)  0.365 (0.018)  0.949 (0.027)  1.086 (0.024)  1.445 (0.033)  0.946 (0.018) 
1998eg  0.0237  16.096 (0.016)  (0.366)  (0.019)  0.378 (0.023)  0.920 (0.022)  1.070 (0.011)  1.478 (0.013)  0.995 (0.007) 
1999aa  0.0152  14.698 (0.009)  (0.073)  (0.009)  0.286 (0.027)  0.704 (0.025)  1.054 (0.012)  1.448 (0.011)  0.994 (0.006) 
1999cc  0.0316  16.760 (0.010)  (0.175)  (0.012)  0.406 (0.013)  0.994 (0.017)  1.001 (0.015)  1.513 (0.019)  0.850 (0.011) 
1999ek  0.0176  15.587 (0.009)  (0.127)  (0.010)  0.478 (0.039)  0.936 (0.036)  1.091 (0.016)  1.461 (0.014)  0.952 (0.007) 
1999gd  0.0191  16.940 (0.022)  (0.193)  (0.022)  0.714 (0.013)  0.929 (0.016)  1.081 (0.024)  1.455 (0.020)  1.028 (0.015) 
2000dk  0.0165  15.347 (0.021)  (0.301)  (0.022)  0.423 (0.017)  0.933 (0.017)  1.038 (0.009)  1.425 (0.010)  1.006 (0.006) 
2000fa  0.0218  15.883 (0.023)  (0.127)  (0.018)  0.410 (0.046)  0.829 (0.042)  1.082 (0.020)  1.459 (0.017)  0.902 (0.009) 
2001eh  0.0363  16.575 (0.018)  (0.222)  (0.017)  0.312 (0.047)  0.799 (0.043)  1.040 (0.020)  1.451 (0.019)  0.931 (0.010) 
2002ck  0.0302  16.303 (0.048)  (0.147)  (0.023)  0.348 (0.046)  0.752 (0.042)  1.030 (0.020)  1.463 (0.017)  0.945 (0.010) 
2002hd  0.0360  16.738 (0.038)  (0.456)  (0.022)  0.403 (0.027)  0.748 (0.027)  0.955 (0.018)  1.378 (0.020)  0.899 (0.013) 
2002hu  0.0359  16.587 (0.012)  (0.143)  (0.012)  0.293 (0.027)  0.768 (0.029)  1.048 (0.021)  1.509 (0.029)  0.984 (0.017) 
2002jy  0.0187  15.702 (0.019)  (0.212)  (0.015)  0.305 (0.015)  0.795 (0.015)  1.073 (0.010)  1.566 (0.012)  0.963 (0.007) 
2002kf  0.0195  15.654 (0.033)  (0.189)  (0.023)  0.361 (0.035)  0.953 (0.035)  1.132 (0.024)  1.550 (0.024)  0.969 (0.014) 
2003U  0.0279  16.471 (0.046)  (0.558)  (0.035)  0.373 (0.009)  0.891 (0.016)  1.048 (0.016)  1.445 (0.025)  0.965 (0.014) 
2003ch  0.0256  16.659 (0.022)  (0.297)  (0.019)  0.402 (0.025)  1.024 (0.024)  1.218 (0.010)  1.570 (0.009)  1.043 (0.005) 
2003it  0.0240  16.342 (0.028)  (0.359)  (0.029)  0.432 (0.011)  0.918 (0.011)  1.087 (0.007)  1.456 (0.009)  0.950 (0.005) 
2003iv  0.0335  16.961 (0.026)  (0.486)  (0.028)  0.420 (0.005)  1.092 (0.011)  1.081 (0.011)  1.467 (0.015)  0.940 (0.009) 
2004as  0.0321  16.956 (0.018)  (0.206)  (0.016)  0.415 (0.006)  0.838 (0.008)  1.065 (0.009)  1.573 (0.011)  0.975 (0.007) 
2005ki  0.0208  15.551 (0.029)  (0.153)  (0.026)  0.398 (0.032)  0.975 (0.030)  1.091 (0.015)  1.441 (0.015)  0.957 (0.009) 
2006ax  0.0180  15.010 (0.010)  (0.062)  (0.009)  0.304 (0.019)  0.851 (0.018)  1.082 (0.008)  1.558 (0.009)  1.009 (0.005) 
2006gj  0.0277  17.668 (0.033)  (0.280)  (0.023)  0.675 (0.008)  0.988 (0.012)  1.011 (0.011)  1.377 (0.014)  0.930 (0.009) 
2006sr  0.0232  16.126 (0.017)  (0.220)  (0.015)  0.422 (0.019)  0.939 (0.018)  1.031 (0.010)  1.474 (0.010)  0.913 (0.006) 
2007ca  0.0151  15.933 (0.013)  (0.122)  (0.012)  0.547 (0.028)  0.895 (0.026)  1.097 (0.012)  1.569 (0.012)  0.986 (0.006) 
2008bf  0.0257  15.703 (0.010)  (0.095)  (0.010)  0.315 (0.038)  0.806 (0.035)  1.047 (0.016)  1.526 (0.013)  0.924 (0.007) 
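We run the $K$-fold cross-validation procedure outlined in § 2.3, and consider the five models for estimating distances to SN Ia described in § 1:
$$\mu = m_B - M - \gamma \mathcal{R} \tag{9}$$
$$\mu = m_B - M + \alpha x_1 - \gamma \mathcal{R} \tag{10}$$
$$\mu = m_B - M - \beta c - \gamma \mathcal{R}^c \tag{11}$$
$$\mu = m_B - M + \alpha x_1 - \beta c - \gamma \mathcal{R}^c \tag{12}$$
$$\mu = m_B - M + \alpha x_1 - \beta c \tag{13}$$
When no color correction is involved (Eqs. 9-10), we use the uncorrected flux ratio $\mathcal{R}$. When a color correction is involved (the $\beta c$ term in Eqs. 11-12), we use the color-corrected version of the flux ratio $\mathcal{R}^c$. Using $\mathcal{R}$ in combination with color, or $\mathcal{R}^c$ alone or in combination with $x_1$, severely degrades the predictive power of the model, so we do not report results using $(c, \mathcal{R})$; $\mathcal{R}^c$ alone; or $(x_1, \mathcal{R}^c)$.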
We rank the flux ratios in each case based on the intrinsic prediction error ($\sigma_{\mathrm{pred}}$; see § 2.4), but note that ranking based on the weighted rms of prediction Hubble residuals makes almost no difference. The results for the top five flux ratios are displayed in Table 2. We also report the best-fit $\gamma$, the weighted rms of prediction Hubble residuals (WRMS), the intrinsic correlation of residuals with those found using the standard $(x_1, c)$ predictors (noted $\rho_{x_1,c}$; see § 2.4), and the difference in intrinsic prediction error with respect to the standard $(x_1, c)$ model, noted $\Delta_{x_1,c}$. Since we compute the error on $\Delta_{x_1,c}$ (see Appendix B), we also report the significance of this difference with respect to the standard $(x_1, c)$ predictors. This is a direct measure of whether a particular model predicts more accurate distances to SN Ia when compared to the standard approach, and if so how significant the improvement is. Fig. 5 shows the resulting Hubble diagram residuals vs. redshift for the best flux ratio in each of the four models given by Eqs. 9-12, and using the standard $(x_1, c)$ predictors.
Rank  WRMS  
1  6630  4400  
2  6630  4430  
3  6630  4670  
4  6900  4460  
5  6420  4430  
1  6630  4400  
2  6630  4430  
3  6900  4460  
4  6900  4370  
5  6990  4370  
1  6420  5290  
2  6630  4890  
3  4890  6630  
4  4890  6810  
5  6540  4890  
1  5690  5360  
2  5360  5690  
3  5660  5290  
4  5690  5290  
5  5290  5660  
All the flux ratios listed in Table 2 lead to an improvement over the standard correction (i.e. ), as found by Bailey et al. (2009), but the significance is low: for only; for ; for and . This is in part due to the small number of SN Ia in our sample. Note that in all cases, i.e. the models that include a flux ratio tend to make prediction errors in the same direction as , and we do not expect to gain much by combining these models.
Using the best single flux ratio by itself reduces the weighted rms of prediction residuals (as well as the intrinsic prediction error, ) by % when compared with [ mag cf. mag], although as noted above the significance of the difference in intrinsic prediction error is negligible ( mag, or ).
Using in combination with leads to no improvement over using alone (although this is not reported by Bailey et al. 2009, it is consistent with their findings; S. Bailey 2009, private communication), and even leads to systematically worse results. Our best single flux ratio yields a difference in intrinsic prediction error with respect to of mag, when used on its own, while it yields mag when combined with . These differences are statistically indistinguishable from one another given the size of the error on , but they are systematic regardless of the flux ratio we consider.
This seems counterintuitive, as one might expect that including an additional predictor would result in more accurate distance predictions. However, this is not necessarily the case under cross-validation. The reason is that by itself is a poor predictor of Hubble residuals, and one does not gain anything by combining it with . This is not surprising, as the relation between light-curve width and luminosity is only valid if the SN Ia are corrected for color or extinction by dust beforehand. In fact, by itself accounts for most of the variation in Hubble residuals. When we cross-validate, the extra coefficient will tend to fit some noise in a given training set, and this relation will not generalize to the validation set. This results in an increase in prediction error because the added information is not useful. We see from Table 2 that adding affects the best-fit value for [ cf. for only]; moreover, we obtain when using whereas when using , which again shows that is fitting noise when is combined with . This illustrates the advantage of using cross-validation in guarding against overfitting noise as more parameters and potential predictors are added.
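The contrast between fitting all objects at once and predicting held-out objects can be illustrated with a toy experiment. The sketch below rests on assumptions not taken from the paper: the data are synthetic, the fit is ordinary least squares, and leave-one-out is used as a simple stand-in for the cross-validation scheme of § 2.3. The names `useful` and `useless` are hypothetical predictors, the second of which carries no information about the response.

```python
import numpy as np

def design(*columns):
    """Stack predictors into a design matrix with an intercept column."""
    n = len(columns[0])
    return np.column_stack([np.ones(n), *columns])

def resubstitution_rss(A, y):
    """Residual sum of squares when fitting and evaluating on the same data."""
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return float(np.sum((y - A @ coef) ** 2))

def loo_rss(A, y):
    """Sum of squared leave-one-out prediction residuals."""
    n = len(y)
    total = 0.0
    for i in range(n):
        keep = np.arange(n) != i
        coef, *_ = np.linalg.lstsq(A[keep], y[keep], rcond=None)
        total += float((y[i] - A[i] @ coef) ** 2)
    return total

rng = np.random.default_rng(42)
n = 26
useful = rng.normal(size=n)     # informative predictor (synthetic)
useless = rng.normal(size=n)    # pure noise, unrelated to y
y = 0.3 * useful + rng.normal(scale=0.15, size=n)

A1 = design(useful)
A2 = design(useful, useless)
print("resubstitution:", resubstitution_rss(A1, y), resubstitution_rss(A2, y))
print("leave-one-out: ", loo_rss(A1, y), loo_rss(A2, y))
```

For nested least-squares models the resubstitution sum of squares can never increase when a predictor is added, whereas the leave-one-out value carries no such guarantee and will often, though not always, get worse when the extra predictor is uninformative.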
Figure 6 (upper panel) shows why alone is a good predictor of Hubble residuals. Its strong correlation with SALT2 color (Pearson correlation coefficient ) shows that this ratio is essentially a color measurement. The correlation with is less pronounced (), but this is largely due to a small number of outliers: removing the three largest outliers results in a Pearson correlation coefficient . The flux ratio by itself is thus as useful a predictor as and combined.
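The sensitivity of a Pearson coefficient to a handful of outliers, as in the outlier-removal test described above, can be reproduced on made-up data. The sketch below assumes that "largest outliers" means the points with the largest absolute residuals from a straight-line fit; the paper does not specify its criterion here, and the helper `drop_largest_outliers` and all numbers are hypothetical.

```python
import numpy as np

def pearson_r(x, y):
    """Pearson correlation coefficient of two 1-D arrays."""
    return float(np.corrcoef(x, y)[0, 1])

def drop_largest_outliers(x, y, n_drop):
    """Remove the n_drop points with the largest absolute residuals
    from a straight-line least-squares fit (an assumed criterion)."""
    slope, intercept = np.polyfit(x, y, 1)
    resid = np.abs(y - (slope * x + intercept))
    keep = np.argsort(resid)[: len(x) - n_drop]
    return x[keep], y[keep]

# Synthetic anticorrelated data with three displaced points.
x = np.linspace(-1.0, 1.0, 26)
y = -0.5 * x + 0.02 * np.sin(7.0 * np.arange(26))
y[[0, 12, 25]] += 1.0

r_all = pearson_r(x, y)
x_clean, y_clean = drop_largest_outliers(x, y, 3)
r_clean = pearson_r(x_clean, y_clean)
print("all points:", r_all, " outliers removed:", r_clean)
```

Three points out of 26 are enough to weaken an otherwise tight anticorrelation noticeably, which is why a modest coefficient does not by itself rule out a strong underlying relation.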
The relation between and is not linear, but it is certainly true that SN Ia with higher (i.e. broader light curves) tend to have lower [the same is true for , the highest-ranked flux ratio by Bailey et al. 2009]. Since the width of the light curve is a parameter intrinsic to each SN Ia (although its measurement can be subtly affected by host-galaxy reddening; see Phillips et al. 1999), the correlation between and shows that the color variation measured by is intrinsic in part. This is consistent with the so-called “brighter-bluer” relation of Tripp (1998): overluminous SN Ia are intrinsically bluer than underluminous SN Ia (see also Riess et al. 1996).
Using a color-corrected flux ratio in combination with color results in an even lower Hubble residual scatter when compared with the single flux ratio case. Our best flux ratio in this case, , reduces the weighted rms of prediction residuals by % with respect to [ mag cf. mag], and the intrinsic prediction error by % [ mag cf. mag]. Again, the significance of this difference is only ( mag). We see from Fig. 6 (middle panel) that is strongly anticorrelated with (), and that dereddening the spectra using the SALT2 color law is effective in removing any dependence of on color, as expected.
One would naively think that combining our best color-corrected ratio with would lead to a further improvement, but this is not the case. In fact, ranks 298th when we consider the set of predictors . This is due to the strong anticorrelation of with . Adding as an extra predictor when already includes this information means that will tend to fit noise in a given training set, as was the case for the set of predictors when compared with only. Indeed, the best-fit value for for is again consistent with 0.
Nonetheless, several color-corrected flux ratios do result in a further reduced scatter when combined with , although the wavelength baseline for these ratios is much smaller ( Å) and the wavelength bins forming the ratios are all concentrated in the region of the S II λλ5454,5640 doublet. Our highest-ranked flux ratio in this case, , reduces the weighted rms of prediction residuals by % with respect to [ mag cf. mag], and the intrinsic prediction error by % [ cf. mag]. Again, the significance of this difference is only ( mag). We see from Fig. 6 (lower panel) that this ratio is not correlated with () or (), and thus constitutes a useful additional predictor of distances to SN Ia.
4.3.2 Twodimensional maps of all flux ratios
The results for all 9506 flux ratios are displayed in Fig. 7. The four rows correspond to the four models for estimating SN Ia distances that include a flux ratio (Eqs. 9-12). The left column is color-coded according to the weighted rms of prediction Hubble residuals (flux ratios that result in mag are given the color corresponding to mag), while the right column is color-coded according to the absolute Pearson correlation coefficient of the correction terms with uncorrected Hubble residuals [e.g. for the set of predictors , the correlation of with uncorrected residuals].
[Fig. 7. Left column: WRMS residuals [mag]. Right column: absolute Pearson correlation.]
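The size of such a two-dimensional map follows from simple counting. If every ordered pair of distinct wavelength bins defines one flux ratio (an assumption, since the binning is not restated in this section), then 98 bins give 98 × 97 = 9506 ratios, which matches the number quoted above. The sketch below builds the full grid for a placeholder binned spectrum; the function name `flux_ratio_grid` and the flux values are hypothetical.

```python
import numpy as np

def flux_ratio_grid(flux):
    """Return the matrix of ratios flux[i] / flux[j] for all bin pairs,
    together with a boolean mask selecting the off-diagonal entries
    (pairs of distinct bins)."""
    flux = np.asarray(flux, dtype=float)
    ratios = flux[:, None] / flux[None, :]
    off_diagonal = ~np.eye(len(flux), dtype=bool)
    return ratios, off_diagonal

n_bins = 98                               # inferred from 98 * 97 = 9506
flux = np.linspace(1.0, 2.0, n_bins)      # placeholder binned spectrum
ratios, mask = flux_ratio_grid(flux)
n_ratios = int(mask.sum())
print("number of flux ratios:", n_ratios)
```

Each ratio and its inverse occupy mirrored positions in the grid, so the two halves of such a map carry the same spectral information even though a linear model treats a ratio and its inverse differently.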
Only a very restricted number of wavelength bins lead to a low WRMS of prediction Hubble residuals when a flux ratio is used by itself (Fig. 7; upper left), namely Å and Å (4 of the 5 best flux ratios in Table 2 for only have Å). This is in stark contrast with the large number of flux ratios with absolute Pearson correlation coefficients (Fig. 7; upper right). In general, a flux ratio with a higher correlation coefficient will result in a Hubble diagram with less scatter, but this is not systematically the case, and the relation between the two is certainly not linear. For Pearson correlation coefficients , the standard deviation of Hubble residuals can vary by up to 0.1 mag at any given (Fig. 8, top panel). This is because the cross-correlation coefficient does not take into account errors on or on the Hubble residual, and is biased by outliers and reddened SN Ia. The lower panel of Fig. 8 shows the impact of including the highly reddened SN 2006br: at any given , the resulting weighted rms of prediction Hubble residuals is 30-60% higher. Moreover, many flux ratios with high correlation coefficients () result in Hubble diagrams with excessively large scatter ( mag). This is counterintuitive, since the resulting scatter in these cases appears to be larger than when no predictors at all are used to determine distances to SN Ia (in which case mag). The reason is that we consider the scatter under cross-validation, as opposed to fitting all the SN Ia at the same time. In these aberrant cases, the trained model is sensitive to the inclusion or exclusion of some outlier in the training set, and this leads to large errors when the outlier is in the validation set. Last, including this SN leads to correlations with , where there are none otherwise. Fig. 8 thus justifies our excluding SN 2006br from the sample (already excluded based on our cut on SALT2 color; see § 4.2), and illustrates the advantage of selecting flux ratios based directly on the weighted rms of prediction Hubble diagram residuals, rather than on cross-correlation coefficients. As already mentioned in § 4.3.1, using cross-validated prediction errors to select the best flux ratios guards us against overfitting a small sample: in the naive approach that consists in fitting the entire SN Ia sample at once, adding more predictors always leads to a lower scatter in Hubble residuals (this is known as “resubstitution”; see, e.g., Mandel et al. 2009).
When the SALT2 color parameter is used in combination with a color-corrected flux ratio , there are again restricted wavelength regions that lead to a low weighted rms of prediction Hubble residuals (Fig. 7; third row left) [4 of the 5 best flux ratios in Table 2 for predictors involve wavelength bins at Å and Å]. The SALT2 color parameter does not attempt to distinguish between reddening by dust and intrinsic color variations. Dereddening the spectra using this parameter corrects for both effects regardless of their relative importance. However, since the SALT2 color law is very similar to the Cardelli et al. (1989) extinction law with and mag (Guy et al. 2007, their Fig. 3), one generally assumes that the color correction removes the bulk of reddening by dust, and the remaining variations in the SED are primarily intrinsic to the supernova. If this is so, it is intriguing that the best flux ratios for the only and models share similar wavelength bins. The recent survey of 2D SN Ia models from Kasen et al. (2009) suggests that a significant part of the color variation measured by the is indeed intrinsic (see § 4.3.4).
The second row of Fig. 7 confirms that using the parameter in combination with a flux ratio results in a slight degradation in the weighted rms of prediction residuals, while the correlations with uncorrected Hubble residuals are degraded with respect to cases where is used by itself. Last, the bottom row of Fig. 7 is a visual demonstration that fares better than overall, although the best color-corrected flux ratios do not perform significantly better. We see from the right panel that the correlations of with uncorrected residuals all have absolute Pearson correlation coefficients . The two regions at Å stand out in the 2D plot of WRMS residuals, and all the top color-corrected ratios for include a wavelength bin in that region (which corresponds to the absorption trough of the S II λ5454 line).
4.3.3 Comparison with Bailey et al. (2009)
We confirm the basic result of Bailey et al. (2009) using an independent sample and a different cross-validation method: the use of a flux ratio alone or in combination with a color parameter results in a Hubble diagram with lower scatter when compared to the standard model. Using a flux ratio alone, Bailey et al. (2009) find as their most highly ranked ratio, while we find [see Table 2]. The wavelength bins are almost identical, and in any case is amongst our top 5 ratios. For this ratio we find , in agreement with found by Bailey et al. (2009). [Footnote 5: In fact, Bailey et al. (2009) find , but this is due to a typo in their equation for the distance modulus: really appears as a negative term in their paper (S. Bailey 2010, private communication).]
The other four flux ratios given by Bailey et al. (2009) [their Table 1] are not part of our top 5. For two of these ratios the reason is trivial: they include wavelength bins redder than 7100 Å, not covered by most of our spectra. The other two flux ratios [ and ] lead to differences % on the Hubble diagram residual scatter with respect to the standard model according to Bailey et al. (2009) [ mag for and mag for , cf. mag for ], and they rank 29 and 658 in our study, respectively. This discrepancy is in part due to the selection method: Bailey et al. (2009) select their best ratios based on cross-correlation coefficients with uncorrected magnitudes, while we select them based on the intrinsic prediction error from cross-validated Hubble diagram residuals. However, ranking our ratios using the same method as Bailey et al. (2009) does not resolve the discrepancy. It is possible that Bailey et al. (2009) are sensitive to their exact choice of training and validation samples, whereas we have randomized the approach. We note however that the impact on the weighted rms of prediction residuals is statistically indistinguishable for many flux ratios given our sample size (e.g. error on WRMS mag cf. differences of mag in WRMS for the top 5 flux ratios; see Table 2), so that the exact ranking of flux ratios is not well determined and subject to revisions from small changes in the input data.
Using both a color-corrected flux ratio and the SALT2 color parameter decreases the residual scatter further, as found by Bailey et al. (2009). Using the set of predictors leads to % lower WRMS with respect to [ mag cf. mag], and to % lower [ mag cf. mag] at significance based on the difference in intrinsic prediction error, . None of the color-corrected flux ratios listed by Bailey et al. (2009) [their Table 1] are part of our five highest-ranked , although our top ratios are formed with almost the same wavelength bins [ in this paper; in Bailey et al. (2009)]. The other color-corrected ratios in Bailey et al. (2009) rank well below in our study, whether we select the best according to the resulting Hubble residual scatter or the cross-correlation of with uncorrected residuals. The same caveats apply here as when selecting the best uncorrected flux ratios (see previous paragraph), although the measurement is probably even more sensitive to the relative flux calibration accuracy of the spectra.
We also cross-checked the results of Bailey et al. (2009) by simply validating their best flux ratios on our entire SN Ia sample. The results are displayed in Table 3, where we give the weighted rms of Hubble residuals from a simultaneous fit to the entire SN Ia sample (as done by Bailey et al. 2009), as opposed to prediction residuals under cross-validation. For all flux ratios (both and ) in Table 3, our own best-fit agrees within the errors with that found by Bailey et al. (2009) [denoted (B09) in Table 3], although we have systematically larger errors. We note that most of the top ratios reported by Bailey et al. (2009) lead to no significant improvement over , and even lead to slightly worse results for some ratios [e.g. results in mag cf. mag for ]. A closer look at Table 1 of Bailey et al. (2009) shows that this is also the case in their paper: for the only model, only one ratio out of five, namely , results in a lower Hubble diagram residual scatter. The other four are either consistent with no improvement [ and ], or yield slightly worse results [ and ]. Again, this results from the way Bailey et al. (2009) selected their best ratios, based on the correlation with uncorrected Hubble residuals.
(B09)  WRMS^a  
6420  4430  
6420  4170  
7720^b  4370  
6420  5120  
7280^b  3980  
6420  5190  
5770  6420  
6420  5360  
6760  6420  
6420  4430  
^a Weighted rms of Hubble residuals from a simultaneous fit to the entire SN Ia sample (as done by Bailey et al. 2009), as opposed to prediction residuals under cross-validation. As explained in § 2.3, the weighted rms of prediction Hubble residuals is a more realistic estimate of the accuracy of a given model in measuring distances to SN Ia.
^b Wavelength bins redder than 7100 Å, not covered by most of our spectra.
We cannot directly compare the resulting scatter in Hubble diagram residuals with those reported in Table 1 of Bailey et al. (2009). First, they use the sample standard deviation (), whereas we use the weighted rms (see § 2.4). Second, the scatter they find for the standard model is significantly lower than ours. We have refit the data presented in Table 1 of Bailey et al. (2009) to derive the weighted rms of Hubble residuals for the model from their sample, and find mag, which is almost 0.05 mag smaller when compared to our sample ( mag). This difference in the Hubble residual scatter between the SNFactory and CfA samples is consistent with the difference found amongst other nearby SN Ia samples by Hicken et al. (2009b).
Interestingly, using the WRMS statistic as opposed to the sample standard deviation results in a smaller difference in residual scatter between the only and models. Using our own fits of the data presented in Table 1 of Bailey et al. (2009), we find mag for , i.e. % smaller scatter when compared to , where the difference between the two models is % when considering the sample standard deviation.
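The difference between the two scatter statistics can be made concrete with a small numerical example. The sketch below assumes inverse-variance weights for the weighted rms; the exact definition adopted in § 2.4 is not restated in this section and may include additional terms. The residuals and uncertainties are made up for illustration.

```python
import numpy as np

def sample_std(residuals):
    """Unweighted sample standard deviation (ddof=1)."""
    return float(np.std(np.asarray(residuals, dtype=float), ddof=1))

def weighted_rms(residuals, sigma):
    """Weighted rms with inverse-variance weights 1/sigma**2
    (an assumed form, not necessarily the definition of § 2.4)."""
    r = np.asarray(residuals, dtype=float)
    w = 1.0 / np.asarray(sigma, dtype=float) ** 2
    return float(np.sqrt(np.sum(w * r**2) / np.sum(w)))

# Made-up residuals [mag]: two well-measured objects and two noisy ones.
residuals = np.array([0.1, -0.1, 0.5, -0.5])
sigma = np.array([0.1, 0.1, 0.5, 0.5])

print("sample std  :", sample_std(residuals))
print("weighted rms:", weighted_rms(residuals, sigma))
```

The noisy objects dominate the sample standard deviation but are down-weighted in the weighted rms, so the two statistics need not agree on how much one model improves over another.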
4.3.4 Comparison with 2D models
We use synthetic spectra based on a recent 2D survey of delayed-detonation SN Ia models by Kasen et al. (2009) to investigate the physical origin of the high correlation between several flux ratios and uncorrected SN Ia magnitudes. These models were found to reproduce the empirical relation between peak band magnitude and post-maximum decline rate. A more detailed comparison of SN Ia data with these models will be presented elsewhere.
We measured flux ratios in the same manner as we did for our data, and computed Pearson correlation coefficients with (uncorrected) absolute magnitudes synthesized directly from the spectra. The 2D correlation map is shown in Fig. 9 (left panel), alongside the same map derived from the CfA SN Ia sample (right panel). At first glance, the two maps appear similar, with two large Å-wide “bands” of flux ratios with strong correlations with uncorrected magnitudes, for Å and Å, although the correlations are even stronger in the models (several flux ratios have absolute Pearson correlation coefficients , whereas there are none in the data). A closer look reveals some important differences, the models having strong correlations for Å Å and Å that are not present in the data. The same applies to the regions with coordinates Å. These differences are significant and illustrate the potential for such comparisons to impose strong constraints on SN Ia models.
In Fig. 10 we show the correlation of uncorrected absolute rest-frame band magnitudes () with our highest-ranked flux ratio , both from the 2D models and CfA data, where we have used the redshift-based distance for the latter. The vertical offset is arbitrary and solely depends on the normalization adopted for the data, which we have chosen for the sake of clarity. There are 1320 model points, each corresponding to one of the 44 2D delayed-detonation models of Kasen et al. (2009) viewed from one of 30 different viewing angles. The linear fits shown in Fig. 10 are done over the range , where the models and data overlap. For the data this is equivalent to excluding the three most highly reddened SN Ia (open circles), for which the host-galaxy visual extinction was determined based on light-curve fits with MLCS2k2 (Jha et al. 2007). This is justified since no reddening by dust is applied to the models. The slope of the relation between and is significantly steeper for the models () than for the data (), and the correlation is much stronger ( cf. 0.69 for the data). This is not surprising since the data are subject to random measurement and peculiar velocity errors, which degrade the correlation. Including models for which softens the slope to and results in a stronger correlation (), while including data with mag results in and a much stronger correlation . This last value for can be compared with the fitting parameter for this same flux ratio (; see Table 2), although the latter is based on a formal cross-validation procedure and the opposite sign is a consequence of the convention when using the flux ratio to predict SN Ia distances. As noted in § 4.3.2, the correlation of with is largely biased by the minority of highly reddened SN Ia.
The models yield values for ranging between and , all due to intrinsic color variations. Since these models reproduce the relation between and post-maximum decline rate of Phillips (1993), they confirm the intrinsic nature of the correlation between and shown in Fig. 6 (upper left panel).
The wavelength bins Å and Å are close to the central wavelengths of the standard and broadband filters, hence is a rough measure of the color at band maximum. The 2D models of Fig. 10 indicate that a large part of the variation in seen in the data is due to intrinsic variations in color. Reddening in the host galaxy is then needed to explain values of , while at lower values it is challenging at best to discriminate between the effects of intrinsic color variations and extinction by dust, since both affect in the same manner, as illustrated by the reddening vectors in Fig. 10 [they are really reddening curves, cf. Fig. 3, but the behavior is almost linear over this small range in ].
The models also give a physical explanation for the correlation of with absolute magnitude. Indeed, the variation of this ratio is largely caused by spectroscopic variations around 4400 Å, a region dominated by lines of Fe II and Fe III, with contributions from Mg II (Ti II provides an important source of opacity for the least luminous SN Ia), while the region around 6630 Å has little intrinsic variation (this was noted by Bailey et al. 2009). This translates to a standard deviation of peak band magnitudes ( mag) that is almost twice as large as that of the band magnitudes (at maximum; mag) in the models. The relative contribution of Fe II and Fe III lines is related to the temperature of the line-forming regions in the SN Ia ejecta, itself a function of peak luminosity (dimmer SN Ia are generally cooler; see, e.g., Kasen & Woosley 2007). One thus expects a large luminosity-dependent spectroscopic variation in this wavelength region, although its exact shape and relation to temperature remain largely unknown.
While these models provide useful insights into the physical origin of these correlations, a direct comparison with the data reveals some of their shortcomings. In Fig. 10 we see that some models predict values of the flux ratio for the most luminous SN Ia, whereas the data are limited to values greater than this. Our sample includes several SN Ia at the high luminosity end that show no sign of extinction in their host galaxies ( mag based on light-curve fits with MLCS2k2), so the differences are real and point to discrepancies between the data and the models, some of the latter having bluer colors at band maximum. This is not surprising, as the models explore a larger range of parameter space than is realized in nature. Comparisons of this sort can then help constrain the range of model input parameters. A more detailed comparison of SN Ia data from the CfA SN program with these models will be presented elsewhere.
4.4 Results on spectra at other ages
Bailey et al. (2009) restricted their analysis to spectra within d from band maximum. In this section we consider flux ratios measured on spectra at other ages. We impose the same cuts on relative flux calibration accuracy ( mag), SALT2 color (), redshift (), and age range ( d) as those used for the maximum-light spectra in the previous section. We consider all ages between d and d, in steps of 2.5 d (for ages earlier than d or later than +7.5 d the number of SN Ia with spectra that satisfy our cuts falls below 20, and we do not trust the results). We report the best ratio at each age in Table 4, for both the only and models.
Rank  WRMS  
6540  4580  24  
6630  4400  26  
6630  4040  26  
6590  4490  26  
6590  4890  25  
4610  4260  24  
6420  5290  26  
5550  6630  26  
6540  5580 