Standing on the shoulders of Dwarfs: the Kepler asteroseismic LEGACY sample I — oscillation mode parameters
The advent of space-based missions like Kepler has revolutionized the study of solar-type stars, particularly through the measurement and modeling of their resonant modes of oscillation. Here we analyze a sample of 66 Kepler main-sequence stars showing solar-like oscillations as part of the Kepler seismic LEGACY project. We use Kepler short-cadence data, of which each star has at least 12 months, to create frequency power spectra optimized for asteroseismology. For each star we identify its modes of oscillation and extract parameters such as frequency, amplitude, and line width using a Bayesian Markov chain Monte Carlo ‘peak-bagging’ approach. We report the extracted mode parameters for all 66 stars, as well as derived quantities such as frequency difference ratios, the large and small separations and ; the behavior of line widths with frequency and line widths at with , for which we derive parametrizations; and behavior of mode visibilities. These average properties can be applied in future peak-bagging exercises to better constrain the parameters of the stellar oscillation spectra. The frequencies and frequency ratios can tightly constrain the fundamental parameters of these solar-type stars, and mode line widths and amplitudes can test models of mode damping and excitation.
Subject headings:Asteroseismology – stars: evolution – stars: oscillations – stars: fundamental parameters
The study of stars and extrasolar planets via the properties of their host stars has experienced a revolution in recent years (Chaplin & Miglio 2013; Christensen-Dalsgaard 2016). This largely arose from the successful application of asteroseismology using observations from the CoRoT (Baglin et al. 2009) and Kepler missions (Gilliland et al. 2010). This application has been made possible by extracting high-precision parameters from the stellar frequency-power spectra owing to the long time-baseline and photometric quality of these space missions.
Asteroseismology allows the determination of fundamental stellar parameters such as mass, radius, and age through modeling of individual mode frequencies or frequency-difference ratios. The Kepler mission has already provided stellar parameters for a number of stars, including planetary hosts, using average seismic parameters (Chaplin et al. 2011a, 2014; Silva Aguirre et al. 2012; Huber et al. 2013), individual frequencies (Christensen-Dalsgaard et al. 2010; Basu et al. 2010; Howell et al. 2012; Metcalfe et al. 2012, 2014, 2015; Van Eylen et al. 2014; Lund et al. 2014c; Campante et al. 2015; Silva Aguirre et al. 2015), and frequency-difference ratios (Silva Aguirre et al. 2013, 2015; Lebreton & Goupil 2014).
The high precision of extracted mode frequencies further allows the study of ionization zones and the convective envelope boundary from acoustic glitches (Houdek & Gough 2007; Silva Aguirre et al. 2011; Mazumdar et al. 2014; Verma et al. 2014), and one may also learn about the physics of the excitation and damping of the oscillation modes from measured mode line widths, amplitudes, and visibilities (Houdek et al. 1999; Houdek 2006; Samadi et al. 2005, 2007; Belkacem et al. 2012).
Frequencies and line widths have been reported for several solar-like and subgiant stars observed by Kepler by Appourchaux et al. (2012b, a, 2014a), and planet-hosting stars by Davies et al. (2016). In this paper we analyze a sample of 66 main-sequence (MS) solar-like stars observed for at least 12 months by the Kepler mission. We extracted mode parameters by ‘peak-bagging’111First coined by Roger Ulrich circa 1983 (private communication) from the analogy to hill climbing where it refers to reaching the summits of a collection of peaks, the term was later re-introduced by Appourchaux (2003b). the frequency-power spectra of the stars using a Markov chain Monte Carlo (MCMC) method (Handberg & Campante 2011) and used the Bayesian quality control presented by Davies et al. (2016). For each star we report values for the mode frequencies, amplitudes, line widths, and visibilities. Additionally, we provide summary descriptions for each of the above quantities, such as average seismic parameters derived from the frequencies and prescriptions of the mode line widths against frequency. The frequencies reported here will be modeled in the accompanying paper by Silva Aguirre et al. (2016, hereafter Paper II). The lessons learned from the presented analysis will be useful for the study of MS solar-like oscillators with the TESS (Ricker et al. 2014) and PLATO (Rauer et al. 2013) missions, and the continued analysis of these data from K2 (Chaplin et al. 2015; Lund et al. 2016a, b).
The paper is structured as follows: Section 2 describes the target sample, including the preparation of Kepler data and spectroscopic properties. Section 3 is devoted to the parameter estimation from the MCMC peak-bagging, including a description of the fitting strategy, the adopted Bayesian quality assurance, and the derivation of frequency difference ratios and their correlations. In Section 4 we present our results from the peak-bagging for the mode frequencies, focusing specifically on frequency errors and average seismic parameters in Section 4.1; amplitudes in Section 4.2; line widths in Section 4.3; and visibilities in Section 4.4. In Section 5 we give an example of the output generated for each of the analyzed stars. We conclude in Section 6.
2. Target sample
Our sample consists of 66 solar-type oscillators observed by the Kepler satellite, all part of the KASC (Kjeldsen et al. 2010) working group 1 (WG1) sample of solar-like p-mode oscillators. All stars have short-cadence (SC; ) observations with an observing base line of at least 12 months, and represent some of the highest signal-to-noise solar-like oscillators observed by Kepler. The sample consists only of main-sequence (MS) and slightly more evolved subgiant stars. These have frequency structures corresponding to the ‘Simple’ or ‘F-type’ categories by Appourchaux et al. (2012a), i.e., none of the stars show obvious bumped dipole modes. The sample was peak-bagged as part of the Kepler dwarf seismic ‘LEGACY’ project, with the asteroseismic modeling of extracted parameters presented in Paper II. In Figure 1 the sample is shown in a Kiel-diagram ( vs. ), with parameters adopted from Paper II; for additional details on the sample see Table 1. We note that all targets from the Appourchaux et al. (2014b) study of oscillation mode line widths (which included data up to Quarter 12) are part of our sample, with the exception of KIC 3424541, 3733735, 10355856, and 10909629. These four stars were classified as F-type by Appourchaux et al. (2014b), but were omitted from our sample because of possible mixed-mode structures.
|16 Cyg A||Simple||4||4|
|16 Cyg B||Simple||4||4|
2.1. Data preparation
For most targets, data were taken continuously from Quarter 5 (Q5) through Q17. To minimize gaps in the time series, data from the initial short quarters (Q0 or Q1) were omitted unless continuous with the subsequent data. Table 1 lists the quarters used for each target. Light curves were constructed from pixel data downloaded from the KASOC database222www.kasoc.phys.au.dk, using the procedure developed by S. Bloemen (private comm.) to define pixel masks for aperture photometry. The light curves were then corrected using the KASOC filter (see Handberg & Lund 2014). Briefly, the light curves were first corrected for jumps and concatenated. They were then median filtered using two filters of different widths — one long, one short — with the final filter being a weighted sum of the two filters based on the variability in the light curve. For the four Kepler objects of interest (KOIs) in the sample (KICs 3632418, 9414417, 9955598, and 10963065) an iterative removal of the planetary transits was performed based on the planetary phase-curve (see Handberg & Lund 2014 for further details).
The power density spectrum (PDS) returned from the KASOC filter is made from a weighted least-squares sine-wave fitting, single-sided calibrated, normalized to Parseval’s theorem, and converted to power density by dividing by the integral of the spectral window (Kjeldsen & Frandsen 1992; Kjeldsen 1992).
2.2. Atmospheric and stellar parameters
We have obtained atmospheric parameters from the Stellar Parameters Classification tool (SPC; see Buchhave et al. 2012), with data from the Tillinghast Reflector Echelle Spectrograph (TRES; Szentgyorgyi & Furész 2007; Fürész 2008) on the 1.5-m Tillinghast telescope at the F. L. Whipple Observatory. Information from the SPC analysis is available on the Kepler Community Follow-up Observing Program (CFOP) website333https://cfop.ipac.caltech.edu/home/. In the SPC derivation of parameters, values were fixed to the asteroseismic values given in Chaplin et al. (2014) to decrease the impact on uncertainties from correlations between , , and . We added in quadrature to the derived uncertainties on and systematic uncertainties of and , as suggested by Torres et al. (2012). For a subset of targets, spectroscopic values were taken from the literature (Table 1). We also list in Table 1 the line-of-sight (LOS) velocities derived from the SPC analysis, which should be used in any modeling efforts using individual frequencies to account for the Doppler shift of the frequencies (Davies et al. 2014b). In Figure 2 we show the values of these Doppler frequency shifts, which in some cases exceed the uncertainties on the individual frequencies. Even if the frequency shift is small compared to the uncertainties on the mode frequencies, it is systematic and should therefore always be corrected to avoid biases in the stellar modeling. The SPC LOS values have been corrected by to put the velocities onto the IAU system. This correction is primarily accounting for the fact that the CfA library of synthetic spectra does not include the solar gravitational redshift. Stellar parameters used in this paper, such as masses and radii, are adopted from the modeling effort presented in Paper II.
2.3. Sun-as-a-star data
As part of the project, the Sun was fitted in the same manner as the sample targets (see Section 3). This was done primarily to test the modeling efforts presented in Paper II against a known reference, and at the same time to assess the returns from the peak-bagging. The power spectrum was produced from data from VIRGO444Variability of Solar Irradiance and Gravity Oscillations (Fröhlich 2009) on-board the SoHO555Solar and Heliospheric Observatory spacecraft (Fröhlich et al. 1995; Frohlich et al. 1997). Specifically, a time series was created from a weighted sum of the green and red channels of the VIRGO Sun photometers (SPM) with central wavelengths of 500 nm (green), and 862 nm (red). Weights were selected such that the response-function weighted centroid wavelength from the two SPM channels matched that from the Kepler response function (641.7 nm). The two-component light curves were filtered individually using a 30-day median filter and then summed in relative flux units with the appropriate weights (green: 0.785; red: 0.215). The solar time series had a length of 1150 days (corresponding to years, or the approximate duration of 13 Quarters). This is the typical time series length for targets in the sample. To find the level to which the spectrum should be degraded, the magnitude distribution was computed for the sample, including also stars that have a mixed-mode character. The median magnitude of closely matches that of KIC 9139151 and so noise was added to the solar time series to match the level of this star.
The solar data set will primarily be used for estimates relating to frequencies, such as and , but not for analysis of line widths, amplitudes, or visibilities. This is because one cannot, with the simple weighting of relatively narrow band filters done here, assume that the measurements of amplitudes and visibilities adhere strictly to what would be observed with Kepler.
3. Parameter estimation
3.1. Oscillation spectrum model
To model the power spectrum, we described each oscillation mode as a Lorentzian function (), which corresponds to the shape of a stochastically-excited and intrinsically-damped mode (Batchelor 1953; Kumar et al. 1988):
Each mode is characterized by the frequency of the zonal () component, a height , a FWHM mode width , and a rotational splitting (assumed constant with frequency; see Lund et al. 2014b, for a discussion of the impact of differential rotation on the constancy of ). In , is the geometrical factor that sets the relative visibilities between the (azimuthal) -components as a function of the stellar inclination (see, Dziembowski 1977; Gizon & Solanki 2003); denotes the squared visibility (power units) of a non-radial mode relative to a radial mode at the same frequency, i.e.,
from the spatial filtering resulting from integrating the intensity for a mode of a given degree over the stellar surface; then denotes the height of the radial mode of order . The use of assumes equipartition of energy between modes of different angular degrees, thus only with a dependence on frequency. This is a good assumption for stochastically excited low degree high order acoustic modes, observed for many lifetimes (see, e.g., Christensen-Dalsgaard & Gough 1982).
The full model to fit to the power spectrum is then given by a series of the Lorentzian functions in Equation 3.1 as
Here denotes the adopted background function; is a constant white shot-noise component; describes the apodization of the signal power at frequency from the -minute sampling of the temporal signal (see, e.g., Chaplin et al. 2011c; Kallinger et al. 2014), and is given by:
where gives the integration time for the observations666In Kepler equals the sampling time wherefore sometimes is given as , where is the Nyquist frequency — this is, however, an imprecise definition, because relates to the sampling time whereas the apodization relates to the integration time.. For the background we used the function (Harvey et al. 1993; Andersen et al. 1994):
which characterizes a temporal signal from granulation having an exponentially decaying autocovariance, with a power of the temporal decay rate as ; gives the characteristic time scale of the th background component; the corresponding root-mean-square (rms) variation of the component in the time domain. The normalization constants are such that the integral (for positive frequencies) of the background component equals , in accordance with the Parseval-Plancherel theorem (see, e.g., Michel et al. 2009; Karoff et al. 2013; Kallinger et al. 2014).
In fitting Equation 3.3 to the power spectrum, we varied the mode amplitude (square-root of integrated mode power) rather than the mode height to decrease the correlation with (Toutain & Appourchaux 1994). To obtain the height () in power density units from the varied amplitude () we used the relation (Fletcher et al. 2006; Chaplin et al. 2008b):
This is a valid approximation for a single-sided power spectrum when the modes are well resolved, i.e., when the observing duration greatly exceeds the mode life time . We note that and were varied for radial modes only (), and then linearly interpolated to the frequencies of the non-radial modes. The fitting of the power spectrum then finally involved estimating the parameters .
3.2. Fitting strategy
Parameters were estimated in a Bayesian manner from a global peak-bagging fit to the power spectrum including all parameters (see, e.g., Handberg & Campante 2011). This was done by mapping the posterior probability of the parameters given the data and any prior information , which from Bayes’ theorem is given as:
Here is the prior probability assigned to the parameters given , and is the likelihood of the observed data given the parameters . , known as the evidence, is given by the integral of the numerator over the full parameter space, and thus acts as a normalization. The evidence is unnecessary in the mapping of the relative posterior distribution, so we end up mapping:
where logarithmic units are adopted for numerical stability, and is a constant. Assuming a 2-d.o.f. statistic for the power spectrum relative to the limit spectrum in Equation 3.3 (Gabriel 1994), the logarithm of the likelihood for a given observed power, , relates to the limit spectrum, , as (see Duvall & Harvey 1986; Anderson et al. 1990; Toutain & Appourchaux 1994):
Mapping of Equation 3.7 was performed using an affine invariant ensemble Markov chain Monte Carlo (MCMC) sampler (see Goodman & Weare 2010), specifically via the Python implementation emcee by Foreman-Mackey et al. (2013). For a given fit we employed 500 so-called walkers that were initiated by sampling from the priors of the model parameters (see Section 3.2.2). Each of these was run for at least 2000 steps. We further adopted parallel tempering using five temperatures, with tempering parameters determined according to Benomar et al. (2009), and a thinning of the MCMC chains by a factor of 10. As part of the post-processing, the appropriate burn-in for a given target and whether sufficient mixing had been achieved was determined by (1) visual inspection of the chain traces, (2) using the Geweke diagnostic (Geweke 1992), and (3) by assessing the length of the chain compared to the autocorrelation time (giving the number of independent draws from the target distribution).
Final parameter estimates were obtained from the median (frequencies) or mode (amplitudes, line widths, and visibilities) of the marginalized posterior probability density functions (PDFs) — with the MCMC sampling the marginalization is obtained naturally and the PDF for a given parameter is simply given by the normalized distribution of the samples of the parameter. A measure for the parameter uncertainty is given by the credible interval as the interval spanning the highest probability density (HPD) of the PDF.
3.2.1 Mode identification and initial guesses
Before the peak-bagging can commence, initial guesses must be defined for the mode-frequencies to include in Equation 3.3, and the modes must further be identified in terms of their angular degree . For acoustic modes of high radial order and low angular degree the frequencies may be approximated by the asymptotic relation (Tassoul 1980; Scherrer et al. 1983):
Here is the large separation, given by the average frequency spacing between consecutive overtones for modes of a given ; is a dimensionless offset sensitive to the surface layers (see, e.g., Gough 1986; Pérez Hernández & Christensen-Dalsgaard 1998; Roxburgh 2016); is sensitive to the sound-speed gradient near the stellar core (Scherrer et al. 1983; Christensen-Dalsgaard 1993). Mode identification was then, by and large, achieved via visual inspection of échelle diagrams (Grec et al. 1983; Bedding 2011). Here, modes of a given will form vertical ridges for the correct average large separation. The identification of and radial order was checked against the relation for as a function of (Figure 3), where can be found from the échelle diagram (Figure 4) by the vertical position of the radial degree () ridge (see White et al. 2011b, a).
For this study, consisting of high S/N oscillation signals, the identification was relatively simple. Initial guesses for mode frequencies were primarily defined by hand from smoothed versions of the power density spectra. These were checked against frequencies returned from applying the pseudo-global fitting method of Fletcher et al. (2009).
The power spectrum was fitted in the range to , where and denote the minimum and maximum mode frequency included in the peak-bagging. Before the peak-bagging fit, a background-only fit was performed in the range from to the SC Nyquist frequency (). In this fit, the power from solar-like oscillations was accounted for by a Gaussian envelope centered at . Using the posterior distributions from the background-only fit as priors in the peak-bagging allowed us to constrain the background in the relatively narrow frequency range included.
3.2.2 Prior functions
For mode frequencies, we adopted wide top-hat priors centered on the initial guesses of the frequencies. Top-hat priors were also adopted for the rotational frequency splitting and inclination , with the inclination sampled from the range to . The reason for the extended range in inclination is that if the solution is close to either or any sharp truncation from a prior at these values will make it difficult to properly sample these extreme values. The final posterior on the inclination was then obtained from folding the samples onto the range from to .
For the amplitudes and line widths we adopted a modified Jeffrey’s prior, given as (see, e.g., Handberg & Campante 2011)
which behaves as a uniform prior when and a standard scale invariant Jeffrey’s prior when . The maximum of the prior occurs at .
For mode visibilities we adopted truncated Gaussian functions as priors, defined as:
with given as:
Here, denotes the error function, and give the chosen mode value and width of the Gaussian, and and give the lower and upper truncations of the Gaussian. We specifically adopted for , for , and for . Similarly, we adopted truncated Gaussian priors for the parameters of the background. Here we used as the Gaussian mode value () the median of the posteriors from the background-only fit (see Section 3.2.1), and we adopted , , and .
To ensure that the mode identification did not swap for neighboring and modes, which is a risk especially at high frequency where the small frequency separation is small compared to the mode line width, we added the prior constraint that on that it must be positive. In principle could be negative in the event of bumped modes, however, because the stars were screened for bumped modes and the strength of an avoided crossing is expected to be lower than that of a mode due to the larger evanescent region (see, e.g., Aerts et al. 2010; Deheuvels & Michel 2011), we do not expect values of for these stars.
3.2.3 Quality assurance
For each fitted mode, we computed a metric for the quality of the fit in the same manner as detailed in Davies et al. (2016), see also Appourchaux (2004) and Appourchaux et al. (2012a). Briefly, we first ran a fast null hypothesis () test to identify which modes had an unambiguous detection, and for which the probability of detection conditioned on the data needed to be explicitly determined and evaluated. This was done because the explicit determination of is computationally expensive. In the fast test it was assessed whether the S/N in the background corrected power spectrum () for a given proposed mode, with the power binned across a number of frequencies to account for the spread in power from the mode line width, was consistent with a pure noise spectrum or whether the hypothesis could be rejected at the level (Appourchaux 2003a, 2004; Lund et al. 2012). When the high S/N modes had been identified in this manner the probability of detection was computed for the remainder low-S/N modes.
In the computation of both the probability of assuming , , and the probability of the alternative hypothesis of a detected mode, , need to be estimated. The latter was assessed by integrating the probability of measuring the data given a model over a range of mode parameters — this integration was achieved by marginalizing over , with the parameter space sampled over the posteriors from the peak-bagging using emcee. Specifically, a mixture-model was used in which both and were optimized simultaneously to give , the probability of observing the data given the model of a given set of modes with parameters :
Here, the parameter , ranging between 0 and 1, then gives the probability of the detection of the given set of modes. This probability was kept free in the emcee run, and in the end was assessed from the posterior distribution of (Hogg et al. 2010; Farr et al. 2015). We finally report the Bayes factor , given as the median of the posterior probability distributions of the natural logarithm of the ratio of over , as:
The value of can then be assessed qualitatively on the Kass & Raftery (1995) scale as follows:
For a detailed account of the quality control we refer to Davies et al. (2016).
3.3. Derived quantities and correlations
Besides the parameters included in the model of the power spectrum, we computed parameters for derived quantities, such as frequency difference ratios. Firstly, we derived the frequency ratios defined as (Roxburgh & Vorontsov 2003)
Here, and are the smooth five-point small frequency separations defined as
and the large separation is
These ratios are useful for model fitting, where they can be used instead of individual frequencies (Lebreton & Goupil 2014; Silva Aguirre et al. 2015) because they are largely insensitive to the stellar surface layers (Kjeldsen et al. 2008a; Ball & Gizon 2014; Roxburgh 2015; Ball et al. 2016). For further details on the use of these ratios we refer to Roxburgh & Vorontsov (2003, 2013), Roxburgh (2005), Otí Floranes et al. (2005), and Silva Aguirre et al. (2011, 2013).
Secondly, we calculated the second differences:
which are useful for studying acoustic glitches from the base of the convection zone and the position of the second helium ionization zone (see, e.g., Basu et al. 1994, 2004; Houdek & Gough 2007; Mazumdar et al. 2014). Figure 5 gives an example of the second differences for KIC 6225718 (Saxo2), together with the best-fitting glitch model from Houdek et al. (in prep.). The second differences shown in Figure 5 were computed using Equation 3.20 on the frequencies given in Table 6, and frequency uncertainties were taken as the average of the asymmetric uncertainties.
In computing these derived quantities we used the full posterior probability distributions (PPDs) of the individual frequencies entering in the descriptions, rather than using simply the median value for the PPD of a given frequency. This ensured that any asymmetries, and deviations from a Gaussian shape in general, that might be for the PPDs of the individual frequencies were properly propagated to the description of the derived quantity. The final value and credible interval were then computed from the distribution of the quantity in the same manner as for the parameters describing the model power spectrum. Using the full distribution also allowed us to easily compute the correlations between the above frequency differences and ratios, such that these might be included as a covariance matrix in any fit to the quantities.
The parameter correlations were calculated in a robust way using the median absolute deviation (MAD) correlation coefficient (see Pasman & Shevlyakov 1987; Shevlyakov & Smirnov 2011). The MAD estimator is given by the median of the absolute deviation around the median. We opted for instead of the standard Pearson product-moment correlation coefficient, because the latter would be very susceptible to even a single walker in the MCMC optimization straying away from the stationary solution. The between two parameters and was calculated as follows:
where and are the robust principle variables for and :
Below we present some of the conclusions that can be drawn on the different parameters extracted from the peak-bagging. Results on the rotational splittings and inclination angles will be presented in a separate paper. All results will be made available on the KASOC database\@footnotemark.
4.1. Mode frequencies
4.1.1 Frequency uncertainties
A proper understanding of the uncertainties on mode frequencies is important, because they will ultimately limit the precision with which stellar parameters can be estimated from modeling the individual frequencies. We may compare the frequency uncertainties from the peak-bagging with expectations from an analytical maximum likelihood (ML) approach (see, e.g., Libbrecht 1992; Toutain & Appourchaux 1994; Ballot et al. 2008). It should be noted that the ML estimator (if unbiased) reaches the minimum variance bound, in accordance with the Cramér-Rao theorem (Cramér 1946; Rao 1945). Thus one should expect uncertainties at least as large as those from the ML estimator (MLE). The standard way of obtaining uncertainties on a ML estimator is from inverting the negative Hessian matrix, and therefore the standard parameters come from the diagonal elements of the resulting variance-covariance matrix. The Hessian itself is obtained from the matrix of second derivatives of the log-likelihood function with respect to the parameters. Assuming an isolated mode as described by Equation 3.1 and a likelihood function as given in Equation 3.9, one may follow Libbrecht (1992) and Toutain & Appourchaux (1994