The Octave (Birmingham  Sheffield Hallam) automated pipeline for extracting oscillation parameters of solarlike mainsequence stars
Abstract
The number of mainsequence stars for which we can observe solarlike oscillations is expected to increase considerably with the shortcadence highprecision photometric observations from the NASA Kepler satellite. Because of this increase in number of stars, automated tools are needed to analyse these data in a reasonable amount of time. In the framework of the asteroFLAG consortium, we present an automated pipeline which extracts frequencies and other parameters of solarlike oscillations in mainsequence and subgiant stars. The pipeline uses only the timeseries data as input and does not require any other input information. Tests on 353 artificial stars reveal that we can obtain accurate frequencies and oscillation parameters for about three quarters of the stars. We conclude that our methods are well suited for the analysis of mainsequence stars, which show mainly pmode oscillations.
keywords:
methods: data analysis – stars: mainsequence – stars: oscillations1 Introduction
Stars with subsurface convection zones, like the Sun, display acoustic oscillations. The stochastic excitation mechanism limits the amplitudes of the oscillations to intrinsically weak values. However, it gives rise to a rich spectrum of oscillations. The excited pressure (p) modes probe different interior volumes, with the radial and other low angulardegree modes probing as deeply as the core. This differential penetration of the oscillations allows the internal structure and dynamics to be inferred as a function of depth. Seismic studies of the Sun have indeed proven to be very powerful in inferring its internal structure.
The fact that the solarlike oscillations have such small amplitudes has made observations of these oscillations in stars other than the Sun very challenging. Over the past few years asteroseismic observations of mainsequence stars up to evolved redgiant stars have been made using Doppler velocity measurements from groundbased spectrographs, e.g. ELODIE (Baranne et al., 1996), CORALIE, HARPS (Queloz et al., 2001), UCLES and UVES (D’Odorico, 2000), and photometric spacebased instruments such as WIRE (Buzasi, 2000), MOST (Matthews et al., 2000) and CoRoT (Baglin et al., 2006). This has led to detections of solarlike oscillations in more than ten mainsequence stars and of the order of one thousand red giants. For a recent, but preCoRoT, review of these results see Bedding & Kjeldsen (2008). CoRoT results for mainsequence stars are presented by e.g., Michel et al. (2008); Appourchaux et al. (2008); García et al. (2009), while De Ridder et al. (2009) and Hekker et al. (2009) present first CoRoT results for red giants.
The NASA Kepler satellite was launched successfully into an Earth trailing orbit on March 7, 2009. The satellite contains a Schmidt telescope with a 0.95m aperture and a 105deg field of view, equipped with a highly sensitive photometer with a spectral bandpass from 400 to 850 nm. It is designed to continuously and simultaneously monitor 100 000 stars brighter than 14 magnitude. Kepler will be pointed towards the constellations Cygnus and Lyra during the entire mission, which has a nominal length of 3.5 years. For most stars, data will be integrated over 30 minutes, while for approximately 512 stars at a time, data with a 1minute cadence will be obtained. Although the driving goal for the development of Kepler is to observe transiting Earthlike exoplanets, these observations are very well suited for asteroseismology, including lowamplitude solarlike oscillations. Even so, asteroseismology will contribute to the exoplanet investigations by determining radii of the planethosting stars, which are needed to extract the planetary radii. The radii of stars exhibiting solarlike oscillations can be obtained using the difference in frequency between modes with consecutive radial orders, i.e., the large separation (). This is a measure of the sound travel time across the star, which depends on the density of the star. See Stello et al. (2009) for an overview of current approaches to obtain radii. The asteroseismic potential of Kepler is described in more detail by ChristensenDalsgaard et al. (2008).
Apart from the determination of radii, the detection of solarlike oscillations in stars at different epochs along stellar evolutionary life cycles offers the prospect to test theories of stellar evolution and stellar dynamos for many stars. The input data for probing stellar interiors are the mode parameters. Accurate mode parameters are a vital prerequisite for robust, accurate inference on the fundamental stellar parameters.
We expect in total of the order of a thousand solarlike stars to be observed by Kepler in short cadence. Shortcadence oscillation data are needed to observe solarlike oscillations in mainsequence stars and subgiants as these occur at frequencies of the order of a fewhundred up to severalthousand microHertz. The shortcadence (1 minute) data have a Nyquist frequency of 8300 Hz, while the Nyquist frequency of the longcadence (30 minute) data is 275 Hz.
In preparation for the Kepler mission, the asteroFLAG consortium has developed automated tools to analyse solarlike oscillations in mainsequence stars (e.g. Huber et al., 2009; Mathur et al., 2009; Mosser & Appourchaux, 2009), and tested these tools extensively, see e.g., Chaplin et al. (2008); Stello et al. (2009). Automated tools are needed to cope with the large number of stars we expect to be observed with Kepler. Here, we present an automated pipeline, built to determine oscillation parameters of solarlike oscillations in mainsequence stars and subgiants, which we describe in Section 3 (some mathematical details are deferred to Appendix A). We compare the results (Section 4) of the automated analysis with the input parameters used for simulations of realistic artificial data of a few hundred stars, prepared as if they were observed with Kepler in shortcadence (Section 2).
2 Simulated data
The simulated timeseries are based on stellar parameters available in the Kepler Input Catalogue (KIC) (Brown et al., 2005) for the mainsequence and subgiants commissioning and survey targets. For these simulations, stellar parameters are randomly chosen within the expected formal and systematic errors around their KIC values and used as inputs to the model grid prepared for the Aarhus Kepler pipeline (Quirion et al. (2009), in preparation). The resulting parameters and model frequencies have then been used for the simulations. Rotation effects, granulation, activity and white noise corresponding to the brightness of the target have been added. Also the lifetimes of the oscillation modes have been varied. All time series are 30 days long with a 1minute cadence.
The stellar models were generated with the Aarhus stellar evolution code (ChristensenDalsgaard, 2008b) using the OPAL equation of state (Iglesias & Rogers, 1996) along with the Grevesse & Noels (1993) solar mixture using the OPAL and Alexander & Ferguson (1994) opacity tables. The frequencies of the p modes were calculated using the adiabatic pulsation code ADIPLS (ChristensenDalsgaard, 2008a). The timeseries are generated using a combination of the asteroFLAG and Aarhus simulators (Chaplin et al., 2008; Stello et al., 2004).
3 Methodology
We developed an automated pipeline to obtain the following (oscillation) parameters of mainsequence and subgiant solarlike oscillators from Fourier spectra of the timeseries observations:

frequency range of oscillations,

frequency at which maximum oscillation power occurs,

parameterisation of the background of the entire Fourier spectrum,

average large frequency separation between consecutive radial orders,

maximum mode amplitude and amplitude envelope of the oscillations,

linewidth (lifetime) at the frequency of maximum power of the oscillations,

individual frequencies.
Our methods to determine these parameters are described below, with some mathematical details and error calculations in Appendix A.
We stress here that this pipeline only uses the timeseries data as input and does not require any other information.
3.1 Frequency range of the oscillations
We are looking for highorder, lowdegree, solarlike (pmode) oscillations, the frequency () of which we expect to follow approximately the asymptotic relation (Tassoul, 1980). In the present study we use the following version of the relation:
(1) 
where is the radial order and the angular degree. is the large separation, which is sensitive to the sound travel time across the star and is a constant sensitive to the surface layers. is related to the small separation () between adjacent modes = 0 and = 2 by the expression 6. Because we deal with photometric data it is unlikely that we can observe = 3 modes due to cancellation effects.
We know from theoretical models and observations in the Sun and other stars that , and depend slightly on frequency and angular degree. Because the changes in are usually relatively small, we consider constant to a first approximation, and search for a frequency range in which the power spectrum has peaks at nearequidistant intervals.
From 200 Hz up to the Nyquist frequency, the power spectrum is divided into windows of variable width () depending on the location of the central frequency of the window (), with separated by /4. The frequency is used as a proxy of and is therefore expected to scale with the acoustic cutoff frequency. Hence, is defined as = (/) , with = 3100 Hz, i.e., the central frequency of the oscillations in the Sun, and = 2000 Hz, the expected width of the frequency interval over which we may find oscillations in the Sun, were the Sun to be observed as a bright star with Kepler.
To find equidistant frequency peaks, we compute the power spectrum of the power spectrum (PSPS), which is equivalent to the autocorrelation of the time series, in each frequency window (see Fig. 1 for an example of a PSPS). Subsequently, we check for the presence of features at predicted values of /2, /4 and /6. The predicted value of is obtained from (see Stello et al. (2009) and Hekker et al. (2009)), and we allowed for a 30% deviation from the predicted value. When the probability of the presence of these three features being due to noise is less than 0.2, we interpret this as oscillations being present in the considered window. All windows in which we find equidistant frequency peaks are selected as part of the frequency range of the oscillations. For details on the computation of the probability see Appendix A.
In subgiants, for which we expect oscillations in the frequency range 100  1000 Hz, the assumption of regularly spaced frequencies may no longer be valid due to the presence of gmodes and mixed modes. The increased luminosity of these more extended stars results in higher mode amplitudes (). This is because (/), with and the stellar luminosity and mass, respectively. The value of exponent is of the order of 1, but still debated in the literature, e.g., see Kjeldsen & Bedding (1995) and Samadi et al. (2005). As a result of the increased mode amplitudes we expect oscillations with a good signaltonoise ratio and therefore the presence of prominent peaks in the power spectrum. Therefore, in case we did not find equidistant frequency peaks, we fit a background signal including granulation, activity and white noise to the power spectrum and check whether there is a significant power excess with respect to this fit in the frequency range 100  1000 Hz (see Fig. 6 for examples of such fits). If this is the case, the frequency range of this power excess is taken to be the oscillation frequency range. We note here that before applying this fitting to real data one has to check for possible observational artefacts in the data, which can possibly have a similar signal in the power spectrum.
In case we do not detect any interval with equidistantly spaced frequencies, or significant power excess due to oscillations, we search once more for oscillation frequencies separated by , but now we do not assume to be constant with frequency. To account for this frequency dependency, we stretch (or compress) the frequency axis of the power spectrum slightly. This stretching is performed in such a way as to produce an equidistant pattern of peaks on the stretched, as opposed to the original, frequency axis. The PSPS of the stretched power spectrum will therefore show a stronger (more prominent) signature of the large spacing than the PSPS of the original spectrum.
The stretching depends on the value of and on the frequency range of the oscillations. We assume a maximum of 10 change in over the oscillation frequency range. The maximum stretch () is therefore:
(2) 
Then we compute the stretched frequency () as follows:
(3) 
where denotes the central frequency of the considered frequency range, which will usually be the frequency at which maximum power occurs, i.e., . Furthermore, is an integer which may have both positive and negative values, i.e. negative stretching means effectively compressing the power spectrum. To find the optimum stretch value we search for the value of for which we find minimum probability of the features in the PSPS to be due to noise.
3.2 Background signal
A background signal () consisting of granulation, activity and white noise is fitted to a binned power spectrum where we computed the average power over independent bins. The frequency range of the oscillations is excluded. Granulation and activity are represented by power laws, from which we obtain the time scales ( and ) and power ( and ) of both phenomena respectively (see Eq. 4). The granulation exponent is left as a free parameter, while the activity exponent is fixed to 2. Fixing the activity exponent is justified by the fact that we assume an exponential decay of the activity over time. For the granulation the exponent is a free parameter, as in the original Harvey model (Harvey, 1985) the granulation is modelled with three exponentially decaying power laws. For the present data, we can only fit one power law for the granulation due to the limited resolution and the input in the simulations, and therefore we do not assume exponential decay, i.e., fix the exponent at 2. Note that the background in the simulated data on which we tested our pipeline was comprised of two power laws. Should we find that more than two power laws are needed in real data, it will be straight forward to modify the code accordingly. In addition to the power laws, we add an offset which contains mostly white noise. However, in cases where no oscillations can be detected some oscillation signal might also be present in this offset. The final form of the background model used is:
(4) 
For the input parameters we chose for and the maximum power of the binned power spectrum and 0.001 times this value, respectively. Furthermore, the inputs for and were 100 000 and 1000 seconds, while the input value for was the mean power at high frequencies outside the oscillation range. As a first estimate, we chose to be equal to 2. To obtain the optimal fit, we vary the input parameters slightly. We randomly select one of the fitting parameters and multiply this by:
(5) 
where, denotes a random number that has a normal distribution with a mean of zero and standard deviation of one. To minimise boundary effects from the excluded oscillation frequency range we vary this range by extending or shortening it on both sides by 1/6 of the length of the original frequency range. After repeating this 200 times for each excluded oscillation range, the fit with the lowest is used as the bestfitting background fit.
In a few cases, the fitting with two power laws does not work properly. This is because only one decaying profile is visible in the data, either due to the presence of the oscillations at the same frequency as the hump of the second decaying profile, or due to a too low a signaltonoise ratio, i.e., high white noise or low signal, or a combination of the two. In these instances we fit only for one power law and the offset . This does not provide us with an optimal fit at low frequencies (below 10 Hz), and the parameters cannot be used to infer properties of granulation and activity. However, at higher frequencies ( 100 Hz), where the oscillations reside the single powerlaw fit provides a reasonable estimate of the background. The standard deviations of the fitting parameters are used as errors.
3.3 Average large separation
For the estimation of the large separation () we compute the PSPS in the frequency range of the oscillations. Here, we take into account that depends on frequency and compute the PSPS of a power spectrum with a stretched frequency axis. Determining from the stretched power spectrum provides a more reliable measure of and if required an estimate of the gradient of the large spacing with , / can be made. For more details on the stretching see the last paragraph of Section 3.1. The derivation of the gradient of the large separation is presented in Appendix A. In the PSPS (see Fig 1) we determine the position of the /2 and /4 features. The centroids and uncertainties of these features are computed in two ways. In the first method, we determine the power weighted centroids of the features in the PSPS and their errors are computed as the standard deviation of grouped data (see Eq. 19 in Appendix A). In a second method we compute the Bayesian posterior probability of the points in the PSPS, using the same equations as for the individual frequencies, i.e., Eqs. 8  10, which are discussed in Section 3.6 (see also Broomhall et al. (2009) and Appourchaux et al. (2009). Using these probabilities we compute the posterior weighted centroid of the feature. The interval with a probability of the feature not being due to noise higher than 68.27%, i.e., 1 in a Gaussian distribution, is used as the uncertainty interval. Finally, for both methods, we determine by computing a weighted average of /2 and /4.
We also compute from an autocorrelation of the full oscillation frequency range and from an autocorrelation using the individual oscillation frequencies determined with the Bayesian approach only (again, see Section 3.6). A Gaussian is fitted to the feature at in the respective autocorrelations and the width of the Gaussian is used as an estimate of the error.
3.4 Maximum mode amplitude, amplitude envelope and frequency of maximum amplitude
Our next package provides estimates of the maximum mode amplitude, and the modeamplitude envelope as a function of frequency. Our results are scaled to be equivalent radialmode amplitudes.
In summary, for method I we began by subtracting the background fit from the power spectrum. The resulting, residual power spectrum is averaged over the range occupied by the modes using a boxcar filter of width . Next, we multiply this averaged, residual spectrum by the large frequency spacing , and finally we divide by a constant factor, , to allow for the effective number of modes in each slice of the spectrum. The value of is chosen so that the above procedure gives observational estimates, as a function of frequency, of the power envelope for radial modes. is computed assuming the presence of 4 frequencies in each interval with = 0, 1, 2, 3, with relative power per mode of 1.0, 1.5, 0.5, 0.03 respectively. is the total power we expect in a interval, i.e., 3.03. There is a slight dependence of on limb darkening and thus on and , but the values change only by a few percent and we ignore those changes here.
The highest value of the power envelope is an estimate of the maximum mode power. The mode amplitude envelope and the maximum mode amplitude are given by the square root of the power envelope, and square root of the maximum power, respectively. The frequency at which the maximum mode power occurs is .
We choose to average the spectrum using a boxcar filter as opposed to the Gaussian filter (of width ) adopted by Kjeldsen et al. (2008). This is because it allows us to estimate very straightforwardly uncertainties for the amplitudes, here in independent frequency ranges of width . For example, we estimate the uncertainty of the maximum mode power as the standard deviation of the powers in each bin in the frequency range 3 that contribute to the estimated maximum power. We may then calculate independent averages in ranges on either side of the maximum, to give an estimated power envelope with uncertainties. Uncertainties for the amplitudes follow by remembering that fractional errors on the mode amplitudes are equal to half those on the mode powers.
Our decision to average over 3 was to some extent determined by the following obvious compromise. The narrower the range, the more we avoid smoothing out potentially interesting features of the amplitude envelope, while the wider the range, the less subject we are to fluctuations due to the stochastic nature of the modes. But there is also another important factor, which argues for adopting a wider range: that is to suppress biases in the estimated maximum mode amplitudes when the signaltonoise ratio is quite low.
The frequency at which maximum oscillation power occurs () is computed as the weighted mean frequency of the oscillation power with the error computed from the standard deviation of grouped data (see Eq. 19).
In a second method (hereafter method II), we fit a Gaussian to the binned oscillation power, where the binning is performed over intervals of 2. The height of the Gaussian fit is then converted to amplitude per radial mode by multiplying by /, as per the other approach. We use the standard deviation of the fit parameters to compute the errors. The centre of the Gaussian fit is .
3.5 Linewidth of most prominent modes
We seek a straightforward and robust method for determining from the power spectrum the linewidth shown by the most prominent modes.
Our method relies on the fact that the height in the power spectrum of a solarlike (i.e., damped) mode peak depends not only on the total power of the mode, but also (crucially to our method here) on the linewidth (or equivalently the damping time) of the mode and the intrinsic resolution in frequency of the spectrum. The height, , in units of power per Hertz is well described in both the resolved and unresolved regime by (Fletcher et al., 2006; Chaplin et al., 2009):
(6) 
where is the total power of the mode, is the FWHM linewidth of the mode peak, and is the effective length of the observations. We may reexpress Equation 6 in terms of the intrinsic (or natural) resolution in frequency . Substitution and subsequent rearrangment of the equation then gives the following
(7) 
which is the form required to explain our method. For a range of values of we estimate the ratio of the most prominent radial mode in the spectrum (as explained in the next paragraph below). A plot of versus then yields data following a linear relationship. We fit a straight line to the data, and the intercept on the ordinate in principle provides an estimate of the linewidth, . Evaluation of the spectrum at different is achieved by averaging the spectrum of the full timeseries over different numbers of bins (thereby degrading the intrinsic resolution as required). If is taken to be the effective length of the timeseries, this means that .
We estimate the ratio as follows. We already have an estimate of courtesy of the mode amplitude package in Section 3.4. To estimate the heights in each binaveraged spectrum we simply take the highest power spectral density in the range about . These estimates are only a proxy of the true, underlying , which means that to correctly estimate linewidths we must apply an empirical correction to the results. We found from simulations that a linear correction with both an offset and slope of 0.4 applied to the raw estimates of the linewidth is sufficient for this purpose.
Our simple proxy of may sometimes have been estimated from the most prominent mode (depending on the inclination of the star), when it is the height of the most prominent mode that we require. In our first version of the pipeline, we accept this potential uncertainty, and note that its main effect will be to add some additional scatter to the results. Any bias is taken care of by the empirical correction above.
3.6 Individual frequencies
For the determination of individual frequencies we used a Bayesian approach adapted from Broomhall et al. (2009) and references therein. We want to test whether the power at each frequency in the power spectrum could be the result of a component of a stochastically excited mode ( hypothesis) or due to noise ( hypothesis).
As explained by Appourchaux et al. (2009) we aim to compute the posterior probability of () given the observed data , i.e.,
(8) 
where we assumed that both hypotheses are, a priori, equally probable. For the 2 dof statistics of the power spectrum, the probability of observing given that there is only noise, i.e., the probability of observing if the hypothesis is true, () is:
(9) 
where is the observed power divided by the background.
For the alternative hypothesis , i.e., the probability of observing given that there is signal, we assume that we do not know a priori the mode height . Therefore, we assume that the height can be taken from a uniform distribution between 0 and (see Appendix A for details on the determination of ). We can then compute the probability of observing if the hypothesis is true () as follows (Appourchaux et al., 2009):
(10) 
Frequencies with a posterior probability less than 0.5% are taken to be candidate oscillation frequencies. The 0.5% posterior probability was chosen based on 1000 Monte Carlo simulations performed with the asteroFLAG code (Chaplin et al., 2008) of solar analogs, for which we computed the fraction of real detections for different posterior probabilities (see Fig. 10). The final frequencies () were computed using the parameter estimation:
(11) 
For the integration range, we use the frequency range for which we found that the posterior probability to find signal was larger than 68.27%, i.e, 1 in a Gaussian distribution. We also used this interval as the estimated error. We tested this error estimation by performing 1000 Monte Carlo simulations of one single stochastically excited mode with flat background noise. For each simulation we computed the final frequency and its error. Then we expressed the offset between the computed frequency and input frequency in terms of its error. We see that for 77% of the tests the offset is within 3 and for 89% of the tests it is within 5.
In cases where more than three significant oscillation frequencies could be detected, we used these individual frequencies to compute the large separation from the autocorrelation of the frequencies.
3.6.1 Small separations
We have investigated for how many stars we might be able to find the small separation (), i.e., for how many stars we could see more than two ridges in the échelle diagram. This was the case for less than 10% of the stars. This low number is most likely caused in part by the fact that we set the threshold posterior probability at only 0.5%, which reduces the false alarm rate, but also the number of identified frequencies. Therefore, for most mainsequence stars only two ridges are present in the échelle diagram. Because of the low percentage of stars for which we might be able to identify the small separation with the strict thresholds currently applied, we do not include such computation in the automated pipeline presented here. In a further analysis using peakbagging techniques, will be obtained. These results will be presented by Fletcher et al. (2009), in preparation.
4 Results
We are able to detect oscillations in 260 out of the 353 artificial stars, i.e, nearly 75%. For these stars we estimated the oscillation parameters and background as described in the previous section, which we then compared with the input values used to create the artificial data.
4.1 Oscillation parameters
In Figs. 2 and 3 we compare our results for and with the values computed from the input mass (), radius () and effective temperature (), using the scaling relations (Kjeldsen & Bedding, 1995):
(12) 
(13) 
The values obtained for both (Fig. 2) and (Fig. 3) are in good agreement with the input values, for each of the implemented methods, although a slight overestimation is present in the determined values from method I at higher frequencies where the height of the oscillations decreases (Chaplin et al., 2009). For method I, 54% of our values agree within uncertainties with the input values, while this percentage increases to 97% within 3 times the computed uncertainties. For method II, we find that 37% of our values agree with the input values within their uncertainties, which increases to 91% agreement within 3 times the uncertainties. In 96% of the stars for which we could detect oscillations we found with both methods.
For we found that 93%, 90%, 86% and 97% of our values agree with the input values within 5%, for results computed with the weighted mean of the features in the PSPS, the Bayesian probabilities in the PSPS, autocorrelation of full oscillation frequency range and autocorrelation of individual frequencies, respectively. The uncertainties for computed with the Bayesian probabilities in the PSPS are larger and seem more realistic than for all other methods. For this method 94% of our values agree with the input within 3 times the uncertainties, while this is only 44%, 20% and 7% for the other methods, i.e., weighted mean of the features in the PSPS, autocorrelation of full oscillation frequency range and autocorrelation of individual frequencies, respectively. For the first three methods we find in all stars for which we detect oscillations. For computed with the individual frequencies we have results for 60 % of the stars. Due to the better error estimate the results of the Bayesian probabilities in the PSPS are most reliable. Despite their underestimated errors, the values computed with the other methods are in more than 90 % of the cases compatible (within 3 times the uncertainties) with the Bayesian values.
In Fig. 4 our results for the maximum amplitude per radial oscillation mode are shown as a function of the input maximum amplitude per radial mode. For comparison we also computed estimates of the maximum amplitude using the method of Kjeldsen et al. (2008). The results are consistent with the onetoone relation, and for each method 90% of our amplitude values are consistent with the input values within 3 times the computed uncertainties.
Also, we computed the width of the frequency peaks in the power spectrum. The results are shown in Fig. 5. The input values of the line width follow the relation , and we see in Fig. 5 that for stars brighter than 9 magnitude (black dots) our results for are qualititatively in agreement with this relation. Good signaltonoise ratio is required for this method and for fainter stars the scatter in the results becomes considerably larger.
Four examples of background fits to power spectra with oscillations at different frequencies are shown in Fig. 6, all of which have a of the order of 1. We also compare in Fig. 7 our fitted values of the granulation parameters and with the artificial input values. Here, we see that the results follow the onetoone relation with the input value, but with nonnegligible scatter.
Note that for stars where a detection was made, we were not always able to determine the oscillation parameters with all methods described in Section 3. For each parameter we have at least one method that produces a result for all stars with detected oscillations, but for some methods we have results for fewer stars, down to 60% of the stars with detected oscillations. The quoted percentages are always computed for stars with a result for the considered method.
4.2 (Non)detections of oscillations
Next, we investigate empirically which parameters are of importance for the detection of oscillations in the data. First, we consider the apparent magnitude distribution of the stars with and without detected oscillations, see Fig. 8. As expected the percentage of stars for which we can detect solarlike oscillations decreases for fainter stars.
We fitted the background for all stars, independent of whether we did or did not detect any oscillations. From a comparison of the distribution of the fitted parameters, we find that for stars in which we could not detect any oscillations the offset ( in Eq. 4) is on average larger than for stars in which we could detect oscillations, while the exponent is typically lower. These distributions are shown in Fig. 9. These results are not unexpected since for stars for which we could not detect oscillations, the offset contains both noise and signal, while for stars with detected oscillations the offset mainly consists of noise. The latter can be seen in the bottom panel of Fig. 9, where we plot the offset as a function of the input . We indeed see that for stars with input 3000 Hz for which we did not detect oscillations the offset is higher than for stars for which we could detect oscillations in this frequency range. The exponent influences the slope of the granulation in a loglog plot of a power spectrum. An increase in the offset will decrease the slope and therefore the exponent . From these distributions it might be possible to obtain upper limits on some oscillation parameters. We consider such an investigation beyond the scope of this paper.
5 Discussion and Conclusions
With the methods described in Section 3, we could detect solarlike oscillations and their parameters in 260 out of 353 artificial mainsequence stars and subgiants and individual frequencies in 154 stars, not further discussed here. In general, we have tried to be very cautious to reduce the number of false detections. Special care is taken in the identification of the oscillation frequency interval as an incorrectly identified frequency range will imply missidentifications for all oscillation parameters and the background fitting. Furthermore, parameters such as , and amplitude per radial mode are determined with two or more (independent) methods.
The input values for and are reproduced by our analyses for the majority of stars. Also, for the amplitudes per radial mode, our values are in agreement with the input values.
The widths and thus the life times of the modes can be determined with reasonable accuracy for Kepler stars brighter than 9 magnitude. Our method does not contain detailed fitting, nor does it take into account that lifetimes vary for modes of different degrees. Nevertheless, for the bright stars we do find values consistent with the input relation (Chaplin et al., 2009).
An accurate determination of the value of the background at the oscillation frequencies is important for two reasons. First, the background level is taken into account in the extraction of oscillation parameters such as the amplitude per radial mode, and, secondly, it provides information on the power and time scales of atmospheric parameters. From the fact that we can obtain oscillation parameters which are consistent with the input parameters, we can on the one hand infer that our background estimate in the oscillation frequency range is accurate enough to determine oscillation parameters. On the other hand we still see scatter in the determined time scale and power of the granulation around the input values. The scatter in the granulation parameters does not mean per se that the background level in the frequency interval is uncertain, as we fit a function with 6 parameters, but the parameters should be treated with caution when using them for further investigations of activity and granulation.
In terms of our sensitivity to detect oscillations, we found clear evidence that the percentage of stars for which we can detect oscillations decreases for fainter stars. Also, we found evidence that it is harder to detect solarlike oscillations in cooler ( 5500 K) mainsequence stars than in the hotter ( 5500 K) ones. This is because in the simulations (as in real data) the pulsation amplitudes scale with luminosity, with a weaker dependence on temperature.
In conclusion, the analysis tools compiled into a pipeline presented here proved to identify oscillations for a large fraction of artificial mainsequence stars. For the majority of these stars we determine oscillation parameters within 3 of the input values. The existence of such pipelines will be important to be able to perform an asteroseismic analyses on the many stars we expect to become available from Kepler.
Acknowledgments
We are grateful to the International Space Science Institute (ISSI) for support provided by a workshop program award. This work has also been supported by the European Helio and Asteroseismology Network (HELAS), a major international collaboration funded by the European Commission’s Sixth Framework Programme. SH, WJC, AMB, YPE, STF, RN acknowledge the support by STFC Science and Technology Facilities Council.
Appendix A Methodology extension
Here we provide additional (mathematical) information on the methodology described in Section 3, including determination of errors.
Probability of peaks in PSPS The probability of the presence of equidistant frequency peaks in the PSPS being due to noise is computed as follows. We compute the probability () that a random variable from a 6 degrees of freedom (dof) distribution is larger than the average height of the three peaks at /2, /4 and /6 in the PSPS. Each of these peaks has 2 dof, hence the 6 dof.
Next, we compute the probability of this occuring by chance at least once over the full bins of the PSPS. We must also take account of the fact that in practice we oversample the PSPS by a factor of 10, so all bins are not independent. The resulting probability is given by:
where = 3 has been shown to provide a robust empirical correction for the effect of the oversampling (e.g., Chaplin et al., 2002; Gabriel et al., 2002). Finally, the probability that the peaks are not due to noise is just .
A similar procedure is used to compute the probability of only one peak in the PSPS, such as we use in the computation of the large separation. In these cases is computed as the probability that a random variable from a 2dof distribution is larger than the height of the considered peak in the PSPS.
Gradient of the large spacing Although results are not discussed in the paper, we can estimate / from the stretching as follows. First, we have
(14) 
with the average large spacing over the frequency or range of interest. Now, we may estimate / by differentiating Eq. 3. To differentiate the second term on the right hand side in Eq. 3 we use the substitution = 1, with , which gives:
(15) 
So, after including the differential of the first term on the righthand side, we have
(16) 
from which we obtain as a function of frequency:
(17) 
Finally, we find that the change in as a function of is:
(18) 
The bestfitting = therefore provides a direct estimate of /.
Standard deviation of grouped data The standard deviation of grouped data is used to compute errors in computed from the power weighted centroids in the PSPS, where we interpret each feature in the PSPS as compiled of a number of bins with a certain height () and midpoint ().
(19) 
The same formula is used to compute the error on from the frequencybinned oscillation power as computed in method I for the amplitudes per radial mode. Here, the total power and central frequency of each bin are interpreted as and , respectively.
Bayesian signal hypothesis For the computation of the probability of observing if the hypothesis is true, i.e, (Eq. 10), we need to integrate over an interval between 0 and . To determine we have smoothed the power spectrum over microHertz. then equals the maximum height of the smoothed spectrum minus the mean background noise level.
To determine the optimum we performed Monte Carlo simulations and determined the false detection rates. Spectra were generated using the asteroFLAG code (Chaplin
et al., 2008) for 1000 different stars with random inclinations. We then used different values of to determine and from the obtained candidate frequencies we also determine the ratio of the number of candidates that are actually modes to the total number of mode candidates. The higher this ratio the lower the proportion of false detections.
We also determined the total number of modes that could be detected in each case to ensure that the method was producing a reasonable number of mode candidates.
Note that a mode was counted as a detection if it lay within 2 linewidths of the input frequency and if the posterior probability was less than 0.005. We repeated the simulations for modes with different widths including 0.3, 1.0, 1.7, 2.4, 3.1 and 3.8times solar widths for oscillations in the range 2000  4500 Hz. The optimum value of was found to be 10 Hz, even for the smallest line width.
Similar simulations were performed for stars whose oscillations lay in the range 200  450 Hz. It was found that for frequencies in this oscillation range it was more appropriate to smooth over a narrower to determine . If the oscillations frequencies are 450 Hz we determined by smoothing over = 1 Hz.
References
 Alexander & Ferguson (1994) Alexander D. R., Ferguson J. W., 1994, ApJ, 437, 879
 Appourchaux et al. (2008) Appourchaux T., Michel E., Auvergne M., Baglin A., Toutain T., Baudin F., Benomar O., Chaplin W. J., Deheuvels S., Samadi R., Verner G. A., Boumier P., 2008, A&A, 488, 705
 Appourchaux et al. (2009) Appourchaux T., Samadi R., Dupret M., 2009, A&A, 506, 1
 Baglin et al. (2006) Baglin A., Auvergne M., Barge P., Deleuil M., Catala C., Michel E., Weiss W., The COROT Team 2006, in Fridlund M., Baglin A., Lochard J., Conroy L., eds, ESA Special Publication Vol. 1306 of ESA Special Publication, Scientific Objectives for a Minisat: CoRoT. p. 33
 Baranne et al. (1996) Baranne A., Queloz D., Mayor M., Adrianzyk G., Knispel G., Kohler D., Lacroix D., Meunier J.P., Rimbaud G., Vin A., 1996, A&AS, 119, 373
 Bedding & Kjeldsen (2008) Bedding T. R., Kjeldsen H., 2008, in van Belle G., ed., 14th Cambridge Workshop on Cool Stars, Stellar Systems, and the Sun Vol. 384 of Astronomical Society of the Pacific Conference Series, Asteroseismology from SolarLike Oscillations. p. 21
 Broomhall et al. (2009) Broomhall A. M., Chaplin W. J., Elsworth Y., New R., 2009, MNRAS, 397, 793
 Brown et al. (2005) Brown T. M., Everett M., Latham D. W., Monet D. G., 2005, in Bulletin of the American Astronomical Society Vol. 37 of Bulletin of the American Astronomical Society, Star Classification for the Kepler Input Catalog: From Images to Stellar Parameters. p. 1340
 Buzasi (2000) Buzasi D. L., 2000, in Teixeira T., Bedding T., eds, The Third MONS Workshop: Science Preparation and Target Selection Experiment Design and Data Reduction for Seismology: Lessons Learned from WIRE. p. 9
 Chaplin et al. (2008) Chaplin W. J., Appourchaux T., Arentoft T., Ballot J., ChristensenDalsgaard J., Creevey O. L., Elsworth Y., Fletcher S. T., García R. A., Houdek G., JiménezReyes S. J., Kjeldsen H., 2008, Astronomische Nachrichten, 329, 549
 Chaplin et al. (2002) Chaplin W. J., Elsworth Y., Isaak G. R., Marchenkov K. I., Miller B. A., New R., Pinter B., Appourchaux T., 2002, MNRAS, 336, 979
 Chaplin et al. (2009) Chaplin W. J., Houdek G., Elsworth Y., New R., Bedding T. R., Kjeldsen H., 2009, ApJ, 692, 531
 Chaplin et al. (2009) Chaplin W. J., Houdek G., Karoff C., Elsworth Y., New R., 2009, A&A, 500, L21
 ChristensenDalsgaard (2008a) ChristensenDalsgaard J., 2008a, Ap&SS, 316, 113
 ChristensenDalsgaard (2008b) ChristensenDalsgaard J., 2008b, Ap&SS, 316, 13
 ChristensenDalsgaard et al. (2008) ChristensenDalsgaard J., Arentoft T., Brown T. M., Gilliland R. L., Kjeldsen H., Borucki W. J., Koch D., 2008, Communications in Asteroseismology, 157, 266
 De Ridder et al. (2009) De Ridder J., Barban C., Baudin F., Carrier F., Hatzes A. P., Hekker S., Kallinger T., Weiss W. W., Baglin A., Auvergne M., Samadi R., Barge P., Deleuil M., 2009, Nature, 459, 398
 D’Odorico (2000) D’Odorico S., 2000, The Messenger, 99, 2
 Fletcher et al. (2006) Fletcher S. T., Chaplin W. J., Elsworth Y., Schou J., Buzasi D., 2006, MNRAS, 371, 935
 Gabriel et al. (2002) Gabriel A. H., Baudin F., Boumier P., García R. A., TurckChièze S., Appourchaux T., Bertello L., Berthomieu G., Charra J., Gough D. O., Pallé P. L., 2002, A&A, 390, 1119
 García et al. (2009) García R. A., Régulo C., Samadi R., Ballot J., Barban C., Benomar O., Chaplin W. J., Gaulme P., Appourchaux T., Mathur S., Mosser B., 2009, A&A, 506, 41
 Grevesse & Noels (1993) Grevesse N., Noels A., 1993, in Kubono S., Kajino T., eds, Origin and Evolution of the Elements Cosmic Abundances of the Elements. p. 14
 Harvey (1985) Harvey J., 1985, in Rolfe E., Battrick B., eds, Future Missions in Solar, Heliospheric & Space Plasma Physics Vol. 235 of ESA Special Publication, HighResolution Helioseismology. p. 199
 Hekker et al. (2009) Hekker S., Kallinger T., Baudin F., De Ridder J., Barban C., Carrier F., Hatzes A. P., Weiss W. W., Baglin A., 2009, A&A, 506, 465
 Huber et al. (2009) Huber D., Stello D., Bedding T. R., Chaplin W. J., Arentoft T., Quirion P., Kjeldsen H., 2009, ArXiv eprints: 0910.2764
 Iglesias & Rogers (1996) Iglesias C. A., Rogers F. J., 1996, ApJ, 464, 943
 Kjeldsen & Bedding (1995) Kjeldsen H., Bedding T. R., 1995, A&A, 293, 87
 Kjeldsen et al. (2008) Kjeldsen H., Bedding T. R., Arentoft T., Butler R. P., Dall T. H., Karoff C., Kiss L. L., Tinney C. G., Chaplin W. J., 2008, ApJ, 682, 1370
 Mathur et al. (2009) Mathur S., García R. A., Régulo C., Ballot J., Salabert D., Chaplin W. J., 2009, A&A, submitted
 Matthews et al. (2000) Matthews J. M., Kuschnig R., Walker G. A. H., Pazder J., Johnson R., Skaret K., Shkolnik E., Lanting T., Morgan J. P., Sidhu S., 2000, in Szabados L., Kurtz D., eds, IAU Colloq. 176: The Impact of LargeScale Surveys on Pulsating Star Research Vol. 203 of Astronomical Society of the Pacific Conference Series, Ultraprecise Photometry from Space: The MOST Microsat Mission. pp 74–75
 Michel et al. (2008) Michel E., Baglin A., Auvergne M., Catala C., Samadi R., Baudin F., Appourchaux T., Barban C., Weiss W. W., Berthomieu G., Boumier P., Dupret M.A., 2008, Science, 322, 558
 Mosser & Appourchaux (2009) Mosser B., Appourchaux T., 2009, ArXiv eprints: 0909.0782
 Queloz et al. (2001) Queloz D., Mayor M., Udry S., Burnet M., Carrier F., Eggenberger A., Naef D., Santos N., Pepe F., Rupprecht G., Avila G., Baeza F., Benz W., Bertaux J.L., Bouchy F., Cavadore C., 2001, The Messenger, 105, 1
 Samadi et al. (2005) Samadi R., Goupil M.J., Alecian E., Baudin F., Georgobiani D., Trampedach R., Stein R., Nordlund Å., 2005, JApA, 26, 171
 Stello et al. (2009) Stello D., Chaplin W. J., Basu S., Elsworth Y., Bedding T. R., 2009, MNRAS, p. L341
 Stello et al. (2009) Stello D., Chaplin W. J., Bruntt H., Creevey O. L., GarcíaHernández A., Monteiro M. J. P. F. G., Moya A., Quirion P., Sousa S. G., Suárez J., Appourchaux T., Arentoft T., Ballot J., Bedding T. R., ChristensenDalsgaard J., 2009, ApJ, 700, 1589
 Stello et al. (2004) Stello D., Kjeldsen H., Bedding T. R., de Ridder J., Aerts C., Carrier F., Frandsen S., 2004, Solar Physics, 220, 207
 Tassoul (1980) Tassoul M., 1980, ApJS, 43, 469