Filtering out activity-related variations from radial velocities in a search for low-mass planets
Variations related to stellar activity and correlated noise can prevent the detections of low-amplitude signals in radial velocity data if not accounted for. This can be seen as the greatest obstacle in detecting Earth-like planets orbiting nearby stars with Doppler spectroscopy regardless of developments in instrumentation and rapidly accumulating amounts of data. We use a statistical model that is not sensitive to aperiodic and/or quasiperiodic variability of stellar origin. We demonstrate the performance of our model by re-analysing the radial velocities of the moderately active star CoRoT-7 () with a transiting planet with a transiting planet whose Doppler signal has proven rather difficult to detect. We find that the signal of the transiting planet can be robustly detected together with signals of two other planet candidates. Our results suggest that rotation periods of moderately active stars can be filtered out of the radial velocity noise, which enables the detections of low-mass planets orbiting such stars.
keywords:methods: statistical, numerical – techniques: radial velocities – stars: individual: CoRoT-7
Activity-related noise caused by rotation, active features, and irregularities of stellar surfaces (e.g. Santos et al., 2010; Dumusque et al., 2011, 2012; Hatzes, 2013; Tuomi et al., 2013b) provide currently the greatest challenges in improving the precision of radial velocity surveys to such an extent that detections of Earth-like planets become possible. Despite being largely induced by rotation, the nature of stellar noise related to activity gives rise to aperiodic and/or quasiperiodic variations that can mimic periodic Doppler signals of Keplerian origin and thus prevent the detections of low-amplitude planetary signals by littering the posterior densities (or periodograms) with spurious signals that can be mistaken for genuine signals of planetary origin. In order to increase the sensitivity of radial velocity surveys, it is necessary to develop statistical models (e.g. for correlated noise: Baluev, 2013; Tuomi et al., 2013b) accounting for these irregular variations. Furthermore, it should only be accepted that a weak planetary signal is detected if it is necessary in describing the radial velocity variations and that these variations cannot be explained by stochastic variability that might be connected to measures of stellar activity and/or intrinsic correlations in the data (e.g. Boisse et al., 2011; Pont et al., 2011).
CoRoT-7 is a moderately active G9 (; [Fe/H] dex; M; ) dwarf star (Queloz et al., 2009; Bruntt et al., 2010) that has a planetary system orbiting it, including one transiting planet corresponding to a valuable benchmark planet for estimating the compositions and structures of such super-Earths (Léger et al., 2009; Queloz et al., 2009). However, determination of the mass of this planet based on high-precision radial velocities has proved difficult (e.g. Queloz et al., 2009; Hatzes et al., 2010; Boisse et al., 2011; Hatzes et al., 2011; Pont et al., 2011) due to the difficulties in detecting the signal in the precense of at least equally significant variations related to stellar activity and rotation (Queloz et al., 2009; Ferraz-Mello et al., 2011; Hatzes et al., 2011) and because modelling approaches based on different assumptions and approaches – pre-whitening or high-pass filtering corresponding to the removal of periodicities estimated to correspond to activity-related phenomena (Queloz et al., 2009; Ferraz-Mello et al., 2011) or modelling starspot-induced radial velocity variations (Lanza et al., 2010; Pont et al., 2011) – have provided a wide range of estimates that are consistent at best but sometimes very diverse (see the estimates listed in Hatzes et al., 2011).
In this work we obtain the HARPS spectra of CoRoT-7 from the European Southern Observatory archive, process them with the Template-Enhanced Radial velocity Re-analysis Application (HARPS-TERRA; Anglada-Escudé & Butler, 2012), and obtain velocity products together with activity indicators. We aim to filter out the activity-related variations from the CoRoT-7 radial velocity data with our noise model. Spurious quasiperiodic signals related to the stellar rotation of roughly 23 d (Lanza et al., 2010) corresponding to the “dominant frequency in the power spectrum” according to Queloz et al. (2009), have been affecting the obtained solutions to the radial velocities. Our goal is the detection of the signal of the transiting planet without any prior knowledge of its orbital period and, more generally, to be able to distinguish between activity-related variability and genuine Doppler signals of transiting planets.
2 Statistical modelling
Stellar activity can contribute considerably to the variations in precision radial velocity data (e.g. Lagrange et al., 2010; Santos et al., 2010; Boisse et al., 2011; Dumusque et al., 2011). It is thus necessary to account for these variations with the statistical model in order to detect low-amplitude signals of planetary origin. Our key point is that activity variations are not strictly periodic and comprise quasi- and/or aperiodic variability that is unlikely to be modelled accurately by using Keplerian periodicities or sinusoids. This is the case even when such variations appear as periodicities in the commonly used Lomb-Scargle periodograms (Lomb, 1976; Scargle, 1982; Cumming, 2004), although such an interpretation might be tempting in practice (e.g. Queloz et al., 2009; Dumusque et al., 2012; Hatzes, 2013) in the context of so-called “pre-whitening” that aims at removing periodogram powers related to activity and/or rotation by subtracting the corresponding sinusoids from the data. Since such variations cannot be expected to be strictly sinusoidal, this procedure gives rise to harmonics of the subtracted signals that need to be subtracted as well in order to decrease the contribution of such features to the data. Such an approach cannot be considered very satisfactory in practice because such a subtraction of sinusoids that are not orthonormal in the case of unevenly sampled time series causes biases to the genuine signals in the data (e.g. Pont et al., 2011), but also, because it requires prior knowledge of the nature of the signals seen in the periodograms, e.g. information from transit observations, such that activity-related spurious signals are subtracted but planetary signals are left intact (Queloz et al., 2009; Ferraz-Mello et al., 2011). We note that more sophisticated periodogram-based methods have been proposed as well, including the generalised log-likelihood periodograms (e.g. Anglada-Escudé & Tuomi, 2012; Baluev, 2013, and references therein) that evaluate the significances of signals based on likelihood-ratio tests and can be applied to arbitrary noise models, and the minimum mean squared error technique for determining the optimal number of signals in a data set (Jenkins et al., 2014) that also provides means for testing whether the signals are stationary in time.
Instead, we adopt the approach of Tuomi et al. (2013b) and assume that a large fraction of activity-induced variability can in fact be filtered out from the velocities by accounting for intrinsic correlations in the data. In practice, this means that the radial velocity noise is not white but has a significant red-noise component even for stars that are known to be rather inactive (Baluev, 2013; Tuomi et al., 2013a, b; Feroz et al., 2014). However, such models for correlated noise might not be sufficient in explaining the variations in radial velocities of moderately active stars such as CoRoT-7. Well-known measures of stellar activity, obtained from the same spectra as the velocities themselves, can be useful (e.g. Pont et al., 2011) to estimate how much the stellar activity contributes to the radial velocity variability. In particular, we consider using the line bisector span (BIS) that respond to spots and other surface inhomogeneities co-rotating on the stellar surface (Saar et al., 1998; Desort et al., 2007), line full-width at the half-maximum (FWHM), and the -index based on the CaII H&K lines as reasonable measures of activity (e.g. Boisse et al., 2011).
The above considerations suggest, as a first-order approximation, a model for the radial velocities that can be written (see also Tuomi et al., 2013b) as
where the measurement made at time is modelled by using the reference velocity ; a linear trend quantified by using the parameter that is can be present in the radial velocities of any given target due to the possible existence of a long-period stellar and/or substellar companion; a white-noise component that we assume to have a Gaussian density with a zero mean and a variance of , where represents the estimated instrument noise and is a free parameter of the model; a correlated noise component that we describe according to a th order moving average (MA) model with an exponential smoothing in the time-scale of (Baluev, 2013; Tuomi et al., 2013b); and linear correlations with the activity indices quantified by using the parameters .
Although it is possible that the linear correlations between velocities and activity indices described above are not sufficient as the relationship between those two cannot be expected to be linear (or in fact to follow any simple functional description), we adopt such a description as a first-order approximation. While we do not expect this model to be optimal, if velocity variations are indeed impacted by activity this will be better than modelling the velocity variations without such a component. Moreover, the MA() term might not be an optimal description of the correlated noise either. For instance, Baluev (2013) and Feroz et al. (2014) describe the noise such that they set all equal and take the summation over all , . We do not consider this a realistical description. First, such a model corresponds to a violation of causality as the measurements made after the th one contribute to its noise, although this might be justifiable as simply a statistical description without such an interpretation. Second, the assumption that the parameter is the same for all might be too restrictive. Furthermore, using such models requires inversions of matrices, where is the number of measurements, which becomes computationally heavy even for moderately high . We choose an MA(1) model that does not increase the computational requirements and appears to be a reasonably good noise model in practice (Tuomi, 2014; Tuomi et al., 2014) and because increasing the number of MA components did not improve the model. Moreover, as we measure activity-related information based on three activity indicators, namely BIS, FWHM, and -indices, we set and in Eq. (2).
The model described in Eq. 2 implies a likelihood model for the measurements that can be calculated for the th measurement based on the obtained values , where . We used the priors as discussed in (Tuomi & Anglada-Escudé, 2013) such that the eccentricity prior was defined as and the jitter prior as , where 2 ms. See Tuomi & Anglada-Escudé (2013) and Anglada-Escudé et al. (2013) for discussion and justification.
3 DRAM samplings
We analyse the radial velocity data sets by applying the delayed-rejection adaptive-Metropolis (DRAM) algorithm (Haario et al., 2006) that is a generalisation of the adaptive-Metropolis algorithm (Haario et al., 2001) based on the Metropolis-Hastings algorithm (Metropolis et al., 1953; Hastings, 1970).
The basics of the DRAM algorithm are as follows. We first draw a proposal () from a proposal density . This density is a function of the current state of the chain (typically centered at it). The proposal is then accepted with the probability
where is the posterior density given the data . If the proposal density is a symmetric one, as is the case when applying a multivariate Gaussian density, the acceptance probability depends only on the posterior ratio of the two vectors because . Assuming that the new vector is rejected, another vector can then be proposed by using another proposal density (Haario et al., 2006). In addition to the current state and the newly proposed vector , this new proposal density can depend on the rejected vector as well. The new proposal is then accepted with probability
This process can be repeated for an arbitrary number of rejected proposals in the above manner. The equation for the general case, i.e. th acceptance probability, can be found in Haario et al. (2006).
Analyses of radial velocity time series in a search for periodic signals are typically complicated because the period parameter (its logarithm in our analyses) has a highly multimodal likelihood function and thus posterior density. Therefore, it is necessary to perform the DRAM samplings in such a way that the chains visit all the modes in the posterior regularly and that the highest maxima are identified robustly in the samplings. For this reason, we choose the proposal densities such that is a multivariate Gaussian density with a covariance matrix . The second proposal is then a multivariate Gaussian as well with a covariance of that is obtained by multiplying the row and the column corresponding to the period parameter of the signal by a factor of such that the width of the proposal density is decreased in the dimension of the period to enable the chains to find the narrow probability maxima in the period space. We allowed three delayed-rejection steps before finally rejecting a proposed vector. Such a practice has been proposed by Haario et al. (2006) albeit in a different context.
When searching for periodic signals in the data, we used tempered samplings such that a posterior raised to the power of was used instead of the original posterior. This ensured, by decreasing the relative heights of the modes in the posterior, that the corresponding Markov chains visited all the relevant areas in the period space. This is important because the posterior density is typically highly multimodal in such examples of analyses of radial velocity data (e.g. Tuomi, 2014; Tuomi et al., 2014). DRAM samplings were not used to obtain point and uncertainty estimates for the model parameters but only for ensuring that there were no additional significant maxima in the period space. For parameter estimation purposes, we set to draw samples from the original posterior density with the adaptive-Metropolis algorithm.
4 Searching for signals in HARPS data of CoRoT-7
The CoRoT-7 Doppler data have been obtained over two rather distinct observing runs between 2454527 - 2454885 JD and between 2455939 - 2455965 JD. We call these simply the first ( after removing few measurements corresponding to FWHM outliers) and second observing run (), respectively. These velocities are tabulated in the appendix. Because there is a gap of almost three years between these two observing runs, we believe that it cannot be expected that the noise properties, including all possible correlations as modelled according to Eq. (2), are equivalent in these runs due to differences in stellar activity, e.g. the evolution of spot patterns, and the designs of the observations in terms of e.g. cadence and exposure time. Therefore, we assume that the noise parameters of these two observing runs are independent of one another. Even if they are not strictly independent, we believe that treating them as free parameters for each run, as one would for data from different instruments, enables us to determine whether there are indeed differences that should be accounted for and to obtain a better statistical model than when making the much stronger assumption that the noise properties were not evolving as a function of time, which is unlikely to be the case (e.g. Queloz et al., 2009). Due to this choice, our model contains eight independent and two common parameters for the two observing runs, which yields a baseline model with 18 parameters when there are no Keplerian signals in the model.
We started the analyses of the HARPS velocity data by performing tempered samplings of a one-Keplerian model. These samplings enabled us to identify a signal at 3.7 d as a unique probability maximum in the period space together with local maxima corresponding to its double and a daily alias at a period of 1.4 d (Fig. 1, right hand panel). Similar samplings, albeit with parameter set to values closer to unity, of a two-Keplerian model yielded a maximum at a period of 9.0 d for the second signal (Fig. 1 middle panel) together with local maxima at 0.86 d and 22 d. After this, additional samplings revealed a unique third signal in the data at a period of 0.86 d (Fig. 1, right hand panel) without considerable local maxima. This sequence of signal detections was consistent with the signal detection criteria of Tuomi (2012) as the log-Bayesian evidences were increased at these steps by 43.0, 13.2, and 19.0. These log-Bayesian evidences correspond to model probabilities that increase by factors of , , and , respectively, when assuming equal prior probabilities for each model. The phase-folded signals are shown in Fig. 2. We did not detect any additional signals in the data.
The observed rotation period of the star of 23 d that has been detected in the radial velocities in earlier studies (Queloz et al., 2009) did not correspond to the global maximum for any of the models used in our analyses. Although it can be seen as a local maximum in the DRAM search for the second most significant signal (Fig. 1, middle panel), it did not correspond to a significant solution even when we started additional samplings in its vicinity. Furthermore, we did not observe significant signals at that period even after accounting for the three candidate planets orbiting the star. This indicates that the statistical model (Eq. 2) accounts for the aperiodic and quasiperiodic variations corresponding to the stellar activity sufficiently well and makes it unnecessary to model the rotation period (and/or its harmonics) as additional signals in the data (Queloz et al., 2009; Boisse et al., 2011). We interpret this result as a suggestion that the noise model is an adequate description of the activity-related variations and, more generally, that it might be possible to remove such variability from radial velocity data of at most moderately active stars. This could therefore help in the detection of low amplitude radial velocity variations corresponding to low-mass planets around such stars.
We did not find significant correlations between the radial velocities and BIS values or the -indices. However, the FWHM appears to be connected to the velocity variations as we obtain 0.094 [0.016, 0.198] for the first observing run and 0.065 [-0.013, 0.192] for the second when using the maximum a posteriori (MAP) estimates and the 99% credibility intervals. The former correlation is significantly present in the data considering that the 99% interval does not overlap with zero but the latter is significant in this sense with respect to only a 95% credibility interval.
The excess noise, as measured by obtaining estimates for the parameters , is likely different between the two observing runs. We obtained estimates of 3.89 [3.07, 4.97] ms and 1.88 [1.42, 3.21] ms for the first and second observing runs, respectively. Similarly, the first observing run contained significant intrinsic correlations because we obtained 0.88 [0.68, 1], whereas the same parameter was consistent with zero for the second observing run. The time-scale () of these intrinsic correlations was found to be roughly 6 d (see also Baluev, 2013; Tuomi et al., 2013b). Not accounting for these differences might weight the first observing run too much and the second one too little and lead to biased results.
Finally, we obtained estimates for the proposed system of three planets and show them in Table 1.
We note that according to our tests, samplings with the DRAM algorithm proved more robust than the corresponding samplings with the adaptive-Metropolis algorithm. We obtained the results by generating Markov chains with few members with DRAM, whereas we could obtain consistent results with the adaptive-Metropolis algorithm with roughly ten times longer chains. This is caused by the fact that the DRAM algorithm enables the chains to visit narrow high-probability regions in the parameter space more easily. This is because the sampling algorithms that are capable of exploring the posterior locally are more efficient in exploring the period space of radial velocity signals in practice due to a multimodality of the corresponding probability densities.
We have presented an analysis of the HARPS-TERRA velocities of CoRoT-7 that has been proposed to host a system of two planetary companion, possibly three, out of which the innermost one with an orbital period of roughly 0.85 days is transiting. According to our results, the three signals can be detected based on the HARPS-TERRA radial velocities by performing DRAM searches for posterior maxima in the period space without any prior knowledge of the period of the transiting planet (e.g. Queloz et al., 2009) that has the weakest radial velocity signal. Furthermore, with the statistical model accounting for both intrinsic correlations in the data and linear correlations with activity indices (Eq. 2), we did not observe the stellar rotation period at 23 days (Queloz et al., 2009) as a global maximum with models containing any number of Keplerian signals. This suggests that for moderately active stars such as CoRoT-7 with , such modelling of correlations automatically filters out activity-induced signals and variability related to the stellar rotation. We note that although we used the BIS, FWHM, and -index values as indicators of stellar activity, other measures of such activity should be considered as well (e.g. Barnes et al., 2014; Santos et al., 2014). Choosing the most informative indices and the exact functional form for the correlations requires model comparisons with a handful of benchmark targets and is a subject of future work together with the choice of the red noise model. However, our simple approach appears to work well for the CoRoT-7 data.
Modelling the activity-related variations based on observed photometric variability is another approach that has been attempted (Lanza et al., 2010). However, apart from photometric detections of transiting planets and stellar rotation periods, it is not clear how photometric data that is more often than not obtained at different epochs than the Doppler data, such as is the case with CoRoT-7 (Pont et al., 2011), can be connected to velocity variations in a useful and trustworthy manner. For this reason, as suggested by Pont et al. (2011), the information available in the activity data obtained from the same spectra as the radial velocities should be accounted for when modelling the relationship between stellar activity and radial velocities.
With an estimated mass of M, and adopting a radius of 1.690.09 R (Léger et al., 2009), the average density of this transiting planet is gcm that does not constrain the planetary composition very accurately. Most of the uncertainty in this estimate arises from the uncertainty in estimating the planetary mass better than by a factor of roughly 50%. The above mass estimate appears to be consistent with all the earlier estimates due to its uncertainty (see Hatzes et al., 2011) and underlines the importance of improving the statistical models used to analyse Doppler spectroscopy data in order to achieve greater sensitivity to low-amplitude signals and better constraints for the planetary masses.
The authors acknowledge the significant efforts of the HARPS-ESO team in improving the instrument and its data reduction pipelines.
- Anglada-Escudé & Butler (2012) Anglada-Escudé, G. & Butler, R. P. 2012, ApJS, 200, 15
- Anglada-Escudé & Tuomi (2012) Anglada-Escudé, G. & Tuomi, M. 2012, A&A, 548, A58
- Anglada-Escudé et al. (2013) Anglada-Escudé, G., Tuomi, M., Gerlach, E., et al. 2013, A&A, 556, A126
- Baluev (2013) Baluev, R. V. 2013, MNRAS, 429, 2052
- Barnes et al. (2014) Barnes, J. R., Jenkins, J. S., Jones, H. R. A., et al. 2014, MNRAS, 439, 3094
- Boisse et al. (2011) Boisse, I., Bouchy, F., Hébrard, G., et al. 2011, A&A, 528, A4
- Bruntt et al. (2010) Bruntt, H., Deleuil, M., Fridlund, M., et al. 2010, A&A, 519, A51
- Cumming (2004) Cumming, A. 2004, MNRAS, 354, 1165
- Desort et al. (2007) Desort, M., Lagrange, A.-M., Galland, F., et al. 2007, A&A, 473, 983
- Dumusque et al. (2011) Dumusque, X., Lovis, C., Ségransan, D., et al. 2011, A&A, 535, A55
- Dumusque et al. (2012) Dumusque, X., Pepe, F., Lovis, C., et al. 2012, Nature, 491, 207
- Feroz et al. (2014) Feroz, F. & Hobson, M. P. 2014, MNRAS, 437, 3540
- Ferraz-Mello et al. (2011) Ferraz-Mello, S., Tadeu dos Santos, M., Beaugé, C., et al. 2011, A&A, 531, A161
- Haario et al. (2001) Haario, H., Saksman, E., & Tamminen, J. 2001, Bernoulli, 7, 223
- Haario et al. (2006) Haario, H., Laine, M., Mira, A., and Saksman, E. 2006, Statistics and Computing, 16, 339
- Hastings (1970) Hastings, W. 1970, Biometrika 57, 97
- Hatzes et al. (2010) Hatzes, A. P., Dvorak, R., Wuchterl, G., et al. 2010, A&A, 520, A93
- Hatzes et al. (2011) Hatzes, A. P., Fridlund, M., Nachmani, G., et al. 2011, ApJ, 743, 75
- Hatzes (2013) Hatzes, A. P. 2013, ApJ, 770, 133
- Jenkins et al. (2014) Jenkins, J. S., Becerra Yoma, N., Rojo, P., et al. 2014, arXiv:1403.7646
- Kipping (2013) Kipping, D. M. 2013, MNRAS, 434, L51
- Lagrange et al. (2010) Lagrange, A.-M., Desort, M., Meunier, N. 2010, A&A, 512, 38
- Lanza et al. (2010) Lanza, A. F., Bonomo, A. S., Moutou, C., et al. 2010, A&A, 520, A53
- Léger et al. (2009) Léger, A., Rouan, D., Schneider, J., et al. 2009, A&A, 506, 287
- Lomb (1976) Lomb, N. R. 1976, Astrophys. Space Sci., 39, 447
- Metropolis et al. (1953) Metropolis, N., Rosenbluth, A. W., Rosenbluth, M. N., et al. 1953, J. Chem. Phys., 21, 1087
- Pepe et al. (2011) Pepe, F., Lovis, C., Ségransan, D., et al. 2011, A&A, 534, A58
- Pont et al. (2011) Pont, F., Aigrain, S., & Zucker, S. 2011, MNRAS, 411, 1953
- Queloz et al. (2009) Queloz, D., Bouchy, F., Moutou, C., et al. 2009, A&A, 506, 303
- Saar et al. (1998) Saar, S. H., Butler, R. P., & Marcy, G. W. 1998, ApJ, 498, 153
- Santos et al. (2010) Santos, N. C., Gomez da Silva, J., Lovis, C., & Melo, C. 2010, A&A, 511, A54
- Santos et al. (2014) Santos, N. C., Mortier, A., Faria, J. P., et al. 2014, arXiv:1404.6135
- Scargle (1982) Scargle, J. D. 1982, ApJ, 263, 835
- Tuomi (2012) Tuomi, M. 2012, A&A, 543, A52
- Tuomi (2014) Tuomi, M. 2014, MNRAS, 440, L1
- Tuomi & Anglada-Escudé (2013) Tuomi, M. & Anglada-Escudé 2013, A&A, 556, A111
- Tuomi et al. (2013a) Tuomi, M., Anglada-Escudé, G., Gerlach, E., et al. 2013a A&A, 549, A48
- Tuomi et al. (2013b) Tuomi, M., Jones, H. R. A., Jenkins, J. S., et al. 2013b, A&A, 551, A79
- Tuomi et al. (2014) Tuomi, M., Jones, H. R. A., Barnes, J. R., et al. 2014, arXiv:1403.0430
Appendix A HARPS-TERRA radial velocities and activity indices