On the Eccentricity Distribution of Exoplanets from Radial Velocity Surveys
We investigate the estimation of orbital parameters by least- Keplerian fits to radial velocity (RV) data using synthetic data sets. We find that while the fitted period is fairly accurate, the best-fit eccentricity and are systematically biased upward from the true values for low signal-to-noise ratio and moderate number of observations , leading to a suppression of the number of nearly circular orbits. Assuming intrinsic distributions of orbital parameters, we generate a large number of mock RV data sets and study the selection effect on the eccentricity distribution. We find the overall detection efficiency only mildly decreases with eccentricity. This is because although high eccentricity orbits are more difficult to sample, they also have larger RV amplitudes for fixed planet mass and orbital semi-major axis. Thus the primary source of uncertainties in the eccentricity distribution comes from biases in Keplerian fits to detections with low-amplitude and/or small , rather than from selection effects. Our results suggest that the abundance of low-eccentricity exoplanets may be underestimated in the current sample and we urge caution in interpreting the eccentricity distributions of low-amplitude detections in future RV samples.
Subject headings:planetary systems – techniques: radial velocities
With the rapid increase in the rate of exoplanet detections, it has become feasible to study their statistical properties (for a recent review, see Udry & Santos 2007). The distributions of their orbital parameters and correlations with host star properties are crucial for our understanding of planet formation. Up to Feb 2008, over 200 exoplanets have been announced, most of which were detected by the radial velocity (RV) technique. Among the parameters that can be derived from RV data, the orbital eccentricity has a somewhat unexpected distribution, with an extended tail of high eccentricities (). Although the exact form of this eccentricity distribution is still uncertain to some extent, especially at the lowest eccentricities (i.e., comparing the distribution in Butler et al. 2006 and in the most recent catalog), it is clear that this distribution is quite different from that of the Solar system. There have already been several theoretical attempts to explain such an eccentricity distribution (e.g., Tremaine & Zakamska 2004 and references therein; Juric & Tremaine 2007; Ford et al. 2007; Zhou et al. 2007).
However, the largest current exoplanet catalog (e.g., Bulter et al. 2006 and their updates, hereafter Bulter06) is by no means homogeneous. Survey strategies, selection biases in the RV technique and uncertainties in best-fit orbital solutions could all bias the intrinsic distributions of orbital parameters, especially when taken together. Eccentric orbits have larger amplitudes than circular orbits when holding other parameters fixed, while failure to (time) resolve the perihelion approach can lead to non-detection for an eccentric orbit system. These two effects are believed to roughly cancel but detailed simulation is needed (e.g., Endl et al. 2002; Cumming 2004). It is also well known that the errors in least- Keplerian solutions become asymmetric for noisy data (e.g., Ford 2005; Butler et al. 2006), which is especially true for eccentricity. On the other hand, for circular orbits the eccentricity can only be scattered upward by error, so an obvious bias exists for circular orbits. As the RV surveys progress, more and more low amplitude (low signal-to-noise ratio) detections will be reported. Significant uncertainties remain in the distributions of orbital parameters for those exoplanets using best-fit orbital solutions. There are already some cases where the orbit eccentricity is poorly constrained by the RV data (e.g., Jones et al. 2006).
The purpose of the paper is to explore processes that might distort measurements of the shape of the intrinsic eccentricity distribution using synthetic data. First we investigate the reliability of single Keplerian fits to mock RV data sets as function of signal-to-noise ratio, and see if any bias arises from Keplerian fitting to noisy data (§2). Second, we construct statistical models of orbital parameter distributions and pass them through a simulated detection pipeline, in order to see if there is any serious selection effect on eccentricity (§3). We discuss our results and apply them to the current RV planet sample in §4, and we summarize our findings in §5.
2. Keplerian Fitting to Mock RV Data
We consider single planet detections throughout the paper.
Assuming a true orbit (period , eccentricity , minimal
planet mass , argument of periastron , and
temporal offset ) and the host star mass , we generate
mock RV data sets with Gaussian errors as perturbations.
For simplicity is assumed to be the same for all data
points in a set, but we relax this assumption later
(§4). In practice, comes from RV
measurement errors, stellar jitter and possible existence of
multiple planets (e.g., Ford 2006)
Here the amplitude is defined in terms of the intrinsic properties of a planet’s orbit, since we “know” the true orbital parameters in advance.
In addition to the signal-to-noise ratio , another important quantity is the number of observations . Clearly the larger and are, the better the quality of the data set.
We consider planets where the orbital period is shorter than the observational time span. Orbits with periods longer than the time baseline are difficult to detect due to limited sampling and lower amplitude (e.g., Cumming 2004). Even if detected, the best-fit orbital solution would often be very different from the true one (Shen et al. 2008, in preparation). The observation times are randomly generated within the observational time span, which somewhat mimics the effects of realistic radial velocity sampling.
In Fig. 1 we plot the relations of fitted eccentricity, period, minimal planet mass and semi-amplitude to their true values, as functions of , for a specific case with , days, , and eccentricity values at intervals of . Each set of true orbits was used to generate 5000 mock RV data sets with randomly chosen and temporal offset , and with Gaussian noise determined from the assumed signal-to-noise ratio. We used the Levenberg-Marquardt method (e.g., Press et al. 1992) to minimize , and the true orbital parameters were used as an initial guess for the fitting. Using the true orbital parameters as the initial guess greatly speeds up the convergence and increases the rate of successful fits. In the realistic case when we do not know the actual orbital parameters, the success of a fit also depends on an appropriate initial guess. We will come back to this point in §3. The mock observational time span was set to be two true periods, but we find almost identical results as long as the period is shorter than the time span. Finally, we simulated three values of .
Not all attempts to find a Keplerian fit will result in a detection. In an actual radial velocity search for planets, some threshold on the false alarm probability (FAP), the probability that a signal power will arise purely from noise, of a Keplerian fit, is normally used. Throughout the paper we adopt a fiducial , which is essentially always satisfied for large and . There are several ways to estimate the FAP using Monte Carlo simulations or analytical formulae. For simplicity we have followed the analytical procedure described in Cumming (2004) to estimate the FAP: Given the least- value from the Keplerian fit, and the least- value from the fit of a constant to the data, we calculate a power using eqn. (7) in Cumming (2004). An estimate of the FAP is then eqn. (5) in Cumming (2004): , where is the number of independent frequencies with the observational time span and days the lower bound of period searched in the Keplerian fitting, and is the probability distribution given by eqn. (8) in that paper.
All three panels in Fig. 1 show that the fitted period is fairly accurate. However, for eccentricity and (and also the amplitude following eqn. 1) the median fitted values are biased upward, and this behavior remains for true eccentricities as large as in our simulations. This is not a failure of the minimization method. In Fig. 2 we show two examples of Keplerian fits where both RV data sets prefer a more eccentric orbit. The biases in and , as well as the scatter around the median, decrease when or increases, as would be naively expected.
These fitting biases result from the fact that, in general, the least- solution is not an unbiased estimator of the true parameters for the non-linear Keplerian model even if the data errors are Gaussian. In particular, the probability distribution is not Gaussian, where is the set of true parameter and is the best-fit orbital solution; and it only approaches a Gaussian distribution when the errors are small.
To fully take account these biases we need to know for different combinations of , and , which can be explored numerically as in Fig. 1. However, for low intrinsic eccentricities (), the apparent fitting bias in eccentricity can be quantitatively understood as follows. While the error in the fitted eccentricity is not Gaussian when is comparable to , the fitted and preserve Gaussianity much better provided that the intrinsic eccentricity is small (e.g., Butler et al. 2006). Assuming their fitted values are Gaussian distributions around true values and with dispersion , the probability distribution of the fitted eccentricity is given by
where is the modified Bessel function of the first kind. For circular orbits () it is a Rayleigh distribution; for eccentric orbits and at the low-dispersion limit () it is approximately a Gaussian distribution.
Fig. 3 shows an example of with . For a high signal-to-noise ratio, (lower panels), both distributions of fitted eccentricity and () are approximately Gaussian. For a low signal-to-noise ratio, (upper panels), the eccentricity distribution significantly deviates from a Gaussian while and are still distributed in an approximately Gaussian form. We plot the predictions of fitted eccentricity distribution from eqn. (2) as solid lines, using the best-fit Gaussian dispersion of the / distributions. In this way we can also understand the upward median biases even at intrinsic eccentricities as high as , although the fitted and are no longer Gaussian at such high eccentricities–our simulations themselves are more revealing in this case.
A conservative empirical criterion (based on our numerical experiments) for a median bias in fitted eccentricity is given by the following joint constraint on and :
which works reasonably well for . This is also an approximate criterion that the median bias in is less than . Note again here that is defined using intrinsic planet properties. In practice, can only be estimated from the best-fit solution and will on average be overestimated a bit at the low-amplitude end, according to the fitting biases discussed here (e.g., Fig. 1). If using instead, we found a good approximation is to replace equation (3) with
which compensates the fitting bias of amplitude at the low end.
3. Statistical Models of Orbital Parameter Distributions
We now investigate the selection bias on eccentricity using synthetic RV data sets generated from model distributions of orbital parameters. The detailed comparison of different model distributions and constraints from observational data will be presented in a paper in preparation (Shen et al. 2008). Our purpose here is to investigate the possible selection bias on eccentricity, therefore essentially any well behaved model distributions will do. Nevertheless, we choose a model with physically plausible distributions, as we describe below.
The period distribution is a log-normal with log mean of 10 yrs and dispersion of 1 dex. The planet mass distribution is a power-law within ; this extends well into the “brown dwarf desert” due to the upper limit of . However such objects are too rare to have any statistical impact on our results. The orientation of the planet orbital plane is random, as is the argument of periastron within and the temporal offset within one period. For the eccentricity distribution we chose a model that peaks at and diminishes to zero at :
and we assume for this specific study. Our choice of this
particular model distribution is because the peak of the observed eccentricity distribution has shifted from to using the latest exoplanet catalog
We generate 100,000 mock orbits of the model distributions and pass them through the Keplerian detection pipeline with a detection threshold . For the initial guess of orbital solution, we again take advantage of our mock data sets by using the true orbital parameters. In the realistic case, the initial guess for period can be identified using traditional Lomb-Scargle periodogram (Lomb 1976; Scargle 1982), but appropriate initial guesses for are crucial for the convergence of Keplerian fits for orbits with high eccentricities, where the fit can easily be trapped in a local minimum due to the complicated space or fail to converge at all, and hence the planet is undetected. We have run a substantial number of tests, where we tried a fine grid of as initial guesses in each Keplerian fit. While time consuming, we found that in the vast majority cases the Keplerian fits find the same minimum as using the true parameters as the initial guess. Serendipitous aliasing cases do exist, where the global minimum solution deviates catastrophically from the true orbital parameters, and which are for the most noisy data sets. Hence in the realistic case, special care must be taken to find the global minimum efficiently.
In Fig. 4 we show the overall detection efficiency (the fraction of detected orbits) as function of eccentricity in filled circles, and different line types represent the detection efficiency when binned in . Clearly increasing increases the detection efficiency. However, the overall detection efficiency only slowly decreases with eccentricity. This is because although high eccentricity orbits are more difficult to sample, they also have larger amplitude on average. When we plot the detection efficiency with constant amplitude , we see a decrease for , as shown by the cyan long-dashed line in Fig. 4, which is consistent with previous studies (e.g., Endl et al. 2002; Cumming 2004). Therefore, there is no strong selection bias against eccentricity, as demonstrated by the similar (true) eccentricity distributions of the mock orbits and the detected ones in Fig. 4. Note, however, the absolute value of the overall detection efficiency here depends on the model distributions and the quality of the survey (noise level and number of observations).
However, as we have shown in §2, Keplerian fits will have non-negligible effects for low amplitude detections and therefore bias the underlying eccentricity distribution. Although the detection efficiency decreases rapidly when decreases, there are also many more objects in the low amplitude regime given our model distributions and noise level. We show this bias effect in Fig. 5, where the blue and red histograms show the distributions of true and fitted eccentricities for low amplitude detections with . The small pile up of fitted orbits is caused by detections with period longer than the observation time where the Keplerian fits failed badly (Shen et al. 2008, in preparation). Other than that, the overall effect of the Keplerian fitting bias is the transfer of true low-eccentricity orbits to fitted high-eccentricity ones, therefore distorting the underlying eccentricity distribution. As expected from our simulations in §2, this effect diminishes when we shift to high amplitude and/or large detections, as shown in Fig. 6.
Our results lead naturally to the question of how much bias in the eccentricity distribution is to be expected in the Butler et al. (2006) catalog. To test this we take their updated catalog as of January 26, 2008. We exclude those planets with orbital periods less than 20 days (i.e., those probably have been tidally circularized). Over half of the planets in that catalog do not have published RV data and measurement errors/stellar jitter information, so we use the RMS scatter of the model fit as an estimate of the noise . We also take the amplitude as the best-fit value . There are planets with , and available. In Fig. 7 we show the distribution of defined in equation (4) for these planets. There are 19 planets with . Thus statistically speaking, only of the planets are affected in the current sample, which are predominately low-amplitude detections.
To test the effects of unequal errors and realistic sampling we take HD 114729b in the Butler06 catalog as an example. It has a best-fit eccentricity with and where we have estimated using their published RV measurement errors and stellar jitter. The actual RV data and their best-fit solution is shown in Fig. 8. Using the true time series and data uncertainties derived from measurement errors and stellar jitter for this system, but with simulated radial velocities, we find that the probability of a fitted eccentric orbit with arising from a circular orbit is , and increases to for a fitted , consistent with our results in §2 with constant uncertainties and random sampling.
We have performed simulations of planet detection and orbital parameter fits using mock radial velocity data sets. Two effects that may affect the intrinsic eccentricity distribution are considered: selection bias on eccentricity, and fitting bias in the best-fit orbital solution.
We find that selection bias on eccentricity is negligible, as long as the fitting routine is efficient in finding the global solution. In a realistic survey, this requires a thorough search in the parameter space, and/or with advanced algorithms optimized for the search. Our finding is not in conflict with previous studies (e.g., Endl et al. 2002; Cumming 2004), which claim that the detection efficiency decreases for . This is because in previous studies, the detection efficiency as a function of eccentricity is estimated at fixed while in our study is larger for more eccentric orbits when other parameters are fixed. On the other hand, we find that for detections with low signal-to-noise ratio and small number of observations, the best-fit eccentricity is biased upward in the median value, which then gives rise to a change in the eccentricity distribution from the intrinsic one.
Inspection of the current exoplanet sample shows only are likely to be significantly affected by the Keplerian fitting bias. However, future radial velocity surveys will contain an increasing number of low amplitude detections if the number of low mass, large semi-major axis exoplanets grows as rapidly as suggested by extrapolation of current results suggest (e.g., Udry & Santos 2007). When the sample of low amplitude detections (on average less massive planets and more circular orbits) is large enough for statistical study, the bias in the best-fit eccentricity described here must be taken into account. Also, in individual cases where an accurate estimate of eccentricity is required, i.e., in modeling the habitable zone or tidal heating issues, either high quality RV data or constraints from other observations are required to support reliable conclusions. In the mean time, there is also a need for a more statistically sophisticated understanding of the uncertainties of derived orbital solutions (e.g., Ford 2005; 2006).
Our results suggest that the intrinsic eccentricity distribution may be even more peaked at than the current observed distribution. Some planet-planet scattering models tend to produce a Rayleigh-distribution of eccentricities (e.g., Juric & Tremaine 2007; Ford & Rasio 2007) with reduced circular orbits, therefore certain eccentricity damping mechanism such as interactions with a protoplanetary disk may be required to reconcile these models with observations.
- slugcomment: ApJ in press
- We note that in general the additional reflex motion induced by unknown secondary planets cannot be treated as independent perturbations. Second, for simplicity we do not consider secular trends in the orbital motion of the primary planet induced by unseen companions, which are typically on much longer timescales than the period and observation time baseline considered here.
- Butler, R. P., et al. 2006, ApJ, 646, 505 (Butler06)
- Cumming, A. 2004, MNRAS, 354, 1165
- Cumming, A., Marcy, G. W., & Butler, R. P. 1999, ApJ, 526, 890
- Cumming, A., Butler, R. P., Marcy, G. W., Vogt, S. S., Wright, J. T. & Fischer, D. A. 2008, PASP, in press, arXiv:0803.3357
- Endl, M. et al. 2002, A&A, 392, 671
- Ford, E. B. 2005, AJ, 129, 1706
- Ford, E. B. 2006, ApJ, 642, 505
- Ford, E. B., & Rasio, F. A. 2007, ApJ, submitted, arXiv:astro-ph/0703163
- Jones, H. R. A., et al. 2006, MNRAS, 369, 249
- Jurić, M., & Tremaine, S. 2007, ApJ, submitted, arXiv:astro-ph/0703160
- Lomb, N. R. 1976, Ap&SS, 39, 447
- Press, W. H., Teukolsky, S. A., Vetterling, W. T., & Flannery, B. P. 1992, Numerical Recipes (New York: Cambridge Univ. Press)
- Scargle, J. D. 1982, ApJ, 263, 835
- Tremaine, S., & Zakamska, N. L. 2004, in AIP Conf. Proc. 713, The Search for Other Worlds, ed. S. S. Holt & D. Deming (New York: AIP), 243
- Udry, S., & Santos, N. C. 2007, ARA&A, 45, 397
- Zhou, J.-L., Lin, D. N. C., & Sun, Y.-S. 2007, ApJ, 666, 423