1 Introduction

# The Keck Planet Search: Detectability and the Minimum Mass and Orbital Period Distribution of Extrasolar Planets

## Abstract

We analyze 8 years of precise radial velocity measurements from the Keck Planet Search, characterizing the detection threshold, selection effects, and completeness of the survey. We first carry out a systematic search for planets, by assessing the false alarm probability associated with Keplerian orbit fits to the data. This allows us to understand the detection threshold for each star in terms of the number and time baseline of the observations, and the underlying “noise” from measurement errors, intrinsic stellar jitter, or additional low mass planets. We show that all planets with orbital periods days, velocity amplitudes , and eccentricities have been announced, and we summarize the candidates at lower amplitudes and longer orbital periods. For the remaining stars, we calculate upper limits on the velocity amplitude of a companion. For orbital periods less than the duration of the observations, these are typically , and increase for longer periods. We then use the non-detections to derive completeness corrections at low amplitudes and long orbital periods, and discuss the resulting distribution of minimum mass and orbital period. We give the fraction of stars with a planet as a function of planet mass and orbital period, and extrapolate to long period orbits and low planet masses. A power law fit for planet masses and periods days gives a mass-period distribution with , , and the normalization constant such that 10.5% of solar type stars have a planet with mass in the range and orbital period - days. The orbital period distribution shows an increase in the planet fraction by a factor of for orbital periods days. Extrapolation gives % of stars having gas giant planets within 20 AU. Finally, we constrain the occurrence rate of planets orbiting M dwarfs compared to FGK dwarfs, taking into account differences in detectability.

binaries: spectroscopic — methods: statistical — planetary systems

## 1. Introduction

Precise Doppler velocity surveys of nearby stars have led to the detection of more than 250 extrasolar planets (e.g. Marcy et al. 2005a; Butler et al. 2006a). They have minimum masses from Earth masses () and up, orbital periods from close to one day up to several years, and a wide range of eccentricities. Over 25 multiple planet systems are known, with many other single planet systems showing a long term velocity trend likely indicating a second planet with long orbital period (Fischer et al. 2001). The increasing number of detections allows us to answer questions about the statistical properties of extrasolar planetary systems, such as the mass, period, and eccentricity distributions (Tabachnik & Tremaine 2002; Butler et al. 2003; Fischer et al. 2003; Lineweaver & Grether 2003; Jones et al. 2003; Udry, Mayor, & Santos 2003; Gaudi, Seager, & Mallen-Ornelas 2005; Ford & Rasio 2006; Jones et al. 2006; Ribas & Miralda-Escudé 2007), and the incidence of giant planets as a function of host star metallicity (Fischer & Valenti 2005; Santos et al. 2005) and mass (Butler et al. 2004b; Butler et al. 2006b; Endl et al. 2006; Johnson et al. 2007).

In this paper, we focus on the frequency of planetary systems, and the distributions of mass and orbital periods. The frequency of planets is important for future astrometric and direct searches (see e.g. Beuzit et al. 2007). The details of the mass-orbital period distribution are important because they contain information about the planet formation process (Armitage et al. 2002; Ida & Lin 2004a, 2004b, 2005, 2008a, 2008b; Alibert et al. 2005; Rice & Armitage 2005; Kornet & Wolf 2006). Figure 1 shows the distribution of planet masses and orbital periods for 182 planets announced as of March 2007. Several features of the mass-period distribution have been discussed in the literature: the “pile-up” at orbital periods of days (the “hot jupiters”) (e.g. see Gaudi et al. 2005); the paucity of massive planets () in close orbits (Udry et al. 2002, 2003; Mazeh & Zucker 2002, 2003); the deficit of planets at intermediate orbital periods of and days, giving a “period valley” in the orbital period distribution (Jones et al. 2003; Udry et al. 2003; Burkert & Ida 2007); and a suggestion that the lack of lower mass planets () at orbital periods days is significant and a real feature of the distribution (Udry et al. 2003). It has also been pointed out that the incidence and mass-period distribution of planets should depend on the mass of the host star. In particular, a much lower incidence of Jupiter mass planets is expected around M dwarfs in the core accretion scenario for planet formation (Laughlin, Bodenheimer, & Adams 2004; Ida & Lin 2005; Kennedy & Kenyon 2008 although see Kornet & Wolf 2006), and observational estimates support this picture (Butler et al. 2004b; Butler et al. 2006b; Endl et al. 2006; Johnson et al. 2007).

These interpretations are complicated by the fact that the mass-period distribution is subject to selection effects at low masses and long orbital periods. The important observational quantity is the stellar velocity amplitude induced by the planet, which for a planet of mass is

 K=28.4 m/s√1−e2 (MPsiniMJ)(P1 yr)−1/3(M⋆M⊙)−2/3 (1)

where is the orbital period, is the eccentricity, is the mass of the star, and is the inclination of the orbit. The dotted lines in Figure 1 show and for circular orbits around a solar mass star. The detectable amplitude depends on the number and duration of the observations, and particularly on the Doppler measurement errors and other noise sources (see Cumming 2004 for a detailed discussion).

Scatter in the measured radial velocity is expected from statistical and systematic measurement errors, and intrinsic stellar radial velocity variations or “jitter”. The typical statistical measurement error, which we refer to in this paper as the “Doppler error” or “Doppler measurement error”, is determined by the uncertainty in the mean velocity of a large number of spectral segments. The Doppler errors are typically in the data considered here, although Doppler errors as small as are now routine (Mayor et al. 2003; Butler et al. 2004a). Sources of systematic measurement errors include imperfect PSF descriptions and deconvolution algorithms, and the characterization of the charge transfer in the spectrometer CCD (see discussion in Butler et al. 2006b). Stellar jitter is thought to arise from a combination of surface convective motions, magnetic activity, and rotation (Saar & Donahue 1997). The amount of jitter depends on stellar properties such as rotation rate and spectral type, but is typically for chromospherically quiet stars (Saar, Butler, & Marcy 1998; Santos et al. 2000; Wright 2005). Additional low mass planets in a system could provide another source of radial velocity variability.

These various sources of noise determine the velocity threshold for detecting planets, and vary between observations, different stars, and different surveys. Interpretation of the mass-period distribution at low masses requires a careful analysis of these selection effects. Most work to date has taken a fixed detection threshold, such as (e.g. Udry et al. 2003) or a mass cut well above the masses at which selection effects should play a role (Lineweaver & Grether 2003). A few detailed calculations of detection thresholds have been carried out. In Cumming, Marcy, & Butler (1999), we presented an analysis of 11 years of Doppler measurements of 76 stars as part of the Lick planet search. However, the conclusions regarding the mass-period distribution were limited because only 6 planets were then known. Endl et al. (2002) present a statistical analysis of the 37 star sample observed by the ESO Coudé Echelle spectrometer. Wittenmyer et al. (2006) present limits on companion mass for 31 stars observed at McDonald Observatory. The largest study so far is that of Naef et al. (2005), who derive detection thresholds for 330 stars from the ELODIE Planet Search and estimate planet occurrence rates.

In this paper, we analyse 8 years of radial velocity measurements from the Keck survey, consisting of data taken from the beginning of the survey in 1996 to the time of the HIRES spectrometer upgrade in 2004 August. The number of stars () and planets () included in the sample offer an order of magnitude improvement over our previous Cumming et al. (1999) analysis, and therefore the best opportunity to date to determine the occurrence rate of planets and their mass-period distribution. An outline of the paper is as follows. In §2, we describe a technique for identifying planets in radial velocity data, discuss the detection thresholds for the survey, and calculate limits on the mass and period of planets orbiting stars that do not have a significant detection. In §3, we use these limits to correct the mass and period distributions for incompleteness, and then characterize the occurrence rate of planets and the mass-period distribution. We summarize and conclude in §4.

## 2. Search for Planets

### 2.1. Observations

The Keck Planet Search program has been in operation since 1996 July, using the HIRES echelle spectrometer on the Keck I telescope (Vogt et al. 1994; Vogt et al. 2000). The data we analyze here were taken from the beginning of the survey up to 2004 August when the HIRES spectrometer was upgraded. They consist of radial velocity measurements of 585 F, G, K, and M stars (the fractions in these spectral classes are 7, 49, 27, and 16% respectively). Note that the M dwarf sample covers spectral types M5 and earlier; the F stars are of spectral type F5 and later. Selection of the target stars is described in Wright et al. (2004) and Marcy et al. (2005b). They lie close to the main sequence and are chromospherically quiet. They have , declination , and have no stellar companion within 2”. To ensure enough data points for an adequate Keplerian fit to the data, we further select only those stars with at least 10 observations over a period of two years or more. This excludes data for an additional 360 stars that were added in the two years prior to 2004 August. Figure 2 is a summary of the number, time baseline or duration, and mean rate of observations. Typical values are observations in total over a duration of years, with 3 observations per year. The target list of stars has changed over the years of the survey, with stars being dropped and added (see Marcy et al. 2005b for a discussion of the evolution of the target list), resulting in the spread in durations shown in Figure 2.

Figure 3 is a summary of the statistical Doppler measurement errors and the estimated jitter from Wright (2005). The Doppler error shown is the mean Doppler error for all observations for each star. These are statistical errors determined by the weighted uncertainty in the mean velocity of 400 spectral segments, each 2Å long. Most stars have a mean Doppler measurement error of (smaller errors of have been achieved at Keck following the 2004 HIRES upgrade, but these data are not included in this analysis). The stars in the sample have been selected to be chromospherically inactive, but still show stellar jitter at the few level (Saar et al. 1998). We adopt the jitter estimates described in Marcy et al. (2005b) and Wright (2005), in which the level of jitter is calibrated in terms of stellar properties, in particular , , and . These values include contributions from both intrinsic stellar jitter and systematic measurement errors. The typical jitter is for a chromospherically quiet star, with a tail of larger values for more active stars. This is comparable to the Doppler measurement errors, indicating that the velocity measurements have reached a precision for which stellar jitter and systematic errors are beginning to dominate the uncertainty.

In Figures 2 and 3 the dotted histograms summarize the observations of the 110 stars with stellar masses , which are almost entirely M dwarfs. In §3, we analyze the mass-orbital period distribution separately for stars with and , since the planet occurrence rate for M dwarfs appears to be smaller than that for FGK stars. Figures 2 and 3 show that in general the Doppler errors for the M dwarfs are larger than for the FGK stars, mostly due to their relative faintness. The dashed histograms in Figures 2 are for the subset of stars with an announced planet (see §2.4). Once a candidate planetary signature is detected in the data, we increase the priority of that star in our observing schedule, resulting in a greater number of observations for those stars with announced planets.

We include the Doppler errors and jitter by adding them in quadrature to find a total estimated error for each data point , . Figure 4 shows the distribution of the residuals after subtracting the mean velocity for a set of 386 “quiet” stars which after a preliminary analysis show no excess variability, long term trend, or evidence for a periodicity. We plot a histogram of the ratio for all 3436 velocities for these stars, compared with a normal distribution. The width of the observed distribution is % greater than a unit variance Gaussian, suggesting that the estimated variability is underpredicted by this factor. To allow for this, we have multiplied each by a factor of 1.3 in the analysis that follows. The dotted histogram shows a normal distribution with for comparison with the observed distribution. The Gaussian distribution underpredicts the tails of the distribution somewhat, but otherwise agrees quite well. The magnitude of this correction has only a small effect on the results of this paper, because when assessing the significance of a Keplerian orbit fit, we calculate ratios of chi-squared (for example, the ratio of chi-squareds with and without the Keplerian orbit included in the model) and so the role of the errors is to set the relative weighting of different data points (see Cumming et al. 1999 for a discussion).

### 2.2. Long Term Trends and Excess Variability

The first indication of a companion is often excess velocity variability above the level expected from measurement errors and stellar jitter. To assess this, we fit a straight line to each data set, and compare the observed scatter about the straight line to the predicted value. The data for each star are a set of measured velocities , observation times , and estimated errors . The estimated error is the quadrature sum of the Doppler error and stellar jitter, as described in §2.1. We fit either a constant () or a straight line () to the data. To test whether including a slope significantly improves the chi-squared of the fit,

 χ2=∑i (vi−fi)2σ2tot,i, (2)

we use an F-test (Bevington & Robinson 1992). The appropriate F-statistic is

 F(N−2),1=(N−2)(χ2constant−χ2slopeχ2slope) (3)

(Bevington & Robinson 1992), where is the for the best-fitting constant, and is the for the best-fitting straight line. We determine the probability that the observed value of is drawn from the corresponding -distribution. If this probability is smaller than %, we conclude that the slope is significant. We make the choice of % so that we expect no false detections in our sample of 585 stars. We find that 95 stars (16% of the total) have a significant slope. This fraction is consistent with the 20 out of 76 stars in the Lick sample, or 26%, that showed a long-term trend at the 0.1% significance level (Cumming et al. 1999).

Having decided whether a constant or a straight line best describes the long term behavior, we then test whether the residuals to the fit are consistent with the expected variability. We calculate the probability that or is drawn from a chi-squared distribution with or degrees of freedom respectively (Hoel, Port, & Stone 1971). We again choose a 0.1% threshold, so if this probability is smaller than 0.1% we infer that there is excess variability in the data. We find that of the 585 stars, 131 show significant variability at the 0.1% level, or 23% of the total. Of these 131 stars, 34 (26%) also show a significant slope (i.e. 6% of the 585 stars show both significant variability and a significant slope). This is similar to the Lick survey results, where we found 17 out of 76 stars, or 22%, with excess variability (Cumming et al. 1999; see also Nidever et al. 2002 who found that 107 out of 889 stars showed velocity variations of more than over 4 years).

### 2.3. Keplerian Fitting

To search for the signature of an orbiting planet, we fit Keplerian orbits to the radial velocities, and assess the significance of the fit (Cumming 2004, hereafter C04; Marcy et al. 2005b; O’Toole et al. 2007). The non-linear least-squares fit of a Keplerian orbit requires a good initial guess for the best-fitting parameters since there are many local minima in the space. Our approach is to first limit the fit to circular orbits, and use the best-fitting circular orbit parameters as a starting point for a full Keplerian fit.

It is important to note here that we search only for a single planet. We do not consider multiple planet systems in this paper, therefore our results for the mass and orbital period distributions apply only to the planet with the highest Doppler amplitude in a given system. To be consistent in this approach, we do not include any long term trends detected in §2.2 in the planet fits, and compare only to a constant velocity when assessing the significance of a given Keplerian fit. A star with a significant long-term trend will therefore be flagged as having a significant Keplerian fit with orbital period much longer than the duration of the observations. The multiplicity of planets as a function of their orbital periods is an important question that we leave to future work.

To find the best-fitting circular orbit, we calculate the Lomb-Scargle (LS) periodogram (Lomb 1976; Scargle 1982) for each data set. This involves fitting a sinusoid plus constant7 to the data for a range of trial orbital periods . For each period, the goodness of fit is indicated by the amount by which including a sinusoid in the fit lowers compared to a model in which the velocity is constant in time. This is measured by the periodogram power

 z(ω)=(N−3)2(χ2constant−χ2circ(ω)χ2circ(ω0)), (4)

where is the of a fit of a constant to the data, is the of the circular orbit fit, is the mean velocity, is the trial orbital frequency, and the best-fitting frequency. The number of degrees of freedom in the sinusoid fit is . A large power indicates that including a sinusoid significantly decreased the of the fit. We consider trial orbital periods between 1 day and 30 years.

We next choose two best fitting solutions as starting points for a full non-linear Keplerian fit. The two sinusoids are chosen so that they are well-separated in frequency. We then use a Levenberg-Marquardt algorithm (Press et al. 1992) to search for the minimum , starting with a Keplerian orbit with the same period and amplitude as the sinusoid fits, and trying several different choices for the time of periastron, eccentricity, and longitude of pericenter. Having obtained the minimum from the Keplerian fit, we define a power to measure the goodness of fit analogous to the LS periodogram power for circular orbits,

 ze(ω)=(N−5)4(χ2constant−χ2Kep(ω)χ2Kep(ω0)) (5)

(C04), where is the of a fit of a Keplerian orbit to the data (with degrees of freedom).

The significance of the fit depends on how often a power as large as the observed power would arise purely due to noise alone (C04; Marcy et al. 2005b). For a single frequency search, the distribution of powers due to noise alone can be written down analytically for Gaussian noise, it is

 Prob(z>z0)=(1+(ν+2)24z0ν)(1+4z0ν)−(ν+2)/2 (6)

(C04), where . However, the total false alarm probability depends on how many independent frequencies are searched. For a search of many frequencies, each independent frequency must be counted as an individual trial. The false alarm probability (FAP) is

 F=1−[1−Prob(z>z0)]Nf≈Nf Prob(z>z0), (7)

where is the number of independent frequencies, and in the second step we assume is small. For small , the FAP is simply the single frequency FAP multiplied by the number of frequencies.

An estimate of the number of independent frequencies is , where is the frequency range searched and is the duration of the data set (C04). For evenly-sampled data, the number of independent frequencies is , ranging from to the Nyquist frequency . For unevenly-sampled data, Horne & Baliunas (1986) found that for a search up to the Nyquist frequency (see also Press et al. 1992). This agrees with our simple estimate since . However, uneven sampling allows frequencies much higher than the Nyquist frequency to be searched (see discussion in Scargle 1982). In general, , by a factor of . For example, a set of 30 observations over 7 years has . A search for periods as short as 2 days therefore has . Therefore to detect a signal with a FAP of , the periodogram power for that signal must be large enough, or the Keplerian fit good enough, that the single-trial false alarm probability is .

The estimate for and equations (6) and (7) allow an analytic calculation of the false alarm probability (see Fig. 2 of C04). To check this analytic estimate, we determine the FAP and numerically using Monte Carlo simulations. The disadvantage of calculating the FAP in this way is that it is computationally intensive for Keplerian fits. This is the reason why we consider only the two best-fitting sinusoid models as starting points for the Keplerian fit. We find that the analytic estimate of the FAP agrees well with the FAP determined by the Monte Carlo simulations. Our method is to generate a large number of data sets consisting of noise only, using the same observation times as the data, and calculate the maximum power for each of them. The fraction of trials for which the maximum power exceeds the observed value then gives the false alarm probability. In addition, by fitting equation (7) to the numerical results, we can determine . This allows a determination of the FAP even when it is much smaller than (C04). We generate the noise in two ways, which give similar results for the FAP (see Cumming et al. 1999): (i) by selecting from a Gaussian distribution with standard deviation given by the observed error for each observation time, or (ii) by selecting with replacement from the observed velocities (after first subtracting the mean). The second approach (a “bootstrapping” method, see Press et al. 1992) has the advantage that it avoids the assumption of Gaussian noise; instead the actual velocity values are used as an estimate of the noise distribution. It is similar to the “velocity scrambling” method of Marcy et al. (2005b) for determining the false alarm probability for Keplerian fits, with the main difference being that we select with replacement from the observed velocities rather than randomizing the order of the observations.

### 2.4. Significant Detections

Figure 5 shows the results of the search for significant Keplerian fits. We plot the best-fit amplitude and period for stars with FAP%, divided into two categories. The open circles are for stars with an announced planet (i.e. a published orbital solution) as of 2005 May; the solid triangles are stars with a significant Keplerian fit, but not confirmed as a planet8. We will refer to these significant detections that are not announced as planets as “candidate” planets. There are 48 announced planets detected in our search (8.2% of the total number of stars), and 76 candidates (13% of the total). All but five of the announced planets in our sample of stars were detected by our algorithm (the 5 planets that were not detected orbit stars that were added to the survey only recently to confirm detections by other groups, see discussion below).

Of the 76 candidates, 27 have large velocity amplitudes corresponding to companion masses . The remaining 49 candidates, which have , fall into two groups: they are either at long orbital periods ( days), or low amplitudes () compared with the announced planets. That these are the detections that have not yet been announced makes sense since (i) it is difficult to constrain orbital parameters for a partial orbit, and so we generally wait for completion of at least one full orbit before announcing a planet, and (ii) for low amplitude signals, it becomes difficult to disentangle possible false signals from stellar photospheric jitter or from systematic variations in the measured velocities.

In the first case (long orbital periods), it is simple to understand the detection limit, which is set by the time baseline of the observations. The vertical dashed line in Figure 5 shows an orbital period of years (the duration of the longest data set considered here), and nicely divides the announced and candidate planets. In the second case (low amplitudes), an interesting question to consider is how the threshold for announcing a planet relates to the statistical threshold for detecting a Keplerian orbit. The statistical detection threshold depends on both the number of observations and signal amplitude. For circular orbits, an analytic expression for the signal to noise required to detect a signal with 50% probability is given by

 K√2s=[(NfF)2/ν−1]1/2 (8)

(Cumming et al. 2003; C04), where is the false alarm probability associated with the detection threshold, is the number of independent frequencies, is the noise level, and . The number of independent frequencies is set by the number of observations and the duration of the observations , as we discussed in §2.3. Figure 6 compares the signal to noise ratio for each detection with this analytic result. For this comparison, we define the noise to be the rms amplitude of the residuals to the best fit orbit, and the signal to noise ratio as . We show only those detections with best fitting period days and mass , for which signal to noise is expected to be the main limiting factor. The dotted curves show the analytic result for a detection probability of 50% and 99%.

The candidate detections mostly fall near the detection curves in Figure 6, whereas the announced planets lie above the curves. The five crosses represent planets that were detected by other groups. Their host stars, HD 8574, HD 74156, HD 82943, HD 130322, and HD 169830, were added to the Keck survey to confirm these detections, but do not yet have enough observations for a detection. They all lie below the dotted curves.

Inspection of Figure 6 shows that the detection threshold is determined by statistics when the signal to noise ratio is larger than . For lower amplitude signals, the statistical significance is no longer enough, since as we mentioned there is the danger that the observed variation is in fact from stellar jitter or systematic effects. Figure 6 shows that the effective detection threshold is then at a signal to noise of . To detect a planet with a lower amplitude than this requires significantly more work to rule out false signals.

Comparison with the published orbits shows that our automated technique reproduces the fitted orbital parameters well, except for 2 of the 48 announced planets. For HD 50499, we find an orbital period of days rather than 3000 days. We include a slope in the fit for this star, which reproduces the published orbital parameters. We find a period of 15 rather than 111 days for the highly eccentric planet around HD 80606 (which has ; Naef et al. 2001). These examples illustrate the weakness of the Lomb-Scargle periodogram at providing a good initial guess for the Keplerian fit, in particular for eccentric orbits, and emphasises the importance of trying many different starting periods. There are also strong aliasing or spectral leakage effects at 1 day, and so when fitting Keplerian orbits, we force the orbital period to be longer than 1.2 days rather than the 1 day limit of our periodogram search. We might also expect the search to fail for multiple planet systems; however, in all cases of announced multiple planet systems, we find a significiant single planet fit, usually for the planet with the largest velocity amplitude. This is the case even when the periods are close to or are harmonically related. For example, HD 128311 has two planets in a 2:1 resonance (Vogt et al. 2005). We detect the longer period planet in our search.

### 2.5. Upper Limits

We next calculate upper limits on the radial velocity amplitude of planets for those stars without a significant detection. To reduce the computational time and because we are not considering the eccentricity distribution in detail in this paper, we place upper limits on the amplitude of circular orbits only. Endl et al. (2002) and C04 showed that the detectability of a planet in an eccentric orbit is only slightly affected by eccentricity for , but can be substantially affected for larger eccentricities if the phase coverage is inadequate (e.g. see Fig. 6 of C04). About 14% of the known planet population has . In our sample, 6 stars out of 48 announced planets have (13%), and 3 have (6%). However, the true fraction with eccentricities greater than 0.5 may be larger because of the selection effects acting against highly eccentric orbits. Of the 76 candidate periodicities, 30 have , and 11 have . The larger fraction of eccentric orbits for these candidates than for the announced planets is likely due to the long orbital periods of many of the candidates for which the orbital eccentricity is not well-constrained. We leave a discussion of the eccentricity distribution to a future paper, and here assume circular orbits.

For all stars with FAP, we calculate upper limits as described in Cumming et al. (1999), utilizing the LS periodogram for sinusoid fits. At a given orbital period, we generate fake data sets of a sinusoid plus Gaussian noise. We assume that the amplitude of the noise is equal to the rms of the residuals to the best fitting sinusoid for the actual data. We then find the sinusoid amplitude that gives a LS periodogram power larger than the observed value in 99% of trials. We calculate the upper limit on as a function of orbital period. However, the upper limit is insensitive to period for because the uneven sampling gives good phase coverage for most periods (Scargle 1982; Cumming et al. 1999). For , the 99% upper limit scales close to (for periods years; see C04).

Figure 7 is a histogram of the mean upper limit on for orbital periods shorter than the duration of the observations ()9. Most stars have upper limits to of between and . Taking a typical value for the quadrature sum of jitter and Doppler errors to be (since both jitter and Doppler errors are typically ), these upper limits correspond to signal to noise ratios of between 2 and 3. This compares well with the analytic formula in equation (8) which gives for and . Notice that % of stars have upper limits . This is due to either large Doppler measurement errors for these stars, or a number of observations that is too small to give a good limit. The dotted histogram shows the upper limits for the M dwarfs only (). The upper limits are on average higher for these low mass stars. Radial velocity measurements are more difficult for M dwarfs because they are much fainter than solar mass stars.

### 2.6. Summary of Results

Figure 8 is a summary of the analysis in this section in the mass-orbital period plane. To convert between velocity amplitude and , we require the stellar mass (eq. [1] gives ). For the stars with significant Keplerian fits, either announced or candidate planets, we use the latest mass estimates given in the Takeda et al. (2007) and Valenti & Fisher (2005) catalogs (except for 7 of the candidate stars that are not listed in Takeda et al. 2007 or Valenti & Fischer 2005). For convenience, approximate stellar masses of the remaining stars are determined using the BV stellar mass relation given in Allen (2000)10.

We show the detections with FAP in Figure 8 as solid triangles and open circles, dividing them into candidates and announced planets respectively. As expected from the discussion in §2.4, the candidate periodicities (solid triangles) are mainly concentrated at or at days. The solid curves in Figure 8 summarize the upper limits as a function of period. For a given period, they show the mass which can be excluded at the 99% level from 5%, 20%, 50%, 80%, and 95% of stars. The aliasing effects are not strong: the upper limits vary smoothly with period because of the uneven time sampling which gives good phase coverage at most orbital periods (e.g. Scargle 1982). However, there is a slight reduction in sensitivity at periods close to 1 year, 2 years, and 1 day. The curves turn upwards and scale as for periods beyond days ( years), close to the duration of the observations.

The results shown in Figure 8 allow us to draw a number of conclusions. First, there are many candidate gas giants in orbital periods of 5–20 years, similar to our Solar System. Figure 8 shows that the main limitation at the moment for detection of an analog of our Solar System is the duration of the survey, rather than the sensitivity. We suspect that some of these long period candidates will turn out to be more massive that the indicated in Figure 8, since only a partial orbit has been observed, leaving the fitted mass uncertain. There are several candidates with . For these candidates, further observations are needed to rule out stellar jitter as the cause of the observed periodic variability. The upper limit curves continue smoothly to periods smaller than 3 days, and are not noticeably affected until they get close to 1 day. This implies that the abrupt drop in the number of planets at days is a real feature. In the “period valley” between 10–100 days (Jones et al. 2003; Udry et al. 2003), the detectability is good, with upper limits of for 50% of stars. However, for periods days, the upper limits are larger, . Therefore the reported lack of objects with at days by Udry et al. (2003) is clearly dependent on understanding the selection effects in this region. We address this in the next section by using these upper limits to correct the observed period and mass distributions for incompleteness.

## 3. The mass-period distribution

In this section, we describe a method for correcting for incompleteness by taking into account the non-detections, and discuss the resulting distribution of minimum masses and orbital periods. For conciseness we will refer to the minimum mass as “mass” throughout this section, although it should be noted that we do not include the distribution of inclination angles which is needed to determine the true mass from the measured minimum mass (Jorissen, Mayor, & Udry 2001; Zucker & Mazeh 2001). This is reasonable for a large statistical sample. The average value of the ratio of minimum mass to true mass is , a small correction for this analysis, and we also note that power law scalings are not affected by the unknown factors (Tabachnik & Tremaine 2002).

### 3.1. Including the upper limits

Several methods have been discussed in the literature for finding the distribution most consistent with a set of detections and upper limits (Avni et al. 1980; Feigelson & Nelson 1985; Schmitt 1985). Such data are known as censored data, and the analysis as survival analysis. Correcting for the upper limits involves counting which of them usefully constrain the distribution at a given point. Avni et al. (1980) present a method for doing this counting for a one-dimensional distribution. Here, we generalize this approach to the two-dimensional mass-period plane. We follow Avni et al. (1980) and present a heuristic derivation in this section; a more detailed derivation by a maximum likelihood method is given in the Appendix. The reader may find it useful to compare our discussion with §III.D of Avni et al. (1980) which describes the 1-dimensional case.

It is instructive to consider a hypothetical example to develop some intuition. Imagine that after looking at a given star there were three possible outcomes: we either (i) detect a planet, (ii) completely exclude the presence of a planet, or (iii) are not able to say anything (i.e. cannot exclude or confirm the presence of a planet). First, we observe stars with the result that planets are found and we are able to rule out planets from all of the remaining stars. The best estimate of the fraction of stars with planets is then . However, what if we are able to rule out planets only from a subset of stars, ? In this case, the best estimate of the planet fraction is to take the ratio of the number of detections to the number of stars for which a detection was possible, . We must take as the denominator because for each detected planet, the actual pool of target stars is less than the total pool, because the data for some of the stars are inadequate to detect that planet. As an extreme case, consider looking at 100 stars, and being able to say that one has a planet, one does not, and nothing about the remaining 98 stars ( and ). The best estimate for the planet fraction is then 50%. The extra 98 stars for which no useful upper limit could be obtained do not contribute to the estimate. This simple example tells us how to interpret Figure 8. In a region of the mass-period plane in which planets can be ruled out for a fraction of stars, the best estimate of the number of planets is times the number of detections.

Our approach is to assign an effective number for each detected planet . If selection effects are unimportant, , so that the detected planet is a good representation of the number of planets at that mass and period. However, will be greater than 1 if planet has orbital parameters in part of the mass-period plane where completeness corrections are important. For instance, if the completeness for planets at a given mass and period is 50%, then for a planet at that mass and period will be 2. The idea behind this method is that we are sampling the mass and period distribution at the discrete set of points corresponding to the mass and periods of the detected planets. Of course, the underlying distribution is likely to be smooth, and so the quantity should be thought of as the probability that a star has a planet with mass and period close to those of planet . The normalization of is such that the total fraction of stars with planets is .

To calculate for a given star with a detected planet, we must count how many of the stars with non-detections could have an undiscovered planet with the same mass and period as planet . We therefore consider each non-detection in turn and ask whether the upper limit calculated in §2.5 allows a companion with period and to be present. If so, we must increase to allow for this incompleteness. However, the upper limit allows a hypothetical planet to be present with any amplitude or orbital period that satisfies , not just the orbital parameters of planet . Therefore, rather than increasing by 1 for every upper limit that allows additional undiscovered planets at that location, we must increase by the probability that the hypothetical planet would have the same parameters as planet . In effect, we “share out” the hypothetical planet amongst all the possible locations in mass period space that are allowed by the upper limit. This leads to the following rule for increasing each iteration:

 Nn+1i=1+∑j,Kup,j>Ki NniN⋆Z(Kup,j), (9)

where labels the iteration and the sum is over all non-detections which allow an undiscovered planet with the properties of planet to be present. Here, for each of the non-detections considered in the sum, is the upper limit on the velocity amplitude (§2.5) at the orbital period of planet . The second term in equation (9) is the probability that a planet with a velocity amplitude below the upper limit has the period and amplitude corresponding to planet . It is weighted by the normalization factor in the denominator which counts all the possibilities consistent with the upper limit: these are either (i) a planet is present with , or (ii) no planet is present. Mathematically, we write this as the probability that the star does not have a planet with an amplitude that violates the upper limit (),

 Z(Kup)=1−∑i, Ki>KupNniN⋆, (10)

where the sum is over all detected planets whose velocity amplitude exceeds .

For example, suppose we wish to calculate for planet with . We start with our initial guess of for all . We then turn to planet , and, for that planet, consider each star with a non-detection in turn. Let us say that the first such star has high jitter or a small number of data points, so that the upper limit on the velocity amplitude of a companion is . This means that a companion with an amplitude of , the same as planet , cannot be ruled out, and so we must add a contribution to . To do this, we first use equation (10) to calculate , where the sum is over all detected planets with . If there are 30 such planets, and 500 total stars, then using the current values (this is the first iteration), we find . This is the probability (given our current values for ) that a star does not have a planet with . We use this value in the first term of the sum in equation (9), which means that we add an amount to . Because we know that there is no planet with around this star, the probability that there is a planet with the properties of planet is larger than . We now continue with the sum. Let us say that the second star with a non-detection is well-observed and those observations rule out a planet with the properties of planet . This star then contributes nothing to the sum in equation (9). We continue for all stars with non-detections to complete the sum in equation (9) and arrive at the new value . We then repeat this calculation to obtain for all detected planets . This entire procedure is iterated until all values of have converged.

An important question is how much our results depend on the candidate detections, since these detections await further observations before they can be confirmed as being due to an orbiting planet. Long period signals need further observations to cover at least one orbit; low amplitude signals are potentially due to other factors such as stellar jitter. Therefore it is likely that some of these candidate periodicities are not due to a planet and should not be included. Even if they are due to a planet, as more data are collected, the best-fit orbital parameters of these candidates may change. To answer this question, we have calculated the values of both with and without the candidate detections, by either including the candidate detections or by treating them as non-detections, with the upper limit given by the detected velocity amplitude. We find that our conclusions regarding the mass-orbital period distribution are similar in each case (see, for example Table 1 discussed below).

### 3.2. A power law fit using maximum likelihood

As an alternative to the non-parametric description of the period and mass distribution that we described in the last section, we also fit the distribution with the simplest parametric model, a power law in mass and period. One way to do this would be to take the corrected data given by the values we have described, and fit a power law to the histogram or the cumulative distribution. Instead, we have used a maximum likelihood technique to fit a power law in mass and orbital period simultaneously to the original data (see also Tabachnik & Tremaine 2002). In the Appendix we start with the same likelihood that is used to derive the non-parametric technique described in §3.1, but instead consider a parametric model in which the probablility of a star having a planet at mass and period is , where is a normalization constant. The resulting expression for the likelihood is given in equation (A19), and we evaluate this numerically and maximize it with respect to the two parameters and .

### 3.3. Results for F, G, K dwarfs

We first present results for the 475 stars with which have F, G, and K spectral types (see §2.1 for a discussion of the sample selection and the range of spectral types). We do not attempt a detailed study of the stellar mass dependence of planet occurrence rate or orbital parameters in this paper. Fischer & Valenti (2005) show that the apparent rate of occurrence of planets increases by a factor of 2 for stellar masses between and . Investigating the stellar mass dependence of planet properties requires untangling it from the effect of stellar metallicity, beyond the scope of this paper (e.g., see the recent discussion in Johnson et al. 2007). However, several recent papers have pointed out that the planet occurrence rate around M dwarfs appears to be several times lower than around F, G and K stars (Butler et al. 2004b, 2006b; Endl et al. 2006; Johnson et al. 2007). To avoid biasing our results for the occurrence rate of planets, and to investigate the occurrence rate around M dwarfs further, we treat stars with masses separately in §3.4. In addition, three of the 48 stars with announced planets were added to the survey to confirm detections by other groups. To ensure a fair sample, we remove these stars from our analysis.

Figure 9 shows the results of the calculation described in §3.1. We plot the mass and period of announced planets (black circles) and candidate detections (green circles), as well as the upper limit contours (blue curves) as in Figure 8. The value of for each detection is indicated by the area of the circle. The smallest circles, well above the detection threshold, correspond to . At lower masses, the values of are roughly consistent with the simple argument given in §3.1, that if the detection lies at a mass and period excluded by a fraction of the upper limits, then roughly . For , the completeness corrections are roughly a factor of two, and are close to unity for .

Figure 10 shows the result of our power-law fit (§3.2). We fit the data in the mass range and days, which includes 32 announced planets, and 4 candidate detections. The constraints on and are shown in Figure 10. The best fit values which maximize the likelihood are and . The error in is larger than the error in , presumably because the dynamic range in orbital periods is greater than in the planet masses. The best fitting power law gives a fraction of stars with a planet of % in this mass and period range, which compares with the value of 8.5% derived from our completeness corrections (see Table 2). The values of and are slightly correlated, in the same sense as Tabachnik & Tremaine (2002) observed, but to a smaller degree.

#### Fraction of stars with a planet

Summing in a particular region of the mass-orbital period plane and dividing by gives the fraction of stars with planets in that region. The percentage of stars with a planet more massive than a given mass and closer to the star than a given orbital period is listed in Table 1. In parentheses we give the Poisson error based on the number of detections. For each table entry, we give four values. The first two are the percentages derived by including both announced planets and unannounced candidates, with and without completeness corrections included. The third and fourth values are the percentages derived by including the announced planets only (in which case the velocity amplitude of each candidate is treated as an upper limit on velocity amplitude rather than a detection), with and without completeness corrections included. The two values are similar over most of Table 1. The largest differences of % are at long periods days, where there are very few announced planets, and many candidate periodicities.

#### Distribution of orbital periods

Figure 11 shows the orbital period distribution for planets with for orbital periods up to days, beyond which the detectability declines as the orbital period approaches the duration of the survey. In the lower panel, the dotted histogram is the distribution of detections, including announced and candidate detections; the solid histogram is the distribution of detections after correcting for completeness, i.e. summing in each bin. For each bin, we indicate the expected Poisson errors based on the number of detections but rescaled by the ratio of in that bin to the number of detections in that bin. The upper panel shows the cumulative histogram, showing the fraction of stars with a planet within a given orbital period. For clarity, we do not show the Poisson errors on the cumulative histogram, but they can be calculated based on the total number of stars. For example, approximately 2.4% of stars have a planet more massive than with an orbital period of days. This represents stars out of the total of 472, or a fraction %.

At the shortest periods, the period distribution shows the well known pile up of planets at orbital periods close to 3 days. This is illustrated in more detail in Figure 12 which shows the period distribution for those planets with days, and masses (the increased detectability of close in planets means that we can go to lower masses than in Fig. 11). All of these planets are announced; there are no candidate planets in this range of mass and period. Butler et al. (1996) mention that there are no significant selection effects that would lead to this pile up: we can see that very clearly in Figure 9, in which the upper limit curves continue smoothly to periods as short as 1 day with no change in detectability. The statistical significance of the pile up in our data depends on the model and range of orbital periods against which it is compared. A Kolmogorov-Smirnov (KS) test gives a 0.4% probability that the observed distribution is drawn from a uniform distribution in in the decade to days. If we extend the uniform distribution out to 100 days, the KS probability is 20%.

As we described earlier, a power law fit to the period distribution gives , which rises to longer periods. However, an alternative description of the distribution is a rapid increase in the planet fraction at orbital periods of days. Figure 11 shows that there is a change of slope in the cumulative distribution which suggests that the planet fraction increases beyond orbital periods of days. The change in slope does not depend on whether the candidate planets are included. If we assume that the orbital period distribution above and below days is flat, we find that the fraction of stars with a planet per decade is % at short periods, and % at long periods (the latter becomes % if only announced planets are included). Therefore, the incidence of planets increases by a factor of for periods longwards of days. The low planet fraction at intermediate orbital periods has been noted previously. Jones et al. (2003) and Udry et al. (2003) both pointed out that there is a deficit of gas giants at intermediate periods, days.

We have focused on the period distribution for days, but Figure 9 shows that there are many candidate planets with orbital periods days. In our analysis, we have assumed that the minimum mass and orbital period of detected companions are well-determined. However, for orbital periods longer that the time-baseline of the data, this is not the case: there exists a family of best-fitting solutions with a range of allowed orbital periods, masses, and eccentricities (see e.g. Ford 2005; Wright et al. 2007). To constrain the distribution at long orbital periods requires taking into account the distributions of orbital parameters allowed by the data for each star. For now, we extrapolate the period distribution determined for days to predict the occurrence rate of long period orbits assuming that either the flat distribution or the power law holds for longer orbital periods. The results are given in Table 2. For example, if the distribution is flat in beyond days, we expect that 17% of solar type stars harbor a gas giant (Saturn mass and up) within 20 AU. In the power law case, the fractions are larger, but not substantially larger because of the small value of . These extrapolations are consistent with the number of candidates we find at long orbital periods. If we sum the confirmed planets and candidates at long periods, taking the completeness corrections into account, and assuming that the fitted orbital periods are the correct ones, we find that 18% of stars have a planet or candidate within . This is the same fraction as our extrapolations suggest for orbital separations less than . However, the uncertainties in orbital parameters need to be taken into account before we can use the long period candidates to learn about the period distribution at long orbital periods.

#### The mass function of planets

The distribution of planets with and days is shown in Figure 13. As we have discussed, a power law fit to the mass function gives , with , so that the distribution in is approximately flat, but slowly rising to lower masses. Our value for agrees with the found by Butler et al. (2006a) by fitting the mass function of planets detected in the combined Keck, Lick, and AAT surveys (see also Jorissen, Mayor, & Udry 2001). The cumulative distribution in the upper panel of Figure 13 (which shows the fraction of stars with a planet more massive than a given mass) shows a corresponding close to linear increase to lower masses. The absence of a turnover in the cumulative distribution at low masses shows that our results are consistent with the mass function remaining approximately flat in to the lowest masses with good detectability.

An important question is whether the mass function is dependent on orbital period. In Figures 14 and 15, we show the mass distributions in three different period ranges: periods less than days, between days and 1 year, and greater than 1 year. In the context of theoretical models for planet formation and migration, these ranges correspond to planets that have undergone different amounts of migration (e.g. Ida & Lin 2004). For orbital periods less than a year, we give the distribution down to , but for longer orbital periods where detectability is not as good, we restrict the mass range to . We detect no close in planets with . This lack of close, massive planets has been noted before, and is significant: Figure 9 shows that the survey is complete in that region of the mass-period plane. At the longest orbital periods, there are almost as many detections in the mass range (half a decade) as there are in the range (a full decade), suggesting a steeper fall off with mass than the overall mass distribution. However, this depends on future confirmation of the candidates with additional observations, and depends on the completeness corrections, which are significant in the lowest mass bin.

Ida & Lin (2004) predict a mass-orbital period “desert” at low masses and intermediate orbital periods. Our data show no evidence for a drop in the planet occurrence rate at low masses, as can be seen in the central panel of Figure 14, although as Figure 9 shows, we are not able to address the distribution for masses in this period range because of selection effects. In their study of the mass-period distribution, Udry et al. (2003) found evidence for a deficiency of planets with at orbital periods days. They simulated the detectability of planets in that region, and concluded that detections would have been made if the mass distribution of planets at days was similar to that of the hot jupiters. Interestingly, we find Saturn mass candidates at periods longer than 1000 days, but not in the range –1000 days. However, the small number of detections in that region of the mass-period plane and the fact that the completeness corrections are significant there mean that we cannot come to any definite conclusions.

Fitting a uniform distribution in to the distributions in Figure 14, we find a fraction per decade of % for , % for , and % for . It is interesting to extrapolate this distribution to lower masses. For example, at short periods days, we expect 3.1% of stars to have a planet more massive than 10, and % to have an Earth mass planet or larger. For periods less than 1 year, extrapolation gives 7.4% of stars with , and 11% with .

#### Comparison with previous work

Our results for the fraction of stars with planets and the mass and orbital period distributions are generally consistent with previous determinations. Marcy et al. (2005a) estimate that 12% of stars have a gas giant within 20 AU, based on a flat extrapolation of the orbital period distribution of the detected planets in the combined Lick, Keck, and Anglo-Australian surveys. If we take announced planets only without completeness corrections, we find that the fraction of stars with a planet per decade is %, which extrapolates to % within 20 AU in good agreement with Marcy et al. (2005a).

Tabachnik & Tremaine (2002) fit the mass and period distributions with a double power law, , accounting for selection effects by estimating the detection thresholds for each of the Doppler surveys. They obtained , which agrees with our value , and , which agrees with our . Our error bars are larger than those of Tabachnik & Tremaine (2002) because in their work they included several surveys, and so have a larger total number of stars in their sample (their results for the Keck survey alone have similar error bars to our results). Extrapolating, Tabachnik & Tremaine (2002) found that 4% of solar type stars have planets with and . Our results give 6–9% depending on how the candidate detections are included (Table 1).

Lineweaver & Grether (2003) also fit a double power law in mass and period. Their technique was to define an area in which they estimate all planets have been detected around stars currently being surveyed, and extrapolate to longer periods and lower masses. For the mass function, they found , i.e. a significant rise in the mass function at low masses. We do not observe such a rise in Figure 13. For the period distribution, they found which again indicates a rise in the planet occurrence rate at long periods. They estimated that 9% of stars have a planet with and . Our extrapolation suggests a fraction % in this mass and period range (see Table 2).

Naef et al. (2005) present an analysis of data for 330 stars with 18 detected planets from the ELODIE Planet Search program. They give the fraction of stars with planets more massive than within three different orbital periods: % for , % for , and % for . For the same mass and period ranges, we find %, % and % (this last number is % if only announced planets are included).

### 3.4. Results for M dwarfs

We now turn to the M dwarfs in the sample. Figure 16 shows the results of the calculation described in §3.1 applied to the 110 stars with . Having detected 46 planets from 475 F, G, and K stars, we would expect 11 planets in this sample of M dwarfs if the planet frequency and detectability were the same. Instead, there are only 2 announced planets: GJ 876, which is in fact a triple system but our search detects only the most massive planet at days, and GJ 436, a planet in an orbit of less than 3 days.

Several recent papers have addressed the apparent paucity of gas giant planets around M dwarfs. In their paper announcing the discovery of the Neptune-mass planet orbiting GJ 436, Butler et al. (2004b) estimated that the planet fraction for M dwarfs is % for masses and periods years, roughly an order of magnitude lower than around F and G main sequence stars. Butler et al. (2006b) announced the detection of a planet in an orbit with days around GJ 849. Including both GJ 876 and GJ 849, they estimate a planet fraction % for planet masses and periods AU. In the same range of but with a larger mass limit , Johnson et al. (2007) find that this fraction is % for stars in the mass range -. Endl et al. (2006) give a limit on the fraction of M dwarfs with planets of % at the confidence level. This result is less constraining, since they estimate that their survey of 90 M dwarfs is 95% complete for and AU, and Table 1 shows that we find a planet fraction of % for planets in this mass and period range around FGK stars. These observational constraints agree with predictions of core accretion models for planet formation, which find that Jupiter mass planets should be rare around M dwarfs, with the mass function of planets shifted towards lower masses (Laughlin et al. 2004; Ida & Lin 2005; Kennedy & Kenyon 2008) (although Kornet & Wolf 2006 predict a greater incidence of gas giants at lower stellar masses if the protoplanetary disk parameters do not change with stellar mass).

With so few detections in our sample we obviously cannot say anything about the mass-period distribution. However, we can address the possible role of selection effects and constrain the relative fractions of stars with planets around dwarfs compared to FGK stars. We again take a maximum likelihood approach. We assume that the mass-period distribution for M dwarfs is the same power law distribution that we fit for the larger sample of FGK stars, , with and , but with a different normalization constant . (The mass-period distribution is likely different for M dwarfs than FGK stars, but this is the simplest assumption given the available data). We then calculate the likelihood for the ratio of normalization constants . This is shown in Figure 17. We use the same mass range , and period range days as in §3.1. The best fitting value is , indicating that M dwarfs are 10 times less likely to harbor a gas giant within 2000 days. The 95.4% () upper limit on the relative planet fraction is 0.51. Using the normalization from the power law model (which for the best-fitting model has 10.5% of stars with planets for FGK stars), we find the best-fitting M dwarf planet fraction to be %, with a upper limit of %.

The shape of the curve in Figure 17 is straightforward to understand. In the period range days and mass range , there are 35 detections out of 475 FGK stars, a fraction of 7.4%. If the planet fraction around M dwarfs is times the planet fraction around FGK dwarfs, we therefore expect to find detections amongst the 110 M dwarfs. The probability of detecting 1 planet is then from the Poisson distribution. Using Bayes’ theorem (e.g. Sivia 1996), we can view this expression as the probability distribution function for given the measured number of detections. We plot this as the dotted curve in Figure 17. The fact that the dotted curve lies to the right of the maximum likelihood result indicates that the selection effects favor detection of gas giants around M dwarfs by about 25% (if we increase the expected number by 25%, from to , the solid and dotted curves lie almost on top of each other). Although the Doppler errors and upper limits on velocity amplitude are generally greater for the M dwarfs, the lower stellar mass means that a gas giant planet intrinsically gives a larger velocity signal. The net effect is a slightly larger detectability of gas giants for M dwarfs.

This result shows that the deficit of gas giants around M dwarfs is statistically significant in our sample, and is not due to selection effects against finding companions to M dwarfs. However, the best-fitting value of the ratio is subject to small number statistics (Fig. 17). An illustration of this is the recently announced companion to GJ 849 (Butler et al. 2006b), which is one of the long period candidates shown in Figure 16. The data we analyse here do not include recent observations of this star taken in 2005 and 2006 that show the closure of the orbit. As a result, our search algorithm finds an orbital period of 4400 days, more than twice as long as the true orbital period of days. Therefore GJ 849 is actually just inside the region considered above ( days), suggesting that the best current estimate for the M dwarf planet fraction is % (Butler et al. 2006b) within 2000 days. The dashed curve in Figure 17 shows the result if we include the companion to GJ 849 in our calculation (by correcting the orbital period to the announced value by hand). Again, our result is well approximated by a Poisson distribution (but this time with two detections) if the expected number is increased by 25%. The best fit relative fraction is , which corresponds to % of M dwarfs with gas giants within 2000 days. The confidence limit on is .

## 4. Summary and Conclusions

We have carried out a systematic search for planets using precise radial velocity measurements of 585 stars from the Keck Planet Search. The number, duration, and frequency of observations, and typical Doppler measurement errors are summarized in Figures 2 and 3. This analysis provides a snapshot of the Keck Planet Search at the time of the HIRES spectrometer upgrade in 2004 August.

We systematically searched for planets by calculating the false alarm probability associated with Keplerian orbit fits to the data for each star (C04; Marcy et al. 2005b; O’Toole et al. 2007). This method allows the detection threshold for each star to be understood in terms of the number and duration of the observations, and the underlying “noise” from measurement errors, intrinsic stellar jitter, or additional low mass planets. The results are summarized in Figure 5. We find that all planets with orbital periods days, velocity amplitudes , and eccentricities have been announced. For stars without a detection, upper limits (Fig. 7) are typically for orbital periods less than the duration of the observations, and increase for longer periods (see C04 for a discussion of the period dependence of the detection threshold). The upper limits constrain the presence of additional planets, and allow us to study the mass and orbital period distribution. In section 3, we described a method to calculate the completeness corrections to the mass-orbital period distribution at low masses and long orbital periods. Our method is a generalization of the iterative method of Avni et al. (1980) to two dimensions. In the Appendix, we show that our approach corresponds to a maximum likelihood method with simple approximations for the likelihood functions of detections and non-detections.

The resulting completeness corrections for the 475 F, G and K stars in the sample are summarized in Figure 9, and Table 1 gives the fraction of stars with a planet as a function of minimum mass and orbital period (see §2.1 for details of the stellar sample including the distribution of spectral types). For masses , the detectability is good for periods as large as 2000 days. A power law fit to the data in this range gives a mass-period distribution with and . The normalization constant is such that the fraction of FGK stars with a planet in the mass range and period range days is 10.5%. In units corresponding to measuring planet masses in Jupiter masses and orbital periods in days, the value of the normalization is .

Table 2 shows the expected planet fractions obtained by extrapolating our results out to long periods. We estimate that % of stars have a gas giant planet within 20 AU. Extrapolating to low masses gives 11% of stars with an Earth mass planet or larger within 1 AU. This extrapolation is uncertain, since it takes the distribution derived for gas giant planets into the mass range of rocky planets, for which the formation and migration history is presumably quite different, e.g. Ida & Lin (2004a). A similar uncertainty applies for our extrapolation to long orbital periods also, since for example the relevant timescales for planet formation grow longer at larger orbital radii, although outwards migration can populate long period orbits (Veras & Armitage 2004; Martin et al. 2007).

We find several interesting features in the mass-period distribution. Massive planets () are rare at short orbital periods, as has been noted previously. There is no significant evidence for a lower cutoff in the mass function at intermediate orbital periods, down to planet masses of . Therefore we do not see any evidence yet for the planet desert proposed by Ida & Lin (2004a). For orbital periods longer than a year, there are almost as many detections in the mass range (half a decade) as there are in the range (a full decade), suggesting a steeper fall off with mass than the overall mass distribution at long periods. However, this result depends on several candidate detections with in this period range that await confirmation. A dependence of the mass function on orbital period might indicate differences in migration mechanisms for different planet masses (Armitage 2007). The orbital period distribution shows an increase in the occurrence rate of gas giants of a factor of 5 beyond days. Theoretical models of planet formation generally predict a smooth increase in the incidence of gas giants at longer orbital periods due to the increasing rate of migration as a planet moves inwards through the protoplanetary disk (e.g. Figure 1 of Ida & Lin 2004b; Figure 5 of Armitage 2007). The sharp increase in the period distribution at days shown in Figure 11 may reflect some particular radial structure in the protoplanetary disk. Ida & Lin (2008b) propose an explanation for this upturn based on a surface density enhancement at the ice line due to a local pressure maximum in the disk. Detailed comparisons between these various models and our results would constrain parameters such as the ratio of migration to disk depletion timescales (e.g. Ida & Lin 2004; Armitage 2007).

Because of the small number of detections in the sample of 110 M dwarfs, we are not able to constrain the mass-period distribution for these stars. However, by assuming that the mass-period distribution is the same for M dwarfs as for more massive stars, we constrained the occurrence rate of planets relative to the FGK stars, taking into account possible differences in detectability between the two groups. Our results shows that the occurrence rate of gas giants within days is times smaller for M dwarfs than FGK dwarfs (Fig. 17), with a two sigma limit . A lower incidence of Jupiter mass planets around M dwarfs is predicted by core accretion models for planet formation (Laughlin, Bodenheimer, & Adams 2004; Ida & Lin 2005; Kornet & Wolf 2006; Kennedy & Kenyon 2008). Comparing with Figure 8 of Ida & Lin (2005), we find that both the absolute and relative occurrence rates that we derive for Jupiter mass planets agree best with their standard model, in which disk mass is an increasing function of stellar mass (). Kennedy & Kenyon (2008) scale their disk mass , but include a detailed calculation of the position of the snow line. Their Figure 7 shows that the probability of having at least one giant planet is times lower for star than a star, within the range found in this paper. We can rule out a larger gas giant planet fraction for M dwarfs than for solar mass stars, as found by Kornet & Wolf (2006) for models in which the disk parameters were independent of stellar mass.

Our calculations can be improved in several respects. First, we neglected eccentricity when accounting for non-detections. This is reasonable for most values of , since eccentricity has a large effect on detectability for (Endl et al. 2002; C04). However, the population of high eccentricity planets () is not well constrained. In addition, there are more subtle selection effects involving eccentricity. For example, Cumming (2004) showed that non-zero eccentricity enhances detectability for orbital periods longer than the time baseline of the data, introducing a bias in the longest period orbits towards systems with . Further analysis is required to study the eccentricity distribution and the orbital period distribution of long period planets. Our data potentially allow us constrain the distribution of orbital periods beyond the 8 year time baseline of the observations, but this will require averaging over the range of possible eccentricities for those outer planets. We did not include multiple planet systems. This introduces some uncertainty in our derived distributions, since our technique identifies only a single planet. In a multiple system, this is the planet with the largest velocity amplitude, so that the distributions derived here are for the most detectable planet in a system. Finally, our method for including the upper limits involves dividing the data into either detections or non-detections, which depends on the choice of detection threshold. A better approach would be to calculate the probabilities directly and include them in the analysis (see eq. [A1]). Techniques to evaluate the relevant Bayesian integrals over the multidimensional parameter space have been discussed in the literature (Ford 2006; Ford & Gregory 2007; Gregory 2007a, 2007b). This approach will allow orbital eccentricity, the uncertainty in parameters associated with long orbital periods, and multiple companions to be included.

We thank Peter Bodenheimer, Rychard Bouwens, Ting-Kuei Chou, Gil Holder, Greg Laughlin, Doug Lin, and Steve Thorsett for useful discussions. We gratefully acknowledge the dedication and support of the Keck Observatory staff. We thank NASA and the University of California for generous allocations of telescope time. The authors extend thanks to those of Hawaiian ancestry on whose sacred mountain of Manua Kea we are privileged to explore the universe with them. We are grateful for generous funding support through NASA grant NNG05GK92G to GWM, NSF grant AST-0307493 to SSV, NASA grant NNG05G164G to DAF, and the Cottrell Science Scholar program to DAF. AC thanks Caltech Astronomy Department and the Kavli Institute for Theoretical Physics, Santa Barbara for hospitality while this work was in progress. Part of this work was completed while AC was supported by NASA at the University of California, Santa Cruz, through Hubble Fellowship grant HF-01138 awarded by the Space Telescope Science Institute, which is operated by the Association of Universities for Research in Astronomy, Inc., for NASA, under contract NAS 5-26555. AC is currently supported by an NSERC Discovery Grant, Le Fonds Québécois de la Recherche sur la Nature et les Technologies, and the Canadian Institute for Advanced Research, and is an Alfred P. Sloan Research Fellow.

## Appendix A Correcting for the upper limits: A maximum likelihood approach

### a.1. Likelihood function for detections and non-detections

In this Appendix, we derive equation (9) for the completeness corrections starting with a maximum likelihood approach. We assume that the fraction of stars with a planet is , with a mass-period distribution normalized so that . The likelihood of the data for star is

 Li=(1−FP)qi+∫dlnM∫dlnP f(M,P)pi(M,P), (A1)

where is the probability of the data given a planet of mass and period , and is the probability of the data given that no planet is present (see eqs. [10] and [11] of C04). To determine , we maximize the total likelihood, which is the product over all stars of the individual likelihoods, .

Note that equation (A1) for the likelihood assumes that each star has either no planet or one planet, i.e. that the presence of a planet with mass and period excludes the possibility of additional planets with other masses and periods. This is consistent with our search algorithm described in §2.3 which fits a single Keplerian orbit, and for multiple planet systems identifies only the planet with the largest radial velocity amplitude. The distribution that we derive should therefore be considered as the mass-period distribution of the most detectable planet in a system. This is a close approximation to the true distribution since the number of multiple planet systems is % of the total (e.g. Marcy et al. 2005a). To include the possibility of multiple planets, the likelihood function can be derived by considering each bin in mass-period space as an independent Poisson trial (e.g., as Tabachnik & Tremaine 2002), however this reduces to equation (A1) when the expected number of planets per star is (compare eq. [10] of Tabachnik & Tremaine 2002 with eq. [A5] below; see also Appendix B of Tokovinin et al. 2006 for a clear discussion).

In this paper, rather than evaluate and directly, we have classified each data set as either a detection or a non-detection. In the case of a detection, we make the approximation

 qi≈0,pi(M,P)≈MiPiδ(M−Mi)δ(P−Pi), (A2)

since we expect the likelihood to be strongly peaked near the best fit mass and period for a strong detection, with a vanishing probability that no planet is present. We write the mass, period, and velocity amplitude of detection as , , and . For a non-detection, we expect

 pi(M,P)∝{qiKKup,i, (A3)

since we are able to rule out velocity amplitudes above the upper limit but not below it, and it remains possible that the star has no planet. Substituting these approximations into equation (A1) gives

 Li∝{f(Mi,Pi)detection1−∫K>Kup,idlnM dlnP f(M,P)non-detection (A4)

where we have used the normalization . The total likelihood is

 L=Nd∏i=1f(Mi,Pi) N⋆−Nd∏j=1(1−∫K>Kup,jdlnM dlnP f(M,P)), (A5)

where is the number of detections out of stars. Equation (A5) is a two-dimensional generalization of the likelihood function of Avni et al. (1980).

As a quick aside to gain some intuition, let’s assume that is a constant independent of and . In addition, assume that is the same for each star with a non-detection. Then,

 L∝FNdP (1−kFP)N⋆−Nd (A6)

where is the fraction of planets ruled out by the upper limit. To find the best fit value of , we maximize by setting . This gives

 FP=NdN⋆k−1. (A7)

If the upper limit rules out the whole mass-period plane (in other words we can say that a star without a detection definitely does not have a planet) then and . However, if then we can exclude only a fraction of planets, and our estimate of must therefore be larger. For example, if a third of planets lie above the upper limit (), we conclude that there are three times as many planets as we actually detect, .

### a.2. Maximizing the likelihood: non-parametric approach

To proceed further, one could divide the mass-period plane into a grid and solve for in each grid cell (a non-parametric approach), or assume a parametric form for and find the parameters that maximize the likelihood. We follow Avni et al. (1980, §Vb) which is to discretize at the locations of the detected planets, i.e. write , where we use the shorthand , the sum is over the detections, and the normalization is . This method gives in a non-parametric way, but without binning the distribution. The total log likelihood is

 logL=∑ilogfi + ∑jlog⎡⎣1−∑i,Ki>Kup,jfi⎤⎦, (A8)

where the sums with index are over detections and the sum with index is over all non-detections.

We can now go ahead and maximize with respect to the values of . We set , which gives

 fi=⎡⎣∑j,Kup,j

where the sum with index is over all non-detections with upper limits that exclude a companion with the amplitude of detection , and we define

 Z(Kup)=1−∑i,Ki>Kupfi, (A10)

which has a sum over all detections that have velocity amplitudes larger than the specified upper limit .

Equation (A9) is an equation for which can be solved iteratively, as in §3.1. However, we proceed a little further in order to make connection with the heuristic derivation given in §3.1. We write equation (A9) as

 1=fi∑j,Kup,j

and then sum both sides over the detections, from to . After changing the order of the sums on the right hand side and simplifying, we find the result

 N⋆=∑j1Z(Kup,j) (A12)

where the sum is over all non-detections. Using this to rewrite equation (A9), we find

 fiN⋆=1+∑j,Kup,j>KifiZ(Kup,j), (A13)

where the sum is over all non-detections which have upper limits that allow a companion with the same amplitude as detection to be present. Equation (A13) is therefore equivalent to equation (A9), and with the final definition , reduces to equation (9) of §3.1 when solved iteratively.

### a.3. The limit of small bin size

One might worry that by following the distribution only at the location of the detected planets, we are missing those areas of the mass-period plane in which there are no detections. In fact, we show now that the converged solution has non-zero values for only at the locations of the detected planets. We start with equation (A5) and divide the mass-period plane into grid cells, labelled by so that we will write averaged over grid cell as . The grid cell containing detection we write as . The likelihood is then given by

 logL=∑ilogf(αi) + ∑jlog⎡⎢⎣1−∑α,K(α)>Kup,jf(α)⎤⎥⎦. (A14)

In the last term, is the velocity amplitude associated with grid cell , and so the sum is over only those grid cells that are constrained by the upper limit . We assume that the grid cells are small enough that the entire grid cell is either excluded or allowed by the upper limit; in practice only a part of the grid cell may be excluded by the upper limit, but we ignore this here for clarity. Now, maximizing with respect to the set of by setting , we find

 f(α)=Nd(α) ⎡⎢⎣∑j,Kup,j>K(α)1Z(Kup,j)⎤⎥⎦−1, (A15)

where is the number of detections in grid cell and . Following th