A Bayesian Periodogram Finds Evidence for Three Planets in 47 Ursae Majoris
A Bayesian analysis of 47 Ursae Majoris (47 UMa) radial velocity data confirms and refines the properties of two previously reported planets with periods of 1079 and 2325 days. The analysis also provides orbital constraints on an additional long period planet with a period days. The three planet model is found to be times more probable than the next most probable model which is a two planet model. The nonlinear model fitting is accomplished with a new hybrid Markov chain Monte Carlo (HMCMC) algorithm which incorporates parallel tempering, simulated annealing and genetic crossover operations. Each of these features facilitate the detection of a global minimum in . By combining all three, the HMCMC greatly increases the probability of realizing this goal. When applied to the Kepler problem it acts as a powerful multi-planet Kepler periodogram.
The measured periods are , , and d, and the corresponding eccentricities are , , and . The results favor low eccentricity orbits for all three. Assuming the three signals (each one consistent with a Keplerian orbit) are caused by planets, the corresponding limits on planetary mass () and semi-major axis are
(, ), (, ), and (, ), respectively. Based on a three planet model, the remaining unaccounted for noise (stellar jitter) is m s.
The velocities of model fit residuals were randomized in multiple trials and processed using a one planet version of the HMCMC Kepler periodogram. In this situation periodogram peaks are purely the result of the effective noise. The orbits corresponding to these noise induced periodogram peaks exhibit a well defined strong statistical bias towards high eccentricity. We have characterized this eccentricity bias and designed a correction filter that can be used as an alternate prior for eccentricity, to enhance the detection of planetary orbits of low or moderate eccentricity.
keywords:stars: planetary systems; stars: individual: 47 Ursae Majoris; methods: statistical; methods: numerical; techniques: radial velocities.
Improvements in precision radial velocity (RV) measurements and continued monitoring are permitting the detection of lower amplitude planetary signatures. One example of the fruits of this work is the detection of a super earth in the habitable zone surrounding Gliese 581 (Udry et al., 2007). This and other remarkable successes on the part of the observers is motivating a significant effort to improve the statistical tools for analyzing radial velocity data (e.g., Loredo & Chernoff 2003, Loredo 2004, Cumming 2004, Gregory 2005a & b, Ford 2005 & 2006, Ford & Gregory 2006, Cumming & Dragomir 2010). Much of the recent work has highlighted a Bayesian MCMC approach as a way to better understand parameter uncertainties and degeneracies and to compute model probabilities.
Gregory (2005a, b, c and 2007a, b, c) presented a Bayesian MCMC algorithm that makes use of parallel tempering (PT) to efficiently explore a large model parameter space starting from a random location. It is able to identify any significant periodic signal component in the data that satisfies Kepler’s laws and thus functions as a Kepler periodogram
The latest version of the algorithm, Gregory (2009), incorporates a genetic crossover operation into the MCMC algorithm. The new adaptive hybrid MCMC algorithm (HMCMC) incorporates parallel tempering, simulated annealing and genetic crossover operations. Each of these techniques was designed to facilitate the detection of a global minimum in . Combining all three in an adaptive hybrid MCMC greatly increase the probability of realizing this goal.
Butler & Marcy (1996) first reported a 1090 day companion to 47 UMa using data from Lick Observatory. With additional velocity measurements over 13 yr, Fischer et al. (2002) announced a long-period second planet, 47 UMa c, with a period of days and a mass of . Naef et al. (2004) reported observations from the fiber fed echelle spectrograph ELODIE of 47 UMa, and noted that the second planet was not evident in their data. Wittenmyer, Endl & Cochran (2007) reported that there is still substantial ambiguity as to the orbital parameters of the proposed planetary companion 47 UMa c. They gave a period of 7586 day for one orbital solution, and 2594 day for two others. In their latest work Wittenmyer et al. (2009), their best-fit 2-planet model now calls for days. In this paper we present a Bayesian analysis of the latest Lick observatory measurements and a combined Lick plus McDonald Observatory (Wittenmyer et al. 2009) data set.
We also report on an investigation of the behavior of the Bayesian HMCMC Kepler periodogram to noise. The noise data sets were formed by randomly interchanging velocity measurements.
2 The adaptive hybrid MCMC
The adaptive hybrid MCMC (HMCMC) is a very general Bayesian nonlinear model fitting program. After specifying the nonlinear model, data and priors, Bayes theorem dictates the target joint probability distribution for the model parameters which can be very complex. To compute the marginals for any subset of the parameters it is necessary to integrate the joint probability distribution over the remaining parameters. In high dimensions, the principle tool for carrying out the integrals is Markov chain Monte Carlo based on the Metropolis algorithm. The greater efficiency of an MCMC stems from its ability, after an initial burn-in period, to generate samples in parameter space in direct proportion to the joint target probability distribution. In contrast, straight Monte Carlo integration randomly samples the parameter space and wastes most of its time sampling regions of very low probability.
An important feature that prevents the HMCMC from becoming stuck in a local probability maximum is parallel tempering. Multiple MCMC chains are run in parallel. The joint probability density distribution for the parameters () of model , for a particular chain, is given by
Each MCMC chain corresponding to a different , with the value of ranging from zero to 1. When the exponent , the term on the LHS of the equation is the target joint probability distribution for the model parameters, . It is the posterior probability of a particular choice of parameter vector, , given the data represented by , the model choice , and the prior information . In general, the model parameter space of interest is a continuum so is a probability density distribution. The first term on the RHS of the equation, , is the prior probability density distribution of , prior to the consideration of the current data . The second term, , is called the likelihood and it is the probability that we would have obtained the measured data for this particular choice of parameter vector , model , and prior information . At the very least, the prior information, , must specify the class of alternative models (hypotheses) being considered (hypothesis space of interest) and the relationship between the models and the data (how to compute the likelihood). For further details of the likelihood function for this problem see Gregory (2005b). In many situations the log of the likelihood is simply proportional to the familiar statistic. If we later acquire another data set then the new prior, , is equal to our previous posterior, , i.e., . An exponent , yields a broader joint probability density equal to the prior. The reciprocal of is analogous to a temperature, the higher the temperature the broader the distribution.
For parameter estimation purposes 8 chains
() were employed. At an intervals of 10 iterations, a pair of adjacent chains on the tempering ladder are chosen at random and a proposal made to swap their parameter states. A Monte Carlo acceptance rule determines the probability for the proposed swap to occur (e.g., Gregory 2005a, eq. 12.12). This swap allows for an exchange of information across the population of parallel simulations. In low (higher temperature) simulations, radically different configurations can arise, whereas in higher (lower temperature) states, a configuration is given the chance to refine itself. The lower chains can be likened to a series of scouts that explore the parameter terrain on different scales. The final samples are drawn from the chain, which corresponds to the desired target probability distribution. For , the distribution is much flatter. The choice of values can be checked by computing the swap acceptance rate. When they are too far apart the swap rate drops to very low values.
Each parallel chain employs the Metropolis algorithm. At each iteration a proposal to jump to a new location in parameter space is generated from independent Gaussian proposal distributions (centered on the current parameter location), one for each parameter. In general, the ’s of these Gaussian proposal distributions are different because the parameters can be very different entities. Also if the ’s are chosen too small, successive samples will be highly correlated and will require many iterations to obtain an equilibrium set of samples. If the ’s are too large, then proposed samples will very rarely be accepted. The process of choosing a set of useful proposal ’s when dealing with a large number of different parameters can be very time consuming. In parallel tempering MCMC, this problem is compounded because of the need for a separate set of Gaussian proposal ’s for each chain (different tempering levels). This process is automated by an innovative two stage statistical control system (Gregory 2007b, Gregory 2009) in which the error signal is proportional to the difference between the current joint parameter acceptance rate and a target acceptance rate, typically 25% (Roberts et al. 1997). A schematic of the full adaptive control system (CS) is shown in Figure 1.
The first stage CS, which involves annealing the set of Gaussian proposal distribution ’s, was described in Gregory 2005a. An initial set of proposal ’s ( of the prior range for each parameter) are used for each chain. During the major cycles, the joint acceptance rate is measured based on the current proposal ’s and compared to a target acceptance rate. During the minor cycles, each proposal is separately perturbed to determine an approximate gradient in the acceptance rate for that parameter. The ’s are then jointly modified by a small increment in the direction of this gradient. This is done for each of the parallel chains. Proposals to swap parameter values between chains are allowed during major cycles but not within minor cycles.
The annealing of the proposal ’s occurs while the MCMC is homing in on any significant peaks in the target probability distribution. Concurrent with this, another aspect of the annealing operation takes place whenever the Markov chain is started from a location in parameter space that is far from the best fit values. This automatically arises because all the models considered incorporate an extra additive noise Gregory (2005b), for reasons discussed in Section 3, whose probability distribution is Gaussian with zero mean and with an unknown standard deviation . When the of the fit is very large, the Bayesian Markov chain automatically inflates to include anything in the data that cannot be accounted for by the model with the current set of parameters and the known measurement errors. This results in a smoothing out of the detailed structure in the surface and, as pointed out by Ford (2006), allows the Markov chain to explore the large scale structure in parameter space more quickly. The chain begins to decrease the value of as it settles in near the best-fit parameters. An example of this is shown in Figure 2. In the early stages is inflated to around 38 m s and then decays to a value of m s over the first 9,000 iterations. This is similar to simulated annealing, but does not require choosing a cooling scheme.
Although the first stage CS achieves the desired joint acceptance rate, it often happens that a subset of the proposal ’s are too small leading to an excessive autocorrelation in the MCMC iterations for these parameters. Part of the second stage CS corrects for this. The goal of the second stage is to achieve a set of proposal ’s that equalizes the MCMC acceptance rates when new parameter values are proposed separately and achieves the desired acceptance rate when they are proposed jointly. Details of the second stage CS were given in Gregory 2007b.
The first stage is run only once at the beginning, but the second stage can be executed repeatedly, whenever a significantly improved parameter solution emerges. Frequently, the burn-in period occurs within the span of the first stage CS, i.e., the significant peaks in the joint parameter probability distribution are found, and the second stage improves the choice of proposal ’s based on the highest probability parameter set. Occasionally, a new higher (by a user specified threshold) target probability parameter set emerges after the first two stages of the CS are completed. The control system has the ability to detect this and automatically re-activate the second stage. In this sense the CS is adaptive. If this happens the iteration corresponding to the end of the control system is reset. The useful MCMC simulation data is obtained after the CS are switched off.
The adaptive capability of the control system can be appreciated from an examination of Figure 1. The upper left portion of the figure depicts the MCMC iterations from the 8 parallel chains, each corresponding to a different tempering level as indicated on the extreme left. One of the outputs obtained from each chain at every iteration (shown at the far right) is the prior likelihood. This information is continuously fed to the CS which constantly updates the most probable parameter combination regardless of which chain the parameter set occurred in. This is passed to the ”Peak parameter set” block of the CS. Its job is to decide if a significantly more probable parameter set has emerged since the last execution of the second stage CS. If so, the second stage CS is re-run using the new more probable parameter set which is the basic adaptive feature of the CS.
The CS also includes genetic algorithm block which is shown in the bottom right of Figure 1. The current parameter set can be treated as a set of genes. In the present version, one gene consists of the parameter set that specify one orbit. On this basis, a three planet model has three genes. At any iteration there exist within the CS the most probable parameter set to date , and the most probable parameter set from the 8 chains for the most recent iteration . At regular intervals (user specified) each gene from is swapped for the corresponding gene in . If either substitution leads to a higher probability it is retained and updated. The effectiveness of this operation can be tested by comparing the number of times the gene crossover operation gives rise to a new value of compared to the number of new arising from the normal parallel tempering MCMC iterations. The gene crossover operations prove to be very effective, and give rise to new values times more often than MCMC operations. Of course, most of these swaps lead to very minor changes in probability but occasionally big jumps are created.
Gene swaps from , the parameters of the second most probable current chain, to are also utilized. This gives rise to new values of at a rate approximately half that of swaps from to . Crossover operations at a random point in the entire parameter set did not prove as effective except in the single planet case where there is only one gene. Further experimentation with this concept is ongoing.
3 Data and Analysis
Our initial analysis was based on data obtained at the Lick observatory and spans a period of 21.6 years. The data are listed in Tables 1 and 2. In addition to the observation time, radial velocity , and velocity error , the detector dewar number used is also included. We originally analyzed the data ignoring possible residual velocity offsets associated with dewar changes (Case A). To investigate how robust the results were we subsequently repeated the analysis incorporating the dewar velocity offsets as additional unknown parameters (Case B). In Case A the data from all 6 dewars are used. For Case B we excluded dewar 1 because with only a single measurement the analysis is unable to separate the offset from the model velocity contribution which reduces the time base by 235 days. Results for the two cases follow in subsequent sections labeled accordingly. In Section 6, we extend the analysis to include the Wittenmyer et al. (2009) data from the 9.2 m Hobby-Eberly Telescope (HET) and 2.7 m Harlam J. Smith (HJS) telescopes of the McDonald Observatory. In the rest of this section we describe the model fitting equations and the selection of priors for the model parameters. We also characterize a noise induced eccentricity bias that leads to a second choice for an eccentricity prior.
|m s||m s||m s||m s||m s||m s|
|m s||m s||m s||m s||m s||m s|
We have investigated the 47 UMa data using models ranging from a single planet to five planets. For a one planet model the predicted radial velocity is given by
and involves the 6 unknown parameters
a constant velocity.
the orbital period.
the orbital eccentricity.
the longitude of periastron.
the fraction of an orbit, prior to the start of data taking, that periastron occurred at. Thus, the number of days prior to that the star was at periastron, for an orbital period of P days.
the true anomaly, the angle of the star in its orbit relative to periastron at time .
We utilize this form of the equation because we obtain the dependence of on by solving the conservation of angular momentum equation
Our algorithm is implemented in Mathematica and it proves faster for Mathematica to solve this differential equation than solve the equations relating the true anomaly to the mean anomaly via the eccentric anomaly. Mathematica generates an accurate interpolating function between and so the differential equation does not need to be solved separately for each . Evaluating the interpolating function for each is very fast compared to solving the differential equation, so the algorithm should be able to handle much larger samples of radial velocity data than those currently available without a significant increase in computational time. For example, an increase in the data by a factor of 6.5 resulted in only an 18% increase in execution time.
As described in more detail in Gregory 2007a, we employed a re-parameterization of and to improve the MCMC convergence speed motivated by the work of Ford (2006). The two new parameters are and . Parameter is well determined for all eccentricities. Although is not well determined for low eccentricities, it is at least orthogonal to the parameter. We use a uniform prior for in the interval 0 to and uniform prior for in the interval to . This insures that a prior that is wraparound continuous in maps into a wraparound continuous distribution in . The big square holds two copies of the probability patch in which doesn’t matter. What matters is that the prior is now wraparound continuous in .
In a Bayesian analysis we need to specify a suitable prior for each parameter. These are tabulated in Table 3. For the current problem, the prior given in Equation 1 is the product of the individual parameter priors. Detailed arguments for the choice of each prior were given in Gregory 2007a.
|Parameter||Prior||Lower bound||Upper bound|
|Orbital frequency||1/1.5 d||1/1000 yr|
|(number of planets)|
|V (m s)||Uniform|
|b) Ecc. noise bias correction filter||0||0.99|
|Extra noise (m s)||0 (s)|
Gregory 2007a discussed two different strategies to search the orbital frequency parameter space for a multi-planet model: (i) an upper bound on is utilized to maintain the identity of the frequencies, and (ii) all are allowed to roam over the entire frequency range and the parameters re-labeled afterwards. Case (ii) was found to be significantly more successful at converging on the highest posterior probability peak in fewer iterations during repeated blind frequency searches. In addition, case (ii) more easily permits the identification of two planets in 1:1 resonant orbits. We adopted approach (ii) in the current analysis.
All of the models considered in this paper incorporate an extra noise parameter, , that can allow for any additional noise beyond the known measurement uncertainties
We used two different choices of priors for eccentricity, a uniform prior and eccentricity noise bias correction filter that is described in the next section.
3.1 Eccentricity Bias
When searching for low amplitude orbits any true signal has to compete against spurious orbital signals arising from noise. It was observed that the majority of the probability peaks detected in low signal-to-noise residuals exhibited high eccentricities. The upper panel in Figure 3 shows MCMC period parameter versus iteration for a 1 planet model fit to residuals (with randomized velocity values) from a 3 planet model fit. The lower panel is the same for the eccentricity parameter. The HMCMC finds many probability peaks spread over the full period range. There is no significance to the concentration of periods around 100 and 1500 days as the location of period concentrations changes markedly in other realizations of the velocity randomization. The concentration of eccentricity towards higher values is a regular feature. The corresponding plot of eccentricity shows a preponderance of high eccentricity values. Figure 4 shows a phase plot for one of these high eccentricity orbits which provides further insight into why high eccentricities orbits are favored. It is clear that for most of the orbit () the predicted shape is relatively flat providing an agreeable fit to points that fluctuate in an uncorrelated noise like fashion about some mean. Only for a small portion of the orbit does the noise have to conspire to give rise to the rapidly changing orbital velocity peak. To mimic a circular velocity orbit the noise points would have to appear correlated over a larger fraction of the orbit. For this reason it is more likely that noise will give rise to spurious highly eccentric orbits than low eccentricity orbits.
To explore this effect more quantitatively we analyzed a large number of real data sets where the observing times were kept fixed but the velocity residual data was randomly reorganized. In each trial we fit a one planet orbit model which explored eccentricities in the range 0 to 0.99 using the one planet Bayesian Kepler periodogram. In the first instance the data used was the 5 planet fit residuals for 55 Cancri. The data for 55 Cancri was a mixture of Lick and Keck observatory data. When the residual velocities were randomized the error associated with a particular velocity was shifted with its velocity because the quoted errors were very different for the two observatories. The red curve in the left panel of Figure 5 is the average of 5 different 55 Cancri randomized residuals trials. The green curve is the average of 4 trials of randomized residuals from a 2 planet 47 UMa model fit, and the blue curve the average of 8 trials of randomized residuals from a 3 planet 47 UMa model fit. All three curves are very similar and indicate a strong noise induced eccentricity bias towards high eccentricities.
To increase the chance of detecting and defining the parameters of low and moderate eccentricity orbits we have constructed an eccentricity noise bias correction filter from the reciprocal of the average of the three eccentricity bias curves just mentioned. The lower panel of Figure 5 shows the best fit polynomial (dashed curve) to the reciprocal of the mean of the three eccentricity bias curves (red points). After normalizing the best fit polynomial so the integral is equal to unity over the search range ( 0 to 0.99), we obtain the eccentricity noise bias correction filter (solid black curve). This becomes our second option for a choice of prior for eccentricity. The probability density function for this filter (solid black curve) is given by
On the basis of our understanding of the mechanism underlying the noise induced eccentricity bias, we expect the eccentricity prior filter to be generally applicable to searches for low amplitude orbital signals in other precision radial velocity data sets.
An obvious further question that remains to be explored is to what extent the observed distribution of published orbital eccentricities is influenced by such a bias.
4 Results (Case A)
For Case A, the dewar velocity offsets with respect to our reference dewar # 24 are assumed to be zero.
4.1 Parameter estimation
In this section we present the results of an exploration of the 47 UMa data with the multi-planet HMCMC Kepler periodogram starting with a one planet model and extending to a five planet model. The data for 47UMa is shown in Figure 6 panel (a). Panel (b) shows our final best 3 planet model fit compared to the data and panel (c) shows the residuals.
The one planet model turned up the 1080 day period which is clearly visible by eye in the raw data. We do not show any results for that model except to compute the marginal likelihood for model selection purposes which is presented in Section 4.2.
Figure 7 shows a plot of Log[Prior Likelihood] (upper) and period (lower) versus HMCMC iteration (every 200 point) for a 2 planet model. The starting periods of 4.7 and 1080 days are shown on the left hand side of the lower plot at a negative iteration number. The burn-in period of approximately 70,000 iterations is clearly discernable.
Figure 8 shows a plot of eccentricity versus period for a sample of the HMCMC parameter samples for the 2 planet model. Since the duration of the data set is only 7906 days, it is not surprising that uncertainties on the parameters of the second orbit are very large. On the basis of a 2 planet model, the parameters of the second planet are d and . It is clear that has a low eccentricty tail which reaches zero for a value of d. This agrees with the value of d found by Wittenmyer et al. (2009) in their best-fit 2-planet model where they fixed , the values proposed by Fischer et al. (2002).
Figure 9 shows plots of the 3 period parameters versus HMCMC iteration for a 3 planet model
Figure 11 shows a plot of eccentricity versus period for a sample of the HMCMC parameter samples for the 3 planet model. There is a large uncertainty in the eccentricity of the two largest periods which extends down to very low eccentricities.
Figure 12 shows the marginal probability distributions for the periods, eccentricities and K values for the three orbits found. The tenth plot is , the of the added white noise term. A summary of the 3 planet model parameters and their uncertainties are given in Table 4. The parameter value listed is the median of the marginal probability distribution for the parameter in question and the error bars identify the boundaries of the 68.3% credible region
|Parameter||planet 1||planet 2||planet 3|
|(JD - 2,440,000)|
The Gelman-Rubin (1992) statistic is typically used to test for convergence of the parameter distributions. In parallel tempering MCMC, new widely separated parameter values are passed up the line to the simulation and are occasionally accepted. Roughly every 100 iterations the simulation accepts a swap proposal from its neighboring simulation. The final simulation is thus an average of a very large number of independent simulations. What we have done is divide the iterations into 12 equal time intervals and inter-compared the 12 different essentially independent average distributions for each parameter using a Gelman-Rubin test. For all of the three planet model parameters the Gelman-Rubin statistic was .
Figure 13 shows a plot of eccentricity versus period for a 4 planet model. A well defined fourth period of days and eccentricity of was detected in repeated HMCMC trials. The amplitude was m s. The significance of this period is discussed further in Sections 4.2 and 6.
Finally, a 5 planet model was also attempted. In addition to the 4 periods found by the 4 planet model, a variety of probability peaks at other periods were observed but none were deemed significant.
As a test of our overall methodology, we simulated data for a 3 planet model based on the MAP values from the fit to the real data for the Case A analysis. The data was sampled at the real observation times and had added independent Gaussian noise with a , where is the quoted measurement error for the i point and , the extra noise parameter, was 4.4m s. Figure 14 shows a plot of eccentricity versus period for a sample of the HMCMC parameter samples for the 3 planet model fit to the simulated data set. Again, the starting period values for the HMCMC were 5, 20, and 100 days, a long way from the expected values. Comparison with Figure 11 indicates that the results for the actual data and 3 planet simulation are qualitatively very similar.
To test whether the fourth period in the Lick data (period = days) is a window function artifact of the sampling times, we analyzed two 3 planet simulations with a 4 planet model. In both cases the HMCMC found the 3 periods expected from the simulation. No well defined fourth period was found and the peak amplitude was m s compared with a m s for the real data set. This suggests that the fourth period is not simply a window function artifact. However, later HMCMC fits of a combination of Lick and Mcdonald Observatory data did not confirm this period.
4.2 Model selection
On of the great strengths of Bayesian analysis is the built-in Occam’s razor. More complicated models contain larger numbers of parameters and thus incur a larger Occam penalty, which is automatically incorporated in a Bayesian model selection analysis in a quantitative fashion (see for example, Gregory 2005a, p. 45). The analysis yields the relative probability of each of the models explored.
To compare the posterior probabilities of the i planet model to the one planet models we need to evaluate the odds ratio, , the ratio of the posterior probability of model to model . Application of Bayes’s theorem leads to,
where the first factor is the prior odds ratio, and the second factor is called the Bayes factor, . The Bayes factor is the ratio of the marginal (global) likelihoods of the models. The marginal likelihood for model is given by
Thus Bayesian model selection relies on the ratio of marginal likelihoods, not maximum likelihoods. The marginal likelihood is the weighted average of the conditional likelihood, weighted by the prior probability distribution of the model parameters and . This procedure is referred to as marginalization.
The marginal likelihood can be expressed as the product of the maximum likelihood and the Occam penalty (see Gregory & Loredo 1992 and Gregory 2005a, page 48). The Bayes factor will favor the more complicated model only if the maximum likelihood ratio is large enough to overcome this penalty. In the simple case of a single parameter with a uniform prior of width , and a centrally peaked likelihood function with characteristic width , the Occam factor is . If the data is useful then generally . For a model with parameters, each parameter will contribute a term to the overall Occam penalty. The Occam penalty depends not only on the number of parameters but also on the prior range of each parameter (prior to the current data set, ), as symbolized in this simplified discussion by . If two models have some parameters in common then the prior ranges for these parameters will cancel in the calculation of the Bayes factor. To make good use of Bayesian model selection, we need to fully specify priors that are independent of the current data . The sensitivity of the marginal likelihood to the prior range depends on the shape of the prior and is much greater for a uniform prior than a Jeffreys prior (e.g., see Gregory 2005a, page 61). In most instances we are not particularly interested in the Occam factor itself, but only in the relative probabilities of the competing models as expressed by the Bayes factors. Because the Occam factor arises automatically in the marginalization procedure, its effect will be present in any model selection calculation. Note: no Occam factors arise in parameter estimation problems. Parameter estimation can be viewed as model selection where the competing models have the same complexity so the Occam penalties are identical and cancel out.
The MCMC algorithm produces samples which are in proportion to the posterior probability distribution which is fine for parameter estimation but one needs the proportionality constant for estimating the model marginal likelihood. Clyde et al. (2006) recently reviewed the state of techniques for model selection from a statistics perspective and Ford & Gregory (2006) have evaluated the performance of a variety of marginal likelihood estimators in the extrasolar planet context.
Gregory (2007c), in the analysis of velocity data for HD 11964, compared the results from three marginal likelihood estimators: (a) parallel tempering, (b) ratio estimator, and (c) restricted Monte Carlo. Monte Carlo (MC) integration can be very inefficient in exploring the whole prior parameter range because it randomly samples the whole volume. The fraction of the prior volume of parameter space containing significant probability rapidly declines as the number of dimensions increases. For example, if the fractional volume with significant probability is 0.1 in one dimension then in 17 dimensions the fraction might be of order . In restricted MC integration (RMC) this should be much less of a problem because the volume of parameter space sampled is restricted to a region delineated by the outer borders of the marginal distributions of the parameters. For HD 11964, the three methods were compared for 1, 2 and 3 planet models. For the one planet model all three methods agreed within 15%. For the two planet model the three methods agreed within 28% with the RMC giving the lowest estimate. For the three planet model the estimates were very different. The RMC estimate was 16 time smaller than the PT estimate and the ratio estimator was 18 times larger than the PT estimate. The PT method is very compute intensive. For a three planet model 40 tempering levels and iterations were required. The problem becomes more difficult for larger numbers of planets. Thus for 3 or more planet models accurately computing the marginal likelihood is a very big challenge.
|Model||Periods||Marginal||Bayes factor||False Alarm||RMS residual|
|(d)||Likelihood||nominal||Probability||(m s||(m s)|
In this work we consider only RMC marginal likelihood estimates. This method is expected to underestimate the marginal likelihood in higher dimensions and this underestimate is expected to become worse the larger the number of model parameters, i.e. increasing number of planets. When we conclude, as we do, that the RMC computed odds in favor of the three planet model compared to the two planet model is we mean that the true odds is .
In earlier work, we defined the outer boundary of parameter space for RMC integration based on the 99% credible region. One problem is that if there is a significant contribution to the integral within say the 30% credible region, the volume in this region can be such a small fraction of the total that no random sample lands in that region. In this work we use a nested version of RMC integration. Multiple boundaries were constructed based on credible regions ranging from 30% to , as needed. We are then able to compute the contribution to the total RMC integral from each nested interval and sum these contributions. For example, for the interval between the 30% and 60% credible regions, we generate random parameter samples within the 60% region and reject any sample that falls within the 30% region. Using the remaining samples we can compute the contribution to the RMC integral from that interval.
The left panel of Figure 15 shows the contributions from the individual intervals for 5 repeats of the nested RMC evaluation for the 2 planet model. The right panel shows the summation of the individual contributions versus the volume of the credible region. The credible region listed as 9995% is defined as follows. Let and correspond to the upper and lower boundaries of the 99% credible region, respectively, for any of the parameters. Similarly, and are the upper and lower boundaries of the 95% credible region for the parameter. Then and . Similarly, .
Table 5 gives the Marginal likelihood estimates, Bayes factors and false alarm probabilities for 0, 1, 2, 3, and 4 planet models which are designated . The last two columns list the MAP value of extra noise parameter, , and the RMS residual. For each model the RMC calculation was repeated 5 times and the quoted errors give the spread in the results, not the standard deviation. The Bayes factors that appear in the third column are all calculated relative to model 2. Examination of a plot like the one shown in Figure 15, but for the 4 planet model, indicates that RMC is probably seriously underestimating the marginal likelihood. A better method of computing this quantity is sorely needed.
We can readily convert the Bayes factors to a Bayesian False Alarm Probability (FAP). For example, in the context of claiming the detection of a 3 planet model the FAP is the probability that there are actually 2 or less planets.
If we assume a priori (absence of the data) that the prob of 1 planet model = prob. of 2 planet model — etc., then probability of each model is related to the Bayes factors by
For the 3 planet model we obtain a very low FAP . The Bayesian false alarm probabilities for 1, 2, 3, and 4 planet models are given in the fourth column of Table 5.
In the context of claiming the detection of a 4 planet model the Bayesian false alarm probability is . This is very high and does not justify a claim for the detection of a fourth planet. The fourth period is also suspiciously close to one year to be of concern.
5 Results (Case B)
For Case B, we incorporated 4 additional parameters to allow for the unknown residual velocity offsets of dewars 6, 8, 39, and 18 relative to dewar 24. These are labeled , where the subscript denotes the detector dewar. In a Bayesian analysis these are treated as additional nuisance parameters which we can marginalize. Additionally, since they are of interest to the observers we also provide a summary of each residual offset parameter. In the radial velocity data processing pipeline every effort was made to insure the dewar velocity offsets were allowed for so the residuals are expected to be small. For the Case B analysis we have assumed a Gaussian prior for each centered on zero with a km s.
5.1 Parameter estimation
In this section we redo the analysis of the 47 UMa data with the multi-planet HMCMC Kepler periodogram starting with a one planet model and extending to a four planet model. The data is same as shown in Figure 6 panel (a) with the exception of the first point corresponding to dewar 1.
Figure 16 shows a plot of eccentricity versus period for a sample of the HMCMC parameter samples for the 2 planet model for Case B. The two planet model again favors a second period in the range 8100-15000d (68% credible region) with a long higher eccentricity tail extending to much longer periods. In Case B the time base is 235 days shorter than Case A so the lower eccentricity/lower period end is less well defined but otherwise there is general agreement. This model was run twice using different starting periods but the two planet HMCMC run did not favor a period around 2240 days even when the two starting periods used were 1078 and 2240 days, respectively. This is not that surprising given the relative sizes of the values for planets 2 and 3 in Table 4.
Figure 17 shows a plot of eccentricity versus period for a sample of the HMCMC parameter samples for the 3 planet model for Case B. Again, we see the emergence of a period of days and the third longer period appears better defined (compared to the 2 planet model) and extends to much lower eccentricities. Qualitatively, there is general agreement with the Case A results shown in Figure 11.
The dewar residual offset velocities were = , = , = , and = m s.
The 4 planet HMCMC analysis again showed a clear fourth period of with an eccentricity of . We did not compute the marginal likelihood for the 4 planet model but based on the Case A results the Bayesian false alarm probability for a 4 planet model is expected to be very high.
5.2 Model selection (Case B)
We repeated the Bayesian false alarm probability for the 3 planet model as described in Section 4.2 for the Case B analysis which incorporates the dewar residual offset parameters.
The computed Bayes factors are . This gives a false alarm probability of . Even though this is much greater than the value found in Case A it still argues strongly for favoring a 3 planet model.
On the basis of the model selection results we can conclude there is strong evidence for three planets although the longest period orbital parameters are still not well defined. The results for the Lick only analysis do not rule out low eccentricity orbits for all three planets. The major difference produced by including the dewar residual offset parameters was to reduce the Bayesian false alarm probability for a 3 planet model from to to . A significant part of this reduction might be a consequence of the reduced span of the data set by 235 days for the Case B analysis.
Our results appear to be entirely consistent with the latest analysis of Wittenmyer et al. (2009). Their best-fit 2-planet model now calls for days. They note that to fit a second planet, they fixed the parameters and at the values proposed by Fischer et al. (2002): e2 = 0.005 and . In our Case A two planet fit (Figure 8), in which all parameters were free, the eccentricity versus period plot exhibits a low eccentricity tail which occurs at a period between and days, directly comparable to their 9660 day period. The day period in the Lick data only shows up in our 3 planet and higher models. This is probably because the longer period signal with a m s dominates over the day period signal with a m s (see Table 6). Wittenmyer et al. (2009) did not report any results on fitting a 3 planet model.
To test this further we combined the Lick data with the Wittenmyer et al. (2009) data from the 9.2 m Hobby-Eberly Telescope (HET) and 2.7 m Harlam J. Smith (HJS) telescopes of the McDonald Observatory. We subtracted initial offset velocities of 23.3 and 25.4 m s based on a comparison of plots of the HET and HJS data sets to the Lick data. We then included a free parameter for each telescope to allow for an unknown residual velocity offset compared with the Lick dewar 24 in the same way as we had done for the other Lick dewars in Case B.
Figure 18 shows a plot of eccentricity versus period for our 3 planet HMCMC fit to the combined data set. The three starting periods used for the HMCMC run were 10, 1078, & 6000 days. The residual velocity offset parameters for the HET and HJS telescopes were and m s, respectively. It is clear from the figure that the same three periods appear as before but with the extra data the results now favor low eccentricity orbits for all three periods. This is a particularly pleasing result as low eccentricity orbits are more likely to exhibit long term stability than high eccentricity orbits.
The preference for low eccentrcity orbits is more apparent in the marginal distributions shown in Figure 19.
Our final orbital parameters are summarized in Table 6 together with the residual offset velocities and the extra noise term . Again, the parameter value listed is the median of the marginal probability distribution for the parameter in question and the error bars identify the boundaries of the 68.3% credible region. The value immediately below in parenthesis is the MAP value, the value at the maximum of the joint posterior probability distribution.
|Parameter||planet 1||planet 2||planet 3|
|(JD - 2,440,000)|
|(m s)||(m s)|
|(m s)||(m s)|
|(m s)||(m s)|
The final period phase plots are shown in Figure 20. The top left panel shows the data and model fit versus 1078 day orbital phase after removing the effects of the two other orbital periods. The red and green curves are the mean HMCMC model fit standard deviation and mean model fit standard deviation, respectively. The dashed curve is the mean HMCMC fit. The other two panels correspond to phase plot for the other two periods. In each panel the quoted period is the mode of the marginal distribution. The and phase coverage for the combined HET and HJS data (not shown) is not sufficient to warrant a fully independent search for these two periods.
HMCMC fits of a 4 planet model to the combined Lick, HET and HJS data set failed to detect a well defined fourth period casting doubt on the validity of the day period detected in the Lick only data. Even though this period was well defined in the Lick only data, the Bayesian false alarm probability of is much too high to warrant any claim of significance. The period is also suspiciously close to one year and might be an artefact of the data reduction.
6.1 Eccentricity bias
In Section 3.1 we showed that HMCMC periodogram peaks exhibit a well defined statistical bias towards high eccentricity in the absence of a real periodic signal. To mimic a circular velocity orbit the noise points need to be correlated over a larger fraction of the orbit than they do to mimic a highly eccentric orbit. For this reason it is more likely that noise will give rise to spurious highly eccentric orbits than low eccentricity orbits. Is there a similar or stronger bias when there is a real periodic signal? Based on the above explanation of the bias we would expect noise to conspire to increase the eccentricity of detected periodogram peaks associated with the real periodic signals. Our expectation is that the importance of this bias will be dependent on the strength of the signal and possibly on the number of observed periods
Figure 21 shows a plot of eccentricity versus period for the simulation. The starting period values for the HMCMC were 5, 20, and 100 days, a long way from the expected values. Again, all three simulated periods were detected and the preferred eccentricities are all close to zero but with significant tails extending to higher eccentricity. Based on this test there does not appear to be any clear additional eccentricity bias operating. The fact that the real Lick data alone favor (in Case A and Case B) somewhat larger eccentricities for and suggests there may be something else present in the real data, possible some low level systematic effect or other real signals. In this regard, the eccentricity of the longer period was considerably higher in the 2 planet models than when allowance was made for an additional period in the 3 planet models.
In this paper, we have demonstrated that a Bayesian adaptive hybrid MCMC (HMCMC) analysis of a challenging data set has helped clarify the number of planets present in 47UMa. HMCMC integrates the advantages of parallel tempering, simulated annealing and the genetic algorithm. Each of these techniques was designed to facilitate the detection of a global minimum in . Combining all three in an adaptive hybrid MCMC greatly increase the probability of realizing this goal. The adaptive Bayesian hybrid MCMC is very general and can be applied to many different nonlinear modeling problems. It has been implemented in gridMathematica on an 8 core PC. The increase in a speed for the parallel implementation is a factor 6.6. When applied to the Kepler problem it corresponds to a multi-planet Kepler periodogram which is ideally suited for detecting signals that are consistent with Kepler’s laws. However, it is more than a periodogram because it also provides full marginal posterior distributions for all the orbital parameters that can be extracted from radial velocity data. The execution time for a 1 planet blind fit (7 parameters) is iterations per hr. The program scales linearly with the number of parameters being fit.
The 47UMa data has been analyzed using 1, 2, 3, 4, and 5 planet models. On the basis of the model selection results we can conclude there is strong evidence for three planets based on a Bayesian false alarm probability of , however, the longest period orbital parameters are still not well defined. The measured periods (based on the combined data set) are , , and d, and the corresponding eccentricities are , , and . The results favor low eccentricity orbits for all three. Note: the longer time base of the full Lick data set favors a value for at the lower end of the 68% credible region of days. Assuming the three signals (each one consistent with a Keplerian orbit) are caused by planets, the corresponding limits on planetary mass () and semi-major axis are
(, ), and (, ), respectively. Based on our three planet model results, the remaining unaccounted for noise (stellar jitter) is m s.
A four planet model fit to the Lick data yielded a well defined fourth period of days and eccentricity of , but the combined data set did not yield a well defined fourth period. Even though this period was well defined in the Lick only data, the Bayesian false alarm probability of is much too high to warrant any claim of significance. The period is also suspiciously close to one year and might be an artefact of the data reduction.
The velocities of model fit residuals were randomized in multiple trials and processed using a one planet version of the HMCMC Kepler periodogram. In this situation periodogram probability peaks are purely the result of the effective noise. The orbits corresponding to these noise induced periodogram peaks exhibit a well defined statistical bias towards high eccentricity. We have characterized this eccentricity bias and designed a correction filter that can be used as an alternate prior for eccentricity to enhance the detection of planetary orbits of low or moderate eccentricity. On the basis of our understanding of the mechanism underlying the eccentricity bias, we expect the eccentricity prior filter to be generally applicable to searches for low amplitude orbital signals in other precision radial velocity data sets.
One of us (Gregory) would like to thank Wolfram Research for providing a complementary license to run gridMathematica that was used in this research. D.A.F. acknowledges research support from NASA grant NNX08AF42G. The authors thank Matt Giguere for calculating dewar offsets and the many student observers who have contributed to the 47 UMa data set over the years. They particularly thank Geoff Marcy and R. Paul Butler who began the Lick planet survey in 1989.
- pagerange: A Bayesian Periodogram Finds Evidence for Three Planets in 47 Ursae Majoris–References
- pubyear: 2010
- Following on from the pioneering work on Bayesian periodograms by Jaynes (1987) and Bretthorst (1988)
- Since the prior lower limits for and include zero, we used a modified Jeffreys prior of the form (4) For , behaves like a uniform prior and for it behaves like a Jeffreys prior. The term in the denominator ensures that the prior is normalized in the interval 0 to .
- In the absence of detailed knowledge of the sampling distribution for the extra noise, we pick a Gaussian because for any given finite noise variance it is the distribution with the largest uncertainty as measured by the entropy, i.e., the maximum entropy distribution (Jaynes 1957, Gregory 2005a section 8.7.4.)
- Note: the HMCMC runs shown here used the eccentricity prior based on the eccentricity noise bias correction filter discussed in Section 3.1. The results obtained using a uniform eccentricity prior are qualitatively the same.
- In practice, the probability density for any parameter is represented by a finite list of values representing the probability in discrete intervals . A simple way to compute the 68.3% credible region, in the case of a marginal with a single peak, is to sort the values in descending order and then sum the values until they approximate 68%, keeping track of the upper and lower boundaries of this region as the summation proceeds.
- This will be the subject of a future investigation.
- Bretthorst, G. L., 1988, Bayesian Spectrum Analysis and Parameter Estimation, New York: Springer-Verlag
- Butler, R. P. & Marcy, G. W. 1996, ApJ, 464, L153
- Cumming, A., 2004, MNRAS, 354, 1165
- Cumming, A., Dragomir, D., 2010, MNRAS, 401, 1029
- Clyde, M. A., Berger, J. O., Bullard, F., Ford, E. B., Jeffreys, W. H., Luo, R., Paulo, R., Loredo, T., 2006, in ‘Statistical Challenges in Modern Astronomy IV,’ G. J. Babu and E. D. Feigelson (eds.), ASP Conf. Ser., 371, 224
- Fischer,D. A., Marcy, G. W., Butler, R. P., Laughlin, G. L., and Vogt, S. S., 2002, ApJ, 564, 1028
- Ford, E. B., 2005, AJ, 129, 1706
- Ford, E. B., 2006, ApJ, 620, 481
- Ford, E. B., & Gregory, P. C., 2006, in ‘Statistical Challenges in Modern Astronomy IV,’ G. J. Babu and E. D. Feigelson (eds.), ASP Conf. Ser., 371, 189
- Gelman, A., & Rubin, D. B., 1992, Statistical Science 7, 457
- Gregory, P. C., and Loredo, T. J., 1992, ApJ, 398, 146
- Gregory, P. C., 2005a, ‘Bayesian Logical Data Analysis for the Physical Sciences: A Comparative approach with Mathematica Support’, Cambridge University Press
- Gregory, P. C., 2005b, ApJ, 631, 1198
- Gregory, P. C.,2005c, in ‘Bayesian Inference and Maximum Entropy Methods in Science and Engineering’, San Jose, eds. A. E. Abbas, R. D. Morris, J. P.Castle, AIP Conference Proceedings, 803, 139
- Gregory, P. C., 2007a, MNRAS, 374, 1321
- Gregory, P. C., 2007b, in ‘Bayesian Inference and Maximum Entropy Methods in Science and Engineering: 27th International Workshop’, Saratoga Springs, eds. K. H. Knuth, A. Caticha, J. L. Center, A,Ġiffin, C. C. Rodríguez, AIP Conference Proceedings, 954, 307
- Gregory, P. C., 2007c, MNRAS, 381, 1607
- Gregory, P. C., 2009, JSM Proceedings, Denver, American Statistical Association, arXiv:0902.2014v1 [astro-ph.EP]
- Jaynes, E. T., 1957, Stanford University Microwave Laboratory Report 421, Reprinted in ‘Maximum Entropy and Bayesian Methods in Science and Engineering’, G. J. Erickson and C. R. Smith, eds, (1988) Dordrecht: Kluwer Academic Press, p.1
- Jaynes, E.T. (1987), ‘Bayesian Spectrum & Chirp Analysis,’ in Maximum Entropy and Bayesian Spectral Analysis and Estimation Problems, ed. C.R. Smith and G.L. Erickson, D. Reidel, Dordrecht, p. 1
- Loredo, T., 2004, in ‘Bayesian Inference And Maximum Entropy Methods in Science and Engineering: 23rd International Workshop’, G.J. Erickson & Y. Zhai, eds, AIP Conf. Proc. 707, 330 (astro-ph/0409386)
- Loredo, T. L. and Chernoff, D., 2003, in ‘Statistical Challenges in Modern Astronomy III’, E. D. Feigelson and G. J. Babu (eds) , Springer, New York, p. 57
- Naef, D.,Mayor,M., Beuzit, J. L., Perrier, C., Queloz, D., Sivan, J. P., & Udry, S. 2004, A&A, 414, 351
- Roberts, G. O., Gelman, A. and Gilks, W. R., 1997, Annals of Applied Probability, 7, 110
- Saar, S. H., & Donahue, R. A., 1997, ApJ, 485, 319
- Saar, S. H., Butler, R. P., & Marcy, G. W. 1998, ApJ, 498, L153
- Takeda, G., Ford, E. B., Sills, A., Rasio, F. A., Fischer, D. A., & Valenti, J. A. 2007, ApJS, 168, 297
- Wittenmyer, R., Endl, M. & Cochran, W., 2007, ApJ, 654, 625
- Wittenmyer, R. A., Endl, M., Cochran, W. D., Levison, H. F., Henry, G. W., 2009, ApJS, 182, 97
- Wright, J. T., 2005, PASP, 117, 657
- Udry, S., Bonfils, X., Delfosse, X., Forveille, T., Mayor, M., Perrier, C., Bouchy, F., Lovis, C., Pepe, F., Queloz, D., and Bertaux, J.-L., 2007, A&A, 469, L43