The Joker: A custom Monte Carlo sampler
for binarystar and exoplanet radial velocity data
Abstract
Given sparse or lowquality radialvelocity measurements of a star, there are often many qualitatively different stellar or exoplanet companion orbit models that are consistent with the data. The consequent multimodality of the likelihood function leads to extremely challenging search, optimization, and MCMC posterior sampling over the orbital parameters. Here we create a custom Monte Carlo sampler for sparse or noisy radialvelocity measurements of twobody systems that can produce posterior samples for orbital parameters even when the likelihood function is poorly behaved. The six standard orbital parameters for a binary system can be split into four nonlinear parameters (period, eccentricity, argument of pericenter, phase) and two linear parameters (velocity amplitude, barycenter velocity). We capitalize on this by building a sampling method in which we densely sample the prior pdf in the nonlinear parameters and perform rejection sampling using a likelihood function marginalized over the linear parameters. With sparse or uninformative data, the sampling obtained by this rejection sampling is generally multimodal and dense. With informative data, the sampling becomes effectively unimodal but too sparse: in these cases we follow the rejection sampling with standard MCMC. The method produces correct samplings in orbital parameters for data that include as few as three epochs. The Joker can therefore be used to produce proper samplings of multimodal pdfs, which are still informative and can be used in hierarchical (population) modeling. We give some examples that show how the posterior pdf depends sensitively on the number and time coverage of the observations and their uncertainties.
cbblueHTML3182bd \definecolormahoganyRGB165,15,21
Section 1 Introduction
Precise radialvelocity measurements of stars have transformed astrophysics in the last decades. They have permitted the discovery and confirmation of companions (planetary, stellar, and otherwise) orbiting thousands of stars. Radial velocity surveys hold the promise of delivering the full population statistics for binary and trinary star systems (for example, Raghavan et al. 2010; Tokovinin 2014; Troup et al. 2016).
With large stellar spectroscopic surveys operating or under construction, we expect to have good quality spectra for millions of stars in the next few years. Most of these surveys have at least some targets—and many have many targets—that are observed multiple times (e.g., Majewski et al. 2015). Whether as a primary or secondary goal of their observing strategies, these surveys can generate discoveries of planetary, substellar, and stellar companions. These discoveries, in turn, will feed population inferences, followup programs, and projects to refine precise stellar models.
However, when radialvelocity observations are not designed with unambiguous detection and discovery in mind, there are usually multiple possible starcompanion (orbit) models that are consistent with any modest number of radialvelocity measurements that show stellar acceleration. That is, a small number of radial velocity measurements—even when the uncertainties are small—will lead to posterior beliefs about companion orbits and masses that put substantial plausibility onto multiple qualitatively different solutions. This is then reflected in a likelihood function that is highly multimodal in the relevant parameter spaces (e.g., Keplerian orbital parameters). While multimodal orbit likelihoods may be frustrating when studying individual systems of particular interest, extensive sets of such likelihoods can be powerful constraints for hierarchical modeling, inferring, e.g., the characteristics of the binary star, or exoplanet population.
While the problem has of course been recognized for a long time, there are currently no methods that explore these highly multimodal distributions with good guarantees of converged samplings (but see Gregory 2005; Brewer & Donovan 2015). Converged, independent samplings are essential to the jobs of delivering correct posterior samplings and reliable probabilistic statements about detection and characterization. With general Markov Chain Monte Carlo (MCMC) methods, the returned samplings are inherently correlated and must therefore be run longer than the autocorrelation time. However, computing the autocorrelation time (a twopoint statistic of a chain) is not simple to compute and can be misleading, especially when the target distribution is highly multimodal with widely separated modes.
Here we present a path to address this problem. Our approach is to build a custom posterior sampling method that capitalizes on the structure of the binarystar (or starexoplanet) kinematics to generate robust samplings of multiple solutions at manageable computational cost. In what follows, we use the term “binary” for any system with an observed source (the primary, e.g., a star), whose radial velocity variations are explained through gravitational twobody interactions with another object (the companion, e.g., star, exoplanet, stellar remnant); i.e. we restrict our analysis to singleline spectroscopic binary systems.
The structure of this Article is as follows: We lay out our assumptions and implementation approach, and demonstrate that we have a method that is reliable and essentially always correct under those assumptions. We perform experiments with the method to understand its properties and limitations. We then discuss the astrophysical applicability and potential of the method. We finish by describing the changes we would have to make if we weakened our assumptions, or if we don’t weaken our assumptions but they indeed prove to be far from correct.
Section 2 Assumptions and method
In order to set up a wellposed problem and build a path to a definite solution, we make a set of sensible, but nontrivial assumptions about the stellar systems we will use and observations thereof. Ultimately, we assume that the radial velocity curve of a singleline spectroscopic binary system can be specified by six parameters (Kepler 1609). Here we adopt a parameterization for this problem that is similar to Murray & Correia 2010: , , , , , and , which are the period, eccentricity, pericenter phase and argument, velocity semiamplitude, and the barycenter velocity. As is wellknown, this does not fully specify the binary system itself, because of the inclination () and mass function degeneracies. To proceed, we assume the following:

We have measurements of the radial velocity of a star, and that the time dependence of the expectation of that radial velocity is well described by the gravitational orbit of a pair of point masses (the twobody problem). We assume that the uncertainties in the observation epochs are negligible, and specified in an inertial frame (for example, SolarSystem barycentric Julian date). We only consider the case of a singleline spectroscopic binary system, where we do not have measurements of the (fainter) companion’s projected orbit.

Each star has exactly one companion, and that the radialvelocity measurements are not contaminated by nor affected by any other bodies. Our analysis allows the effective mass of the exactlyone companion to go to zero with finite probability; this encompasses the case of no companion.

The noise contributions to individual radialvelocity measurements are well described as samples from zeromean normal (Gaussian) distributions with correctly known variances convolved with a zeromean normal distribution with an additional “jitter” variance, . We assume that there are no outliers beyond this flexible noise model.

We can put particular, fairly sensible but not highly restrictive, prior pdfs on all the orbital parameters, as we describe below.
Each of these assumptions could be challenged: in particular we expect some stars to have additional companions, and we expect there to be outliers and unaccounted sources of noise. We will return to these assumptions, and the consequences of relaxing them, in Section 4.
The radial velocity at time is then given by (see also Equation 63 in Murray & Correia 2010)
(1) 
where the represents the free parameters, is the true anomaly given by
(2) 
and the eccentric anomaly, , must be solved for with the mean anomaly, ,
(3)  
(4) 
Of these parameters, four (, , , ) are nonlinearly related to the radialvelocity expectation, and two (, ) are linearly related. We additionally allow that the radial velocity curve of any star has an overall jitter, , to vary to partially account for imperfect knowledge of the radial velocity uncertainties and any intrinsic radialvelocity scatter; the jitter must also be treated as a nonlinear parameter.
With such a parameterization, the problem is then to construct the posterior pdf for these seven parameters, accounting for the fact that this pdf may have very complex, multimodal structure. Fundamentally, the method we describe and demonstrate here is to perform rejection sampling in the nonlinear parameters, but with analytic marginalization over the two linear parameters. The method capitalizes on the unique problem structure:

There are both linear and nonlinear parameters, and we can treat them differently; in particular, it is possible to analytically marginalize out the linear parameters (provided that the noise model is wellbehaved and the prior pdf is conjugate).

There is a finite, timesamplingimposed minimum size or resolution—in the period—of any features in the likelihood function. That is, there cannot be arbitrarily narrow modes in the multimodal posterior pdf.
The method described above is a specific case of rejectionsampling (von Neumann 1951) in which we densely sample (generate many samples with typical spacing smaller than the timesampling imposed resolution) from the prior probability density function (prior pdf) and use the likelihood evaluated at these samples as the rejection scalar. In detail, the rejection step works as follows:

For each sample in the prior pdf sampling of the four nonlinear parameters, there is a (linear, not logarithmic) marginal likelihood value (a probability for the data given the nonlinear parameters).

There is a maximum value that is the largest value of found across all of the samples in the prior sampling.

For each sample , choose a random number between 0 and

Reject the sample if .

The number of samples that survive the rejection is (hereafter) .
Note that this algorithm is guaranteed to produce at least one surviving sample; of course if only one sample survives (or any very small number), the sampling is not guaranteed to be fair. That said, if the original sampling of the prior pdf is dense enough that many survive the rejection step, the surviving samples do, by construction, constitute a fair, uncorrelated sampling from the posterior pdf.
Our prior pdf in the nonlinear parameters is very straightforward:
(5)  
(6)  
(7)  
(8)  
(9) 
where is the uniform distribution over the domain , is the normal distribution with mean and variance , and the prior pdf over eccentricity is the Beta distribution with , (Kipping, 2013). To simplify the marginalization integrals below, we assume that the priors over the linear parameters (, ) are very broad and Gaussian—or at least very flat over the range of relevance—and that they do not depend on the nonlinear parameters in any way (which is a substantial restriction; see Section 4). Of course, we may actually have stronger prior beliefs about the systemic velocity of the binary, (i.e. we may want to impose that it belong to the Galactic disk, or use a mixture model for the different kinematic components of the Galaxy). Using a more informative prior would require only a minimal change to the method.
In Section 3 or in the subsections specific to the different experiments we specify the values for hyperparameters , , , and . In practice, the choice of and can be tuned appropriately depending on knowledge about intrinsic variability of the source or suspicions about the reported uncertainties. We sample the prior pdf directly and explicitly with standard numpy.random calls (Van der Walt et al. 2011). In practice, we are usually required to take around samples (indeed, a quarter billion samples) from the priorpdf to produce sufficient final samplings; in each experiment below, we state the total number of prior samples generated and the number of surviving samples.
The unmarginalized likelihood function is
(10) 
where indexes the individual data points , is the radial velocity prediction at time given the orbital parameters , the datapoint times are the , and is the Gaussian noise variance for data point . Note that the form of this likelihood function is fully specified by the assumptions, given above.
We rejectionsample, however, using a marginalized likelihood, where we analytically marginalize out the linear parameters (, ). We construct an design matrix consisting of a column of unit predictions (given the nonlinear parameters) and a column of ones. We perform standard linear leastsquare fitting with this design matrix to obtain the bestfit values for the two linear parameters, and the standard linearfitting covariance matrix for their uncertainties. With these, we can construct—for each prior sample—the marginalized likelihood
(11) 
where the prediction is taken at the bestfit values of the linear parameters given sample of the nonlinear parameters, and the logdeterminant term () accounts for the volume in the marginalization integral. These are used in the rejection sampling algorithm described above.
There are three possible outcomes of this rejection sampling, based on two thresholds: We set a minimum number of samples . We also set a period resolution , with set to the median period across the surviving samples and set to the epoch span of the data. This is an expansion of the period resolution expected from an informationtheory (sampling theorems) perspective: for a periodic signal with frequency observed over a window with size , the smallest resolvable frequency differences will be , corresponding to period differences of . The three possible outcomes are:

samples survive the rejection. In this case, we are done.

samples survive the rejection, and these samples have a rootvariance (rms) in the period parameter that is smaller than (i.e. they give no indication of period ambiguity). In this case we assume that the posterior pdf is effectively unimodal, and we use the surviving samples (or sample) to initialize a MCMC sampling using the emcee package (ForemanMackey et al. 2013).

samples survive the rejection, and these samples span a period range larger than . In this case, we iterate the rejectionsampling procedure: We generate new priorpdf samplings and rejection sample until the number of surviving samples is larger than . This is expensive.
When we trigger the initialization and operation of emcee, we do the following:

Randomly generate sets of parameters (linear and nonlinear parameters) in a small, Gaussian ball around the highest sample from the rejection sampling.

Run emcee on this ensemble (with walkers) for steps (this number is arbitrary and can be tuned if the walkers converge much faster or slower).

Take the final state of the element ensemble as an independent sampling of the posterior pdf.
This procedure ensures that no matter what path we take, we end up with at least samples from the posterior for any input data.
Within the initial assumptions, this procedure almost inevitably results in a correct sampling of the parameter pdf. This is ensured by the density of the prior sampling in the nonlinear parameters, and also borne out in the numerical experiments in the following Section.
Section 3 Experiments and results
In what follows, we use (1) simulated data with known properties and (2) actual spectroscopic data from Data Release 13 (DR13; SDSS Collaboration et al. 2016) of the Apache Point Observatory Galactic Evolution Experiment (APOGEE; Majewski et al. 2015) in a series of experiments that demonstrate the reliability and utility of The Joker. APOGEE is one of the four subsurveys of the Sloan Digital Sky SurveyIII (SDSSIII; Eisenstein et al. 2011) and utilized a new infrared spectrograph to obtain moderateresolution, Hband spectra for over 160,000 stars throughout the Galaxy. From these spectra, highprecision radial velocities, chemical abundances, and stellar parameters have been derived and released as a part of DR13 (Holtzman et al. 2015; Nidever et al. 2015).
As a part of the observing strategy of APOGEE, most stars are observed multiple times and binned by day into “visit” spectra. Though a typical star is only observed a few times, (1) at least one pair of visits are separated by one month or longer, and (2) thousands of stars have been observed more than 10 times (for a more detailed look at the cadence and number of visits for APOGEE stars, see Figure 1 in Troup et al. 2016). Radial velocities (and stellar parameters) are derived for each of the visit spectra, affording a sparse and sporadic timedomain sampling of the radial velocity variations of most stars in the survey. This timedomain information was recently used to identify a sample of candidate stellar and substellar companions to APOGEE stars (Troup et al. 2016).
This search was conducted after data quality and data quantity cuts that were designed to keep the number of data points larger than the number of parameters in the model. In their case, the model parameters are six Keplerian orbital parameters plus a longterm (linear) velocity trend (seven in total). Under this criterion, stars with fewer than eight visits were eliminated from further consideration, leaving 15,000 stars. For each of the remaining stars, the radial velocity curves were searched for significant periods, which were then used to initialize fits of a Keplerian orbit using a custom leastsquares fitter (MPRVFIT; De Lee et al. 2013). In cases where multiple significant periods were found, the parameters obtained from the fit with the best modified value were retained (modulo a number of other considerations described in Section 3.3.4 of Troup et al. 2016).
The complexity of this pipeline and logic needed to identify a single optimal set of orbital parameters from a set of solutions highlights the fact that the likelihood function for a Keplerian orbit model is generically multimodal. When there are few data points or poor phase coverage this is especially true. While useful for searching for new candidate binaries, such a pipeline does not easily fit within hierarchical probabilistic modeling of the population of companions.
3.1 Experiment 1: Validation with simulated data
As a first demonstration, we generate fake radial velocity observations (with uncertainties) that are consistent with our assumptions (Section 2), then sample from the posterior pdf over orbital parameters with The Joker. The eccentricity, period, and velocity semiamplitude are chosen to be broadly consistent with the typical substellar companion found in recent analysis of the APOGEE data (Troup et al. 2016), the angle parameters (, ) are sampled from a uniform distribution, and the barycenter velocity is drawn from a zeromean Gaussian with variance . The values of the parameters used for the simulated data (i.e. the truth) are: , , , , , . We randomly sample five observation epochs uniformly over the interval imagining a 3year survey with no observing strategy and arbitrarily set the survey start date (in barycentric MJD) to be 55555. The radial velocity measurement uncertainties are drawn from a uniform distribution over the interval , motivated by current APOGEE radial velocity uncertainties.
We perform two samplings as a validation of The Joker: (a) we fix the jitter parameter , i.e. we assume the uncertainties are known perfectly and there is no intrinsic scatter, and (b) we sample over the jitter parameter as well, setting (note that these are dimensionless because they set the scale of a Gaussian in ). We start by generating prior samples over the nonlinear parameters with a period domain of (see Section 2) and use these same prior samples for both runs. For (a), 678 samples pass our rejection sampling step, and for (b), 580 samples survive.
Figure 1 shows the simulated data (black points) along with the true orbit (dashed, green line) and orbits computed from samples from the posterior pdf (gray lines) for the sampling with fixed jitter (top panel) and the sampling including jitter as a nonlinear parameter (bottom panel). Because we use the same prior samples in each, these look almost identical. Figures 2–3 show all projections of the posterior samples (gray points) and the true, input parameters (green lines and markers) for the case with fixed jitter and the case where jitter is treated as a free parameter. The surviving samples in each case look very similar, as expected. The typical uncertainties for these simulated data are ; for values of (where most are concentrated), this extra jitter is negligible compared to the uncertainties. Note that samples above are mostly rejected, indicating that, as constructed, the uncertainties are purely Gaussian and known.
In both cases, the modes in periodspace are narrow with a variety of separations, as can be seen in the radialvelocity curves plotted in the top panel. For small numbers of precise observations with poor phasecoverage, the posterior pdf over orbital parameters is extremely complex and structured, but we are still able to generate samples using The Joker that capture the complexity. It is obvious that these multimodal samples are essentially useless when trying to understand one particular system at hand. Yet, they rule out almost all of the parameter space encompassed by the prior distribution: even a few radial velocity measurements are manifestly very informative.
3.2 Experiment 2: Underestimated uncertainties
Besides the number and spacing of the observation epochs, the precision of the individual radial velocity measurements matters. Less precise data, compared to the radial velocity amplitude, will admit more qualitatively different orbital solutions. The structure and complexity of the posterior pdf (at fixed jitter) will therefore depend strongly on knowing the measurement uncertainties and the intrinsic velocity variability of the system. We illustrate this using another simulated radial velocity curve with lower signaltonoise. We show that for underestimated uncertainties, the posterior pdf over orbital parameters can look wellconstrained, but may be discrepant with the true orbital parameters. Finally, we show that by simultaneously sampling over an unknown extra variance in the data (the jitter), we can account for additional, unaccounted sources of noise.
For this experiment, we use the following parameter values to generate the simulated data: , , , , , . We again uniformly sample five observation epochs over the interval and use radial velocity measurement uncertainties drawn from a uniform distribution over the interval (the median true uncertainty is ).
When running The Joker for this experiment, we consider three cases: (a) the uncertainties are (correctly) known and jitter is fixed and ignored (), (b) the uncertainties are underestimated by a factor of 8 and jitter is fixed and ignored (), and (c) the uncertainties are underestimated by a factor of 8 and we treat the jitter as a free parameter. For each case, we again generate prior samples over the nonlinear parameters with a period domain of and reuse these prior samples for each case. In case (c), we generate samples in the jitter by setting —here we are assuming that we have some suspicion about the true magnitude of the uncertainties.
Figure 4 shows the simulated data and presumed uncertainties (black points), the true orbit (dashed, green line), and orbits computed from samples from the posterior pdf (gray lines) for the three different samplings. In all cases, a wide variety of orbit solutions is permitted by the data. Note that far fewer modes are present in the middle panel (case (b)), when the uncertainties are underestimated. Figures 5–7 show the corresponding corner plots of the parameter pdf’s for these three cases. For this data set, when the uncertainties are correctly known, the posterior pdf is highly multimodal (case (a)). When the uncertainties are underestimated, the posterior pdf has fewer modes, but the true orbital parameters do not appear consistent with any of the strongest modes (case (b)). When the uncertainties are (severely) underestimated but the jitter is allowed to vary (case (c)), the model prefers solutions with a finite jitter comparable to the input true uncertainties (). We have therefore shown that The Joker is useful even when uncertainties are underestimated or the intrinsic velocity variability of a system is unknown.
3.3 Experiment 3: Varying the number of data points
When the phase coverage of the radial velocity observations is good and the number of observation epochs is large, the posterior pdf over orbital parameters effectively becomes unimodal. Under these conditions, The Joker is of course a very inefficient sampler for this problem and will return very few samples (as few as one). As we have seen in the previous experiments, when the number of data points is small or the uncertainties are large, the posterior pdf is generally multimodal. In this experiment we explore the dependence of the posterior pdf’s complexity on the number of observation epochs by generating radial velocity curves with initially 11 epochs. We use the following parameter values to generate the simulated data: , , , , , . After running The Joker with the full 11 observations, we successively remove 2 data points and rerun the sampling until we are left with three observations epochs as input data (a total of five consecutive runs).
Specifically, we again generate prior samples over the nonlinear parameters with a period domain of and reuse these prior samples for each subsampling of the data. We fix the jitter to and assume that the uncertainties are known. Figure 8 shows the simulated data and orbits computed from posterior samplings. Starting from the top of Figure 8 with the full set of 11 data points, each panel below has two epochs fewer than the previous. The data used for the pdf sampling shown in a given panel are plotted as black circles and the number of data points used in each panel, , are indicated. As described in Section 2, when the number of surviving samples after rejection sampling, we either (1) initialize emcee using the remaining sample(s) if the periods of the surviving sample(s) are sufficiently close, or (2) rerun The Joker with a new set of prior samples until we have at least samples from the posterior pdf. In all panels, 128 orbits computed from the posterior samples are shown.
The structure in the posterior samples in one projection of the posterior pdf (period and eccentricity) is shown in the right panels of Figure 8. For the cases with nine and 11 data points, the posterior pdf appears to be unimodal. Multiple modes first appear when and the posterior pdf become more structured in further subsamplings of the data until ultimately forming a harmonic series of modes in the final case of . It is worth emphasizing that even for the case with 3 data points, of the prior samples pass the rejection step: even 3 radial velocity observations are informative!
3.4 Experiment 4: Real data for a known binary
For a more realistic application of The Joker, we choose an APOGEE target with a previously identified companion (2M00110648+6609349) but with few radial velocity measurements (Troup et al. 2016). Figure 9 shows radial velocity data for the star (black points). Similar to Experiment 1, these data are sparse and have poor phase coverage. However, this epoch sampling is quite different and is representative of realistic survey design choices.
We again generate prior samples over the nonlinear parameters with a period domain of and with , of which 22,313 samples pass our rejection sampling step. Overplotted as gray lines on Figure 9 are 256 orbits computed from these samples. Already from visualizing these orbits, it becomes clear that there are at least a few distinct period modes, and a wide variety of eccentricities, represented in the posterior sampling.
Figure 10 shows projections of all posterior samples in different parameter combinations. Here it is clear there are at least three period modes: the dominant mode at is broadly consistent (but not coincident) with the previously measured period (Troup et al. 2016); but other modes are clearly present at shorter periods with low eccentricity and longer periods with higher eccentricity. Interestingly, we also find that the model prefers having a finite jitter around , which would imply that the effective radial velocity uncertainties might be underestimated by a factor of .
3.5 Experiment 5: Prospects for observation planning
A noticeable difference between the and panels in Figure 8 is that the posterior pdf collapses significantly between these cases (from 20 period modes to 3): This implies that the two added observations are extremely informative. Inverting this idea, it also suggests that we can use The Joker to (1) predict the observation time that maximally collapses the posterior pdf for a previously measured source, and (2) for an expected population of sources, we can identify the optimal sampling pattern to maximize discovery or characterization of the sources. We will explore these ideas in detail in future work, but here we simply show that the timing of subsequent observations can lead to very different structure in the posterior samples.
We again simulate a data set of four noisy radial velocity measurements, shown as black points in the topleft panel of Figure 11. We use the following parameter values to generate the simulated data: , , , , , . Uncertainties were chosen to match the APOGEE data () and are shown as error bars, but they are often comparable to or smaller than the marker size. The topright panel shows posterior samples produced with The Joker again in the space of logperiod, , and eccentricity, . The three lower rows all have six observation epochs: the same four from the top row, but now with two additional observations spaced, in phase, by but with a different starting phase for the new observations. As is shown by the right panels, the observation times of the new observations can greatly effect the compactness of the posterior pdf. In particular, the placement of the observations in the bottom row of panels rules out most of the longperiod modes from the top panels, and many of the shortperiod modes, whereas for the middle two cases the new data are not as informative.
Section 4 Discussion
We have built a Monte Carlo sampler—The Joker—to draw samples from the full posterior pdf over orbital parameters for singlecompanion systems, given a set of multiepoch, singleline radial velocity measurements. The Joker has important properties that differ from other sampling methods: (1) It produces independent samplings even when the likelihood (and hence posterior pdf) is highly multimodal; (2) the method is based on Simple Monte Carlo, in some sense a pure bruteforce method, which parallelizes trivially; and (3) the samplings are guaranteed to be correct under the sensible assumptions presented here, without the need for convergence or other diagnostic checks. If the pdf is effectively unimodal, The Joker tells us that the solution is unique. If the pdf is multimodal, The Joker captures all relevant different solutions.
Our experiments show that The Joker can be used for discovery and characterization of stellar binaries or exoplanets, even with the presence of unrecognized or unaccounted noise contributions. However, for exoplanets, we emphasize that while The Joker could in principle be used for any exoplanet system, the simplicity of our noise model and singlecompanion assumption strongly suggest that it will be most useful in the study of massive exoplanets.
Perhaps the primary innovation The Joker brings is a separation of the parameters into linear and nonlinear subsets. The brute force sampling is only required in the nonlinear subspace, aiding computational feasibility. We further capitalize on the problem structure by identifying effectively unimodal and effectively multimodal posterior pdfs using the minimum possible width of a likelihood peak in the period direction, given the time sampling.
Interestingly, as we show in Section 3.3, even very sparse samplings of the radialvelocity history of a star provide highly informative posterior pdfs. The bottomright panel in Figure 8 shows a highly multimodal posterior pdf. Nonetheless, the vast majority of prior pdf samples have been eliminated, and only a tiny subset of periods, eccentricities, and amplitudes are consistent with the data. These posterior pdfs may look bewilderingly complex, but they can contribute extremely valuable information to any hierarchical inference, or provide a very informative prior pdf for further observing campaigns.
Indeed, The Joker can be used to generate inputs for a hierarchical inference. In previous work (Hogg et al. 2010b; ForemanMackey et al. 2014) we have shown that posterior samplings under an interim prior can be importancesampled with a hierarchical inference to generate posterior beliefs about the full population. These hierarchical inferences are the only population inferences that properly propagate nontrivial uncertainties at the individualsystem level to the conclusions at the population level (see Mandel et al. 2011; Strader et al. 2004; Brewer et al. 2013, 2014 for other examples of hierarchical inference in astronomy).
In the experiments above we have used massive prior samplings, starting with samples before rejection sampling. When the data are sparse or have a low signaltonoise (e.g., bottom panels in Figure 8), many prior samples pass the rejectionsampling step: If the goal is to learn about an individual system and the data are poor, many fewer prior samples can be used to initialize The Joker. In this limit, generating a set number of posterior samples is very fast because of the easy parallelization of the likelihood calls. The same is true when the data are highquality and the samples that pass the rejection step will be used to initialize an MCMC sampling. A large prior sampling is needed when (a) many posterior pdf samples are needed for hierarchical inference, or (b) the data are of intermediate quality (the exact definition of which is problemspecific). The Joker is most valuable in the lowtointermediate quality range, especially when the samples will be used for hierarchical inference, when a small but converged sampling is needed for many (thousands to millions) of sources.
The Joker should also be valuable in observation planning, or cadence evaluation, or survey strategy: As Section 3.5 shows, The Joker could be used to plan the times of next observations to maximize their expected information content (see also Loredo 2004; Ford 2004). That is beyond the scope of this Article and will be explored in future work.
The Joker is based on a set of assumptions, itemized in Section 2. The method delivers correct samplings when these specific assumptions hold. Of course, these specific assumptions do not hold sufficiently well!
Astrophysically, we know that a star’s radialvelocity history need not be set entirely by a single companion, with no other perturbers or sources of radialvelocity signal. Although the singlecompanion assumption is a severe assumption, it is prettymuch required for the method to be tractable. Of course, in reality, it is likely that many stars reside in higherorder multiplets. Sampling over orbital parameters even for two companions, however, is already intractable as the nonlinear parameter space jumps to eightdimensional, and (at least) tendimensional if there are companion–companion interactions. This would be absolutely intractable to sample by bruteforce Simple Monte Carlo; our advice would be to switch to some kind of Markov Chain method that deals as well as possible with multimodal posteriors, such as nested sampling (Skilling 2004; Brewer et al. 2009). This change would be associated with the loss of the simple convergence criterion that the rejection sampling provides: If lots of samples survive the rejection step, the posterior has been sampled independently! There is no comparably simple way of determining that any nested sampling is converged.
That said, there is a simple body problem that can be solved tractably: For systems with one shortperiod companion and very longperiod companion(s), The Joker can be easily extended to include additional linear parameters that allow longperiod velocity trends that are, e.g., polynomial in time; these additional parameters don’t worsen the prior pdf sampling (which happens over the nonlinear parameters only). These extra linear parameters could alternatively include extra terms that come in when, say, the data come from a set of different radialvelocity programs with different calibrations (as is the case for the recent, impressive Proxima b discovery; AngladaEscudé et al. 2016).
At a crucial practical level, the assumption of Gaussian noise and perfectly known noise variances is often violated. Here, introducing the jitter as an explicit fitting parameter should help to mitigate The Joker’s sensitivity to unknown noise sources. Nonetheless, presuming we understand the noise properties, may still be the most problematic assumption made by The Joker: Essentially all data sets show occasional outliers (catastrophic errors). There is nothing we can do simply here, if we want to capitalize on treating the linear and nonlinear parameters separately. To deal with very rare outliers (such that no star would be likely to suffer from more than one), one possible modification would be to do all leaveoneout samplings, take the union, and then importance sample the results using some ratio of the mixture of leaveoneout Gaussian likelihoods to some more realistic likelihood that involves an outlier model (as, for example, we suggest in Hogg et al. 2010a). Such a modification to the method is beyond the scope of this Article but not beyond the scope of our ambitions.
References
 AngladaEscudé et al. (2016) AngladaEscudé, G., Amado, P. J., Barnes, J., et al. 2016, Nature, 536, 437
 Astropy Collaboration et al. (2013) Astropy Collaboration, Robitaille, T. P., Tollerud, E. J., et al. 2013, A&A, 558, A33
 Brewer & Donovan (2015) Brewer, B. J., & Donovan, C. P. 2015, MNRAS, 448, 3206
 Brewer et al. (2013) Brewer, B. J., ForemanMackey, D., & Hogg, D. W. 2013, AJ, 146, 7
 Brewer et al. (2014) Brewer, B. J., Marshall, P. J., Auger, M. W., et al. 2014, MNRAS, 437, 1950
 Brewer et al. (2009) Brewer, B. J., Pártay, L. B., & Csányi, G. 2009, ArXiv eprints, arXiv:0912.2380
 De Lee et al. (2013) De Lee, N., Ge, J., Crepp, J. R., et al. 2013, AJ, 145, 155
 Eisenstein et al. (2011) Eisenstein, D. J., Weinberg, D. H., Agol, E., et al. 2011, AJ, 142, 72
 Ford (2004) Ford, E. B. 2004, PASP, 116, 1083
 ForemanMackey et al. (2013) ForemanMackey, D., Hogg, D. W., Lang, D., & Goodman, J. 2013, PASP, 125, 306
 ForemanMackey et al. (2014) ForemanMackey, D., Hogg, D. W., & Morton, T. D. 2014, ApJ, 795, 64
 Gregory (2005) Gregory, P. C. 2005, ApJ, 631, 1198
 Hogg et al. (2010a) Hogg, D. W., Bovy, J., & Lang, D. 2010a, ArXiv eprints, arXiv:1008.4686
 Hogg et al. (2010b) Hogg, D. W., Myers, A. D., & Bovy, J. 2010b, ApJ, 725, 2166
 Holtzman et al. (2015) Holtzman, J. A., Shetrone, M., Johnson, J. A., et al. 2015, AJ, 150, 148
 Hunter (2007) Hunter, J. D. 2007, Computing In Science & Engineering, 9, 90
 Kepler (1609) Kepler, J. 1609, Astronomia nova.
 Kipping (2013) Kipping, D. M. 2013, MNRAS, 434, L51
 Loredo (2004) Loredo, T. J. 2004, in American Institute of Physics Conference Series, Vol. 707, Bayesian Inference and Maximum Entropy Methods in Science and Engineering, ed. G. J. Erickson & Y. Zhai, 330–346
 Majewski et al. (2015) Majewski, S. R., Schiavon, R. P., Frinchaboy, P. M., et al. 2015, ArXiv eprints, arXiv:1509.05420
 Mandel et al. (2011) Mandel, K. S., Narayan, G., & Kirshner, R. P. 2011, ApJ, 731, 120
 Murray & Correia (2010) Murray, C. D., & Correia, A. C. M. 2010, Keplerian Orbits and Dynamics of Exoplanets, ed. S. Seager, 15–23
 Nidever et al. (2015) Nidever, D. L., Holtzman, J. A., Allende Prieto, C., et al. 2015, AJ, 150, 173
 Pérez & Granger (2007) Pérez, F., & Granger, B. E. 2007, Computing in Science and Engineering, 9, 21
 PriceWhelan & Hogg (2017) PriceWhelan, A., & Hogg, D. W. 2017, adrn/thejoker: Release v0.1, , , doi:10.5281/zenodo.264481
 Raghavan et al. (2010) Raghavan, D., McAlister, H. A., Henry, T. J., et al. 2010, ApJS, 190, 1
 SDSS Collaboration et al. (2016) SDSS Collaboration, Albareti, F. D., Allende Prieto, C., et al. 2016, ArXiv eprints, arXiv:1608.02013
 Skilling (2004) Skilling, J. 2004, in American Institute of Physics Conference Series, Vol. 735, American Institute of Physics Conference Series, ed. R. Fischer, R. Preuss, & U. V. Toussaint, 395–405
 Strader et al. (2004) Strader, J., Brodie, J. P., & Forbes, D. A. 2004, AJ, 127, 3431
 Tokovinin (2014) Tokovinin, A. 2014, AJ, 147, 86
 Towns et al. (2014) Towns, J., Cockerill, T., Dahan, M., et al. 2014, Computing in Science and Engineering, 16, 62
 Troup et al. (2016) Troup, N. W., Nidever, D. L., De Lee, N., et al. 2016, AJ, 151, 85
 Van der Walt et al. (2011) Van der Walt, S., Colbert, S. C., & Varoquaux, G. 2011, Computing in Science & Engineering, 13, 22
 von Neumann (1951) von Neumann, J. 1951, J. Res. Nat. Bur. Stand., 12, 36