Fast and Easy Exoplanet Transit Modelling in Python
We present a fast and user friendly exoplanet transit light curve modelling package PyTransit, implementing optimised versions of the Giménez (2006) and Mandel & Agol (2002) transit models. The package offers an object-oriented Python interface to access the two models implemented natively in Fortran with OpenMP parallelisation. A partial OpenCL version of the quadratic Mandel-Agol model is also included for GPU-accelerated computations. The aim of PyTransit is to facilitate the analysis of photometric time series of exoplanet transits consisting of hundreds of thousands of datapoints, and of multi-passband transit light curves from spectrophotometric observations, as a part of a researcherâs programming toolkit for building complex, problem-specific, analyses.
keywords:Methods: numerical–Techniques: photometric–Planets and satellites
The rapid increase in computational power during the last decades has allowed for the adoption of increasingly robust statistical methods in the analysis of astrophysical data. Specially, combining a fully Bayesian approach to inference with Markov Chain Monte Carlo (MCMC) sampling for the posterior estimation has allowed for improved characterisation of model parameter uncertainties in parameter estimation problems, while the adoption of Bayesian model selection has given us the tools to robustly judge between many competing hypotheses aiming to explain our observational data. The increased flexibility and robustness come with a price. While the methods allow us to work with complex models with a high number of dimensions (free parameters), increasing dimensionality quickly increases the number of likelihood evaluations required for the analysis. Further, while the computers keep getting faster, also the size of the observational datasets (and the reserachers’ ambitions) keep growing thanks to the advancements in instrumentation and observation techniques.
In the analysis of photometric times series of exoplanet transits (transit light curves), the size and complexity of the observational datasets has increased due to the introduction of space-based telescopes CoRoT and Kepler, observing potentially hundreds of individual transits for a single transiting planet;111And, while the Kepler long-cadence (LC) mode produces a relatively small number of exposures per transit, the modelling of long cadence data requires model supersampling to account for the extended integration time (Kipping, 2010). due to introduction of lucky-imaging techniques allowing for very high time resolution observations; and due to the maturing of spectrophotometry as a transit observation method.
The space-based telescopes and lucky-imaging cameras produce light curves with tens to hundreds of thousands exposures, while the ground-based spectrophotometric observations of individual transits yield a smaller number of exposures, but with an additional dimension (number of passbands extracted from the observed spectra) to our time series. Both the stellar limb darkening and the planetary radius vary as a function of the wavelength coverage of the passband, and in typical cases the dimensionality of the parameter space increases by 3-5 free parameters per passband (radius ratio, at least two limb darkening parameters, assuming we are not overly reliant on the theoretical limb darkening coefficients, and 1-2 parameters to model the baseline flux). This increase in dimensionality leads to a significant increase in the number of likelihood evaluations needed to obtain a representative posterior sample using MCMC techniques, and, thus, an analysis of a single spectrophotometric transit can require equal amounts of computation resources as an analysis of a Kepler light curve covering hundreds of transits.
The transit shape model forms the core of the forward model in the transit light curve analysis. The model describes the dependence of the observed flux as a function of stellar limb darkening, planet-star radius ratio, and the planet-star distance (from centre to centre, expressed in stellar radii.) The field of transit light curve modelling is largely dominated by two analytical approaches: a set of transit models for different stellar limb darkening parametrisations by Mandel & Agol (2002), and a versatile series-expansion-based model presented by Giménez (2006). Both the Giménez and Mandel-Agol models (henceforth G and MA models, respectively) allow the most computationally intensive calculations to be factored out from the effects due to limb darkening, accelerating the calculation of multiple simultaneously observed passbands with different limb darkening coefficients significantly.
Here we present a Python package offering a straight-forward way to access the Giménez (2006) and Mandel & Agol (2002) transit models. The package has been used for transit light curve modelling in nine peer-reviewed publications over four years, and can be considered production ready. The models are implemented in Fortran 2003 (based on the original FORTRAN77 implementations by the respective authors) with OpenMP multithreading and model-specific optimisations aimed to minimise the model evaluation cost. Both models can be computed exactly, or using interpolation for improved speed, and a partial OpenCL implementation of the interpolated quadratic Mandel-Agol model is included for GPU computing. The package includes the necessary utility routines to calculate circular and elliptic orbits (using either Newton’s method, iteration, or two series approximations), transit durations, eclipse centres, etc., and offers a simple interface combining the orbit and the transit model computations that selects the most appropriate orbit calculation routine based on the eccentricity. Examples and tutorials on using the code are included in the package and online.222See the notebooks directory from https://github.com/hpparvi/PyTransit for IPython notebook examples, and https://github.com/hpparvi/exo_tutorials for more in-depth tutorials on exoplanet characterisation in general.
While the MA and G models are the two most commonly used approaches for transit modelling, both more generic and specialised modelling tools exists. Abubekerov & Gostev (2013) have derived analytical expressions for the transit shape (and its derivatives, also derived by Pál (2008) for the quadratic MA model) for a wider set of limb-darkening models than offered by Mandel & Agol, and offer an example implementation written in C. The JKTEBOP package by Southworth (2008) offers a versatile numerical approach to transit modelling where both the host star and the planet are modelled as biaxial spheroids. This goes beyond basic transit shape model, allowing for the modelling of the reflection and ellipsoidal effects as a function of orbital phase. Barnes (2009) has introduced a numerical approach for modelling transits over rapidly-rotating stars with significant gravity-darkening due to stellar oblateness. The transits over rapidly rotating stars can be use to probe for spin-orbit misalignment, since misaligned orbits will show asymmetric transits. Finally, Pál (2012) have considered the problem of modelling mutual transits in multi-planet systems, something the MA or G models cannot be used for directly.
PyTransit aims to offer a Pythonic access to the tools for one part of the planet characterisation problem: the modelling of the flux decrement due to an occulting planet as a function planet-star distance, planet-star radius ratio and stellar limb darkening. Several other transit modelling packages offer similar functionality with their own advantages and limitations. EXOFAST by Eastman et al. (2013) is an IDL library for transit and radial velocity modelling.333With a web-front-end at http://astroutils.astronomy.ohio-state.edu/exofast/exofast.shtml The authors gain significant improvements in the evaluation speed of the quadratic MA model by swapping the original method for computing the elliptic integral of the third kind with a faster one. However, the use of IDL, while still relatively popular in astrophysics, limits the package’s adoptability.444However, Eastman et al. also offer Python and Fortran implementations of their faster MA model. The Transit Analysis Package (TAP, Gazak et al., 2012) is an IDL transit modelling package with a graphical user interface. The package implements the wavelet based likelihood function by Carter & Winn (2009) that accounts for correlated noise (of very specific statistical characteristics) in the photometry, improving the robustness of the parameter estimates. Finally, the latest update of PlanetPack (Baluev, 2014) has included transit modelling using the Abubekerov & Gostev transit model and several correlated noise models. PlanetPack is a command-line program written in C++, and thus its use has the pros and cons of a an analysis approach based on a single monolithic program. However, this is alleviated by its open-source nature combined with its independence from proprietary packages and languages.
PyTransit advocates the toolkit-based approach where the analysis code is constructed using a set of tools best suited for the problem at hand. This is similar to EXOFAST (although the scope of the package is significantly narrower), and lower-level than what offered by TAP and PlanetPack. The approach offers significant flexibility what comes to working with different MCMC samplers, optimisers, implementing noise models, etc., but it also requires the end-user to have a slightly higher level of experience than what is required by the off-the-shelf analysis packages.
2 The Transit Models
2.1 The Giménez Model
where stands for the fractional loss of light due to the transiting planet, is the star-planet radius ratio, and is the projected star-planet distance divided by the stellar radius. The functions are described as
where are factors that depend only on limb darkening coefficients, , and , and the functions are expressed in terms of Jacobi polynomials, as
where , and is the gamma function.
The accuracy of the Giménez model is determined by , the number of Jacobi polynomials used in Eq. 3. Figure 1 shows the model’s maximum and mean absolute deviations from the exact solution for as a function of . The figure is only illustrative, since the exact deviations depends on the limb darkening and the spanned -values (but the dependency on limb darkening is small relative to the dependency on .) Relatively small (40) can be found to be sufficient for modelling ground-based observations. Also, a small can be first used to first obtain a quick rough solution, which can then be refined by increasing .
2.1.2 Limb darkening
The Giménez model produces transit light curves that follow a general limb darkening law,
where stands for the order of the model, is the n limb darkening coefficient, , and is the foreshortening angle (the angle between the surface normal and the line of sight.) The general law equals to linear limb darkening for ,
and to the quadratic limb darkening law for ,
where the quadratic law coefficients () are transformed to the general law coefficients as and .
2.1.3 Optimisation: Simultaneous Multiple Passbands
The limb darkening enters into the Giménez model as coefficients
where do not depend on or . The coefficients are cheap to compute, and by first calculating the in Eq. 2, the model can be evaluated with low computational cost for different sets of limb darkening coefficient vectors as
This is beneficial for the analysis of spectrophotometric data, since we do not need to evaluate the full model for each separate passband.
2.1.4 Optimisation: Precomputing the Model Coefficients
A quick look at the Eq. 3 shows that the computationally most expensive parts of the equation do not depend on or , but only on the depth of the expansion and the number of limb darkening coefficients. If we abbreviate
Eq. 3 can be written as
and can be precomputed into a two-dimensional lookup table given the number of limb darkening coefficients and the depth of the series expansion in the beginning of the computations.
Next, the Jacobi polynomials can be calculated using recursion as
where the coefficients also depend only on the expansion depth and number of limb darkening coefficients, and can be precomputed into another two-dimensional lookup table.
After and have been precomputed, the evaluation of the Giménez model for a given radius ratio, projected distance, and limb darkening coefficients requires only summations and multiplications, making the computation of the model for a large number of points fast.
2.2 The Quadratic Mandel-Agol Model
Mandel & Agol (2002) introduced a set of analytical transit light curve models for several different limb darkening laws, of which PyTransit implements the uniform and quadratic model. The flux, , for a transit over a stellar disk with quadratic limb darkening is
where is the radius ratio, is the projected distance, , and are the quadratic limb darkening coefficients, and , , and are functions that depend on and as defined in Mandel & Agol (2002).
2.2.2 Optimisation: Simultaneous Multiple Passbands
As with the Giménez model, the effects from limb darkening are factored out from the most expensive computations (those of , , and ), which, again, allows for efficient computation of multiple simultaneous passbands with different limb darkening coefficients.
2.2.3 Optimisation: Precomputing Interpolation Tables
The functions , and in Eq. 13 can be precomputed into two-dimensional interpolation tables spanning and ranges based on the prior set on . The model evaluation can now be done by first interpolating the values of the three functions, followed by the summations, multiplications, and one division (two of the divisions can be replaced with multiplications) in Eq. 13.
The maximum absolute error (here defined as the deviation from the non-interpolated model) for the interpolated MA model using the default values ( and ) for (that is, we have an uniform prior on the radius ratio from to ) is 4 ppm, and the average absolute deviation over the whole transit is 0.05 ppm. Both of these values are well below the limits that can be achieved Kepler or CoRoT. Decreasing has a smaller effect on the introduced error than decreasing , and even for yields a maximum absolute error of 8 ppm.
The transit models are implemented in Fortran 2003 based on the original FORTRAN77 implementations by Giménez and Mandel & Agol. The shared-memory parallelisation is carried out using OpenMP. The Python package offers easy-to-use Python classes wrapping the Fortran models with a common interface for both models. The Python interface offers automatic model supersampling given the number of subsamples and integration time, and model interpolation in -space is implemented for the Giménez model (the use of interpolation tables for the quadratic MA model makes any gains of -space interpolation insignificant.)
3.2 Model Supersampling
A single point in a photometric time series corresponds to an integration of the flux over the exposure time. If the changes in the observed signal are small compared to the noise level over a single exposure, we can approximate this integration with a single value evaluated at the centre of the exposure. However, this approximation fails when the exposure time is long enough to integrate over modelled features—such as with Kepler’s long time cadence mode (Kipping, 2010)—and also the model needs to be integrated over the exposure.
PyTransit offers transit model supersampling to account for long exposure times given the number of subsamples and the exposure time. The model is evaluated for evenly distributed subsamples inside each exposure, where is given by the user based on exposure time and transit duration, and each light curve point corresponds to the mean of the subsamples.
3.3 Giménez Model Interpolation in z-Space
While the optimisations described in Sec. 2.1.4 makes the Giménez model computationally efficient, simple model interpolation in -space can offer a notable speedup with light curves containing hundreds of thousands of points.
The code implements an alternative interpolated mode for the evaluation of the Giménez model, where the transit model is first evaluated for points for and points for , and the light curve points are interpolated from these tabulated values. The maximum absolute error (again, the difference between the interpolated and non-interpolated models) for the default grid size is 25 ppm, and the average error is 2 ppm.
3.4 Partial OpenCL Acceleration of the MA model
The package implements an OpenCL version of the interpolated quadratic Mandel-Agol model, where the interpolation of , , and is offloaded to the GPU. Given the computational simplicity of bilinear interpolation used, the overheads from memory transfer to and from the GPU dominate the evaluation time, making the model evaluation in GPU significantly slower than in CPU for small light curves. For basic usage, the OpenCL accelerated model is faster than the multithreaded Fortran implementation for light curves with points. However, significant speedups can be reached with smaller light curves if both the and likelihood calculations are also offloaded to the GPU to minimise the memory transfer.
3.5 Applicability to Transmission Spectroscopy
Both the Giménez and Mandel-Agol models allow for efficient evaluation for multiple simultaneously observed passbands, where the differences in the transit shape (and observed depth) reflect the differences in the stellar limb darkening in each passband. However, this does nothing to include the effects from the variations in the effective radius ratio , the parameter of interest when carrying out transmission spectroscopy.
The effects on the transit depth by varying are included in PyTransit by assuming that the relative changes in the effective radius ratio are a small fraction of its average value. Now, the changes in affect only the transit depth, and the changes in other observables (the duration and shape of the transit) are below observation limits. Thus, given a set of radius ratios, , the model is evaluated using the average radius ratio, , and then multiplied for each passband by a correction factor .
We benchmark the model performance using two setups:
Intel Linux desktop (64 bit Ubuntu Precise), Intel i7-3770 (4 cores at 3.4 GHz), GFortran 4.6.3, optimisation flags -Ofast -march=native.
AMD Linux desktop (64 bit Ubuntu Trusty), AMD FX-8350 (8 cores at 2.8 GHz), GFortran 4.9, optimisation flags -Ofast -march=native.
With the exception of threading, the performance scales equivalently for both setups (the absolute performance is also comparable when considering the difference in the clock rates), and we show the results only for the first setup.
Figure 2 shows the absolute model evaluation times as a function of the number of light curve points for the directly evaluated and interpolated transit models. The modelled light curves have 35% of in-transit points and 65% of out-of-transit points, corresponding to a typical observation setup. The interpolation for the Giménez model is carried out using linear interpolation in -space, while the interpolated Mandel-Agol model uses bilinear interpolation with the two-dimensional interpolation tables for , and . The quadratic Mandel-Agol model is significantly faster than the two-coefficient Giménez model, but increasing the number of limb darkening coefficients does not affect the performance of the Giménez model significantly. The OpenCL version of the interpolated Mandel-Agol model is slower than the Fortran version for small light curves due to memory transfer, but this can be alleviated by offloading also the rest of the computations to the GPU.
Figure 3 shows the model evaluation times per passband relative to the evaluation time for a single passband. The speedup from calculating all passbands simultaneously when working with spectrophotometric data is obvious.
Figure 4 shows the evaluation time as a function of OpenMP threads for the setup with an Intel i7 processor. The multithreading scaling is slightly different for the two setups: the optimal performance is obtained with three threads for the Intel setup and six threads for the AMD setup (not shown).
5 Conclusions and Discussion
We have described a Python package offering optimised versions of the Giménez and Mandel-Agol transit models. The package is freely available from github
and under continuing development. The Fortran routines are also directly usable as Fortran modules. The package comes with IPython notebook examples showing the use and features of the code. More in-depth tutorials covering exoplanet characterisation from transit light curves can be found from
again implemented as IPython notebooks.
Interpolation can be used to speed up both of the models, and the performance gain is especially significant for the MA model. The errors introduced by the interpolation are small, but systematic, and error evaluation is recommended if extreme precision ( for the MA model, for the G model) is required. The maximum deviations for the interpolated MA mode occur at the end of ingress and the beginning of egress.
The two transit models have different advantages: the quadratic Mandel-Agol model is significantly faster than the two-coefficient Giménez model, but the Giménez model allows for higher-order limb darkening. Other Mandel-Agol models with higher-order limb darkening may be implemented in the future, but the flexibility of the Giménez model makes this a low-priority task.
The author warmly thanks H. Deeg, J.A. Belmonte and S. Aigrain for their comments and helpful discussion, and the anonymous referee for the constructive comments. The development of the PyTransit package was supported by RoPACS, a Marie Curie Initial Training Network funded by the European Commissionâs Seventh Framework Programme; by Väisälä Foundation through the Finnish Academy of Science and Letters; and by the Leverhulme Research Project grant RPG-2012-661.
- Abubekerov & Gostev (2013) Abubekerov M. K., Gostev N. Y., 2013, MNRAS, 432, 2216
- Baluev (2014) Baluev R. V., 2014, arXiv:1410.1327
- Barnes (2009) Barnes J., 2009, ApJ, 705, 683
- Carter & Winn (2009) Carter J. a., Winn J. N., 2009, ApJ, 704, 51
- Eastman et al. (2013) Eastman J., Gaudi B. S., Agol E., 2013, PASP, 125, 83
- Gandolfi et al. (2015) Gandolfi D., Parviainen H., Deeg H. J., Lanza a. F., Fridlund M., Moroni P. G. P., Alonso R., Augusteijn T., Cabrera J., Evans T., Geier S., Hatzes a. P., Holczer T., Hoyer S., Kangas T., Mazeh T., Pagano I., Tal-Or L., Tingley B., 2015, A&A, 576, A11
- Gandolfi et al. (2013) Gandolfi D., Parviainen H., Fridlund M., Hatzes A. P., Deeg H. J., Frasca A., Lanza A. F., Prada Moroni P. G., Tognelli E., McQuillan A., Aigrain S., Alonso R., Antoci V., Cabrera J., Carone L., Csizmadia S., Djupvik A. A., Guenther E. W., Jessen-Hansen J., Ofir A., Telting J., 2013, A&A, 557, A74
- Gazak et al. (2012) Gazak J. Z., Johnson J. A., Tonry J., Dragomir D., Eastman J., Mann A. W., Agol E., 2012, Adv. Astron., 2012, 1
- Giménez (2006) Giménez A., 2006, A&A, 450, 1231
- Giménez (2007) Giménez A., 2007, A&A, 474, 1049
- Kipping (2010) Kipping D. M., 2010, MNRAS, 408, 1758
- Kopal (1977) Kopal Z., 1977, Astrophys. Space Sci., 50, 225
- Mandel & Agol (2002) Mandel K., Agol E., 2002, ApJ, 580, L171
- Murgas et al. (2012) Murgas F., Pallé E., Cabrera-Lavers A., Colón K. D., Martín E. L., Parviainen H., 2012, A&A, 544, A41
- Murgas et al. (2014) Murgas F., Pallé E., Zapatero Osorio M. R., Nortmann L., Hoyer S., Cabrera-Lavers A., 2014, A&A, 563, A41
- Pál (2008) Pál A., 2008, MNRAS, 390, 281
- Pál (2012) Pál A., 2012, MNRAS, 420, 1630
- Parviainen et al. (2013) Parviainen H., Deeg H. J., Belmonte J. A., 2013, A&A, 550, A67
- Parviainen et al. (2014) Parviainen H., Gandolfi D., Deleuil M. e. a., 2014, A&A, 562, A140
- Rouan et al. (2012) Rouan D., Parviainen H., Moutou C. e. a., 2012, A&A, 537, A54
- Southworth (2008) Southworth J., 2008, MNRAS, 386, 1644
- Tingley et al. (2011) Tingley B., Palle E., Parviainen H., Deeg H. J., Zapatero Osorio M. R., Cabrera-Lavers A., Belmonte J. a., Rodriguez P. M., Murgas F., Ribas I., 2011, A&A, 536, L9
- Tingley et al. (2014) Tingley B., Parviainen H., Gandolfi D., Deeg H. J., Palle E., Montañés Rodriguez P., Murgas F., Alonso R., Bruntt H., Fridlund M., 2014, A&A, 567