Kepler-4b through Kepler-8b

# An Independent Analysis of Kepler-4b through Kepler-8b

## Abstract

We present two independent, homogeneous, global analyses of the transit lightcurves, radial velocities and spectroscopy of Kepler-4b, Kepler-5b, Kepler-6b, Kepler-7b and Kepler-8b, with numerous differences over the previous methods. These include: i) improved decorrelated parameter fitting set used, ii) new limb darkening coefficients, iii) time stamps modified to BJD for consistency with RV data, iv) two different methods for compensating for the long integration time of Kepler LC data, v) best-fit secondary eclipse depths and excluded upper limits, vi) fitted mid-transit times, durations, depths and baseline fluxes for individual transits. We make several determinations not found in the discovery papers: i) We detect a secondary eclipse for Kepler-7b of depth  ppm and statistical significance 3.5-. We conclude reflected light is a much more plausible origin than thermal emission and determine a geometric albedo of . ii) We find that an eccentric orbit model for the Neptune-mass planet Kepler-4b is detected at the 2- level with . If confirmed, this would place Kepler-4b in a similar category as GJ 436b and HAT-P-11b as an eccentric, Neptune-mass planet. iii) We find weak evidence for a secondary eclipse in Kepler-5b of 2- significance and depth  ppm. The most plausible explanation is reflected light caused by a planet of geometric albedo . iv) A 2.6- peak in Kepler-6b TTV periodogram is detected and is not easily explained as an aliased frequency. We find that mean-motion resonant perturbers, non-resonant perturbers and a companion extrasolar moon all provide inadequate explanations for this signal and the most likely source is stellar rotation. v) We find different impact parameters relative to the discovery papers in most cases, but generally self-consistent when compared to the two methods employed here. vi) We constrain the presence of mean motion resonant planets for all five planets through an analysis of the mid-transit times. vii) We constrain the presence of extrasolar moons for all five planets. viii) We constrain the presence of Trojans for all five planets.

planetary systems — stars: individual (Kepler-4, Kepler-5, Kepler-6, Kepler-7, Kepler-8) techniques: spectroscopic, photometric
2

## 1. Introduction

The Kepler Mission was successfully launched on March 7th 2009 and began science operations on May 12th of the same year. Designed to detect Earth-like transits around Sun-like stars, the required photometric precision is at the level of  ppm over 6.5 hours integration on 12 magnitude stars and early results indicate this impressive precision is being reached (Borucki et al., 2009). In 2010, the first five transiting exoplanets (TEPs) to be discovered by the Kepler Mission were announced by the Kepler Science Team (Borucki et al., 2010a), known as Kepler-4b, Kepler-5b, Kepler-6b, Kepler-7b and Kepler-8b (Borucki et al. (2010b); Koch et al. (2010); Dunham et al. (2010); Latham et al. (2010); Jenkins et al. (2010a)). These expanded the sample of known transiting exoplanets to about 75 at the time of announcement.

The main objective of the Kepler Mission is to discover Earth-like planets, but the instrument naturally offers a vast array of other science opportunities including detection of gas giants, searches for thermal emission (Deming et al., 2005) and/or reflection from exoplanets, detection of orbital phase curves (Knutson et al., 2007) and ellipsoidal variations (Welsh et al., 2010), asteroseismology (Christensen-Dalsgaard et al., 2010) and transit timing (Agol et al., 2005), to name a few. Confirmation and follow-up of exoplanet transits is known to be a resource intensive activity and since the detection of new planets is Kepler’s primary objective, it is logical for many of these other scientific tasks to be conducted by the astronomical community as a whole.

Independent and detailed investigations of the Kepler photometry provides an “acid-test” of the methods employed by the Kepler Science Team. Indeed, the distinct analysis of any scientific measurement has always been a fundamental corner stone of the scientific method. In this paper, we present two independent analyses of the discovery photometry for the first five Kepler planets. We aim to not only test the accuracy of the methods used in the discovery papers, but also test our own methods by performing two separate studies. Both methods will be using the same original data, as published in the discovery papers.

Some additional data-processing tasks are run through the Kepler reduced data, which we were not used in the original analyses presented in the discovery papers, and are discussed in §3. In this section, we also discuss the generation of new limb darkening coefficients and methods for compensating for the long integration time of the Kepler long-cadence photometry. We also perform individual transit fits for all available Kepler transits in order to search for transit timing variations (TTV), transit duration variations (TDV) and other possible changes. These will be used to provide a search for perturbing planets and companion exomoons.

## 2. Fitting Methodology

### 2.1. Method A

In this work, the two of us adopt different algorithms for fitting the observational data. The first code has been written by D. Kipping and is a Markov Chain Monte Carlo (MCMC) code (for a brief introduction on this method, consult appendix A of Tegmark et al. (2004)) with a Metropolis-Hastings algorithm, written in Fortran 77/90. The lightcurve model is generated using the analytic quadratic limb darkening Mandel & Agol (2002) routine, treating the specific stellar intensity with:

 IμI1=1−u1(1−μ)−u2(1−μ)2 (1)

The quadratic limb darkening coefficients and are known to be highly correlated (Pál, 2008b), but a principal component analysis provides a more efficient fitting set:

 u′1 =u1cos40∘−u2sin40∘ (2) u′2 =u1sin40∘+u2cos40∘ (3)

We also implement conditions to ensure the brightness profile is everywhere positive and monotonically decreasing from limb to center: and (Carter et al., 2008).

Quantities relating to time differences are computed by solving the bi-quartic equation for the various true anomalies of interest following the methods detailed in Kipping (2008) and Kipping (2010a). These quantities include:

• The transit duration between the first and fourth contact, .

• The transit duration between the second and third contact, .

• The ingress and egress transit durations, and respectively.

• The secondary eclipse full duration, .

• The time of the predicted secondary eclipse, .

• The offset time between when the radial velocity possesses maximum gradient (for the instance closest to the time of conjunction) and the primary mid-transit, .

The latter two on this list are particularly important. For example, the RV offset time provides a strong constraint on .

The principal fitting parameters for the transit model are the orbital period, , the ratio-of-radii squared, , the Kipping (2010a) transit duration equation, , the impact parameter, , the epoch of mid-transit, and the Lagrange eccentricity parameters and . is used as it describes the duration between the planet’s centre crossing the stellar limb and exiting under the same condition and this is known to be a useful decorrelated parameter for light curve fits (Carter et al., 2008). We also note that an approximate form is required as the exact expression requires solving a bi-quartic equation which is not easily invertible. In contrast, may be inverted to provide using (Kipping, 2010a):

 ϱc =1−e21+esinω (4) a/R∗ = ⎷(1−b2ϱ2c)csc2[πT1√1−e2Pϱ2c]+(b2ϱ2c) (5)

Blending is accounted for using the prescription of Kipping & Tinetti (2010) where we modify the formulas for an independent blend rather than a self-blend. The nightside pollution effect of the planet is at least an order of magnitude below the detection sensitivity of Kepler due to the visible wavelength bandpass of the instrument. Therefore, the only blending we need account for are stellar blends. In addition, the model allows for an out-of-transit flux level () to be fitted for. Unlike method B (see later), we fix to be a constant during the entire orbital phase of the planet. Secondary eclipses are produced using the Mandel & Agol (2002) code with no limb darkening and the application of a transformation onto the secondary eclipse lightcurve which effectively squashes the depth by a ratio such that the new depth is equal to (planet dayside flux to stellar flux ratio) in the selected bandpass. This technique ensures the secondary eclipse duration and shape are correctly calculated. The observed flux is therefore modeled as:

 Fobs(t)=(Fmodel(t)+(B−1)B)Foot (6)

This model so far accounts for primary and secondary eclipses but an additional subroutine has been written to model the radial velocity variations of a single planet. Modeling the RV in conjunction with the transit data gives rise to several potential complications. Firstly, we could try fits using either an eccentric orbit or a fixed circular orbit model. An eccentric fit will always find a non-zero eccentricity due to the boundary condition that (Lucy & Sweeney, 1971). The effects of fitting versus not-fitting for eccentricity are explored in this work by presenting both fits for comparison.

A second complication is the possible presence of a linear drift in the RVs due a distant massive planet. This drifts can give rise to artificial eccentricity if not accounted for but their inclusion naturally increases the uncertainty on all parameters coming from the RV fits. Thirdly, an offset time, , between when then RV signal has maximum gradient (for the instance nearest the time of conjunction) and the mid-transit time can be included. For eccentric orbits, a non-zero offset always exists due to the orbital eccentricity. In our code this is calculated and denoted as and is calculated exactly by solving for the relevant true anomalies and computing the time interval using the duration function of Kipping (2008). However, an additional offset can also exist, , which may due to a massive body in a Trojan orbit (i.e. we have ). This offset was first predicted by Ford & Holman (2007) and the detection of a non-zero value of would indicate a Trojan. For eccentric orbits, the error in is often very large and dominates the error budget in , washing out any hints of a Trojan.

The question exists as to when one should include these two additional parameters. Their perenial inclusion would result in very large errors for poorly characterized orbits and this is clearly not desirable. We therefore choose to run a MCMC + simplex fit for all possible models and extract the lowest for the RV signal. This minimum is then used to compute the Bayesian Information Criterion, or BIC (see Schwarz (1978); Liddle (2007)). BIC compares models with differing numbers of free parameters (), heavily penalizing those with more and the preferred model is given that yielding the lowest BIC, where:

 BIC=χ2+klogN (7)

Where is the number of data points. In total, there exists four possible models for the circular and four possible models for the eccentric fit by switching on/off these two parameters. For both the circular and eccentric fits, we select the model giving the lowest BIC. Therefore, if for example the lowest BIC was that of a zero but non-zero , we would fix the gradient term but let the offset be freely fitted in the final results. The results of these preliminary investigations can be found in Table 14 of the Appendix.

The favored solution is generally the eccentric model over the circular model. This is because the light curve derived stellar density, which we will later use in our stellar evolution models, has a strong dependence on the eccentricity. In general, fixing leads to unrealistically small error on . However, in some cases sparse RV phase coverage can lead to artificially large values.

The code is executed as a global routine, fitting all the RV and photometry simultaneously with a total of free parameters: , , , , , , , , , , , and . The inclusion of and is reviewed on a case-by-case basis, as discussed. Additionally, the blending factor, which we denote as , is allowed to float around its best fit value in a Gaussian distribution of standard deviation equal to the uncertainty in . These values are reported in the original discovery papers.

The final fit is then executed with 125,000 trials with the first 20% of trials discarded to allow for burn-in. This leaves us with 100,000 trials to produce each a posteriori distribution, which is more than adequate. The Gelman & Rubin (1992) statistic is calculated for all fitted parameters to ensure good mixing. The final quoted values are given by the median of the resultant MCMC trials. We define the posian and negian as the maximum and minimum confidence limits of the median respectively. The negian may be found by sorting the list and extracting the entry which is 34.13% of the total list length. The posian is given by the entry which occurs at 68.27% of the total list length. These values may be then be used to determine the confidence bounds on the median.

After the global fitting is complete, we produce the distribution of the lightcurve derived stellar density and a Gaussian distribution for the SME-derived effective temperature and metallicity around the values published in the discovery papers and based on SME analysis of high resolution spectra. These distributions consist of 100,000 values and thus may be used to derive 100,000 estimates of the stellar properties through a YY-isochrone analysis (Yi et al., 2001). For errors in  and , we frequently adopted double that which was quoted in the discovery papers, as experience with HAT planets has revealed that SME often underestimates these uncertainties. Finally, the planetary parameters and their uncertainties were derived by the direct combination of the a posteriori distributions of the lightcurve, RV and stellar parameters. The stellar jitter squared is found by taking the best-fit RV model residuals and calculating the variance and subtracting the sum of the measurement uncertainties squared.

### 2.2. Method B

The second method builds on the codes developed under the HATNet project, mostly written in C and shell scripts. These have been used for the analysis of HATNet planet discoveries, such as Pál et al. (2008a); Bakos et al. (2009); Pál (2009b), and have been heavily modified for the current case of Kepler analysis. We used a model for describing the flux of the star plus planet system through the full orbit of the planet, from primary transit to occultation of the planet. In our initial analysis we assumed a periodic model, and later we considered the case of variable parameters as a function of transit number. The flux was a combination of three terms:

 F(t) = Φ(t,E,P,Foot,Foos,Foom,ζ/R⋆,p,b2,k,h,w)− (8) Foot/(1+ϵpn+B)×Ft(t,E,P,ζ/R⋆,p,b2)− Foos×ϵpd/((1+ϵpd+B)p2) ×Fs(t,E,P,ζ/R⋆,p,b2)

Here (the first term) is the brightness (phase-curve) of the star+planet+blend system without the effect of the transit and occultation. The drop in flux due to the transit is reflected by the second term, and the equivalent drop due to the occultation of the planet by the third term. The phase-curve is a rather simple step-function of three levels; within times the duration of the transit around the transit center, within times the duration of the occultation around the center of the occultation, and in between. For simplicity, we adopted . Because the transit and occultation centers and durations also enter the formula, depends on a number of other parameters, namely the time, mid-transit time, period, the parameter that is characteristic of the duration of the transit, planet to star radius ratio, impact parameter, and and Lagrangian orbital parameters. The location and total duration of the transit and occultation are fully determined by the above parameters. In general, the combined brightness of the star plus planet would smoothly vary from outside of the primary transit to outside of the occultation, typically as a gradual brightening as the planet shows its star-lit face towards the observer. If the night-side brightness of the planet is (Kipping & Tinetti, 2010), and the day-side brightness of the planet is , and is the flux of a blend contributing light to the system, then and (both equations normalized by ), i.e. there is a constraint between and . The reason for introducing all three of , and for characterizing the data was because the Kepler lightcurve, as provided by the archive, was “de-trended”, removing the (possible) brightening of the phase-curve due to the planet. Because of this, we found that introducing a more complex phase function was not warranted. We note that while such de-trending may be necessary to eliminate systematic effects, it also removes real physical variations of small amplitude.

The second term characterizes the dimming of the combined light of the star, blend and planet during the primary transit. The loss of stellar flux was modeled using the analytic formula based on Mandel & Agol (2002) for the eclipse of a star by a planet (Bakos et al., 2009), where the stellar flux was described by quadratic limb-darkening. The quadratic limb-darkening coefficients were derived by evaluating the stellar flux as a function of (where is the angle between the stellar surface normal vector and the line-of-sight) using the quadratic , coefficients. The formula takes into account the blending (constant flux contribution by other sources), the night-side illumination of the planet (normalized by ), and the out-of-transit flux level that was normalized by an arbitrary constant.3

Finally, the third term characterizes the dimming of the combined light of the star, blend and planet during the occultation of the planet. The day-side flux of the planet is gradually lost as it moves behind the star. This is modeled by a zero limb-darkening model of the secondary eclipse using the same Mandel & Agol (2002) formalism. This term also takes into account the blending, and the daytime flux of the system.

Our global modeling also included the radial velocity data. Following the formalism presented by Pál (2009), the RV curve was parametrized by an eccentric Keplerian orbit with semi-amplitude , Lagrangian orbital elements , and systemic velocity .

The Kepler photometry and radial velocity data are connected in a number of ways. For example, we assumed that there is a strict periodicity in the individual transit times4, and the same and ephemeris of the system describes the photometric and RV variations. Another example: the Lagrangian orbital parameters and were not only determined by the RV data, but also by the phase of the occultation of the planets in the Kepler data.

Altogether, the 13 parameters describing the physical model were (the of the first transit in the Kepler data), (the same parameter for the last transit), , , , , , , , , , , and . The blending factor was kept fixed at the values published in the Kepler discovery papers (Borucki et al. (2010b); Koch et al. (2010); Dunham et al. (2010); Latham et al. (2010); Jenkins et al. (2010a)). We adopted for the night-side emission of the planet. Fitting for and would require data with exceptional quality.

The joint fit on the photometry and radial velocity data was performed as described in Bakos et al. (2009). We minimized  in the parameter space by using a hybrid algorithm, combining the downhill simplex method (AMOEBA; see Press et al., 1992) with the classical linear least squares algorithm. Uncertainties for the parameters were derived using the Markov Chain Monte-Carlo method (MCMC, see Ford, 2006). The a priori distributions of the parameters for these chains were chosen from a generic Gaussian distribution, with eigenvalues and eigenvectors derived from the Fisher covariance matrix for the best-fit value. The Fisher covariance matrix is calculated analytically using the partial derivatives given by Pál (2009).

Stellar parameters were also determined in a Monte-Carlo fashion, using the tools and methodology described in e.g. Bakos et al. (2009). Basically, we used the MCMC distribution of and corresponding , a Gaussian distribution of  and  around the values published in the discovery papers and based on SME analysis of high resolution spectra. For errors in  and  we typically adopted double that of quoted in the discovery papers. Finally, the planetary parameters and their uncertainties were derived by the direct combination of the a posteriori distributions of the lightcurve, RV and stellar parameters.

Throughout this text, the final value for any given parameter is the median of the distribution derived by the Monte-Carlo Markov Chain analysis. Error is the standard deviation around the final value. Asymmetric error-bars are given if the negative and positive standard deviations differ by 30%. Parameter tables also list the RV “jitter,” which is a component of assumed astrophysical noise intrinsic to the star that we add in quadrature to the RV measurement uncertainties in order to have from the RV data for the global fit.

### 2.3. Parameter variation search method

#### Individual fits

In addition to global fits, we will search for variations in the lightcurve parameters for each transit. Searching for changes in the mid-transit time, commonly known as transit timing variations (TTV), can be used to search for perturbing planets (Agol et al., 2005; Holman & Murray, 2005) or companion moons (Sartoretti & Schneider, 1999; Kipping, 2009a). Similarly, transit duration variations (TDV) provide a complementary method of searching for exomoons (Kipping, 2009a, b). We also look for depth changes and baseline variations which may aid in the analysis of any putative signals.

Transit timing and duration measurements are made by splitting the time series into individual transits and fitting independently. In these individual fits, we only fit for , , , and and float all other parameters around their global best-fit determined value with their a posteriori distribution. The limb darkening coefficients are fixed to the theoretical values for these fits, since epoch to epoch variation in the LD coefficients is not expected. We choose to use method A for the individual fits as the parameter is more likely to have a uniform prior than (as used in method B), from geometric considerations. As for the global fits, we perform 125,000 trials for each fit with the first 25,000 (20%) trials being used as a burn-in.

For the TDV, we define our transit duration as the time for the sky-projection of the planet’s center to move between overlapping the stellar disc and exiting under the same condition, . This value was first proposed by Carter et al. (2008) due to its inherently low correlations with other parameters. As a result, this definition of the duration may be determined to a higher precision than the other durations. has the property that its uncertainty is exactly twice that of the mid-transit time for a trapezoid approximated lightcurve. In reality, limb darkening will cause this error to be slightly larger. Nevertheless, this property means that a useful definition of TDV is to halve the actual variations to give TDV = where is the globally-fitted value of . The factor of 0.5 means that we expect the r.m.s. scatter of the TDV to be equal to, or slightly larger than, the TTV scatter. It is also worth noting that in reality we could use any factor we choose rather than 0.5, because TDV is really a measurement of the fractional variation in duration and is not absolute in the sense that TTV is.

#### Periodograms

When analyzing the TTV and TDV signals, we first compute the of a model defined by a static system, i.e. constant duration and linear ephemeris. This value is naturally computed from the MCMC uncertainties for each transit. In practice, these errors tend to be overestimates. This is because the MCMC only produces accurate errors if it is moving around the true global minimum. In reality, slight errors in the limb darkening laws, the finite resolution of our integration-time compensation techniques and the error in the assumed photometric weights themselves mean that our favorite solution is actually slightly offset from the true solution. Consequently, the MCMC jumps sample a larger volume of parameter space than if they were circling the true solution.

Despite the wide-spread use and therefore advocation of the MCMC method for transit analysis, it is also unclear whether the MCMC method does produce completely robust errors, particularly in light of the inevitable hidden systemic effects. Therefore searching for excess variance in the values may not be a completely reliable method of searching for signals, echoed recently by Gibson et al. (2009). Concordantly, in all of our analyses, we compare the of a static model versus that of a signal and compute the F-test. The F-test has the advantage that it does care not about the absolute values, only the relative changes. Furthermore, it penalizes models which are overly complex (Occam’s razor).

We therefore compute periodograms by moving through a range of periods in small steps of 1/1000 of the transit period and then fitting for a sinusoidal signal’s amplitude and phase in each case. We compute the of the new fit and then use an F-test to compute the false alarm probability (FAP) to give the F-test periodogram. Any peaks below 5% FAP are investigated further and we define a formal detection limit of 3- confidence. Compared to the generalized Lomb-Scargle periodogram, the F-test is more conservative as it penalizes models for being overly-complex and therefore offers a more prudent evaluation of the significance of any signal. Conceptually, we can see that the difference is that a Lomb-Scargle periodogram computes the probability of obtaining a periodic signal by chance whereas the F-test periodogram computes the statistical significance of preferring the hypothesis of a periodic signal over a null hypothesis.

#### Analysis of variance

In general, there are two types of periodic signals we are interested in searching for; those of period greater than the sampling rate (long-period signals) and those of periods shorter than the sampling rate (short-period signals). An example of a long-period signal would be an exterior perturbing planet and a short-period signal example would be a companion exomoon. Periodograms are well suited for detecting long period signals above the Nyquist frequency, which is equal to twice the sampling period. Short-period signals cannot be reliably seeked below the Nyquist frequency. First of all, a period equal to the sampling rate will always provide a strong peak. Periods of an integer fraction of this frequency will also present strong significances (e.g. 1/2, 1/3, 1/4, etc). As the interval between these aliases becomes smaller, very short-period signals become entwined. If a genuine short-period signal was present, it would exhibit at least partial frequency mixing with one of these peaks to create a plethora of short-period peaks. As a result, peaks in the periodogram below cannot be considered as evidence for authentic signals. Therefore, we only plot our periodograms in the range of to twice the total temporal coverage.

To resolve this issue, it is possible to derive an exomoon’s period by taking the ratio of the TTV to TDV r.m.s. amplitudes (Kipping, 2009a, b). Since both of these signals are very short period, the data would ostensibly be seen as excess scatter. These excess scatters can be used to infer the r.m.s. amplitude of each signal and then derive the exomoon’s period from the ratio. Excess scatter, or variance, may be searched for by applying a simple test of the data, assuming a null-hypothesis of no signal being present. If a signal is found, or we wish to place constraints on the presence of putative signal amplitudes, we use the -distribution due to both Gaussian noise plus an embedded high frequency sinusoidal signal to model the variation.

#### Phasing

An unusual aspect of the individual data sets is that consecutive data points are 30 minutes apart. Despite accounting for the long integration times in our modeling (see later in §3.5), we consider the possibility that the long cadence data could produce artificial signals in the parameter variation search. We propose that one method of monitoring the effects of the long integration time is what we denote as phasing.

Using the globally fitted ephemeris, we know the mid-transit time for all transits (assuming a non-variable system). We may compare the difference between this mid-transit time and the closest data point in a temporal sense. This time difference can take a maximum value of minutes (assuming no data gaps) and will vary from transit to transit. We label this time difference as the phasing. In all searches for parameter variations, a figure showing the phasing will also be presented. This allows us to check whether any putative signals are correlated to the phasing, which would indicate a strong probability the signal is spurious.

## 3. Data processing

### 3.1. Barycentric Julian Dates

The radial velocity data from Kepler is provided in barycentric Julian date (BJD) but the photometry is given in heliocentric Julian data (HJD). For consistency, we calculate the difference between HJD and BJD for every time stamp and apply the appropriate correction. We use the JPL Horizons ephemeris to perform this correction, which is important given that the difference between HJD and BJD can be a few seconds. We do not correct times from the UTC system to TDB/TT (Eastman et al., 2010) as the difference is effectively a constant systematic due to leap seconds which accumulate over decades timescale. Further, recent Kepler data products (e.g. Kepler Data Release 5) use the BJD system and thus there is a preference to remain consistent with the probable future data format.

### 3.2. Measurement uncertainties

Unfortunately, the Kepler pipeline output did not include photometric uncertainties which forces us to evaluate the measurement errors ourselves. For Kepler-4b, Kepler-5b, Kepler-6b and Kepler-7b, we found that the r.m.s. scatter was constant over the observational period. In order to estimate the standard deviation, we use the median absolute deviation (MAD) from Gauss (1816) due to its robustness against outliers. Assuming Gaussian errors, the standard deviation is given by 1.4286 multiplied by the MAD value. This value for the standard deviation is then assigned to all data points as their measurement uncertainty.

For Kepler-8b, this approach was no possible since the r.m.s. scatter varies with time. To estimate the errors, we calculate the standard deviation using MAD in 50 point bins, excluding the transits and secondary eclipses. This binning window is shifted along by one data point and then repeated until we reach the end of the time series. The MAD-derived standard deviations are then plotted as a function of median time stamp from the binning window. Finally, we find that a quadratic fit through the data provides an excellent estimation of the trend. This formula is then used to generate all measurement uncertainties.

### 3.3. Outliers

Although the data has been ‘cleaned’ by the Kepler pipeline (Jenkins et al., 2010b), we run the data through a highly robust outlier detection algorithm developed by Beaulieu et al. (2010) using the MAD method. Essentially, we use MAD to estimate the standard deviation and then estimate a sigma clipping level. This is found by setting the sigma clipping level to a value such that if all the errors were Gaussian, we would only expect to reject one valid data point by accident, which is a function of the array length.

For each system we use MAD to estimate the outlier-robust standard deviation to be  ppm,  ppm,  ppm and  ppm for Kepler-4b through 7b respectively. For Kepler-8b we find that the standard deviation of the photometry is variable as described in the previous subsection. We present the list of rejected outlier points in Table 2 of the appendix.

### 3.4. Limb Darkening

Accurate limb darkening coefficients were calculated for the Kepler bandpass for each planet. For the Kepler-bandpass, we used the high resolution Kepler transmission function found at http://keplergo.arc.nasa.gov/CalibrationResponse.shtml. We adopted the SME-derived stellar properties reported in the original discovery papers. We employed the Kurucz (2006) atmosphere model database providing intensities at 17 emergent angles, which we interpolated linearly at the adopted and values. The passband-convolved intensities at each of the emergent angles were calculated following the procedure in Claret (2000). To compute the coefficients we considered the following expression:

 I(μ)I(1)=1−2∑k=1ck(1−μ)k, (9)

where is the intensity, is the cosine of the emergent angle, and are the coefficients. The final coefficients resulted from a least squares singular value decomposition fit to 11 of the 17 available emergent angles. The reason to eliminate 6 of the angles is avoiding excessive weight on the stellar limb by using a uniform sampling (10 values from 0.1 to 1, plus ), as suggested by Díaz-Cordovés et al. (1995). This whole process is performed by a Fortran code written by I. Ribas (Beaulieu et al., 2010). The coefficients are given in the final parameter tables for each planet.

### 3.5. Integration Time

The Kepler long-cadence (LC) photometry is produced by the onboard stacking of 270 images of 6.02 seconds exposure. There is a 0.52 second read-time on-top of the exposure time which leads to a net duty-cycle of 91.4%. The duty cycle is sufficiently high that we can consider the LC photometry to be equivalent to one long exposure of 29.4244 minutes. As a result of such a long integration, sharp changes in the photometry, due to say a transit ingress, are smeared out into broader morphologies. In itself, this does not pose a problem since the nature of this smearing is well understood and easily predicted.

Kipping (2010b) discusses in detail the consequences of finite integration times, with particular focus on Kepler’s LC photometry. The effects of the smearing may be accounted for by using a time-integrated flux model for the lightcurve, as opposed to an instantaneous flux model. If we denote the flux predicted at any instant by , the time-integrated flux by and the integration time by , the observed flux will be given by:

 ~F––(~t––)=∫~t+I/2t=~t−I/2F(t)dt∫~t+I/2t=~t−I/2dt (10)

Kipping (2010b) proposes two methods of numerically integrating this expression; Simpson’s composite rule and re-sampling. In this work, both of us compensate for the effect in different ways. Method A employs Simpson’s composite rule and method B employs re-sampling. In both cases, we use the expressions of Kipping (2010b), given below in equations (5) and (6) to choose our numerical resolution such that the maximum photometric error induced by the finite numerical resolution is less than the observed photometric uncertainties, as seen in Table 1.

 σ~FComp.Simp. =δτI24m2 (11) σ~FResampling =δτI8N2 (12)

, where is the transit depth, is ingress duration, and is the integration time.

## 4. Kepler-4b

### 4.1. Global Fits

The discovery of Kepler-4b was made by Borucki et al. (2010b). The planet is particularly interesting for joining the club of HAT-P-11b and GJ 436b as a Neptune-mass transiting exoplanet. Kepler-4b exhibits a sub-mmag transit around a magnitude star, which leads to relatively large uncertainties on the system parameters but demonstrates the impressive performance of the Kepler photometry.

However, the combination of the course sampling (e.g. the ingress/egress duration is 3-4 times shorter than the cadence of the observations), the very low RV amplitude and a sub-mmag transit make Kepler-4b the most challenging object to determine reliable system parameters for in this paper. The BIC test for the approriate RV model prefers the simplest description possible, reflecting the low signal-to-noise levels encountered for the radial velocities.

The largest difference between our own fits and that of Borucki et al. (2010b) is the retrieved and impact parameter, . Borucki et al. (2010b) find an equatorial transit solution whereas our method A circular fit, and both modes of method B, place Kepler-4b at an impact parameter of 0.5-0.6. Curiously, the eccentric fit of method A prefers an even larger impact parameter than this, as a result of letting the limb darkening be fitted.

Due to the well-known negative correlation between and (Carter et al., 2008), these larger values lead to lower values and consequently a significantly lower lightcurve derived stellar density, . Indeed, the lower stellar density results in one of the largest stellar radii out of all the known transiting systems. The fitted models on the phase-folded lightcurves are shown on Figure 2 using method B. Correlated noise was checked for in the residuals using the Durbin-Watson statistic, which finds , well inside the 1% critical boundary of . The orbital fits to the RV points are shown in Fig. 3 (method B), depicting both the circular and eccentric fits. The final table of all results are shown in Table 2.

#### Eccentricity

From Table 14 of the Appendix, the circular fit with no other free parameters is the preferred model to describe the radial velocities of Kepler-4b. We may perform other tests aside from the BIC model selection though. The global circular fit, using method A, yields a and the eccentric fit obtains . These values correspond to the lowest solution of the simplex global fit, but for the RV points only, which dominate the eccentricity determination. Assuming the period and epoch are essentially completely driven by the photometry, the number of free parameters are 2 and 4 for the circular and eccentric fits respectively. Therefore, based upon an F-test, the inclusion of two new degrees of freedom to describe the eccentricity is justified at the 1.7- level (91.5% confidence). Another test we can implement is the Lucy & Sweeney (1971) test, where the significance of the eccentric fit is given by:

 P(e>0)=1−exp[−^e22σ2e] (13)

Where is the modal value of and is the error (in the negative direction). We find that this gives a significance of 88.2% or 1.6-, in close agreement with the F-test result. We also computed the posterior distribution of the distance of the pericentre passage, in units of the stellar radius, and found for method A. Therefore, if the eccentric fit is the true solution, Kepler-4b would make an extremely close pericentre passage.

We note that Borucki et al. (2010b) found a best-fit eccentricity of (no quoted uncertainty) but the authors concluded the eccentric fit was not statistically significant, but provided no quantification. Based upon the current observations, it is not possible to make a reliable conclusion as to whether Kepler-4b is genuinely eccentric or not. At best, it can be considered a - marginal detection of eccentricity. The only assured thing we can say is that to 95% confidence. Despite the ambiguity of the eccentricity, it is interesting to consider the consequences of this object possessing .

For the eccentric fit, the extreme proximity of this planet to the star raises the issue of tidal circularization. Let us continue under the assumption that no third body is responsible for pumping the eccentricity of Kepler-4b. For a planet initially with , the eccentricity decreases to after circularization timescales. This means the maximum number of circularization timescales which have transpired is given by . Since (age of the system divided by circularization timescale) then we may write . Using the method of Adams & Laughlin (2006), the tidal circularization time may be written as:

 τcirc=QP463P2πMpM⋆(aR⋆1p)5(1−e2)13/2 (14)

Given that , we may compute the minimum possible value of through re-arrangement:

 QP≥6342πPM⋆Mp(p(a/R⋆))5τage−log(e)1(1−e2)13/2 (15)

The final term, may be neglected since we are interested in the minimum limit of and this term only acts to further increase the for non-zero . The advantage of doing this is that is a function of time and thus we can eliminate a term which would otherwise have to be integrated over time. Taking the posterior distribution of the equation for method A yields . We find that 99.9% of the MCMC trials correspond to for the eccentric fit. Thus, if the eccentric fit was genuine, Kepler-4b would exhibit very poor tidal dissipation. This is somewhat reminiscent of GJ 436b (Deming et al., 2007) and HAT-P-11b (Bakos et al., 2009) for which a large value is also predicted and raises the interesting question as to whether hot-Neptune generally possess larger values than their Jovian counterparts. We also note that WASP-18b has also been proposed to possess a greater-than-Jovian- (Hellier et al., 2009).

This discussion highlights the importance of obtaining more statistics regarding the eccentricity of transiting hot-Neptunes, which may reveal some fundamental insights into the origin of these objects.

#### Secondary eclipse

We here consider the circular model only since the eccentric fit is not unambiguously accepted. Global fits do not find any significant detection of the secondary eclipse. Secondary depths  ppm are excluded to 3- confidence, which corresponds to geometric albedos greater than unity and thus this places no constraints on the detectability of reflected light from the planet. The thermal emission is limited by this constraint such that  K, which places no constraints on redistribution.

#### Properties of the parent star Kepler-4

The Yonsei-Yale model isochrones from Yi et al. (2001) for metallicity = are plotted in Figure 4, with the final choice of effective temperature and  marked, and encircled by the 1 and 2 confidence ellipsoids, both for the circular and the eccentric orbital solution from method B. Here the MCMC distribution of  comes from the global modeling of the data. Naturally, errors of the stellar parameters for the eccentric solution are larger, due to the larger error in , which, in turn, is due to the uncertainty in the Lagrangian orbital parameters and (as opposed to fixing these to zero in the circular solution).

The stellar evolution modeling provides color indices that may be compared against the measured values as a sanity check. The best available measurements are the near-infrared magnitudes from the 2MASS Catalogue (Skrutskie et al., 2006). For Kepler-4, these are: , and ; which we have converted to the photometric system of the models (ESO system) using the transformations by Carpenter (2001). The resulting measured color index is . This is within 2- of the predicted value from the isochrones of (see 2). The distance to the object may be computed from the absolute magnitude from the models () and the 2MASS magnitude, which has the advantage of being less affected by extinction than optical magnitudes. The result is  pc, where the uncertainty excludes possible systematics in the model isochrones that are difficult to quantify.

### 4.2. Transit timing analysis

#### Analysis of variance

We find TTV residuals of 207.8 seconds and TDV residuals of 217.0 seconds using method A. Repeating the TTV analysis for method B finds best-fit times of consistently less than 0.25- deviance (we note that similar results were found for all planets). The TTV indicates timings consistent with a linear ephemeris, producing a of 5.7 for 11 degrees of freedom. This is somewhat low with a probability of 10.9% of occurring by coincidence. This may be an indication that the MCMC methods adopted here yield artificially large error bars, perhaps due to our choice of free-fitting every lightcurve rather than fixing certain parameters. However, we prefer to remain on the side prudence by overestimating such errors rather than underestimating them. The TDV too is consistent with a non-variable system exhibiting a of 6.4 for 12 degrees of freedom (10.6% probability of random occurrence).

Under the assumption that the timing uncertainties are accurate or overestimated, it is possible for us to determine what signal amplitudes are excluded for waveforms of a period less than the temporal baseline. For the TTV, we note that for 11 degrees of freedom, excess scatter producing a of 28.5 would be detected to 3- confidence. This therefore excludes a short-period signal of r.m.s. amplitude  seconds. Similarly, for the TDV, we exclude scatter producing a of 30.1, or a short-period r.m.s. amplitude of  seconds, to 3- confidence. This constitutes a 4.9% variation in the transit duration.

These limits place constraints on the presence of perturbing planets, moons and Trojans. For the case a 2:1 mean-motion resonance perturbing planet, the libration period would be  cycles (2.9 years) and thus we do not possess sufficient baseline to look for such perturbers yet. However, this libration period is less than the mission lifetime of Kepler (3.5 years) and so assuming the same timing precision for all measurements, the TTV will be sensitive to a 0.14  perturber.

The current TTV data rules out an exomoon at the maximum orbital radius (for retrograde revolution) of mass . The TDV data rules an exomoon of at a minimum orbital distance.

Using the expressions of Ford & Holman (2007) and assuming , the expected libration period of Kepler-4b induced by Trojans at L4/L5 is 50.5 cycles and therefore we do not yet possess sufficient temporal baseline to look for the TTVs. However, inspection of the folded photometry at from the transit centre excludes Trojans of effective radius to 3- confidence.

#### Periodograms

As discussed in section §2.3.2, we may use an F-test periodogram to search for periodic signals, as this negates any problems introduced by underestimating or overestimating timing uncertainties. Whilst a search for excess variance is sensitive to any period below the temporal baseline, periodograms are limited to a range of .

There are no peaks in the TTV periodogram below 5% FAP. The TDV does seem to exhibit a significant peak, which can also be seen by eye in the TDV plot itself, at a period of  cycles, amplitude  seconds and FAP 2.1% (2.3-). A linear trend in the durations could be confused with a long period signal. Fitting a simple model yields  seconds and  seconds/cycle with FAP 2.6% (2.2-). Therefore, the two models yield very similar significances and it is difficult to distinguish between these two scenarios. The linear drift model seems improbable due to the very rapid change in duration the trend implies. The gradient equates to a change of around one hour per year in the half-duration, i.e. two hours per year in . The period of the signal is too large to be caused by an exomoon and TDV is generally insensitive to perturbing planets, meaning that any planet which could produce such a large effect would dominate the radial velocity. Other TDV effects, for example apsidal precession, are not expected to occur on timescales of 10-20 days which puts the reality of the signal in question.

The phasing of the measurements, as defined in §2.3.4, yields a clear periodic waveform of period 4 cycles. Therefore one possible explanation for the TDV signal is a mixing of this phasing frequency with the Nyquist frequency to produce a signal of period 16 cycles, which is consistent with the value determined earlier. The analysis of further measurements is evidently required to assess the reality of the putative signal.

### 4.3. Depth and OOT Variations

The transit depth is consistent with the globally fitted value yielding a of 4.3 for 12 degrees of freedom. Similarly, the OOT levels are flat yielding a of 6.1 for 12 degrees of freedom.

## 5. Kepler-5b

### 5.1. Global fits

#### Comparison to discovery paper

Kepler-5b was discovered by Koch et al. (2010). Our global fits reveal Kepler-5b to be a planet on an orbit consistent with that of a circular orbit, transiting the host star at a near-equatorial impact parameter. The fitted models for method B of the phase-folded lightcurves are shown on Figure 6. Correlated noise was checked for in the residuals using the Durbin-Watson statistic, which finds , which is well within the 1% critical level of . The orbital fits to the RV points are shown in Fig. 9 for method B, depicting both the circular and eccentric solutions. We obtain transit parameters broadly consistent with that of the discovery paper, except for where method A favors an equatorial transit and method B favors a near-equatorial transit, whereas Koch et al. (2010) found . The final parameters are summarized in Table 4. Lightcurve fits reveal that the theoretical limb darkening values differ from the fitted values by a noticeable amount and the residuals of method B do reveal some structure as a consequence. Future SC mode observations will pin down the limb darkening to a much higher precision.

#### Eccentricity

The global circular fit of method A yields a for the RVs and the eccentric fit obtains . Based on an F-test, the inclusion of two new degrees of freedom to describe the eccentricity is justified below the 1- level and is therefore not significant. We therefore conclude that the system is consistent with a circular orbit and constrain to 95% confidence.

#### Secondary eclipse

We here only consider the circular fits since the eccentric solution is not statistically significant. Method B provides the more accurate constraints due the use of multiple baseline scalings. Proceeding with method B, the secondary eclipse is weakly detected in our analysis to 93.4% confidence, or 1.8-, with a depth of  ppm.

The MCMC distribution of the  values is shown in Figure 8. The discovery paper reported a weak 2.0- detection of the eclipse but did not provide an amplitude for the signal. An eclipse of depth  ppm is excluded at the 3- level, which excludes a geometric albedo to the same level.

If due to thermal emission, we find that by integrating a Planck function with the Kepler response function, the planetary brightness temperature would have to be  K, which is inconsistent with that expected for the equilibrium temperature of  K. Assuming negligible albedo and solving for the redistribution factor implies a factor of 3.7, which surpasses the maximal physically permitted value of 8/3 (corresponding to 2300K). Internal heating also seems to be unlikely as the heat source would have to be able to generate a surface temperature of 2300 K alone, or have an intrinsic luminosity of which is approximately that of a M7V star. Tidal heating from eccentricity seems improbable given that the orbit is consistent with a circular orbit. However, using our best-fit eccentricity and the equations of Peale & Cassen (1978), we estimate that tidal heating could induce for and . In order to get enough tidal heating, we require , which is much less than the typical values expected. Furthermore, such a low leads to very rapid circularization which somewhat nullifies the argument.

If the eclipse was due to reflection, we estimate a required geometric albedo of . This latter option does offer a plausible physical origin for the secondary eclipse and therefore should be considered as a possible explanation for the observation. Although there has been a general impression that hot-Jupiters must have low albedos after the remarkable MOST constraints of for HD 209458b by Rowe et al. (2008), a recent study by Cowan & Agol (2010) finds evidence for hot-Jupiters having much higher albedos () in a statistical sample of 20 exoplanets. Therefore, a geometric albedo of 0.15 is not exceptional in such a sample and offers much more reasonable explanation that thermal emission. However, at this stage, the 2- significance is too low to meet the requirements for formal detection.

#### Properties of the parent star Kepler-5

The Yonsei-Yale model isochrones from Yi et al. (2001) for metallicity = are plotted in Figure 10, with the final choice of effective temperature and  marked, and encircled by the 1 and 2 confidence ellipsoids, both for the circular and the eccentric orbital solution. Here the MCMC distribution of  comes from the global modeling of the data.