Of ‘Cocktail Parties’ and Exoplanets
Abstract
The characterisation of ever smaller and fainter extrasolar planets requires an intricate understanding of one's data and of the analysis techniques used. Correcting the raw data at the $10^{-4}$ level of accuracy in flux is one of the central challenges. This can be difficult for instruments that do not feature a calibration plan for such high-precision measurements. Here, it is not always obvious how to decorrelate the data using auxiliary information of the instrument, and it becomes paramount to know how well one can disentangle instrument systematics from one's data, given nothing but the data itself. We propose a non-parametric machine learning algorithm, based on the concept of independent component analysis, to deconvolve the systematic noise and all non-Gaussian signals from the desired astrophysical signal. Such a 'blind' signal de-mixing is commonly known as the 'Cocktail Party problem' in signal processing. Given multiple simultaneous observations of the same exoplanetary eclipse, as in the case of spectrophotometry, we show that we can often disentangle systematic noise from the original light curve signal without the use of any complementary information about the instrument. In this paper, we explore these signal extraction techniques using simulated data and two data sets observed with the Hubble/NICMOS instrument. Another important application is the decorrelation of the exoplanetary signal from time-correlated stellar variability. Using data obtained by the Kepler mission, we show that the desired signal can be deconvolved from the stellar noise using a single time series spanning several eclipse events. Such non-parametric techniques can provide important confirmations of the existing parametric corrections reported in the literature, and of their associated results. Additionally, they can substantially improve the precision of exoplanetary light curve analysis in the future.
1 Introduction
The field of transiting extrasolar planets, and especially the study of their atmospheres, is one of the youngest and most dynamic subjects in current astrophysics. Permanently at the edge of technical feasibility, we have come from the first radial velocity and transit detections (Mayor & Queloz, 1995; Marcy et al., 1998; Charbonneau et al., 2000), via the first detections of molecular features in hot-Jupiter atmospheres (Charbonneau et al., 2002), to ever more detailed characterisations of multitudes of systems (Grillmair et al., 2008; Charbonneau et al., 2008; Snellen et al., 2010b; Bean, 2011; Swain et al., 2008, 2009a, 2009b; Tinetti et al., 2007, 2010). With over 700 exoplanets discovered (Schneider et al., 2011) and over 1200 exoplanetary candidates that await confirmation (Borucki et al., 2011), the focus of interest shifts from the detection to the characterisation of smaller and smaller targets. The governing factor of this progression is the precision at which we can control our instrument and/or stellar systematics, and hence the accuracy with which we can analyse the data.
To minimise the impact of the systematic noise components, different approaches have been proposed in the past. For space- and ground-based observations, e.g. Spitzer and Hubble (e.g. Agol et al., 2010; Beaulieu et al., 2008, 2011; Charbonneau et al., 2002, 2005, 2008; Deming et al., 2007; Gillon et al., 2010; Grillmair et al., 2008; Knutson et al., 2007a, b; Sing et al., 2011; Snellen et al., 2010b; Bean et al., 2011; Swain et al., 2008, 2009a, 2009b; Tinetti et al., 2007, 2010), instrumental systematic noise has been approximated using parametric models, often based on auxiliary information (optical state vectors) such as instrumental temperature, orbital inclination, and inter- and intra-pixel positions of the point spread function. Using optical state vectors to decorrelate one's data is an effective technique (Swain et al., 2008). However, for instruments that lack a calibration plan at the precision of $10^{-4}$, the accuracy of the retrieved optical state vectors (e.g. sensor sampling rates) and the adequacy of the instrument model's definition itself become difficult to determine. Some of the recent controversy over results reported by various teams can be attributed to this circumstance (Knutson et al., 2011; Stevenson et al., 2010; Beaulieu et al., 2011; Swain et al., 2008; Gibson et al., 2011; Pont et al., 2010; Hatzes et al., 2010; Brunt et al., 2010).
The situation is further complicated by brightness variability of the planet-hosting star, in particular for small, very active M-stars. Hence, it is important to work towards an alternative route to quantify or remove systematic noise using non-parametric models that work as independent confirmations of existing results. Carter & Winn (2009), Thatte et al. (2010), Gibson et al. (2011) and Waldmann et al. (2011) have progressed towards non-parametric noise models and signal separation using wavelets, principal component analysis, Gaussian processes and Fourier-based analysis.
In this publication, we propose a new non-parametric method to separate systematic noise from the desired light curve signal. Given multiple light curves, observed simultaneously using spectrographs for example, we can disentangle our desired astrophysical signal from other time-correlated or non-Gaussian systematic noise sources using unsupervised machine learning algorithms. We furthermore explore the decorrelation of individual time series spanning several consecutive eclipse events. The importance of this work lies with the fact that no additional knowledge of the system or the star is required, besides the observations themselves. Such non-parametric methods provide a potential new route to decorrelate one's data in cases where the instrument does not feature an adequate calibration plan, the quality of the auxiliary information of the instrument is non-optimal, or the host star shows significant activity. Such blind deconvolution techniques provide new insight and powerful validation of the established parametric instrumental models reported in the literature.
Here we will briefly introduce the more general theory of blind-source separation and proceed with a description of the proposed algorithm. The efficiency of said algorithm is tested with two synthetic models, two HST/NICMOS data sets available in the public domain, and one Kepler (Q2 & Q3 data release) time series featuring strong time-correlated stellar variability. Future publications will focus on in-depth applications of the proposed algorithm to specific data sets.
2 Background: the Cocktail Party Problem
In this section we will briefly describe the fundamental concepts on which the following analysis is based. Readers familiar with independent component analysis may proceed straight to section 3.
Let us consider the analogy of three people talking simultaneously in one room. The speech signals of these people are denoted by $s_1(t)$, $s_2(t)$ and $s_3(t)$. In the same room are three microphones recording the observed signals $x_1(t)$, $x_2(t)$ and $x_3(t)$. The observed signals can be expressed in terms of the original speech signals:
(1)  $x_i(t) = a_{i1}\,s_1(t) + a_{i2}\,s_2(t) + a_{i3}\,s_3(t), \qquad i = 1, 2, 3$
Instead of assuming $x_i(t)$ and $s_j(t)$ to be proper time signals, we drop the time dependence and assume them to be random variables
(2)  $x_i = \sum_{j=1}^{N} a_{ij}\, s_j$
where $a_{ij}$ is a weighting factor (in this case depending on the square of the distance of the speakers to the microphone); the $a_{ij}$ are some real coefficients and $N$ is the maximum number of observed signals. The individual time series can also be expressed in terms of vectors, where bold lowercase letters denote column vectors and uppercase letters denote matrices:
(3)  $X = AS$
where the rows of $X$ comprise the individual time series, $x_i$, and similarly $S$ is the signal matrix of the individual source signals $s_j$. $A$ is the 'mixing matrix' consisting of the weights $a_{ij}$. Equation 3 is also known as the instantaneous mixing model and often referred to as the classical 'Cocktail Party Problem' (Hyvärinen & Oja, 2001; Hyvärinen, 1999).
The challenge is to estimate the mixing matrix $A$, and its (pseudo)inverse the de-mixing matrix, $W = A^{-1}$,
(4)  $S = WX$
given the observations contained in $X$, without any additional prior knowledge of either $A$ or $S$ (or, for some methods, with only weak restrictions on $A$ & $S$).
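The instantaneous mixing model of equations 3 & 4 can be sketched numerically; the following is a minimal toy example (not from the paper) in which the mixing matrix is known, so the de-mixing matrix is simply its inverse. The specific sources and seed are illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)

# Three hypothetical source signals (rows of S): a dip, a sine and noise.
n = 1000
t = np.linspace(0.0, 1.0, n)
s1 = np.where(np.abs(t - 0.5) < 0.1, -1.0, 0.0)   # eclipse-like dip
s2 = np.sin(2.0 * np.pi * 7.0 * t)                # periodic systematic
s3 = rng.normal(0.0, 1.0, n)                      # Gaussian noise
S = np.vstack([s1, s2, s3])

# Instantaneous mixing model X = AS with a random (invertible) mixing matrix.
A = rng.uniform(0.5, 2.0, size=(3, 3))
X = A @ S

# With A known, the de-mixing matrix W = A^-1 recovers the sources exactly;
# the whole difficulty of blind-source separation is that A is NOT known.
W = np.linalg.inv(A)
S_rec = W @ X
print(np.allclose(S_rec, S))   # True
```

The blind problem, addressed in the remainder of the section, is to estimate $W$ from $X$ alone.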
Several algorithms have been proposed to find the linear transformation of equation 3. Amongst these are principal component analysis (PCA) (Pearson, 1901; Manly, 1994; Jolliffe, 2002; Press et al., 2007; Oja, 1992), factor analysis (FA) (Jolliffe, 2002; Harman, 1967), projection pursuit (PP) (Friedman, 1987; Huber, 1985) and the more recently developed independent component analysis (ICA) (Comon, 1994; Hyvärinen, 1999; Hyvärinen et al., 1999; Hyvärinen & Oja, 2000; Hyvärinen & Oja., 2001; Comon & Jutten, 2000; Stone, 2004).
The underlying differences between PCA and FA on one hand and ICA and PP on the other are the underlying assumptions on the probability distribution functions (pdfs) of the signals comprising $X$. The former group assumes the signals to follow 1) a Gaussian distribution, whilst the latter assume the signals to be 2) predominantly non-Gaussian, or sparse with specific signatures in the spectral domain (e.g. SMICA; Delabrouille et al., 2003). This results in significant differences in the way we estimate our signal components.
1) If the observed signals comprising $X$ follow Gaussian distributions, we can define their statistics by the first and second statistical moments (mean and covariance) only. Algorithms such as PCA and FA find a linear transformation from the correlated observed signals, $X$, to a set of uncorrelated signals, $S$. In this case, uncorrelatedness is equivalent to mutual independence and the source signals are at their most separated (please see Appendix A for a more in-depth discussion of independence). Such a linear transformation is always possible and easily achieved using, for example, singular value decomposition (SVD) (Pearson, 1901; Manly, 1994; Jolliffe, 2002; Press et al., 2007). An application of PCA to exoplanetary light curve detrending is given by Thatte et al. (2010).
2) In the case of the observed signals following non-Gaussian distributions, significant information is contained in the higher statistical moments (skew & kurtosis), and it can be shown that uncorrelated signals (as produced by PCA & FA) are not necessarily mutually independent and hence not optimally separated from one another. Here uncorrelatedness is a weaker constraint than independence: independent signals are always uncorrelated, but not vice versa (see Appendix A). As a consequence, using PCA or FA algorithms may only yield a partially separated result for non-Gaussian sources.
It is important to note that most observed signals, astrophysical or stellar/instrumental noise, are predominantly non-Gaussian by nature. We can also state that most of these signals should be statistically independent from one another (e.g. an exoplanet light curve signal is independent of the systematic noise of the instrument with which it was recorded). These properties have led to a surge in ICA-based analysis methods in current cosmology and extragalactic astronomy. Here ICA is used to separate the cosmic microwave background (CMB) or signatures of distant galaxies from their galactic foregrounds (e.g. Stivoli et al., 2006; Maino et al., 2002, 2007; Wang et al., 2010). Aumont & Macías-Pérez (2007) furthermore separate instrumental noise from the desired astrophysical signal. Other applications include data compression of sparse, large data sets to improve model-fitting efficiencies (e.g. Lu et al., 2006; Delabrouille et al., 2003).
2.1 ICA in the context of exoplanetary light curves
In this publication, we focus on the application of ICA to exoplanetary light curve analysis. Let us consider multiple time series observations of the same exoplanetary eclipse signal, either in parallel, by performing spectrophotometry with a spectrograph, or consecutively in time (as explained in section 5.3).
Without excluding the most general case, let us focus on a time-resolved spectroscopic measurement of an exoplanetary eclipse. For most observations, the signal recorded is a mixture of astrophysical signal, Gaussian (white) noise and systematic noise components originating from instrumental defects and other sources such as stellar activity and telluric fluctuations. We can therefore write the individual time series as the sum of the desired astrophysical signal, $s_a$, systematic (non-Gaussian) noise components, $s_{sn}$, and Gaussian noise, $s_{wn}$. We can now define the underlying linear model of our time series data to be
(5)  $x_i = a_{ia}\,s_a + \sum_{k=1}^{K} a_{ik}\,s_{sn,k} + \sum_{l=1}^{L} a_{il}\,s_{wn,l}$
or
(6)  $X = AS, \qquad S = \left(s_a,\ s_{sn,1},\dots,s_{sn,K},\ s_{wn,1},\dots,s_{wn,L}\right)^T$
where $K$ and $L$ are the number of systematic and white noise components respectively, assuming only one component is astrophysical.
2.2 Demixing signals using ICA
The basic assumptions of ICA are that the elements comprising $S$, $s_i$, are mutually independent random variables with probability densities $p_i(s_i)$. We further assume that all (or at least all but one) of the probability densities $p_i(s_i)$ are non-Gaussian. This non-Gaussianity is key since it allows the de-mixing matrix, $W$, to be estimated. From the central limit theorem, we know that a convolution of $N$ arbitrary probability distribution functions (pdfs) that feature a formal mean and variance asymptotically approaches a Gaussian distribution in the limit of large $N$ (Riley et al., 2002). In other words, the sum of any two non-Gaussian pdfs (i.e. $p_1(s_1)$ and $p_2(s_2)$) is more Gaussian than either of the original pdfs. Therefore, by maximising the non-Gaussianity of the individual signals, we maximise their statistical independence (Comon, 1994; Hyvärinen, 1999; Hyvärinen & Oja, 2000; Koldovský et al., 2006; Hyvärinen & Oja, 2001; Comon & Jutten, 2000; Stone, 2004).
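The central-limit-theorem argument above can be checked numerically with a small sketch (illustrative only): the excess kurtosis of a single uniform variable is strongly negative, while that of the sum of two independent uniforms lies closer to the Gaussian value of zero.

```python
import numpy as np

rng = np.random.default_rng(1)

def excess_kurtosis(x):
    """Fourth standardised moment minus 3 (zero for a Gaussian pdf)."""
    x = x - x.mean()
    return np.mean(x**4) / np.mean(x**2)**2 - 3.0

# Two independent, strongly non-Gaussian (uniform) random variables...
u1 = rng.uniform(-1, 1, 200_000)
u2 = rng.uniform(-1, 1, 200_000)

k_single = excess_kurtosis(u1)    # ~ -1.2 for a uniform pdf
k_sum = excess_kurtosis(u1 + u2)  # ~ -0.6 (triangular pdf): closer to Gaussian

# ...whose sum is measurably more Gaussian than either one alone:
print(abs(k_sum) < abs(k_single))   # True
```

This is precisely why mixtures $x_i$ are "more Gaussian" than the sources, and why maximising non-Gaussianity undoes the mixing.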
2.2.1 Information Entropy
Although several measures of non-Gaussianity exist (we refer the reader to Cichocki & Amari (2002), Hyvärinen & Oja (2000), Hyvärinen & Oja (2001) and Comon & Jutten (2000) for detailed summaries), we here use the concept of 'negentropy' (Brillouin, 1953). Negentropy is derived from the basic information-theoretical concept of differential entropy. In information theory, in close analogy to thermodynamics, the entropy of a system is at its maximum when all data points are at their most random. In thermodynamics we measure the distribution of particles; in information theory it is the probability distribution of a random variable. From information theory we can derive the fundamental result that a Gaussian distribution has the largest entropy among all random variables of equal variance and known mean. Hence, by minimising the entropy of a variable, we maximise its non-Gaussianity. For a random vector $\mathbf{y}$ with random variables $y_i$, the entropy is given by
(7)  $H(\mathbf{y}) = -\int p(\mathbf{y}) \log p(\mathbf{y})\, d\mathbf{y}$
where $H$ is the differential or Shannon entropy (Shannon, 1948) and $p(\mathbf{y})$ is the pdf of the random vector $\mathbf{y}$. $H$ is at its minimum when $\mathbf{y}$ is at its most non-Gaussian. We can now normalise equation 7 to yield the definition of negentropy:
(8)  $J(\mathbf{y}) = H(\mathbf{y}_{gauss}) - H(\mathbf{y})$
where $\mathbf{y}_{gauss}$ is a random Gaussian vector with the same covariance matrix as $\mathbf{y}$. Now $\mathbf{y}$ is at its most non-Gaussian when $J(\mathbf{y})$ is at its maximum. It is important to note that negentropy is insensitive to multiplication by a scalar constant. This has the important consequence of introducing a sign and scaling ambiguity into the retrieved signal components of $S$. We will discuss the implications of this limitation in section 4.
2.2.2 Contrast functions
In practice it is very difficult to calculate the negentropy of a system, and various methods have been devised to approximate $J(y)$. The classic method is to measure the kurtosis of the mean-subtracted $y$ with unit variance. However, kurtosis is very prone to distortions by outliers in the data and hence lacks the robustness required as a measure of negentropy (Hyvärinen & Oja, 2001). To overcome this limitation, more robust measures of negentropy have been devised. One can approximate negentropy by equation 9 (Hyvärinen & Oja, 2001; Hyvärinen, 1999; Comon & Jutten, 2000)
(9)  $J(y) \propto \left[ E\{G(y)\} - E\{G(\nu)\} \right]^2$
where $\nu$ is a random Gaussian variable with zero mean and unit variance and $G$ is a non-quadratic contrast function chosen to approximate the underlying probability distribution. There are usually three types of contrast functions: $G_1$ as a general-purpose function, $G_2$ optimised for super-Gaussian (leptokurtic) distributions, and $G_3$ optimised for sub-Gaussian (platykurtic) distributions (Hyvärinen, 1999; Hyvärinen & Oja, 2001; Comon & Jutten, 2000):
(10)  $G_1(y) = \frac{1}{a_1}\log\cosh(a_1 y), \qquad G_2(y) = -\exp\left(-\frac{y^2}{2}\right), \qquad G_3(y) = \frac{1}{4}y^4$

where $1 \leq a_1 \leq 2$ is a constant.
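As a minimal numerical sketch (not the paper's implementation), the three contrast functions and the negentropy approximation of equation 9 can be written down directly; the seeds, sample sizes and the helper name `negentropy_proxy` are illustrative choices:

```python
import numpy as np

# The three standard contrast functions; a1 is a free constant, 1 <= a1 <= 2.
a1 = 1.0

def G1(y):   # general-purpose
    return np.log(np.cosh(a1 * y)) / a1

def G2(y):   # suited to super-Gaussian (leptokurtic) sources
    return -np.exp(-y**2 / 2.0)

def G3(y):   # suited to sub-Gaussian (platykurtic) sources
    return y**4 / 4.0

def negentropy_proxy(y, G, n=100_000):
    """Approximate J(y) ~ (E{G(y)} - E{G(nu)})^2 for zero-mean, unit-variance y."""
    nu = np.random.default_rng(2).normal(0.0, 1.0, n)   # Gaussian reference
    return (np.mean(G(y)) - np.mean(G(nu)))**2

# A strongly non-Gaussian (binary, unit-variance) variable scores higher
# than a Gaussian one under this measure:
rng = np.random.default_rng(3)
y_nongauss = np.sign(rng.normal(size=50_000))   # +/-1 values
y_gauss = rng.normal(size=50_000)
print(negentropy_proxy(y_nongauss, G1) > negentropy_proxy(y_gauss, G1))  # True
```

The proxy is zero (up to sampling noise) for a Gaussian input, which is exactly the property exploited when maximising non-Gaussianity.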
The choice of contrast function is only important if one wants to optimise the performance of the ICA algorithm, as is done for the EFICA (Koldovský et al., 2006) algorithm, where choices of contrast functions are tried iteratively.
2.2.3 FastICA
Finally, we can maximise the negentropy given in equation 9 by finding a unit vector $\mathbf{w}$ that maximises the non-Gaussianity of the projection $\mathbf{w}^T\mathbf{x}$, so that $J(\mathbf{w}^T\mathbf{x})$ is at its maximum. For the fixed-point FastICA algorithm, this can be achieved by the simple iteration scheme (Hyvärinen, 1999; Hyvärinen & Oja, 2000):

1. Choose an initial (e.g. random) weight vector $\mathbf{w}$
2. Increment: $\mathbf{w}^{+} = E\{\mathbf{x}\,g(\mathbf{w}^T\mathbf{x})\} - E\{g'(\mathbf{w}^T\mathbf{x})\}\,\mathbf{w}$
3. Normalise: $\mathbf{w} = \mathbf{w}^{+}/\|\mathbf{w}^{+}\|$
4. Repeat steps 2 & 3 until converged
where $g$ and $g'$ are the first and second derivatives of the contrast function $G$. The iteration scheme above estimates only one weight vector at a time; for estimating all non-Gaussian components in parallel, the iteration scheme and derivatives of $G$ are given in appendix C.1 and in the standard literature (e.g. Hyvärinen, 1999; Hyvärinen & Oja, 2001; Comon & Jutten, 2000; Koldovský et al., 2006). For a comprehensive summary of other ICA algorithms please refer to Comon & Jutten (2000). Throughout this paper, we use a variant of the FastICA algorithm.
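The one-unit fixed-point iteration can be sketched in a few lines of Python. This is a toy implementation with $g = \tanh$ (the derivative of the log-cosh contrast $G_1$), not the EFICA variant actually used in the paper; the source choices, mixing matrix, seed and the 0.9 correlation threshold are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(4)

# Two hypothetical non-Gaussian sources, mixed and then whitened.
n = 20_000
S = np.vstack([np.sign(rng.normal(size=n)),   # binary (sub-Gaussian)
               rng.laplace(size=n)])          # Laplacian (super-Gaussian)
X = np.array([[1.0, 0.6], [0.4, 1.0]]) @ S

# Whitening: zero mean and identity covariance (eigendecomposition of cov).
Xc = X - X.mean(axis=1, keepdims=True)
d, E = np.linalg.eigh(np.cov(Xc))
Z = (E @ np.diag(d**-0.5) @ E.T) @ Xc

# One-unit fixed-point iteration with g = tanh, g' = 1 - tanh^2:
w = rng.normal(size=2)
w /= np.linalg.norm(w)
for _ in range(200):
    gy = np.tanh(Z.T @ w)
    w_new = Z @ gy / n - np.mean(1.0 - gy**2) * w   # increment step
    w_new /= np.linalg.norm(w_new)                  # normalisation step
    converged = abs(abs(w @ w_new) - 1.0) < 1e-10   # fixed up to sign
    w = w_new
    if converged:
        break

# The projection w^T Z should match one source up to sign and scale:
y = w @ Z
corr = max(abs(np.corrcoef(y, S[0])[0, 1]), abs(np.corrcoef(y, S[1])[0, 1]))
print(corr > 0.9)
```

Note the sign ambiguity: convergence is tested on $|\mathbf{w}^T\mathbf{w}^{+}|$ because the fixed point is only defined up to $\pm\mathbf{w}$, mirroring the sign/scale ambiguity of ICA discussed above.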
2.2.4 Projection Pursuit and ICA
Projection Pursuit and Independent Component Analysis are inherently linked, as both algorithms try to extract the most non-Gaussian components in a multidimensional data set. In this sense, one can think of ICA as a variant of PP, with two major differences: 1) PP only estimates one non-Gaussian component at a time, whilst ICA is the multivariate extension of PP and estimates all non-Gaussian components simultaneously; 2) as opposed to ICA, PP does not need an underlying data model and no assumptions about independent components are made. If the ICA model holds, optimising the ICA non-Gaussianity measures produces independent components; if the model does not hold, what we get are the projection pursuit directions (Hyvärinen & Oja, 2000; Stone, 2004). This is an important point with regard to time-consecutive transit observations, which would otherwise break the underlying ICA assumptions.
3 The algorithm
Following from the discussion above, we can understand the signal de-mixing to be a two-step process: decorrelation of the Gaussian components in the observed data using PCA, followed by the decorrelation of non-Gaussian components using the ICA and WASOBI algorithms. The decorrelation of Gaussian components to form a new uncorrelated vector set can be understood as a pre-processing step to the ICA procedure. The algorithm proposed here consists of five main parts: 1) pre-processing of the observed data, 2) signal separation, 3) signal reconstruction, 4) light curve fitting and 5) post-analysis. Figure 1 lays out the individual processing steps of the algorithm.
3.1 Signal preprocessing
Similarly to section 2, the observed signals can be expressed as an $m \times n$ dimensional matrix $X$, where each row constitutes an individual time series, $x_i$, with $n$ data points. Multiple time series observations are needed to separate the instantaneously mixed non-Gaussian components. The process of identifying statistically independent components is greatly simplified if the input signals to any ICA algorithm have previously been whitened (also referred to as sphering). Whitening is essentially a transformation of our input data matrix $X$ into a mean-subtracted, orthogonal matrix $\tilde{X}$, whose autocovariance matrix, $E\{\tilde{X}\tilde{X}^T\}$, equals the identity matrix, $I$. The instantaneous mixing model for the whitened data is now given by
(11)  $\tilde{X} = C^{-1/2} X = \tilde{A} S$

where $C^{-1/2}$ is the inverse square root of the covariance matrix $C = E\{XX^T\}$ and $\tilde{A} = C^{-1/2}A$ is the corresponding mixing matrix of $\tilde{X}$. For a more detailed explanation see the Appendix.
This whitening is easily achieved by performing a principal component analysis on the data (see Appendix). This step has two distinct advantages:
1) It reduces the complexity of the problem: the unwhitened mixing matrix, $A$, has $n^2$ free parameters, whereas a whitened, orthogonal mixing matrix has only $n(n-1)/2$ (Hyvärinen & Oja, 2001). 2) Using whitening by principal components, we can reduce the dimensionality of the data set by only maintaining a subset of eigenvectors. This reduces possible redundancies of the components comprising the data and prevents the ICA algorithm employed later from over-learning for over-complete sets.
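The whitening step, including the optional dimensionality reduction, can be sketched as follows (an illustrative NumPy implementation, not the paper's code; the function name `whiten` and the test data are assumptions):

```python
import numpy as np

def whiten(X, k=None):
    """PCA whitening of an (m x n) data matrix X; optionally keep only the
    k leading principal components to reduce dimensionality."""
    Xc = X - X.mean(axis=1, keepdims=True)        # mean subtraction
    C = np.cov(Xc)                                # m x m covariance matrix
    d, E = np.linalg.eigh(C)                      # eigenvalues, ascending
    order = np.argsort(d)[::-1][:k]               # leading components first
    V = np.diag(d[order] ** -0.5) @ E[:, order].T # whitening matrix
    return V @ Xc, V

rng = np.random.default_rng(5)
X = rng.uniform(size=(4, 5000)) * np.arange(1, 5)[:, None]  # 4 scaled rows
Z, V = whiten(X)

# The whitened autocovariance equals the identity matrix:
print(np.allclose(np.cov(Z), np.eye(4), atol=1e-8))   # True
```

Calling `whiten(X, k=2)` would retain only the two leading eigenvectors, which is the redundancy/over-learning safeguard described above.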
We would also like to note that any type of additional linear signal cleaning or pre-processing step, such as those described by Carter & Winn (2009) and Waldmann et al. (2011), is allowed. Linear data filtering or cleaning can be understood as multiplying equation 3 from the left with a linear transformation $L$ to get: $LX = LAS$. The underlying data model assumed in this paper is hence not affected.
3.2 Signal separation
After the observed signals have successfully been whitened ($\tilde{X}$), we estimate the mixing matrix of the whitened signal, $\tilde{A}$, using the MULTICOMBI algorithm (Tichavský et al., 2006a). MULTICOMBI comprises two complementary algorithms, EFICA (Koldovský et al., 2006) and WASOBI (Yeredor, 2000). EFICA, an asymptotically efficient variant of the FastICA algorithm (Hyvärinen, 1999), is designed to separate non-Gaussian, instantaneously mixed signals. WASOBI, on the other hand, is an asymptotically efficient version of the SOBI algorithm (Belouchrani et al., 1997), and is geared towards separating Gaussian autoregressive (AR) and time-correlated components. It uses second-order statistics and can be understood to be similar to principal component analysis. The use of both algorithms is necessary since a real-life data set will always contain a mixture of both non-Gaussian and Gaussian AR processes. For a more in-depth discussion of the algorithms employed here, we refer the interested reader to appendices C.1 & C.2 and the original publications.
The EFICA and WASOBI algorithms can be shown to be asymptotically efficient, i.e. the estimators approach the Cramér-Rao lower bound (Davison, 2009). In other words, the algorithms employed here can be shown to converge to the correct solution, given the original source signals, in the limit of infinitely many iterations. In reality the number of iterations is finite, and imperfect convergence results in traces of other sources remaining in the individual signals comprising $\hat{S}$. We can hence state that equation 4 becomes

(12)  $\hat{S} = \hat{W} X \approx S$
A measure of this error is the deviation of $\hat{W}A$ (or $\hat{\tilde{W}}\tilde{A}$ for the whitened case) from the unity matrix, obtained by inspecting the variance of its elements (Koldovský et al., 2006; Hyvärinen & Oja, 2001). This leads to the concept of the interference-over-signal ratio (ISR) matrix. The ISR is the standard measure in signal processing of how well a given signal has been transmitted or deconvolved from a mixture of signals. It can be understood as the inverse of the signal-to-noise ratio (SNR). The higher the ISR for a specific signal, the less well it has been separated from the original mixture. For a real-case example, $A$ is unknown and the ISR needs to be estimated. Analytic approximations to the ISRs for the EFICA and WASOBI algorithms are found in appendices C.1 & C.2.
Finally, we check the stability of the signal separation by perturbing the input matrix $X$ by a random and known matrix $P$ to give

(13)  $X' = X + P$
We now re-run the MULTICOMBI procedure using $X'$ as input and estimate $\hat{A}'$. Knowing $P$, we can work backwards to obtain $\hat{A}$, as we are dealing with a linear transformation. This step is repeated several times to check the convergence of the algorithm by inspecting the variation of the mean ISR values of each separation attempt and the variations in consecutive estimations of $\hat{A}$ directly.
3.3 Signal reconstruction
Once the mixing matrix, $\hat{A}$, is estimated, we need to identify which signals are astrophysical, which are white noise and which are systematic noise. This is done in a two-step process:
1) We construct the estimated signal matrix, $\hat{S}$, and for each of its individual components $\hat{s}_i$ compute the Pearson correlation coefficient between $\hat{s}_i$ and the first principal component of the PCA decomposition in section 3.1. For medium signal-to-noise (SNR) observations, the first principal component (PC), i.e. the one with the highest eigenvalue associated with it, will contain the predominant light curve shape. As previously discussed, the first PC is not perfectly separated from the systematic signals and hence cannot be used directly for further analysis, but it is good enough to use for light curve identification. The identified light curve signal is labeled $\hat{s}_a$.
2) Once the light curve signal is identified, we exclude this row from $\hat{S}$ and proceed to classify the remaining signals with respect to their non-Gaussianity (i.e. as systematic noise sources). Here we use the Ljung-Box portmanteau test (see Appendix and Brockwell & Davis, 2006) to test the hypothesis that a time series is statistically white (i.e. Gaussian). This test was originally designed to check the residuals of autoregressive moving-average (ARMA) models for significant departures from Gaussianity. It is hence ideally suited for our need to identify which estimated signal components are the desired non-Gaussian ones.
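The Ljung-Box statistic is straightforward to compute; a minimal sketch (illustrative, not the paper's implementation — the function name, lag count and test series are assumptions) follows. The statistic $Q = n(n+2)\sum_{k=1}^{h}\hat{\rho}_k^2/(n-k)$ is compared against a $\chi^2$ distribution with $h$ degrees of freedom.

```python
import numpy as np
from scipy.stats import chi2

def ljung_box(x, h=20):
    """Ljung-Box portmanteau statistic Q and p-value for lags 1..h.
    Small p-values reject the hypothesis that x is white noise."""
    n = len(x)
    xc = x - x.mean()
    acov = np.array([np.sum(xc[k:] * xc[:n - k]) for k in range(h + 1)])
    rho = acov[1:] / acov[0]                        # sample autocorrelations
    Q = n * (n + 2) * np.sum(rho**2 / (n - np.arange(1, h + 1)))
    return Q, chi2.sf(Q, df=h)

rng = np.random.default_rng(6)
white = rng.normal(size=2000)                              # white-noise series
correlated = np.convolve(white, np.ones(10) / 10, "same")  # smoothed: non-white

_, p_white = ljung_box(white)
_, p_corr = ljung_box(correlated)
print(p_corr < 1e-6)   # the correlated series is decisively flagged
```

Components with very small p-values would be assigned to the systematic-noise group; the rest are treated as white.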
The identified non-Gaussian, systematic noise signals are hence labeled $\hat{S}_{sn}$ and the remaining white noise signals $\hat{S}_{wn}$, to give
(14)  $\hat{S} = \begin{pmatrix} \hat{s}_a \\ \hat{S}_{sn} \\ \hat{S}_{wn} \end{pmatrix}$
where $\hat{S}$ has dimensions $m \times n$, and $\hat{s}_a$, $\hat{S}_{sn}$, $\hat{S}_{wn}$ have dimensions $1 \times n$, $K \times n$ and $L \times n$ respectively, where $m = 1 + K + L$. The de-mixing matrix is given by $\hat{W} = \hat{A}^{-1}$.
As previously mentioned, the components of $\hat{S}$ have ambiguities in scaling and sign and can be thought of as similar to the eigenvectors of a principal component analysis with missing eigenvalues. Fortunately there exist two approaches to resolving this degeneracy:

1) In the case of $\hat{s}_a$ being well separated as an individual component, we can take $\hat{S}$ and the de-mixing matrix $\hat{W}$ and retain only the row containing the astrophysical signal component, forming the row vector $\hat{s}_a$. We then reconstruct the original data using only the separated signal component:

(15)  $\tilde{X}_a = \tilde{a}_a\,\hat{s}_a$

where $\tilde{X}_a$ is the reconstructed whitened data with all but the astrophysical signal component removed and $\tilde{a}_a$ is the column of $\tilde{A}$ associated with $\hat{s}_a$. Using equation 11, we can now calculate the unwhitened matrix $X_a$:

(16)  $X_a = C^{1/2}\,\tilde{X}_a$
(17)  $\phantom{X_a} = C^{1/2}\,\tilde{a}_a\,\hat{s}_a$

Hence we can think of this procedure as a linear, optimal filter for the signal component in $X$. Please note that this linear filtering does not impair the scaling information, as this is reinstated going from $\tilde{X}_a$ to $X_a$.

2) In the case of $\hat{s}_a$ not being well separated, but other systematic noise components being so, a different, more indirect approach can be used. Here, the systematic noise components, which do not contain sign or scaling information, are simultaneously fitted to the time series data (preferably out-of-transit data), $x_i$. We therefore define the systematic noise model for an individual time series by

(18)  $\hat{x}_{sn} = \sum_{k=1}^{K} o_{kk}\,\hat{s}_{sn,k}$

where $O$ is a $K \times K$ diagonal scaling matrix of $\hat{S}_{sn}$, whose elements $o_{kk}$ need to be fitted iteratively as free parameters in the following section.
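The paper fits the scaling factors of equation 18 iteratively together with the light curve model; as an illustrative shortcut (an assumption, not the authors' procedure), on pure out-of-transit data the diagonal scalings reduce to a linear least-squares problem. All signals below are synthetic.

```python
import numpy as np

rng = np.random.default_rng(7)

# Two hypothetical systematic-noise components recovered by ICA (rows of
# S_sn), known only up to sign and scale:
n = 500
t = np.linspace(0, 1, n)
S_sn = np.vstack([np.sin(2 * np.pi * 5 * t),   # periodic systematic
                  (t * 4) % 1 - 0.5])          # sawtooth systematic

# A synthetic out-of-transit time series built from them with unknown scalings:
true_scales = np.array([3.0, -1.5])
x_oot = true_scales @ S_sn + rng.normal(0, 0.01, n)

# The diagonal scaling factors o_kk are then the linear least-squares
# coefficients of the noise components against the data:
scales, *_ = np.linalg.lstsq(S_sn.T, x_oot, rcond=None)
print(np.allclose(scales, true_scales, atol=0.05))   # True
```

In practice, with the eclipse present, these scalings are instead kept as free parameters alongside the light curve model, as described in the next section.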
3.4 Light curve fitting
Having either filtered the data to obtain $X_a$ or constructed the noise model of equation 18, we can now fit the original time series using the standard analytical light curve models (Mandel & Agol, 2002; Seager & Mallén-Ornelas, 2003), in addition to the diagonal matrix $O$ if necessary. For the purpose of this paper, which focuses on blind-source separation, we restrict ourselves to demonstrating the feasibility of estimating $\hat{S}$ and leave only the transit depth as a variable light curve parameter. We use the analytic light curve model by Mandel & Agol (2002) and a Nelder-Mead minimisation algorithm (Press et al., 2007). For real data applications, we advise the reader to use Markov Chain Monte Carlo methods, or similar, which have become standard in the field of exoplanets and allow the estimation of the posterior probability distributions and their associated errors (Bakos et al., 2007; Burke et al., 2007; Cameron et al., 2007; Ford, 2006; Gregory, 2011).
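A single-parameter depth fit of this kind can be sketched as follows; note that a crude box-shaped eclipse stands in for the Mandel & Agol (2002) model here, and all data, parameters and seeds are illustrative assumptions:

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(8)

def box_eclipse(t, depth, t0=0.5, dur=0.2):
    """Toy box-shaped eclipse standing in for an analytic light curve model."""
    return 1.0 - depth * (np.abs(t - t0) < dur / 2)

# Synthetic light curve: depth 1%, white noise of 0.1%.
t = np.linspace(0, 1, 1000)
flux = box_eclipse(t, depth=0.01) + rng.normal(0, 0.001, t.size)

# Nelder-Mead minimisation of chi^2 with the transit depth as the only
# free light curve parameter:
res = minimize(lambda p: np.sum((flux - box_eclipse(t, p[0]))**2),
               x0=[0.005], method="Nelder-Mead")
print(abs(res.x[0] - 0.01) < 0.001)   # True: depth recovered to < 0.1%
```

With the Method 2 noise model, the diagonal elements of $O$ would simply be appended to the parameter vector `p` and minimised jointly.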
3.5 Postanalysis
Once the model fitting stage has been completed, we are left with the fitting residuals, $r$, i.e. the difference between the time series and the fitted model. Several tests are useful to determine how well the signals have been removed from the original time series, $x_i$. In the case of a full Markov Chain Monte Carlo fitting, the posterior distributions of the fitting parameters may be used to assess the impact of the remaining systematic noise in the data when compared to a simulated data set containing only white noise. Portmanteau tests on individual time series are useful to test for non-white noise signals and provide a measure of the overall performance of the algorithm (Brockwell & Davis, 2006). Additionally, we can determine the Kullback-Leibler divergence (Kullback & Leibler, 1951) of our residuals' probability distribution function (pdf) from an idealised Gaussian case.
For the simulations and real-data examples presented in the following section, we have simply plotted the autocorrelation functions (ACF) of the residuals obtained, to determine whether, for a given lag, these are within the 3σ confidence limit of the residual being dominated by white noise (Brockwell & Davis, 2006; Davison, 2009). Here the ACF is given by:
(19)  $\hat{\rho}(h) = \frac{\hat{\gamma}(h)}{\hat{\gamma}(0)}, \qquad \hat{\gamma}(h) = \frac{1}{n}\sum_{t=1}^{n-h}\left(x_{t+h}-\bar{x}\right)\left(x_t-\bar{x}\right)$

where $n$ is the number of data points in the time series, $h$ the specific lag, and the 3σ confidence intervals are given by $\pm 3/\sqrt{n}$.
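Equation 19 and its confidence band can be sketched directly (an illustrative implementation on synthetic white residuals; function name and seed are assumptions):

```python
import numpy as np

def acf(x, max_lag=100):
    """Sample autocorrelation function of equation 19 for lags 0..max_lag."""
    n = len(x)
    xc = x - x.mean()
    var = np.sum(xc**2)
    return np.array([np.sum(xc[h:] * xc[:n - h]) / var
                     for h in range(max_lag + 1)])

rng = np.random.default_rng(9)
resid = rng.normal(size=5000)        # white-noise-like fitting residuals
rho = acf(resid)

# For white noise, |ACF(h)| should stay within ~3/sqrt(n) at the 3-sigma
# level for essentially all lags:
bound = 3.0 / np.sqrt(len(resid))
frac_inside = np.mean(np.abs(rho[1:]) < bound)
print(frac_inside > 0.95)   # True
```

Residuals with many lags outside the band would indicate incompletely removed systematic noise.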
4 Simulations
In order to test the behaviour and efficiency of the algorithm described above, we produced a toy model simulation with five observed signals: 1) a secondary eclipse Mandel & Agol (2002) light curve; 2) a sinusoidal signal; 3) a sawtooth function; 4) a fourth-order autoregressive signal to simulate time-correlated signals; 5) Gaussian noise with a full width at half maximum (FWHM) of 0.01 magnitudes. The pre-mixed signals are displayed in figure 2. This gives us our signal matrix, $S$, which needs to be recovered later on. We then proceeded to mix the signals in figure 2 using a random mixing matrix, $A$, to obtain our 'observed signals', $X$, in figure 3. For the sake of comparability we keep the mixing matrix the same for all simulations.
We now subdivide the simulations to illustrate the two possible methods of signal reconstruction. Method 1 computes $X_a$ using equation 16, whilst Method 2 fits the noise model (equation 18) simultaneously with the Mandel & Agol (2002) light curve. These two examples demonstrate that both techniques work equally well for a well-behaved data set.
4.1 Method 1: Filtering out the signal
In this example, we use the 'observed' signals in figure 3 as input to the algorithm. We do not, however, perform a dimensionality reduction using PCA, since we are not dealing with an over-complete set in this example. The results of the separation are shown in figure 5. Here the top three, red light curves are the estimated systematic noise components as identified by the algorithm. The fourth component is Gaussian noise and the bottom one is an inverse of the light curve signal. It should again be noted here that the blind-source separation preserves neither the scaling nor the signs of the signals in $\hat{S}$. However, when the original data are reconstructed using only the signal component, $\hat{s}_a$, to obtain $X_a$ (equation 16), the scaling and sign information is reinstated. For a well-behaved data set, i.e. one that obeys the instantaneous mixing model and has negligible Gaussian noise in its signal components, it is therefore possible to reconstruct the light curve signal from the raw data as explained in section 3.3. Figure 4 shows the top light curve of figure 3 (blue circles), overplotted with the retrieved signal component (red crosses) and, offset below, the systematic noise component (black squares).
As a useful by-product of the algorithm, we obtain the interference-over-signal ratio (ISR) matrices (see Appendix C) for both the EFICA and WASOBI algorithms. These give us valuable information on the efficiency with which the signals have been separated. Figure 6 shows the Hinton diagrams of the EFICA and WASOBI ISR matrices. Here, the smaller the off-diagonal elements of the matrix, the better the signal separation. In this example, the EFICA algorithm outperforms the WASOBI one, which is to be expected since all signals but one are non-Gaussian.
4.2 Method 2: Fitting a noise model to the data
In the previous section, we have shown that in the case where the astrophysical component is well separated as an individual signal, we can create a filter for the raw data that directly extracts the light curve signal from the noise. However, in most real data applications the astrophysical signal is not perfectly separated, whilst the systematic noise components may be more so. In this case we can construct the noise model given by equation 18, and the diagonal elements of its scaling matrix are fitted as described in section 3.4. The starting position of the algorithm is the same as for the previous example (figure 3). The model fit of the first light curve in figure 3 and its residuals are shown in figure 7. The autocorrelation function for 100 lags is plotted in figure 8. All but two lags are within the 3σ confidence limit for a white-noise-dominated residual, indicating that all signals have been removed effectively.
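The noise-model fit of Method 2 can be sketched as follows, with SciPy's Nelder-Mead implementation in place of the simplex routine used here; the toy light curve, components and scaling coefficients are invented for the illustration.

```python
# Toy sketch of Method 2: fit one scaling coefficient per separated
# systematic component to the out-of-eclipse data (cf. equation 18) with
# the Nelder-Mead simplex, then subtract the combined noise model.
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(1)
n = 1000
t = np.linspace(0.0, 1.0, n)
drift = t - 0.5                          # systematic components, as would
wobble = np.sin(2.0 * np.pi * 7.0 * t)   # be returned by the ICA step
components = np.column_stack([drift, wobble])

eclipse = np.where(np.abs(t - 0.5) < 0.1, -0.01, 0.0)
raw = 1.0 + eclipse + 0.004 * drift - 0.003 * wobble \
      + 1e-4 * rng.standard_normal(n)

out = np.abs(t - 0.5) > 0.15             # out-of-eclipse mask

def chi2(b):
    model = 1.0 + components @ b         # baseline plus scaled systematics
    return np.sum((raw[out] - model[out])**2)

fit = minimize(chi2, x0=np.zeros(2), method='Nelder-Mead',
               options={'xatol': 1e-10, 'fatol': 1e-12})
corrected = raw - components @ fit.x     # subtract the noise model
```

Fitting only on the out-of-eclipse points mirrors the paper's use of the out-of-transit orbits, so the eclipse depth itself is not absorbed into the noise model.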
Finally, we simulate the convergence properties of both EFICA and WASOBI under varying white noise conditions. Here we repeatedly run the algorithm until signal separation is completed and record the mean ISRs of the source separation. We performed this simulation 300 times for Gaussian noise FWHMs varying from 0.0 to 0.3 magnitudes (figure 7 has a FWHM of 0.01) and every ISR measurement reported is the mean of 10 iterations. Figure 9 summarises the results. Here, the red circles represent the mean ISR of the EFICA algorithm and the blue crosses that of WASOBI. It can clearly be seen that for this example the EFICA algorithm outperforms WASOBI and on average reaches lower ISR values. We further note that the blind source separation is not significantly affected by the magnitude of the white noise and performs well under difficult signal-to-noise conditions.
5 Application to data
The previous examples illustrated the two methods available to correct the observed data: Method 1, filtering the astrophysical signal out of the systematic noise, or Method 2, fitting a systematic noise model to the raw data. Here we apply these techniques to two primary eclipse data sets of HD 189733b and XO-1b recorded by the NICMOS instrument on the Hubble Space Telescope, as well as a single time series with time-correlated noise obtained by the Kepler spacecraft.
5.1 HST/NICMOS: HD 189733b
First presented by Swain et al. (2008), this data set of the primary eclipse of HD 189733b was recorded using HST/NICMOS in the G206 grism setting, spanning five consecutive orbits. The HST-pipeline calibrated data were downloaded from the MAST archive (http://archive.stsci.edu/) and the spectrum was extracted using both standard IRAF (http://iraf.noao.edu/) routines and a custom-built routine for optimal spectral extraction. Both extractions are in good accord with each other, but the custom-built routine was found to yield a better signal-to-noise ratio and was subsequently used for all further analysis. A binning of 10 spectral channels (∼0.08 μm) was used, resulting in 10 light curves across the G206 grism band. Figure 10 shows the obtained time series, which serve as input to the algorithm. It can be seen that each time series is strongly affected by instrument systematics propagating from the blue end of the spectrum (bottom light curve) to the red with varying intensity and even sign. Swain et al. (2008) showed that these systematics are correlated with instrument state vectors such as orbital phase, relative positions and angles of the spectrum on the detector, instrument temperature, etc. We can hence expect these systematics to be statistically independent of the recorded astrophysical signal (the light curve) and it should therefore, in principle, be possible to decorrelate the signal.
We here demonstrate the detrending on an individual light curve at 1.694 μm (eighth from the top in figure 10). All time series in figure 10 were taken as input to the algorithm described above to estimate the demixing matrix, the astrophysical signal vectors and the systematic noise vectors. The interference-over-signal ratio (ISR) matrix indicated a good separation of four main components (figure 11), with the rest of the components being classified as predominantly Gaussian or weakly systematic. The existence of more than one Gaussian component indicates that the set is overcomplete. However, since the data set is small enough, no PCA dimensionality reduction was performed. After the algorithm had identified the correct astrophysical signal, it proceeded to reconstruct the light curve using both methods described above.
Method 1: The astrophysical signal was filtered using equations 15 and 16. Figure 12 shows the raw light curve (blue circles) with the detrended time series underneath (green squares). The superimposed model light curves were computed using Mandel & Agol (2002), with orbital parameters taken from Winn et al. (2007) and limb-darkening parameters from Claret (2000). It is clear that the detrended light curve is an improvement over the raw time series, but that systematics still remain in the data. This is further illustrated by plotting the autocorrelation function of the model-fit residual in figure 16 (red circles). Here, residual correlation can be observed, in particular at low lags. This is a consequence of the astrophysical signal being well separated overall whilst, as shown in figure 11 (component 1), some weak interference with the other component vectors remains, which follows from equation 12 and is to be expected for real data sets.
Method 2: The second method is a less direct approach. Instead of filtering for the astrophysical signal directly, we try to construct a 'systematic noise model' that is then subtracted from the raw data. Using equation 18 and a simplex downhill algorithm (Nelder & Mead, 1965), we estimated the scaling matrix by fitting the systematic noise vectors to the four out-of-transit orbits. The scaled systematic noise vectors are shown in figure 14; they combine to form the systematic noise model in figure 13. It should be noted that only the scaling of the individual vectors is fitted, since the scaling information is not preserved by the independent component analysis. Hence, relative intra- and inter-orbit variations are preserved. Figure 15 shows the corrected data after subtracting the systematic noise model from the raw data. Inspecting the autocorrelation function of the fitting residual in figure 16 (black circles) indicates that the residual is statistically white and that a maximal decorrelation of the data has been achieved.
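The whiteness test applied to the residuals above can be sketched as follows; the ±3/√N band is the standard large-sample 3σ limit for the sample autocorrelation of white noise, and the residual here is a synthetic stand-in.

```python
# Sketch of the residual whiteness test: sample autocorrelation at lags
# 1..100 against the +/- 3/sqrt(N) band expected for pure white noise.
import numpy as np

def autocorr(x, max_lag):
    x = np.asarray(x, dtype=float) - np.mean(x)
    var = np.sum(x**2)
    return np.array([np.sum(x[:-k] * x[k:]) / var
                     for k in range(1, max_lag + 1)])

rng = np.random.default_rng(2)
resid = rng.standard_normal(5000)      # stand-in for a white post-fit residual
rho = autocorr(resid, 100)
limit = 3.0 / np.sqrt(resid.size)      # 3-sigma white-noise confidence band
frac_inside = np.mean(np.abs(rho) < limit)
```

A residual whose autocorrelation stays almost entirely inside this band, as in figure 16 (black circles), is consistent with white noise and hence with a maximal decorrelation.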
5.2 HST/NICMOS: XO-1b
Originally presented by Tinetti et al. (2010), the primary eclipse of XO-1b was observed using the HST/NICMOS instrument in the G141 grism setting. The HST-pipeline calibrated data were downloaded and the spectra extracted using the same settings as in section 5.1. This yielded 10 light curves, which serve as input to the algorithm (figure 17). Similarly to the HD 189733b case, the algorithm retrieved four main components: the light curve signal and three main systematic noise components. The ISR matrix is shown in figure 18. We then proceeded to detrend the light curve at the very red end of the spectrum (first from the top in figure 17) as it, after visual inspection, exhibits the most prominent systematics of the 10 time series. Light curve fits assumed the limb-darkening and orbital parameters of Burke et al. (2010).
Method 1: Figure 19 shows the raw time series and the detrended light curve obtained using equation 16. The light curve is significantly detrended, but systematics remain in the data, as also shown by the autocorrelation function (red circles) in figure 23.
Method 2: As described in the previous sections, figure 21 shows the retrieved systematic noise vectors and figure 22 features the 'raw' data with the combined systematic noise model (red) underneath. The autocorrelation function of the model-fit residual is shown in figure 23 (black crosses) and shows a factor of 2 improvement in the decorrelation at the lower lags.
Figure 20 compares the detrended light curves of Method 1 and Method 2 and shows the residual of Method 1 minus Method 2 (black crosses). There is little difference between the two methods, indicating that the signal separation for this data set is close to its maximum, with the data only partially decorrelated. This is in contrast to the HD 189733b example, where a near perfect decorrelation was achieved, and can be attributed to the systematics being mostly wavelength invariant in the case of XO-1b. In other words, systematic noise components that have a constant weighting throughout the data set cannot be decorrelated using ICA or PCA methods, which is to be expected following equations 1–3.
5.3 Kepler: Star 10118816
In the previous examples we have shown that spectroscopic data sets can be decorrelated effectively. We test here how well the proposed algorithm can decorrelate consecutively observed data sets. This is of particular interest in cases where no multichannel data are available and the time series data are contaminated with time-correlated noise, be it stellar or instrumental. As opposed to the previous examples, where several time series were observed simultaneously, here we take a single time series covering several consecutive eclipse features and cut it into segments spanning equal lengths over each eclipse event. Using these segments as inputs to the algorithm clearly violates the underlying assumptions of the independent component analysis, as the mixing is not instantaneous. In this case, the ICA analysis can be understood as a Projection Pursuit (PP) analysis, see section 2.2.4 (Hyvärinen & Oja, 2000; Stone, 2004; Huber, 1985). Here the ICA algorithm, in the absence of a working ICA data model, will try to extract as many non-Gaussian components as possible and return the rest of the data in its original form. This is very similar to Projection Pursuit, where the data are not described by an underlying data model at all but only the most non-Gaussian component is retrieved. In other words, we can only expect to retrieve the eclipse signal component with any degree of accuracy. As a result we will not be able to retrieve the systematic noise components, and we can only use the filtering approach of section 3.3 (Method 1) to detrend the data.
We have downloaded data observed by the Kepler space telescope (Borucki et al., 1996, 2010; Jenkins et al., 2010; Caldwell et al., 2010; Koch et al., 2010) for a planet-hosting candidate star observed over the second and third data-release quarters (Q2 & Q3). The time series, with the Kepler ID 10118816, exhibits highly variable features and significant time-correlated noise (see figure 24, blue crosses). Given Kepler's superb instrument calibration, we can assume this time-correlated noise to be due to stellar variability. Using the periodogram calculator of the NStED database (http://nsted.ipac.caltech.edu/applications/ETSS/Kepler_index.html), we identified four main periodically recurring signals in the data set. Choosing the second strongest feature, with a period of 0.040915 days, we phase-folded the data and cut the time series into 10 equally sized segments. As in the previous examples, we then took these time series segments as input to the algorithm.
We performed our decorrelation as in the previous examples but using Method 1 only. Figure 25 shows the ISR matrix of the separation, indicating a relatively poor separation of all components but two (components 4 & 9). As discussed above, this behaviour is to be expected given the breaking of the instantaneous mixing model. Nonetheless, we obtained a clear feature (component 4) in our analysis, which is overplotted (red circles) on the mean, phase-folded data (blue crosses) in figure 26. Here the decorrelated signal has a much reduced scatter compared to the mean of the phase-folded feature, which indicates that much of the unwanted stellar variability has been removed. It is also clear from this figure that we are not dealing with an exoplanetary light curve but with a stellar pulsation feature. As expected, the remaining components returned by the algorithm (figure 27) are the residuals of the input data minus the component shown in figure 26. Hence we only used the component in figure 26 to reconstruct the original time series. This was done by applying equation 16 to each segment of the time series and then re-joining the segments in the order in which they were originally split.
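The segment-and-demix procedure can be sketched as below, with scikit-learn's FastICA standing in for the MULTICOMBI algorithm and a synthetic pulse-plus-red-noise series standing in for the Kepler data; the segment count, pulse shape and noise levels are invented.

```python
# Toy sketch of the single-time-series application: phase-fold on a known
# period, cut into equal segments, and let ICA (acting as projection
# pursuit) extract the most non-Gaussian recurring feature.
import numpy as np
from sklearn.decomposition import FastICA

rng = np.random.default_rng(3)
n_seg, seg_len = 10, 500
phase = np.linspace(0.0, 1.0, seg_len)

# A recurring pulse plus independent red noise in every segment.
pulse = np.exp(-0.5 * ((phase - 0.5) / 0.03) ** 2)
segments = np.empty((n_seg, seg_len))
for i in range(n_seg):
    red = np.cumsum(rng.standard_normal(seg_len)) * 0.01
    segments[i] = pulse + red - red.mean()

# Treat the segments as parallel 'channels'; keep the most kurtotic component.
ica = FastICA(n_components=n_seg, random_state=0, max_iter=1000)
S_hat = ica.fit_transform(segments.T)
kurt = np.mean(S_hat**4, axis=0) / np.mean(S_hat**2, axis=0)**2 - 3.0
k = int(np.argmax(kurt))

# Back-project only that component (cf. equation 16), then re-join the
# segments in their original order.
keep = np.zeros_like(S_hat)
keep[:, k] = S_hat[:, k]
filtered = ica.inverse_transform(keep).T.ravel()
```

As in the text, only the recurring non-Gaussian feature is recoverable here; the red-noise 'components' carry no meaning once the instantaneous mixing model is broken.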
Figure 24 shows the original input data (blue crosses) with the filtered signal (red circles) overplotted. In the bottom plot (a zoomed-in version of the time series), it is clear that the desired feature remains in the filtered time series, whilst the contribution of other time-correlated stellar noise is substantially reduced.
6 Discussion
In the previous sections we have shown that a set of simultaneously observed time series data (e.g. following an exoplanetary eclipse with a spectrograph) can be described by an instantaneous mixing model (equation 3). This allows the separation of non-Gaussian, time- and spatially-correlated signals from one another. The degeneracy caused by not being able to retrieve the components' signs or amplitudes can be circumvented in two ways: Method 1) the separated signals are used to construct a linear transformation that filters the astrophysical signal from the originally observed data and hence preserves all scaling information; Method 2) the separated astrophysical signal is not used directly, but instead all systematic noise components are combined to form a 'systematic noise model' which can then be used to correct the originally observed data.
We have explored the efficiency of the signal detrending on two simulated and two HST/NICMOS data sets with different types of systematic noise due to different grisms. The simulations demonstrate the two methods of detrending the data in an idealised case and explore the efficiency of the signal separation in the presence of varying Gaussian noise in the data. In the instantaneous mixing model employed here, Gaussian noise sources are only indirectly allowed for and can interfere with the effectiveness of separating the non-Gaussian vectors. We tested this point by adding additional Gaussian noise components of variable amplitude to the simulations but did not observe any significant reduction in the signal separation efficiency.
We proceeded to analyse two HST/NICMOS data sets: the primary eclipses of HD 189733b and XO-1b. For both data sets we find Method 2 to yield better results. In the case of HD 189733b, we can achieve a near perfect decorrelation of astrophysical signal and systematic noise and no further steps in the decorrelation process are necessary. A more in-depth discussion of this data set and of HST/NICMOS systematics is beyond the scope of this publication. In the case of XO-1b the decorrelation is significant but incomplete. The difference in the maximum decorrelation achievable can be attributed to the systematic noise sources being strong functions of wavelength in the case of HD 189733b, whilst having almost constant weighting (equation 2) in the case of XO-1b.
Whenever systematics have constant weighting per observed channel and/or time, it becomes very difficult for PCA or ICA based approaches to decorrelate the signal from the systematics. Here auxiliary information on the instrument is needed to proceed with the decorrelation process. This is perfectly possible in the case of dedicated instruments such as Kepler, as these have been specifically designed with such high precision and stability measurements in mind (Borucki et al., 1996; Jenkins et al., 2010). For instruments that do not feature the calibration plan required to further decorrelate with instrument state parameters, the solution is far less obvious.
We furthermore explored the decorrelation of eclipse signals observed consecutively rather than in parallel. We demonstrated, using Kepler data, that despite the formal violation of the 'instantaneous mixing model', the proposed algorithm is able to retrieve the desired signal component with good accuracy. Such an application is particularly important for treating the variability of the host star, which can significantly impair the quality of the final science result (e.g. Czesla et al., 2009; Boisse et al., 2011; Aigrain et al., 2011; Ballerini et al., 2011).
It is furthermore interesting to note that pre- and post-processing steps (e.g. wavelets (Carter & Winn, 2009), Fourier based techniques (Waldmann et al., 2011) or decorrelation using instrument state parameters (Swain et al., 2008)) do not break the instantaneous mixing model and can be run in conjunction with ICA methods. This makes independent component analysis a very powerful and versatile tool for the nonparametric decorrelation of exoplanetary data sets.
7 Conclusion
In the light of the search for and characterisation of ever smaller and fainter exoplanetary targets, the development of novel detrending routines becomes increasingly critical. Based on the concepts of blind source deconvolution of instantaneously mixed signals, we have presented a first step towards nonparametric corrections and data filters that do not require additional information on the systematic noise of the instrument or on stellar activity. Such algorithms have two important applications:
1) For instruments that lack a calibration plan at the $10^{-4}$ level of accuracy in flux variation, which is required for spectroscopy of exoplanetary atmospheres, the spectroscopic signatures become inherently entangled with, and dependent on, the method used to correct instrument and other systematics in the data. The decorrelation of spectroscopic data was demonstrated using two HST/NICMOS data sets.
2) Detections of faint exoplanetary eclipses are often made difficult by the time-correlated activity of the host star. We demonstrated, using a single Kepler time series, that much of the stellar variability can be removed from time series that span several exoplanetary eclipse events.
The algorithm proposed is a powerful tool for light curve detrending, which can be used on its own or in conjunction with any other type of data filtering or cleaning technique. This becomes an invaluable advantage for data analysis when the instrument's response function is unknown or poorly characterised.
References
 Aigrain et al. (2011) Aigrain, S., Pont, F., Zucker, S., 2011, arXiv: 1110.1034
 Agol et al. (2010) Agol, E., Cowan, N. B., Knutson, H. A., Deming, D., et al., 2010, ApJ, 721, 1861
 Aumont & MacíasPérez (2007) Aumont, J. & MacíasPérez, J., F., 2007, MNRAS, 376, 739
 Bakos et al. (2007) Bakos, G. Á., Shporer, A., Pál, A., Torres, G., Kovács, G. et al., 2007, ApJL, 671, L173
 Ballerini et al. (2011) Ballerini, P., Micela, G., Lanza, A. F., Pagano, I., 2011, A&A, submitted
 Bean (2011) Bean, J. L., , 2011, Nature, 478, 41
 Bean et al. (2011) Bean, J. L., Désert, J.M., Kabath, P., et al., 2011, ApJ, 743, 92
 Beaulieu et al. (2008) Beaulieu, J. P.; Carey, S.; Ribas, I. and Tinetti, G., 2008, ApJ, 677, 1343
 Beaulieu et al. (2011) Beaulieu, J. P., Kipping, D. M., Batista, V., Tinetti, G., Ribas, I., Carey, S. et al., 2010, arXiv: 0909.0185
 Belouchrani et al. (1997) Belouchrani A., AbedMeraim K., Cardoso J.F., Moulines E., 1997, IEEE Trans. Signal Processing, vol. 45, 434
 Boisse et al. (2011) Boisse, I., Bouchy, F., Hébrard, G., et al., 2011, A&A, 528, A4
 Borucki et al. (1996) Borucki, W. J., Dunham, E. W., Koch, D. G., Cochran, W. D., et al., 1996, Astrophys. & Space Science, 241, 111
 Borucki et al. (2010) Borucki, W. J., Koch, D., Basri, G., et al., 2010, Science, 327, 977
 Borucki et al. (2011) Borucki, W. J., Koch, D. G., Basri, G., et al., 2011, ApJ, 736, 19
 Brillouin (1953) Brillouin, L., 1953, J. of Applied Physics, 24, 1152
 Brockwell & Davis (2006) Brockwell P.J., Richard A.D., 2006, ’Time Series: Theory and Methods (second edition)’, Springer Verlag, ISBN: 1441903198
 Brunt et al. (2010) Bruntt, H., Deleuil, M., Fridlund, M., Alonso, R., et al., 2010, A&A, 519, A51
 Burke et al. (2007) Burke, C. J., McCullough, P. R., Valenti, J. A., JohnsKrull, C. M., Janes, K. A. et al., 2007, ApJ, 671, 2115
 Burke et al. (2010) Burke, C. J., McCullough, P. R., Bergeron, L. E., et al., 2010, ApJ, 719, 1796
 Caldwell et al. (2010) Caldwell, D. A., Kolodziejczak, J. J., Van Cleve, et al., 2010, ApJL, 713, L92
 Carter & Winn (2009) Carter, J. A., Winn, J. N., 2009, ApJ, 704, 51
 Cameron et al. (2007) Collier Cameron, A., Wilson, D. M., West, R. G., Hebb, L., et al., 2007, MNRAS, 380, 1230
 Charbonneau et al. (2000) Charbonneau, D., Brown, T., M., Latham, D., W., Mayor, M., 2000, ApJ, 529, L45
 Charbonneau et al. (2002) Charbonneau, D., Brown, T. M., Noyes, R. W. and Gilliland, R. L., 2002, ApJ, 568, 377
 Charbonneau et al. (2005) Charbonneau, D., Allen, L. E., Megeath, S. T., Torres, G., Alonso, R., Brown, T. M., et al., 2005, ApJ, 626, 523
 Charbonneau et al. (2008) Charbonneau, D., Knutson, H. A., Barman, T., Allen, L. E., Mayor, M., et al., 2008, ApJ, 686, 1341
 Cichocki & Amari (2002) Cichocki A., Amari S., 2002, John Wiley & Sons Inc., Adaptive Blind Signal and Image Processing, ISBN: 0471607916
 Claret (2000) Claret, A., 2000, AA, 363, 1081
 Comon (1994) Comon, P., 1994, Signal Processing, 36, 287
 Comon & Jutten (2000) Comon, P., & Jutten, C., 2000, Handbook of Blind Source Separation: Independent Component Analysis and Applications, Academic Press
 Czesla et al. (2009) Czesla, S., Huber, K., F., Wolter, U., Schröter, S., Schmitt, H., M., M., 2009, A&A, 505, 1277
 Davison (2009) Davison, A. C., 2009, Cambridge University Press, Statistical Models, ISBN: 9780521734493
 Delabrouille et al. (2003) Delabrouille, J., Cardoso, J.F. & Patanchon, G., 2003, MNRAS, 346, 1089
 Deming et al. (2007) Deming, D., Richardson, L. J. & Harrington, J., 2007, MNRAS, 378, 148
 Doron & Yeredor (2004) Doron E., Yeredor A., 2004, Proc. of ICA 2004, 390
 Ford (2006) Ford E. B., 2006, ApJ, 642, 505
 Friedman (1987) Friedman, J., H., 1987, J. of the American Statistical Association, 82, 249
 Gibson et al. (2011) Gibson, N. P., Pont, F. & Aigrain, S., 2011, MNRAS, 411, 2199
 Gibson et al. (2011b) Gibson, N. P., Aigrain, S., Roberts, S., et al., 2011, arXiv:1109.3251
 Gillon et al. (2010) Gillon, M., Lanotte, A. A., Barman, T., Miller, N., Demory, B.-O., et al., 2010, A&A, 511, 3
 Gregory (2011) Gregory, P. C., 2011, MNRAS, 410, 94
 Grillmair et al. (2008) Grillmair, C. J., Burrows, A., Charbonneau, D., Armus, L., Stauffer, J., Meadows, V., van Cleve, J., von Braun, K. & Levine, D., 2008, Nature, 456, 767
 Harman (1967) Harman, H., H., 1967, Modern Factor Analysis, University of Chicago Press, 2nd edition
 Hatzes et al. (2010) Hatzes, A. P., Fridlund, M., Nachmani, G., Mazeh, T., et al. 2010, arXiv: 1105.3372v1
 Huber (1985) Huber, P., J., 1985, The Annals of Statistics, 13, 435
 Hyvärinen (1999) Hyvärinen A.,1999, IEEE Trans. on Neural Networks, 10(3), 626
 Hyvärinen et al. (1999) Hyvärinen A., Pajunen, P., 1999, Neural Networks, 12, 429
 Hyvärinen & Oja (2000) Hyvärinen A., Oja, E., 2000, Neural Networks, 13, 411
 Hyvärinen & Oja. (2001) Hyvärinen A., Karhunen J., Oja E., 2001, John Wiley & Sons Inc., Independent Component Analysis, ISBN: 047140540X
 Jenkins et al. (2010) Jenkins, J. M., Caldwell, D. A., Chandrasekaran, et al., 2010, ApJL, 713, L87
 Jolliffe (2002) Jolliffe I.T., 2002, Springer Verlag New York Inc., ISBN: 0387954422
 Knutson et al. (2007a) Knutson, H. A., Charbonneau, D., Allen, L. E., Fortney, J. J., et al., 2007, Nature, 447, 183
 Knutson et al. (2007b) Knutson, H. A., Charbonneau, D., Noyes, R. W., Brown, T. M., Gilliland, R. L., 2007, ApJ, 655, 564
 Knutson et al. (2011) Knutson, H. A., Madhusudhan, N. , Cowan, N. B., Christiansen, J. L., Agol, E., Deming, D., et al., 2011, arXiv: 1104.2901
 Koch et al. (2010) Koch, D. G., Borucki, W. J., Basri, G., et al., 2010, ApJL, 713, L79
 Koldovský et al. (2006) Koldovský, Z., Tichavský, P., Oja, E., 2006, IEEE Trans. on Neural Networks, 17(5), 1265
 Koldovský et al. (2005) Koldovský Z., Tichavský P., Oja E., 2005, Proc.of IEEE/SP 13th Workshop on Stat. Signal Processing
 Kullback & Leibler (1951) Kullback, S., Leibler, R.A., 1951, Annals of Mathematical Statistics 22, 79
 Lagarias et al. (1998) Lagarias, J. C., Reeds, J. A., Wright, M. H., & Wright, P. E., 1998, SIAM Journal of Optimization, 9(1), 112
 Lu et al. (2006) Lu, H., Zhou, H., Wang, J., et al., 2006, ApJ, 131, 790
 Maino et al. (2002) Maino, D., Farusi, A., Baccigalupi, C., et al., 2002, MNRAS, 334, 53
 Maino et al. (2007) Maino, D., Donzelli, S., Banday, A., J., Stivoli, F., Baccigalupi, C., 2007, MNRAS, 374, 1207
 Mandel & Agol (2002) Mandel, K. and Agol, E., 2002, ApJL, 580, L171
 Manly (1994) Manly, B., J., F., 1994, ’Multivariate Statistical Methods  A Primer’, 2nd Ed., Chapman & Hall
 Marcy et al. (1998) Marcy, G., W., Butler, R., P., Vogt, S., S., Fischer, D., Lissauer, J., J., 1998, ApJL, 505, 147
 Mayor & Queloz (1995) Mayor, M., Queloz, D., 1995, Nature, 378, 355
 Nadaraya (1964) Nadaraya, E. A., 1964, Prob. & its Applic., 10, 186
 Nelder & Mead (1965) Nelder, J. A., Mead, R., 1965, Computer Journal, 7, 308
 Oja (1992) Oja, E., 1992, Neural Networks, 5, 927
 Pearson (1901) Pearson, K., 1901, Phil. Mag., 2, 559
 Pont et al. (2010) Pont, F., Aigrain, S. & Zucker, S., 2010, MNRAS, 411, 1953
 Press et al. (2007) Press, W. H., Teukolsky, S. A., Vetterling, W. T., Flannery, B. P., 2007, ’Numerical Recipes’, Cambridge Uni. Press, ISBN: 9780521884075
 Redfield et al. (2008) Redfield, S., Endl, M., Cochran, W. D. and Koesterke, L., 2008, ApJL, 673, L87
 Riley et al. (2002) Riley, K. F., Hobson, M., P., Bence, S. J., 2002, ’Mathematical Methods for Physics and Engineering’, Cambridge Uni. Press, ISBN: 0521890675
 Schneider et al. (2011) Schneider, J., Dedieu, C., Le Sidaner, P., Savalle, R., Zolotukhin, I., 2011, A&A, 532, 79
 Seager & MallénOrnelas (2003) Seager, S. and MallénOrnelas, G., 2003, ApJ, 585, 1038
 Shannon (1948) Shannon, C., 1948, Bell System Tech. J., 27, 379
 Sing et al. (2011) Sing, D. K., Pont, F., Aigrain, S., et al., 2011, MNRAS, 416, 1443
 Snellen et al. (2008) Snellen, I. A. G. and Albrecht, S. and de Mooij, E. J. W. and Le Poole, R. S., 2008, A&A, 487, 357
 Snellen et al. (2010a) Snellen, I. A. G. and de Kok, R. J. and de Mooij, E. J. W. and Albrecht, S., 2010a, Nature, 465, 1049
 Snellen et al. (2010b) Snellen, I. A. G. and de Mooij, E. J. W. and Burrows, A, 2010b, A&A, 513, 76
 Stevenson et al. (2010) Stevenson, K. B., Harrington, J., Nymeyer, S., Madhusudhan, N., Seager, S., et al., 2010, Nature, 464, 1161
 Stivoli et al. (2006) Stivoli, F., Baccigalupi, C., Maino, D., Stomper, R., 2006, MNRAS, 372, 615
 Stone (2004) Stone, J., V., 2004, Independent Component Analysis: A tutorial Introduction, A Bradford Book
 Swain et al. (2008) Swain, M. R.,Vasisht, G. and Tinetti, G., 2008, Nature, 452, 329
 Swain et al. (2009a) Swain, M. R., Vasisht, G., Tinetti, G., Bouwman, J., Chen, P., et al., 2009, ApJL, 690, L114
 Swain et al. (2009b) Swain, M. R., Tinetti, G., Vasisht, G., Deroo, P., Griffith, C., et al., 2009, ApJ, 704, 1616
 Tichavský et al. (2006a) Tichavský P., Doron E., Yeredor A., GomezHerrero G., 2006, Proc. EUSIPCO2006
 Tichavský et al. (2006b) Tichavský, P., Doron, E., Yeredor, A., Nielsen, J., 2006, Proc. EUSIPCO2006
 Tinetti et al. (2007) Tinetti, G., Vidal-Madjar, A., Liang, M.-C., Beaulieu, J.-P., Yung, Y., et al., 2007, Nature, 448, 169
 Tinetti et al. (2010) Tinetti, G., Deroo, P., Swain, M. R., Griffith, C. A., Vasisht, G., et al., 2010, ApJL, 712, L139
 Thatte et al. (2010) Thatte, A. and Deroo, P. and Swain, M. R., 2010, A&A, 523, 35
 Waldmann et al. (2011) Waldmann, I. P., Drossart, P., Tinetti, G., Griffith, C. A., Swain, M., Deroo, P., 2011, ApJ, in press
 Wang et al. (2010) Wang, J., Xu, H., Gu, J., An, T., 2010, arXiv: 1008.3391v1
 Watson (1964) Watson, G. S., 1964, Sankhyā Series A, 26, 359
 Winn et al. (2007) Winn, J. N., Holman, M. J., Henry, G. W., et al., 2007, AJ, 133, 1828
 Yeredor (2000) Yeredor A., 2000, IEEE Sig. Proc. Letters, 7, 197
This appendix provides some additional notes on the methods employed in this paper. For a more in-depth discussion of the topics presented here, please refer to the cited publications.
Appendix A Uncorrelatedness, orthogonality and independence of Gaussian and non-Gaussian signals
In Gaussian statistics, our probability densities are fully defined by the first and second statistical moments, i.e. their means and covariances. Two random vectors, $x$ and $y$, are said to be uncorrelated when their covariance, $\mathrm{cov}(x,y)$, is zero:

$$\mathrm{cov}(x,y) = E\{xy\} - E\{x\}E\{y\} = 0 \eqno{(A1)}$$

where $E\{x\}$ is the expectation value of $x$, which can in this case be approximated by the mean
$$E\{x\} \simeq \frac{1}{N}\sum_{i=1}^{N} x_i \eqno{(A2)}$$
with $N$ being the number of data points in the time series.
Furthermore, we define two random variables, $x$ and $y$, to be orthogonal when both their expectation values, in addition to their covariance, are zero:

$$E\{x\} = E\{y\} = 0\,, \qquad E\{xy\} = 0 \eqno{(A3)}$$
We can always find an affine, linear transformation from a correlated set of variables to an orthogonal one.
Finally, our two random vectors $x$ and $y$ are independent from one another if and only if the joint probability distribution of both signals is factorisable into the product of their marginal pdfs, $p_x(x)$ and $p_y(y)$:

$$p_{x,y}(x,y) = p_x(x)\,p_y(y) \eqno{(A4)}$$
and satisfies the property

$$E\{g_1(x)\,g_2(y)\} = E\{g_1(x)\}\,E\{g_2(y)\} \eqno{(A5)}$$
where $g_1$ and $g_2$ are absolutely integrable functions of $x$ and $y$ respectively. From the definition of independence in equation A5, we obtain the definition of uncorrelatedness (equation A3) in the special case where both $g_1$ and $g_2$ are linear and the variables are only defined by their covariances (i.e. no higher order statistical moments) (Hyvärinen & Oja, 2000; Hyvärinen & Oja., 2001; Riley et al., 2002). In other words, uncorrelatedness is a special case of independence. Uncorrelated Gaussian random variables are always also independent, and the definitions of uncorrelatedness, orthogonality (for zero mean) and statistical independence become identical.
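A quick numerical illustration of this distinction, assuming a standard normal $x$: $y = x^2 - 1$ is uncorrelated with $x$ (their covariance involves only $E\{x^3\} = 0$), yet $y$ is a deterministic function of $x$, which a higher-order statistic immediately reveals.

```python
# Uncorrelated but not independent: for standard normal x, y = x^2 - 1
# has zero covariance with x, yet is a deterministic function of x.
import numpy as np

rng = np.random.default_rng(4)
x = rng.standard_normal(200_000)
y = x**2 - 1.0                      # zero-mean, fully dependent on x

cov = np.mean(x * y) - np.mean(x) * np.mean(y)   # ~ E{x^3} = 0
# Dependence appears in higher-order statistics (cf. equation A5 with
# g1(x) = x^2 and the identity for g2): E{x^2 y} - E{x^2}E{y} ~ 2, not 0.
dep = np.mean(x**2 * y) - np.mean(x**2) * np.mean(y)
```

This is precisely why ICA must go beyond second-order statistics: decorrelation alone cannot distinguish such variables.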
Appendix B Preprocessing
The covariance matrix of $x$, $\Sigma_x = E\{xx^{T}\}$, is given by equation B1, where $E$ is the matrix of eigenvectors and $D$ the diagonal matrix of eigenvalues. Using principal component analysis (PCA), we compute $E$ and $D$; the whitening matrix, i.e. the inverse square root of the covariance matrix, is then given by equation B2 (Hyvärinen & Oja., 2001; Jolliffe, 2002).
$$\Sigma_x = E D E^{T} \eqno{(B1)}$$

$$V = E D^{-1/2} E^{T} \eqno{(B2)}$$
where $\tilde{x} = Vx$ are the whitened observed signals and $\tilde{W}$ is the demixing matrix of the whitened observed signals $\tilde{x}$.
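The whitening step can be sketched as follows; the two-channel toy data and mixing matrix are invented for the illustration.

```python
# Sketch of PCA whitening (equations B1 & B2): eigendecompose the sample
# covariance and check that V = E D^{-1/2} E^T whitens the data.
import numpy as np

rng = np.random.default_rng(5)
A = np.array([[2.0, 0.5],
              [0.3, 1.0]])
x = A @ rng.standard_normal((2, 100_000))   # correlated observations

cov = np.cov(x)
d, E = np.linalg.eigh(cov)                  # cov = E diag(d) E^T   (B1)
V = E @ np.diag(d ** -0.5) @ E.T            # whitening matrix      (B2)
x_white = V @ x                             # cov(x_white) = identity
```

After this transformation the observed channels are uncorrelated with unit variance, which is the starting point for the blind-source separation in Appendix C.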
Appendix C Blind source separation
At the heart of the algorithm lies the blind-source separation routine. To obtain the demixing matrix $W$, many different types and varieties of algorithms are used in the literature. Here we use the 'MultiCOMBI' algorithm developed by Tichavský et al. (2006a), which combines a fixed-point high-order ICA algorithm for separating non-Gaussian sources with a second-order statistics blind-source separation (BSS) algorithm for separating autoregressive (AR) sources. We will briefly outline these algorithms and explain how they are applied to the whitened data obtained in section 3.1.
C.1 Parallel FastICA and EFICA
In section 2.2 we briefly outlined the measure of non-Gaussianity, negentropy (equation 8), used throughout this paper and stated that negentropy can be approximated via the kurtosis of the random vector or via the use of contrast functions (equations 9 & 10). In section 2.2.3 we stated the iteration scheme used to obtain a single independent component (IC) at a time. This is called a deflationary algorithm, where the computed IC is subtracted from the data before the second IC is computed. Such an iteration scheme has the property of finding the ICs in order of decreasing non-Gaussianity. However, the main drawback of a serial computation of ICs is that estimation errors in the first ICs propagate into the extraction of later ICs via the orthogonalisation step. This effect is cumulative and may significantly impair weaker ICs. This predicament can be circumvented by estimating all ICs in parallel.
Similar to the single unit iteration, the projections of the whitened demixing matrix $\tilde{W}$ are maximally mutually independent when they are at their most non-Gaussian. The FastICA fixed-point iteration step is then given by

$$\tilde{W}^{+} = g(\tilde{W}\tilde{x})\,\tilde{x}^{T} - \mathrm{diag}\!\left(g'(\tilde{W}\tilde{x})\,\mathbf{1}_{N}\right)\tilde{W} \eqno{(C1)}$$

where $\tilde{W}^{+}$ is the unnormalised next iteration of $\tilde{W}$, $\mathbf{1}_{N}$ is an $N \times 1$ vector of 1's, and $g$ and $g'$ are the first and second order derivatives of the nonlinear function $G$:

$$g(u) = \frac{\partial G(u)}{\partial u}\,, \qquad g'(u) = \frac{\partial^{2} G(u)}{\partial u^{2}} \eqno{(C2)}$$
This is followed by a symmetric orthogonalisation step:
$\tilde{\mathbf{W}} \leftarrow (\tilde{\mathbf{W}}\tilde{\mathbf{W}}^{T})^{-1/2}\,\tilde{\mathbf{W}}$ (C3)
For a full derivation we refer the reader to Hyvärinen (1999) and Hyvärinen & Oja (2001). Whereas the convergence of the FastICA algorithm often depends on the nonlinearity chosen by the user, the EFICA algorithm (Koldovský et al., 2006) employed here is a variant of the above iteration scheme that allows different nonlinearities to be assigned adaptively to different sources. Koldovský et al. (2006) showed that EFICA is asymptotically efficient, i.e. it reaches the Cramér-Rao Lower Bound (CRLB) in the ideal case where the nonlinearity equals the score function.
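The parallel iteration of equations C1-C3 can be sketched as follows. This is a minimal symmetric FastICA with the fixed tanh contrast, not the adaptive EFICA variant; the toy sources, mixing matrix and function names are illustrative assumptions:

```python
import numpy as np

def sym_orth(W):
    # symmetric orthogonalisation, W <- (W W^T)^{-1/2} W (cf. equation C3)
    d, E = np.linalg.eigh(W @ W.T)
    return E @ np.diag(d ** -0.5) @ E.T @ W

def fastica_parallel(x_t, n_iter=100, seed=0):
    """Estimate all ICs in parallel from whitened data x_t (signals x samples),
    using g = tanh and g' = 1 - tanh^2 (cf. equations C1 & C2)."""
    N, T = x_t.shape
    W = sym_orth(np.random.default_rng(seed).standard_normal((N, N)))
    for _ in range(n_iter):
        g = np.tanh(W @ x_t)
        # fixed-point step followed by re-orthogonalisation
        W = sym_orth(g @ x_t.T / T - np.diag((1.0 - g**2).mean(axis=1)) @ W)
    return W

# demix a toy mixture of two non-Gaussian (uniform and Laplacian) sources
rng = np.random.default_rng(1)
s = np.vstack([rng.uniform(-1, 1, 20000), rng.laplace(size=20000)])
s = (s - s.mean(axis=1, keepdims=True)) / s.std(axis=1, keepdims=True)
A = np.array([[1.0, 0.5], [0.4, 1.0]])
x = A @ s
d, E = np.linalg.eigh(np.cov(x))
V = E @ np.diag(d ** -0.5) @ E.T      # whitening matrix (cf. equation B1)
W = fastica_parallel(V @ x)
G = np.abs(W @ V @ A)                 # gain matrix: ~ permutation if separated
```

After convergence, each row of the gain matrix `G` is dominated by a single entry, i.e. each estimated component picks out one source up to the usual sign and permutation ambiguity.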
To assess the degree of separation, we can define $\mathbf{G} = \hat{\mathbf{W}}\mathbf{A}$ as the gain matrix. For a perfectly estimated demixing matrix, $\hat{\mathbf{W}} = \mathbf{A}^{-1}$, the gain matrix is equal to the identity matrix:
$\mathbf{G} = \hat{\mathbf{W}}\mathbf{A} = \mathbf{I}$ (C4)
In signal processing, the performance of blind-source separation algorithms is usually measured by the interference-over-signal ratio (ISR) matrix,
$\mathrm{ISR}_{kl} = \frac{\mathbf{G}_{kl}^{2}}{\mathbf{G}_{kk}^{2}}$ (C5)
where the indices $k$ and $l$ run over the observed sources $s_{k}$ and the estimated sources $\hat{s}_{l}$. The ISR for an individual observed signal is given by
$\mathrm{ISR}_{k} = \sum_{l \neq k} \mathrm{ISR}_{kl}$ (C6)
However, the original mixing matrix, $\mathbf{A}$, is not generally known for real data sets and equations C5 & C6 are only useful in the case of simulations. Tichavský et al. (2006a) have shown that the whole ISR matrix for the EFICA algorithm can be approximated by
$\mathrm{ISR}^{EF}_{kl} \approx \frac{1}{N}\,\frac{\gamma_{k}(\gamma_{l} + \tau_{l}^{2})}{\tau_{k}^{2}(\gamma_{l} + \tau_{l}^{2}) + \gamma_{k}\gamma_{l}}$ (C7)
$\gamma_{k} = \beta_{k} - \mu_{k}^{2}, \qquad \tau_{k} = |\mu_{k} - \rho_{k}|$ (C8)
where $\mu_{k} = \mathrm{E}\{s_{k}\,g_{k}(s_{k})\}$, $\rho_{k} = \mathrm{E}\{g_{k}'(s_{k})\}$ and $\beta_{k} = \mathrm{E}\{g_{k}^{2}(s_{k})\}$; here $s_{k}$ and $\hat{s}_{l}$ are the $k$'th and $l$'th observed and estimated signals of $\mathbf{S}$ in equation 3, $g_{k}$ and $g_{k}'$ are the first and second derivatives of $G$ for signal $k$, and $N$ is the number of time samples. It should be mentioned that the true realisation of each ISR component is, of course, unknown and a mean ISR is computed, leading to the best 'on average' separation of the signals.
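In a simulation, where the true mixing matrix is known, the gain matrix and ISR of equations C4-C6 can be computed directly. A hedged sketch (the permutation handling is one possible convention, not the paper's):

```python
import numpy as np

def isr_from_gain(W_hat, A):
    """ISR matrix from the gain matrix G = W_hat A (cf. equations C4-C6).

    Only computable in simulations, where the true mixing matrix A is known.
    """
    G = W_hat @ A
    # undo the permutation ambiguity: move each row's dominant entry to the diagonal
    order = np.argsort(np.argmax(np.abs(G), axis=1))
    G2 = G[order] ** 2
    isr = G2 / np.diag(G2)[:, None]       # ISR_kl = G_kl^2 / G_kk^2
    np.fill_diagonal(isr, 0.0)            # interference only, no self term
    return isr

A = np.array([[1.0, 0.7], [0.2, 1.0]])

# perfect demixing, W = A^{-1}: G = I and the interference vanishes (equation C4)
isr_perfect = isr_from_gain(np.linalg.inv(A), A)

# a slightly wrong demixing matrix leaves a small residual interference
isr_noisy = isr_from_gain(np.linalg.inv(A) + 0.01, A)
isr_per_signal = isr_noisy.sum(axis=1)    # equation C6: row sums of the ISR matrix
```

The per-signal ISR of equation C6 is simply the row sum of the off-diagonal ISR entries.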
C.2 WASOBI
Whilst EFICA is optimised for the separation of instantaneously mixed, non-Gaussian sources, second-order statistics BSS algorithms rely on time structure in the sources' correlation functions to estimate $\mathbf{W}$. A variety of algorithms exist in the literature; here we use a derivative of the popular SOBI algorithm (Belouchrani et al., 1997), WASOBI (Yeredor, 2000; Tichavský et al., 2006a), to separate Gaussian autoregressive (AR) sources in the input data $\mathbf{x}$. The blind source separation follows the same linear model as in equation 3 and the mixing matrix is estimated by a joint diagonalisation of the signals' autocorrelation matrices. The unknown correlation matrices of the observed signals for a given time-lag $\tau$,
$\mathbf{R_{x}}(\tau) = \mathrm{E}\{\mathbf{x}(t)\,\mathbf{x}(t+\tau)^{T}\}$ (C9)
satisfy the relation
$\mathbf{R_{x}}(\tau) = \mathbf{A}\,\mathbf{R_{s}}(\tau)\,\mathbf{A}^{T}$ (C10)
where $\mathbf{R_{s}}(\tau)$ are the source signals' diagonalised correlation matrices (Yeredor, 2000). Hence, if the correlation matrices $\mathbf{R_{s}}(\tau)$ are diagonal, i.e. the off-diagonal components are zero, the separated signals can be said to be independent of each other. The SOBI & WASOBI algorithms estimate $\mathbf{A}$ as the joint diagonaliser of a set of such correlation matrices. Similar to the EFICA code, we can define an asymptotic estimate of the ISR matrix
$\mathrm{ISR}^{WA}_{kl} \approx \frac{1}{N}\,\frac{\phi_{kl}}{\phi_{kl}\phi_{lk} - 1}$ (C11)
$\phi_{kl} = \frac{1}{\sigma_{l}^{2}}\sum_{i,j=0}^{M} a_{i}^{(l)}\,a_{j}^{(l)}\,R_{k}(i-j)$ (C12)
where $s_{k}$ and $\hat{s}_{l}$ denote the observed and the estimated sources, $R_{k}(\tau)$ is the covariance sequence of the $k$'th source, $\sigma_{l}^{2}$ is the variance of the $l$'th source's driving noise, $a_{i}^{(l)}$ are the autoregression coefficients of the $l$'th source of AR order $M$, and $N$ is the number of time samples (Tichavský et al., 2006a).
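The lagged correlation matrices of equation C9 and the relation of equation C10 are straightforward to verify numerically. A sketch with synthetic Gaussian AR(1) sources; the AR coefficients and mixing matrix are illustrative choices:

```python
import numpy as np

def corr_matrices(x, lags):
    """Sample correlation matrices R_x(tau) = E[x(t) x(t+tau)^T] (cf. equation C9)."""
    x = x - x.mean(axis=1, keepdims=True)
    T = x.shape[1]
    return [x[:, :T - tau] @ x[:, tau:].T / (T - tau) for tau in lags]

# two independent Gaussian AR(1) sources with distinct time structure
rng = np.random.default_rng(2)
T, phi = 50000, np.array([0.9, -0.5])
s = np.zeros((2, T))
for t in range(1, T):
    s[:, t] = phi * s[:, t - 1] + rng.standard_normal(2)

A = np.array([[1.0, 0.4], [0.3, 1.0]])
R_s = corr_matrices(s, [1])[0]        # ~diagonal, since the sources are independent
R_x = corr_matrices(A @ s, [1])[0]    # satisfies R_x = A R_s A^T (cf. equation C10)
```

Because the sources have distinct autocorrelations at lag 1, jointly diagonalising such matrices fixes the rotation left ambiguous by second-order statistics at lag zero, which is what SOBI/WASOBI exploit.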
C.3 MULTICOMBI
The algorithms introduced above are highly complementary to each other. Whilst EFICA is asymptotically efficient in separating non-Gaussian instantaneous mixtures, WASOBI is asymptotically efficient in separating Gaussian time-correlated signals. Both properties are necessary, since a real data set will typically contain both kinds of sources, and its components would hence not be optimally demixed if only one type of algorithm were employed. MULTICOMBI (Tichavský et al., 2006a) uses a clustering technique in which both algorithms are run on the set of unseparated sources and their interference-over-signal matrices, $\mathrm{ISR}^{EF}$ and $\mathrm{ISR}^{WA}$, are estimated. The signals are then clustered depending on whether their specific ISR is lower for the EFICA or the WASOBI case. The process is repeated until all clusters are singletons, i.e. contain only one signal per cluster, and the signals are hence optimally separated.
Appendix D Convergence check
From the MULTICOMBI algorithm, we obtain the estimated signal matrix $\hat{\mathbf{S}}$, an overall ISR matrix, as well as the final $\hat{\mathbf{W}}$ and $\hat{\mathbf{A}}$. Since the algorithms used here rely on fixed-point convergence techniques, the problem of non-repeatability of the separation process is smaller than for neural-network-based approaches. Nonetheless, it is prudent to check the stability of the result obtained and to estimate the error on $\hat{\mathbf{W}}$.
In order to estimate the stability of the convergence, we perturb the unknown mixing matrix $\mathbf{A}$ with a random but known mixing matrix $\mathbf{P}$ to give a new mixing matrix $\mathbf{A}_{new} = \mathbf{P}\mathbf{A}$, and equation 3 becomes $\mathbf{x}_{new} = \mathbf{P}\mathbf{A}\mathbf{s}$. This is equivalent to multiplying the whitened signal $\tilde{\mathbf{x}}$ with $\mathbf{P}$:
$\tilde{\mathbf{x}}_{new} = \mathbf{P}\tilde{\mathbf{x}}$ (D1)
We re-run the separation step and estimate $\hat{\mathbf{W}}_{new}$. Since $\mathbf{P}$ is known, we can reconstruct the original demixing matrix, $\hat{\mathbf{W}}_{rec} = \hat{\mathbf{W}}_{new}\mathbf{P}$, and compare it with the unperturbed result. In the scope of an automated algorithm, the sum of all terms of $\hat{\mathbf{W}}_{rec}$ is compared to the sum of all terms of $\hat{\mathbf{W}}$ and the result is reported.
To quantify the stochastic nature of the retrieval, we furthermore re-run the separation step with the same whitened signal $\tilde{\mathbf{x}}$, akin to a Monte Carlo simulation. We perform several such realisations and use the resulting demixing matrices to construct mean noise models later on. This way, we propagate the signal separation error to the model fitting in a coherent manner.
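The perturbation check of Appendix D can be sketched as follows. For brevity we substitute a minimal AMUSE-style second-order separator for the full MULTICOMBI step; that substitution, and all names and parameters, are assumptions for illustration only:

```python
import numpy as np

def amuse(x_t):
    """Minimal second-order separator: rotate whitened data so that the
    symmetrised lag-1 correlation matrix becomes diagonal (a stand-in
    for the full separation step)."""
    T = x_t.shape[1]
    R1 = x_t[:, :-1] @ x_t[:, 1:].T / (T - 1)
    _, U = np.linalg.eigh((R1 + R1.T) / 2)
    return U.T

def stability_check(x_t, separate, seed=0):
    """Perturb the whitened signals with a known random orthogonal matrix P,
    re-run the separation and undo the perturbation (cf. Appendix D).
    Returns the difference between the summed demixing matrices."""
    N = x_t.shape[0]
    P, _ = np.linalg.qr(np.random.default_rng(seed).standard_normal((N, N)))
    W = separate(x_t)
    W_rec = separate(P @ x_t) @ P       # reconstruct: W_rec = W_new P (cf. D1)
    return abs(np.abs(W_rec).sum() - np.abs(W).sum())

# whitened mixture of two AR(1) sources
rng = np.random.default_rng(4)
T = 20000
s = np.zeros((2, T))
for t in range(1, T):
    s[:, t] = np.array([0.8, -0.6]) * s[:, t - 1] + rng.standard_normal(2)
x = np.array([[1.0, 0.5], [0.2, 1.0]]) @ s
d, E = np.linalg.eigh(np.cov(x))
x_t = E @ np.diag(d ** -0.5) @ E.T @ x
diff = stability_check(x_t, amuse)      # ~0 for a stable, repeatable separation
```

Comparing sums of absolute entries, as in the text, makes the check insensitive to the sign and permutation ambiguities inherent to BSS.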
Appendix E Signal separation
In order to identify the non-white (i.e. systematic) signals in our estimated signal matrix $\hat{\mathbf{S}}$, we use the Ljung-Box portmanteau test (Brockwell & Davis, 2006). The test statistic, usually denoted by $Q$, is defined by summing the squared, normalised autocorrelations of the individual time series over a range of lags $h$:
$Q = n(n+2)\sum_{\tau=1}^{h}\frac{\hat{\rho}^{2}(\tau)}{n-\tau}$ (E1)
where $\hat{\rho}(\tau)$ is the autocorrelation at lag $\tau$ and $n$ is the number of observations in the time series. The hypothesis that the time series is solely white noise is rejected if $Q$ exceeds a prespecified quantile of the chi-squared distribution,
$Q > \chi^{2}_{1-\alpha,\,h}$ (E2)
where $\chi^{2}_{1-\alpha,\,h}$ is the $1-\alpha$ quantile of the chi-squared distribution with $h$ degrees of freedom, for a prespecified significance level $\alpha$ (Brockwell & Davis, 2006).
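A minimal implementation of equations E1 & E2, using SciPy's chi-squared quantile; the lag count $h = 20$ and $\alpha = 0.05$ are illustrative defaults and not values taken from the text:

```python
import numpy as np
from scipy import stats

def ljung_box(y, h=20, alpha=0.05):
    """Ljung-Box portmanteau test for whiteness (cf. equations E1 & E2).

    Returns (Q, reject): reject=True rejects the white-noise hypothesis,
    flagging the series as carrying autocorrelated (systematic) signal.
    """
    y = np.asarray(y, dtype=float)
    y = y - y.mean()
    n = len(y)
    denom = np.dot(y, y)
    lags = np.arange(1, h + 1)
    # normalised sample autocorrelations rho_hat(tau)
    rho = np.array([np.dot(y[:-tau], y[tau:]) / denom for tau in lags])
    Q = n * (n + 2) * np.sum(rho**2 / (n - lags))           # equation E1
    return Q, Q > stats.chi2.ppf(1 - alpha, df=h)           # equation E2

rng = np.random.default_rng(5)
white = rng.standard_normal(2000)
trend = white + 0.5 * np.sin(np.linspace(0.0, 40.0, 2000))  # add a systematic term
Q_w, reject_w = ljung_box(white)
Q_t, reject_t = ljung_box(trend)    # the time-correlated series is flagged
```

A pure white-noise series only exceeds the threshold with probability $\alpha$, whereas even a modest time-correlated component drives $Q$ far above the chi-squared quantile.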