ACICA algorithm

On signals faint and sparse: the Acica algorithm for blind de-trending of Exoplanetary transits with low signal-to-noise

I. P. Waldmann University College London, Gower Street, WC1E 6BT, UK ingo@star.ucl.ac.uk
Abstract

Independent Component Analysis (ICA) has recently been shown to be a promising new path in data analysis and de-trending of exoplanetary time series signals. Such approaches do not require or assume any prior or auxiliary knowledge on the data or instrument in order to de-convolve the astrophysical light curve signal from instrument or stellar systematic noise. These methods are often known as ‘blind source separation’ (BSS) algorithms. Unfortunately all BSS methods suffer from a amplitude and sign ambiguity of their de-convolved components which severely limits these methods in low signal-to-noise (S/N) observations where their scalings cannot be determined otherwise. Here we present a novel approach to calibrate ICA using sparse wavelet calibrators. The Amplitude Calibrated Independent Component Analysis (ACICA) allows for the direct retrieval of the independent components’ scalings and the robust de-trending of low S/N data. Such an approach gives us an unique and unprecedented insight in the underlying morphology of a data set, making this method a powerful tool for exoplanetary data de-trending and signal diagnostics.

methods: data analysis — methods: statistical — techniques: photometric — techniques: spectroscopic

1 Introduction

As we explore smaller and smaller extrasolar planet around ever fainter stars, it is unsurprising that the need for ever more accurate data-calibration and de-trending techniques is a growing one. In the recent past, there has been a notable emergence of so called ‘non-parametric’ data de-trending algorithms in the fields of transiting extrasolar planet and time-resolved exoplanetary spectroscopy (Carter & Winn, 2009; Thatte et al., 2010; Gibson et al., 2012; Waldmann, 2012; Waldmann et al., 2013). The use of such ‘non-parametric’ algorithms is a reactionary response to the difficulties of calibrating and de-trending time series observations when the instrument response function is not known at the precision of the science signal to be extracted. Previous, more conventional ‘parametric’ de-trending approaches rely on auxiliary information of the instrument (e.g. temperature sensor readings, telescope tilt, drifts in x and y positions of the science signal across the detector, etc.). Such methods have the disadvantage of being heavily reliant on the signal-to-noise (S/N) of the auxiliary information used to de-trend the science data, as well as suffering from a degree of arbitrariness in their definition of the instrument response function.

In Waldmann (2012) and Waldmann et al. (2013), we have demonstrated independent component analysis (ICA) as novel de-correlation strategy for exoplanetary time series. ICA belongs to the class of blind-source separation (BSS) algorithms, which attempt to de-correlate an observed mixture of signals into its individual source components without prior knowledge of the original signals nor the way they were mixed together. Such an approach requires the least amount of information on a given system and hence ensures a high degree of objectivity in the de-trending of data.

1.1 Limitations of conventional ICA

Whilst it has been shown that ICA is well suited to the de-correlation of non-Gaussian signals in simultaneously observed exoplanetary time series, it has two mayor limitations that will be addressed in this paper. These are:

1) Susceptibility to Gaussian noise: to de-correlate non-Gaussian signals, ICA is inherently limited to a low degree of Gaussian white noise in the observed time series observations. This so far posed a significant limitation on the types of data that can be de-correlated. Waldmann et al. (2013) showed that medium to high-SNR space based observations are somewhat permissible but noisier ground-based observations of exoplanetary time series are often out of reach for conventional ICA algorithms.

2) Amplitude and Sign ambiguity: Like all blind source separation (BSS) algorithms, ICA can de-correlate signals up to an amplitude and sign ambiguity. As explained in section 1.2, the algorithm attempts to simultaneously estimate the source signals, , as well as their respective mixings (the mixing matrix), that represent our observations . Given both and are unknown, a scalar multiplication of either can be canceled by an equal division of the other. Hence no BSS algorithms attempt the retrieval of scalar amplitudes of the source signals . Waldmann et al. (2013) resolved this by iteratively fitting components of onto observed out-of-transit data to retrieve the lost scaling factors. Whilst this is a valid approach, it again limits us to high-SNR observations as too much scatter in the observed time-series inhibits a good convergence of such a scaling factor regression.

In this paper we will address both these limitations by defining ICA in orthogonal wavelet space. In the wavelet domain, as explained in later sections, we can threshold Gaussian wavelet coefficients and increase the signal’s sparsity, making the ICA algorithm more robust in difficult S/N conditions. We can furthermore inject a sparse wavelet coefficient calibration signal that allows us to directly calibrate the amplitudes of the mixing matrix A without the need of any post-ICA scaling factor regression.

A quick introduction to BSS and Wavelets are given in sections 1.2 & 1.3, a description of the Wavelet-ICA and noise thresholding in section 2. Section 2.1 describes the amplitude calibration algorithm which is demonstrated in sections 3.1 & 3.2 using simulations and Spitzer/IRS data respectively.

1.2 Blind Source Separation

Besides ICA, other BSS algorithms include principal component analysis (PCA), factor analysis (FA), projection pursuit (PP), non-negative matrix factorisation (NMF), stationary subspace analysis (SSA), morphological component analysis (MCA) amongst others. For an extensive review of these algorithms we refer the interested reader to Comon & Jutten (2010) as well as relevant ICA literature (Oja, 1992; Hyvärinen, 1999; Hyvärinen & Oja, 2000; Hyvärinen & Oja., 2001; Stone, 2004; Koldovský et al., 2006, 2005; Yeredor, 2000; Tichavský et al., 2008). Whereas the underlying statistical assumptions differ significantly, all these algorithms take simultaneously observed signals , where is the observed signal index, and de-correlate these into the source signals , where is the source signal index. They all follow the functional form

(1)

where are scaling factors. Assuming that the exoplanetary observation consists of a mixture of astrophysical signal, , instrument or stellar systematic noise, , and the white noise signal, , from a Gaussian process , we can express equation 1 as sum of vectors (the time-dependance has been dropped for clarity):

(2)

where is the number of systematic noise sources. Finally this can also be expressed as column vectors ,
and the mixing matrix ,

(3)

We can furthermore define the ‘de-mixing matrix’ as the inverse of the ‘mixing matrix’

(4)

For a perfect de-correlation of the observed data into its source components, the original mixing matrix is the perfect inverse of the estimated de-mixing matrix and , where is the identity matrix.

ICA attempts to estimate both and the de-mixing matrix by assuming that all components of are statistically independent of one another. This is achieved by iteratively maximising the non-Gaussianity of each signal component by estimating their respective Shannon entropies (Shannon, 1948; Hyvärinen & Oja, 2000). For more information we refer the reader to Waldmann (2012) and the standard literature.

1.3 Introduction to Wavelets

Readers familiar with wavelet decompositions may jump to section 2.

Similar to a Fourier Transform (FT), the Wavelet Transform (WT) decomposes a given time series signal into its frequency components. Where the FT uses sine and cosine functions that extend over the full range of the data, the WT uses highly localised impulses. These impulses or ‘wavelets’ scan through the time series and much like a tuning fork to an instrument, ‘resonate’ with localised features in the time series that are akin to the wavelet’s shape and scaling. The individual wavelet basis functions are derived from a single mother wavelet through translation and dilation of the mother wavelet (Percival & Walden, 2000). Different wavelets exist with different analytical properties, here we use the orthogonal basis wavelets of the Daubechies (db) family (Daubechies, 1988). The wavelet analogue to the Fourier transform of the times series is given by:

(5)

where is called the ‘mother wavelet’ for a given scaling and translation and is the wavelet coefficient with respect to and . We define the mother wavelet for the continuous wavelet transform (CWT) as

(6)

The wavelet base is orthogonal and we can hence reconstruct the data by taking the sum of the product of all coefficients for a given scale and translation, , with the respectively scaled and translated mother wavelet

(7)

For a more in-depth definition of wavelets and their respective properties we refer the interested reader to Daubechies (1992) and Percival & Walden (2000).

1.3.1 Multi-resolution analysis

The above equations apply to the CWT case. The wavelet coefficients describe the correlation between the wavelet at varying scales (or frequencies). These can be calculated by changing the scale of the wavelet, i.e. the analysis window. We can hence speak of a multi-resolution decomposition, where each scaling of the mother wavelet denotes a given resolution. Here, the analogy to the Fourier Transform would be band-pass filters of varying size. Most times it is more sensible to exploit the discrete nature of the data and to define the discrete wavelet transform (DWT). The DWT is significantly easier to implement and faster to compute. Similarly to the continuous case, in the DWT we have a ‘mother’ wavelet and a scaling function, also known as ‘father’ wavelet. Here, the ‘mother’ wavelet is denoted by and the ‘father’ by (Daubechies, 1992; Percival & Walden, 2000; Press et al., 2007). It is important to note that unlike in the CWT case, where the ‘mother’ wavelet itself is scaled to represent different frequencies in the data, this is not the case in the DWT. In the DWT, in analogy with the Fourier Transform, and can be thought of as high-pass and low-pass frequency filters respectively. Different scalings are then achieved by progressively ‘down-sampling’ the data.

The DWT is best understood by following the individual steps of the algorithm that computes the transform:

  1. The observed, discrete time series, , is convolved with the ‘mother’ wavelet :

    (8)

    where denotes the convolution of with and represent the ‘mother’ wavelet coefficients for a given scale . As mentioned earlier, the ‘mother’ wavelet, acts as a high-pass filter, sensitive to the high frequencies or details of the time series. We hence refer to the coefficients of as detail coefficients.

  2. The next step is to convolve the same time series, , with the scaling function or ‘father’ wavelet:

    (9)

    As opposed to the ‘mother’ wavelet, the ‘father’ wavelet acts as a low-pass filter of the time series and its coefficients can be viewed as a moving average of the underlying trend of . We hence refer to its coefficients as average coefficients and denote them with . Furthermore, the low-pass filter is related to the high-pass filter by

    (10)

    where is the filter length and corresponds to the number of points in the time series .

  3. We now have two sets of time series, a low-pass filtered, moving-average time series, , and a high-pass filtered time series, . We record the coefficients and proceed with our analysis of the average coefficients, . Since half of the frequencies in (namely the high-pass ones) have been removed by equation 9, the Nyquist theorem tells us that we are oversampled by a factor of two. We can hence remove every second coefficient in without losing information. This operation is termed ‘down-sampling’ and abbreviated by , leaving us with coefficients to describe . Similar applies to the detailed coefficients . The detail and average coefficients are hence given by:

    (11)
    (12)
  4. The Nyquist down-sampling introduces the concept of scaling or multiple resolutions. If we now repeat steps 1-3 on the down-sampled average coefficients, , we obtain a new set of coefficients ( and ) on a scale that is double the size of the previous decomposition. This is illustrated in figure 1 as flow chart and a data example is given in figure 2.

Figure 1: Outline of multi-resolution wavelet decomposition down to the 3rd decomposition level.
Figure 2: Example of a discrete wavelet transform (DWT). TOP: sinusoidal time series with Gaussian noise and saw-tooth functions superimposed. BOTTOM: four level DWT decomposition of a noisy sinusoid using symlet-5 wavelets. It can be seen that the average coefficients, , represent a ‘moving average’ of the data, whereas the detailed coefficients, , represent bands of higher frequencies.

For a given scale, , the data can now be reconstructed by reversing the above process:

(13)

where are the total number of decompositions.

For an algorithmic implementation using quadrature mirror filters (QMFs) see appendix A and Press et al. (2007).

2 Wavelet ICA

We now perform the DWT on each observed time series, , and obtain a series of average coefficients, , and detail coefficients for a given scale, . For our ICA decomposition, we use these series of coefficients instead of the time domain time-series, , and define our observed data as

(14)

and similarly we can express our source signals, , as the wavelet equivalent

(15)

From here onwards we will use to denote the wavelet domain presentation of the time-domain signal . We hence restate equation 3 as

(16)

There are several important considerations to note here:

As we are dealing with a multi-resolution analysis, it becomes possible to perform the blind source separation on individual scales (or bandpasses) only or to actively exclude some frequency ranges from the analysis. Such an approach may be advantageous if one has prior knowledge of the signals’ frequency bandwidths and wishes to restrict the impact of high-frequency noise or other signals on the BSS of a given signal. An example of this is given by Lin & Zhang (2005). In this paper, we do not take this approach for reasons described in section 2.1.

Transforming the observed data into wavelet basis sets increases the redundancy of the data. This is advantageous in the case of overcomplete ICA, where more source signals are present in the data than time series observed. This overcompleteness leads to an improper separation of the independent source signals in the data. Increasing the data’s redundancy has been shown to alleviate this problem and makes it possible to efficiently use ICA in very restricted data ranges (Inuso et al., 2007; Mammone et al., 2012).

Furthermore as pointed out in section 1.1, ICA is inefficient in the presence of high frequency scatter. This is alleviated in the wavelet space as by the virtue of the multi-resolution analysis, most high-frequency scatter is contained in the coefficients and a better degree of separation can be achieved for lower frequency systematics. We will demonstrate this property in section 3.2. It is also possible in wavelet space to selectively suppress Gaussian noise via soft or hard thresholding (e.g. Stein, 1981; Donoho, 1995) and hence increase the robustness of the BSS algorithm in low S/N conditions.

2.1 Amplitude Calibrated ICA (Acica)

Arguably the central problem with an exoplanetary data de-trending algorithm based on ICA is the scale and sign ambiguity of the de-correlated, independent components.

Whilst we do not know the scaling of individual source components in , we know by definition of the ICA pre-processing step (whitening) that each source signal is normalised to unit variance, , and that . Hence the elements of do not need to preserve the absolute but the relative scalings of . If we know the absolute scaling of one source signal, we can hence solve the scaling degeneracy of all signals. This is usually possible for simulations where we control the inputs but not for real life examples where the source signals and their scalings are a priori unknown. One way to solve this predicament is via the introduction of a calibration signal (CS). Such a CS must have the following properties:

1) Minimal to no impact on the data: The introduction of a CS should not distort any underlying signals or their amplitudes.

2) Temporal localisation: The CS should not be located in in-transit (INT) regions and not take up too much out-of-transit (OOT) data.

3) Stability to noise: The CS should not be affected by the noise of the data (be it Gaussian or otherwise).

4) Non-Gaussianity: The signal should be non-Gaussian enough for the ICA to recognise it but not more non-Gaussian than the science signal itself. A too prominent CS could bias the ICA towards the retrieval of the CS and could impair the retrieval of other non-Gaussian signals.

In the time domain, it is difficult to implement property 4 without violating property 1. In order to implement a distinct enough non-Gaussian signal in the time domain, one needs to significantly alter large sections of the data. Also the treatment of noise (property 3) is problematic. In the frequency domain, property 2 is violated as the Fourier transform does not contain temporal information and the CS would superimpose the science signal. However, all four criteria can be met in the wavelet domain.

2.1.1 Injecting the Calibration Signal

In the wavelet domain, we can introduce a much sparser CS than in the time domain and have control over its temporal location (unlike in the frequency domain) via the lag term . Given that is more redundant than , we can minimise the impact on the observed data or avoid any alterations to the data altogether by selecting wavelet coefficients of with zero or near-zero amplitudes to be used for our CS. We can now re-define equation 2 as

(17)

where denotes the observed data with CS, is the CS with the same dimensions as and is a random but known scaling constant. We define as

(18)

where are pre-selected lags for a given scale that correspond to a section of the OOT data. Note that:

1) No average coefficients, , are used in since these are not redundant enough.

2) As equation 17 suggests, only number of independent components can be extracted. Adding the CS in the the observed data may render the ICA overcomplete as one source signal will not be retrieved in favour of the CS. For large data sets () this is generally not a problem.

3) We choose to avoid the CS to be the most dominant feature in the data.

4) In equation 18 we have chosen to define to contain one non-zero coefficient per scale and lags corresponding to the same area of OOT data in the time-domain. This allows for efficient use of OOT data and a high sparsity in the wavelet domain but is entirely arbitrary otherwise.

2.1.2 Retrieving the scaling information

Having injected the calibration signal into our observations, we can now perform the ICA deconvolution (section 1.2) on the data with CS in the wavelet domain, . We identify the CS in the retrieved source signals and denote their respective elements of the mixing matrix, as . By measuring the average amplitude of the wavelet coefficients comprising the retrieved CS, , and knowing the original CS amplitude, ,

(19)

we can retrieve the scaling of the CS as well as those of the other signals contained in . We denote this calibrated mixing matrix as with its elements given by

(20)

2.1.3 Calibration error

Using the above calibration approach we consider the total error on the independent components (ICs) to be a combination of source separation error (SSE) of the individual IC and the SSE of the calibration signal components, . The SSE for the source signal can be estimated by

(21)

where is called the ‘gain matrix’ and defined as . For perfectly separated sources, we hence have where is the identity matrix. The ICA algorithms employed here (EFICA and WASOBI; Koldovský et al., 2006; Yeredor, 2000; Tichavský et al., 2006b) can be shown to be asymptotically efficient and converge to the correct solution in the limit of iterations. However in real world scenarios this is not the case and we find that the estimated mixing (or de-mixing) matrix is only approximately equal to the true underlying mixing matrix , i.e. . To estimate the SSE, we can hence inspect the variance of the matrix . Whilst it is possible to directly calculate for simulations, the true mixing of the signals is usually unknown in real data applications. Tichavský et al. (2006a); Koldovský et al. (2006) and Tichavský et al. (2008) have shown that an asymptotic estimate of is nonetheless possible. For a derivation we refer the reader to the cited literature.

We define the source separation error for the calibrated components in as the quadrature error of the individual source components’ error and the SSE of the calibration signal

(22)

3 Application Examples

3.1 Simulations

In this section we illustrate the ICA calibration approach using simulations.

1) For this we generated five input signals: a Mandel & Agol (2002) secondary eclipse curve of HD189733b, a Gaussian process with FWHM of 210 amplitude, and three instrument noise vectors derived from HST/NICMOS observations (Waldmann et al., 2013). These input signals are shown in figure 3.

2) We mix those signals using a randomly generated mixing matrix to obtain our ‘observed’ time series, , figure 4.

3) Using the MATLAB111http://www.mathworks.co.uk wavelet-toolbox and daubechies 4 (db4) wavelets (Daubechies, 1988, 1992) we calculated the discrete wavelet transform for four scales () of each time series in to obtain (figure 5, black and blue curves). In the scope of this simulation we found the choice of wavelet and number of vanishing moments not to matter greatly. However the choice of wavelet and decomposition depth may vary from data set to data set.

4) We generated the calibration signal, , to consist of one wavelet coefficient per scale giving total number of non-zero coefficients. For each series, , the calibration signal was multiplied with the random scaling factor , i.e. . Figure 5 (red peaks) shows the scaled calibration signal for each . Note that we chose the lags of the non-zero coefficient to correspond to the 95 percentile of each scale. This guarantees the CS to be located in the post-eclipse out-of-transit data. It also overlaps the differently scaled wavelet impulses and hence minimises the impact on the time-domain data representation. See figure 6 for a time-domain representation of (third from top in figure 5).

5) The blind source separation via independent component analysis as described in Waldmann (2012) was performed and the time-domain representation of the retrieved ICs are shown in figure 7.

6) Following section 2.1.2 we calculate the scaled mixing matrix and apply the scalings to individual source signals. Figure 8 shows the observed data from figure 6, blue circles, and the re-constructed data using the scaling matrix , green crosses. The scaled ICs are shown offset underneath.

Figure 3: Input signals before mixing. From to bottom: 3 systematic noise components of the HST/NICMOS instrument (Waldmann et al., 2013); a Mandel & Agol (2002) lightcurve of the secondary eclipse of HD189733b; Gaussian noise at with FWHM of 210 normalised flux.
Figure 4: Signals from figure 3 mixed together, using a random mixing matrix, to created the observed signals .
Figure 5: BLUE AND BLACK: Wavelet transform of signals in figure 4, the order corresponds to order in figure 4 (colours for clarity). GREEN dotted: denoting scale boundaries from average coefficients to the highest frequency scale . RED: Calibration signal for each series over-plotted. Four peaks are visible (one per scale) close to their scale boundaries. Their amplitudes are defined by the random scaling vector .
Figure 6: BLACK: Time domain representation of third observed component after the calibration signal was added (in figure 5). RED: Calibration signal only. Note the temporal location of the four frequency signal (red spikes in figure 5) at the far edge of the post-egress out-of-transit data.
Figure 7: Time domain representation of the independent components, , retrieved. First components is the secondary eclipse signal, second components the calibration signal. Note that these are not scaled yet.
Figure 8: TOP: Third observed time series (, figure 6, blue circles), over-plotted (green crosses) reconstructed signal using scaled components, described in section 2.1.2. Note the excellent match between observed and reconstructed time series. BOTTOM black: scaled independent components of the above signal.

3.2 Spitzer/IRS: HD189733b secondary eclipse

We test the proposed algorithm on Spitzer/IRS (Houck et al., 2004) observations of a secondary eclipse of HD189733b. These observations were obtained in November 2006 (program id: 30473) in low resolution mode ranging from 7.46 to 14.29m. The secondary eclipse was followed for 5:48 hours with integration times of 14.7 seconds per ramp. The Spitzer pipeline calibrated data were reduced using the Spice222version 2.5. http://irsa.ipac.caltech.edu/data
/SPITZER/docs/dataanalysistools/tools/spice/
spectral reduction software. Examples of the resulting time series are shown in figures 9 & 10 (blue crosses). Similar to section 3.1, we take these time series as inputs to our algorithm. Figure 11 shows the DWT, using db4 wavelets, of the time series at 7.6971m (black lines) with the injected calibration signal over plotted in red. Note that no binning in wavelength was performed before, which marks a significant difference to the HST/NICMOS analysis in Waldmann et al. (2013), where a relatively coarse binning was necessary to reduce the Gaussian noise.

We now follow through each individual step as described in the previous sections and obtain the scaled independent components of the data set. Figures 9 & 10 show two observed time series (blue crosses) with the correctly scaled individual independent components (black dots) underneath. The first and second ICs comprise the secondary eclipse signal and the calibration signal respectively. These are also shown in figure 12. The remaining ICs are instrument or stellar systematic noise. The amplitudes of these systematic components can be seen to be lower toward the blue end of the spectrum (figure 9), which is also evident in a cleaner observed time series, and more pronounced toward the red end of the spectrum (figure 10). In figure 12 we over plotted the best fit Mandel & Agol (2002) model using a MCMC fitting routine (Waldmann et al., 2013) with the transit depth as only free parameter. The transit depth posterior distribution is shown in figure 13. The total error is the quadrature sum of and the MCMC derived standard error . We obtain at 7.6971m a planet/star contrast ratio of which we find consistent with the measurement by Grillmair et al. (2007). Figure 12 also shows the retrieved calibration signal (black dots) with the original calibration signal over plotted in red. The excellent match between both indicates an adequate signal separation and the correct scaling of the independent components.

In order to demonstrate the increased efficiency of the proposed ACICA algorithm in the presence of noise, we also show the ICs derived by performing the more ‘traditional’ ICA in the time domain only (figure 14). The components in figure 14 are poorly separated and the standard ICA analysis did not converged with traces of the secondary eclipse feature present in three separate components.

Figure 9: Spitzer/IRS observations of HD189733b at 7.6971m. BLUE CROSSES: Raw time series. BLACK DOTS: first 11 most non-Gaussian independent components comprising the raw time series. The first and second components are the secondary eclipse curve and the calibration signal respectively. Remaining components are instrument or stellar systematics.
Figure 10: Spitzer/IRS observations of HD189733b at 11.6285m. BLUE CROSSES: Raw time series. BLACK DOTS: individual independent components in the same order as in figure 9. Note the significantly higher systematic noise amplitudes at the red-end of the spectrum compared to the blue end in figure 9.
Figure 11: BLACK: Wavelet transform presentation of time series in figure 9 at 7.6971m. RED: Calibration signal injected in the detailed coefficients.
Figure 12: TOP: secondary eclipse component at 7.6971m (first independent component in figure 9) with Mandel & Agol (2002) model over plotted (red). BOTTOM: retrieved calibration signal (black dots) with original input calibration signal over plotted (red line).
Figure 13: MCMC generated posterior distribution of transit depth parameter for lightcurve fit in figure 12. Distribution mean and one sigma bounds are indicated with red continuous and discontinuous lines respectively.
Figure 14: First five (unscaled) independent components of ICA analysis performed in the time domain. Given the low signal to noise of the data the convergence of the ICA algorithm is limited and it cannot separate individual sources when directly applied in the time domain. Compare this to figure 9 where ICA was performed on the wavelet coefficients and the source signals were separated correctly.

4 Discussion

Using ACICA, we can directly retrieve the signal amplitudes of the source components comprising our data. It not only allows us to non-parametrically de-trend low S/N data sets but also allows for a unique insight into the structural make up of the observations. A study of the systematic components and their amplitudes offers a powerful diagnostic on systematic noise behaviour across a detector. As shown in the Spitzer/IRS example, amplitudes of systematic components in the data can vary significantly across the dispersion of a grism. A study of these systematic components may allow, amongst others, for an optimisation of residual flat-fielding errors across a field or the characterisation of wavelength dependent slit-losses of the instrument. As data diagnostic we can hence define the S/N of a time series in terms of its systematic noise

(23)

where is the variance of a given systematic noise component at for the time series and is the respective measured transit depth.

The choice of wavelet family can be an important factor in the convergence properties of the code. For noisy data we find a db4 wavelet base to be the best compromise between slowly varying continues functions and step-like, discrete offsets in the data. Should the data’s systematics be dominated by offset patterns, such as those often induced by nodding (e.g. Grillmair et al., 2007), we find a Haar wavelet to give a sparser representation of the data than the Daubechies family. Similarly, high order Daubechies wavelets or symmetric wavelets (Symlets) are best suited if one’s data is defined by continuous evolving trends.

As mentioned in section 2, Gaussian noise can be suppressed in wavelet space by the selective thresholding of detailed coefficients. Whilst true in theory, we found it difficult to implement in practice and great care needs to be taken in the choice of thresholding rule and amplitude to avoid ‘over cleaning’ and distorting of the underlying data. Fortunately we find that convergence properties of the ICA de-convolution are much enhanced by the transformation of the data into wavelet space alone and further cleaning was (in the scope of this analysis) unnecessary.

5 Conclusion

In this paper we present a novel approach to the amplitude calibration of independent component analysis (the ACICA algorithm) via the introduction of a sparse calibration signal in orthogonal wavelet space. By transforming observed time series into series of wavelet coefficients, we are able to overcome two main limitations of blind source separation algorithms such as ICA: 1) the convergence of ICA algorithms is strongly impaired by the presence of Gaussian noise in the data. By transforming data to a sparser and more redundant basis we can significantly improve the performance of the ICA de-convolution without otherwise altering the data. 2) In the wavelet domain it is possible to inject an artificial calibration signal of known amplitude into the data without significant or any impact to the original data. With the use of such a calibration signal we can directly determine the individual absolute amplitudes of the derived independent components. This marks a significant improvement over methods such as they are discussed in Waldmann et al. (2013) where component amplitudes were derived through a regression analysis to existing out of transit baseline data.

ACICA allows us to de-trend otherwise inaccessible data sets non-parametrically. We demonstrated this using simulations and archival Spitzer/IRS data. It furthermore offers us an unprecedented and unique insight into the morphology of a data set by allowing us to directly map out temporal/wavelength dependent variations of instrumental or stellar noise in the data set.

Together, these attributes make the algorithm proposed here a highly versatile and powerful tool for exoplanetary time series analysis.

References

  • Carter & Winn (2009) Carter J.A., Winn J.N., 2009, ApJ, 704, 51
  • Claret (2000) Claret, A., 2000, AA, 363, 1081
  • Comon & Jutten (2010) Comon, P., & Jutten, C., 2010, ‘Handbook of Blind Source Separation: Independent Component Analysis and Applications’, Academic Press
  • Crouzet et al. (2012) Crouzet, N., McCullough, P. R., Burke, C. J., Long, D., 2012, axXiv: 1210.5275v1
  • Daubechies (1988) Daubechies, I., 1988, Commun. Pure Appl. Math, 41, 909
  • Daubechies (1992) Daubechies, I., 1992, ‘Ten Lectures on Wavelets’, Soc. for Industrial and Applied Mathematics
  • Donoho (1995) Donoho,D.L., 1995, IEEE Trans. on. Inf. Theory, 41,3,613
  • Gibson et al. (2012) Gibson, N. P., Aigrain, S., Roberts, S., et al., 2012, MNRAS, 419, 2683
  • Grillmair et al. (2007) Grillmair, C. J., Charbonneau, D., Burrows, A., et al., 2007, ApJL, 658, L115
  • Grillmair et al. (2008) Grillmair, C. J., Burrows, A., Charbonneau, D., Armus, L., Stauffer, J., Meadows, V., van Cleve, J., von Braun, K. & Levine, D., 2008, Nature, 456, 767
  • Haario et al. (2006) Haario, H., Laine, L., Mira, A., Saksman, E., 2006, Statistics and Computing, 16, 339
  • Haario et al. (2001) Haario, H., Saksman, E., Tamminen, J., 2001, Bernoulli, 7, 223
  • Hastings (1970) Hastings, W. K., 1970, Biometrika, 57, 97
  • Houck et al. (2004) Houck, J. R., Roellig, T. L., van Cleve, J., et al., 2004, ApJS, 154,18
  • Hyvärinen (1999) Hyvärinen A.,1999, IEEE Trans. on Neural Networks, 10, 626
  • Hyvärinen & Oja (2000) Hyvärinen A., Oja, E., 2000, Neural Networks, 13, 411
  • Hyvärinen & Oja. (2001) Hyvärinen A., Karhunen J., Oja E., 2001, ‘Independent Component Analysis’, John Wiley & Sons Inc.
  • Inuso et al. (2007) Inuso, G., La Foresta, F., Mammone, N., Morabito, F., C., 2007, Int. J. Conf. on Neural Networks, 1524
  • Koldovský et al. (2006) Koldovský Z., Tichavský P., Oja E., 2006, IEEE Trans. on Neural Networks, 17,1265
  • Koldovský et al. (2005) Koldovský Z., Tichavský P., Oja E., 2005, Proc.of IEEE/SP 13th Workshop on Stat. Signal Processing
  • Lin & Zhang (2005) Lin, J., Zhang, A., 2005, NDT&E International, 38, 421
  • Mammone et al. (2012) Mammone, N., La Foresta, F., Morabito, F. C., 2012, IEEE Sensors Journal, 12, 533
  • Mandel & Agol (2002) Mandel, K.and Agol, E., 2002, ApJL, 580, L171
  • Oja (1992) Oja, E., 1992, Neural Networks, 5, 927
  • Percival & Walden (2000) Percival,D.B., Walden, A.T.,2000, ‘Wavelet Methods for Time Series Analysis’, Cambridge Uni. Press
  • Press et al. (2007) Press, W. H., Teukolsky, S. A., Vetterling, W. T., Flannery, B. P., 2007, ‘Numerical Recipes’, Cambridge Uni. Press
  • Riley et al. (2002) Riley, K. F., Hobson, M., P., Bence, S. J., 2002, ‘Mathematical Methods for Physics and Engineering’, Cambridge Uni. Press
  • Shannon (1948) Shannon, C., 1948, Bell System Tech. J., 27, 379
  • Stein (1981) Stein, C.M., 1981, Ann. Statist., 9, 6, 1135
  • Stone (2004) Stone, J., V., 2004, ‘Independent Component Analysis: A tutorial Introduction’, A Bradford Book
  • Swain et al. (2008) Swain, M. R.,Vasisht, G. and Tinetti, G., 2008, Nature, 452, 329
  • Swain et al. (2009a) Swain, M. R., Vasisht, G., Tinetti, G., Bouwman, J., Chen, P., et al , 2009, ApJL, 690, L114
  • Swain et al. (2009b) Swain, M. R., Tinetti, G., Vasisht, G., Deroo, P., Griffith, C., et al., D., 2009, ApJ, 704, 1616
  • Swain et al. (2011) Swain, M. R. and Deroo, P. and Vasisht, G., 2011, IAU Symposium, 276, 148
  • Tichavský et al. (2006a) Tichavský P., Doron E., Yeredor A., Gomez-Herrero G., 2006, Proc. EUSIPCO-2006
  • Tichavský et al. (2006b) Tichavský P., Doron E., Yeredor A., Nielsen J., 2006, Proc. EUSIPCO-2006
  • Tichavský et al. (2008) Tichavsky, P., Koldovsky, Z., Yeredor, A., Gomez-Herrero, G. , Doron, E., 2008, IEEE Transactions on Neural Networks, 19, 421
  • Tessenyi et al. (2012) Tessenyi, M., Ollivier, M., Tinetti, G., et al., 2012, ApJ, 746, 45
  • Thatte et al. (2010) Thatte, A. and Deroo, P. and Swain, M. R., 2010, A&A, 523, 35
  • Waldmann (2012) Waldmann, I. P., 2012, ApJ, 747, 12
  • Waldmann et al. (2013) Waldmann, I P., Tinetti, G., Deroo, P., et al., 2013, ApJ accepted
  • Yeredor (2000) Yeredor A., 2000, IEEE Sig. Proc. Letters, 7, 197

Appendix A The quadrature mirror filter

A very fast and simple implementation of the DWT for multi-resolution decomposition is by constructing the quadrature mirror transformation matrix. For a Daubechies-4 wavelet we obtain four coefficients comprising and similarly . Rather than convolving the time series, with both filters separately and then down-sample, we can also construct a matrix where each odd row contains and each even row contains coefficients. This automatically down-samples the data to the new resolution . Such a matrix is called a ‘quadrature mirror filter’ (QMF) and equation A1 is an example of such (Press et al., 2007)

(A1)

where the empty spaces denote zeros. To obtain a DWT using this QMF, we multiply the QMF with the column vector containing the time-series on the right. Note the circular behaviour of the matrix at the bottom, where the wavelet coefficients wrap around to the beginning. This has important consequences as it indicates that the DWT wraps around the data and the transform at the end of the time-series is sensitive to data at the beginning of the time series. This effect can be avoided by adding sufficient zero-valued points to the time series at its beginning and end. This process is also known as ‘zero-padding’.

Comments 0
Request Comment
You are adding the first comment!
How to quickly get a good reply:
  • Give credit where it’s due by listing out the positive aspects of a paper before getting into which changes should be made.
  • Be specific in your critique, and provide supporting evidence with appropriate references to substantiate general statements.
  • Your comment should inspire ideas to flow and help the author improves the paper.

The better we are at sharing our knowledge with each other, the faster we move forward.
""
The feedback must be of minimum 40 characters and the title a minimum of 5 characters
   
Add comment
Cancel
Loading ...
50035
This is a comment super asjknd jkasnjk adsnkj
Upvote
Downvote
""
The feedback must be of minumum 40 characters
The feedback must be of minumum 40 characters
Submit
Cancel

You are asking your first question!
How to quickly get a good answer:
  • Keep your question short and to the point
  • Check for grammar or spelling errors.
  • Phrase it like a question
Test
Test description