On signals faint and sparse: the ACICA algorithm for blind detrending of exoplanetary transits with low signal-to-noise
Abstract
Independent Component Analysis (ICA) has recently been shown to be a promising new path in the data analysis and detrending of exoplanetary time series signals. Such approaches do not require or assume any prior or auxiliary knowledge of the data or instrument in order to deconvolve the astrophysical light curve signal from instrument or stellar systematic noise. These methods are often known as 'blind source separation' (BSS) algorithms. Unfortunately, all BSS methods suffer from an amplitude and sign ambiguity of their deconvolved components, which severely limits these methods in low signal-to-noise (S/N) observations where their scalings cannot be determined otherwise. Here we present a novel approach to calibrate ICA using sparse wavelet calibrators. The Amplitude Calibrated Independent Component Analysis (ACICA) allows for the direct retrieval of the independent components' scalings and the robust detrending of low S/N data. Such an approach gives us a unique and unprecedented insight into the underlying morphology of a data set, making this method a powerful tool for exoplanetary data detrending and signal diagnostics.
1 Introduction
As we explore smaller and smaller extrasolar planets around ever fainter stars, it is unsurprising that the need for ever more accurate data-calibration and detrending techniques is a growing one. In the recent past, there has been a notable emergence of so-called 'non-parametric' data detrending algorithms in the fields of transiting extrasolar planets and time-resolved exoplanetary spectroscopy (Carter & Winn, 2009; Thatte et al., 2010; Gibson et al., 2012; Waldmann, 2012; Waldmann et al., 2013). The use of such 'non-parametric' algorithms is a direct response to the difficulties of calibrating and detrending time series observations when the instrument response function is not known at the precision of the science signal to be extracted. Previous, more conventional 'parametric' detrending approaches rely on auxiliary information about the instrument (e.g. temperature sensor readings, telescope tilt, drifts in the x and y positions of the science signal across the detector, etc.). Such methods have the disadvantage of being heavily reliant on the signal-to-noise (S/N) of the auxiliary information used to detrend the science data, as well as suffering from a degree of arbitrariness in their definition of the instrument response function.
In Waldmann (2012) and Waldmann et al. (2013), we have demonstrated independent component analysis (ICA) as a novel decorrelation strategy for exoplanetary time series. ICA belongs to the class of blind-source separation (BSS) algorithms, which attempt to decorrelate an observed mixture of signals into its individual source components without prior knowledge of the original signals or of the way they were mixed together. Such an approach requires the least amount of information on a given system and hence ensures a high degree of objectivity in the detrending of data.
1.1 Limitations of conventional ICA
Whilst it has been shown that ICA is well suited to the decorrelation of non-Gaussian signals in simultaneously observed exoplanetary time series, it has two major limitations that will be addressed in this paper. These are:
1) Susceptibility to Gaussian noise: to decorrelate non-Gaussian signals, ICA inherently requires a low degree of Gaussian white noise in the observed time series. This has so far posed a significant limitation on the types of data that can be decorrelated. Waldmann et al. (2013) showed that medium- to high-S/N space-based observations are somewhat permissible, but noisier ground-based observations of exoplanetary time series are often out of reach for conventional ICA algorithms.
2) Amplitude and sign ambiguity: like all blind source separation (BSS) algorithms, ICA can only decorrelate signals up to an amplitude and sign ambiguity. As explained in section 1.2, the algorithm attempts to simultaneously estimate the source signals, $\mathbf{s}$, as well as their respective mixings (the mixing matrix, $\mathbf{A}$) that together represent our observations $\mathbf{x}$. Given that both $\mathbf{A}$ and $\mathbf{s}$ are unknown, a scalar multiplication of one can be cancelled by an equal division of the other. Hence no BSS algorithm attempts the retrieval of the absolute scalar amplitudes of the source signals $\mathbf{s}$. Waldmann et al. (2013) resolved this by iteratively fitting components of $\mathbf{s}$ onto observed out-of-transit data to retrieve the lost scaling factors. Whilst this is a valid approach, it again limits us to high-S/N observations, as too much scatter in the observed time series inhibits a good convergence of such a scaling-factor regression.
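The amplitude degeneracy described in point 2 can be demonstrated in a few lines. The sketch below (a toy 2x2 example with invented values, not the paper's pipeline) shows that dividing the mixing matrix by a constant while multiplying the sources by the same constant reproduces the observations exactly, so the absolute source amplitudes cannot be recovered blindly:

```python
# Toy demonstration of the BSS amplitude ambiguity: x = A s is reproduced
# exactly by the rescaled pair (A / c, c * s), so the absolute amplitudes
# of the sources cannot be recovered blindly. All values are invented.

def mix(A, s):
    """Return the observed mixtures x = A s (rows of s are source signals)."""
    return [[sum(A[i][j] * s[j][t] for j in range(len(s)))
             for t in range(len(s[0]))] for i in range(len(A))]

s = [[1.0, -2.0, 0.5], [0.3, 0.3, -0.6]]        # two toy source signals
A = [[2.0, 1.0], [0.5, -1.0]]                   # toy mixing matrix

c = 4.0                                         # arbitrary rescaling constant
A_scaled = [[a / c for a in row] for row in A]  # A -> A / c
s_scaled = [[v * c for v in row] for row in s]  # s -> c * s

x1, x2 = mix(A, s), mix(A_scaled, s_scaled)
assert all(abs(u - v) < 1e-12
           for r1, r2 in zip(x1, x2) for u, v in zip(r1, r2))
```

Any number of such reciprocal rescalings produce identical observations, which is why an external calibrator is needed to fix the scale.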
In this paper we will address both these limitations by defining ICA in orthogonal wavelet space. In the wavelet domain, as explained in later sections, we can threshold Gaussian wavelet coefficients and increase the signal's sparsity, making the ICA algorithm more robust in difficult S/N conditions. We can furthermore inject a sparse wavelet-coefficient calibration signal that allows us to directly calibrate the amplitudes of the mixing matrix $\mathbf{A}$ without the need for any post-ICA scaling-factor regression.
1.2 Blind Source Separation
Besides ICA, other BSS algorithms include principal component analysis (PCA), factor analysis (FA), projection pursuit (PP), non-negative matrix factorisation (NMF), stationary subspace analysis (SSA) and morphological component analysis (MCA), amongst others. For an extensive review of these algorithms we refer the interested reader to Comon & Jutten (2010) as well as the relevant ICA literature (Oja, 1992; Hyvärinen, 1999; Hyvärinen & Oja, 2000; Hyvärinen & Oja., 2001; Stone, 2004; Koldovský et al., 2006, 2005; Yeredor, 2000; Tichavský et al., 2008). Whereas the underlying statistical assumptions differ significantly, all these algorithms take simultaneously observed signals $x_i(t)$, where $i$ is the observed signal index, and decorrelate these into the source signals $s_j(t)$, where $j$ is the source signal index. They all follow the functional form
(1) $x_i(t) = \sum_{j} a_{ij}\, s_j(t)$
where $a_{ij}$ are scaling factors. Assuming that the exoplanetary observation consists of a mixture of the astrophysical signal, $s_a(t)$, instrument or stellar systematic noise, $s_k(t)$, and a white noise signal, $s_{wn}(t)$, from a Gaussian process, we can express equation 1 as a sum of vectors (the time-dependence has been dropped for clarity):
(2) $\mathbf{x} = a_{a}\,\mathbf{s}_{a} + \sum_{k=1}^{N_{sn}} a_{k}\,\mathbf{s}_{k} + \mathbf{s}_{wn}$
where $N_{sn}$ is the number of systematic noise sources. Finally, this can also be expressed in terms of the column vectors $\mathbf{x} = (x_1, x_2, \ldots)^{T}$ and $\mathbf{s} = (s_a, s_1, \ldots, s_{N_{sn}}, s_{wn})^{T}$ and the mixing matrix $\mathbf{A}$,
(3) $\mathbf{x} = \mathbf{A}\,\mathbf{s}$
We can furthermore define the 'de-mixing matrix', $\mathbf{W}$, as the inverse of the 'mixing matrix',
(4) $\mathbf{W} = \mathbf{A}^{-1}$
For a perfect decorrelation of the observed data into its source components, the original mixing matrix is the exact inverse of the estimated de-mixing matrix and $\mathbf{W}\mathbf{A} = \mathbf{I}$, where $\mathbf{I}$ is the identity matrix.
ICA attempts to estimate both $\mathbf{s}$ and the de-mixing matrix $\mathbf{W}$ by assuming that all components of $\mathbf{s}$ are statistically independent of one another. This is achieved by iteratively maximising the non-Gaussianity of each signal component via estimates of their respective Shannon entropies (Shannon, 1948; Hyvärinen & Oja, 2000). For more information we refer the reader to Waldmann (2012) and the standard literature.
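As a minimal numerical illustration of the relations above (a toy 2x2 example; in practice $\mathbf{W}$ must be estimated blindly, whereas here we cheat and invert the known mixing matrix), the sketch below verifies that the de-mixing matrix recovers the sources and that $\mathbf{W}\mathbf{A}$ equals the identity:

```python
# Sketch of eq. (4): with the true 2x2 mixing matrix A known, W = A^{-1}
# separates the mixtures perfectly and W A = I. Real ICA never has access
# to A and must estimate W from statistical independence alone.

def inv2(A):
    """Inverse of a 2x2 matrix via the closed-form determinant rule."""
    (a, b), (c, d) = A
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

def matmul(A, B):
    """Plain nested-loop matrix product."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

A = [[2.0, 1.0], [0.5, -1.0]]       # toy mixing matrix
W = inv2(A)                         # de-mixing matrix
G = matmul(W, A)                    # the 'gain matrix' of section 2.1.3

I = [[1.0, 0.0], [0.0, 1.0]]
assert all(abs(G[i][j] - I[i][j]) < 1e-12 for i in range(2) for j in range(2))
```

The product $\mathbf{G} = \mathbf{W}\mathbf{A}$ computed here is exactly the 'gain matrix' used later in section 2.1.3 to quantify the source separation error.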
1.3 Introduction to Wavelets
Readers familiar with wavelet decompositions may jump to section 2.
Similar to a Fourier Transform (FT), the Wavelet Transform (WT) decomposes a given time series signal into its frequency components. Where the FT uses sine and cosine functions that extend over the full range of the data, the WT uses highly localised impulses. These impulses or 'wavelets' scan through the time series and, much like a tuning fork held to an instrument, 'resonate' with localised features in the time series that are akin to the wavelet's shape and scaling. The individual wavelet basis functions are derived from a single mother wavelet through translation and dilation (Percival & Walden, 2000). Different wavelets exist with different analytical properties; here we use the orthogonal basis wavelets of the Daubechies (db) family (Daubechies, 1988). The wavelet analogue to the Fourier transform of the time series is given by:
(5) $W(a,b) = \int_{-\infty}^{\infty} x(t)\, \psi_{a,b}(t)\, \mathrm{d}t$
where $\psi_{a,b}(t)$ is called the 'mother wavelet' for a given scaling $a$ and translation $b$, and $W(a,b)$ is the wavelet coefficient with respect to $a$ and $b$. We define the mother wavelet for the continuous wavelet transform (CWT) as
(6) $\psi_{a,b}(t) = \frac{1}{\sqrt{a}}\, \psi\!\left(\frac{t - b}{a}\right)$
The wavelet basis is orthogonal and we can hence reconstruct the data by taking the sum of the products of all coefficients for a given scale and translation, $W(a,b)$, with the respectively scaled and translated mother wavelet:
(7) $x(t) = \sum_{a} \sum_{b} W(a,b)\, \psi_{a,b}(t)$
For a more indepth definition of wavelets and their respective properties we refer the interested reader to Daubechies (1992) and Percival & Walden (2000).
1.3.1 Multiresolution analysis
The above equations apply to the CWT case. The wavelet coefficients describe the correlation between the wavelet and the data at varying scales (or frequencies). These can be calculated by changing the scale of the wavelet, i.e. the analysis window. We can hence speak of a multiresolution decomposition, where each scaling of the mother wavelet denotes a given resolution. Here, the analogy to the Fourier Transform would be band-pass filters of varying size. It is often more sensible to exploit the discrete nature of the data and to define the discrete wavelet transform (DWT). The DWT is significantly easier to implement and faster to compute. Similarly to the continuous case, in the DWT we have a 'mother' wavelet and a scaling function, also known as the 'father' wavelet. Here, the 'mother' wavelet is denoted by $\psi$ and the 'father' by $\phi$ (Daubechies, 1992; Percival & Walden, 2000; Press et al., 2007). It is important to note that unlike in the CWT case, where the 'mother' wavelet itself is scaled to represent different frequencies in the data, this is not the case in the DWT. In the DWT, in analogy with the Fourier Transform, $\psi$ and $\phi$ can be thought of as high-pass and low-pass frequency filters respectively. Different scalings are then achieved by progressively 'downsampling' the data.
The DWT is best understood by following the individual steps of the algorithm that computes the transform:
1) The observed, discrete time series, $x[n]$, is convolved with the 'mother' wavelet $\psi$:
(8) $d[n] = (x * \psi)[n] = \sum_{k} x[k]\, \psi[n - k]$
where $*$ denotes the convolution of $x$ with $\psi$ and $d[n]$ represents the 'mother' wavelet coefficients for a given scale. As mentioned earlier, the 'mother' wavelet acts as a high-pass filter, sensitive to the high frequencies or details of the time series. We hence refer to the coefficients $d[n]$ as detail coefficients.
2) The next step is to convolve the same time series, $x[n]$, with the scaling function or 'father' wavelet:
(9) $a[n] = (x * \phi)[n] = \sum_{k} x[k]\, \phi[n - k]$
As opposed to the 'mother' wavelet, the 'father' wavelet acts as a low-pass filter of the time series and its coefficients can be viewed as a moving average of the underlying trend of $x[n]$. We hence refer to its coefficients as average coefficients and denote them with $a[n]$. Furthermore, the low-pass filter is related to the high-pass filter by
(10) $\psi[L - 1 - n] = (-1)^{n}\, \phi[n]$
where $L$ is the filter length and $n$ runs over the points of the time series $x[n]$.
3) We now have two sets of time series: a low-pass filtered, moving-average series, $a[n]$, and a high-pass filtered series, $d[n]$. We record the coefficients $d[n]$ and proceed with our analysis of the average coefficients, $a[n]$. Since half of the frequencies in $a[n]$ (namely the high-pass ones) have been removed by equation 9, the Nyquist theorem tells us that we are oversampled by a factor of two. We can hence remove every second coefficient in $a[n]$ without losing information. This operation is termed 'downsampling' and abbreviated by $\downarrow 2$, leaving us with $N/2$ coefficients to describe $x[n]$. The same applies to the detail coefficients $d[n]$. The detail and average coefficients are hence given by:
(11) $d_j[n] = \sum_{k} x[k]\, \psi[2n - k]$
(12) $a_j[n] = \sum_{k} x[k]\, \phi[2n - k]$
The Nyquist downsampling introduces the concept of scaling or multiple resolutions. If we now repeat steps 1–3 on the downsampled average coefficients, $a_j[n]$, we obtain a new set of coefficients ($a_{j+1}[n]$ and $d_{j+1}[n]$) on a scale that is double the size of the previous decomposition. This is illustrated as a flow chart in figure 1, and a data example is given in figure 2.
For a given scale, $j$, the data can now be reconstructed by reversing the above process:
(13) $x[n] = \sum_{k} a_J[k]\, \phi_{J,k}[n] + \sum_{j=1}^{J} \sum_{k} d_j[k]\, \psi_{j,k}[n]$
where $J$ is the total number of decompositions.
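The filtering, downsampling and reconstruction steps above can be sketched with the shortest orthogonal wavelet, the Haar pair (the paper itself uses Daubechies-4; the Haar filter values below are standard and are used here only to keep the example short):

```python
# One DWT level with the Haar filter pair: low-pass and high-pass filtering
# combined with downsampling by two (steps 1-3), followed by perfect
# reconstruction. A sketch only; a db4 version needs four-tap filters.

import math

h = [1 / math.sqrt(2), 1 / math.sqrt(2)]    # 'father' (low-pass) filter
g = [1 / math.sqrt(2), -1 / math.sqrt(2)]   # 'mother' (high-pass) filter

def dwt1(x):
    """One DWT level: average coefficients a and detail coefficients d."""
    a = [h[0] * x[i] + h[1] * x[i + 1] for i in range(0, len(x), 2)]
    d = [g[0] * x[i] + g[1] * x[i + 1] for i in range(0, len(x), 2)]
    return a, d

def idwt1(a, d):
    """Invert one DWT level; orthogonality of the basis makes this exact."""
    x = []
    for ai, di in zip(a, d):
        x.append(h[0] * ai + g[0] * di)   # even sample
        x.append(h[1] * ai + g[1] * di)   # odd sample
    return x

x = [4.0, 2.0, 5.0, 5.0, 1.0, 3.0, 0.0, 2.0]
a, d = dwt1(x)
assert len(a) == len(d) == len(x) // 2                     # Nyquist downsampling
assert all(abs(u - v) < 1e-12 for u, v in zip(x, idwt1(a, d)))
```

Repeating `dwt1` on the average coefficients `a` yields the next, coarser scale, exactly as the multiresolution cascade of figure 1 describes.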
2 Wavelet ICA
We now perform the DWT on each observed time series, $x_i[n]$, and obtain a series of average coefficients, $a_J$, and detail coefficients, $d_j$, for a given scale, $j$. For our ICA decomposition, we use these series of coefficients instead of the time-domain series, $x_i[n]$, and define our observed data as
(14) $\breve{x}_i = \{ a_J,\, d_J,\, d_{J-1},\, \ldots,\, d_1 \}_i$
and similarly we can express our source signals, $s_j[n]$, by the wavelet equivalent
(15) $\breve{s}_j = \{ a_J,\, d_J,\, d_{J-1},\, \ldots,\, d_1 \}_j$
From here onwards we will use the breve accent, $\breve{x}$, to denote the wavelet-domain representation of the time-domain signal $x$. We hence restate equation 3 as
(16) $\breve{\mathbf{x}} = \mathbf{A}\,\breve{\mathbf{s}}$
There are several important considerations to note here:
1) As we are dealing with a multiresolution analysis, it becomes possible to perform the blind source separation on individual scales (or band-passes) only, or to actively exclude some frequency ranges from the analysis. Such an approach may be advantageous if one has prior knowledge of the signals' frequency bandwidths and wishes to restrict the impact of high-frequency noise or other signals on the BSS of a given signal. An example of this is given by Lin & Zhang (2005). In this paper, we do not take this approach for reasons described in section 2.1.
2) Transforming the observed data into wavelet basis sets increases the redundancy of the data. This is advantageous in the case of overcomplete ICA, where more source signals are present in the data than time series observed. This overcompleteness leads to an improper separation of the independent source signals in the data. Increasing the data's redundancy has been shown to alleviate this problem and makes it possible to efficiently use ICA in very restricted data ranges (Inuso et al., 2007; Mammone et al., 2012).
3) Furthermore, as pointed out in section 1.1, ICA is inefficient in the presence of high-frequency scatter. This is alleviated in wavelet space since, by virtue of the multiresolution analysis, most high-frequency scatter is contained in the lowest-scale detail coefficients and a better degree of separation can be achieved for lower-frequency systematics. We will demonstrate this property in section 3.2. It is also possible in wavelet space to selectively suppress Gaussian noise via soft or hard thresholding (e.g. Stein, 1981; Donoho, 1995) and hence increase the robustness of the BSS algorithm in low S/N conditions.
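A minimal sketch of how the wavelet-domain data matrix of equation 14 might be assembled before the ICA step, assuming a single Haar decomposition level for brevity (the paper uses several db4 levels; all variable names here are illustrative, not the paper's):

```python
# Each observed series x_i is replaced by the concatenation of its average
# and detail coefficients; ICA then runs on these rows instead of on the
# time-domain data. One Haar level only; toy input values.

import math

def haar_level(x):
    """One Haar analysis level: (average, detail) coefficient lists."""
    s = 1 / math.sqrt(2)
    a = [s * (x[i] + x[i + 1]) for i in range(0, len(x), 2)]
    d = [s * (x[i] - x[i + 1]) for i in range(0, len(x), 2)]
    return a, d

X_time = [[1.0, 2.0, 4.0, 3.0],      # two toy observed time series
          [0.5, 0.5, 1.5, 2.5]]

X_wav = []
for x in X_time:
    a, d = haar_level(x)
    X_wav.append(a + d)              # row of wavelet coefficients fed to ICA

# The transform is orthogonal, so no information (or length) is lost.
assert all(len(row) == len(x) for row, x in zip(X_wav, X_time))
```

Because the transform is orthogonal, the mixing relation of equation 3 carries over unchanged to these coefficient rows, which is what equation 16 states.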
2.1 Amplitude Calibrated ICA (Acica)
Arguably the central problem with an exoplanetary data detrending algorithm based on ICA is the scale and sign ambiguity of the decorrelated, independent components.
Whilst we do not know the scaling of the individual source components in $\mathbf{s}$, we know by definition of the ICA preprocessing step (whitening) that each source signal is normalised to unit variance, $\mathrm{var}(s_j) = 1$, and that $\mathrm{E}[s_j] = 0$. Hence the elements of $\mathbf{A}$ do not need to preserve the absolute but only the relative scalings of $\mathbf{s}$. If we know the absolute scaling of one source signal, we can hence solve the scaling degeneracy of all signals. This is usually possible for simulations, where we control the inputs, but not for real-life examples, where the source signals and their scalings are a priori unknown. One way to solve this predicament is via the introduction of a calibration signal (CS). Such a CS must have the following properties:
1) Minimal to no impact on the data: The introduction of a CS should not distort any underlying signals or their amplitudes.
2) Temporal localisation: The CS should not be located in the in-transit (INT) regions and should not take up too much of the out-of-transit (OOT) data.
3) Stability to noise: The CS should not be affected by the noise of the data (be it Gaussian or otherwise).
4) Non-Gaussianity: The signal should be non-Gaussian enough for the ICA to recognise it, but not more non-Gaussian than the science signal itself. An overly prominent CS could bias the ICA towards the retrieval of the CS and impair the retrieval of other non-Gaussian signals.
In the time domain, it is difficult to implement property 4 without violating property 1: in order to implement a distinct enough non-Gaussian signal in the time domain, one needs to significantly alter large sections of the data. The treatment of noise (property 3) is also problematic. In the frequency domain, property 2 is violated, as the Fourier transform does not contain temporal information and the CS would superimpose on the science signal. However, all four criteria can be met in the wavelet domain.
2.1.1 Injecting the Calibration Signal
In the wavelet domain, we can introduce a much sparser CS than in the time domain and retain control over its temporal location (unlike in the frequency domain) via the lag term $b$. Given that the wavelet representation is more redundant than the original time series, we can minimise the impact on the observed data, or avoid any alteration to the data altogether, by selecting wavelet coefficients with zero or near-zero amplitudes for our CS. We can now redefine equation 2 as
(17) $\breve{\mathbf{x}}_{CS} = \breve{\mathbf{x}} + \alpha\,\mathbf{c}$
where $\breve{\mathbf{x}}_{CS}$ denotes the observed wavelet-domain data with CS, $\mathbf{c}$ is the CS with the same dimensions as $\breve{\mathbf{x}}$, and $\alpha$ is a random but known scaling constant. We define $\mathbf{c}$ to be zero everywhere except for one unit coefficient per detail scale,
(18) $c_j[b] = 1$ for $b = b_j$, and $c_j[b] = 0$ otherwise,
where $b_j$ are preselected lags for a given scale $j$ that correspond to a section of the OOT data. Note that:
1) No average coefficients, $a_J$, are used in $\mathbf{c}$, since these are not redundant enough.
2) As equation 17 suggests, only as many independent components can be extracted as there are observed time series. Adding the CS to the observed data may hence render the ICA overcomplete, as one source signal will not be retrieved in favour of the CS. For large data sets, where the number of observed time series greatly exceeds the number of source signals, this is generally not a problem.
3) We choose $\alpha$ such that the CS does not become the most dominant feature in the data.
4) In equation 18 we have chosen to define $\mathbf{c}$ to contain one non-zero coefficient per scale, with lags corresponding to the same area of OOT data in the time domain. This allows for efficient use of OOT data and a high sparsity in the wavelet domain, but is otherwise entirely arbitrary.
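The injection rule of points 1–4 can be sketched as follows; the coefficient values, scales and lags below are all invented stand-ins for real out-of-transit positions:

```python
# Illustrative CS injection (eqs. 17-18): one coefficient of known amplitude
# alpha is added per detail scale, at lags chosen in the out-of-transit
# region where the existing coefficients are (near) zero. Toy values only.

alpha = 0.05                         # known, randomly chosen scaling constant
detail = {                           # detail coefficients per scale (toy data)
    1: [0.0, 0.3, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0],
    2: [0.1, 0.0, 0.0, 0.0],
}
oot_lags = {1: 7, 2: 3}              # preselected OOT lags b_j per scale

for j, b in oot_lags.items():
    assert abs(detail[j][b]) < 1e-6  # only (near-)zero slots are used,
    detail[j][b] += alpha            # so the data are barely altered

assert detail[1][7] == alpha and detail[2][3] == alpha
```

Choosing near-zero slots is what satisfies property 1 (minimal impact on the data) while the localised lags satisfy property 2.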
2.1.2 Retrieving the scaling information
Having injected the calibration signal into our observations, we can now perform the ICA deconvolution (section 1.2) on the wavelet-domain data with CS, $\breve{\mathbf{x}}_{CS}$. We identify the CS among the retrieved source signals and denote its respective elements of the mixing matrix as $a_{CS}$. By measuring the average amplitude of the wavelet coefficients comprising the retrieved CS, $\hat{\alpha}$, and knowing the original CS amplitude, $\alpha$, we obtain the lost scaling factor
(19) $\kappa = \alpha / \hat{\alpha}$
and can hence retrieve the scaling of the CS as well as those of the other signals contained in $\mathbf{s}$. We denote this calibrated mixing matrix as $\mathbf{A}_{cal}$, with its elements given by
(20) $a_{cal,ij} = \kappa\, a_{ij}$
2.1.3 Calibration error
Using the above calibration approach, we consider the total error on the independent components (ICs) to be a combination of the source separation error (SSE) of the individual IC, $\sigma_{IC}$, and the SSE of the calibration signal component, $\sigma_{CS}$. The SSE for the $j$th source signal can be estimated by
(21) $\sigma_{IC,j}^{2} = \sum_{i \neq j} \mathrm{E}\!\left[ G_{ij}^{2} \right]$
where $\mathbf{G}$ is called the 'gain matrix' and defined as $\mathbf{G} = \mathbf{W}\mathbf{A}$. For perfectly separated sources we hence have $\mathbf{G} = \mathbf{I}$, where $\mathbf{I}$ is the identity matrix. The ICA algorithms employed here (EFICA and WASOBI; Koldovský et al., 2006; Yeredor, 2000; Tichavský et al., 2006b) can be shown to be asymptotically efficient and converge to the correct solution in the limit of infinitely many iterations. However, in real-world scenarios this is not the case and we find that the estimated mixing (or de-mixing) matrix is only approximately equal to the true underlying mixing matrix $\mathbf{A}$, i.e. $\mathbf{G} \approx \mathbf{I}$. To estimate the SSE, we can hence inspect the variance of the matrix $\mathbf{G}$. Whilst it is possible to directly calculate $\mathbf{G}$ for simulations, the true mixing of the signals is usually unknown in real data applications. Tichavský et al. (2006a); Koldovský et al. (2006) and Tichavský et al. (2008) have shown that an asymptotic estimate of $\mathbf{G}$ is nonetheless possible. For a derivation we refer the reader to the cited literature.
We define the source separation error for the calibrated components as the quadrature sum of the individual source component's error and the SSE of the calibration signal:
(22) $\sigma_{cal,j}^{2} = \sigma_{IC,j}^{2} + \sigma_{CS}^{2}$
3 Application Examples
3.1 Simulations
In this section we illustrate the ICA calibration approach using simulations.
1) For this we generated five input signals: a Mandel & Agol (2002) secondary-eclipse curve of HD 189733b, a Gaussian process of fixed FWHM and amplitude, and three instrument noise vectors derived from HST/NICMOS observations (Waldmann et al., 2013). These input signals are shown in figure 3.
2) We mix those signals using a randomly generated mixing matrix $\mathbf{A}$ to obtain our 'observed' time series (figure 4).
3) Using the MATLAB (http://www.mathworks.co.uk) wavelet toolbox and Daubechies 4 (db4) wavelets (Daubechies, 1988, 1992), we calculated the discrete wavelet transform for four scales of each time series to obtain the wavelet-domain data (figure 5, black and blue curves). In the scope of this simulation we found the choice of wavelet and number of vanishing moments not to matter greatly. However, the choice of wavelet and decomposition depth may vary from data set to data set.
4) We generated the calibration signal to consist of one wavelet coefficient per scale, giving four non-zero coefficients in total. For each series, the calibration signal was multiplied with a random scaling factor $\alpha_i$. Figure 5 (red peaks) shows the scaled calibration signal for each series. Note that we chose the lags of the non-zero coefficients to correspond to the 95th percentile of each scale. This guarantees the CS to be located in the post-eclipse out-of-transit data. It also overlaps the differently scaled wavelet impulses and hence minimises the impact on the time-domain data representation. See figure 6 for a time-domain representation of the CS (third from top in figure 5).
3.2 Spitzer/IRS: HD189733b secondary eclipse
We test the proposed algorithm on Spitzer/IRS (Houck et al., 2004) observations of a secondary eclipse of HD 189733b. These observations were obtained in November 2006 (program id: 30473) in low-resolution mode, ranging from 7.46 to 14.29 μm. The secondary eclipse was followed for 5:48 hours with integration times of 14.7 seconds per ramp. The Spitzer pipeline calibrated data were reduced using the Spice (version 2.5; http://irsa.ipac.caltech.edu/data/SPITZER/docs/dataanalysistools/tools/spice/) spectral reduction software. Examples of the resulting time series are shown in figures 9 & 10 (blue crosses). Similar to section 3.1, we take these time series as inputs to our algorithm. Figure 11 shows the DWT, using db4 wavelets, of the time series at 7.6971 μm (black lines) with the injected calibration signal overplotted in red. Note that no binning in wavelength was performed beforehand, which marks a significant difference to the HST/NICMOS analysis in Waldmann et al. (2013), where a relatively coarse binning was necessary to reduce the Gaussian noise.
We now follow through each individual step as described in the previous sections and obtain the scaled independent components of the data set. Figures 9 & 10 show two observed time series (blue crosses) with the correctly scaled individual independent components (black dots) underneath. The first and second ICs comprise the secondary eclipse signal and the calibration signal respectively. These are also shown in figure 12. The remaining ICs are instrument or stellar systematic noise. The amplitudes of these systematic components can be seen to be lower toward the blue end of the spectrum (figure 9), which is also evident in a cleaner observed time series, and more pronounced toward the red end of the spectrum (figure 10). In figure 12 we overplotted the best-fit Mandel & Agol (2002) model using an MCMC fitting routine (Waldmann et al., 2013) with the transit depth as the only free parameter. The transit depth posterior distribution is shown in figure 13. The total error is the quadrature sum of the source separation error and the MCMC-derived standard error. We obtain at 7.6971 μm a planet/star contrast ratio which we find consistent with the measurement by Grillmair et al. (2007). Figure 12 also shows the retrieved calibration signal (black dots) with the original calibration signal overplotted in red. The excellent match between both indicates an adequate signal separation and the correct scaling of the independent components.
In order to demonstrate the increased efficiency of the proposed ACICA algorithm in the presence of noise, we also show the ICs derived by performing the more 'traditional' ICA in the time domain only (figure 14). The components in figure 14 are poorly separated; the standard ICA analysis did not converge, and traces of the secondary eclipse feature are present in three separate components.
4 Discussion
Using ACICA, we can directly retrieve the signal amplitudes of the source components comprising our data. It not only allows us to non-parametrically detrend low S/N data sets, but also offers a unique insight into the structural make-up of the observations. A study of the systematic components and their amplitudes offers a powerful diagnostic of systematic noise behaviour across a detector. As shown in the Spitzer/IRS example, amplitudes of systematic components in the data can vary significantly across the dispersion of a grism. A study of these systematic components may allow, amongst others, for an optimisation of residual flat-fielding errors across a field, or the characterisation of wavelength-dependent slit-losses of the instrument. As a data diagnostic we can hence define the S/N of a time series in terms of its systematic noise
(23) $(S/N)_{sn,ik} = \Delta F_{i} / \sigma_{sn,ik}$
where $\sigma_{sn,ik}^{2}$ is the variance of a given systematic noise component $k$ at wavelength $\lambda_i$ for the $i$th time series and $\Delta F_{i}$ is the respective measured transit depth.
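A toy version of this diagnostic, under the assumption that the denominator is the standard deviation of the scaled systematic component (the transit depth and the component values below are invented):

```python
# Toy computation of the systematic-noise S/N diagnostic of eq. (23):
# measured transit depth divided by the spread of one retrieved, correctly
# scaled systematic component. All values are invented for illustration.

import math

def sn_systematic(depth, component):
    """S/N of the transit depth against one scaled systematic component."""
    mean = sum(component) / len(component)
    var = sum((c - mean) ** 2 for c in component) / len(component)
    return depth / math.sqrt(var)

depth = 3.4e-3                                   # toy eclipse depth
component = [1e-4 * math.sin(0.3 * t) for t in range(100)]  # toy systematic
snr = sn_systematic(depth, component)

assert snr > 1.0     # here the systematic is much smaller than the depth
```

Evaluating this quantity per wavelength channel and per component is what produces the detector-wide noise maps discussed above.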
The choice of wavelet family can be an important factor in the convergence properties of the code. For noisy data we find a db4 wavelet base to be the best compromise between slowly varying continuous functions and step-like, discrete offsets in the data. Should the data's systematics be dominated by offset patterns, such as those often induced by nodding (e.g. Grillmair et al., 2007), we find a Haar wavelet to give a sparser representation of the data than the Daubechies family. Similarly, high-order Daubechies wavelets or symmetric wavelets (symlets) are best suited if one's data are dominated by continuously evolving trends.
As mentioned in section 2, Gaussian noise can be suppressed in wavelet space by the selective thresholding of detail coefficients. Whilst true in theory, we found this difficult to implement in practice, and great care needs to be taken in the choice of thresholding rule and amplitude to avoid 'over-cleaning' and distorting the underlying data. Fortunately, we find that the convergence properties of the ICA deconvolution are much enhanced by the transformation of the data into wavelet space alone, and further cleaning was (in the scope of this analysis) unnecessary.
5 Conclusion
In this paper we present a novel approach to the amplitude calibration of independent component analysis (the ACICA algorithm) via the introduction of a sparse calibration signal in orthogonal wavelet space. By transforming observed time series into series of wavelet coefficients, we are able to overcome two main limitations of blind source separation algorithms such as ICA: 1) the convergence of ICA algorithms is strongly impaired by the presence of Gaussian noise in the data; by transforming the data to a sparser and more redundant basis we can significantly improve the performance of the ICA deconvolution without otherwise altering the data. 2) In the wavelet domain it is possible to inject an artificial calibration signal of known amplitude into the data with minimal or no impact on the original data. With the use of such a calibration signal we can directly determine the individual absolute amplitudes of the derived independent components. This marks a significant improvement over methods such as those discussed in Waldmann et al. (2013), where component amplitudes were derived through a regression analysis onto existing out-of-transit baseline data.
ACICA allows us to detrend otherwise inaccessible data sets nonparametrically. We demonstrated this using simulations and archival Spitzer/IRS data. It furthermore offers us an unprecedented and unique insight into the morphology of a data set by allowing us to directly map out temporal/wavelength dependent variations of instrumental or stellar noise in the data set.
Together, these attributes make the algorithm proposed here a highly versatile and powerful tool for exoplanetary time series analysis.
References
 Carter & Winn (2009) Carter J.A., Winn J.N., 2009, ApJ, 704, 51
 Claret (2000) Claret, A., 2000, AA, 363, 1081
 Comon & Jutten (2010) Comon, P., & Jutten, C., 2010, ‘Handbook of Blind Source Separation: Independent Component Analysis and Applications’, Academic Press
 Crouzet et al. (2012) Crouzet, N., McCullough, P. R., Burke, C. J., Long, D., 2012, arXiv: 1210.5275v1
 Daubechies (1988) Daubechies, I., 1988, Commun. Pure Appl. Math, 41, 909
 Daubechies (1992) Daubechies, I., 1992, ‘Ten Lectures on Wavelets’, Soc. for Industrial and Applied Mathematics
 Donoho (1995) Donoho,D.L., 1995, IEEE Trans. on. Inf. Theory, 41,3,613
 Gibson et al. (2012) Gibson, N. P., Aigrain, S., Roberts, S., et al., 2012, MNRAS, 419, 2683
 Grillmair et al. (2007) Grillmair, C. J., Charbonneau, D., Burrows, A., et al., 2007, ApJL, 658, L115
 Grillmair et al. (2008) Grillmair, C. J., Burrows, A., Charbonneau, D., Armus, L., Stauffer, J., Meadows, V., van Cleve, J., von Braun, K. & Levine, D., 2008, Nature, 456, 767
 Haario et al. (2006) Haario, H., Laine, L., Mira, A., Saksman, E., 2006, Statistics and Computing, 16, 339
 Haario et al. (2001) Haario, H., Saksman, E., Tamminen, J., 2001, Bernoulli, 7, 223
 Hastings (1970) Hastings, W. K., 1970, Biometrika, 57, 97
 Houck et al. (2004) Houck, J. R., Roellig, T. L., van Cleve, J., et al., 2004, ApJS, 154,18
 Hyvärinen (1999) Hyvärinen A.,1999, IEEE Trans. on Neural Networks, 10, 626
 Hyvärinen & Oja (2000) Hyvärinen A., Oja, E., 2000, Neural Networks, 13, 411
 Hyvärinen & Oja. (2001) Hyvärinen A., Karhunen J., Oja E., 2001, ‘Independent Component Analysis’, John Wiley & Sons Inc.
 Inuso et al. (2007) Inuso, G., La Foresta, F., Mammone, N., Morabito, F., C., 2007, Int. J. Conf. on Neural Networks, 1524
 Koldovský et al. (2006) Koldovský Z., Tichavský P., Oja E., 2006, IEEE Trans. on Neural Networks, 17,1265
 Koldovský et al. (2005) Koldovský Z., Tichavský P., Oja E., 2005, Proc.of IEEE/SP 13th Workshop on Stat. Signal Processing
 Lin & Zhang (2005) Lin, J., Zhang, A., 2005, NDT&E International, 38, 421
 Mammone et al. (2012) Mammone, N., La Foresta, F., Morabito, F. C., 2012, IEEE Sensors Journal, 12, 533
 Mandel & Agol (2002) Mandel, K., & Agol, E., 2002, ApJL, 580, L171
 Oja (1992) Oja, E., 1992, Neural Networks, 5, 927
 Percival & Walden (2000) Percival,D.B., Walden, A.T.,2000, ‘Wavelet Methods for Time Series Analysis’, Cambridge Uni. Press
 Press et al. (2007) Press, W. H., Teukolsky, S. A., Vetterling, W. T., Flannery, B. P., 2007, ‘Numerical Recipes’, Cambridge Uni. Press
 Riley et al. (2002) Riley, K. F., Hobson, M., P., Bence, S. J., 2002, ‘Mathematical Methods for Physics and Engineering’, Cambridge Uni. Press
 Shannon (1948) Shannon, C., 1948, Bell System Tech. J., 27, 379
 Stein (1981) Stein, C.M., 1981, Ann. Statist., 9, 6, 1135
 Stone (2004) Stone, J., V., 2004, ‘Independent Component Analysis: A tutorial Introduction’, A Bradford Book
 Swain et al. (2008) Swain, M. R.,Vasisht, G. and Tinetti, G., 2008, Nature, 452, 329
 Swain et al. (2009a) Swain, M. R., Vasisht, G., Tinetti, G., Bouwman, J., Chen, P., et al , 2009, ApJL, 690, L114
 Swain et al. (2009b) Swain, M. R., Tinetti, G., Vasisht, G., Deroo, P., Griffith, C., et al., D., 2009, ApJ, 704, 1616
 Swain et al. (2011) Swain, M. R. and Deroo, P. and Vasisht, G., 2011, IAU Symposium, 276, 148
 Tichavský et al. (2006a) Tichavský P., Doron E., Yeredor A., GomezHerrero G., 2006, Proc. EUSIPCO2006
 Tichavský et al. (2006b) Tichavský P., Doron E., Yeredor A., Nielsen J., 2006, Proc. EUSIPCO2006
 Tichavský et al. (2008) Tichavsky, P., Koldovsky, Z., Yeredor, A., GomezHerrero, G. , Doron, E., 2008, IEEE Transactions on Neural Networks, 19, 421
 Tessenyi et al. (2012) Tessenyi, M., Ollivier, M., Tinetti, G., et al., 2012, ApJ, 746, 45
 Thatte et al. (2010) Thatte, A. and Deroo, P. and Swain, M. R., 2010, A&A, 523, 35
 Waldmann (2012) Waldmann, I. P., 2012, ApJ, 747, 12
 Waldmann et al. (2013) Waldmann, I P., Tinetti, G., Deroo, P., et al., 2013, ApJ accepted
 Yeredor (2000) Yeredor A., 2000, IEEE Sig. Proc. Letters, 7, 197
Appendix A The quadrature mirror filter
A very fast and simple implementation of the DWT for multiresolution decomposition is to construct the quadrature mirror transformation matrix. For a Daubechies-4 wavelet we obtain four coefficients, $c_0, \ldots, c_3$, comprising the low-pass filter and, similarly, four coefficients for the high-pass filter. Rather than convolving the time series with both filters separately and then downsampling, we can construct a matrix in which each odd row contains the low-pass and each even row the high-pass filter coefficients, with each row pair shifted by two columns. This automatically downsamples the data to the new resolution. Such a matrix is called a 'quadrature mirror filter' (QMF) and equation A1 is an example of such (Press et al., 2007):
(A1) $\begin{pmatrix} c_0 & c_1 & c_2 & c_3 & & & & \\ c_3 & -c_2 & c_1 & -c_0 & & & & \\ & & c_0 & c_1 & c_2 & c_3 & & \\ & & c_3 & -c_2 & c_1 & -c_0 & & \\ & & & & \ddots & & & \\ c_2 & c_3 & & & & & c_0 & c_1 \\ c_1 & -c_0 & & & & & c_3 & -c_2 \end{pmatrix}$
where the empty spaces denote zeros. To obtain a DWT using this QMF, we multiply the QMF with the column vector containing the time series on the right. Note the circular behaviour of the matrix at the bottom, where the wavelet coefficients wrap around to the beginning. This has important consequences, as it indicates that the DWT wraps around the data and the transform at the end of the time series is sensitive to data at the beginning of the time series. This effect can be avoided by adding sufficient zero-valued points to the time series at its beginning and end. This process is also known as 'zero-padding'.
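The QMF construction and its one-step filter-and-downsample behaviour can be sketched with the Haar filter pair (the appendix uses Daubechies-4; Haar is chosen here only because its two coefficients keep the example short, and its length-2 filters happen never to trigger the wrap-around the code still provides for):

```python
# Toy quadrature mirror filter matrix with Haar filters: odd rows hold the
# smoothing (low-pass) filter, even rows the detail (high-pass) filter,
# each row pair shifted by two columns, so a single matrix-vector product
# filters and downsamples in one step.

import math

def qmf_matrix(n):
    """Build an n x n Haar QMF matrix (n even)."""
    s = 1 / math.sqrt(2)
    h, g = [s, s], [s, -s]               # low-pass and high-pass filters
    M = [[0.0] * n for _ in range(n)]
    for r in range(n // 2):
        for k in range(len(h)):
            col = (2 * r + k) % n        # modulo gives the wrap-around
            M[2 * r][col] = h[k]         # smoothing row
            M[2 * r + 1][col] = g[k]     # detail row
    return M

x = [4.0, 2.0, 5.0, 5.0]
M = qmf_matrix(4)
y = [sum(M[i][j] * x[j] for j in range(4)) for i in range(4)]

# Output is interleaved: average, detail, average, detail, ...
assert abs(y[0] - 6.0 / math.sqrt(2)) < 1e-12   # (4 + 2) / sqrt(2)
assert abs(y[1] - 2.0 / math.sqrt(2)) < 1e-12   # (4 - 2) / sqrt(2)
```

With a four-tap filter such as db4, the final row pair would genuinely wrap around to the first columns, which is exactly the circular behaviour that zero-padding is meant to suppress.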