Sparsity and adaptivity for the blind separation of partially correlated sources
Abstract
Blind source separation (BSS) is a very popular technique to analyze multichannel data. In this context, the data are modeled as the linear combination of sources to be retrieved. For that purpose, standard BSS methods all rely on some discrimination principle, whether it is statistical independence or morphological diversity, to distinguish between the sources. However, dealing with realworld data reveals that such assumptions are rarely valid in practice: the signals of interest are more likely partially correlated, which generally hampers the performances of standard BSS methods. In this article, we introduce a novel sparsityenforcing BSS method coined Adaptive Morphological Component Analysis (AMCA), which is designed to retrieve sparse and partially correlated sources. More precisely, it makes profit of an adaptive reweighting scheme to favor/penalize samples based on their level of correlation. Extensive numerical experiments have been carried out which show that the proposed method is robust to the partial correlation of sources while standard BSS techniques fail. The AMCA algorithm is evaluated in the field of astrophysics for the separation of physical components from microwave data.
EDICS: MLRSSEP
Source separation, sparse representations, morphological component analysis, partially correlated sources.
I Introduction
With the rapid development of multiwavelength or multisensor instruments, it is crucial to design analysis tools to efficiently extract meaningful information. This is especially the case in the field of astrophysics where the development of dedicated component separation methods has been recently of prime importance for the processing of microwave surveys such as NASAWMAP or ESAPlanck [1, 2, 3]. For that purpose, Blind source separation (BSS) is a particularly wellsuited framework to analyze multichannel data. In this context and following the socalled linear mixture model, each observation or channel is modeled as the linear combination of sources :
where quantifies the contribution of the th source in the observation . Each datum and source are assumed to have entries. The term stands for additive noise or model imperfections. The wellknown linear mixture model is more conveniently recast in the following matrix form:
(1) 
where is a matrix the rows of which are composed by the observations, is the socalled mixing matrix and is the matrix which contains the sources. The goal of blind source separation is to blindly estimate both the mixing matrix and the sources from the knowledge of only. If one neglects the noise , there exists an infinite number of couples which satisfy . This is an illposed problem which requires using prior information about either the mixing matrix or the sources to be recovered. For that purpose, classical approaches rely on information which promote some discriminant information or diversity between the sources. Since the beginning of the 90s, Independent Component Analysis or ICA (see and [4, 5, 6] references therein) has taken the lion share. In the framework of ICA, the sources are assumed to be statistically independent. Most methods proposed so far in this context generally differ in the way they promote and measure independence. In the remaining of this paper, we will primarily focus on the separation of sparse sources.
Ia Sparse blind sources separation
Sparkled by advances in harmonic analysis and applied mathematics, the sparse modeling of signals [7] has attracted a lot of interest and has proved to be effective at solving a very large range of inverse problems: denoising, deconvolution [8], compressive sensing reconstruction [9], inpainting [10] to only name a few. In this context, it is assumed that the each source can be sparsely decomposed in some basis, waveform dictionary or signal representation :
(2) 
The sparsity of in the signal representation is customarily described with two different models:

Exact sparsity model: each source is assumed to have a small number of nonzero coefficients in . The set of nonzero coefficients, or support of the source , is denoted by .

Approximate sparsity model: natural signals do not generally have an exactly sparse distribution in . They rather exhibit an approximately sparse distribution: most of the entries of the expansion coefficient vectors are zero or with negligible amplitudes and only a few take significant amplitudes.
Most natural signals exhibit such sparsity property in adequately chosen signal representations . Examples of standard signal representations include the wavelets [11, 12], the curvelets [13] or even adaptively learned signal representations [14]. The sparsity property is particularly appealing since few significant expansion coefficients of in efficiently or compressively encode its information content. Sparsity has been exploited in recent sparse blind source separation methods (see [15, 16, 17, 18]).
In the framework of ICA, sparse and independent sources have been studied in [15, 19]. The RNA method [15] looks for sources by enforcing their sparsity in an orthonormal basis. The Efficient FastICA (EFICA) algorithm described in [19] is an improvement of the acclaimed FastICA algorithm [20] which is specifically adapted to retrieve sources with generalized Gaussian distribution; this includes sparse sources.
The compressibility of the signal allowed by sparse representations has been exploited in the Sparse Component Analysis (SCA) algorithm introduced in [21, 22, 17]. In the framework of SCA, the sources are assumed to have nearly or exactly disjoint supports : for each source, a few entries are nonzero and each active entry is active in only a single source. Relaxing this assumption has been proposed assuming that there exist entries where only one source is active. Assuming that the sparsity patterns of the sources are fully or partially disjoint has inspired several sparse BSS techniques for the processing of time series and audio signals [23, 24, 25, 26].
However these assumptions are seldom verified with more complex data, especially in imaging : i) signals or images are generally not exactly but rather approximately sparse in , thus entailing that there is no sample where the sources vanish, ii) there is generally no sample where each source is active alone. These assumptions have been further relaxed to approximately sparse sources in [18] with the introduction of morphological diversity (MD). This concept rather states that sources with different morphologies are unlikely to have similar large amplitude coefficients in . For instance, sparse and independently distributed sources are very likely to verify the MD assumption. Since highamplitude entries of the sources in encode the most salient morphological features of the sources, sources which do not share similar morphologies will not therefore activate similar expansion coefficients in . Building on the concept of morphological diversity, Generalized Morphological Component Analysis (GMCA) has been showed to be an effective BSS method [18, 27].
Recent developments in sparse BSS have mainly emphasize on solving nonnegative matrix factorization problems with sparsity constraints on the sources [28, 29, 30, 31, 32]. In the present paper, we focus on the general setting where the mixing matrix and the sources are not necessarily nonnegative.
IB Realworld data are often partially correlated
Dealing with reallife signals reveals that the actual sources are seldom perfectly uncorrelated: neither statistical independence nor morphological diversity are valid assumptions. Most classical source separation methods may not work properly for dependent sources and would likely yield erroneous results.
In the field of spectroscopic data processing, the sources of interest are often partially correlated. In this particular field of research, the development of dedicated source separation techniques has already been considered assuming strong assumptions about the sources to be retrieved [33, 34]. In [33], the separation of partially correlated signals is made possible by assuming that for each source there exists at least one nonzero entry that vanishes for the other sources; this is obviously reminiscent of the disjoint source support introduced in the framework of SCA. Together with the nonnegativity of the sources and the mixing matrix, the fact that the sources are assumed to have partially disjoint supports dramatically simplifies the separation task. Indeed, in such a setting, each column of the matrix are colinear with some of the columns of the data . Thanks to the nonnegativity assumption, the data are positive linear combinations of positive sources. From a geometrical point of view, the columns of the mixing matrix are the vertices of the conical hull generated by the columns of (see [35, 36] for more precisions). This geometrical property has been exploited to design algorithms that seek the simplex with minimal volume that encloses the data samples [37]. This framework has been showed to be well adapted to the blind separation of very sparse and nonnegative sources. This assumption has been somehow relaxed to model sources that are composed of sparse spikes and a nonnegative baseline in [34]. In this case, the sources are modeled as a combination of two components: i) a sparse spikes train and ii) a smooth continuum, where the sparse spike component verify the source support assumption made in [33].
Partially correlated sources are often found in other seemingly different scientific domains such as astrophysics. As an illustration, the left panels of Figure 1 show simulations of two components which originate from distinct astrophysical phenomena. From a mathematical perspective, these two components are characterized by rather different columns of the mixing matrix (in physics, this translates into different electromagnetic spectra), and by somehow different spatial distributions. As a consequence, they can be physically and mathematically considered as sources. These components are generated by different physical mechanisms (e.g. synchrotron emission, Bremsstrahlung, thermal emission, etc.) and are therefore considered as distinct components or sources by astrophysicists. However these signals depend on common physical parameters (e.g. gas density, temperature, etc.) which may make them partially correlated. We refer the interested reader to [38, 39] for more details about the astrophysics in the microwave wavelengths.
The right panel of Figure 1 reports the values taken by the correlation coefficients between these two astrophysical components in the wavelet domain. This particular example also highlights a second origin for signal partial correlation: chancecorrelations. Indeed, in the finite sample case, single realizations of theoretically independent or uncorrelated sources are likely to have nonzero correlation. In Figure 1, the number of available samples decreases at lowfrequency, which plays a nonnegligible role in the increase of the correlation coefficients at largescale.
In the general setting, such as in imaging science, the limitations of standard methods are the following:

Sparsity: standard methods assume that the sources are sparse and nonnegative in the sample domain. However, it is more generally appropriate to model the sparsity of the sources in some transformed domain , where nonnegativity does not hold.

Nonnegativity: the nonnegativity of both the sources and the mixing matrix, which is a valid assumption in a wide range of applications, does not hold in some applications (for instance in astrophysics, see [40]).

Sources with partially disjoint supports: assuming that, for each source, there exist at least one data sample where a single source is active is a strong assumption in general. In the general setting, the sources are approximately sparse in a transformed domain. As a consequence, all the sources have nonvanishing entries.
The goal of the present paper is to introduce a novel blind source separation algorithm that relaxes these assumptions in the case of sparse and partially correlated sources.
Contribution
In this article, building upon sparsity and the concept of morphological diversity, we introduce a novel blind source separation technique for the separation of sparse and partially correlated sources from noisy measurements. This new technique coined Adaptive Morphological Component Analysis (AMCA) makes no assumption about the nonnegativity of the sources or their mixtures. This makes this method suitable for a very wide range of separation problems especially in imaging.
The paper is organized as follows: Section II reviews the basics of the GMCA algorithm, its limitations in the case of partially correlated sources, and how they can be alleviated. Section IIB introduces the AMCA algorithm. Section III reports results of extensive evaluations of performances the AMCA algorithm with respect to standard sparse BSS algorithms. Finally, Section IIID illustrates the performances of AMCA in the context of astrophysical source separation.
Ii Adaptive sparsity and separation of partially correlated sources
Iia Morphological diversity and the separation of partially correlated sources
Building on the concept of morphological diversity, the GMCA algorithm (Generalized Morphological Component Analysis  see [18]) aims at retrieving the sparse sources and the mixing matrix by solving the following optimization problem :
(3) 
where the first term penalizes nonsparse solutions; the quadratic part is a classical datafidelity term. The Frobenius norm is defined as follows: . The sparsity level is measured by the norm of the sources with . We generally choose either or ; we refer the interested reader to [18, 27] for a detailed discussion of the choice a particular norm for the sparsity penalty. The norm is particularly appealing since it makes the estimation of a convex optimization problem when is fixed. However, as discussed in [18], it tends to produce biased estimates of the sources. In the remaining of this paper, we will therefore choose which has been showed to provide the best separation results in [18].
The problem in Equation (3) is classically tackled by using an alternate projected leastsquare algorithm. The GMCA algorithm processes by iteratively alternating the following steps :

Step 1 : Estimation of for fixed :

Step 2 : Estimation of for fixed :
Unless and are orthogonal matrices, the solution to the problem in Step 1 does not admit a closedform expression. In the GMCA algorithm, this step is rather approximated with a projected leastsquare estimator :
where is the Moore pseudoinverse of the matrix and stands for the th row of the matrix . The operator is the wellknown hardthresholding operator with threshold . For any vector , this operator performs on each entry as follows:
The second step of the GMCA algorithm is a more classical leastsquare problem which admits a closedform solution : .
Sparse BSS and partially correlated sources
In this section, we discuss how sparse and partially correlated (s.p.c.) sources can be estimated using sparse blind source separation. For the sake of clarity, the transform is fixed to the identity matrix. In essence, it has been emphasized in [18, 27] that, in the lownoise limit, solving the problem in Equation 3 with GMCA is somehow similar to finding the sources which are jointly the sparsest possible. More precisely, assuming that the norm is used to measure the sparsity level of the sources, sparse BSS tends to estimate the sources and the mixing matrix so that the sources are enclosed in the ball with the smallest radius. It is important to notice that, for sparse sources with independently distributed entries, any nontrivial mixture of the sources has, with high probability, a higher norm (see [27]); it will therefore be enclosed in a ball with larger radius. This observation motivates the design of a separation procedure that looks for the sparsest sources as performed by the GMCA algorithm.
In the case of s.p.c. sources, this separation procedure may no longer be viable. As an illustration, Figure 2 shows an example of two s.p.c. sources. In brief, the nonzero entries of the sources are independently distributed with the exception of a few samples – labeled with red and blue dots – which are shared by the two sources. Figure 3 displays the scatter plot (i.e. this plane features the samples of in abscissa and in ordinates) of the two sources estimated by looking for the sparsest sources from two mixtures as is done in the GMCA algorithm. The estimated sources are enclosed in the ball with the smallest radius (dotted purple ball). However they do not match the sought after sources; these would actually be enclosed in the dotted red ball. The black arrow features the axis along which the abscissa and ordinates should be aligned if the right sources were correctly estimated. This simple toyexample highlights the limits of any standard sparse BSS method: the presence of significantly active samples in more than one source entails that the sparsest solution might not be the correct solution.
These empirical observations raise the following crucial question: which are the sources’ samples that are the most relevant or informative for separation ? Sparse BSS methods will generally be handicapped by the presence of samples which simultaneously have a significant amplitude in more than one source. For instance, in Figure 3, the sample located at the position takes a large amplitude in both sources. These samples are deemed nondiscriminant and therefore of poor relevance for the estimation of the sources and the mixing matrix.
In the rest of the paper, we will assume that the samples of the sources can be decomposed into two distinct subsets: will define the set of the sources’ samples where morphological diversity does not hold: these samples are likely to take significant amplitudes in more than one source. The set will denote the complement of ; morphological diversity is assumed to hold true in this set. will therefore contain the samples that are the most discriminant and, subsequently, the most relevant for the separation.
If the subset were known, the sources could be estimated in a straightforward way by restricting the application of any sparse BSS method to . In the framework of GMCA, one would trivially substitute step 2 with the following :
(4) 
where the operator denotes the submatrix made of the column of indexed by .
This assumption is unfortunately unrealistic in practice. The set has therefore to be learned along with the sources and the mixing matrix. It is important to note that attempting to estimate as an extra unknown in Equation (3) would lead to a cumbersome combinatorial problem. Since it cannot be reasonably estimated directly as an extra parameter, we rather advocate the use of a more flexible way to privilege discriminant samples, which are likely to belong to .
The problem in Equation 4 is equivalent to a weighted leastsquare problem of the form:
(5) 
where is a diagonal matrix with ones for samples in and zeros otherwise. Since is unknown, we propose relaxing this strategy with an adaptive reweighting procedure. For that purpose, the mixing matrix will be estimated in a similar manner as in Equation (5). Each sample is given a weight which measures their discriminant level. These weights are encoded in the diagonal matrix . We highlighted previously that standard BSS methods are hindered by samples with large amplitudes in more than one source. Conversely, the most discriminant samples will be the ones that mainly take a large amplitude in a single source. These properties suggest that discriminant samples will be traced by the sparsity level of the columns of the matrix . Consequently, we define the weight vector as follows:
(6) 
where is the th column of . The scalar may be required when there exist columns of with vanishing norm; its value is generally small, typically .
The parameter will be chosen in the range so as to measure the sparsity level of the samples across the sources. Other choices for the samples’ weights can be considered. However, this choice is particularly interesting as : i) it accounts for the natural sparsity of the sources, ii) the samples in are indeed sparser across the sources than samples which are common to all sources and iii) this weighting strategy is inversely proportional to the amplitude of the columns of . The former entails that it penalizes more samples of the sources which are common to several sources and have a high amplitude in the meantime. This type of samples are generally the most detrimental to sparse BSS methods when the sources are partially correlated.
IiB AMCA: Adaptive Morphological Component Analysis
In this section, we introduce a novel blind source separation algorithm, which we called AMCA (Adaptive Morphological Component Analysis), to specifically deal with s.p.c. sources, that builds upon the GMCA algorithm.
In the previous section, we advocated the use of a weighting strategy to privilege/penalize data samples based on their ability to discriminate between s.p.c. sources. This formally amounts to estimating the mixing matrix and the sources by solving the following minimization problem:
(7) 
where the matrix stands for the weighting matrix. This problem takes a simpler formulation in the transformed domain. For the sake of simplicity, we will first consider that is an orthogonal matrix. We will further define and , which respectively stand for expansion coefficients of the data and the sources in the transformed domain. Thanks to this reparameterization, and assuming that , the problem in Equation (7) can be recast as follows:
(8) 
where , and is a diagonal matrix that stores the regularization parameters. According to the sparse modeling developed in the previous section, the partial correlation of the sources is modeled as a correlation of certain entries of the sources in the transformed domain. Subsequently, the matrix is diagonal and it can be defined as , where the operator is the Hadamard product. From Equation (6), the weight vector is defined as:
(9) 
In the next, we will write denote the weighting matrix as to highlight its dependency on the parameter .
The optimal weight vector clearly depends on the unknown true sources. In practice, one has generally only access to an imperfect estimate ; the weight vector and the weight matrix will be computed from some current estimate in the proposed algorithm.
For a fixed weight matrix , minimizing the problem in Equation 8 can be tackled by alternately updating and in the spirit of Block Coordinate Relaxation(BCR  see [41, 42]). This procedure then turns to alternately solving sequences of convex minimization problems in place of a globally nonconvex problem.
Updating for a fixed mixing matrix
Assuming that the mixing matrix is fixed, estimating the source matrix is performed by looking for the unique minimizer of the following convex problem:
(10) 
The equation to be minimized can be decomposed into two terms: i) a quadratic and differentiable data fidelity term, and ii) a convex but nondifferentiable penalty. Fortunately, the is a simple and proper convex function that admits a proximal operator (we refer the interested reader to [43, 44] for more details about proximal calculus). Furthermore, the quadratic term is differentiable and its gradient is Lipschitz with constant . This entails that the problem in Equation (10) can be solved exactly using techniques such as the ForwardBackward splitting algorithm [43]. This optimization strategy has been used in [31] for solving sparse nonnegative matrix factorization problems. However, it has the strong disadvantage of dramatically increasing the computational cost of the algorithm: update would require resorting to efficient but costly iterative algorithms. Furthermore, each time the source matrix is updated in the algorithm AMCA algorithm, it is fully reestimated; therefore, it may not be necessary to update with high precision at each step of AMCA. Subsequently, we rather resort to a simpler but approximate minimization scheme in the spirit of the GMCA algorithm.
More precisely, finding the critical point Equation (10) can be done by deriving its subdifferential with respect to [45] and finding its roots:
(11) 
where we used the fact that is diagonal. The term stands for the subdifferential of the matrix norm taken at ; it is well known that the subdifferential of the matrix norm is defined as follows:
The root of the subdifferential in Equation (11) does not admit an explicit formulation unless is diagonal, which would make Equation (11) separable with respect to each individual source. In such a case, each source can be estimated independently and takes the following closedform expression:
(12) 
where stands for the softthresholding operator (i.e. the socalled proximal operator of the norm) and . It is important to note that, in this setting, this is equivalent to:
(13) 
where and is precisely the leastsquares estimate of the sources. In other, this procedure amounts to thresholding the leastsquare estimate (i.e. the minimum of the quadratic data fidelity term), which thus dramatically reduces the computational burden of this step in the AMCA algorithm. However, this equivalence holds whenever is orthogonal (i.e. is diagonal), which is not always a valid assumption. As a consequence, updating according to Equation (16) provides a rough proxy in the general case which might prevent the algorithm from convergence. This will be discussed later on in this section.
Updating for a fixed source matrix
The mixing matrix is updated assuming the source matrix is fixed, which is performed by tackling the following convex problem:
(14) 
This is a standard reweighted quadratic problem, which admits the following unique minimizer:
(15) 
The AMCA algorithm
In the spirit of BCR, the AMCA algorithm mainly alternates by updating and according to Equations (16) and (15). However, the overall estimation problem of Equation (7) is nonconvex; this means that this estimation procedure is likely to be prone to trapping in local minima. This will highly depend on the initialization of the AMCA algorithm. In [27], we showed that thanks to the sparsity of the sources and more specifically to the morphological diversity assumption, starting from large regularization parameters (or equivalently large thresholds) and decreasing their values towards a final noisedependent threshold greatly helped improving the robustness of the algorithm GMCA with respect to local minima. The same procedure is implemented in the AMCA algorithm: the are first set up to some large initial values and then decreased towards final values .
As pointed out earlier in this Section, the weight matrix is computed from some guess for the true sources . Since one has only access to estimated sources, and these estimates are expected to improve at each step of the algorithm, the weight matrix is updated at each iteration according to the current estimate of the sources . The choice of the parameters are discussed in Section IIC.
In [27], different thresholding strategies have been considered, namely soft and hardthresholding. In contrast to hardthresholding, the main drawback of the softthresholding is that it is prone to biasing the estimated sources. In practice, substituting the softthresholding with a hardthresholding in the GMCA algorithm provided a substantial improvement of the separation performances. As a consequence, the update step described in Equation (16) is replaced by :
(16) 
where stands for the hardthresholding operator with thresholds .
The resulting algorithm has been named Adaptive Morphological Component Analysis (AMCA). The adaptivity of AMCA stems from its attempt to estimate a weight matrix that better favors discriminant entries. In other words, this can be interpreted as adaptively learning which samples are more likely to verify the morphological diversity.
The AMCA algorithm is detailed below:
I. Transform the data
II. Initialize , and III. While , 1  Update assuming is fixed : , where is the th column of 2  Update the weight matrix : , where is the th column of 3  Update assuming is fixed : 4  Decrease each and the parameter IV. Reconstruct the sources . 
On the convergence of the AMCA algorithm
The problem in Equation (8) is not convex; therefore, only convergence to a critical point can be expected. The design of the AMCA algorithm has been inspired by Block Coordinate Relaxation [41]. In [42], Tseng precisely proved the convergence of BCR for the minimization of nondifferentiable and nonconvex cost functions, with a special application to sparsityconstrained blind source separation problem. According to [42], alternating the minimization steps in Equations (10) and (14), for fixed parameters and , converges to a critical point of the problem in Equation (8).
Firstly, decreasing the thresholds is a strategy that has been proposed in [18] to improve the robustness of the GMCA algorithm to spurious local minima. In the field of optimization, this procedure is reminiscent of the fixed point continuation technique [46], which has been proposed to speed up the minimization of penalized leastsquares problems. The convergence of the AMCA algorithm would be guaranteed as long as steps (10) and (14) are alternated until convergence for each value of . The thresholds are however updated at each iteration of the AMCA algorithm, which helps speeding up the algorithm but might prevent it from convergence.
Secondly, weight matrix is updated at each iteration, which also might prevent the AMCA algorithm from convergence. Lastly, in the spirit of reweighted techniques [47], the weights are updated based on the current estimate of the sources . If this strategy is a well motivated heuristic, the convergence of the AMCA algorithm is not theoretically grounded.
In numerical experiments, the AMCA tends to stabilize after a given number of iteration. As an illustration, Figure 4 displays the evolution of the signaltodistortion ratio (SDE  see Section III for more details) during a run of the AMCA algorithm on simulated sources, which are described in Section III. Our intuition is that, after a given number of iterations, the parameters and are likely to evolve slowly enough to make the AMCA stabilize. In practice, the AMCA algorithm converges in a few hundreds of iterations. Subsequently, choosing the number of iterations provided good results.
From orthogonal transforms to tight frames
Previously, the derivation of the AMCA algorithm has been carried out assuming that the signal representation is an orthogonal transform; i.e. it verifies: . This property allows to perform AMCA in the transformed domain. However, it has long been emphasized that appropriately chosen redundant transforms generally allow for extra degrees of freedom which help producing sparser representations. In the context of sparse BSS, the use of redundant transforms has been shown to yield substantial separation improvements (see [27, 18]). In the numerical experiments of Section III, we make use of redundant wavelets to sparsely represent the sources. Such type of wavelets have the property of translationinvariance [48, 49]. These transforms are nonorthogonal; they are tight frames which verify the exact reconstruction property but for which . In practice, this means that is a sparse representation of the signal but not generally the sparsest, if one exists (see [7]). Fortunately, since the Gram matrix is diagonally dominant for most tight frames of interest in imaging, a first order approximation amounts to resorting to the same changes of variables and . This allows to apply the AMCA algorithm in the transform domain in the tight frame case which greatly lower the computational cost of the algorithm.
IiC Choosing the parameters in AMCA
Choice of the thresholds
It has been emphasized in [27] that the choice of the thresholds plays a crucial role in the separation process. From the concept of morphological diversity, the highamplitudes entries of the sources are likely to belong to a single source. In the case of statistically independent sources, this assumption holds true, with high probability, for high enough amplitudes. This makes the highamplitude entries the most discriminant samples of the sources.
The hardthresholding operator selects entries with amplitudes higher than and rejects the others. In the AMCA algorithm, the thresholds are first initialized to high values; generally each parameter is set to the maximum amplitude of the initial guess of the th source . Their values then decrease towards a final threshold which depends on the noise level. In practice, the final thresholds are fixed to where stands for the standard deviation of the noise which contaminates the th sources. Whenever the level of the noise is unknown, it is estimated empirically using the Median Absolute Deviation estimator (MAD). Assuming that the noise is additive white Gaussian, this procedure guarantees, with high probability, that no noise sample is selected.
Such kind of coarsetofine procedure has some interesting properties : i) the sources are first estimated from the most discriminant entries (i.e. the highamplitude entries of the sources), ii) in the spirit of simulated annealing, the variation of the thresholds makes AMCA less prone to trapping in local minima and iii) the thresholding helps preventing AMCA from incorporating noise entries which makes it more robust to noise.
Choice of the weight parameter
The AMCA algorithm rely on a reweighting procedure that penalizes/favors certain entries of the estimated sources. The weights are function of the norm of the columns of . They somehow measure the activity of each sample across the sources. Intuitively, choosing a low value for seems quite natural since it yields more contrast between sparse and nonsparse columns of . This argument would make perfect sense if the true sources were known. A tradeoff has to be made between the two following options : i) large values for might lead to an underpenalization of less discriminant entries and ii) small values for provide a larger penalization of nonsparse entries of , which is desirable to efficiently separate s.p.c. sources. However, at the beginning of the AMCA algorithm, one has only access to imperfect, if not erroneous, estimates of . In this case, small values of might mispenalize/misfavor entries of which can eventually hamper the separation process. Alleviating this dilemma is made by starting with a high value for – typically – and then decreasing it, at each iteration, towards some final value . Several values for have been tested; it turns out that choosing leads to a good tradeoff for all the experiments we carried out. Smaller values for did not bring any noticeable improvement.
Iii Numerical experiments
Iiia Experimental protocol
In this section, we evaluate the performances of the AMCA algorithm with respect to standard sparse BSS methods : i) the GMCA algorithm [18], ii) RNA (Relative Newton Algorithm) [15] and iii) EFICA (Efficient FastICA) [19]. It is important to recall that, to our knowledge, the blind separation of s.p.c. sources has never been dealt with in the general case where: i) neither the mixing matrix nor the sources are assumed to be nonnegative, ii) the sources do not have (quasi)disjoint supports. Relaxing these assumptions is interesting so as to tackle the separation of complex data which are approximately sparse in general signal representations (e.g. Fourier, wavelets, curvelets, etc.) but where nonnegativity is certainly not a valid assumption.
In this first study, our prime intention is to study the performances of standard sparse BSS methods and the AMCA algorithm to estimate s.p.c. sources. For that purpose, we will first make use of synthetic but rather complex data; this makes MonteCarlo simulations possible in various settings. To mimic spectroscopylike data, the (s.p.c.) sources are generated according to the socalled (SPC) model introduced below :

The sources are sparse signals. This means that each source has only nonzero entries out of entries. The activation of the entries of the sources will be distributed according to a Bernoulli process with parameter so that :
(17) 
For each source, active entries are shared by all the sources and the remaining are independently drawn. Let us recall that is the set of the entries which are common to all the sources; the distribution of these entries is Gaussian with mean and variance . In this model, common active entries do not have correlated amplitudes.

The nonzero entries which belong to are are independently distributed according to a Gaussian law with mean and variance . The parameter defines some dynamic range between the amplitude of common and independent samples of the sources.

the sources are convolved with a Laplacian kernel of full width at half height (FWHM) equal to .
An example of simulated sources (resp. mixtures) is shown in the left (resp. right) panel of Figure 5. The resulting sources can be sparsely represented in a frame of redundant (translation invariant) 1D wavelets^{1}^{1}1For that purpose we used the RICE wavelet toolbox : http://dsp.rice.edu/software/ricewavelettoolbox. Each source is made of samples. The mixing matrix will be picked at random from a Gaussian distribution. In the following experiments, the sparsity level is fixed to . This means that for — which will be the actual value in the next — about spikes will be active in each source.
Evaluation criteria
Following [50], the estimated sources can be decomposed into the combination of contributions of different origins:
with the following interpretations for the terms:

is the projection of on the target (groundtruth) source. In other words, it is the one part of this decomposition which corresponds to what needs to be recovered. The other ones are residues.

accounts for interferences due to other sources.

is the part of the reconstruction which is due to noise.

stands for the remaining artefacts which are neither due to interferences nor noise.
From this decomposition, one can derive a global criterion named Source to Distortion Ratio (SDR) which contains information about each of the different terms of the decomposition:
(18) 
In [18], we advocated the use of a global performance criterion that only depends on the mixing matrix. Such a criterion is particularly interesting in the noisy setting as it is not dependent on the denoising effect of the methods to be compared: methods like GMCA or AMCA have the ability to provide denoised estimates of the sources while others are not. For that purpose, the following mixing matrix criterion is introduced:
(19) 
where is the pseudoinverse of the estimated mixing matrix corrected – through – for the scale/permutation indeterminacies; is the true mixing matrix. Low values of then indicate better separation performances.
In the rest of the paper, all criteria will be computed as the median value of independent MonteCarlo simulations. Unless specified, the number of simulations for each of the figures was set to .
IiiB Comparisons with standard sparse BSS methods
IiiB1 An example of recovered sources
To better visualize how sparse BSS algorithms perform on s.p.c. sources, Figure 6 displays sources out of estimated from mixtures with and . The number of observations as well as the number of sources are set to . The number of samples per source is fixed to . Since , this experiment turns out to be an arduous problem as of the active spikes are shared by all the sources. Figure 6 shows that standard BSS methods all seem to recover some of the most prominent spikes of the sought after source. However they all exhibit a certain level of interferences which indicates poor separation quality. The source estimated by AMCA does not visually exhibit such interferences and perfectly matches the source to be retrieved.
IiiB2 Coherence level
In this paragraph, the performances of the AMCA algorithm are evaluated when the level of correlation or more generally coherence, defined by the ratio , varies between (the nocorrelation case) and (the sources all share the same entries). Its values vary in the range ; when , all the sources are independently distributed. The opposite extreme case corresponds to sources with exactly the same common active entries which means, according to our model, that the sources are fully correlated sources. The performances of standard sparse BSS methods should decrease for high valued of . In this experiment, the number of sources and the number of channels are both fixed to . The standard deviation of the entries in is fixed to . The noise level is fixed to dB. The evolution of the SDR as a function of the coherence level is displayed in the left panel of Figure 7. First, it is interesting to notice that the performances of the standard BSS algorithms (GMCA, EFICA and RNA) are quickly hampered by the partial correlation between the sources. The GMCA algorithm seems to be less sensitive to the correlation of the sources for . When more than of the entries are shared by all the sources, these algorithms behave equally badly. The AMCA algorithm already shows a much higher SDR even when only a few entries are correlated. When less than of the entries are correlated, the SDR of the AMCA algorithm is higher than dB. Its performances start degrading for to eventually behave similarly to the standard BSS algorithm when tends towards . The evolution of the mixing matrix criterion is shown in the right panel of Figure 7. One can notice that the mixing matrix criterion with the AMCA algorithm is consistently more than order of magnitudes lower than for the standard BSS methods. Figure 8 displays the average over random simulations of the minimum SDR over the estimated sources; in other words this quantity measures the SDR the most badly estimated source for each simulation. This figure shows that the GMCA algorithm is very good at recovering some of the sources but not all with the same quality. From that respect, the standard BSS methods behave quite similarly : for they are not able to identify all the sources (the minimum SDR is close to dB). The minimum SDR is dB higher for with the AMCA algorithm. Consequently, the AMCA algorithm is good at recovering all the sources even when the number of common spikes in the sources is equal to .
IiiB3 Dynamic range
A second parameter which is very likely to hamper the performances of standard BSS methods is the dynamic range between between the amplitudes of correlated and noncorrelated entries. More precisely, standard sparse BSS algorithms are highly sensitive to the coefficients of the sources which have the highest amplitudes. This makes perfect sense when morphological diversity holds true: the most significant entries of the sources are most likely the most discriminant. Furthermore, we claimed earlier in this paper that the weighting strategy of the AMCA algorithm penalizes more high amplitude coefficients which are shared by several sources. The AMCA algorithm should therefore be less sensitive to the dynamic range between correlated and noncorrelated samples of the sources than the standard BSS methods. In this experiment, the number of sources and the number of channels are both fixed to . The proportion of correlated entries between the sources is fixed to . The noise level is fixed to dB. Figure 9 features the evolution of the SDR (resp. mixing matrix criterion ) in the left (resp. right) panel as a function of the standard deviation of the correlated entries of the sources in the interval . For (i.e. on average, correlated entries have twice the amplitude of the noncorrelated entries), standard BSS methods all behave quite well with SDR values higher than dB. For higher values of their performances dramatically decrease towards low SDR values. As expected, the AMCA algorithm is only slightly impacted by the variation of the dynamic range. Even for , the SDR of the sources with the AMCA algorithm is higher than dB and the corresponding mixing matrix criterion is about orders of magnitude lower than with the standard BSS algorithms.
IiiB4 Number of sources
Recovering a large number of sources is generally a complicated task. Indeed, for a fixed number of samples , the ability of source separation techniques to disentangle a growing number of sources is rapidly limited. This is true in the case of s.p.c. sources as the number of entries of the sources which are most relevant for the separation are also limited. In the next experiment, the number of sources and the number of channels are both fixed to the same value. The proportion of correlated entries is fixed to . The noise level is fixed to dB and . Figure 10 displays the evolution of the SDR on the left panel and on the right panel as a function of the number sources in a range . As expected the performances of the standard sparse BSS algorithms, as measured by the SDR, decay as the number of sources to be recovered grows. The AMCA algorithm provides good results (the SDR is higher than dB) for less than roughly sources. For a higher number of sources, the performances of the AMCA decrease quite rapidly. However, one has to keep in mind that the number of samples per source is kept fixed to from which are active, that is to say on average . Amongst these active samples, only have been drawn independently. This means that each source has about likely discriminant samples out of . When the number of sources increases, the probability for two independently drawn samples to take simultaneously significant amplitudes grows. This might become likely when the number of sources is large which explains why the performances of AMCA decay for . The evolution of the mixing matrix criterion has to be analyzed with care as the performances of all the methods seem to improve as the number of sources increases while the evolution of the SDR shows a quite different behavior. It has to be noticed that the mixing matrix criterion averages the scalar product of the rows of the inverse of the estimated mixing matrix and the true mixing matrix. In other words, it looks at a quantity that only involves interferences between two sources. On the other hand, the SDR is a global measure of distortion that impacts each of the sources. Precisely, for a fixed value of the mixing matrix criterion, the distortion will increase with the number of sources which explains why the actual SDR of the standard sparse BSS methods decrease with .
IiiC The AMCA algorithm and noise
Discussion about the impact of noise
In this section, we discuss the impact of the reweighting scheme on the performances of the AMCA in the noisy setting. First, in the AMCA algorithm, the weights are estimated from the estimated sources. These sources are obtained via Step 1 of the AMCA algorithm.
In the low noise limit, one interesting feature of the proposed reweighting scheme is that it is inversely proportional to the amplitude of the columns of . More precisely, this entails that large entries of which are shared by several sources are more penalized than small entries with the same relative distribution across the sources. Strongly penalizing large and correlated entries is desirable since they are detrimental to the estimation of the mixing matrix and the sources.
In the noisy setting, the situation turns out to be rather different since smallamplitude samples are more likely perturbed by noise than large amplitude entries. On the one hand, the proposed reweighting procedure might be disastrous for the separation of the sources whether they are partially correlated or not. Indeed, since the weights are inversely proportional to the amplitude of the columns of the sources, the proposed procedure will tend to favor small entries which are more affected by the presence of noise. On the other hand, Step 1 of the AMCA algorithm rejects entries with amplitudes smaller than some prescribed noisedependent level. This should lower the impact of noise on the performances of the AMCA algorithm.
Comparisons
In [18], the authors demonstrated that the GMCA algorithm is robust to additive noise contamination. This is especially true whenever morphological diversity holds; in that case the most discriminant sources are the entries of the sources with the most significant amplitudes. It turns out these entries are also the least contaminated by additive noise. In the case of s.p.c. sources, the most discriminant sources are not necessarily the largeamplitude samples. A first consequence is that noise will very likely have a strong impact on the quality of the separation.
In this experiment, the number of sources and the number of channels are both fixed to . The proportion of correlated entries of the sources is fixed to , is set to . The noise level varies in the range dB. The evolution of the mixing matrix criterion is featured in Figure 11. It first reveals that the GMCAbased methods have slightly better performances when the noise level increases. When the noise level tends towards dB all these methods tend to all perform badly. More surprisingly, the performances of the AMCA algorithm do not seem to be more impacted by the presence of noise than standard BSS methods. To better understand this behavior, let us recall that, in the AMCA algorithm, the mixing matrix is updated from the current estimate of the sources which is naturally denoised in Step 1 of the algorithm. This means that coefficients of the sources which are of the order of or below the noise level are naturally rejected in Step 1; the estimation of the mixing matrix won’t therefore account for these entries. In other words the reweighting procedure won’t “amplify” the impact of the noise. Eventually, for high noise levels, only high amplitude coefficients will be detected. In this regime, the GMCA and AMCA algorithms will be similarly handicapped by correlated largeamplitude coefficients.
IiiD Application in Astrophysics
Multiwavelength instruments and sensors have long been used in Astrophysics. However, it is only recently that the design of modern and sophisticated data analysis methods, such as blind source separation, play a significant role in this field. The recent results of the ESA/Planck space mission in 2013 has shed light on the crucial role that blind source separation has played to provide a very accurate map of the Cosmological Microwave Background (CMB) from multiwavelength microwave data [1, 51]. In this context, each observation measures the mixture of several astrophysical components including the CMB and galactic foreground emissions to only name two. Galactic emissions can further be decomposed into several physically relevant sources with different spectral contributions. As emphasized in the introduction, these galactic sources are emitted from similar regions of the sky which entails that these components are, by nature, partially correlated. Disentangling between the different galactic sources is a challenging task which has been rarely tackled using blind source separation so far. In the rest of this section, we focus on the estimation of the various galactic sources from CMBfree data – the CMB is customarily removed from the data before studying galactic components. In the range of the microwave wavelengths observed by instruments such as WMAP [52, 2] and Planck, the most prominent galactic emissions are : the freefree emission, galactic synchrotron emission, spinning dust and thermal dust emissions – see [39] for more details about astrophysical emissions in the range of microwave wavelengths and their simulations.
In this section, our goal is to study the impact of the partial correlation of astrophysical components on the performances of blind source separation methods. This study will therefore be carried out on simulations of the WMAP[52] observations in the range Ghz generated by the PSM (Planck sky model^{2}^{2}2http://www.apc.univparis7.fr/ delabrou/PSM/psm.html – see [39]). Simulations of these four major galactic emissions at the vicinity of the galactic center are displayed in Figure 12. In these simulations, for the sake of simplicity, the linear mixture model (1) holds true. As displayed in Figure 12, the most prominent emissive areas of all four sources are located in similar regions of the sky. This makes them partially correlated, especially at large scale or low frequencies. The simulated WMAP observations are featured in Figure 13. The sources displayed in Figure 12 have an approximately sparse distribution in a bidimensional translationinvariant wavelet transform [12]; this will be our choice for the signal representation in this section. All the sparse BSS algorithms used in the forthcoming experiments have been performed in the wavelet domain using default parameters.
In the context of astrophysics, the impact of noise is usually deemed an important contribution that can hamper the quality of separation. The performances of sparse BSS are therefore evaluated for different levels of noise using the SDR and the mixing matrix criterion defined above. The left column of Figure 14 (resp. 15 and 16) features the sources estimated by the EFICA algorithm (resp. RNA and GMCA) for a normalized noise level of . The normalized noise level corresponds to the nominal level of the WMAP data. It corresponds to a signaltonoise ratio of dB. This value might seem very large but only largescale features are emerging above the noise level for mild values of the SNR.
With the exception of the estimated freefree emission which shows some spurious point sources residuals that originally belong to the thermal dust emission, all the four sources seem to be correctly estimated. The column on the left of Figure 14 (resp. 15 and 16) features the residual error defined by the difference between the estimated and original sources. Figure 17 displays exactly the same pictures for the AMCA algorithm. The display levels of the residual errors are kept the same to better reveal the differences between the results from all tested algorithms. The level of estimation error provided by the AMCA algorithm seem to be significantly lower for the freefree, spinning dust and thermal dust emissions while tending to be slighter better for the synchrotron emission.
Figure 18 shows the evolution of the SDR for the four tested sparse BSS methods when the normalized noise level varies from (i.e. the noise level is actually lower than for that real WMAP data) to . Each entry of this figure has been obtained as the median value of independent MonteCarlo simulations. This figure shows that the performances of the standard sparse BSS methods seem to be quite similar at all levels of noise with values of SDR in the range dB. The AMCA algorithm provides estimates with a level of SDR that is almost dB higher in the low noise regime. We already emphasized that the evolution of the SDR yields a partial view of the performances of these algorithms when the noise level increases. In this regime, the mixing matrix criterion is better suited to evaluate the quality of separation independently of the level of noise that contaminates the estimates of the sources. The evolution of is provided in Figure 18. This figure confirms that the standard sparse BSS algorithms perform in a rather similar manner with values of the mixing matrix criterion of about . In the meantime, the AMCA algorithm performs slightly better, especially in the low noise regime, with a value of that is about times lower in this case. In this high noise level regime, only the largescale features of the sources are detected. Since these features are likely to be at the origin of the partial correlation of the sources, all algorithms, including AMCA, are handicapped in this setting.
Software
Following the philosophy of reproducible research [53], the codes and algorithms introduced in this article will be available at http://www.cosmostat.org/GMCALab.
Iv Conclusion
The blind separation of sparse and partially correlated (s.p.c.) sources is a challenging task which standard sparse BSS methods generally hardly tackle with success. In this article, we introduce a novel sparse blind source separation coined Adaptive Morphological Component Analysis which is designed to retrieve sparse and partially correlated sources. We emphasize that recovering s.p.c. sources can be approached by identifying and privileging the most discriminant entries of the sources. We then propose learning adaptively these entries via a reweighting scheme jointly with the estimation of the sources and the mixing matrix yielding the proposed AMCA (Adaptive Morphological Component Analysis) algorithm. We further evaluate the performances of the AMCA algorithm with respect to standard sparse BSS methods in various experimental scenarios using MonteCarlo simulations of s.p.c. sources. The AMCA algorithm is shown to be much less sensitive to partial correlations of the sources even when the sources are highly correlated (i.e. when about of the active samples are common to all the sources). The proposed algorithm is slightly impacted by the dynamic of the amplitudes of correlated and independent entries. It is competitive when the number of sources to be retrieved is large. Finally, we applied the AMCA algorithm to the separation of astrophysical components from simulated microwave data. This application illustrates that the AMCA algorithm is well suited to estimate physical components which are, by nature, partially correlated.
Acknowledgment
This work was supported by the French National Agency for Research (ANR) 11ASTR03402MultID.
References
 [1] Planck Collaboration, P. A. R. Ade, N. Aghanim, C. ArmitageCaplan, M. Arnaud, M. Ashdown, F. AtrioBarandela, J. Aumont, C. Baccigalupi, A. J. Banday, and et al., “Planck 2013 results. XII. Component separation,” ArXiv eprints 1303.5072, Mar. 2013.
 [2] J. Bobin, F. Sureau, P. Paykari, A. Rassat, S. Basak, and J.L. Starck, “Wmap nineyear cmb estimation using sparsity,” A&A, vol. 553, p. L4, 2013.
 [3] J. Bobin, J.L. Starck, F. Sureau, and S. Basak, “Sparse component separation for accurate cosmic microwave background estimation,” Astronomy & Astrophysics, vol. 550, no. A73, 2013.
 [4] P. Comon and C. Jutten, Handbook of blind source separation. Elsevier, 2010.
 [5] A. Hyvarinen, J. Karhunen, and E. Oja, Independent Component Analysis. J. Wiley New York, 2001.
 [6] S. Choi, A. Cichocki, H. Park, and S. Lee, “Blind source separation and independent component analysis: a review,” Neural Inform. Process. Lett. Rev., vol. 6, pp. 1–57, 2005.
 [7] M. Elad, Sparse and Redundant Representations: From Theory to Applications in Signal and Image Processing. Springer, 2010.
 [8] J.L. Starck, F. Murtagh, and J. Fadili, Sparse Image and Signal Processing: Wavelets, Curvelets, Morphological Diversity. Cambridge, 2010.
 [9] E. Candès and M. Wakin, “People hearing without listening: An introduction to compressive sampling,” IEEE Signal Processing Magazine, March 2008, 21–30.
 [10] M. J. Fadili, J.L. Starck, and F. Murtagh, “Inpainting and zooming using sparse representations,” The Computer Journal, vol. 52, 2009, 64–79.
 [11] S. Mallat, A Wavelet Tour of Signal Processing. Academic Press, 1998.
 [12] J.L.Starck, J.Fadili, and F.Murtagh, “The undecimated wavelet decomposition and its reconstruction,” IEEE Transactions on Image Processing, vol. 16, pp. 297–309, 2007.
 [13] M. J. Fadili and J.L. Starck, “Curvelets and ridgelets,” in Encyclopedia of Complexity and System Science. Springer, 2008, in press.
 [14] M. Aharon, M. Elad, and A. Bruckstein, “KSVD: An algorithm for designing overcomplete dictionaries for sparse representation,” ITSP, vol. 54, no. 11, 2006, 4311–4322.
 [15] M. Zibulevski, “Blind source separation with relative Newton method,” Proceedings of ICA, Independent Component Analysis, 2003, 2003, 897902.
 [16] M. Z. A.M. Bronstein, M.M. Bronstein and Y.Y.Zeevi, “Sparse ica for blind separation of transmitted and reflected images,” Intl. Journal of Imaging Science and Technology (IJIST), vol. 15/1, pp. 84–91, 2005.
 [17] Y. Li, S. Amari, A. Cichocki, and C. Guan, “Underdetermined blind source separation based on sparse representation,” IEEE Transactions on information theory, vol. 52, pp. 3139–3152, 2006.
 [18] J. Bobin, J.L. Starck, M. J. Fadili, and Y. Moudden, “Sparsity and morphological diversity in blind source separation,” IEEE Transactions on Image Processing, vol. 16, no. 11, 2007, 2662–2674.
 [19] P. T. Z. Koldovsky and E. Oja, “Efficient variant of algorithm fastica for independent component analysis attaining the cramerrao lower bound,” IEEE Transactions on neural networks, vol. 17, pp. 1265–1277, 2006.
 [20] A. Hyvärinen, “Fast and robust fixedpoint algorithms for independent component analysis,” IEEE Transactions on Neural Networks, vol. 10, no. 3, pp. 626–634, 1999.
 [21] R. Gribonval and S. Lesage, “A survey of sparse component analysis of blind source separation: principles, perspectives, and new challenges,” in ESANN’ 2006 proceedings, 2006.
 [22] S. Lesage, S. Krstulovic, and R. Gribonval, “Underdetermined source separation: comparison of two approaches based on sparse decompositions,” in Proceedings ICA 2006, 2006.
 [23] A. Jourjine, S. Rickard, and O. Yilmaz, “Blind separation of disjoint orthogonal signals: demixing n sources from 2 mixtures,” in Proceedings of ICASSP 2000, 2000.
 [24] O. Yilmaz and S. Rickard, “Blind separation of speech mixtures via timefrequency masking,” IEEE Transactions on Signal Processing, vol. 52, no. 7, pp. 1830–1847, 2004.
 [25] R. Saab, O. Yilmaz, M. McKeown, and R. Abugharbieh, “Underdetermined blind source separation using sparse representations with delays,” in Proceedings SPARS’ 05, 2005.
 [26] F. Abrard and Y. Deville, “A timefrequency blind signal separation method applicable to underdetermined mixtures of dependent sources,” Signal Processing, vol. 85, no. 7, pp. 1389 – 1403, 2005.
 [27] J. Bobin, J.L. Starck, Y. Moudden, and M. J. Fadili, “Blind source separation: The sparsity revolution,” Advances in Imaging and Electron Physics, vol. 152, 2008, 221–306.
 [28] J. Kim, R. Monteiro, and H. Park, “Group Sparsity in Nonnegative Matrix Factorization,” in SDM, 2012, pp. 851–862.
 [29] N. Gillis, “Sparse and Unique Nonnegative Matrix Factorization Through Data Preprocessing,” Journal of Machine Learning Research, vol. 13, pp. 3349–3386, 2012.
 [30] N. Gillis and A. Kumar, “Exact and heuristic algorithms for seminonnegative matrix factorization,” Faculté polytechnique, Université de Mons, https://sites.google.com/site/nicolasgillis/publications, Tech. Rep., 2014.
 [31] J. Rapin, J. Bobin, A. Larue, and J.L. Starck, “Sparse and NonNegative BSS for Noisy Data,” IEEE Transactions on Signal Processing, vol. 61, pp. 5620–5632, 2013.
 [32] ——, “Analysis and Synthesis Sparsity in NMF,” SIAM Imaging Science (in press), 2014.
 [33] W. Naanaa and J.M. Nuzillard, “Blind source separation of positive and partially correlated data,” Signal Processing, vol. 85, pp. 1711–1722, 2005.
 [34] Y. Sun, C. Ridge, F. del Rio, A. Shaka, and J. Xin, “Postprocessing and sparse blind source separation of positive and partially overlapped data,” Signal Processing, vol. 91, pp. 1838–1851, 2011.
 [35] D. Donoho and V. Stodden, “When does nonnegative matrix factorization give a correct decomposition into parts,” Earth, vol. 16, pp. 1141–1148, 2003.
 [36] N. Gillis, The why and how of nonnegative matrix factorization, ser. Machine learning and Pattern recognition series, J. Suykens, M. Signoretto, and A. Argyriou, Eds. Chapman & Hall, CRC, 2014.
 [37] T.H. Chan, C.Y. Chi, Y.M. Huang, and W.K. Ma, “Convex analysis based minimumvolume enclosing simplex algorithm for hyperspectral unmixing,” Acoustics, Speech, and Signal Processing, IEEE International Conference on, pp. 1089–1092, 2009.
 [38] R. Bouchet and R. Gispert, “Foregrounds and cmb experiments: I. semianalytical estimates of contamination,” New Astronomy, vol. 4, no. 443, 1999.
 [39] J. Delabrouille and et al., “The prelaunch Planck Sky Model: a model of sky emission at submillimetre to centimetre wavelengths,” Astronomy & Astrophysics, vol. 553, p. A96, May 2013.
 [40] J. Bobin, Y. Moudden, J.L. Starck, J. Fadili, and N. Aghanim, “SZ and CMB reconstruction using Generalized Morphological Component Analysis,” Statistical Methodology, vol. 5, no. 4, p. 11, 2007. [Online]. Available: http://arxiv.org/abs/0712.0588
 [41] A. Bruce, S. Sardy, and P. Tseng, “Block coordinate relaxation methods for nonparametric signal denoising,” Proceedings of the SPIE  The International Society for Optical Engineering, vol. 3391, pp. 75–86, 1998.
 [42] P. Tseng, “Convergence of a Block Coordinate Descent Method for Nondifferentiable Minimization,” Optimization, vol. 109, no. 3, pp. 475–494, 2001. [Online]. Available: http://www.springerlink.com/index/q327675221126243.pdf
 [43] P. L. Combettes and V. R. Wajs, “Signal recovery by proximal forwardbackward splitting,” SIAM Journal on Multiscale Modeling and Simulation, vol. 4, no. 4, pp. 1168–1200, 2005.
 [44] N. Parikh and S. Boyd, “Proximal algorithms,” Foundations and Trends in Optimization, vol. 1, no. 3, 2014.
 [45] Rockafellar, Convex analysis, ser. Princeton Landmarks in Mathematics and Physics. Princeton University Press, 1970.
 [46] E. Hale, W. Yin, and Y. Zhang, “A FixedPoint Continuation Method for l1 Regularized Minimization with Applications to Compressed Sensing,” Rice University CAAM Technical Report, vol. TR0707, 2007.
 [47] E. J. Candès, M. B. Wakin, and S. P. Boyd, “Enhancing Sparsity by Reweighted Minimization,” Journal of Fourier Analysis and Applications, vol. 14, pp. 877–905, 2007.
 [48] R. R. Coifman and D. L. Donoho, “Translationinvariant denoising,” in Wavelets and Statistics, G. O. A. Antoniadis, Ed. SpringerVerlag, 1995, pp. 125–150.
 [49] J.L. Starck, P. Abrial, Y. Moudden, and M. Nguyen, “Wavelets, ridgelets and curvelets on the sphere,” Astronomy & Astrophysics, vol. 446, pp. 1191–1204, 2006.
 [50] E. Vincent, R. Gribonval, and C. Févotte, “Performance measurement in blind audio source separation,” IEEE Transactions on Audio, Speech & Language Processing, vol. 14, no. 4, pp. 1462–1469, 2006.
 [51] J. Bobin, F. Sureau, J.L. Starck, A. Rassat, and P. Paykari, “Joint planck and wmap cmb map reconstruction,” Astronomy & Astrophysics, vol. 563, no. A105, 2014.
 [52] C. L. Bennett and et.al., “Nineyear wilkinson microwave anisotropy probe (wmap) observations: Final maps and results,” ApJS, 2013.
 [53] J. Buckheit and D. Donoho, “WaveLab and Reproducible Research.” SpringerVerlag, 1995, pp. 55–81.