Sparse and Non-negative BSS for Noisy Data

Sparse and Non-negative BSS for Noisy Data

Jérémy Rapin, Jérôme Bobin, Anthony Larue, and Jean-Luc Starck
Copyright (c) 2013 IEEE. Personal use of this material is permitted. However, permission to use this material for any other purposes must be obtained from the IEEE by sending a request to accepted on August , 2013.A. Larue and J. Rapin are with CEA, LIST, 91191 Gif-sur-Yvette Cedex, France.J. Bobin, J.Rapin and J.-L. Starck are with CEA, IRFU, Service d’Astrophysique, 91191 Gif-sur-Yvette Cedex, France.

Non-negative blind source separation (BSS) has raised interest in various fields of research, as testified by the wide literature on the topic of non-negative matrix factorization (NMF). In this context, it is fundamental that the sources to be estimated present some diversity in order to be efficiently retrieved. Sparsity is known to enhance such contrast between the sources while producing very robust approaches, especially to noise. In this paper we introduce a new algorithm in order to tackle the blind separation of non-negative sparse sources from noisy measurements. We first show that sparsity and non-negativity constraints have to be carefully applied on the sought-after solution. In fact, improperly constrained solutions are unlikely to be stable and are therefore sub-optimal. The proposed algorithm, named nGMCA (non-negative Generalized Morphological Component Analysis), makes use of proximal calculus techniques to provide properly constrained solutions. The performance of nGMCA compared to other state-of-the-art algorithms is demonstrated by numerical experiments encompassing a wide variety of settings, with negligible parameter tuning. In particular, nGMCA is shown to provide robustness to noise and performs well on synthetic mixtures of real NMR spectra.

BSS, NMF, sparsity, morphological diversity

I Introduction

In many applications, such as nuclear magnetic resonance (NMR) spectrometry or mass-spectrometry, measurements are often made of mixtures of physical components which can be identified by their specific spectrum. Discriminating between these elementary components or sources can be made by acquiring several measurements at different times or locations in order to observe different mixtures, yielding multispectral data.

In this context, blind source separation (BSS) aims at recovering the spectra from measurements in which the components sources are mixed up together in an unknown way. The instantaneous linear mixture model assumes that the measurements are linear mixture of sources with spectra . In other words, there exist mixture coefficients such that:


where the vectors are added in order to account for noise and model imperfections. This mixing model can be conveniently rewritten in matrix form:



  • is the number of measurements.

  • is the number of samples of the sources.

  • is the number of sources.

  • is the measurements matrix in which each row is a measurement.

  • the unknown source matrix in which each row is a spectrum/source.

  • the unknown mixing matrix which defines the contribution of each source to the measurements.

  • is an unknown noise matrix accounting for instrumental noise and/or model imperfections.

With the notation (Frobenius norm for ) and for independent and identically distributed Gaussian noise, the maximum-likelihood estimate is then provided by the standard problem:


However, this problem presents an infinite number of solutions which are not necessarily of any interest with respect to a given application. It is therefore standard to constrain and/or so as to limit the search to minima with a desired structure. In this article, we focus on the assumption that both and are non-negative, yielding Non-negative Matrix Factorization (NMF).

The non-negativity assumption arises naturally in many applications, including text mining [1], clustering [2], audio processing [3] and spectrometry [4]. Indeed, the sources in can for instance be mass or power spectra, which are non-negative, and the mixtures can represent concentrations, which cannot be negative either. The problem is written under the constrained form:


can be the distance such as in (3) or any divergence taking into account other priors on the noise [3, 5, 6]. In this article, we focus on Gaussian noise and therefore use the distance.

The first publications dealing with this type of problems comes from Paatero & Tapper [7] and Lee & Seung [5] who provided convergent gradient descent type algorithms. They emphasized on the importance of the non-negativity constraint which yields a “part-based” representation of the data. Indeed, only additive/constructive, and not subtractive/destructive combinations are then allowed in order to represent the data.

NMF shares a couple of indeterminacies with other BSS problems:

  • permutations: if is a permutation, so that one cannot hope to recover the order of the rows of and columns of .

  • scales: if is diagonal with strictly positive diagonal coefficients, so that if is a solution of (4), is also solution.

However, unlike problem (3) for which a solution can be conveniently obtained through SVD decomposition for instance, NMF is NP-Hard [8] and can present numerous local minima. For this reason, additional constraints or priors information can be helpful to recover the sought sources. In the next sections, we impose a sparse prior on the sources, together with their non-negativity.


This article details and extends the preliminary work in [9] with more complete experimental study and a more advanced understanding of the algorithms. Specifically, noise is accurately accounted for. It illustrates the proposed approach on realistic data as well. In Section II, we first give a review of the state-of-the-art NMF methods and sparse NMF techniques proposed so far. Our approach aims at obtaining a better management of the noise through an ad hoc regularization strategy. For that purpose, we introduce in Section III an extension of the sparse BSS algorithm GMCA (Generalized Morphological Component Analysis) to tackle the problem of sparse and non-negative BSS from noisy data. This extension is motivated by the robustness of GMCA to additive noise contamination.

We further show that a special care has to be taken to constrain both the sparsity and the non-negativity of the signals. More specifically, improperly constrained methods can lead to unstable and sub-optimal solutions. We therefore introduce the non-negative GMCA (nGMCA) which has the ability to provide exact solutions of the constrained and penalized sub-problems, thanks to proximal calculus methods. Last but not least, the proposed algorithm is a simple to use technique since it barely requires any parameter tuning.

In Section IV the proposed approach is compared with standard NMF and sparse NMF algorithms. Exhaustive experiments show that nGMCA allows for better robustness to noise, large numbers of sources and the conditioning of the mixing matrix, and therefore outperforms most algorithms for a broad variety of settings. Finally, in section V, we illustrate how nGMCA performs also well to separate out synthetic NMR spectra.

Ii State-of-the-art NMF methods

Ii-a Standard NMF algorithms

The minimization of the cost function from Problem (4) is generally solved by alternately updating and . Indeed, while this problem is not convex, the sub-problem in writes:


and the equivalent sub-problem in are often convex and their minimization is therefore much easier. These alternating updates are then repeated for a large number of iterations.

The first converging NMF algorithm, designed by Lee & Seung [5, 10], updates and with a weighted gradient descent. The weights insure that the gradient steps do not increase the cost function in (4) and keep and non-negative, since the update can be recast as a pointwise product of non-negative matrices. With the element-wise matrix multiplication and the element-wise matrix division, the update rule for and in the case of a least square cost function can be written as follows:


The multiplicative update rule is usually considered as a standard because of its convenience, with no parameter to set. As well, it was the first convergent algorithm proposed to solve the NMF problem. However, it has been shown to be slow and that the monotone decrease of the algorithm does not insure convergence to a local minimum [11, 12]. First-order methods are also used in projected/proximal gradient descent algorithms [13], interior point gradient [14], and quasi-Newton algorithms [15].

Another standard approach is the Alternating Least Square (ALS) algorithm explained in [7], which solves exactly the unconstrained cost function with a pseudo-inverse and projects the result on the non-negativity constraint:


with the operator . This algorithm is also widely used due to its easy implementation and its efficiency in decreasing the cost function. However, it does not necessarily converge. In Hierarchical ALS (HALS, [16, 17]), the columns of and rows of are processed one by one. This yields a simple and fast optimization process to solve the constrained sub-problems.

It is however possible to solve exactly the constrained sub-problems of type (5) at each iteration as follows:


In [18] for instance, Lin uses a projected gradient descent subroutine to solve the sub-problems. Guan et al. later provided a faster first-order method [19].

It has to be mentioned that other approaches based on geometrical methods have also been investigated to solve NMF problems[20, 21, 22]; these approaches are however generally very sensitive to noise.

As was previously stated, non-negativity is not always sufficient to recover the actual sources and mixing matrix. In non-negative ICA [23], one additionally enforces the independence of the sources. However, this approach is also sensitive to noise. Sparsity, on the other hand, has been shown to provide robustness to noise. We give a short introduction about this prior in the next section, before presenting sparse NMF algorithms.

Ii-B Sparsity & NMF

Ii-B1 An Introduction to Sparsity

Sparsity constraints have already proved their efficiency to solve a very wide range of inverse problems (see [24] and references therein). In the context of BSS, sparsity has been shown to increase the diversity between the sources which greatly helps their separation [25, 26, 27]. In the wide sense, a sparse signal is such that its information content is concentrated into only a few large non-zero coefficients, or can be well approximated in such a way. The sparsity of a signal however depends on the basis or dictionary in which it is expressed. For instance, a sine wave will be sparse in the Fourier domain since it can be encoded with one coefficient in this domain, while in the direct domain most of its coefficients are non-zero: the more a basis captures the structures of a signal, the sparser will be in such a basis. In this article we will however only focus on sparsity in the direct domain.

Ii-B2 Sparse NMF Algorithms

In many applications the non-negativity and sparsity of the sources arise naturally as for instance in MS or NMR spectroscopy. Recent works have emphasized the fact that this knowledge can indeed help perform more relevant factorizations [2, 28, 29].

In [2], Kim & Park have proposed to formulate the NMF problem as:


In this equation, the sparsity-enforcing regularizer favors solutions where a single source dominates at each sample. However, it does not enforce the intrinsic sparsity of each of the sources. The authors made use of an active-set method to solve this problem. This technique can solve exactly each constrained sub-problem in a similar way than in Problem (8). However, it is not clear how the parameters and must be set. The authors provide an implementation on their website111

In [15], Zdunek & Cichocki used a similar regularization term for but with decreasing during the algorithms in order to be more robust to local minima, and without exactly solving the sub-problems.

In [29], Hoyer used a sparse regularization of the form that uniformly enforce the sparsity of . The author later used a different type of sparse prior in [30] defined for some vector as follows:


This sparseness function goes from 1 when is perfectly sparse —only 1 active coefficient— to 0 when it is not at all —all coefficients active, with the same value. The idea is therefore to optionally impose a chosen level of sparsity for the sources and/or the mixtures:


This problem is solved by using projected gradient descent steps followed by a projection on the sparseness constraint if the constraint is active and with a multiplicative update otherwise. However, the constraint is directly related to the expected sparseness level of the sources which is not necessarily known beforehand. Furthermore, hard-constraining the sparseness level may make the solution very dependent on the sparseness parameters and . An implementation of this algorithm is available online222

Sparse HALS [16, 17] aims at solving Problem (4) with a sparsity penalization of the form . In the case of an data fidelity term, this problem can be rewritten as:


In this algorithm, the columns of and lines of are updated one by one. As each sub-problem admits a straightforward analytic solution and no matrix inversion is required, the HALS algorithm turns to be a simple and fast NMF solver. In a recent implementation of the HALS [31, 32]333, the parameter of is automatically managed in order to obtain a required sparsity rate (defined as the ratio of coefficients smaller than times the largest coefficient).

It is important to note that none of these algorithms makes use of the sparse prior explicitly to deal with additive noise; therefore they may not be robust in case of noise contamination.

Iii Non-negative Generalized Morphological Component Analysis

Iii-a A first naive extension

In the last decade the use of sparsity in the field of BSS has been widely explored. In [27, 33], the authors have introduced a sparsity-enforcing BSS technique coined Generalized Morphological Component Analysis (GMCA) which has been shown to be effective at separating out sparse signals from noisy data. Morphological diversity has been defined in [33] as a mean to characterize separable sources based on their geometrical structures or morphologies : separable sources with different morphologies do not share the same significant coefficients in a given sparse representation. When sparsity holds in the direct domain this means that the entries of each source with the most significant amplitudes should be different. This does not mean that their supports are disjoint but rather their most significant elements should be disjoint.

The objective of this paper is to extend this sparsity-enforcing BSS algorithm to deal with non-negative mixtures. Following [27], the GMCA with an additional non-negative constraint estimates the mixing matrix and the sources by minimizing the following optimization problem:


The pseudo-norm counts non-null coefficients in and therefore limits their number, thus enforcing the sparsity of . In the vein of Alternating Least Squares, GMCA alternately and iteratively estimates the unconstrained least square solution and projects on the non-negativity constraint, with an additional thresholding step for the sources in order to keep only the most significant coefficients. These updates are provided in lines 6 and 7 of Algorithm 1 where the hard-thresholding operator is defined as follows:


It has been emphasized in [27] that one crucial feature of GMCA is the use of a decreasing threshold . At the beginning, this parameter is first set to a high value and then decreases throughout the iterative down to a final value that depends on the noise level. Simulated annealing has already inspired decreasing and regularizations in NMF [15, 34]. The motivation behind a decreasing threshold is however different:

  1. first estimating the mixing matrix from the entries of the sources that have the highest amplitude and thus likely to belong to only one source.

  2. help removing the smallest coefficients which are more sensitive to noise contamination.

In the same way as in the original GMCA, for a source with index , the threshold is set to where is an empirical estimator (the median absolute deviation) of the source noise variation. is chosen at each iteration in order to obtain a linear increase of the number of active coefficients in , so as to refine the estimation while maintaining some continuity. The final , , is usually taken in the range as a trade-off between sufficient denoising and correct separation. Indeed, for a sparse signal contaminated by i.i.d. Gaussian noise with standard deviation , thresholding at 3 rejects noise samples with probability 0.99. Still, in BSS, too large a final threshold could leave some leakage between the sources. This nGMCA will be considered as a naive extension of GMCA and used as a reference algorithm. It will be coined naive non-negative GMCA (nGMCA).

2:initialize and
3:for  do
4:     Normalize the columns of
6:     Select the thresholds considering
9:end for
Algorithm 1 : nGMCA

It is very important to notice that this algorithm, as based on alternating projected least-squares, is only a proxy which generally does not converge to a stable couple , hence the “naive” denomination. This means that the solution given by this type of algorithm may not be stable and be sub-optimal. The solution may not provide the sparsest non-negative sources. This is due to the fact that it deals with the data fidelity term and the constraints in a completely independent way, thus not exactly solving the sub-problems such as in Problem (8).

Next, we therefore propose an alternative algorithm which exactly solves the non-negatively constrained and penalized sub-problems and should allow for more robust and stable solutions.


We first propose to tackle the sparse non-negative BSS problem using an sparse regularization with such as in (13). This then amounts to solving the following optimization problem:


where is the characteristic function of the non-negative orthant, which enforces the non-negative constraints. The characteristic function of the non-negative orthant is defined as:


In the previous section, we emphasized that a naive approach based on projected least-square does not necessarily provide a stable and thus optimal solution to the problem. To go beyond the aforementioned naive extension of GMCA, one has to alternatively and exactly minimize the constrained sub-problems in and so as to obtain stable solutions with the sought structure. Let’s first have a look at the sub-problem in ; assuming is fixed, the sources are estimated as follows:


This problem minimizes a function which can be split into the sum of a differentiable quadratic term and two non-smooth and non-differentiable terms: the norm and the characteristic function of the non-negative orthant . The choice provides the closest convex surrogate to the sparse norm and in this case, the problem admits a unique solution which, however, cannot be formulated explicitly. Fortunately, thanks to recent advances in proximal calculus and splitting techniques, efficient algorithms can be designed to solve this type of problems.

Iii-B1 Proximal calculus and splitting methods

Proximal splitting methods [35] aim at minimizing convex functions which may not be differentiable. The idea of these methods is to split the cost function into the sum of several convex functions which are alternately locally minimized. Many problems can be decomposed into the general form:


where is a smooth and convex data fidelity term; and a convex regularization term which may not be differentiable.

Since is differentiable, it can be locally minimized with a gradient descent step. When is not, but if it is convex, proper and lower semi-continuous, one can define the proximal operator of :


In this case, the following process called forward-backward splitting algorithm (FBS, see [35]):


has been shown to converge to the solution of Problem (18) if is -Lipschitz.

The update of is made by solving the following problem:


which can be recast in the general form (18) by defining and . In this case, the gradient of is trivially equal to: , it is -Lipschitz with where is the matrix spectral norm. The proximal operator of is the projector onto the non-negative orthant . The FBS then reduces to a projected gradient algorithm similar to the updates in [18].

Similarly, the update of in Problem (17) with can be solved with FBS, with . The proximal operator of this function also takes an explicit form usually termed skewed-position soft-thresholding operator:


where the soft-thresholding operator is defined as:


This soft-thresholding, induced by the norm, is well-known to introduce a bias. It is therefore customary to use . In the FBS this is made by replacing the soft-thresholding operator with the hard-thresholding. Rigorously, it is not a proximal operator since is not convex nor semi-continuous; this means that there is no convergence guarantee of the forward-backward splitting algorithm when hard-thresholding is used.

Iii-B2 Description of nGMCA

In the same vein as the naive nGMCA introduced in the previous section, and are alternately updated with the exception that, now, each sub-problem is solved exactly, which guarantees that the solution is stable and has the sought structure. The main steps of the algorithm are given in Algorithm 2. One must notice that exactly solving each sub-problem is costly since it requires sub-iterations at each step. Fortunately, it has been recently showed that the speed of convergence of the FBS algorithm can be greatly improved by using the multi-step techniques introduced by Nesterov [36]. Both steps have the algorithm make use of an accelerated version of the FBS algorithm [37]. A detailed description of how this accelerated algorithm is used to update is given in appendix A.

A special care has to be given to renormalizations since the regularization tends to reduce the norm of . More specifically, since the norm of can keep increasing to compensate the reduction of the norm of , the algorithm can converge to the degenerate solution and . In the algorithm, the columns of are therefore renormalized to unity before updating . This also assigns to the coefficients of their overall importance in the estimation of .

Following the general thresholding strategy used in GMCA and its extensions, the threshold decreases from step to step. However, the strategy used in this version nGMCA differs from the one used in the naive approach. In the former, the threshold is applied to the sources as defined by their least-square estimate. On the contrary the threshold in nGMCA applies at each gradient descent step. The update rule of in the sub-iterations of nGMCA (without the acceleration) can be written as follows:


with containing only ones. Iterative soft-thresholding therefore operates on the gradient and not directly on the source values like in the naive nGMCA. Also, unlike with hard-thresholding, a variation of affects all active coefficients. Our strategy consists in starting with a large parameter which forces the coefficients of to be non-increasing in the first iteration. The threshold is then linearly decreased in order to refine the solution while preserving continuity, down to where is this time an estimate of the noise level in the row of the gradient.

We also implemented a version of nGMCA using hard-thresholding aiming at solving Problem (15) with an pseudo-norm instead of the norm for the regularization. The superscript and are specified to differentiate between respectively the hard- and the soft-thresholding versions.

2:initialize , and
3:for  do
4:     Normalize the columns of
7:     Select
8:end for
Algorithm 2 : nGMCA

Iv Numerical experiments

In this section we first compare the introduced algorithms and classical algorithms on noiseless data, in order to better understand their behaviors, and then we benchmark the GMCA-based algorithms with state-of-the-art sparse algorithms on noisy data. The settings of the simulations and the evaluation methodology are described in the following sections.

Iv-a Settings

Reference matrices and coefficients are uniformly generated respectively from the distribution of and , where:

  • is a Bernoulli random variable with activation parameter , i.e. it equals 1 with probability and 0 otherwise.

  • is a centered and reduced Generalized Gaussian random variable with shape parameter .

In practice and control 2 kinds of sparsity. The Bernoulli parameter affects the number of actual zeros in and and therefore exact sparsity. On the other hand, selects the sharpness of the distribution of , which pdf is proportional to (with and dependent on the standard deviation which is fixed to 1 here). As special cases, for , is a Gaussian random variable, and with it is a Laplacian random variable. With , is considered as approximately sparse —the generated signals become sparser when decreases.

In the experiments and unless stated otherwise, the standard settings will be , , with measurements of samples.

Iv-B Evaluation of the Results

In order to evaluate and compare the algorithms, a scale and permutation invariant criterion is needed. This criterion has to be well adapted to measure the reconstruction performance. In many applications, the signal of interests are the sources. More precisely, it has to be noticed that noise-reducing priors are applied on the sources only. This implies that a good estimate of the sources should be the least contaminated by noise and interferences from the other sources. Moreover, in a noisy setting, a perfectly estimated mixing matrix do not necessarily yield a good estimate of the sources: indeed a slightly degraded mixing matrix may be preferred if it leads to less noisy sources. These points make criteria based on this variable not adequate to measure a good separation. In the next, we will focus on estimating the separation performance using a criterion on the sources .

In [38], Vincent et al. have proposed different criteria to evaluate the performance of blind source separation techniques. In the noisy setting, they propose separating each estimated source into the sum of several components:

with the projection of on the reference source it estimates; and , in its orthogonal space, respectively standing for interferences with other sources, contamination with noise and contamination with algorithm artifacts. They design an SNR-type energy ratio criterion named Source Distortion Ratio (SDR):


As stated in [38], this criterion is a global performance measure taking into account all the elements of the reconstruction, i.e. a correct separation (low ), efficient denoising (low ) and little artifacts left by the algorithm (low ). Also, this criterion has the advantage of being scale-invariant. In the next, the SDR will be used to evaluate the separation performance of the proposed technique with respect to state-of-the-art methods.

Because of the permutation invariance, one cannot know a priori which estimated source stands for which reference source. Reference and estimated sources are therefore paired one-to-one in order to obtain the best mean SDR. The SDR on (coined SDR) has be used in the experiments below in order to assess the performances of the algorithms. In the following, the SDR has been evaluated from several Monte-Carlo simulations; the number of simulations will be given in each figure’s caption.

Iv-C Behavioral Study: Noiseless Data

In this first experimental section, nGMCA, nGMCA and nGMCA are tested on data with a large activation rate for () and therefore not the very sparse sources for which they are designed. These settings are not favorable for the nGMCA algorithms, which allows to emphasize the differences of behavior between them. Since the problem reduces to an exact factorization when there is no noise, is set to 0, hence the final threshold in this case is identical for all the algorithms. ALS [7], the Multiplicative Update [10] and (non-sparse) accelerated HALS [31] are also performed as standard algorithms to play the role of references. The maximum number of iterations were set to large enough numbers in order to assure the convergence of all the algorithms. Precisely, the number of iterations is set to 5000 for HALS, 40,000 for the Multiplicative Update, 500 for ALS and the GMCA-based algorithms, with a maximum of 80 sub-iterations in nGMCA and nGMCA).

Iv-C1 Summary of the experiments

  • Figure 1: this benchmark shows the influence of the number of sources on the estimation of . This parameter is of course important since the more numerous they are, the more difficult they are to separate. nGMCA outperforms ALS, while with , the final iterations are identical for both algorithms. The thresholding strategy of the first stage of the iterations therefore proves its efficiency for the separation. Though nGMCA is not performing as well as nGMCA and nGMCA with few sources, it is much more robust than all the algorithms with large numbers of sources. This is further detailed in paragraph IV-C3.

    Fig. 1: Reconstruction SDR (SDR) with respect to the number of sources (, noiseless, average of 48 simulations)
  • Figure 2: this figure exhibits the evolution of the cost function throughout the iterations, for 40 sources and a large activation rate (). In the refinement phase, the sparsity parameter is left constant at its final value in order to observe the convergence of the algorithms and the possibility to enhance the reconstruction. nGMCA converges to a lower value than nGMCA while nGMCA does not converge at all. An explanation is provided in paragraph IV-C2.

    Fig. 2: Evolution of the cost function during the iterations for a representative example (, , noiseless).
  • Figure 3: this benchmark shows the influence of coefficients distribution on the reconstruction. Modifying the parameter is a way to make its distribution more or less sparse. A sparser yields less correlated columns and hence a better conditioning of the problem (table 4) which simplifies the separation. nGMCA tends to be more robust than the other algorithms to ill-conditioned mixtures.

    Fig. 3: Reconstruction SDR (SDR) with respect to the distribution parameter (, , noiseless, average of 48 simulations)
    0.50 0.70 0.90 1.20 1.60 2.00 3.00 4.00
    cond() 6.90 8.36 9.43 10.7 12.0 12.9 14.3 15.3
    Fig. 4: Conditioning of with respect to
    (, average of 48 simulations)

Iv-C2 Properly accounting for the constraints

The differences of performance between nGMCA and nGMCA for large numbers of sources and large activation rates in figure 1 can be understood by observing the evolution of the cost function during the iterations (figure 2). Properly applied non-negativity and sparsity constraints can help refining the reconstruction of and once the sources have been sufficiently disambiguated. Indeed, since nGMCA does not exactly solve the constrained cost function, it does not necessarily neither converge to a minimum nor lead to a stable solution, while nGMCA does.

Iv-C3 Vs

The explanation of the differences between nGMCA and nGMCA lies in the properties of hard- and soft-thresholding. This is summarized in figure 5 which shows thresholding applied to three two-dimensional points . When a point —a column of — has two large coefficients, i.e. when the point is in the quadrant, it suffers a bias with soft-thresholding while it remains untouched with hard-thresholding. On the other hand, the shift induced by soft-thresholding increases the ratio of its larger coefficient over its smaller one, which helps reinforce the affectation of a point to the direction of its larger coefficient (in this case: ). With the thresholded point, this means here that . While the lesser bias created by hard-thresholding can lead to better accuracy, reinforcing the affectation of each point to a direction with soft-thresholding leads to better separation of the sources.

Fig. 5: Difference between soft- and hard-thresholding (thresholding at )

With few sources, both nGMCA and nGMCA separate correctly as displays figure 1. The bias is therefore costly for nGMCA since it leads to some compensation behaviors: with 80% activation rate, nearly every coefficient suffers the bias and one source tends to compensate for all the others’ with a positive offset. This can be seen on source 5 in figure 7. The offset correlates with all the ground truth sources as can be observed from the correlation matrix in figure 6: estimated source gathers all the thresholded coefficients of the other estimated sources and is therefore affected by the interferences.

On the other hand, the effect of soft-thresholding on the coefficients amplitude helps giving more weight to larger coefficients which is essential for the separation of sources from ill-conditioned mixtures as shown in figure 1 with a large number of sources ; and in figure 3 with a large for instance.

Fig. 6: Correlation matrix between the estimated and reference sources, in a representative example
( after normalization, nGMCA, , , noiseless)
Fig. 7: Reference and estimated sources number 5 and 6
(nGMCA, , , noiseless, samples 1 to 80)

Iv-D Noisy Data

In this section, noise is added to the data and the input of the algorithms is therefore , with a Gaussian matrix with independent and uniformly distributed coefficients. In the experiments, the amount of noise is given in term of SNR on the data , SNR. nGMCA, nGMCA and nGMCA are compared with Hoyer’s [30], Kim & Park’s algorithms [2] and sparse accelerated HALS [32], which are competitive, publicly available algorithms and take sparsity into account in different ways (paragraph II-B2). There is no straightforward way to set Kim & Park’s algorithm parameters and we therefore used the default parameters of the implementation. It is then left running until convergence. For Hoyer’s algorithm, no prior is applied on and the sparsity ratio of is optimally tuned using the ground truth sources, since there is no automatic way to set the parameters. The sparsity level for the sparse accelerated HALS is also provided from the ground truth sources and both algorithms are left running for 5000 iterations in order to assure convergence. In all this section, in the nGMCA algorithms, as an effective trade-off between noise removal and good separation of the sources.

The comparisons also include an oracle which solves the non-negatively constrained inversion problem in using the ground-truth sources :


The sparsity parameter is set to such as in nGMCA. This oracle stands for the optimal which could be recovered by nGMCA if the uncontaminated mixtures were known. Of course, since the mixture are not known in practice in BSS, the oracle yields unachievable results, but it provides a reference line for the comparisons and a limit for the progression margin of the reconstructions.

Iv-D1 Summary of the experiments

  • Figures 8, 9 and 10: these benchmarks show the reconstruction results for 15 sources with activation rate of 10% (figures 8 and 9) and 30% (figure 10) —the lower the activation rate, the better the sparse prior— with a varying level of noise contamination in the data. Figure 9 and 10 display the loss in SDR compared to the oracle in order to facilitate the visualization. In both cases, nGMCA is less sensitive to noise and outperforms the other algorithms.

    Fig. 8: Reconstruction SDR (SDR) with respect to the data SNR (SNR)
    (, , average of 192 simulations)
    Fig. 9: Reconstruction SDR (SDR) minus the oracle SNR, with respect to the data SNR (SNR) (, , average of 192 simulations)
    Fig. 10: Reconstruction SDR (SDR) minus the oracle SNR, with respect to the data SNR (SNR) (, , average of 192 simulations)
  • Figures 11: this benchmark shows the same experience than the previous ones but with a larger activation rate (50%) which is less favorable to the GMCA-based algorithms. nGMCA remains better in most settings. Hoyer’s algorithm performs similarly to nGMCA for very noisy data but it is important to remember that in our experiment, Hoyer’s algorithm and sparse accelerated HALS are provided with the ground truth sparsity ratios, which would not be available with such precision in practice. For cleaner data, nGMCA and nGMCA begin to overtake nGMCA, which corroborates the results of the previous section for noiseless data with large activation rates and few sources (figure 1).

    Fig. 11: Reconstruction SDR (SDR) minus the oracle SNR, with respect to the data SNR (SNR) (, , average of 192 simulations)
  • Figure 12: This benchmarks provides the reconstruction results for noisy data (15dB), 15 sources, a low activation rate (30%) and a varying number of measurements . The lower the number of measurements, the more difficult the reconstruction is, since the redundancy can help denoising and discriminating between the sources. While we have exhibited results for a large number of measurements so far, this shows that nGMCA also compares favorably with other algorithms when the number of measurements is more restrained.

    Fig. 12: Reconstruction SDR (SDR) with respect to the number of measurements
    (SNR=15dB, , , average of 72 simulations)
  • Figure 13: This benchmarks provides the reconstruction results for sparse (30% activation rate) and noisy (15dB) data, and a varying number of sources . The complexity of the separation rises with the number of sources hence the reconstruction results decrease with it for all algorithms, but in any case, nGMCA performs best for all the values.

    Fig. 13: Reconstruction SDR (SDR) with respect to the number of sources (SNR=15dB, , average of 48 simulations)

Iv-D2 About the initialization and the separation

Figure 14 provides the same results as figure 10 but compares nGMCA and nGMCA with version of them which are initialized with and and hence, with a perfect separation from the start. The initialized nGMCA is also provided with the exact noise standard deviation. The difference in term of reconstruction quality between the regular algorithms and their optimally initialized version is extremely small. This shows that the automatic estimation of the noise level within nGMCA is appropriate, and that the initialization of nGMCA and nGMCA is robust.

Fig. 14: Reconstruction SDR (SDR) with respect to the data SNR (SNR)
(, , average of 192 simulations)

Iv-E Conclusion of the Experiments - the Compromise

Remember that the estimated sources (see [38]) can be decomposed as follows:


Any BSS algorithm must minimize at the same time interferences, noise and artifacts in order to achieve good performance. These three terms are strongly affected by the sparsity prior:

  • Interferences: They intervene when the sources are not correctly, or not completely, separated. The term is computed as the projection on all the sources but the target. Sparsity, as a measure of diversity, can greatly help getting a correct separation of the sources, hence keeping this term relatively small. However, it can still create interferences when the sparse model for the sources departs from their actual structure, such as in figure 7. Interferences then originate from an imperfect source prior and/or badly separated sources.

  • Noise: is the part of the reconstruction that projects on the noise but not the sources. Since the Gaussian noise studied in this article spreads uniformly on all the coefficients, while sparse sources concentrate their energy on few coefficients, the thresholding effect implied by and regularizations significantly denoises the estimates. This reduces the importance of the noise term and therefore helps obtaining better reconstructions.

  • Artifacts: for a given source, the artifacts gathers the residues which are neither explained by the other sources nor the noise. We observed that the soft-thresholding operator introduces a bias which is the main contributor to the artifacts. Again, this term will increase when the sparsity level of the sources decreases: in such a case, the sparsity prior is not as well suited to constrain the morphology of the sources. It is also important to notice that even when the sources are very sparse, improperly constrained solutions are more akin to be contaminated with a higher level of artifacts.

All these aspects interact with each other. As shown in this section, nGMCA provides an effective trade-off between noise, interferences and bias. Indeed, through the experiments, we show that nGMCA outperforms other algorithms in most scenarios, according to the SDR criterion which takes into account these three origins of reconstruction deterioration.
For low activation rate (high sparsity), nGMCA performs definitely better than the other algorithms for a large range of noise levels (figures 8 and 10) while in the extreme noiseless case it performs quite reasonably. In this setting, the sparsity-enforcing prior plays its role at: i) getting a good separation process with respect to other priors (such as the pseudo-norm); this helps reducing the interferences, ii) correctly denoising the sources; this tends to lower the noise contribution and artifacts.

nGMCA is noticeably quite robust to departures from the sparsity assumptions: it performs reasonably well with large activation rates (figures 1 and 11) but at the cost of a slight bias of the estimated sources (figures 6 and 7) which tends to increase the contribution of the artifacts.

Additionally, the nGMCA algorithm provides good separation performance for a large range of numbers of sources (figure 13) as well as for ill-conditioned problems arising from a lack of observations in figure 12, or from correlated mixing directions in figure 3. These results can be explained by the good separation power of the regularizer with an appropriate tuning of the regularization parameter, in order to disentangle sparse sources, together with the appropriate implementation of the non-negativity constraints.

V Application

Fig. 15: NMR spectra of 4 chemical compounds.
Fig. 16: Example of mixture (main component: lactose, SNRdB)
Fig. 17: Reconstruction SDR (SDR) with respect to the data SNR (SNR)
(, , average of 96 synthetic NMR data simulations)

In physical applications, molecules can be identified by their specific Nuclear Magnetic Resonance (NMR) spectra. In this section, we simulate more realistic data, using NMR spectra of real molecules. These spectra are well adapted to the current settings since they are very sparse. The information about the peaks can be found in the Spectral Database for Organic Compounds, SDBS444 The spectra were convoluted with a Laplacian with width at half maximum of 3 samples, in order to account for the acquisition imperfections. The number of samples is set to . is made of real spectra such as the ones displayed in figure 15. Some sources can exhibit strong normalized scalar product, such as cholesterol and menthone spectra for instance (0.67). The mixing coefficients of are simulated in the same way as in the previous section (, ). The observed data is where is an i.i.d. Gaussian noise matrix. An example of measurement where the lactose spectrum is particularly strong is provided in figure 16.

In figure 17, the number of measurements is limited to the number of sources, i.e. , which occurs in some applications; and the curves show the influence of noise in the data. With so few measurements, denoising becomes more important, while at the same time the noise is underestimated by the algorithm since the problem is less constrained. To compensate this behavior, is this time set to 2.

nGMCA fails to obtain suitable results. Indeed, in this setting the conditioning of the problem is extremely poor: and nGMCA is not able to converge. On the other hand, nGMCA performs from 3 to 5dB better than all the other algorithms. This shows once again that nGMCA is particularly robust for a large variety of settings. An example of reconstruction is given in figure 18, where nGMCA is able to identify more peaks that sparse accelerated HALS. Its reconstruction is however not completely noiseless, since there is always a trade-off to find between denoising, separation and bias.

Fig. 18: Example of reconstruction (lactose, SNRdB)

In figure 19, the number of measurements varies from 15 to 90. Since the conditioning greatly improves for larger numbers of measurements, nGMCA results increase very quickly. But in any case, although nGMCA and sparse accelerated HALS obtain similar results to nGMCA when there are enough measurements, nGMCA still performs better than all the other tested algorithms in most of the settings.

Fig. 19: Reconstruction SDR (SDR) with respect to the number of measurements (SNRdB, , average of 36 synthetic NMR data simulations)

Vi Software

Following the philosophy of reproducible research [39], the algorithms introduced in this article will be available at

Vii Conclusion

In this paper we have introduced a new algorithm, nGMCA, to tackle the problem of sparse non-negative BSS from noisy mixtures. Inspired by a recent sparse BSS algorithm coined GMCA, several extensions have been explored which imply that a rigorous handling of both sparse and non-negative constraints are essential to avoid instabilities and sub-optimal solutions. In particular, one extension estimates both a mixing and a source matrix by exactly solving the non-negatively constrained and penalized sub-problems, using proximal techniques. Extensive comparisons have been carried out with state-of-the-art algorithms on synthetic data; these experiments show that this nGMCA extension is robust to noise contamination thanks to a dedicated thresholding strategy, with negligible parameter tuning. The experiments also show that it performs well for a wide variety of settings, including problems with highly correlated mixture directions, few observations or a large number of sources. Finally, the nGMCA algorithm yields highly competitive results on synthetic mixtures of real NMR spectra.

In this article however, the sparsity of the sources only held in the direct or sample domain. Future work will focus on extending nGMCA to deal with the more general setting where the sources are still non-negative in the direct domain, but are sparse in a different signal representation.

Appendix A Resolution of the sub-problems

Algorithm 3 solves the sub-problem in (17) with using FISTA [37].

1:procedure UpdateS()
3:     initialize , , ,
4:     while not converged do
9:     end while
10:     return
11:end procedure
Algorithm 3 FISTA for the sub-problem in


J.B. was supported by the French National Agency for Research (ANR) 11-ASTR-034-02-MultID. We thank the associate editor and the anonymous reviewers for their help in improving the clarity and quality of the paper. We also thank Daniel Machado for his help in writing the paper.


  • [1] M. W. Berry and M. Browne, “Email Surveillance Using Non-negative Matrix Factorization,” Computational & Mathematical Organization Theory, vol. 11, no. 3, pp. 249–264, 2005.
  • [2] J. Kim and H. Park, “Sparse Nonnegative Matrix Factorization for Clustering,” Georgia Institute of Technology, Tech. Rep. GT-CSE-08-01, 2008.
  • [3] C. Févotte, N. Bertin, and J.-L. Durrieu, “Nonnegative matrix factorization with the Itakura-Saito divergence: With application to music analysis,” Neural Computation, vol. 21, no. 3, pp. 793–830, 2009.
  • [4] R. Dubroca, C. Junot and A. Souloumiac, “Weighted NMF for high-resolution mass spectrometry analysis,” Proceedings of EUSIPCO, pp. 1806–1810, 2012.
  • [5] D. D. Lee and H. S. Seung, “Learning the parts of objects by non-negative matrix factorization, ” Nature, vol. 401, no. 6755, pp. 788–791, 1999.
  • [6] A. Cichocki, R. Zdunek, and S.-i. Amari, “Csiszárs divergences for non-negative matrix factorization: Family of new algorithms,” Independent Component Analysis and Blind Signal Separation, vol. 3889, no. 1, pp. 32–39, 2006.
  • [7] P. Paatero and U. Tapper, “Positive matrix factorization: A non-negative factor model with optimal utilization of error estimates of data values,” Environmetrics, vol. 5, no. 2, pp. 111–126, 1994.
  • [8] S. A. Vavasis, “On the Complexity of Nonnegative Matrix Factorization,” SIAM Journal on Optimization, vol. 20, no. 3, pp. 1364–1377, 2009.
  • [9] J. Rapin, J. Bobin, A. Larue and J.-L. Starck, “Robust Non-Negative Matrix Factorization for Multispectral Data with Sparse Prior,” Proceedings of ADA7, 2012.
  • [10] D. D. Lee and H. S. Seung, “Algorithms for non-negative matrix factorization,” Advances in neural information processing systems, vol. 13, no. 1, pp. 556–562, 2001.
  • [11] E. F. Gonzalez and Y. Zhang, “Accelerating the Lee-Seung Algorithm for Nonnegative Matrix Factorization,” Dept. Comput. Appl. Math., Rice University, Tech. Rep. TR0502, pp. 1–13, 2005.
  • [12] M. Berry, M. Browne, A. Langville, V. Pauca, and R. Plemmons, “Algorithms and applications for approximate nonnegative matrix factorization,” Computational Statistics & Data Analysis, vol. 52, no. 1, pp. 155–173, 2007.
  • [13] Y. Xu, “Alternating proximal gradient method for nonnegative matrix factorization,” arXiv:1112.5407v1, December 2011.
  • [14] M. Merritt and Y. Zhang, “Interior-Point Gradient Method for Large-Scale Totally Nonnegative Least Squares Problems,” Journal of Optimization Theory and Applications, vol. 126, no. 1, pp. 191–202, 2005.
  • [15] R. Zdunek and A. Cichocki, “Nonnegative matrix factorization with constrained second-order optimization,” Signal Processing, vol. 87, no. 8, pp. 1904–1916, 2007.
  • [16] A. Cichocki, R. Zdunek, and S.-i. Amari, “Hierarchical ALS Algorithms for Nonnegative Matrix and 3D Tensor Factorization,” in Proceedings of ICA, 2007, pp. 169–176.
  • [17] A. Cichocki, R. Zdunek, A. H. Phan, and S.-i. Amari, Nonnegative Matrix and Tensor Factorizations: Applications to Exploratory Multi-way Data Analysis and Blind Source Separation. John Wiley & Sons, Ltd, 2009.
  • [18] C.-J. Lin, “Projected gradient methods for nonnegative matrix factorization,” Neural Computation, vol. 19, no. 10, pp. 2756–2779, 2007.
  • [19] N. Guan, D. Tao, Z. Luo, and B. Yuan, “NeNMF: An Optimal Gradient Method for Nonnegative Matrix Factorization,” IEEE Transactions on Signal Processing, vol. 60, no. 6, pp. 2882–2898, 2012.
  • [20] M. Babaie-Zadeh, A. Mansour, C. Jutten, and F. Marvasti, “A Geometric Approach for Separating Several Speech Signals.” in Proceedings of ICA, 2004, pp. 798–806.
  • [21] W. S. Ouedraogo, A. Souloumiac, M. Jaïdane, and C. Jutten, “Geometrical method using simplicial cones for overdetermined nonnegative blind source separation: application to real PET images,” in Proceedings of LVA/ICA, 2012, pp. 494–501.
  • [22] N. Gillis and S. A. Vavasis, “Fast and Robust Recursive Algorithms for Separable Nonnegative Matrix Factorization,” arXiv:1208.1237, 2012.
  • [23] M. D. Plumbley and E. Oja, “A Nonnegative PCA Algorithm for Independent Component Analysis,” IEEE Transactions on Neural Networks, vol. 15, no. 1, pp. 66–76, 2004.
  • [24] J.-L. Starck, F. Murtagh, and M.-J. Fadili, Sparse Image and Signal Processing - Wavelets, Curvelets, Morphological Diversity. Cambridge University Press, 2010.
  • [25] M. Zibulevsky and B. A. Pearlmutter, “Blind Source Separation by Sparse Decomposition in a Signal Dictionary,” Neural Computation, vol. 13, no. 4, pp. 863–882, 2001.
  • [26] Y. Li, A. Cichocki, S.-i. Amari, S. Shishkin, J. Cao, and F. Gu, “Sparse Representation and Its Applications in Blind Source Separation,” NIPS, 2003.
  • [27] J. Bobin, J.-L. Starck, J. Fadili, and Y. Moudden, “Sparsity and morphological diversity in blind source separation,” IEEE Transactions on Image Processing, vol. 16, no. 11, pp. 2662–2674, 2007.
  • [28] J. Eggert and E. Körner, “Sparse coding and NMF,” International Joint Conference on Neural Networks, vol. 2, no. 4, pp. 2529–2533, 2004.
  • [29] P. O. Hoyer, “Non-negative sparse coding,” Proceedings of NNSP, pp. 557–565, 2002.
  • [30] P. O. Hoyer, “Non-negative matrix factorization with sparseness constraints,” Journal of Machine Learning Research, vol. 5, no. 5, pp. 1457–1469, 2004.
  • [31] N. Gillis and F. Glineur, “Accelerated Multiplicative Updates and Hierarchical ALS Algorithms for Nonnegative Matrix Factorization,” Neural Computation, vol. 24, no. 4, pp. 1085–1105, 2012.
  • [32] N. Gillis, “Sparse and Unique Nonnegative Matrix Factorization Through Data Preprocessing,” Journal of Machine Learning Research, vol. 13, 2012.
  • [33] J. Bobin, J.-L. Starck, Y. Moudden, and J. Fadili, “Blind Source Separation : the Sparsity Revolution,” Advances in Imaging and Electron Physics, vol. 152, no. January, pp. 1–82, 2008.
  • [34] A. Cichocki and R. Zdunek, “Regularized Alternating Least Squares Algorithms for Non-negative Matrix/Tensor Factorization,” in Proceedings of the ISNN, 2007, pp. 793–802.
  • [35] P. L. Combettes and V. R. Wajs, “Signal recovery by proximal forward-backward splitting,” Multiscale Modeling Simulation, vol. 4, no. 4, pp. 1168–1200, 2005.
  • [36] Y. Nesterov, “Gradient methods for minimizing composite objective function,” Mathematical Programming, vol. 140, no. 1, pp. 125–161, 2013.
  • [37] A. Beck and M. Teboulle, “A Fast Iterative Shrinkage-Thresholding Algorithm for Linear Inverse Problems,” SIAM Journal on Imaging Sciences, vol. 2, no. 1, p. 183, 2009.
  • [38] E. Vincent, R. Gribonval, and C. Févotte, “Performance measurement in blind audio source separation,” IEEE Transactions on Audio, Speech & Language Processing, vol. 14, no. 4, pp. 1462–1469, 2006.
  • [39] J. Buckheit and D. Donoho, “WaveLab and Reproducible Research.” Springer-Verlag, 1995, pp. 55–81.
Comments 0
Request Comment
You are adding the first comment!
How to quickly get a good reply:
  • Give credit where it’s due by listing out the positive aspects of a paper before getting into which changes should be made.
  • Be specific in your critique, and provide supporting evidence with appropriate references to substantiate general statements.
  • Your comment should inspire ideas to flow and help the author improves the paper.

The better we are at sharing our knowledge with each other, the faster we move forward.
The feedback must be of minimum 40 characters and the title a minimum of 5 characters
Add comment
Loading ...
This is a comment super asjknd jkasnjk adsnkj
The feedback must be of minumum 40 characters
The feedback must be of minumum 40 characters

You are asking your first question!
How to quickly get a good answer:
  • Keep your question short and to the point
  • Check for grammar or spelling errors.
  • Phrase it like a question
Test description