Approximate Sparse Decomposition Based on Smoothed $\ell^0$ Norm
Abstract
In this paper, we propose a method to address the problem of source estimation for Sparse Component Analysis (SCA) in the presence of additive noise. Our method is a generalization of a recently proposed method (SL0), which has the advantage of directly minimizing the $\ell^0$ norm instead of the $\ell^1$ norm, while being very fast. SL0 is based on minimization of the smoothed $\ell^0$ norm subject to $A\mathbf{s} = \mathbf{x}$. In order to better estimate the source vector for noisy mixtures, we suggest to remove the constraint $A\mathbf{s} = \mathbf{x}$ by relaxing the exact equality to an approximation (we call our method Smoothed $\ell^0$ norm Denoising, or SL0DN). The final result can then be obtained by minimization of a proper linear combination of the smoothed $\ell^0$ norm and a cost function for the approximation. Experimental results demonstrate the significant enhancement achieved by the modified method in noisy cases.

Index Terms— atomic decomposition, sparse decomposition, sparse representation, overcomplete signal representation, sparse source separation
1 Introduction
Blind source separation (BSS) consists of detecting the underlying source signals within some observed mixtures of them, without any prior information about the sources or the mixing system. Let $\mathbf{x} \in \mathbb{R}^m$ be the vector of observed mixtures and $\mathbf{s} \in \mathbb{R}^n$ denote the vector of unknown source signals. The mixing equation for the linear instantaneous noisy model is:
$\mathbf{x} = A\mathbf{s} + \mathbf{n}$ (1)
where $A$ is the unknown $m \times n$ mixing matrix and $\mathbf{n}$ denotes the additive noise vector. The aim of BSS is then to estimate $\mathbf{s}$ from the observed data $\mathbf{x}$, without any knowledge of the mixing matrix $A$ or the source signals.
In the determined case, when $m = n$, the problem can be successfully solved using Independent Component Analysis (ICA) [1]. However, in the underdetermined (or overcomplete) case, where fewer observations than sources are provided ($m < n$), there are infinitely many solutions to the problem even if $A$ is known, since the number of unknowns exceeds the number of equations. This ill-posedness can be resolved by the assumption of 'sparsity', resulting in a not totally blind source separation problem. A signal is considered to be sparse when only a few of its samples take significant values. Thus, among all possible solutions of (1), we seek the sparsest one, i.e. the one with the minimum number of nonzero components, that is, minimum $\ell^0$ norm.
SCA can also be viewed as the problem of representing a signal as a linear combination of vectors called atoms [2]. The atoms collectively form a dictionary, the $m \times n$ matrix $A$, over which the signal is to be decomposed. There is special interest in the cases where $n > m$ (refer, for example, to [3] and the references therein). Again, we have the problem of finding the sparsest solution of the underdetermined linear system $A\mathbf{s} = \mathbf{x}$, where $A$ is the dictionary of atoms. This problem is also called 'atomic decomposition' and has many potential applications in diverse fields of science [3].
The general Sparse Component Analysis (SCA) problem consists of two steps: first estimating the mixing matrix, and then finding the sparsest source vector, assuming the mixing matrix to be known. The first step can be accomplished by means of clustering methods [4]. In this paper, we focus our attention on the second step; that is for a given mixing matrix, we wish to find the solution to the following minimization problem:
$\min_{\mathbf{s}} \|\mathbf{s}\|_0 \quad \text{subject to} \quad A\mathbf{s} = \mathbf{x}$ (2)
where $\|\mathbf{s}\|_0$ denotes the number of nonzero elements of $\mathbf{s}$ (and is usually called the $\ell^0$ norm of $\mathbf{s}$).
So far, several algorithms such as Basis Pursuit (BP) [5, 6] and Matching Pursuit (MP) [2, 4] have been proposed to approximate the solution of (2). The former is based on the observation that, for most large underdetermined systems of linear equations, the minimal $\ell^1$ norm solution is also the sparsest solution [6]. The minimization of the $\ell^1$ norm can be efficiently carried out using Linear Programming (LP) techniques [7]. Despite all recent developments, computational cost has remained a main concern.
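As a concrete illustration of how the $\ell^1$ relaxation used by BP reduces to an LP, here is a minimal Python sketch using the standard split $\mathbf{s} = \mathbf{u} - \mathbf{v}$ with $\mathbf{u}, \mathbf{v} \ge 0$ (the helper name `l1_min` is ours, and SciPy's `linprog` is assumed available; this is not the paper's code):

```python
import numpy as np
from scipy.optimize import linprog

def l1_min(A, x):
    """Solve min ||s||_1 subject to A s = x as a linear program.

    Split s = u - v with u, v >= 0; then ||s||_1 = sum(u + v) at the optimum.
    """
    m, n = A.shape
    c = np.ones(2 * n)                       # objective: sum of u and v entries
    A_eq = np.hstack([A, -A])                # equality constraint A(u - v) = x
    res = linprog(c, A_eq=A_eq, b_eq=x, bounds=[(0, None)] * (2 * n))
    u, v = res.x[:n], res.x[n:]
    return u - v
```

Any feasible vector upper-bounds the optimum, so the returned solution always has $\ell^1$ norm no larger than that of the true sparse source.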
Recently, in [8], the idea of using a smoothed $\ell^0$ norm (SL0) was introduced. More precisely, this algorithm minimizes a smooth approximation of the $\ell^0$ norm, denoted by $n - F_\sigma(\mathbf{s})$ (with $F_\sigma$ as defined in Section 2), and the approximation tends to equality when $\sigma \to 0$. The algorithm then sequentially solves the problem:
$\min_{\mathbf{s}} \; n - F_\sigma(\mathbf{s}) \quad \text{subject to} \quad A\mathbf{s} = \mathbf{x}$ (3)
for a decreasing sequence of $\sigma$.
This approximation accommodates both continuous optimization techniques for estimating the sparsest solution of (2) and a noise-tolerant algorithm. The idea turned out to be both efficient and accurate, i.e. providing a better accuracy than $\ell^1$ norm minimization algorithms while being about two orders of magnitude faster than LP [8].
However, the proposed algorithm was not designed for the noisy case (1), where a noise vector $\mathbf{n}$ has been added to the observed mixture $\mathbf{x}$. In this paper, we generalize the method to this noisy case by removing the constraint $A\mathbf{s} = \mathbf{x}$ and relaxing the exact equality to an approximation. From the sparse decomposition viewpoint, this means an approximate sparse decomposition of a signal over an overcomplete dictionary. The final algorithm is then an iterative minimization of a proper linear combination of the smoothed $\ell^0$ norm and $\|\mathbf{x} - A\mathbf{s}\|_2^2$.
This paper is organized as follows. Section 2 discusses the main idea of the proposed method. Section 3 gives a formal statement of the final algorithm. Finally, experimental results are presented in Section 4.
2 Main Idea
As stated in the previous section, when the dimensions increase, finding the minimum $\ell^0$ norm solution of (2) is impractical for two reasons: firstly, because the $\ell^0$ norm of a vector is a discontinuous function of its elements and leads to an intractable combinatorial optimization, and secondly, because the solution is highly sensitive to noise. The idea of [8] is then to replace the $\ell^0$ norm by a continuous function, which approximates the Kronecker delta function, and to use optimization techniques to minimize it subject to the constraint $A\mathbf{s} = \mathbf{x}$. For example, consider the Gaussian-like function:
$F_\sigma(\mathbf{s}) = \sum_{i=1}^{n} \exp\!\left(-\frac{s_i^2}{2\sigma^2}\right)$ (4)
where $s_i$ denotes the $i$-th element of the vector $\mathbf{s}$. For sufficiently small values of $\sigma$, $F_\sigma(\mathbf{s})$ tends to count the number of zero elements of the vector $\mathbf{s}$. Thus we have:
$\|\mathbf{s}\|_0 \approx n - F_\sigma(\mathbf{s})$ (5)
where $n$ is the dimension of the vector $\mathbf{s}$. The sparsest solution of (2) can then be approximated by the solution of the following minimization problem:
$\min_{\mathbf{s}} \; n - F_\sigma(\mathbf{s}) \quad \text{subject to} \quad A\mathbf{s} = \mathbf{x}$ (6)
The above minimization task can be accomplished using common gradient-type (e.g. steepest descent) algorithms. Note that the value of $\sigma$ determines how smooth the function $F_\sigma$ is: the smaller the value of $\sigma$, the better the estimation of $\|\mathbf{s}\|_0$, but the larger the probability of being trapped in local minima of the cost function. The idea of [8] for escaping from local minima is to use a decreasing sequence of values for $\sigma$. More precisely, for each value of $\sigma$, the minimization algorithm is initialized with the minimizer obtained for the previous (larger) value of $\sigma$.
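The approximation in (5) is easy to check numerically; the following small Python sketch (ours, not the paper's) evaluates $n - F_\sigma(\mathbf{s})$ on a vector with two nonzero entries for a shrinking $\sigma$:

```python
import numpy as np

def F_sigma(s, sigma):
    """Smoothed-L0 building block: sum of Gaussian bumps exp(-s_i^2 / 2 sigma^2)."""
    return np.sum(np.exp(-s**2 / (2 * sigma**2)))

s = np.array([0.0, 0.0, 1.3, 0.0, -0.7, 0.0])   # ||s||_0 = 2
n = len(s)
for sigma in (1.0, 0.1, 0.01):
    # n - F_sigma(s) approaches ||s||_0 = 2 as sigma shrinks
    print(sigma, n - F_sigma(s, sigma))
```

For $\sigma = 0.01$ the bumps at the nonzero entries are numerically zero, so $n - F_\sigma(\mathbf{s})$ equals 2 up to machine precision, while for $\sigma = 1$ the approximation is still loose.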
Now consider the more realistic case where a noise vector $\mathbf{n}$ has been added to the observed mixture, as in (1). Here, we have an uncertainty about the exact value of the observed vector, and it seems reasonable to remove the constraint $A\mathbf{s} = \mathbf{x}$ and reduce it to $A\mathbf{s} \approx \mathbf{x}$. This idea is based on the observation that, in the presence of considerable noise, the exact constraint may lead to a totally different sparse decomposition. Thus, we wish to minimize two terms: a measure of the distance between $\mathbf{x}$ and $A\mathbf{s}$, as the cost of approximation, and the smoothed $\ell^0$ norm, $n - F_\sigma(\mathbf{s})$, as the measure of sparsity.
For the sake of simplicity, we choose $\|\mathbf{x} - A\mathbf{s}\|_2^2$ as the cost of approximation. The idea then naturally leads us to the following minimization problem:
$\min_{\mathbf{s}} \; \lambda \|\mathbf{x} - A\mathbf{s}\|_2^2 + \big(n - F_\sigma(\mathbf{s})\big)$ (7)
where $\lambda > 0$ represents a compromise between the two terms of our cost function: sparsity and the equality condition. Intuitively, we may expect that for less noisy mixtures, the value of $\lambda$ should be greater than for observations with a high noise level. Further discussion on the choice of $\lambda$ is left to Section 4.
Another advantage of removing the constraint $A\mathbf{s} = \mathbf{x}$ appears when the dictionary matrix $A$ is not full rank. In this case, satisfying the exact equality constraint is impossible for observed vectors that are not in the column space of $A$, and as a result the previous algorithm fails to find any answer.
3 Final Algorithm
The final algorithm is shown in Fig. 1. We call our algorithm SL0 DeNoising (SL0DN). As seen in the algorithm, the final values of the previous estimation are used for the initialization of the next steepest descent step. The decreasing sequence of $\sigma$ is used to escape from getting trapped in local minima.
Direct calculations show that:
$\nabla J(\mathbf{s}) = 2\lambda A^{T}(A\mathbf{s} - \mathbf{x}) + \left[\frac{s_1}{\sigma^2} e^{-s_1^2/2\sigma^2}, \dots, \frac{s_n}{\sigma^2} e^{-s_n^2/2\sigma^2}\right]^{T}$ (8)

where $J(\mathbf{s}) = \lambda \|\mathbf{x} - A\mathbf{s}\|_2^2 + n - F_\sigma(\mathbf{s})$ denotes the cost function of (7).
In the minimization part, steepest descent with a variable step size $\mu$ has been applied: if the step taken with the current $\mu$ decreases the cost function, $\mu$ is multiplied by 1.2 for the next iteration; otherwise, it is multiplied by 0.5.
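A minimal Python sketch of this iteration follows, assuming the cost $J(\mathbf{s}) = \lambda\|\mathbf{x} - A\mathbf{s}\|_2^2 + (n - F_\sigma(\mathbf{s}))$ and the step-size rule above; the $\sigma$ schedule, initial step size, and iteration count are illustrative choices of ours, not the paper's settings:

```python
import numpy as np

def sl0dn(A, x, lam, sigmas=(1.0, 0.5, 0.2, 0.1, 0.05, 0.02, 0.01),
          inner_iters=100):
    """Steepest descent on J(s) = lam*||x - A s||^2 + (n - F_sigma(s)),
    warm-started across a decreasing sequence of sigma values."""
    n = A.shape[1]
    s = A.T @ np.linalg.solve(A @ A.T, x)       # minimum-L2-norm start, as in SL0

    def cost(s, sigma):
        return lam * np.sum((x - A @ s)**2) + n - np.sum(np.exp(-s**2 / (2 * sigma**2)))

    def grad(s, sigma):
        # data term plus the derivative of -F_sigma, cf. (8)
        return 2 * lam * A.T @ (A @ s - x) + (s / sigma**2) * np.exp(-s**2 / (2 * sigma**2))

    mu = 1.0                                     # variable step size
    for sigma in sigmas:                         # decreasing sigma escapes local minima
        for _ in range(inner_iters):
            s_new = s - mu * grad(s, sigma)
            if cost(s_new, sigma) < cost(s, sigma):
                s, mu = s_new, mu * 1.2          # step helped: accept and grow
            else:
                mu *= 0.5                        # step hurt: reject and shrink
    return s
```

On an easy noiseless 1-sparse problem, this sketch concentrates the estimate on the correct atom; the accept/reject rule guarantees the cost never increases within each $\sigma$ block.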
4 Experimental Results
In this section we investigate the performance of the proposed method and present our simulation results. Since our framework is a generalization of the idea presented in [8], the practical considerations in that paper can be directly imported into our framework.
In [8], it has been experimentally shown that SL0 is about two orders of magnitude faster than state-of-the-art interior-point LP solvers [7], while being more accurate. We provide comparison results of our method against SL0. Moreover, a comparison with Basis Pursuit Denoising is presented.
In all experiments, sparse sources have been artificially generated using a Bernoulli-Gaussian model: each source sample is 'active' with probability $p$ and 'inactive' with probability $1 - p$. If it is active, its value is modeled by a zero-mean Gaussian random variable with variance $\sigma_{\mathrm{on}}^2$; if it is not active, its value is modeled by a zero-mean Gaussian random variable with variance $\sigma_{\mathrm{off}}^2$, where $\sigma_{\mathrm{off}} \ll \sigma_{\mathrm{on}}$. Consequently, each $s_i$ is distributed as:
$s_i \sim p\,\mathcal{N}(0, \sigma_{\mathrm{on}}^2) + (1 - p)\,\mathcal{N}(0, \sigma_{\mathrm{off}}^2)$ (9)
Sparsity implies that $p \ll 1$. The values of $p$, $\sigma_{\mathrm{on}}$ and $\sigma_{\mathrm{off}}$ were fixed throughout the experiments. Elements of the mixing matrix $A$ and of the noise vector $\mathbf{n}$ were also drawn from normal distributions, with standard deviations of 1 and $\sigma_n$, respectively. As in [8], a fixed decreasing sequence of values was used for $\sigma$.
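The source model (9) can be sampled as in the following Python sketch; the parameter values below are illustrative placeholders of ours, since the paper's exact settings are not reproduced here:

```python
import numpy as np

rng = np.random.default_rng(0)

def bernoulli_gaussian(n, p, sigma_on, sigma_off):
    """Draw n i.i.d. samples from p*N(0, sigma_on^2) + (1-p)*N(0, sigma_off^2),
    i.e. 'active' with probability p and 'inactive' otherwise, cf. (9)."""
    active = rng.random(n) < p
    on = rng.normal(0.0, sigma_on, n)       # active values: large variance
    off = rng.normal(0.0, sigma_off, n)     # inactive values: near zero
    return np.where(active, on, off)

# illustrative parameters: p = 0.1 (sparse), sigma_off << sigma_on
s = bernoulli_gaussian(1000, 0.1, 1.0, 0.01)
```

With $p = 0.1$, roughly one sample in ten carries a significant value, which matches the informal definition of sparsity given in the introduction.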
Experiment 1. Optimal value of $\lambda$
In this experiment, we investigate the effect of $\lambda$ on the performance of our method. We fixed the dimensions $m$ and $n$, and for each value of $\sigma_n$ we plotted the average Signal-to-Noise Ratio (SNR), defined by $\mathrm{SNR} = 20 \log_{10}\!\big(\|\mathbf{s}\|_2 / \|\mathbf{s} - \hat{\mathbf{s}}\|_2\big)$, as a function of $\lambda$ (in this section, all results are averaged over 100 experiments). Figure 2 shows a sample of our experiments. The dashed line represents the results obtained from (6), which are independent of $\lambda$. Note that there exists an interval in which the choice of $\lambda$ results in a better estimation compared to SL0. The SNR takes its maximum in this region for some value of $\lambda$, which we call $\lambda_{\mathrm{opt}}$.
As mentioned in the previous section, we expect an appropriate choice of $\lambda$ to be a decreasing function of $\sigma_n$, since with the increase of noise power, the weight given to the cost of approximation should decrease. To verify this, for each value of $\sigma_n$, we obtained the value of $\lambda_{\mathrm{opt}}$ using curves similar to Fig. 2. Figure 3 shows the values of $\lambda_{\mathrm{opt}}$ as a function of $\sigma_n$ in $[0, 0.15]$. We fitted a simple parametric curve to these results to find the following rule of thumb for the choice of the parameter $\lambda$:
(10) 
This formula gives a rough approximation for the choice of an appropriate $\lambda$ in the initialization step of the algorithm.
Notice that we have two choices for the initialization of the proposed method: either to estimate $\sigma_n$ directly from the observed mixtures [9] and then use (10) to find an approximation of $\lambda_{\mathrm{opt}}$, or to follow this iterative approach to solve the problem:

1. Choose an arbitrary reasonable value of $\sigma_n$.
2. Take $\lambda$ from the curve (10).
3. Run the algorithm; after convergence, compute an estimation of $\sigma_n$ from the obtained source vector and then go to step 2.
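The iterative approach above can be sketched as follows; both `lambda_rule` (standing in for the fitted curve (10)) and `run_solver` (standing in for SL0DN) are hypothetical callables supplied by the user, not the paper's implementation:

```python
import numpy as np

def estimate_iteratively(A, x, lambda_rule, run_solver,
                         sigma_n_init=0.1, n_rounds=3):
    """Alternate between choosing lambda from a noise estimate (step 2)
    and re-estimating the noise level from the residual (step 3)."""
    sigma_n = sigma_n_init                    # step 1: arbitrary reasonable value
    for _ in range(n_rounds):
        lam = lambda_rule(sigma_n)            # step 2: lambda from the curve
        s_hat = run_solver(A, x, lam)         # step 3: run the algorithm...
        sigma_n = np.std(x - A @ s_hat)       # ...and re-estimate sigma_n from the residual
    return s_hat, lam
```

A few rounds typically suffice, since $\lambda$ enters the cost only as a trade-off weight and the estimate of $\sigma_n$ stabilizes quickly.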
Experiment 2. Speed and performance
In order to measure the speed of our algorithm, we ran it 100 times for fixed values of $m$, $n$ and $\sigma_n$. The simulations were performed in the MATLAB 7 environment using an Intel 2.8 GHz processor and 512 MB of memory. The average run time of SL0DN was 2.062 seconds, while the average time for SL0 was 0.242 seconds. Although SL0DN is somewhat slower than SL0, according to Table I in [8], the algorithm is still much faster than $\ell^1$-magic and FOCUSS.
We proceed with the performance analysis of the proposed algorithm. In this experiment, we fix the parameters to those of Experiment 1 and, for each value of $\sigma_n$, choose the value of $\lambda$ with (10). In Fig. 4, the average output SNR is compared to the results of SL0. It can be seen that, except for low-noise mixtures, SL0DN achieves a better SNR. Thus, for noisy mixtures, which is the case for most real data, approximately satisfying the constraint $A\mathbf{s} = \mathbf{x}$ is justified experimentally.
We also compared the results of SL0DN with Basis Pursuit DeNoising (BPDN), which is much faster than BP. We used the Gradient Projection for Sparse Reconstruction (GPSR) [10] algorithm for BPDN. The results of GPSR are shown in Fig. 4 with a dotted line. As can be seen, the average SNR curve of GPSR lies under the two other curves except for low-noise mixtures. It is worth mentioning that the average run time of GPSR was 3.156 seconds.
Experiment 3. Dimension Dependency
In this experiment, we study the performance of the proposed method for different dimensions of sources and mixtures. The values of $m$ and $n$ change while their ratio $m/n$ is kept constant. The average output SNRs for both methods are shown in Fig. 5. The results suggest that the quality of estimation is almost independent of the dimensions.
5 Conclusion
We presented a fast method for Sparse Component Analysis (SCA), or atomic decomposition on overcomplete dictionaries, in the presence of additive noise. The method is a generalization of the SL0 method. It is based on smoothed $\ell^0$ norm minimization, satisfying the equality constraint $A\mathbf{s} = \mathbf{x}$ approximately instead of exactly. The proposed method is fast, while being more robust against noisy mixtures than the original SL0. Experimental results confirmed the performance and the noise tolerance of our method for noisy mixtures.
References
 [1] A. Hyvärinen, J. Karhunen, and E. Oja, Independent Component Analysis, John Wiley & Sons, 2001.
 [2] S. Mallat and Z. Zhang, “Matching pursuits with time-frequency dictionaries,” IEEE Trans. on Signal Proc., vol. 41, no. 12, pp. 3397–3415, 1993.
 [3] D. L. Donoho, M. Elad, and V. Temlyakov, “Stable recovery of sparse overcomplete representations in the presence of noise,” IEEE Trans. Info. Theory, vol. 52, no. 1, pp. 6–18, Jan 2006.
 [4] R. Gribonval and S. Lesage, “A survey of sparse component analysis for blind source separation: principles, perspectives, and new challenges,” in Proceedings of ESANN’06, April 2006, pp. 323–330.
 [5] S. S. Chen, D. L. Donoho, and M. A. Saunders, “Atomic decomposition by basis pursuit,” SIAM Journal on Scientific Computing, vol. 20, no. 1, pp. 33–61, 1999.
 [6] D. L. Donoho, “For most large underdetermined systems of linear equations the minimal $\ell^1$ norm solution is also the sparsest solution,” Tech. Rep., 2004.
 [7] E. Candès and J. Romberg, “$\ell^1$-magic: Recovery of sparse signals via convex programming,” 2005, URL: www.acm.caltech.edu/l1magic/downloads/l1magic.pdf.
 [8] G. H. Mohimani, M. Babaie-Zadeh, and C. Jutten, “Fast sparse representation based on smoothed $\ell^0$ norm,” accepted for publication in IEEE Trans. on Signal Proc. (available as e-print arXiv:0809.2508).
 [9] H. Zayyani, M. Babaie-Zadeh, and C. Jutten, “Source estimation in noisy Sparse Component Analysis,” in Proc. 15th Intl. Conf. on Digital Signal Processing (DSP2007), pp. 219–222, July 2007.
 [10] M. A. T. Figueiredo, R. D. Nowak, and S. J. Wright, “Gradient projection for sparse reconstruction: Application to compressed sensing and other inverse problems,” IEEE Journal of Selected Topics in Signal Processing, vol. 1, no. 4, pp. 586–598, 2007.