
# Approximate Sparse Decomposition Based on Smoothed ℓ0-Norm

###### Abstract

In this paper, we propose a method to address the problem of source estimation for Sparse Component Analysis (SCA) in the presence of additive noise. Our method is a generalization of a recently proposed method (SL0), which has the advantage of directly minimizing the ℓ0-norm instead of the ℓ1-norm, while being very fast. SL0 is based on minimization of the smoothed ℓ0-norm subject to x = As. In order to better estimate the source vector for noisy mixtures, we suggest removing the constraint x = As by relaxing the exact equality to an approximation (we call our method Smoothed ℓ0-norm DeNoising, or SL0DN). The final result can then be obtained by minimizing a proper linear combination of the smoothed ℓ0-norm and a cost function for the approximation. Experimental results emphasize the significant enhancement offered by the modified method in noisy cases.


H. Firouzi, M. Farivar, M. Babaie-Zadeh (Department of Electrical Engineering, Sharif University of Technology, Tehran, Iran). This work has been partially supported by the Iran National Science Foundation (INSF) under contract number 86/994, and also by ISMO and the French embassy in Iran in the framework of a Gundi-Shapour collaboration program.

C. Jutten (GIPSA-LAB, Grenoble, France)

Index Terms—  atomic decomposition, sparse decomposition, sparse representation, over-complete signal representation, sparse source separation

## 1 Introduction

Blind source separation (BSS) consists of detecting the underlying source signals within some observed mixtures of them, without any prior information about the sources or the mixing system. Let x denote the vector of observed mixtures and s denote the vector of unknown source signals. The mixing equation for the linear instantaneous noisy model is:

x = As + n (1)

where A is the unknown mixing matrix and n denotes the additive noise vector. The aim of BSS is then to estimate s from the observed data x without any knowledge of the mixing matrix A or the source signals.

In the determined case, when the number of mixtures equals the number of sources, the problem can be successfully solved using Independent Component Analysis (ICA) [1]. However, in the underdetermined (or over-complete) case, where fewer observations than sources are provided, there are infinitely many solutions even if A is known, since the number of unknowns exceeds the number of equations. This ill-posedness can be resolved by the assumption of 'sparsity', resulting in a non-totally-blind source separation problem. A signal is considered sparse when only a few of its samples take significant values. Thus, among all possible solutions of (1), we seek the sparsest one, i.e. the one with the minimum number of nonzero components, that is, minimum ℓ0-norm.

SCA can also be viewed as the problem of representing a signal as a linear combination of vectors, called atoms [2]. The atoms collectively form a dictionary, the matrix A, over which the signal is to be decomposed. There is special interest in the over-complete case, where the number of atoms exceeds the signal dimension (refer for example to [3] and the references therein). Again we have the problem of finding the sparsest solution of the underdetermined linear system x = As, where A is the dictionary of atoms. This problem is also called 'atomic decomposition' and has many potential applications in diverse fields of science [3].

The general Sparse Component Analysis (SCA) problem consists of two steps: first estimating the mixing matrix, and then finding the sparsest source vector, assuming the mixing matrix to be known. The first step can be accomplished by means of clustering methods [4]. In this paper, we focus our attention on the second step; that is for a given mixing matrix, we wish to find the solution to the following minimization problem:

ŝ = argmin ‖s‖₀ subject to x = As (2)

where ‖s‖₀ denotes the number of non-zero elements of s (and is usually called the ℓ0-norm of s).

So far, several algorithms such as Basis Pursuit (BP) [5, 6] and Matching Pursuit (MP) [2, 4] have been proposed to approximate the solution of (2). The former is based on the observation that for most large underdetermined systems of linear equations, the minimal ℓ1-norm solution is also the sparsest solution [6]. The ℓ1-norm minimization can be efficiently solved using Linear Programming (LP) techniques [7]. Despite all recent developments, computational efficiency still remains a main concern.

Recently in [8], the idea of using a smoothed ℓ0-norm (SL0) was introduced. More precisely, this algorithm works with a smooth approximation of the ℓ0-norm, built from a function F_σ(s), and the approximation tends to equality as σ → 0. The algorithm then sequentially solves the problem:

maximize F_σ(s) subject to As = x (3)

for a decreasing sequence of σ.

This approximation makes it possible both to use continuous optimization techniques to estimate the sparsest solution of (2) and to derive a noise-tolerant algorithm. The idea turned out to be both efficient and accurate, i.e. providing better accuracy than ℓ1-norm minimization algorithms while being about two orders of magnitude faster than LP [8].

However, the proposed algorithm was not designed for the noisy case (1), where a noise vector n has been added to the observed mixture x. In this paper, we generalize the method to this noisy case by removing the constraint x = As and relaxing the exact equality to an approximation. From the sparse decomposition viewpoint, this amounts to an approximate sparse decomposition of a signal on an over-complete dictionary. The final algorithm is then an iterative minimization of a proper linear combination of the smoothed ℓ0-norm and ‖As − x‖₂².

This paper is organized as follows. Section 2 discusses the main idea of the proposed method. Section 3 gives a formal statement of the final algorithm. Finally, experimental results are presented in Section 4.

## 2 Main Idea

As stated in the previous section, when the dimensions increase, finding the minimum ℓ0-norm solution of (2) is impractical for two reasons: firstly, the ℓ0-norm of a vector is a discontinuous function of its elements and leads to an intractable combinatorial optimization; secondly, the solution is highly sensitive to noise. The idea of [8] is then to replace the ℓ0-norm by a continuous function whose terms approximate the Kronecker delta function, and to use optimization techniques to minimize it subject to the constraint x = As. For example, consider the Gaussian-like function:

F_σ(s) = ∑ᵢ₌₁ᵐ exp(−sᵢ²/2σ²) (4)

where sᵢ denotes the i-th element of the vector s. For sufficiently small values of σ, F_σ(s) tends to count the number of zero elements of the vector s. Thus we have:

‖s‖₀ = m − lim_{σ→0} F_σ(s) (5)

where m is the dimension of the vector s. The sparsest solution of (2) can then be approximated by the solution of the following minimization problem:

ŝ = argmin (m − F_σ(s)) subject to x = As (6)

The above minimization task can be accomplished using common gradient-type (e.g. steepest descent) algorithms. Note that the value of σ determines how smooth the function F_σ is; the smaller the value of σ, the better the approximation of the ℓ0-norm, but the larger the probability of being trapped in local minima of the cost function. The idea of [8] for escaping from local minima is then to use a decreasing sequence of values for σ. More precisely, for each value of σ, the minimization algorithm is initialized with the minimizer obtained for the previous (larger) value of σ.
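The behavior of F_σ in (4)–(5) can be checked numerically. The sketch below (illustrative values only, not part of the original experiments) shows that m − F_σ(s) approaches ‖s‖₀ as σ decreases:

```python
import numpy as np

def F(s, sigma):
    # Gaussian-shaped smoothed l0 surrogate of eq. (4)
    return np.exp(-s**2 / (2 * sigma**2)).sum()

s = np.array([0.0, 0.0, 1.3, 0.0, -0.7, 0.0])  # ||s||_0 = 2, m = 6
m = len(s)

# m - F_sigma(s) approximates the number of nonzero entries;
# the approximation sharpens as sigma shrinks
approx_l0 = {sigma: m - F(s, sigma) for sigma in [1.0, 0.1, 0.01]}
```

For σ = 1 the count is badly underestimated, while for σ = 0.01 the Gaussian terms of the two nonzero entries have underflowed to zero and m − F_σ(s) equals the true ℓ0-norm.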

Now consider the more realistic case where a noise vector n has been added to the observed mixture, as in (1). Here we have an uncertainty on the exact value of the observed vector, so it seems reasonable to remove the constraint x = As and relax it to x ≈ As. This idea is based on the observation that in the presence of considerable noise, the exact constraint may lead to a totally different sparse decomposition. Thus we wish to minimize two terms: a cost of approximation, and the smoothed ℓ0-norm (m − F_σ(s)) as the measure of sparsity.

For the sake of simplicity, we choose ‖As − x‖₂² as the cost of approximation. This naturally leads us to the following minimization problem:

ŝ = argmin J_σ(s) = (m − F_σ(s)) + λ‖As − x‖₂² (7)

where λ represents a compromise between the two terms of our cost function: sparsity and the equality condition. Intuitively, we may expect that for less noisy mixtures, the value of λ should be greater than for observations with high noise levels. Further discussion on the choice of λ is left to Section 4.

Another advantage of removing the equality constraint appears when the dictionary matrix A is not full rank. In that case, exactly satisfying the equality constraint is impossible for observed vectors that are not in the column space of A, and as a result the previous algorithm fails to find any answer.

## 3 Final Algorithm

The final algorithm is shown in Fig. 1. We call it SL0 DeNoising (SL0DN). As seen in the algorithm, the final values of the previous estimation are used to initialize the next steepest descent step. The decreasing sequence of σ is used to escape from getting trapped in local minima.

Direct calculations show that:

Δs = ∂J_σ(s)/∂s = 2λAᵀ(As − x) + (1/σ²)[s₁e^(−s₁²/2σ²), …, sₘe^(−sₘ²/2σ²)]ᵀ (8)

In the minimization part, steepest descent with a variable step-size μ has been applied: if a step with the current μ decreases the cost function J_σ, μ is multiplied by 1.2 for the next iteration; otherwise it is multiplied by 0.5.

## 4 Experimental Results

In this section we investigate the performance of the proposed method and present our simulation results. Since our framework is a generalization of the idea presented in [8], the practical considerations in that paper can be directly imported into our framework.

In [8], it has been experimentally shown that SL0 is about two orders of magnitude faster than state-of-the-art interior-point LP solvers [7], while being more accurate. We provide comparison results of our method with SL0. Moreover, a comparison with Basis Pursuit Denoising will be presented.

In all experiments, sparse sources have been artificially generated using a Bernoulli–Gaussian model: each source sᵢ is 'active' with probability p and 'inactive' with probability 1 − p. If it is active, its value is modeled by a zero-mean Gaussian random variable with variance σon; if it is not active, its value is modeled by a zero-mean Gaussian random variable with variance σoff, where σoff ≪ σon. Consequently, each sᵢ is distributed as:

sᵢ ∼ p·N(0, σon) + (1 − p)·N(0, σoff) (9)

Sparsity implies that p ≪ 1. The values of p, σon and σoff were fixed across all experiments. Elements of the mixing matrix A and of the noise vector n were also drawn from normal distributions, with standard deviations 1 and σn, respectively. As in [8], a fixed decreasing sequence of values was used for σ.
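The Bernoulli–Gaussian sources above can be sampled as follows (a sketch; the demo parameter values are illustrative, and σon, σoff are treated here as standard deviations for concreteness):

```python
import numpy as np

def bernoulli_gaussian(m, p, sigma_on, sigma_off, rng):
    # Each entry is 'active' with probability p (large spread sigma_on)
    # and 'inactive' otherwise (small spread sigma_off), per eq. (9)
    active = rng.random(m) < p
    return np.where(active,
                    rng.normal(0.0, sigma_on, size=m),
                    rng.normal(0.0, sigma_off, size=m))

rng = np.random.default_rng(1)
s = bernoulli_gaussian(100_000, p=0.1, sigma_on=1.0, sigma_off=0.01, rng=rng)
```

With σoff ≪ σon, thresholding |sᵢ| at a few multiples of σoff cleanly separates active from inactive entries.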

Experiment 1. Optimal value of λ

In this experiment, we investigate the effect of λ on the performance of our method. We fixed the problem dimensions and, for each value of σn, plotted the average output Signal-to-Noise Ratio (SNR) as a function of λ (in this section, all results are averaged over 100 runs). Figure 2 shows a sample of our experiments. The dashed line represents the result obtained from (6), which is independent of λ. Note that there exists an interval of λ over which the estimation is better than that of SL0. The SNR takes its maximum in this region for some value of λ, which we call λopt.
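The paper's exact SNR formula is not reproduced in this text, so the sketch below assumes the common definition 20·log10(‖s‖/‖ŝ − s‖) in dB:

```python
import numpy as np

def output_snr_db(s_true, s_hat):
    # Assumed SNR definition: 20*log10(||s|| / ||s_hat - s||) in dB
    return 20.0 * np.log10(np.linalg.norm(s_true) / np.linalg.norm(s_hat - s_true))
```

Under this definition, a uniform 10% error on every component gives exactly 20 dB.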

As mentioned in the previous section, we expect an appropriate choice of λ to be a decreasing function of σn, since as the noise power increases, the approximation term should be weighted less. To verify this, for each value of σn we obtained λopt from curves similar to Fig. 2. Figure 3 shows λopt as a function of σn over [0, 0.15]. Fitting these results with a curve of the type 1/(a + bσn²) yields the following rule of thumb for the choice of λ:

λ ≈ 1 / (0.007 + 3.5σn²) (10)

This formula gives a rough approximation for the choice of an appropriate λ in the initialization step of the algorithm.
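The rule of thumb (10) is trivial to wrap in code (the constants are the fitted values from the text):

```python
def lambda_rule_of_thumb(sigma_n):
    # Eq. (10): lambda ~ 1 / (0.007 + 3.5 * sigma_n^2);
    # a larger noise std gives a smaller weight on the data-fidelity term
    return 1.0 / (0.007 + 3.5 * sigma_n ** 2)
```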

Notice that we have two choices for initializing the proposed method: either estimate σn directly from the observed mixtures [9] and then use (10) to find an approximation of λopt, or follow this iterative approach:

1. choose an arbitrary reasonable value for σn.

2. take the corresponding λopt from the curve.

3. run the algorithm; after convergence, compute an estimation of σn from the obtained source vector, and then go to step 2.

Experiment 2. Speed and performance

In order to measure the speed of our algorithm, we ran it 100 times for fixed problem dimensions and parameters. The simulations were performed in MATLAB 7 on an Intel 2.8 GHz processor with 512 MB of memory. The average run time of SL0DN was 2.062 seconds, while the average time for SL0 was 0.242 seconds. Although SL0DN is somewhat slower than SL0, Table I in [8] shows that it is still much faster than ℓ1-magic and FOCUSS.

We proceed with the performance analysis of the proposed algorithm. In this experiment, we fix the parameters to those of Experiment 1 and, for each value of σn, choose λ with (10). In Fig. 4 the average output SNR is compared to the results of SL0. It can be seen that, except for low-noise mixtures (small σn), SL0DN achieves a better SNR. Thus for noisy mixtures, the case for most real data, satisfying the constraint x = As only approximately is justified experimentally.

We also compared the results of SL0DN with Basis Pursuit DeNoising (BPDN), which is much faster than BP. We used the Gradient Projection for Sparse Reconstruction (GPSR) algorithm [10] for BPDN. The results of GPSR are shown in Fig. 4 with a dotted line. As can be seen, the average SNR curve of GPSR lies below the two other curves except for low-noise mixtures. It is worth mentioning that the average run time of GPSR was 3.156 seconds.

Experiment 3. Dimension Dependency

In this experiment we study the performance of the proposed method for different dimensions of sources and mixtures: the values of m and n are varied while keeping the ratio n/m constant. The average output SNR for both methods is shown in Fig. 5. The results suggest that the quality of estimation is almost independent of the dimensions.

## 5 Conclusion

We presented a fast method for Sparse Component Analysis (SCA), or atomic decomposition on over-complete dictionaries, in the presence of additive noise. The method is a generalization of the SL0 method, based on smoothed ℓ0-norm minimization with the equality constraint satisfied approximately instead of exactly. The proposed method is fast while being more robust against noise than the original SL0. Experimental results confirmed the performance and noise tolerance of our method on noisy mixtures.

## References

• [1] A. Hyvärinen, J. Karhunen, and E. Oja, Independent Component Analysis, John Wiley & Sons, 2001.
• [2] S. Mallat and Z. Zhang, “Matching pursuits with time-frequency dictionaries,” IEEE Trans. on Signal Proc., vol. 41, no. 12, pp. 3397–3415, 1993.
• [3] D. L. Donoho, M. Elad, and V. Temlyakov, “Stable recovery of sparse overcomplete representations in the presence of noise,” IEEE Trans. Info. Theory, vol. 52, no. 1, pp. 6–18, Jan 2006.
• [4] R. Gribonval and S. Lesage, “A survey of sparse component analysis for blind source separation: principles, perspectives, and new challenges,” in Proceedings of ESANN’06, April 2006, pp. 323–330.
• [5] S. S. Chen, D. L. Donoho, and M. A. Saunders, “Atomic decomposition by basis pursuit,” SIAM Journal on Scientific Computing, vol. 20, no. 1, pp. 33–61, 1999.
• [6] D. L. Donoho, “For most large underdetermined systems of linear equations the minimal ℓ1-norm solution is also the sparsest solution,” Tech. Rep., 2004.
• [7] E. Candes and J. Romberg, “ℓ1-magic: Recovery of sparse signals via convex programming,” 2005, URL: www.acm.caltech.edu/l1magic/downloads/l1magic.pdf.
• [8] G. H. Mohimani, M. Babaie-Zadeh, and C. Jutten, “Fast sparse representation based on smoothed ℓ0 norm,” accepted for publication in IEEE Trans. on Signal Proc. (available as eprint arXiv 0809.2508).
• [9] H. Zayyani, M. Babaie-Zadeh, and C. Jutten, “Source estimation in noisy Sparse Component Analysis,” in Proceedings of the 15th Intl. Conf. on Digital Signal Processing (DSP2007), pp. 219–222, July 2007.
• [10] M. A. T. Figueiredo, R. D. Nowak, and S. J. Wright, “Gradient projection for sparse reconstruction: Application to compressed sensing and other inverse problems,” IEEE Journal of Selected Topics in Signal Processing: Special Issue on Convex Optimization Methods for Signal Processing, vol. 1, no. 4, pp. 586–598, 2007.