Compressed Sensing Performance Analysis via Replica Method using Bayesian framework

# Compressed Sensing Performance Analysis via Replica Method using Bayesian framework

## Abstract

Compressive sensing (CS) is a new methodology to capture signals at lower rate than the Nyquist sampling rate when the signals are sparse or sparse in some domain. The performance of CS estimators is analyzed in this paper using tools from statistical mechanics, especially called replica method. This method has been used to analyze communication systems like Code Division Multiple Access (CDMA) and multiple input multiple output (MIMO) systems with large size. Replica analysis, now days rigorously proved, is an efficient tool to analyze large systems in general. Specifically, we analyze the performance of some of the estimators used in CS like LASSO (the Least Absolute Shrinkage and Selection Operator) estimator and Zero-Norm regularizing estimator as a special case of maximum a posteriori (MAP) estimator by using Bayesian framework to connect the CS estimators and replica method. We use both replica symmetric (RS) ansatz and one-step replica symmetry breaking (1RSB) ansatz, clamming the latter is efficient when the problem is not convex. This work is more analytical in its form. It is deferred for next step to focus on the numerical results.

Solomon A.Tesfamicael (tesfamic@iet.ntnu.no - Spring 2010

## 1Introduction

Recently questions like, why go to so much effort to acquire all the data when most of what we get will be thrown away?;Can we not just directly measure the part that will not end up being thrown away?, that were paused by Donoho[1] and others, triggered a new way of sampling or sensing called compact (“compressed”) sensing (CS).

In CS the task is to estimate or recover a sparse or compressible vector from a measurement vector . These are related through the linear transform . Here, is a sparse vector and . In the seminal papers[1] -[3], is estimated from , by solving a convex optimization problem [4],[5]. Others have used greedy algorithms, like subspace pursuit (SP)[6], orthogonal matching pursuit (OMP) [7] to solve the problem. In this paper the focus is rather on the convex optimization methods. And we consider the noisy measurement system and the linear relation becomes

Here, and are as in above where as the noise term, . There exists a large body of work on how to efficiently obtain an estimate for . And the performances of such estimators are measured using metrics like Restricted Isometric Property (RIP) [8], Mutual Coherence (MC) [9], yet there is apparently no consensus on the bounds in using such metrics. The tool used in this paper gives performance bounds of large size CS systems [10].
Generally the linear model is used to describe a multitude of linear systems like code division multiple access (CDMA) and multiple antenna systems like MIMO, to mention just a few. Tools from statistical mechanics have been employed to analyze large CDMA [11] and MIMO systems [12] [13], and on in this paper the same wisdom is applied to analyze the performance of estimators used in CS. Guo and et al in [10] used a Bayesian framework for statistical inference with noisy measurements and characterize the posterior distribution of individual elements of the sparse signal by describing the mean mean square error(MSE) exactly. To do so, they consider in a large system and applied the decoupling principle using tools from statistical mechanics.
One can find also works that have used the tools from statistical mechanics to analyse CS system performances. To mention some, in [10] as stated above, Guo and et al used the tools to describe the minimum mean square error (MMSE) estimator, in [14] Rangan and others used the maximum a posterior(MAP) estimator of CS systems. These are referred as Replica MMSE claim and Replica MAP claim in [14].

In [16] -[20] authors have used Belief propagation and message passing algorithms for probabilistic reconstruction in CS using replica methods including RS. Especially, in [18] one finds excellent work about phase diagrams in CS systems while [21] generalizes replica analysis using free random matrices. Kabashima and et. al in [22], Ganguli and Sompolinsky in [23] and Takeda and Kabashima [24] -[26] have shown statistical mechanical analysis of the CS by considering the noiseless recovery problem and they indicated that RSB analysis is needed in the phase regimes where the RS solution is not stable. In this paper the performance of those CS estimators, considered as MAP estimator, is shown for the noisy problem by using the replica method including RS and RSB as in [27] -[29], where the RSB ansatz gives better solution when the replica symmetry (RS) solution is unstable. This work is kind of an extension of [29] from MIMO systems to the CS systems.
The paper is organized as follows. In Section 2 the estimator in CS system are presented and redefined using the Bayesian framework, and based on that we present our basis of analysis in Section 3 which is the replica method from the statistical physics and apply it on the different CS estimators which are presented generally as a MAP estimator. In Section 4 we showed our analysis using a paricular example, and Section 5 presents conclusion and of future work.

## 2Bayesian framework for Sparse Estimation

Beginning with a given vector of measurements and measurement matrix , assuming noisy measurement with being i.i.d. Gaussian random variables with zero mean and covariance matrix , estimating the sparse vector is the problem that we are considering where these variables are related by the linear model .

### 2.1Sparse Signal Estimation

Various methods for estimating may be used. The classical approach to solving inverse problems of such type is by least squares (LS) estimator in which no prior information is used and its closed form is

which performs very badly for the CS estimation problem we are considering since it does not find the sparse solution. Another approach to estimate is via the solution of the unconstrained optimization problem

where is a regularizing term, for some non-negative . By taking , emphasis is made on a solution with LP norm, and is defined as a penalizing norm. When , we get

This is penalizing the least square error by the L2 norm and this performs badly as well, since it does not introduce sparsity into the problem. When , we get the L0 norm, which is defined as

the number of the non zero intries of , which actually is a partial norm since it does not satisfy the triangle inequality property, but can be treated as norm by defining it as in [14], and get the L0 norm regularizing estimator

which gives the best solution for the problem at hand since it favors sparsity in . Nonetheless, it is an NP- hard combinatorial problem. Instead, it has been a practice to approximate it using L1 penalizing norm to get the estimator

which is a convex approximation to the L0 penalizing solution Equation 6. The best solution for estimate of the sparse vector is given by the zero-norm regularized estimator which is a hard combinatorial problem. These estimators, - , can equivalently be presented as solutions to constrained optimization problem [1]-[3]. This constrained optimization version of is known as the L1 penalized L2 minimization called LASSO (Least Absolute Shrinkage and Selection Operator) or BPDN(Basis Persuit Denoising), which can be set as Quadratic Programing (QP) and Quadratic Constrained Linear Programing (QCPL) optimization problems. 1 In the following subsection the above estimators are presented as a MAP estimator in Bayesian framework.

### 2.2Bayesian framework for Sparse signal

Equivalently, the estimator of in can generally be presented as MAP estimator under the Bayesian framework. Assume a prior probability distribution for to be

where the cost function is some scalar-valued, non negative function with and

such that for sufficiently large , is finite as in [14]. And let the assumed variance of the noise be given by

where is system parameter which can be taken as where is the assumed variance for each component of . Note that we incorporate the sparsity in the prior pdf via . By the probability density function of given is given by

and prior distribution of by , the posterior distribution for the measurement channel according to Bayes law is

Then the MAP estimator can be shown to be

Now, as we choose different penalizing function in ([14]) we get the different estimators defined above in equations , , and but this time under the Bayesian framework as a MAP estimator [14].

1. Linear Estimators: when reduces to

which is the LMMSE estimator.

2. LASSO Estimator: when we get the LASSO estimator and becomes

3. Zero-Norm regularization estimator: when , we get the Zero-Norm regularization estimator and becomes

Whether these minimization problems are solvable or not the replica analysis results can provide the asymptotic performances of all the above estimators via replica method as showed in [10], [14], [22], [23] and [24]. We apply RS ansatz as used by Müller and et al in [27] and RSB ansatz as used by Zaidel and et al [29] on vector precoding for MIMO. Actually, this work is an extension of the RSB analysis to MIMO systems done in [29] to the CS system.

## 3A Statistical Physics Analysis

The performance of the Bayesian estimators like MMSE and MAP can be done using the pdf of the error vector. The error is random and it should be centered about zero for the estimator to perform well. Kay showed in that way (see section 11.6 in [30]) the performance analysis of MMSE estimator. We believe in general that inference for the asymptotic performance of MAP estimators is best done with statistical mechanical tools including RSB assumption and this is done in the sense of the mean square error (MSE).

The posterior distribution is a sufficient statistics to estimate [10] and the denominator is called the normalizing factor or evidence in Bayesian inference according to [31] and Partition function in statistical mechanics. Actually, it is this connection, which gives the ground to apply the tools, which are used in statistical mechanics. So the task of evaluating the above estimators for the sparse vector can be translated to the statistical physics framework. And let us justify first how the analysis using statistical mechanical tool is able to do it.

Define the Gibbs-Boltzmann distribution as

where is a constant known as the inverse temperature in the terminology of physical systems. For small , the prior probability becomes flat, and for large , the prior probability has sharp modes. , which is an expression of the total energy of the system, is called the Hamiltonian in physics literature and is the partition function given by

Often the Hamiltonian can be given by a quadratic form like

with being a Random matrix of dimension . Then the minimum average energy per component of can be given by

For our system that we considered to address, which is given by or equivalently by , the Hamiltonian becomes

Compared to , the Hamiltonian in has regularizing term in addition to the quadratic form, which is the energy of the error, in which the regularizing term is accountable for addressing the problem in the CS. The Gibbs-Boltzman distribution is a solution to or to in general, after plugging and since they are equivalent problems. The normalizing factor ( aslo called the partition function) of this distribution is central for calculating many important variables and we shall begin from this term to analyse the CS estimators performance.
Assuming that and being drawn from the same discrete set (we shall later provide an example from such a set). The partition function of the posterior distribution given in becomes

by using and . The posterior distribution depends on the predetermined random variables and called quenched states in physics literature [25], [26]. That is, we use fixed states instead of for the large system limit, as , while maintaining fixed. We then calculate the nth moment of the partition function with respect to the predetermined variables, replicas, hence the name replica method came from. The replicated partition function is then given by

where . And after substituting , it becomes

Averaging over the noise first, we get

where and it is assumed to decompose into

and is a diagonal matrix while is orthogonal matrix assumed to be drawn randomly from the uniform distribution defined by the Haar measure on the orthogonal group. For more clarity on this one can see — in [25]. And is given by

Further averaging what we get on the right side of over the cross correlation matrix , by assuming the eigenvalue spectrum of to be self-averaging, we get

The inner expectation in is the Harish -Chandra -Itzykoson-Zuber integral (again see in [27] and [29] and the references therein). The plan here is to evaluate the fixed-rank matrices as . Further following the explanation in [29] becomes

where is the R-transform of the limiting eigenvalue distribution of the matrix J( see, definition 1 in [27] of R-transform or in [12] and [13] for better understanding of R-transform) and denote the Eigenvalues of the matrix , with defined through

for .

After applying replica trick, the average free energy can be given by

and the energy of the error can be calculated from the average free energy as

where we get by using one of the assumptions used in replica calculations, after interchanging the order of the limits we assumed we get the same result. Further, for we have

Since the additive exponential terms of order have no effect on the results when taking saddle point integration in the limiting regime as due to the factor outside the logarithm in any such terms are dropped further for notational simplicity as in [29].

In order to find the summation in we employed the procedure in [29] and the dimensional space spanned by the replicas is split into subshells, defined through matrix

The limit able us to use saddle point integration. Hence we can have the following general result as similar to [29] but extended in this work with the term, which pertains to CS, where we have given the expression that helps to evaluate the performances of the CS estimators using equation .

See Appendix B.

Further, to get specific results we need to assume simple structure onto the cross correlation matrix at the saddle point. So we assume two different assumptions for the entries of called ansatz: replica symmetry(RS) and replica symmetric breaking (RSB) ansatz. Then compare the above limiting energy for the different estimators considered in this paper using the two types of ansatz for the CS system. That is the main purpose that we want to show in this paper. And we took the structures similar to[29] :

1. replica symmetry ansatz :

2. one replica symmetry breaking ansatz :

Applying these assumptions we found some results as given in the following subsections. In the first subsection we assume the RS ansatz which can be considered as the extension of [27]. In the last two subsections we assume RSB ansatz as an extension of [29] to CS.

### 3.1LASSO estimator with RS ansatz

Consider the LASSO estimator given in , which is equivalent to the solution of the main unconstrained optimization problem in penalized sense. Its performance can be expressed in terms of the limiting energy penalty per component using two macroscopic variables and given by

where

and is refering about integration over Gaussian measure, while refers to integration over the pdf of (See Appendix B). Under RS ansatz assumptions we then get the following statement.

See Appendix B.

### 3.2 LASSO estimator with 1RSB ansatz

Moving to the very purpose of the present paper, we use RSB ansatz instead of RS and we repeat what has been done in the above subsections. The limiting energy in this case involves four macroscopic variables like , , , and , which can be given by the following fixed point equations as and , as showed in appendix D, and using the compact notation as in [29]. Let

and its normalized version

where

and

where the other variables , , and , are given by

Then the following two statements are the extensions of the propositions in [29] to CS problems.

See Appendices D.

### 3.3 Zero-Norm regularizing estimator with 1RSB ansatz

The LASSO estimation is considered as the convex relaxation of the Zero-Norm regularizing estimation. Since the latter is a non-convex problem its performance is better evaluated when we use RSB ansatz. So extending proposition to this estimator we get the following statement.

See Appendix D.

## 4 Particular Example: Bernoulli-Gaussian Mixture Distribution

Assume the original vector follows a Bernoulli-Gaussian mixture distribution. So following the Bayesian framework analysis in Section 3, let be composed of random variables with each component obeying the pdf

where , with being the number of non zero entries of . With out loss of generality, let , and vary between and . Also lets assume that the entries of the measurement matrix follow i.i.d. Gaussian random variable of mean zero and variance 1/M. In addition let be such that the signal to noise ratio is .

We have simulated equations (2.7) and (2.8). Figure 1 shows MSE versus of the two estimators, where we se that the penalizing estimator, LMMSE, is not as good as the penalizing estimator in general. Figure 2 shows MSE versus of the two estimators and we see that LMMSE is not sensitive to the sparsity of the vector as compared to the penalizing estimator. Note that we have plotted the -penalizing estimator using different algorithms: LASSO, L1-LS, Log-Bar.

In both figures, we see that the least square estimator is not good for the compressive sensing problem. In addition, we also observed that simulating the penalizing estimator is hard. However, it is possible to apply statistical physics tools, including replica methods, to analayze the performances of all the estimators mentioned above, including zero norm estimator. In [14], median square error was used to compare the different estimators given by - as shown here in figure 3. What we do here is that we include 1RSB ansatz analysis of the performance of the CS estimators as each of them are presented here as a MAP estimator. Actually it is one of the conjuctures made by Müller and others that the performance of MAP estimators is best done using one step RSB. And we showed it here via the minimized energy expressions as given in the propositions by the equations , , and .

### 4.1 Replica symmetry analysis

Considering the macroscopic variables given by and and pluging the assumed distributions above and simplyfying it one more step, the fixed point equations become

Using these macroscopic variables in we find the limiting energy numerically which is given under propostion ? and the result is shown in figure ?.

### 4.2 Replica symmetry Breaking analysis

Considering the same Bernoulli-Gaussian mixture distribution assumed in this section we consider the macroscopic variables which arises from one step replica symmetry breaking (1RSB) ansatz. Then the minimum energy per component as , while is finite ratio, which are given by and are dependent up on four macroscopic variables given by -. The ther first are simplified further as follows:

We can further simplify - as follows

It is possible to simplify these results further and give numerical results. But this is deferred for further work. We expect that the free energy from The RSB ansatz to be greater than the free energy from the RS ansatz for the Zero-Norm regularizing, which can be seen from the analytical terms which have more parameters in . However, for LASSO these free energy, hence the energy error, will be quite similar since for convex minimization problems there is one global minimum and RS ansats is sufficient enough to produce the solution.

## 5 Conclusion

In this paper we have used the replica method to analyze the performance of the estimators used in compressed sensing which can be generalized as MAP estimators. And the performance of MAP estimators can well be shown using replica method including one-step replica breaking ansatz. It is a philosophical standpoint that 1RSB enough to analyze the estimators like MAP. We have only showed here for one particular example for the CS problem, i.e. for Bernoulli-Gaussian distribution. One may be interested to verify it using different examples. In addition we have only compared the estimators performance based on the free energy, but one can also use other metrics such as comparing the input/out put distribution using replica analysis as it is done in [29]. The main result of this paper is analytical analysis for the performance of the estimators used in CS and many things can be extended including efficient algorithms in implementing the numerical analysis.

## 6Acknowledgments

We are grateful to Lars Lundheim, Rodrigo Vicente de Miguel and Benjamin M. Zaidel for interesting discussions and suggestions.

## AImportant Definitions

### a.1Green’s function

In Classical probability theory (CPT) one is concerned with the densities, moments and comulants of elements of random matrices. Where as in Random matrix theory (RMT) also called (Free Random Variable calculus), one is engaed in finding the spectral densities, moments and cumilants ( By Professor Maciej A. Novak). As Fourier transfom is the generating function for the moments in CPT, Green’s function ( also called Stieltjes transform) is the generating function for the spectral moments defined as

where is random matrix and is of the same size unit matrix, are the eigenvalues, and is the spectral moment. The integral is over the support set of the eigenvalues.

### a.2R-transform

The generating function for the cumulants of the CPT is given by the logarithm of the Fourier transfom. In similar maner to the above section we can define the generating function for spectral cumulants. It is called the R-transform (Voiculescu,1986). It is given by

where are the spectral cumulants of the random matrix . We can relate R-transform with Greens’s function as follows:

The spectral density of the matrix converges almost surely to the Marchenko-Pastur law as [27]. And the R-transform of this matrix is given by

and its derivative with respect to z becomes

## BProof of propostion

The avarage energy penality can be derived from the average free energy given in

where is given by . Using as the splitting of the space, we get

where

is the integration measure,

denotes probability weight of the subshell composed of Dirac-functions in the real line. This procedure is a change of integration variables in multiple dimensions where the integration of an exponential function over the replicas has been replaced by integration over the variables . To evaluate we follow [12], [29] and represent the Dirac measure using the Fourier transform as

where and this gives

where

Assuming , which is the sparsity enforcer as described above in LASSO estimator, and after doing some rearrangements, the inner expectation of can be given by

Now defining

we can get

Following the i.i.d. assumption for the component of the sparse vector , and applying the strong law of large numbers as we get