A Greedy Algorithm for the Analysis Transform Domain

A Greedy Algorithm for the Analysis Transform Domain

R. Giryes The Department of Electrical and Computer Engineering, Duke University, Durham, NC 27708, USA
Abstract

Many image processing applications benefited remarkably from the theory of sparsity. One model of sparsity is the cosparse analysis one. It was shown that using -minimization one might stably recover a cosparse signal from a small set of random linear measurements if the operator is a frame. Another effort has provided guarantees for dictionaries that have a near optimal projection procedure using greedy-like algorithms. However, no claims have been given for frames. A common drawback of all these existing techniques is their high computational cost for large dimensional problems.

In this work we propose a new greedy-like technique with theoretical recovery guarantees for frames as the analysis operator, closing the gap between greedy and relaxation techniques. Our results cover both the case of bounded adversarial noise, where we show that the algorithm provides us with a stable reconstruction, and the one of random Gaussian noise, for which we prove that it has a denoising effect, closing another gap in the analysis framework. Our proposed program, unlike the previous greedy-like ones that solely act in the signal domain, operates mainly in the analysis operator’s transform domain. Besides the theoretical benefit, the main advantage of this strategy is its computational efficiency that makes it easily applicable to visually big data. We demonstrate its performance on several high dimensional images.

keywords:
Sparse representations, Compressed sensing, Sparsity for Big Data, Synthesis, Analysis, Iterative hard threshodling, Greedy Algorithms.
Msc:
[2010] 94A20, 94A12, 62H12

1 Introduction

For more than a decade the idea that signals may be represented sparsely has a great impact on the field of signal and image processing. New sampling theory has been developed Donoho06Stable () together with new tools for handling signals in different types of applications, such as image denoising dabov06image (), image deblurring Danielyan12BM3D (), super-resolution Zeyde12single (), radar Fannjiang10Compressed (), medical imaging Lustig08Compressed () and astronomy Salmon13Poisson (), to name a few Bruckstein09From (). Remarkably, in most of these fields the sparsity based techniques achieve state-of-the-art results.

The classical sparse model is the synthesis one. In this model the signal is assumed to have a -sparse representation under a given dictionary . Formally,

 x=Dα,  ∥α∥0≤k, (1)

where is the -pseudo norm that counts the number of non-zero entries in a vector. Notice, that the non-zero elements in corresponds to a set of columns that creates a low-dimensional subspace in which resides.

Recently, a new sparsity based model has been introduced: the analysis one elad07Analysis (); Nam12Cosparse (). In this framework, we look at the coefficients of , the coefficients of the signal after applying the transform on it. The sparsity of the signal is measured by the number of zeros in . We say that a signal is -cosparse if has zero elements. Formally,

 ∥Ωx∥0≤p−ℓ. (2)

Remark that each zero element in corresponds to a row in to which the signal is orthogonal and all these rows define a subspace to which the signal is orthogonal. Similar to synthesis, when the number of zeros is large the signal’s subspace is low dimensional. Though the zeros are those that define the subspace, in some cases it is more convenient to use the number of non-zeros as done in Candes11Compressed (); Vaiter13Robust ().

The main setup in which the above models have been used is

 y=Mx+e, (3)

where is a given set of measurements, is the measurement matrix and is an additive noise which is assumed to be either adversarial bounded noise Donoho06Stable (); Bruckstein09From (); Candes06Near (); Candes05Decoding () or with a certain given distribution such as Gaussian Candes07Dantzig (); Bickel09Simultaneous (); Giryes12RIP (). The goal is to recover from and this is the focus of our work. For details about other setups, the curious reader may refer to Harmany12Spiral (); Soltanolkotabi13Robust (); Giryes14Sparsity (); Giryes14SparsityBased (); Pang13Ranking (); Pang14Learning (); Pang10Robust ().

Clearly, without a prior knowledge it is impossible to recover from in the case , or have a significant denoising effect when is random with a known distribution. Hence, having a prior, such as the sparsity one, is vital for these tasks. Both the synthesis and the analysis models lead to (different) minimization problems that provide estimates for the original signal .

In synthesis, the signal is recovered by its representation, using

 ^αS−ℓ0=\operatornamewithlimitsargmin~α∈Rn∥~α∥0 s.t ∥y−MD~α∥2≤λe, (4)

where is an upper bound for if the noise is bounded and adversarial ( refers to synthesis-). Otherwise, it is a scalar dependent on the noise distribution Candes07Dantzig (); Bickel09Simultaneous (); Candes06Modern (). The recovered signal is simply . In analysis, we have the following minimization problem.

 ^xA−ℓ0=\operatornamewithlimitsargmin~x∈Rd∥Ω~x∥0 s.t ∥y−M~x∥2≤λe. (5)

The values of are selected as before depending on the noise properties ( refers to analysis-).

Both (4) and (5) are NP-hard problems Nam12Cosparse (); Davis97Adaptive (). Hence, approximation techniques are required. These are divided mainly into two categories: relaxation methods and greedy algorithms. In the first category we have the -relaxation elad07Analysis () and the Dantzig selector Candes07Dantzig (), where the latter has been proposed only for synthesis. The -relaxation leads to the following minimization problems for synthesis and analysis respectively111Note that setting to be the finite difference operator in (7) leads to the anisotropic version of the well-known total variation (TV) Rudin92TV (). See Needell13Stable (); Giryes15Sampling () for more details.:

 ^αS−ℓ1=\operatornamewithlimitsargmin~α∈Rn∥~α∥1 s.t ∥y−MD~α∥2≤λe, (6) ^xA−ℓ1=\operatornamewithlimitsargmin~x∈Rd∥Ω~x∥1 s.t ∥y−M~x∥2≤λe. (7)

Among the synthesis greedy strategies we mention the thresholding method, orthogonal matching pursuit (OMP) Chen89Orthogonal (); MallatZhang93 (); Pati93OMP (), CoSaMP Needell09CoSaMP (), subspace pursuit (SP) Dai09Subspace (), iterative hard thresholding Blumensath09Iterative () and hard thresholding pursuit (HTP) Foucart11Hard (). Their counterparts in analysis are thresholding Peleg12Performance (), GAP Nam12Cosparse (), analysis CoSaMP (ACoSaMP), analysis SP (ASP), analysis IHT (AIHT) and analysis HTP (AHTP) Giryes14Greedy ().

An important question to ask is what are the recovery guarantees that exist for these methods. One main tool that was used for answering this question in the synthesis context is the restricted isometry property Candes06Near (). It has been shown that under some conditions on the RIP of , we have in the adversarial bounded noise case that

 ∥∥^αalg−α∥∥22≤Calg∥e∥22, (8)

where is the recovered representation by one of the approximation algorithms and is a constant that depends on the RIP of and differs for each of the methods Donoho06Stable (); Candes06Near (); Needell09CoSaMP (); Dai09Subspace (); Blumensath09Iterative (); Foucart11Hard (); Zhang11Sparse (); foucart10Sparse (); Cai10New (). This result implies that these programs achieve a stable recovery.

Similar results were provided for the case where the noise is random white Gaussian with variance Candes07Dantzig (); Bickel09Simultaneous (); Giryes12RIP (); BenHaim09Coherence (). In this case the reconstruction error is guaranteed to be Candes07Dantzig (); Bickel09Simultaneous (); Giryes12RIP (). Unlike the adversarial noise case, here we may have a denoising effect, as the recovery error can be smaller than the initial noise power . Remark that the above results can be extended also to the case where we have a model mismatch and the signal is not exactly -sparse.

In the analysis framework we have similar guarantees for the adversarial noise case. However, since the analysis model treats the signal directly, the guarantees are in terms of the signal and not the representation like in (8). Two extensions for the RIP have been proposed providing guarantees for analysis algorithms. The first is the D-RIP Candes11Compressed ():

Definition 1.1 (D-Rip Candes11Compressed ())

A matrix has the D-RIP with a dictionary and a constant , if is the smallest constant that satisfies

 (1−δk)∥D~α∥22≤∥MD~α∥22≤(1+δk)∥D~α∥22, (9)

whenever is -sparse.

The D-RIP has been used for studying the performance of the analysis -minimization Candes11Compressed (); Liu12Compressed (); Kabanava13Analysis (). It has been shown that if is a frame with frame constants and , and has the D-RIP with then

 ∥∥^xA−ℓ1−x∥∥22≤CA−ℓ1(∥e∥22+∥Ωx−[Ωx]k∥21k), (10)

where the operator is a hard thresholding operator that keeps the largest elements in a vector, is a function of , and , and and are some constants. A similar result has been proposed for analysis -minimization with the finite difference operator Needell13Stable (); Giryes15Sampling ().

The second is the O-RIP Giryes14Greedy (), which was used for the study of the greedy-like algorithms ACoSaMP, ASP, AIHT and AHTP.

Definition 1.2 (O-Rip Giryes14Greedy ())

A matrix has the O-RIP with an operator and a constant , if is the smallest constant that satisfies

 (1−δΩ,ℓ)∥v∥22≤∥Mv∥22≤(1+δΩ,ℓ)∥v∥22, (11)

whenever has at least zeroes.

With the assumption that there exists a cosupport selection procedure that implies a near optimal projection for with a constant (see Definition 3.13 in Section 3). It has been proven for such operators that if then

 ∥∥^xA−ℓ1−x∥∥22≤Calg(∥e∥22+∥∥x−xℓ∥∥22), (12)

where is the largest singular value of , is the best -cosparse approximation for , is a function of , and , and and are some constants that differ for each technique.

Notice that the conditions in synthesis imply that no linear dependencies can be allowed within small number of columns in the dictionary as the representation is the focus. The existence of such dependencies may cause ambiguity in its recovery. Since the analysis model performs in the signal domain, i.e. focus on the signal and not its representation, dependencies may be allowed within the dictionary. A recent series of contributions have shown that high correlations can be allowed in the dictionary also in the synthesis framework if the signal is the target and not the representation Blumensath11Sampling (); Davenport13Signal (); Giryes13CanP0 (); Giryes13OMP (); Giryes13IHTconf (); Giryes14GreedySignal (); Giryes14NearOracle ().

1.1 Our Contribution

The conditions for greedy-like techniques require the constant to be close to . Having a general projection scheme with is NP-hard Tillmann14Projection (). The existence of a program with a constant close to one for a general operator is still an open problem. In particular, it is not known whether there exists a procedure that gives a small constant for frames. Thus, there is a gap between the results for the greedy techniques and the ones for the -minimization.

Another drawback of the existing analysis greedy strategies is their high complexity. All of them require applying a projection to an analysis cosparse subspace, which implies a high computational cost. Therefore, unlike in the synthesis case, they do not provide a “cheap” counterpart to the -minimization.

In this work we propose a new efficient greedy program, the transform domain IHT (TDIHT), which is an extension of IHT that operates in the analysis transform domain. Unlike AIHT, TDIHT has a low complexity, as it does not require applying computationally demanding projections like AIHT, and it inherits guarantees similar to the ones of analysis -minimization for frames. Given that and its pseudo-inverse can be applied efficiently, TDIHT demands in each iteration applying only , , the measurement matrix and its transpose together with other point-wise operations. This puts TDIHT as an efficient alternative to the existing analysis methods especially for big data problems, as for high dimensional problems all the existing techniques require solving computationally demanding high dimensional optimization problems Nam12Cosparse (); Candes11Compressed (); Giryes14Greedy (). The assumption about and holds for many types of operators such as curvelet, wavelets, Gabor transforms and discrete Fourier transform Candes11Compressed ().

Another gap exists between synthesis and analysis. To the best of our knowledge, no denoising guarantees has been proposed for analysis strategies apart from the work in Peleg12Performance () that examines the performance of thresholding for the case and Giryes14effective () that studies the analysis -minimization. We develop results for Gaussian noise in addition to the ones for adversarial noise, showing that it is possible to have a denoising effect using the analysis model also when and for different algorithms other than thresholding.

Our theoretical guarantees for TDIHT can be summarized by the following theorem:

Theorem 1.3 (Recovery Guarantees for TDIHT with Frames)

Let where and is a frame with frame constants and , i.e., and . For certain selections of and using only measurements, the recovery result of TDIHT satisfies

 ∥x−^x∥2 ≤ O(BA∥e∥2)

for the case where is an adversarial noise, implying that TDIHT leads to a stable recovery. For the case that is random i.i.d zero-mean Gaussian distributed noise with a known variance we have

 E∥x−^x∥22 ≤ O(B2A2klog(p)σ2)

implying that TDIHT achieves a denoising effect.

Note that .

Remark 1.5

Using Remark 2.3 in Needell09CoSaMP (), we can convert the norm into an norm in the model mismatch terms in (1.3) and (1.3), turning it to be more similar to what we have in the bound for analysis -minimization in (10).

Remark 1.6

Theorem 1.3 is a combination of Theorems 4.14 and 4.17, plugging the minimal number of measurements implied by the D-RIP conditions of these theorems. Measurement matrices with sub-Gaussian entries are examples for matrices that satisfy this number of measurements Candes11Compressed ().

1.2 Organization

This paper is organized as follows. Section 2 includes the notations used in this paper together with some preliminary lemmas for the D-RIP. Section 3 presents the transform domain IHT. Section 4 provides the proof of our main theorem. In Section 5 we present some simulations demonstrating the efficiency of TDIHT and its applicability to big data. Section 6 concludes the paper.

2 Notations and Preliminaries

We use the following notation in our work:

• We denote by the euclidean norm for vectors and the spectral () norm for matrices; by the norm that sums the absolute values of a vector; and by the pseudo-norm which counts the number of non-zero elements in a vector.

• Given a cosupport set , is a sub-matrix of with the rows that belong to .

• In a similar way, for a support set , is a sub-matrix of with columns222By the abuse of notation we use the same notation for the selection sub-matrices of rows and columns. The selection will be clear from the context since in analysis the focus is always on the rows and in synthesis on the columns. corresponding to the set of indices in .

• keeps the elements in supported on and zeros the rest.

• returns the support of a vector and returns the support set of elements with largest magnitudes.

• denotes the Moore-Penrose pseudo-inverse Moore1920PseudoInverse ().

• is the orthogonal projection onto the orthogonal complement of .

• is the orthogonal projection onto .

• Throughout the paper we assume that .

• The original unknown -cosparse vector is denoted by , its cosupport by and the support of the non-zero entries by . By definition and .

• For a general -cosparse vector we use , for a general vector in the signal domain and for a general vector in the analysis transform or dictionary representation domain .

We now turn to present several key properties of the D-RIP. All of their proofs except of the last one, which we present hereafter, appear in Giryes14GreedySignal ().

Corollary 2.7

If satisfies the D-RIP with a constant then

 ∥MPT∥22≤1+δk (15)

for every such that .

Lemma 2.8

For it holds that .

Lemma 2.9

If satisfies the D-RIP then

 ∥PT(I−M∗M)PT∥2≤δk (16)

for any such that .

Corollary 2.10

If satisfies the D-RIP then

 ∥∥PT1(I−M∗M)PT2∥∥2≤δk, (17)

for any and such that .

The last Lemma we present is a generalization of Proposition 3.5 in Needell09CoSaMP ().

Lemma 2.11

Suppose that satisfies the upper inequality of the D-RIP, i.e.,

 ∥MDw∥2≤√1+δk∥Dw∥2 ∀w,∥w∥0≤k, (18)

and that . Then for any representation we have

 ∥MDw∥2≤√1+δkA(∥w∥2+1√k∥w∥1). (19)

The proof is left to Appendix A. Before we proceed we recall the problem we aim at solving:

Definition 2.12 (Problem P)

Consider a measurement vector such that , where is either -cosparse under a given and fixed analysis operator or almost -cosparse, i.e. has leading elements. The non-zero locations of the leading elements is denoted by . is a degradation operator and is an additive noise. Our task is to recover from . The recovery result is denoted by .

3 Transform Domain Iterative Hard Thresholding

Our goal in this section is to provide a greedy-like approach that provide guarantees similar to the one of analysis -minimization. By analyzing the latter we notice that though it operates directly on the signal, it actually minimizes the coefficients in the transform domain. In fact, all the proof techniques utilized for this recovery strategy use the fact that nearness in the analysis dictionary domain implies the same in the signal domain Candes11Compressed (); Needell13Stable (); Liu12Compressed (). Using this fact, recovery guarantees have been developed for tight frames Candes11Compressed (), general frames Liu12Compressed () and the 2D finite difference operator which corresponds to TV Needell13Stable (). Working in the transform domain is not a new idea and was used before, especially in the context of dictionary learning Ophir11Multi (); Ravishankar11MR (); Ravishankar13Sparsifying ().

Henceforth, our strategy for extending the results of the -relaxation is to modify the greedy-like approaches to operate in the transform domain. In this paper we concentrate on IHT. Before we turn to present the transform domain version of IHT we recall its synthesis and analysis versions.

3.1 Quick Review of IHT and AIHT

IHT and AIHT are assumed to know the cardinalities and respectively. They aim at approximating variants of (4) and (5):

 \operatornamewithlimitsargmin~α∥y−MD~α∥22 s.t. ∥~α∥0≤k, (20)

and

 \operatornamewithlimitsargmin~x∥y−M~x∥22 s.t. ∥Ω~x∥0≤p−ℓ. (21)

IHT aims at recovering the representation and uses only one matrix in the whole recovery process. AIHT targets the signal and utilizes both and . For recovering the signal using IHT, one has . IHT Blumensath09Iterative () and AIHT Giryes14Greedy () are presented in Algorithms 1 and 2 respectively.

The iterations of IHT and AIHT are composed of two basic steps. In both of them the first is a gradient step, with a step size , in the direction of minimizing . The second step of IHT projects to the closest -sparse subspace by keeping the elements with the largest magnitudes. In AIHT a cosupport selection procedure, , is used for the cosupport selection and then orthogonal projection onto the corresponding orthogonal subspace is performed. In the theoretical study of AIHT this procedure is assumed to apply a near optimal projection:

Definition 3.13

A procedure implies a near-optimal projection with a constant if for any

 ∥∥z−Q^Sℓ(z)z∥∥22≤CℓminΛ∥z−QΛz∥22. (22)

In the algorithms’ description we neither specify the stopping criterion, nor the step size selection technique. For exact details we refer the curious reader to Giryes14Greedy (); Blumensath09Iterative (); Kyrillidis11Recipes (); Cevher11ALPS ().

3.2 Transform Domain Analysis Greedy-Like Method

The drawback of AIHT for handling analysis signals is that it assumes the existence of a near optimal cosupport selection scheme . The type of analysis dictionaries for which a known feasible cosupport selection technique exists is very limited Giryes14Greedy (); Tillmann14Projection (). Note that this limit is not unique only to the analysis framework Blumensath11Sampling (); Davenport13Signal (); Giryes13IHTconf (). Of course, it is possible to use a cosupport selection strategy with no guarantees on its near-optimality constant and it might work well in practice. For instance, simple hard thresholding has been shown to operate reasonably well in several instances where no practical projection is at hand Giryes14Greedy (). However, the theoretical performance guarantees depend heavily on the near-optimality constant. Since for many operators there are no known selection schemes with small constants, the existing guarantees for AIHT, as well as the ones of the other greedy-like algorithms, are very limited. In particular, to date, they do not cover frames and the 2D finite difference operator as the analysis dictionary. Moreover, even when an efficient optimal selection procedure exists AIHT is required to apply a projection to the selected subspace which in many cases is computationally demanding.

To bypass these problems we propose an alternative greedy approach for the analysis framework that operates in the transform domain instead of the signal domain. This strategy aims at finding the closest approximation to and not to using the fact that for many analysis operators proximity in the transform domain implies the same in the signal domain. In some sense, this approach imitates the classical synthesis techniques that recover the signal by putting the representation as the target.

In Algorithm 3 an extension for IHT for the transform domain is proposed. This algorithm makes use of , the number of non-zeros in , and , a dictionary satisfying . One option for is . If does not have a full row rank, we may compute by adding to rows that resides in its rows’ null space and then applying the pseudo-inverse. For example, for the 2D finite difference operator we may calculate by computing its pseudo inverse with an additional row composed of ones divided by . However, this option is not likely to provide good guarantees. As the 2D finite difference operator is beyond the scope of this paper we refer the reader to Needell13Stable () for more details on this subject.

Indeed, for many operators , there are infinitely many options for such that . When Algorithm 3 forms the final solution , different choices of will lead to different results since may not belong to . Therefore, in our study of the algorithm in the next section we focus on the choice of as a frame and .

For the step size we can use several options. The first one it use a constant step size for all iterations. The second one is to use an ‘optimal’ step size that minimizes the fidelity term

 μt=\operatornamewithlimitsargminμ∥∥y−MD^wt∥∥22. (23)

Since and the above minimization problem does not have a close form solution and a line search for may involve a change of several times. Instead, one may follow Kyrillidis11Recipes () and limits the minimization to the support . In this case we have

 μt = = (y−MD~TΩ~TD^wt−1)∗MD~TΩ~TM∗(y−MD^wt−1)∥∥MD~TΩ~TM∗(y−MD^wt−1)∥∥22.

We denote this selection technique as the adaptive changing step-size selection.

In this paper we focus in the theoretical part on TDIHT with . Our results can be easily extended also to the other step-size selection options using the proof techniques in Giryes14Greedy (); Kyrillidis11Recipes (). Naturally, TDIHT with an adaptive changing step-size selection behaves better than TDIHT with a constant . Therefore, in Section 5 we demonstrate the performance of the former.

4 Frame Guarantees

We provide theoretical guarantees for the reconstruction performance of the transform domain analysis IHT (TDIHT), with a constant step size , for frames. We start with the case that the noise is adversarial.

Theorem 4.14 (Stable Recovery of TDIHT with Frames)

Consider the problem and apply TDIHT with a constant step-size and . Suppose that is a bounded adversarial noise, is a frame with frame constants such that and , and has the D-RIP with the dictionary and a constant . If (i.e. ), then after a finite number of iterations

 ∥∥x−^xt∥∥2 ≤ 2ηA+2B√1+δ2kA(1+1−ρt1−ρ)∥e∥2 +2(1+δ2k)A21−ρt1−ρ((1+A+A22)∥ΩTCx∥2 +1√k∥ΩTCx∥1 ),

implying that TDIHT leads to a stable recovery. For tight frames and for other frames .

The result of this theorem is a generalization of the one presented in foucart10Sparse () for IHT. Its proof follows from the following lemma.

Lemma 4.15

Consider the same setup of Theorem 4.14. Then the -th iteration of TDIHT satisfies

 (26) +1+δ2kA((1+A+A22)∥ΩTCx∥2+1√k∥ΩTCx∥1) +∥∥ΩT∪^TtM∗e∥∥2.

The proof of the above lemma is left to Appendix B. We turn now to prove Theorem 4.14.

Proof:[Proof of Theorem 4.14] First notice that implies that . Using Lemma 4.15, recursion and the definitions of and , we have that after iterations

 ∥∥(Ωx−wtg)T∪^Tt∥∥2≤ρt∥∥(Ωx−w0g)T∪^T0∥∥2 (27) +(1+ρ+ρ2+⋯+ρt−1)(∥ΩT∪TeM∗e∥2 +1+δ2kA((1+A)∥ΩTCx∥2+1√k∥ΩTCx∥1)) =ρt∥ΩTx∥2+1−ρt1−ρ(∥ΩT∪TeM∗e∥2+ 1+δ2kA((1+A)∥ΩTCx∥2+1√k∥ΩTCx∥1)),

where the last equality is due to the equation of geometric series () and the facts that and . For a given precision factor we have that if then

 (28)

As and , we have using matrix norm inequality

 ∥∥x−^xt∥∥2=∥∥D(Ωx−^wt)∥∥2≤1A∥∥Ωx−^wt∥∥2. (29)

Using the triangle inequality and the facts that is supported on and , we have

 (30)

By using again the triangle inequality and the fact that is the best -term approximation for we get

 ∥∥x−^xt∥∥2 ≤ 1A∥ΩTCx∥2+1A∥∥(Ωx−wtg)T∪^Tt∥∥2 +1A∥∥(wtg−^wt)T∪^Tt∥∥2 ≤ 1A∥ΩTCx∥2+2A∥∥(Ωx−wtg)T∪^Tt∥∥2.

Plugging (28) and (27) in (4) yields

 ∥∥x−^xt∥∥2≤2ηA+2A(1+1−ρt1−ρ)∥ΩT∪TeM∗e∥2 (32) +2(1+δ2k)A21−ρt1−ρ((1+A+A22)∥ΩTCx∥2 +1√k∥ΩTCx∥1).

Using the D-RIP and the fact that is a frame we have that and this completes the proof.

Having the result for the adversarial noise case we turn to give a bound for the case where a distribution of the noise is given. We dwell on the white Gaussian noise case. For this type of noise we make use of the following lemma.

Lemma 4.16

If is a zero-mean white Gaussian noise with a variance then

 E[max|T|≤k∥ΩTM∗e∥22]≤4maxi∥Ω∗i∥22(1+δ1)klog(p)σ2. (33)

Proof: First notice that the -th entry in is Gaussian distributed random variable with zero-mean and variance . Denoting by a diagonal matrix such that , we have that each entry in is Gaussian distributed with variance . Therefore, using Theorem 2.4 from Giryes12RIP () we have . Using the D-RIP we have that . Since the matrix norm of a diagonal matrix is the maximal diagonal element we have that and this provides the desired result.

Theorem 4.17 (Denoising Performance of TDIHT with Frames)

Consider the problem and apply TDIHT with a constant step-size . Suppose that is an additive white Gaussian noise with a known variance (i.e. for each element , is a frame with frame constants such that and , and has the D-RIP with the dictionary and a constant . If (i.e. ), then after a finite number of iterations

 E∥∥x−^xt∥∥22≤32(1+δ1)B2A2(1+1−ρt1−ρ)2klog(p)σ2 (34) +8(1+δ2k)2A4(1−ρt1−ρ)2((1+A+A22)∥ΩTCx∥2 +1√k∥ΩTCx∥1)2,

implying that TDIHT has a denoising effect. For tight frames and for other frames .

Proof: Using the fact that for tight frames , we have using Lemma 4.16 that

 E∥ΩT∪TeM∗e∥22≤4B2(1+δ1)klog(p)σ2. (35)

Plugging this in a squared version of (32) (with ), using the fact that for any two constants we have , leads to the desired result.

4.1 Result’s Optimality

Notice that in all our conditions, the number of measurements we need is and it is not dependent explicitly on the intrinsic dimension of the signal. Intuitively, we would expect the minimal number of measurements to be rather a function of , where (refer to Nam12Cosparse (); Giryes14Greedy () for more details), and henceforth our recovery conditions seems to be sub-optimal.

Indeed, this is the case with the analysis -minimization problem Nam12Cosparse (). However, all the guarantees developed for feasible programs Candes11Compressed (); Giryes14Greedy (); Needell13Stable (); Liu12Compressed (); Kabanava13Analysis () require at least measurements. Apparently, such conditions are too demanding because , which can be much smaller than , does not play any role in them. However, it seems that it is impossible to robustly reconstruct the signal with fewer measurements Giryes14effective ().

The same argument may be used for the denoising bound we have for TDIHT (), saying that we would expect it to be . Interestingly, even the solution cannot achieve the latter bound but only a one which is a function of the cosparsity Giryes14effective ().

In this work we developed recovery guarantees for TDIHT with frames. These guarantees close a gap between relaxation-based techniques and greedy algorithms in the analysis framework, and extend the denoising guarantees of synthesis methods to analysis. It is interesting to ask whether these can be extended to other operators such as the 2D finite difference or to other methods such as AIHT or an analysis version of the Dantzig selector. The core idea in this work is the connection between the signal domain and the transform domain. We believe that the relationships used in this work can be developed further, leading to other new results and improving existing techniques. Some results in this direction appear in Giryes15Sampling ().

5 Experiments

In this section we repeat some of the experiments performed in Nam12Cosparse (); Giryes14Greedy (); Nam11GAPN (). We start with synthetic signals in the noiseless case. We test the performance of TDIHT with an adaptive changing step-size and compare it with AIHT, AHTP, ASP, ACoSaMP Giryes14Greedy (), -minimization elad07Analysis () and GAP Nam12Cosparse (). As there are several possibilities for setting the parameters in AIHT, AHTP, ASP and ACoSaMP we select the ones that provide the best performance according to Giryes14Greedy ().

We draw a phase transition diagram Donoho09countingfaces () for each of the algorithms. We use a random matrix and a random tight frame with and , where each entry in the matrices is drawn independently from the Gaussian distribution. We test different possible values of and different values of and for each pair repeat the experiment times. In each experiment we check whether we have a perfect reconstruction. White cells in the diagram denote a perfect reconstruction in all the experiments of the pair and black cells denotes total failure in the reconstruction. The values of and are selected according to the following formula:

 m=δd ℓ=d−ρm, (36)

where , the sampling rate, is the x-axis of the phase diagram and , the ratio between the cosparsity of the signal and the number of measurements, is the y-axis.

Figure 1 compares the reconstruction results of TDIHT with an adaptive changing step-size with the other algorithms. It should be observed that TDIHT outperforms AIHT and provides comparable results with AHTP, where TDIHT is more computationally efficient than both of them. Though its performance is inferior to the other algorithms, it should be noted that its running time is faster by one order of magnitude compared to all the other techniques. Therefore, in the cases that it succeeds, it should be favored over the other programs due to its better running time.

We turn now to test TDIHT for high dimensional signals. We test the performance of several MRI images: the Shepp-Logan phantom, FLAIT brain image, T2 Sagittal view of the lumbar spine and the circle of Willis. The first image is of size , while the other are of size . They are all presented in Fig. 2.

We focus on the recovery of these images from a few number of Fourier measurements. With set to be the undecimated Haar transform with one level of resolution (redundancy four) and its inverse transform, we succeed to recover the phantom image using only sampled radial lines, which is only of the measurements. This number is only slightly larger than the number needed for GAP, relaxed ASP (RASP) and Relaxed ACoSaMP (RACoSaMP) in Nam12Cosparse (); Giryes14Greedy (). The advantage of TDIHT over these methods is its low complexity as it requires applying only and its conjugate and and its inverse transform while in the other algorithms a high dimensional least squares minimization problem should be solved. Note also that for AIHT and RAHTP the number of radial lines needed for recovery is and for IHT (with the decimated Haar operator with one level of resolution) we need more than radial lines.

Exploring the noisy case, we perform a reconstruction using TDIHT of a noisy measurement of the phantom with radial lines and signal to noise ratio (SNR) of . Figure 3(a) presents the naive recovery from noisy image, the result of applying inverse Fourier transform on the measurements with zero-padding, and Fig. 3(b) presents the TDIHT reconstruction result. We get a peak SNR (PSNR) of . Note that RASP and GAPN require radial lines to get the same PSNR. However, this comes at the cost of higher computational complexity.

We perform similar experiments for the other images. Instead of uniformly sampling radial lines, we use pseudo-random variable-density undersampling patterns Lustig07Sparse ()333Unlike Lustig07Sparse () we perform our experiments with real (non-complex) images. The sampling patterns we use can be downloaded from http://www.eecs.berkeley.edu/mlustig/CS.html. presented in Fig. 4. We use var. dens. 1 with FLAIT brain and the circle of Willis and var. dens. 2 with lumber spine.

As the images at hand are only approximately cosparse, we set a threshold, in the noiseless case and in the noisy one, such that each element in the cosparse representation below it is considered as zero. These thresholds create a model error in the recovery but provide a larger cosparsity value that eases the recovery. Moreover, in the noisy case, it is natural to set such a threshold since anyway small representation coefficients are being covered by the noise.

Table 1 summarizes the recovery performance, in terms of PSNR, for the FLAIT brain, lumbar spine and circle of Willis images both for the noiseless and noisy cases. To evaluate the performance of TDIHT in the noiseless case, we compare its PSNR with the one of the model error (which we get by applying on the original image followed by thresholding and multiplication by ). For FLAIT brain and lumber spine we get errors, which are comparable, and even better for the latter, to their model errors and . Note that such high PSNRs are equivalent to mean squared errors of the order of and therefore we may say that we achieve a perfect recovery in these cases. This is not the case for the circle of Willis, in which the error is worse than the model error . Note though that for this image, as well as for the other two images, we get better recovery error than the more sophisticated methods RASP and GAP. Even in the noisy case, the quality we get with TDIHT is better than the one we have using those methods.

Figs. 5 presents the reconstruction outcome of TDIHT in the noisy case. To illustrate better the recovery gain, we present a zoom in on a part of the image and compare it to the naive recovery. The improvement can be seen clearly in all the three images.

6 Conclusion

This paper presents a new algorithm, the transform domain IHT (TDIHT), for the cosparse analysis framework. In contrast to previous algorithms, TDIHT can be applied efficiently in high dimensional problems due to the fact that it requires applying only the measurement matrix , its transpose , the operator and its inverse together with point-wise operations. The proposed algorithm is shown to provide stable recovery given that has the D-RIP and that