Stable and robust sampling strategies for compressive imaging

Stable and robust sampling strategies for compressive imaging

Felix Krahmer and Rachel Ward
Abstract

In many signal processing applications, one wishes to acquire images that are sparse in transform domains such as spatial finite differences or wavelets using frequency domain samples. For such applications, overwhelming empirical evidence suggests that superior image reconstruction can be obtained through variable density sampling strategies that concentrate on lower frequencies. The wavelet and Fourier transform domains are not incoherent because low-order wavelets and low-order frequencies are correlated, so compressive sensing theory does not immediately imply sampling strategies and reconstruction guarantees. In this paper we turn to a more refined notion of coherence – the so-called local coherence – measuring for each sensing vector separately how correlated it is to the sparsity basis. For Fourier measurements and Haar wavelet sparsity, the local coherence can be controlled and bounded explicitly, so for matrices comprised of frequencies sampled from a suitable inverse square power-law density, we can prove the restricted isometry property with near-optimal embedding dimensions. Consequently, the variable-density sampling strategy we provide allows for image reconstructions that are stable to sparsity defects and robust to measurement noise. Our results cover both reconstruction by -minimization and by total variation minimization. The local coherence framework developed in this paper should be of independent interest in sparse recovery problems more generally, as it implies that for optimal sparse recovery results, it suffices to have bounded average coherence from sensing basis to sparsity basis – as opposed to bounded maximal coherence – as long as the sampling strategy is adapted accordingly.

1 Introduction

The measurement process in a wide range of imaging applications such as radar, sonar, astronomy, and computer tomography, can be modeled – after appropriate approximation and discretization – as taking samples from weighted discrete Fourier transforms [19]. Similarly, it is well known in the medical imaging literature that the measurements taken in Magnetic Resonance Imaging (MRI) are well modeled as Fourier coefficients of the desired image. Within all of these scenarios, one seeks strategies for taking frequency domain measurements so as to reduce the number of measurements without degrading the quality of image reconstruction. A central feature of natural images that can be exploited in this process is that they allow for approximately sparse representation in suitable bases or dictionaries.

The theory of compressive sensing, as introduced in [17, 11], fits comfortably into this set-up: its key observation is that signals which allow for a sparse or approximately sparse representation can be recovered from relatively few linear measurements via convex approximation, provided these measurements are sufficiently incoherent with the basis in which the signal is sparse.

1.1 Imaging with partial frequency measurements

Much work in compressive sensing has focused on the setting of imaging with frequency domain measurements [11, 10, 39] and, in particular, towards accelerating the MRI measurement process [27, 26]. For images that have sparse representation in the canonical basis, the incoherence between Fourier and canonical bases implies that uniformly subsampled discrete Fourier transform measurements can be used to achieve near-optimal oracle reconstruction bounds: up to logarithmic factors in the discretization size, any image can be approximated from such frequency measurements up to the error that would be incurred if the image were first acquired in full, and then compressed by setting all but the largest-magnitude pixels to zero [39, 36, 10, 12]. Compressive sensing recovery guarantees hold more generally subject to incoherence between sampling and sparsity transform domains. Unfortunately, natural images are generally not directly sparse in the standard basis, but rather with respect to transform domains more closely resembling wavelet bases. As low-scale wavelets are highly correlated (coherent) with low frequencies, sampling theorems for compressive imaging with partial Fourier transform measurements have remained elusive.

A number of empirical studies, including the very first papers on compressive sensing MRI [27, 26], suggest that better image restoration is possible by subsampling frequency measurements from variable densities preferring low frequencies to high frequencies. In fact, variable-density MRI sampling had been proposed previously in a number of works outside the context of compressive sensing, although there did not seem to be a consensus on an optimal density [28, 41, 43, 33, 22]. For a thorough empirical comparison of different densities from the compressive sensing perspective, see [44].

Guided by these observations, the authors of [35] estimated the coherence between each element of the sensing basis with elements of the sparsity basis separately as a means to derive optimal sampling strategies in the context of compressive sensing MRI. In particular, they observe that incoherence-based results in compressive sensing imply exact recovery results for more general systems if one samples a row from the measurement basis proportionally to its squared maximal correlation with the sparsity-inducing basis. For given problem dimensions, they find the optimal distribution as the solution to a convex problem. In [34], the approach of optimizing the sampling distribution is combined with premodulation by a chirp, which by itself is another measure to reduce the coherence [5, 25]. A similar variable-density analysis already appeared in [38] in the context of sampling strategies and reconstruction guarantees for functions with sparse orthogonal polynomial expansions and will also be the guiding strategy of this paper (cf. Section 5). After the submission of this paper, the idea of variable-density sampling has been extended to the context of block sampling [6], motivated by practical limitations of MRI hardware.

1.2 Contributions of this paper

In this paper we derive near-optimal reconstruction bounds for a particular type of variable-density subsampled discrete Fourier transform for both wavelet sparsity and gradient sparsity models. More precisely, up to logarithmic factors in the discretization size, any image can be approximated from such measurements up to the error that would be incurred if the wavelet transform or the gradient of the image, respectively, were first computed in full, and then compressed by setting all but the largest-magnitude coefficients to zero. Note that the reconstruction results that have been derived for uniformly-subsampled frequency measurements [11, 18] only provide such guarantees for images which are exactly sparse.

A major role in determining an appropriate sampling density will be played by the local coherence of the sensing basis with respect to the sparsity basis, as introduced in Section 5. Consequently, an important ingredient of our analysis is Theorem 6.2, which provides frequency-dependent bounds on inner products between rows of the orthonormal discrete Fourier transform and rows of the orthonormal discrete Haar wavelet transform. In particular, the maximal correlation between a fixed row in the discrete Fourier transform and any row of the discrete Haar wavelet transform decreases according to an inverse power law of the frequency, and decays sufficiently quickly that the sum of squared maximal correlations scales only logarithmically with the discretization size . This implies, according to the techniques used in [38, 37, 7], that subsampling rows of the discrete Fourier matrix proportionally to the squared correlation results in a matrix that has the restricted isometry property of near-optimal order subject to appropriate rescaling of the rows.

For reconstruction, total variation minimization [40, 8, 32, 42, 14, 13] will be our algorithm of choice. In the papers [30] and [31], total variation minimization was shown to provide stable and robust image reconstruction provided that the sensing matrix is incoherent with the Haar wavelet basis. Following the approach of [30], we prove that from variable density frequency samples, total variation minimization can be used for stable image recovery guarantees.

1.3 Outline

The remainder of this paper is organized as follows. Preliminary notation is introduced in Section 2. The main results of this paper are contained in Section 3. Section 4 reviews compressive sensing theory and Section 5 presents recent results on sampling strategies for coherent systems. The main results on the coherence between Fourier and Haar wavelet bases are provided in Section 6, and proofs of the main results are contained in Section 7. Section 8 illustrates our results by numerical examples. We conclude with a summary and a discussion of open problems in Section 9.

2 Preliminaries

2.1 Notation

In this paper, we consider discrete images, that is, blocks of pixels, and represent them as discrete functions . We write to denote any particular pixel value and for a positive integer , we denote the set by . By we denote the Hadamard product, i.e., the image resulting from pointwise products of the pixel values, . On the space of such images, the vector norm is given by , and . The inner product inducing the vector norm is , where denotes the complex conjugate of number . By an abuse of notation, the “-norm” counts the number of non-zero entries of .

An image is called -sparse if . The error of best -term approximation of an image in is defined as

Clearly, if is -sparse. Informally, is called compressible if decays quickly as increases.

For two nonnegative functions and on the real line, we write (or ) if there exists a constant such that (or , respectively) for all .

The discrete directional derivatives of are defined pixel-wise as

(2.1)
(2.2)

The discrete gradient transform is defined in terms of the directional derivatives via

(2.3)

where the directional derivatives are extended to by adding zero entries. The total variation semi-norm is the norm of the image gradient,

(2.4)

Here we note that our definition is the anisotropic version of the total variation semi-norm. The isotropic total variation semi-norm becomes the sum of terms

The isotropic and anisotropic total variation semi-norms are thus equivalent up to a factor of .

2.2 Bases for sparse representation and measurements

The Haar wavelet basis is a simple basis which allows for good sparse approximations of natural images. We will work primarily in two dimensions, but first introduce the univariate Haar wavelet basis as it will nevertheless serve as a building block for higher dimensional bases.

Definition 2.1 (Univariate Haar wavelet basis).

The univariate discrete Haar wavelet system is an orthonormal basis of consisting of the constant function , the step function given by

along with the dyadic step functions

for satisfying and .

To define the bivariate Haar wavelet basis of , we extend the univariate system by the window functions

The bivariate Haar wavelet system can now be defined via tensor products of functions in the extended univariate system. In order for the system to form an orthonormal basis of only tensor products of univariate functions with the same scaling parameter are included.

Definition 2.2 (Bivariate Haar wavelet basis).

The bivariate Haar system of consists of the constant function given by

and the functions with indices in the range , , and
given by

We denote by the bivariate Haar transform and, by a slight abuse of notation, also the unitary matrix representing this linear map.

We will also work with discrete Fourier measurements.

Definition 2.3 (Discrete Fourier basis).

Let . The one-dimensional discrete Fourier system is an orthonormal basis of consisting of the vectors

(2.5)

indexed by discrete frequencies in the range . The two-dimensional discrete Fourier basis of is just a tensor product of one-dimensional bases, namely

(2.6)

indexed by discrete frequencies in the range

We denote by the two-dimensional discrete Fourier transform and, again, also the associated unitary matrix. Finally, we denote by its restriction to a set of frequencies .

3 Main results

Our main results say that appropriate variable density subsampling of the discrete Fourier transform will with high probability produce a set of measurements admitting stable image reconstruction via total variation minimization or -minimization.

While our recovery guarantees are robust to measurement noise, our guarantees are based on a weighted -norm such that high-frequency measurements have higher sensitivity to noise. This noise model results from the proof; empirical studies, however, suggest that the more standard uniform noise model yields superior performance. We refer the reader to Section 8 for details. Our first result concerns stable recovery guarantees for total variation minimization.

Theorem 3.1.

Fix integers and such that and

(3.1)

Select frequencies i.i.d. according to

Prob
(3.2)

where is an absolute constant and is chosen such that is a probability distribution.
Consider the weight vector with , and assume that the noise vector satisfies , for some . Then with probability exceeding , the following holds for all images :

Given noisy partial Fourier measurements , the estimation

(3.3)

approximates up to the noise level and best -term approximation error of its gradient:

(3.4)

Disregarding measurement noise, the error rate provided in Theorem 3.1 (and also the one in Theorem 3.2 below) is optimal up to logarithmic factors in the ambient image dimension. This follows from classical results about the Gel’fand width of the -ball due to Kashin [24] and Garnaev–Gluskin [20]. As mentioned above, our noise model is non-standard, so the behavior for noisy signals is not covered by these lower bounds.

Our second result focuses on stable image reconstruction by -minimization in the Haar wavelet transform domain. It is a direct consequence of applying the Fourier-wavelet incoherence estimates derived in Theorem 5.2 to Theorem 6.2.

Theorem 3.2.

Fix integers and such that and

(3.5)

Select frequencies i.i.d. according to the density as in (3.1) and assume again that the noise vector satisfies the weighted -constraint with weight and noise level as in Theorem 3.1. Then with probability exceeding , the following holds for all images : Given noisy measurements , the estimation

(3.6)

approximates up to the noise level and best -term approximation error in the bivariate Haar basis:

(3.7)

Even though the required number of samples in Theorem 3.2 is smaller than the number of samples required for the total variation minimization guarantees in Theorem 3.1, one finds that total variation minimization requires fewer measurements empirically. This may be due to the fact that the gradient of a natural image has stronger sparsity than its Haar wavelet representation. For this reason we focus on total variation minimization. Independent of this observation, we strongly suspect that the additional logarithmic factors in the number of measurements stated in Theorem 3.1 are an artifact of the proof, and that it should be possible to strengthen the result to obtain a similar recovery guarantee with the number of measurements as in Theorem 3.2. Moreover, one should be able to reduce the number of necessary log-factors with a RIP-less approach [9]. These are important follow-up questions, as the current number of logarithmic factors may limit the direct applicability of our results to practical problems.

4 Compressive sensing background

4.1 The restricted isometry property

Under certain assumptions on the matrix and the sparsity level , any -sparse can be recovered from as the solution to the optimization problem:

One of the fundamental results in compressive sensing is that this optimization problem, which is NP-hard in general, can be relaxed to an -minimization problem if one asks that the matrix restricted to any subset of columns be well-conditioned. This property is quantified via the so-called restricted isometry property as introduced in [12]:

Definition 4.1 (Restricted isometry property).

Let . For , the restricted isometry constant associated to is the smallest number for which

(4.1)

for all -sparse vectors . If , one says that has the restricted isometry property (RIP) of order and level .

The restricted isometry property ensures stability: not only sparse vectors, but also compressible vectors can be recovered from the measurements via -minimization. It also ensures robustness to measurement errors.

Proposition 4.2 (Sparse recovery for RIP matrices).

Assume that the restricted isometry constant of satisfies . Let and assume noisy measurements with . Then

(4.2)

satisfies

(4.3)

In particular, reconstruction is exact, , if is -sparse and .

There are stronger versions of this result which allow for weaker constraints on the restricted isometry constant [29]. However, our version is a corollary of the following proposition, which appears as Proposition 2 in [30], and generalizes the results from [11]. This proposition will also play an important role in the proof of our main results.

Proposition 4.3 (Stable recovery for RIP matrices, [30]).

Suppose that and satisfies the restricted isometry property of order at least and level , and suppose that satisfies a tube constraint

Suppose further that for a subset of cardinality , the signal satisfies a cone constraint

(4.4)

Then

(4.5)

Indeed, Proposition 4.2 follows from Proposition 4.3 by noting that the minimality of implies a cone constraint for the residual over the support of the largest-magnitude entries of . The proof of Proposition 4.3 can be found in [30].

4.2 Bounded orthonormal systems

While the strongest known results on the restricted isometry property concern random matrices with independent entries such as Gaussian or Bernoulli, a scenario that has proven particularly useful for applications is that of structured random matrices with rows chosen from a basis incoherent to the basis inducing sparsity (see below for a detailed discussion on the concept of incoherence). The resulting sampling schemes correspond to bounded orthonormal systems, and such systems have been extensively studied in the compressive sensing literature (see [36] for an expository article including many references).

Definition 4.4 (Bounded orthonormal system).

Consider a set equipped with probability measure .

  • A set of functions is called an orthonormal system with respect to if , where denotes the Kronecker delta.

  • An orthonormal system is said to be bounded with bound if .

For example, the basis of complex exponentials forms a bounded orthonormal system with optimally small constant with respect to the uniform measure on , and -dimensional tensor products of complex exponentials form bounded orthonormal systems with respect to the uniform measure on the set . A random sample of an orthonormal system is the vector , where is a random variable drawn according to the associated distribution . Any matrix whose rows are independent random samples of a bounded orthonormal system, such as the uniformly subsampled discrete Fourier matrix, will have the restricted isometry property:

Proposition 4.5 (RIP for bounded orthonormal systems, [36]).

Consider the matrix whose rows are independent random samples of an orthonormal system , with bound and orthogonalization measure . If

(4.6)

for some 111For matrices consisting of uniformly subsampled rows of the discrete Fourier matrix, it has been shown in [15] that this constraint is not necessary., then with probability at least the restricted isometry constant of satisfies .

An important special case of a bounded orthonormal system arises by sampling a function with sparse representation in one basis using measurements from a different, incoherent basis. The mutual coherence between a unitary matrix with rows and a unitary matrix with rows is given by

The smallest possible mutual coherence is , as realized by the discrete Fourier matrix and the identity matrix. We call two orthonormal bases and mutually incoherent if or . In this case, the rows of the basis constitute a bounded orthonormal system with respect to the uniform measure. Propositions 4.5 and 4.2 then imply that, with high probability, signals of the form for sparse can be stably reconstructed from uniformly subsampled measurements , as has the restricted isometry property.

Corollary 4.6 (RIP for incoherent systems, [39]).

Consider orthonormal bases with mutual coherence bounded by Fix and integers , and such that and

(4.7)

Consider the matrix formed by uniformly subsampling rows i.i.d. from the the matrix . Then with probability at least the restricted isometry constant of satisfies .

5 Local coherence

The sparse recovery results in Corollary 4.6 based on mutual coherence do not take advantage of the full range of applicability of bounded orthonormal systems. As argued in [38], Proposition 4.5 implies comparable sparse recovery guarantees for a much wider class of sampling/sparsity bases through preconditioning resampled systems. In the following, we formalize this approach through the notion of local coherence.

Definition 5.1 (Local coherence).

The local coherence of an orthonormal basis of with respect to the orthonormal basis of is the function defined coordinate-wise by

The following result shows that we can reduce the number of measurements in (4.6) by replacing the bound on the coherence in (4.7) by a bound on the -norm of the local coherence, provided we sample rows from appropriately using the local coherence function. It can be seen as a direct finite-dimensional analog to Theorem 2.1 in [38], but for completeness, we include a short self-contained proof.

Theorem 5.2.

Let and be orthonormal bases of . Assume the local coherence of with respect to is pointwise bounded by the function , that is . Let , suppose

(5.1)

and choose (possibly not distinct) indices i.i.d. from the probability measure on given by

Consider the matrix with entries

(5.2)

and consider the diagonal matrix with . Then with probability at least the restricted isometry constant of the preconditioned matrix satisfies .

Proof.

Note that as the matrix with rows is unitary, the vectors , , form an orthonormal system with respect to the uniform measure on as well. We show that the system is an orthonormal system with respect to in the sense of Definition 4.4. Indeed,

(5.3)
(5.4)

hence the form an orthonormal system with respect to . Noting that and hence this system is bounded with bound , the result follows from Proposition 4.5. ∎

Remark 5.3.

Note that the local coherence not only appears in the embedding dimension , but also in the sampling measure. Hence a priori, one cannot guarantee the optimal embedding dimension if one only has suboptimal bounds for the local coherence. That is why the sampling measure in Theorem 5.2 is defined via the (known) upper bounds and rather than the (usually unknown) exact values and , showing that suboptimal bounds still lead to meaningful bounds on the embedding dimension.

Remark 5.4.

For (as in Corollary 4.6), one has , so Theorem 5.2 is a direct generalization of Corollary 4.6. As one has equality if and only if is constant, however, Theorem 5.2 will be stronger in most cases.

6 Local coherence estimates for frequencies and wavelets

Due to the tensor product structure of both of these bases, the two-dimensional local coherence of the two-dimensional Fourier basis with respect to bivariate Haar wavelets will follow by first bounding the local coherence of the one-dimensional Fourier basis with respect to the set of univariate building block functions of the bivariate Haar basis.

Lemma 6.1.

Fix with . For the space , the one-dimensional Fourier basis vectors , , and the one-dimensional Haar wavelet basis building blocks , , satisfy the incoherence relation

(6.1)
Proof.

We estimate

(6.2)
(6.3)
(6.4)
(6.5)

To estimate this expression, we note that

(6.6)

and distinguish two cases:

If , we bound and apply (6.6) to obtain

(6.7)
(6.8)

For , and hence , we note that and bound, again using (6.6),

(6.9)
(6.10)

This lemma enables us to derive the following incoherence estimates for the bivariate case.

Theorem 6.2.

Let for . Then the local coherence of the orthonormal two-dimensional Fourier basis with respect to the orthonormal bivariate Haar wavelet basis in , as defined in (2.3) and (2.2), respectively, is bounded by

(6.11)
(6.12)

and one has .

Proof.

First note that the bivariate Fourier coefficients decompose into the product of univariate Fourier coefficients:

(6.13)

For , the factors can be bounded using Lemma 6.1, which, for , yields the bound

Next we consider the case where either or ; w.l.o.g., assume . We use that in one dimension, we have as well as . So we only need to consider the case that and hence . Thus we obtain

In both cases, we obtain . The bound follows directly from the Cauchy-Schwarz inequality. We conclude .

For the -bound, we use an integral estimate,

(6.14)
(6.15)
(6.16)

where we used that . Taking square root implies the result. ∎

Remark 6.3.

We believe that the factor of which appears in the proposition is due to lack of smoothness for the Haar wavelets. Hence we hope this factor can be removed by considering smoother wavelets.

As the infimum of a strictly decreasing function and a strictly increasing function is bounded uniformly by its value at the intersection point of the two functions, Lemma 6.1 also gives frequency-dependent bounds for the local coherence between frequencies and wavelets in the univariate setting.

Corollary 6.4.

Fix with . For the space , the one-dimensional Fourier basis vectors , , and the one-dimensional Haar wavelets satisfy the incoherence relation

(6.17)

7 Recovery guarantees

In this section we present proofs of the main results.

7.1 Proof of Theorem 3.2

The proof of Theorem 3.2 concerning recovery from -minimization in the bivariate Haar transform domain follows by combining the local incoherence estimate of Theorem 6.2 with the local coherence based reconstruction guarantees of Theorem 5.1. Under the conditions of Theorem 5.1, the stated recovery results follow directly from Theorem 4.2. The weighted -norm in the noise model is a consequence of the preconditioning.

7.2 Preliminary lemmas for the proof of Theorem 3.1

The proof of Theorem 3.1 proceeds along similar lines to that of Theorem 3.2, but we need a few more preliminary results relating the bivariate Haar transform to the gradient transform. The first result, Proposition 7.1, is derived from a more general statement involving the continuous bivariate Haar system and the bounded variation seminorm from [16].

Proposition 7.1.

Suppose has mean zero, and suppose its bivariate Haar transform is arranged such that is the -th largest-magnitude coefficient. Then there is a universal constant such that for all ,

See [30] for a derivation of Proposition 7.1 from Theorem of [16].

We will also need two lemmas about the bivariate Haar system.

Lemma 7.2.

Let . For any indices and there are at most bivariate Haar wavelets satisfying .

Proof.

The lemma follows by showing that for fixed dyadic scale in , there are at most 6 Haar wavelets with edge length satisfying . If the edge between and coincides with a dyadic edge at scale , then the 3 wavelets supported on each of the bordering dyadic squares transition from being zero to nonzero along this edge. On the other hand, if coincides with a dyadic edge at dyadic scale but does not coincide with a dyadic edge at scale , then 2 of the 3 wavelets supported on the dyadic square having as a center edge can satisfy the stated bound. ∎

Lemma 7.3.
Proof.

is supported on a dyadic square of side-length , and on its support, its absolute value is constant, . Thus at the four boundary edges of the square, there is a jump of , at the (at most two) dyadic edges in the middle of the square where the sign changes there is a jump of . Hence . ∎

We are now ready to prove Theorem 3.1.

7.3 Proof of Theorem 3.1

Recall that denotes the bivariate Haar transformation Let be the Haar coefficient corresponding to the constant wavelet, and let , for , denote the -st largest-magnitude Haar coefficient among the remaining, and let denote the associated Haar wavelet. We use this slightly modified ordering because Proposition 7.1 applies only to mean-zero images.

Let be the diagonal matrix encoding the weights in the noise model, i.e., , where, for as in Theorem 6.2, . Note that .

By Theorem 5.2 combined with the bivariate incoherence estimates from Theorem