An analysis of blocks sampling strategies in compressed sensing
Compressed sensing is a theory which guarantees the exact recovery of sparse signals from a small number of linear projections. The sampling schemes suggested by current compressed sensing theories are often of little practical relevance since they cannot be implemented on real acquisition systems. In this paper, we study a new random sampling approach that consists in projecting the signal over blocks of sensing vectors. A typical example is the case of blocks made of horizontal lines in the 2D Fourier plane. We provide theoretical results on the number of blocks that are required for exact sparse signal reconstruction. This number depends on two properties named intra and inter-support block coherence. We then show through a series of examples including Gaussian measurements, isolated measurements or blocks in time-frequency bases, that the main result is sharp in the sense that the minimum amount of blocks necessary to reconstruct sparse signals cannot be improved up to a multiplicative logarithmic factor. The proposed results provide a good insight on the possibilities and limits of block compressed sensing in imaging devices such as magnetic resonance imaging, radio-interferometry or ultra-sound imaging.
Key-words: Compressed Sensing, blocks of measurements, MRI, exact recovery, minimization.
Compressive Sensing is a new sampling theory that guarantees accurate recovery of signals from a small number of linear projections using three ingredients listed below:
Sparsity: the signals to reconstruct should be sparse, meaning that they can be represented as a linear combination of a small number of atoms in a well-chosen basis. A vector is said to be -sparse if its number of non-zero entries is equal to .
Nonlinear reconstruction: a key feature ensuring recovery is the use of non linear reconstruction algorithms. For instance, in the seminal papers [Don06, CRT06a], it is suggested to reconstruct via the following -minimization problem:
where () is a sensing matrix, represents the measurements vector, and for all .
Incoherence of the sensing matrix: the matrix should satisfy an incoherence property described later. If is perfectly incoherent (e.g. random Gaussian measurements or Fourier coefficients drawn uniformly at random) then it can be shown that only measurements are sufficient to perfectly reconstruct the -sparse vector .
The construction of good sensing matrices is a keystone for the successful application of compressed sensing. The use of matrices with independent random entries has been popularized in the early papers [CRT06b, Can08]. Such sensing matrices have limited practical interest since they can hardly be stored on computers or implemented on practical systems. More recently, it has been shown that partial random circulant matrices [FR13, PVGW12, RRT12] may be used in the compressed sensing context. With this structure a matrix-vector product can be efficiently implemented on a computer by convolving the signal with a random pulse and by subsampling the result. This technique can also be implemented on real systems such as magnetic resonance imaging (MRI) or radio-interferometry [PMG12]. However this demands to modify the acquisition device physics, which is often uneasy and costly. Another way to proceed consists in drawing sampling locations among possible ones, see [CRT06a, RV08]. This setting, which is the most widespread in applications, is a promising avenue to implement compressed sensing strategies on nearly all existing devices. Its efficiency depends on the incoherence between the acquisition and sparsity bases [DH01, CR07]. It is successfully used in radio interferometry [WJP09], digital holography [MAAOM10] or MRI [LDP07] where the measurements are Fourier coefficients.
To the best of our knowledge, all current compressed sensing theories suggest that the measurements should be drawn independently at random. This is impossible for most acquisition devices which have specific acquisition constraints. A typical example is MRI, where the samples should lie along continuous curves in the Fourier domain (see e.g. [Wri97, LKP08]). As a result, most current implementations of compressed sensing do not comply with theory. For instance, in the seminal work on MRI [LDP07], the authors propose to sample parallel lines of the Fourier domain (see Figure 1 and 2).
In this paper, we aim at bridging the gap between theory and practice. We consider a sensing matrix that is constructed by stacking blocks of measurements and not just isolated measurements. In the proposed formalism, the blocks can be nearly arbitrary random matrices. For instance, our main result covers the case of blocks made of groups of rows of a deterministic sensing matrix (e.g. lines in the Fourier domain) or blocks with random entries (e.g. Gaussian blocks). We study the problem of exact non-uniform sparse recovery in a noise-free setting. This sampling strategy raises various questions. How many blocks of measurements are needed to ensure exact reconstruction? Is the required number of blocks compatible with faster acquisition?
Our first contribution is to extend the standard compressed sensing theorems to the case of blocks of measurements. We then show that our result is sharp in a few practical examples and extends the best currently known results in compressed sensing. We also prove that in many cases, imposing a block structure has a dramatic effect on the recovery guarantees since it strongly impoverishes the variety of admissible sampling patterns. Overall, we believe that the presented results give a good theoretical basis to the use of block compressed sensing and show the limits to this setting. This work also provides some insight on many currently used sampling patterns in MRI, echography, computed tomography scanners,…
Outline of the paper
The remaining of the paper is organized as follows. In Section 2, we first describe the notation and the main assumptions necessary to derive a general theory on blocks of measurements. We present the main result of this paper about blocks-sampling acquisition in Section 3. In Section 4, we discuss the sharpness of our results. First, we show that our approach provides the same guarantees that existing results when using isolated measurements (either Gaussian or randomly extracted from deterministic transforms). We conclude on a pathological example to show sharpness in the case of blocks sampled from separable transforms.
Let be a subset of of cardinality . We denote by the matrix with columns where denotes the -th vector of the canonical basis of . For given and , we also define , and .
2.2 Main assumptions
Recall that we consider the following -minimization problem:
where is the sensing matrix, is the measurements vector, is the unknown vector to be recovered. In this paper, we assume that the sensing matrix can be written as
where are i.i.d. copies of a random matrix , satisfying
where is the identity matrix. This condition is the extension of the isotropy property described in [CP11] in a blocks-constrained acquisition setting.
In most cases studied in this paper, the random matrix is assumed to be of fixed size with . This assumption is however not necessary. The number of blocks of measurements is denoted , while the overall number of measurements is denoted . When has a fixed size , .
The following quantities will be shown to play a key role to ensure sparse recovery in the sequel.
Let be a set of cardinality . We denote by the smallest positive reals such that the following bounds hold either deterministically or stochastically (in a sense discussed later)
The quantities introduced in Definition 2.1 can be interpreted as follows. The number can be seen as an intra-support block coherence, whereas and are related to the inter-support block coherence, that is the coherence between blocks restricted to the support of the signal and blocks restricted to the complementary of this support. Note that the factors and involved in the definition of and ensure homogeneity between all of these quantities.
2.3 Application examples
The number of applications of the proposed setting is very large. For instance, it encompasses those proposed in [CP11]. Let us provide a few examples of new applications below.
2.3.1 Partition of orthogonal transforms
Let denote an orthogonal transform. Blocks can be constructed by partitioning the rows from :
where stands for the disjoint union. This case is the one studied in [PDG14].
Let be a discrete probability distribution on the set of integers . A random sensing matrix can be constructed by stacking i.i.d. copies of the random matrix defined by for all . Note that the normalization by ensures that the isotropy condition is verified.
2.3.2 Overlapping blocks issued from orthogonal transforms
In the last example, we concentrated on partitions, i.e. non-overlapping blocks of measurements. The case of overlapping blocks can also be handled. To do so, define the blocks as follows: where , and denotes the multiplicity of the row , i.e. the number of appearances of this row in different blocks. This renormalization is sufficient to ensure where is defined similarly to the previous example. See Appendix D for an illustration of this setting in the case of 2D Fourier measurements.
2.3.3 Blocks issued from tight or continuous frames
2.3.4 Random blocks
In the previous examples, the blocks were predefined and extracted from deterministic matrices or systems. The proposed theory also applies to random blocks. For instance, one could consider blocks with i.i.d. Gaussian entries since these blocks satisfy the isotropy condition (4). This example is of little practical relevance since stacking random Gaussian matrices produces a random Gaussian matrix that can be analyzed with standard compressed sensing approaches. It however presents a theoretical interest in order to show the sharpness of our main result. Another example with potential interest is that of blocks generated randomly using random walks over the acquisition space [CCKW14].
3 Main result
Our main result reads as follows.
Let be a set of indices of cardinality and suppose that is an -sparse vector supported on . Fix . Suppose that the sampling matrix is constructed as in (3), and that the isotropy condition (4) holds. Suppose that the bounds (5) hold deterministically. If the number of blocks satisfies the following inequality
then is the unique solution of (2) with probability at least . The constant can be taken equal to .
The proof of Theorem 3.1 is detailed in Section C.1. It is based on the so-called golfing scheme introduced in [Gro11] for matrix completion, and adapted by [CP11] for compressed sensing from isolated measurements. Note that Theorem 3.1 is a non uniform result in the sense that reconstruction holds for a given support and not for all -sparse signals. It is likely that uniform results could be derived by using the so-called Restricted Isometry Property. However, this strong property is usually harder to prove and leads to narrower classes of admissible matrices and to larger number of required measurements.
Remark 3.2 (The case of stochastic bounds).
In Definition 2.1, we say that the bounds deterministically hold if the inequalities (5) are satisfied almost surely. This assumption is convenient to simplify the proof of Theorem 3.1. Obviously, it is not satisfied in the setting where the entries of are i.i.d. Gaussian variables. To encompass such cases, the bounds in Definition 2.1 could stochastically hold, meaning that the inequalities (5) are satisfied with large probability. The proof of the main result can be modified by conditioning the deviation inequalities in the Lemmas of Appendix C.1 to the event that the bounds in Definition 2.1 hold. Therefore, even though we do not provide a detailed proof, the lower bound on the required number of blocks in Theorem 3.1 remains accurate. Hence, we will propose in Section 4.2 some estimates of the quantities (5) in the case of Gaussian measurements.
The lower bound on the number of blocks of measurements in Theorem 3.1 depends on and thus on the support of the vector to reconstruct. In the usual compressed sensing framework, the matrix is constructed by stacking realizations of a random vector . The best known results state that isolated measurements are sufficient to reconstruct with high probability. The coherence is the smallest number such that . The quantity in Theorem 3.1 therefore replaces the standard factor . The coherence is usually much simpler to evaluate than which depends on three properties of the random matrix : the intra-support coherence and the inter-support coherences and . As will be seen in Section 4, it is important to keep all those quantities in order to obtain tight reconstruction results. Nevertheless, a rough upper bound of , reminiscent of the coherence, can be used as shown in Proposition 3.3.
Let be a subset of of cardinality . Assume that the following inequality holds either deterministically or stochastically
with . Then
The proof of Proposition 3.3 is given in Appendix C.2. The bound given in Proposition 3.3 is an upper bound on that should not be considered as optimal. For instance, for Gaussian measurements, it is important to precisely evaluate the three quantities .
Remark 3.4 (Noisy setting).
In this paper, we concentrate on a noiseless setting. It is likely that noise can be accounted for mimicking the proofs in [CP11] for instance.
4 Sharpness of the main result
In this section, we discuss the sharpness of the lower bound given by Theorem 3.1 by comparing it to the best known results in compressed sensing.
4.1 The case of isolated measurements
First, let us show that our result matches the standard setting where the blocks are made of only one row, that is . This is the standard compressed sensing framework considered e.g. by [CRT06a, FR13, CP11]. Consider that is a deterministic matrix, and that the sensing matrix is constructed by drawing rows of according to some probability distribution , i.e. one can write as follows:
where the ’s are i.i.d. random variables taking their value in with probability . According to Proposition 3.3, for a support of cardinality the following upper bound holds:
Therefore, according to Theorem 3.1, it is sufficient that
In addition, choosing in order to minimize the right-hand side of (7) leads to
which in turn leads to the following required number of measurements:
Contrarily to common belief, the probability distribution minimizing the required number of measurements is not the uniform one, but the one depending on the -norm of the considered row. Let us highlight this fact. Consider that , where denotes the 1D Fourier matrix of size . If a uniform drawing distribution is chosen, the right hand side of (7) is . This shows that uniform random sampling is not interesting for this sensing matrix. Note that the coherence of is equal to , which is the worst possible case for orthogonal matrices. Nevertheless, if the optimal drawing distribution is chosen, i.e.
then, the right hand side of (7) becomes . Using this sampling strategy, compressed sensing therefore remains relevant. Furthermore, note that the latter bound could be easily reduced by a factor 2 by systematically sampling the location associated to the first row of , and uniformly picking the remaining isolated measurements. Similar remarks were formulated in [KW14] which promote non-uniform sampling strategies in compressed sensing.
4.2 The case of Gaussian measurements
We suppose that the entries of are i.i.d. Gaussian random variables with zero-mean and variance . This assumption on the variance ensures that the isotropy condition (4) is satisfied. The matrix constructed by concatenating such blocks is also a Gaussian random matrix with i.i.d. entries and does not differ from an acquisition setting based on isolated measurements. Therefore, if Theorem 3.1 is sharp, one can expect that measurements are enough to perfectly reconstruct . In what follows, we show that this is indeed the case.
Assume that the entries of are i.i.d. Gaussian random variables with zero-mean and variance . Then, . Therefore, Gaussian blocks are sufficient to ensure perfect reconstruction with high probability.
This is similar to an acquisition based on isolated Gaussian measurements and this is optimal up to a logarithmic factor, see [Don06].
4.3 The case of separable transforms
In this section, we consider -dimensional deterministic transforms obtained as Kronecker products of orthogonal one-dimensional transforms. This setting is widespread in applications. Indeed, separable transforms include -dimensional Fourier transforms met in astronomy [BSO08] or products of Fourier and wavelet transforms met in MRI [LDSP08] or radio-interferometry [WJP09]. A specific scenario encountered in many settings is that of blocks made of lines in the acquisition space. For instance, parallel lines in the 3D Fourier space are used in [LDP07]. The authors propose to undersample the 2D - plane and sample continuously along the orthogonal direction (see Figure 2).
The remaining of this Section is as follows. We first introduce the notation. We then provide theoretical results about the minimal amount of blocks necessary to reconstruct all -sparse vectors. Next, we show that Theorem 3.1 is sharp in this setting since the amount of blocks required to reconstruct -sparse vectors coincides with the minimal amount. Finally, we perform a comparison with the results in [PDG14].
Let denote an arbitrary orthogonal transform, with . Let
where denote the Kronecker product. Note that is also orthogonal. We define groups of measurements from as follows:
For instance, if is the 1D discrete Fourier transform, this strategy consists in constructing blocks as horizontal discrete lines of the discrete Fourier plane. This is similar to the blocks used in [LDP07]. Similarly to paragraph 2.3.1, a sensing matrix can be constructed by drawing i.i.d. blocks with distribution . Letting denote the drawn blocks indexes, reads:
Let be the support of cardinality of the signal to reconstruct. Under the above hypotheses, if
then the vector is the unique solution of (2) with probability at least .
Using the above result we also obtain the following Corollary.
The sharpness of the bounds on the required number of measurements in Proposition 4.3 will be discussed in the following paragraph.
4.3.2 The limits of separable transforms
Considering a 2D discrete Fourier transform and a dictionary of blocks made of horizontal lines in the discrete Fourier domain, one could hope to only require blocks of measurements to perfectly recover all -sparse vectors. Indeed, it is known since [CRT06a] that isolated measurements uniformly drawn at random are sufficient to achieve this. In this paragraph, we show that this expectation cannot be satisfied since at least blocks are necessary to reconstruct all -sparse vectors. It means that this specific block structure is inadequate to obtain strong reconstruction guarantees. This result also shows that Theorem 4.3 is nearly optimal.
In order to prove those results, we first recall the following useful lemma. We define a decoder as any mapping . Note that is not necessarily a linear mapping.
[CDD09, Lemma 3.1] Set to be the set of -sparse vectors in . If is any matrix, then the following propositions are equivalent:
There is a decoder such that , for all -sparse in .
For any set of cardinality , the matrix has rank .
Looking at (iii) of Lemma 4.4, since the rank of is smaller than , we deduce that is a necessary condition to have a decoder. Therefore, if the number of isolated measurements is less than with the degree of sparsity of , we cannot reconstruct . This property is an important step to prove Proposition 4.5.
Assume that the sensing matrix has the special block structure described in (11). If , then there exists no decoder such that for all -sparse vector . In other words, the minimal number of distinct blocks required to identify every -sparse vectors is necessarily larger than .
Proposition 4.5 shows that there is no hope to reconstruct all -sparse vectors with less than blocks of measurements, using sensing matrices made of blocks such as (9). Moreover, since the blocks are of length , it follows that whenever , the full matrix should be used to identify every -sparse . Let us illustrate this result on a practical example. Set to be the 2D Fourier matrix, i.e. the Kronecker product of two 1D Fourier matrices. Consider that the dictionary of blocks is made of horizontal lines. Now consider a vector to be -sparse in the spatial domain and only supported on the first column as illustrated in Figure 3(a). Due to this specific signal structure, the Fourier coefficients of are constant along horizontal lines, see Figure 3(b). Therefore, for this type of signal, the information captured by a block of measurements (i.e. a horizontal line) is as informative as one isolated measurement. Clearly, at least blocks are therefore required to reconstruct all -sparse vectors supported on a vertical line of the 2D Fourier plane. Using Corollary 4.3, one can derive the following result.
Let denote the 2D discrete Fourier matrix and consider a partition in blocks that consist of lines in the 2D Fourier domain. Assume that is -sparse. The drawing probability minimizing the right hand side of (13) is given by
and for this particular choice, the number of blocks of measurements sufficient to reconstruct with probability is
This result is disappointing but optimal up to a logarithmic factor, due to Proposition 4.5. We refer to Appendix C.5 for the proof. This Proposition indicates that blocks are sufficient to reconstruct which is similar to the minimal number given in Proposition 4.5 up to a logarithmic factor.
4.3.3 Relation to previous work
To the best of our knowledge, the only existing compressed sensing results based on blocks of measurements appeared in [PDG14]. In this paragraph, we outline the differences between both approaches.
First, in our work, no assumption on the sign pattern of the non-zero signal entries is required. Furthermore, while the result in [PDG14] only covers the case described in Paragraph 2.3.1 (i.e. partitions of orthogonal transforms), our work covers the case of overlapping blocks of measurements (see Paragraph 2.3.2), subsampled tight or continuous frames (see Paragraph 2.3.3), and it can also be extended to the case of randomly generated blocks (see Paragraph 2.3.4). Last but not least, the work [PDG14] only deals with uniform sampling densities which is well known to be of little interest when dealing with partially coherent matrices (see e.g. end of Paragraph 4.1 for an edifying example).
Apart from those contextual differences, the comparison between the results in [PDG14] and the ones in this paper is not straightforward. The criterion in [PDG14] that controls the overall number of measurements depends on the following quantity:
where stands for the block restricted to the columns in with renormalized rows. The total number of measurements required in the approach [PDG14] is
which should be compared to our result
As shown in the previous paragraphs, the number (17) is sharp in various settings of interest, while (16) is usually hard to explicitly compute or too large in the case of patially incoherent transforms. It therefore seems that our results should be preferred over those of [PDG14].
We have introduced new sensing matrices that are constructed by stacking random blocks of measurements. Such matrices play an important role in applications since they can be implemented easily on many imaging devices. We have derived theorems that guarantee exact reconstruction using these matrices via -minimization algorithms and outlined the crucial role of two properties: the extra and intra support block-coherences introduced in Definition 2.1. We have showed that our main result (Theorem 3.1) is sharp in a few settings of practical interest, suggesting that it cannot be improved in the general case up to logarithmic factors.
Apart from those positive results, this work also reveals some limits of block sampling approaches. First, it seems hard to evaluate the extra and intra support block-coherences - except in a few particular cases - both analytically and numerically. This evaluation is however central to derive optimal sampling approaches. More importantly, we have showed in Paragraph 4.3.2 that not much could be expected from this approach in the specific setting where separable transforms and blocks consisting of lines of the acquisition space are used. Despite the peculiarity of such a dictionary, we believe that this result might be an indicator of a more general weakness of block sampling approaches. Since the best known compressed sensing strategies heavily rely on randomness (e.g. Gaussian measurements or uniform drawings of Fourier atoms), one may wonder whether the more rigid sampling patterns generated by block sampling approaches have a chance to provide decent results. It is therefore legitimate to ask the following question: is it reasonable to use variable density sampling with pre-defined blocks of measurements in compressed sensing?
Numerical experiments indicate that the answer to this question is positive. For instance, it is readily seen in Figure 4 (a,b,c) and (j,k,l), that block sampling strategies can produce comparable results to acquisitions based on isolated measurements. The first potential explanation to this phenomenon is that is low for the dictionaries chosen in those experiments. However, even acquisitions based on horizontal lines in the Fourier domain (see Figure 4 (d,e,f)) produce rather good reconstruction results while Proposition 4.6 seems to indicate that this strategy is doomed.
This last observation suggests that a key feature is missing in our study to fully understand the potential of block sampling in applications. Recent papers [AHPR13, AHR14] highlight the central role of structured sparsity to explain the practical success of compressed sensing. A very promising perspective is therefore to couple the ideas of structured sparsity in [AHPR13, AHR14] and the ideas of block sampling proposed in this paper to finely understand the results in Figure 4 and perhaps design new optimal and applicable sampling strategies.
|(a)||(b) PSNR = 40 dB||(c)|
|(d)||(e) PSNR = 32.79 dB||(f)|
|(g)||(h) PSNR = 36.34 dB||(i)|
|(j)||(k) PSNR = 38.99 dB||(l)|
Appendix A Bernstein’s inequalities
Theorem A.1 (Scalar Bernstein Inequality).
Let be independent random variables such that almost surely for every . Assume that for . Then for all ,
Theorem A.2 (Rectangular Matrix Bernstein Inequality).
[Tro12, Theorem 1.6]
Let be a finite sequence of rectangular independent random matrices of dimension . Suppose that is such that and a.s. for some constant that is independent of . Define
Then, for any , we have that
Theorem A.3 (Vector Bernstein Inequality (V1)).
[CP11, Theorem 2.6] Let be a finite sequence of independent and identically distributed random vectors of dimension . Suppose that and a.s. for some constant and set . Let . Then, for any , we have that
Theorem A.4 (Vector Bernstein Inequality (V2)).
[FR13, Corollary 8.44] Let be a finite sequence of independent and indentically distributed random vectors of dimension . Suppose that and a.s. for some constant . Let . Then, for any , we have that
where . Note that the previous inequality still holds by replacing by where .
Appendix B Estimates: auxiliary results
Let be the support of the signal to be reconstructed such that . Note that the isotropy condition (4) ensures that the following properties hold
for any vector , .
for any , .
The above properties will be repeatedly used in the proof of the following lemmas.
Let be of cardinality of . Then, for any , one has that
Let , such that . Let be a vector in . Then, for any , one has that
Without loss of generality we may assume that . We remark that
where is a random vector with zero mean. Simple calculations yield that
Now, let us define By independence of the random vectors , it follows that
To bound the first term in the above equality, one can write
One immediately has that Therefore, one finally obtains that
Using the above upper bounds, namely and , the result of the lemma is thus a consequence of the Bernstein’s inequality for random vectors (see Theorem A.4), which completes the proof.
Let , such that . Let be a vector of . Then we have
Suppose without loss of generality that . Then,
Let us define . Note that . From the Cauchy-Schwarz inequality, we get