# Sigma-Delta quantization of sub-Gaussian frame expansions and its application to compressed sensing

###### Abstract.

Suppose that the collection $\{e_i\}_{i=1}^m$ forms a frame for $\mathbb{R}^k$, where each entry of the vector $e_i$ is a sub-Gaussian random variable. We consider expansions in such a frame, which are then quantized using a Sigma-Delta scheme. We show that an arbitrary signal in $\mathbb{R}^k$ can be recovered from its quantized frame coefficients up to an error which decays root-exponentially in the oversampling rate $m/k$. Here the quantization scheme is assumed to be chosen appropriately depending on the oversampling rate, and the quantization alphabet can be coarse. The result holds with high probability on the draw of the frame, uniformly for all signals. The crux of the argument is a bound on the extreme singular values of the product of a deterministic matrix and a sub-Gaussian frame. For fine quantization alphabets, we leverage this bound to show polynomial error decay in the context of compressed sensing. Our results extend previous results for structured deterministic frame expansions and Gaussian compressed sensing measurements.

Keywords: compressed sensing, quantization, random frames, root-exponential accuracy, Sigma-Delta, sub-Gaussian matrices

2010 Math Subject Classification: 94A12, 94A20, 41A25, 15B52

## 1. Introduction

### 1.1. Main problem

In this paper we address the problem of digitizing, or quantizing, generalized linear measurements of finite dimensional signals. In this setting a signal is a vector $x \in \mathbb{R}^N$, and the acquired measurements are inner products of $x$ with elements from a collection of $m$ measurement vectors in $\mathbb{R}^N$. This generalized linear measurement model has received much attention lately, both in the frame theory literature where one considers $m \geq N$, e.g., [1], and in the compressed sensing literature where $m \ll N$, e.g., [2]. For concreteness, let $\{a_i\}_{i=1}^m \subset \mathbb{R}^N$ denote the measurement vectors. The generalized linear measurements are given by $y_i = \langle a_i, x \rangle$, $i \in \{1, \dots, m\}$, and can be organized as a vector $y$, given by $y = Ax$, where $A$ is the $m \times N$ matrix whose $i$th row is $a_i^T$. Note that here and throughout the paper, we always consider column vectors. Our goal is to quantize the measurements, i.e., to map the components of $y$ to elements of a fixed finite set so that the measurements can be stored and transmitted digitally. A natural requirement for such maps is that they allow for accurate recovery of the underlying signal. As the domain of the forward map is typically an uncountable set and its range is finite, an exact inversion is, in general, not possible. Accordingly we seek an approximate inverse, which we shall refer to as a reconstruction scheme or decoder. More precisely, let $\mathcal{A} \subset \mathbb{R}$ be a finite set, which we shall refer to as the quantization alphabet, and consider a set $\mathcal{X}$ of signals, which is compact in $\mathbb{R}^N$. Then a quantization scheme is a map

$$\mathcal{Q} : A\mathcal{X} \to \mathcal{A}^m$$

and a reconstruction scheme is of the form

$$\mathcal{R} : \mathcal{A}^m \to \mathbb{R}^N.$$

For $\mathcal{R}$ to be an approximate inverse of $\mathcal{Q}$, we need that the reconstruction error

$$\epsilon(x) := \| x - \mathcal{R}(\mathcal{Q}(Ax)) \|_2,$$

viewed as a function of $x \in \mathcal{X}$, is as small as possible in some appropriate norm. Typical choices are $\|\epsilon\|_{L^\infty(\mathcal{X})}$, the worst case error, and $\|\epsilon\|_{L^2(\mathcal{X})}$, the mean square error.

We are interested in quantization and reconstruction schemes that yield fast error decay as the number of measurements increases. Next, we give a brief overview of the literature on quantization in both frame quantization and compressed sensing settings and explain how these two settings are intimately connected. Below, we use different notations for the two settings to make it easier to distinguish between them.

### 1.2. Finite frame setting

Let $\{e_i\}_{i=1}^m$ be a frame for $\mathbb{R}^k$, i.e., $m \geq k$, and assume that the matrix $E$ whose $i$th row is $e_i^T$ has rank $k$, thus the map $x \mapsto Ex$ is injective. Accordingly, one can reconstruct any $x \in \mathbb{R}^k$ exactly from the frame coefficients $y = Ex$ using, e.g., any left inverse $F$ of $E$. As we explained above, the frame coefficients $y$ can be considered as generalized measurements of $x$, and our goal is to quantize $y$ such that the approximation error is guaranteed to decrease as the number of measurements, i.e., $m$, increases.

#### 1.2.1. Memoryless scalar quantization

The most naive (and intuitive) quantization method is rounding off every frame coefficient to the nearest element of the quantizer alphabet $\mathcal{A}$. This scheme is generally called memoryless scalar quantization (MSQ) and yields (nearly) optimal performance when $m = k$ and the frame is an orthonormal basis. However, as redundancy increases (say, we keep $k$ fixed and increase $m$) MSQ becomes highly suboptimal. In particular, one can prove that the (expected) approximation error via MSQ with a given fixed alphabet $\mathcal{A}$ can never decay faster than $(m/k)^{-1}$, regardless of the reconstruction scheme that is used and regardless of the underlying frame matrix $E$ [3]. This is significantly inferior compared to another family of quantization schemes, called $\Sigma\Delta$ quantizers, where one can have an approximation error that decays like $(m/k)^{-s}$ for any integer $s > 0$, provided one uses a $\Sigma\Delta$ quantizer of appropriate order.

#### 1.2.2. Sigma-Delta quantization

Despite its use in the engineering community since the 1960s as an alternative quantization scheme for digitizing band-limited signals (see, e.g., [4]), a rigorous mathematical analysis of $\Sigma\Delta$ quantization was not done until the work of Daubechies and DeVore [5]. Since then, the mathematical literature on $\Sigma\Delta$ quantization has grown rapidly.

Early work on the mathematical theory of $\Sigma\Delta$ quantization has focused on understanding the reconstruction accuracy as a function of the oversampling rate in the context of bandlimited functions, i.e., functions with compactly supported Fourier transform. Daubechies and DeVore constructed in their seminal paper [5] stable $r$th-order $\Sigma\Delta$ schemes with a one-bit alphabet. Furthermore, they proved that when such an $r$th-order scheme is used to quantize an oversampled bandlimited function, the resulting approximation error is bounded by $C_r \lambda^{-r}$, where $\lambda > 1$ is the oversampling ratio and $C_r$ depends on the fine properties of the underlying stable $\Sigma\Delta$ schemes. For a given oversampling rate $\lambda$ one can then optimize the order $r$ to minimize the associated worst-case approximation error, which, in the case of the stable $\Sigma\Delta$ family of Daubechies and DeVore, yields that the approximation error is of order $O(\lambda^{-c \log \lambda})$.

In [6], Güntürk constructed an alternative infinite family of $\Sigma\Delta$ quantizers of arbitrary order, refined later by Deift et al. [7], and showed that using these new quantizers one can do significantly better. Specifically, using such schemes (see Section 2.2) in the bandlimited setting, one obtains an approximation error of order $O(2^{-c\lambda})$, where $c = 0.07$ in [6] and $c \approx 0.102$ in [7]. In short, when quantizing bounded bandlimited functions, one gets exponential accuracy in the oversampling rate $\lambda$ by using these $\Sigma\Delta$ schemes. In other words, one can refine the approximation by increasing the oversampling rate $\lambda$, i.e., by collecting more measurements, exponentially in the number of measurements without changing the quantizer resolution. Exponential error decay rates are known to be optimal [8, 6]; lower bounds for the constants $c$ for arbitrary coarse quantization schemes are derived in [9]. In contrast, it again follows from [3] that independently quantizing the sample values at best yields linear decay of the average approximation error.

Motivated by the observation above that suggests that $\Sigma\Delta$ quantizers utilize the redundancy of the underlying expansion effectively, Benedetto et al. [10] showed that $\Sigma\Delta$ quantization schemes provide a viable quantization scheme for finite frame expansions in $\mathbb{R}^k$. In particular, [10] considers $x \in \mathbb{R}^k$ with $\|x\|_2 \leq 1$ and shows that the reconstruction error associated with the first-order $\Sigma\Delta$ quantization (i.e., $r = 1$) decays like $\lambda^{-1}$ as $\lambda$ increases. Here the “oversampling ratio” $\lambda$ is defined as $\lambda = m/k$ if the underlying frame for $\mathbb{R}^k$ consists of $m$ vectors. This is analogous to the error bound in the case of bandlimited functions with a first-order $\Sigma\Delta$ quantizer. Following [10], there have been several results on $\Sigma\Delta$ quantization of finite frame expansions improving on the $O(\lambda^{-1})$ approximation error by using higher order schemes, specialized frames, and alternative reconstruction techniques, e.g., [11, 12, 13, 14, 15]. Two of these papers are of special interest for the purposes of this paper: Blum et al. showed in [14] that frames with certain smoothness properties (including harmonic frames) allow for the $\Sigma\Delta$ reconstruction error to decay like $\lambda^{-r}$, provided alternative dual frames, called Sobolev duals, are used for reconstruction. Soon after, [15] showed that by using higher order $\Sigma\Delta$ schemes whose order is optimally chosen as a function of the oversampling rate $\lambda$, one obtains that the worst-case reconstruction error decays like $e^{-C\sqrt{\lambda}}$, at least in the case of two distinct structured families of frames: harmonic frames and the so-called Sobolev self-dual frames which were constructed in [15]. For a more comprehensive review of $\Sigma\Delta$ schemes and finite frames, see [16]. One of the fundamental contributions of this paper is to extend these results to wide families of random frames.

#### 1.2.3. Gaussian frames and quantization

Based on the above mentioned results, one may surmise that structure and smoothness are in some way critical properties of frames, needed for good error decay in $\Sigma\Delta$ quantization. Using the error analysis techniques of [14], it can be seen, though, that what is critical for good error decay in $\Sigma\Delta$ quantization is the “smoothness” and “decay” properties of the dual frame that is used in the reconstruction (see Section 2.3, cf. [16]). This observation is behind the seemingly surprising fact that Gaussian frames, i.e., frames whose entries are drawn independently according to $\mathcal{N}(0,1)$, allow for polynomial decay in $\lambda$ of the reconstruction error [17, Theorem A]. Specifically, one can show that the approximation error associated with an $r$th-order $\Sigma\Delta$ scheme is of order $O(\lambda^{-\alpha(r - 1/2)})$, where $\alpha \in (0,1)$. The proof relies on bounding the extreme singular values of the product of powers of a deterministic matrix, the difference matrix $D$ defined in Section 2.3, and a Gaussian random matrix. This result holds uniformly with high probability on the draw of the Gaussian frame.

### 1.3. Compressed sensing setting

Compressed sensing is a novel paradigm in mathematical signal processing that was spearheaded by the seminal works of Candès, Romberg, and Tao [18], and of Donoho [19]. Compressed sensing is based on the observation that various classes of signals such as audio and images admit approximately sparse representations with respect to a known basis or frame. Central results of the theory establish that such signals can be recovered with high accuracy from a small number of appropriate, non-adaptive linear measurements by means of computationally tractable reconstruction algorithms.

More precisely, one considers -dimensional, -sparse signals, i.e., vectors in the set

The generalized measurements are acquired via the matrix where . The goal is to recover from . In this paper, we focus on random matrices with independent sub-Gaussian entries in the sense of Definitions 3.2 and 3.5 below. It is well-known [20] that if , where is an absolute constant, with high probability, such a choice of allows for the recovery of all as the solution of the -minimization-problem

#### 1.3.1. Quantized compressed sensing

The recovery guarantees for compressed sensing are provably stable with respect to measurement errors. Consequently, one can incorporate quantization into the theory, albeit naively, by treating the quantization error as noise. The resulting error bounds for quantized compressed sensing, however, are not satisfactory, mainly because additional quantized measurements will, in general, not lead to higher reconstruction accuracy [17]. The fairly recent literature on quantization of compressed sensing measurements mainly investigates two families of quantization methods: the 1-bit or multibit memoryless scalar quantization (MSQ) [21, 22, 23, 24] and $\Sigma\Delta$ quantization of arbitrary order [17].

The results on the MSQ scenario focus on replacing the naive reconstruction approach outlined above by recovery algorithms that exploit the structure of the quantization error. For Gaussian random measurement matrices, it has been shown that approximate reconstruction is possible via linear programming even if the measurements are coarsely quantized by just a single bit [23]. For non-Gaussian measurements, counterexamples with extremely sparse signals exist which show that, in general, corresponding results do not hold [23]. These extreme cases can be controlled by introducing the -norm of the signal as an additional parameter, establishing recovery guarantees for arbitrary sub-Gaussian random measurement matrices, provided that this norm is not too large [24]. All these results yield approximations where the error does not decay faster than , where in this case the oversampling rate is defined as the ratio of , the number of measurements, to , the sparsity level of the underlying signal. Again, it follows from [3] that for independently quantized measurements, i.e., MSQ, no quantization scheme and no recovery method can yield an error decay that is faster than .

This bottleneck can be overcome by considering $\Sigma\Delta$ quantizers, which take into account the representations of previous measurements in each quantization step [17]. The underlying observation is that the compressed sensing measurements are in fact frame coefficients of the sparse signal restricted to its support. Accordingly, the problem of quantizing compressed sensing measurements is a frame quantization problem, even though the “quantizer” does not know what the underlying frame is. This motivates a two-stage approach for signal recovery:

In a first approximation, the quantization error is just treated as noise, a standard reconstruction algorithm is applied, and the indices of the largest coefficients are retained as a support estimate. Once the support has been identified, the measurements carry redundant information about the signal, which is exploited in the quantization procedure by applying frame quantization techniques.

This two-stage approach has been analyzed in [17] for the specific case of Gaussian random measurement matrices. In particular, it was shown that under mild size assumptions on the non-zero entries of the underlying sparse signals, this approach can be carried through. Consequently, in the case of Gaussian measurement matrices, one obtains that the approximation error associated with an th-order quantizer is of order where —see [17, Theorem B]. These results hold uniformly with high probability on the draw of the Gaussian measurement matrix provided .

### 1.4. Contributions

Our work in this paper builds on these results, and generalizes them. Our contributions are two-fold. On the one hand, we establish corresponding results in the compressed sensing setting which allow arbitrary, independent, fixed variance sub-Gaussian (in the sense of Definition 3.2 below) random variables as measurement matrix entries. In particular, this includes the important case of Bernoulli matrices, whose entries are renormalized independent random signs. More precisely, in Theorem 4.10 we prove a refined version of the following.

###### Theorem 1.1.

Let be an matrix whose entries are appropriately normalized independent sub-Gaussian random variables and suppose that where . With high probability the -th order reconstruction satisfies

for all for which . Here is the resolution of the quantization alphabet and is an appropriate constant that depends only on .

Our second line of contributions is on frame quantization: We show that using appropriate $\Sigma\Delta$ quantization schemes, we obtain root-exponential decay of the reconstruction error with both Gaussian and sub-Gaussian frame entries. In particular, in Theorem 4.4 we prove a refined version of the following result.

###### Theorem 1.2.

Let be an matrix whose entries are appropriately normalized independent sub-Gaussian random variables. Suppose that satisfies , where is a constant independent of and . Then with high probability on the draw of , the corresponding reconstruction from a scheme of appropriate order satisfies

for all with . Here are appropriate constants.

Note that a key element of our proof, which may be of independent interest, pertains to the extreme singular values of the product of a deterministic matrix with quickly decaying singular values and a sub-Gaussian matrix, see Proposition 4.1.

###### Remark 1.3.

All of the constants in the above theorems can be made explicit. Moreover, the quantization schemes are explicit and tractable, as are the reconstruction algorithms; however, the quantization scheme and reconstruction algorithms are different between Theorems 1.1 and 1.2. Please see Theorems 4.4 and 4.10 for the full details.

### 1.5. Organization

The remainder of the paper is organized as follows. In Section 2 we review $\Sigma\Delta$ quantization and basic error analysis techniques that will be useful in the rest of the paper. In Section 3 we introduce the concept of a sub-Gaussian random matrix and recall some of its key properties as well as some important probabilistic tools. In Section 4, we prove a probabilistic lower bound on the singular values of the product of the matrix $D^{-r}$, where $r$ is a positive integer and $D$ is a difference matrix, and a sub-Gaussian random matrix. Finally, we use this result in combination with some known results on the properties of various $\Sigma\Delta$ quantization schemes to prove the main theorems.

## 2. Sigma-Delta quantization

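An $r$th order $\Sigma\Delta$ quantizer maps a sequence of inputs $(y_i)_{i=1}^m$ to a sequence $(q_i)_{i=1}^m$ whose elements take on values from $\mathcal{A}$ via the iteration

$$\begin{aligned} q_i &= Q_{\mathcal{A}}\big(\rho(y_i, u_{i-1}, \dots, u_{i-r})\big), \\ (\Delta^r u)_i &= y_i - q_i. \end{aligned} \tag{1}$$

Here $\rho$ is a fixed function known as the quantization rule, $Q_{\mathcal{A}}$ rounds its argument to the nearest element of $\mathcal{A}$, $(\Delta u)_i := u_i - u_{i-1}$ is the backward difference, and $(u_i)$ is a sequence of state variables initialized to zero, i.e., $u_i = 0$ for all $i \leq 0$. It is worth noting that designing a good quantization rule in the case $r > 1$ is generally non-trivial, as one seeks stable $\Sigma\Delta$ schemes, i.e., schemes that satisfy

$$\|y\|_\infty \leq C_1 \implies \|u\|_\infty \leq C_2 \tag{2}$$

for constants $C_1$ and $C_2$ that do not depend on $m$ (note that for the remainder of this paper, the constants are numbered in the order of appearance; this allows us to refer to constants introduced in previous results and proofs). In particular, stability is difficult to ensure when one works with a coarse quantizer associated with a small alphabet, the extreme case of which is $1$-bit quantization corresponding to $\mathcal{A} = \{\pm 1\}$.

In this work we consider two different sets of assumptions. Our results on compressed sensing reconstruction require sufficiently fine alphabets, whereas the results on frame quantization make no assumptions on the size of the alphabet $\mathcal{A}$, in particular allowing for very coarse alphabets. In both cases we will work with the $2L$ level mid-rise alphabet

$$\mathcal{A} = \Big\{ \pm (2j+1)\frac{\delta}{2} \;:\; j \in \{0, \dots, L-1\} \Big\}. \tag{3}$$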
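As a small illustration, the following sketch builds a $2L$ level mid-rise alphabet and rounds inputs to its nearest element. It assumes the alphabet has the standard form $\{\pm(2j+1)\delta/2 : j = 0, \dots, L-1\}$ with step size $\delta$; the function names `midrise_alphabet` and `scalar_quantize` are hypothetical and chosen only for this example.

```python
import numpy as np

def midrise_alphabet(L, delta):
    """Return the 2L-level mid-rise alphabet {+-(2j+1)*delta/2 : j = 0..L-1}, sorted."""
    half = (2 * np.arange(L) + 1) * delta / 2.0
    return np.concatenate((-half[::-1], half))

def scalar_quantize(z, alphabet):
    """Round each entry of z to the nearest element of the alphabet."""
    z = np.atleast_1d(np.asarray(z, dtype=float))
    idx = np.argmin(np.abs(z[:, None] - alphabet[None, :]), axis=1)
    return alphabet[idx]

alphabet = midrise_alphabet(L=2, delta=0.5)
print(alphabet)                                      # [-0.75 -0.25  0.25  0.75]
print(scalar_quantize([0.1, -0.6, 3.0], alphabet))   # [ 0.25 -0.75  0.75]
```

Note that inputs beyond the outermost levels (such as `3.0` above) saturate, which is why stability conditions tie the number of levels to the input range.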

### 2.1. Greedy sigma-delta schemes

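We will work with the greedy $\Sigma\Delta$ quantization schemes

$$\begin{aligned} q_i &= Q_{\mathcal{A}}\Big( \sum_{j=1}^r (-1)^{j-1}\binom{r}{j} u_{i-j} + y_i \Big), \\ u_i &= \sum_{j=1}^r (-1)^{j-1}\binom{r}{j} u_{i-j} + y_i - q_i, \end{aligned} \tag{4}$$

where $Q_{\mathcal{A}}$ denotes the scalar quantizer that rounds to the nearest element of $\mathcal{A}$. It is easily seen by induction that for the $2L$ level mid-rise alphabet and $\|y\|_\infty \leq \mu$, a sufficient condition for stability is $L \geq 2\lceil \mu/\delta \rceil + 2^r + 1$, as this implies

$$\|u\|_\infty \leq \frac{\delta}{2}. \tag{5}$$

Note that to satisfy this stability condition, the number of levels $L$ must increase with $r$.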
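The greedy recursion can be sketched in a few lines. The code below is a minimal illustration, not the authors' implementation: it assumes the $2L$ level mid-rise alphabet with step $\delta$, zero initial states, and the greedy rule that quantizes $\sum_{j=1}^r (-1)^{j-1}\binom{r}{j}u_{i-j} + y_i$. The function name `greedy_sigma_delta` and the choice of $L$ in the usage lines are assumptions made for the example.

```python
import numpy as np
from math import comb

def greedy_sigma_delta(y, r, L, delta):
    """Greedy r-th order Sigma-Delta quantization with a 2L-level mid-rise alphabet.

    Returns the quantized sequence q and the state sequence u, which satisfy
    (Delta^r u)_i = y_i - q_i with zero initial states.
    """
    alphabet = (2 * np.arange(-L, L) + 1) * delta / 2.0
    # Weights c_j = (-1)^(j-1) * binom(r, j) for j = 1..r.
    c = np.array([(-1) ** (j - 1) * comb(r, j) for j in range(1, r + 1)], dtype=float)
    m = len(y)
    u = np.zeros(m + r)          # u[:r] holds the zero initial states
    q = np.zeros(m)
    for i in range(m):
        past = u[i:i + r][::-1]  # (u_{i-1}, ..., u_{i-r})
        t = c @ past + y[i]      # value the greedy rule rounds
        q[i] = alphabet[np.argmin(np.abs(alphabet - t))]
        u[i + r] = t - q[i]
    return q, u[r:]

rng = np.random.default_rng(0)
y = rng.uniform(-1, 1, size=200)                   # inputs with |y_i| <= 1
r, delta = 2, 0.1
L = 2 * int(np.ceil(1 / delta)) + 2 ** r + 1       # assumed sufficient number of levels
q, u = greedy_sigma_delta(y, r, L, delta)
print(np.max(np.abs(u)) <= delta / 2 + 1e-12)
```

With enough levels the quantizer never saturates, so each state is simply a rounding error and stays within $\delta/2$; with too few levels the states can grow, which is the instability the text warns about.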

### 2.2. Coarse sigma-delta schemes

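We are also interested in coarse $\Sigma\Delta$ quantization, i.e., schemes where the alphabet size is fixed. In this case, guaranteeing stability with a smaller alphabet typically entails a worse (i.e., larger) stability constant. The coarse $\Sigma\Delta$ schemes that we employ were first proposed by Güntürk [6] and refined by Deift et al. [7]. Originally designed to obtain exponential accuracy in the setting of bandlimited functions, they were used to obtain root-exponential accuracy in the finite frame setup in [15]. At their core is a change of variables of the form $u = g * v$, where $u$ is as in (1) and $g \in \mathbb{R}^{d+1}$ for some $d \geq r$ (with entries indexed by the set $\{0, \dots, d\}$) such that $g_0 = 1$. The quantization rule is then chosen in terms of the new variables as $\rho(y_i, v_{i-1}, v_{i-2}, \dots) = (h * v)_i + y_i$, where $h = \delta^{(0)} - \Delta^r g$ with $\delta^{(0)}$ the Kronecker delta. Then (1) reads as

$$\begin{aligned} q_i &= Q_{\mathcal{A}}\big( (h * v)_i + y_i \big), \\ v_i &= (h * v)_i + y_i - q_i, \end{aligned} \tag{6}$$

where, again, $Q_{\mathcal{A}}$ is the scalar quantizer associated with the $2L$ level mid-rise alphabet (3). By induction, one concludes

$$\|h\|_1 \frac{\delta}{2} + \|y\|_\infty \leq L\delta \implies \|v\|_\infty \leq \frac{\delta}{2},$$

i.e., a sufficient condition to guarantee stability for all bounded inputs $\|y\|_\infty \leq \mu$ is

$$\|h\|_1 \leq 2L - \frac{2\mu}{\delta}. \tag{7}$$

Thus, one is interested in choosing $g$ with minimal $\|g\|_1$ subject to $h = \delta^{(0)} - \Delta^r g$ and (7). This problem was studied in [6, 7] leading to the following proposition (cf. [15]).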

###### Proposition 2.1.

There exists a universal constant such that for any midrise quantization alphabet , for any order , and for all , there exists for some such that the scheme given in (6) is stable for all input signals with and

(8) |

where as above, and .

### 2.3. Sigma-Delta error analysis

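As above, assume that $x \in \mathbb{R}^k$ and the frame matrix $E \in \mathbb{R}^{m \times k}$. If the vector of frame coefficients $y = Ex \in \mathbb{R}^m$ is $\Sigma\Delta$ quantized to yield the vector $q \in \mathcal{A}^m$, then linear reconstruction of $x$ from $q$ using some dual frame $F$ of $E$ (i.e., $FE = I$) produces the estimate $\hat{x} := Fq$. We would like to control the reconstruction error $x - \hat{x}$. Writing the state variable equations (1) in vector form, we have

$$D^r u = y - q, \tag{9}$$

where $D$ is the $m \times m$ difference matrix with entries given by

$$D_{ij} = \begin{cases} 1 & \text{if } i = j, \\ -1 & \text{if } i = j + 1, \\ 0 & \text{otherwise.} \end{cases} \tag{10}$$

Thus,

$$x - \hat{x} = F(y - q) = F D^r u. \tag{11}$$

Working with stable $\Sigma\Delta$ schemes, one can control $\|u\|_2$ via $\|u\|_\infty$. Thus, it remains to bound the operator norm $\|F D^r\| := \|F D^r\|_{\ell_2^m \to \ell_2^k}$, and a natural choice for $F$ is

$$F := \arg\min_{G \,:\, GE = I} \|G D^r\| = (D^{-r} E)^\dagger D^{-r}. \tag{12}$$

This so-called Sobolev dual frame was first proposed in [14]. Here $A^\dagger := (A^* A)^{-1} A^*$ is the Moore-Penrose (left) inverse of the matrix $A$. Since (12) implies that $F D^r = (D^{-r} E)^\dagger$, the singular values of $D^{-r} E$ will play a key role in this paper.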
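The construction can be checked numerically. The sketch below assumes the difference matrix has ones on the diagonal and minus ones on the subdiagonal, and that the Sobolev dual takes the form $F = (D^{-r}E)^\dagger D^{-r}$; the helper names and the random sign frame used for the demonstration are illustrative choices, not taken from the paper.

```python
import numpy as np

def difference_matrix(m):
    """m x m first-order difference matrix: 1 on the diagonal, -1 on the subdiagonal."""
    return np.eye(m) - np.eye(m, k=-1)

def sobolev_dual(E, r):
    """r-th order Sobolev dual F = (D^{-r} E)^+ D^{-r} of the m x k frame matrix E."""
    m = E.shape[0]
    D_inv_r = np.linalg.matrix_power(np.linalg.inv(difference_matrix(m)), r)
    return np.linalg.pinv(D_inv_r @ E) @ D_inv_r

rng = np.random.default_rng(0)
m, k, r = 60, 3, 2
E = rng.choice([-1.0, 1.0], size=(m, k))   # random sign frame, for illustration only
F = sobolev_dual(E, r)
D_r = np.linalg.matrix_power(difference_matrix(m), r)

# F is a left inverse of E, so unquantized coefficients are reconstructed exactly.
print(np.allclose(F @ E, np.eye(k)))

# The operator norm of F D^r equals 1 / sigma_min(D^{-r} E).
sigma_min = np.linalg.svd(np.linalg.inv(D_r) @ E, compute_uv=False)[-1]
print(np.isclose(np.linalg.norm(F @ D_r, 2), 1.0 / sigma_min))
```

The second check is the reason the smallest singular value of $D^{-r}E$ governs the reconstruction error: a larger $\sigma_{\min}(D^{-r}E)$ means a smaller amplification of the state vector $u$.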

We begin by presenting some important properties of the matrix . The following proposition is a quantitative version of Proposition 3.1 of [17].

###### Proposition 2.2.

The singular values of the matrix satisfy

###### Proof.

Note that (see, e.g., [17])

Moreover, by Weyl’s inequalities [25] on the singular values of Hermitian matrices, it holds that (see [17] for the full argument)

Combining the above inequalities, we obtain

Observing that

for establishes the lower bound.

For the upper bound, note that for and for . Thus,

∎
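For intuition, the singular values of the difference matrix can be inspected numerically. The sketch below assumes $D$ has ones on the diagonal and minus ones on the subdiagonal, uses the known closed form $\sigma_j(D) = 2\cos(j\pi/(2m+1))$ for its singular values, and compares the singular values of $D^{-r}$ with the scale $(m/j)^r$. The sizes $m = 200$ and $r = 3$ are arbitrary choices for the example.

```python
import numpy as np

m, r = 200, 3
D = np.eye(m) - np.eye(m, k=-1)
j = np.arange(1, m + 1)

# Singular values of D in decreasing order match 2*cos(j*pi/(2m+1)), j = 1..m.
sv_D = np.linalg.svd(D, compute_uv=False)
print(np.allclose(sv_D, 2 * np.cos(j * np.pi / (2 * m + 1))))

# Singular values of D^{-r} in decreasing order, relative to the scale (m/j)^r.
D_inv_r = np.linalg.matrix_power(np.linalg.inv(D), r)
sv_inv = np.linalg.svd(D_inv_r, compute_uv=False)
ratio = sv_inv / (m / j) ** r
print(ratio.min(), ratio.max())
```

The printed ratios give a rough feel for how tightly $\sigma_j(D^{-r})$ tracks $(m/j)^r$; the constants depend on $r$.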

## 3. Sub-Gaussian random matrices

Here and throughout, $x \sim \mathcal{D}$ denotes that the random variable $x$ is drawn according to a distribution $\mathcal{D}$. Furthermore, $\mathcal{N}(0, \sigma^2)$ denotes the zero-mean Gaussian distribution with variance $\sigma^2$. The following definition provides a means to compare the tail decay of two distributions.

###### Definition 3.1.

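If two random variables $\eta$ and $\xi$ satisfy $P(|\eta| > t) \leq K P(|\xi| > t)$ for some constant $K$ and all $t \geq 0$, then we say that $\eta$ is $K$-dominated by $\xi$ (or, alternatively, by the distribution of $\xi$).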

###### Definition 3.2.

A random variable is sub-Gaussian with parameter if it is -dominated by .

###### Remark 3.3.

One can also define sub-Gaussian random variables via their moments or, in case of zero mean, their moment generating functions. See [20] for a proof that all these definitions are equivalent.

###### Remark 3.4.

Examples of sub-Gaussian random variables include Gaussian random variables, all bounded random variables (such as Bernoulli), and their linear combinations.

###### Definition 3.5.

We say that a matrix is sub-Gaussian with parameter , mean and variance if its entries are independent sub-Gaussian random variables with mean , variance , and parameter .

The contraction principle (see, for example, Lemma 4.6 of [26]) will allow us to derive estimates for sub-Gaussian random variables via the corresponding results for Gaussians.

###### Lemma 3.6 (Contraction Principle).

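Let $G : \mathbb{R}_+ \to \mathbb{R}_+$ be a non-decreasing convex function and let $(\eta_i)$ and $(\xi_i)$ be two finite symmetric sequences of random variables such that there exists a constant $K \geq 1$ such that for each $i$, $\eta_i$ is $K$-dominated by $\xi_i$. Then, for any finite sequence $(x_i)$ in a Banach space equipped with a norm $\|\cdot\|$ we have

$$\mathbb{E}\, G\Big( \Big\| \sum_i \eta_i x_i \Big\| \Big) \leq \mathbb{E}\, G\Big( K \Big\| \sum_i \xi_i x_i \Big\| \Big).$$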

While the contraction principle as well as the following chaos estimate are formulated for random vectors, we mainly work with random matrices. Thus, it is convenient to “vectorize” the matrices: for a matrix , we denote by the vector formed by stacking its columns into a single column vector.
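The vectorization convention can be made concrete with a short sketch. It assumes the usual column-stacking operation (often written $\operatorname{vec}$), which in NumPy corresponds to flattening in column-major order; the example matrix is arbitrary.

```python
import numpy as np

A = np.array([[1.0, 2.0],
              [3.0, 4.0],
              [5.0, 6.0]])

# Column-major ("Fortran") order stacks the columns of A on top of each other.
vec_A = A.flatten(order="F")
print(vec_A)  # [1. 3. 5. 2. 4. 6.]

# The Frobenius norm of A equals the Euclidean norm of its vectorization.
print(np.isclose(np.linalg.norm(A, "fro"), np.linalg.norm(vec_A)))  # True
```

This identification is what lets statements about random vectors be applied to random matrices with independent entries.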

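To state the more refined chaos estimate, we need the concept of the Talagrand $\gamma_2$-functional (see, e.g., [26] for more details).

###### Definition 3.7.

For a metric space $(T, d)$, an admissible sequence of $T$ is a collection of subsets of $T$, $\{T_s : s \geq 0\}$, such that for every $s \geq 1$, $|T_s| \leq 2^{2^s}$ and $|T_0| = 1$. The $\gamma_2$ functional is defined by

$$\gamma_2(T, d) = \inf \sup_{t \in T} \sum_{s=0}^{\infty} 2^{s/2} d(t, T_s),$$

where the infimum is taken with respect to all admissible sequences of $T$.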

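Furthermore, for a set of matrices $\mathcal{M}$, we denote by $d_F(\mathcal{M})$ and $d_{2 \to 2}(\mathcal{M})$ the diameter in the Frobenius norm $\|\cdot\|_F$ and the spectral norm $\|\cdot\|_{2 \to 2}$, respectively. Here the Frobenius norm is given by $\|A\|_F = \sqrt{\operatorname{tr}(A^* A)}$ and the spectral norm is given by $\|A\|_{2 \to 2} = \sup_{\|x\|_2 = 1} \|Ax\|_2$. The following theorem is a slightly less general version of [27, Thm. 3.1].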

###### Theorem 3.8.

Let be a symmetric set of matrices, that is, , and let be a random vector whose entries are independent, sub-Gaussian random variables of mean zero, variance one, and parameter . Set

Then, for ,

The constants depend only on .

## 4. Main results

### 4.1. Estimates of singular values and operator norms

As argued above, a key quantity to control the reconstruction error both in the context of compressed sensing and frame quantization is the norm , where is the Sobolev dual of a sub-Gaussian frame and is the associated state vector. This quantity can be controlled by the product of the operator norm and the vector norm . We will estimate these quantities separately in the following two propositions. An estimate for the first quantity can be deduced from the following proposition together with two observations: First, recall that singular values are invariant under unitary transformations, so , where is the singular value decomposition of . Second, when estimated using Proposition 2.2, the singular values of are bounded exactly as in the following assumptions.

###### Proposition 4.1.

Let be an sub-Gaussian matrix with mean zero, unit variance, and parameter , let be a diagonal matrix, and let be an orthonormal matrix, both of size . Further, let and suppose that , where is a positive constant that may depend on . Then there exist constants (depending on and ) such that for and

In particular, depends only on , while can be expressed as provided .

###### Proof.

The matrix has dimensions and , so by the Courant min-max principle applied to the transpose one has

(13) |

Noting that, for , where the constant will be determined later, each -dimensional subspace intersects the span of the first standard basis vectors in at least a -dimensional space, this expression is bounded from below by

(14) | ||||

(15) |

The inequality follows from the observation that is invariant under and the smallest singular value of is . In the last step, denotes the projection of an -dimensional vector onto its first components. We note that (15), again by the Courant min-max principle, is equal to

(16) |

Now, as ,

(17) |

Thus, noting that and that by choosing we ensure that ,

(18) | ||||

(19) | ||||

(20) |

Note that this choice of also ensures , which is required above. We will estimate (20) using Theorem 3.8, similarly to the proof of [27, Thm. A.1]. Indeed, we can write

(21) |

where is a vector of length with independent sub-Gaussian entries of mean zero and unit variance, and

(22) |

In order to apply Theorem 3.8, we need to estimate, for , , , and . We obtain for :

(23) |

Furthermore, we have, for ,

(24) |

so the quantities and can be estimated in exact analogy to [27, Thm. A.1]. This yields and for some constant depending only on . With these estimates, we obtain for the quantities , , in Theorem 3.8

(25) | ||||

(26) | ||||

(27) |

so the resulting tail bound reads

(28) |

where and are the constants depending only on as they appear in Theorem 3.8. Note that , so for oversampling rates , we obtain and hence, choosing , we obtain the result

(29) |

where, as desired, the constant depends only on the sub-Gaussian parameter . ∎
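A small numerical experiment can illustrate the quantity that the proposition controls. The sketch below draws random sign (Bernoulli) matrices $E$, which are sub-Gaussian with mean zero and unit variance, and reports the median smallest singular value of $D^{-r}E$ as $m$ grows. It assumes $D$ is the difference matrix with ones on the diagonal and minus ones on the subdiagonal. The function name, the sizes, and the number of trials are arbitrary choices; the experiment only shows the trend and does not verify the constants or the probability bound.

```python
import numpy as np

def smallest_singular_value(m, k, r, rng):
    """sigma_min of D^{-r} E for an m x k random sign matrix E."""
    D = np.eye(m) - np.eye(m, k=-1)
    D_inv_r = np.linalg.matrix_power(np.linalg.inv(D), r)
    E = rng.choice([-1.0, 1.0], size=(m, k))
    return np.linalg.svd(D_inv_r @ E, compute_uv=False)[-1]

rng = np.random.default_rng(1)
k, r = 5, 2
for m in (50, 100, 200, 400):
    vals = [smallest_singular_value(m, k, r, rng) for _ in range(10)]
    # Columns: number of measurements, oversampling rate m/k, median sigma_min.
    print(m, m / k, np.median(vals))
```

Since the reconstruction error of the Sobolev dual scales with $1/\sigma_{\min}(D^{-r}E)$, growth of this quantity with the oversampling rate is what drives the error decay.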

In contrast to the term analyzed in the previous proposition, crucially depends on the quantization procedure that is employed. The procedure employed will be fundamentally different in the frameworks of compressed sensing and frame quantization. While the quantization level in the compressed sensing scheme is chosen sufficiently fine to allow for accurate support recovery via standard compressed sensing techniques, there is no need for this in the context of frame quantization and the quantization scheme employed can be coarse.

In both cases, we will employ schemes which are stable in the sense of (2). As explained in Section 2, can be controlled for such schemes via the input . More precisely, to bound , we require a bound on . Since the matrices are random we derive bounds on the operator norms that hold with high probability on the draw of .

###### Proposition 4.2.

Let be an sub-Gaussian matrix with mean zero, unit variance, and parameter , let and fix . Denote the associated oversampling rate by . Then, with probability at least , we have for all

(30) |

Here is a constant that may depend on , but that is independent of and .

###### Proof.

Since

(31) |

we will focus on bounding the norm for random vectors consisting of independent sub-Gaussian entries with parameter . Using Markov’s inequality as well as the contraction principle applied to the increasing convex function and the sequence of standard basis vectors, we reduce to the case of a -dimensional random vector :

(32) | ||||

(33) | ||||

(34) | ||||

(35) | ||||

(36) |

where we set to obtain the third inequality. Applying a union bound over the rows and specifying we obtain for sufficiently large:

(37) | ||||

(38) | ||||

(39) | ||||

(40) |

where we used that is independent of and grows superlinearly, so above some threshold , can absorb both and .

∎

###### Remark 4.3.

Clearly, when the entries of are bounded random variables, there exists a finite, deterministic upper bound on the operator norm