Column normalization of a random measurement matrix
In this note we answer a question of G. Lecué, by showing that column normalization of a random matrix with iid entries need not lead to good sparse recovery properties, even if the generating random variable has a reasonable moment growth. Specifically, for every we construct a random vector with iid, mean-zero, variance coordinates, that satisfies for every . We show that if and is the column-normalized matrix generated by independent copies of , then with probability at least , does not satisfy the exact reconstruction property of order .
Sparse Recovery is one of the most important research topics in modern signal processing. It focuses on the possibility of identifying a sparse signal—i.e., a signal that is supported on relatively few coordinates in relative to the standard basis—using linear measurements. We refer the reader to the books [MR2807761, fora13] for extensive surveys on sparse recovery and related topics.
In a basic sparse recovery problem one pre-selects an matrix that generates the given data. For an unknown (sparse) vector , the coordinates of the vector are the linear measurements of one may use for recovery. The hope is that for a well chosen , the resulting linear measurements would be enough to identify , and because is sparse, the number of measurements required for recovery should be significantly smaller than the dimension .
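One of the main achievements of the theory of sparse recovery was the introduction of a convex optimization problem called basis pursuit, which is an effective recovery procedure: it selects $\hat{x}$ that solves the minimization problem
$$\min \|t\|_1 \quad \text{subject to} \quad \Gamma t = \Gamma x, \tag{1.1}$$
where $\Gamma$ is the measurement matrix, $x$ is the unknown vector, and we denote by $\|t\|_p = \left(\sum_{i=1}^d |t_i|^p\right)^{1/p}$.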
Extensive effort has been devoted to the question of finding conditions on the measurement matrix that ensure the recovery of any sparse vector; more accurately, one would like to guarantee that for every -sparse vector , the minimization problem (1.1) has a unique solution— itself.
Let be the set of -sparse vectors in . A matrix satisfies the exact reconstruction property of order if for every there is a unique solution to the minimization problem (1.1) and that unique solution is .
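A minimal sketch of how the definition can fail, under stated assumptions: if some vector other than the sparse vector produces the same measurements and has an l1 norm that is no larger, then the sparse vector cannot be the unique solution of the minimization problem (1.1). The helper names (`apply_matrix`, `l1_norm`, `is_witness`) and the toy matrix below are hypothetical and serve only as an illustration; they are not taken from the paper.

```python
def l1_norm(v):
    """Sum of absolute values of the coordinates of v."""
    return sum(abs(c) for c in v)


def apply_matrix(gamma, v):
    """Matrix-vector product for a matrix stored as a list of rows."""
    return [sum(g * c for g, c in zip(row, v)) for row in gamma]


def is_witness(gamma, x, t, tol=1e-12):
    """Return True if t certifies that x is NOT the unique l1 minimizer.

    That is: t differs from x, gives the same measurements as x,
    and its l1 norm does not exceed that of x.
    """
    gx, gt = apply_matrix(gamma, x), apply_matrix(gamma, t)
    same_measurements = all(abs(a - b) <= tol for a, b in zip(gx, gt))
    different = any(abs(a - b) > tol for a, b in zip(x, t))
    return different and same_measurements and l1_norm(t) <= l1_norm(x) + tol


# Toy matrix (an assumption for illustration): columns 0 and 1 are identical,
# so the 1-sparse vectors e_0 and e_1 cannot be told apart by their measurements.
toy_gamma = [
    [1.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
]
e0 = [1.0, 0.0, 0.0]
e1 = [0.0, 1.0, 0.0]
print(is_witness(toy_gamma, e0, e1))
```

In this toy case the matrix already fails exact reconstruction of order 1; the construction in the note is subtler, but the failure is certified in the same way, by exhibiting a competing vector.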
Because measurements are ‘expensive’, one would like to find matrices that satisfy the exact reconstruction property of order with the smallest number of measurements (rows) possible. One may show that if satisfies the exact reconstruction property of order , then it must have at least rows. Moreover, typical realizations of various random matrices with rows indeed satisfy the exact reconstruction property of order (see, e.g., [fora13]). Therefore, the optimal number of measurements required for the exact reconstruction property of order is , and that number serves as the benchmark for an optimal measurement matrix.
The question we are interested in has to do with the normalization of the columns of the measurement matrix. It is often assumed in the literature that the columns of have unit Euclidean norm (see, for example, [MR2807761] and [fora13] and references therein); i.e., if is the standard basis in then for . Column normalization appears frequently in various notions used in the study of the exact reconstruction property. Among these well-studied notions are coherence [fora13]; the restricted eigenvalue condition [MR2533469]; and the compatibility condition [MR2807761]. Moreover, in many real-world applications, measurement matrices with normalized columns tend to perform better than matrices whose columns have not been normalized.
While column normalization seems a natural idea, it adds substantial technical difficulties when studying random measurement matrices: normalizing the columns of a matrix with independent rows introduces additional dependencies. Despite the added difficulties, the results of [LM-JEMS] highlight the possibility that column normalization may still have a significant role to play in the context of random measurement matrices, particularly in heavy-tailed situations.
To formulate the results of [LM-JEMS] and explain their connection to column-normalization we need the following definition:
Let be a random variable. Given an integer , let , , be independent copies of . The random matrix generated by is . Also, we denote by a vector with independent copies of ; thus the rows of are independent copies of .
The following result from [LM-JEMS] constructs random matrices that are generated by seemingly nice random variables and yet exhibit poor reconstruction properties.
There exist absolute constants and for which the following holds. For every there is a mean-zero, variance one random variable that satisfies
For every and every ,
If then with probability , does not satisfy the exact reconstruction property of order .
Theorem 1.3 implies that without assuming that each has a subgaussian moment growth
To put Theorem 1.3 in perspective, it is complemented by a positive result that holds once linear forms have enough subgaussian moments.
Let be a mean-zero, variance one random variable. Assume that for every and every ,
then with probability at least , satisfies the exact reconstruction property of order . Here, is an absolute constant, and and are constants that depend only on .
It follows from Theorem 1.4 that if has a slightly better moment growth condition than in Theorem 1.3—a subgaussian growth up to —the random measurement matrix generated by satisfies the exact reconstruction property of order , for the optimal number of measurements .
The connection with column-normalization arises from the main observation used in the proof of Theorem 1.4:
Recall that denotes the set of -sparse vectors in . Let . If
(a) for every ,
(b) for every ,
and , then satisfies the exact reconstruction property of order .
Lemma 1.5 gives a clear motivation for considering column-normalized random measurement matrices, and that motivation grows stronger when taking into account the proof of Theorem 1.4. It turns out that the ‘bottleneck’ in the proof is the upper bound on , while guaranteeing (a) requires a rather minimal small-ball assumption. Therefore, the seemingly more restrictive condition (a) is almost universally true (see [MenACM, LM-JEMS] for more details) and (b) is the only place in which the moment growth assumption is used in the proof of Theorem 1.4.
Clearly, column normalization resolves the issue of an upper estimate on . That, together with the fact that (a) is true under minimal assumptions, has led G. Lecué [Lec] to ask whether, with column normalization, the moment growth condition (1.2) can be relaxed significantly, leading to a much stronger version of Theorem 1.4.
Let be a mean-zero, variance random variable, set to be the matrix generated by and let be the column-normalized matrix generated by . Thus, the entries of are
If for every and , does satisfy the exact reconstruction property of order with high probability?
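The column-normalization step in the question can be sketched as follows: every column of the generated matrix is divided by its Euclidean norm, so that each column of the resulting matrix has unit norm. This is a minimal illustration, assuming entries drawn as symmetric signs; the function names `random_matrix`, `normalize_columns` and `rademacher` are hypothetical, and the choice of entry distribution is an assumption rather than the random variable constructed in the note.

```python
import math
import random


def rademacher(rng):
    """A symmetric {-1, 1}-valued sample (illustrative choice of entry law)."""
    return rng.choice((-1.0, 1.0))


def random_matrix(m, d, sample, rng):
    """An m x d matrix (list of rows) with iid entries drawn by `sample`."""
    return [[sample(rng) for _ in range(d)] for _ in range(m)]


def normalize_columns(matrix):
    """Divide each column by its Euclidean norm; reject zero columns."""
    m, d = len(matrix), len(matrix[0])
    norms = [math.sqrt(sum(matrix[i][j] ** 2 for i in range(m))) for j in range(d)]
    if any(norm == 0.0 for norm in norms):
        raise ValueError("cannot normalize a zero column")
    return [[matrix[i][j] / norms[j] for j in range(d)] for i in range(m)]


rng = random.Random(0)
gamma = random_matrix(4, 6, rademacher, rng)
gamma_tilde = normalize_columns(gamma)
column_norms = [
    math.sqrt(sum(gamma_tilde[i][j] ** 2 for i in range(4))) for j in range(6)
]
print(column_norms)
```

Since each column is rescaled separately, multiplying the whole matrix by a positive constant before normalizing does not change the outcome, so any global scaling convention for the generated matrix is immaterial here.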
There exist absolute constants and for which the following holds. For every there is a symmetric, variance random variable with the following properties:
(1) If are independent copies of and , then for every and every , .
(2) If , then with probability at least , the column-normalized matrix generated by does not satisfy the exact reconstruction property of order .
Theorem 1.7 answers Question 1.6 in the negative: column normalization does not improve the poor behaviour described in Theorem 1.3. Indeed, for , linear forms satisfy an norm equivalence, but the recovery of -sparse vectors using requires at least measurements — significantly larger than the optimal number of measurements, . Moreover, if and , then although for every , the recovery of -sparse vectors using requires at least measurements, which, again, is suboptimal when .
Let us mention the straightforward observation that a version of Theorem 1.4 holds for column-normalized matrices as well.
Let and be as in Theorem 1.4 and let be the column-normalized measurement matrix generated by . If , then with probability at least , satisfies the exact reconstruction property of order .
Let be a symmetric, -valued random variable, set to be a -valued random variable with mean and let ; the values of and will be specified later. Let
let be independent copies of and set .
Let us identify conditions under which satisfies the first part of Theorem 1.7.
There exists an absolute constant for which the following holds. Assume that and that there is such that for every , . Then for every and every ,
Moreover, for every , , and .
In particular, is an isotropic random vector and for every , exhibits a -subgaussian moment growth up to the -th moment.
The proof of Lemma 2.1 is based on a simple comparison argument:
Let be centred, independent random variables and assume are also centred and independent. If is even and for every and every , , then for every ,
Proof. By a standard symmetrization argument we may also assume that and are symmetric. Therefore,
with the sum taken over all choices of , where and is the appropriate multinomial coefficient. Since are symmetric, the only products that do not vanish are when are even, and if are even then
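The vanishing of the odd terms in the multinomial expansion can be checked by hand in a small case. The sketch below takes the simplest symmetric variables, independent random signs (an assumption made for illustration; the lemma covers general symmetric variables), and compares a brute-force computation of the p-th moment of a weighted sum with the sum over multi-indices whose entries are all even. The function names are hypothetical.

```python
from itertools import product
from math import factorial


def moment_by_enumeration(a, p):
    """E (sum_i a_i * eps_i)^p for independent random signs, by brute force."""
    total = 0
    for signs in product((-1, 1), repeat=len(a)):
        total += sum(s * c for s, c in zip(signs, a)) ** p
    return total / 2 ** len(a)


def compositions(p, n):
    """Yield all tuples of n non-negative integers that sum to p."""
    if n == 1:
        yield (p,)
        return
    for first in range(p + 1):
        for rest in compositions(p - first, n - 1):
            yield (first,) + rest


def moment_by_even_terms(a, p):
    """Multinomial expansion restricted to multi-indices with even entries.

    For a random sign eps, E eps^b equals 1 when b is even and 0 when b is odd,
    so only the even multi-indices contribute.
    """
    total = 0
    for beta in compositions(p, len(a)):
        if any(b % 2 for b in beta):
            continue
        term = factorial(p)
        for b in beta:
            term //= factorial(b)  # builds the multinomial coefficient exactly
        for c, b in zip(a, beta):
            term *= c ** b
        total += term
    return total


weights = [1, 2, 3]
print(moment_by_enumeration(weights, 4), moment_by_even_terms(weights, 4))
```

Because every surviving term is a product of even moments with a non-negative coefficient, a term-by-term comparison of even moments carries over to the whole sum, which is the mechanism behind the comparison argument.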
Proof of Lemma 2.1. Observe that is mean-zero and that . Hence, if and then —and the “moreover” part of the claim follows.
Turning to the first part of the claim, let be independent copies of , set to be a standard gaussian random variable and let be independent copies of . Recall that for every , , and observe that
The key part in the construction is the following lemma which describes the typical structure of the matrix generated by ,
There exist absolute constants and for which the following holds. Let and . Then, with probability at least :
(1) there are indices and such that and for , ;
(2) there is a subset of cardinality such that for every and ;
(3) we have that , where and .
If satisfies Lemma 2.3 then its column-normalized version does not satisfy the exact reconstruction property of order .
Proof. Using the notation of Lemma 2.3 and by its first part, ; hence, if we denote by the standard basis of ,
If set ; otherwise, set . In either case, is -sparse. Let and observe that the coordinates of satisfy that
Next, let be the set of coordinates given by the second part of Lemma 2.3. Clearly, and
is an Bernoulli matrix. Therefore,
Observe that and by the third part of Lemma 2.3
for an absolute constant .
Hence, if then . Since and , it is evident that is not the unique solution of the minimization problem
and does not satisfy the exact reconstruction property of order .
The proof of Lemma 2.3 uses a standard fact on iid -valued random variables: if are independent copies of a -valued random variable and then with probability at least , .
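The standard fact quoted above is a concentration statement for the number of successes among independent binary trials. A minimal simulation, under loudly stated assumptions: the symbols `n` (number of trials) and `delta` (success probability) are hypothetical names, and the window between half and one and a half times the expected count is an illustrative choice, since the exact inequality is not reproduced in the text above.

```python
import random


def count_successes(n, delta, rng):
    """Number of successes among n independent trials with success probability delta."""
    return sum(1 for _ in range(n) if rng.random() < delta)


rng = random.Random(0)
n, delta = 20000, 0.05
count = count_successes(n, delta, rng)
expected = n * delta

# Assumed illustrative window around the mean (not a constant taken from the paper).
lower, upper = 0.5 * expected, 1.5 * expected
print(count, lower <= count <= upper)
```

With these parameters the expected count is 1000 and the standard deviation is roughly 31, so landing outside the window would be an extremely rare event, in line with an exponentially small failure probability.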
Proof of Lemma 2.3. Let be independent copies of , let be the indicator of the event
Observe that and that if are independent copies of and then with probability at least , . In particular, on that event, the matrix has at least two identical columns, each with a single entry of . Therefore, the first part of Lemma 2.3 holds if
For the second part of the lemma, let be the indicator of the event
and note that . If are independent copies of and then with probability at least , . Hence, if
then with probability at least , there is and for every and every , .
Turning to the third part of the lemma, and by applying the second part, we have that for , . Let and recall that are independent of . Therefore, by Corollary 4.1 from [LPRP], there are absolute constants and for which, with probability at least ,
Finally, all that remains is to see when (2.1) and (2.2) are satisfied. It is straightforward to verify that if for then (2.1) holds, and if then (2.2) holds. Therefore, both conditions are satisfied with the choice of for a suitable absolute constant , as long as .
It follows from Corollary 2.4 that with probability at least , the column-normalized matrix generated by does not satisfy the exact reconstruction property of order . To complete the proof, all that remains is to show that also satisfies the assumptions of Lemma 2.1: that for every and for an absolute constant .
To that end, let and observe that is decreasing when ; hence, for every as long as . Therefore, if we set then for every , as required.
Appendix A Proof of Theorem 1.9
The proof is a direct consequence of the argument used in the proof of Theorem 1.4. Thanks to column normalization, satisfies (b) in Lemma 1.5 for . All that is left to verify is (a) for which is a constant that depends only on .
The proof of Theorem 1.4 shows that if has independent rows that are distributed as then with probability at least ,
Also, with probability at least ,
For every , set
which is also an -sparse vector. Observe that , implying that
and (a) from Lemma 1.5 is verified for the matrix for .
Department of Mathematics, Technion, I.I.T., Haifa, Israel and Mathematical Sciences Institute, The Australian National University, Canberra, Australia, Email: email@example.com
Supported in part by the Israel Science Foundation.
- Recall that a characterization of an -subgaussian random variable is that for every