# Column normalization of a random measurement matrix

## Abstract

In this note we answer a question of G. Lecué, by showing that column normalization of a random matrix with iid entries need not lead to good sparse recovery properties, even if the generating random variable has a reasonable moment growth. Specifically, for every we construct a random vector with iid, mean-zero, variance coordinates, that satisfies for every . We show that if and is the column-normalized matrix generated by independent copies of , then with probability at least , does not satisfy the exact reconstruction property of order .

## 1 Introduction

Sparse recovery is one of the most important research topics in modern signal processing. It focuses on the possibility of identifying a sparse signal (i.e., a signal that is supported on relatively few coordinates in $\mathbb{R}^d$ relative to the standard basis) using linear measurements. We refer the reader to the books [MR2807761, fora13] for extensive surveys on sparse recovery and related topics.

In a basic sparse recovery problem one pre-selects an $m \times d$ matrix $\Gamma$ that generates the given data. For an unknown (sparse) vector $v \in \mathbb{R}^d$, the coordinates of the vector $\Gamma v$ are the linear measurements of $v$ one may use for recovery. The hope is that for a well-chosen $\Gamma$, the resulting linear measurements would be enough to identify $v$, and because $v$ is sparse, the number of measurements required for recovery should be significantly smaller than the dimension $d$.

One of the main achievements of the theory of sparse recovery was the introduction of a convex optimization problem called basis pursuit, which is an effective recovery procedure: it selects a vector $t \in \mathbb{R}^d$ that solves the minimization problem

$$\min \|t\|_1 \quad \text{subject to} \quad \Gamma v = \Gamma t, \tag{1.1}$$

where we denote by .
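As a brief aside on why (1.1) is computationally tractable, here is the standard reformulation of basis pursuit as a linear program. The auxiliary vector $u$ is a symbol introduced only for this sketch; it dominates the coordinates of $t$ in absolute value, so that at the optimum $u_j = |t_j|$.

```latex
\min_{t,\,u \in \mathbb{R}^d} \ \sum_{j=1}^{d} u_j
\quad \text{subject to} \quad
\Gamma t = \Gamma v,
\qquad -u_j \le t_j \le u_j \quad (1 \le j \le d).
```

Both the objective and the constraints are linear in $(t,u)$, which is what makes the procedure effective in practice.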

Extensive effort has been devoted to the question of finding conditions on the measurement matrix that ensure the recovery of any sparse vector; more accurately, one would like to guarantee that for every $s$-sparse vector $v$, the minimization problem (1.1) has a unique solution, namely $v$ itself.

###### Definition 1.1

Let $\Sigma_s$ be the set of $s$-sparse vectors in $\mathbb{R}^d$. A matrix $\Gamma$ satisfies the exact reconstruction property of order $s$ if for every $v \in \Sigma_s$ there is a unique solution to the minimization problem (1.1) and that unique solution is $v$.

Because measurements are ‘expensive’, one would like to find matrices that satisfy the exact recovery property of order with the smallest number of measurements (rows) possible. One may show that if satisfies the exact reconstruction property of order , then it must have at least rows. Moreover, typical realizations of various random matrices with rows indeed satisfy the exact reconstruction property of order (see, e.g., [fora13]). Therefore, the optimal number of measurements required for the exact reconstruction property of order is , and that number serves as the benchmark for an optimal measurement matrix.

The question we are interested in has to do with the normalization of the columns of the measurement matrix. It is often assumed in the literature that the columns of $\Gamma$ have unit Euclidean norm (see, for example, [MR2807761] and [fora13] and references therein); i.e., if $(e_j)_{j=1}^d$ is the standard basis in $\mathbb{R}^d$ then $\|\Gamma e_j\|_2 = 1$ for $1 \le j \le d$. Column normalization appears frequently in various notions used in the study of the exact reconstruction property. Among these well-studied notions are coherence [fora13]; the restricted eigenvalues condition [MR2533469]; and the compatibility condition [MR2807761]. Moreover, in many real-world applications, measurement matrices with normalized columns tend to perform better than matrices whose columns have not been normalized.

While column normalization seems a natural idea, it adds substantial technical difficulties when studying random measurement matrices: normalizing the columns of a matrix with independent rows introduces additional dependencies. Despite the added difficulties, the results of [LM-JEMS] highlight the possibility that column normalization may still have a significant role to play in the context of random measurement matrices, particularly in heavy-tailed situations.

To formulate the results of [LM-JEMS] and explain their connection to column-normalization we need the following definition:

###### Definition 1.2

Let be a random variable. Given an integer , let , , be independent copies of . The random matrix generated by is . Also, we denote by a vector with independent copies of ; thus the rows of are independent copies of .

The following result from [LM-JEMS] constructs random matrices that are generated by seemingly nice random variables and yet exhibit poor reconstruction properties.

###### Theorem 1.3

There exist absolute constants and for which the following holds. For every there is a mean-zero, variance one random variable that satisfies

For every and every ,

$$\|\langle X,t\rangle\|_{L_q} \le c_2\sqrt{q}\,\|\langle X,t\rangle\|_{L_2} = c_2\sqrt{q}.$$

If then with probability , does not satisfy the exact reconstruction property of order .

Theorem 1.3 implies that without assuming that each has a subgaussian moment growth (see footnote 3) up to the -moment for close to , the resulting measurement matrix is suboptimal. Indeed, under a modest assumption, say that for every , the recovery of -sparse vectors requires at least measurements. And, if for large enough, then the number of measurements required for the recovery of -sparse vectors is at least , which is suboptimal when .

To put Theorem 1.3 in some perspective, it is complemented by a positive result, once linear forms have enough subgaussian moments.

###### Theorem 1.4

Let be a mean-zero, variance one random variable. Assume that for every and every ,

$$\|\langle X,t\rangle\|_{L_q} \le L\sqrt{q}\,\|\langle X,t\rangle\|_{L_2} = L\sqrt{q}. \tag{1.2}$$

If

$$m \ge c_5\, s\log(ed/s),$$

then with probability at least , satisfies the exact reconstruction property of order . Here, is an absolute constant and and are constants that depend only on .

It follows from Theorem 1.4 that if has a slightly better moment growth condition than in Theorem 1.3—a subgaussian growth up to —the random measurement matrix generated by satisfies the exact reconstruction property of order , for the optimal number of measurements .

The connection with column-normalization arises from the main observation used in the proof of Theorem 1.4:

###### Lemma 1.5

Recall that denotes the set of -sparse vectors in . Let . If

(a) for every ,

(b) for every ,

and , then satisfies the exact reconstruction property of order .

Lemma 1.5 gives a clear motivation for considering column-normalized random measurement matrices, and that motivation grows stronger when taking into account the proof of Theorem 1.4. It turns out that the ‘bottleneck’ in the proof is the upper bound on , while guaranteeing (a) requires a rather minimal small-ball assumption. Therefore, the seemingly more restrictive condition (a) is almost universally true (see [MenACM, LM-JEMS] for more details) and (b) is the only place in which the moment growth assumption is used in the proof of Theorem 1.4.

Clearly, column normalization resolves the issue of an upper estimate on . That, and the fact that (a) is true under minimal assumptions has led G. Lecué [Lec] to ask whether with column normalization, the moment growth condition (1.2) can be relaxed significantly, leading to a much stronger version of Theorem 1.4.

###### Question 1.6

Let be a mean-zero, variance random variable, set to be the matrix generated by and let be the column-normalized matrix generated by . Thus, the entries of are

$$\tilde{\Gamma}_{ij} = \frac{x_{ij}}{\bigl(\sum_{\ell=1}^{m} x_{\ell j}^2\bigr)^{1/2}} = \frac{\Gamma_{ij}}{\|\Gamma e_j\|_2}.$$

If for every and , does satisfy the exact reconstruction property of order with high probability?
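The column normalization in Question 1.6 can be sketched in a few lines of standard-library Python. The helper name `column_normalize` and the choice of Gaussian entries are assumptions made only for illustration; the text allows any mean-zero random variable as the generator.

```python
import math
import random


def column_normalize(gamma):
    """Return a copy of the m x d matrix `gamma` (a list of rows) whose
    columns all have unit Euclidean norm: entry (i, j) is divided by the
    Euclidean norm of column j."""
    m, d = len(gamma), len(gamma[0])
    norms = [math.sqrt(sum(gamma[i][j] ** 2 for i in range(m))) for j in range(d)]
    if any(n == 0.0 for n in norms):
        raise ValueError("cannot normalize a zero column")
    return [[gamma[i][j] / norms[j] for j in range(d)] for i in range(m)]


# Illustrative generator (an assumption): iid standard Gaussian entries.
rng = random.Random(0)
m, d = 5, 8
gamma = [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(m)]
gamma_tilde = column_normalize(gamma)

# Euclidean norms of the columns of the normalized matrix.
col_norms = [math.sqrt(sum(gamma_tilde[i][j] ** 2 for i in range(m))) for j in range(d)]
```

Note that every entry of a normalized column depends on the whole column, which is the source of the additional dependencies mentioned in the introduction.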

Our main result is a version of Theorem 1.3 for a column-normalized matrix generated by a well-chosen random variable, showing that the answer to Question 1.6 is negative.

###### Theorem 1.7

There exist absolute constants and for which the following holds. For every there is a symmetric, variance random variable with the following properties:

If are independent copies of and , then for every and every , .

If , then with probability at least , the column-normalized matrix generated by does not satisfy the exact reconstruction property of order .

Theorem 1.7 answers Question 1.6 in the negative: column normalization does not improve the poor behaviour described in Theorem 1.3. Indeed, for , linear forms satisfy an norm equivalence, but the recovery of -sparse vectors using requires at least measurements — significantly larger than the optimal number of measurements, . Moreover, if and , then although for every , the recovery of -sparse vectors using requires at least measurements, which, again, is suboptimal when .

###### Remark 1.8

Theorem 1.7 actually improves the estimates from Theorem 1.3: a logarithmic factor in the bound on the number of measurements is removed, and the probability estimate is significantly better: rather than constant probability.

Let us mention the straightforward observation that a version of Theorem 1.4 holds for column-normalized matrices as well.

###### Theorem 1.9

Let $x$ and $L$ be as in Theorem 1.4 and let $\tilde{\Gamma}$ be the column-normalized measurement matrix generated by $x$. If , then with probability at least , $\tilde{\Gamma}$ satisfies the exact reconstruction property of order .

Theorem 1.9 is an immediate consequence of the proof of Theorem 1.4; its proof is presented in Appendix A merely for the sake of completeness.

## 2 Proof of Theorem 1.7

Let $\varepsilon$ be a symmetric, $\{-1,1\}$-valued random variable, set $\eta$ to be a $\{0,1\}$-valued random variable with mean $\delta$ and let ; the values of $\delta$ and $R$ will be specified later. Let

$$x = \varepsilon \cdot \max\{1, \eta R\},$$

let be independent copies of and set .
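To make the construction concrete, here is a minimal sampler for the variable $x=\varepsilon\cdot\max\{1,\eta R\}$ and for a matrix with iid entries distributed as $x$. It reads $\varepsilon$ as a symmetric sign, $\eta$ as a 0/1 indicator with mean `delta`, takes the two to be independent and assumes $R \ge 1$; these readings, the function names `sample_x` and `sample_matrix`, and the numerical values are assumptions for illustration only.

```python
import random


def sample_x(R, delta, rng):
    """Draw one copy of x = eps * max(1, eta * R).

    eps is a symmetric +-1 sign and eta is a 0/1 indicator with
    P(eta = 1) = delta, drawn independently (assumed reading)."""
    eps = rng.choice((-1, 1))
    eta = 1 if rng.random() < delta else 0
    return eps * max(1, eta * R)


def sample_matrix(m, d, R, delta, rng):
    """m x d matrix (list of rows) with iid entries distributed as x."""
    return [[sample_x(R, delta, rng) for _ in range(d)] for _ in range(m)]


rng = random.Random(1)
R, delta = 50, 0.05  # illustrative values, not the ones chosen in the proof
gamma = sample_matrix(20, 200, R, delta, rng)

# Every entry is either a sign (eta = 0) or a sign times R (eta = 1).
values = {entry for row in gamma for entry in row}
```

Under these assumptions most entries are plain signs, with an occasional entry of absolute value $R$; the proof exploits exactly these rare large entries.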

Let us identify conditions under which satisfies the first part of Theorem 1.7.

###### Lemma 2.1

There exists an absolute constant for which the following holds. Assume that and that there is such that for every , . Then for every and every ,

$$\|\langle X,t\rangle\|_{L_q} \le c_0 L\sqrt{q}\,\|\langle X,t\rangle\|_{L_2}.$$

Moreover, for every , , and .

In particular, is an isotropic random vector and for every , exhibits a -subgaussian moment growth up to the -th moment.

The proof of Lemma 2.1 is based on a simple comparison argument:

###### Lemma 2.2

Let be centred, independent random variables and assume are also centred and independent. If is even and for every and every , , then for every ,

$$\Bigl\|\sum_{j=1}^{d} t_j x_j\Bigr\|_{L_p} \le L\,\Bigl\|\sum_{j=1}^{d} t_j z_j\Bigr\|_{L_p}.$$

Proof.  By a standard symmetrization argument we may also assume that and are symmetric. Therefore,

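$$\mathbb{E}\Bigl(\sum_{j=1}^{d} t_j x_j\Bigr)^{p} = \mathbb{E}\sum_{\vec{\beta}} c_{\vec{\beta}} \prod_{j=1}^{d} t_j^{\beta_j} x_j^{\beta_j} = \sum_{\vec{\beta}} c_{\vec{\beta}} \prod_{j=1}^{d} t_j^{\beta_j}\,\mathbb{E}x_j^{\beta_j},$$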

with the sum taken over all choices of , where and is the appropriate multinomial coefficient. Since are symmetric, the only products that do not vanish are when are even, and if are even then

$$\prod_{j=1}^{d} t_j^{\beta_j}\,\mathbb{E}x_j^{\beta_j} \le \prod_{j=1}^{d} t_j^{\beta_j} L^{\beta_j}\,\mathbb{E}z_j^{\beta_j}.$$

Therefore,

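$$\sum_{\vec{\beta}} c_{\vec{\beta}} \prod_{j=1}^{d} t_j^{\beta_j}\,\mathbb{E}x_j^{\beta_j} \le L^{p}\sum_{\vec{\beta}} c_{\vec{\beta}} \prod_{j=1}^{d} t_j^{\beta_j}\,\mathbb{E}z_j^{\beta_j} = L^{p}\,\mathbb{E}\Bigl(\sum_{j=1}^{d} t_j z_j\Bigr)^{p}.$$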

Proof of Lemma 2.1. Observe that is mean-zero and that . Hence, if and then —and the “moreover” part of the claim follows.

Turning to the first part of the claim, let be independent copies of , set to be a standard gaussian random variable and let be independent copies of . Recall that for every , , and observe that

$$\bigl(\mathbb{E}|x|^{q}\bigr)^{1/q} \le 1 + R\delta^{1/q} \le 2L\sqrt{q} \le c_1 L\bigl(\mathbb{E}|g|^{q}\bigr)^{1/q}.$$

Therefore, and satisfy the conditions of Lemma 2.2 with a constant . Applying Lemma 2.2, it follows that for every and every ,

$$\Bigl\|\sum_{j=1}^{d} t_j x_j\Bigr\|_{L_q} \le c_1 L\,\Bigl\|\sum_{j=1}^{d} t_j g_j\Bigr\|_{L_q} \le c_2 L\sqrt{q};$$

thus, .
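For the reader's convenience, the first inequality in the moment estimate for $x$ used in the proof of Lemma 2.1 can be checked directly. The sketch below assumes that $\eta$ takes values in $\{0,1\}$ with mean $\delta$ and that $R \ge 1$, so that $|x| = R$ with probability $\delta$ and $|x| = 1$ otherwise; it also uses the elementary inequality $(a+b)^{1/q} \le a^{1/q} + b^{1/q}$ for $a,b \ge 0$ and $q \ge 1$.

```latex
\mathbb{E}|x|^{q} = (1-\delta)\cdot 1 + \delta R^{q} \le 1 + \delta R^{q},
\qquad \text{hence} \qquad
\bigl(\mathbb{E}|x|^{q}\bigr)^{1/q}
\le \bigl(1 + \delta R^{q}\bigr)^{1/q}
\le 1 + R\,\delta^{1/q}.
```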

The key part in the construction is the following lemma which describes the typical structure of the matrix generated by ,

$$\Gamma = (x_{ij})_{1 \le i \le m,\ 1 \le j \le d} : \mathbb{R}^{d} \to \mathbb{R}^{m}.$$

###### Lemma 2.3

There exist absolute constants and for which the following holds. Let and . Then, with probability at least :

(1) there are indices and such that and for , ;

(2) there is a subset of cardinality such that for every and ;

(3) we have that , where and .

###### Corollary 2.4

If satisfies Lemma 2.3 then its column-normalized version does not satisfy the exact reconstruction property of order .

Proof.  Using the notation of Lemma 2.3 and by its first part, ; hence, if we denote by the standard basis of ,

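$$\tilde{\Gamma}e_{j_1} = \frac{1}{(R^{2}+m-1)^{1/2}}\Bigl(\varepsilon_{\ell j_1} R f_{\ell} + \sum_{i \ne \ell} \varepsilon_{i j_1} f_i\Bigr)$$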

and

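$$\tilde{\Gamma}e_{j_2} = \frac{1}{(R^{2}+m-1)^{1/2}}\Bigl(\varepsilon_{\ell j_2} R f_{\ell} + \sum_{i \ne \ell} \varepsilon_{i j_2} f_i\Bigr).$$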

If set ; otherwise, set . In either case, is -sparse. Let and observe that the coordinates of satisfy that

$$w_{\ell} = 0 \quad \text{and} \quad w_i^{2} \le \frac{1}{R^{2}+m-1} \quad \text{for } i \ne \ell;$$

therefore,

$$\tilde{\Gamma}v \in \frac{\sqrt{m}}{R}\,B_2^{m}.$$

Next, let be the set of coordinates given by the second part of Lemma 2.3. Clearly, and

$$\Gamma_J = (x_{ij})_{1 \le i \le m,\ j \in J} = (\varepsilon_{ij})_{1 \le i \le m,\ j \in J}$$

is an Bernoulli matrix. Therefore,

$$\tilde{\Gamma}_J = (\tilde{\Gamma}_{ij})_{1 \le i \le m,\ j \in J} = \frac{\Gamma_J}{\sqrt{m}}.$$

Observe that and by the third part of Lemma 2.3

$$\frac{c}{\sqrt{m}}\,B_2^{m} \subset \frac{1}{\sqrt{m}}\,\Gamma B_1^{J}$$

for an absolute constant .

Hence, if then . Since and , it is evident that is not the unique solution of the minimization problem

$$\min \|t\|_1 \quad \text{subject to} \quad \tilde{\Gamma}v = \tilde{\Gamma}t$$

and does not satisfy the exact reconstruction property of order .
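The cancellation at the heart of Corollary 2.4 can be seen numerically. The sketch below builds two sign columns that share a single entry of absolute value $R$ in the same row, normalizes them, and combines them so that the two large entries cancel. The helper names, the parameter values and the unscaled choice $v = e_{j_1} \pm e_{j_2}$ are assumptions for illustration; with this choice the norm bound holds up to a factor of 2 relative to the display above.

```python
import math
import random


def two_spiked_columns(m, R, ell, rng):
    """Two random sign columns of length m whose entry in row `ell`
    is multiplied by R (the remaining entries stay +-1)."""
    cols = []
    for _ in range(2):
        col = [rng.choice((-1, 1)) for _ in range(m)]
        col[ell] *= R
        cols.append(col)
    return cols


def normalize(col):
    """Divide a column by its Euclidean norm."""
    norm = math.sqrt(sum(c * c for c in col))
    return [c / norm for c in col]


rng = random.Random(2)
m, R, ell = 30, 1000, 7  # illustrative values only
c1, c2 = two_spiked_columns(m, R, ell, rng)
u1, u2 = normalize(c1), normalize(c2)

# Pick the sign so that the two large entries in row `ell` cancel.
sign = -1 if c1[ell] * c2[ell] > 0 else 1
w = [a + sign * b for a, b in zip(u1, u2)]  # image of the 2-sparse vector v
w_norm = math.sqrt(sum(c * c for c in w))
```

Here $\|v\|_1 = 2$ while the image `w` has Euclidean norm of order $\sqrt{m}/R$, so once $R$ is large the image is small enough to be matched by a vector of smaller $\ell_1$ norm supported on the Bernoulli block.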

The proof of Lemma 2.3 uses a standard fact on iid -valued random variables: if are independent copies of a -valued random variable and then with probability at least , .

Proof of Lemma 2.3. Let be independent copies of , let be the indicator of the event

$$\exists\, \ell \in \{1,\dots,m\}: \quad \eta_{\ell} = 1 \quad \text{and} \quad \eta_i = 0 \ \text{for every } i \ne \ell.$$

Observe that and that if are independent copies of and then with probability at least , . In particular, on that event, the matrix has at least two identical columns, each with a single entry of . Therefore, the first part of Lemma 2.3 holds if

$$m\delta(1-\delta)^{m-1} \ge \frac{2m}{d}. \tag{2.1}$$

For the second part of the lemma, let be the indicator of the event

$$\eta_i = 0 \quad \text{for every } 1 \le i \le m$$

and note that . If are independent copies of and then with probability at least , . Hence, if

$$(1-\delta)^{m} \ge \frac{4m}{d}, \tag{2.2}$$

then with probability at least , there is and for every and every , .

Turning to the third part of the lemma, and by applying the second part, we have that for , . Let and recall that are independent of . Therefore, by Corollary 4.1 from [LPRP], there are absolute constants and for which, with probability at least ,

$$c_4 B_2^{m} \subset \Gamma B_1^{J}.$$

Finally, all that remains is to see when (2.1) and (2.2) are satisfied. It is straightforward to verify that if for then (2.1) holds, and if then (2.2) holds. Therefore, both conditions are satisfied with the choice of for a suitable absolute constant , as long as .

To complete the proof of Theorem 1.7, let as above, set and put —a choice which complies with the conditions of Lemma 2.3 as long as

$$m \le c_2\sqrt{p}\,d^{1/p}. \tag{2.3}$$

It follows from Corollary 2.4 that with probability at least , the column-normalized matrix generated by does not satisfy the exact reconstruction property of order . To complete the proof, all that remains is to show that also satisfies the assumptions of Lemma 2.1: that for every and for an absolute constant .

To that end, let and observe that is decreasing when ; hence, for every as long as . Therefore, if we set then for every , as required.

## Appendix A Proof of Theorem 1.9

The proof is a direct consequence of the argument used in the proof of Theorem 1.4. Thanks to column normalization, satisfies (b) in Lemma 1.5 for . All that is left to verify is (a) for which is a constant that depends only on .

The proof of Theorem 1.4 shows that if has independent rows that are distributed as then with probability at least ,

$$\inf_{t \in \Sigma_s} \|\Gamma t\|_2^{2} = \inf_{t \in \Sigma_s} \sum_{i=1}^{m} \langle X_i, t\rangle^{2} \ge c_3(L)\, m\, \|t\|_2^{2}.$$

Also, with probability at least ,

$$\max_{1 \le j \le d} \|\Gamma e_j\|_2 \le c_5(L)\sqrt{m}.$$

For every , set

$$\tilde{t} = \sum_{j=1}^{d} \frac{t_j}{\|\Gamma e_j\|_2}\, e_j,$$

which is also an -sparse vector. Observe that , implying that

$$\|\tilde{\Gamma}t\|_2^{2} \ge c_3 m \sum_{j=1}^{d} \frac{t_j^{2}}{\|\Gamma e_j\|_2^{2}} \ge \frac{c_3}{c_5^{2}}\,\|t\|_2^{2},$$

and (a) from Lemma 1.5 is verified for the matrix for .

### Footnotes

1. Department of Mathematics, Technion, I.I.T., Haifa, Israel and Mathematical Sciences Institute, The Australian National University, Canberra, Australia. Email: shahar@tx.technion.ac.il
2. Supported in part by the Israel Science Foundation.
3. Recall that a characterization of an $L$-subgaussian random variable $x$ is that $\|x\|_{L_q} \le L\sqrt{q}\,\|x\|_{L_2}$ for every $q \ge 2$.