# Compressed Sensing Using Binary Matrices of Nearly Optimal Dimensions

Mahsa Lotfi and Mathukumalli Vidyasagar

ML is with the Erik Jonsson School of Engineering and Computer Science, The University of Texas at Dallas, Richardson, TX 75080, USA. MV is with the Indian Institute of Technology Hyderabad, and The University of Texas at Dallas. Emails: mahsa.lotfi@utdallas.edu; m.vidyasagar@iith.ac.in, m.vidyasagar@utdallas.edu. This research was supported by the National Science Foundation, USA under Award #ECCS-1306630, and by the Department of Science and Technology, Government of India.

###### Abstract

In this paper, we study the problem of compressed sensing using binary measurement matrices, and $\ell_1$-norm minimization (basis pursuit) as the recovery algorithm. We derive new upper and lower bounds on the number of measurements needed to achieve robust sparse recovery with binary matrices. We establish sufficient conditions for a column-regular binary matrix to satisfy the robust null space property (RNSP), and show that the sparsity bounds for robust sparse recovery obtained using the RNSP are better by a factor of $3\sqrt{3}/2 \approx 2.6$ compared to those obtained using the restricted isometry property (RIP). Next we derive universal lower bounds on the number of measurements that any binary matrix needs to have in order to satisfy the weaker sufficient condition based on the RNSP, and show that bipartite graphs of girth six are optimal. Then we display two classes of binary matrices, namely parity check matrices of array codes, and Euler squares, that have girth six and are nearly optimal in the sense of almost satisfying the lower bound. In principle randomly generated Gaussian measurement matrices are “order-optimal.” So we compare the phase transition behavior of the basis pursuit formulation using binary array code and Gaussian matrices, and show that (i) there is essentially no difference between the phase transition boundaries in the two cases, and (ii) basis pursuit runs hundreds of times faster with binary matrices than with Gaussian matrices, and the storage requirements are lower. Therefore it is suggested that binary matrices are a viable alternative to Gaussian matrices for compressed sensing using basis pursuit.

## 1 Introduction

Compressed sensing refers to the recovery of high-dimensional but low-complexity entities from a limited number of measurements. The specific problem studied in this paper is to recover a vector $x \in \mathbb{R}^n$ where only $k \ll n$ components are significant and the rest are either zero or small, from a set of linear measurements $y = Ax$ where $A \in \mathbb{R}^{m \times n}$. A variant is when $y = Ax + \eta$ where $\eta$ denotes measurement noise, and a prior bound of the form $\|\eta\| \leq \epsilon$ is available. By far the most popular solution methodology for this problem is basis pursuit, in which an approximation $\hat{x}$ to the unknown vector $x$ is constructed via

$$\hat{x} := \operatorname*{argmin}_{z} \|z\|_1 \ \text{ s.t. } \ \|y - Az\| \leq \epsilon. \tag{1}$$

The basis pursuit approach (with $\epsilon = 0$, so that the constraint in (1) becomes $y = Az$) was proposed in [1, 2], but without guarantees on its performance. Much of the subsequent research in compressed sensing has been focused on the case where $A$ consists of samples of a zero-mean, unit-variance Gaussian or sub-Gaussian random variable, normalized by $1/\sqrt{m}$. With this choice, it is shown in [3] that, with high probability with respect to the process of generating $A$, $m = O(k \log(n/k))$ measurements suffice to ensure that $\hat{x}$ defined in (1) equals $x$, provided $x$ is sufficiently sparse. It is also known that any compressed sensing algorithm requires $m = \Omega(k \log(n/k))$ samples; see [4] for an early result, and [5] for a simpler and more explicit version of this bound. Thus random Gaussian matrices are “order optimal” in the sense that the number of measurements is within a fixed universal constant of the minimum required.
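For the noiseless case, where the constraint in (1) is an equality, basis pursuit reduces to a linear program. The following is a minimal sketch under stated assumptions: it uses SciPy's `linprog` as the solver and the standard split of the unknown into nonnegative parts. The function name `basis_pursuit` and the test dimensions are illustrative choices, not taken from the paper.

```python
import numpy as np
from scipy.optimize import linprog


def basis_pursuit(A, y):
    """Solve min ||z||_1 subject to A z = y (noiseless case of (1)).

    Assumption: z is split as z = p - q with p, q >= 0, so that
    ||z||_1 = sum(p) + sum(q) at the optimum and the problem is an LP.
    """
    m, n = A.shape
    cost = np.ones(2 * n)                 # objective: sum(p) + sum(q)
    A_eq = np.hstack([A, -A])             # A p - A q = y
    res = linprog(cost, A_eq=A_eq, b_eq=y, bounds=(0, None), method="highs")
    if not res.success:
        raise RuntimeError(res.message)
    return res.x[:n] - res.x[n:]


# Illustrative use: a 2-sparse vector measured by a normalized Gaussian matrix.
rng = np.random.default_rng(0)
A = rng.standard_normal((12, 30)) / np.sqrt(12)
x = np.zeros(30)
x[[3, 17]] = [1.5, -2.0]
x_hat = basis_pursuit(A, A @ x)
print(np.abs(x_hat - x).max())            # recovery error for this instance
```

The noisy case with a Euclidean-norm constraint is a second-order cone program and would need a different solver.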

In recent times, there has been a lot of interest in the use of sparse binary measurement matrices for compressed sensing. One of the main advantages of this approach is that it allows one to connect compressed sensing to fields such as graph theory and algebraic coding theory. Random matrices are dense, and each element needs to be stored to high precision. In contrast, sparse binary matrices require less storage both because they are sparse, and also because every nonzero element equals one. For this reason, binary matrices are also said to be “multiplication-free.” As a result, popular compressed sensing approaches such as (1) can be applied effectively for far larger matrices, and with greatly reduced CPU time, when $A$ is a sparse binary matrix instead of a random Gaussian matrix.

At present, the best available bounds for the number of measurements required by a binary matrix are . This contrasts with for random Gaussian matrices. However, in the latter case, the symbol hides a very large constant. It is shown in the paper that for values of of thereabouts, the known bounds with binary matrices are in fact smaller than with random Gaussian matrices.

The preceding discussion refers to the case where a particular matrix $A$ is “guaranteed” to recover all sufficiently sparse vectors. A parallel approach is to study conditions under which “most” sparse vectors are recovered. Specifically, in this approach, $n, m$ are fixed, and $k$ is varied from $1$ to $m$. For each choice of $k$, a large number of vectors with exactly $k$ nonzero components are generated at random, and the fraction that is recovered accurately is computed. Clearly, as $k$ is increased, this fraction decreases. But the phenomenon of interest is known as “phase transition.” One might expect that the fraction of recovered randomly generated vectors equals $1$ when $k$ is sufficiently small, and decreases gradually to $0$ as $k$ approaches $m$. In reality there is a sharp boundary below which almost all $k$-sparse vectors are recovered, and above which almost no $k$-sparse vectors are recovered. This has been established theoretically for the case where $A$ consists of random Gaussian samples in [6, 7, 8, 9]. A very general theory is derived in [10], where the measurement matrix still consists of random Gaussians, but the objective function is changed from the $\ell_1$-norm to an arbitrary convex function. In a recent paper [11], phase transitions are studied empirically for several classes of deterministic measurement matrices, and it is verified that there is essentially no difference between these phase transitions and those with random Gaussian matrices.

Now we describe the organization of the paper, as well as its contributions. Sections 2 through 5 contain background material, but also include some improvements over known results. Specifically, Section 2 gives a precise definition of compressed sensing. Sections 3 and 4 discuss two of the most popular sufficient conditions for achieving compressed sensing, namely the restricted isometry property (RIP) and the robust null space property (RNSP), respectively. The relationship between the two is discussed in Section 5. Then we review the literature on the construction of binary matrices for compressed sensing in Section 6. The original contributions of the paper begin with Section 7. In this section we derive a sufficient condition for a binary matrix to satisfy the RNSP. This condition improves the best known bounds by a factor of roughly $3\sqrt{3}/2 \approx 2.6$. In Section 8 we derive a universal lower bound on the number of measurements that are needed to satisfy the sufficient condition derived in Section 7. It is shown that the number of measurements is minimized when the bipartite graph associated with the measurement matrix has girth six. In Section 9, we present a class of binary matrices that have girth six, which includes as special cases (i) a construction from LDPC (low density parity check) coding theory known as array codes, and (ii) another construction based on Euler squares. The matrices in this class come close to meeting the lower bound on the number of measurements derived in Section 8. This is the justification for the phrase “nearly optimal” in the title of the paper. In Section 10, we discuss the phase transition behavior of the basis pursuit formulation when this class of binary matrices is used. In Section 11 we present some numerical examples. On the basis of these examples, it is possible to conclude that (i) there is no discernible difference in phase transition behavior among random Gaussian matrices, the binary matrices proposed in [12], and the class of matrices proposed here; and (ii) execution with our class of binary matrices is 1,000 times faster, if not more, than with random Gaussian matrices. On the basis of the material presented here, we believe that the class of binary matrices proposed here is a viable alternative to, and possibly a replacement for, random Gaussian measurement matrices.

## 2 Definition of Compressed Sensing

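Let $\Sigma_k$ denote the set of $k$-sparse vectors in $\mathbb{R}^n$; that is

$$\Sigma_k := \{ x \in \mathbb{R}^n : \|x\|_0 \leq k \},$$

where, as is customary, $\|x\|_0$ denotes the number of nonzero components of $x$. Given a norm $\|\cdot\|$ on $\mathbb{R}^n$, the $k$-sparsity index of $x$ with respect to that norm is defined by

$$\sigma_k(x, \|\cdot\|) := \min_{z \in \Sigma_k} \|x - z\|.$$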
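As a small illustration of the sparsity index for the $\ell_1$-norm, the minimum is attained by keeping the $k$ largest-magnitude components, so the index is the sum of the remaining magnitudes. The sketch below assumes NumPy; the function name `sparsity_index_l1` is a hypothetical choice.

```python
import numpy as np


def sparsity_index_l1(x, k):
    """Return sigma_k(x, ||.||_1): the l1 distance from x to the k-sparse vectors."""
    mags = np.sort(np.abs(np.asarray(x, dtype=float)))   # ascending magnitudes
    n_tail = max(mags.size - k, 0)                        # entries left after keeping the k largest
    return float(mags[:n_tail].sum())


print(sparsity_index_l1([3, -1, 0.5, 0], 1))
```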

Now we are in a position to define the compressed sensing problem precisely. Note that $A \in \mathbb{R}^{m \times n}$ is called the measurement matrix and $\Delta : \mathbb{R}^m \to \mathbb{R}^n$ is called the “decoder map.”

###### Definition 1.

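The pair $(A, \Delta)$ is said to achieve stable sparse recovery of order $k$ and indices $p, q$ if there exists a constant $C$ such that

$$\|\Delta(Ax) - x\|_p \leq C \sigma_k(x, \|\cdot\|_q), \quad \forall x \in \mathbb{R}^n. \tag{2}$$

The pair $(A, \Delta)$ is said to achieve robust sparse recovery of order $k$ and indices $p, q$ (and norm $\|\cdot\|$) if there exist constants $C$ and $D$ such that, for all $\eta \in \mathbb{R}^m$ with $\|\eta\| \leq \epsilon$, it is the case that

$$\|\Delta(Ax + \eta) - x\|_p \leq C \sigma_k(x, \|\cdot\|_q) + D\epsilon, \quad \forall x \in \mathbb{R}^n. \tag{3}$$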

The above definitions apply to general norms. In this paper, and indeed in much of the compressed sensing literature, the emphasis is on the case where and . However, the norm on is still arbitrary.

## 3 Approaches to Compressed Sensing – I: RIP

Next we present some sufficient conditions for basis pursuit as defined in (1) to achieve robust or stable sparse recovery. There are two widely used sufficient conditions, namely the restricted isometry property (RIP) and the stable (or robust) null space property (SNSP or RNSP). We begin by discussing the RIP.

###### Definition 2.

A matrix $A \in \mathbb{R}^{m \times n}$ is said to satisfy the restricted isometry property (RIP) of order $k$ with constant $\delta$ if

$$(1-\delta)\|u\|_2^2 \leq \|Au\|_2^2 \leq (1+\delta)\|u\|_2^2, \quad \forall u \in \Sigma_k. \tag{4}$$

The RIP is formulated in [3]. It is shown in a series of papers [3, 13, 14] that the RIP of $A$ is sufficient for basis pursuit to achieve robust sparse recovery. Now we present the best known, and indeed the “best possible,” result relating RIP and robust recovery.

###### Theorem 1.

If $A$ satisfies the RIP of order $tk$ with constant $\delta_{tk} < \sqrt{(t-1)/t}$ for some $t \geq 4/3$, or $\delta_{tk} < t/(4-t)$ for $t \in (0, 4/3)$, then basis pursuit achieves robust $k$-sparse recovery. Moreover, both bounds are tight.

The first bound is proved in [15] while the second bound is proved in [16]. Note that both bounds are equal when $t = 4/3$. Hence the theorem provides a continuous tight bound on $\delta_{tk}$ for all $t > 0$.

This theorem raises the question as to how one may go about designing measurement matrices that satisfy the RIP. There are two popular approaches, one probabilistic and one deterministic. In the probabilistic method, the measurement matrix $A$ equals $(1/\sqrt{m})\Phi$, where $\Phi \in \mathbb{R}^{m \times n}$ consists of independent samples of a Gaussian variable, or more generally, a sub-Gaussian random variable. In this paper we restrict our attention to the case where $\Phi$ consists of Gaussian samples, and refer the reader to [17] for the more general case of sub-Gaussian samples. The relevant bound on $m$ to ensure that $A$ satisfies the RIP with high probability is given next; it is a fairly straightforward modification of [17, Theorem 9.27].

###### Theorem 2.

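Suppose an integer $k$ and real numbers $\delta, \xi \in (0,1)$ are specified, and that $A = (1/\sqrt{m})\Phi$, where $\Phi \in \mathbb{R}^{m \times n}$ consists of independent samples of a standard normal random variable. Define

$$g = 1 + \frac{1}{\sqrt{2 \ln(en/k)}}, \quad \eta = \frac{\sqrt{1+\delta} - 1}{g}. \tag{5}$$

Then $A$ satisfies the RIP of order $k$ with constant $\delta$ with probability $\geq 1 - \xi$ provided

$$m \geq \frac{2}{\eta^2} \left( k \ln \frac{en}{k} + \ln \frac{2}{\xi} \right). \tag{6}$$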
###### Proof.

The proof of this theorem is given only in outline, as it follows that of [17, Theorem 9.27]. In that theorem, it is shown that, if the measurement matrix $A$ consists of independent samples of Gaussian random variables (normalized as above), and if

$$m \geq \frac{2}{\eta^2} \left( k \ln \frac{en}{k} + \ln \frac{2}{\xi} \right),$$

where $\eta$ satisfies

$$\delta \leq 2g\eta + g^2\eta^2,$$

then $A$ satisfies the RIP of order $k$ with constant $\delta$, with probability $\geq 1 - \xi$. Now the above inequality can be rewritten as

$$\delta + 1 \leq 1 + 2g\eta + g^2\eta^2 = (1 + g\eta)^2.$$

Taking equality and solving for $\eta$ leads to (5). ∎
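To get a feel for the size of the constant, the bound can be evaluated numerically. The sketch below simply codes (5) and (6); the function name `gaussian_rip_bound` and the sample arguments are illustrative assumptions and are not the values used in the paper's examples.

```python
import math


def gaussian_rip_bound(n, k, delta, xi):
    """Evaluate g and eta from (5) and the right side of (6), rounded up."""
    log_term = math.log(math.e * n / k)            # ln(en/k)
    g = 1.0 + 1.0 / math.sqrt(2.0 * log_term)
    eta = (math.sqrt(1.0 + delta) - 1.0) / g
    m = (2.0 / eta**2) * (k * log_term + math.log(2.0 / xi))
    return g, eta, math.ceil(m)


# Hypothetical inputs chosen only for illustration.
g, eta, m = gaussian_rip_bound(n=10_000, k=10, delta=0.5, xi=1e-3)
print(g, eta, m)
```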

Equation (6) leads to an upper bound of the form $m = O(k \ln(n/k))$ for the number of measurements that suffice for the random matrix $A$ to satisfy the RIP with high probability. It is shown in [5, Theorem 3.1] that any algorithm that achieves stable sparse recovery requires $m = \Omega(k \ln(n/k))$ measurements. See [4, Theorem 5.1] for an earlier version. For the convenience of the reader, we restate the latter theorem. Note that it is assumed in [5] that , but the proof requires only that . In order to state the theorem, we introduce the entropy with respect to an arbitrary integer $\theta$. Suppose $\theta \geq 2$ is an integer. Then the $\theta$-ary entropy $H_\theta : (0,1) \to \mathbb{R}$ is defined by

$$H_\theta(u) := -u \log_\theta \frac{u}{\theta - 1} - (1-u) \log_\theta (1-u). \tag{7}$$
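A direct transcription of (7) is sketched below, mainly as a sanity check on the formula. The function name `entropy` is a hypothetical choice; only the Python standard library is assumed.

```python
import math


def entropy(theta, u):
    """theta-ary entropy H_theta(u) from (7), for an integer theta >= 2 and 0 < u < 1."""
    if theta < 2 or not 0.0 < u < 1.0:
        raise ValueError("need theta >= 2 and 0 < u < 1")
    return (-u * math.log(u / (theta - 1), theta)
            - (1.0 - u) * math.log(1.0 - u, theta))


print(entropy(2, 0.5))     # binary entropy at 1/2
print(entropy(5, 0.8))     # theta-ary entropy at u = (theta - 1) / theta
```

Both calls evaluate the entropy at $u = (\theta-1)/\theta$, where the $\theta$-ary entropy attains its maximum value of one.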
###### Theorem 3.

Suppose $A \in \mathbb{R}^{m \times n}$ and that, for some map $\Delta : \mathbb{R}^m \to \mathbb{R}^n$, the pair $(A, \Delta)$ achieves stable $k$-sparse recovery with constant $C$. Define . Then (note that the base of the logarithm does not matter because it cancels out between the two terms)

$$m \geq \frac{1 - H_\theta(1/2)}{\log(4 + 2C)} \, k \log \theta. \tag{8}$$

Because robust $k$-sparse recovery implies stable $k$-sparse recovery, the bound in (8) applies also to robust $k$-sparse recovery.

Comparing Theorems 2 and 3 shows that $\Theta(k \ln(n/k))$ measurements are both necessary and sufficient for robust $k$-sparse recovery. For this reason, the probabilistically generated measurement matrices are considered to be “order-optimal.” However, this statement is misleading because the $O$ symbol in the upper bound hides a very large constant, as shown next.

###### Example 1.

Suppose and , which is a problem instance studied later in Section 11. Then the upper and lower bounds from Theorems 2 and 3 imply that

$$14 \leq m \leq 44{,}345.$$

Thus the spread between the upper and lower bounds is more than three orders of magnitude. Also, the upper bound for the number of measurements is more than the dimension $n$.

There is another factor as well. As can be seen from Theorem 2, probabilistic methods lead to measurement matrices that satisfy the RIP only with high probability, that can be made close to one but never exactly equal to one. Moreover, as shown in [18], once a matrix has been generated, it is NP-hard to test whether that particular matrix satisfies the RIP.

These observations have led the research community to explore deterministic methods to construct matrices that satisfy the RIP. A popular approach is based on the coherence of a matrix.

###### Definition 3.

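Suppose $A \in \mathbb{R}^{m \times n}$ is column-normalized, so that $\|a_j\|_2 = 1$ for all $j \in \{1, \ldots, n\}$, where $a_j$ denotes the $j$-th column of $A$. Then the coherence of $A$ is denoted by $\mu(A)$ and is defined as

$$\mu(A) := \max_{i \neq j} |\langle a_i, a_j \rangle|. \tag{9}$$

The following result is an easy consequence of the Gerschgorin circle theorem.

###### Lemma 1.

A matrix $A$ satisfies the RIP of order $k$ with constant

$$\delta_k = (k-1)\mu, \tag{10}$$

provided that $(k-1)\mu < 1$, or equivalently, $k < 1 + 1/\mu$.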
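The coherence in (9) and the resulting RIP constant in (10) are easy to compute for a given matrix. The following sketch assumes NumPy and normalizes the columns itself; the function names are hypothetical.

```python
import numpy as np


def coherence(A):
    """Coherence (9) of A after normalizing each column to unit l2 norm."""
    A = np.asarray(A, dtype=float)
    cols = A / np.linalg.norm(A, axis=0)      # assumes no zero column
    gram = np.abs(cols.T @ cols)
    np.fill_diagonal(gram, 0.0)               # ignore the i == j terms
    return float(gram.max())


def rip_constant_from_coherence(A, k):
    """Return delta_k = (k - 1) * mu from (10); meaningful only if it is < 1."""
    return (k - 1) * coherence(A)


A = [[1, 1, 0],
     [1, 0, 1],
     [0, 1, 1]]
print(coherence(A), rip_constant_from_coherence(A, 2))
```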

## 4 Approaches to Compressed Sensing – II: RNSP

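An alternative to the RIP approach to compressed sensing is provided by the stable (and robust) null space property. The SNSP is formulated in [19], while, to the best of the authors’ knowledge, the RNSP is formulated for the first time in [20]; see also [17, Definition 4.17].

###### Definition 4.

Suppose $A \in \mathbb{R}^{m \times n}$ and let $\mathcal{N}(A)$ denote the null space of $A$. Then $A$ is said to satisfy the stable null space property (SNSP) of order $k$ with constant $\rho \in (0,1)$ if, for every set $S \subseteq \{1, \ldots, n\}$ with $|S| \leq k$, we have that

$$\|v_S\|_1 \leq \rho \|v_{S^c}\|_1, \quad \forall v \in \mathcal{N}(A). \tag{11}$$

The matrix $A$ is said to satisfy the robust null space property (RNSP) of order $k$ for the norm $\|\cdot\|$ with constants $\rho \in (0,1)$ and $\tau > 0$ if, for every set $S \subseteq \{1, \ldots, n\}$ with $|S| \leq k$, we have that

$$\|h_S\|_1 \leq \rho \|h_{S^c}\|_1 + \tau \|Ah\|, \quad \forall h \in \mathbb{R}^n. \tag{12}$$

It is obvious that the RNSP implies the SNSP, because $Ah = 0$ whenever $h \in \mathcal{N}(A)$.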

The utility of these definitions is brought out in these theorems.

###### Theorem 4.

(See [17, Theorem 4.12].) Suppose $A$ satisfies the stable null space property of order $k$ with constant $\rho$. Then basis pursuit achieves stable $k$-sparse recovery with

$$C = 2\,\frac{1+\rho}{1-\rho}. \tag{13}$$

It is shown in [17, Theorem 4.14] that the SNSP is necessary and sufficient for basis pursuit to achieve stable sparse recovery.

###### Theorem 5.

(See [17, Theorem 4.22].) Suppose $A$ satisfies the robust null space property of order $k$ for the norm $\|\cdot\|$ with constants $\rho$ and $\tau$. Then basis pursuit achieves robust $k$-sparse recovery with

$$C = 2\,\frac{1+\rho}{1-\rho}, \quad D = \frac{4\tau}{1-\rho}. \tag{14}$$

It is shown in [17, Theorem 4.20] that the RNSP is necessary and sufficient for basis pursuit to achieve robust sparse recovery.

## 5 Relationship Between RIP and RNSP

Until recently, the twin approaches of RIP and RNSP had proceeded along parallel tracks. However, it is shown in [21, Theorem 9] that if $A$ satisfies the RIP of order $tk$ with constant $\delta_{tk} < \sqrt{(t-1)/t}$ for some $t > 1$, then it satisfies the RNSP of order $k$. The specific result is the following:

###### Theorem 6.

Given integers and a real number , suppose that the matrix satisfies the RIP of order with constant . Define

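$$\nu := \sqrt{t(t-1)} - (t-1). \tag{15}$$

Then $A$ satisfies the RNSP of order $k$ with constants

$$\rho = c/a < 1, \quad \tau = b\sqrt{k}/a^2, \tag{16}$$

where

$$a := \left[ \nu(1-\nu) - \delta(0.5 - \nu + \nu^2) \right]^{1/2} = \frac{\left[ (1-\delta) - (1+\delta)(1-2\nu)^2 \right]^{1/2}}{2}, \tag{17}$$

$$b := \nu(1-\nu)\sqrt{1+\delta}, \tag{18}$$

$$c := \left[ \frac{\delta \nu^2}{2(t-1)} \right]^{1/2}. \tag{19}$$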

As stated in Theorem 1, $\delta_{tk} < \sqrt{(t-1)/t}$ is the weakest sufficient condition in terms of RIP for robust sparse recovery, whenever $t \geq 4/3$. Taken in conjunction with Theorem 6, it follows that it is not possible to obtain weaker sufficient conditions using the RIP approach than by using the RNSP approach.

Note that if $A$ has coherence $\mu$, then by Lemma 1, we have that $\delta_k \leq (k-1)\mu$ for all $k$. Next, by Theorem 6, basis pursuit achieves robust $k$-sparse recovery whenever

$$(tk-1)\mu < \sqrt{\frac{t-1}{t}} \tag{20}$$

for any $t > 1$. So let us ask: What is an “optimal” choice of $t$? To answer this question, we neglect the $1$ in comparison to $tk$, and rewrite the above inequality as

$$k\mu < \sqrt{\frac{t-1}{t^3}}.$$

Thus we get the best bound by maximizing the right side with respect to $t$. It is an easy exercise in calculus to show that the maximum is achieved with $t = 1.5$, and the corresponding bound is $k\mu < 2/(3\sqrt{3})$. Hence by combining with Lemma 1 we can derive the following bound.

###### Theorem 7.

Suppose $A$ has coherence $\mu$. Then basis pursuit achieves robust $k$-sparse recovery whenever

$$(1.5k - 1)\mu < 1/\sqrt{3}, \tag{21}$$

or equivalently

$$k < \left\lfloor \frac{2}{3\sqrt{3}\,\mu} + \frac{2}{3} \right\rfloor. \tag{22}$$

Moreover, the bound is nearly optimal when applying Theorem 6.

If we retain the term $tk - 1$ instead of replacing it by $tk$, we would get a more complicated expression for the optimal value of $t$. However, it can be verified that if (21) is satisfied, then so is (20) with $t = 1.5$.
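The calculus step behind the choice $t = 1.5$ can be written out as follows. This is a routine verification added for convenience; the symbol $f$ is introduced here only for this purpose.

```latex
\begin{align*}
f(t) &:= \frac{t-1}{t^3}, \qquad
f'(t) = \frac{t^3 - 3t^2(t-1)}{t^6} = \frac{3 - 2t}{t^4}, \\
f'(t) &= 0 \iff t = \tfrac{3}{2}, \qquad
f\!\left(\tfrac{3}{2}\right) = \frac{1/2}{27/8} = \frac{4}{27}, \qquad
\sqrt{f\!\left(\tfrac{3}{2}\right)} = \frac{2}{3\sqrt{3}} \approx 0.385 .
\end{align*}
```

Since $f'(t) > 0$ for $t < 3/2$ and $f'(t) < 0$ for $t > 3/2$, this stationary point is the maximum. Substituting $t = 3/2$ into the right side of (20) gives $\sqrt{(1/2)/(3/2)} = 1/\sqrt{3}$, which is the right side of (21).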

## 6 Binary Matrices for Compressed Sensing: A Review

In this section we present a brief review of the use of binary matrices as measurement matrices in compressed sensing. The first construction of a binary matrix that satisfies the RIP is due to DeVore and is given in [12]. The DeVore matrix has dimensions , where is a power of a prime number, and is an integer, has exactly elements of in each column, and has coherence . This construction is generalized to algebraic curves in [22], but does not seem to offer much of an advantage over that in [12]. A construction that leads to matrices of order based on Reed-Muller codes is proposed in [23]. Because the number of measurements is restricted to be a power of , this is not a very practical method. A construction in [24] is based on a method to generate Euler squares from nearly a century ago [25]. The resulting binary matrix has dimensions , where is an arbitrary integer, making this perhaps the most versatile construction. The integer is bounded as follows: Let be the prime number decomposition of . Then . In particular if is itself a power of a prime, we can have . Each column of the resulting binary matrix has exactly ones, and the matrix has coherence . All of these matrices can be used to achieve robust -sparse recovery via the basis pursuit formulation, by combining Lemma 1 with Theorem 1. Another method found in [26] constructs binary matrices using the Chinese remainder theorem, and achieves probabilistic recovery.

There is another property that is sometimes referred to as the $\ell_1$-RIP, introduced in [27, 28, 29], which makes a connection between expander graphs and compressed sensing. However, while this approach readily leads to stable $k$-sparse recovery, it does not lend itself readily to robust $k$-sparse recovery. One of the main contributions of [30] is to show that the construction of [12] can also be viewed as a special case of an expander graph construction proposed in [31].

Yet another direction is initiated in [32], in which a general approach is presented for generating binary matrices for compressed sensing using algebraic coding theory. In particular, it is shown that binary matrices which, when viewed as matrices over the binary field $\mathbb{F}_2$, have good properties in decoding, will also be good measurement matrices when viewed as matrices of real numbers. Several notions of “pseudoweights” are introduced, and it is shown that these pseudoweights can be related to the satisfaction of the stable (but not robust) null space property of binary matrices. These bounds are improved in [33] to prove the stable null space property under weaker conditions than in [32].

## 7 Robust Null Space Property of Binary Matrices

In this section we commence presenting the new results of this paper on identifying a class of binary matrices for compressed sensing that have a nearly optimal number of measurements.

Suppose $A \in \{0,1\}^{m \times n}$ with $m < n$. Then $A$ can be viewed as the biadjacency matrix of a bipartite graph with $n$ input (or “left”) nodes and $m$ output (or “right”) nodes. Such a graph is said to be left-regular if each input node has the same degree, say $d_l$. This is equivalent to saying that each column of $A$ contains exactly $d_l$ ones. Given a bipartite graph with $|E|$ edges, $n$ input nodes and $m$ output nodes, define the “average left degree” of the graph as $\bar{d}_l := |E|/n$, and the “average right degree” as $\bar{d}_r := |E|/m$. Note that these average degrees need not be integers. Then it is clear that $n \bar{d}_l = m \bar{d}_r$. The girth of a graph is defined as the length of the shortest cycle. Note that the girth of a bipartite graph is always an even number, and in “simple” graphs (not more than one edge between any pair of vertices), the girth is at least four.

Hereafter, in this paper we will not make a distinction between a binary matrix, and the bipartite graph associated with the matrix. Specifically, the columns correspond to the “left” nodes while the rows correspond to the “right” nodes. So an expression such as “$A$ is a left-regular binary matrix of degree $d_l$” means that the associated bipartite graph is left-regular with degree $d_l$. This usage will permit us to avoid some tortuous sentences.
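The girth of the bipartite graph associated with a binary matrix can be computed with a breadth-first search from every node. The sketch below is one standard way to do this, not a procedure taken from the paper; it assumes NumPy, and the function name `girth` is a hypothetical choice.

```python
from collections import deque
import math

import numpy as np


def girth(A):
    """Girth of the bipartite graph whose biadjacency matrix is the binary matrix A.

    Columns are nodes 0..n-1 and rows are nodes n..n+m-1. Returns math.inf
    if the graph has no cycle.
    """
    A = np.asarray(A)
    m, n = A.shape
    adj = [[] for _ in range(n + m)]
    for i, j in zip(*np.nonzero(A)):
        i, j = int(i), int(j)
        adj[j].append(n + i)
        adj[n + i].append(j)
    best = math.inf
    for s in range(n + m):
        dist = {s: 0}
        parent = {s: -1}
        queue = deque([s])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:              # tree edge
                    dist[v] = dist[u] + 1
                    parent[v] = u
                    queue.append(v)
                elif parent[u] != v:           # non-tree edge closes a cycle
                    best = min(best, dist[u] + dist[v] + 1)
    return best


print(girth([[1, 1], [1, 1]]))
print(girth([[1, 1, 0], [1, 0, 1], [0, 1, 1]]))
```

The first matrix contains two columns sharing two rows, which is a cycle of length four; in the second, every pair of columns shares exactly one row.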

Theorems 8 and 9 are the starting point for the contents of this section.

###### Theorem 8.

(See [33, Theorem 2].) Suppose $A \in \{0,1\}^{m \times n}$ is left-regular with left degree $d_l$, and suppose that the maximum inner product between any two columns of $A$ is $\lambda$. Then for every $v \in \mathcal{N}(A)$, we have that

$$|v_i| \leq \frac{\lambda}{2 d_l} \|v\|_1, \quad \forall i \in [n], \tag{23}$$

where $[n]$ denotes $\{1, \ldots, n\}$.

If the matrix $A$ has girth six or more, then the maximum inner product between any two columns of $A$ is at most equal to one. In such a case it is possible to improve the bound (23).

###### Theorem 9.

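(See [33, Theorem 3].) Suppose $A \in \{0,1\}^{m \times n}$ is left-regular with left degree $d_l$, and that $A$ has girth $g$. Then for every $v \in \mathcal{N}(A)$, we have that

$$|v_i| \leq \frac{\|v\|_1}{C'}, \quad \forall i \in [n], \tag{24}$$

where, if $g = 4t + 2$, then

$$C' := 2 \sum_{i=0}^{t} (d_l - 1)^i, \tag{25}$$

and if $g = 4t$, then

$$C' := 2 \sum_{i=0}^{t-1} (d_l - 1)^i. \tag{26}$$

Note that Theorem 9 is an improvement over Theorem 8 only when the girth of the graph is eight or more. If the girth equals six, then $C'$ as defined in (25) becomes $2 d_l$, and the bound in (24) becomes the same as that in (23) after noting that $\lambda = 1$. Similarly, if $g = 4$, then $C'$ in (26) also becomes just $2$.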
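The constant in (25) and (26) depends only on the left degree and the girth, so it is simple to tabulate. The sketch below transcribes the two cases; the function name `girth_constant` is hypothetical.

```python
def girth_constant(d_l, g):
    """Constant C' from (25) (g = 4t + 2) or (26) (g = 4t) for an even girth g >= 4."""
    if g < 4 or g % 2 != 0:
        raise ValueError("girth of a simple bipartite graph is even and at least 4")
    if g % 4 == 2:
        n_terms = (g - 2) // 4 + 1     # i = 0, ..., t
    else:
        n_terms = g // 4               # i = 0, ..., t - 1
    return 2 * sum((d_l - 1) ** i for i in range(n_terms))


for g in (4, 6, 8, 10):
    print(g, girth_constant(5, g))
```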

In [33], the bounds (23) and (24) are used to derive sufficient conditions for the matrix $A$ to satisfy the stable null space property. However, it is now shown that the same two bounds can be used to infer the robust null space property of $A$. This is a substantial improvement, because with such a matrix $A$, basis pursuit would lead to robustness against measurement noise, which is not guaranteed with the SNSP. We derive our results through a series of preliminary results.

###### Lemma 2.

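Suppose $A \in \mathbb{R}^{m \times n}$, and let $\|\cdot\|$ be any norm on $\mathbb{R}^m$. Suppose there exist constants $\alpha, \beta > 0$ such that

$$|h_i| \leq \frac{\|h\|_1}{\alpha} + \beta \|Ah\|, \quad \forall i \in [n], \ \forall h \in \mathbb{R}^n. \tag{27}$$

Then, for all $k < \alpha/2$, the matrix $A$ satisfies the RNSP of order $k$. Specifically, whenever $S \subseteq [n]$ with $|S| \leq k$, (12) holds with

$$\rho = \frac{k}{\alpha - k}, \quad \tau = \frac{\alpha k \beta}{\alpha - k}. \tag{28}$$

###### Proof.

Let $S \subseteq [n]$ with $|S| \leq k$ be arbitrary. Then

$$\|h_S\|_1 = \sum_{i \in S} |h_i| \leq \frac{k}{\alpha} \|h\|_1 + k\beta \|Ah\| = \frac{k}{\alpha} \left( \|h_S\|_1 + \|h_{S^c}\|_1 \right) + k\beta \|Ah\|.$$

Therefore

$$\left( 1 - \frac{k}{\alpha} \right) \|h_S\|_1 \leq \frac{k}{\alpha} \|h_{S^c}\|_1 + k\beta \|Ah\|,$$

or, after multiplying both sides by $\alpha/(\alpha - k) > 0$,

$$\|h_S\|_1 \leq \frac{k}{\alpha - k} \|h_{S^c}\|_1 + \frac{\alpha k \beta}{\alpha - k} \|Ah\|,$$

which is the desired conclusion. Note that $\rho < 1$ precisely when $k < \alpha/2$. ∎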

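Next, let $A \in \mathbb{R}^{m \times n}$ be arbitrary and let $\|\cdot\|$ be any norm on $\mathbb{R}^m$. Let $\mathcal{N}$ denote the null space of $A$, and let $\mathcal{N}^\perp$ denote the orthogonal complement of $\mathcal{N}$ in $\mathbb{R}^n$. Then for all $u \in \mathcal{N}^\perp$, it is easy to see that

$$\|u\|_2 \leq \frac{1}{\sigma_{\min}} \|Au\|_2,$$

where $\sigma_{\min}$ is the smallest nonzero singular value of $A$. Because all norms on a finite-dimensional space are equivalent, there exists a constant $c$ that depends only on the norm on $\mathbb{R}^m$ such that

$$\|y\|_2 \leq c \|y\|, \quad \forall y \in \mathbb{R}^m. \tag{29}$$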

In particular, , so we can take in this case. Therefore, by Schwarz’ inequality, we get

$$\|u\|_1 \leq \sqrt{n} \|u\|_2 \leq \frac{c\sqrt{n}}{\sigma_{\min}} \|Au\|, \quad \forall u \in \mathcal{N}^\perp. \tag{30}$$

Now we can state the main result of this section.

###### Theorem 10.

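Suppose $A \in \{0,1\}^{m \times n}$ is left-regular with left degree $d_l$, and let $\lambda$ denote the maximum inner product between any two columns of $A$ (and observe that $\lambda \leq d_l$). Next, let $\sigma_{\min}$ denote the smallest nonzero singular value of $A$, and for an arbitrary norm $\|\cdot\|$ on $\mathbb{R}^m$, choose the constant $c$ such that (29) holds. Then $A$ satisfies (27) with

$$\alpha = \frac{2 d_l}{\lambda}, \quad \beta = \left( \frac{\lambda}{2 d_l} + 1 \right) \frac{c\sqrt{n}}{\sigma_{\min}}. \tag{31}$$

Consequently, for all $k < d_l/\lambda$, $A$ satisfies the RNSP of order $k$ with

$$\rho = \frac{\lambda k}{2 d_l - \lambda k}, \quad \tau = \frac{2 d_l k}{2 d_l - \lambda k} \beta. \tag{32}$$

###### Proof.

Let $h \in \mathbb{R}^n$ be arbitrary, and express $h$ as $h = v + u$, where $v \in \mathcal{N}$ and $u \in \mathcal{N}^\perp$. Then clearly

$$|h_i| = |v_i + u_i| \leq |v_i| + |u_i|, \quad \forall i \in [n].$$

We will bound each term separately.

As shown in Theorem 8, we have that

$$|v_i| \leq \frac{\lambda}{2 d_l} \|v\|_1 \leq \frac{\lambda}{2 d_l} \left( \|h\|_1 + \|u\|_1 \right) \leq \frac{\lambda}{2 d_l} \|h\|_1 + \frac{\lambda c \sqrt{n}}{2 d_l \sigma_{\min}} \|Au\| = \frac{\lambda}{2 d_l} \|h\|_1 + \frac{\lambda c \sqrt{n}}{2 d_l \sigma_{\min}} \|Ah\|,$$

where the second step uses $v = h - u$, the third step uses (30), and the last step follows from the fact that $Au = Ah$ because $Av = 0$. Next

$$|u_i| \leq \|u\|_1 \leq \frac{c\sqrt{n}}{\sigma_{\min}} \|Ah\|, \quad \forall i \in [n].$$

Combining these two inequalities shows that

$$|h_i| \leq |v_i| + |u_i| \leq \frac{\lambda}{2 d_l} \|h\|_1 + \left( \frac{\lambda}{2 d_l} + 1 \right) \frac{c\sqrt{n}}{\sigma_{\min}} \|Ah\|.$$

This establishes (27) with the constants in (31). Now (32) follows from Lemma 2, specifically (28). ∎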
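The constants in (31) and (32) can be evaluated for a concrete matrix as sketched below. This assumes NumPy, and it takes the norm on the measurement space to be the Euclidean norm so that the constant in (29) can be set to one; the function name `rnsp_constants` and the tolerance used to discard zero singular values are illustrative choices.

```python
import math

import numpy as np


def rnsp_constants(A, k, c=1.0):
    """Constants (rho, tau) of (32) for a left-regular binary matrix A.

    Assumption: c is the norm-equivalence constant in (29); c = 1 is valid
    when the norm on R^m is the Euclidean norm.
    """
    A = np.asarray(A, dtype=float)
    n = A.shape[1]
    degrees = A.sum(axis=0)
    if not np.all(degrees == degrees[0]):
        raise ValueError("A is not left-regular")
    d_l = degrees[0]
    gram = A.T @ A
    np.fill_diagonal(gram, 0.0)
    lam = gram.max()                           # max inner product of distinct columns
    if lam == 0:
        raise ValueError("columns have disjoint supports, so lambda = 0")
    svals = np.linalg.svd(A, compute_uv=False)
    sigma_min = svals[svals > 1e-10 * svals.max()].min()   # smallest nonzero singular value
    alpha = 2.0 * d_l / lam
    if not k < alpha / 2.0:
        raise ValueError("Theorem 10 requires k < d_l / lambda")
    beta = (lam / (2.0 * d_l) + 1.0) * c * math.sqrt(n) / sigma_min
    rho = k / (alpha - k)
    tau = alpha * k * beta / (alpha - k)
    return float(rho), float(tau)


print(rnsp_constants([[1, 1, 0], [1, 0, 1], [0, 1, 1]], k=1))
```

These values can then be substituted into (14) to obtain the recovery constants of Theorem 5.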

###### Theorem 11.

Suppose $A \in \{0,1\}^{m \times n}$ is left-regular with left degree $d_l$, and has girth of at least six. Define the constant $C'$ as in (25) or (26) as appropriate. Then for all $k < C'/2$, the matrix $A$ satisfies the RNSP of order $k$, with constants

$$\rho = \frac{k}{C' - k}, \quad \tau = \frac{C' k}{C' - k} \beta, \tag{33}$$

where, by the same argument as in Theorem 10, $\beta = (1/C' + 1)\, c\sqrt{n}/\sigma_{\min}$.

The proof of Theorem 11 is entirely analogous to that of Theorem 10, with the bound in Theorem 9 replacing that in Theorem 8, so that Lemma 2 applies with $\alpha = C'$. Therefore the proof is omitted.

The results in Theorem 10 lead to sharper bounds for the sparsity count compared to using RIP and coherence bounds. This is illustrated next.

###### Example 2.

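Suppose $A \in \{0,1\}^{m \times n}$ is left-regular with degree $d_l$ and with the inner product between any two columns bounded by $\lambda$. Then it is easy to see that the coherence $\mu$ of $A$ (after column normalization) is bounded by $\lambda/d_l$. Therefore, if we use Theorem 7, then it follows that basis pursuit achieves robust $k$-sparse recovery whenever

$$k < \left\lfloor \frac{2 d_l}{3\sqrt{3}\,\lambda} + \frac{2}{3} \right\rfloor.$$

In contrast, if we use Theorem 10, it follows that basis pursuit achieves robust $k$-sparse recovery whenever $k < d_l/\lambda$, which is an improvement by a factor of roughly $3\sqrt{3}/2 \approx 2.6$.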
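The two sparsity limits in this example can be compared numerically. The sketch below evaluates condition (21) directly, with the coherence replaced by its upper bound, and the condition of Theorem 10; the function names and the sample values of the degree are hypothetical.

```python
import math


def k_max_rip(d_l, lam):
    """Largest integer k satisfying (21) with mu replaced by its bound lam / d_l."""
    mu = lam / d_l
    k = 0
    while (1.5 * (k + 1) - 1.0) * mu < 1.0 / math.sqrt(3.0):
        k += 1
    return k


def k_max_rnsp(d_l, lam):
    """Largest integer k with k < d_l / lam (Theorem 10), for integer d_l and lam."""
    return (d_l - 1) // lam


for d_l in (20, 100):
    print(d_l, k_max_rip(d_l, 1), k_max_rnsp(d_l, 1))
```

For small degrees the ratio of the two limits falls somewhat below the asymptotic factor because of integer rounding and the additive constant in the coherence-based bound.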

## 8 Lower Bounds on the Number of Measurements

Theorem 9 shows that, for a fixed left degree $d_l$, as the girth of the graph corresponding to $A$ becomes larger, so does the constant $C'$. Therefore, as the girth of $A$ increases, so does the upper bound on $k$ as obtained from Theorem 11. This suggests that, for a given left degree $d_l$ and number of input nodes $n$, it is better to choose graphs of large girth. However, as shown next, as the girth of a graph is increased, the number of measurements $m$ also increases. As shown below, the “optimal” choice for the girth is actually six.

To establish this statement, let us define

$$\bar{k} := \begin{cases} (d_l - 1)^t & \text{if } g = 4t + 2, \\ (d_l - 1)^{t-1} & \text{if } g = 4t. \end{cases} \tag{34}$$

It is recognized that $\bar{k}$ is just the last term in the summation in (25) and (26). Now, if the actual sparsity count satisfies $k < \bar{k}$, then it follows from Theorem 9 that basis pursuit achieves robust $k$-sparse recovery. As stated before, if we choose the matrix $A$ to have higher and higher girth, the bound $\bar{k}$ also becomes higher. So the question therefore becomes: What happens to $m$, the number of measurements, as the girth is increased? The answer is given next.

###### Theorem 12.

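Suppose $A \in \{0,1\}^{m \times n}$ is a $d_l$-left-regular graph with $m \leq n$, and that every row and every column of $A$ contain at least two ones. If the girth of $A$ equals $g = 4t + 2$, then

$$m \geq \bar{k}^{2/(t+1)} n^{t/(t+1)}, \tag{35}$$

whereas if $g = 4t$ for $t \geq 2$, then

$$m \geq \bar{k}^{(2t-1)/[t(t-1)]} n^{(t-1)/t}. \tag{36}$$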

The proof of Theorem 12 is based on the following result [34, Equations (1) and (2)]:

###### Theorem 13.

Suppose $A \in \{0,1\}^{m \times n}$ with $m \leq n$. Suppose further that in the bipartite graph associated with $A$, every node has degree $\geq 2$. (This is equivalent to the requirement that every row and every column of $A$ contains at least two ones.) Let $|E|$ denote the total number of edges of the graph, and define $\bar{d}_l := |E|/n$ and $\bar{d}_r := |E|/m$ to be the average left-node degree and average right-node degree, respectively. Suppose finally that the graph has girth $2r$. Then

$$m \geq \sum_{i=0}^{r-1} (\bar{d}_l - 1)^{\lceil i/2 \rceil} (\bar{d}_r - 1)^{\lfloor i/2 \rfloor}. \tag{37}$$

It is important to note that the above theorem does not require any assumptions about the underlying graph (e.g., regularity). The only assumption is that every node has degree two or more, so as to rule out trivial cases. Usually such theorems are used to find upper bounds on the girth of a bipartite graph in terms of the numbers of its nodes and edges (as in Theorem 14 below). However, we turn it around here and use the theorem to find a lower bound on , given the integers and .

Note that if $r = 2$, then $g = 4$ and the bound (37) becomes $m \geq \bar{d}_l$, which is trivial. In fact $m$ has to exceed the maximum degree of any left node. However, for $r \geq 3$, the bound in (37) is meaningful.

###### Proof.

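(Of Theorem 12:) The bound (37) implies that $m$ is no smaller than the last term in the summation; that is

$$m \geq (\bar{d}_l - 1)^{\lceil (r-1)/2 \rceil} (\bar{d}_r - 1)^{\lfloor (r-1)/2 \rfloor}. \tag{38}$$

Because $A$ is assumed to be left-regular, actually $\bar{d}_l = d_l$, but we do not make use of this, and will carry the symbol $\bar{d}_l$ throughout. By definition, we have that $n \bar{d}_l = m \bar{d}_r$. Therefore, if $m \leq n$, then it follows that

$$\bar{d}_r - 1 = \frac{n \bar{d}_l}{m} - 1 \geq \frac{n \bar{d}_l}{m} - \frac{n}{m} = \frac{n}{m} (\bar{d}_l - 1).$$

Therefore (38) implies that

$$m \geq (\bar{d}_l - 1)^{\alpha} \left( \frac{n}{m} \right)^{\lfloor (r-1)/2 \rfloor}, \tag{39}$$

where

$$\alpha = \lceil (r-1)/2 \rceil + \lfloor (r-1)/2 \rfloor.$$

Now we treat the cases $g = 4t + 2$ and $g = 4t$ separately. If $g = 4t + 2$, then $r = 2t + 1$, so that

$$\lceil (r-1)/2 \rceil = \lfloor (r-1)/2 \rfloor = t, \quad \alpha = 2t.$$

Therefore (39) becomes

$$m \geq (\bar{d}_l - 1)^{2t} \left( \frac{n}{m} \right)^{t} = \bar{k}^2 \left( \frac{n}{m} \right)^{t}.$$

This can be rearranged as

$$m^{t+1} \geq n^t \bar{k}^2,$$

or

$$m \geq \bar{k}^{2/(t+1)} n^{t/(t+1)},$$

which is (35). In case $g = 4t$, we have $r = 2t$ and $\alpha = 2t - 1$; the proof proceeds along entirely parallel lines and is omitted. ∎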

It is obvious from (35) that the lower bound is minimized (for a fixed choice of $n$ and $\bar{k}$) with $t = 1$, or $g = 6$. Similarly, the lower bound in (36) is minimized when $t = 2$, or $g = 8$. Higher values of $t$ would lead to more measurements being required. We can also compare $g = 6$ with $g = 8$ and show that $g = 6$ is better. Let us substitute $t = 1$ in (35) and $t = 2$ in (36). This gives

$$m \geq \begin{cases} \bar{k}\, n^{1/2} & \text{if } g = 6, \\ \bar{k}^{3/2} n^{1/2} & \text{if } g = 8. \end{cases} \tag{40}$$

If we wish to have fewer measurements than the dimension of the unknown vector, we can set $m < n$. Substituting this requirement into (40) leads to

$$\bar{k} < \begin{cases} n^{1/2} & \text{if } g = 6, \\ n^{1/3} & \text{if } g = 8. \end{cases}$$

Hence graphs of girth six are preferable to graphs of girth eight, because the upper limit on the recoverable sparsity count is higher with a graph of girth six than with a graph of girth eight.
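The lower bounds (35) and (36) are easy to tabulate for given values of the dimension and the sparsity limit. The sketch below is a direct transcription; the function name `measurement_lower_bound` and the sample inputs are hypothetical.

```python
def measurement_lower_bound(n, k_bar, g):
    """Right side of (35) for g = 4t + 2, or of (36) for g = 4t with t >= 2."""
    if g % 4 == 2 and g >= 6:
        t = (g - 2) // 4
        return k_bar ** (2.0 / (t + 1)) * n ** (t / (t + 1.0))
    if g % 4 == 0 and g >= 8:
        t = g // 4
        return k_bar ** ((2.0 * t - 1.0) / (t * (t - 1.0))) * n ** ((t - 1.0) / t)
    raise ValueError("g must be an even integer >= 6")


# Hypothetical values: n = 10,000 and k_bar = 10.
for g in (6, 8, 10):
    print(g, measurement_lower_bound(10_000, 10, g))
```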

## 9 Construction of Nearly Optimal Graphs of Girth Six

The discussion of the preceding section suggests that we must look for bipartite graphs of girth six where the integer $m$ satisfies the bound (37) with the $\geq$ replaced by an equality, or at least, close to it. In this section we prove a general result to the effect that a class of binary matrices has girth six. Then we give two specific constructions. The first of these is based on array codes, which are a part of low density parity check (LDPC) codes, and the second is based on Euler squares. The first construction is easier to explain, but the second one gives far more flexibility in terms of the number of measurements.

Here is the general theorem.

###### Theorem 14.

Suppose for some integers . Suppose further that

1. , where is the average left degree of .

2. The maximum inner product between any two columns of is one.

3. Every row and every column of have at least two ones.

Then the girth of is six.

Remark: Before proving the theorem, let us see how closely such a matrix satisfies the inequality (37). In the constructions below we have that , and . Therefore the bound in (37) becomes

$$
m \geq 1 + (l-1) + (l-1)(q-1) = q(l-1) + 1.
$$

Since , we see that the actual value of exceeds the lower bound for by a factor of (after neglecting the last term of on the right side). Note that there is no guarantee that the lower bound in Theorem 10 is actually achievable. So the class of matrices proposed above (if they could actually be constructed), can be said to be “near optimal.” In applying this theorem, we would choose such that , and choose any desired . With such a measurement matrix, basis pursuit will achieve robust -sparse recovery up to , that is, , more or less.
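As a small numerical illustration of the remark, the sketch below compares the number of rows with the right side of the bound displayed above, $q(l-1)+1$. It assumes that the number of measurements is $m = lq$, as in the inequality used in the proof and in the array code matrix (47). The function name and the sample values are hypothetical.

```python
from fractions import Fraction


def rows_versus_bound(q, l):
    """Return (m, bound, m / bound), assuming m = l * q rows."""
    m = l * q                # assumed number of measurements
    bound = q * (l - 1) + 1  # right side of the bound in the remark
    return m, bound, Fraction(m, bound)


if __name__ == "__main__":
    for q, l in [(11, 4), (101, 4), (1009, 4)]:  # hypothetical sample values
        m, bound, ratio = rows_versus_bound(q, l)
        print(q, l, m, bound, float(ratio))
```

As $q$ grows with $l$ fixed, the ratio approaches $l/(l-1)$, since the additive term $1$ becomes negligible.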

###### Proof.

Let denote the girth of . Then Condition (2) implies that . Condition (3) implies that the bound (37) applies with , , . Let , and define

$$
\alpha = \lceil (r-1)/2 \rceil + \lfloor (r-1)/2 \rfloor, \quad \beta = \lfloor (r-1)/2 \rfloor.
$$

Then the inequality (37) implies that

$$
lq \geq (\bar{d}_l - 1)^{\alpha} (q/l)^{\beta} \geq (l-1)^{\alpha} (q/l)^{\beta}.
$$

This can be rewritten as

$$
\frac{(l-1)^{\alpha} q^{\beta-1}}{l^{\beta+1}} \leq 1. \tag{41}
$$

Note that , so that , due to Condition (2). We study two cases separately.

Case (1): for some . In this case

$$
(r-1)/2 = t - 1/2, \quad \lceil (r-1)/2 \rceil = t, \quad \lfloor (r-1)/2 \rfloor = t-1,
$$

$$
\alpha = 2t-1, \quad \beta = t-1.
$$

Therefore (41) becomes

$$
\frac{(l-1)^{2t-1} q^{t-2}}{l^{t}} \leq 1, \tag{42}
$$

or

$$
q^{t-2} (l-1)^{t-1} \leq \left( \frac{l}{l-1} \right)^{t} \leq 2^{t},
$$

because $l/(l-1) \leq 2$ for $l \geq 2$. Also

$$
q^{t-2} (l-1)^{t-1} \geq q^{t-2} (l-1)^{t-2} = [q(l-1)]^{t-2}.
$$

Combining these inequalities gives

$$
[q(l-1)]^{t-2} \leq 2^{t},
$$

or

$$
\left[ \frac{q(l-1)}{2} \right]^{t-2} \leq 2^{2} = 4. \tag{43}
$$

It is now shown that (43) cannot hold if . If , then

$$
\frac{q(l-1)}{2} \leq \left[ \frac{q(l-1)}{2} \right]^{t-2} \leq 4,
$$

or . However, and , so this inequality cannot hold. Now let us consider the possibility that , i.e., that . In this case (42) becomes

$$
(l-1)^{3} \frac{1}{l^{2}} \leq 1, \quad \text{or} \quad (l-1)^{3} \leq l^{2}.
$$

This inequality can hold only for and not if . Hence cannot have girth for any .

Case (2): for some . In this case

$$
\lceil (r-1)/2 \rceil = \lfloor (r-1)/2 \rfloor = t, \quad \alpha = 2t, \quad \beta = t.
$$

So (41) becomes

$$
\frac{(l-1)^{2t} q^{t-1}}{l^{t+1}} \leq 1. \tag{44}
$$

As before, this can be rewritten as

$$
q^{t-1} (l-1)^{t-1} \leq \left( \frac{l}{l-1} \right)^{t+1} \leq 2^{t+1},
$$

or

$$
\left[ \frac{q(l-1)}{2} \right]^{t-1} \leq 2^{2} = 4. \tag{45}
$$

This inequality can hold if because the left side equals . However, if , then (45) implies that

$$
\frac{q(l-1)}{2} \leq \left[ \frac{q(l-1)}{2} \right]^{t-1} \leq 4,
$$

or , which is impossible. Hence (45) implies that , or that . ∎
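The case analysis in the proof can be cross-checked numerically. The sketch below, offered only as an illustration, lists the values of $r$ for which inequality (41) holds for a chosen pair $(l, q)$, using exact rational arithmetic. It uses the fact that, for an integer $r \geq 1$, $\lceil (r-1)/2 \rceil$ equals `r // 2` and $\lfloor (r-1)/2 \rfloor$ equals `(r - 1) // 2`. The function names, the search range for $r$, and the sample values are assumptions made for this example.

```python
from fractions import Fraction


def satisfies_41(l, q, r):
    """Check inequality (41) for integers l, q and r using exact arithmetic."""
    ceil_half = r // 2          # ceil((r - 1) / 2) for integer r >= 1
    floor_half = (r - 1) // 2   # floor((r - 1) / 2)
    alpha = ceil_half + floor_half
    beta = floor_half
    lhs = (Fraction(l - 1) ** alpha) * (Fraction(q) ** (beta - 1)) / (Fraction(l) ** (beta + 1))
    return lhs <= 1


def feasible_r(l, q, r_max=12):
    """All r in 2..r_max for which (41) holds."""
    return [r for r in range(2, r_max + 1) if satisfies_41(l, q, r)]


if __name__ == "__main__":
    print(feasible_r(4, 5))  # hypothetical sample values of l and q
```

For example, `feasible_r(4, 5)` returns `[2, 3]`: with these values, (41) fails for every $r$ from 4 up to the search limit.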

Now we present two explicit constructions of binary matrices that satisfy the conditions of Theorem 14.

The first construction is taken from the theory of low density parity check (LDPC) codes, and is a generalization of [35]. This type of construction for LDPC codes was first introduced in [36]. Let be a prime number, and let be any “fixed-point free” permutation of . In [35] is taken as the shift permutation matrix defined by and all other entries equal to zero, where is interpreted modulo . Then , the identity matrix. Now let be any integer, and define the matrix as the block-partitioned matrix , where

$$
M_{ij} = P^{(i-1)(j-1)}. \tag{46}
$$

More elaborately, the matrix is given by

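$$
H(q,l) = \begin{bmatrix}
I & I & I & \cdots & I \\
I & P & P^{2} & \cdots & P^{q-1} \\
I & P^{2} & P^{4} & \cdots & P^{2(q-1)} \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
I & P^{l-1} & P^{2(l-1)} & \cdots & P^{(l-1)(q-1)}
\end{bmatrix}. \tag{47}
$$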

The matrix is biregular, with left (column) degree and right (row) degree . It is rank-deficient, having rank . In principle we could drop the redundant rows, but that would destroy the left-regularity of the matrix, thus rendering the theory in this paper inapplicable. (However, the resulting matrix would still be right-regular.) Moreover, due to the fixed-point free nature of , it follows that the inner product between any two columns of is at most equal to one.
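The construction in (46) and (47) is short enough to check by direct computation. The following sketch, in plain Python, builds $H(q,l)$ under the assumption that $P$ is the cyclic shift with a one in position $(r, (r+1) \bmod q)$ of each row $r$, which is one possible convention for the shift permutation. It then reports the column degrees, the row degrees, and the maximum inner product between distinct columns. The function names are illustrative and the sample values are hypothetical.

```python
def array_code_matrix(q, l):
    """Build H(q, l) of (47) as a list of l*q rows, each with q*q binary entries."""
    H = [[0] * (q * q) for _ in range(l * q)]
    for i in range(l):           # block row, 0-based (paper's i minus 1)
        for j in range(q):       # block column, 0-based (paper's j minus 1)
            k = (i * j) % q      # block is P**k; reduction uses P**q = I
            for r in range(q):
                # Assumed convention: row r of P**k has its one in column (r + k) mod q
                H[i * q + r][j * q + (r + k) % q] = 1
    return H


def max_column_inner_product(H):
    """Largest inner product between two distinct columns of H."""
    cols = list(zip(*H))
    best = 0
    for a in range(len(cols)):
        for b in range(a + 1, len(cols)):
            best = max(best, sum(x * y for x, y in zip(cols[a], cols[b])))
    return best


if __name__ == "__main__":
    q, l = 5, 3  # hypothetical small example with q prime and l <= q
    H = array_code_matrix(q, l)
    print(len(H), len(H[0]))                # dimensions
    print({sum(col) for col in zip(*H)})    # set of column degrees
    print({sum(row) for row in H})          # set of row degrees
    print(max_column_inner_product(H))
```

With $q = 5$ and $l = 3$ this gives a $15 \times 25$ matrix in which every column contains 3 ones, every row contains 5 ones, and the maximum inner product between distinct columns is 1. Replacing $q$ by the composite number 4 (with $l = 3$) raises the maximum inner product to 2, which illustrates why this construction asks for a prime.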

It is shown in [35, Proposition 1] that has girth six; this also follows from Theorem 14.

The second construction is based on Euler squares. In [25], a general recipe is given for constructing generalized Euler squares. This is used in [24] to construct an associated binary matrix of order where is any arbitrary integer (in contrast with the construction of [35] which requires to be a prime number), such that the maximum inner product between any two columns is at most equal to one. Again, by Theorem 14, such matrices have girth six and are thus nearly optimal for compressed sensing. The upper bound on is defined as follows: Let be the prime number decomposition of . Then . In particular if is a prime or a power of a prime, then we can have . It is easy to verify that, if is a prime, then the construction in [24] is the same as the array code construction of [35] with permuted columns. For the case where is a prime power, the construction is more elaborate and is not pursued further here.

###### Example 3.

In this example we compare the number of samples required when using the DeVore construction and a matrix that satisfies the hypotheses of Theorem 14, such as the array code matrix or the Euler square matrix. The conclusions are that: (i) When , the DeVore construction requires fewer measurements than the array code, whereas when , the array code type of matrix requires fewer measurements. (ii) When , the DeVore construction requires more measurements than , the dimension of the unknown vector, whereas the array code construction has whenever .

To see this, recall that the DeVore construction produces a matrix of dimensions with the maximum inner product between columns equal to , and each column contains ones. So if we choose , then in Theorem 11 equals , while . Consequently the DeVore matrix satisfies the RNSP of order whenever , and the number of measurements equals . Thus requires that , or . In contrast, a matrix of the type discussed in Theorem 14 has dimensions where and . For this class of matrices, we have and . This matrix satisfies the RNSP whenever , and the number of measurements equals . Now if and only if . Also whenever . Here, in the interests of simplicity, we ignore the fact that has to be a prime number in both cases, and various rounding up operations.

## 10 Phase Transitions

Phase transition refers to an abrupt change in the qualitative behavior of the solution to a problem as the parameters are changed. In the case of compressed sensing, let us define two quantities: , which is known as the undersampling ratio, and , which is known as the oversampling ratio. (This terminology is introduced in [6] with denoted by and denoted by . Since these symbols are now used to denote different quantities in the compressed sensing literature, we use and instead.) Suppose we choose integers , together with a matrix , and use basis pursuit as the decoder. If a -sparse vector is chosen at random, we can ask: What is the probability that recovers the vector?

This question is answered in [6, 37] using techniques from combinatorial geometry, specifically polytope theory. Suppose is a convex polytope, that is, the convex hull of a finite number of points in , and as a matrix. Polytopes have vertices, edges and -dimensional faces called facets. Let denote the number of facets of dimension . In particular, we can define various polytopes corresponding to -sparse vectors in , which is called a “cross polytope” [37]. The image of under , denoted by , is also a polytope, and for each , we have that . Moreover, it is shown in [37] that, if is drawn at random from the cross polytope , then the probability of recovering via basis pursuit equals the ratio . Thus the question becomes of analyzing the behavior of this ratio for specific polytopes and specific matrices .

In [7], it is proved that if consists of samples of a normal (Gaussian) random variable, then as this recovery probability (i.e., the face count ratio) exhibits a sharp change when is increased for a fixed . It is claimed that this behavior is observed even with moderate values of such as . In that paper, the authors make a distinction between two types of recovery, namely uniform and nonuniform. In uniform recovery, basis pursuit is expected to recover all -sparse vectors, with high probability (with respect to the randomly generated Gaussian matrix). In nonuniform recovery, there is also a uniform probability measure on the set of -sparse vectors in , and basis pursuit is expected to recover a -sparse vector with high probability (both with respect to the randomly generated -sparse vector and the randomly generated Gaussian matrix). Clearly, nonuniform recovery holds whenever uniform recovery holds, but the converse need not be true. In the present paper, the focus is on uniform recovery.

Donoho and Tanner in [8] and [38] define the strong threshold and weak threshold as the threshold for uniform and nonuniform recovery, respectively. Unfortunately, there is no closed form expression for values either in the weak or the strong case. However, in [38], Theorems 1.4 and 1.5 suggest complicated formulas for these functions that work in the