# Toeplitz Block Matrices in Compressed Sensing

Florian Sebert, Leslie Ying, and Yi Ming Zou
01/10/2008
###### Abstract.

Recent work in compressed sensing theory shows that $n \times N$ independent and identically distributed (IID) sensing matrices whose entries are drawn independently from certain probability distributions guarantee exact recovery of a sparse signal with high probability even if $n \ll N$. Motivated by signal processing applications, random filtering with Toeplitz sensing matrices whose elements are drawn from the same distributions was considered and shown to also be sufficient to recover a sparse signal from reduced samples exactly with high probability. This paper considers Toeplitz block matrices as sensing matrices. They naturally arise in multichannel and multidimensional filtering applications and include Toeplitz matrices as special cases. It is shown that the probability of exact reconstruction is also high. Their performance is validated using simulations.

F. Sebert and Y. M. Zou are with the Department of Mathematical Sciences, University of Wisconsin, Milwaukee, WI 53201, USA email: fmsebert@uwm.edu, ymzou@uwm.edu
L. Ying is with the Department of Electrical Engineering, University of Wisconsin, Milwaukee, WI 53201, USA email: leiying@uwm.edu

## 1. Introduction

The central problem in compressed sensing (CS) is the recovery of a vector $x \in \mathbb{R}^N$ from its linear measurements $y$ of the form

$$y_i = \langle x, \varphi_i \rangle, \quad 1 \le i \le n, \tag{1.1}$$

where $n$ is assumed to be much smaller than $N$. Of course, for $n < N$, (1.1) poses an under-determined system of equations which has non-unique solutions. Exact recovery of the original vector $x$ needs further prior information. The work by Candès, Donoho, Romberg, Tao, and others (see e.g. [1], [2], and the references therein) showed that under the assumption that $x$ is sparse, one can actually recover $x$ from a sample $y$ which is much smaller in size than $x$ by solving a convex program with a suitably chosen sampling basis $\varphi_i$. If we write the linear system (1.1) in the form

$$y = \Phi x, \quad \text{where } \Phi \text{ is an } n \times N \text{ matrix}, \tag{1.2}$$

then the question about what sampling methods guarantee the exact recovery of $x$ becomes the question about what matrices $\Phi$ are “good” compressed sensing matrices, meaning that they ensure exact recovery of a sparse $x$ from $y$ with high probability under the condition that $n \ll N$.

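In [3] Candès and Tao introduce the restricted isometry property (RIP) as a condition on matrices $\Phi$ which provides a guarantee on the performance of $\Phi$ in compressed sensing.

Following their definition, we say that a matrix $\Phi \in \mathbb{R}^{n \times N}$ satisfies RIP of order $m$ and constant $\delta_m \in (0,1)$ if

$$(1-\delta_m)\|z\|_2^2 \le \|\Phi_T z\|_2^2 \le (1+\delta_m)\|z\|_2^2 \quad \forall z \in \mathbb{R}^{|T|}, \tag{1.3}$$

where $T \subset \{1, 2, \ldots, N\}$, $|T| \le m$, and $\Phi_T$ denotes the matrix obtained by retaining only the columns of $\Phi$ corresponding to the entries of $T$.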
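To make the two-sided bound in (1.3) concrete, the following is a minimal sketch that probes the ratio $\|\Phi_T z\|_2^2 / \|z\|_2^2$ for randomly chosen supports $T$ and vectors $z$. It only yields a Monte Carlo lower bound on the RIP constant, not a certificate; the function names and the trial count are illustrative assumptions, not part of the paper.

```python
import math
import random


def restricted_ratio(phi, T, z):
    """Return ||Phi_T z||^2 / ||z||^2 for the 0-based column index list T."""
    y = [sum(row[c] * zc for c, zc in zip(T, z)) for row in phi]
    return sum(v * v for v in y) / sum(v * v for v in z)


def empirical_rip_constant(phi, m, trials=200, seed=0):
    """Largest |ratio - 1| seen over random supports of size m.

    This is only a lower bound on the true delta_m (hypothetical helper):
    the true constant is a maximum over all supports and all z.
    """
    rng = random.Random(seed)
    N = len(phi[0])
    worst = 0.0
    for _ in range(trials):
        T = rng.sample(range(N), m)
        z = [rng.gauss(0.0, 1.0) for _ in T]
        worst = max(worst, abs(restricted_ratio(phi, T, z) - 1.0))
    return worst


# Example: a 64 x 256 matrix with entries +-1/sqrt(n), probed at order m = 3.
rng = random.Random(42)
n, N = 64, 256
phi = [[rng.choice((-1.0, 1.0)) / math.sqrt(n) for _ in range(N)] for _ in range(n)]
estimate = empirical_rip_constant(phi, m=3)
```

A matrix with orthonormal columns gives a ratio of exactly 1 for every $T$ and $z$, so the estimate is 0 in that case (up to rounding).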

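It was shown in [3] (reinterpreted in [4]) that if $\Phi$ satisfies RIP of order $3m$ and constant $\delta_{3m} \in (0, 1/3)$:

$$(1-\delta_{3m})\|z\|_2^2 \le \|\Phi_T z\|_2^2 \le (1+\delta_{3m})\|z\|_2^2 \quad \forall z \in \mathbb{R}^{|T|}, \tag{1.4}$$

where $T \subset \{1, 2, \ldots, N\}$ and $|T| \le 3m$, the decoder $\Delta$ given by

$$\Delta(y) := \operatorname{argmin} \|x\|_{\ell_1^N} \quad \text{subject to } \Phi x = y \tag{1.5}$$

ensures exact recovery of $x$ from $y$.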

Recently Baraniuk et al. [5] showed that matrices whose entries are drawn independently from certain probability distributions satisfy RIP of order $m$ with probability $\ge 1 - e^{-c_2 n}$ for every $\delta_m \in (0,1)$ provided that $n \ge c_1 m \ln(N/m)$, where $c_1, c_2$ are some positive constants depending only on $\delta_m$. Motivated by applications in signal processing, Bajwa et al. [6] considered (truncated) Toeplitz-structured matrices whose entries are drawn from the same probability distributions and showed that they satisfy RIP of order $3m$ with probability $\ge 1 - e^{-c_2 n/m^2}$ for every $\delta_{3m} \in (0, 1/3)$ provided that $n \ge c_1 m^3 \ln(N/m)$.

Some examples of probability distributions that can be used in this context have been studied in [7]. They include

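$$
\begin{aligned}
r_{i,j} &\sim \mathcal{N}\!\left(0, \tfrac{1}{n}\right),\\
r_{i,j} &= \begin{cases} \frac{1}{\sqrt{n}} & \text{with probability } 1/2\\ -\frac{1}{\sqrt{n}} & \text{with probability } 1/2, \end{cases}\\
r_{i,j} &= \begin{cases} \sqrt{3/n} & \text{with probability } 1/6\\ 0 & \text{with probability } 2/3\\ -\sqrt{3/n} & \text{with probability } 1/6. \end{cases}
\end{aligned}
\tag{1.6}
$$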
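The three distributions in (1.6) all have mean 0 and variance $1/n$, so a column with $n$ such entries has squared norm 1 in expectation. A minimal sketch of samplers for them follows; the helper names (`gaussian_entry`, `bernoulli_entry`, `sparse_entry`, `iid_matrix`) are hypothetical and chosen only for illustration.

```python
import math
import random


def gaussian_entry(n, rng):
    """Draw from N(0, 1/n); random.gauss takes the standard deviation."""
    return rng.gauss(0.0, 1.0 / math.sqrt(n))


def bernoulli_entry(n, rng):
    """Draw +1/sqrt(n) or -1/sqrt(n), each with probability 1/2."""
    return (1.0 if rng.random() < 0.5 else -1.0) / math.sqrt(n)


def sparse_entry(n, rng):
    """Draw +sqrt(3/n), 0, -sqrt(3/n) with probabilities 1/6, 2/3, 1/6."""
    u = rng.random()
    if u < 1.0 / 6.0:
        return math.sqrt(3.0 / n)
    if u < 5.0 / 6.0:
        return 0.0
    return -math.sqrt(3.0 / n)


def iid_matrix(rows, cols, n, draw, seed=0):
    """rows x cols matrix (list of rows) of IID entries with variance 1/n."""
    rng = random.Random(seed)
    return [[draw(n, rng) for _ in range(cols)] for _ in range(rows)]
```

For the second distribution the squared column norm of an $n$-row matrix equals 1 exactly; for the other two it equals 1 only in expectation.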

Motivated by applications in multichannel sampling, in this paper we will consider Toeplitz block matrices with elements in each block drawn independently from one of the probability distributions in (1.6) and some other block matrices with similar structures. We show that such matrices also satisfy RIP of order $3m$ for every $\delta_{3m} \in (0, 1/3)$ with high probability, provided that $n \ge c_1 l m \ln(N/m)$, where $l$ is the number of blocks in one column of the matrix and $c_1$ is some positive constant depending only on $\delta_{3m}$. These Toeplitz block matrices naturally represent the system equation matrices in multichannel sampling applications where a single input signal is recovered from output samples of multiple channels with IID random filters. The result justifies the use of multichannel over single-channel systems in compressed sensing. The advantages of Toeplitz matrices pointed out in [6], such as efficient implementations, also apply to the matrices considered in this paper.

## 2. Main Result

###### Theorem 2.1.

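For Toeplitz block matrices $\Phi$ of the form

$$
\Phi = \begin{pmatrix}
\Phi_k & \Phi_{k-1} & \ldots & \Phi_2 & \Phi_1\\
\Phi_{k+1} & \Phi_k & \ldots & \Phi_3 & \Phi_2\\
\vdots & \vdots & \ddots & \ddots & \vdots\\
\Phi_{k+l-1} & \Phi_{k+l-2} & \ldots & \ldots & \Phi_l
\end{pmatrix} \in \mathbb{R}^{n \times N} \tag{2.1}
$$

with blocks $\Phi_i$ (each with $d = n/l$ rows) whose elements are drawn independently from one of the probability distributions in (1.6), there exist constants $c_1, c_2 > 0$ depending only on $\delta_{3m}$, such that:

1. If $l \le 3m(3m-1)$, then for any $n \ge c_1 l m \ln(N/m)$, $\Phi$ satisfies RIP of order $3m$ for every $\delta_{3m} \in (0, 1/3)$ with probability at least

   $$1 - e^{-c_2 n/l}.$$
2. If $l > 3m(3m-1)$, then for any $n \ge c_1 m^3 \ln(N/m)$, $\Phi$ satisfies RIP of order $3m$ for every $\delta_{3m} \in (0, 1/3)$ with probability at least

   $$1 - e^{-c_2 n/m^2}.$$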
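The block layout in (2.1) places block $\Phi_{k+s-j}$ in block-row $s$ and block-column $j$ (both 1-based). A minimal sketch of the assembly is given below; `toeplitz_block_matrix` and the dictionary-of-blocks representation are illustrative assumptions rather than anything prescribed by the paper.

```python
def toeplitz_block_matrix(blocks, k, l):
    """Assemble the matrix of (2.1) from blocks Phi_1, ..., Phi_{k+l-1}.

    blocks: dict mapping the block index i (1-based) to a matrix given as a
            list of rows; all blocks are assumed to have the same shape.
    Block-row s, block-column j (1-based) holds Phi_{k+s-j}.
    """
    d = len(blocks[1])  # number of rows per block
    rows = []
    for s in range(1, l + 1):
        for r in range(d):
            row = []
            for j in range(1, k + 1):
                row.extend(blocks[k + s - j][r])
            rows.append(row)
    return rows


# Tiny example with 1 x 1 blocks Phi_i = [[i]], k = 3 block-columns, l = 2 block-rows.
example = toeplitz_block_matrix({i: [[i]] for i in range(1, 5)}, k=3, l=2)
```

With 1 x 1 blocks the result is an ordinary (truncated) Toeplitz matrix, which is the special case mentioned in the abstract.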

The above theorem gives the requirement for $n$ and the probability of exact reconstruction of an $m$-sparse signal $x$ from a measurement $y$ if Toeplitz block matrices are used. In particular, it says that if the number of blocks ($l$) in one column of $\Phi$ does not exceed a certain value depending only on the sparsity of the signal $x$, the probability of perfect reconstruction is greater and the number of required measurements is smaller than if $l$ is not bounded in this way.

As noted in [6, 9], Toeplitz matrices naturally arise in one-dimensional single-channel filtering applications where the matrix elements are filter coefficients. Similarly, the Toeplitz block matrices defined in (2.1) naturally arise in one-dimensional multichannel sampling applications where the length of the filter is at least points larger than that of the input signal. The conventional multichannel sampling theorem states that the sampling rate reduction over the single-channel system cannot exceed the number of channels for exact recovery, whereas Theorem 2.1 suggests that multichannel systems with IID random filters might be able to reduce the sampling rate by a factor higher than the number of channels.

We remark that for other block matrices with similar structures, the result in Theorem 2.1 also holds (see Section 4).

## 3. Proof of Main Result

Let $T \subset \{1, 2, \ldots, N\}$. Denote by $\Phi_{T,i}$ the $i$-th row of the matrix $\Phi_T$ obtained by retaining only those columns of $\Phi$ corresponding to the elements in $T$, and let $\Phi_{t_s} \cap \Phi_{T,i}$ denote the set of random variables common to the $i$-th row of $\Phi_T$ and the $t_s$-th block of $\Phi$.

We note that, if (1.4) holds for a set $T$, then it also holds for any $T' \subset T$. To prove that Toeplitz IID block matrices satisfy RIP with high probability, it is therefore enough to consider only those sets $T$ where $|T| = 3m$.

###### Lemma 3.1.

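Define the sets $D_{T,i}$ by

$$D_{T,i} = \{\, j \in \{1, 2, \ldots, n\} : \Phi_{T,j} \text{ is stochastically dependent on } \Phi_{T,i},\ j \ne i \,\}.$$

• If $T$ satisfies $|T|(|T|-1) \le l-1$, then $|D_{T,i}| \le |T|(|T|-1)$.

• If $T$ satisfies $|T|(|T|-1) > l-1$, then $|D_{T,i}| \le l-1$.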

###### Proof.

Fix . defines a sequence , where is the number of columns from block in . Thus . Consider the number of rows that have dependency with the elements in . Since all elements inside a single block are independent, there can be no dependencies within one block. Moreover, because of the structure of the matrix , there can be at most

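$$\begin{cases} 0 & \text{if } \Phi_{t_s} \cap \Phi_{T,i} = \emptyset\\ |T| - r_{t_s} & \text{if } \Phi_{t_s} \cap \Phi_{T,i} \ne \emptyset \end{cases}$$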

rows outside the block that depend on any element in .
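(i) If $T$ satisfies $|T|(|T|-1) \le l-1$, i.e. if $l > |T|(|T|-1)$, these rows may be distinct, and we have

$$|D_{T,i}| \le \sum_{\{t_s,\ s \in \{1,2,\ldots,k\} \,:\, \Phi_{t_s} \cap \Phi_{T,i} \ne \emptyset\}} (|T| - r_{t_s}) \le \sum_{t \in T} (|T| - 1) = |T|(|T|-1) \le l-1$$

dependent rows.

(ii) If $T$ satisfies $|T|(|T|-1) > l-1$, i.e. if $l \le |T|(|T|-1)$, then $|D_{T,i}|$ is upper bounded by the number of blocks, so $|D_{T,i}| \le l-1$. ∎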
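The counting argument of Lemma 3.1 can be checked by brute force on small instances. The sketch below labels every entry of a matrix with the layout of (2.1) by the random variable it holds (block index, row inside the block, column inside the block) and calls two rows dependent when their restrictions to $T$ share a label. The block width `e`, the 0-based indexing, and the function names are assumptions made for this illustration.

```python
from itertools import combinations


def label_matrix(k, l, d, e):
    """Symbolic copy of (2.1): each entry is (block index, local row, local column).

    k, l: number of block-columns and block-rows; d, e: rows and columns per block.
    """
    rows = []
    for s in range(1, l + 1):
        for r in range(d):
            rows.append([(k + s - j, r, c) for j in range(1, k + 1) for c in range(e)])
    return rows


def dependent_rows(labels, T, i):
    """Indices j != i whose row shares a random variable with row i on columns T."""
    mine = {labels[i][c] for c in T}
    return {j for j, row in enumerate(labels)
            if j != i and mine & {row[c] for c in T}}


# Exhaustive check of both bounds for one small configuration and |T| = 2.
k, l, d, e = 4, 5, 2, 2
labels = label_matrix(k, l, d, e)
largest = max(len(dependent_rows(labels, T, i))
              for T in combinations(range(k * e), 2)
              for i in range(l * d))
```

In this configuration every $|D_{T,i}|$ stays below both $|T|(|T|-1)$ and $l-1$, as the lemma predicts; this is a sanity check on small sizes, not a substitute for the proof.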

In [7] it has been shown that for given $n$, $m$, and $\delta_m \in (0,1)$ with $m < n$, an IID matrix of size $n \times m$ with entries drawn independently from one of the distributions in (1.6) (these matrices consist of columns whose squared norm is equal to 1 in expectation) satisfies (1.3) with probability

$$\ge 1 - e^{-f(n,m,\delta_m)}, \tag{3.1}$$

where

$$f(n,m,\delta_m) = c_0 n - m \ln(12/\delta_m) - \ln(2). \tag{3.2}$$

Now consider a (truncated) Toeplitz block matrix as in (2.1), where the blocks are such IID matrices with entries drawn independently from the same set of distributions as above.

The following lemma gives a lower bound for the probability that a matrix $\Phi$ as in (2.1) with $l \le 3m(3m-1)$ satisfies (1.4) for any fixed subset $T$ with $|T| = 3m$. Lemma 3.3 gives a tighter bound for the case $l > 3m(3m-1)$.

The proof of Lemma 3.2 uses an argument similar to the one in the proof of Lemma 1 in [6].

###### Lemma 3.2.

For given with , and , the Toeplitz block submatrix satisfies (1.4) with probability at least

$$1 - e^{-f(d,m,\delta_m) + \ln(l)}.$$

###### Proof.

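We can write the matrix $\Phi_T$ as

$$\Phi_T = \begin{pmatrix} \Phi_T^1\\ \vdots\\ \Phi_T^l \end{pmatrix}, \tag{3.3}$$

where the blocks $\Phi_T^i$ of size $d \times |T|$ are given by the columns determined by $T$ in the $i$-th row of blocks in $\Phi$.

Note that $\Phi_T^i$ is an IID matrix with entries from one of the distributions in (1.6). If we let $\tilde{\Phi}_T^i = \sqrt{l}\, \Phi_T^i$, then the matrices $\tilde{\Phi}_T^i$ have columns whose squared norm is equal to 1 in expectation and by (3.1) satisfy (1.4), i.e.

$$(1-\delta_m)\|z\|_2^2 \le \|\tilde{\Phi}_T^i z\|_2^2 \le (1+\delta_m)\|z\|_2^2, \quad \forall z \in \mathbb{R}^{|T|},\ \forall i \in \{1, 2, \ldots, l\},$$

with probability at least

$$1 - e^{-f(d,m,\delta_m)}. \tag{3.4}$$

Now since

$$\|\Phi_T z\|_2^2 = \sum_{i=1}^{l} \|\Phi_T^i z\|_2^2 = \sum_{i=1}^{l} \frac{1}{l} \|\tilde{\Phi}_T^i z\|_2^2 \tag{3.5}$$

and $\sum_{i=1}^{l} \frac{1}{l} = 1$, we have

$$(1-\delta_m)\|z\|_2^2 \le \|\Phi_T z\|_2^2 \le (1+\delta_m)\|z\|_2^2, \quad \forall z \in \mathbb{R}^{|T|}.$$

In other words, the event $E_1 = \{\tilde{\Phi}_T^i \text{ satisfies (1.4) for all } i \in \{1, \ldots, l\}\}$ implies the event $E_2 = \{\Phi_T \text{ satisfies (1.4)}\}$. Consequently,

$$
\begin{aligned}
P(E_2) &= 1 - P(E_2^c) \ge 1 - P(E_1^c)\\
&\ge 1 - \sum_{i=1}^{l} P\big(\{\tilde{\Phi}_T^i \text{ does not satisfy (1.4)}\}\big)\\
&\ge 1 - \sum_{i=1}^{l} e^{-f(d,m,\delta_m)} \quad \text{(by (3.4))}\\
&= 1 - e^{-f(d,m,\delta_m) + \ln(l)}.
\end{aligned}
$$

∎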
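The two ingredients of this proof, the averaging identity (3.5) and the union bound, are easy to verify numerically. The following sketch does so for a small random matrix; the sizes, the seed, and the helper names are arbitrary illustrative choices, and the matrix is plain IID rather than Toeplitz block since (3.5) does not depend on the structure.

```python
import math
import random


def sq_norm(v):
    return sum(x * x for x in v)


def matvec(A, z):
    return [sum(a * b for a, b in zip(row, z)) for row in A]


rng = random.Random(1)
l, d, cols = 4, 3, 5          # l row-groups of d rows each, |T| = cols
n = l * d
phi_T = [[rng.choice((-1.0, 1.0)) / math.sqrt(n) for _ in range(cols)]
         for _ in range(n)]
z = [rng.gauss(0.0, 1.0) for _ in range(cols)]

# Left-hand side of (3.5).
total = sq_norm(matvec(phi_T, z))

# Right-hand side: rescale each group of d rows by sqrt(l), then average.
parts = []
for i in range(l):
    scaled = [[math.sqrt(l) * x for x in row] for row in phi_T[i * d:(i + 1) * d]]
    parts.append(sq_norm(matvec(scaled, z)))
average = sum(parts) / l


def union_bound(f_value, count):
    """1 - count * exp(-f), which equals 1 - exp(-f + ln(count))."""
    return 1.0 - count * math.exp(-f_value)
```

Because `total` is an average of the `parts`, it lies between their minimum and maximum, which is exactly why a two-sided bound on every rescaled block carries over to $\Phi_T$.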

###### Lemma 3.3.

For given with , and , if , the Toeplitz block submatrix satisfies (1.4) with probability at least

$$1 - e^{-f(\lfloor n/q \rfloor, m, \delta_m) + \ln(q)},$$

where $q = |T|(|T|-1) + 1$.

###### Proof.

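Let $\Phi_{T,i}$ denote the $i$-th row of $\Phi_T$ and construct an undirected dependency graph $G = (V, E)$ such that $V = \{1, 2, \ldots, n\}$ and

$$E = \{(i, i') \in V \times V : i \ne i',\ \Phi_{T,i} \text{ and } \Phi_{T,i'} \text{ are dependent}\}.$$

By Lemma 3.1, $\Phi_{T,i}$ can be dependent with at most $|T|(|T|-1)$ other rows. Therefore, the maximum degree $\Delta$ of $G$ satisfies $\Delta \le |T|(|T|-1)$, and using the Hajnal-Szemerédi theorem on equitable coloring of graphs, we can partition $V$ using $q$ colors. Let $C_1, \ldots, C_q$ be the different color classes, then

$$|C_j| = \lfloor n/q \rfloor \quad \text{or} \quad |C_j| = \lceil n/q \rceil.$$

Now, let $\Phi_T^j$ be the $|C_j| \times |T|$ submatrix obtained from retaining the rows corresponding to the indices in $C_j$ and define $\tilde{\Phi}_T^j = \sqrt{n/|C_j|}\, \Phi_T^j$. Then

$$\forall z \in \mathbb{R}^{|T|}, \quad \|\Phi_T z\|_2^2 = \sum_{j=1}^{q} \|\Phi_T^j z\|_2^2 = \sum_{j=1}^{q} \frac{|C_j|}{n} \|\tilde{\Phi}_T^j z\|_2^2. \tag{3.6}$$

Every $\tilde{\Phi}_T^j$ is a $|C_j| \times |T|$ IID matrix whose columns have squared norm equal to 1 in expectation. By (3.1), they satisfy (1.4) with probability at least

$$1 - e^{-f(|C_j|, m, \delta_m)} \ge 1 - e^{-f(\lfloor n/q \rfloor, m, \delta_m)}. \tag{3.7}$$

Since $\sum_{j=1}^{q} |C_j|/n = 1$, by (3.6), we have that if

$$(1-\delta_m)\|z\|_2^2 \le \|\tilde{\Phi}_T^j z\|_2^2 \le (1+\delta_m)\|z\|_2^2, \quad \forall z \in \mathbb{R}^{|T|},\ \forall j \in \{1, 2, \ldots, q\},$$

then

$$(1-\delta_m)\|z\|_2^2 \le \|\Phi_T z\|_2^2 \le (1+\delta_m)\|z\|_2^2, \quad \forall z \in \mathbb{R}^{|T|}.$$

In other words, the event $E_1 = \{\tilde{\Phi}_T^j \text{ satisfies (1.4) for all } j \in \{1, \ldots, q\}\}$ implies the event $E_2 = \{\Phi_T \text{ satisfies (1.4)}\}$. Consequently,

$$
\begin{aligned}
P(E_2) &= 1 - P(E_2^c) \ge 1 - P(E_1^c)\\
&\ge 1 - \sum_{j=1}^{q} P\big(\{\tilde{\Phi}_T^j \text{ does not satisfy (1.4)}\}\big)\\
&\ge 1 - \sum_{j=1}^{q} e^{-f(\lfloor n/q \rfloor, m, \delta_m)} \quad \text{(by (3.7))}\\
&= 1 - e^{-f(\lfloor n/q \rfloor, m, \delta_m) + \ln(q)}.
\end{aligned}
$$

∎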

###### Proof of the main result (Theorem 2.1).

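(i) From (3.2) and Lemma 3.2 we have that $\Phi_T$ satisfies (1.4) for any $T$ such that $|T| = 3m$ with probability at least

$$1 - e^{-c_0 d + 3m \ln(12/\delta_{3m}) + \ln(2) + \ln(l)}. \tag{3.8}$$

Since there are $\binom{N}{3m} \le (eN/3m)^{3m}$ such subsets, using Bonferroni’s inequality (see e.g. [8]) yields that $\Phi$ satisfies RIP of order $3m$ with probability at least

$$1 - e^{-c_0 n/l + 3m[\ln(12/\delta_{3m}) + \ln(N/3m) + 1] + \ln(2) + \ln(l)}. \tag{3.9}$$

Fix $c_2 < c_0$ and pick $c_1 = 3[\ln(12/\delta_{3m}) + 5]/(c_0 - c_2)$. Then for any $n \ge c_1 l m \ln(N/m)$, the exponent of $e$ in (3.9) is upper bounded by $-c_2 n/l$:

$$
\begin{aligned}
& -\frac{c_0 n}{l} + 3m\left[\ln\!\left(\frac{12}{\delta_{3m}} \frac{N}{3m}\right) + 1\right] + \ln(2l) \le -\frac{c_2 n}{l}\\
\Leftrightarrow\ & 3m\left[\ln\!\left(\frac{12}{\delta_{3m}} \frac{N}{3m}\right) + 1\right] + \ln(2l) \le \frac{n}{l}(c_0 - c_2)\\
\Leftrightarrow\ & \frac{3lm}{c_0 - c_2}\left[\ln\!\left(\frac{12}{\delta_{3m}} \frac{N}{3m}\right) + 1 + \frac{\ln(2)}{3m} + \frac{\ln(l)}{3m}\right] \le n\\
\Leftrightarrow\ & \frac{3lm \ln\!\left(\frac{N}{m}\right)}{c_0 - c_2}\left[\frac{\ln\!\left(\frac{12}{\delta_{3m}} \frac{N}{3m}\right) + 1 + \frac{\ln(2) + \ln(l)}{3m}}{\ln\!\left(\frac{N}{m}\right)}\right] \le n\\
\Leftarrow\ & \frac{3lm \ln\!\left(\frac{N}{m}\right)}{c_0 - c_2}\left[\ln\!\left(\frac{12}{\delta_{3m}}\right) + 5\right] \le n\\
\Leftrightarrow\ & c_1 l m \ln\!\left(\frac{N}{m}\right) \le n
\end{aligned}
$$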
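As a numerical spot check of the chain of inequalities in part (i), the sketch below evaluates the exponent in (3.9) at the smallest admissible $n$ and compares it with $-c_2 n/l$. The value of $c_0$ used here is a placeholder assumption (in [5] it depends on $\delta$), as are the function names and the sample parameters; the check covers one parameter set and is not a proof.

```python
import math


def exponent_3_9(n, l, m, N, delta, c0):
    """Exponent of e in (3.9)."""
    return (-c0 * n / l
            + 3 * m * (math.log(12 / delta) + math.log(N / (3 * m)) + 1)
            + math.log(2) + math.log(l))


def c1_from_chain(delta, c0, c2):
    """c_1 read off from the last two lines of the chain; requires c2 < c0."""
    return 3 * (math.log(12 / delta) + 5) / (c0 - c2)


# Placeholder constants (assumptions, not values from the paper).
delta, c0, c2 = 0.3, 0.005, 0.0025
m, l, N = 2, 4, 1000

c1 = c1_from_chain(delta, c0, c2)
n = math.ceil(c1 * l * m * math.log(N / m))   # smallest n allowed by the theorem
lhs = exponent_3_9(n, l, m, N, delta, c0)
rhs = -c2 * n / l
holds = lhs <= rhs
```

Since the exponent in (3.9) decreases in $n$ faster than $-c_2 n/l$ does (because $c_0 > c_2$), the inequality also holds for every larger $n$ once it holds at the threshold.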

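(ii) From (3.2) and Lemma 3.3 we have that $\Phi_T$ satisfies (1.4) for any $T$ such that $|T| = 3m$ with probability at least

$$
\begin{aligned}
& 1 - e^{-c_0 \lfloor n/q \rfloor + 3m \ln(12/\delta_{3m}) + \ln(2) + \ln(q)}\\
\ge\ & 1 - e^{-c_0 n/9m^2 + 3m \ln(12/\delta_{3m}) + \ln(2) + \ln(9m^2) + c_0}.
\end{aligned}
\tag{3.10}
$$

Since there are $\binom{N}{3m} \le (eN/3m)^{3m}$ such subsets, using Bonferroni’s inequality again yields that $\Phi$ satisfies RIP of order $3m$ with probability at least

$$1 - e^{-c_0 n/9m^2 + 3m[\ln(12/\delta_{3m}) + \ln(N/3m) + 1] + \ln(2) + \ln(9m^2) + c_0}. \tag{3.11}$$

Now fix $c_2 < c_0/9$ and pick $c_1$ sufficiently large, depending only on $c_0$, $c_2$, and $\delta_{3m}$. Then, for any $n \ge c_1 m^3 \ln(N/m)$, the exponent of $e$ in (3.11) is upper bounded by $-c_2 n/m^2$. This completes the proof of the theorem. ∎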

###### Remark 3.1.

If $l = 1$, then $\Phi$ is an IID matrix, and Theorem 2.1 lower bounds the probability of $\Phi$ satisfying RIP of order $3m$ by $1 - e^{-c_2 n}$, which recovers the bound obtained in [5].

###### Remark 3.2.

As long as $l \le 9m^2 - 3m$, a matrix $\Phi$ as in (2.1) satisfies RIP of order $3m$ with probability $\ge 1 - e^{-c_2' n/m^2}$, which is the bound given in [6], since

$$-c_2 n/l \le -c_2 n/(9m^2 - 3m) \le -c_2 n/9m^2 = -c_2' n/m^2. \tag{3.12}$$

## 4. Other Block Matrices

### 4.1. Circulant Matrices

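The above consideration can be applied to (truncated) circulant block matrices of the form

$$
\Phi = \begin{pmatrix}
\Phi_k & \Phi_{k-1} & \ldots & \Phi_2 & \Phi_1\\
\Phi_1 & \Phi_k & \ldots & \Phi_3 & \Phi_2\\
\vdots & \vdots & \ddots & \ddots & \vdots\\
\Phi_{l-1} & \Phi_{l-2} & \ldots & \ldots & \Phi_l
\end{pmatrix} \in \mathbb{R}^{n \times N}, \tag{4.1}
$$

where the blocks $\Phi_i$ are all IID matrices.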

Similar to (2.1), the circulant matrices in (4.1) also represent the system equation matrices in multichannel sampling, but the convolution is a circular one. They usually arise in applications where convolutions are implemented by multiplications in the Fourier domain.

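Before we present the theorem for this type of matrices, we first comment on the maximum number of stochastically dependent rows in a (truncated) circulant matrix of the form

$$
A = \begin{pmatrix}
a_q & a_{q-1} & \ldots & a_2 & a_1\\
a_1 & a_q & \ldots & a_3 & a_2\\
\vdots & \vdots & \ddots & \ddots & \vdots\\
a_{p-1} & a_{p-2} & \ldots & \ldots & a_p
\end{pmatrix} \in \mathbb{R}^{p \times q}. \tag{4.2}
$$

Again, we denote by $A_{T,i}$ the $i$-th row of the matrix $A_T$, which is obtained by retaining only those columns of $A$ corresponding to $T$.

###### Lemma 4.1.

Define the sets $D_{T,i}$ by

$$D_{T,i} = \{\, j \in \{1, 2, \ldots, p\} : A_{T,j} \text{ is stochastically dependent on } A_{T,i},\ j \ne i \,\}.$$

Then $D_{T,i}$ has cardinality at most $|T|(|T|-1)$.

###### Proof.

Note first that an upper bound for the case $p = q$ clearly upper bounds the case where $p < q$. We may therefore assume that $p = q$ and $A$ is a square circulant matrix. Then the number of rows stochastically dependent on $A_{T,i}$ is independent of $i$ and we can, w.l.o.g., assume that $i = 1$. Let $t$ be a $q$-tuple defined by

$$t_j = \begin{cases} 0 & \text{if } j \notin T\\ 1 & \text{if } j \in T, \end{cases} \qquad j = 1, \ldots, q,$$

and consider the matrix

$$\tilde{A} = \begin{pmatrix} t\\ \sigma(t)\\ \vdots\\ \sigma^{q-1}(t) \end{pmatrix} \in \mathbb{R}^{q \times q}, \tag{4.3}$$

where $\sigma$ defines the right-shift $\sigma(t_1, t_2, \ldots, t_q) = (t_q, t_1, \ldots, t_{q-1})$. Denote by $\tilde{A}_T$ the matrix obtained by retaining only those columns of $\tilde{A}$ corresponding to $T$. It is now easy to see that

$$
\begin{aligned}
|D_{T,i}| &= \big|\{\tilde{A}_{T,i},\ i \in \{2, \ldots, q\} : h(\tilde{A}_{T,1}, \tilde{A}_{T,i}) < |T|\}\big|\\
&\le \{\#\text{ of ones in } t\} \cdot (\{\#\text{ of ones in } t\} - 1)\\
&= |T|(|T|-1),
\end{aligned}
$$

where $h$ is the Hamming distance defined by

$$h(x, y) = |\{j \in \{1, 2, \ldots, q\} : x_j \ne y_j\}|.$$

∎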
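The indicator-shift argument of Lemma 4.1 translates directly into code: a shifted row depends on the first row exactly when the shifted indicator has a one in at least one column of $T$, i.e. when its Hamming distance to the all-ones restriction is below $|T|$. The sketch below uses 0-based column indices and a hypothetical function name; it treats the square case $p = q$ considered in the proof.

```python
from itertools import combinations


def circulant_dependent_rows(q, T):
    """Shifts s in 1..q-1 whose row depends on row 0 when restricted to columns T.

    T is a collection of 0-based column indices of a q x q circulant matrix.
    """
    t = [1 if j in T else 0 for j in range(q)]
    dependent = []
    for s in range(1, q):
        shifted = t[-s:] + t[:-s]                     # right-shift of t by s positions
        # Hamming distance between the all-ones restriction of t to T
        # and the restriction of the shifted indicator to T.
        hamming = sum(1 for j in T if shifted[j] != 1)
        if hamming < len(T):
            dependent.append(s)
    return dependent


# Example: q = 8 columns, T = {0, 1}; only the shifts by +1 and -1 overlap T.
example = circulant_dependent_rows(8, (0, 1))

# Exhaustive check of the bound |T|(|T| - 1) for q = 7 and |T| = 3.
worst = max(len(circulant_dependent_rows(7, T)) for T in combinations(range(7), 3))
```

The bound follows because each dependent shift corresponds to an ordered pair of distinct elements of $T$, and there are $|T|(|T|-1)$ such pairs.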

The following theorem gives lower bounds for the probability that a circulant block matrix as in (4.1) satisfies the RIP of order $3m$. Note that the bounds obtained are the same as in Theorem 2.1 although the number of independent entries in $\Phi$ is greater than before. This is due to the nature of the proof, which uses the number of stochastically dependent rows of $\Phi_T$; this number is the same for both Toeplitz and circulant matrices.

###### Theorem 4.1.

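Let $\Phi$ be as in (4.1). Then there exist constants $c_1, c_2 > 0$ depending only on $\delta_{3m}$, such that:

1. If $l \le 3m(3m-1)$, then for any $n \ge c_1 l m \ln(N/m)$, $\Phi$ satisfies RIP of order $3m$ for every $\delta_{3m} \in (0, 1/3)$ with probability at least

   $$1 - e^{-c_2 n/l}.$$
2. If $l > 3m(3m-1)$, then for any $n \ge c_1 m^3 \ln(N/m)$, $\Phi$ satisfies RIP of order $3m$ for every $\delta_{3m} \in (0, 1/3)$ with probability at least

   $$1 - e^{-c_2 n/m^2}.$$
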
###### Proof.

A similar argument as the one in the proof of Lemma 3.1 shows that the upper bound for the maximum number of rows stochastically dependent on any row of a (truncated) circulant block matrix is the same as for the (truncated) Toeplitz block matrices (use Lemma 4.1). Then the proof of Theorem 2.1 directly applies to the setting at hand. ∎

### 4.2. Circulant-circulant Matrices

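We also consider matrices that are (truncated) circulant block matrices whose blocks are themselves circulant:

$$
\Phi = \begin{pmatrix}
\Phi_k & \Phi_{k-1} & \ldots & \Phi_2 & \Phi_1\\
\Phi_1 & \Phi_k & \ldots & \Phi_3 & \Phi_2\\
\vdots & \vdots & \ddots & \ddots & \vdots\\
\Phi_{l-1} & \Phi_{l-2} & \ldots & \ldots & \Phi_l
\end{pmatrix} \in \mathbb{R}^{n \times N}, \tag{4.4}
$$

$$
\Phi_i = \begin{pmatrix}
\varphi_p^i & \varphi_{p-1}^i & \ldots & \varphi_2^i & \varphi_1^i\\
\varphi_1^i & \varphi_p^i & \ldots & \varphi_3^i & \varphi_2^i\\
\vdots & \vdots & \ddots & \ddots & \vdots\\
\varphi_{q-1}^i & \varphi_{q-2}^i & \ldots & \ldots & \varphi_q^i
\end{pmatrix} \in \mathbb{R}^{q \times p}. \tag{4.5}
$$

Denote by $\tau$ the right-shift of blocks and by $\sigma$ the right-shift of elements inside a block $\Phi_i$, both by one position. These matrices arise in two-dimensional imaging applications where the independent elements are the coefficients of the point spread function of the imaging system. Replacing (4.3) in the proof of Lemma 4.1 by

$$
\bar{A} = \begin{pmatrix}
t\\
\sigma^{1}\tau^{0}(t)\\
\vdots\\
\sigma^{(i-1) \,(\mathrm{mod}\ p)}\, \tau^{\lfloor \frac{i-1}{p} \rfloor}(t)\\
\vdots\\
\sigma^{p-1}\tau^{l-1}(t)
\end{pmatrix} \in \mathbb{R}^{lq \times kp},
$$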

readily yields the upper bound $|T|(|T|-1)$ for the number of rows stochastically dependent on any one row of $\Phi_T$. Applying Lemma 3.3 and Theorem 4.1 shows that the probability for perfect reconstruction is no less than $1 - e^{-c_2 n/m^2}$. This says that imaging systems with IID random point spread functions can significantly reduce the number of acquired samples, while still being able to reconstruct the original sparse image if the above conditions hold.

### 4.3. Circulant-circulant Block Matrices

As a generalization of the matrices defined by (4.4) and (4.5), the following matrices are also considered:

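$$
\Phi = \begin{pmatrix}
\Phi_{k_1} & \Phi_{k_1-1} & \ldots & \Phi_2 & \Phi_1\\
\Phi_1 & \Phi_{k_1} & \ldots & \Phi_3 & \Phi_2\\
\vdots & \vdots & \ddots & \ddots & \vdots\\
\Phi_{l_1-1} & \Phi_{l_1-2} & \ldots & \ldots & \Phi_{l_1}
\end{pmatrix} \in \mathbb{R}^{n \times N}, \qquad \Phi_i \tag{4.6}
$$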

where the blocks are all IID matrices. These matrices arise in multichannel two-dimensional imaging applications where the number of rows in corresponds to the independent channels. We show next that these matrices are also good compressed sensing matrices.

###### Corollary 4.1.

Let be as in (4.6). Then there exist constants depending only on , such that:

1. If , then for any , satisfies RIP of order for every with probability at least

   $$1 - e^{-c_2 n/(l_1 l_2)}.$$
2. If , then for any , satisfies RIP of order for every with probability at least

   $$1 - e^{-c_2 n/m^2}.$$

This follows directly from Lemma 4.1 and Theorem 4.1.

### 4.4. Deterministic Construction

The CS matrices we have considered so far are based on randomized constructions. However, in certain applications, deterministic constructions are preferred. In [10] DeVore provided a deterministic construction of CS matrices using polynomials over finite fields. We will consider deterministic block matrices based on DeVore’s construction. Let us first recall the construction in [10].

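Consider the set $F \times F$, where $F$ denotes the field of integers modulo $p$, a prime. This set has $p^2$ elements. Define $P_r := \{f \in F[x] : \deg f \le r\}$, $0 < r < p$. This set has $p^{r+1}$ elements. For every $f \in P_r$, define the graph of $f$ by

$$G(f) := \{(x, f(x)) : x \in F\} \subset F \times F$$

and consider the column vector $v(f)$, indexed by the elements of $F \times F$ ordered lexicographically, given by

$$v(f) := \big(1_{(0,0) \in G(f)}, \ldots, 1_{(0,p-1) \in G(f)}, 1_{(1,0) \in G(f)}, \ldots, 1_{(p-1,p-1) \in G(f)}\big)^t,$$

where

$$1_{(a,b) \in G(f)} = \begin{cases} 1 & \text{if } (a,b) \in G(f)\\ 0 & \text{if } (a,b) \notin G(f). \end{cases}$$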

Construct the matrix , where the polynomials are ordered lexicographically with respect to their coefficients. It was shown in [10], that the matrix satisfies RIP for any with .
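The construction can be sketched in a few lines for small primes. The code below assumes, as in DeVore's construction, that the columns are indexed by all polynomials of degree at most $r$ over the integers modulo $p$; the function name, the parameter names `p` and `r`, and the coefficient ordering (constant term first) are illustrative assumptions. It checks the two properties the argument in this section relies on: every column contains exactly $p$ ones, and two distinct columns share at most $r$ ones.

```python
from itertools import combinations, product


def devore_columns(p, r):
    """Indicator vectors v(f) for all polynomials f of degree <= r over Z_p.

    Returns a list of columns, each of length p*p, indexed by the points
    (a, b) of Z_p x Z_p in lexicographic order. p is assumed to be prime.
    """
    points = [(a, b) for a in range(p) for b in range(p)]
    columns = []
    # Coefficient tuples (a_0, ..., a_r) in lexicographic order (assumed ordering).
    for coeffs in product(range(p), repeat=r + 1):
        graph = {(x, sum(c * x ** k for k, c in enumerate(coeffs)) % p)
                 for x in range(p)}
        columns.append([1 if pt in graph else 0 for pt in points])
    return columns


p, r = 5, 2
cols = devore_columns(p, r)
ones_per_column = {sum(col) for col in cols}
max_overlap = max(sum(a * b for a, b in zip(u, v))
                  for u, v in combinations(cols, 2))
```

The overlap bound is the statement that two distinct polynomials of degree at most $r$ over a field agree in at most $r$ points; dividing the matrix by $\sqrt{p}$ then normalizes every column to unit norm.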

Now consider

 (4.7) Ψ0=⎛⎜ ⎜ ⎜ ⎜ ⎜⎝ΨtΨt−1…Ψ2Ψ1Ψt+1Ψt…Ψ3Ψ2⋮⋮⋱⋱⋮Ψt+s−1Ψt+s−2……Ψs⎞⎟ ⎟ ⎟ ⎟ ⎟⎠∈Rsp2×tl,

where , and each block is constructed from the first vectors , as above.

###### Theorem 4.2.

The matrix satisfies RIP with for any .

###### Proof.

As before, we only have to consider the case where . Let such that , and let be the matrix obtained by retaining only those columns of corresponding to the elements in . Consider the matrix . Since every column of has exactly ones, the diagonal elements of are all one. An off diagonal element of has the form , where , and denotes the vector that represents some polynomial . Since the graphs of two different polynomials in have at most elements in common, for any . Therefore, the sum of all off diagonal elements in any row or column of is whenever . We can, therefore, write

$$G_T = I + B_T, \tag{4.8}$$

where and . Since , we have that and so the spectral norms of and are and , respectively. This shows that satisfies (1.4). ∎

## 5. Numerical Results

To validate that the probability of exact recovery for Toeplitz block CS matrices is high, the performance of Toeplitz block, IID, and Toeplitz CS matrices is compared empirically. In our simulation, a length $N = 2048$ signal with $m = 20$ randomly placed non-zero entries drawn independently from the Gaussian distribution was generated. Each such generated signal is sampled using IID, Toeplitz, and Toeplitz block matrices with entries drawn independently from the Bernoulli distribution and reconstructed using the log barrier solver from [11]. The experiment is declared a success if the signal is exactly recovered, i.e., the error is within the range of machine precision. The empirical probability of success is determined by repeating the reconstruction experiment 1000 times and calculating the fraction of successes. This empirical probability of success is plotted as a function of the number of measurement samples $n$ in Fig. 1. The simulation results show that in the vast majority of applications all Toeplitz block matrices perform similarly to IID matrices.

## Acknowledgment

The third author would like to acknowledge the support from IMA for his participation in the short course “Compressive Sampling and Frontiers in Signal Processing”.

## References

• [1] E. Candès, J. Romberg, and T. Tao, “Robust uncertainty principles: Exact signal reconstruction from highly incomplete frequency information”, IEEE Trans. Inf. Theory 52, no. 2, pp. 489-509, 2006.
• [2] D. Donoho, “Compressed Sensing”, IEEE Trans. Inf. Theory 52, no. 4, pp. 1289-1306, 2006.
• [3] E. Candès and T. Tao, “Decoding by linear programming”, IEEE Trans. Inf. Theory 51, no. 12, pp. 4203-4215, 2005.
• [4] A. Cohen, W. Dahmen, and R. DeVore, “Compressed sensing and best k-term approximation”, (2006), Preprint.
• [5] R. Baraniuk, M. Davenport, R. DeVore, and M. Wakin, “A Simple Proof of the Restricted Isometry Property for Random Matrices”, (2007), Preprint.
• [6] W. Bajwa, J. Haupt, G. Raz, S. Wright, and R. Nowak, “Toeplitz-Structured Compressed Sensing Matrices”, IEEE SSP Workshop, pp. 294-298, 2007.
• [7] D. Achlioptas, “Database-friendly Random Projections”, Proc. ACM SIGMOD-SIGACT-SIGART Symp. on Principles of Database Systems, pp. 274-281, 2001.
• [8] Y. Dodge, F. Marriott, Int. Statistical Institute, “The Oxford dictionary of statistical terms”, 6th ed., Oxford University Press, p. 47, 2003.
• [9] J.A. Tropp, “Random Filters for Compressive Sampling”, Proceedings of 40th Annual Conference on Information Sciences and Systems, pp. 216-217, 22-24 March 2006.
• [10] R. DeVore, “Deterministic Constructions of Compressed Sensing Matrices”, (2007), Preprint.
• [11] E. Candès, http://www.acm.caltech.edu/l1magic/.