ModifiedCS: Modifying Compressive Sensing for Problems with Partially Known Support

† N. Vaswani and W. Lu are with the ECE Dept. at Iowa State University (email: {namrata,luwei}@iastate.edu). A part of this work appeared in [1, 2]. This research was supported by NSF grants ECCS-0725849 and CCF-0917015. Copyright (c) 2010 IEEE. Personal use of this material is permitted. However, permission to use this material for any other purposes must be obtained from the IEEE by sending a request to pubs-permissions@ieee.org.
Abstract
We study the problem of reconstructing a sparse signal from a limited number of its linear projections when a part of its support is known, although the known part may contain some errors. The “known” part of the support, denoted T, may be available from prior knowledge. Alternatively, in a problem of recursively reconstructing time sequences of sparse spatial signals, one may use the support estimate from the previous time instant as the “known” part. The idea of our proposed solution (modifiedCS) is to solve a convex (ℓ₁) relaxation of the following problem: find the signal that satisfies the data constraint and is sparsest outside of T. We obtain sufficient conditions for exact reconstruction using modifiedCS. These are much weaker than those needed for compressive sensing (CS) when the sizes of the unknown part of the support and of the errors in the known part are small compared to the support size. An important extension called Regularized ModifiedCS (RegModCS) is developed which also uses prior signal estimate knowledge. Simulation comparisons for both sparse and compressible signals are shown.
I Introduction
In this work, we study the sparse reconstruction problem from noiseless measurements when a part of the support is known, although the known part may contain some errors. The “known” part of the support may be available from prior knowledge. For example, consider MR image reconstruction using the 2D discrete wavelet transform (DWT) as the sparsifying basis. If it is known that an image has no (or very little) black background, all (or most) approximation coefficients will be nonzero. In this case, the “known support” is the set of indices of the approximation coefficients. Alternatively, in a problem of recursively reconstructing time sequences of sparse spatial signals, one may use the support estimate from the previous time instant as the “known support”. This latter problem occurs in various practical applications such as real-time dynamic MRI reconstruction, real-time single-pixel camera video imaging or video compression/decompression. There are also numerous other potential applications where sparse reconstruction for time sequences of signals/images may be needed, e.g. see [3, 4].


Sparse reconstruction has been well studied for a while, e.g. see [5, 6]. Recent work on Compressed Sensing (CS) gives conditions for its exact reconstruction [7, 8, 9] and bounds the error when this is not possible [10, 11].
Our recent work on Least Squares CS-residual (LS-CS) [12, 13] can be interpreted as a solution to the problem of sparse reconstruction with partly known support. LS-CS replaces CS on the observation by CS on the least squares (LS) observation residual, computed using the “known” part of the support. Since the observation residual measures the signal residual, which has far fewer large nonzero components, LS-CS greatly improves the reconstruction error when fewer measurements are available. But the exact sparsity size (total number of nonzero components) of the signal residual is equal to or larger than that of the signal. Since the number of measurements required for exact reconstruction is governed by the exact sparsity size, LS-CS is not able to achieve exact reconstruction using fewer noiseless measurements than those needed by CS.
Exact reconstruction using fewer noiseless measurements than those needed for CS is the focus of the current work. Denote the “known” part of the support by T. Our proposed solution (modifiedCS) solves an ℓ₁ relaxation of the following problem: find the signal that satisfies the data constraint and is sparsest outside of T. We derive sufficient conditions for exact reconstruction using modifiedCS. When T is a fairly accurate estimate of the true support, these are much weaker than the sufficient conditions for CS. For a recursive time sequence reconstruction problem, this holds if the reconstruction at t = 0 is exact and the support changes slowly over time. The former can be ensured by using more measurements at t = 0, while the latter is often true in practice, e.g. see Fig. 1.
We also develop an important extension called Regularized ModifiedCS which also uses prior signal estimate knowledge. It improves the error when exact reconstruction is not possible.
A shorter version of this work first appeared in ISIT ’09 [1]. In parallel and independent work in [14], Khajehnejad et al have also studied a similar problem, but they assume a probabilistic prior on the support. Other related work includes [15]. Very recent work on causal reconstruction of time sequences includes [16] (which focuses on the time-invariant support case) and [17] (which uses past estimates only to speed up the current optimization, not to improve the reconstruction error). Except [14], none of these prove exact reconstruction using fewer measurements and except [15, 14], none of these even demonstrate it.
Other recent work, e.g. [18], applies CS on observation differences to reconstruct the difference signal. While their goal is only to estimate the difference signal, the approach could easily be modified to also reconstruct the actual signal sequence (we refer to this as CS-diff). But all nonzero coefficients of a sparse signal in any sparsity basis will typically change over time, though gradually, and some new elements will become nonzero. Thus the exact sparsity size of the signal difference will also be equal to or larger than that of the signal itself. As a result, CS-diff will also not achieve exact reconstruction using fewer measurements, e.g. see Fig. 3.
In this work, whenever we use the term CS, we are actually referring to basis pursuit (BP) [5]. As pointed out by an anonymous reviewer, modifiedCS is a misnomer and a more appropriate name for our approach should be modifiedBP.
As pointed out by an anonymous reviewer, modifiedCS can be used in conjunction with multiscale CS for video compression [19] to improve their compression ratios.
The paper is organized as follows. We give the notation and problem definition below. ModifiedCS is developed in Sec. II. We obtain sufficient conditions for exact reconstruction using it in Sec. III. In Sec. IV, we compare these with the corresponding conditions for CS and we also do a Monte Carlo comparison of modifiedCS and CS. We discuss Dynamic ModifiedCS and Regularized ModifiedCS in Sec. V. Comparisons for actual images and image sequences are given in Sec. VI, and conclusions and future work in Sec. VII.
I-A Notation
We use ′ for transpose. The notation ‖β‖_p denotes the ℓ_p norm of the vector β. The ℓ₀ pseudo-norm, ‖β‖₀, counts the number of nonzero elements in β. For a matrix, M, ‖M‖_p denotes its induced p-norm, i.e. ‖M‖_p := max_{β ≠ 0} ‖Mβ‖_p / ‖β‖_p.
We use the notation A_T to denote the sub-matrix containing the columns of A with indices belonging to T. For a vector, the notation (β)_T (or β_T) refers to a sub-vector that contains the elements with indices in T. We use T^c to denote the complement of the set T w.r.t. [1, m] := {1, 2, …, m}, i.e. T^c := [1, m] ∖ T. The set operations ∪, ∩ stand for set union and intersection respectively. Also, ∖ denotes set difference. For a set T, |T| denotes its size (cardinality). But for a scalar, β, |β| denotes the magnitude of β.
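As a concrete aid, the sub-matrix/sub-vector notation maps directly onto NumPy fancy indexing. The matrix, vector and index set below are made-up examples, not from the paper:

```python
import numpy as np

A = np.arange(12.0).reshape(3, 4)       # a toy 3x4 matrix
beta = np.array([1.0, 0.0, -2.0, 0.0])  # a toy vector
T = [0, 2]                              # an example index set T

A_T = A[:, T]                           # columns of A with indices in T
beta_T = beta[T]                        # elements of beta with indices in T
T_c = np.setdiff1d(np.arange(4), T)     # complement of T w.r.t. {0,...,3}

print(A_T.shape)   # (3, 2)
print(beta_T)      # [ 1. -2.]
print(T_c)         # [1 3]
```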
The restricted isometry constant [9], δ_S, for a matrix, A, is defined as the smallest real number satisfying

(1 − δ_S) ‖c‖² ≤ ‖A_T c‖² ≤ (1 + δ_S) ‖c‖²   (1)

for all subsets T of cardinality |T| ≤ S and all real vectors c of length |T|. The restricted orthogonality constant [9], θ_{S₁,S₂}, is defined as the smallest real number satisfying

|c₁′ A_{T₁}′ A_{T₂} c₂| ≤ θ_{S₁,S₂} ‖c₁‖ ‖c₂‖   (2)

for all disjoint sets T₁, T₂ with |T₁| ≤ S₁, |T₂| ≤ S₂ and S₁ + S₂ ≤ m, and for all vectors c₁, c₂ of length |T₁|, |T₂|, respectively. From (1) and (2), one can show that [9]

θ_{S₁,S₂} ≤ δ_{S₁+S₂}   (3)
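For intuition, the definition in (1) can be checked numerically by brute force for tiny matrices: δ_S is the largest deviation from 1 of the eigenvalues of A_T′A_T over all size-S column subsets T. A sketch of our own (exponential in m, so only for toy sizes):

```python
import numpy as np
from itertools import combinations

def restricted_isometry_constant(A, S):
    """Brute-force delta_S: the smallest delta such that
    (1 - delta)||c||^2 <= ||A_T c||^2 <= (1 + delta)||c||^2
    for every column subset T with |T| <= S (feasible only for tiny m)."""
    m = A.shape[1]
    delta = 0.0
    for T in combinations(range(m), S):
        # eigenvalues of A_T' A_T are the extreme values of ||A_T c||^2 / ||c||^2
        eig = np.linalg.eigvalsh(A[:, T].T @ A[:, T])
        delta = max(delta, abs(1.0 - eig[0]), abs(eig[-1] - 1.0))
    return delta

rng = np.random.default_rng(0)
A = rng.standard_normal((8, 12))
A /= np.linalg.norm(A, axis=0)             # unit-norm columns, so delta_1 is ~0
print(restricted_isometry_constant(A, 1))  # ~0 up to rounding
print(restricted_isometry_constant(A, 2))  # delta_S is nondecreasing in S
```

By Cauchy interlacing, checking subsets of size exactly S suffices, since eigenvalues of smaller principal submatrices lie inside those of larger ones.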
The notation ν ∼ N(μ, Σ) means that ν is Gaussian distributed with mean μ and covariance Σ, while N(x; μ, Σ) denotes the value of the Gaussian PDF with mean μ and covariance Σ computed at the point x.
I-B Problem Definition
We measure an n-length vector y where

y := A x   (4)
We need to estimate x, which is a sparse m-length vector with m > n. The support of x, denoted N, can be split as N = T ∪ Δ ∖ Δ_e, where T is the “known” part of the support, Δ_e := T ∖ N is the error in the known part and Δ := N ∖ T is the unknown part. Thus, T ∖ Δ_e and Δ are disjoint and N = (T ∖ Δ_e) ∪ Δ.
We use s := |N| to denote the size of the (s)upport, k := |T| to denote the size of the (k)nown part of the support, e := |Δ_e| to denote the size of the (e)rror in the known part and u := |Δ| to denote the size of the (u)nknown part of the support. Notice that s = k − e + u.
We assume that A satisfies the restricted isometry property (RIP) [9] for the relevant set sizes. RIP for a size S means that δ_S < 1, where δ_S is the RIP constant for A defined in (1).
In a static problem, T is available from prior knowledge. For example, in the MRI problem described in the introduction, let N be the (unknown) set of all DWT coefficients with magnitude above a certain zeroing threshold. Assume that the smaller coefficients are set to zero. Prior knowledge tells us that most image intensities are nonzero and so the approximation coefficients are mostly nonzero. Thus we can let T be the (known) set of indices of all the approximation coefficients. The (unknown) set of indices of the approximation coefficients which are zero forms Δ_e. The (unknown) set of indices of the nonzero detail coefficients forms Δ.
For the time series problem, x = x_t with support N = N_t, and T = N̂_{t−1} is the support estimate from the previous time instant. If exact reconstruction occurs at t − 1, T = N_{t−1}. In this case, Δ_e = N_{t−1} ∖ N_t is the set of indices of elements that were nonzero at t − 1, but are now zero (deletions), while Δ = N_t ∖ N_{t−1} is the set of newly added coefficients at t (additions). Slow sparsity pattern change over time, e.g. see Fig. 1, then implies that |Δ_e| and |Δ| are much smaller than |N_t|.
When exact reconstruction does not occur, Δ_e includes both the current deletions and the extras from N̂_{t−1}, i.e. the elements of N̂_{t−1} ∖ N_{t−1}. Similarly, Δ includes both the current additions and the misses from N̂_{t−1}, i.e. the elements of N_{t−1} ∖ N̂_{t−1}. In this case, slow support change, along with N̂_{t−1} ≈ N_{t−1}, still implies that e ≪ s and u ≪ s.
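To make the set bookkeeping concrete, here is a toy illustration (with made-up supports) of how T, Δ_e and Δ arise from two consecutive supports:

```python
# Made-up supports at t-1 and t, illustrating the roles of T, Delta_e, Delta.
N_prev = {2, 5, 7, 9}        # support estimate from t-1, used as the known part T
N_curr = {2, 5, 9, 11, 12}   # true support at time t

T = N_prev
Delta_e = T - N_curr         # deletions: in T but no longer in the support -> {7}
Delta = N_curr - T           # additions: newly nonzero, the unknown part -> {11, 12}

assert N_curr == (T - Delta_e) | Delta   # N = (T \ Delta_e) u Delta
s, k, e, u = len(N_curr), len(T), len(Delta_e), len(Delta)
print(s, k, e, u)            # 5 4 1 2; note s = k - e + u
```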
II Modified Compressive Sensing (modifiedCS)
Our goal is to find a signal that satisfies the data constraint given in (4) and whose support contains the smallest number of new additions to , although it may or may not contain all elements of . In other words, we would like to solve
min_β ‖(β)_{T^c}‖₀  subject to  y = A β   (5)
If Δ_e is empty, i.e. if T ⊆ N, then the solution of (5) is also the sparsest solution whose support contains T.
As is well known, minimizing the ℓ₀ norm is a combinatorial optimization problem [20]. We propose to use the same trick that resulted in CS [5, 7, 8, 10]. We replace the ℓ₀ norm by the ℓ₁ norm, which is the closest norm to ℓ₀ that makes the optimization problem convex, i.e. we solve
min_β ‖(β)_{T^c}‖₁  subject to  y = A β   (6)
Denote its output by x̂. If needed, the support can be estimated as

N̂ := {i : |x̂_i| > α}   (7)

where α ≥ 0 is a zeroing threshold. If exact reconstruction occurs, α can be zero. We discuss threshold setting for cases where exact reconstruction does not occur in Sec. V-A.
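Problem (6) is a linear program after the standard splitting |β_i| ≤ t_i for i in T^c. The sketch below (our own, not the authors' code) solves it with SciPy's linprog; the matrix, signal and known set T are made-up, with one deliberate error in T:

```python
import numpy as np
from scipy.optimize import linprog

def modified_cs(A, y, T):
    """l1 relaxation (6): minimize ||beta_{T^c}||_1 subject to y = A beta.
    Cast as an LP in (beta, t) via |beta_i| <= t_i for i in T^c.
    A sketch of the idea, not the authors' implementation."""
    n, m = A.shape
    Tc = np.setdiff1d(np.arange(m), T)
    p = len(Tc)
    c = np.concatenate([np.zeros(m), np.ones(p)])    # minimize sum of t
    A_ub = np.zeros((2 * p, m + p))                  # +-beta_i - t_j <= 0
    for j, i in enumerate(Tc):
        A_ub[2 * j, i], A_ub[2 * j, m + j] = 1.0, -1.0
        A_ub[2 * j + 1, i], A_ub[2 * j + 1, m + j] = -1.0, -1.0
    A_eq = np.hstack([A, np.zeros((n, p))])          # data constraint y = A beta
    bounds = [(None, None)] * m + [(0, None)] * p
    res = linprog(c, A_ub=A_ub, b_ub=np.zeros(2 * p),
                  A_eq=A_eq, b_eq=y, bounds=bounds)
    return res.x[:m], res

rng = np.random.default_rng(1)
n, m = 12, 24
A = rng.standard_normal((n, m)) / np.sqrt(n)
x = np.zeros(m)
x[[3, 8, 15, 20]] = [1.5, -2.0, 0.7, 1.0]   # true support N = {3, 8, 15, 20}
T = [3, 8, 15, 5]                           # known part with one error (index 5)
beta_hat, res = modified_cs(A, A @ x, T)
N_hat = np.flatnonzero(np.abs(beta_hat) > 1e-4)   # support estimate, cf. (7)
print(res.status, np.max(np.abs(beta_hat - x)))
```

Here u = 1 (index 20 is the unknown part) and e = 1 (index 5 is the error in T); with enough measurements the ℓ₁ solution typically coincides with x.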
III Exact Reconstruction Result
We first analyze the ℓ₀ version of modifiedCS in Sec. III-A. We then give the exact reconstruction result for the actual ℓ₁ problem in Sec. III-B. In Sec. III-C, we give the two key lemmas that lead to its proof and we explain how they lead to the proof. The complete proof is given in the Appendix. The proof of the lemmas is given in Sec. III-D.
Recall that s := |N|, k := |T|, e := |Δ_e| and u := |Δ|.
III-A Exact Reconstruction Result: ℓ₀ version of modifiedCS
Consider the ℓ₀ problem, (5). Using a rank argument similar to [9, Lemma 1.2] we can show the following. The proof is given in the Appendix.
Proposition 1: Given a sparse vector, x, whose support is N = T ∪ Δ ∖ Δ_e. Then x is the unique minimizer of (5) if δ_{k+2u} < 1.
Using k = s − u + e, this is equivalent to δ_{s+u+e} < 1. Compare this with [9, Lemma 1.2] for the ℓ₀ version of CS. It requires δ_{2s} < 1, which is much stronger when u ≪ s and e ≪ s, as is true for time series problems.
III-B Exact Reconstruction Result: modifiedCS
Of course we do not solve (5) but its ℓ₁ relaxation, (6). Just like in CS, the sufficient conditions for this to give exact reconstruction will be slightly stronger. In the next few subsections, we prove the following result.
Theorem 1 (Exact Reconstruction)
To understand the second condition better and relate it to the corresponding CS result, let us simplify it. Simplifying further, a sufficient condition for is . Further, a sufficient condition for this is .
To get a condition only in terms of δ’s, use the fact that θ_{S₁,S₂} ≤ δ_{S₁+S₂} [9]. A sufficient condition is . Further, notice that if and if , then .
Corollary 1 (Exact Reconstruction)
Given a sparse vector, x, whose support is N = T ∪ Δ ∖ Δ_e, where T and Δ are disjoint and Δ_e ⊆ T. Consider reconstructing it from y := A x by solving (6).

x is the unique minimizer of (6) if δ_{k+u} < 1 and
(9) 
This, in turn, holds if

This, in turn, holds if and
These conditions can be rewritten by substituting k = s − u + e.
As shown in Fig. 1, usually , and (which means that ). Consider the case when the number of measurements, , is smaller than what is needed for exact reconstruction for a given support size, , but is large enough to ensure that . Under these assumptions, compare (9) with (10). Notice that (a) the first bracket of the left hand side (LHS) of (9) will be small compared to the LHS of (10). The same will hold for the second and third terms of its second bracket compared with the second and third terms of (10). The first term of its second bracket, , will be smaller than the first term of (10), . Thus, for a certain range of values of , the LHS of (9) will be smaller than that of (10) and it may happen that (9) holds, but (10) does not hold. For example, if , (10) will not hold, but if , (9) can hold if are small enough. A detailed comparison is done in Sec. IV.
III-C Proof of Theorem 1: Main Lemmas and Proof Outline
The idea of the proof is motivated by that of [9, Theorem 1.3]. Suppose that we want to minimize a convex function f(β) subject to Aβ = y, and that f is differentiable. The Lagrange multiplier optimality condition requires that there exist a Lagrange multiplier, w, s.t. ∇f(β) − A′w = 0. Thus for β = x to be a solution we need A_j′ w = (∇f(x))_j for all j. In our case, f(β) = ‖(β)_{T^c}‖₁. Thus (∇f(β))_j = 0 for j ∈ T and (∇f(β))_j = sgn(β_j) for j ∈ Δ. For j ∉ T ∪ Δ, x_j = 0. Since |β_j| is not differentiable at 0, we require that A_j′ w lie in the subgradient set of |β_j| at 0, which is the set [−1, 1] [21]. In summary, we need a w that satisfies

A_j′ w = 0 ∀ j ∈ T,  A_j′ w = sgn(x_j) ∀ j ∈ Δ,  |A_j′ w| ≤ 1 ∀ j ∉ T ∪ Δ   (11)
Lemma 1 below shows that by using (11), but with |A_j′ w| ≤ 1 replaced by the strict inequality |A_j′ w| < 1 for all j ∉ T ∪ Δ, we get a set of sufficient conditions for x to be the unique solution of (6).
Lemma 1
The proof is given in the next subsection.
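One can probe a condition of the form (11) numerically: build the minimum-norm w satisfying the equality constraints and inspect |A_j′ w| elsewhere. This is only an illustration with made-up sets and signs; failure of this particular candidate would not disprove the existence of a valid w (Lemma 2 constructs a better one):

```python
import numpy as np

# Minimum-norm w with A_j' w = 0 for j in T and A_j' w = sgn(x_j) for j in Delta.
# T, Delta and the sign pattern are made-up examples.
rng = np.random.default_rng(2)
n, m = 30, 60
A = rng.standard_normal((n, m)) / np.sqrt(n)
T, Delta = [1, 4, 9], [12]
sgn = np.array([1.0])                      # assumed sign of x on Delta

M = np.vstack([A[:, T].T, A[:, Delta].T])  # stacked equality constraints, (k+u) x n
b = np.concatenate([np.zeros(len(T)), sgn])
w = np.linalg.pinv(M) @ b                  # minimum-norm solution of M w = b

off = np.setdiff1d(np.arange(m), T + Delta)
print(np.max(np.abs(A[:, T].T @ w)))       # ~0: first condition
print(A[:, Delta].T @ w)                   # ~[1.]: second condition
print(np.max(np.abs(A[:, off].T @ w)))     # should be < 1 when recovery succeeds
```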
Next we give Lemma 2 which constructs a which satisfies and for any set disjoint with of size and for any given vector of size . It also bounds for all where is called an “exceptional set”. We prove Theorem 1 by applying Lemma 2 iteratively to construct a that satisfies the conditions of Lemma 1 under the assumptions of Theorem 1.
Lemma 2
Given the known part of the support, , of size . Let , be such that and . Let be a vector supported on a set , that is disjoint with , of size . Then there exists a vector and an exceptional set, , disjoint with , s.t.
(12)  
(13) 
where is defined in (8) and
(14) 
The proof is given in the next subsection.
Proof Outline of Theorem 1. To prove Theorem 1, apply Lemma 2 iteratively, in a fashion similar to that of the proof of [9, Lemma 2.2] (this proof had some important typos). The main idea is as follows. At iteration zero, apply Lemma 2 with (so that ), , and , to get a and an exceptional set , of size less than , that satisfy the above conditions. At iteration , apply Lemma 2 with (so that ), , and to get a and an exceptional set , of size less than . Lemma 2 is applicable in the above fashion because condition 1 of Theorem 1 holds. Define . We then argue that if condition 2 of Theorem 1 holds, satisfies the conditions of Lemma 1. Applying Lemma 1, the result follows. We give the entire proof in the Appendix.
III-D Proofs of Lemmas 1 and 2
We prove the lemmas from the previous subsection here. Recall that and .
III-D1 Proof of Lemma 1
The proof is motivated by [9, Section II-A]. There is clearly at least one element in the feasible set of (6), x itself, and hence there will be at least one minimizer of (6). Let β be a minimizer of (6). We need to prove that if the conditions of the lemma hold, it is equal to x. For any minimizer, β,
‖(β)_{T^c}‖₁ ≤ ‖(x)_{T^c}‖₁   (15)
Recall that x is zero outside of T ∪ Δ, that T and Δ are disjoint, and that x is always nonzero on the set Δ. Take a w that satisfies the three conditions of the lemma. Then,
(16)  
III-D2 Proof of Lemma 2
The proof of this lemma is significantly different from that of the corresponding lemma in [9], even though the form of the final result is similar.
Any w that satisfies A_j′ w = 0 for all j ∈ T will be of the form
(17) 
We need to find a s.t. , i.e. . Let . Then . This follows because since is a projection matrix. Thus,
(18) 
Consider any set with disjoint with . Then
(19) 
Consider the first term from the right hand side (RHS) of (19).
(20)  
Consider the second term from the RHS of (19). Since is nonnegative definite,
(21) 
Now, which is the difference of two symmetric nonnegative definite matrices. Let denote the first matrix and the second one. Use the fact that where denote the minimum, maximum eigenvalue. Since and , thus
(22) 
as long as the denominator is positive. It is positive because we have assumed that . Using (20) and (22) to bound (19), we get that for any set with ,
(23) 
where is defined in (8). Notice that is nondecreasing in , , . Define an exceptional set, , as
(24) 
Notice that must obey since otherwise we can contradict (23) by taking .
Since and is disjoint with , (23) holds for , i.e. . Also, by definition of , , for all . Finally,
(25)  
since (holds because is a projection matrix). Thus, all equations of (13) hold. Using (18), (12) holds.
IV Comparison of CS and ModifiedCS
In Theorem 1 and Corollary 1, we derived sufficient conditions for exact reconstruction using modifiedCS. In Sec. IV-A, we compare the sufficient conditions for modifiedCS with those for CS. In Sec. IV-B, we use Monte Carlo simulation to compare the probabilities of exact reconstruction for both methods.
IV-A Comparing sufficient conditions
We compare the sufficient conditions for modifiedCS and for CS, expressed only in terms of δ’s. Sufficient conditions for an algorithm serve as a designer’s tool to decide the number of measurements needed for it, and in that sense comparing the two sets of sufficient conditions is meaningful.
For modifiedCS, from Corollary 1, the sufficient condition in terms of only δ’s is . Using k = s − u + e, this becomes
(26) 
For CS, two of the best (weakest) sufficient conditions that use only δ’s are given in [22, 23] and [11]. Between these two, it is not obvious which one is weaker. Using [22] and [11], CS achieves exact reconstruction if either
(27) 
To compare (26) and (27), we use which is typical for time series applications (see Fig. 1). One way to compare them is to use [24, Corollary 3.4] to get the LHS’s of both in terms of a scalar multiple of . Thus, (26) holds if . Since , the second condition implies the first, and so only is sufficient. On the other hand, (27) holds if which is clearly stronger.
Alternatively, we can compare (26) and (27) using the high probability upper bounds on the δ’s as in [9]. Using [9, Eq 3.22], for an n × m random Gaussian matrix, with high probability (w.h.p.), , where
and H(q) := −q log₂ q − (1 − q) log₂ (1 − q) denotes the binary entropy function for q ∈ (0, 1). Thus, w.h.p., modifiedCS achieves exact reconstruction from random Gaussian measurements if
(28) 
Similarly, from (27), w.h.p., CS achieves exact reconstruction from random Gaussian measurements if either
(29) 
In Fig. 2, we plot , and against for three different choices of . For , we use (from Fig. 1). As can be seen, the maximum allowed sparsity, i.e. the maximum allowed value of , for which either or is smaller than that for which . Thus, for a given number of measurements, , w.h.p., modifiedCS will give exact reconstruction from random Gaussian measurements for larger sparsity sizes than CS would. As also noted in [9], in all cases, the maximum allowed sparsity is much smaller than what is observed in simulations, because of the looseness of the bounds. For the same reason, the difference between CS and modifiedCS is also not as significant.
IV-B Comparison using Monte Carlo
So far we only compared sufficient conditions. The actual allowed sparsity for CS may be much larger. To actually compare the exact reconstruction ability of modifiedCS with that of CS, we thus need Monte Carlo simulation. We use the following procedure to obtain a Monte Carlo estimate of the probability of exact reconstruction using CS and modifiedCS for a given choice of the sizes (i.e. we average over the joint distribution of the support sets and the signal values given these sizes).

Fix the signal length, m, and its support size, s. Select u, e and n.

Generate the random Gaussian matrix, A (generate an n × m matrix with independent identically distributed (i.i.d.) zero mean Gaussian entries and normalize each column to unit norm).¹

¹As pointed out by an anonymous reviewer, we actually do not need to normalize each column to unit norm. As proved in [25], a matrix with i.i.d. zero mean Gaussian entries with variance 1/n will itself satisfy the RIP. If the variance is not 1/n, there will just be a scaling factor in the RIP. This does not affect reconstruction performance in any way.

Repeat the following times

Generate the support, N, of size s, uniformly at random from [1, m].

Generate the nonzero entries, (x)_N. Set (x)_{N^c} = 0.

Set y := A x.

Generate Δ of size u, uniformly at random from the elements of N.

Generate Δ_e of size e, uniformly at random from the elements of N^c.

Let T := (N ∖ Δ) ∪ Δ_e be the “known” support.
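The whole procedure can be sketched compactly, assuming SciPy's linprog as the ℓ₁ solver; the sizes and trial count below are illustrative, not the ones used in the paper:

```python
import numpy as np
from scipy.optimize import linprog

def l1_min(A, y, T=()):
    """min ||beta_{T^c}||_1 s.t. y = A beta; empty T gives plain CS (BP)."""
    n, m = A.shape
    Tc = np.setdiff1d(np.arange(m), T)
    p = len(Tc)
    c = np.concatenate([np.zeros(m), np.ones(p)])
    A_ub = np.zeros((2 * p, m + p))        # |beta_i| <= t_j for i = Tc[j]
    for j, i in enumerate(Tc):
        A_ub[2 * j, i], A_ub[2 * j, m + j] = 1.0, -1.0
        A_ub[2 * j + 1, i], A_ub[2 * j + 1, m + j] = -1.0, -1.0
    res = linprog(c, A_ub=A_ub, b_ub=np.zeros(2 * p),
                  A_eq=np.hstack([A, np.zeros((n, p))]), b_eq=y,
                  bounds=[(None, None)] * m + [(0, None)] * p)
    return res.x[:m]

rng = np.random.default_rng(3)
m, n, s, u, e, trials = 30, 15, 5, 1, 1, 20    # illustrative sizes only
cs_ok = mod_ok = 0
for _ in range(trials):
    A = rng.standard_normal((n, m))
    A /= np.linalg.norm(A, axis=0)                       # unit-norm columns
    N = rng.choice(m, size=s, replace=False)             # support, uniform
    x = np.zeros(m)
    x[N] = rng.standard_normal(s)                        # nonzero values on N
    Delta = rng.choice(N, size=u, replace=False)         # unknown part of N
    Delta_e = rng.choice(np.setdiff1d(np.arange(m), N),  # errors, from N^c
                         size=e, replace=False)
    T = np.union1d(np.setdiff1d(N, Delta), Delta_e)      # T = (N \ Delta) u Delta_e
    y = A @ x
    cs_ok += np.max(np.abs(l1_min(A, y) - x)) < 1e-5     # CS (BP) exact?
    mod_ok += np.max(np.abs(l1_min(A, y, T) - x)) < 1e-5 # modifiedCS exact?
print(cs_ok / trials, mod_ok / trials)   # empirical exact-reconstruction rates
```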
