Subspace Methods for Joint Sparse Recovery
Abstract
We propose robust and efficient algorithms for the joint sparse recovery problem in compressed sensing, which simultaneously recover the supports of jointly sparse signals from their multiple measurement vectors obtained through a common sensing matrix. In a favorable situation, the unknown matrix, which consists of the jointly sparse signals, has linearly independent nonzero rows. In this case, the MUSIC (MUltiple SIgnal Classification) algorithm, originally proposed by Schmidt for the direction of arrival problem in sensor array processing and later proposed and analyzed for joint sparse recovery by Feng and Bresler, provides a guarantee with the minimum number of measurements. We focus instead on the unfavorable but practically significant case of rank defect or ill-conditioning. This situation arises with a limited number of measurement vectors, or with highly correlated signal components. In this case MUSIC fails, and in practice none of the existing methods can consistently approach the fundamental limit. We propose subspace-augmented MUSIC (SA-MUSIC), which improves on MUSIC so that the support is reliably recovered under such unfavorable conditions. Combined with the subspace-based greedy algorithms also proposed and analyzed in this paper, SA-MUSIC provides a computationally efficient algorithm with a performance guarantee. The performance guarantees are given in terms of a version of the restricted isometry property. In particular, we also present a non-asymptotic perturbation analysis of the signal subspace estimation step, which has been missing in previous studies of MUSIC.
I Introduction
The problem of computing a sparse approximate solution to a linear system has been studied as the subset selection problem in matrix computations [2] with applications in statistical regression [3] and signal processing [4], [5]. The matrix representing the linear system and its columns are called a dictionary and atoms, respectively. The sparse recovery problem addresses the identification of the support, which denotes the indices of the atoms that contribute to the sparse solution, or equivalently, the indices of the nonzero rows of the unknown matrix. Once the support is determined, the recovery of the sparse signals reduces to standard overdetermined linear inverse problems.
The study of sparse solutions to underdetermined systems dates back to the 70's [6], [7]. Relevant theories and algorithms were further developed in the 80's [8], [9] and the 90's [10], [11], [12]. Recently, this subject gained popularity in the signal processing community under the name of compressed sensing [13]. In particular, the elegant analysis derived with modern probability theory [13], [14] provided performance guarantees for polynomial-time algorithms in terms of properties of random dictionaries. These are arguably among the most important contributions of recent years.
In some applications, there are multiple measurement vectors (right-hand sides of the linear system of equations), each corresponding to a different unknown vector, with the special property that all unknown vectors share a common support. The sparse recovery problem with this joint structure in the sparsity pattern is called joint sparse recovery or the multiple measurement vector (MMV) problem; it is often an easier problem, admitting better performance than the single-vector case.
In the mid-1990's, Bresler and Feng introduced "spectrum-blind sampling" [4], [15], [16]. Their scheme enables sub-Nyquist minimum-rate sampling and perfect reconstruction of multiband signals (analog or discrete, in one or more dimensions) with unknown but sparse spectral support. They reduced the spectrum-blind reconstruction problem to a finite-dimensional joint sparse recovery problem. Mishali and Eldar elaborated the spectrum-blind sampling approach in [17] (cf. [18] for a more detailed discussion of the relationship of [17] to the earlier work from the 1990's [4], [15], [16]), and they also proposed the "modulated wideband converter" in [19], which improves on spectrum-blind sampling by adding robustness against jitter. The reconstruction in the modulated wideband converter too is reduced to a finite-dimensional joint sparse recovery problem. Rao et al. (cf. [11] and the references therein) introduced a joint sparse recovery formulation and methods for the recovery of sparse brain excitations. Obozinski et al. [20] formulated variable selection in multivariate regression as a joint sparse recovery problem. The design matrix in regression corresponds to the linear system matrix of the joint sparse recovery, and the indicator function of the variables that contribute most to the given data is assumed to be sparse. Malioutov et al. posed the direction of arrival (DOA) estimation problem as a joint sparse recovery problem [5]. For the typically small number of sources in this problem, the indicator function of the quantized angles is modeled to be sparse.
Algorithms that exploit the structure in the sparsity pattern have been developed for the joint sparse recovery problem. Bresler and Feng proposed to use a version of the MUSIC algorithm [21] from sensor array processing for the full row rank case, where the nonzero rows of the unknown matrix have full row rank [4], [15], [16]. They also proposed methods based on a greedy search inspired by the alternating projections algorithm [22] in DOA estimation, later dubbed orthogonal least squares (OLS) [23]. Existing solutions to the sparse recovery problem for the single measurement vector (SMV) case have been extended to the MMV case. Greedy algorithms [24], [25], [26] extend orthogonal matching pursuit (OMP) [27], and convex optimization formulations with the mixed norm [5], [28], [29], [30] extend the corresponding SMV solutions such as basis pursuit (BP) [31] and LASSO [32]. Sparse Bayesian learning (SBL) [33] has also been extended to the MMV case [34], [35].
Owing to the similarity between the joint sparse recovery problem and DOA estimation, theories developed for DOA estimation affected those for joint sparse recovery. For example, the fundamental limit on the performance of DOA estimation [36] also applies to joint sparse recovery. Wax and Ziskind [36] derived the condition for unique identification of the DOAs in sensor array processing, which was applied to joint sparse recovery by Feng and Bresler [4] to determine the condition for unique identification of the support. The condition has been further studied in more general settings [24], [29]. MUSIC applied to joint sparse recovery [4] was the first method guaranteed under the tightest sufficient condition, which coincides with the necessary condition for support identification by any method. However, the guarantee only applies when the nonzero rows of the unknown matrix have full row rank and there is no noise in the measurement vectors.
Performance guarantees of greedy algorithms and of convex optimization formulations for joint sparse recovery have also been studied extensively in the literature [37], [25], [29], [28], [26], [38], [39], [40]. The guarantees of such methods have not been proved to be strictly better than those for the SMV case. Moreover, unlike the guarantee of MUSIC, such methods are not guaranteed under the minimal requirement for the full row rank case in the absence of noise.
Performance guarantees aside, the empirical performance and computational cost of any method are of key importance and usually determine its adoption in practice. Empirically, the optimization schemes with diversity measures (e.g., the mixed norm) perform better than greedy algorithms. However, the rate of exact support recovery of existing algorithms does not improve with increasing rank of the unknown matrix beyond a certain level. Furthermore, under unfavorable settings such as rank defect or ill-conditioning, existing algorithms for joint sparse recovery, while not failing outright, fall far short of the guarantee of MUSIC for the full row rank case.
While the optimization schemes with diversity measures perform better empirically than the greedy algorithms, this improved performance comes at a much higher computational cost. Convex optimization formulations [5], [28], [29], [30] are usually cast as second-order cone programs (SOCP), which are more difficult to solve than their analogues in the SMV case, which are cast as linear programs (LP) or quadratic programs (QP). In contrast, greedy algorithms and MUSIC are computationally efficient. In summary, none of the listed methods enjoys both good empirical performance and computational speed at the same time.
In view of the various drawbacks of the existing algorithms for joint sparse recovery, MUSIC, when it works, is extremely attractive. In a favorable setting, where the matrix composed of the nonzero rows of the unknown signal matrix has full row rank, MUSIC is guaranteed to recover the support and hence provides a guarantee with minimal requirement. Moreover, MUSIC is highly efficient computationally. However, the full row rank condition is often violated in practice. For example, if the number $N$ of measurement vectors is smaller than the sparsity level $s$, then no more than $N$ rows can be linearly independent, and the nonzero rows do not have full row rank. In other applications, such as spectrum-blind sampling or the DOA problem, $N$ is large or even infinite. Even in this case though, the rank might be smaller than $s$, or the submatrix of nonzero rows can be ill-conditioned. For example, in the DOA problem, correlation between sources or multipath propagation can cause a large condition number. It is well-known that MUSIC fails in this practically important "rank-defective" case, and this has motivated numerous attempts to overcome the problem without resorting to infeasible multidimensional search. However, all these previous methods use special structure of the linear system – such as shift invariance, which enables the application of so-called spatial smoothing [41]. Previous extensions of MUSIC are therefore not applicable to the general joint sparse recovery problem.
The main contributions of this paper are summarized as follows. First, we propose a new class of algorithms, subspace-augmented MUSIC (SA-MUSIC), that overcome the limitations of existing algorithms and provide the best of both worlds: good empirical performance at all rank conditions, and efficient computation. In particular, SA-MUSIC algorithms improve on MUSIC so that the support is recovered even when the unknown matrix suffers from rank defect and/or ill-conditioning. Compared to MUSIC [4], in the presence of a rank defect, SA-MUSIC has the additional steps of partial support recovery and subspace augmentation. Combined with partial support recovery by the subspace-based greedy algorithms also introduced in this paper, SA-MUSIC provides a computationally efficient solution to joint sparse recovery with a performance guarantee. In fact, the computational requirements of SA-MUSIC algorithms are similar to those of greedy algorithms and of MUSIC. Second, we derive explicit conditions that guarantee each step of SA-MUSIC in the noisy and/or rank-defective case. The performance is analyzed in terms of a property [42] that is a local version of the restricted isometry property (RIP) [43]. We call this property the weak-1 restricted isometry property (weak-1 RIP). Most importantly, compared to the related work [44] with similar but independently developed ideas, the analysis in this paper is non-asymptotic and applies to a wider class of matrices, including Gaussian matrices, random Fourier matrices, and incoherent unit-norm tight frames.
Contributions of independent interest include two new subspace-based greedy algorithms for joint sparse recovery with performance guarantees, an extension of the analysis of MUSIC for joint sparse recovery to the noisy case with imperfect subspace estimation, and a non-asymptotic analysis of subspace estimation from finitely many snapshots. The latter analysis is different from previous analyses of subspace methods, which were based on the law of large numbers, asymptotic normality, or low-order expansions.
The remainder of this paper is organized as follows. After introducing notation in Section II, the joint sparse recovery problem is stated in Section III. MUSIC for joint sparse recovery is reviewed in Section IV with a discussion that motivates the current work. We also propose an algorithm for signal subspace estimation in Section IV. In Section V, we propose SA-MUSIC and subspace-based greedy algorithms. We review the notion of the weak-1 RIP in Section VI. The weak-1 RIP analysis of various matrices that arise commonly in compressed sensing is given without any ambiguous constants, which might be of independent interest. In Section VII, we provide a non-asymptotic analysis of the algorithms of Sections IV and V by using the weak-1 RIP. In Section VIII, we analyze subspace estimation using a random signal model. The empirical performance of SA-MUSIC is compared to other methods in Section IX, and the relation to relevant work is discussed in Section X.
II Notation
Symbol $\mathbb{K}$ denotes a scalar field, which is either the real field $\mathbb{R}$ or the complex field $\mathbb{C}$. The vector space of $n$-tuples over $\mathbb{K}$ is denoted by $\mathbb{K}^n$ for $n \in \mathbb{N}$, where $\mathbb{N}$ is the set of natural numbers (excluding zero). Similarly, for $m, n \in \mathbb{N}$, the vector space of $m \times n$ matrices over $\mathbb{K}$ is denoted by $\mathbb{K}^{m \times n}$.
We will use various notations on a matrix $A \in \mathbb{K}^{m \times n}$. The range space spanned by the columns of $A$ will be denoted by $\mathcal{R}(A)$. The Hermitian transpose of $A$ will be denoted by $A^*$. The $k$th column of $A$ is denoted by $a_k$, and the submatrix of $A$ with columns indexed by $J \subset [n]$ is denoted by $A_J$, where $[n]$ denotes the set $\{1, \ldots, n\}$ for $n \in \mathbb{N}$. The $k$th row of $A$ is denoted by $A^k$, and the submatrix of $A$ with rows indexed by $J$ is denoted by $A^J$. Symbol $e_k$ will denote the $k$th standard basis vector of $\mathbb{K}^n$, where $n$ is implicitly determined for compatibility. The $k$th largest singular value of $A$ will be denoted by $\sigma_k(A)$. For Hermitian symmetric $A$, $\lambda_k(A)$ will denote the $k$th largest eigenvalue of $A$. The Frobenius norm and the spectral norm of $A$ are denoted by $\|A\|_F$ and $\|A\|$, respectively. For $p, q \geq 1$, the mixed $\ell_{p,q}$ norm of $A$ is defined by
\[
\|A\|_{p,q} \triangleq \Bigl( \sum_{k=1}^{m} \| A^k \|_q^p \Bigr)^{1/p}.
\]
The inner product is denoted by $\langle \cdot, \cdot \rangle$. The embedding Hilbert space, where the inner product is defined, is not explicitly mentioned when it is obvious from the context.
For a subspace $S$ of $\mathbb{K}^m$, matrices $P_S$ and $P_S^{\perp}$ denote the orthogonal projectors onto $S$ and its orthogonal complement $S^{\perp}$, respectively.
For two Hermitian matrices $A$ and $B$ of the same dimension, $A \succeq B$ if and only if $A - B$ is positive semidefinite.
Symbols $\mathbb{P}$ and $\mathbb{E}$ will denote the probability and the expectation with respect to a certain distribution. Unless otherwise mentioned, the distribution shall be obvious from the context.
III Problem Statement
A vector $x \in \mathbb{K}^n$ is $s$-sparse if it has at most $s$ nonzero components. The support of $x$ is defined as the set of the indices of its nonzero components. The sparsity level of $x$ is defined as the number of its nonzero components. The sparse recovery problem is to reconstruct an $s$-sparse vector $x$ from its linear measurement vector $y = Ax \in \mathbb{K}^m$ obtained through a sensing matrix $A \in \mathbb{K}^{m \times n}$. In particular, if there is no noise in $y$, then $x$ is a solution to the linear system $Ax = y$. In this case, under certain conditions on $A$, the unknown vector $x$ is recovered as the unique $s$-sparse solution. For example, any $s$-sparse $x$ is recovered if and only if any $2s$ columns of $A$ are linearly independent [45]. For $x$ to be the unique $s$-sparse solution to $Ax = y$, it is necessary that the submatrix $A_J$ have full column rank, where $J$ is the support of $x$; otherwise, there exists another $s$-sparse solution, which contradicts uniqueness. Note that once the support $J$ of $x$ is determined from $y$, then $x$ is easily computed as $x^J = A_J^{\dagger} y$, where $A_J^{\dagger}$ denotes the Moore-Penrose pseudo-inverse of $A_J$. Therefore, the key step in solving the sparse recovery problem is the identification of the support.
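To make the reduction concrete, the following sketch (a hypothetical instance with a Gaussian sensing matrix and an arbitrarily chosen support, not the paper's notation) recovers a sparse vector by least squares once its support is known:

```python
import numpy as np

rng = np.random.default_rng(0)
m, n, s = 20, 50, 3

# Hypothetical instance: Gaussian sensing matrix and an s-sparse vector.
A = rng.standard_normal((m, n))
x = np.zeros(n)
support = np.array([4, 17, 31])        # the support J, chosen arbitrarily
x[support] = rng.standard_normal(s)
y = A @ x                               # noiseless measurements

# Once the support is known, recovery is an overdetermined least squares
# problem on the corresponding columns of A.
x_hat = np.zeros(n)
x_hat[support] = np.linalg.pinv(A[:, support]) @ y

assert np.allclose(x_hat, x)
```

With `m = 20` measurements and only `s = 3` active columns, the pseudo-inverse solves a well-posed overdetermined system, so the recovery is exact in the noiseless case.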
In practice, the measurement vector is perturbed by noise. Usually, we assume that $y$ is given by
\[
y = Ax + w
\]
with additive noise $w \in \mathbb{K}^m$. In this case, $x$ is no longer a solution to the linear system $Ax = y$. Instead, minimizing $\|y - A\hat{x}\|_2$ subject to the constraint that $\hat{x}$ is $s$-sparse provides a solution to the sparse recovery problem. Alternatively, various convex optimization methods with sparsity-inducing metrics such as the $\ell_1$ norm have been proposed. However, the solution provided by such methods is not exactly sparse in the presence of noise. In some applications, the support has important physical meaning and hence the identification of the support is explicitly required. For example, in imaging applications of compressed sensing the support corresponds to the location of the target object, and in sparse linear regression the most contributing variables are identified by the support (cf. [46]). In such applications, unless the solution obtained to the sparse recovery problem is exactly sparse, a step of thresholding the obtained solution to the nearest $s$-sparse vector is necessary. For this reason, in this paper, the success of the sparse recovery problem is defined as the exact identification of the support of $x$.
Let us now turn to the main problem of this paper, where there exist multiple sparse signal vectors $x_1, \ldots, x_N$ that share the same (or similar) sparsity pattern(s), and the measurement vectors are obtained through a common sensing matrix $A$. We assume that the union of the supports of the $x_i$ for $i = 1, \ldots, N$ has at most $s$ elements. Then, the matrix $X_0 \triangleq [x_1, \ldots, x_N] \in \mathbb{K}^{n \times N}$ has at most $s$ nonzero rows and is called row $s$-sparse. The row support of $X_0$ is defined as the set of indices of its nonzero rows. The joint sparse recovery problem is to find the row support of the unknown signal matrix $X_0$ from the matrix $Y \in \mathbb{K}^{m \times N}$ of multiple measurement vectors (MMV) given by
\[
Y = A X_0 + W
\]
with common sensing matrix $A \in \mathbb{K}^{m \times n}$ and with perturbation $W \in \mathbb{K}^{m \times N}$. Let $J$ denote the row support of $X_0$. Then, $Y$ is compactly rewritten as $Y = A_J X_0^J + W$, where $X_0^J$ is the matrix composed of the nonzero rows of $X_0$, and $A_J$ is the submatrix of $A$ with the corresponding columns. The prior knowledge that $X_0$ is row $s$-sparse will be assumed. (This is only for convenience of the analysis and is not a limitation of the proposed algorithms; see Section V-B for a more detailed discussion.) An important parameter in the problem will be the rank $r \triangleq \operatorname{rank}(X_0)$ of the unknown signal matrix, which will be assumed unknown as well. When the matrix $X_0^J$ has full row rank, $r$ assumes its maximum value, $r = s$, and we will refer to this as the full row rank case. This is the case preferred by the algorithms in this paper. Otherwise, the case $r < s$, considered a violation of the full row rank case, will be called the rank-defective case.
IV MUSIC Revisited
The similarity between the joint sparse recovery (or MMV) problem and the direction of arrival (DOA) estimation problem in sensor array processing has been well studied before (e.g., [4]). In particular, it has been shown that the joint sparse recovery problem can be regarded as a special case of the DOA problem with discretized angles [4]. Through this analogy between the two problems, the algorithms and their analysis developed for the DOA problem have been applied to the joint sparse recovery problem [4]. In this section, we review a subspace-based algorithm proposed by Feng and Bresler [4], on which our new algorithm in Section V improves. We also extend the subspace-based algorithm in [4] to work without ideal assumptions.
IV-A MUSIC for the Joint Sparse Recovery Problem Revisited
Inspired by the success of the MUSIC algorithm [21] in sensor array processing, Bresler and Feng [4], [16], [15] proposed to use a version of MUSIC for joint sparse recovery. As in the original MUSIC algorithm in sensor array processing [21], the first step is to estimate the so-called signal subspace, defined by
\[
S \triangleq \mathcal{R}(A X_0),
\]
from the snapshot matrix $Y$. (We adopt the terminology from the sensor array processing literature. To emphasize the analogy between the joint sparse recovery problem and DOA estimation, we also call each of the columns of $Y$ a snapshot. Then, $N$ will denote the number of snapshots.) In sensor array processing, MUSIC [21] estimates $S$ using the eigenvalue decomposition (EVD) of the covariance matrix of the snapshots. The same method is applied to the joint sparse recovery problem under the assumption that
\[
\frac{1}{N} Y Y^* = \frac{1}{N} (A X_0)(A X_0)^* + \sigma^2 I_m, \qquad (4.1)
\]
which is achieved with statistical assumptions on $X_0$ and $W$ under the asymptotic in $N$ (with infinitely many snapshots) [15]. It is also assumed that $A_J$ has full column rank and $X_0^J$ has full row rank. Then, the dimension of $S$ coincides with $s$. The assumed relation (4.1) implies that the smallest eigenvalue of $\frac{1}{N} Y Y^*$ has multiplicity $m - s$, whence the dimension of $S$ is exactly determined. The signal subspace $S$ is then exactly computed as the invariant subspace spanned by the $s$ dominant eigenvectors of $\frac{1}{N} Y Y^*$, since the noise part in (4.1) only shifts the eigenvalues. The joint sparse recovery problem then reduces to the case where the subspace estimation is error-free, and hence the subsequent analysis of the support recovery in [4], [16], [15] considered this error-free case. (Obviously, for the noiseless case, the subspace estimation is error-free without relying on the asymptotic in $N$.)
Given the signal subspace $S$, MUSIC for joint sparse recovery identifies the row support as the set of the indices $k$ of columns of $A$ such that $a_k \in S$. In other words, MUSIC accepts the index $k$ as an element of the support if $P_S a_k = a_k$ and rejects it otherwise. In the remainder of this paper, the acronym MUSIC will denote the version for joint sparse recovery [4] rather than the original MUSIC algorithm for sensor array processing [21].
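As a concrete illustration, here is a minimal numerical sketch of this criterion in the noiseless full row rank case. The Gaussian instance and the singular-value cutoff used to detect the subspace dimension are illustrative assumptions, not part of the algorithm as stated:

```python
import numpy as np

rng = np.random.default_rng(1)
m, n, N, s = 30, 60, 10, 4

A = rng.standard_normal((m, n))
J = np.sort(rng.choice(n, size=s, replace=False))   # true row support
X0 = np.zeros((n, N))
X0[J] = rng.standard_normal((s, N))                 # full row rank (a.s.)
Y = A @ X0                                          # noiseless MMV

# Signal subspace S = range(A X0): orthonormal basis from the SVD of Y.
U, sv, _ = np.linalg.svd(Y, full_matrices=False)
r = int(np.sum(sv > 1e-10 * sv[0]))                 # numerical rank (= s here)
Us = U[:, :r]

# MUSIC criterion: accept index k iff a_k lies in S, i.e., the normalized
# projection residual ||(I - P_S) a_k|| / ||a_k|| is (numerically) zero.
resid = np.linalg.norm(A - Us @ (Us.T @ A), axis=0) / np.linalg.norm(A, axis=0)
J_hat = np.sort(np.argsort(resid)[:s])

assert np.array_equal(J_hat, J)
```

The on-support residuals are at machine precision while off-support columns of a generic Gaussian matrix have residuals bounded well away from zero, so selecting the `s` smallest residuals recovers the support exactly.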
A sufficient condition that guarantees the success of MUSIC for the special case where $X_0^J$ has full row rank is given in terms of the Kruskal rank [47], defined below.
Definition IV.1
The Kruskal rank of a matrix $A$, denoted by $\operatorname{krank}(A)$, is the maximal number $k$ such that any $k$ columns of $A$ are linearly independent.
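The Kruskal rank can be checked directly from the definition, albeit at exponential cost, by testing every column subset; the brute-force helper below is only an illustration and is not an algorithm used in the paper:

```python
import itertools
import numpy as np

def kruskal_rank(A, tol=1e-10):
    """Largest k such that every k columns of A are linearly independent.

    Brute force over all column subsets; exponential cost, so this is
    only practical for small matrices.
    """
    m, n = A.shape
    for k in range(1, min(m, n) + 1):
        for cols in itertools.combinations(range(n), k):
            if np.linalg.matrix_rank(A[:, list(cols)], tol=tol) < k:
                return k - 1          # found k dependent columns
    return min(m, n)

# Duplicating a column caps the Kruskal rank at 1, even though rank(A) = 2.
A = np.array([[1.0, 0.0, 1.0],
              [0.0, 1.0, 0.0]])
assert kruskal_rank(A) == 1
```

Note that the Kruskal rank can be much smaller than the ordinary rank, which only requires the existence of one independent subset rather than independence of every subset.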
Proposition IV.2 ([4], [15])
Let $\widetilde{J}$ be an arbitrary subset of $[n]$ with $s$ elements. Suppose that $X_0$ is row $s$-sparse with support $J$ and $\operatorname{rank}(X_0) = s$. If $A$ satisfies
\[
\operatorname{krank}(A) \geq s + 1, \qquad (4.2)
\]
then
\[
\mathcal{R}(A_{\widetilde{J}}) \supseteq S \qquad (4.3)
\]
for $S = \mathcal{R}(A X_0)$ if and only if $\widetilde{J} = J$.
The following result directly follows from Proposition IV.2.
Corollary IV.3
Under the conditions on $A$ and $X_0$ in Proposition IV.2, given the exact signal subspace $S$, MUSIC is guaranteed to recover $J$.
Remark IV.4
Remark IV.5
MUSIC with its performance guarantee in Proposition IV.2 is remarkable in the context of compressive sensing for the following reasons. First, MUSIC is a polynomial-time algorithm with a performance guarantee under a condition that coincides with the necessary condition for unique recovery (by any algorithm, no matter how complex). (Since the mid-1990's, when these results [4], [16], [15] – for what later became known as compressive sampling – were published, until recently, MUSIC was the only polynomial-time algorithm with such a guarantee. Recent results [48] showed another (greedy) algorithm with the same guarantee.) Second, MUSIC is simple and cheap, involving little more than a single EVD of the data covariance matrix. In fact, efficient methods for partial EVD, or other rank-revealing decompositions, can further reduce the cost. (The rank-revealing decompositions can be computed by the Lanczos method [2] for the truncated singular value decomposition (SVD). If the matrix is large, randomized algorithms [49], [50] can also be used to compute the truncated SVD.)
Unfortunately, as is well-known in the sensor array processing literature [21], [51], and also demonstrated by the numerical experiments in Section IX, MUSIC is prone to failure when $X_0^J$ does not have full row rank, or when it is ill-conditioned in the presence of noise. In sensor array processing, this is known as the "source coherence problem" [52], and (with the exception of the case of a Vandermonde matrix $A$) no general solution to this problem is known. This motivates our work to propose a new subspace-based method for joint sparse recovery that improves on MUSIC.
For the noisy case, the analysis of signal subspace estimation based on the asymptotic in $N$ is not practical. In particular, from the perspective of compressed sensing (with joint sparsity), recovery of the support from a finite (often small) number of snapshots is desired. In the next subsection, we propose a subspace estimation scheme that works with finitely many snapshots, which will be applied to both MUSIC and the new subspace-based methods in this paper. In Section IV-C, we formalize the MUSIC algorithm for support recovery in the presence of a perturbation in the estimate of $S$. This will lay the ground for the subsequent analysis of MUSIC in the noisy scenario, and for its extension in the same scenario to SA-MUSIC.
IV-B Signal Subspace Estimation from Finitely Many Snapshots
We study the problem of signal subspace estimation from finitely many snapshots in this subsection. For later use in other subspace-based algorithms in Section V, we weaken the assumptions of the previous section. In particular, we assume finite $N$ and no longer assume that $X_0^J$ has full row rank. We also assume that the columns of the noise matrix $W$ are i.i.d. random vectors with white spectrum, i.e., with covariance $\sigma^2 I_m$. (Otherwise, the standard prewhitening schemes developed in sensor array processing may be applied prior to subspace estimation.)
When $X_0^J$ is ill-conditioned, the last few singular values of $A X_0$ are small, making the estimation of $S$ highly sensitive to the noise in the snapshots. To improve the robustness against noise, instead of estimating the whole subspace $S$, we only estimate an $\hat{r}$-dimensional subspace of $S$ for $\hat{r} \leq r$. The dimension $\hat{r}$ is determined so that the spectral gap at the $\hat{r}$th eigenvalue is significant.
We propose and analyze a simple scheme that determines the dimension $\hat{r}$ by thresholding the eigenvalues of (a biased version of) the sample covariance matrix, so that the estimated signal subspace $\widehat{S}$, spanned by the $\hat{r}$ dominant eigenvectors, is close to an $\hat{r}$-dimensional subspace of $S$. The procedure for estimating the signal subspace and its dimension is described next.
Given the snapshot matrix $Y$, we compute its sample covariance matrix defined by
\[
\widehat{R}_Y \triangleq \frac{1}{N} Y Y^*.
\]
Then, we compute a biased matrix $\widehat{\Delta}$ by
\[
\widehat{\Delta} \triangleq \widehat{R}_Y - \lambda_m(\widehat{R}_Y)\, I_m.
\]
Note that $\widehat{\Delta}$ and $\widehat{R}_Y$ have the same eigenvectors. Recall that our goal is to find an $\hat{r}$-dimensional subspace $\widehat{S}$, for some $\hat{r} \leq r$, from $Y$ such that there exists an $\hat{r}$-dimensional subspace $\widetilde{S}$ of the signal subspace $S$ satisfying $\|P_{\widehat{S}} - P_{\widetilde{S}}\| \leq \epsilon$ for small $\epsilon$. For better performance of support recovery by the algorithms in Section V, larger $\hat{r}$ is preferred.
For the ideal case where (4.1) holds, $\widehat{\Delta}$ reduces to $\Delta$ defined by
\[
\Delta \triangleq \frac{1}{N} (A X_0)(A X_0)^*.
\]
Since $\mathcal{R}(\Delta) = S$, by setting $\hat{r}$ to $r$, we compute $S$ itself rather than a proper subspace of $S$.
For finite $N$, usually, the cross-correlation terms in the sample covariance matrix between the noise term and the signal term are smaller than the autocorrelation terms. Since the autocorrelation term of the noise is nearly removed in $\widehat{\Delta}$, we will show that it is likely that $\|\widehat{\Delta} - \Delta\|$ is small in the spectral norm. In particular, let $\widehat{S}$ be the subspace spanned by the $\hat{r}$ dominant eigenvectors of $\widehat{\Delta}$; if $\|\widehat{\Delta} - \Delta\|$ is small compared to the gap between $\lambda_{\hat{r}}(\Delta)$ and $\lambda_{\hat{r}+1}(\Delta)$, then there exists an $\hat{r}$-dimensional subspace $\widetilde{S}$ of $S$ with small $\|P_{\widehat{S}} - P_{\widetilde{S}}\|$. Since $\Delta$ is not available, we determine the dimension $\hat{r}$ by simple thresholding. More specifically, given a threshold $\eta > 0$, the dimension $\hat{r}$ of $\widehat{S}$ is determined as the maximal number $j$ satisfying
\[
\lambda_j(\widehat{\Delta}) > \eta\, \lambda_1(\widehat{\Delta}). \qquad (4.4)
\]
When $X_0^J$ (and hence $A X_0$) is ill-conditioned, it is likely that the gap between consecutive eigenvalues $\lambda_j(\widehat{\Delta})$ and $\lambda_{j+1}(\widehat{\Delta})$, for $j$ near $r$, is small compared to $\lambda_1(\widehat{\Delta})$. The aforementioned increased robustness to noise is provided by choosing $\eta$ so that the gap at the selected dimension is large, which will in turn reduce the estimated subspace dimension $\hat{r}$. More sophisticated methods for determining $\hat{r}$ are possible, but this simple method suffices for our purposes and is amenable to analysis (see Section VIII). This algorithm for estimating the signal subspace and its dimension is summarized as Algorithm 1.
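A minimal numerical sketch of this estimation scheme follows. The bias by the smallest sample eigenvalue and the relative threshold `eta` are illustrative choices consistent with the description above, not the paper's exact tuning:

```python
import numpy as np

def estimate_signal_subspace(Y, eta=0.05):
    """Estimate a (possibly partial) signal subspace from N snapshots.

    Sketch of the eigenvalue-thresholding scheme: form the sample
    covariance, bias it by its smallest eigenvalue to suppress the
    white-noise floor, and keep eigenvectors whose (biased) eigenvalues
    exceed a relative threshold.
    """
    m, N = Y.shape
    R_hat = (Y @ Y.conj().T) / N              # sample covariance
    w, V = np.linalg.eigh(R_hat)              # ascending eigenvalues
    w, V = w[::-1], V[:, ::-1]                # reorder to descending
    delta = w - w[-1]                         # eigenvalues of the biased matrix
    r_hat = int(np.sum(delta > eta * delta[0]))
    return V[:, :r_hat], r_hat

# Toy check: a rank-2 signal in dimension 6 with mild white noise.
rng = np.random.default_rng(0)
G = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0],
              [0.0, 0.0], [1.0, -1.0], [0.0, 1.0]])
Y = G @ rng.standard_normal((2, 500)) + 0.01 * rng.standard_normal((6, 500))
U_hat, r_hat = estimate_signal_subspace(Y)
assert r_hat == 2
```

Here the two signal eigenvalues dominate the noise floor by orders of magnitude, so the thresholding recovers the dimension; with a smaller gap, the same threshold would deliberately return a smaller, better-conditioned partial subspace.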
IV-C MUSIC Applied to an Inaccurate Estimate of the Signal Subspace
In the presence of a perturbation in the estimated signal subspace $\widehat{S}$, MUSIC finds the set of the $s$ indices $k$ with the largest values of
\[
\frac{\| P_{\widehat{S}}\, a_k \|_2}{\| a_k \|_2}. \qquad (4.5)
\]
The corresponding algorithm is summarized in Algorithm 2. To provide an intuition for the selection criterion in (4.5), we use the notion of the “angle function” [53].
Definition IV.6 ([53])
The angle function between two subspaces $S_1$ and $S_2$ is defined by
\[
\angle(S_1, S_2) \triangleq \| P_{S_2}^{\perp} P_{S_1} \|, \qquad (4.6)
\]
which equals the sine of the largest principal angle from $S_1$ to $S_2$.
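Numerically, the angle function can be evaluated from orthogonal projectors; the sketch below computes $\|P_{S_2}^{\perp} P_{S_1}\|$, one common way to write the sine of the largest principal angle, assuming the subspaces are given by matrices with orthonormal columns:

```python
import numpy as np

def largest_principal_angle_sin(U1, U2):
    """sin of the largest principal angle from R(U1) to R(U2).

    U1, U2 must have orthonormal columns; the value equals the spectral
    norm of (I - P2) P1, where Pi is the orthogonal projector onto R(Ui).
    """
    P1 = U1 @ U1.conj().T
    P2 = U2 @ U2.conj().T
    I = np.eye(P1.shape[0])
    return np.linalg.norm((I - P2) @ P1, 2)

# Planes in R^3: the xy-plane vs. the xz-plane share the x-axis but are
# orthogonal in the remaining direction, so the largest angle is 90 degrees.
U1 = np.array([[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]])  # xy-plane
U2 = np.array([[1.0, 0.0], [0.0, 0.0], [0.0, 1.0]])  # xz-plane
assert np.isclose(largest_principal_angle_sin(U1, U2), 1.0)
```

The value is 0 exactly when $S_1 \subseteq S_2$ and 1 when some direction of $S_1$ is orthogonal to all of $S_2$, which matches the intuition behind the selection criterion in (4.5).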
Remark IV.7
V Subspace-Augmented MUSIC
V-A MUSIC Applied to an Augmented Signal Subspace
When $X_0^J$ has full row rank, the signal subspace $S$ coincides with $\mathcal{R}(A_J)$. In this case, given the exact $S$, MUSIC is guaranteed to recover the support because (i) $a_k \in S$ for all $k \in J$; and (ii) $a_k \notin S$ for all $k \notin J$, which is implied by (4.2) (Proposition IV.2). However, in the rank-defective case, when $X_0^J$ does not have full row rank, i.e., when $r$ is strictly smaller than the sparsity level $s$, we have $\dim S = r < s$. Therefore, $S$ is a proper subspace of $\mathcal{R}(A_J)$ and it may happen that $a_k \notin S$ for some (or all) $k \in J$. This will cause MUSIC to miss valid components of $J$. Because, in the presence of noise (an imperfect subspace estimate), MUSIC selects the indices $k$ for which $a_k$ is closest (in the sense of the angle function) to $\widehat{S}$, this may result in the selection of spurious indices into the estimate of $J$. This explains the well-known fact that in the rank-defective case MUSIC is prone to failure.
Subspace-augmented MUSIC overcomes the limitation of MUSIC to the full row rank case by capitalizing on the following observation: MUSIC fails in the rank-defective case because $S$ is a proper subspace of $\mathcal{R}(A_J)$; however, if another subspace $\widetilde{S}$ that complements $S$ is given so that $S + \widetilde{S} = \mathcal{R}(A_J)$, then MUSIC applied to the augmented subspace will be successful.
Unfortunately, in general, finding such an oracle subspace is not feasible; the search cannot even be enumerated. However, if $X_0^J$, the matrix of nonzero rows of $X_0$, or, more generally, the subspace $\mathcal{R}(X_0)$, satisfies a mild condition, then the search may be restricted without loss of generality to subspaces spanned by columns of $A_J$. The following proposition states this result.
Definition V.1
Matrix $X_0$ is row-nondegenerate if
\[
\operatorname{krank}\bigl( (X_0^J)^* \bigr) = \operatorname{rank}(X_0). \qquad (5.1)
\]
Remark V.2
Condition (5.1) says that every $r$ rows of $X_0^J$ are linearly independent, where $r = \operatorname{rank}(X_0)$. This is satisfied by $X_0$ that is generic in the set of full rank matrices of the same size. In fact, an even weaker requirement on $X_0$ suffices, as shown by the next argument, which reduces the requirement to one on the subspace $\mathcal{R}(X_0)$.
Remark V.3
Condition (5.1) is invariant to multiplication of $X_0$ on the right by any full row rank matrix of compatible size. In particular, Condition (5.1) holds if and only if
\[
\operatorname{krank}\bigl( (Q^J)^* \bigr) = r \qquad (5.2)
\]
for any matrix $Q$ whose columns form an orthonormal basis of $\mathcal{R}(X_0)$. It follows that (5.1) is a property of the subspace $\mathcal{R}(X_0)$. Furthermore, (5.1) also implies that any $\widetilde{X}$ such that $\mathcal{R}(\widetilde{X}) \subset \mathcal{R}(X_0)$ is also row-nondegenerate (Lemma .4).
Remark V.4
The condition on $Q$ in (5.2), that any subset of rows of $Q^J$ up to size $r$ is linearly independent, is purely algebraic and can be paraphrased to say that the rows of $Q^J$ are in general position. This is a mild condition satisfied by generic $Q$. For example, if the rows of $Q^J$ are independently distributed with respect to any absolutely continuous probability measure, then (5.2) is satisfied with probability 1.
Remark V.5
Proposition V.6 (Subspace Augmentation)
Suppose that $X_0$ is row $s$-sparse with support $J$ and is row-nondegenerate. Let $\widetilde{S}$ be an arbitrary $\tilde{r}$-dimensional subspace of $S = \mathcal{R}(A X_0)$, where $\tilde{r} \leq r$. Let $J_0$ be an arbitrary subset of $J$ with $s - \tilde{r}$ elements. If $A_J$ has full column rank, then
\[
\widetilde{S} + \mathcal{R}(A_{J_0}) = \mathcal{R}(A_J). \qquad (5.3)
\]
Proof:
See Appendix F.
Remark V.7
Note that $|J_0| \geq s - \tilde{r}$ is a necessary condition for (5.3). Therefore, $J_0$ should be a subset of $J$ with at least $s - \tilde{r}$ elements for the success of the subspace augmentation.
Remark V.8
The row-nondegeneracy condition on $X_0$ is a necessary condition to guarantee (5.3) for an arbitrary subset of $J$ with $s - \tilde{r}$ elements. Suppose that $X_0$ fails to satisfy the row-nondegeneracy condition, i.e., . By the assumption on , there exists a row $s$-sparse matrix with support such that . Since was an arbitrary dimensional subspace of , without loss of generality, we may assume that . By the projection update formula, it follows that and hence it suffices to show for the failure of (5.3). Since , there exists of size such that . Then, . It follows that (5.3) fails for this specific . Therefore, the row-nondegeneracy condition on $X_0$ is essential. Furthermore, by Remarks V.2–V.5, the row-nondegeneracy is a mild condition; it will be assumed to hold henceforth.
Let $X_0$ be row-nondegenerate and suppose that an error-free estimate $\widetilde{S} \subseteq S$ of dimension $\tilde{r}$ is available. In this case, Proposition V.6 implies that, given a correct partial support $J_0 \subseteq J$ of size $s - \tilde{r}$, MUSIC applied to the augmented subspace $\widetilde{S} + \mathcal{R}(A_{J_0})$ enjoys the same guarantee as MUSIC for the full row rank case (Proposition IV.2). We will see in Section VII that a similar statement applies even with an imperfect estimate $\widehat{S}$.
Based on the above result, we propose a class of methods for joint sparse recovery called subspace-augmented MUSIC (SA-MUSIC) consisting of the following steps:

Signal subspace estimation: compute an estimate of the signal subspace .

Partial support recovery: compute a partial support of size from and , where and is the sparsity level known a priori.

Signal subspace augmentation: compute the augmented subspace

Support completion: complete to produce , by adding more support elements obtained by applying “MUSIC” to , that is, finding satisfying
The general SA-MUSIC algorithm is summarized as Algorithm 3. An actual implementation might use orthonormal bases and for the subspaces and instead of constructing the projection operators and . Step 3 could then be performed by a QR decomposition of matrix .
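The four steps above can be sketched in code. This is a minimal, hypothetical rendering for the noiseless case under the assumed MMV model Y = A·X0, with a correct partial support supplied externally; it is not the paper's Algorithm 3, and the function name sa_music is made up for illustration. As suggested above, it works with orthonormal bases and a QR decomposition rather than explicit projection operators:

```python
import numpy as np

def sa_music(A, U, S0, s):
    """Hypothetical SA-MUSIC sketch (noiseless case).
    A: m x n sensing matrix; U: orthonormal basis of the estimated
    signal subspace; S0: correct partial support of size s - r;
    s: sparsity level."""
    # Step 3: augment the signal subspace with the atoms indexed by S0.
    Q, _ = np.linalg.qr(np.hstack([U, A[:, S0]]))
    # Step 4: MUSIC on the augmented subspace -- rank the atoms by
    # their normalized distance to the augmented subspace.
    resid = A - Q @ (Q.T @ A)
    score = np.linalg.norm(resid, axis=0) / np.linalg.norm(A, axis=0)
    score[np.asarray(S0)] = np.inf            # exclude already-selected atoms
    extra = np.argsort(score)[: s - len(S0)]
    return np.sort(np.concatenate([np.asarray(S0), extra]))

rng = np.random.default_rng(1)
m, n, s, r, N = 24, 50, 6, 3, 12
A = rng.standard_normal((m, n)) / np.sqrt(m)
supp = np.sort(rng.choice(n, s, replace=False))
X0 = np.zeros((n, N))
X0[supp] = rng.standard_normal((s, r)) @ rng.standard_normal((r, N))
U = np.linalg.svd(A @ X0, full_matrices=False)[0][:, :r]
est = sa_music(A, U, supp[: s - r], s)
print(np.array_equal(est, supp))              # True in this noiseless example
```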
A particular instance of the SA-MUSIC algorithm is specified by the particular methods used for the steps of signal subspace estimation and partial support recovery. For subspace estimation, in the analysis in Sections VII and VIII, we will consider the EVD-based scheme in Section IV-B. However, as mentioned earlier, the subspace estimation scheme is not restricted to the given method. For example, if the noise is also sparse, then robust principal component analysis (RPCA) [54] will provide a better estimate of than the usual SVD.
The choice of method for partial support recovery is discussed in the next subsection. Here we note some special cases. As increases, the size of the partial support required in SA-MUSIC decreases. For small , we can use an exhaustive search over , the computational cost of which also decreases in . In particular, for the special case where , the step of partial support recovery is eliminated, and SA-MUSIC reduces to MUSIC [4].
V-B Partial Support Recovery with Practical Algorithms
When there is a “rank defect” in , i.e., , SA-MUSIC requires partial support recovery of size . In addition to computational efficiency, a key desirable property of an algorithm for this task is that it solve the partial support recovery problem more easily than the full support recovery problem.
From this perspective, greedy algorithms for the joint sparse recovery problem are attractive candidates. Both empirical observations and the performance guarantees in the sequel suggest that the first few steps of greedy algorithms are more likely to succeed than a complete run. In other words, greedy algorithms take advantage of the reduction to partial support recovery when they are combined with SA-MUSIC.
Any of the known greedy algorithms for joint sparse recovery may be used in SA-MUSIC, producing a different version of SA-MUSIC. In particular, we may consider variations on orthogonal matching pursuit (OMP) [27], such as MMV orthogonal matching pursuit (M-OMP) [24], simultaneous orthogonal matching pursuit (S-OMP) [25], or their generalization to SOMP [26]. The SOMP algorithm incrementally updates the support by the following selection rule: given an index set from the previous steps, the algorithm adds to the index that satisfies
(5.4) 
where is a parameter of the algorithm. M-OMP and S-OMP correspond to 2-SOMP and 1-SOMP, respectively.
In particular, we propose and analyze two different algorithms for the partial support recovery step in SA-MUSIC. The first is the signal subspace orthogonal matching pursuit (SS-OMP) algorithm. SS-OMP is a subspace-based variation of M-OMP (2-SOMP) that replaces the snapshot matrix in (5.4) by the orthogonal projector onto the estimated signal subspace. (Equivalently, is replaced by an orthogonal basis matrix for the estimated signal subspace.) Hence, given the estimated support from the previous steps, SS-OMP updates by adding selected by
(5.5) 
The complete SS-OMP is summarized as Algorithm 4.
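A hedged sketch of the SS-OMP loop follows, assuming rule (5.5) takes the form "maximize the correlation of each residual atom with a basis U of the estimated signal subspace" (the exact normalization in (5.5) may differ, and the helper name ss_omp is hypothetical). The partial run below performs only s − r steps, matching its role inside SA-MUSIC:

```python
import numpy as np

def ss_omp(A, U, k):
    """Hypothetical sketch of k steps of SS-OMP: S-OMP-style greedy
    selection with the snapshot matrix replaced by a basis U of the
    estimated signal subspace (assumed form of rule (5.5))."""
    m, _ = A.shape
    S, Qk = [], np.zeros((m, 0))
    for _ in range(k):
        R = A - Qk @ (Qk.T @ A)        # residual atoms after projecting out A_S
        scores = np.linalg.norm(U.T @ R, axis=0)
        scores[S] = -np.inf            # never re-select an index
        j = int(np.argmax(scores))
        S.append(j)
        q = R[:, j] / np.linalg.norm(R[:, j])
        Qk = np.hstack([Qk, q[:, None]])   # grow orthonormal basis of A_S
    return sorted(S)

# Partial run (s - r steps) on a noiseless rank-deficient instance:
rng = np.random.default_rng(2)
m, n, s, r, N = 24, 50, 6, 3, 12
A = rng.standard_normal((m, n)) / np.sqrt(m)
supp = np.sort(rng.choice(n, s, replace=False))
X0 = np.zeros((n, N))
X0[supp] = rng.standard_normal((s, r)) @ rng.standard_normal((r, N))
U = np.linalg.svd(A @ X0, full_matrices=False)[0][:, :r]
part = ss_omp(A, U, s - r)             # candidate partial support of size 3
```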
The second algorithm we propose for the partial support recovery step in SA-MUSIC is signal subspace orthogonal matching subspace pursuit (SS-OMSP). SS-OMSP is a modification of another greedy algorithm, rank-aware order recursive matching pursuit (RA-ORMP) [48].^6 (The name “Rank-Aware ORMP” proposed for this algorithm appears to be a misnomer: RA-ORMP, as originally proposed [48], does not have any feature to determine rank. It computes an orthonormal basis for . However, whereas in the ideal, noiseless case this basis will have dimension equal to the rank of , with any noise present will have full rank equal to , and this will also be the dimension of the computed orthonormal basis. Hence, the algorithm does not seem to have any built-in rank awareness.) SS-OMSP replaces the snapshot matrix in RA-ORMP by the orthogonal projector onto the estimated signal subspace, or equivalently, by an orthogonal basis matrix for the estimated signal subspace. Given the estimated support from the previous steps, RA-ORMP updates by adding selected by
(5.6) 
Similarly, SSOMSP updates the support by adding selected by
(5.7) 
In general, and span different subspaces, and hence the two projection operators used in (5.6) and (5.7) are different. The two projection operators coincide only for the special case when there is no noise.
We interpret (5.7) using the angle function between two subspaces, i.e., (5.7) is equivalent to
(5.8) 
Given the subspace , which is spanned by the columns of corresponding to support elements determined in the preceding steps of the algorithm, the selection rule finds the nearest subspace to among all subspaces, each of which is spanned by for . The name “orthogonal matching subspace pursuit” is intended to distinguish the matching using a subspace metric in SS-OMSP from that of OMP and its variations.
Again, we use a partial run ( steps) of SS-OMSP for partial support recovery and switch to MUSIC applied to the augmented subspace. The complete SS-OMSP is summarized as Algorithm 5.
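A single SS-OMSP selection step can be sketched as follows, under the assumption that rule (5.7) normalizes each residual atom before matching it against the estimated signal subspace — the normalization is exactly what turns the rule into the subspace-angle criterion of (5.8). The helper name is hypothetical:

```python
import numpy as np

def ss_omsp_step(A, U, S):
    """One hypothetical SS-OMSP selection step (assumed form of rule
    (5.7)): normalize each residual atom, then pick the one best
    aligned with the estimated signal subspace spanned by U."""
    if S:
        Qk, _ = np.linalg.qr(A[:, S])
    else:
        Qk = np.zeros((A.shape[0], 0))
    R = A - Qk @ (Qk.T @ A)            # residual atoms after projecting out A_S
    norms = np.linalg.norm(R, axis=0)
    norms[norms < 1e-12] = np.inf      # atoms already inside span(A_S)
    # Normalization makes this an angle criterion, cf. (5.8).
    return int(np.argmax(np.linalg.norm(U.T @ R, axis=0) / norms))

rng = np.random.default_rng(3)
m, n = 24, 50
A = rng.standard_normal((m, n)) / np.sqrt(m)
U = np.linalg.qr(rng.standard_normal((m, 3)))[0]   # stand-in subspace basis
S = [ss_omsp_step(A, U, [])]
S.append(ss_omsp_step(A, U, S))        # second step never repeats an index
```

Compare with the SS-OMP rule: dividing by the residual norm is the only difference, but it changes the geometry from correlation matching to nearest-subspace matching.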
V-C Stopping Conditions for Unknown Sparsity Level
In most analyses of greedy algorithms, the sparsity level is assumed to be known a priori. In fact, this assumption is only for simplicity of the analysis and not a limitation of the greedy algorithms. For example, M-OMP recovers a support of unknown size by running the steps until the residual falls below a certain threshold that depends on the noise level (or vanishes, in the noiseless case). SS-OMP recovers a support of unknown size similarly, but with a different stopping criterion. Assume that the estimated signal subspace satisfies . After steps, SS-OMP returns of size . Whether to continue to the next step is determined by the stopping condition given by
for a threshold . For the noiseless case, is set to 0 and, for the noisy case, is set to an estimate of . Let us consider the solution given by the enumeration of all possible supports:
We assume that . Otherwise, in the noiseless case, this implies that the support is not uniquely determined. If , then SS-OMP stops after steps whenever it is guaranteed to recover the support with known .
Similarly, SA-MUSIC with SS-OMP (SA-MUSIC+SS-OMP henceforth) can recover the support without knowledge of by applying the same stopping criterion as SS-OMP; this is summarized in Algorithm 6. The update criterion in Step 9 of Algorithm 6 determines the particular SA-MUSIC algorithm. For example, SA-MUSIC+SS-OMP uses the condition in (5.5) whereas SA-MUSIC with SS-OMSP (SA-MUSIC+SS-OMSP henceforth) uses the condition in (5.7). If , then the SA-MUSIC algorithms return a support of size whenever they are guaranteed to recover the support with known . For simplicity, the analyses in Section VII will assume a known sparsity level.
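One plausible reading of such a stopping condition — stop once the estimated signal subspace is (nearly) contained in the span of the selected atoms — can be sketched as follows. The statistic, its use of the spectral norm, and the threshold are assumptions for illustration, not the exact criterion of Algorithm 6:

```python
import numpy as np

def subspace_residual(A, U, S):
    """Hypothetical stopping statistic for unknown sparsity level:
    how much of the estimated signal-subspace basis U lies outside
    span(A_S). Stop once it falls below a threshold (zero in the
    noiseless case, a noise-dependent estimate otherwise)."""
    Q, _ = np.linalg.qr(A[:, list(S)])
    return np.linalg.norm(U - Q @ (Q.T @ U), 2)

rng = np.random.default_rng(4)
m, n, s, r, N = 24, 50, 6, 3, 12
A = rng.standard_normal((m, n)) / np.sqrt(m)
supp = np.sort(rng.choice(n, s, replace=False))
X0 = np.zeros((n, N))
X0[supp] = rng.standard_normal((s, r)) @ rng.standard_normal((r, N))
U = np.linalg.svd(A @ X0, full_matrices=False)[0][:, :r]

print(subspace_residual(A, U, supp))           # ~0: stop here
print(subspace_residual(A, U, supp[: s - r]))  # large: keep iterating
```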
VI Weak Restricted Isometry Property
VI-A Uniform Restricted Isometry Property
The restricted isometry property (RIP) has been proposed in the study of the reconstruction of sparse vectors by ℓ1-norm minimization [43]. Matrix satisfies the RIP of order if there exists a constant such that
(6.1) 
The smallest that satisfies (6.1) is called the restricted isometry constant (RIC) of order and is denoted by . Note that (6.1) is equivalent to
(6.2) 
and hence satisfies
The RIP of order implies that all submatrices of with columns are uniformly well conditioned.
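For intuition, the restricted isometry constant of (6.1)–(6.2) can be computed by brute force on tiny instances as the worst deviation of the squared extreme singular values from 1 over all k-column submatrices. The cost is exponential in the number of columns, so this is illustrative only; the function name is ours:

```python
import numpy as np
from itertools import combinations

def ric(A, k):
    """Brute-force restricted isometry constant of order k via (6.2):
    worst deviation of the squared extreme singular values from 1
    over all k-column submatrices (feasible only for tiny n)."""
    delta = 0.0
    for S in combinations(range(A.shape[1]), k):
        sv = np.linalg.svd(A[:, list(S)], compute_uv=False)
        delta = max(delta, sv[0] ** 2 - 1, 1 - sv[-1] ** 2)
    return delta

rng = np.random.default_rng(5)
A = rng.standard_normal((16, 10))
A /= np.linalg.norm(A, axis=0)          # unit-norm columns, so delta_1 = 0
print(ric(A, 1), ric(A, 2), ric(A, 3))  # nondecreasing in the order k
```

The monotonicity in k visible here follows from singular value interlacing: adding a column to a submatrix can only increase the largest and decrease the smallest singular value, which is exactly why the uniform RIP becomes demanding at large orders.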
VI-B Weak Restricted Isometry Property
In many analyses of sparse signal recovery, the uniform RIP is unnecessarily strong and requires a demanding condition on that is not satisfied by the matrices that arise in applications. Therefore, weaker versions of RIP have been proposed, tailored to specific analyses [42], [55].
Matrix satisfies the weak restricted isometry property (weak RIP) [55] with parameter , where , with , and , if
(6.3) 
The corresponding weak restricted isometry constant is given by
The special case of the weak RIP with has been previously proposed [42] to derive an average case analysis of the solution of the MMV problem by mixed-norm minimization, also known as MMV basis pursuit (MBP) [29]. This specific case of the weak RIP with , which we call the weak-1 RIP, is useful for the analysis in this paper. Obviously, the weak-1 RIP is satisfied under a less stringent condition on . In the following, we list some matrices that satisfy the weak-1 RIP along with the required conditions.^7 (We show that, compared to the uniform RIP, the requirement on the number of measurements to satisfy the weak-1 RIP is reduced by large factors, ranging from 200-fold to several thousandfold.) Importantly, in addition to Gaussian matrices, which lend themselves to relatively easy analysis, this list includes deterministic matrices that arise in applications, and provides reasonable constants for them.
VI-C Gaussian Matrix
Eldar and Rauhut derived a condition for the weak-1 RIP of an i.i.d. Gaussian matrix [42, Proposition 5.3]. Their proof starts with the concentration of the quadratic form around its expectation, where is an i.i.d. Gaussian matrix, to bound the singular values of ; this approach was originally proposed in [56]. We provide an alternative and much tighter condition for the weak-1 RIP of directly, using the concentration of the singular values of an i.i.d. Gaussian matrix [57].
Proposition VI.1
Given with , let be an i.i.d. Gaussian matrix whose entries follow . Suppose that is a subset of with elements. For any , if