RIP-like Properties in Subsampled Blind Deconvolution
Abstract
We derive near optimal performance guarantees for subsampled blind deconvolution. Blind deconvolution is an ill-posed bilinear inverse problem, and additional subsampling makes the problem even more challenging. Sparsity and spectral flatness priors on the unknown signals are introduced to overcome these difficulties. While being crucial for deriving the desired near optimal performance guarantees, unlike the sparsity prior with its nice union-of-subspaces structure, the spectral flatness prior corresponds to a nonconvex cone structure, which is not preserved by elementary set operations. This prohibits the operator arising in subsampled blind deconvolution from satisfying the standard restricted isometry property (RIP) at near optimal sample complexity, which motivated us to study other RIP-like properties. Combined with the performance guarantees derived using these RIP-like properties in a companion paper, we show that subsampled blind deconvolution is provably solved at near optimal sample complexity by a practical algorithm.
1 Introduction
1.1 Subsampled blind deconvolution of sparse signals
The subsampled blind deconvolution problem refers to the recovery of two signals from a few samples of their convolution and is formulated as a bilinear inverse problem as follows. Let Ω = {t_1, …, t_m} denote the set of sampling indices out of {0, …, n−1}. Given Ω, the sampling operator S_Ω is defined so that the k-th element of S_Ω w is the t_k-th element of w for k = 1, …, m. Then, the samples of the circular convolution x ⊛ y indexed by Ω with additive noise constitute the measurement vector b, which is expressed as
where z denotes additive noise.
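As a concrete illustration, the measurement model above can be sketched numerically. The variable names below (x, y for the signals, omega for the sampling indices, b for the measurements) are illustrative choices for this sketch, not necessarily the paper's notation:

```python
import numpy as np

rng = np.random.default_rng(0)
n, m = 64, 16

# Two unknown signals whose circular convolution is observed.
x = rng.standard_normal(n)
y = rng.standard_normal(n)

# Circular convolution computed in the Fourier domain.
conv = np.fft.ifft(np.fft.fft(x) * np.fft.fft(y)).real

# Subsampling: keep only the m entries indexed by omega, plus noise.
omega = rng.choice(n, size=m, replace=False)
noise = 0.01 * rng.standard_normal(m)
b = conv[omega] + noise   # the measurement vector

# Sanity check against a direct time-domain circular convolution.
direct = np.array([sum(x[k] * y[(t - k) % n] for k in range(n))
                   for t in range(n)])
assert np.allclose(conv, direct)
```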
Let x and y be uniquely represented as x = Du and y = Cv over dictionaries D and C. Then, the recovery of (x, y) is equivalent to the recovery of (u, v), and the subsampled blind deconvolution problem corresponds to the bilinear inverse problem of recovering (u, v) from its bilinear measurements in b, when Ω, D, and C are known.
A stable reconstruction in subsampled blind deconvolution is defined through the lifting procedure [1], which converts blind deconvolution to the recovery of a rank-1 matrix from its linear measurements. By the lifting procedure, the bilinear measurements of (u, v) are equivalently rewritten as linear measurements of the rank-1 matrix X = uv^H, i.e., there is a linear operator 𝒜 such that
Then, each element of the measurement vector corresponds to a matrix inner product. Indeed, there exist matrices A_1, …, A_m that describe the action of 𝒜 on X by
(1)  [𝒜(X)]_k = ⟨A_k, X⟩,  k = 1, …, m.
Since the circular convolution corresponds to the element-wise product in the Fourier domain, these matrices are explicitly expressed as
where f_j denotes the j-th column of the unitary DFT matrix F. The subsampled blind deconvolution problem then becomes a matrix-valued linear inverse problem where the unknown matrix is constrained to the set of rank-1 matrices.
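The Fourier-domain identity behind this expression is easy to check numerically. Assuming the unitary normalization F = FFT/√n (a convention adopted for this sketch), circular convolution satisfies F(x ⊛ y) = √n (Fx) ⊙ (Fy):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 32
x = rng.standard_normal(n) + 1j * rng.standard_normal(n)
y = rng.standard_normal(n) + 1j * rng.standard_normal(n)

# Unitary DFT: divide the standard FFT by sqrt(n).
F = lambda w: np.fft.fft(w) / np.sqrt(n)

# Circular convolution in the time domain.
conv = np.fft.ifft(np.fft.fft(x) * np.fft.fft(y))

# With the unitary normalization, F(x conv y) = sqrt(n) * (Fx) .* (Fy).
assert np.allclose(F(conv), np.sqrt(n) * F(x) * F(y))
```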
In the lifted formulation, a reconstruction of the unknown matrix is considered successful if it satisfies the following stability criterion:
(2) 
for an absolute constant. This definition of success is free of the inherent scale ambiguity in the original bilinear formulation. Once X is recovered, u (resp. v) is identified up to a scale factor as the left (resp. right) factor of the rank-1 matrix X.
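The identification step can be sketched as follows: given the lifted rank-1 matrix, the two factors are read off (up to scale) from a singular value decomposition. The names u, v, X below are illustrative:

```python
import numpy as np

rng = np.random.default_rng(2)
n1, n2 = 20, 30
u = rng.standard_normal(n1)
v = rng.standard_normal(n2)
X = np.outer(u, v)          # lifted rank-1 matrix

# The leading singular vectors recover u and v up to scale factors.
U, s, Vh = np.linalg.svd(X)
u_hat = U[:, 0] * np.sqrt(s[0])
v_hat = Vh[0, :] * np.sqrt(s[0])

# u_hat v_hat^T equals X exactly, so the factors match the originals
# up to reciprocal scale factors (the inherent scale ambiguity).
assert np.allclose(np.outer(u_hat, v_hat), X)
```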
The subsampled blind deconvolution problem is ill-posed and cannot be solved without restrictive models on the unknown signals. We assume the following signal priors, which are modified from a previous subspace model for blind deconvolution [1].
 A1

Sparsity: The coefficient vector u is sparse. Geometrically, u belongs to the union of all subspaces spanned by a few standard basis vectors. The previous subspace model [1] corresponds to the special case where the subspace in the union that includes u is known a priori. To simplify the notation, define Σ_s = {w : ‖w‖₀ ≤ s},
where ‖w‖₀ counts the number of nonzeros in w. Then, u ∈ Σ_{s₁}. The other coefficient vector v is also sparse, i.e., v ∈ Σ_{s₂}.
 A2

Spectral flatness: The unknown signals x and y are flat in the Fourier domain in the following sense. Define a set by
(3) where μ(·) denotes the spectral flatness level, given by
Then, x and y satisfy the corresponding flatness conditions. When D and C are invertible, this is equivalent to the analogous conditions on the coefficient vectors u and v.¹ ¹For simplicity, we restrict our analysis to the case where D and C are invertible matrices. However, it is straightforward to extend the analysis to the case with overcomplete dictionaries by replacing the inverse by the preimage operator.
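For illustration, one natural way to quantify spectral flatness is the ratio μ(w) = n·max_j |(Fw)_j|² / ‖Fw‖₂², which lies in [1, n] and is smallest for signals whose spectrum is perfectly flat. This particular normalization is an assumption made for the sketch and need not coincide with the definition used in (3):

```python
import numpy as np

n = 64
F = lambda w: np.fft.fft(w) / np.sqrt(n)   # unitary DFT

def flatness(x):
    """Illustrative flatness level: n * max |Fx|^2 / ||Fx||_2^2, in [1, n]."""
    spec = np.abs(F(x)) ** 2
    return n * spec.max() / spec.sum()

spike = np.zeros(n); spike[0] = 1.0   # impulse: perfectly flat spectrum
const = np.ones(n)                    # constant: maximally peaky spectrum

assert np.isclose(flatness(spike), 1.0)   # flattest possible
assert np.isclose(flatness(const), n)     # least flat
```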
Our objective is to show that subsampled blind deconvolution of signals following the aforementioned models is possible at near optimal sample complexity. Similarly to related results in compressed sensing, we take the following two-step approach: i) First, in a companion paper [2], it was shown that stable reconstruction from noisy measurements is available under a restricted isometry property (RIP) of the linear operator 𝒜. In particular, under a mild additional assumption on the signals, a practical algorithm provably achieves stable reconstruction under RIP-like properties of 𝒜. ii) Next, in this paper, we prove that if both dictionaries are mutually independent random matrices whose entries are independent and identically distributed (i.i.d.) following a zero-mean complex normal distribution, then, with high probability, such RIP-like properties hold at near optimal sample complexity. This sample complexity is near optimal (up to a logarithmic factor) when the spectral flatness parameters grow sublinearly in the corresponding dimensions. Combining these results provides the desired near optimal performance guarantees.
1.2 RIP and RIP-like properties
We first review the RIP and extend the notion to RIP-like properties. The RIP was originally proposed to derive performance guarantees for recovery in compressed sensing by ℓ1-norm minimization [3]. It is generalized as follows:
Definition 1.1.
Let H be a Hilbert space equipped with the Hilbert–Schmidt norm. Let C ⊂ H be a centered and symmetric set, i.e., C contains 0 and is invariant to multiplication by any scalar of unit modulus. A linear operator satisfies the RIP if
or equivalently,
Hilbert–Schmidt norms, including the Euclidean norm, are represented as an inner product of a vector with itself. This observation extends the RIP to another property, called the restricted angle-preserving property (RAP), defined as follows:
Definition 1.2.
Let be centered and symmetric sets. A linear operator satisfies the RAP if
In the more restrictive case where the two sets are orthogonal to each other, the RAP reduces to the restricted orthogonality property (ROP) [4].
Definition 1.3.
Let be centered and symmetric sets. A linear operator satisfies the ROP if
The RIP and RAP of a linear operator have useful implications for the corresponding inverse problem. The RIP of 𝒜 implies that 𝒜 is injective when its domain is restricted to the model set; hence, every element of the model set is uniquely identified from its measurements. The RAP was used to show that practical algorithms, such as the projected gradient method, reconstruct the unknown from its measurements with a provable performance guarantee.
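The injectivity claim can be spelled out in one line; the following sketch assumes the RIP holds with constant δ < 1 on the difference set of the model set:

```latex
% If x, x' in the model set \mathcal{C} satisfy \mathcal{A}(x) = \mathcal{A}(x'),
% then x - x' lies in the difference set \mathcal{C} - \mathcal{C}, and the
% lower RIP inequality on that set forces x = x':
(1 - \delta)\,\lVert x - x' \rVert^{2}
  \;\le\; \lVert \mathcal{A}(x - x') \rVert^{2} \;=\; 0
\quad\Longrightarrow\quad x = x'.
```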
By definition, the RAP implies the RIP, but the converse is not true in general. For certain model sets with special structure, however, the RIP does imply the RIP-like properties. For example, when the model set is a subspace, the Minkowski sum of the set with itself coincides with the set itself; therefore, the RIP, the RIP on the sum set, and the RAP are all equivalent. A restrictive set that is a subspace arises in many applications; the set of matrices with Toeplitz, Hankel, circulant, symmetric, or skew-symmetric structure is such an example.
For yet another example, a sparsity model, which corresponds to a union of subspaces, provides the desired relationship between the RIP and RIP-like properties. Let Σ_s be the set of all s-sparse vectors in the Euclidean space. Then, the difference set of Σ_s with itself is contained within Σ_2s (another restrictive set of the same structure but with a twice larger parameter), i.e.,
(4)  Σ_s − Σ_s ⊆ Σ_2s.
Therefore, we have the following implications:

RIP implies RIP.

RIP implies RAP.

RIP implies RAP.
Recall that these RIP-like properties guarantee stable reconstruction of sparse vectors from their measurements by practical algorithms. With the above implications, it suffices to show the RIP at a sparsity level enlarged by a constant factor. This is why the performance guarantees in compressed sensing are typically given in terms of the RIP. The above argument also applies to an abstract atomic sparsity model [5] and to the sparse and rank-1 model [6].
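The inclusion (4) is elementary: the difference of two s-sparse vectors is supported on the union of the two supports, hence has at most 2s nonzeros. A quick numerical confirmation (the helper sparse_vec is ad hoc):

```python
import numpy as np

rng = np.random.default_rng(3)
n, s = 100, 5

def sparse_vec(rng, n, s):
    """Draw an s-sparse vector with a uniformly random support."""
    x = np.zeros(n)
    support = rng.choice(n, size=s, replace=False)
    x[support] = rng.standard_normal(s)
    return x

# The difference of two s-sparse vectors is supported on the union of
# their supports, hence at most 2s-sparse.
for _ in range(100):
    d = sparse_vec(rng, n, s) - sparse_vec(rng, n, s)
    assert np.count_nonzero(d) <= 2 * s
```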
1.3 RIP-like properties in blind deconvolution
Next, we present our main results, which derive RIP-like properties of the linear operator in subsampled blind deconvolution at near optimal sample complexity. In fact, these properties hold for a slightly more general model than an exact sparsity model. To state the main results in this setup, we define a set of approximately sparse vectors by
(5) 
Theorem 1.4.
There exist absolute numerical constants and such that the following holds. Let be independent random matrices whose entries are i.i.d. following . Let be defined in (1).

If , then with probability at least ,
for all and for all .

If , then with probability at least ,
for all and for all .
In the course of proving Theorem 1.4, we also obtain the following corollary, the proof of which is contained in the proof of Theorem 1.4.
Corollary 1.5.
There exist absolute numerical constants and such that the following holds. Let be a random matrix whose entries are i.i.d. following . Let . Let be defined in (1). Suppose that . Then, with probability at least ,
for all and for all .
Theorem 1.6.
There exist absolute numerical constants and such that the following holds. Let be independent random matrices whose entries are i.i.d. following . Let be defined in (1). If , then with probability at least ,
for all , , , and such that and .
Corollary 1.7.
There exist absolute numerical constants and such that the following holds. Let be independent random matrices whose entries are i.i.d. following . Let be defined in (1). If , then with probability at least ,
(6) 
for all , , , and such that either or .
Proof of Corollary 1.7.
The above results combined with their implications in a companion paper [2] provide performance guarantees for subsampled blind deconvolution at near optimal sample complexity of .
Note that Theorem 1.4 derives a sufficient condition respectively for the RAP and the RAP of , where and are defined by
On the other hand, Corollary 1.7 derives a sufficient condition for the ROP of .
The derivations of these RIP-like properties differ significantly from previous RIP analyses in the following respects: i) In general, a restrictive set does not satisfy an inclusion property like (4). The restrictive sets induced from both the sparsity and spectral flatness priors correspond to this case. The nonconvex cone structure induced from a nonnegativity prior is yet another example of this case. Therefore, the RIP-like properties are not directly implied by the corresponding RIP, and it is necessary to derive the RIP-like properties independently. ii) More difficulties arise from the subsampling in the time domain following the convolution. In particular, the random measurement functionals are not mutually independent, which was one of the crucial assumptions in previous RIP analyses. Technically, deriving the RAP in Theorem 1.6 involves bounding the deviation of a fourth-order chaos process. We exploit the total orthogonality assumed in Theorem 1.6 to avoid such a complicated scenario.
Recall that Theorems 1.4 and 1.6 consider an approximate sparsity model that covers a wider set than the set of exactly sparse vectors. In the course of the proofs, we also provide extensions of the conventional RIP analysis of i.i.d. subgaussian and partial Fourier sensing matrices in compressed sensing as side results, which might be of independent interest.
The rest of this paper is organized as follows: In Section 2, we extend the previous work on suprema of chaos processes by Krahmer et al. [7] from a quadratic form to a bilinear form. Key entropy estimates are derived in Section 3 along with their applications to showing the RIP of random matrices for approximately sparse vectors. In Section 4, the proofs for the main theorems are presented. Then, we conclude the paper with discussions.
1.4 Notations
Various norms are used in this paper. The Frobenius norm of a matrix is denoted by ‖·‖_F, and the operator norm from one normed space to another will be denoted accordingly. Absolute constants will be used throughout the paper: certain symbols are reserved for real-valued positive absolute constants, and one symbol is reserved for a positive integer absolute constant. For a matrix, its element-wise complex conjugate, its transpose, and its Hermitian transpose are written with the usual superscripts. For a linear operator between two vector spaces, the superscript asterisk will denote its adjoint operator. The matrix inner product between two matrices A and B is denoted by ⟨A, B⟩. The matrix F will represent the unitary discrete Fourier transform, and ⊛ stands for the circular convolution, where its length is clear from the context. For an index set Ω, the coordinate projection keeps the entries of a vector indexed by Ω and sets the remaining entries to zero. The identity map will be denoted by Id.
2 Suprema of Chaos Processes
2.1 Covering number and dyadic entropy number
Let be convex sets where is a Banach space. The covering number, denoted by , is defined as
The th dyadic entropy number, denoted by , is defined as
Then, the covering number and dyadic entropy number satisfy
(7) 
Indeed, the inequality in (7) is derived as follows:
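For intuition, the relation between the two quantities follows almost directly from the definitions; the following sketch assumes the standard convention in which the k-th dyadic entropy number allows 2^{k−1} covering balls:

```latex
% Assumed convention: e_k(A, B) = \inf\{\varepsilon > 0 : N(A, \varepsilon B) \le 2^{k-1}\}.
% If N(A, \varepsilon B) \le 2^{k-1}, then e_k(A, B) \le \varepsilon by definition.
% Conversely, if e_k(A, B) < \varepsilon, then A is covered by at most
% 2^{k-1} translates of \varepsilon B, so N(A, \varepsilon B) \le 2^{k-1}.
```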
2.2 Subadditivity of the γ₂ functional
Let (T, d) be a metric space. An admissible sequence of T is a collection of subsets {T_r} of T that satisfies |T_0| = 1 and |T_r| ≤ 2^{2^r} for all r ≥ 1. The γ₂ functional [8] is defined by
Lemma 2.1.
Let and be metric spaces embedded in a common vector space. Then,
Proof.
Let and denote admissible sequences for and , respectively. Define by and for . Then, for all , and satisfies and for all . This implies that is an admissible sequence of . By the definition of the functional, we have
where the second inequality holds because the metric satisfies the triangle inequality. Since the choice of admissible sequences and was arbitrary, by taking the infimum with respect to and , we get the desired inequality. ∎
2.3 Suprema of chaos processes: bilinear forms
Krahmer et al. [7] showed the concentration of a subgaussian quadratic form.
Theorem 2.2 ([7, Theorem 3.1]).
Let ξ be an L-subgaussian vector. Then, for every t > 0,
where c₁ and c₂ are constants that only depend on L, and E, V, and U are given by
Our main observation here is that a simple application of the polarization identity extends the concentration result of Krahmer et al. [7] from a subgaussian quadratic form to a subgaussian bilinear form. Note that a quadratic form is a special case of a bilinear form.
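Concretely, for the inner-product convention ⟨u, w⟩ = Σ_j ū_j w_j, the polarization identity expresses the bilinear form as a combination of four quadratic forms, each of which is covered by Theorem 2.2:

```latex
\langle A\xi,\, B\xi \rangle
  \;=\; \frac{1}{4} \sum_{k=0}^{3} i^{-k}\,
        \bigl\lVert (A + i^{k} B)\,\xi \bigr\rVert_2^{2}.
```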
Theorem 2.3.
Let ξ be an L-subgaussian vector. Then, for every t > 0,
where c₁ and c₂ are constants that only depend on L, and E, V, and U are defined in Theorem 2.2.
Proof of Theorem 2.3.
The main result in [7, Theorem 3.5] states that, for a collection of self-adjoint matrices,
(8) 
where the terms are defined by
(9)  
By the polarization identity and the subadditivity of the quantities in (9) with respect to the Minkowski sum (Lemma 2.4), we extend [7, Theorem 3.5] to the bilinear case, which is summarized in Lemma 2.5.
The next step of applying Markov's inequality to the moments, as in the proof of Theorem 2.2, applies here without modification, which completes the proof. ∎
Lemma 2.4.
Let be as defined in (9). For every complex number of unit modulus,
Proof.
By the triangle inequality, we have and . Moreover, Lemma 2.1 implies
The assertion follows by applying these results to the definition of . ∎
Lemma 2.5.
Let ξ be an L-subgaussian vector. Then, for every t > 0,
3 Key Entropy Estimates
In this section, we derive entropy estimates (Lemmas 3.2 and 3.6), which are key components in the proofs of the main results in Section 4. These lemmas also extend the previous RIP results on certain random matrices to the case where the linear operator is restricted to the set of compressible vectors instead of exactly sparse vectors.
The restricted isometry property of a subgaussian matrix and a partial Fourier matrix has been well studied in the compressed sensing literature. The restrictive model in these studies was the standard sparsity model, which consists of exactly sparse vectors in . We will derive Lemmas 3.2 and 3.6 in the course of extending the previously known RIP of random matrices to the RIP, where the set of approximately sparse vectors is defined in (5).
3.1 Subgaussian linear operator
We start with a subgaussian matrix , whose entries are i.i.d. following . Several derivations of the RIP of have been presented (cf. [3, 9, 7]). For example, the recent result by Krahmer et al. [7] is summarized as follows:
Theorem 3.1 ([7, Theorem C.1]).
A subgaussian matrix satisfies RIP with probability at least if
Earlier proofs [3, 9] consist of the following two steps: i) For any fixed support of a given size, the corresponding submatrix, with the columns indexed by that support, has its singular values concentrated around 1 except with exponentially small probability; ii) An upper bound on the probability of violation under the worst-case choice of support, obtained by a union bound, still remains small. The first step was shown either by a large deviation result [10] or by a standard volume argument together with the concentration of a subgaussian quadratic form. It is not straightforward to extend these approaches to the case where the restriction set includes approximately sparse vectors. Recently, Krahmer et al. [7, Appendix C] proposed an alternative derivation of the RIP of a subgaussian matrix. They derived a Dudley-type upper bound on the relevant functional of the set of sparse vectors within the unit ball, given by
(10) 
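Step i) of the two-step argument above can be reproduced empirically: the singular values of a Gaussian submatrix on a fixed support concentrate around 1 once the number of rows sufficiently exceeds the sparsity level. The dimensions below are illustrative:

```python
import numpy as np

rng = np.random.default_rng(4)
n, s, m = 2000, 10, 400

# An m x n Gaussian matrix normalized so that column norms are near 1.
A = rng.standard_normal((m, n)) / np.sqrt(m)

# For random supports of size s, all singular values of the m x s
# submatrix should stay in a narrow band around 1 when m >> s.
deviations = []
for _ in range(50):
    support = rng.choice(n, size=s, replace=False)
    sv = np.linalg.svd(A[:, support], compute_uv=False)
    deviations.append(max(abs(sv.max() - 1), abs(sv.min() - 1)))

# All restricted singular values stay within a modest band around 1.
assert max(deviations) < 0.5
```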
We extend their result in (10) to the approximately sparse case, which is stated in the following lemma.
Lemma 3.2.
Remark 3.3.
Proof of Lemma 3.2.
Since , we have
(11)  
where the second step holds by the change of variables, and the third step follows from (7).
Note that is of type if and of type 2 if . Furthermore, is a contraction. Therefore, Maurey's empirical method (cf. [11, Proposition 2], [12]) implies
where is defined by
Let denote the unique solution to . Then, . The following cases for cover all possible scenarios.
 Case 1:

If , then
 Case 2:

If , then
 Case 3:

If , then since for , we have
Therefore,
which implies
(12) 
For , we use the standard volume argument to get
Indeed, by the standard volume argument ([13, Lemma 1.7]), we have
which implies
Therefore,