RIP-like Properties in Subsampled Blind Deconvolution

RIP-like Properties in Subsampled Blind Deconvolution

Kiryung Lee and Marius Junge
Abstract

We derive near optimal performance guarantees for subsampled blind deconvolution. Blind deconvolution is an ill-posed bilinear inverse problem and additional subsampling makes the problem even more challenging. Sparsity and spectral flatness priors on unknown signals are introduced to overcome these difficulties. While being crucial for deriving desired near optimal performance guarantees, unlike the sparsity prior with a nice union-of-subspaces structure, the spectral flatness prior corresponds to a nonconvex cone structure, which is not preserved by elementary set operations. This prohibits the operator arising in subsampled blind deconvolution from satisfying the standard restricted isometry property (RIP) at near optimal sample complexity, which motivated us to study other RIP-like properties. Combined with the performance guarantees derived using these RIP-like properties in a companion paper, we show that subsampled blind deconvolution is provably solved at near optimal sample complexity by a practical algorithm.

1 Introduction

1.1 Subsampled blind deconvolution of sparse signals

The subsampled blind deconvolution problem refers to the resolution of two signals from a few samples of their convolution and is formulated as a bilinear inverse problem as follows. Let denote the set of sampling indices out of . Given , the sampling operator is defined so that the th element of is the th element of for . Then, the samples of the convolution indexed by with additive noise constitute the measurement vector , which is expressed as

where denotes additive noise.

Let be uniquely represented as and over dictionaries and . Then, the recovery of is equivalent to the recovery of , and the subsampled blind deconvolution problem corresponds to the bilinear inverse problem of recovering from its bilinear measurements in , when , , and are known.

A stable reconstruction in subsampled blind deconvolution is defined through the lifting procedure [1] that converts the blind deconvolution to recovery of a rank-1 matrix from its linear measurements. By the lifting procedure, bilinear measurements of are equivalently rewritten as linear measurements of the matrix , i.e., there is a linear operator such that

Then, each element of the measurement vector corresponds to a matrix inner product. Indeed, there exist matrices that describe the action of on by

(1)

Since the circular convolution corresponds to the element-wise product in the Fourier domain, ’s are explicitly expressed as

where denotes the th column of the unitary DFT matrix . The subsampled blind deconvolution problem then becomes a matrix-valued linear inverse problem where the unknown matrix is constrained to the set of rank-1 matrices.

In the lifted formulation, a reconstruction of the unknown matrix is considered successful if it satisfies the following stability criterion:

(2)

for an absolute constant . This definition of success is free of the inherent scale ambiguity in the original bilinear formulation. Once is recovered, (resp. ) is identified up to a scale factor as the left (resp. right) factor of the rank-1 matrix .

The subsampled blind deconvolution problem is ill-posed and cannot be solved without restrictive models on unknown signals. We assume the following signal priors, which are modified from a previous subspace model for blind deconvolution [1].

A1

Sparsity: The coefficient vector is -sparse. Geometrically, belongs to the union of all subspaces spanned by standard basis vectors. The previous subspace model [1] corresponds to a special case where the subspace in the union that includes is known a priori. To simplify the notation, define

where counts the number of nonzeros in . Then, . The other coefficient vector is -sparse, i.e., .

A2

Spectral flatness: The unknown signals and are flat in the Fourier domain as follows. Define a set by

(3)

where denotes the spectral flatness level of given by

Then, and . When and are invertible, it is equivalent to and .111For simplicity, we restrict our analysis to the case where and are invertible matrices. However, it is straightforward to extend the analysis to the case with overcomplete dictionaries by replacing the inverse by the preimage operator.

Our objective is to show that the subsampled blind deconvolution of signals following the aforementioned models is possible at near optimal sample complexity. Similarly to related results in compressed sensing, we take the following two-step approach: i) First, in a companion paper [2], it was shown that stable reconstruction from noisy measurements is available under a restricted isometry property (RIP) of the linear operator . In particular, under a mild additional assumption on signals, we show that a practical algorithm provably achieves stable reconstruction under RIP-like properties of ; ii) Next, in this paper, we prove that if both dictionaries are mutually independent random matrices whose entries are independent and identically distributed (i.i.d.) following a zero-mean and complex normal distribution , with high probability, such RIP-like properties hold at the sample complexity of . This sample complexity is near optimal (up to a logarithmic factor) when the spectral flatness parameters and are sublinear in and , respectively; Combining these results provides the desired near optimal performance guarantees.

1.2 RIP and RIP-like properties

We first review RIP and extend the notion to RIP-like properties. RIP was originally proposed to show the performance guarantee for the recovery in compressed sensing by -norm minimization [3]. It is generalized as follows:

Definition 1.1.

Let be a Hilbert space where denotes the Hilbert-Schmidt norm. Let be a centered and symmetric set, i.e., and for all in the unit modulus. A linear operator satisfies the -RIP if

or equivalently,

Hilbert-Schmidt norms, including the norm, are represented as an inner product of a vector with itself. For example, and . This observation extends RIP to another property called restricted angle-preserving property (RAP) defined as follows:

Definition 1.2.

Let be centered and symmetric sets. A linear operator satisfies the -RAP if

In a more restrictive case with orthogonality between and (), RAP reduces to the restricted orthogonality property (ROP) [4].

Definition 1.3.

Let be centered and symmetric sets. A linear operator satisfies the -ROP if

RIP and RAP of a linear operator have useful implications for the inverse problem given by . Let . The -RIP of implies that is injective when the domain is restricted to ; hence, every is uniquely identified from . The -RAP was used to show that practical algorithms, such as the projected gradient method, reconstruct from with a provable performance guarantee.

By definition, the -RAP implies the -RIP, but the converse is not true in general. For certain with special structures, RIP implies RIP-like properties. For example, when is a subspace, the Minkowski sum of and coincides with . Therefore, -RIP, -RIP, and -RAP are all equivalent. The restrictive set as a subspace arises in many applications. A set of matrices with Toeplitz, Hankel, circulant, symmetric, or skew symmetric structure corresponds to such an example.

Yet for another example, a sparsity model, which corresponds to a union of subspaces, provides the desired relationship between RIP and RIP-like properties. Let be the set with all -sparse vectors in the Euclidean space. Then, it follows that the difference set between and itself is contained within (another restrictive set of the same structure but with a twice larger parameter), i.e.,

(4)

Therefore, we have the following implications:

  • -RIP implies -RIP.

  • -RIP implies -RAP.

  • -RIP implies -RAP.

Recall that these RIP-like properties guarantee stable reconstruction of -sparse vectors from by practical algorithms. With the above implications, it suffices to show -RIP for . This is why the performance guarantees in compressed sensing are typically given in terms of -RIP. The above argument also applies to an abstract atomic sparsity model [5] and to the sparse and rank-1 model [6].

1.3 RIP-like properties in blind deconvolution

Next, we present our main results that derive RIP-like properties of the linear operator in subsampled blind deconvolution at near optimal sample complexity. In fact, these properties hold for a slightly more general model than an exact sparsity model. To state the main results in this setup, we define a set of approximately -sparse vector by

(5)
Theorem 1.4.

There exist absolute numerical constants and such that the following holds. Let be independent random matrices whose entries are i.i.d. following . Let be defined in (1).

  1. If , then with probability at least ,

    for all and for all .

  2. If , then with probability at least ,

    for all and for all .

In the course of proving Theorem 1.4, we also obtain the following corollary, the proof of which is contained in the proof of Theorem 1.4.

Corollary 1.5.

There exist absolute numerical constants and such that the following holds. Let be a random matrix whose entries are i.i.d. following . Let . Let be defined in (1). Suppose that . Then, with probability at least ,

for all and for all .

Theorem 1.6.

There exist absolute numerical constants and such that the following holds. Let be independent random matrices whose entries are i.i.d. following . Let be defined in (1). If , then with probability at least ,

for all , , , and such that and .

Corollary 1.7.

There exist absolute numerical constants and such that the following holds. Let be independent random matrices whose entries are i.i.d. following . Let be defined in (1). If , then with probability at least ,

(6)

for all , , , and such that either or .

Proof of Corollary 1.7.

If suffices to consider the case where . Due to the homogeneity of (6), without loss of generality, we may assume . Decompose as . Then, for satisfying .

where the second step follows from Theorems 1.4 and 1.6. ∎

The above results combined with their implications in a companion paper [2] provide performance guarantees for subsampled blind deconvolution at near optimal sample complexity of .

Note that Theorem 1.4 derives a sufficient condition respectively for the -RAP and the -RAP of , where and are defined by

On the other hand, Corollary 1.7 derives a sufficient condition for the -ROP of .

The derivations of these RIP-like properties are significantly different from the previous RIP analyses in the following senses: i) In general, a restrictive set does not satisfy an inclusion property like (4). The restrictive sets and , induced from both the sparsity and spectral flatness, correspond to this case. The non-convex cone structure induced from a nonnegativity prior is yet another example for this case. Therefore, RIP-like properties are not directly implied by the corresponding RIP, and it is necessary to derive RIP-like properties independently. ii) More difficulties arise from the subsampling in the time domain following the convolution. In particular, the random measurement functionals are not mutually independent, which was one of the crucial assumptions in previous RIP analyses. Technically, deriving the -RAP in Theorem 1.6 involves bonding the deviation of a fourth-order chaos process. We exploit the total orthogonality assumed in Theorem 1.6 to avoid such a complicated scenario.

Recall that Theorems 1.4 and 1.6 consider an approximate sparsity model that covers a wider set than the set of exactly -sparse vectors. During the proofs, we also provide extensions of conventional RIP analysis of an i.i.d. subgaussian sensing matrix and partial Fourier sensing matrix in compressed sensing as side results, which might be of independent interest.

The rest of this paper is organized as follows: In Section 2, we extend the previous work on suprema of chaos processes by Krahmer et al. [7] from a quadratic form to a bilinear form. Key entropy estimates are derived in Section 3 along with their applications to showing the RIP of random matrices for approximately sparse vectors. In Section 4, the proofs for the main theorems are presented. Then, we conclude the paper with discussions.

1.4 Notations

Various norms are used in this paper. The Frobenius norm of a matrix is denoted by . The operator norm from to will be . Absolute constants will be used throughout the paper. Symbols are reserved for real-valued positive absolute constants. Symbols is a positive integer absolute constant. For a matrix , its element-wise complex conjugate, its transpose, and its Hermitian transpose are respectively written as , , and . For a linear operator between two vector spaces, will denote its adjoint operator. The matrix inner product between two matrices and is denoted by . Matrix will represent the unitary discrete Fourier transform and stands for the circular convolution where its length is clear from the context. We will use the shorthand notation . Let . Then, denotes the coordinate projection whose action on a vector keeps the entries of indexed by and sets the remaining entries to zero. The identity map on will be denoted by .

2 Suprema of Chaos Processes

2.1 Covering number and dyadic entropy number

Let be convex sets where is a Banach space. The -covering number, denoted by , is defined as

The th dyadic entropy number, denoted by , is defined as

Then, the covering number and dyadic entropy number satisfy

(7)

Indeed, the inequality in (7) is derived as follows:

2.2 Subadditivity of functional

Let be a metric space. An admissible sequence of , denoted by , is a collection of subsets of that satisfies and for all . The functional [8] is defined by

Lemma 2.1.

Let and be metric spaces embedded in a common vector space. Then,

Proof.

Let and denote admissible sequences for and , respectively. Define by and for . Then, for all , and satisfies and for all . This implies that is an admissible sequence of . By the definition of the functional, we have

where the second inequality holds because the metric satisfies the triangle inequality. Since the choice of admissible sequences and was arbitrary, by taking the infimum with respect to and , we get the desired inequality. ∎

2.3 Suprema of chaos processes: bilinear forms

Krahmer et al. [7] showed the concentration of a subgaussian quadratic form.

Theorem 2.2 ([7, Theorem 3.1]).

Let be an -subgaussian vector with . Let . Then for ,

where and are constants that only depend on , and , , and are given by

Our main observation here is that a simple application of the polarization identity provides the extension of the concentration result by Krahmer et al. [7] from a subgaussian quadratic form to a subgaussian bilinear form. Note that a quadratic form is a special case of a bilinear form.

Theorem 2.3.

Let be an -subgaussian vector with . Let . Then for ,

where and are constants that only depend on , and , , and are defined in Theorem 2.2.

Proof of Theorem 2.3.

The main result in [7, Theorem 3.5] states that for a collection of self-adjoint matrices

(8)

where the terms is defined by

(9)

By the polarization identity and the subadditivity of with respect to the Minkowski sum (Lemma 2.4), we extend [7, Theorem 3.5] to the bilinear case, which is summarized in Lemma 2.5.

The next step of applying Markov’s inequality to the th moment in the proof of Theorem 2.2 applies here without modification, which competes the proof. ∎

Lemma 2.4.

Let be as defined in (9). For every complex number of unit modulus,

Proof.

By the triangle inequality, we have and . Moreover, Lemma 2.1 implies

The assertion follows by applying these results to the definition of . ∎

Lemma 2.5.

Let be an -subgaussian vector with . Let . Then for every ,

Proof of Lemma 2.5.

By the polarization identity, we have

Now the triangle inequality in (for ) implies the assertion in combination with Lemma 2.4. ∎

3 Key Entropy Estimates

In this section, we derive entropy estimates (lemmas 3.2 and 3.6), which are key components in the proofs of the main results in Section 4. These lemmas also extend the previous RIP results on certain random matrices to the case where the linear operator is restricted to the set of compressible vectors instead of exactly sparse vectors.

The restricted isometry property of a subgaussian matrix and a partial Fourier matrix has been well studied in the compressed sensing literature. The restrictive model in these studies was the standard sparsity model, which consists of exactly -sparse vectors in . We will derive Lemmas 3.2 and 3.6 in the course of extending the previously known -RIP of random matrices to the -RIP, where the set of approximately -sparse vectors is defined in (5).

3.1 Subgaussian linear operator

We start with a subgaussian matrix , whose entries are i.i.d. following . Several derivations of the -RIP of have been presented (cf. [3, 9, 7]). For example, the recent result by Krahmer et al. [7] is summarized as follows:

Theorem 3.1 ([7, Theorem C.1]).

A subgaussian matrix satisfies -RIP with probability at least if

Earlier proofs [3, 9] consist of the following two steps: i) For any with , the corresponding submatrix , with columns of indexed by , has its singular values concentrated within except with exponentially small probability; ii) An upper bound on the probability for the violation () with the worst case choice of , obtained by a union bound, still remains small. The first step was shown either by the large deviation result [10] or by a standard volume argument together with the concentration of a subgaussian quadratic form. It is not straightforward to extend these approaches to the case where the restriction set includes approximately -sparse vectors. Recently, Krahmer et al. [7, Appendix C] proposed an alternative derivation of the -RIP of a subgaussian matrix . They derived a Dudley-type upper bound on the function of (the set of -sparse vectors within the unit ball) given by

(10)

We extend their result in (10) to the approximately sparse case, which is stated in the following lemma.

Lemma 3.2.
Remark 3.3.

Lemma 3.2 provides an upper bound of the function of a larger set , consisting of approximately -sparse vectors, instead of the set of exactly -sparse unit vectors. On the other hand, unlike the upper bound in (10), the bound in Lemma 3.2 is suboptimal, but only by a logarithmic factor.

Proof of Lemma 3.2.

Since , we have

(11)

where the second step holds by the change of variables, and the third step follows from (7).

Note that is of type- if and of type-2 if . Furthermore, is a contraction, Therefore, Maurey’s empirical method (cf. [11, Proposition 2], [12]) implies

where is defined by

Let denote the unique solution to . Then, . The following cases for cover all possible scenarios.

Case 1:

If , then

Case 2:

If , then

Case 3:

If , then since for , we have

Therefore,

which implies

(12)

For , we use the standard volume argument to get

Indeed, by the standard volume argument ([13, Lemma 1.7]), we have

which implies

Therefore,