Guaranteed Blind Sparse Spikes Deconvolution via Lifting and Convex Optimization

# Guaranteed Blind Sparse Spikes Deconvolution via Lifting and Convex Optimization

Yuejie Chi,  Copyright (c) 2014 IEEE. Personal use of this material is permitted. However, permission to use this material for any other purposes must be obtained from the IEEE by sending a request to pubs-permissions@ieee.org.The author is with Department of Electrical and Computer Engineering and Department of Biomedical Informatics, The Ohio State University, Columbus, Ohio 43210. This work is supported in part by ONR under grant N00014-15-1-2387 and NSF under grant CCF-1527456. Email: chi.97@osu.edu. Preliminary results of this paper were presented in part at the Asilomar Conference on Signals, Systems, and Computers (Asilomar), Pacific Grove, CA, Nov. 2015.
###### Abstract

Neural recordings, returns from radars and sonars, images in astronomy and single-molecule microscopy can be modeled as a linear superposition of a small number of scaled and delayed copies of a band-limited or diffraction-limited point spread function, which is either determined by the nature or designed by the users; in other words, we observe the convolution between a point spread function and a sparse spike signal with unknown amplitudes and delays. While it is of great interest to accurately resolve the spike signal from as few samples as possible, however, when the point spread function is not known a priori, this problem is terribly ill-posed. This paper proposes a convex optimization framework to simultaneously estimate the point spread function as well as the spike signal, by mildly constraining the point spread function to lie in a known low-dimensional subspace. By applying the lifting trick, we obtain an underdetermined linear system of an ensemble of signals with joint spectral sparsity, to which atomic norm minimization is applied. Under mild randomness assumptions of the low-dimensional subspace as well as a separation condition of the spike signal, we prove the proposed algorithm, dubbed as AtomicLift, is guaranteed to recover the spike signal up to a scaling factor as soon as the number of samples is large enough. The extension of AtomicLift to handle noisy measurements is also discussed. Numerical examples are provided to validate the effectiveness of the proposed approaches.

{keywords}

blind spikes deconvolution, lifting, atomic norm, joint spectral sparsity

## I Introduction

In many applications, the goal is to estimate the set of delays and amplitudes of point sources contained in a sparse spike signal from its convolution with a band-limited or diffraction-limited point spread function (PSF) , which is either determined by the nature or designed by the users. This describes the problem of estimating target locations in radar and sonar, firing times of neurons, direction-of-arrivals in array signal processing, etc.

When the PSF is assumed perfectly known, many algorithms have been developed to retrieve the spike signal, ranging from subspace methods such as MUSIC [1] and ESPRIT [2] to total variation minimization [3]. However, in many applications, the PSF is not known a priori, and must be estimated together with the spike model, referred to as blind spikes deconvolution. As an example, in neural spike train decoding, the characteristic function of the neurons is determined by the nature and needs to be calibrated [4]. As another example, in blind channel estimation for wireless communications, the transmitted signal is modulated by unknown data symbols, therefore the receiver has to perform joint channel estimation and symbol decoding. A related problem is blind calibration of uniform linear arrays [5], where it is desirable to calibrate the gains of the array antennas in a blind fashion.

Broadly speaking, blind deconvolution of two signals from their convolution falls into the category of bilinear inverse problems, which is in general ill-posed without further constraints. The identifiability, up to an unavoidable scaling ambiguity, of these problems has recently been investigated in [6][8] under the constraints that one or both signals are sparse or lie in some known subspace. Along the algorithmic line of research, conventional approaches for blind deconvolution are typically based on expectation maximization [5, 9], which often suffer from local minima and lack performance guarantees. Recently, Ahmed, Recht and Romberg developed a provably-correct algorithm for blind deconvolution by assuming both signals lie in some known low-dimensional subspaces [10] under certain conditions. The key in their approach is the so-called lifting trick that translates the problem into an under-determined linear system with respect to a lifted rank-one matrix, which can be exactly recovered using a nuclear norm minimization algorithm. Ling and Strohmer [11] further extended this framework by allowing one of the signal to be sparse in a known dictionary, and applied -norm minimization to the lifted sparse matrix, for which sufficient conditions for exact recovery are also provided. Finally, Lee et. al. proposed an alternating minimization framework [12] to the case when both signals are sparse in a known dictionary, and established convergence guarantees from a near-optimal number of samples under some conditions.

### I-a Our Contributions

In this paper, we study the problem of blind spikes deconvolution, where we want to jointly estimate the PSF and the spike signal composed of a small number of delayed and scaled Dirac functions. Since it is more convenient to work in the Fourier domain, we start by sampling the Fourier transform of the convolution, giving rise to a measurement vector , where denotes point-wise product, is the sampled Fourier transform of the PSF, is the sampled Fourier transform of the spike signal, which is a sum of complex sinusoids with frequencies determined by the corresponding delays, is the number of spikes, and is an additive noise term. Our problem is to recover the set of spikes contained in from the possibly noisy observation .

Motivated by [10], we assume that the PSF lies in a known low-dimensional subspace , i.e. , , where the orientation of in the subspace, given by , still needs to be estimated. This assumption is quite flexible and holds, at least approximately, in a sizable number of applications [10]. Through a novel application of the lifting trick, we show now it is possible to translate the measurement vector into a set of linear measurements with respect to the matrix . While it is tempting to directly recover from the obtained linear system of equations, it is under-determined since we have more unknowns, , than the number of observations, . Fortunately, note that the columns of can be regarded as an ensemble of spectrally-sparse signals with the same spectral support, it is therefore possible to motivate this structure in the solution using the recently proposed atomic norm for spectrally-sparse ensembles [13, 14]. Specifically, we seek the matrix with minimum atomic norm that satisfies the set of linear measurements exactly in the noiseless setting, and within the noise level in the noisy setting. The proposed algorithm is referred to as AtomicLift. AtomicLift can be efficiently implemented via semidefinite programming using off-the-shelf solvers. Moreover, the spikes can be localized by identifying the peaks of a dual polynomial constructed from the dual solution of AtomicLift. Numerical examples are provided to demonstrate the effectiveness of AtomicLift in both noiseless and noisy settings.

To establish rigorous performance guarantees of AtomicLift in the noiseless case, we assume that each row of is identically and independently drawn from a distribution that obeys a simple isotropy property and an incoherence property, which is motivated by Candès and Plan in their development of a RIPless theory of compressed sensing [15]. This implies the PSF to have certain “spectral-flatness” property, so that the PSF has on average the same energy at different frequencies. Moreover, this assumption is flexible to allow the entries in each row of to be correlated. On the other hand, we assume the minimum separation between spikes is at least , where . This condition is the same as the requirement in [3, 16] even when the PSF is known perfectly. Under these conditions, we show that in the noiseless setting, with high probability, AtomicLift recovers the spike signal model up to a scaling factor as soon as is on the order of up to logarithmic factors. Importantly, our result does not make randomness assumptions on the spike signal nor the orientation of the PSF in the subspace . Recall that when the PSF is known exactly, it is capable to resolve spikes as soon as is on the order of . Therefore, when both and are not too large, AtomicLift is provably capable of blind spikes deconvolution at a price of more samples. The stability analysis of AtomicLift in the noisy setting is presented elsewhere [17] due to space limits.

Our proof is based on constructing a valid vector-valued dual polynomial that certifies the optimality of the proposed convex optimization algorithm with high probability. The construction is inspired by [3, 16], where the squared Fejer’s kernel is an essential building block in the construction. Nonetheless, significant, and nontrivial, modifications are necessary since our dual polynomial is vector-valued rather than scalar-valued as in the existing works, and is additionally complicated by the special linear operator induced from lifting.

### I-B Comparisons with Related Work

Our approach is inspired by the pioneering work of [10, 11, 18], which applied the lifting trick to quadratic and bilinear inverse problems such as phase retrieval and blind deconvolution. In [10], both the PSF and the signal are assumed lying in some known subspaces with dimension and respectively. It is established in [10] that a nuclear norm minimization algorithm achieves exact recovery from a near-optimal number of samples up to logarithmic factors, as long as the subspace of is deterministic and satisfies certain spectral-flatness condition, and the subspace of is composed of i.i.d. Gaussian entries. Unfortunately, this algorithm cannot be applied in our setting, as the signal does not lie in a known subspace, but rather an unknown subspace parameterized by the continuous-valued locations of the spikes.

In [11], Ling and Strohmer extended the framework in [10] to allow the signal to be a -sparse vector in a random Gaussian or random partial DFT matrix. It is established in [11] that an -minimization algorithm achieves exact recovery as soon as is on the order of up to logarithmic factors. If the locations of the spikes in lies on the grid of the DFT frame, becomes a sparse vector in the DFT frame, it is possible to apply the algorithm proposed in [11]. However, the performance guarantees in [11] cannot be applied. Moreover, since the locations of the spikes do not necessarily lie on any a priori defined grid, it will encounter the basis mismatch issue discussed extensively in [19] that potentially results in significant performance degeneration. The same holds true for the algorithm of Lee et. al. [12] which assumes is sparse in a pre-determined dictionary.

Finally, our work is related to recent advances in super-resolution [3, 16, 20][23] using total variation or atomic norm minimization, but significantly deviates from the existing literature since we focus on the more challenging case when the PSF is not known. To the best of the author’s knowledge, our work provides the first algorithm for blind super-resolution with provable performance guarantees.

### I-C Paper Organization

The rest of the paper is organized as follows. Section II formulates the problem of blind spikes deconvolution, describes the proposed AtomicLift algorithm and its performance guarantees. Section III provides numerical examples to demonstrate the performance of AtomicLift. Section IV proves the main theorem in this paper. Finally, we conclude and outline a few future directions in Section V.

Throughout the paper, we use boldface capital letters to denote matrices and vectors , to denote the transpose, to denote the conjugate transpose, and to denote the conjugate. , denote the Frobenius norm and the spectral norm of a matrix respectively, and denotes the norm of a vector .

## Ii AtomicLift for Blind Sparse Spikes Deconvolution

We will first describe the problem formulation and the AtomicLift algorithm in the noiseless case, and then extends to the noisy case.

### Ii-a Problem Formulation

Let be a continuous-time spike signal given as

 x(t)=K∑k=1¯akδ(t−¯τk),

where is the number of spikes, and are the complex amplitude and delay of the th spike, , and is the maximum allowable delay of the spikes. Let be the PSF with the bandwidth . The convolution of and is given as

 y(t) =x(t)∗g(t)=K∑k=1¯akg(t−¯τk), (1)

where denotes convolution. Taking the Fourier transform of (1), we have

 Y(f)=X(f)G(f)=(K∑k=1¯ake−j2πf¯τk)⋅G(f) (2)

for , where , , and are the Fourier transforms of , , and respectively. In order to digitally process the output, we will uniformly sample (2) at points , , and denote , , and . This yields

 yn =xn⋅gn=(K∑k=1¯ake−j2πnτkBmaxTmax2M)⋅gn, (3)

where is the normalized delay. From (3), it is straightforward to see that the number of samples needs to satisfy so that the delays can be uniquely identified. Since we’re interested in algorithmic frameworks that allow identification of the spike signal using as small as possible, without loss of generality, we will assume , and consider the normalized delays in this paper, and the sample complexity becomes the bandwidth of .

We can now rewrite (3) as

 yn=gn⋅xn=gn⋅(K∑k=1¯ake−j2πnτk), (4)

where , for . Denote the set of spike locations as . In the matrix form, we rewrite (4) as

 y=diag(g)x, (5)

where , , and . Interestingly, (5) is also related to blind calibration for uniform linear arrays, where can be interpreted as the vector of array antenna gains.

Clearly, there is an unavoidable scaling ambiguity for the identification of and from , since for any nonzero scalar , . Our goal is to recover both and , in particular, the set of spikes with their corresponding amplitudes up to a scaling factor.

### Ii-B AtomicLift

The problem of blind spikes deconvolution is extremely ill-posed without further constraints [6]. In this paper, inspired by [10], we make the assumption that lies in a known low-dimensional subspace, given as

 g=Bh,

where is known, is unknown, and . Denote , where is the th column of the matrix . We rewrite in (4) as

 yn=bTnhxn=bTnheTnx=eTn(xhT)bn, (6)

where is the th standard basis vector of . Let , using the lifting trick [10, 11, 18], (6) can be rewritten as a linear measurement of ,

 yn=eTnZ⋆bn=⟨Z⋆,enbHn⟩,n=−2M,…,2M,

where . Therefore, can be regarded a set of linear measurements of , i.e.

 y=X(Z⋆), (7)

where denotes the operator that performs the linear mapping (6).

Since from , we can recover and , up to a scaling factor, as the left and right singular vectors of , respectively, we now wish to recover the matrix from . Though it appears the number of unknowns is much more than the number of measurements, a key observation is that can be regarded as a signal ensemble where each column signal is composed of complex sinusoids with the same frequencies,

 Z⋆=xhT=K∑k=1akc(τk)hT∈CN×L,

where , and

 c(τ)=1√N[e−j2π(−2M)τ,…,1,…,e−j2π(2M)τ]T

represents a complex sinusoid with the frequency . Therefore, it is possible to motivate the joint spectral sparsity of the columns of by minimizing the atomic norm [24] for joint spectrally-sparse ensembles proposed by the author in [13]. To proceed, define the set of atoms as

 A={A(τ,u)=c(τ)uH∈CN×L|τ∈[0,1),∥u∥2=1}.

The atomic norm seeks the tightest convex relaxation of decomposing a matrix into the smallest number of atoms in , and is defined as [13]

 ∥Z∥A=inf{t>0:Z∈tconv(A)} (8) =infτk∈[0,1)uk∈CL:∥uk∥2=1{∑kck∣∣Z=∑kckA(τk,uk),ck≥0},

where is the convex hull of . The corresponding decomposition of that achieves the atomic norm is called the atomic decomposition. Moreover, admits an equivalent semidefinite programming (SDP) characterization [13] which can be computed efficiently using off-the-shelf solvers:

 ∥Z∥A=infu∈CN,W∈CL×L{ 12Tr(toep(u))+12Tr(W)∣∣ [toep(u)ZZHW]⪰0},

where is the Toeplitz matrix with as the first column. We then propose the following algorithm, denoted as AtomicLift, to motivate the joint spectral sparsity of by seeking the matrix with the smallest atomic norm satisfying the measurements:

 ^Z=argminZ∈CN×L∥Z∥As. t.y=X(Z). (9)

### Ii-C Performance Guarantee of AtomicLift

The main result of this paper is that if we assume the rows of the subspace are i.i.d. drawn from some distribution that satisfies the isotropy property and the incoherence property, together with a mild separation condition for the spike signal, the proposed AtomicLift algorithm provably recovers the PSF as well as the spike signal, up to a scaling ambiguity, with high probability as long as is large enough.

Specifically, we assume each row of is sampled independently and identically from a population , i.e. , . Furthermore, we require satisfies the following properties:

• Isotropy property: is said to satisfy the isotropy property if for ,

 EbbH=IL.
• Incoherence property: for , define the coherence parameter of as the smallest number that

 max1≤i≤L|bi|2≤μ

holds.

The above definitions are motivated by [15] in the development of a RIPless theory of compressed sensing. In particular, the incoherence parameter is a deterministic bound on the maximum entry of , which can be extended to a stochastic setting using the stochastic incoherence discussed in [15], so that is allowed to be composed of unbounded sub-Gaussian or sub-exponential distributions. It is possible to extend our results to the stochastic setting in [15], but for conciseness in this paper, we’ll limit ourselves to the deterministic setting.

We discuss the implications of the above properties for the PSF when is arbitrary. Following the isotropy property, we have for all , which means that on average, the PSF is “spectrally flat”, having the same energy across different frequencies. Also, following the isotropy property, where the lower bound can be met, for example by selecting to be in the form of

 b=[1,ej2πf,…,ej2π(L−1)f], (10)

where is chosen uniformly at random in .

Furthermore, define the minimum separation of the spike signal as

 Δ=min1≤i

which is evaluated as the wrap-around distance on the unit circle. The performance guarantee of AtomicLift is presented in Theorem 1, which is proven in Section IV.

###### Theorem 1.

Let . Assume lies in a random subspace whose rows are sampled i.i.d. from a population satisfying the isotropy property and the incoherence property, with the coherence parameter . If , then there exists a numerical constant such that

 M≥CμK2L2log2(Mδ)

is sufficient to guarantee that we can recover via the AtomicLift algorithm with probability at least .

Theorem 1 allows to have arbitrary orientation in the subspace , and applies to any deterministic spike signal as long as it satisfies the separation condition regardless of the amplitudes. The separation is the same as required by Candès and Fernandez-Granda [3] for spikes deconvolution using total variation minimization even when the PSF is known perfectly. Therein they established that measurements are sufficient to exactly recover the spike signal. In comparison, our performance guarantee is probabilistic that holds with high probability.

Theorem 1 suggests that as long as is on the order of up to logarithmic factors, AtomicLift provably recovers the spike signal with high probability, as long as the conditions in Theorem 1 are satisfied. If is a small constant independent of and , our bound simplifies to , which suggests blind spike deconvolution is possible at a cost of more measurements.

### Ii-D AtomicLift for noisy data

We consider a noisy version of (4), where the frequency-domain data samples are contaminated by additive noise:

 yn=xn⋅gn+wn, (11)

where is bounded by . Accordingly, the lifted measurement model becomes

 y=X(Z⋆)+w. (12)

The AtomicLift algorithm can be modified with the measurement constraint that obeys the noise level, given as

 ^Znoisy=argminZ∈CN×L∥Z∥As.t.∥y−X(Z)∥2≤ϵ. (13)

### Ii-E Spike Localization

Define . The dual norm of can be defined as [13]

 ∥Y∥⋆A =supτ∈[0,1)∥∥YHc(τ)∥∥2.

The dual problem of (9) can thus be written as

 ^p=argmaxp∈CN⟨p,y⟩Rs.t.∥X∗(p)∥⋆A≤1, (14)

and the dual problem of (13) can be written as

 ^p=argmaxp∈CN⟨p,y⟩−ϵ2∥p∥2,s.t.∥X∗(p)∥∗A≤1, (15)

where . Write the vector-valued dual polynomial as

 Q(τ) =(X∗(^p))Hc(τ),

where is the solution of the dual problems (14) and (15), then the spikes can be localized by the peaks of :

 ^T={τ∈[0,1)|∥Q(τ)∥2=1}. (16)

We refer the readers to standard arguments in [16, 25, 14] for more details.

## Iii Numerical Experiments

We perform a series of numerical experiments to validate the performance of AtomicLift implemented using MOSEK [26]111The code can be downloaded from http://www2.ece.ohio-state.edu/~chi/papers/atomiclift.m.. Without loss of generality, in all numerical experiments, we set the index , rather than as in the previous sections.

Let . We first generate the spike locations uniformly at random, respecting the minimum separation (which is smaller than what the theory requires), whose coefficients are generated with a dynamic range of dB and uniform phase. We generate the subspace by selecting each row i.i.d. following (10) with , and choose the coefficient vector as an all-one vector. Fig. 1 (a) shows the PSF in the time domain, and the convolution with a spike signal with spikes is shown in (b). If the PSF is known, one can deconvolve (b) with the PSF and obtain the calibrated time-domain signal in (c). Clearly Fig. 1 (c) is very different from Fig. 1 (b), therefore calibration must be performed if the PSF is unknown to avoid severe performance degeneration. Fig. 1 (d) demonstrates the exact localization of the spikes via computing the dual polynomial of AtomicLift.

We next examine phase transition of the AtomicLift algorithm. We randomly generate the low-dimensional subspace with i.i.d. standard Gaussian entries, and the coefficient vector with i.i.d. standard Gaussian entries. The spike signal is generated in the same fashion as above. For each simulation, we compute the normalized error , where is the estimate of the lifted matrix . First fix . For each pair of , we run Monte Carlo simulations of the AtomicLift algorithm, and claim the reconstruction of a simulation is successful if the normalized error is below . Fig. 2 (a) shows the average success rate with respect to the number of spikes and the subspace dimension . For comparison, we also plot the hyperbola curve which roughly matches the phase transition boundary. Similarly, Fig. 2 (b) shows the average success rate with respect to and for a fixed , and Fig. 2 (c) shows the average success rate with respect to and for a fixed . The phase transition plots suggest that AtomicLift succeeds when , which is better than the prediction of our theory. Fig. 2 (d) shows the average success rate with respect to and as in the same setting of Fig. 2 (b), except that the locations of the spikes are randomly generated without obeying the separation condition. It can be seen that the phase transition is not as sharp, which is in line with existing results in applying atomic norm minimization to spectrum estimation [16, 25, 14].

Finally, we examine the performance of AtomicLift in the noisy setting. Using similar setup as Fig. 2 when and , we introduce additive white Gaussian noise as in (11), where each is i.i.d. generated with . The signal-to-noise ratio (SNR) is defined as dB. Using a standard tail bound [27], we set in (13). We use (16) to identify the spike locations. Fig. 3 shows the recovered source locations and their magnitudes. It is worth noting that the dual polynomial will overestimate the number of sources, and further model order estimation is still necessary for noisy data.

## Iv Proof of Main Theorem

In this section we prove Theorem 1. First, we describe the desired form of a valid vector-valued dual polynomial and its properties that will guarantee the optimality of AtomicLift in Proposition 1. We next design the dual polynomial with the help of the squared Fejer’s kernel [3] in Section IV-A. The rest of the proof is then to carefully validate it satisfies all the required properties. Proposition 1 presents a sufficient condition for the optimality of AtomicLift in (9), whose proof is provided in Appendix A.

###### Proposition 1.

The solution to (9) is unique if there exists a vector such that the vector-valued dual polynomial

 Q(τ) =(X∗(q))Hc(τ)=BTdiag(q)c(τ) =1√N2M∑n=−2Me−j2πτnq∗nbn∈CL (17)

satisfies

 Q(τk)=1∥h∥22sign(a∗k)h∗∀τk∈T, (18a) ∥Q(τ)∥2<1,∀τ∈[0,1]∖T, (18b)

where is the complex sign function.

Proposition 1 suggests that the solution to AtomicLift in (9) is exact and equals to if we can find a dual polynomial with the valid form that satisfies (18a) and (18b). Our objective is then to construct such a valid dual polynomial.

### Iv-a Construction of the dual polynomial

Without loss of generality, we assume from now on. Consider the squared Fejer’s kernel and its derivative as

 K(τ) =1M2M∑n=−2Msne−j2πτn, K′(τ) =1M2M∑n=−2Msn(−j2πn)e−j2πτn.

where following the definition [3]. To proceed, we define randomized matrix-valued versions of and as

 K(τ) =1M2M∑n=−2Msn(bnbHn)e−j2πτn∈CL×L, K′(τ) =1M2M∑n=−2Msn(−j2πn)(bnbHn)e−j2πτn∈CL×L,

where the derivatives are entry-wise. Clearly,

 EK(τ)=1M2M∑n=−2MsnE(bnbHn)e−j2πτn=K(τ)IL,

and , where is the identity matrix following the isotropy property. We then construct the vector-valued dual polynomial as

 Q(τ)=K∑k=1K(τ−τk)αk+K∑k=1K′(τ−τk)βk, (19)

where and , . It is straightforward to see that has the valid form defined in (1) of Proposition 1. We select the coefficient vectors as the solution to the linear equations given below

 {Q(τk)=sign(a∗k)h∗,τk∈T,Q′(τk)=0,τk∈T. (20)

which can be rewritten as equation (21) after denoting . For simplicity, we denote the LHS matrix of (21) as . Before inverting (21), we need to establish that is invertible with high probability.

### Iv-B Invertibility of Γ

The expectation of can be given as , where is the Kronecker product, is given in (22), where

The following lemma is useful from [16] regarding .

###### Lemma 1.

[16, Proposition IV.1] Let . Then is invertible and

 ∥I−Φ∥ ≤0.3623,∥Φ∥≤1.3623,∥∥Φ−1∥∥≤1.568.

Note that we can write as a sum of independent random matrices as

 Γ =1M2M∑n=−2Msn(νn⊗bn)(νn⊗bn)H =1M2M∑n=−2Msn(νnνHn)⊗(bnbHn),

then can be written as

 Γ−¯Γ =2M∑n=−2MSn, (23)

where . The following lemma, whose proof is given in Appendix B, establishes the concentration of around .

###### Lemma 2.

Let and . For any , as long as

 M≥80μKLχ2log(4KLδ), (24)

we have holds with probability at least .

Denote the event , which holds with probability at least as long as (24) holds. This implies that the matrix is invertible when holds for some , since

 ∥I−Γ∥ ≤∥∥I−¯Γ∥∥+∥∥¯Γ−Γ∥∥≤0.3623+χ<1,

where from Lemma 1. Under , let , where and , then we have

 (25)

With the above parameterization, satisfies the first condition in (18a). Let , where and . Then we have

and . Then using elementary linear algebra that is essentially the same as [16, Corollary IV.5], we have the following lemma.

###### Lemma 3.

On the event with , we have

 ∥∥Γ−1−¯Γ−1∥∥≤2∥∥¯Γ−1∥∥2χ,∥∥Γ−1∥∥≤2∥∥¯Γ−1∥∥.\vspace−0.05in

### Iv-C Certifying (18b)

The rest of the proof is to guarantee satisfies (18b) with high probability. Let be the th order entry-wise derivative of , . The th entry of can be written as

 K(m)ℓ,s(τ)=1M2M∑n=−2Msn(−j2πn)m(bn,ℓb∗n,s)e−j2πτn,

where is the th entry of . Further define as

 Ξ(m)(τ) =1M2M∑n=−2Msn(j2πκn)mej2πτn(νn⊗bn)bHn,

whose expectation can be written as

 ¯Ξ(m)(τ) =1M2M∑n=−2Msn(j2πκn)mej2πτn(νn⊗IL) =ζ(m)(τ)⊗IL,

where

 ζ(m)(τ) =1M2M∑n=−2Msn(j2πκn)mej2πτnνn. (26)

Let be the th order entry-wise derivative of , where the th order derivative of the th entry of can be written as

 Q(m)ℓ(τ)=K∑k=1L∑s=1[K(m)ℓ,s(τ−τk)αk,s+K(m+1)ℓ,s(τ−τk)βk,s].

Then, can be written as

 =[Ξ(m)(τ)−¯Ξ(m)(τ)+¯Ξ(m)(τ)]H ⋅(L−¯L⊗IL+¯L⊗IL)[sign(a∗)⊗h∗] =[¯Ξ(m)(τ)]H(¯L⊗IL)[sign(a∗)⊗h∗] +I(m)1(τ)+I(m)2(τ), (27)

where

 I(m)1(τ)=[Ξ(m)(τ)−¯Ξ(m)(τ)]HL[sign(a∗)⊗h∗]

and

 I(m)2(τ)=[¯Ξ(m)(τ)]H(L−¯L⊗IL)[sign(a∗)⊗h∗].

Furthermore, the first term in (IV-C) can be written as

 [¯Ξ(m)(τ)]H(¯L⊗IL)[sign(a∗)⊗h∗] =(ζ(m)(τ)H⊗IL)[¯Lsign(a∗)⊗h∗] =[ζ(m)(τ)H¯Lsign(a∗)]h∗:=κm¯Q(m)(τ)h∗,

where becomes the scalar-valued dual polynomial constructed in [3]. Now, in (IV-C) can be rewritten as

 κmQ(m)(τ)=κm¯Q(m)(τ)h∗+I(m)1(τ)+I(m)2(τ). (28)

The rest of the proof proceeds in the following steps. We first establish that is uniformly bounded for points on a grid ; next, we extend that is uniformly bounded for all ; finally, we show that , .

To bound , the central argument is to bound for a fixed , which is based on a significantly modified argument of [16, Lemma IV.6] whose proof is provided in Appendix C.

###### Lemma 4.

Let and fix . Let

 ¯σ2m=24m+1μLMmax⎧⎨⎩1,2KL√2μM⎫⎬⎭,

and fix a positive number

 a≤⎧⎪ ⎪⎨⎪ ⎪⎩(M8μL)1/4% if 2KL√2μM≥1,√M8μKLotherwise.

then we have

 ∥∥Ξ(m)(τ)−¯Ξ(m)(τ)∥∥≤4mL√2μKM+a¯σm,

holds for with probability at least for some .

We then establish that is bounded on the grid in the following lemma, proved in Appendix D.

###### Lemma 5.

Let and . Let be a finite set defined on . As long as

 (29)

for some constant , we have

 P{supτd∈Υgrid∥∥I(m)1(τd)∥∥2≤ϵ,m=0,1,2,3}≥1−16δ.

We next bound , which is supplied in the following lemma proved in Appendix E.

###### Lemma 6.

Let and . As long as

 M≥CμK2Llog(4KLδ), (30)

for some constant , we have

 P{∥∥I(m)2(τ)∥∥2≤ϵ,m=0,1,2,3}≥1−8δ.

The next task is to extend the above inequalities to the unit interval by choosing the grid size properly. Define the event

 E3={∥∥κmQ(m)(τ)−κm¯Q(m)(τ)h∗∥∥2≤ϵ3,m=0,1,2,3}.

Combining Lemma