Deconvolution of Point Sources:A Sampling Theorem and Robustness Guarantees

# Deconvolution of Point Sources: A Sampling Theorem and Robustness Guarantees

## Abstract

In this work we analyze a convex-programming method for estimating superpositions of point sources or spikes from nonuniform samples of their convolution with a known kernel. We consider a one-dimensional model where the kernel is either a Gaussian function or a Ricker wavelet, inspired by applications in geophysics and imaging. Our analysis establishes that minimizing a continuous counterpart of the norm achieves exact recovery of the original spikes as long as (1) the signal support satisfies a minimum-separation condition and (2) there are at least two samples close to every spike. In addition, we derive theoretical guarantees on the robustness of the approach to both dense and sparse additive noise.

\xpatchcmd\thmt@restatable

[#1]\IfAppendix[#1]

Keywords. Deconvolution, sampling theory, convex optimization, sparsity, super-resolution, dual certificate, nonuniform sampling, impulsive noise, Gaussian convolution, Ricker wavelet.

## 1 Introduction

The problem of deconvolution consists of estimating a signal from data modeled as the convolution of with a kernel sampled at a finite number of locations , , …,

 yi:=(K∗μ)(si),i=1,2,…,n. (1.1)

This problem arises in multiple domains of the applied sciences, including spectroscopy [44, 20], neuroscience [74], geophysics [50], ultrasound [1, 43], optics [11], where is the point-spread function of the imaging system, and signal processing [49], where is the impulse response of a linear system. In many of these applications, the signal is well modeled as a superposition of point sources or spikes, which could correspond to heat sources [47], fluorescent probes in microscopy [7, 40], celestial bodies in astronomy [56] or neural action potentials in neuroscience [60]. Mathematically, the spikes can be represented as a superposition of Dirac measures supported on a finite set

 μ :=∑tj∈Tajδtj, (1.2)

with amplitudes . For such signals, the data correspond to samples from a linear combination of shifted and scaled copies of the convolution kernel

 yi=∑tj∈TajK(si−tj),i=1,2,…,n. (1.3)

To make our analysis concrete, we focus on two specific convolution kernels, depicted in Figure 1, that are particularly popular in deconvolution applications: the Gaussian kernel,

 KG(t) :=exp(−t22σ2), (1.4)

and the Ricker wavelet

 KR(t) :=(1−t2σ2)exp(−t22σ2), (1.5)

which equals the second derivative of the Gaussian kernel up to a multiplicative constant. In both cases is a real-valued parameter that determines the spread of the kernel. Figure 2 illustrates the measurement model described by equation 1.3 for the Gaussian and Ricker kernels. Gaussian kernels are used to model the point-spread function of diffraction-limited imaging systems such as microscopes [7, 75] and telescopes [3], blurring operators in computer vision [42] and the Green’s function of systems governed by the diffusion equation [47, 52]. The Ricker kernel is relevant to reflection seismology, a technique to estimate properties of the subsurface of the Earth by transmitting a short pulse into the ground and then measuring the reflected signal [66]. Processing these measurements can be reduced to a one-dimensional deconvolution problem of the form (1.3[55], where the signal represents the reflection coefficients at the interface between adjacent underground layers and the convolution kernel corresponds to the pulse, which can be modeled using a Ricker wavelet [59].

In the 1970s and 1980s, geophysicists working on reflection seismology proposed to tackle the spike-deconvolution problem by solving a regularized least-squares problem incorporating an -norm penalty to promote sparse solutions [70, 22, 45, 63, 26]. Empirically, the method often recovers the spikes exactly from noiseless data– this is the case for instance for the data shown in Figure 2– and produces robust estimates when the data are perturbed by noise. This optimization-based approach has also been successfully applied to deconvolution problems arising in other domains such as processing of marine seismic data [21], signal processing [29], ultrasound imaging [53, 5] and estimation of initial conditions for systems governed by the heat equation [47].

In this paper we provide a theoretical framework to analyze deconvolution via -norm minimization when the sampling pattern is nonuniform. In order to make our analysis independent of the discretization of the signal support, we consider a continuous counterpart of the norm, called the total-variation (TV) norm [36, Section 3.1], which should not be interpreted as the total variation of a piecewise-constant function [62], but rather as the analog of the norm for atomic measures. In fact, the TV norm of an atomic measure of the form , , equals the norm of the coefficients . Our goal is to characterize under what conditions minimizing the TV norm subject to measurement constraints is an accurate and robust deconvolution method. In the noiseless case, we consider the solution to the problem

 minimize~μ ∥~μ∥TV (1.6) subject to (K∗~μ)(si)=yi,i=1,…,n.

Our main result is a sampling theorem for deconvolution via convex programming. We prove that solving Problem (1.6) achieves exact recovery as long as the signal support is not too clustered– a condition which is necessary for the problem to be well posed– and the set of samples contains at least two samples that are close to each spike. Our analysis holds for any nonuniform sampling pattern satisfying this condition, except for some pathological cases described in creftypecap 2.1.2. In addition, the method is shown to be robust to dense additive noise in terms of support estimation. Finally, we illustrate the potential of this framework for incorporating additional assumptions by providing exact-recovery guarantees when the data are corrupted by impulsive perturbations.

The paper is organized as follows: In Section 2 we describe our main results and discuss related work. Section 3 is devoted to the proof of the deconvolution sampling theorem, which is based on a novel dual-certificate construction that is our main technical contribution. Sections 4 and 5 establish robustness to dense and sparse noise respectively. In Section 6 we report the results of several experiments illustrating the numerical performance of our methods. We conclude the paper in Section 7, where we discuss several future research directions. Code implementing all the algorithms discussed in this work is available online2.

## 2 Main Results

### 2.1 Conditions on the Signal Support and the Sample Locations

In this section we introduce conditions to characterize the class of signals and sampling patterns that we consider in our analysis of the point-source deconvolution problem.

#### Minimum Separation

A remarkable insight that underlies compressed-sensing theory is that randomized linear measurements preserve the energy of most sparse signals (more technically, they satisfy conditions such as the restricted isometry property [19]). As a result, it is possible to robustly recover arbitrary sparse signals from such measurements, even if the problem is underdetermined [18, 28]. Unfortunately, this is not the case for the deconvolution problem, unless the convolution kernel is random [39, 61]. For smooth deterministic kernels, the data corresponding to different sparse signals can be almost identical, even if we have access to their full convolution with the kernel. Figure 3 shows an example of this phenomenon, where a pair of distinct sparse signals become essentially indistinguishable after they are convolved with a Gaussian kernel (the same happens if we use a Ricker kernel instead). No algorithm would be able to tell the two signals apart from noisy samples of the convolution even at very high signal-to-noise ratios. The reason why the data are so similar becomes apparent in the frequency domain: most of the energy of the convolution kernel is concentrated in a low-pass band, whereas the energy of the difference between the signals is concentrated in the high end of the spectrum. Since convolution is equivalent to pointwise multiplication in frequency, the difference is suppressed by the measurement process.

In order to restrict our attention to a class of signals for which deconvolution is a well-posed problem for our convolution kernels of interest, we constrain the support to have a certain minimum separation.

###### Definition 2.1 (Minimum Separation).

The minimum separation of the support of a signal is

 Δ(T)=mini≠i′|ti−ti′|. (2.1)

This quantity was introduced in [14] in the context of super-resolution of point sources from low-pass measurements (see also [27]). In that case, the minimum separation necessary to ensure that two signals do not produce very similar measurements can be characterized theoretically by applying Slepian’s seminal work on prolate spheroidal sequences [67] (see also [51] and Section 3.2 in [14]). In Section 6.1 we report numerical experiments indicating that for Gaussian and Ricker convolution kernels the minimum separation must be at least around to avoid situations like the one in Figure 3.

#### Sample Proximity

In order to estimate a signal of the form (1.2) we need to determine two uncoupled parameters for each spike: its location and its amplitude. As a result, at least two samples per spike are necessary to deconvolve the signal from data given by (1.3). In addition, the amplitude of our convolution kernels of interest decay quite sharply away from the origin, so most of the energy in the convolved signal corresponding to a particular spike is concentrated around that location. We therefore need two samples that are near each spike to estimate the signal effectively. To quantify to what extent this is the case for a fixed sampling pattern and signal support we define the sample proximity of and . In words, and have sample proximity if for each spike in there are at least two samples that are close to it.

###### Definition 2.2 (Sample Proximity).

Fix a set of sample locations and a signal support . Then has sample proximity if for every spike location there exist two distinct sample locations such that

 |ti−s|≤γ(S,T)and∣∣ti−s′∣∣≤γ(S,T). (2.2)

To derive an upper bound on the sample proximity necessary for TV-norm minimization to identify a spike we consider a signal consisting of a single spike, , and a sampling pattern with two samples , where and . Figure 4 shows two alternative explanations for these data: a single spike at or two spikes at and . For the Gaussian kernel, minimizing the TV norm chooses the two-spike option when . For the Ricker wavelet, minimizing the TV norm chooses the two-spike option when .

If we describe a sampling pattern only through its sample proximity, we allow for the following situation: there could be only two samples close to a spike which also happen to be very close to each other. We cannot expect to derive robust-deconvolution guarantees for such patterns: a small perturbation in the measurements can have a dramatic effect on the estimated spike location for any recovery method due to the local smoothness of the convolution kernels of interest. This is illustrated in Figure 5. The good news is that this situation is pathological and would never arise in practice: placing the samples in this way requires knowing the location of the spikes beforehand! In order to rule out these patterns in our analysis, we define the sample separation associated to a given sample proximity and fix it to a small value.

###### Definition 2.3 (Sample Separation).

Fix a set of samples and a signal support with sample proximity . and have sample separation for every spike location there exist at least two samples locations and such that

 |ti−s|≤γ(S,T),∣∣ti−s′∣∣≤γ(S,T),and∣∣s−s′∣∣≥κ(S,T). (2.3)

### 2.2 Sampling Theorem for Exact Recovery

Our main result is a sampling theorem for deconvolution via convex optimization from nonuniform samples. We show that solving problem (1.6) achieves exact recovery for the Gaussian and Ricker kernels under minimum-separation and sample-proximity conditions on the support and sample locations.

###### Theorem 2.4.

Let be a signal defined by (1.2) and assume that the data are of the form (1.3), where is the Gaussian kernel or the Ricker wavelet. Then is the unique solution to problem (1.6), as long as the signal support and the sampling pattern have a minimum separation and a sample proximity lying in the orange region depicted in the corresponding plot of Figure 6, and the sample separation is greater than .

As explained in creftypecap 2.1.2, we fix a small sample separation to rule out pathological sampling patterns. Except for such cases, the result holds for any nonuniform sampling pattern satisfying the sample-proximity condition: among the samples there must be two close to each spike, but the rest can be situated at any arbitrary location. The proof of the theorem, presented in Section 3, relies on a novel construction of a dual certificate, which is our main technical contribution and can be applied to derive recovery guarantees for other convolution kernels.

In order to evaluate the accuracy of our theoretical analysis, we compare our results to the numerical performance of the method for a sampling pattern containing only two samples close to each spike placed at different sample proximities. This sampling pattern is not realistic, because placing only two samples close to every spike requires knowing the support of the signal. However, it provides a worst-case characterization of the method for a given sample proximity, since incorporating additional samples can only increase the chances of achieving exact recovery. The simulation, described in more detail in Section 6.2, confirms that the method succeeds as long as the minimum separation is not too small and the sample proximity is not too large. Figure 7 shows that the numerical exact-recovery region contains the theoretical exact-recovery region established by Theorem 2.4.

Our analysis focuses on the continuous TV-norm minimization problem. In practice, this can be implemented by discretizing the domain and then minimizing the -norm of the discretized estimate. Theorem 2.4 implies that this procedure succeeds if the original signal is supported on the discretization grid. As we explain at the end of Section 2.3 the techniques we present to derive robustness guarantees provide some control over the error incurred if the signal is not supported on the grid. Performing a more detailed analysis of discretization error is an interesting direction for future research.

###### Corollary 2.5.

Assume that the support of the measure in equation 1.2 lies on a known discretized grid . Then as long as its support and the set of samples satisfy the conditions of Theorem 2.4, the true coefficients , …, are the unique solution to the -norm minimization problem

 minimize~a∈R|G| ∥~a∥1 (2.4) subject to ∑tj∈T~ajK(si−tj)=yi,i=1,…,n.
###### Proof.

Problem (2.4) is equivalent to problem (1.6) restricted to measures supported on . As a result, any respective solutions and must satisfy . By Theorem 2.4, (1.6) is uniquely minimized by . Since is supported on , is the unique solution of (2.4). ∎

When the original spike does not lie exactly on the grid the approximation error can be controlled to some extent using the results on additive dense noise in the following section. A more specific analysis of the effect of discretization in the spirit of [32] is an interesting topic for future research.

### 2.3 Robustness to dense noise

In this section we consider the deconvolution problem when additive noise perturbs the data,

 yi:=(K∗μ)(si)+zi,i=1,2,…,n. (2.5)

Our only assumption on the perturbation is that it has bounded norm, i.e. for some noise level . Otherwise, the error is arbitrary and can be adversarial. To adapt the TV-norm minimization problem (1.6) to this scenario we relax the data-consistency constraint from an equality to an inequality involving a known upper bound on the noise level ,

 minimize~μ ∥~μ∥TV (2.6) subject to n∑i=1(yi−(K∗~μ)(si))2≤¯ξ2.

For real analytic kernels that decay at infinity, which includes the Gaussian kernel and the Ricker wavelet, the solution to this problem is atomic. {restatable}[Proof in Appendix E]lemmaNoiseAtomic Assume satisfies (2.5) with . If is real analytic with as then the solution to problem (2.6) is an atomic measure with finite support of the form

 ^μ =∑^tk∈ˆT^akδ^tk,^a1,…,^a∣∣ˆT∣∣∈R. (2.7)

Empirically, we observe that the estimated support is an accurate estimate of the true support when the noise is not too high, as illustrated in Figure 8. The following theorem shows that this is guaranteed to be the case under the assumptions of Theorem 2.4.

###### Theorem 2.6 (Proof in Section 4).

Let be a signal defined by (1.2) and assume that the data are of the form (2.5), where is the Gaussian kernel or the Ricker wavelet and . If the sample separation is at least and the minimum separation and sample proximity lie in the exact-recovery regions depicted in Figure 6, the solution

 ^μ:=∑^tk∈^T^akδ^tk (2.8)

to problem (2.6) has the following properties:

 ∣∣ ∣ ∣∣aj−∑{^tl∈^T:|^tl−tj|≤ησ}^al∣∣ ∣ ∣∣ ≤C1¯ξ√|T|for all tj∈T, (2.9) ∑{^tl∈^T,tj∈T:|^tl−tj|≤ησ}|^al|(^tl−tj)2 ≤C2¯ξ√|T|, (2.10) ∑{^tl∈^T:|^tl−tj|>ησ,∀tj∈T}|^al| ≤C3¯ξ√|T|, (2.11)

where depend only on , , and . Upper bounds on the values of are given in Figure 9.

Property (2.9) implies that the amplitude of each spike is well approximated by the sum of the estimated spikes close to it, where close means in a radius of . is smaller than for the Gaussian kernel and for the Ricker kernel for most values of the minimum separation and sample proximity for which exact recovery occurs, as shown in Figure 9. Property (2.10) implies that the estimated support clusters around the support of the true signal. Property (2.11) shows that spurious spikes detected away from the support have small magnitude. An interesting consequence of the theorem is that the error incurred by the method at each spike can be bounded by a quantity that does not depend on the amplitude of other spikes.

###### Corollary 2.7.

Under the conditions of Theorem 2.6, for any element in the support of such that there exists an element in the support of the estimate satisfying

 |ti−^ti|≤ ⎷C2¯ξ√|T||ai|−C1¯ξ√|T|. (2.12)

These guarantees demonstrate that the method is robust in a small-noise regime: the error tends to zero as the noise decreases. This occurs under the same conditions that we require for exact recovery. In particular, we only assume that the sampling pattern has two samples close to each spike, but we do not account for the rest of the samples. Characterizing how additional samples improve the estimation accuracy of the algorithm in the presence of noise is an interesting problem for future research.

The results of this section allow us to control the error of our recovered signal when the measurements are corrupted by noise. Another source of measurement error is due to discretization. As mentioned earlier, in practice we often solve the continuous TV-norm minimization problem by discretizing and solving an -norm minimization problem. In general the true signal does not necessarily lie on the grid, which means the model is not completely accurate. We can account for this by solving the following discretized version of problem (2.6)

 minimize~a∈R|G| ∥~a∥1 (2.13) subject to n∑i=1⎛⎝yi−∑gj∈G~ajK(si−gj)⎞⎠2≤¯ξ2,

where is some fixed discretization grid. By bounding the effect of shifting the true measure onto we can obtain results analogous to Theorem 2.6 and Corollary 2.7 for . Here is the grid separation and is the Lipschitz constant of the kernel. Improving these somewhat crude error bounds is an interesting direction for future work.

### 2.4 Sparse Noise

In this section we consider data contaminated by outliers, where some of the samples are completely corrupted by impulsive noise,

 yi:=(K∗μ)(si)+wii=1,2,…,n. (2.14)

Here is a sparse vector with arbitrary amplitudes. In order to account for the presence of the sparse perturbation, we incorporate an additional variable to Problem (2.6) and add a corresponding sparsity-inducing penalty term to the cost function:

 minimize~μ,~w ∥~μ∥TV+λ∥~w∥1 (2.15) subject to (K∗~μ)(si)+~wi=yi,i=1,…,n,

where is a regularization parameter. An equivalent approach was previously proposed in [37] without theoretical guarantees. In [35] an analogous formulation is applied to spectral super-resolution in the presence of outliers, where the impulsive noise occurs in the spectral domain and is consequently very incoherent with the signal. In our case, the sparse noise and the clean samples are much less incoherent: most of the energy corresponding to a given spike in the signal is concentrated in a few samples around it, which could be mistaken for sparse noise. For the Gaussian and Ricker kernels, we observe that solving problem (2.15) is effective as long as there are not too many contiguous noisy samples. Figure 10 shows an example where the solution to the optimization problem estimates both the signal and the sparse corruptions exactly.

The following theorem provides exact-recovery guarantees in the presence of sparse noise. For simplicity, we assume that the samples lie on a regular grid with step size , so that where , instead of requiring a sample-proximity condition. In addition, we impose a minimum-separation condition on the support of the signal, as well as on the set of noise locations to preclude it from being too clustered. Finally, we require that the samples surrounding the spikes and the sparse noise, denoted by

 I :={si∈S∖N∣∃tj∈T,si≤tj

respectively, be unaffected by the noise and non-overlapping (so that there are at least two clean samples between every corruption and every spike).

###### Theorem 2.8 (Proof in Section 5).

Let be a signal defined by (1.2) and be data of the form (2.14), where is the Gaussian kernel and the samples lie on a grid with step size . Assume the spikes and samples are both surrounded by clean samples, so that:

 |I|=2|T|,|C|=2|N|,andC∩I=∅. (2.18)

If has minimum separation , and the noise locations have minimum separation then solving problem (2.15) with parameter recovers and exactly.

###### Theorem 2.9 (Proof in Section 5).

Let be a signal defined by (1.2) and be data of the form (2.14), where is the Ricker wavelet and the samples lie on a grid with step size . If (2.18) holds, has minimum separation , and the noise locations have minimum separation then solving problem (2.15) with parameter recovers and exactly.

Section 6.3 reports the results of several numerical experiments showing that the method achieves exact recovery under weaker assumptions than those in Theorems 2.9 and 2.8. Characterizing the performance of the method more generally is an interesting research problem. These experiments also suggest that the approach is quite robust to the choice of the regularization parameter . The following lemma, which is based on Theorem 2.2 in [16], provides some theoretical justification: it establishes that if the method recovers a signal and a sparse noise realization for a certain value of , then it also achieves exact recovery for data corresponding to any trimmed versions of the signal or the noise for that same value of . Note that denotes the nonzero entries of , as opposed to which denotes the corresponding sample locations.

{restatable}

[Proof in Appendix H]lemmaSparseTrim Let be a vector with support and let be an arbitrary measure such that

 yi=(K∗μ)(si)+wi, (2.19)

for . Assume that the pair is the unique solution to Problem (2.15) for the data , and consider the data

 y′i=(K∗μ′)(si)+w′i, (2.20)

for . Here is a trimmed version of : it is equal to on a subset of its support. Similarly, the support of satisfies . For any choice of and , the pair is the unique solution to Problem (2.15) if we set the data vector to equal for the same value of .

### 2.5 Related work

There are two previous works that derive exact recovery guarantees for the Gaussian kernel in one dimension. In [64] the authors prove exact deconvolution of spikes from samples at arbitrary locations using a weighted version of the TV norm that avoids having to impose a sample proximity condition. The result only holds for positive spikes, but does not require a minimum-separation condition. In [73] the authors prove exact recovery of spikes from samples lying on a uniform grid, also without a minimum-separation condition. The method can be interpreted as an extension of Prony’s method [25]. In both cases, no robustness guarantees are provided (such guarantees would require conditions on the support of the signal, as explained in creftypecap 2.1.1).

As mentioned in the introduction, to the best of our knowledge -norm minimization for deconvolution was originally proposed by researchers in geophysics from the 1970s [70, 22, 45, 63, 26]. Initial theoretical works focused on analyzing random convolution kernels [39, 61] using techniques from compressed sensing [19, 28]. A series of recent papers analyze TV-norm minimization for recovering point sources from low-pass measurements [23, 14, 34] and derive stability guarantees [15, 33, 4, 69, 31, 46]. This framework has been extended to obtain both exact-recovery and robustness guarantees for deconvolution via convex programming for bump-like kernels including the Gaussian in [5, 6] and specifically Ricker kernels in [55] under the assumption that the complete convolution between a train of spikes and the kernel of interest is known (i.e. without sampling).

When the convolution kernel is a low-pass function, as is approximately the case for the Gaussian kernel, one can reformulate the spike deconvolution as a super-resolution problem. First, the spectrum of the convolved signal is estimated from the data. Then the low-frequency component of the spikes is approximated by dividing the result by the spectrum of the convolution kernel. Finally, the spikes are recovered from the estimate of their low-pass spectrum using spectral super-resolution techniques based on convex programming [14] or on Prony’s method [25, 68], as in the finite-rate-of-innovation (FRI) framework [73, 30]. This framework can also be applied to arbitrary non-bandlimited convolution kernels [71] and nonuniform sampling patterns [54], but without exact-recovery guarantees.

## 3 Proof of Exact Recovery (Theorem 2.4)

In the proof of Theorem 2.4 we use standardized versions of our kernels where :

 KG(t):=exp(−t22)andKR(t):=(1−t2)exp(−t22)=−(KG)(2)(t), (3.1)

without loss of generality. This is equivalent to expressing in units of . Auxiliary Mathematica code to perform the computations needed for the proof is available online3.

### 3.1 Dual Certificate

We prove Theorem 2.4 by establishing the existence of a certificate that guarantees exact recovery.

{restatable}

[Proof in Appendix A]propositionDualCert Let be the nonzero support of a signal of the form (1.2). If for any sign pattern there exists a function of the form

 Q(t):=n∑i=1qiK(si−t) (3.2)

satisfying

 Q(tj)=ρj, ∀tj∈T, (3.3) |Q(t)|<1, ∀t∈Tc, (3.4)

then the unique solution to Problem (1.6) is . In words, to prove exact recovery we need to show that is possible to interpolate the sign of the amplitudes of any superposition of spikes, which we denote , on the support of the spikes using scaled copies of the convolution kernel centered at the location of the samples.

The vector is known as a dual certificate in the literature [19] because it certifies recovery and is a solution to the Lagrange dual of Problem (1.6):

 maximizeq qTy (3.5) subject to supt∣∣ ∣∣n∑i=1qiK(si−t)∣∣ ∣∣≤1.

Dual certificates have mostly been used to derive guarantees for inverse problems involving random measurements, including compressed sensing [19, 17], matrix completion [12] and phase retrieval [13]. In such cases, the construction relies on concentration bounds and other tools from probability theory [72]. In contrast, our setting is completely deterministic.

Our techniques are inspired by the deterministic certificate used to establish exact recovery of an atomic measure from low-pass measurements in [14]. In that case, the certificate corresponds to a low-pass trigonometric polynomial which also interpolates an arbitrary sign pattern on the support of the atomic measure that represents the signal. This polynomial is constructed via interpolation using a low-pass interpolation kernel with fast decay. The crucial difference with our setting is that in [14] the interpolating function just needs to be low-pass, which makes it possible to center the shifted copies of the interpolation kernel at the elements of the signal support ( in our notation). Similarly, if the whole convolution is assumed to be know, as in [5, 6, 55], a certificate can be built by interpolating the sign pattern with the convolution kernel in the same way. In contrast, if we incorporate sampling into the measurement process, we are constrained to use shifted kernels centered at the sample locations . This requires a new method for constructing the dual certificate, which is our main technical contribution.

By the sample proximity and separation conditions (Definition 2.2), we can associate each spike location in with two nearby samples that are not too close to each other. This allows us to form a subset of size that contains pairs of samples separated by that are -close to each spike location in the support . We construct the dual combination using kernels centered at the elements of ,

 Q(t):=∑~si∈˜SqiK(~si−t). (3.6)

In order to satisfy condition (3.3), must interpolate the sign pattern on . To satisfy condition (3.4), must have a local extremum at each element of . Otherwise the construction will violate condition (3.4) close to the support of the signal as shown in Figure 12. To favor local extrema on , we constrain the derivative of to vanish at those points. The sign-interpolation and zero-derivative constraints yield a system of equations,

 Q(ti) =ρi, (3.7) Q′(ti) =0,for all ti∈T.

The main insight underlying our proof is that the solution to this system is amenable to analysis once is reparametrized appropriately. The reparametrization, described in the next section, allows us to prove that the system is invertible and to control the amplitude of on .

### 3.2 Interpolation With Bumps and Waves

We construct the dual combination defined in Section 3.1 by interpolating an arbitrary sign pattern using modified interpolation kernels, which are linear combinations of shifted copies of . The construction is a reparametrization of (3.6) of the form

 Q(t)=∑ti∈TαiBti(t,~si,1,~si,2)+βiWti(t,~si,1,~si,2), (3.8)

where for each we define a pair of modified kernels and . The motivation is that controlling the coefficients and is considerably simpler than controlling , …, directly.

We call the first interpolation kernel a bump, defined as

 Bti(t,~si,1,~si,2) :=bi,1K(~si,1−t)+bi,2K(~si,2−t), (3.9)

where and are the two elements of that are closest to . The coefficients and are adjusted so that has a local extremum equal to one at :

 Bti(ti,~si,1,~si,2) =1, (3.10) ∂∂tBti(ti,~si,1,~si,2) =0. (3.11)

If and are close enough to , the extremum is a maximum and is indeed a bump-like kernel centered at . Figure 11 shows examples of bumps when is the Gaussian kernel and the Ricker wavelet.

The bumps are linear combinations of shifted copies of the interpolation kernel centered at the samples, so we can construct a dual combination of the form (3.2) interpolating an arbitrary sign pattern by weighting them appropriately. The problem is that the resulting construction does not satisfy the zero-derivative condition in (3.7) and therefore tends to violate condition (3.3) close to the elements of , as shown in Figure 12. This is the reason why our dual-combination candidate (3.8) contains an additional interpolation kernel, which we call a wave,

 Wti(t,~si,1,~si,2) =wi,1K(~si,1−t)+wi,2K(~si,2−t). (3.12)

In this case, the coefficients are adjusted so that

 Wti(ti,~si,1,~si,2) =0, (3.13) ∂∂tWti(ti,~si,1,~si,2) =1. (3.14)

Each is a wave-shaped function centered at . Figure 13 shows examples of waves when is the Gaussian kernel and the Ricker wavelet. The additional degrees of freedom in (3.8) allow us to enforce the zero-derivative condition and obtain a valid dual combination. The role of the wave in our analysis is analogous to the role of the derivative of the interpolation kernel in the dual-certificate construction for super-resolution from low-pass data [14]. Note that here we cannot use the derivative of the bumps or of , as the resulting construction would not be a linear combination of shifted copies of .

The following lemma provides a closed-form expression for the coefficients used to build the bump and the wave. In addition, it shows that the bump and wave always exist for the Gaussian kernel and the Ricker wavelet under mild assumptions on the sample proximity and the sample separation.

{restatable}

[Proof in Section B.2]lemmaBumpExists For a fixed kernel , the bump and wave exist with coefficients given by

 [bi,1wi,1bi,2wi,2] =1K(~si,2−ti)K(1)(~si,1−ti)−K(1)(~si,2−ti)K(~si,1−ti)[−K(1)(~si,2−ti)−K(~si,2−ti)K(1)(~si,1−ti)K(~si,1−ti)]

when the expression in the denominator is nonzero. The Gaussian kernel has nonzero denominator for . The Ricker wavelet has nonzero denominator when and . The condition is natural for the Ricker wavelet, since it has roots at .

We would like to emphasize that the bumps and waves are just a tool for analyzing the dual certificate obtained by solving the system of equations (3.7). Under the conditions of Theorem 2.4, which ensure that the support of the signal is not too clustered and that there are two samples close to each spike, we show that each coefficient is close to the sign of the spike and is small, so that

 Q(t) =∑ti∈TαiBti(t,~si,1,~si,2)+βiWti(t,~si,1,~si,2) (3.15) ≈∑ti∈TρiBti(t,~si,1,~si,2). (3.16)

This allows us to control and show that it is a valid dual combination. In contrast, characterizing the coefficients , …, directly is much more challenging, as illustrated in Figure 14.

The following two lemmas formalize this intuition. The first establishes that the system of equations (3.7) is invertible. The second shows that the amplitude of is bounded on . Together with Section 3.1 they yield a proof of Theorem 2.4. Both of their proofs rely heavily on bounds controlling the bumps and waves and their derivatives, which are compiled in Section 3.3.

###### Lemma 3.1 (Proof in Section 3.4).

Under the assumptions of Theorem 2.4, the system of equations (3.7) has a unique solution.

###### Lemma 3.2 (Proof in Section 3.5).

Under the assumptions of Theorem 2.4, the dual combination corresponding to the solution system (3.7) satisfies for all .

We now illustrate the proof technique described in this section using a simple example. Consider an atomic measure consisting of 3 point masses:

 μ:=a1δt1+a2δt2+a3δt3. (3.17)

Here . Let denote a set of sample locations, and let be defined by

 ~S={~s1,1,~s1,2,~s2,1,~s2,2~s3,1,~s3,2}, (3.18)

where and are close to for . We fix a sign pattern . Using kernels centered at the elements of we construct a dual combination of the form in (3.6) that satisfies the interpolation equations in (3.7). The resulting dual combination can be seen in the first and third plots of Figure 14. The amplitudes of the kernels centered at the samples are difficult to control. We then reparametrize into the form given in (3.8) by writing it as a linear combination of 3 bumps and 3 waves. This reparametrization is shown in the second and fourth plots of Figure 14. As can be seen in the figure, the bump coefficients roughly match the values of and the wave coefficients are very small, making them amenable to analysis.

### 3.3 Bounding Bumps and Waves

In this section we establish bounds on the bumps and waves and their derivatives, which are used in Sections 3.4 and 3.5 to establish exact recovery. To simplify notation, without loss of generality we consider a bump and a wave centered at the origin and set

 B(t,~si,1,~si,2) :=B0(t,~si,1,~si,2), (3.19) W(t,~si,1,~si,2) :=W0(t,~si,1,~si,2). (3.20)

We begin with a simplifying observation. For even kernels like the Gaussian and Ricker, the bumps and waves exhibit a form of symmetry. {restatable}[Proof in Section B.3]lemmaEvenKernel Let be a kernel with corresponding bump and wave . If the kernel satisfies for all then

 B(t,s1,s2)=B(−t,−s1,−s2)andW(t,s1,s2)=−W(−t,−s1,−s2), (3.21)

for all .

The shape of each bump and wave depends on where its corresponding samples are located. In order to account for all possible sample locations satisfying the sample-proximity and sample-separation conditions in Definition 2.2, we define sample-independent upper bounds of the form

 |B(i)|∞(t) :=sup|s1|,|s2|≤γ(S,T)|s1−s2|≥κ(S)|B(i)(t,s1,s2)|, (3.22) |W(i)|∞(t) :=sup|s1|,|s2|≤γ(S,T)|s1−s2|≥κ(S)|W(i)(t,s1,s2)|, (3.23) B(i)∞(t) :=sup|s1|,|s2|≤γ(S,T)|s1−s2|≥κ(S)B(i)(t,s1,s2), (3.24)

for . By Section 3.3, these functions are all even. The bounds are depicted in Figure 15.

To simplify our analysis, we define monotonized versions of the absolute bounds above for :

 |B(i)|↓∞(t) :=sup{|B(i)|∞(u)∣u≥|t|}, (3.25) |W(i)|↓∞(t) :=sup{|W(i)|∞(u)∣u≥|t|}. (3.26)

The decay properties of the Gaussian and Ricker kernels translate to fast-decaying bumps and waves. The following lemma formalizes this, by controlling the tails of our monotonic bounds. {restatable}[Proof in Section B.4]lemmaTailDecay For the Gaussian and Ricker kernels we have

 ∞∑j=6|B(i)|↓∞((j−1/2)Δ) ≤10−12κ(S), ∞∑j=6|W(i)|↓∞((j−1/2)Δ) ≤10−12κ(S), (3.27) |B(i)|↓∞(t) ≤10−12κ(S), |W(i)|↓∞(t) ≤10−12κ(S), (3.28)

for , , and . When applying the lemma, corresponds to the minimum separation of the signal support. There is nothing special about the constants above (i.e. , , ); they are simply chosen for definiteness and other values would also work (see Section B.4 for more details).

Section 3.3 shows that the bumps, waves, and their derivatives are small for . The following lemma provides piecewise constant bounds for . {restatable}[Proof in Section B.5]lemmaPiecewise Fix and and let . Partition the interval into intervals of the form

 Uj :=[10(j−1)N1,10jN1), (3.29)

and . For the Gaussian and Ricker kernels, there exist functions , , , such that for all and

 |B(i)|∞(t) ≤˜|B(i)|(j), (3.30) |W(i)|∞(t) ≤˜|W(i)|(j), (3.31) B(i)∞(t) ≤˜B(i)(j). (3.32)

These bounds depend on , and an additional parameter , satisfying as explained in Section B.5.

Section 3.3 can be used to extend the bounds of Section 3.3 to the monotonized bumps, waves, and their derivatives.

###### Corollary 3.3.

Assuming the conditions and definitions in Sections 3.3 and 3.3 we have, for and

 |B(i)|↓∞(t) ≤max(maxk:j≤k≤N1˜|B(i)|(k),ϵ), (3.33) |W(i)|↓∞(t) ≤max(maxk:j≤k≤N1˜|B(i)|(k),ϵ), (3.34)

where .

Section 3.3 and Corollary 3.3 provide upper bounds on the bumps, waves, and their derivatives that are symmetric, sample-independent and, in the case of Corollary 3.3, monotonic.

### 3.4 Proof of Lemma 3.1: Invertibility of the Interpolation Equations

In this section we prove Lemma 3.1, establishing the invertibility of the interpolation equations (3.7). As a by-product, we also obtain bounds on the interpolation coefficients in (3.8). For simplicity we use the following abbreviated notation for the bumps and waves:

 Bi(t) :=Bti(t,~si,1,~si,2), (3.35) Wi(t) :=Wti(t,~si,1,~si,2). (3.36)

To begin, we express equations (3.7) in terms of bumps and waves:

 n∑j=1αjBj(ti)+βjWj(ti) =ρi, (3.37) n∑j=1αjB(1)j(ti)+βjW(1)j(ti) =0,for all ti∈T. (3.38)

Define the matrices by

 (B)ij:=Bj(ti),(W)ij:=Wj(ti),(B(1))ij:=B(1)j(ti),(W(1))ij:=W(1)j(ti). (3.39)

This allows us to express the reparametrized interpolation equations in block-matrix form:

 (3.40)

If is sufficiently large we can exploit the decay of the bumps and waves to prove that this matrix is close to the identity. This is formalized in the following linear-algebra result, which shows how to convert norm bounds on these blocks into bounds on