# Analysis of Inpainting via Clustered Sparsity and Microlocal Analysis

## Abstract

Recently, compressed sensing techniques in combination with both wavelet and directional representation systems have been very effectively applied to the problem of image inpainting. However, a mathematical analysis of these techniques which reveals the underlying geometrical content is completely missing. In this paper, we provide the first comprehensive analysis in the continuum domain utilizing the novel concept of clustered sparsity, which besides leading to asymptotic error bounds also makes the superior behavior of directional representation systems over wavelets precise. First, we propose an abstract model for problems of data recovery and derive error bounds for two different recovery schemes, namely minimization and thresholding. Second, we set up a particular microlocal model for an image governed by edges inspired by seismic data as well as a particular mask to model the missing data, namely a linear singularity masked by a horizontal strip. Applying the abstract estimate in the case of wavelets and of shearlets we prove that – provided the size of the missing part is asymptotically to the size of the analyzing functions – asymptotically precise inpainting can be obtained for this model. Finally, we show that shearlets can fill strictly larger gaps than wavelets in this model.

Minimization, Cluster Coherence, Inpainting, Parseval Frames, Sparse Representation, Data Recovery, Shearlets, and Meyer Wavelets

**Acknowledgements.** Emily J. King is supported by a fellowship for postdoctoral researchers from the Alexander von Humboldt Foundation. Gitta Kutyniok would like to thank David Donoho for discussions on this and related topics. She is grateful to the Department of Statistics at Stanford University and the Department of Mathematics at Yale University for their hospitality and support during her visits. She also acknowledges support by the Einstein Foundation Berlin, by Deutsche Forschungsgemeinschaft (DFG) Heisenberg fellowship KU 1446/8, Grant SPP-1324 KU 1446/13 and DFG Grant KU 1446/14, and by the DFG Research Center Matheon “Mathematics for key technologies” in Berlin. Xiaosheng Zhuang acknowledges support by DFG Grant KU 1446/14. Finally, the authors are thankful to the anonymous referees for their comments and suggestions.

## 1Introduction

A common problem in many fields of scientific research is that of missing data. The human visual system has an amazing ability to fill in the missing parts of images, but automating this process is not trivial. Also, depending on the type of data, the human senses may be unable to fill in the gaps. Conservators working to repair damaged paintings use the term *inpainting* to describe the process. This word now also means digitally recovering missing data in videos and images. The removal of overlaid text in images, the repair of scratched photos and audio recordings, and the recovery of missing blocks in a streamed video are all examples of inpainting. Seismic data are also commonly incomplete due to land development and bodies of water preventing optimal sensor placement [28]. In seismic processing flow, data recovery plays an important role.

One very common approach to inpainting is using variational methods [2]. However, recently the novel methodology of compressed sensing, namely exact recovery of sparse or sparsified data from highly incomplete linear non-adaptive measurements by minimization or thresholding, has been very effectively applied to this problem. The pioneering paper is [22], which uses curvelets as sparsifying system for inpainting. Various intriguing successive empirical results have since then been obtained using applied harmonic analysis in combination with convex optimization [5]. These three papers do contain theoretical analyses of the convergence of their algorithms to the minimizers of specific optimization problems but not theoretical analyses of how well those optimizers actually inpaint. Other theoretical analysis of those types of methods (imposing sparsity with a discrete dictionary) typically use a discrete model of the original image which does not allow the geometry of the problem to be taken into account. However, variational methods are built on continuous methods and may be analyzed using a continuous model, for example, [11]. Also, some work has been done to compare variational approaches with those built on minimization [8]. Finally, in works such as [28] and [30], intuition behind why directional representation systems such as curvelets and shearlets outperform wavelets when inpainting images strongly governed by curvilinear structures such as seismic images is given. So, although there are many theoretical results concerning inpainting, they mainly concern algorithmic convergence or variational methods.

The preliminary results presented in the *SPIE Proceedings* paper [34] combined with the theory in this paper provide the first comprehensive analysis of discrete dictionaries inpainting the continuum domain utilizing the novel concept of clustered sparsity, which besides leading to asymptotic error bounds also makes the superior behavior of directional representation systems over wavelets precise. Along the way, our abstract model and analysis lay a common theoretical foundation for data recovery problems when utilizing either analysis-side minimization or thresholding as recovery schemes (Section Section 2). These theoretical results are then used as tools to analyze a specific inpainting model (Sections Section 3 – Section 6).

### 1.1A Continuum Model

One of the first practitioners of curvelet inpainting for applications was the seismologist Felix Herrmann, who achieved superior recovery results for images which consisted of curvilinear singularities in which vertical strips are missing due to missing sensors. These techniques were soon also exploited for astronomical imaging, etc., the common bracket being the governing by curvilinear singularities. It is evident, that no *discrete* model can appropriately capture such a geometrical content.

Thus a continuum domain model seems appropriate. In fact, in this paper we choose a distributional model which is a distribution acting on Schwartz functions by

the weight and length being specified in the main body of the paper. Essentially, the weight sets up the linear singularity that is smooth in the vertical direction, while the value of corresponds to the length of the singularity. Mimicking the seismic imaging situation, we might then choose the shape of the missing part to be

i.e., a vertical strip of width . Clearly, cannot be too large relative to or else we are erasing too much of . Further, we let and denote the orthogonal projection of onto the missing part and the known part, respectively. One task can now be formulated mathematically precise in the following way. Given

recover .

It should be mentioned that such a microlocal viewpoint was first introduced and studied in the situation of image separation [18].

### 1.2Sparsifying Systems

It was recently made precise that the optimal sparsifying systems for such images governed by anisotropic structures are curvelets [6] and shearlets [37]. Of these two systems shearlets have the advantage that they provide a unified concept of the continuum and digital domain, which curvelets do not achieve. However, many inpainting algorithms even still use wavelets, and one might ask whether shearlets provably outperform wavelets. In fact, we will make the superior behavior of shearlets within our model situation precise.

For our analysis, we will use systems of wavelets and shearlets which are defined below. Both systems are smooth Parseval frames. Parseval frames generalize orthonormal bases in a manner which will be useful in the sequel.

#### Wavelets

Meyer wavelets are some of the earliest known examples of orthonormal wavelets; they also happen to have high regularity [14]. We modify the classic system to get a decomposition of the Fourier domain that is comparable to the shearlet system that we will use. For the construction, let satisfy , where the *indicator function* is defined to take the value on and on , and

Then the Fourier transform of the 1D Meyer generator is defined by

and the Fourier transform of the scaled 1D Meyer scaling function is

where we use the following Fourier transform definition for

(where is the standard Euclidean inner product) which can be naturally extended to functions in . The inverse Fourier transform is given by

We will not detail the interpretation of a scaling function but refer the interested reader to [14]. Then we define the -functions , , and by

We denote

Then the *Parseval Meyer wavelet system* is given by

We have not yet shown that this system forms a Parseval frame. It is known (in various forms, for example [9]) that if for

and

then

is a Parseval frame for . The Meyer wavelet system defined above easily satisfies this.

#### Shearlets

We will use the construction of Guo and Labate of smooth Parseval frames of shearlets [26] which is a modification of cone-adapted shearlets (see, for example [37]). Let the parabolic scaling matrices and and shearing matrices and be defined as

We use these dilation matrices as these are used in [26] and given particulars of their construction, it is not straightforward to adopt their methods to the dilation matrix . In addition, given the fact that the matrices defined above always have integer values when is an integer, they are reasonable from the point of view of implementation. Let satisfy , and

Further set and . For , define

and

We define the following shearlet system for

where

for ,

for and ,

and for ,

The are the “seam” elements that piece together the and . We now have the following result from [26].

We will sometimes employ the notation

where , , , and .

Fix a . Then the support of each and lies in the Cartesian corona

The position of the support inside the corona is determined by the values of and , with the “seam” elements having support in the corners. Thus, the shearlet system induces the frequency tiling in Figure 2 (cf. Figure 1 for the frequency tiling of Meyer wavelets).

### 1.3Recovery Algorithms

We next decide upon a recovery strategy. Compressed sensing offers a variety of such, the most common ones being minimization and thresholding. We will also use these. However, for preparation purposes to derive an asymptotic scale dependent analysis – the fact that the energy of our model is arbitrary high frequencies requires this approach –, we first perform a band-pass filtering on (see Eqn. (Equation 8)). The band-pass filter will be roughly speaking chosen according to the band given by the wavelets and shearlets, see Figures Figure 1 and Figure 2, leading to the sequence

The minimization problem we choose has the form

where is a Parseval frame. We emphasize that this approach to inpainting minimizes the *analysis* coefficients and is hence related to the newly introduced cosparsity model [45]. The choice will be explained further in Subsection Section 2.2.

The thresholding strategy we choose is brutally simple. We only perform one step of hard thresholding, namely, setting for some threshold , the reconstructed image is

For the asymptotic analysis, the are explicitly computed in the proofs of Lemmas ? and ?. In practice, as is usual with parameters in algorithms, one must be careful when selecting the .

It will be surprising that the geometry of wavelets and shearlets is strong enough to achieve the same asymptotic recovery results as for minimization for the respective systems. However, thresholding techniques can be viewed as approximations of minimization and many parallel results have been found for minimization and thresholding. For example, minimization [18] and thresholding [42] applied to the geometric separation problem both achieve asymptotic separation. In fact, thresholding can be used to separate wavefront sets [42]. Iterative thresholding algorithms have successfully approximated solutions to such diverse sparsity problems as multidimensional NMR spectroscopy [19] and finding row-sparse solutions to underdetermined linear systems [23].

### 1.4Microlocal Analysis

One might ask where the geometry we mentioned before will come into play. This can best be explained and illustrated using microlocal analysis in phase space. For a more detailed explanation of the fundamentals of microlocal analysis, see [31], and for an application of microlocal analysis to derive a fundamental understanding of sparsity-based algorithms using shearlets and curvelets, see [7]. Phase space in this context is indexed by position-orientation pairs which describe the singular behavior of a distribution. The orientation component is an element of real projective space, which for simplicity’s sake we shall identify in what follows with . The wavefront set of a distribution is roughly the set of elements in the phase space at which is nonsmooth. First consider a curvilinear singularity along a closed curve :

where is the usual Dirac delta distribution located at . As illustrated in Figure ?, the wavefront set of is

where is the normal direction of at . Now consider the model from Section 1.1,

As can be seen in Figure ? the wavefront set of almost looks like itself except that the wavefront set fills all possible angles (i.e., forms a spike) at the end points of the missing mask. This is because at the end points, the distribution is singular in all but the parallel direction. Note that the wavefront set of the linear singularity does not have spikes at the end due to the smooth weight. The difference between the approximate phase space portrait of shearlets and wavelets is demonstrated in Figure 3. The intuition behind the image comes from the fact that shearlets resolve the wavefront set [27]. Even though our shearlets and wavelets are smooth and thus do not have a wavefront set, by doing a continuous shearlet transform (), one can get an approximation of phase space information which takes into account orientation, this is shown in Figure 3. This is similar in spirit to a wavelet spectrogram.

Furthermore, in Figure 4 (Left) the small overlap of the wavefront set of a cluster of shearlets with a spike in the phase space, which represents an end point of the mask of missing information , can be clearly seen. Thus shearlet clusters are incoherent with the end points, meaning that the clusters do not overlap the spikes strongly in the phase space. However, there is a lot of phase space overlap with the wavefront set away from the endpoints. So it is easy to see how easily a cluster of shearlets can span a gap of missing data (Figure Figure 4 (Right)). Herrmann and Hennenfent call this property the “principle of alignment” which explains why curvelets “attain high compression on synthetic data as well as on real seismic data” [30]. The phase space information of curvelets and shearlets are essentially the same [24].

### 1.5Asymptotical Analysis

The width of the area to be inpainted plays a key role, even when using other inpainting techniques. In [10], variational inpainting methods are analyzed theoretically, showing that the local thickness of the area to be inpainted affects the success of the inpainting more than the overall size of the area to be inpainted.

Thus our analysis shall also take this into account. We accomplish this by also making the gap size dependent on the scale . This leads to the problem of recovering from knowledge of

for each scale . Letting denote the recovered image by either one of the proposed algorithms, we will show that asymptotically precise inpainting, i.e.,

is achieved for wavelets provided that (Theorems ? and ?) as and for shearlets provided that (Theorems ? and ?) as . In fact, this is exactly what one would imagine. Inpainting succeeds provided that the gap size is comparable to the size of the analyzing elements. The scale-dependent gap size allows us to analyze dependency on the size of the shearlets and wavelets in a clear way, providing a theoretical understanding of how inpainting algorithms work even though in practice the gap size is fixed.

### 1.6Wavelets versus Shearlets

This observation seems to indicate that shearlets indeed perform better than wavelets. However, the previously mentioned theorems just state positive results. In order to show that shearlets outperform wavelets in the model situation which we consider, we require a negative result of the following type: If as and is recovered by wavelets, then

And in fact, this is what we will prove in Theorem ?. In this sense, we now have a mathematically precise statement showing that shearlets are strictly better for inpainting in our model.

The only slight disappointment is the fact that this statement will only be proven for thresholding as the recovery scheme. We strongly suspect that this result also holds for minimization. However, we are not aware of any analysis tools strong enough to derive these results also in this situation.

### 1.7Our Approach

Our analysis has focused primarily on revealing the fundamental mathematical concepts which lead to successful image inpainting using wavelets or shearlets. The viewpoint we take, however, is that this is just the “tip of the iceberg,” and the main results are susceptible of very extensive generalizations and extensions. For example, our asymptotic analysis is based on a vertical mask of missing data from a horizontal wavefront. Other masks applied to curved wavefronts could be considered. The microlocal bending techniques employed in [18] seem to suggest that this approach will yield desirable results.

### 1.8Contents

We begin in Section 2 with an abstract analysis of data recovery via minimization introducing clustered sparsity and concentration in a Hilbert space as tools. We then apply the results in Section 2 to a particular class of inpainting problems which are described in Section 3. In Sections Section 4 and Section 5, we prove that both wavelets and shearlets, respectively, are able to inpaint a missing band but that shearlets can handle wider gaps. It is shown in Section 6 that the inpainting result for wavelets in Section 4 is tight; i.e., shearlets strictly outperform wavelets in the considered model situation. We discuss future directions of research and limitations of the current model in Section 7. Finally, Section 8 is an appendix that contains auxiliary results concerning shearlets needed for Section 5.

## 2Abstract Analysis of Data Recovery

We start by analyzing missing data recovery via minimization and thresholding in an abstract model situation. The error estimates we will derive can be applied in a variety of situations. In this paper, – as discussed before – we aim to utilize them to analyze inpainting via wavelets and shearlets following a continuum domain model. In fact, these error estimates will later on be applied to each scale while deriving an asymptotic analysis.

### 2.1Abstract Model

Let be a signal in a Hilbert space . To model the data recovery problem correctly, we assume that can be decomposed into a direct sum

of a subspace which is associated with the *missing* part of and a subspace which relates to the *known* part of the signal. Further, let and denote the orthogonal projections onto those subspaces, respectively. The problem of data recovery can then be formulated as follows: Assuming that is known to us, recover .

Following the philosophy of compressed sensing, suppose that there exists a Parseval frame which – in a way yet to be made precise – sparsifies the original signal . Either can be selected non-adaptively such as choosing a wavelet or shearlet system which will be our avenue in the sequel, or can be chosen adaptively using dictionary learning algorithms such as [1].

To already draw the connection to the special situation of inpainting at this point, assume that . If the measurable subset is the missing area of the image, we set and .

### 2.2Inpainting via Minimization

A first methodology from compressed sensing to achieve recovery is minimization, which recovers the original signal by solving

We wish to remark that in this problem, the norm is placed on the *analysis* coefficients rather than on the *synthesis* coefficients as in [15] and other papers on basis pursuit. Since we intend to also apply this optimization problem in the situation when does not form a basis but merely a frame, the analysis and synthesis approaches are different. One reason to do this is to avoid numerical instabilities which are expected to occur since, for each , the linear system of equations has infinitely many solutions, only the specific solution is analyzed. Also, since we are only interested in correctly inpainting and not in computing the sparsest expansion, we can circumvent possible problems by solving the inpainting problem by selecting a particular coefficient sequence which expands out to the , namely the analysis sequence. A similar strategy was pursued in [34] and [42]. Various inpainting algorithms which are based on the core idea of (Inp) combined with geometric separation are heuristically shown to be successful in [5].

Interestingly, this minimization problem can be also regarded as a mixed - problem [41], since the analysis coefficient sequence is exactly the minimizer of

that is, the coefficient sequence which is minimal in the norm. The optimization problem in (Inp) may also be thought of a relaxation of the *cosparsity* problem

Theoretical results concerning cosparsity may be found in [45].

We also consider the noisy case. Assume now that we know , where and are unknown, but is assumed to be small in the sense of for small . Also, clearly . Then we solve

To analyze this optimization problem, we require the following notion, which intuitively measures the maximal fraction of the total norm which can be concentrated to the index set restricted to functions in . In this sense, the geometric relation between the missing part and expansions in is encoded.

This notion allows us to formulate our first estimate concerning the error of the reconstruction via (Inp). The reader should notice that the considered error is solely measured on , the masked space, since due to the constraint in (Inp). Another important notion is that of *clustered sparsity*.

We now present a pair of lemmas which were first published in [34] without proof.

The noiseless case Lemma ? holds as a corollary to the case with noise, which follows.

Since is Parseval,

We invoke the relation , which implies that . Using the definition of , we obtain

It follows that

The clustered sparsity of now implies

Applying the sparsity of again and the minimality of , we have

Using (Equation 6) and (Equation 7), this leads to

Combining this with (Equation 5), we finally obtain

We now establish a relation between the concentration on and the notion of cluster coherence first introduced in [18]. For this, by abusing notation, we will write and for the projected frame elements.

To first introduce the notion of cluster coherence, recall that in many studies of optimization, one utilizes the *mutual coherence*

whose importance was shown by [16]. This may be called the *singleton coherence.* We modify the definition to take into account clustering of the coefficients arising from the geometry of the situation.

The following relation is a specific case of Proposition 3.1 in [34]. We include a proof for completeness.

For each , we choose a coefficient sequence such that and for all satisfying . Invoking the fact that is a tight frame, hence , and the fact that , we obtain

Combining Lemmata ? and ? proves the final noiseless estimate and combining Lemmata ? and ? proves the final estimate with noise:

Let us briefly interpret this estimate, first focusing on the noiseless case. As expected the error decreases linearly with the clustered sparsity. It should also be emphasized that both clustered sparsity and cluster coherence depend on the chosen “geometric set of indices” . Thus this set is crucial for determining whether is a good dictionary for inpainting. This will be illustrated in the sequel when considering a particular situation; however, is merely an analysis tool and explicit knowledge of it is not necessary to recover data. Note that in general, the larger the set is, the smaller is (i.e., is -relatively sparse for a smaller ) and the larger the cluster coherence is. This seems to be a contradiction, but if sparsifies well, then a small set can be chosen which keeps small. Finally, considering the noisy case, as also expected the error estimate depends linearly on the bound for the noise.

### 2.3Inpainting via Thresholding

Another fundamental methodology from compressed sensing for sparse recovery is thresholding. The beauty of this approach lies in its simplicity and its associated fast algorithms. Typically, it is also possible to prove success of recovery in similar situations as in which minimization succeeds.

Various thresholding strategies are available such as iterative thresholding, etc. It is thus surprising that the most simple imaginable strategy, which is to perform just *one step* of hard thresholding, already allows for error estimates as strong of for minimization. We start by presenting this thresholding strategy. For technical reasons, – note also that this is no restriction at all – we now assume that the Parseval frame consists of frame vectors with equal norm, i.e., for all .

The following result provides us with an estimate for the error of the synthesized signal computed via One-Step-Thresholding.

As before, Proposition ? follows as a corollary to the case with noise:

Invoking the decomposition of and the fact that is Parseval,

Since

and , it follows that

It follows from the equal-norm condition on the frame that for any sequence ,

Applying the clustered sparsity of we obtain

which is what we intended to prove.

As before, let us also interpret this estimate. Now the situation is slightly different from the estimate for the approach. Again, the estimate depends linearly on the clustered sparsity and the noise. The difference now is the appearance of the term in the numerator instead of the cluster coherence in the denominator. Note, however, that

Thus both in the minimization case Proposition ? and in the thresholding case Proposition ?, the bound on the error is lower when the cluster coherence is lower. Furthermore, is a quantification of how much of the signal is missing, which clearly can not be too high.

## 3Mathematical Model

We next provide a specific mathematical model which is motivated by the fact that images are typically governed by edges, which can most prominently be seen in, for example, seismic imaging (Figure Figure 5).

Following this line of thought, our model is based on line singularities – which can as explained later be extended to curvilinear singularities – with missing data of the forms as gaps or holes. In this section, such a model for the original image and the mask will be introduced. Since the analysis we derive later is based on the behavior in Fourier domain, the Fourier content of the models is another focus.

### 3.1Image Model

Inspired by seismic data with missing traces, an example of which is found in Figure 5, we define our mathematical model. The data can be viewed as a collection of curvilinear singularities which are missing nearly vertical strips of information. We first simplify the model by considering linear singularities. As shearlets are directional systems, we then simplify the model so that the linear singularity is horizontal. The specific mathematical model that we shall analyze is as follows. Let be a smooth function that is supported in , where we always assume that is sufficiently large, in particular, much larger than (a measure of the missing data which will be defined in the next subsection). For now, we consider as a prototype of a line singularity the weighted distribution acting on tempered distributions by

Notice that this distribution is supported on the segment

of the -axis, hence can be employed as a model for a horizontal linear singularity. The weighting was chosen to ensure that we are dealing with an -function after filtering. The Fourier transform of can be computed to be

Let now be a filter corresponding to the frequency corona at level (see Equation (Equation 2)). defined by its Fourier transform ,

To simplify the proofs for wavelets, we also define

so that . We use two bands for the wavelets so that the wavelet and shearlet systems will be compared on the same frequency corona. This makes sense as the base () dilation for the wavelets has determinant , while the base dilation for the shearlets has determinant . We consider the filtered version of which we denote by , i.e.,

The next result provides us with an estimate of the norm of .

We have

### 3.2Masks

Inspired by the missing sensor scenario in seismic data we will define the mask of a missing piece of the image as follows. The mask is a vertical strip of diameter and is given by

For an illustration, we refer to Figure 6.

For the convenience of the reader, we compute the associated Fourier transforms, where as usual we set for .

Define the planar Heaviside by . Since , we have . We now express in terms of by

This leads to

The proof is finished.

### 3.3Transfer of Abstract Setting

All of the main proofs in Sections Section 4 and Section 5 will follow a particular pattern. Either Proposition ? (in the case of minimization) or Proposition ? (in the case of thresholding) is applied to the situation in which is chosen to be the filtered linear singularity , the Hilbert space is defined by , and is either the Parseval system of Meyer wavelets or of shearlets at scale .

In the analysis that follows, will denote the optimal -clustered sparsity for filtered coefficients. That is, for minimization with a fixed filter level , we will fix a set of significant coefficients of and set

Similarly, we will analyze thresholding schemes by setting

where the are the significant coefficients in One-Step-Thresholding Algorithm. The inpainting accomplished (i.e., the solution in Proposition ? or Proposition ?) on the filtered levels will be denoted by . will denote the filtered real image; that is, , where is the original, complete image. Thus, the main theorems in Sections Section 4 and Section 5 will show that

The results will specifically depend on the asymptotic behavior of the gap . For the proofs involving the Meyer system, the following notation will also be useful

## 4Positive Results for Wavelet Inpainting

We begin by proving theoretically for the first time what has been known heuristically; namely, that wavelets can successfully inpaint an edge as long as not too much is missing. In Section 4.1, we investigate the inpainting results of minimization by estimating the -clustered sparsity and cluster coherence with respect to and a proper chosen index set . In Subsection Section 4.2, we similarly give the estimation of and for inpainting using thresholding.

### 4.1 Minimization

In what follows, we use the compact notation . We first need to choose the set of significant coefficients appropriately. We do this by setting

where . This choice of forces the clustered sparsity to grow slower than the growth rate of :

By definition, we have

We now compute

that is,

where is a smooth and compactly supported function that is essentially supported on

Applying the change of variable ensures that