Optimally sparse approximations of 3D functions by compactly supported shearlet frames

Gitta Kutyniok Technische Universität Berlin, Institut für Mathematik, 10623 Berlin, Germany, E-mail: kutyniok@math.tu-berlin.de    Jakob Lemvig Technical University of Denmark, Department of Mathematics, Matematiktorvet 303, 2800 Kgs. Lyngby, Denmark, E-mail: J.Lemvig@mat.dtu.dk    Wang-Q Lim Technische Universität Berlin, Institut für Mathematik, 10623 Berlin, Germany, E-mail: lim@math.tu-berlin.de
Abstract

We study efficient and reliable methods of capturing and sparsely representing anisotropic structures in 3D data. As a model class for multidimensional data with anisotropic features, we introduce generalized three-dimensional cartoon-like images. This function class has two smoothness parameters: one parameter $\beta$ controlling classical smoothness and one parameter $\alpha$ controlling anisotropic smoothness. The class then consists of piecewise $C^\beta$-smooth functions with discontinuities on a piecewise $C^\alpha$-smooth surface. We introduce a pyramid-adapted, hybrid shearlet system for the three-dimensional setting and construct frames for $L^2(\mathbb{R}^3)$ with this particular shearlet structure. For the smoothness range $1 < \alpha \le \beta \le 2$ we show that pyramid-adapted shearlet systems provide a nearly optimally sparse approximation rate within the generalized cartoon-like image model class measured by means of non-linear $N$-term approximations.

Key words. anisotropic features, multi-dimensional data, shearlets, cartoon-like images, non-linear approximations, sparse approximations

AMS subject classifications. Primary: 42C40, Secondary: 42C15, 41A30, 94A08

1 Introduction

Recent advances in modern technology have created a new world of huge, multi-dimensional data. In biomedical imaging, seismic imaging, astronomical imaging, computer vision, and video processing, the capabilities of modern computers and high-precision measuring devices have generated 2D, 3D and even higher dimensional data sets of sizes that were infeasible just a few years ago. The need to efficiently handle such diverse types and huge amounts of data has initiated an intense study in developing efficient multivariate encoding methodologies in the applied harmonic analysis research community. In neuro-imaging, e.g., fluorescence microscopy scans of living cells, the discontinuity curves and surfaces of the data are important specific features since one often wants to distinguish between the image “objects” and the “background”, e.g., to distinguish actin filaments in eukaryotic cells; that is, it is important to precisely capture the edges of these 1D and 2D structures. This specific application illustrates that important classes of multivariate problems are governed by anisotropic features. The anisotropic structures can be distinguished by location and orientation or direction, which indicates that our way of analyzing and representing the data should capture not only location, but also directional information. This is exactly the idea behind so-called directional representation systems, which by now are well developed and understood for the 2D setting. Since much of the data acquired in, e.g., neuro-imaging, is truly three-dimensional, analyzing such data should be performed by three-dimensional directional representation systems. Hence, in this paper we aim for the 3D setting.

In applied harmonic analysis the data is typically modeled in a continuum setting as square-integrable functions or distributions. In dimension two, to analyze the ability of representation systems to reliably capture and sparsely represent anisotropic structures, Candès and Donoho [7] introduced the model situation of so-called cartoon-like images, i.e., two-dimensional functions which are piecewise $C^2$-smooth apart from a piecewise $C^2$ discontinuity curve. Within this model class there is an optimal sparse approximation rate one can obtain for a large class of non-adaptive and adaptive representation systems. Intuitively, one might expect adaptive systems to be far superior in this task, but it has been shown in recent years that non-adaptive methods using curvelets, contourlets, and shearlets all provide essentially optimally sparse approximations of cartoon-like images in 2D, measured by the $L^2$-error of the best $N$-term approximation [7, 17, 13, 24].

1.1 Dimension three

In the present paper we will consider sparse approximations of cartoon-like images using shearlets in dimension three. The step from the one-dimensional setting to the two-dimensional setting is necessary for the appearance of anisotropic features at all. When further passing from the two-dimensional setting to the three-dimensional setting, the complexity of anisotropic structures changes significantly. In 2D one “only” has to handle one type of anisotropic feature, namely curves, whereas in 3D one has to handle two geometrically very different anisotropic structures: curves as one-dimensional features and surfaces as two-dimensional anisotropic features. Moreover, the analysis of sparse approximations in dimension two depends heavily on reducing the analysis to affine subspaces of $\mathbb{R}^2$. Clearly, these subspaces always have dimension and co-dimension one in 2D. In dimension three, however, we have subspaces of co-dimension one and two, and one therefore needs to perform the analysis on subspaces of the “correct” co-dimension. Therefore, the 3D analysis requires fundamentally new ideas.

Finally, we remark that even though the present paper only deals with the construction of shearlet frames for $L^2(\mathbb{R}^3)$ and sparse approximations of such, it also illustrates how many of the problems that arise when passing to higher dimensions can be handled. Hence, once it is known how to handle anisotropic features of different dimensions in 3D, the step from 3D to 4D can be dealt with in a similar way, as can the extension to even higher dimensions. Therefore the extension of the presented result in $L^2(\mathbb{R}^3)$ to higher dimensions should be, if not straightforward, then at least achievable by the methodologies developed.

1.2 Modelling anisotropic features

The class of 2D cartoon-like images consists, as mentioned above, of piecewise $C^2$-smooth functions with discontinuities on a piecewise $C^2$-smooth curve, and this class has been investigated in a number of recent publications. The obvious extension to the 3D setting is to consider functions of three variables that are piecewise $C^2$-smooth with discontinuities on a piecewise $C^2$-smooth surface. In some applications the $C^2$-smoothness requirement is too strict, and we will, therefore, go one step further and consider a larger class of images also containing less regular images. The generalized class of cartoon-like images in 3D considered in this paper consists of three-dimensional piecewise $C^\beta$-smooth functions with discontinuities on a piecewise $C^\alpha$ surface for $\alpha \in (1,2]$. Clearly, this model provides us with two new smoothness parameters: $\beta$ being a classical smoothness parameter and $\alpha$ being an anisotropic smoothness parameter, see Figure LABEL:fig:cartoon-piecewise for an illustration.

This image class is unfortunately not a linear space as traditional smoothness spaces, e.g., Hölder, Besov, or Sobolev spaces, but it allows one to study the quality of the performance of representation systems with respect to capturing anisotropic features, something that is not possible with traditional smoothness spaces.

Finally, we mention that allowing piecewise $C^\alpha$-smoothness and not everywhere $C^\alpha$-smoothness is an essential way to model singularities along surfaces as well as along curves, which we already described as the two fundamental types of anisotropic phenomena in 3D.

1.3 Measure for Sparse Approximation and Optimality

The quality of the performance of a representation system with respect to cartoon-like images is typically measured by taking a non-linear approximation viewpoint. More precisely, given a cartoon-like image and a representation system, the chosen measure is the asymptotic behavior of the $L^2$ error of $N$-term (non-linear) approximations in the number of terms $N$. When the anisotropic smoothness $\alpha$ is bounded by the classical smoothness as $\alpha \le \frac{4}{3}\beta$, the anisotropic smoothness of the cartoon-like images will be the determining factor for the optimal approximation error rate one can obtain. To be more precise, as we will show in Section 3, the optimal approximation rate for the generalized 3D cartoon-like image model which can be achieved for a large class of adaptive and non-adaptive representation systems for $1 < \alpha \le \beta \le 2$ is

\[
  \|f - f_N\|_{L^2}^2 \le C \cdot N^{-\alpha/2} \quad \text{as } N \to \infty,
\]

for some constant $C > 0$, where $f_N$ is an $N$-term approximation of $f$. For cartoon-like images, wavelet and Fourier methods will typically have an $N$-term approximation error rate decaying as $N^{-1/2}$ and as $N^{-1/3}$, respectively, see [23]. Hence, as the anisotropic smoothness parameter $\alpha$ grows, the approximation quality of traditional tools becomes increasingly inferior as they will deliver approximation error rates that are far from the optimal rate $N^{-\alpha/2}$. Therefore, it is desirable and necessary to search for new representation systems that can provide us with representations whose rate is closer to the optimal one. This is where pyramid-adapted, hybrid shearlet systems enter the scene. As we will see in Section LABEL:sec:optimal-sparsity-3d, this type of representation system provides nearly optimally sparse approximations:

\[
  \|f - f_N\|_{L^2}^2 \le
  \begin{cases}
    C \cdot N^{-\alpha/2 + \tau}, & \text{if } \beta \in [\alpha, 2), \\
    C \cdot N^{-1} (\log N)^2, & \text{if } \beta = \alpha = 2,
  \end{cases}
  \quad \text{as } N \to \infty,
\]

where $f_N$ is the $N$-term approximation obtained by keeping the $N$ largest shearlet coefficients, and $\tau = \tau(\alpha)$ with $0 \le \tau < 0.04$ and $\tau \to 0$ for $\alpha \to 1+$ and for $\alpha \to 2-$. Clearly, the obtained sparse approximations for these shearlet systems are not truly optimal owing to the polynomial factor $N^{\tau}$ for $\alpha < 2$ and the polylog factor $(\log N)^2$ for $\alpha = 2$. On the other hand, it still shows that non-adaptive schemes such as the hybrid shearlet system can provide rates that are nearly optimal within a large class of adaptive and non-adaptive methods.

1.4 Construction of 3D hybrid shearlets

Shearlet theory has become a central tool in analyzing and representing 2D data with anisotropic features. Shearlet systems are systems of functions generated by one single generator with parabolic scaling, shearing, and translation operators applied to it, in much the same way wavelet systems are dyadic scalings and translations of a single function, but including a directionality characteristic owing to the additional shearing operation and the anisotropic scaling. Of the many directional representation systems proposed in the last decade, e.g., the steerable pyramid transform [29], directional filter banks [3], 2D directional wavelets [2], curvelets [6], contourlets [13], and bandelets [28], the shearlet system [25] is among the most versatile and successful. The reason for this is an extensive list of desirable properties: shearlet systems can be generated by one function, they precisely resolve wavefront sets, they allow compactly supported analyzing elements, they are associated with fast decomposition algorithms, and they provide a unified treatment of the continuum and the digital realm. We refer to [22] for a detailed review of the advantages and disadvantages of shearlet systems as opposed to other directional representation systems.

Several constructions of discrete band-limited and compactly supported 2D shearlet frames are already known, see [15, 21, 9, 26, 20, 11]; for the construction of 3D shearlet frames less is known. Dahlke, Steidl, and Teschke [10] recently generalized the shearlet group and the associated continuous shearlet transform to higher dimensions $\mathbb{R}^n$. Furthermore, in [10] they showed that, for certain band-limited generators, the continuous shearlet transform is able to identify hyperplane and tetrahedron singularities. Since this transform originates from a unitary group representation, it is not able to capture all directions; in particular, it will not capture the delta distribution on the $x_1$-axis (and more generally, any singularity with “$x_1$-directions”). We will use a different tiling of the frequency space, namely systems adapted to pyramids in frequency space, to avoid this non-uniformity of directions. We call these systems pyramid-adapted shearlet systems [22]. In [16], the continuous version of the pyramid-adapted shearlet system was introduced, and it was shown that the location and the local orientation of the boundary set of certain three-dimensional solid regions can be precisely identified by this continuous shearlet transform. Finally, we will also need to use a different scaling than the one from [10] in order to achieve shearlet systems that provide almost optimally sparse approximations.

Since spatial localization of the analyzing elements of the encoding system is very important both for a precise detection of geometric features and for a fast decomposition algorithm, we will mainly follow the sufficient conditions for and construction of compactly supported cone-adapted 2D shearlets by Kittipoom and two of the authors [20] and extend these results to the 3D setting (Section LABEL:sec:shearl-high-dimens). These results provide us with a large class of separable, compactly supported shearlet systems with “good” frame bounds, optimally sparse approximation properties, and associated numerically stable algorithms. One important new aspect is that the dilation will depend on the smoothness parameter $\alpha$. This will provide us with hybrid shearlet systems ranging from classical parabolic based shearlet systems ($\alpha = 2$) to almost classical wavelet systems ($\alpha \approx 1$). In other words, we obtain a parametrized family of shearlets with a smooth transition from (nearly) wavelets to shearlets. This will allow us to adjust our shearlet system according to the anisotropic smoothness of the data at hand. For rational values of $\alpha$ we can associate this hybrid system with a fast decomposition algorithm using the fast Fourier transform with multiplication and periodization in the frequency space (in place of convolution and down-sampling).

Our compactly supported 3D hybrid shearlet elements (introduced in Section LABEL:sec:shearl-high-dimens) will in the spatial domain be of size $2^{-j\alpha/2}$ times $2^{-j/2}$ times $2^{-j/2}$ for some fixed anisotropy parameter $1 < \alpha \le 2$. When $\alpha \approx 1$ this corresponds to “cube-like” (or “wavelet-like”) elements. As $\alpha$ approaches $2$ the scaling becomes less and less isotropic, yielding “plate-like” elements as $j \to \infty$. This indicates that these anisotropic 3D shearlet systems have been designed to efficiently capture two-dimensional anisotropic structures, while neglecting one-dimensional structures. Nonetheless, these 3D shearlet systems still perform optimally when representing and analyzing cartoon-like functions that have discontinuities on piecewise $C^\alpha$-smooth surfaces; as mentioned, such functions model 3D data that contain point, curve, and surface singularities.

Let us end this subsection with a general thought on the construction of band-limited tight shearlet frames versus compactly supported shearlet frames. There seems to be a trade-off between compact support of the shearlet generators, tightness of the associated frame, and separability of the shearlet generators. The known constructions of tight shearlet frames, even in 2D, do not use separable generators, and these constructions can be shown to not be applicable to compactly supported generators. Moreover, these tight frames use a modified version of the pyramid-adapted shearlet system in which not all elements are dilates, shears, and translations of a single function. Tightness is difficult to obtain while allowing for compactly supported generators, but we can gain separability as in Theorem LABEL:thm:compact-for-pyramid, and hence fast algorithmic realizations. On the other hand, when allowing non-compactly supported generators, tightness is possible, but separability seems to be out of reach, which makes fast algorithmic realizations very difficult.

1.5 Other approaches for 3D data

Other directional representation systems have been considered for the 3D setting. We mention curvelets [5, 4], surflets [8], and surfacelets [27]. This line of research is mostly concerned with constructions of such systems and not their sparse approximation properties with respect to cartoon-like images. In [8], however, the authors consider adaptive approximations of Horizon class functions using surflet dictionaries, which generalize the wedgelet dictionary for 2D signals to higher dimensions.

During the final stages of this project, we realized that a similar almost optimal sparsity result for the 3D setting (for the model case $\alpha = \beta = 2$) was reported by Guo and Labate [18] using band-limited shearlet tight frames. They provide a proof for the case where the discontinuity surface is (non-piecewise) $C^2$-smooth using the X-ray transform.

1.6 Outline

We give the precise definition of the generalized cartoon-like image model class in Section 2, and the optimal rate of approximation within this model is then derived in Section 3. In Section LABEL:sec:shearl-high-dimens and Section LABEL:sec:constr-comp-supp we construct the so-called pyramid-adapted shearlet frames with compactly supported generators. In Sections LABEL:sec:optimal-sparsity-3d to LABEL:sec:proof-theorem-1 we then prove that such shearlet systems indeed deliver nearly optimal sparse approximations of three-dimensional cartoon-like images. We extend this result to the situation of discontinuity surfaces which are piecewise $C^\alpha$-smooth except for zero- and one-dimensional singularities and again derive essentially optimal sparsity of the constructed shearlet frames in Section LABEL:sec:proof-theorem-2. We end the paper by discussing various possible extensions in Section LABEL:sec:extensions.

1.7 Notation

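We end this introduction by reviewing some basic definitions. The following definitions will mostly be used for the case $n = 3$, but they are stated for general $n \in \mathbb{N}$. For $x \in \mathbb{R}^n$ we denote the $p$-norm on $\mathbb{R}^n$ of $x$ by $\|x\|_p$. The Lebesgue measure on $\mathbb{R}^n$ is denoted by $\lambda$ and the counting measure by $\#$. Sets in $\mathbb{R}^n$ are either considered equal if they are equal up to sets of measure zero or if they are element-wise equal; it will always be clear from the context which definition is used. The $L^p$-norm of $f \in L^p(\mathbb{R}^n)$ is denoted by $\|f\|_{L^p}$. For $f \in L^1(\mathbb{R}^n)$, the Fourier transform is defined by

\[
  \hat{f}(\xi) = \int_{\mathbb{R}^n} f(x)\, e^{-2\pi i \langle \xi, x \rangle} \, dx
\]

with the usual extension to $L^2(\mathbb{R}^n)$. The Sobolev space and norm are defined as

\[
  H^s(\mathbb{R}^n) = \Big\{ f : \mathbb{R}^n \to \mathbb{C} \,:\, \|f\|_{H^s}^2 := \int_{\mathbb{R}^n} (1 + |\xi|^2)^s \, |\hat{f}(\xi)|^2 \, d\xi < +\infty \Big\}.
\]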

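For functions $f : \mathbb{R}^n \to \mathbb{C}$ the homogeneous Hölder seminorm is given by

\[
  \|f\|_{\dot{C}^\beta} := \max_{|\gamma| = \lfloor \beta \rfloor} \, \sup_{x \neq x' \in \mathbb{R}^n} \frac{|\partial^\gamma f(x) - \partial^\gamma f(x')|}{\|x - x'\|_2^{\{\beta\}}},
\]

where $\{\beta\} = \beta - \lfloor \beta \rfloor$ is the fractional part of $\beta$ and $|\gamma|$ is the usual length of a multi-index $\gamma = (\gamma_1, \dots, \gamma_n)$. Further, we let

\[
  \|f\|_{C^\beta} := \max_{|\gamma| \le \lfloor \beta \rfloor} \sup |\partial^\gamma f| + \|f\|_{\dot{C}^\beta},
\]

and we denote by $C^\beta(\mathbb{R}^n)$ the space of Hölder functions, i.e., functions $f : \mathbb{R}^n \to \mathbb{C}$ whose $C^\beta$-norm is bounded.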

2 Generalized 3D cartoon-like image model class

The first complete model of 2D cartoon-like images was introduced in [7], the basic idea being that a closed $C^2$-curve separates two $C^2$-smooth functions. For 3D cartoon-like images we consider square integrable functions of three variables that are piecewise $C^\beta$-smooth with discontinuities on a piecewise $C^\alpha$-smooth surface.

Fix the smoothness parameters $\alpha$ and $\beta$, and let $\rho : [0, 2\pi) \times [0, \pi] \to [0, \infty)$ be continuous and define the set $B$ in $\mathbb{R}^3$ by

\[
  B = \{ x \in \mathbb{R}^3 : \|x\|_2 \le \rho(\theta_1, \theta_2), \; x = (\|x\|_2, \theta_1, \theta_2) \text{ in spherical coordinates} \}.
\]

We require that the boundary $\partial B$ of $B$ is a closed surface parametrized by

\[
  b(\theta_1, \theta_2) =
  \begin{pmatrix}
    \rho(\theta_1, \theta_2) \cos(\theta_1) \sin(\theta_2) \\
    \rho(\theta_1, \theta_2) \sin(\theta_1) \sin(\theta_2) \\
    \rho(\theta_1, \theta_2) \cos(\theta_2)
  \end{pmatrix},
  \qquad \theta = (\theta_1, \theta_2) \in [0, 2\pi) \times [0, \pi].
  \tag{2.1}
\]

Furthermore, the radius function $\rho$ must be Hölder continuous with coefficient $\nu$, i.e.,

\[
  \|\rho\|_{\dot{C}^\alpha} \le \nu.
  \tag{2.2}
\]

For $\nu > 0$, the set $\mathit{STAR}^\alpha(\nu)$ is defined to be the set of all $B \subset [0,1]^3$ such that $B$ is a translate of a set obeying (2.1) and (2.2). The boundaries of the sets in $\mathit{STAR}^\alpha(\nu)$ will be the discontinuity sets of our cartoon-like images. We remark that any starshaped set in $[0,1]^3$ with bounded principal curvatures will belong to $\mathit{STAR}^2(\nu)$ for some $\nu$. Actually, the property that the sets in $\mathit{STAR}^\alpha(\nu)$ are parametrized by spherical angles, which implies that the sets are starshaped, is not important to us. For $\alpha = 2$ we could, e.g., extend $\mathit{STAR}^2(\nu)$ to be all bounded subsets of $[0,1]^3$ whose boundary is a closed $C^2$ surface with principal curvatures bounded by $\nu$.
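The spherical parametrization of the boundary and the membership condition for the set $B$ can be sketched numerically. The following is a minimal illustration only: the radius function `bumpy` is a hypothetical example, not one used in the paper, and the Hölder bound on the radius function is not checked.

```python
import math

def boundary_point(rho, theta1, theta2):
    """Point b(theta1, theta2) on the boundary of a starshaped set, as in (2.1)."""
    r = rho(theta1, theta2)
    return (
        r * math.cos(theta1) * math.sin(theta2),
        r * math.sin(theta1) * math.sin(theta2),
        r * math.cos(theta2),
    )

def in_star_set(rho, x):
    """Membership test for B = {x : ||x||_2 <= rho(theta1, theta2)}."""
    norm = math.sqrt(sum(c * c for c in x))
    if norm == 0.0:
        return True  # the origin lies in B whenever rho >= 0
    theta2 = math.acos(max(-1.0, min(1.0, x[2] / norm)))  # polar angle in [0, pi]
    theta1 = math.atan2(x[1], x[0]) % (2.0 * math.pi)      # azimuth in [0, 2*pi)
    return norm <= rho(theta1, theta2)

# Hypothetical radius function: a gently perturbed sphere (illustration only).
def bumpy(theta1, theta2):
    return 0.3 + 0.05 * math.sin(theta2) ** 2 * math.cos(3.0 * theta1)

p = boundary_point(bumpy, 1.0, 2.0)
print(in_star_set(bumpy, tuple(0.99 * c for c in p)))  # just inside the surface
print(in_star_set(bumpy, tuple(1.01 * c for c in p)))  # just outside the surface
```

Shrinking a boundary point towards the origin keeps it in $B$, which is exactly the starshapedness that the spherical parametrization encodes.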

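To allow more general discontinuity surfaces, we extend $\mathit{STAR}^\alpha(\nu)$ to a class of sets $B$ with piecewise $C^\alpha$ boundaries $\partial B$. We denote this class $\mathit{STAR}^\alpha(\nu, L)$, where $L \in \mathbb{N}$ is the number of $C^\alpha$ pieces and $\nu > 0$ is an upper bound for the “curvature” on each piece. In other words, we say that $B \in \mathit{STAR}^\alpha(\nu, L)$ if $B$ is a bounded subset of $[0,1]^3$ whose boundary $\partial B$ is a union of finitely many pieces $\partial B_1, \dots, \partial B_L$ which do not overlap except at their boundaries, and each patch $\partial B_i$ can be represented in parametric form $\rho_i = \rho_i(\theta_1, \theta_2)$ by a $C^\alpha$-smooth radius function with $\|\rho_i\|_{\dot{C}^\alpha} \le \nu$. We remark that we put no restrictions on how the patches meet; in particular, $B \in \mathit{STAR}^\alpha(\nu, L)$ can have arbitrarily sharp edges joining the pieces $\partial B_i$. Also note that $\mathit{STAR}^\alpha(\nu) = \mathit{STAR}^\alpha(\nu, 1)$.

The actual objects of interest to us are, as mentioned, not these starshaped sets, but functions that have the boundary $\partial B$ as discontinuity surface.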

Definition 2.1

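Let $\nu > 0$, $\mu > 0$, and $L \in \mathbb{N}$. Then $E^\beta_{\alpha,L}(\mathbb{R}^3)$ denotes the set of functions $f : \mathbb{R}^3 \to \mathbb{C}$ of the form

\[
  f = f_0 + f_1 \chi_B,
\]

where $B \in \mathit{STAR}^\alpha(\nu, L)$ and $f_i \in C^\beta(\mathbb{R}^3)$ with $\operatorname{supp} f_i \subset [0,1]^3$ and $\|f_i\|_{C^\beta} \le \mu$ for each $i = 0, 1$. We let $E^\beta_\alpha(\mathbb{R}^3) := E^\beta_{\alpha,1}(\mathbb{R}^3)$.

We speak of $E^\beta_{\alpha,L}(\mathbb{R}^3)$ as consisting of cartoon-like 3D images having $C^\beta$-smoothness apart from a piecewise $C^\alpha$ discontinuity surface. We stress that $E^\beta_{\alpha,L}(\mathbb{R}^3)$ is not a linear space of functions and that $E^\beta_{\alpha,L}(\mathbb{R}^3)$ depends on the constants $\nu$ and $\mu$ even though we suppress this in the notation. Finally, we let $E^{\mathrm{bin}}_{\alpha,L}(\mathbb{R}^3)$ denote binary cartoon-like images, that is, functions $f = f_0 + f_1 \chi_B \in E^\beta_{\alpha,L}(\mathbb{R}^3)$, where $f_0 = 0$ and $f_1 = 1$.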

3 Optimality bound for sparse approximations

After having clarified the model situation $E^\beta_{\alpha,L}(\mathbb{R}^3)$, we will now discuss which measure for the accuracy of approximation by representation systems we choose, and what optimality means in this case. We will later in Section LABEL:sec:optimal-sparsity-3d restrict the parameter range in our model class $E^\beta_{\alpha,L}(\mathbb{R}^3)$ to $1 < \alpha \le \beta \le 2$. In this section, however, we will find the theoretical optimal approximation error rate within $E^\beta_{\alpha,L}(\mathbb{R}^3)$ for the full range $1 < \alpha \le 2$ and $\beta \ge 0$. Before we state and prove the main optimal sparsity result of this section, Theorem 3.2, we discuss the notions of $N$-term approximations and frames.

3.1 N-term approximations

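Let $\Phi = \{\phi_i\}_{i \in I}$ be a dictionary with the index set $I$ not necessarily being countable. We seek to approximate each single element of $E^\beta_{\alpha,L}(\mathbb{R}^3)$ with elements from $\Phi$ by $N$ terms of this system. For this, let $f \in E^\beta_{\alpha,L}(\mathbb{R}^3)$ be arbitrarily chosen. Letting now $N \in \mathbb{N}$, we consider $N$-term approximations of $f$, i.e.,

\[
  \sum_{i \in I_N} c_i \phi_i \qquad \text{with } I_N \subset I, \; \# I_N = N.
\]

The best $N$-term approximation to $f$ is an $N$-term approximation

\[
  f_N = \sum_{i \in I_N} c_i \phi_i,
\]

which satisfies that, for all $I_N \subset I$, $\# I_N = N$, and for all scalars $(c_i)_{i \in I_N}$,

\[
  \|f - f_N\|_{L^2} \le \Big\| f - \sum_{i \in I_N} c_i \phi_i \Big\|_{L^2}.
\]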

3.2 Frames

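A frame for a separable Hilbert space $\mathcal{H}$ is a countable collection of vectors $\{f_j\}_{j \in J}$ for which there are constants $0 < A \le B < \infty$ such that

\[
  A \|f\|^2 \le \sum_{j \in J} |\langle f, f_j \rangle|^2 \le B \|f\|^2 \qquad \text{for all } f \in \mathcal{H}.
\]

If the upper bound in this inequality holds, then $\{f_j\}_{j \in J}$ is said to be a Bessel sequence with Bessel constant $B$. For a Bessel sequence $\{f_j\}_{j \in J}$, we define the frame operator of $\{f_j\}_{j \in J}$ by

\[
  S : \mathcal{H} \to \mathcal{H}, \qquad S f = \sum_{j \in J} \langle f, f_j \rangle f_j.
\]

If $\{f_j\}_{j \in J}$ is a frame, this operator is bounded, invertible, and positive. A frame is said to be tight if we can choose $A = B$. If furthermore $A = B = 1$, the sequence is said to be a Parseval frame. Two Bessel sequences $\{f_j\}_{j \in J}$ and $\{g_j\}_{j \in J}$ are said to be dual frames if

\[
  f = \sum_{j \in J} \langle f, g_j \rangle f_j \qquad \text{for all } f \in \mathcal{H}.
\]

It can be shown that, in this case, both Bessel sequences are even frames, and we shall say that the frame $\{g_j\}_{j \in J}$ is dual to $\{f_j\}_{j \in J}$, and vice versa. At least one dual always exists; it is given by $\{S^{-1} f_j\}_{j \in J}$ and called the canonical dual.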
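The frame notions above can be made concrete in a finite-dimensional toy setting. The sketch below uses the three-vector “Mercedes-Benz” frame in $\mathbb{R}^2$, a standard textbook example and not a system from this paper, to compute the frame operator, the optimal frame bounds, the canonical dual, and the reconstruction formula; all function names are illustrative.

```python
import math

# Mercedes-Benz frame: three unit vectors in R^2 at 120-degree spacing.
frame = [(math.cos(a), math.sin(a))
         for a in (math.pi / 2,
                   math.pi / 2 + 2 * math.pi / 3,
                   math.pi / 2 + 4 * math.pi / 3)]

def inner(u, v):
    return u[0] * v[0] + u[1] * v[1]

def frame_operator(vectors):
    """2x2 matrix of S f = sum_j <f, f_j> f_j, stored as ((s00, s01), (s10, s11))."""
    s00 = sum(v[0] * v[0] for v in vectors)
    s01 = sum(v[0] * v[1] for v in vectors)
    s11 = sum(v[1] * v[1] for v in vectors)
    return ((s00, s01), (s01, s11))

def frame_bounds(S):
    """Optimal frame bounds A <= B are the eigenvalues of the symmetric matrix S."""
    tr = S[0][0] + S[1][1]
    det = S[0][0] * S[1][1] - S[0][1] * S[1][0]
    disc = math.sqrt(max(tr * tr / 4.0 - det, 0.0))
    return tr / 2.0 - disc, tr / 2.0 + disc

def canonical_dual(vectors):
    """Canonical dual frame {S^{-1} f_j}."""
    S = frame_operator(vectors)
    det = S[0][0] * S[1][1] - S[0][1] * S[1][0]
    inv = ((S[1][1] / det, -S[0][1] / det), (-S[1][0] / det, S[0][0] / det))
    return [(inv[0][0] * v[0] + inv[0][1] * v[1],
             inv[1][0] * v[0] + inv[1][1] * v[1]) for v in vectors]

def reconstruct(f, vectors, dual):
    """Evaluate sum_j <f, f_j> dual_j, which equals f for a frame and its dual."""
    coeffs = [inner(f, v) for v in vectors]
    return (sum(c * d[0] for c, d in zip(coeffs, dual)),
            sum(c * d[1] for c, d in zip(coeffs, dual)))

A, B = frame_bounds(frame_operator(frame))
print(A, B)  # both bounds coincide up to rounding, so this frame is tight
print(reconstruct((0.7, -1.2), frame, canonical_dual(frame)))
```

For this tight frame the frame operator is a multiple of the identity, so the canonical dual is just a rescaled copy of the frame itself; for the non-tight frames considered later this is no longer the case.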

Now, suppose the dictionary $\Phi$ forms a frame for $L^2(\mathbb{R}^3)$ with frame bounds $A$ and $B$, and let $\{\tilde{\phi}_i\}_{i \in I}$ denote the canonical dual frame. We then consider the expansion of $f$ in terms of this dual frame, i.e.,

\[
  f = \sum_{i \in I} \langle f, \phi_i \rangle \tilde{\phi}_i.
\]

For any $f \in L^2(\mathbb{R}^3)$ we have $(\langle f, \phi_i \rangle)_{i \in I} \in \ell^2(I)$ by definition. Since we only consider expansions of functions $f$ belonging to a subset $E^\beta_{\alpha,L}(\mathbb{R}^3)$ of $L^2(\mathbb{R}^3)$, this can, at least potentially, improve the decay rate of the coefficients so that they belong to $\ell^p(I)$ for some $p < 2$. This is exactly what is understood by sparse approximation (also called compressible approximation). We hence aim to analyze shearlets with respect to this behavior, i.e., the decay rate of shearlet coefficients.

For frames, tight and non-tight, it is not possible to derive a usable, explicit form for the best $N$-term approximation. We therefore crudely approximate the best $N$-term approximation by choosing the $N$-term approximation provided by the indices $I_N$ associated with the $N$ largest coefficients $\langle f, \phi_i \rangle$ in magnitude with these coefficients, i.e.,

\[
  f_N = \sum_{i \in I_N} \langle f, \phi_i \rangle \tilde{\phi}_i.
\]

However, even with this rather crude greedy selection procedure, we obtain very strong results for the approximation rate of shearlets as we will see in Section LABEL:sec:optimal-sparsity-3d.
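The greedy selection rule is simple enough to state as code. The sketch below works on a finite list of hypothetical coefficients; it only selects the index set $I_N$ and evaluates the energy of the discarded coefficients, which is the quantity that controls the approximation error for a frame.

```python
def largest_n_indices(coeffs, n):
    """Indices of the n coefficients largest in magnitude (ties broken by position)."""
    order = sorted(range(len(coeffs)), key=lambda i: abs(coeffs[i]), reverse=True)
    return sorted(order[:n])

def tail_energy(coeffs, kept):
    """Sum of |c_i|^2 over the indices that were not kept."""
    kept = set(kept)
    return sum(abs(c) ** 2 for i, c in enumerate(coeffs) if i not in kept)

coeffs = [0.1, -3.0, 2.0, 0.5]       # hypothetical frame coefficients <f, phi_i>
kept = largest_n_indices(coeffs, 2)  # index set I_N for N = 2
print(kept, tail_energy(coeffs, kept))
```

Note that the selection depends only on the magnitudes of the analysis coefficients and never inspects the dual frame elements, which is why it is a crude substitute for the best $N$-term approximation.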

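The following well-known result shows how the $N$-term approximation error can be bounded by the tail of the square of the coefficients $c_i = \langle f, \phi_i \rangle$. We refer to [23] for a proof.

Lemma 3.1

Let $\{\phi_i\}_{i \in I}$ be a frame for $\mathcal{H}$ with frame bounds $A$ and $B$, and let $\{\tilde{\phi}_i\}_{i \in I}$ be the canonical dual frame. Let $I_N \subset I$ with $\# I_N = N$, and let $f_N$ be the $N$-term approximation $f_N = \sum_{i \in I_N} \langle f, \phi_i \rangle \tilde{\phi}_i$. Then

\[
  \|f - f_N\|^2 \le \frac{1}{A} \sum_{i \notin I_N} |\langle f, \phi_i \rangle|^2
\]

for any $f \in \mathcal{H}$.

Let $c^*$ denote the non-increasing (in modulus) rearrangement of $c = (c_i)_{i \in I} = (\langle f, \phi_i \rangle)_{i \in I}$, i.e., $c^*_n$ denotes the $n$th largest coefficient of $c$ in modulus. This rearrangement corresponds to a bijection $\pi : \mathbb{N} \to I$ that satisfies

\[
  c_{\pi(n)} = c^*_n \qquad \text{for all } n \in \mathbb{N}.
\]

Since $c \in \ell^2(I)$, also $c^* \in \ell^2(\mathbb{N})$. Let $f$ be a cartoon-like image, and suppose that $|c^*_n|$, in this case, even decays as

\[
  |c^*_n| \lesssim n^{-(\alpha+2)/4} \qquad \text{for } n \to \infty
  \tag{3.1}
\]

for some $\alpha > 0$, where the notation $h(n) \lesssim g(n)$ means that there exists a $C > 0$ such that $h(n) \le C g(n)$, i.e., $h(n) = O(g(n))$. Clearly, we then have $c^* \in \ell^p(\mathbb{N})$ for $p > \frac{4}{\alpha+2}$. By Lemma 3.1, the $N$-term approximation error will therefore decay as

\[
  \|f - f_N\|^2 \le \frac{1}{A} \sum_{n > N} |c^*_n|^2 \lesssim \sum_{n > N} n^{-(\alpha+2)/2} \asymp N^{-\alpha/2},
  \tag{3.2}
\]

where $f_N$ is the $N$-term approximation of $f$ by keeping the $N$ largest coefficients, that is,

\[
  f_N = \sum_{n=1}^{N} c^*_n \, \tilde{\phi}_{\pi(n)}.
  \tag{3.3}
\]

The notation $h(n) \asymp g(n)$, sometimes also written as $h(n) = \Theta(g(n))$, used above means that $h$ is bounded both above and below by $g$ asymptotically as $n \to \infty$, that is, $h(n) = O(g(n))$ and $g(n) = O(h(n))$. The approximation error rate obtained in (3.2) is exactly the sought optimal rate mentioned in the introduction. This illustrates that the fraction $\frac{\alpha+2}{4}$ introduced in the decay of the sequence $c^*$ will play a major role in the following. In particular, we are searching for a representation system which forms a frame and delivers decay of $c^*$ as in (3.1) for any cartoon-like image.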
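The passage from coefficient decay to error decay rests on comparing a tail sum with an integral: $\sum_{n>N} n^{-(\alpha+2)/2}$ behaves like $\int_N^\infty x^{-(\alpha+2)/2}\,dx = \frac{2}{\alpha} N^{-\alpha/2}$. A small numerical sanity check, assuming nothing beyond this comparison (the cutoff of the infinite sum is an arbitrary choice made here), could look as follows.

```python
def tail_sum(alpha, n_terms, cutoff=200_000):
    """Partial tail sum_{n = N+1}^{cutoff} n^{-(alpha + 2) / 2}."""
    exponent = (alpha + 2.0) / 2.0
    return sum(n ** -exponent for n in range(n_terms + 1, cutoff + 1))

def predicted_tail(alpha, n_terms):
    """Integral comparison: int_N^infty x^{-(alpha+2)/2} dx = (2 / alpha) * N^{-alpha/2}."""
    return (2.0 / alpha) * n_terms ** (-alpha / 2.0)

for alpha in (1.5, 2.0):
    for n_terms in (100, 1000):
        ratio = tail_sum(alpha, n_terms) / predicted_tail(alpha, n_terms)
        print(f"alpha={alpha}, N={n_terms}: ratio = {ratio:.4f}")
```

The ratios stay slightly below one, since the summand is decreasing and the sum is truncated, and they do not drift with $N$; this is the content of the relation $\asymp$ in the error estimate.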

3.3 Optimal sparsity

In this subsection we will state and prove the main result of this section, Theorem 3.2, but let us first discuss some of its implications for sparse approximations of cartoon-like images.

From the dictionary $\Phi = \{\phi_i\}_{i \in I}$ with the index set $I$ not necessarily being countable, we consider expansions of the form

\[
  f = \sum_{i \in I_f} c_i \phi_i,
  \tag{3.4}
\]

where $I_f \subset I$ is a countable selection from $\Phi$ that may depend on $f$. Moreover, we can assume that the $\phi_i$ are normalized by $\|\phi_i\|_{L^2} = 1$. The selection of the $i$th term is obtained according to a selection rule $\sigma(i, f)$ which may adaptively depend on $f$. Actually, the $i$th element may also be modified adaptively and depend on the first $i - 1$ chosen elements [14]. We assume that how deep or how far down in the indexed dictionary $\Phi$ we are allowed to search for the next element in the approximation is limited by a polynomial $\pi(i)$. Without such a depth search limit, one could choose $\Phi$ to be a countable, dense subset of $L^2(\mathbb{R}^3)$, which would yield arbitrarily good sparse approximations, but also approximations that are infeasible in practice. We shall denote any sequence of coefficients chosen according to these restrictions by $c(f) = (c_i)_i$.

We are now ready to state the main result of this section. Following Donoho [14] we say that a function class $\mathcal{F}$ contains an embedded orthogonal hypercube of dimension $m$ and side $\delta$ if there exist $f_0 \in \mathcal{F}$ and orthogonal functions $\psi_{i,m,\delta}$, $i = 1, \dots, m$, with $\|\psi_{i,m,\delta}\|_{L^2} = \delta$, such that the collection of hypercube vertices

\[
  H(m; f_0, \{\psi_i\}) := \Big\{ f_0 + \sum_{i=1}^{m} \xi_i \psi_{i,m,\delta} \,:\, \xi_i \in \{0, 1\} \Big\}
\]

is contained in $\mathcal{F}$. The sought bound on the optimal sparsity within the set of cartoon-like images will be obtained by showing that the cartoon-like image class contains sufficiently high-dimensional hypercubes with sufficiently large sidelength; intuitively, we will see that a certain high complexity of the set of cartoon-like images limits the possible sparsity level. The meaning of “sufficiently” is made precise by the following definition. We say that a function class $\mathcal{F}$ contains a copy of $\ell^p_0$ if $\mathcal{F}$ contains embedded orthogonal hypercubes of dimension $m(\delta)$ and side $\delta$, and if, for some sequence $\delta_k \to 0$ and some constant $C > 0$:

\[
  m(\delta_k) \ge C \delta_k^{-p}, \qquad k = k_0, k_0 + 1, \dots
  \tag{3.5}
\]

The first part of the following result is an extension from the 2D to the 3D setting of [14, Thm. 3].

Theorem 3.2
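(i) The class of binary cartoon-like images $E^{\mathrm{bin}}_\alpha(\mathbb{R}^3)$ contains a copy of $\ell^p_0$ for $p = \frac{4}{\alpha+2}$.

(ii) The space of Hölder functions $C^\beta(\mathbb{R}^3)$ with compact support in $[0,1]^3$ contains a copy of $\ell^p_0$ for $p = \frac{6}{2\beta+3}$.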

Before providing a proof of the theorem, let us discuss some of its implications for sparse approximations of cartoon-like images. Theorem 3.2(i) implies, by [14, Theorem 2], that for every $p < \frac{4}{\alpha+2}$ and every method of atomic decomposition based on polynomial $\pi$ depth search from any countable dictionary $\Phi$, we have for $f \in E^\beta_{\alpha,L}(\mathbb{R}^3)$:

\[
  \max_{f \in E^\beta_{\alpha,L}(\mathbb{R}^3)} \|c(f)\|_{w\ell^p} = +\infty,
  \tag{3.6}
\]

where the weak-$\ell^p$ “norm” is defined as $\|c\|_{w\ell^p} = \sup_{n > 0} n^{1/p} |c^*_n|$. (Note that neither $\|\cdot\|_{w\ell^p}$ nor $\|\cdot\|_{\ell^p}$ (for $p < 1$) is a norm since they do not satisfy the triangle inequality. Note also that the weak-$\ell^p$ norm is a special case of the Lorentz quasinorm.) Sparse approximations are approximations of the form $\sum_i c_i \phi_i$ with coefficients $c^*_n$ decaying at a certain, hopefully high, rate. Equation (3.6) is a precise statement of the optimal achievable sparsity level. No representation system (up to the restrictions described above) can deliver expansions (3.4) for $E^\beta_{\alpha,L}(\mathbb{R}^3)$ with coefficients satisfying $c(f) \in w\ell^p$ for $p < \frac{4}{\alpha+2}$. As we will see in Theorems LABEL:thm:3d-opt-sparse and LABEL:thm:3d-opt-sparse-piecewise, pyramid-adapted shearlet frames deliver coefficient sequences in $w\ell^p$ for $p = \frac{4}{\alpha+2-2\tau}$, where $0 \le \tau < 0.04$.
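For a finite coefficient sequence the weak-$\ell^p$ quasinorm can be evaluated directly from the non-increasing rearrangement. The following sketch, with an illustrative function name and a synthetic model sequence, shows how the quantity $\sup_n n^{1/p}|c^*_n|$ stays bounded at the critical exponent and grows with the sequence length below it.

```python
def weak_lp_quasinorm(coeffs, p):
    """sup_n n^{1/p} |c*_n| over the non-increasing rearrangement c* of a finite, non-empty sequence."""
    ordered = sorted((abs(c) for c in coeffs), reverse=True)
    return max((n ** (1.0 / p)) * c for n, c in enumerate(ordered, start=1))

# Model sequence with decay n^{-(alpha+2)/4} for alpha = 2, i.e. |c_n| = 1/n.
model = [n ** -1.0 for n in range(1, 10_001)]

print(weak_lp_quasinorm(model, 1.0))  # critical exponent p = 4/(alpha+2) = 1: stays bounded
print(weak_lp_quasinorm(model, 0.8))  # p below the critical value: grows with the length
```

For the truncated model sequence the second value grows like the fourth root of the number of coefficients, so the corresponding infinite sequence does not lie in weak-$\ell^{0.8}$.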

Assume for a moment that we have an “optimal” dictionary $\Phi$ at hand that delivers $c(f) \in w\ell^{4/(\alpha+2)}$, and assume further that it is also a frame. As we saw in Section 3.2, this implies that

\[
  \|f - f_N\|_{L^2}^2 \lesssim N^{-\alpha/2} \qquad \text{as } N \to \infty,
\]

where $f_N$ is the $N$-term approximation of $f$ by keeping the $N$ largest coefficients. Therefore, no frame representation system can deliver a better approximation error rate than $O(N^{-\alpha/2})$ under the chosen approximation procedure within the image model class $E^\beta_{\alpha,L}(\mathbb{R}^3)$. If $\Phi$ is actually an orthonormal basis, then this is truly the optimal rate since best $N$-term approximations, in this case, are obtained by keeping the $N$ largest coefficients.

Similarly, Theorem 3.2(ii) tells us that the optimal approximation error rate within the Hölder function class is $O(N^{-2\beta/3})$. Combining the two estimates we see that the optimal approximation error rate within the full cartoon-like image class $E^\beta_{\alpha,L}(\mathbb{R}^3)$ cannot exceed $N^{-\min\{\alpha/2,\, 2\beta/3\}}$ convergence. For the parameter range $1 < \alpha \le \beta \le 2$, this rate reduces to $N^{-\alpha/2}$. For $\alpha = \beta = 2$, as we will show in Section LABEL:sec:optimal-sparsity-3d, shearlet systems actually deliver this rate except for an additional polylog factor, namely $N^{-1} (\log N)^2$. For $\alpha < 2$ and $\beta \ge \alpha$, the log-factor is replaced by a small polynomial factor $N^{\tau(\alpha)}$, where $\tau(\alpha) < 0.04$ and $\tau(\alpha) \to 0$ for $\alpha \to 1+$ or $\alpha \to 2-$.

It is striking that one is able to obtain such a near optimal approximation error rate since the shearlet system as well as the approximation procedure will be non-adaptive; in particular, since traditional, non-adaptive representation systems such as Fourier series and wavelet systems are far from providing an almost optimal approximation rate. This is illustrated in the following example.

Example 1

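Let $B = B(x, r)$ be the ball in $\mathbb{R}^3$ with center $x$ and radius $r$. Define $f = \chi_B$. Clearly, $f \in E^2_2(\mathbb{R}^3)$ if $B \subset [0,1]^3$. Suppose $\Phi = \{e^{2\pi i \langle k, \cdot \rangle}\}_{k \in \mathbb{Z}^3}$. The best $N$-term Fourier sum $f_N$ yields

\[
  \|f - f_N\|_{L^2}^2 \asymp N^{-1/3} \qquad \text{for } N \to \infty,
\]

which is far from the optimal rate $N^{-1}$. For the wavelet case the situation is only slightly better. Suppose $\Phi$ is any compactly supported wavelet basis. Then

\[
  \|f - f_N\|_{L^2}^2 \asymp N^{-1/2} \qquad \text{for } N \to \infty,
\]

where $f_N$ is the best $N$-term approximation from $\Phi$. The calculations leading to these estimates are not difficult, and we refer to [23] for the details. We will later see that shearlet frames yield $\|f - f_N\|_{L^2}^2 \lesssim N^{-1} (\log N)^2$, where $f_N$ is the best $N$-term approximation.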
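To get a feeling for what the different exponents mean, one can ask how many terms each rate needs to reach a given squared error. The sketch below is a back-of-the-envelope calculation only: all constants are set to one and the logarithmic factor of the shearlet rate is ignored, both of which are simplifying assumptions made here.

```python
import math

def terms_needed(target_error, rate_exponent):
    """Approximate N with N^{-rate_exponent} <= target_error, all constants set to 1."""
    return math.ceil(target_error ** (-1.0 / rate_exponent))

target = 0.01  # hypothetical target for the squared L2 error
fourier = terms_needed(target, 1.0 / 3.0)  # rate N^{-1/3}
wavelet = terms_needed(target, 1.0 / 2.0)  # rate N^{-1/2}
optimal = terms_needed(target, 1.0)        # rate N^{-1}
print(fourier, wavelet, optimal)
```

Under these simplifications the required number of terms grows like the third, second, and first power of the inverse error, respectively.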

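We mention that the rates obtained in Example 1 are typical in the sense that most cartoon-like images will yield the exact same (and far from optimal) rates.

Finally, we end the subsection with a proof of Theorem 3.2.

Proof of Theorem 3.2. The idea behind the proofs is to construct a collection of functions in $E^{\mathrm{bin}}_\alpha(\mathbb{R}^3)$ and $C^\beta(\mathbb{R}^3)$, respectively, such that the collection of functions will be vertices of a hypercube with dimension satisfying (3.5).

(i): Let $\varphi_1$ and $\varphi_2$ be smooth functions with compact support $\operatorname{supp} \varphi_1 \subset (0, 2\pi)$ and $\operatorname{supp} \varphi_2 \subset (0, \pi)$. For $A > 0$ and $m \in \mathbb{N}$ to be determined, we define:

\[
  \varphi_{i,m}(t) = \varphi_{i_1,i_2,m}(t) = A m^{-\alpha} \varphi_1(m t_1 - 2\pi i_1) \, \varphi_2(m t_2 - \pi i_2),
\]

for $i_1, i_2 \in \{0, \dots, m-1\}$, where $i = (i_1, i_2)$ and $t = (t_1, t_2)$. We let further $\varphi(t) = \varphi_1(t_1) \varphi_2(t_2)$. It is easy to see that $\|\varphi_{i,m}\|_{L^1} = A m^{-\alpha-2} \|\varphi\|_{L^1}$. Moreover, it can also be shown that $\|\varphi_{i,m}\|_{\dot{C}^\alpha} = A \|\varphi\|_{\dot{C}^\alpha}$, where $\|\cdot\|_{\dot{C}^\alpha}$ denotes the homogeneous Hölder norm introduced in (2.2).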

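Without loss of generality, we can consider the cartoon-like images translated by $-(\tfrac12, \tfrac12, \tfrac12)$ so that their support lies in $[-\tfrac12, \tfrac12]^3$. Alternatively, we can fix an origin at $(\tfrac12, \tfrac12, \tfrac12)$, and use spherical coordinates $(\rho, \theta_1, \theta_2)$ relative to this choice of origin. We set $\rho_0 = \tfrac14$ and define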

 ψi,m=χ{ρ0<ρ≤ρ0+φi,m}for i1,i2∈{0,…,m−1}.

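The radius functions $\rho_\gamma$ for $\gamma = (\gamma_{i_1,i_2})_{i_1,i_2 \in \{0,\dots,m-1\}}$ with $\gamma_{i_1,i_2} \in \{0, 1\}$ defined by

\[
  \rho_\gamma(\theta_1, \theta_2) = \rho_0 + \sum_{i_1=0}^{m-1} \sum_{i_2=0}^{m-1} \gamma_{i_1,i_2} \, \varphi_{i,m}(\theta_1, \theta_2),
  \tag{3.7}
\]

determine the discontinuity surfaces of the functions of the form:

\[
  f_\gamma = \chi_{\{\rho \le \rho_0\}} + \sum_{i_1=0}^{m-1} \sum_{i_2=0}^{m-1} \gamma_{i_1,i_2} \, \psi_{i,m} \qquad \text{for } \gamma_{i_1,i_2} \in \{0, 1\}.
\]

For a fixed $m$ the functions $\psi_{i,m}$ are disjointly supported and therefore mutually orthogonal. Hence, $H(m^2, \chi_{\{\rho \le \rho_0\}}, \{\psi_{i,m}\})$ is a collection of hypercube vertices. Moreover,

\[
  \begin{aligned}
    \|\psi_{i,m}\|_{L^2}^2
      &= \lambda\big( \{ (\rho, \theta_1, \theta_2) : \rho_0 \le \rho \le \rho_0 + \varphi_{i,m}(\theta_1, \theta_2) \} \big) \\
      &\le \int_0^{2\pi} \int_0^{\pi} \int_{\rho_0}^{\rho_0 + \varphi_{i,m}(\theta_1, \theta_2)} \rho^2 \sin\theta_2 \, d\rho \, d\theta_2 \, d\theta_1 \\
      &\le C_0 \, m^{-\alpha-2} \|\varphi\|_{L^1},
  \end{aligned}
\]

where the constant $C_0$ only depends on $A$. Any radius function $\rho_\gamma$ of the form (3.7) satisfies

\[
  \|\rho_\gamma\|_{\dot{C}^\alpha} \le \|\varphi_{i,m}\|_{\dot{C}^\alpha} = A \|\varphi\|_{\dot{C}^\alpha}.
\]

Therefore, $\|\rho_\gamma\|_{\dot{C}^\alpha} \le \nu$ whenever $A \le \nu / \|\varphi\|_{\dot{C}^\alpha}$. This shows that we have the hypercube embedding

\[
  H(m^2, \chi_{\{\rho \le \rho_0\}}, \{\psi_{i,m}\}) \subset E^{\mathrm{bin}}_\alpha(\mathbb{R}^3).
\]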

The side length of the hypercube satisfies

whenever . Now, we finally choose and as

By this choice, we have for sufficiently small . Hence, is a hypercube of side length and dimension embedded in . We obviously have , thus the dimension of the hypercube obeys

\[
  d \ge C_2 \, \delta^{-\frac{4}{\alpha+2}}
\]

for all sufficiently small $\delta > 0$.

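(ii): Let $\varphi \in C_0^\infty(\mathbb{R})$ with compact support $\operatorname{supp} \varphi \subset (0, 1)$. For $m \in \mathbb{N}$ to be determined, we define for $i_1, i_2, i_3 \in \{0, \dots, m-1\}$:

\[
  \psi_{i,m}(t) = \psi_{i_1,i_2,i_3,m}(t) = m^{-\beta} \varphi(m t_1 - i_1) \, \varphi(m t_2 - i_2) \, \varphi(m t_3 - i_3),
\]

where $i = (i_1, i_2, i_3)$ and $t = (t_1, t_2, t_3)$. We let $\psi(t) = \varphi(t_1) \varphi(t_2) \varphi(t_3)$. It is easy to see that $\|\psi_{i,m}\|_{L^2}^2 = m^{-2\beta-3} \|\psi\|_{L^2}^2$. We note that the functions $\psi_{i,m}$ are disjointly supported (for a fixed $m$) and therefore mutually orthogonal. Thus we have the hypercube embedding

\[
  H(m^3, 0, \{\psi_{i,m}\}) \subset C^\beta(\mathbb{R}^3),
\]

where the side length of the hypercube is $\delta = \|\psi_{i,m}\|_{L^2} = m^{-\beta-3/2} \|\psi\|_{L^2}$. Now, choose $m$ as

\[
  m(\delta) = \left\lfloor \left( \frac{\delta}{\|\psi\|_{L^2}} \right)^{-1/(\beta+3/2)} \right\rfloor.
\]

Hence, $H$ is a hypercube of side length $\delta$ and dimension $d = m(\delta)^3$ embedded in $C^\beta(\mathbb{R}^3)$. The dimension of the hypercube obeys

\[
  d \ge C \, \delta^{-3 \cdot \frac{1}{\beta+3/2}} = C \, \delta^{-\frac{6}{2\beta+3}},
\]

for all sufficiently small $\delta > 0$.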
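The counting argument in part (ii) can be checked numerically: for a given side length $\delta$ one computes the grid size $m(\delta)$, the hypercube dimension $m(\delta)^3$, and compares it with $\delta^{-6/(2\beta+3)}$. In the sketch below the normalization $\|\psi\|_{L^2} = 1$ and the constant $C = 1/8$ are choices made here for illustration (the latter uses $\lfloor x \rfloor \ge x/2$ for $x \ge 1$); they are not taken from the proof.

```python
import math

def hypercube_dimension(delta, beta, psi_norm=1.0):
    """Grid size m(delta) and hypercube dimension d = m(delta)^3 from part (ii)."""
    m = math.floor((delta / psi_norm) ** (-1.0 / (beta + 1.5)))
    return m, m ** 3

def lower_bound(delta, beta, constant=0.125):
    """C * delta^{-6 / (2 beta + 3)} with the illustrative constant C = 1/8."""
    return constant * delta ** (-6.0 / (2.0 * beta + 3.0))

for delta in (1e-2, 1e-4, 1e-6):
    m, d = hypercube_dimension(delta, beta=2.0)
    print(delta, m, d, d >= lower_bound(delta, beta=2.0))
```

As $\delta$ decreases the dimension grows at the predicted polynomial rate, which is what the definition of a copy of $\ell^p_0$ with $p = 6/(2\beta+3)$ requires.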

3.4 Higher dimensions

Our main focus is, as mentioned above, the three-dimensional setting, but let us briefly sketch how the optimal sparsity result extends to higher dimensions. The $d$-dimensional cartoon-like image class $E^\beta_\alpha(\mathbb{R}^d)$ consists of functions having $C^\beta$-smoothness apart from a $(d-1)$-dimensional $C^\alpha$-smooth discontinuity surface. The $d$-dimensional analogue of Theorem 3.2 is then straightforward to prove.

Theorem 3.3
(i) The class of $d$-dimensional binary cartoon-like images $E^{\mathrm{bin}}_\alpha(\mathbb{R}^d)$ contains a copy of $\ell^p_0$ for $p = \frac{2(d-1)}{\alpha+d-1}$.

(ii) The space of Hölder functions $C^\beta(\mathbb{R}^d)$ contains a copy of $\ell^p_0$ for $p = \frac{2d}{2\beta+d}$.

It is then intriguing to analyze the behavior of $p = \frac{2(d-1)}{\alpha+d-1}$ and $p = \frac{2d}{2\beta+d}$ from Theorem 3.3. In fact, as $d \to \infty$, we observe that $p \to 2$ in both cases. Thus, the decay of any $c(f)$ for cartoon-like images becomes slower as $d$ grows and approaches $\ell^2$, which is actually the rate guaranteed for all $f \in L^2(\mathbb{R}^d)$.

Moreover, by Theorem 3.3 we see that the optimal approximation error rate for $N$-term approximations $f_N$ within the class of $d$-dimensional cartoon-like images is $N^{-\min\{\alpha/(d-1),\, 2\beta/d\}}$. In this paper we will however restrict ourselves to the case $d = 3$ since we, as mentioned in the introduction, can see this dimension as a critical one.
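The dependence of the sparsity exponents on the dimension can be tabulated with exact rational arithmetic. The sketch below assumes the $d$-dimensional exponents $p = 2(d-1)/(\alpha+d-1)$ and $p = 2d/(2\beta+d)$, which are what the hypercube count of the 3D proof gives when repeated in dimension $d$ and which reduce to $4/(\alpha+2)$ and $6/(2\beta+3)$ for $d = 3$; a weak-$\ell^p$ bound on the coefficients is converted into an error exponent via $2/p - 1$, as in the tail-sum estimate of Section 3.2. The function names are illustrative.

```python
from fractions import Fraction

def p_binary(d, alpha):
    """Critical exponent for binary cartoon-like images in dimension d (assumed formula)."""
    return Fraction(2 * (d - 1)) / (Fraction(alpha) + d - 1)

def p_holder(d, beta):
    """Critical exponent for the Hoelder class in dimension d (assumed formula)."""
    return Fraction(2 * d) / (2 * Fraction(beta) + d)

def rate_exponent(p):
    """Exponent r such that weak-l^p coefficients give a squared error of order N^{-r}."""
    return 2 / p - 1

def optimal_rate_exponent(d, alpha, beta):
    """Smaller of the two exponents, i.e. the limiting rate for the full class."""
    return min(rate_exponent(p_binary(d, alpha)), rate_exponent(p_holder(d, beta)))

for d in (2, 3, 4, 10, 100):
    print(d, p_binary(d, 2), p_holder(d, 2), optimal_rate_exponent(d, 2, 2))
```

The printed table makes the drift of both exponents towards $2$ visible, together with the corresponding deterioration of the best possible approximation rate as the dimension grows.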

4 Hybrid shearlets in 3D

After we have set our benchmark for directional representation systems in the sense of stating an optimality criterion for sparse approximations of the cartoon-like image class $E^\beta_{\alpha,L}(\mathbb{R}^3)$, we next introduce the class of shearlet systems we claim behave optimally.