Optimally sparse approximations of 3D functions by compactly supported shearlet frames
Abstract
We study efficient and reliable methods of capturing and sparsely representing anisotropic structures in 3D data. As a model class for multidimensional data with anisotropic features, we introduce generalized three-dimensional cartoonlike images. This function class has two smoothness parameters: one parameter $\beta$ controlling classical smoothness and one parameter $\alpha$ controlling anisotropic smoothness. The class then consists of piecewise $C^\beta$-smooth functions with discontinuities on a piecewise $C^\alpha$-smooth surface. We introduce a pyramid-adapted, hybrid shearlet system for the three-dimensional setting and construct frames for $L^2(\mathbb{R}^3)$ with this particular shearlet structure. For the smoothness range $1 < \alpha \le \beta \le 2$ we show that pyramid-adapted shearlet systems provide a nearly optimally sparse approximation rate within the generalized cartoonlike image model class measured by means of nonlinear $N$-term approximations.
Key words. anisotropic features, multidimensional data, shearlets, cartoonlike images, nonlinear approximations, sparse approximations
AMS subject classifications. Primary: 42C40, Secondary: 42C15, 41A30, 94A08
1 Introduction
Recent advances in modern technology have created a new world of huge, multidimensional data. In biomedical imaging, seismic imaging, astronomical imaging, computer vision, and video processing, the capabilities of modern computers and high-precision measuring devices have generated 2D, 3D and even higher dimensional data sets of sizes that were infeasible just a few years ago. The need to efficiently handle such diverse types and huge amounts of data has initiated an intense study of efficient multivariate encoding methodologies in the applied harmonic analysis research community. In neuroimaging, e.g., fluorescence microscopy scans of living cells, the discontinuity curves and surfaces of the data are important specific features since one often wants to distinguish between the image “objects” and the “background”, e.g., to distinguish actin filaments in eukaryotic cells; that is, it is important to precisely capture the edges of these 1D and 2D structures. This specific application illustrates that important classes of multivariate problems are governed by anisotropic features. The anisotropic structures can be distinguished by location and orientation or direction, which indicates that our way of analyzing and representing the data should capture not only location, but also directional information. This is exactly the idea behind so-called directional representation systems, which by now are well developed and understood for the 2D setting. Since much of the data acquired in, e.g., neuroimaging, is truly three-dimensional, such data should be analyzed by three-dimensional directional representation systems. Hence, in this paper, we aim for the 3D setting.
In applied harmonic analysis the data is typically modeled in a continuum setting as square-integrable functions or distributions. In dimension two, to analyze the ability of representation systems to reliably capture and sparsely represent anisotropic structures, Candès and Donoho [7] introduced the model situation of so-called cartoonlike images, i.e., two-dimensional functions which are piecewise smooth apart from a piecewise smooth discontinuity curve. Within this model class there is an optimal sparse approximation rate one can obtain for a large class of nonadaptive and adaptive representation systems. Intuitively, one might expect adaptive systems to be far superior in this task, but it has been shown in recent years that nonadaptive methods using curvelets, contourlets, and shearlets all have the ability to provide essentially optimally sparse approximations of cartoonlike images in 2D, measured by the error of the best $N$-term approximation [7, 17, 13, 24].
1.1 Dimension three
In the present paper we will consider sparse approximations of cartoonlike images using shearlets in dimension three. The step from the one-dimensional setting to the two-dimensional setting is necessary for the appearance of anisotropic features at all. When further passing from the two-dimensional setting to the three-dimensional setting, the complexity of anisotropic structures changes significantly. In 2D one “only” has to handle one type of anisotropic feature, namely curves, whereas in 3D one has to handle two geometrically very different anisotropic structures: curves as one-dimensional features and surfaces as two-dimensional anisotropic features. Moreover, the analysis of sparse approximations in dimension two depends heavily on reducing the analysis to affine subspaces of $\mathbb{R}^2$. Clearly, these subspaces always have dimension and codimension one in 2D. In dimension three, however, we have subspaces of codimension one and two, and one therefore needs to perform the analysis on subspaces of the “correct” codimension. Therefore, the 3D analysis requires fundamentally new ideas.
Finally, we remark that even though the present paper only deals with the construction of shearlet frames for $L^2(\mathbb{R}^3)$ and sparse approximations of such, it also illustrates how many of the problems that arise when passing to higher dimensions can be handled. Hence, once it is known how to handle anisotropic features of different dimensions in 3D, the step from 3D to 4D, as well as the extension to even higher dimensions, can be dealt with in a similar way. Therefore the extension of the presented results in $L^2(\mathbb{R}^3)$ to higher dimensions should be, if not straightforward, then at least achievable by the methodologies developed.
1.2 Modelling anisotropic features
The class of 2D cartoonlike images consists, as mentioned above, of piecewise smooth functions with discontinuities on a piecewise smooth curve, and this class has been investigated in a number of recent publications. The obvious extension to the 3D setting is to consider functions of three variables that are piecewise smooth with discontinuities on a piecewise smooth surface. In some applications the smoothness requirement is too strict, and we will, therefore, go one step further and consider a larger class of images also containing less regular images. The generalized class of cartoonlike images in 3D considered in this paper consists of three-dimensional piecewise $C^\beta$-smooth functions with discontinuities on a piecewise $C^\alpha$ surface for $\alpha \in (1,2]$. Clearly, this model provides us with two new smoothness parameters: $\beta$ being a classical smoothness parameter and $\alpha$ being an anisotropic smoothness parameter, see Figure LABEL:fig:cartoonpiecewise for an illustration.
Unlike traditional smoothness spaces, e.g., Hölder, Besov, or Sobolev spaces, this image class is unfortunately not a linear space, but it allows one to study how well representation systems capture anisotropic features, something that is not possible with traditional smoothness spaces.
Finally, we mention that allowing piecewise smoothness and not everywhere smoothness is an essential way to model singularities along surfaces as well as along curves which we already described as the two fundamental types of anisotropic phenomena in 3D.
1.3 Measure for Sparse Approximation and Optimality
The quality of the performance of a representation system with respect to cartoonlike images is typically measured by taking a nonlinear approximation viewpoint. More precisely, given a cartoonlike image and a representation system, the chosen measure is the asymptotic behavior of the error of $N$-term (nonlinear) approximations in the number of terms $N$. When the anisotropic smoothness $\alpha$ is bounded by the classical smoothness $\beta$ as $\alpha \le \frac{4}{3}\beta$, the anisotropic smoothness of the cartoonlike images will be the determining factor for the optimal approximation error rate one can obtain. To be more precise, as we will show in Section 3, the optimal approximation rate for a generalized 3D cartoonlike image $f$ which can be achieved for a large class of adaptive and nonadaptive representation systems for $1 < \alpha \le \beta \le 2$ is
\[ \|f - f_N\|_{L^2}^2 \le C \, N^{-\alpha/2} \quad \text{as } N \to \infty, \]
for some constant $C > 0$, where $f_N$ is an $N$-term approximation of $f$. For cartoonlike images, wavelet and Fourier methods will typically have an $N$-term approximation error rate decaying as $N^{-1/2}$ and as $N^{-1/3}$, respectively, see [23]. Hence, as the anisotropic smoothness parameter $\alpha$ grows, the approximation quality of traditional tools becomes increasingly inferior as they will deliver approximation error rates that are far from the optimal rate $N^{-\alpha/2}$. Therefore, it is desirable and necessary to search for new representation systems that can provide us with representations whose rate is closer to the optimal one. This is where pyramid-adapted, hybrid shearlet systems enter the scene. As we will see in Section LABEL:sec:optimalsparsity3d, this type of representation system provides nearly optimally sparse approximations:
\[ \|f - f_N\|_{L^2}^2 \le \begin{cases} C \, N^{-\alpha/2+\tau}, & \text{if } \alpha < 2, \\ C \, N^{-1} (\log N)^2, & \text{if } \alpha = 2, \end{cases} \quad \text{as } N \to \infty, \]
where $f_N$ is the $N$-term approximation obtained by keeping the $N$ largest shearlet coefficients, and with $\tau = \tau(\alpha)$ small and $\tau(\alpha) \to 0$ for $\alpha \to 1^+$ and for $\alpha \to 2^-$. Clearly, the obtained sparse approximations for these shearlet systems are not truly optimal owing to the polynomial factor $N^{\tau}$ for $\alpha < 2$ and the polylog factor $(\log N)^2$ for $\alpha = 2$. On the other hand, it still shows that nonadaptive schemes such as the hybrid shearlet system can provide rates that are nearly optimal within a large class of adaptive and nonadaptive methods.
1.4 Construction of 3D hybrid shearlets
Shearlet theory has become a central tool in analyzing and representing 2D data with anisotropic features. Shearlet systems are systems of functions generated by one single generator with parabolic scaling, shearing, and translation operators applied to it, in much the same way wavelet systems are dyadic scalings and translations of a single function, but including a directionality characteristic owing to the additional shearing operation and the anisotropic scaling. Of the many directional representation systems proposed in the last decade, e.g., the steerable pyramid transform [29], directional filter banks [3], 2D directional wavelets [2], curvelets [6], contourlets [13], and bandelets [28], the shearlet system [25] is among the most versatile and successful. The reason for this is an extensive list of desirable properties: shearlet systems can be generated by one function, they precisely resolve wavefront sets, they allow compactly supported analyzing elements, they are associated with fast decomposition algorithms, and they provide a unified treatment of the continuum and the digital realm. We refer to [22] for a detailed review of the advantages and disadvantages of shearlet systems as opposed to other directional representation systems.
Several constructions of discrete bandlimited and compactly supported 2D shearlet frames are already known, see [15, 21, 9, 26, 20, 11]; for the construction of 3D shearlet frames less is known. Dahlke, Steidl, and Teschke [10] recently generalized the shearlet group and the associated continuous shearlet transform to higher dimensions. Furthermore, in [10] they showed that, for certain bandlimited generators, the continuous shearlet transform is able to identify hyperplane and tetrahedron singularities. Since this transform originates from a unitary group representation, it is not able to capture all directions, in particular, it will not capture the delta distribution on the axis (and more generally, any singularity with “directions”). We will use a different tiling of the frequency space, namely systems adapted to pyramids in frequency space, to avoid this nonuniformity of directions. We call these systems pyramid-adapted shearlet systems [22]. In [16], the continuous version of the pyramid-adapted shearlet system was introduced, and it was shown that the location and the local orientation of the boundary set of certain three-dimensional solid regions can be precisely identified by this continuous shearlet transform. Finally, we will also need to use a different scaling than the one from [10] in order to achieve shearlet systems that provide almost optimally sparse approximations.
Since spatial localization of the analyzing elements of the encoding system is very important both for a precise detection of geometric features and for a fast decomposition algorithm, we will mainly follow the sufficient conditions for, and the construction of, compactly supported cone-adapted 2D shearlets by Kittipoom and two of the authors [20] and extend these results to the 3D setting (Section 4). These results provide us with a large class of separable, compactly supported shearlet systems with “good” frame bounds, optimally sparse approximation properties, and associated numerically stable algorithms. One important new aspect is that the dilation will depend on the smoothness parameter $\alpha$. This will provide us with hybrid shearlet systems ranging from classical parabolic based shearlet systems ($\alpha = 2$) to almost classical wavelet systems ($\alpha \approx 1$). In other words, we obtain a parametrized family of shearlets with a smooth transition from (nearly) wavelets to shearlets. This will allow us to adjust our shearlet system according to the anisotropic smoothness of the data at hand. For rational values of $\alpha$ we can associate this hybrid system with a fast decomposition algorithm using the fast Fourier transform with multiplication and periodization in the frequency space (in place of convolution and downsampling).
Our compactly supported 3D hybrid shearlet elements (introduced in Section 4) will in the spatial domain be of size $2^{-j\alpha/2}$ times $2^{-j/2}$ times $2^{-j/2}$ for some fixed anisotropy parameter $1 < \alpha \le 2$. When $\alpha \approx 1$ this corresponds to “cube-like” (or “wavelet-like”) elements. As $\alpha$ approaches $2$ the scaling becomes less and less isotropic, yielding “plate-like” elements as $\alpha \to 2$. This indicates that these anisotropic 3D shearlet systems have been designed to efficiently capture two-dimensional anisotropic structures, while neglecting one-dimensional structures. Nonetheless, these 3D shearlet systems still perform optimally when representing and analyzing cartoonlike functions that have discontinuities on piecewise smooth surfaces; as mentioned, such functions model 3D data that contain point, curve, and surface singularities.
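As a rough numerical illustration of the element geometry just described, the following sketch tabulates side lengths and aspect ratios. It assumes that an element at scale j has essential support of size about 2^(-j*alpha/2) by 2^(-j/2) by 2^(-j/2), with alpha the anisotropy parameter; the function names and the exact exponents are assumptions made for illustration, not part of the construction itself.

```python
def element_size(j, alpha):
    """Assumed essential support side lengths of a scale-j element."""
    short = 2.0 ** (-j * alpha / 2.0)  # direction of the finest scaling
    long_side = 2.0 ** (-j / 2.0)      # the two remaining directions
    return (short, long_side, long_side)


def aspect_ratio(j, alpha):
    """Ratio of long side to short side; 1.0 means a cube-like element."""
    short, long_side, _ = element_size(j, alpha)
    return long_side / short


if __name__ == "__main__":
    # alpha = 1.0 is included only as the limiting, wavelet-like case.
    for alpha in (1.0, 1.5, 2.0):
        ratios = [round(aspect_ratio(j, alpha), 3) for j in (0, 2, 4, 6)]
        print("alpha =", alpha, "aspect ratios:", ratios)
```

Under this assumption the aspect ratio equals 2^(j(alpha-1)/2), so it stays at one in the wavelet-like limit and grows fastest for alpha = 2, which is the plate-like regime.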
Let us end this subsection with a general thought on the construction of bandlimited tight shearlet frames versus compactly supported shearlet frames. There seems to be a trade-off between compact support of the shearlet generators, tightness of the associated frame, and separability of the shearlet generators. The known constructions of tight shearlet frames, even in 2D, do not use separable generators, and these constructions can be shown not to be applicable to compactly supported generators. Moreover, these tight frames use a modified version of the pyramid-adapted shearlet system in which not all elements are dilates, shears, and translations of a single function. Tightness is difficult to obtain while allowing for compactly supported generators, but we can gain separability as in Theorem LABEL:thm:compactforpyramid and hence fast algorithmic realizations. On the other hand, when allowing non-compactly supported generators, tightness is possible, but separability seems to be out of reach, which makes fast algorithmic realizations very difficult.
1.5 Other approaches for 3D data
Other directional representation systems have been considered for the 3D setting. We mention curvelets [5, 4], surflets [8], and surfacelets [27]. This line of research is mostly concerned with constructions of such systems and not their sparse approximation properties with respect to cartoonlike images. In [8], however, the authors consider adaptive approximations of Horizon class functions using surflet dictionaries, which generalize the wedgelet dictionary for 2D signals to higher dimensions.
During the final stages of this project, we realized that a similar almost optimal sparsity result for the 3D setting (for the model case $\alpha = \beta = 2$) was reported by Guo and Labate [18] using bandlimited shearlet tight frames. They provide a proof for the case where the discontinuity surface is (non-piecewise) smooth using the X-ray transform.
1.6 Outline
We give the precise definition of the generalized cartoonlike image model class in Section 2, and the optimal rate of approximation within this model is then derived in Section 3. In Section 4 and Section LABEL:sec:constrcompsupp we construct the so-called pyramid-adapted shearlet frames with compactly supported generators. In Sections LABEL:sec:optimalsparsity3d to LABEL:sec:prooftheorem1 we then prove that such shearlet systems indeed deliver nearly optimal sparse approximations of three-dimensional cartoonlike images. We extend this result to the situation of discontinuity surfaces which are piecewise smooth except for zero- and one-dimensional singularities and again derive essentially optimal sparsity of the constructed shearlet frames in Section LABEL:sec:prooftheorem2. We end the paper by discussing various possible extensions in Section LABEL:sec:extensions.
1.7 Notation
We end this introduction by reviewing some basic definitions. The following definitions will mostly be used for the case , but they will however be defined for general . For we denote the norm on of by . The Lebesgue measure on is denoted by and the counting measure by . Sets in are either considered equal if they are equal up to sets of measure zero or if they are elementwise equal; it will always be clear from the context which definition is used. The norm of is denoted by . For , the Fourier transform is defined by
with the usual extension to . The Sobolev space and norm are defined as
For functions the homogeneous Hölder seminorm is given by
where is the fractional part of and is the usual length of a multiindex . Further, we let
and we denote by the space of Hölder functions, i.e., functions , whose norm is bounded.
2 Generalized 3D cartoonlike image model class
The first complete model of 2D cartoonlike images was introduced in [7], the basic idea being that a closed curve separates two smooth functions. For 3D cartoonlike images we consider square integrable functions of three variables that are piecewise smooth with discontinuities on a piecewise smooth surface.
Fix and , and let be continuous and define the set in by
We require that the boundary of is a closed surface parametrized by
(2.1)
Furthermore, the radius function must be Hölder continuous with coefficient , i.e.,
(2.2)
For , the set is defined to be the set of all such that is a translate of a set obeying (2.1) and (2.2). The boundaries of the sets in will be the discontinuity sets of our cartoonlike images. We remark that any star-shaped set in with bounded principal curvatures will belong to for some . Actually, the property that the sets in are parametrized by spherical angles, which implies that the sets are star-shaped, is not important to us. For we could, e.g., extend to be all bounded subsets of , whose boundary is a closed surface with principal curvatures bounded by .
To allow more general discontinuity surfaces, we extend to a class of sets with piecewise boundaries . We denote this class , where is the number of pieces and is an upper bound for the “curvature” on each piece. In other words, we say that if is a bounded subset of whose boundary is a union of finitely many pieces which do not overlap except at their boundaries, and each patch can be represented in parametric form by a smooth radius function with . We remark that we put no restrictions on how the patches meet; in particular, can have arbitrarily sharp edges joining the pieces . Also note that .
The actual objects of interest to us are, as mentioned, not these star-shaped sets, but functions that have the boundary as discontinuity surface.
Definition 2.1
Let , , and . Then denotes the set of functions of the form
where and with and for each . We let .
We speak of as consisting of cartoonlike 3D images having smoothness apart from a piecewise discontinuity surface. We stress that is not a linear space of functions and that depends on the constants and even though we suppress this in the notation. Finally, we let denote binary cartoonlike images, that is, functions , where and .
3 Optimality bound for sparse approximations
After having clarified the model situation , we will now discuss which measure for the accuracy of approximation by representation systems we choose, and what optimality means in this case. We will later in Section LABEL:sec:optimalsparsity3d restrict the parameter range in our model class to . In this section, however, we will find the theoretically optimal approximation error rate within for the full range and . Before we state and prove the main optimal sparsity result of this section, Theorem 3.2, we discuss the notions of $N$-term approximations and frames.
3.1 $N$-term approximations
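Let $\Phi = (\phi_i)_{i \in I}$ be a dictionary with the index set $I$ not necessarily being countable. We seek to approximate each single element of $L^2(\mathbb{R}^3)$ with elements from $\Phi$ by $N$ terms of this system. For this, let $f \in L^2(\mathbb{R}^3)$ be arbitrarily chosen. Letting now $N \in \mathbb{N}$, we consider $N$-term approximations of $f$, i.e.,
\[ \sum_{i \in I_N} c_i \phi_i \quad \text{with } I_N \subset I, \ \# I_N \le N. \]
The best $N$-term approximation to $f$ is an $N$-term approximation
\[ f_N = \sum_{i \in I_N} c_i \phi_i, \]
which satisfies that, for all $I_N' \subset I$, $\# I_N' \le N$, and for all scalars $(c_i')_{i \in I_N'}$,
\[ \|f - f_N\|_{L^2} \le \Big\| f - \sum_{i \in I_N'} c_i' \phi_i \Big\|_{L^2}. \]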
3.2 Frames
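A frame for a separable Hilbert space $\mathcal{H}$ is a countable collection of vectors $(\phi_i)_{i \in I}$ for which there are constants $0 < A \le B < \infty$ such that
\[ A \|f\|^2 \le \sum_{i \in I} |\langle f, \phi_i \rangle|^2 \le B \|f\|^2 \quad \text{for all } f \in \mathcal{H}. \]
If the upper bound in this inequality holds, then $(\phi_i)_{i \in I}$ is said to be a Bessel sequence with Bessel constant $B$. For a Bessel sequence $(\phi_i)_{i \in I}$, we define the frame operator of $(\phi_i)_{i \in I}$ by
\[ S : \mathcal{H} \to \mathcal{H}, \quad S f = \sum_{i \in I} \langle f, \phi_i \rangle \phi_i. \]
If $(\phi_i)_{i \in I}$ is a frame, this operator is bounded, invertible, and positive. A frame is said to be tight if we can choose $A = B$. If furthermore $A = B = 1$, the sequence $(\phi_i)_{i \in I}$ is said to be a Parseval frame. Two Bessel sequences $(\phi_i)_{i \in I}$ and $(\psi_i)_{i \in I}$ are said to be dual frames if
\[ f = \sum_{i \in I} \langle f, \phi_i \rangle \psi_i \quad \text{for all } f \in \mathcal{H}. \]
It can be shown that, in this case, both Bessel sequences are even frames, and we shall say that the frame $(\psi_i)_{i \in I}$ is dual to $(\phi_i)_{i \in I}$, and vice versa. At least one dual always exists; it is given by $(S^{-1} \phi_i)_{i \in I}$ and called the canonical dual.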
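The frame notions above can be checked by hand in a small finite-dimensional setting. The following sketch is an illustration only: it uses two hypothetical frames for R^2 (the three-vector "Mercedes-Benz" frame and a skewed three-vector frame, neither of which appears in the paper), computes the frame operator as a 2x2 matrix, reads off the optimal frame bounds as its extreme eigenvalues, and reconstructs a vector from its frame coefficients with the canonical dual.

```python
import math


def frame_operator(vectors):
    """Return the 2x2 matrix S = sum_i phi_i phi_i^T for vectors in R^2."""
    s = [[0.0, 0.0], [0.0, 0.0]]
    for x, y in vectors:
        s[0][0] += x * x
        s[0][1] += x * y
        s[1][0] += y * x
        s[1][1] += y * y
    return s


def frame_bounds(s):
    """Optimal frame bounds: smallest and largest eigenvalue of symmetric S."""
    half_trace = (s[0][0] + s[1][1]) / 2.0
    det = s[0][0] * s[1][1] - s[0][1] * s[1][0]
    root = math.sqrt(max(half_trace ** 2 - det, 0.0))
    return half_trace - root, half_trace + root


def canonical_dual(vectors):
    """Apply S^{-1} to each frame vector (assumes the vectors span R^2)."""
    s = frame_operator(vectors)
    det = s[0][0] * s[1][1] - s[0][1] * s[1][0]
    inv = [[s[1][1] / det, -s[0][1] / det],
           [-s[1][0] / det, s[0][0] / det]]
    return [(inv[0][0] * x + inv[0][1] * y, inv[1][0] * x + inv[1][1] * y)
            for x, y in vectors]


def reconstruct(f, vectors):
    """Compute sum_i <f, phi_i> dual_i, which should give back f."""
    dual = canonical_dual(vectors)
    coeffs = [f[0] * x + f[1] * y for x, y in vectors]
    return (sum(c * d[0] for c, d in zip(coeffs, dual)),
            sum(c * d[1] for c, d in zip(coeffs, dual)))


# Three unit vectors at 120 degree spacing: a tight frame for R^2.
MERCEDES_BENZ = [(math.cos(math.pi / 2 + 2 * math.pi * k / 3),
                  math.sin(math.pi / 2 + 2 * math.pi * k / 3))
                 for k in range(3)]
# A frame that is not tight.
SKEW = [(1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]

if __name__ == "__main__":
    print("tight frame bounds:", frame_bounds(frame_operator(MERCEDES_BENZ)))
    print("skew frame bounds:", frame_bounds(frame_operator(SKEW)))
    print("reconstruction:", reconstruct((0.3, -1.2), SKEW))
```

For the tight example both bounds coincide, so the canonical dual is just a rescaled copy of the frame; for the skewed example the dual vectors differ in direction from the frame vectors, which is the situation the canonical dual is meant to handle.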
Now, suppose the dictionary $\Phi = (\phi_i)_{i \in I}$ forms a frame for $L^2(\mathbb{R}^3)$ with frame bounds $A$ and $B$, and let $(\tilde{\phi}_i)_{i \in I}$ denote the canonical dual frame. We then consider the expansion of $f$ in terms of this dual frame, i.e.,
\[ f = \sum_{i \in I} \langle f, \phi_i \rangle \tilde{\phi}_i. \]
For any $f \in L^2(\mathbb{R}^3)$ we have $(\langle f, \phi_i \rangle)_{i \in I} \in \ell^2(I)$ by definition. Since we only consider expansions of functions $f$ belonging to a subset of $L^2(\mathbb{R}^3)$, this can, at least potentially, improve the decay rate of the coefficients so that they belong to $\ell^p(I)$ for some $p < 2$. This is exactly what is understood by sparse approximation (also called compressible approximations). We hence aim to analyze shearlets with respect to this behavior, i.e., the decay rate of shearlet coefficients.
For frames, tight and non-tight, it is not possible to derive a usable, explicit form for the best $N$-term approximation. We therefore crudely approximate the best $N$-term approximation by choosing the $N$-term approximation provided by the indices $I_N$ associated with the $N$ largest coefficients $\langle f, \phi_i \rangle$ in magnitude with these coefficients, i.e.,
\[ f_N = \sum_{i \in I_N} \langle f, \phi_i \rangle \tilde{\phi}_i. \]
However, even with this rather crude greedy selection procedure, we obtain very strong results for the approximation rate of shearlets as we will see in Section LABEL:sec:optimalsparsity3d.
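The selection rule of keeping the largest coefficients is easiest to see in an orthonormal basis, where it actually yields the best approximation. The sketch below is a deliberately simplified stand-in, not the shearlet procedure: it uses a one-dimensional discrete Haar basis (an assumption chosen for brevity) on a hypothetical step signal, keeps the largest coefficients in magnitude, and compares the squared error with the energy of the discarded coefficients.

```python
import math

SQRT2 = math.sqrt(2.0)
# Hypothetical test signal: a step with a jump at a non-dyadic position.
STEP = [1.0] * 5 + [4.0] * 11


def haar_forward(signal):
    """Orthonormal discrete Haar transform; length must be a power of two."""
    coeffs = list(signal)
    n = len(coeffs)
    while n > 1:
        half = n // 2
        averages = [(coeffs[2 * i] + coeffs[2 * i + 1]) / SQRT2
                    for i in range(half)]
        details = [(coeffs[2 * i] - coeffs[2 * i + 1]) / SQRT2
                   for i in range(half)]
        coeffs[:n] = averages + details
        n = half
    return coeffs


def haar_inverse(coeffs):
    """Inverse of haar_forward."""
    out = list(coeffs)
    n = 1
    while n < len(out):
        merged = []
        for a, d in zip(out[:n], out[n:2 * n]):
            merged += [(a + d) / SQRT2, (a - d) / SQRT2]
        out[:2 * n] = merged
        n *= 2
    return out


def n_term_approximation(signal, n_terms):
    """Keep the n_terms largest Haar coefficients; return (approx, tail energy)."""
    coeffs = haar_forward(signal)
    order = sorted(range(len(coeffs)), key=lambda i: abs(coeffs[i]),
                   reverse=True)
    kept = set(order[:n_terms])
    truncated = [c if i in kept else 0.0 for i, c in enumerate(coeffs)]
    tail = sum(c * c for i, c in enumerate(coeffs) if i not in kept)
    return haar_inverse(truncated), tail


def squared_error(signal, approx):
    """Squared Euclidean distance between two sequences."""
    return sum((s - a) ** 2 for s, a in zip(signal, approx))


if __name__ == "__main__":
    for n_terms in (1, 2, 3, 5):
        approx, tail = n_term_approximation(STEP, n_terms)
        print(n_terms, squared_error(STEP, approx), tail)
```

In an orthonormal basis the squared error equals the discarded coefficient energy exactly; for a general frame only an inequality of this type survives, which is the content of the lemma that follows.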
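The following well-known result shows how the $N$-term approximation error can be bounded by the tail of the square of the coefficients $\langle f, \phi_i \rangle$. We refer to [23] for a proof.
Lemma 3.1
Let $(\phi_i)_{i \in I}$ be a frame for $\mathcal{H}$ with frame bounds $A$ and $B$, and let $(\tilde{\phi}_i)_{i \in I}$ be the canonical dual frame. Let $I_N \subset I$ with $\# I_N = N$, and let $f_N$ be the $N$-term approximation $f_N = \sum_{i \in I_N} \langle f, \phi_i \rangle \tilde{\phi}_i$. Then
\[ \|f - f_N\|^2 \le \frac{1}{A} \sum_{i \notin I_N} |\langle f, \phi_i \rangle|^2 \]
for any $f \in \mathcal{H}$.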
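A small sanity check of this type of bound can be run in finite dimensions. The sketch below assumes the standard form of the estimate, namely that the squared error of an approximation built from an index subset is at most 1/A times the energy of the discarded coefficients, with A the lower frame bound. The frame (1,0), (0,1), (1,1) for R^2, its canonical dual, and the value A = 1 are worked out by hand for this hypothetical example and are not taken from the paper.

```python
from itertools import combinations

# Hypothetical non-tight frame for R^2; its frame operator is [[2, 1], [1, 2]]
# with eigenvalues 1 and 3, so the lower frame bound is A = 1.
FRAME = [(1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
# Canonical dual S^{-1} phi_i with S^{-1} = (1/3) * [[2, -1], [-1, 2]].
DUAL = [(2.0 / 3.0, -1.0 / 3.0), (-1.0 / 3.0, 2.0 / 3.0), (1.0 / 3.0, 1.0 / 3.0)]
LOWER_BOUND_A = 1.0


def coefficients(f):
    """Frame coefficients <f, phi_i>."""
    return [f[0] * x + f[1] * y for x, y in FRAME]


def error_squared(f, kept):
    """Squared error of the approximation using only the indices in kept."""
    c = coefficients(f)
    approx = [sum(c[i] * DUAL[i][k] for i in kept) for k in (0, 1)]
    return (f[0] - approx[0]) ** 2 + (f[1] - approx[1]) ** 2


def tail_bound(f, kept):
    """(1/A) times the energy of the discarded coefficients."""
    c = coefficients(f)
    discarded = sum(c[i] ** 2 for i in range(len(FRAME)) if i not in kept)
    return discarded / LOWER_BOUND_A


def bound_holds(f, tol=1e-12):
    """Check the bound for every index subset of every size."""
    indices = range(len(FRAME))
    return all(error_squared(f, set(kept)) <= tail_bound(f, set(kept)) + tol
               for n in range(len(FRAME) + 1)
               for kept in combinations(indices, n))


if __name__ == "__main__":
    for f in [(1.0, 2.0), (-3.0, 0.5), (0.0, 1.0)]:
        print(f, bound_holds(f))
```

The check runs over all subsets rather than only the largest coefficients because the estimate does not depend on how the index set is chosen; the greedy choice simply makes the right-hand side as small as possible.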
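Let $c^*$ denote the nonincreasing (in modulus) rearrangement of $c = (c_i)_{i \in I} = (\langle f, \phi_i \rangle)_{i \in I}$, e.g., $c^*_n$ denotes the $n$th largest coefficient of $c$ in modulus. This rearrangement corresponds to a bijection $\pi : \mathbb{N} \to I$ that satisfies
\[ c_{\pi(n)} = c^*_n \quad \text{for all } n \in \mathbb{N}. \]
Since $c \in \ell^2(I)$, also $c^* \in \ell^2(\mathbb{N})$. Let $f$ be a cartoonlike image, and suppose that $|c^*_n|$, in this case, even decays as
\[ |c^*_n| \lesssim n^{-(\alpha+2)/4} \quad \text{for } n \to \infty \tag{3.1} \]
for some $\alpha > 0$, where the notation $h(n) \lesssim g(n)$ means that there exists a $C > 0$ such that $h(n) \le C g(n)$, i.e., $h(n) = O(g(n))$. Clearly, we then have $c^* \in \ell^p(\mathbb{N})$ for $p > \frac{4}{\alpha+2}$. By Lemma 3.1, with $A$ the lower frame bound, the $N$-term approximation error will therefore decay as
\[ \|f - f_N\|^2 \le \frac{1}{A} \sum_{n > N} |c^*_n|^2 \lesssim \sum_{n > N} n^{-\alpha/2 - 1} \asymp N^{-\alpha/2}, \tag{3.2} \]
where $f_N$ is the $N$-term approximation of $f$ by keeping the $N$ largest coefficients, that is,
\[ f_N = \sum_{n=1}^{N} c^*_n \, \tilde{\phi}_{\pi(n)}. \tag{3.3} \]
The notation $h(n) \asymp g(n)$, sometimes also written as $h(n) = \Theta(g(n))$, used above means that $h$ is bounded both above and below by $g$ asymptotically as $n \to \infty$, that is, $h(n) = O(g(n))$ and $g(n) = O(h(n))$. The approximation error rate obtained in (3.2) is exactly the sought optimal rate mentioned in the introduction. This illustrates that the fraction $\frac{\alpha+2}{4}$ introduced in the decay of the sequence $c^*$ will play a major role in the following. In particular, we are searching for a representation system which forms a frame and delivers decay of $c = (\langle f, \phi_i \rangle)_{i \in I}$ as in (3.1) for any cartoonlike image.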
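The passage from coefficient decay to error decay can be checked numerically. The sketch below assumes that the sorted coefficients decay like n^(-(alpha+2)/4), so that their squared tail behaves like a constant times N^(-alpha/2); both the exponent and the helper names are assumptions made for this illustration. By comparison with an integral, the constant should approach 2/alpha.

```python
def tail_energy(n_terms, alpha, cutoff=200_000):
    """Approximate sum over n > n_terms of n^(-(alpha+2)/2).

    This is the squared tail of a sequence decaying like n^(-(alpha+2)/4).
    Terms beyond the cutoff are replaced by an integral estimate.
    """
    s = (alpha + 2.0) / 2.0
    partial = sum(n ** (-s) for n in range(n_terms + 1, cutoff + 1))
    remainder = cutoff ** (1.0 - s) / (s - 1.0)
    return partial + remainder


def rate_constant(n_terms, alpha):
    """Tail energy rescaled by N^(alpha/2); expected to be close to 2/alpha."""
    return tail_energy(n_terms, alpha) * n_terms ** (alpha / 2.0)


if __name__ == "__main__":
    for alpha in (1.25, 1.5, 2.0):
        values = [round(rate_constant(n, alpha), 4) for n in (10, 100, 1000)]
        print("alpha =", alpha, "rescaled tails:", values, "limit:", 2.0 / alpha)
```

The rescaled tails stabilizing as N grows is the numerical counterpart of the two-sided asymptotic relation used in the text.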
3.3 Optimal sparsity
In this subsection we will state and prove the main result of this section, Theorem 3.2, but let us first discuss some of its implications for sparse approximations of cartoonlike images.
From the dictionary with the index set not necessarily being countable, we consider expansions of the form
(3.4)
where is a countable selection from that may depend on . Moreover, we can assume that are normalized by . The selection of the th term is obtained according to a selection rule which may adaptively depend on . Actually, the th element may also be modified adaptively and depend on the first th chosen elements [14]. We assume that how deep or how far down in the indexed dictionary we are allowed to search for the next element in the approximation is limited by a polynomial . Without such a depth search limit, one could choose to be a countable, dense subset of which would yield arbitrarily good sparse approximations, but also infeasible approximations in practice. We shall denote any sequence of coefficients chosen according to these restrictions by .
We are now ready to state the main result of this section. Following Donoho [14] we say that a function class contains an embedded orthogonal hypercube of dimension and side if there exists , and orthogonal functions , , with , such that the collection of hypercube vertices
is contained in . The sought bound on the optimal sparsity within the set of cartoonlike images will be obtained by showing that the cartoonlike image class contains sufficiently high-dimensional hypercubes with sufficiently large side length; intuitively, we will see that a certain high complexity of the set of cartoonlike images limits the possible sparsity level. The meaning of “sufficiently” is made precise by the following definition. We say that a function class contains a copy of if contains embedded orthogonal hypercubes of dimension and side , and if, for some sequence , and some constant :
(3.5)
The first part of the following result is an extension from the 2D to the 3D setting of [14, Thm. 3].
Theorem 3.2
(i) The class of binary cartoonlike images contains a copy of $\ell^p_0$ for $p = 4/(\alpha+2)$.
(ii) The space of Hölder functions with compact support in contains a copy of $\ell^p_0$ for $p = 6/(2\beta+3)$.
Before providing a proof of the theorem, let us discuss some of its implications for sparse approximations of cartoonlike images. Theorem 3.2(i) implies, by [14, Theorem 2], that for every and every method of atomic decomposition based on polynomial depth search from any countable dictionary , we have for :
(3.6)
where the weak-$\ell^p$ “norm” is defined as $\|c\|_{w\ell^p} = \sup_{n > 0} n^{1/p} |c^*_n|$, with $c^*_n$ the $n$th largest coefficient of $c$ in modulus. (Note that neither $\|\cdot\|_{w\ell^p}$ nor $\|\cdot\|_{\ell^p}$ (for $p < 1$) is a norm since they do not satisfy the triangle inequality. Note also that the weak-$\ell^p$ norm is a special case of the Lorentz quasinorm.) Sparse approximations are approximations of the form with coefficients decaying at a certain, hopefully high, rate. Equation (3.6) is a precise statement of the optimal achievable sparsity level. No representation system (up to the restrictions described above) can deliver expansions (3.4) for with coefficients satisfying for . As we will see in Theorems LABEL:thm:3doptsparse and LABEL:thm:3doptsparsepiecewise, pyramid-adapted shearlet frames deliver for , where .
Assume for a moment that we have an “optimal” dictionary at hand that delivers , and assume further that it is also a frame. As we saw in Section 3.2, this implies that
where is the $N$-term approximation of by keeping the largest coefficients. Therefore, no frame representation system can deliver a better approximation error rate than under the chosen approximation procedure within the image model class . If is actually an orthonormal basis, then this is truly the optimal rate since best $N$-term approximations, in this case, are obtained by keeping the largest coefficients.
Similarly, Theorem 3.2(ii) tells us that the optimal approximation error rate within the Hölder function class is . Combining the two estimates we see that the optimal approximation error rate within the full cartoonlike image class cannot exceed convergence. For the parameter range , this rate reduces to . For , as we will show in Section LABEL:sec:optimalsparsity3d, shearlet systems actually deliver this rate except for an additional polylog factor, namely . For and , the factor is replaced by a small polynomial factor , where and for or .
It is striking that one is able to obtain such a near optimal approximation error rate since the shearlet system as well as the approximation procedure will be nonadaptive; in particular, since traditional, nonadaptive representation systems such as Fourier series and wavelet systems are far from providing an almost optimal approximation rate. This is illustrated in the following example.
Example 1
Let be the ball in with center and radius . Define . Clearly, if . Suppose . The best $N$-term Fourier sum yields
which is far from the optimal rate . For the wavelet case the situation is only slightly better. Suppose is any compactly supported wavelet basis. Then
where is the best $N$-term approximation from . The calculations leading to these estimates are not difficult, and we refer to [23] for the details. We will later see that shearlet frames yield , where is the best $N$-term approximation.
We mention that the rates obtained in Example 1 are typical in the sense that most cartoonlike images will yield the exact same (and far from optimal) rates.
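To get a feeling for how far apart such rates are in practice, the following sketch compares model error curves for the indicator function of a ball. It assumes squared-error rates of N^(-1/3) for Fourier sums, N^(-1/2) for wavelets, N^(-1) (log N)^2 for shearlets, and N^(-1) as the optimal benchmark; these exponents are assumptions stated here for illustration, and all constants are set to one, so only the orders of magnitude are meaningful.

```python
import math


def fourier_rate(n):
    """Assumed model rate for best N-term Fourier sums."""
    return n ** (-1.0 / 3.0)


def wavelet_rate(n):
    """Assumed model rate for compactly supported wavelets."""
    return n ** (-1.0 / 2.0)


def shearlet_rate(n):
    """Assumed model rate for shearlets, including the polylog factor."""
    return (math.log(n) ** 2) / n


def optimal_rate(n):
    """Assumed optimal benchmark rate for a smooth discontinuity surface."""
    return 1.0 / n


def terms_needed(rate, target, n_max=10 ** 12):
    """Smallest power of two N with rate(N) <= target, or None if none found."""
    n = 2
    while n <= n_max:
        if rate(n) <= target:
            return n
        n *= 2
    return None


if __name__ == "__main__":
    target = 1e-3
    for name, rate in [("Fourier", fourier_rate), ("wavelet", wavelet_rate),
                       ("shearlet", shearlet_rate), ("optimal", optimal_rate)]:
        print(name, terms_needed(rate, target))
```

Note that the polylog factor makes the shearlet model curve worse than the wavelet one for small N; the advantage is asymptotic and only shows once N is in the thousands under these unit-constant assumptions.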
Finally, we end the subsection with a proof of Theorem 3.2.
Proof of Theorem 3.2. The idea behind the proofs is to construct a collection of functions in and , respectively, such that the collection of functions will be vertices of a hypercube with dimension satisfying (3.5).
(i): Let and be smooth functions with compact support and . For and we define:
for , where and . We let further . It is easy to see that . Moreover, it can also be shown that , where denotes the homogeneous Hölder norm introduced in (2.2).
Without loss of generality, we can consider the cartoonlike images translated by so that their support lies in . Alternatively, we can fix an origin at , and use spherical coordinates relative to this choice of origin. We set and define
The radius functions for with defined by
(3.7)
determine the discontinuity surfaces of the functions of the form:
For a fixed the functions are disjointly supported and therefore mutually orthogonal. Hence, is a collection of hypercube vertices. Moreover,
where the constant only depends on . Any radius function of the form (3.7) satisfies
Therefore, whenever . This shows that we have the hypercube embedding
The side length of the hypercube satisfies
whenever . Now, we finally choose and as
By this choice, we have for sufficiently small . Hence, is a hypercube of side length and dimension embedded in . We obviously have , thus the dimension of the hypercube obeys
for all sufficiently small .
(ii): Let with compact support . For to be determined, we define for :
where and . We let . It is easy to see that . We note that the functions are disjointly supported (for a fixed ) and therefore mutually orthogonal. Thus we have the hypercube embedding
where the side length of the hypercube is . Now, choose as
Hence, is a hypercube of side length and dimension embedded in . The dimension of the hypercube obeys
for all sufficiently small .
3.4 Higher dimensions
Our main focus is, as mentioned above, the three-dimensional setting, but let us briefly sketch how the optimal sparsity result extends to higher dimensions. The $d$-dimensional cartoonlike image class consists of functions having $C^\beta$ smoothness apart from a $(d-1)$-dimensional $C^\alpha$-smooth discontinuity surface. The $d$-dimensional analogue of Theorem 3.2 is then straightforward to prove.
Theorem 3.3
(i) The class of $d$-dimensional binary cartoonlike images contains a copy of $\ell^p_0$ for $p = 2(d-1)/(\alpha+d-1)$.
(ii) The space of Hölder functions contains a copy of $\ell^p_0$ for $p = 2d/(2\beta+d)$.
It is then intriguing to analyze the behavior of and from Theorem 3.3. In fact, as , we observe that in both cases. Thus, the decay of any for cartoonlike images becomes slower as grows and approaches which is actually the rate guaranteed for all .
Moreover, by Theorem 3.3 we see that the optimal approximation error rate for $N$-term approximations within the class of $d$-dimensional cartoonlike images is $N^{-\alpha/(d-1)}$. In this paper we will however restrict ourselves to the case $d = 3$ since we, as mentioned in the introduction, can see this dimension as a critical one.
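The dependence of the sparsity exponents on the dimension can be tabulated directly. The sketch below assumes the exponents p = 2(d-1)/(alpha+d-1) for binary cartoonlike images and p = 2d/(2*beta+d) for Hölder functions, together with the relation that weak-l^p coefficient decay gives a squared N-term error of order N^(-(2/p - 1)); the formulas and function names are assumptions chosen to be consistent with the three-dimensional case, not quotations from the theorem.

```python
def p_cartoon(d, alpha):
    """Assumed sparsity exponent for d-dimensional binary cartoonlike images."""
    return 2.0 * (d - 1) / (alpha + d - 1)


def p_holder(d, beta):
    """Assumed sparsity exponent for Hölder functions of smoothness beta."""
    return 2.0 * d / (2.0 * beta + d)


def error_exponent(p):
    """Exponent r such that weak-l^p decay yields squared error of order N^(-r)."""
    return 2.0 / p - 1.0


if __name__ == "__main__":
    alpha = beta = 2.0
    for d in (2, 3, 4, 10, 100, 10000):
        pc, ph = p_cartoon(d, alpha), p_holder(d, beta)
        print(d, round(pc, 4), round(ph, 4),
              round(error_exponent(pc), 4), round(error_exponent(ph), 4))
```

Both exponents increase towards 2 as d grows, so the guaranteed decay degrades towards plain square summability, while for fixed d the cartoon exponent gives an error rate of order N^(-alpha/(d-1)) under these assumptions.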
4 Hybrid shearlets in 3D
Having set our benchmark for directional representation systems by stating an optimality criterion for sparse approximations of the cartoonlike image class , we next introduce the class of shearlet systems that we claim behave optimally.
4.1 Pyramid-adapted shearlet systems
Fix . We scale according to scaling matrices , or , , and represent directionality by the shear matrices , , or , , defined by
and  
and  