# Deep Convolutional Neural Networks Based on Semi-Discrete Frames

Thomas Wiatowski and Helmut Bölcskei
Dept. IT & EE, ETH Zurich, Switzerland
Email: {withomas, boelcskei}@nari.ee.ethz.ch

###### Abstract

Deep convolutional neural networks have led to breakthrough results in practical feature extraction applications. The mathematical analysis of these networks was pioneered by Mallat [1]. Specifically, Mallat considered so-called scattering networks based on identical semi-discrete wavelet frames in each network layer, and proved translation-invariance as well as deformation stability of the resulting feature extractor. The purpose of this paper is to develop Mallat’s theory further by allowing for different and, most importantly, general semi-discrete frames (such as Gabor frames, wavelets, curvelets, shearlets, and ridgelets) in distinct network layers. This makes it possible to extract wider classes of features than the point singularities resolved by the wavelet transform. Our generalized feature extractor is proven to be translation-invariant, and we develop deformation stability results for a larger class of deformations than those considered by Mallat. For Mallat’s wavelet-based feature extractor, we remove a number of technical conditions. The mathematical engine behind our results is continuous frame theory, which allows us to completely detach the invariance and deformation stability proofs from the particular algebraic structure of the underlying frames.

## I Introduction

A central task in signal classification is feature extraction [2]. For example, we may want to detect whether an image contains a certain handwritten digit [3]. Moreover, this should be possible independently of the feature’s spatial (or temporal) location within the signal, which motivates the use of translation-invariant feature extractors. In addition, sticking to the example of handwritten digits, we want the feature extractor to be robust with respect to different handwriting styles. This is typically accounted for by asking for stability with respect to non-linear deformations of the feature to be extracted.

Spectacular success in many practical classification tasks has been reported for feature extractors generated by deep convolutional neural networks [4, 5]. The mathematical analysis of such networks was initiated by Mallat in [1]. Mallat’s theory applies to so-called scattering networks, where signals are propagated through layers that compute the modulus of wavelet coefficients. The resulting feature extractor is provably translation-invariant and stable with respect to certain non-linear deformations. Moreover, it leads to state-of-the-art results in various image classification tasks [6, 7].

The wavelet transform resolves signal features characterized by point singularities, but is not very effective in dealing with signals dominated by anisotropic features, such as edges in images [8]. It thus seems natural to ask whether Mallat’s theory on scattering networks can be extended to general signal transformations. Moreover, certain audio classification problems [9] suggest that scattering networks with different signal transformations in different layers would be desirable in practice.

### Contributions

The goal of this paper is to extend Mallat’s theory to cope with general signal transformations (e.g., Gabor frames, wavelets, curvelets, shearlets, ridgelets), as well as to allow different signal transformations in different layers of the network, all while retaining translation-invariance and deformation stability. Our second major contribution is a new deformation stability bound valid for a class of non-linear deformations that is wider than that considered by Mallat in [1]. The proofs in [1] all hinge critically on the wavelet transform’s structural properties, whereas the technical arguments in our proofs are completely detached from the particular structure of the signal transforms. This leads to simplified and shorter proofs for translation-invariance and deformation stability. Moreover, in the case of Mallat’s wavelet-based feature extractor we show that the admissibility condition for the mother wavelet (defined in [1, Theorem 2.6]) is not needed. The mathematical engine behind our results is the theory of continuous frames [10].

### Notation and preparatory material

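The complex conjugate of $z \in \mathbb{C}$ is denoted by $\overline{z}$. The Euclidean inner product of $x, y \in \mathbb{C}^d$ is $\langle x, y \rangle := \sum_{i=1}^{d} x_i \overline{y_i}$, with associated norm $|x| := \sqrt{\langle x, x \rangle}$. The supremum norm of a matrix $M \in \mathbb{R}^{d \times d}$ is defined by $|M|_\infty := \sup_{i,j} |M_{i,j}|$, and the supremum norm of a tensor $T \in \mathbb{R}^{d \times d \times d}$ is $|T|_\infty := \sup_{i,j,k} |T_{i,j,k}|$. We write $B_R(x) \subseteq \mathbb{R}^d$ for the open ball of radius $R > 0$ centered at $x \in \mathbb{R}^d$. The Borel $\sigma$-algebra of $\mathbb{R}^d$ is denoted by $\mathcal{B}$. For a $\mathcal{B}$-measurable function $f: \mathbb{R}^d \to \mathbb{C}$, we write $\int_{\mathbb{R}^d} f(x)\,dx$ for the integral of $f$ with respect to Lebesgue measure. For $p \in [1, \infty)$, $L^p(\mathbb{R}^d)$ denotes the space of all $\mathcal{B}$-measurable functions $f: \mathbb{R}^d \to \mathbb{C}$ such that $\|f\|_p := \big(\int_{\mathbb{R}^d} |f(x)|^p\,dx\big)^{1/p} < \infty$. For $f, g \in L^2(\mathbb{R}^d)$ we set $\langle f, g \rangle := \int_{\mathbb{R}^d} f(x)\overline{g(x)}\,dx$. The operator norm of the linear bounded operator $A: L^p(\mathbb{R}^d) \to L^q(\mathbb{R}^d)$ is designated by $\|A\|_{p,q}$. $\mathrm{Id}$ stands for the identity operator on $L^p(\mathbb{R}^d)$. For a countably infinite set $Q$, $(L^2(\mathbb{R}^d))^Q$ denotes the space of sets $s := \{s_q\}_{q \in Q}$, $s_q \in L^2(\mathbb{R}^d)$, $\forall q \in Q$, such that $|||s||| := \big(\sum_{q \in Q} \|s_q\|_2^2\big)^{1/2} < \infty$. We write $S(\mathbb{R}^d)$ for the Schwartz space, i.e., the space of functions whose derivatives along with the function itself are rapidly decaying [11, Section 7.3]. We denote the Fourier transform of $f \in S(\mathbb{R}^d)$ by $\hat{f}(\omega) := \int_{\mathbb{R}^d} f(x) e^{-2\pi i \langle x, \omega \rangle}\,dx$, and extend it in the usual way to $L^2(\mathbb{R}^d)$ [11, Theorem 7.9]. The convolution of $f \in L^2(\mathbb{R}^d)$ and $g \in L^1(\mathbb{R}^d)$ is $(f * g)(y) := \int_{\mathbb{R}^d} f(x) g(y - x)\,dx$. We write $(T_t f)(x) := f(x - t)$, $t \in \mathbb{R}^d$, for the translation operator, and $(M_\omega f)(x) := e^{2\pi i \langle x, \omega \rangle} f(x)$, $\omega \in \mathbb{R}^d$, for the modulation operator. Involution is defined by $(If)(x) := \overline{f(-x)}$. We denote the gradient of a function $f: \mathbb{R}^d \to \mathbb{C}$ as $\nabla f$. For a vector field $v: \mathbb{R}^d \to \mathbb{R}^d$, we write $Dv$ for its Jacobian matrix, and $D^2 v$ for its Jacobian tensor, with associated norms $\|v\|_\infty := \sup_{x \in \mathbb{R}^d} |v(x)|$, $\|Dv\|_\infty := \sup_{x \in \mathbb{R}^d} |(Dv)(x)|_\infty$, and $\|D^2 v\|_\infty := \sup_{x \in \mathbb{R}^d} |(D^2 v)(x)|_\infty$. For a scalar field $\omega: \mathbb{R}^d \to \mathbb{R}$, we define the norm $\|\omega\|_\infty := \sup_{x \in \mathbb{R}^d} |\omega(x)|$.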

## II Mallat’s wavelet-based feature extractor

We set the stage by briefly reviewing Mallat’s construction [1]. The basis for Mallat’s feature extractor $\Phi_M$ is a multi-stage wavelet filtering technique followed by modulus operations. The extracted features $\Phi_M(f)$ of a signal $f \in L^2(\mathbb{R}^d)$ are defined as the set of low-pass filtered functions

$$|\cdots|\,|f * \psi_{\lambda^{(l)}}| * \psi_{\lambda^{(m)}}| \cdots * \psi_{\lambda^{(n)}}| * \phi_J, \tag{1}$$

labeled by the indices $\lambda^{(l)}, \lambda^{(m)}, \dots, \lambda^{(n)} \in \Lambda_W$ corresponding to pairs of scales and directions. The wavelets $\psi_\lambda$, $\lambda \in \Lambda_W$, and the low-pass filter $\phi_J$ are atoms of a semi-discrete Parseval wavelet frame $\Psi_{\Lambda_W}$ and hence satisfy

$$\|\phi_J * f\|_2^2 + \sum_{\lambda \in \Lambda_W} \|\psi_\lambda * f\|_2^2 = \|f\|_2^2, \quad \forall f \in L^2(\mathbb{R}^d).$$

We refer the reader to Appendix A for a short review of the theory of semi-discrete frames. The architecture corresponding to (1), illustrated in Figure 1, is known as a scattering network [6], and uses the same wavelets in every network layer.
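As a rough numerical illustration of the Parseval condition above, the following Python sketch builds a small filter bank on a periodic grid and checks that the energies of the filtered signals add up to the energy of the input. The Gaussian frequency bumps, the grid size, and all variable names are assumptions made for this sketch. They are not Mallat’s wavelets, and circular convolution on a finite grid only stands in for convolution on $\mathbb{R}^d$.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 256                       # number of samples on the periodic grid
xi = np.fft.fftfreq(n)        # normalized frequencies in [-0.5, 0.5)

# Hypothetical filter bank: Gaussian bumps in frequency. Row 0 plays the role
# of the low-pass filter, the remaining rows play the role of the wavelets.
centers = np.array([0.0, 0.05, 0.1, 0.2, 0.4])
widths = np.array([0.02, 0.03, 0.05, 0.1, 0.15])
bumps = np.exp(-((np.abs(xi)[None, :] - centers[:, None]) / widths[:, None]) ** 2)

# Normalize so that sum_k |h_k_hat(xi)|^2 = 1 at every frequency (Parseval).
filters_hat = np.sqrt(bumps / bumps.sum(axis=0, keepdims=True))

f = rng.standard_normal(n)
f_hat = np.fft.fft(f)

# Circular convolutions f * h_k, computed in the frequency domain.
coeffs = np.fft.ifft(filters_hat * f_hat[None, :], axis=1)

energy_in = np.sum(np.abs(f) ** 2)
energy_out = np.sum(np.abs(coeffs) ** 2)
print(energy_in, energy_out)  # the two values should agree up to rounding
```

Normalizing the squared frequency responses so that they sum to one is just one convenient way to obtain a Parseval filter bank; any family with that property would pass the same check.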

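It is shown in [1] that the feature extractor $\Phi_M$ in (1) is translation-invariant, in the sense that

$$\Phi_M(T_t f) = T_t \Phi_M(f), \quad \forall t \in \mathbb{R}^d,\ \forall f \in L^2(\mathbb{R}^d),$$

where $T_t$ is applied element-wise in $T_t \Phi_M(f)$. Further, it is proved in [1] that $\Phi_M$ is stable with respect to deformations of the form

$$F_\tau f(x) := f(x - \tau(x)). \tag{2}$$

Specifically, for the normed function space $(H_M, \|\cdot\|_{H_M})$ defined in (8) below, Mallat proved that there exists a constant $C > 0$ such that for all $f \in H_M$ and every $\tau$ with $\|D\tau\|_\infty \leq \frac{1}{2d}$, the deformation error satisfies the bound (3) below. (It is actually the assumption $\|D\tau\|_\infty \leq \frac{1}{2d}$, rather than $\|D\tau\|_\infty \leq \frac{1}{2}$ as stated in [1, Theorem 2.12], that is needed in [1, Eq. E.31] to establish $|\det(\mathrm{Id} - D\tau(x))| \geq 1 - d\|D\tau\|_\infty \geq 1/2$.)

$$|||\Phi_M(f) - \Phi_M(F_\tau f)||| \leq C\big(2^{-J}\|\tau\|_\infty + J\|D\tau\|_\infty + \|D^2\tau\|_\infty\big)\|f\|_{H_M}. \tag{3}$$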

## III Generalized feature extractor

In this section, we describe our generalized feature extractor and start by introducing the notion of a frame collection.

###### Definition 1.

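For all $n \in \mathbb{N}$, let $\Psi_n$ be a semi-discrete frame with frame bounds $A_n, B_n > 0$ and atoms $\{f_{\lambda_n}\}_{\lambda_n \in \Lambda_n} \subseteq L^1(\mathbb{R}^d) \cap L^2(\mathbb{R}^d)$ indexed by a countable set $\Lambda_n$. The sequence $\Psi := (\Psi_n)_{n \in \mathbb{N}}$ is called a frame collection with frame bounds $A := \inf_{n \in \mathbb{N}} A_n$ and $B := \sup_{n \in \mathbb{N}} B_n$.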

The elements $\Psi_n$, $n \in \mathbb{N}$, in a frame collection correspond to particular layers in the generalized scattering network defined below. In Mallat’s construction one atom of the semi-discrete wavelet frame $\Psi_{\Lambda_W}$, namely the low-pass filter $\phi_J$, is singled out to generate the output set (1) of the feature extractor $\Phi_M$. We honor Mallat’s terminology and designate one of the atoms of each frame in the frame collection as output-generating atom. Note, however, that our theory does not require this atom to have low-pass characteristics. Specifically, we set $\phi_n := f_{\lambda_n^*}$ for an arbitrary, but fixed $\lambda_n^* \in \Lambda_n$. From now on, we therefore write

for the atoms of the semi-discrete frame $\Psi_n$. The reader might want to think of the discrete index set $\Lambda_n$ as a collection of scales, directions, or frequency-shifts.

###### Remark 1.

Examples of structured frames that satisfy the general semi-discrete frame condition (9) and will hence be seen, in Theorem 1, to be applicable in the construction of generalized feature extractors are, e.g., Gabor frames [12], curvelets [13, 14], shearlets [8], ridgelets [15, 16], and, of course, wavelets [17] as considered by Mallat in [1].

We now introduce our generalized scattering network. To this end, we generalize the multi-stage filtering technique underlying Mallat’s scattering network to allow for general semi-discrete frames that can, in addition, be different in different layers. This requires the definition of a general modulus-convolution operator, and of paths on index sets.

###### Definition 2.

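Let $\Psi = (\Psi_n)_{n \in \mathbb{N}}$ be a frame collection with atoms $f_{\lambda_n}$, $\lambda_n \in \Lambda_n$. For $1 \leq m < \infty$, define the set $\Lambda_1^m := \Lambda_1 \times \Lambda_2 \times \dots \times \Lambda_m$. An ordered sequence $q = (\lambda_1, \lambda_2, \dots, \lambda_m) \in \Lambda_1^m$ is called a path. The empty path, $e := \emptyset$, defines the set $\Lambda_1^0 := \{e\}$. The modulus-convolution operator is defined as $U[\lambda_n] f := |f * f_{\lambda_n}|$, where $f_{\lambda_n}$ are the atoms of the semi-discrete frame $\Psi_n$ associated with the $n$-th layer in the network.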

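We also need to extend the operator $U[\lambda_n]$ to paths $q \in \Lambda_1^m$ and do that according to

$$U[q]f := U[\lambda_m] \cdots U[\lambda_2] U[\lambda_1] f = |\cdots|\,|f * f_{\lambda_1}| * f_{\lambda_2}| \cdots * f_{\lambda_m}|, \tag{4}$$

where we set $U[e]f := f$. Note that the multi-stage filtering operation (4) is well-defined, as $|f * f_{\lambda_n}| \in L^2(\mathbb{R}^d)$ for $f \in L^2(\mathbb{R}^d)$ and $f_{\lambda_n} \in L^1(\mathbb{R}^d)$, thanks to Young’s inequality [18, Theorem 1.2.12]. Figure 2 illustrates the generalized scattering network with different semi-discrete frames in different layers.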
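The path operator in (4) is a plain composition of convolve-then-modulus steps, which the following Python sketch makes concrete on a periodic grid. The function names `modulus_convolution` and `propagate`, the two Gaussian-shaped atoms, and the test signal are hypothetical choices for illustration only. Circular convolution via the FFT is an assumption that replaces convolution on $\mathbb{R}^d$.

```python
import numpy as np

def modulus_convolution(f, atom_hat):
    """U[lambda] f = |f * f_lambda|; atom_hat is the DFT of the atom f_lambda."""
    return np.abs(np.fft.ifft(np.fft.fft(f) * atom_hat))

def propagate(f, path_atoms_hat):
    """U[q] f for a path q = (lambda_1, ..., lambda_m), applied in layer order.
    An empty list corresponds to the empty path and returns f unchanged."""
    out = f
    for atom_hat in path_atoms_hat:
        out = modulus_convolution(out, atom_hat)
    return out

n = 128
xi = np.fft.fftfreq(n)
# Hypothetical atoms: a band-pass bump for layer 1, a different one for layer 2.
atom1_hat = np.exp(-((xi - 0.25) / 0.05) ** 2)
atom2_hat = np.exp(-((np.abs(xi) - 0.10) / 0.08) ** 2)

f = np.cos(2 * np.pi * 0.25 * np.arange(n)) * np.hanning(n)
u_q = propagate(f, [atom1_hat, atom2_hat])   # | |f * f_1| * f_2 |
```

Passing a list of atoms per path keeps the sketch agnostic to which frame each layer uses, mirroring the fact that the atoms may differ from layer to layer.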

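We can now put the pieces together and define the generalized feature extractor $\Phi_\Psi$.

###### Definition 3.

Let $\Psi = (\Psi_n)_{n \in \mathbb{N}}$ be a frame collection, and define $Q := \bigcup_{m=0}^{\infty} \Lambda_1^m$. Given a path $q \in \Lambda_1^m$, $m \geq 0$, we write $\phi[q] := \phi_{m+1}$ for the output-generating atom of the semi-discrete frame $\Psi_{m+1}$. The feature extractor $\Phi_\Psi$ with respect to the frame collection $\Psi$ is defined as

$$\Phi_\Psi(f) := \{U[q]f * \phi[q]\}_{q \in Q}. \tag{5}$$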
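The following Python sketch assembles a depth-limited version of the feature set in (5), under the reading that a path of length $m$ is filtered with the output-generating atom of layer $m+1$. Everything here is an assumption for illustration: the helper `parseval_bank`, the Gaussian atoms, the convention that row 0 of each bank is the output-generating atom, the truncation to paths of length at most `depth`, and the use of circular convolution on a periodic grid.

```python
import numpy as np

def parseval_bank(n, centers, width):
    """Hypothetical layer frame: Gaussian bumps in frequency, normalized so that
    sum_k |atom_hat_k|^2 = 1 (a discrete stand-in for a Parseval frame).
    Row 0 plays the role of the output-generating atom phi_n."""
    xi = np.abs(np.fft.fftfreq(n))
    bumps = np.exp(-((xi[None, :] - np.asarray(centers)[:, None]) / width) ** 2)
    return np.sqrt(bumps / bumps.sum(axis=0, keepdims=True))

def feature_extractor(f, banks, depth):
    """Return {q: U[q]f * phi[q]} for all paths q of length 0..depth.
    banks[m] holds the atoms of layer m + 1; requires len(banks) > depth."""
    features = {}
    layer = {(): f}                       # U[e]f = f for the empty path e
    for m in range(depth + 1):
        phi_hat, atoms_hat = banks[m][0], banks[m][1:]
        next_layer = {}
        for q, u in layer.items():
            u_hat = np.fft.fft(u)
            features[q] = np.fft.ifft(u_hat * phi_hat)      # U[q]f * phi[q]
            if m < depth:
                for lam, atom_hat in enumerate(atoms_hat):  # extend the path
                    next_layer[q + (lam,)] = np.abs(np.fft.ifft(u_hat * atom_hat))
        layer = next_layer
    return features

rng = np.random.default_rng(1)
n = 128
f = rng.standard_normal(n)
banks = [
    parseval_bank(n, [0.0, 0.1, 0.2, 0.4], 0.08),          # layer 1
    parseval_bank(n, [0.0, 0.05, 0.15, 0.3, 0.45], 0.05),  # layer 2
    parseval_bank(n, [0.0, 0.25], 0.2),                    # layer 3
]
features = feature_extractor(f, banks, depth=2)
feature_norm = np.sqrt(sum(np.sum(np.abs(v) ** 2) for v in features.values()))
```

Because each bank has upper frame bound 1 and the modulus preserves the norm, the total energy of the truncated feature set cannot exceed the energy of `f` in this discrete setting.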

## IV Main result

The main result of this paper is the following theorem, stating that the feature extractor $\Phi_\Psi$ defined in (5) is translation-invariant and stable with respect to time-frequency deformations of the form

$$F_{\tau,\omega} f(x) := e^{2\pi i \omega(x)} f(x - \tau(x)). \tag{6}$$

The class of deformations we consider is wider than the one in Mallat’s theory, which covers translation-like deformations of the form $f(x - \tau(x))$ only. Modulation-like deformations $e^{2\pi i \omega(x)} f(x)$ occur, e.g., if we have access only to a band-pass version of the signal $f$.
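To make the deformation in (6) tangible, the following Python sketch applies a combined translation-like and modulation-like deformation to a band-limited test function in dimension $d = 1$. The test function $\mathrm{sinc}^2$, the particular $\tau$ and $\omega$, the grid, and the helper names are all assumptions chosen for this illustration. The only bound checked is the elementary pointwise estimate $|1 - e^{2\pi i \omega(x)}| \leq 2\pi |\omega(x)|$ for the pure-modulation case, not the theorem below.

```python
import numpy as np

def f(x):
    # sinc^2 is band-limited: its Fourier transform is a triangle on [-1, 1].
    return np.sinc(x) ** 2

def deform(f, tau, omega, x):
    """Evaluate (F_{tau,omega} f)(x) = exp(2*pi*i*omega(x)) * f(x - tau(x))."""
    return np.exp(2j * np.pi * omega(x)) * f(x - tau(x))

x = np.linspace(-40.0, 40.0, 80_001)
dx = x[1] - x[0]

def l2_norm(values):
    # Riemann-sum approximation of the L2 norm on the grid.
    return np.sqrt(np.sum(np.abs(values) ** 2) * dx)

tau = lambda x: 0.05 * np.sin(x)         # |tau'| <= 0.05 <= 1/(2d) with d = 1
omega = lambda x: 0.02 * np.cos(3 * x)   # sup |omega| = 0.02
zero = lambda x: np.zeros_like(x)

err_both = l2_norm(f(x) - deform(f, tau, omega, x))   # translation + modulation
err_mod = l2_norm(f(x) - deform(f, zero, omega, x))   # modulation only
mod_bound = 2 * np.pi * 0.02 * l2_norm(f(x))          # 2*pi*||omega||_inf*||f||_2
```

Setting both `tau` and `omega` to `zero` recovers the identity, which is a quick sanity check that the helper implements (6) rather than some shifted variant.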

###### Theorem 1.

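Let $\Psi = (\Psi_n)_{n \in \mathbb{N}}$ be a frame collection with upper frame bound $B \leq 1$. The feature extractor $\Phi_\Psi$ defined in (5) is translation-invariant. Further, for $R > 0$, define the space of $R$-band-limited functions

$$H_R := \{f \in L^2(\mathbb{R}^d) \mid \operatorname{supp}(\hat{f}) \subseteq B_R(0)\}.$$

Then, the feature extractor $\Phi_\Psi$ is stable on $H_R$ with respect to non-linear deformations (6), i.e., there exists $C > 0$ (that does not depend on $\Psi$) such that for all $f \in H_R$ and all $\omega \in C(\mathbb{R}^d, \mathbb{R})$, $\tau \in C^1(\mathbb{R}^d, \mathbb{R}^d)$ with $\|D\tau\|_\infty \leq \frac{1}{2d}$, it holds that

$$|||\Phi_\Psi(f) - \Phi_\Psi(F_{\tau,\omega} f)||| \leq C\big(R\|\tau\|_\infty + \|\omega\|_\infty\big)\|f\|_2. \tag{7}$$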

The proof of Theorem 1 can be found in Appendix B. Our main result shows that translation-invariance and deformation stability are retained for the generalized feature extractor $\Phi_\Psi$. The strength of this result derives from the fact that the only condition on $\Psi$ for this to hold is $B \leq 1$. This condition is easily met by normalizing the frame elements accordingly. Such a normalization impacts neither translation-invariance nor the constant $C$ in (7), which is seen, in (14), to be independent of $\Psi$. All this is thanks to our proof techniques, unlike those in [1], being independent of the algebraic structure of the underlying frames. This is accomplished through a generalization of a Lipschitz-continuity result by Mallat [1, Proposition 2.5] for the feature extractor $\Phi_\Psi$ (stated in Proposition 2 in Appendix B), and by employing a partition of unity argument [19] for band-limited functions.
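To see where the factor $R\|\tau\|_\infty$ in (7) comes from, it may help to work out the simplest special case by hand: a pure translation, i.e., $\tau(x) = t$ constant and $\omega = 0$, so that $F_{\tau,\omega} = T_t$ and $D\tau = 0$. The short derivation below is an illustration and not part of the proof in Appendix B. It assumes the Fourier transform convention with kernel $e^{-2\pi i \langle x, \xi \rangle}$, and uses only Plancherel's identity, the elementary bound $|1 - e^{i\theta}| \leq |\theta|$, and the Lipschitz bound of Proposition 2 with $B \leq 1$.

```latex
\begin{align*}
\|f - T_t f\|_2^2
  &= \int_{B_R(0)} \bigl|1 - e^{-2\pi i \langle t, \xi\rangle}\bigr|^2 \,
     |\hat{f}(\xi)|^2 \, d\xi
  && \text{(Plancherel, } \operatorname{supp}(\hat{f}) \subseteq B_R(0)\text{)} \\
  &\leq \int_{B_R(0)} \bigl(2\pi\, |t|\, |\xi|\bigr)^2 \, |\hat{f}(\xi)|^2 \, d\xi
  && \text{(} |1 - e^{i\theta}| \leq |\theta| \text{ and Cauchy--Schwarz)} \\
  &\leq (2\pi R\, |t|)^2 \, \|f\|_2^2 , \\[4pt]
|||\Phi_\Psi(f) - \Phi_\Psi(T_t f)|||
  &\leq \|f - T_t f\|_2 \;\leq\; 2\pi R \, \|\tau\|_\infty \, \|f\|_2 .
\end{align*}
```

In this special case the bound has exactly the shape of (7) with $\omega = 0$: the error grows linearly in the bandwidth $R$ and in the size of the displacement.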

## V Relation to Mallat’s results

To see how Mallat’s wavelet-based architecture is covered by our Theorem 1, simply note that by [1, Eq. 2.7] the atoms $\{\phi_J\} \cup \{\psi_\lambda\}_{\lambda \in \Lambda_W}$ satisfy (10) with $A = B = 1$. Since Mallat’s construction uses the same wavelet frame in each layer, this trivially implies $B \leq 1$.

Mallat imposes additional technical conditions on the atoms $\{\psi_\lambda\}_{\lambda \in \Lambda_W}$, one of which is the so-called scattering admissibility condition for the mother wavelet, defined in [1, Theorem 2.6]. To the best of our knowledge, no wavelet in $L^1(\mathbb{R}^d) \cap L^2(\mathbb{R}^d)$, $d \geq 2$, satisfying this condition has been reported in the literature.

Mallat’s stability bound (3) applies to signals $f \in L^2(\mathbb{R}^d)$ satisfying

$$\|f\|_{H_M} := \sum_{m=0}^{\infty} \sum_{q \in (\Lambda_W)_1^m} \|U[q]f\|_2 < \infty. \tag{8}$$

While [1, Section 2.5] cites numerical evidence on (8) being finite for a large class of functions $f \in L^2(\mathbb{R}^d)$, it seems difficult to establish this analytically.

Finally, the stability bound (3) depends on the parameter $J$, which determines the coarsest scale resolved by the wavelets $\{\psi_\lambda\}_{\lambda \in \Lambda_W}$. For $J \to \infty$ the term $2^{-J}\|\tau\|_\infty$ vanishes; however, the term $J\|D\tau\|_\infty$ tends to infinity.

Our main result shows that i) the scattering admissibility condition in [1] is not needed, ii) instead of the signal class characterized by (8) our result applies provably to the space of $R$-band-limited functions $H_R$, and iii) our deformation stability bound (7), when particularized to wavelets, besides applying to a wider class of non-linear deformations, namely (6) instead of (2), is independent of $J$.

The proof technique used in [1] to establish (3) makes heavy use of structural specifics of the atoms $\{\phi_J\} \cup \{\psi_\lambda\}_{\lambda \in \Lambda_W}$, namely isotropic dilations, vanishing moment conditions, and a constant number of directional wavelets across scales.

## A Semi-discrete frames

This appendix gives a short review of semi-discrete frames [10].

###### Definition 4.

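Let $\{f_\lambda\}_{\lambda \in \Lambda} \subseteq L^1(\mathbb{R}^d) \cap L^2(\mathbb{R}^d)$ be a set of functions indexed by a countable set $\Lambda$. The set of translated and involuted functions

$$\Psi_\Lambda = \{T_b I f_\lambda\}_{(\lambda, b) \in \Lambda \times \mathbb{R}^d}$$

is called a semi-discrete frame, if there exist constants $A, B > 0$ such that

$$A\|f\|_2^2 \leq \sum_{\lambda \in \Lambda} \|f * f_\lambda\|_2^2 \leq B\|f\|_2^2 \tag{9}$$

for all $f \in L^2(\mathbb{R}^d)$. The functions $\{f_\lambda\}_{\lambda \in \Lambda}$ are called the atoms of the semi-discrete frame $\Psi_\Lambda$. When $A = B$ the semi-discrete frame is said to be tight. A tight semi-discrete frame with frame bound $A = 1$ is called a semi-discrete Parseval frame.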

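The frame operator associated with the semi-discrete frame $\Psi_\Lambda$ is defined in the weak sense by $S_\Lambda: L^2(\mathbb{R}^d) \to L^2(\mathbb{R}^d)$,

$$S_\Lambda f := \sum_{\lambda \in \Lambda} \int_{\mathbb{R}^d} \langle f, T_b I f_\lambda \rangle \, (T_b I f_\lambda) \, db.$$

$S_\Lambda$ is a bounded, positive, and boundedly invertible operator [10].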

The reader might want to think of semi-discrete frames as shift-invariant frames [20], where the translation parameter is left unsampled. The discrete index set $\Lambda$ typically labels a collection of scales, directions, or frequency-shifts. For instance, as illustrated in Section II, Mallat’s scattering network is based on a semi-discrete Parseval frame of directional wavelet structure, where the wavelet atoms $\psi_\lambda$ are indexed by the set $\Lambda_W$, labeling a collection of scales and directions.

For shift-invariant frames it is often convenient to work with a unitarily equivalent representation of the frame operator.

###### Proposition 1.

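[17, Theorem 5.11] Let $\Lambda$ be a countable index set. The functions $\{f_\lambda\}_{\lambda \in \Lambda} \subseteq L^1(\mathbb{R}^d) \cap L^2(\mathbb{R}^d)$ are atoms of the semi-discrete frame $\Psi_\Lambda = \{T_b I f_\lambda\}_{(\lambda, b) \in \Lambda \times \mathbb{R}^d}$ with frame bounds $A, B > 0$ if and only if

$$A \leq \sum_{\lambda \in \Lambda} |\hat{f}_\lambda(\omega)|^2 \leq B, \quad \text{a.e. } \omega \in \mathbb{R}^d. \tag{10}$$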
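The frequency-domain characterization in Proposition 1 is commonly stated as a two-sided bound on the sum $\sum_{\lambda} |\hat{f}_\lambda(\omega)|^2$, and that is the form assumed in the Python sketch below. The sketch estimates frame bounds for a small set of hypothetical Gaussian atoms on a periodic grid, checks the energy inequality (9) for a random signal, and shows that dividing the atoms by $\sqrt{B}$ yields upper frame bound 1. The atoms, grid size, and variable names are illustrative assumptions, and circular convolution stands in for convolution on $\mathbb{R}^d$.

```python
import numpy as np

n = 512
xi = np.fft.fftfreq(n)   # frequencies of the periodic length-n grid

# Hypothetical atoms, specified by their Fourier transforms (not from the paper).
centers = np.array([0.0, 0.08, 0.2, 0.4])
atoms_hat = np.exp(-((np.abs(xi)[None, :] - centers[:, None]) / 0.1) ** 2)

# Sum over lambda of |f_lambda_hat(xi)|^2, and its smallest and largest values.
lp_sum = np.sum(np.abs(atoms_hat) ** 2, axis=0)
A, B = lp_sum.min(), lp_sum.max()

rng = np.random.default_rng(5)
f = rng.standard_normal(n)
coeffs = np.fft.ifft(atoms_hat * np.fft.fft(f)[None, :], axis=1)
energy = np.sum(np.abs(coeffs) ** 2)     # sum over lambda of ||f * f_lambda||^2
f_energy = np.sum(np.abs(f) ** 2)        # ||f||^2

# Dividing every atom by sqrt(B) gives upper frame bound 1.
normalized_hat = atoms_hat / np.sqrt(B)
B_normalized = np.sum(np.abs(normalized_hat) ** 2, axis=0).max()
```

The last two lines illustrate the normalization step mentioned after Theorem 1: rescaling the atoms changes the frame bounds but not the structure of the frame.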

## B Proof of Theorem 1

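We first prove translation-invariance. Fix $f \in L^2(\mathbb{R}^d)$ and define $C[q]f := U[q]f * \phi[q]$, $q \in Q$. By (5) it follows that $\Phi_\Psi$ is translation-invariant if and only if

$$C[q](T_t f) = T_t(C[q]f), \quad \forall t \in \mathbb{R}^d,\ \forall q \in Q. \tag{11}$$

Due to $C[q](T_t f) = (U[q](T_t f)) * \phi[q]$ and

$$T_t(C[q]f) = T_t(U[q]f * \phi[q]) = (T_t(U[q]f)) * \phi[q],$$

(11) holds if $U[q](T_t f) = T_t(U[q]f)$, $\forall t \in \mathbb{R}^d$, $\forall q \in Q$. The proof is concluded by noting that $U[q]$ is translation-invariant thanks to (4) and

$$U[\lambda_n](T_t f) = |(T_t f) * f_{\lambda_n}| = |T_t(f * f_{\lambda_n})| = T_t(U[\lambda_n]f),$$

for all $\lambda_n \in \Lambda_n$, $n \in \mathbb{N}$.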
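The key identity in the argument above, that a single modulus-convolution step commutes with translation, can be checked numerically. The Python sketch below does so on a periodic grid, where translation becomes a circular shift (`np.roll`) and convolution becomes circular convolution. The random signal, the random atom, and the shift are arbitrary assumptions for the check.

```python
import numpy as np

rng = np.random.default_rng(2)
n, shift = 64, 5
f = rng.standard_normal(n) + 1j * rng.standard_normal(n)
atom = rng.standard_normal(n) + 1j * rng.standard_normal(n)   # arbitrary atom

def U(g):
    """One modulus-convolution step |g * atom| with circular convolution."""
    return np.abs(np.fft.ifft(np.fft.fft(g) * np.fft.fft(atom)))

lhs = U(np.roll(f, shift))       # U[lambda](T_t f)
rhs = np.roll(U(f), shift)       # T_t(U[lambda] f)
max_gap = np.max(np.abs(lhs - rhs))   # should be at rounding-error level
```

Since the identity holds for one step and an arbitrary atom, it carries over to every path by composition, which is exactly how the proof uses (4).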

Let us now turn to the proof of deformation stability, which is based on two key ingredients, the first being a generalization of a Lipschitz-continuity result by Mallat [1, Proposition 2.5]:

###### Proposition 2.

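Let $\Psi = (\Psi_n)_{n \in \mathbb{N}}$ be a frame collection with upper frame bound $B \leq 1$. The feature extractor $\Phi_\Psi$ is a bounded, Lipschitz-continuous operator with Lipschitz constant $\sqrt{B}$, i.e.,

$$|||\Phi_\Psi(f) - \Phi_\Psi(h)||| \leq \sqrt{B}\,\|f - h\|_2$$

for all $f, h \in L^2(\mathbb{R}^d)$.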

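The proof of Proposition 2 is not given here, as it essentially follows that of [1, Proposition 2.5] with minor changes. We now apply Proposition 2 with $h := F_{\tau,\omega} f$ and get

$$|||\Phi_\Psi(f) - \Phi_\Psi(F_{\tau,\omega} f)||| \leq \|f - F_{\tau,\omega} f\|_2$$

for all $f \in L^2(\mathbb{R}^d)$. Here, we used $\sqrt{B} \leq 1$, due to $B \leq 1$, as well as $h = F_{\tau,\omega} f \in L^2(\mathbb{R}^d)$, which is thanks to

$$\|h\|_2^2 = \|F_{\tau,\omega}(f)\|_2^2 = \int_{\mathbb{R}^d} |f(x - \tau(x))|^2 \, dx \leq 2\|f\|_2^2,$$

obtained through the change of variables $u = x - \tau(x)$, together with

$$\frac{du}{dx} = |\det(\mathrm{Id} - D\tau(x))| \geq 1 - d\|D\tau\|_\infty \geq 1/2. \tag{12}$$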

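The inequalities in (12) hold thanks to [21, Corollary 1] and $\|D\tau\|_\infty \leq \frac{1}{2d}$, respectively.

The second key ingredient of our proof is a partition of unity argument [19] for band-limited functions used to derive an upper bound on $\|f - F_{\tau,\omega} f\|_2$. We first determine a function $\gamma$ such that $f = f * \gamma$ for all $f \in H_R$. Consider $\eta \in S(\mathbb{R}^d)$ such that $\hat{\eta}(\omega) = 1$, $\forall \omega \in B_1(0)$. Setting $\gamma(x) := R^d \eta(Rx)$ yields $\hat{\gamma}(\omega) = \hat{\eta}(\omega/R)$. Thus, $\hat{\gamma}(\omega) = 1$, $\forall \omega \in B_R(0)$, as well as $\hat{f} = \hat{f}\,\hat{\gamma}$ and $f = f * \gamma$ for all $f \in H_R$. Then, we define the operator $A_\gamma: L^2(\mathbb{R}^d) \to L^2(\mathbb{R}^d)$, $A_\gamma f := f * \gamma$. Note that $A_\gamma$ is well-defined as $\gamma \in S(\mathbb{R}^d) \subseteq L^1(\mathbb{R}^d)$. We now get

$$\|f - F_{\tau,\omega} f\|_2 = \|A_\gamma f - F_{\tau,\omega} A_\gamma f\|_2 \leq \|A_\gamma - F_{\tau,\omega} A_\gamma\|_{2,2}\,\|f\|_2$$

for all $f \in H_R$. In order to bound the norm $\|A_\gamma - F_{\tau,\omega} A_\gamma\|_{2,2}$, we apply Schur’s Lemma to the integral operator $A_\gamma - F_{\tau,\omega} A_\gamma$.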

###### Schur’s Lemma.

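[18, App. I.1] Let $k: \mathbb{R}^d \times \mathbb{R}^d \to \mathbb{C}$ be a locally integrable function satisfying $\sup_{x \in \mathbb{R}^d} \int_{\mathbb{R}^d} |k(x,u)|\,du \leq \alpha$ and $\sup_{u \in \mathbb{R}^d} \int_{\mathbb{R}^d} |k(x,u)|\,dx \leq \alpha$, where $\alpha > 0$. Then, the integral operator $K$ given by $(Kf)(x) = \int_{\mathbb{R}^d} k(x,u) f(u)\,du$ is a bounded operator from $L^2(\mathbb{R}^d)$ to $L^2(\mathbb{R}^d)$ with norm $\|K\|_{2,2} \leq \alpha$.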
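A finite-dimensional analogue of Schur’s Lemma is easy to test: for a matrix, the integrals over $u$ and $x$ become absolute row sums and column sums, and the operator norm becomes the largest singular value. The Python sketch below checks this on a random matrix; the matrix itself and the variable names are assumptions made purely for illustration.

```python
import numpy as np

rng = np.random.default_rng(4)
K = rng.standard_normal((50, 40)) * rng.random((50, 40))   # arbitrary kernel matrix

row_bound = np.max(np.sum(np.abs(K), axis=1))   # sup over x of sum_u |k(x, u)|
col_bound = np.max(np.sum(np.abs(K), axis=0))   # sup over u of sum_x |k(x, u)|
alpha = max(row_bound, col_bound)

op_norm = np.linalg.norm(K, 2)                  # largest singular value of K
print(op_norm, alpha)                           # op_norm should not exceed alpha
```

In the proof, the same two quantities are bounded for the kernel of $A_\gamma - F_{\tau,\omega} A_\gamma$, once by integrating over $u$ and once by integrating over $x$.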

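From the identity

$$F_{\tau,\omega} A_\gamma(f)(x) = e^{2\pi i \omega(x)} \int_{\mathbb{R}^d} \gamma(x - \tau(x) - u) f(u)\,du,$$

it follows that $A_\gamma - F_{\tau,\omega} A_\gamma$ has the kernel function $k(x,u) := \gamma(x - u) - e^{2\pi i \omega(x)} \gamma(x - \tau(x) - u)$, which is locally integrable thanks to $\gamma \in S(\mathbb{R}^d)$ and the continuity of $\tau$ and $\omega$. We next use a first-order Taylor expansion in order to bound $|k(x,u)|$. To this end, let $x, u \in \mathbb{R}^d$, and define $h_{x,u}: \mathbb{R} \to \mathbb{C}$ as $h_{x,u}(t) := e^{2\pi i t \omega(x)} \gamma(x - t\tau(x) - u) - \gamma(x - u)$. It follows that $h_{x,u}(0) = 0$ and $h_{x,u}(1) = -k(x,u)$. Therefore, we have $h_{x,u}(t) = h_{x,u}(0) + \int_0^t \big(\tfrac{d}{dt} h_{x,u}\big)(\lambda)\,d\lambda$, $\forall t \in \mathbb{R}$. The special choice $t = 1$ yields $|k(x,u)| \leq \int_0^1 \big|\big(\tfrac{d}{dt} h_{x,u}\big)(\lambda)\big|\,d\lambda$ with

$$\Big|\Big(\frac{d}{dt} h_{x,u}\Big)(\lambda)\Big| \leq \big|\langle \nabla\gamma(x - \lambda\tau(x) - u), \tau(x) \rangle\big| + \big|2\pi\omega(x)\gamma(x - \lambda\tau(x) - u)\big| \leq \|\tau\|_\infty |\nabla\gamma(x - \lambda\tau(x) - u)| + 2\pi\|\omega\|_\infty |\gamma(x - \lambda\tau(x) - u)|.$$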

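Thanks to $\gamma \in S(\mathbb{R}^d)$, both $\gamma$ and $\nabla\gamma$ are integrable, and we can apply Fubini’s Theorem to get

$$\begin{aligned}
\int_{\mathbb{R}^d} |k(x,u)|\,du &\leq \|\tau\|_\infty \int_0^1 \int_{\mathbb{R}^d} |\nabla\gamma(x - \lambda\tau(x) - u)|\,du\,d\lambda + 2\pi\|\omega\|_\infty \int_0^1 \int_{\mathbb{R}^d} |\gamma(x - \lambda\tau(x) - u)|\,du\,d\lambda \\
&\leq \|\tau\|_\infty \|\nabla\gamma\|_1 + 2\pi\|\omega\|_\infty \|\gamma\|_1 = R\|\tau\|_\infty \|\nabla\eta\|_1 + 2\pi\|\omega\|_\infty \|\eta\|_1.
\end{aligned}$$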

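Similarly, we obtain

$$\begin{aligned}
\int_{\mathbb{R}^d} |k(x,u)|\,dx &\leq \|\tau\|_\infty \int_0^1 \int_{\mathbb{R}^d} |\nabla\gamma(x - \lambda\tau(x) - u)|\,dx\,d\lambda + 2\pi\|\omega\|_\infty \int_0^1 \int_{\mathbb{R}^d} |\gamma(x - \lambda\tau(x) - u)|\,dx\,d\lambda \\
&\leq 2\|\tau\|_\infty \|\nabla\gamma\|_1 + 4\pi\|\omega\|_\infty \|\gamma\|_1 = 2R\|\tau\|_\infty \|\nabla\eta\|_1 + 4\pi\|\omega\|_\infty \|\eta\|_1
\end{aligned}$$

by the change of variables $y = x - \lambda\tau(x)$, together with

$$\frac{dy}{dx} = |\det(\mathrm{Id} - \lambda D\tau(x))| \geq 1 - \lambda d\|D\tau\|_\infty \geq 1/2. \tag{13}$$

The inequalities in (13) hold thanks to [21, Corollary 1], $\lambda \in [0,1]$, and $\|D\tau\|_\infty \leq \frac{1}{2d}$. The proof is completed by setting

$$C := \max\{2\|\nabla\eta\|_1,\ 4\pi\|\eta\|_1\}, \tag{14}$$

since Schur’s Lemma then gives $\|A_\gamma - F_{\tau,\omega} A_\gamma\|_{2,2} \leq 2R\|\tau\|_\infty \|\nabla\eta\|_1 + 4\pi\|\omega\|_\infty \|\eta\|_1 \leq C\big(R\|\tau\|_\infty + \|\omega\|_\infty\big)$.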
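The determinant bound from [21, Corollary 1] used in (12) and (13) can be sanity-checked numerically. The Python sketch below draws random matrices whose entries are bounded in magnitude by $\frac{1}{2d}$, as stand-ins for the Jacobian $D\tau(x)$, and records the smallest value of $|\det(\mathrm{Id} - E)|$. The dimension, the sample count, and the uniform sampling are assumptions for illustration; a random search cannot prove the bound, it can only fail to find a counterexample.

```python
import numpy as np

rng = np.random.default_rng(3)
d = 4
eps = 1.0 / (2 * d)              # the assumption ||D tau||_inf <= 1/(2d)

worst = np.inf
for _ in range(10_000):
    E = rng.uniform(-eps, eps, size=(d, d))   # stand-in for D tau(x)
    worst = min(worst, abs(np.linalg.det(np.eye(d) - E)))

# The all-eps matrix attains the lower bound 1 - d*eps = 1/2 exactly.
extremal = np.linalg.det(np.eye(d) - eps * np.ones((d, d)))
print(worst, extremal)
```

The extremal matrix shows why the assumption cannot be relaxed to $\|D\tau\|_\infty \leq \frac{1}{2}$ if one wants the determinant to stay above $1/2$ in dimension $d > 1$.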

## References

• [1] S. Mallat, “Group invariant scattering,” Comm. Pure Appl. Math., vol. 65, no. 10, pp. 1331–1398, Oct. 2012.
• [2] Y. Bengio, A. Courville, and P. Vincent, “Representation learning: A review and new perspectives,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 35, no. 8, pp. 1798–1828, Aug. 2013.
• [3] Y. LeCun and C. Cortes, “The MNIST database of handwritten digits,” http://yann.lecun.com/exdb/mnist, 1998.
• [4] Y. LeCun, K. Kavukcuoglu, and C. Farabet, “Convolutional networks and applications in vision,” in Proc. of IEEE International Symposium on Circuits and Systems (ISCAS), 2010, pp. 253–256.
• [5] A. Krizhevsky, I. Sutskever, and G. Hinton, “ImageNet classification with deep convolutional neural networks,” in Proc. of 25th International Conference on Advances in Neural Information Processing Systems (NIPS), 2012, pp. 1106–1114.
• [6] J. Bruna and S. Mallat, “Invariant scattering convolution networks,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 35, no. 8, pp. 1872–1886, Aug. 2013.
• [7] ——, “Classification with scattering operators,” in Proc. of IEEE International Conference on Computer Vision and Pattern Recognition (CVPR), 2011, pp. 1561–1566.
• [8] G. Kutyniok and D. Labate, Shearlets: Multiscale analysis for multivariate data.   Birkhäuser, 2012.
• [9] J. Andén and S. Mallat, “Deep scattering spectrum,” IEEE Trans. Signal Process., vol. 62, no. 16, pp. 4114–4128, Aug. 2014.
• [10] S. T. Ali, J.-P. Antoine, and J.-P. Gazeau, “Continuous frames in Hilbert spaces,” Annals of Physics, vol. 222, no. 1, pp. 1–37, Feb. 1993.
• [11] W. Rudin, Functional analysis.   McGraw-Hill, 1991.
• [12] K. Gröchenig, Foundations of time-frequency analysis.   Birkhäuser, 2001.
• [13] E. J. Candès and D. L. Donoho, “Continuous curvelet transform: II. Discretization and frames,” Appl. Comput. Harmon. Anal., vol. 19, no. 2, pp. 198–222, Sep. 2005.
• [14] P. Grohs, S. Keiper, G. Kutyniok, and M. Schaefer, “Cartoon approximation with α-curvelets,” arXiv:1404.1043, Apr. 2014.
• [15] E. J. Candès, “Ridgelets: Theory and applications,” Ph.D. dissertation, Stanford University, 1998.
• [16] P. Grohs, “Ridgelet-type frame decompositions for Sobolev spaces related to linear transport,” J. Fourier Anal. Appl., vol. 18, no. 2, pp. 309–325, Apr. 2012.
• [17] S. Mallat, A wavelet tour of signal processing: The sparse way.   Academic Press, 2009.
• [18] L. Grafakos, Classical Fourier Analysis.   Springer, 2008.
• [19] W. Rudin, Real and complex analysis.   McGraw-Hill, 1987.
• [20] A. J. E. M. Janssen, “The duality condition for Weyl-Heisenberg frames,” in Gabor analysis: Theory and applications, H. G. Feichtinger and T. Strohmer, Eds.   Birkhäuser, 1998, pp. 33–84.
• [21] R. P. Brent, J. H. Osborn, and W. D. Smith, “Note on best possible bounds for determinants of matrices close to the identity matrix,” Linear Algebra Appl., vol. 466, pp. 21–26, Feb. 2015.