# Understanding Graph Neural Networks with Asymmetric Geometric Scattering Transforms

Michael Perlmutter Department of Computational Mathematics, Science & Engineering, Michigan State University, East Lansing, Michigan, USA Feng Gao Department of Genetics, Yale University, New Haven, Connecticut, USA Guy Wolf Matthew Hirn
November 15, 2019
###### Abstract

The scattering transform is a multilayered wavelet-based deep learning architecture that acts as a model of convolutional neural networks. Recently, several works have introduced generalizations of the scattering transform for non-Euclidean settings such as graphs. Our work builds upon these constructions by introducing windowed and non-windowed graph scattering transforms based upon a very general class of asymmetric wavelets. We show that these asymmetric graph scattering transforms have many of the same theoretical guarantees as their symmetric counterparts. This work helps bridge the gap between scattering and other graph neural networks by introducing a large family of networks with provable stability and invariance guarantees. This lays the groundwork for future deep learning architectures for graph-structured data that have learned filters and also provably have desirable theoretical properties.

## 1 Introduction

The scattering transform is a wavelet-based model of convolutional neural networks (CNNs), introduced for signals defined on $\mathbb{R}^n$ by S. Mallat in [11]. Like the front end of a CNN, the scattering transform produces a representation of an input signal through an alternating cascade of filter convolutions and pointwise nonlinearities. It differs from CNNs in two respects: i) it uses predesigned wavelet filters rather than filters learned from training data, and ii) it uses the complex modulus as its nonlinear activation function rather than more common choices such as the rectified linear unit (ReLU). These differences lead to a network which provably has desirable mathematical properties. In particular, the Euclidean scattering transform is: i) nonexpansive on $\mathbf{L}^2(\mathbb{R}^n)$, ii) invariant to translations up to a certain scale parameter, and iii) stable to certain diffeomorphisms. In addition to these theoretical properties, the scattering transform has also been used to achieve very good numerical results in fields such as audio processing [1], medical signal processing [4], computer vision [12], and quantum chemistry [10].

While CNNs have proven tremendously effective for a wide variety of machine learning tasks, they typically assume that input data has a Euclidean structure. For instance, an image is naturally modeled as a function on $\mathbb{R}^2$. However, many data sets of interest, such as social networks, molecules, or surfaces, have an intrinsically non-Euclidean structure and are naturally modeled as graphs or manifolds. This has motivated the rise of geometric deep learning, a field which aims to generalize deep learning methods to non-Euclidean settings. In particular, a number of papers have produced versions of the scattering transform for graph [7, 8, 9, 17] and manifold [13] structured data. These constructions seek to provide a mathematical model of geometric deep learning architectures such as graph neural networks in a manner analogous to the way that the Euclidean scattering transform models CNNs.

In this paper, we will construct two new families of wavelet transforms on a graph $G$ from asymmetric matrices $K$ and provide a theoretical analysis of both of these wavelet transforms as well as the windowed and non-windowed scattering transforms constructed from them. Because the matrices $K$ are in general not symmetric, our wavelet transforms will not be nonexpansive frame analysis operators on the standard inner product space $\mathbf{L}^2(G)$. Instead, they will be nonexpansive on a certain weighted inner product space $\mathbf{L}^2(G, M)$, where $M$ is an invertible matrix. In important special cases, our matrix $K$ will be either the lazy random walk matrix $P$, its transpose $P^T$, or its symmetric counterpart given by $T = D^{-1/2} P D^{1/2}$. In these cases, $\mathbf{L}^2(G, M)$ is a weighted $\mathbf{L}^2$ space with weights depending on the geometry of $G$. We will use these wavelets to construct windowed and non-windowed versions of the scattering transform on $G$. The windowed scattering transform inputs a signal $x \in \mathbf{L}^2(G, M)$ and outputs a sequence of functions which we refer to as the scattering coefficients. The non-windowed scattering transform replaces the low-pass matrix used in the definition of the windowed scattering transform with an averaging operator and instead outputs a sequence of scalar-valued coefficients. It can be viewed as the limit of the windowed scattering transform as the scale of the low-pass tends to infinity (evaluated at some fixed coordinate $0 \leq i \leq n-1$).

Analogously to the Euclidean scattering transform, we will show that the windowed graph scattering transform is: i) nonexpansive on $\mathbf{L}^2(G, M)$, ii) invariant to permutations of the vertices, up to a factor depending on the scale of the low-pass (for certain choices of $M$), and iii) stable to graph perturbations. Similarly, we will show that the non-windowed scattering transform is: i) Lipschitz continuous on $\mathbf{L}^2(G, M)$, ii) fully invariant to permutations, and iii) stable to graph perturbations.

### 1.1 Notation and Preliminaries

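Let $G = (V, E, W)$ be a weighted, connected graph consisting of vertices $V$, edges $E$, and weights $W$, with $|V| = n$ the number of vertices. If $x = (x(0), \ldots, x(n-1))^T$ is a signal in $\mathbf{L}^2(G)$, we will identify $x$ with the corresponding point in $\mathbb{R}^n$, so that if $B$ is an $n \times n$ matrix, the multiplication $Bx$ is well defined. Let $A$ denote the weighted adjacency matrix of $G$, let $d = (d(0), \ldots, d(n-1))^T$ be the corresponding weighted degree vector, and let $D = \mathrm{diag}(d)$. We will let

$$N \coloneqq I - D^{-1/2} A D^{-1/2}$$

be the normalized graph Laplacian, let $0 \leq \omega_0 \leq \omega_1 \leq \ldots \leq \omega_{n-1} \leq 2$ denote the eigenvalues of $N$, and let $v_0, \ldots, v_{n-1}$ be an orthonormal eigenbasis for $\mathbf{L}^2(G)$ with $N v_i = \omega_i v_i$. The matrix $N$ may be factored as

$$N = V \Omega V^T,$$

where $\Omega = \mathrm{diag}(\omega_0, \ldots, \omega_{n-1})$ and $V$ is the unitary matrix whose $i$-th column is $v_i$. One may check that $\omega_0 = 0$ and that we may choose $v_0 = d^{1/2} / \|d^{1/2}\|_2$, where $d^{1/2} = (\sqrt{d(0)}, \ldots, \sqrt{d(n-1)})^T$. We note that since we assume $G$ is connected, it has a positive spectral gap, i.e.,

$$0 = \omega_0 < \omega_1. \tag{1}$$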

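Our wavelet transforms will be constructed from the matrix $T_g$ defined by

$$T_g \coloneqq V g(\Omega) V^T \coloneqq V \Lambda_g V^T,$$

where $g : [0, 2] \to [0, 1]$ is some strictly decreasing spectral function such that $g(0) = 1$ and $g(2) = 0$, and

$$\Lambda_g \coloneqq \mathrm{diag}(g(\omega_0), \ldots, g(\omega_{n-1})) \coloneqq \mathrm{diag}(\lambda_0, \ldots, \lambda_{n-1}).$$

We note that $1 = \lambda_0 > \lambda_1 \geq \ldots \geq \lambda_{n-1} \geq 0$, where the fact that $\lambda_0 > \lambda_1$ follows from (1). When there is no potential for confusion, we will suppress dependence on $g$ and write $T$ and $\Lambda$ in place of $T_g$ and $\Lambda_g$. As our main example, we will choose $g(t) = g_\star(t) \coloneqq 1 - \frac{t}{2}$, in which case $T_{g_\star} = I - \frac{1}{2} N = \frac{1}{2}\left(I + D^{-1/2} A D^{-1/2}\right)$.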

In [8], Gama et al. constructed a graph scattering transform using wavelets which are polynomials in $T_{g_\star}$, and in [9], Gao et al. defined a different, but closely related, graph scattering transform from polynomials of the lazy random walk matrix

$$P \coloneqq \frac{1}{2}\left(I + A D^{-1}\right).$$

In order to unify and generalize these frameworks, we will let $M$ be an invertible matrix and let $K$ be the matrix defined by

$$K \coloneqq M^{-1} T M.$$

Note that $K$ depends on the choice of both $g$ and $M$, and thus includes a very large family of matrices. As important special cases, we note that we may obtain $K = T$ by setting $M = I$, and we obtain $P$ and $P^T$ by setting $g(t) = g_\star(t)$ and letting $M = D^{-1/2}$ and $M = D^{1/2}$, respectively.
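The special cases above can be checked numerically. The following is a minimal NumPy sketch, not the authors' implementation: the example graph `A` and the helper names `g_star` and `K_from` are hypothetical, and the spectral function is assumed to be $g(t) = 1 - t/2$ as in the main example.

```python
import numpy as np

# Hypothetical example: a small weighted, connected graph on 4 vertices.
A = np.array([[0., 1., 1., 2.],
              [1., 0., 3., 0.],
              [1., 3., 0., 1.],
              [2., 0., 1., 0.]])
n = A.shape[0]
d = A.sum(axis=1)                          # weighted degree vector
D_sqrt = np.diag(np.sqrt(d))
D_inv_sqrt = np.diag(1.0 / np.sqrt(d))

# Normalized graph Laplacian N = I - D^{-1/2} A D^{-1/2} and its eigendecomposition.
N = np.eye(n) - D_inv_sqrt @ A @ D_inv_sqrt
omega, V = np.linalg.eigh(N)               # ascending eigenvalues, orthonormal columns

def g_star(t):
    """Assumed spectral function g(t) = 1 - t/2 (the main example)."""
    return 1.0 - t / 2.0

T = V @ np.diag(g_star(omega)) @ V.T       # T = V g(Omega) V^T

def K_from(M, T):
    """K = M^{-1} T M for an invertible matrix M (solves M X = T M)."""
    return np.linalg.solve(M, T @ M)

P = 0.5 * (np.eye(n) + A @ np.diag(1.0 / d))   # lazy random walk matrix

K_identity = K_from(np.eye(n), T)          # M = I         gives K = T
K_walk = K_from(D_inv_sqrt, T)             # M = D^{-1/2}  gives K = P
K_walk_T = K_from(D_sqrt, T)               # M = D^{1/2}   gives K = P^T
```

Comparing `K_walk` with `P` and `K_walk_T` with `P.T` via `np.allclose` confirms the two random-walk special cases on this example.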

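In Section 2, we will construct two wavelet transforms $\mathcal{W}^{(1)}$ and $\mathcal{W}^{(2)}$ from functions of $K$ and show that these wavelet transforms are nonexpansive frame analysis operators on the appropriate Hilbert space. When $M = I$ (and therefore $K = T$), this Hilbert space will simply be the standard inner product space $\mathbf{L}^2(G)$. However, for general $M$, the matrix $K$ will not be self-adjoint on $\mathbf{L}^2(G)$. This motivates us to introduce the Hilbert space $\mathbf{L}^2(G, M)$ of signals defined on $G$ with inner product defined by

$$\langle x, y \rangle_M = \langle Mx, My \rangle_2,$$

where $\langle \cdot, \cdot \rangle_2$ denotes the standard $\mathbf{L}^2(G)$ inner product. We note that the norms $\|x\|_M = \|Mx\|_2$ and $\|x\|_2$ are equivalent and that

$$\frac{1}{\|M^{-1}\|_2} \|x\|_2 \leq \|x\|_M \leq \|M\|_2 \|x\|_2,$$

where for any $n \times n$ matrix $B$ we shall let $\|B\|_2$ and $\|B\|_M$ denote its operator norms on $\mathbf{L}^2(G)$ and $\mathbf{L}^2(G, M)$, respectively. The following lemma, which shows that $K$ is self-adjoint on $\mathbf{L}^2(G, M)$, will be useful in studying the frame bounds of the wavelet transforms constructed from $K$.

###### Lemma 1.

$K$ is self-adjoint on $\mathbf{L}^2(G, M)$.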

###### Proof.

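By construction, $T$ is self-adjoint with respect to the standard inner product. Therefore, for all $x$ and $y$ we have

$$\begin{aligned} \langle Kx, y \rangle_M &= \langle M(M^{-1}TM)x, My \rangle_2 = \langle TMx, My \rangle_2 = \langle Mx, TMy \rangle_2 \\ &= \langle Mx, M(M^{-1}TM)y \rangle_2 = \langle Mx, MKy \rangle_2 = \langle x, Ky \rangle_M. \end{aligned}$$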

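It will frequently be useful to consider the eigenvector decompositions of $T$ and $K$. By definition, we have

$$T = V \Lambda V^T, \tag{2}$$

where $\Lambda = \mathrm{diag}(\lambda_0, \ldots, \lambda_{n-1})$ and $\{v_0, \ldots, v_{n-1}\}$ is an orthonormal eigenbasis for $\mathbf{L}^2(G)$ with $T v_i = \lambda_i v_i$. Since the matrices $T$ and $K$ are similar with $K = M^{-1} T M$, one may use the definition of $\langle \cdot, \cdot \rangle_M$ to verify that the vectors $\{u_0, \ldots, u_{n-1}\}$ defined by

$$u_i \coloneqq M^{-1} v_i$$

form an orthonormal eigenbasis for $\mathbf{L}^2(G, M)$ with $K u_i = \lambda_i u_i$. One may also verify that

$$w_i \coloneqq M v_i$$

is a left-eigenvector of $K$ and $w_i^T K = \lambda_i w_i^T$ for all $0 \leq i \leq n-1$.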
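As an illustration of the self-adjointness of $K$ and of the eigenbasis $u_i = M^{-1} v_i$, here is a short NumPy check. It is a sketch under stated assumptions: the graph `A`, the asymmetric matrix `M`, and the helper `inner_M` are hypothetical choices made for the example, and $g(t) = 1 - t/2$ is assumed.

```python
import numpy as np

# Hypothetical example graph (weighted, connected) on 4 vertices.
A = np.array([[0., 1., 1., 2.],
              [1., 0., 3., 0.],
              [1., 3., 0., 1.],
              [2., 0., 1., 0.]])
n = A.shape[0]
d = A.sum(axis=1)
N = np.eye(n) - A / np.sqrt(np.outer(d, d))    # normalized Laplacian
omega, V = np.linalg.eigh(N)
lam = 1.0 - omega / 2.0                        # lambda_i = g(omega_i), assuming g(t) = 1 - t/2
T = V @ np.diag(lam) @ V.T

# Hypothetical asymmetric invertible M: upper triangular with diagonal entries 2.
M = np.triu(np.ones((n, n))) + np.eye(n)
K = np.linalg.solve(M, T @ M)                  # K = M^{-1} T M

def inner_M(x, y):
    """Weighted inner product <x, y>_M = <Mx, My>_2 for real signals."""
    return float((M @ x) @ (M @ y))

rng = np.random.default_rng(0)
x, y = rng.standard_normal(n), rng.standard_normal(n)
gap = inner_M(K @ x, y) - inner_M(x, K @ y)    # close to 0: K is self-adjoint in <.,.>_M

U = np.linalg.solve(M, V)                      # columns u_i = M^{-1} v_i
gram = (M @ U).T @ (M @ U)                     # Gram matrix of the u_i in <.,.>_M
residual = K @ U - U @ np.diag(lam)            # close to 0: K u_i = lambda_i u_i
```

Here `gram` should be the identity up to round-off, reflecting that the $u_i$ are orthonormal in $\mathbf{L}^2(G, M)$ even though they are generally not orthonormal in $\mathbf{L}^2(G)$.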

In the following section, we will construct wavelets from polynomials of $K$. For a polynomial $p(t) = a_k t^k + \ldots + a_1 t + a_0$ and a matrix $B$, we define $p(B)$ by

$$p(B) = a_k B^k + \ldots + a_1 B + a_0 I.$$

The following lemma uses (2) to derive a formula for computing polynomials of $T$ and $K$ and relates the operator norms of polynomials of $K$ to polynomials of $T$. It will be useful for studying the wavelet transforms introduced in the following section.

###### Lemma 2.

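For any polynomial $p$, we have

$$p(T) = V p(\Lambda) V^T \quad \text{and} \quad p(K) = M^{-1} p(T) M = M^{-1} V p(\Lambda) V^T M. \tag{3}$$

Consequently, for all $x \in \mathbf{L}^2(G, M)$,

$$\|p(K) x\|_M = \|p(T) M x\|_2. \tag{4}$$

###### Proof.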

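Since $V$ is unitary, $V^T = V^{-1}$, and so it follows from (2) that

$$T^r = V \Lambda^r V^T$$

for all $r \geq 0$. Moreover, since $K = M^{-1} T M$,

$$K^r = (M^{-1} T M)^r = M^{-1} T^r M = M^{-1} V \Lambda^r V^T M.$$

Linearity now implies (3). Equation (4) follows by recalling that $\|x\|_M = \|Mx\|_2$ and noting therefore that for all $x \in \mathbf{L}^2(G, M)$,

$$\|p(K) x\|_M = \|M p(K) x\|_2 = \|M M^{-1} p(T) M x\|_2 = \|p(T) M x\|_2.$$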
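The two identities of Lemma 2 can be sanity-checked on a small example. The sketch below is illustrative only: the graph `A`, the matrix `M`, the polynomial coefficients `coeffs`, and the helper `poly_of_matrix` are all hypothetical, and $g(t) = 1 - t/2$ is assumed.

```python
import numpy as np

def poly_of_matrix(coeffs, B):
    """Evaluate p(B) = a_k B^k + ... + a_1 B + a_0 I by Horner's rule.

    coeffs lists the coefficients from highest to lowest degree: [a_k, ..., a_0].
    """
    out = np.zeros_like(B)
    for a in coeffs:
        out = out @ B + a * np.eye(B.shape[0])
    return out

# Hypothetical example graph and asymmetric invertible M.
A = np.array([[0., 1., 1., 2.],
              [1., 0., 3., 0.],
              [1., 3., 0., 1.],
              [2., 0., 1., 0.]])
n = A.shape[0]
d = A.sum(axis=1)
N = np.eye(n) - A / np.sqrt(np.outer(d, d))
omega, V = np.linalg.eigh(N)
lam = 1.0 - omega / 2.0                         # assuming g(t) = 1 - t/2
T = V @ np.diag(lam) @ V.T
M = np.triu(np.ones((n, n))) + np.eye(n)
K = np.linalg.solve(M, T @ M)

coeffs = [2.0, -1.0, 0.5, 3.0]                  # hypothetical p(t) = 2t^3 - t^2 + 0.5t + 3

pK_direct = poly_of_matrix(coeffs, K)           # p(K) from the definition
pT = V @ np.diag(np.polyval(coeffs, lam)) @ V.T # p(T) = V p(Lambda) V^T
pK_spectral = np.linalg.solve(M, pT @ M)        # M^{-1} V p(Lambda) V^T M, as in (3)

rng = np.random.default_rng(1)
x = rng.standard_normal(n)
lhs = np.linalg.norm(M @ (pK_direct @ x))       # ||p(K) x||_M
rhs = np.linalg.norm(pT @ (M @ x))              # ||p(T) M x||_2, as in (4)
```

The spectral form is the one used to build the wavelets later, since it needs only one eigendecomposition of $N$ regardless of the polynomial degree.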

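In light of Lemma 2, for any polynomial $p$ with $p(\lambda_i) \geq 0$ for all $i$, we may define $p(T)^{1/2}$ and $p(K)^{1/2}$ by

$$p(T)^{1/2} \coloneqq V p(\Lambda)^{1/2} V^T \quad \text{and} \quad p(K)^{1/2} \coloneqq M^{-1} V p(\Lambda)^{1/2} V^T M, \tag{5}$$

where the square root of the diagonal matrix $p(\Lambda)$ is defined entrywise. We may readily verify that

$$p(T)^{1/2} p(T)^{1/2} = p(T) \quad \text{and} \quad p(K)^{1/2} p(K)^{1/2} = p(K).$$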

### 1.2 Related Work

Graph scattering transforms have previously been introduced by Gama, Ribeiro, and Bruna in [7] and [8], by Gao, Wolf, and Hirn in [9], and by Zou and Lerman in [17]. In [17], the authors construct a family of wavelet convolutions using the spectral decomposition of the unnormalized graph Laplacian and define a windowed scattering transform as an iterative series of wavelet convolutions and nonlinearities. They then prove results analogous to Theorems 1, 3, and 6 of this paper for their windowed scattering transform. They also introduce a notion of stability to graph perturbations. However, their notion of graph perturbations is significantly different from the one we consider in Section 4.

In [8], the authors construct a family of wavelets from polynomials of $T$ in the case where $g(t) = g_\star(t)$ and show that the resulting non-windowed scattering transform is stable to graph perturbations. These results were then generalized in [7], where the authors introduced a more general class of graph convolutions, constructed from a class of symmetric matrices known as “graph shift operators.” The wavelet transform considered in [8] is nearly identical to the $\mathcal{W}^{(2)}$ introduced in Section 2, in the special case where $g(t) = g_\star(t)$ and $M = I$, with the only difference being that our wavelet transform includes a low-pass filter. In [9], wavelets were constructed from the lazy random walk matrix $P$. These wavelets are essentially the same as the $\mathcal{W}^{(2)}$ in the case where $g(t) = g_\star(t)$ and $M = D^{-1/2}$, although, similarly to [8], the wavelets in [9] do not use a low-pass filter. In all of these previous works, the authors carry out substantial numerical experiments and demonstrate that scattering transforms are effective for a variety of graph deep learning tasks.

Our work here is meant to unify and generalize the theory of these previous constructions. Our introduction of the matrix $M$ allows us to obtain wavelets very similar to either [8] or [9] as special cases. Moreover, the introduction of the tight wavelet frame $\mathcal{W}^{(1)}$ allows us to produce a network with provable conservation of energy and nonexpansive properties analogous to [17]. To highlight the generality of our setup, we introduce both windowed and non-windowed versions of the scattering transform using general (wavelet) frames and provide a detailed theoretical analysis of both. In the case where $M = I$ (and therefore $K = T$), much of this analysis is quite similar to [8]. However, for general $M$, the matrix $K$ is asymmetric, which introduces substantial challenges. While [9] demonstrated that asymmetric wavelets are numerically effective in the case $M = D^{-1/2}$, this work is the first to produce a theoretical analysis of graph scattering transforms constructed with asymmetric wavelets.

We believe that the generality of our setup introduces a couple of exciting new avenues for future research. In particular, we have introduced a large class of scattering transforms with provable stability and invariance guarantees. In the future, one might attempt to learn the matrix $M$ or the spectral function $g$ from data for improved numerical performance on specific tasks. This could be an important step towards bridging the gap between scattering transforms, which act as a model of neural networks, and other deep learning architectures. We also note that a key difference between our work and [17] is that we use the normalized graph Laplacian whereas they use the unnormalized Laplacian. It is quite likely that asymmetric wavelet transforms similar to ours can be constructed from the spectral decomposition of the unnormalized Laplacian. However, we leave that to future work.

## 2 The Graph Wavelet Transform

In this section, we will construct two graph wavelet transforms based on the matrix $K$ introduced in Section 1.1. In the following sections, we will provide a theoretical analysis of the scattering transforms constructed from each of these wavelet transforms and of their stability properties.

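Let $J \geq 0$, and for $0 \leq j \leq J+1$, let $p_j$ be the polynomial defined by

$$p_j(t) = \begin{cases} 1 - t & \text{if } j = 0, \\ t^{2^{j-1}} - t^{2^j} & \text{if } 1 \leq j \leq J, \\ t^{2^J} & \text{if } j = J+1, \end{cases}$$

and let $q_j(t) = p_j(t)^{1/2}$. We note that by construction

$$\sum_{j=0}^{J+1} p_j(t) = \sum_{j=0}^{J+1} q_j(t)^2 = 1 \quad \text{for all } 0 \leq t \leq 1. \tag{6}$$

Using these functions, we define two wavelet transforms by

$$\mathcal{W}^{(1)}_J = \left\{\Psi^{(1)}_j, \Phi^{(1)}_J\right\}_{0 \leq j \leq J} \quad \text{and} \quad \mathcal{W}^{(2)}_J = \left\{\Psi^{(2)}_j, \Phi^{(2)}_J\right\}_{0 \leq j \leq J},$$

where

$$\Psi^{(1)}_j = q_j(K), \quad \Phi^{(1)}_J = q_{J+1}(K), \quad \Psi^{(2)}_j = p_j(K), \quad \text{and} \quad \Phi^{(2)}_J = p_{J+1}(K),$$

and the $q_j(K)$ are defined as in (5). The next two propositions show that $\mathcal{W}^{(1)}_J$ is an isometry and $\mathcal{W}^{(2)}_J$ is a nonexpansive frame analysis operator on $\mathbf{L}^2(G, M)$.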

###### Proposition 1.

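$\mathcal{W}^{(1)}_J$ is an isometry from $\mathbf{L}^2(G, M)$ to $\ell^2(\mathbf{L}^2(G, M))$. That is, for all $x \in \mathbf{L}^2(G, M)$,

$$\left\|\mathcal{W}^{(1)}_J x\right\|^2_{\ell^2(\mathbf{L}^2(G, M))} \coloneqq \sum_{j=0}^{J} \left\|\Psi^{(1)}_j x\right\|_M^2 + \left\|\Phi^{(1)}_J x\right\|_M^2 = \|x\|_M^2.$$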

###### Proof.

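Lemma 1 shows $K$ is self-adjoint on $\mathbf{L}^2(G, M)$. By Lemma 2 and by (5) we have

$$\Psi^{(1)}_j = q_j(K) = M^{-1} V q_j(\Lambda) V^T M$$

for $0 \leq j \leq J$, and

$$\Phi^{(1)}_J = q_{J+1}(K) = M^{-1} V q_{J+1}(\Lambda) V^T M.$$

Thus, $\Psi^{(1)}_0, \ldots, \Psi^{(1)}_J$ and $\Phi^{(1)}_J$ are all self-adjoint on $\mathbf{L}^2(G, M)$ and are diagonalized in the same basis. Therefore, the lower and upper frame bounds of $\mathcal{W}^{(1)}_J$ are given by computing

$$\min_{0 \leq i \leq n-1} Q(\lambda_i) \quad \text{and} \quad \max_{0 \leq i \leq n-1} Q(\lambda_i),$$

where $Q(t) \coloneqq \sum_{j=0}^{J+1} q_j(t)^2$. The proof follows from recalling that by (6), we have that $Q(t) = 1$ uniformly on $0 \leq t \leq 1$, and therefore $\mathcal{W}^{(1)}_J$ is an isometry. ∎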

###### Proposition 2.

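$\mathcal{W}^{(2)}_J$ is a nonexpansive frame analysis operator from $\mathbf{L}^2(G, M)$ to $\ell^2(\mathbf{L}^2(G, M))$. That is, there exists a constant $C_J > 0$, which depends only on $J$, such that for all $x \in \mathbf{L}^2(G, M)$,

$$C_J \|x\|_M^2 \leq \left\|\mathcal{W}^{(2)}_J x\right\|^2_{\ell^2(\mathbf{L}^2(G, M))} \coloneqq \sum_{j=0}^{J} \left\|\Psi^{(2)}_j x\right\|_M^2 + \left\|\Phi^{(2)}_J x\right\|_M^2 \leq \|x\|_M^2.$$

We note, in particular, that $C_J$ does not depend on $M$ or on the eigenvalues of $T$.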

###### Remark 1.

If we restrict attention to such that then we may use an argument similar to Proposition 4.1 of [8] to get a lower frame bounds for which does not depend on , but does depend on the

###### Proof.

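By the same reasoning as in the proof of Proposition 1, the frame bounds of $\mathcal{W}^{(2)}_J$ are given by computing

$$\min_{0 \leq i \leq n-1} P(\lambda_i) \quad \text{and} \quad \max_{0 \leq i \leq n-1} P(\lambda_i),$$

where $P(t) \coloneqq \sum_{j=0}^{J+1} p_j(t)^2$. Since $0 \leq \lambda_i \leq 1$ for all $i$, we have

$$\max_i P(\lambda_i) \leq \max_{[0,1]} \sum_{j=0}^{J+1} p_j(t)^2 \leq \max_{[0,1]} \left(\sum_{j=0}^{J+1} p_j(t)\right)^2 = 1,$$

with the middle inequality following from the fact that $p_j(t) \geq 0$ for all $t \in [0, 1]$ and the last equality following from (6). For the lower bound, we note that

$$\min_{0 \leq i \leq n-1} P(\lambda_i) \geq \min_{0 \leq t \leq 1} \sum_{j=0}^{J+1} p_j(t)^2 \geq \min_{0 \leq t \leq 1} \left[p_0(t)^2 + p_{J+1}(t)^2\right] = \min_{0 \leq t \leq 1} \left[(1-t)^2 + t^{2^{J+1}}\right] \coloneqq C_J > 0.$$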
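To make Propositions 1 and 2 concrete, the following NumPy sketch builds both wavelet transforms through the spectral formula $M^{-1} V f(\Lambda) V^T M$ and measures their frame energies on a random signal. It is illustrative only: the example graph, the choice $M = D^{-1/2}$, the value $J = 2$, and the helper names `wavelet_symbols` and `build_frames` are assumptions of the example, as is the spectral function $g(t) = 1 - t/2$. The constant $C_J$ is approximated on a grid.

```python
import numpy as np

def wavelet_symbols(lam, J):
    """Values p_j(lam) for j = 0, ..., J+1, stacked as rows."""
    rows = [1.0 - lam]
    rows += [lam ** (2 ** (j - 1)) - lam ** (2 ** j) for j in range(1, J + 1)]
    rows.append(lam ** (2 ** J))
    return np.clip(np.array(rows), 0.0, None)    # guard against round-off below zero

def build_frames(A, M, J):
    """Return (W1, W2) as lists [Psi_0, ..., Psi_J, Phi] of n x n matrices."""
    d = A.sum(axis=1)
    N = np.eye(len(d)) - A / np.sqrt(np.outer(d, d))
    omega, V = np.linalg.eigh(N)
    lam = np.clip(1.0 - omega / 2.0, 0.0, 1.0)   # assuming g(t) = 1 - t/2
    p = wavelet_symbols(lam, J)
    conj = lambda s: np.linalg.solve(M, V @ np.diag(s) @ V.T @ M)
    W1 = [conj(np.sqrt(s)) for s in p]           # q_j(K) = p_j(K)^{1/2}
    W2 = [conj(s) for s in p]                    # p_j(K)
    return W1, W2

# Hypothetical example graph; M = D^{-1/2} is one of the special cases (K = P).
A = np.array([[0., 1., 1., 2.],
              [1., 0., 3., 0.],
              [1., 3., 0., 1.],
              [2., 0., 1., 0.]])
d = A.sum(axis=1)
M = np.diag(d ** -0.5)
J = 2
W1, W2 = build_frames(A, M, J)

rng = np.random.default_rng(2)
x = rng.standard_normal(len(d))
norm_sq = np.sum((M @ x) ** 2)                              # ||x||_M^2
energy = lambda W: sum(np.sum((M @ F @ x) ** 2) for F in W)
e1, e2 = energy(W1), energy(W2)                             # ||W1 x||^2 and ||W2 x||^2

t = np.linspace(0.0, 1.0, 10001)
C_J = np.min((1.0 - t) ** 2 + t ** (2 ** (J + 1)))          # grid approximation of C_J
```

On this example `e1` matches `norm_sq` up to round-off, while `e2` lies between `C_J * norm_sq` and `norm_sq`, in line with the isometry and frame bounds above.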

## 3 The Scattering Transform

In this section, we will construct the scattering transform as a multilayered architecture built from a frame $\mathcal{W}$, such as the wavelet transforms $\mathcal{W}^{(1)}_J$ and $\mathcal{W}^{(2)}_J$ introduced in Section 2. We shall see that the scattering transform constructed is a continuous operator on $\mathbf{L}^2(G, M)$ whenever $\mathcal{W}$ is nonexpansive. We shall also see that it has desirable conservation of energy bounds when $\mathcal{W} = \mathcal{W}^{(1)}_J$, due to the fact that $\mathcal{W}^{(1)}_J$ is an isometry. On the other hand, we shall see in the following section that the scattering transform has much stronger stability guarantees when $\mathcal{W} = \mathcal{W}^{(2)}_J$.

### 3.1 Definitions

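Let $G = (V, E, W)$ be a connected weighted graph with $|V| = n$, let $M$ be an invertible matrix, and let $\mathcal{J}$ be some indexing set. Assume that

$$\mathcal{W} = \{\Psi_j, \Phi\}_{j \in \mathcal{J}}$$

is a frame on $\mathbf{L}^2(G, M)$ such that

$$A \|x\|_M^2 \leq \|\mathcal{W} x\|^2_{\ell^2(\mathbf{L}^2(G, M))} \coloneqq \sum_{j \in \mathcal{J}} \|\Psi_j x\|_M^2 + \|\Phi x\|_M^2 \leq B \|x\|_M^2, \tag{7}$$

for some $0 < A \leq B < \infty$. In this paper, we are primarily interested in the case where $\mathcal{J} = \{0, \ldots, J\}$ and $\mathcal{W}$ is either $\mathcal{W}^{(1)}_J$ or $\mathcal{W}^{(2)}_J$. Therefore, we will think of the matrices $\Psi_j$ as wavelets and $\Phi$ as a low-pass filter. However, we will define the scattering transform for generic frames in order to highlight the relationship between properties of the scattering transform and of the underlying frame.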

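Letting $\mathcal{M}$ be the pointwise modulus function $\mathcal{M}x = (|x(0)|, \ldots, |x(n-1)|)^T$ (not to be confused with the matrix $M$), we define $U$ by

$$Ux \coloneqq \{U[j]x : m \geq 0,\ j = (j_1, \ldots, j_m) \in \mathcal{J}^m\}.$$

Here, $\mathcal{J}^m$ is the $m$-fold Cartesian product of $\mathcal{J}$ with itself, the $U[j]x$ are defined by

$$U[j]x = \mathcal{M}\Psi_{j_m} \cdots \mathcal{M}\Psi_{j_1} x$$

for $m \geq 1$, and we declare that $U[j]x = x$ when $m = 0$ and $j$ is the “empty index.” We then define the windowed and non-windowed scattering transforms, $S$ and $\overline{S}$, by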

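$$Sx = \{S[j]x : m \geq 0,\ j = (j_1, \ldots, j_m) \in \mathcal{J}^m\} \quad \text{and} \quad \overline{S}x = \{\overline{S}[j]x : m \geq 0,\ j = (j_1, \ldots, j_m) \in \mathcal{J}^m\},$$

where the scattering coefficients $S[j]x$ and $\overline{S}[j]x$ are defined by

$$S[j]x = \Phi U[j]x \quad \text{and} \quad \overline{S}[j]x = \langle \mu, U[j]x \rangle_M$$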

for some weighting vector One natural choice is where is the vector of all ones. In this case, one may verify that and we recover a setup similar to [9]. Another natural choice is in which case we recover a setup similar to [8] if we set

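In practice, one only uses finitely many scattering coefficients. This motivates us to consider the partial scattering transforms defined for $0 \leq \ell \leq L$ by

$$S^{(L)}_\ell x = \{S[j]x : j = (j_1, \ldots, j_m) \in \mathcal{J}^m,\ \ell \leq m \leq L\}$$

and

$$\overline{S}^{(L)}_\ell x = \{\overline{S}[j]x : j = (j_1, \ldots, j_m) \in \mathcal{J}^m,\ \ell \leq m \leq L\}.$$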
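The definitions of this section translate into a short recursion over paths. The sketch below computes the partial transforms with $\ell = 0$ for real-valued signals; it is not the authors' code. The function names `diffusion_frame` and `scattering`, the example graph, the choice $M = D^{-1/2}$, the weighting vector `mu` of all ones, and the depth `L = 2` are assumptions of the example, and the frame used is the polynomial frame $p_j(K)$ with $g(t) = 1 - t/2$.

```python
import numpy as np

def diffusion_frame(A, M, J):
    """Matrices [Psi_0, ..., Psi_J, Phi] of the polynomial frame p_j(K), K = M^{-1} T M."""
    d = A.sum(axis=1)
    N = np.eye(len(d)) - A / np.sqrt(np.outer(d, d))
    omega, V = np.linalg.eigh(N)
    lam = np.clip(1.0 - omega / 2.0, 0.0, 1.0)      # assuming g(t) = 1 - t/2
    symbols = [1.0 - lam]
    symbols += [lam ** (2 ** (j - 1)) - lam ** (2 ** j) for j in range(1, J + 1)]
    symbols.append(lam ** (2 ** J))
    return [np.linalg.solve(M, V @ np.diag(s) @ V.T @ M) for s in symbols]

def scattering(x, Psis, Phi, M, mu, L):
    """Windowed and non-windowed coefficients for all paths of length 0, ..., L.

    Returns two dicts keyed by the path tuple j = (j_1, ..., j_m):
    S[j] = Phi U[j]x (a vector) and S_bar[j] = <mu, U[j]x>_M (a scalar).
    """
    layer = {(): x}                                  # U[empty path]x = x
    S, S_bar = {}, {}
    for m in range(L + 1):
        for path, u in layer.items():
            S[path] = Phi @ u
            S_bar[path] = float((M @ mu) @ (M @ u))
        if m < L:                                    # next layer: modulus of each wavelet output
            layer = {path + (j,): np.abs(Psi @ u)
                     for path, u in layer.items()
                     for j, Psi in enumerate(Psis)}
    return S, S_bar

# Hypothetical example graph, M = D^{-1/2}, and weighting vector mu of all ones.
A = np.array([[0., 1., 1., 2.],
              [1., 0., 3., 0.],
              [1., 3., 0., 1.],
              [2., 0., 1., 0.]])
d = A.sum(axis=1)
M = np.diag(d ** -0.5)
mu = np.ones(len(d))
*Psis, Phi = diffusion_frame(A, M, J=2)

rng = np.random.default_rng(3)
x, y = rng.standard_normal(len(d)), rng.standard_normal(len(d))
Sx, Sx_bar = scattering(x, Psis, Phi, M, mu, L=2)
Sy, _ = scattering(y, Psis, Phi, M, mu, L=2)

# Squared distance between the partial windowed transforms, and ||x - y||_M^2.
dist_sq = sum(np.sum((M @ (Sx[p] - Sy[p])) ** 2) for p in Sx)
bound_sq = np.sum((M @ (x - y)) ** 2)
```

With three wavelets and depth two, the dictionaries hold $1 + 3 + 9 = 13$ paths. Comparing `dist_sq` with `bound_sq` gives a quick empirical look at how far apart two signals remain after the transform.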

### 3.2 Continuity and Conservation of Energy Properties

The following theorem shows that the windowed scattering transform is nonexpansive and the non-windowed scattering transform is Lipschitz continuous when $\mathcal{W}$ is either $\mathcal{W}^{(1)}_J$ or $\mathcal{W}^{(2)}_J$ or, more generally, whenever $\mathcal{W}$ is nonexpansive.

###### Theorem 1.

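If $B \leq 1$ in (7), then the windowed scattering transform is a nonexpansive operator from $\mathbf{L}^2(G, M)$ to $\ell^2(\mathbf{L}^2(G, M))$ and the non-windowed scattering transform is a Lipschitz continuous operator from $\mathbf{L}^2(G, M)$ to $\ell^2$. Specifically, for all $x, y \in \mathbf{L}^2(G, M)$,

$$\|Sx - Sy\|_{\ell^2(\mathbf{L}^2(G, M))} \leq \|x - y\|_M, \tag{8}$$

and

$$\|\overline{S}x - \overline{S}y\|_{\ell^2} \leq \|\mu\|_M \|\Phi^{-1}\|_M \|x - y\|_M. \tag{9}$$

The proof of (8) is very similar to analogous results in, e.g., [11] and [17]. The proof of (9) uses the relationship $\overline{S}[j]x = \langle \mu, \Phi^{-1} S[j]x \rangle_M$ to show

$$\|\overline{S}x - \overline{S}y\|_{\ell^2} \leq \|\mu\|_M \|\Phi^{-1}\|_M \|Sx - Sy\|_{\ell^2(\mathbf{L}^2(G, M))}.$$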

Full details are provided in Appendix B.

The next theorem shows that if $\mathcal{W}$ is either of the wavelet transforms constructed in Section 2, then $U$ experiences rapid energy decay. Our arguments use ideas similar to the proof of Proposition 3.3 of [17], with minor modifications to account for the fact that our wavelet constructions are different. Please see Appendix C for a complete proof.

###### Theorem 2.

Let let and let be either of the wavelet transforms, or constructed in Section 2. Then for all and all

$$\sum_{j \in \mathcal{J}^{m+1}} \|U[j]x\|_M^2 \leq \left(1 - \frac{d_{\min}}{\|d\|_1}\right) \sum_{j \in \mathcal{J}^m} \|U[j]x\|_M^2. \tag{10}$$

Therefore, for all

$$\sum_{j \in \mathcal{J}^{m+1}} \|U[j]x\|_M^2 \leq \left(1 - \frac{d_{\min}}{\|d\|_1}\right)^m \|x\|_M^2. \tag{11}$$
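The decay in (11) can be observed numerically by summing the energy of each layer of $U$. The following NumPy sketch is a hedged illustration, not a proof or the authors' code: it assumes $M = D^{1/2}$ (one of the special cases from Section 1.1, chosen here for the example), $g(t) = 1 - t/2$, the tight frame $q_j(K)$ with $J = 2$, and a hypothetical example graph. The helper names `tight_frame` and `layer_energies` are also hypothetical.

```python
import numpy as np

def tight_frame(A, M, J):
    """Matrices [Psi_0, ..., Psi_J, Phi] of the frame q_j(K) = p_j(K)^{1/2}."""
    d = A.sum(axis=1)
    N = np.eye(len(d)) - A / np.sqrt(np.outer(d, d))
    omega, V = np.linalg.eigh(N)
    lam = np.clip(1.0 - omega / 2.0, 0.0, 1.0)      # assuming g(t) = 1 - t/2
    symbols = [1.0 - lam]
    symbols += [lam ** (2 ** (j - 1)) - lam ** (2 ** j) for j in range(1, J + 1)]
    symbols.append(lam ** (2 ** J))
    roots = [np.sqrt(np.clip(s, 0.0, None)) for s in symbols]
    return [np.linalg.solve(M, V @ np.diag(r) @ V.T @ M) for r in roots]

def layer_energies(x, Psis, M, depth):
    """E_m = sum over paths j of length m of ||U[j]x||_M^2, for m = 0, ..., depth."""
    layer, energies = [x], []
    for m in range(depth + 1):
        energies.append(sum(np.sum((M @ u) ** 2) for u in layer))
        if m < depth:
            layer = [np.abs(Psi @ u) for u in layer for Psi in Psis]
    return energies

# Hypothetical example graph; M = D^{1/2} is an assumption of this sketch.
A = np.array([[0., 1., 1., 2.],
              [1., 0., 3., 0.],
              [1., 3., 0., 1.],
              [2., 0., 1., 0.]])
d = A.sum(axis=1)
M = np.diag(np.sqrt(d))
*Psis, Phi = tight_frame(A, M, J=2)

rng = np.random.default_rng(4)
x = rng.standard_normal(len(d))
depth = 4
E = layer_energies(x, Psis, M, depth)               # E[0] = ||x||_M^2
rate = 1.0 - d.min() / d.sum()                      # 1 - d_min / ||d||_1
bounds = [rate ** m * E[0] for m in range(depth)]   # right-hand side of (11) for m = 0, ..., depth-1
```

Each `E[m + 1]` can then be compared against `bounds[m]`. The number of paths grows like $3^m$ here, so the depth is kept small.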

The next theorem shows that if $\mathcal{W} = \mathcal{W}^{(1)}_J$, then the windowed graph scattering transform conserves energy on $\mathbf{L}^2(G, M)$. Its proof, which relies on Proposition 1, Theorem 2, and Lemma 5, is nearly identical to the proof of Theorem 3.1 in [17]. We give a proof in Appendix D for the sake of completeness.

###### Theorem 3.

Let let and let $\mathcal{W} = \mathcal{W}^{(1)}_J$. Then the windowed scattering transform is energy preserving, i.e., for all $x \in \mathbf{L}^2(G, M)$,

$$\|Sx\|_{\ell^2(\mathbf{L}^2(G, M))} = \|x\|_M.$$

### 3.3 Permutation Invariance and Equivariance

In this section, we will show that both $U$ and the windowed graph scattering transform $S$ are permutation equivariant. As a consequence, we will be able to show that the non-windowed scattering transform is permutation invariant and that, under certain assumptions, the windowed scattering transform is permutation invariant up to a factor depending on the scale of the low-pass filter.

Let $S_n$ denote the permutation group on $n$ elements, and, for $\Pi \in S_n$, let $G'$ be the graph obtained by permuting the vertices of $G$. We define $M'$, which we view as the analog of $M$ associated to $G'$, by

$$M' = \Pi M \Pi^T.$$

To motivate this definition, we note that if $M$ is the identity, then $M'$ is also the identity, and if $M = D^{1/2}$, the square-root degree matrix, then the square-root degree matrix on $G'$ is given by

$$\Pi D^{1/2} \Pi^T,$$

with a similar formula holding when $M = D^{-1/2}$. We define $\mathcal{W}'$ and $\mu'$ to be the frame and the weighting vector on $G'$ corresponding to $\mathcal{W}$ and $\mu$ by

$$\mathcal{W}' \coloneqq \Pi \mathcal{W} \Pi^T \coloneqq \{\Pi \Psi_j \Pi^T, \Pi \Phi \Pi^T\}_{j \in \mathcal{J}} \quad \text{and} \quad \mu' = \Pi \mu, \tag{12}$$

and we let $S'$ and $\overline{S}'$ denote the corresponding windowed and non-windowed scattering transforms on $G'$.

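To understand $\mathcal{W}'$, we note that the natural analog of $T$ on $G'$ is given by

$$T' = \Pi T \Pi^T.$$

Therefore, Lemma 2 implies that for any polynomial $p$,

$$\begin{aligned} p((M')^{-1} T' M') &= (M')^{-1} p(T') M' \\ &= (\Pi M \Pi^T)^{-1} p(\Pi T \Pi^T) (\Pi M \Pi^T) \\ &= (\Pi M^{-1} \Pi^T) \Pi p(T) \Pi^T (\Pi M \Pi^T) \\ &= \Pi M^{-1} p(T) M \Pi^T \\ &= \Pi p(M^{-1} T M) \Pi^T, \end{aligned}$$

with a similar formula holding for the square roots defined in (5). Therefore, if $\mathcal{W}$ is either of the wavelet transforms $\mathcal{W}^{(1)}_J$ or $\mathcal{W}^{(2)}_J$, then $\mathcal{W}'$ is the analogous wavelet transform constructed from $(M')^{-1} T' M'$.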

###### Theorem 4.

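Both $U$ and the windowed scattering transform $S$ are equivariant to permutations. That is, if $\Pi \in S_n$ is any permutation and $\mathcal{W}'$ is defined as in (12), then for all $x \in \mathbf{L}^2(G, M)$,

$$U' \Pi x = \Pi U x \quad \text{and} \quad S' \Pi x = \Pi S x.$$

###### Proof.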

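Let $\Pi \in S_n$ be a permutation. Since $\Psi'_j = \Pi \Psi_j \Pi^T$ and the pointwise modulus $\mathcal{M}$ commutes with $\Pi$, i.e., $\mathcal{M} \Pi = \Pi \mathcal{M}$, it follows that for all $j \in \mathcal{J}$,

$$U'[j] \Pi x = \mathcal{M} \Psi'_j \Pi x = \mathcal{M} \Pi \Psi_j \Pi^T \Pi x = \mathcal{M} \Pi \Psi_j x = \Pi \mathcal{M} \Psi_j x = \Pi U[j] x.$$

For $j = (j_1, \ldots, j_m)$ with $m \geq 2$, we have $U[j] = U[j_m] U[(j_1, \ldots, j_{m-1})]$. Therefore, it follows inductively that $U$ is equivariant to permutations. Since $S = \Phi U$, we have that

$$S' \Pi x = \Phi' U' \Pi x = \Pi \Phi \Pi^T \Pi U x = \Pi S x.$$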

Thus, the windowed scattering transform is permutation equivariant as well. ∎

###### Theorem 5.

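The non-windowed scattering transform is fully permutation invariant, i.e., for all permutations $\Pi \in S_n$ and all $x \in \mathbf{L}^2(G, M)$,

$$\overline{S}' \Pi x = \overline{S} x.$$

###### Proof.

Since $U$ is permutation equivariant by Theorem 4 and $\mu' = \Pi \mu$, we may use the fact that $M' = \Pi M \Pi^T$ and that $\Pi^T \Pi = I$ to see that for any $x$ and any $j$,

$$\overline{S}'[j] \Pi x = \langle \mu', U'[j] \Pi x \rangle_{M'} = \langle M' \Pi \mu, M' \Pi U[j] x \rangle_2 = \langle \Pi M \mu, \Pi M U[j] x \rangle_2 = \langle M \mu, M U[j] x \rangle_2 = \overline{S}[j] x.$$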
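Theorems 4 and 5 can be illustrated by scattering a signal on a graph and on a relabeled copy of it. The sketch below is a minimal NumPy check under stated assumptions: the example graph, the asymmetric matrix `M`, the weighting vector `mu`, the permutation `perm`, and the helper names `diffusion_frame` and `scattering` are all hypothetical, the frame is the polynomial frame $p_j(K)$ with $g(t) = 1 - t/2$, and signals are real-valued.

```python
import numpy as np

def diffusion_frame(A, M, J):
    """Matrices [Psi_0, ..., Psi_J, Phi] of the polynomial frame p_j(K), K = M^{-1} T M."""
    d = A.sum(axis=1)
    N = np.eye(len(d)) - A / np.sqrt(np.outer(d, d))
    omega, V = np.linalg.eigh(N)
    lam = np.clip(1.0 - omega / 2.0, 0.0, 1.0)      # assuming g(t) = 1 - t/2
    symbols = [1.0 - lam]
    symbols += [lam ** (2 ** (j - 1)) - lam ** (2 ** j) for j in range(1, J + 1)]
    symbols.append(lam ** (2 ** J))
    return [np.linalg.solve(M, V @ np.diag(s) @ V.T @ M) for s in symbols]

def scattering(x, A, M, mu, J, L):
    """Dicts {path: Phi U[path]x} and {path: <mu, U[path]x>_M} for paths of length <= L."""
    *Psis, Phi = diffusion_frame(A, M, J)
    layer, S, S_bar = {(): x}, {}, {}
    for m in range(L + 1):
        for path, u in layer.items():
            S[path] = Phi @ u
            S_bar[path] = float((M @ mu) @ (M @ u))
        if m < L:
            layer = {path + (j,): np.abs(Psi @ u)
                     for path, u in layer.items()
                     for j, Psi in enumerate(Psis)}
    return S, S_bar

# Hypothetical graph, asymmetric invertible M, and weighting vector mu.
A = np.array([[0., 1., 1., 2.],
              [1., 0., 3., 0.],
              [1., 3., 0., 1.],
              [2., 0., 1., 0.]])
n = A.shape[0]
M = np.triu(np.ones((n, n))) + np.eye(n)
mu = np.arange(1.0, n + 1.0)

perm = np.array([2, 0, 3, 1])                       # hypothetical relabeling of the vertices
Pi = np.eye(n)[perm]                                # permutation matrix: (Pi x)[i] = x[perm[i]]
A_p, M_p, mu_p = Pi @ A @ Pi.T, Pi @ M @ Pi.T, Pi @ mu   # A', M' and mu' as in (12)

rng = np.random.default_rng(5)
x = rng.standard_normal(n)
S, S_bar = scattering(x, A, M, mu, J=1, L=2)
S_p, S_bar_p = scattering(Pi @ x, A_p, M_p, mu_p, J=1, L=2)

equivariant = all(np.allclose(S_p[p], Pi @ S[p]) for p in S)            # S' Pi x = Pi S x
invariant = all(np.isclose(S_bar_p[p], S_bar[p]) for p in S_bar)        # S_bar' Pi x = S_bar x
```

The permuted frame is rebuilt from `A_p` and `M_p` rather than by conjugating the original matrices, so the check also exercises the identity $p((M')^{-1} T' M') = \Pi p(M^{-1} T M) \Pi^T$ derived above.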

Next, we will use Theorem 4 to show that if $\mathcal{W}$ is either $\mathcal{W}^{(1)}_J$ or $\mathcal{W}^{(2)}_J$ and $M = D^{1/2}$, then the windowed scattering transform is invariant on $\mathbf{L}^2(G, M)$ up to a factor depending on the scale of the low-pass filter. We note that $0 \leq \lambda_1 < 1$. Therefore, $\lambda_1^t$ decays exponentially fast as $t \to \infty$, and so if $J$ is large, the right-hand side of (13) will be nearly zero. We also recall that if our spectral function is given by $g(t) = g_\star(t)$, then this choice of $M$ will imply that $K = P^T$.

###### Theorem 6.

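Let $M = D^{1/2}$ and let $\mathcal{W}$ be either $\mathcal{W}^{(1)}_J$ or $\mathcal{W}^{(2)}_J$. Then the windowed scattering transform is permutation invariant up to a factor depending on $J$. Specifically, for all $\Pi \in S_n$ and for all $x \in \mathbf{L}^2(G, M)$,

$$\|S' \Pi x - Sx\|_{\ell^2(\mathbf{L}^2(G, M))} \leq \lambda_1^t \|\Pi - I\|_M \left(1 + n \frac{\|d\|_\infty}{d_{\min}}\right)^{1/2} \|x\|_M, \tag{13}$$

where $t = 2^{J-1}$ if $\mathcal{W} = \mathcal{W}^{(1)}_J$ and $t = 2^J$ if $\mathcal{W} = \mathcal{W}^{(2)}_J$.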

###### Proof.

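By Theorem 4 and the fact that $S = \Phi U$, we see that

$$\begin{aligned} \|S' \Pi x - Sx\|_{\ell^2(\mathbf{L}^2(G, M))} &= \|\Pi Sx - Sx\|_{\ell^2(\mathbf{L}^2(G, M))} \\ &= \|\Pi \Phi Ux - \Phi Ux\|_{\ell^2(\mathbf{L}^2(G, M))} \\ &\leq \|\Pi \Phi - \Phi\|_M \|Ux\|_{\ell^2(\mathbf{L}^2(G, M))}. \end{aligned} \tag{14}$$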

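Let $t = 2^{J-1}$ if $\mathcal{W} = \mathcal{W}^{(1)}_J$ and let $t = 2^J$ if $\mathcal{W} = \mathcal{W}^{(2)}_J$, so that in either case $\Phi = K^t$. Let $x \in \mathbf{L}^2(G, M)$. (3) implies that for any $y$,

$$T^t y = \sum_{i=0}^{n-1} \lambda_i^t \langle v_i, y \rangle_2 v_i.$$

Therefore, by Lemma 2 and the relationship $u_i = M^{-1} v_i$, we have

$$K^t x = M^{-1} T^t (Mx) = \sum_{i=0}^{n-1} \lambda_i^t \langle v_i, Mx \rangle_2 M^{-1} v_i = \sum_{i=0}^{n-1} \lambda_i^t \langle v_i, Mx \rangle_2 u_i.$$

Since $\lambda_0 = 1$ and $v_0 = d^{1/2} / \|d^{1/2}\|_2$, the assumption that $M = D^{1/2}$ implies that $u_0 = M^{-1} v_0 = \frac{1}{\|d^{1/2}\|_2} \mathbf{1}$, where $\mathbf{1}$ is the vector of all ones. Therefore, $\Pi u_0 = u_0$, and so

$$(\Pi \Phi - \Phi) x = (\Pi - I) K^t x = (\Pi - I) \sum_{i=1}^{n-1} \lambda_i^t \langle v_i, Mx \rangle_2 u_i.$$

Therefore, since $\{u_0, \ldots, u_{n-1}\}$ forms an orthonormal basis for $\mathbf{L}^2(G, M)$, we have that by Parseval’s identity

$$\begin{aligned} \|(\Pi \Phi - \Phi) x\|_M^2 &\leq \|\Pi - I\|_M^2 \left\| \sum_{i=1}^{n-1} \lambda_i^t \langle v_i, Mx \rangle_2 u_i \right\|_M^2 = \|\Pi - I\|_M^2 \sum_{i=1}^{n-1} \lambda_i^{2t} |\langle v_i, Mx \rangle_2|^2 \\ &\leq \|\Pi - I\|_M^2 \lambda_1^{2t} \sum_{i=1}^{n-1} |\langle v_i, Mx \rangle_2|^2 \leq \|\Pi - I\|_M^2 \lambda_1^{2t} \|Mx\|_2^2 = \|\Pi - I\|_M^2 \lambda_1^{2t} \|x\|_M^2. \end{aligned} \tag{15}$$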

To bound $\|Ux\|_{\ell^2(\mathbf{L}^2(G, M))}$, we note that by Theorem 2,

 ∥Ux∥2ℓ2(L2(G,M)) =∥x∥2M+⎛⎝