Incidence Networks for Geometric Deep Learning

Abstract

Sparse incidence tensors can represent a variety of structured data. For example, we may represent attributed graphs using their node-node, node-edge, or edge-edge incidence matrices. In higher dimensions, incidence tensors can represent simplicial complexes and polytopes. In this paper, we formalize incidence tensors, analyze their structure, and present the family of equivariant networks that operate on them. We show that any incidence tensor decomposes into invariant subsets. This decomposition, in turn, leads to a decomposition of the corresponding equivariant linear maps, for which we prove an efficient pooling-and-broadcasting implementation. We demonstrate the effectiveness of this family of networks by reporting state-of-the-art results on graph learning tasks for many targets in the QM9 dataset.


1 Introduction

Many interesting data structures have alternative tensor representations. For example, we can represent graphs using both node-node and node-edge sparse incidence matrices. We can extend this incidence representation to data defined on simplicial complexes and polytopes of arbitrary dimension, such as meshes, polygons, and polyhedra. The goal of this paper is to design deep models for these structures.

We represent an attributed geometric structure using its incidence tensor, which models the incidence of its faces. For example, rows and columns in a node-edge incidence matrix are indexed by faces of size one (nodes) and two (edges). Moreover, each edge (column) is incident to exactly two nodes (rows). The sparsity pattern of the incidence tensor carries important information about the geometric structure, because sparsity-preserving permutations of nodes often match the automorphism group (a.k.a. symmetry group) of the geometric object; see Fig. 1(a,b).

We are interested in designing models that are informed by the symmetry of the underlying structure. We do so by making the model equivariant (invariant) to symmetry transformations. When using the incidence tensor representation, a natural choice of symmetry group is the automorphism group of the geometric object. However, when working with a dataset comprising different instances (e.g., different graphs or polyhedra), using individual automorphism groups is not practical. This is because each symmetry group dictates a different equivariant model, and we cannot train a single model on the whole dataset. A solution is to use the symmetric group (the group of all permutations of nodes) for all instances, which implicitly assumes a dense structure where all faces are present, e.g., all graphs are fully connected; see Fig. 1(c,d).

Next, we show that under the action of the symmetry group, any incidence tensor decomposes into orbits, where each orbit corresponds to faces of a particular size. For example, a node-node incidence matrix decomposes into: 1) the diagonal, which can encode node attributes – we call this the node vector – and 2) the off-diagonal, corresponding to edge attributes, which we call the edge vector. These are examples of face-vectors in the general setting.

This decomposition of data into face-vectors also breaks up the design of equivariant linear maps for arbitrary incidence tensors into the design of such maps between face-vectors of different sizes. We show that any such linear map can be written as a linear combination of efficient pooling-and-broadcasting operations. These equivariant linear maps replace the linear layer in a feedforward neural network to create an incidence network. We provide an extensive experimental evaluation of different incidence networks on one of the largest graph datasets (QM9). The results support our theoretical findings and establish a new state-of-the-art for several targets.

Figure 1: a) The sparsity pattern in the node-face incidence matrix for an (undirected) triangular bi-pyramid (concatenation of two tetrahedra). Note that each face (column) is incident to exactly three nodes. b) Nodes are permuted using a member of the symmetry group of the object. This permutation of nodes imposes a natural permutation action on the faces. Note that permutations from the automorphism group preserve the sparsity pattern of the incidence matrix. c) The geometric object of (a) after densification: the incidence matrix now includes all possible faces of size three; however, it still maintains a specific sparsity pattern. d) After densifying the structure, any permutation of nodes (and the corresponding permutation action on faces of the dense incidence matrix) preserves its sparsity pattern.

2 Related Works

Deep learning with structured data is a very active area of research. Here, we briefly review some of the closely related works in graph learning and equivariant deep learning.

Graph Learning. The idea of graph neural networks goes back to the work of Scarselli et al. (2009). More recently, Gilmer et al. (2017) introduced the message passing neural network and showed that it subsumes several other graph neural network architectures Li et al. (2015); Duvenaud et al. (2015); Kearnes et al. (2016); Schütt et al. (2017), including the spectral methods that follow. Another body of work in geometric deep learning extends convolution to graphs using the spectrum of the graph Laplacian Bronstein et al. (2017); Bruna et al. (2014). While principled, in its complete form, the Fourier bases extracted from the Laplacian are instance dependent, and the lack of any parameter or function sharing across graphs limits their generalization. Following Henaff et al. (2015); Defferrard et al. (2016), Kipf and Welling (2016) propose a single-parameter simplification of the spectral method that addresses this limitation and is widely used in practice. Some notable extensions and related ideas include Veličković et al. (2017); Hamilton et al. (2017); Xu et al. (2018); Zhang et al. (2018); Ying et al. (2018); Morris et al. (2018); Maron et al. (2019a).

Equivariant Deep Learning. Equivariance constrains the predictions of a model $f$ under a group $G$ of transformations of the input, such that

$f(T_g x) = T'_g f(x) \qquad \forall g \in G.$    (1)

Here $T_g$ is a “consistently” defined transformation of the input $x$ parameterized by $g \in G$, while $T'_g$ denotes the corresponding transformation of the output. For example, in a convolution layer LeCun et al. (1998), $G$ is the group of discrete translations, and Eq. 1 means that any translation of the input leads to the same translation of the output. When $f$ is a standard feed-forward layer with parameter matrix $W$, the equivariance property of Eq. 1 enforces parameter-sharing in $W$ Shawe-Taylor (1993); Ravanbakhsh et al. (2017).
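To make Eq. 1 concrete, the following minimal NumPy sketch (our own illustration, not code from the paper) checks the equivariance of a simple two-parameter permutation-equivariant layer of the kind that reappears in Section 3.1: permuting the input permutes the output in the same way.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5
w1, w2 = 0.7, -0.3            # the two shared parameters
x = rng.normal(size=n)        # a node-vector (a set of scalar node features)

def f(x):
    # Permutation-equivariant layer: identity term + pooled (mean) term broadcast back.
    return w1 * x + w2 * x.mean() * np.ones_like(x)

perm = rng.permutation(n)     # a group element g in S_n, acting by permuting entries
lhs = f(x[perm])              # f(T_g x)
rhs = f(x)[perm]              # T'_g f(x)
assert np.allclose(lhs, rhs)  # Eq. 1 holds for this layer
print("equivariant:", np.allclose(lhs, rhs))
```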

Most relevant to our work are equivariant models proposed for geometric deep learning, which we review next. Covariant compositional networks Kondor et al. (2018) extend the message passing framework by considering basic tensor operations that preserve equivariance. While the resulting architecture can be quite general, it comes at the cost of efficiency.¹ Hartford et al. (2018) propose a linear map equivariant to independent permutations of different dimensions of a tensor. Equivariant graph networks of Maron et al. (2018) model the interactions within a set of nodes. We will further discuss this model as a special type of incidence network. These equivariant layers for interactions between and within sets are further generalized to multiple types of interactions in Graham and Ravanbakhsh (2019). Several recent works investigate the universality of such equivariant networks Maron et al. (2019b); Keriven and Peyré (2019); Chen et al. (2019). A flexible approach to equivariant and geometric deep learning where a global symmetry is lacking is proposed in Cohen et al. (2019).

3 Graphs

In this section we discuss graphs, and later generalize the arguments to a broader set of geometric objects in Section 5. Without loss of generality, in the following we assume a fully-connected graph $G = (N, E)$, where $N$ denotes a set of $n$ nodes and $E$ a set of edges.

3.1 Linear Layers

There are two standard ways one can represent $G$. We can use a node-node incidence matrix $A$ indexed by pairs of nodes $(i, j)$. Node and edge features are encoded as diagonal and off-diagonal entries of $A$, respectively. Here, we assume a single input and output channel (i.e., scalar node and edge attributes) for simplicity; results trivially generalize to multiple channels.

Figure 2: Parameter-sharing in the receptive field of the equivariant map: (left) the 15-parameter layer for node-node incidence, (middle) the 7-parameter layer for node-edge incidence, and (right) the 4-parameter layer for node-edge incidence. Block structure. The adjacency structure of the undirected graph with 5 nodes and 7 edges is evident from the sparsity patterns. Here, each inner block shows the parameter-sharing in the receptive field of the corresponding output unit – that is, the dependency of the output incidence matrix at that location on the entire input incidence matrix. Decomposition. The total number of unique parameters in (left) is 15, compared to 7 for (middle). As shown in Section 3.1, the 15-parameter model decomposes into 4 linear maps, one of which is isomorphic to the 7-parameter model. One could also identify the 7 unique symbols of (middle) in the parameter-sharing of (left). Note that these symbols appear on the off-diagonal blocks and off-diagonal elements within blocks, corresponding to input and output edges.

Consider the group $S_n$ of all permutations of nodes and its action on $A$. This action simultaneously permutes the rows and columns of $A$. Let $L$ be a linear map equivariant to $S_n$,

$L(g \cdot A) = g \cdot L(A) \qquad \forall g \in S_n.$    (2)

The map $L$ is constrained so that permuting the rows and columns of the input will have the same effect on the output. As shown in Maron et al. (2018), this condition constrains the number of independent parameters in $L$ to fifteen, regardless of the size of the graph.

Alternatively, one can represent $G$ with a node-edge incidence matrix $B$, where $i$ labels nodes and the unordered pair $\{j, k\}$ with $j \neq k$ labels edges. $B$ has a special sparsity pattern: $B_{i, \{j, k\}} \neq 0$ iff node $i$ is incident to the edge $\{j, k\}$. We identify this sparsity pattern implicitly by writing $B_{i, \{i, j\}}$, so that we only index non-zero entries. Edge features are encoded at $B_{i, \{i, j\}}$ and $B_{j, \{i, j\}}$ for the two different directions of the edge $\{i, j\}$.

The action of $S_n$ on $B$ is also a simultaneous permutation of rows and columns, where the permutation of columns is defined by the action on the node pair that identifies each edge, $\{i, j\} \mapsto \{g(i), g(j)\}$ for $g \in S_n$. This action preserves the sparsity pattern of $B$ defined above. The maximal equivariant linear map acting on $B$ is constrained to have seven independent parameters (assuming a single input and output channel). Finally, one may also consider an edge-edge incidence matrix, producing yet another type of equivariant linear layer.

Since both $A$ and $B$ represent the same graph $G$, and the corresponding linear maps are equivariant to the same group $S_n$, one expects a relationship between the two representations and maps. This relationship is due to the decomposition of $A$ into orbits under the action of $S_n$. In particular, $A$ decomposes into two orbits: diagonal elements ($A_{i,i}$) and off-diagonal elements ($A_{i,j}$ with $i \neq j$), where each subset is invariant under the action of $S_n$ – that is, simultaneous permutation of rows and columns does not move a diagonal element to the off-diagonal or vice-versa. We write this decomposition as

$A \cong v_N \oplus v_E,$    (3)

where the diagonal orbit is isomorphic to the vector of nodes $v_N \in \mathbb{R}^n$ and the off-diagonal orbit is isomorphic to the vector of edges $v_E$, indexed by ordered pairs $(i, j)$ with $i \neq j$. Consider the map $L$ defined above, in which both input and target decompose in this way. It follows that the map itself also decomposes into four maps

$L \cong L_{1 \to 1} \oplus L_{1 \to 2} \oplus L_{2 \to 1} \oplus L_{2 \to 2},$    (4)

where $L_{k \to k'}$ maps a face-vector of faces of size $k$ to a face-vector of faces of size $k'$. Equivariance to $S_n$ constrains the number of independent parameters for each of these maps: $L_{1 \to 1}$ is the equivariant layer used in DeepSets Zaheer et al. (2017), and has two parameters. $L_{1 \to 2}$ and $L_{2 \to 1}$ each have three parameters, and $L_{2 \to 2}$, which maps input edge features to target edge features, has seven unique parameters. One key point is that the edge-vector $v_E$ is isomorphic to the node-edge incidence matrix $B$, and thus the seven-parameter equivariant map for $B$ is exactly $L_{2 \to 2}$ of Eq. 4.

One can also encode node features in a node-edge incidence matrix by doubling the number of channels and broadcasting node features across all edges incident to a node. In this case all fifteen operations are retrieved, and the two layers for $A$ and $B$ are equivalent. These two linear maps are visualized in Fig. 2 (left, middle).
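As an illustration of the decomposition in Eqs. 3 and 4, the following sketch (our own; function and variable names are hypothetical) splits a dense node-node incidence into its diagonal and off-diagonal orbits and applies the two smallest blocks, the 2-parameter map $L_{1\to1}$ and the 3-parameter map $L_{1\to2}$, using only pooling and broadcasting.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 4
A = rng.normal(size=(n, n))                     # dense node-node incidence

# Orbit decomposition of Eq. 3: diagonal -> node-vector, off-diagonal -> edge-vector.
v_node = np.diag(A).copy()                      # faces of size 1
v_edge = A - np.diag(np.diag(A))                # faces of size 2 (off-diagonal entries)

def L_1to1(v, w):
    # 2-parameter DeepSets-style map: identity term + pooled term.
    return w[0] * v + w[1] * v.mean()

def L_1to2(v, w):
    # 3-parameter map from node-vector to edge-vector:
    # broadcast across rows, broadcast across columns, and broadcast the pooled scalar.
    out = w[0] * v[:, None] + w[1] * v[None, :] + w[2] * v.mean()
    np.fill_diagonal(out, 0.0)                  # output lives on the off-diagonal orbit
    return out

new_nodes = L_1to1(v_node, w=[0.5, -0.1])
new_edges = L_1to2(v_node, w=[0.3, 0.2, 0.1])
print(new_nodes.shape, new_edges.shape)         # (n,), (n, n) with zero diagonal
```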

In Section 5 we will generalize representation, decomposition, and pooling-and-broadcasting implementation of equivariant layers to higher-order geometric objects.

3.2 Sparse Tensors and Non-Linear Layers

So far we have discussed equivariant linear layers for a fully connected graph. This means a dense input/output node-node incidence $A$, or equivalently a node-edge incidence $B$ with the sparsity pattern described in the previous section (which is maintained by the action of $S_n$). To avoid the cost of a dense representation, one may apply a sparsity mask after the linear map, while preserving equivariance:

$X \mapsto M \odot L(X),$    (5)

where $L$ is the equivariant linear map of Section 3.1, $M$ is the sparsity mask, and $\odot$ is the Hadamard (element-wise) product. For example, assuming the layer output has the same shape as the input, one might choose to preserve the sparsity of the input. In this case, $M$ will have zero entries where the input has zero entries, and ones otherwise. However, the setting of Eq. 5 is more general, as input and output may have different forms. Since the sparsity mask depends on the input, the map of Eq. 5 is now non-linear. In practice, rather than calculating the dense output and applying the sparsity mask, we directly produce the non-zero values.
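A small sketch of Eq. 5 under the notation above (our own code, using only a subset of the fifteen operations for brevity): apply a dense equivariant map, then mask the output with the input's sparsity pattern.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 5
A = rng.normal(size=(n, n)) * (rng.random((n, n)) < 0.4)    # sparse node-node incidence

def dense_equivariant(A, w):
    # A few of the fifteen pooling/broadcasting terms, for illustration only.
    return (w[0] * A
            + w[1] * A.mean(axis=0, keepdims=True)           # pool rows, broadcast back
            + w[2] * A.mean(axis=1, keepdims=True)           # pool columns, broadcast back
            + w[3] * A.mean())                                # pool everything

M = (A != 0).astype(A.dtype)           # sparsity mask taken from the input
out = M * dense_equivariant(A, w=[1.0, 0.5, -0.2, 0.1])      # Eq. 5: M ⊙ L(A)
print(np.allclose(out[A == 0], 0.0))   # output respects the input sparsity
```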

3.3 Further Relaxation of the Symmetry Group

The neural layers discussed so far are equivariant to the group $S_n$, where $n$ is the number of nodes. A simplifying alternative is to assume an independent permutation of rows and columns of the node-node or node-edge matrix. This is particularly relevant for the node-edge matrix, where one can consider nodes and edges as two interacting sets of distinct objects. The corresponding $S_n \times S_m$-equivariant layer, where $m$ is the number of edges, was introduced in Hartford et al. (2018); it has 4 unique parameters and is substantially easier to implement than the layers introduced so far. In Appendix A we show how to construct a sparsity-preserving (and therefore non-linear) layer for this case. Even though a single layer is over-constrained by these symmetry assumptions, in the appendix we prove that two such layers generate exactly the same node and edge features as a single linear layer for a node-node incidence. These results are corroborated by good performance in our experiments.
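The relaxed layer of this section combines only four pool/broadcast terms. Below is a minimal sketch in the spirit of Hartford et al. (2018) (our own implementation; bias and channels are omitted), together with a numerical check of equivariance to independent row and column permutations.

```python
import numpy as np

def exchangeable_layer(X, w):
    """4-parameter layer equivariant to independent permutations of rows and columns.

    X: (rows, cols) matrix, e.g. a node-edge incidence matrix.
    w: four scalar parameters, one per pooling pattern.
    """
    return (w[0] * X                                    # identity
            + w[1] * X.mean(axis=0, keepdims=True)      # pool over rows, broadcast
            + w[2] * X.mean(axis=1, keepdims=True)      # pool over columns, broadcast
            + w[3] * X.mean())                          # pool over everything, broadcast

# Quick equivariance check under independent row/column permutations.
rng = np.random.default_rng(3)
X = rng.normal(size=(5, 7))
w = [0.9, 0.1, -0.3, 0.2]
pr, pc = rng.permutation(5), rng.permutation(7)
lhs = exchangeable_layer(X[pr][:, pc], w)
rhs = exchangeable_layer(X, w)[pr][:, pc]
print(np.allclose(lhs, rhs))   # True
```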

4 Experiments

Target | enn-s2s | nmp-edge | schnet | Cormorant | WaveScatt | incidence networks (seven models; see Table 2)
– | 0.092 | 0.077 | 0.235 | 0.092 | 0.160 | 0.028  0.039  0.030  0.036  0.037  0.033  0.033
– | 0.040 | 0.032 | 0.033 | 0.031 | 0.049 | 0.019  0.025  0.028  0.030  0.023  0.028  0.029
G | 0.019 | 0.012 | 0.014 | – | – | 0.001  0.001  0.008  0.008  0.003  0.011  0.010
H | 0.017 | 0.011 | 0.014 | – | – | 0.001  0.001  0.008  0.008  0.002  0.010  0.010
– | 0.043 | 0.036 | 0.041 | 0.036 | 0.085 | 0.098  0.191  0.089  0.116  0.097  0.101  0.090
– | 0.037 | 0.030 | 0.034 | 0.036 | 0.076 | 0.049  0.062  0.049  0.052  0.054  0.054  0.052
gap | 0.069 | 0.058 | 0.063 | 0.073 | 0.118 | 0.073  0.062  0.068  0.080  0.087  0.078  0.071
– | 0.030 | 0.029 | 0.033 | 0.130 | 0.340 | 0.040  0.082  0.040  0.067  0.038  0.055  0.060
– | 0.180 | 0.072 | 0.073 | 0.673 | 0.410 | 0.010  0.012  0.017  0.017  0.009  0.021  0.017
U | 0.019 | 0.010 | 0.019 | – | – | 0.001  0.002  0.007  0.009  0.002  0.010  0.009
– | 0.019 | 0.010 | 0.014 | 0.028 | 0.022 | 0.001  0.001  0.008  0.008  0.003  0.010  0.010
ZPVE | 0.0015 | 0.0014 | 0.0017 | 0.0019 | 0.002 | 0.006  0.008  0.008  0.011  0.007  0.010  0.009
Table 1: Mean absolute errors on the QM9 targets. enn-s2s is the neural message passing of Gilmer et al. (2017), and nmp-edge Jørgensen et al. (2018) is its improved variation with edge updates. schnet uses a continuous-filter convolution operation Schütt et al. (2018). Cormorant uses a rotation-equivariant architecture Anderson et al. (2019). The results of WaveScatt Hirn et al. (2017) were taken from Anderson et al. (2019). Results where an incidence network achieves state-of-the-art are shown in bold. See Table 2 for a summary of the different types of incidence networks. Target units are reported in Table 4 (Appendix F).

Many deep models for graphs have been applied to the task of predicting molecular properties Gilmer et al. (2017); Schütt et al. (2018, 2017); Jørgensen et al. (2018); Morris et al. (2018); Unke and Meuwly (2019); Kondor et al. (2018); Anderson et al. (2019); interestingly, most, if not all, of these methods can be considered message passing methods.² A drawback of a fully-fledged message passing scheme compared to incidence networks is its scalability. However, this is not an issue for the QM9 dataset Ramakrishnan et al. (2014), which contains 133,885 small organic molecules.

Our architecture for all models is a simple stack of equivariant layers, where the final layer has a single channel followed by pooling, which produces a scalar value for the target. For details on the dataset, architecture and training procedure, see Appendix F.
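The following is a schematic sketch of such a stack under our own simplifying assumptions (a ReLU non-linearity between layers and only four of the fifteen operations per layer, purely to keep the example short); it is not the exact architecture or training code of Appendix F.

```python
import numpy as np

def equivariant_layer(A, W):
    # A: (n, n, c_in) node-node incidence with channels; W: (4, c_in, c_out).
    # Only a subset of the fifteen operations is used here to keep the sketch short.
    pooled_r = A.mean(axis=0, keepdims=True)
    pooled_c = A.mean(axis=1, keepdims=True)
    pooled_a = A.mean(axis=(0, 1), keepdims=True)
    terms = [A,
             np.broadcast_to(pooled_r, A.shape),
             np.broadcast_to(pooled_c, A.shape),
             np.broadcast_to(pooled_a, A.shape)]
    return sum(np.tensordot(t, w, axes=([2], [0])) for t, w in zip(terms, W))

def incidence_network(A, weights):
    h = A
    for W in weights[:-1]:
        h = np.maximum(equivariant_layer(h, W), 0.0)   # equivariant layer + ReLU
    h = equivariant_layer(h, weights[-1])              # final layer: single channel
    return h.mean()                                    # invariant pooling -> scalar target

rng = np.random.default_rng(4)
n, channels = 6, [3, 8, 8, 1]
A = rng.normal(size=(n, n, channels[0]))
weights = [rng.normal(size=(4, channels[i], channels[i + 1])) * 0.1
           for i in range(len(channels) - 1)]
print(incidence_network(A, weights))
```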

Table 1 reports the previous state-of-the-art, as well as our results using various members of the incidence network family. The abbreviations used for the results indicate: sparse vs. dense layers; directed vs. undirected edges; and node-edge vs. node-node incidence. For example, one model uses sparse non-linear layers that operate on a directed node-node incidence and produce directed asymmetric outputs. Finally, one abbreviation identifies the layer that uses the larger symmetry group of Section 3.3. See Table 2 for more details on each incidence network model.

Layer Name | Edge Type | Graph Type | Num. Params. | Layer Type | Sym. Group

Node-Node

– | Directed | Dense | 15 | Linear | $S_n$
– | Undirected | Dense | 9 | Linear | $S_n$
– | Directed | Sparse | 15 | Non-Linear | $S_n$
– | Undirected | Sparse | 9 | Non-Linear | $S_n$

Node-Edge

– | Undirected | Dense | 7 | Linear | $S_n$
– | Undirected | Sparse | 7 | Non-Linear | $S_n$
– | Undirected | Sparse | 4 | Non-Linear | $S_n \times S_m$
Table 2: Details of the layers reported in Table 1.

All models match or outperform the state-of-the-art on 7/12 targets (bold values in Table 1). They also show similar performance despite using different representations, supporting our theoretical analysis regarding the comparable expressiveness of the node-node and node-edge representations. Dense models generally perform slightly better, at the cost of training run-time. Finally, we note that the 4-parameter model of Section 3.3 performs almost as well, despite using an over-constraining symmetry group, further supporting the theoretical results outlined in Section 3.3 and explained in Appendix A.

5 Higher Order Geometric Structures

In Section 5.1 we define incidence tensors, which generalize the node-node and node-edge matrices of graphs. We discuss several examples, showing how they can represent geometric objects such as graphs, polytopes, or simplicial complexes. In Section 5.2 we generalize the orbit decomposition introduced in Section 3.1 to generic incidence tensors. Finally, in Section 5.3 we show how to build equivariant layers using a linear combination of simple pooling-and-broadcasting operations for arbitrary incidence tensors.

5.1 Incidence Tensors

Recall that $N$ denotes a set of $n$ nodes. A directed face of size $k$ is an ordered tuple of $k$ distinct nodes $f = (i_1, \dots, i_k)$. Following a similar logic, an undirected face $f$ is a subset of $N$ of size $k$. We use $|f|$ when identifying the size of the face – i.e., $|f| = k$. For example, $|f| = 2$ identifies an edge in a graph or a mesh, while $|f| = 3$ is a triangle in a triangulated mesh.

An incidence tensor $X$ is a tensor of order $d$, where each dimension is indexed by all faces of a given size. For example, if the first dimension indexes nodes $i$ and the second dimension indexes edges $\{j, k\}$, $X$ becomes a node-edge incidence matrix. An incidence tensor has a sparsity structure, identified by a set of constraints $C$, where each constraint requires a group of node indices to be equal for any non-zero entry of $X$. For example, for the node-edge incidence we have $X_{i, \{j, k\}} \neq 0$ only if $i = j$ or $i = k$. We may thus index its non-zero entries as $X_{i, \{i, k\}}$.

Therefore, while in general the pair $(X, C)$ defines the incidence tensor, whenever it is clear from context, we will only use $X$ to denote it. This formalism can represent a variety of different geometric structures, as demonstrated in the following sections.
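As a small illustration of this formalism (our own code, with hypothetical names), a node-edge incidence tensor can be stored by keeping only the entries allowed by its sparsity constraint, i.e., the (node, edge) pairs with the node incident to the edge:

```python
import numpy as np

edges = [(0, 1), (0, 2), (1, 2), (2, 3)]       # undirected edges of a small example graph
n = 4                                          # number of nodes

# Non-zero entries of the node-edge incidence tensor: one per (incident node, edge) pair,
# i.e. only the index pairs allowed by the constraint in C.
index = [(i, e) for e, (j, k) in enumerate(edges) for i in (j, k)]
values = np.ones(len(index))                   # attributes; here just indicator values

# Materialize the dense (n x |E|) matrix to visualize the sparsity pattern.
B = np.zeros((n, len(edges)))
for (i, e), v in zip(index, values):
    B[i, e] = v
print(B)                                       # each column has exactly two non-zero rows
```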

Simplicial Complexes

Before discussing general simplicial complexes let us review graphs as an example of incidence tensors.

The node-node incidence matrix is an incidence tensor indexed by a pair of nodes, with no sparsity constraints. We denoted it with $A$ in Section 3.1 for simplicity. The node-edge incidence matrix is indexed by nodes $i$ and edges $\{j, k\}$. Its entries can be non-zero only when $i \in \{j, k\}$, meaning that the edge is incident to the node $i$. An alternative notation that indexes only the non-zero entries is $X_{i, \{i, k\}}$. Again, we denoted it simply with $B$ in Section 3.1. We have also denoted the node and edge vectors with $v_N$ and $v_E$, respectively. As a final example, an edge-edge incidence matrix has entries that can be non-zero wherever two edges are incident.

Let us now move to the definition of a general (undirected) simplicial complex. An abstract simplicial complex $\mathcal{S}$ is a collection of faces, closed under the operation of taking subsets – that is, if $f \in \mathcal{S}$ and $f' \subseteq f$, then $f' \in \mathcal{S}$. The dimension of a face is its size minus one. Maximal faces are called facets, and the dimension of $\mathcal{S}$ is the dimension of its largest facet. For example, an undirected graph is a one-dimensional simplicial complex. Each dimension of an incidence tensor may be indexed by faces of a specific dimension. Two undirected faces of different dimension are incident if one is a subset of the other. This type of relationship, as well as alternative definitions of incidence between faces of the same dimension, can be easily accommodated in the form of equality constraints in $C$.
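The closure property is easy to state in code. The sketch below (our own) generates all faces of a simplicial complex from its facets and groups them by size; the face counts it prints for the triangular bi-pyramid of Fig. 1 match Example 1 below.

```python
from itertools import combinations

def closure(facets):
    """All non-empty faces of the simplicial complex generated by the given facets."""
    faces = set()
    for facet in facets:
        for k in range(1, len(facet) + 1):
            faces.update(frozenset(c) for c in combinations(facet, k))
    return faces

# Triangular bi-pyramid of Fig. 1: two tetrahedra glued along the triangle {0, 1, 2}.
facets = [(0, 1, 2, 3), (0, 1, 2, 4)]
faces = closure(facets)
by_size = {k: sorted(map(sorted, (f for f in faces if len(f) == k)))
           for k in range(1, 5)}
print({k: len(v) for k, v in by_size.items()})   # {1: 5, 2: 9, 3: 7, 4: 2}
```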

Although not widely used, a directed simplicial complex can be defined similarly. The main difference is that faces are sequences of the nodes, and is closed under the operation of taking a subsequence. As one might expect, the incidence tensor for directed simplicial complexes can be built using directed faces in our notation.

Example 1.

A zero-dimensional simplicial complex is a set of points that we may represent using an incidence vector. At dimension one, we get undirected graphs, where faces of dimension one are the edges. A triangulated mesh is an example of a two-dimensional simplicial complex; see the figure below.

The triangular bi-pyramid of Fig. 1 is an example of a three-dimensional simplicial complex with 5 nodes, 9 edges, 7 faces of size 3, and two faces of size 4. The node-face incidence matrix in Fig. 1(a) is expressed in our formalism as the incidence tensor indexed by nodes and faces of size three, with the constraint that the node belongs to the face.

Polygons, Polyhedra, and Polytopes

Another family of geometric objects with incidence structure is the polytope. A formal definition of abstract polytopes and their representation using incidence tensors is given in Appendix D. A polytope is a generalization of polygons and polyhedra to higher dimensions. The structure of an (abstract) polytope is encoded using a partially ordered set (poset) that is graded, meaning that each element of the poset has a rank. For example, Fig. 3 shows the poset for a cube, where each level is a different rank, and subsets in each level identify faces of different size (nodes, edges, and squares). The idea of using an incidence tensor representation for a polytope is similar to its use for simplicial complexes. Each dimension of the incidence tensor indexes faces of a different rank. Two faces of the same dimension may be considered incident if they have a face of a specific lower rank in common. We may also define two faces of different dimension as incident if one face is a subset of the other – i.e., they are comparable in the partial order.

Figure 3: Representation of a cube as a (graded) partially ordered set. The incidence structure of the poset as well as face attributes is encoded in an incidence tensor.

5.2 Symmetry & Decomposition

The automorphism group $\mathrm{Aut}(X)$ associated with an incidence tensor $X$ is the set of all permutations of nodes that map every face to another face, and therefore preserve the sparsity,

$\mathrm{Aut}(X) = \{ g \in S_n \mid g \cdot f \text{ is a face of } X \text{ for every face } f \},$

where the action of $g$ on the faces is naturally defined as

$g \cdot (i_1, \dots, i_k) = (g(i_1), \dots, g(i_k)).$    (6)

See Fig. 1(a,b) for an example. We may then construct $\mathrm{Aut}(X)$-equivariant linear layers through parameter-sharing. However, the constraints on this linear operator vary if our dataset has incidence tensors with different sparsity patterns. For example, a directed graph dataset may contain a fully connected graph, whose automorphism group is the full symmetric group $S_n$, and a cyclic graph, whose automorphism group is the much smaller cyclic group. For these two graphs, node-node and node-edge incidence matrices are invariant to the corresponding automorphism groups, necessitating different constraints on their linear layers. To remedy this problem with model-sharing across instances, we densify all incidence tensors so that all directed or undirected faces of a given dimension are present. Now one may use the same automorphism group $S_n$ across all instances; see Fig. 1(c,d). Next, we consider the incidence tensor as an $S_n$-set, and identify the orbits of this action.

Theorem 5.1.

The action of $S_n$ on any incidence tensor $X$ decomposes into orbits that are each isomorphic to a face-vector:

$X \cong \bigoplus_{k} \bigoplus_{l=1}^{m_k} v_k^{(l)},$    (7)

where $m_k$ is the multiplicity of faces of size $k$. The value of $m_k$ is equal to the number of ways of partitioning the set of all node indices of $X$ into $k$ non-empty parts, such that indices of the same face belong to different parts, and indices related by a constraint in $C$ belong to the same part.

The proof appears in Appendix B.

Example 2 (Node-adjacency tensors).

Consider an order-$d$ node-node-…-node incidence tensor $X_{i_1, \dots, i_d}$ with no sparsity constraints. In this case, the multiplicity $m_k$ of Eq. 7 corresponds to the number of ways of partitioning a set of $d$ elements into $k$ non-empty subsets, also known as the Stirling number of the second kind (written $S(d, k)$). Each partition into $k$ parts identifies a face-vector for a face of size $k$. These faces can be identified with hyper-diagonals of order $k$ in the original adjacency tensor. For example, as shown in the figure below, the order-3 adjacency tensor decomposes into a node-vector (the main diagonal of the adjacency cube), three edge-vectors (isomorphic to the three diagonal planes of the adjacency cube, with the main diagonal removed), and one hyper-edge-vector (isomorphic to the adjacency cube, where the main diagonal and diagonal planes have been removed). Here, $m_1 = S(3, 1) = 1$, $m_2 = S(3, 2) = 3$, and $m_3 = S(3, 3) = 1$.
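This decomposition can be verified by a simple count: the $n^3$ entries of the order-3 adjacency tensor split into one node-vector, three edge-vectors, and one vector of faces of size three. A quick numerical check (our own sketch):

```python
from itertools import product

n, d = 6, 3
# Group the multi-indices of an order-3 node adjacency tensor by their equality pattern,
# i.e. by the size of the face each entry belongs to.
orbit_sizes = {}
for idx in product(range(n), repeat=d):
    k = len(set(idx))                     # size of the face this entry belongs to
    orbit_sizes[k] = orbit_sizes.get(k, 0) + 1

print(orbit_sizes[1] == n)                              # 1 node-vector (main diagonal)
print(orbit_sizes[2] == 3 * n * (n - 1))                # 3 edge-vectors (diagonal planes)
print(orbit_sizes[3] == n * (n - 1) * (n - 2))          # 1 vector of faces of size 3
print(sum(orbit_sizes.values()) == n ** d)              # everything accounted for
```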

5.3 Equivariant Maps for Incidence Tensors

As shown in the previous section, any incidence tensor can be decomposed into a disjoint union of face-vectors, which are invariant sets under the action of the symmetry group. An implication is that any equivariant map from one incidence tensor to another also decomposes into equivariant maps between face-vectors.

Let $L$ be a linear function (here represented as a tensor) that maps a vector of faces of size $k$ to a vector of faces of size $k'$,

$(L v)_{f'} = L_{f', f}\, v_f,$    (8)

where $f$ identifies faces of size $k$, $f'$ identifies faces of size $k'$, and (using Einstein notation) repeated indices on the right-hand side are summed over. Equivariance to $S_n$ is realized through a symmetry constraint on $L$,

$L_{g \cdot f', g \cdot f} = L_{f', f} \qquad \forall g \in S_n,$    (9)

which ties the elements within each orbit of the so-called diagonal $S_n$-action on $L$; see Fig. 2 (left, middle) for a graph example.

Pool & Broadcast Interpretation

Each unique parameter in the constrained $L$ corresponds to a linear operation that has a pool-and-broadcast interpretation – that is, any linear equivariant map between two incidence tensors can be written as a linear combination of pooling-and-broadcasting operations. Moreover, this interpretation allows for a linear-time implementation of the equivariant layers, as we avoid the explicit construction of $L$.

Definition 1 (Pooling).

Given a face-vector $v$ indexed by faces $f = (i_1, \dots, i_k)$, and a subset of index positions $P \subseteq \{1, \dots, k\}$, the pooling operation $\mathrm{Pool}_P$ sums over the node indices in the positions $P$:

$\mathrm{Pool}_P(v)_{(i_p)_{p \notin P}} = \sum_{(i_p)_{p \in P}} v_{i_1, \dots, i_k}.$

In practice, the summation in the definition may be replaced with any permutation-invariant aggregation function. We use mean-pooling in our experiments.

Definition 2 (Broadcasting).

$\mathrm{Bcast}_Q(v)$ broadcasts $v$, a face-vector of faces of size $k$, over a target vector of faces of size $k' \geq k$. We identify $Q$, a sequence of $k$ node-index positions of the target face-vector, with the indices of $v$, and we broadcast across the remaining $k' - k$ node indices – that is,

$\mathrm{Bcast}_Q(v)_{i_1, \dots, i_{k'}} = v_{(i_q)_{q \in Q}}.$

For example, given an edge-vector $v_E$, $\mathrm{Bcast}_{(1,2)}(v_E)$ broadcasts $v_E$ to a triangle-vector (i.e., a vector of faces of size 3), where $v_E$ is mapped to the first two node indices and broadcast along the third. The important fact about the pool and broadcast operations defined above is that they are equivariant to permutations of nodes. In fact, it turns out that an equivariant $L$ can only linearly combine pooling and broadcasting of the input incidence tensor into an output tensor.

Theorem 5.2.

Any equivariant linear map $L$ as defined in Eq. 8 can be written as

$L(v) = \sum_{P, Q} w_{P, Q}\, \mathrm{Bcast}_Q\big(\mathrm{Pool}_P(v)\big).$    (10)

The proof appears in Appendix B. The sum of pooling-and-broadcasting operations in Eq. 10 includes pooling the node indices of the input face-vector in all possible ways, and broadcasting the resulting collection of face-vectors to the target face-vector, again in all ways possible. $w_{P, Q}$ is the parameter associated with each unique pooling-and-broadcasting combination.
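A compact sketch of Definitions 1 and 2 and of Eq. 10 for dense face-vectors stored as arrays with one axis per node index (the storage convention and names are our own choices): pooling averages out a subset of axes, broadcasting places the remaining axes at chosen output positions, and an equivariant map is a weighted sum of such compositions.

```python
import numpy as np

def pool(v, positions):
    """Mean-pool the node-index axes of a dense face-vector listed in `positions`."""
    return v.mean(axis=tuple(positions)) if positions else v

def broadcast(v, out_order, out_positions, n):
    """Return b of order `out_order` with b[i_0,...,i_{k'-1}] = v[i_{q_0},...,i_{q_{k-1}}],
    where q = out_positions; the remaining output axes are broadcast over."""
    order = np.argsort(out_positions)              # put kept axes in ascending output order
    w = np.transpose(v, order) if v.ndim > 1 else v
    shape = [1] * out_order
    for pos, size in zip(sorted(out_positions), w.shape):
        shape[pos] = size
    return np.broadcast_to(w.reshape(shape), (n,) * out_order)

# One term of Eq. 10: pool away axis 1 of an edge-vector, then broadcast the resulting
# node-vector to position 2 of a triangle-vector (faces of size 3).
n = 5
rng = np.random.default_rng(5)
v_edge = rng.normal(size=(n, n))                   # dense (directed) edge-vector
term = broadcast(pool(v_edge, positions=[1]), out_order=3, out_positions=[2], n=n)
print(term.shape)                                  # (n, n, n)

# An equivariant map is a weighted sum of such terms (Eq. 10), e.g. edges -> nodes:
w = [0.4, -0.2, 0.7]
out_nodes = (w[0] * broadcast(pool(v_edge, [1]), 1, [0], n)      # pool 2nd index
             + w[1] * broadcast(pool(v_edge, [0]), 1, [0], n)    # pool 1st index
             + w[2] * broadcast(pool(v_edge, [0, 1]), 1, [], n)) # pool both, broadcast
print(out_nodes.shape)                             # (n,)
```

The three terms in the last example are exactly the three operations mapping edge-vectors to node-vectors, consistent with the count for $L_{2\to1}$ in Section 3.1.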

The number of operations in Eq. 10 is given by

$c(k, k') = \sum_{t=0}^{\min(k, k')} \binom{k}{t} \binom{k'}{t}\, t!\,.$    (11)

This counts the number of possible choices of $t$ indices out of the $k$ input indices in Eq. 8 and $t$ indices out of the $k'$ output indices to keep for pool and broadcast. Once this set is fixed, there are $t!$ different ways to match input indices to output indices.
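Under this counting (our reconstruction of Eq. 11), a few lines of code reproduce the parameter counts of Section 3.1 – two, three, three, and seven for the four blocks of Eq. 4, summing to fifteen:

```python
from math import comb, factorial

def num_ops(k, k_out):
    """Number of pool/broadcast operations between face-vectors of sizes k and k_out."""
    return sum(comb(k, t) * comb(k_out, t) * factorial(t)
               for t in range(min(k, k_out) + 1))

print(num_ops(1, 1), num_ops(1, 2), num_ops(2, 1), num_ops(2, 2))  # 2 3 3 7
print(sum(num_ops(k, k2) for k in (1, 2) for k2 in (1, 2)))        # 15, as in Section 3.1
```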

Decomposition of Equivariant Maps

Let $L$ be an equivariant map between arbitrary incidence tensors, where both input and output decompose according to Eq. 7. Using the equivariant maps of Eq. 10, we get a decomposition of $L$ into all possible combinations of input-output face-vectors

$L = \bigoplus_{k, k'} \bigoplus_{a=1}^{m_k} \bigoplus_{b=1}^{m'_{k'}} L^{(a, b)}_{k \to k'},$    (12)

where for each copy (out of $m'_{k'}$ copies) of the output face-vector of size $k'$, we are summing over all the maps produced by the different input face-vectors with their different multiplicities. The superscripts $a$ and $b$ in the map indicate that for each input-output copy, the map uses a different set of parameters. The upshot is that input and output multiplicities play a role similar to input and output channels.

The total number of independent parameters in a layer is

$\sum_{k, k'} m_k\, m'_{k'}\, c(k, k'),$    (13)

where $c(k, k')$ is given by Eq. 11.

Example 3 (Node-adjacency tensors).

This example is concerned with the incidence representation used in the equivariant graph networks of Maron et al. (2018) and derives their model as a special case, using our pool/broadcast layer and the face-vector decomposition. For an equivariant layer that maps a node-node-…-node incidence tensor of order $d$ (as outlined in Example 2) to the same structure, the decomposition in terms of face-vectors reads

$X \cong \bigoplus_{k=1}^{d} \bigoplus_{l=1}^{S(d, k)} v_k^{(l)},$

where $S(d, k)$ is the Stirling number of the second kind; see Example 2. The total number of operations according to Eq. 13 is then given by

$\sum_{k=1}^{d} \sum_{k'=1}^{d} S(d, k)\, S(d, k')\, c(k, k') = \sum_{k, k'} \sum_{t} S(d, k) \binom{k}{t}\, S(d, k') \binom{k'}{t}\, t! = B_{2d}.$

In the last expression, $B_{2d}$ is the Bell number and counts the number of unique partitions of a set of size $2d$. To see the logic in the final equality: first divide the set of $2d$ elements in half. Next, partition each half into subsets ($k$ of them on one side, $k'$ on the other), choose $t$ of these parts from each half, and merge them in pairs. The first two terms count the number of ways we can partition each half into $k$ (or $k'$) parts and select a subset of size $t$ among them. The $t!$ term accounts for the different ways in which the selected parts can be matched. This result agrees with the result of Maron et al. (2018). Therefore one may implement these hyper-graph networks using the efficient pooling-and-broadcasting operations outlined in Eq. 10.
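The final equality can be checked numerically; the sketch below (our own) confirms that summing the operation counts over the Stirling multiplicities reproduces the Bell number $B_{2d}$ for small $d$.

```python
from math import comb, factorial
from functools import lru_cache

@lru_cache(None)
def stirling2(d, k):
    """Stirling number of the second kind: partitions of d elements into k blocks."""
    if d == k:
        return 1
    if k == 0 or k > d:
        return 0
    return k * stirling2(d - 1, k) + stirling2(d - 1, k - 1)

def num_ops(k, k_out):
    # c(k, k') of Eq. 11.
    return sum(comb(k, t) * comb(k_out, t) * factorial(t)
               for t in range(min(k, k_out) + 1))

def bell(m):
    return sum(stirling2(m, k) for k in range(m + 1))

for d in range(1, 6):
    total = sum(stirling2(d, k) * stirling2(d, k2) * num_ops(k, k2)
                for k in range(1, d + 1) for k2 in range(1, d + 1))
    print(d, total, bell(2 * d), total == bell(2 * d))   # e.g. d=2 gives 15 = B_4
```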

Recall that when discussing equivariant layers for graphs, we also considered independent permutations of rows and columns in a node-edge incidence matrix, and claimed that despite having only 4 parameters, stacking two such layers (with additional channels) is equivalent to the 15-parameter model. In Appendix E, a similar result is given for higher dimensions, showing that one may use a product of symmetric groups (one factor per dimension of the incidence tensor) as the symmetry group, in which case the equivariant model has substantially fewer parameters.

6 Conclusion

This paper introduces a general approach to learning equivariant models for a large family of structured data through their incidence tensor representation. In particular, we showed various incidence tensor representations for graphs, simplicial complexes, and abstract polytopes. The proposed family of incidence networks is 1) modular: they decompose into simple building blocks; 2) efficient: they all have a linear-time pooling-and-broadcasting implementation; and 3) effective: various members of this family achieve state-of-the-art performance on graphs using a simple architecture.

In our systematic study of this family, we discussed the implications of 1) added symmetry due to undirected faces; 2) sparsity-preserving equivariant maps; and 3) the successive relaxation of the symmetry group. Here, moving to a larger group simplifies the neural layer by reducing the number of unique parameters (and linear operations), while increasing its bias. Application of incidence networks to different domains, such as learning on triangulated meshes, is a direction that we hope to explore in the future.

Footnotes

  1. Possibly due to the complexity of the CCN architecture, experiments in Kondor et al. (2018) do not use all attributes in the QM9 dataset and their results are not comparable to the state-of-the-art.
  2. We were not able to compare our experimental results to Morris et al. (2018); Maron et al. (2019a) and the results reported in Wu et al. (2018) due to their choice of using a larger training split. Moreover, the raw QM9 dataset used by Morris et al. (2018) contains 133,246 molecules, which has 639 fewer molecules than the dataset used in our experiments.

References

  1. Anderson et al. (2019). Cormorant: covariant molecular neural networks. arXiv preprint arXiv:1906.04015.
  2. Bronstein et al. (2017). Geometric deep learning: going beyond Euclidean data. IEEE Signal Processing Magazine 34(4), pp. 18–42.
  3. Bruna et al. (2014). Spectral networks and locally connected networks on graphs. ICLR.
  4. Chen et al. (2019). On the equivalence between graph isomorphism testing and function approximation with GNNs. arXiv preprint arXiv:1905.12560.
  5. Cohen et al. (2019). Gauge equivariant convolutional networks and the icosahedral CNN. arXiv preprint arXiv:1902.04615.
  6. Defferrard et al. (2016). Convolutional neural networks on graphs with fast localized spectral filtering. In Advances in Neural Information Processing Systems, pp. 3844–3852.
  7. Duvenaud et al. (2015). Convolutional networks on graphs for learning molecular fingerprints. In Advances in Neural Information Processing Systems.
  8. Gilmer et al. (2017). Neural message passing for quantum chemistry. arXiv preprint arXiv:1704.01212.
  9. Graham and Ravanbakhsh (2019). Deep models for relational databases. arXiv preprint arXiv:1903.09033.
  10. Hamilton et al. (2017). Inductive representation learning on large graphs. In Advances in Neural Information Processing Systems, pp. 1024–1034.
  11. Hartford et al. (2018). Deep models of interactions across sets. In Proceedings of the 35th International Conference on Machine Learning, pp. 1909–1918.
  12. Henaff et al. (2015). Deep convolutional networks on graph-structured data. arXiv preprint arXiv:1506.05163.
  13. Hirn et al. (2017). Wavelet scattering regression of quantum chemical energies. Multiscale Modeling & Simulation 15(2), pp. 827–863.
  14. Jørgensen et al. (2018). Neural message passing with edge updates for predicting properties of molecules and materials. arXiv preprint arXiv:1806.03146.
  15. Kearnes et al. (2016). Molecular graph convolutions: moving beyond fingerprints. Journal of Computer-Aided Molecular Design 30(8), pp. 595–608.
  16. Keriven and Peyré (2019). Universal invariant and equivariant graph neural networks. arXiv preprint arXiv:1905.04943.
  17. Kipf and Welling (2016). Semi-supervised classification with graph convolutional networks. arXiv preprint arXiv:1609.02907.
  18. Kondor et al. (2018). Covariant compositional networks for learning graphs. arXiv preprint arXiv:1801.02144.
  19. LeCun et al. (1998). Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), pp. 2278–2324.
  20. Li et al. (2015). Gated graph sequence neural networks. arXiv preprint arXiv:1511.05493.
  21. Maron et al. (2019a). Provably powerful graph networks. arXiv preprint arXiv:1905.11136.
  22. Maron et al. (2018). Invariant and equivariant graph networks. arXiv preprint arXiv:1812.09902.
  23. Maron et al. (2019b). On the universality of invariant networks. arXiv preprint arXiv:1901.09342.
  24. Morris et al. (2018). Weisfeiler and Leman go neural: higher-order graph neural networks. arXiv preprint arXiv:1810.02244.
  25. Ramakrishnan et al. (2014). Quantum chemistry structures and properties of 134 kilo molecules. Scientific Data 1, pp. 140022.
  26. Ravanbakhsh et al. (2017). Equivariance through parameter-sharing. In Proceedings of the 34th International Conference on Machine Learning, JMLR: WCP, Vol. 70.
  27. Scarselli et al. (2009). The graph neural network model. IEEE Transactions on Neural Networks 20(1), pp. 61–80.
  28. Schütt et al. (2017). Quantum-chemical insights from deep tensor neural networks. Nature Communications 8, pp. 13890.
  29. Schütt et al. (2018). SchNet – a deep learning architecture for molecules and materials. The Journal of Chemical Physics 148(24), pp. 241722.
  30. Shawe-Taylor (1993). Symmetries and discriminability in feedforward network architectures. IEEE Transactions on Neural Networks 4(5), pp. 816–826.
  31. Unke and Meuwly (2019). PhysNet: a neural network for predicting energies, forces, dipole moments and partial charges. arXiv preprint arXiv:1902.08408.
  32. Veličković et al. (2017). Graph attention networks. arXiv preprint arXiv:1710.10903.
  33. Wu et al. (2018). MoleculeNet: a benchmark for molecular machine learning. Chemical Science 9(2), pp. 513–530.
  34. Xu et al. (2018). Representation learning on graphs with jumping knowledge networks. arXiv preprint arXiv:1806.03536.
  35. Ying et al. (2018). Hierarchical graph representation learning with differentiable pooling. In Advances in Neural Information Processing Systems, pp. 4800–4810.
  36. Zaheer et al. (2017). Deep sets. In Advances in Neural Information Processing Systems.
  37. Zhang et al. (2018). An end-to-end deep learning architecture for graph classification. In Thirty-Second AAAI Conference on Artificial Intelligence.