Incidence Networks for Geometric Deep Learning
Abstract
Sparse incidence tensors can represent a variety of structured data. For example, we may represent attributed graphs using their node-node, node-edge, or edge-edge incidence matrices. In higher dimensions, incidence tensors can represent simplicial complexes and polytopes. In this paper, we formalize incidence tensors, analyze their structure, and present the family of equivariant networks that operate on them. We show that any incidence tensor decomposes into invariant subsets. This decomposition, in turn, leads to a decomposition of the corresponding equivariant linear maps, for which we prove an efficient pooling-and-broadcasting implementation. We demonstrate the effectiveness of this family of networks by reporting state-of-the-art results on graph learning tasks for many targets in the QM9 dataset.
1 Introduction
Many interesting data structures have alternative tensor representations. For example, we can represent graphs using both node-node and node-edge sparse incidence matrices. We can extend this incidence representation to data defined on simplicial complexes and polytopes of arbitrary dimension, such as meshes, polygons, and polyhedra. The goal of this paper is to design deep models for these structures.
We represent an attributed geometric structure using its incidence tensor, which models the incidence of its faces. For example, rows and columns in a node-edge incidence matrix are indexed by faces of size one (nodes) and two (edges). Moreover, each edge (column) is incident to exactly two nodes (rows). The sparsity pattern of the incidence tensor carries important information about the geometric structure. This is because the sparsity-preserving permutations of nodes often match the automorphism group (a.k.a. symmetry group) of the geometric object; see Fig. 1(a,b).
We are interested in designing models that are informed by the symmetry of the underlying structure. We do so by making the model equivariant (or invariant) to symmetry transformations. When using the incidence tensor representation, a natural choice of symmetry group is the automorphism group of the geometric object. However, when working with a dataset comprising different instances (e.g., different graphs or polyhedra), using the individual automorphism groups is not practical. This is because each symmetry group dictates a different equivariant model, and we cannot train a single model on the whole dataset. A solution is to use the symmetric group (the group of all permutations of nodes) for all instances, which implicitly assumes a dense structure where all faces are present, e.g., all graphs are fully connected; see Fig. 1(c,d).
Next, we show that under the action of the symmetry group, any incidence tensor decomposes into orbits, where each orbit corresponds to faces of a particular size. For example, a node-node incidence matrix decomposes into 1) the diagonal, which can encode node attributes – we call this the node-vector – and 2) the off-diagonal entries, corresponding to edge attributes, which we call the edge-vector. These are examples of face-vectors in the general setting.
This decomposition of data into face-vectors also breaks up the design of equivariant linear maps for arbitrary incidence tensors into the design of such maps between face-vectors of different sizes. We show that any such linear map can be written as a linear combination of efficient pooling-and-broadcasting operations. These equivariant linear maps replace the linear layer in a feed-forward neural network to create an incidence network. We provide an extensive experimental evaluation of different incidence networks on one of the largest graph datasets (QM9). The results support our theoretical findings and establish a new state-of-the-art for several targets.
2 Related Works
Deep learning with structured data is a very active area of research. Here, we briefly review some of the closely related works in graph learning and equivariant deep learning.
Graph Learning. The idea of graph neural networks goes back to the work of Scarselli et al. (2009). More recently, Gilmer et al. (2017) introduced message passing neural networks and showed that they subsume several other graph neural network architectures Li et al. (2015); Duvenaud et al. (2015); Kearnes et al. (2016); Schütt et al. (2017), including the spectral methods that follow. Another body of work in geometric deep learning extends convolution to graphs using the spectrum of the graph Laplacian Bronstein et al. (2017); Bruna et al. (2014). While principled, in its complete form the Fourier bases extracted from the Laplacian are instance-dependent, and the lack of any parameter or function sharing across graphs limits generalization. Following Henaff et al. (2015); Defferrard et al. (2016), Kipf and Welling (2016) propose a single-parameter simplification of the spectral method that addresses this limitation and is widely used in practice. Some notable extensions and related ideas include Veličković et al. (2017); Hamilton et al. (2017); Xu et al. (2018); Zhang et al. (2018); Ying et al. (2018); Morris et al. (2018); Maron et al. (2019a).
Equivariant Deep Learning. Equivariance constrains the predictions of a model under a group of transformations of the input, such that
$$ f(T_g\, x) \;=\; T'_g\, f(x) \qquad \forall g. \qquad (1) $$
Here $T_g$ is a "consistently" defined transformation of the input $x$, parameterized by the group element $g$, while $T'_g$ denotes the corresponding transformation of the output. For example, in a convolution layer LeCun et al. (1998), the group is that of discrete translations, and Eq. 1 means that any translation of the input leads to the same translation of the output. When $f$ is a standard feed-forward layer with a parameter matrix, the equivariance property of Eq. 1 enforces parameter-sharing in that matrix Shawe-Taylor (1993); Ravanbakhsh et al. (2017).
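To make Eq. 1 concrete, here is a minimal NumPy sketch (ours, not from the paper) that numerically checks permutation equivariance for a simple DeepSets-style set layer; the layer weights and data are illustrative placeholders.

```python
import numpy as np

def set_layer(x, w_self=0.7, w_pool=0.3):
    # f(x)_i = w_self * x_i + w_pool * sum_j x_j : a permutation-equivariant map
    return w_self * x + w_pool * x.sum()

rng = np.random.default_rng(0)
x = rng.normal(size=5)
perm = rng.permutation(5)          # a group element g, acting by permuting entries

lhs = set_layer(x[perm])           # f(T_g x)
rhs = set_layer(x)[perm]           # T'_g f(x)
assert np.allclose(lhs, rhs)       # Eq. 1 holds for any permutation
```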
Most relevant to our work are the equivariant models proposed for geometric deep learning, which we review next.
Covariant compositional networks Kondor et al. (2018) extend the message passing framework by considering basic tensor operations that preserve equivariance.
While the resulting architecture can be quite general, it comes at the cost of efficiency.
3 Graphs
In this section we discuss graphs and later generalize the arguments to a broader set of geometric objects in Section 5. Without loss of generality, in the following we assume a fully-connected graph $G = (N, E)$, where $N$ denotes the set of nodes and $E$ the set of edges.
3.1 Linear Layers
There are two standard ways one can represent $G$. We can use a node-node incidence matrix $X$ indexed by pairs of nodes. Node and edge features are encoded as diagonal and off-diagonal entries of $X$, respectively. Here, we assume a single input and output channel (i.e., scalar node and edge attributes) for simplicity; results trivially generalize to multiple channels.
Consider the group $\mathbb{S}_n$ of all permutations of nodes and its action on $X$. This action simultaneously permutes the rows and columns of $X$. Let $f$ be a linear map equivariant to $\mathbb{S}_n$,
$$ f(g \cdot X) \;=\; g \cdot f(X) \qquad \forall g \in \mathbb{S}_n, \qquad (2) $$
where $g \cdot X$ denotes the simultaneous permutation of rows and columns of $X$ by $g$. The map $f$ is constrained so that permuting the rows and columns of the input has the same effect on the output. As shown in Maron et al. (2018), this condition constrains the number of independent parameters in $f$ to fifteen, regardless of the size of the graph.
Alternatively, one can represent $G$ with a node-edge incidence matrix $Y$, whose rows are indexed by nodes and whose columns are indexed by edges, i.e., unordered pairs of distinct nodes. $Y$ has a special sparsity pattern: an entry can be nonzero iff the node indexing its row is incident to the edge indexing its column. We identify this sparsity pattern implicitly by only indexing the nonzero entries. Edge features are encoded at the two nonzero entries of each edge column, corresponding to the two directions of that edge.
The action of $\mathbb{S}_n$ on $Y$ is also a simultaneous permutation of rows and columns, where the permutation of columns is defined by the action on the node pair that identifies each edge. This action preserves the sparsity pattern of $Y$ defined above. The maximal equivariant linear map acting on $Y$ is constrained to have seven independent parameters (assuming a single input and output channel). Finally, one may also consider an edge-edge incidence matrix, producing yet another type of equivariant linear layer.
Since both $X$ and $Y$ represent the same graph $G$, and the corresponding linear maps are equivariant to the same group $\mathbb{S}_n$, one expects a relationship between the two representations and maps. This relationship is due to the decomposition of $X$ into orbits under the action of $\mathbb{S}_n$. In particular, $X$ decomposes into two orbits: the diagonal elements and the off-diagonal elements, where each subset is invariant under the action of $\mathbb{S}_n$ – that is, simultaneous permutation of rows and columns does not move a diagonal element to the off-diagonal or vice versa. We write this decomposition as
$$ X \;\cong\; x_{\mathrm{node}} \,\uplus\, x_{\mathrm{edge}}, \qquad (3) $$
where the diagonal orbit is isomorphic to the vector of nodes $x_{\mathrm{node}}$ and the off-diagonal orbit is isomorphic to the vector of (directed) edges $x_{\mathrm{edge}}$. Consider the equivariant map $f$ defined above, in which both input and target decompose in this way. It follows that the map itself also decomposes into four maps
$$ f \;\cong\; f_{1 \to 1} \,\uplus\, f_{1 \to 2} \,\uplus\, f_{2 \to 1} \,\uplus\, f_{2 \to 2}, \qquad (4) $$
where $f_{k \to k'}$ maps a face-vector of faces of size $k$ to a face-vector of faces of size $k'$. Equivariance to $\mathbb{S}_n$ constrains the number of independent parameters in each of these maps: $f_{1 \to 1}$ is the equivariant layer used in DeepSets Zaheer et al. (2017) and has two parameters; $f_{1 \to 2}$ and $f_{2 \to 1}$ each have three parameters; and $f_{2 \to 2}$, which maps input edge features into target edge features, has seven unique parameters. One key point is that the edge-vector is isomorphic to the node-edge incidence matrix $Y$, and thus the seven-parameter equivariant map for $Y$ is exactly the $f_{2 \to 2}$ of Eq. 4.
One can also encode node features in a node-edge incidence matrix by doubling the number of channels and broadcasting node features across all edges incident to a node. In this case all fifteen operations are retrieved, and the two layers for $X$ and $Y$ are equivalent. These two linear maps are visualized in Fig. 2 (left, middle).
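The following NumPy sketch (our illustration, not the authors' code) implements two of the four decomposed maps of Eq. 4 via pooling and broadcasting for a dense directed graph; the weights are placeholders, and the remaining maps $f_{1 \to 2}$ and $f_{2 \to 2}$ follow the same pooling-and-broadcasting pattern.

```python
import numpy as np

def node_to_node(x, w):
    # f_{1->1}: the 2-parameter DeepSets layer (Zaheer et al., 2017)
    return w[0] * x + w[1] * x.sum()            # identity + pool-then-broadcast

def edge_to_node(E, w):
    # f_{2->1}: 3 parameters; E holds directed edge features on the off-diagonal
    E = E - np.diag(np.diag(E))                 # ignore the diagonal (node) entries
    return (w[0] * E.sum(axis=1)                # pool over targets of outgoing edges
            + w[1] * E.sum(axis=0)              # pool over sources of incoming edges
            + w[2] * E.sum())                   # pool over all edges, broadcast to nodes

n = 4
x = np.random.randn(n)                          # node-vector
E = np.random.randn(n, n)                       # node-node matrix carrying edge features
y = node_to_node(x, [0.5, 0.1]) + edge_to_node(E, [0.2, 0.3, 0.05])   # new node features
```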
In Section 5 we will generalize the representation, decomposition, and pooling-and-broadcasting implementation of equivariant layers to higher-order geometric objects.
3.2 Sparse Tensors and Non-Linear Layers
So far we have discussed equivariant linear layers for a fully connected graph. This means a dense input/output node-node incidence matrix $X$, or equivalently a node-edge incidence matrix $Y$ with the sparsity pattern described in the previous section (which is maintained by the action of $\mathbb{S}_n$). To avoid the cost of a dense representation, one may apply a sparsity mask after the linear map, while preserving equivariance:
$$ f_{\mathrm{sparse}}(X) \;=\; S \odot f(X), \qquad (5) $$
where $f$ is the equivariant linear map of Section 3.1, $S$ is the sparsity mask, and $\odot$ is the Hadamard product. For example, assuming the layer output has the same shape as the input, one might choose to preserve the sparsity of the input. In this case, $S$ will have zero entries where the input has zero entries, and ones otherwise. However, the setting of Eq. 5 is more general, as input and output may have different forms. Since the sparsity mask depends on the input, the map of Eq. 5 is now non-linear. In practice, rather than calculating the dense output and applying the sparsity mask, we directly produce the nonzero values.
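As a sketch of Eq. 5 (our illustration, assuming an input-preserving mask), the layer below applies a dense equivariant map and then masks the result; recall that in practice only the nonzero outputs are produced directly rather than masking a dense result.

```python
import numpy as np

def sparse_equivariant_layer(X, equivariant_map):
    # Eq. 5: apply the dense equivariant map f, then mask with the input sparsity S.
    S = (X != 0).astype(X.dtype)        # ones where the input is nonzero
    return S * equivariant_map(X)       # Hadamard product S ⊙ f(X)
```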
3.3 Further Relaxation of the Symmetry Group
The neural layers discussed so far are equivariant to the group $\mathbb{S}_n$, where $n$ is the number of nodes. A simplifying alternative is to assume independent permutations of the rows and columns of the node-node or node-edge matrix. This is particularly relevant for the node-edge matrix, where one can consider nodes and edges as two interacting sets of distinct objects. The corresponding $\mathbb{S}_n \times \mathbb{S}_e$-equivariant layer, where $e$ is the number of edges, was introduced in Hartford et al. (2018); it has 4 unique parameters and is substantially easier to implement than the layers introduced so far. In Appendix A we show how to construct a sparsity-preserving (and therefore non-linear) layer for this case. Even though a single layer is over-constrained by these symmetry assumptions, in the appendix we prove that two such layers generate exactly the same node and edge features as a single linear layer for a node-node incidence. These results are corroborated by good performance in experiments.
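A minimal sketch of this 4-parameter exchangeable-matrix layer (our paraphrase of Hartford et al. (2018), single channel, mean-pooling):

```python
import numpy as np

def exchangeable_layer(M, w):
    # Equivariant to independent row and column permutations; 4 unique parameters.
    return (w[0] * M
            + w[1] * M.mean(axis=0, keepdims=True)   # pool over rows, broadcast back
            + w[2] * M.mean(axis=1, keepdims=True)   # pool over columns, broadcast back
            + w[3] * M.mean())                       # pool over everything, broadcast
```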
4 Experiments
Table 1: QM9 results for previous methods and for seven incidence-network variants (the seven rightmost columns; see Table 2 for the variant descriptions).

target | enn-s2s | nmp-edge | schnet | Cormorant | WaveScatt | incidence networks (seven variants)
α | 0.092 | 0.077 | 0.235 | 0.092 | 0.160 | 0.028 | 0.039 | 0.030 | 0.036 | 0.037 | 0.033 | 0.033
Cv | 0.040 | 0.032 | 0.033 | 0.031 | 0.049 | 0.019 | 0.025 | 0.028 | 0.030 | 0.023 | 0.028 | 0.029
G | 0.019 | 0.012 | 0.014 | – | – | 0.001 | 0.001 | 0.008 | 0.008 | 0.003 | 0.011 | 0.010
H | 0.017 | 0.011 | 0.014 | – | – | 0.001 | 0.001 | 0.008 | 0.008 | 0.002 | 0.010 | 0.010
εHOMO | 0.043 | 0.036 | 0.041 | 0.036 | 0.085 | 0.098 | 0.191 | 0.089 | 0.116 | 0.097 | 0.101 | 0.090
εLUMO | 0.037 | 0.030 | 0.034 | 0.036 | 0.076 | 0.049 | 0.062 | 0.049 | 0.052 | 0.054 | 0.054 | 0.052
gap | 0.069 | 0.058 | 0.063 | 0.073 | 0.118 | 0.073 | 0.062 | 0.068 | 0.080 | 0.087 | 0.078 | 0.071
μ | 0.030 | 0.029 | 0.033 | 0.130 | 0.340 | 0.040 | 0.082 | 0.040 | 0.067 | 0.038 | 0.055 | 0.060
⟨R²⟩ | 0.180 | 0.072 | 0.073 | 0.673 | 0.410 | 0.010 | 0.012 | 0.017 | 0.017 | 0.009 | 0.021 | 0.017
U | 0.019 | 0.010 | 0.019 | – | – | 0.001 | 0.002 | 0.007 | 0.009 | 0.002 | 0.010 | 0.009
U0 | 0.019 | 0.010 | 0.014 | 0.028 | 0.022 | 0.001 | 0.001 | 0.008 | 0.008 | 0.003 | 0.010 | 0.010
ZPVE | 0.0015 | 0.0014 | 0.0017 | 0.0019 | 0.002 | 0.006 | 0.008 | 0.008 | 0.011 | 0.007 | 0.010 | 0.009
Many deep models for graphs have been applied to the task of predicting molecular properties Gilmer et al. (2017); Schütt et al. (2018, 2017); Jørgensen et al. (2018); Morris et al. (2018); Unke and Meuwly (2019); Kondor et al. (2018); Anderson et al. (2019). Interestingly, most, if not all, of these methods can be viewed as message passing methods.
Our architecture for all models is a simple stack of equivariant layers, where the final layer has a single channel followed by pooling, which produces a scalar value for the target. For details on the dataset, architecture, and training procedure, see Appendix F.
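A schematic sketch of this pipeline (ours; the number of layers, channel widths, and nonlinearity are placeholders – see Appendix F for the configuration actually used):

```python
def incidence_network(X, layers, nonlinearity):
    # A stack of equivariant layers; the final (single-channel) layer is followed
    # by pooling, producing one scalar prediction per input structure.
    h = X
    for layer in layers[:-1]:
        h = nonlinearity(layer(h))
    return layers[-1](h).mean()
```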
Table 1 reports the previous state-of-the-art, as well as our results using various members of the incidence-network family. The variants differ in whether the layers are sparse (non-linear) or dense (linear), whether edges are directed or undirected, and whether they operate on a node-edge or a node-node incidence; one node-edge variant uses the larger symmetry group of Section 3.3. See Table 2 for details on each incidence-network model.
Table 2: Incidence-network variants. The last row uses the larger symmetry group of Section 3.3.

Incidence | Edge Type | Graph Type | Num. Params. | Layer Type | Sym. Group
Node-Node | Directed | Dense | 15 | Linear | $\mathbb{S}_n$
Node-Node | Undirected | Dense | 9 | Linear | $\mathbb{S}_n$
Node-Node | Directed | Sparse | 15 | Non-Linear | $\mathbb{S}_n$
Node-Node | Undirected | Sparse | 9 | Non-Linear | $\mathbb{S}_n$
Node-Edge | Undirected | Dense | 7 | Linear | $\mathbb{S}_n$
Node-Edge | Undirected | Sparse | 7 | Non-Linear | $\mathbb{S}_n$
Node-Edge | Undirected | Sparse | 4 | Non-Linear | $\mathbb{S}_n \times \mathbb{S}_e$
Our models match or outperform the previous state-of-the-art on 7 of the 12 targets. They also show similar performance despite using different representations, supporting our theoretical analysis regarding the comparable expressiveness of node-node and node-edge representations. Dense models generally perform slightly better, at the cost of training runtime. Finally, we note that the 4-parameter model of Section 3.3 performs almost as well, despite using an over-constraining symmetry group, further supporting the theoretical results outlined in Section 3.3 and explained in Appendix A.
5 Higher Order Geometric Structures
In Section 5.1 we define incidence tensors, which generalize the node-node and node-edge matrices of graphs. We discuss several examples, showing how they can represent geometric objects such as graphs, polytopes, or simplicial complexes. In Section 5.2 we generalize the orbit decomposition introduced in Section 3.1 to generic incidence tensors. Finally, in Section 5.3 we show how to build equivariant layers for arbitrary incidence tensors using a linear combination of simple pooling-and-broadcasting operations.
5.1 Incidence Tensors
Recall that $N$ denotes the set of nodes. A directed face of size $k$ is an ordered tuple of $k$ distinct nodes. Following a similar logic, an undirected face of size $k$ is a subset of $N$ with $k$ elements. We write $|\delta| = k$ for the size of a face $\delta$. For example, a face of size two identifies an edge in a graph or a mesh, while a face of size three is a triangle in a triangulated mesh.
An incidence tensor is a tensor in which each dimension is indexed by all the faces of a given size. For example, if the first dimension indexes nodes and the second indexes edges, the incidence tensor becomes a node-edge incidence matrix. An incidence tensor may also have a sparsity structure, identified by a set of equality constraints among node indices that must hold for any nonzero entry. For example, in the node-edge incidence matrix an entry indexed by a node and an edge can be nonzero only if the node coincides with one of the two endpoints of the edge.
Therefore, while in general the pair of the tensor and its sparsity constraints defines an incidence tensor, whenever it is clear from context we will use the tensor alone to denote it. This formalism can represent a variety of different geometric structures, as demonstrated in the following sections.
Simplicial Complexes
Before discussing general simplicial complexes let us review graphs as an example of incidence tensors.
The node-node incidence matrix is an incidence tensor indexed by a pair of nodes, with no sparsity constraints; we denoted it by $X$ in Section 3.1 for simplicity. The node-edge incidence matrix is indexed by nodes and edges, and its entries can be nonzero only when the edge is adjacent to the node; we denoted it simply by $Y$ in Section 3.1. We have also denoted the node and edge vectors by $x_{\mathrm{node}}$ and $x_{\mathrm{edge}}$, respectively. As a final example, an edge-edge incidence matrix has nonzero entries wherever two edges are incident.
Let us now move to the definition of a general (undirected) simplicial complex. An abstract simplicial complex is a collection of faces that is closed under the operation of taking subsets – that is, every subset of a face is itself a face. The dimension of a face is its size minus one. Maximal faces are called facets, and the dimension of the complex is the dimension of its largest facet. For example, an undirected graph is a one-dimensional simplicial complex. Each dimension of an incidence tensor may be indexed by faces of a specific dimension. Two undirected faces of different dimensions are incident if one is a subset of the other. This type of relationship, as well as alternative definitions of incidence between faces of the same dimension, can easily be accommodated in the form of equality constraints in the sparsity pattern.
Although not widely used, a directed simplicial complex can be defined similarly. The main difference is that faces are sequences of nodes, and the complex is closed under the operation of taking subsequences. As one might expect, the incidence tensor for directed simplicial complexes can be built using directed faces in our notation.
Example 1.
A zero-dimensional simplicial complex is a set of points that we may represent using an incidence vector. At dimension one, we get undirected graphs, where faces of dimension one are the edges. A triangulated mesh is an example of a two-dimensional simplicial complex; see the figure below.
Polygons, Polyhedra, and Polytopes
Another family of geometric objects with incidence structure is polytopes. A formal definition of abstract polytopes and their representation using incidence tensors is given in Appendix D. A polytope is a generalization of polygons and polyhedra to higher dimensions. The structure of an (abstract) polytope is encoded using a partially ordered set (poset) that is graded, meaning that each element of the poset has a rank. For example, Fig. 3 shows the poset for a cube, where each level is a different rank, and subsets in each level identify faces of different size (nodes, edges, and squares). The idea of using the incidence tensor representation for a polytope is similar to its use for simplicial complexes. Each dimension of the incidence tensor indexes faces of a different rank. Two faces of the same rank may be considered incident if they have a face of a specific lower rank in common. We may also define two faces of different rank to be incident if one face is a subset of the other – i.e., in the partial order.
5.2 Symmetry & Decomposition
The automorphism group associated with an incidence tensor is the set of all permutations of nodes that map every face to another face, and therefore preserve the sparsity pattern, where the action of a node permutation $g$ on a directed face is naturally defined as
$$ g \cdot (n_1, \dots, n_k) \;=\; \big(g(n_1), \dots, g(n_k)\big). \qquad (6) $$
See Fig. 1(a,b) for an example. We may then construct equivariant linear layers through parameter-sharing. However, the constraints on this linear operator vary if our dataset contains incidence tensors with different sparsity patterns. For example, a directed graph dataset may contain a fully connected graph, whose automorphism group is the full symmetric group $\mathbb{S}_n$, and a cyclic graph, whose automorphism group is the cyclic group. For these two graphs, the node-node and node-edge incidence matrices are invariant to the corresponding automorphism groups, necessitating different constraints on their linear layers. To remedy this problem with model-sharing across instances, we densify all incidence tensors so that all directed or undirected faces of a given dimension are present. Now, one may use the same symmetry group $\mathbb{S}_n$ across all instances; see Fig. 1(c,d). Next, we consider the incidence tensor as a set, and identify the orbits of this action.
Theorem 5.1.
The action of $\mathbb{S}_n$ on any incidence tensor $T$ decomposes it into orbits that are each isomorphic to a face-vector:
$$ T \;\cong\; \biguplus_{k} \; \biguplus_{m=1}^{\gamma_k} x_k^{(m)}, \qquad (7) $$
where $x_k^{(m)}$ denotes a face-vector over faces of size $k$, and $\gamma_k$ is the multiplicity of faces of size $k$. The value of $\gamma_k$ is equal to the number of ways of partitioning the set of all node indices of $T$ into $k$ nonempty parts, such that node indices belonging to the same face fall into different parts, and node indices related by the equality constraints of the sparsity pattern fall into the same part.
The proof appears in Appendix B.
Example 2 (Node-adjacency tensors).
Consider an order-$d$ node-node-…-node incidence tensor with no sparsity constraints. In this case, the multiplicity $\gamma_k$ of Eq. 7 corresponds to the number of ways of partitioning a set of $d$ elements into $k$ nonempty subsets, also known as the Stirling number of the second kind, written $S(d, k)$. Each partition into $k$ subsets identifies a face-vector for a face of size $k$. These faces can be identified as hyper-diagonals of the original adjacency tensor. For example, as shown in the figure below, an order-3 adjacency tensor decomposes into a node-vector (the main diagonal of the adjacency cube), three edge-vectors (isomorphic to the three diagonal planes of the cube, with the main diagonal removed), and one hyper-edge-vector (isomorphic to the adjacency cube with the main diagonal and diagonal planes removed). Here, $\gamma_1 = S(3,1) = 1$, $\gamma_2 = S(3,2) = 3$, and $\gamma_3 = S(3,3) = 1$.
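To illustrate, the sketch below (our own, not from the paper) extracts the orbit masks of a dense order-3 adjacency tensor: the main diagonal (node-vector), the three diagonal planes minus the diagonal (edge-vectors), and the remaining entries (hyper-edge-vector).

```python
import numpy as np

n = 5
A = np.random.randn(n, n, n)                     # dense order-3 adjacency tensor
i, j, k = np.indices((n, n, n))

node_orbit  = (i == j) & (j == k)                # main diagonal: gamma_1 = 1 orbit
edge_orbits = [(i == j) & (j != k),              # three diagonal planes (minus the
               (i == k) & (i != j),              # main diagonal): gamma_2 = 3 orbits
               (j == k) & (i != j)]
hyper_orbit = (i != j) & (j != k) & (i != k)     # all indices distinct: gamma_3 = 1

sizes = [node_orbit.sum()] + [m.sum() for m in edge_orbits] + [hyper_orbit.sum()]
assert sum(sizes) == n ** 3                      # the orbits partition the tensor
```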
5.3 Equivariant Maps for Incidence Tensors
As shown in the previous section, any incidence tensor can be decomposed into a disjoint union of face-vectors, which are invariant sets under the action of the symmetry group. An implication is that any equivariant map from one incidence tensor to another also decomposes into equivariant maps between face-vectors.
Let $W$ be a linear map (here represented as a tensor) that maps a vector of faces of size $k$ to a vector of faces of size $k'$,
$$ (W x_k)_{\delta'} \;=\; W_{\delta' \delta}\, (x_k)_{\delta}, \qquad (8) $$
where $\delta$ identifies faces of size $k$, $\delta'$ identifies faces of size $k'$, and (using Einstein notation) repeated indices are summed over. Equivariance to $\mathbb{S}_n$ is realized through a symmetry constraint on $W$,
$$ W_{(g \cdot \delta')(g \cdot \delta)} \;=\; W_{\delta' \delta} \qquad \forall g \in \mathbb{S}_n, \qquad (9) $$
which ties together the elements within each orbit of the so-called diagonal action of $\mathbb{S}_n$ on the index pairs of $W$; see Fig. 2 (left, middle) for a graph example.
Pool & Broadcast Interpretation
Each unique parameter in the constrained $W$ corresponds to a linear operation that has a pooling-and-broadcasting interpretation – that is, any linear equivariant map between two incidence tensors can be written as a linear combination of pooling-broadcasting operations. Moreover, this interpretation allows for a linear-time implementation of the equivariant layers, as we avoid the explicit construction of $W$.
Definition 1 (Pooling).
Given a face-vector $x_k$ and a subset $P$ of its $k$ node indices, the pooling operation $\mathrm{pool}_P(x_k)$ sums $x_k$ over the node indices in $P$, producing a face-vector over the remaining $k - |P|$ node indices.
In practice, the summation in the definition may be replaced with any permutation-invariant aggregation function. We use mean-pooling in our experiments.
Definition 2 (Broadcasting).
$\mathrm{bcast}_Q(x_k)$ broadcasts $x_k$, a face-vector of faces of size $k$, over a target vector of faces of size $k' \geq k$. Here $Q$ identifies a sequence of $k$ node indices of the target face-vector onto which $x_k$ is placed, and the values of $x_k$ are copied (broadcast) across the remaining node indices of the target.
For example, given an edge-vector $x_2$, broadcasting can map it to a triangle-vector (i.e., a vector of faces of size three), where $x_2$ is placed on the first two node indices and broadcast along the third. The important fact about the pooling and broadcasting operations defined above is that they are equivariant to permutations of nodes. In fact, it turns out that an equivariant linear map can only linearly combine pooling and broadcasting operations applied to the input incidence tensor to produce the output tensor.
Theorem 5.2.
Any equivariant linear map $W$ as defined in Eq. 8 can be written as
$$ W x_k \;=\; \sum_{P, Q} w_{P, Q} \; \mathrm{bcast}_Q\big(\mathrm{pool}_P(x_k)\big). \qquad (10) $$
The proof appears in Appendix B. The sum of the pooling-and-broadcasting operations in Eq. 10 includes pooling the node indices of the input face-vector in all possible ways, and broadcasting the resulting collection of face-vectors to the target face-vector, again in all possible ways. Here $w_{P, Q}$ is the parameter associated with each unique pooling-and-broadcasting combination.
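A minimal NumPy sketch (ours) of the pooling and broadcasting primitives of Definitions 1 and 2, with face-vectors stored as dense order-$k$ arrays over nodes; for brevity it ignores the fact that diagonal entries are not true faces, and the sparse handling used in actual incidence networks is omitted.

```python
import numpy as np

def pool(x, axes):
    # Definition 1: sum a face-vector (dense order-k array) over the node
    # indices listed in `axes`.
    return x.sum(axis=tuple(axes))

def broadcast(x, out_order, positions, n):
    # Definition 2: place the axes of x at the output `positions` and copy its
    # values along the remaining node indices of an order-`out_order` face-vector.
    view_shape = [1] * out_order
    for ax, pos in enumerate(positions):
        view_shape[pos] = x.shape[ax]
    order = tuple(np.argsort(positions))             # x-axes in output-position order
    y = np.transpose(x, axes=order).reshape(view_shape)
    return np.broadcast_to(y, [n] * out_order)

# Eq. 10 for an edge-vector -> node-vector map: pool in all ways, broadcast in all ways.
n = 6
x2 = np.random.randn(n, n)                           # dense directed edge-vector
ops = [broadcast(pool(x2, [1]), 1, (0,), n),         # pool over edge targets
       broadcast(pool(x2, [0]), 1, (0,), n),         # pool over edge sources
       broadcast(pool(x2, [0, 1]), 1, (), n)]        # pool over everything
w = [0.3, 0.2, 0.1]                                  # one parameter per operation
y1 = sum(wi * op for wi, op in zip(w, ops))          # an equivariant edge -> node map
```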
Decomposition of Equivariant Maps
Let $f$ be an equivariant map between arbitrary incidence tensors, where both input and output decompose according to Eq. 7. Using the equivariant maps of Eq. 10, we get a decomposition of $f$ into all possible combinations of input-output face-vectors
$$ f \;\cong\; \biguplus_{k'} \biguplus_{m'=1}^{\gamma'_{k'}} \; \biguplus_{k} \biguplus_{m=1}^{\gamma_k} \; f^{(m', m)}_{k \to k'}, \qquad (12) $$
where for each copy $m'$ (out of $\gamma'_{k'}$ copies) of the output face-vector of size $k'$, we sum over all the maps produced by the different input face-vectors with their multiplicities. The superscripts $m$ and $m'$ indicate that for each input-output pair of copies, the map uses a different set of parameters. The upshot is that input and output multiplicities play a role similar to input and output channels.
Example 3 (Node-adjacency tensors).
This example is concerned with the incidence representation used in the equivariant graph networks of Maron et al. (2018), and derives their model as a special case using our pool/broadcast layers and the face-vector decomposition. For an equivariant layer that maps a node-node-…-node incidence tensor of order $d$ (as outlined in Example 2) to the same structure, the decomposition in terms of face-vectors reads
$$ f \;\cong\; \biguplus_{k'=1}^{d} \biguplus_{m'=1}^{S(d, k')} \; \biguplus_{k=1}^{d} \biguplus_{m=1}^{S(d, k)} \; f^{(m', m)}_{k \to k'}, \qquad (13) $$
where $S(d, k)$ is the Stirling number of the second kind; see Example 2. The total number of pooling-and-broadcasting operations according to Eq. 13 is then given by
$$ \sum_{k=1}^{d} \sum_{k'=1}^{d} S(d, k)\, S(d, k') \sum_{j=0}^{\min(k, k')} \binom{k}{j} \binom{k'}{j}\, j! \;=\; B_{2d}. $$
In the last equality, $B_{2d}$ is the Bell number, which counts the number of unique partitions of a set of size $2d$. To see the logic in this equality: first divide the set of $2d$ elements in half. Next, partition each half into subsets ($k$ for one half, $k'$ for the other), choose $j$ of these subsets from each half, and merge them in pairs. The first two binomial terms count the number of ways we can select $j$ parts among the $k$ (or $k'$) parts of each half, and the $j!$ term accounts for the different ways in which the chosen parts can be aligned. This result agrees with the result of Maron et al. (2018). Therefore one may implement these hypergraph networks using the efficient pooling-and-broadcasting operations outlined in Eq. 10.
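A quick numerical check of this counting argument (our sketch; the Stirling and Bell numbers are computed directly rather than imported):

```python
from math import comb, factorial
from functools import lru_cache

@lru_cache(maxsize=None)
def stirling2(n, k):
    # Stirling number of the second kind: partitions of n elements into k nonempty subsets.
    if n == k:
        return 1
    if n == 0 or k == 0:
        return 0
    return k * stirling2(n - 1, k) + stirling2(n - 1, k - 1)

def bell(n):
    return sum(stirling2(n, k) for k in range(n + 1))

def num_operations(d):
    # Total pooling-and-broadcasting operations for an order-d to order-d equivariant map.
    total = 0
    for k in range(1, d + 1):          # input face size
        for kp in range(1, d + 1):     # output face size
            merges = sum(comb(k, j) * comb(kp, j) * factorial(j)
                         for j in range(min(k, kp) + 1))
            total += stirling2(d, k) * stirling2(d, kp) * merges
    return total

assert [num_operations(d) for d in (1, 2, 3)] == [bell(2), bell(4), bell(6)]  # 2, 15, 203
```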
Recall that when discussing equivariant layers for graphs, we also considered independent permutations of rows and columns in a node-edge incidence matrix, and claimed that despite having only 4 parameters, stacking two such layers (with additional channels) is equivalent to the 15-parameter model. In Appendix E, a similar result is given for higher dimensions, showing that one may use a product of symmetric groups, one per dimension of the incidence tensor, as the symmetry group, in which case the equivariant model has correspondingly fewer parameters.
6 Conclusion
This paper introduces a general approach to learning equivariant models for a large family of structured data through their incidence tensor representation. In particular, we showed various incidence tensor representations for graphs, simplicial complexes, and abstract polytopes. The proposed family of incidence networks is 1) modular: they decompose into simple building blocks; 2) efficient: they all have a linear-time pooling-and-broadcasting implementation; and 3) effective: various members of this family achieve state-of-the-art performance for graphs using a simple architecture.
In our systematic study of this family, we discussed the implications of 1) the added symmetry due to undirected faces; 2) sparsity-preserving equivariant maps; and 3) the successive relaxation of the symmetry group. Here, moving to a larger group simplifies the neural layer by reducing the number of unique parameters (and linear operations), while increasing its bias. Application of incidence networks to different domains, such as learning on triangulated meshes, is a direction that we hope to explore in the future.
Footnotes
 Possibly due to the complexity of the CCN architecture, experiments in Kondor et al. (2018) do not use all attributes in the QM9 dataset, and their results are not comparable to the state-of-the-art.
 We were not able to compare our experimental results to Morris et al. (2018); Maron et al. (2019a) and the results reported in Wu et al. (2018) due to their choice of a larger training split. Moreover, the raw QM9 dataset used by Morris et al. (2018) contains 133,246 molecules, which is 639 fewer molecules than the dataset used in our experiments.
References
 Cormorant: covariant molecular neural networks. arXiv preprint arXiv:1906.04015.
 Geometric deep learning: going beyond Euclidean data. IEEE Signal Processing Magazine 34 (4), pp. 18–42.
 Spectral networks and locally connected networks on graphs. ICLR.
 On the equivalence between graph isomorphism testing and function approximation with GNNs. arXiv preprint arXiv:1905.12560.
 Gauge equivariant convolutional networks and the icosahedral CNN. arXiv preprint arXiv:1902.04615.
 Convolutional neural networks on graphs with fast localized spectral filtering. In Advances in Neural Information Processing Systems, pp. 3844–3852.
 Convolutional networks on graphs for learning molecular fingerprints. In Advances in Neural Information Processing Systems.
 Neural message passing for quantum chemistry. arXiv preprint arXiv:1704.01212.
 Deep models for relational databases. arXiv preprint arXiv:1903.09033.
 Inductive representation learning on large graphs. In Advances in Neural Information Processing Systems, pp. 1024–1034.
 Deep models of interactions across sets. In Proceedings of the 35th International Conference on Machine Learning, pp. 1909–1918.
 Deep convolutional networks on graph-structured data. arXiv preprint arXiv:1506.05163.
 Wavelet scattering regression of quantum chemical energies. Multiscale Modeling & Simulation 15 (2), pp. 827–863.
 Neural message passing with edge updates for predicting properties of molecules and materials. arXiv preprint arXiv:1806.03146.
 Molecular graph convolutions: moving beyond fingerprints. Journal of Computer-Aided Molecular Design 30 (8), pp. 595–608.
 Universal invariant and equivariant graph neural networks. arXiv preprint arXiv:1905.04943.
 Semi-supervised classification with graph convolutional networks. arXiv preprint arXiv:1609.02907.
 Covariant compositional networks for learning graphs. arXiv preprint arXiv:1801.02144.
 Gradient-based learning applied to document recognition. Proceedings of the IEEE 86 (11), pp. 2278–2324.
 Gated graph sequence neural networks. arXiv preprint arXiv:1511.05493.
 Provably powerful graph networks. arXiv preprint arXiv:1905.11136.
 Invariant and equivariant graph networks. arXiv preprint arXiv:1812.09902.
 On the universality of invariant networks. arXiv preprint arXiv:1901.09342.
 Weisfeiler and Leman go neural: higher-order graph neural networks. arXiv preprint arXiv:1810.02244.
 Quantum chemistry structures and properties of 134 kilo molecules. Scientific Data 1, pp. 140022.
 Equivariance through parameter-sharing. In Proceedings of the 34th International Conference on Machine Learning, JMLR: WCP, Vol. 70.
 The graph neural network model. IEEE Transactions on Neural Networks 20 (1), pp. 61–80.
 Quantum-chemical insights from deep tensor neural networks. Nature Communications 8, pp. 13890.
 SchNet – a deep learning architecture for molecules and materials. The Journal of Chemical Physics 148 (24), pp. 241722.
 Symmetries and discriminability in feedforward network architectures. IEEE Transactions on Neural Networks 4 (5), pp. 816–826.
 PhysNet: a neural network for predicting energies, forces, dipole moments and partial charges. arXiv preprint arXiv:1902.08408.
 Graph attention networks. arXiv preprint arXiv:1710.10903.
 MoleculeNet: a benchmark for molecular machine learning. Chemical Science 9 (2), pp. 513–530.
 Representation learning on graphs with jumping knowledge networks. arXiv preprint arXiv:1806.03536.
 Hierarchical graph representation learning with differentiable pooling. In Advances in Neural Information Processing Systems, pp. 4800–4810.
 Deep sets. In Advances in Neural Information Processing Systems.
 An end-to-end deep learning architecture for graph classification. In Thirty-Second AAAI Conference on Artificial Intelligence.