The Role of Isomorphism Classes in Multi-Relational Datasets
Abstract
Multi-interaction systems abound in nature, from colloidal suspensions to gene regulatory circuits. These systems can produce complex dynamics, and graph neural networks have been proposed as a method to extract underlying interactions and predict how systems will evolve. However, the current training and evaluation procedures for these models, which rely on synthetic multi-relational datasets, are agnostic to interaction network isomorphism classes, which produce identical dynamics up to initial conditions. We extensively analyse how isomorphism class awareness affects these models, focusing on neural relational inference (NRI) models, which are unique in explicitly inferring interactions to predict dynamics in the unsupervised setting. Specifically, we demonstrate that isomorphism leakage causes performance in multi-relational inference to be overestimated and that sampling biases present in the multi-interaction network generation process can impair generalisation. To remedy this, we propose isomorphism-aware synthetic benchmarks for model evaluation. We use these benchmarks to test generalisation abilities and demonstrate the existence of a threshold sampling frequency of isomorphism classes for successful learning. In addition, we demonstrate that isomorphism classes can be utilised through a simple prioritisation scheme to improve model performance and stability during training, and to reduce training time.
^{1}Department of Physics, University of Cambridge
^{2}Computer Laboratory, University of Cambridge
^{3}Department of Mathematics, University of Cambridge
{bjd39,cb2015}@cam.ac.uk
1 Introduction
We focus on the task of predicting the dynamics of simple many-body multi-interaction systems, a first step towards scaling to more complex dynamical systems. A variety of approaches have been developed to tackle variants of this problem, including predicting the trajectories of particles given the underlying interaction network Battaglia et al. (2016), learning to simulate complex physics with graph networks Sanchez-Gonzalez et al. (2020), and applying constraints from Lagrangian dynamics to learn a physics model Lutter et al. (2019). We will focus on approaches that both predict trajectories and infer relations in the system: neural relational inference (NRI) Kipf et al. (2018) and factorised neural relational inference (fNRI) Webb et al. (2019). These are unsupervised models which explicitly infer the underlying interactions of a system to predict the resulting dynamics. This structure is akin to an interpretable theory and the predictions it makes, with the aim of being a more valuable research tool compared to inscrutable black-boxes. The investigations conducted will also be relevant to synthetic multi-relational datasets from other settings Sinha et al. (2020).
Although these model architectures are designed around the potential value of explicitly inferring interactions, little attention is paid to the structure of multiplex interaction networks or their sampling distribution in training and evaluation routines. There are many non-obvious results in the field of random graph theory, perhaps the most well-known being the percolation transition where, above a threshold connectivity, it is expected that a single component will come to encompass the entire graph Erdős and Rényi (1959, 1960). Effects such as these can bias the generation of the synthetic data used to train these models, hampering generalisation and causing performance to be overestimated.
A second missing component is the scientific process by which experiments are formulated to test the edge of current theories: areas well within the understood domain provide little new information, whereas regions far beyond our understanding are often too poorly explained to allow insight to be gained from results. Scientific progress is driven by this almost antagonistic relationship between theorists and experimentalists. This is absent from the training procedure of these models, where examples are treated without consideration as to the model’s current performance.
We incorporate these concerns into better synthetic multi-relational dataset generation and a new training procedure in this work. To do this, we first analyse the structure of interaction networks (Section 3), exposing non-intuitive results in the distribution of multiplex isomorphism classes and exploring how generation methods can incur a bias and leak over generated datasets. We demonstrate how these biases impact training and the overestimation of model performance arising due to leakage. We also (tentatively) present a novel fast algorithm for the construction of the set of non-isomorphic interaction networks, with a proof of correctness.
2 Model background
We present a brief overview of the task formulation and state-of-the-art approaches, the Neural Relational Inference (NRI) model Kipf et al. (2018) and its derivative, the Factorised Neural Relational Inference (fNRI) model Webb et al. (2019). We do not make any architectural modifications to the original models.
Problem statement
The primary task is the reconstruction (or evolution) of trajectories of particles in an interacting system, represented as a sequence of feature vectors over particles and time, with neither access to nor supervision from the ground truth interaction network.
Model formulation
Both the NRI and fNRI are formulated as variational autoencoders (VAEs), with observed trajectories being encoded as a latent interaction network that determines the output trajectory evolution for some initial conditions. Architecturally, the models are graph neural networks that use message passing in the encoder and decoder. The encoder embeds each particle’s trajectory then, through a series of vertex-to-edge and edge-to-vertex message passing operations, produces an edge-embedding between each pair of particles. The models differ in the dimensionality and meaning of the edge-embedding: the NRI uses a one-hot vector with a separate edge-type for each interaction and combination of interactions; the fNRI uses a multi-categorical vector in which different edge types exist only for different interactions. The decoder samples the latent interaction network to modulate the message-passing between particles, corresponding to the transmission of force-carrying particles. Figure 1 presents the model diagrammatically.
Though the fNRI outperforms the original NRI, the two models handle the latent interaction network differently, which makes both relevant to our analysis of the impact that interaction network sampling has on performance estimation.
3 Isomorphism analysis
In this section we analyse interaction networks through their isomorphism classes, investigate the sampling biases that arise from standard multi-interaction network generation processes, and show how information can leak between datasets through isomorphisms. We then present the influence of the bias and leakage on performance evaluation. Here we focus our analysis on five particles interacting via ideal-springs, finite-springs, and a charge force in two dimensions, as in the original work Webb et al. (2019).
3.1 Isomorphism classes
The set of possible interaction networks for some combination of interactions can be partitioned into isomorphism classes which, up to initial conditions, result in identical particle dynamics. In this sense the isomorphism classes form the set of ‘unique networks’ that can be generated.
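As an illustration of this partition, the following minimal sketch groups all labelled simple graphs on a small number of nodes into isomorphism classes. It is not the method used in this work: the brute-force canonical form and the function names `canonical_form` and `isomorphism_classes` are assumptions made for the example, and the approach is only practical for very small graphs.

```python
from itertools import combinations, permutations


def canonical_form(edges, n):
    """Return the smallest relabelled edge list over all node permutations.

    Two graphs on n nodes are isomorphic exactly when their canonical
    forms are equal. Brute force: n! relabellings are tried.
    """
    best = None
    for perm in permutations(range(n)):
        relabelled = tuple(
            sorted(tuple(sorted((perm[u], perm[v]))) for u, v in edges)
        )
        if best is None or relabelled < best:
            best = relabelled
    return best


def isomorphism_classes(n):
    """Partition every labelled simple graph on n nodes by canonical form."""
    pairs = list(combinations(range(n), 2))
    classes = {}
    for mask in range(2 ** len(pairs)):
        # Each bit of `mask` switches one possible edge on or off.
        edges = [pair for i, pair in enumerate(pairs) if mask >> i & 1]
        classes.setdefault(canonical_form(edges, n), []).append(edges)
    return classes
```

For five nodes, the 1024 labelled graphs of a single pairwise interaction collapse into 34 classes, which is why treating every labelled network as distinct can be misleading.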
Basis networks
We can simplify our analysis by first considering the isomorphism classes of the base interactions separately, as the multiplex network
Multiplex isomorphism classes
To generate multi-interaction networks we join basis networks together to form a multiplex network. The set of multiplex networks resulting from all the combinations of basis networks, and all the ways of joining them up, can be partitioned into multiplex isomorphism classes. Just as for the basis networks, these can be considered as the meaningfully ‘unique networks’. An example of multiplex isomorphism classes partitioning the interaction networks, generated by joining basis networks together, is shown in Figure 2. For multiplex networks to be isomorphic, it is necessary that the layers are separately isomorphic, and as such we are guaranteed to include all non-isomorphic multiplex networks when considering only the combinations of basis networks.
Fast multiplex isomorphism generation
To understand the sampling distribution over multiplex isomorphism classes we need to generate them. Naively, this can be achieved by generating all possible networks (binary strings over the number of edges), checking that they are multiplex and satisfy force relations, and then performing pairwise isomorphism tests to build groups. We present a new method that exploits the symmetries of the basis networks and the process of combining them.
The key concept is to write the ways of combining basis graphs as permutations of node labels and then make associations between these using automorphisms. We can write all the ways of combining a pair of labelled graphs by keeping the second graph fixed and permuting the nodes in the first. By definition, performing an automorphic transformation on node labels in one layer of the multiplex is undetectable in the other layers, and so the resulting permutation of node labels is isomorphic to the original network. This allows us to construct an equivalence class by applying all basis graph automorphisms, grouping the resulting permutations and further applying the automorphisms to the results to form a closed group. Notably, any automorphism for the overall network must also be an automorphism for every basis graph, and so we do not overlook any transformations. We provide a visualisation of the method in Figure 4, pseudocode in Algorithm 1 and a proof in Appendix A.
Our method is applied on pre-generated automorphisms (a task handled by multiple existing libraries Darga et al. (2008)). To combine a third graph, we can flatten each representative multiplex, where the automorphisms of the flattened graph are given by the automorphisms that exist in both basis networks for the given node pairing (permutation). The flattened graph can then be passed as input itself.
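The grouping of node permutations by automorphisms described above can be sketched as follows for a pair of basis graphs. This is a simplified illustration under stated assumptions rather than a reproduction of Algorithm 1: the automorphisms are found by brute force instead of being pre-generated by a dedicated library, and the helper names (`relabel`, `automorphisms`, `compose`, `multiplex_classes`) are hypothetical.

```python
from itertools import permutations


def relabel(edges, perm):
    """Apply a node permutation to an undirected edge list."""
    return frozenset(frozenset((perm[u], perm[v])) for u, v in edges)


def automorphisms(edges, n):
    """Brute-force automorphism group of a graph on n nodes (assumption:
    in practice these would be supplied by a symmetry-discovery library)."""
    target = relabel(edges, tuple(range(n)))
    return [p for p in permutations(range(n)) if relabel(edges, p) == target]


def compose(p, q):
    """Permutation that applies q first and then p."""
    return tuple(p[i] for i in q)


def multiplex_classes(g1, g2, n):
    """One representative permutation per multiplex isomorphism class.

    The second graph is kept fixed and the first is relabelled by a
    permutation pi. Permutations of the form b * pi * a, with a an
    automorphism of g1 and b an automorphism of g2, give isomorphic
    multiplex networks, so they are grouped together.
    """
    aut1, aut2 = automorphisms(g1, n), automorphisms(g2, n)
    unseen = set(permutations(range(n)))
    representatives = []
    while unseen:
        pi = min(unseen)
        orbit = {compose(b, compose(pi, a)) for a in aut1 for b in aut2}
        unseen -= orbit
        representatives.append(pi)
    return representatives
```

For example, combining a four-node path in one layer with a single edge in the other gives four representatives, one for each inequivalent position of the edge relative to the path. No pairwise isomorphism tests between multiplex networks are needed, which is the saving over the naive approach.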
3.2 Sampling biases and data leaks
Given that we are now able to efficiently generate the set of non-isomorphic multiplex networks for a group of interaction types, we turn our attention to the sampling distribution induced by different generation methods and their effects on model performance and evaluation.
Not all networks are created equally
Kipf et al. (2018) generate interaction networks with Bernoulli sampling over edges for pairwise interactions and Bernoulli sampling over nodes for collective interactions, a process that is inherited by Webb et al. (2019). Sampling graphs with a Bernoulli distribution on the presence of edges is commonly known as Erdős–Rényi (ER) sampling and we will refer to this generation procedure as OriginalER. The total number of edges or interacting-nodes follows a binomial distribution and there is a second bias arising for pairwise interactions from their arrangement, as shown in Figure 5. We also consider a second generation method where basis network isomorphism classes are sampled uniformly, UniformBasis, that removes the arrangement bias. By propagating the distribution these methods induce over basis network sampling frequencies, we can produce the relative frequency of the full multiplex network isomorphism classes, shown in Figure 6. We find strong sampling biases exist for both methods, with the most-to-least-likely ratio exceeding 100:1 in each case.
We compare the performance of models trained on training sets generated by ER sampling and by uniform sampling of the multiplex isomorphism classes; the resulting datasets will be referred to as TrainER and TrainUniform respectively. The latter removes all sampling biases on the multiplex isomorphism classes. The validation and test sets are generated by uniform sampling of the multiplex isomorphism classes, and are identical between both datasets. The results in Table 1 show that ER bias reduces the performance of both models.
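The two biases of ER sampling on a single pairwise basis network can be made concrete with a small exact calculation. The sketch below is illustrative only and covers one layer, not the full multiplex propagation described above; the function names and the brute-force canonical form are assumptions of the example. Each labelled graph with m of the M possible edges has probability p^m (1 - p)^(M - m), and the probability of an isomorphism class is the sum over its labelled members.

```python
from collections import Counter
from fractions import Fraction
from itertools import combinations, permutations


def canonical_form(edges, n):
    """Brute-force canonical form: smallest relabelled edge list."""
    return min(
        tuple(sorted(tuple(sorted((perm[u], perm[v]))) for u, v in edges))
        for perm in permutations(range(n))
    )


def er_class_probabilities(n, p):
    """Exact probability of each isomorphism class under ER sampling."""
    pairs = list(combinations(range(n), 2))
    probabilities = Counter()
    for m in range(len(pairs) + 1):
        # Binomial bias: every labelled graph with m edges has this weight.
        weight = p ** m * (1 - p) ** (len(pairs) - m)
        for edges in combinations(pairs, m):
            # Arrangement bias: classes with more labelled members
            # accumulate more probability mass.
            probabilities[canonical_form(edges, n)] += weight
    return probabilities


probs = er_class_probabilities(4, Fraction(1, 2))
ratio = max(probs.values()) / min(probs.values())
```

With four nodes and p = 1/2 every labelled graph is equally likely, yet `ratio` is 12, because the most common class has twelve labelled members while the empty and complete graphs have one each. Uniform sampling over classes removes this effect by construction.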
Isomorphism leakage
Conventionally, training, validation and test sets are disjoint. Naive generation of the interaction networks, however, will almost certainly result in some multiplex isomorphism classes being present in the different sets, i.e. leaking to the test set. For two datasets $D_1$ and $D_2$, with each datum $d$ generated by some latent graph $G(d)$, we say that there is isomorphism leakage between $D_1$ and $D_2$ if there exist $d_1 \in D_1$ and $d_2 \in D_2$ where $G(d_1)$ and $G(d_2)$ are isomorphic. Neither Kipf et al. (2018) nor Webb et al. (2019) claim to control for this possibility, and identical examples are not present across their splits, as initial conditions vary. However, by controlling for this facet of variability in isolation we find significant changes in judged model performance. We adopt the exact training scheme used by Webb et al. (2019)
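A check for isomorphism leakage between two splits can be sketched as below. This is a hypothetical helper and not part of the original codebases: each multiplex network is assumed to be given as a tuple of per-layer edge lists over the same node labels, and a brute-force canonical form is used, which is adequate for five particles but would not scale to large graphs.

```python
from itertools import permutations


def multiplex_canonical_form(layers, n):
    """Smallest joint relabelling of all layers under one node permutation.

    The same permutation is applied to every layer, so two multiplex
    networks share a canonical form exactly when they are isomorphic.
    """
    return min(
        tuple(
            tuple(sorted(tuple(sorted((perm[u], perm[v]))) for u, v in layer))
            for layer in layers
        )
        for perm in permutations(range(n))
    )


def leaked_classes(train, test, n):
    """Canonical forms of the isomorphism classes present in both splits."""
    train_classes = {multiplex_canonical_form(g, n) for g in train}
    test_classes = {multiplex_canonical_form(g, n) for g in test}
    return train_classes & test_classes


# Two layers (for instance spring and charge) on five nodes.
train = [([(0, 1)], [(1, 2)])]
leaky_test = [([(3, 4)], [(2, 4)])]   # same structure, different labels
clean_test = [([(0, 1)], [(2, 3)])]   # the two edges do not share a node
```

Here `leaked_classes(train, leaky_test, 5)` contains one class even though no labelled network is repeated, whereas `leaked_classes(train, clean_test, 5)` is empty.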
4 Model testing
In light of the previous results, there is a need for a standardised and reproducible isomorphism-aware benchmarking framework to evaluate model performance Dwivedi et al. (2020). In this section we propose multiple benchmarks and utilise them to analyse the fNRI. We test for generalisability, focusing on the evaluation of flexibility and robustness over ‘skill’ Chollet (2019). We also investigate how the training set distribution affects performance, including varying the sampling frequency of isomorphism classes in the training set.
Measuring generalisation
Considering the uniqueness of interaction networks, we can associate testing on isomorphism classes seen during training with the transductive setup Yang et al. (2016), where the same graph is used in both contexts, and testing on unseen classes with the inductive setup. We can further associate two kinds of generalisation with these cases: to different initial conditions (Con) in the transductive case, and to different interaction networks (Iso) in the inductive case.
Here we focus on the ideal-spring and charge interactions for five particles, as the number of multiplex isomorphism classes is small (454). The original work Kipf et al. (2018) used what will be referred to as the OriginalER dataset for ideal-spring and charge interactions. This has the same structure as in Section 3.2, which also included finite-springs. To test Con and Iso generalisation, and also to compare our results with the OriginalER dataset, we propose the Con dataset and Iso dataset respectively. We also investigate both types of generalisation together using the ConIso dataset. The Con111 dataset contains [454, 454, 454] multiplex isomorphism classes (all of them) in the training, validation and test sets, with [111, 22, 22] initial conditions (the same set for each multiplex isomorphism class). The Iso155 dataset partitions the multiplex isomorphism classes between the training, validation and test sets such that they do not overlap, each with the same set of 155 initial conditions. The ConIso dataset also has the same structure, but all the initial conditions are different. In these datasets the number of initial conditions is chosen such that the total number of trajectories closely matches that of the OriginalER dataset. A summary of these datasets and the results are shown in Tables 3 and 4 respectively.
The fNRI performs well on the Con111 and Iso155 datasets and demonstrates generalisation to different initial conditions and to different multiplex isomorphism classes separately. Perhaps counterintuitively, performance on the Iso155 dataset is actually better than on the Con111 dataset. A plausible explanation is that repeatedly observing the same interaction networks applied to different initial conditions provides a stronger learning signal, which in turn enables superior performance and generalisation. The ConIso dataset demonstrates good performance for the mean-squared error, but has a much lower encoding accuracy. This may be due to the model learning an implicit encoding of the trajectories that isn’t interaction network based, as opposed to the explicit one which the encoding accuracy measures.
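The difference between the Con and Iso constructions can be sketched as index bookkeeping over (isomorphism class, initial-condition seed) pairs. The sketch below is an assumption-laden illustration: the function names are hypothetical, integer seeds stand in for sampled initial conditions, and the split fractions passed to `iso_split` are placeholders, since the partition sizes of the Iso dataset are not stated here.

```python
import random


def con_split(class_ids, n_train, n_val, n_test):
    """Con-style split: every class appears in every split, and the splits
    differ only in their (disjoint) initial-condition seeds."""
    seeds = list(range(n_train + n_val + n_test))
    parts = (
        seeds[:n_train],
        seeds[n_train:n_train + n_val],
        seeds[n_train + n_val:],
    )
    return [[(c, s) for c in class_ids for s in part] for part in parts]


def iso_split(class_ids, fractions, n_conditions, rng):
    """Iso-style split: classes are disjoint across splits, and the same
    initial-condition seeds are reused in each split."""
    shuffled = list(class_ids)
    rng.shuffle(shuffled)
    cut1 = int(fractions[0] * len(shuffled))
    cut2 = cut1 + int(fractions[1] * len(shuffled))
    parts = (shuffled[:cut1], shuffled[cut1:cut2], shuffled[cut2:])
    return [[(c, s) for c in part for s in range(n_conditions)] for part in parts]


con_train, con_val, con_test = con_split(range(454), 111, 22, 22)
# Placeholder fractions, chosen only for illustration.
iso_train, iso_val, iso_test = iso_split(
    range(454), (0.7, 0.15, 0.15), 155, random.Random(0)
)
```

With 454 classes and 111 training initial conditions, `con_train` holds 454 × 111 = 50,394 trajectories. Performance on a Con test set measures generalisation to new initial conditions only, while performance on an Iso test set measures generalisation to unseen interaction networks only.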
Few-shot learning
Though state-of-the-art performance in many machine learning tasks is achieved with large amounts of labelled data, there are many domains in which it is impractical or overly costly to gather sufficient data. In such a setting, few-shot learning algorithms can be employed to make best use of what is available Wang et al. (2019). The effect of the frequency of occurrence of each multiplex isomorphism class in the training set is explored in this section to investigate the fNRI’s capacity for few-shot learning. The Con$n$ datasets are used, which have the same structure as the Con111 dataset, where each multiplex isomorphism class is seen $n$ times per epoch. Figure 7 shows how model performance improves as the number of initial conditions, $n$, is increased.
As expected, the performance of the fNRI increases and then plateaus as the number of initial conditions increases. There is a rise in performance at around 20 initial conditions, which is associated with the increase in encoding accuracy. This shows that there is a threshold sampling frequency of isomorphism classes for successful learning. Curiously, there is a large drop in the mean-squared error below the threshold frequency. Again, we believe that the model may be learning to encode an implicit representation of the interaction network, which allows for the prediction of the few trajectories that are present in the smaller datasets with a lower number of initial conditions. Once we pass the threshold frequency, it may be that the explicit representation is required to capture the entire dataset, hence the increase in encoding accuracy and consequently the decrease of the mean-squared error.
5 Prioritised sampling
In this section, we use a simple prioritisation scheme to analyse the benefits isomorphism-awareness can have on training speed and final performance. The likelihood of selecting an example for training is proportional to the exponentially weighted average of the historic model error on that sample. The historic error can also be grouped by multiplex isomorphism class. The performance of unprioritised and prioritised sampling with and without grouping is shown in Figure 8.
Prioritised sampling in the fNRI reliably shifts the learning curve to lower epochs, so that the model learns in fewer epochs. Without grouping by multiplex isomorphism classes, the fNRI converges to lower performance. Prioritised sampling also improves performance in the NRI, again to a lesser extent without grouping. This demonstrates that isomorphism-awareness can be used to improve model performance.
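A minimal sketch of such a scheme is given below. The class name `PrioritisedSampler`, the decay factor and the initial error value are assumptions made for illustration and are not taken from the experiments reported here. Passing a distinct group per example gives ungrouped prioritisation, while passing each example's multiplex isomorphism class gives the grouped variant.

```python
import random


class PrioritisedSampler:
    """Sample examples with probability proportional to an exponentially
    weighted average of their (group's) historic error."""

    def __init__(self, group_of, decay=0.9, initial_error=1.0):
        # group_of[i] is the group (e.g. isomorphism class) of example i.
        self.group_of = list(group_of)
        self.decay = decay
        self.average_error = {g: initial_error for g in set(self.group_of)}

    def update(self, index, error):
        """Fold a newly observed error into the running average of the group."""
        group = self.group_of[index]
        previous = self.average_error[group]
        self.average_error[group] = (
            self.decay * previous + (1 - self.decay) * error
        )

    def weights(self):
        """Unnormalised selection weight of every example."""
        return [self.average_error[g] for g in self.group_of]

    def sample(self, k, rng):
        """Draw k example indices with replacement."""
        return rng.choices(range(len(self.group_of)), weights=self.weights(), k=k)


# Three examples; the first two share an isomorphism class.
sampler = PrioritisedSampler(group_of=[0, 0, 1], decay=0.5)
sampler.update(2, 0.0)   # example 2 is now well predicted
sampler.update(0, 3.0)   # example 0 is poorly predicted
batch = sampler.sample(4, random.Random(0))
```

After these updates the weights are `[2.0, 2.0, 0.5]`: the error observed on example 0 also raises the priority of example 1, because the two share a class. This sharing is the sense in which grouping lets experience on one trajectory inform the sampling of others with the same interaction network. In practice a small positive floor on the weights may be needed so that they never all reach zero.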
6 Conclusions
We have analysed multiplex isomorphism classes in the context of learning to model multi-interaction systems. On the basis of our analysis, we have shown that the performance of models on this task has been overestimated, particularly with regards to generalisation. To remedy this problem we have proposed and evaluated new benchmarking datasets. Through experiments with these new benchmarks we show under what conditions neural relational inference models can be expected to learn and generalise well. We also present results on prioritised sampling in a training procedure that parallels the scientific process. Finally, we have presented, and proven, an efficient new method for generating multiplex isomorphism classes for this context that makes further work in this area practical and accessible.
Ethics Statement
Our work is concerned primarily with foundational results in graph theory and their implications for training and evaluation for similarly foundational problems in systems of interacting particles. For this reason we consider there to be few foreseeable broader impacts, though we address the potential application of these models to human networks.
The NRI Kipf et al. (2018) presents results applying the model to the motion of basketball players and it seems reasonable to consider whether this could be extended to generic motion of people. Firstly, the application is more narrow than it first appears, predicting motion during an artificially constrained phase of play (a pick-and-roll), and the model is only weakly able to reconstruct player trajectories even in this scenario. Secondly, there is an unresolved scaling issue with the current system, which requires a relation to be considered for every pair of individuals (limiting application to larger groups). Thirdly, it is unclear how the data collection to enable this application could be performed without also having the infrastructure to make it redundant: if you have high quality segmented overhead video footage of citizens, why do you need a model to tell you how they will move?
A second consideration may be that the model could be adopted to track how individuals interact and ‘move’ online. Whilst it would be interesting to investigate whether these models can be used in a discrete, non-Euclidean space, current work is limited to particles moving in a continuous, low-dimensional, Euclidean space only, and it is far from obvious how to solve key challenges to adapting to this new task.
Appendix A Proof for the multiplex graph isomorphism algorithm
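To show the algorithm works, we need to show that, given one representative of an isomorphism class, it generates all isomorphic layered graphs.
Suppose we have graphs $G_1, \dots, G_k$ as our basis graphs, together with embeddings $\phi_i$ of their vertex sets into a common vertex set. We can identify this common vertex set with a fixed reference vertex set $V$, and choose to do so. We can therefore assume we have bijections $\phi_i : V(G_i) \to V$ by postcomposing with this identification.
Suppose that the bijections $\psi_i : V(G_i) \to V$ give a layered graph isomorphic to the one given by the $\phi_i$. Let $\sigma : V \to V$ be an isomorphism which witnesses that the layered graphs are isomorphic, i.e. in each layer $i$ there is an edge between $\phi_i(u)$ and $\phi_i(v)$ iff there is an edge between $\sigma\phi_i(u)$ and $\sigma\phi_i(v)$ in the corresponding layer given by $\psi_i$. Since all of these maps are bijections, we deduce that $\psi_i^{-1} \sigma \phi_i$ exists and is an automorphism of $G_i$ for each $i$. Call this map $\alpha_i$.
Note that $\psi_i \alpha_i = \sigma \phi_i$. Returning to the maps, we want to transform the $\phi_i$ into the $\psi_i$ by postcomposition with a common isomorphism of $V$, and precomposition by automorphisms of the $G_i$. Using our previous map, we observe $\psi_i = \sigma \phi_i \alpha_i^{-1}$. This has the same form as in our algorithm, and hence we must obtain every isomorphic embedding.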
Appendix B Other investigations
In this section we present other investigations that were conducted on the fNRI using isomorphism-aware benchmarks. This includes identifying which interaction types are most important for training, the effect of not including all isomorphism classes in the training set, and measuring generalisation for three interactions (as opposed to two interactions in Section 4).
B.1 Training essentials
The importance of each interaction type in the training set for the fNRI on the ideal-spring and charge dataset is investigated here. The datasets used are:
Extrapolate Charges to High (XCH)
Extrapolate Charges to Low (XCL)
Interpolate Charges (IC)
Extrapolate Springs to High (XSH)
Extrapolate Springs to Low (XSL)
Interpolate Springs (IS)
Using the XCH dataset as an example, the interaction networks are split into high charge and low charge groups. The training set consists of the low charges. The validation and test sets consist of the high charges. Each has [50, 22, 22] initial conditions. The same logic and number of initial conditions apply to the other datasets.
Dataset | MSE20 | Accuracy | ISpring | Charge |
XSH | 96.48 ± 64.26 | 0.567 ± 0.288 | 0.688 ± 0.181 | 0.779 ± 0.250 |
XSL | 105.40 ± 38.21 | 0.568 ± 0.278 | 0.724 ± 0.187 | 0.749 ± 0.191 |
IS | 99.33 ± 9.29 | 0.380 ± 0.118 | 0.636 ± 0.082 | 0.563 ± 0.068 |
XCH | | | | |
XCL | | | | |
IC | | | | |
The results in Table 5 show that the fNRI has comparable performance for the spring datasets, and performs the best on the XCL dataset for the charges. Training on high charges seems to allow for better generalisation to lower charges whereas there seems to be no preference for the springs. To gain insight into these results, we identify the ‘difficulty’ of each interaction type. To do this we trained the fNRI on the Con111 dataset and partitioned the test set by interaction type. The results are shown in Figure 9.
According to the reconstruction error (MSE20), the most difficult interactions are the high charge interaction networks, which seem to become easier with no springs and with high springs. This is expected for the no spring case, and the high spring (purely attractive) case may be explained by the clumping of particles, which may cause the fNRI to ‘cheat’ and predict the centre of mass motion of the particles. Besides this, the spring difficulty seems to be roughly constant for each number of charge-edges. The difficulty, according to the encoding accuracy, is highest for high charges and low springs. This may explain the results on the extrapolation/interpolation datasets. According to the reconstruction error, the training set ‘difficulty’ should be around the same for the spring datasets, whereas the XCL dataset should have the most difficult training set. This may suggest that training on the interactions the model finds most difficult may generalise better to easier interactions, and not the other way around. Note that the fNRI only has access to the reconstruction error (in the loss function) and not the encoding accuracy.
B.2 The effect of subsampling
In this section we consider the effect of not including all the possible multiplex isomorphism classes in the training set, for ideal-spring and charge interactions. We compare the performance on the Con datasets, which contain all the multiplex isomorphism classes, with the performance on the SubCon datasets. These remove some multiplex isomorphism classes from the training set of the Con dataset such that there are [324, 454, 454] multiplex isomorphism classes. The validation and test sets are identical between these two datasets.
The SubCon datasets generally show the same behaviour as the Con datasets. Performance on them is worse, but eventually catches up to the performance on the Con datasets. The threshold frequency of learning has been shifted from around 20 initial conditions to around 40 initial conditions. This suggests that subsampling increases the threshold frequency of learning.
B.3 Measuring generalisation for three interactions
Here we focus on the ideal-spring, charge and finite-spring interactions for five particles, as opposed to the case for ideal-spring and charge interactions in Section 4. We use the Con, Iso and ConIso datasets as before, but for ideal-spring, charge and finite-spring interactions. In this case, the number of multiplex isomorphism classes is large (over 250,000) and we are no longer constrained to using just 454 of them, as in the case for ideal-spring and charge interactions. We keep the same structure for the Con and Iso datasets, but for the ConIso dataset we use [454, 454, 454] interaction networks, all from different multiplex isomorphism classes. A summary of the results is shown in Table 6.
The results are qualitatively similar to the results for the ideal-spring and charge interactions in Section 4, with the fNRI performing the best on the Iso155 dataset and the worst on the Con111 dataset. This shows that the fNRI generalises better to different graphs, compared to different initial conditions. Again, a plausible explanation is that repeatedly observing the same interaction networks applied to different initial conditions provides a strong learning signal.
Footnotes
 To the best of our knowledge, following a thorough literature review and consultation with domain experts, the algorithm and proof are original work, though we welcome any suggestions of prior art.
 A multiplex network is a vertex-aligned multilayer graph. A vertex exists in every layer and is only connected to itself across layers.
 Both the original NRI and fNRI have made their codebases publicly available, greatly enabling this work.
References
 Battaglia et al. (2016). Interaction networks for learning about objects, relations and physics. Neural Information Processing Systems, pp. 4502–4510.
 Chollet (2019). The measure of intelligence.
 Darga et al. (2008). Faster symmetry discovery using sparsity of symmetries. In 2008 45th ACM/IEEE Design Automation Conference, pp. 149–154.
 Dwivedi et al. (2020). Benchmarking graph neural networks.
 Erdős and Rényi (1959). On random graphs. Publicationes Mathematicae, 6(26), pp. 290–297.
 Erdős and Rényi (1960). On the evolution of random graphs. Publ. Math. Inst. Hung. Acad. Sci, 5(1), pp. 17–60.
 Kipf et al. (2018). Neural relational inference for interacting systems.
 Lutter et al. (2019). Deep Lagrangian networks: using physics as model prior for deep learning.
 Sanchez-Gonzalez et al. (2020). Learning to simulate complex physics with graph networks.
 Sinha et al. (2020). Evaluating Logical Generalization in Graph Neural Networks.
 Wang et al. (2019). Generalizing from a Few Examples: A Survey on Few-Shot Learning. arXiv e-prints, arXiv:1904.05046.
 Webb et al. (2019). Factorised neural relational inference for multi-interaction systems.
 Yang et al. (2016). Revisiting Semi-Supervised Learning with Graph Embeddings. 33rd International Conference on Machine Learning, ICML 2016, 1, pp. 86–94.