Identifying symmetries in data sets is generally difficult, but knowledge about them is crucial for efficient data handling. Here we present a method by which neural networks can be used to identify symmetries. We make extensive use of the structure in the embedding layer of the neural network, which allows us to identify whether a symmetry is present and to identify orbits of the symmetry in the input. To determine which continuous or discrete symmetry group is present, we analyse the invariant orbits in the input. We present examples based on rotation groups $SO(n)$ and the unitary group $SU(2)$. Further, we find that this method is useful for the classification of complete intersection Calabi-Yau manifolds, where it is crucial to identify discrete symmetries on the input space. For this example we present a novel data representation in terms of graphs.

LMU-ASC 11/20


Detecting Symmetries with Neural Networks

Sven Krippendorf, Marc Syvaeri

Arnold Sommerfeld Center for Theoretical Physics

Ludwig-Maximilians-Universität

Theresienstraße 37

80333 München, Germany

Max-Planck-Institut für Physik

Föhringer Ring 6

80805 München, Germany

1 Introduction

One ubiquitous feature in nature is the presence of symmetries, ranging from the ultra-small, captured by the symmetries underlying the Standard Model of Particle Physics, to the isotropy and homogeneity of our Universe on cosmological scales, and extending to everyday life, for example when one wants to identify objects in a picture with a neural network. The question we pursue in this paper is: Can we use neural networks to detect symmetries in an underlying data product?

We present a method which is suitable for data questions where we have samples of a function of the input variables. This situation is present in supervised learning. The presence of a symmetry is simply the statement that inputs which are related by some symmetry transformation lead to the same output.

The key idea which we utilise to find symmetries is the fact that inputs which are related by a symmetry are clustered together in the embedding space (i.e. the second to last layer in our neural networks). As a first step, this reveals the presence of symmetries. Effectively, this is rather similar to word embeddings found in word2vec [1], which have also been utilised to identify similarities between chemical elements [2]. By analysing the relation of the points in the input space we are then able to identify the nature of the symmetry, i.e. we determine the generators of the symmetries.

We test this method on artificial datasets with an underlying rotational group $SO(n)$ and show how we can identify a unitary group (here: $SU(2)$) and distinguish it from larger symmetry groups (here: $SO(4)$). To show the applicability of the identification of generators in higher dimensional datasets (e.g. images), we discuss how we can identify $SO(2)$ in the context of rotated MNIST data.

We use this method in the context of the classification of consistent vacua in string theory. Finding distinct ways to obtain string vacua is a crucial step in improving our understanding of string theory as a theory of quantum gravity. One aspect is the classification of consistent string backgrounds, in particular Calabi-Yau manifolds (CYs). To obtain a classification one needs to remove redundancies arising from multiple representations of the same manifold. We apply our method to the case of complete intersection Calabi-Yau manifolds (CICYs). Utilising a novel representation in terms of graph networks, we perform the supervised classification task for two topological invariants, the Hodge numbers $h^{1,1}$ and $h^{2,1}$. When analysing the embedding layer, we are able to re-identify the known identities in the dataset.

The rest of the paper is organised as follows. In Section 2 we describe how symmetries can be found in the embedding layer. We then examine the orbits in the input layer to identify the underlying symmetry in Section 3, before presenting our conclusions.

2 Finding Symmetries

In this section we present a method to identify previously ‘unknown’ symmetries in a dataset by examining the clustering behaviour in the embedding layer. We study this method on two types of examples: continuous and discrete symmetries.

In the first part, we discuss two examples based on real and complex-valued functions. For this we take the Mexican hat potential in two dimensions, which features an $SO(2)$-symmetry, and an $SU(2)$-invariant superpotential (holomorphic function). The procedure to find symmetries is as follows: within these potentials, we define classes, each specified by a value of the potential. This enables us to construct a classification problem.1 We train our network to address this classification task and examine the representation in the embedding layer. This reveals that the representation distinguishes between points connected via the symmetry and points not connected but still in the same class. Roughly speaking, the network clusters points related by the symmetry, and in the embedding layer there is a gap to the other points in the class.

In the second part, we study discrete symmetries in the context of the classification of CICYs in three dimensions. We take multiple representatives of each manifold, and train the network to classify some topological invariants, the Hodge numbers $h^{1,1}$ and $h^{2,1}$. Again, by analysing the structure of the embedding layer, we are able to identify finer grained classes compared to the trained classes. These finer grained classes consist of different representatives of the respective CICY manifold. The neural network must therefore use other quantities which it is not trained on.

Depending on the dimension of the embedding space, we use a dimensional reduction with TSNE [3] to be able to plot the data points and to visualise their structure.

This identification of a symmetry in the dataset is then used in a second step to construct the generators associated with this symmetry. This is discussed in Section 3 and this step allows us to identify the underlying symmetry.

2.1 Continuous Symmetries


We start with a two-dimensional function with an underlying $SO(2)$-symmetry:


where we use for our numerical experiments. Here, two types of points appear: Points with the same value for the potential (1) which are related by a symmetry transformation and points which are not related by a symmetry. Examples of such points can be found in the plot of the potential shown in the right panel of Figure 1.

We formulate our classification problem as follows: we define 11 classes for the function where the values of these classes are as follows:


Then we sample points by randomly picking values for and , and checking whether they belong to one of the classes. For training, we use balanced training sets with representatives per class. We train a simple network consisting of dense layers with hidden units with ReLU activation and a final layer with dimensional softmax output activation.2 We use the categorical cross-entropy loss with the Adam optimiser.3 We train our network on this classification task to a reasonable training accuracy (above 95 percent).4 We then visualise the representation in the embedding layer by applying TSNE to this -dimensional data set; the result can be found in Figure 1.


Figure 1: Left: This shows the TSNE-representation (perplexity of ) of the embedding layer. Each colour represents one class. For several classes, we can directly see two distinct point clouds. Right: This shows the plot of the Mexican hat potential where we highlight the classes using the same colour coding as in the left panel. Here, we can directly match points with multiple clusters and disconnected TSNE components.

Looking at a specific class, one can directly see that the separating property is the norm of the point. To be precise, points whose norm is larger than the norm at the minimum of the potential are separated from points with smaller norm. In Figure 1, several of these classes clearly split into two regions, whereas classes with elements from only ‘one’ radius do not split.
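The splitting of a class into two radii can be reproduced with a small numerical sketch. The potential form $V = (r^2 - a)^2$ with $a = 1$, the class value $0.5$, the bin width and the sampling box are all illustrative assumptions; they are not the parameters used in our experiments.

```python
import numpy as np

def mexican_hat(x, y, a=1.0):
    """Hypothetical Mexican hat form V = (r^2 - a)^2 (assumed for illustration)."""
    r2 = x**2 + y**2
    return (r2 - a) ** 2

def sample_class(target, width=0.01, n_samples=200_000, box=1.5, seed=0):
    """Draw uniform points in a square and keep those whose
    potential value lies within `width` of the class value `target`."""
    rng = np.random.default_rng(seed)
    pts = rng.uniform(-box, box, size=(n_samples, 2))
    v = mexican_hat(pts[:, 0], pts[:, 1])
    return pts[np.abs(v - target) < width]

# One class with a value below V(0) = 1 contains points at two radii.
points = sample_class(target=0.5)
radii = np.linalg.norm(points, axis=1)
r_min = 1.0  # radius of the minimum for a = 1
inner = radii[radii < r_min]   # points inside the minimum
outer = radii[radii > r_min]   # points outside the minimum
print(len(inner), inner.mean(), len(outer), outer.mean())
```

Points within `inner` (or within `outer`) are related by a rotation, while a point in `inner` and a point in `outer` share the class label without being related by the symmetry. This is the distinction that shows up as two clusters in the embedding layer.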


We now demonstrate the method on an example with an $SU(2)$-symmetry. To do this we examine the following complex-valued function


where and transform in the fundamental and anti-fundamental representation of $SU(2)$ respectively. Such holomorphic functions appear for instance in supersymmetric field theories and are referred to as superpotentials. Here we are interested in finding the symmetries in this superpotential. In addition to the $SU(2)$ symmetry, this superpotential has two independent scaling symmetries:


where . However, we check that orbits of these symmetries are not present in our datasets.

Proceeding as before, we first sample points for the superpotential and categorise them according to their outputs. We have one classification with 11 class labels for the real part and one classification for the imaginary part. We choose the following numerical ranges, which are symmetric around zero:


With this classification we cover the entire output range in the open subset. Again, we sample the points by randomly picking values for and , and checking whether their real and imaginary part both belong to one of these classes. As in the previous case, we train a simple network consisting of dense layers with neurons and ReLU activation, followed by two -dimensional dense layers with softmax activation. As before, we use the categorical cross-entropy loss for each of these output layers with the Adam optimiser. For training we use a balanced set with representatives per class and we terminate training at an accuracy of slightly above percent. Again, we visualise the structure of the -dimensional embedding layer by applying TSNE and show the resulting two-dimensional space in Figure 2.

Figure 2: Left: This is a TSNE-projection of the -dimensional embedding space (perplexity ). The coloured dots mark the same classes as highlighted on the right hand side. Gray dots denote the other points in the embedding. Right: invariant quantity Most classes have two distinct representatives but some only have one. For instance, the yellow and light orange class have a single invariant. In the embedding layer there are no distinct clusters for these points unlike for the other points.

In this projection, it is tedious to find different regions as a consequence of having different classes. We highlight some examples of the separation in the point clouds in Figure 2 with one and two distinct representatives respectively. This can be seen by computing the invariant quantity of (where and ), for which we find two different values for most of our classes. Once again, the latent representation reveals the symmetry structure of the problem. As a consistency check we find that no such structure is observed on the input data.

2.2 Discrete Case: Identifying distinct string theory vacua

After these warm-up exercises we now discuss an example where finding the symmetries in a dataset is crucial to answering a question in mathematical physics: How many distinct vacua of string theory can be constructed in a particular class of string models?

Knowing the distinct ways in which one can obtain string vacua is crucial for our understanding of string theory as a theory of quantum gravity. One sub-question concerns the classification of consistent background geometries for string theory, in particular CY-manifolds [5].

CICYs provide an interesting class of such backgrounds: their classification has been achieved in three and partially in four dimensions [6, 7] and models on such spaces are among the most realistic string vacua constructions to date [8, 9]. The initial enumeration features many representations which are related by a priori unknown symmetries. Although they have been identified in a heroic effort for three and four dimensions, it is unknown what the symmetries are in higher dimensions. The knowledge of these symmetries is necessary in order to tackle the combinatorial complexity of the initial enumeration which renders a classification in higher dimensions currently unfeasible.

CICYs are realized as complete intersections in products of complex projective spaces whose classical description we now review (cf. [10] for more details).

Construction – classical description

A CICY can be described by its configuration matrix which, for instance, can look like this

The notation is to be understood as follows: The first column of the matrix denotes the dimensions of the projective spaces; here our space is the product space . The other columns encode the information on the polynomials which define the hypersurface in the ambient product space. The entries in a given column refer to the multi-degrees in the corresponding projective space. The CICY is defined as the common zero locus of these polynomials. To write the polynomials explicitly for this example, we have to define the coordinates of each space: is denoted with where , the coordinates by with , and for we have with . The polynomials can be written as (before imposing any scaling of the projective spaces):

where and are complex coefficients. Therefore, the configuration matrix describes a family of CICYs parametrised by the space of the coefficients. Many basic properties do not depend on the explicit form of the polynomials, but only on the configuration matrix (for example, the Euler characteristic depends on the configuration matrix rather than on the explicit polynomials). This feature is the strength of this notation, and one of the motivations to introduce it. For the hypersurface to be a CY-manifold, the rows have to satisfy the following relation between the degree of the projective factor and its appearance in all polynomials:


Restricting to manifolds of fixed complex dimension leads to the constraint on the number of projective components


where denotes the number of equations. In combination with the observation that a factor with a quadratic constraint is redundant, it can then be shown that there is only a finite number of such configuration matrices [11]. In [6] of such matrices were singled out for the case of threefolds, utilising some additional identities which are discussed below. This dataset can be found online [12]. In [13] it was pointed out that of these matrices are redundant and describe the same CICY. For fourfolds configuration matrices were obtained in [7] and in higher dimensions the corresponding sets of configuration matrices are unknown. In the following we focus on the case of three-folds.
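As a small illustration of the two conditions above, the sketch below checks them for a given configuration matrix. It assumes the standard form of the conditions: each row of multi-degrees sums to the dimension of its projective factor plus one, and the sum of the projective dimensions minus the number of polynomials gives the complex dimension. The function names and the argument layout (`dims` for the first column, `degrees` for the remaining columns) are illustrative choices.

```python
import numpy as np

def is_calabi_yau(dims, degrees):
    """Row condition: the degrees in row i sum to dims[i] + 1."""
    dims = np.asarray(dims)
    degrees = np.atleast_2d(degrees)
    return bool(np.all(degrees.sum(axis=1) == dims + 1))

def complex_dimension(dims, degrees):
    """Sum of projective dimensions minus the number of polynomials."""
    degrees = np.atleast_2d(degrees)
    return int(np.sum(dims) - degrees.shape[1])

# Quintic hypersurface: one degree-5 polynomial in P^4.
print(is_calabi_yau([4], [[5]]), complex_dimension([4], [[5]]))
# Bicubic: one polynomial of bi-degree (3, 3) in P^2 x P^2.
print(is_calabi_yau([2, 2], [[3], [3]]), complex_dimension([2, 2], [[3], [3]]))
```

Both examples satisfy the row condition and have complex dimension three, whereas a quartic in the same ambient space as the quintic would fail the check.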

Identities – discrete symmetries

The simplest identities which leave the underlying CICYs unchanged are permutations of rows and columns in the configuration matrices.

Beyond this, there are several further identities relating configuration matrices to each other. They can be checked explicitly for small configuration matrices and then applied in general [6]. To obtain the classification one can choose one of the respective representations. The identities can be summarised as follows:


Here denotes a vector containing the dimensions of ‘arbitrary’ projective spaces. denote vectors containing zeros everywhere but in one entry which equals one. denotes a vector whose entries sum to two. are appropriate matrices to render the configuration matrix consistent.

CICYs as graphs – new data representation

The representation in terms of configuration matrices is not permutation invariant, although we are interested in properties which are insensitive to the choice of permutation. Permutation invariance can be achieved by considering a graph representation of the configuration matrix. Such mappings to graphs have shown improved performance, for instance in classifying properties of molecules [14].

For this novel representation of CICYs we mapped the right part of the configuration matrix (which is sufficient to reconstruct the whole matrix) to a graph. An example of such a graph is shown in Figure 3. We assign different weights to connections in rows and columns respectively. This representation has the advantage that our notation of CICYs is invariant under the permutation of rows and columns.

Figure 3: Different representations of one CICY. Left: The classic configuration matrix. Middle: A graph visualisation with two distinct weights (graph representation). Right: Next neighbours of the graph.

As the next step, we have to prepare the data in such a way that we can feed the graphs into our network. Therefore, we have to translate the properties of the graphs into a numerical description. We use the next neighbours of each point, which are shown for our example in Figure 3 on the right side. We calculated these features for all CICYs and hence obtained a dictionary for all types of points in this dataset, finding types. This naturally gives a -dimensional feature vector with integer entries. As these feature vectors do not uniquely identify a CICY, we also use the eigenvalues of the adjacency matrix of the graph as input. In summary, as input for our network we took the feature vector, which has a fixed length and consists of integers, together with the eigenvalues of the adjacency matrix, padded with additional zeros. This leads to a 315-dimensional input vector (cf. Table 1). Note that the identities correspond to local operations on our graphs.
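To illustrate why eigenvalues of an adjacency matrix are a permutation-invariant input, consider the following minimal sketch. The graph construction used here is a hypothetical stand-in and not necessarily the one behind Figure 3: nodes are the non-zero entries of the right part of the configuration matrix, the entry value is stored on the diagonal, and the weights `w_row` and `w_col` for connections within a row or a column are arbitrary illustrative values.

```python
import numpy as np

def graph_adjacency(q, w_row=1.0, w_col=2.0):
    """Weighted adjacency matrix for a hypothetical graph built from q.
    Nodes: non-zero entries of q. Edges: same row (w_row) or same column (w_col).
    The entry value itself is kept on the diagonal."""
    q = np.asarray(q)
    nodes = [(i, j) for i in range(q.shape[0])
             for j in range(q.shape[1]) if q[i, j] != 0]
    adj = np.zeros((len(nodes), len(nodes)))
    for a, (i, j) in enumerate(nodes):
        adj[a, a] = q[i, j]
        for b, (k, l) in enumerate(nodes):
            if a == b:
                continue
            if i == k:
                adj[a, b] = w_row
            elif j == l:
                adj[a, b] = w_col
    return adj

def spectrum_features(q, length=10):
    """Sorted eigenvalues of the adjacency matrix, zero-padded to a fixed length."""
    eig = np.sort(np.linalg.eigvalsh(graph_adjacency(q)))[::-1]
    out = np.zeros(length)
    k = min(length, len(eig))
    out[:k] = eig[:k]
    return out

q = np.array([[1, 1, 0, 0],
              [0, 1, 1, 0],
              [1, 0, 0, 1]])
q_permuted = q[[2, 0, 1]][:, [3, 1, 0, 2]]  # permute rows, then columns
print(np.allclose(spectrum_features(q), spectrum_features(q_permuted)))
```

Permuting rows or columns only relabels the nodes, so the adjacency matrix is conjugated by a permutation matrix and its spectrum is unchanged. The next-neighbour counts described above are not part of this sketch.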

Training of the network

Our target output data are the topological invariants $h^{1,1}$ and $h^{2,1}$ which were obtained in [15]. For this supervised learning task, we now proceed as in the continuous case, in particular as in the case with two output classification layers, one for $h^{1,1}$ and one for $h^{2,1}$.

We started from the classified input-output pairs, and constructed random representatives of each class using identities (if applicable) and permutations. As the next step, we constructed the 315-dimensional input vector as previously described. We note that in this representation each class has a different number of representatives, depending on the number of identities which can be applied. For example, the so-called quintic hypersurface

$$[\,4 \,|\, 5\,]$$

has just one representative because no identities can be applied here. However, for other CICYs we obtain between and representatives. We end up with around different input vectors. The clear advantage of this input is that we can be sure that two different data-points always describe two distinct matrices which are not related via permutations. To compensate for the differing numbers of representatives, we keep several copies of CICYs with a low number of representatives in our training dataset. For evaluation of the classification we only use unique input vectors.

The network we use is a simple multilayer perceptron with ReLU activation functions and two softmax classifications as the final layer; details can be found in Table 1. Again we stop training when the network achieves above 95 percent accuracy in both classifications. For the analysis of the results, we only use the correctly classified data-points.

| Type | Dimension | Activation | Initializer | Regularization |
|---|---|---|---|---|
| Input | 315 | | | |
| Dense | 315 | ReLU | glorot_uniform | |
| Dense | 315 | ReLU | glorot_uniform | l2() |
| Dense | 100 | ReLU | glorot_uniform | |
| Dense | 100 | | glorot_uniform | l2() |
| Output 1: Dense | 102 | softmax | | |
| Output 2: Dense | 20 | softmax | | |

Table 1: Neural network architecture for Hodge number classification. The embedding layer is the layer before the output layers. We use categorical cross-entropy as the loss on both output layers.

Analysis of the results

As we face a situation with too many classes, we utilise a different method to analyse the nearest neighbours in the embedding layer. For a given input configuration, we look at the distances of its nearest neighbours in the embedding layer. We identify a suitable threshold and compare the class labels of the points closer than the threshold.5

As a first step, we pick one data point in the embedding space and find the 250 nearest neighbours with respect to their Euclidean distance. These distances are plotted as the blue curves in Figure 4.

Figure 4: We show the Euclidean distance of the nearest neighbours in the embedding layer to two fixed CICYs (blue). In yellow we show the difference between these distances for points and In red we highlight the largest difference. Below is the respective CICY configuration matrix from the original list.

Two generic features are several plateaus in the distance curve and several big jumps between two points, which are shown in yellow in Figure 4. We are interested in the biggest jump, and we use this as our threshold to distinguish manifolds. The red line in Figure 4 marks the location of the threshold. The prediction is that points closer than the point at the threshold all belong to one class. We require that at least one neighbour is included. This prediction is quite successful given the fact that the network is trained only on the Hodge numbers and has no training on the CICY labels. Figure 5 summarises the performance of our method with respect to the CICY labels and we find that for the vast majority of data points the neighbours are correctly classified (for of CICY labels we find an accuracy above ). Outliers arise for CICYs with only one or two existing representatives, which is expected for this method. Focusing on the Hodge pair with and there are distinct CICYs. Again (cf. Figure 5 right panel), we find that the majority of the CICYs are correctly classified with our method, noting only a small drop to compared to the performance on the entire dataset. Such a drop is expected because the entire dataset contains many cases where we have just one class of CICYs for a specific combination of Hodge numbers.
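The thresholding procedure can be sketched in a few lines. The function name and the synthetic two-cluster embedding below are illustrative assumptions; only the logic (sort the distances to the nearest neighbours, locate the largest jump between consecutive distances, keep the neighbours before it) follows the description above.

```python
import numpy as np

def neighbours_before_largest_jump(embeddings, index, k=250):
    """Return the indices of the neighbours of `embeddings[index]` that lie
    before the largest jump in the sorted Euclidean distance curve."""
    dists = np.linalg.norm(embeddings - embeddings[index], axis=1)
    order = np.argsort(dists)
    order = order[order != index][:k]      # drop the query point itself
    jumps = np.diff(dists[order])          # difference between neighbours i and i+1
    cut = int(np.argmax(jumps)) + 1        # always keeps at least one neighbour
    return order[:cut]

# Synthetic embedding: two well separated clusters standing in for two manifolds.
rng = np.random.default_rng(1)
cluster_a = rng.normal(0.0, 0.05, size=(20, 2))
cluster_b = rng.normal(5.0, 0.05, size=(30, 2))
embedding = np.vstack([cluster_a, cluster_b])
found = neighbours_before_largest_jump(embedding, index=0, k=40)
print(sorted(found.tolist()))
```

For a query point in the first cluster, the largest jump occurs between the last point of the same cluster and the first point of the other cluster, so exactly the remaining members of the first cluster are returned.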

Figure 5: Performance of our method on CICY dataset. Left: The distribution of performance for all 686,464 data points. Right: The distribution of performance on the subset of CICYs with Hodge-numbers and . The analysis of finding nearest neighbours is still performed with all data points.

The surprising part is that, as far as we know, there is no straightforward way to see whether two manifolds are inequivalent, due to the basis dependence of the intersection numbers. Therefore, more analysis is in order to understand why networks are able to distinguish distinct matrices, and to find a sufficient basis to distinguish between CICYs. We plan to return to the question of whether the neural network has learned Wall’s theorem [10].6

3 Finding Generators

Having identified the presence of symmetries, the next step, which we discuss now, is to identify the symmetry generators. Our starting point is a pointcloud in the input space whose points have been identified in the previous step as related via a symmetry due to their closeness in the embedding layer. To establish a numerical method to perform this analysis we start with a noisy pointcloud. First, we describe our algorithm and then apply it to examples for several symmetry groups in various dimensions. Finally we exemplify how this algorithm can be utilised on images.

3.1 Algorithm

The idea behind the algorithm is to extract the information about the symmetry group when considering a pointcloud which has been found to be related by some symmetry group. Infinitesimally, points are connected as follows:


where are some small numbers selecting by how much the point is transformed with the respective generator The symmetry group is characterised by the generators which we want to obtain from the pointcloud. In particular the structure of the nearest neighbours carries the information about the generators. To extract them efficiently, one needs to find an appropriate regression setup where all components of the generators are constrained. For instance, considering just a single point in -dimensions gives via equation (10) conditions on the components. However, by appropriately utilising multiple points the generators can be completely identified. We find the generators as follows:

  1. If our dataset features several redundant dimensions or the inputs are not centered around the origin, we pre-process the dataset by performing an appropriate dimensional reduction and centering around the origin (e.g. via PCA).

  2. We generate an orthonormal basis as follows. We pick a point at random. The first basis vector is given by its associated normalised vector We then pick a further vector at random in the pointcloud and the second basis vector is given by the normalised version of We then complete the remaining orthonormal basis elements automatically.

  3. The next step is to filter out points which are close enough to the hyperplane spanned by and . This is the hyperplane in which the generator acts. As condition we use


    The more data points we have the smaller we can choose . Points in this ‘thick’ hyperplane feature neighbours in the direction of interest and points in the orthogonal direction. The contribution of these latter points to our regression problem is removed later with condition (15). Note that a too large will include all points – in particular also the poles on the sphere – which leads to a drop in performance.

  4. Among these points we now identify all pairs of points which are close to each other:


    This choice allows us to keep multiple point pairs and not just the nearest neighbour.

  5. Each of these neighbouring point pairs provides constraints relevant for determining one combination of the generators in Equation (10). At linear order this is given as


    where denotes the generator we determine. The normalisation factor ensures the correct numerical prefactors. denotes the sign which contains the appropriate directional information of the points for this hyperplane and is calculated by


    The necessity of can be understood by considering the example of identifying the generator of and considering point pairs in different quadrants. Each of these point pairs constrains up to components of the -components of Additional components are constrained by demanding that

  6. Using the above constraints in Equations (14) and (15) we can now constrain all components of the generator using linear regression. In practice we weight the constraints arising from (15) more strongly than the constraints from (14), ensuring that (15) is definitely satisfied. This also removes the false directional information from point pairs that arise due to the thickness of our hyperplane.

  7. By applying steps 2-5 multiple times we obtain generators for ‘all’ directional combinations. On the resulting generator candidates we perform principal component analysis. By analysing the standard deviation in these components we identify the relevant number of generators for the underlying pointcloud. The associated principal components to these generators reveal the algebra structure of these generators. Hence we determine the underlying symmetry group.

  8. To distinguish unitary from orthogonal groups such as in the example below where we distinguish between and additional care is needed in setting up the regression problem. The necessity arises as follows: Consider the orbit of a point on a unit sphere The entire orbit which is generated by both symmetries is given by and hence one cannot distinguish with just one pointcloud. However realistic situations such as the example with the superpotential (cf. Section 2) feature multiple orbits, one for each field. We can utilise this situation as we are equipped with two point pairs which are connected with the same transformation (neglecting for the moment that they can be in different representations). Here one can distinguish the transformations from and as the action on the first point pair fixes the generator completely, whereas for not all generators are fixed by the first transformation. Utilising both point pairs in our regression doubles the constraints arising from (14) and allows us to distinguish for instance and
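The regression idea behind the steps above can be illustrated in the simplest setting, a noisy circle in two dimensions. The sketch below is a simplification under stated assumptions and not the full algorithm: there is only one plane, so the basis construction, hyperplane filter and PCA over generator candidates are omitted; the pair selection window (`d_min`, `d_max`), the use of the pair midpoint as base point and the orientation sign taken from the two-dimensional cross product are illustrative choices.

```python
import numpy as np

def estimate_so2_generator(points, d_min=0.1, d_max=0.3):
    """Least-squares estimate of a 2x2 generator T from neighbouring point pairs.
    Each pair contributes the constraint T m = s * d, with m the normalised
    midpoint, d the normalised difference and s an orientation sign."""
    bases, targets = [], []
    n = len(points)
    for i in range(n):
        for j in range(i + 1, n):
            diff = points[j] - points[i]
            dist = np.linalg.norm(diff)
            if not (d_min < dist < d_max):
                continue                       # keep only nearby pairs
            mid = 0.5 * (points[i] + points[j])
            # Sign of the 2d cross product: +1 if j is counter-clockwise from i.
            sign = np.sign(points[i][0] * points[j][1] - points[i][1] * points[j][0])
            bases.append(mid / np.linalg.norm(mid))
            targets.append(sign * diff / dist)
    bases, targets = np.array(bases), np.array(targets)
    # Solve bases @ X = targets in the least-squares sense; then T = X^T.
    solution, *_ = np.linalg.lstsq(bases, targets, rcond=None)
    return solution.T

rng = np.random.default_rng(0)
theta = rng.uniform(0.0, 2.0 * np.pi, 200)
radius = 1.0 + 0.01 * rng.normal(size=200)     # Gaussian noise on the radius
cloud = np.stack([radius * np.cos(theta), radius * np.sin(theta)], axis=1)
T = estimate_so2_generator(cloud)
print(np.round(T, 2))
```

Without the sign, pairs traversed clockwise and counter-clockwise would contribute opposite directions and cancel in the regression. With it, the estimate is close to the antisymmetric matrix generating rotations in the plane.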

Below, we discuss some numerical examples of these generators.

Figure 6: Three examples of pointclouds for $SO(2)$ with varying number of points and different noise, where the respective parameters are shown in the plot title. The respective generator corresponds to the first PCA component which is singled out by our algorithm.

3.2 Examples

We design our examples with increasing complexity and capture various embeddings of symmetries to check the performance of our algorithm. The first warm-up example is that of a pointcloud generated by $SO(2)$, i.e. points on a circle.

To test the stability of our algorithm we perform experiments with varying number of points and we add some Gaussian noise to the radius. Results for several examples are shown in Figure 6. Even for pointclouds with few points and large noise we find very good results for the generators. The large difference in the standard deviation from the first to the remaining components shows that this pointcloud is only connected with one generator. For the analysis shown here we use

The next examples we discuss are $SO(3)$ and $SO(4)$. Again we consider pointclouds with varying total number of points and different levels of noise. For several choices of hyperparameters we show the standard deviations of the PCA-components in Figure 7. In both setups we again find consistently a steep decline in the standard deviation after three and six components respectively. For the experiment shown as the red curve in Figure 7 we obtain the following generators:


For the experiment we obtain the following generators

Figure 7: Left: The standard deviation of the PCA components for the example of $SO(3)$. Right: The results for the standard deviation of the PCA components of the $SO(4)$ example.
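The steep decline in the standard deviations can be illustrated with synthetic data. In the sketch below the generator candidates are not produced by the regression; they are assumed to be random linear combinations of the three standard antisymmetric $3\times 3$ matrices plus small Gaussian noise, which mimics what repeated runs of the algorithm are expected to return for a rotation group in three dimensions. The noise level and the number of candidates are arbitrary.

```python
import numpy as np

def so3_basis():
    """Standard antisymmetric 3x3 matrices generating rotations in three dimensions."""
    l1 = np.array([[0, 0, 0], [0, 0, -1], [0, 1, 0]], dtype=float)
    l2 = np.array([[0, 0, 1], [0, 0, 0], [-1, 0, 0]], dtype=float)
    l3 = np.array([[0, -1, 0], [1, 0, 0], [0, 0, 0]], dtype=float)
    return np.stack([l1, l2, l3])

def pca_standard_deviations(candidates):
    """Standard deviation along each principal component of the flattened candidates."""
    flat = candidates.reshape(len(candidates), -1)
    flat = flat - flat.mean(axis=0)
    singular_values = np.linalg.svd(flat, compute_uv=False)
    return singular_values / np.sqrt(len(flat) - 1)

rng = np.random.default_rng(0)
coefficients = rng.normal(size=(500, 3))
candidates = np.einsum("ka,aij->kij", coefficients, so3_basis())
candidates += 0.01 * rng.normal(size=candidates.shape)   # synthetic regression noise
stds = pca_standard_deviations(candidates)
n_generators = int(np.sum(stds > 0.1))
print(np.round(stds, 3), n_generators)
```

The nine standard deviations separate into three large values and six values at the noise level, so counting the components above the gap recovers the number of independent generators.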

Next we turn to the discussion of and acting on four real dimensions. Our method should reveal three and two generators, respectively, rather than all six generators of $SO(4)$. Again we test our method on pointclouds with varying number of points and different noise. We provide an overview of our findings in Figure 8. For the case, the dominant generators found by our algorithm are given by:


where these results correspond to the run with points shown in red in Figure 8. Note that to distinguish $SU(2)$ from $SO(4)$ it was necessary to utilise two pointclouds as described in step 8 of our algorithm. For we find, for instance in the case of the run associated to the parameters of the black curve in Figure 8

Figure 8: Left: The standard deviation of the PCA components for the example of Right: The results for the standard deviation of the PCA components of the example.

3.3 Rotated MNIST

The final example we discuss is the application of our algorithm to images. To do this we want to re-identify $SO(2)$ from the rotated MNIST dataset.7 In contrast to our previous examples we now want to identify the generators on a -dimensional space. However, as previously described, we can dimensionally reduce this space, for instance via PCA.

Our analysis proceeds as follows: We consider a subset of the rotated MNIST dataset, consisting of images of and their rotated versions. Note that such a subset of the dataset easily emerges when doing a classification task. On the entire rotated MNIST dataset we perform PCA and consider the first three components. We apply this PCA transformation to the datasets containing only several rotated images of a single digit, e.g. the digit eight. A visualisation of the orbits associated to several digits eight can be seen in Figure 9. On this pointcloud of digits eight, we now perform the remaining steps of our algorithm and find that the dominant generator is given by an $SO(2)$ rotation:


The respective standard deviations can be found in Figure 9 on the right. We clearly identify the generator of $SO(2)$ as the dominant generator.

Figure 9: Left: Pointcloud of first three PCA components of our rotated MNIST dataset. Highlighted in orange are the orbits of multiple digits eight. Gray points correspond to the other digits present in this dataset. Right: The standard deviation on the generators identified from this pointcloud for the digit eight.

3.4 Discrete symmetries – CICYs

To conclude this section we briefly return to the example of CICYs discussed in Section 2.2. By construction, the symmetries acting here are discrete rather than continuous. To identify underlying symmetries – earlier referred to as identities (cf. (8)) – one needs to match identical transformations in different orbits acting on the input space. As our input dataset is generated precisely by these identities, and such different representations are mapped to the same cluster in the embedding layer, our network does identify these identities. It will be interesting to analyse whether the network finds additional symmetries and identities which are as yet unknown. However, this would require a different training approach with differently prepared datasets, which we leave for future work.
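The matching of different representations through the embedding layer can be illustrated schematically. The sketch below groups inputs whose embedding vectors lie within a distance threshold of each other, so that each group collects candidate representations of the same object. The function `group_by_embedding`, the threshold value and the toy data are hypothetical, and a simple single-linkage grouping is assumed in place of whichever clustering method one would use in practice.

```python
import math


def group_by_embedding(embeddings, threshold):
    """Single-linkage grouping of embedding vectors.

    Indices i and j end up in the same group if a chain of points with
    pairwise distance <= threshold connects them.
    """
    n = len(embeddings)
    parent = list(range(n))

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(n):
        for j in range(i + 1, n):
            if math.dist(embeddings[i], embeddings[j]) <= threshold:
                parent[find(i)] = find(j)

    groups = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


# Toy data: labels for four input representations and made-up embeddings.
inputs = ["A", "A'", "B", "B'"]
embeddings = [(0.0, 0.1), (0.05, 0.12), (3.0, 3.1), (2.95, 3.0)]
orbits = [[inputs[i] for i in g] for g in group_by_embedding(embeddings, 0.5)]
```

In this toy setting the two primed representations are grouped with their unprimed partners. Comparing the inputs within one group is then the starting point for reading off the transformation that relates them.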

4 Conclusions

Detecting symmetries in an automated fashion removes the need for domain knowledge associated with a particular data product. Such domain knowledge often may not exist, or has been the outcome of considerable scientific effort, as in the development of the quark model [16]. In this article we introduced a method to detect symmetries with only very limited domain knowledge. The required domain knowledge was the ability to perform a ‘simple’ classification task, which we think is often a realistic starting point.

We have discussed examples of basic symmetries appearing in physics, such as rotational groups and the unitary group. The structure in the embedding layer does reveal these symmetries and hence provides orbits on the input space which are generated by these symmetries. In a second step we were able to pinpoint the nature of these continuous symmetries with our regression algorithm. Beyond rotational groups and the unitary group, we find that the embedding layer can be used to identify classes of CICY manifolds. It remains to be seen whether these methods can establish new identities in the case of the classification of -folds which is unknown to this date. For this analysis, we introduced a novel graph representation for CICYs which removes several redundancies of the matrix representation used up to now. In passing we note that this provides the first application of graph neural networks in string theory. We have not yet explored the full potential of this representation for other ML work on this dataset (cf. [17, 18, 19, 20, 21, 22] for other ML applications on the CICY dataset).

Another observation which appeared in this analysis is that the neural network has found a way to calculate topological invariants as required by Wall’s theorem, which formalises how complex manifolds are completely characterised. We have not yet investigated this avenue but want to highlight that it will be exciting to compare these two complementary approaches to classification. In which situations does a neural network use such mathematically rigorous ways of classification?

We have seen that dimensional reduction tools, here in particular TSNE [3], are an important ingredient in our analysis. It remains to be seen which additional structures TSNE and other techniques can reveal on datasets in mathematical physics, similar to structures seen in autoencoders [23].

Putting this method into perspective, we find that our results can be improved by augmenting the pointclouds. Additional points can be obtained if an equation generating these orbits is known. In this context it might be useful to utilise the techniques recently described in [24]. Furthermore, our technique of identifying symmetries is useful to determine which symmetry-equivariant architecture (cf. [25]) promises to be efficient for more sophisticated classification tasks. Beyond classification, another application of symmetries in machine learning which has recently been proposed is in the context of reinforcement learning [26]. In either case, it promises to be extremely interesting to see which other symmetries can be found in everyday and scientific datasets, going beyond a standard rotational invariance such as we discussed in the context of MNIST.

This is a proof-of-concept paper presenting several ways of identifying underlying symmetries in data. Further scrutiny of these methods for other symmetries is in order. It is now even more tantalising to find out which underlying symmetry structures neural networks are dynamically using to achieve their remarkable performance.


We would like to thank Per Berglund, Harold Erbin and Andre Lukas for useful discussions. Significant parts of this work were performed during the workshop ‘the Data Science revolution’ at the Simons Center for Geometry and Physics in Stony Brook.

Note: We are aware that Danilo Rezende and collaborators are working on similar questions related to identifying symmetries with neural networks.


  1. In our experiments, we find that regression does not expose symmetries in the same way.
  2. We use Keras with the TensorFlow backend. For PCA and TSNE implementations we use [4].
  3. Our results do not require large hyperparameter tuning.
  4. We only define a training dataset because we are only interested in correctly classified data points. At this stage there is no necessity to construct a test or validation set.
  5. There is no obstruction to applying this procedure in the previous situations as well.
  6. We thank Per Berglund and Andre Lukas for stressing this observation to us.
  7. Our rotated MNIST dataset consists of the first original images in the MNIST dataset and rotated versions of these images, totalling images. The rotation angles are chosen at random.


  1. T. Mikolov, K. Chen, G. Corrado and J. Dean, Efficient estimation of word representations in vector space, arXiv preprint arXiv:1301.3781 (2013).
  2. Q. Zhou, P. Tang, S. Liu, J. Pan, Q. Yan and S.-C. Zhang, Learning atoms for materials discovery, Proceedings of the National Academy of Sciences 115 (2018) E6411 [1807.05617].
  3. L. v. d. Maaten and G. Hinton, Visualizing data using t-SNE, Journal of Machine Learning Research 9 (2008) 2579.
  4. F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, O. Grisel et al., Scikit-learn: Machine learning in Python, Journal of Machine Learning Research 12 (2011) 2825.
  5. P. Candelas, G. T. Horowitz, A. Strominger and E. Witten, Vacuum Configurations for Superstrings, Nucl. Phys. B258 (1985) 46.
  6. P. Candelas, A. M. Dale, C. A. Lutken and R. Schimmrigk, Complete Intersection Calabi-Yau Manifolds, Nucl. Phys. B298 (1988) 493.
  7. J. Gray, A. S. Haupt and A. Lukas, All Complete Intersection Calabi-Yau Four-Folds, JHEP 07 (2013) 070 [1303.1832].
  8. L. B. Anderson, J. Gray, A. Lukas and E. Palti, Two Hundred Heterotic Standard Models on Smooth Calabi-Yau Threefolds, Phys. Rev. D84 (2011) 106005 [1106.4804].
  9. L. B. Anderson, J. Gray, A. Lukas and E. Palti, Heterotic Line Bundle Standard Models, JHEP 06 (2012) 113 [1202.1757].
  10. T. Hubsch, Calabi-Yau manifolds: A Bestiary for physicists. World Scientific, Singapore, 1994.
  11. P. Green and T. Hubsch, Calabi-Yau Manifolds as Complete Intersections in Products of Complex Projective Spaces, Commun. Math. Phys. 109 (1987) 99.
  12. “Cicy-list.”
  13. L. B. Anderson, Y.-H. He and A. Lukas, Monad Bundles in Heterotic String Compactifications, JHEP 07 (2008) 104 [0805.2875].
  14. S. Kearnes, K. McCloskey, M. Berndl, V. Pande and P. Riley, Molecular graph convolutions: moving beyond fingerprints, Journal of Computer-Aided Molecular Design 30 (2016) 595 [1603.00856].
  15. P. S. Green, T. Hubsch and C. A. Lutken, All Hodge Numbers of All Complete Intersection Calabi-Yau Manifolds, Class. Quant. Grav. 6 (1989) 105.
  16. M. Gell-Mann, The Eightfold Way: A Theory of strong interaction symmetry.
  17. Y.-H. He, Machine-learning the string landscape, Phys. Lett. B774 (2017) 564.
  18. F. Ruehle, Evolving neural networks with genetic algorithms to study the String Landscape, JHEP 08 (2017) 038 [1706.07024].
  19. K. Bull, Y.-H. He, V. Jejjala and C. Mishra, Machine Learning CICY Threefolds, Phys. Lett. B785 (2018) 65 [1806.03121].
  20. D. Klaewer and L. Schlechter, Machine Learning Line Bundle Cohomologies of Hypersurfaces in Toric Varieties, Phys. Lett. B789 (2019) 438 [1809.02547].
  21. K. Bull, Y.-H. He, V. Jejjala and C. Mishra, Getting CICY High, Phys. Lett. B795 (2019) 700 [1903.03113].
  22. C. R. Brodie, A. Constantin, R. Deen and A. Lukas, Machine Learning Line Bundle Cohomology, Fortsch. Phys. 68 (2020) 1900087 [1906.08730].
  23. A. Mütter, E. Parr and P. K. S. Vaudrevange, Deep learning in the heterotic orbifold landscape, Nucl. Phys. B940 (2019) 113 [1811.05993].
  24. S. J. Wetzel, R. G. Melko, J. Scott, M. Panju and V. Ganesh, Discovering symmetry invariants and conserved quantities by interpreting siamese neural networks, 2003.04299.
  25. T. Cohen and M. Welling, Group equivariant convolutional networks, in International conference on machine learning, pp. 2990–2999, 2016.
  26. E. van der Pol, T. Kipf, F. A. Oliehoek and M. Welling, Plannable approximations to mdp homomorphisms: Equivariance under actions, 2002.11963.