Universal approximations of permutation invariant/equivariant functions by deep neural networks

Akiyoshi Sannai & Yuuki Takai
RIKEN Center for Advanced Intelligence Project/Keio University
Tokyo, Japan
{akiyoshi.sannai, yuuki.takai}@riken.jp
Matthieu Cordonnier
École Normale Supérieure Paris-Saclay,
Cachan, France
matthieu.cordonnier@ens-paris-saclay.fr
Abstract

In this paper, we develop a theory of the relationship between $G$-invariant/equivariant functions and deep neural networks for a finite group $G$. In particular, for a given $G$-invariant/equivariant function, we construct its universal approximator by a deep neural network whose layers are equipped with $G$-actions and whose affine transformations are $G$-equivariant/invariant. Using representation theory, we show that this approximator has exponentially fewer free parameters than the usual models.


1 Introduction

Deep neural networks have had great success in many applications such as image recognition, speech recognition, natural language processing, and others, as in Alex et al. (2012), Goodfellow et al. (2013), Wan et al. (2013), and Silver et al. (2017). A common strategy in these works is to construct larger and deeper networks. However, one of the main obstructions to using very deep and large networks for learning tasks is the so-called curse of dimensionality: if the dimension of the parameters increases, so does the required sample size, and the computational complexity becomes exponentially higher. One idea to overcome this is to design models that respect the structure of the task.

Zaheer et al. (2017) designed a model adapted to machine learning tasks defined on sets, which are, from a mathematical point of view, permutation invariant or equivariant tasks. They demonstrated surprisingly good applicability of their method to population statistic estimation, point cloud classification, set expansion, and outlier detection. Empirically speaking, their results are quite significant. Many researchers have studied invariant/equivariant networks, such as Qi et al. (2017), Hartford et al. (2018), Risi Kondor (2018), Maron et al. (2019a), Bloem-Reddy & Teh (2019), Kondor & Trivedi (2018), and so on. Nevertheless, the theoretical guarantees of these methods have not been sufficiently studied. One of our motivations is to establish such a theoretical guarantee. In this paper, we prove an invariant/equivariant version of the universal approximation theorem by constructing an explicit approximator. For the symmetric group, our approximator is close to the equivariant model of Zaheer et al. (2017) in a certain sense (see the remark after Theorem 2.2). We can calculate the number of free parameters appearing in our invariant/equivariant model, and show that this number is exponentially smaller than that of the usual models.

For usual deep neural networks, a universal approximation theorem was first proved by Cybenko (1989). It states that, when the width goes to infinity, a (usual) neural network with a single hidden layer can approximate, with arbitrary accuracy, any continuous function with compact support. Although his theorem was only for sigmoid activation functions, later versions of the theorem allow wider classes of activation functions. In the recent literature, the most commonly used activation function is the ReLU (Rectified Linear Unit) function, which is the one we focus on in this paper. Some important previous works on universal approximation theorems covering the ReLU activation function are Barron (1994), Hornik et al. (1989), Funahashi (1989), Kůrková & Sanguineti (2002), and Sonoda & Murata (2017). In particular, for part of the proof of our main theorem, we borrow results of Sonoda & Murata (2017) and Hanin & Sellke (2017). The interest of the universal approximation theorem in learning theory is that it guarantees that we search in a hypothesis space which contains (approximations of) the target functions. The universal approximation theorem states the existence of a model which approximates the target function to arbitrary accuracy; this means that, with a suitable algorithm, we can reach the desired solutions. Without a universal approximation theorem, such a situation cannot be guaranteed. Our universal approximation theorem allows us to apply representation theory, and from this point of view, we can calculate the number of free parameters of our approximator.

In the equivariant case, a technical key point of the proof is a one-to-one correspondence between $G$-equivariant functions and $\mathrm{Stab}(1)$-invariant functions. Here, $G$ is a finite group and $\mathrm{Stab}(1)$ is the subgroup of $G$ consisting of the elements which fix $1$. We first confirm this correspondence at the level of functions. After that, we rephrase it in terms of deep neural networks. This correspondence enables us to reduce the equivariant case to the invariant case.

The invariant case has already been established by several researchers: Zaheer et al. (2017), Yarotsky (2018), Maron et al. (2019b). For $G = S_n$, the symmetric group of degree $n$, Zaheer et al. (2017) showed that a representation theorem for $S_n$-invariant functions, famous as a solution of Hilbert's 13th problem by Kolmogorov (1956) and Arnold (1957), gives an explicit description. Due to this theorem and the usual universal approximation theorem, we can construct a concrete deep neural network for the invariant model. Recently, Maron et al. (2019b) proved an invariant version of the universal approximation theorem for any finite group using tensor structures. We borrow their results to obtain our main results.

1.1 Contributions

Our contributions are summarized as follows:

We prove an invariant/equivariant version of the universal approximation theorem, which is one step toward understanding the behavior of deep neural networks with permutations or, more generally, group actions.

Using representation theory, we calculate the number of free parameters appearing in our models. As a result, the number of parameters in our models is exponentially smaller than that of the usual models. This means that our models are easier to train than the usual models.

Although our model is slightly different from the equivariant model of Zaheer et al. (2017) for $G = S_n$, our theorem guarantees that our model for a finite group $G$ can approximate any $G$-invariant/equivariant function.

1.2 Related works

Group theory, or symmetry, is an important concept in mathematics, physics, and machine learning. In machine learning, deep symmetry networks (symnets) were designed by Gens & Domingos (2014) as a generalization of convnets that form feature maps over arbitrary symmetry groups. Group equivariant Convolutional Neural Networks (G-CNNs) were designed by Cohen & Welling (2016) as a natural generalization of convolutional neural networks that reduces sample complexity by exploiting symmetries. Models for permutation invariant/equivariant tasks were designed by Zaheer et al. (2017) and give great results on population statistic estimation, point cloud classification, set expansion, and outlier detection.

The universal approximation theorem is one of the most classical mathematical theorems about neural networks. As we saw in the introduction, Cybenko (1989) proved this theorem in 1989 for sigmoid activation functions. After his achievement, several researchers showed similar results generalizing the sigmoid function to a larger class of activation functions, such as Barron (1994), Hornik et al. (1989), Funahashi (1989), Kůrková (1992), and Sonoda & Murata (2017).

As mentioned above, the invariant case has been established. For $G = S_n$, Zaheer et al. (2017) essentially proved an invariant version of the universal approximation theorem. Yarotsky (2018) gave a more explicit $S_n$-invariant approximator by a shallow neural network. Maron et al. (2019b) considered a $G$-invariant model with tensor structures for any finite group $G$. An equivariant version for finite groups by shallow (hyper-)graph neural networks was proved by Keriven & Peyré (2019). Our approximator architecture is different from theirs. Moreover, although they proved their result only for “squashing functions”, which exclude the ReLU function, our theorem allows us to use ReLU functions. We also remark that our setting in this paper is quite general; in particular, it includes tensor structures, hence graph neural networks. It would be interesting to compare the numbers of free parameters of our models with those of Keriven & Peyré (2019).

2 Preliminaries and main results

In this paper, we treat fully connected deep neural networks. We mainly consider the ReLU activation function, defined by $\mathrm{ReLU}(x) = \max(x, 0)$ and applied coordinatewise.

We remark that our argument throughout this paper works for any activation function which satisfies a usual universal approximation theorem. A deep neural network is built by stacking blocks, each consisting of an affine map and a ReLU activation. More formally, each block is a function from $\mathbb{R}^{N_{l-1}}$ to $\mathbb{R}^{N_l}$ defined by $z \mapsto \mathrm{ReLU}(A_l(z))$, where $A_l : \mathbb{R}^{N_{l-1}} \to \mathbb{R}^{N_l}$ is an affine map. In this case, $N_l$ is called the width of the $l$-th layer. The output of the deep neural network is

$$F(x) = A_{D+1} \circ \mathrm{ReLU} \circ A_{D} \circ \cdots \circ \mathrm{ReLU} \circ A_{1}(x), \tag{1}$$

where $D$ is called the depth of the deep neural network. We define the width of a deep neural network as the maximum of the widths of all its layers.
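For concreteness, the following is a minimal NumPy sketch of the fully connected ReLU network just defined: affine maps $A_l$ interleaved with ReLU activations, with the last affine map producing the output. The function names and random initialization are illustrative only and are not taken from the paper.

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def deep_relu_net(x, Ws, bs):
    """Compute A_{D+1}(ReLU(A_D(... ReLU(A_1(x)) ...))) for affine maps A_l(z) = W z + b."""
    h = x
    for W, b in zip(Ws[:-1], bs[:-1]):
        h = relu(W @ h + b)          # hidden layer: affine map followed by ReLU
    return Ws[-1] @ h + bs[-1]       # output layer: affine map only

# Example: input dimension 4, two hidden layers of width 5, output dimension 2.
rng = np.random.default_rng(0)
widths = [4, 5, 5, 2]
Ws = [rng.standard_normal((m, k)) for k, m in zip(widths[:-1], widths[1:])]
bs = [np.zeros(m) for m in widths[1:]]
y = deep_relu_net(rng.standard_normal(4), Ws, bs)
```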

Our main objects are deep neural networks which are invariant/equivariant with respect to actions of a finite group $G$. We review some facts about groups and their actions here; more details are given in Appendix A. Let $S_n$ be the group consisting of the permutations of the $n$ elements $[n] = \{1, 2, \dots, n\}$. This is called the symmetric group of degree $n$. The symmetric group acts on $[n]$ by the permutation $i \mapsto \sigma(i)$ for $\sigma \in S_n$. By Proposition A.1, any finite group $G$ can be regarded as a subgroup of $S_n$ for some positive integer $n$. Then, $G$ also acts on $[n]$ by its action as a subgroup of $S_n$. For $i \in [n]$, we define the orbit of $i$ as $G \cdot i = \{\sigma(i) \mid \sigma \in G\}$. Then, the set $[n]$ can be divided into a disjoint union of orbits: $[n] = \coprod_{k} G \cdot i_k$.

Let $G$ be a finite group acting on a set $X$. For $x \in X$, we define the stabilizer subgroup of $G$ associated with $x$ as the subgroup $\mathrm{Stab}_G(x)$ of elements of $G$ fixing $x$. Then, by Proposition A.2, the orbit $G \cdot x$ and the set of cosets $G/\mathrm{Stab}_G(x)$ are in bijection. When $G \subset S_n$, $X = [n]$, and $x = i$, we set $\mathrm{Stab}(i) := \mathrm{Stab}_G(i)$.
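As a small illustration of orbits, stabilizers, and the orbit–coset bijection of Proposition A.2, the following sketch enumerates a cyclic subgroup of $S_4$ and checks that $|G| = |G \cdot x| \cdot |\mathrm{Stab}_G(x)|$. The choice of group is ours and purely illustrative.

```python
n = 4
cycle = (1, 2, 3, 0)                           # the 4-cycle sigma(i) = i + 1 mod 4

def compose(s, t):                             # (s o t)(i) = s(t(i))
    return tuple(s[t[i]] for i in range(n))

# Enumerate the cyclic subgroup G = <cycle> of S_4.
G = {tuple(range(n))}                          # start with the identity
g = cycle
while g not in G:
    G.add(g)
    g = compose(cycle, g)

orbit_of_0 = {sigma[0] for sigma in G}                   # the orbit G . 0
stab_of_0 = {sigma for sigma in G if sigma[0] == 0}      # the stabilizer Stab_G(0)

# The orbit-coset bijection (Proposition A.2) implies |G| = |orbit| * |stabilizer|.
assert len(G) == len(orbit_of_0) * len(stab_of_0)
```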

We next consider an action of $S_n$ on the vector space $\mathbb{R}^n$. The left action “$\cdot$” of $S_n$ on $\mathbb{R}^n$ is defined by

$$(\sigma \cdot x)_i = x_{\sigma^{-1}(i)}$$

for $\sigma \in S_n$ and $x = (x_1, \dots, x_n) \in \mathbb{R}^n$. We also call this the permutation action of $S_n$ on $\mathbb{R}^n$. If there is an injective group homomorphism $\iota : G \to S_n$, then $G$ acts on $\mathbb{R}^n$ by the permutation action as an element of $S_n$ through $\iota$: for $\sigma \in G$ and $x \in \mathbb{R}^n$, we define $\sigma \cdot x := \iota(\sigma) \cdot x$. Then, we simply say that $G$ acts on $\mathbb{R}^n$.

Example 2.1.

The finite group $\mathbb{Z}/2\mathbb{Z} = \{0, 1\}$ is embedded into $S_n$ by $1 \mapsto \tau_{12}$, where $\tau_{12}$ is the transposition between $1$ and $2$. In this case, the orbit decomposition of $[n]$ is $\{1, 2\} \sqcup \{3\} \sqcup \cdots \sqcup \{n\}$. By this embedding $\iota$, a $\mathbb{Z}/2\mathbb{Z}$-action on $\mathbb{R}^n$ is defined: $1 \cdot (x_1, x_2, x_3, \dots, x_n) = (x_2, x_1, x_3, \dots, x_n)$.

Example 2.2 (Tensors).

The group action of $G$, which is a subgroup of $S_n$, on tensors as in Maron et al. (2019b) is realized as follows. An $S_n$-action (and hence a $G$-action) on $\mathbb{R}^{n^k}$ is defined by the following injective homomorphism $\iota : S_n \to S_{n^k}$: we fix a bijection from $[n]^k$ to $[n^k]$, and for $\sigma \in S_n$, $\iota(\sigma)$ is defined by

$$\iota(\sigma)(i_1, \dots, i_k) = (\sigma(i_1), \dots, \sigma(i_k))$$

for $\sigma \in S_n$ and $(i_1, \dots, i_k) \in [n]^k$. Then, for a tensor $T \in \mathbb{R}^{n^k}$, $\sigma \in G$ acts on $T$ by the permutation action through $\iota$, i.e., $(\sigma \cdot T)_{i_1, \dots, i_k} = T_{\sigma^{-1}(i_1), \dots, \sigma^{-1}(i_k)}$. This action is the same as that of Maron et al. (2019b).

Example 2.3 ($n$-tuple of $d$-dimensional vectors).

We identify $\mathbb{R}^{nd}$ with $(\mathbb{R}^d)^n$ and define $\iota : S_n \to S_{nd}$ as the permutation of the $n$ blocks of size $d$. Let $x = (x_1, \dots, x_n) \in (\mathbb{R}^d)^n$ be an $n$-tuple of $d$-dimensional vectors. Then, for $\sigma \in S_n$, $\sigma \cdot x = (x_{\sigma^{-1}(1)}, \dots, x_{\sigma^{-1}(n)})$. This means a permutation of the $n$ vectors.
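The permutation actions of Examples 2.1–2.3 can be sketched as follows, under our convention $(\sigma \cdot x)_i = x_{\sigma^{-1}(i)}$; the helper names are ours and purely illustrative.

```python
import numpy as np

def permute_coords(sigma, x):
    """Action on R^n: position sigma[i] of the output receives x[i], so out_j = x_{sigma^{-1}(j)}."""
    out = np.empty_like(x)
    out[sigma] = x
    return out

def permute_blocks(sigma, X):
    """Action on an n-tuple of d-dimensional vectors, stored as an (n, d) array: permute the rows."""
    out = np.empty_like(X)
    out[sigma] = X
    return out

sigma = np.array([1, 2, 0])                    # the cycle 1 -> 2 -> 3 -> 1 (written 0-indexed)
x = np.array([10.0, 20.0, 30.0])
X = np.arange(6, dtype=float).reshape(3, 2)    # three 2-dimensional vectors
print(permute_coords(sigma, x))                # [30. 10. 20.]
print(permute_blocks(sigma, X))                # rows permuted the same way
```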

Definition 2.1.

Let $G$ be a finite group. We assume that injective homomorphisms $G \to S_n$ and $G \to S_m$ are given. Then, $G$ acts on $\mathbb{R}^n$ and on $\mathbb{R}^m$. We say that a map $f : \mathbb{R}^n \to \mathbb{R}^m$ is $G$-invariant if $f(\sigma \cdot x) = f(x)$ for any $\sigma \in G$ and any $x \in \mathbb{R}^n$. We say that a map $f : \mathbb{R}^n \to \mathbb{R}^m$ is $G$-equivariant if $f(\sigma \cdot x) = \sigma \cdot f(x)$ for any $\sigma \in G$ and any $x \in \mathbb{R}^n$.

When $G = S_n$ and the actions are induced by permutations, we call $S_n$-invariant (resp. $S_n$-equivariant) functions permutation invariant (resp. permutation equivariant) functions.

We next define $G$-invariance and $G$-equivariance for deep neural networks. We can easily confirm that the models in Zaheer et al. (2017) satisfy these properties.

Definition 2.2.

We say that a deep neural network as in (1) is $G$-equivariant if an action of $G$ on each layer is given by an embedding and every map between consecutive layers is $G$-equivariant. We say that a deep neural network is $G$-invariant if there is a positive integer $l_0$ such that $G$-actions on the layers up to the $l_0$-th layer are given, the maps between these layers are $G$-equivariant, and the map from the $l_0$-th layer to the output is $G$-invariant.
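As a sanity check of Definitions 2.1 and 2.2, the following sketch verifies numerically that a sum of coordinatewise ReLUs is permutation invariant and that a coordinatewise ReLU plus the mean is permutation equivariant. Both example functions are ours, chosen only to illustrate the definitions.

```python
import numpy as np

def act(sigma, x):                              # permutation action on R^n
    out = np.empty_like(x)
    out[sigma] = x
    return out

def f_inv(x):                                   # S_n-invariant: sum of coordinatewise ReLUs
    return np.maximum(x, 0.0).sum()

def f_eq(x):                                    # S_n-equivariant: coordinatewise ReLU plus the mean
    return np.maximum(x, 0.0) + x.mean()

rng = np.random.default_rng(1)
x = rng.standard_normal(5)
sigma = rng.permutation(5)
assert np.isclose(f_inv(act(sigma, x)), f_inv(x))                 # invariance
assert np.allclose(f_eq(act(sigma, x)), act(sigma, f_eq(x)))      # equivariance
```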

Some approximation theorems for invariant functions are already known:

Proposition 2.1 ($G$-invariant version of the universal approximation theorem).

Let $G$ be a finite group which is a subgroup of $S_n$. Let $K$ be a compact set in $\mathbb{R}^n$ which is stable under the corresponding $G$-action on $\mathbb{R}^n$. Then, for any $f : K \to \mathbb{R}$ which is continuous and $G$-invariant and for any $\varepsilon > 0$, the following $G$-invariant ReLU neural networks represent functions $\hat{f}$ satisfying $\sup_{x \in K} |f(x) - \hat{f}(x)| < \varepsilon$:

  • A network whose hidden layers are $G$-equivariant, followed by the summation $\Sigma$ over the coordinates (the $G$-invariant part) and a linear map $L$ such that $L(\sigma \cdot z) = L(z)$ for any $\sigma \in G$ and any $z$. Here, the actions on each layer except for the output layer are the same as in Example 2.2 (the model of Maron et al. (2019b)).

  • For $G = S_n$: the network $\hat{f}(x) = \hat{\rho}\big(\sum_{i=1}^{n} \hat{\phi}(x_i)\big)$, where $\hat{\phi}$ (resp. $\hat{\rho}$) is a deep neural network approximating $\phi$ (resp. $\rho$) defined below.

Diagram 1: A neural network approximating an $S_n$-invariant function $f$. In blue: the inputs; in red: the output; in green: $\hat{\phi}$ and $\hat{\rho}$, which have to be learned.

This proposition for $G = S_n$ was proven by Zaheer et al. (2017) and Yarotsky (2018). For a general finite group $G$, Maron et al. (2019b) proved it. Diagram 1 illustrates the $S_n$-invariant ReLU neural network appearing in Proposition 2.1. The key ingredient of the proof by Zaheer et al. (2017) is the following Kolmogorov–Arnold representation theorem:

Theorem 2.1 (Zaheer et al. (2017); Kolmogorov–Arnold representation theorem for permutation actions).

Let $K \subset \mathbb{R}^n$ be a compact set. Then, any continuous $S_n$-invariant function $f : K \to \mathbb{R}$ can be represented as

$$f(x_1, \dots, x_n) = \rho\left(\sum_{i=1}^{n} \phi(x_i)\right) \tag{2}$$

for some continuous function $\rho$. Here, $\phi(x) = (x, x^2, \dots, x^n)$.

Since $\phi$ has only one variable, we line up $n$ copies of a network which approximates $\phi$. Then, combining the summation with a network which approximates $\rho$, we obtain a network which approximates $f$. By using the theorem of Hanin & Sellke (2017) (resp. Sonoda & Murata (2017)), we obtain a bound on the width (resp. the depth) needed to approximate $\phi$ and $\rho$. Maron et al. (2019b) proved this proposition using a tensor structure.
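The construction just described can be sketched as follows: $n$ copies of a network approximating $\phi$ are applied coordinatewise, their outputs are summed (the $S_n$-invariant part), and a network approximating $\rho$ maps the sum to the output. The networks below are randomly initialized stand-ins for the trained approximators $\hat{\phi}$ and $\hat{\rho}$; the invariance check holds by construction.

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def mlp(x, Ws, bs):                              # small fully connected ReLU network
    h = x
    for W, b in zip(Ws[:-1], bs[:-1]):
        h = relu(W @ h + b)
    return Ws[-1] @ h + bs[-1]

def invariant_net(x, phi_params, rho_params):
    """f_hat(x_1, ..., x_n) = rho_hat(sum_i phi_hat(x_i)); S_n-invariant by construction."""
    s = sum(mlp(np.atleast_1d(xi), *phi_params) for xi in x)   # shared phi_hat, then summation
    return mlp(s, *rho_params)

rng = np.random.default_rng(2)
def make_mlp(widths):
    Ws = [rng.standard_normal((m, k)) for k, m in zip(widths[:-1], widths[1:])]
    bs = [rng.standard_normal(m) for m in widths[1:]]
    return Ws, bs

phi_params = make_mlp([1, 8, 4])                 # phi_hat : R -> R^4
rho_params = make_mlp([4, 8, 1])                 # rho_hat : R^4 -> R
x = rng.standard_normal(5)
assert np.isclose(invariant_net(x, phi_params, rho_params),
                  invariant_net(x[rng.permutation(5)], phi_params, rho_params))
```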

The main theorem is a $G$-equivariant version of the universal approximation theorem. To state it, we need some notation. Let $G$ be a finite group acting on $[n]$. We write the orbit decomposition of $[n]$ as $[n] = \coprod_{k=1}^{r} O_k$, and let $n_k = |O_k|$. Without loss of generality, we may reorder the coordinates so that

$$O_1 = \{1, \dots, n_1\}, \quad O_2 = \{n_1 + 1, \dots, n_1 + n_2\}, \quad \dots,$$

and we fix a representative $i_k \in O_k$ for each $k$. For each $k$, let $G = \coprod_{j=1}^{n_k} \sigma_{k,j} \, \mathrm{Stab}(i_k)$ be a coset decomposition of $G$ by $\mathrm{Stab}(i_k)$. Then, by Proposition A.2, we may assume that the representatives satisfy $\sigma_{k,j}(i_k) = $ the $j$-th element of $O_k$. Then, the main theorem is the following:

Theorem 2.2 ($G$-equivariant version of the universal approximation theorem).

Let $G$ be a finite group which is a subgroup of $S_n$. Let $K$ be a compact set in $\mathbb{R}^n$ which is stable under the corresponding $G$-action on $\mathbb{R}^n$. Then, for any $F : K \to \mathbb{R}^n$ which is continuous and $G$-equivariant and for any $\varepsilon > 0$, the following $G$-equivariant ReLU neural network represents a function $\hat{F}$ satisfying $\sup_{x \in K} \|F(x) - \hat{F}(x)\| < \varepsilon$:

$$\hat{F}(x) = \big(\hat{f}_k(\sigma_{k,j}^{-1} \cdot x)\big)_{1 \le k \le r,\ 1 \le j \le n_k}.$$

Here, $\hat{f}_k$ is the $\mathrm{Stab}(i_k)$-invariant deep neural network, as in Proposition 2.1, approximating the $\mathrm{Stab}(i_k)$-invariant function $f_k$ corresponding to $F$ (see Proposition 3.1), and the actions on the layers are defined as follows. Each hidden layer is written as $V^{\oplus n_k}$ for a vector space $V$, with the $n_k$ copies indexed by the coset representatives $\sigma_{k,1}, \dots, \sigma_{k,n_k}$. On this space, $\sigma \in G$ acts by sending the $j$-th copy to the $j'$-th copy,

where $\sigma_{k,j'}$ is the element of $\{\sigma_{k,1}, \dots, \sigma_{k,n_k}\}$ satisfying $\sigma \sigma_{k,j} = \sigma_{k,j'} h$ for some $h \in \mathrm{Stab}(i_k)$ (and $h$ acts inside the copy).

Diagram 2: A neural network approximating a $G$-equivariant map $F$.

For $G = S_n$, when $\hat{F}$ is represented as in (1), we can consider that the hidden layers are $S_n$-equivariant with respect to the usual permutation action. In this sense, our model is close to the $S_n$-equivariant model in Zaheer et al. (2017).

Our strategy for the proof is the following. First, we establish the correspondence between $\mathrm{Stab}(i_k)$-invariant functions and $G$-equivariant functions. By this correspondence, we take the $\mathrm{Stab}(i_k)$-invariant functions $f_k$ corresponding to the objective function $F$. By Proposition 2.1, we can approximate each $f_k$ by a $\mathrm{Stab}(i_k)$-invariant network $\hat{f}_k$. Using the $\hat{f}_k$, we construct the $G$-equivariant network $\hat{F}$ which approximates $F$. Diagram 2 illustrates the $G$-equivariant ReLU neural network appearing in Theorem 2.2.

Thanks to our universal approximation theorems, if the invariant/equivariant models have fewer free parameters than the usual models, then we have a guarantee that justifies using the invariant/equivariant models. The following definition formalizes the swapping of nodes.

Definition 2.3.

Let $I$ be the index set of the nodes in a layer. We say that an $S_n$-action on $I$ is a union of permutations if $I = \coprod_{t} I_t$, where each $I_t$ has exactly $n$ elements and $S_n$ acts on each $I_t$ by permutation.

Theorem 2.3.

Let $M$ be an $S_n$-invariant model of depth $D$ and width $W$ whose number of equivariant layers is $D_1$ (resp. an $S_n$-equivariant model of depth $D$ and width $W$). Assume that the action on the nodes in each equivariant layer is a union of permutations. Then, the number of free parameters in this model is bounded by a constant multiple of $D_1 (W/n)^2 + (D - D_1) W^2$ (resp. of $D (W/n)^2$).

Note that the number of free parameters in the usual model of depth $D$ and width $W$ is of order $D W^2$. Hence, this theorem implies that the invariant/equivariant models have exponentially fewer free parameters than the usual models.
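To illustrate the parameter reduction in the simplest case, the sketch below implements a single $S_n$-equivariant linear layer $\mathbb{R}^n \to \mathbb{R}^n$ in the two-parameter form $\lambda I + \gamma \mathbf{1}\mathbf{1}^{\top}$ of Zaheer et al. (2017) and checks its equivariance; a generic affine layer of the same size would have $n^2 + n$ free parameters instead of three.

```python
import numpy as np

def act(sigma, v):                               # permutation action on R^n
    out = np.empty_like(v)
    out[sigma] = v
    return out

def equivariant_linear(x, lam, gam, bias=0.0):
    """S_n-equivariant affine map (lam * I + gam * 1 1^T) x + bias * 1; only 2 (+1) free parameters."""
    return lam * x + gam * x.sum() + bias

n = 6
rng = np.random.default_rng(3)
x = rng.standard_normal(n)
sigma = rng.permutation(n)
assert np.allclose(equivariant_linear(act(sigma, x), 0.7, -0.3, 0.1),
                   act(sigma, equivariant_linear(x, 0.7, -0.3, 0.1)))
# A generic affine layer R^n -> R^n would have n*n + n = 42 free parameters here instead of 3.
```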

3 Equivariant case

In this section, we prove Theorem 2.2, namely, the equivariant version of the universal approximation theorem. The key ingredient is the following proposition (its proof is in Appendix C):

Proposition 3.1.

Notation is the same as in Theorem 2.2. Then, a map $F : \mathbb{R}^n \to \mathbb{R}^n$ is $G$-equivariant if and only if $F$ can be represented as $F(x) = \big(f_k(\sigma_{k,j}^{-1} \cdot x)\big)_{1 \le k \le r,\ 1 \le j \le n_k}$ for some $\mathrm{Stab}(i_k)$-invariant functions $f_k$. Here, each $\sigma_{k,j}^{-1}$ is regarded as a linear map $\mathbb{R}^n \to \mathbb{R}^n$ by the permutation action.

For simplicity, we prove Theorem 2.2 only for $G = S_n$; the general case can be shown by a similar argument. More precisely, we construct an $S_n$-equivariant deep neural network approximating a given $S_n$-equivariant function; similarly, we can prove the theorem for any finite group $G$. To show Theorem 2.2 for $S_n$, we divide the proof into the following four steps:

  1. By Proposition 3.1, we reduce the argument on an $S_n$-equivariant map $F$ to one on a $\mathrm{Stab}(1)$-invariant function $f_1$.

  2. Modifying Theorem 2.1, we obtain a representation of the $\mathrm{Stab}(1)$-invariant function $f_1$.

  3. Using the above representation, we obtain a $\mathrm{Stab}(1)$-invariant deep neural network which approximates $f_1$, and construct a deep neural network approximating $F$.

  4. We introduce a certain action of $S_n$ on the first hidden layer, which appears naturally, and show the $S_n$-equivariance between the input layer and the first hidden layer.

We first investigate Step 1. We recall that, in this section, we only consider the action of $S_n$ on $\mathbb{R}^n$ induced from the permutation of coordinates. Then, we remark that the orbit of $1$ under this action is the whole set $[n]$, and a coset decomposition of $S_n$ by $\mathrm{Stab}(1)$ is given by the transpositions: the representatives are $\tau_{11}, \tau_{12}, \dots, \tau_{1n}$, where $\tau_{1i}$ is the transposition between $1$ and $i$ and $\tau_{11}$ is the identity. Thus, we have the following:

Corollary 3.1.

A map $F : \mathbb{R}^n \to \mathbb{R}^n$ is $S_n$-equivariant if and only if there is a $\mathrm{Stab}(1)$-invariant function $f_1 : \mathbb{R}^n \to \mathbb{R}$ satisfying $F(x) = \big(f_1(\tau_{11} \cdot x), f_1(\tau_{12} \cdot x), \dots, f_1(\tau_{1n} \cdot x)\big)$. Here, $\tau_{1i}$ is the transposition between $1$ and $i$ (with $\tau_{11}$ the identity), regarded as a linear map $\mathbb{R}^n \to \mathbb{R}^n$.
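The construction of Corollary 3.1 can be checked numerically: starting from a function $f_1$ that is invariant under permutations of the coordinates $2, \dots, n$, the map $F(x) = (f_1(\tau_{1i} \cdot x))_{i=1}^{n}$ is $S_n$-equivariant. The particular $f_1$ below is an arbitrary illustrative choice, not the one constructed in the paper.

```python
import numpy as np

def act(sigma, x):                               # (sigma . x)_j = x_{sigma^{-1}(j)}
    out = np.empty_like(x)
    out[sigma] = x
    return out

def transposition(n, i):                         # swap coordinates 0 and i (tau_{1,i+1} in 1-indexed notation)
    tau = np.arange(n)
    tau[0], tau[i] = i, 0
    return tau

def f1(x):                                       # invariant under permutations of x[1:], arbitrary in x[0]
    return np.sin(x[0]) + np.prod(np.cos(x[1:]))

def equivariant_F(x):
    """F(x)_i = f1(tau_{1,i} . x); S_n-equivariant by Corollary 3.1."""
    n = len(x)
    return np.array([f1(act(transposition(n, i), x)) for i in range(n)])

rng = np.random.default_rng(4)
x = rng.standard_normal(5)
sigma = rng.permutation(5)
assert np.allclose(equivariant_F(act(sigma, x)), act(sigma, equivariant_F(x)))
```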

Next, we consider Step 2. The stabilizer subgroup $\mathrm{Stab}(1)$ is isomorphic to $S_{n-1}$ as a group, since its elements permute $\{2, \dots, n\}$. Hence, we can regard the $\mathrm{Stab}(1)$-invariant function $f_1$ as an $S_{n-1}$-invariant function. This point of view allows us to apply Theorem 2.1 to $f_1$. Hence, we have the following representation theorem for $\mathrm{Stab}(1)$-invariant functions as a corollary of Theorem 2.1.

Corollary 3.2 (Representation of $\mathrm{Stab}(1)$-invariant functions).

Let $K \subset \mathbb{R}^n$ be a compact set and let $f_1 : K \to \mathbb{R}$ be a continuous and $\mathrm{Stab}(1)$-invariant function. Then, $f_1$ can be represented as

$$f_1(x_1, x_2, \dots, x_n) = \rho\left(x_1, \sum_{i=2}^{n} \phi(x_i)\right)$$

for some continuous function $\rho$. Here, $\phi$ is similar to the one in Theorem 2.1.

By this corollary, we can represent the $\mathrm{Stab}(1)$-invariant function $f_1$ as $f_1 = \rho \circ \Phi$, where $\Phi$ and $\rho$ are given by $\Phi(x_1, \dots, x_n) = \big(x_1, \sum_{i=2}^{n} \phi(x_i)\big)$ and the function $\rho$ of Corollary 3.2.

Then, we consider Step 3, namely, the existence of a $\mathrm{Stab}(1)$-invariant deep neural network approximating the function $f_1$. After that, using this approximator, we construct a deep neural network approximating the $S_n$-equivariant function $F$. By a slight modification of Proposition 2.1 to the $\mathrm{Stab}(1)$-invariant case, there exists a sequence of deep neural networks $\hat{\phi}_m$ (resp. $\hat{\rho}_m$) which converges to $\phi$ (resp. $\rho$) uniformly. Then, the corresponding sequence of deep neural networks $\hat{\rho}_m\big(x_1, \sum_{i=2}^{n} \hat{\phi}_m(x_i)\big)$ converges to $f_1$ uniformly.

Diagram 3: A neural network approximating the $\mathrm{Stab}(1)$-invariant function $f_1$.

Now, $f_1$ can be approximated by the deep neural network obtained by replacing $\phi$ and $\rho$ with universal approximators, as in Diagram 3. We remark that the left part of this deep neural network (the part before taking the sum) is naturally equivariant for the action of $\mathrm{Stab}(1)$. For an $S_n$-equivariant map $F$ with the natural action, by Proposition 3.1, there is a unique $\mathrm{Stab}(1)$-invariant function $f_1$ such that $F(x) = \big(f_1(\tau_{11} \cdot x), \dots, f_1(\tau_{1n} \cdot x)\big)$. Here, we regard any element of $S_n$ as a linear map from $\mathbb{R}^n$ to $\mathbb{R}^n$. By the argument in this section, we can approximate $f_1$ by the deep neural network $\hat{f}_1$ above. Substituting $\hat{f}_1$ for $f_1$, we construct a deep neural network approximating $F$, as in Diagram 2.

The function represented by this neural network is $\hat{F}(x) = \big(\hat{f}_1(\tau_{11} \cdot x), \dots, \hat{f}_1(\tau_{1n} \cdot x)\big)$. The map $\hat{F}$ splits into two parts: the part of the transpositions $\tau_{1i}$ and the part of the copies of $\hat{f}_1$. In the deep neural network corresponding to $\hat{F}$ as above, the latter part corresponds to the layers from the first hidden layer to the output layer. This part consists of $n$ copies of the same $\mathrm{Stab}(1)$-invariant deep neural network (an approximation of $f_1$). Thus, this part clearly consists of stacked layers that are equivariant for the permutation action of $S_n$ on the copies. Hence, it remains to show that the former part is also $S_n$-equivariant.

We now investigate bounds on the width and the depth of the approximators. By Theorem 2.1, each of $\phi$ and $\rho$ can be approximated by a shallow neural network. Hence, if we do not bound the width, we can obtain a deep neural network approximating $F$ with depth three. On the other hand, if we do not bound the depth, $\phi$ (resp. $\rho$) can be approximated by a deep neural network of bounded width. Thus, we can obtain a deep neural network approximating $F$ with bounded width.

Finally, as Step 4, we show that our deep neural network is an $S_n$-equivariant deep neural network. The most difficult part is to show the equivariance between the input layer and the first hidden layer, which is presented by a function of the form $x \mapsto \big(A(\tau_{11} \cdot x), \dots, A(\tau_{1n} \cdot x)\big)$ for a certain $\mathrm{Stab}(1)$-invariant linear function $A$. (Although the $n$ copies of the target space are identical as vector spaces, we distinguish them to stress the difference.) Unfortunately, the naive permutation action on the first hidden layer does not make this function equivariant. For this reason, we need to define another action of $S_n$ on the first hidden layer, exploiting the $\mathrm{Stab}(1)$-equivariance among the copies.

Definition 3.1.

We suppose that $S_n$ acts on $\mathbb{R}^n$ by permutation, denoted by “$\cdot$” (i.e., by regarding $S_n$ as a subgroup of the invertible linear maps on $\mathbb{R}^n$). Then, we define the action “$\circledast$” of $S_n$ on $(\mathbb{R}^n)^{\oplus n}$ as follows:

$$\big(\sigma \circledast (v_1, \dots, v_n)\big)_{\sigma(i)} = \pi_{\sigma, i} \cdot v_i$$

for any $\sigma \in S_n$, any $(v_1, \dots, v_n) \in (\mathbb{R}^n)^{\oplus n}$, and any $i \in [n]$. Here, for any $\sigma$ and $i$, $\pi_{\sigma, i}$ is an element of $\mathrm{Stab}(1)$ defined as $\pi_{\sigma, i} = \tau_{1, \sigma(i)} \, \sigma \, \tau_{1, i}$.

We will prove the well-definedness of this action “$\circledast$” in Appendix D. This action is obtained by the injective homomorphism

$$S_n \longrightarrow GL\big((\mathbb{R}^n)^{\oplus n}\big), \qquad \sigma \longmapsto \big(v \mapsto \sigma \circledast v\big)$$

for $\sigma \in S_n$. We remark that this action “$\circledast$” naturally appears in representation theory as the induced representation, which is an operation that constructs a representation of a group $G$ from a representation of a subgroup $H$ of $G$.

We conclude the proof of Theorem 2.2 by showing the $S_n$-equivariance of the first-layer map:

Lemma 3.1.

The first-layer function is $S_n$-equivariant (with respect to the permutation action on the input and the action “$\circledast$” on the first hidden layer).

The detailed proof is in Appendix E. By this lemma, we conclude the proof of Theorem 2.2. We remark that the affine transformation of the first layer corresponds to $A_1$ in the notation of (1). From a representation-theoretic point of view, this is an intertwining operator between these representation spaces. A priori, this affine transformation has a number of free parameters growing with $n$. However, by $S_n$-equivariance and a representation-theoretic argument, in principle, it can be written with only five parameters. By a similar argument, the affine transformation of each of the other hidden layers also requires only a bounded number of parameters (though a priori it has many more).

4 Dimension reduction

In this section, we give the proof of Theorem 2.3.

Proposition 4.1.

Let $A : \mathbb{R}^a \to \mathbb{R}^b$ be an $S_n$-equivariant linear map. Assume that the $S_n$-actions on $\mathbb{R}^a$ and $\mathbb{R}^b$ are unions of permutations. Then, $n$ divides $a$ and $b$, and the number of free parameters in $A$ is equal to $2(a/n)(b/n)$.

Proof.

Since $\mathbb{R}^a$ and $\mathbb{R}^b$ carry union-of-permutation actions, by considering the orbits of the coordinates, we see that $n$ divides $a$ and $b$. Let us write $a = pn$ and $b = qn$. In this case, $A$ is written as a sum of matrices $A_{s,t}$ ($1 \le s \le q$, $1 \le t \le p$), namely $A = \sum_{s,t} A_{s,t}$. Here, each matrix $A_{s,t}$ corresponds to the linear map

$$\mathbb{R}^n \hookrightarrow \mathbb{R}^a \xrightarrow{\ A\ } \mathbb{R}^b \twoheadrightarrow \mathbb{R}^n,$$

where the first map is the inclusion of the $t$-th block of $n$ coordinates of $\mathbb{R}^a$ and the last map is the projection onto the $s$-th block of $n$ coordinates of $\mathbb{R}^b$. Since these constructions are compatible with the $S_n$-action, we see that each $A_{s,t}$ is $S_n$-equivariant. If the activation functions were bijective, we would be done by the same discussion as in the proof of Lemma 3 in Zaheer et al. (2017). In our case, however, we need a further discussion, which is given in Appendix G. ∎

Proof of Theorem 2.3.

By Proposition 4.1, the number of free parameters in each equivariant layer is bounded by $2(W/n)^2$. Hence we obtain the desired bound. ∎

5 Conclusion

We introduced invariant/equivariant models of deep neural networks which are universal approximators for invariant/equivariant functions. The universal approximation theorems in this paper and the discussion in Section 4 show that although our invariant/equivariant models have exponentially fewer free parameters than the usual models, they can approximate invariant/equivariant functions to arbitrary accuracy. Our theory also suggests that group-structured models can behave as well as the usual models on tasks related to groups, and that representation theory can be a powerful tool for the theory of machine learning. This is a promising perspective for developing models in deep learning.

References

  • Alex et al. (2012) Alex Krizhevsky, Ilya Sutskever, and Geoffrey E. Hinton. Imagenet classification with deep convolutional neural networks. In Advances in neural information processing systems, pp. 1097–1105, 2012.
  • Arnold (1957) Vladimir I. Arnold. On functions of three variables. Proceeding of the USSR Academy of Sciences, 114:679–681, 1957. English translation: Amer. Math. Soc. Transl., 28 (1963), pp. 51–54.
  • Barron (1994) Andrew R Barron. Approximation and estimation bounds for artificial neural networks. Machine learning, 14(1):115–133, 1994.
  • Bloem-Reddy & Teh (2019) Benjamin Bloem-Reddy and Yee Whye Teh. Probabilistic symmetry and invariant neural networks. arXiv preprint arXiv:1901.06082, 2019. URL https://arxiv.org/abs/1901.06082.
  • Cohen & Welling (2016) Taco S Cohen and Max Welling. Group equivariant convolutional networks. Proceedings of the 33rd International Conference on Machine Learning, 2016. arXiv preprint arXiv:1602.07576.
  • Cybenko (1989) George Cybenko. Approximation by superpositions of a sigmoidal function. Mathematics of control, signals and systems, 2(4):303–314, 1989.
  • Funahashi (1989) Ken-Ichi Funahashi. On the approximate realization of continuous mappings by neural networks. Neural networks, 2(3):183–192, 1989.
  • Gens & Domingos (2014) Robert Gens and Pedro M Domingos. Deep symmetry networks. In Advances in Neural Information Processing Systems, pp. 2537–2545, 2014.
  • Goodfellow et al. (2013) Ian J Goodfellow, Yaroslav Bulatov, Julian Ibarz, Sacha Arnoud, and Vinay Shet. Multi-digit number recognition from street view imagery using deep convolutional neural networks. arXiv preprint arXiv:1312.6082, 2013.
  • Hanin & Sellke (2017) Boris Hanin and Mark Sellke. Approximating continuous functions by relu nets of minimal width. arXiv preprint arXiv:1710.11278, 2017.
  • Hartford et al. (2018) Jason Hartford, Devon Graham, Kevin Leyton-Brown, and Siamak Ravanbakhsh. Deep models of interactions across sets. In International Conference on Machine Learning, pp. 1914–1923, 2018.
  • Hornik et al. (1989) Kurt Hornik, Maxwell Stinchcombe, and Halbert White. Multilayer feedforward networks are universal approximators. Neural networks, 2(5):359–366, 1989.
  • Keriven & Peyré (2019) Nicolas Keriven and Gabriel Peyré. Universal invariant and equivariant graph neural networks. arXiv preprint arXiv:1905.04943, 2019.
  • Kolmogorov (1956) Andrey Nikolaevich Kolmogorov. On the representation of continuous functions of several variables by superpositions of continuous functions of a smaller number of variables. Proceeding of the USSR Academy of Sciences, 108:176–182, 1956. English translation: Amer. Math. Soc. Transl., 17 (1961), pp. 369–373.
  • Kondor & Trivedi (2018) Risi Kondor and Shubhendu Trivedi. On the generalization of equivariance and convolution in neural networks to the action of compact groups. In Jennifer Dy and Andreas Krause (eds.), Proceedings of the 35th International Conference on Machine Learning, volume 80 of Proceedings of Machine Learning Research, pp. 2747–2755, Stockholmsmässan, Stockholm Sweden, 10–15 Jul 2018. PMLR.
  • Kůrková (1992) Věra Kůrková. Kolmogorov’s theorem and multilayer neural networks. Neural networks, 5(3):501–506, 1992.
  • Kůrková & Sanguineti (2002) Věra Kůrková and Marcello Sanguineti. Comparison of worst case errors in linear and neural network approximation. IEEE Transactions on Information Theory, 48(1):264–275, 2002.
  • Maron et al. (2019a) Haggai Maron, Heli Ben-Hamu, Nadav Shamir, and Yaron Lipman. Invariant and equivariant graph networks. In International Conference on Learning Representations, 2019a. URL https://openreview.net/forum?id=Syx72jC9tm.
  • Maron et al. (2019b) Haggai Maron, Ethan Fetaya, Nimrod Segol, and Yaron Lipman. On the universality of invariant networks. Proceedings of the 36th International Conference on Machine Learning, 97, 2019b.
  • Qi et al. (2017) Charles R Qi, Hao Su, Kaichun Mo, and Leonidas J Guibas. Pointnet: Deep learning on point sets for 3d classification and segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 652–660, 2017.
  • Risi Kondor (2018) Risi Kondor, Hy Truong Son, Horace Pan, Brandon Anderson, and Shubhendu Trivedi. Covariant compositional networks for learning graphs, 2018. URL https://openreview.net/forum?id=S1TgE7WR-.
  • Silver et al. (2017) David Silver, Julian Schrittwieser, Karen Simonyan, Ioannis Antonoglou, Aja Huang, Arthur Guez, Thomas Hubert, Lucas Baker, Matthew Lai, Adrian Bolton, et al. Mastering the game of go without human knowledge. Nature, 550(7676):354, 2017.
  • Sonoda & Murata (2017) Sho Sonoda and Noboru Murata. Neural network with unbounded activation functions is universal approximator. Applied and Computational Harmonic Analysis, 43(2):233–268, 2017.
  • Wan et al. (2013) Li Wan, Matthew Zeiler, Sixin Zhang, Yann Le Cun, and Rob Fergus. Regularization of neural networks using dropconnect. In International conference on machine learning, pp. 1058–1066, 2013.
  • Yarotsky (2018) Dmitry Yarotsky. Universal approximations of invariant maps by neural networks. arXiv preprint arXiv:1804.10306, 2018. URL: https://arxiv.org/abs/1804.10306.
  • Zaheer et al. (2017) Manzil Zaheer, Satwik Kottur, Siamak Ravanbakhsh, Barnabas Poczos, Ruslan R Salakhutdinov, and Alexander J Smola. Deep sets. In Advances in neural information processing systems, pp. 3391–3401, 2017.

Appendix A Review of groups and group actions

Let $G$ be a set with a product, i.e., for any $g, h \in G$, an element $gh \in G$ is defined. Then, $G$ is called a group if it satisfies the following conditions:

  1. There is an element $e \in G$ such that $eg = ge = g$ for all $g \in G$.

  2. For any $g \in G$, there is an element $g^{-1} \in G$ such that $g g^{-1} = g^{-1} g = e$.

  3. For any $g, h, k \in G$, $(gh)k = g(hk)$ holds.

Let $G$ and $G'$ be two finite groups. We say that a map $\varphi : G \to G'$ is a (group) homomorphism if $\varphi(gh) = \varphi(g)\varphi(h)$ for any $g, h \in G$. This means that the map $\varphi$ preserves the group structure of $G$ in $G'$.

Next, we review actions of groups. Let $X$ be a set. An action of $G$ (or a $G$-action) on $X$ is defined as a map $G \times X \to X$, $(\sigma, x) \mapsto \sigma \cdot x$, satisfying the following:

  1. For any $x \in X$, $e \cdot x = x$.

  2. For any $\sigma, \tau \in G$ and $x \in X$, $\sigma \cdot (\tau \cdot x) = (\sigma \tau) \cdot x$.

If these conditions are satisfied, we say that $G$ acts on $X$ by a left action.

Example A.1.

An example which we mainly consider in this paper is the permutation group of the $n$ elements $[n] = \{1, \dots, n\}$:

$$S_n = \{\sigma : [n] \to [n] \mid \sigma \text{ is bijective}\},$$

and the product of $S_n$ is given by the composition of maps. $S_n$ acts on the set $[n]$ by the permutation $\sigma \cdot i = \sigma(i)$.

We remark that an action of $S_n$ on a set $X$ is not unique; for example, the trivial action $\sigma \cdot x = x$ for any $\sigma$ and $x$ is also an action. Hence, when we need to stress the difference between actions, we use a distinguished notation for each action, such as “$\cdot$” or “$\circledast$”.

Let $G$ be a group and $H$ a subset of $G$. We call $H$ a subgroup of $G$ if $H$ is itself a group with the same product as $G$.

Example A.2.

Let $G$ be a finite group acting on a set $X$. For an element $x \in X$, the stabilizer subgroup of $G$ with respect to $x$ is defined by

$$\mathrm{Stab}_G(x) = \{\sigma \in G \mid \sigma \cdot x = x\}.$$

When $G \subset S_n$ acts on $X = [n]$ and $x = i$, we use the following notation: $\mathrm{Stab}(i) := \mathrm{Stab}_G(i)$.

Let $G$ and $G'$ be two groups. If there is an injective homomorphism $\iota : G \to G'$, the image $\iota(G)$ is a subgroup of $G'$. Then, we say that $\iota$ is an embedding of the group $G$ into $G'$. Moreover, if $G'$ acts on a set $X$, then $G$ also acts on $X$ through $\iota$, i.e., by $\sigma \cdot x := \iota(\sigma) \cdot x$ for $\sigma \in G$ and $x \in X$.

Then, the following proposition holds:

Proposition A.1.

Any finite group $G$ can be embedded into $S_n$ for some positive integer $n$.

Proof.

Let $n = |G|$ and write $G = \{g_1, \dots, g_n\}$. For any $g \in G$ and any $i$, we have $g g_i = g_j$ for some $j$, as $g g_i \in G$. We set $\sigma_g(i) = j$. Then, we define $\iota : G \to S_n$ by $\iota(g) = \sigma_g$. It is easy to show that this is an injective homomorphism. ∎

This proposition implies that any finite group $G$ can be realized as a group of permutations acting on $[n]$ for some $n$.
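As a concrete instance of Proposition A.1 (Cayley's theorem), the sketch below embeds the cyclic group $\mathbb{Z}/4\mathbb{Z}$ into $S_4$ by letting each element act on the group itself by left multiplication (addition modulo 4), and checks that the map is a homomorphism. The example group is our illustrative choice.

```python
elements = list(range(4))                        # Z/4Z = {0, 1, 2, 3} with addition mod 4

def to_permutation(g):
    """The permutation of {0, 1, 2, 3} given by left multiplication h -> g + h (mod 4)."""
    return [(g + h) % 4 for h in elements]

# g -> to_permutation(g) is an injective group homomorphism into S_4:
for g in elements:
    for h in elements:
        composed = [to_permutation(g)[k] for k in to_permutation(h)]
        assert composed == to_permutation((g + h) % 4)
```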

Let $G$ be a finite group acting on a set $X$. Then, for $x \in X$, we define the ($G$-)orbit of $x$ as

$$G \cdot x = \{\sigma \cdot x \mid \sigma \in G\}.$$

Then, for $x, y \in X$, the relation that $x$ and $y$ are in the same orbit is an equivalence relation. Hence, $X$ can be divided into a disjoint union of the equivalence classes of this relation:

$$X = \coprod_{k} G \cdot x_k.$$

We call this the ($G$-)orbit decomposition of $X$.

Let $H$ be a subgroup of a finite group $G$. Then, for $g \in G$, the set

$$gH = \{gh \mid h \in H\}$$

is called the left coset of $g$ with respect to $H$. The relation that two elements $g$ and $g'$ are in the same coset is also an equivalence relation. Hence, we can divide $G$ into a disjoint union of the equivalence classes of this relation:

$$G = \coprod_{j} g_j H.$$

We call this decomposition the left coset decomposition of $G$ by $H$. We write $G/H$ for the set of the left cosets of $G$ by $H$:

$$G/H = \{gH \mid g \in G\}.$$

Then, there is a relation between an orbit and a set of cosets:

Proposition A.2.

Let $G$ be a finite group acting on a set $X$. For $x \in X$, the map

$$G/\mathrm{Stab}_G(x) \to G \cdot x, \qquad g\,\mathrm{Stab}_G(x) \mapsto g \cdot x$$

is bijective.

Proof.

It is easy to check well-definedness and bijectivity. ∎

For $S_n$ acting on $[n]$, there is only one $S_n$-orbit, i.e., $S_n \cdot 1 = [n]$. Hence, the following holds (the permutation action of $S_n$ on $\mathbb{R}^n$ is defined by taking the inverse $\sigma^{-1}$, hence we need to consider the set of right cosets):

Corollary A.1.

The map

$$\mathrm{Stab}(1)\backslash S_n \to [n], \qquad \mathrm{Stab}(1)\,\sigma \mapsto \sigma^{-1}(1)$$

is bijective.

Appendix B Proof of Proposition 2.1

Proof of Proposition 2.1.

We may assume . In fact, since we consider the -norm, if all components of