
Intertwiners between Induced Representations
with Applications to the Theory of Equivariant Neural Networks
(Preliminary Report)

Taco S. Cohen (Qualcomm Research, Qualcomm Technologies Netherlands B.V.), Mario Geiger (EPFL), Maurice Weiler (QUVA Lab, University of Amsterdam)
February 2018
Abstract

Group equivariant and steerable convolutional neural networks (regular and steerable G-CNNs) have recently emerged as a very effective model class for learning from signal data such as 2D and 3D images, video, and other data where symmetries are present. In geometrical terms, regular G-CNNs represent data in terms of scalar fields (“feature channels”), whereas the steerable G-CNN can also use vector or tensor fields (“capsules”) to represent data. In algebraic terms, the feature spaces in regular G-CNNs transform according to a regular representation of the group $G$, whereas the feature spaces in steerable G-CNNs transform according to the more general induced representations of $G$. In order to make the network equivariant, each layer in a G-CNN is required to intertwine between the induced representations associated with its input and output space.

In this paper we present a general mathematical framework for G-CNNs on homogeneous spaces like Euclidean space or the sphere. We show, using elementary methods, that the layers of an equivariant network are convolutional if and only if the input and output feature spaces transform according to an induced representation. This result, which follows from G.W. Mackey’s abstract theory on induced representations, establishes G-CNNs as a universal class of equivariant network architectures, and generalizes the important recent work of Kondor & Trivedi on the intertwiners between regular representations.

In order for a convolution layer to be equivariant, the filter kernel needs to satisfy certain linear equivariance constraints. The space of equivariant kernels has a rich and interesting structure, which we expose using direct calculations.

Additionally, we show how this general understanding can be used to compute a basis for the space of equivariant filter kernels, thereby providing a straightforward path to the implementation of G-CNNs for a wide range of groups and manifolds.


1 Introduction

In recent years, the Convolutional Neural Network (CNN) has emerged as the primary model class for learning from signals such as audio, images, and video. Through the use of convolution layers, the CNN is able to exploit the spatial locality of the input space, and the translational symmetry (invariance) that is inherent in many learning problems. Because convolutions are translation equivariant (a shift of the input leads to a shift of the output), convolution layers preserve the translation symmetry. This is important, because it means that further layers of the network can also exploit the symmetry.

Motivated by the success of CNNs, many researchers have worked on generalizations, leading to a growing body of work on group equivariant networks (Cohen and Welling, 2016, 2017; Worrall et al., 2017; Weiler et al., 2018; Thomas et al., 2018; Kondor, 2018). Generalization has happened along two mostly orthogonal directions. Firstly, the symmetry groups that can be exploited were expanded beyond pure translations to other transformations such as rotations and reflections, by replacing convolutions with group convolutions (Cohen and Welling, 2016). The feature maps in these networks transform as scalar fields on the group $G$ or a homogeneous space $G/H$ (Kondor and Trivedi, 2018). We will refer to such networks as regular G-CNNs, because the transformation law for scalar fields is known as the regular representation of $G$.

Initially, regular G-CNNs were implemented for planar images, acted on by discrete translations, rotations, and reflections. Such discrete G-CNNs have the advantage that they are easy to implement, easy to use, fast, and yield improved results in a wide range of practical problems, making them a natural starting point for the generalization of CNNs. However, the concept is much more general: because G-CNNs were formulated in abstract group-theoretic language, they are easily generalized to any group or homogeneous space that we can sum or integrate over (Kondor and Trivedi, 2018). For instance, Spherical CNNs (Cohen et al., 2018) are G-CNNs for the 3D rotation group $SO(3)$ acting on the sphere $S^2$ (a homogeneous space for $SO(3)$).

Figure 1: To transform a planar vector field by a rotation $r$, first move each arrow to its new position, keeping its orientation the same, then rotate the vector itself. This is described by the induced representation $[\pi(r)f](x) = \rho(r)\, f(r^{-1}x)$, where $\rho(r)$ is a rotation matrix that mixes the two coordinate channels.

The second direction of generalization corresponds to a move away from scalar fields. Using connections to the theory of steerable filters (Freeman and Adelson, 1991) and induced representations (Ceccherini-Silberstein et al., 2009; Figueroa-O’Farrill, 1987; Gurarie, 1992), the feature space was generalized to vector- and tensor fields, and even more general spaces (sections of homogeneous vector bundles) (Cohen and Welling, 2017; Weiler et al., 2018; Worrall et al., 2017; Thomas et al., 2018; Kondor, 2018; Kondor et al., 2018). We will refer to these networks as steerable or induced G-CNNs, because the filters in these networks are steerable, and the associated transformation law is called the induced representation (see Fig. 1).

Thus, the general picture that has emerged is one of networks that use convolutions to map between spaces of sections of homogeneous vector bundles in a group equivariant manner. The classical CNN, mapping scalar fields (a.k.a. feature channels) on the plane to scalar fields on the plane in a translation equivariant manner, is but one special case. In this paper we study the general class of induced G-CNNs, and in particular the space of equivariant linear maps (intertwiners) between two induced representations associated with the input and output feature space of a network layer. We show that any equivariant map between induced representations can be written as a (twisted) convolution / cross-correlation, thus generalizing the results of Kondor and Trivedi (2018), who showed this for regular representations (since the regular representation of $G$ on $G/H$ is the representation of $G$ induced from the trivial representation of $H$, the results of Kondor & Trivedi can be obtained from ours by filling in the trivial representation for $\rho$ throughout this paper).

The induced representation has been studied extensively by physicists and mathematicians. The word “induced” comes from the fact that the transformation law of e.g. a vector field can be inferred from the transformation law of an individual vector under the action of a certain isotropy (or “stabilizer”) subgroup of the symmetry group. For instance, when applying a 3D rotation to a vector field on the sphere, each vector is moved to a new position by the 3D rotation, and the vector itself is rotated in its tangent plane by a 2D rotation (this is illustrated in Fig. 1 for a planar vector field). Thus, we say that this vector field transforms according to the representation of $SO(3)$ induced by the canonical representation of $SO(2)$. As another example, a higher order tensor transforms according to a different representation of $SO(2)$, so a tensor field on the sphere transforms according to a different induced representation of $SO(3)$.

Induced representations are important in physics because they are the primary tool to construct irreducible representations, which enumerate the types of elementary particles of a physical (field) theory. In representation learning, the idea of irreducible representations as elementary particles has been applied to formalize the idea of “disentangling” or “capsules” that represent distinct visual entities (Cohen and Welling, 2014, 2015), each of which has a certain type (Cohen and Welling, 2017). Indeed, we think of induced G-CNNs as the mathematically grounded version of Hinton’s idea of capsules (sans dynamic routing, for now) (Hinton et al., 2011; Sabour et al., 2017; Hinton et al., 2018).

The general formalism of fiber bundles has also been proposed as a geometrical tool for modelling early visual processing in the mammalian brain (Petitot, 2003). Although it is far too early to conclude anything, this convergence of physics, neuroscience, and machine learning suggests that field theories are not just for physicists, but provide a generally useful model class for natural and man-made learning systems.

1.1 Outline and Summary of Results

In order to understand and properly define the induced representation, we need some notions from group and representation theory, such as groups, cosets, double cosets, quotients, sections, and representations. In section 2.1 we will define these concepts and illustrate them with two examples: the rotation group $SO(3)$ and the Euclidean motion group $SE(3)$. Although necessary for a detailed understanding of the rest of the paper, this section is rather dry and may be skimmed on a first reading.

Induced representations are defined in section 3. We present two of the many equivalent realizations of the induced representation. The first realization describes the transformation law for the vector space $\mathcal{I}_{G/H}$ of sections of a vector bundle over $G/H$, such as vector fields over the sphere $S^2$. This realization is geometrically natural, and such vector fields can be stored efficiently in computer memory, making them the preferred realization for implementations of induced G-CNNs. The downside of this realization is that, due to the use of an arbitrary frame of reference (choice of section), the equations describing it get quite cumbersome. For this reason, we also discuss the induced representation realized in the space $\mathcal{I}_G$ of vector-valued functions on $G$ having a certain kind of symmetry (the space of Mackey functions). We define a “lifting” isomorphism $\Lambda$ from $\mathcal{I}_{G/H}$ to $\mathcal{I}_G$ to show that they are equivalent:

$\Lambda : \mathcal{I}_{G/H} \xrightarrow{\ \sim\ } \mathcal{I}_G.$    (1)

In section 4 we study the space of linear equivariant maps, or intertwiners, between two representations $\pi_1 = \mathrm{Ind}_{H_1}^G \rho_1$ and $\pi_2 = \mathrm{Ind}_{H_2}^G \rho_2$, induced from representations $\rho_1$ and $\rho_2$ of subgroups $H_1$ and $H_2$. Denoting this space by $\mathcal{H}_G$ or $\mathcal{H}_{G/H}$ (depending on the chosen realization of $\pi_1$ and $\pi_2$), we find that (of course) they are equivalent, and more importantly, that any equivariant map can be written as a special kind of convolution or correlation with an equivariant kernel on $G$ or $G/H_1$, respectively. Furthermore, these spaces of equivariant kernels, denoted $\mathcal{K}_G$ and $\mathcal{K}_{G/H_1}$, are shown to be equivalent to a space of kernels on the double coset space $H_2\backslash G/H_1$, denoted $\mathcal{K}_{H_2\backslash G/H_1}$. This is summarized in the following diagram of isomorphisms:

$\mathcal{H}_G \;\cong\; \mathcal{H}_{G/H} \;\cong\; \mathcal{K}_G \;\cong\; \mathcal{K}_{G/H_1} \;\cong\; \mathcal{K}_{H_2\backslash G/H_1}.$    (2)

The map $\mathcal{K}_G \to \mathcal{H}_G$ takes a kernel $\kappa$ to the “neural network layer” $\Phi = \kappa \star \cdot$,

$[\kappa \star f](g) = \int_G \kappa(g^{-1}g')\, f(g')\, dg',$    (3)

by using the kernel in a cross-correlation, denoted $\star$. The map $\mathcal{K}_{G/H_1} \to \mathcal{H}_{G/H}$ is defined similarly. That $\kappa \mapsto \kappa \star \cdot$ is an isomorphism means that any equivariant map can be written as a convolution with an appropriate kernel $\kappa$.

The kernels in $\mathcal{K}_G$ and $\mathcal{K}_{G/H_1}$ have to satisfy certain equivariance constraints. These constraints can be largely resolved by moving to $\mathcal{K}_{H_2\backslash G/H_1}$, where finding a solution is typically easier. Using the results of this paper, finding a basis for the space of equivariant filters for a new group should be relatively straightforward.

Having seen the main results derived in a relatively concrete manner, we proceed in section 5 to show how these results relate to Mackey’s theory of induced representations, which is usually presented in a more abstract language. Then, in section 6, we show how to actually compute a basis for $\mathcal{K}_{H_2\backslash G/H_1}$ for the case of $SO(3)$ and $SE(3)$.

2 Mathematical Background

2.1 General facts about Groups and Quotients

Let $G$ be a group and $H$ a subgroup of $G$. A left coset of $H$ in $G$ is a set $gH = \{gh \mid h \in H\}$ for some $g \in G$. The cosets form a partition of $G$. The set of all cosets is called the quotient space or coset space, and is denoted $G/H$. There is a canonical projection $p : G \to G/H$ that assigns to each element the coset it is in. This can be written as $p(g) = gH$. Fig. 2 provides an illustration for the group of symmetries of a triangle, and the subgroup of reflections.

The quotient space carries a left action of $G$, which we denote with $x \mapsto gx$ for $g \in G$ and $x \in G/H$. This works well because the action is compatible (associative) with the group operation:

$g_1 (g_2 (gH)) = (g_1 g_2)(gH) = g_1 g_2\, g H,$    (4)

for $g_1, g_2, g \in G$. One may verify that this action is well defined, i.e. does not depend on the particular coset representative $g$. Furthermore, the action is transitive, meaning that we can reach any coset from any other coset by transforming it with an appropriate $g \in G$. A space like $G/H$ on which $G$ acts transitively is called a homogeneous space for $G$. Indeed, any homogeneous space is isomorphic to some quotient space $G/H$.

A section of $p$ is a map $s : G/H \to G$ such that $p(s(x)) = x$ for all $x \in G/H$. We can think of $s$ as choosing a coset representative for each coset, i.e. $s(x) \in x$. In general, although $p$ is unique, $s$ is not; there can be many ways to choose coset representatives. However, the constructions we consider will always be independent of the particular choice of section.

Although it is not strictly necessary, we will assume that $s$ maps the coset of the identity, $eH = H$, to the identity $e$:

$s(H) = e.$    (5)

We can always do this, for given a section $s'$ with $s'(H) \neq e$, we can define the section $s(x) = s'(x)\, s'(H)^{-1}$ so that $s(H) = e$. This is indeed a section, for $p(s(x)) = p(s'(x)\, s'(H)^{-1}) = s'(x)\, s'(H)^{-1} H = s'(x) H = x$ (where we used Eq. 4, which can be rewritten as $p(g g') = g\, p(g')$, together with $s'(H)^{-1} \in H$).

One useful rule of calculation is

$p(g\, s(x)) = g x,$    (6)

for $g \in G$ and $x \in G/H$. The projection onto $G/H$ is necessary, for in general $g\, s(x) \neq s(gx)$. These two terms are however related, through a function $h : G \times G/H \to H$, defined as follows:

$h(g, x) = s(gx)^{-1}\, g\, s(x).$    (7)

That is,

$g\, s(x) = s(gx)\, h(g, x).$    (8)

We can think of $h(g, x)$ as the element of $H$ that we can apply to $s(gx)$ (on the right) to get $g\, s(x)$. The function $h$ will play an important role in the definition of the induced representation, and is illustrated in Fig. 2.

Figure 2: A Cayley diagram of the group of symmetries of a triangle. The group is generated by rotations $r$ and flips $f$. The elements of the group are indicated by hexagons. The red arrows correspond to right multiplication by $r$, while the blue lines correspond to right multiplication by $f$. Cosets of the group of flips ($H = \{e, f\}$) are shaded in gray. As always, the cosets partition the group. As coset representatives, we choose $e$, $r$, and $r^2$. The difference between $g\, s(x)$ and $s(gx)$ is indicated. For this choice of section, $h(g, x)$ must be set so that $g\, s(x) = s(gx)\, h(g, x)$ (Eq. 8).

From the fiber bundle perspective, we can interpret Eq. 8 as follows. The group $G$ can be viewed as a principal bundle with base space $G/H$ and fibers isomorphic to $H$. If we apply $g$ to the coset representative $s(x)$, we move to a different coset, namely the one represented by $s(gx)$ (a different point in the base space). Additionally, the fiber is twisted by the right action of $h(g, x)$: the result $g\, s(x)$ is not the representative $s(gx)$ itself, but the element $s(gx)\, h(g, x)$ of the same fiber.

The following composition rule for $h$ is very useful in derivations:

$h(g_1 g_2, x) = h(g_1, g_2 x)\, h(g_2, x).$    (9)

For elements $\eta \in H$, we find:

$h(\eta, H) = \eta.$    (10)

Also, for any coset $x \in G/H$,

$h(e, x) = e.$    (11)

Using Eq. 9 and 11, this yields,

$h(g^{-1}, g x) = h(g, x)^{-1},$    (12)

for any $g \in G$ and $x \in G/H$.

For $x = H$ (the coset of the identity), Eq. 8 specializes to:

$g = s(gH)\, h(g, H),$    (13)

where we defined

$\mathfrak{h}(g) \equiv h(g, H) = s(gH)^{-1}\, g.$    (14)

This shows that we can always factorize $g$ uniquely into a part $s(gH)$ that represents the coset of $g$, and a part $\mathfrak{h}(g) \in H$ that tells us where $g$ is within that coset:

$g = s(gH)\, \mathfrak{h}(g).$    (15)

A useful property of $\mathfrak{h}$ is that for any $\eta \in H$,

$\mathfrak{h}(g\eta) = \mathfrak{h}(g)\, \eta.$    (16)

It is also easy to see that

$\mathfrak{h}(s(x)) = e.$    (17)

When dealing with two different subgroups $H_1$ and $H_2$ of $G$ (associated with the input and output space of an intertwiner), we will write $\eta_i$ for an element of $H_i$, $s_i$ for the corresponding section, and $h_i$, $\mathfrak{h}_i$ for the corresponding $h$-functions (for $i = 1, 2$).
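
As a concrete sanity check on these definitions, the following minimal Python sketch (our own illustration, not code from the paper) realizes the symmetry group of the triangle from Fig. 2 as permutations of the three vertices, takes $H$ to be a two-element flip subgroup, constructs the quotient $G/H$, a section $s$ with $s(H) = e$, and the function $h(g, x)$, and verifies Eqs. 8, 9, 15 and 16 by brute force.

```python
from itertools import permutations

# The symmetry group of the triangle, realized as permutations of the vertices {0, 1, 2}.
G = list(permutations(range(3)))
e = (0, 1, 2)
compose = lambda a, b: tuple(a[b[i]] for i in range(3))          # (a b)(i) = a[b[i]]
inverse = lambda a: tuple(sorted(range(3), key=lambda i: a[i]))  # a^{-1}

H = [e, (0, 2, 1)]                                               # a two-element "flip" subgroup

coset = lambda g: frozenset(compose(g, eta) for eta in H)        # the left coset gH
G_mod_H = []                                                     # the quotient space G/H
for g in G:
    if coset(g) not in G_mod_H:
        G_mod_H.append(coset(g))

# A section s: G/H -> G with s(H) = e (Eq. 5).
s = {x: (e if e in x else min(x)) for x in G_mod_H}

def h(g, x):
    """h(g, x) = s(gx)^{-1} g s(x)   (Eq. 7)."""
    gx = coset(compose(g, s[x]))
    return compose(inverse(s[gx]), compose(g, s[x]))

eH = coset(e)
for g in G:
    for x in G_mod_H:
        gx = coset(compose(g, s[x]))
        assert h(g, x) in H
        assert compose(g, s[x]) == compose(s[gx], h(g, x))                                      # Eq. 8
        for g2 in G:
            assert h(compose(g, g2), x) == compose(h(g, coset(compose(g2, s[x]))), h(g2, x))    # Eq. 9
    frak_h = h(g, eH)                                                                           # Eq. 14
    assert g == compose(s[coset(g)], frak_h)                                                    # Eq. 15
    for eta in H:
        assert h(compose(g, eta), eH) == compose(frak_h, eta)                                   # Eq. 16
print("Eqs. 8, 9, 15 and 16 hold for the triangle group.")
```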

2.2 Double cosets

An $(H_2, H_1)$-double coset is a set of the form $H_2\, g\, H_1$ for subgroups $H_1, H_2$ of $G$. The space of $(H_2, H_1)$-double cosets is called $H_2\backslash G/H_1$. As with left cosets, we assume a section $\gamma : H_2\backslash G/H_1 \to G$ is given, satisfying $\gamma(H_2 e H_1) = e$.

The double coset space can be understood as the space of $H_2$-orbits in $G/H_1$, that is, $H_2\backslash G/H_1 = \{H_2 x \mid x \in G/H_1\}$. Note that although $G$ acts transitively on $G/H_1$ (meaning that there is only one $G$-orbit in $G/H_1$), the subgroup $H_2$ does not. Hence, the space $G/H_1$ splits into a number of disjoint orbits $H_2 x$ (for $x \in G/H_1$), and these are precisely the double cosets.

Of course, $H_2$ does act transitively within a single orbit $H_2 x$, sending $x \mapsto \eta_2 x$ (both of which are in $H_2 x$, for $\eta_2 \in H_2$). In general this action is not necessarily fixed point free, which means that there may exist some $\eta_2 \in H_2$ which map the left coset $x$ to itself. These are exactly the elements in the stabilizer of $x$, given by

$H_2^x = \{\eta \in H_2 \mid \eta\, x = x\}.$    (18)

Clearly, $H_2^x$ is a subgroup of $H_2$. Furthermore, $H_2^x$ is conjugate to (and hence isomorphic to) the subgroup $s_1(x)^{-1}\, H_2^x\, s_1(x) = s_1(x)^{-1}\, H_2\, s_1(x) \cap H_1$, which is a subgroup of $H_1$.

For double cosets we will overload the notation, writing $H_2^\gamma \equiv H_2^{\gamma H_1}$ for the double coset with representative $\gamma$. Like the coset stabilizer, this double coset stabilizer can be expressed as

$H_2^\gamma = \{\eta \in H_2 \mid \eta\, \gamma H_1 = \gamma H_1\} = H_2 \cap \gamma H_1 \gamma^{-1}.$    (19)

2.3 Semidirect products

For a semidirect product group $G = N \rtimes H$, such as the Euclidean motion group $SE(3) = \mathbb{R}^3 \rtimes SO(3)$, some things simplify. Let $G = N \rtimes H$, where $H$ is a subgroup, $N$ is a normal subgroup, and $G = NH$ with $N \cap H = \{e\}$. For every $g \in G$ there is a unique way of decomposing it into $g = nh$, where $n \in N$ and $h \in H$. Thus, the left coset $gH$ depends only on the $N$-part of $g$:

$gH = nhH = nH.$    (20)

It follows that for a semidirect product group, we can define the section $s$ so that it always outputs an element of $N$, instead of a general element of $G$. Specifically, we can set $s(nH) = n$. It follows that $s(gH) = n$ for $g = nh$. This allows us to simplify expressions involving $h$ and $\mathfrak{h}$:

$\mathfrak{h}(nh) = h, \qquad h(nh, x) = h \quad \text{(independent of } x\text{)}.$    (21)
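
For concreteness, here is a small sketch (ours, assuming the standard parameterization of $SE(2) = \mathbb{R}^2 \rtimes SO(2)$ by a translation vector and a rotation angle) showing that with the section $s(x) = (x, 0)$, the function $h(g, x)$ is just the rotation part of $g$, independent of $x$, as in Eq. 21.

```python
import numpy as np

# Elements of SE(2) = R^2 ⋊ SO(2) as pairs (t, theta): first rotate by theta, then translate by t.
def compose(g1, g2):
    (t1, a1), (t2, a2) = g1, g2
    R1 = np.array([[np.cos(a1), -np.sin(a1)], [np.sin(a1), np.cos(a1)]])
    return (t1 + R1 @ t2, a1 + a2)

def inverse(g):
    t, a = g
    R_inv = np.array([[np.cos(-a), -np.sin(-a)], [np.sin(-a), np.cos(-a)]])
    return (-(R_inv @ t), -a)

# The quotient SE(2)/SO(2) is the plane R^2; the projection keeps only the translation part (Eq. 20).
project = lambda g: g[0]
section = lambda x: (np.asarray(x, float), 0.0)     # s(x) = (x, 0), a pure translation (element of N)

def h(g, x):
    """h(g, x) = s(gx)^{-1} g s(x) -- should be a pure rotation, i.e. an element of H = SO(2)."""
    gx = project(compose(g, section(x)))
    return compose(inverse(section(gx)), compose(g, section(x)))

rng = np.random.default_rng(0)
g = (rng.normal(size=2), 1.3)                       # a generic motion: rotate by 1.3 rad, then translate
for x in [np.array([0.0, 0.0]), np.array([2.0, -1.0]), rng.normal(size=2)]:
    t_part, a_part = h(g, x)
    assert np.allclose(t_part, 0.0)                 # no translation component: h(g, x) lies in SO(2)
    assert np.isclose(a_part % (2 * np.pi), g[1] % (2 * np.pi))   # equal to the rotation part of g (Eq. 21)
print("h(g, x) equals the rotation part of g, for every x.")
```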

2.4 Haar measure

When we integrate over a group $G$, we will use the Haar measure, which is the essentially unique measure $dg$ that is invariant in the following sense:

$\int_G f(u g)\, dg = \int_G f(g)\, dg, \qquad \forall u \in G.$    (22)

Such measures always exist for locally compact groups, thus covering most cases of interest (Folland, 1995). For discrete groups, the Haar measure is the counting measure, and integration can be understood as a discrete sum.

We can integrate over $G$ by using an integral over $G/H$ and one over $H$,

$\int_G f(g)\, dg = \int_{G/H} \int_H f(s(x)\, \eta)\, d\eta\, dx.$    (23)
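
For a finite group the outer integral in Eq. 23 is a sum over coset representatives and the inner integral a sum over the subgroup; the following short check (ours, reusing the triangle group from section 2.1) makes that explicit.

```python
from itertools import permutations

# Triangle group and flip subgroup, as in the earlier sketch.
G = list(permutations(range(3)))
e = (0, 1, 2)
compose = lambda a, b: tuple(a[b[i]] for i in range(3))
H = [e, (0, 2, 1)]

cosets = []
for g in G:
    c = frozenset(compose(g, eta) for eta in H)
    if c not in cosets:
        cosets.append(c)
reps = [min(c) for c in cosets]                        # one representative s(x) per coset

f = lambda g: sum((i + 1) * g[i] for i in range(3))    # an arbitrary function on G
assert sum(f(g) for g in G) == sum(f(compose(r, eta)) for r in reps for eta in H)   # Eq. 23
print("Summing over G equals summing over coset representatives and the subgroup.")
```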

3 Induced Representations

The induced representation can be realized in various equivalent ways. We first discuss the realization through Mackey functions, which is easier to deal with mathematically. We then discuss the realization on functions on $G/H$ (technically, we should work with sections of a homogeneous vector bundle over $G/H$ instead of functions, but to keep things simple we will not; this is not a problem as long as one can find a continuous section of the bundle that is defined almost everywhere), which gives more complicated equations, but which can be implemented more efficiently in software, and more clearly conveys the geometrical meaning.

3.1 Realization through Mackey Functions

The space of Mackey functions for a representation $(\rho, V)$ of a subgroup $H \leq G$ is defined as:

$\mathcal{I}_G = \{f : G \to V \mid f(g\eta) = \rho(\eta^{-1})\, f(g), \ \forall g \in G,\ \eta \in H\}.$    (24)

One may verify that this is a vector space. The induced representation $\pi = \mathrm{Ind}_H^G\, \rho$ acting on $\mathcal{I}_G$ is defined as:

$[\pi(u) f](g) = f(u^{-1} g),$    (25)

for $u, g \in G$ and $f \in \mathcal{I}_G$. It is clear that $\pi(u) f$ is in $\mathcal{I}_G$ (i.e. satisfies the equivariance condition in Eq. 24), because the left and right action commute.
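
To make the definition concrete, the following sketch (ours; the one-dimensional sign representation of the flip subgroup is an illustrative choice, not taken from the paper) constructs a Mackey function for the triangle group by choosing values on coset representatives and extending them via Eq. 24, and then checks that left translation (Eq. 25) preserves the Mackey condition.

```python
from itertools import permutations
import numpy as np

# Triangle group, flip subgroup, and the one-dimensional sign representation rho of H.
G = list(permutations(range(3)))
e = (0, 1, 2)
compose = lambda a, b: tuple(a[b[i]] for i in range(3))
inverse = lambda a: tuple(sorted(range(3), key=lambda i: a[i]))
H = [e, (0, 2, 1)]
rho = lambda eta: 1.0 if eta == e else -1.0

def is_mackey(f):
    """The condition of Eq. 24: f(g eta) = rho(eta)^{-1} f(g) for all g in G, eta in H."""
    return all(np.isclose(f[compose(g, eta)], f[g] / rho(eta)) for g in G for eta in H)

# Build a Mackey function by choosing arbitrary values on coset representatives (the rotations)
# and extending to the rest of the group via Eq. 24.
rng = np.random.default_rng(1)
f = {}
for r in [e, (1, 2, 0), (2, 0, 1)]:
    v = rng.normal()
    for eta in H:
        f[compose(r, eta)] = v / rho(eta)
assert is_mackey(f)

# The induced representation of Eq. 25: [pi(u) f](g) = f(u^{-1} g).
pi = lambda u, f: {g: f[compose(inverse(u), g)] for g in G}
assert all(is_mackey(pi(u, f)) for u in G)
print("Left translation maps Mackey functions to Mackey functions.")
```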

3.2 Realization through Functions on $G/H$

Another way to realize the induced representation is on the space of vector-valued functions on the quotient $G/H$,

$\mathcal{I}_{G/H} = \{f : G/H \to V\}.$    (26)

Using a section $s$ of the canonical projection $p$, and the function $h$ (Eq. 8), we can define $\pi(u)$ as:

$[\pi(u) f](x) = \rho\big(h(u, u^{-1} x)\big)\, f(u^{-1} x).$    (27)

The meaning of this equation is that to transform the function $f$, we have to do two things: first, we take each vector $f(u^{-1}x)$ and attach it at position $x$ of the transformed function $\pi(u)f$, without changing it. Secondly, we need to transform the vector itself by the representation $\rho$ of $H$, evaluated at $h(u, u^{-1}x)$. This is demonstrated in Fig. 1.

To see that Eq. 27 does indeed define a representation, we expand the definition of $\pi$ twice and use the composition rule for $h$ (Eq. 9):

$[\pi(u_1)[\pi(u_2) f]](x) = \rho\big(h(u_1, u_1^{-1}x)\big)\, [\pi(u_2) f](u_1^{-1}x) = \rho\big(h(u_1, u_1^{-1}x)\big)\, \rho\big(h(u_2, u_2^{-1}u_1^{-1}x)\big)\, f(u_2^{-1}u_1^{-1}x) = \rho\big(h(u_1 u_2, (u_1 u_2)^{-1}x)\big)\, f\big((u_1 u_2)^{-1}x\big) = [\pi(u_1 u_2) f](x).$    (28)
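
The same finite-group setup can be used to check Eq. 27 and the homomorphism property of Eq. 28 numerically; the sketch below (ours, with the sign representation of the flip subgroup as $\rho$) does exactly that.

```python
from itertools import permutations
import numpy as np

G = list(permutations(range(3)))
e = (0, 1, 2)
compose = lambda a, b: tuple(a[b[i]] for i in range(3))
inverse = lambda a: tuple(sorted(range(3), key=lambda i: a[i]))
H = [e, (0, 2, 1)]
rho = lambda eta: 1.0 if eta == e else -1.0                 # sign representation of H

coset = lambda g: frozenset(compose(g, eta) for eta in H)
G_mod_H = []
for g in G:
    if coset(g) not in G_mod_H:
        G_mod_H.append(coset(g))
s = {x: (e if e in x else min(x)) for x in G_mod_H}         # section with s(H) = e

def h(g, x):
    gx = coset(compose(g, s[x]))
    return compose(inverse(s[gx]), compose(g, s[x]))

# Induced representation on functions f: G/H -> R, Eq. 27: [pi(u) f](x) = rho(h(u, u^{-1}x)) f(u^{-1}x).
def pi(u, f):
    out = {}
    for x in G_mod_H:
        u_inv_x = coset(compose(inverse(u), s[x]))
        out[x] = rho(h(u, u_inv_x)) * f[u_inv_x]
    return out

rng = np.random.default_rng(2)
f = {x: rng.normal() for x in G_mod_H}
for u1 in G:
    for u2 in G:
        lhs = pi(u1, pi(u2, f))
        rhs = pi(compose(u1, u2), f)
        assert all(np.isclose(lhs[x], rhs[x]) for x in G_mod_H)   # Eq. 28
print("pi is a representation: pi(u1) pi(u2) = pi(u1 u2).")
```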

3.3 Equivalence of the Realizations

To show that the constructions are equivalent, we will define a lifting map $\Lambda$ of functions on $G/H$ (i.e. $f \in \mathcal{I}_{G/H}$) to Mackey functions (i.e. $\Lambda f \in \mathcal{I}_G$), and show that it is a bijection that commutes with the two definitions of $\pi$.

The lift and its inverse are defined as:

$[\Lambda f](g) = \rho(\mathfrak{h}(g))^{-1}\, f(gH),$    (29)
$[\Lambda^{-1} f](x) = f(s(x)),$    (30)

for $f \in \mathcal{I}_{G/H}$ (resp. $f \in \mathcal{I}_G$), $g \in G$ and $x \in G/H$. The idea behind this definition is that a Mackey function is determined by its value on the coset representatives $s(x)$, because by Eq. 15 and 24 it satisfies $f(g) = f(s(gH)\, \mathfrak{h}(g)) = \rho(\mathfrak{h}(g))^{-1}\, f(s(gH))$. Hence, setting $[\Lambda^{-1} f](x) = f(s(x))$ does not lose information. Specifically, we can reconstruct $f$ by setting $f(g) = \rho(\mathfrak{h}(g))^{-1}\, [\Lambda^{-1} f](gH)$.

It is easy to show, using Eq. 16, that $\Lambda f$ satisfies the equivariance condition (Eq. 24):

$[\Lambda f](g\eta) = \rho(\mathfrak{h}(g\eta))^{-1}\, f(g\eta H) = \rho(\mathfrak{h}(g)\, \eta)^{-1}\, f(gH) = \rho(\eta^{-1})\, \rho(\mathfrak{h}(g))^{-1}\, f(gH) = \rho(\eta^{-1})\, [\Lambda f](g).$    (31)

So indeed $\Lambda f \in \mathcal{I}_G$ for $f \in \mathcal{I}_{G/H}$. To verify that $\Lambda^{-1}$ is inverse to $\Lambda$, use Eq. 13:

$[\Lambda \Lambda^{-1} f](g) = \rho(\mathfrak{h}(g))^{-1}\, [\Lambda^{-1} f](gH) = \rho(\mathfrak{h}(g))^{-1}\, f(s(gH)) = \rho(\mathfrak{h}(g))^{-1}\, f(g\, \mathfrak{h}(g)^{-1}) = \rho(\mathfrak{h}(g))^{-1}\, \rho(\mathfrak{h}(g))\, f(g) = f(g).$    (32)

For the opposite direction, using $\mathfrak{h}(s(x)) = e$ (Eq. 17),

$[\Lambda^{-1} \Lambda f](x) = [\Lambda f](s(x)) = \rho(\mathfrak{h}(s(x)))^{-1}\, f(s(x)H) = f(x).$    (33)

Finally, we show that $\Lambda$ commutes with the two definitions of the induced representation (Eq. 25 and 27). Let $\pi_G$ be the induced representation on $\mathcal{I}_G$ and $\pi_{G/H}$ the induced rep on $\mathcal{I}_{G/H}$. For $f \in \mathcal{I}_{G/H}$ and $u, g \in G$,

$[\pi_G(u)\, \Lambda f](g) = [\Lambda f](u^{-1} g) = \rho(\mathfrak{h}(u^{-1}g))^{-1}\, f(u^{-1}gH) = \rho(\mathfrak{h}(g))^{-1}\, \rho\big(h(u, u^{-1}gH)\big)\, f(u^{-1}gH) = \rho(\mathfrak{h}(g))^{-1}\, [\pi_{G/H}(u) f](gH) = [\Lambda\, \pi_{G/H}(u) f](g).$    (34)

It follows that $\pi_G$ and $\pi_{G/H}$ are isomorphic representations of $G$.
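
As a numerical counterpart of Eqs. 31 to 34, the following sketch (ours, same toy setup as above) lifts a function on $G/H$ to a Mackey function, verifies $\Lambda^{-1}\Lambda = \mathrm{id}$, and confirms that $\Lambda$ intertwines the two realizations of the induced representation.

```python
from itertools import permutations
import numpy as np

G = list(permutations(range(3)))
e = (0, 1, 2)
compose = lambda a, b: tuple(a[b[i]] for i in range(3))
inverse = lambda a: tuple(sorted(range(3), key=lambda i: a[i]))
H = [e, (0, 2, 1)]
rho = lambda eta: 1.0 if eta == e else -1.0

coset = lambda g: frozenset(compose(g, eta) for eta in H)
G_mod_H = []
for g in G:
    if coset(g) not in G_mod_H:
        G_mod_H.append(coset(g))
s = {x: (e if e in x else min(x)) for x in G_mod_H}

h = lambda g, x: compose(inverse(s[coset(compose(g, s[x]))]), compose(g, s[x]))
frak_h = lambda g: h(g, coset(e))                                      # Eq. 14

# Lift and its inverse, Eqs. 29-30.
lift = lambda f: {g: (1.0 / rho(frak_h(g))) * f[coset(g)] for g in G}
unlift = lambda F: {x: F[s[x]] for x in G_mod_H}

# The two realizations of the induced representation, Eqs. 25 and 27.
pi_G = lambda u, F: {g: F[compose(inverse(u), g)] for g in G}
def pi_Q(u, f):
    out = {}
    for x in G_mod_H:
        u_inv_x = coset(compose(inverse(u), s[x]))
        out[x] = rho(h(u, u_inv_x)) * f[u_inv_x]
    return out

rng = np.random.default_rng(3)
f = {x: rng.normal() for x in G_mod_H}
assert all(np.isclose(unlift(lift(f))[x], f[x]) for x in G_mod_H)      # Lambda^{-1} Lambda = id (Eq. 33)
for u in G:
    lifted_then_acted = pi_G(u, lift(f))
    acted_then_lifted = lift(pi_Q(u, f))
    assert all(np.isclose(lifted_then_acted[g], acted_then_lifted[g]) for g in G)   # Eq. 34
print("The lift intertwines the two realizations of the induced representation.")
```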

3.4 Some basic properties of induction

We state some basic facts about induced representations. Proofs can be found in Ceccherini-Silberstein et al. (2009).

Theorem 3.1 (Induction in stages).

Let $G$ be a group, let $H \leq K$ be subgroups of $G$, and let $\rho$ be a representation of $H$; then:

$\mathrm{Ind}_K^G\big(\mathrm{Ind}_H^K\, \rho\big) \;\cong\; \mathrm{Ind}_H^G\, \rho.$    (35)
Theorem 3.2.

The induced representation of a direct sum of representations is the direct sum of the induced representations:

$\mathrm{Ind}_H^G(\rho \oplus \rho') \;\cong\; \big(\mathrm{Ind}_H^G\, \rho\big) \oplus \big(\mathrm{Ind}_H^G\, \rho'\big).$    (36)

4 Intertwiners: Elementary Approach

We would like to understand the structure of the space of intertwiners between two induced representations $\pi_1 = \mathrm{Ind}_{H_1}^G\, \rho_1$ and $\pi_2 = \mathrm{Ind}_{H_2}^G\, \rho_2$:

$\mathcal{H}_G = \mathrm{Hom}_G(\pi_1, \pi_2) = \{\Phi : \mathcal{I}^1_G \to \mathcal{I}^2_G \mid \Phi\, \pi_1(u) = \pi_2(u)\, \Phi, \ \forall u \in G\},$    (37)

and similarly for $\mathcal{H}_{G/H}$, using the realization on functions over the quotient spaces.

Using direct calculation, we will show that every map in $\mathcal{H}_G$ or $\mathcal{H}_{G/H}$ can be written as a convolution or cross-correlation with an equivariant kernel. We will start with the Mackey function approach.

4.1 Intertwiners for $\mathcal{I}_G$

Let $\rho_1$ be a representation of $H_1$ acting on a vector space $V_1$, and let $\pi_1 = \mathrm{Ind}_{H_1}^G\, \rho_1$ be the induced representation acting on functions in $\mathcal{I}^1_G$. Likewise, let $\rho_2$ be a representation of $H_2$ acting on $V_2$, and let $\pi_2 = \mathrm{Ind}_{H_2}^G\, \rho_2$ be the induced representation acting on functions in $\mathcal{I}^2_G$.

A general linear map $\Phi$ between the vector spaces $\mathcal{I}^1_G$ and $\mathcal{I}^2_G$ can always be written as

$[\Phi f](g) = \int_G \hat\kappa(g, g')\, f(g')\, dg',$    (38)

using a two-argument operator-valued kernel $\hat\kappa : G \times G \to \mathrm{Hom}(V_1, V_2)$.

In order for Eq. 38 to define an equivariant map between $\mathcal{I}^1_G$ and $\mathcal{I}^2_G$, the kernel $\hat\kappa$ must satisfy several constraints. By (partially) resolving these constraints, we will show that Eq. 38 can always be written as a cross-correlation, and that the space of admissible kernels is in one-to-one correspondence with the space of bi-equivariant one-argument kernels $\mathcal{K}_G$, to be defined below.

4.1.1 Equivariance $\Rightarrow$ Convolution

Since we are only interested in equivariant maps, we get a constraint on $\hat\kappa$:

$\hat\kappa(u g, u g') = \hat\kappa(g, g'), \qquad \forall u, g, g' \in G.$    (39)

Hence, without loss of generality, we can define the two-argument kernel in terms of a one-argument kernel:

$\kappa(g) \equiv \hat\kappa(e, g), \qquad \text{so that} \qquad \hat\kappa(g, g') = \hat\kappa(e, g^{-1}g') = \kappa(g^{-1}g').$    (40)

The application of $\Phi$ to $f \in \mathcal{I}^1_G$ then reduces to a cross-correlation:

$[\Phi f](g) = \int_G \kappa(g^{-1} g')\, f(g')\, dg' \;\equiv\; [\kappa \star f](g).$    (41)

It is also possible to define the one-argument kernel differently, so that we would get a convolution instead of a cross-correlation.
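
The content of Eqs. 39 to 41 is that equivariance with respect to left translation is equivalent to the correlation form; the short check below (ours, for the triangle group with scalar-valued functions) confirms that a correlation with an arbitrary one-argument kernel commutes with left translation.

```python
from itertools import permutations
import numpy as np

G = list(permutations(range(3)))
compose = lambda a, b: tuple(a[b[i]] for i in range(3))
inverse = lambda a: tuple(sorted(range(3), key=lambda i: a[i]))

# Cross-correlation on the group, Eq. 41: [kappa * f](g) = sum_{g'} kappa(g^{-1} g') f(g').
def correlate(kappa, f):
    return {g: sum(kappa[compose(inverse(g), gp)] * f[gp] for gp in G) for g in G}

# Left translation [L_u f](g) = f(u^{-1} g): the action appearing in Eq. 25.
translate = lambda u, f: {g: f[compose(inverse(u), g)] for g in G}

rng = np.random.default_rng(4)
kappa = {g: rng.normal() for g in G}      # any one-argument kernel
f = {g: rng.normal() for g in G}
for u in G:
    lhs = correlate(kappa, translate(u, f))
    rhs = translate(u, correlate(kappa, f))
    assert all(np.isclose(lhs[g], rhs[g]) for g in G)
print("Cross-correlation with a one-argument kernel commutes with left translation (Eqs. 39-41).")
```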

4.1.2 Left equivariance of $\kappa$

We want the result $\Phi f$ (or $\kappa \star f$) to live in $\mathcal{I}^2_G$, which means that this function has to satisfy the Mackey condition,

$[\kappa \star f](g\, \eta_2) = \rho_2(\eta_2^{-1})\, [\kappa \star f](g),$    (42)

for all $g \in G$ and $\eta_2 \in H_2$. This is guaranteed if the kernel itself is left equivariant, $\kappa(\eta_2\, g) = \rho_2(\eta_2)\, \kappa(g)$.

4.1.3 Right equivariance of $\kappa$

The fact that $f$ satisfies the Mackey condition ($f(g\, \eta_1) = \rho_1(\eta_1^{-1})\, f(g)$ for $\eta_1 \in H_1$) implies a symmetry in the correlation $\kappa \star f$. That is, if we apply a right-$H_1$-shift to the kernel, i.e. replace $\kappa$ by $\kappa'(g) = \kappa(g\, \eta_1)\, \rho_1(\eta_1^{-1})$, we find that

$[\kappa' \star f](g) = \int_G \kappa(g^{-1}g'\, \eta_1)\, \rho_1(\eta_1^{-1})\, f(g')\, dg' = \int_G \kappa(g^{-1}g'\, \eta_1)\, f(g'\, \eta_1)\, dg' = \int_G \kappa(g^{-1}g')\, f(g')\, dg' = [\kappa \star f](g).$    (43)

It follows that we can take, without loss of generality (for $\eta_1 \in H_1$),

$\kappa(g\, \eta_1) = \kappa(g)\, \rho_1(\eta_1).$    (44)

4.1.4 Resolving the right-equivariance constraint

The above constraints show that the one-argument kernel $\kappa$ should live in the space of bi-equivariant kernels on $G$:

$\mathcal{K}_G = \big\{\kappa : G \to \mathrm{Hom}(V_1, V_2) \,\big|\, \kappa(\eta_2\, g\, \eta_1) = \rho_2(\eta_2)\, \kappa(g)\, \rho_1(\eta_1), \ \forall g \in G,\ \eta_1 \in H_1,\ \eta_2 \in H_2\big\}.$    (45)

Here $\mathrm{Hom}(V_1, V_2)$ denotes the space of linear maps from $V_1$ to $V_2$.

We can resolve the right $H_1$-equivariance constraint by defining $\kappa$ in terms of a kernel on the left coset space, i.e. $\bar\kappa : G/H_1 \to \mathrm{Hom}(V_1, V_2)$. Specifically, using the decomposition $g = s_1(gH_1)\, \mathfrak{h}_1(g)$ of $g$ (Eq. 15), we can define

$\kappa(g) = \bar\kappa(gH_1)\, \rho_1(\mathfrak{h}_1(g)).$    (46)

It is easy to verify that when defined in this way, $\kappa$ satisfies right $H_1$-equivariance (Eq. 44).

We still have the left $H_2$-equivariance constraint, which translates to $\bar\kappa$ as follows. For $\eta_2 \in H_2$ and $x \in G/H_1$,

$\rho_2(\eta_2)\, \bar\kappa(x) = \rho_2(\eta_2)\, \kappa(s_1(x)) = \kappa(\eta_2\, s_1(x)) = \bar\kappa(\eta_2 x)\, \rho_1(\mathfrak{h}_1(\eta_2\, s_1(x))) = \bar\kappa(\eta_2 x)\, \rho_1(h_1(\eta_2, x)),$    (47)

where the last step made use of Eq. 9.

Thus, the space of bi-equivariant, single-argument kernels on $G$ is equivalent to the following space of left-equivariant kernels on $G/H_1$:

$\mathcal{K}_{G/H_1} = \big\{\bar\kappa : G/H_1 \to \mathrm{Hom}(V_1, V_2) \,\big|\, \bar\kappa(\eta_2 x) = \rho_2(\eta_2)\, \bar\kappa(x)\, \rho_1(h_1(\eta_2, x))^{-1}, \ \forall x \in G/H_1,\ \eta_2 \in H_2\big\}.$    (48)

The isomorphism is defined as follows:

$\kappa \mapsto \bar\kappa, \quad \bar\kappa(x) = \kappa(s_1(x)); \qquad\qquad \bar\kappa \mapsto \kappa, \quad \kappa(g) = \bar\kappa(gH_1)\, \rho_1(\mathfrak{h}_1(g)).$    (49)

One may verify that these maps are indeed inverses, and that $\kappa \in \mathcal{K}_G$ for $\bar\kappa \in \mathcal{K}_{G/H_1}$, and $\bar\kappa \in \mathcal{K}_{G/H_1}$ for $\kappa \in \mathcal{K}_G$.

In section 4.3 we will resolve the left-equivariance constraint that still applies to $\bar\kappa$. But first we will continue with the realization of the induced representation on $\mathcal{I}_{G/H}$, where $\bar\kappa$ will again make an appearance.
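
The constraint defining $\mathcal{K}_G$ (Eq. 45) is linear in the kernel, so for a finite group a basis of the admissible kernels can be computed as the null space of a constraint matrix. The sketch below (ours, for the triangle group with $H_1 = H_2$ the flip subgroup and $\rho_1 = \rho_2$ the sign representation, so that $\kappa(g)$ is a scalar) does this with an SVD; the same recipe applies to matrix-valued kernels after vectorizing $\kappa(g)$. In this toy example the solution space is two-dimensional, one degree of freedom per double coset in $H_2\backslash G/H_1$, which anticipates the reduction to $\mathcal{K}_{H_2\backslash G/H_1}$ announced in section 1.1.

```python
from itertools import permutations
import numpy as np

G = list(permutations(range(3)))
e = (0, 1, 2)
compose = lambda a, b: tuple(a[b[i]] for i in range(3))
H = [e, (0, 2, 1)]
rho = lambda eta: 1.0 if eta == e else -1.0        # rho_1 = rho_2 = sign representation of H

index = {g: i for i, g in enumerate(G)}

# One linear constraint per (eta_2, g, eta_1): kappa(eta_2 g eta_1) - rho_2(eta_2) kappa(g) rho_1(eta_1) = 0.
rows = []
for eta2 in H:
    for g in G:
        for eta1 in H:
            row = np.zeros(len(G))
            row[index[compose(eta2, compose(g, eta1))]] += 1.0
            row[index[g]] -= rho(eta2) * rho(eta1)
            rows.append(row)
A = np.stack(rows)

# Rows of Vt with (numerically) zero singular value span the null space, i.e. the space of admissible kernels.
_, sing, Vt = np.linalg.svd(A)
basis = Vt[sing < 1e-10]
print("dimension of the space of equivariant kernels:", basis.shape[0])
for b in basis:
    print({g: round(float(v), 3) for g, v in zip(G, b)})
```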

4.2 Intertwiners for $\mathcal{I}_{G/H}$

In this section we will study the intertwiners between two induced representations $\pi_1$ and $\pi_2$, realized on the spaces $\mathcal{I}^1_{G/H_1}$ and $\mathcal{I}^2_{G/H_2}$ (i.e. functions $f : G/H_1 \to V_1$ and $f : G/H_2 \to V_2$). The derivations in this section mirror those of the last section, except that we work with functions on the quotient spaces from the start.

A general linear map $\Phi : \mathcal{I}^1_{G/H_1} \to \mathcal{I}^2_{G/H_2}$ can be written as:

$[\Phi f](x) = \int_{G/H_1} \hat\kappa(x, y)\, f(y)\, dy,$    (50)

where $x \in G/H_2$, $y \in G/H_1$, and $\hat\kappa(x, y) \in \mathrm{Hom}(V_1, V_2)$.

In order for $\Phi$ to be equivariant, it must satisfy the constraint:

$\Phi\, \pi_1(u) = \pi_2(u)\, \Phi, \qquad \forall u \in G.$    (51)

Expanding the left-hand side using the definition of $\pi_1$ (Eq. 27), we find

$[\Phi\, \pi_1(u) f](x) = \int_{G/H_1} \hat\kappa(x, y)\, \rho_1\big(h_1(u, u^{-1} y)\big)\, f(u^{-1} y)\, dy.$    (52)

For the right-hand side, we obtain

$[\pi_2(u)\, \Phi f](x) = \rho_2\big(h_2(u, u^{-1} x)\big)\, [\Phi f](u^{-1} x) = \rho_2\big(h_2(u, u^{-1} x)\big) \int_{G/H_1} \hat\kappa(u^{-1} x, y)\, f(y)\, dy.$    (53)

Combining the last two equations, we obtain the constraint