Intrinsically Motivated Discovery of Diverse Patterns in Self-Organizing Systems

Chris Reinke, Mayalen Etcheverry & Pierre-Yves Oudeyer
Flowers Team, Inria
Bordeaux, France
{chris.reinke,mayalen.etcheverry,pierre-yves.oudeyer}@inria.fr
Abstract

In many complex dynamical systems, artificial or natural, one can observe self-organization of patterns emerging from local rules. Cellular automata, like the Game of Life (GOL), have been widely used as abstract models enabling the study of various aspects of self-organization and morphogenesis, such as the emergence of spatially localized patterns. However, findings of self-organized patterns in such models have so far relied on manual tuning of parameters and initial states, and on the human eye to identify “interesting” patterns. In this paper, we formulate the problem of automated discovery of diverse self-organized patterns in such high-dimensional complex dynamical systems, as well as a framework for experimentation and evaluation. Using a continuous GOL as a testbed, we show that recent intrinsically motivated machine learning algorithms (POP-IMGEPs), initially developed for learning of inverse models in robotics, can be transposed and used in this novel application area. These algorithms combine intrinsically motivated goal exploration and unsupervised learning of goal space representations. Goal space representations describe the “interesting” features of patterns for which diverse variations should be discovered. In particular, we compare various approaches to define and learn goal space representations from the perspective of discovering diverse spatially localized patterns. Moreover, we introduce an extension of a state-of-the-art POP-IMGEP algorithm which incrementally learns a goal representation using a deep auto-encoder, and the use of CPPN primitives for generating initialization parameters. We show that it is more efficient than several baselines and as efficient as a system pretrained on a hand-made database of patterns identified by human experts. Source code and videos are available at https://automated-discovery.github.io/

1 Introduction

Self-organization of patterns that emerge from local rules is a pervasive phenomenon in natural and artificial dynamical systems (ball1999self). It ranges from the formation of snowflakes and of the spots and rays on animals' skins to spiral galaxies. Understanding these processes has boosted progress in many fields, ranging from physics to biology (camazine2003self). This progress has relied importantly on the use of powerful and rich abstract computational models of self-organization (kauffman1993origins). For example, cellular automata like Conway's Game of Life (GOL) have been used to study the emergence of spatially localized patterns (SLPs) (gardener1970mathematical), informing theories of the origins of life (gardener1970mathematical; beer2004autopoiesis). SLPs, such as the famous glider in GOL (gardner1983wheels), are self-organizing patterns that have a local extension and can exist independently of other patterns. However, finding such novel self-organized patterns, and mapping the space of possible emergent patterns, has so far relied heavily on manual tuning of parameters and initial states. Moreover, the dependence of this exploration process on the human eye to identify “interesting” patterns is strongly limiting further advances.

We formulate here the problem of automated discovery of a diverse set of self-organized patterns in such high-dimensional, complex dynamical systems. This involves several challenges. A first challenge consists in determining a representation of patterns, possibly through learning, that incentivizes the discovery of diverse “interesting” patterns. Such a representation guides exploration by providing a measure of (dis-)similarity between patterns. This problem is particularly challenging in domains where patterns are high-dimensional, as in GOL. In such cases, scientists have limited intuition about what useful features are and how to represent them. Moreover, low-dimensional representations of patterns are needed for humans to browse and visualize the discoveries. Representation learning should thus both guide exploration and be fed by the self-collected data.

A second challenge is how to automate the exploration of high-dimensional, continuous initialization parameters in order to efficiently discover “interesting” patterns, such as SLPs, within a limited budget of experiments. Sample efficiency is important to enable the later use of such discovery algorithms for physical systems (grizou2019exploration), where experimental time and costs are strongly bounded. For example, in the continuous GOL used as a testbed in this paper, initialization consists in setting the values of a real-valued, high-dimensional matrix in addition to 7 dynamics parameters. The possible variations of this matrix are too large for simple random sampling to be efficient. More structured methods are needed.

To address these challenges, we propose to leverage and transpose recent intrinsically motivated learning algorithms from the family of population-based Intrinsically Motivated Goal Exploration Processes (POP-IMGEPs, denoted simply as IMGEPs below; baranes2013active; pere2018unsupervised). They were initially designed to enable autonomous robots to explore and learn which effects can be produced by their actions, and how to control these effects. IMGEPs self-define goals in a goal space that represents important features of the outcomes of actions, such as the position reached by an arm. This allows the discovery of diverse novel effects within their goal representations. It was recently shown how deep neural auto-encoders enable unsupervised learning of goal representations in IMGEPs from the raw pixel perception of a robot's visual scene (laversanne2018curiosity). We propose to use a similar mechanism for automated discovery of patterns: unsupervised learning of a low-dimensional representation of the features of self-organized patterns. This removes the need for human expert knowledge to define such representations.

Moreover, a key ingredient for the sample efficient exploration of IMGEPs in robotics has been the use of structured motion primitives to encode the space of body motions (pastor2013dynamic). We propose a similar mechanism to handle the generation of structured initial states in GOL-like complex systems, based on compositional pattern producing networks (CPPNs) (stanley2006exploiting).

In summary, we provide in this paper the following contributions:

  • We formulate the problem of automated discovery of diverse self-organized patterns in high-dimensional and complex game-of-life types of dynamical systems.

  • We show how to transpose POP-IMGEPs algorithms to address the associated joint challenge of (learning to) represent interesting patterns and discovering them in a sample efficient manner.

  • We compare various approaches to define or learn goal space representations for the sample efficient discovery of diverse SLPs in a continuous GOL testbed.

  • We show that an extension of a state-of-the-art POP-IMGEP algorithm, with incremental learning of a goal space using a deep auto-encoder, is as efficient as a system pretrained on a hand-made database of patterns.

2 Related Work

Automated Discovery in Complex Systems

Automated processes have been widely used to explore complex dynamical systems. For example, evolutionary algorithms have been applied to search for specific patterns or rules of cellular automata (mitchell1996evolving; sapin2003research). However, their objective is to optimize towards a specific goal instead of discovering a diversity of patterns. Another line of work uses active inquiry-based learning strategies, which select the experiments to perform in order to improve a system model, i.e. a mapping from parameters to system outcomes. Such strategies have been used in biology (king2004functional; king2009automation), chemistry (raccuglia2016machine; reizman2016suzuki; duros2017human) and astrophysics (richards2011active). However, these approaches have relied on expert knowledge and focused on the automated optimization of a pre-defined target property. Here, we are interested in automatically discovering and mapping a diversity of unseen patterns without prior knowledge of the system. An exception is the concurrent work of grizou2019exploration, which showed how a simple POP-IMGEP algorithm can automate the discovery of diverse patterns in oil-droplet systems. However, it used a low-dimensional input space and a hand-defined low-dimensional representation of the goal space, identified as a major limit of the system.

Intrinsically motivated learning

Intrinsically motivated learning algorithms (baldassarre2013intrinsically; baranes2013active) autonomously organize an agent's exploration curriculum in order to efficiently discover a maximally diverse set of outcomes the agent can produce in an unknown environment. They are inspired by the way children self-develop open repertoires of skills and learn world models. Intrinsically Motivated Goal Exploration Processes (IMGEPs) (baranes2013active; forestier2017intrinsically) are a family of curiosity-driven algorithms developed to allow the efficient exploration of high-dimensional complex real-world systems. Population-based versions of these algorithms, which leverage episodic memory, hindsight learning and structured dynamic motion primitives to parameterize policies, enable the sample efficient acquisition of high-dimensional skills in real-world robots (forestier2017intrinsically; rolf2010goal). Recent work (laversanne2018curiosity; pere2018unsupervised) studied how to learn the goal representations automatically with deep variational autoencoders. However, training was done passively, at an early stage, on a precollected set of available observations. Other recent approaches (nair2018visual; pong2019skew) introduced online training of VAEs to learn the important features of a goal space, similar to the methods in this paper. However, these approaches address the problem of sequential decision making in MDPs, incurring a cost on sample efficiency that is observed in various intrinsically motivated RL approaches (bellemare2016unifying; burda2018exploration). They are orthogonal to the automated discovery framework considered here, where experiments are independent, allowing the use of memory-based, sample efficient methods. A related family of algorithms in evolutionary computation is novelty search (lehman2008exploiting) and quality-diversity algorithms (pugh2016quality), which can be formalized as special kinds of population-based IMGEPs.

Representation learning

We use representation learning methods to autonomously learn goal spaces for IMGEPs. Representation learning aims at finding low-dimensional explanatory factors representing high-dimensional input data (bengio2013representation). It is a key problem in many areas for understanding the underlying structure of complex observations. Many state-of-the-art methods (tschannen2018recent) build on top of deep variational autoencoders (VAEs) (kingma2013auto), using varying objectives and network architectures. However, studies of the interplay between representation learning and autonomous data collection through the exploration of an environment have been limited so far.

3 Algorithmic Methods for Automated Discovery

3.1 Population-based Intrinsically Motivated Goal Exploration Processes

An IMGEP is an algorithmic process generating a sequence of experiments to explore the parameters of a system by targeting self-generated goals (Fig. 1). It aims to maximize the diversity of observations from that system within a budget of experiments. In population-based IMGEPs, an explicit memory of the history of experiments and observations is used to guide the process.

A system is defined by three components: a parameter space $\Theta$ corresponding to the controllable system parameters $\theta$; an observation space $O$, where an observation $o$ is a vector representing all the signals captured from the system (for this paper, the observations are a time series of images which depict the morphogenesis of activity patterns); and an unknown environment dynamic $D: \Theta \rightarrow O$ which maps parameters to observations.

To explore a system, an IMGEP uses a goal space $\mathcal{T}$ that represents relevant features of its observations, computed by an encoding function $R: O \rightarrow \mathcal{T}$. For the exploration of patterns, such features may describe their form or extension. The exploration process iterates $N$ times through: 1) sample a goal $g$ from a goal sampling distribution $G$; 2) infer the corresponding parameter $\theta$ using a parameter sampling policy $\Pi$; 3) roll out an experiment with $\theta$, observe the outcome $o$, and compute the encoding $R(o)$; 4) store $(\theta, o, R(o))$ in the history $\mathcal{H}$. Because the sampling of goals and parameters depends on the history of explored parameters, an initial set of parameters is randomly sampled and explored before the intrinsically motivated goal exploration process starts.

Different goal and parameter sampling mechanisms can be used within this architecture (baranes2013active; forestier2016modular). In the experiments below, parameters are sampled by 1) given a goal, selecting the parameter from the history whose corresponding outcome is most similar to the goal in the goal space, and 2) mutating it by a random process. The goal sampling policy is a uniform distribution over a hypercube in $\mathcal{T}$, chosen to be large enough to bias exploration towards the frontiers of known goals and thereby incentivize diversity.
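To make these steps concrete, the following minimal Python sketch implements one intrinsically motivated iteration. It is an illustrative sketch, not the released implementation: the helpers mutate, rollout and encode are hypothetical placeholders for the random mutation operator, the system roll-out and the goal space encoder R, and the history is assumed to already contain the initial random explorations.

import numpy as np

def imgep_iteration(history, goal_low, goal_high, mutate, rollout, encode):
    # 1) sample a goal uniformly inside the goal-space hypercube
    goal = np.random.uniform(goal_low, goal_high)
    # 2) pick the past parameter whose encoded outcome is closest to the goal...
    distances = [np.linalg.norm(g - goal) for (_, _, g) in history]
    theta_source, _, _ = history[int(np.argmin(distances))]
    # ...and mutate it to obtain the new candidate parameter
    theta = mutate(theta_source)
    # 3) roll out the experiment and encode the reached goal
    observation = rollout(theta)
    reached_goal = encode(observation)
    # 4) store the result so later iterations can build on it
    history.append((theta, observation, reached_goal))
    return theta, observation, reached_goal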

Figure 1: Population-based intrinsically motivated goal exploration process with incremental learning of a goal space (IMGEP-OGL algorithm) used to explore a continuous GOL model.

3.2 Online Learning of Goal Spaces with Deep Auto-Encoders

For IMGEPs, the definition of the goal space and its corresponding encoder $R$ is a critical part, because it biases the exploration of the target system. One approach is to define a goal space by selecting features manually, for example by using computer vision algorithms to detect the position of a pattern and its form. The diversity found by the IMGEPs will then be biased along these pre-defined features. A limit of this approach is that it requires expert knowledge to select helpful features, which is particularly problematic in environments where experts do not know in advance which features are important or how to formulate them.

Another approach is to learn the goal space features by unsupervised representation learning. The aim is to learn a mapping from the raw sensor observations $o$ to a compact latent vector $z = R(o)$. This latent mapping can be used as a goal space in which a latent vector is interpreted as a goal.

Previous IMGEP approaches already learned their goal spaces successfully with variational autoencoders (VAEs) (laversanne2018curiosity; pere2018unsupervised). However, the goal spaces were learned before the start of the exploration, from a prerecorded dataset of observations from the target environment, and the learned representations were kept fixed during the exploration. A problem of this pretraining approach is that it limits the possibility of discovering novel patterns beyond the distribution of the pretraining examples, and it requires expert knowledge to collect the pretraining dataset.

1 Initialize goal space encoder R (a VAE) with random weights
2 for i ← 1 to N do
3     if i < N_init then  // Initial random iterations to populate H
4         Sample θ ~ U(Θ)
5     else  // Intrinsically motivated iterations
6         Sample a goal g ~ G(H) based on the space represented by R
7         Choose θ ~ Π(g, H)
8     Perform an experiment with θ and observe outcome o
9     Encode the reached goal ĝ = R(o)
10    Append (θ, o, ĝ) to the history H
11    if (i mod K) = 0 then  // Periodically train the network
12        for E epochs do
13            Train R on the observations in H with importance sampling
14        for all (θ, o, ĝ) in H do  // Update the database of reached goals
15            ĝ ← R(o)

Algorithm 1 IMGEP-OGL

In this paper we address this problem by continuously adapting the learned representation to the novel observations encountered during the exploration process. For this purpose, we propose an online goal space learning IMGEP (IMGEP-OGL), which learns the goal space incrementally during the exploration process (Algorithm 1). The training procedure of the VAE is integrated into the goal sampling exploration process by first initializing the VAE with random weights (Appendix E). The VAE network is then trained every $K$ explorations for $E$ epochs on the observations collected in the history $\mathcal{H}$. Importance sampling is used to give more weight to recently discovered patterns.
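As an illustration, the following PyTorch sketch shows what the periodic training step of IMGEP-OGL could look like. It is a simplified stand-in under stated assumptions, not the paper's exact implementation: the VAE is assumed to return (reconstruction, mean, log-variance) and to expose an ELBO-style vae.loss helper, and the importance-sampling scheme shown here simply draws the most recently collected half of the history more often.

import torch
from torch.utils.data import DataLoader, TensorDataset, WeightedRandomSampler

def train_goal_space_vae(vae, optimizer, history_obs, epochs, recent_boost=3.0):
    # history_obs: tensor of all observations collected so far, shape (n, ...)
    n = history_obs.shape[0]
    weights = torch.ones(n)
    weights[n // 2:] = recent_boost  # emphasize recently discovered patterns
    sampler = WeightedRandomSampler(weights, num_samples=n, replacement=True)
    loader = DataLoader(TensorDataset(history_obs), batch_size=64, sampler=sampler)
    for _ in range(epochs):
        for (batch,) in loader:
            optimizer.zero_grad()
            reconstruction, mu, logvar = vae(batch)             # assumed interface
            loss = vae.loss(reconstruction, batch, mu, logvar)  # assumed ELBO helper
            loss.backward()
            optimizer.step()
    # afterwards, all reached goals in the history are re-encoded with the
    # updated encoder (lines 14-15 of Algorithm 1)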

3.3 Structuring the parameter space in IMGEPs: from DMPs to CPPNs

A key factor in the generation of patterns in dynamical systems is their initial state $A^{t=1}$. IMGEPs sample these initial states and apply random perturbations to them during the exploration. For the experiments in this paper, this state is a two-dimensional grid of cells. Directly sampling each grid cell at random results in initial patterns that resemble white noise. Such random states mainly produce global patterns that spread over the whole state space, complicating the search for spatially localized patterns. This is analogous to a problem in the exploration of robot controllers: directly sampling actions for individual actuators at a microscopic time scale is usually inefficient. A key ingredient for sample efficient exploration has been the use of structured primitives (dynamic motion primitives, DMPs) to encode the space of possible body motions (pastor2013dynamic).

We solved the sampling problem for the initial states by transposing the idea of structured primitives. Indeed, “actions” here consist in deciding the parameters of an experiment, including the initial state. We propose to use compositional pattern producing networks (CPPNs) (stanley2006exploiting) to produce structured initial patterns, similar to DMPs. CPPNs are recurrent neural networks that allow the generation of structured initial states (Appendix B, Fig. 9). The CPPNs are used as part of the parameters $\theta$. They are defined by their network structure (number of neurons, connections between neurons) and their connection weights, and they include a mechanism for the random mutation of weights and structure. The number of parameters in $\theta$ is therefore not fixed (it starts small) and open-ended.
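The following sketch illustrates why such structured primitives help, using a simplified feed-forward stand-in for a CPPN: a small random network of smooth functions applied to each cell's coordinates. The actual CPPNs are evolved recurrent networks generated with neat-python (Appendix B); this sketch only conveys how composing coordinate functions yields structured rather than noise-like initial patterns.

import numpy as np

def random_cppn_like_pattern(size=256, hidden=4, seed=None):
    rng = np.random.default_rng(seed)
    # per-cell inputs: bias, x, y and distance d to the grid center
    y, x = np.mgrid[0:size, 0:size] / (size - 1) * 2.0 - 1.0
    d = np.sqrt(x**2 + y**2)
    inputs = np.stack([np.ones_like(x), x, y, d], axis=-1)
    # one random hidden layer with per-neuron gauss or sigmoid activation
    w_in = rng.normal(0.0, 1.0, size=(4, hidden))
    w_out = rng.normal(0.0, 1.0, size=(hidden,))
    activations = [lambda z: np.exp(-z**2), lambda z: 1.0 / (1.0 + np.exp(-z))]
    h = inputs @ w_in
    for j in range(hidden):
        h[..., j] = activations[rng.integers(2)](h[..., j])
    # map the output to a cell activity in [0, 1]
    return 1.0 / (1.0 + np.exp(-(h @ w_out)))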

4 Experimental methods

We describe here the continuous Game of Life (Lenia) that we use as a testbed, representative of a large class of high-dimensional dynamical systems, as well as the experimental procedures, the evaluation methods used to measure diversity and detect SLPs, and the algorithmic baselines and ablations used.

4.1 Continuous Game of Life as a testbed

Figure 2: Example patterns produced in the continuous GOL system (Lenia). (a) Illustration of the dynamical morphing from an initial CPPN image to an animal (t = 1, 50, 100, 200). (b) Lenia animals discovered by IMGEP-OGL. (c) Lenia animals discovered by chan2018lenia. The automated discovery (b) is able to find complex animals similar to those found by a human expert's manual search (c).

Lenia (chan2018lenia) is a continuous cellular automaton (wolfram1983statistical) similar to Conway's Game of Life (gardener1970mathematical). Lenia, in particular, is a high-dimensional complex dynamical system in which diverse visual structures can self-organize and yet are hard to find by manual exploration. It features the richness of Turing-complete game-of-life models. It is therefore well suited to test the performance of pattern exploration algorithms for unknown and complex systems. The fact that GOL models have been widely used to study self-organization in various disciplines, ranging from physics to biology and economics (bak1989self), also supports their generality and the potential to reuse our approach for discovery in other computational or wet high-dimensional systems.

Lenia consists of a two-dimensional grid of cells where the state of each cell is a real-valued scalar activity $A^t(x) \in [0, 1]$. The states of the cells evolve over discrete time steps $t$ (Fig. 2, a). The activity change is computed by integrating the activity of neighbouring cells. Lenia's behavior is controlled by its initial pattern $A^{t=1}$ and several settings that control the dynamics of the activity change ($R$, $T$, $\mu$, $\sigma$, $\beta$). Appendix A describes Lenia and its parameters in detail.

Lenia can be understood as a self-organizing morphogenetic system. Its parameters for the initial pattern and the dynamics control determine the development of morphological patterns. Lenia can produce diverse patterns with different dynamics (stable, non-stable or chaotic). Most interestingly, spatially localized coherent patterns that resemble microscopic animals in their shapes can emerge (Fig. 2, b, c). These pattern types, which we denote “animals” for short, are a key reason why scientists have used GOL models to study theories of the origins of life (gardener1970mathematical; beer2004autopoiesis). Therefore, in our evaluation method based on measures of diversity (see below), we study in particular the performance of IMGEPs, and the impact of using various approaches for goal space representation, on finding a diversity of animal patterns. We implemented different pattern classifiers for this purpose to analyze the exploration results (Appendix A.2). Initially, we differentiate between dead and alive patterns. A pattern is dead if the activity of all cells is either 0 or 1. Alive patterns are separated into animals and non-animals. Animals are connected areas of positive activity which are finite, i.e. which do not loop infinitely across borders. All other patterns are non-animals, whose activity usually spreads over the whole state space.

4.2 Evaluation based on the diversity of Patterns

The algorithms are evaluated based on their discovered diversity of patterns. Diversity is measured by the spread of the exploration in an analytic behavior space. This space is externally defined by the experimenter, as in previous evaluation approaches in the IMGEP literature. For example, in pere2018unsupervised the diversity of the discovered effects of a robot that manipulates objects is measured by binning the space of object positions and counting the number of bins discovered. A difference here is that the experimenter does not have access to an easily interpretable, hand-defined low-dimensional representation of the possible patterns, equivalent to the Cartesian coordinates of rigid objects. The space of raw observations, i.e. the final Lenia patterns, is also too high-dimensional for a meaningful measure of spread. We therefore constructed an external evaluation space. First, a latent representation space was built by training a VAE to learn the important features over a very large dataset of Lenia patterns identified during the many experiments over all evaluated algorithms. This large dataset covers a diversity of patterns orders of magnitude larger than what could be found in any single algorithm experiment, whose experimental budget was orders of magnitude smaller. We then augmented that space by concatenating hand-defined features (the same as for the HGS algorithm). See Appendix C for more information.

For each experiment, all explored patterns were projected into the analytic behavior space. The diversity of the patterns is then measured by discretizing the space into bins of equal size, splitting each dimension into a fixed number of sections (results were found to be robust to the number of bins per dimension, see Appendix C). The number of bins in which at least one explored pattern falls is used as the measure of diversity.

We also measured the diversity in the space of parameters by constructing an analytic parameter space. The 15 features of this space consist of Lenia's dynamics parameters ($R$, $T$, $\mu$, $\sigma$, $\beta_1$, $\beta_2$, $\beta_3$) and the 8-dimensional latent representation of a VAE. This VAE was trained on a large dataset of initial Lenia states ($A^{t=1}$) used over the experimental campaign. This diversity measure also used 7 bins per dimension.

4.3 Algorithms

The exploration behaviors of different IMGEP algorithms were evaluated and compared to a random exploration. The IMGEP variants differ in how their goal space is defined or learned. Appendices D and E provide details and hyperparameters.

Random exploration: The IMGEP variants were compared to a random exploration that, for each of the 5000 exploration iterations, randomly sampled the parameters $\theta$, including the initial state $A^{t=1}$.

IMGEP-HGS - Goal exploration with a hand-defined goal space: The first IMGEP uses a hand-defined goal space that is composed of 5 features used in chan2018lenia. Each feature measures a certain property of the final pattern that emerged in Lenia: 1) the sum over the activity of all cells, 2) the number of activated cells, 3) the density of the activity center, 4) an asymmetry measure of the pattern and 5) a distribution measure of the pattern.

IMGEP-PGL - Goal exploration with a pretrained goal space: For this IMGEP variant, the goal space was learned with a VAE on training data before the exploration process started. The training set consisted of 558 Lenia patterns: half were animals that had been manually identified by chan2018lenia; the other half were randomly generated with CPPNs (see Section 4.4).

IMGEP-OGL - Goal exploration with online learning of the goal space: Algorithm 1.

IMGEP-RGS - Goal exploration with a random goal space: An ablated IMGEP using a goal space based on the encoder of a VAE with random weights.

4.4 Experimental Procedure and hyperparameters

For each algorithm, 10 repetitions of the exploration experiment were conducted. Each experiment consisted of 5000 exploration iterations, a number chosen to be compatible with the application of the algorithms in physical experimental setups similar to grizou2019exploration, planned for future work. For the IMGEP variants, the first 1000 iterations used random parameter sampling to initialize their histories $\mathcal{H}$. For each of the following 4000 iterations, each IMGEP approach sampled a goal via a uniform distribution over its goal space. The ranges for sampling in the hand-defined goal space (HGS) are defined in Table 5 (Appendix D). The ranges for the VAE-based goal spaces (PGL, OGL) were set to the same fixed interval for each of their latent variables. Then, the parameter from a previous exploration in $\mathcal{H}$ was selected whose reached goal had the minimum Euclidean distance to the current goal within the goal space. This parameter was then mutated to generate the parameter that was explored.

The parameters $\theta$ consisted of a CPPN (Section 3.3) that generates the initial state $A^{t=1}$ for Lenia, and the settings defining Lenia's dynamics: $R$, $T$, $\mu$, $\sigma$, $\beta$. The CPPNs were initialized and mutated by a random process that defines their structure and connection weights, as done by stanley2006exploiting. The random initialization of the other Lenia settings was done by a uniform distribution, and their mutation by a Gaussian distribution around the original values. The meta-parameters for initializing and mutating the parameters were the same for all algorithms (Appendix B). They were chosen manually, without optimizing them for a specific algorithm.

5 Results

We address several questions evaluating the ability of IMGEP algorithms to identify a diverse set of patterns, and in particular diverse “animal” patterns (i.e. spatially localized patterns).

(a) Diversity in Parameter Space (b) Diversity in Behavior Space
(c) Behavior Space Diversity for Animals (d) Behavior Space Diversity for Non-Animals
Figure 3: (a) Although random explorations reach the highest diversity in the analytic parameter space, (b) IMGEPs reach a higher diversity in the analytic behavior space (except when using random representations). (c) IMGEPs with a learned goal space discovered a larger diversity of animals compared to a hand-defined goal space. (d) Learned goal spaces are as efficient as a hand-defined space for finding diverse non-animal patterns. Overall, IMGEPs with unsupervised learning of goal features are efficient at discovering a diversity of patterns. Depicted is the average diversity over the 10 repetitions, with the standard deviation as shaded area (for some curves not visible because it is too small). See Fig. 27, 28, 29, 30 and 31 in the Annex for a qualitative visual illustration of these results.
Does goal exploration outperform random parameter exploration?

In robotics/agents contexts where scenes are populated with rigid objects, various forms of goal exploration algorithms outperform random parameter exploration (laversanne2018curiosity). We checked whether this still holds in the continuous GOL, which has very different properties. Measures of the diversity in the analytic behavior space confirm the advantage of IMGEPs with hand-designed (HGS) or learned goal spaces (PGL/OGL) over random explorations (Fig. 3, b). The organization resulting from goal exploration is also visible in the diversity of the explored parameters: IMGEPs focus their exploration on subspaces that are useful for targeting new goals, whereas random parameter exploration is unguided, resulting in a higher diversity in the parameter space (Fig. 3, a).

What is the impact of learning a goal space vs. using random or hand-defined features?

We also compared the performance of random VAE goal spaces (RGS) to learned goal spaces (PGL/OGL). For reinforcement learning problems, intrinsic reward functions based on random features of the observations can yield high, even boosted, performance (burda2018large; burda2018exploration). In our context, however, using random features (RGS) collapsed the performance of goal exploration, which did not even outperform random parameter exploration for all kinds of behavioural diversity (Fig. 3). The results also show that hand-defined features (HGS) produced significantly less global diversity and less “animal” diversity than learned features (PGL/OGL). However, HGS found an equal diversity of “non-animals”. These results show that in this domain the goal space has a critical influence on the type and diversity of the discovered patterns. Furthermore, unsupervised learning is an efficient approach to discovering a diversity of patterns, i.e. it is efficient at finding both diverse animals and diverse non-animals.

Is pretraining on a database of expert patterns necessary for efficient discovery of diverse animals?

One possibility to bias exploration towards patterns of interest, such as “animals”, is to pretrain a goal space on a pattern dataset hand-made by an expert. Here, PGL was pretrained with a dataset containing 50% animals, which leads it to discover a high diversity of animals. However, the new online approach (IMGEP-OGL) is as efficient as PGL at discovering diverse patterns (Fig. 3, b, c, d). Taken together, these results uncover an interesting bias of learned features with a VAE architecture, which strongly incentivizes the efficient discovery of diverse spatially localized patterns.

(a) IMGEP-HGS Goal Space (b) IMGEP-OGL Goal Space
Figure 4: The (a) hand-defined and (b) learned goal spaces show major differences, illustrated here by a t-SNE visualization. The different number and size of the clusters of animals and non-animals can explain the differences in the resulting diversity between the algorithms (Fig. 3).
How do goal space representations differ?

We analyzed the goal spaces of the different IMGEP variants to understand their behavior, by visualizing their reached goals in a two-dimensional space. t-SNE (maaten2008visualizing) was used to reduce the high-dimensional goal spaces; it places points that were nearby in the high-dimensional space close to each other in the two-dimensional visualization.
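A sketch of this analysis, assuming reached_goals is the array of encoded outcomes of one exploration run and is_animal a boolean mask obtained from the pattern classifier:

import matplotlib.pyplot as plt
from sklearn.manifold import TSNE

def plot_goal_space(reached_goals, is_animal):
    # project the reached goals (n, goal_dim) to 2D and color them by class
    embedding = TSNE(n_components=2, perplexity=30).fit_transform(reached_goals)
    plt.scatter(*embedding[is_animal].T, s=4, label="animal")
    plt.scatter(*embedding[~is_animal].T, s=4, label="non-animal")
    plt.legend()
    plt.show()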

The hand-defined (HGS) and learned (OGL) goal spaces show strong differences (Fig. 4). We believe this explains their different abilities to find either a high diversity of non-animals or of animals (Fig. 3, c, d). The goal space of IMGEP-HGS shows large areas and several clusters of non-animal patterns (Fig. 4, a). Animals form only a few, nearby clusters. The hand-defined features thus seem poorly suited to discriminate and describe animal patterns in Lenia. As a consequence, when goals are uniformly sampled within this goal space during the exploration process, more goals are generated in regions that describe non-animals. This can explain why IMGEP-HGS explored a higher diversity of non-animal patterns but only a low diversity of animal patterns. In contrast, the features learned by IMGEP-OGL better capture the factors that differentiate animal patterns, as indicated by the several clusters of animals that span a wide area of its goal space (Fig. 4, b).

We attribute this effect to the difficulty of VAEs in capturing sharp details (zhao2017towards): they mainly represent the general form of the patterns but not their fine-grained structure. Animals often differ in their form, whereas non-animals often occupy the whole cell grid and differ in their fine-grained details. Goal spaces learned by VAEs therefore seem better suited for exploring diverse sets of animal patterns.

6 Conclusion

We formulated a novel application area for machine learning: the problem of automatically discovering self-organized patterns in complex dynamical systems that are high-dimensional both in their action space and in their observation space. We showed that this problem calls for advanced methods requiring the dynamic interaction between sample efficient autonomous exploration and unsupervised representation learning. We demonstrated that population-based IMGEPs are a promising algorithmic framework to address this challenge, by showing how they can discover diverse self-organized patterns in a continuous GOL. In particular, we introduced a new approach that learns a goal space representation online, from data collected during the exploration process. It enables the sample efficient discovery of diverse sets of animal-like patterns, similar to those identified by human experts, yet without relying on such prior expert knowledge (Fig. 2). We also analyzed the impact of goal space representations on the diversity and types of discovered patterns.

The continuous game of life shares many properties with other artificial or natural complex systems, which explains why GOL models have been used in many disciplines to study self-organization (see bak1989self). We therefore believe this study shows the potential of IMGEPs for automated discovery in other systems encountered in physics, chemistry, or even computer animation. In future work, we aim to apply this approach in roboticized wet experiments, such as the one presented in grizou2019exploration, addressing the fundamental understanding of how proto-cells can self-organize.

Acknowledgments

We thank Bert Wang-Chak Chan for his helpful discussions about the Lenia system and Jonathan Grizou for his comments on the visualization of our results. Furthermore, we thank Cédric Colas for his useful comments on the script.

References

Appendix A Target System: continuous Game of Life (Lenia)

The Lenia model is a particular implementation of continuous Game of Life models (chan2018lenia). It was used as the target system for all exploration experiments. The following section describes Lenia and the parameters that control its behavior in detail. It is followed by a description of the classifiers used to categorize dead, animal and non-animal Lenia patterns. Finally, the statistical measures over the patterns which were used to define the goal and analytic spaces are introduced.

a.1 Implementation Details and Parameters

Lenia (chan2018lenia) is a cellular automaton (wolfram1983statistical). It consists of a two-dimensional grid of cells of a fixed size for all experiments. Opposite borders of the grid are connected: cells on the north border are neighbors of the south border cells, and the east and west borders are likewise connected, so that the grid wraps around on itself. The state of each cell is a real-valued scalar activity $A^t(x) \in [0, 1]$. The states of the cells evolve over discrete time steps $t$, with a fixed number of steps for all experiments. The activity change of a cell is computed by integrating the previous activity of its neighbouring cells:

$$A^{t+1}(x) = \mathrm{clip}_{[0,1]}\!\left( A^{t}(x) + \frac{1}{T}\, G\big((K * A^{t})(x)\big) \right)$$

where $G$ is the growth mapping, $K$ is the kernel, $T$ is the time scale, and $\mathrm{clip}_{[0,1]}$ is the clip function. For all experiments an exponential growth mapping was used:

$$G(u; \mu, \sigma) = 2\, \exp\!\left( -\frac{(u - \mu)^2}{2\sigma^2} \right) - 1$$

with $\mu$ and $\sigma$ being parameters that control its shape.

The kernel integrates the activity of the current cell and its neighbours by a convolution with a kernel function $K$:

$$(K * A^{t})(x) = \sum_{n \in N} K(n)\, A^{t}(x + n) \qquad (1)$$

where $N$ is the neighborhood around the cell $x$ and $n \in N$ is the site distance to it. The neighborhood is defined by a circle around $x$ with radius $R$: $N = \{ n : \|n\|_2 \le R \}$. The kernel is constructed from a kernel core function $K_C$ and a kernel shell function $K_S$. The kernel core creates a ring around the center coordinate and is defined by an exponential:

$$K_C(r) = \exp\!\left( 4 - \frac{1}{r(1 - r)} \right), \qquad r \in (0, 1)$$

The kernel shell takes a vector parameter $\beta$ and copies the kernel core into $|\beta|$ concentric rings. The rings are of equal thickness, with peak heights $\beta_1, \ldots, \beta_{|\beta|}$:

$$K_S(r; \beta) = \beta_{\lfloor |\beta| r \rfloor}\, K_C\big( (|\beta| r) \bmod 1 \big)$$

Finally, the kernel is normalized:

$$K(n) = \frac{K_S(\|n\|_2 / R;\, \beta)}{\sum_{n' \in N} K_S(\|n'\|_2 / R;\, \beta)}$$
In total, 8 parameters control the behavior of Lenia in all experiments. $A^{t=1}$ is the starting pattern of the system. $R$ is the radius of the circle around a cell whose enclosed cells influence that cell's activity. $T$ controls the strength of the growth update per time step. The growth mapping is controlled by $\mu$ and $\sigma$. The form of the kernel function is controlled by $\beta = (\beta_1, \beta_2, \beta_3)$.
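A compact sketch of one Lenia update under these definitions, assuming the normalized kernel has been pre-computed and its FFT K_fft is given (FFT-based convolution makes the borders wrap around, matching Lenia's connected borders); the construction of the β-structured kernel is omitted:

import numpy as np

def lenia_step(A, K_fft, T, mu, sigma):
    # (K * A)(x): convolution on the wrapped grid via the FFT
    u = np.real(np.fft.ifft2(np.fft.fft2(A) * K_fft))
    # exponential growth mapping G(u; mu, sigma), with values in [-1, 1]
    growth = 2.0 * np.exp(-((u - mu) ** 2) / (2.0 * sigma**2)) - 1.0
    # fractional step of size 1/T, clipped back to [0, 1]
    return np.clip(A + growth / T, 0.0, 1.0)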

We based our Python implementation of Lenia on the code provided by https://github.com/Chakazul/Lenia.

Figure 5: Patterns identified by a human expert (chan2018lenia). The activity of each cell is mapped from $[0, 1]$ to the color scheme visualized at the bottom. An activity of 0 is white. This color scheme is used for all illustrated patterns throughout the paper.

a.2 Classifier

We categorized the patterns observed in Lenia into 3 types: dead, animals and non-animals. The categories were used to analyze whether the exploration algorithms showed differences in their exploration behaviors by identifying different types of patterns. A classifier is defined for each class. The classifiers only classify the final pattern into which the Lenia system morphs after the last time step.

Dead Classifier: A pattern is classified as dead if the activity of all cells is either 0 or 1 in the last time step.

Animal Classifier: The final Lenia pattern is classified as an animal if it is a finite and connected pattern of activity. Two cells $x_1$, $x_2$ are connected as a pattern if both are active ($A(x_1) > 0$ and $A(x_2) > 0$) and if they influence each other. Cells influence each other when they are within the radius of each other's kernel, as defined by the parameter $R$ (Eq. 1).

Furthermore, the connected pattern must be finite. In Lenia, finite and infinite patterns can be differentiated because the opposite borders of Lenia's cell grid are connected, so that the grid wraps around on itself. A pattern can thus loop around the grid, making it infinite. We identify infinite patterns by the following approach. First, all connected patterns are identified for the infinite grid case, i.e. assuming opposite grid borders are connected. Second, all connected patterns are identified for the finite grid case, i.e. assuming opposite grid borders are not connected. Third, for each border pair (north-south and east-west) it is tested whether cells within a distance of $R$ from both borders exist that are part of a connected pattern in both the infinite and the finite grid case. If such a pattern exists, it is assumed to be infinite, because it loops around Lenia's grid (Fig. 6, a). All other patterns are considered finite (Fig. 6, b). Please note that this method has a drawback: it cannot identify certain infinite patterns that loop over several borders, for example a pattern that connects the north to the east border and then the west to the south border (Fig. 7).
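A simplified sketch of this border test: it flags a pattern as infinite when a single component of the finite (non-wrapped) segmentation has active cells within distance r of two opposite borders, i.e. it stretches across the whole grid. The full classifier additionally cross-checks the wrapped segmentation, which this sketch omits.

import numpy as np
from scipy import ndimage

def looks_infinite(active, r):
    # active: boolean grid of cells with positive activity
    labels, _ = ndimage.label(active)  # finite (non-wrapped) segmentation
    for axis in (0, 1):  # north-south and east-west border pairs
        near_first = set(np.unique(np.take(labels, range(r), axis=axis))) - {0}
        near_last = set(np.unique(np.take(labels, range(-r, 0), axis=axis))) - {0}
        if near_first & near_last:  # one component touches both borders
            return True
    return False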

Figure 6: Classification of Lenia patterns into finite and infinite patterns. For (a) an infinite pattern and (b) a finite pattern, the pattern and its infinite and finite segmentations are shown. Infinite patterns form a loop between the image borders, which is identified when a segment is connected between two borders in both the infinite and the finite segmentation. Finite patterns form no loops: they have segments connected between borders in the infinite but not in the finite segmentation. Segments are colorized in yellow, green and blue.

Figure 7: Examples of infinite patterns that are misclassified as finite patterns.

Moreover, there are two additional constraints that an animal pattern must fulfill. First, the cells of the connected pattern must hold at least 80% of the pattern's total activation. Second, the pattern must exist for the last two time steps. Both constraints are used to avoid classifying too small patterns, or chaotic entities which change drastically between time steps, as animals. See Fig. 5, 27, 29, 30 and 31 for examples of animal patterns.

Non-Animal Classifier: We also classified non-animal patterns, which are all patterns that were neither dead nor an animal. These patterns usually spread over the whole state space and are connected via the borders. See Fig. 27, 29, 30 and 31 for examples of non-animal patterns.

a.3 Statistical Measures for Lenia Patterns

We defined five statistical measures for the final patterns that emerge in Lenia. The measures were used as features for the hand-defined goal spaces of IMGEPs and to partly define the analytic behavior space in which the results of the exploration experiments were compared.

Activation mass $m$: Measures the sum of the total activation of the final pattern, normalized by the size of the Lenia grid:

$$m = \frac{1}{|A|} \sum_{x} A(x)$$

where $|A|$ is the number of cells of the Lenia system.

Activation volume $v$: Measures the number of active cells, normalized by the size of the Lenia grid:

$$v = \frac{|\{x : A(x) > 0\}|}{|A|}$$

Activation density $d$: Measures how densely the activation is distributed, on average, over all active cells:

$$d = \frac{m}{v}$$
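A sketch of these first three measures, following the formulas as reconstructed above:

import numpy as np

def pattern_statistics(A):
    # A: final pattern with activities in [0, 1]
    n_cells = A.size
    mass = A.sum() / n_cells                        # normalized total activation m
    volume = np.count_nonzero(A > 0) / n_cells      # normalized number of active cells v
    density = mass / volume if volume > 0 else 0.0  # average activity per active cell
    return mass, volume, density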

Activation asymmetry $a$: Measures how symmetrically the activation is distributed with respect to an axis that starts at the center of the pattern's activation mass and points along the last movement direction of this center. This measure was introduced especially to characterize animal patterns such as those shown in Fig. 5: the center of the activity mass is usually also the center of the animal, and analyzing the activity along the movement axis measures how symmetrical the animal is.

As a first step, the center of the activation mass is computed for every time step of the Lenia simulation, and the Lenia pattern is recentered to this location. This ensures that the center is computed correctly even when a moving animal reaches one border and reappears on the opposite border in the uncentered pattern. The center for time step $t$ is calculated from the image moments:

$$\bar{x}^t = \frac{M_{10}}{M_{00}}, \qquad \bar{y}^t = \frac{M_{01}}{M_{00}}$$

where $M_{pq} = \sum_{x} \sum_{y} x^p\, y^q\, A^t(x, y)$ is the image moment (or raw moment) of order $p + q$.

Based on the center, the pattern is recentered by shifting the $x$ and $y$ indexes so that the activation center lies at the middle of the grid, with the shifts wrapping around the grid borders:

$$\tilde{A}^t(x, y) = A^t\!\left( \big(x + \bar{x}^t - \tfrac{M}{2}\big) \bmod M,\; \big(y + \bar{y}^t - \tfrac{M}{2}\big) \bmod M \right) \qquad (2)$$

where $M$ is the width and length of the Lenia grid. After each time step the center is recomputed and the pattern recentered.

Please note that the simulations and all figures of patterns in this paper show the uncentered patterns. The centered version is computed only for the purpose of the statistical measures.

The recentering step also defines the movement direction of the activity center between consecutive time steps. A line can then be defined that starts at the midpoint of the final centered pattern and extends along, and opposite to, the final movement direction of the activity mass center. This line separates the grid into two equal areas. The asymmetry is computed by comparing the amount of activity to the right and to the left of the line; the normalized difference between both sides is the final asymmetry measure:

$$a = \frac{\sum_{x \in \mathrm{right}} \tilde{A}(x) - \sum_{x \in \mathrm{left}} \tilde{A}(x)}{\sum_{x} \tilde{A}(x)}$$

Activation centeredness $c$: Measures how strongly the activation is concentrated around the activity mass center:

$$c = \frac{\sum_{x} w(x)\, \tilde{A}(x)}{\sum_{x} \tilde{A}(x)}$$

where the weight $w(x)$ decreases with the distance from the point $x$ to the center point of the grid, and $\tilde{A}$ is the centered activation that is updated at every time step, as for the asymmetry measure (Eq. 2). The weights decrease the farther a point is from the center. Thus, patterns that are concentrated around the center have a value of $c$ close to 1, whereas patterns whose activity is distributed throughout the whole grid have a smaller value. For patterns whose activity is equally distributed over the grid, a fixed default value is used as the centeredness measure.

Appendix B Sampling of Parameters for Lenia

All exploration algorithms explore Lenia patterns by sampling the parameters $\theta$ that control Lenia. The parameters comprise the initial pattern $A^{t=1}$ and the parameters that control the dynamic behavior ($R$, $T$, $\mu$, $\sigma$, $\beta$). There are two operations to sample parameters: 1) random initialization, and 2) mutation of an existing parameter $\theta$. CPPNs are used for the random initialization and mutation of the initial pattern $A^{t=1}$. The details of this process are described in the next section. Afterwards, the initialization and mutation of the Lenia parameters that control the dynamics are described.

b.1 Sampling of Start Patterns for Lenia via CPPNs

Compositional Pattern Producing Networks (CPPNs) are recurrent neural networks that were developed for the generation and evolution of gray-scale 2D images (stanley2006exploiting). We used CPPNs to generate and mutate the initial state of Lenia, which resembles such an image. CPPNs generate images pixel by pixel, taking as input a bias value, the $x$ and $y$ coordinates of the pixel in the image, and its distance $d$ to the image center (Fig. 8). Their output is the gray-scale pixel value for the given coordinate. For the generation of initial Lenia patterns, the $x$ and $y$ coordinates of the grid cells are used as inputs, rescaled to a fixed interval around the grid center, and $d$ is the distance of the cell to the grid center. The final activity of a cell is the CPPN output remapped to the interval $[0, 1]$.

Figure 8: CPPNs are recurrent neural networks which take as input a constant bias, the $x$ and $y$ coordinates of a point in the generated pattern, and its distance $d$ to the center of the pattern. Their output is the activity value of a grid cell.

CPPNs consist of several hidden neurons (typically between 4 and 6 in our experiments) that can have recurrent connections and self connections. Each CPPN has one output neuron. Two activation functions were used for the hidden neurons and the output neuron, a Gaussian (Eq. 3) and a sigmoid (Eq. 4):

$$\mathrm{gauss}(x) = e^{-x^2} \qquad (3)$$

$$\mathrm{sigm}(x) = \frac{1}{1 + e^{-x}} \qquad (4)$$

To randomly initialize a Lenia initial pattern, a CPPN is randomly sampled: its number of hidden neurons, the connections between inputs and neurons and between neurons, the connection weights, and the activation functions of the neurons are all sampled at random. Afterwards, the initial pattern is generated by the CPPN. The CPPN is then added to the history $\mathcal{H}$ of the IMGEPs as part of the parameter $\theta$. If the parameter is mutated, the weights, connections and activation functions of the CPPN are mutated, and the new initial pattern is generated by it. A CPPN is defined by its network structure (number of nodes, connections between nodes) and its connection weights; the number of parameters in $\theta$ is therefore variable and not fixed.

We used the neat-python package (https://github.com/CodeReclaimers/neat-python) for the random generation and mutation of CPPNs. It is based on the NeuroEvolution of Augmenting Topologies (NEAT) algorithm for the evolution of neural networks (stanley2002efficient). The meta-parameters for the initialization and mutation of CPPNs are listed in Table 1. The random sampling and mutation of CPPNs allows the generation of complex patterns, as illustrated in Fig. 9.

Parameter Value
Initial number of hidden neurons
Initial activation functions gauss, sigm
Initial connections random connections with probability
Initial synapse weight Gaussian distribution with ,
Synapse weight range
Mutation neuron add probability
Mutation neuron delete probability
Mutation connection add probability
Mutation connection delete probability
Mutation rate of activation functions
Mutation rate of synapse weights
Mutation replace rate of synapse weights
Mutation power of synapse weights
Mutation enable/disable rate of synapse weights
Table 1: Configuration for the initialization and mutation of CPPN networks that generate the initial state for the Lenia system.

The random sampling of a new CPPN is done in the following steps. All CPPNs are initialized with 4 hidden neurons and 1 output neuron, whose activation functions are randomly assigned. Each input-hidden, hidden-hidden and hidden-output neuron pair is connected with the probability given in Table 1. The weight of each connection is sampled from a Gaussian distribution and clipped to the synapse weight range (Table 1).

An existing CPPN is mutated by the following procedure, with all probabilities and rates given in Table 1. At first, structural mutations are performed. With the neuron-add probability, a new neuron with a random activation function is added: a random existing connection is chosen and deleted, a connection from the source of the deleted connection to the new neuron is added with weight 1, and a new connection from the new neuron to the target of the deleted connection is added with the old connection weight, finishing the addition of the new neuron. With the neuron-delete probability, one of the hidden neurons is deleted. With the connection-add probability, a new connection is added between a random input-hidden, hidden-hidden or hidden-output neuron pair; its weight is sampled by the same method as for new CPPNs. With the connection-delete probability, one random existing connection is removed. After the structural mutations, the activation functions and weights are mutated. For each neuron, the activation function is changed with the corresponding mutation rate by randomly assigning a new activation function (either gauss or sigm). For each connection, the weight is mutated in two steps: with the weight mutation rate, a random perturbation scaled by the mutation power is added to the weight and the result is clipped to the synapse weight range; with the replace rate, the connection weight is completely replaced by sampling a new one, as done when sampling new CPPNs.

Please note that the neat-python package also allows setting and mutating response and bias weights for each neuron; those settings were not used in the experiments. Moreover, we adjusted the sigmoid and Gaussian functions in the neat-python package to the ones defined in Eq. 3 and Eq. 4 in order to replicate images similar to stanley2006exploiting.

Initialization 1st Mutation 2nd Mutation 3rd Mutation 4th Mutation 5th Mutation
Figure 9: CPPNs can generate complex patterns via their random initialization and successive mutations. Each row shows generated patterns by one CPPN and its mutations.

b.2 Sampling of Lenia’s Dynamic Parameters

The parameters that control the dynamics of Lenia ($R$, $T$, $\mu$, $\sigma$, $\beta$) are initialized and mutated via uniform and Gaussian distributions. Table 2 lists the meta-parameters for the initialization and mutation of each parameter. Each parameter is initialized by uniform sampling between its lower and upper border. An existing parameter is mutated by adding Gaussian noise whose standard deviation is the parameter's mutation power, and clipping the result to the parameter's value range (see the sketch after Table 2). For natural numbers, the resulting value is rounded to the nearest natural number.

Parameter Type Value Range Mutation
Table 2: Settings for the initialization and mutation of Lenia system parameters .
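A sketch of this initialization-and-mutation scheme for a single scalar parameter; the concrete value ranges and mutation powers are the ones listed in Table 2:

import numpy as np

def init_dynamics_parameter(low, high, is_integer=False):
    # uniform initialization within the parameter's value range
    value = np.random.uniform(low, high)
    return int(round(value)) if is_integer else float(value)

def mutate_dynamics_parameter(value, sigma_mut, low, high, is_integer=False):
    # Gaussian mutation around the current value, clipped to the valid range
    new_value = np.clip(value + np.random.normal(0.0, sigma_mut), low, high)
    return int(round(new_value)) if is_integer else float(new_value)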

Appendix C Measurement of Diversity in the Analytic Parameter and Behavior Space

The algorithms are compared on their ability to explore a diverse set of patterns. The next section introduces the diversity measure, followed by sections that introduce the spaces in which the algorithms are compared.

c.1 Diversity Measure

Diversity is measured by the area that the explored parameters cover in the parameter space of Lenia, or that the identified patterns cover in the observation space. For the experiments, the parameter space consisted of the initial state of Lenia ($A^{t=1}$) and the settings for Lenia's dynamics ($R$, $T$, $\mu$, $\sigma$, $\beta$). The space therefore has one dimension per grid cell of the initial pattern, plus 7 dimensions for the dynamics settings. The observation space consists of the final patterns, with one dimension per grid cell. Each single exploration results in a new point in these spaces.

The diversity measures how much area the algorithms explored in these spaces (Fig. 10). The measurement is done by discretizing the space with a spatial grid and counting the number of discretized areas in which at least one point falls. For the discretization, each dimension of the space is given a range, i.e. a minimum and a maximum border. Each dimension is then split into a certain number of equally sized bins between these borders. Values falling below the minimum or above the maximum border are counted in two additional bins.
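A sketch of this discretization, including the two out-of-range bins per dimension:

import numpy as np

def diversity(points, low, high, n_bins=5):
    # points: (n, d) explored points; low/high: (d,) range borders per dimension
    points, low, high = np.asarray(points), np.asarray(low), np.asarray(high)
    # bin index 0 = below range, 1..n_bins = inside the range, n_bins + 1 = above
    scaled = (points - low) / (high - low) * n_bins
    idx = np.clip(np.floor(scaled).astype(int) + 1, 0, n_bins + 1)
    idx[points < low] = 0
    idx[points > high] = n_bins + 1
    # diversity = number of occupied bins
    return len({tuple(row) for row in idx})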

The number of dimensions of the original parameter and observation spaces is too large to measure diversity in a meaningful manner: the initial pattern and the final pattern have one dimension per grid cell. We therefore constructed an analytic parameter space and an analytic behavior space, where the latent representation of a β-VAE is used to reduce the high-dimensional patterns to 8 dimensions. The diversity in these spaces was compared between the algorithms. 5 bins per dimension (7 including the out-of-range bins) were used for the discretization of these spaces for all experiments in the paper.

Figure 10: Illustration of the diversity measure in a two-dimensional space. The number of bins per dimension is 5. Including the outlier areas, the number of discretized bins is $7 \times 7 = 49$. The diversity is the number of bins in which points exist (grey areas), which is 12 in this example.

c.2 Analytic Parameter Space

The analytic parameter space was constructed from the 7 Lenia parameters that control its dynamics and the 8 latent dimensions of a β-VAE (Table 3). The β-VAE was trained on initial patterns ($A^{t=1}$) used during the experiments. The dataset was constructed by randomly selecting 42500 patterns (37500 as training set, 5000 as validation set) from the experiments of all algorithms and each of their 10 repetitions. The β-VAE uses the same structure, hyper-parameters, loss function and learning algorithm as described in Appendix E. It was trained for more than 1400 epochs (Fig. 11). The encoder which resulted in the minimal validation set error during training was used. According to its reconstructed patterns, it can represent the general form of patterns but often not individual details such as their texture (Fig. 12).

Analytic Parameter Space Definition: the 7 Lenia dynamics parameters $R$, $T$, $\mu$, $\sigma$, $\beta_1$, $\beta_2$, $\beta_3$ (with their value ranges from Table 2) and the 8 β-VAE latent dimensions, each with min -5 and max 5.

Table 3: Features of the analytic parameter space and their min and max values

Figure 11: Learning curve of the β-VAE whose latent encoding was used for the analytic parameter space.

Figure 12: Examples of patterns (left) and their reconstructed output (right) by the β-VAE used for the construction of the analytic parameter space. The patterns are sampled from the validation dataset.

c.3 Analytic Behavior Space

Analytic Behavior Space Definition

Parameter           min   max
mass                 0     1
volume               0     1
density              0     1
asymmetry           -1     1
centeredness         0     1
β-VAE latents 1-8   -5     5

Table 4: Features of the analytic behavior space and their min and max values

Figure 13: Learning curve of the β-VAE whose latent encoding was used for the analytic behavior space.

Figure 14: Examples of patterns (left) and their reconstructed output (right) by the β-VAE used for the construction of the analytic behavior space. The patterns are sampled from the validation dataset.

The analytic behavior space was constructed by combining the 5 statistical measures of final Lenia patterns (Section A.3) and the 8 latent dimensions of a β-VAE (Table 4). The β-VAE was trained on final patterns observed during the experiments. The dataset was constructed by randomly selecting 42500 patterns (37500 as training set, 5000 as validation set) from the experiments of all algorithms and each of their 10 repetitions; it consists of 50% animal and 50% non-animal patterns. The β-VAE uses the same structure, hyper-parameters, loss function and learning algorithm as described in Appendix E. It was trained for more than 1400 epochs (Fig. 13). The encoder which resulted in the minimal validation set error during training was used. Its reconstructed patterns show that it is able to represent the general form of patterns but often not individual details such as their texture (Fig. 14).

Appendix D Random Exploration and IMGEPs with Hand-Defined Goal Spaces

Two random exploration strategies and several IMGEPs with different hand-defined goal spaces were evaluated and compared. The main paper and the additional results in Section F only report the results for the best random exploration and one IMGEP variant with a hand-defined goal space. This section introduces the implementation details and diversity results of all evaluated random explorations and IMGEPs with hand-defined goal spaces.

D.1 Random Explorations

We evaluated two random exploration strategies: Random Initialization and Random Mutation. The main paper and the additional results in Section F only discuss the Random Initialization approach.

Random Initialization: This approach sampled for each of the 5000 explorations a random parameter θ, including a random CPPN to generate the initial state. The approach can be replicated by using Algorithm 2 with N_init = N = 5000.

Random Mutation: This approach is closer to the principle of IMGEPs. It first performs N_init random explorations and adds each explored parameter to a history H. Afterwards, it randomly samples a parameter from the history and mutates it. The new parameter is also added to the history H. The approach can be replicated by using Algorithm 2 where line 6 is skipped and the parameter sampling distribution selects a random parameter from the history and mutates it.

D.2 IMGEPs with Hand-Defined Goal Spaces

We evaluated several IMGEP variants with goal spaces that were hand-defined (IMGEP-HGS). Each space was constructed from a different combination of statistical measures of the final Lenia patterns (Tables 5 and 6), which are described in Section A.3. The main paper and the additional results in Section F only discuss the IMGEP-HGS 9 approach. Algorithm 2 lists the steps of the IMGEP-HGS variants. They begin with 1000 random explorations, followed by 4000 explorations based on randomly generated goals. Each goal was sampled from a uniform distribution within the ranges defined in Table 5. Then the parameter from a previous exploration that resulted in the outcome closest to the current goal was mutated and explored.

1  Initialize goal space representation R with hand-defined features
2  for i ← 1 to N do
3        if i < N_init then  // Initial random iterations to populate H
4              Sample θ ∼ U(Θ)
5        else  // Intrinsically motivated iterations
6              Sample a goal g based on the space represented by R
7              Choose the θ from the history H whose outcome was closest to g
8              Mutate θ by a random perturbation
9        Perform an experiment with θ and observe the outcome o
10       Append (θ, R(o)) to the history H
11 end for
Algorithm 2 IMGEP-HGS
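A minimal Python sketch of this loop is given below, under stated assumptions: `system`, `goal_space`, `sample_random_param` and `mutate` are hypothetical helpers (not the released implementation), and `goal_space` bundles the goal sampling, representation and distance functions.

def imgep_hgs(system, goal_space, sample_random_param, mutate, n=5000, n_init=1000):
    # history stores (parameter, goal space representation) tuples.
    history = []
    for i in range(n):
        if i < n_init:                        # initial random iterations
            theta = sample_random_param()
        else:                                 # intrinsically motivated iterations
            goal = goal_space.sample_goal()   # uniform within the feature ranges
            # parameter whose previous outcome was closest to the goal ...
            theta, _ = min(history, key=lambda h: goal_space.distance(h[1], goal))
            theta = mutate(theta)             # ... mutated before the experiment
        pattern = system.run(theta)           # run Lenia until the final pattern
        history.append((theta, goal_space.represent(pattern)))
    return history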

D.3 Results

The random explorations and IMGEP-HGS variants are compared by their resulting diversity in the analytic parameter and behavior space (Fig. 15). The diversity is measured by the number of reached bins in each space using a binning of 7 bins per dimension.

The Random Initialization approach reached a higher diversity than the Random Mutation approach for all diversity measures. Therefore, the Random Initialization approach is used for the comparison to IMGEP approaches in the main paper and the additional results in Section F.

Most IMGEP-HGS variants had a higher diversity in the analytic behavior space than the random explorations, although their diversity in the analytic parameter space is lower. This shows the advantage of IMGEPs over random searches in discovering a wider range of patterns in the target system. IMGEP-HGS 3, 4 and 9 had the best overall diversity. We chose IMGEP-HGS 9 for the comparison with learned goal spaces in the main paper and for the additional results in Section F. It identified the highest diversity of non-animals of the three variants (3, 4, 9), reaching a higher diversity for non-animals than any IMGEP with a learned goal space. It was therefore selected to show that the choice of the goal space has an influence on the patterns that IMGEPs identify.

The diversity of the IMGEP-HGS variants varied depending on the statistical measures used to define the goal space. IMGEPs that use the volume measure (HGS 1-4) reach in general a higher overall diversity than goal spaces with the density measure (HGS 5-8), which can be attributed to their higher diversity of animal patterns (Fig. 15, b, c). In terms of the diversity of identified animals, the inclusion of several measures showed the best performance (HGS 4 and HGS 8 in Fig. 15, c). In terms of the diversity of identified non-animals, the inclusion of several measures besides the centeredness measure showed the best performance (HGS 3 and HGS 7 in Fig. 15, d). The results show that the choice of the goal space has an important influence on the diversity of identified patterns and their type (animal or non-animal).

Feature min max
mass
volume
density
asymmetry
centeredness
Table 5: HGS Goal Space Ranges
HGS-Variants
Feature 1 2 3 4 5 6 7 8 9
mass
volume
density
centeredness
asymmetry
Table 6: IMGEP-HGS Variants

(a) Diversity in the Analytic Parameter Space (b) Diversity in the Analytic Behavior Space (c) Diversity of Animals in the Analytic Behavior Space (d) Diversity of Non-Animals in the Analytic Behavior Space

Figure 15: Although all IMGEP-HGS variants have a lower diversity in the analytic parameter space compared to the Random Initialization approach, most of them have a higher diversity in the analytic behavior space. Each dot beside the boxplot shows the diversity of found patterns for one repetition (n = 10). The box ranges from the upper to the lower quartile. The whiskers represent the upper and lower fence. The mean is indicated by the dashed line and the median by the solid line.

Appendix E IMGEPs with Random and Learned Goal Spaces using Deep Variational Autoencoders

We considered three random initializations for the VAE representation used in the IMGEP-RGS, as well as three different training objectives for learning the VAE goal space used in IMGEP-PGL and IMGEP-OGL. Variational Autoencoders (VAEs) (kingma2013auto; rezende2014stochastic) are commonly used deep generative models that can learn a latent representation of the data without supervision. The latent representation has a reduced number of dimensions and should capture the important features of the input data. We use VAEs to learn the important features that describe Lenia patterns. The features are then used to define goal spaces for IMGEPs. This section details the different variants that were implemented for IMGEPs with random and learned VAE goal spaces, gives the implementation details, and compares the diversity results as well as the VAEs' reconstruction accuracy.

E.1 IMGEP with Random VAE Goal Spaces

To study the impact of learning representations we implemented IMGEP-RGS as an ablated version of IMGEPs where the goal space is based on the encoder of a VAE with random weights. We evaluated three variants to randomly set the weights of the VAE encoder: Pytorch (paszke2017automatic), Xavier (glorot2010understanding) and Kaiming (he2015delving). The VAE is composed of four 2D convolutional layers (with ReLU activations) followed by three fully-connected layers. Table 7 shows the different sampling distributions from which the encoder parameters are initialized. We used uniform distributions for both the Xavier and Kaiming variants and set all the layers' bias parameters to zero.

                                          Convolutional Layers      Linear Layers
RGS Variants   bound b                    weight      bias          weight      bias
Pytorch        √(1/n_in)                  U(-b, b)    U(-b, b)      U(-b, b)    U(-b, b)
Xavier         √(6/(n_in + n_out))        U(-b, b)    0             U(-b, b)    0
Kaiming        √(6/n_in)                  U(-b, b)    0             U(-b, b)    0
Table 7: IMGEP-RGS variants initialization schemes. n_in is the number of input units in the weight tensor and n_out is the number of output units in the weight tensor.
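A minimal PyTorch sketch of these initialization schemes follows; `init_rgs_encoder` is a hypothetical helper name, and the encoder is assumed to be a standard `nn.Module` composed of `nn.Conv2d` and `nn.Linear` layers.

import torch.nn as nn

def init_rgs_encoder(encoder: nn.Module, variant: str = "xavier") -> nn.Module:
    # "pytorch" keeps the default initialization; "xavier" and "kaiming"
    # re-initialize the weights with the uniform schemes of Table 7 and
    # set all bias parameters to zero.
    if variant == "pytorch":
        return encoder
    for m in encoder.modules():
        if isinstance(m, (nn.Conv2d, nn.Linear)):
            if variant == "xavier":
                nn.init.xavier_uniform_(m.weight)
            elif variant == "kaiming":
                nn.init.kaiming_uniform_(m.weight, nonlinearity="relu")
            if m.bias is not None:
                nn.init.zeros_(m.bias)
    return encoder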

E.2 IMGEP with Learned VAE Goal Spaces

E.2.1 VAE Framework

VAEs have two components: a neural encoder and a neural decoder. The encoder maps a given data point x to a latent representation z. In variational approaches the encoder describes a data point by a representative distribution q_φ(z|x) in a latent space of reduced dimension D. A standard Gaussian prior p(z) = N(0, I) and a diagonal Gaussian posterior are used for this purpose. Given a data point x, the encoder outputs the mean μ and variance σ² of the representative distribution in the latent space. The decoder tries to reconstruct the original data from a latent representation z sampled from the distribution given by the encoder.

Under these assumptions, training is done by maximizing the computationally tractable evidence lower bound (with β ≥ 1):

L(θ, φ; x) = E_{q_φ(z|x)}[ log p_θ(x|z) ] - β D_KL( q_φ(z|x) || p(z) )        (5)

The first term (the expected log-likelihood) represents the expected reconstruction accuracy, while the second (the D_KL term) is the KL divergence of the approximate posterior from the prior. For a diagonal Gaussian posterior with means μ_i and variances σ_i², and a standard Gaussian prior, this divergence decomposes into the contributions of the D individual latent dimensions:

D_KL( q_φ(z|x) || p(z) ) = 1/2 Σ_{i=1..D} ( μ_i² + σ_i² - log σ_i² - 1 )        (6)

E.2.2 VAE Variants

The recent growing interest in unsupervised representation learning, and therefore in VAEs, resulted in a plethora of proposed losses, network designs and choices of family for the encoder, decoder and prior distributions (tschannen2018recent). In order to enhance desired properties such as interpretability and disentanglement of the latent variables, many current state-of-the-art approaches build on the VAE framework and augment the VAE objective (higgins2017beta; burgess2018understanding; kim2018disentangling; chen2018isolating; kumar2017variational).

In this paper, we couple the VAE architecture with three different objectives: the classical VAE objective (kingma2013auto) (equation 5 with β = 1), the β-VAE objective (higgins2017beta) (equation 5 with β > 1) and an augmented β-VAE objective (equation 7).

The β-VAE objective re-weights the KL term by a factor β, aiming to enhance the disentangling properties of the learned latent factors. We are interested in such properties as it has been shown that they can benefit exploration (laversanne2018curiosity). However, heavily penalizing the KL term can result in the network learning to “sacrifice” one or more of the learned latent variables in order to nullify their contribution (equation 6). Those dimensions become completely uninformative and useless for further exploration in the learned latent space. This phenomenon is known as posterior collapse and is a common problem when training VAEs (bowman2015generating; chen2016variational; he2019lagging; kingma2016improved).

To prevent this phenomenon from happening, we considered an augmented β-VAE objective with a new term that encourages the network to decrease the individual contributions of the different latent variables together. This augmented loss term does not only minimize the averaged contribution (sum) of the per-dimension KL terms but also the variance of the individual contributions:

L_aug(θ, φ; x) = E_{q_φ(z|x)}[ log p_θ(x|z) ] - β Σ_{i=1..D} KL_i - γ Var_i( KL_i )        (7)

where KL_i = 1/2 ( μ_i² + σ_i² - log σ_i² - 1 ) is the contribution of latent dimension i (equation 6).

Similarly, other modifications of the training objective that avoid posterior collapse can be found in the literature (tolstikhin2017wasserstein; zhao2017infovae).

By writing the VAE training objective in the general form of equation 8, the three variants outlined above correspond to different settings of the hyper-parameters β and γ: the VAE (β = 1, γ = 0), the β-VAE (β > 1, γ = 0) and the augmented β-VAE (β > 1, γ > 0).

L(θ, φ; x; β, γ) = E_{q_φ(z|x)}[ log p_θ(x|z) ] - β Σ_{i=1..D} KL_i - γ Var_i( KL_i )        (8)
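As a sketch, all three objectives can be computed (as a loss to minimize) by a single PyTorch function parameterized by (β, γ); the function name and the tensor shapes (batch in the first dimension) are assumptions.

import torch
import torch.nn.functional as F

def vae_loss(x, x_logits, mu, logvar, beta=1.0, gamma=0.0):
    # (beta=1, gamma=0) -> VAE; (beta>1, gamma=0) -> beta-VAE;
    # (beta>1, gamma>0) -> augmented beta-VAE (Eq. 8, negated for minimization).
    recon = F.binary_cross_entropy_with_logits(x_logits, x, reduction="sum") / x.size(0)
    # Per-dimension KL of the diagonal Gaussian posterior to N(0, I),
    # averaged over the batch.
    kl_per_dim = 0.5 * (mu.pow(2) + logvar.exp() - logvar - 1).mean(dim=0)
    return recon + beta * kl_per_dim.sum() + gamma * kl_per_dim.var()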

E.3 Implementation Details

This section describes the IMGEP approaches (RGS, PGL and OGL) and the network architecture, training procedure, hyper-parameters and datasets for the training of their VAEs.

All VAEs use the same architecture (Table 8). The encoder network takes the Lenia pattern as input and outputs for each latent variable i the mean μ_i and log-variance log σ_i². During training, the decoder takes for each latent variable a value z_i sampled from N(μ_i, σ_i²) as input. For validation runs and the generation of all reconstructed patterns shown in figures, the decoder takes the mean μ_i as input. Its output is the reconstructed pattern.

The training objectives of all three variants are given in Section E.2.2. The resulting loss function (Eq. 8) of all VAE variants for a batch is the sum of the reconstruction term and the weighted KL terms:

L_batch = BCE + β Σ_{i=1..D} KL_i + γ Var_i( KL_i )

where x are the input patterns, x̂ are the reconstructed patterns given as logit outputs of the decoder network, and D is the number of latent dimensions. The reconstruction accuracy part of the loss is given by a binary cross entropy with logits:

BCE = -(1/M) Σ_j Σ_c [ x_{j,c} log s(x̂_{j,c}) + (1 - x_{j,c}) log( 1 - s(x̂_{j,c}) ) ]

where the index c runs over the single cells (pixels) of the pattern and j over the data points in the current batch, M is the batch size and s(·) is the logistic sigmoid function. The KL divergence terms are given by:

KL_i = 1/2 ( μ_i² + σ_i² - log σ_i² - 1 )

All VAEs were trained for 2000 epochs and initialized with the PyTorch default initialization. We used the Adam optimizer (kingma2014adam) with weight decay and a batch size of 64.

The patterns from the datasets were augmented by random x and y translations (up to half the pattern size, with probability 0.3), rotations (up to 40 degrees, with probability 0.3), and horizontal and vertical flipping (each with probability 0.2). The translations and rotations were preceded by spherical padding to preserve Lenia's spherical continuity.
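A simplified sketch of this augmentation is given below, assuming patterns are PyTorch tensors; wrap-around `torch.roll` realizes the spherical continuity for translations, and the rotation step is omitted for brevity.

import random
import torch

def augment(pattern: torch.Tensor) -> torch.Tensor:
    h, w = pattern.shape[-2:]
    if random.random() < 0.3:   # translation of up to half the pattern size;
        dy = random.randint(-h // 2, h // 2)   # torch.roll wraps around and
        dx = random.randint(-w // 2, w // 2)   # thus preserves the spherical
        pattern = torch.roll(pattern, shifts=(dy, dx), dims=(-2, -1))  # continuity
    if random.random() < 0.2:   # horizontal flip
        pattern = torch.flip(pattern, dims=(-1,))
    if random.random() < 0.2:   # vertical flip
        pattern = torch.flip(pattern, dims=(-2,))
    return pattern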

Encoder: input Lenia pattern A; 4 convolutional layers (32 kernels each, strided, with padding, each followed by ReLU); FC layers: 256 + ReLU, 256 + ReLU; FC output layer for (μ, log σ²).
Decoder: input latent vector z; FC layers: 256 + ReLU, followed by a second FC + ReLU; 4 transposed convolutional layers (32 kernels each, strided, with padding), with ReLU after all but the last layer.
Table 8: VAE architecture for the pretrained and online experiments.

Three types of IMGEPs were evaluated:

IMGEP-RGS (random goal space):

IMGEP with a goal space defined by an encoder network with random weights (Algorithm 3). The network architecture of the encoder is the same as the one of the VAEs used for the IMGEPs with learned goal spaces. In the other IMGEP algorithms (HGS/PGL/OGL), the goals are sampled uniformly within fixed-range boundaries that are chosen in advance. However, in the case of random goal spaces, we do not know in advance in which region of the space goals will be encoded. Therefore, we set a wide fixed range for each of the latent variables, which also biases exploration towards the boundaries of the discovered goal space.

1  Initialize goal space encoder R with a VAE with random weights
2  for i ← 1 to N do
3        if i < N_init then  // Initial random iterations to populate H
4              Sample θ ∼ U(Θ)
5        else  // Intrinsically motivated iterations
6              Sample a goal g based on the space represented by R
7              Choose the θ from the history H whose outcome was closest to g
8              Mutate θ by a random perturbation
9        Perform an experiment with θ and observe the outcome o
10       Append (θ, R(o)) to the history H
11 end for
Algorithm 3 IMGEP-RGS
IMGEP-PGL (prelearned goal space):

IMGEP (Algorithm 4) with a goal space defined by a VAE that was trained before the exploration starts. The VAE is trained on a dataset of precollected Lenia patterns. The best VAE model obtained during the training phase, i.e. the one with the highest accuracy on the validation data, is used for the exploration.

The dataset used to train the VAE has 558 patterns, which are distributed into training (75%), validation (10%) and testing (15%) sets. Half of the patterns (279) are animal patterns manually identified by chan2018lenia (Fig. 5). The other half (279) are randomly initialized CPPN patterns as described in Section B.1 (Fig. 9).

During the intrinsically motivated iterations, goals are uniformly sampled in a fixed hypercube around the origin. This range was chosen because the encoder of the VAE is trained to match a standard normal prior distribution (through the KL divergence term); we can therefore assume that most of the covered goal space falls into that hypercube.

1  Initialize goal space encoder R with the pretrained VAE
2  for i ← 1 to N do
3        if i < N_init then  // Initial random iterations to populate H
4              Sample θ ∼ U(Θ)
5        else  // Intrinsically motivated iterations
6              Sample a goal g based on the space represented by R
7              Choose the θ from the history H whose outcome was closest to g
8              Mutate θ by a random perturbation
9        Perform an experiment with θ and observe the outcome o
10       Append (θ, R(o)) to the history H
11 end for
Algorithm 4 IMGEP-PGL
IMGEP-OGL (online learned goal space):

IMGEP (Algorithm 1 in the main paper) that trains the VAE which defines the goal space during the exploration. The VAE is trained on Lenia patterns discovered by the algorithm. Every 100 explorations the VAE model is trained for 40 epochs, resulting in 2000 epochs in total (fewer if there is not enough data to start the training after the first runs).

Importance sampling is used to give the patterns in the training dataset different weights during the training. A weighted random sampler is used that samples newly discovered patterns from the training dataset half of the time. Each pattern that has been added to the training dataset during the last period of 100 explorations has a probability of 0.5/N of being sampled, where N is the total number of new patterns in the dataset. Older patterns are also sampled half of the time, each with probability 0.5/M, where M is the number of older patterns. As a result, newly discovered patterns have a higher weight and a stronger influence on the training of the VAE model.
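A sketch of such a sampler with PyTorch's `WeightedRandomSampler`, assuming the newly discovered patterns are appended at the end of the dataset; the function name is a hypothetical helper.

import torch
from torch.utils.data import WeightedRandomSampler

def make_sampler(n_total: int, n_new: int) -> WeightedRandomSampler:
    # Each of the n_new newly discovered patterns gets weight 0.5/n_new
    # and each older pattern 0.5/(n_total - n_new), so both groups are
    # drawn half of the time on average.
    n_old = n_total - n_new
    weights = torch.empty(n_total)
    weights[:n_old] = 0.5 / max(n_old, 1)
    weights[n_old:] = 0.5 / max(n_new, 1)
    return WeightedRandomSampler(weights, num_samples=n_total, replacement=True)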

The datasets were constructed incrementally during the exploration by gathering non-dead patterns. One pattern out of every ten is added to the validation set (10%) and the rest is used in the training set. At the first period of training, the training dataset amounts to approximately 50 patterns; at the last period it amounts to approximately 3425 patterns (Fig. 16). The validation dataset only serves monitoring purposes and has no influence on the learned goal space.

During the intrinsically motivated iterations, goals are uniformly sampled in a fixed hypercube around the origin.

Figure 16: The IMGEP-OGL collects animal and non-animal patterns during the exploration and adds them to its dataset for the training of the VAE. The figure shows the development of the dataset size averaged over all repetitions (n = 10) of the IMGEP-OGL algorithm with a β-VAE. The standard deviation is depicted as a shaded area, but is in places not visible because it is too small.

E.4 Results

We compared the different IMGEP-RGS variants, as well as the different objective variants for the IMGEPs with learned goal spaces (PGL and OGL), on the basis of the diversity of their identified patterns. Furthermore, the pattern reconstruction ability of the VAEs is analyzed.

E.4.1 Diversity

The algorithms are compared by their diversity in the analytic parameter and behavior space (Section C). Diversity is measured by the number of discretized bins that were explored by the algorithms in each space when each dimension of the space is separated into 7 bins.

All the IMGEPs with learned goal spaces reached a higher diversity in the analytic behavior space compared to random explorations (Fig. 17, b), although random explorations have a higher diversity in the analytic parameter space (Fig. 17, a). This result further confirms the advantage of IMGEPs over random explorations in terms of identifying diverse patterns.

Furthermore, all the IMGEPs with learned goal spaces outperformed the IMGEP with a random goal space. This result shows that learning relevant pattern features, combined with an effective exploration process, is key to discovering a high diversity of patterns.

There are no significant differences between the Xavier and Kaiming IMGEP-RGS variants. Both seem to present a higher variance than the Pytorch variant and therefore reach a higher average performance, but it is unclear why. Because the Xavier initialization performed slightly best for IMGEP-RGS, it was used for the results in the main paper and in Section F.

The differences between the PGL and OGL variants were small for all diversity measures. The OGL showed a slight advantage over the PGL versions in all diversity measures. Thus, an online version of the IMGEP can learn an appropriate goal space during the exploration. A precollected dataset, as used for the PGL, is not necessary to successfully use IMGEPs.

The differences between the VAE objective variants (VAE, β-VAE and augmented β-VAE) were very small. The β-VAE was slightly better than the other two variants in terms of diversity in the analytic parameter space, for both IMGEP variants. All VAEs seemed to learn similar features for our datasets. The different VAE variants might show different behaviors if their hyper-parameters, such as β, are fine-tuned, but this was out of the scope of this paper. Because the β-VAE objective performed slightly best for IMGEP-PGL and IMGEP-OGL, it was used for the results in the main paper and in Section F.

(a) Diversity in the Analytic Parameter Space (b) Diversity in the Analytic Behavior Space (c) Diversity of Animals in the Analytic Behavior Space (d) Diversity of Non-Animals in the Analytic Behavior Space

Figure 17: The different VAE algorithms showed only small differences in terms of diversity. The β-VAE had a slightly better diversity in the analytic behavior space for both IMGEP variants (PGL and OGL). Each dot beside the boxplot shows the diversity of found patterns for one repetition (n = 10). The box ranges from the upper to the lower quartile. The whiskers represent the upper and lower fence. The mean is indicated by the dashed line and the median by the solid line.

E.4.2 VAE Pattern Reconstruction

All learned VAE variants showed similar learning curves on the precollected dataset and the online collected dataset (Fig. 18 and 20). Their ability to reconstruct patterns from the encoded latent representation is also qualitatively similar. For both datasets the VAEs are able to learn the general form of the activity patterns (Fig. 19 and 21). Nonetheless, the compression of the images to an 8-dimensional vector results in a general blurriness of the reconstructed patterns. As a result, the VAEs are not able to encode finer details and textures of patterns (Fig. 22). We believe this is the reason why they identify more animals than the random exploration or the IMGEP-RGS and IMGEP-HGS. Different animals often have a different form, whereas non-animals often span the whole area of Lenia's grid and differ mainly in their textures and small details. Because the VAEs seem to encode mostly the general form, a goal space based on them is more appropriate for finding patterns with different forms, such as animals, than patterns with different textures, which are important for non-animals.

PGL - VAE    PGL - β-VAE    PGL - β-VAE aug
Figure 18: Averaged learning curves (n = 10) of the VAEs for the IMGEP-PGL experiments.

Reconstruction Examples of the β-VAE used for the IMGEP-PGL

Figure 19: Examples of patterns (left) and their reconstructed output (right) by a VAE network used for the IMGEP-PGL. The patterns are sampled from its validation dataset. The dataset is composed of half animal patterns (rows 1 and 2) and half randomly generated CPPN patterns (rows 3 and 4).
OGL - VAE    OGL - β-VAE    OGL - β-VAE aug
Figure 20: Averaged learning curves (n = 10) of the VAEs for the IMGEP-OGL experiments.

Reconstruction Examples of the β-VAE used for the IMGEP-OGL

Figure 21: Examples of patterns (left) and their reconstructed output (right) by a VAE network used for the IMGEP-OGL. The patterns are sampled from its validation dataset. Animal patterns (rows 1 and 2) and non-animal patterns (rows 3 and 4) are shown.

Reconstruction Examples of Non-Animals with Textures

Figure 22: Examples of “textured” patterns that the VAE networks are unable to reconstruct. While a human eye can differentiate the input patterns (spatial frequency, orientation, etc.), the VAE reconstructs all images identically.

Appendix F Additional Results

This section lists additional results. The results are only for a subset of all algorithm variants that have been evaluated. The results correspond to the following algorithms: Random refers to Random Initialization (Section D), IMGEP-RGS to IMGEP-RGS Xavier (Section E), IMGEP-HGS to IMGEP-HGS 9 (Section D), IMGEP-PGL to IMGEP-PGL with a β-VAE (Section E) and IMGEP-OGL to IMGEP-OGL with a β-VAE (Section E).

F.1 Number of Identified Patterns

The main paper used the measure of diversity of the found patterns per algorithm to compare their performance. Another measure to compare the algorithms is the number of the patterns they identified for each of the three pattern classes: dead, animals, non-animals (Fig. 23).

The results deviate slightly from the diversity measures. In terms of identified non-dead patterns, all IMGEP approaches outperform a random exploration by finding 10 to 20% more patterns. Although the IMGEP-RGS and IMGEP-HGS find more non-dead patterns than the IMGEPs with learned goal spaces (OGL, PGL), their overall diversity in the analytic behavior space is smaller (Fig. 3, b of the main paper).

In the case of animal patterns, all IMGEP approaches outperform the random exploration (8%). Within the IMGEP approaches, the online learned goal space approach (IMGEP-OGL: 34%) and the pretrained goal space approach (IMGEP-PGL: 35%) find a similar amount. The hand-defined goal space approach identified fewer animal patterns (IMGEP-HGS: 19%). The random goal space approach finds the least (IMGEP-RGS: 10%). For non-animal patterns, the random goal space approach identifies the most patterns (IMGEP-RGS: 79%), followed by the hand-defined approach (IMGEP-HGS: 67%), the random exploration (56%) and both learned goal space approaches (IMGEP-OGL: 45%, IMGEP-PGL: 43%). Although the number of identified non-animal patterns for the learned goal space approaches is low, their diversity is higher than for a random exploration and the random goal space approach, and only slightly lower than for the hand-defined goal space approach (Fig. 3, d of the main paper).

Percentage of Identified Patterns per Class

Figure 23: IMGEPs found fewer dead patterns compared to the random exploration. In terms of animals, the learned goal space approaches (IMGEP-PGL and OGL) found the most animal patterns. For non-animals, the random goal space (IMGEP-RGS) found the most patterns, followed by the hand-defined goal space (IMGEP-HGS). The plot illustrates the percentage of found patterns for each class. Each dot beside the boxplot shows the percentage of found patterns for one repetition (n = 10). The box ranges from the upper to the lower quartile. The whiskers represent the upper and lower fence. The mean is indicated by the dashed line and the median by the solid line.

F.2 Dependence of the Diversity Measure on the Number of Bins per Dimension

The diversity of identified patterns measures the spread of the area that the identified patterns cover in the analytic behavior space (Fig. 3 in the main paper). The measure is defined by dividing the space into a number of discrete areas or bins (Section C.1). The diversity is then measured by how many bins are covered during an exploration. The bins are created by dividing each dimension of the space into a number of equally sized bins. We analyzed how the number of bins per dimension influences the diversity measure (Fig. 24).

Although the diversity difference between the algorithms depends on the number of bins per dimension for each space, the order of the algorithms, i.e. which algorithm has a higher diversity, is generally constant. Only if the number of bins per dimension grows large (>10) does the order of the algorithms change for some spaces and pattern subsets. The order then starts to follow the order seen for the number of identified patterns (compare the diversity with 25 bins per dimension in Fig. 24 with the number of identified patterns in Fig. 23). In this case the discretization of the space becomes too fine and each pattern falls into its own discretized area. We therefore chose a smaller number of 7 bins per dimension (including the out-of-border bins) for all other diversity plots in the main paper and the Supplementary Material, to compare the algorithms in a meaningful way.

(a) Diversity in Parameter Space (b) Diversity in Behavior Space
(c) Behavior Space Diversity for Animals (d) Behavior Space Diversity for Non-Animals
Figure 24: Dependence of the diversity measure on the number of bins per dimension for (a) the analytic parameter space and (b-d) the behavior space. Depicted is the average diversity (n = 10) with the standard deviation as a shaded area (in places not visible because it is too small).

F.3 Dimension Reduction of the Analytic Parameter and Behavior Space

A two-dimensional reduction of the identified patterns in the analytic parameter and behavior space (Section C) visualizes the diversity of the parameters and identified patterns. The dimension reduction of the parameter space is based on all explored parameters encoded in the analytic parameter space from the first repetition experiment of all 4 algorithms. All encoded points were normalized so that the overall minimum value became 0 and the maximum value 1 for each dimension. Afterwards a principal component analysis (PCA) was performed to detect the 2 principal components (Jolliffe1986). The found patterns for each algorithm are plotted according to these components.
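A minimal sketch of this normalization and projection using scikit-learn; the function name is a hypothetical helper.

import numpy as np
from sklearn.decomposition import PCA

def project_2d(points: np.ndarray) -> np.ndarray:
    # Normalize each dimension to [0, 1], then project onto the first
    # two principal components.
    mins, maxs = points.min(axis=0), points.max(axis=0)
    normalized = (points - mins) / np.maximum(maxs - mins, 1e-12)
    return PCA(n_components=2).fit_transform(normalized)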

(Panels: Random, IMGEP-RGS, IMGEP-HGS, IMGEP-PGL, IMGEP-OGL)
Figure 25: A random exploration covers the analytic parameter space more uniformly than the IMGEP algorithms, which form clusters in certain areas. PCA dimension reduction of the analytic parameter space, illustrating all parameters explored in the first repetition experiment of each algorithm.
(Panels: Random, IMGEP-RGS, IMGEP-HGS, IMGEP-PGL, IMGEP-OGL)
Figure 26: In the analytic behavior space IMGEPs reach a higher diversity compared to a random exploration, except IMGEP-RGS, which covers similar areas. The HGS approach explores more non-animal areas and the PGL and OGL more animal areas. PCA dimension reduction of the analytic behavior space, illustrating all patterns identified in the first repetition experiment of each algorithm.

The results show that the random exploration has a more uniform distribution in the analytic parameter space than any of the IMGEP algorithms (Fig. 25). The IMGEP algorithms show concentrations of explorations in specific regions of the parameter space. The visualization also shows that it is not possible to define distinct regions in the parameter space that differentiate between dead, animal and non-animal patterns.

The same analysis was performed for the identified patterns of each algorithm encoded in the analytic behavior space (Fig. 26). It is visible that the random exploration and the random goal space (RGS) approaches are more concentrated than the hand-defined (HGS) or learned goal space approaches (PGL and OGL), especially in a region with many non-animal patterns (north-west). The IMGEP-RGS has a similar spread to the random exploration, with an even higher concentration of non-animals (north-west). The IMGEP-HGS has a wider spread in the non-animal area. The IMGEPs with a learned goal space (PGL and OGL) show a stronger concentration in an area that encodes mostly animals (south).

F.4 Identified Patterns

Fig. 27, 28, 29, 30 and 31 illustrate examples of identified patterns per class (animal, non-animal, dead) and their ratio for the random exploration, IMGEP-RGS, IMGEP-HGS, IMGEP-PGL and IMGEP-OGL. The patterns were randomly sampled from the results of the first exploration repetition experiment of each algorithm.

Random Exploration

Figure 27: Examples of identified patterns for the random exploration algorithm from the first repetition of experiments.

IMGEP-RGS

Figure 28: Examples of identified patterns for the IMGEP-RGS algorithm from the first repetition of experiments.

IMGEP-HGS

Figure 29: Examples of identified patterns for the IMGEP-HGS algorithm from the first repetition of experiments.

IMGEP-PGL

Figure 30: Examples of identified patterns for the IMGEP-PGL algorithm from the first repetition of experiments.

IMGEP-OGL

Figure 31: Examples of identified patterns for the IMGEP-OGL algorithm from the first repetition of experiments.

F.5 Visualization of Goal Spaces

The goal space of IMGEPs is their most important element because it defines which types of patterns are set as goals for the exploration. This section provides complementary material for the analysis made in Section 5.2 of the main paper and shows further visualizations of goal spaces. The goal spaces of all IMGEP algorithms are visualized via a two-dimensional reduction of each goal space. Two dimensionality reduction techniques were applied: PCA (Jolliffe1986) and t-Distributed Stochastic Neighbor Embedding (t-SNE) (maaten2008visualizing).

The visualization was constructed by taking, for each exploration algorithm, its goal space representations of all patterns it explored in a single repetition experiment. All goal representations were normalized so that the overall minimum value became 0 and the maximum value 1 for each goal space dimension. Afterwards, PCA was performed to detect the 2 principal components. t-SNE was executed using the standard Euclidean distance metric and default hyper-parameters, with the perplexity set to 50.
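A sketch of the t-SNE step with scikit-learn, using placeholder data in place of the actual goal representations.

import numpy as np
from sklearn.manifold import TSNE

# goal_reps stands in for the normalized goal representations of all
# patterns explored in one repetition (one row per pattern).
goal_reps = np.random.rand(500, 8)   # placeholder data for illustration
embedding = TSNE(n_components=2, metric="euclidean", perplexity=50).fit_transform(goal_reps)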

The resulting two-dimensional visualizations of the goal spaces make the differences between the algorithms visible (Fig. 32). For both approaches (PCA, t-SNE), the random goal space (RGS) and the hand-defined goal space (HGS) have only a small area and few clusters for animal patterns. In contrast, the learned goal spaces based on β-VAEs (PGL and OGL) have larger areas and more clusters for animal patterns. As a result, the learned goal spaces explore more animal patterns and find a higher diversity of them (Fig. 3, c) compared to the hand-defined and random goal spaces. The reason for this effect seems to be that the β-VAE which defines the goal space for the PGL and OGL learns to represent the shape of patterns, which is an important feature of animals. Non-animals, in contrast, often cover the whole Lenia grid and differ mainly in their textures, which the β-VAE does not represent well (Section E).

The visualization serves as a support in qualitatively evaluating and comparing the efficiency of each algorithm in extracting a diversity of patterns from the data. Integrated into an interactive interface, these graphs are also useful for a potential human end-user to easily explore and visualize the different type of found patterns during the exploration phase. Videos and demonstrations of the interface can be found on the website https://automated-discovery.github.io/.

(Rows: RGS, HGS, PGL, OGL; columns: PCA, t-SNE)
Figure 32: PCA and t-SNE visualizations of the goal spaces for the IMGEP variants show that HGS has more area (PCA) and more clusters (t-SNE) for non-animals compared to the learned goal spaces (PGL and OGL), and vice versa for animals. t-SNE shows that the hand-defined goal space (HGS) and the learned goal spaces (PGL and OGL) structure and cluster the discovered patterns more than the random goal space (RGS).