Evolution of Plastic Learning in Spiking Networks via Memristive Connections
This article presents a spiking neuroevolutionary system which implements memristors as plastic connections, i.e. connections whose weights can vary during a trial. The evolutionary design process exploits parameter self-adaptation and variable topologies, allowing the number of neurons, connection weights, and inter-neural connectivity pattern to emerge. By comparing two phenomenological real-world memristor implementations with networks comprised of (i) linear resistors and (ii) constant-valued connections, we demonstrate that this approach allows networks of appropriate complexity to evolve whilst exploiting the memristive properties of the connections to reduce learning time. We extend this approach to allow for heterogeneous mixtures of memristors within the networks, and provide an in-depth analysis of network structure. Our networks are evaluated on simulated robotic navigation tasks; results demonstrate that memristive plasticity enables higher performance than constant-weighted connections in both static and dynamic reward scenarios, and that mixtures of memristive elements provide performance advantages when compared to homogeneous memristive networks.
Memristors, Genetic Algorithms, Neurocontrollers, Hebbian Theory
Gerard Howard, Larry Bull and Andrew Adamatzky are with the department of Computer Science and Creative Technologies, University of the West of England, Bristol BS16 1QY, UK, contact e-mail: firstname.lastname@example.org.
Ella Gale and Ben de Lacy Costello are with the department of Chemistry, University of the West of England, Bristol BS16 1QY, UK.
The field concerned with nanoscale brainlike information processing is known as Neuromorphic Computation (NC). NC is a new way of computing in hardware that blurs the distinction between processor and memory, as both may be distributed at any spatial position in the architecture. Neuron-like units, such as Complementary Metal-Oxide Semiconductor (CMOS) neurons, are densely interconnected by numerous adaptive synapses that communicate via transmission of spikes. Neuromorphic architectures are yet to be physically realised, but are envisioned to incorporate many attractive characteristics, including redundancy, self-organisation, adaptation, and learning.
NC has recently become more viable thanks to the manufacture of the memristor (memory resistor) at HP labs. A memristor is a fundamental passive two-terminal circuit element whose state (memristance) is both nonvolatile and dependent upon past activity. Nonvolatile memory is well suited to low-power storage, and the device's dynamic internal state facilitates information processing. These properties make the memristor ideal for use as a nanoscale synapse in NC architectures. A proposed approach to realise learning in NC involves harnessing Hebbian principles to implement Spike Time Dependent Plasticity (STDP), allowing connections between a presynaptic and postsynaptic neuron to alter efficacy dependent on the spike timings of those neurons.
It has been reasoned that, much like the brain, different areas of NC architectures could be responsible for different activities. To this end, we focus on the evolution of self-organizing small-scale Spiking Neural Networks (SNNs), where each network can be conceptualised as being representative of part of a larger NC architecture. We employ a model of neuroevolution whereby each network in the population initially comprises a number of hidden layer neurons, connected to a problem-dependent, fixed number of input and output neurons. The evolutionary process can then alter network topology as part of the Genetic Algorithm (GA).
In this study, we initially compare two phenomenological memristor implementations, Hewlett-Packard (HP)-like and Polyethylene Oxide - Polyaniline (PEO-PANI)-like, and analyse their computational properties when cast as synaptic connections in evolutionary SNNs. The memristive element of the network is designed to allow the weight of the connections to vary during a trial, providing a learning architecture which may be beneficial to the evolutionary design process. We then allow for the evolution of heterogeneous memristive networks (e.g. those containing all memristor types), and investigate whether such mixtures give an inherent performance advantage when compared to their homogeneous counterparts. As the equations used to govern the memristors are based on physical devices, the evolved networks represent possible behaviours of partial NC architectures (e.g. in the context of evolvable hardware). Performance is evaluated on simulated robotic navigation tasks.
Our initial hypothesis is that memristive synapses provide the networks with increased performance. To test this hypothesis, we compare the homogeneous memristive networks (HP and PEO-PANI) to networks solely comprised of linear resistors and constant-weighted elements. Extending the hypothesis to heterogeneous networks, we seek to confirm that varied memristive behaviours can be harnessed by the evolutionary design process to provide further advantages, specifically that (i) certain functionality can be more easily achieved by certain memristor types and (ii) combinations of memristor types are beneficial to the networks. In particular, we aim to answer the following research questions:
Does the evolutionary process allow for the successful generation of memristive networks that outperform constant-valued connections, despite the memristors' nonlinearity and given the potential for complex interactions within memristive networks?
In the heterogeneous case, do mixtures of memristors provide better performance than other implementations? How do such networks generate useful behaviour?
Is there an evolutionary preference in assigning specific roles to specific memristor types based on variations in their memristive behaviours?
The remainder of the article is ordered as follows: Section II introduces background research. Section III introduces the system. Section IV details the spiking implementation. Section V outlines memristor implementations. Section VI details the GA. Section VII details the network topology mechanisms. Section VIII describes the environment. Section IX details the experimental setup. Sections X, XI, and XII analyse the results of the experiments that were carried out and highlight the main differences between the memristor models. Section XIII provides a summary.
2.1 Spiking Networks and Evolutionary Spiking Networks
SNNs present a biologically plausible phenomenological model of neural activity in the brain. In a SNN, neurons are linked via unidirectional, weighted connections that act as communication carriers. When two neurons (A and B) are connected, neuron A is either (a) presynaptic to neuron B (the connection is directed from neuron A to neuron B) or (b) postsynaptic to neuron B, if the connection is directed from neuron B to neuron A.
The medium of communication is the action potential, or spike, which is emitted from a presynaptic neuron and received by all connected postsynaptic neurons. Each neuron has an internal state, known as “membrane potential”, which is influenced by spike reception but decreases over time. Spikes are emitted from a given neuron after this state surpasses a certain level of excitation (received either from the environment or from presynaptic neurons). This time-dependent build-up of membrane potentials and release of spikes is able to produce dynamic activation patterns through time.
The earliest equations that describe SNNs were formulated by Lapicque in 1907. Two popular formal SNN implementations are the Leaky Integrate and Fire (LIF) model (which is derived from ) and the Spike Response Model (SRM). The main justifications for using SNNs are (i) increased utility when compared to other network models, e.g. the MLP, as shown in , and (ii) current NC research focussing on spiking neurons as a basis of communication due to low power requirements and the ability to harness spikes as a learning mechanism (e.g. ).
Application of evolutionary techniques to neural networks involves the use of a GA to alter connection weights, network topology, connectivity patterns, or combinations of the above. A survey of various methods for evolving both weights and architectures in neural networks is presented in . Neuro-evolution was first applied to LIF SNNs to evolve networks that produce temporally-dependent outputs  and SRM spiking networks were first evolved for a vision-based navigation task .
As this paper addresses robotics tasks, a short overview of spiking neuroevolutionary robotics follows. Nolfi and Floreano provide a review. SNN circuits were evolved to model abstractions of biological retina in , where the system was applied to a robotics platform. An LIF spiking model is used in , again with the goal of evolving navigation behaviours. A similar spiking model is applied to a simple robotic navigation task; the authors conclude that the dynamics of a SNN provide further degrees of problem-solving freedom given temporally-sensitive problems. A recent hardware implementation is given in .
Memristors (memory-resistors) are the fourth fundamental circuit element, joining the capacitor, inductor and resistor. A memristor can be defined as a resistor whose instantaneous resistance value (a) depends on all charge that has passed through it and (b) is nonvolatile. Formally, a memristor is a passive two-terminal electronic device that is described by the non-linear relationship between the device terminal voltage, v, and terminal current, i, as shown in (1). Nonlinearity arises because the instantaneous memristance, M, depends on the charge q (2), where φ is the time integral of voltage, or magnetic flux.
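The relations referred to as (1) and (2) are not reproduced in this text. As a reminder, and assuming the standard charge-controlled formulation due to Chua (with v the terminal voltage, i the terminal current, q the charge, φ the flux and M the memristance), they are usually written as follows:

```latex
\begin{align}
v(t) &= M\big(q(t)\big)\, i(t) \tag{1}\\
M(q) &= \frac{\mathrm{d}\varphi(q)}{\mathrm{d}q} \tag{2}
\end{align}
```

When M is constant the first relation reduces to Ohm's law, which is why the dependence of M on q is the source of the memory effect.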
The memristor was theoretically characterized and named by Chua in 1971. Memristive systems have recently enjoyed a resurgence of interest from the research community after being manufactured by HP labs. This has spawned a number of research avenues in terms of applications of memristive systems, and synthesis of various other memristors.
There are many reasons to think that memristors might be useful in NC. Primarily, memristors can be manufactured at the required scale and implement synaptic behaviour in hardware . HP memristors  have been used in the manufacture of nanoscale neural crossbars , which have been applied to pattern recognition circuits . Silver Silicide memristors have been shown to function in neural architectures . Memristor theory has also been used to model learning in amoeba . In particular,  highlights the attractive prospect of applying evolutionary computation techniques directly to memristive hardware, as memristors can simultaneously perform the functions of both processor and memory. This work will focus on the use of the memristor as an adaptive synapse.
Hebbian learning is thought to account for synaptic adaptation and learning in the brain. Briefly, Hebbian learning states that “neurons that fire together, wire together” - in other words, in the event that a presynaptic neuron causes a postsynaptic neuron to fire, the synaptic strength between those two neurons is increased so that such an event is more likely to happen in the future. This mechanism allows for self-organising, correlated activation patterns and is therefore of particular relevance when considering learning in neural systems.
Spike Time Dependent Plasticity (STDP) was originally formulated as a way of implementing Hebbian learning within computer-based neural networks. Interestingly, the STDP equation has been found to have distinct similarities to the reality of Hebbian learning in biological synapses. It has recently been postulated that a memristance-like mechanism affects synaptic efficacy in biological neural networks, based on similarities between memristive equations and their neural counterparts.
Integration of neuroevolution with neuromodulatory networks is investigated by Soltoggio (e.g., ). In the networks, dedicated modulatory neurons are responsible for affecting the inputs received by traditional neurons. Heterogeneous modulation rules are available to the networks, although unlike memristors they have no direct hardware analogue. The networks are tested on agent navigation tasks and robot controllers , both with promising results. A probabilistic SNN model is investigated  whereby the probability of spike transmission across a synapse is affected by Hebbian learning rules; results demonstrate the power of plasticity in generating varied behaviour. Floreano and Urzelai  evolve Discrete Time Recurrent Neural Networks  where synapses are affected by four versions of the Hebb rule during the lifetime of the agent as it solves a navigation task.
Memristive STDP has been implemented in , with all four papers using spike-coincidence based STDP as a learning mechanism. Also consistent between the papers is the use of a “two-part spike”, which a SNN neuron uses to pass information to both presynaptic and postsynaptic neurons. The temporal coincidence between presynaptic and postsynaptic spikes at a memristive synapse alters the voltage across that synapse; if a threshold voltage is surpassed, the synapse's weight is altered. The main difference between  and  is that in , the two-part spike is implemented as a discrete-time stepwise waveform approximation, whereas  use values calculated from continuous waveform equations, allowing them to operate in continuous time.
In summary, this literature review has highlighted the previous success of STDP as a neural learning mechanism, and shown memristors to be an ideal medium for the implementation of STDP, especially when coupled with SNNs. The relevance of a robotic navigation task in the context of neuroevolution is also shown.
The system presented here consists of a population of SNNs which are evaluated on a robotics test problem, and altered via GA operation which is detailed in section VI. To introduce the terminology to be used throughout this paper: each experiment lasted for 1000 evolutionary generations; each generation involved new networks in the population being evaluated on the test problem (a trial). Each trial consisted of a number of timesteps, which began with the reading of sensory information and calculation of action, and ended with the agent performing that action. Every timestep consisted of 21 steps of SNN processing, at the end of which the action was calculated.
4 Spiking Network Implementation
We base our spiking implementation on the LIF model. Neurons can be stimulated either by an external current or by connections from presynaptic neurons; recurrency and direct input - output connections are illegal. Each neuron has a membrane potential, , where 0, which slowly degrades over time according to (3). As spikes are received by the neuron, the value of is increased in the case of an excitatory spike, or decreased if the spike is inhibitory. If surpasses a positive threshold, , the neuron spikes and transmits an action potential to every neuron to which it is presynaptic, with strength relative to the efficacy of the synapse that connects them. The neuron then resets its membrane potential to some low number. At time , the membrane potential of a neuron is given in (3); the reset equation is given in (4).
Here, is the membrane potential at time , is the input current to the neuron, is a positive constant, is the degradation (leak) constant and is the reset membrane potential of the neuron. The networks are arranged into three layers: input (which receives sensory information), hidden (a variable-size layer), and output (where motor-actions are calculated). Example architectures can be seen in Figs. 10(d), 11 and 12(d). A model of temporal delays is used so that, in the single hidden layer only, a spike sent between two neurons is received steps after it is sent (see (5)), is the index of the sending neuron and is the index of the receiving neuron (indexing is sequential).
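The neuron update described above can be sketched as follows. Equations (3) and (4) are not reproduced in this text, so the update form (potential plus input current plus a positive constant, minus a leak proportional to the potential), the clamp at zero, the symbol names `a`, `b`, `c`, `theta` and all numeric values are assumptions made for illustration only.

```python
from dataclasses import dataclass


@dataclass
class LIFNeuron:
    """Minimal discrete-time leaky integrate-and-fire neuron (illustrative)."""
    a: float = 0.3       # assumed positive constant added every step
    b: float = 0.05      # assumed leak (degradation) constant
    c: float = 0.0       # assumed reset membrane potential
    theta: float = 1.0   # assumed firing threshold
    y: float = 0.0       # membrane potential

    def step(self, current: float) -> bool:
        """Advance one step; return True if the neuron spikes."""
        # Assumed form: y(t+1) = y(t) + (I + a - b * y(t)), kept non-negative.
        self.y = max(0.0, self.y + (current + self.a - self.b * self.y))
        if self.y > self.theta:
            self.y = self.c  # reset after emitting a spike
            return True
        return False


neuron = LIFNeuron()
spikes = [neuron.step(0.0) for _ in range(4)]
```

With these illustrative values the potential builds up over several steps before crossing the threshold, which is the time-dependent behaviour the model relies on.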
Action calculation involves the current input state being repeatedly processed 21 times by each network (experimentally determined to allow sufficient STDP to occur during the lifetime of the agent). For the purposes of this paper, each network was initialised with 6 input neurons (used to pass sensor values to the network), 9 hidden neurons, and 2 output neurons that are used to calculate the action. Each output neuron had an activation window that recorded the number of spikes produced by that neuron over the last 21 steps. We classified the spike trains at the two output neurons as being either low or high activated (see (6)).
Here, is the number of spikes in the window and is the window size. The combined spike trains at the two output neurons translated to a discrete movement according to the output activation strengths. See Section 8.1 for precise details of sensory state generation and possible actions.
We are primarily interested in implementing memristors as a form of variably-weighted connection between neurons in our SNNs, where variable weight indicates that connection efficacy can alter during a trial. The behaviour of each memristor under STDP depends on the memristive equations used; these are defined in subsections 1 and 2. The linear resistor is described in subsection 3. It is important to note that, as the calculated resistance values of memristive connections are based on their real-world counterparts, simulation results should be replicable in hardware. Constant-valued connections do not alter weight during a trial; rather, their weights are altered between trials via the GA. The HP memristor was chosen for study as it is well understood. The PEO-PANI memristor was chosen as it is also well understood, but more importantly has a strongly different memristance profile (see Figure 1(a)), allowing the potential for contrasting behaviour.
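A small sketch of the output classification and its translation to an action is given below. The mapping from (high, low) pairs to movements follows the description in Section 8.1; the cut-off of 0.5 on the spike fraction is a hypothetical placeholder, since the threshold in (6) is not reproduced in this text, and the function names are illustrative.

```python
WINDOW = 21  # steps of SNN processing per timestep

# Mapping from the two output classifications to a discrete movement.
ACTIONS = {
    ("high", "high"): "forward",
    ("low", "low"): "forward",
    ("high", "low"): "left",
    ("low", "high"): "right",
}


def classify(spike_count: int, window: int = WINDOW, cutoff: float = 0.5) -> str:
    """Label an output neuron by the fraction of steps on which it spiked.

    The cutoff is an assumed value, not taken from the paper.
    """
    return "high" if spike_count / window > cutoff else "low"


def select_action(spikes_out0: int, spikes_out1: int) -> str:
    """Combine both output classifications into one action."""
    return ACTIONS[(classify(spikes_out0), classify(spikes_out1))]
```

For example, `select_action(15, 3)` classifies the first output as high and the second as low, giving a left turn.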
The HP memristor is comprised of thin-film Titanium Dioxide (TiO2) and oxygen-depleted Titanium Oxide (TiO2-x). The boundary between the two compounds moves in response to the charge on the memristor, which in turn alters the resistance of the device. More details can be found in . Memristance is defined in (7):
is the resistance of the TiO2 and is the resistance of the TiO2-x. is a parameter comprising both the thickness of the device and the mobility of the oxygen vacancies in TiO2 and TiO2-x, and () (see (8)) is the charge on the device.
One further parameter is , which is the amount of time it takes the memristor to alter from to . Once is calculated, the weight of the memristive connection can be set as ; weight is therefore equal to the inverse memristance of the device.
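As an illustration of how a charge-dependent memristance can be turned into a connection weight, the sketch below assumes the widely used linear-drift form of the HP model, M(q) = R_off (1 - R_on q / beta), with beta = D^2 / mu_v. Equation (7) is not reproduced in this text, so this form, the names `R_ON`, `R_OFF`, `D`, `MU_V`, `BETA` and all numeric values are assumptions rather than the settings used in the experiments; any rescaling of the weight to a fixed range is also left out.

```python
R_ON = 100.0       # assumed low-resistance limit (ohms)
R_OFF = 16_000.0   # assumed high-resistance limit (ohms)
D = 10e-9          # assumed device thickness (metres)
MU_V = 1e-14       # assumed oxygen-vacancy mobility (m^2 / (V s))
BETA = D ** 2 / MU_V


def hp_memristance(q: float) -> float:
    """Memristance for charge q under the assumed linear-drift model."""
    m = R_OFF * (1.0 - (R_ON / BETA) * q)
    # Keep the value within the physical limits of the device.
    return min(R_OFF, max(R_ON, m))


def hp_weight(q: float) -> float:
    """Connection weight taken as the inverse memristance."""
    return 1.0 / hp_memristance(q)
```

Under these assumptions the weight rises as charge accumulates, and saturates once the memristance reaches its lower limit.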
The PEO-PANI memristor consists of layers of PANI, onto which Lithium ion (Li)-doped PEO is added . We have phenomenologically recreated the memristance profile of the PEO-PANI memristor, resulting in behaviour similar to that seen in . The equation for is identical to that of the HP memristor (7); the weighting equation is given in (9):
In this study the term “linear resistor” refers to a theoretical nonvolatile-memory-augmented device that describes a linear relation between and . The linear resistor alters by , therefore it takes positive STDP events to increase from to .
As mentioned in Section 2.3, a number of STDP implementations exist. As our SNNs operated in discrete time, we follow  in using discrete-time stepwise waveforms. STDP could affect any variable connection in the networks.
In our implementation, each neuron was augmented with a variable to record the last time it spiked (), which was initially 0. When a neuron spiked, was set to some positive number. At the end of each of the 21 steps that make up a single timestep, each memristive connection was analysed by checking the values of its presynaptic and postsynaptic neurons. If the calculated value exceeded a positive threshold , the memristance of the synapse was altered (see Figure 2, (10), (11)). At the end of each step, each value was decreased by 1 to a minimum of 0, creating a discrete stepwise waveform through time. Each STDP event altered , by , as detailed in (12), which was then used to calculate the synaptic weight. The memory of the system is therefore contained in .
From Fig. 1(a) it can be seen that the amount of change in connection weight depends heavily on the current weight of the connection. In particular, the HP memristor is insensitive to the effects of STDP where , the PEO-PANI-like memristor is insensitive where . The linear resistor displays constant sensitivity. Fig. 1(b) shows the effect of STDP on the weight in the more sensitive areas (for HP where , for PEO-PANI where ), and compares them to the effect of STDP on the linear resistor over the same number of STDP steps. In the case of the HP-like and PEO-PANI-like memristors, it can be seen that memristance can account for a change equal to 90% of the total range of within 90 STDP steps; the same number of STDP steps gives a change in the linear resistor equal to 10% of the total range of .
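One way to realise the stepwise scheme described above is sketched here. Equations (10) to (12) are not reproduced in this text, so the coincidence measure (the sum of the two last-spike values), the rule that the more recently active neuron decides the direction of change, and the constants `THETA_LS`, `DELTA_Q` and the state bounds are all assumptions made for illustration.

```python
THETA_LS = 4    # hypothetical coincidence threshold
DELTA_Q = 1.0   # hypothetical change in stored state per STDP event


def stdp_direction(pre_ls: int, post_ls: int, threshold: int = THETA_LS) -> int:
    """Return +1 (potentiate), -1 (depress) or 0 (no event).

    Assumed rule: an event occurs when the summed last-spike values exceed
    the threshold; post firing more recently than pre strengthens the synapse.
    """
    if pre_ls + post_ls <= threshold or pre_ls == post_ls:
        return 0
    return 1 if post_ls > pre_ls else -1


def decay(ls: int) -> int:
    """Decrease a last-spike value by 1 per step, to a minimum of 0."""
    return max(0, ls - 1)


def apply_stdp(q: float, pre_ls: int, post_ls: int,
               q_min: float = 0.0, q_max: float = 100.0) -> float:
    """Update the stored state of one synapse and keep it within bounds."""
    q += DELTA_Q * stdp_direction(pre_ls, post_ls)
    return min(q_max, max(q_min, q))
```

The updated state would then be passed to the relevant weighting equation to obtain the new synaptic weight.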
In our GA, two parents are selected fitness-proportionately, mutated, and used to create two offspring. We use only mutation to explore connection weight space; crossover is omitted as sufficient solution space exploration can be obtained via a combination of self-adaptive weight and topology mutations, a view that is reinforced in the literature, e.g. . The offspring are inserted into the population and the two networks with the lowest fitness are deleted. Parents stay in the population, competing with their offspring.
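The steady-state cycle described above can be sketched as follows. The representation of an individual as a (genome, fitness) pair, the function names, and the choice to allow the same parent to be drawn twice are assumptions; fitness values are assumed to be strictly positive so that roulette-wheel selection is well defined.

```python
import random
from typing import Callable, List, Tuple

Individual = Tuple[object, float]  # (genome, fitness)


def roulette_select(population: List[Individual], rng: random.Random) -> Individual:
    """Pick one individual with probability proportional to its fitness."""
    weights = [fitness for _, fitness in population]
    return rng.choices(population, weights=weights, k=1)[0]


def ga_step(population: List[Individual],
            mutate: Callable[[object, random.Random], object],
            evaluate: Callable[[object], float],
            rng: random.Random) -> None:
    """One GA cycle: two parents, two mutated offspring, two deletions."""
    parents = [roulette_select(population, rng) for _ in range(2)]
    for genome, _ in parents:
        child = mutate(genome, rng)           # mutation only, no crossover
        population.append((child, evaluate(child)))
    # Parents compete with offspring; the two least fit networks are removed.
    population.sort(key=lambda ind: ind[1], reverse=True)
    del population[-2:]
```

Because the population size is restored after every cycle, each call evaluates exactly two new networks.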
We utilise self-adaptive mutation rates as in Evolution Strategies (ES), to dynamically control the frequency and magnitude of mutation events taking place in each network. This allows for increased structural stability in highly fit networks whilst allowing less fit networks to search solution space more widely per GA application. Here, the value (originally , the rate of mutation per allele) of each network is initialized randomly uniformly in the range [0,0.25]. During a GA cycle, a parent’s value is modified as in (13); the offspring then adopts this new , and mutates itself by this value, before being inserted into the population. The proportionality constant is set to 1 and therefore omitted.
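A minimal sketch of self-adaptation in the ES style is shown below. Equation (13) is not reproduced in this text; the log-normal update (rate multiplied by the exponential of a standard normal sample, with the proportionality constant set to 1) is the common ES form and is assumed here. The way the rate is then applied to constant-valued weights (each weight changed with probability equal to the rate, by plus or minus the rate, and clamped to [0,1]) is likewise an assumption for illustration.

```python
import math
import random
from typing import List


def self_adapt(rate: float, rng: random.Random) -> float:
    """Assumed update: rate * exp(N(0, 1))."""
    return rate * math.exp(rng.gauss(0.0, 1.0))


def mutate_weights(weights: List[float], rate: float,
                   rng: random.Random) -> List[float]:
    """Mutate constant-valued weights using the self-adapted rate."""
    probability = min(1.0, rate)  # the rate may exceed 1 after adaptation
    mutated = []
    for w in weights:
        if rng.random() < probability:
            w += rng.choice((-1.0, 1.0)) * rate
        mutated.append(min(1.0, max(0.0, w)))  # assumed clamp to [0, 1]
    return mutated


rng = random.Random(42)
initial_rate = rng.uniform(0.0, 0.25)          # initial range from the text
offspring_rate = self_adapt(initial_rate, rng)
offspring_weights = mutate_weights([0.2, 0.8, 0.5], offspring_rate, rng)
```

Because the rate is multiplied rather than added to, it always stays non-negative and can shrink towards zero in networks that benefit from stability.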
Only non-memristive networks can alter their connection weights via the GA. Connection weights in this case are initially set during network creation, node addition, and connection addition randomly uniformly in the range [0,1]. Memristive network connections are always set to 0.5, and cannot be mutated from this value. This forces the memristive networks to harness the plasticity of their connections during a trial to successfully solve the problem.
In addition to self-adaptive mutation, we apply two topology alteration schemes to allow the modification of the spiking networks by adding/removing (i) hidden layer nodes and (ii) neural connections. This framework allows each network to control its own knowledge representation autonomously by adapting network topology to reflect the complexity of the problem considered. All network types use these topology mechanisms. Our self-adaptive topology mechanisms bear some resemblance to Takagi-Sugeno (TS) (neuro-) fuzzy models in that both parameter and self-organized structure learning occur (usually using a recursive least-squares algorithm for parameters and some rule density or utility metric for structure). However, TS systems are commonly used for clustering and use multiple fuzzy rules to define a solution, rather than a single individual as in our case.
Given the nature of NC, it would be useful if appropriate network structure is allowed to develop until some task-dependent required level of computing power is attained. A number of encoding variants have been developed specifically for neuroevolution, including Analog Genetic Encoding (AGE) , which allows for both neurons and connections to be modified, amongst others, e.g. . A popular framework is NeuroEvolution of Augmenting Topologies (NEAT) , which combines neurons from a predetermined number of niches to encourage diverse neural utility and enforce niche-based evolutionary pressure. This method has been shown to be amenable to real-time evolution . Successful applications of neuroevolution range from real-world optimisation  and classification  to control .
In our system, each network has a varying number of hidden layer neurons (initially 9, and always greater than 0). Additional neurons can be added or removed from the single hidden layer based on two new self-adaptive parameters, and . Here, is the probability of performing neuron addition/removal and is the probability of adding a neuron; removal occurs with probability . Both have initial values taken from a random uniform distribution, with ranges [0,0.5] for and [0,1] for . Offspring networks have their parents' and values modified using (13) as with , with neuron addition/removal taking place after mutation. Added nodes are initially excitatory with 50% probability; otherwise they are inhibitory.
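The neuron addition/removal step can be sketched as below. The parameter names `psi` (probability of performing an addition/removal event) and `omega` (probability that the event is an addition) are placeholders for the two self-adaptive parameters, the removed neuron is chosen uniformly at random, and removal is skipped when only one hidden neuron remains; these details are assumptions.

```python
import random
from typing import List


def mutate_hidden_layer(hidden: List[str], psi: float, omega: float,
                        rng: random.Random) -> List[str]:
    """Return a copy of the hidden layer after a possible add/remove event.

    Each hidden neuron is represented only by its type string here.
    """
    hidden = list(hidden)
    if rng.random() < psi:                       # perform an event at all?
        if rng.random() < omega:                 # add a neuron
            kind = "excitatory" if rng.random() < 0.5 else "inhibitory"
            hidden.append(kind)
        elif len(hidden) > 1:                    # otherwise remove one
            hidden.pop(rng.randrange(len(hidden)))
    return hidden


rng = random.Random(1)
layer = ["excitatory"] * 9
grown = mutate_hidden_layer(layer, psi=1.0, omega=1.0, rng=rng)
shrunk = mutate_hidden_layer(layer, psi=1.0, omega=0.0, rng=rng)
```

In a full system the new neuron would also need its connections created, which is handled by the connection mechanism rather than by this function.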
Feature selection is a way of streamlining input to a given process. Automatic feature selection includes wrapper approaches (where feature subsets change during the running of the algorithm) and filter approaches (where the subset selection is a pre-processing step). The connectivity pattern of artificial neural networks was first evolved by Dolan and Dyer. A comparative study can be found in .
In this paper we allow any connection to be individually enabled/disabled. During a GA cycle a connection can be enabled or disabled based on a new self-adaptive parameter (which is initialized and self-adapted in the same manner as and ). If a connection is enabled for a non-memristive network, its connection weight is randomly initialised uniformly in the range [0,1]; memristive connections are always set to 0.5. All connections are initially enabled for new networks. During a node addition event, new connections are set probabilistically, with . Connection selection is particularly important to the memristive networks. As they cannot alter connection weights via the GA, variance induced in network connectivity patterns plays a large role in the generation of useful STDP patterns. In the context of NC, an evolutionary algorithm could conceivably tinker with connection structure as a means of homeostatic fault tolerance and recovery, as well as a compression technique to reduce the number of active synapses.
Our chosen robotics simulator was Webots, a platform that is popular amongst the research community. Alternatives are summarised in . Webots was selected due to the accuracy of its simulations and prevalence of successful applications in the literature. Examples include evolution of simulated Khepera controllers to avoid obstacles, showing the suitability of Webots to an evolutionary approach. Tellez and Angulo apply incremental neuroevolution to successfully generate complex behaviours from initially less complex environments or sensory configurations. Hierarchical neural control is exploited to guide a simulated Khepera around a T-maze using self-organising neural networks similar to our own.
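Connection selection can be sketched as follows. The name `tau` stands in for the self-adaptive connection parameter and is treated as a per-connection probability of toggling, which is an assumption; the weight rules on enabling (uniform in [0,1] for non-memristive networks, fixed at 0.5 for memristive ones) follow the text.

```python
import random
from dataclasses import dataclass
from typing import List


@dataclass
class Connection:
    enabled: bool = True   # all connections start enabled in new networks
    weight: float = 0.5


def mutate_connectivity(connections: List[Connection], tau: float,
                        memristive: bool, rng: random.Random) -> None:
    """Toggle each connection with probability tau (in place)."""
    for conn in connections:
        if rng.random() < tau:
            conn.enabled = not conn.enabled
            if conn.enabled:
                # Re-enabled connections receive a fresh starting weight.
                conn.weight = 0.5 if memristive else rng.uniform(0.0, 1.0)


conns = [Connection(True, 0.5), Connection(False, 0.9)]
mutate_connectivity(conns, tau=1.0, memristive=True, rng=random.Random(0))
```

Since memristive weights are never set by the GA, this toggling is the main route by which evolution shapes the STDP behaviour of those networks.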
The agent was a simulated Khepera II robot with 8 light sensors and 8 IR distance sensors (see Fig. 3(a)). At each timestep (32ms in real time), the agent sampled its light sensors, whose values ranged from 8 (fully illuminated) to 500 (no light), and its IR distance sensors, whose response values ranged from 0 (no object detected) to 1023 (object very close). All sensor readings were scaled to the range [0,1] for computational reasons (0 being unactivated, 1 being highly activated). Six sensors were used to comprise the input state for the SNN: three IR and three light sensors at positions 0, 2 and 5 as shown in Fig. 3(a). Additionally, two bump sensors were added to the front-left and front-right of the agent to prevent it from becoming stuck against an object. If either bump sensor was activated, an interrupt was sent, causing the agent to reverse 10cm and to be penalised by 10 timesteps. Movement values and sensory update delays were constrained by Webots Khepera data. Three actions were possible: forward (maximum movement on both left and right wheels), and continuous turns to both the left and right (caused by halving the left/right motor outputs respectively). Actions were calculated once at the end of each timestep from the output neuron classifications: (high, high) or (low, low) = forwards, (high, low) = left turn, (low, high) = right turn.
The agent was located within a walled arena which it could not leave, with coordinates ranging from [-1,1] in both and directions and walls around the boundary having height . Adding to the complexity of the environment, a three-dimensional box was placed centrally in the arena, with vertices on “ground level” at (, ), (, ), (, ), and (, ), and raised to a height of . A light source, modelled on a 15 Watt bulb, was placed at the top-right hand corner of the arena (, ). The agent initially faces North, and its initial start position was constrained to the range . The agent must traverse the environment and approach the light source to receive reward. The environment is shown in Fig. 3(b).
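The sensor preprocessing can be illustrated with a short sketch. Simple min-max scaling is assumed, with the light reading inverted so that full illumination maps to 1; the ordering of the six inputs (IR first, then light) and the function names are also assumptions, as the text gives only the raw ranges and the sensor positions.

```python
from typing import List, Sequence

LIGHT_MIN, LIGHT_MAX = 8.0, 500.0   # 8 = fully illuminated, 500 = no light
IR_MAX = 1023.0                     # 0 = nothing detected, 1023 = very close
POSITIONS = (0, 2, 5)               # sensor positions used as inputs


def scale_light(raw: float) -> float:
    """Map a light reading to [0, 1], where 1 means fully illuminated."""
    clipped = min(LIGHT_MAX, max(LIGHT_MIN, raw))
    return (LIGHT_MAX - clipped) / (LIGHT_MAX - LIGHT_MIN)


def scale_ir(raw: float) -> float:
    """Map an IR distance reading to [0, 1], where 1 means very close."""
    return min(IR_MAX, max(0.0, raw)) / IR_MAX


def input_state(ir_raw: Sequence[float], light_raw: Sequence[float]) -> List[float]:
    """Build the six-value input state from the eight IR and light sensors."""
    return ([scale_ir(ir_raw[p]) for p in POSITIONS]
            + [scale_light(light_raw[p]) for p in POSITIONS])
```

For instance, an agent in darkness with no nearby obstacles would present an input state of six zeros.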
When the agent reached the goal state (where ), the responsible network received a constant fitness bonus of 2500, which was added to the fitness function outlined in (14). The denominator in the equation expresses the difference between the position of the goal state (1.6) and the current agent position ( and ), and is the number of timesteps taken to solve. The minimum value of this function is capped so that . The fitness of an agent is calculated at the end of every timestep, with the highest attained value of during the trial kept as the fitness value for that network. Optimal performance gives , which corresponds to 700 timesteps from start to goal state with no collisions.
In the following experiments we gauged the impact of both types of memristive synapse, comparing them to benchmark systems containing (i) memory-augmented linear resistors and (ii) constant-valued connections. To aid clarity, we adopt a shorthand of “PEO” for networks containing only PEO-PANI connections. Likewise, “HP”, “LIN” and “GA” networks refer to networks containing only HP memristors, linear resistors, and constant connections respectively.
An experiment began with the generation of 100 networks of a given type (HP/PEO/LIN/GA). Every network in the population was then trialed on the test problem, with a maximum of 4000 timesteps per trial (long enough to allow for initial exploration). After this, 1000 generations of GA application took place and newly-generated networks were trialed on the test problem. Every 20 generations, the current state of the system was observed and used to create the results that follow. The entire process can be described as a system, with one system per connection type. All experiments were repeated 30 times per system. In any hardware implementation, the final solution would be the single fittest network from the population.
As the robot’s start location was tightly constrained, we were able to compare system performance, defined as the first generation in which any network in that system found the goal state. This measure produced 30 numbers, one per experimental repeat, that allowed us to perform t-tests to compare the respective goal-seeking performance of the four systems. In Table 1, “Performance” was the average performance per network type as outlined above. “High fitness” refers to the mean fitness of the highest-fitness network in each run. “Neurons” were the average final connected neurons per network in the population and “Connectivity” was the average percentage of enabled connections in the population. Statistical significance was assessed at the 5% level.
SNN parameters were initial hidden layer nodes=9, , , , , and . Memristive parameters were , , , , , . During a trial the variable connections in the networks may alter their weights via STDP. After every trial, variable connections were reset to their original weight of 0.5.
|Comparison||Performance||High fitness||Neurons||Connectivity|
|HP vs PEO||0.009||0.033||0.318||0.983|
|HP vs LIN||0.009||0.106||0.601||0.349|
|HP vs GA||0.023||0.091||0.699||0.859|
|PEO vs LIN||0.763||0.009||0.130||0.171|
|PEO vs GA||0.027||0.044||0.107||0.781|
|LIN vs GA||0.019||0.684||0.762||0.289|
The most striking result from Table 1 was that the PEO networks significantly outperformed the GA networks, both in terms of performance (p=0.027) and high fitness (p=0.044). LIN networks also outperformed GA networks (p=0.019), although they did not attain significantly better final fitness. These results indicate that these networks learn to harness the plasticity of their connections alongside the topology variations introduced by connection selection to swiftly evolve goal-finding networks.
In contrast, HP networks display significantly lower performance than all other network types (p=0.009 vs. PEO, p=0.009 vs. LIN, p=0.023 vs GA), as well as significantly worse high fitness than PEO networks (p=0.033). This observed behaviour may be due to the memristance profile (see Figure 1(b)) being highly sensitive to the effects of STDP for high () values of , as well as being more likely to be stuck at low () values, a notion echoed in . It is reasoned that this combination of effects makes the memristor less suited to attaining highly-activated networks (network analysis reveals lower numbers of spikes per network, possibly preventing the network from reliably achieving certain output patterns).
Figures 4(a) and 4(b) show that the PEO and LIN networks share similar fitness profiles, both being distinctly quicker to attain high fitness values than HP and GA networks. Both PEO and LIN networks solve the environment within 60 trials and attain their maximal fitness values within 150 trials. GA networks reach lower final fitness values in a more gradual manner, reaching the maximal fitness value after 500 trials. HP networks are slower still; their highest fitness values are attained after 950 trials. All systems eventually solved the problem, except for 2 runs of HP networks. A summary of averages and standard deviations is given in Table 2.
|Measure||HP||PEO||LIN||GA|
|Perf||526.1 (992.4)||17 (34.6)||14.7 (32.5)||77.6 (130.0)|
|High fit||10660 (2280)||11581 (303)||11363 (398)||11402 (277)|
|Avg fit||9477 (3333)||11454 (319)||11058 (728)||11420 (423)|
|Conns(%)||49.42 (9.61)||49.02 (4.63)||51.26 (4.06)||51.19 (4.71)|
|Nodes||16.68 (1.74)||17.04 (0.09)||16.89 (0.54)||17.11 (0.57)|
|0.121 (0.09)||0.123 (0.1)||0.115 (0.1)||0.018 (0.01)|
|0.073 (0.04)||0.062 (0.02)||0.019 (0.02)||0.056 (0.03)|
|0.122 (0.11)||0.113 (0.07)||0.135 (0.11)||0.122 (0.11)|
|0.022 (0.034)||0.010 (0.01)||0.010 (0.01)||0.011 (0.01)|
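As a rough plausibility check, a t statistic can be recomputed from the summary statistics in the “Perf” row alone. The sketch below uses Welch's unequal-variance form; the text does not state which variant of the t-test was used, so this choice is an assumption, as is reading the bracketed values as sample standard deviations over 30 repeats and the column order as HP, PEO, LIN, GA.

```python
from math import sqrt

def welch_t(mean_a, sd_a, n_a, mean_b, sd_b, n_b):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom,
    computed from summary statistics only."""
    var_a = sd_a ** 2 / n_a   # squared standard error of sample a
    var_b = sd_b ** 2 / n_b   # squared standard error of sample b
    t = (mean_a - mean_b) / sqrt(var_a + var_b)
    df = (var_a + var_b) ** 2 / (var_a ** 2 / (n_a - 1) + var_b ** 2 / (n_b - 1))
    return t, df

# "Perf" row: HP = 526.1 (992.4), PEO = 17 (34.6), assumed 30 repeats each.
t, df = welch_t(526.1, 992.4, 30, 17.0, 34.6, 30)
print(round(t, 2), round(df, 1))
```

Under these assumptions the statistic is roughly 2.8 on about 29 degrees of freedom, which exceeds the two-tailed 1% critical value (about 2.76) and is therefore in line with the p=0.009 reported for HP vs PEO performance.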
Although there were variations between the network types with regard to the numbers of hidden layer neurons, no statistically significant differences were observed (Table 1 shows p-values ranging from p=0.107 to p=0.762). While PEO and LIN networks show smooth profiles to their final average neuron numbers of 17.049 and 16.894 respectively (Figure 4 (c)), HP and GA networks show more unstable profiles to final numbers of 16.677 and 17.105 neurons respectively. No significant differences were found with regard to connectivity, although Figure 4 (d) shows a general order of LIN/GA networks being more densely connected, followed by HP networks and finally PEO networks. Again, profiles are more stable for PEO and LIN networks than they are for GA and HP networks.
|HP vs PEO||0.916||0.211||0.616||0.069|
|HP vs LIN||0.549||0.017||0.226||0.07|
|HP vs GA||0.001||0.064||0.988||0.09|
|PEO vs LIN||0.618||0.178||0.243||0.525|
|PEO vs GA||0.001||0.434||0.471||0.129|
|LIN vs GA||0.001||0.471||0.465||0.458|
In all cases, a lower parameter value is associated with a more stable evolutionary process, as such events are evolutionarily preferred to be less frequent within those networks.
Being the only network type to utilise the parameter, the GA networks' final values were, as expected, significantly different from those of the other network types (all p-values 0.001; Table 3). Between the variable networks there were no statistically significant differences. The GA mutation profile (Figure 5 (a)) can be seen to rapidly decrease from a value of 0.3 to 0.05 at generation 300, briefly climb to approximately 0.07 at generation 480, then descend smoothly to a final value of 0.019. Other network profiles are irrelevant to the performance of those systems.
The probability of performing a neuron addition/removal event is encapsulated in the parameter. One statistically significant difference can be seen in Table 3, showing that such events are more likely to occur in HP networks than LIN networks. This can be seen in Figure 4 (c) to drive the HP networks to lower numbers of neurons per network. The probability of performing an addition rather than removal is governed by the parameter; Table 3 shows no statistically significant differences between the network types. These results indicate that no single network type allows the evolutionary process to self-adapt to produce networks containing statistically fewer neurons whilst maintaining high performance.
Connection selection is associated with the parameter. Table 3 shows no statistically significant differences, although values comparing HP to other network types are all almost significant (p=0.069 vs. PEO, p=0.07 vs. LIN, p=0.09 vs. GA). This difference is reflected in Figure 5 (d), where HP networks produce the only initially upwards-trending profile, which follows a markedly different curvature to the others. Again, PEO and LIN network profiles can be observed to be similar to each other.
10 Heterogeneous Mixtures of Memristors
While the experiments presented in Section 9 show the benefits of memristive connections, each network's behaviour under STDP is limited, as homogeneous memristors follow identical STDP response curves as shown in Figure 1(a). To increase the variety of plastic behaviours available to the network as a whole, we now extend the system to allow networks to be comprised of all three variable connection types (HP memristor, PEO-PANI memristor, and linear resistor). As with our previous experiments, these networks should be replicable in hardware, provided that the different memristors can interface with a single neuron type, operate on the same scale, and possess similar electrical tolerances.
Mixing different types of synaptic plasticity has been investigated previously . In the first paper, interneural connections are affected by six distinct variations of the traditional Hebb rule. In , spike transmission from synapse to neuron is probabilistic, with heterogeneous probabilities throughout the network. Finally,  uses four unique Hebbian learning rules for its connections; networks may be comprised of all four connection types. All three papers consistently report that networks benefit from the inclusion of varied plasticity rules, mainly in terms of speed of goal finding, or encoding of functionality that is unattainable in homogeneously plastic networks. Comparisons are made to GA, PEO and LIN networks discussed in Section 9. Experiments are conducted with the same parameters shown in Section 9, on the same environment as Section 8.
To facilitate the evolution of heterogeneous networks the system is altered in two regards, (i) connection creation and (ii) GA activity.
On initialization of a new variable connection (via network creation, node addition, or connection addition), the type of that connection is selected probabilistically with of each type (HP-like memristor, PEO-PANI-like memristor, variable resistor) being selected.
Discovery is modified to allow one memristor type to mutate into another during a GA cycle. As connections are always 0.5 before a trial begins and cannot be mutated, has no role in the memristive networks. Instead we use to control the rate of memristor type mutation taking place. During a GA cycle, after mutation, each connection in the child networks may alter to one of the two other connection types upon satisfaction of probability . Each network’s value of is self-adapted as in equation (9), and is initially seeded randomly uniformly in the range [0,0.25] as with and .
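The two modifications can be sketched as follows. This is an illustrative sketch only: it assumes the three connection types are equally likely at creation and that a mutating connection picks uniformly between the two other types, and the names `CONNECTION_TYPES`, `new_connection_type`, `initial_type_mutation_rate` and `mutate_connection_types` are hypothetical. The self-adaptation of the rate via equation (9) is deliberately omitted.

```python
import random

CONNECTION_TYPES = ("HP", "PEO", "LIN")  # HP-like, PEO-PANI-like, linear resistor

def new_connection_type(rng):
    """Pick the type of a newly created variable connection.
    Assumption: the three types are equally likely."""
    return rng.choice(CONNECTION_TYPES)

def initial_type_mutation_rate(rng):
    """Seed the self-adaptive type-mutation rate uniformly in [0, 0.25]."""
    return rng.uniform(0.0, 0.25)

def mutate_connection_types(types, rate, rng):
    """With probability `rate`, switch each connection to one of the two
    other types; otherwise keep it unchanged."""
    mutated = []
    for current in types:
        if rng.random() < rate:
            others = [t for t in CONNECTION_TYPES if t != current]
            mutated.append(rng.choice(others))
        else:
            mutated.append(current)
    return mutated

rng = random.Random(0)
parent = [new_connection_type(rng) for _ in range(8)]
child = mutate_connection_types(parent, initial_type_mutation_rate(rng), rng)
print(parent)
print(child)
```

Because weights always start at 0.5 and are not mutated, the connection type is the only per-connection property that the GA changes in this sketch.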
Performance is shown in Table 5, which reveals that heterogeneous networks have higher performance characteristics than PEO (p=0.026), LIN (p=0.043) and GA (p=0.003) networks. Figure 6 (a) reveals that goal-finding behaviour is attained within 20 trials, faster than any homogeneous network type. The final “high fitness” value attained is higher than that of all other network types (Figure 6 (a)), and significantly higher than that of both LIN (p<0.001) and GA (p<0.001) networks (Table 5). Average fitness is shown in Figure 6 (b) and can be seen to attain near-optimal population-wide fitness after only 300 trials, an improvement over the other network types considered. These results suggest that mixing synaptic behaviour allows the networks to more quickly attain higher performance characteristics. Averages and standard deviations are given in Table 4.
|Measure||HET||PEO||LIN||GA|
|Perf||1.7 (4.8)||17 (34.6)||14.7 (32.5)||77.6 (130.0)|
|High fit||11696 (186)||11581 (303)||11363 (398)||11402 (277)|
|Avg fit||11474 (285)||11454 (319)||11058 (728)||11420 (423)|
|Conns(%)||48.97 (5.58)||49.02 (4.63)||51.26 (4.06)||51.19 (4.71)|
|Nodes||16.98 (0.65)||17.04 (0.09)||16.89 (0.54)||17.11 (0.57)|
|0.074 (0.03)||0.123 (0.1)||0.115 (0.1)||0.018 (0.01)|
|0.072 (0.02)||0.062 (0.03)||0.019 (0.02)||0.056 (0.03)|
|0.132 (0.09)||0.113 (0.07)||0.135 (0.11)||0.122 (0.11)|
|0.011 (0.01)||0.010 (0.01)||0.010 (0.01)||0.011 (0.01)|
As with the homogeneous network comparisons, Table 5 reveals no significant differences with regard to final heterogeneous network neuron numbers. Figure 6 (c) visualises a steady profile that terminates slightly below its starting value of 17. Percentage connectivity drops by approximately 1% during the experiment to a final value of 49%, shown in Figure 6 (d), giving a similar final value to PEO networks, lower than LIN and GA. This is (just) significantly lower than LIN networks (p=0.049, Table 5), although the actual difference is only 2%. Given the general lack of statistically significant differences, the increased performance of heterogeneous networks does not appear to be offset by increased network complexity; in some cases complexity is reduced.
Heterogeneous and GA networks use to control different aspects of the GA cycle (HET networks use it to control the rate of switching of memristive behaviours, GA networks use it to alter connection weights). Because of this, a statistically significant p-value 0.001 is seen between these network types (Table 6). LIN and PEO networks do not use so comparisons are omitted. Figure 7 (a) shows that the HET profile is more stable than the GA profile.
The profile of (the rate of constructivism events in the networks) is shown in Figure 7 (b) to descend to a final value of 0.072, higher than the other network types. This is significantly higher than that of LIN (p=0.005) and GA (p=0.015) networks (Table 6), although this seems to correspond to heightened topology manipulation activity rather than different final neuron levels, as shown in Table 5. All network types show similar downward-trending profiles for the parameter, which is the probability of node addition as opposed to node removal upon satisfaction of , shown in Figure 7 (c). The similarity of these profiles is reflected in their respective p-values (Table 6), which show no significant differences. These results indicate that the evolutionary process does not distinguish significantly between the variable connection types used in the networks.
Heterogeneous networks follow similar profiles to LIN and PEO networks, as visualised in Figure 7 (d). Because of this, there are no statistical differences in terms of . Despite this lack of statistical significance in the controlling parameter, HET networks are significantly less connected than LIN networks (p=0.049, Table 5). This indicates that connection removal events are more likely to produce beneficial outcomes in HET networks than they are in LIN networks, as the frequency of those events is similar across the network types.
|HETERO vs.||Performance||High fitness||Neurons||Connectivity|
11 Analysis of Heterogeneous Network Evolution
Although heterogeneous networks were found to have higher performance characteristics than all other network types, it would be beneficial to know how STDP is used to benefit heterogeneous networks. For example, are particular variable connection types more likely to be attached to excitatory or inhibitory neurons? Is there an evolutionary preference to have a given type of variable connection attached to certain inputs, or driving a particular output neuron? We focus on two broad themes: evolution (this section), i.e. evolutionary preferences for certain memristive configurations, and runtime (Section 12), i.e. how STDP is used by those configurations to generate high-performance behaviour.
We average only the best network in each run, allowing us to focus on topological configurations that are beneficial. Figure 8 shows HP and LIN components being preferred to PEO-PANI memristors. Despite Table 7 showing no statistically significant differences, it is interesting to see the worst-performing memristor type from the homogeneous networks (HP) being preferred to the best-performing (PEO-PANI). This suggests that the evolutionary process finds a way to harness HP-like behaviour more readily when used in combination with other memristor types.
|HP vs. PEO-PANI||0.118|
|HP vs. LINEAR||0.609|
|PEO vs. LINEAR||0.054|
11.1 Memristor Types per Layer
We now consider the specific positions of memristors in the networks. As the networks consist of three layers, memristors can be classified based on the layers of the neurons that they connect, i.e. input, hidden, or output. Figure 9 confirms the results seen in Figure 8, specifically that PEO-PANI memristors are universally more sparsely utilised than the other connection types. In all cases, PEO-PANI memristors become the minority within 200 generations.
Two main significant results are shown in Table 8. Firstly, LIN connections are preferred to both HP (p=0.045) and PEO-PANI (p=0.012) types when connecting two hidden layer neurons (Figure 9 (b)). A feasible explanation is that the networks benefit from a basis of stable (e.g. linear) communications within the hidden (processing) layer to generate reliable action sequences. More importantly, this result indicates that more linear memristors, if physically realised, could play an important role in future NC implementations. Secondly, HP memristors are significantly (p=0.04) preferred to PEO-PANI memristors when connecting hidden neurons to output neurons; Figure 9 (c) shows HP memristors are by far the most popular choice in this role. HP memristors appear to be more suited to reliably reducing the number of spikes in the output trains to generate low output classifications when a turn is required.
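|Layers connected||Comparison||P-value|
|Input - hidden||HP vs. PEO-PANI||0.710|
|Input - hidden||HP vs. LINEAR||0.543|
|Input - hidden||PEO vs. LINEAR||0.339|
|Hidden - hidden||HP vs. PEO-PANI||0.482|
|Hidden - hidden||HP vs. LINEAR||0.045|
|Hidden - hidden||PEO vs. LINEAR||0.012|
|Hidden - output||HP vs. PEO-PANI||0.04|
|Hidden - output||HP vs. LINEAR||0.079|
|Hidden - output||PEO vs. LINEAR||0.839|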
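|Neuron location||Neuron type||Comparison||P-value|
|Presynaptic||Excitatory||HP vs. PEO-PANI||0.018|
|Presynaptic||Excitatory||HP vs. LINEAR||0.805|
|Presynaptic||Excitatory||PEO vs. LINEAR||0.015|
|Presynaptic||Inhibitory||HP vs. PEO-PANI||0.516|
|Presynaptic||Inhibitory||HP vs. LINEAR||0.259|
|Presynaptic||Inhibitory||PEO vs. LINEAR||0.874|
|Postsynaptic||Excitatory||HP vs. PEO-PANI||0.721|
|Postsynaptic||Excitatory||HP vs. LINEAR||0.368|
|Postsynaptic||Excitatory||PEO vs. LINEAR||0.314|
|Postsynaptic||Inhibitory||HP vs. PEO-PANI||0.061|
|Postsynaptic||Inhibitory||HP vs. LINEAR||0.72|
|Postsynaptic||Inhibitory||PEO vs. LINEAR||0.183|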
Table 9 shows the relative numbers of memristor types that are connected (pre- or post-synaptic) to excitatory and inhibitory neurons. PEO-PANI memristors are less preferred than the other connection types when an excitatory neuron is presynaptic (p=0.018 vs. HP memristors and p=0.015 vs. LIN connections). As excitatory neurons are the sole method of activity generation, LIN may be preferred as they respond to STDP less dramatically, and can therefore more reliably maintain useful activity patterns. Since excitatory spikes are also responsible for generating output spike trains, HP are reasoned to be preferred as they can reliably reduce network activity to generate low spike train classifications when required. This supports the claim that certain memristor types are preferred in certain situations, and implies that the nonlinear activity of the HP and PEO-PANI memristors may be harnessed to alter, rather than preserve, network behaviour.
|Input source neuron||Comparison||P-value|
|IR sensor||HP vs. PEO-PANI||0.033|
|IR sensor||HP vs. LINEAR||0.074|
|IR sensor||PEO vs. LINEAR||0.933|
|Light sensor||HP vs. PEO-PANI||1|
|Light sensor||HP vs. LINEAR||0.379|
|Light sensor||PEO vs. LINEAR||0.410|
Despite low general appearance rates within the networks, PEO-PANI memristors are significantly preferred to HP memristors when postsynaptic to an IR sensor (Table 10, p=0.033). As IR sensors respond only when near an obstacle, swift attainment of stable high activation to alter network activation is required. The PEO-PANI profile is also ideal for stably creating high-efficacy connections via positive STDP, which would make future obstacle avoidance responses both stronger and quicker. In contrast, no statistically significant values are found when the input neuron is attached to a light sensor. Light sensors may be more ambivalent to more gradual synaptic efficacy changes, as the state space experienced by those sensors is less rugged than that experienced by the IR sensors; swift action perturbation is not required, so any synapse type can function equally well.
12 Heterogeneous STDP Analysis
Synaptic plasticity acts to alter the influence of the connections on the activity of the network during a trial. For this reason, we analyse the activity of the network as it solves the test problem, with particular attention paid to the role of STDP in behaviour generation.
12.1 Runtime analysis: Averages of Best Networks
Each network took a different number of timesteps to solve the task. Since averages are taken over the highest performing network in each run, there is an approximate correlation between the time at which that network executes the “turn” and the time at which it reaches the goal state and ends the trial. Therefore, we are able to check for patterns within this timeframe. Initial analysis revealed that the numbers of STDP events during a trial tend to oscillate between two or more values as the trial progresses. To account for erroneous statistics that may arise as a result of these oscillations, all STDP values (and resultant weights) are averages over the previous 10 timesteps.
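The smoothing step can be illustrated with a trailing mean. This is a sketch under assumptions: the window is taken to include the current timestep, early timesteps average over however many values are available, and the function name and sample data are hypothetical.

```python
from collections import deque

def trailing_mean(values, window=10):
    """Average each value with up to `window - 1` preceding values."""
    recent = deque(maxlen=window)   # oldest value drops out automatically
    smoothed = []
    for value in values:
        recent.append(value)
        smoothed.append(sum(recent) / len(recent))
    return smoothed

# Hypothetical STDP event counts that flip between two values each timestep.
events = [12, 17] * 10
print(trailing_mean(events)[-1])
```

Once the window is full, a series that alternates between two values settles to their midpoint, which removes the oscillation from the reported statistics.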
Figure 10 (a) shows the average weight per memristor type during runtime. The PEO-PANI weight profile constantly increased throughout the duration of the trial, whereas the HP memristor and linear resistor weights terminated at approximately equal values. PEO-PANI memristors were universally higher-weighted than HP memristors and linear resistors at the end of a trial; average final weights were HP memristor = 0.572, PEO-PANI memristor = 0.673 and linear resistor = 0.591. Between HP and PEO-PANI memristors a statistically significant p-value of 0.047 is observed, suggesting that PEO-PANI memristors act more like facilitating synapses than the other two connection types.
STDP events were most prevalent during the first 250 timesteps (Figures 10 (b) and 10 (c)). Within this timeframe, HP memristors had low amounts of positive STDP (36.18) and high amounts of negative STDP (67.24). PEO-PANI memristors had high numbers of positive STDP events (84.45) and low numbers of negative STDP events (16.48). Linear resistors possessed comparable amounts of both types of STDP: 31.23 positive and 25.04 negative events. HP memristors had significantly more negative STDP events than PEO-PANI did, and PEO-PANI underwent significantly more positive STDP events than HP (both p<0.001). HP memristors experience significantly more negative STDP than positive STDP, with the inverse being true for PEO-PANI memristors (both p<0.001). These results reinforce the view that the different memristor types are harnessed by the network as a result of being placed in favourable locations via evolution.
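The balance between strengthening and weakening events can be made explicit by computing, for each connection type, the share of STDP events that were positive. The counts are those quoted above; the summary measure itself (`positive_fraction`) is an illustrative choice and not one used in the original analysis.

```python
# Mean STDP event counts over the first 250 timesteps, as quoted in the text.
stdp_events = {
    "HP": (36.18, 67.24),        # (positive, negative)
    "PEO-PANI": (84.45, 16.48),
    "LIN": (31.23, 25.04),
}

def positive_fraction(positive, negative):
    """Share of STDP events that strengthen the connection."""
    return positive / (positive + negative)

fractions = {name: positive_fraction(*counts) for name, counts in stdp_events.items()}
for name, value in fractions.items():
    print(f"{name}: {value:.2f}")
```

On these figures roughly a third of HP events are positive, compared with more than four fifths for PEO-PANI, while linear resistors sit close to an even split.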
Following this period of heightened STDP activity, STDP events for all three variable connection types diminish and become more stable; networks use STDP to “set up” connection weights, which are then affected by further STDP to induce turning behaviour. It should be noted that STDP generally involves more positive events than negative events, i.e. the role of STDP is mainly to increase levels of activation within the networks.
12.2 Turn Analysis of Best Overall Network
We further refined the scope of our investigation to cover the single highest performing network, shown in Figure 10 (d), which solved the task in 709 steps. The focus on the turn is motivated by activity; as more STDP events occurred during a turn, this was an obvious timeframe to study.
The network existed in a number of stable states, each oscillating between (usually two) STDP values, which were observed throughout runtime. Turning motion began at timestep 293 and ended at timestep 372, during which periodic switching between “forward” and “right turn” actions was observed. Outside of this range, uniform “forward” actions were generated. Performance characteristics of each memristor type at the turn event are shown in Table 11. HP memristors had lower rates of positive STDP with respect to negative STDP throughout the turn. The opposite was true for PEO-PANI memristors, strengthening the notion that HP and PEO-PANI memristors are evolutionarily preferred as depressing and facilitating synapses respectively. During the turn, HP memristors were seen to maintain identical numbers of positive STDP events, whereas both PEO-PANI memristors and linear resistors kept identical negative STDP oscillations.
|Start of turn||HP||PEO-PANI||LINEAR|
|Positive STDP||11 to 11/12||26/27 to 17/19||19/21 to 27|
|Negative STDP||25/26 to 32/33||6/9 to 14/15||43 to 25/29|
|Positive STDP||11/12 to 13/17||37/41 to 58/63||37/38 to 59/60|
|Negative STDP||25/26 to 18/22||14/15 to 2/5||25/29 to 22/23|
In particular, two memristors were altered via STDP during runtime to achieve the desired behaviour, shown in Figure 11. Firstly, the HP memristor connecting the second hidden node to the first output node underwent repeated negative STDP events, which due to the HP memristance profile enacted a swift decrease in conductivity. This caused the initial turning motion by altering the spike train of the first output neuron from “high activation” to “low activation”. Correspondingly, the output action changed from constant forward motion to sequential “forward” and “right turn” actions. Towards the end of the turn, this connection underwent a restrengthening due to different input node spike trains, allowing the first output neuron to achieve higher spiking frequency and produce a constant “forward” motion. The second memristor in question was postsynaptic to the first input node and presynaptic to the 8th hidden node, which was strengthened at the end of the turn. It was reasoned that this memristor allowed forwards motion to be generated by compensating for the change in light sensor values owing to the orientation of the agent changing. Practically, the newly-strengthened memristor caused the 8th hidden neuron to spike more frequently. As this neuron was connected to the first output neuron, increased activity also caused the output neuron to spike more frequently, causing a “high” spike train classification which, in cooperation with the activity of the first memristor mentioned above, allowed for the generation of stable “forward” motion despite the new agent orientation.
13 Dynamic reward scenario
To further test the capabilities of the HET system, we ran an experiment based on the T-maze (e.g. ) scenario, where the agent must “forget” its previously learned behaviour after a time and adapt to a newly positioned goal state. Soltoggio  demonstrates the utility of plastic networks in such dynamic reward scenarios.
For continuity, the sensorimotor space was identical to the previous experiment (Figure 3), although the same adaptivity as in the T-maze is required. We made the environment more challenging in two ways. Firstly, sensory noise was added based on Webots Khepera data: 2% noise for IR sensors and 10% noise for light sensors, all randomly sampled from a uniform distribution. Wheel slippage was also included (10% chance). Secondly, the location of the reward changed from upper-right to upper-left during the lifetime of the agent (Figure 3(b)). It should be noted that the light source does not move, i.e. with the reward in its second position, the agent is no longer performing phototaxis.
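A minimal sketch of how such perturbations could be applied is given below. The text specifies only the noise levels, the uniform distribution and the slippage chance; treating the noise as a symmetric relative error on the reading, and modelling slippage as a simple yes/no event, are assumptions made for illustration, as are all names.

```python
import random

IR_NOISE = 0.02      # 2% noise on IR sensors
LIGHT_NOISE = 0.10   # 10% noise on light sensors
SLIP_CHANCE = 0.10   # 10% chance of wheel slippage

def add_uniform_noise(reading, level, rng):
    """Perturb a reading by a uniformly sampled relative error.
    Assumption: noise is multiplicative and symmetric about zero."""
    return reading * (1.0 + rng.uniform(-level, level))

def wheel_slips(rng):
    """Return True when slippage occurs for the current action."""
    return rng.random() < SLIP_CHANCE

rng = random.Random(42)
print(add_uniform_noise(0.8, IR_NOISE, rng))
print(add_uniform_noise(0.8, LIGHT_NOISE, rng))
print(wheel_slips(rng))
```

What a slippage event does to the resulting motion is not described in the text, so the sketch stops at detecting the event.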
Each trial was split into two parts, with the reward moved for part 2. Membrane potentials and synaptic weights were not reset between these parts, so that the agent retained memory of the first part. If the agent did not locate the goal in the first part, it could not receive reward when the goal was moved. A reward of 1 was given when the agent stably found the first reward zone (part 1). After this, the reward was relocated to the upper-left of the environment and part 2 commenced, continuing until the agent located the new reward zone (for a total fitness of 2), or the step limit was reached. “Performance” was the number of trials the system took to find the second reward having located the first reward, and was the main metric for comparison as it measured adaptation speed. All other parameters are identical to those in Section 9.
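The reward scheme and the adaptation measure can be summarised in a short sketch. The fitness values follow the description above; the reading of “Performance” as the gap between the first trial that earns reward 1 and the first trial that earns reward 2 is one interpretation, and the function names and sample history are hypothetical.

```python
def trial_fitness(found_first, found_second):
    """Fitness for one two-part trial: 1 for the first reward zone,
    2 if the relocated zone is then also found."""
    if not found_first:
        return 0          # part 2 cannot be rewarded without part 1
    return 2 if found_second else 1

def adaptation_performance(best_fitness_by_trial):
    """Trials elapsed between first reaching fitness 1 and first reaching
    fitness 2; None if either level was never reached."""
    first = next((i for i, f in enumerate(best_fitness_by_trial) if f >= 1), None)
    second = next((i for i, f in enumerate(best_fitness_by_trial) if f >= 2), None)
    if first is None or second is None:
        return None
    return second - first

# Hypothetical best-fitness history for one system.
history = [0, 1, 1, 2]
print(adaptation_performance(history))
```

A lower value indicates faster adaptation to the relocated reward, which is why this measure is used as the main comparison.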
In the following experiment, we compare the HET system to a benchmark GA system, and intend to demonstrate the utility of memristive networks over those with static connections in this dynamic reward scenario. Results, shown in Table 12, reveal that HET networks are universally preferable to GA networks, having higher performance, high fitness and average fitness, as well as lower connectivity and neuron numbers. Significantly, HET networks are quicker at adapting to the change in reward location (p=0.037), suggesting that plastic memristive networks are suited to dynamic tasks. Six of the GA networks could not locate both rewards. The parameter was significantly lower (p=0.049), although this did not lead to a significant reduction in connectivity. All other parameters (, , ) have different final values than those in the first experiment, demonstrating the context-sensitivity of the self-adaptation process.
Whereas the previous experiment saw HP being preferred to PEO-PANI when connecting hidden-output neurons, evolution now prefers both HP (p=0.007, avg 3.8) and PEO-PANI (p=0.047, avg 4.4) to LIN (avg 2.4) components. This, coupled with the fact that LIN synapses are no longer significantly preferred to the other types between two hidden layer neurons, suggests that dynamic (nonlinear) activity is required by the networks to adapt rapidly.
HP memristors are also significantly preferred to PEO-PANI memristors (p=0.032) when connected to a light sensor; the averages are 6 and 4.8 synapses respectively. It is postulated that this favouring of an easily depressed connection type is one of the ways the networks evolve to deal with noise, which light sensors experience more than IR sensors.
Average synapse weights and STDP performance can be seen in Figure 12. STDP is initially used as in the previous experiment, to “set up” connection weights. The major result from STDP analysis of the best network from each run is that the average weight per synapse type varies between the two parts of the trial (Figure 12(a)). In contrast to the previous experiment, PEO-PANI weights are low during the first part, and suddenly increase at the start of the second part (approximately timestep 750). In all networks considered, PEO-PANI synapse weight varied more widely than the other types during a trial. It is reasoned that PEO-PANI synapses were used as they can affect network activity more stably, and more strongly per positive STDP event. Average PEO-PANI weight (0.539) was significantly higher than average HP weight (0.507, p<0.001).
The use of STDP within the networks is shown in Figures 12(b) and 12(c). STDP is initially similar to the previous experiment, with a slight increase in all types of positive STDP at timestep 750. This coincides with a decrease in HP negative STDP, and an increase in PEO and LIN negative STDP around the same timeframe. Due to the nature of the PEO-PANI profile, this slight increase in positive STDP corresponds to the large increase in average weight seen in Figure 12(a), allowing the network to successfully solve the environment. Overall, HP memristors undergo significantly more negative STDP than positive STDP, with the inverse being true of PEO-PANI memristors. Noise is seen to be handled in some networks by STDP. Specifically, light sensors are frequently seen attached to inhibitory neurons, which can act via STDP to reduce the impact of those inputs on the activity of the network. To give some idea of topology, the best network is shown in Figure 12(d). It should be noted that, despite the increased complexity of the task, this network contains fewer hidden layer neurons (17 vs. 20) and memristors (91 vs. 109) than the best network from the static environment.
|Measure||HET||GA||P-value|
|Performance||57.8 (52.5)||541.4 (364)||0.037|
|High fit||2 (0)||1.8 (0.45)||0.373|
|Avg fit||1.12 (0.08)||1.03 (0.04)||0.155|
|Conns(%)||52.21 (1.61)||52.78 (1.29)||0.109|
|Nodes||16.79 (0.32)||16.85 (0.12)||0.739|
|0.06 (0.01)||0.08 (0.02)||0.111|
|0.08 (0.02)||0.09 (0.02)||0.192|
|0.25 (0.09)||0.30 (0.05)||0.422|
|0.05 (0.01)||0.06 (0.01)||0.049|
In this paper we have demonstrated the first evolutionary approach to designing memristive SNNs for obstacle avoidance/dynamic reward tasks. We have shown that plasticity can be harnessed by the networks via STDP to achieve more expedient goal-finding behaviour with no significant downside in terms of topological complexity. Results indicate that, in possible NC implementations, heterogeneous mixtures of memristors possess advantages compared to both constant connections and networks of a single memristor type. Self-adaptive parameters were found to alter dependent on the variable connection type in the network. It is important to note that internal memristive network dynamics have no analogue in the GA case; those behaviours cannot be replicated by GA networks. Overall it can be seen that all of our research questions have been answered and the hypotheses sufficiently demonstrated, as highlighted in the conclusions drawn from each set of experiments and summarised below.
The original hypothesis, “that memristive synapses provide the networks with increased performance”, was confirmed as PEO and LIN networks outperformed GA networks, and PEO networks evolved higher fitness solutions than GA networks.
The heterogeneous network hypothesis, “to confirm that varied memristive behaviours can be harnessed by the evolutionary design process to provide further advantages, specifically that (i) certain functionality can be more easily achieved by certain memristor types (ii) combinations of memristor types are beneficial to the networks”, was confirmed, as heterogeneous networks were higher performing than LIN, PEO and GA networks, and generated higher fitness solutions than LIN and GA networks. They were also shown to outperform GA networks in a dynamic reward scenario.
Research question 1 - “Does the evolutionary process allow for the successful generation of memristive networks that outperform constant-valued connections, despite the memristors' nonlinearity and given the potential for complex interactions within memristive networks?” - was answered, as PEO networks were successfully evolved to outperform, and generate higher fitness solutions than, GA networks.
Research question 2 - “In the heterogeneous case, do mixtures of memristors provide better performance than other implementations? How do such networks generate useful behaviour?” - was answered, as heterogeneous networks had higher performance characteristics than PEO, LIN and GA networks. Useful behaviour was generated on an evolutionary level by assigning positions to the memristors based on their profiles, and on a runtime level by generating STDP to alter synaptic efficacies to exploit properties of those profiles.
Research question 3 - “Is there an evolutionary preference in assigning specific roles to specific memristor types based on variations in their memristive behaviours?” was answered as a number of statistically significant differences with respect to the placement of specific connection types in certain positions in the networks were found.
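The runtime mechanism referred to under research question 2, STDP altering synaptic efficacy, can be illustrated with a minimal sketch. This is not the memristor model used in our experiments: the class name `PlasticSynapse`, the fixed step size, the coincidence window and the [0, 1] weight range are all hypothetical values chosen for illustration, and a real memristive profile would replace the constant step with a device-specific, possibly nonlinear, change.

```python
from dataclasses import dataclass


@dataclass
class PlasticSynapse:
    """Toy plastic connection; every parameter value here is an assumption."""
    weight: float = 0.5   # synaptic efficacy, kept within [0, 1]
    step: float = 0.05    # magnitude of a single STDP-induced change
    window: int = 3       # largest |t_post - t_pre| (time steps) treated as coincident

    def stdp(self, t_pre: int, t_post: int) -> float:
        """Apply one pair-based Hebbian STDP update and return the new weight."""
        dt = t_post - t_pre
        if 0 < dt <= self.window:
            # Presynaptic spike shortly before postsynaptic spike: positive STDP.
            self.weight += self.step
        elif -self.window <= dt < 0:
            # Postsynaptic spike shortly before presynaptic spike: negative STDP.
            self.weight -= self.step
        # Spikes further apart than the window (or simultaneous) leave the weight unchanged.
        self.weight = min(1.0, max(0.0, self.weight))
        return self.weight


syn = PlasticSynapse()
syn.stdp(t_pre=10, t_post=12)  # causal pairing: efficacy increases
syn.stdp(t_pre=20, t_post=19)  # anti-causal pairing: efficacy decreases
syn.stdp(t_pre=30, t_post=40)  # outside the window: no change
```

Under these assumptions, differences between connection types would amount to different `step` behaviour per type, which is one way to picture how evolution can exploit distinct profiles in distinct network positions.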
Biological brains contain mixtures of synapses that have specific types (e.g. depressing, facilitating) based on their performance characteristics; results suggest that the evolutionary process casts the HP memristor as a depressing synapse and the PEO-PANI as an excitatory synapse within the heterogeneous networks. With statistical significance, in both static and dynamic scenarios, PEO-PANI synapses achieved higher average efficiency than HP synapses and underwent more positive STDP than negative STDP, as well as undergoing more positive STDP than HP memristors did. The inverse is true of the HP memristor when compared to the PEO-PANI. Biological brains also place these varied synapse types in certain contexts (e.g. depressing synapses are typically found between two pyramidal neurons, and facilitating synapses frequently connect pyramidal neurons to interneurons). Initial findings provide compelling evidence that evolution of heterogeneous networks shares this feature; numerous examples have been reported herein including (i) PEO-PANI being preferred to HP when attached to IR sensors (ii) LIN being preferred to HP and PEO-PANI, being used to generate more stable behaviour between two hidden layer neurons (iii) HP being preferred to PEO-PANI when connecting to output layer neurons. In the dynamic case, (i) HP were preferred to PEO-PANI when attached to light sensors (ii) PEO-PANI were preferred to HP and LIN when connecting to output neurons. It is clear that the memristors are assigned types, and thus roles, within the networks based on their profiles. It is also shown that the role for a specific type can vary based on the environment the controller encounters.
The introduction highlighted the implementation of neuromorphic structures as motivation for conducting this research. The use of physical equations to model memristive behaviour makes a future hardware implementation more viable. Performance of the yet-to-be-realised LIN component indicates that more linear memristive behaviours may be beneficial, especially in a heterogeneous scenario. As the memristors have a constant initial weight that is not affected by GA activity, memristive networks are initially handicapped with fewer degrees of behavioural freedom. Despite this fact, they are shown to adapt by allowing environmental signals to alter synaptic efficiency, outperforming (in some cases) the GA approach on the test problem. The networks evolved are admittedly at a much smaller scale than those required by the neuromorphic paradigm. Scalability is more likely to be possible due to the inclusion of constructivism and self-adaptive search parameters, provided that the innate self-organising properties of the networks can account for the increased complexity of intra-network communications.
- In traditional neural network terminology, the objective of the networks is to find a suitable sensor-motor mapping to allow for navigation, the function is the function that approximates this mapping, and the compactness is the minimal network topology as shown in the results sections.
- C. Mead, “Neuromorphic electronic systems,” Proceedings of the IEEE, vol. 78(10), pp. 1629–1636, 1990.
- J. M. Rabaey, Digital integrated circuits: a design perspective. Upper Saddle River, NJ, USA: Prentice-Hall, Inc., 1996.
- L. Chua, “Memristor-the missing circuit element,” Circuit Theory, IEEE Transactions on, vol. 18, no. 5, pp. 507 – 519, Sep. 1971.
- D. B. Strukov, G. S. Snider, D. R. Stewart, and R. S. Williams, “The missing memristor found,” Nature, vol. 453, pp. 80–83, 2008.
- Y. Ho, G. M. Huang, and P. Li, “Nonvolatile memristor memory: device characteristics and design implications,” in Proceedings of the 2009 International Conference on Computer-Aided Design, ser. ICCAD ’09. New York, NY, USA: ACM, 2009, pp. 485–490.
- V. Erokhin, T. Berzina, A. Smerieri, P. Camorani, S. Erokhina, and M. Fontana, “Bio-inspired adaptive networks based on organic memristors,” Nano Communication Networks, vol. 1, no. 2, pp. 108 – 117, 2010.
- D. O. Hebb, The organisation of behavior. Wiley, New York, 1949.
- W. M. Kistler, “Spike-timing dependent synaptic plasticity: a phenomenological framework,” Biological Cybernetics, vol. 87, pp. 416–427, 2002.
- W. Gerstner and W. M. Kistler. Cambridge University Press, Aug.
- J. H. Holland, “Adaptation,” in Progress in theoretical biology IV, R. Rosen and F. M. Snell, Eds. New York, NY, USA: Academic Press, 1976, pp. 263–293.
- V. Erokhin and M. P. Fontana, “Electrochemically controlled polymeric device: a memristor (and more) found two years ago,” ArXiv e-prints, Jul. 2008.
- T. Higuchi, M. Iwata, D. Keymeulen, H. Sakanashi, M. Murakawa, I. Kajitani, E. Takahashi, K. Toda, N. Salami, N. Kajihara, and N. Otsu, “Real-world applications of analog and digital evolvable hardware,” Evolutionary Computation, IEEE Transactions on, vol. 3, no. 3, pp. 220 –235, sep 1999.
- L. Sellami, S. Singh, R. Newcomb, and G. Moon, “Linear bilateral cmos resistor for neural-type circuits,” in Circuits and Systems, 1997. Proceedings of the 40th Midwest Symposium on, vol. 2, aug. 1997, pp. 1330 – 1333.
- L. Lapicque, “Recherches quantitatifs sur l’excitation electrique des nerfs traitée comme une polarisation,” in Journal of Physiological Pathology, vol. 9, San Francisco, California, USA, 1907, pp. 620–635.
- D. E. Rumelhart and J. L. McClelland, Parallel distributed processing: explorations in the microstructure of cognition, vol. 1: foundations. Cambridge, MA, USA: MIT Press, 1986.
- W. Maass, “Networks of spiking neurons: The third generation of neural network models,” Neural Networks, vol. 10, no. 9, pp. 1659 – 1671, 1997.
- K. Saggie-Wexler, A. Keinan, and E. Ruppin, “Neural processing of counting in evolved spiking and mcculloch-pitts agents,” Artificial Life, vol. 12, no. 1, pp. 1–16, 2006.
- X. Jin, A. Rast, F. Galluppi, S. Davies, and S. Furber, “Implementing spike-timing-dependent plasticity on spinnaker neuromorphic hardware,” in Neural Networks (IJCNN), The 2010 International Joint Conference on, A. Prieto, Ed., july 2010, pp. 1 –8.
- S. Mitra, S. Fusi, and G. Indiveri, “Real-time classification of complex patterns using spike-based learning in neuromorphic vlsi,” Biomedical Circuits and Systems, IEEE Transactions on, vol. 3, no. 1, pp. 32 –42, feb. 2009.
- D. Floreano, P. Dürr, and C. Mattiussi, “Neuroevolution: from architectures to learning,” Evolutionary Intelligence, vol. 1, pp. 47–62, 2008.
- M. Korkin, N. E. Nawa, and H. d. Garis, “A “spike interval information coding” representation for atr’s cam-brain machine (cbm),” in Proceedings of the Second International Conference on Evolvable Systems: From Biology to Hardware, D. M. Moshe Sipper and A. Pèrez-Uribe, Eds. London, UK: Springer-Verlag, 1998, pp. 256–267.
- D. Floreano and C. Mattiussi, “Evolution of spiking neural controllers for autonomous vision-based robots,” in Evolutionary Robotics. From Intelligent Robotics to Artificial Life, ser. Lecture Notes in Computer Science, T. Gomi, Ed. Springer Berlin / Heidelberg, 2001, vol. 2217, pp. 38–61.
- S. Nolfi and D. Floreano, Evolutionary Robotics. Cambridge, MA, USA: The MIT Press, 2000.
- G. Indiveri, “Modeling selective attention using a neuromorphic analog VLSI device,” Neural Computation, vol. 12, no. 12, pp. 2857–2880, 2001.
- D. Floreano, N. Schoeni, G. Caprari, and J. Blynel, “Evolutionary bits’n’spikes,” in Proceedings of the eighth international conference on Artificial life. Cambridge, MA, USA: MIT Press, 2003, pp. 335–344.
- D. Federici, “Evolving developing spiking neural networks,” in IEEE Congress on Evolutionary Computation, D. Corne, Z. Michalewicz, B. McKay, G. Eiben, D. Fogel, C. Fonseca, G. Greenwood, G. Raidl, K. Tan, and A. Zalzala, Eds. IEEE, 2005, pp. 543–550.
- P. Rocke, B. McGinley, F. Morgan, and J. Maher, “Reconfigurable hardware evolution platform for a spiking neural network robotics controller,” in Reconfigurable Computing: Architectures, Tools and Applications, ser. Lecture Notes in Computer Science, P. Diniz, E. Marques, K. Bertels, M. Fernandes, and J. Cardoso, Eds. Springer Berlin / Heidelberg, 2007, vol. 4419, pp. 373–378.
- O. Kavehei, Y.-S. Kim, A. Iqbal, K. Eshraghian, S. Al-Sarawi, and D. Abbott, “The fourth element: Insights into the memristor,” in Communications, Circuits and Systems, 2009. ICCCAS 2009. International Conference on, july 2009, pp. 921 –927.
- H. Kim, M. Sah, C. Yang, and L. Chua, “Memristor-based multilevel memory,” in Cellular Nanoscale Networks and Their Applications (CNNA), 2010 12th International Workshop on, T. Roska, M. Gilli, and A. Zaràndy, Eds., 2010, pp. 1 –6.
- B. Mouttet, “Memristor pattern recognition circuit architecture for robotics,” in Proceedings of the 2nd International Multi-Conference on Engineering and Technological Innovation II, B. Jorge, C. Nagib, E. Kamran, H. Shigehiro, L. William, S. Tomohiro, S. Stefano, and Z. C. Dale, Eds., 2009, pp. 65–70.
- W. Doolittle, W. Calley, and W. Henderson, “Complementary oxide memristor technology facilitating both inhibitory and excitatory synapses for potential neuromorphic computing applications,” in Semiconductor Device Research Symposium, 2009. ISDRS ’09., K. Jones and Z. Dilli, Eds., 2009, pp. 1 –2.
- S. H. Jo, T. Chang, I. Ebong, B. B. Bhadviya, P. Mazumder, and W. Lu, “Nanoscale memristor device as synapse in neuromorphic systems,” Nano Letters, vol. 10, no. 4, pp. 1297–1301, 2010.
- B. Linares-Barranco and T. Serrano-Gotarredona, “Memristance can explain spike-time-dependent-plasticity in neural synapses,” Nature Precedings, 2009. [Online]. Available: http://hdl.handle.net/10101/npre.2009.3010.1
- G. Snider, “Computing with hysteretic resistor crossbars,” Applied Physics A: Materials Science and Processing, vol. 80, pp. 1165–1172, 2005, 10.1007/s00339-004-3149-1.
- Y. V. Pershin, S. La Fontaine, and M. Di Ventra, “Memristive model of amoeba learning,” Phys. Rev. E, vol. 80, no. 2, p. 021926, 2009.
- G.-Q. Bi and M.-M. Poo, “Synaptic modifications in cultured hippocampal neurons: Dependence on spike timing, synaptic strength, and postsynaptic cell type.” J. Neurosc, vol. 77, no. 1, pp. 551–555, 1998.
- A. Soltoggio, “Neural plasticity and minimal topologies for reward-based learning,” in Proceedings of the 2008 8th International Conference on Hybrid Intelligent Systems. Washington, DC, USA: IEEE Computer Society, 2008, pp. 637–642.
- P. Durr, C. Mattiussi, A. Soltoggio, and D. Floreano, “Evolvability of Neuromodulated Learning for Robots,” in Proceedings of the 2008 ECSIS Symposium on Learning and Adaptive Behavior in Robotic Systems, A. O. El-Rayis, A. Stoica, E. Tunstel, T. Huntsberger, T. Arslan, and S. Vijayakumar, Eds. Los Alamitos, CA: IEEE Computer Society, 2008, pp. 41–46.
- W. Maass and A. M. Zador, “Dynamic stochastic synapses as computational units,” Neural Computation, vol. 11, no. 4, pp. 903–917, 1999.
- J. Urzelai and D. Floreano, “Evolution of adaptive synapses: Robots with fast adaptive behavior in new environments,” Evol. Comput., vol. 9, no. 4, pp. 495–524, December 2001.
- R. D. Beer, “Toward the evolution of dynamical neural networks for minimally cognitive behavior,” in From Animals to Animats 4: Proceedings of the 4th International Conference on the Simulation of Adaptive Behavior, P. Maes, M. Mataric, J. Meyer, J. Pollack, and S. Wilson, Eds. Cambridge, MA, USA: MIT press, 1996, pp. 421–429.
- G. S. Snider, “Spike-timing-dependent learning in memristive nanodevices,” in Proceedings of the 2008 IEEE International Symposium on Nanoscale Architectures, ser. NANOARCH ’08. Washington, DC, USA: IEEE Computer Society, 2008, pp. 85–92.
- A. Afifi, A. Ayatollahi, and F. Raissi, “STDP implementation using memristive nanodevice in CMOS-nano neuromorphic networks,” IEICE Electronics Express, vol. 6, no. 3, pp. 148–153, 2009.
- M. D. Pickett, D. B. Strukov, J. L. Borghetti, J. J. Yang, G. S. Snider, D. R. Stewart, and R. S. Williams, “Switching dynamics in titanium dioxide memristive devices,” Journal of Applied Physics, vol. 106, no. 7, p. 074508, 2009.
- M. Rocha, P. Cortez, and J. Neves, “Evolutionary neural network learning,” in Progress in Artificial Intelligence, ser. Lecture Notes in Computer Science. Springer Berlin / Heidelberg, 2003, vol. 2902, pp. 24–28.
- I. Rechenberg, Evolutionsstrategie: optimierung technischer systeme nach prinzipien der biologischen evolution. Stuttgart, Germany: Frommann-Holzboog, 1973.
- T. Takagi and M. Sugeno, “Fuzzy Identification of Systems and Its Applications to Modeling and Control,” IEEE Transactions on Systems, Man, and Cybernetics, vol. 15, no. 1, pp. 116–132, feb. 1985.
- J. de Jesus Rubio, “Sofmls: Online self-organizing fuzzy modified least-squares network,” Fuzzy Systems, IEEE Transactions on, vol. 17, no. 6, pp. 1296 –1309, dec. 2009.
- M. Fazle Azeem, M. Hanmandlu, and N. Ahmad, “Structure identification of generalized adaptive neuro-fuzzy inference systems,” Fuzzy Systems, IEEE Transactions on, vol. 11, no. 5, pp. 666 – 681, oct. 2003.
- C. Mattiussi and D. Floreano, “Analog genetic encoding for the evolution of circuits and networks,” Evolutionary Computation, IEEE Transactions on, vol. 11, no. 5, pp. 596 –607, oct. 2007.
- J.-Y. Jung and J. Reggia, “Evolutionary design of neural network architectures using a descriptive encoding language,” Evolutionary Computation, IEEE Transactions on, vol. 10, no. 6, pp. 676 –688, dec. 2006.
- K. O. Stanley and R. Miikkulainen, “Evolving neural networks through augmenting topologies,” Evolutionary Computation, vol. 10, no. 2, pp. 99–127, 2002.
- K. Stanley, B. Bryant, and R. Miikkulainen, “Real-time neuroevolution in the nero video game,” Evolutionary Computation, IEEE Transactions on, vol. 9, no. 6, pp. 653 – 668, dec. 2005.
- J. Alonso, F. Alvarruiz, J. Desantes, L. Hernandez, V. Hernandez, and G. Molto, “Combining neural networks and genetic algorithms to predict and reduce diesel engine emissions,” Evolutionary Computation, IEEE Transactions on, vol. 11, no. 1, pp. 46 –55, feb. 2007.
- E. Tuci, G. Massera, and S. Nolfi, “Active categorical perception of object shapes in a simulated anthropomorphic robotic arm,” Evolutionary Computation, IEEE Transactions on, vol. 14, no. 6, pp. 885 –899, dec. 2010.
- K.-Y. Im, S.-Y. Oh, and S.-J. Han, “Evolving a modular neural network-based behavioral fusion using extended vff and environment classification for mobile robot navigation,” Evolutionary Computation, IEEE Transactions on, vol. 6, no. 4, pp. 413 – 419, aug 2002.
- P. Durr, C. Mattiussi, and D. Floreano, “Genetic representation and evolvability of modular neural controllers,” Computational Intelligence Magazine, IEEE, vol. 5, no. 3, pp. 10 –19, aug. 2010.
- R. Kohavi and G. John, “Wrappers for feature subset selection,” Artificial Intelligence, vol. 97, no. 1, pp. 273–324, 1997.
- D. Koller and M. Sahami, “Toward optimal feature selection,” in International Conference on Machine Learning, L. Saitta, Ed., 1996, pp. 284–292.
- C. P. Dolan and M. G. Dyer, “Toward the evolution of symbols,” in Genetic Algorithms and their Applications (ICGA’87), J. J. Grefenstette, Ed. Hillsdale, New Jersey: Lawrence Erlbaum Associates, 1987, pp. 123–131.
- O. Michel, “Webots: Professional mobile robot simulation,” International Journal of Advanced Robotic Systems, vol. 1, no. 1, pp. 39–42, 2004.
- J. Craighead, R. Murphy, J. Burke, and B. Goldiez, “A survey of commercial open source unmanned vehicle simulators,” in Robotics and Automation, 2007 IEEE International Conference on, S. Hutchinson, Ed., apr 2007, pp. 852 –857.
- P. K. Kim, P. Vadakkepat, T.-H. Lee, and X. Peng, “Evolution of control systems for mobile robots,” in Proceedings of the 2002 Congress on Evolutionary Computation CEC2002, D. B. Fogel, M. A. El-Sharkawi, X. Yao, G. Greenwood, H. Iba, P. Marrow, and M. Shackleton, Eds. IEEE Press, 2002, pp. 617–622.
- R. A. Téllez and C. Angulo, “Progressive design through staged evolution,” in Frontiers in Evolutionary Robotics, H. Iba, Ed. Vienna: I-Tech Education and Publishing, April 2008, ch. 20, pp. 353–378. [Online]. Available: http://www.intechopen.com/articles/show/title/frontiers_in_evolutionary_robotics
- R. W. Paine and J. Tani, “How hierarchical control self-organizes in artificial adaptive systems,” Adaptive Behavior, vol. 13, no. 3, pp. 211–225, 2005.
- J. Pérez-Carrasco, C. Zamarreño-Ramos, T. Serrano-Gotarredona, and B. Linares-Barranco, “On neuromorphic spiking architectures for asynchronous stdp memristive systems,” in Circuits and Systems (ISCAS), Proceedings of 2010 IEEE International Symposium on, 2010, pp. 1659 –1662.
- J. Blynel and D. Floreano, “Exploring the t-maze: Evolving learning-like robot behaviors using ctrnns,” in Applications of Evolutionary Computing, ser. Lecture Notes in Computer Science, S. Cagnoni, C. Johnson, J. Cardalda, E. Marchiori, D. Corne, J.-A. Meyer, J. Gottlieb, M. Middendorf, A. Guillot, G. Raidl, and E. Hart, Eds. Springer Berlin / Heidelberg, 2003, vol. 2611, pp. 173–176.
- A. Soltoggio, J. A. Bullinaria, C. Mattiussi, P. Dürr, and D. Floreano, “Evolutionary Advantages of Neuromodulated Plasticity in Dynamic, Reward-based Scenarios,” in Proceedings of the 11th International Conference on Artificial Life (Alife XI), S. Bullock, J. Noble, R. Watson, and M. A. Bedau, Eds. Cambridge, MA: MIT Press, 2008, pp. 569–576.
- E. R. Kandel, J. H. Schwartz, and T. M. Jessell, Principles of Neural Science, 4th ed. McGraw-Hill Medical, Jul. 2000.
- A. M. Thomson and J. Deuchars, “Temporal and spatial properties of local circuits in neocortex,” Trends in Neurosciences, vol. 17, no. 3, pp. 119 – 126, 1994.