Generalisation of structural knowledge in the Hippocampal-Entorhinal system


James C.R. Whittington*
University of Oxford, UK
james.whittington@magd.ox.ac.uk
Timothy H. Muller*
University of Oxford, UK
timothy.muller@magd.ox.ac.uk
Caswell Barry
University College London, UK
caswell.barry@ucl.ac.uk
\ANDTimothy E.J. Behrens
University of Oxford, UK
behrens@fmrib.ox.ac.uk
Abstract

A central problem in understanding intelligence is generalisation: exploiting previously learnt structure to solve tasks in novel situations that differ in their particularities. We take inspiration from neuroscience, specifically the Hippocampal-Entorhinal system (containing place and grid cells), known to be important for generalisation. We propose that to generalise structural knowledge, the representations of the structure of the world, i.e. how entities in the world relate to each other, need to be separated from representations of the entities themselves. We show that, under these principles, artificial neural networks embedded with hierarchy and fast Hebbian memory can learn the statistics of memories, generalise structural knowledge, and exhibit neuronal representations mirroring those found in the brain. We experimentally support the model's assumptions, showing a preserved relationship between grid and place cells across environments.

 




Preprint. Work in progress.

1 Introduction

Animals have an innate ability to flexibly take knowledge from one domain and transfer it to another. This is not yet the case for machines. The advantages of transferring knowledge are clear: quick inferences are possible in new situations, so one does not always have to learn afresh. Transfer of statistical structure (the relationships between objects in the world) is particularly useful, as it imbues an agent with the ability to fit together things or concepts that share the same statistical structure but differ in their particularities. For example, when one hears a story, one can fit it in with what one already knows about stories in general, such as that there is a beginning, a middle and an end; when the funny story appears while listening to the news, it can be inferred that the programme is about to end.

Generalisation is a topic of much interest. Advances in machine learning and artificial intelligence (AI) have been very impressive Krizhevsky et al. [2012], Mnih et al. [2015]; however, there is scepticism as to whether the 'true' underlying structure is being learned. We propose that in order to learn and generalise structural knowledge, this structure must be represented explicitly, i.e. separated from the representations of the sensory objects in the world. In worlds that share the same structure but contain different sensory objects, the explicitly represented structure can be combined with sensory information in a conjunctive code. This allows new sensory observations to fit with prior learned structural knowledge, which leads to generalisation.

In order to understand how we may construct such a system, we take inspiration from neuroscience. The hippocampus is known to be important for generalisation, memory, problems of causality, inferential reasoning, transitive reasoning, one-shot imagination, and navigation Dusek and Eichenbaum [1997], Buckmaster et al. [2004], Hassabis et al. [2007]. We propose that the statistics of memories in hippocampus are extracted by cortex McClelland et al. [1995], and that future hippocampal representations/memories are constrained to be consistent with the learned structural knowledge. We find this an interesting system to model using artificial neural networks (ANNs), as it may offer insights into the problem of generalisation for machines, further our understanding of the biological system itself, and continue to link Neuroscience and AI research Hassabis et al. [2017], Whittington and Bogacz [2017].

For spatial navigation, we have a good understanding of neuronal representations in the form of place cells (Hippocampus) and grid cells (Medial Entorhinal Cortex). Thus when modelling this system, we start with problems akin to navigation so that we can both leverage and compare our results to the known representational information. Place cells O'Keefe et al. [1971], and subsequently grid cells Hafting et al. [2005], have had a radical impact on the field of neuroscience, and led to the 2014 Nobel Prize in Physiology or Medicine. Place cells and grid cells are similar in that they have a consistent firing pattern for specific regions of space. Place cells tend to fire in only a single location (or a couple of locations) in a given environment, whereas grid cells fire in a regular lattice pattern that tiles the space. These cells cemented the idea of a 'cognitive map', in which an animal holds an internal representation of the space that it navigates. Traditionally these cells were believed to be purely spatial. It has since emerged that place cells can code for entirely non-spatial dimensions such as sound frequency Aronov et al. [2017]. It has also been shown that there are grid-like codes for two-dimensional (2D) non-spatial coordinate systems Constantinescu et al. [2016], and that grid cells which respond in spatial environments also respond in non-spatial environments Aronov et al. [2017]. Thus it seems that the place and grid code is not used solely for spatial cognition, but is perhaps a general way of representing information.

Grid cells may offer a generalisable structural code. Recent results suggest they summarise the statistics of 2D space, either via a PCA of hippocampal place cells Dordek et al. [2016] or as eigenvectors of the successor representation Stachenfeld et al. [2017]. These summary statistics mean that one can generalise rules of 2D-ness beyond 'spatial' space, e.g. if A is close to B, and B is close to C, then we can infer that A and C are also close. Indeed, grid cell representations are similar in environments which share structure (see section 5). Place cells may offer a conjunctive representation. Their activity has been shown to be modulated by the sensory environment as well as by location Komorowski et al. [2009]. Additionally, the place cell code in one environment differs from the code in a structurally identical environment - this is called remapping Bostock et al. [1991], Leutgeb et al. [2005]. Remapping is traditionally thought to be random. However, we propose that place cells are a conjunctive representation between structural (grid cell) input and sensory input, and thus place cells remap to locations consistent with this conjunction.

We implement our proposal in an ANN tasked with predicting sensory observations while walking on 2D graph worlds, where each vertex has an associated sensory experience. To make accurate predictions, the agent must learn the underlying hidden structure of the graphs. We separate structure from sensory identity, proposing that grid cells encode structure, and that place cells form a conjunctive representation between identity and structure (Figure 1). This conjunctive representation forms a memory, which acts as a bridge between structure and identity, and allows the same structural code to be reused in environments with the same statistics but different sensory experiences. Predictive state transitions occur through the grid cells, as the grid code is a natural basis for navigation in 2D spaces Stemmler et al. [2015], Bush et al. [2015]. We combine Hebbian learning, allowing for rapid formation of episodic memories, with gradient descent, which slowly learns to extract the statistics of these memories. We focus on 2D graphs so we can compare our learned representations to the well-characterised neuronal representations of space (place and grid cells), noting, however, that our approach is general to any graph structure. We further present analyses of a remapping experiment Barry et al. [2012], which support our model assumptions, showing that place cells remap to locations consistent with the grid code, i.e. not randomly as previously thought.

2 Related work

Concurrently developed papers have discovered grid-like and/or place-like representations in ANNs Banino et al. [2018], Cueva and Wei [2018]. Neither paper uses memory or explains place cell phenomena. Both, however, use supervised learning in order to discover grid-like representations. They supervise either on actual coordinates Cueva and Wei [2018], or more notably on ground truth place cells Banino et al. [2018]. It is perhaps not surprising that one discovers spatial representations when the network is explicitly supervised on spatial representations. In our model we perform unsupervised learning and do not provide any external information to the network, other than the available actions, the current action and the immediate sensory data. These are all information available to a biological agent, unlike coordinates or ground truth spatial representations. We further propose a role for grid cells in generalisation, not just navigation.

Our modelling approach is similar to Gemici et al. [2017], Authors et al. [2018]. However, we choose our memory storage and addressing to be computationally and biologically plausible (rather than using other types of differentiable memory more akin to RAM), and we use hierarchical processing. This enables our model to discover representations that are useful both for navigating and for addressing memories. We are also explicit in separating out the abstract structure of the space from any specific content (Figure 1).

We follow a similar ideology to complementary learning systems McClelland et al. [1995], where the statistics of memories in hippocampus are extracted by cortex. We additionally propose that this learnt structural knowledge constrains hippocampal representations in new contexts, allowing reuse of learnt knowledge.

3 Model

We consider an agent passively moving on a 2D graph. On each vertex of the graph there is a non-unique sensory stimulus, e.g. an image. If the agent wants to 'understand' its environment, it wishes to maximise its model's probability of observing the correct stimulus. The agent is trained on many environments sharing the same structure, i.e. 2D graphs, but with different stimulus distributions (each vertex is randomly assigned a stimulus). The agent can take various approaches to this problem; for example, it could remember all pairwise associations between stimuli. However, this approach does not exploit the underlying structure of the task - the 2D-ness of the space. A generalisable approach would be to have an abstract representation of location in the space, and then to place a memory of what stimulus was observed at that location. Since the agent understands where it is in space, this allows accurate state predictions for previously visited vertices even if the agent has never travelled along that particular edge before.
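The task setup above can be sketched in a few lines. Environment size, stimulus count and action encoding here are illustrative assumptions, not the paper's exact values:

```python
import numpy as np

def make_environment(width, n_stimuli, rng):
    """Randomly assign a (non-unique) stimulus index to each vertex
    of a width x width grid graph."""
    return rng.integers(0, n_stimuli, size=(width, width))

def random_walk(env, n_steps, rng):
    """Return a list of (action, observed stimulus) pairs.
    Actions: 0 stay, 1 up, 2 down, 3 left, 4 right."""
    width = env.shape[0]
    pos = np.array([rng.integers(width), rng.integers(width)])
    moves = [(0, 0), (-1, 0), (1, 0), (0, -1), (0, 1)]
    trajectory = []
    for _ in range(n_steps):
        a = int(rng.integers(5))
        nxt = pos + moves[a]
        if (nxt >= 0).all() and (nxt < width).all():  # stay inside the square
            pos = nxt
        trajectory.append((a, int(env[pos[0], pos[1]])))
    return trajectory
```

The agent never sees its coordinates; it only receives the action taken and the resulting stimulus, from which the underlying 2D structure must be inferred.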

Figure 1: Separated structural and sensory representations combined in a conjunctive code. LEC/MEC: Lateral/Medial Entorhinal cortex, HPC: Hippocampus.

We propose that the grid cell representation encodes a location in abstract space, and that the place cell representation is useful for the formation of fast episodic memories. To link the stimulus with that location, the memory should be a conjunction of abstract location and sensory stimulus; thus we propose that place cells form a conjunctive representation between the sensorium and the grid input (Figure 1). This is consistent with experimental evidence Komorowski et al. [2009]. We posit that this is done hierarchically across spatial frequencies, such that the higher frequency statistics can be repeatedly used across space. This greatly reduces the number of weights that need to be learnt. This proposition is consistent with the fact that hierarchical scales are observed within both grid Stensola et al. [2012] and place cells Kjelstrup et al. [2008], with the Entorhinal cortex receiving sensory information at hierarchical temporal scales.

We think of the sensory data, the item/object experience of a state, as coming from the 'what' stream via the Lateral Entorhinal Cortex. The grid cells in our model are the 'where' stream, coming from the Medial Entorhinal Cortex. Our conjunctive memory links 'what' to 'where'.

We consider the agent to have the following generative model (graphical model shown in Figure 2(a)): $p(x_{1:T}, g_{1:T} \mid M) = \prod_t p(x_t \mid g_t, M_{t-1})\, p(g_t \mid g_{t-1})$.

Here $x_t$ is the instantaneous sensory stimulus (presented as a one-hot vector) and $g_t$ the grid cells. $M_t$ represents the agent's memory, composed from place cells ($p_t$ in Figure 2). For clarity we have omitted passive actions and distributional parameters. We use approximate inference, with the recognition distribution $q(g_{1:T} \mid x_{1:T})$. Following Gemici et al. [2017] (Appendix A), we end up with the free energy $\mathcal{F} = \sum_t \mathcal{F}_t$, with $\mathcal{F}_t$ as a per-time-step free energy. We can then use variational autoencoders to optimise Kingma and Welling [2013], Rezende et al. [2014].

Thus we view the Entorhinal-Hippocampal system as a system that performs inference. The grid cells make an inference based on their previous estimate of location in abstract space, sensory information that can be linked to previous locations via memory, and information regarding the available actions to be taken (a.k.a. boundary information). The place cells, formalised in section 3.1, are a conjunction between the sensory data $x_t$ and the current location in abstract space $g_t$, and are stored as memories.

We note that an alternate generative model with the place cells as random variables is possible (Figure 2(a) with dashed circles instead of boxes). This formulation offers inspiration for the auxiliary losses introduced later.

(a) Generative model
(b) Inference model/architecture
Figure 2: Circled/boxed variables are stochastic/deterministic. Red arrows indicate additional inference dependencies.

3.1 Details of approach

With the above in mind, we discuss details of the model and their intuition. Where possible we denote a layer of activations with vector notation, e.g. $x$ or $g$; otherwise variables with subscripts represent elements of the corresponding vector, e.g. $g_i$. We use $w$ to denote scalar weights, $W$ for matrices and $b$ for biases. The sensory data $x_t$ is a one-hot vector where each of its elements represents a particular sensory identity $s$. We consider place and grid cells, $p$ and $g$ respectively, to come in different frequencies indexed by the superscript $f$. A grid cell in a given frequency is denoted by $g_i^f$, where the index $i$ runs over the number of grid cells in that frequency, $n_g^f$. A place cell also has a particular sensory preference - we denote this by $p_{i,s}^f$, where the index $i$ has the same meaning as above, i.e. there are $n_g^f n_s$ place cells for frequency $f$ (with $n_s$ sensory identities). We now present details of the distributions (in an unconventional order for ease of comprehension), with the rough inference network shown in Figure 2(b). Though we already refer to these layers as grid and place cells, it is important to note that gridness and placeness are not hardcoded - all representations are learned.

Place cell layer in inference arm. We treat these neurons as the conjunction between sensorium and structural information from the grid cells: $p^f = f_p(\hat g^f \odot \hat x^f)$, where $\hat g^f$ and $\hat x^f$ are $g^f$ and $w_p \bar x^f$, i.e. repeated $n_s$ times and tiled $n_g^f$ times respectively to have the correct dimensions. $w_p$ is a learnable parameter and $\bar x^f$ is a normalised version of $x^f$. The sensorium for each frequency, $x^f$, is obtained via exponential smoothing of the sensory data, $x^f_t = \alpha^f x^f_{t-1} + (1 - \alpha^f) x_t$. Thus we can obtain hierarchical temporal scales of the sensory data via different smoothing constants $\alpha^f$. The separation into hierarchical scales helps to provide a unique code for each position, even if the same stimulus appears in several locations of one environment, since the surrounding stimuli, and therefore the lower frequency place cells, are likely to be different. Since the place cell representations form memories, one can utilise the hierarchical scales for memory retrieval. We choose the function $f_p$ to ensure that the only neurons active are those 'consistent' with both the sensorium and the structural information. This also sparsifies our memories and prevents interference. We understand that the exponential smoothing is over-simplified and far from an optimal sensorium, but we find it a simple way, with minimal assumptions, to approach this problem.
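A minimal sketch of the smoothing and conjunction steps for one frequency, assuming a ReLU stands in for the rectifying nonlinearity and omitting the learnable scaling and normalisation:

```python
import numpy as np

def smooth_sensorium(x_onehot, x_prev, alpha):
    """Exponentially smooth the one-hot input; each frequency uses its
    own constant alpha (illustrative form)."""
    return alpha * x_prev + (1.0 - alpha) * x_onehot

def conjunctive_place(g, x_smooth):
    """Conjunction for one frequency: repeat the grid code over stimuli,
    tile the sensorium over grid cells, multiply elementwise, rectify."""
    g_rep = np.repeat(g, x_smooth.size)    # [g1,g1,...,g2,g2,...]
    x_til = np.tile(x_smooth, g.size)      # [x1,x2,...,x1,x2,...]
    return np.maximum(0.0, g_rep * x_til)  # active only where both agree
```

A place unit is nonzero only when its grid phase and its preferred stimulus are simultaneously active, which is what sparsifies the memories.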

Place cell layer in generative arm. We extract stored memories via an attractor network of the form $p_{\tau+1} = f_p(M p_\tau)$, where $\tau$ is the iteration of the attractor network. The input $p_0$ is the grid code $g$ with its dimensions scaled appropriately (by repetition, as above). We denote the output of the network as $\hat p_t$.
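The retrieval loop can be sketched as follows; the scaling constant `kappa` and the iteration count are assumed hyperparameters, not taken from the paper:

```python
import numpy as np

def attractor_retrieve(M, cue, n_iters=5, kappa=0.8):
    """Iterate p <- relu(kappa * M @ p) from a cue so the network settles
    towards a stored pattern."""
    p = cue.astype(float)
    for _ in range(n_iters):
        p = np.maximum(0.0, kappa * (M @ p))
    return p
```

Because the memory matrix links patterns to cues, the same loop serves both the generative arm (grid-cued) and the 'reverse' sensory-cued retrieval described under grid inference.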

Data generation. We treat $p(x_t \mid p_t)$ as a categorical distribution. From the 'generated' place cells $\hat p^f_t$, we estimate the category probabilities via a softmax of $W^f_x \hat p^f_t$, where $W^f_x$ is a learnable parameter for each frequency. The predicted sensory category is then the argmax over this distribution (indexed by $s$).
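A sketch of the categorical read-out, with `W_x` standing in for the per-frequency learnable parameter:

```python
import numpy as np

def predict_stimulus(p_gen, W_x):
    """Map generated place cells to logits over sensory identities and
    normalise with a (numerically stable) softmax."""
    logits = W_x @ p_gen
    e = np.exp(logits - logits.max())
    return e / e.sum()
```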

Grid generation. We specify the state transition distribution as $p(g_t \mid g_{t-1}, a_t)$, where $a_t$ is the action taken. We define the mean as $\mu_{g_t} = f_g(W_a g_{t-1})$, where $W_a$ is an action-dependent weight matrix and connections are only allowed from low frequencies to the same or higher frequencies. $f_g$ is a simple, per-frequency, neural network with $g_{t-1}$ as its input. Again we separate into different hierarchical scales so that high frequency statistics can be reused across lower frequency statistics.
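A sketch of the action-conditioned transition; the `tanh` nonlinearity and the explicit frequency mask are illustrative assumptions:

```python
import numpy as np

def grid_transition(g_prev, action, W_a, freq_mask):
    """Mean of p(g_t | g_{t-1}, a_t): an action-indexed weight matrix,
    masked so connections run only from low to same-or-higher
    frequencies."""
    return np.tanh((W_a[action] * freq_mask) @ g_prev)
```

Masking the weights, rather than changing the architecture per action, keeps one transition function whose connectivity respects the frequency hierarchy.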

Grid inference. We factorise the posterior as $q(g_t) \propto q(g_t \mid g_{t-1}, a_t)\, q(g_t \mid x_t)\, q(g_t \mid e_t)$, where $e_t$ is information about the available actions and $q(g_t \mid e_t)$ is normally distributed with both mean and variance a simple linear function of $e_t$. $q(g_t \mid g_{t-1}, a_t)$ is equivalent to the generative distribution. $q(g_t \mid x_t)$ provides information on location from the current sensorium. Since memories are a conjunction between location and sensorium, a memory contains information regarding location. Thus we can run our attractor network in 'reverse', i.e. we use the sensorium as the input, to retrieve a memory associated with the current sensorium. We use a per-frequency linear mapping from the retrieved memory to give the mean of the distribution. The variance of the distribution, per frequency, is a function (a simple neural network) of how well (summed squared error) the retrieved memory is able to predict back $x_t$, as per 'data generation'. Thus if we have seen the sensorium before, we will be able to retrieve a memory, whose quality we can assess via its ability to predict back the sensorium. If we are able to do so successfully, then we can be more confident that our memory is informative about current location. This factored posterior of three Gaussians gives another Gaussian distribution with a precision-weighted mean. Thus when we are unconfident of our generated current position, we can use sensory information to refine our current position estimate.
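The precision-weighted combination of Gaussian factors can be illustrated with scalars: a product of Gaussian densities is Gaussian, with precisions summing and the mean weighted by precision.

```python
def combine_gaussians(means, variances):
    """Combine independent Gaussian factors (e.g. transition, sensory and
    action terms of the posterior) into one Gaussian: the result has the
    sum of the precisions and a precision-weighted mean."""
    precs = [1.0 / v for v in variances]
    total = sum(precs)
    mean = sum(pr * m for pr, m in zip(precs, means)) / total
    return mean, 1.0 / total
```

A factor with low variance (e.g. a confidently retrieved memory) dominates the combined estimate, which is exactly the refinement behaviour described above.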

Memory storage. Memories are stored in a Hebbian fashion, similar to Ba et al. [2016]. Here $p_t$ are place cells in the inference arm - the memories we wish to store; $\hat p_t$ are retrieved memories when $g_t$ is the input to the attractor network (place cells in the generative arm) and $\tilde p_t$ are the retrieved memories when the sensorium is the input to the attractor network (see 'Grid inference'). The memory matrix is updated as follows: $M_t = \lambda M_{t-1} + \eta \big[ (p_t - \hat p_t)\hat p_t^{\top} + (p_t - \tilde p_t)\tilde p_t^{\top} \big]$. The first outer product links the location with the memory, and the second outer product links the sensorium with the memory. Connections from high to low frequencies are set to zero for hierarchical memory retrieval. We choose a Hebbian-like learning rule as we wish the agent to be able to perform rapid learning when entering a new environment. $\lambda$ and $\eta$ are the rates of forgetting and remembering respectively. Alternative update rules that also work are ones that are more 'traditionally Hebbian', i.e. contain terms such as $p_t p_t^{\top}$.
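A hedged sketch of the storage rule; the exact error-driven form of the outer products is an assumption based on the description above, and the default rates are illustrative:

```python
import numpy as np

def hebbian_update(M, p, p_hat, p_tilde, lam=0.9999, eta=0.5):
    """One-shot memory write: decay old memories by lam and add outer
    products linking the new place code p to the grid-cued retrieval
    p_hat and the sensory-cued retrieval p_tilde."""
    return lam * M + eta * (np.outer(p - p_hat, p_hat)
                            + np.outer(p - p_tilde, p_tilde))
```

A single write suffices to bias later retrieval from the same cue towards the stored pattern, which is what supports one-shot learning in a new environment.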

3.2 Model implications

We offer a solution to the problem of how structural codes are shared, via grid cells, with remapped place cells. Even when the structure is the same, since sensory stimuli in other environments are different, the conjunctive code is different. Thus we believe that place cell remapping is not random; instead, place cells are chosen that are consistent with grid and sensory codes. This is a different notion to other models of remapping Abbott et al. [2011], where there is random generation of place and grid cells. Learning starts afresh in each environment for these randomly generated cells. Our method allows for dramatically faster learning, as learnt structure can be re-used in new environments. In section 5, we present experimental evidence in concordance with our model.

In addition to offering a novel theory for place cell remapping, our model also provides an explanation for what determines place field sizes. Specifically, a given place cell will be active in the regions of space that are consistent both with the grid representation (structure) received by that place cell and with the sensory experience coded for by that place cell. It further offers an explanation for why a given place cell may have multiple place fields within one environment, as there may be multiple locations where this consistency holds. Therefore our model offers a novel framework for designing experiments to manipulate place field sizes and locations, for example based on simultaneously recorded grid cells and environmental cues.

We believe that using more biologically realistic computational mechanisms (e.g. Hebbian Memory, no LSTM) will facilitate further incorporation of neuroscience-inspired phenomena, such as successor representations or replay, which may be useful for building AI systems.

4 Model experiments

Although we have presented a Bayesian formulation, best results were obtained by using a network of identical architecture but using only the means of the above distributions - i.e. not sampling from the distributions. We then use the following surrogate per-time-step loss function:

$L_t = L_x + L_{\hat p} + L_{\tilde p} + \dots$, where $L_x$ is a cross-entropy loss, and $L_{\hat p}$ and $L_{\tilde p}$ are squared-error losses. The latter encourage the retrieved memories (in either forward or reverse manner) to be the same as the 'inferred' memory. A further loss encourages the network to represent information hierarchically. We also add a prediction-of-the-next-time-step loss for both $\hat p$ and $\tilde p$, as well as regularising the grid cell activity with an l2 loss. We use BPTT truncated to 25 steps, and optimise with ADAM Kingma and Ba [2014]. We use several different frequencies, with the corresponding smoothing constants $\alpha^f$ chosen in advance; we choose not to learn them here. Our environments are square, with a range of possible widths. The agent changes to a completely new environment when it consistently reaches a threshold prediction accuracy, or after a fixed number of steps. The agent has a slight bias for straight paths to facilitate exploration. $\lambda$ and $\eta$ are set to fixed constants. We normalise $x^f$ by subtracting the mean and then dividing by the l2-norm, to give $\bar x^f$. $a_t$ is a direction signal: the agent can move up, down, left, right, or stay still. We did not optimise hyperparameters, as our aim is not to show stellar performance, but rather the computational and generalisable principles of this approach.
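A toy version of such a surrogate loss; the weighting and the omitted auxiliary terms are assumptions for illustration:

```python
import numpy as np

def surrogate_loss(x_true, x_pred, p, p_hat, p_tilde, g, w_reg=1e-3):
    """Per-time-step surrogate loss: cross entropy on the predicted
    stimulus, squared errors pulling both retrieved memories towards the
    inferred one, and an l2 penalty on grid activity."""
    ce = -float(np.sum(x_true * np.log(x_pred + 1e-12)))
    mem = float(np.sum((p - p_hat) ** 2) + np.sum((p - p_tilde) ** 2))
    reg = w_reg * float(np.sum(g ** 2))
    return ce + mem + reg
```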

(a) Schematic
(b) Grid cells
(c) Banding
Figure 3: Top panels are from one environment; bottom panels from another. (a) Schematic of the two environments (not actual size). (b) The same three grid cells in each environment. High frequency grid on the left, lower frequency on the right. These are square as we have chosen a four-way embedding of actions and a four-connected space. (c) Banded cell.

We show the representations learned by our network in Figures 3 and 4 by plotting spatial activity maps of particular neurons. In Figure 3(b) we present the grid cells. The top panel shows cells from one environment, and the bottom panels from a different and slightly smaller environment. We see that our network chooses to represent information in a grid-like pattern (square grids, as the statistics of our space are square). We can also observe spatial firing fields at different frequencies. We see that the representations are consistent across environments, regardless of their size - thus we have a generalisable representation of 2D space, not just a template of a particular size of environment. Figure 3(c) shows banded cells from our model, which are also observed in the brain alongside grid cells Krupic et al. [2012].

(a) High frequency
(b) Low frequency
Figure 4: Place cell remapping for a high and a lower frequency cell from the model. Env: Environment.

We observe the spontaneous appearance of phases in the grid cells (middle and right panels of Figure 3(b)). That is, each grid representation is a shifted version of another at the same frequency. The separation into different phases means that two place cells that respond to the same stimulus will not necessarily be active simultaneously when the stimulus is present. Each cell will only be active when its corresponding grid phase is active. This means that one can uniquely code for the same stimulus in many different locations. Across two environments, a given stimulus may occur at different locations, and therefore different grid phases. Thus, due to their conjunctive nature, a place cell may remap across environments. We show this in Figure 4.

(a) Zero-shot link inference
(b) One-shot learning
Figure 5: Accuracy at visited nodes when the link was previously unseen. Black dashed line is chance. (a) Returning to a node, as a function of the fraction of nodes visited. (b) First time staying still at a node, as a function of the number of times the node has been visited.

If we have learned the structure of our space, we will be able to make next-step predictions for previously visited nodes even if we are travelling along a previously non-traversed edge in the graph - i.e. we infer a link in the graph. We present such data in Figure 5(a). We plot the prediction accuracy of these link inferences as a function of the fraction of the total nodes visited in the graph. We achieve better-than-chance prediction, which increases steadily as more nodes are visited. The reason we do not immediately have a high accuracy, regardless of the fraction of nodes visited, is either that the Hebbian learning is not one-shot, or that the agent is unsure of its position. To test one-shot learning, we consider occasions when the agent stays still for the first time at a node, as a function of the number of times that node has already been visited (Figure 5(b)). We see that the agent is able to predict at high accuracy even if it has only just visited the node for the first time. This indicates we are able to do one-shot learning with our Hebbian memory. Combined with Figure 5(a), it also indicates that the agent takes some time to be sure of its location.

5 Data analysis

Our framework predicts that place cells and grid cells retain their relationship across environments, allowing generalisation of structure encoded by grid cells. We empirically test this prediction on data from a remapping experiment Barry et al. [2012] where both place and grid cells were recorded from rats in two different environments. The environments were of the same dimensions (1m by 1m) but differed in their sensory (texture/visual/olfactory) cues so the animals could distinguish between them. Each of seven rats has recordings from both environments. Recordings on each day consist of five twenty-minute trials: the first and last trials in one environment and the intervening three trials in a second environment. We test the prediction that a given place cell retains its relationship with a given grid cell across environments using two measures: first, whether grid cell activity at the position of peak place cell activity is correlated across environments (gridAtPlace), and second, whether the minimum distance between the peak place cell activity and a peak of grid cell activity is correlated across environments (minDist; normalised to the corresponding grid scale). To account for potential confounds or biases (e.g. border effects, inaccurate peaks), we fit the recorded grid cell rate maps to an idealised grid cell formula Stemmler et al. [2015], and use this ideal grid rate map to give grid cell firing rates and locations of grid peaks. Only grid cells with high grid scores were used, to ensure good ideal grid fits to the data, and we excluded grid cells with large scales, both computed as in Barry et al. [2012]. Locations of place cell peaks were simply defined as the location of maximum activity in a given cell's rate map. To account for border effects, we removed place cells that had peaks close to borders.
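The two measures can be sketched on toy rate maps as follows; the peak definitions here are simplified relative to the ideal-grid fitting described above:

```python
import numpy as np

def grid_at_place(place_map, grid_map):
    """gridAtPlace: grid firing rate at the place cell's peak location."""
    peak = np.unravel_index(np.argmax(place_map), place_map.shape)
    return float(grid_map[peak])

def min_dist(place_map, grid_peaks, grid_scale):
    """minDist: distance from the place peak to the nearest grid peak,
    normalised by the corresponding grid scale."""
    peak = np.array(np.unravel_index(np.argmax(place_map), place_map.shape))
    dists = np.linalg.norm(np.asarray(grid_peaks) - peak, axis=1)
    return float(dists.min() / grid_scale)
```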

Our framework predicts a preserved relationship between place and grid cells of the same spatial scale (module). However, since we do not know the modules of the recorded cells, we can only expect a non-random relationship across the entire population. For each measure, we compute its value for every place cell-grid cell pair (for both trials). A correlation across trials is then performed on these values. To test the significance of this correlation and ensure it is not driven by bias in the data, we generate a null distribution by randomly shifting the place cell rate maps and recomputing the measures and their correlation across trials. We then examine where the correlation of the non-shuffled data lies relative to the null.

Figure 6 presents analyses for both the gridAtPlace measure and the minDist measure. The scatter plots show the correlation of a given measure across trials, where each point is a place cell-grid cell pair. The histogram plots show where this correlation (green line) lies relative to the null distribution of correlation coefficients. The p value is the proportion of the null distribution that is greater than the unshuffled correlation.
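The shuffle control can be sketched as follows; the circular-shift scheme, the fixed trial-2 values and the shuffle count are illustrative assumptions:

```python
import numpy as np

def shuffle_null(place_maps, grid_maps, measure, trial2_vals, rng, n_shuffles=500):
    """Null distribution of across-trial correlations: circularly shift
    each trial-1 place map by a random offset, recompute the measure for
    every cell pair and correlate with the (fixed) trial-2 values."""
    null = []
    for _ in range(n_shuffles):
        vals = []
        for pm, gm in zip(place_maps, grid_maps):
            shift = (int(rng.integers(pm.shape[0])), int(rng.integers(pm.shape[1])))
            vals.append(measure(np.roll(pm, shift, axis=(0, 1)), gm))
        null.append(np.corrcoef(vals, trial2_vals)[0, 1])
    return np.array(null)

def empirical_p(observed_corr, null):
    """p value: proportion of the null at least as large as the observed."""
    return float(np.mean(np.asarray(null) >= observed_corr))
```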

(a) Grid at place
(b) Minimum distance
Figure 6: Top panels are for within-environment analyses; bottom panels for across-environment analyses.

As a sanity check, we first confirm these measures are significantly correlated within environments (i.e. across two visits to the same environment - the first and last trials), when the cell populations should not have remapped (Figure 6(a), top and 6(b), top). We then test across environments (i.e. two different environments), to assess whether our predicted non-random remapping relationship between grid and place cells exists (Figure 6(a), bottom and 6(b), bottom). Here we also find significant correlations for both measures for the place cell-grid cell pairs. We note the gridAtPlace result holds across environments when not fitting an ideal grid and when using a wide range of grid score cut-offs (minDist was not calculated without the ideal grid, due to inaccurate grid peaks). Finally, performing the across-environment gridAtPlace analysis with our model rate maps (Figure 7(a)), we observe correlations of 0.3-0.35, which are consistent with those of the data.

(a) Grid at place
(b) Preserved structure
Figure 7: Model-data correspondence. Black/Red: Data/Model. (a) gridAtPlace across environments. (b) Scatter of elements of the grid cell correlation matrices across environments.

To share structure, the relationship between grid cells should be preserved across environments. The grid cell correlation matrix is indeed preserved (i.e. itself correlated) across environments, significantly relative to the null distribution, both in the data Barry et al. [2012] and in our model (Figure 7(b)). These results are consistent with the notion that grid cells encode generalisable structural knowledge.

These are the first analyses demonstrating non-random place cell remapping based on neural activity, and provide evidence for a key prediction of our model: that place cells, despite remapping across environments, retain their relationship with grid cells.

6 Conclusions

Here we have proposed a mechanism for generalisation of structure inspired by the Hippocampal-Entorhinal system. We proposed that one can generalise state-space statistics via explicit separation of structure and stimuli, while using a conjunctive memory representation to link the two. We proposed that spatial hierarchies are utilised to allow for an efficient combinatorial code. We have shown that hierarchical grid-like and place-like representations emerge naturally from our model in a purely unsupervised learning setting. We have shown that these representations are effective both for generalising over state-spaces and for hierarchical memory addressing. We have also presented experimental evidence demonstrating that grid and place cells retain their relationships across environments, which supports our model assumptions. We hope that this work can provide new insights that will allow for advances in AI, as well as providing new predictions, constraints and understanding in Neuroscience.

References

  • Monaco and Abbott [2011] J. D. Monaco and Larry F. Abbott. Modular Realignment of Entorhinal Grid Cell Activity as a Basis for Hippocampal Remapping. Journal of Neuroscience, 31(25):9414–9425, 2011. ISSN 0270-6474. doi: 10.1523/JNEUROSCI.1433-11.2011. URL http://www.jneurosci.org/cgi/doi/10.1523/JNEUROSCI.1433-11.2011.
  • Aronov et al. [2017] Dmitriy Aronov, Rhino Nevers, and David W. Tank. Mapping of a non-spatial dimension by the hippocampal–entorhinal circuit. Nature, 543(7647):719–722, 2017. ISSN 0028-0836. doi: 10.1038/nature21692. URL http://www.nature.com/doifinder/10.1038/nature21692.
  • Fraccaro et al. [2018] Marco Fraccaro, Danilo Jimenez Rezende, Yori Zwols, Alexander Pritzel, S. M. Ali Eslami, and Fabio Viola. Generative Temporal Models with Spatial Memory for Partially Observed Environments. ICML, 2018. URL http://arxiv.org/abs/1804.09401.
  • Ba et al. [2016] Jimmy Lei Ba, Geoffrey Hinton, Volodymyr Mnih, Joel Z. Leibo, and Catalin Ionescu. Using Fast Weights to Attend to the Recent Past. Advances in Neural Information Processing Systems, pages 1–10, 2016. URL http://arxiv.org/abs/1610.06258.
  • Banino et al. [2018] Andrea Banino, Caswell Barry, Benigno Uria, Charles Blundell, Timothy Lillicrap, Piotr Mirowski, Alexander Pritzel, Martin J Chadwick, Thomas Degris, Joseph Modayil, Greg Wayne, Hubert Soyer, Fabio Viola, Brian Zhang, Ross Goroshin, Neil Rabinowitz, Razvan Pascanu, Charlie Beattie, Stig Petersen, Amir Sadik, Stephen Gaffney, and Helen King. Vector-based navigation using grid-like representations in artificial agents. Nature, 557:429–433, 2018. ISSN 1476-4687. doi: 10.1038/s41586-018-0102-6. URL http://dx.doi.org/10.1038/s41586-018-0102-6.
  • Barry et al. [2012] C. Barry, L. L. Ginzberg, J. O’Keefe, and N. Burgess. Grid cell firing patterns signal environmental novelty by expansion. Proceedings of the National Academy of Sciences, 109(43):17687–17692, 2012. ISSN 0027-8424. doi: 10.1073/pnas.1209918109. URL http://www.pnas.org/cgi/doi/10.1073/pnas.1209918109.
  • Bostock et al. [1991] E Bostock, R U Muller, and J L Kubie. Experience-dependent modifications of hippocampal place cell firing. Hippocampus, 1(2):193–205, 1991. ISSN 1050-9631. doi: 10.1002/hipo.450010207.
  • Buckmaster et al. [2004] Cindy A. Buckmaster, Howard Eichenbaum, David G Amaral, Wendy A Suzuki, and Peter R Rapp. Entorhinal Cortex Lesions Disrupt the Relational Organization of Memory in Monkeys. Journal of Neuroscience, 24(44):9811–9825, 2004. ISSN 0270-6474. doi: 10.1523/JNEUROSCI.1532-04.2004. URL http://www.jneurosci.org/cgi/doi/10.1523/JNEUROSCI.1532-04.2004.
  • Bush et al. [2015] Daniel Bush, Caswell Barry, Daniel Manson, and Neil Burgess. Using Grid Cells for Navigation. Neuron, 87(3):507–520, 2015. ISSN 10974199. doi: 10.1016/j.neuron.2015.07.006. URL http://dx.doi.org/10.1016/j.neuron.2015.07.006.
  • Constantinescu et al. [2016] Alexandra O. Constantinescu, Jill X. O’Reilly, and Timothy E. J. Behrens. Organizing conceptual knowledge in humans with a gridlike code. Science, 352(6292):1464–1468, 2016. ISSN 10959203. doi: 10.1126/science.aaf0941.
  • Cueva and Wei [2018] Christopher J. Cueva and Xue-Xin Wei. Emergence of grid-like representations by training recurrent neural networks to perform spatial localization. pages 1–19, 2018. URL http://arxiv.org/abs/1803.07770.
  • Dordek et al. [2016] Yedidyah Dordek, Daniel Soudry, Ron Meir, and Dori Derdikman. Extracting grid cell characteristics from place cell inputs using non-negative principal component analysis. eLife, 5(MARCH2016):1–36, 2016. ISSN 2050084X. doi: 10.7554/eLife.10094.
  • Dusek and Eichenbaum [1997] J. A. Dusek and H. Eichenbaum. The hippocampus and memory for orderly stimulus relations. Proceedings of the National Academy of Sciences, 94(13):7109–7114, 1997. ISSN 0027-8424. doi: 10.1073/pnas.94.13.7109. URL http://www.pnas.org/cgi/doi/10.1073/pnas.94.13.7109.
  • Gemici et al. [2017] Mevlana Gemici, Chia-Chun Hung, Adam Santoro, Greg Wayne, Shakir Mohamed, Danilo J. Rezende, David Amos, and Timothy Lillicrap. Generative Temporal Models with Memory. pages 1–25, 2017. ISSN 1702.04649. URL http://arxiv.org/abs/1702.04649.
  • Hafting et al. [2005] Torkel Hafting, Marianne Fyhn, Sturla Molden, May-Britt Moser, and Edvard I. Moser. Microstructure of a spatial map in the entorhinal cortex. Nature, 436(7052):801–806, 2005. ISSN 00280836. doi: 10.1038/nature03721.
  • Hassabis et al. [2007] Demis Hassabis, Dharshan Kumaran, S. D. Vann, and E. A. Maguire. Patients with hippocampal amnesia cannot imagine new experiences. Proceedings of the National Academy of Sciences, 104(5):1726–1731, 2007. ISSN 0027-8424. doi: 10.1073/pnas.0610561104. URL http://www.pnas.org/cgi/doi/10.1073/pnas.0610561104.
  • Hassabis et al. [2017] Demis Hassabis, Dharshan Kumaran, Christopher Summerfield, and Matthew Botvinick. Neuroscience-Inspired Artificial Intelligence. Neuron, 95(2):245–258, 2017. ISSN 10974199. doi: 10.1016/j.neuron.2017.06.011. URL http://dx.doi.org/10.1016/j.neuron.2017.06.011.
  • Kingma and Ba [2014] Diederik P. Kingma and Jimmy Lei Ba. Adam: A Method for Stochastic Optimization. 2014. URL http://arxiv.org/abs/1412.6980.
  • Kingma and Welling [2013] Diederik P. Kingma and Max Welling. Auto-Encoding Variational Bayes. 2013. URL http://arxiv.org/abs/1312.6114.
  • Kjelstrup et al. [2008] Kirsten Brun Kjelstrup, Trygve Solstad, Vegard Heimly Brun, Torkel Hafting, Stefan Leutgeb, Menno P. Witter, Edvard I. Moser, and May-Britt Moser. Finite Scale of Spatial Representation in the Hippocampus. Science, 321:140–143, 2008. ISSN 0036-8075.
  • Komorowski et al. [2009] Robert W. Komorowski, Joseph R. Manns, and Howard Eichenbaum. Robust Conjunctive Item-Place Coding by Hippocampal Neurons Parallels Learning What Happens Where. Journal of Neuroscience, 29(31):9918–9929, 2009. ISSN 0270-6474. doi: 10.1523/JNEUROSCI.1378-09.2009. URL http://www.jneurosci.org/cgi/doi/10.1523/JNEUROSCI.1378-09.2009.
  • Krizhevsky et al. [2012] Alex Krizhevsky, Ilya Sutskever, and Geoffrey E Hinton. ImageNet Classification with Deep Convolutional Neural Networks. Advances In Neural Information Processing Systems, pages 1–9, 2012. ISSN 10495258. doi: http://dx.doi.org/10.1016/j.protcy.2014.09.007.
  • Krupic et al. [2012] Julia Krupic, Neil Burgess, and John O’Keefe. Neural Representations of Location Composed of Spatially Periodic Bands. 337(August):853–857, 2012. URL http://www.sciencemag.org/content/337/6096/853.full.pdf.
  • Leutgeb et al. [2005] Stefan Leutgeb, Jill K. Leutgeb, Carol A. Barnes, Edvard I. Moser, Bruce L. McNaughton, and May-Britt Moser. Independent Codes for Spatial and Episodic Memory in Hippocampal Neuronal Ensembles. Science, 309(5734):619–623, jul 2005. ISSN 0036-8075. doi: 10.1126/science.1114037. URL http://www.sciencemag.org/cgi/doi/10.1126/science.1114037.
  • McClelland et al. [1995] James L. McClelland, Bruce L. McNaughton, and Randall C. O’Reilly. Why there are complementary learning systems in the hippocampus and neocortex: Insights from the successes and failures of connectionist models of learning and memory. Psychological Review, 102(3):419–457, 1995. ISSN 0033295X. doi: 10.1037/0033-295X.102.3.419.
  • Mnih et al. [2015] Volodymyr Mnih, Koray Kavukcuoglu, David Silver, Andrei A Rusu, Joel Veness, Bellemare Marc G, Alex Graves, Martin Riedmiller, Andreas K. Fidjeland, Georg Ostrovski, Stig Petersen, Charles Beattie, Amir Sadik, Ioannis Antonoglou, Helen King, Dharshan Kumaran, Daan Wierstra, Shane Legg, and Demis Hassabis. Human-level control through deep reinforcement learning. Nature, 518(7540):529–533, 2015. ISSN 10450823. doi: 10.1038/nature14236. URL http://dx.doi.org/10.1038/nature14236.
  • O’Keefe and Dostrovsky [1971] John O’Keefe and J. Dostrovsky. The hippocampus as a spatial map. Preliminary evidence from unit activity in the freely-moving rat. Brain Research, 34(1):171–175, 1971. ISSN 00068993. doi: 10.1016/0006-8993(71)90358-1.
  • Rezende et al. [2014] Danilo Jimenez Rezende, Shakir Mohamed, and Daan Wierstra. Stochastic Backpropagation and Approximate Inference in Deep Generative Models. 2014. ISSN 10495258. URL http://arxiv.org/abs/1401.4082.
  • Stachenfeld et al. [2017] Kimberly L. Stachenfeld, Matthew M. Botvinick, and Samuel J. Gershman. The hippocampus as a predictive map. Nature Neuroscience, 20(11):1643–1653, 2017. ISSN 15461726. doi: 10.1038/nn.4650.
  • Stemmler et al. [2015] Martin Stemmler, Alexander Mathis, and Andreas Herz. Connecting Multiple Spatial Scales to Decode the Population Activity of Grid Cells. Science Advances, in press(December):1–12, 2015. ISSN 2375-2548. doi: 10.1126/science.1500816.
  • Stensola et al. [2012] Hanne Stensola, Tor Stensola, Trygve Solstad, Kristian FrØland, May-Britt Moser, and Edvard I. Moser. The entorhinal grid map is discretized. Nature, 492(7427):72–78, 2012. ISSN 00280836. doi: 10.1038/nature11649. URL http://dx.doi.org/10.1038/nature11649.
  • Whittington and Bogacz [2017] James C. R. Whittington and Rafal Bogacz. An Approximation of the Error Backpropagation Algorithm in a Predictive Coding Network with Local Hebbian Synaptic Plasticity. Neural Comput, 29(5):1229–1262, may 2017. ISSN 0899-7667. doi: 10.1162/NECO_a_00949. URL http://www.mitpressjournals.org/doi/10.1162/NECO_a_00949.

Appendix A Derivation of variational lower bound

We follow the derivation from Gemici et al. [2017]. Writing the observations up to time $T$ as $x_{\leq T}$ and the corresponding latent variables as $z_{\leq T}$, and exploiting Jensen’s inequality, we have

$\log p(x_{\leq T}) = \log \int p(x_{\leq T}, z_{\leq T}) \, dz_{\leq T} \geq \mathbb{E}_{q(z_{\leq T}|x_{\leq T})} \left[ \log p(x_{\leq T}, z_{\leq T}) - \log q(z_{\leq T}|x_{\leq T}) \right]$ (1)

Should we specify the recognition distribution as $q(z_{\leq T}|x_{\leq T}) = \prod_{t=1}^{T} q(z_t \mid z_{<t}, x_{\leq t})$, then we can write the bound as the following

$\log p(x_{\leq T}) \geq \mathbb{E}_{q(z_{\leq T}|x_{\leq T})} \left[ \log p(x_{\leq T}, z_{\leq T}) - \sum_{t=1}^{T} \log q(z_t \mid z_{<t}, x_{\leq t}) \right]$ (2)

Should we define the generative model following $p(x_{\leq T}, z_{\leq T}) = \prod_{t=1}^{T} p(x_t \mid z_t) \, p(z_t \mid z_{<t})$, then we have

$\log p(x_{\leq T}) \geq \sum_{t=1}^{T} \mathbb{E}_{q(z_{\leq T}|x_{\leq T})} \left[ \log p(x_t \mid z_t) + \log p(z_t \mid z_{<t}) - \log q(z_t \mid z_{<t}, x_{\leq t}) \right]$ (3)

Since the summand at time $t$ is not a function of elements from the set $z_{>t}$, we can rewrite the above equation as the following:

$\log p(x_{\leq T}) \geq \sum_{t=1}^{T} \int q(z_{\leq t}|x_{\leq t}) \left[ \log p(x_t \mid z_t) + \log p(z_t \mid z_{<t}) - \log q(z_t \mid z_{<t}, x_{\leq t}) \right] \left( \int q(z_{>t} \mid z_{\leq t}, x_{\leq T}) \, dz_{>t} \right) dz_{\leq t}$ (4)

All inner integrals integrate to 1, and so we are left with the following:

$\log p(x_{\leq T}) \geq \sum_{t=1}^{T} \mathbb{E}_{q(z_{\leq t}|x_{\leq t})} \left[ \log p(x_t \mid z_t) + \log p(z_t \mid z_{<t}) - \log q(z_t \mid z_{<t}, x_{\leq t}) \right]$ (5)

This can be rewritten as:

$\log p(x_{\leq T}) \geq \sum_{t=1}^{T} \mathbb{E}_{q(z_{\leq t}|x_{\leq t})} \left[ \log p(x_t \mid z_t) \right] - \mathbb{E}_{q(z_{<t}|x_{<t})} \left[ D_{\mathrm{KL}} \left( q(z_t \mid z_{<t}, x_{\leq t}) \,\|\, p(z_t \mid z_{<t}) \right) \right]$ (6)

We can see that we have a per-time-step energy function ($\mathcal{L}_t$). We can now optimise in a recurrent network. Including our more specific distributions for $p(x_t \mid z_t)$, $p(z_t \mid z_{<t})$ and $q(z_t \mid z_{<t}, x_{\leq t})$, we then recover the same per-time-step function ($\mathcal{L}_t$) to optimise as presented in the main text.
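For concreteness, the per-time-step term can be evaluated numerically when all distributions are diagonal Gaussians: the expected log-likelihood is approximated with a single reparameterised sample from the recognition distribution, and the KL term has a closed form. This is a minimal sketch under that Gaussian assumption, not the network used in the paper; the function name, the `decode` callable standing in for the generative network, and all parameter names are illustrative.

```python
import numpy as np

def per_timestep_energy(x_t, mu_q, var_q, mu_p, var_p, decode, var_x, rng):
    """One-sample estimate of L_t = E_q[log p(x_t|z_t)] - KL(q || p),
    with q, p and the likelihood all diagonal Gaussians."""
    # Reparameterised sample z_t ~ q(z_t | z_{<t}, x_{<=t}).
    eps = rng.standard_normal(mu_q.shape)
    z_t = mu_q + np.sqrt(var_q) * eps
    # Likelihood mean produced by the generative (decoder) network.
    mu_x = decode(z_t)
    # Gaussian log-likelihood log p(x_t | z_t).
    log_lik = -0.5 * np.sum(np.log(2 * np.pi * var_x)
                            + (x_t - mu_x) ** 2 / var_x)
    # Closed-form KL divergence between two diagonal Gaussians.
    kl = 0.5 * np.sum(np.log(var_p / var_q)
                      + (var_q + (mu_q - mu_p) ** 2) / var_p - 1.0)
    return log_lik - kl
```

Summing this quantity over time steps gives the lower bound of equation (6), which can then be maximised by gradient ascent in a recurrent network.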
