About Learning in Recurrent Bistable Gradient Networks

Jörn Fischer
Mannheim University of Applied Sciences
Paul-Wittsack-Str. 10
68163 Mannheim
Tel.: (+49) 621 292-6767
Fax: (+49) 621 292-6-6767-1
Email: j.fischer@hs-mannheim.de

Steffen Lackner
Mannheim University of Applied Sciences
Paul-Wittsack-Str. 10
68163 Mannheim
Email: steffen.lackner@posteo.de

Recurrent Bistable Gradient Networks [1, 2, 3] are attractor-based neural networks characterized by the bistable dynamics of each individual neuron. Coupled together by linear interactions determined by the interconnection weights, these networks no longer suffer from spurious states or very limited capacity. Vladimir Chinarov and Michael Menzinger, who introduced these networks, trained them using Hebb's learning rule. We show that this way of computing the weights leads to unwanted behaviour and limits the networks' capabilities. Furthermore, we show that using Hinton's first-order Contrastive Divergence algorithm [4] leads to a quite promising recurrent neural network. These findings are tested by learning images from the MNIST database of handwritten digits.

I Introduction

Hopfield networks, introduced by John Hopfield in 1982 [6, 7], are in a sense predecessors of Deep Belief Networks, which are widely used as state-of-the-art neural networks. They are recurrent neural networks inspired by the physical behaviour of spin glasses. Hopfield networks are built from perceptron-like units, have a symmetric weight matrix, and have no self-connections. This guarantees that the only dynamics that can take place in such a network is convergence to a fixed-point attractor. To overcome drawbacks of Hopfield networks such as spurious states and limited capacity, Boltzmann machines [5] were introduced. These were later restricted to have no connections between neurons within a layer, stacked, and trained layer by layer, e.g. with the wake-sleep algorithm introduced by Geoffrey Hinton and colleagues [4].

In the journal BioSystems in 2000 [1] and at the IWANN conference in 2001 [2], Vladimir Chinarov and Michael Menzinger presented a class of recurrent Hopfield-like networks called Bistable Gradient Networks, which eliminate the disadvantages of spurious states and very limited capacity. They demonstrated this by successfully training these networks with the Hebbian learning rule, storing many more patterns than a standard Hopfield network with the same number of neurons can memorize.

Because of this successful use of interconnected bistable neurons, their paper [2] presents Hebb's rule as the ideal, efficient way to train Bistable Gradient Networks. Our investigation shows that this is not always true: there are combinations of patterns that cannot be stored with Hebb's learning rule. In the following section we give a short description of the basic principles of Bistable Gradient Networks. To explain why Hebb's learning rule is not the best choice for training them, we then describe a simple thought experiment. We show that using Hinton's Contrastive Divergence leads to far better results. Furthermore, we demonstrate the network's capabilities by storing handwritten digits from the MNIST database and show that noisy images are denoised nearly perfectly. Finally, we draw our conclusions.

II Bistable Gradient Networks

This section gives a short introduction to the basic concepts of Bistable Gradient Networks. In the framework of dynamical systems, a neuron is described by a differential equation. To derive this equation, we start with the energy function of the network, which may be defined as follows:


where the part that depends only on each neuron's own state leads to the bistable behaviour of the neuron, and the weighted sum over pairs of neurons describes the linear coupling between the neurons.

Each neuron's state, or output, is described by a real-valued variable. The derivative of the energy function with respect to this variable, taken with a negative sign, gives the direction in which the neuron's state changes in time:
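A plausible explicit form of the energy function and of the resulting equation of motion (2), assuming the double-well potential commonly used for Bistable Gradient Networks [1, 2] with unit coefficients, neuron states $x_i$, and symmetric weights $w_{ij}$ without self-connections, is:

```latex
E(\mathbf{x}) = \sum_{i} \Big( -\tfrac{1}{2} x_i^2 + \tfrac{1}{4} x_i^4 \Big) - \tfrac{1}{2} \sum_{i \neq j} w_{ij}\, x_i x_j

\frac{dx_i}{dt} = -\frac{\partial E}{\partial x_i} = x_i - x_i^3 + \sum_{j \neq i} w_{ij}\, x_j
```

Under this assumption, with all weighted inputs equal to zero, $x_i - x_i^3 = 0$ yields the stable fixed points $x_i = \pm 1$ and an unstable one at $x_i = 0$; the weighted sum shifts the curve of $dx_i/dt$ up or down.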


This energy function, or potential, is shown in figure 1; its derivative is plotted in figure 2. The minima of the energy function correspond to the stable fixed points marked in the plot of its derivative. The differential equation (2) contains a linear part, the sum of the weighted outputs of the other neurons, which shifts the curve up or down, as shown by the dashed lines in figure 2. Depending on this linear part, it can easily happen that only the left or only the right fixed point remains, so that the final state is predetermined. In general, the neural output converges either to the positive stable state (or slightly above it) or to the negative stable state (or slightly below it). We call these two states active and inactive, respectively.

If a group of neurons is coupled by positive weights and a large part of them is active, the derivative curves of the remaining inactive neurons are shifted up and these neurons converge to the active state. Conversely, an active neuron that is connected by negative weights to and from the other active neurons has its derivative curve shifted down and converges to the inactive state.
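A minimal numerical sketch of a single neuron illustrates both effects. It assumes the unit-coefficient dynamics $dx/dt = x - x^3 + h$, where $h$ stands for a constant weighted input from the other neurons; the names `settle`, `num_fixed_points` and `H_CRIT` are hypothetical.

```python
import math

# For |h| < H_CRIT the curve dx/dt = x - x**3 + h crosses zero three times
# (two stable states with an unstable one in between); beyond it only one
# stable state survives. num_fixed_points below encodes the same condition.
H_CRIT = 2 / (3 * math.sqrt(3))  # about 0.385


def num_fixed_points(h: float) -> int:
    """Count the distinct real roots of x - x**3 + h = 0 via the cubic discriminant."""
    disc = 4 - 27 * h * h
    if disc > 0:
        return 3
    return 2 if disc == 0 else 1


def settle(x0: float, h: float, dt: float = 0.01, steps: int = 5000) -> float:
    """Euler-integrate a single bistable neuron receiving a constant weighted input h."""
    x = x0
    for _ in range(steps):
        x += dt * (x - x**3 + h)
    return x


if __name__ == "__main__":
    print(num_fixed_points(0.0), num_fixed_points(0.5))  # 3 1
    print(settle(0.2, 0.0))   # close to +1: active state
    print(settle(-0.2, 0.0))  # close to -1: inactive state
    print(settle(-1.0, 0.5))  # curve shifted up: ends slightly above +1 (about 1.19)
    print(settle(1.0, -0.5))  # curve shifted down: ends slightly below -1 (about -1.19)
```

With zero input, the initial sign decides the final state; once the input exceeds about 0.385 in magnitude, the neuron ends in the state selected by the input, regardless of where it started.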

Fig. 1: The energy function has two minima, to which the activity of the neuron converges. This double-well shape explains the bistability of the neurons.
Fig. 2: The stable fixed points correspond to the energy minima in figure 1. The derivative is zero at the two stable fixed points: the positive one (active) and the negative one (inactive). The dashed curves show how the derivative is shifted up or down depending on the weights and the outputs of the other neurons, according to (2). If it is shifted up, only the positive fixed point exists; if it is shifted down, only the negative fixed point remains.

III Thought Experiment

To understand why certain combinations of patterns cannot be stored, we first write down the Hebbian learning rule:
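Assuming $P$ bipolar patterns $\xi^\mu \in \{-1, +1\}^N$ (active $= +1$, inactive $= -1$) and a $1/P$ normalization, a common way of writing the Hebbian rule without self-connections is:

```latex
w_{ij} = \frac{1}{P} \sum_{\mu=1}^{P} \xi_i^{\mu}\, \xi_j^{\mu} \quad (i \neq j), \qquad w_{ii} = 0
```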


If we store only a single pattern, it is easy to see that pairs of active neurons and pairs of inactive neurons receive strong positive connections, while connections between active and inactive neurons become strongly negative. This implies that the inverse image is always stored as strongly as the image itself, a phenomenon also described by Hopfield [6, 7].
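The single-pattern case can be checked with a small sketch. It assumes bipolar states, a $1/P$-normalized Hebbian rule, and an energy of the form $\sum_i(-x_i^2/2 + x_i^4/4) - \frac{1}{2}\sum_{i,j} w_{ij} x_i x_j$; the function names are hypothetical.

```python
def hebbian_weights(patterns):
    """Hebbian rule w_ij = (1/P) * sum_mu xi_i^mu * xi_j^mu with w_ii = 0.

    Patterns are bipolar lists (+1 = active, -1 = inactive); the 1/P
    normalisation is an assumption of this sketch.
    """
    n, p = len(patterns[0]), len(patterns)
    w = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                w[i][j] = sum(q[i] * q[j] for q in patterns) / p
    return w


def energy(x, w):
    """Assumed energy: sum_i(-x_i^2/2 + x_i^4/4) - 1/2 * sum_ij w_ij x_i x_j."""
    n = len(x)
    e = sum(-0.5 * v**2 + 0.25 * v**4 for v in x)
    return e - 0.5 * sum(w[i][j] * x[i] * x[j] for i in range(n) for j in range(n))


if __name__ == "__main__":
    pattern = [1, 1, -1, -1, 1]
    w = hebbian_weights([pattern])
    # Same-state pairs get +1, active/inactive pairs get -1.
    print(w[0][1], w[0][2], w[2][3])  # 1.0 -1.0 1.0
    # The inverse image is an equally deep minimum of the energy.
    inverse = [-v for v in pattern]
    print(energy(pattern, w) == energy(inverse, w))  # True
```

Since the assumed energy has no bias terms, $E(\mathbf{x}) = E(-\mathbf{x})$ holds for any weights, so the inverse of every stored pattern is always an equally deep minimum.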

Now suppose we try to store many strongly overlapping patterns in a network, e.g. patterns that share a large number of active neurons and differ only in a small number of neurons. A problem emerges: the large group of neurons that is active in every pattern always inhibits the small area of mostly inactive neurons, even those that are active in a particular stored pattern. The network would therefore always activate the large area and deactivate the rest, so the neurons that distinguish the patterns could not be recalled.
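The thought experiment can be reproduced with a toy network of eight neurons. The patterns, the $1/P$ normalization of Hebb's rule, the Euler step size and all function names are assumptions of this sketch: neurons 0 to 4 are active in every pattern, and the three patterns differ only in which one of neurons 5 to 7 is active.

```python
def hebbian_weights(patterns):
    """w_ij = (1/P) * sum_mu xi_i^mu xi_j^mu for i != j (normalisation assumed)."""
    n, p = len(patterns[0]), len(patterns)
    return [[0.0 if i == j else sum(q[i] * q[j] for q in patterns) / p
             for j in range(n)] for i in range(n)]


def relax(x0, w, dt=0.01, steps=3000):
    """Euler-integrate dx_i/dt = x_i - x_i**3 + sum_j w_ij x_j from state x0."""
    x = list(map(float, x0))
    n = len(x)
    for _ in range(steps):
        h = [sum(w[i][j] * x[j] for j in range(n)) for i in range(n)]
        x = [x[i] + dt * (x[i] - x[i] ** 3 + h[i]) for i in range(n)]
    return x


def signs(x):
    return [1 if v > 0 else -1 for v in x]


# Hypothetical toy data: a large common block plus one distinguishing neuron each.
COMMON = [1, 1, 1, 1, 1]
PATTERNS = [COMMON + [1, -1, -1], COMMON + [-1, 1, -1], COMMON + [-1, -1, 1]]

if __name__ == "__main__":
    w = hebbian_weights(PATTERNS)
    print(w[0][5])  # negative: the common block inhibits each distinguishing neuron
    for p in PATTERNS:
        print(signs(relax(p, w)))  # every pattern collapses to the common block only
```

Under these assumptions the active distinguishing neuron of each pattern initially receives a net input of -1, well beyond the critical magnitude of about 0.385, so all three patterns fall into the same attractor with neurons 5 to 7 inactive.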

In the following section we describe how to change the learning rule to obtain the desired behaviour.

IV Contrastive Divergence

Although the neurons Geoffrey Hinton uses are completely different (they have binary outputs with stochastic activation), his learning rule is of great interest for us. The learning rule for the weights may be written down as follows:


where the argument denotes the time step. The weight change is computed as:
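Under a literal reading of the procedure described next, and with symbol names assumed here ($\eta$ the learning rate, $\Delta t$ the Euler step size, $\mathbf{x}(0)$ the state initialized with the pattern and $\mathbf{x}(1)$ the state one step later), a plausible form of (4) and (5) is:

```latex
w_{ij}(t+1) = w_{ij}(t) + \Delta w_{ij}(t)

\Delta w_{ij}(t) = \eta \left( x_i(0)\, x_j(0) - x_i(1)\, x_j(1) \right)

x_i(1) = x_i(0) + \Delta t \Big( x_i(0) - x_i(0)^3 + \sum_{j \neq i} w_{ij}\, x_j(0) \Big)
```

For a bipolar pattern the bistable term $x_i - x_i^3$ vanishes, so under this reading the difference is zero exactly when the weighted input of every neuron is zero.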


We start with randomly initialized weights and initialize the network's output with the pattern to be learned; this initial state enters the first term of (5). After computing the activation of the network for one time step, we obtain the state that enters the second term of (5). To adapt the weights, only (5) has to be applied repeatedly for all patterns.

If a pattern is represented by a fixed point, the difference in (5) is zero and the weights remain unchanged. If a neuron changes its state after activation, the difference may be positive or negative. If the difference is negative, the weights are weakened; if it is positive, they are strengthened. This is repeated until the difference is zero for all patterns, so that each pattern to be learned becomes a fixed point.
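The training loop can be sketched on hypothetical overlapping patterns like those of the thought experiment. This is a minimal sketch, not the authors' implementation: it assumes $\Delta w_{ij} = \eta\,(x_i(0)x_j(0) - x_i(1)x_j(1))$, with $\mathbf{x}(0)$ the bipolar pattern and $\mathbf{x}(1)$ one Euler step of the dynamics; the learning rate, step size, initial weight range and function names are all assumptions.

```python
import random


def one_step(x, w, dt):
    """One Euler step of dx_i/dt = x_i - x_i**3 + sum_j w_ij x_j."""
    n = len(x)
    return [x[i] + dt * (x[i] - x[i] ** 3 + sum(w[i][j] * x[j] for j in range(n)))
            for i in range(n)]


def train_cd1(patterns, eta=0.3, dt=0.1, epochs=2000, seed=0):
    """First-order contrastive divergence for a bistable gradient network (sketch).

    Assumed here: bipolar patterns, x(0) = pattern, x(1) = one Euler step of
    size dt, and dw_ij = eta * (x_i(0) x_j(0) - x_i(1) x_j(1)).
    """
    rng = random.Random(seed)
    n = len(patterns[0])
    w = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            w[i][j] = w[j][i] = rng.uniform(-0.1, 0.1)  # small random start
    for _ in range(epochs):
        for p in patterns:
            x0 = [float(v) for v in p]
            x1 = one_step(x0, w, dt)
            for i in range(n):
                for j in range(n):
                    w[i][j] += eta * (x0[i] * x0[j] - x1[i] * x1[j])
            # Constraints from the text: symmetric weights, no self-connections.
            for i in range(n):
                w[i][i] = 0.0
                for j in range(i + 1, n):
                    w[i][j] = w[j][i] = 0.5 * (w[i][j] + w[j][i])
    return w


def relax(x, w, dt=0.01, steps=2000):
    """Run the network dynamics for a while, starting from state x."""
    for _ in range(steps):
        x = one_step(x, w, dt)
    return x


if __name__ == "__main__":
    # Hypothetical overlapping patterns: neurons 0-4 always active, one of 5-7 active.
    common = [1, 1, 1, 1, 1]
    patterns = [common + [1, -1, -1], common + [-1, 1, -1], common + [-1, -1, 1]]
    w = train_cd1(patterns)
    for p in patterns:
        x0 = [float(v) for v in p]
        drift = max(abs(a - b) for a, b in zip(x0, one_step(x0, w, 0.1)))
        recalled = [1 if v > 0 else -1 for v in relax(x0, w)]
        print(drift, recalled == p)  # tiny drift, True
```

Because a bistable neuron already has two stable states when all weights are zero, this check only confirms the fixed-point property described above; it does not measure the basins of attraction on which the denoising experiment in section V relies.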

After each weight change, we keep the connections symmetric and eliminate self-connections. These two conditions guarantee that our network contains only fixed-point attractors, because every state change decreases the corresponding energy function. In further experiments we dropped these constraints and observed that the behaviour of the network does not change noticeably.
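The energy argument can be checked numerically. The sketch below assumes the unit-coefficient energy $E = \sum_i(-x_i^2/2 + x_i^4/4) - \frac{1}{2}\sum_{i,j} w_{ij}x_ix_j$ and uses hypothetical helper names; it draws random symmetric weights with zero diagonal and records the energy along an Euler trajectory with a small step size.

```python
import random


def energy(x, w):
    """Assumed energy: sum_i(-x_i^2/2 + x_i^4/4) - 1/2 * sum_ij w_ij x_i x_j."""
    n = len(x)
    e = sum(-0.5 * v**2 + 0.25 * v**4 for v in x)
    return e - 0.5 * sum(w[i][j] * x[i] * x[j] for i in range(n) for j in range(n))


def euler_step(x, w, dt):
    """One Euler step of dx_i/dt = x_i - x_i**3 + sum_j w_ij x_j (= -dE/dx_i for symmetric w)."""
    n = len(x)
    return [x[i] + dt * (x[i] - x[i] ** 3 + sum(w[i][j] * x[j] for j in range(n)))
            for i in range(n)]


def symmetric_zero_diag(n, rng, scale=0.5):
    """Random weights obeying the two constraints: w_ij = w_ji and w_ii = 0."""
    w = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            w[i][j] = w[j][i] = rng.uniform(-scale, scale)
    return w


def energy_trace(n=6, steps=500, dt=0.01, seed=1):
    """Energy after each Euler step, starting from a random state in [-1, 1]^n."""
    rng = random.Random(seed)
    w = symmetric_zero_diag(n, rng)
    x = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    trace = [energy(x, w)]
    for _ in range(steps):
        x = euler_step(x, w, dt)
        trace.append(energy(x, w))
    return trace


if __name__ == "__main__":
    trace = energy_trace()
    print(trace[0], trace[-1])  # the energy has dropped
    print(all(b <= a + 1e-12 for a, b in zip(trace, trace[1:])))  # non-increasing
```

With both constraints the dynamics is a gradient descent on the energy, so for a small enough step every Euler step lowers it; with non-symmetric weights this guarantee no longer holds in general.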

In the next section the algorithm is tested on patterns from the MNIST database of handwritten digits.

V The MNIST Database of Handwritten Digits

To show that learning with this algorithm succeeds, we trained a network of neurons with patterns from the MNIST database using (5). An excerpt of these training patterns is shown in figure 3. The large overlap of the neuron activations between different handwritten digits makes it impossible to store these patterns using the Hebbian learning rule.

Fig. 3: The upper images show the digits that are trained. The learning rate is . Each pattern is trained alternately for iterations.

The handwritten digits are trained for iterations with a learning rate . Figure 4 shows the reconstruction of the original images from images with more than of noise added. The network is simulated using the Euler method with a step size of . An image is recorded every -th time step. The converged images have a mean error rate of about .
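The evaluation protocol can be sketched as follows. The helper names, the toy 20-pixel pattern, and the reading of "error rate" as the fraction of pixels whose sign differs from the stored pattern are assumptions; to keep the example self-contained, the weights store a single pattern with a normalized Hebbian rule instead of weights trained with (5).

```python
import random


def add_noise(pattern, k, rng):
    """Invert k pixels at random positions (bipolar +1/-1 pixels)."""
    noisy = list(pattern)
    for i in rng.sample(range(len(pattern)), k):
        noisy[i] = -noisy[i]
    return noisy


def run(x0, w, dt=0.05, steps=1000, every=100):
    """Euler-integrate the network and keep a snapshot every `every` steps."""
    x = [float(v) for v in x0]
    n = len(x)
    snapshots = []
    for t in range(1, steps + 1):
        h = [sum(w[i][j] * x[j] for j in range(n)) for i in range(n)]
        x = [x[i] + dt * (x[i] - x[i] ** 3 + h[i]) for i in range(n)]
        if t % every == 0:
            snapshots.append(x)
    return x, snapshots


def error_rate(x, pattern):
    """Fraction of pixels whose sign differs from the stored pattern."""
    return sum((v > 0) != (p > 0) for v, p in zip(x, pattern)) / len(pattern)


if __name__ == "__main__":
    rng = random.Random(0)
    pattern = [1 if (3 * i) % 7 < 4 else -1 for i in range(20)]  # toy 20-pixel "image"
    n = len(pattern)
    # One pattern stored with a normalised Hebbian rule, enough for this toy check.
    w = [[0.0 if i == j else pattern[i] * pattern[j] / n for j in range(n)] for i in range(n)]
    noisy = add_noise(pattern, 4, rng)  # 20 % of the pixels inverted
    final, frames = run(noisy, w)
    print(error_rate(noisy, pattern), error_rate(final, pattern), len(frames))
```

Running it prints `0.2 0.0 10`: 20 % of the pixels start inverted, all of them are restored, and ten snapshots are kept.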

Fig. 4: After training, we start with a noisy pixel image in which pixels at random positions are inverted. The images below are computed using the Euler method with a time step size of in iterations. Only every -th iteration is shown.

VI Conclusion

Using Hebb's learning rule to compute the interconnection weights of a recurrent Bistable Gradient Network leads to difficulties, especially for strongly overlapping patterns. To overcome these problems, we applied Hinton's first-order Contrastive Divergence algorithm to train the weights. The approach was successfully tested with patterns from the MNIST database of handwritten digits. Reconstructing images with more than of noise yields a near-perfect result with a mean error rate of about . In future research we will try to improve learning by taking higher orders of the algorithm into account.


The authors would like to thank Tobias Becht for helpful comments.


  • [1] V. Chinarov, M. Menzinger, Computational dynamics of gradient bistable networks, BioSystems 55, pp. 137–142, 2000
  • [2] V. Chinarov, M. Menzinger, Bistable Gradient Neural Networks: Their Computational Properties, IWANN 2001 Conference, Granada, Spain, Proceedings, pp. 333–338, Springer, 2001
  • [3] V. Chinarov, M. Menzinger, Reconstruction of noisy patterns by bistable gradient neural-like networks, BioSystems 68, pp. 147–153, 2003
  • [4] G. Hinton, P. Dayan, B. Frey, R. Neal, The wake-sleep algorithm for unsupervised neural networks, Science 268, pp. 1158–1161, 1995
  • [5] G. E. Hinton, T. J. Sejnowski, Learning and relearning in Boltzmann machines, in: D. E. Rumelhart, J. L. McClelland, and the PDP Research Group (eds.), Parallel Distributed Processing: Explorations in the Microstructure of Cognition, Volume 1: Foundations, MIT Press, Cambridge, pp. 282–317, 1986
  • [6] J. J. Hopfield, Neural networks and physical systems with emergent collective computational abilities, Proc. Natl. Acad. Sci. USA 79, pp. 2554–2558, 1982
  • [7] J. J. Hopfield, Neurons with graded response have collective computational properties like those of two-state neurons, Proc. Natl. Acad. Sci. USA 81, pp. 3088–3092, 1984