Mapping Temporal Variables into the NeuCube for Improved Pattern Recognition, Predictive Modelling and Understanding of Stream Data
This paper proposes a new method for an optimized mapping of temporal variables, describing a temporal stream data, into the recently proposed NeuCube spiking neural network architecture. This optimized mapping extends the use of the NeuCube, which was initially designed for spatiotemporal brain data, to work on arbitrary stream data and to achieve a better accuracy of temporal pattern recognition, a better and earlier event prediction and a better understanding of complex temporal stream data through visualization of the NeuCube connectivity. The effect of the new mapping is demonstrated on three bench mark problems. The first one is early prediction of patient sleep stage event from temporal physiological data. The second one is pattern recognition of dynamic temporal patterns of traffic in the Bay Area of California and the last one is the Challenge 2012 contest data set. In all cases the use of the proposed mapping leads to an improved accuracy of pattern recognition and event prediction and a better understanding of the data when compared to traditional machine learning techniques or spiking neural network reservoirs with arbitrary mapping of the variables.
NeuCube architecture, spiking neural network, early event prediction, spatiotemporal data
Temporal data have been collected in various fields, such as brain science, ecology, geophysics, social sciences. Temporal data may contain complex temporal patterns that would need to be learned and extracted in a computational model. In some cases, the variables describing the temporal data have spatial attributes, e.g. the 3D location of the channels in EEG data [] and the patterns that need to be learned become spatiotemporal. Learning dynamic patterns from temporal and spatiotemporal data is challenging task as the temporal features may manifest complex interaction that also may change dynamically over time. The âtime-windowsâ of important temporal or spatiotemporal patterns may change over time which is seldom covered by traditional machine learning methods such as regression techniques, support vector machines (SVM) or multi-layer perceptrons (MLP).
Many recurrent models have been proposed to learn spatiotemporal relationship from signal, such as the recursive self-organizing network models [, ], recurrent neural network [, ]. Recently NeuCube [, ] has been proposed to capture time and space characteristics of spatiotemporal brain data in a spiking neural network (SNN) architecture. The NeuCube architecture consists of: input data encoding module, that encodes multivariable continuous temporal stream data into spike trains; a 3D recurrent SNN cube (SNNcube) where input data is mapped and learned in an unsupervised mode; a SNN classifier that learns in a supervised mode to classify spatiotemporal patterns of the SNNcube activities that represent patterns from the input data. The effectiveness and superiority of NeuCube to model brain data (such as EEG) has been demonstrated in previous works. [] studies the modelling ability of NeuCube for electroencephalography (EEG) spatiotemporal data measuring complex cognitive brain processes. In [] NeuCube is used for modelling and recognition of complex EEG spatiotemporal data related to both physical and intentional (imagined) movements. [, ] presents the results of using NeuCube to classify/cluster fMRI data and [] uses NeuCube for the analysis of EEG data recorded from a person affected by Absence Epileptic. [] studies how NeuCube can help understand brain functional changes in a better way.
While in these studies, NeuCube is used to model brain data and the features are manually mapped into the system according to the spatial location where the signals are sampled from, in this paper we present a fully automatic input mapping method based on graph matching technique that enables NeuCube to model any spatiotemporal data and leads to better spatiotemporal pattern recognition, early event prediction and model visualization and understanding.
The main contributions of this paper are as follows:
We propose a new graph matching algorithm to optimize the mapping of the input features (variables) of arbitrary temporal stream data into a 3D SNNcube with the goal to achieve a better pattern recognition accuracy, an earlier event prediction from temporal data and a better understanding of the data through model visualization.
We proposed a graph based semi-supervised learning algorithm to automatically analyse the neuronal cluster structure inside the trained SNNcube and develop various dynamic functions for visualization of the neuronal activity states and synaptic evolving progress during learning process and we prove the convergence of the new algorithm.
We present two algorithms, spike density correlation and maximum spike coincidence, for spike trains similarity measure that can be used in the graph matching algorithm for two commonly encountered types of spike trains: bipolar spike trains and unipolar spike trains.
We introduce an Adaptive Threshold Based (ATB) encoding algorithm by using mean and standard deviation information of the signal gradient to calculate its threshold and thus make ATB self-adaptive to signal changes.
We develop an optimization component using genetic algorithm that can automatically optimize all NeuCube parameters in once and thus enable users to find optimal parameters’ value easily and achieve good results.
The remainder of this paper is organized as follows. Section 2 describes the NeuCube architecture. Section 3 proposes a new method for the optimization of the mapping of the input temporal features into the SNNcube. Experiments on three benchmark data sets are reported in Section 4, followed by conclusions and discussions in Section 5.
2 The NeuCube Architecture
The main parts of NeuCube are: input encoding module; a three-dimensional recurrent SNN reservoir/cube (SNNcube); an evolving SNN classifier. Fig. 1 shows the block diagram of the NeuCube architecture. The SNNcube is a scalable module. Its size is controlled by three parameters: and, representing the neuron number along , and axes, respectively. The total number of neurons in the SNNcube is . As a first implementation here, we use a probabilistic leaky integrate and fire model (LIFM) [].
The encoding module converts continuous data streams into discrete spike trains, suitable to be processed in the SNNcube, because spike neural networks can only process discrete spike trains. After encoding, the information contained in original continuous signal is transformed into forms of spikes.
The NeuCube is trained in a two-stage learning process. The first stage is unsupervised learning that makes the SNNcube learn spatiotemporal relations from the input data by adjusting the connection weights in SNNcube. The second stage is supervised learning that aims at learning the class information associated with each training spatiotemporal sample.
To be more specific, the unsupervised learning stage is intended to encode âhiddenâ spatiotemporal relationships from the input data into neuronal connection weights. According to the Hebbian learning rule, if the interaction between two neurons is persistent, then the connection between them will be strengthened. In particular, we train the SNNcube using spike-timing dependent synaptic plasticity (STDP) learning rule []: if neuron fires before neuron , then the connection weight from neuron to neuron will increase and, if the spikes are in a reverse order, the connection from neuron to neuron will decrease . This ensures that the time difference in the input spiking trains, which encode the temporal patterns in the original input signals, will be captured by the neuron firing state and the unsymmetrical connection weights in the reservoir.
The second training stage is to train an output classifier using class label information associated with the training temporal samples. The dynamic evolving Spike Neural Networks (deSNN) [] is employed here as an output classifier, because deSNN is computationally efficient and emphasizes the importance of both first spikes arriving at the neuronal inputs (observed in biological systems []) and the following spikes (which in some stream data are more informative).
Once a NeuCube model is trained, all connection weights in the SNNcube and in the output classification layer are established. These connections and weights can change based on further training (adaptation), because the evolvable characteristic of the architecture.
3 A Method for an Optimal Mapping of Temporal Input Variables into a 3D SNNcube based on the Graph Matching Algorithm
3.1 A method for mapping input temporal variables into a 3D SNNcube based on graph matching optimisation algorithm
Given a particular spatiotemporal data set, it is important to optimise the mapping of the data into the 3D SNNcube for an optimal learning a better understanding of the spatiotemporal patterns in the data. For some spatiotemporal data, such as brain EEG, there is prior information about the location of each channel (input feature) and this information can be readily utilized for mapping the EEG temporal signal into the SNNcube []. But for other common applications such as climate temporal data, we do not have such spatial mapping information. And the way temporal data is mapped into the SNNcube would significantly impact the results. Here we introduce a new method to map temporal variables into the SNNcube for a better pattern recognition, a better and earlier event prediction and a better visualisation of the model to explain the data.
Suppose there are temporal samples in the data set, measured through temporal variables and the observed time length of each sample is . We first choose randomly input neurons from the SNNcube. Then we map the variables into the SNNcube following the principle: input variables/features that, after the input data transformation into spike trains, represent highly correlated spike trains are mapped to nearby input neurons. Because high correlation indicates that the variables are likely to be more time dependent with each other, and this relationship should also be reflected in the connectivity of the 3D SNNcube. Spatially close neurons in the SNNcube will capture in their connections more temporal interactions between the input variables mapped into these neurons. The principle of mapping similar input vectors into topologically close neurons is known from the SOM [], but in SOM these are static vectors and the similarity is measured by the Euclidean distance. Now, we address the problem of mapping temporal sequences, rather than static vectors, into a 3D SNN.
Specifically, we construct two weighted graphs: the input neuron similarity graph (NSG) and the time series/signals similarity graph (SSG). In the NSG, the input neuronsâ spatial 3D coordinates are denoted by and the graph edges are determined in the following way: each input neuron is connected to its nearest input neurons and the edges are weighted by the inverse of the Euclidean distance between them.
In the SSG, we denote the spike train corresponding to the feature as
where is the Dirac delta function. The set of spike trains corresponding to all the features is and it forms the graph vertex set. The graph edges are constructed in this way: each spike density function is connected to its highest correlated neighbours and the edges are weighted by the similarity between them. Measuring the similarity between spike trains is an important problem in neural science and have been studied for a long history [, ]. Here we propose two simple but effective spike trains similarity measuring algorithms for two most commonly encountered spike train types, respectively. The first one is spike density correlation, which is suitable for measuring unipolar spike trains with two state (firing or unfiring). This kind of spike train is commonly observed in glutamatergic neurons and produced by some spike encoding algorithms such as Bens Spiker Algorithm (BSA) []. The other one is maximum spike coincidence, which is suitable for measuring bipolar spike trains with three state (excitatory, rest or inhibitory). This kind of spike train is commonly produced by serotonergic neurons, as well as some encoding algorithms such as address event representation [].
spike density correlation
Given a unipolar spike train , we first use kernel density estimation method to estimate its spike density function by
where is the function convolution operator and is a kernel function, with kernel bandwith . Fig. 2 displays the spike density estimation result.
Thereafter, the similarity between spike train and is defined as the Pearson’s linear correlation coefficient
where and are the mean and the standard deviation of , respectively. is the expectation. Here and are the random variables corresponding to and , respectively.
maximum spike coincidence
Given two bipolar spike trains and , we define the similarity between them as the maximal number of spikes that coincide while one spike train moves over the other spike train
where is the result of translating with time unit, i.e.
Note that while is negative, is translated backward and while is positive, is translated forward, as illustrated in Fig. 3. is the indicator function, equal to 1 if and 0 elsewhere.
After construction of NSG and SSG graphs, we adopt the graph matching technique, which is a powerful tool to solve mapping problems and has been widely used in computer vision and pattern recognition, to determine an optimal mapping between the two weighted graphs under the mapping rule. For these two graphs, we can compute their adjacency matrices, written as and . The graph matching method is aimed to find out a permutation matrix that minimizes the following objective function:
where denotes the Frobenius matrix norm. Solving this problem exactly is known to be an NP hard problem due to its combinatorial optimization property. Many algorithms have been proposed to find an approximated solution.
Among these algorithms the Factor Graph Matching (FGM) algorithm [] has been demonstrated to produce state-of-art results. So here we utilize the FGM algorithm to solve the SSG to NSG matching problem in equation (6). Suppose in NSG the sum of graph edge weights of an vertex, say vertex , to all other vertices is , and, similarly, in SSG the sum of graph edge weights of vertex to all other vertices is , then the difference between the normalized weight sum and reflects the similarity of the positions of and in the corresponding graphs, where (and similarly ) is the normalized weight sum of vertex that is obtained by dividing by the largest weight sum in graph NSG. So we define the vertex similarity as:
and the edge similarity:
where: , are graph edge weights in NSG and SSG, respectively; and are parameters to control the affinity between neurons and edges, respectively. Table 1 gives the algorithm procedure.
|1||Input spike trains and input mapping neurons|
|2||Construct graphs NSG using Euclidean distance|
|3||Construct graphs SSG using equation (3) or (4)|
|4||Compute vertex similarity using equation (7)|
|5||Compute edge similarity using equation (8)|
|6||Solve problem (6) by Factor Graph Matching|
Fig. 4 shows matching results for an exemplar temporal data represented by 14 features (this is actually the case study data from experiment 1, presented in section 4 in []). The left graph is the input NSG and the right graph is SSG. We can see that after matching, highly correlated features are mapped to nearby input neurons.
3.2 The optimal input mapping enables early and accurate event prediction from temporal data
In many applications, such as pest population outbreak prevention, natural disaster warning and financial crisis forecasting, it is important to know the risk of event occurrence as early as possible in order to take some actions to prevent it or make adjustment in time, rather than waiting for the whole pattern of temporal data to be entered into the model. The main challenge in the early event prediction task is that the time length of the recall samples should be smaller than that of training samples, because training samples are collected in the past and all the data are already known. This is illustrated in Fig. 5.
Traditional data modeling methods, such as SVM, kNN, and MLP, are no longer applicable for early event prediction task, because they require prediction sample and training samples to have same time length. Furthermore, traditional methods also cannot model both the close interaction and interrelationship between time and space components of the spatiotemporal data.
In contrast, NeuCube with the proposed new mapping method would enable a better early event prediction, as the connectivity of a trained SNNcube would reflect on the temporal relationship in the temporal data. As a result, if part of a new sample is presented, this would fire a chain of activities in the SNNcube based on the established connections. To be more specific, let us consider mapping four features A, B, C and D into a two dimensional SNNcube
During testing phase, once similar temporal pattern appears, early firing is triggered and then propagated in a neuron population (hence a firing chain of a neuron cluster) before full data are presented, i.e. the neurons and their connections form a functional cluster which becomes selective to the spatiotemporal pattern and tend to fire at the start of the signal pattern. As demonstrated by [, ], LIF neurons equipped with STDP learning rule can learn unsupervisedly an arbitrary spatiotemporal pattern embedded in complex background spike trains and when its preferred spike sequence appears, the neuron can emit a spike very early at the start of the pattern. Biologically, the population firing in the SNNcube chain-fire phenomenon was observed in zebra finches HVC area to control the precise temporal structure in birdsong [], in which the neural activity is propagated in chain network to form the basic clock of the song rhythm. In the SNNcube we have also observed a similar chain-fire phenomenon while spike trains are presented to the network. These features endow the SNNcube with a powerful potential to encode complex spatiotemporal patterns contained in the input spike trains used for training and to respond early to the presence of a specific spatiotemporal pattern in a recall/prediction mode. This is especially important when a result is the consequence of several highly correlated key factors, as further demonstrated by the study of Stroke occurrence []
In contrast, Fig. 6 (b) and (c) display the non-optimal mapping, where uncorrelated features are mapped together. The neurons around uncorrelated features can hardly learn the original signal temporal relationship, because nearby neurons are presented with less even non-correlated signals. The interactions between these nearby neurons can be much lower and do not capture any meaningful temporal correlation. As a result, while new samples are partially presented during testing, similar temporal pattern as contained in training samples cannot be detected correctly, as will be demonstrated by experimental results in Section IV.
3.3 The optimal input mapping enables better network structure analysis and visualization and a better data understanding
After it is trained, the SNNcube has captured spatial and temporal relationships from the temporal data. It is helpful to know how the neurons in the SNNcube are related to the input features and what patterns the SNNcube have learned from the input signals. This information is important to understand the model and the temporal data set []. While in previous work the neuronal clusters are manually labeled according to the synaptic weights and this costs plenty of time and is less accurate, we propose the following algorithm to unveil SNNcube structure through automatically analysing neuronal clusters in the SNNcube.
The neurons in the cube are indexed from 1 to according to ascendent order of their , and coordinates. We mark the input neurons as the information source in the SNNcube and define a source matrix as follows: if neuron is the input neuron of variable , then the entry (, ) of is 1, otherwise is 0. The affinity matrix of the SNNcube is defined in the following way: the entry (, ) of is the total spike amount transmitted between neuron and neuron . Note that more spike means stronger interaction and more information shared between the neurons. Then the ratio of information for each neuron received from the input information sources is computed by the following algorithm:
Evaluate to converge
is the identity matrix and is a diagonal matrix defining the propagation rates on different directions. and are also diagonal matrices with and , respectively. In the first iteration . Step 1 computes the normalized adjacency matrix which can fully encode the connection information of SNNcube by a square matrix, according to network theory []. Step 2 propagates information from input neurons to other neurons in an iterative way. The main principle behind the algorithm is that information (or activation) is propagated in the network and the propagation process is dominated by the network structure. Imagining that each input neuron is a source and has some kind of information (i.e. electricity, fluid matters etc.), but the information type possessed by different input neuron is different (i.e. different ions or different liquid). The information is propagated from sources to other neurons in each iteration and the propagation amount is proportional to the connection weight between each pair of neurons. At the beginning only the input neurons (the sources) have the information and other neurons don’t have any information. As the propagation process continues, all neurons receive some information from one or more input neurons and the information amount corresponding to different input neurons is different. The amount of a particular type of information (from the corresponding input neuron) received by a neuron reflects the intimation relationship of this neuron with that input source neuron. The more amount of information it received, the closer it is with that input neuron. Finally according to network theory when the whole network reaches to a stable state, entry represents the relative information amount neuron received from input neuron [, ]. Finally step 3 normalizes the information amount corresponding to different input neurons and this facilitates to classify neurons into different input clusters by , where is the number of input neuron and is the number of total neurons in SNNcube. Matrix is the basis of extracting neuronal clusters.
The propagation factor matrix controls the convergence of the propagation algorithm and the amount of the information being propagated to other neurons in the SNNcube. Here is computed by , where is the mean affinity value between a neuron and its 8 neighboring neurons, so that the information propagation between strongly connected neurons is large while the information propagated through weakly connected neurons is small.
We now show the propagation algorithm converges. Because is a diagonal matrix and , the spectral radius of is . After the iteration, we have
where and , representing the information amount acquired from sources and neighborhood, respectively. Let be the random walk matrix on the graph. Then for the spectral radius of we have . Because , is similar to , so . Since the spectral radius , and , so . Therefore , and
It is worth mentioning that Step 2 can also be replaced by equation (10). But since the matrix is highly sparse, it will be much more efficient to evaluate the equation in Step 2 than to invert a matrix in equation (10), regarding to both space and time comsumption
Fig. 7 left plot shows the network structure after unsupervised training for the study of Aphids data presented in our previous work []. The big solid dots represent input neurons and other neurons are labeled in the same intensity as the input neuron from which it receives most spikes. The unconnected dots mean no spike arrived at that neuron. On Fig. 7 right, the top pane is spike number for each variable after encoding and the bottom pane is the neuron number belonging to each input variable cluster. From this figure we can see the consistency between the input signal spike train and the SNNcube structure. Note that variable 11 (solar radiation) is emphasized in the SNNCube that suggests a greater impact of the solar radiation on the aphid number. This was observed also in a other work []. This is very different from traditional methods such SVM which have been used for same tasks but without offering facilities to trace the learning processes for the sake of data understanding. It is worth mentioning that the spatial pattern of the input mapping, e.g. in Fig. 4, is embedded in the source matrix and has a direct influence on the visualization results and the interpretation of the results. These mapping and visualization are demonstrated to be useful for high level cause - results interpretation in some studied, such as EEG signal study for identifying differences between people with opiate addiction and those undertaking substitution treatment for opiate addiction [] and the fMRI data study for brain activity while subjects are presented with different pictures
4 Experimental results on three case study problems
In this section we present three case studies to demonstrate the validity of the proposed architecture for both early even prediction based on temporal data and spatiotemporal pattern recognition. The first case study is conducted on a benchmark physiological data set SantaFe
While in [, , ] address event representation (AER) encoding with fixed threshold is used and the threshold has to be tuned manually each time for each feature individually, here we introduce a self-adaptive bi-directional thresholding method, Adaptive Threshold Based (ATB) encoding algorithm. ATB is simply self-adaptive to all features: for an input time series/signal , we calculate the mean and standard deviation of the gradient , then the threshold is set to , where is a parameter controlling the spiking rate after encoding. After this, we obtain a âpositiveâ spike train which encodes the segments in the time series with increasing signal and a negative spike train, which encodes the segments of decreasing signal.
Because NeuCube system is a complex system and contains many tunable parameters, manually tuning these parameters might be time consumption for a non-experienced user. We have implemented a Genetic Algorithm (GA) optimization component to optimize all the system parameters at once in a -fold cross validation way with respect to a data set. The objective function of the GA is the overall validation error rate. We use the optimization component to simultaneously optimize 7 key parameters of the NeuCube system in Table 2. After optimization, GA outputs the optimal parameters’ value that attained the smallest cross validation error rate.
|Spike Threshold||Threshold of ATB encoding||[0.1, 0.9]|
|Firing Threshold||LIF neuron firing threshold||[0.01, 0.8]|
|STDP Rate||The learning rate in STDP algorithm||[0.001, 0.5]|
|Refractory Time||Refractory time after firing||[2, 9]|
|Mod||Modulation factor in []||[0.00001,0.5]|
|Drift||Synaptic drift parameter in []||[0.1, 0.95]|
|Number of nearest neighbors in deSNN||[1, 10]|
Baseline algorithms consist of: Multiply Linear Regression (MLR) is simple, extensively studied and can achieve relative stable result in many applications; Support Vector Machine (SVM) is probably the most widely used and has been demonstrated to be successful in various applications; Multilayer Perceptron (MLP) is a classic neural network model and has advantages in processing multivariate data; Nearest Neighbors (NN) and weighted Nearest Neighbors (wNN) [] are among the most popular classification algorithms [] and especially while processing high-dimension data; finally, Global Alignment Kernel (GAK) [] is a recently developed method to process time series and can achieve the state-of-art results in many applications. So there are 6 baseline algorithms
Because baseline algorithms process static vectors, we prepared their data set in the following way. We concatenated each temporal variables one after another to form a long feature vector for each sample, as shown in Fig. 8.
Since baseline algorithms have very few tunable parameters ( one or two parameters), the parameters are tuned in a grid search method. This process is simple and quick, usually just a few steps to locate the optimal value of the parameters.
4.1 Predict events by classifying time series data based on physiologic signals
In this case study we conduct experiments for complex physiologic time series classification on SantaFe dataset. The dataset contains three physiologic features: heart rate (HR), respiration force (RF, chest volume) and blood oxygen (BO). Each of the feature was measured twice in one second and the measurement lasted 5 hours. The aim of this case study is to predict the sleep stage of a patient by using the collected physiologic signals. There are three sleeping stages: awake, sleep and waking, and the sleep stages are read from EEG signals by neurologist.
The original dataset contains 34000 data points for each feature (approximate 4.7 hours long) and 103 sleep stage labels, and the sleep stage labeling interval varies from 60 seconds to 4600 seconds. Each label indicates a different sleep stage happening at that moment, i.e. whenever the patient transits from one sleep stage to another different sleep stage, a corresponding label is assigned at that time. So the signals recorded during the labeling interval reflect the happening process of the next sleep stage. Consequently by classifying these temporal data into different classes, we actually predict the next sleep stage of the patient. In this case study we only use the labeling data with 60 seconds time interval to make sure the feature length is same for every sample for fair comparison, because traditional methods (such as SVM) cannot process data which have uneven feature length between samples. In this case we use a SNNcube
Fig. 9 left pane gives the mapping result found by the proposed input mapping algorithm. It can be seen that the respiration force is mapped between blood oxygen and heart rate, but it is much closer to heart rate. This indicates the respiration signal more correlates with heart rate than blood oxygen. This result is consistent with the actual observation described in the three major research questions, which are raised on the dataset website
Fig. 10 shows four consecutive snapshots of the instantaneous firing state of the neurons during training process. Squares are input neurons (Purple means that a positive spike is input at this moment. Cyan means negative spike and yellow means no spike). Red dots are firing neurons and black dots are unfiring neurons. Note that the firing neuronal cluster grows from small to large and the propagation of the spike states (hence the information the spike carried) has a trajectory in the SNNcube. While the meaning of the firing pattern and spreading trajectory, how they reveal the significance of the SNNcube modelling ability are questions to be answered, the study of these is out of the scope of this paper and we leave them in our future papers.
Figure 11 shows the evolving process of the connections during different time in the training stage. From this we can see how the connections between different neurons are created dynamically as the spike trains input to the reservoir. At the beginning, the connections in the reservoir are sparse and nearly randomly distributed 11(a). As the training process continues, some new connections appear around each input neuron, but blood oxygen input neuron does not have interaction with the other two input neurons 11(b). At the last half training process, more connections are created and the interaction between each pair of input neurons increases, especially for blood oxygen 11(c)-(d). The synaptic weight evolving process is tightly connected with the physiological signal trend in Fig. 12, from which we can see clearly the correlation between blood oxygen and the other two signals in the last half signal segment. It is worth mentioning that while previous method to study the SantaFe dataset mainly focused on analysis of the nonlinear dynamics of the respiration, heart rate and blood pressure [, ], they can hardly provide such an intuitive and direct way to observe the dynamics of the signals, and hence the interaction between the physiological signals during a sleep stage.
We designed two experiments on this data set to show the predictive ability of NeuCube with different mapping and how early the model can predict sleep stage using part of the data. In these experiments, we trained NeuCubes of both random mapping and graph mapping using 100% of the time length samples (60 seconds), but temporal data of only 75% and 50% of the time length of the samples was used to predict the sleep stage.
Experimental results of early event prediction are in Table 3. In order to comparing the impact of testing time length upon accuracy, we also include the result of full time length in the second column. NeuCube(R) is random mapping and NeuCube(G) is graph mapping.
|Accuracy of each training and testing time length (sec)|
From these results we can see that NeuCube(G) can perform better for early event prediction. In contrast the performance of NeuCube(R) for early prediction is much lower, because SNNcube with randomly mapping fails to acquire the temporal pattern in the data for early firing during testing stage. Furthermore, a realistic early event prediction should be that as the time length of observed data increases, the prediction accuracy will also increase. But from table 3 we can see that as the time length of training data increases, traditional modeling methods do not necessarily produce better results (some even worsen), because they cannot model the whole spatiotemporal relationship in the prediction task. They can only model a certain time segment. Temporal data typically exhibits complex interrelationship and interaction among different features and traditional methods are proposed to process static vector data and thus they can hardly model the complex relationship contained in temporal data. The GAK and wNN can produce better results, but their accuracies are still much lower than that of NeuCube. Because NeuCube is trained on the whole spatiotemporal relationship in the data, even a small amount of input data can trigger spiking activities in the SNNcube that will correspond to the learned full temporal pattern resulting in a better prediction.
4.2 Spatiotemporal pattern recognition based on spatiotemporal data
In this case study we consider a benchmark traffic status classification problem of spatiotemporal data from the PEMS database. In freeways, vehicle flow is monitored by traffic sensors with fixed spatial locations and the data collected by these sensors exhibit spatial and temporal characteristics. Discovering spatial-temporal patterns can be very meaningful for traffic management and a city traffic plan.
The study area is San Francisco bay area which is shown in Fig. 13. There are thousands of sensors distributed over the road network and the sensors distribution is indicated in right plot of Fig 13, in which each black dot represents a monitoring sensor. These sensors monitor lane occupation rate 24-hourly every day. Measurements are taken every 10 minutes and normalized between [0 1], where 0 means no car occupation and 1 means full occupation of the lane in the monitoring region. So there are 144 (24x6) data points per day. In this case study we collect traffic data over a period of 15 months and thus, after removing public holidays and sensor maintenance days, there are 440 days to be classified.
We did some preprocessing of the data: (1) we removed the data of outlier sensors from the data set, e.g. sensors producing always 1 or 0 in 24 hours and sensors flip from 0 to 1 or 1 to 0 suddenly; (2) nearby sensors that produce almost the same data sequence are combined into one sensor; (3) the total occupation rate of each sensor is calculated as a sum of all measurements over 440 days; (4) 50 sensors corresponding to the largest occupation rate are selected as the final features (variables) to represent the data set. Fig. 14 shows samples of the spatial and temporal distribution of the traffic status of the road network from Monday to Thursday.
In this case study we construct a reservoir. Fig. 15 shows the final input mapping result found by the graph matching algorithm. We can see the similar connection pattern between the two graphs.
Fig. 16 shows the overall neuron firing state matrix of the SNNcube corresponding to the four data samples, from top to bottom Monday till Thursday. In each figure, the horizontal axis is neuron ID and the vertical axis is time tick, from 0 at the top to 144 at the bottom. One should note that while in the plot it seems the firing state matrix is very dense, it is actually very sparse. Take the Thursday (the fourth one) as an example. There are 20416 firing entries and the firing state matrix size is 486000 (, where 3375 is the total neuron number in the SNNcube), so the firing entries is about 4.20%. We can see that these sparse firing matrices have different patterns related to the input data. Meanwhile, since the size of the SNNcube can be specified according to the problem, the SNNcube with highly sparse firing rate has a great power to encode input signals and patterns, and thus it can potentially model any complex spatial and temporal relationship.
After obtaining the firing state matrix of each sample, we transmit them to the output deSNN layer [], which creates an output neuron for each sample and connects the output neuron to each neuron in the SNNcube. Then the weights of the connections are established by Rank Order (RO) learning rule []. For a testing sample, its class label is determined by comparing its output neuron weights with those training samples’ output neuron weights, which are established at training stage, using the weighted NN rule []. In Table 4, we compared the 2-fold cross validation experimental result between NeuCube and baseline algorithms: MLR, SVM, MLP, NN, wNN and Global Alignment Kernels (GAK) [].
The parameters’ values used in the classical machine learning methods are: - degree of polynomial kernel; - number of neurons in the hidden layer of MLP; - number of nearest neighbors in NN and wNN; - Gaussian kernel width. From these results we can see that the proposed NeuCube model achieves better classification results. This is because traditional machine learning methods are designed to process static vector data, and they have limited ability to model spatially correlated and temporally varied data. Meanwhile, MLR, SVM and MLP also show disadvantages while modeling high-dimensionql data (e.g. there are 7200 features in each sample of this case study). NN and wNN have been widely used in high-dimension data processing, such as document classification, because they can approximately reconstruct the underlying manifold whose dimension is usually much lower than its ambient space and thus they can produce better results than MLR and SVM. While the recently proposed GAK algorithm is shown to be very efficient and effective in processing time series, its performance is still lower than that of NeuCube. It should be mentioned that although in this case study classifying weekday traffic patterns does not have an obvious application situation now, this PEMS database has been established as a benchmark data set to test an algorithm’s ability for spatiotemporal data processing [, ].
4.3 Comparison of random input mapping and graph input mapping method for Physionet data classification
To fully evaluate the effect of input mapping, comparison experiments are conduced on both Challenge 2012 and SantaFe data sets. Different from SantaFe data set described in Section IV-A, Challenge 2012 is a more complex and challenging data set. The goal of Challenge 2012 data is to predict mortality of 4000 Intensive Care Unit (ICU) patients using their 37 physiological signals (see Table 5) which are measured at irregular time interval within total time length being 48 hours. The irregularity of the measurement time interval and the complexity of the physiological signal make this prediction task highly challenging.
We first apply the following pre-processing procedure to the Challenge 2012:
Patient selection. The original data contains records of 12000 ICU patients and the records are divided into two parts, training set A with 4000 patients’ records and testing set B with 8000 patients’ records. Because testing set B doesn’t contain mortality state and thus we cannot calculate exactly its classification accuracy, we just conduct experiments on training set A. Some of the patients only have several measurements, for example due to an early in-hospital death. So we select the patients who have more than 40 records from set A and this results into a subset contains 3470 patients.
Feature selection. The original data contain some features which have too few measurements during the observation period to enable effective temporal encoding, such as the TropT and TropI features are only measured several times during 48 hours. We first calculate the overall measurement amount of every feature and then select those feature with a average measurements more than 5. As a result the 12 features are used in the experiments, as listed in Table 5.
|DiasABP||Invasive diastolic arterial blood pressure||mmHg|
|GCS||Glasgow Coma Score (3-15)||-|
|MAP||Invasive mean arterial blood pressure||mmHg|
|NIDiasABP||Non-invasive diastolic arterial blood pressure||mmHg|
|NIMAP||Non-invasive mean arterial blood pressure||mmHg|
|NISysABP||Non-invasive systolic arterial blood pressure||mmHg|
|SysABP||Invasive systolic arterial blood pressure||mmHg|
Figure 17 displays the mean and standard deviation of the first four selected features for the two classes. From these figures we can see that the physiological signals corresponding to the two classes have very similar tends and the difference is almost indistinguishable to human eyes [].
The comparison experiments are conducted in the following way. The GA optimization component is running with two modes: one is randomly mapping and the other mode is graph matching mapping. The optimized parameters are listed in Table 2. GA parameters are: generation number: 16, population size: 50, crossover function: scattered, crossover fraction: 0.2, selection function: roulette, elite count 5. The results Challenge 2012 and SantaFe are shown in Fig. 18, in which horizontal axis represents GA generation and vertical axis represents error rate (fitness value of GA).
From these results we can see that the graph matching can achieve lower error rate than randomly mapping. As the generation number increases, we can see more clearly decreasing trend for graph based mapping, while the trend of random mapping curve is not so obvious and has greater fluctuation. It has to be mentioned that while the fitness value of GA algorithm decreases monotonically for optimizing deterministic system, it usually has fluctuation and does not always decrease for stochastic system. Because in a stochastic system even the system parameters are same, different initial states may yield different results. This is the reason why the error rate (fitness value of GA) in our NeuCube system dose not monotonically decrease.
To further compare the performance, Fig. 19 displays the mean error rate and standard deviation of error rate of the 10 times running on Challenge 2012. From these results we can see the performance of graph based mapping method is more effective and stable, regarding to the lower mean error rate and the smaller standard deviation.
Since Challenge 2012 is a contest data set that highly challenges traditional classification algorithms, it might be interesting to compare the performance of NeuCbe with traditional methods. The results are shown in Table 6. Note the highest accuracy of baseline algorithms is 51.6% from GAK, but the NeuCube can achieve 76.95%, a significant improvement to state-of-art results. The optimal values of the parameters given by GA that attain this result are: Spike Threshold - 0.800, Firing Threshold - 0.500, STDP Rate - 0.073, Refractory Time - 6, Mod - 0.687, Drift - 0.095.
In this paper we proposed a new mapping method to map spatiotemporal input variables into a 3D spiking neural network architecture called NeuCube that enable NeuCube models any spatiotemporal data. The weighted undirect graph matching technique is adopted here so that similar input variables based on their temporal similarity are mapped into spatially closer neurons. The closer the neurons in the SNNcube are, the more temporal relationships they learn from data. This automatic optimal mapping algorithm greatly reduces the work load for a user to find out the optimal mapping by hand in previous system and can yield better results regarding to spatiotemporal pattern recognition and early event prediction. A comparison study of the proposed mapping and randomly mapping has been conducted on a popular contest physiological data set Challenge 2012 and the results demonstrated the superiority of the proposed method over randomly mapping method.
To have a better understanding of the SNNcube model and the data modelled in it, we also proposed an algorithm based on network activation spreading to automatically reveal neuronal clusters, which in previous system were also determined by hand according to the learnt connection weights [] and thus costed lots of time. The algorithm can divide the SNNcube into different neuronal clusters based on either the spike amount communication or the connection weight between neurons. These neuronal clusters can help us to better understand the learning mechanism and the modelling ability of the SNNCube, as well as the data.
It should be mentioned that due to the scalability of the SNNcube, NeuCube is a deep architecture and it has the potential to model any complex spatiotemporal data. Our future work includes:
Improvement of the mapping method for complex temporal and spatial data, including spatiotemporal data with moving spatial coordinates;
Statistical analysis of the firing pattern and spreading trajectory in SNNcube with mining algorithms [] and how they reveal the significance of the SNNcube modelling ability;
More experiments on other ecological-, environmental-, financial- and business temporal data with large scale SNNcube to further explore the deep modelling and deep prediction capability of the proposed NeuCube on complex spatiotemporal data.
This work was funded by Education New Zealand and the Tripartite project between Auckland University of Technology, New Zealand, Shanghai Jiaotong University (East China) and Xinjiang University (West China) China. The work was initiated in the Knowledge Engineering and Discovery Research Institute (KEDRI) during the visit of Enmei Tu in KEDRI and continued in the Shanghai JiaoTong University (SJTU) in a collaborative way. KEDRI and SJTU partially funded this work. This work is partly supported by NSFC China (No: 61572315), and 973 Planï¼China (No. 2015CB856004).
Enmei Tu was born in Anhui, China. He received his B.Sc. degree and M.Sc. degree from University of Electronic Science and Technology of China (UESTC) in 2007 and 2010, respectively and PhD degree from the Institute of Image Processing and Pattern Recognition, Shanghai Jiao Tong University, China in 2014. He is now a research fellow in Roll-Royce@NTU Corporate Laboratory at Nanyang Technological University. His research interests are machine learning, computer vision and neural information processing.
Nikola Kasabov is a Fellow of the Royal Society of New Zealand, the New Zealand Computer Society and the Institute of Electrical and Electronic Engineers (IEEE). He is the founding Director and the Chief Scientist of the Knowledge Engineering and Discovery Research Centre (KEDRI) and Personal Chair of Knowledge Engineering in the School of Computing and Mathematical Sciences at AUT. His main interests are in the areas of: computational intelligence, neuro-computing, bioinformatics, neuroinformatics, speech and image processing, novel methods for data mining and knowledge discovery. He has published over 450 works in international journals and conferences, as well as books/chapters.
Jie Yang received a bachelorâs degree and a masterâs degree in Shanghai Jiao Tong University in 1985 and 1988, respectively. In 1994, he received Ph.D. in University of Hamburg, Germany. Now he is the Professor and Director of Institute of Image Processing and Pattern recognition in Shanghai Jiao Tong University. He is the principal investigator of more than 30 nation and ministry scientific research projects, including two national 973 research plan projects, three national 863 research plan projects, three national nature fundation projects, five international cooperative projects with France, Korea, Japan, New Zealand. He has published more than 5 hundreds of articles in national or international academic journals and conferences.
- Recall that the size of a three dimensional SNNcube is . Two dimensional SNNcube is a special case where .
- Note that is a highly sparse matrix which keeps constant during iterations and Step 2 usually converges after several iterations.
- The first three methods are implemented in NeuCom system: http://www.kedri.aut.ac.nz/areas-of-expertise/data-mining-and-decision-support-systems/neucom. NN and wNN are implemented based on MATLAB knnsearch function. GAK is available at: http://www.iip.ist.i.kyoto-u.ac.jp/member/cuturi/GA.html
- Currently we determine the reservoir size by trying several different ones. But as a general rule, the larger the reservoir is, the more powerful computational ability the reservoir has and thus the more complex patterns the model can learn and recognize. However, a larger reservoir consumes longer time and more memory space and needs more training data for training.
- How do the evolution of the heart rate, the respiration rate, and the blood oxygen concentration affect each other? (A correlation between breathing and the heart rate, called respiratory sinus arrhythmia, is almost always observed.)
- N. K. Kasabov, “Neucube: A spiking neural network architecture for mapping, learning and understanding of spatio-temporal brain data,” Neural Networks, vol. 52, pp. 62–76, 2014.
- T. Voegtlin, “Recursive self-organizing maps,” Neural Networks, vol. 15, no. 8, pp. 979–991, 2002.
- B. Hammer, A. Micheli, A. Sperduti, and M. Strickert, “Recursive self-organizing network models,” Neural Networks, vol. 17, no. 8, pp. 1061–1085, 2004.
- T. Mikolov, S. Kombrink, L. Burget, J. H. Černockỳ, and S. Khudanpur, “Extensions of recurrent neural network language model,” in Acoustics, Speech and Signal Processing (ICASSP), 2011 IEEE International Conference on. IEEE, 2011, pp. 5528–5531.
- I. Sutskever, J. Martens, and G. E. Hinton, “Generating text with recurrent neural networks,” in Proceedings of the 28th International Conference on Machine Learning (ICML-11), 2011, pp. 1017–1024.
- N. Kasabov, “Neucube evospike architecture for spatio-temporal modelling and pattern recognition of brain signals,” in Artificial Neural Networks in Pattern Recognition. Springer, 2012, pp. 225–243.
- N. Kasabov and E. Capecci, “Spiking neural network methodology for modelling, classification and understanding of eeg data measuring cognitive processes,” Information Sciences, vol. 294, pp. 565–575, 2015.
- D. Taylor, N. Scott, N. Kasabov, E. Capecci, E. Tu, N. Saywell, Y. Chen, J. Hu, and Z.-G. Hou, “Feasibility of neucube snn architecture for detecting motor execution and motor intention for use in bciapplications,” in Neural Networks (IJCNN), 2014 International Joint Conference on. IEEE, 2014, pp. 3221–3225.
- M. G. Doborjeh, E. Capecci, and N. Kasabov, “Classification and segmentation of fmri brain data with a neucube evolving spiking neural network model,” in Evolving and Autonomous Learning Systems (EALS), 2014 IEEE Symposium on. IEEE, 2014, pp. 73–80.
- M. G. Doborjeh and N. Kasabov, “Dynamic 3d clustering of spatio-temporal brain data in the neucube spiking neural network architecture on a case study of fmri data,” in Neural Information Processing. Springer, 2015, pp. 191–198.
- E. Capecci, J. I. Espinosa-Ramos, N. Mammone, N. Kasabov, J. Duun-Henriksen, T. W. Kjaer, M. Campolo, F. La Foresta, and F. C. Morabito, “Modelling absence epilepsy seizure data in the neucube evolving spiking neural network architecture,” in Neural Networks (IJCNN), 2015 International Joint Conference on. IEEE, 2015, pp. 1–8.
- E. Capecci, N. Kasabov, and G. Y. Wang, “Analysis of connectivity in neucube spiking neural network models trained on eeg data for the understanding of functional changes in the brain: A case study on opiate dependence treatment,” Neural Networks, vol. 68, pp. 62–77, 2015.
- N. Kasabov, “To spike or not to spike: A probabilistic spiking neuron model,” Neural Networks, vol. 23, no. 1, pp. 16–19, 2010.
- S. Song, K. D. Miller, and L. F. Abbott, “Competitive hebbian learning through spike-timing-dependent synaptic plasticity,” Nature neuroscience, vol. 3, no. 9, pp. 919–926, 2000.
- N. Kasabov, K. Dhoble, N. Nuntalid, and G. Indiveri, “Dynamic evolving spiking neural networks for on-line spatio-and spectro-temporal pattern recognition,” Neural Networks, vol. 41, pp. 188–201, 2013.
- S. Thorpe and J. Gautrais, “Rank order coding,” Computational Neuroscience, pp. 113–118, 1998.
- T. Kohonen and S.-O. Maps, “Springer series in information sciences,” Self-organizing maps, vol. 30, 1995.
- M. C. van Rossum, “A novel spike distance,” Neural Computation, vol. 13, no. 4, pp. 751–763, 2001.
- J. Dauwels, F. Vialatte, T. Weber, and A. Cichocki, “On similarity measures for spike trains,” Advances in Neuro-Information Processing, pp. 177–185, 2009.
- B. Schrauwen and J. Van Campenhout, “Bsa, a fast and accurate spike train encoding scheme,” in Proceedings of the international joint conference on neural networks, vol. 4. IEEE Piscataway, NJ, 2003, pp. 2825–2830.
- K. Dhoble, N. Nuntalid, G. Indiveri, and N. Kasabov, “On-line spatiotemporal pattern recognition with evolving spiking neural networks utilising address event representation, rank oder-and temporal spike learning,” in In Proc. WCCI 2012. Citeseer, 2012.
- F. Zhou and F. De la Torre, “Factorized graph matching,” in Computer Vision and Pattern Recognition (CVPR), 2012 IEEE Conference on. IEEE, 2012, pp. 127–134.
- E. Tu, N. Kasabov, M. Othman, Y. Li, S. Worner, J. Yang, and Z. Jia, “Neucube (st) for spatio-temporal data predictive modelling with a case study on ecological data,” in Neural Networks (IJCNN), 2014 International Joint Conference on. IEEE, 2014, pp. 638–645.
- T. Masquelier, R. Guyonneau, and S. J. Thorpe, “Spike timing dependent plasticity finds the start of repeating patterns in continuous spike trains,” PloS one, vol. 3, no. 1, p. e1377, 2008.
- ——, “Competitive stdp-based spike pattern learning,” Neural computation, vol. 21, no. 5, pp. 1259–1276, 2009.
- Y. Ikegaya, G. Aaron, R. Cossart, D. Aronov, I. Lampl, D. Ferster, and R. Yuste, “Synfire chains and cortical songs: temporal modules of cortical activity,” Science, vol. 304, no. 5670, pp. 559–564, 2004.
- M. Othman, “Spatial-temporal data modelling and processing for personalised decision support,” Ph.D. dissertation, Auckland University of Technology, 2015.
- J. J. Hopfield, “Neural networks and physical systems with emergent collective computational abilities,” Proceedings of the national academy of sciences, vol. 79, no. 8, pp. 2554–2558, 1982.
- F. R. Chung, Spectral graph theory. American Mathematical Soc., 1997, vol. 92.
- J. Shrager, T. Hogg, and B. A. Huberman, “Observation of phase transitions in spreading activation networks,” Science, vol. 236, no. 4805, pp. 1092–1094, 1987.
- D. Zhou, O. Bousquet, T. N. Lal, J. Weston, and B. Schölkopf, “Learning with local and global consistency,” Advances in neural information processing systems, vol. 16, no. 16, pp. 321–328, 2004.
- S. Worner, G. Lankin, S. Samarasinghe, D. Teulon, S. Zydenbos et al., “Improving prediction of aphid flights by temporal analysis of input data for an artificial neural network,” New Zealand Plant Protection, pp. 312–316, 2002.
- N. Kasabov, V. Feigin, Z.-G. Hou, Y. Chen, L. Liang, R. Krishnamurthi, M. Othman, and P. Parmar, “Evolving spiking neural networks for personalised modelling, classification and prediction of spatio-temporal patterns with a case study on stroke,” Neurocomputing, vol. 134, pp. 269–279, 2014.
- K. Hechenbichler and K. Schliep, “Weighted k-nearest-neighbor techniques and ordinal classification,” 2004.
- X. Wu, V. Kumar, J. R. Quinlan, J. Ghosh, Q. Yang, H. Motoda, G. J. McLachlan, A. Ng, B. Liu, S. Y. Philip et al., “Top 10 algorithms in data mining,” Knowledge and Information Systems, vol. 14, no. 1, pp. 1–37, 2008.
- M. Cuturi, “Fast global alignment kernels,” in Proceedings of the 28th International Conference on Machine Learning (ICML-11), 2011, pp. 929–936.
- M. Korürek and A. Nizam, “Clustering mit–bih arrhythmias with ant colony optimization using time domain and pca compressed wavelet coefficients,” Digital Signal Processing, vol. 20, no. 4, pp. 1050–1060, 2010.
- M. Engin, “Ecg beat classification using neuro-fuzzy network,” Pattern Recognition Letters, vol. 25, no. 15, pp. 1715–1722, 2004.
- Y. Ichimaru and G. Moody, “Development of polysomnographic database on cd-rom,” Psychiatry and Clinical Neurosciences, vol. 53, no. 2, pp. 175–177, 1999.
- A. Skabardonis, P. Varaiya, and K. Petty, “Measuring recurrent and nonrecurrent traffic congestion,” Transportation Research Record: Journal of the Transportation Research Board, no. 1856, pp. 118–124, 2003.
- M. T. Bahadori, D. Kale, Y. Fan, and Y. Liu, “Functional subspace clustering with application to time series,” in Proceedings of the 32nd International Conference on Machine Learning (ICML-15), 2015, pp. 228–237.
- E. Tu, J. Yang, J. Fang, Z. Jia, and N. Kasabov, “An experimental comparison of semi-supervised learning algorithms for multispectral image classification,” Photogrammetric Engineering & Remote Sensing, vol. 79, no. 4, pp. 347–357, 2013.
- S. Li, Z. Wang, G. Zhou, and S. Y. M. Lee, “Semi-supervised learning for imbalanced sentiment classification,” Proceedings-International Joint Conference on Artificial Intelligence, vol. 22, no. 3, p. 1826, 2011.
- E. Tu, J. Yang, N. Kasabov, and Y. Zhang, “Posterior distribution learning (pdl): A novel supervised learning framework using unlabeled samples to improve classification performance,” Neurocomputing, vol. 157, pp. 173–186, 2015.
- B. R. Gibson, T. T. Rogers, and X. Zhu, “Human semi-supervised learning,” Topics in cognitive science, vol. 5, no. 1, pp. 132–172, 2013.
- E. Tu, L. Cao, J. Yang, and N. Kasabov, “A novel graph-based k-means for nonlinear manifold clustering and representative selection,” Neurocomputing, vol. 143, pp. 109–122, 2014.