Unsupervised Anomaly Detection in Stream Data with Online Evolving Spiking Neural Networks
Abstract
In this work, we propose a novel OeSNN-UAD (Online evolving Spiking Neural Networks for Unsupervised Anomaly Detection) approach for online anomaly detection in univariate time series data. Our approach is based on evolving Spiking Neural Networks (eSNN). Its distinctive feature is that the proposed eSNN architecture learns while classifying input values as anomalous or not. In fact, we offer an unsupervised learning method for eSNN, in which classification is carried out without earlier pretraining of the network on data with labeled anomalies. Unlike in a typical eSNN architecture, neurons in the output repository of our architecture are not divided into decision classes known a priori. Each output neuron is assigned its own output value, which is modified in the course of learning and classifying the incoming input values of time series data. To better adapt to the changing characteristics of the input data and to make their classification efficient, the number of output neurons is limited: older neurons are replaced with new neurons whose output values and synaptic weights are adjusted according to the current input values of the time series. The proposed OeSNN-UAD approach was experimentally compared to state-of-the-art unsupervised methods and algorithms for anomaly detection in stream data. The experiments were carried out on the Numenta Anomaly Benchmark and the Yahoo Anomaly Datasets. According to the results of these experiments, our approach significantly outperforms other solutions provided in the literature in the case of the Numenta Anomaly Benchmark. Also in the case of the real data files category of the Yahoo Anomaly Benchmark, OeSNN-UAD outperforms other selected algorithms, whereas in the case of the Yahoo Anomaly Benchmark synthetic data files, it provides results competitive with those recently reported in the literature.
I Introduction
Unsupervised anomaly discovery in stream data is a research topic with important practical applications. For example, an Internet system administrator may be interested in recognizing abnormally high activity on a web page caused by a hacker attack. An unexpected spike in CPU usage in a computer system is another example of anomalous behaviour that may require investigation. Correct detection and classification of such anomalies may enable optimization of the performance of the computer system. However, in many cases it is not easy to collect enough training data with labeled anomalies for supervised learning of an anomaly detector, which could later be used for identification of real anomalies in stream data. It is thus particularly important to design anomaly detectors that can properly classify anomalies in data where none of the input values is labeled as being anomalous or not. Moreover, since the characteristics of the input data stream typically vary, the designed anomaly detector should learn in an online mode, in which the classification of current input values adjusts the state of the detector for better anomaly detection in future input data.
In order to design an effective anomaly detection system, one may consider the adaptation of evolving Spiking Neural Networks (eSNN) to the task. eSNN are a subclass of Spiking Neural Networks (SNN), in which learning processes, neuronal communication and classification of data instances are based solely on spike exchange between neurons [30]. The architecture of an eSNN network consists of two layers of neurons: input and output. The aim of the input layer of neurons is to transform input data instances into spikes. Depending on the type of input data, the transformation can be carried out by means of temporal encoding methods such as Step-Forward or Threshold-Based [28, 24] or Gaussian Receptive Fields [23]. The distinctive feature of an eSNN is the evolving repository of output neurons, which in the training phase of the network is updated with a new output neuron that is created for each new input data sample presented to eSNN [17, 18]. In particular, each newly created output neuron can be either added to the output repository or, based on the provided similarity threshold, merged with one of the neurons already existing in the repository.
Recently, OeSNN, an extension of eSNN for online classification of stream data, was proposed in [23]. Contrary to the eSNN architecture, the size of the output neurons repository in OeSNN is limited: older neurons are removed from the repository and replaced with new neurons. It was shown in [23] that OeSNN is able to classify input stream data quickly, while preserving restrictive memory limits. Considering all the positive features of eSNN and OeSNN, in this article we offer a novel OeSNN-UAD (Online evolving Spiking Neural Networks for Unsupervised Anomaly Detection) approach for unsupervised anomaly detection in stream data.
Our main contributions presented in this article are as follows:

we introduce an unsupervised learning model of OeSNN, in which output neurons do not have classes assigned according to class labels present in the input stream data. Instead: 1) each output neuron has an output value that is generated based on the current and past characteristics of the data stream when the neuron is created and added to the repository, 2) the output values of output neurons are corrected based on the classification of input values as being anomalous or not. Hence, to correctly detect anomalies in stream data, the proposed OeSNN-UAD approach does not need any input values labeled as anomalies.

as a part of the proposed OeSNN-UAD architecture, we offer a new anomaly classification method, which classifies each input value as anomalous or not by comparing the prediction error obtained for that value with the average and the standard deviation of the past prediction errors of a given time window.

we derive an important property of the eSNN neuronal model, which shows that the values of the actual postsynaptic potential thresholds of all output neurons are the same. This property eliminates the necessity of recalculating these thresholds when output neurons of eSNN are updated in the course of the learning process and increases the speed of classification of input stream data.

we demonstrate experimentally that the proposed approach is more effective in the unsupervised detection of anomalies in stream data of the Numenta Anomaly Benchmark and the Yahoo Anomaly Datasets than other state-of-the-art approaches proposed in the literature.

eventually, we argue that the proposed anomaly detection architecture is able to classify input values quickly and to work in environments with restrictive memory limits.
The paper is structured as follows. In Section II, we provide a description of the related work. The proposed OeSNN-UAD approach is offered in Section III, which also contains theoretical properties of OeSNN-UAD and their proofs. In Section IV, we discuss the experimental evaluation. First, we give an overview of the used datasets and characterize the experimental setup. Then, we provide the results of anomaly detection with the proposed approach and with the state-of-the-art solutions. We conclude our work in Section V.
II Related Work
Unsupervised anomaly detection in time series data is an important task, which attracts the attention of researchers and practitioners. A number of solutions to this task have been offered in the literature to date. The state-of-the-art algorithms for anomaly detection are:

Numenta and NumentaTM [2]: two slightly different algorithms that consist of the following modules: (i) a Hierarchical Temporal Memory (HTM) network for predicting the current value of an input stream data, (ii) an error calculation module, and (iii) an anomaly likelihood calculation module, which classifies the input value as an anomaly or not based on the likelihood of the calculated error. Both algorithms are implemented in Python and available in the NAB set of algorithms. NumentaTM and Numenta differ in the way of the HTM implementation and its parameters initialization.

HTM Java [14]: a Java implementation of the Numenta algorithm.

Skyline [35]: an algorithm based on an ensemble of several outlier detectors, such as Grubbs' test for outliers or a simple comparison of the current input value of time series against the deviation from the average of past values. In Skyline, a given input value of time series is classified as an anomaly if it is marked as anomalous by the majority of the ensemble detectors. Skyline is implemented in Python and available as a part of the NAB benchmark.

Twitter ADVec [19]: a method for anomaly detection based on the Seasonal Hybrid ESD (S-H-ESD) algorithm [9]. For given time series values, the S-H-ESD algorithm first calculates extreme Studentized deviates [31] of these values and then decides, based on a statistical test, which of these values should be marked as outliers. The Twitter ADVec method is currently implemented both as an R language package and as a part of the NAB benchmark.

Yahoo EGADS (Extensible Generic Anomaly Detection System) [22]: an algorithm consisting of the following modules: (i) a time-series modeling module, (ii) an anomaly detection module, and (iii) an alerting module. EGADS is able to discover three types of anomalies: outliers, sudden change-points in values, and anomalous subsequences of time series. To this end, the following three different anomaly detectors were implemented in EGADS: (i) time series decomposition and prediction for outlier detection, (ii) a comparison of values of current and past time windows for change-point detection, and (iii) clustering and decomposition of time series for detection of anomalous subsequences.

DeepAnT [26]: a semi-supervised deep learning method. DeepAnT operates with both Convolutional Neural Networks and Long Short-Term Memory networks and consists of several modules, such as a time series prediction module and an anomaly detector module. Contrary to the approach proposed here, DeepAnT divides the classified time series values into training and testing parts. First, DeepAnT learns from the training data, and then it is used for the classification of the test data. The advantage of our OeSNN-UAD method over DeepAnT is its ability to learn the correct classification of anomalies based on the whole provided time series, rather than only on its training part.

Bayesian Changepoint [1]: an online algorithm for sudden change-point detection in time series data by means of Bayesian inference. This method is particularly suited to time series data in which it is possible to clearly separate partitions of values generated from different probability distributions. The algorithm is able to infer the most recent change-point in the current input values based on the analysis of probability distributions of time series partitions, which are created from change-points registered in the past values.

EXPected Similarity Estimation (EXPoSE) [32]: an algorithm that classifies anomalies based on the deviation of an input observation from an estimated distribution of past input values.

KNN-CAD [4]: a method for univariate time series data based on nearest neighbors classification. The KNN-CAD method first transforms time series values into a Caterpillar matrix. Such a matrix is created both for the most recent input value (which is classified as an anomaly or not) and for a sequence of past values, which are used as reference data. Next, the Non-Conformity Measure (NCM) is calculated for both the classified value and the reference values using the created Caterpillar matrix. Eventually, the anomaly score of the classified input value is obtained by comparing its NCM with the NCMs of the reference values.

Relative Entropy [36]: a method which uses a relative entropy metric (the Kullback-Leibler divergence) of two data distributions to decide if a series of input values can be classified as anomalies.

ContextOSE [34]: an algorithm that creates a set of contexts of time series according to the characteristics of its values. A subsequence of the most recent input values is classified as anomalous if its context differs significantly from the contexts of past subsequences of values, which are stored in the operating memory of ContextOSE.
In addition to the above-presented methods and algorithms, which are directly compared with our approach in the experimental evaluation, other approaches to unsupervised anomaly detection in time series data were proposed. In [27], an unsupervised approach to anomaly detection, which combines the ARIMA (Autoregressive Integrated Moving Average) method and Convolutional Neural Networks (CNN), was provided. [12] introduced an unsupervised anomaly detection method integrating Long Short-Term Memory (LSTM) networks and One-class Support Vector Machines. The method proposed in [12] uses the online learning of LSTM offered in [13]. A supervised eSNN approach to anomaly detection, called HESADM, was proposed in [11]. In this approach, the eSNN network first learns based on the training data, and then it is used for anomaly detection. All data samples presented to the detector are labeled as being either anomalous or not, and there is a clear distinction between the training and testing phases. In [10], a semi-supervised approach to anomaly classification with one-class eSNN was offered and dedicated to intrusion detection systems. Contrary to the approaches presented in [11, 10], our OeSNN-UAD approach learns to recognize anomalies in an unsupervised mode, in which anomaly labels are not assigned to data samples.
III OeSNN-UAD: the Proposed Anomaly Detection Model based on Online Evolving Spiking Neural Networks
In this section, we present our online approach to unsupervised anomaly detection in stream data, called OeSNN-UAD, which is based on the eSNN network. First, we overview the proposed architecture of OeSNN-UAD. Then, we describe the applied encoding of input values and the used neuronal model. Eventually, we present our algorithm for anomaly detection in stream data.
III-A The Architecture of OeSNN-UAD
The eSNN network of the proposed OeSNN-UAD architecture consists solely of an input layer and an output layer. The input layer contains input neurons and their Gaussian Receptive Fields (GRFs). The output layer is an evolving repository of output neurons. The proposed architecture of OeSNN-UAD is presented in Fig. 1.
The set of input neurons is denoted by , while the set of output neurons by . The number of input neurons is fixed and determined by a user-given parameter , whereas the maximal number of output neurons is given by , which is also a user-specified parameter value.
The input stream data is denoted by , and is defined as a series of real values . A window with regard to is denoted by and is defined as , where is a user-specified parameter called window size. Clearly, window can be obtained from by removing the first value from , shifting the remaining values by one position to the left and adding at the last position.
When value occurs in stream data, and thus becomes subject to classification, the values in window are used to determine the GRFs of input neurons. Next, is encoded by means of GRFs into a sequence of spikes, which are then used to update the repository of output neurons. Eventually, is classified as an anomaly or not. To this end, the errors of the eSNN network predictions for non-anomalous values in are used. In the remainder of the article, in the case when denotes the current time, we may also write briefly instead of .
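The sliding-window bookkeeping described above can be sketched in a few lines. This is our own minimal illustration, not the paper's implementation; the class and field names are hypothetical:

```python
from collections import deque

# Minimal sketch of the sliding window: it holds the most recent
# `w_size` input values; appending a new value drops the oldest one.
class SlidingWindow:
    def __init__(self, w_size):
        self.w_size = w_size                 # user-specified window size
        self.values = deque(maxlen=w_size)   # deque enforces the size limit

    def update(self, x_t):
        # Equivalent to shifting all values left by one and adding x_t last.
        self.values.append(x_t)

w = SlidingWindow(3)
for x in [1.2, 0.9, 1.1, 5.0]:
    w.update(x)
print(list(w.values))  # the three most recent values
```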
In Table I, we list the notation used in the algorithms presented in this article.
Notation  Description  Value
Stream of input data
Time window of input values
Size of time window
Input value at time
OeSNN-UAD prediction of
Vector of predicted values
Input neurons
Number of input neurons
th neuron in the set of input neurons
Excitation of GRF of neuron for value
Firing time of input neuron for value
Mean and variance of input values in
Normal distribution
Repository of output neurons
Number of output neurons in repository
User-given similarity threshold  (0, 1]
th output neuron from repository
Vector of synaptic weights of output neuron
Output value of output neuron
Number of updates of output neuron
New candidate output neuron
Error correction factor  (0, 1]
Vector of error values between and
Anomaly classification factor  2
III-B Input Layer of the Proposed OeSNN-UAD Approach
The aim of the input neurons of the eSNN network and their GRFs is to encode input values of a data stream into firing orders of these neurons. The firing orders are then used for the learning of the eSNN network and for the detection of anomalies. The encoding of the input value into firing orders of input neurons is performed in several steps. First, based on the actual time window , the GRFs of input neurons are recalculated. In particular, the maximal value in window (denoted by ) and the minimal value (denoted by ) are used to calculate the centers and widths of the GRFs. For each th input neuron, where , the center and width of its GRF are defined according to Eq. (1) and Eq. (2), respectively:
(1) 
(2) 
Please note that the widths of GRFs of all input neurons are the same.
Based on the center and the width of an input neuron , the excitation of each GRF for an input value is calculated according to Eq. (3):
(3) 
The excitation of the th GRF translates into the firing time of the related input neuron according to Eq. (4). in Eq. (4) denotes the user-given basic synchronization time of firings of input neurons in eSNN.
(4) 
The firing times of input neurons are used to calculate their firing orders. Let denote a list of all input neurons in sorted non-decreasingly with respect to their firing times, in such a way that for each pair of input neurons and in that have the same firing times and such that , input neuron precedes input neuron on the list. Then, the firing order of an input neuron , where , equals the position of this input neuron on the list decreased by 1.
Let us consider an example of the encoding of value given in Fig. 2. The size of window is . The GRF parameters and are 0.1 and 1.0, respectively. Seven neurons with seven associated GRFs are used in the input layer. In Fig. 2, the firing times are calculated with synchronization time equal to 1.0. The input value translates into the following firing times of input neurons:

,

,

,

,

,

,

.
According to the obtained firing times of input neurons, their firing ordering is as follows: order(4) = 0, order(3) = 1, order(5) = 2, order(2) = 3, order(6) = 4, order(1) = 5, order(0) = 6.
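The encoding pipeline above (GRF excitation, firing times, firing orders) can be sketched as follows. Since the precise center/width formulas of Eqs. (1)-(2) are not reproduced here, this sketch assumes evenly spaced Gaussian receptive fields over the window's value range; the function name and parameterization are ours:

```python
import numpy as np

# Sketch of GRF-based encoding (Sec. III-B). Assumption: GRF centers are
# evenly spaced over [I_min, I_max] with a common width; the paper's
# Eqs. (1)-(2) define the exact placement.
def encode(x, window, ni_size, ts=1.0):
    i_min, i_max = min(window), max(window)
    span = max(i_max - i_min, 1e-9)                 # guard against a flat window
    centers = np.linspace(i_min, i_max, ni_size)    # one GRF center per input neuron
    sigma = span / (ni_size - 1)                    # common GRF width
    # Gaussian excitation of each GRF for input value x (Eq. (3) style)
    exc = np.exp(-((x - centers) ** 2) / (2 * sigma ** 2))
    # Higher excitation -> earlier firing; ts is the synchronization time (Eq. (4) style)
    firing_times = ts * (1.0 - exc)
    # Firing order = position on the list sorted non-decreasingly by firing time;
    # stable sort breaks ties by neuron index, as in the text.
    order = np.empty(ni_size, dtype=int)
    order[np.argsort(firing_times, kind="stable")] = np.arange(ni_size)
    return firing_times, order
```

For instance, with a window covering the values 0 to 6 and seven input neurons, the neuron whose GRF center is closest to the encoded value fires first (order 0), mirroring the example above where order(4) = 0.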
III-C Neuronal Model of Output Neurons
In our approach, we apply a simplified Leaky Integrate-and-Fire (LIF) neuronal model of output neurons, as presented in [23]. According to this model, an output neuron accumulates its Postsynaptic Potential (PSP) until it reaches an actual postsynaptic potential threshold . Then the output neuron fires and its PSP value is reset to 0. The accumulation of the PSP potential of an output neuron is given in Eq. (5):
(5) 
where represents the weight of the synapse from input neuron to output neuron , is a user-given modulation factor within the range , and is the firing order of the input neuron for the encoding of the value of . In the proposed approach, the postsynaptic potential of each output neuron is reset to 0 (regardless of whether actually fired or not) and recalculated for each input value being classified.
The distinctive feature of (O)eSNN is the creation of a candidate output neuron for each value of the input data stream. When a new candidate output neuron is created for , its synaptic weights are initialized according to the firing orders of input neurons for the encoding. The initial weights of the synapses between each input neuron in and the candidate output neuron are calculated according to Eq. (6).
(6) 
A candidate output neuron, say , is also characterized by two additional parameters: the maximal postsynaptic potential threshold and the actual postsynaptic potential threshold . The definition of is given in Eq. (7):
(7) 
where is the firing order function calculated for this input value for which candidate output neuron was created. The definition of the actual postsynaptic potential threshold is given in Eq. (8):
(8) 
where C is a user-given constant value from the interval (0, 1).
Example 1
To illustrate how the synaptic weights of a new candidate output neuron are calculated, let us consider the example of encoding the value of with seven input neurons presented in Fig. 2, as well as the parameters of the neuronal model set to and , respectively. The previously calculated firing orders of the seven input neurons for the encoded value are as follows: order(4) = 0, order(3) = 1, order(5) = 2, order(2) = 3, order(6) = 4, order(1) = 5, order(0) = 6. In such a case, the weights of the synapses between the input neurons and neuron are initialized as follows:

,

,

,

,

,

,

.
For the encoding of Example 1, the value of maximal postsynaptic potential threshold calculated according to Lemma 1 is .
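Following Eqs. (6)-(8), a candidate neuron's weights are powers of the modulation factor determined by the firing orders, its maximal PSP threshold is the sum of squared weights, and its actual threshold is a fixed fraction of that sum. A minimal sketch, with names of our own choosing (`mod` for the modulation factor, `C` for the threshold fraction):

```python
# Sketch of candidate output neuron initialization (Eqs. (6)-(8)).
# `order` is the list of firing orders of the input neurons for the
# encoded input value; `mod` in (0, 1) and `C` in (0, 1) are user-given.
def init_candidate(order, mod, C):
    weights = [mod ** o for o in order]                            # Eq. (6)
    gamma_max = sum(w * mod ** o for w, o in zip(weights, order))  # Eq. (7): sum of mod^(2*order)
    theta = C * gamma_max                                          # Eq. (8)
    return weights, gamma_max, theta

# Example with the firing orders 0..6 from Example 1 (mod and C chosen here):
w, gamma_max, theta = init_candidate([0, 1, 2, 3, 4, 5, 6], mod=0.5, C=0.6)
```

Note that the weights depend only on which order each neuron received, so the multiset of weight values is identical for every candidate neuron; only their assignment to synapses differs.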
Each candidate output neuron, say , is either added to the repository or merged with some output neuron in . An additional parameter provides the information about how many candidate output neurons was created from. equal to 1 means that is a former candidate output neuron and preserves the values of its parameters. Now, each time an output neuron built from former candidate output neurons is merged with a current candidate output neuron , each weight of the synapse between output neuron and input neuron is recalculated as shown in Eq. (9), is recalculated as shown in Eq. (10), and is recalculated according to Eq. (11):
(9) 
(10) 
(11) 
In addition, is increased by 1 to reflect the fact that one more candidate output neuron was used to obtain an updated version of output neuron .
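The merge step of Eqs. (9)-(11) keeps each stored neuron's parameters as running averages over all candidates merged into it. A sketch with hypothetical field names (`w` for the weight vector, `y` for the output value, `tau` for the initialization time, `M` for the update counter):

```python
# Sketch of the merge of a candidate into an existing output neuron
# (Eqs. (9)-(11)): new value = (candidate + M * old) / (M + 1),
# i.e. a running average over the M+1 candidates seen so far.
def merge(neuron, candidate):
    m = neuron["M"]
    neuron["w"] = [(wc + m * wn) / (m + 1)
                   for wc, wn in zip(candidate["w"], neuron["w"])]
    neuron["y"] = (candidate["y"] + m * neuron["y"]) / (m + 1)
    neuron["tau"] = (candidate["tau"] + m * neuron["tau"]) / (m + 1)
    neuron["M"] = m + 1  # one more candidate absorbed

n = {"w": [1.0], "y": 2.0, "tau": 0.0, "M": 1}
c = {"w": [0.0], "y": 0.0, "tau": 2.0, "M": 1}
merge(n, c)
```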
III-D Properties of the Neuronal Model of Output Neurons
In this subsection, we formulate and prove properties of the candidate output neurons and the output neurons in .
Lemma 1
For each candidate output neuron , the following holds:
(i) the sum of its synaptic weights equals ,
(ii) = ,
(iii) = .
Proof. By Eq. (6), the vector of synaptic weights of any candidate output neuron, say , consists of the following elements: (which may be stored in different order in distinct candidate vectors), and value of is the sum of the squares of these elements by Eq. (7). Thus, the sum of all synaptic weights of each candidate output neuron trivially equals and = . By Eq. (8), = .
Theorem 1
For each output neuron , the following holds:
(i) the sum of its synaptic weights equals ,
(ii) = ,
(iii) = .
Proof. The proof follows immediately from Lemma 1 in the case when an output neuron, say is composed of one candidate output neuron; that is, when = 1. Now, we will focus on the case when output neuron is constructed from , where > 1, candidate output neurons: , , …, . Then, by Eq. (9), the vector of synaptic weights of output neuron is the average of the vectors of synaptic weights of these candidate vectors. Hence and by Lemma 1.(i), the sum of synaptic weights of output neuron equals = . By Eq. (10), of output neuron is the average of of candidate vectors , , …, . Hence, and by Lemma 1.(ii), = = . By Eq. (11), is the average of of candidate vectors , , …, . Hence, and by Lemma 1.(iii), = = .
Corollary 1 follows immediately from Theorem 1 and the fact that and are sums of consecutive elements of geometric series:
Corollary 1
For each output neuron , the following holds:
(i) the sum of its synaptic weights equals ,
(ii) = ,
(iii) = .
III-E Learning Algorithm of OeSNN-UAD
In the course of the learning of (O)eSNN as presented in [23], the weights of synapses between input neurons and output neurons are shaped during the supervised learning of the network. The proposed learning method of OeSNN-UAD does not use supervised learning. The OeSNN-UAD approach to anomaly classification can be summarized by its two phases, performed for each input value of stream data obtained at time :

The sliding window is updated with value and the GRFs of input neurons are initialized. The value of is used to calculate the firing times and firing orders of neurons in . Next, the input value is classified as anomalous or not in three steps: a neuron , which fired first, is obtained. The output value of this neuron is reported as a prediction and classified as either an anomaly or not. If an anomaly is not detected, the output value of is corrected according to the input value .

Next, a new candidate output neuron is created and initialized. The candidate neuron initialization procedure is performed in three steps: the synapses to all input neurons are created and their weights are calculated according to Eq. (6). Next, the output value of the candidate neuron as well as its initialization time are calculated. Eventually, the candidate output neuron is used to update the repository of output neurons.
In Algorithm 1, we present the main learning procedure of OeSNN-UAD. All of the input learning and data encoding parameters are constant during the learning and classification of input values in OeSNN-UAD. First, the current size of the output repository is set to . Next, based on the fact that the values of the actual postsynaptic potential thresholds are the same for all output neurons in (according to Theorem 1.(iii) and Corollary 1.(iii)), their common actual postsynaptic potential threshold, denoted by , is calculated once according to Corollary 1.(iii). Then, the sliding window is initialized with input values from . These values are not classified as anomalies.
Next, in step 11 of Algorithm 1, the detection of anomalies among the next input values from begins. First, time window is updated with value , which will be classified, and the GRFs of input neurons are initialized based on the content of window , as presented in Algorithm 2. Next, the value of the output neuron that fires first is obtained. The value is reported as a prediction of input value . In order to obtain the first firing output neuron , we propose the FiresFirst function presented in Algorithm 6. Specifically, the output neuron firing first is obtained as follows:
First, the postsynaptic potentials of all output neurons in are reset to 0. Next, in the loop in which variable iterates over identifiers of input neurons, starting from the one with the least order value (0) to the one with the greatest order value (), for each output neuron in , say , a lower approximation of its , denoted by , is calculated incrementally. As a result, after iterations, where , , where is the input neuron whose order equals , ; that is, , and , , …, . After the first iteration in which of at least one output neuron exceeds the threshold (and thus its also exceeds the threshold), no further iterations are carried out. In such a case, each output neuron identified so far whose exceeds is added to the ToFire list. is found as the output neuron in ToFire with the greatest value of , and is returned as the result of the FiresFirst function. Please note that the proposed method of calculating increasingly precise lower approximations of of output neurons guarantees that is found in a minimal number of iterations. If within iterations no output neuron with is found, FiresFirst returns None to indicate that no output neuron in fired. In this case, value is classified as anomalous, and the prediction of the network as well as the error value are set to and , respectively. Otherwise, the prediction of the network is equal to and the absolute difference between and is set as the value of error .
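The essential behaviour of FiresFirst can be sketched without the incremental lower-bound optimization: accumulate each output neuron's full PSP over the input firing orders, and return the neuron with the greatest PSP among those crossing the common threshold, or None if no neuron fires. This is our simplified rendering, not Algorithm 6 itself, which additionally stops early by tightening lower bounds on the PSPs iteration by iteration:

```python
# Simplified (non-incremental) sketch of FiresFirst. `neurons` is a list of
# output neurons with weight vectors under key "w"; `order` holds the firing
# orders of the input neurons; `theta` is the common firing threshold.
def fires_first(neurons, order, mod, theta):
    best, best_psp = None, theta
    for n in neurons:
        # PSP accumulation as in Eq. (5): sum of w_j * mod^order(j)
        psp = sum(w * mod ** o for w, o in zip(n["w"], order))
        if psp > best_psp:
            best, best_psp = n, psp
    return best  # None when no neuron's PSP exceeds theta

n1 = {"w": [1.0, 0.5]}
n2 = {"w": [0.5, 1.0]}
winner = fires_first([n1, n2], order=[0, 1], mod=0.5, theta=0.9)
```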
The ClassifyAnomaly function, given in Algorithm 8, performs the anomaly classification (which we describe in more detail in the next subsection). It returns a Boolean value indicating the presence or absence of an anomaly for the input value . If an anomaly is not detected ( is ), then the ValueCorrection function is called with parameters and . The function, presented in Algorithm 7, adjusts the output value (reported as a prediction at time ) to the input value. Specifically, the value is increased or decreased by the factor of the difference ().
In step 26 of Algorithm 1, a new candidate output neuron is created and initialized by the function InitializeNeuron, presented in Algorithm 3. The function InitializeNeuron first creates synapses between the new candidate neuron and each input neuron in . Then, the initial synaptic weights are calculated according to the firing orders of input neurons in obtained for an input value . Next, the output value of is generated from a normal distribution created based on the input values currently falling into window , and finally the initialization time is set to the current input value time .
Step 30 of Algorithm 1 calls the function FindMostSimilar, presented in Algorithm 4, which finds the output neuron such that the Euclidean distance between the vectors of synaptic weights of and is the smallest. If is less than or equal to the value of the threshold, then is updated according to the function UpdateNeuron presented in Algorithm 5 and is discarded. Otherwise, if the number of output neurons in repository has not yet reached , then is added to and the counter is incremented. If both the similarity condition is not fulfilled and the repository is full, then the candidate output neuron replaces the oldest neuron in .
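The merge-or-add-or-replace logic of this repository update can be sketched as follows. Field names (`w`, `y`, `tau`, `M`) and the inline merge helper are our own illustration of the step; the similarity threshold `sim` and maximal repository size `no_size` correspond to the user-given parameters:

```python
import math

def _merge(neuron, cand):
    # Running-average merge of the candidate into the stored neuron
    # (in the spirit of Eqs. (9)-(11)).
    m = neuron["M"]
    neuron["w"] = [(a + m * b) / (m + 1) for a, b in zip(cand["w"], neuron["w"])]
    neuron["y"] = (cand["y"] + m * neuron["y"]) / (m + 1)
    neuron["tau"] = (cand["tau"] + m * neuron["tau"]) / (m + 1)
    neuron["M"] = m + 1

# Sketch of the repository update: merge with the most similar neuron if it
# is close enough, otherwise add the candidate, otherwise replace the oldest.
def update_repository(repo, cand, no_size, sim):
    if repo:
        nearest = min(repo, key=lambda n: math.dist(n["w"], cand["w"]))
        if math.dist(nearest["w"], cand["w"]) <= sim:
            _merge(nearest, cand)
            return
    if len(repo) < no_size:
        repo.append(cand)
    else:
        oldest = min(range(len(repo)), key=lambda i: repo[i]["tau"])
        repo[oldest] = cand  # evict the oldest neuron

repo = []
update_repository(repo, {"w": [1.0], "y": 0.0, "tau": 0.0, "M": 1}, 1, 0.1)
update_repository(repo, {"w": [1.05], "y": 2.0, "tau": 1.0, "M": 1}, 1, 0.1)  # merged
update_repository(repo, {"w": [5.0], "y": 0.0, "tau": 2.0, "M": 1}, 1, 0.1)   # replaces oldest
```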
The function UpdateNeuron, presented in Algorithm 5, updates the vector of synaptic weights of output neuron as well as its output value and initialization time. The updated values are weighted averages of all previous () values of output neuron and the respective values of candidate output neuron . Eventually, the procedure increments the update counter of output neuron .
III-F Anomaly Classification
Given input value at moment obtained from the input stream and the prediction of this value made by OeSNN-UAD, the aim of the anomaly classification module is to decide whether should be classified as an anomaly or not. The approaches proposed in the literature, such as those presented in [25, 2, 26, 27], either simply calculate an error between the predicted and the real value and compare it against a threshold value to decide if an anomaly occurred, or use a window of past errors to construct a statistical distribution and obtain the probability of predicting an error for . With a low probability of such an error, the observation is classified as an anomaly. In both approaches, a constant error threshold value for anomaly classification is used. [5] takes a different approach and proposes to adapt the error threshold for anomaly classification according to the changing characteristics of the stochastic process generating the input data.
In our approach, presented in Algorithm 8, a vector of error values calculated between the predicted and observed values of window is used to decide whether observation should be classified as anomalous or not. The error between and its prediction is calculated as the absolute difference between these two values: . Given the vector of error values obtained for all input values of already presented to the network and their predictions in , a vector of those past error values of whose respective input values were not classified as anomalies is obtained. If is empty, then the procedure returns , which indicates the absence of an anomaly for . Otherwise, the mean and the standard deviation of the error values in are calculated and used to classify as either an anomaly or not. If the difference between values and is greater than , where is a user-given parameter, then is classified as an anomaly; otherwise, it is not.
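The classification rule above reduces to a mean-plus-scaled-deviation test over past non-anomalous errors. A minimal sketch; the function and parameter names are ours (`epsilon` denotes the anomaly classification factor, with the default value 2 given in the notation table):

```python
import statistics

# Sketch of the anomaly classification rule of Sec. III-F: the current
# prediction error e_t is anomalous when it exceeds the mean of past
# non-anomalous errors by more than epsilon standard deviations.
def classify_anomaly(e_t, past_errors, epsilon=2.0):
    if not past_errors:          # no non-anomalous history yet: report no anomaly
        return False
    mu = statistics.mean(past_errors)
    sd = statistics.pstdev(past_errors)
    return (e_t - mu) > epsilon * sd
```

Note that because only errors of values previously classified as non-anomalous enter the statistics, a detected anomaly does not inflate the error baseline used for subsequent decisions.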