Pelican: A Deep Residual Network for Network Intrusion Detection
One challenge in building a secure network communication environment is how to effectively detect and prevent malicious network behaviours. Abnormal network activities threaten users’ privacy and can damage the function and infrastructure of the whole network. To address this problem, the network intrusion detection system (NIDS) has been used. By continuously monitoring network activities, the system can identify attacks in a timely manner and prompt counter-attack actions. NIDS has been evolving over the years. The current-generation NIDS incorporates machine learning (ML) as the core technology in order to improve the detection performance on novel attacks. However, the high detection rate achieved by a traditional ML-based detection method is often accompanied by a large number of false alarms, which greatly affects its overall performance. In this paper, we propose a deep neural network, Pelican, that is built upon specially-designed residual blocks. We evaluated Pelican on two network traffic datasets, NSL-KDD and UNSW-NB15. Our experiments show that Pelican can achieve a high attack detection performance while keeping a much lower false alarm rate when compared with a set of up-to-date machine learning based designs.
With the continuously expanding network scale, network attacks are becoming more and more frequent, volatile and advanced. From mobile phones and personal computers to industrial servers, all networked devices are potentially under threat from malicious intrusion activities. Effectively discovering and preventing such intrusions is very challenging. Many cybersecurity enterprises and research institutes, such as Chronicle, Symantec and McAfee, have been developing network intrusion detection systems (NIDS) to safeguard their networked computing environments.
Fig. 1 shows a general setting of using such a detection system, where NIDS sits within the network, continuously monitors in-out network traffic, and reports any suspicious behaviours to the security team for further attack identification and containment.
NIDS designs were initially signature-based and were only effective for detecting known attacks. With the increasingly large volume of traffic data generated each day and the need to detect volatile and advanced attacks in a timely manner (as the number of network users grows), artificial intelligence (AI) and machine learning (ML) technology has come into play in NIDS design.
Most of the existing ML-based NIDS designs, however, achieve a high detection rate at the cost of a large number of false alarms, which inevitably adds unnecessary workload for the security team and may delay counter-attack responses, thereby adversely affecting the overall network security.
In this paper, we aim to develop a deep neural network (DNN) design to achieve high intrusion detection accuracy. Our main contributions are as follows:
We investigate performance degradation, the hurdle that exists in training DNNs, and introduce residual learning in our design.
We propose a building-block based DNN structure, Pelican, that is built on sub residual networks.
To improve learning efficiency, we incorporate CNN and RNN in the sub residual network so that both spatial and temporal features in the input data can be effectively captured.
We test Pelican on two different network traffic datasets and compare Pelican with a set of ML based designs for intrusion detection. Our experiment results show that Pelican outperforms those existing designs.
Importantly, our work demonstrates that the residual learning is effective in building DNN for network intrusion detection.
It is generally held that a deeper neural network should have better potential for learning from and generalizing over data than a shallow one. We tested this idea by running an experiment based on an existing neural network design, LuNet, which was proposed for network intrusion detection. Fig. 2 plots the training accuracy and testing accuracy (on the UNSW-NB15 dataset) of the network with respect to the number of parameter (or learning) layers. From these plots, we can see that as the network depth increases, the learning accuracy does not increase as expected; instead, the performance even degrades.
In fact, this performance degradation in training deep neural networks is not specific to network intrusion detection. The problem was also revealed earlier in [10, 9, 28] when training convolutional neural networks (CNN) for image classification.
The performance degradation issue is a great hurdle to unleashing the potential of deep neural networks. A few solutions have been proposed. Among them, residual learning has been demonstrated to be an effective technique in the area of image classification. Here we want to verify that it is also effective for network intrusion detection.
The general design idea of residual learning is briefly presented below. (Detailed discussions can be found in the abundant literature.)
III Residual Learning
Residual learning is a technique for training deep neural networks. A deep neural network can be abstracted as multiple learning layers, as demonstrated in Fig. 3 (a), where three learning layers are used. Each layer in the network contains a set of connections between its input and output, and has corresponding adaptive parameters (typically connection weights).
Network training consists of rounds of learning, each of which basically includes forward mapping and backward tuning.
Forward Mapping: For each round of learning, the network processes the input data layer by layer in a forward-propagation fashion to generate an output mapping $y = F(x, W)$, where $x$ and $y$ are, respectively, the input and output vectors, and $W$ is the total set of adaptive parameters from all learning layers. The learning error generated at the output is then used to train the network.
Backward Tuning: Basically, training a network means tuning the adaptive parameters in each learning round to minimize the learning error. There are several classes of training algorithms. For DNNs with a large number of adaptive parameters, a gradient descent optimization algorithm, such as SGD, RMSprop or ADADELTA, is often used.
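With such an algorithm, the error is propagated backward through each layer, where each of the related parameters, $w \in W$, is adjusted based on its gradient $g_w$, as shown in the formula below:
$$w \leftarrow w - \eta\, g_w,$$
where $\eta$ is the step size of the tuning and can be determined by the learning rate. We use $W \leftarrow W - \eta G$ to denote this tuning process in Fig. 3 (a), where $G$ is the total set of gradients.
The gradient, $g_w$, of $w$ is the product of a chain of partial derivatives of local outputs along the backward chain from the output layer to the current parameter layer related to $w$, and can be approximately denoted as
$$g_w \approx \prod_{l} \frac{\partial o_{l+1}}{\partial o_{l}},$$
where $o_l$ is the local output of layer $l$ and the product runs over the layers between the layer containing $w$ and the output layer.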
If the factors in this chain are smaller than 1 (as is often the case), the gradient tends to vanish towards the initial learning layers; therefore, the parameters of the initial layers are not properly tuned in each training round. Yet the initial layers learn directly from the input data, and their learning is often critical for capturing the fundamental features of the input and plays an important role in the final learning outcome. On the other hand, if those factors are larger than 1, the gradient becomes exponentially large (namely, it explodes); in this case, the parameter tuning is too coarse and may fail to make any improvement in training, so the learning cannot converge.
Consequently, as more and more learning layers are added, this problem becomes more and more prominent and the learning performance of the network decreases, as observed in our experiment in Section II.
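The effect described above can be illustrated numerically. The following is a toy sketch, not the experiment of Section II: it assumes that every factor in the backward chain has the same magnitude (the hypothetical `factor` argument) and simply multiplies `depth` of them together.

```python
def chain_gradient(factor: float, depth: int) -> float:
    """Multiply `depth` identical chain-rule factors together (toy model)."""
    gradient = 1.0
    for _ in range(depth):
        gradient *= factor
    return gradient


if __name__ == "__main__":
    # Compare a shrinking chain (factor < 1) with a growing one (factor > 1).
    for depth in (5, 21, 41):
        small = chain_gradient(0.5, depth)
        large = chain_gradient(1.5, depth)
        print(f"depth={depth:2d}  factor 0.5 -> {small:.3e}  factor 1.5 -> {large:.3e}")
```

Under this assumption the product shrinks or grows exponentially with depth, which is why the layers closest to the input are the ones most affected.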
Use of residual learning can mitigate such a problem. Residual learning was first introduced in ResNet, a very deep neural network with hundreds of learning layers, for computer vision tasks. ResNet has demonstrated results that are even better than human performance for image recognition.
Residual learning applies a short cut from the output layer to an input layer, as demonstrated in Fig. 3(b), so that the output error can be propagated to the input layer through a shortened route, hence avoiding the gradient vanishing/exploding problem caused by the existing long propagation path.
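A scalar toy model may help to see why the short cut matters. It is an illustrative assumption, not a derivation from this paper: if a block computes y = F(x) + x, its local derivative is F'(x) + 1, so the identity term keeps the chained product away from zero even when F'(x) is small. The function names and the slope values below are hypothetical.

```python
def plain_chain(local_slope: float, depth: int) -> float:
    """Gradient through `depth` plain blocks, each with derivative `local_slope`."""
    return local_slope ** depth


def residual_chain(local_slope: float, depth: int) -> float:
    """Gradient through `depth` blocks of the form y = F(x) + x.

    Each block contributes (local_slope + 1); the +1 comes from the short cut.
    """
    return (local_slope + 1.0) ** depth


if __name__ == "__main__":
    for depth in (5, 10, 20):
        print(depth, plain_chain(0.1, depth), residual_chain(0.1, depth))
```

In this simplified setting the plain chain collapses towards zero as depth grows, while the residual chain stays at or above 1 for non-negative slopes.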
Our design with residual learning for Pelican is given in the next section.
Pelican is basically constructed with a set of residual blocks, as shown in Fig. 4, where a chain of residual blocks plus an output-layer block are concatenated.
Each residual block is a sub residual network that is formed based on a previously proposed plain network block, as shown in Fig. 5(a). The plain network block uses CNN and RNN and is able to learn both spatial and temporal features from the input. The main functions of each type of layer in the basic block are briefly summarized below.
Batch-Normalization (BN): BN reduces the internal covariate shift during training by normalizing layer inputs to zero mean and unit variance over each mini-batch. It is applied before the Convolution and Recurrent network (GRU) layers. BN makes training less sensitive to the choice of learning rate and thereby accelerates network training.
Convolution: The convolution operation in this layer extracts the spatial features from the input data and produces a feature map at the output. The convolution operation is often followed by an activation function to amplify distinctions between features generated by the convolution. Here the rectified linear unit (ReLU) is used as the activation function.
Maxpooling: This layer keeps the most active neurons by taking the maximum value among nearby features, which facilitates the next stage of learning.
Gated Recurrent Unit (GRU) [5, 4]: GRU is a recurrent network that can extract the temporal features of the input data through a recurrent process. Similar to the convolution layer, an activation function and a recurrent activation function are needed for GRU, for which tanh and hard sigmoid are, respectively, used here.
Reshape: Since the dimension of the data changes during learning, the reshape layer is used to keep the data dimensions consistent between layers.
With the functional layers presented above for the basic plain block, we construct the residual network block (ResBlk), as shown in Fig. 5(b), where the short cut is connected from the BN output to facilitate the initialization of overall deep network.
The training environment used in our evaluation was based on the Keras and scikit-learn packages with a TensorFlow backend, running on an HP EliteDesk 800 G2 SFF Desktop with an Intel(R) Core(TM) i5-6500 CPU @ 3.20 GHz and 16.0 GB RAM.
The training was performed on two network intrusion datasets: NSL-KDD and UNSW-NB15.
Both datasets have removed a significant amount of redundant records from the originally collected data to ensure the trustworthiness of evaluation [19, 15].
The NSL-KDD dataset consists of 5 categories: Normal, DoS, U2R, R2L and Probe, and the attack samples were collected in a U.S. Air Force network environment.
The UNSW-NB15 dataset includes 10 categories: Normal, DoS, Exploits, Generic, Shellcode, Reconnaissance, Backdoors, Worms, Analysis and Fuzzers, and the attack samples were collected from Common Vulnerabilities and Exposures
V-A Data Preprocessing
There were 148,516 and 257,673 data records from NSL-KDD and UNSW-NB15, respectively, used in the evaluation. Before training, we preprocessed the data in three steps:
Step 1, Numerical Conversion: Since the neural network cannot recognize textual notation, such as ‘tcp’, ‘private’ and ‘http’ in the raw data, we converted such values into numerical values. Here, we used the ‘get_dummies’ function in Pandas for the conversion.
Step 2, Normalization: The data in the dataset may have various distributions with different means and standard deviations, which may make neural network learning difficult. Hence, we applied standardization to scale each feature to a mean of 0 and a standard deviation of 1.
Step 3, Training/Testing Dataset Creation: To address the problem of data deficiency, we used k-fold cross-validation to ensure a good training-testing proportion. With k-fold validation, a dataset is split into k subsets, where k-1 subsets are combined for training and the remaining one is used for testing. Here, we set k=10.
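The three steps above can be sketched in plain Python. This is a minimal illustration under stated assumptions and not the code used in the evaluation: `one_hot` only mimics the effect of a one-hot conversion such as the one performed by ‘get_dummies’, `standardize` assumes the population standard deviation, and the shuffling seed and all function names are hypothetical.

```python
import random
from statistics import mean, pstdev


def one_hot(values):
    """Step 1: map each categorical value to a 0/1 indicator vector."""
    categories = sorted(set(values))
    encoded = [[1 if v == c else 0 for c in categories] for v in values]
    return encoded, categories


def standardize(column):
    """Step 2: scale a numeric column to mean 0 and standard deviation 1."""
    mu, sigma = mean(column), pstdev(column)
    if sigma == 0:  # constant column: avoid division by zero
        return [0.0 for _ in column]
    return [(x - mu) / sigma for x in column]


def k_fold_indices(n_samples, k=10, seed=0):
    """Step 3: yield (train_indices, test_indices) for each of the k folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


if __name__ == "__main__":
    encoded, categories = one_hot(["tcp", "udp", "tcp"])
    print(categories, encoded)
    print(standardize([1.0, 2.0, 3.0]))
    for train, test in k_fold_indices(20, k=10):
        print(len(train), len(test))
```

Each fold serves as the test set exactly once, so every record is used for both training and testing across the k rounds.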
V-B Evaluation Metric
We used three metrics to evaluate the performance: validation accuracy (ACC), detection rate (DR) and false-alarm rate (FAR), as defined below.
$$ACC = \frac{TP + TN}{TP + TN + FP + FN}, \qquad DR = \frac{TP}{TP + FN}, \qquad FAR = \frac{FP}{FP + TN},$$
where TP and TN are, respectively, the number of attacks and the number of normal traffic records correctly classified; FP is the number of actual normal records misclassified as attacks, and FN is the number of attacks incorrectly classified as normal traffic.
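As a worked example, the three metrics can be computed directly from the four counts. The sketch uses the standard confusion-matrix definitions (ACC = (TP+TN)/(TP+TN+FP+FN), DR = TP/(TP+FN), FAR = FP/(FP+TN)), which we assume match the ones intended here; the function name and the sample counts are hypothetical.

```python
def detection_metrics(tp: int, tn: int, fp: int, fn: int) -> dict:
    """Return ACC, DR and FAR computed from confusion-matrix counts."""
    return {
        "ACC": (tp + tn) / (tp + tn + fp + fn),  # share of all records classified correctly
        "DR": tp / (tp + fn),                    # share of attacks that were detected
        "FAR": fp / (fp + tn),                   # share of normal records flagged as attacks
    }


if __name__ == "__main__":
    # Hypothetical counts: 100 attacks and 900 normal records.
    print(detection_metrics(tp=90, tn=880, fp=20, fn=10))
```

Note that a model can score a high DR and still be impractical if FAR is large, because FAR is measured against the usually much larger volume of normal traffic.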
V-C Tested DNN Models and Parameter Setting
To investigate the effectiveness of residual network, we constructed two plain networks and two residual networks with different depths. The brief description of the four networks is given below:
21 parameter-layer plain network (Plain-21): It was built with five plain blocks + one global average pooling layer + one dense layer.
21 parameter-layer residual network (Residual-21): It was built with five residual blocks + one global average pooling layer + one dense layer.
41 parameter-layer plain network (Plain-41): It was built with ten plain blocks + one global average pooling layer + one dense layer.
41 parameter-layer residual network (Residual-41 (Pelican)): It was built with ten residual blocks + one global average pooling layer + one dense layer.
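The layer counts in these names can be checked with simple arithmetic. The sketch below assumes, as an inference from the counts rather than a statement in the text, that each block contributes four parameter layers (two BN layers, one convolution layer and one GRU layer), that the dense output layer contributes one, and that the pooling and reshape layers carry no trainable parameters. The constant and function names are hypothetical.

```python
PARAM_LAYERS_PER_BLOCK = 4  # assumption: BN, convolution, BN, GRU
OUTPUT_PARAM_LAYERS = 1     # assumption: the final dense layer


def parameter_layers(num_blocks: int) -> int:
    """Count parameter layers for a network built from `num_blocks` blocks."""
    return num_blocks * PARAM_LAYERS_PER_BLOCK + OUTPUT_PARAM_LAYERS


if __name__ == "__main__":
    for blocks in (5, 10):
        print(blocks, "blocks ->", parameter_layers(blocks), "parameter layers")
```

Under these assumptions, five blocks give 21 parameter layers and ten blocks give 41, which agrees with the names of the four networks.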
Since residual learning uses the “add” operation, the output dimension of filters (number of filters) and recurrent units must be equal to the input shape. In our experiment, after the data preprocessing, the input to the network for UNSW-NB15 has 196 features, and for NSL-KDD has 121 features. Therefore, the input shapes for the two datasets are (1,196) and (1,121) respectively. Hence, we set 196 filters and 196 recurrent units to learn UNSW-NB15, and similarly 121 filters and 121 recurrent units to learn NSL-KDD.
V-D Training Loss and Testing Loss
We recorded the training histories and compared the training and testing losses to examine how residual learning can mitigate the degradation problem in deep neural networks. The losses of the four networks on the two datasets are plotted in Fig. 6 (a)-(d). As shown in Fig. 6, Plain-21 has lower losses than Plain-41, which indicates that adding more learning layers to a plain network leads to poorer performance. With residual learning, however, the losses are greatly reduced: for networks of the same depth, the residual network has a much lower loss than the plain network. It can also be observed that the deeper network, Residual-41, in most cases shows smaller losses than the shallower one, Residual-21. The only exception occurred when testing both residual networks on the UNSW-NB15 dataset, as shown in Fig. 6 (b), which can be explained by overfitting (more discussion on this follows).
V-E True Attacks Detected vs False Alarms
Table II presents the total number of correctly detected attacks (TP) and the total number of false alarms (FP) generated by the four networks on the two datasets. As can be seen from the table, Residual-41 detects more attacks and at the same time generates fewer false alarms than the other three designs.
V-F Overall Performance
Table III and Table IV show the detection rate (DR), validation accuracy (ACC), and false alarm rate (FAR) of the four networks tested on the NSL-KDD and UNSW-NB15 datasets, respectively. From the two tables, we can conclude that:
The residual networks outperform the plain networks with a high detection rate, a good validation accuracy and a low false alarm rate.
With the increasing number of learning layers, a deeper residual network can achieve better performance than a shallow residual network.
V-G Limitations on Experiments
Although our evaluation has basically demonstrated the effectiveness of using residual learning in deep neural networks for network intrusion detection, our experiment results are subject to some limitations. Training data insufficiency and the low capacity of computing resources are the two major limiting factors we encountered; they are discussed below.
Training Data Insufficiency
Generally, a deep neural network requires sufficient data to learn effectively and to cater for large-scale data. Insufficient training data may lead to overfitting, which can be partially addressed by dropout, as mentioned in Section IV. In our experiment, even though we set a high dropout rate (0.6) to counter overfitting, it was still an issue, as manifested by the results shown in Fig. 6 (b). This indicates that the deep network model needs more data to fit. However, because of privacy and security concerns, a sufficiently large cyber-attack dataset is very expensive to obtain. Currently, NSL-KDD and UNSW-NB15 are the only two trustworthy cyber-attack datasets that are free of redundancy. We hope that sufficiently large training datasets will become available in the future so that Pelican can be further improved and evaluated.
Low Capacity of Computing Resources
Our Pelican model can be easily scaled up with more learning layers. However, in our evaluation we only managed to test networks of up to 41 parameter layers due to the limited computing resources we had. When we added more layers, training became extremely slow and laborious; this can be improved with more powerful computing devices.
V-H A Comparative Study
To further evaluate our Pelican DNN (of currently 41 layers), we compare it with a set of typical machine learning based designs, as briefly described below, for network intrusion detection.
Support Vector Machine (SVM): SVM is a classical machine learning approach that uses a kernel function, such as the Gaussian kernel (RBF), to learn high-dimensional data. However, it has been pointed out that SVM has a low generalization capability when learning large-scale data.
Adaptive Boosting (AdaBoost) : It is an ensemble learning approach that uses many cascaded weak classifiers (such as decision trees) to construct a stronger classifier to learn complex tasks. The advantage of using many weak classifiers is its ability to mitigate the overfitting problem. However, AdaBoost often does not work well on imbalanced datasets.
Random Forest (RF): RF is also an ensemble learning approach, but compared to AdaBoost, it uses a different strategy of weight allocation. Apart from having the ability to reduce overfitting, RF can also handle imbalanced data. However, its generalization capability often relies on the specification of the features to be learned, and a large number of features are required for effective learning.
MultiLayer Perceptron (MLP) : MLP is an early class of feed-forward neural network that uses Back-Propagation to learn non-linear problems.
Convolution Neural Network (CNN, or ConvNet) : CNN is the most popular deep neural network that has been used for image recognition and has gained great successes. With the help of convolution operation, CNN has an ability to generate spatial representations from raw data.
Long Short Term Memory (LSTM): LSTM is a recurrent neural network. By generating temporal representations from learning, LSTM has been successfully applied to speech recognition and machine translation. LSTM is similar to the GRU we used in our residual block, but LSTM has a higher computing cost.
HAST-IDS : It is a recently proposed intrusion detection system that uses a tandem CNN+LSTM model (first learning spatial representations by CNN, then learning temporal representations by LSTM) as the core decision strategy.
LuNet: It is also a CNN+LSTM based intrusion detection system, similar to HAST-IDS. However, LuNet uses a different architecture for effective learning of both spatial and temporal features of the input data and shows better performance than HAST-IDS.
We trained each of those designs on UNSW-NB15 dataset. Their detection rate (DR), accuracy (ACC) and false alarm rate (FAR) are given in Table V.
As can be seen from the table, among all the designs examined, Pelican shows the best performance - with the highest detection rate and accuracy, and the lowest false alarm rate - which further demonstrates the effectiveness of our design with residual learning for network intrusion detection.
In this section, we provide some background knowledge related to network intrusion detection and then present our two points of view on contemporary NIDS designs: 1) anomaly detection is not suitable for real-time intrusion detection in large-scale networks, and 2) supervised learning is effective but challenging.
Network intrusion detection systems (NIDS) were proposed to improve the capability of security incident response [6, 20]. By monitoring network activities, a NIDS can alert network administrators to suspicious and malicious network behaviour in a timely manner.
Security experts believe that most new attacks are variants of known attacks. Traditional NIDS designs prevent malicious behaviours by using specific attack signatures from known attack libraries, and have led to some successful products, such as Snort and Zeek-IDS. However, the signature-based solution lacks the intelligence to discover advanced variants of previously known attacks. Hence, two alternative strategies have been proposed to address the problem: supervised learning and anomaly detection.
The principle of anomaly detection is that we only need to learn a profile of normal traffic, whereas outliers are considered attacks. Statistical learning [17, 30, 30, 33, 25] and unsupervised learning [23, 3, 37] are currently the primary techniques used for anomaly detection. Anomaly detection is not very suitable or practical for network intrusion detection, for the reasons below:
Reason one : Anomaly detection often leads to a high false alarm rate. However, a secure and dependable NIDS system not only should have zero tolerance of attacks but also should not block legitimate requests.
Reason two: Even if we can reduce the false alarm rate by, for example, developing very sophisticated statistical or learning algorithms, a current notion of the normal profile may not be fully representative in the future because networks evolve rapidly. Therefore, anomaly detection may only work well in a controlled network and may be less useful and practical in most commercial network environments.
In contrast to anomaly detection, supervised learning requires a well-defined threat model to learn the underlying distinctions between normal and abnormal behaviour (such as Pelican and other classical supervised machine learning techniques shown in Table V). Supervised learning often produces a lower false alarm rate (FAR) and has more stable performance than anomaly detection. Hence, it is currently considered as a more reasonable and practical approach than anomaly detection for network intrusion detection. However, there are some challenges that need to be addressed.
Challenge one: Supervised learning requires a good and continuously updated threat model to maintain its ability to detect new attacks. However, due to privacy and information security concerns, network attack data are often very expensive to collect.
Challenge two: Even though supervised learning can contain FAR better than anomaly detection, the false alarm rate is still not satisfactory compared to human experts. Hence, more effective detection approaches are still required to further reduce the false alarm rate.
In the paper, we have examined the performance degradation problem in deep neural networks and investigated how the problem can be alleviated by using residual learning.
We presented a deep residual network, Pelican, for network intrusion detection, and compared its performance with traditional plain networks and with a shallower residual network.
Our work has shown that residual learning not only can be used with CNN for image classification as has been demonstrated in the literature, but also can be used with CNN+RNN structures for network intrusion detection. Compared to a set of state-of-the-art classical machine learning based techniques, by using residual learning, the deep neural network, Pelican, has a high capability for network intrusion detection.
It must be stressed that our evaluation is not thorough due to the limitations we encountered (small training datasets and lack of powerful computing facilities). A deeper Pelican with more learning layers will be investigated in the future when large training datasets and powerful computing resources become available.
- CVE: https://cve.mitre.org/
- BID: https://www.securityfocus.com
- MSD: https://docs.microsoft.com/en-us/security-updates/securitybulletins
[1] (2018) Performance comparison of support vector machine, random forest, and extreme learning machine for intrusion detection. IEEE Access 6, pp. 33789–33795.
[2] (2006) The curse of highly variable functions for local kernel machines. In Advances in Neural Information Processing Systems, pp. 107–114.
[3] (2012) Unsupervised network intrusion detection systems: detecting the unknown without knowledge. Computer Communications 35 (7), pp. 772–783.
[4] (2014) Empirical evaluation of gated recurrent neural networks on sequence modeling. arXiv preprint arXiv:1412.3555.
[5] (2015) Gated feedback recurrent neural networks. In International Conference on Machine Learning, pp. 2067–2075.
[6] (1987) An intrusion-detection model. IEEE Transactions on Software Engineering (2), pp. 222–232.
[7] (2017) Gate-variants of gated recurrent unit (gru) neural networks. In 2017 IEEE 60th International Midwest Symposium on Circuits and Systems (MWSCAS), pp. 1597–1600.
[8] (2009) Anomaly-based network intrusion detection: techniques, systems and challenges. Computers & Security 28 (1-2), pp. 18–28.
[9] (2015) Convolutional neural networks at constrained time cost. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 5353–5360.
[10] (2016) Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 770–778.
[11] (2012) Neural networks for machine learning lecture 6a overview of mini-batch gradient descent. Lecture Note 14, pp. 8.
[12] (1997) Long short-term memory. Neural Computation 9 (8), pp. 1735–1780.
[13] (2013) Online adaboost-based parameterized methods for dynamic distributed network intrusion detection. IEEE Transactions on Cybernetics 44 (1), pp. 66–82.
[14] (2012) Imagenet classification with deep convolutional neural networks. In Advances in Neural Information Processing Systems, pp. 1097–1105.
[15] (2000) Testing intrusion detection systems: a critique of the 1998 and 1999 darpa intrusion detection system evaluations as performed by lincoln laboratory. ACM Transactions on Information and System Security (TISSEC) 3 (4), pp. 262–294.
[16] (2010) Data structures for statistical computing in python. In Proceedings of the 9th Python in Science Conference, S. van der Walt and J. Millman (Eds.), pp. 51–56.
[17] (2017) Novel geometric area analysis technique for anomaly detection using trapezoidal area estimation on large-scale networks. IEEE Transactions on Big Data.
[18] (2015) UNSW-nb15: a comprehensive data set for network intrusion detection systems (unsw-nb15 network data set). In 2015 Military Communications and Information Systems Conference (MilCIS), pp. 1–6.
[19] (2016) The evaluation of network anomaly detection systems: statistical analysis of the unsw-nb15 data set and the comparison with the kdd99 data set. Information Security Journal: A Global Perspective 25 (1-3), pp. 18–31.
[20] (1994) Network intrusion detection. IEEE Network 8 (3), pp. 26–41.
[21] (1992) Multilayer perceptron, fuzzy sets, and classification. IEEE Transactions on Neural Networks 3 (5), pp. 683–697.
[22] (1999) Bro: a system for detecting network intruders in real-time. Computer Networks 31 (23-24), pp. 2435–2463.
[23] (2000) Intrusion detection with unlabeled data using clustering. Ph.D. Thesis, Columbia University.
[24] (1999) Snort: lightweight intrusion detection for networks. In Lisa, Vol. 99, pp. 229–238.
[25] (2016) An efficient proactive artificial immune system based anomaly detection and prevention system. Expert Systems with Applications 60, pp. 311–320.
[26] (2010) Outside the closed world: on using machine learning for network intrusion detection. In 2010 IEEE Symposium on Security and Privacy, pp. 305–316.
[27] (2014) Dropout: a simple way to prevent neural networks from overfitting. The Journal of Machine Learning Research 15 (1), pp. 1929–1958.
[28] (2015) Highway networks. arXiv preprint arXiv:1505.00387.
[29] (2015) Going deeper with convolutions. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1–9.
[30] (2014) Detection of denial-of-service attacks based on computer vision techniques. IEEE Transactions on Computers 64 (9), pp. 2519–2533.
[31] (2009) A detailed analysis of the kdd cup 99 data set. In 2009 IEEE Symposium on Computational Intelligence for Security and Defense Applications, pp. 1–6.
[32] (2017) Divide the gradient by a running average of its recent magnitude. coursera: neural networks for machine learning. Technical Report.
[33] (2010) A triangle area based nearest neighbors approach to intrusion detection. Pattern Recognition 43 (1), pp. 222–229.
[34] (2017) HAST-ids: learning hierarchical spatial-temporal features using deep neural networks to improve intrusion detection. IEEE Access 6, pp. 1792–1806.
[35] (2019) A transfer learning approach for network intrusion detection. In 2019 IEEE 4th International Conference on Big Data Analytics (ICBDA), pp. 281–285.
[36] (2019) LuNet: a deep neural network for network intrusion detection. In 2019 IEEE Symposium Series on Computational Intelligence (SSCI), pp. 712–717.
[37] (2004) Unsupervised learning techniques for an intrusion detection system. In Proceedings of the 2004 ACM Symposium on Applied computing, pp. 412–419.
[38] (2008) Random-forests-based network intrusion detection systems. IEEE Transactions on Systems, Man, and Cybernetics, Part C (Applications and Reviews) 38 (5), pp. 649–659.
[39] (1988) Computation of optical flow using a neural network. In IEEE International Conference on Neural Networks, Vol. 1998, pp. 71–78.