Pelican: A Deep Residual Network for Network Intrusion Detection

Abstract

One challenge in building a secure network communication environment is how to effectively detect and prevent malicious network behaviours. Abnormal network activities threaten users' privacy and can potentially damage the function and infrastructure of the whole network. To address this problem, network intrusion detection systems (NIDS) have been developed. By continuously monitoring network activities, such a system can identify attacks in a timely manner and prompt counter-attack actions. NIDS has been evolving over the years. The current generation of NIDS incorporates machine learning (ML) as its core technology in order to improve detection performance on novel attacks. However, the high detection rate achieved by a traditional ML-based detection method is often accompanied by a large number of false alarms, which greatly degrades its overall performance. In this paper, we propose a deep neural network, Pelican, that is built upon specially designed residual blocks. We evaluated Pelican on two network traffic datasets, NSL-KDD and UNSW-NB15. Our experiments show that Pelican can achieve a high attack detection performance while keeping a much lower false alarm rate than a set of up-to-date machine learning based designs.

Artificial Intelligence, Computational Intelligence, Cyber Warfare, Machine Learning, Intrusion Detection.

I Introduction

With the continuously expanding network scale, network attacks are becoming more frequent, volatile and advanced. From mobile phones and personal computers to industrial servers, all networked devices are potentially under the threat of malicious intrusion activities. Effectively discovering and preventing such intrusions is very challenging. Many cybersecurity enterprises and research institutes, such as Chronicle, Symantec and McAfee, have been developing network intrusion detection systems (NIDS) to safeguard their networked computing environments.

Fig. 1 shows a general setting of using such a detection system, where NIDS sits within the network, continuously monitors in-out network traffic, and reports any suspicious behaviours to the security team for further attack identification and containment.

Fig. 1: Network Intrusion Detection with NIDS

The NIDS designs were initially signature-based and were only effective for the detection of known attacks. With the increasingly large volume of traffic data generated each day and the need to detect volatile and advanced attacks in a timely manner (due to growing numbers of network users), artificial intelligence (AI) and machine learning (ML) technologies have come into play in NIDS design.

Most of the existing ML-based NIDS designs, however, achieve a high detection rate at the cost of a large number of false alarms, which adds unnecessary workload to the security team, may delay counter-attack responses, and hence adversely affects overall network security.

(a) Training Accuracy of LuNet on UNSW-NB15
(b) Testing Accuracy of LuNet on UNSW-NB15
Fig. 2: Motivational Example: Performance Degradation in Training DNN for Network Intrusion Detection

In this paper, we aim to develop a deep neural network (DNN) design to achieve high intrusion detection accuracy. Our main contributions are as follows:

  • We investigate performance degradation, a hurdle in training DNNs, and introduce residual learning into our design.

  • We propose a building-block based DNN structure, Pelican, that is built on sub residual networks.

  • To improve learning efficiency, we incorporate CNN and RNN in the sub residual network so that both spatial and temporal features in the input data can be effectively captured.

  • We test Pelican on two different network traffic datasets and compare Pelican with a set of ML based designs for intrusion detection. Our experiment results show that Pelican outperforms those existing designs.

  • Importantly, our work demonstrates that residual learning is effective in building DNNs for network intrusion detection.

The motivation for using residual learning and a brief discussion of residual learning are given in the next two sections (Section II and Section III). Our design for Pelican with residual networks is presented in Section IV.

II Motivation

It is a general perspective that a deeper neural network should have better potential for learning and generalizing data than a shallow one. We tested this idea by running an experiment based on an existing neural network design, LuNet, which was proposed in [36] for network intrusion detection. Fig. 2 shows the plots of training accuracy and testing accuracy (on the UNSW-NB15 dataset) of the network with respect to different numbers of parameter (or learning) layers. From these plots, we can see that as the network depth increases, the learning accuracy does not increase as expected; instead, the performance is even degraded.

In fact, this performance degradation in training deep neural networks is not specific to network intrusion detection. The problem was also revealed earlier in [10, 9, 28] when training convolutional neural networks (CNN) for image classification.

The performance degradation issue imposes a great hurdle on unleashing the potential of deep neural networks. A few solutions have been proposed. Among them, residual learning has been demonstrated to be an effective technique in the area of image classification. Here we want to verify that it is also effective for network intrusion detection.

The general design idea of residual learning is briefly presented below. (Detailed discussions can be found in the abundant literature, e.g., [10].)

III Residual Learning

Residual learning is related to deep neural networks. A deep neural network can be abstracted as multiple learning layers, as demonstrated in Fig. 3(a), where three learning layers are used. Each layer in the network contains a set of connections between its input and output, and has corresponding adaptive parameters (typically connection weights).

Fig. 3: (a) Plain Network (b) Residual Network
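
For concreteness, such a plain stack of learning layers could be written in Keras as follows (a purely illustrative sketch; the layer types and sizes are our own choices, not from the paper):

```python
from tensorflow.keras import layers, models

# Illustrative plain network with three learning layers (cf. Fig. 3(a)):
# each Dense layer carries its own set of adaptive weights.
model = models.Sequential([
    layers.Dense(64, activation='relu', input_shape=(196,)),
    layers.Dense(64, activation='relu'),
    layers.Dense(10, activation='softmax'),
])
```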

Network training consists of rounds of learning, each of which basically includes forward mapping and backward tuning.

Forward Mapping: For each round of learning, the network processes the input data layer by layer in a forward-propagation fashion to generate an output mapping $y = F(x; W)$, where $x$ and $y$ are, respectively, the input and output vectors, and $W$ is the total set of adaptive parameters from all learning layers. The learning error generated at the output is then used to train the network.

Backward Tuning: Basically, training a network is tuning the adaptive parameters in each learning round to minimize the learning error. There are several classes of training algorithms. For DNNs with a large number of adaptive parameters, a gradient descent optimization algorithm, such as SGD, RMSprop or AdaDelta, is often used.

With such an algorithm, the error $E$ is backward propagated through each layer, where each of the related parameters, $w \in W$, is adjusted based on its gradient $\partial E / \partial w$, as shown in the formula below:

$w \leftarrow w - \eta \cdot \dfrac{\partial E}{\partial w}$ (1)

where $\eta$ is the step size of the tuning and can be determined by the learning rate. We use $\nabla$ to denote this tuning process in Fig. 3(a), where $\nabla$ is the total set of gradients.
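
To make the tuning rule concrete, here is a toy, purely illustrative Python sketch of Eq. (1) for a single parameter (the names w, x, t and eta are ours, not from the paper):

```python
# Toy illustration of the tuning rule in Eq. (1) for a single
# parameter w of a model y = w * x with squared error E = (y - t)^2.
def tuning_step(w, x, t, eta=0.01):
    y = w * x                  # forward mapping
    grad = 2 * (y - t) * x     # dE/dw by the chain rule
    return w - eta * grad      # Eq. (1): move against the gradient

w = 0.0
for _ in range(500):
    w = tuning_step(w, x=1.0, t=2.0)
print(w)  # converges towards the target value 2.0
```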

The gradient, $\partial E/\partial w$, of $w$ is the product of a chain of partial derivatives of local outputs along the backward chain from the output layer to the current parameter layer related to $w$. Writing $o_i$ for the output of layer $i$, with layer $n$ the output layer and layer $k$ the layer containing $w$, the gradient can be approximately denoted as

$\dfrac{\partial E}{\partial w} \approx \dfrac{\partial E}{\partial o_n} \cdot \dfrac{\partial o_n}{\partial o_{n-1}} \cdots \dfrac{\partial o_{k+1}}{\partial o_k} \cdot \dfrac{\partial o_k}{\partial w}$ (2)

If the factors in this chain are smaller than 1 (as is often the case), the gradient tends to vanish towards the initial learning layers; therefore, the parameters of the initial layers will not get properly tuned in each training round. But the initial layers directly learn the input data; their learning is often critical in capturing the fundamental features of the input data and plays an important role in the final learning outcome. On the other hand, if those factors are larger than 1, the gradient will become exponentially large (namely, it explodes); in this case, the parameter tuning is too rough and may fail to make any improvement during training, and therefore the learning cannot converge.

Consequently, as more and more learning layers are added, this problem becomes more and more prominent, leading to decreased learning performance, as observed in our experiment shown in Section II.
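
A quick numeric sketch (ours, not from the paper) shows how fast the chain product in Eq. (2) degenerates with depth:

```python
import math

# The gradient reaching an early layer is roughly a product of one
# factor per traversed layer; here, 40 layers deep.
print(math.prod([0.9] * 40))  # ~0.015 -> gradient vanishes
print(math.prod([1.1] * 40))  # ~45.3  -> gradient explodes
```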

Use of residual learning can mitigate such a problem. Residual learning was first introduced in ResNet [10], a very deep neural network with hundreds of learning layers, for computer vision tasks. ResNet has demonstrated amazing results that are even better than human performance for image recognition.

Residual learning applies a shortcut from an output layer back to an input layer, as demonstrated in Fig. 3(b), so that the output error can be propagated to the input layer through a shortened route, hence avoiding the gradient vanishing/exploding problem caused by the otherwise long propagation path.
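
In Keras-style terms, the shortcut can be sketched generically as follows; this is our illustration of the idea, not Pelican's exact code:

```python
from tensorflow.keras import layers

# Generic sketch of a residual shortcut: the block input is added
# back to the block output, giving the gradient a short route
# around the transformation F.
def with_shortcut(x, transform):
    y = transform(x)              # F(x): the residual mapping to learn
    return layers.Add()([x, y])   # H(x) = F(x) + x; shapes must match
```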

Our design with residual learning for Pelican is given in the next section.

IV Pelican

Pelican is basically constructed with a set of residual blocks, as shown in Fig. 4, where a chain of residual blocks plus an output-layer block are concatenated.

Fig. 4: General Structure of Pelican

Each residual block (ResBlk) is a sub residual network that is formed from the plain network block proposed in [36], as shown in Fig. 5(a). The plain network block uses CNN and RNN and is able to learn both spatial and temporal features from the input. The main functions of each type of layer in the basic block are briefly summarized below.

Fig. 5: (a) Plain Block (b) Residual Block
  1. Batch-Normalization (BN) [29]: BN reduces the internal covariate shift during training by normalizing layer inputs to zero mean and unit variance. It is applied before the Convolution and Recurrent (GRU) layers. BN makes training less sensitive to the learning rate, which helps accelerate network training.

  2. Convolution [14]: The convolution operation in this layer extracts the spatial features from the input data and produces a feature map at the output. The convolution operation is often followed by an activation function to amplify distinctions between features generated by the convolution. Here the rectified linear unit (ReLU) is used as the activation function.

  3. Maxpooling [39]: This layer down-samples the feature map by selecting the maximum value within nearby feature regions, keeping the most active neurons to facilitate the next stage of learning.

  4. Gated Recurrent Unit (GRU) [5, 4]: GRU is a recurrent network that can extract the temporal features of the input data through a recurrent process. Similar to the convolution layer, an activation function and a recurrent activation function are needed for GRU, for which tanh and hard sigmoid are, respectively, used here.

  5. Reshape: Since the dimension of the data changes during learning, the reshape layer is used to keep the data dimensions consistent between layers.

  6. Dropout [27]: This layer randomly drops some connections from the network to prevent the network from overfitting. It must be noted that dropout alone is not a complete solution to overfitting, as will be further discussed in Section V-G.

With the functional layers presented above for the basic plain block, we construct the residual network block (ResBlk), as shown in Fig. 5(b), where the shortcut is connected from the BN output to facilitate the initialization of the overall deep network.
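
As an illustration, the ResBlk just described could be sketched in Keras roughly as below. The layer ordering follows the description above, but details such as the pooling size, the Reshape target and the exact placement of each layer are our assumptions rather than the authors' released code:

```python
from tensorflow.keras import layers

def res_blk(x, units, kernel_size=10, dropout=0.6):
    """Sketch of a ResBlk: BN -> Conv1D/ReLU -> MaxPool -> BN -> GRU
    -> Reshape -> Dropout, with a shortcut taken from the first BN
    output and added back at the block output (shapes must match)."""
    bn = layers.BatchNormalization()(x)
    shortcut = bn                                       # shortcut from BN output
    y = layers.Conv1D(units, kernel_size, padding='same',
                      activation='relu')(bn)            # spatial features
    y = layers.MaxPooling1D(pool_size=1)(y)             # pool size is a guess
    y = layers.BatchNormalization()(y)
    y = layers.GRU(units, activation='tanh',
                   recurrent_activation='hard_sigmoid',
                   return_sequences=False)(y)           # temporal features
    y = layers.Reshape((1, units))(y)                   # restore (timesteps, features)
    y = layers.Dropout(dropout)(y)
    return layers.Add()([shortcut, y])                  # residual add
```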

V Evaluation

The training environment used in our evaluation was based on the TensorFlow backend with the Keras and scikit-learn packages, on an HP EliteDesk 800 G2 SFF Desktop with an Intel(R) Core(TM) i5-6500 CPU @ 3.20 GHz processor and 16.0 GB RAM. The training was performed on two network intrusion datasets: NSL-KDD [31] and UNSW-NB15 [18]. Both datasets have had a significant amount of redundant records removed from the originally collected data to ensure the trustworthiness of the evaluation [19, 15]. The NSL-KDD dataset consists of 5 categories: Normal, DoS, U2R, R2L and Probe; its attack samples were collected based on a U.S. Air Force network environment. The UNSW-NB15 dataset includes 10 categories: Normal, DoS, Exploits, Generic, Shellcode, Reconnaissance, Backdoors, Worms, Analysis and Fuzzers; its attack samples were collected from Common Vulnerabilities and Exposures¹, Symantec² and Microsoft Security Bulletin³.

V-A Data Preprocessing

There were 148,516 and 257,673 data records from NSL-KDD and UNSW-NB15, respectively, used in the evaluation. Before training, we needed to preprocess the data, which involves three steps:

Step 1, Numerical Conversion: Since the neural network cannot recognize textual notation, such as ‘tcp’, ‘private’ and ‘http’ in the raw data, we converted such values into numerical form. Here, we used the ‘get_dummies’ function in Pandas [16] for the conversion.

Step 2, Normalization: The data in the dataset may have various distributions with different means and deviations, which may make neural network learning difficult. Hence, we applied standardization to scale the data to a mean of 0 and a standard deviation of 1.

Step 3, Training/Testing Dataset Creation: To address the problem of data deficiency [35], we used k-fold cross-validation to ensure a good training-testing proportion. With k-fold validation, a dataset is split into k subsets, where k−1 subsets are combined for training and the remaining one is used for testing. Here, we set k=10.
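
A minimal sketch of the three preprocessing steps, using pandas and scikit-learn as in our environment (the file path, column names and label field below are illustrative, not the datasets' exact schema):

```python
import pandas as pd
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import KFold

# Step 1: one-hot encode textual fields (column names are illustrative)
raw_df = pd.read_csv('unsw_nb15.csv')          # path is illustrative
df = pd.get_dummies(raw_df, columns=['proto', 'service', 'state'])

# Step 2: standardize features to mean 0 and standard deviation 1
X = StandardScaler().fit_transform(df.drop(columns=['label']).values)
y = df['label'].values

# Step 3: 10-fold cross-validation splits
for train_idx, test_idx in KFold(n_splits=10, shuffle=True).split(X):
    X_train, X_test = X[train_idx], X[test_idx]
    y_train, y_test = y[train_idx], y[test_idx]
```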

V-B Evaluation Metric

We used three metrics to evaluate the performance: validation accuracy (ACC), detection rate (DR) and false-alarm rate (FAR), as defined below.

$ACC = \dfrac{TP + TN}{TP + TN + FP + FN}$ (3)

$DR = \dfrac{TP}{TP + FN}$ (4)

$FAR = \dfrac{FP}{FP + TN}$ (5)

where TP and TN are, respectively, the number of attacks and the number of normal traffic correctly classified; FP is the number of actual normal records mis-classified as attacks, and FN is the number of attacks incorrectly classified as normal traffic.
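
Equivalently, in code, the three metrics can be computed directly from the confusion-matrix counts:

```python
def metrics(tp, tn, fp, fn):
    """Compute ACC, DR and FAR (Eqs. 3-5) from confusion-matrix counts."""
    acc = (tp + tn) / (tp + tn + fp + fn)   # overall validation accuracy
    dr = tp / (tp + fn)                     # fraction of attacks detected
    far = fp / (fp + tn)                    # normal traffic flagged as attack
    return acc, dr, far
```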

Parameter UNSW-NB15 NSL-KDD
Filter size 196 121
Kernel size 10 10
Recurrent unit 196 121
Dropout rate 0.6 0.6
Epochs 100 50
Learning rate 0.01 0.01
Batch size 4000 4000
TABLE I: Parameter Setting
(a) Training Loss on UNSW-NB15
(b) Testing Loss on UNSW-NB15
(c) Training Loss on NSL-KDD
(d) Testing Loss on NSL-KDD
Fig. 6: A Comparison of Learning Performance of Four Tested Networks on UNSW-NB15 and NSL-KDD.

V-C Tested DNN Models and Parameter Setting

To investigate the effectiveness of residual network, we constructed two plain networks and two residual networks with different depths. The brief description of the four networks is given below:

  • 21 parameter-layer plain network (Plain-21): It was built with five plain blocks + one global average pooling layer + one dense layer.

  • 21 parameter-layer residual network (Residual-21): It was built with five residual blocks + one global average pooling layer + one dense layer.

  • 41 parameter-layer plain network (Plain-41): It was built with ten plain blocks + one global average pooling layer + one dense layer.

  • 41 parameter-layer residual network (Residual-41 (Pelican)): It was built with ten residual blocks + one global average pooling layer + one dense layer.

The parameter settings of training those networks on the two datasets are given in Table I. All networks were trained with RMSprop gradient descent algorithm [32].

Since residual learning uses the “add” operation, the output dimension of filters (number of filters) and recurrent units must be equal to the input shape. In our experiment, after the data preprocessing, the input to the network for UNSW-NB15 has 196 features, and for NSL-KDD has 121 features. Therefore, the input shapes for the two datasets are (1,196) and (1,121) respectively. Hence, we set 196 filters and 196 recurrent units to learn UNSW-NB15, and similarly 121 filters and 121 recurrent units to learn NSL-KDD.
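
Putting the settings of Table I together, a training setup might look like the sketch below, where res_blk refers to the block sketch in Section IV; the choice of loss function and the reshaping of inputs to (samples, 1, features) are our assumptions:

```python
from tensorflow.keras import layers, models, optimizers

def build_pelican(n_features, n_classes, n_blocks=10):
    """Sketch of Residual-41: ten ResBlks + global average pooling + dense."""
    inp = layers.Input(shape=(1, n_features))   # e.g. (1, 196) for UNSW-NB15
    x = inp
    for _ in range(n_blocks):
        x = res_blk(x, units=n_features)        # see the ResBlk sketch above
    x = layers.GlobalAveragePooling1D()(x)
    out = layers.Dense(n_classes, activation='softmax')(x)
    return models.Model(inp, out)

# UNSW-NB15 setting from Table I; inputs reshaped to (samples, 1, 196)
model = build_pelican(n_features=196, n_classes=10)
model.compile(optimizer=optimizers.RMSprop(learning_rate=0.01),
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])
model.fit(X_train.reshape(-1, 1, 196), y_train,
          epochs=100, batch_size=4000)
```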

V-D Training Loss and Testing Loss

We recorded the training histories and compared the training and testing losses to examine how residual learning can mitigate the degradation problem in deep neural networks. The training losses of the four networks on the two datasets are plotted in Fig. 6 (a)-(d). As shown in Fig. 6, Plain-21 has lower losses than Plain-41, which indicates that adding more learning layers leads to poorer performance. However, with residual learning, the losses reduce greatly; for networks of the same depth, the residual network has a much lower loss than the plain network. It can also be observed that the deeper network, Residual-41, in most cases shows smaller losses than the shallower one, Residual-21. The only exception occurred when testing both residual networks on the UNSW-NB15 dataset, as shown in Fig. 6 (b), which can be explained by overfitting (more discussion on this follows).

Dataset Metric Plain-21 Residual-21 Plain-41 Residual-41
NSL-KDD TP 14688 14702 14607 14732
NSL-KDD FP 62 58 52 50
UNSW-NB15 TP 22094 22265 21211 22321
UNSW-NB15 FP 220 136 399 121
TABLE II: Total True Attacks Detected and Total False Alarms

V-E True Attacks Detected vs False Alarms

Table II presents the total number of correctly detected attacks (TP) and the total number of false alarms (FP) generated by the four networks on the two datasets. As can be seen from the table, Residual-41 detects more attacks and at the same time generates fewer false alarms than the other three designs.

Structure DR% ACC% FAR%
Plain-21 98.70 98.92 0.80
Plain-41 97.56 98.37 0.67
Residual-21 98.81 99.01 0.73
Residual-41 (Pelican) 99.13 99.21 0.65
TABLE III: Testing Performance on NSL-KDD
Structure DR% ACC% FAR%
Plain-21 97.42 85.76 2.37
Plain-41 93.73 82.33 4.29
Residual-21 97.86 86.42 1.46
Residual-41 (Pelican) 97.75 86.64 1.30
TABLE IV: Testing Performance on UNSW-NB15

V-F Overall Performance

Table III and Table IV show the detection rate (DR), validation accuracy (ACC), and false alarm rate (FAR) of the four networks tested on the NSL-KDD and UNSW-NB15 datasets, respectively. From the two tables, we can conclude that:

  • The residual networks outperform the plain networks, with a higher detection rate, better validation accuracy and a lower false alarm rate.

  • With an increasing number of learning layers, a deeper residual network can achieve better performance than a shallower one.

V-G Limitations on Experiments

Though our evaluation has basically demonstrated the effectiveness of residual learning in deep neural networks for network intrusion detection, our experimental results are restricted by some limitations. Training data insufficiency and the low capacity of our computing resources are the two major limiting factors we encountered; they are discussed below.

Training Data Insufficiency

Generally, a deep neural network requires sufficient data to learn effectively and to be able to cater for large-scale data. Insufficient training data may lead to overfitting, which can be partially addressed by dropout, as mentioned in Section IV. In our experiment, even though we set a high dropout rate (0.6) to counter overfitting, it was still an issue, as manifested by the results shown in Fig. 6 (b). This situation indicates that the deep network model needs more data to fit. However, because of privacy and security concerns, a sufficiently large cyber-attack dataset is very expensive to obtain. Currently, NSL-KDD and UNSW-NB15 are the only two trustworthy cyber-attack datasets that are free of redundancy. We hope that sufficiently large training datasets will become available in the future so that Pelican can be further improved and evaluated.

Low Capacity of Computing Resources

Our Pelican model can easily be scaled up with more learning layers. However, in our evaluation, we only managed to test networks of up to 41 parameter layers due to the limited computing resources we had. When we added more layers, the training became extremely slow and laborious; this can be improved with more powerful computing devices.

V-H A Comparative Study

To further evaluate our Pelican DNN (currently 41 layers), we compare it with a set of typical machine learning based designs for network intrusion detection, as briefly described below.

Support Vector Machine (SVM) [1]: SVM is a classical machine learning approach that uses a kernel function, such as the Gaussian kernel (RBF), to learn high-dimensional data. But as pointed out in [2], it has a low generalization capability when learning large-scale data.

Adaptive Boosting (AdaBoost) [13]: It is an ensemble learning approach that uses many cascaded weak classifiers (such as decision trees) to construct a stronger classifier for learning complex tasks. The advantage of using many weak classifiers is their ability to mitigate overfitting. However, AdaBoost often does not work well on imbalanced datasets.

Random Forest (RF) [38]: RF is also an ensemble learning approach, but compared to AdaBoost it uses a different weight-allocation strategy. Apart from its ability to reduce overfitting, RF can also handle imbalanced data. However, its generalization capability often relies on the specification of the features to be learned, and a large number of features are required for effective learning.

MultiLayer Perceptron (MLP) [21]: MLP is an early class of feed-forward neural network that uses Back-Propagation to learn non-linear problems.

Convolutional Neural Network (CNN, or ConvNet) [11]: CNN is the most popular deep neural network architecture for image recognition and has achieved great success. With the help of the convolution operation, CNN is able to generate spatial representations from raw data.

Long Short Term Memory (LSTM) [12]: LSTM is a recurrent neural network. By generating temporal representations from learning, LSTM has been successfully applied to speech recognition and machine translation. LSTM is similar to the GRU we use in our residual block, but LSTM has a higher computing cost [7].

HAST-IDS [34]: It is a recently proposed intrusion detection system that uses a tandem CNN+LSTM model (first learning spatial representations by CNN, then learning temporal representations by LSTM) as the core decision strategy.

LuNet [36]: It is also a CNN+LSTM based intrusion detection system, similar to HAST-IDS, but LuNet uses a different architecture for effective learning of both spatial and temporal features of the input data and shows better performance than HAST-IDS.

We trained each of those designs on the UNSW-NB15 dataset. Their detection rate (DR), accuracy (ACC) and false alarm rate (FAR) are given in Table V.

Design DR% ACC% FAR%
AdaBoost 91.13 73.19 22.11
SVM (RBF) 83.71 74.80 7.73
HAST-IDS 93.65 80.03 9.60
CNN 92.28 82.13 3.84
LSTM 92.76 82.40 3.63
MLP 96.74 84.00 3.66
RF 92.24 84.59 3.01
LuNet 97.43 85.35 2.89
Pelican 97.75 86.64 1.30
TABLE V: A Comparison of Pelican’s Performance with Classical Techniques (Based on UNSW-NB15)

As can be seen from the table, among all the designs examined, Pelican shows the best performance, with the highest detection rate and accuracy and the lowest false alarm rate, which further demonstrates the effectiveness of our design with residual learning for network intrusion detection.

VI Background

In this section, we provide some background on network intrusion detection and then present our two points of view on contemporary NIDS designs: 1) anomaly detection is not suitable for real-time intrusion detection in large-scale networks, and 2) supervised learning is effective but challenging.

The idea of network intrusion detection systems (NIDS) was proposed to improve the capability of responding to security incidents [6, 20]. By monitoring network activities, NIDS can promptly alert network administrators to suspicious and malicious network behaviour.

Security experts believe that most new attacks are variants of known attacks. Traditional NIDS designs prevent malicious behaviour by matching specific attack signatures from known-attack libraries, and this approach has produced some successful products, such as Snort [24] and Zeek-IDS [22]. However, the signature-based solution lacks the intelligence to discover advanced variants of previously known attacks. Hence, two alternative strategies have been proposed to address the problem: supervised learning and anomaly detection.

The principle of anomaly detection is that only a profile of normal traffic needs to be learned; outliers are then considered attacks. Statistical learning [17, 30, 33, 25] and unsupervised learning [23, 3, 37] are currently the primary techniques used for anomaly detection. Anomaly detection, however, is not very suitable or practical for network intrusion detection, for the reasons below:

Reason one [26]: Anomaly detection often leads to a high false alarm rate. However, a secure and dependable NIDS should not only have zero tolerance of attacks but also avoid blocking legitimate requests.

Reason two [8]: Even if we can reduce the false alarm rate by, for example, developing a very sophisticated statistical or learning algorithm, a current notion of the normal profile may not be fully representative in the future as networks rapidly evolve. Therefore, anomaly detection may only work well in a controlled network and may be less useful and practical in most commercial network environments.

In contrast to anomaly detection, supervised learning requires a well-defined threat model to learn the underlying distinctions between normal and abnormal behaviour (as with Pelican and the other classical supervised machine learning techniques shown in Table V). Supervised learning often produces a lower false alarm rate (FAR) and has more stable performance than anomaly detection. Hence, it is currently considered a more reasonable and practical approach to network intrusion detection than anomaly detection. However, there are some challenges that need to be addressed.

Challenge one: Supervised learning requires a good and continuously updated threat model to maintain its intelligence for detecting new attacks. However, due to privacy and information security concerns, network attack data are often very expensive to collect.

Challenge two: Even if supervised learning can keep the FAR lower than anomaly detection, the false alarm rate is still not satisfactory compared to human experts. Hence, more effective detection approaches are still required to further reduce the false alarm rate.

VII Conclusion

In this paper, we have examined the performance degradation problem in deep neural networks and investigated how the problem can be alleviated by using residual learning.

We presented a deep residual network, Pelican, for network intrusion detection, and compared its performance with traditional plain networks and a shallower residual network.

Our work has shown that residual learning can be used not only with CNNs for image classification, as demonstrated in the literature, but also with CNN+RNN structures for network intrusion detection. Compared to a set of state-of-the-art classical machine learning based techniques, by using residual learning, the deep neural network Pelican shows a high capability for network intrusion detection.

It must be stressed that our evaluation is not thorough due to the limitations we encountered (small training datasets and a lack of powerful computing facilities). A deeper Pelican with more learning layers will be investigated in the future when larger training datasets and more powerful computing resources become available.

Footnotes

  1. CVE: https://cve.mitre.org/
  2. BID: https://www.securityfocus.com
  3. MSD: https://docs.microsoft.com/en-us/security-updates/securitybulletins

References

  1. I. Ahmad, M. Basheri, M. J. Iqbal and A. Rahim (2018) Performance comparison of support vector machine, random forest, and extreme learning machine for intrusion detection. IEEE Access 6, pp. 33789–33795. Cited by: §V-H.
  2. Y. Bengio, O. Delalleau and N. L. Roux (2006) The curse of highly variable functions for local kernel machines. In Advances in Neural Information Processing Systems, pp. 107–114. Cited by: §V-H.
  3. P. Casas, J. Mazel and P. Owezarski (2012) Unsupervised network intrusion detection systems: detecting the unknown without knowledge. Computer Communications 35 (7), pp. 772–783. Cited by: §VI.
  4. J. Chung, C. Gulcehre, K. Cho and Y. Bengio (2014) Empirical evaluation of gated recurrent neural networks on sequence modeling. arXiv preprint arXiv:1412.3555. Cited by: item 4.
  5. J. Chung, C. Gulcehre, K. Cho and Y. Bengio (2015) Gated feedback recurrent neural networks. In International Conference on Machine Learning, pp. 2067–2075. Cited by: item 4.
  6. D. E. Denning (1987) An intrusion-detection model. IEEE Transactions on Software Engineering (2), pp. 222–232. Cited by: §VI.
  7. R. Dey and F. M. Salemt (2017) Gate-variants of gated recurrent unit (gru) neural networks. In 2017 IEEE 60th International Midwest Symposium on Circuits and Systems (MWSCAS), pp. 1597–1600. Cited by: §V-H.
  8. P. Garcia-Teodoro, J. Diaz-Verdejo, G. Maciá-Fernández and E. Vázquez (2009) Anomaly-based network intrusion detection: techniques, systems and challenges. Computers & Security 28 (1-2), pp. 18–28. Cited by: §VI.
  9. K. He and J. Sun (2015) Convolutional neural networks at constrained time cost. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 5353–5360. Cited by: §II.
  10. K. He, X. Zhang, S. Ren and J. Sun (2016) Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 770–778. Cited by: §II, §II, §III.
  11. G. Hinton, N. Srivastava and K. Swersky (2012) Neural networks for machine learning lecture 6a overview of mini-batch gradient descent. Lecture Note 14, pp. 8. Cited by: §V-H.
  12. S. Hochreiter and J. Schmidhuber (1997) Long short-term memory. Neural Computation 9 (8), pp. 1735–1780. Cited by: §V-H.
  13. W. Hu, J. Gao, Y. Wang, O. Wu and S. Maybank (2013) Online adaboost-based parameterized methods for dynamic distributed network intrusion detection. IEEE Transactions on Cybernetics 44 (1), pp. 66–82. Cited by: §V-H.
  14. A. Krizhevsky, I. Sutskever and G. E. Hinton (2012) Imagenet classification with deep convolutional neural networks. In Advances in Neural Information Processing Systems, pp. 1097–1105. Cited by: item 2.
  15. J. McHugh (2000) Testing intrusion detection systems: a critique of the 1998 and 1999 darpa intrusion detection system evaluations as performed by lincoln laboratory. ACM Transactions on Information and System Security (TISSEC) 3 (4), pp. 262–294. Cited by: §V.
  16. W. McKinney (2010) Data structures for statistical computing in python. In Proceedings of the 9th Python in Science Conference, S. van der Walt and J. Millman (Eds.), pp. 51 – 56. Cited by: §V-A.
  17. N. Moustafa, J. Slay and G. Creech (2017) Novel geometric area analysis technique for anomaly detection using trapezoidal area estimation on large-scale networks. IEEE Transactions on Big Data. Cited by: §VI.
  18. N. Moustafa and J. Slay (2015) UNSW-nb15: a comprehensive data set for network intrusion detection systems (unsw-nb15 network data set). In 2015 Military Communications and Information Systems Conference (MilCIS), pp. 1–6. Cited by: §V.
  19. N. Moustafa and J. Slay (2016) The evaluation of network anomaly detection systems: statistical analysis of the unsw-nb15 data set and the comparison with the kdd99 data set. Information Security Journal: A Global Perspective 25 (1-3), pp. 18–31. Cited by: §V.
  20. B. Mukherjee, L. T. Heberlein and K. N. Levitt (1994) Network intrusion detection. IEEE Network 8 (3), pp. 26–41. Cited by: §VI.
  21. S. K. Pal and S. Mitra (1992) Multilayer perceptron, fuzzy sets, and classification. IEEE Transactions on Neural Networks 3 (5), pp. 683–697. Cited by: §V-H.
  22. V. Paxson (1999) Bro: a system for detecting network intruders in real-time. Computer Networks 31 (23-24), pp. 2435–2463. Cited by: §VI.
  23. L. Portnoy (2000) Intrusion detection with unlabeled data using clustering. Ph.D. Thesis, Columbia University. Cited by: §VI.
  24. M. Roesch (1999) Snort: lightweight intrusion detection for networks.. In Lisa, Vol. 99, pp. 229–238. Cited by: §VI.
  25. P. Saurabh and B. Verma (2016) An efficient proactive artificial immune system based anomaly detection and prevention system. Expert Systems with Applications 60, pp. 311–320. Cited by: §VI.
  26. R. Sommer and V. Paxson (2010) Outside the closed world: on using machine learning for network intrusion detection. In 2010 IEEE Symposium on Security and Privacy, pp. 305–316. Cited by: §VI.
  27. N. Srivastava, G. Hinton, A. Krizhevsky, I. Sutskever and R. Salakhutdinov (2014) Dropout: a simple way to prevent neural networks from overfitting. The Journal of Machine Learning Research 15 (1), pp. 1929–1958. Cited by: item 6.
  28. R. K. Srivastava, K. Greff and J. Schmidhuber (2015) Highway networks. arXiv preprint arXiv:1505.00387. Cited by: §II.
  29. C. Szegedy, W. Liu, Y. Jia, P. Sermanet, S. Reed, D. Anguelov, D. Erhan, V. Vanhoucke and A. Rabinovich (2015) Going deeper with convolutions. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1–9. Cited by: item 1.
  30. Z. Tan, A. Jamdagni, X. He, P. Nanda, R. P. Liu and J. Hu (2014) Detection of denial-of-service attacks based on computer vision techniques. IEEE Transactions on Computers 64 (9), pp. 2519–2533. Cited by: §VI.
  31. M. Tavallaee, E. Bagheri, W. Lu and A. A. Ghorbani (2009) A detailed analysis of the kdd cup 99 data set. In 2009 IEEE Symposium on Computational Intelligence for Security and Defense Applications, pp. 1–6. Cited by: §V.
  32. T. Tieleman and G. Hinton (2017) Divide the gradient by a running average of its recent magnitude. coursera: neural networks for machine learning. Technical Report.. Cited by: §V-C.
  33. C. Tsai and C. Lin (2010) A triangle area based nearest neighbors approach to intrusion detection. Pattern Recognition 43 (1), pp. 222–229. Cited by: §VI.
  34. W. Wang, Y. Sheng, J. Wang, X. Zeng, X. Ye, Y. Huang and M. Zhu (2017) HAST-ids: learning hierarchical spatial-temporal features using deep neural networks to improve intrusion detection. IEEE Access 6, pp. 1792–1806. Cited by: §V-H.
  35. P. Wu, H. Guo and R. Buckland (2019) A transfer learning approach for network intrusion detection. In 2019 IEEE 4th International Conference on Big Data Analytics (ICBDA), pp. 281–285. Cited by: §V-A.
  36. P. Wu and H. Guo (2019) LuNet: a deep neural network for network intrusion detection. In 2019 IEEE Symposium Series on Computational Intelligence (SSCI), pp. 712–717. Cited by: §II, §IV, §V-H.
  37. S. Zanero and S. M. Savaresi (2004) Unsupervised learning techniques for an intrusion detection system. In Proceedings of the 2004 ACM Symposium on Applied computing, pp. 412–419. Cited by: §VI.
  38. J. Zhang, M. Zulkernine and A. Haque (2008) Random-forests-based network intrusion detection systems. IEEE Transactions on Systems, Man, and Cybernetics, Part C (Applications and Reviews) 38 (5), pp. 649–659. Cited by: §V-H.
  39. Y. Zhou and R. Chellappa (1988) Computation of optical flow using a neural network. In IEEE International Conference on Neural Networks, Vol. 1998, pp. 71–78. Cited by: item 3.