Towards an Efficient Anomaly-Based Intrusion Detection for Software-Defined Networks
Abstract
Software-defined networking (SDN) is a new paradigm that allows the development of more flexible network applications. The SDN controller, which represents a centralized control point, is responsible for running various network applications as well as maintaining different network services and functionalities. Choosing an efficient intrusion detection system helps reduce the overhead on the running controller and creates a more secure network. In this study, we investigate the performance of well-known anomaly-based intrusion detection approaches in terms of accuracy, false positive rate, area under the ROC curve and execution time. Specifically, we focus on supervised machine-learning approaches and use the following classifiers: Adaptive Neuro-Fuzzy Inference System (ANFIS), Decision Trees (DT), Extreme Learning Machine (ELM), Naive Bayes (NB), Linear Discriminant Analysis (LDA), Neural Networks (NN), Support Vector Machines (SVM), Random Forest (RF) and K Nearest-Neighbor (KNN). Using the NSL-KDD benchmark dataset, we observe that KNN achieves the best testing accuracy. However, in terms of execution time, we conclude that ELM shows the best results for both the training and testing stages.
1 Introduction
Network security is one of the most important aspects of modern communications. Recently, programmable networks have gained popularity due to their abstracted view of the network, which provides a better understanding of complex network operations and increases the effectiveness of the actions taken in response to a potential threat. Software Defined Networking (SDN) represents an emerging centralized network architecture in which the forwarding elements are managed by a central unit, called an SDN controller. The controller can obtain traffic statistics from each forwarding element in order to take the action required to prevent malicious behavior or abuse of the network. At the same time, the SDN controller uses a programmable network protocol, the OpenFlow (OF) protocol, to communicate and forward its decisions to OF-enabled switches [1].
In spite of the significant benefits of using a centralized controller, the controller itself creates a single point of failure, which makes the network more vulnerable compared with the conventional network architecture [2]. In addition, the communication channel between the OF-enabled switches and the controller opens the door to various attacks such as Denial of Service (DoS) [3], Host Location Hijacking and Man in the Middle (MIM) attacks [4]. Therefore, an efficient Intrusion Detection System (IDS) for SDNs should be able to make intelligent, real-time decisions. Commonly, an IDS designed for SDNs runs on top of the controller, which places an additional burden on the controller itself. Thus, a lightweight IDS is an advantage, since it detects potential attacks effectively while leaving the controller free to perform other fundamental network operations, such as routing and load balancing, in a more flexible manner. Scalability is also an important factor that should be taken into consideration during the design stage of the system [4]. There are two main groups of intrusion detection systems: signature-based IDS and anomaly-based IDS. A signature-based IDS searches for defined patterns within the analyzed network traffic, whereas an anomaly-based IDS estimates and predicts the behavior of the system. A signature-based IDS performs well only for specified, well-known attacks. In contrast, an anomaly-based IDS is able to detect unseen intrusion events, which is an important advantage for detecting zero-day attacks [5]. Anomaly-based IDS can be grouped into three main categories [5]: statistical-based approaches, knowledge-based approaches, and machine learning-based approaches. In this study, we focus on machine learning-based approaches.
Machine learning techniques can be categorized into four main categories: (i) supervised techniques, (ii) semi-supervised techniques, (iii) unsupervised techniques and (iv) reinforcement techniques. In this paper, we investigate various supervised learning techniques with respect to their accuracy, false positive rate, area under ROC curve and time taken to train and test each classifier.
2 Related Work
Previous research efforts that provide a detailed analysis of supervised machine learning-based intrusion detection are summarized in Table 1. These studies focused on training and testing different machine learning approaches using the full feature sets of standard intrusion detection datasets. However, obtaining all these features from an SDN controller could be computationally expensive. Therefore, we can either use a subset of the features in these standard datasets [6] or extract new features based on network traces of standard datasets or statistics provided by the controller [7]. In this study, we use a subset of features extracted from the NSL-KDD dataset and we consider the following supervised machine learning approaches: Adaptive Neuro-Fuzzy Inference System (ANFIS), Decision Trees (DT), Extreme Learning Machine (ELM), Naive Bayes (NB), Linear Discriminant Analysis (LDA), Neural Networks (NN), Support Vector Machines (SVM), Random Forest (RF) and K Nearest-Neighbor (KNN).
Ref. | Year | Algorithms | Dataset |
---|---|---|---|
[8] | 2005 | | KDD CUP'99 |
[9] | 2007 | | KDD CUP'99 |
[10] | 2009 | | NSL-KDD |
[11] | 2010 | | NSL-KDD |
[12] | 2013 | | NSL-KDD |
[13] | 2013 | | NSL-KDD |
[14] | 2013 | | NSL-KDD |
[15] | 2015 | Neural Networks | NSL-KDD |
[16] | 2016 | | NSL-KDD |
3 Dataset and Selected Features
As mentioned earlier, in this study we use the NSL-KDD dataset. NSL-KDD is an improved version of the KDD Cup99 dataset, which suffers from a huge number of redundant records [10]. Both the KDD Cup99 and NSL-KDD datasets include the features shown in Table 2. In this study, we use only features that are easy to obtain from SDN controllers [7]. Therefore, features F1, F2, F5, F6, F23 and F24 are selected for this study.
F. # | Feature name. | F. # | Feature name. | F. # | Feature name. |
F1 | Duration | F15 | Su attempted | F29 | Same srv rate |
F2 | Protocol type | F16 | Num root | F30 | Diff srv rate |
F3 | Service | F17 | Num file creations | F31 | Srv diff host rate |
F4 | Flag | F18 | Num shells | F32 | Dst host count |
F5 | Source bytes | F19 | Num access files | F33 | Dst host srv count |
F6 | Destination bytes | F20 | Num outbound cmds | F34 | Dst host same srv rate |
F7 | Land | F21 | Is host login | F35 | Dst host diff srv rate |
F8 | Wrong fragment | F22 | Is guest login | F36 | Dst host same src port rate |
F9 | Urgent | F23 | Count | F37 | Dst host srv diff host rate |
F10 | Hot | F24 | Srv count | F38 | Dst host serror rate |
F11 | Number failed logins | F25 | Serror rate | F39 | Dst host srv serror rate |
F12 | Logged in | F26 | Srv serror rate | F40 | Dst host rerror rate |
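The feature selection described above can be sketched as follows. This is a minimal illustration, not the authors' preprocessing code: it assumes each NSL-KDD record is a comma-separated line holding features F1 to F41 in order, followed by the class label, and the helper name `select_features`, the column names and the sample record are hypothetical.

```python
# Features kept in this study, keyed by their tag in Table 2.
# The snake_case names are illustrative labels, not taken from the paper.
SELECTED = {
    "F1": "duration",
    "F2": "protocol_type",
    "F5": "src_bytes",
    "F6": "dst_bytes",
    "F23": "count",
    "F24": "srv_count",
}


def select_features(line):
    """Keep only the six selected features and the label of one record.

    Assumes a comma-separated record: F1..F41 first, then the class label.
    """
    fields = line.strip().split(",")
    row = {}
    for tag, name in SELECTED.items():
        index = int(tag[1:]) - 1  # F1 is stored at position 0
        value = fields[index]
        # Protocol type is categorical; the other five features are numeric.
        row[name] = value if name == "protocol_type" else float(value)
    row["label"] = fields[41]  # the label follows the 41 features
    return row


# Made-up record for illustration only (not a real NSL-KDD line).
fields = ["0"] * 41
fields[1] = "tcp"
fields[4] = "491"
fields[22] = "2"
fields[23] = "2"
sample = ",".join(fields + ["normal"])
row = select_features(sample)
```

A categorical feature such as the protocol type would still need to be encoded numerically before it is passed to most of the classifiers considered here.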
As shown in Table 3, NSL-KDD includes a total of 39 attacks, each of which is classified into one of the following four categories. Moreover, some of these attacks appear only in the testing set.
Attack category | Attack name |
---|---|
DoS | |
R2L | |
U2R | |
Probe | |
In addition, Table 4 shows the distribution of the normal and attack records in NSL-KDD training and testing sets.
Dataset | Total Records | Normal | DoS | R2L | U2R | Probe |
---|---|---|---|---|---|---|
KDD Train | 125973 | 67343 (53.46%) | 45927 (36.46%) | 995 (0.79%) | 52 (0.04%) | 11656 (9.25%) |
KDD Test | 22544 | 9711 (43.07%) | 7458 (33.08%) | 2754 (12.22%) | 200 (0.89%) | 2421 (10.74%) |
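The class shares in Table 4 follow directly from the record counts. The short sketch below recomputes them; the function name `class_shares` is an illustrative choice, and the result is rounded to two decimals, so a value may differ from the table in the last digit if the table was truncated rather than rounded.

```python
# Record counts per class, copied from Table 4.
train_counts = {"Normal": 67343, "DoS": 45927, "R2L": 995, "U2R": 52, "Probe": 11656}
test_counts = {"Normal": 9711, "DoS": 7458, "R2L": 2754, "U2R": 200, "Probe": 2421}


def class_shares(counts):
    """Return the percentage of records in each class, rounded to 2 decimals."""
    total = sum(counts.values())
    return {name: round(100.0 * n / total, 2) for name, n in counts.items()}


train_shares = class_shares(train_counts)
test_shares = class_shares(test_counts)
```

The shares make the imbalance explicit: U2R and R2L together account for under 1% of the training records but over 13% of the testing records.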
4 Evaluation Metrics
The performance of each classifier is evaluated in terms of accuracy, False Positive Rate, Area Under ROC Curve (AUC) and execution time. A good IDS should achieve high accuracy with low false positive rate. The accuracy is calculated by:
$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \qquad (1)$$

True Positives (TP) is the number of attack records correctly classified; True Negatives (TN) is the number of normal traffic records correctly classified; False Positives (FP) is the number of normal traffic records incorrectly classified as attacks; and False Negatives (FN) is the number of attack records incorrectly classified as normal. The false positive rate (FPR) is calculated by:

$$\text{FPR} = \frac{FP}{FP + TN} \qquad (2)$$
In addition, we evaluate the performance of the selected classifiers based on execution time as well as an analysis of the receiver operating characteristic (ROC) curve, where the area under the curve (AUC) can be used to compare classifiers with one another. The higher the AUC, the better the IDS.
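As a minimal sketch of how the first two metrics can be computed from predictions, the code below counts TP, TN, FP and FN with attacks as the positive class, then applies the usual definitions: accuracy as the fraction of correctly classified records, and false positive rate as FP / (FP + TN). The function names and the toy labels are illustrative assumptions, not part of the study.

```python
def confusion_counts(y_true, y_pred, positive="attack"):
    """Count TP, TN, FP, FN, treating `positive` as the attack class."""
    tp = tn = fp = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == positive and pred == positive:
            tp += 1
        elif truth != positive and pred != positive:
            tn += 1
        elif truth != positive and pred == positive:
            fp += 1  # normal traffic flagged as an attack
        else:
            fn += 1  # attack classified as normal traffic
    return tp, tn, fp, fn


def accuracy(tp, tn, fp, fn):
    """Fraction of all records that were classified correctly."""
    return (tp + tn) / (tp + tn + fp + fn)


def false_positive_rate(fp, tn):
    """Fraction of normal records that were flagged as attacks."""
    return fp / (fp + tn)


# Toy labels for illustration only.
y_true = ["attack", "attack", "normal", "normal", "normal"]
y_pred = ["attack", "normal", "normal", "attack", "normal"]
tp, tn, fp, fn = confusion_counts(y_true, y_pred)
acc = accuracy(tp, tn, fp, fn)
fpr = false_positive_rate(fp, tn)
```

Multiplying either value by 100 gives a percentage.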
5 Experimental Results
The experiment was conducted on an Intel i5 machine with 12 GB of RAM. Table 5 shows the results obtained for both the training and testing stages. In terms of accuracy, the most accurate classifiers for the training stage are DT, KNN and RF, with only a slight difference between them. For the testing stage, however, the KNN approach achieved the highest accuracy. In terms of false positive rate, the RF approach achieved the best results, with a false positive rate of 0 in both stages.
Method | Training accuracy (%) | Testing accuracy (%) | Training false positive rate (%) | Testing false positive rate (%) |
---|---|---|---|---|
Naive Bayes | 59.27 | 49.88 | 3.7227 | 5.14 |
NN | 84.10 | 66.22 | 2.41 | 1.61 |
LDA | 87.57 | 69.36 | 3.26 | 2.24 |
ANFIS | 88.88 | 68.80 | 2.89 | 2.46 |
SVM | 90.86 | 71.00 | 6.55 | 10.27 |
ELM | 93.16 | 74.17 | 2.25 | 2.31 |
Random Forest | 98.09 | 75.96 | 0 | 0 |
KNN | 98.23 | 77.09 | 3.128 | 4.07 |
Decision Trees | 98.37 | 74.43 | 0.306 | 6.43 |
In terms of area under the ROC curve, as shown in Fig. 1 and Fig. 2, the KNN approach achieves the best AUC for both the training and testing tasks, followed by DT, RF and ELM, with only slight differences between them. ANFIS and SVM had nearly the same AUC for the training task. NB, however, achieved the lowest AUC for both the training and testing tasks. For the testing task, as shown in Fig. 2, RF achieved a higher AUC than DT, and ELM also achieved a higher AUC than DT. In addition, SVM showed a better AUC than LDA and ANFIS. Overall, using a subset of flow-based features gave fairly good results for the training stage but significantly reduced the testing accuracy, which may indicate that selecting optimal features should be taken into consideration in order to achieve a higher accuracy during the testing stage.
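For readers who want to reproduce an AUC comparison, one simple way to compute the area under the ROC curve is the rank-based formulation: AUC equals the probability that a randomly chosen attack record receives a higher score than a randomly chosen normal record, with ties counted as one half. The sketch below is an illustrative assumption about how such a value could be obtained, not the procedure used in this study; the function name `auc` and the toy scores are hypothetical.

```python
def auc(scores, labels):
    """Area under the ROC curve via pairwise comparison.

    `scores` are classifier outputs (higher means more attack-like) and
    `labels` are 1 for attack records, 0 for normal records.
    """
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a win
    return wins / (len(positives) * len(negatives))


# Toy scores for illustration only.
example_auc = auc([0.9, 0.8, 0.3, 0.1], [1, 0, 1, 0])
perfect_auc = auc([0.9, 0.8, 0.3, 0.1], [1, 1, 0, 0])
```

The pairwise loop is quadratic in the number of records, so on a dataset the size of NSL-KDD a sort-based implementation would be preferable.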


In terms of execution time, Fig. 3 and Fig. 4 show that the ELM approach achieves the best results for both the training and testing tasks. Therefore, we conclude that ELM is a time-efficient choice for SDNs. The ANFIS approach shows the worst training time. Although the KNN approach achieved the highest accuracy for the testing stage, it also showed the worst testing time, which may indicate that KNN is not the best choice for SDNs, where each controller may need to handle thousands of flows per second. Consequently, there is a trade-off between accuracy and execution time that should be taken into consideration when choosing a supervised machine learning-based IDS for SDNs.
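The testing-time cost of KNN is easy to see in code: training only stores the records, while each prediction must compare the query against every stored record. The sketch below shows a minimal 1-nearest-neighbor classifier together with a small timing helper. It is an illustration under stated assumptions, not the implementation benchmarked in this study; the class `OneNearestNeighbor`, the helper `timed` and the synthetic records are all hypothetical.

```python
import time


class OneNearestNeighbor:
    """Minimal 1-NN classifier: training only stores the data."""

    def fit(self, X, y):
        self.X = list(X)
        self.y = list(y)
        return self

    def predict(self, X):
        predictions = []
        for query in X:
            # Every prediction scans the whole training set.
            best = min(
                range(len(self.X)),
                key=lambda i: sum((a - b) ** 2 for a, b in zip(self.X[i], query)),
            )
            predictions.append(self.y[best])
        return predictions


def timed(fn, *args):
    """Return (result, elapsed seconds) for a single call."""
    start = time.perf_counter()
    result = fn(*args)
    return result, time.perf_counter() - start


# Synthetic two-feature records, for illustration only.
train_X = [(0.0, 0.0), (0.1, 0.2), (5.0, 5.0), (5.2, 4.9)]
train_y = ["normal", "normal", "attack", "attack"]
test_X = [(0.05, 0.1), (5.1, 5.0)]

model = OneNearestNeighbor()
_, train_seconds = timed(model.fit, train_X, train_y)
predicted, test_seconds = timed(model.predict, test_X)
```

Because prediction cost grows with the number of stored training records, the same helper applied to a training set of NSL-KDD size would show testing time dominating, which is consistent with the trade-off noted above.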


6 Conclusion
In this paper, we provided a comparative study to support the choice of an efficient anomaly-based intrusion detection method for SDNs. We focused on supervised machine learning approaches, using the following classifiers: ANFIS, NN, LDA, DT, RF, SVM, KNN, NB and ELM. Using the NSL-KDD dataset and based on our experimental studies, we conclude that the KNN approach shows the best performance in terms of accuracy and AUC, whereas the RF approach achieves the best results in terms of false positive rate. In terms of execution time, ELM achieves the best results. Our future work will focus on comparing the results obtained from this study with other machine learning approaches and on exploring other flow-based features that could be used to achieve a higher testing accuracy.
References
[1] Ha, T., Kim, S., An, N., Narantuya, J., Jeong, C., Kim, J., Lim, H.: 'Suspicious traffic sampling for intrusion detection in software-defined networks', Comput. Netw., 2016, 109, pp. 172-182.
[2] AlEroud, A., Alsmadi, I.: 'Identifying cyber-attacks on software defined networks: An inference-based intrusion detection approach', J. Netw. Comput. Appl., 2017, 80, pp. 152-164.
[3] Cui, Y., Yan, L., Li, S., Xing, H., Pan, W., Zhu, J., Zheng, X.: 'SD-Anti-DDoS: fast and efficient DDoS defense in software-defined networks', J. Netw. Comput. Appl., 2016, 68, pp. 65-79.
[4] Hong, S., Xu, L., Wang, H., Gu, G.: 'Poisoning network visibility in software-defined networks: new attacks and countermeasures'. Proc. NDSS. 22nd Annu. Network and Distributed System Security Symposium, California, USA, Feb. 2015, pp. 1-15.
[5] García-Teodoro, P., Díaz-Verdejo, J., Maciá-Fernández, G., Vázquez, E.: 'Anomaly-based network intrusion detection: Techniques, systems and challenges', Comput. Secur., 2009, 28, pp. 18-28.
[6] Tang, T. A., Mhamdi, L., McLernon, D., Zaidi, S. A. R., Ghogho, M.: 'Deep learning approach for network intrusion detection in software defined networking'. Proc. International Conf. on Wireless Networks and Mobile Communications, Fez, Morocco, Oct. 2016, pp. 258-263.
[7] Braga, R., Mota, E., Passito, A.: 'Lightweight DDoS flooding attack detection using NOX/OpenFlow'. Proc. IEEE 35th Conf. on Local Computer Networks, Denver, USA, Oct. 2010, pp. 408-415.
[8] Laskov, P., Düssel, P., Schäfer, C., Rieck, K.: 'Learning intrusion detection: supervised or unsupervised?'. Proc. 13th International Conf. Image Analysis and Processing (ICIAP), Cagliari, Italy, Sept. 2005, pp. 50-57.
[9] Gharibian, F., Ghorbani, A. A.: 'Comparative study of supervised machine learning techniques for intrusion detection'. Proc. IEEE Fifth Annu. Conf. on Communication Networks and Services Research, New Brunswick, Canada, May 2007, pp. 350-358.
[10] Tavallaee, M., Bagheri, E., Lu, W., Ghorbani, A. A.: 'A detailed analysis of the KDD CUP 99 data set'. Proc. IEEE Symposium on Computational Intelligence for Security and Defense Applications, Ontario, Canada, July 2009, pp. 1-6.
[11] Panda, M., Abraham, A., Patra, M. R.: 'Discriminative multinomial naive bayes for network intrusion detection'. Proc. IEEE Sixth International Conf. on Information Assurance and Security, Atlanta, USA, Aug. 2010, pp. 5-10.
[12] Aziz, A. S. A., Hassanien, A. E., Hanaf, S. E. O., Tolba, M. F.: 'Multi-layer hybrid machine learning techniques for anomalies detection and classification approach'. Proc. IEEE 13th International Conf. on Hybrid Intelligent Systems, Gammarth, Tunisia, Dec. 2013, pp. 215-220.
[13] Thaseen, S., Kumar, C. A.: 'An analysis of supervised tree based classifiers for intrusion detection system'. Proc. IEEE International Conf. on Pattern Recognition, Informatics and Mobile Engineering, Salem, India, Feb. 2013, pp. 294-299.
[14] Chauhan, H., Kumar, V., Pundir, S., Pilli, E. S.: 'A comparative study of classification techniques for intrusion detection'. Proc. IEEE International Symposium on Computational and Business Intelligence, New Delhi, India, Aug. 2013, pp. 40-43.
[15] Ingre, B., Yadav, A.: 'Performance analysis of NSL-KDD dataset using ANN'. Proc. IEEE International Conf. on Signal Processing and Communication Engineering Systems, Guntur, India, Jan. 2015, pp. 92-96.
[16] Belavagi, M. C., Muniyal, B.: 'Performance evaluation of supervised machine learning algorithms for intrusion detection'. Proc. Twelfth International Multi-Conf. on Information Processing, Bangalore, India, Dec. 2016, pp. 117-123.
