Towards an Efficient Anomaly-Based Intrusion Detection for Software-Defined Networks

Abstract

Software-defined networking (SDN) is a new paradigm that enables the development of more flexible network applications. The SDN controller, which represents a centralized control point, is responsible for running various network applications as well as maintaining different network services and functionalities. Choosing an efficient intrusion detection system helps to reduce the overhead of the running controller and creates a more secure network. In this study, we investigate the performance of well-known anomaly-based intrusion detection approaches in terms of accuracy, false positive rate, area under the ROC curve and execution time. Specifically, we focus on supervised machine-learning approaches using the following classifiers: Adaptive Neuro-Fuzzy Inference System (ANFIS), Decision Trees (DT), Extreme Learning Machine (ELM), Naive Bayes (NB), Linear Discriminant Analysis (LDA), Neural Networks (NN), Support Vector Machines (SVM), Random Forest (RF) and K-Nearest Neighbor (KNN). Using the NSL-KDD benchmark dataset, we observe that KNN achieves the best testing accuracy. However, in terms of execution time, we conclude that ELM shows the best results for both the training and testing stages.


1 Introduction

Network security is one of the most important aspects of modern communications. Recently, programmable networks have gained popularity due to their abstracted view of the network, which, in turn, provides a better understanding of complex network operations and increases the effectiveness of the actions taken in the case of a potential threat. Software-Defined Networking (SDN) represents an emerging centralized network architecture in which the forwarding elements are managed by a central unit, called the SDN controller. The controller can obtain traffic statistics from each forwarding element in order to take the appropriate action for preventing any malicious behavior or abuse of the network. At the same time, the SDN controller uses a programmable network protocol, OpenFlow (OF), to communicate and forward its decisions to OF-enabled switches [1].

In spite of the significant benefits of using a centralized controller, the controller itself creates a single point of failure, which makes the network more vulnerable compared with the conventional network architecture [2]. Moreover, the communication channel between the OF-enabled switches and the controller opens the door for various attacks such as Denial of Service (DoS) [3], Host Location Hijacking and Man-in-the-Middle (MITM) attacks [4]. Therefore, an efficient Intrusion Detection System (IDS) for SDNs should be able to make intelligent, real-time decisions. Commonly, an IDS designed for SDNs runs on top of the controller, which places an additional burden on the controller itself. Thus, a lightweight IDS is an advantage, since it allows the controller to detect potential attacks effectively while still performing other fundamental network operations, such as routing and load balancing, in a flexible manner. Scalability is also an important factor that should be taken into consideration during the design stage of the system [4].

There are two main groups of intrusion detection systems: signature-based IDS and anomaly-based IDS. A signature-based IDS searches for predefined patterns within the analyzed network traffic, and therefore performs well only against specified, well-known attacks. An anomaly-based IDS, in contrast, estimates and predicts the behavior of the system, and is thus able to detect unseen intrusion events, an important advantage for detecting zero-day attacks [5]. Anomaly-based IDSs can be grouped into three main categories [5]: statistical-based, knowledge-based, and machine learning-based approaches. Machine learning techniques, in turn, fall into four main categories: (i) supervised, (ii) semi-supervised, (iii) unsupervised and (iv) reinforcement learning. In this paper, we investigate various supervised learning techniques with respect to their accuracy, false positive rate, area under the ROC curve, and the time taken to train and test each classifier.

2 Related Work

The previous research efforts that provide a detailed analysis of supervised machine learning-based intrusion detection are summarized in Table 1. These studies focused on training and testing different machine learning approaches using standard intrusion detection datasets. However, obtaining all of the standard features from an SDN controller could be computationally expensive. Therefore, one can either use a subset of these standard datasets [6] or extract new features based on network traces of standard datasets or on statistics provided by the controller [7]. In this study, we use a subset of features extracted from the NSL-KDD dataset and consider the following supervised machine learning approaches: Adaptive Neuro-Fuzzy Inference System (ANFIS), Decision Trees (DT), Extreme Learning Machine (ELM), Naive Bayes (NB), Linear Discriminant Analysis (LDA), Neural Networks (NN), Support Vector Machines (SVM), Random Forest (RF) and K-Nearest Neighbor (KNN).

Ref. | Year | Algorithms | Dataset
[8] | 2005 | C4.5, K-nearest neighbor, multi-layer perceptron, regularized discriminant analysis, Fisher linear discriminant, support vector machines | KDD CUP'99
[9] | 2007 | Decision Trees, Random Forest, Naive Bayes, Gaussian classifier | KDD CUP'99
[10] | 2009 | J48, SVM, Naive Bayes (NB), NB Tree, Random Forest, Random Tree, multi-layer perceptron (MLP) | NSL-KDD
[11] | 2010 | Discriminative multinomial Naive Bayes classifiers | NSL-KDD
[12] | 2013 | Principal component analysis based feature selection, genetic algorithm based detector generation, J48, NB, MLP, BF-Tree, NB-Tree, RF Tree | NSL-KDD
[13] | 2013 | Feature selection by correlation-based feature selection and consistency-based filter; ADTree, C4.5, J48graft, LADTree, NBTree, RandomTree, RandomForest, REPTree | NSL-KDD
[14] | 2013 | J48, BayesNet, Logistic, SGD, IBK, JRip, PART, Random Forest, Random Tree, REPTree | NSL-KDD
[15] | 2015 | Neural Networks | NSL-KDD
[16] | 2016 | Logistic Regression, Gaussian Naive Bayes, SVM, Random Forest | NSL-KDD
Table 1: Overview of previous supervised machine learning studies for intrusion detection

3 Dataset and Selected Features

As mentioned earlier, in this study we use the NSL-KDD dataset. NSL-KDD is an improved version of the KDD Cup '99 dataset, which suffers from a huge number of redundant records [10]. Both the KDD Cup '99 and NSL-KDD datasets include the features shown in Table 2. In this study, we use only the features that are easy to obtain from SDN controllers [7]. Therefore, features F1, F2, F5, F6, F23 and F24 are selected for this study (a loading sketch is given after Table 2).

F# | Feature name | F# | Feature name | F# | Feature name
F1 | Duration | F15 | Su attempted | F29 | Same srv rate
F2 | Protocol type | F16 | Num root | F30 | Diff srv rate
F3 | Service | F17 | Num file creations | F31 | Srv diff host rate
F4 | Flag | F18 | Num shells | F32 | Dst host count
F5 | Source bytes | F19 | Num access files | F33 | Dst host srv count
F6 | Destination bytes | F20 | Num outbound cmds | F34 | Dst host same srv rate
F7 | Land | F21 | Is host login | F35 | Dst host diff srv rate
F8 | Wrong fragment | F22 | Is guest login | F36 | Dst host same src port rate
F9 | Urgent | F23 | Count | F37 | Dst host srv diff host rate
F10 | Hot | F24 | Srv count | F38 | Dst host serror rate
F11 | Num failed logins | F25 | Serror rate | F39 | Dst host srv serror rate
F12 | Logged in | F26 | Srv serror rate | F40 | Dst host rerror rate
F13 | Num compromised | F27 | Rerror rate | F41 | Dst host srv rerror rate
F14 | Root shell | F28 | Srv rerror rate | |
Table 2: List of features of the KDD Cup '99 dataset.
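As a rough sketch of this preprocessing step, the six selected features can be extracted with pandas as follows. The file names, column indices and one-hot encoding choice are our assumptions based on the standard NSL-KDD file layout, not details given in the paper:

```python
import pandas as pd

# 0-based column indices of F1, F2, F5, F6, F23 and F24 in the standard
# 41-feature NSL-KDD layout; column 41 holds the attack label and, in the
# "+" files, column 42 holds a difficulty score (ignored here).
FEATURE_COLS = {0: "duration", 1: "protocol_type", 4: "src_bytes",
                5: "dst_bytes", 22: "count", 23: "srv_count"}
LABEL_COL = 41

def load_nsl_kdd(path):
    df = pd.read_csv(path, header=None)
    X = df[list(FEATURE_COLS)].rename(columns=FEATURE_COLS)
    # protocol_type is categorical (tcp/udp/icmp); one-hot encode it.
    X = pd.get_dummies(X, columns=["protocol_type"])
    # Binary target: 0 = normal traffic, 1 = any attack.
    y = (df[LABEL_COL] != "normal").astype(int)
    return X, y

X_train, y_train = load_nsl_kdd("KDDTrain+.txt")  # illustrative file names
X_test, y_test = load_nsl_kdd("KDDTest+.txt")
# Align one-hot columns in case a protocol is absent from one split.
X_test = X_test.reindex(columns=X_train.columns, fill_value=0)
```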

As shown in Table 3, NSL-KDD includes a total of 39 attacks, each classified into one of four categories. Moreover, a subset of these attacks appears only in the testing set.

Attack category | Attack names
Denial of service (DoS) | Apache2, Smurf, Neptune, Back, Teardrop, Pod, Land, Mailbomb, Processtable, UDPstorm
Remote to local (R2L) | WarezClient, Guess_Password, WarezMaster, Imap, Ftp_Write, Named, MultiHop, Phf, Spy, SnmpGetAttack, SnmpGuess, Worm, Xsnoop, Xlock, Sendmail
User to root (U2R) | Buffer_Overflow, Httptunnel, Rootkit, LoadModule, Perl, Xterm, Ps, SQLattack
Probe | Satan, Saint, Ipsweep, Portsweep, Nmap, Mscan
Table 3: List of attacks present in the NSL-KDD dataset
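For experiments that need the four-way grouping of Table 3, a small lookup table can map each attack label to its category. This is a hypothetical helper of ours (the paper itself evaluates binary normal-vs-attack detection); note that the dataset files store some names in a different spelling, e.g. "guess_passwd":

```python
# Illustrative mapping from NSL-KDD attack labels (lowercase, as stored in
# the dataset files) to the four categories of Table 3.
ATTACK_CATEGORY = {
    **dict.fromkeys(["apache2", "smurf", "neptune", "back", "teardrop",
                     "pod", "land", "mailbomb", "processtable",
                     "udpstorm"], "DoS"),
    **dict.fromkeys(["warezclient", "guess_passwd", "warezmaster", "imap",
                     "ftp_write", "named", "multihop", "phf", "spy",
                     "snmpgetattack", "snmpguess", "worm", "xsnoop",
                     "xlock", "sendmail"], "R2L"),
    **dict.fromkeys(["buffer_overflow", "httptunnel", "rootkit",
                     "loadmodule", "perl", "xterm", "ps", "sqlattack"],
                    "U2R"),
    **dict.fromkeys(["satan", "saint", "ipsweep", "portsweep", "nmap",
                     "mscan"], "Probe"),
}

def categorize(label):
    # Anything not listed (i.e. "normal") is treated as the Normal class.
    return ATTACK_CATEGORY.get(label, "Normal")
```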

In addition, Table 4 shows the distribution of the normal and attack records in the NSL-KDD training and testing sets.

Set | Total Records | Normal | DoS | R2L | U2R | Probe
KDD Train | 125973 | 67343 (53.46%) | 45927 (36.46%) | 995 (0.79%) | 52 (0.04%) | 11656 (9.25%)
KDD Test | 22544 | 9711 (43.07%) | 7458 (33.08%) | 2754 (12.22%) | 200 (0.89%) | 2421 (10.74%)
Table 4: Distribution of attack and normal records in the NSL-KDD dataset

4 Evaluation Metrics

The performance of each classifier is evaluated in terms of accuracy, false positive rate, area under the ROC curve (AUC) and execution time. A good IDS should achieve high accuracy with a low false positive rate. The accuracy is calculated by:

\[ \text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \qquad (1) \]

True Positives (TP) is the number of attack records correctly classified; True Negatives (TN) is the number of normal traffic records correctly classified; False Positives (FP) is the number of normal traffic records falsely classified as attacks; and False Negatives (FN) is the number of attack records falsely classified as normal. The false positive rate is calculated by:

\[ \text{FPR} = \frac{FP}{FP + TN} \qquad (2) \]

In addition, we evaluate the selected classifiers based on execution time and on the receiver operating characteristic (ROC) curve, where the area under the curve (AUC) can be used to compare one classifier with another: the higher the AUC, the better the IDS.
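A minimal sketch of these three metrics, assuming scikit-learn and binary labels with attacks as the positive class (the evaluate helper below is ours, not from the paper):

```python
from sklearn.metrics import confusion_matrix, roc_auc_score

def evaluate(y_true, y_pred, y_score):
    # Confusion matrix with normal = 0 (negative) and attack = 1 (positive).
    tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
    accuracy = (tp + tn) / (tp + tn + fp + fn)   # Eq. (1)
    fpr = fp / (fp + tn)                         # Eq. (2)
    auc = roc_auc_score(y_true, y_score)         # y_score: attack probability
    return accuracy, fpr, auc
```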

5 Experimental Results

The experiment is conducted on an Intel i5 machine with 12 GB of RAM. Table 5 shows the results obtained for both the training and testing stages. In terms of accuracy, the most accurate classifiers in the training stage are DT, KNN and RF, with only slight differences between them. For the testing stage, however, the KNN approach achieved the highest accuracy. In terms of false positive rate, it is worth mentioning that the RF approach achieved the best results.

Method | Training Accuracy (%) | Testing Accuracy (%) | Training FPR (%) | Testing FPR (%)
Naive Bayes | 59.27 | 49.88 | 3.7227 | 5.14
NN | 84.10 | 66.22 | 2.41 | 1.61
LDA | 87.57 | 69.36 | 3.26 | 2.24
ANFIS | 88.88 | 68.80 | 2.89 | 2.46
SVM | 90.86 | 71.00 | 6.55 | 10.27
ELM | 93.16 | 74.17 | 2.25 | 2.31
RandomForest | 98.09 | 75.96 | 0 | 0
KNN | 98.23 | 77.09 | 3.128 | 4.07
Decision Trees | 98.37 | 74.43 | 0.306 | 6.43
Table 5: Accuracy and false positive rate obtained after training and testing different supervised machine learning algorithms
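Results along the lines of Table 5 could be reproduced with scikit-learn as sketched below. This is illustrative only: the paper does not report hyperparameters (defaults are used here), ANFIS and ELM are omitted because scikit-learn has no standard implementation of them, and the loading and evaluation helpers sketched earlier are reused:

```python
import time

from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.ensemble import RandomForestClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

CLASSIFIERS = {
    "Naive Bayes": GaussianNB(),
    "NN": MLPClassifier(max_iter=500),
    "LDA": LinearDiscriminantAnalysis(),
    "SVM": SVC(probability=True),
    "RandomForest": RandomForestClassifier(),
    "KNN": KNeighborsClassifier(),
    "Decision Trees": DecisionTreeClassifier(),
}

for name, clf in CLASSIFIERS.items():
    start = time.perf_counter()
    clf.fit(X_train, y_train)            # training time
    train_time = time.perf_counter() - start

    start = time.perf_counter()
    y_pred = clf.predict(X_test)         # testing time
    test_time = time.perf_counter() - start

    acc, fpr, auc = evaluate(y_test, y_pred,
                             clf.predict_proba(X_test)[:, 1])
    print(f"{name}: acc={acc:.4f}  fpr={fpr:.4f}  auc={auc:.4f}  "
          f"train={train_time:.2f}s  test={test_time:.2f}s")
```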

In terms of area under the ROC curve, as shown in Fig. 1 and Fig. 2, the KNN approach achieves the best AUC for both the training and testing tasks, followed by DT, RF and ELM with slight differences among them. ANFIS and SVM had nearly the same AUC for the training task, while NB achieved the lowest AUC for both tasks. For the testing task, as shown in Fig. 2, RF achieved a higher AUC than DT; likewise, ELM achieved a higher AUC than DT, and SVM showed a better AUC than LDA and ANFIS. On the other hand, using a subset of flow-based features was fairly good for the training stage but significantly reduced the testing accuracy, which may indicate that selecting optimal features should be taken into consideration in order to achieve a higher accuracy during the testing stage.

Figure 1: ROC curve for training different supervised machine learning algorithms
Figure 2: ROC curve for testing different supervised machine learning algorithms
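Curves like those in Fig. 1 and Fig. 2 can be generated from the fitted classifiers, for example with matplotlib (an illustrative sketch of ours, not the authors' plotting code):

```python
import matplotlib.pyplot as plt
from sklearn.metrics import roc_curve

plt.figure()
for name, clf in CLASSIFIERS.items():
    # Reuse the classifiers fitted in the timing loop above.
    fpr, tpr, _ = roc_curve(y_test, clf.predict_proba(X_test)[:, 1])
    plt.plot(fpr, tpr, label=name)
plt.plot([0, 1], [0, 1], "k--", label="chance")  # diagonal reference line
plt.xlabel("False positive rate")
plt.ylabel("True positive rate")
plt.legend(loc="lower right")
plt.show()
```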

In terms of execution time, from Fig. 3 and Fig. 4, we observe that the ELM approach achieves the best results for both the training and testing tasks. Therefore, we conclude that ELM is a time-efficient choice for SDNs. It is also observed that the ANFIS approach shows the worst training time. On the other hand, although the KNN approach achieved the highest testing accuracy, it showed the worst testing time, which may indicate that KNN is not the best choice for SDNs, where each controller may need to handle thousands of flows per second. Consequently, there is a clear trade-off between accuracy and execution time that should be taken into consideration when choosing a supervised machine learning-based IDS for SDNs.

Figure 3: Execution time for training different supervised machine learning methods
Figure 4: Execution time for testing different supervised machine learning methods

6 Conclusion

In this paper, we provided a comparative study for choosing an efficient anomaly-based intrusion detection method for SDNs. We focused on supervised machine learning approaches using the following classifiers: ANFIS, NN, LDA, DT, RF, SVM, KNN, NB and ELM. Using the NSL-KDD dataset, we conclude from our experiments that the KNN approach shows the best performance in terms of accuracy and AUC, whereas the RF approach achieves the best false positive rate and ELM achieves the best execution time. Our future work will focus on comparing the results obtained in this study with other machine learning approaches and on exploring other flow-based features that could be used to achieve a higher testing accuracy.

References

  1. Ha, T., Kim, S., An, N., Narantuya, J., Jeong, C., Kim, J., Lim, H.: 'Suspicious traffic sampling for intrusion detection in software-defined networks', Comput. Netw., 2016, 109, pp. 172-182.
  2. AlEroud, A., Alsmadi, I.: 'Identifying cyber-attacks on software defined networks: an inference-based intrusion detection approach', J. Netw. Comput. Appl., 2017, 80, pp. 152-164.
  3. Cui, Y., Yan, L., Li, S., Xing, H., Pan, W., Zhu, J., Zheng, X.: ’SD-Anti-DDoS: fast and efficient DDoS defense in software-defined networks’, J. Netw. Comput. Appl, 2016, 68, pp. 65–79.
  4. Hong, S., Xu, L., Wang, H., Gu, G.: ’Poisoning network visibility in software-defined networks: new attacks and countermeasures’. Proc. NDSS. 22nd Annu. Network and Distributed System Security Symposium, California, USA, Feb 2015, pp. 1-15.
  5. García-Teodoro, P., Díaz-Verdejo, J., Maciá-Fernández, G., Vázquez, E.: 'Anomaly-based network intrusion detection: techniques, systems and challenges', Comput. Secur., 2009, 28, pp. 18-28.
  6. Tang, T. A., Mhamdi, L., McLernon, D., Zaidi, S. A. R., Ghogho, M.: ’Deep learning approach for network intrusion detection in software defined networking’. Proc. International Conf. on Wireless Networks and Mobile Communications, Fez, Morocco, Oct. 2016, pp. 258-263.
  7. Braga, R., Mota, E., Passito, A.: ’Lightweight DDoS flooding attack detection using NOX/OpenFlow’. Proc. IEEE 35th Conf. on Local Computer Networks, Denver, USA, Oct. 2010, pp. 408-415.
  8. Laskov, P., Düssel, P., Schäfer C., Rieck, K.: ’Learning intrusion detection: supervised or unsupervised?’. Proc. 13th International Conf. Image Analysis and Processing–ICIAP, Cagliari, Italy, Sept. 2005, pp. 50-57.
  9. Gharibian, F., Ghorbani, A. A.: 'Comparative study of supervised machine learning techniques for intrusion detection'. Proc. IEEE Fifth Annu. Conf. on Communication Networks and Services Research, New Brunswick, Canada, May 2007, pp. 350-358.
  10. Tavallaee, M., Bagheri, E., Lu, W., Ghorbani, A. A.: ’A detailed analysis of the KDD CUP 99 data set’. Proc. IEEE Symposium on Computational Intelligence for Security and Defense Applications, Ontario, Canada, July 2009, pp. 1-6.
  11. Panda, M., Abraham, A., Patra, M. R.: 'Discriminative multinomial naive Bayes for network intrusion detection'. Proc. IEEE Sixth International Conf. on Information Assurance and Security, Atlanta, USA, Aug. 2010, pp. 5-10.
  12. Aziz, A. S. A., Hassanien, A. E., Hanafi, S. E. O., Tolba, M. F.: 'Multi-layer hybrid machine learning techniques for anomalies detection and classification approach'. Proc. IEEE 13th International Conf. on Hybrid Intelligent Systems, Gammarth, Tunisia, Dec. 2013, pp. 215-220.
  13. Thaseen S., and Kumar, C. A.: ’An analysis of supervised tree based classifiers for intrusion detection system’. Proc. IEEE International Conf. on Pattern Recognition, Informatics and Mobile Engineering, Salem, India, Feb. 2013, pp. 294-299.
  14. Chauhan, H., Kumar, V., Pundir, S., Pilli E. S.: ’A comparative study of classification techniques for intrusion detection’. Proc. IEEE International Symposium on Computational and Business Intelligence, New Delhi, India, Aug. 2013, pp. 40-43.
  15. Ingre B., Yadav, A.: ’Performance analysis of NSL-KDD dataset using ANN’. Proc. IEEE International Conf. on Signal Processing and Communication Engineering Systems, Guntur, India, Jan. 2015, pp. 92-96.
  16. Belavagi, M. C., Muniyal, B.: ’Performance evaluation of supervised machine learning algorithms for intrusion detection’. Proc. Twelfth International Multi-Conf. on Information Processing, Bangalore, India, Dec. 2016, pp. 117-123.