Comparison of Intrusion Detection System Hybrid Approach in Computer Networks with Previous Methods

Various techniques have been used to design misuse detection systems, including machine learning algorithms, expert systems and statistical methods. This study compares a hybrid approach to intrusion detection in computer networks with previous methods, with the aim of improving attack detection and reducing false alarms. The architecture of the proposed method has three stages. In the first stage, the data are pre-processed, features are selected using methods such as information gain and the Fisher algorithm, and samples are selected using various methods such as self-organizing maps, K-means clustering and data classification. In the second stage, four classifiers, i.e. decision tree, naïve Bayesian, KNN (K-nearest neighbors) and neural network, are used to generate median data. In the third stage, an incremental classifier based on decision trees is used. Results show that the proposed hybrid method is more efficient than both previous individual and combined classifiers in detecting denial of service (DoS), port scanning, remote to local (R2L) and user to root (U2R) attacks.

Several studies have been carried out in this field (5). A strategy based on distributed factors in combination with PCC has been used. In this strategy, the attack detection system was divided into two separate layers, a host layer and a classification layer. In [6], the proposed intrusion detection system includes four components. In [7], a set of one-class classifiers that use different trainers is presented, and in [8], 6 of the 41 features available in the KDD-CUP99 data set are selected. Results show that this combination could recognize 97.25% of the instances correctly. In [9], a hybrid method is used to detect abnormalities in wireless sensor networks. In [10], the results are improved by combining K-means clustering with a decision tree built by the C4.5 algorithm. In [11], Jiang presents a new method that merges abnormality detection and misuse detection in a hierarchical radial basis function network (HRBFN). In [12], a support-vector machine, simulated annealing and a decision tree are used to detect attacks. A hybrid approach for intrusion detection systems based on distance summation was presented in 2014 (13). In 2012, a study titled "An Implementation of Intrusion Detection System Using Genetic Algorithm" by Hoque, Mukit and Bikas was published (14).

In hybrid methods, classifiers are combined so that the input data are pre-processed and each classifier uses the data generated at the previous level. Hybrid methods are in practice more efficient than the two previous approaches, and most studies in recent years have focused on them. This study proposes a three-stage method and strives to improve attack detection and reduce false alarms. The rest of the paper is organized as follows: the second section explains the proposed method, the third section evaluates the results and the fourth section concludes the study.

Proposed Method
In the proposed method, we first need to assign to each class the classifiers that can distinguish and detect it better than the other classifiers. In other words, each group of classifiers needs to perform better in the detection of a specific class. Next, in order to improve class detection, we try to learn the detection pattern of these classifiers. The output of each classifier in the second stage is a binary value that specifies whether or not an instance belongs to a class. Different classifiers can be used in the second stage, and there is no requirement that the same classifiers be used for all classes. To create the median results in the second stage, four classifiers, namely decision tree, KNN (K-nearest neighbors), naïve Bayesian and neural network, are used for each class label. In the last stage, the generated median data are used to train the final classifier. In this research, an incremental method based on decision trees (AdaBoost) is used as the final classifier. The advantage is that the detection patterns of the classifiers are combined and learned. This method can improve the detection rate and the precision of the hybrid approach.

According to Figure 4, the U2R data are distributed unevenly between the training and the test sets. Furthermore, in some areas of the test data there are cases that never appear in the training sets. According to Figure 5, the R2L attack data are spread out in the training area, while the test data are almost entirely concentrated in one area. The number of training samples available in that cell is very small, so the detection precision for data related to this attack is likely to be very low. Because R2L attacks are very similar to normal network activity and few training samples are available, especially in the denser areas of the training instances, detecting these attacks is a major challenge for intrusion detection systems.
Based on Figure 6, the training and test instances of normal activity in the data set are evenly distributed. The number of training instances in each area is also judged to be adequately distributed.
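The second-stage construction of median data described in this section can be sketched as follows. This is only an illustration under stated assumptions, not the authors' implementation: the function names, the dictionary layout and the toy threshold rules that stand in for trained classifiers are all hypothetical. It shows how the binary outputs of the classifiers assigned to each class can be concatenated into one median vector per instance, which would then be passed to the final classifier.

```python
from typing import Callable, Dict, List, Sequence

Sample = Sequence[float]
# A binary detector answers "does this sample belong to my class?" (truthy or falsy).
BinaryDetector = Callable[[Sample], object]


def build_median_vector(sample: Sample,
                        detectors: Dict[str, List[BinaryDetector]]) -> List[int]:
    """Concatenate the binary outputs of every second-stage detector.

    `detectors` maps a class label (e.g. "DoS") to the detectors assigned to
    that class; different classes may use different detectors.
    """
    median = []
    for label in sorted(detectors):  # fixed label order keeps columns aligned
        for detect in detectors[label]:
            median.append(int(bool(detect(sample))))
    return median


def build_median_data(samples: Sequence[Sample],
                      detectors: Dict[str, List[BinaryDetector]]) -> List[List[int]]:
    """Build one median vector per sample."""
    return [build_median_vector(s, detectors) for s in samples]


# Hypothetical stand-ins for trained detectors: threshold rules on toy features.
toy_detectors = {
    "DoS": [lambda s: s[0] > 100, lambda s: s[1] > 0.9],
    "U2R": [lambda s: s[2] >= 1],
}
toy_samples = [(500.0, 0.95, 0.0), (3.0, 0.10, 1.0)]

median_data = build_median_data(toy_samples, toy_detectors)
print(median_data)  # [[1, 1, 0], [0, 0, 1]]
```

In a full system, each threshold rule would be replaced by a trained decision tree, KNN, naïve Bayesian or neural network model, and the rows of `median_data` would form the training set of the final classifier.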

Efficiency Evaluation
The efficiency of the proposed method is evaluated in four parts: the effect of sample selection by means of clustering, the contribution of each subsystem, the use of other classifiers as the final classifier, and a comparison with previous methods.

Efficiency evaluation and the effect of sample selection by means of clustering
Table 1 compares random selection, selection at intervals based on distance to the class mean, K-means clustering and clustering based on a self-organizing network, separately for each class: DoS, port scan, U2R, R2L and normal activity. All figures are based on training a binary decision tree classifier. From Table 1, we can conclude that for the normal and DoS classes the best result comes from sample selection at intervals based on distance to the class mean. Likewise, for the port scan class the best result comes from sample selection based on self-organizing map clustering and, finally, for the R2L and U2R attack classes the best result comes from sample selection based on K-means clustering. Overall, using different sample selection methods adds diversity to the training data, and this is expected to improve detection in the proposed system.
We next examine the effect of sample selection on the final results. The evaluation first performs training and classification using randomly selected instances, which is the common practice in the literature; then, in the other modes, the final results are obtained using the various sample selection methods of the proposed method. The results for these modes are shown in Table 2.
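One of the selection strategies compared above, selection at intervals based on distance to the class mean, might be sketched as below. The source does not give the exact procedure, so this sketch assumes one plausible reading: the samples of a class are ranked by Euclidean distance to the class mean and then picked at regular intervals along that ranking. The function names and the toy points are hypothetical.

```python
import math
from typing import List, Sequence

Point = Sequence[float]


def class_mean(samples: Sequence[Point]) -> List[float]:
    """Component-wise mean of all samples in one class."""
    n = len(samples)
    dims = len(samples[0])
    return [sum(s[d] for s in samples) / n for d in range(dims)]


def euclidean(a: Point, b: Point) -> float:
    """Euclidean distance between two points of equal dimension."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def select_by_distance_intervals(samples: Sequence[Point],
                                 n_select: int) -> List[Point]:
    """Pick n_select samples at regular intervals of the distance-to-mean ranking."""
    if n_select <= 0 or not samples:
        return []
    if n_select >= len(samples):
        return list(samples)
    mean = class_mean(samples)
    ranked = sorted(samples, key=lambda s: euclidean(s, mean))
    step = len(ranked) / n_select  # interval width along the ranking
    return [ranked[int(i * step)] for i in range(n_select)]


# Toy one-dimensional class; the mean is 3.2.
toy_class = [(0.0,), (1.0,), (2.0,), (3.0,), (10.0,)]
selected = select_by_distance_intervals(toy_class, 2)
print(selected)  # [(3.0,), (1.0,)]
```

Ranking before sampling keeps both near-mean and more distant instances in the selection, which is one way to obtain the diversity in training data that the comparison in Table 1 aims for.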

Efficiency Evaluation of Proposed Method with the Subsystems
At a general level, the evaluation criteria can be expressed in terms of two classes, normal activity and attack. The different stages of the proposed method and their equivalent titles are shown in Table 4. As Figure 7 shows, executing the proposed method with the 11-fold method produces better results. According to Figure 8, using the 11-fold method with median values makes little difference in efficiency. In fact, we can conclude that the training and test data have a similar structure and do not differ much topologically. Figure 9 shows that using the median classifier results has moderately increased the detection precision in the port scan class according to F-value. The same trend can be seen in the 11-fold method using the median values. According to Figure 10, using the results of basic classification as median data has greatly improved R2L attack detection in the 11-fold method. The U2R attack class has far fewer instances than the other classes, so the chance of training learning machines on these instances is very low. Median results can be useful in this situation, and the results in Figure 11 confirm this claim.
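The fold-based evaluation mentioned above can be illustrated with a short sketch. It assumes that the "11-fold method" refers to k-fold cross-validation with k = 11, which the source does not state explicitly; the function name, the seed and the sample count are hypothetical. The sketch only produces index splits, so that every instance is used for testing exactly once.

```python
import random
from typing import List, Tuple


def k_fold_indices(n_samples: int, k: int,
                   seed: int = 0) -> List[Tuple[List[int], List[int]]]:
    """Return k (train_indices, test_indices) pairs covering all samples."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and the number of samples")
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # reproducible shuffle
    folds = [indices[i::k] for i in range(k)]  # k nearly equal folds
    splits = []
    for i in range(k):
        test = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        splits.append((train, test))
    return splits


# 22 toy instances split into 11 folds of 2 test instances each.
splits = k_fold_indices(22, 11)
for train, test in splits[:2]:
    print(len(train), len(test))  # 20 2
```

In an evaluation of this kind, each classifier would be trained on the `train` indices and scored on the `test` indices of every split, with the scores then averaged over the folds.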

Comparison of other classifiers as the final classifier of the proposed method
In Table 5, we examine the results obtained when other classifiers replace the AdaBoost classifier based on decision tree in the proposed algorithm. According to Table 5, all classifiers perform almost the same on the normal and DoS classes, with no significant difference. In the port scan class, the best performance is achieved by the AdaBoost classifier based on decision tree, followed by KNN (K-nearest neighbors). In the R2L and U2R classes, the best performance is also achieved by AdaBoost based on decision tree.

Comparison of the proposed method with 3-stage clustering method and distance summation
The distance sum-based hybrid method for intrusion detection (13) was proposed in 2014. Table 6 compares the results of this method with those of the proposed hybrid approach in terms of the evaluation criteria, and Figure 12 plots the F-value evaluation.
Table 6. Comparison of the proposed method results with the 3-stage clustering and distance summation method
Figure 12. Comparison of F-value evaluation of the proposed method with 3-stage clustering and distance summation method
As Table 6 and Figure 12 show, the proposed method has improved the detection precision for port scan, R2L and U2R attacks according to F-value. The increase in precision is most evident for the port scan and U2R attacks.
For the DoS attack class, there is no significant difference in F-value between the proposed hybrid method and the 3-stage clustering and distance summation method.
In detecting normal activity, the 3-stage method based on distance summation achieves a better F-value than the proposed method. It is worth noting, however, that a better F-value does not necessarily mean better overall performance in this class, since the proposed method has a false alarm value of 1.115 compared with 1.1159 for the previous method in the normal activity class. The false alarm rate criterion shows that, at the cost of a 0.5% lower F-value, the proposed method has improved the false alarm rate of the intrusion detection system.
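The trade-off between F-value and false alarm rate discussed above can be made concrete with a small sketch. The source does not define its criteria, so the standard definitions are assumed here: F-value as the harmonic mean of precision and recall, and false alarm rate as the share of negative instances wrongly flagged as positive. The confusion-matrix counts are hypothetical and are not taken from the paper's tables.

```python
def precision(tp: int, fp: int) -> float:
    """Fraction of flagged instances that are truly positive."""
    return tp / (tp + fp) if tp + fp else 0.0


def recall(tp: int, fn: int) -> float:
    """Fraction of truly positive instances that are flagged."""
    return tp / (tp + fn) if tp + fn else 0.0


def f_value(tp: int, fp: int, fn: int) -> float:
    """Harmonic mean of precision and recall."""
    p, r = precision(tp, fp), recall(tp, fn)
    return 2 * p * r / (p + r) if p + r else 0.0


def false_alarm_rate(fp: int, tn: int) -> float:
    """Fraction of negative instances that are wrongly flagged."""
    return fp / (fp + tn) if fp + tn else 0.0


# Hypothetical counts for one class treated as "positive".
tp, fp, fn, tn = 90, 10, 30, 870

print(round(f_value(tp, fp, fn), 4))        # 0.8182
print(round(false_alarm_rate(fp, tn), 4))   # 0.0114
```

Because the F-value ignores true negatives while the false alarm rate depends on them, two methods can rank differently under the two criteria, which is the situation described for the normal activity class.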

Conclusion
In this study, the results of evaluating the proposed intrusion detection method were presented in four parts. First, the effect of sample selection via different methods was shown. The results showed that selecting samples with a variety of methods leads to improved results. Next, the effects of median classification were reviewed. According to the evaluation diagrams presented, using the results of the median classifiers has a significant effect on the detection precision rate. Next, the results of using other machine learning methods, such as decision trees, KNN and neural networks, as the final classifier were evaluated. It was shown that AdaBoost based on decision tree, used as the final classifier, performed better than the other methods in the comparison. Then, the proposed method was compared with the 3-stage clustering and distance summation method.
Table 7. Comparison of F-value evaluation of the proposed method and the three-stage clustering and distance summation based on 5 classes.
Table 7 compares the F-measure evaluation of the proposed method with that of the three-stage clustering and distance summation method. In addition, in the evaluation of the proposed hybrid method in binary classification, results show that the proposed method reached a detection precision of 1.95655 in F-value evaluation, which is the harmonic mean of the recall and precision criteria, with a false alarm rate of 1.115.
Table 8. Comparison of the proposed method with three previous methods based on normal and attack classifiers.
Table 8 reviews the evaluation results of the proposed method's efficiency compared with the three-stage clustering and distance summation method in binary classification.