Feature Selection Strategy for Network Intrusion Detection System (NIDS) Using Meerkat Clan Algorithm

The task of network security is to keep services available at all times by countering hacker attacks. One of the available mechanisms is the Intrusion Detection System (IDS), which is used to detect and classify abnormal actions. The IDS should therefore always be up to date with the latest hacker attack signatures to keep services confidential, safe, and available. Besides learning new attacks, IDS speed is a very important issue. This paper proposes a modified feature selection strategy based on the Meerkat Clan Algorithm (MCA), an important swarm intelligence algorithm. MCA maintains good solution diversity through its neighbor-generation behavior and has been used to solve several problems. The proposed strategy uses mutual information to increase performance and decrease the consumed time. Two Network Intrusion Detection System (NIDS) datasets (NSL-KDD and UNSW-NB15) have been used to verify the performance of the proposed algorithm. The experimental findings indicate that, compared to other approaches, the proposed algorithm produces good results in minimal time.

Keywords—meerkat clan algorithm, feature selection, NIDS, NSL-KDD, UNSW-NB15


Introduction
Computer network attacks are malicious actions through which information and services within computer networks are damaged, denied, degraded, or destroyed. In the field of computing, multiple attacks and security breaches now add to the increasing risk of unintentional downtime. Network Intrusion Detection (NID) is a mechanism that attempts to detect unauthorized access to a computer network by analyzing network traffic for signs of malicious activity. Several fields of study exist within this large area. An Intrusion Detection System (IDS) is a key element in network defense these days, as it provides complete network coverage: it detects both successful and failed intrusion attempts. The IDS aims to report all irregular device activity and to flag all anomalies. It offers real-time reactions to intrusion events by closely studying activities and signaling detected intrusions [1][2][3]. Because of the huge data volumes with large numbers of features, constructing a suitable machine learning model is demanding. Two approaches are used for dimensionality reduction: feature extraction, which builds a new feature space of low dimensionality, and feature selection, which removes irrelevant and redundant features from the original feature set. An exhaustive search for the optimal feature subset is not possible in rational time and might reduce the classifier's performance [2]. Thus, many binary meta-heuristic algorithms are used to approximate the optimal solution by removing irrelevant features within a suitable computational time. Meta-heuristic algorithms are nature-based algorithms that are more appealing than conventional approaches for solving optimization problems [4][5][6]. They function without derivatives and are therefore suitable for high-dimensional problems.
Any population-based meta-heuristic algorithm alternates between two steps. In exploration (diversification), the algorithm explores the entire search space, intending to find the regions that may contain the global optima. In exploitation (intensification), the algorithm searches the neighborhood of each solution found in the exploration stage [7][8][9][10][11]. Feature selection methods choose attributes from the original feature space based on strategies such as information gain, correlation, and decision tables. Hall and Smith [3] proposed that a subset of attributes is relevant if these attributes are highly correlated with the class and not correlated with each other in terms of mutual information. Feature selection offers many advantages, some of which are listed below [12][13]:
• It diminishes feature dimensionality and helps improve the performance of the algorithm.
• It discards unnecessary, extraneous, or noisy data.
• It improves data quality, which helps improve the performance of the learning technique.
• It enhances the precision of the output model.
• It assists in understanding the data and acquiring knowledge about the process that created it.
This paper compares machine learning classification methods applied to the NSL-KDD and UNSW-NB15 datasets, applies several feature selection algorithms to decrease the dimensionality of the datasets, and then runs the same classification methods to compare the results of the different feature selection methods.
In this paper, network attacks are identified with the aid of Random Forest and a Random Forest modified via the Meerkat Clan Algorithm on the NSL-KDD and UNSW-NB15 datasets. Searching the feature space more intensively is sometimes needed to obtain the best features.

Related work
In [14], a feature selection technique based on the Modified Artificial Immune System (MAIS) was suggested. The proposed algorithm takes advantage of the Artificial Immune System (AIS) to improve efficiency and randomization. The experimental findings, based on the NSL-KDD dataset, revealed improved accuracy compared to other feature selection algorithms (best-first search, correlation, and information gain). In [15], the NSL-KDD dataset was used to characterize network attacks using five essential classification methods and three feature selection strategies. These techniques include the J48 decision tree, support vector machine, decision table, and Bayesian network. Several studies used NSL-KDD for training and testing in general attack (normal vs. anomaly) scenarios covering four attack classes (U2R, R2L, Probe, DoS). The work in [16] gives an insight into existing Intrusion Detection Systems (IDS) along with their basic principles, and discusses how data mining, with its core feature of knowledge discovery, can help create a data-mining-based IDS. The resulting system may demonstrate more robust behavior than traditional IDS and achieve higher accuracy on unique attack types. In [17], various classification algorithms (J48, SVM, and Naïve Bayes) were applied and analyzed using the NSL-KDD dataset. These algorithms are used to find anomalies in network packets. Furthermore, the NSL-KDD dataset is used to deduce, from the commonly used network protocol stack, which protocol connections an intruder's attack exploits to produce irregular network traffic. In [18], Moustafa and Slay released the large UNSW-NB15 dataset, which contains features not included in the KDD'99 dataset. Only a few features are shared by the UNSW-NB15 and KDD'99 datasets, making comparison difficult.
That research examines the features used in the UNSW-NB15 dataset to reduce the number of features (the curse of dimensionality) and proposes a subset of features that are more relevant in detecting network traffic intrusions. In addition, the analyses can be compared to the KDD'99 dataset to see where the correlations and differences lie.

Feature selection methods via metaheuristics
Feature selection is the process of selecting a subset of features from all available features based on certain criteria, where the criterion is used to increase classification performance. Feature selection approaches can be divided into two categories: search-based techniques, where the search space is decomposed into four classes ("exhaustive", "random", "heuristic", and "meta-heuristic"), and strategy-based techniques, which are decomposed into filter and wrapper approaches [19][20][21][22][23].
In general, feature selection methods fall into three basic classes: wrapper, filter, and embedded methods. Wrapper methods iteratively use a learning algorithm to evaluate the quality of selected feature subsets via classification accuracy. They are known to be more accurate, but computationally more expensive. Filter methods, in contrast, are independent of any classification algorithm. The characteristics of the dataset are used to measure the relevance between a feature and the target label using measures such as distance and consistency. Filter-based feature selection is performed in one pass, so it scales easily to high dimensions. In embedded methods, selection is built into the learning algorithm, with no iterative evaluation of the classification accuracy of feature subsets as in wrapper approaches.
During the training phase, the feature coefficients are set by minimizing the fitting error. The selected features are then derived from the feature coefficients. This makes embedded methods suitable for high-dimensional feature selection domains [5][19][20][24][25][26][27].
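As a minimal illustration of the filter idea described above, the sketch below scores discrete features by their mutual information with the class label in plain Python. The function and the toy data are invented for this example and are not taken from the paper.

```python
import math
from collections import Counter

def mutual_information(feature, labels):
    """Estimate I(X;Y) in bits between two discrete sequences."""
    n = len(feature)
    px = Counter(feature)
    py = Counter(labels)
    pxy = Counter(zip(feature, labels))
    mi = 0.0
    for (x, y), c in pxy.items():
        p_xy = c / n
        mi += p_xy * math.log2(p_xy / ((px[x] / n) * (py[y] / n)))
    return mi

# Toy traffic-like data: feature A tracks the label, feature B is noise.
labels    = [0, 0, 0, 0, 1, 1, 1, 1]
feature_a = [0, 0, 0, 0, 1, 1, 1, 1]   # perfectly informative
feature_b = [0, 1, 0, 1, 0, 1, 0, 1]   # independent of the label

scores = {"A": mutual_information(feature_a, labels),
          "B": mutual_information(feature_b, labels)}
print(scores)  # A scores 1 bit, B scores 0 bits
```

A filter method would simply keep the highest-scoring features; a wrapper method would instead retrain a classifier for each candidate subset, which is why it costs more.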

Meerkat clan algorithm
Careful observation of the behavior of certain living things can show how their natural behavior can be turned into algorithms. This is why nature-inspired algorithms are the new meta-heuristics discussed in this work. These methods are meta-heuristics for global optimization, built mainly by selecting the best structures and by randomization. Guiding the algorithm too strongly toward the current optimum (exploitation) risks a lack of variety and traps the algorithm in local optima; a sound balance between exploitation and exploration can lead to the global optimum. Meerkats are social animals living in colonies of 5 to 30 individuals. As sociable beings, they share sentry and parenting duties. Each mob has a male and a female alpha leader. Each mob has its own territory, from which it sometimes moves when no food is found or when a tougher mob forces it out. When the latter occurs, the weaker mob retreats or waits until it becomes stronger and retrieves the lost burrow. Every mob also has what is called a 'watchman' (sentry), an individual who guards the mob, detects risks, and warns the rest if there is danger. The watchman observes from the ground, a tree, or the bushes, watching both the burrow system and the other members of the mob as they forage for food. When a risk is observed, the watchman gives a loud bark, and the mob bolts quickly into its hiding holes [28].
Based on this description of meerkat behavior, the general steps of the MCA, which may be modified depending on the problem encoding, are as follows [29][30][31][32]:
a. Initialization: create a random clan of individuals and set the parameters: clan size, foraging size, care size, and the worst foraging (Fr) and caring (Cr) rates.
b. Compute the fitness of the clan.
c. Choose the best individual as the 'sentry'.
d. Divide the clan into two groups (foraging and care).
e. Generate neighbors for the foraging group.
f. Select the worst individuals in the foraging group and swap them with the best individuals in the care group.
g. Drop the worst individuals in the care group and randomly construct new ones.
h. Replace the sentry with the best individual, if better.
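The steps above can be sketched on a toy one-dimensional minimization problem. This is only an illustrative interpretation of steps a-h; the parameter names (clan_size, foraging_size, Fr, Cr) echo the text, but the Gaussian neighbor move and all numeric settings are assumptions, not the authors' implementation.

```python
import random

def mca_minimize(fitness, clan_size=20, foraging_size=12, iters=200,
                 Fr=3, Cr=3, lo=-10.0, hi=10.0, seed=1):
    """Toy sketch of MCA steps a-h for minimizing a 1-D function."""
    rnd = random.Random(seed)
    clan = [rnd.uniform(lo, hi) for _ in range(clan_size)]      # a. init clan
    clan.sort(key=fitness)
    sentry = clan[0]                                            # b, c. best = sentry
    for _ in range(iters):
        foraging = clan[:foraging_size]                         # d. split the clan
        care = clan[foraging_size:]
        # e. generate neighbors of the foraging group (small random moves)
        foraging = [min(max(m + rnd.gauss(0, 0.5), lo), hi) for m in foraging]
        foraging.sort(key=fitness)
        care.sort(key=fitness)
        # f. worst Fr foragers swap places with the best Fr care members
        foraging[-Fr:], care[:Fr] = care[:Fr], foraging[-Fr:]
        # g. worst Cr of the care group are replaced by random individuals
        care.sort(key=fitness)
        care[-Cr:] = [rnd.uniform(lo, hi) for _ in range(Cr)]
        clan = sorted(foraging + care, key=fitness)
        if fitness(clan[0]) < fitness(sentry):                  # h. update sentry
            sentry = clan[0]
    return sentry

best = mca_minimize(lambda x: x * x)
print(best)  # close to 0
```

Step g is what gives the MCA its diversity: fresh random individuals keep entering the care group, while step f funnels the best of them into the foraging group.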

Proposal approach
There are several randomized feature selection methods; some depend on pure randomization, while others are inspired by nature and swarm intelligence algorithms.
This paper presents a wrapper-type feature selection algorithm based on MCA. MCA has good diversification and exploration, so it produces a wide space of diversified solutions, owing to its various stages and its neighbor-generation strategy. Initially, the proposed algorithm drops the worst features based on Mutual Information (MI), then generates random solutions (feature subsets), evaluates these solutions with the classification method, selects the best solution as the best feature subset, and divides the rest into two groups, working and spare. The feature subsets are evaluated by the accuracy of the classification method, and the subset with the best accuracy is kept as the best features. The main loop of MCA applies the classical MCA steps to the working and spare groups. The neighbor-generation functions play a big role in diversifying the solutions (feature subsets): the better ones stay in the working group, while the worst Fr solutions of the working group migrate to the spare group and are replaced by the best ones from the spare group. The worst Cr solutions of the spare group are replaced by random ones. Through these steps, the population of feature subsets keeps improving. In each iteration, the algorithm drops the worst Cr solutions from the spare group and generates random replacements, evaluates the feature subsets in the working and spare groups using the classification method, and, if any subset is better than Best_Features, sets Best_Features to it; when the loop ends, the best solutions are output.
The MI step decreases the consumed time of the MIMCA approach because the worst features have already been dropped, allowing the search to focus on the genuinely important features.

Experimental results

Datasets and MIMCA parameters
Two standard NID datasets have been selected to verify the performance of the proposed two approaches, NSL-KDD and UNSW-NB15.

DoS
Is an attack class that exhausts the victim's resources, making the victim unable to process requests; it shuts down the intended device or floods it with requests so that authorized users cannot reach its services. Examples are ping of death and SYN flood.

Probe
Is an attempt to collect data on a network and detect system vulnerabilities, which can then be exploited to intrude into the system. For example, port scanning.

U2R
Is a type of attack that takes advantage of an authorized user's account and tries to reach the root of the system through some vulnerability. An example is the buffer overflow attack.

R2L
Occurs when an attacker who can send a stream of bits to a device on a network, but who does not have an account on that device, exploits some vulnerability to obtain local access to it. An example is password guessing.

Fuzzers
Trying to bring a program or a network to a halt by feeding it randomly generated data.

Analysis 2,677
It includes a variety of port search, malware, and HTML file penetration attacks.

Backdoors 2,329
A method of gaining unauthorized access to a device or its data by circumventing a system authentication process invisibly.

DoS 16,353
A deliberate effort to prevent users from accessing a server or network resource, normally by momentarily interrupting or halting the services of a host connecting to the Internet.

Exploits 44,525
The attacker is aware of a security flaw in an operating system or piece of software and takes advantage of it by exploiting the flaw.

Generic 215,481
A technique that works against all block ciphers (with a given block and key size), without regard for the structure of the block cipher.

Reconnaissance 13,987
This category contains all attacks that simulate intelligence gathering.

Shellcode 1,511
A short piece of code used as a payload to exploit a software flaw.

Worms 174
To propagate to other machines, the worm replicates itself. It often spreads through a computer network, relying on security bugs on the target computer to obtain access.

The experimental work is carried out using ORANGE [15], an open-source data mining tool, and the Python programming language.

Experiments based on the NSL-KDD dataset
The NSL-KDD data collection is split into two parts: training and testing. Models based on MIMCA are trained first, then evaluated on different partitions of the dataset. Table 6 shows the accuracy of the MIMCA model for various versions.
In addition, MIMCA was used to find the features selected for each category in NSL-KDD; Table 8 illustrates the best features for each category in the NSL-KDD dataset.

Experiments based on the UNSW-NB15 dataset
In the UNSW-NB15 dataset, the experiments did not include normal cases; they include only attacks. The MIMCA-based feature selection method found the best features for all categories of the UNSW-NB15 dataset. The accuracy of the MIMCA model with different models is shown in Table 9. In our experiments, four classification methods have been used: DT (Decision Tree), BP-NN (Back-Propagation Neural Network), NB (Naïve Bayes), and the Apriori algorithm. For feature selection, we used the proposed MIMCA and MI (Mutual Information). MIMCA obtained the same result as MI with the DT and Apriori algorithms, but in less time; Table 10 illustrates the time consumed by the experiments in Table 8. The final 17 selected features from UNSW-NB15 using MIMCA are {8, 9, 11, 12, 32, 10, 13, 28, 4, 42, 36, 7, 33, 29, 3, 41, 18}. These features apply to all categories of the UNSW-NB15 dataset.
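In practice, applying such a selected subset amounts to indexing out the chosen columns of each record before classification. The helper below is hypothetical (the indices are quoted from above, but treating them as 1-based is an assumption), and the dummy record is invented for illustration.

```python
# The 17 feature indices reported above, assumed 1-based.
SELECTED = sorted({8, 9, 11, 12, 32, 10, 13, 28, 4, 42, 36, 7, 33, 29, 3, 41, 18})

def project(row, selected=SELECTED):
    """Keep only the selected features (converting 1-based to 0-based)."""
    return [row[i - 1] for i in selected]

row = list(range(1, 43))   # a dummy record with 42 feature values
reduced = project(row)
print(len(reduced))        # 17 features remain
```

Each classifier in the comparison would then be trained on these reduced records rather than the full 42-feature rows.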
In addition, MIMCA was used to find the features selected for each category in UNSW-NB15; Table 11 illustrates the best features for each category in the UNSW-NB15 dataset. Essentially, the extracted features did not differ from those found by several feature selection methods and classification techniques, but the consumed time is less than that of several methods, for both the NSL-KDD and UNSW-NB15 datasets. This indicates that MIMCA is an efficient feature selection technique requiring less time. Of course, some methods give better results than MIMCA, because the outcome depends on the power of the classification methods.

Conclusion
MIMCA is a feature selection technique proposed and evaluated in this paper. It is based on the diversification of candidate solutions in the MCA, which improves these solutions during the MCA stages. The experiments on two important NID datasets (NSL-KDD and UNSW-NB15) verified that MIMCA is a good feature selection technique. The selected features are the same as those of most good standard methods, but obtained in the least consumed time. In the future, we aim to apply MCA to the generation of classification rules, as in association rule mining.