An Intrusion Detection Algorithm based on D-S Theory and Rough Set

Abstract: An intrusion detection system is a network security system that monitors network transmission in real time and raises an alarm or takes active response measures when it discovers suspicious transmission. However, intrusion detection systems suffer from problems such as false detections, missed intrusions, and poor real-time performance. To improve the performance of intrusion detection systems, this paper proposes an intrusion detection algorithm based on D-S theory and rough set. The algorithm uses the attribute reduction algorithm of rough set theory to eliminate redundant attributes and form the simplest attribute set. This overcomes the reliance of traditional D-S theory on expert knowledge to provide evidence and makes the evidence bodies mutually independent. It thereby improves the efficiency of evidence synthesis, shortens the synthesis time, and reduces conflict during evidence synthesis. On this basis, the paper builds an intrusion detection model based on D-S theory and rough set, and the experimental results demonstrate that the model has a higher detection rate and a lower false detection rate.


I. INTRODUCTION
With the rapid development and wide application of computer network technology, network security problems have become more and more prominent, and establishing an effective intrusion detection system to protect the security of computer information systems has become increasingly important. The intrusion detection system (IDS) is a newer security assurance technology that follows traditional security measures such as firewalls and data encryption. An IDS can identify and respond to improper use of computer and network resources [1][2]. At present, the methods used to build intrusion detection models are mainly neural networks, SVM, immune networks, etc. The decision factors of intrusion behavior are often very complicated, yet many intrusion detection systems only match and filter a few features, which directly leads to a high error rate. This paper uses the rough set attribute reduction algorithm to obtain evidence and uses the decision rule strength to obtain the basic probability assignment of the evidence. The method thus reduces the subjectivity of the basic probability assignment and the conflict in evidence synthesis. This paper applies the method to intrusion detection to improve detection accuracy.
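Let $\Theta$ be the recognition framework. A function $m: 2^{\Theta} \to [0, 1]$ is called a basic probability assignment function on $\Theta$ if it satisfies
$$m(\emptyset) = 0, \qquad \sum_{A \subseteq \Theta} m(A) = 1 \tag{1}$$
In equation (1), $m(A)$ is called the basic probability assignment of $A$. On the basis of the basic probability assignment function, the trust function can be defined as [6][7][8][9]:
$$Bel(A) = \sum_{B \subseteq A} m(B) \tag{2}$$
In equation (2), $Bel(A)$ is the total trust committed to $A$. Two evidences on the recognition framework $\Theta$ can be efficiently synthesized by the synthesis rules in D-S theory [10]. Two or more evidences can be synthesized two by two by the rules. Let $m_1$ and $m_2$ be the basic probability assignments of two evidences on $\Theta$. Their synthesis is
$$m(A) = \frac{1}{1 - K} \sum_{B \cap C = A} m_1(B)\, m_2(C), \quad A \neq \emptyset, \qquad K = \sum_{B \cap C = \emptyset} m_1(B)\, m_2(C)$$
where $K$ measures the conflict between the two evidences.
The limitations of D-S theory are as follows: the synthesis rules require the evidence bodies to be mutually independent, are sensitive to changes in the basic probability assignment, and require basic probability assignment values that are reasonably given rather than arbitrarily assigned.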
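The synthesis rule of D-S theory (Dempster's rule of combination) and the trust function can be illustrated with a minimal Python sketch. The frame `THETA`, the two mass assignments `m1` and `m2`, and the function names are hypothetical and chosen only for illustration; they are not values from this paper.

```python
from itertools import product


def combine(m1, m2):
    """Combine two basic probability assignments with Dempster's rule.

    Each assignment maps a frozenset of hypotheses (a focal element)
    to its mass; the masses of one assignment sum to 1.
    """
    combined = {}
    conflict = 0.0
    for (b, mass_b), (c, mass_c) in product(m1.items(), m2.items()):
        inter = b & c
        if inter:
            combined[inter] = combined.get(inter, 0.0) + mass_b * mass_c
        else:
            conflict += mass_b * mass_c  # mass that would fall on the empty set
    if conflict >= 1.0:
        raise ValueError("total conflict: the evidences cannot be combined")
    # Normalize by 1 - K so that the combined masses sum to 1 again.
    return {a: v / (1.0 - conflict) for a, v in combined.items()}


def belief(m, a):
    """Trust (belief) function: total mass of all focal elements inside a."""
    return sum(v for b, v in m.items() if b <= a)


# Hypothetical frame and masses, for illustration only.
THETA = frozenset({"Normal", "DoS", "Probe"})
m1 = {frozenset({"DoS"}): 0.6, frozenset({"Probe"}): 0.1, THETA: 0.3}
m2 = {frozenset({"DoS"}): 0.7, frozenset({"Normal"}): 0.1, THETA: 0.2}
m12 = combine(m1, m2)
```

In this toy case the conflict is 0.14, and the combined mass on {DoS} is larger than in either input while the mass left on the whole frame shrinks, which is the concentration effect that evidence synthesis is meant to produce.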

III. ROUGH SET THEORY
The main idea of rough set theory is as follows [11]: while keeping the classification ability of the knowledge base unchanged, the theory deletes irrelevant or unimportant attributes. Through knowledge reduction, unnecessary attributes are removed and the knowledge representation is simplified without losing essential information. Classification rules can then be derived to improve the decision precision.
Definition 1: Let $Q$ be a non-empty family of equivalence relations. The intersection of all equivalence relations in $Q$ is also an equivalence relation; it is called the indistinguishable (indiscernibility) relation in $Q$ and is denoted by $ind(Q)$.
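As an illustration of the indistinguishable relation, the following minimal Python sketch groups the objects of a small table into the equivalence classes of $ind(Q)$: two objects fall into the same class exactly when they agree on every attribute in $Q$. The table `TABLE`, its attribute names, and the function `partition` are hypothetical.

```python
from collections import defaultdict


def partition(table, attrs):
    """Return U/ind(attrs): objects grouped by their values on attrs."""
    blocks = defaultdict(set)
    for obj, row in table.items():
        key = tuple(row[a] for a in attrs)  # objects with equal keys are indistinguishable
        blocks[key].add(obj)
    return sorted(blocks.values(), key=min)


# Hypothetical toy table: object id -> attribute values.
TABLE = {
    1: {"a": 0, "b": 1},
    2: {"a": 0, "b": 1},
    3: {"a": 1, "b": 1},
    4: {"a": 1, "b": 0},
}
```

Adding attributes to $Q$ can only split classes further: `partition(TABLE, ["a"])` has two classes, while `partition(TABLE, ["a", "b"])` has three.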
Definition 2 [13]: In a knowledge expression system $S = (U, A, V, f)$, $U$ is a non-empty finite set of objects, $A$ is a non-empty finite set of attributes, called the attribute set, and $V$ is the value domain of the attributes. The attribute set can be divided into the condition attribute set $C$ and the decision attribute set $D$. $f: U \times A \to V$ is an information function which gives the attribute $a$ of any element a unique value in $V$. Such an information system is called a decision information system. Definition 3 [14]: For

IV. INTRUSION DETECTION SYSTEM BASED ON D-S THEORY AND ROUGH SET
This paper processes a large amount of network access data at the feature level with rough set theory, identifies the redundant attributes through the simplified knowledge expression system, and obtains the simplest combination of characteristic attributes to simplify the characteristic data. The final intrusion decision is then obtained at the decision level by D-S evidence synthesis. Rough set and D-S theory are combined as follows: 1. The decision attributes are obtained through rough set reduction to form evidence. 2. The basic probability assignment of the evidence is obtained from the attribute importance measure of rough set. The intrusion detection model based on rough set and D-S theory is shown in Figure 1. The model first preprocesses the collected data, chooses training samples, reduces the attributes in the decision tables, and produces reduced output rules to construct the rule base of the security system and the intrusion detector. The initial intrusion model needs to be gradually refined in subsequent studies to reach the best detection effect.
From the intrusion model we can see that the intrusion detection algorithm mainly involves the following problems:
1. Intrusion data discretization. The data analyzed by an IDS include network data and host data. The analysis of network packets is a key point in current intrusion detection research. Compared with host log data, network data are more complex and varied, which greatly increases the difficulty of detecting network attacks. To improve the detection effect, the large number of collected data points needs to be discretized by equal frequency division.
2. Attribute reduction. The attributes of the collected data sets are structured and reduced, and redundant intrusion attributes are removed.
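Equal frequency division can be sketched in Python as follows: cut points are chosen from the sorted values so that each interval receives roughly the same number of data points. The function names and the sample `VALUES` are hypothetical, and the way cut points are picked is one simple choice among several.

```python
def equal_frequency_cuts(values, bins):
    """Return bins - 1 cut points that put roughly equal counts in each bin."""
    ordered = sorted(values)
    n = len(ordered)
    # The i-th cut is the value found i/bins of the way through the sorted data.
    return [ordered[(i * n) // bins] for i in range(1, bins)]


def discretize(value, cuts):
    """Map a continuous value to the index of the interval it falls in."""
    return sum(value >= c for c in cuts)


# Hypothetical continuous attribute values, for illustration only.
VALUES = [0, 2, 3, 5, 8, 13, 21, 34, 55]
CUTS = equal_frequency_cuts(VALUES, 3)
CODES = [discretize(v, CUTS) for v in VALUES]
```

With these nine values and three bins, each interval receives three points. When many values are tied, as is common for network features that are mostly zero, some cut points coincide and the intervals become unequal, so a practical implementation would need to handle duplicates.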

A. Evidence Formation based on Attribute Reduction
Attribute reduction eliminates the redundant attributes in a decision table and retains the key attributes. The reduced decision table carries the same knowledge as the original decision table.
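This paper presents the following relative attribute reduction algorithm.
Input: a decision table with condition attribute set $B$ and decision attribute $C$.
Output: a relative reduction $R$.
Step 1: Initialize $R$ to the empty set $\emptyset$.
Step 2: Calculate the conditional entropy $H(C \mid B)$ of the decision attribute $C$ relative to the condition attribute set $B$.
Step 3: For each attribute $b \in B - R$, calculate the conditional entropy $H(C \mid R \cup \{b\})$, and put the attribute $b$ corresponding to the minimum conditional entropy into $R$.
Step 4: Calculate $H(C \mid R)$; if $H(C \mid R) = H(C \mid B)$, then go to Step 7.
Step 6: For each attribute $r$ in $R$ (not including $a$, the latest attribute added), calculate the difference $H(C \mid R - \{r\}) - H(C \mid R)$. If the difference is less than a given threshold $\varepsilon$, set $R = R - \{r\}$, then go to Step 4 and continue calculating.
Step 7: Output the reduction $R$.
The conditional information entropy is as follows:
$$H(C \mid B) = -\sum_{i} p(X_i) \sum_{j} p(Y_j \mid X_i) \log p(Y_j \mid X_i)$$
where $X_i$ and $Y_j$ are the equivalence classes of $U / ind(B)$ and $U / ind(C)$ respectively.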
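One possible reading of the reduction steps is sketched below in Python: attributes are added greedily by minimum conditional entropy until the selected set is as informative as the full condition set, and earlier attributes that have become redundant are then pruned. This is a sketch under assumptions, not the exact procedure of this paper: the stopping and pruning tests, the base-2 logarithm, the function names, and the toy table `ROWS` are all hypothetical.

```python
from collections import Counter
from math import log2


def conditional_entropy(rows, cond, dec):
    """Shannon conditional entropy H(dec | cond); rows are dicts attribute -> value."""
    n = len(rows)
    joint = Counter((tuple(r[a] for a in cond), r[dec]) for r in rows)
    marginal = Counter(tuple(r[a] for a in cond) for r in rows)
    return -sum(
        (count / n) * log2(count / marginal[key])
        for (key, _), count in joint.items()
    )


def relative_reduct(rows, cond_attrs, dec, eps=1e-9):
    """Greedy forward selection, then pruning of redundant attributes."""
    target = conditional_entropy(rows, cond_attrs, dec)
    reduct = []
    # Add the attribute with minimum conditional entropy until the
    # selected set is as informative as the full condition set.
    while conditional_entropy(rows, reduct, dec) - target > eps:
        candidates = [a for a in cond_attrs if a not in reduct]
        best = min(
            candidates,
            key=lambda a: conditional_entropy(rows, reduct + [a], dec),
        )
        reduct.append(best)
    # Drop earlier attributes whose removal costs less than eps;
    # the most recently added attribute is never dropped.
    for attr in list(reduct[:-1]):
        rest = [a for a in reduct if a != attr]
        if conditional_entropy(rows, rest, dec) - target <= eps:
            reduct = rest
    return reduct


# Hypothetical decision table: C is fully determined by E1 and E2.
_RAW = [
    (0, 0, 0, "Normal"), (0, 1, 0, "DoS"), (1, 0, 0, "Probe"), (1, 1, 1, "DoS"),
    (0, 0, 1, "Normal"), (0, 1, 1, "DoS"), (1, 0, 1, "Probe"), (1, 1, 0, "DoS"),
]
ROWS = [dict(zip(("E1", "E2", "E3", "C"), r)) for r in _RAW]
REDUCT = relative_reduct(ROWS, ["E1", "E2", "E3"], "C")
```

On this toy table the attribute `E3` carries no information about `C`, so the sketch keeps only `E1` and `E2`.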

B. The Structure of Basic Probability Assignment
When B G ! , and many condition attributes are included in G , the strength of decision rule (6): When B e ! , the strength of decision rule is shown in (7): .The strength of decision expansion rule is shown in (8): The theorem determining the basic probability assignment of evidence is as follows: , k A j = , the basic probability assignment of proposition Q is shown in (9):

C. Algorithm Application
To demonstrate how the algorithm is applied, this paper uses three condition attributes to judge intrusion. These three attributes are expressed as $E_1$, $E_2$ and $E_3$ respectively; $C$ is the decision attribute, and its value domain is the recognition framework $\Theta$ of intrusion detection. The collected historical intrusion data are preprocessed and the attributes are discretized to form the decision table of intrusion detection, which is shown in Table I.
Given the threshold, condition attribute $E_3$ is found to be redundant on the basis of the entropy. The basic probability assignments of $E_1$ and $E_2$ can be calculated by equations (5) to (9). The basic probability assignments of the evidence and the synthesized basic probability assignment are shown in Table II.
Here $\varepsilon_1$ and $\varepsilon_2$ are the preset thresholds and $E_1$ is the judgment result. When the values of $\varepsilon_1$ and $\varepsilon_2$ are both 0.1, the above reasoning decision rule determines that the intrusion type is DoS. Determining the specific type of DoS, however, needs further analysis. The synthesized results show that the degree of uncertainty is gradually reduced and that the reasoning clearly concentrates on the result set {DoS}. This shows that synthesizing multiple evidences performs better than relying on a single evidence.

D. Experimental Results and Analysis
To prove the effectiveness of the algorithm, this paper uses the KDD99 data set for testing. Because the amount of data in KDD99 is too large, 10000 records selected from the data set are used as the experimental data. The selected data contain as many of the common attack methods as possible and ensure that each attack method has a certain quantity of data; they include normal records and 4 classes of attack records. The proportions of the attack classes in the attack records are as follows: DoS 94%, R2L 2%, U2R 2%, Probe 2%. The recognition framework for intrusion detection can therefore be established.
There are 32 continuous attributes among the 41 characteristic attributes of a TCP record (http://kdd.ics.uci.edu/databases/kddcup99/kddcup99.html). First, the Naive Scaler algorithm discretizes the continuous attributes, so that all the characteristic attributes are transformed into discrete attributes. Then the attribute reduction algorithm reduces the attributes of the selected 10% data set.
Because the reduction method has been implemented in the Rosetta software, we use Rosetta to reduce the attributes of the data set. After reduction, the attribute set of normal connection records has 26 items: {1, 2, 4, 6, 7, 8, 9, 10, 13, 15, 18, 20
To assess the performance of the intrusion detection method, two performance parameters are introduced: the detection rate and the false detection rate. They are defined as follows:
$$\text{detection rate} = \frac{H}{L} \times 100\%, \qquad \text{false detection rate} = \frac{O}{Z} \times 100\%$$
where $H$ is the number of intrusions correctly detected, $O$ is the number of records misjudged as intrusions, $L$ is the total number of intrusions and $Z$ is the total number of non-intrusion records. The detection rate and the false detection rate measure the performance of an intrusion detection system, which is always expected to achieve a high detection rate and a low false detection rate. The experimental results are shown in Table III. From Table III, we can see that the network intrusion detection algorithm based on rough set and D-S theory in this paper has a high detection rate and a low false detection rate. In addition to normal connections, the detection rates of DoS and Probe attacks are very high, and the results are better than the best detection results of KDD Cup 99: 97.3% (DoS) and 75.0% (Probe). Therefore, this algorithm satisfies the security detection requirements of an intrusion detection system well.
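The two performance parameters can be computed from labeled records as in the following minimal Python sketch. It assumes that an intrusion counts as correctly detected whenever it is flagged as any kind of intrusion, which is one possible reading of H; the function name, the label `"normal"`, and the sample labels are hypothetical.

```python
def detection_rates(actual, predicted, normal_label="normal"):
    """Return (detection rate, false detection rate) as percentages."""
    pairs = list(zip(actual, predicted))
    total_intrusions = sum(a != normal_label for a, _ in pairs)  # L
    total_normal = len(pairs) - total_intrusions  # Z
    # H: intrusions that were flagged as an intrusion of any type.
    detected = sum(
        a != normal_label and p != normal_label for a, p in pairs
    )
    # O: normal records that were misjudged as intrusions.
    misjudged = sum(
        a == normal_label and p != normal_label for a, p in pairs
    )
    return (
        100.0 * detected / total_intrusions,
        100.0 * misjudged / total_normal,
    )


# Hypothetical labels, for illustration only.
ACTUAL = ["normal", "normal", "normal", "normal", "dos", "dos", "dos", "probe"]
PREDICTED = ["normal", "dos", "normal", "normal", "dos", "dos", "normal", "probe"]
RATES = detection_rates(ACTUAL, PREDICTED)
```

Here three of the four intrusions are flagged and one of the four normal records is misjudged, giving a detection rate of 75% and a false detection rate of 25%. The sketch does not guard against a data set that contains no intrusions or no normal records.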
To further test the effectiveness of the algorithm, comparison experiments were carried out against other types of intrusion detection algorithms, such as data mining (DM), Support Vector Machine (SVM) and BP (back propagation) neural network; the experimental results are shown in Figure 2 and Figure 3. The comparison shows that the algorithm based on rough set and D-S theory is better than the other intrusion detection methods in both detection rate and false detection rate. Data mining applied to intrusion detection requires a large amount of data, and the methods based on Support Vector Machine and BP neural network require substantial training and a large amount of computation. As a result, the effectiveness and detection rate of the algorithms based on BP neural networks, support vector machines and data mining are significantly lower than those of the algorithm based on rough set and D-S theory. These results indicate that rough set and D-S theory are very effective when used in intrusion detection systems.
The intrusion detection algorithm based on rough set and D-S theory gives full play to the respective advantages of the two uncertainty reasoning theories. The experiment on the KDD99 data set shows that the hybrid model has a higher detection rate and a lower false detection rate. Future research will focus on optimizing the attribute reduction algorithm to improve the detection speed.