Predicting Attack Surface Effects on Attack Vectors in an Open Congested Network Transmission Session by Machine Learning

This paper examines the impact of a network attack on a congested transmission session. The research is motivated by the fact that the research community has concentrated on resolving congestion issues while neglecting to evaluate the security issues that arise in congested network environments. At any point in time, attackers can take advantage of congestion, exploit the attack surface, and inject attack vectors. To address this issue, a machine learning algorithm is trained to correlate attack vectors from the attack surface in a network congestion-signal environment with the value of decisions over time, in order to maximise the expected attack vectors from the attack surface. An experimental scenario was used in which the transmission rate overwhelms the transmission session, resulting in a standing queue. The experiment produced a dataset in which TCP transmissions under bursting traffic were captured. The data was acquired using a variety of experimental scenarios. Naïve Bayes and K-Nearest Neighbours prediction analyses demonstrate strong prediction performance. As a result, this study re-establishes the association between attack surface and attack vectors in network attack prediction.

Keywords—attack surface, attack vectors, congestion, transmission session, machine learning


Introduction
A network transmission session defines the flow of data in connection-oriented communications among the connecting nodes. The Transmission Control Protocol (TCP) is dependable in a network transmission session because it employs feedback mechanisms to ensure that all data is delivered to the intended recipient. Compared to other protocols, however, it does not account for the possibility of data delivery delays and may result in data being delivered slowly [1]. TCP nonetheless remains the most widely used mode of communication; transmission delay is its weakness. While TCP's primary characteristic is its ability to transport data, it also provides services intended to improve reliability by dealing with data segments lost during a transmission session as a result of congestion, caused most likely by a large number of segments competing for limited network resources [2]. This calls for the implementation of congestion control measures. These services include dealing with lost data, minimising errors, and managing a transmission session's failure. The fundamental challenges in the design of a transport protocol are mobility management, bandwidth estimation, packet loss estimation, and Quality of Service support, among other things. Congestion control is applied at the transport layer in order to maintain a steady sending rate during periods of high-volume transmission. The congestion control mechanism detects packet losses and, in response, reduces the congestion window by a percentage. The common drawback of being in a congested network environment is that it opens space for attack surfaces to be exploited, enabling a denial-of-service (DoS) or distributed denial-of-service (DDoS) attack [3].
Attack surfaces are the areas of a computer system that are targeted by a cybercriminal. They are made up of all the assets that are exposed to an attacker and that can be exploited [4]. Attack vectors are the specific methods or pathways that a cybercriminal employs in order to conduct malicious activities against a targeted system or network [5].
Apart from DoS or DDoS attacks, what other types of attacks are expected in a congestion situation? It has been shown that congestion can act as an implicit signal for decentralized pulsating attacks, which means that congestion itself is an attack surface [6]. This is a critical situation that necessitates assessment: because TCP is a connection-oriented protocol, it enables the control of data during transmission, which is necessary because an uncontrolled data flow can result in damage or loss. Attackers will redirect the transmission flow until the congestion issue is resolved, taking advantage of a situation in which the primary concern is restoring data flow. As a result, once the congestion issue is resolved, the traffic direction will change unexpectedly [7]. This indicates the establishment of an attack surface. While the TCP connection is negotiating between the sending and receiving network nodes, specifically about the size of the data being transmitted, attackers are busy attempting to redirect the connection. If they succeed, they will eventually inject the attack vectors. In a client-server architecture, the client informs the server of the maximum amount of data it is willing to accept from the server at one time. Similarly, the server must inform the client of the maximum amount of data it is willing to accept from the client at any given time, referred to as the server's receive window, which doubles as the client's send window [8]. This negotiation creates an opening for the attack surface.
Considering the nature of the uncertainty within a congested transmission session, and the fact that congestion is both an implicit signal for decentralised implementation attacks and a legitimate cause of such attacks in and of itself, this paper examines the impact of a network attack on a congested transmission session and considers how to deal with congestion issues associated with network attacks. To obtain training data from a network attack-surface environment and to correlate the network attack impact with decision-value changes over time, a machine learning algorithm was used.

Attacks associated with transmission on congested network
Apart from DoS or DDoS attacks, other attacks associated with congestion or inaccessibility of service are usually resolved by re-establishing paths to create more open paths for transmission. That is why some studies reveal that all attacks associated with congestion are a different form of DoS or DDoS [6]. As a result, it can be concluded that attacks associated with transmission on a congested network are link-related TCP traffic attacks. Pulsating attacks greatly reduce TCP traffic through continuous pulses at a network link [6]. The attack can take various forms, ranging from timeout-based pulsating attacks to synchronous attacks, asynchronous attacks, and traditional link-flooding attacks [9]. Decentralized implementations of pulsating attacks that would otherwise require a central coordinator can utilise congestion as an implicit signal for decentralisation. This creates intermittent flooding, which can reduce legitimate traffic and thus has the potential to be more harmful than simple brute-force flooding. For example, a series of carefully crafted periodic pulses on a network link can mislead TCP flows using the same link, causing legitimate flows to repeatedly enter the timeout state.
This study envisioned scenarios in which the attacks described above can be predicted (see Figure 1). The interaction of transmission congestion with the attack surface and attack vectors is conceptualized for modelling the prediction of attack surface effects on attack vectors in an open congested network transmission session. The justification for this claim lies in the vulnerability present in a TCP connection, characterized by an attacker exploiting the attack surface. When an attacker gains access to an internal network through a "hole" in the attack surface, the parties to the transmission session may be unaware of the intrusion and of how the attack works, let alone of what it does and does not allow the attacker to accomplish.
Although the vulnerability to attack may be a critical issue in a network transmission session that is experiencing congestion, it is not the most pressing issue that receives extensive effort to resolve [10]. The conceptualization in Figure 2 establishes that TCP transmission may continue as long as the connection establishment in a three-way handshake is maintained. A TCP split-handshake connection can be performed by an attacker in a situation where the TCP connection establishment is not monitored by TCP SYN checking [11]. That is to say, failure to implement TCP SYN checking will almost certainly result in an attack on the network. Thus, we have a special case in which an attack surface has been established [12]. The TCP SYN checking function, as a result, can protect the transmission from TCP state-based attacks [13].
In another situation, even if TCP SYN checking is implemented, it can still be ineffective, because the standard connection-establishment technique is well known and many security parameters already enforce defences around it. The unfortunate reality is that there are additional connection-establishment methods that are normally created solely with the intent of attacking a network infrastructure.
This method of connection establishment is termed the simultaneous-open handshake [11]. When faced with this situation, the act of establishing the simultaneous-open handshake itself creates an attack surface. A simultaneous-open transmission session is established when both endpoints send a SYN packet to each other at approximately the same time, as shown in Figure 1. Then, as a response, both sides send each other acknowledgement packets (ACKs). This slightly different variant of the TCP handshake occurs infrequently in the real world; it is a perfectly legitimate way to initiate a TCP connection, but it is most frequently used with the intent of committing an attack.
With the simultaneous-open handshake comes the TCP split-handshake attack, also known in some circles as a Sneak ACK attack [14]. The split-handshake is a dual track that combines elements of the normal three-way handshake and the simultaneous-open handshake into a single exchange. Essentially, a client sends a SYN packet to a server with the intent of completing a standard three-way handshake. Instead of completing the three-way handshake initiated by the client, a malicious attacker begins by responding as if it were initiating a simultaneous-open connection, and then initiates its own three-way handshake in the opposite direction of the transmission session. In essence, even though the connection was initiated by the client, the flow direction of the connection is reversed.
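The handshake variants above can be sketched as event sequences. The following is a minimal illustration (not the paper's tooling): the event representation, names, and detector heuristic are all assumptions made for exposition, flagging sessions where the responder answers a SYN with its own bare SYN and the initiator ends up completing the responder's handshake.

```python
# Hypothetical sketch: TCP handshake variants as (sender, flags) sequences,
# with a toy detector for the split-handshake pattern described above.

NORMAL = [("client", "SYN"), ("server", "SYN-ACK"), ("client", "ACK")]
SIMULTANEOUS = [("client", "SYN"), ("server", "SYN"),
                ("client", "ACK"), ("server", "ACK")]
# Split-handshake: the server answers the client's SYN with a bare SYN,
# then the client completes the server's handshake with SYN-ACK,
# reversing the logical direction of the connection.
SPLIT = [("client", "SYN"), ("server", "SYN"),
         ("client", "SYN-ACK"), ("server", "ACK")]

def is_split_handshake(events):
    """True when the responder replies to a SYN with a bare SYN and the
    original initiator later sends the SYN-ACK (direction reversal)."""
    if len(events) < 3:
        return False
    first_sender, first_flags = events[0]
    second_sender, second_flags = events[1]
    return (first_flags == "SYN" and second_flags == "SYN"
            and second_sender != first_sender
            and any(f == "SYN-ACK" for s, f in events[2:] if s == first_sender))

print(is_split_handshake(NORMAL))        # False
print(is_split_handshake(SIMULTANEOUS))  # False (legitimate simultaneous open)
print(is_split_handshake(SPLIT))         # True
```

Note that the detector distinguishes the split-handshake from a legitimate simultaneous open: in the latter, the initiator never sends the SYN-ACK.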
As presented in Figure 1, attack vectors sit at the malicious drive-in; at this point, an attack surface has been established and the attack vectors have successfully evaded the connection's defences. When this occurs, the malicious attacker has the opportunity to exploit the situation by reversing the logical direction of the connection that was originally established in his or her favour. That implies that there is a correlation between the network connection and the malicious attacker before the attack even begins. Even when this attack is successful, the attacker does not have complete control over the victim's connection; the attacker has only reversed the logical direction of the initial connection.
This kind of attack can be harmful because it results in the logical reversal of the direction of a perfectly legitimate connection that has already been established. This does not necessarily imply that the attacker can perform any new actions on the transmission connection, but it may confuse the security services in charge of protecting the connection. If the malicious attacker uses a TCP split-handshake connection to launch the same attack, the network defence systems may be confused by the direction of the traffic and fail to scan the connection content for malicious activity. The malicious drive-through would then succeed, despite the defensive security protection in place. As a result, the TCP split-handshake attack may be used by malicious attackers to circumvent the active security services of a communication connection. Requiring internal hosts to initiate connections prevents external attackers from circumventing network security policies.

Methodology and experimental analysis
This study opened several TCP connection scenarios in which TCP transmissions were captured while bursting traffic overwhelmed the transmission link. On a system running Kali Linux, Wireshark was used to count the number of instances of SYN, SYN-ACK, ACK, HTTP, and HTTPS packets with open TCP/UDP sockets. Each incoming and outgoing TCP packet contains information about the network communication, including elements such as bytes-in-flight, connection syncs, data-text lines, duplicate ACKs, and flags. The open TCP/UDP socket is the attack vector, while the rest of the parameters constitute the attack surface, as described in Section 2. Even though there are benchmark cybersecurity datasets for the evaluation of machine learning performance [15], this study generates its own dataset, considering that the situation here is based on a congested network. Furthermore, the goal of network security situation prediction is to accurately predict the network security situation in real time. Data in a network security situation is random, ambiguous, and uncertain [16].
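The counting step described above can be sketched as a tally over capture records. This is a hedged illustration only: the record format is an assumption, since a real capture would be exported from Wireshark/tshark rather than written as a Python list.

```python
# Hypothetical sketch of tallying handshake flags and protocols from
# capture records (toy data standing in for a Wireshark export).
from collections import Counter

packets = [
    {"flags": "SYN",     "proto": "http"},
    {"flags": "SYN-ACK", "proto": "http"},
    {"flags": "ACK",     "proto": "http"},
    {"flags": "SYN",     "proto": "https"},
    {"flags": "ACK",     "proto": "https"},
]

# Count occurrences of each TCP flag combination and each protocol.
flag_counts = Counter(p["flags"] for p in packets)
proto_counts = Counter(p["proto"] for p in packets)

print(flag_counts["SYN"], flag_counts["ACK"], proto_counts["https"])  # 2 2 2
```

In the study, counts like these form the attack-surface features, with the open TCP/UDP socket serving as the attack-vector label.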

Machine prediction approach
Prediction problems in machine learning are tasks that consist of predicting the attributes of the next object based on the attributes of previously observed objects. Due to the high level of uncertainty in a network security situation, prediction is difficult [16]. A common problem with real-world applications in a variety of network security areas is predicting the relationship between the attack surface and the attack vector. In a network congestion environment, a predictive model of this relationship can then be used to make predictions about previously unidentified security issues in the network. The problem of predicting the attack surface-attack vector relationship under critical conditions has a wide range of applications, and such a prediction system typically first appears as a tool developed to solve a security problem that does not perform well. When considering a prediction model for critical situations in technical areas, performance is critical [17][18][19]. The current study used machine learning prediction algorithms and the generated dataset, which was divided into two groups: a training set and a test set. The training dataset is used to train the predefined techniques, and the test dataset allows us to evaluate the accuracy of the trained model. In this study, Naïve Bayes and K-Nearest Neighbors (K-NN) were used to make the machine learning predictions.

K-Nearest Neighbors (K-NN)
In numerous research projects [20][21][22], the K-NN model has been applied to network security, specifically intrusion detection. A non-parametric classifier, it operates under the assumption that "things that look alike must be alike." The K-NN rule is an extension of the Nearest Neighbor (NN) rule. Using a given vector space as a test space, it classifies sets of attributes by comparing each one with its K nearest neighbours; K is the parameter specifying how many neighbours around the test sample are considered. Due to its limitations, the K-NN algorithm is not always effective in producing good results. Nonetheless, because of its simplicity of implementation and high performance, it is suitable for a wide range of problems. Given that data in a network security environment is typically unpredictable, confusing, and uncertain, Rao and Swathi [20] used K-NN to characterise denial-of-service (DoS) and probe attacks.
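The K-NN decision rule just described can be sketched in a few lines. This is a minimal, stdlib-only illustration with toy feature values; the study itself used an off-the-shelf implementation on the captured dataset.

```python
# Minimal K-NN sketch: classify a query point by the majority label of its
# K nearest training points under Euclidean distance.
import math
from collections import Counter

def knn_predict(train, query, k=3):
    """train: list of (feature_vector, label) pairs.
    Returns the majority label among the k nearest neighbours."""
    dists = sorted((math.dist(x, query), label) for x, label in train)
    top_labels = [label for _, label in dists[:k]]
    return Counter(top_labels).most_common(1)[0][0]

# Toy data: two clusters standing in for "benign" vs "attack" sessions.
train = [((0.0, 0.1), "benign"), ((0.2, 0.0), "benign"),
         ((0.9, 1.0), "attack"), ((1.0, 0.8), "attack"), ((0.8, 0.9), "attack")]

print(knn_predict(train, (0.95, 0.9), k=3))  # attack
print(knn_predict(train, (0.10, 0.05), k=3))  # benign
```

The choice of K (the study used K = 1, 2, and 3) trades sensitivity to noise against locality of the decision boundary.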

Naïve bayes algorithm
The Naive Bayes algorithm is one of the most widely used data mining algorithms. Its efficiency derives from the assumption of attribute independence; however, this assumption may be violated in many real-world data sets. Numerous efforts have been made to mitigate the assumption, with attribute selection being one of the most important approaches [23]. Although the Naive Bayes algorithm is oversimplified and heavily reliant on prior knowledge, it is important for network security because it is fast and accurate. Using many different features, such as network features, it evaluates the likelihood of the expected outcome from the prior probability model, which is then used to make the decision [24]. The Naive Bayes function determines whether a parameter or set of parameters is relevant or non-relevant to the objective function in question. As a result, the model classifies the data according to whether or not they are attack vectors that influence the attack surface. The attribute assumptions about a set of n attributes are revealed by this method, which is used for intrusion detection [25]. Most of the time, the predictions made by the Naive Bayes classifier are correct. There are also modified versions of the Naive Bayes algorithm for network intrusion detection; a typical example applies the artificial bee colony algorithm [26].
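The decision rule above can be illustrated with a small, stdlib-only Gaussian Naive Bayes sketch: per-class priors and per-feature mean/variance estimates feed the usual log-likelihood comparison under the independence assumption. The data here is toy data, not the paper's dataset.

```python
# Illustrative Gaussian Naive Bayes: fit per-class priors and per-feature
# (mean, variance), then pick the class with the highest log-posterior.
import math
from collections import defaultdict

def fit_nb(samples):
    """samples: list of (features, label). Returns {label: (prior, stats)}
    where stats is a list of (mean, variance) per feature."""
    by_class = defaultdict(list)
    for x, y in samples:
        by_class[y].append(x)
    model, total = {}, len(samples)
    for y, rows in by_class.items():
        stats = []
        for col in zip(*rows):
            mu = sum(col) / len(col)
            var = sum((v - mu) ** 2 for v in col) / len(col) + 1e-9  # smoothing
            stats.append((mu, var))
        model[y] = (len(rows) / total, stats)
    return model

def predict_nb(model, x):
    def log_post(prior, stats):
        lp = math.log(prior)
        for v, (mu, var) in zip(x, stats):
            # Independence assumption: sum per-feature Gaussian log-densities.
            lp += -0.5 * math.log(2 * math.pi * var) - (v - mu) ** 2 / (2 * var)
        return lp
    return max(model, key=lambda y: log_post(*model[y]))

data = [((0.1, 0.2), "benign"), ((0.0, 0.1), "benign"),
        ((1.0, 0.9), "attack"), ((0.9, 1.1), "attack")]
model = fit_nb(data)
print(predict_nb(model, (0.95, 1.0)))  # attack
```

The Gaussian form is one common instantiation; the study does not specify which variant it used.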

Evaluation metrics
Evaluation is critical when estimating the performance of a machine learning algorithm. Typically, performance is measured using indicators such as precision and recall. Precision describes how well a prediction algorithm rejects non-relevant classes, while recall describes how well it finds all relevant classes. A binary label is used to differentiate between what happened in real life and what the prediction produced, where x represents the correctness of the evaluation and z indicates relevance. This can be represented in the confusion matrix below, where TP is "true positive", FP is "false positive", TN is "true negative", and FN is "false negative":

                 Predicted positive   Predicted negative
Actual positive         TP                   FN
Actual negative         FP                   TN

Thus, precision and recall are derived from equations 1 and 2 respectively:

P = TP / (TP + FP)    (1)
R = TP / (TP + FN)    (2)
Where the technique does not find the dataset used in building the model appropriate, it can be concluded that TN + FP = 0; and when TP + FN = 0, it means there are no significant objects in the test set. This means that when evaluating the technique, errors are associated with false positives (the prediction of a non-relevant relationship) and false negatives (a relevant relationship that is not found). For this reason, the F measure, called the F-score, which is based on recall and precision, is used for evaluating prediction performance via equation 3.

F = 2 × (P × R) / (P + R)    (3)

where R is the recall and P is the precision.
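Equations 1-3 translate directly into code. The following sketch computes the three metrics from confusion-matrix counts, guarding the degenerate cases noted above (TP + FP = 0 or TP + FN = 0); the count values are illustrative.

```python
# Precision, recall, and F-score from confusion-matrix counts (eqs. 1-3).
def precision(tp, fp):
    return tp / (tp + fp) if (tp + fp) else 0.0  # eq. 1

def recall(tp, fn):
    return tp / (tp + fn) if (tp + fn) else 0.0  # eq. 2

def f_score(p, r):
    return 2 * p * r / (p + r) if (p + r) else 0.0  # eq. 3

# Illustrative counts: 8 true positives, 2 false positives, 2 false negatives.
p = precision(8, 2)  # 0.8
r = recall(8, 2)     # 0.8
print(round(f_score(p, r), 3))  # 0.8
```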

Presentation of the results
As stated earlier, a variety of machine learning techniques are employed for network security prediction, all of which automate the process of identifying and constructing intelligent decisions based on data. This research employs a prediction model based on a network transmission session that becomes overloaded when the amount of data reaches congestion. The captured dataset includes the session data used to develop the model, and a correlation was established, with some sets of data conceptualised as the attack surface and one category as the attack vector.
A procedure known as modelling was used to connect the sets of input variables known as attack vectors and attack surfaces. Wireshark was used to capture data in several flow analyses. This solution set out two rules: the input variables could be utilised as groups of pairs of inputs in the training data, and input variables with a lower variance would increase the model's accuracy. In this way, two separate training and testing datasets were provided, each with its own partitioning of the dataset. To test the hypothesis, two analyses were conducted on the model, using two different partitioning schemes. In the first analysis, the partition dataset was not utilised, even though it was involved in the overall analysis. Labels apply to the open TCP/UDP sockets dataset entirely. For prediction of sets, the threshold for the Naïve Bayes experiments was set at 50%. The K values used for the K-NN experiment were set to 1, 2, and 3.
The performance analysis of the model for which the first partition of the dataset was used is presented in Tables 1 to 3. For the first model, K-NN was found to outperform Naïve Bayes (see Table 1). In the second model, unlike the first, both Naïve Bayes and K-NN perform better than in the first model. The second model divides training, validation, and testing in an 80:10:10 ratio, whereas the first model divided them 70:15:15. It can therefore be concluded that when the amount of training data is increased, the model's performance improves (see Table 2). When comparing the last model to the first and second models, the performance of the Naïve Bayes and K-NN models suffers. This final model divides training, validation, and testing in a 65:15:20 ratio, which further demonstrates that when the training partition is reduced, the performance evaluation suffers (see Table 3). Experimental trials were conducted to test the classification algorithms on test data. For classifying each dataset, the attributes of each dataset served as the classifiers. The results of these trials are displayed in Figures 2 through 6. K-NN performed the best of all three models on the test dataset, as shown in Figures 2 and 3. In actual practice, perfect prediction rarely occurs; however, with this model and dataset, the accuracy of K-NN on all four performance metrics reaches 100%, as shown in Figure 3. This finding demonstrates that using congestion as an implicit signal for decentralisation can lead to attacks that would otherwise require a central coordinator. The goal of connecting input variables using attack surface and attack vectors was to determine their relationships.
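The three partitioning schemes compared above can be sketched as a simple split function. The ratios (70:15:15, 80:10:10, 65:15:20) come from the text; the data and function name are illustrative, and a real pipeline would shuffle before splitting.

```python
# Hypothetical sketch of the train/validation/test partitioning schemes.
def partition(data, train_pct, val_pct):
    """Split data into train/validation/test by percentage; the test
    partition takes the remainder."""
    n = len(data)
    n_train = n * train_pct // 100
    n_val = n * val_pct // 100
    return data[:n_train], data[n_train:n_train + n_val], data[n_train + n_val:]

data = list(range(100))  # toy stand-in for the captured session records
for tr, va in [(70, 15), (80, 10), (65, 15)]:
    train, val, test = partition(data, tr, va)
    print(len(train), len(val), len(test))
# 70 15 15
# 80 10 10
# 65 15 20
```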
In all the analyses carried out, R² values are very high; it should be recognised that these analyses used datasets from several flow analyses captured in Wireshark. The study also recognises that it is an urgent matter to establish a mechanism for identifying attacks on TCP connection-oriented congestion, because TCP provides the ability to control data as it is being sent, and yet, if left unchecked, congestion may cause data loss that disrupts the connection. Attackers will divert the data flow, appearing to resolve the congestion problem while attacking the network at the same time, because attacks can be mounted while the TCP connection negotiates details, specifically the size of the data being transmitted.

Conclusion
This paper acknowledges that attackers can use congestion as a mechanism to gain access to an application's resources and then utilise that attack surface to spread attack vectors. The attack surface refers to everything that provides services to a computer system, including anything on the network: the operating system, the various services, and even the applications that are executed. Based on the Naïve Bayes and K-Nearest Neighbors predictions, it is concluded that in a congested TCP transmission session, the attack surface strongly influences the attack vector. The algorithms used for these experiments demonstrated outstanding performance. This study shows that by uncovering attack surfaces, we can restore the link between attack vectors and network attack prediction. When speaking of TCP, the data-transport feature is most important; however, despite providing reliable communication services, it also exposes an attack surface.