Detecting HTTP Based Mimicking Attacks at HTTP Server

Botnets are a major challenge for cybersecurity, and DDoS attacks have taken on a new dimension with them. An attacker first targets systems with compromised security and injects a bot into each of them. Those systems then run malicious code that the bot master can control. The bot master forms homogeneous or heterogeneous botnets from these bots. Botnets are used for many cyber-attacks, such as distributed denial of service (DDoS), information phishing and email spamming. Existing Intrusion Prevention / Intrusion Detection (IPS/IDS) systems can detect botnet attacks by using anomaly detection methods. To sustain their botnets, bot masters are working on bots that mimic legitimate behavior in order to stay under the radar. Most intrusion detection systems work on the assumption that attack traffic is statistically different from normal traffic. Bot owners harvest the browsing history for a popular website and use it to simulate thousands of users through bots, trying to degrade the performance of the website. This makes it a challenge for existing anomaly detection algorithms to distinguish between legitimate users and attackers. A previous study of browsing behavior using a semi-Markov model shows that it is impossible to detect mimicking attacks based on statistics if the number of active bots in the attacking botnet is sufficiently large [1]. We describe a possible method for carrying out a mimicking attack and propose an algorithm that identifies the mimicking attack pattern at the server by using HTTP statistics. To prevent the attack, the server challenges the user: a genuine user will respond to the challenge, and attackers will fail to respond. This method can be used to distinguish legitimate users from attackers, and it can be extended to other layer 7 protocols.

Phase 2: Using the information retrieved in Phase 1, create new bots to carry out mimicking attacks. If the bot master is able to establish a large number of active bots, i.e. each bot simulates one legitimate user, it is impossible to differentiate the mimicking attack from the legitimate web browsing of a large number of browsers [1].

Botnet attacks can be detected in two ways: signature-based and anomaly-based detection. Anomaly-based detection is best suited to the above attack; once an attack has been identified, it can be fed into the signature databases [5]. Anomaly detection can be further classified into two approaches. In the host-based approach [6], each machine is monitored to find malicious activity. This method requires installing a tool on every machine, which may not be scalable. The second approach is network-based and analyzes network traffic [7]. In this paper, we propose a host-based methodology. An Intelligent Analyzer collects the flows, filters them by originator IP address and monitors the flow activity using layer 7 statistics. It then clusters the flows that have similar characteristics, derives the flows that appear across the clusters, validates them using an HTTP challenge and populates a block list to the IDS.

In this paper we propose two algorithms. The first describes a possible way to carry out a mimicking attack. The second detects a mimicking attack at the HTTP server level.

As part of the mimicking attack algorithm, homogeneous bots are injected in the first step. These bots collect the browsing history of the users. The bot master analyzes the users' access patterns and derives access profiles with the details below:
1) Number of requests
2) Number of packets in each session
3) Size of the data in each packet
4) Duration of each session
5) Time interval between successive sessions
Different bots are designed with the above information. Those bots are heterogeneous in nature. It is not necessary to generate the same number of bots as there are systems.
Based on what the profiles have in common, the bot master can combine multiple user profiles into a single bot. This analysis might need the bot master's intervention, but it can also be automated. The bot master designs heterogeneous bots that mimic the user behavior and inserts them into the respective systems. The bot master then starts monitoring the systems, and the systems trigger the mimicking attack. In the meantime, any variation observed in the browsing behavior of a user profile is fed back into the generation of the next bot. Once a botnet has started a mimicking attack, it is difficult to detect the attack at gateways and IPS/IDS systems, so we are working on a model that can detect it at the server level.

This method focuses mainly on HTTP mimicking attacks. It inspects HTTP layer statistics and forms groups of flows with identical behavior. Flows that are similar in multiple parameters are considered a possible attack pattern. An HTTP challenge is sent for those flows. A genuine user will respond to the challenge, while an attacker, who may not implement the complete HTTP stack, will fail to respond. The list of failing flows is circulated as a block list.

We need to cluster the data based on similarities in attributes, so we opted for clustering algorithms. There are two types of data clustering algorithms: hierarchical and partitional. As part of the study we analyzed the hierarchical (connectivity) model, which starts from the assumption that objects are more closely related to nearby objects than to objects farther away. These algorithms start with a small number of samples and build up groups until one large set holds the items with similar properties. The partitional model, in contrast, is based on a center vector and groups the elements that are close to that center into a cluster. The number of partitions (clusters) is fixed in traditional algorithms. For the above problem we require partitional clustering, because we need to divide the data objects into non-overlapping subsets. For each parameter we need to come up with multiple clusters.
The number of clusters is not known in advance, so the well-known k-means algorithm might not be well suited to this problem [8]. The X-means clustering algorithm, in contrast, does not assume that the number of clusters is known at the beginning. The X-means algorithm works in three steps [12]:
Step 1: Run the conventional k-means algorithm
Step 2: Adjust the centroids
Step 3: Repeat the above two steps while k < Kmax
The point at which the clustering process stops is decided by one of two criteria: the Akaike information criterion (AIC) or the Bayesian information criterion (BIC). In our setting we are not aware of the number of clusters in advance and need to find the optimal number of clusters that fits each parameter closely, so we use the X-means algorithm.
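The following is a minimal sketch of the X-means idea for a single scalar parameter (for example idle time), not the implementation used in this work. It assumes a BIC score for a Gaussian model with one shared variance, a deterministic split that seeds the two child centroids at the minimum and maximum of a cluster, and hypothetical function names (`kmeans_1d`, `bic`, `x_means_1d`).

```python
import math

def assign(values, centers):
    """Label each value with the index of its nearest center."""
    return [min(range(len(centers)), key=lambda j: abs(v - centers[j])) for v in values]

def kmeans_1d(values, centers, max_iter=100):
    """Plain k-means on scalars, starting from the given centers."""
    centers = list(centers)
    for _ in range(max_iter):
        labels = assign(values, centers)
        updated = []
        for j, center in enumerate(centers):
            members = [v for v, lab in zip(values, labels) if lab == j]
            updated.append(sum(members) / len(members) if members else center)
        if updated == centers:
            break
        centers = updated
    return centers, assign(values, centers)

def bic(values, centers, labels):
    """BIC score (higher is better), assuming Gaussians with a shared variance."""
    r, k = len(values), len(centers)
    if r <= k:
        return float("-inf")
    sse = sum((v - centers[lab]) ** 2 for v, lab in zip(values, labels))
    variance = max(sse / (r - k), 1e-12)  # avoid log(0) when values are identical
    sizes = [labels.count(j) for j in range(k)]
    loglik = sum(n * math.log(n / r) for n in sizes if n)
    loglik -= r / 2 * math.log(2 * math.pi * variance) + sse / (2 * variance)
    free_params = 2 * k  # k - 1 mixing weights, k means, 1 variance
    return loglik - free_params / 2 * math.log(r)

def x_means_1d(values, k_max=8):
    """Grow the number of clusters through BIC-tested splits, up to k_max."""
    centers = [sum(values) / len(values)]  # start from a single cluster
    while True:
        centers, labels = kmeans_1d(values, centers)
        grown = []
        for j, center in enumerate(centers):
            members = [v for v, lab in zip(values, labels) if lab == j]
            remaining = len(centers) - j  # clusters still to process, including this one
            if len(grown) + remaining < k_max and len(set(members)) > 1:
                kids, kid_labels = kmeans_1d(members, [min(members), max(members)])
                if bic(members, kids, kid_labels) > bic(members, [center], [0] * len(members)):
                    grown.extend(kids)  # two centroids describe this cluster better
                    continue
            grown.append(center)
        if len(grown) == len(centers):  # no cluster was split: stop
            return centers, labels
        centers = grown
```

Called on idle times that fall into two well-separated groups, `x_means_1d` should settle on two centroids without being told the number of clusters. In this sketch one such run would be made per parameter.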
A background process on the HTTP server collects the flow information. This process receives a copy of every HTTP request and collects the layer 7 information. The Analyzer filters the data by originator IP address (source IP address). From the filtered data, the critical components (number of sessions, idle time, amount of data transferred, methods used and duration of the session) are recorded and forwarded to the clustering algorithm. The X-means clustering algorithm is applied to the flow data for the different parameters. Because the attack needs to be identified quickly, the X-means algorithm is executed in parallel for each parameter. After the clustering on all parameters has completed, the data is given to the insertion algorithm. The insertion algorithm is designed to identify the flows that occur together across the clusters. If multiple flows occur together across the clusters, the activity they are performing is similar. That set of flows is added to the list of suspicious flows, which the Analyzer hands to the challenge process. The HTTP challenge process walks through all of these flows that are currently alive and sends the HTTP challenge. The HTTP challenger module uses the HTTP server to initiate the challenge for a suspicious flow on the next request that arrives from the client. Of the existing HTTP challenges, the cookie-based challenge is difficult to crack. The flows that do not reply to the HTTP challenge are added to the block list, which is populated to the IDS/IPS systems in the network. We can provide the trace of those flows so that the IDS systems can add them to their block lists [13] [14].
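One way to read the insertion algorithm is as a search for flows that share a cluster under every parameter. The sketch below follows that reading, which is an assumption rather than the paper's exact procedure. The function name `common_flows`, the input layout (parameter name mapped to flow and cluster label) and the `min_group` threshold are hypothetical.

```python
from collections import defaultdict

def common_flows(cluster_labels, min_group=2):
    """Return the sets of flows that share a cluster under every parameter.

    cluster_labels maps a parameter name to {flow_id: cluster_id}.
    min_group is an assumed threshold: smaller groups are not reported.
    """
    params = sorted(cluster_labels)
    if not params:
        return []
    # Only flows that were clustered under every parameter can be compared.
    flow_ids = set.intersection(*(set(cluster_labels[p]) for p in params))
    groups = defaultdict(set)
    for flow in flow_ids:
        # The tuple of cluster ids acts as the flow's behavior signature.
        signature = tuple(cluster_labels[p][flow] for p in params)
        groups[signature].add(flow)
    return [flows for flows in groups.values() if len(flows) >= min_group]

labels = {
    "idle_time": {"192.0.2.1": 0, "192.0.2.2": 0, "192.0.2.3": 1, "192.0.2.4": 0},
    "active_time": {"192.0.2.1": 2, "192.0.2.2": 2, "192.0.2.3": 2, "192.0.2.4": 0},
}
suspicious = common_flows(labels)
```

Here only 192.0.2.1 and 192.0.2.2 fall into the same cluster for both parameters, so they form the single suspicious group that would be handed to the challenge process.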
It is possible that the bot master programs the bots so that they can reply to the HTTP challenge. To identify such cases, keep the list of suspicious flows identified by the insertion algorithm. If the same flow is observed in multiple iterations of the X-means algorithm and the insertion algorithm, consider it a suspicious mimicking attack flow and add it to the gray list so that the Analyzer keeps watch on that client. If it appears again in further iterations, connections coming from that source address are blocked. By the end of each phase we have three lists:
White list: Flows that do not have any similarities.
Black list: Clients that failed the HTTP challenge. They are not permitted any further sessions with the HTTP server.
Gray list: Clients that have a similar activity pattern. They are under monitoring, and if they are observed in further iterations they are moved to the block list.
If any new entry is added to the black list, it needs to be populated to the IDS immediately. A system that has been attacked by a bot may be carrying out multiple malicious activities, so the system is completely blocked in the network. The second algorithm identifies the mimicking attack at the host level.
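The bookkeeping for the three lists can be sketched as a small state holder. This is an illustration under assumptions: the class name `FlowLists`, the `update` method and the `block_after` threshold (how many iterations a challenge-passing client may appear as similar before it is blocked) are hypothetical and not specified in the text.

```python
from dataclasses import dataclass, field

@dataclass
class FlowLists:
    block_after: int = 2  # assumed: block on the second similar sighting
    white: set = field(default_factory=set)
    gray: dict = field(default_factory=dict)  # source -> similar sightings so far
    black: set = field(default_factory=set)

    def update(self, source, similar, answered_challenge):
        """Record one iteration's outcome for a source and return its list."""
        if source in self.black:
            return "black"  # blocked clients stay blocked
        if not similar:
            if source in self.gray:
                return "gray"  # still under monitoring
            self.white.add(source)
            return "white"
        self.white.discard(source)
        hits = self.gray.get(source, 0) + 1
        if not answered_challenge or hits >= self.block_after:
            self.gray.pop(source, None)
            self.black.add(source)  # new entries here go to the IDS immediately
            return "black"
        self.gray[source] = hits
        return "gray"
```

A client that fails the challenge goes straight to the black list. A client that answers it but keeps showing a similar pattern is first placed on the gray list and is blocked on a later iteration.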

Algorithm for creating Mimicking attack
As part of the algorithm, N bots are injected into different computers. All of these bots are homogeneous. They collect the browsing statistics of the users. The browsing statistics are then analyzed and user profiles are created. From these, the target server is identified and M heterogeneous bots are created for the N network elements. Each bot simulates the browsing behavior of a user. In the second phase, the bot master initiates the mimicking attack. The bot master repeats the process continuously so that the IDS/IPS may not be able to detect the mimicking behavior. The bot master runs the two phases in parallel: while one set of bots is carrying out the mimicking attack, the bot master collects the new user profiles appearing in the system and changes the heterogeneous mimicking pattern.

Explanation of the Algorithm
Step 1: The bot master injects N copies of the bot (equal to the number of systems in the network). This bot is intended to collect the browsing profile of the user of each system. The collected data looks as below.
Step 2: Analyze the user browsing profiles and come up with the distinct profiles that can be used for a mimicking attack. We currently propose that this is done by the bot master, who analyzes the profiles manually, places similar profiles into one group and prepares common profiles like the ones below. This step can still be automated.

Algorithm for detecting mimicking attack
Current botnet detection algorithms focus mainly on anomaly detection. They work on the assumption that an attack produces flows that vary from the usual ones. Most botnet detection algorithms, such as BotMiner and BotSniffer, rely on this assumption that the attack pattern deviates from the existing patterns. Mimicking attacks do not fall into that category, so it is not possible to detect their attack patterns using the traditional algorithms. We therefore propose a new algorithm to detect mimicking attacks. Speed is essential: the algorithm should be fast enough to detect the attack while the attack is in progress. The algorithm needs to run in multiple phases, so that one phase can feed information to the next. We can have three processes running in parallel.
Process 1:
a) Collect the flow information by source IP address and collect the parameters
b) Run the clustering algorithm on each parameter in parallel
c) Provide the cluster sets to Process 2, and move back to step a) for the next set of flows
Process 2: Run the insertion algorithm to find the common flows, and provide that information to Process 3
Process 3: Send an HTTP redirect for all the common flows found in Process 2. After a timeout, prepare the block list and gray list, and populate the block list to the IDS/IPS.
The three processes need to run in parallel so that execution is fast and the mimicking flows are detected sooner. Here we need a clustering algorithm that works on multiple parameters, such as idle time, active time, gap between two successive connections, and methods used by the HTTP sessions. The clustering on the different parameters needs to happen in parallel so that the input to the insertion algorithm is available sooner.
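The three cooperating processes can be sketched as a pipeline of workers connected by queues. The sketch below uses threads and in-memory queues as a stand-in for the processes described above, which is an assumption. The names `run_pipeline`, `cluster`, `find_common` and `challenge` are hypothetical placeholders for the three stages, passed in as plain callables.

```python
import queue
import threading

_DONE = object()  # sentinel that tells a stage to shut down

def _stage(work, inbox, outbox):
    """Apply `work` to every item from inbox and forward the result."""
    while True:
        item = inbox.get()
        if item is _DONE:
            outbox.put(_DONE)  # pass the shutdown signal downstream
            return
        outbox.put(work(item))

def run_pipeline(batches, cluster, find_common, challenge):
    """Run the three stages concurrently over a sequence of flow batches."""
    q1, q2, q3, out = (queue.Queue() for _ in range(4))
    stages = [(cluster, q1, q2), (find_common, q2, q3), (challenge, q3, out)]
    threads = [threading.Thread(target=_stage, args=s, daemon=True) for s in stages]
    for t in threads:
        t.start()
    for batch in batches:  # Process 1 keeps receiving new sets of flows
        q1.put(batch)
    q1.put(_DONE)
    results = list(iter(out.get, _DONE))  # collect until the sentinel arrives
    for t in threads:
        t.join()
    return results
```

Because each stage starts on the next batch as soon as it has handed its result downstream, clustering of new flows overlaps with the challenge phase of earlier ones. A fuller version would also need error handling, since a stage that raises an exception would stall this sketch.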
Step 2: Use the k-means algorithm [1] to group the N data points into different data sets, based on the parameters below:
1) Idle time (P1)
2) Active time (P2)
3) Time between two successive sessions (P3)
4) Number of requests per method (P4)
Step 3: Use the X-means clustering algorithm on all flows with the above 4 parameters. First run k-means clustering and get the averages for comparison with the clusters. For P1, set the initial partition and the initial mean vectors for each group. If a flow is closer to Cluster (ki), it belongs to Cluster (ki), and the averages found now become the new mean vectors for Cluster (ki). If it is closer to Cluster (kj), then it goes to Cluster (kj), and the averages become the new mean vectors there. Example: flow 3 belongs to K1 and the K1 mean remains 6.
Step 7: If there are still flows present in the flow database, continue again with Step 4. Otherwise go to Step 8.
Step 8: Compare the distance of each element to its own cluster's mean and to the mean of the opposite cluster. If the element is closer to its own cluster, keep it in that cluster; otherwise place it in the opposite cluster.
Step 9: If any relocation occurred in Step 8, the algorithm must continue iterating. If no relocations occurred, k-means clustering is complete.
Step 10: For ki = 1, ..., K: replace each centroid ck by two centroids ck1 and ck2.
Step 11: Run the k-means algorithm with K = 2 over the cluster Ki.
a) Replace each centroid. Use the BIC calculation to determine whether two clusters are needed or a single cluster is best suited
b) If the convergence condition is not satisfied, repeat. Otherwise stop.
Step 12: Compare the flows that occur in both the K1 and K2 clusters. For all subsequent requests from those flows, initiate an HTTP challenge (302 redirect message) with a cookie. The client has to store and resend the cookie. An attacker will fail to respond.
Step 13: Flows that do not return the cookie and have a similar pattern are considered to be a mimicking attack. The server can pass this information to firewalls and IPS/IDS systems to block the malicious users.
Step 14: Maintain a gray list of the IP addresses that reply to the challenge. If they appear multiple times across iterations, move those IP addresses to the block list.
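The cookie challenge in the steps above can be sketched with a signed token, so that the server does not need to store per-flow state. This is a minimal sketch under assumptions: the cookie name, the HMAC-based token format and the function names `build_challenge` and `passes_challenge` are hypothetical, and the timeout handling is left out.

```python
import hashlib
import hmac
import secrets
from http.cookies import CookieError, SimpleCookie

COOKIE_NAME = "challenge"             # hypothetical cookie name
SERVER_KEY = secrets.token_bytes(32)  # hypothetical per-server secret

def _sign(source_ip, nonce, key):
    """Bind the nonce to the source address with an HMAC."""
    message = f"{source_ip}|{nonce}".encode()
    return hmac.new(key, message, hashlib.sha256).hexdigest()

def build_challenge(source_ip, original_url, key=SERVER_KEY):
    """Return the status code and headers of a 302 redirect that sets the cookie."""
    nonce = secrets.token_hex(8)
    token = f"{nonce}.{_sign(source_ip, nonce, key)}"
    headers = {
        "Location": original_url,  # the client is sent back to the same resource
        "Set-Cookie": f"{COOKIE_NAME}={token}; Path=/; HttpOnly",
    }
    return 302, headers

def passes_challenge(source_ip, cookie_header, key=SERVER_KEY):
    """True if the follow-up request carries a valid cookie for this source."""
    jar = SimpleCookie()
    try:
        jar.load(cookie_header or "")
    except CookieError:
        return False
    morsel = jar.get(COOKIE_NAME)
    if morsel is None:
        return False  # cookie was not stored and resent
    nonce, _, mac = morsel.value.partition(".")
    expected = _sign(source_ip, nonce, key)
    return hmac.compare_digest(mac.encode(), expected.encode())
```

A client with a complete HTTP stack follows the redirect and resends the cookie, so `passes_challenge` returns True for its next request. A flow that never returns a valid cookie before the timeout would be treated as having failed the challenge.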

III. CONCLUSION
We discussed possible ways of carrying out a mimicking attack and of identifying it at the server level by using HTTP statistics. The algorithm determines the mimicking attack and blocks the malicious machines. If the servers are distributed, the information needs to be exchanged between them and the current design needs to be altered accordingly. This is a host-based technique, so it may not scale in a large network. We concentrated on HTTP, but attacks can happen over any application protocol; in those cases the algorithm needs to be adapted to the application protocol. In future work we will study methods of extending the approach to all layer 7 protocols. The algorithm also needs to be fast: if the number of flows is large, we may end up blocking only after the attack. We are working on an approach to identify mimicking attacks at network gateways so that detection can be much more scalable.