ACO-Random Forest Approach to Protect the Kids from Internet Threats through Keystroke

Abstract: The number of children using the Internet is increasing rapidly. They use it to do their homework and to keep in touch with their friends, but they are vulnerable to unknown threats coming from the Internet. Many government authorities are actively trying to protect children from these threats. This study presents one approach that distinguishes children from other Internet users by analysing their typing behaviour. The moment a user is identified as a child or minor, the next stage of protection is an auto-sensing firewall appropriate for that user. For our experiments, we took two public datasets on keystroke dynamics and applied the Ant Colony Optimization (ACO) technique as the search method and Random Forest as the classifier on each dataset. The results obtained are impressive. According to our study, more than 92% of desktop computer users and 84.22% of touch-screen mobile users in the children group can be protected from looming Internet threats by analysing their typing behaviour on a keyboard or touch screen.


II. RELATED WORKS
Keystroke dynamics is not new in biometric science. The technique dates back to 1980. Many journal articles, conference articles and master's theses have been published. Fig. 1 clearly indicates the increasing trend in keystroke dynamics research. Many datasets have been created using different types of text of different lengths from different numbers of subjects, many methods have been applied, and many innovative ideas have come out of previous studies. However, most of the papers focused on user identification or authentication performance through typing patterns.
Only a few papers have described ancillary information that can be extracted from the typing pattern. Epp et al. [10] show that it is possible to identify the emotional state of a person from the person's way of typing; they reported an accuracy of 84% in identifying anger and excitement. Giot et al. [12] show that it is possible to detect gender from typing style, and they reported an accuracy of more than 90%. Idrus et al. [11] show that it is possible to identify the gender, age group, handedness, and whether one or two hands were used while typing, and they reported an accuracy very close to 90%. Uzun et al. [3] show that it is possible to distinguish children from adults through typing patterns, and they obtained an accuracy of more than 90% for simple, familiar Turkish text. They used 13 classification algorithms, of which SVM (Linear) achieved the minimum Equal Error Rate for familiar text, but the performance was not consistent for the other texts.

A. Basic Idea
Keystroke dynamics is a behavioral biometric trait that is used in human authentication and identification. However, the technique can also be used to recognize ancillary information. Physical structure, mentality, reading style, hand geometry, weight and length, experience level with the keyboard, knowledge level, educational qualification, and neuro-physiological factors all indirectly affect how a user types, and these differences can be used to identify kids. Since keystroke dynamics is a distance-based, measurable pattern, it may be a strong alternative for enabling age group identification.

B. Features
The basic features of keystroke patterns are the time interval between the press and release of a key and the time intervals between the presses and releases of two subsequent keys. Nowadays, key pressure, fingertip size, finger placement on the keyboard, and keystroke sound are also considered. Some timing features of keystroke dynamics are as follows:

$$\text{Down-Down Key Latency (PP)} = P_{i+1} - P_i$$
$$\text{Up-Down Key Latency (RP)} = P_{i+1} - R_i$$
$$\text{Down-Up Key Latency (PR)} = R_{i+1} - P_i$$
$$\text{Total-Time Key Latency (T-Time)} = R_n - P_1$$
$$\text{Tri-graph Latency (Tri-time)} = R_{i+2} - P_i$$
$$\text{Four-graph Latency (F-Time)} = R_{i+3} - P_i$$

Here, $P_i$ and $R_i$ represent the press and release times of the $i$-th entered key of the predefined text, and $n$ is the number of keys entered.
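The latencies defined above can be computed directly from two lists of timestamps. The following is a minimal sketch, assuming press and release times are available as equal-length lists in key order; the function name `timing_features` and the sample timestamps are hypothetical and are not taken from the datasets used in this paper.

```python
from typing import Dict, List, Union

Number = Union[int, float]


def timing_features(press: List[Number],
                    release: List[Number]) -> Dict[str, object]:
    """Compute the timing features from press (P) and release (R) times.

    press[i] and release[i] are the press and release timestamps of the
    i-th key (0-based here, 1-based in the formulas).
    """
    if len(press) != len(release) or not press:
        raise ValueError("press and release must be non-empty, equal length")
    n = len(press)
    return {
        # Down-Down: P_{i+1} - P_i
        "PP": [press[i + 1] - press[i] for i in range(n - 1)],
        # Up-Down: P_{i+1} - R_i
        "RP": [press[i + 1] - release[i] for i in range(n - 1)],
        # Down-Up: R_{i+1} - P_i
        "PR": [release[i + 1] - press[i] for i in range(n - 1)],
        # Total time: R_n - P_1
        "T-Time": release[n - 1] - press[0],
        # Tri-graph: R_{i+2} - P_i
        "Tri-time": [release[i + 2] - press[i] for i in range(n - 2)],
        # Four-graph: R_{i+3} - P_i
        "F-Time": [release[i + 3] - press[i] for i in range(n - 3)],
    }


# Hypothetical timestamps in milliseconds for a four-key entry.
press = [0, 150, 320, 500]
release = [90, 240, 410, 600]
features = timing_features(press, release)
```

For a text of $n$ keys this yields $n-1$ values for each di-graph latency, $n-2$ tri-graph values, $n-3$ four-graph values, and a single total time.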

C. Public Datasets
Many datasets on keystroke dynamics have been created in the last 30 years, but only some of them, listed below, are available on the Internet, either for direct download or on request. These datasets were collected from both child and adult users. Details are given in Table 1. We have named each dataset for this paper: Dataset A and Dataset B were created through a keyboard, whereas Dataset C was created through a touch screen.

IV. EXPERIMENTAL RESULTS
Details of the experimental results are given in Table 2, Table 3 and Table 4. Eight popular and recognized classification algorithms were used on each dataset described in Table 1. The accuracy rate was calculated in the Weka environment, version 3.7.2 [9]. Two test options were used in our experiments. The first is 10-fold cross-validation, where the instances are divided into 10 groups; each group in turn is used for testing while the remaining groups are used for training. In the second test option, we divided the data into two groups, with 66% of the instances used for training and 34% for testing. Only the test accuracy of each learning process is listed. The table shows that Random Forest achieved the highest accuracy consistently for each dataset, before and after optimization with the ACO technique. Table 5 shows that, before optimization, Fuzzy Rough NN is always better than Random Forest for all datasets used in our experiments, but after optimization we observed that Random Forest proved to be the suitable method in this domain.
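The two test options described in this section can be illustrated with index splits. This is a minimal sketch in plain Python and is not Weka's implementation: it applies no stratification, and the function names, the seed, and the sample size of 100 instances are assumptions made only for illustration.

```python
import random
from typing import Iterator, List, Tuple


def k_fold_indices(n: int, k: int = 10,
                   seed: int = 0) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train, test) index lists; each fold is the test set once."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    # Deal the shuffled indices into k folds of near-equal size.
    folds = [idx[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [j for m, fold in enumerate(folds) if m != i for j in fold]
        yield train, test


def percentage_split(n: int, train_fraction: float = 0.66,
                     seed: int = 0) -> Tuple[List[int], List[int]]:
    """Shuffle once, then cut into a training part and a testing part."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    cut = round(n * train_fraction)
    return idx[:cut], idx[cut:]


# Hypothetical dataset of 100 instances.
splits = list(k_fold_indices(100, k=10))
train_idx, test_idx = percentage_split(100)
```

With 10 folds every instance is tested exactly once, so the reported accuracy is an average over all instances, whereas the 66%/34% split tests on a single held-out third of the data.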

VI. APPROACHES
A. ACO-RF Approach
Our proposed model is ACO-RF. The text typed as search input is the key parameter used to check whether the user is a kid or not. The moment a user is identified as a child or minor, the next stage of protection is an auto-sensing firewall appropriate for that user, and the check continues whenever the user types search input. A graphical representation is presented in Fig. 5.
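The feature-search half of ACO-RF can be sketched generically as follows. The paper does not state the pheromone update rule or the ACO parameters, so everything here is an assumption: a simple binary-choice ant colony in which each ant includes or excludes every feature according to the pheromone levels, with evaporation rate `rho`, a fixed `deposit`, and a lower bound `tau_min`. The scoring function `toy_score` is a hypothetical stand-in for classifier accuracy; in ACO-RF the score would instead be the accuracy of Random Forest on the candidate feature subset.

```python
import random
from typing import Callable, FrozenSet, Tuple


def aco_select(n_features: int,
               score: Callable[[FrozenSet[int]], float],
               n_ants: int = 10, n_iters: int = 30,
               rho: float = 0.2, deposit: float = 1.0,
               tau_min: float = 0.01,
               seed: int = 0) -> Tuple[FrozenSet[int], float]:
    """Search for a high-scoring feature subset with a binary-choice ACO."""
    rng = random.Random(seed)
    # tau[j][1]: pheromone for including feature j; tau[j][0]: excluding it.
    tau = [[1.0, 1.0] for _ in range(n_features)]
    best_subset: FrozenSet[int] = frozenset()
    best_score = float("-inf")
    for _ in range(n_iters):
        for _ in range(n_ants):
            # Each ant includes feature j with probability tau1 / (tau0 + tau1).
            subset = frozenset(
                j for j in range(n_features)
                if rng.random() < tau[j][1] / (tau[j][0] + tau[j][1])
            )
            s = score(subset)
            if s > best_score:
                best_subset, best_score = subset, s
        for j in range(n_features):
            chosen = 1 if j in best_subset else 0
            # Evaporate both choices, but never below tau_min.
            for c in (0, 1):
                tau[j][c] = max(tau_min, (1.0 - rho) * tau[j][c])
            # Reinforce the choice made by the best subset found so far.
            tau[j][chosen] += deposit
    return best_subset, best_score


# Hypothetical scorer: rewards three "informative" features, penalises others.
INFORMATIVE = frozenset({0, 3, 5})


def toy_score(subset: FrozenSet[int]) -> float:
    return len(subset & INFORMATIVE) - 0.1 * len(subset - INFORMATIVE)


best_subset, best_score = aco_select(8, toy_score)
```

Keeping the scorer as a plain callable separates the search from the classifier, so the same loop could wrap any accuracy estimate, such as the cross-validated accuracy of a Random Forest trained on the selected columns.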

VII. COMPARISONS WITH OTHER METHODS
Bicakci et al. [3] showed that the accuracy of distinguishing children from adults is 91.2%. This is the best accuracy recorded in the literature for simple Turkish text, whereas our proposed approach achieved 92.2% accuracy on the same dataset. They also applied their classification algorithms to password-type text but achieved only 87.2% accuracy, whereas our approach achieved 90.2%. Therefore, our approach is more consistent than previously proposed methods.
VIII. DISCUSSIONS
It is true that the performance of keystroke dynamics is not very promising, owing to a high failure-to-enroll (FTE) rate and intra-class variation. The technique can therefore be applied where these error rates can be tolerated, rather than in user identification or authentication. In this paper, we have tried to separate children from adults by the way they type, and we obtained promising results.
The experiments were carried out in both environments. Using ACO-Random Forest, we achieved 92.2% accuracy in the desktop environment and 84.22% accuracy in the Android environment. These results are very hard to achieve in practice, where external factors such as cross-device validation make a high FTE rate more likely.
IX. CONCLUSIONS
Keystroke dynamics and mouse movements are two common, measurable, distance-based activities involved in using the Internet through a keyboard or touch screen. They are sufficient to identify the age group, which can protect kids or minors from looming threats coming from the Internet. We collected datasets that contain only keystroke patterns, applied 8 machine learning algorithms to each, and also applied an optimization technique (ACO) to select a feature subset. Random Forest proved to be a suitable classification method, and ACO achieved the optimum solution as the optimization technique. In our experiments, we obtained up to 92% accuracy in the desktop environment. This accuracy is impressive for a single, familiar, fixed text, provided the enrolment phase is extremely accurate, but it is very hard to achieve in practice. Many factors may affect the process and increase the failure-to-enroll rate, which means the technology is not yet very efficient. More research has to be done, and more factors have to be included. In the desktop environment, mouse dynamics and key pressure, which is proportional to force and depends on the mass of the hand, may be good factors. On the Android platform, key pressure, acceleration, and fingertip size may be included, since advanced sensing devices such as accelerometers are embedded in every smartphone. Thus, this technique achieves acceptable accuracy and can be used to protect children from looming Internet threats.