An Efficient Clustering Based Feature Selection for Predicting Student Performance

Predicting student failure has become a very complicated challenge, owing both to the large number of factors that can influence student performance and to the balanced nature of the student databases maintained by Educational Data Mining (EDM) techniques. The main goal of this study is to detect and remove both irrelevant and redundant features in order to improve classification accuracy when predicting student performance. This goal is achieved with a novel feature (attribute) selection technique: a hybrid of the Artificial Fish Swarm and Cuckoo Search optimization algorithms removes the irrelevant features, and a Non-negative Matrix Factorization Clustering (NMFC) algorithm removes the redundant features that remain among the relevant ones. The technique is evaluated on a student database that gathers information on students from different colleges. The evaluation compares the classifiers used in this research, Prism and J48, without feature selection against the same classifiers with the proposed technique. The experimental results show that the hybrid artificial fish swarm-cuckoo search feature selection combined with the NMFC approach achieves a higher accuracy rate than the other techniques. This study therefore helps to improve the prediction of student failure and dropout by increasing the accuracy of the classification result.

(NMF) which is based on clustering method. Hence, the proposed feature selection technique helps to identify the relevant features and also increases the accuracy of the classification process.
The remaining part of this paper is organized as follows: Section 2 describes works that are related to our proposed methodology. Section 3 explains the concept of our proposed feature selection technique for classification of student's performance. Section 4 presents the performance evaluation results of the proposed technique. Section 5 concludes the research work and also provides the directions for future research.
II. RELATED WORKS
Baradwaj, B. K., & Pal, S. [2] analysed educational data to observe students' performance. This research uses a decision tree approach for classification in order to evaluate the performance of the students. Knowledge describing the students' performance in the end-semester examination is extracted to identify dropouts and students who need appropriate counselling. The accuracy of the decision tree classification technique in predicting student performance is investigated. However, this technique has the limitation that a pruning process is necessary, which adds complexity.
Delibasic, B., et al. [3] proposed black-box and white-box decision tree algorithms to study students' acceptance of data mining algorithms. The black-box approach lets users run predefined algorithms and set their parameters. The white-box approach lets users assemble algorithms from algorithm building blocks. This research analyses students' acceptance of these algorithms in educational data mining. The limitation is that the white-box approach is more complex.
Kabakchieva, D. [4] applied data mining classification algorithms to predict students' performance. This research uses various classification algorithms such as Bayes classifiers, decision tree classifiers and a Nearest Neighbour classifier. It allows students' performance at university to be forecast from their personal and pre-university characteristics. However, the limitations of this work are that the k value must be determined for the nearest neighbour classifier and that the Bayes classifier has lower accuracy.
Nikam, S. S. [5] presented a comparative study of data mining classification algorithms to identify their features and limitations. This research compares classification algorithms such as C4.5, ID3, the k-nearest neighbour classifier, Naïve Bayes, SVM and ANN. Through this comparison, the features and limitations of each classification technique are identified. This work also shows how data are determined and grouped by each classification technique.
ElGamal, A. F. [6] proposed a data mining classification model to predict student performance in a programming course. This research considers the students' mathematical skills, programming aptitude, problem-solving skills, computer programming experience, e-learning usage, etc. The proposed work involves three processes: data pre-processing, attribute selection and a rule extraction algorithm. In pre-processing, fuzzification is carried out to transform the attributes into linguistic form. For classification, rule extraction is performed based on decision tree classification. However, the limitations of this work are that the decision tree classifies the features by rectangular partitioning and also suffers from over-fitting.
Singh, B., et al. [7] proposed a feature selection technique based on symmetric uncertainty for high-dimensional data. In this approach, the sorted features are first partitioned, and the essential features are then searched in both forward and backward fashion. A correlation-based feature ranking measure called symmetric uncertainty is used to select the features. This work efficiently identifies the major features and eliminates the redundant ones. The limitation of this algorithm is that it works well only for numerical data, not for mixed-type data.
Singh, M., et al. [8] proposed a feature extraction model for identifying students' risk level in academic activities. This research uses a Naïve Bayes classifier to predict the features used for recognizing the performance of second-year students in their computer and application course, taking partially relevant or fully relevant features. The model is able to extract the fitness procedure sequences for each student who is predicted to be in the at-risk group. This identification improves students' performance in academia. However, the classification accuracy is still poor.
Mustafa, M. N., et al. [9] presented a dynamic prediction model to identify student dropout in a developing country. This research applies the chi-square test for feature selection over separating factors such as gender, financial conditions and dropping year. Classification and Regression Tree (CART) and CHAID tree are used as the data mining techniques to predict the features. However, the classification accuracy is low, and the work is based only on background information.
Rachburee, N., & Punlumjeak, W. [10] presented a comparison of feature selection techniques in data mining. This research compares feature selection techniques such as Greedy, IG-ratio, Chi-square and mRMR to assess the efficiency of student performance prediction methods. The classification models used in this work are k-nearest neighbour, neural network, naïve Bayes and decision tree. This work uses best couple validation to obtain high accuracy. However, various other feature selection and classification techniques are also available that could improve the efficiency of student performance prediction.
Kaur, P., et al. [11] applied data mining algorithms for classification and prediction to identify slow learners in the education field. This research tests a database of students' academic records and applies several classification techniques such as Naive Bayes, SMO, J48 and REPTree. The paper also highlights the importance of prediction and classification algorithms in educational data mining. This work may be further investigated for other fields such as medicine and sports.
Shahiri, A. M., et al. [12] presented data mining techniques for predicting students' performance. This research studies the prediction methods used for identifying students' performance in Malaysian institutions. The work mainly focuses on the prediction algorithms used to identify the most popular attributes in the students' data. It also studies the variables used for analysing students' performance. The meta-analysis on predicting students' performance helps to monitor it in a systematic way.
Sen, B., et al. [13] proposed a data mining method for predicting and analysing students' placement-test scores. This research uses a large, feature-rich database collected from the secondary education transition system in Turkey. Sensitivity analysis is applied to the prediction models to identify the most important predictors. However, the prediction models depend on the highest-priority factors to analyse students' performance.
Emary, E., et al. [14] proposed a feature selection approach based on binary Gray Wolf Optimization (bGWO). This research uses bGWO to select relevant features in order to maximize classification accuracy. bGWO is used to find the optimal regions of the complex search space. The approach is compared with particle swarm optimization and genetic algorithms. The limitation of this work relates to repeatability and robustness: the method converges to similar solutions.

III. ARTIFICIAL FISH SWARM-CUCKOO SEARCH OPTIMIZATION BASED FEATURE SELECTION
The proposed system introduces a novel feature (attribute) selection technique, a hybrid of Artificial Fish Swarm and Cuckoo Search optimization. Effective feature selection is achieved by coupling two components: irrelevant feature elimination and redundant feature elimination. Our existing NMFC method uses Symmetric Uncertainty (SU) to remove the irrelevant features, and SU requires the mutual information and entropy to be maintained. To avoid this limitation, in this paper Artificial Fish Swarm-Cuckoo Search Optimization is used to remove the irrelevant features, that is, to obtain the relevant features rapidly. After the relevant features are selected, the redundant features present among them have to be found. To remove the redundant features, we employ a novel Non-negative Matrix Factorization based Clustering technique. After the feature selection procedure, two classification methods, Prism and J48, are used to predict student performance.
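For reference, the Symmetric Uncertainty measure used by the existing method is commonly defined as SU(X, Y) = 2 I(X; Y) / (H(X) + H(Y)), where I is the mutual information and H the entropy. The following is a minimal sketch of that standard definition for discrete attributes, not the authors' implementation; the function names and the toy attributes (`attendance`, `gender`) are assumptions made purely for illustration.

```python
import math
from collections import Counter


def entropy(values):
    """Shannon entropy (in bits) of a sequence of discrete values."""
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())


def symmetric_uncertainty(x, y):
    """SU(X, Y) = 2 * I(X; Y) / (H(X) + H(Y)), a value in [0, 1]."""
    h_x, h_y = entropy(x), entropy(y)
    if h_x + h_y == 0:
        return 0.0  # both variables are constant; treated as uninformative
    h_xy = entropy(list(zip(x, y)))   # joint entropy H(X, Y)
    mutual_info = h_x + h_y - h_xy    # I(X; Y) = H(X) + H(Y) - H(X, Y)
    return 2.0 * mutual_info / (h_x + h_y)


# Hypothetical toy data: one attribute mirrors the class label, the other does not.
label = ["PASS", "PASS", "FAIL", "FAIL"]
attendance = ["high", "high", "low", "low"]  # perfectly aligned with the label
gender = ["F", "M", "F", "M"]                # independent of the label

su_attendance = symmetric_uncertainty(attendance, label)
su_gender = symmetric_uncertainty(gender, label)
```

In this toy case the aligned attribute receives the maximum score and the independent attribute the minimum, which is the behaviour a relevance filter based on SU relies on.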

A. Removal of Irrelevant Feature
Irrelevant features are eliminated by using a hybridization of the Artificial Fish Swarm (AFS) algorithm and the Cuckoo Search Optimization (CSO) algorithm. The AFS algorithm is a type of random search algorithm with the characteristics of parallelism, simplicity and tracking, but it has a low convergence rate. Thus, the AFS algorithm is hybridized with CSO to achieve a high convergence rate, with the search area limited by Levy flight optimization.
The algorithm is initialized with parameters including the number of features, number of iterations, number of fish, number of selected features, etc. In the initial iteration, every fish arbitrarily picks a feature subset from the m features. Only the best k subsets (k < nf) are used to update the visual position and to influence the feature subsets of the next iteration. In further iterations, every fish begins with m-p features that are arbitrarily chosen from the previously chosen k best subsets, where p is an integer from 1 to m-1. This feature selection process may hinder the overall system from picking the relevant features. Therefore, the cuckoo search algorithm is used to pick the optimal feature. The feature selected by the cuckoo is carried over to the next generation. In this way, the features that characterize the best k subsets have a higher probability of being present among the features of the next iteration. However, it remains possible for every fish to consider other features. For a given fish j, those features are the ones that achieve the best compromise between the previous experience, such as the visual position, and the current best of the cuckoo search.
The feature selection exploited by the artificial fish involves the following steps: [...] where [...] is a constant.
// Select the remaining p features for each fish
11. For j = 1 to nf, choose feature $f_i$ from the given subset $S_j$ by the cuckoo search algorithm as follows:
12. Initialize a population of n host nests and determine the current best nest by evaluating the fitness.
13. While (t < MaxGeneration)
14. Get a cuckoo arbitrarily by Levy flights: $x_i(t+1) = x_i(t) +$ [...]
15. Evaluate its fitness $a_i$ and pick a nest j among the n nests arbitrarily.
16. If ($a_i > a_j$), replace j by the new solution and abandon the worst nests.
17. Keep the best solutions and rank the solutions.
18. Find the current best feature $f_j$.
19. $S_j = S_j \cup \{f_j\}$
20. Replace the duplicated subsets with randomly chosen subsets and go to step 11.
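The cuckoo search loop in steps 12 to 17 can be illustrated with a small sketch. This is a generic cuckoo search on a toy continuous problem, not the authors' feature-selection procedure: the Levy step is drawn with Mantegna's algorithm (a common choice that the text does not specify), and `toy_fitness`, `alpha`, `abandon_frac` and the search range are assumptions introduced only for illustration.

```python
import math
import random


def levy_step(rng, beta=1.5):
    """Draw one Levy-distributed step length using Mantegna's algorithm."""
    num = math.gamma(1 + beta) * math.sin(math.pi * beta / 2)
    den = math.gamma((1 + beta) / 2) * beta * 2 ** ((beta - 1) / 2)
    sigma_u = (num / den) ** (1 / beta)
    u = rng.gauss(0, sigma_u)
    v = rng.gauss(0, 1)
    return u / abs(v) ** (1 / beta)


def cuckoo_search(fitness, dim, n_nests=15, max_generation=200,
                  alpha=0.1, abandon_frac=0.25, seed=0):
    """Maximize `fitness` over R^dim with a basic cuckoo search."""
    rng = random.Random(seed)

    def new_nest():
        return [rng.uniform(-5, 5) for _ in range(dim)]

    # Step 12: initialize n host nests and find the current best nest
    nests = [new_nest() for _ in range(n_nests)]
    best = max(nests, key=fitness)
    for _ in range(max_generation):            # step 13
        # Step 14: get a cuckoo from a random nest by a Levy flight
        i = rng.randrange(n_nests)
        cuckoo = [x + alpha * levy_step(rng) for x in nests[i]]
        # Steps 15-16: compare with a random nest j, replace it if the cuckoo is fitter
        j = rng.randrange(n_nests)
        if fitness(cuckoo) > fitness(nests[j]):
            nests[j] = cuckoo
        # Steps 16-17: rank the nests, abandon the worst ones, keep the best
        nests.sort(key=fitness, reverse=True)
        n_keep = n_nests - int(abandon_frac * n_nests)
        nests = nests[:n_keep] + [new_nest() for _ in range(n_nests - n_keep)]
        if fitness(nests[0]) > fitness(best):
            best = nests[0]
    return best


def toy_fitness(x):
    """Hypothetical fitness: larger is better, maximum at the origin."""
    return -sum(v * v for v in x)


best = cuckoo_search(toy_fitness, dim=2)
```

In the feature-selection setting described above, a nest would encode a candidate feature and the fitness would score the resulting subset; the continuous toy problem is used here only to keep the sketch short.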

B. Removal of Redundant Features
After the relevant features are obtained, redundant features still appear among them. Therefore, the NMF approach is used to cluster the features. The redundant features are eliminated from the relevant features by selecting a representative feature from each cluster, and the final feature subset is obtained.
NMF is a matrix factorization method that finds a non-negative factorization of a given non-negative matrix. The collected feature set is represented as $W = \{f_1, \ldots, f_n\}$ and $X$ is the weighted frequency vector. Assume that the given feature set includes $k$ clusters. The main aim is to factorize the $m \times n$ matrix $X$ into two non-negative matrices, the $m \times k$ matrix $A$ and the $k \times n$ matrix $B^T$, that minimize the following objective function:

$$F = \frac{1}{2}\,\|X - AB^T\| \qquad (3)$$

In equation (3), $\|\cdot\|$ refers to the squared sum of all elements in the matrix. This objective function can also be rewritten as:

$$F = \frac{1}{2}\,\mathrm{tr}\big((X - AB^T)(X - AB^T)^T\big) = \frac{1}{2}\,\mathrm{tr}(XX^T) - \mathrm{tr}(XBA^T) + \frac{1}{2}\,\mathrm{tr}(AB^TBA^T)$$

The above minimization problem can be restated as follows: minimize $F$ with respect to $A$ and $B$ under the constraints $a_{ij} \ge 0$, $b_{xy} \ge 0$, where $0 \le i \le m$, $0 \le j \le k$, $0 \le x \le n$, and $0 \le y \le k$. This is a typical constrained optimization problem, and it can be solved by using the Lagrange multiplier method. Let $\alpha_{ij}$ and $\beta_{ij}$ be the Lagrange multipliers for the constraints $a_{ij} \ge 0$ and $b_{ij} \ge 0$ respectively, with $\alpha = [\alpha_{ij}]$ and $\beta = [\beta_{ij}]$. The Lagrangian $L$ is:

$$L = F + \mathrm{tr}(\alpha A^T) + \mathrm{tr}(\beta B^T)$$

The derivatives of $L$ with respect to $A$ and $B$ are:

$$\frac{\partial L}{\partial A} = -XB + AB^TB + \alpha \qquad (7)$$

$$\frac{\partial L}{\partial B} = -X^TA + BA^TA + \beta \qquad (8)$$

By using the Kuhn-Tucker conditions $\alpha_{ij} a_{ij} = 0$ and $\beta_{ij} b_{ij} = 0$, the following equations are obtained for $a_{ij}$ and $b_{ij}$ respectively:

$$(XB)_{ij}\, a_{ij} - (AB^TB)_{ij}\, a_{ij} = 0 \qquad (9)$$

$$(X^TA)_{ij}\, b_{ij} - (BA^TA)_{ij}\, b_{ij} = 0 \qquad (10)$$

The above equations lead to the following updating formulae:

$$a_{ij} \leftarrow a_{ij}\,\frac{(XB)_{ij}}{(AB^TB)_{ij}}, \qquad b_{ij} \leftarrow b_{ij}\,\frac{(X^TA)_{ij}}{(BA^TA)_{ij}}$$

To make the solution unique, the Euclidean length of each column vector of matrix $A$ is required to be one. This normalization of $A$ can be accomplished by:

$$b_{ij} \leftarrow b_{ij}\,\sqrt{\sum_i a_{ij}^2}, \qquad a_{ij} \leftarrow \frac{a_{ij}}{\sqrt{\sum_i a_{ij}^2}}$$

[...]
7. Rank the correlation coefficients and select the top k features as exemplary features from each cluster and include them in $S_j$.
8. Return $S_j$

In the above algorithm, the irrelevant features are first eliminated from the database. For this purpose, the hybrid of the AFS and CS optimization techniques is used.
Here, the artificial fish move to a next state that is better than the current state, measured in terms of the crowd factor, step size and Levy distribution function. The Levy distribution function helps to optimize the better next state, which is selected as the relevant state. The variables in the relevant state represent the relevant features in the data. Thus, the first part of our algorithm yields the relevant features. In the second part of the algorithm, the redundant features in the relevant feature set are removed. For this purpose, the non-negative matrix factorization based clustering method is used. The acquired relevant features are passed to the NMF based clustering to remove the redundant features. The main goal is to factorize the input X into the non-negative m × k matrix A and k × n matrix B^T. These two non-negative matrices are then normalized, B^T is obtained, and the clustering of the features is achieved. From the clusters, the exemplary features are collected; these form the final selected features, free of irrelevant as well as redundant features. Finally, the desired features are obtained, which can increase the classification performance.
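The NMF-based clustering step can be sketched as follows. The sketch applies the multiplicative updates a_ij ← a_ij (XB)_ij / (AB^T B)_ij and b_ij ← b_ij (X^T A)_ij / (BA^T A)_ij, normalizes the columns of A to unit length, and then assigns each feature to the cluster with the largest entry in its row of B. The assignment rule, the small `eps` guard against division by zero, the random initialization and the toy matrix are assumptions made for illustration; the original algorithm listing is incomplete in this text, so this should not be read as the authors' exact procedure.

```python
import random


def matmul(P, Q):
    """Multiply two matrices stored as lists of rows."""
    Qt = list(zip(*Q))
    return [[sum(p * q for p, q in zip(row, col)) for col in Qt] for row in P]


def transpose(P):
    """Transpose a matrix stored as a list of rows."""
    return [list(col) for col in zip(*P)]


def nmf(X, k, n_iter=500, eps=1e-9, seed=0):
    """Approximate a non-negative m x n matrix X by A B^T (A: m x k, B: n x k)."""
    rng = random.Random(seed)
    m, n = len(X), len(X[0])
    A = [[rng.random() + eps for _ in range(k)] for _ in range(m)]
    B = [[rng.random() + eps for _ in range(k)] for _ in range(n)]
    Xt = transpose(X)
    for _ in range(n_iter):
        # a_ij <- a_ij * (X B)_ij / (A B^T B)_ij
        XB, ABtB = matmul(X, B), matmul(A, matmul(transpose(B), B))
        A = [[A[i][j] * XB[i][j] / (ABtB[i][j] + eps) for j in range(k)]
             for i in range(m)]
        # b_ij <- b_ij * (X^T A)_ij / (B A^T A)_ij
        XtA, BAtA = matmul(Xt, A), matmul(B, matmul(transpose(A), A))
        B = [[B[i][j] * XtA[i][j] / (BAtA[i][j] + eps) for j in range(k)]
             for i in range(n)]
    # Scale each column of A to unit Euclidean length; rescale B so A B^T is unchanged
    norms = [sum(A[i][j] ** 2 for i in range(m)) ** 0.5 or 1.0 for j in range(k)]
    A = [[A[i][j] / norms[j] for j in range(k)] for i in range(m)]
    B = [[B[i][j] * norms[j] for j in range(k)] for i in range(n)]
    return A, B


def assign_clusters(B):
    """Assign feature x to the cluster y with the largest b_xy (assumed rule)."""
    return [max(range(len(row)), key=row.__getitem__) for row in B]


# Hypothetical toy matrix: rows are samples, columns are four features.
# Features 0 and 1 carry the same pattern, as do features 2 and 3.
X = [[1, 1, 0, 0],
     [2, 2, 0, 0],
     [0, 0, 3, 3],
     [0, 0, 1, 1]]
A, B = nmf(X, k=2)
clusters = assign_clusters(B)
```

Features that land in the same cluster are treated as mutually redundant, so keeping one representative per cluster gives the reduced feature subset described above.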
IV. EXPERIMENTAL RESULT
In this section, the classification results with the proposed NMFC-based feature selection and without any feature selection are compared in terms of classification accuracy, True Positive rate (TP rate) and True Negative rate (TN rate). The experimental results indicate that the proposed method is more efficient than the other techniques.

A. Database Description
For our experimentation, we consider a student database that contains 297 data instances collected from different colleges. The database has 40 attributes, which include the student's name, course, age, gender, nature of college (engineering/medical/arts), college type (government/self-financed), location features, family type (joint family/nuclear family), family factors such as occupation and educational qualification of relatives, economic aspects, college aspects, social aspects, time spent on television, mobile, computer, etc., personal factors and academic factors. Here, location features refer to the area in which the student's home, school and college are located, such as a rural, urban or semi-urban region. College aspects give data about whether the student refers to notes given by lecturers or to books, the training system such as lecture method or blackboard, the number of students in the class, whether the college permits mobile phones, and so on. Social aspects include the guidance of relatives for studies, the number of friends and the educational performance of friends.
In our experimentation, we evaluate whether the student's performance in the institute is good or poor according to the features present in the data. Data samples are given to the feature selection method to collect the effective features. These collected features are then given to the classifiers to evaluate the performance. We use two classifiers, Prism and J48. We provide 150 data samples as training data (with class label) to the classifier for the learning process, and the remaining data samples are taken as test data (without class label), which are given to the classifier to find the class label. Finally, the output variable (class attribute) gives the educational status or student's performance, which has two outcomes: PASS (a student who passes the course) or FAIL (a student who needs to repeat the course).

B. True Positive rate
True Positive rate (TP rate), also called sensitivity or recall, is the proportion of actual positives that are identified as positive and is computed by Equation 13. In this experiment, if the class label resulting from the prediction is PASS and the true class label is also PASS, the instance is counted as a true positive.

$$\text{TP rate} = \frac{TP}{TP + FN} \qquad (13)$$

where TP is the number of true positives and FN is the number of false negatives.

Fig. 1. TP rate comparison

Fig. 1 shows the comparison of the TP rate between classification without feature selection and classification with NMFC using either the existing symmetric uncertainty method or our proposed artificial fish swarm-cuckoo search optimization approach. In the graph, the X-axis denotes the classification methods (Prism, J48) and the Y-axis denotes the TP rate. From this graph, we conclude that classification with our proposed NMFC using artificial fish swarm-cuckoo search optimization is comparatively more efficient in terms of TP rate.

C. True Negative rate
True Negative rate (TN rate), also called specificity, is the proportion of actual negatives that are identified as negative and is computed by Equation 14. In this experiment, if the class label resulting from the prediction is FAIL and the true class label is also FAIL, the instance is counted as a true negative.

$$\text{TN rate} = \frac{TN}{TN + FP} \qquad (14)$$

where TN is the number of true negatives and FP is the number of false positives.

Fig. 2 shows the comparison of the TN rate between classification without feature selection and classification with NMFC using either the existing symmetric uncertainty method or our proposed artificial fish swarm-cuckoo search optimization approach. In the graph, the X-axis denotes the classification methods (Prism, J48) and the Y-axis denotes the TN rate. From this comparison graph, we conclude that classification with our proposed NMFC using artificial fish swarm-cuckoo search optimization is comparatively more efficient in terms of TN rate.

Fig. 3 shows the comparison of the accuracy between classification without feature selection and classification with NMFC using either the existing symmetric uncertainty method or our proposed artificial fish swarm-cuckoo search optimization approach. The accuracy of classification is measured by the formula:

$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}$$

In the graph, the X-axis denotes the classification methods (Prism, J48) and the Y-axis denotes the accuracy rate. From this accuracy comparison graph, we conclude that classification with our proposed NMFC using artificial fish swarm-cuckoo search optimization is comparatively more efficient in terms of accuracy.
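The three evaluation measures used in this section can be computed directly from the predicted and true class labels. The following is a minimal sketch using the standard definitions, with PASS taken as the positive class as in the text; the label lists are made-up toy values, not results from the experiments.

```python
def confusion_counts(y_true, y_pred, positive="PASS"):
    """Count true positives, false negatives, true negatives and false positives."""
    tp = fn = tn = fp = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == positive:
            if pred == positive:
                tp += 1
            else:
                fn += 1
        else:
            if pred == positive:
                fp += 1
            else:
                tn += 1
    return tp, fn, tn, fp


def evaluate(y_true, y_pred, positive="PASS"):
    """Return (TP rate, TN rate, accuracy) for a two-class prediction."""
    tp, fn, tn, fp = confusion_counts(y_true, y_pred, positive)
    tp_rate = tp / (tp + fn)                    # sensitivity / recall
    tn_rate = tn / (tn + fp)                    # specificity
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    return tp_rate, tn_rate, accuracy


# Hypothetical labels for eight students (illustration only).
y_true = ["PASS", "PASS", "PASS", "PASS", "FAIL", "FAIL", "FAIL", "FAIL"]
y_pred = ["PASS", "PASS", "PASS", "FAIL", "FAIL", "FAIL", "PASS", "PASS"]

tp_rate, tn_rate, accuracy = evaluate(y_true, y_pred)
# tp_rate = 0.75, tn_rate = 0.5, accuracy = 0.625
```

The sketch assumes both classes occur in `y_true`; otherwise one of the denominators would be zero and would need separate handling.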
V. CONCLUSION
Various strategies can be developed and implemented that enable educational institutions to transform a wealth of information into a wealth of predictability, stability and profits. Our proposed system establishes a new feature selection method for classifying the performance of students in educational institutions. This method has the advantages of removing irrelevant features and also removing the redundant features present among the relevant features, which increases the classification accuracy. In our experimentation, we use the Prism and J48 classification techniques to classify the performance of students from different colleges. The experimental results show that the accuracy, TP rate and TN rate of our proposed method are higher than those of the classifiers without the feature selection process. In addition, this novel technique can reduce the time complexity and memory complexity of the classification process. Thus, the prediction of student failure and dropout can be improved by using our proposed technique.