EFFICIENCY OF DATA MINING TECHNIQUES FOR PREDICTING KIDNEY DISEASE

Chronic kidney disease is a growing problem in an aging population. Surveillance and prediction of kidney disease are important so that patients receive adequate and appropriate treatment at the right time. Data mining can extract interesting patterns from very large medical databases. Patients with kidney disease can be analyzed automatically from their disease data, taking prior predictions into account. Though medical data is heterogeneous in nature, including text, graphics and images, unwanted data can be removed to provide useful medical information on a patient. Medical data mining can detect disease patterns and predict the severity of a patient's disease. Conformist theories are more pertinent than probabilistic theories here, because precise results and inferences are a necessity when a patient's life is at stake. Fuzzy systems are generally used because they produce results based on mathematical calculation rather than on probabilistic judgments, as neural networks do. The paper proposes a new algorithm, Improved Hybrid Fuzzy C-Means (IHFCM), an improvement of FCM that uses Euclidean distances to predict kidney disease in patients.

Basma Boukenze et al. [2] pre-processed data with conversions and data mining methods to gain knowledge about the interaction between measurement parameters and the survival of a patient. Two data mining algorithms were used to form decision rules for extracting knowledge and predicting the survival of patients. They explained the significance of exploring important parameters using data mining. Their concept was implemented and tested using dialysis data collected from four different sites. Their method also reduced the cost and effort of selecting patients for clinical trials: patients were selected based on the predicted results and the significant parameters found in the analysis.
Neha Sharma et al. [3] detected and predicted kidney disease as a prelude to proper treatment of patients. The system was used for detection in patients with kidney disease, and the results of its IF-THEN rules predicted the presence of a disease. Their technique used two fuzzy systems and a neural network, together called a neuro-fuzzy system, based on the results obtained from the input data set. Their system was a combination of fuzzy systems that produced results using accurate mathematical calculations instead of probabilistic classification. Generally, results based on mathematics tend to have higher accuracy. Their work was able to obtain useful data along with optimized results.
Veenita Kunwar et al. [4] predicted chronic kidney disease (CKD) using naive Bayesian classification and an artificial neural network (ANN). Their results showed that the naive Bayesian classifier produced more accurate results than the artificial neural network. They also observed that classification algorithms are widely used for the investigation and identification of CKD.
Swathi Baby P et al. [5] demonstrated that data mining methods can be used effectively in medical applications. Their study collected data from patients affected by kidney disease, and the results showed the applicability of data mining to a variety of medical applications. The K-means (KM) algorithm can determine the number of clusters in large data sets. Their study analyzed AD Tree, J48, KStar, naive Bayes, random forest and tree-based (ADT, J48) classifiers on a kidney disease data set and noted that these techniques provide a statistical analysis of the use of algorithms to predict kidney disease in patients.
III. PROBLEM FORMULATION
Probability theory cannot be relied on for predicting kidney disease, because a patient's life is at stake and exact results are a necessity. Statistical methods, Bayesian classification and association-rule-based prediction are unsuitable for predicting CKD because the results obtained may be less accurate. Predicting the disease can save a patient's life, and early detection helps in curing it properly. Thus there is a need to evolve CKD prediction with new techniques.
IV. PROPOSED WORK
Kidney disease is increasing in an aging population, making it imperative to monitor or predict diseased kidneys. General predictions are based on a set of if-then rules applied to kidney data sets. Erroneous predictions of CKD can lead to loss of life. The proposed new technique, IHFCM, is used for predicting and detecting kidney disease in a patient data set.

V. METHODOLOGY
A. Fuzzy Model
Fuzzy grouping is based on generating graphs for each pattern within a group. Fuzzy modeling can match human reasoning models and manage data. The main advantages of fuzzy logic are its simplicity and flexibility. Fuzzy logic can handle inaccurate and incomplete data where traditional statistical models may fail. A fuzzy system can model any complex nonlinear function and provides transparency by explaining its rules. These rules can serve as potential clinical guidelines.

B. Fuzzy C Means
The fuzzy c-means (FCM) algorithm is a traditional and classical image segmentation algorithm. It is a clustering method in which a data point may belong to two or more clusters. The FCM algorithm minimizes an objective function that measures the quality of a partitioning of a data set into clusters. It produces an optimal partition by minimizing the weighted within-group sum of squared errors objective function. It is frequently used in pattern recognition. The fuzzy C-means algorithm is listed below in
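
For reference, the weighted within-group sum of squared errors is conventionally written as shown here. This is the standard textbook form of FCM, not a formula reproduced from this paper's own listing. The notation is assumed for illustration: $x_k$ are the $N$ data points, $v_i$ the $C$ cluster centers, $u_{ik}$ the membership grade of point $k$ in cluster $i$, and $m > 1$ the fuzzification exponent.

```latex
J_m(U, V) = \sum_{k=1}^{N} \sum_{i=1}^{C} u_{ik}^{\,m} \, \lVert x_k - v_i \rVert^2,
\qquad \text{subject to } \sum_{i=1}^{C} u_{ik} = 1 \text{ for every } k,

v_i = \frac{\sum_{k=1}^{N} u_{ik}^{\,m} \, x_k}{\sum_{k=1}^{N} u_{ik}^{\,m}},
\qquad
u_{ik} = \left[ \sum_{j=1}^{C}
  \left( \frac{\lVert x_k - v_i \rVert}{\lVert x_k - v_j \rVert} \right)^{\frac{2}{m-1}}
\right]^{-1}.
```

Alternating the two update rules does not increase $J_m$, which is why the algorithm is run iteratively until the memberships stop changing.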

D. Proposed Improved Hybrid Fuzzy C Means Clustering Algorithm (IHFCM)
Fuzzy c-means was introduced by Ruspini and then extended by Dunn and Bezdek, and the Fuzzy C-Means Clustering Algorithm (FCM) is widely used in cluster analysis, pattern recognition and image processing. FCM is based on K-means. The basic idea of FCM is that each data point belongs to a cluster to a degree specified by a membership grade, whereas in K-means each data point either belongs to a particular group or does not. FCM therefore uses fuzzy partitioning, so that a data point can belong to multiple groups with membership grades between 0 and 1. Even with these degrees of membership, FCM still uses a cost function that is minimized while partitioning the data set. The membership matrix U has elements with values between 0 and 1. The algorithm works iteratively through the preceding two conditions until no more improvement is noticed. In batch-mode operation, FCM determines the cluster centers c_i and the membership matrix U using the following steps:
Input: Feature-extracted, segmented CT scan kidney image
Output: Whether the given image shows kidney disease or not
Step 1: Set the number of clusters.
Step 2: Set the fuzzification parameter, image size and ending condition.
Step 3: Randomly initialize the fuzzy clusters and conditions.
Step 4: Initialize the loop counter to 0.
Step 5: Calculate the weighted fuzzy factor using the Euclidean distance measure.
Step 6: Modify the segmented matrix M = {M_ij} using the Euclidean distance (d).
Step 7: Modify the cluster conditions using the fuzzy membership function (MF).
Step 8: If MAX|MF_new - MF_old| < ending condition, then stop.
Step 9: Otherwise, increment the loop counter by 1 and go to Step 5.
Here MF = [MF_1, MF_2, ..., MF_C] are the membership functions of the cluster conditions. At the end, a defuzzification process converts the fuzzy image into a crisp segmented image. IHFCM can be applied to identifying a disease in a patient's data set and can even be used for drug activity prediction.
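
The iteration described in the steps above can be sketched in a few lines of Python. This is a minimal sketch of plain FCM with Euclidean distance, not the authors' MATLAB implementation and not the IHFCM modifications, which the text does not specify in enough detail to reproduce. The function name `fcm`, its parameters and its defaults are assumptions made for illustration, and the order of the center and membership updates follows standard FCM, so the step numbers in the comments map only approximately onto the list above.

```python
import math
import random


def fcm(points, n_clusters=2, m=2.0, tol=1e-5, max_iter=100, seed=0):
    """Plain fuzzy c-means with Euclidean distance (illustrative sketch).

    points: list of equal-length lists of floats.
    Returns (centers, memberships, labels).
    """
    rng = random.Random(seed)
    n = len(points)
    dim = len(points[0])

    # Step 3: random initial memberships, each row normalised to sum to 1.
    u = []
    for _ in range(n):
        row = [rng.random() + 1e-9 for _ in range(n_clusters)]
        total = sum(row)
        u.append([value / total for value in row])

    centers = []
    for _ in range(max_iter):  # Steps 4 and 9: bounded loop counter
        # Step 7 (roughly): centers as membership-weighted means.
        centers = []
        for i in range(n_clusters):
            weights = [u[k][i] ** m for k in range(n)]
            total = sum(weights)
            centers.append([
                sum(w * points[k][d] for k, w in enumerate(weights)) / total
                for d in range(dim)
            ])

        # Steps 5 and 6 (roughly): Euclidean distances, then new memberships.
        new_u = []
        for k in range(n):
            # Clamp to avoid division by zero when a point sits on a center.
            dists = [max(math.dist(points[k], c), 1e-12) for c in centers]
            new_u.append([
                1.0 / sum((dists[i] / dj) ** (2.0 / (m - 1.0)) for dj in dists)
                for i in range(n_clusters)
            ])

        # Step 8: stop when the largest membership change is below tol.
        change = max(
            abs(new_u[k][i] - u[k][i])
            for k in range(n)
            for i in range(n_clusters)
        )
        u = new_u
        if change < tol:
            break

    # Defuzzification: assign each point to its highest-membership cluster.
    labels = [row.index(max(row)) for row in u]
    return centers, u, labels
```

Calling `fcm(points, n_clusters=2)` on a list of feature vectors returns the cluster centers, the full membership matrix and a crisp label per point. The final `labels` list plays the role of the defuzzified result, while the membership matrix keeps the graded information.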
VI. EXPERIMENTAL RESULTS
This work was carried out in MATLAB, which can manipulate matrices, plot functions and data, implement algorithms, create user interfaces and interact with programs written in other languages. The experimental IHFCM was implemented in MATLAB. The data set is taken from the UCI Machine Learning Repository, a collection of databases, domain theories and data generators used by the machine learning community for the empirical analysis of machine learning algorithms. The archive was created as an FTP archive in 1987 by David Aha and fellow graduate students at the University of California, Irvine. Since then, it has been widely used by students, educators and researchers around the world as a primary source of machine learning data sets.

A. Fuzzification Score
The algorithm computes a fuzzy c-means score (the fuzzy score) for each value in the table that corresponds to the entered query. The higher the score, the closer the match. A score of 1.0 or 0.9 means that the fuzzy score places the case in a high-risk cluster. A score of 0.0 means that the corresponding symptoms carry a risk level that is barely affected or not at risk. The user can enter the minimum and maximum risk levels at which a doctor should be contacted. On this basis each query receives an individual score, and FCM divides the scores into two categories, the lowest and the highest levels, by finding the minimum and maximum scores within the given range of values. Thus, FCM can provide three kinds of cluster-based results: low-risk, average-risk and high-risk fuzzy scores.
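
The mapping from a fuzzy score to a risk level can be illustrated as follows. This is only a sketch under stated assumptions: the function name `risk_category` and the default thresholds 0.3 and 0.7 are hypothetical stand-ins for the user-entered minimum and maximum risk levels, and the paper does not state the actual cut-off values.

```python
def risk_category(score, low=0.3, high=0.7):
    """Map a fuzzy score in [0, 1] to a coarse risk label.

    low and high are hypothetical user-chosen thresholds, not values
    taken from the paper.
    """
    if not 0.0 <= score <= 1.0:
        raise ValueError("score must lie in [0, 1]")
    if low > high:
        raise ValueError("low threshold must not exceed high threshold")
    if score >= high:
        return "high"
    if score <= low:
        return "low"
    return "average"
```

With these assumed thresholds, a score of 0.9 falls in the high-risk group and a score of 0.0 in the low-risk group, matching the description above.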

B. Results
The performance of FCM is evaluated with statistical measures such as sensitivity, specificity and accuracy, applied to the normal life style score. These metrics also indicate how good and consistent the test was. Sensitivity measures how well the normal life style score correctly detects the disease as positive. Specificity measures the proportion of patients without the disease who are correctly ruled out. The objective function of IHFCM is depicted in Figure 3. The comparative performance of the algorithms is listed in Table 1 and
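
The three measures named above follow directly from the counts of true and false positives and negatives. The sketch below uses the standard definitions. The function name `diagnostic_metrics` is an assumption, and the counts in the usage note are made-up numbers for illustration, not results from this paper.

```python
def diagnostic_metrics(tp, fp, tn, fn):
    """Return (sensitivity, specificity, accuracy) from confusion counts.

    tp/fn: diseased patients detected / missed.
    tn/fp: healthy patients correctly ruled out / wrongly flagged.
    """
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one diseased and one healthy case")
    sensitivity = tp / (tp + fn)  # share of diseased patients detected
    specificity = tn / (tn + fp)  # share of healthy patients ruled out
    accuracy = (tp + tn) / (tp + fp + tn + fn)  # share of all correct calls
    return sensitivity, specificity, accuracy
```

For example, with the illustrative counts `tp=40, fp=5, tn=45, fn=10`, sensitivity is 40/50, specificity is 45/50 and accuracy is 85/100.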

VII. CONCLUSION
The proposed IHFCM is an extension of FCM and is applied to locating kidney disorders in patient records. The paper demonstrates that appropriate adjustments to FCM can help build a new strategy for discovering both unusual and typical cases. The initial pre-processing step of IHFCM deletes duplicate records. Clustering results obtained from 300 patients showed that FCM-based clustering algorithms achieve higher accuracy than most existing algorithms. The performance of the proposed IHFCM has been demonstrated in terms of accuracy.