Crop Prediction Framework Using Rough Set Theory

The agriculture sector produces vast amounts of data, which calls for specialized frameworks to store, clean, and analyze the data and convert it into knowledge so that hidden patterns can be identified. In this work, the basic concepts of Rough Set Theory (RST) are applied to an agriculture data set to support decision making. RST offers a feasible approach for extracting decision rules from data sets, and these rules can be used to forecast crop yield in the agriculture sector. In this paper, an RST framework is presented that generates classification rules from 640 agriculture data records for crop forecasting. In the proposed framework, the collected data are preprocessed and an information table is generated, followed by a decision table. A reduction method is then employed to find a reduct of the data set, that is, a minimal subset of attributes together with the class label. Finally, the LEM2 algorithm is applied to the reduct to generate rules. The study shows that rough set theory is one of the best techniques for rule generation and decision making.


I. INTRODUCTION
Nowadays, RST is applied in various domains, such as machine learning [1], knowledge acquisition and knowledge discovery from databases [2], decision analysis [3], expert systems [4], inductive reasoning and pattern recognition [5], data mining [6], and many more. The rough set methodology has been used in many applications, such as legal reasoning for drawing conclusions from factual data, churn modeling in telecommunications, and the analysis of medical, financial, and military data sets [7].
The central objective of analysis using RST is to induce (learn) approximations of concepts [8]. RST provides mathematical tools to discover hidden patterns in data. It can be used for data reduction [9], feature selection [10], feature and pattern extraction [11], and decision rule generation [12]. Moreover, it can be employed to identify partial or total dependencies in data, handle dynamic data, remove redundant data, and deal with missing data and null values [13], among other tasks.
To the best of our knowledge, very little work has been done on applying rough sets in the agriculture sector, which motivated us to develop a rough-set-based framework for this domain. The advantages of this technique are as follows [14] [15]:
• No prior or additional information about the data set is required.
• It provides a valuable analysis of the data.
• Its results can be interpreted in terms of both quantitative and qualitative data.
The main objectives of this study are to build an appropriate framework to assess the performance of rough set classifiers, to forecast crops in the agriculture domain, and to produce understandable decision rules that can be applied to crop prediction.

Use of Rough Set in Various Domains:
RST has many properties that make it well suited to a variety of real problems. In pattern recognition, for example, it has been used to improve the classification ability of a hybrid pattern recognition system [16]. Rough sets were used in the design and development of a mobile support system to triage abdominal pain in the emergency room of a hospital [17]. The rough set concept has also been applied to derive rules that explain the association between acoustical parameters of concert halls and sound processing algorithms [18]. RST has been employed to extract facts and rules for power system operation [19]. A hierarchical learning method based on RST has been applied to the problem of sunspot classification from satellite images [20]. Shen and Jensen have identified other areas where rough sets have been applied successfully, such as business failure prediction, financial investment, bioinformatics and medicine, and fault diagnosis [21]. Rough set rules have also been applied to form meta-structures of interest in semiconductor applications [22].
The paper is organized as follows: Sect. 2 describes the basics of RST and data mining with its applications; Sect. 3 presents the materials and methodology; Sect. 4 describes the experimental results and discussion; finally, Sect. 5 presents the conclusions.
II. OVERVIEW OF ROUGH SET THEORY
RST was originally proposed by Zdzislaw Pawlak in 1982 [23]. It deals with vague or imperfect information and knowledge and with the analysis and classification of vague data, and it is considered a non-statistical method of data analysis. RST is a relatively new technique for data analysis that addresses the vagueness and uncertainty inherent in decision making. The advantages of RST for data analysis are as follows [24]:
• It offers efficient algorithms for finding hidden patterns in data.
• It finds reduced sets of data, so data reduction is straightforward.
• It evaluates the significance of the data.
• It generates minimal sets of decision rules from a data set.
• Its results are easy to understand.
• It supports both quantitative and qualitative data analysis.
• It identifies relationships that cannot be found using statistical methods.
The basic concept behind RST is the lower and upper approximation of a set. The lower approximation consists of the objects that certainly belong to the subset of interest, whereas the upper approximation consists of the objects that possibly belong to it. The pair of subsets defined by the lower and upper approximations is called a rough set. The knowledge hidden in a system can be discovered and expressed in the form of decision rules [25].

A. Concept of RST
The concept of a rough set and the basic terms used in the theory are as follows. A set is a collection of objects of interest, for instance a collection of magazines, paintings, or people. Suppose we are given a finite set of objects O, called the universe. The relation R ⊆ O × O is an indiscernibility relation (an equivalence relation) that represents our lack of knowledge about the elements of O; for an object x ∈ O, R(x) denotes its equivalence class, i.e., the set of objects indiscernible from x. Let S be a subset of O. We now describe the set S with respect to R.
Definition 1 (Lower Approximation). The lower approximation of a set S with respect to R is the set of all objects that can be classified with certainty as S in view of R. Mathematically, it can be expressed as:
$R_*(S) = \{x \in O : R(x) \subseteq S\}$
Definition 2 (Upper Approximation). The upper approximation of a set S with respect to R is the set of all objects that can possibly be classified as S in view of R. Mathematically, it can be expressed as:
$R^*(S) = \{x \in O : R(x) \cap S \neq \emptyset\}$
Definition 3 (Boundary Region). The boundary region of a set S with respect to R is the set of all objects that can be classified neither as S nor as not-S with respect to R. It can be stated as:
$BN_R(S) = R^*(S) - R_*(S)$
In simple terms, the lower and upper approximations can be represented by granules of knowledge (the equivalence classes of R). The lower approximation of a set is the union of all granules that are entirely included in the set, whereas the upper approximation is the union of all granules that have a non-empty intersection with the set. The difference between the upper and the lower approximation is the boundary region. These definitions are illustrated in Figure 1.
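Once the granules (equivalence classes of R) are known, the three definitions can be checked mechanically. The following Python function is a minimal sketch for illustration, not the authors' implementation; the function name `approximations` and the object labels x1 to x5 are hypothetical.

```python
from typing import Hashable, Iterable, Set, Tuple


def approximations(
    granules: Iterable[Set[Hashable]], target: Set[Hashable]
) -> Tuple[Set[Hashable], Set[Hashable], Set[Hashable]]:
    """Return the lower approximation, upper approximation, and boundary region
    of `target` (the set S) with respect to a partition of the universe into granules."""
    lower: Set[Hashable] = set()
    upper: Set[Hashable] = set()
    for granule in granules:
        if granule <= target:   # granule entirely inside S: certainly S
            lower |= granule
        if granule & target:    # granule overlaps S: possibly S
            upper |= granule
    boundary = upper - lower    # BN_R(S) = R^*(S) - R_*(S)
    return lower, upper, boundary


if __name__ == "__main__":
    # Hypothetical granules and target set.
    granules = [{"x1", "x2"}, {"x3"}, {"x4", "x5"}]
    lower, upper, boundary = approximations(granules, {"x1", "x2", "x4"})
    print(sorted(lower), sorted(upper), sorted(boundary))
```

For the granules {x1, x2}, {x3}, {x4, x5} and S = {x1, x2, x4}, the lower approximation is {x1, x2}, the upper approximation is {x1, x2, x4, x5}, and the boundary region is {x4, x5}.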

B. Rough Set Attribute Reduction
In an information system, some condition attributes may not provide any additional information about the objects in O, so it is desirable to remove them. Doing so reduces the complexity and cost of the decision process.
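One standard way to find such redundant attributes is to search for minimal attribute subsets that preserve the positive region, i.e., the set of objects whose decision is fully determined by the condition attributes; such a subset is a reduct. The paper does not state which reduction method it uses, so the brute-force Python sketch below is an assumption for illustration only. It is exponential in the number of attributes and suitable only for small tables, and the helper names and the toy table with attributes a, b, c and decision d are hypothetical.

```python
from collections import defaultdict
from itertools import combinations
from typing import Dict, Hashable, List, Sequence, Set, Tuple

Table = Dict[str, Dict[str, Hashable]]  # object id -> {attribute: value}


def blocks(table: Table, attrs: Sequence[str]) -> List[Set[str]]:
    """Indiscernibility classes of the objects with respect to `attrs`."""
    groups: Dict[tuple, Set[str]] = defaultdict(set)
    for obj, row in table.items():
        groups[tuple(row[a] for a in attrs)].add(obj)
    return list(groups.values())


def positive_region(table: Table, attrs: Sequence[str], decision: str) -> Set[str]:
    """Objects whose class under `attrs` contains a single decision value."""
    pos: Set[str] = set()
    for block in blocks(table, attrs):
        if len({table[o][decision] for o in block}) == 1:
            pos |= block
    return pos


def minimal_reducts(
    table: Table, conditions: Sequence[str], decision: str
) -> List[Tuple[str, ...]]:
    """Brute-force search for all minimal attribute subsets that keep the
    positive region of the full condition set (exponential in len(conditions))."""
    target = positive_region(table, conditions, decision)
    reducts: List[Tuple[str, ...]] = []
    for size in range(len(conditions) + 1):
        for subset in combinations(conditions, size):
            # Skip supersets of reducts already found: they are not minimal.
            if any(set(r) <= set(subset) for r in reducts):
                continue
            if positive_region(table, subset, decision) == target:
                reducts.append(subset)
    return reducts


if __name__ == "__main__":
    # Hypothetical toy table: decision d is "yes" exactly when b == 0.
    toy = {
        "x1": {"a": 1, "b": 0, "c": 1, "d": "yes"},
        "x2": {"a": 1, "b": 1, "c": 1, "d": "no"},
        "x3": {"a": 0, "b": 0, "c": 0, "d": "yes"},
        "x4": {"a": 0, "b": 1, "c": 0, "d": "no"},
    }
    print(minimal_reducts(toy, ["a", "b", "c"], "d"))
```

In the toy table, attribute b alone determines d, so the search returns the single reduct ("b",); a and c are redundant and can be removed.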

III. FRAMEWORK FOR CROP PREDICTION: THE ROUGH SET APPROACH
Figure 3 presents a framework for data analysis using the rough set approach; each phase of the framework is explained next. The data set used in our experiment consists of 640 samples collected from various sources, including government websites, as shown in Table 1. Each sample consists of condition attributes (features) and a decision attribute that represents its class with respect to wheat. Each instance belongs to one of two classes, yes or no: yes if wheat is cultivated under the given conditions and no if it is not. In Table 2, c1 to c8 are the condition attributes and c9 is the decision attribute (class label).
Table 3.
E. Phase 5. Generation of Decision Rules
The approximations are very useful for drawing conclusions from the data. The groupings of objects we found with respect to the condition attributes are {s1, s3, s4, s6} and {s2, s5}. In our example, we obtain the following facts with respect to the condition attributes:

Boundary Region
The object s2 can be classified neither as wheat nor as no wheat, so {s2} lies in the boundary region. The set {s4, s5} is the boundary region of the set {s1, s2, s3, s4, s5}.
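Groupings and boundary regions like these can be computed mechanically from a decision table. The same grouping also exposes conflicting records of the kind removed in Step 1: objects whose indiscernibility class contains more than one decision value are exactly the objects in the boundary region of some decision class. The Python sketch below assumes the table is a dict mapping object IDs to attribute-value dicts; the function names, attribute names (rainfall, temp, soil, wheat), and values are hypothetical and are not the paper's Table 2 or Table 3 data.

```python
from collections import defaultdict
from typing import Dict, Hashable, List, Sequence, Set

Table = Dict[str, Dict[str, Hashable]]  # object id -> {attribute: value}


def indiscernibility_classes(table: Table, attrs: Sequence[str]) -> List[Set[str]]:
    """Group objects that have identical values on every attribute in `attrs`."""
    groups: Dict[tuple, Set[str]] = defaultdict(set)
    for obj, row in table.items():
        groups[tuple(row[a] for a in attrs)].add(obj)
    return sorted(groups.values(), key=sorted)  # stable, readable order


def inconsistent_objects(table: Table, conditions: Sequence[str], decision: str) -> Set[str]:
    """Objects whose indiscernibility class mixes decision values, i.e. objects
    in the boundary region of at least one decision class."""
    flagged: Set[str] = set()
    for block in indiscernibility_classes(table, conditions):
        if len({table[o][decision] for o in block}) > 1:
            flagged |= block
    return flagged


if __name__ == "__main__":
    # Hypothetical discretized records, not the paper's data.
    records = {
        "x1": {"rainfall": "high", "temp": "low", "soil": "loam", "wheat": "yes"},
        "x2": {"rainfall": "low", "temp": "high", "soil": "sandy", "wheat": "no"},
        "x3": {"rainfall": "high", "temp": "low", "soil": "loam", "wheat": "yes"},
        "x4": {"rainfall": "low", "temp": "high", "soil": "sandy", "wheat": "yes"},
    }
    conditions = ["rainfall", "temp", "soil"]
    print(indiscernibility_classes(records, conditions))
    print(sorted(inconsistent_objects(records, conditions, "wheat")))
```

In this toy table, x2 and x4 share all condition values but differ on wheat, so both are flagged; together they also form the boundary region of the concept wheat = yes.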

Decision Rule
The data must be reduced before decision rules are generated. The following steps describe how a reduct is created from the information table.
Step 1: Verification of inconclusive data. The crop records s5 and s6 are excluded because they have the same condition-attribute values but different decision-attribute values.
Step 2: Verification of equivalent information. No records in Table 3 possess equivalent information. The reduct of the information table is as follows.
IV. EXPERIMENT AND RESULT
In this research, the LEM2 algorithm is applied to the reduced data set, which generates the 26 significant decision rules shown in Figure 4. These rules, obtained from the information reduct above, are used for crop prediction. The support of a rule indicates how often the antecedent and the consequent of the rule appear together in the data set. The confidence of a rule indicates how often the consequent holds among the records in which the antecedent holds. The support values obtained for the rules are shown below: