F-PNWAR: Fuzzy-based Positive and Negative Weighted Association Rule Mining Algorithm

Association Rule Mining (ARM) algorithms concentrate on mining Positive Association Rules (PARs). Recently, researchers have focused on mining Negative Association Rules (NARs) by finding interesting infrequent itemsets. Existing ARM algorithms discover only PARs and treat every item with the same significance, although the significance of the items may differ. This paper proposes a Fuzzy-based Positive and Negative Weighted Association Rule (F-PNWAR) mining algorithm for market-based data analysis. The itemsets are ranked, and a weight is assigned to each itemset based on its rank. The positive and negative weighted itemsets are extracted and the rules are generated. The proposed F-PNWAR algorithm is compared with the existing Weighted ARM (WARM), Fuzzy WARM (FWARM), Enhanced FWARM (E-FWARM), traditional K-means and Adaptive K-means algorithms. The comparative analysis shows that the proposed F-PNWAR algorithm achieves a higher frequent item rate, association rule rate and accuracy, and a lower execution time, than the existing algorithms.

The remaining sections of the paper are arranged as follows: Section II describes the existing ARM approaches. Section III explains the proposed F-PNWAR algorithm, including the mining of PARs and NARs. Section IV presents the experimental analysis of the proposed F-PNWAR algorithm. Section V concludes the paper.

II. RELATED WORKS
Mallik et al. [10] proposed a weighted rule mining approach for ranking the association rules using interestingness measures. The proposed algorithm generated fewer frequent itemsets than the existing mining algorithms and hence saved execution time. Pears et al. [11] automated the weight assignment process by formulating a linear model that captures the relationships between items. The Valency model is extended by increasing the field of interaction beyond the immediate neighborhoods. The experimental results show that the rules are mined efficiently at a much lower level of support than with the basic model. However, the computational cost is high when the entire set of weights is recomputed.
Azadnia et al. [12] developed a new approach by integrating the Genetic Algorithm (GA) with an ARM algorithm, such as the Traveling Salesman Problem algorithm, to find the best travel path. The GA is applied for sequencing the batches to reduce tardiness. Nithya and Duraiswamy [13] applied an average ranking feature selection approach and a Fuzzy Weighted ARM (FWARM) classifier to diagnose a medical dataset. The classification accuracy is improved and the number of rules is reduced by ranking the appropriate potential attributes. Hence, the computation time is minimized. Galárraga et al. [14] developed a model for mining Horn rules on large Resource Description Framework (RDF) knowledge bases (KBs) under the Open World Assumption (OWA). The precision and coverage of the proposed model are improved, and the rules can be mined more quickly than with the existing approaches.
Lee et al. [15] proposed a utility-based ARM method for evaluating association rules by measuring the business benefits to firms. Vo et al. [16] developed new algorithms for efficient mining of Frequent Weighted Itemsets (FWI) from transaction databases. The proposed algorithm achieved a significant reduction in mining time compared with the Apriori-based algorithms. Tew et al. [17] concentrated on behavior-based clustering and the study of interestingness measures. Domain knowledge is crucial for selecting a proper interestingness measure for a specific task and business objective. Wanaskar et al. [18] presented and investigated a novel approach based on the WARM algorithm and text mining. The algorithm is improved by adding semantic knowledge to the results, and better web recommendation performance is achieved. Babashzadeh et al. [19] proposed a new approach for modeling medical query contexts by mining semantic-based association rules. The clinical data retrieval performance is improved.
Savasere et al. [20] developed an algorithm for mining NARs for statistically dependent items by integrating the frequent itemsets and domain knowledge. However, this approach requires a predefined hierarchical classification structure, which makes it difficult to generalize. Morzy [21] introduced the dissociation rule concept. This algorithm keeps the number of generated patterns low. Antonie and Zaiane [22] proposed an algorithm that discovers NARs with a high negative correlation between the antecedents and consequents. However, the coefficients need to be updated continuously and there is no guarantee that all NARs are found. This paper presents a fuzzy-based algorithm for extracting both PARs and NARs.

III. PROPOSED F-PNWAR ALGORITHM
The input data is obtained from OneDrive, a file hosting service that allows users to store files. Initially, a pre-filtering method is applied to the input data to remove data redundancy. The filtered data is analyzed based on the data types. Zero-mean normalization is applied, which shifts all the data vertically so that their average value is zero. String-type data is converted into integer-type data. Then, the items in the dataset are ranked, and a weight is assigned to each item based on its rank. The F-PNWAR mining algorithm is applied to find the positive and negative weighted itemsets, and the rules are generated. The data is received from the cloud. Finally, the data analysis is performed based on the generated rules. Fig. 1 shows the overall flow diagram of the proposed F-PNWAR algorithm.

To find valuable association rules, Shapiro [23] presented an interestingness measure for association rules. If $sup(P \cup Q) \approx sup(P)\,sup(Q)$, the rule $P \Rightarrow Q$ is considered uninteresting. The association rule $P \Rightarrow Q$ is interesting only if $|sup(P \cup Q) - sup(P)\,sup(Q)|$ is not less than a specified minimum interest value, $min\_interest$. The same method is adopted to measure the interestingness of NARs [24]. For an interesting NAR $P \Rightarrow \neg Q$, the condition $sup(P \cup \neg Q) \geq min\_sup$ should be satisfied, because the interest lies in mining frequent itemsets in association rules. Similarly, the NAR conditions are defined for $\neg P \Rightarrow Q$ and $\neg P \Rightarrow \neg Q$. If $P \Rightarrow \neg Q$ is a NAR, $P \cup Q$ will be an interesting infrequent itemset. If $i$ is an interesting infrequent itemset, an expression $i = P \cup Q$ will exist such that one of the interesting NARs $P \Rightarrow \neg Q$, $\neg P \Rightarrow Q$ and $\neg P \Rightarrow \neg Q$ holds.
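The interestingness test described above can be illustrated with a minimal sketch. It assumes crisp (non-fuzzy) transactions and supports expressed as fractions of the database; the function names, the toy transactions and the threshold value are hypothetical and are not taken from the paper.

```python
from typing import FrozenSet, Iterable, List

Transaction = FrozenSet[str]


def support(itemset: Iterable[str], transactions: List[Transaction]) -> float:
    """Fraction of transactions that contain every item of `itemset`."""
    items = frozenset(itemset)
    return sum(1 for t in transactions if items <= t) / len(transactions)


def interest(p: Iterable[str], q: Iterable[str],
             transactions: List[Transaction]) -> float:
    """Interestingness |sup(P u Q) - sup(P) * sup(Q)| of the rule P => Q."""
    joint = support(frozenset(p) | frozenset(q), transactions)
    return abs(joint - support(p, transactions) * support(q, transactions))


# Toy data (hypothetical): four market baskets.
transactions = [frozenset(t) for t in (
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
)]

MIN_INTEREST = 0.05  # assumed threshold, for illustration only

value = interest({"bread"}, {"milk"}, transactions)
is_interesting = value >= MIN_INTEREST
```

Here sup(bread) = sup(milk) = 0.75 and sup(bread, milk) = 0.5, so the two items co-occur less often than independence would predict, which is the situation in which a negative rule becomes a candidate.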

A. Fuzzy Association Rules
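The support of an itemset can be computed by finding the fuzzy logic AND of the membership values of its items for each transaction and adding these values. Let the transaction database be $D$ and the itemset be $X = \{x_1, x_2, \ldots, x_n\} \subseteq I$, where $\mu_x(t)$ is the membership value of item $x$ in transaction $t$. The support of a transaction $t$ for the itemset $X$ is defined as $sup_t(X) = \mu_{x_1}(t) \wedge \mu_{x_2}(t) \wedge \cdots \wedge \mu_{x_n}(t)$. With the fuzzy logic AND obtained in this way, the support of the itemset over the transaction database is defined as $sup(X) = \sum_{t \in D} sup_t(X)$.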
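A minimal sketch of this computation follows. It assumes that the fuzzy logic AND is realized as the minimum of the membership values and that an item absent from a transaction has membership 0; both are common choices but are assumptions here, as are the function names and the toy membership values.

```python
from typing import Dict, List, Sequence

# A fuzzified transaction maps each item (or linguistic term) to a membership value.
FuzzyTransaction = Dict[str, float]


def transaction_support(itemset: Sequence[str], transaction: FuzzyTransaction) -> float:
    """Fuzzy AND (assumed: minimum) of the membership values of the items."""
    return min(transaction.get(item, 0.0) for item in itemset)


def fuzzy_support(itemset: Sequence[str], database: List[FuzzyTransaction]) -> float:
    """Sum of the per-transaction supports over the whole database."""
    return sum(transaction_support(itemset, t) for t in database)


# Toy data (hypothetical membership values).
database = [
    {"milk.high": 0.8, "bread.low": 0.5},
    {"milk.high": 0.25, "bread.low": 1.0},
    {"milk.high": 0.5},  # bread.low is absent, so its membership is taken as 0
]

total = fuzzy_support(["milk.high", "bread.low"], database)
```

The three transactions contribute 0.5, 0.25 and 0.0 respectively. Dividing the sum by the number of transactions would give a normalized support; the text above describes the unnormalized sum.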

B. Positive and Negative Weighted Fuzzy Association Rules
Let $\mu_x$ be the membership function of $x$ for all $x \in I$. For each transaction $t \in D$, $\mu_x(t)$ represents the degree to which $t$ contains the item $x$.

1) Positive weighted fuzzy association rules
The support of an itemset $P$, $sup(P)$, is taken as the number of transactions in the database that contain the itemset. The weighted minimum support is denoted as $min\_sup$. Let $P$ and $Q$ be two itemsets. $P \Rightarrow Q$ is a positive weighted fuzzy association rule if it satisfies the conditions on the minimum weighted fuzzy support, confidence and interest thresholds.
2) Negative weighted fuzzy association rules
Let $P$ and $Q$ be two itemsets. If $P \Rightarrow \neg Q$ is a NAR, both $P$ and $Q$ are frequent. This means that the support values of these itemsets should not be less than the support threshold, while $P \cup Q$ should be infrequent. The three types of negative fuzzy association rules are $P \Rightarrow \neg Q$, $\neg P \Rightarrow Q$ and $\neg P \Rightarrow \neg Q$. Each of them is a NAR if its corresponding conditions are satisfied.
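Since the individual conditions are not spelled out in the text, the following is only a hedged sketch of how the positive and negative rule forms could be screened. It assumes supports normalized to [0, 1], complement supports derived by the usual identities (for example sup(P and not Q) = sup(P) - sup(P u Q)), and acceptance when both an interest and a confidence threshold are met. The function name, the thresholds and these conditions are assumptions modeled on common PAR/NAR formulations, not the paper's exact definitions.

```python
from typing import List


def candidate_rules(sup_p: float, sup_q: float, sup_pq: float,
                    min_sup: float, min_conf: float,
                    min_interest: float) -> List[str]:
    """Return the rule forms between P and Q that pass the assumed conditions."""
    # Both itemsets must be frequent, for positive and negative rules alike.
    if sup_p < min_sup or sup_q < min_sup:
        return []

    # Each entry: (label, joint support, antecedent support, consequent support).
    if sup_pq >= min_sup:
        # P u Q is frequent: only the positive rule is considered.
        forms = [("P => Q", sup_pq, sup_p, sup_q)]
    else:
        # P u Q is infrequent: consider the three negative forms.
        forms = [
            ("P => not Q", sup_p - sup_pq, sup_p, 1.0 - sup_q),
            ("not P => Q", sup_q - sup_pq, 1.0 - sup_p, sup_q),
            ("not P => not Q", 1.0 - sup_p - sup_q + sup_pq,
             1.0 - sup_p, 1.0 - sup_q),
        ]

    accepted = []
    for label, joint, antecedent, consequent in forms:
        if antecedent <= 0.0:
            continue  # confidence is undefined for an empty antecedent
        rule_interest = abs(joint - antecedent * consequent)
        confidence = joint / antecedent
        if rule_interest >= min_interest and confidence >= min_conf:
            accepted.append(label)
    return accepted


negative = candidate_rules(0.5, 0.5, 0.1, min_sup=0.3, min_conf=0.6, min_interest=0.05)
positive = candidate_rules(0.6, 0.5, 0.45, min_sup=0.3, min_conf=0.6, min_interest=0.05)
```

In the first call P and Q are each frequent but rarely occur together, so two negative forms are accepted while the third fails the confidence threshold. In the second call P u Q is frequent and only the positive rule is tested.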

C. Algorithm for mining positive and negative weighted fuzzy association rules
The fuzzy ARM algorithm [4] transforms each quantitative value into a fuzzy set of linguistic terms by using the membership functions. The scalar count of each linguistic term is estimated, and the support value of the itemsets is computed. An iterative search method is applied to find the large itemsets. Each item uses the linguistic term with the maximum count, so the number of fuzzy regions becomes identical to the number of original items. Because this algorithm focuses on the important linguistic terms, the time complexity is reduced. Table I shows the symbols and descriptions used in the mining algorithm [25].
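As an illustration of this fuzzification stage, the sketch below converts purchase quantities of a single item into three linguistic terms and keeps the term with the largest scalar count. The triangular and shoulder-shaped membership functions with breakpoints at 1, 6 and 11, the term names and the sample quantities are all assumptions chosen for the example; the paper's actual membership functions are not given here.

```python
from typing import Dict, List


def fuzzify(quantity: float) -> Dict[str, float]:
    """Map a quantity to membership values of three assumed linguistic terms."""
    low = max(0.0, min(1.0, (6.0 - quantity) / 5.0))       # 1 at <= 1, 0 at >= 6
    middle = max(0.0, 1.0 - abs(quantity - 6.0) / 5.0)     # peak at 6
    high = max(0.0, min(1.0, (quantity - 6.0) / 5.0))      # 0 at <= 6, 1 at >= 11
    return {"low": low, "middle": middle, "high": high}


def scalar_counts(quantities: List[float]) -> Dict[str, float]:
    """Scalar count of each linguistic term: sum of memberships over transactions."""
    counts = {"low": 0.0, "middle": 0.0, "high": 0.0}
    for quantity in quantities:
        for term, membership in fuzzify(quantity).items():
            counts[term] += membership
    return counts


# Quantities of one item across four transactions (hypothetical).
quantities = [4, 8, 6, 1]
counts = scalar_counts(quantities)

# The item is represented by the linguistic term with the maximum count.
best_term = max(counts, key=counts.get)
```

With these quantities the counts are approximately 1.4 for low, 2.2 for middle and 0.4 for high, so the item is represented by its middle region in the subsequent mining steps.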

F-PNWAR Algorithm
Input: $n$ transactions, each consisting of a customer identity (ID) and the purchased items with their quantities; a set of membership functions; the minimum weighted fuzzy support threshold $min\_sup$; the minimum weighted fuzzy confidence threshold $min\_conf$; and the minimum interest threshold $min\_interest$. $D$ is the transaction database and $t_i$ is the $i$-th transaction in $D$.
Output: A set of PARs and NARs.
Step 1: Convert the quantitative value $v_{ij}$ of each item $I_j$ in each transaction $t_i$ into a fuzzy set $f_{ij}$ using the fuzzy membership functions. The fuzzy set $f_{ij}$ is denoted as $\left(f_{ij1}/R_{j1} + f_{ij2}/R_{j2} + \cdots + f_{ijl}/R_{jl}\right)$, where $R_{jk}$ is the $k$-th fuzzy region of the item $I_j$ and $f_{ijk}$ is the fuzzy membership value of the quantitative value $v_{ij}$ in the fuzzy region $R_{jk}$.
Step 2: Compute the scalar cardinality of each fuzzy region $R_{jk}$, defined as $count_{jk} = \sum_{i=1}^{n} f_{ijk}$.
Step 3: Find $max\_count_j = \max_{1 \leq k \leq l} count_{jk}$ for $1 \leq j \leq m$, where $m$ is the number of items. Let $max\_R_j$ be the region with the maximum count for the item $I_j$. The region with the maximum count value represents the fuzzy characteristic of the item.
Step 4: Calculate the fuzzy support of $max\_R_j$. Verify whether the fuzzy support of the region is greater than or equal to the predefined minimum weighted support threshold $min\_sup$, for $1 \leq j \leq m$. If the maximum count value is greater than or equal to $min\_sup$, place $max\_R_j$ in the large 1-itemsets $L_1$, that is, $L_1 = \{max\_R_j \mid max\_count_j \geq min\_sup,\ 1 \leq j \leq m\}$.
Step 5: If the large itemset $L_1$ is null, the algorithm is stopped. Otherwise, move to the next step.
Step 6: Set $k = 1$, where $k$ represents the number of items in the current large itemsets.
Step 7: Create the candidate set $C_{k+1}$ from the large itemset $L_k$.
Step 8: For each newly generated $(k+1)$-itemset $s$ with the items $s_1, s_2, \ldots, s_{k+1}$ in the candidate set $C_{k+1}$:
a. Calculate the fuzzy value of $s$ in each transaction $t_i$ as $f_{is} = f_{is_1} \wedge f_{is_2} \wedge \cdots \wedge f_{is_{k+1}}$.
b. Calculate the scalar cardinality of $s$ as $count_s = \sum_{i=1}^{n} f_{is}$.
c. If $count_s$ is not less than the minimum support threshold, then
I. Place $s$ in $L_{k+1}$.
II. If the weighted fuzzy support of $s$ is not less than $min\_sup$, place $s$ in the set of positive weighted itemsets;
III. Else place $s$ in the set of negative weighted itemsets.
Step 9: If $L_{k+1}$ is null, perform the next step. Otherwise, set $k = k + 1$ and repeat Steps 7 to 9.
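The level-wise part of the procedure (Steps 4 to 9) can be sketched as follows. The sketch assumes that each item has already been reduced to its maximum-count fuzzy region, that the fuzzy AND is the minimum, and that the weighted fuzzy support of an itemset is its mean item weight times its scalar count divided by the number of transactions. That weighting formula, the names `min_count` and `min_wsup`, and the toy data are hypothetical choices for illustration, not the paper's definitions.

```python
from itertools import combinations
from typing import Dict, FrozenSet, Iterable, List, Set, Tuple

Itemset = FrozenSet[str]
FuzzyTransaction = Dict[str, float]


def scalar_count(itemset: Iterable[str], database: List[FuzzyTransaction]) -> float:
    """Sum over transactions of the minimum membership among the items."""
    items = list(itemset)
    return sum(min(t.get(i, 0.0) for i in items) for t in database)


def weighted_support(itemset: Itemset, database: List[FuzzyTransaction],
                     weights: Dict[str, float]) -> float:
    """Assumed definition: mean item weight * scalar count / number of transactions."""
    mean_weight = sum(weights[i] for i in itemset) / len(itemset)
    return mean_weight * scalar_count(itemset, database) / len(database)


def mine_itemsets(database: List[FuzzyTransaction], weights: Dict[str, float],
                  min_count: float, min_wsup: float) -> Tuple[Set[Itemset], Set[Itemset]]:
    """Return (positive, negative) weighted itemsets of size two or more."""
    items = sorted({i for t in database for i in t})
    # Step 4: large 1-itemsets.
    large = [frozenset([i]) for i in items
             if scalar_count([i], database) >= min_count]
    positive: Set[Itemset] = set()
    negative: Set[Itemset] = set()
    while large:  # Steps 5 and 9: stop when no large itemsets remain
        size = len(large[0]) + 1
        # Step 7: join pairs of large k-itemsets that differ in exactly one item.
        candidates = {a | b for a, b in combinations(large, 2) if len(a | b) == size}
        next_large = []
        for s in sorted(candidates, key=sorted):  # Step 8
            if scalar_count(s, database) >= min_count:
                next_large.append(s)
                if weighted_support(s, database, weights) >= min_wsup:
                    positive.add(s)
                else:
                    negative.add(s)
        large = next_large
    return positive, negative


# Toy data (hypothetical): memberships in each item's maximum-count region.
database = [
    {"milk": 0.8, "bread": 0.6, "butter": 0.2},
    {"milk": 0.6, "bread": 0.8},
    {"milk": 0.4, "bread": 0.6, "butter": 0.8},
    {"bread": 0.4, "butter": 0.6},
]
weights = {"milk": 1.0, "bread": 0.8, "butter": 0.4}  # assumed rank-based weights

positive, negative = mine_itemsets(database, weights, min_count=1.0, min_wsup=0.25)
```

With this data the pair (milk, bread) has a weighted support of about 0.36 and is kept as a positive weighted itemset, the pair (bread, butter) reaches only about 0.18 and is kept as a negative weighted itemset, and (milk, butter) is dropped because its scalar count falls below `min_count`.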

IV. PERFORMANCE ANALYSIS
The performance of the proposed work is evaluated by applying it to the groceries dataset [26] on a system with an Intel(R) Core i3-3220 x64-based processor and 8 GB of memory. The proposed F-PNWAR algorithm is compared with the WARM, FWARM [27] and E-FWARM algorithms and with the traditional K-means and Adaptive K-means algorithms [28].

Fig. 2 shows the frequent item rate analysis of the proposed F-PNWAR and the existing WARM, FWARM and E-FWARM algorithms. The proposed F-PNWAR algorithm yields more frequent items than the WARM, FWARM and E-FWARM algorithms. The number of frequent items decreases linearly as the support value increases.

Fig. 3 illustrates the association rule rate analysis of the proposed F-PNWAR and the existing WARM, FWARM and E-FWARM algorithms. The proposed F-PNWAR algorithm extracts a larger number of association rules than the existing WARM, FWARM and E-FWARM algorithms. The number of association rules decreases gradually as the weighted confidence value increases.

Fig. 4 depicts the accuracy analysis of the proposed F-PNWAR and the E-FWARM, Adaptive K-means and traditional K-means algorithms. The proposed F-PNWAR algorithm yields the highest accuracy of about 97%, while the E-FWARM algorithm yields an accuracy of about 93%, and the traditional K-means and Adaptive K-means algorithms yield accuracies of about 70% and 75% respectively.

Fig. 5 presents the execution time analysis of the proposed F-PNWAR and the E-FWARM, Adaptive K-means and traditional K-means algorithms. The proposed F-PNWAR algorithm requires less execution time than the E-FWARM, Adaptive K-means and traditional K-means algorithms.

V. CONCLUSION
Traditional rule mining methods are accurate, but their operations are rigid and brittle. Fuzzy-based mining algorithms provide a robust and efficient approach to exploring a large search space. This paper presented a fuzzy-based algorithm for mining both PARs and NARs. The proposed algorithm efficiently generates NARs along with the PARs. Because the algorithm focuses on the significant linguistic terms, the time complexity is reduced. The performance analysis shows that the proposed F-PNWAR algorithm yields a higher frequent item rate, association rule rate and accuracy than the existing WARM, FWARM and E-FWARM algorithms. The execution time of the proposed F-PNWAR algorithm is lower than that of the E-FWARM, Adaptive K-means and traditional K-means algorithms.