
4.1.4 CFARC: Compact Fuzzy Association Rule based Classifier

In this section, a new fuzzy association rule based classification method is introduced.

The proposed method gives high classification power due to its compact set of FCARs; this is the reason why it is called Compact Fuzzy Association Rule based Classifier. The method consists of the five main steps already presented in Fig. 4.2. A possible implementation (algorithm) for each of the steps is detailed in the following.

Figure 4.5: The partitions generated by the applied partitioning methods: (i) Ruspini-type, (ii) Gustafson-Kessel, (iii) Gath-Geva, (iv) C4.5, where $A_{i,j}$ represents the $j$th partition (membership function) of the $i$th attribute, and $a$, $b$, $c$ and $d$ are the range parameters of the individual partitions (triangular, trapezoidal)

Input data partitioning

One of the most important steps of an associative classifier is the partitioning of the continuous attributes. Since different partitioning methods result in significantly different classification power, this subsection describes four different methods for this problem.

First of all, a globally unsupervised fuzzy method, the Ruspini-type partitioning, is presented [90]. Then the Gustafson-Kessel (GK) clustering algorithm is detailed, which is a locally unsupervised method [43]. The third clustering technique is the supervised Gath-Geva (GG) algorithm [39]. To compare the fuzzy methods with an effective crisp technique, the partitioning of the C4.5 decision tree induction algorithm is also applied [85].

Ruspini-type partitioning: The simplest partitioning methods (e.g. equal width interval binning) generate partitions of equal width for all of the attributes. Equal width interval partitioning with fuzzy triangular membership functions is called Ruspini-type partitioning. The vertices of a triangular membership function are marked by the values $a$, $b$ and $c$ on the abscissa (see (i) in Fig. 4.5). The values $a$ and $c$ of each membership function are equal to the value $b$ of the adjacent triangles. Therefore the sum of the membership values of a data point $x_k$ for the $z_i$ attribute is always equal to one:

$\sum_{j=1}^{q_i} A_{i,j}(x_{k,i}) = 1,$  (4.19)

where $a_{i,j} = b_{i,j-1}$ and $c_{i,j} = b_{i,j+1}$, and $q_i$ denotes the number of membership functions on the $z_i$ attribute.
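To make the construction concrete, the following is a minimal Python sketch of Ruspini-type triangular partitioning and of the membership evaluation; the function names and the small check at the end are illustrative, not code from the thesis.

def ruspini_partitions(x_min, x_max, q):
    """Vertices (a, b, c) of q equidistant triangular membership functions
    whose membership values sum to one at every point (Eq. 4.19)."""
    step = (x_max - x_min) / (q - 1)
    centers = [x_min + j * step for j in range(q)]        # the b vertices
    # a of each triangle equals the b of its left neighbour, c equals the
    # b of its right neighbour (the outermost triangles extend past the range)
    return [(b - step, b, b + step) for b in centers]

def triangular_membership(x, a, b, c):
    """Membership of a crisp value x in the triangle (a, b, c)."""
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x < b else (c - x) / (c - b)

# Example: q_i = 3 partitions on an attribute ranging over [0, 10]
parts = ruspini_partitions(0.0, 10.0, 3)
memberships = [triangular_membership(4.0, *p) for p in parts]
assert abs(sum(memberships) - 1.0) < 1e-9                 # Eq. 4.19 holds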

Gustafson-Kessel partitioning: Instead of quantizing the input data into hard subsets, the fuzzy Gustafson-Kessel clustering algorithm can be used as a locally unsupervised partitioning method. The number of partitions for all the attributes (the number of attributes is denoted by $n$) is set as an input vector for the algorithm: $q = [q_1, q_2, \ldots, q_n]^T$. Each partition of an attribute will be determined by a cluster. The cluster centers $w_i = [w_{i,1}, w_{i,2}, \ldots, w_{i,q_i}]$ and the partition matrix $M_i \in [0,1]^{N \times q_i}$ are calculated for all attributes of the data ($i = 1, 2, \ldots, n$, and $N$ denotes the number of samples). The $k$th row vector of the $M_i$ partition matrix is $m_k = [m_{k,1}, m_{k,2}, \ldots, m_{k,q_i}]$. Its elements are the fuzzy membership values of the crisp data point $x_{k,i}$ for all of the $q_i$ clusters determined on the $z_i$ attribute. The generated fuzzy partitions (clusters) can be directly used to determine the memberships of new data points:

$A_{i,j}(x_{k,i}) = m_{k,j}$  (4.20)

For the representation of the $A_{i,j}(x_{k,i})$ fuzzy sets, parameterized membership functions (e.g. trapezoids) are proposed (see for example sub-figure (ii) in Fig. 4.5), where the four parameters $a$, $b$, $c$ and $d$ determine the shoulders ($b$, $c$) and the legs ($a$, $d$) of the trapezoid. In the current implementation, the position of the shoulders is determined based on a threshold value of the memberships, where $m_{k,j} > 0.9$. The legs of the membership functions were defined in such a way as to obtain the property of the Ruspini-type partition (Eq. 4.19).
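As an illustration of the shoulder and leg construction described above, here is a small Python sketch of the trapezoid-fitting step. It assumes that the GK partition matrix has already been computed by some clustering routine; the 0.05 cut-off used for the legs is purely an assumption of this sketch (the thesis instead chooses the legs so that the Ruspini property of Eq. 4.19 is preserved), and all names are illustrative.

import numpy as np

def trapezoid_from_memberships(x_col, cluster_memberships, shoulder_threshold=0.9):
    """Fit (a, b, c, d) of one trapezoidal fuzzy set from one column of the
    GK partition matrix M_i.
    x_col               -- crisp values x_{k,i} of one attribute, shape (N,)
    cluster_memberships -- memberships of those values in one cluster, shape (N,)
    The shoulders b, c span the points whose membership exceeds the threshold;
    the legs a, d are placed here at a low-membership cut-off (illustrative)."""
    x_col = np.asarray(x_col, dtype=float)
    m = np.asarray(cluster_memberships, dtype=float)
    core = x_col[m > shoulder_threshold]
    support = x_col[m > 0.05]
    return support.min(), core.min(), core.max(), support.max()

def trapezoid_membership(x, a, b, c, d):
    """Membership of x in the trapezoid (a, b, c, d)."""
    if b <= x <= c:
        return 1.0
    if x <= a or x >= d:
        return 0.0
    return (x - a) / (b - a) if x < b else (d - x) / (d - c)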

Gath-Geva partitioning: If the Gath-Geva clustering method is used for partitioning, Gaussian membership functions are proposed to represent the fuzzy sets (clusters):

$A_{i,j}(x_{k,i}) = \exp\left(-\frac{1}{2}\,\frac{(x_{k,i}-\nu_{i,j})^2}{\sigma_{i,j}^2}\right)$  (4.21)

where $\nu_{i,j}$ and $\sigma_{i,j}$ denote the center and the width of the Gaussian curve. The Gaussian functions can be transformed into trapezoidal (or triangular) membership functions in an easy way. The $a$, $b$, $c$ and $d$ parameters of the trapezoid fuzzy sets are determined in the following way:

$a_{i,j} = \nu_{i,j} - 3\cdot\sigma_{i,j}, \quad b_{i,j} = c_{i,j} = \nu_{i,j}, \quad d_{i,j} = \nu_{i,j} + 3\cdot\sigma_{i,j}$  (4.22)

The use of these parameters results in triangular fuzzy sets on the attributes; however, in the current implementation of the proposed method, the first and the last fuzzy sets of every attribute are transformed into real trapezoids (where the parameters $b$ and $c$ are not equal). An example is presented in sub-figure (iii) of Fig. 4.5.
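The conversion of Eq. 4.22 can be written down directly. The sketch below also widens the first and the last fuzzy sets into real trapezoids by clipping them to the attribute range; this widening rule is only an assumption about how the boundary sets are handled, since the thesis does not spell it out in this excerpt.

def gg_fuzzy_set_params(centers, sigmas, x_min, x_max):
    """(a, b, c, d) parameters of the fuzzy sets of one attribute, built from
    the Gath-Geva cluster centres nu_{i,j} and widths sigma_{i,j} (Eq. 4.22)."""
    pairs = sorted(zip(centers, sigmas))                   # order clusters by centre
    params = [[nu - 3 * s, nu, nu, nu + 3 * s]             # triangles: b = c = nu
              for nu, s in pairs]
    # widen the outermost sets into real trapezoids (b != c) covering the range
    params[0][0] = params[0][1] = x_min
    params[-1][2] = params[-1][3] = x_max
    return params

# Example: two Gaussian clusters found on an attribute over [0, 10]
print(gg_fuzzy_set_params([3.0, 7.0], [0.5, 1.0], 0.0, 10.0))
# -> [[0.0, 0.0, 3.0, 4.5], [4.0, 7.0, 10.0, 10.0]]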

C4.5 based partitioning: ID3 (Iterative Dichotomiser 3) is one of the most popular algorithms for decision tree induction [84]. The improved version of this algorithm is called C4.5, which has a greedy crisp cutting method (see an example in sub-figure (iv) of Fig. 4.5). During the induction step, the selection of the attributes is based on the principle of information gain.

Note that this type of partitioning method does not always select all of the attributes. In such situations the unnecessary attributes can be disregarded or partitioned by another method. In this study only the first approach (disregarding them) is proposed.
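C4.5 itself is not available in the common Python libraries, so the sketch below uses scikit-learn's decision tree with the entropy criterion as a rough stand-in, only to illustrate how the crisp cut points and the set of selected attributes could be extracted; it is not the implementation used in the thesis.

from sklearn.tree import DecisionTreeClassifier

def crisp_partitions_from_tree(X, y, max_depth=3):
    """Collect per-attribute cut points from a decision tree induced with an
    information-gain style criterion. Attributes the tree never selects get
    no cut points and can simply be disregarded, as proposed above."""
    tree = DecisionTreeClassifier(criterion="entropy", max_depth=max_depth).fit(X, y)
    cuts = {i: set() for i in range(X.shape[1])}
    for feat, thr in zip(tree.tree_.feature, tree.tree_.threshold):
        if feat >= 0:                       # internal node (leaves are marked -2)
            cuts[feat].add(float(thr))
    return {i: sorted(c) for i, c in cuts.items() if c}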

Frequent fuzzy item set searching

Frequent item set searching can be the bottleneck of association rule mining algorithms due to the long searching time. Many methods use a fast and efficient (e.g. FP-tree based) algorithm for the item set searching task. Here, however, a new fuzzy method based on the Apriori algorithm is introduced, which can be used efficiently to mine the important classification rules.

The algorithm (presented in Fig. 4.6) first determines the support values of the classes from the distribution of the classes over the data set. Let the column vector $f \in [0,1]^{s \times 1}$ be the frequency vector of the classes in the training data (where $s$ denotes the number of possible classes). The sum of the class frequencies is equal to one ($\sum_{i=1}^{s} f_i = 1$). For the frequent item (set) searching, the minimal fuzzy support threshold $\gamma$ is set to a (user-)given percentage (denoted by $\tau$) of the

Frequent fuzzy item set searching (an Apriori fuzzy implementation)
Input: DF fuzzy data
Output: the set of frequent fuzzy item sets
Method:

1. Determine the supports of the classes by the distribution of the classes;

2. Set the minimal fuzzy support (γ) to the half of the minimum frequency of the classes;

3. Generate the 1-item candidate fuzzy items;

4. Calculate the FS values, then select the frequent fuzzy items from the 1-item candidates which have FS > γ, and set n = 2;

5. While there are some (n−1)-size frequent item sets:

Generate the n-size candidate sets from the (n−1)-size frequents (and the 1-size frequents);

Calculate the FS values of the n-size candidates, then select the frequents; n = n + 1;

Figure 4.6: The frequent fuzzy item set searching algorithm

minimum frequency of the classes:

$\gamma = \min(f) \cdot \tau$  (4.23)

(50% of the minimum class frequency is proposed as the value of $\tau$ in our implementation.) After $\gamma$ is determined, the fuzzy support values of all the possible fuzzy items (item sets with only one item) are calculated by Eq. 4.5. This step is frequently called 1-item candidate generation. The candidates whose calculated FS value is higher than or equal to $\gamma$ are marked as frequent item sets. Then from the frequent 1-item sets the frequent $2, 3, \ldots, (n+1)$-item sets are determined in the same way based on the Apriori principle. During the frequent item set searching, of course, only the item sets including the class labels are generated. After all of the frequent fuzzy item sets have been found, the next task is the FCAR generation. This step is detailed in the following subsection.
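The search of Fig. 4.6 can be sketched as follows. Eq. 4.5 is not reproduced in this excerpt, so the fuzzy support is assumed here to be the product t-norm of the items' membership columns averaged over the samples; the data layout (a dictionary mapping every fuzzy item, including the class labels, to its membership column) and all names are illustrative.

import numpy as np

def fuzzy_support(itemset, memberships, n_samples):
    """Fuzzy support of an item set (assumed form of Eq. 4.5: product t-norm
    of the items' membership columns, averaged over all samples)."""
    prod = np.ones(n_samples)
    for item in itemset:
        prod *= memberships[item]           # memberships: dict item -> (N,) array
    return float(prod.mean())

def frequent_fuzzy_itemsets(memberships, class_items, tau=0.5):
    """Apriori-style search of Fig. 4.6; only item sets containing one of the
    class_items (the class labels) are returned."""
    n = len(next(iter(memberships.values())))
    gamma = min(float(memberships[c].mean()) for c in class_items) * tau   # Eq. 4.23
    frequent = {1: [frozenset([i]) for i in memberships
                    if fuzzy_support([i], memberships, n) >= gamma]}
    k = 2
    while frequent[k - 1]:
        candidates = {a | b for a in frequent[k - 1] for b in frequent[1]
                      if len(a | b) == k}
        frequent[k] = [c for c in candidates
                       if fuzzy_support(c, memberships, n) >= gamma]
        k += 1
    return [s for level in frequent.values() for s in level
            if any(c in s for c in class_items)]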

Fuzzy classification association rule generation

Classification association rules can be generated from discovered frequent item sets.

Many methods have been published in the literature to select a usable set of CARs that yields high classification accuracy. Most of them use some confidence criterion in the rule generation, where the value of the minimal confidence is defined by the user.

This is a big problem of the methods based on the support-confidence framework.

Fuzzy classification association rule (FCAR) generation
Input: a set of frequent fuzzy item sets
Output: positively correlated FCARs separated by size
Method:

1. Generate association rules with a class label consequent from all the frequent item sets, considering the size of the item sets;

2. Calculate the correlation values (FCORR) of all the rules;

3. Select the rules with positive FCORR value for all sizes;

Figure 4.7: The proposed fuzzy classification association rule generation algorithm based on a correlation measure

In the developed method (presented in Fig. 4.7), only a fuzzy correlation measure (FCORR) is applied for the rule generation. The definition and the calculation of this measure are given by Eq. 4.8. The correlation can be used to select the most important rules for the classification. The FCORR of a rule can be a positive or a negative value in the interval [−1, 1]. A rule with a correlation value near one is more important for classification. The algorithm calculates the FCORR values of all the FCARs, and selects only the rules with positive correlation values.
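The rule generation of Fig. 4.7 is sketched below, reusing the fuzzy_support helper from the previous sketch. FCORR is defined by Eq. 4.8, which is not reproduced in this excerpt, so a phi-style correlation built from fuzzy supports is substituted here purely as an assumption; only the overall flow (one rule with a class-label consequent per frequent item set, keep the positively correlated ones, group them by size) reflects the algorithm of the figure.

import numpy as np

def fcorr(antecedent, consequent, memberships, n):
    """Correlation between a rule's antecedent and its class consequent.
    Placeholder for Eq. 4.8: a phi coefficient computed from fuzzy supports."""
    s_a = fuzzy_support(antecedent, memberships, n)
    s_c = fuzzy_support([consequent], memberships, n)
    s_ac = fuzzy_support(list(antecedent) + [consequent], memberships, n)
    denom = np.sqrt(s_a * (1 - s_a) * s_c * (1 - s_c))
    return (s_ac - s_a * s_c) / denom if denom > 0 else 0.0

def generate_fcars(frequent_itemsets, class_items, memberships, n):
    """Fig. 4.7: one rule (antecedent -> class label) per frequent item set;
    keep only the positively correlated rules, grouped by antecedent size."""
    rules_by_size = {}
    for itemset in frequent_itemsets:
        consequent = next(c for c in itemset if c in class_items)
        antecedent = frozenset(itemset) - {consequent}
        if not antecedent:
            continue
        corr = fcorr(antecedent, consequent, memberships, n)
        if corr > 0:
            rules_by_size.setdefault(len(antecedent), []).append(
                (antecedent, consequent, corr))
    return rules_by_size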

Rule base pruning

After the positively correlated FCARs are selected, the importance, the generality and the complexity of the rules are analyzed. The most usable rules for classification are determined by a three-step rule base pruning method (presented in Fig. 4.8).

First, the rules with the largest correlation values for each class label are searched within each rule size. In the second step, the FCORR correlation values of complex and general rules are compared. The comparison is started from the most complex (longest) rules towards the more general rules. If a rule from the set of more general rules which are included in a complex rule has a higher correlation than the complex rule, the complex rule is marked and removed from the rule base. The last step of the pruning method is similar to the second one, but here the comparison is started from the most general (shortest) rules towards the more complex rules. This step is applied to the rule base obtained after the first and second pruning steps, therefore a very compact fuzzy classification rule base is produced. The main advantage of the method is that it does not require a data set covering analysis of the rules, yet it provides an efficient

Rule base pruning algorithm

Input: positively correlated FCARs separated by rule size
Output: a compact set of FCARs

Method:

1. Select the rules with the highest FCORR for each class in each size of the rules; n = the largest size of the rules

2. Do until n = 2:

for all the n-length rules select the more general (n−1)-length rules which are included in the complex rule, and compare the FCORR values of the general and the complex rules:

if a general rule has a higher FCORR, remove the complex rule; n = n − 1

m = 1 (the smallest rules are 1-length)

3. Do until m = the largest size of rules in the rule base pruned by step 2:

for all the m-length rules select the more complex (m+1)-length rules which include the general rule, and compare the FCORR values of the complex and the general rules:

if a complex rule has a higher FCORR, remove the general rule; m = m + 1

Figure 4.8: The proposed fuzzy rule base pruning algorithm

rule selection.
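A compact sketch of the three pruning steps of Fig. 4.8, operating on the rules_by_size dictionary produced by the previous sketch (each rule is an (antecedent, class, FCORR) triple); the data structures are illustrative.

def prune_rule_base(rules_by_size):
    """Three-step pruning of Fig. 4.8; returns the kept rules as a flat list."""
    # Step 1: per rule size and per class, keep only the highest-FCORR rule.
    kept = {}
    for size, rules in rules_by_size.items():
        best = {}
        for ant, cls, corr in rules:
            if cls not in best or corr > best[cls][2]:
                best[cls] = (ant, cls, corr)
        kept[size] = list(best.values())

    # Step 2: from the longest rules down to size 2, drop a complex rule if a
    # more general rule contained in it (same class) has a higher FCORR.
    for n in sorted(kept, reverse=True):
        if n < 2:
            break
        generals = kept.get(n - 1, [])
        kept[n] = [r for r in kept[n]
                   if not any(g[0] <= r[0] and g[1] == r[1] and g[2] > r[2]
                              for g in generals)]

    # Step 3: from the shortest rules upward, drop a general rule if a more
    # complex rule that includes it (same class) has a higher FCORR.
    for m in sorted(kept):
        complexes = kept.get(m + 1, [])
        kept[m] = [r for r in kept[m]
                   if not any(r[0] <= c[0] and r[1] == c[1] and c[2] > r[2]
                              for c in complexes)]
    return [r for size in sorted(kept) for r in kept[size]]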

4.1.5 Experimental results

To evaluate the accuracy and compactness of the classifiers generated by the proposed CFARC method, they were tested on five different benchmark data sets. The data come from the UCI Machine Learning Repository, namely the Breast-wisconsin, Heart, Iris, Pima and Wine problems (Table 4.3). There are two reasons why these five classification problems were selected. First of all, these data sets consist only of continuous attributes, and are therefore suitable for the performance analysis of the different partitioning techniques (detailed in Section 4.1.4). Secondly, comparison of the method to the existing ones is simple. Note that for all of the data sets, ten-fold cross-validation was applied.

Performance study

Each partitioning technique is tested with each type of classification method on all of the five data sets. To obtain reliable performance measures for all of the techniques, five times ten-fold cross-validation was applied. Therefore, the classification accuracies of the methods are the mean accuracies of the runs. The classifiers that were applied are the following:

Table 4.3: Basic information of data sets

Data set Size # Attribute # Class

Breast-w 699 10 2

Heart 270 13 2

Iris 150 4 3

Pima 768 8 2

Wine 178 13 3

c1: Classify by a set of rules; the classes are determined only by the firing strengths of the rules

c2: Classify only by the strongest rule, which has the maximal product of firing strength and FC (Eq. 4.18)

c3: Classify by a set of rules; the classes are determined by the product of the firing strengths and the FC of the rules (Eq. 4.12) (a sketch of this scoring is given after this list)

c4: Similar to method c3, but it is weighted with the aggregated firing strength (Eq. 4.15)

c5: Similar to method c3, but the weighting factor is determined by the quotient of the aggregated firing strength and the aggregated number of rules (Eq. 4.16)

c6: Apply the initial class distribution as a weighting factor to method c3

In order to compare the performances of the different techniques, the results are presented as column diagrams in Fig. 4.9.
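As referenced at the c3 item above, here is a minimal sketch of the c3-style decision: each class is scored by the sum of firing strength times FC over the rules predicting it, and the class with the largest score wins. Since Eqs. 4.12-4.18 are not reproduced in this excerpt, the product t-norm of the antecedent memberships is assumed for the firing strength, and the rule weight FC is treated as an opaque input.

def classify_c3(sample_memberships, rules):
    """c3-style scoring (sketch of Eq. 4.12).
    sample_memberships -- dict: fuzzy item -> membership of the sample in it
    rules              -- iterable of (antecedent_items, class_label, fc)"""
    scores = {}
    for antecedent, cls, fc in rules:
        firing = 1.0
        for item in antecedent:                       # product t-norm (assumed)
            firing *= sample_memberships.get(item, 0.0)
        scores[cls] = scores.get(cls, 0.0) + firing * fc
    return max(scores, key=scores.get) if scores else None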

The classification method based on the Ruspini-type partitioning technique does not give the highest classification accuracy for every problem, but it provides good results with several classifiers on all problems. For example, the c1 classifier has the highest accuracy on the Breast-wisconsin problem, but it gives the worst results on the Wine data set. If the partitioning method is based on the GK clustering algorithm, the results show high accuracy for all classification problems except the Breast-wisconsin data set. Instead of GK, however, the GG algorithm can be applied efficiently to generate proper partitions and to obtain highly accurate associative classifiers.

It always provides good results for all data sets, which is due to the effectiveness of the supervised clustering mechanism. The accuracies of the six tested classifiers are the lowest when the C4.5 algorithm is used, for all of the data sets. Therefore, the tested


Figure 4.9: Accuracies of the classification methods applying the Ruspini, GK, GG and C4.5 partitioning techniques with the different classification models

fuzzy techniques, especially the supervised clustering method, can produce more suitable input domain partitions for associative classification.

To get a more detailed picture of how the different partitioning methods perform, a complexity analysis of the rule bases was carried out. In this analysis, the number of rules and the number of conditions were compared. The complexity values belonging to the accuracies (presented in Fig. 4.9) are summarized in Fig. 4.10. Considering the number of rules on the left, the most compact classifiers are generated by using the GG supervised clustering algorithm. Besides the small number of rules, the classifier contains general (not too complex) FCARs, considering the condition numbers on the right side of the figure.

Comparison of CFARC and other methods

After the exhaustive performance comparison of the possible partitioning and classification techniques, this subsection presents the comparison of the CFARC algorithm and other associative classification algorithms based on their accuracy and compactness values reported in the


Figure 4.10: Classifier complexities (number of rules and conditions) of the classification methods applying the Ruspini - 1, GK - 2, GG - 3, C4.5 - 4 partitioning techniques

literature. The same data sets are used as in the previous analysis. Table 4.4 shows the classification accuracies and Table 4.5 shows the classifier sizes of the algorithms for all of the five problems. Results obtained with ten-fold cross-validation are marked by the symbol "*". If no published result was available, the symbol "-" is placed. The MCAR algorithm achieves the highest classification accuracy over the five data sets. However, considering the classifier size (Table 4.5), we can see that it generates almost the largest rule base. The number of associative rules generated to achieve the accuracies reported in Table 4.4 was not published by the authors of seven of the 15 algorithms. Some of the previous algorithms can produce compact classifiers. For example, the GARC algorithm uses far fewer rules than MCAR, but it has lower accuracy on average. It has very low accuracy on the Wine data set, which is classified by most of the algorithms with more than 90% accuracy (MSR achieves the highest, 99.4%, if the published value is really correct).

To compare CFARC to the previously published algorithms and to obtain reliable results, the classification performance is measured by five times ten-fold cross-validation (this is marked by the symbol ∗∗).

Table 4.4: Classification accuracy of the CFARC algorithm and other algorithms for various data sets (*: using ten-fold cross-validation)

Algorithm Breast-w Iris Heart Pima Wine

ADT - 92.0 - - 93.1

ARC-AC 93.4 94.0 - 78.0 -

ARC-PAN 94.5 93.4 83.7 72.6 -

CAEP 97.3 94.7 83.7 75.0 97.1

CBA 95.8 92.9 81.5 72.4 91.5

CMAR 96.4 94.0 82.2 75.1 95.0

CorCLASS 97.4 96.0 82.3 75.9 -

CPAR 96.0 94.7 82.6 73.8 95.5

GARC 95.7 96.0 88.0 76.2 86.7

LB 96.9 - 82.2 75.8 -

MCAR 96.5 95.3 81.1 78.5 97.1

msCBA 96.3 94.7 81.9 72.9 95.0

MSR 96.4 95.4 81.4 73.3 99.4

OAC 95.1 94.0 81.1 78.1 97.1

TFPC 90.0 95.43 - 74.9 81.9

CFARC [c2,GG]∗∗ 95.1 95.9 77.4 72.9 94.2

The last row of Table 4.4 and Table 4.5 contains the summarized results of CFARC for the test data sets. The presented results of CFARC are generated by using the supervised Gath-Geva partitioning algorithm and the second type (c2) classification method.

The results show that the CFARC method produces very compact and reasonably accurate fuzzy associative classifiers on average. However, the classification accuracy is much lower on the Heart data set compared to the other algorithms. One possible reason is that the Cleveland data set of the Heart classification problem is used here; it has 13 continuously valued attributes, but three of them take only binary values. The other frequently used Heart problem is the Hungarian data set.

CFARC gives better results for this version of the Heart test (CFARC [c2,GG]∗∗ gives 79.5% accuracy). There was no published information about which data set (Cleveland, Hungarian, etc.) was used for the previous algorithms (except CorClass, which

Table 4.5: Number of classification rules for the CFARC algorithm and other algorithms for various data sets (*: using ten-fold cross-validation)

Algorithm Breast-w Iris Heart Pima Wine

ADT - 4.6 - - 15

ARC-AC 135.0 30.0 - 190.0 -

ARC-PAN 1000.0 60.0 80.0 50.0 -

CAEP - - - - -

CBA 49.0 5.0 52.0 45.0 10.0

CMAR - - - - -

CorCLASS - - - - -

CPAR - - - - -

GARC 21 7 12 6 16

LB - - - - -

MCAR 61.0 31.0 31.0 66.0 12.0

msCBA - - - - -

MSR - - - - -

OAC 83.0 9.0 157.0 112.0 23.0

TFPC 14.0 14.8 - 20.3 109.6

CFARC [GG]∗∗ 3.1 3 2.7 2 3.8

4.1.6 Conclusion

Interpretability and accuracy are of critical importance in many classification applications. The ideal solution would satisfy both criteria, but they are contradictory issues. Previously proposed associative classifier methods give high accuracy, but their predictions are based on too large sets of rules. In contrast to these methods, my algorithm, CFARC, produces very compact (small number of rules) and reasonably accurate fuzzy classifier systems at the same time. Therefore, it efficiently helps to understand the data and the prediction mechanism for several types of classification problems.

4.2 Fuzzy Association Rule Mining for Model