
Decision tree based classification

Decision tree induction is a classical method for classification problems. The samples in the learning data set belong to the classes {c_i}, i ≤ C, and the goal of the tree induction method is to obtain a partitioning of the input attributes that warrants the accurate separation of the samples. A decision tree has two types of nodes (internal and terminal) and branches between the nodes. During a top-down run through the tree, the internal nodes represent decision situations over the data attributes; therefore the internal nodes are called cut (test) nodes. The possible outcomes of a cut are represented by the branches. The terminal nodes of the tree are called leaves, where the class labels are represented. The paths from the root to the leaves (sequences of decisions, or cuts) represent the classification rules. A decision tree is therefore a representation of a data partitioning: it partitions the input space into hyper-rectangles. Fig. 3.2 shows a decision tree and the three hyper-rectangles it defines, where the decision attributes are x1 and x2, the cut points are D1 and D2, and the classes are C1 and C2. Each leaf of the tree represents a region that is equivalent to a classification rule. See, for example, the rightmost leaf, which represents the following rule: If x1 > D1 And x2 > D2 Then C2 (for example, if the temperature of the reactor is higher than D1 and the pressure is higher than D2, then the product has good quality).
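To make the rule view concrete, the following minimal Python sketch encodes the example tree as nested crisp tests. The concrete values assigned to D1 and D2 are hypothetical, chosen only to mirror the reactor example above.

```python
# A minimal sketch of the tree in Fig. 3.2 as nested crisp tests.
D1, D2 = 350.0, 2.5  # hypothetical cut points (e.g. temperature, pressure)

def classify(x1: float, x2: float) -> str:
    """Walk the tree from the root: each internal node tests one attribute,
    and each root-to-leaf path is a classification rule / hyper-rectangle."""
    if x1 <= D1:
        return "C1"          # rule: If x1 <= D1 Then C1
    if x2 <= D2:
        return "C1"          # rule: If x1 > D1 And x2 <= D2 Then C1
    return "C2"              # rule: If x1 > D1 And x2 > D2 Then C2

print(classify(400.0, 3.0))  # -> "C2" (good product quality in the example)
```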

Most decision tree induction algorithms (e.g. ID3, C4.5) are based on the divide-and-conquer strategy. In each iteration step, the cut that locally yields the highest information gain is realized (greedy algorithms).
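As an illustration of one greedy induction step, the sketch below scores candidate cut points on a single attribute by information gain, in the style of ID3/C4.5; the tiny data set is invented for the example.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(xs, ys, cut):
    """Gain of splitting samples (xs, ys) at threshold `cut` on one attribute."""
    left  = [y for x, y in zip(xs, ys) if x <= cut]
    right = [y for x, y in zip(xs, ys) if x > cut]
    remainder = (len(left) * entropy(left) + len(right) * entropy(right)) / len(ys)
    return entropy(ys) - remainder

# One greedy step: try the midpoints between sorted values and keep the
# cut with the highest gain.
xs = [1.0, 2.0, 3.0, 4.0]; ys = ["C1", "C1", "C2", "C2"]
cuts = [(a + b) / 2 for a, b in zip(sorted(xs), sorted(xs)[1:])]
best = max(cuts, key=lambda c: information_gain(xs, ys, c))
print(best, information_gain(xs, ys, best))  # -> 2.5 1.0
```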

Crisp and fuzzy decision trees are similar from a structural point of view (nodes, branches, etc.), but in the case of fuzzy trees the cuts, i.e. the decision situations, are fuzzy. The most frequently used types of attribute partitioning are the Gaussian, sigmoid, and piecewise-linear dichotomies.

[Figure 3.3: Fuzzy decision tree and fuzzy partitions. The tree tests attributes x1 and x2, its leaves carry the classes C1 (PC1) and C2 (PC2), and each attribute axis is partitioned by overlapping membership functions taking values between 0 and 1.]

For more than two partitions, Gaussian, triangular, or trapezoidal fuzzy sets can be constructed from these basic functions. The fuzzy sets, as membership functions, determine the membership values of the data points. Thanks to fuzzy logic, the membership values of a data point can be calculated for all the fuzzy membership functions (fuzzy sets) of a given attribute. Accordingly, it is possible for a sample to belong to several classes (with different membership values) simultaneously. Fig. 3.3 shows an example of fuzzy partitioning (by trapezoids) and the corresponding fuzzy decision tree. The quality of the partitioning is very important for decision tree induction algorithms that apply a priori partitioning, because the partitions play a key role in the classification performance.
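For instance, a trapezoidal membership function and a fuzzy partition of one attribute can be written as below. The breakpoints are made-up values, and neighbouring sets overlap on their ramps so that the membership degrees of every point sum to one (a Ruspini-type partition, one of the options mentioned later for the FIDMAT toolbox).

```python
def ramp(x, a, b):
    """Piecewise-linear transition from 0 (at or below a) to 1 (at or above b)."""
    return max(0.0, min(1.0, (x - a) / (b - a)))

def trapezoid(x, a, b, c, d):
    """Trapezoidal fuzzy set: rises on [a, b], flat at 1 on [b, c], falls on [c, d]."""
    return min(ramp(x, a, b), 1.0 - ramp(x, c, d))

# Three overlapping trapezoids partitioning one attribute, in the spirit of
# Fig. 3.3; all breakpoints are invented values.
sets = {
    "low":  lambda x: 1.0 - ramp(x, 2.0, 3.0),
    "mid":  lambda x: trapezoid(x, 2.0, 3.0, 5.0, 6.0),
    "high": lambda x: ramp(x, 5.0, 6.0),
}
x = 2.6
print({name: round(mu(x), 2) for name, mu in sets.items()})
# -> {'low': 0.4, 'mid': 0.6, 'high': 0.0}: one sample, several sets at once
```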

Related works

As a result of the increasing complexity of classification problems, it becomes necessary to deal with the structural issues of identifying classifier systems. The input attributes of a classification problem can be continuous or discrete in value. Furthermore, the interpretability of the models and the nature of some learning algorithms require effectively discretized (partitioned) features. Hence, the effective partitioning of the domains of continuous input variables is an important aspect of the accuracy and transparency of classifier systems. Moreover, considering the characteristics of process data (most process variables are continuous in value), an efficient discretization is a very important part of the classifier system.

A discretization method can be a priori or dynamic. In a priori partitioning methods, the attributes are partitioned before the induction of the tree. Contrarily, dynamic methods determine the partitions during the tree induction itself. The best-known decision tree induction methods that utilize a priori partitions are the ID3 [84] and the AQ [73] algorithms. The most common dynamic methods are the CART [16], the Dynamic-ID3 [38], the Assistant [56], and the C4.5 [85] algorithms.
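For contrast, the snippet below illustrates the a priori case with the simplest possible scheme, equal-width binning applied before any tree is grown; a dynamic method such as C4.5 would instead select cut points during induction, using the class labels. The attribute values are invented.

```python
# A priori (static) discretization: the continuous attribute is partitioned
# into equal-width intervals before tree induction starts.
def equal_width_bins(values, k):
    lo, hi = min(values), max(values)
    width = (hi - lo) / k
    return [lo + i * width for i in range(1, k)]  # k-1 interior cut points

def discretize(x, cuts):
    return sum(x > c for c in cuts)               # interval index of x

temps = [310.0, 325.5, 340.0, 361.2, 355.0, 372.8]
cuts = equal_width_bins(temps, 3)
print(cuts, [discretize(t, cuts) for t in temps])
```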

Fuzzy partitioning of continuous features can increase the flexibility of the classifier, since it has the ability to model fine knowledge details. The particular drawback of discretization into crisp sets (intervals) is that small variations of the inputs (e.g., due to noise) can cause large changes of the output. This is because the tests are based on Boolean logic and, as a result, only one branch can be followed after a test. By using fuzzy predicates, decision trees can be used to model vague decisions. The basic idea of fuzzy decision trees is to combine the data-based learning of decision trees with the approximate reasoning of fuzzy logic [52]. In fuzzy decision trees, several branches originating from the same node can be simultaneously valid to a certain degree, according to the result of the fuzzy test (see the example in Fig. 3.4). The path from the root node to a particular leaf model therefore defines the fuzzy operating range of that particular model. The output of the tree is obtained by interpolating the outputs of the leaf models that are simultaneously active. Besides representing the discovered rules in a form that is natural for humans, fuzzy logic also yields classifiers that are more robust to false, inconsistent, and missing data.
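The following sketch shows this inference scheme on a toy two-level tree: the degree of each fuzzy test is propagated along the branches, path degrees are combined by product, and the scores of all simultaneously active leaves are aggregated. The membership functions, the tree shape, and the product/sum operators are assumptions for illustration; the operators used by a given induction method (e.g. FID) may differ.

```python
def mu_low(x):  return max(0.0, min(1.0, (4.0 - x) / 2.0))  # "x is low"
def mu_high(x): return 1.0 - mu_low(x)                       # "x is high"

def fuzzy_classify(x1, x2):
    """Degrees along each root-to-leaf path are combined by product; the
    class scores of all simultaneously active leaves are then aggregated."""
    scores = {"C1": 0.0, "C2": 0.0}
    scores["C1"] += mu_low(x1)                  # path: x1 low           -> C1
    scores["C1"] += mu_high(x1) * mu_low(x2)    # path: x1 high, x2 low  -> C1
    scores["C2"] += mu_high(x1) * mu_high(x2)   # path: x1 high, x2 high -> C2
    return max(scores, key=scores.get), scores

print(fuzzy_classify(3.5, 3.2))
# -> ('C1', {'C1': 0.55, 'C2': 0.45}): several leaves are active at once,
# so small input changes shift the output smoothly instead of abruptly.
```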

Fuzzy logic, however, is no guarantee of interpretability, as was also recognized in [96, 28]. Real effort must be made to keep the resulting rule base transparent [95, 76].

Since the 1980s, many fuzzy decision tree induction algorithms have been introduced [3, 37, 61]. The paper [116] analyzes the cognitive uncertainty attached to the classification problem and proposes a fuzzy decision tree method as a solution.

In [102] and [117], genetic algorithms are introduced for tree induction. The Neuro-fuzzy ID3 algorithm is introduced in [48], where linear programming is used for tree induction. The FILM (fuzzy inductive learning method) has been proposed in [53]; this method converts a general decision tree to a fuzzy one. Boyen and Wehenkel introduced a new induction algorithm for problems where the input variables are continuous and the output is fuzzy [15].

(Fuzzy) decision trees based on the original ID3 algorithm assume discrete domains with small cardinalities. Another important feature of ID3 trees is that each attribute can provide at most one condition on a given path. This is a great advantage, as it increases the comprehensibility of the induced knowledge, but it may require a good a priori partitioning, because the first discretization step of the data mining procedure strongly affects the classification performance. Hence, for this purpose, Janikow introduced a genetic algorithm to optimize the partitioning step [51], while Pedrycz proposed an environment-dependent clustering algorithm to obtain adequate partitions [83].

Motivations and goals

As the papers above illustrate, decision tree and rule induction methods often rely on a priori discretized continuous features. However, the advantages of the supervised and fuzzy discretization of attributes have not been demonstrated yet. This chapter presents a study that proposes a supervised clustering algorithm for providing informative input partitions for decision tree induction algorithms. For the tree induction, the FID algorithm is applied [52]. It has the advantage that the resulting fuzzy decision tree can be transformed into a fuzzy rule-based classifier, and in this way the performance and complexity of the classification model can be fine-tuned by a rule pruning algorithm.
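As a rough sketch of that last step, the loop below greedily drops rules (one per leaf of the flattened tree) as long as validation accuracy does not degrade. The rule representation, the winner-takes-all decision, and the tolerance parameter are all assumptions for illustration, not FID's actual mechanism.

```python
def accuracy(rules, data):
    """Each rule = (firing_function, class_label); winner-takes-all decision."""
    correct = 0
    for x, y in data:
        best = max(rules, key=lambda r: r[0](x))
        correct += (best[1] == y)
    return correct / len(data)

def prune(rules, data, tol=0.0):
    """Greedily remove rules while validation accuracy stays within `tol`."""
    base = accuracy(rules, data)
    for r in list(rules):
        trial = [q for q in rules if q is not r]
        if trial and accuracy(trial, data) >= base - tol:
            rules, base = trial, accuracy(trial, data)
    return rules

# Usage with two hypothetical one-attribute rules and labelled samples:
rules = [(lambda x: max(0.0, 1 - abs(x - 1)), "C1"),
         (lambda x: max(0.0, 1 - abs(x - 3)), "C2")]
data = [(0.9, "C1"), (1.2, "C1"), (2.8, "C2"), (3.1, "C2")]
print(len(prune(rules, data)))  # -> 2: both rules are needed somewhere
```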

The new method has been implemented as a fuzzy classification toolbox in MATLAB, called FIDMAT (Fuzzy decision tree Induction in MATlab). With the FIDMAT toolbox, the classification performance of the original FID program can be easily analyzed. The toolbox makes it possible to compare several partitioning techniques, such as the FID partitioning method, Ruspini-type partitions, and the new clustering-based approach. Moreover, the performances and complexities of the classifiers can also be compared with other well-known decision tree based classifiers, for example with the C4.5 algorithm.

The rest of the chapter is structured as follows. Section 3.3 introduces the structure of the fuzzy classifier. In Section 3.4, the proposed supervised a priori partitioning algorithm is presented. In Section 3.5, an experimental study illustrates that compact, perspicuous, and accurate fuzzy classification models can be identified by the new approach.