
Discussion and conclusion

The extremely rapid growth of the data created by industrial processes (process data) has become one of the main problems that most industrial plants and companies have to deal with. In order to be able to solve process failures and […] stored. All sorts of chemical processes in chemical plants generate a constantly growing amount of data, such as the input and output variables, historical time recordings, process stability statistics and more. Therefore, an automated way of storing is needed; moreover, in order to deal with all the information that this constantly growing process data can provide, several analysis methods can be utilized.

For this purpose, the basics of process data warehouses and of exploratory data analysis were introduced in this chapter.

Besides exploratory data analysis, data mining is also a very useful analytic process, designed to explore and extract hidden predictive information from databases or data warehouses by searching for consistent patterns and/or systematic relationships or rules between the (process) variables [97]. In the following chapters (Chapters 3 and 4), some new data mining tools are introduced.

Chapter 3

Supervised Clustering Based Compact Decision Tree Induction

3.1 Introduction to pattern recognition

Machine learning is concerned with the design and development of algorithms and techniques that allow computers to learn. There are two main types of machine learning: inductive and deductive. Inductive learning methods extract rules and patterns from massive data sets. Deductive learning works on existing facts and knowledge and deduces new knowledge from the old. Many factors influence the choice and efficacy of a learning system, such as the amount of domain knowledge used by the system.

A typical machine learning method consists of three phases [89]:

1. Training: a training set of examples of correct behavior is analyzed and some representation of the new knowledge is stored. This is often represented by some form of rules.

2. Validation: the rules are checked and, if necessary, additional training is given. Sometimes additional test data are used, or a human expert may validate the rules, or some other automatic knowledge-based component may be used. The role of the tester is often called the critic.

3. Application: the rules are used in responding to some new situations.
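The three phases above can be sketched in code. The following is a minimal illustration, assuming a toy one-dimensional data set and a hypothetical single-threshold rule learner; the data, function names and rule form are all invented for this example.

```python
# Sketch of the three machine-learning phases on a toy 1-D data set.

def train(examples):
    """Training: learn a threshold rule (x <= t -> class 0, else class 1)."""
    # Try each observed value as a candidate threshold and keep the best.
    best_t, best_correct = None, -1
    for t, _ in examples:
        correct = sum(1 for x, y in examples if (0 if x <= t else 1) == y)
        if correct > best_correct:
            best_t, best_correct = t, correct
    return best_t

def validate(rule_t, test_set):
    """Validation: the 'critic' checks the rule on held-out test data."""
    correct = sum(1 for x, y in test_set if (0 if x <= rule_t else 1) == y)
    return correct / len(test_set)

def apply_rule(rule_t, x):
    """Application: respond to a new situation."""
    return 0 if x <= rule_t else 1

training_set = [(1.0, 0), (2.0, 0), (3.0, 0), (4.0, 1), (5.0, 1)]
test_set = [(1.5, 0), (4.5, 1)]

t = train(training_set)            # phase 1: learn t = 3.0 here
print(validate(t, test_set))       # phase 2: accuracy on held-out data
print(apply_rule(t, 10.0))         # phase 3: classify a new measurement
```

In a real system the stored representation would of course be richer than a single threshold, but the separation into training, validation and application is the same.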

[Flowchart: start → collect data → choose features (using prior knowledge) → choose model → train classifier → evaluate classifier → end]

Figure 3.1: Design cycle of a pattern recognition system

[…] categories or classes. Depending on the application, these objects can be images, signal waveforms or any type of measurements that need to be classified. Pattern recognition has a long history, but before the 1960s it was mostly the output of theoretical research in the area of statistics. Today the main application areas of pattern recognition are machine vision, character recognition, computer-aided diagnosis, speech recognition, fingerprint identification, signature authentication, text retrieval, face and gesture recognition, etc. [101].

The design of a pattern recognition system usually entails the repetition of a number of different activities, such as data collection, feature choice, model choice, training and evaluation (Fig. 3.1). The following description of the steps is adapted from [34].

Data collection can account for a surprisingly large part of the cost of developing a pattern recognition system. It may be possible to perform a preliminary feasibility study with a small set of typical examples, but much more data will usually be needed to assure good performance in the fielded system. The main question is: how do we know when we have collected an adequately large and representative set of examples for training and testing the system?

Feature choice: the choice of the distinguishing features is a critical design step and depends on the characteristics of the problem domain. Incorporating prior knowledge can be far more subtle and difficult. In some applications the knowledge ultimately derives from information about the production of patterns. In others, the knowledge may be about the form of the underlying categories, or specific attributes of the patterns. In selecting or designing features, we obviously would like to find features that are simple to extract, invariant to irrelevant transformations, insensitive to noise, and useful for discriminating patterns in different categories. How do we combine prior knowledge and empirical data to find relevant and effective features?

Model choice: how do we know when a hypothesized model differs significantly from the true model underlying our patterns, and thus a new model is needed? In short, how are we to know when to reject a class of models and try another one?

Training: in general, the process of using data to determine the classifier is referred to as training the classifier. No universal methods have been developed to solve the problems that arise in the design of pattern recognition systems. However, the most effective methods for developing classifiers involve learning from example patterns.

Evaluation is important both to measure the performance of the system and to identify the need for improvements in its components. While an overly complex system may allow perfect classification of the training samples, it is unlikely to perform well on new patterns. This situation is known as over-fitting. One of the most important areas of research in statistical pattern classification is determining how to adjust the complexity of the model: it should not be so simple that it cannot explain the differences between the categories, yet not so complex as to give poor classification on novel patterns. Are there principled methods for finding the optimal (intermediate) complexity of a classifier?
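Over-fitting can be illustrated with a small sketch: a model that memorizes every training example scores perfectly on the training set, yet a simpler threshold rule generalizes better when some training labels are noisy. The data set and the two flipped labels below are invented purely for illustration.

```python
# Over-fitting sketch: memorization vs. a simple threshold rule.

train_set = [(0.5, 0), (1.0, 0), (1.5, 0), (1.8, 1),   # 1.8 carries a noisy label
             (2.5, 1), (3.0, 1), (3.5, 1), (2.2, 0)]   # 2.2 carries a noisy label
test_set  = [(0.8, 0), (1.2, 0), (1.9, 0), (2.1, 1), (2.8, 1), (3.2, 1)]

def memorizer(x):
    """Over-fitted model: predict the label of the nearest training point."""
    return min(train_set, key=lambda p: abs(p[0] - x))[1]

def simple_rule(x):
    """Simple model: a single threshold at x = 2.0."""
    return 0 if x <= 2.0 else 1

def accuracy(model, data):
    return sum(model(x) == y for x, y in data) / len(data)

print(accuracy(memorizer, train_set))    # 1.0: perfect on the training set
print(accuracy(simple_rule, train_set))  # 0.75: misses the two noisy labels
print(accuracy(memorizer, test_set))     # hurt by the memorized noise
print(accuracy(simple_rule, test_set))   # generalizes better
```

The memorizer is the extreme of an "overly complex system": zero training error, degraded performance on novel patterns.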

In recent decades many types of classifiers have been developed for classification problems (model-based techniques such as decision trees and Bayes classifiers [62, 87, 55, 111], or model-independent methods such as the k-nearest neighbor [19, 110, 70, 40]). This chapter is focused on decision tree based classification; therefore, in what follows, first the basics of decision trees and then the related work on the topic are reviewed.
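As a point of contrast with the model-based trees discussed below, the k-nearest neighbor method mentioned above can be sketched in a few lines. The data points, class labels and the choice of k here are illustrative only.

```python
# Minimal k-nearest-neighbor sketch: classify a query point by majority
# vote among its k closest training points (Euclidean distance).
from collections import Counter
import math

def knn_classify(query, training_set, k=3):
    """Vote among the k training points closest to the query."""
    neighbors = sorted(training_set,
                       key=lambda p: math.dist(p[0], query))[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]

# Two illustrative clusters of labeled points in the (x1, x2) plane.
training_set = [((1.0, 1.0), "C1"), ((1.2, 0.8), "C1"), ((0.9, 1.1), "C1"),
                ((3.0, 3.0), "C2"), ((3.2, 2.9), "C2"), ((2.8, 3.1), "C2")]

print(knn_classify((1.1, 0.9), training_set))  # "C1"
print(knn_classify((3.1, 3.0), training_set))  # "C2"
```

Note that no model is fitted at all: the training set itself is the "classifier", which is why such methods are called model-independent.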

[Decision tree: the root node tests x1 ≤ D1 (leaf C1) vs. x1 > D1; the latter branch tests x2 ≤ D2 (leaf C1) vs. x2 > D2 (leaf C2). A companion plot shows the corresponding axis-parallel partition of the (x1, x2) plane at cut values D1 and D2.]

Figure 3.2: Example for a decision tree
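The example tree of Fig. 3.2 translates directly into nested threshold tests. In this sketch the concrete cut values D1 and D2 are placeholders chosen for illustration; the figure leaves them symbolic.

```python
# The decision tree of Fig. 3.2 as code: two axis-parallel splits,
# first on x1 against D1, then on x2 against D2.

D1, D2 = 2.0, 3.0  # illustrative split thresholds

def classify(x1, x2):
    """Classify a point (x1, x2) with the tree of Fig. 3.2."""
    if x1 <= D1:       # root test on x1
        return "C1"
    elif x2 <= D2:     # second-level test on x2
        return "C1"
    else:
        return "C2"

print(classify(1.0, 5.0))  # x1 <= D1           -> C1
print(classify(3.0, 2.0))  # x1 > D1, x2 <= D2  -> C1
print(classify(3.0, 4.0))  # x1 > D1, x2 > D2   -> C2
```

Each root-to-leaf path corresponds to one rectangular region of the (x1, x2) plane shown in the figure, which is why such trees are called axis-parallel.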