Motivations, goals - Rule-Mining-Based Analysis of Process Data (Folyamatadatok szabálykeresésen alapuló elemzése)

The formulated products (e.g. plastics, polymer composites) are generally produced from many ingredients, and the numerous interactions between the components and the processing conditions all affect the final product quality. This feature of complex production systems is an important motivation to develop [...] formalized in a fuzzy rule-based system (FRBS), which could support the monitoring and analysis work of operators and engineers. Besides the analysis tools, efficient process analysis also requires an appropriate methodology. Fig. 1.8 presents the scheme of the designed process analysis methodology, i.e., the interface between the operation and knowledge discovery levels.

Figure 1.8: Scheme of process analysis methodology (labelled elements: operators, PV's, SP's, graphical user interface)
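To make the idea of knowledge "formalized in an FRBS" more concrete, the following minimal Python sketch represents fuzzy rules with triangular membership functions, combines the antecedents with the product t-norm and lets the most strongly activated rule decide the output label. The variable names (`temperature`, `pressure`), the membership parameters and the single-winner inference scheme are illustrative assumptions, not the rule bases developed in this thesis.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

Membership = Callable[[float], float]


def triangular(a: float, b: float, c: float) -> Membership:
    """Triangular membership function with feet at a and c and its peak at b."""
    def mu(x: float) -> float:
        if x <= a or x >= c:
            return 0.0
        if x <= b:
            return (x - a) / (b - a)
        return (c - x) / (c - b)
    return mu


@dataclass
class FuzzyRule:
    # IF var_1 is A_1 AND ... AND var_n is A_n THEN label = consequent
    antecedents: List[Tuple[str, Membership]]
    consequent: str

    def firing_strength(self, sample: Dict[str, float]) -> float:
        # Product t-norm as the fuzzy AND (the minimum is another common choice)
        strength = 1.0
        for variable, mu in self.antecedents:
            strength *= mu(sample[variable])
        return strength


def classify(rules: List[FuzzyRule], sample: Dict[str, float]) -> str:
    # Single-winner inference: the most strongly activated rule gives the label
    return max(rules, key=lambda rule: rule.firing_strength(sample)).consequent


# Hypothetical rule base with illustrative variables and parameters
rules = [
    FuzzyRule([("temperature", triangular(60.0, 70.0, 80.0)),
               ("pressure", triangular(1.0, 2.0, 3.0))], "good"),
    FuzzyRule([("temperature", triangular(70.0, 90.0, 110.0))], "off-spec"),
]

print(classify(rules, {"temperature": 72.0, "pressure": 2.0}))  # good
print(classify(rules, {"temperature": 95.0, "pressure": 2.0}))  # off-spec
```

For the first sample the "good" rule fires with strength 0.8 and the "off-spec" rule with 0.1; with a weighted-vote or defuzzification step instead of single-winner inference, the same rules could also yield graded outputs.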

At the operation level, the distributed control system (DCS) locally ensures the secure and safe operation of the technology. The measurements supply the values of the process variables (PV's) to the DCS, which forwards the information to the graphical user interface (GUI). The operators of the technology obtain information about the system via the GUI and, if necessary, control the process by changing the set points (SP's). An advanced model-based process control computer (Process Computer) calculates, among other things, the operation set points (OP's) for the DCS. Moreover, it calculates many special PV's that provide more information about the production. Most DCSs have data storage functions. These data definitely have the potential to provide information for product and process design, monitoring and control, but access to them on the process control computers is limited in time, since the archive retains only about the last 1-2 months of data.

To store technology data over a longer time interval, it is expedient to keep them in a database, or in particular in a process data warehouse (DW). A process DW is a data analysis, decision support and information processing unit that is operated separately from the databases of the DCS. In contrast to a data-transfer-oriented environment, it is an information environment that contains trusted, processed and collected data (see the characteristics of process data) for historical analyses. Therefore, it makes it possible to obtain relevant information about the technology.
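The limited retention of the DCS archive is the practical reason for a separate, long-term store. The sketch below uses Python's built-in `sqlite3` module as a stand-in for the warehouse database and copies archived DCS records into a long-term table before the DCS purges them. The 60-day retention, the tag name `TI101` and the table layout are hypothetical choices made only for illustration.

```python
import sqlite3
from datetime import datetime, timedelta
from typing import Iterable, List, Tuple

# Hypothetical DCS retention window (the text mentions roughly 1-2 months)
DCS_RETENTION = timedelta(days=60)

Record = Tuple[str, str, float]  # (tag, ISO 8601 timestamp, value)


def create_warehouse(conn: sqlite3.Connection) -> None:
    # (tag, ts) is the primary key, so repeated loads do not duplicate rows
    conn.execute(
        "CREATE TABLE IF NOT EXISTS process_values ("
        " tag TEXT NOT NULL, ts TEXT NOT NULL, value REAL,"
        " PRIMARY KEY (tag, ts))"
    )


def archive_to_warehouse(conn: sqlite3.Connection, records: Iterable[Record]) -> None:
    # Copy the DCS records; rows already present in the warehouse are skipped
    conn.executemany("INSERT OR IGNORE INTO process_values VALUES (?, ?, ?)", records)
    conn.commit()


def purge_dcs(records: List[Record], now: datetime) -> List[Record]:
    # Emulate the DCS archive: only records inside the retention window survive
    return [r for r in records if now - datetime.fromisoformat(r[1]) <= DCS_RETENTION]


conn = sqlite3.connect(":memory:")
create_warehouse(conn)
dcs_archive = [("TI101", "2024-01-01T00:00:00", 71.5),
               ("TI101", "2024-03-15T00:00:00", 72.0)]
archive_to_warehouse(conn, dcs_archive)
archive_to_warehouse(conn, dcs_archive)  # a second run adds nothing
dcs_archive = purge_dcs(dcs_archive, now=datetime(2024, 3, 20))

stored = conn.execute("SELECT COUNT(*) FROM process_values").fetchone()[0]
print(len(dcs_archive), stored)  # 1 2
```

After the purge the DCS keeps only the recent record, while the warehouse still holds both, which is exactly the longer history needed for the historical analyses described above.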

Consequently, the two main goals of this thesis are the following:

1. Develop a general, easy-to-use yet efficient process analysis methodology based on data warehousing for (polymerization) production technologies. The methodology should be applicable to different (polymerization) technologies.

2. Develop new data mining tools to discover important relationships (represented by fuzzy rules) in production data. The new algorithms should be applicable not only to process data but also to more general problems.

Structure of the thesis

In accordance with the main goals, the thesis is structured into three main parts.

• In the first part, a process data warehouse building mechanism is introduced (Chapter 2). It is based on the data handling steps (selection, preprocessing, transformation) of the knowledge discovery process (Fig. 1.2), and the resulting process data warehouse can serve as the main data source for analysis by descriptive statistics and various data mining tools (Fig. 1.8).

• The second part of my thesis discusses two new data mining tools. First, a new classification method based on fuzzy decision trees is introduced (Chapter 3); then a fuzzy association rule mining technique is presented (Chapter 4). This technique is applied to develop three new algorithms, namely for fuzzy associative classification (Section 4.1), model structure selection (Section 4.2) and rule base visualization (Section 4.3).

• The developed tools are applied to several benchmark and real-world problems from the data mining and chemical engineering literature; moreover, the third main part of the thesis presents an industrial application study (Chapter 5).
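As background for the fuzzy association rule mining mentioned above, the sketch below computes one widely used generalization of support and confidence to fuzzy items: the support of an itemset is the mean, over all records, of the product of the items' membership degrees, and the confidence of A ⇒ B is supp(A and B) / supp(A). This is a generic formulation chosen for illustration; the measures used in Chapter 4 may differ, and the item names and membership values are made up.

```python
from typing import Sequence


def fuzzy_support(*items: Sequence[float]) -> float:
    """Mean over records of the product of the membership degrees of all items."""
    n_records = len(items[0])
    total = 0.0
    for k in range(n_records):
        degree = 1.0
        for memberships in items:
            degree *= memberships[k]  # product t-norm as the fuzzy AND
        total += degree
    return total / n_records


def fuzzy_confidence(antecedent: Sequence[float], consequent: Sequence[float]) -> float:
    """Confidence of antecedent => consequent as supp(A and B) / supp(A)."""
    support_a = fuzzy_support(antecedent)
    if support_a == 0.0:
        return 0.0
    return fuzzy_support(antecedent, consequent) / support_a


# Illustrative membership degrees of four records in two hypothetical fuzzy items
temperature_high = [1.0, 0.5, 0.0, 0.5]
quality_poor = [1.0, 1.0, 0.5, 0.0]

print(fuzzy_support(temperature_high, quality_poor))     # 0.375
print(fuzzy_confidence(temperature_high, quality_poor))  # 0.75
```

With crisp (0/1) memberships these formulas reduce to the classical support and confidence of association rule mining, which is why they are a natural starting point for fuzzy variants.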

Chapter 2 Process Data Warehousing for Complex Production Systems

2.1 Introduction

Process manufacturing is increasingly driven by market forces and by customer needs and perceptions, resulting in more and more complex multi-product manufacturing technologies. It is widely accepted that information is a very powerful asset that can provide significant benefits and a competitive advantage to any organization, including those operating complex production technologies. Complex production processes consist of several physically distributed production units, and these units represent heterogeneous information sources (e.g. real-time process data, product quality data, financial data). Therefore, an integrated information system (including advanced process automation and operator support systems) is essential for the effective operation and improvement of complex technological systems.

Such an information system must be able to handle heterogeneity in terms of:

• Type of information, like:

– Prior knowledge arising from the natural sciences and engineering, formulated as mathematical equations or data,

– Heuristic, empirical knowledge expressed as linguistic rules, stored and handled by expert systems,

– Sampled and calculated process data;

• Data format:

– Manually logged data (reports reside in many different file and database structures developed by different vendors),

– Databases on different platforms and in different formats, including the historical databases of the distributed control system (DCS);

• Content:

– Product features, measured in the laboratory,

– Process features, measured by the DCS.
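The content heterogeneity listed above (infrequent laboratory measurements of product features versus frequently sampled DCS process variables) usually has to be resolved before the two sources can be analysed together. One simple approach, sketched below with hypothetical names, attaches to each laboratory sample the mean of a process variable over a time window preceding the sampling time. The one-hour window, the melt-index value and the temperature readings are assumptions for illustration only.

```python
from datetime import datetime, timedelta
from statistics import mean
from typing import Dict, List, Optional, Tuple

Measurement = Tuple[datetime, float]  # (timestamp, value)


def align_lab_with_process(lab_samples: List[Measurement],
                           process_values: List[Measurement],
                           window: timedelta) -> List[Dict[str, object]]:
    """Pair each lab result with the mean process value in [t - window, t]."""
    rows = []
    for sample_time, lab_value in lab_samples:
        in_window = [v for t, v in process_values
                     if sample_time - window <= t <= sample_time]
        pv_mean: Optional[float] = mean(in_window) if in_window else None
        rows.append({"sample_time": sample_time,
                     "lab_value": lab_value,
                     "pv_mean": pv_mean})
    return rows


# Hypothetical data: hourly reactor temperatures and one melt-index lab result
temps = [(datetime(2024, 1, 1, 8), 70.0),
         (datetime(2024, 1, 1, 9), 72.0),
         (datetime(2024, 1, 1, 10), 74.0)]
lab = [(datetime(2024, 1, 1, 10), 3.2)]

rows = align_lab_with_process(lab, temps, window=timedelta(hours=1))
print(rows[0]["pv_mean"])  # 73.0
```

In practice the window length would be tied to the residence time of the technology; a fixed window is used here only to keep the example short.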

Complex organizations have vast amounts of such heterogeneous data, which are difficult to access and use. Thus, large organizations have had to write and maintain perhaps hundreds of programs that extract, prepare and consolidate data for use by many different analysis and reporting applications.

Also, decision makers often want to dig deeper into the data once initial findings are made. This would typically require more intensive and effective integration of the information sources.

Several works deal with the problem of heterogeneous data(base) integration. In [124], Zhao discussed the problem of detecting semantically corresponding records in heterogeneous data sources, since this can be a critical step in integrating them. Many studies focus on how data structured in different ways can be handled. When databases, XML files, other structured text files or web services act as information suppliers, integrating the information together with their various description models is a complex task. Different solution methods have been worked out (e.g. in [27, 108]). In [14], a new object-oriented language with an underlying description logic was introduced for information extraction from both structured and semi-structured data sources based on tool-supported techniques. A framework for comparing systems that exploit knowledge-based techniques to assist with information integration was presented in [81]. Integration of heterogeneous data sources is also related to knowledge discovery and data mining, see e.g. [41, 93].

Besides the database integration within a particular production unit, there is a need for information integration at the level of the whole enterprise for the purpose of optimal operation. This task cannot be fully automated; there is a need for permanently improved methods and approaches for the creation, storage and dissemination [...]

Since this solution cannot be fully automated, it is costly, inefficient and very time-consuming.

The aim of this chapter is to illustrate that data warehousing offers a better approach. Data warehousing implements the process of accessing heterogeneous data sources and of cleaning, filtering, transforming and storing the data in a structure that is easy to access, understand and use. The data are then used for querying, reporting and data analysis to extract relevant information about the current state of the production and to support decision making related to the control and optimization of the operating technology.
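The access, clean, filter, transform and store steps named above can be illustrated with a deliberately small Python pipeline that loads raw readings into an SQLite table standing in for the warehouse. The raw record layout, the decimal-comma convention, the plausibility limits, the tag names and the kPa-to-bar conversion factor are all assumptions made for this example, not details of the warehouse described in this thesis.

```python
import sqlite3
from typing import Dict, Iterable, List, Tuple

RawRow = Tuple[str, str, str]   # (tag, timestamp, value as exported text)
Row = Tuple[str, str, float]    # (tag, timestamp, numeric value)


def clean(raw: Iterable[RawRow]) -> List[Row]:
    # Normalize tag names and parse values; unparsable entries (e.g. 'N/A') are dropped
    rows = []
    for tag, ts, text in raw:
        try:
            value = float(text.strip().replace(",", "."))  # accept decimal commas
        except ValueError:
            continue
        rows.append((tag.strip().upper(), ts, value))
    return rows


def filter_plausible(rows: List[Row], limits: Dict[str, Tuple[float, float]]) -> List[Row]:
    # Keep only known tags whose values lie within their limits (in source units)
    return [(tag, ts, v) for tag, ts, v in rows
            if tag in limits and limits[tag][0] <= v <= limits[tag][1]]


def transform(rows: List[Row], factors: Dict[str, float]) -> List[Row]:
    # Convert every tag to the warehouse unit with a per-tag multiplicative factor
    return [(tag, ts, v * factors.get(tag, 1.0)) for tag, ts, v in rows]


def load(conn: sqlite3.Connection, rows: List[Row]) -> None:
    # Store the result; (tag, ts) identifies a reading, later loads overwrite it
    conn.execute("CREATE TABLE IF NOT EXISTS process_values ("
                 " tag TEXT NOT NULL, ts TEXT NOT NULL, value REAL,"
                 " PRIMARY KEY (tag, ts))")
    conn.executemany("INSERT OR REPLACE INTO process_values VALUES (?, ?, ?)", rows)
    conn.commit()


raw = [(" ti101", "2024-01-01T08:00", "71,5"),
       ("PI201", "2024-01-01T08:00", "250"),    # reported in kPa
       ("TI101", "2024-01-01T09:00", "N/A"),    # missing value
       ("TI101", "2024-01-01T10:00", "999")]    # implausible reading
limits = {"TI101": (0.0, 200.0), "PI201": (0.0, 1000.0)}
factors = {"PI201": 0.01}  # kPa -> bar

conn = sqlite3.connect(":memory:")
load(conn, transform(filter_plausible(clean(raw), limits), factors))
print(conn.execute("SELECT tag, value FROM process_values ORDER BY tag").fetchall())
# [('PI201', 2.5), ('TI101', 71.5)]
```

Keeping each step as a separate function mirrors the staged nature of the process, so an individual cleaning or transformation rule can be changed without touching the load step.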

The basics of process data warehousing are detailed in the next section (Section 2.2), and an introduction to exploratory data analysis (EDA) is given in Section 2.3. A case study is presented at the end of the thesis (Chapter 5), where the full development process of an information system for multi-product technology analysis in a polypropylene plant is detailed. The central element of the developed information system is a process data warehouse. It can be used not only for generating reports and executing queries, but it also supports the analysis of historical data, process monitoring, data mining and knowledge discovery. Before the application study is presented, some new data mining methods and algorithms are introduced in Chapters 3-4.
