
Motivation and roadmap of the thesis

Following from the previously described need for integration, the main motivation of the thesis is to create an integrated framework into which data mining, modeling and simulation, and experimentation tools can be incorporated. To achieve model integrity, the existing models should be reapplied, the missing models created, and all the models connected in an appropriate way. If it is possible to collect a sufficiently large amount of data from the process, Knowledge Discovery in Databases (KDD) techniques can be applied to extract information focused on maintenance or control problems and thus make the production more efficient [7].

As I suggest in Figure 1.6, the information flow of such integrated methodologies should be centered around a process data warehouse in a process improvement cycle. The sources are the available process data, the current process knowledge (rules, constraints, etc.) and an integrated global model of the products, the process and the process control. As this information is collected in the data warehouse, data mining, modeling and experimentation tools can be applied to aid the improvement of the process while extracting further knowledge.

All these developments, and the process data warehouse around which they are centered, were created within the research project of the Cooperative Research Centre of the Chemical Engineering Institute entitled "Optimization of multi-product continuous technologies", with implementation at the Polypropylene plant of Tisza Chemical Group Plc., Hungary.

According to the motivations and main goals explained above, the thesis is structured as follows: Chapter 2 describes an integrated process data warehouse with applications to product quality and operating cost estimation, while Chapter 3 presents novel clustering based data analysis tools for analyzing data queried from or transferred to the data warehouse. As shown previously, process data and simulator models are linked together through experimentation; hence a genetic algorithm based tool for interactive process optimization was developed, which is detailed in Chapter 4. To date, some tools for process intensification, scaling up, control and monitoring have been worked out and implemented to demonstrate the potential impact of the developed approach in technology development. The details of these application examples are presented in Appendix A. As these examples illustrate, the results presented in this thesis can be immediately applied in the field of process engineering by designing tools that support model based process and product development.

For readers not familiar with every concept and notation used in this thesis, the theoretical background of the contributions is briefly discussed in Appendix B.

Chapter 2

Process Data Warehousing and Mining

According to customers' expectations and market challenges, the process industry needs the ability to operate complex, highly interconnected plants that are profitable and that meet quality, safety, environmental and other standards.

In the ideal case this requires modeling and simulation tools that integrate not only the whole product and process development chain but also all the process units, plants, and subdivisions of the company. Such information islands are also formed at the level of the collection and analysis of historical process data; however, access to this data is limited due to the heterogeneity of the information sources (the data is generated in different places and stored in different types of databases) and because the data is stored only for short periods of time.

This chapter proposes a know-how for the design and implementation of process data warehouses that integrate plant-wide information, where integration means information, location, application and time integrity. The process data warehouse contains non-volatile, consistent and preprocessed historical process data and operates independently of the other databases at the level of the control system. To extract information from the historical process data stored in the process data warehouse, tools for data mining and exploratory data analysis have been developed.
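To make this concept more tangible, the following minimal sketch (in Python, with hypothetical file, table and column names) illustrates the kind of extract-transform-load step implied above: raw DCS exports are cleaned, resampled to a uniform time grid and appended to a warehouse table that is never updated in place, reflecting the non-volatile character of the data warehouse. It is an illustration under these assumptions, not the implementation used at the plant.

```python
# Minimal ETL sketch (illustrative only): load a raw DCS export, clean it,
# and append the result to a hypothetical warehouse fact table.
import sqlite3
import pandas as pd

def load_dcs_export(path: str) -> pd.DataFrame:
    """Read a raw DCS export with a timestamp column and tag columns (assumed format)."""
    return pd.read_csv(path, parse_dates=["timestamp"])

def preprocess(df: pd.DataFrame) -> pd.DataFrame:
    """Basic cleaning: drop duplicate timestamps, resample to a uniform
    1-minute grid and interpolate short gaps so sources can be joined later."""
    df = df.drop_duplicates(subset="timestamp").set_index("timestamp").sort_index()
    df = df.resample("1min").mean().interpolate(limit=5)
    return df.reset_index()

def append_to_warehouse(df: pd.DataFrame, db_path: str = "process_dwh.db") -> None:
    """Append preprocessed records; existing rows are never updated,
    mirroring the non-volatile nature of the data warehouse."""
    with sqlite3.connect(db_path) as con:
        df.to_sql("process_measurements", con, if_exists="append", index=False)

if __name__ == "__main__":
    raw = load_dcs_export("dcs_export.csv")   # hypothetical file name
    append_to_warehouse(preprocess(raw))
```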

2.1 Motivation

The increasing automation and the tighter quality constraints related to production processes make the operators' and plant engineers' jobs more and more difficult.

Operators in the process industry have many tasks, such as keeping the process conditions as close as possible to a given operating point, preserving optimality, detecting failures and maintaining safety. The more heterogeneous the units are, the less transparent the system is. Hence, there is a need for an integrated information system that solves these problems and supports process monitoring and development.

Figure 2.1: Three-level model of the skilled human operator, distinguishing skill-based, rule-based and knowledge-based behavior in the path from sensory inputs (signals, signs and symbols) to actions.

As the three-level model of the performance of skilled operators shown in Figure 2.1 suggests, such Operator Support Systems (OSS) should present intuitive and essential information on what is happening in the process and give suggestions in accordance with the operator's experience and skills [8], [9], [10]. Hence, the OSS of a complex process should be a combination of information systems, mathematical models and algorithms aimed at extracting relevant information (signs, e.g. process trends, and symbols) to "ease" the operator's work. In the following, the main elements of this kind of system are described.
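To illustrate what the extraction of such "signs" might look like in practice, the sketch below labels windows of a noisy measurement as increasing, decreasing or steady from the slope of a fitted line. It is only an illustrative sketch of the idea, not the trend-extraction algorithm used in this thesis; the window length and tolerance are arbitrary assumptions.

```python
# Illustrative sketch: derive qualitative "signs" (trend labels) from a noisy signal
# by fitting a line to each window and inspecting the sign of the slope.
import numpy as np

def trend_signs(values: np.ndarray, window: int = 30, tol: float = 1e-3) -> list:
    """Return one qualitative label per window based on the fitted slope."""
    signs = []
    for start in range(0, len(values) - window + 1, window):
        segment = values[start:start + window]
        slope = np.polyfit(np.arange(window), segment, deg=1)[0]
        if slope > tol:
            signs.append("increasing")
        elif slope < -tol:
            signs.append("decreasing")
        else:
            signs.append("steady")
    return signs

# Example: a slowly rising signal with measurement noise
signal = np.linspace(0.0, 1.0, 300) + 0.01 * np.random.randn(300)
print(trend_signs(signal, window=50, tol=1e-3))
```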

In modern industrial technologies the existence of a distributed control system (DCS) is a basic requirement. This system is responsible for the safe operation of the technology at the local level. At the coordination level of the DCS many complex tasks are handled, such as controller tuning, process optimization, model identification and error diagnostics. These tasks are based on process models. As new products have to be introduced to the market over a short time scale to ensure competitive advantage, the development of process models necessitates the use of empirical techniques, since phenomenological model development is unrealizable in the time available [9]. Hence, the mountains of data that computer-controlled plants generate must be used effectively. For this purpose most DCS systems are able to store operational process data. However, the DCS has limited storage capacity because this is not its main function; only data logged in the last one or two months is stored in these computers. Since data measured over a longer time period have to be used for sophisticated process analysis, quality control, and model building, it is expedient to store the data in a historical database that makes it easy to query, group and analyze the data related to the production of different products and different grade transitions. Today several software products on the market provide the capability to integrate the historical process data of DCSs, e.g. Intellution I-historian [11], Siemens SIMATIC [12], the Fisher-Rosemount PlantWeb system [13] and the Wonderware FactorySuite 2000 MMI software package [14].
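As a simple illustration of how such a historical database could be queried and grouped by product, the following sketch reuses the hypothetical warehouse table and assumes column names (product_grade, reactor_temp, melt_index) that are not taken from the actual plant schema.

```python
# Illustrative query sketch: summarise historical data per product grade
# from a hypothetical warehouse table (all column names are assumptions).
import sqlite3
import pandas as pd

QUERY = """
SELECT product_grade,
       COUNT(*)          AS n_samples,
       AVG(reactor_temp) AS mean_reactor_temp,
       AVG(melt_index)   AS mean_melt_index
FROM process_measurements
WHERE timestamp BETWEEN :start AND :end
GROUP BY product_grade
"""

with sqlite3.connect("process_dwh.db") as con:
    summary = pd.read_sql_query(
        QUERY, con, params={"start": "2004-01-01", "end": "2004-12-31"}
    )
print(summary)
```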

As will be described in detail in the case study shown in Section A.1, there are several heterogeneous information sources that have to be integrated to support the work of engineers and operators with relevant, accurate and useful information. In the case of process systems, standard data warehousing and OLAP techniques are not always suitable for this purpose, because the operating units in the process industry have significant dynamical behavior that requires special attention, contrary to the classical static business models. The source of this dynamical behavior is the dynamical effect of the transportation and mixing of material flows. Since process engineering systems rarely operate in a steady-state manner (process transitions and product changes occur frequently), and the control and monitoring of these dynamical transitions are the most critical tasks of the process operators, the synchronization of data taken from the heterogeneous information sources of process systems requires dynamical process models. These dynamic qualities of process units and the related data sources make it unsuitable to simply apply standard OLAP techniques. Hence, as will be presented in the following section, the integration of the historical databases of DCSs into OSSs is not only a technical problem; in this process the special features of the technology have to be taken into account.
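As a minimal illustration of why such synchronization calls for a dynamic model, the sketch below aligns a fast upstream measurement with downstream laboratory samples using an assumed dead time (transportation) and a first-order lag (mixing). The model structure, parameter values and signal names are illustrative assumptions only, not the synchronization models identified for the plant.

```python
# Sketch: synchronise an upstream measurement with downstream lab samples
# via a simple dynamic model consisting of a dead time and a first-order lag.
import numpy as np

def shift_and_filter(x: np.ndarray, dead_time: int, time_constant: float) -> np.ndarray:
    """Delay the signal by `dead_time` samples, then apply the discrete
    first-order filter y[k] = a*y[k-1] + (1-a)*x[k] with a = exp(-1/time_constant)."""
    delayed = np.concatenate([np.full(dead_time, x[0]), x[:-dead_time]]) if dead_time else x
    a = np.exp(-1.0 / time_constant)
    y = np.empty_like(delayed, dtype=float)
    y[0] = delayed[0]
    for k in range(1, len(delayed)):
        y[k] = a * y[k - 1] + (1.0 - a) * delayed[k]
    return y

# Align a fast upstream signal with lab samples taken every 60 time steps.
upstream = np.sin(np.linspace(0, 10, 600))     # illustrative process variable
aligned = shift_and_filter(upstream, dead_time=45, time_constant=20.0)
lab_times = np.arange(0, 600, 60)              # assumed lab sampling instants
print(aligned[lab_times])
```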