
1.2 Process data - from source to applications

1.2.1 Data acquisition and retrieval

In the early years of chemical engineering processes, systems were controlled manually, and data collection (data acquisition) was performed by plant engineers acting as human observers, since no central control system existed.

As process control developed, digital equipment replaced analogue equipment and analogue measurements. In modern automated systems, a redundant field bus system is installed to bridge the digital field instruments with the central process control system, whose elements are connected through an inner system bus. At this level, distributed control systems are implemented with several subfunctions such as data storage, basic data visualization and interfaces to the corporate business information system. Examples of such systems were TDC2000 from Honeywell Inc., CENTUM from Yokogawa and UCS3000 from Bristol. As hardware instruments became more standardized, these companies turned to advanced software packages.

These developments led to open control systems (OCS), which are based on standard operating systems (such as MS Windows or Unix) and on open network protocols. A basic attribute of these systems is interoperability, so products from different vendors can be built into the system as building blocks. OCS solutions were first applied in supervisory control and data acquisition (SCADA) systems. A SCADA system can be considered a software package installed on a standard set of hardware equipment that uses standard open network protocols for communication [61].

The two main weaknesses of data acquisition systems are the lack of support for heterogeneity and data inaccessibility:

1. Data from different sources and in different formats cannot be handled in one environment; e.g. a priori knowledge and empirical or phenomenological knowledge cannot be incorporated into sampled data. Much research has been done on the problems of data compression and data integrity.

The next section deals with several solutions to these problems.

2. A mid-size chemical plant has a few thousand measured variables sampled at intervals from seconds to hours, and about a hundred manipulated variables to control a few critical variables related to product quality, which results in terabytes of data every year. Analyzing not only current but also historical data would therefore require an inefficiently large data storage capacity.
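The order of magnitude quoted in item 2 can be checked with a short calculation. The sketch below is only illustrative: the number of variables, the sampling periods and the record size of 16 bytes (an 8-byte timestamp plus an 8-byte value) are assumptions chosen for the example, not figures from the text.

```python
# Back-of-the-envelope estimate of yearly raw data volume for a mid-size plant.
# All numbers below are illustrative assumptions, not values from the text.

SECONDS_PER_YEAR = 365 * 24 * 3600


def yearly_volume_bytes(n_variables, sampling_period_s, bytes_per_sample):
    """Return the raw storage needed per year for uniformly sampled variables."""
    samples_per_variable = SECONDS_PER_YEAR / sampling_period_s
    return n_variables * samples_per_variable * bytes_per_sample


# Assumed: 3000 measured variables, one sample per second,
# 16 bytes per record (8-byte timestamp + 8-byte floating-point value).
fast = yearly_volume_bytes(3000, 1, 16)

# Assumed: the same variables sampled once per hour instead.
slow = yearly_volume_bytes(3000, 3600, 16)

print(f"1 s sampling: {fast / 1e12:.2f} TB per year")
print(f"1 h sampling: {slow / 1e9:.2f} GB per year")
```

Under these assumptions the estimate is roughly 1.5 TB per year at one-second sampling and well under 1 GB per year at hourly sampling, so the fast-sampled variables dominate the storage requirement.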

In this section solutions to these problems and commercial products already available on the market are presented.

Integrated information storage and query

Several approaches have been developed to solve the problem of heterogeneous data integration. Integrating information together with the various models that describe it is complex, hence the solution methods differ.

Two main groups of solutions can be identified: those where the integration problem is solved at the query level, and those where it is solved at the construction level of the integrated information system. Collins et al. developed an XML based environment [62], while Wehr suggests an object-oriented global federated layer above the information sources [63]. In [64], Bergamaschi et al. also present an object-oriented language, with an underlying description logic, which was introduced for information extraction from both structured and semi-structured data sources based on tool-supported techniques. Paton et al. developed a framework for comparing systems that can exploit knowledge based techniques to assist with information integration [65]. Another approach to handling the heterogeneity of information sources is the application of data warehouses (DWs) to construct an environment filled with consistent, pre-processed data [66].

The main advantage of a DW is that it can easily be adapted to a DCS and to other information sources of a process while working independently of them (see Table 1.1) [67].

                  DCS related database                               Data warehouse
Function          Day-to-day data storage for operation and control  Decision supporting
Data              Actual                                             Historical
Usage             Iterative                                          Ad-hoc
Unit of work      General transactions                               Complex queries
User              Operator                                           Plant manager, engineer
Design            Application-oriented                               Subject-oriented
Accessed records  Order of ten                                       Order of a million
Size              100 MB-GB                                          100 GB-TB
Degree            Transactional time                                 Inquiry time
Region            Unit, product line                                 Product

Table 1.1: Main differences between a DCS related database and a data warehouse.
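The "Unit of work" and "Accessed records" rows of Table 1.1 can be illustrated with two queries against the same data. This is a minimal sketch using an in-memory SQLite database; the table name `measurement`, the tag names `TI101` and `FI205` and the synthetic values are hypothetical and stand in for a real process history.

```python
import sqlite3

# Hypothetical schema: one row per sampled value of a process variable (tag).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE measurement (tag TEXT, day INTEGER, minute INTEGER, value REAL)"
)

# Synthetic history: two tags, three days, one sample per minute.
rows = [
    (tag, day, minute, base + day + 0.001 * minute)
    for tag, base in (("TI101", 80.0), ("FI205", 12.0))
    for day in range(3)
    for minute in range(1440)
]
conn.executemany("INSERT INTO measurement VALUES (?, ?, ?, ?)", rows)

# DCS-style access: a short transaction touching a handful of current records.
latest = conn.execute(
    "SELECT value FROM measurement WHERE tag = ? "
    "ORDER BY day DESC, minute DESC LIMIT 1",
    ("TI101",),
).fetchone()[0]

# Warehouse-style access: an ad-hoc query aggregating the whole history.
daily_means = conn.execute(
    "SELECT tag, day, AVG(value), COUNT(*) FROM measurement "
    "GROUP BY tag, day ORDER BY tag, day"
).fetchall()

print("latest TI101 value:", latest)
for tag, day, mean, n in daily_means:
    print(f"{tag} day {day}: mean={mean:.3f} over {n} records")
```

The first query returns a single current value, as an operator display would need, while the second scans every stored record to produce daily summaries, which is closer to the decision-support usage of a data warehouse.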

Appropriate time-series representation for data compression

Data compression is mainly a contribution of the signal and image processing community, where lossless information transmission within limited time or bandwidth is a key feature. In the chemical engineering community, data compression has another important purpose besides rationalizing storage capacity: retrieving the data in a form that is easily interpretable for later engineering tasks.

In this way, the data compression problem turns into a trend representation problem. Lin et al. gave a classification of process trend representation methods in [68], which can be seen in Fig. 1.5. Many of these representation techniques rely on segmentation of time series, which means finding time intervals where the trajectory of a state variable is homogeneous [69], representing the data by its segments, and storing only the segments instead of the raw data.
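The idea of storing segments instead of raw samples can be sketched as follows. The example uses a simple sliding-window segmentation with linear interpolation between segment endpoints, which is one common way to segment a time series and is not necessarily the method used in [68] or [69]. The function names, the error threshold and the sample trend are assumptions made for illustration.

```python
def interpolation_error(values, start, end):
    """Max absolute deviation from the line joining values[start] and values[end]."""
    if end - start < 2:
        return 0.0
    slope = (values[end] - values[start]) / (end - start)
    return max(
        abs(values[i] - (values[start] + slope * (i - start)))
        for i in range(start + 1, end)
    )


def sliding_window_segments(values, max_error):
    """Split a uniformly sampled series into (start, end) index pairs.

    Each segment is grown point by point until the linear interpolation
    between its endpoints deviates from the data by more than max_error.
    Consecutive segments share their boundary point.
    """
    segments = []
    start = 0
    last = len(values) - 1
    while start < last:
        end = start + 1
        while end < last and interpolation_error(values, start, end + 1) <= max_error:
            end += 1
        segments.append((start, end))
        start = end
    return segments


# Hypothetical trend: ramp up, hold, ramp down.
trend = [0, 1, 2, 3, 4, 4, 4, 4, 3, 2, 1, 0]
segments = sliding_window_segments(trend, max_error=0.1)

# Store only the segment endpoints instead of every raw sample.
compressed = [(s, trend[s], e, trend[e]) for s, e in segments]
print(compressed)
```

For this trend the twelve raw samples reduce to three segments (ramp, hold, ramp), and each segment also carries an engineering interpretation, which is the point made above about interpretability.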

[Figure 1.5: tree diagram. Root: time series representations, divided into data-adaptive and non-data-adaptive. Node labels recoverable from the figure: Piecewise; Interpolation; Regression; Strings; Trees; Orthonormal; Bi-orthonormal; Haar; Daubechies dbn (n>1); Coiflets; Symlets; Discrete Fourier Transform; Discrete Cosine Transform; Lower Bounding; Non-lower bounding.]

Figure 1.5: Hierarchy of various time series representations for data mining.


Products on the market

Modern distributed control systems (DCSs), which are widely implemented in automated technologies, have direct access to the field instrument signals and measurements and also provide data storage functions. Today several software products on the market can integrate the historical process data of DCSs, e.g. Intellution I-historian [70], Siemens SIMATIC [71], the PlantWeb system of Fisher-Rosemount [72], the Wonderware FactorySuite 2000 MMI software package [73] or the Uniformance PHD (Process History Database) module from Honeywell [74].

In conclusion, modern data acquisition systems need to be capable of handling diverse types of data in such a way that the data are applicable for further analysis. The next section deals with this topic and presents a widely applied procedure.

1.2.2 Information extraction from process data: