
3 PROCESS HAZARD AND RISK ANALYSIS MANAGEMENT: A KNOWLEDGE BASED COST

3.1 OVERVIEW AND CRITICAL EVALUATION OF TOOLS OF HAZARD AND RISK ANALYSIS SUGGESTED

3.1.5 Fault tree analysis

Fault tree analysis (FTA) was initially developed in 1961 by H. A. Watson together with A. Mearns at Bell Laboratories for the Minuteman rocket launch system. As its applications widened, the “Fault Tree Handbook” [FTH_81], which served as a basis for the development of various methods and tools for FTA support, was published in the US in 1981.

Fault Tree Analysis should not be confused with Event Tree Analysis (ETA, see Chapter 3.1.6). Event Tree Analysis uses an inductive approach, i.e. one starts from a failure and searches for the undesirable consequences it leads to (bottom-up or forward analysis). The FTA method uses the opposite, deductive approach: the undesirable event is given and one searches for all of its causes (top-down or backward analysis).

Fault tree analysis is an effective tool for revealing the logical relations between failing components or subsystems. Those combinations of failures which lead to the undesirable event are to be avoided or, at least, the probability of their occurrence is to be minimised.

Fault tree analysis has the following objectives:

Systematic identification of all possible failure combinations (causes) leading to a given undesirable event (qualitative analysis);

Evaluation of the system reliability attributes (e.g. frequency of failures and failure combinations, frequency of occurrence of undesirable events, or unavailability of the system on demand) by calculating the reliability attributes of the units of the system (quantitative analysis).

The Fault Tree Analysis should be divided into eight subsequent steps:

1. Precise structural analysis of the system,

2. Determination of the undesirable events and the failure parameters,

3. Determination of the relevant probability parameters and the time intervals,

4. Determination of the component failure modes,

5. Creation of the fault tree,

6. Determination of the basic events (such as: type of failures, time of fault occurrence and unavailability),

7. Analysis of the fault tree,

8. Evaluation of the results.

Figure 10 shows an example of how Fault Tree Analysis works, according to IEC 61511, Volume 3 [IEC_511].

For a more detailed description of fault trees, see Josef Börcsök, Functional Safety, Basic Principles of Safety-related Systems [BÖR_08].
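As an illustration of the quantitative step, the following minimal sketch evaluates the top-event probability of a small fault tree built from AND and OR gates, assuming independent basic events. The tree structure, the event names and all probabilities are hypothetical values chosen for illustration; they are not taken from IEC 61511.

```python
from math import prod

def or_gate(probabilities):
    """P(at least one input event occurs), inputs assumed independent."""
    return 1.0 - prod(1.0 - p for p in probabilities)

def and_gate(probabilities):
    """P(all input events occur), inputs assumed independent."""
    return prod(probabilities)

# ASSUMPTION: hypothetical basic-event probabilities (illustrative only).
p_sensor_fails = 0.02
p_controller_fails = 0.01
p_valve_stuck = 0.05
p_relief_fails = 0.01

# The control loop fails if any one of its components fails (OR gate).
p_loop_fails = or_gate([p_sensor_fails, p_controller_fails, p_valve_stuck])

# Top event: the loop fails AND the relief device fails on demand (AND gate).
p_top = and_gate([p_loop_fails, p_relief_fails])

print(f"P(control loop fails) = {p_loop_fails:.5f}")
print(f"P(top event)          = {p_top:.7f}")
```

The OR gate is computed through the complement (no input occurs) rather than by summing the inputs, so the result stays a valid probability even when the basic-event probabilities are not small.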

Figure 10 Example of Fault Tree analysis

3.1.6 Event tree analysis

Event Tree Analysis (ETA) belongs to the inductive type of analysis. In these methods, subsequent events are derived from the initial events. The initial events are considered the basic events of the ETA. The subsequent events are in a simple cause-and-effect relation with the initial events and take place after the initial event has finished. ETA is important and is used for systems that can be described by cause-effect chains. In the process industry all events are of a similar nature, but using ETA for Hazard and Risk analysis is far too difficult because of the size of process plants; moreover, it does not allow complex situations to be analysed, since only one event is taken into consideration at a time.

[Figure 10 content: fault tree with the top event “Overpressure 0.1/year” and the basic events “Sensor fails”, “BPCS fails”, “Valve stuck”, “BPCS function fails” and “External event (fire)”. Legend: consequence, basic event, transfer gate.]

Event Tree Analysis is a simple and easy-to-implement technique. It starts with the initiating event, represented by the left part of the tree, and the tree then divides into several branches, which represent the subsequent events. Each branch leads to a situation with a different outcome, so the event tree can lead to different outcome scenarios.

Figure 11 shows a simple example of ETA. In this case I took into consideration the following events:

The starting event is the overpressure in a vessel, which may happen with a frequency of 10^-1/year. According to the corporate policy the target frequency is only 10^-4/year.

When the pressure rises, the first event is the high pressure alarm, followed by an operator intervention. If either of these actions fails, the next action is the opening of the relief valve. If the relief valve fails, an unexpected release will happen. This example is given in IEC 61511, Volume 3 [IEC_511], and it can be seen that in this example the release frequency is higher than the tolerable frequency of the Company.
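The branch arithmetic of this example can be sketched as follows. The initiating and target frequencies are those quoted above. The success probability of 0.9 for each of the three actions is an assumption made for this sketch, and the function and variable names are hypothetical.

```python
# Frequencies from the text (per year).
F_OVERPRESSURE = 1e-1
F_TARGET = 1e-4

# ASSUMPTION: each protective action succeeds with probability 0.9.
P_ALARM = 0.9      # high pressure alarm works
P_OPERATOR = 0.9   # operator intervenes correctly, given the alarm
P_RELIEF = 0.9     # relief valve opens on demand

def event_tree(f0, p_alarm, p_operator, p_relief):
    """Return (description, frequency per year, uncontrolled release?) per branch."""
    return [
        ("alarm and operator succeed",
         f0 * p_alarm * p_operator, False),
        ("operator fails, relief valve opens",
         f0 * p_alarm * (1 - p_operator) * p_relief, False),
        ("operator fails, relief valve fails",
         f0 * p_alarm * (1 - p_operator) * (1 - p_relief), True),
        ("alarm fails, relief valve opens",
         f0 * (1 - p_alarm) * p_relief, False),
        ("alarm fails, relief valve fails",
         f0 * (1 - p_alarm) * (1 - p_relief), True),
    ]

outcomes = event_tree(F_OVERPRESSURE, P_ALARM, P_OPERATOR, P_RELIEF)
f_release = sum(freq for _, freq, released in outcomes if released)

for description, freq, _ in outcomes:
    print(f"{description:38s} {freq:.1e}/year")
print(f"uncontrolled release: {f_release:.1e}/year, "
      f"target met: {f_release <= F_TARGET}")
```

Because the branches are mutually exclusive and exhaustive, their frequencies add up to the initiating frequency, which is a convenient consistency check on any hand-built event tree.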

The method for reaching the tolerable value given by the Company (i.e. for reducing the risk) is LOPA (Layer of Protection Analysis), see Chapter 3.1.7.

For a more detailed description of Event Tree Analysis, see Josef Börcsök, Functional Safety, Basic Principles of Safety-related Systems [BÖR_08].

Figure 11 Event Tree analysis example

3.1.7 LOPA

[Figure 11 content: event tree starting from “Overpressure” with the branch “High alarm” and the following outcomes: release from the prevention layer to environment (i.e. flare), 8x10^-3/year; 3 Release to environment, 9x10^-4/year; 4 Release from the prevention layer to environment (i.e. flare), 9x10^-3/year; 5 Release to environment, 1x10^-3/year.]

LOPA (Layer of Protection Analysis) is a modified event tree analysis. It is used for risk analysis in the chemical, petrochemical and oil and gas industries and was developed in the 1990s in the USA. LOPA determines the Safety Integrity Level (SIL) of SIFs for safety-oriented processing plants. The principle of protection levels, their number and their evaluation was first published by [CCPS_93]. The different protection levels (layers) within a plant fraught with risk are described most vividly by the onion-peel model. An example of an onion-peel model from the process industry is shown in Figure 12.
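How a target SIL follows from a frequency gap can be sketched as below. This is a minimal illustration, not the LOPA procedure of [CCPS_93]: the function name and the example frequencies are hypothetical, a single SIF is assumed to close the whole gap, and the PFDavg bands are the ones commonly tabulated for low-demand mode in IEC 61508/IEC 61511.

```python
# Low-demand mode bands: (SIL, lowest PFDavg inclusive, highest PFDavg exclusive).
SIL_BANDS = [(1, 1e-2, 1e-1), (2, 1e-3, 1e-2), (3, 1e-4, 1e-3), (4, 1e-5, 1e-4)]

def required_sil(f_unmitigated, f_target):
    """Return (max_pfd, sil) for one SIF that has to close the frequency gap.

    sil is 0 when the required risk reduction is below the SIL 1 band.
    """
    if f_unmitigated <= 0 or f_target <= 0:
        raise ValueError("frequencies must be positive")
    max_pfd = min(1.0, f_target / f_unmitigated)
    if max_pfd >= 1e-1:
        return max_pfd, 0
    for sil, low, high in SIL_BANDS:
        if low <= max_pfd < high:
            return max_pfd, sil
    raise ValueError("required risk reduction exceeds the SIL 4 band")

# ASSUMPTION: hypothetical unmitigated frequency of 1.9e-3/year, target 1e-4/year.
max_pfd, sil = required_sil(1.9e-3, 1e-4)
print(f"maximum PFDavg = {max_pfd:.4f}, "
      f"risk reduction factor = {1 / max_pfd:.0f}, SIL {sil}")
```

Frequencies that land exactly on a band boundary are sensitive to rounding, so in practice the analyst decides how such cases are assigned.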

Figure 12 Onion-peel-model of LOPA

The individual levels (layers) are independent and physically separated.

LOPA is very important in Hazard and Risk analysis, and one of my research questions concerned the cumulative LOPA method. That is why more details are given in Chapter 4.1.1.

3.1.8 Reliability Block Diagram analysis

The reliability block diagram (RBD) is a stable probability model that is quite easy to use for reliability and failure probability calculations [BÖR_08]. In the USA it is considered fundamental for network modelling [GOB_98]. Each block in the diagram represents a component of the system. The configuration of the blocks represents the logical relation between the potential failures of the components.

[Figure 12 content, layers of the onion-peel model from the inside out: Process; Control and monitoring (Basic Process Control System, BPCS); Prevention, for example ESD (mechanical prevention layer, process alarms with operator action, SIS); Mitigation, for example F&G (mechanical mitigation, SIS); Plant emergency plan; Public emergency plan.]

In practice there are two possible connections. One can connect the blocks in series (horizontal arrangement), meaning an AND connection, or in parallel (vertical arrangement), meaning an OR connection. The diagram can also depict complex component groups such as 2oo3 or 1oo2 voting, see Figure 13.

The failure rate of these blocks is determined by a particular mathematical calculation. The RBD is strictly mathematical and therefore easy to apply.

Figure 13 Reliability Block Diagram, 2oo3, 1oo2 voting example

In reality the application is restricted, because only mathematically computable events lead to a result. The Reliability Block Diagram (RBD) shows which elements of a system fulfil the demanded function and which might fail. The RBD is one of the most widely used methods for representing systems in graphical form. To prepare an RBD, the system is broken down into elements, each of which fulfils a specific task. Each block carries a reliability characteristic of its component. If a component has several types of failure, each type must be represented by its own block. This makes the application somewhat difficult.

There is an essential difference between an RBD and a function diagram: in an RBD, elements can occur several times even though they exist only once as hardware.

Serial and parallel structures are the simplest ways to link components. Figure 14 shows the linking of n components in a serial structure (AND structure), while Figure 15 shows the parallel structure (OR structure).

Figure 14 Linkage of n components into a serial structure

[Figure 14 content: blocks 1, 2, ..., n connected in series, for example Sensor, Logic Solver, Actuator.]

Assuming that these components are independent of each other, one can describe the probability of failure $F_S$ of a serial system with Equation 1.

Definitions:

$R_i$ = probability of success for component i

$F_i$ = probability of failure for component i

$R_S$ = probability of success for the system

$F_S$ = probability of failure for the system

For an n-component series system the probability of success of the system is:

$$R_S = \prod_{i=1}^{n} R_i, \qquad F_S = 1 - R_S$$

In a serial network it is generally simpler to work with success probabilities, and if the network is composed of nonrepairable components with constant failure rates (exponential function), it is possible to substitute

$$R_i = e^{-\lambda_i t}$$

where $\lambda_i$ is the failure rate of component i.

Thus, failure rates for components in a serial system can be added to obtain the failure rate for the system:

$$\lambda_S = \sum_{i=1}^{n} \lambda_i$$

Equation 4
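The additivity of failure rates in a serial system can be checked numerically. The sketch below assumes three nonrepairable components with constant failure rates; the rate values, the mission time and the function names are hypothetical and serve only as an illustration.

```python
from math import exp, isclose, prod

def series_reliability(reliabilities):
    """Probability of success of a series system of independent components."""
    return prod(reliabilities)

def series_failure_rate(failure_rates):
    """Failure rate of a series system with constant component failure rates."""
    return sum(failure_rates)

# ASSUMPTION: illustrative failure rates per hour for sensor, logic solver, actuator.
rates = [2e-6, 5e-7, 4e-6]
t = 8760.0  # mission time in hours (one year)

r_components = [exp(-lam * t) for lam in rates]
r_system = series_reliability(r_components)
lam_system = series_failure_rate(rates)

# The product of exponentials equals one exponential with the summed rate.
assert isclose(r_system, exp(-lam_system * t))

print(f"lambda_S = {lam_system:.2e} per hour")
print(f"R_S({t:.0f} h) = {r_system:.4f}, F_S = {1 - r_system:.4f}")
```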

For a parallel system, see Figure 15.

Figure 15 Linkage of n components into a parallel structure

For an n-component parallel system the probability of failure is given by:

$$F_S = \prod_{i=1}^{n} F_i$$

This equation comes from the fact that all components must fail for a parallel system to fail.

To obtain the probability of success for an n-component parallel system, the rule of complementary events is used:

$$R_S = 1 - F_S = 1 - \prod_{i=1}^{n} F_i$$
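For a parallel system the same kind of check is a one-line product over the component failure probabilities. In this minimal sketch the components are assumed independent, and the failure probability of 0.1 per component is a hypothetical value.

```python
from math import prod

def parallel_failure_probability(failure_probs):
    """A parallel system fails only if every component fails."""
    return prod(failure_probs)

def parallel_reliability(failure_probs):
    """Rule of complementary events: R_S = 1 - F_S."""
    return 1.0 - parallel_failure_probability(failure_probs)

# ASSUMPTION: three independent components, each failing with probability 0.1.
component_failure_probs = [0.1, 0.1, 0.1]

f_s = parallel_failure_probability(component_failure_probs)
r_s = parallel_reliability(component_failure_probs)
print(f"F_S = {f_s:.4f}, R_S = {r_s:.4f}")
```

Each added redundant component multiplies the system failure probability by that component's failure probability, which is why redundancy is effective only as long as the independence assumption holds.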

A simplified way to calculate the Rs and Fs values of a parallel system is to build up a “truth table”; see Table 15, where the truth table for a three-component parallel system is shown. It is assumed that all components have the same failure rate (λ) and that the components are independent of each other.

Table 15 Truth table for i = 3

The same method is used when one wants to evaluate a voting system, which is a parallel system.

Voting is expressed as the number of independent paths (M) required out of the total number of existing paths (N) in order to perform the safety function. Voting is often expressed as MooN, where:

M expresses the number of paths required by the voting,

N expresses the total number of redundant paths.

For example: 1oo2, 2oo3, 2oo4, etc.

Table 16 shows a 2oo3 voting system, i.e. two paths (M=2) out of three (total number of paths, N=3) are required to perform the safety function.

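Table 16 2oo3 voting system (λ = element failed, 1-λ = element working)

| Item | Element 1 | Element 2 | Element 3 | Result |
|------|-----------|-----------|-----------|--------|
| 01 | λ | λ | λ | Failure |
| 02 | λ | 1-λ | λ | Failure |
| 03 | 1-λ | λ | λ | Failure |
| 04 | 1-λ | 1-λ | λ | System Success |
| 05 | λ | λ | 1-λ | Failure |
| 06 | λ | 1-λ | 1-λ | System Success |
| 07 | 1-λ | λ | 1-λ | System Success |
| 08 | 1-λ | 1-λ | 1-λ | System Success |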

One can enter real probability-of-failure figures in this table and is then able to calculate the results (system success and failure).
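The truth-table method can be sketched in a few lines by enumerating every combination of working and failed elements and summing the probabilities of the rows in which at least M elements work. The function name and the failure probability of 0.1 per element are hypothetical; the elements are assumed independent.

```python
from itertools import product

def moon_probabilities(m, failure_probs):
    """Enumerate the truth table of an MooN system and return (R_S, F_S)."""
    r_s = 0.0
    # Each row assigns True (element works) or False (element failed).
    for states in product([True, False], repeat=len(failure_probs)):
        p_row = 1.0
        for works, f in zip(states, failure_probs):
            p_row *= (1.0 - f) if works else f
        if sum(states) >= m:  # at least M working paths: system success
            r_s += p_row
    return r_s, 1.0 - r_s

# ASSUMPTION: every element fails with probability 0.1.
f = [0.1, 0.1, 0.1]

r_2oo3, f_2oo3 = moon_probabilities(2, f)
r_1oo3, f_1oo3 = moon_probabilities(1, f)
print(f"2oo3: R_S = {r_2oo3:.3f}, F_S = {f_2oo3:.3f}")
print(f"1oo3: R_S = {r_1oo3:.3f}, F_S = {f_1oo3:.3f}")
```

With M = 1 the enumeration reduces to the plain parallel system, since only the row in which all elements have failed counts as a system failure.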

This model (see Figure 16) is suitable for calculating the failure probabilities of Independent Protection Layers (Chapter 4).

Figure 16 IPL as parallel system

[Figure 16 content: parallel blocks IPL 1 (alarm system), IPL 2 (operator action), ..., IPL n (relief valve).]

Table 15 shows how one calculates the efficiency of the independent protection layers by replacing the lambda values with PFD values:

$$F_S = \prod_{i=1}^{n} \mathrm{PFD}_i$$

$F_S$ means that the occurrence frequency of the unwanted event will be reduced by this factor.
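A minimal sketch of this calculation is given below: the combined PFD of the layers is their product, and the mitigated frequency is the initiating frequency multiplied by it. The initiating frequency of 0.1/year is the overpressure value used in the event tree example; the PFD values and the function name are hypothetical.

```python
from math import prod

def mitigated_frequency(initiating_frequency, pfds):
    """Frequency of the unwanted event when every IPL fails on demand."""
    return initiating_frequency * prod(pfds)

F_INITIATING = 0.1  # overpressure frequency per year

# ASSUMPTION: illustrative PFD values for three independent protection layers.
ipl_pfds = {
    "alarm system": 0.1,
    "operator action": 0.1,
    "relief valve": 0.01,
}

f_s = prod(ipl_pfds.values())
f_mitigated = mitigated_frequency(F_INITIATING, ipl_pfds.values())
print(f"F_S = {f_s:.1e}, mitigated frequency = {f_mitigated:.1e}/year")
```

The product is only valid if the layers are truly independent; a shared sensor or a shared operator would make the combined PFD larger than this figure.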

For more details about this type of application, see Chapter 4.

3.1.9 Markov Modelling

Until now I have discussed reliability models of systems without maintenance (except the voting systems).

In the case of repairable systems, which are typical in an industrial environment, another model applies. Repairable systems (voting systems or, in other words, fault-tolerant systems) offer many advantages in terms of system availability and safety.

Repairs take time. Simple reliability network modelling methods do not directly account for repair time. The method sought must account for realistic repair times and realistic system features, including self-diagnostics. The technique must apply both to systems that are fully repairable and to systems that are partially repairable.

Markov modelling, a reliability and safety modelling technique that uses state diagrams, fulfils these goals using only two simple symbols. Circles (states) show combinations of successfully operating components and failed components.

Possible component failures and repairs are shown with transition arcs, arrows that go from one state to another. A number of different combinations of failed and successful components are possible. It should be noted that multiple failure modes can be shown on one drawing.

A Markov model can show the entire operation of a fault-tolerant control system on a single drawing. If the model is created completely, it will show the full system success states. It will also show degraded states, where the system is still operating successfully but is vulnerable to further failures. The drawing will also show all failure modes.

Andrei Andreyevich Markov (1856-1922), a Russian mathematician, defined the Markov process, in which the future variable is determined by the present variable but is independent of its predecessors. These methods apply to the failure/repair process because combinations of failures create discrete system states. In addition, the failure/repair process moves between discrete states only as a result of the current state and the current failure.

The Markov model building technique involves the definition of all mutually exclusive success/failure states in a system. These are represented by labelled circles.

The system can transition from one state to another whenever a failure or a repair occurs. Transitions between states are shown with arrows (transition arcs) and are labelled with the appropriate failure or repair probabilities. This model is used to describe the behaviour of the system over time. If time is modelled in discrete increments (for example, once per hour), simulations can be run using the probabilities shown in the models.

Figure 17 Markov Model, Single nonrepairable Component

Figure 17 shows the Markov model for a nonrepairable component, while Figure 18 shows the Markov model for a repairable component. These two simple figures demonstrate the principle of Markov modelling. “λ” is the probability of failure, while “µ” is the probability of repair, both based on a time interval which matches the process under discussion.
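For the single nonrepairable component, stepping the model through discrete time increments can be sketched as follows. The failure probability per step and the number of steps are hypothetical values; the closed-form comparison in the last lines is standard for a constant per-step failure probability.

```python
def simulate_nonrepairable(lam, steps):
    """Propagate (P(OK), P(failed)) of a nonrepairable component step by step."""
    p_ok, p_failed = 1.0, 0.0  # the component starts in the OK state
    history = [(p_ok, p_failed)]
    for _ in range(steps):
        # From OK the component fails with probability lam; failed is absorbing.
        p_ok, p_failed = p_ok * (1.0 - lam), p_failed + p_ok * lam
        history.append((p_ok, p_failed))
    return history

# ASSUMPTION: failure probability of 0.01 per time step, 100 time steps.
LAMBDA = 0.01
STEPS = 100

history = simulate_nonrepairable(LAMBDA, STEPS)
p_ok_end, p_failed_end = history[-1]

# After k steps the survival probability is (1 - lambda)^k.
closed_form = (1.0 - LAMBDA) ** STEPS
print(f"P(OK) after {STEPS} steps = {p_ok_end:.4f} (closed form {closed_form:.4f})")
print(f"P(failed) after {STEPS} steps = {p_failed_end:.4f}")
```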

Figure 18 Markov Model, Single repairable Component

The Markov model can be represented by showing its probabilities in matrix form: in general an n*n matrix, and for the two-state model of Figure 18 the 2*2 matrix shown in Figure 19.

$$P = \begin{bmatrix} 1-\lambda & \lambda \\ \mu & 1-\mu \end{bmatrix}$$

Figure 19 Markov Model, 2*2 matrix

This matrix is known as the stochastic transition probability matrix; it is often called the “transition matrix” and denoted P.

Each row and each column represents one of the states.

In Figure 19, row 0 and column 0 represent state 0, while row 1 and column 1 represent state 1. If more states existed, they would be represented by additional rows and columns. The numerical entry in a given row and column is the probability of moving from the state represented by the row to the state represented by the column. The move from one state to another always refers to the basic time interval. The transition matrix contains all the necessary information about a Markov model. It is used as the starting point for further calculation methods.
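One such further calculation is to multiply a state probability vector repeatedly by the transition matrix. The sketch below does this for the single repairable component, taking state 0 as “OK” and state 1 as “failed”, so that P = [[1-λ, λ], [µ, 1-µ]]. The values of λ and µ and the helper name are hypothetical; the limiting availability µ/(λ+µ) is the standard result for this two-state chain.

```python
def step(state, matrix):
    """Row vector times matrix: next[j] = sum over i of state[i] * matrix[i][j]."""
    return [
        sum(state[i] * matrix[i][j] for i in range(len(state)))
        for j in range(len(matrix[0]))
    ]

# ASSUMPTION: illustrative failure and repair probabilities per time step.
lam, mu = 0.01, 0.1

# Row = current state, column = next state (state 0 = OK, state 1 = failed).
P = [
    [1.0 - lam, lam],
    [mu, 1.0 - mu],
]

state = [1.0, 0.0]  # the component starts in the OK state
for _ in range(1000):
    state = step(state, P)

steady_state_ok = mu / (lam + mu)
print(f"P(OK) after 1000 steps = {state[0]:.6f}")
print(f"steady-state availability = {steady_state_ok:.6f}")
```

Every row of the transition matrix has to sum to one, because from any state the system either stays where it is or moves to another state within the time interval.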

I used the Markov model in the dangerous undetected fault modelling, see Chapter 6.

3.1.10 HAZOP

HAZOP is very important in Hazard and Risk analysis, and one of my research questions concerned the HAZOP template method. That is why more details can be found in Chapter 3.1.

3.1.11 Comparison and evaluation of tools suggested by the standards

In this chapter I give a summary of the application and other features of the different methods discussed in the previous chapters, see Table 17.

Table 17 Evaluation and comparison of Hazard and Risk analysis methods

| Methods | Feature | Application area | Weakness | Strength |
|---------|---------|------------------|----------|----------|
| Risk Matrix | Qualitative | Up to SIL2 | Subjective, IPL not included | Easy to use |
| Risk Graph | Qualitative | Up to SIL2 | Subjective, IPL not included | Easy to use |
| Fault Tree | Quantitative | Control loop | Component level | Easy to use |
| Event Tree | Quantitative | Events | | IPL included, easy to use |
| LOPA | Semi-Quantitative | After HAZOP for complete plant | | IPL included, HAZOP/LOPA integration |
| Block diagram | Quantitative | Control loop | Nonrepairable system only | Used for LOPA calculation |
| Markov Model | Quantitative | Control loop | | |
| HAZOP | Qualitative | First step of Hazard and Risk analysis | Time consuming | Completeness |

One also has to distinguish the methods from a historical point of view. Nowadays the Risk Matrix and the Risk Graph are very seldom applied, exclusively below SIL 2 and as a draft approach.

Fault trees, reliability block diagrams and Markov models are useful for reliability calculations of control loops, even down to components, but not for SIL calculation.

The event tree is only useful for analysing subsequent events, but not for SIL calculation.

Only LOPA, after and together with a HAZOP study, is suitable for preparing a “correct” target SIL calculation. In this case the word “correct” means that the accuracy of the result is in balance with the effort invested.

Nowadays the process HAZOP study is the mandatory first step of Hazard and Risk analysis. The emphasis is on the word “process”.

Summary: an integrated HAZOP study and LOPA calculation is a state-of-the-art solution.

3.2 Overview of IEC 61882 HAZOP standard

Among the various methodologies for identifying and assessing risk, specific attention has been paid to HAZOP, formalised by Imperial Chemical Industries (ICI) at the end of the 1960s and subsequently developed to assess safety risks in process plants and to identify operational problems which, although not particularly dangerous, may seriously undermine plant performance [AIC_92], [AIC_85], [BIN_04], [FAN_00], [FEW_00], [KLE_76], [LAW_74], [LIN_01], [MUK_94], [RUS_94], [VIN_98].

In May 2001 the European Standard CEI IEC 61882 was published, which has the following main goals:

•“Identifying potential hazards in the system. The hazards involved may include both those essentially relevant only to the immediate area of the system and those with a much wider sphere of influence, e.g. some environmental hazards;

•Identifying potential operability problems with the system and in particular identifying causes of operational disturbances and production deviations likely to lead to nonconforming products.”

The HAZOP standard [IEC_882] describes one method and procedure for carrying out a HAZOP study. Preparing a HAZOP study is very time- and manpower-consuming, and it was originally done without any software supporting the HAZOP work documentation.

Meanwhile HAZOP software has been launched, but the method has remained unchanged.

The HAZOP team consists, as a minimum, of the following people:

HAZOP leader

HAZOP secretary

Operators

Technologist, process engineer

Mechanical engineer

Instrument engineer

The technique appears to be particularly useful in risk identification and assessment during the commissioning phase for the following reasons:

risk analysis in a dynamic and complex context such as commissioning requires an inductive approach able to identify a priori and in detail all negative events which may theoretically occur and not merely those which have occurred in the past. HAZOP is typically a bottom-up methodology and thus most suitable in this case;

to ensure that an inductive approach is effective, the systematic analysis must be exhaustive. In applying all the guide words to each node in the study, HAZOP fulfils this requirement;

HAZOP identifies all deviations which can actually occur and analyses the respective causes and consequences. This approach lends itself well to the identification of possible preventive and protective measures which can be implemented in the system.

The HAZOP (Hazard and Operability) study [KLE_76], [KLE_99] is a structured critical examination of a plant or process, either batch or continuous, undertaken by an experienced team of company staff, which seeks to identify systematically the risks, faults and operational problems that may compromise personal or environmental safety or plant operation, or even damage the business of the Company. Moreover, it can also assess the consequences of deviations from the design intent, taking into consideration all undesirable effects regarding safety, operability and the environment, and propose corrective actions and safeguards that reduce the severity of the consequences.

The procedure is based on the generation of a series of questions for submission to a multi-disciplinary team with expertise in the process under examination.

Then a combination of parameters and guide words is applied to all parts of the plant considered potentially dangerous. In addition to being particularly demanding from the point of view of the man-hours required, HAZOP studies have strong systematic and multi-disciplinary features typical of plant projects, and can thus be considered as small projects in themselves.

The possible deviations are generated by rigorous questioning, prompted by a series of standard ‘guidewords’ applied to the intended design.

The deviations from the intended design are generated by coupling the guideword with a variable parameter or characteristic of the plant or process,
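The coupling of guide words with parameters can be sketched as a simple Cartesian product over the study nodes. The seven guide words below are the basic ones listed in IEC 61882; the node names, the choice of parameters and the function name are hypothetical and only illustrate the mechanism.

```python
from itertools import product

# Basic guide words as listed in IEC 61882.
GUIDE_WORDS = ["NO", "MORE", "LESS", "AS WELL AS", "PART OF", "REVERSE", "OTHER THAN"]

# ASSUMPTION: illustrative process parameters and study nodes.
PARAMETERS = ["flow", "pressure", "temperature", "level"]
NODES = ["feed line", "reactor vessel"]

def generate_deviations(nodes, parameters, guide_words):
    """Return one candidate deviation per (node, parameter, guide word) triple."""
    return [
        {"node": node, "parameter": parameter, "guide_word": guide_word,
         "deviation": f"{guide_word} {parameter}"}
        for node, parameter, guide_word in product(nodes, parameters, guide_words)
    ]

deviations = generate_deviations(NODES, PARAMETERS, GUIDE_WORDS)
print(f"{len(deviations)} candidate deviations")
for row in deviations[:3]:
    print(f"{row['node']}: {row['deviation']}")
```

Not every generated combination is physically meaningful (REVERSE temperature, for instance), so the list is only a starting point that the HAZOP team screens during the sessions.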
