5 SIS DESIGN MANAGEMENT: PRACTICAL INTERPRETATION OF THE PROCESS SAFETY

5.3 COMMON CAUSE FAILURES

5.3.3 Common cause between IPLs

The independence of the IPLs is a well-known requirement. Everybody knows this criterion, but in my experience designers still build this mistake into the system in everyday practice.

I looked for why and where this mistake happens and found that it is often built in during the LOPA study phase, when the participants prepare the Safety Requirement Specification, which is the basis of the SIS design.

What is the reason for this mistake? The answer was easy to find. The HAZOP team neglects the basic principle of the IPL, i.e. that the cause of a hazard scenario can never be protected against by the cause itself.

For example, see Figure 37, which shows the level control and overfilling protection of a vessel.

The questions to be answered are whether LAH is a critical alarm, what independence means, what the correct solution is, and where the high level alarm signal should be connected.

In everyday practice these signals are connected to the BPCS. This means that in the case of a BPCS failure the LAH, as an independent protection layer, will not operate and will not send an alarm to warn the operator to take action to decrease the level in the vessel. BPCS failure is a common cause factor (involving the HMI) which should be taken into consideration in the SIL calculation.

A better solution is to connect the LAH signal to the Logic Solver, but in this case the SIS and the critical alarm are not independent. Because of the better reliability figures of the Logic Solver, the common cause factor in this solution is more or less negligible.

The correct, but more expensive solution, which matches the standards, is an independent Critical Alarm Management System with a separate HMI.
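The independence principle described above can be illustrated with a minimal sketch that lists, for each protection layer and for the initiating cause, the components it relies on, and then reports every pair that shares a component. The layer names and the component inventory below are assumptions made for illustration only; they are not taken from Figure 37 in detail.

```python
from itertools import combinations

# Hypothetical component inventory (assumed for illustration only).
LAYERS = {
    "initiating cause: level control loop": {"LT", "BPCS", "LCV"},
    "critical alarm LAH via BPCS": {"LAH", "BPCS", "BPCS HMI"},
    "SIF overfill protection": {"LS1", "LS2", "Logic Solver", "SV"},
}


def shared_components(layers):
    """Return {(layer_a, layer_b): shared components} for each overlapping pair."""
    overlaps = {}
    for (name_a, parts_a), (name_b, parts_b) in combinations(layers.items(), 2):
        common = parts_a & parts_b  # set intersection = potential common cause
        if common:
            overlaps[(name_a, name_b)] = common
    return overlaps


for (a, b), common in shared_components(LAYERS).items():
    print(f"NOT independent: '{a}' and '{b}' share {sorted(common)}")
```

With this assumed inventory the alarm routed through the BPCS is flagged, because it shares the BPCS with the initiating cause. Moving the alarm to its own alarm system and HMI removes the overlap.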

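Figure 37 Example of common cause of IPLs
[Figure labels: LT, LAH, SV, LCV, LS1, LS2, Logic Solver, BPCS, Independent Protection Layer (IPL)]

5.4 System behaviour on detection of fault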

On detection of a dangerous fault (by diagnostic tests, proof tests or by any other means) there are two aspects of safety system behaviour. The target is to maintain the functional safety of the system in the case of a single hardware fault, with either:

a specified action to achieve or maintain a safe state, or

continued safe operation of the process whilst the faulty part is repaired.

If the repair of the faulty part is not completed within the mean time to restoration (MTTR) assumed in the calculation of the probability of random hardware failure, then a specified action shall take place to achieve or maintain a safe state. The specified action (fault reaction) required to achieve or maintain a safe state should be specified in the safety requirements. It may consist, for example, of the safe shutdown of the process or of that part of the process which relies, for risk reduction, on the faulty subsystem, or of other specified mitigation planning.
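The MTTR rule above can be sketched as a small decision function. This is only an illustration: the function name, its parameters and the returned texts are hypothetical, and the 8 hour MTTR in the usage lines is an assumed figure, not a value from the standard.

```python
def fault_reaction(hours_since_detection: float,
                   assumed_mttr_hours: float,
                   repaired: bool = False) -> str:
    """Decide the reaction to a detected dangerous fault (illustrative only)."""
    if repaired:
        return "normal operation"
    if hours_since_detection <= assumed_mttr_hours:
        # Repair is still within the MTTR assumed in the failure probability calculation.
        return "continue operation while the faulty part is repaired"
    # MTTR exceeded: the action specified in the safety requirements applies.
    return "take the specified action to achieve or maintain a safe state"


# Usage with an assumed MTTR of 8 hours.
print(fault_reaction(3, 8))
print(fault_reaction(12, 8))
```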

There are some important rules for the designer:

when the above actions depend on an operator taking specific actions in response to an alarm (for example, opening or closing a valve), then the alarm shall be considered as a part of the safety instrumented system (i.e., independent of the BPCS);

where the above actions depend on an operator notifying maintenance to repair a faulty system in response to a diagnostic alarm, this diagnostic alarm may be a part of the BPCS but shall be subject to appropriate proof testing and management of change along with the rest of the SIS.

In case of detection of a dangerous fault (by diagnostic tests, proof tests or by any other means) in any subsystem having no redundancy, the repair of the faulty subsystem shall be done within the mean time to restoration (MTTR) period assumed in the calculation of the probability of random hardware failure. During this time the continuing safety of the process shall be ensured by additional measures and constraints. The risk reduction provided by these measures and constraints shall be at least equal to the risk reduction provided by the safety instrumented system in the absence of any faults.

More about faults and failures can be found in Chapter 6.

5.4.1 Hardware Fault Tolerance and its realisation

There are two possibilities:

Using IEC 61508-2, which specifies the factors and the extent of fault tolerance required

Using IEC 61511-1 and IEC 61511-2, in which it was considered that the requirements for fault tolerance of field devices and non-PE logic solvers could be simplified, so that the requirements in IEC 61511-1 can be applied as an alternative.

It should also be noted that subsystem designs may require more component redundancy than is stated in Table 21 and Table 22 in order to satisfy availability requirements.

The requirements for hardware fault tolerance can apply to individual components or subsystems required to achieve the SIL of a SIF. For example, in the case of a sensor subsystem comprising a number of redundant sensors, the fault tolerance requirement applies to the sensor subsystem in total, not to individual sensors.

The SIS designer shall use Table 21 and Table 22 in designing the SIF loops according to the SRS, independently of the solutions used inside subsystems or logic solvers.

A 2oo4 voting system inside a logic solver CPU does not mean that the figures of Table 22 are satisfied for a SIL 3 SIF loop.

The figures of Table 21 and Table 22 refer to the HFT value of the SIF loops and not to the component level itself. For example, if a transmitter uses redundancy inside its electronics to reach the SIL 3 category and it is used in a SIL 3 SIF loop, then according to Table 21 the SIF loop shall still be designed with HFT = 1.

5.4.2 Hardware fault tolerance

Hardware fault tolerance is the ability of a component or subsystem to continue to be able to undertake the required safety instrumented function in the presence of one or more dangerous faults in hardware. A hardware fault tolerance of one means that there are, for example, two devices and the architecture is such that the dangerous failure of one of the two components or subsystems does not prevent the safety action from occurring.

The minimum hardware fault tolerance has been defined to alleviate potential shortcomings in SIF design that may occur due to the number of assumptions made in the design of the SIF, along with uncertainty in the failure rate of components or subsystems used in various process applications.

It is important to note that the hardware fault tolerance requirements represent the minimum component or subsystem redundancy. Depending on the application, component failure rate and proof-testing interval, additional redundancy may be required to satisfy the SIL of the SIF, i.e. to match the target value of the probability of failure on demand and/or the risk reduction factor.

The traditional approach of safety system design was to ensure that no single fault would result in loss of the intended function. System architectures such as 1oo2 or 2oo3 have a fault tolerance of one because they are able to function on demand even in the presence of one dangerous fault. Such systems were employed as a standard approach for safety systems to ensure they were sufficiently robust to be able to withstand random hardware failures. Fault tolerance architectures also gave protection to a wide range of systematic faults (mainly in hardware) because such faults do not necessarily arise at the same instant of time.

Because of the different levels of performance it is no longer appropriate to expect all safety integrity levels to be fault tolerant. In selecting the architecture to be used for a specified integrity level it is however important to ensure that it is sufficiently robust for both random hardware faults and systematic faults.

The requirements for hardware fault tolerance can apply to individual components or subsystems required to perform a SIF. For example, in the case of a sensor subsystem comprising a number of redundant sensors, the fault tolerance requirement applies to the sensor subsystem in total, not to individual sensors.

5.4.3 Minimum hardware fault tolerance of PE logic solvers

IEC 61511-1 gives the designer a minimum hardware fault tolerance requirement for logic solvers, shown in Table 21.

Table 21 HFT for Logic Solver

SIL   Minimum Hardware Fault Tolerance
      SFF < 60 %   SFF 60 % to 90 %   SFF > 90 %
1     1            0                  0
2     2            1                  0
3     3            2                  1
4     Special requirements apply (see IEC 61508)

The hardware fault tolerance requirement depends on the required SIL of the SIF and the PE subsystem’s safe failure fraction (SFF). Information on the safe failure fraction of logic solvers can normally be obtained from the PE logic solver vendor. If the PE logic solver is not used according to the assumptions made in the calculation of the SFF, then the claims made for the safe failure fraction should be carefully considered.

The SFF is related to random hardware failures only. In establishing the SFF it is acceptable to assume that the subsystem has been properly selected for the application and is adequately installed, commissioned and maintained, such that early life failures and age-related failures may be excluded from the assessment.

Human factors do not need to be considered when determining SFF. Data sources and assumptions made during a calculation of SFF should be documented.
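The use of Table 21 can be sketched as a simple lookup. The function name and the handling of the band boundaries (60 % and 90 % are both treated as belonging to the middle column) are assumptions made for this illustration; the numbers themselves are those of Table 21.

```python
# Table 21: minimum HFT for PE logic solvers, keyed by SIL.
# Tuple order: (SFF < 60 %, SFF 60 % to 90 %, SFF > 90 %).
TABLE_21 = {1: (1, 0, 0), 2: (2, 1, 0), 3: (3, 2, 1)}


def min_hft_logic_solver(sil: int, sff: float) -> int:
    """Return the minimum HFT from Table 21 for a SIL and an SFF given as a fraction."""
    if sil == 4:
        raise ValueError("SIL 4: special requirements apply (see IEC 61508)")
    if sil not in TABLE_21:
        raise ValueError(f"unknown SIL: {sil}")
    if not 0.0 <= sff <= 1.0:
        raise ValueError("SFF must be given as a fraction between 0 and 1")
    low, mid, high = TABLE_21[sil]
    if sff < 0.60:
        return low
    if sff <= 0.90:  # assumption: both boundaries belong to the middle band
        return mid
    return high


print(min_hft_logic_solver(2, 0.75))  # SIL 2, SFF in the 60 % to 90 % band
print(min_hft_logic_solver(3, 0.95))  # SIL 3, SFF above 90 %
```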

5.4.4 Minimum hardware fault tolerance of sensors and final elements

Table 22 of IEC 61511-1 defines the basic level of fault tolerance for sensors, final elements, and non-PE logic solvers having the required SIL claim limit in the first column. The requirements in Table 22 are based on the requirements in IEC 61508-2 for PE devices with an SFF between 60 and 90 %. They assume that the dominant failure mode is to the safe state or that dangerous failures are detected.

Table 22 HFT for sensor, final elements subsystems

SIL   Minimum Hardware Fault Tolerance
1     0
2     1
3     2
4     Special requirements apply (see IEC 61508)

The designer can satisfy the minimum HFT values by using a voting system.

There are several reasons why redundancy or voting is designed into a SIS:

Increasing the availability of the system

Making maintenance work more practical

Matching the hardware fault tolerance values prescribed by the standards

What does voting mean? Voting is expressed as:

Number of independent paths (M) required out of the total number of existing paths (N) in order to perform the safety function

Voting is often expressed as MooN

M expresses the number of paths required (the voting number)

N expresses the total number of redundant paths

For example: 1oo2, 2oo3, 2oo4, etc.

Hardware fault tolerance of N means

Hardware fault tolerance is easy to calculate

For any MooN system the HFT=N-M

For example, for a 2oo3 system the HFT = 3 - 2 = 1

Table 23 Voting and HFT

Architecture   Voting   Redundancy   HFT
1oo1           1        NO           0
2oo2           2        NO           0
1oo2           1        1            1
2oo3           2        1            1
1oo3           1        2            2
2oo4           2        2            2

From the dangerous failure point of view the following architectures have the same hardware fault tolerance:

1oo1 and 2oo2 (HFT = 0)

1oo2 and 2oo3 (HFT = 1)

1oo3 and 2oo4 (HFT = 2)
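The rule HFT = N - M can be checked against Table 23 with a short sketch. The function name and the string parsing are assumptions made for illustration; only the formula comes from the text.

```python
import re


def hft_from_voting(architecture: str) -> int:
    """Return HFT = N - M for an architecture written as 'MooN', e.g. '2oo3'."""
    match = re.fullmatch(r"(\d+)oo(\d+)", architecture)
    if match is None:
        raise ValueError(f"not a MooN expression: {architecture!r}")
    m, n = int(match.group(1)), int(match.group(2))
    if not 1 <= m <= n:
        raise ValueError("M must be between 1 and N")
    return n - m


# Reproduce the HFT column of Table 23.
for arch in ("1oo1", "2oo2", "1oo2", "2oo3", "1oo3", "2oo4"):
    print(arch, "HFT =", hft_from_voting(arch))
```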

5.4.5 Exception for hardware fault tolerance in case of sensors and final elements

For all subsystems (for example, sensor, final elements and non-PE logic solvers) excluding PE logic solvers the minimum fault tolerance specified in Table 22 may be reduced by one if the devices used comply with all of the following:

the hardware of the device is selected on the basis of prior use (see Chapter 5.4.7);

the device allows adjustment of process-related parameters only (for example, measuring range, upscale or downscale failure direction) under operation;

the adjustment of the process-related parameters of the device is protected, for example by jumper or password (“write protected”), and all actions regarding parameter modification are well documented;

the function has a SIL requirement of less than 4.

This sub-clause allows the hardware fault tolerance of all subsystems except PE logic solvers to be reduced by one on certain conditions. These conditions will apply to devices such as valves or smart transmitters and reduce the likelihood of systematic failures such that the requirements are aligned to the requirements of IEC 61508-2 for non PE devices.

In some cases it may be possible to reduce the fault tolerance by following the fault tolerance requirements of IEC 61508-2. This may be achieved by introducing additional diagnostics, such as signal comparison or regularly scheduled partial stroke testing, such that the SFF of the subsystems is higher.

My conclusion is that all field elements having a “SIL” certificate according to IEC 61508-2 conform to the “prior in use” criteria, so in safety applications the HFT values in Table 22 may be reduced by one.
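The exception described in this chapter can be sketched as Table 22 with an optional reduction by one. The function and parameter names are hypothetical, and flooring the reduced value at zero is an assumption of this sketch; the table values and the three conditions are taken from the text above.

```python
# Table 22: minimum HFT for sensors, final elements and non-PE logic solvers.
TABLE_22 = {1: 0, 2: 1, 3: 2}


def min_hft_field_device(sil: int,
                         prior_use: bool = False,
                         process_params_only: bool = False,
                         params_protected: bool = False) -> int:
    """Return the minimum HFT from Table 22, reduced by one if all conditions hold."""
    if sil == 4:
        raise ValueError("SIL 4: special requirements apply (see IEC 61508)")
    if sil not in TABLE_22:
        raise ValueError(f"unknown SIL: {sil}")
    hft = TABLE_22[sil]
    if prior_use and process_params_only and params_protected:
        # Assumption of this sketch: the reduced value cannot go below zero.
        hft = max(0, hft - 1)
    return hft


print(min_hft_field_device(3))                    # no exception claimed
print(min_hft_field_device(3, True, True, True))  # all three conditions met
```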

5.4.6 Minimum hardware fault tolerance according to IEC 61508

In Chapter 5.4.5 there was a reference to IEC 61508-2. Let us see what IEC 61508-2 contains. The SFF definition is in

Alternative fault tolerance requirements may be used provided that an assessment is made in accordance with the requirements of IEC 61508-2, shown in Table 24, in the context of hardware safety integrity. The highest safety integrity level that can be claimed for a safety function is limited by the hardware fault tolerance and safe failure fraction of the subsystems that carry out that safety function.

The architectural constraints have been included in order to achieve a sufficiently robust architecture, taking into account the level of subsystem complexity.

The architecture and subsystem derived to meet the hardware fault tolerance requirements is that used under normal operating conditions.

From the application point of view, whether IEC 61508 or IEC 61511 is taken into consideration, in practice a hardware fault tolerance of one is needed for the SIF when SIL = 3.

Table 24 Hardware safety integrity: architectural constraints on type B safety-related subsystems

According to Chapter 5.4.5, the designer may design a SIS on a “prior in use” basis.

There are very few field devices (sensors and valves) that are designed per IEC 61508-2 and IEC 61508-3, but their number is increasing. Users and designers will therefore have to depend more heavily on using field devices that have been “proven-in-use”.

The basis of this solution is that in the case of field devices (for example, sensors and final elements) fulfilling a given function, this function is usually identical in safety and non-safety applications, which means that the devices will perform in a similar way in both types of applications. Therefore, consideration of the performance of such devices in non-safety applications should also be deemed to satisfy this requirement.

The first criterion for a “prior in use” application is that appropriate evidence shall be available that the components and subsystems are suitable for use in the safety instrumented system.

In the case of field elements, the extensive operating experience behind a “prior in use” claim may come from either safety or non-safety applications. This experience can be used as a basis for the evidence.

The level of detail of the evidence should be in accordance with the complexity of the considered component or subsystem and with the probability of failure necessary to achieve the required safety integrity level of the safety instrumented function(s). The failure probabilities are important because without them the validation calculation of the SIF loops cannot be prepared.

That is why a statement such as “this field device is proven in use” is not enough in itself.

Many users have a list of instruments that are approved or recommended for use in their facility. These lists have been established by extensive successful operating experience on their BPCS. Sensors and valves that have had a history of not performing as desired have been eliminated.

Normally the sensors and valves that are on these approved or recommended lists for the BPCS could also be considered as proven-in-use for SIS subject to the assessment required by 61511-1. This list of instruments should include the version of the device and be supported by documented monitoring of field returns at the user and at the manufacturer. In addition, the manufacturer should have a modification process which evaluates the impact of reported failures and modifications.

If such a list does not exist, then users and designers need to conduct an assessment of the sensors and valves to satisfy themselves that the instruments will perform as desired.

In practice the “prior in use” approach is more accepted and more widely used for field sensors and actuators than for logic solvers.

It is important to know that every safety system shall be validated after installation and commissioning. Anyone who wants to validate a SIF using a “proven in use” component may run into trouble because no lambda value is available for the probability calculation. Reducing the HFT value by one is an advantage of the “proven in use” approach, but on the other hand the standard sets many criteria for the lambda values of the component which are not easy to fulfil, and without correct numbers nobody is able to validate the safety instrumented system.

5.4.8 Role of diagnostics

The standards state how diagnostics are to be performed and what the operator has to do after a diagnostic alarm, but they say nothing about how to interpret the special case when, for example, the transmitter is “smart”, i.e. it has built-in diagnostic features, but the diagnostic signals are not accessible to the operator. In this case the diagnostic part of the lambda value does not exist.

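As it was shown, the safe failure fraction (Equation 17) will be modified accordingly:

$$\mathrm{SFF} = \frac{\lambda_{SD} + \lambda_{SU} + \lambda_{DD}}{\lambda_{SD} + \lambda_{SU} + \lambda_{DD} + \lambda_{DU}}$$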

In a simple case where all lambda values are equal, SFF = 0.75 with diagnostics, while without any diagnostic action SFF = 0.5.
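The two figures can be reproduced with a short sketch, assuming the usual definition of the safe failure fraction as the sum of the safe and the dangerous detected failure rates divided by the total failure rate. The function name and the failure rate of 1.0e-6 per hour are arbitrary assumptions made for illustration. Without usable diagnostics the dangerous detected failures are counted as dangerous undetected.

```python
def safe_failure_fraction(lam_sd: float, lam_su: float,
                          lam_dd: float, lam_du: float) -> float:
    """SFF = (lam_SD + lam_SU + lam_DD) / (lam_SD + lam_SU + lam_DD + lam_DU)."""
    total = lam_sd + lam_su + lam_dd + lam_du
    if total <= 0:
        raise ValueError("the total failure rate must be positive")
    return (lam_sd + lam_su + lam_dd) / total


lam = 1.0e-6  # assumed failure rate per hour, identical for all four categories

# Diagnostics accessible: the detected dangerous failures count as DD.
with_diagnostics = safe_failure_fraction(lam, lam, lam, lam)

# Diagnostics not accessible: the former DD share becomes undetected (DU).
without_diagnostics = safe_failure_fraction(lam, lam, 0.0, 2 * lam)

print(f"SFF with diagnostics:    {with_diagnostics:.2f}")
print(f"SFF without diagnostics: {without_diagnostics:.2f}")
```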

Based on calculations with realistic figures, the lack of diagnostic action may mean half of the SIL value, and according to Table 24 an SFF < 60 % means that one has to be added to the HFT value in the table. In practice, if the requested SIL = 2 and diagnostics are installed, HFT = 1; with no diagnostic action, HFT = 2.

Conclusion: applying HART or Foundation Fieldbus components is not enough; a diagnostic data acquisition system also has to be installed, with actions performed by the operators and maintenance personnel.

5.4.9 Requirements for selection of components and subsystems

For the SIS designer the standard defines two levels of requirements:

specifying the requirements for the selection of components or subsystems which are to be used as part of a safety instrumented system,

specifying the requirements to enable a component or subsystem to be integrated in the architecture of a SIS.

The first criterion to be checked is the hardware fault tolerance. It should be determined SIF by SIF.

The second criterion is to decide whether the proven-in-use or the certified component method shall be used. When certified components are used, the only task left for the designer is to prepare a pre-validation SIF by SIF. Using the proven-in-use method is not the designer's decision but the customer's, and it therefore becomes the responsibility of the customer.

5.5 SIS Design verification

According to IEC 61511, every life cycle phase has to be verified.
