SILs are defined in terms of failure rates, and there is a widespread misconception that this means the failure rates of the equipment that is achieving safety. This is usually wrong: SILs depend on the failure rates of a function, not of a piece of equipment that is part of the implementation of that function. In order to understand this, let us start with the concept of safety integrity.
Take for example a transportation system. Its function is to move goods and/or people from one place to another. But because of Murphy's Law, we need additional functions to prevent the system from damaging or destroying the goods, or from harming or killing the people. These are the safety functions.
Now there are various ways of implementing safety functions. We can use administrative procedures, design properties, monitoring and control systems, etc. For example, there must be two people to push the button and trigger a nuclear attack, and they both have to go through a ritualised checklist to make sure that they really are authorised to press that button. This ritual is an example of an administrative implementation of a safety function.
As an example of a design-based implementation of a safety function, stamping machines can usually only be operated with two hands, so the operator cannot get a hand crushed on the job. And a simple grid can keep other people's hands away.
And then we have the complex control systems used in, for example, nuclear and chemical plants. All these different measures for implementing a safety function can be, and often are, used simultaneously. Together they constitute a safety system.
There are cases where individual measures alone are not sufficient to implement a safety function and a combination of suitable measures is necessary. This is typically the case in complex control systems, where individual elements of the system are necessary but not sufficient for a safety function. For example, a pressure monitoring system will not be sufficient to prevent a boiler from exploding, but it can be used to trigger a relief valve when the pressure exceeds a certain limit. So implementation of the safety function "prevent explosion" requires three measures: installation of a pressure monitor, installation of a relief valve and a communication link between the pressure monitor and the relief valve. In this simple case, failure of any one of the three implementation measures would result in loss of the safety function. By using two relief valves, we would retain the safety function if one of the valves fails.
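The boiler example can be sketched numerically. This is only an illustration and not a method prescribed by any standard: it assumes that the measures fail independently of each other, the failure probabilities per demand are made up, and the function names are hypothetical.

```python
from math import prod


def all_required(failure_probs):
    """Failure probability of a chain in which every measure is needed.

    The chain fails as soon as any one measure fails (independence assumed).
    """
    return 1.0 - prod(1.0 - p for p in failure_probs)


def any_sufficient(failure_probs):
    """Failure probability of redundant measures where one working unit is enough.

    The group fails only if all of its members fail (independence assumed).
    """
    return prod(failure_probs)


# Hypothetical failure probabilities per demand, chosen only for illustration.
P_MONITOR = 0.01
P_LINK = 0.001
P_VALVE = 0.02

# One monitor, one communication link, one relief valve: all three are needed.
single_valve = all_required([P_MONITOR, P_LINK, P_VALVE])

# Same chain, but with two relief valves of which one is sufficient.
two_valves = all_required([P_MONITOR, P_LINK, any_sufficient([P_VALVE, P_VALVE])])

print(f"function failure, one valve:  {single_valve:.5f}")
print(f"function failure, two valves: {two_valves:.5f}")
```

With these numbers the function as a whole is more likely to fail than any single measure in the chain, and the second valve improves the function without changing the failure probability of either valve.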
Now safety systems are also subject to Murphy's Law, so we have to reckon with some of the implementation measures for a given function failing to work. This may not necessarily result in total loss of the safety function, because the loss of one implementation measure may be compensated by another (e.g. the second valve in the example above). In other words, the safety function may continue to work (possibly more weakly) in spite of partial loss of its implementation.
This ability of a safety function to continue to be effective in spite of deterioration or loss of its implementation measures is Safety Integrity.
Take for example a motor car. One of its safety functions is the braking function. The purpose of the braking function is to reduce the car's speed so that we will not crash into walls, skid off the roads or mow down pedestrians. This function is implemented by a braking system, speed limits and limitations on the power of the engine.
Another safety function is the "containment" function, whose purpose is to prevent us from falling out of the car while it is moving. It is implemented by the doors, safety belts, and the shape of the seats.
Now if you were given the choice of driving through mountainous countryside with lots of hairpin bends in a car without brakes or one without doors, which would you choose?
From experience you know that you don't usually get flung against the doors every time a car drives round a bend, so you will assume that the chances of falling out of the car without doors on a hairpin bend are smaller than the chances of flying off the road with the brakeless car. So you'll choose the car without doors.
Now what you've just done is demand a higher level of integrity for the braking function, because you feel the risk that it tackles is greater than the risk that the containment function tackles. This, however, is equivalent to saying that you are less willing to accept the risk of braking failure. In other words, the safety integrity level you demand is determined by your willingness to accept the risk involved.
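r = (dC/dt * dA/dt) * (f(C)/b)
Here r is the risk, C the casualties, A the accidents, t the time, f(C) the differential risk aversion factor and b the benefit.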
A look at the units indicates the kinds of factors that will influence risk acceptability and hence also the determination of SILs.
Risk is defined as severity of an accident times the likelihood of the accident (see e.g. [ref.1]). In our case, severity boils down to casualties per accident, and likelihood to accidents per time, so risk becomes casualties per time:
[r] = [C/A]*[A/t] = [C/t]
Now looking at the units from the equation, we then get
[C/t] = [C/t]*[A/t]*[f(C)/b]
so the quotient f(C)/b must have the unit [t/A].
The differential risk aversion factor depends on the casualties per accident. It will also depend on the operational time, because if the operational time is so long that people feel it is more likely for such accidents to occur, their aversion will be greater when they do occur.
So the unit for f will be something like [C/A/t]. The relationship is not necessarily linear, so we may have to include powers of these units!
The benefit will certainly increase with time (the longer the benefit lasts, the better it is), so the unit for f(C)/b is now something like [C/A/t * 1/t]=[C/A/t/t], which still isn't [t/A]. However, in the function f(C) we must include geopolitical factors such as the population's awareness of risk, their current living conditions, which might make them much more willing to take chances than we are, etc. For example, people living in an area with extensive environmental damage will accept a much higher pollution rate from a technological system than people living in clean, healthy surroundings, if the technological system reduces the already existing environmental damage. So we will need additional factors in the units for f(C).
We also need some additional components for b, because the benefit provided by a system will also depend on similar geopolitical factors. People living in a country that has to import electric power will consider nuclear power plants to be much more beneficial than would be the case for people living in a country with an abundance of alternative power sources.
But without knowing exactly what they are, we can see that the quotient of the missing dimensions for f(C) and the missing dimensions for b must end up giving us [t/A]. It would far exceed the scope of this article to go into a detailed discussion of how many factors contribute and how, but it is clear that risk acceptability depends on a large variety of factors that will vary with time, place, society etc. One of those factors will be the particular technology that is involved, so risk acceptance and hence SILs will also vary according to the technology involved.
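The unit bookkeeping above can be checked mechanically. The following sketch takes the tentative units from the text at face value ([C/A/t] for f(C) and [t] for b); the dictionary representation and the helper names are assumptions made for this illustration only.

```python
from collections import Counter


def multiply(*units):
    """Multiply units given as {base unit: exponent} mappings."""
    total = Counter()
    for unit in units:
        for base, exponent in unit.items():
            total[base] += exponent
    return {base: exp for base, exp in total.items() if exp != 0}


def invert(unit):
    """Return the reciprocal of a unit."""
    return {base: -exp for base, exp in unit.items()}


# Base quantities from the text: C = casualties, A = accidents, t = time.
RISK = {"C": 1, "t": -1}           # [C/t]
CASUALTY_RATE = {"C": 1, "t": -1}  # dC/dt
ACCIDENT_RATE = {"A": 1, "t": -1}  # dA/dt

# Unit that f(C)/b must have so that the equation yields [C/t].
required = multiply(RISK, invert(CASUALTY_RATE), invert(ACCIDENT_RATE))

# Unit of f(C)/b found so far: [C/A/t] for f(C) times [1/t] for 1/b.
found = multiply({"C": 1, "A": -1, "t": -1}, {"t": -1})

# Dimensions that the additional factors would have to contribute.
missing = multiply(required, invert(found))

print("required:", required)
print("found:   ", found)
print("missing: ", missing)
```

Under these assumptions the required unit is [t/A], and the additional factors in f(C) and b would together have to contribute [t^3/C]. Since the text notes that the relationships need not be linear, this is a consistency check and not a derivation.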
This is a complex process. Determining the risk level is a reasonably straightforward exercise, and IEC 61508 gives guidance on how to do this. But determining which level of risk is acceptable is, as we have seen, a much more complicated business, and the standard says little about how to do the job.
We must then identify the safety functions that we will need in order to reduce each risk to an acceptable level. Only then can we start associating safety integrity levels with the safety functions, based on the acceptability criteria for the risk involved. Finally, we must determine the measures we need to implement each safety function and the degree to which each measure contributes to the implementation of that function. Based on the safety integrity level for that function we can finally determine failure rates for the individual implementation measures. These will not be the same as the failure rates for the safety function unless we only have one single measure to implement the function.
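To illustrate the last point, here is a minimal sketch of one simple way to derive failure rates for the measures from the failure rate tolerated for the function. Equal apportionment is an assumption chosen for simplicity and is not taken from IEC 61508; the target rate and the function name are hypothetical.

```python
def equal_apportionment(function_rate, n_measures):
    """Split a function's tolerable failure rate equally over its measures.

    Assumes that all measures are needed (the function fails if any one of
    them fails), that failures are independent, and that the rates are small
    enough for the rate of the function to be close to the sum of the rates.
    """
    if n_measures < 1:
        raise ValueError("a safety function needs at least one measure")
    return function_rate / n_measures


# Hypothetical tolerable failure rate of the safety function, per hour.
FUNCTION_RATE = 1e-6

for n in (1, 3):
    per_measure = equal_apportionment(FUNCTION_RATE, n)
    print(f"{n} measure(s): {per_measure:.2e} failures per hour each")
```

Only in the single-measure case does the measure inherit the failure rate of the function. With three necessary measures, each one has to be better than the function it helps to implement.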
[2] O. Nordland, "A discussion of risk tolerance principles", The Safety-Critical Systems Club Newsletter, Volume 8, Number 3, May 1999, page 1; reprinted in The Hazards Forum Newsletter, Issue No. 27, Summer 1999, page 2.