This article was first published in SAFETY SYSTEMS, the Safety Critical Systems Club Newsletter, Volume 8, Number 3 (May 1999).
Newsletter enquiries should be made to the editor, Mr Felix Redmill, 22 Onslow Gardens, London, N10 3JU; tel.: +44 (0) 20 8883 0789, e-mail: felix.redmill@ncl.ac.uk.
Enquiries concerning other aspects of the Safety Critical Systems Club should be directed to
Mrs Joan Atkinson, Centre for Software Reliability, University of Newcastle upon Tyne, NE1 7RU; tel: +44 (0) 191 221 2222, fax +44 (0) 191 222 7995, e-mail: csr@newcastle.ac.uk.
The article was reprinted in The Hazards Forum Newsletter, issue no. 27, Summer 1999.
Further information regarding any articles in that issue is available from Miss Darlene Torey or Dr Ian Lawrenson at
The Hazards Forum, 1 Great George Street, London SW1P 3AA; tel: +44 (0) 20 7665 2158, fax +44 (0) 20 7233 1806, e-mail: torey_d@ice.org.uk.

A discussion of Risk Tolerance Principles

by Odd Nordland

Introduction

There are a number of approaches used to identify whether or not the risk posed by a given technical system is "tolerable". They are all attempts to define objective, rational criteria for determining whether or not enough has been done to eliminate risks in order to fulfil public expectations and demands. Where the risks cannot be completely eliminated, they should at least be reduced to such a low level that the general public would be willing to tolerate them and still accept deployment of the system. Let’s first take a brief look at some of the currently used principles.

Currently used principles

The French GAMAB principle ("Globalement Au Moins Aussi Bon": globally at least as good) assumes that there is already an "acceptable" solution and requires that any new solution shall in total be at least as good. The expression "in total" is important here, because it gives room for trade-offs: an individual aspect of the safety system may indeed be worsened if it is overcompensated for by an improvement elsewhere. GAMAB is closely related to the British ALARP principle, as we shall see in a moment.

The ALARP principle (the residual risk shall be As Low As Reasonably Practicable) is the only one described in IEC 61508, in Annex B to Part 5 of the standard. That annex is of informative rather than normative character, so it should not be concluded that the ALARP principle is the only one that is compliant with the standard.

The ALARP principle assumes that one "knows" a level of risk that is acceptable to the general public and requires that the risk posed by any new system shall at least be below that level. How far below is where the term "reasonably practicable" comes in: theoretically, an infinite amount of effort could reduce the risk to an infinitely low level, but an infinite amount of effort will be infinitely expensive to implement. So we have to identify a second level of risk that is so low that the public will accept that "it’s not worth the cost" to reduce it further.

Now associating risk reduction with cost tends to be misunderstood. Of course, if achieving a safe system is prohibitively expensive, the system just won’t be built. But cost is not just a matter of money.

Let’s take a rather primitive example to illustrate this. If the risk of being involved in a train accident at high speed is unacceptably high, one way of reducing it would be to reduce the speed of the trains. This, however, would mean that the duration of a train trip would increase, so that the exposure time to the risk of a low-speed accident would also increase. In other words, the cost of reducing the risk of high-speed accidents would be an increase in the risk of low-speed accidents. Now if that increased risk at low speeds results in an increase of the total risk, then the cost of reducing high-speed risks would be deemed too high.

Now this is effectively what the GAMAB principle says: if the increase of low-speed risk is less than the decrease in high-speed risk, the change is acceptable. So the GAMAB principle is basically the same as the ALARP principle!
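The train example can be put into numbers. The following Python sketch uses purely hypothetical risk figures (arbitrary units, not real accident data) and a hypothetical helper, gamab_acceptable, to illustrate the "in total" comparison: a worsening in one category is accepted only if the sum over all categories does not increase.

```python
def gamab_acceptable(old_risks, new_risks):
    """Return True if the new solution's total risk does not exceed the old total."""
    return sum(new_risks.values()) <= sum(old_risks.values())


# Hypothetical annual risk contributions in arbitrary units (assumed values).
current = {"high_speed": 8.0, "low_speed": 2.0}
slower_trains = {"high_speed": 5.0, "low_speed": 4.0}

# Low-speed risk rises by 2, high-speed risk falls by 3: the total drops from 10 to 9.
print(gamab_acceptable(current, slower_trains))  # True
```

Had the low-speed risk risen to 6.0 instead, the total would have grown to 11 and the change would be rejected.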

The German MEM (Minimum Endogenous Mortality) principle starts off with the fact that there are various age-dependent death rates in our society and that a portion of each death rate is caused by technological systems. The requirement is then that a new system shall not "significantly" increase a technologically caused death rate for any age group. Ultimately, this means that the age group with the lowest technologically caused death rate, the group of 5- to 15-year-olds, is the reference level.

In the CENELEC pre-standard prEN 50126, the reference mortality rate is given as 2 × 10^-4 fatalities per person and year and the limit for "significantly" augmenting this rate is given as at most 10^-5 fatalities per person and year, i.e. 5% of the reference value.
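The arithmetic behind the 5% figure can be checked in a few lines. This Python sketch uses only the two rates quoted above; the function mem_acceptable and the example mortality values passed to it are hypothetical and serve only to illustrate how the limit might be applied.

```python
# Rates quoted in the text from prEN 50126 (fatalities per person and year).
REFERENCE_MORTALITY = 2e-4
MAX_ADDED_MORTALITY = 1e-5


def mem_acceptable(added_mortality):
    """Hypothetical check: does a new system stay within the MEM limit?"""
    return added_mortality <= MAX_ADDED_MORTALITY


share = MAX_ADDED_MORTALITY / REFERENCE_MORTALITY
print(f"Permitted increase: {share:.0%} of the reference rate")

print(mem_acceptable(5e-6))  # True: below the limit (assumed example value)
print(mem_acceptable(3e-5))  # False: above the limit (assumed example value)
```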

Differential Risk Aversion

In the introduction it was pointed out that the various approaches are attempts to identify objective, rational ways of determining whether or not a risk is acceptable. Nevertheless, they have to take irrational, emotional factors into account.

People tend to be more willing to accept a risk if they think that they can directly influence how strongly it affects them. They are willing to accept a horrendous death toll on the roads, because they directly control the cars they drive. But for public transport, they are much more demanding: if they’re going to put their lives into the hands of somebody else, then every precaution must be taken to protect them!

They also tend to view accidents singly. If a single accident can cause a gigantic catastrophe, it will be much less acceptable than a horde of small accidents that each have apparently minor effects, even if the total is much worse. When a thousand people get killed in a single accident, it is taken much more seriously than ten thousand deaths in fifty thousand road accidents spread over a year.

This effect is taken into consideration in most countries by introducing a "Differential Risk Aversion" (DRA). Basically, it is assumed that accidents up to a certain severity can be regarded as being equally serious, severity being interpreted in terms of the death toll. Above a certain threshold, people will react increasingly negatively, so their willingness to tolerate the associated risk will decrease accordingly.

In Britain and Germany, for example, a linear relationship is applied to the DRA, i.e. the decrease in risk acceptability is directly proportional to the increase in the potential death toll. Not all countries express this explicitly, whilst some are even more demanding. The Dutch, for example, use a DRA that is proportional to the square of the potential death toll.
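The difference between a linear and a quadratic aversion can be sketched as follows. This is only one possible reading of the description above: the function name dra_factor, the normalisation by the threshold and the threshold value of 10 are all assumptions made for illustration, not figures taken from any national regulation.

```python
def dra_factor(death_toll, threshold, exponent):
    """Aversion factor: 1 up to the threshold, then growing with the death toll.

    exponent=1 gives a linear aversion, exponent=2 a quadratic one.
    Dividing by the threshold is an assumed normalisation so that the
    factor is continuous at the threshold.
    """
    if death_toll <= threshold:
        return 1.0
    return (death_toll / threshold) ** exponent


THRESHOLD = 10  # hypothetical death toll above which aversion sets in

for toll in (5, 10, 100, 1000):
    linear = dra_factor(toll, THRESHOLD, exponent=1)
    quadratic = dra_factor(toll, THRESHOLD, exponent=2)
    print(f"{toll:>5} deaths: linear {linear:>8.1f}, quadratic {quadratic:>8.1f}")
```

Under these assumptions, an accident with 100 potential deaths is weighted 10 times more heavily with a linear aversion and 100 times more heavily with a quadratic one.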

From the above we can see that there is a risk level that is so high that people will categorically refuse to accept it. Such risks are intolerable. But there is also a level that is so low that people regard the risk as being negligible. The region between these two levels is where the tolerable risks lie, those non-negligible risks that people are willing to live with.

Determining tolerable risks

What sort of acceptance criteria should be used then? Let’s start with a look at the railway sector, because it’s an area where most people have a fairly clear impression of what kind of risk they’re willing to tolerate!

Engineers and railway authorities tend to use accidents or casualties per passenger kilometre as their unit of reference for any comparison with other systems, be it a neighbouring country's railway system or other traffic systems. This quantity is certainly a very rational, calculable entity that does say something about traffic density and accident rates, but its use as a reference for acceptability is questionable. For manned space travel, the colossal distances involved result in exceptionally high figures for passenger kilometres and correspondingly low figures for the risk expressed as casualties per passenger kilometre. On that basis, space travel is an exceptionally safe affair, in strong contrast to the safety efforts of NASA, ESA and all the other space agencies!

From the point of view of the "man in the street", it doesn't matter how far he got before being killed. He's more concerned about how often he can take a train without risking his life. So some combination of mortalities per accident and accidents per trip (not kilometre!) will interest him.
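The contrast between the two reference units can be made concrete. The Python sketch below compares casualties per passenger kilometre with casualties per journey for a railway and for space travel. All figures are invented for illustration and are not real statistics; each journey is assumed to be one passenger's trip, so that passenger kilometres are journeys multiplied by kilometres per journey.

```python
def casualties_per_passenger_km(casualties, journeys, km_per_journey):
    """Casualties divided by the total distance travelled by all passengers."""
    return casualties / (journeys * km_per_journey)


def casualties_per_journey(casualties, journeys):
    """Casualties divided by the number of journeys made."""
    return casualties / journeys


# Hypothetical figures (assumptions, not measured data).
rail = {"casualties": 10, "journeys": 1_000_000, "km_per_journey": 50}
space = {"casualties": 1, "journeys": 100, "km_per_journey": 5_000_000}

for name, system in (("rail", rail), ("space", space)):
    per_km = casualties_per_passenger_km(**system)
    per_trip = casualties_per_journey(system["casualties"], system["journeys"])
    print(f"{name}: {per_km:.1e} per passenger-km, {per_trip:.1e} per journey")
```

With these invented numbers, space travel looks far safer than rail per passenger kilometre but far more dangerous per journey, which is the point made above.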

He also has a chronically short memory! If accidents occur once every ten years, even if they are full-scale catastrophes they will still be regarded in isolation. If accidents occur once a month, even if there are no mortalities at all, the system will be regarded as unacceptably unsafe.

There are differences between countries and societies. It was pointed out earlier that the Dutch, for example, use a stronger DRA than the Germans or British. And a look at the public attitude to traffic safety in underdeveloped countries shows that there is virtually no awareness of the risks involved in public or private transport!

Finally, it is also a political issue. When the population is highly "tuned in" to a discussion of safety, the willingness to accept residual risks will decrease. On the other hand, if a technology is considered to be of vital importance, people will be more willing to accept the risks it entails. So the benefit provided by the system will influence people's willingness to accept the risk.

Thus we see that the tolerable risk level for a railway system must be some function of:

  • the average number of Casualties per Accident, C/A;
  • the average number of Accidents per Journey, A/J;
  • the distribution of accidents over time, dA/dt;
  • a differential risk aversion factor, f(C); and
  • a factor b describing the benefit provided by the system.

The willingness to tolerate the risk will decrease when any of the above factors except b increases. It will increase with b, so assuming equal weighting of each of the factors, we can define the tolerability T as

    T = b / (C/A * A/J * dA/dt * f(C))
      = b / (C/J * dA/dt * f(C))

A high value of T means that people will in general be willing to tolerate the risk; it corresponds to a low risk level. A low value of T means something has to be done to reduce the risk; it corresponds to a high risk level.
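The expression for T can be written as a small function. This is a minimal Python sketch of the formula above; the parameter names and the input values are hypothetical, chosen only so that the arithmetic is easy to follow.

```python
def tolerability(benefit, casualties_per_accident, accidents_per_journey,
                 accident_rate, aversion=1.0):
    """T = b / (C/A * A/J * dA/dt * f(C)), with f(C) defaulting to 1."""
    casualties_per_journey = casualties_per_accident * accidents_per_journey
    return benefit / (casualties_per_journey * accident_rate * aversion)


# Hypothetical inputs in arbitrary units (assumed values).
base = tolerability(benefit=1.0, casualties_per_accident=2.0,
                    accidents_per_journey=0.5, accident_rate=4.0)
print(base)  # 0.25

# Doubling the aversion factor halves T; doubling the benefit doubles it.
print(tolerability(1.0, 2.0, 0.5, 4.0, aversion=2.0))  # 0.125
print(tolerability(2.0, 2.0, 0.5, 4.0))                # 0.5
```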

Note that neither the total number of passengers nor the distance travelled (i.e. passenger kilometres) goes into this equation! The average number of Casualties per Journey (C/J) and the Accident rate (dA/dt) are measurable quantities. The differential risk aversion factor f(C) will not only be different for each country, it may even differ in different regions of a country. In addition, it will vary over time, reflecting changes in politics and people's thinking.

For simplicity, we will restrict ourselves to a single geopolitical region and set f(C) to 1 in the equation. The Accident rate (dA/dt) is of course dependent on what one calls an accident, which in turn is (indirectly) dependent on the definition of "casualty". Generally, the term casualty is restricted to mean any case where a person needs medical attention, i.e. we exclude environmental or financial damages but include non-mortal injuries, so an accident in our sense is any event where a railway system is the cause of a casualty. It is then possible to determine the accident rate as the number of such accidents per year and to identify the number of casualties from those accidents.

The number of journeys is not simply the number of different routes multiplied by the number of departures per route, because people may make several journeys over parts of the same route, for example. The number of journeys is equal to the number of tickets that are sold! (Free trips or forfeited tickets shouldn't influence the statistics noticeably.)

The above considerations were made for railway systems, but they can of course be applied to any other form of public or even private transport. For other technologies, we need an appropriate interpretation of the term "journey". In most cases, "journeys" will correspond to operational time, but it should be pointed out that this includes downtime caused by accidents! Only planned pauses in operation (e.g. maintenance shutdowns) should be excluded. So "casualties per journey" becomes "casualties per potential operational hour" and the tolerability becomes

    T = b / (dC/dt * dA/dt)

Note that the term 'casualties' can be extended to include environmental and financial damages without influencing the underlying relationship.

It was pointed out earlier that a high tolerability corresponds to a low risk level and vice versa. And since we usually talk about risk reduction rather than tolerability increase, it is more practical to refer to high or low risk levels rather than low and high tolerability levels.

So we simply invert the expression to get an expression for the risk level:

    r = 1 / T
      = (dC/dt * dA/dt) / b
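As a minimal Python sketch, the risk level can be computed directly from the casualty rate, the accident rate and the benefit factor. The function name, the guard against a non-positive benefit and the input values are assumptions added for illustration.

```python
def risk_level(casualty_rate, accident_rate, benefit):
    """r = (dC/dt * dA/dt) / b, the inverse of the tolerability T."""
    if benefit <= 0:
        raise ValueError("benefit must be positive")  # assumed guard, not in the text
    return (casualty_rate * accident_rate) / benefit


# Hypothetical rates in arbitrary units (assumed values).
print(risk_level(casualty_rate=2.0, accident_rate=4.0, benefit=1.0))  # 8.0
print(risk_level(casualty_rate=2.0, accident_rate=4.0, benefit=2.0))  # 4.0
```

Doubling the benefit halves the risk level, while doubling either rate doubles it.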

Conclusion

We have found an expression that relates risk levels to the casualty rate, the accident rate and the benefit provided by the system involved, regardless of what kind of system we’re looking at. The higher the risk level, the less tolerable the risk will be.

There will always be a threshold risk level, above which a risk is considered to be completely intolerable, and risk reduction measures must always aim at getting the risk down below that threshold.

But there is also a lower threshold value, below which risks are considered negligible. Where these thresholds lie depends on social, political and geographical factors. Between the two thresholds we have a region where risks will generally be tolerated. Whether or not additional risk reduction will be demanded for such tolerable risks is ultimately a political question.
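The three regions can be expressed as a simple classification. In this Python sketch the function name and both threshold values are hypothetical; as noted above, real thresholds depend on social, political and geographical factors.

```python
def classify_risk(risk, negligible_below, intolerable_above):
    """Place a risk level in one of the three regions described in the text."""
    if risk > intolerable_above:
        return "intolerable"
    if risk < negligible_below:
        return "negligible"
    return "tolerable"


# Hypothetical thresholds in the same arbitrary units as the risk level.
NEGLIGIBLE_BELOW = 1.0
INTOLERABLE_ABOVE = 10.0

for r in (0.5, 4.0, 25.0):
    print(r, classify_risk(r, NEGLIGIBLE_BELOW, INTOLERABLE_ABOVE))
```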