Showing posts with the label Reason.

Thursday, 2 August 2012

Some thoughts... Swiss Cheese Model

By now it should be widely known and recognized within the safety community that the SCM does have some serious shortcomings or drawbacks, or at least that a number of misperceptions have led to wrong application. Reason himself is among those to acknowledge this and has, together with Hollnagel and others, written a paper on the subject. I’d advise anyone to read this 2006 Eurocontrol report “Revisiting the Swiss Cheese Model” that is freely available on the net.

A recently published book gives a rather deep and detailed critical discussion of the SCM. I won't go into that level of detail because it would require an extensive study of Reason's work and I don't have time for that. Personally I have never seen the SCM as an accident model in the same manner as e.g. the dominos, especially since I have never seen how the SCM can practically be used in an investigation, other than as a frame of mind for checking which barriers may have failed. The SCM is to me at most something that “describes how an accident could happen”, as Reason et al. have said. When I mention the model myself, I use it to explain (multiple) barriers and how failure of these can lead to accidents.

The SCM also shows that a failure upstream (call it management) can be stopped along the way (e.g. by competent and alert personnel), yet on ‘a bad day’ (another employee, stress, etc.) an accident may be the consequence. One huge drawback of the SCM is obviously that many versions exist. Some versions (e.g. the 1997 version from “Managing The Risks…”) are very flexible and don’t put the layers of barriers in any strict order, which would even allow an explanation of incidents that start without management failures. Other versions, however, have various layers with designated ‘categories’ that, read strictly, could suggest that the SCM says all accidents are due to management failures.

It has been suggested that the SCM is an updated version of Bird. I don’t see that at all. It must be an assumption or conclusion that is not explained at any length, but it is probably linked to the fact that both tend to look towards management factors as the root causes of accidents. Even so, I think that Bird’s sequence and the SCM are quite different at heart. Reason’s first two books reference Bird only once, and even then he does not refer to Bird’s domino sequence, but to Bird’s updated pyramid (see “Managing The Risks Of Organizational Accidents”, p. 224).

One important difference is that Bird shows us a sequence of causes leading up to an accident and its consequences/loss. The SCM pictures a series of barriers with possibilities for failure (note that the SCM also pictures ‘holes’ that are not relevant for the accident!) which, when several failures line up, can lead to an accident because all layers of protection have been breached. If anything, the SCM is about the spaces between Bird’s dominos. Another difference is that the domino sequence in a way pictures the mechanism by which upstream factors may cause the next domino to fall, and thus shows some causal relationship. The SCM does not show the mechanisms behind the causes (holes). Contrary to the Bird sequence viewed in a strict sense, the SCM does not have holes in one slice of cheese affecting holes in the other slices.

Barriers

An understanding of barriers is important for discussing the SCM. In Norwegian railway safety legislation a barrier is defined as: “technical, operational, organisational or other planned and implemented measures that are intended to break an identified unwanted chain of events” (Sikkerhetsstyringsforskriften, 1-3). Other standards and legislation contain similar wording; e.g. ISO 17776:2000 (“Guidelines on tools and techniques for hazard identification and risk assessment”) defines a barrier as a “measure which reduces the probability of realizing a hazard’s potential for harm and which reduces its consequence” and explains that “barriers may be physical (materials, protective devices, shields, segregation, etc.) or non-physical (procedures, inspection, training, drills, etc.)”. When we speak of barriers (and also of causes, by the way) in my company we think MTO: man, technology and organisation.

Probably everyone has experienced that barriers are (often) not perfect, and barriers such as the ones listed as examples by ISO 17776 can fail from one moment to the next. One can choose to follow a procedure, or one can decide to take the shortcut, making the rule-barrier useless. This mechanism doesn’t only apply to the ‘softer’ barriers. It also extends to technical barriers that can be rendered useless on a whim, for example when we don’t wear seatbelts or safety goggles or when a safety barrier is bridged.

When observing a system we have to study it as the combination of man, machines, procedures and other elements. While it’s possible to see the parts as separate items, with man as one system and the machine as another, and man not being a part of the machine-system (although one can argue, for example, that the dead man’s switch in a train does unite the two), this view of separate systems is not very useful. A man working with a machine creates a new system that is built up from several sub-systems. This, in my view, gives a much clearer view of systems, and also of machines.

Having this in the back of our minds we should conclude that the SCM actually gives a pretty good mental model of how (or at least: that) barriers can fail - while recognizing its weaknesses at the same time, of course. I do strongly question the SCM’s use as an accident model.

By the way…

I never really understood why the model had to be based on something smelly and yucky like cheese in the first place. Let me propose something more tasteful:

Some thoughts... Management Failure versus Free Will

An interesting point that I read some time ago is that management failure theory would conflict with the notion of free will: if management failures are the root causes of all accidents, then management failures, and not free will, are also the cause of human errors. I’m not sure if this very strict reasoning is fully correct. Even if it does sound logical, it doesn’t feel right to me. In my opinion at least Bird’s dominos and also Reason’s Swiss Cheese Model (both proponents of multi-causality) do not exclude free will from entering the causal chain. The basic causes in Bird’s sequence are divided into job factors and personal factors, the latter explicitly including things like motivation, which is at least one personal factor that is clearly related to free will. I’m rather sure that Reason has similar mechanisms, but I’m too lazy to check.

One might point out that the leftmost ‘domino’ (management) still causes the fall of the next (which includes the personal factors) and that the sequence does not account for causes materializing halfway through. I sometimes get the impression that models are treated as if they were laws of nature that have to apply in each and every situation, in exactly the same way and the same order (a hair-raising example is the treatment by some people of Heinrich’s ratios, expecting to find the same ratios everywhere). But that’s hardly realistic. Heck, that’s why they’re called models - a simplified representation of reality that does not describe each and every possibility.

I live in the belief that nobody rises in the morning and goes to his job with the intention to screw up massively and create an accident. There are others much more qualified than me to write about human error (including violations) and its causes (and luckily they have done so), but roughly I’d say that two important reasons/causes for human error or violations are found in: 1) people’s overly optimistic perception of their control over the situation (remember that about 90% of all drivers believe that they’re better than average drivers) and 2) especially conflicting objectives. These are ‘causes behind the cause’ for deliberate acts, and for safety work it is essential to identify them in order to determine preventive actions that keep future errors and accidents from happening.

For a legal case it may be sufficient to stop once it has been established as a cause that someone willingly chose not to follow a safety rule. Acting on that single act (e.g. by punishing the violator, or explaining the rule to him once more) is often very ineffective from a safety point of view. The decision not to follow a safety rule may be a deliberate act of free will, but there may have been incentives and other mechanisms behind this decision. In case of conflicting objectives (e.g. the company claims ‘Safety First’, but rewards people for cutting corners in order to maximize profits) it’s more effective to address those causes behind the causes.

When studying safety many of us have learned not just to focus on an error by an employee, stop the investigation there and blame the person. Instead we’re taught to look further than the person at fault. But the opposite is true as well: defaulting to management failures as causes for accidents is not the way things should be done. That’s a kind of jumping to conclusions without any basis in reality or facts, and it is just as bad as Heinrich’s decision to stop at the direct cause and focus on that alone. A comment I heard recently was along the lines of “we’re not satisfied unless we have found at least one management cause”. I’m convinced that this was said with the very best of intentions for the improvement of safety, but even this well-meaning framing of your mind is going to bias the result in a way that should not be acceptable. Remember what Hollnagel said: WYLFIWYG (What You Look For Is What You Find)! I believe he said it in relation to ‘human error’, but it applies to anything. Some causes are simply not management induced and that’s that.

Some people go a step further and reject the existence of management failures entirely, among other reasons because these are human failures. Agreed, in the end management/organisational failures are decisions of men and thus human failures, but they are of another kind. I find it helpful to distinguish between (direct) causes on a more personal level (call it sharp end/operational, if you want) and (underlying) causes on an organisational or management level that are further upstream. Additionally, it’s sometimes hard or impossible to determine what the ‘deliberate act’ or more or less active failure in the management was.

Take for example the accident that happened at Sjursøya on 23 March 2010, something that has taken a considerable amount of my working hours since. The full report of the Norwegian Accident Board (a fairly good one, even though I do not support all of the recommendations) is online; check it out for details (available in English). Short version: operational error(s) directly caused a runaway set of goods wagons which rolled unstoppably downhill. There is a 100-metre difference in height between the point of origin and the ending point, 8 km further down in the harbour of Oslo. There the wagons, travelling at about 130 km/h, crashed into a building, killing three people and injuring several others. The outcome could have been even worse, by the way, since the wagons might have hit wagons with jet fuel had the accident happened at another time.

What lies behind the operational error (I’ll do the simplified version) is that over the last two decades the use of the goods terminal had slowly been changed without anyone noticing and many baby steps ended up in using the terminal differently than intended. Working procedures and local risk analysis had not followed the same development and thus safety barriers had unknowingly been eroded in such a way that a rather simple error could lead to such a drama.

I tend to be very critical towards the labeling of deficiencies in risk analysis as causes - often this is a sign of lazy or unrealistic safety people, since you will almost always find something related to the accident that is not in the risk analysis (a complete risk analysis being a fiction and not very helpful either). But here it certainly was the case. Together with other management factors this created, over many years (impossible to pin down to persons, acts or dates), the situation as it was on the date of the accident, without anybody being really aware of the situation sliding towards the breaking point.

While blaming the operational personnel that had ‘violated’ long forgotten rules would have been a possibility (unnecessary, because they blamed themselves enough as it is), it was decided to look at the underlying factors. I don’t want to discuss the legal part, but the DA also chose not to take legal action against the operational personnel. The companies involved, however, were fined. My company paid the fine without any appeal (contrary to the other company involved). Negative side effect: several millions of Norwegian kroner down the drain that could not be used for prevention and improvement.

Comments and discussion appreciated!

Some thoughts... Organisational Accidents

In “Managing The Risks Of Organizational Accidents”, Reason describes organisational accidents as having “multiple causes involving many people operating at different levels of the business”. For individual accidents “a specific person or group is often both the agent and the victim of the accident”. Reason adds that the distinction between the two is hard to draw - an individual accident may turn out to be an organisational accident. As said above, the book sees the entire term as a myth because the ‘default search for management failures’ would turn any individual accident into an organisational accident.

I don’t know if Reason actually intended his description to be used as a proper definition. One may wonder how useful this is anyhow, because one could separate the two only in retrospect - after an investigation. Strictly defined or not, to my understanding the distinction was more one between relatively straightforward cases and more complex cases with sometimes less linear and less clear causal chains. The only real value of the term is thus that it makes us aware of this possible complexity.

Comments? Opinions?