HEACH: August 2012

Monday 20 August 2012

Are We Blundering Our Way Through Safety?

A recommended read here. Will have to buy Zachary Shore's book myself soon...

Tuesday 14 August 2012

Zero accidents = safe!

So... because nobody was hurt when this happened, this practice is perfectly safe.

Hmmm... something tells me that there's something fishy with that logic...

Wednesday 8 August 2012

How to write an Academic Article - Quick Guide

If you ever have to write an academic article, follow the below guideline and you're safe:

Introduction

The following article describes the study into [insert theme] which is an important framework for [insert theory] which has been the [insert either: ‘leading paradigm in the field for the past years’, or ‘paradigm that has undergone a shift in latter years’]. This paper describes the study into [brief description here].

The Study

[Describe the methodology, research and results, use a lot of quotes and references, avoid as much as possible original research, but add some statistics and results. Use complicated words at will and sentences shall be combined into long complicated structures with many adjectives and commas. One or two scattergraphs are cool.]

Conclusion

This study has given us greater understanding of [insert theory]. More research is recommended.

Make sure the title is impossibly long and about some ridiculous detail of your research. Like: "The Influence of the Lovelife of the Madagascar Blowfly on Safety Consciousness in a Postmodern Society within a Brownian Motion Model"

Follow APA rules for layout to make it additionally unappealing (you wouldn't want anybody accidentally reading this; being published should be sufficient).

Thursday 2 August 2012

Some thoughts... 'The' Cause (?)

The people who know me, know that I tend to reject the notion of ‘the’ cause. Let’s explore the concept a bit. I believe there are three main variations on ‘the’ cause.

1. A simple philosophy about causal relationships. Everything that happens has just one cause.

2. The elevation of one particular cause as the most important.

3. A very strict definition of the word cause.

Re 1: Monocausality

I do believe in the existence of simple accidents with a straightforward linear causal sequence (and have experienced some), not unlike the one below where each effect is basically the cause of the next effect until we come to the final consequence. Working backwards from the final consequence we’ll have a continuous sequence of why - because relationships. E.g. I’m on my way out (context, not cause), change my mind and turn around abruptly only to bump into the door that closes behind me. Some cases might be so simple that the causal chain may even be restricted to just one cause and one effect…

Often, however, the world is not quite that simple and causal paths develop more or less independently, only to join up at some point causing some outcome. Please see the example discussed elsewhere on this blog; I don’t see how that can be turned into one linear sequence without dismissing a number of essential factors.

Re 2: More important causes

Are some causes more important than others? In some sense, yes. Especially if one defines preventive and corrective actions to remove them one may want to focus more on one cause than on others. But that’s then ‘more important’ in the sense of prioritizing resources and actions, not ‘more important’ in a causal sense.

In a causal sense it’s a bit more difficult… Leaving aside the discussion of whether an underlying (or root) cause is more important than a direct cause (we may get back to that issue another day), I find it hard to say that cause A is more important than cause B because it caused the incident more than the other cause… If an investigation has established that both were necessary for the incident to happen, I cannot maintain that one should be ranked above the other.

Take the lab example: both not wearing goggles and mixing the ingredients wrongly are needed for chemical substances hitting the eye of the victim. How can one be more important than the other? Sure one might argue that the wrong mixing is the point of loss of control, or the first barrier breached. But still, both are needed for the defined incident.
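The "both are needed" argument can be sketched in a few lines. This is a minimal illustration only: the function names and the two boolean factors are a simplification of the lab example and are not taken from any investigation method. The incident is modelled as happening only when both factors are present, and each factor is then removed in turn to see whether the incident still occurs.

```python
def chemical_hits_eye(no_goggles: bool, wrong_mixing: bool) -> bool:
    """Simplified lab example: the incident needs both factors together."""
    return no_goggles and wrong_mixing


def is_necessary(factor: str) -> bool:
    """Does removing this one factor (and nothing else) prevent the incident?"""
    actual = {"no_goggles": True, "wrong_mixing": True}
    without = dict(actual, **{factor: False})
    return chemical_hits_eye(**actual) and not chemical_hits_eye(**without)


# Check each established factor in turn
necessity = {f: is_necessary(f) for f in ("no_goggles", "wrong_mixing")}
print(necessity)
```

Both factors come out as necessary, and nothing in the check offers a way to rank one above the other. Any ranking has to come from somewhere else, such as how resources for preventive action are prioritized.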

For now I remain unconvinced that some causes are more important than others. But please supply viewpoints of your own.

Re 3: Strict definitions

Some define the word cause very strictly, for example by only allowing a deliberate act, or discarding conditions as causes (see elsewhere). Or by limiting the meaning to the direct cause (which in effect boils down to option 1). The added value of this isn’t quite clear to me so far - apart from possible legal use… or as an alternative to what I often would call a direct cause (like in the lab example). But my opinion on the value may change as discussion progresses and knowledge grows…

Some thoughts... Swiss Cheese Model

By now it should be widely known and recognized within the safety community that the SCM does have some serious shortcomings or drawbacks, or at least that a number of misperceptions have led to wrong application. Reason himself is among those to acknowledge this and has, together with Hollnagel and others, written a paper on the subject. I’d advise anyone to read this 2006 Eurocontrol report “Revisiting the Swiss Cheese Model” that is freely available on the net.

A book was recently published that gives a rather deep and detailed critical discussion of the SCM. I won't go into that level because it would require an extensive study of Reason's work and I don't have time for that. Personally I have never seen the SCM as an accident model in the same manner as e.g. the dominos, especially since I have never seen how the SCM can practically be used in an investigation other than as a frame of mind to check out barriers that may have failed. The SCM is to me at the most something that “describes how an accident could happen” as Reason et al. have said. In the cases where I mention the model myself, I use it to explain (multiple) barriers and how failure of these can lead to accidents.

The SCM also shows that a failure upstream (call it management) can be stopped underway (e.g. by competent and alert personnel), yet on ‘a bad day’ (another employee, stress, etc.) an accident may be the consequence. One huge drawback of the SCM is obviously that many versions exist. Some versions (e.g. the 1997 from “Managing The Risks…”) are very flexible and don’t put the layers of barriers in any strict order which would even allow an explanation of incidents starting without management failures. Other versions, however, do of course have various layers with designated ‘categories’ that - looking strictly at them - could be seen as if the SCM does say that all accidents are due to management failures.

It has been suggested that the SCM is an updated version of Bird, something I don’t see at all. It must be an assumption or conclusion that is not explained at length, but it is probably linked to the fact that both tend to look towards management factors as the root causes of accidents. Be that as it may, I think that Bird’s sequence and the SCM are quite different at heart. Reason’s first two books reference Bird only once, and even then he refers not to Bird’s domino sequence but to Bird’s updated pyramid (see “Managing The Risks Of Organizational Accidents”, p. 224).

One important difference is that Bird shows us a sequence of causes leading up to an accident and its consequences/loss. The SCM pictures a series of barriers with possibilities for failure (note that the SCM pictures also ‘holes’ that are not relevant for the accident!) which when several failures line up can lead to an accident because all layers of protection have been breached. If anything then the SCM is about the spaces between Bird’s dominos. Another difference is that the domino sequence in a way pictures the mechanism how upstream factors may cause the next domino to fall and thus show some causal relationship. The SCM does not show the mechanisms for the causes (holes). Contrary to the Bird sequence viewed in a strict sense, the SCM does not have holes in one slice of cheese affecting holes in the other slices.
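The idea that an accident needs holes in every slice to line up, and that the slices do not affect each other, can be sketched with a toy calculation. Note the assumptions: the SCM itself assigns no numbers, so the per-slice probabilities, the slice labels and the function name below are purely hypothetical and serve only to illustrate the mechanism.

```python
from math import prod


def breach_probability(hole_probabilities):
    """Chance that the holes in all slices line up, assuming independent slices."""
    return prod(hole_probabilities)


# Hypothetical chance per slice that a hole is in the wrong place
normal_day = [0.1, 0.05, 0.02]  # management, procedure, alert operator
bad_day = [0.1, 0.05, 0.5]      # same slices, but a stressed or replaced operator

p_normal = breach_probability(normal_day)
p_bad = breach_probability(bad_day)
print(p_normal, p_bad)
```

With these made-up numbers only the last slice changes between the two cases, yet the chance of all barriers being breached rises considerably. That is the ‘bad day’ effect, and it needs no hole in one slice to cause a hole in another.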

Barriers

An understanding of barriers is important for discussing the SCM. In Norwegian railway safety legislation a barrier is defined as: “technical, operational, organisational or other planned and implemented measures that are intended to break an identified unwanted chain of events” (Sikkerhetsstyringsforskriften, 1-3). Other standards and legislation contain similar wording; e.g. ISO 17776:2000 (“Guidelines on tools and techniques for hazard identification and risk assessment”) defines a barrier as a “measure which reduces the probability of realizing a hazard’s potential for harm and which reduces its consequence” and explains that “barriers may be physical (materials, protective devices, shields, segregation, etc.) or non-physical (procedures, inspection, training, drills, etc.)”. When we speak of barriers (and also causes, by the way) in my company we think MTO: man, technology and organisation.

Probably everyone has experienced that barriers are (often) not perfect and barriers such as the ones listed as examples by ISO 17776 can fail from one moment to the next. One can choose to follow a procedure, or one can decide to take the shortcut, making the rule-barrier useless. This mechanism doesn’t only apply to the ‘softer’ barriers; it also extends to technical barriers that can be rendered useless on a whim, for example when we don’t wear seatbelts or safety goggles or when a safety barrier is bridged.

When observing a system we have to study it as being the combination of man, machines, procedures and other elements. While it’s possible to see the parts as separate items with man as one system and the machine as another, and man not being a part of the machine-system (although one, for example, can argue that the dead man’s switch in a train does unite the two), this view of separate systems is not very useful. A man working with a machine creates a new system that is built up from several sub-systems. This, in my view, gives a much clearer view of systems, and also of machines.

Having this in the back of our minds we should conclude that the SCM actually gives a pretty good mental model of how (or at least: that) barriers can fail - while recognizing its weaknesses at the same time, of course. I do strongly question the SCM’s use as an accident model.

By the way…

I never really understood why the model had to be based on something smelly and yucky like cheese in the first place. Let me propose something more tasteful:

Some thoughts... Causes (and more)...

Causes appear to be difficult things and causation appears to be a difficult subject. A brief check of the safety books on the shelf over my desk shows astonishing variations. Heinrich decides to focus on direct causes, Hendrick & Benner completely reject the notion in their 1985 book “Investigating Accidents With STEP” and Hollnagel appears to reject root causes (see his great 2004 book “Barriers And Accident Prevention”). All explain things differently and have their reasons: Hendrick and Benner choose to equate cause to blame, Hollnagel does not so much reject root causes altogether as the notion of the root cause and its application to intractable systems (see pages 105 and 106 of his ETTO book), and for Heinrich I’d like to point to my earlier discussion of his work.

As if this isn’t enough, safety literature is littered with an incredible number of terms: direct causes, basic causes, underlying causes, root causes, proximate causes, latent conditions, unsafe acts, contributing factors, causal factors, and what not. While there seems to be some kind of general acceptance and understanding of those, one quite often comes across differing interpretations, and the resulting flaming discussions add to the confusion.

Something that gets me rather confused is ‘proximate causes’. This may be partly due to my ignorance (and the fact that this term is not used in the Dutch language when you study safety or law). Heinrich defined these as “unsafe personal act or unsafe mechanical hazard that results directly in an accident”. Heinrich’s definition is for me clearly a definition of ‘direct causes’. This is agreed upon by that most scientific of all sources, Wikipedia: “In philosophy a proximate cause is an event which is closest to, or immediately responsible for causing, some observed result”. Equating proximate and root causes (as I saw not too long ago) is a concept that I find difficult from a contemporary safety science point of view and in my opinion not very helpful for safety work either.

Conditions and causes

Many safety professionals appear not to be aware of the difference between causes and conditions (you may call the latter context as well, if you want), which sometimes ends up in quite strange causation, all too often blending in context. As an illustration: a few years ago I was one of the people responsible for cleaning up the ‘taxonomy’ of causes in our incident registration/management system. The status at that moment was one of unguarded organic growth over 15 years. Originally the off-the-shelf version had contained Bird’s sequence as the three main categories: Management Failures, Basic Causes (with sub categories for Personal Factors and Job/Environmental Factors) and Direct Causes. Elements had been added to this structure over the years without any proper policy or philosophy. Often the new elements ended up in ‘wrong boxes’ (e.g. ‘planning’ as a direct cause) and there were a great many context items (e.g. ‘performing work in the rail tracks’) mixed in. Sure, absence of these things would have ‘prevented’ an accident from happening, but it would also have prevented accomplishing the intended outcome. So there is a lesson here, since I do not consider the people who worked with this system in the 15 years before me as complete idiots (something I’m not even qualified to diagnose anyway).

Among professionals there has been a debate about whether and how an (unsafe) condition can be a cause. I’m in the camp of people who think some conditions can be causes, but not always and automatically. Hart and Honore posed that “causes are what made the difference, mere conditions on the other hand are just those [things] that are present alike both in the case where accidents occur and in the normal case where they do not; and it is this consideration that leads us to reject them as the cause of the accident, even though it is true that without them the accident would not have occurred…”. I find this a rather useful and clarifying definition. Let’s apply it to some examples:

If someone bumps into a lamp post out on the street I agree that we’re talking about the lamp post as a mere condition. Sure, had it not been there, no one would have bumped into it. But reasoning along the lines of “what would have prevented the accident from happening” and labeling those things causes is counterfactual reasoning, and a fallacy unless a proper causal relationship has been established first. The lamp post is intended to be there, was built in accordance with relevant standards and it stands there in the normal case of people not bumping into it and is just minding its business of lamp-posting.

If a workshop burns down after a discarded match sets discarded scraps of paper on fire, I do not agree with the view that only the match was the cause of the fire and the scraps of paper a mere condition. This is based on the fact that I refuse to see scraps of paper thrown on the ground as being the normal case. If dumping your trash on the ground is acceptable (regarding it as an unwelcome but normal standing condition), then discarding a match is the same (after all, people throw down matches and cigarette butts all the time, most of the time not causing fires). Even more so since discarding a match on a concrete floor with no flammable material present would not have made the difference either. So in this example I see two things joining up as causes, namely the act of discarding the match and the condition-turning-cause of the scraps of paper lying there. Counting the last bit of the old-fashioned fire triangle (i.e. oxygen) as a cause, however, would be bollocks. Merely based on the ‘normal case’ argument… Absence of oxygen in this situation would hardly qualify as normal, would it?
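Hart and Honore’s criterion can be sketched as a simple comparison between the accident case and the normal case. This is a minimal illustration under stated assumptions: the function name is hypothetical, and the contents of the ‘normal case’ reflect the judgement made in the workshop example above, not an objective fact.

```python
def split_causes_and_conditions(accident_case: set, normal_case: set):
    """Factors present only in the accident case 'made the difference' (causes);
    factors present alike in both cases are mere conditions."""
    causes = accident_case - normal_case
    conditions = accident_case & normal_case
    return causes, conditions


# Workshop fire: scraps of paper on the floor are NOT regarded as the normal case
accident = {"discarded match", "scraps of paper on floor", "oxygen"}
normal = {"oxygen"}

causes, conditions = split_causes_and_conditions(accident, normal)
print(sorted(causes), sorted(conditions))
```

The outcome depends entirely on what one accepts as the normal case. Add the scraps of paper to `normal` and they become a mere condition, which is exactly the point of disagreement described above.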

Cause-effect relationships have to be explained with logic and must be based on facts (not hunches or guesses, not even on experience!). Anything written down in an investigation report must be able to stand up in a court of law, but hopefully never will. I find the “beyond a reasonable doubt” criterion a good one, and in the case that conclusions have to be based on assumptions/theories/hypotheses this must be clearly stated as such.

The example elsewhere on this blog hopes to illustrate the point about causes and conditions further. This example also shows another difficulty: it’s fully possible to write down acts as conditions and vice versa. The discarded newspaper isn’t quite as clear as the example where the condition ‘not wearing safety goggles’ essentially is the same as the act ‘does not put on safety goggles’.

By the way, a legal background may partly influence what to call a cause and what not because law can only handle people and thus will be focused on human acts. If there are any conditions identified the question usually will be who has caused them. In safety this appears to me as a not very useful notion. In my view cause in law is not quite the same as cause in safety and I find it not useful to define causes in a strict theoretical and legally sound manner. This will not really help preventive actions, but is probably great for judges and lawyers.

So, while I have learned to appreciate some of Hart and Honore’s work as very useful and clarifying (they have been added to my personal list of things-to-read-when-I-grow-up) I will look for causes in a safety context. Especially since the goals are different. Causes in safety must help to define actions while causes in law serve other goals. In the newspaper example I can imagine that a judge rules discarding a match to be an act of greater carelessness than discarding a newspaper. And I would agree; that is one reason I could imagine for this being ‘more’ of a cause than the newspaper. Still, you need both and both are acts out of the ordinary that together ‘make the difference’, so in safety terms it’s useful to regard both as causes.

There’s another reason why I’m not particularly happy with thinking about causation in the legal sense: laws and even entire legal systems are quite different from one country to the other (just compare the UK to Germany or France) which may, or will, have consequences for the legal interpretation of or legal requirements for the term ‘cause’. In contrast, the language of prevention in safety should be a common one that is understood by all safety professionals, regardless of their countries of origin.

Conclusions (sort of)

Three things so far…

Causes in law and causes in safety are not necessarily the same thing. Beware of mixing them up.

The distinction between causes and conditions/context is an area that many safety professionals have too little awareness of. I’m sure that more focus and greater clarity on this will strengthen incident analysis and recommendations for preventive action.

It would be a good thing if the safety world would start to agree on what we call cause and what not, if necessary with some additional cause categories (direct, root, …). Or maybe we should stop using the word cause altogether and use another more neutral term? ‘Contributing factor’ has been proposed and Alan Quilley came up with “factors to be considered to prevent recurrence of the events that led to the unintended consequence”, but I believe FTBCTPROTETLTTUC is an impossible to sell acronym.

Feel free to comment and add more thoughts!

Some thoughts... Management Failure versus Free Will

An interesting point that I read some time ago is that management failure theory would conflict with the notion of free will: if management failures are the root causes for all accidents then management failures are also the cause for human errors, and not free will. I’m not sure if this very strict reasoning is fully correct. Even if it does sound logical it doesn’t feel right to me. In my opinion at least Bird’s dominos and also Reason’s Swiss Cheese Model (proponents of multi causality) do not exclude free will from kicking in somewhere along the causal chain. The basic causes in Bird’s sequence differentiate between job factors and personal factors, the latter explicitly including things like motivation, which is at least one personal factor that is clearly related to free will. I’m rather sure that Reason has similar mechanisms, but I’m too lazy to check.

One might point out that the leftmost ‘domino’ (management) still causes the fall of the next (which includes the personal factors) and that the sequence does not account for causes materializing halfway through. I sometimes get the impression that models are treated as if they were laws of nature that have to apply in each and any situation, in exactly the same way and the same order (a hair-raising example is the treatment by some people of Heinrich’s ratios, expecting to find the same everywhere). But that’s hardly realistic. Heck, that’s why they’re called models - a simplified representation of reality and thus not describing each and every possibility.

I live in the belief that nobody rises in the morning and goes to his job with the intention to screw up massively and create an accident. There are others much more qualified than me writing about human error (including violations) and its causes (and luckily they have done so), but roughly I’d say that two important reasons/causes for human error or violations are found in: 1) an overly optimistic perception of people’s own control over the situation (remember that about 90% of all drivers believe that they’re better than average drivers) and 2) especially conflicting objectives. These are ‘causes behind the cause’ for deliberate acts and for safety work it is essential to identify those in order to determine preventive actions that keep future errors and accidents from happening.

For a legal case it may be sufficient to stop once it has been established as a cause that someone willingly chose not to follow a safety rule. Acting on that single act (e.g. by punishing the violator, or explaining the rule to him once more) is often very ineffective from a safety point of view. The decision not to follow a safety rule may be a deliberate act of free will, but there may have been incentives and other mechanisms behind this decision. In case of conflicting objectives (e.g. the company claims ‘Safety First’, but rewards people cutting corners in order to maximize profits) it’s more effective to address those causes behind the causes.

When studying safety many of us have learned not just to focus on an error by an employee, stop the investigation there and blame the person. Instead we’re taught to look further than the person at fault. But the opposite is true as well: defaulting to management failures as causes for accidents is not the way things should be done. That’s a kind of jumping to conclusions without any basis in reality or facts that is just as bad as Heinrich’s decision to stop at the direct cause and focus on that alone. A comment I heard recently was along the lines of “we’re not satisfied unless we have found at least one management cause”. I’m convinced that this is said with the very best of intentions for the improvement of safety, but even this well-meaning framing of your mind is going to bias the result in a way that should not be acceptable. Remember what Hollnagel said: WYLFIWYF (What You Look For Is What You Find)! I believe he said it in relation to ‘human error’, but it applies to anything. Some causes are simply not management induced and that’s that.

Some people go a step further and reject the existence of management failures entirely, among other reasons because these are human failures. Agreed, in the end, management/organisational failures are decisions of men and thus human failures, but they are of another kind. I find it helpful to distinguish between (direct) causes on a more personal level (call it sharp end/operational, if you want) and (underlying) causes on organisational or management level that are further upstream. Additionally, sometimes it’s hard or impossible to determine what the ‘deliberate act’ or more or less active failure in the management was.

Take for example the accident that happened at Sjursøya on 23 March 2010, something that has taken a considerable amount of my working hours since. The full report of the Norwegian Accident Board (a fairly good one, even though I do not support all of the recommendations) is online; check it out for details (available in English). Short version: operational error(s) directly caused a runaway set of goods wagons which rolled unstoppably downhill. There is a 100 meter difference in height between the point of origin and the ending point, 8 km further down in the harbour of Oslo. There the wagons, travelling at about 130 km/h, crashed into a building and killed three people, injuring several others. The outcome could have been even worse, by the way, since the wagons might have hit wagons with jet fuel, had the accident happened at another point in time.
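As a rough plausibility check on those figures, the drop in height can be converted into a speed. This is a back-of-the-envelope sketch and not taken from the accident report: it assumes the wagons start from rest and ignores rolling resistance, air drag and any variation in the gradient, and the variable names are purely illustrative.

```python
from math import sqrt

G = 9.81  # gravitational acceleration in m/s^2

height_drop_m = 100.0       # height difference stated above
distance_m = 8000.0         # track length stated above
reported_speed_kmh = 130.0  # approximate impact speed stated above

# Average gradient over the whole run
average_gradient = height_drop_m / distance_m

# Speed if all potential energy became kinetic energy: v = sqrt(2 * g * h)
frictionless_speed_kmh = sqrt(2 * G * height_drop_m) * 3.6

# Share of the available energy still present as motion at the reported speed
energy_fraction_retained = (reported_speed_kmh / frictionless_speed_kmh) ** 2

print(f"average gradient: {average_gradient:.2%}")
print(f"frictionless speed: {frictionless_speed_kmh:.0f} km/h")
print(f"energy retained: {energy_fraction_retained:.0%}")
```

Under these assumptions the loss-free speed comes out somewhat above the reported 130 km/h, so the reported figure is physically plausible for free-rolling wagons that lose part of their energy to friction and drag along the way.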

What lies behind the operational error (I’ll do the simplified version) is that over the last two decades the use of the goods terminal had slowly been changed without anyone noticing and many baby steps ended up in using the terminal differently than intended. Working procedures and local risk analysis had not followed the same development and thus safety barriers had unknowingly been eroded in such a way that a rather simple error could lead to such a drama.

I tend to be very critical towards the labeling of deficiencies in risk analysis as causes - often this is a sign of lazy or unrealistic safety people since you will almost always find something related to the accident not being in the risk analysis (a complete risk analysis being a fiction and not very helpful either). But here it certainly was the case and together with other management factors this created over many years (impossible to pin down on persons, acts or dates) the situation as it was on the date of the accident without anybody being really aware of the situation sliding towards the breaking point.

While blaming the operational personnel that had ‘violated’ long forgotten rules would have been a possibility (unnecessary because they blamed themselves enough as it is) it was decided to look at the underlying factors. I don’t want to discuss the legal part, but the DA also chose not to take legal action against the operational personnel. The companies involved, however, were fined, which my company paid without any appeal (in contrast to the other company involved). Negative side effect: several millions of Norwegian kroner down the drain that could not be used for prevention and improvement.

Comments and discussion appreciated!

Some thoughts... Organisational Accidents

In “Managing The Risks Of Organizational Accidents”, Reason describes organisational accidents as having “multiple causes involving many people operating at different levels of the business”. For individual accidents “a specific person or group is often both the agent and the victim of the accident”. Reason adds that the distinction between the two is hard - an individual accident may turn out to be an organisational accident. As said above the book sees the entire term as a myth because the ‘default search for management failures’ would turn any individual accident into an organisational accident.

I don’t know if Reason actually intended to use his description as a proper definition. One may wonder how useful this is anyhow because one could separate the two only in retrospect - after an investigation. Strictly defined or not, to my understanding the distinction was more that between relatively straightforward cases and more complex cases with sometimes less linear and clear causal chains. The only real value of the term is thus that we are aware of this possible complexity.

Comments? Opinions?