The Safety of Work

Ep.39 Do accident investigations actually find the root causes?

Episode Summary

Welcome back to the Safety of Work podcast. Today, we discuss whether accident investigations actually get to the root causes of the incidents.

Episode Notes

To frame our chat, we reference the papers “Our Current Approach to Root Cause Analysis” and “What-You-Look-For-Is-What-You-Find”.

Tune in to hear our thoughts on this matter.

 


 

Quotes:

“What they suggested was that we should always have a very strong evaluation process around any corrective action to test and check whether it actually addresses the things that we’re trying to address.”

“To really understand how the investigation has happened, you’ve got to talk to the investigators as they’re doing the investigation…”

“I can’t imagine a safety person taking to the board a recommendation that they weren’t sure themselves, they could fix.”

 

Resources:

Our Current Approach to Root Cause Analysis

What-You-Look-for-is-What-You-Find

Feedback@safetyofwork.com

Episode Transcription

Drew: You're listening to the Safety of Work Podcast episode 39. Today we're asking the question, do accident investigations actually find the root causes? Let's get started. Hey, everybody, my name is Drew Rae and I'm here with David Provan. We're from the Safety Science Innovation Lab at Griffith University. Dave, what's today's question?

David: Drew, I think today is the first time on the podcast that we've talked directly about accident investigation. We've talked in and around accidents with our episode on the Dreamworld incident and some others, but it's a process so core to health and safety practice that I thought we might talk about it directly today.

Truth be told, between you and me, we've got less of a buffer of future episodes than we'd like at the moment. Given we're recording on a Sunday morning, I thought I'd throw you a topic that's right in your wheelhouse. We won't be able to cover all the useful questions we could talk about in relation to accident investigation, but I thought we'd talk about this idea of root cause and whether our investigations actually get us there.

Just a shout-out to Ben Hutchinson. If you don't follow Ben on LinkedIn, he regularly publishes really short, practical reviews of safety research. While I was thinking about topics, I asked him for a set of ideas and he fired back about 25. Thanks for that, Ben.

Drew, this idea of the root cause is one that suggests there's either a single or a small set of fundamental issues that have significantly and directly led to the incident we're investigating. Can you give us your early comments on this idea of root cause and accident causation?

Drew: Our listeners can't see it, but in our outline notes for this episode, the words ‘root cause’ keep coming up with scare quotes around them. Causation is one of those things that seems really simple and obvious until you try to define it precisely. Even very basic notions of what causes what are tied much more closely to ideas of responsibility and blame than anyone really wants to admit, particularly people down the science end of safety science.

A really simple example. A person drops a glass and the glass smashes. What caused the smash? Is it the fact that the person let go of the glass? Is it the counterfactual that someone didn't catch it first? Is it the fact that the glass didn't bounce? Is it the existence of gravity? Is it all of the events that led up to that person being there in that place at that time, with the glass in their hands?

You start with this really simple idea of let's investigate and find out the causes, but once you start to think about exactly what you're doing, it gets really complicated. We don't want to go into that too much in this episode. I think we can very quickly narrow down to two things that hopefully we can agree on.

The first one is that ultimately, all investigations are social processes. There's no such thing as a single, unbiased, scientifically objective explanation for what caused an accident. That's the very social sciencesy end of things. But I think everyone also agrees on the very practical end of things, which is that some explanations are better than other explanations. We can't necessarily agree or pin down exactly why, but some explanations are useful, some explanations are not useful.

David: Drew, just for some more context before we dive into the papers. I think it would be generally accepted (although we, as researchers, don't really like to use that term ‘generally accepted’), and that our listeners would agree, that Root Cause Analysis generally fails to explore the deep system problems that contribute to safety events, and perhaps stays on the surface of trying to understand what's going on. I think we'd also all agree that incident investigation techniques, and the quality of their application within organizations across industries, vary really widely.

If we take Chernobyl for instance, I think the root causes that have been suggested by different people for that disaster range from the root cause being the actions of an individual worker inside the plant on the day to the root cause being the macroeconomic factors surrounding the entire energy sector across Europe.

Let's talk about the papers for today and see if we can get some way towards answering our question of whether accident investigations actually find the root causes. We're going to look at two papers plus an editorial to help us with the discussion, and I think this is the first time we've referred directly to an editorial on the podcast. The editorial we're talking about was authored by Patricia Trbovich and Kaveh Shojania, both from the University of Toronto in Canada, and it appeared in a 2017 issue of BMJ Quality & Safety, possibly a special issue. Can you give our listeners an overview of what an editorial is and what its place within an issue of a journal is?

Drew: Sure. This is something that happens a lot in special issues, where the people who put together the special issue will provide an editorial at the front which summarizes the papers. This particular kind of editorial is something that I've really only seen done in BMJ Quality & Safety, but I think it might be a bit more common in medical journals than it is in safety journals. The idea is that the journal is publishing a very empirical paper that's got lots of hard numerical results, but the editors think there's some more important context around the study that needs to be discussed, so they invite someone else, often one of the peer reviewers of the empirical study, to write a commentary editorial that puts the results into the broader context.

BMJ Quality & Safety actually asked me to do this once. They gave me a very factual paper to review and said, okay, we like some of your broader comments here; don't just send them off to the authors, why don't you put them into a little commentary that we'll publish along with the main article? If it's a controversial thing, they may actually get multiple people to write editorials, and then it grows into a special issue of all this commentary around a small number of results. I think it makes sense in this case that we talk about the editorial first, because it gives us the context before we talk about the specific paper with the specific results in it.

David: When we talked about whether zero harm policies improve safety, in (I think) episode 12, a lot of the papers that we referred to were actually back-and-forth editorials from different authors in the journal Policy and Practice in Health and Safety. This editorial early on cites a few reputable names in safety science and comments they have made about incident investigation and accident causation. They talk about James Reason and his suggestion that the goal of an investigation is to drain the swamp, not swat at the mosquitoes, which is his idea of addressing the underlying system problems and vulnerabilities versus the symptoms that we're seeing on the surface.

They also cited a paper co-authored by Nancy Leveson which described accidents in complex socio-technical systems as the result of active failures, which are the proximal causes or sharp-end actions (how the operators acted), and latent conditions, which are the underlying circumstances that give rise to those behaviors, or some of the blunt-end conditions in the business.

The warning here (I suppose) in these two suggestions by Leveson and Reason is that if we treat the active failures, the things that we see immediately surrounding the incident sequence itself, then we're treating the symptoms and maybe not the underlying illness.

Drew, one of the other things I liked in this editorial was that they said an investigation can really only generate hypotheses, not corrective actions. They talked about how, when they find a problem during an investigation and they're coming up with corrective actions, they might look at other interventions that are already in use in different types of applications, but they could only ever hypothesize about whether those corrective actions would actually address the findings in this particular incident situation. What they suggested was that we should always have a very strong evaluation process around any corrective action to test and check whether it actually addresses the things that we're trying to address.

Drew: I love this idea of generating hypotheses, which makes a lot of sense if you think about an accident investigation as a type of research. Even when we do case study research (where we've just got one case study) we're always looking to test and compare the findings, which is something that's not available if you're just looking at a single accident that happened in the past. One of the things that research methods people warn about is this danger of confusing specific circumstances with general patterns, or confusing general patterns with specific circumstances.

Sometimes we see something that we think is special and unusual, and actually it happens all the time. Sometimes we see something that we think is normal and a pattern, and it's really just specific to the example in front of us. That's the real warning for accidents, where you're talking about counterfactuals, things that didn't happen or that were missing. When you say improved communication would have prevented this accident, you're really going out on a limb, unless you're sure that the communication in this particular case was very different from other cases where, most of the time, the communication is much better. Even then, you want to know why this occasion was special: what made the communication bad in this case, so that you know how to fix it.

David: Drew, the final thing that I took out of the editorial for us to consider is the fact, or the idea, that in the investigations we're going to talk about, the investigators are internal to the organization, which makes them part of the system and part of all of these latent conditions that exist within the business. Therefore, internal investigators will commonly conclude that the organization's policies, procedures, and leadership were effective, and so on, and that it's the human error or the active failures that were to blame. This is, therefore, where the investigator concludes in relation to causes and targets the corrective actions. I think we'll see this in the first paper that we're about to discuss.

Drew: It's funny how, when the head of the Quality & Safety committee investigates an accident, the number one recommendation is never that this accident was a failure of the Quality & Safety committee.

David: I think I've reflected on that a few times and mentioned it, but I don't think in the last 20 years I've ever seen an incident investigation make any comment in relation to the Health and Safety Department itself.

Drew: You'd particularly expect it, given how often the same accidents recur. If people were genuinely being honest and consistent in their investigations, one of the recommendations you should definitely be coming up with is clearly: our last investigation failed, because the accident has happened again. Therefore, we need to do investigations better.

David: The first paper that we're going to talk about is titled, Our current approach to Root Cause Analysis. Is it contributing to our failure to improve patient safety? It's the paper from the editorial, published in BMJ Quality & Safety. The paper has seven authors. I won't list them all, but they do include the late [...] who, for decades, pioneered the introduction of resilience engineering principles and practices into patient safety in healthcare, and Terry Fairbanks. Terry is currently the VP of Quality & Safety at MedStar Health in the US and a professor at Georgetown University.

I was fortunate to spend a bit of time with Terry in pre-COVID days, 18 months or 2 years ago in Florida, at a safety practice workshop. So, very, very experienced academics and practitioners were involved in authoring this paper. Drew, can you tell us a bit about the method?

Drew: This is a fairly simple paper. It's a review of Root Cause Analysis investigations at a large New York teaching hospital. The authors got hold of all of the RCAs that had been performed over a period of eight years and did a qualitative analysis categorizing them in really two senses: one was the type of problem that prompted the RCA, and the other was the recommendations. We're going to be talking mainly about the recommendations side of it rather than the causes-of-the-accident side of things.

The first interesting thing in the results, there were 302 RCAs in total conducted over this time, and for 196 of those, the finding was that the required standard of care was met, and therefore, there were no recommendations. David, what does that say to you about the purpose of an RCA? If they find that the required standard of care is met, there's no recommendation.

David: I think it makes the purpose of RCA a defensive political process. I was left wondering how the decisions were made for those 196, versus the 106 that didn't meet the required standard of care. When I had a look, nearly 40% of those 106 involved the death of a patient. I assume that the other 60% were where there was a really clear action by clinicians that was in breach of a hospital process.

I suspect that in the 196, the clinicians generally followed all of the stated rules. What had happened was some complication associated with the patient's underlying condition, and they didn't believe that the way the clinicians practiced or the hospital systems operated contributed to it.

In answering that question about the purpose of RCA, I guess the purpose is to be a defensive political process as opposed to a genuine learning process. I suspect that in those 196 cases where the standard of care was met, there'd be a huge amount to learn about the systemic factors, such as whether the standard of care that was followed is even appropriate, or whether something else needs to be learned, but the hospital just didn't think they needed to go there for those ones.

Drew: I think it's somewhat telling, too, that the vast majority of the ones that did get investigated further were to do with surgery. If you go through the various departments of the hospital, you can almost rank them by the political consequences of something going wrong there, and that's the order of the number of RCAs that happened. More than half of them are surgery, and I think one of them is in the psychiatric care area. It can't possibly be the case that surgeons are the ones making all of the mistakes, or that this is where all of the systemic problems are.

David: That's probably the place where the mistakes are the most visible. In some of those examples, if you leave something that you shouldn't inside a patient during surgery, or if you misadminister medication, then the mistake is obvious.

Drew: Yes, and I think that's the key word: ‘mistakes.’ There's a very mistake-focused approach here, where the things that get investigated are things that are mistakes. We can tell ourselves we're doing the RCAs for improvement, but we're not doing them to look for opportunities to improve. We're doing them, first and foremost, to categorize and respond to things that are perceived as mistakes.

David: The RCA then is about returning the organization to some status quo as opposed to some genuine learning process.

Drew: Yeah. An average RCA had 4.7 recommendations, which adds up fairly nicely to a database of (almost 500) 499 categorized recommendations. It's those recommendations we're going to be looking at here. They followed a fairly standard process where multiple authors of the paper independently categorized them, and if they disagreed about the category, they got a third author to look and resolve the dispute. That neatly put every recommendation into one of around 10 different categories.

Listeners, if you'd like to play along at home, just try to guess what the top three types of recommendations are, and if you're at all familiar with investigations, I think you're going to hit it spot on. The number one response is 20% of recommendations are training. Top three, 20% training, 19.6% are process change, and 15.2% are policy reinforcement. 

Before you get really excited that the word ‘change’ is in there in recommendation type number two, they give a couple of examples of process change. It's not actually changing the process so much as making explicit what the correct practice was, so it's doing things like writing down what the procedure is to reinforce that a particular step needs to happen.

David: I was involved in a project with an organization that did a similar exercise to this. We had 128 Root Cause Analyses that had been done in 12 months, with an average of 8 or so recommendations per investigation, so about 1000 actions. The paper's numbers are actually not too bad by comparison because, in that sample, more than 80% related to 3 things: retraining the worker, updating the procedure to fix the gaps between the steps that were highlighted, or sending around a safety alert or a safety bulletin to reinforce the required actions in a situation. Eight hundred actions associated with training, updating procedures, and safety alerts.

Drew: In this case, a little over half of the recommendations are some form of emphasizing what was already believed to be the correct practice. If we continue down the list, the next three are policy change at 8.8%, counseling, which is basically disciplining the staff member, at 6.8%, and forms and paperwork change at 5.6%.

In terms of what we'd think of as actually changing the safety of the work, rather than reinforcing existing practice or adjusting the safety work around what happens: 4.8%, so almost 5%, of the recommendations were about physical environment change, and 1.8%, so around 2%, were to do with some form of broader institutional change.

Overall, not surprising, but still pretty disappointing that very, very few of these recommendations result in changes that are actually going to eliminate a hazard or reduce the likelihood of that hazard occurring, unless we assume that everything is just a mistake made by the worker and we can somehow fix that mistake.

David: You've pulled a couple of quotes out of the paper here I see. Do you want to highlight one or two of these quotes?

Drew: Sure. The first one is that they noticed that a lot of these things in the RCAs are actually the same event happening again and again. Part of that, they say is inevitable. They say that, “Whilst recognizing that some types of events are impossible to eliminate completely, we propose that repeat events occur despite repeated RCAs because of the quality and types of solutions that are proposed by RCA teams.”

In other words, we're seeing the same events keep happening because our solutions aren't capable of actually fixing what's happening. That's backed up by another quote that they say, “Many times, the RCA does not identify meaningful aspects of the event, but simply observes that humans are imperfect,” and if your solution is just to say the human was imperfect, then it's going to keep happening because we can't make the humans perfect.

David: I think you're right. One of the things I like thinking about is that where human behavior is the thing that's going to prevent or create the situation, you are setting yourself up for failure. If your system relies on a person behaving 100% reliably in all situations, in all complex situations, then you're always going to be disappointed.

Drew: Yeah, and that's what the authors say. One last quote: “In what resilience engineers would refer to as the work-as-imagined space, we see solutions such as reminding staff of the correct procedure when human error was determined to be a factor. This violates the basic premise of safety engineering, which recognizes that human errors will always be repeated. Just as our parents taught us when we were toddlers, human error is inevitable; thereby, proposing a solution for safety mitigation that focuses on reminding people not to make mistakes is an indictment of our approach to safety.”

David, did your parents teach you when you were a kid that human errors will always be repeated?

David: I'm not entirely sure. Perhaps. I probably made a lot of mistakes.

Drew: I think we certainly learned when we were young that humans stuff up all the time. I'm not sure my parents sort of drummed that into me as a message.

David: So, the study was conducted in a single hospital, but like I've described, the authors have, at least in the case of [...], a vast amount of experience across multiple hospitals and institutions. They are pretty sure that their results on RCA at this hospital over eight years are representative of usual RCA practice across healthcare.

They also lament that they feel they're way behind other industries, which are presumably much better at implementing these systemic changes, or safety of work changes, from their accident investigations. I'm not entirely convinced that the grass is always greener in industries outside of healthcare, but I guess healthcare does look a lot to aviation by comparison, and I suspect they see the accident investigation process in aviation as much more mature. Do you have a comment?

Drew: I'd love to know where health care keeps getting this idea that they're terrible and that everyone else is doing really well. I suspect what happens is doctors tend to be very academic in their mindset. I suspect what they're doing is they're reading the safety literature and the stuff that gets published, and they're taking it at face value as if that's a representation of what other people do in other industries. 

I think if we threw a couple of these quality and safety committee members onto a construction site, or got them to follow around a safety adviser in oil and gas, or even on the aviation ground side, for a couple of days, they would come back feeling a lot happier and more comfortable about the way healthcare deals with things.

David: I agree. The final paper we're going to talk about today is titled, What-You-Look-For-Is-What-You-Find: The consequences of underlying accident models in eight accident investigation manuals. Given the title, yes, Erik Hollnagel is one of the authors, along with Jonas Lundberg and Carl Rollenhagen. This paper was published in Safety Science in 2009, about the time Erik was writing about Safety-I and Safety-II, and about what-you-look-for-is-what-you-find in relation to organizational safety practices.

This study reviewed eight investigation manuals, so basically, it's quite a simple study. The manuals came from a combination of industries across the public and private sectors in Sweden and Norway. They got the Swedish Maritime Inspectorate manual, the Swedish Occupational Safety Regulator manual, the Swedish Rail manual, the Swedish Road Safety manual, the Aviation Safety manual, and the Patient Safety manual, so all these different industries across Swedish government organizations.

They also got the accident investigation manual from Statoil, the Norwegian oil and gas company, which is now Equinor, and from Forsmark, which is a nuclear power station in Sweden. They picked up all these manuals and looked at three things. First, they looked at what the manuals said about the accident models, or (I suppose) the accident causation ideas that were embedded within these manuals about how accidents are caused and prevented.

Second, the scope of the factors involved, or the factor domain. What are these organizations looking at when they investigate? Human, technology, organization: how are they categorizing, understanding, and making sense of the different factors associated with accidents? And third, what does the system of investigation look like? What are the steps in the accident investigation process, what are the activities, and how much resource and direction are provided to each of the steps?

Drew: Just before we jump into the findings, I think it's worth pointing out that there is a real lack of high-quality research into how investigators investigate. I think we need to be a little bit cautious when we do things like review investigation manuals or even when we read accident reports, that these don't necessarily show all of the stuff that goes on in an investigation. 

These show a work-as-imagined picture. I actually suspect that real investigations are much more methodologically sophisticated than they appear from either the manuals or the reports. I think investigators do a lot of stuff that they might not even be fully conscious of themselves, in particular, lots of theory building and lots of progressive comparison of the theory they're building with the facts that they're observing. Those sorts of skills aren't necessarily covered in the manuals.

A really simple example for people to think about is the Swiss cheese model. You can say we're applying the Swiss cheese model, but the model doesn't tell you where to start. You simply decide to start at one end of the model, or the other, or in the middle. That's a decision that's not documented anywhere in the model, but it is an important part of how you're building your theory about the accident.

David: I'm doing some work with an organization at the moment around incident investigation maturity, and you can pick up a report, see a list of recommendations, and think that the investigator hasn't identified the causes or the recommendations very well.

Then, as you said right up front about investigations being a social process, you go in and explore all the tensions, trade-offs, and negotiations with management, and the alignment of the investigation with the appetite and the politics within the organization. You pick up the report and say this was a good or a bad investigation, but that judgment misses all of the intricate process that goes on in the background that you don't get to see, and also what happens after the report actually gets published and what goes on within the organization after that.

Drew: And you can't see any of that from the finished published report. I've had a number of conversations with other researchers about this idea of using accident reports as data to understand safety science, which is something that I've always had mixed feelings about.

One of the things that Sidney Dekker had said to me is he thinks that the political process that goes into what goes in the final report—both the explicit process, but also the knowledge about how the report is going to get used, which shapes it—adds so much noise to the data that you're getting. You just can't disentangle the two. You can't even make meaningful statements about the political process because it's invisible from the report and you can't make meaningful statements about the accident because it's contaminated by the political process. 

To really understand how the investigation has happened, you've got to talk to the investigators as they're doing the investigation and you actually get them to think, speak aloud their reasoning as they look at the data and as they write the report rather than just look at the finished product.

David: For early career researchers or master's researchers, there's a good project there: to shadow some incident investigators through the course of a couple of incident investigations and do some ethnographic research around that. It would be a start to thinking about what goes on in that gap.

Drew: Yeah, but I don't think that this problem invalidates the findings of this particular paper, so long as you keep in mind that what we're really talking about in terms of findings is what goes into that final report. That's where this what-you-look-for-is-what-you-find principle really comes out.

David: And interestingly, for people like us who are interested in this idea of safety clutter, these manuals ranged between 2000 and 17,000 words. That's a fairly wide range in the size and depth of information within the process descriptions for what is a similar type of process.

The researchers sort of came up with seven (let's say) typical stages of an investigation cycle. Those seven stages were planning, data collection, representation, analysis, recommendation, finalization and implementation, and follow-up. 

Basically, they looked at the way these manuals described the process and provided tools to support the investigators at each stage of the investigation. They found that the manuals mainly focused on data collection, analysis, and report writing, and there was only some coverage of recommendations, follow-up, dissemination of information, and organizational learning.

The conclusion I drew from that finding, at least, was that the incident investigation process was about delivering an investigation report. It was not necessarily about the overarching outcomes for the safety of work within the organization.

Drew: On the one hand, that's not surprising, but on the other, it is really, really telling, because if you ask people why they're doing the investigation, they'll always tell you we're doing it to learn, we're doing it to improve, we're doing it to find out what happened. But you look at the process that is followed, and it's all about getting that report right and getting that report delivered, as if the report in itself is suddenly going to create this follow-on process. Then you look for who in the organization is responsible for that follow-on process, and they don't always even exist. The report's been delivered; the job is done.

David: Yeah. We'll come back to exactly that comment in the practical takeaways because I think that's a real gem. The findings were that the causes found during the investigation reflect the assumptions of the model, following the what-you-look-for-is-what-you-find principle. The accident causation model embedded within the manual, and the way that findings get categorized and labeled, will essentially determine what the investigator finds in the process, because they know how those findings have to be reported.

So, what you look for is what you find. I suppose we've mentioned that a few times already, and it's quite a well-known saying. But what they also suggested was that these identified causes or factors contributing to the incident become specific problems to be fixed. If you're an investigator and you put a cause on the table, then you have to put some kind of solution or recommendation against it.

They also introduce this what-you-find-is-what-you-fix principle. Drew, I wouldn't mind reversing that. As I thought about it, from an investigator's point of view it becomes: only find the things that you can fix. We mentioned in our safety work versus the safety of work paper that organizations really need closure and certainty around safety, particularly following a surprise or an accident, and that it's generally too uncomfortable for an organization to have a safety incident investigation produce a long list of causes with no clear and easy solutions.

Drew: When you say that, David, I can remember a couple of really major investigations which, it'd be fair to say, made the political mistake of coming up with recommendations that were clear and well-supported but which couldn't possibly be actioned, and it's amazing the effect that had.

One of them I'm thinking of is the Waterfall Inquiry in New South Wales, where they recommended that all of the safety staff within the transport governing body needed to have suitable tertiary qualifications. That recommendation got tracked through the organization for years and years and years, constantly being reported because there were no suitable tertiary qualifications to give the people. And yet, it was such a well-supported recommendation that it just had to live on the table until it could just get diluted year after year after year and then finally signed off as having been thought about.

David: Drew, I wouldn't mind talking a little bit about the difference between internal and external investigations. Thinking about that idea of only finding the things that you can fix, could you imagine an internal investigator taking to senior management or a board causes like: there's a lack of financial resources available for safety, there was an inappropriate management structure, there was a deficiency in your leadership decision-making around the work that led to this incident? I just don't think you'll find a situation where these things are said, because (a) they may not be politically acceptable to say, and (b) there's no easy solution or recommendation: you'd be saying we should restructure your whole organization, change your budgeting process, or accept that senior leaders made a mistake.

Drew: Yeah, let alone stuff that criticizes senior leadership. I can't imagine a safety person taking to the board a recommendation that they weren't sure themselves they could fix because they know that in 12 months’ time, someone's going to ask them, have you closed off the recommendation?

It's like when you go into a performance review and create an annual goal for yourself for the next year which is out of your control. Everyone knows you make sure you come out of your performance review with an action that you already know exactly how you're going to get done.

David: Drew, I want to ask you this question. Well, I hope there's a question in what I'm about to say. In preparing this episode, thinking about the difference between internal and external investigations, I was thinking about the Chemical Safety Board investigation into (say) Texas City, for example, or the Haddon-Cave inquiry into the Nimrod disaster.

Some of the conclusions around the organization, I suspect, look very different from those organizations' own internal investigations into the causes of those incidents. It's this idea of how different the causes arrived at for the same incident can be between investigations done by external parties and internal parties. I might just pause there. Have you got thoughts about that?

Drew: I've got (I guess) conflicting thoughts. One of the things that has always surprised me is how often internal investigations turn into mea culpas. The internal investigation will tend to blame the organization even if the organization is not blameworthy. By accepting localized blame, you have easy actions that you can do to fix it. 

In fact, it might be politically beneficial for the organization to do an internal investigation that blames external parties, but the trouble is to get to that point you've got to go through management. So, your choices are to either blame someone very locally and you can fix it or blame someone outside the organization. But you don't want your investigation to go through blaming management in the middle. I think that's why internal investigations tend to stop very locally, even if it would be politically advantageous to say this is actually the regulator’s fault.

David: The bigger question here, if we reflect on the paper that we reviewed, not the one from [...], but the RCAs in the hospital setting: somewhere between 50% and (I think) 93% of the actions don't address the safety of work, if you take out the roughly 7% that do. The bigger question is whether there's any real point in spending a lot of time doing big formal incident investigations internally. I suppose at best, they don't do anything other than create safety work activity.

At worst, they probably convince the organization that it's learning and getting better at preventing incidents, maybe weakening the controls or oversight processes around safety management. Should we even do formal internal investigations?

Drew: My personal opinion here is that internal investigations are creating a rod for your own back. They're creating legal difficulties for the organization. They're creating administrative difficulties. They're diverting resources into spending time on safety work, which is not going to improve the safety of work. So it's not just that they're not being helpful, it's that they are engines that are driving lots of counterproductive activity.

I really think that it is important for organizations to not just think about whether these are adding value, but to look at how much primary and secondary work they're generating for the organization because I think actually they may be creating negative value a lot of the time.

David: Yeah, thanks for your comments, Drew. I sort of just threw those at you, and I appreciate that we've gone very broad, but I think your comments are absolutely right. Incident investigation is one of those central processes in safety management. It's one of those central processes that generate safety work, and it's one of those processes that probably influences climate and culture a lot because, like you said, it's looking for mistakes and looking for blame. It really lends itself to a critical review of our practice within an organization.

Let's step into the practical takeaways for the day. I've thrown a few in here, and I wouldn't mind getting your thoughts on some of them. The first one would be: I think it's really important to think about your model of accident causation. You mentioned Swiss cheese, and I think some of those manuals in the paper from Hollnagel and others used dominoes. There are different types of incident causation models, and different types of incident investigation methodologies line up with those different models.

I think it's important for organizations to think carefully about them. Think carefully about how their internal investigation process matches the model that they want. Do they want to find out what the person did wrong? Do they want to find out about underlying organizational factors? Different models and processes will get you to different places.

Drew: One practical thing I'd suggest there: I hear a lot from company safety departments, particularly ones that tend to outsource investigations to local people, to local supervisors and site managers, that they're very concerned with the quality of their investigations. When you interrogate a little bit further, you realize that quality is actually about how well the investigation reports match some of these model-based requirements.

The bit I'd say there is, think carefully about what you want the outcome of the reports to be and consider investigation quality as related to outcomes rather than to matching a particular model. The more requirements you put on the secondary things, the more likely the report just matches those requirements rather than actually achieves the goal you're trying to get out of it.

David: The next thing I wouldn't mind getting your thoughts about is this cause-to-action process. We talked about not finding things you can't fix, what you find is what you fix, and what you look for is what you find. This is the idea that we've got processes in our organizations where causes get categorized, and in popular ones like ICAM there's a classification of causes: people, environment, equipment, process, and organization.

Investigators know they're going to need to put causes into these categories. They also know that people and process factors are easy to come up with actions for, like the training and procedural updates we've spoken about, and that organizational factors are harder to come up with actions for.

So, think about the way that causes get captured and how we can do that in a way that (I suppose) forces a broader identification of factors, maybe finding ways in our process to decouple the cause-to-action link. Here I'm thinking that, typically, we make it the responsibility of the investigator to come up with the recommendations, as opposed to just making it their responsibility to come up with findings, and then handing a bunch of findings over to the organization, so that it's not the investigator's responsibility to come up with the actions. I think if you expect the investigators to come up with the recommendations, then they're going to limit the causes that they put on the table.

Drew: I think you're right that coupling is a real problem, and I know that some organizations specifically do create that coupling. I know a couple of safety academics, who I'm not going to name in this podcast, who tend to judge investigation quality on whether every finding is linked to a recommendation.

Some organizations have that as a rule, every finding must have a matching recommendation. If you think about it, that rule actually works in reverse. It doesn't make sure that there's a recommendation for every finding. It actually makes sure that there are only findings where there can be a recommendation. If you get rid of that rule, you encourage broader consideration of causes.

David: No, I agree, Drew. I know organizations that do that, and that actually have certain types of causes for which it's mandatory to have a recommendation to address them. I think all it does is censor certain types of causes from the report.

The next link is this action-to-evaluation process. We talked early on in the podcast about how the incident investigation process generates hypotheses about what might improve a certain situation in the organization. So have, in your incident investigation process, some sort of action evaluation process: not just the recommendation, but what desired change is meant to be driven by that action. What's the mechanism? What's the measurement process around it? How are we going to know if that change is happening? And what are the evaluation criteria for effectiveness?

Let's just say arriving at the recommendation is only 50% of the process, and going from the recommendation all the way through to evaluation of effectiveness is the second 50%. I don't know whether the second 50% happens in organizations, other than a tick in a box at some point that says the recommendation is done.

Drew: David, I'd be interested in your thoughts on how we could fix this. One of the things I was thinking, as you were saying it, is that we tend to make the recommendation, the thing that needs to get closed out. If you've done the recommendation, you're done. Even if the recommendation is consider this or review this, we can say we've done the consideration or we've done the review. What would it look like if instead of that, we listed our goals and our suggestions for how to achieve those goals?

The goal is we need to improve communication within the organization. Our recommendation is that we think this might work by providing communication training, but the recommendation never gets closed off until the goal is actually met. If you provide the training and communication still doesn't improve, you can't close it off.

David: That idea of goals, or even, as you were talking there, I was thinking of outcomes. Imagine if, before the list of recommendations, the investigation report had a set of outcomes to be achieved. That puts the recommendations into context, because typically the recommendations are means or process-type actions, like communicate this, or update this procedure, or do some kind of step towards an outcome, but rarely is the outcome that's trying to be achieved clearly stated. I think something as simple as that would add a huge amount of value to the recommendation process.

Drew: That also lets people refuse the recommendations without disagreeing with the goal, which I think is sometimes necessary. Someone could read the report and say, okay, I agree with this goal that we need to better equip our supervisors for this situation. But I disagree with your recommendation that every supervisor should be tertiary qualified because that's just not practical. However, in pursuit of that goal, we can do this thing instead.

David: Yeah, I agree. I think we could list endless examples of the types of actions that we see in investigations, and we could have debates over the outcomes. I think we've also said, in our safety work versus the safety of work paper, that it's very hard in these processes for managers to disagree with an investigator, because they'd be seen to be desiring a lower level of safety, which doesn't create a psychologically safe space for some real critical debate over whether the actions are really useful.

When you quietly talk to a manager and ask whether doing all of this training, all of these procedural updates, all of these safety alerts, and all of these toolbox talks is improving your business and improving your safety, every manager will quietly say no. But inside a meeting, senior management will nod their heads and say yes, these are good actions, we're going to do them. I think creating some steps that enable that critical conversation is going to improve the process.

The last practical takeaway here, and I didn't quite know how to describe it, is about the mindset and the autonomy of the investigators, and this idea of safety work versus the safety of work. I said we'd come back to this in the practical takeaways when you mentioned it earlier in the podcast.

The idea is: is the objective of the investigation to comply with the investigation process? I've done a good report, I've filled out the template that needs to be filled out, I've got a recommendation for every cause, and I've done it within the 28-day mandatory timeframe to complete the investigation, so I've done a good job as an investigator.

Or is the objective of the investigation to arrive at proposed actions and evaluation methods that improve the safety of work? It's this idea of quality over compliance. What is the goal of the investigator when they're doing the investigation process? How much autonomy do they have to take three months for a complicated investigation instead of meeting the one-month deadline, and so on?

Drew: This is a point where I think we do need to be not cynical but honest about the purpose of conducting internal investigation within organizations. We can be as aspirational as we like about how to redesign investigations to improve the organization, but that needs to be in the context of that's not actually why people do investigations. 

People do investigations to achieve closure and to return the organization to a normal state of operations. That is always at least one of the goals, even if the improvement is also a goal. That finding of closure is important and it is legitimate. If we ignore it, then we don't understand all of the pressures that are placed on investigators. 

I think the key to this suggestion is the decoupling that you suggested earlier. We need to somehow decouple the purposes and make it possible to get the thing delivered in 28 days without that being an end to the whole process, so that yes, I can deliver my quick and simple report, but that doesn't close off learning. It just gets that out of the road, so now I can focus on taking these hypotheses, or taking these suggestions, and working out how to improve the organization based on them.

David: Thanks, Drew. Is there anything you'd like to know from our listeners in relation to accident investigation? You've spent a lot more of your career thinking about accidents and investigations. Is there anything you'd like to know?

Drew: I do know that whenever we talk about investigations, lots of people get up in arms and say this applies to other people and other people's investigations, but not to their own. What I would really like to know is, if you think the stuff we talked about in this episode doesn't apply to you, what's your secret? What are the key tips you would give to other organizations to help them escape the RCA trap and allow investigations to lead to more systematic, long-term organizational improvements?

David: I look forward to getting some of that feedback because I think the practical transfer of lessons would be something that our growing listener group can share with each other. We tried to do that a little bit with some of the posts on LinkedIn for the group. I think there are about 800 people now that follow the Safety of Work podcast on LinkedIn. Trying to create a sense of community and sharing across that group would be really good.

Drew: David, our question for today was do accident investigations actually find the root causes? What's the answer?

David: I guess most likely no. Based on the research, if there is such a thing as a root cause, or even a dominant factor associated with an incident, most likely your incident investigators are not finding it, or at least not reporting it. I guess what we've said through the conversation on this podcast is that maybe no doesn't have to be the answer, but for it not to be the answer requires a fundamental rethinking of the way we do our internal accident investigations.

Drew: Thanks, David. That's it for this week. We hope you found this episode thought-provoking and ultimately useful in shaping the safety of work in your own organization. Send any comments, questions, or ideas for future episodes to feedback@safetyofwork.com