The Safety of Work

Ep. 118 How should we account for technological accidents?

Episode Summary

Have you ever considered who shoulders the blame when technology fails us catastrophically? Today’s paper by Dr. Katherine Elizabeth Kenny, “Blaming Deadmen: Causes, Culprits, and Chaos in Accounting for Technological Accidents” examines the complex web of causes behind technological accidents. We examine the chilling case of the Waterfall rail tragedy, unraveling the layers of human judgment, mechanical failure, and the systemic implications that ripple through our safety practices. Kenny's insights offer a transformative lens on how we perceive and address the chaos of disasters, challenging us to rethink the assumptions that underpin our search for answers.

Episode Notes

Using the Waterfall incident as a striking focal point, we dissect the investigation and its aftermath, and we share personal reflections on the implementation of safety recommendations and the nuances of assessing the systems designed to protect us. From the mechanics of deadman systems to the critical evaluation of managerial decisions, our dialogue exposes the delicate balance of enforcing safety while maintaining the practicality of operations. Our aim is to contribute to the ongoing conversation about creating safer work environments across industries, recognizing the need for both technological advancements and refined human judgment.
 

Discussion Points:

 

Quotes:

“I find that some of the most interesting things in safety don't actually come from people with traditional safety or even traditional safety research backgrounds.” - Drew

“Because this is a possible risk scenario on these trains, we have what's called a ‘deadman system.’” - David

“Every time you have an accident, it must have objective physical causes, and those physical causes have to come from objective organisational failures, and I think that's a fairly fair representation of how we think about accidents in safety.” - Drew

“They focused on the deadman pedal because they couldn't find anything wrong with the design of the switch, so they assumed that it must have been the pedal that was the problem.” - Drew


Resources:

The Paper: Blaming Deadmen: Causes, Culprits, and Chaos in Accounting for Technological Accidents

The Safety of Work Podcast

The Safety of Work on LinkedIn

feedback@safetyofwork.com

Episode Transcription

David:  You're listening to the Safety of Work Podcast episode 118. Today we're asking the question, how should we account for technological accidents? Let's get started.

Hey, everybody. My name is David Provan, and I'm here with Drew Rae. We're from the Safety Science Innovation Lab at Griffith University in Australia. Welcome to the Safety of Work Podcast.

In each episode, we ask an important question in relation to the safety of work or the work of safety, and we examine the evidence surrounding it. Drew, this is another one of your papers that you were, I guess, looking for an excuse to dive into. Why do you want to talk about what we're going to talk about today?

Drew: Sure. When you say my paper, this is not a paper that I've written. This is just a paper that I found courtesy of Ben Hutchinson, one of our PhD candidates. First thing about this paper is just that I'm a sucker for papers with good titles, and this paper has a great title as we'll get to. But also, I find that some of the most interesting things in safety don't actually come from people with traditional safety or even traditional safety research backgrounds.

People who come in from the outside without the same exposure to the literature, without the same exposure to safety practice, often have really insightful and interesting challenges to our basic assumptions. This is that paper, and I really like it.

Just broadly, what the paper's about is the epistemology of explaining accidents. It was pointed out to me last week by one of our researchers, Jop Havinga, that the word epistemology doesn't really add a lot to any conversation it's part of. When we talk about epistemology and ontology, they're usually just jargon that doesn't really add to the meaning.

To put it a bit more non-jargony: when we investigate an accident, how much of what we're doing is finding out real and certain things about the world, and how much is us trying to impose meaning onto the world? That's really what the paper's asking. It's also asking, if we are imposing meaning, whether that's a useful thing to do. Is it going to lead to good outcomes from the investigation or not?

David: It took me about six months to get my head around ontology and epistemology during the early stages of my PhD. How it applies, I'm still not entirely sure, but I guess it's about the way the world works, the way that we think the world works, and how both of those types of understanding can be useful. Drew, the paper that we're going to review today is by Katherine Kenny. I don't know how you found this out, Drew, but Dr. Kenny wrote this paper as an honors thesis around the time the interim report into Waterfall was released, so I assume that was around 2004 or thereabouts.

Drew: Yeah, David. The first of many rabbit holes in this episode: I was wondering how a PhD candidate in health sociology at the University of California came to be writing a paper about railways in Sydney. The first thing I found out was that Dr. Kenny now works at the University of Sydney, so I thought, okay, maybe she's from the University of Sydney. Then I thought, why am I making assumptions? Why don't I just ask her?

I sent her off an email, and she gave me a nice reply back explaining that she originally wrote this as an undergraduate assignment, and then she wanted to do a better job of it, so she expanded it into an honors thesis. When she was a PhD candidate, they had to have a certain number of published papers for the program she was in, so she took her honors thesis, revised it, improved it again, and it became this paper.

Her career's taken her in quite a different direction. That's not unusual that someone does their honors on one thing, and then their PhD leads them down a different path. That's why this is the one safety specific thing that she's done. She's now a very respected sociologist of health and health care. This was just an interesting paper that she wrote earlier in her career.

David: Great. Drew, you mentioned you're a sucker for the paper title, and we've talked about paper titles and your fondness for them on this podcast a number of times, but the title is Blaming Deadmen, which actually has a dual meaning in this paper: Blaming Deadmen: Causes, Culprits, and Chaos in Accounting for Technological Accidents.

The final version was published in 2015 in the journal Science, Technology, & Human Values. Drew, you've dug up an interesting early version of the paper title, which, to try to avoid the explicit lyrics stamp on our podcast, I'll just say was titled something like SHIT Happens. I don't know how you dug that up either.

Drew: Yes. That's what Dr. Kenny says she called it when she submitted it as an undergraduate assignment, with a more academic-sounding subtitle. The eventual version of the paper, for publication reasons, was unable to use the swear word, for the same reason that some other famous papers with swear words in their titles just get cited a lot less often, because I guess people are prudes. Since what we got instead is this great pun about blaming dead men, I think it was worth it.

David: Drew, we've talked a little bit about the paper. We've talked a little about rail accidents in New South Wales in Australia, and we'll talk more about that shortly. We've talked a little bit about how we ascribe meaning to certain things and how we try to understand objective facts about the world. Do you want to talk a little bit about the methodology of this paper and maybe a little bit about some of the background to the incident that we're discussing?

Drew: Okay. The best way to describe it would be a reinterpretive case study. In other words, this is taking a particular report or set of reports into an accident and then examining those reports as the data, not to say something about the accident itself, but to say something about how we, when we write accident reports, create and interpret meaning about those accidents.

What Kenny is doing is fairly similar to what we did in our episode 21 when we were talking about the Dreamworld report, or in episode 15 where we're talking about the Brady mining report. But obviously, this is a lot more academically rigorous than any one of our podcast episodes. David, I think you were around in the rail industry at the time of the accident. Do you want me to go through the very basic facts, and then you could maybe say a little bit about some of the surrounding context?

David: Sure. I'll do my absolute best, but it has been more than 20 years, Drew, so we'll see how we go.

Drew: Okay, here's what we know for sure. On the 31st of January, 2003, there's an outer suburban passenger train. It's not one of these big inner-city trains; it's a four-carriage train with only about 40 passengers. It's traveling between Sydney and Wollongong, which is a town south of Sydney. To get from Sydney to Wollongong, you basically go through a national park, and the train derailed at high speed near a place in this national park called Waterfall.

At the time of the accident, the train was traveling at around 120 kilometers per hour. If you know anything about Australian railways, you know that trains are not meant to go that fast. Pretty much ever, I think in Australia, we don't have trains that are capable of getting up to those speeds safely.

It's 120 kilometers an hour around a curve where the speed limit is supposed to be 60 kilometers an hour. The train overturns, goes off the edge of the curve, kills the driver and six passengers, and injures everyone else on board. Some of those injuries are just minor bumps and bruises; some of them are really quite severe, life-changing injuries.

That's actually really all we know for sure, but the engineering investigation couldn't find anything technically or mechanically wrong with the train, the rolling stock, the infrastructure, or the track. The data logger wasn't operating. There was no evidence that the driver was deliberately doing anything wrong.

The inference was that the driver, who was in fairly poor health, must have had a heart attack. When he had the heart attack, the deadman system didn't stop the train, so their body just fell onto the accelerator and the train kept accelerating. That's it for the basics. David, do you want to say a little bit more about either your own experiences around the accident or about how the technology is meant to work?

David: Yeah, Drew, I will. I was actually working in a central safety team in an East Coast Australian passenger rail organization at the time of this event. That wasn't the State Rail Authority, which is the operator that had this incident, but obviously an operation very close by. I was on the task force responsible for implementing the recommendations, or at least evaluating the appropriateness of the recommendations from this investigation for our own rail operations, so I was quite familiar at the time with the findings.

I mentioned with the paper title that the word deadman has a dual meaning, because in this instance we unfortunately have what seems like a train driver who died while he was driving. And because this is a possible risk scenario on these trains, we have what's called a deadman system.

These intercity passenger trains operate with a single driver. You can imagine that the risk of an incapacitated single driver is a well-understood risk, so we've engineered systems to detect whether the driver has become incapacitated and take action to safely stop the train. This is the deadman system.

The particular Tangara trains involved in this incident have two connected deadman devices. The driver either has to maintain a constant twisting tension on a handle on their console, and if they release that tension, the deadman system will go off; or they have to keep a constant pressure on a foot pedal, and if that pedal is pushed all the way to the floor, or if the pressure on it is completely released, the system goes off. One of those two devices always has to be held in that constant state of tension.

I guess the theory is that if someone becomes incapacitated, their body will go limp, they won't maintain that constant tension, and then a solenoid will activate the emergency brake. It's a pneumatic braking system that's constantly charged with air, which holds the brake shoes off the wheels. If the air in that system is released, then all the brakes activate and the wheels literally lock up. The train will slide along the tracks until it stops. That's basically how this system works.

I guess in this situation, maybe to steal the conclusion of the report, it appears as though the slightly larger driver, even if he did have a heart attack and become completely incapacitated, had enough weight in his leg to maintain a constant pressure on this pedal. As far as the train was concerned, the driver was still driving the train.
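To make the mechanism concrete, here's a minimal sketch of the kind of check a two-device deadman arrangement performs. It's purely illustrative: the function name, thresholds, and the "middle band" for the pedal are our own assumptions for the example, not figures from the report or the paper.

```python
# Illustrative sketch only: the thresholds, names, and "valid band" idea are
# assumptions for this example, not details taken from the Waterfall report.

def deadman_should_brake(handle_tension_nm: float, pedal_position: float) -> bool:
    """Return True if the emergency brake solenoid should dump the air.

    handle_tension_nm: twisting tension the driver holds on the console handle.
    pedal_position: 0.0 = fully released, 1.0 = pressed flat to the floor.
    The pedal only counts as 'held' while it sits in a middle band.
    """
    handle_held = handle_tension_nm >= 5.0        # hypothetical minimum tension
    pedal_held = 0.2 <= pedal_position <= 0.8     # hypothetical valid band

    # Either device being actively held keeps the brake air charged.
    # The inferred Waterfall scenario: an incapacitated driver's leg weight can
    # keep pedal_position inside the valid band, so the brake never triggers.
    return not (handle_held or pedal_held)
```

The final comment is the crux of the inferred Waterfall scenario: leg weight alone can keep the pedal input looking "held", so the check never trips.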

Drew: Thanks for that explanation, David. One other piece of background we should explain very quickly is what kind of inquiry this was. This was a special commission. Australia has a system of public inquiries fairly similar to what happens in the United Kingdom. Depending on exactly where they happen and who authorizes them, we call them special commissions, royal commissions, or just commissions of inquiry.

They used to be fairly rare; you'd have one or two a decade. They're becoming increasingly common, and it seems like there's always some sort of royal commission or special commission going on in Australia into something. They have fairly strict terms of reference set by the government, and within those terms of reference, very wide-ranging powers to collect evidence and make recommendations. Each commissioner gets a letter of authorization directly from the head of state, who in Australia is either the state governor or the Governor-General of the Commonwealth.

Usually, these are run by some sort of senior judge or retired senior judge. They might have different investigators working for them, sometimes technical advisors. In this case, they actually had multiple technical committees working under the judge.

What's interesting about Waterfall is that the particular judge was retired Supreme Court judge Peter McInerney. He was also the commissioner for an inquiry into another rail accident, the Glenbrook rail accident in 1999. It's fun reading the Waterfall report, because it's pretty clear that they dragged him out of retirement, he does this one detailed investigation, and not long after he's finished that, they drag him out of retirement again. He finds himself reading the same documents and making the same recommendations, because they haven't acted on all the stuff that he said they should do the first time.

About half of the Waterfall recommendations are about how to make sure that royal commission recommendations get properly followed. One of the rules he put in place is that the government had to report every year on their implementation. Right up to 2019, more than 15 years after the accident, they were still producing annual reports on their progress in implementing the recommendations.

David: Yeah, Drew. In fairness, there were only a few years between Glenbrook and Waterfall. It takes a year or two for the report to come out, and then a year or so later you have this second incident.

Drew: Yes, but he was just very visibly annoyed that they hadn't taken immediate action.

David: Drew, do we want to talk a little bit about where Dr. Kenny's analysis starts and then get into the paper from there?

Drew: Okay. The Waterfall Commission produced, in the end, three reports: an interim report, a draft final report, and a final report. The interim report was mainly about the physical causes of the accident. The rest of the investigation was basically about how those physical causes came about: trying to explain how someone who was at imminent risk of heart failure was allowed to be driving a train, and how a deadman system that didn't work properly was allowed to be on the train.

Dr. Kenny's analysis starts with examining this assumption that every time you have an accident, it must have objective physical causes. Those physical causes have to come from objective organizational failures. I think that's a fairly fair representation of how we think about accidents and safety.

We talk all the time about socio-organizational or socio-technical causes. What we really mean by that is this assumption that we know physically what causes an accident, but that we can work backwards and say what organizationally caused the accident. We can and should try to fix both, not just fix the immediate technical causes, but also fix what went wrong in the organization.

Kenny's just examining: is this assumption really true? What background view of the world do we have that makes us think this is reasonable? Is it in fact reasonable? What she says is that identifying the causes of accidents depends on determining what went wrong against a background of assumptions about the way sociotechnical systems normally function.

Even though this set of assumptions never gets explicitly considered, examined, or tested, all of the recommendations are based on this ideal version of how it normally works or how it's supposed to work. There are all these conclusions about how the system must have been broken in order to permit the accident to happen.

David: Drew, in the paper she goes into quite a lot of historical detail on safety theory. She talks about Perrow and normal accidents. She talks about high reliability organization theory. She talks about Wynne's work, and really tries to make sense of all these different safety ideas and what they're trying to say in relation to this incident. It's actually quite a nice summary of some of these earlier ideas in safety about technological accidents.

Drew: Yeah. It's always interesting to see what people from outside of safety think are the most important big ideas to cite in safety. She actually draws out some of the biggest influences, including some things that lots of people in safety have never heard of. I think she manages to cite almost every one of John Downer's papers that had been written up to that point.

David: I thought that might have been another reason why you like this paper. We did an episode on one of John Downer's papers about nuclear.

Drew: Yeah, we've done an episode directly about John Downer and a couple of other episodes that touched on his work. Yes, I find his outsider perspective really quite interesting, looking in at the way we tend to think about technology. The main point she's making through this literature is just how certain we seem to be with some of our safety theories about how safety is meant to work, and how other fields, particularly science and technology studies, tend to see the world of work and technology, even down to the actual technology itself, as a lot more ambiguous.

This is often one of those points of big contention between engineering types and studies of technology types. Engineers tend to think that the technology sociologists just don't understand how the technology works. If they understood it properly, they wouldn't think of it as vague and ambiguous. But then the studies of technology people come back and say, yeah, but you only think how it works. It's a lot more ambiguous and a lot more subjective than engineers and scientists like to pretend. I have problems with that when it's applied to physics, but I think it's actually very true when it's applied to engineered systems.

David: When we talk about going from a technological failure and drawing a link back to, as you said, a broader socio-organizational type of failure, and we spoke a bit about counterfactual reasoning in the episode with Ben Hutchinson a few episodes ago, I think of the idea in the Waterfall report that the failure of the deadman system was a result of a weak safety culture. That's about as long a bow as I've heard of, going from that technological system through to something as vague and generic as a weak safety culture.

Drew: Yeah, and we'll come back to how some of those links are drawn and how valid they are. What I like is that Kenny starts right with the fundamental question of whether there was anything wrong with it in the first place. The special commission starts from the assumption that, okay, it didn't work in this particular case, therefore it must have been a bad system, there was something intrinsically wrong with it; therefore people should have known that there was something intrinsically wrong with it; therefore they should have fixed it and had a better system, or had a second system that was monitoring the driver as well as this system.

David: I also don't mind that reasoning from a safety engineering point of view, because you do have a handle and a foot pedal, and they both activate a single solenoid that dumps the brake system. From a common mode failure point of view, you've got one layer of protection. The idea of having a slightly higher safety integrity level and having a second layer of protection, I don't mind that from a safety engineering logic point of view. I guess that's something that I took out of the paper: I don't mind that engineering logic for a system that's safety critical.

Drew: Yeah. I don't mind the idea of saying we could have had a better system. It is in fact directly true that immediately after the accident, the local railway authority did implement, in those exact trains, a second monitoring system. But there's a big jump from "we can make it better and we can add something on" to "we should automatically have done that a decade before the accident; we should have prioritized that over all of the other things we were prioritizing."

One of the things that Kenny points out is that this has been a controversial kind of system since long before this one was even designed. The idea that we've got to have a system for checking if a driver is incapacitated almost always puts the burden back onto the driver to prove that they are still there. Every technology that people have come up with either involves some form of really aggressive, privacy-intrusive monitoring of the driver, like a camera staring in their face, or it involves the driver having to make some sort of constant or repetitive movement.

The types of systems they'd had in New South Wales for decades and decades before this train was even designed all had safety problems. They had workers experiencing chronic injury from having to continuously twist a lever or apply pressure to something. They knew that any time they tried to make a system that was user friendly, it was easy for the drivers to circumvent, but any time they tried to make something that was hard to circumvent, it was really, really hard and painful for drivers, so the drivers had a strong motivation to find a way to work around the system.

They'd made quite a conscious decision: if drivers are going to circumvent it no matter what we do, then let's at least make something that is comfortable for the drivers so they don't even want to circumvent it, and then we've got the best chance of not hurting the drivers and of having a system that works like it's meant to. There was really some genuine thought, trade-off, and project management decision making that went into how they'd set up this particular system.

David: I remember walking around a rail yard early in my career. If you walked around a rail yard, you saw a whole lot of brake blocks. These are the actual unit of the brake, I guess the pad itself. It's like a rectangular prism and it's probably a few kilos. I remember asking a driver, why are there so many brake blocks lying all over the floor in the rail yard? The answer I got was, they're really useful for helping drive the train. I guess read into that what you like.

Drew: Yes. That's in fact why, with this particular system, you've got to hold a certain amount of pressure on the pedal: if you've got a system where the pedal has to be pressed all the way down to the floor, that's really easy for the driver to just put a big heavy block on, and if you've got a system where it has to be pressed halfway, then the driver's got to find a stick that's the perfect length to keep the system working. Yes, that's a true story about these systems as well.

David: Okay, Drew. What else do we want to say on the history of the deadman design?

Drew: I guess the final thing to just say is that we don't actually even know whether the driver was turning the stick or pressing the pedal at the time of the accident. All of this is just assumption. We don't actually know that they had a heart attack. It's just like a process of elimination. That's what we think probably happened. They focused on the deadman pedal because they couldn't find anything wrong with the design of the switch, so they assumed that it must have been the pedal that was the problem.

Here are all of the things the inquiry said about the pedal. I'll just talk about the operator instead of the exact words here, so I'm paraphrasing. The operator failed to conduct an adequate risk assessment of the pedal. The operator failed to conduct a risk assessment to determine whether the overall driver safety system would stop the train and control the risk of a train driver being incapacitated.

The operator failed to do a design review of the driver safety system to determine whether the design concept would control the risk if a driver became incapacitated. The operator failed to implement an engineering management system. The operator failed to investigate whether the requirements of the driver safety system would be met by the particular design.

The operator failed to prepare a functional performance specification for the driver safety system. The operator failed to determine whether the design of the driver safety system would work before the manufacturer was contracted to build the trains. The operator failed to perform a functional performance specification to identify the means by which there'd be verification of the design of the driver safety system of the trains.

The operator failed to put in place a quality assurance program. The operator failed to implement a system of regular review to determine if the system was going to achieve the functional purpose. And the operator failed to conduct a risk assessment to determine whether or not a vigilance device should have been added on top of the driver safety system.

David, that's quite a list of failures. It sounds like the operator was absolutely terrible at managing safety. Look at all the things they did wrong just for this particular device. The assumption is that if they'd done any of these things, then they would have had a perfect pedal instead of this pedal that a driver of the wrong weight could fall onto and hold in position. David, you're the railway safety expert. Are there perfect systems that do this just sitting out there, waiting to be implemented for the same cost and same functionality?

David: I think expert is a bit of a stretch. This was 20 years ago as well, and these trains were manufactured a long time before that; I think this might even be a 70s or 80s era design. In terms of functionality and cost, we can look at this through a 2024 lens and come up with extra biometric monitoring. We can talk about a whole bunch of things.

The way the railway is moving now, with automatic train protection and European train control systems, means that if the train is doing anything other than what it's meant to be doing on that section of track, the system will automatically shut the train down.

At the time, they'd already invested in these multiple systems, one for the hand, one for the foot, and you mentioned a vigilance system. I guess after the incident, all they did was put a button on the dash that lights up every minute or so. If it's not pressed by the driver, it makes a really audible, annoying alarm, and if it's still not pressed, the system shuts the train down.

They did add, I guess, quite a simple system, but again, you've still got a common mode failure. You're still using the same emergency brake system, the same activation solenoid, and the same circuitry. I think at the time, it would have been easy to say, we've never had an incident with this system, I think it's okay.
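As a rough picture of how that add-on behaves, here's a small sketch of the vigilance cycle David describes, with the button, the alarm, and the brake as the escalation steps. The timings and names are our own assumptions for illustration, not specifications from the report, and the closing comment flags the common-mode point.

```python
# Illustrative sketch only: the timings and state names are assumptions for
# this example, not specifications from the report.

VIGILANCE_INTERVAL_S = 60   # hypothetical: the button lights up roughly every minute
ALARM_GRACE_S = 5           # hypothetical: audible alarm window before braking

def vigilance_state(seconds_since_last_press: float) -> str:
    """Map time since the driver last pressed the button to a system state."""
    if seconds_since_last_press < VIGILANCE_INTERVAL_S:
        return "normal"              # button not yet lit
    if seconds_since_last_press < VIGILANCE_INTERVAL_S + ALARM_GRACE_S:
        return "alarm_sounding"      # loud, annoying alarm on the dash
    # No acknowledgement at all: dump the brake air.
    # Common-mode caveat David raises: this still relies on the same solenoid
    # and pneumatic brake circuit as the deadman devices themselves.
    return "emergency_brake"
```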

I think Challenger comes up a lot in this report as well, because they liken this deadman pedal to the O-rings involved in Challenger and the launch decision there. Similar to Diane Vaughan's work around Challenger, I think another way of looking at this is that people did everything that made perfect sense to them at the time.

Drew: Yeah, that's a good way of putting it. I just want to give two quick quotes, one from the way the Commission talks about this, and then one from the way Dr. Kenny talks about the same thing. The Commission says, when risks such as the deficient deadman foot pedal are well-known to persons in managerial positions and nothing is done to investigate, let alone eliminate, the known risk, the inference that must be drawn is that those persons in management positions do not know how to determine what risks are acceptable and what risks are not acceptable and need to be eliminated or controlled.

That's very fixed language: the risk was there, the risk was well-known, the risk was appreciated, and the only explanation is that the managers must have been stupid. Even though the evidence was in their face, clear, and well-known, they must have almost deliberately made the wrong decision, because if you accept that it was well-known, then they must have actually done something nefarious.

What Kenny says is that according to the objectivist logic of the inquiry's legalist rationality, it appeared as though warning signs of future danger had been willfully ignored. However, in the years before the Waterfall accident, the level of protection from driver incapacitation afforded by the deadman system was but one of many competing objectives that the Tangara deadman design attempted to balance.

I'd just like to put both those explanations in front of anyone and say, which do you think is a more reasonable description of what these managers were doing? I guess it comes down to whether you're the sort of person who believes that all managers are evil, but I certainly think that the first one still leaves us wondering how we can possibly make sense of this, whereas the second one tells us clearly how we can make sense of it. Either they were doing something inexplicable, or they were actually acting reasonably with the information and the decisions in front of them.

David: Yeah, Drew. I worry that more people than not would think the first description is the better one. Anyway, maybe that's where we are.

Drew: I think the paper's got a great example. I actually dug up the example, because I wanted to check what the paper was saying about it and what the commission was saying about it.

David: You mean the memo?

Drew: The memo. I dug up the memo.

David: One of the definitions that the problems or the risks associated with the deadman design were well-known. Well-known is an interesting choice of words, a little bit like the deficiencies on the O rings in relation to cold temperature launch were well-known to engineers. Drew, the paper talks about this famous memo. I can't believe you went and actually read the memo.

Drew: Okay. Let me give you two things that the commission says about the memo. The first thing is they say, vigilance devices should have been installed on the trains when the deficiencies associated with the deadman system were first identified by a train driver, Mr. Wilkinson, in 1988. This is talking about the memo. The report gives three pages of analysis and discussion about what they should have done in response to this memo.

It concludes, I've set out in some detail the lack of proper responses by these three managers to the clear warnings given by Mr. Wilkinson, none of whom tested his findings in a moving or non-static situation, or attempted to speak to Mr. Wilkinson about them. I believe their attitude shows apathy in the minds of managers in the State Rail Authority at that time in regard to critical safety matters. It demonstrates a pervasive lack of concern for public safety. David, from that, what are you expecting this memo to say?

David: I know the answer, so it's hard for me to give you the answer I'd like to give to that question. But you would expect something fairly compelling, almost as if a train had actually crashed because this deadman system had failed, and maybe no one was injured or killed as a result, but we have seen this happen, we've then gone and recreated the situation, we understand the failure mode, and we absolutely know this is a problem.

Drew: Okay, at the risk of making this episode run over time, I'm going to read the whole memo. As we go through all of the details in this memo, I want you to keep in mind that this is not actually a train driver. This is a driver trainer who is not qualified to drive trains. This is the guy who looks after the train driving simulator, and don't imagine that this is a high-tech, perfect replica of the train.

The drivers, before they go on the train, they hop in this fairly artificial simulator. This is the guy who looks after the simulator. He doesn't do any driving himself. Here is the memo. I find that if I place my feet at the front of the plate, the weight of my legs will hold the deadman during motoring. This can be achieved by jamming your feet under the heating. No introduction, no conclusion. That is the entire text of the memo.

After the accident, three separate managers were all supposed to have gone and followed up with Mr. Wilkinson, the non-train-driving trainer, and asked him exactly what he meant. Instead, they did what they actually did, which was quite reasonably assume that a problem created deliberately by a trainer in the simulator has very little to do with how the actual system works on the actual train, and is probably not worth spending their time on compared to other safety issues.

David: Wow. How do you raise a safety report in 50 words or less?

Drew: He didn't even say it was a safety issue. The word safety doesn't appear anywhere in the memo or in the title of the memo. He's just complaining about the simulator.

David: Okay. Got it.

Drew: What the commissioner's saying is people should have gone to him and just checked. Are you complaining about the simulator? Then they should have run a series of scientific tests instead of using their own judgment as to whether they could break the system on the train in response to the memo in order to clear up this important safety issue.

This is what Dr. Kenny says, in contrast to the commissioner's three pages about what the managers should have done. Here, McInerney demonstrates the way that an objectivist approach to technology disregards the centrality of ambiguities and uncertainties in normal daily technological operations and discounts the necessity of reasonable assumptions that is integral to the normal operation of sociotechnical systems.

Further, this passage illustrates the objectivist faith that further testing of technical components can definitively resolve complex and often ambiguous questions that only in the aftermath of accidents appear as simple or well-defined. Basically, she's saying that the commissioner wants every single little bit of ambiguity to be traced, pinned down, and investigated. You can't use your judgment; you've got to do scientific testing, and that scientific testing will give you a definite answer, when this whole thing was a trade-off in the first place.

There's no evidence that people thought this was a perfect system. It was a deliberate trade-off between systems that couldn't be circumvented but made life hell for the drivers, and systems that had some weaknesses but were reasonably easy to operate, so the drivers could get on with doing other things like looking out the windows, checking that the passengers are safe, monitoring the speed, looking out for red lights, and all those things we expect drivers to do other than manage deliberately difficult deadman systems.

David: Yeah. The language in there about ambiguities, uncertainties, and trade-offs, I think this is so important in safety, and we don't talk enough about it. I don't think health and safety people talk enough about it, and we definitely don't talk about it enough at senior management or executive levels of organizations.

The chief engineer or project manager of this particular fleet will be dealing with wheelset issues, axle bearing issues, fire retardancy issues, rolling stock performance issues, and braking systems. There are hundreds of safety critical systems on these trains that are all in various states of imperfection.

Just like you said, we've got no information to suggest that the deadman system technically isn't working. I don't think it's in any way irrational for the project manager of this fleet to see this piece of information come in from the driver trainer and maybe not do anything with it. I know that people might judge me for saying that, but I think that's what Dr. Kenny's saying in this paper. This is just perfectly rational behavior.

Drew: Yeah. Should we talk briefly about the theoretical conclusions of the paper and then move on to some takeaways?

David: Yeah. We started with the ontological and epistemological position. The central part of this paper is having this epistemic kind of approach as a third way of thinking about accident causation. I'm going to leave that to you, Drew, as the academic expert to enlighten me and everyone listening.

Drew: Okay. This is drawing on John Downer's work explicitly. Kenny's not claiming to make some grand new discovery so much as pointing out an example of what Downer is talking about. There's a big push in safety to move away from technical or human error explanations towards socio technical explanations.

Both Downer and Kenny are saying this move that treats our organizational systems as if they are mechanical systems isn't necessarily a good way of looking at how organizations work. We shouldn't think of accidents as coming from failures of systems but from failures of understanding and sense making that may be inevitable.

This is the idea of an epistemic accident. It's a fundamental inability to work out in advance things that are ambiguous. We have to use human judgment. We have to make trade-offs. We have to work with incomplete information.

Inevitably, when we do that, sometimes it'll go wrong. That doesn't mean that the system is broken, it just means that when you've got uncertainty, sometimes you get unlucky. Even if you've got good systems, good judgment, good people, when they're working with imperfect information, sometimes they make trade-offs and we don't like the outcomes. Sometimes they take risks and the dice rolls the wrong number.

The problem, she says, is that if we base all of our recommendations and solutions on this idea of a broken system, then we're trying to impose a solution that requires a perfect reality onto this world that still contains all of these fundamental uncertainties. There's a really good chance that a lot of our recommendations are not actually going to fix things. They're going to run into exactly the same problems that caused the accident in the first place.

One of the fun examples she draws out is that in the interim report, McInerney's complaining about this lack of implementing a vigilance system. The rail authority immediately introduced a vigilance system into the trains. Then in the final report, McInerney complains that the sound the new vigilance system makes is exactly the same as one of the other alarms. He just uses this as further evidence that they don't have a good safety culture or a good approach to risk assessment.

This is exactly the thing that he said was obvious and should just be done. Then he complains that when it's done, it doesn't fix the problem, because the world is ambiguous, and the only way to have avoided this problem would be if, in implementing the system, they'd done proper consultation with the drivers and made a proper trade-off between these different systems and which should take priority. You can't say, okay, well, maybe they should have done all of that, at the same time as you're conducting a royal commission that didn't call a single driver as a witness.

David: I think there's also the inherent trade-off between needing to be seen to do something quickly and doing a thorough investigation. Before we jump into the practical takeaways, I wouldn't mind just saying that we've talked about this vigilance system as one action, but like any of these royal commissions, at least the ones I've seen, there were many, many recommendations from this particular investigation report. From memory, I've got a number like 43 in my head, but I'm not sure what the actual number was. I don't know if you've got the number offhand.

Drew: There were about a hundred, and some of them had multiple parts.

David: Okay, a hundred. My memory of 43 must have been the ones that we thought were relevant for the rail organization that I was in at the time. You then throw a hundred suggestions all at once into this very complex, ambiguous system.

What's interesting in the report is when you see things like, okay, this driver was unhealthy and at risk of a heart attack, so let's do a health check on all our drivers. Then the very next day, more than 13 percent of the State Rail drivers were ruled medically unfit to work. Then you've got 40 cancelled train services, and you've got people working overtime and scrambling to figure out how you're going to operate a railway. That's just one example of tens and tens of actions. The question is, do they actually make anything better?

Drew: Yeah, there were actually a huge number of recommendations like that. My favorite one, because I was keeping a close eye on it at the time, was that he wanted all of their managers to have proper safety qualifications. All the universities around Australia were saying, great, fantastic. You've got a thousand managers employed within this organization. They all need safety qualifications.

Does that mean a huge influx of money and resources into safety education in Australia? To which the answer is of course not. It just means the recommendation sits unactioned and then gets closed off by saying, we've got an external consultant to tell us that our existing training is adequate and certified.

David: Drew, are we ready to talk about some takeaways?

Drew: Sure. I found finding takeaways from this a little bit hard because it's mostly just challenging us about how we look at the world. I guess I've picked some takeaways that are not so much things you can do in your own organization as just maybe some challenges to thinking about safety, particularly when doing investigations.

The first one I've got here, David, is that if we want to improve safety decision making, then we need to be realistic about how decisions are actually made instead of when we don't like the outcome of a decision going back and assuming that decisions must have been made badly, negligently, or nefariously.

David: Yeah, I like that. I think any incident investigation needs to have a very strong view and mindset that people did things the way that they thought they should be done and were best to be done at the time. Our objective now is to understand why they did things in the way that they did, not to judge it as good or bad, but just to understand how our system was functioning to shape those behaviors and decisions.

Drew: The second one I've got is that after every accident, there's obviously going to be an incentive and a push to improve safety. Sometimes the accident has revealed a genuine immediate problem that needs to be fixed, but it's not always the case that our best investment is trying to stop the accident that's just happened. Often, we can pour resources into trying to prevent something that was fairly low likelihood and we just got unlucky. We'd actually do better pushing those resources into preventing a different accident or a different class of accidents rather than trying to just assume that because something bad has happened that we weren't investing enough in that particular area.

David: Yeah, I think that's really important, Drew. I was thinking about that previous practical takeaway about needing to understand why people were doing things at the time because the person could have been going, oh, in 1988 when this particular memo came through, we were working on this really challenging issue around some other critical system. We knew it was a massive problem, we had all of our resources devoted to it, and we managed to solve it.

You could actually go to the team and say, well done, that seems like a perfectly good trade-off decision that you made at the time. But when this report lands, no one in the organization gets it and goes, okay, where do these 100 actions stack up against the things that we're already working on?

It might not be that we disagree that it could be a useful thing to spend a whole lot of time on this deadman system. But if I've got to take resources off people who are doing a whole lot of other high risk stuff just because this one thing happened, then I think what you're saying is, be careful of swinging the car across the road, because there might be a bigger hazard there that your team are currently working on.

Drew: Yeah. I guess the particular thing I've got in mind is that after every single major rail accident in Australia, the recommendations have included reviewing whether we should have a European-style automatic train protection system. I don't know how many hundreds of millions have been spent investigating the feasibility of automatic train protection systems in Australia, and how many tentative projects have been started with partial implementations.

The fact that an accident has happened doesn't change the calculus that people have already worked through when they investigated and decided not to do it. These recommendations just result in another feasibility study about whether we should do it, or a small test project. I don't want to make a judgment here about whether it's right or wrong that we've made that calculus, but we shouldn't be thinking that just because an accident has happened, politicians are suddenly going to change their minds about billion-dollar investments in rail.

David: Drew, if it is any consolation, I know in your part of the world, in Brisbane, with the Cross River Rail project, that you will have ETCS in a reasonable portion of your network. I'm not sure how often you ride the train.

Drew: I don't know. A fun fact from the paper, though, is that Justice McInerney apparently rode that exact model of train to and from the commission without any concerns about its safety.

David: There we go. Okay, third takeaway, Drew.

Drew: Okay. Sometimes it's better to just embrace uncertainty. I know that's easy for academics to say because we live in more of an uncertain space, but it is okay not to know exactly what happened after an accident. It is okay not to be able to say for sure what we could or should have done to prevent the accident. It is okay to say just sometimes SHIT happens. That doesn't mean that you don't care about safety. It's just being honest and realistic about what we know and don't know.

David: I think if you do find something, even if you think, hey, this might not be exactly what happened, but through the course of our discovery work we found something, then find one thing that you might be able to do that makes a positive difference and doesn't disrupt the work too much. You don't need 20, 25, or 30 actions. In fact, if you do 10 things and things seemingly get better, you still don't know which of those 10 things actually helped and which didn't, so I like that "one thing" idea.

Find something, do one thing about it, and stop there. With complex systems, I think we should be deliberately making probably only incremental change because we don't understand it well enough.

Drew: Yeah. Imagine if after every single workplace incident or accident, we genuinely made one thing better. That would actually be probably better on average than our current hit rate of spending a lot of resource on investigations and making quite a few recommendations.

David: Your final takeaway there, Drew?

Drew: The final takeaway is particularly about this idea of having systems that can prevent things slipping through the cracks. I'm not a big fan of safety management systems, but having your proceduralized systems does have some value in large organizations. It stops things slipping through the cracks, and at the very least it helps you defend and justify the decisions that you're making.

A system is never an alternative to good human decision making. Having a good system doesn't mean you make better decisions, get every trade-off right, or get your priorities right. It just gives you a way of carefully documenting what decisions you've made and what priorities you've set. Lots of investigations have a huge faith in these systems being able to work, and then the next investigation just says, well, we had the system and it still didn't solve the problem, so people must have had a bad culture or not been following the system properly.

David: That long list of failings that you read out earlier in this episode included several failures of risk management, audit, quality assurance, functional specifications, design reviews, and all of these systems that supposedly might have made a difference. But I agree with you. It's people that make decisions, not systems, and we need to try to, I guess, help and support that decision making as much as we can. Drew, the question we asked this week was, how should we account for technological accidents? Your thoughts?

Drew: The particular view in this paper is that accidents happen because people don't have perfect knowledge. In the absence of that perfect knowledge, they're going to make reasonable decisions that, in hindsight, sometimes turn out to lead to accidents.

David: That's it for this week. We hope you found this episode thought provoking and ultimately useful in shaping the safety of work in your own organization. Send any comments, questions, or ideas for future episodes to feedback@safetyofwork.com.