The Safety of Work

Ep. 100 Can major accidents be prevented?

Episode Summary

In our very special 100th episode, we attempt to answer our title question with a discussion around the book, “Normal Accidents: Living with High-Risk Technologies” by Charles Perrow. This book was first published in 1984, but later editions were released in 1999 and beyond.

Episode Notes

The book explains Perrow’s theory that catastrophic accidents are inevitable in tightly coupled and complex systems. His theory predicts that failures will occur in multiple and unforeseen ways that are virtually impossible to predict.

Charles B. Perrow (1925 – 2019) was an emeritus professor of sociology at Yale University and visiting professor at Stanford University. He authored several books and many articles on organizations and their impact on society. One of his most cited works is Complex Organizations: A Critical Essay, first published in 1972.

Discussion Points:

David and Drew reminisce about the podcast and achieving 100 episodes
Outsiders from sociology, management, and engineering entered the field in the 70s and 80s
Perrow was not a safety scientist, as he positioned himself against the academic establishment
Perrow’s strong bias against nuclear power weakens his writing
The 1979 near-disaster at Three Mile Island - Perrow was asked to write a report, which became the book, “Normal Accidents…”
The main tenets of Perrow’s core arguments:
Start with a ‘complex high-risk technology’ - aircraft, nuclear, etc
Two or more failures start the accident
“Interactive Complexity”
Boeing 737 Max failures - failed system + unexpected operator response lead to disaster
There will always be separate individual failures, but can we predict or prevent the ‘perfect storm’ of multiple failures at once?
Better technology is not the answer
Perrow predicted complex high-risk technology to be a major part of future accidents
Perrow believed nuclear power/nuclear weapons should be abandoned - risks outweigh benefits
Three reasons people may see his theories as wrong:
If you believe the risk assessments of nuclear are correct, then his theories are wrong
If they are contrary to public opinion and values
If safety requires more safe and error-free organizations
If there is a safer way to run the systems outside all of the above
The modern takeaway is a tradeoff between adding more controls, and increased complexity
The hierarchy of designers vs operators
We don’t think nearly enough about the role of power - who decides vs. who actually takes the risks?
There should be incentives to reduce complexity of systems and the uncertainty it creates
To answer this show’s question - not entirely, and we are constantly asking why

Quotes:

“Perrow definitely wouldn’t consider himself a safety scientist, because he deliberately positioned himself against the academic establishment in safety.” - Drew

“For an author whom I agree with an awful lot about, I absolutely HATE the way all of his writing is colored by…a bias against nuclear power.” - Drew

“[Perrow] has got a real skepticism of technological power.” - Drew

“Small failures abound in big systems.” - David

“So technology is both potentially a risk control, and a hazard itself, in [Perrow’s] simple language.” - David

Resources:

The Book – Normal accidents: Living with high-risk technologies

The Safety of Work Podcast

The Safety of Work on LinkedIn

Feedback@safetyofwork

Episode Transcription

David: You're listening to The Safety of Work podcast episode 100. Today we're asking the question, can major accidents be prevented? Let's get started.

Hi, everybody. My name is David Provan. I'm here with Drew Rae. We're from the Safety Science Innovation Lab at Griffith University in Australia. Welcome to The Safety of Work podcast. Welcome, Drew. Welcome to our 100th episode.

In each episode, and today will be no different, we ask an important question in relation to the safety of work or the work of safety, and we examine the evidence surrounding it. Drew, 100 episodes feels like a good milestone in terms of statistics, because in safety, we love statistics.

Drew: We've had three years without a fatality, David.

David: Oh, we're three years fatality-free. Get a sign up outside the studio. In almost three years, a quarter of a million downloads or so, 120 countries. I think we've got a bit over 5000 followers on our community on LinkedIn. It's really cool that the podcast has seemed to be something that is in some ways useful to maybe safety professionals and others who think about this stuff in organizations and universities more broadly.

Drew, from me to you, thank you very much, personally. I greatly appreciated the support and guidance you provided to me during my PhD. I'm just so pleased that we found a way to continue to find ways to collaborate. Thanks for being you.

Drew: It's been lots of fun and good discipline. I don't know how many of our listeners have read the papers that we've referenced on each episode, but at least we've read the papers we've referenced on each episode. It's been good to keep up our reading and discussion of both recently published stuff and, as we've done a little bit more recently, going back to some of the classics and giving them a reread.

David: There's a lot of pressure to think about, what are we going to do for episode 100, but true to our purpose for this show, it's about the science of safety. So we wanted to try to find maybe a central question in safety. We landed on this idea of, can major incidents be prevented? This idea of all accidents can be prevented.

We wanted to get into that question. I think we've found a way of doing that. We're going to talk about a theory book today. Drew, I want to just talk a little bit about safety science or theory more broadly and how we should think about something, because you and I are going to have some different opinions of what we talk about today in some ways. Do you want to just talk a little bit about when one person comes up with an idea?

Drew: Sure. I don't think it's ever the case that anyone comes up with a safety theory in a vacuum. Everyone who comes to safety is always steeped in some sort of literature and research understanding that they bring into the topic. More recently, when people do safety work, they're steeped in the history of previous safety work. But safety science hasn't actually been around that long.

Particularly in the 70s and 80s, we had all of these people coming into safety from outside. We had people from psychology, like James Reason and Nancy Leveson. We had people from sociology, like Barry Turner, Nick Pidgeon, and the author we're talking about today, Charles Perrow. We had people coming from management and engineering, like the HRO scholars who followed on behind Charles Perrow's work.

So no one comes out of nowhere, but everyone is responding to what is important and salient at the time that they start looking into it. I think people's ideas about what causes accidents get very heavily shaped by the first couple of accidents that they look at.

If you come at safety and the first thing you read is The Challenger investigation, that gives you one view about how accidents happen. You come and look at something like Three Mile Island. You get a very different perspective.

David: I think that's important. It's important to understand where these theories come from, whether people are trying to make sense of patterns in the world or trying to account for things that occur in the world which don't seem to be explained by other theories.

I liked this work. I think this is probably an understated work in safety as hopefully we can talk about, but we are going to talk about a Charles Perrow book titled Normal Accidents: Living with High-Risk Technologies. First published in 1984, it's about 450 pages long. It's not a small book, but it's quite an easy read.

Drew, I don't know what version you're working off. I'm working off a 1999 print which has got an extra afterword and some interesting stuff about the Y2K bug, if people remember what was happening in 1999–2000. It's a book that I think is well worth a read for anyone who's interested in safety.

Before we dive into it, I think Charles Perrow was born in 1925. He unfortunately passed away a few years ago. His last post was as an emeritus professor of sociology at Yale, and he was also a visiting professor at Stanford. He was primarily concerned with the impact of large organizations on society. Most of his work was on the sociology of organizations, complexity. He had a very long academic career. I think he got his PhD in 1960.

His most widely-cited work is a publication titled Complex Organizations: A Critical Essay from the early 1970s. Even though he went on to publish about Fukushima and another book called How Catastrophes Happen, most of his work, he continued a lot of work outside of safety. I don't think he would have ever really considered himself a safety scientist.

He published on climate change, on politics, on the economy, on social challenges like conservative radicalism. He published his last peer-reviewed publication in 2013 when he was about 88 years old. Maybe we might be podcasting still in 40 or 45 years time. Who knows?

Drew: I'd like to think we had careers that long. I'd also like to hope that generating a little bit of controversy is a good way of sustaining a career. Perrow definitely wouldn't consider himself a safety scientist because he deliberately positioned himself against the academic establishment in safety. He had some knock-down, drag-out fights with some of the key figures around his time. He even talks in the book about a review he gave of Paul Slovic's work on risk.

We've touched a little bit on Slovic I think in a previous episode, David. Slovic was talking about the difference between expert perceptions of risk and lay perceptions of risk. Perrow was very much of the idea that expert opinions were just as socially constructed and just as subject to bias as laypeople's, and was very skeptical of things like quantitative risk assessment, particularly as applied to things like nuclear power.

David: You mentioned nuclear power there, Drew. I think I want to throw it out to you early on, because this book fell out. I will use the words fell out. I'll talk about that shortly, but it fell out of the Three Mile Island incident in the US in 1979. Nuclear power and nuclear weapons are a fairly central theme of Perrow's argument. Do you want to maybe offer some opening thoughts about that, and then we can step off from there?

Drew: I thought it might be worth early on, noting and then putting aside some of Perrow's ideological bent. It's funny. For an author who I agree with an awful lot about, I absolutely hate the way all of his writing is colored by what I think can only be fairly called a bias against nuclear power. This entire book is as much an argument against the adoption of nuclear power as it is a theory about why accidents happen.

Perrow is constantly struggling to explain what makes nuclear power special compared to other industries. It involves all sorts of weird special pleading. He's setting up a definition and then subverting his own definition when he comes to apply it to nuclear power.

Throughout his career, Perrow has constantly come back to nuclear accidents as both the thing that drew people's attention to his work on safety and his ongoing argument against the establishment, particularly his persistent attempts to claim that he predicted future nuclear accidents, even though most of the things that he predicted never actually came true. He has a whole chapter on why there haven't been more nuclear accidents.

Pretty much, his main argument there is, oh, it hasn't been around long enough. Just give it 20 years and there'll be heaps of accidents. It just never came to pass. Even the scale of future accidents was something that he's constantly re-arguing.

There is a reading of this book purely as this is Perrow's fight against nuclear power. I think that's worth setting aside because a lot of his arguments really aren't about nuclear power, fundamentally. He just likes to treat it specially. Really, what he's doing is introducing some things, almost co-inventing them.

I can't see any sign that he was aware of Barry Turner's work. Barry Turner's book came out first, but this was a time when personal computers were being invented while Perrow was writing the book. He talks about how his work sped up when he first got a personal computer. The ability of people on different sides of the world to encounter each other's work and understand where progress has been made relies on you knowing who else is working on the same things.

Although Perrow has a long, long list of references and he gives credit to (I think) a total of about 20 different graduate students who helped him write the book, a lot of the other work in safety, he just never encountered, which I don't think is his fault. Yeah, there are lots of people today who've never heard of Barry Turner.

Anyway, the point I was getting at was that he independently invented a lot of foundational thinking in safety. He wasn't the first to think of it, but he also did it without standing on the shoulders of other people who had those same ideas.

David: I think safety science, if we talk about it, is a bit of a niche field today. Back then, it wasn't even so much as a field. He would have had to go looking in rather obscure human factors journals, cognitive systems engineering journals, or something like that, whereas, from his point of view, he was doing a sociological exposé on Three Mile Island.

How this started is there's this incident in 1979, a near disaster (I don't know, Drew, you might have done a DisasterCast episode on TMI). Professor Cora Marrett was appointed to the President's Commission on the accident, and she also happened to be a board member of the Social Science Research Council. She really wanted to make sure there was some social science input into what she felt threatened to be an entirely engineering-oriented investigation into the incident.

She reached out to a whole bunch of people, including Perrow, who she was aware of through sociology and social science. She said, I can provide you with all of the hearing transcripts from the three months of all of the interviews and all the hearing transcripts, and you've got three weeks to give me a 10-page report on how you see the social science side of this incident.

Perrow often (like you said, Drew) involved some of his graduate students. He went off and wrote 40 pages within those three weeks. He sent it back and he says he started this book without knowing that he was starting a book. Four years after he wrote that 40-page essay, it became the book that we're reviewing today. Is that how a normal book starts in the academic world?

Drew: Certainly in safety, it seems like a lot of books start off as 40 pages that someone stretches out to a book by adding in more examples and chapters to get it up to book length. It's something which I think Perrow definitely avoids in the sense that, even though he does repeat the same idea over multiple examples, every chapter brings a whole heap of new information and new analysis, and even new theoretical ideas come out in each chapter.

David: Drew, our listeners now, after 10 minutes, would probably go, woah, you've got 450 pages. This is a long listen. We'll spend most of our time on the introduction and encourage you to just explore these ideas for yourself. What Perrow's basically saying is, this is the late 70s, early 80s. He's saying, there's a huge growth in high-risk technologies. The technology is multiplying. He talked a lot about wars multiplying through.

He'd leveled nuclear weapons alongside nuclear power as well. He said, we're invading more of nature, we're creating these complex systems, we're creating organizations within organizations. What he wanted to do is look at a number of these types of systems. Nuclear power, we've mentioned. He looked at petrochemical plants, aircraft, air traffic control, ships, dams, mines, nuclear weapons, and what he called exotic technologies, so space missions, DNA, genetic engineering, those types of things.

He's saying that every year, there are more such systems, and Perrow suggested that this is bad news. I can see what he's seeing, the huge growth in technological systems, the social impact, because he was quite socially-orientated as well, which you mentioned about nuclear power. But even more, he was quite critical of society not getting a choice in the adoption of these types of technologies that favored the elites within society.

He just said, this is not a good thing. If we want to have all these technologies or if people think that we should have all these technologies, then we need to be prepared for the disasters that are going to come.

Drew: He's got a real skepticism of technological power. In particular, the idea that you could have a system that society might not otherwise accept, that is assured, and society is told that it's safe using obscure methods, people, and regulatory regimes, that society is just expected to trust. I don't think he quite gives enough credit to just how transparent some of these processes are. He was writing this after the WASH-1400 report and all of the criticisms of that report.

The idea that risk assessment of nuclear power stations needed to be more transparent and needed to be more honest about its uncertainty was something that was already out there in the public domain at the time when he was doing this writing. But he was still very skeptical about this idea that experts know best and that expert opinions of risks should be given precedence over social understanding of risk.

David: What Perrow says earlier in the book is what motivated the inquiry in the book was the idea that if we can understand the nature of risky enterprises better, we may be able to reduce or even remove the dangers. Even though he says throughout the book that we could do this, I don't think it's going to happen.

He also went on to take a first shot at more conventional safety management. He says, there are many improvements we can make to improve safety—operator training, better designs, quality control. He basically said, people are working on these things, but the risks appear faster than these risk reductions can help.

What he basically says is that no matter how effective our conventional safety approaches are, there's a form of accident that is inevitable. He believed there were some special characteristics of the way that failures can interact within these systems, but we should be able to understand them, to understand how they can occur, and in his words, why they always will.

Drew: I think it might be worth at this time just laying out the main central steps in his argument, just so the listeners have some idea of just what he's actually concretely claiming here.

David: I'll lay out this core argument and then a quick example, Drew. I'm keen for your thoughts on it. He says, you start with a petrochemical plant, a plane, a ship, a power station, one of these complex high-risk technologies. It's any system that has lots of components. There are lots of engineered parts, there are lots of procedures, there are lots of operators, and what we need is two or more failures among these components that interact in some unexpected way. No one thought that when X failed, Y would also be out of order or be impacted.

These two or more failures would interact and then break the system. The example is, at the same time a fire starts and an alarm gets silenced. Further, no one can figure out this interaction in real time and respond accordingly. The problem never really occurred to the designer. Once it does, they will put an additional control in for the next time. This new control might solve one problem, but it might introduce three new failure points through these interactions.

He called this the interactive complexity of the system. He was leaning heavily on this idea of tightly-coupled systems, which had been in engineering for the last decade at the time of writing. Forty years later, if we think about the two Boeing aircraft crashes on the 737 Max aircraft, it plays out his argument.

He says this MCAS software system is going to fail. The pilot's not going to respond how the designers would have expected them to respond. If you combine the failed system with the unexpected operator response, then those two components combine in a way that leads to the disaster. You can't prevent it the first time, you can only try to fix it the second time.
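As a rough illustration of why an added control can create more new failure points than it removes, the sketch below counts the pairs of components that could, in principle, interact. It assumes, purely for illustration and not as a figure from Perrow's book, that any two components might interact; the function names are hypothetical.

```python
def potential_pairwise_interactions(n_components: int) -> int:
    """Count distinct component pairs, assuming any two components could interact."""
    if n_components < 0:
        raise ValueError("n_components must be non-negative")
    return n_components * (n_components - 1) // 2


def new_pairs_from_added_control(n_existing: int) -> int:
    """Count the extra pairs created when one control is added to n_existing components."""
    before = potential_pairwise_interactions(n_existing)
    after = potential_pairwise_interactions(n_existing + 1)
    return after - before
```

Under that assumption, a system of 10 components has 45 possible pairs, and adding one control to fix one known interaction adds 10 new pairs that could fail together.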

Drew: I want to be clear spelling out the second element here, because there are two paired concepts that he talks about. One of them is this interactive complexity. He makes the argument that only certain systems have this complexity. This is where his bias and some of his lack of understanding of different industries means that he's weird about which industries he spells out as having interactive complexity and which ones don't.

For example, he specifically says that aircraft don't have as much interactive complexity. He says that chemical plants don't have as much interactive complexity. I think future accidents have shown that this inability to understand how multiple failures could occur and interact is pretty much true across any industry.

The second concept, he says, is tightly-coupled, which is really confusing because he's drawing on something which, in organizational theory, Karl Weick had previously talked about. In engineering, we use the term tightly-coupled as well. Perrow doesn't actually mean either of the existing definitions of tightly-coupled. What he's talking about is something that (I think) today we think of more as brittleness.

For him, a tightly-coupled system is one where there's very little margin, either margin in extra resources, margin in flexibility, or margin in time. He says you can have a very complex system, but as long as you have got what we now call resilience, it's okay, because you've got time and you've got space to work out what's going on and adapt.

It's when everything is very time-sensitive and everything is very closely linked, so that one step inevitably leads to the next step. You've got no flexibility of action. It's when you've got both of those things being high. That's when you have a really high potential for an accident. The real funny thing here is just which systems he singles out as having this, and when it does and doesn't lead to an accident.

Three Mile Island is a perfect example of a system which had interactive complexity but actually had plenty of time. That's why the accident didn't in fact happen. Three Mile Island was an incident, not an accident, precisely because the events extended over several days, which was enough time for people to react and work out what was going on, adapt, and fix the situation.

David: I believe that you're right. If we've got enough time, if we've got enough information, if we've got enough resources, we can solve any operational problem. It's when we get constrained by time, information, or resources, where we can't stay ahead of the risk.

My only comment about the classification of industry for Perrow is I think it's a typical sociological classification. He was just looking at what systems involve more people interactions as opposed to interactions themselves.

A plane seemed very simple to him because you got two people in the front. Whereas air traffic control, communication between the control, the ground, and the pilots, seemed much more complex to a sociologist.

Drew: That's something that I think is really interesting, and one of the reasons why this book rewards reading it rather than just listening to a quick summary of it. Perrow was one of the very first thinkers (pretty much, I think, apart from Turner) to not stop at the technical level when analyzing a system.

He's quite inconsistent about when he does it and when he doesn't do it, but I think that's just because he's one of the first people feeling out these ideas. He says some of the complexity isn't in the technology. The complexity is in the organization around it.

But at one point, he makes the argument that aircraft are not so complicated, even though they're very high tech, because they're well regulated, and the regulation system works fairly well. One of his arguments against nuclear is that he thinks that the regulatory structure isn't effective, so he moves in and out of technical complexity versus organizational and effective management of the technology.

David: I think you're right. It's great to see, and I agree it's worth reading, because moving beyond either the mechanical failure or the operator failure into the organization is a great contribution of this work.

He opens in the first few pages of the book. It's a long story about everyday life, but he basically talks about this event where I need to go to a really important business meeting. I think he says a job interview. He goes, but I've locked myself out of my apartment because I've raced out without my car keys. I normally have a spare set of keys, but I lent them to someone, so I'm locked out of my apartment.

I go to my neighbor and go, that's fine, I'll just borrow my neighbor's car, but my neighbor's car won't start. Then I go, okay, well, that's fine, I'll just call a taxi. But I can't call a taxi because there's a bus strike on that day, because the buses, ironically, are complaining about a safety issue and something else. So he misses this job interview.

Perrow asked the question, what do you think is the cause of this incident? Is it human error because you locked yourself out of your apartment? He says, if you agree with that, then you agree with the President's Commission for Three Mile Island, which primarily blamed the operators and no one else for the event. Or is it the mechanical failure, which is that the neighbor's car won't start?

He said the company operating TMI blamed the individual valves and sued the supplier of those valves. He goes, or do you talk about the environment or the design of the system as a whole? What Perrow's saying is the cause is none of the above because we can deal with locking ourselves out of our apartment. We can also deal with a neighbor's car not working, and we can deal with a bus strike or a taxi.

He's saying, this idea of which one is it misses the point, because the failures alone are trivial, even banal. The idea is this perfect storm when the system design all falls apart at once. Drew, I thought it was quite an easy-to-understand story about what his argument was.

Drew: I think this raises one of the key questions in safety, and in fact, the question of this episode, that it's inevitable that the individual failures are going to happen. In any sufficiently complex system, you're going to have lots of individual failures. It's inevitable that some of those failures are going to interact in dangerous ways.

The question is, can we either predict or otherwise prevent the dangerous combinations from arising? And different people have got different answers to those questions.

You have the safety engineering approach, which says that with sufficient engineering analysis, we can work out what the dangerous combinations are and we can make sure that they don't happen.

The particular approach that they use in nuclear, which Perrow is very critical of, is defense in depth, where we build multiple layers so that if one layer fails, we've always got other layers to catch it. Perrow rightly argues, yeah, but that assumes that each layer is independent, and they're not. There could be things that you haven't thought of that cause those layers to interact with each other.

You've also got things like the HRO argument, which says we can design systems that can be robust even when multiple failures happen. Perrow's answer is, in certain industries, because of the level of interactive complexity and tight coupling, it's inevitable that we're not going to successfully prevent unsafe interactions. That's where the whole idea of normal accidents is, that it's inevitable that we are going to fail.
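The independence assumption behind defense in depth can be made concrete with a small calculation. This is a minimal sketch, assuming a deliberately simple model that is not taken from Perrow's book: each layer fails with the same probability, and there is some small chance of a shared cause that defeats every layer at once. The function names and the example numbers are hypothetical.

```python
def p_all_layers_fail_independent(p_layer: float, n_layers: int) -> float:
    """Probability that every layer fails, assuming layers fail independently."""
    return p_layer ** n_layers


def p_all_layers_fail_common_cause(
    p_layer: float, n_layers: int, p_common: float
) -> float:
    """Probability that every layer fails when a shared cause can defeat all layers.

    Assumed model: with probability p_common a common-cause event disables
    every layer together; otherwise the layers fail independently.
    """
    return p_common + (1.0 - p_common) * p_layer ** n_layers
```

With three layers that each fail 1% of the time, the independent estimate is about one in a million. Adding a 0.1% chance of a shared cause raises the combined figure to roughly one in a thousand, which is the gap Perrow's criticism points at.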

David: Yeah, this point of multiple failures. We've got the language for this now in risk engineering, independent protection layers, and Perrow's argument that locking yourself out should be independent of the neighbor's car, which should be independent of the bus strike, but nothing stops these things from failing at the same time.

Drew, I'm interested if you know of any link between Reason's work and Perrow's, because Perrow says in this book that accidents are the result of multiple failures. A couple of years later, we got the Swiss cheese model, which was this idea of multiple causation of incidents. By now, I guess Perrow's ideas have progressed into safety science, if they were, in fact, his ideas.

I tried to actually bottom out some of the starting points of these things, but it's actually a very hard thing to do, which is to try to find when the first appearance of a particular idea seemed to turn up.

Drew: I did a little bit of a deep dive into this one, David. By the time that James Reason wrote his book on human error in 1990, the idea was swirling around the zeitgeist in a number of different ways. Reason's book is actually dedicated to Jens Rasmussen. He cites Dave Woods multiple times in his book. Woods, of course, built on the work of both Rasmussen and Turner.

The idea of organizational accidents and man-made disasters through the organization rather than through the individual components or human error was very widely spread. But the difference is that most of those other authors weren't really interested in individual human error so much as technological failure. Reason pulled it back into the area of individual psychology and talking about system failure in contrast to human error. Whereas previously, people had been talking about system failure in contrast to component failure.

David: These are the collection of ideas at the time. We even see it now. If anyone tries to unpick Safety II, Resilience Engineering, Safety Differently, and Human Organizational Performance, it will be very hard to figure out where these ideas came from, because you can read this book, Normal Accidents, on the role of operator error, and it can feel like you're reading something of Sidney Dekker's work, or like you're reading someone else's work. It's hard to know, other than that science does what science does. It builds on the ideas of other people and extends them a little bit.

Drew: You can sometimes track it either through direct citations or through the way people use particular ideas and particular language they use to refer to it. You can tell from the term ‘tight coupling’ that Perrow draws on organizational theory and the same space that Karl Weick was operating in. You can tell directly that because Reason cites Perrow that Reason at least knows that Perrow exists and has read his work.

It's telling that Reason only talks about Perrow in the context of Three Mile Island. When he's generally talking about systems accidents, he talks about other authors. You can read between the lines and say, okay, Reason was aware of Perrow's work, but likes these other explanations more when it comes to system accidents.

David: Drew, what I thought we'd do in the interest of getting this podcast out at some reasonable timeframe is there are about five topics that are introduced in the introduction that I think are worth having a brief talk about. Then I thought we'd just overview each of the chapters and then just pick off what Perrow actually thinks is the way forward from there. Maybe if I start.

One of the first things is that Perrow started with this idea about operator error. He said, virtually, every system places operator error really high on the list of causal factors for accidents. All of these industries are saying if there's a problem, he says 60%–80% of the time, this is labeled as operator error.

His view is that we shall see this time and time again, that our operators are confronted by unexpected situations. If you say that they should have zigged when they zagged, it's only possible to actually make this judgment after the fact. He pretty much discounts this idea; safety in these systems is a bigger issue than operator error.

Drew: I think that's one of the most important parts of Perrow's analysis of Three Mile Island. Effectively, he goes through the details and basically says, based on the information that they had in front of them, the operators took reasonable steps to prevent an accident at the same time as they were taking exactly the right steps to cause the accident. In hindsight, you can say they did the wrong thing. But if you look carefully at what they knew, they should have done exactly what they did. That's a paradox that someone trying to explain an accident needs to explain.

David: This was done in the 80s and 90s in simulated nuclear control room environments and in aircraft simulators, so we understand this now.

The second point that he made is that great events have small beginnings. The idea is that it may be just a trivial mishap on a ship, on a plane, or in a nuclear power plant; small failures abound in big systems.

He's saying aircraft aren't crashing because the wings are falling off the planes, but it's a combination of individually quite benign and trivial types of situations that combine in an unforeseen way and result in the incident. It might be one sensor in a control room not working, which doesn't alert the operator to something which then flows into an overfill situation. It's this sort of thing that would be labeled in resilience engineering more as a cascade. This idea in complex systems that a butterfly flapping its wings creates a tidal wave on the other side of the ocean.

Drew: David, the one other thing I'd throw there, though, is that he makes a distinction between cascade accidents, where each one of these small things causes the next thing, and combination accidents where two things are not directly caused. They're linked by some unknown behind-the-scenes.

He says that these really dangerous things are when two small things happen that aren't linked in a way that the operators would readily recognize them as linked. Two lights go off on the display, and they're two totally separate systems. What you don't know is that they both happen to be on the same circuit, which has caused both lights to blink.

David: The third point in the introduction is he goes straight into his sociological and organizational domains and talks about the role of organizations and management in preventing failures or causing them. He says, we talk a lot about hardware, we talk about temperature, and we talk about acute physical conditions of systems. He says that high risk systems have a double penalty because accidents stem from failures closest to the system.

Operators have to take this independent and creative action to respond to these situations. But because everything is tightly-coupled, what organizations are trying to do is tightly control everything that operators do, because the organization knows that operators are not aware of the broader functioning of the rest of the system. He says there's a bind here because an organization can't be both controlled centrally and decentralized at the same time. He says organizations are pushing and pulling at the same time.

Perrow suggested that time and time again in organizations, warnings of problems are ignored, unnecessary risks are taken, sloppy work is done, and then he went on to talk about deception and downright lying, this practice inside organizations. I found this section just a bit of a muddled stab at just organizations and management, but I don't know what you took out of this part of the intro.

Drew: Partly, what Perrow was doing was responding to criticisms of his own work while it was in its embryonic stage, and also responding to a lot of the defenses that people made for nuclear safety after Three Mile Island. I read this book as a little bit similar to John Downer's Disowning Fukushima, where John Downer goes through the people who try to say, oh, the next accident won't happen because of this, this, this, and this.

One of the things that people were saying is, after Three Mile Island, we're going to have better regulations to stop this. Maybe if the government takes a closer direct control over nuclear power plants, that will be the solution. What he's doing is he's responding to that suggestion and saying, be careful. It's like a balloon where you squeeze down on one part of the balloon, the next part of the balloon squeezes out. Try to grab too tight a hold over the complexity by putting in centralized control, and you just make your system more tightly-coupled.

David: You're just speaking so passionately today about nuclear power. Maybe we could do the John Downer episode next time. Does distance really create difference? Or something like that. I know you know that paper very well because you use it in the master's in graduate program at Griffith.

Drew: I hadn't actually realized we hadn't done an episode on John Downer's work. We've got to do that.

David: We might do a little nuclear series.

The fourth point here is that better technology is not the answer. What Perrow's saying here is that if we say that better operators aren't the silver bullet, then we probably must also agree that better technology is not the answer either. Even though Perrow points out that this book is all about technology, he also quotes this idea. In the quote, "A man's reach has always been beyond his grasp." Just as an aside, Drew, when Perrow publishes this, he quotes that, "A man's reach has always been beyond his grasp. [That goes for women, too.]"

I don't know if you realize, but whenever Perrow talked about operators in this book, he referred to the operator as she or her. I just noticed that immediately, and it is refreshingly progressive for a text of this nature in the 1980s to intentionally label all of these domains in the feminine.

Drew: I need to go back and have a look. I'm reading a different version than you, so I'm wondering if that's one bit that got updated. I'm wondering if he also pulled out all of his climate change skepticism. He's got a whole section in the 1984 bit that is talking about how nuclear power gets defended based on fears of climate change. He doesn't totally say climate change isn't real, but it's got lots of ifs in there, if this turns out to be true.

David: He was very clear, though, and we'll get to this at the end around nuclear power, that the risks of nuclear power far outweigh the benefits. Perhaps, the risks are seen as different now.

The last point here is that he talks about the issue is not risk but power. This is his background to the sociological aspects of technology adoption, capitalism, and the social construction of communities and societies. He foreshadows the risk here and the rise of the risk professionals. He suggests that it would be dangerous to let the risk assessors or the risk managers basically provide the advice and direction for how to manage these technologies.

He devotes a whole chapter to this new profession. He labels them as this idea of risk assessors using body counting to replace social and cultural values, and that these risk processes exclude society from participating in decisions that a few people who'd benefit have decided that the many cannot do without, so very socialist.

Drew: David, there's a paragraph I have to read to you from the introduction. Readers might have got a bit of a sense so far that I'm not Perrow's biggest fan. But for someone that I don't like, there are a lot of areas where we're in total disagreement. Here's a paragraph for you, David.

"One last warning before outlining the chapters to come. The new risks have produced a new breed of shamans called risk assessors. As with the shamans and the physicians of old, it might be more dangerous to go to them for advice than to suffer unattended.

In our last chapter, we will examine the dangers of this new alchemy, where body counting replaces social and cultural values, and excludes us from participating in decisions about the risks that a few have decided, the many cannot do without."

David: Do you agree or disagree, Drew? I'm not quite sure.

Drew: There are so many bits there that just sound like my own paper titles.

David: Okay, I think you said disagree at the start. I think you said for someone who I don't like the work, I disagree, but I think you meant to say agree.

Drew: No, I disagree a lot with Perrow, but there are so many areas like this that I just want to say, yeah.

David: I think in terms of your work in risk, probative blindness, and the cracks in the crystal ball—for those listeners, I don't think we've covered a lot of your work in that space—a dinner party with you and Perrow talking about risk would be a fascinating conversation to be a fly on the wall for.

Drew: I think I agree 100% with Perrow's stance that the risks of nuclear power in terms of likelihood are drastically underestimated by the published risk assessments. I think he's absolutely right that at the time he published this, nuclear accidents were just far more common than anyone was willing to admit. The likelihood of nuclear accidents was far higher than any of the risk assessments said.

Where I think he is empirically wrong and has just been proven to be so by history since, is that the size of the accidents is much lower than the critics of nuclear power like to suggest. Nuclear accidents are actually a relatively common, not particularly harmful event. Whereas the people who want to defend it want to say it's not common, the people who want to attack it want to say it's catastrophic. And it's neither catastrophic nor rare.

David: Over time, we get the benefit of a little bit more time since he was authoring this, but I think he's doubling down on it. It isn't necessarily the greatest thing for a science person to do when there's new information available.

Let's talk about these chapters, Drew. There was an interesting thing as I read through these chapters in the technologies that he chose and the time that he published it.

The next chapter after the intro is titled Normal Accident at Three Mile Island. He goes on to talk about the TMI accident. He has this chapter and, if that's not enough, a chapter titled Nuclear Power as a High-Risk System. The idea is why we have not had more Three Mile Islands, but we will soon. Two years later, we did have Chernobyl, which was the largest nuclear disaster that we've had.

I suppose it's all debatable. Some people think that there were 4000 related deaths. But then he went on to talk about complexity, coupling, and catastrophe. Even though he said he was going to talk about these different industries, he has these three chapters where he just really tried to double down on this nuclear situation, and then I'll run through and get your thoughts.

He talked about petrochemical plants. In December of 1984, we had Bhopal, so 4000–16,000 deaths. He talked about aircraft and air traffic control. The next year in 1985, Japan Airlines 123 happened. I think that was the largest incident in terms of the fatality count in civil aviation with 520 people killed. He talked about marine accidents; Exxon Valdez came a few years later. He talked about earthbound systems: dams, earthquakes, mines, lakes.

And then, these what he called exotics—space, weapons, DNA. Two years later, we had Challenger. If you pick all the risky technologies, then you're bound to be a little bit right over the next couple of years of things happening, but he was eerily predictive of these big events in the next couple of years.

Drew: Nick Pidgeon published an article in 2011, which was a retrospective of Normal Accidents. He basically starts off saying, in publishing much hinges on timing, so it was with Charles Perrow's influential book, Normal Accidents. Its publication in 1984 was followed by a string of major technological disasters, and he goes on to list the ones that you just listed, as each cried out for the detailed analysis that Perrow supplied. I think pretty much all of those did get a really in-depth Perrow-style treatment, so maybe he in turn almost started a trend of this deep sociological analysis of individual accidents.

David: And I think then, by the time we got to Challenger, Diane Vaughan's work and some of the other work, I think it did spawn a new level of detail. Maybe we can put all of that back to the foresight of Cora Marrett who was appointed to this commission and said, we need a social science perspective on this and drew that into the Three Mile Island incident.

If you talk to Professor David Woods, he would say that Three Mile Island also spurred the whole resilience engineering to come in the 1983 Conference, which got Reason, Hollnagel, Woods, and all of them together to start that field that became Resilience Engineering. And also the funding that came along with it with the book in 1994, the book that came out, Behind Human Error, which was funded based on that trend at the time.

Drew, all those chapters are actually really good reads because they're just littered with disasters and Perrow's analysis of them. Some of that analysis was a little bit like Professor Andrew Hopkins' writing in terms of this retrospective sociological perspective where you can just write the disaster review in the way that you want it to sound based on the theory that you're trying to put forward. They are very easy to read and interesting reads in each of those chapters.

Drew: There are a number of accidents I'd love to take back to Perrow and get him to comment on. For example, he talks about dam disasters being really simple, technological failures. I'd love to get him to have a look at the Brisbane floods and the [...] or some of the Chinese dam failures. I don't think that's a criticism of Perrow. I just think his analysis could actually extend to a broader range of systems even than he seems to think.

David: Let's move on. In chapter nine, he talks about living with these high-risk systems. He basically says, by the time you get to page number 300 and chapter nine, his view is that readers should have one question in the back of their mind. For me, it was at the front of my mind. His question is, what is to be done?

Perrow claims to have a modest but realistic proposal. He also claimed that it's not likely to be followed, because he's proposing three categories of systems or three categories of high-risk technologies. He's saying there are some risky technologies. I think he even called them hopeless technologies that we should abandon. He only lumped two in there. He said, nuclear power and nuclear weapons, we should abandon those technologies. The risks far outweigh the rewards.

He said there are systems which we need, which we could make less risky with considerable effort. He considered marine to be a system that requires considerable effort. Or where the benefits are so great that we should in fact take some risks, but not as many as we are now taking. For example, he gave DNA research.

Then he said there are some systems that could be further improved with much more modest efforts, and we should take those. He lumped petrochemical plants and airlines into the systems that we could actually just do a little bit in and make them better. What are your thoughts on his categorization system?

Drew: Of course, I'm going to disagree with his categorization of particular industries. What I think is really fascinating, particularly given when this was written, is what he actually means by making the industries safer, because you'd think when people use language like that, they're talking about things like greater regulation, better risk assessment, stricter regimes, but he's not.

His idea for making systems safer is to do what you can to decouple them and to do what you can to make them simpler, both in the technology and in the organization. His ideas are limited a little bit just by his understanding of what future technology was going to look like.

For example, his idea of decoupling the aviation industry is scary. It's like, leave it up to individual pilots to work out when they get to land at the airport. Actually, those are the sorts of things that people have talked about since in terms of using things like flocking behavior and self-regulation to manage flight paths.

These are not crazy ideas, but they do rely on a real deep trust in the technology that I think people might be a little bit more skeptical of than he was. He's not a 100% technological skeptic. He seems to think that a lot of problems can be solved by self organizing systems with the right technology.

David: We haven't talked on the podcast yet about polycentric governance, systems of commons, and things like that, but this idea that if you have people with the same goal, then you're better off letting them self-organize than you are about trying to control.

But there are a lot of assumptions with that process about information, about capability, about goal, hierarchy. I think it's a discussion for another day. I might be able to check in with some of Elinor Ostrom's work around that.

Drew: What I think is really interesting is that the people who came after Perrow and criticized him were the HRO people who were suggesting something which grew into resilience as a solution for Perrow’s talking about the inevitability of accidents.

Whereas, in fact, where Perrow thought that there was opportunity for safety improvement, he was pretty much talking about that same thing in improving safety, not by adding extra layers of protection but by working out ways to de-risk the entire system by building in more time, more resource, more capacity for responding when something goes wrong.

David: Perrow would detail three reasons why he feels that his recommendations will be seen as wrong. He said his recommendations must be judged wrong if the science of risk assessment as it was practiced at the time is correct. Risk assessment theory at the time suggested that what Perrow worried about the most, which was nuclear power and nuclear weapons, hadn't done any real unintended harm to people. Although Perrow did plead that the risk assessment science deserved far greater scrutiny than it was getting, he said, if you believe the risk assessors, then you can't believe my theory and my ideas.

His second point—I'll do the three and I'll get your thoughts—was his recommendations could also be wrong if it's shown that they're contrary to public opinion and values. If he thinks that a technology is not necessary and society believes it is, then maybe that changes the weighing of how much risk people would be prepared to take. At the same time, he had a swipe at cognitive psychology which suggested that the general public aren't equipped to make good decisions about complex matters.

His third objection there that he cited is more basic to the theory of his book. He said that you could object by saying that there is a way to run these systems safely, that it simply requires more authoritarian, more rigidly-disciplined error-free organizations. These are his three views. The HRO theory is the fourth, which says, no, no, no, there is a way of running these systems, but it's not in that third point, the way you describe. It's not bad for a theorist to actually say, this is what I think people are going to criticize.

Drew: I really liked the bravery in putting those things out. If he had stuck with them, I would have even more respect for Perrow, because step one is double barreled. He says if the risk assessors are right and history proves that this is not as dangerous as I think.

That's two separate claims. One of them, the risk assessors, was not right. Future research on risk assessment showed that the risk assessments were just as bad as Perrow was claiming, but future experience with nuclear accidents said that nuclear accidents didn't happen nearly as often as he was claiming they would, or turn out as bad as he was claiming they would either.

Perrow's response to that was then to write, oh, people are underplaying the nuclear accidents. He's basically denying the consensus on how many fatalities there were out of Fukushima and Chernobyl.

He's somewhere between saying that casualty estimation is a social construct and claiming that there was a massive cover up. Whereas, if he was more honest, he'd just say, nuclear accidents didn't happen at the rate I was expecting. New generations of reactors did turn out to be safer, even though I was saying that that wasn't possible. As plants age, no, they didn't start experiencing as many disasters as I thought they would, given that I was looking at an immature technology. His first point is like half right, half wrong.

David: I think you're right. He did say in the edition that I've got that in the 15 years since he published, the performance of things like new generation aircraft, and like you said, things have changed a bit.

I want to ask one question because at the lab, Ben Hutchinson has been doing some work on fantasy planning in safety. There's a whole section in the afterword of the version I've got on fantasy plans, Clarke's work, and things like that. When are we going to be ready to get Ben on the podcast and talk about fantasy plans?

Drew: Ben does have some published work. He's got a couple of works that are in the publication stage. I think we want to wait for the next ones to get through peer review before that. It's interesting that Lee Clarke gets a credit even in the 1984 version. He's one of the little post grads who were running around doing work for Perrow. It's interesting how this generational stuff happens.

David: For anyone not familiar with these fantasy plans, the example mainly cited in the book is the Exxon Valdez. The plans around the capability of the government and the industry to clean up an oil spill were just fanciful. They'd never done it. They never even had access to it.

If you read the plan of what they proposed to do, you would never have even thought it as plausible on what they said that they were going to be capable of doing. I'm looking forward to discussing that with Ben at some point.

Drew, is there anything else you want to talk about before we jump into practical takeaways?

Drew: No. I think the takeaways are a good point to look at how much we can talk about today from something that was first published so long ago.

David: I don't quite know how we want to tackle this, but I think this idea is that there are no simple fixes. In complex, high-risk technologies, what Perrow proposes is more technology is not the answer.

In fact, he goes on to talk about, actually, there's some work at the time that suggests that adding more technology is at best risk-neutral, which is probably against our thinking around the hierarchy of controls which is around engineered solutions, but his view was that by continually adding more technology in the belief that we're adding more layers of defense, we're in fact adding exponentially more failure modes into the system.

This idea around technology, Drew, your thoughts on practically where do we go with technology and engineering controls?

Drew: I think the modern takeaway of what he said is that there is a direct trade-off between adding safety through adding in more controls and decreasing safety by adding in complexity. In any industry and in any situation, that trade-off is going to have a sweet spot.

We can look at that with something like driving a car. In driving a car, there's obviously a lot of room for increased safety through increased automation of tasks that are very prone to human error, but there is a trade-off that humans’ understanding of their own cars is decreasing. The system itself is becoming more complicated as automated cars interact with automated cars.

The point there isn't that technology is not a solution, but that technology's got these two edges to it. It adds safety at the same time as it adds complexity, which takes away from safety.

David: Technology is both potentially a risk control and a hazard itself in that simple language. The second point then beyond technology is that the answer (in Perrow's view) is unlikely to be taking the operator out of the system. He talks about this hierarchy of designers to operators: taking the operator out of the system and putting all your faith and all the power in the designers to design a system where the operator isn't required.

He talked about early manned space flight and the fact that the operators are really just there with a finger on an abort button, and that's about it. That was even after the chimpanzees had had a few gos of doing suborbital flights. Taking the operator out of the system is not necessarily an answer. Maybe even beyond that, dumbing down the role of the operator may not necessarily be the answer either.

Drew: I think in modern terms, we'd say that human operators are a really important source of capacity within your system. That can be because of their ability to do things that you don't expect, which we might think of as risky, but when things go wrong that we don't expect, we need the operators there. He said even in really simple accidents, like the King's Cross fire, where reducing staffing on trains means that when you have a train fire, there are just fewer people around to help out and solve problems.

David: Also, fewer fatalities as well. These are all trade-offs, aren’t they?

Drew: I think Perrow's point is when you've got a system that is presenting danger to lots and lots of people, then the operators are only just a few extra bodies in the line, but a huge amount of extra capacity to stop when something's going wrong.

David: Great point. Then this idea, this practical takeaway about being wary of power, hierarchy, and implications for safety. I do quite like that quote about it. Whether you agree or disagree, but the idea that power is an issue in safety may be more important than risk.

Perrow cautions that authoritarian regimes are not the answer, that operators need support, enablement, capability, freedom of action, some of those early indications of that. We need to understand how power is at play, who gets to decide what, and how our organizations function.

Drew: I have really mixed feelings about Perrow's arguments about technological power and the power of expertise. But the one bit that I think I'm very happy to take as a takeaway is we don't think nearly enough about the role of power when we do things like risk assessments, approvals, and regulation. We claim that things are safe for other people even though we've decided the risks, we've decided its acceptability, and other people are the ones actually facing that risk.

David: I think as Sidney Dekker would say, who gets to decide? Culture work is not so much about who gets to decide about the risk, but I think that's an important dilemma for organizations to think about just in a normal day-to-day of operations that management gets to accept the risk for a risk that workers work with.

Drew, the last point I had here and I'm interested more from you, maybe the answer here lies in simplicity, like you said, about how Perrow suggested that systems could be safer. Maybe we need to find ways to make things less complex. There's a quote that I really liked. It says that, "Any fool can make a system larger and more complex, but it takes a genius to make something smaller and simpler."

David Woods would say in relation to the space shuttle program after the Columbia incident, with all of the way that the shuttle construction program, the operations program was run, the contracting, and everything that was going on, he said, “Why would you want to run into a wall of complexity? Because it hurts when you do that.”

Haddon-Cave, after the Nimrod inquiry, one of his four principles in his LIPS conclusion was that from a Ministry of Defence point of view, simplicity was the answer for why they would have built so much complexity into the way that they run that organization. In everything that we've done in safety of work and decluttering, simplicity has been core to a lot of that thinking without us expressly saying it.

Drew: David, I think I should give a shout out here to Dr. Ben Seligman out at UQ. I don't know if he's listening. He's got an idea that we shouldn't be even assessing risk at all in the current terms. What we really should be doing when we do assessments of systems is assessment of complexity, because that's really what drives safety risk. As the complexity of the system increases, the safety risk increases. We don't know exactly why or where, but that's the whole point.

We should be creating incentives for people to reduce the complexity of their systems as much as possible. Put downward pressure on complexity, which otherwise is just constantly going up. Perrow makes the argument that safety puts upward pressure on complexity instead of downward pressure.

David: I think from my perspective, what comes with complexity is uncertainty. I'm a big believer that uncertainty is what we should be worried about as opposed to risk, because we can manage high-risk situations really well if we understand the nature of the hazard and how the hazard can manifest into the risk.

People do live-line, high-voltage electrical transmission work every day. Very, very safely in the presence of more than 100,000 volts of live electricity. Maybe it's around complexity and the uncertainty that complexity creates. As Perrow would say, the ways that systems interact in ways that we don't understand. I think maybe these ideas of complexity, uncertainty are more important in safety and safety science than absolute risk.

All right, the question that we asked this week for episode number 100 was, can major accidents be prevented?

Drew: You want a short answer to that question, David?

David: Yeah, why not?

Drew: We've had decades of debate ever since about it. I think the obvious answer is not entirely. It's an ongoing project just to understand why.

David: Why can't we?

Drew: Yeah. The moment when we fully understand why we can't prevent all accidents, we'll be able to prevent them. Until we have that perfect understanding of what creates accidents, we're going to keep arguing every time the accidents happen about whether we can prevent the next one. We'll be wrong every time. We can only hope to be a little bit less wrong with each generation.

David: I think so. The sheer nature of complexity means that the complexity scientists would have a view that you can never fully understand a complex system. At least, that gives us lots of reason to do another 100 episodes, Drew. Maybe we should target 200 episodes by our five-year anniversary. If we punch them out at once a fortnight for the next two years, around this time in 2024, we should be doing episode 200. What do you reckon?

Drew: It sounds like a plan.

David: All right, that's it for this week. We hope you found this episode thought-provoking and ultimately useful in shaping the safety of work in your own organization. Send any comments, questions, or ideas for the next 100 episodes to feedback@safetyofwork.com.