The Safety of Work

Ep.85 Why does safety get harder as systems get safer?

Episode Summary

In this week’s episode, we tackle an interesting conundrum in safety through a paper written by René Amalberti. The idea he poses is that aiming for zero errors in the workplace should not be the goal - in fact, some errors should be encouraged to ensure learning. The author also challenges the idea of continuously improving the safety of systems - stating that this could actually become detrimental to the overall safety of a workplace.

Episode Notes

Find out our thoughts on this paper and our key takeaways for the ever-changing world of workplace safety. 

 

Topics:

 

Quotes:

“Systems are good - but they are bad because humans make mistakes” - Dr. Drew Rae

“He doesn’t believe that zero is the optimal number of human errors” - Dr. Drew Rae

“You can’t look at mistakes in isolation of the context”  - Dr. Drew Rae

“The context and the system drive the behavior.” - Dr. David Provan

“It’s part of the human condition to accept mistakes. It is actually an important part of the way we learn and develop our understanding of things.” - Dr. David Provan

 

 

Resources:

Griffith University Safety Science Innovation Lab

The Safety of Work Podcast

The Safety of Work LinkedIn

Feedback@safetyofwork.com

The Paradoxes of Almost Totally Safe Transportation Systems by R. Amalberti

Risk Management in a Dynamic Society: A Modelling Problem - Jens Rasmussen

The ETTO Principle: Efficiency-Thoroughness Trade-Off: Why Things That Go Right Sometimes Go Wrong - Book by Erik Hollnagel

Ep.81 How does simulation training develop Safety II capabilities?

Navigating Safety: Necessary Compromises and Trade-Offs - Theory and Practice - Book by R. Amalberti

Episode Transcription

David: You're listening to The Safety of Work podcast episode 85. Today we're asking the question, why does safety get harder as systems get safer? Let's get started.

Hi, everybody. My name is David Provan. I'm here with Drew Rae, and we're from the Safety Science Innovation Lab at Griffith University in Australia. Welcome to The Safety of Work podcast. In each episode, we ask an important question in relation to safety of work or the work of safety, and we examine the evidence surrounding it. 

In the last episode (episode 84), we discussed a paper by Daniel Katz, one of the foundational papers that shapes modern safety thinking, even though at the time it was very much a social psychology of organizations paper. From the response to that episode, it sounds like quite a few of our listeners are interested in hearing us talk about some of the older papers that we consider to be classics, required reading, or papers that definitely shaped ideas in contemporary safety science.

Today, we have another one for you. It was suggested by Tom Lorenson on LinkedIn. Drew, how about we jump straight into the paper and then we can talk more about the ideas as we go?

Drew: Sure, David. Let's do it. The paper is called The Paradoxes of Almost Totally Safe Transportation Systems. It was published in the journal Safety Science in 2001. The author was Professor René Amalberti. David, you've been to some of these resilience conferences. Have you met Amalberti?

David: No. In [...] in 2015, he presented by video conference, and he didn't travel to Kalmar in Sweden a couple of years afterwards. I don't think I've ever been able to be in the same room as him.

Drew: I've never met Amalberti. He's a bit of a pillar of the safety research community, but he's not as big a public speaker or self-promoter as some of the resilience people, which I don't mean in either a pejorative or positive sense. It's just that he tends to write more than he gives public talks. He's associated with over 300 publications, going right back to the mid-1980s, around the tail end of Reason's career.

Amalberti is still publishing today. He published seven papers last year, seven the year before. If you work in safety and you haven't heard of Amalberti, it's probably for a couple of reasons. The first one is that he writes mainly in French. This is something that I sort of realized later in my career, David. There's a lot of safety work, particularly in that human factors psychology end of things, which is only in French. Then we get the English version of it when someone popularizes it 10 or 20 years later. Have you found that too? Do you speak and read French?

David: No, I absolutely don't speak or read French, but we know that work-as-imagined and work-as-done are borrowed from the French ergonomics tradition. We can see in this paper, as we talk through it, some of the ideas in Safety Differently, Safety-I, and Safety-II—maybe the origins of those are in some of Amalberti's work as well.

I think you're right, Drew. I think that's the way it happens. Being someone who only speaks one language, I don't get the opportunity to explore too much of the safety science until it becomes available in English.

Drew: The other thing about Amalberti is—I actually don't know if this is true in French, but in English—there's no single big idea associated with Amalberti. Unlike people like Dekker, Reason, Leveson, and Hollnagel, there's no single thing that you'd say, oh, Amalberti did this. He created a few of his own models about the way safety works, but none of them have caught on in that really catchy way that safety ideas sometimes do.

David: I think the one that he does get credited for by some of those other theorists is the idea that adding more rules is not going to make your system safer. I think, definitely, Sidney Dekker and others credit Amalberti with the idea that doing more and more safety is not going to make your system safer beyond a point.

Drew: I think that's interesting. While he doesn't have a single idea that is very popularized, everyone who does research or other writing in safety always cites Amalberti. If you want to get yourself a reading list in safety, just pick up the list of people who have cited this particular paper. That list is like a who's who catalogue of safety research—everyone has, at some point or other, cited Amalberti's work. You'll see in this paper, which was published (remember) in 2001—10 years before Hollnagel was even writing Safety-I or Safety-II, or Dekker was writing Safety Differently—that in Amalberti's work are the seeds of a lot of stuff that came later.

David: Drew, this is less research than a general-purpose safety model—it's a theory-building type of paper, Amalberti's description of the world as he sees it, particularly transportation systems. Whether it was just at this point in time or whether it was a thread through most of his career, he has written a lot about rail systems, about aviation, and a range of other transport-related industries, more so than other industries.

Is that sort of typical, Drew, that researchers tend to be associated with researching and thinking in certain industries? I suppose it's definitely true with healthcare.

Drew: It often is. I have to admit, I was trying to work out what Amalberti's original industry was, because I've got a suspicion he's actually a medical doctor, but I haven't been able to find confirmation of that. You look at this paper and he's clearly talking about industries like aerospace, but he's also done some of that European work talking about deep-sea fishing. Then much more recently, he's been very into healthcare resilience. So at any given point in time, you could try to say this is Amalberti's industry, but if you look over his entire career, he's had his hand in everything.

David: Drew, do you want to kick us off with the principles that Amalberti lays out, that are the foundation of his ideas about safety?

Drew: Sure. I think this episode is going to involve a lot of lists of three or four principles, just because that's the way Amalberti has written the paper. We'll just go through section by section and lay out the things that he says in each section. In the first section, the principles that Amalberti lays out are the way most industries do safety. He's got three things that he says. Firstly, systems are designed with theoretically high levels of safety performance.

They start off with the design saying it's safe, but then when we have technical failings—in design or maintenance—or human failings, it brings the system down from that optimal level of safety, just like noise interrupting a good signal. This is almost the pure system safety point of view: systems are good, and they're bad because humans make mistakes. The goal of safety is to engineer out those human mistakes.

The second principle is that the safety of physical things—the things that are purely down to design—is increasing over time because technology is getting better. Over time, even though we care equally about technical and human failings, safety priority naturally gets steered towards reducing human error, because that's the main bit that's left.

The third thing is that reporting is fundamental to improving safety. Safety is basically a cycle of working out what's going wrong, finding where it's coming from, and fixing it, whether that's coming from problems with the design, or problems with human error, or organizational problems. 

He says that he's not critical of this in the way that some other authors are. He says it's self-evident; it makes sense, this is the way we do things. The trouble is that we hit diminishing returns. As systems become safer and safer, each of these things starts to lose its relevance and effectiveness. He's got a really specific number. David, we'll talk a little bit about Amalberti's use of numbers, which I'm very skeptical of. He claims that there's this asymptote, this frontier that can't be crossed. He says it's somewhere around 5×10⁻⁷, so 5 in 10 million. Then as we get close to that frontier, our safety activities start to have devious effects.

David: I don't know where that likelihood comes from. The closest I can get is that maybe, at the time, it was the target mitigated event likelihood for the aviation sector or something like that, and he was able to use it as a threshold for what we'll talk about in terms of ultra safe systems. I can only assume it came out of aviation at the time.

Drew: It does sound very aviation-ish. The general idea is that you've got this fundamental limit that is set by technology. It varies a little bit across industries depending on how you count it, but generally he thinks that's where the type of technology we had at the turn of the millennium is sitting. There's a hard limit on what that technology is capable of.

David: Drew, what Amalberti is saying here is that we design our systems to be safe, but there are always problems with those systems—either technical failings, latent factors, or human errors. We keep trying to engineer even better systems, and we focus a lot of priority on trying to stop people making mistakes in their role in the loop. We drive lots of reporting to find these errors, find these issues, and detect and repair.

As David Woods would say, we have these really effective detect and repair mechanisms in the organization. That's how we're managing the safety of our systems over time. That's how we've been managing them for 100 years.

Drew: I'll just give a quick preview of where this is heading, because Amalberti has a really clear reason why he doesn't think it's possible to squeeze too close to that limit. It's that he doesn't believe zero is the optimal number of human errors. It's not that he doesn't think zero is achievable—he thinks it's undesirable. Once we push past the optimum number of human errors, things actually start to get worse because we've got too few errors.

He's got quite an interesting rationale for that. His idea is that we get into this paradoxical space, where we're over optimizing, trying to get things safe.

David: I'm looking forward to that discussion, Drew; I've got a few ideas. In the title of this paper, The Paradoxes of Almost Totally Safe Transportation Systems, paradox is his way of describing two seemingly opposing views that shouldn't be able to coexist but that, in fact, do and need to coexist. Drew, we're going to use the word paradox probably a few times, but do you want to talk about the next section now, where we've got this division of our systems into three broad types?

Drew: Yup. We've got another list of three things now. This particular division is one that Amalberti has played with throughout his career, so it appears in quite different versions in different parts of his work. The broad idea is that at different levels of safety, you need different safety strategies. 

In this paper, he divides it into dangerous systems, regulated systems, and ultra safe systems. The dangerous systems are things where there's a really high risk of an accident—he says greater than one accident per thousand events—like bungee jumping or mountain climbing. If bungee jumping or mountain climbing were genuinely as dangerous as one per thousand, no one would be bungee jumping or mountain climbing. It's not that Amalberti doesn't understand this; I think he's just very lax. He pulls out these numbers without really doing the calculations.

David: I think it's referenced though, Drew, because I think I saw statistics for summit attempts on Mount Everest, and the fatality rate is something like one in nine. Maybe he did find that mountain climbing statistic somewhere—he might mean climbing above 5,000 meters or something like that.

Drew: I'm going to go out on a limb here and say that climbing Mount Everest is a kind of special case, that it's less about danger and more about semi-suicide.

David: Like bungee jumping without a rope. 

Drew: Yeah.

David: Got it.

Drew: The odds of jumping off a bridge without a controlled recreational system is—

David: Probably about one in nine. It's probably about the same.

Drew: Probably, yeah. Anyway, his idea is that there are some activities that are not professional activities. They're risk-taking that we do because of the risk, and safety in those activities is highly individual. So he's basically saying, let's just put those aside for the moment—there are some things that just fall way outside my model.

The second one he talks about is regulated systems. These are ones where he says the risk of an accident lies between one per thousand events and one per hundred thousand events. He puts driving, the chemical industries, and chartered flights in the list here. I did the math on all of these just because I was annoyed, and no, they are not between one per thousand and one per hundred thousand events, but the general idea is that these are not as safe as (say) regularly scheduled civil aviation.

These are regulated, but they're not ultra, ultra regulated. Safety is in the hands of professionals. We're using regulations and procedures. We're using error-resistant designs, we're putting in better training, trying to get people to make fewer mistakes and to follow the procedures. All of those things are working in this space. All of those things are successful at driving down errors.

But then he says we've got these ultra safe systems, with rates as low as one accident per million events—things like regularly scheduled civilian flights, railways, the nuclear industries. He's saying there's always a limit of one accident per (he says) million safety units. He says these tend to be aging systems—things that have been around since the 1960s and 1970s. They're very, very highly regulated; in fact, he says over-regulated. They're rigid and highly unadaptive. When we have an accident, it's usually got a combination of factors where the combination is a bit of a surprise and difficult to predict. If we could have predicted it in these industries, we would have prevented that accident.

He also says that a distinction for safety professionals between ultra safe systems and regulated systems is that the timeframes that safety managers have an effect over are different. With a regulated system, a safety manager can get results within a couple of years, so they can see and be rewarded for their own performance. In an ultra safe system, the safety manager is working for their successor. The results of their work are going to be seen after they've left the job, because we're looking at timeframes of eight or more years before we see the genuine effectiveness of any safety change.

As a result, whether a particular thing is safe tends to be more of a political than a scientific question. We can absolutely see that with particular airlines. Air France had a few accidents a few years ago, which from a statistical point of view was not significant at all, but it's obviously a big political thing when one airline gets news attention and headlines. Similarly with the Boeing 737 Max: one aircraft type, only a couple of events, but a big political issue about how they're managing safety.

David: Absolutely, Drew. I know in later versions of this paper, Amalberti tried to put different industries in different brackets, like putting construction and some other industries in the regulated systems, and commercial fishing in the dangerous category. He played with different industries. But I think, just like you said about the numbers, he uses a different definition of a safety incident in different industries—like a nuclear meltdown versus maybe a single fatality event in driving, and things like that.

I think we've got to get away from the detail and just understand, like you said, Drew, the broad idea here, which I think is quite useful: that systems with different levels of safety may warrant different safety management approaches.

Drew: For the rest of this paper, he's really talking specifically about the ones that he calls ultra safe. He's basically saying: airlines, the nuclear industry, railways—how does our need to manage safety change as we push up really, really close to that apparent limit of how safe things can get? Do we have diminishing returns, or even counterproductivity, when we push the same strategies too far? Should we continue?

David: Yeah, let's go. Amalberti is interested in these ultra safe systems, like you said. Drew, maybe we should start by talking about the problems in these ultra safe systems, because that helps lay the foundation for why we might need to rethink our safety management approach.

Amalberti says that the safer the system gets, the more that we run into problems. Do you want to start by describing what some of these problems are?

Drew: I have to admit that I was struggling a little bit with the language of this part of the paper, trying to understand exactly what he's getting at, so my apologies if, in my paraphrasing, I've misunderstood some of the points. The original bit fairly clearly says that we've got fairly loose definitions of human error and accidents. I think that's both well-known and well-understood.

Whether you agree or disagree with any particular definition, what counts as human error, and what exactly you define as an accident or incident, is really contentious in safety. He says that that doesn't matter as long as the system is not very safe, because when we're not very safe it's still fairly obvious what counts as an error, or what counts as an incident or an accident.

As we try to get that last drop of safety, then it really, really does matter. When we've got rid of the pilot who flew the plane into the ground, what then do we count as pilot error? Is it the pilot not noticing an alarm for 30 seconds?

We have to get more and more into things that are contentious about whether they count as errors or not, in order to be able to count errors. We need to get more and more into what he calls quasi-incidents rather than things that are incontestably bad, because we've got so few uncontestably bad things left that we can count, or manage, or investigate.

David: I think that aligns with my understanding. The signals that we're looking for become weaker and weaker. Like you said, when we're having an accident every week or every day—common-cause type accidents—it's clear as day what can be improved, and the organization's resources are adequately consumed by fixing all of those obvious problems. When those obvious problems aren't there, then we need to look much further if we're going to follow that same approach.

Drew: It's not just that those signals are weaker, but they're ambiguous. Reasonable people can disagree about whether something even is a human error, whether it even is a problem that needs to be fixed. 

Remember, he's already said that he thinks with these industries, we've got things almost as technically good as technology can get them given that they were originally designed back a few decades ago. What's left is the human error. 

Amalberti then takes a bit of a left turn and spends a chunk of the paper talking about what he considers to be the summary of developments in human error up to the year 2000. Some of these will be very familiar to our listeners, some of them might not be. What I find interesting is this was when I was doing my thesis. None of this stuff was in the domain that I was working in, which is system safety, safety of railways, aircraft. 

All of this stuff would have been quite new and unfamiliar to people working in the safety of aircraft and railways around that time. I've actually got some of the slides we used for talking about human error, and they look nothing like this. There was stuff that Amalberti was saying was commonplace among the people who had actually been studying human error and doing it rigorously.

He points out four main things. The first one, he says that mistakes are cognitively useful and cannot be totally eliminated. Basically, it's not just that we use mistakes to learn, but even once we're experts, we're still making mistakes and recovering from those mistakes. 

He says that it's pretty well-understood that the difference between an expert and an amateur isn't that the amateur makes mistakes and the expert doesn't. The expert just easily recognizes and corrects their mistakes as part of normal work, whereas the amateur's mistakes might go uncorrected, causing problems.

David: This is where we start to get to some of these really interesting points. He's saying that if we genuinely think we're trying to create error-free systems and not have people make mistakes, then there's a question mark raised here about the ability of people to develop expertise for unforeseen, unfamiliar, dynamic, emergent-type situations.

It's like if you rode a bike with training wheels for 15 years and then suddenly we put you at the top of a hill without your training wheels and let go. Your ability to navigate and manage that particular situation—which will inevitably come in all of these systems—is highly compromised, compared to if you had been able to explore mistakes of a lesser kind all the way through your career. That's how I (at least) interpret some of this idea.

Drew: Sometimes this gets talked about as error competence or even just personal resilience—the ability to constantly make and recover from mistakes. The ability to recover from your own mistakes is also, of course, similar to the ability to recover from technological failures, because the system isn't perfect either. It's going to throw errors at you even if you don't make the errors yourself.

The second thing is that, in order to understand the effect that mistakes have, we need to look at the whole system. You can't look at a mistake in isolation of the context, even to start calling it a mistake. 

He talks about two different theories. He talks about the Swiss cheese model from James Reason, which sees the system as a set of layers, where individual mistakes come from, and reveal, organizational failures. Then he talks about HRO theories, where we don't think about things in terms of layers; we think of them as organizational dynamics. He doesn't pick and choose between those theories. He just says these are two examples of the same point: to understand mistakes, you've got to have a model of the whole system. You can't just look at the human.

David: I think this is somewhere we start to see the insightfulness of Amalberti's work, because this was a similar point in time to when we were looking at skill-based, rule-based, and knowledge-based errors, Reason's culpability matrix around errors, and errors of choice. What Amalberti does here—although not really clearly, and without trying to wrap any model around it—is talk about what we'd now call one of the HOP principles, that the context and the system drive the behavior, or as Sidney Dekker would say, behavior is a symptom of trouble deeper within the system. This is 20 years ago, Drew. These were fairly new ideas at the time.

Drew: Or at least, they’re new in safety, because I think Amalberti is directly importing these from non-safety work on human error. 

The third thing he says is that individual mistakes—and whether those mistakes lead to accidents—come from the whole system moving towards the boundaries of performance. I don't think we've talked about it directly on the podcast before, but this is like Rasmussen's envelope of performance, where we've got pressure from production, pressure from economics, pressure from safety. Unless you've got accidents, you've got this constant pressure from other things moving you closer and closer to the boundary of safety.

David: Yeah, I think so, Drew. I'll definitely put a vote in to discuss that paper of Rasmussen's, Risk Management in a Dynamic Society: A Modelling Problem. The citation list for that paper would also read very much like a who's who of safety. I think the idea that Amalberti jumps on, and very clearly ties to Rasmussen's work, is that organizations have these extended periods without accidents.

They're (of course) maybe going to be optimizing around the non-safety performance outcomes of the business—like we said, resource utilization, cost, productivity, and all of those things—which will gradually reduce any safety margin that they had in the business. That's inevitable, Amalberti would say.

Drew: The final point he makes is that there aren't really error-producing mechanisms in the brain, which sounds fairly simple to say. But this is in direct contrast to a lot of the things that we were doing in safety at the time, and are still doing. 

I have to admit that I've literally got one of my slides that shows Norman's cycle of decision-making and the way we make errors in perception, errors in interpretation, errors in goal-formation. Those models are sometimes a useful way of thinking about what types of errors exist, but they're incredibly incorrect about how people's brains actually work to produce errors. They're like how a computer would produce errors, but they're not how humans produce errors. 

I have to admit, I don't fully understand, though, how the brain does this. The way Amalberti puts it is he says, and I'll just have to quote directly, "Fundamentally, an operator does not regulate the risk of error. He regulates a high performance objective at the lowest possible execution cost. In the human mind, error is a necessary component of this optimized performance result."

I think what he's saying is that our brains aren't thinking about errors. Our brains are trying to produce a result. We're trying to do the least amount of thinking that we need to do in order to produce that result. We don't have conscious control over which mode of thinking we're using or how our brain is optimizing what it's doing and not doing; we just do it. Then sometimes errors come out of that process, and they're just a natural thing that feeds back in, and we correct, speed up or slow down, pay more attention or less attention.

All of that works at around a stable low rate of errors, not a zero rate of errors. If we weren't making any errors, our brains wouldn't be able to notice the errors and speed up or slow down, or adjust, or pay more attention in order to cope with the errors.

David: I know it's been a long while since I studied… actually, I had already finished studying psychology when this paper was published. But I think what he's also saying here is that there's an adaptive necessity for humans to learn from experience, and part of that is making mistakes. We've got these sayings like, you can't put an old head on young shoulders—people learn some things for themselves; they don't learn them from someone else's mistakes.

I think what he's saying with this ecological role of error in performance is that it's part of the human condition to accept mistakes—not to seek them out, but to recognize that they're actually an important part of the way that we learn and develop our understanding of the world and our expertise.

Drew: He certainly says that if we're not making mistakes, then our brain is just going to pay less attention to things until we do make a mistake, which alerts us that we need to be paying more attention. Because there are always other pulls—we're trying to speed up the task, we're trying to get more production out, we're trying to do it better—errors are what tell us we're pushing too far in one of those directions.

David: While we're pointing out some of the ideas in this paper, I see what Amalberti is saying in this section as very similar to what Erik Hollnagel ended up writing a whole book about, which is the efficiency-thoroughness trade-off, or the ETTO principle. That squares, to me, with what Amalberti is talking about here: we are always balancing the performance objective—the thoroughness—against how efficiently we can achieve that minimum performance outcome.

Drew: We should be fairly clear. The implication of this is not—at least from Amalberti's point of view—that we should stop trying to reduce human error, because he says that up to a certain point, that is absolutely the way to make a system safer. You drive down the number of errors by making the system more error-proof: making it harder to make errors, easier to do the right thing, and providing education and training to the operators.

We can see that really clearly in the design of cars. We make cars safer by giving them better controls, better protection, and training drivers. I've got a whole rant in my notes here about his calculation of this point, which I'm just going to skip and just say: when reading Amalberti, stick to the ideas, ignore the numbers. Once you get to a certain point, you can't drive the errors down further just by those error controls.

He says you can get a bit better from automation and strictly enforced procedures. That's one way that airplanes are safer than cars: they've just got more automation built in and they've got much stricter controls about who can fly them, when they can fly them, and how they can fly them.

But even using automation and those procedures, that very quickly tails off again. We get a certain amount of improvement from that, but then we get into a zone where, by reducing errors, we're also reducing the operators' skills. That means we're losing situational awareness and error management. Safety is going up in one area because we're making fewer errors, but down in another area because we're not handling the errors that do come.

David: I think also, Drew, if we stay on the aviation example, there's an opportunity to provide error-producing situations in simulators—in regular simulation activities—to create opportunities for people to make errors, explore unfamiliar situations and emergent issues, and really maintain some of the skills and expertise that they may not be getting through routine flying operations.

Drew: That's something that Amalberti doesn't talk a lot about. It would be totally consistent with what he says, that we could have this split system where you make the core system safer through automation. But we maintain the operators' expertise through other methods, through simulations, through training, through things like that that expose them to error-making situations, and then we would have the best of both worlds. But Amalberti doesn't talk about that. I think he would still argue that we're still limited in how much we could get.

David: And we actually talked about using simulators in the maritime sector for developing resilient capability in an earlier episode. People might want to go back and have a listen to that one.

I just wanted to make a practical point here, since you mentioned the numbers. If you run a large operation that you consider to be reasonably high risk and you're only getting a handful of reports a year—maybe a couple of near misses or a couple of incidents—I think what Amalberti's saying here is you have two choices.

If you think that you're not really getting the insight into what's going on, then you can improve your reporting culture, and you can continue with these existing safety management activities of correcting problems that are identified. Or if you think that level of reliability is reflective of your actual performance, then Amalberti is saying that you really need to rethink some of the ways that you are managing safety.

I just remember organizations showing me fully green audit results, fully green critical risk control checks, fully green incident rates on a dashboard—just a sea of green. This is what this paper is really talking to. That's a business that is maybe safe to a point, but it's not going to get any safer by continuing to do what it's doing.

Drew: The only bit of that I disagree with, David, is that I don't think Amalberti is pushing strongly for the rethink-how-you-do-safety option. Amalberti isn't, and I don't think he's ever been, a strong evangelist for a new way of doing safety. He's more just carefully pointing out why we have the problems that we have and suggesting that if you want to try to solve them, these are the limits of what can be achieved through certain types of solutions.

I think it's natural that that approach would push you a bit towards the new view. There's a reason why Amalberti is cited by all the new view people as pointing out the problems with the old view, rather than as a leader of the new view of safety.

David: I think shortly after this paper, in 2003, Amalberti published a book, Navigating Safety: Necessary Compromises and Trade-Offs—or it might have been the pursuit of safety, or something like that. It's another really good book. I think in there he steps a little bit outside of these academic papers.

He's a little bit more direct. He really calls out European Air Traffic Management for adding 300 new rules into the aviation system every year while it's not getting any safer, and says we need to really rethink the continued compliance approach to how we manage safety. I think he carries this view that we maybe don't need to change, but we definitely need to add to, the approaches that we've got for managing safety, because we're not making our systems any safer.

Drew: That's fair enough. I think when we were doing the Safety-I and Safety-II episode, I criticized Hollnagel for when he moves on to the solution space—his solution just results in continuing to attack the old way of doing things. Navigating Safety is a weird book because it got published (I'm sure) without being finished. You look at the last couple of chapters and they're literally just chapter outlines.

I've got a lot of sympathy for that because you can be very thorough and scientific about criticizing the status quo. Then if you apply that same rigor to your own solutions, you end up thinking, I can't say that, I don't have proof. Amalberti is a real scholar. I think he would have struggled with going into full-on advocacy rather than critique mode.

David: Fair point. His models and ideas (I think) are really useful for getting us to reflect on the way that we are managing safety but, just like you said, not so much for putting out solutions. I agree with you that he's not necessarily advocating for a different view, although we had authors like Hollnagel, Dekker, and others come along and fill the void that had been created, at least in part, by Amalberti's critique of the current safety management approach.

Drew: Absolutely. Amalberti does have a conclusion section and it does have a few suggestions in it. Shall we go through them and discuss them?

David: Let's do that.

Drew: I guess the first thing to say is that Amalberti is a real techno-optimist. He's almost like the anti-Perrow. Perrow says that as we start to introduce certain new technologies, they're just inherently dangerous. Amalberti thinks that fundamental limit he's calculated, the one we seem to hit, is technology-dependent. As we move into the next generation of technology, and the next, each generation will have a new limit.

He thinks that we'll take a while to get there. He thinks that the difference between safety in the 70s and today is that we've got 70s technology, but we've had 30 years to optimize our performance with that technology. He says as we introduce 2000s technology, it'll take us 20–30 years to fully optimize safety against that technology, but it'll get us further.

Fundamentally, he's got this real railway-style view that ultimately the thing to do is automate everything reliably. Once you've got things automated and not dependent on humans, then the limits will fundamentally be safer. I don't really think that there's evidence for or against that; let's just say that is the position he takes.

He also says that the safety measures we should take, the ones that are most effective, depend on where we are relative to that limit. The further we are away from the limit, the more effective it is to try to drive down the rate of errors. The closer we get, the more we should avoid doubling down, because doubling down not only won't help, it'll actually start to be counterproductive.

Once the system gets really, really close, we should focus not on making it safer, but on maintaining it at its current level of safety. In particular, he's basically saying for aviation: don't try to make aviation any safer than it is. This is what we've got; the trick is to keep aviation this safe. That might require a different strategy from trying to make it safer.

David: I think those are really interesting points. You said maybe businesses need to measure different things depending on the safety of the system. So maybe it's perfectly acceptable to count errors and incidents when you're having them at a frequency where all of your resources are consumed by correcting them. As you get more optimized for safety, then you need to be measuring and paying attention to different things, rather than just blanket saying everyone should use these types of indicators rather than those types of indicators.

I like that Amalberti is nuanced—maybe that's why his models haven't quite caught on, because they've all been nuanced and they haven't been simple enough. It's like, here are the four pillars of resilience engineering from Erik, or here are the three principles of Safety Differently. He says, well, it's really context-dependent and approaches need to vary. Maybe that's why his models were never quite simple enough to be popularized like Swiss cheese.

Drew: There is one particular thing he says which I think is very, very true, and I wish more people would pay attention to it. The fundamental problem is that your number of countable events is going down as you get safer, so you have fewer things that are unambiguous errors or unambiguous incidents. He says that you can't fix that problem by going to less direct metrics, because as you move away from unambiguous errors to quasi-errors, quasi-incidents, and precursors that might or might not be precursors, you're losing the validity of those measures.

It's pointless anyway because you don't want to drive human error down further once you get to that point where your unambiguous things are so low. You want them just to be sitting at that level. Pretty much like every other person who has tried to fix the metrics problem, he's right about the problem. I think his solution is pointless. David, I'd be interested in your thoughts.

His solution (he says) is to focus on incident volume monitoring rather than on investigating each incident. He says keep track of the total number of incidents you're having. If that starts to swell, you've got an aging platform; that's when you need to start worrying, because it's a sign you've failed to keep it low.

Personally, I think that once you start to monitor incident volumes, you're just driving down reporting for no particularly good reason. I think that contradicts what we understand about the politics and organizational pressures of reporting. I think he's right about the problem, but I think it's not a problem that has an easy solution like that.

David: I think I can empathize with his point 20 years ago, and some of it is quite insightful. He's saying, okay, if we've only got one incident per million events, then there's no point really looking at that incident and changing the system for the other 999,999 times—how do we know if any of this is systematic when our rates are so low?

Then he goes, but keep an eye on it, because if you suddenly go from one incident every million to one incident every 10,000, you've actually fallen out of that ultra safe system bracket that I've defined. Maybe you should go back into investigating every incident again because you've got those volumes of them. 

I think he was being quite mathematical about saying, you're in this ultra safe system, no need to look at it, focus on protecting your operation. But if you fall outside that realm, then your strategy should change again.

Drew: I think that makes logical sense. I think this might be an interesting place to talk, just before we go into conclusions. Given that I don't think we can trust Amalberti's numbers, how do you as an organization know where you fall? How do you know whether you're an organization that can improve by trying to drive down errors, or whether you're an organization that is close enough to the limit that you should be employing other strategies?

David: Now that is a really great question that I'm not going to answer directly. But I was thinking, just a second ago when we talked about this need to maintain: when you get your system to a level of safety that you are satisfied with, whatever that is, then your task is to manage the system at that level. If we think about aviation and whatever the number is—is it 1×10⁻⁷, one plane crash for every million or 10 million takeoffs?—then the question is, how do you manage the system to that level of safety?

It's a relentless focus, not so much on the things that we talk about in Safety-II, but on a lot of what Amalberti says: automation and process, the rigid guidelines around pilots—frequency, recency, communication protocols—air traffic management requirements, and airworthiness of aircraft. For a system that is already so safe, it's almost like making sure tomorrow looks exactly like today looked.

So I guess I don't know how you know whether you're going to get better by reducing errors any further. I'm just thinking as I'm talking, Drew—maybe there is no distinct category. Maybe there's always value in trying to reduce errors, and always value in trying to protect the current state of operations.

Drew: I think the time it would be interesting to think about this strategically as a safety manager would be, let's say, an industry where fatalities were rare but distinct possibilities, and we'd had several years without any fatalities, but we still had a number of hand injuries happening every year. The temptation would be to say, okay, this is the next problem to solve to get further safety gains; we've got to divert attention towards dealing with these hand injuries.

Whereas the alternative would be to say, look, we're happy; in our industry, this level of safety means we have this small number of minor injuries. What we really should be devoting our safety attention to is keeping the stuff that we're doing well that is protecting us against those fatalities—maintaining the stability of those things and keeping watch that they don't start to decay from complacency. I think that'd be a really difficult choice as a safety manager, because there's always this pressure to improve. Claiming that we are staying steady is not often an easy thing to do.

David: I think this is where the theory, the science, the politics, and the practicalities intersect, which is why I think Amalberti uses this idea of paradox in this paper: the safer we get, maybe the less our safety processes can continue to help us get safer.

Maybe you're right, Drew. There's always going to be a desire for improvement. There's always going to be desire for error reduction. There's always going to be a desire to make what we've already got work, continue to work and work better. That's just the reality of organizations.

Drew: Should we move on to takeaway messages?

David: Yeah, let's do that. Do you want to kick us off?

Drew: Sure. I think the first big message is that Amalberti says the safety of any system or organization we're in has two limits. It's got a limit set by the underlying technology, and it's got a limit set by how much we're willing to genuinely optimize for safety and push towards that limit, rather than settling for something that is fairly far away from that theoretical optimum for safety.

If we genuinely want to do a reset for safety, we need to do more than just push close to the limit. We need to basically invest in new operational capability, new ways of doing the work that we're doing with better technology that is fundamentally safer.

David: Drew, this idea goes back even to some of Rasmussen's work, which is that we've got these competing operational goals, and Amalberti asks about our genuine willingness to—as you would say, Drew—put a thumb on the scale for safety at the expense of cost and production goals. Here, we're saying that to genuinely reset safety margins in our business, it's about investing in resources, reducing production targets, and some of those things that allow the margin to come back, rather than thinking that we can improve safety margins by focusing directly on improving safety margins.

Drew: I was trying to think through what this would mean for a couple of examples. I think as an entire industry, how much we're willing to genuinely prioritize safety is often a political decision that is beyond our paygrade in a single organization. Like it or not, society has decided that aviation is allowed to spend more on safety. As an individual airline or aircraft manufacturer, you're living in that environment where you are politically unable to take risks that could be taken in other industries.

I think it's much harder if you're a taxi company to say everyone else is charging this much. We're going to be the company that charges $2 extra a trip in order to be safer. There are some businesses that do it, that set themselves out as the quality alternative. We just unashamedly cost more than other companies, but in return, you get more safety. But it's a hard thing to do and often, that decision is made outside of the safety environment.

David: I think the idea of markets deciding is important, because going back to the taxi example, you could, as a company, say, we are going to increase our cost by 40%. We're going to have two drivers in the car the whole time so that one is spotting for the other. They swap every 30 minutes, and they make sure that both are alert and driving carefully. But then you'd very much find that people wouldn't necessarily pay that premium for the increased safety.

I think you're right, Drew. I think it is very much a political exercise, and this is probably where we get the really important role of regulators to set some of these margins for industries.

Drew: The other extreme I was thinking of, David, was when we have something like a machine shop where the machine tools are retrofitted with guards that can be removed, the organization is spending money on behavioral safety, and the regulator is pushing safety programs, when really the things that will make a step change in safety are buying better tools, having more automation and guards built into the tools, and stricter regulation about what you can import into Australia in terms of machine tools.

I think sometimes we sacrifice at that technology end and then try to make up for it at the human error end. Amalberti is saying we need to have our eyes open: if we want a step change in safety, we don't do it through the next safety campaign. We do it by deciding that, actually, we're going to move to a safer way of doing the work.

David: Yeah, and the unpopular comment to make here, in terms of industry and organizations, is that while organizations have lots of these messages about safety being the first priority, the capitalist nature of the environment that companies operate in, and the pressure they face to generate profits, are so great that it's very hard to see that trade-off for safety being made at the level Amalberti would probably say is required to make those companies ultra safe—unless their regulators, or something else, force those organizations to do it.

Drew: I think the thing that frustrates me sometimes is that investment in capital equipment is a genuine investment. It can be a really good business decision that improves productivity and improves safety, whereas the safety program is a sunk cost. Even if it works, it's preserving safety a little bit, for a little time, with a negative return on investment for the business, because you're spending money on the safety program with no increase in productivity.

I think sometimes this isn't just selfish trade-offs; it's actually bad decision-making. We need to take a step back and make better decisions about how we set up our businesses and how we think about what is costing us money. Often, we set these short-term incentives that cause the whole business to act against its own interests.

David: Yeah. Just one example, Drew, that I give: an organization did a one-day safety stand-down as a way of telling their organization that safety is so important, we're going to stop production for a whole day and we're just going to talk about safety. It's going to cost us, as a business, more than a million dollars.

My comment to the people involved in that was: surely you could find a better way to spend a million dollars. Surely you don't want to put that productivity pressure on your people for the rest of the week to catch up, because in all of those warehousing and manufacturing facilities, all the customer orders still need to go out the door. I just find it insane that organizations think that materially reduces risk in their business.

Drew: Second takeaway. The idea of making things safer by making errors less likely becomes less and less useful the safer we are. Obviously, each person and each business needs to have some idea about where they're currently sitting on that curve, but we need to be aware that it is a curve. As you get safer and safer, just working harder and harder at making things safe doesn't work.

The third one is that expertise is not about being free from error. It's about being competent at recognizing and correcting errors. That (I think) is one of the most fascinating and important takeaways. This isn't really an argument for or against zero harm; it is an argument against zero error.

We shouldn't be thinking of "no one makes a mistake, everyone does things properly all the time" even as a desirable goal, let alone a possible goal. We should think of our goal as preventing the accidents. To do that, it's okay and desirable that people are making mistakes and fixing those mistakes.

David: Drew, do you want to just talk about your fourth point here about less regulated industries?

Drew: Yes. I've got this one as a takeaway, but I honestly don't know whether it's good or bad myself. Amalberti leaves open the idea that in less regulated industries, the ones where accidents are more common, driving improved safety by trying to reduce human error is actually the logical and sensible way to be doing safety. I honestly don't know what to think about that. I think he's got a really strong argument that it becomes less and less useful, but the open question is: is it sometimes the best way to do things? I don't think Amalberti answers it; he just leaves it as a question.

David: I think that continues to come up in a lot of Sidney Dekker's writing, and in other current theorists' writing about error traps, failing safely, and these types of things. I think it relates to one of the HOP principles, that error is normal. I don't even know what we can say on this podcast, Drew—can we say a generally accepted idea? Maybe there's a generally accepted idea that error is going to happen in our operations.

The current theory says try not to build error traps that set people up for problems. Try to make sure that you have an idea of what possible problems could arise, and have a means—if those problems occur and those errors are made—for the system to fail safely, or for the person to fail safely, to make an error safely. But I don't think anyone has said too much beyond that about what to do.

Drew: Let me put my own position this way. I think it is definitely desirable that we improve our designs and our organizational conditions to make it easier for people to work successfully with fewer mistakes. I don't think we know as much as we think we know about what processes lead to those designs and those organizations.

David: I like that, Drew. The question that we asked this week was, why does safety get harder as systems get safer? Do you want to have a go?

Drew: Amalberti's answer is that for very safe systems, safety doesn't come from suppressing errors, but from keeping them controlled within an acceptable margin. Safety gets harder because our traditional approaches to safety don't tell us how to do that. They just tell us how to work in that space where we're reducing obvious errors.

David: Thanks, Drew. That's it for this week. We hope you found this episode thought-provoking and ultimately useful in shaping the safety of work in your own organization. Join us in the discussion on LinkedIn or send any comments, questions, or ideas for future episodes to feedback@safetyofwork.com.