The Safety of Work

Ep.86 Do we have adequate models of accident causation?

Episode Summary

In today’s episode, we discuss the paper “Risk Management in a Dynamic Society: A Modelling Problem”, published in a 1997 volume of Safety Science by Jens Rasmussen (1926‑2018). Rasmussen was a renowned professor and researcher at the Risø National Laboratory in Denmark, and one of the most influential thinkers in safety and major hazard prevention; the theories he puts forth in this article are still used in safety science today.

Episode Notes

We will discuss how other safety science researchers have designed theories that use Rasmussen’s concepts, the major takeaways from Rasmussen’s article, and how safety professionals can use these theories to analyze and improve systems in their own organizations today.

 


Quotes:

“That’s the forever challenge in safety, is people have great ideas, but what do you do with them?  Eventually, you’ve got to turn it into a method.” - Drew Rae

“These accidental events are shaped by the activity of people.  Safety, therefore, depends on the control of people’s work processes.” - David Provan

“There’s always going to be this natural migration of activity towards the boundaries of acceptable performance.” - David Provan

“This is like the most honest look at work I think I’ve seen in any safety paper.” - Drew Rae

“If you’re a safety professional, just how much time are you spending understanding all of these ins and outs and nuances of work, and people’s experience of work? …You actually need to find out from the insiders inside the system.” - David Provan

“‘You can’t just keep swatting at mosquitos, you actually have to drain the swamp.’ I think that’s the overarching conceptual framework that Rasmussen wanted us to have.” - David Provan

 

Resources:

Compute your Erdős Number

Jens Rasmussen’s 1997 Paper

David Woods LinkedIn

Sidney Dekker Website

Nancy Leveson of MIT

Black Line/Blue Line Model

The Safety of Work Podcast

The Safety of Work on LinkedIn

Feedback@safetyofwork.com

Episode Transcription

David: You're listening to The Safety of Work podcast episode 86. Today we're asking the question, do we have adequate models of accident causation? Let's get started.

Hi, everybody. My name is David Provan. I'm here with Drew Rae, and we're from the Safety Science Innovation Lab at Griffith University in Australia. Welcome to The Safety of Work podcast. In each episode, we ask an important question in relation to the safety of work, or the work of safety, and we examine the evidence surrounding it.

Drew, in the last couple of episodes, we've taken the liberty of talking about some foundational papers in safety. I think it was in our episode a couple of weeks ago on Amalberti's paper, The paradox of almost totally safe transportation systems, that I mentioned what I thought was quite a foundational paper by Jens Rasmussen in the late 1990s. I thought we could discuss that. Do you want to kick us off with some background into Rasmussen?

Drew: Sure, David. I know some of our listeners have complained that we're a little bit scripted and not very bantery in these episodes. So I feel obliged to point out for everyone that David's notes here just say, at this point, Drew will provide general background on Jens Rasmussen, and then nothing else.

David: The reason I said that, Drew, is that for such an esteemed safety scholar, and someone as important as Rasmussen, you should just be able to roll that off your tongue. But true to academic form, I did have to take a few minutes just to check a few points.

Drew: I'll tell you in a moment exactly what I was checking, David. My immediate thought as an academic was a thing called the Erdős number. Have you heard of the Erdős number before?

David: No. I'll learn along with our listeners.

Drew: Erdős was this mathematician who was just an extreme collaborator. There are lots and lots of papers that have Erdős as an author. Because he collaborated with so many people, everyone in academia can now work out how socially distant they are from Erdős based on co-authorships. It's like a mark of prestige to have a low Erdős number, how few steps there are between you and poor Erdős.

I found out that Jens Rasmussen is so influential that there's such a thing as a Rasmussen number, where everyone in safety science is only a few citations removed. Actually, not just citations, a few co-authorships removed from Rasmussen. What I was checking was, I know that David Woods co-authored directly with Rasmussen, and Sid Dekker was a student of David Woods and has co-authored with him.

I wanted to check: I've co-authored with Sid and you've co-authored with Sid. Does that mean that our Rasmussen number is three? Has Sid actually ever co-authored with Rasmussen? But then I realized, you and I have both co-authored with Woods, so we have a Rasmussen number of two.
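(A side note for anyone who wants to play along with the resource linked above: a collaboration number like an Erdős or Rasmussen number is just the shortest path between two authors in a co-authorship graph, which a breadth-first search will find. Below is a minimal Python sketch; the graph is an invented toy standing in for real co-authorship data, so the names and edges are illustrative only.)

```python
from collections import deque

# Invented toy co-authorship graph: an undirected adjacency list where an
# edge means "these two people have written a paper together".
coauthors = {
    "Rasmussen": ["Woods"],
    "Woods": ["Rasmussen", "Dekker", "Rae", "Provan"],
    "Dekker": ["Woods", "Rae", "Provan"],
    "Rae": ["Woods", "Dekker"],
    "Provan": ["Woods", "Dekker"],
}

def collaboration_distance(start, target):
    """Breadth-first search for the fewest co-authorship hops between two authors."""
    if start == target:
        return 0
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        author, hops = queue.popleft()
        for coauthor in coauthors.get(author, []):
            if coauthor == target:
                return hops + 1
            if coauthor not in seen:
                seen.add(coauthor)
                queue.append((coauthor, hops + 1))
    return None  # no chain of co-authors connects the two

print(collaboration_distance("Rae", "Rasmussen"))  # 2, via Woods
```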

David: Wow. That's exciting. So then, Drew, what's your Erdős number? Have you done that calculation?

Drew: It's very hard because you have to trace all of mathematics into computer science, and then through computer science until you get to me.

David: Okay.

Drew: Rasmussen is sometimes thought of as the granddaddy of safety science. They talk about Rasmussen's grandkids, the generation of people who were directly influenced by his work. So with Rasmussen numbers of two, you and I both qualify.

David: Very good. I think I'm familiar with the work, but, whether it's an age thing or an academic experience thing, I wasn't that involved when Rasmussen was central to these ideas. You just hear about the influence that he had as an author, and about this whole new view collection of authors being referred to as Rasmussenian in their thinking.

When I was reading the paper that we'll talk about today, some of the sentences and some of the paragraphs could be pasted into a human and organizational performance text in 2021 and seem to be very new ideas, and it's an almost 25-year-old paper that we're going to talk about.

Drew: When we were talking about Amalberti in the last episode—I think it was episode 85—we mentioned that there are a lot of ideas that originally come from Amalberti that have seeded their way into other work. Rasmussen just has a really heavy influence not just on new view authors, but on authors who you wouldn't really consider new view, who have, in fact, argued very heavily with the new view.

There's almost no one in modern safety science who hasn't taken onboard some of his key ideas. There are different ways of breaking up safety science into different fields and approaches, but you can't really mark out a Rasmussen field, because there are people in safety engineering who draw on his work, and there are people in what we think of as the new view.

There are people much more on the cognitive psychology and human factors side who follow his work. Pretty much the only people who haven't been influenced are people who take a very, very quantitative, almost epidemiological approach to injury causation.

David: Even in this paper, though, there's a whole section on probabilistic risk assessment when looking at different types of hazards in different types of systems. When I was reading through, I thought he had almost causal event tree diagrams that would be similar to something Leveson would publish in STAMP, and he had some leanings towards some quite specific, quantitative risk management approaches for certain systems. It did seem to connect a lot of different thinking into quite a clear narrative about how we need to think about safety in our organizations.

Drew: Yeah. He almost predates the split between system safety and occupational health and safety, at least in terms of thinking. Today, safety engineering and the social science side of safety hardly talk to each other at all. There are a few people who step across the boundary, like Leveson, but Rasmussen was already straddling right across both fields. You don't have many social scientists who talk about risk assessment or who are even familiar with how the techniques work. Rasmussen worked in the nuclear industry. He's very heavily steeped in that stuff.

David: I don't know what's in store for you or me, but the influence of Rasmussen was such that a couple of years ago, when he passed away, there were special issues of Safety Science, a dedicated conference, and almost an outpouring of reflections from the who's who of safety science today. He definitely impacted the thinking of a lot of people.

This paper is one of my favorite papers, and listeners will probably hear why on the way through. It's easy to get your hands on and it's about 30 pages long. We won't talk about everything that's in it, but it's very much worth the read.

Drew: That's my dream, David. I don't want a retirement party or a funeral. I want a workshop dedicated to how my work has influenced other people, and to have zero people show up to it.

David: All right, one of us will get one. If we make a pact to do one for each other, at least one of us will get one.

Drew: Let's cover the very basics of the paper like we usually do. This is a single-author paper, from a time when single-author papers used to be a lot more common; safety science used to be filled with these almost essay-like theoretical papers. It's called Risk management in a dynamic society: a modelling problem, published in the journal Safety Science in 1997, although I have a suspicion that it might actually have been published a couple of years earlier somewhere else and has been around in a few versions. I've seen it cited earlier than 1997.

David: When I had a quick look on Google Scholar, it had about 3,500 citations, so a very well-referenced publication, cited about 150 times a year since it was published. That's very heavily cited. In this article, and you might get it from the title, Risk management in a dynamic society: a modelling problem, Rasmussen posed the question: in spite of all of our efforts to design safer systems, why do we still have these large-scale accidents?

His basic question, as he stated it in the paper, was: do we actually have adequate models of accident causation in the present dynamic society? At this time, whether it was the mid- or towards the late 90s, it feels like Rasmussen's thinking was trying to make sense of the emerging complexity science and systems thinking, the tension between normal accident theory and HRO theory, and the relationship between safety work processes and real-time risk.

He talks a lot about risk management and other safety work processes, and the dynamic real-time risk situation that people face, trying to work out whether the safety science models and approaches we had were relevant, or, maybe relevant isn't the right word, appropriate, given that other theoretical context.

Also, what I really like is the way that Rasmussen frames and describes problems. He offers frameworks for thinking and action. Even though it's a theoretical paper tackling quite a big question, there's still quite a lot of practical direction and thought in this article. I don't know if you got the same thing out of reading it, but it was grounded in someone who obviously understood how work happens in organizations.

Drew: Not just like normal work, but also how safety work happens. I get the real flavor when he's talking about, do we have adequate models, that he's read dozens of accident reports. He's thinking about how those accident reports describe the accidents, how they describe the causes, what methods people use, and what diagrams they draw.

There are a couple of key diagrams in the paper. One that I don't think is actually in there, but that I'm picturing in my own head, is of the world he's working in, where most people think of accidents not necessarily as a single line of events. It's not like a chain of dominoes, but they are thinking of accidents as this set of discrete causes that cause other causes, and those causes cause the next set of causes, and those cause the accident. You can draw the accident as a network graph of things causing other things that feed into the accident.

When people talk about a linear model, that's really what they mean. It's not a single line, but this causes that, which causes that, in all sorts of combinations. He's saying that's how people think of it, but that doesn't capture the dynamism of what's going on. Organizations aren't just made up of events causing other things; they're made up of movements, people, and pressures. You need to describe those things. You can't just describe them as the raw events.

David: Absolutely. We'll talk about these pressures, because at the heart of what we'll talk about in this paper today, and its modelling problem, are what he calls these constantly competing pressures. Today we would talk about goal conflicts, efficiency-thoroughness trade-offs, and drift into failure.

He gets into quite practical terms about how real organizations face these constant tensions and pressures, which means that our risk management models can't be static, and they also can't be linear, because of the way these different pressures interact with each other. He refers to this almost like the risk management system. This is before we'd probably think of enterprise risk management systems and the things we have today.

At the time, he was really just talking about the broad ecosystem of how risk attempts to be managed with high-risk technologies, and why, I think, he said in a dynamic society in the title rather than the organization, because he's really drawing in the entire socio-technical system of people outside as well as inside the organization: legislators, as well as the managers, planners, and operators inside, and also communities and governments.

He says that this risk management system and the organization is a system that's stressed by fast-paced technological change. There was a lot happening in the 90s, leaps and bounds in automation and software; an increasingly aggressive competitive environment, so lots of liberalization, deregulation, and market-based economies; and changing regulatory practices and public pressures, different models of regulation, and different regulatory structures and expectations. These different pressures were stretching and squeezing the entire risk management system around these technologies.

Drew: David, I probably should just throw in here, because we've got quite a broad cross-section of listeners, the environment that Rasmussen is working in, not as an academic but the practitioner environment. These are organizations, particularly things like nuclear power or aerospace, that are working with big safety-critical systems that involve formal risk management systems. Remember, this is at a time when not every organization had a risk management system. It's something that flowed out of the highly critical industries into other industries later.

All of those risk management practices were based on models of failure, where the technology features very, very heavily. Some of the procedures and human behaviors get added in. People tell themselves they're taking a systems approach because they've thought about the operator, not just about the tank, but you won't see a manager appear anywhere in any of those risk assessments. The idea that the person performing the routine, the person pushing the button, exists within an organization doesn't get taken into account. They either push the button or they don't push the button.

We forget about the fact that they have to come to work, have to go home that day, and have other things that they have to do. All we care about is: does the button work, and does the person pushing the button work? Which of them failed? That failure then feeds into the model that tells us whether the nuclear reactor is going to melt down or not.

Rasmussen is saying look, you've got to broaden out beyond the physical technology, beyond the software, beyond the procedure, because the organization matters for the risk, and probably matters more for the risks than just the physical technology does.

David: He went really far in his explanation, and I like how you put that. It's quite a straightforward read: he says, in summary, that there's no real point looking at individual risk management practices, for example something like a job safety analysis or a safety case, and almost no point looking at the person pushing the button or the component that fails in these complex systems, because modelling these systems requires what he calls functional abstraction, not structural decomposition.

So don't break it down and look at whether each individual component works. You've actually got to look at how the system is functioning, and at the pressures, constraints, and boundaries of that system. Like Sidney Dekker would say in Drift into Failure, we need to always be going up and out, not down and in. That's a direct (I suppose) translation of one of the ideas in his paper.

Drew: I'll pause you for a moment, because that's Sidney's translation of it. One of the fascinating things about Rasmussen is everyone takes that basic idea and translates it in different ways. If you look at Leveson's work—I think we're actually planning next week to have a look at one of Leveson's papers—Leveson sees functional abstraction just as the opposite direction to structural decomposition.

I believe that she is following Rasmussen's ideas. What she does is draw a diagram that extends the system beyond the technology to include a layer for the managers, a layer for the planners, or a layer for the legislators, and she builds in loops to show how all of those things are interacting and influencing the safety of the system. You can go up to that level, or you can go down to the level of whether the valve is working or not, within the same representation.

Dekker interprets it to say you need to go up because you can't go down. Once you start abstracting, you get to the real important stuff. Then you've got other people, for example with AcciMaps, who sit somewhere in between, who see the different levels of abstraction more as layers of looking at things. You can look at things at each layer and see the causes separately at that layer.

Yes, the layers connect, but it's helpful to look at each layer because different people have responsibility for each layer of the causation. They will all say that they are applying this idea of functional abstraction instead of structural decomposition that Rasmussen is talking about.

David: Thanks, Drew, well explained. We've talked about going back to the source, and there's always a layer on top. In fairness to Leveson, Jens drew some similar diagrams in this paper when modelling systems.

Drew: Yes, and I don't think any of those people are incorrect. They all took Rasmussen's idea and said, what does this mean? What do we do with this insight? But everyone has a slightly different way of taking that insight, which is the forever challenge in safety: people have great ideas, but what do you do with them? Eventually, you've got to turn it into a method. Once the idea has gone into a method, it's not quite the same idea anymore.

David: Yeah, it's not quite the same idea, but Jens has a better go than most at saying what he thinks should happen. He says, don't spend time focusing on action sequences and occasional deviations or human errors. You have to create a model of the behavior-shaping mechanisms in your system in terms of constraints, boundaries of acceptable performance, and subjective criteria for guiding adaptation to changing situations and circumstances.

He takes a fairly deliberate shot at any model of unsafe behaviors or linear accident models being useful for modelling risk. Instead, he points to a holistic systems approach, very similar to what we might understand in the HOP principle today that context drives behavior. He's saying you have to model the context inside your system so you know what behaviors that context is going to create. This paper spells a lot of that principle out well.

Finally, I actually didn't realize the words guiding adaptation were in this paper when we did the work on guided adaptability at the end of my PhD with David Woods. But I'm sure somewhere in the back of David Woods' mind was knowing that he'd read that somewhere and it was a useful term to use, so maybe that came directly from here. I don't know if using a word in a theory that comes directly out of Jens' paper puts you at a one instead of a two.

Drew: No. I think you actually have to directly co-author. Woods did directly co-author and worked with Rasmussen at the same level for a while, so he definitely gets a one.

David: Absolutely. The paper starts with an evolution of the theoretical approaches. He talks through what we've done in safety up until now, where we've now arrived—when I say now, the mid 90s—at this broad understanding of the design of our individual work systems, and how we understand the decisions within those systems.

Rasmussen argued that, when he looked around our risk management and safety models, he felt they were insufficient for looking at the overall risk management of systems, and suggested in the introduction of this paper that we actually need an entirely new conceptual framework.

Drew: Rasmussen doesn't say this directly, but he's really thinking about this in probabilistic terms constantly. His idea of what it means for something to cause something else is just so fundamentally different from lots of people in safety, even today. There are lots of people who have this idea that cause is deterministic. When Rasmussen talks about guiding and shaping factors, he's basically (I think) making a much clearer case than Hollnagel does that accidents always have a probability of happening.

The things that cause accidents are always constantly causing them. They're just causing them with very low probability. It doesn't make sense to talk about finding the causes and removing or controlling them, because they're always just there. What you can do, though, is change those probabilities. You can shape the likelihood of things happening.

The behavior that is going to lead to the accident is happening right now in your organization, but that behavior has a low probability of causing an accident. A different similar behavior might have a slightly higher probability. Maybe we can reduce the rate at which the most dangerous behavior is happening by shaping something that we do with management. We're not removing causes, or controlling causes, or eliminating causes. They're there all the time. We're just changing how much influence they have on each other.

That's the whole way Rasmussen sees the world: the causal field is filled with these unknown but very real probabilities of things happening. We can make those probabilities go up and down. We just can't draw a perfect map of how they're all connected to each other, and why would we bother, because we don't need to, as long as we can understand the factors that make them go up or down.

David: He also talks about things that are always present; it's a matter of when, not if. And there are some claims that would definitely surprise, initially counterintuitive but potentially logical afterwards, such as that more layers of protection and greater safety margins are potentially more dangerous than narrower safety margins and fewer risk controls. We'll talk about that in a moment.

There are some significantly different ideas in this paper for the time. In the next section of the paper, he talks about risk management generally. He lays out the problem space, and does that quite well. He says, look, to have an accident, you basically need to have a loss of control of a physical process. This is an accidental course of events.

No one plans to lose control. These accidental events are shaped by the activity of people, so safety therefore depends on the control of people's work processes. I don't think anyone in any branch of safety thinking would argue with that sequence of statements. I think it's quite logical: we lose control, we don't mean to, there are people involved, and safety needs to depend on the control of what people are doing inside the system.

Drew: I don't think anyone literally disagrees with that. I think the trouble is that everyone has their own immediately derived thoughts from it, and people disagree with each other's derived thoughts. That's why I think a lot of people assume that people like Hollnagel and Dekker don't think it's true: because they don't like to focus on it, because they don't like the implications that people draw from it.

Rasmussen very clearly believes that accidents are loss of control events. I've never really seen any of the new view authors contradict that. They just don't like the fact that people see accidents as loss of control events and then say that we need more control. The bit that new view people disagree with is the idea that more control actually reduces the likelihood of that loss of control.

David: That's a good point, Drew. Rasmussen said, look at what our current approach involves, following that sequence of statements we just talked about. To manage safety, we try to motivate our workers and operators. We train them, we guide them, we constrain their behavior with rules and equipment designed to increase the safety of their work performance. We're working directly on the task and the behavior of the individual. He says, at the time, which is again the early to mid 90s, that none of our models capture the shaping factors that are dynamically changing inside the risk management system.

Rasmussen quite clearly then argues that this command-and-control, top-down, prescriptive approach may be effective, and he was quite diplomatic here, in a very stable situation, where instructions and work tools can be based on quite clear task analysis. But he said that in the present society he didn't see this approach as adequate, and that we require a fundamentally different view of system modelling. Subsequent to this paper, we've got Safety-II, and Safety Differently, and [...].

I don't feel like anything that's been written about the theories we see as popular today has redefined accident causation the way Rasmussen attempted to in this paper. We did a three-part series on Safety-I versus Safety-II, and I feel like I got a clearer argument laid out in this paper, albeit before any of that language that we use so commonly today.

Drew: Yes. I think Rasmussen is a little bit more precise in how he uses some of these ideas. Hollnagel is drawing on those same ideas but being less precise about what he means by them, which I personally find quite frustrating and confusing. One of the challenges is that everyone in this space is really talking in metaphors, but they're not always honest about the fact that they're metaphors, right up until the moment you try to work out what the thing actually means rather than what it's a metaphor for.

You were at the meeting, I think, where we were having an argument about trying to use the figure in Rasmussen's paper as a basis for coming up with metrics. If it were actually a mathematical model of how the world works, it would directly give rise to metrics. But the more you persevere in trying to think about what exactly a boundary of performance is, the more you realize that actually, it's not a well-defined thing.

David: I still think it's a good idea. I'm still looking for volunteers to want to work on that. I think it's got legs.

Drew: Yes. I do think that this idea, that accidents can be better explained by abstracting away from the direct interactions between causes and looking instead at the shaping factors over those causes, is not something that I believe was expressed clearly in the literature before Rasmussen, and is not something that anyone has successfully replaced. This is literally how most people, if they really stop and think about it, understand accidents to happen. Just to be clear about what the underlying idea is:

The idea is that any event that happens proximate to a loss of control always has a certain chance of happening. It's probably happening often, but most of the time we recover from it to the point where it doesn't even become a notable event. Your chance of the accident happening is the chance of enough of those little events happening and of the recoveries not happening. That's what causes the accident.

It looks afterwards like each of those events caused the next. But the reality is that they were all just constantly in this dynamic process of happening and being undone, happening and being hidden, happening and being noticed and corrected. So that's not a helpful way of describing it. What's constant is the overarching things that are driving the probabilities up and down.

If we understand those overarching things, that gives a better explanation of why it was this organization where the accident happened, why it was at this time and this place. Under this model, it could have been a fluke, or it could have been that the probabilities were higher at this time and place than at other times and places. We can look for things that cause those probabilities to be higher: things to do with management, things to do with work pressure, things to do with safety management activities.

David: Or physical conditions or other factors. Maybe let's just test this in specific practical terms, with something that would have come up later. Say I'm working beside an excavation or a trench, completely unprotected. I've probably still only got about a one in 10,000 chance of falling in that trench, because just by controlling where I walk, I can probably walk up and down safely 9,999 times out of 10,000. If I then put a barricade around it, I might have moved an order of magnitude away.

We talked about [...] protection; I can very quickly get my risk, probabilistically, to a one in a million chance. But then what you're saying is, one day the side of that trench is soft ground, or the wind blows in a certain direction, or something else distracts me for whatever reason, and all of a sudden I've fallen in. After the accident, it seems so obvious to backtrack from that event, because that's what our accident models do. They just start with what's now known and make everything else look known as well.

Drew: Yeah, and if we look back with enough of a crystal ball, we can see that some of the times you were walking up and down the trench, the edge protection was in place. Some of the times it wasn't there, but you noticed and had time to put it back up, or someone else noticed in time to put it back up. Sometimes you were walking a little bit close, and someone said, hey, mate, just step away from there a little bit. Sometimes you were running and sometimes you weren't.

Each one of those things temporarily drove up your probability of falling into the trench and then went away again. At any one of those times, your risk might have jumped up to one in 100 or one in 50. But still, it was low and you got lucky. Now, more likely than not, at the time when you fell in, some of those factors that were driving the probability up were present. But that's not the question Rasmussen wants us to ask. It's not, why were these particular factors up at that particular time? It's, were they systematically up?

If you're working for an organization that is short on cash, short on management supervision, and not paying much attention to safety, then the times when you were running were much more frequent, the times you were close to the trench were much more frequent, and the times when there was someone to say, hey, mate, were much less frequent. The time to put the edge protection back up when it was missing wasn't there. If you want to explain the accident, we're better off looking at where those pressures were coming from than at which particular probabilities happened to be spiking at the particular time you fell in.
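(To put rough numbers on that: here is a tiny Monte Carlo sketch of the trench example in Python. Every rate in it is an illustrative guess taken loosely from the conversation above, not data. The point it demonstrates is Rasmussen's: the long-run frequency of risk "spikes", which is set by systematic organizational pressures, matters far more than whichever spike happened to be present on the day of the fall.)

```python
import random

random.seed(1)  # reproducible illustration

BASELINE_P = 1 / 1_000_000  # per-pass chance of falling with all controls working
SPIKE_MULTIPLIER = 10_000   # a transient spike takes the risk to about 1 in 100
PASSES = 1_000_000          # walks past the trench over the life of the job

def simulate(spike_rate):
    """Count falls, given how often transient risk spikes (running, missing
    edge protection, distraction) are present on a pass."""
    falls = 0
    for _ in range(PASSES):
        p = BASELINE_P
        if random.random() < spike_rate:  # a spike factor is present this pass
            p *= SPIKE_MULTIPLIER
        if random.random() < p:
            falls += 1
    return falls

# A well-resourced site versus one under cost and schedule pressure:
print("spikes rare (1 in 10,000 passes):", simulate(1 / 10_000))
print("spikes common (1 in 100 passes): ", simulate(1 / 100))
```

Under these made-up numbers, the pressured site produces on the order of fifty times more falls over the same exposure, even though the hazard and the baseline controls are identical; only the frequency of the spikes differs.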

David: The way to think about this, and then we'll move on, might be that there's an almost infinite number of risks and ways for accidents to occur in your organization: every step that gets taken beside every single trench, every single meter of driving. Trying to manage risk task by task, day by day, activity by activity is always going to be a futile attempt.

It's almost like the capacity conversation that we'd have today, which is: worry about building capacity in your organization. I think Rasmussen's were probably early versions of those conversations, looking at system enablers and constraints (I guess) when he talks about behavior-shaping factors.

Drew: Yes. We've got enough context to start talking about the big figure 3 in the paper, the one that's in [...] PowerPoints everywhere.

David: Let's do that. I think we've covered the context well. He's got this diamond-shaped diagram, and it'll be familiar to everyone. If I remember, which is a 50/50, we'll find out, I'll paste the diagram in the comments when we post this episode. It's a very popular picture in safety, at least in academic circles. This is the one that I think has huge potential for performance management, just to set the record straight as I said before, and I'll explain it to anyone who wants to reach out.

Human behavior is shaped by these objectives and constraints. We've got an objective we're trying to achieve and there are constraints around our ability to achieve that. There's always going to be this natural migration of activity towards the boundary of acceptable performance.

In doing my work, what is acceptable? What's an acceptable standard for my work? What Rasmussen has done is try to represent the mechanisms underlying this. He says you've always got this level of resources, which he calls an economic boundary: how much resource can you bring to bear on the work? Then you've got this workload issue: does the work that needs to be done, the objectives, match the resources that are available? What that will do is push the work towards the error margin, the margin towards safety.

Simply, if you've got too much demand and not enough supply in terms of resources, then you won't be able to maintain those safety margins, and you'll have an incident. At least that's the way that I interpret this diagram.

Drew: The blunt way of putting it is that management is always pressing people to be more efficient, to do more work for less money. Workers are always pushing to do the work with the least amount of effort possible, to get through the day without exhausting themselves. Those two pressures don't work directly against each other; they both push in the same direction, which is people trying to get their work done acceptably. Both of those pressures are pushing you towards the boundaries of acceptable performance: to start taking shortcuts, to do the work a bit less completely, to do the work with a bit less attention.

This is like the most honest look at work I've seen in any safety paper. Amalberti does this a bit too. He's just honest about the fact that people who go to work don't try to do every task perfectly. They try to do every task well enough, placing most attention on getting right the tasks that they need to get exactly right. The tasks that they don't need to get exactly right, they pay less attention to and put less work into.

They do the work acceptably, not perfectly, every time. Your management, in the aggregate, however much they might say they care about other things, want the work done and want the product out the door. They want the money to be coming in, because that's what businesses do. We can't pretend that those pressures don't exist.

It frustrates the hell out of me when I see accident reports that say there was commercial pressure to get this project done. As if, in every single project around the world ever, there wasn't constant pressure to get the job done on time for enough money.

David: Which is why I think there are some things we can be monitoring in our performance measurement that might give us some insight into the real-time nature of risk in our business. It is this combination: if I'm a manager in an organization, it's how efficient can I be, how much can I get for the resources I put in; for the workers, it's how can I meet the acceptable requirements of my job for the least effort.

Like you said, they're both pressures that push work towards being, in some ways, less and less safe. Yet we've got these safety margins, and we've got what Rasmussen goes on to talk about: defense in depth. He talks about all these different controls to try to protect ourselves, now in the late 90s, which means we've got lots of redundant controls that we never actually rely on or use. The failure of one control doesn't result in immediate consequences to the person, like in the example we talked about earlier.

You're also now in the situation in our organizations where the effect of what someone does depends on the possible actions of another person. What Rasmussen says is that any system designed according to this defense in depth, where you've got multiple controls and a big safety margin, will always systematically degenerate. There's always going to be this pressure for cost-effectiveness, we'll get this drift and this trade-off, and constant effort will be required to maintain controls that are not used in the work every single day.
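(A small illustration of that last point, ours rather than anything in the paper: when one redundant control fails, nothing visible happens, because the other control still catches the hazard, so nothing in daily work prompts anyone to restore it. The Python sketch below uses invented failure and inspection rates just to show the shape of the effect.)

```python
import random

random.seed(3)  # reproducible illustration

P_SILENT_FAILURE = 0.01  # chance per day that a standing barrier quietly fails
P_DEMAND = 0.05          # chance per day the hazard actually challenges the barriers
DAYS = 10_000

def simulate(inspect_every):
    """Count accidents over DAYS, restoring both barriers every
    `inspect_every` days (0 means the barriers are never inspected)."""
    barriers = [True, True]  # two redundant barriers, both healthy at the start
    accidents = 0
    for day in range(DAYS):
        for i in range(2):
            if barriers[i] and random.random() < P_SILENT_FAILURE:
                barriers[i] = False  # silent failure: no immediate consequence
        if inspect_every and day % inspect_every == 0:
            barriers = [True, True]  # maintenance restores the redundancy
        if random.random() < P_DEMAND and not any(barriers):
            accidents += 1  # a demand arrives while both barriers are down
    return accidents

print("inspected monthly:", simulate(30))
print("never inspected: ", simulate(0))
```

In this toy, the never-inspected system spends almost all of its life with both barriers silently down, which is the systematic degeneration being described: keeping controls alive that daily work never exercises takes constant, deliberate effort.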

Drew: This bit is pretty important and really overlaps with the Amalberti episode we were talking about last week. When Amalberti was talking about almost totally safe systems, he was talking about systems that are still operating right at that boundary of acceptable performance. The whole idea here is that safety is not about avoiding being at the boundary.

Being at the boundary is inevitable. You are going to be pushed up against the boundary of acceptable performance. The question is, how do you operate once you get there? How much of a buffer is there between the virtual boundary that you're not going to let people go across, and the actual boundary where the risk starts to escalate very quickly and people are much more likely to get hurt?

People can be hurt even when you're not at the boundary, and people can be hurt even when you manage the boundary well, but once you get too far beyond the boundary, your risk just goes up exponentially. The question is, how do you manage things at that boundary? How do you stay there without going over it? What do you put in place to create awareness that you're there, awareness that you might have just gone over so that you can step back, awareness that locally the pressure is so much that performance has degraded below an acceptable standard?

So many organizations, when they're thinking about measuring safety, think about it as if numbers going down means we're trending away from that boundary. That's not true. When your numbers are going down, you're still at the boundary with everyone else. You're just there with less information.

David: Just at the end of this section, Rasmussen says that accidents are caused by events initiated by the normal efforts of lots of different people, operating in their daily work context and responding to this standing request: get my job done as efficiently as I can. Ultimately, at some point in time, a quite normal variation in someone's behavior releases the accident sequence. If it hadn't been this time with this particular "cause," we think it would have been avoided because of some new safety measure we put in place.

The reality is it'll just be released by another cause at another point in time, because, he said, we don't have, and we need, a framework that understands the objectives, the value structures, the subjective preferences governing behavior, the degrees of freedom faced by the individual decision maker, and the interactions of the people involved. That's what this model draws on. These are the things we need to pay attention to dynamically, to understand which system, or part of our system, the next inevitable accident will occur in.

Drew: We probably should start heading towards some conclusions here. Fortunately, Rasmussen at this point actually starts to give us some direct implications. He starts to fit different approaches to safety into this model he's provided. Remember, the model is that we're being pushed up against the boundary of acceptable performance. What are your strategies for not going across that boundary too far?

One approach, he says, is to just increase the margin from where the virtual boundary is to where you've completely lost control. You basically put in a buffer space. That might be something that gives you extra time, or something that gives you more room to make mistakes.

I think this is where it fits in with Perrow's idea of normal accidents. Perrow is saying accidents happen when we've got really tight coupling or really high interactive complexity, when people can't recognize that they're across the boundary, or they just don't have time to react once they've gone into the dangerous state. So we can just create more of a buffer.

The second thing we can do is create continual counter-pressure. That's where theories like safety culture come in. Safety culture is about creating a third type of pressure. We've got pressure from management to be more efficient, pressure from workers to be less exhausted, and pressure from safety culture to step away from the boundary of acceptable performance. So we can consciously work on building up that pressure.
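(Here is one way to caricature those three pressures in code. This is our loose reading of the gradients in Rasmussen's figure 3, not anything from the paper, which offers no equations: an operating point drifts under an efficiency gradient and a least-effort gradient towards the boundary of acceptable performance at 1.0, while a counter-pressure, like the safety culture pressure just described, pushes it back. All the constants are invented.)

```python
import random

random.seed(7)  # reproducible illustration

EFFICIENCY_PRESSURE = 0.020  # management gradient: more output for less money
EFFORT_PRESSURE = 0.015      # worker gradient: acceptable work for least effort
COUNTER_PRESSURE = 0.040     # e.g. a safety function pushing back, every day
NOISE = 0.05                 # everyday variability in how work goes

def simulate(days, counter_pressure):
    """Count excursions past the boundary of acceptable performance (x > 1)."""
    x = 0.5  # operating point, starting mid-envelope
    crossings = 0
    for _ in range(days):
        x += EFFICIENCY_PRESSURE + EFFORT_PRESSURE - counter_pressure
        x += random.gauss(0, NOISE)
        x = max(0.0, x)  # clamp at the economic boundary, for simplicity
        if x > 1.0:
            crossings += 1
            x = 0.95  # local recovery after the excursion is noticed
    return crossings

print("with counter-pressure:   ", simulate(10_000, COUNTER_PRESSURE))
print("without counter-pressure:", simulate(10_000, 0.0))
```

Even in this toy, the counter-pressure doesn't eliminate excursions past the boundary; it only changes their frequency, which matches the point above: being pushed towards the boundary is inevitable, so the counter-pressure has to be constant rather than a one-off fix.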

That's one theory of safety organizations: they exist as a counter-pressure to the other pressures in the organization. Then he's got some more interesting stuff about making the boundary more visible. This is why some of our techniques might be a bit counterproductive, because we build in extra protection that just hides from people where the real acceptable boundary is, or we get too strict on our performance requirements and enforce a virtual standard of performance.

People don't know what's actually acceptable. They only know what's strictly acceptable. So people are stepping away from strictly acceptable all the time, but they don't know that sometimes when they do that, it's fine, and sometimes when they do that, they step towards very unsafe.

David: The way I like to frame this in my mind, for those who like the human and organizational performance frameworks and models, particularly black line/blue line, is that we talk a lot about work-as-imagined versus work-as-done, which is the gap between the black line and the blue line: how we think work should be done versus how it's actually getting done.

What this model is saying is, that's almost irrelevant. What we need to worry about is the gap between the blue line, how work's actually happening, and where the hazard or boundary of acceptable performance is. So don't worry yourself with the blue line and the black line; worry yourself with the blue line and the red line, because that's what will make visible when work as it's actually being done starts to interact directly with that boundary or that uncontrolled hazard. I think of it like that as well. I don't know if you'd agree with that.

Drew: Yes, I agree absolutely. I see you've copied into the notes a really complicated diagram that tries to draw Rasmussen's map of how every safety theory fits together with every other theory that was going on at the time. It's pretty out of date, given that the paper is 25 years old, but was there some particular insight from this that you wanted to throw in?

David: Yeah. What Rasmussen was saying is that in safety, or in risk management, in the mid 90s, we needed to come to the realization that we need a different model and a different conceptual framework for understanding safety. What he's drawing here is that other disciplines relevant to socio-technical systems had already come to this conclusion. Management and organizational theory was already there with learning organizations, the work of [...] Senge.

Decision research was already there with naturalistic decision-making models. Some of the other research areas relevant to social contexts in organizations, like social psychology, were already there. He was putting this idea or concept of his alongside related disciplines that already had descriptive models in terms of behavioral traces, rather than the normative, prescriptive theories and models we were so used to in safety. So I think he was putting that all in there as validation that this is how we should think about safety: quickly growing up and matching the reality of organizations.

Drew: I find the diagram fascinating because it highlights how every theorist in safety, and Rasmussen is no exception to this, picks and chooses from other fields what they're going to bring in. Rasmussen has, on this diagram, shown naturalistic decision making as the epitome, the very latest in decision research, which I think lots of other people in decision research would strongly question. There are lots of competing ideas that are still alive and well, and developing.

Then in management theories, he goes straight to Weick and learning organizations and says anything else is just scientific management and Taylorism. No, there's lots of other management science out there that's not Taylorist, but it's still not Weick. In occupational safety, he's got risk homeostasis as the pinnacle of the field, which is a bit of a shortfall in the critical thinking there.

David: I don't know what the human sciences literature landscape looked like in the early 90s. I do think the idea of safety as a transdisciplinary science, needing descriptive models of how real organizations work, is pretty consistent with our manifesto for reality-based safety science, which we spoke about in episode 20, I think. He's saying, give us some models that reflect the real world, that understand social theory and trends in the human sciences. But again, it served the purpose of his arguments quite well.

Drew: Yes. The bit that I would put as a takeaway for our listeners, particularly listeners who are doing PhDs or things like that, is that this is a constant project. Twenty-five years ago, Rasmussen went out to these other fields and said, this is where the other fields are at the moment, and this is how safety and my ideas are consistent with them. But someone doing the same work today needs to go back out to all of those same fields, not just say, okay, these are the ones that Rasmussen brought in.

That's what we've found: these single authors from other fields get quoted again and again in safety, far more than they are within their own fields, because someone like Rasmussen happened to read them, was a big fan, and brought them in. The rest of us need to do our homework as well.

David: Great point. All right, practical takeaways. I'm going to start. This paper is nearly 25 years old, and there are some fairly clear explanations and logic laid out about adjustments that should be made in our pursuit of safety, drawing on insights across other disciplines, that I'm not sure have found their way entirely into industry and organizations yet. It is well worth a read and some reflection.

For those who spend a lot of time with contemporary safety theories, you'll see a lot of the parallels. You just might find some new ways of framing some of those ideas that help.

Drew: I think there are a couple of places where this can definitely help us directly. Obviously, given that he's talking about accident models, one of the big things in our investigations is to recognize that it's not helpful either to get hung up on individual things that happened at the particular time and place of the accident, or to get hung up on broad-brush statements like "there was pressure." Instead, look for: what are the particular pressures that are in place across our organization? What are the mechanisms that are helping people adapt to those locally? And how can we improve that dynamic management, given that the pressures are inevitable and the mistakes are inevitable? That's the world we live in, and there's not much point in just saying so in the accident report. Instead, look at how we cope with being in that world.

David: There's this idea of context driving behavior, or what Rasmussen here calls behavior-shaping factors. When we talk about it a lot with human and organizational performance, we think about it in terms of incident reports, understanding that we should ask the question of what failed, not who failed. But this idea of behavior-shaping factors is really important for how we understand everyday work: looking at different everyday work situations, and looking for the presence or absence of different behavior-shaping factors pushing away from safety or pushing towards safety. I think you could do a lot with what Rasmussen lays out in this paper in how you actually go and explore normal work in your organization, to try to understand what those value structures, constraints, and conditions are in the type of work that you do.

Drew: I just want to throw in a couple of practical examples here. I'm sure you can give even better examples than I can. It's very tempting when you hear things like this to fall into defaults and think about your culture, local leadership, time pressures, and things like that. But there are really straightforward features of the environment that move the risk of work up and down.

We can look at, for example, how much work we're doing at night. Is that changing? Are we doing more work at night than we used to? How much of our maintenance are we doing in the field versus in the garage? Changes in the ratios of those things are changes in the boundaries and the pressures that we're facing. How much extra discretionary time do people have?

Do they have enough discretionary time that they can adjust their own work to keep within the boundaries? Or are they so focused on getting the basic stuff done that they never have time to be conscious of the boundaries, or to do the things in their own workplace that help them step back?

David: Yeah. I think there's an infinite number of potential behavior-shaping factors increasing and decreasing risk: factors in the business pushing for more efficiency, and counter-pressures pushing back for safety. It's why I keep asking: if you're a safety professional, just how much time are you spending understanding all of these ins and outs and nuances of work, and people's experience of work? It's not for an outsider to decide how these factors shape behavior. You actually need to find out, from the insiders inside the system whose behavior you're interested in, the things that are shaping their behavior.

That's what we need to tap into. Sometimes it's tacit knowledge or decision-making. This is a really hard thing to get at. I don't think Rasmussen is saying it's easy. We're definitely not saying it's easy, but it's very important.

Drew: And it's going to be different for every organization. Really, you need to understand what the factors are in your organization and your industry. We did a project with a school organization, and I won't say which organization it was, but it was supposed to be a workshop about safety.

One of the things we did was just an imaginative exercise. We got people to tell us: imagine the next major accident has happened. Where did it happen and how? Every single person in the workshop mentioned to us that over the last couple of years, class sizes had gone up. They had all got to the point where the class sizes were so big that they were constantly running in this risk space just beyond the boundary of acceptable performance.

You wouldn't think of it in terms of safety. It's almost an industrial relations issue or a quality of education issue. But that was the big performance-shaping factor that was obviously driving the safety of everyone working at the schools.

David: That's a great example for this conceptual framework, not a nice example for the people involved, but it shows the idea: if there's one incident one day and you put in one additional safety measure to cope with that one cause on that one day, then you're not doing anything to change your safety system, because there will just be another type of incident, with another cause, resulting from the same behavior-shaping condition in the organization.

Going back to James Reason's work, one of his quotes is that you can't just keep swatting at mosquitoes, you actually have to drain the swamp. I think that's the overarching conceptual framework that Rasmussen wanted us to have.

Drew: Yeah. The picture that came into my head when you were saying that was a flood against a dam wall, and every time a hole springs, you stick a finger in one of the holes. Sure, someone can do that, but you really need someone to step back and get rid of the water pressing against the wall.

David: Another good analogy, I guess.

Drew: The second takeaway you've got here is understanding where the boundary of safety is: what is the acceptable boundary of performance, what is the actual boundary of performance, and how visible is that?

David: There are terms that we haven't explained, but you can read them about the perceived boundary and the functional boundary. Where does our system tell us the boundary is, or the standard work tell us where it is, and where's the actual functional boundary, where does the hazard become clear and present danger? So creating visibility, like we said, earlier around those things. I think that's something that we don't do a lot in safety. It's very present in some organization.

I thought a lot about aviation as I was reading this paper. There'll be a standard process on an approach: a rate of descent at a certain altitude, standard practice for the runway or for the environment. Then there'll be this minimum threshold, which is, don't start this at an altitude lower than X, or something. That gives a good idea of where the perceived boundary is, and maybe the functional boundary.

I thought the aviation industry does a really good job with acceptable standards of work for air traffic controllers and pilots. There seems to be a really clear, acceptable standard of work and a fairly visible safety boundary and margin in the system. I don't know if you'd agree with that, but I think we could learn a lot from lifting this paper out and thinking about the way that aviation gets managed.

Drew: I think that is something that aviation gets really right, partly because they're very, very conscious and explicit about where those pressures are coming from. They know that pilots aren't perfect and that pilots need to prioritize what they're doing so they can focus on the most important things at the most important time. They know that there's this constant pressure, and that there are some rules that are there because they save fuel, and saving fuel saves money.

A lot of the rules are there for those reasons, not for safety reasons. If the rule says, don't go below this altitude, you need to know whether it's that altitude because of the fuel savings or because of the noise, with the safety one 1,000 feet below that, or whether that 1,000-foot one is a hard limit because there's a hill there.

David: The final takeaway, and a wrap-up. I think Rasmussen probably deserved a slightly longer episode today, in any case. But there's this idea that all work faces constant efficiency pressure and needs to have constant counter-pressure. When you look at the model in this paper, you can see these resource and demand pressures pushing work out towards the safety margin. It needs constant pressure pushing it back: drawing down extra resources and pushing back on extra demands.

This is an important realization. Organizations need to not fall in love with their own rhetoric that safety is the most important priority, and to know that their system is constantly pushing for faster, better, cheaper, and that it needs to be constantly pushed back on in a meaningful way, not in a superficial or propaganda sort of way.

Drew: That's a practical takeaway that we can build into the conversations we're having. Ask people: when things get really tight for time, what is so important that it gets done the right way every time? What are the things that it's okay to just do acceptably? Have those conversations, and have people talk to each other and to you about what the non-negotiable limits are and what the things are that we do as well as we can, but not perfectly.

David: A thought for a future episode: I know Jim, Tony, and Ron, some colleagues, just published their book on Critical Steps, which is about what must go right. There are a few papers in and around that, so maybe we can talk about it on the podcast, because I think that's a good call-out. What must always go right, or what must always be done to what acceptable standard?

I wrote a bit of an answer here for this one, but I'm just going to throw it to you. You can answer it if you like. The question we asked this week was, do we have adequate models of accident causation? What's your thought?

Drew: I'm going to just throw that one straight back to you, David.

David: I'm interested in your thoughts, because the answer I wrote is that after reading this paper, and thinking through what's come over the last 25 years, I think we may have some adequate models in the literature, but I don't think we often have adequate models shared across our organizations and industries. I'm interested in whether you think the literature actually has adequate models of safety yet.

Drew: I think an adequate model requires a better understanding of how organizations try to influence safety. It needs to draw more on the organizational literature about how organizations work. I think we have good enough models of how the accidents themselves happen, but those models aren't good enough to include in them the effects of our defenses. Very often, what seem to be good models when we turn them into defenses, the defenses don't work. So therefore, the model cannot have been adequate.

David: Great, more work to do. Good. It means that the research is going to keep coming, and the podcast can keep coming as well.

That's it for this week. We hope you found this episode thought-provoking and ultimately useful in shaping the safety of work in your own organization. Join us on LinkedIn or send any comments, questions, or ideas for future episodes to feedback@safetyofwork.com.