The Safety of Work

Ep.24 How did David Woods discover the theory of graceful extensibility?

Episode Summary

On this episode of the Safety of Work, we discuss how David Woods discovered the theory of Graceful Extensibility.

Episode Notes

Drew isn’t here today and in his stead is Professor David Woods. Tune in to hear his discussion of graceful extensibility and how it applies to the current battle with Covid19.

Topics:

Unwittingly developing theories.
Building resilience in organizations.
Framing his theory in terms of current events.
How the brain deals with changes.
What the data from Covid19 will tell us.
Net adaptive value.
Saturation and decompensation.
Proactive learning.
Reciprocity.

Quotes:

“The simple idea is that we are always vulnerable to surprise. Surprise is ongoing.”

“[The death rate] is going to be correlated with who anticipated...they will have better outcomes for patients.”

“I have to generate, mobilize, and deploy new ways of working, as I start to run out of the capacity to continue.”

“Decompensation in our current case is happening at a society level, at large scale jurisdiction levels; it’s happening at hospital systems levels…”

Resources:

Woods, D. D. (2018). The theory of graceful extensibility: basic rules that govern adaptive systems. Environment Systems and Decisions, 38(4), 433-457.

Feedback@safetyofwork.com

Episode Transcription

You're listening to The Safety of Work Podcast, episode 24. Today we're asking the question how did David Woods discover The Theory of Graceful Extensibility? Let's get started.

Hi, everybody. My name is David Provan and I'm from the Safety Science Innovation Lab at Griffith University. Welcome to The Safety of Work Podcast. If this is your first time listening, then thanks for coming. The show is produced every Monday and the show notes can be found at safetyofwork.com.

Drew isn't with me today because this week, I'm speaking directly with Professor David Woods, who I’ve been fortunate enough to spend a fair bit of time with over the last few years. Dave, thank you so much for joining us on The Safety of Work Podcast. I was hoping to get you on before the current situation with COVID-19, and just for our listeners, this will come out in about four weeks from when we record, but it's the second of April today when we're recording.

Dave, just give us some foresight about what you think will be happening in four weeks' time. What I really wanted to do was to take the opportunity to hear from you about the theory of Graceful Extensibility. Given the situation we're in right now, we might be able to help our listeners to understand the application of that theory and the problems that that theory can help us explain and understand in the context of the current situation with COVID-19 around the world, particularly maybe in relation to the healthcare sector, how it needs to adjust, adapt, and respond to the situation that it's facing.

Dave, I'm really interested to understand I suppose, over the last 40 years, all the fundamental findings that you've been involved in across nuclear, aerospace, healthcare, information systems, how that led to Graceful Extensibility and what you wrote in 2017, 2018?

Dave: It was one of these issues where you've been developing something and you didn't really know what you had. Remember we started Resilience Engineering circa 2000, and then we had our first meeting in 2004. A lot of this was around NASA's space exploration mishaps in 1999, and then unfortunately the Columbia space shuttle accident in 2003, a variety of things. And then, we really started to launch the community, and to say we had foundations, findings, ideas, and things that matter.

As we pulled things together for the 2006 book and moved forward, it was time to go back, synthesize, and reframe things that have instances and empirical confirmations that go way, way back. There was an early nuclear study we did. It was all about aspects of reciprocity and risk of saturation.

How did one recognize when another scope of responsibility was going to run out of space and they were just running out of the capability to control the process? How they could change what they were doing for their scope of responsibility to help the other operators through that difficult patch.

We saw it in studies Richard Cook and I did 31 years ago when we first started looking at the operating room. What is expertise in anesthesia? It is anticipating bottlenecks ahead, even though the occurrence of those bottlenecks might be relatively low probability. The difficulty was, if you had to create the capability to act effectively in the middle of the crunch, you're not likely to do it very well.

To be effective as a responsible operator, the anesthesiologist would invest extra energy and extra work in advance to create the opportunities to act effectively, should the crunch arise. We illustrated anticipation. We found ourselves writing papers using the word saturation. When I look back, I was surprised at how many times I had used the word saturation and also in papers with Richard around 2000 about bridging gaps, how resilience is found in the operators at the front lines, because they have to adapt to make the system work and so they fill in the gaps and the holes.

In 2004, we started calling it Work as Imagined versus Work as Done. They have to bridge that gap as responsible agents in the world. All of this was part of a shift. We were all dissatisfied with safety, because we just kept fighting the same battles and not making progress.

We decided we needed to reframe and shift the ground on which we fought these battles. We started saying in a world of complexity, resilience, building resilient capabilities, investing in the sources that create resilient performance is something all organizations need as the world gets more interdependent and complex.

This was coming together pretty well and I was starting already to follow and use John Doyle's work with a variety of colleagues when he and I shared the platform at a conference on complexity. The interactions with John, some of his colleagues, Dave Alderson and others, helped hold together and understand some of the things that are going on as well as have some areas where we have mathematical proofs of concept, where the nonlinear math and things play out.

Interactions have ended up being interesting. One of the fundamental concepts is about fundamental tradeoffs. We both come to the idea that there are about five dimensions, five kinds of tradeoffs, things that interact. It's not just moving one point on an operating curve. We want to illustrate that in a couple of minutes.

When you have to do that, you're really making a shift on several tradeoff curves. A lot of this is about biological examples that illustrate how you can navigate dynamically in this trade space, and that plays out on a different scale. We end up with this problem of architecture structure for organized complexity. There is an organization relative to complexity so you can outmaneuver the kinds of interdependencies and traps.

Ultimately, a lot of what drives this is a complete reframing. Most people operate on, let's put in some simplifications: it's okay to pretend the world is only linear, it's okay to pretend that surprise is only about frequency. It's okay to do these things, and look at what we can put together and understand the world. The success of that simplification, linearization strategy has been to improve optimality when it's safe to simplify. That turns out to not hold in general. In fact even worse, the successes it creates turn out to drive a growth of complexity. We have layered networks at scale.

Let me switch gears for a minute and let's start with what I read today. Let's understand this by what happened today. A doctor in the New York Times writes an editorial about what are healthcare workers supposed to do when they are poorly provisioned, overloaded when society as a whole has been late and they're receiving the end result of COVID-19 outbreaks out of control, such as in New York, Italy, Spain, Wuhan, places where this happened.

What's their responsibility as healthcare workers, frontline workers? They have the responsibility to make it work. When we study in emergency medicine or the studies we've done of large mass casualty events beyond surge events, what do we find? The people on the frontlines adapt.

It's ad hoc, because they're not really prepared. The kinds of plans are fictional. They're under-specified and don't really address the real issues that they have to recognize and adapt around in order to continue to provide care to injured or sick people, but they do it. They keep stretching, they keep stretching, they reconfigure space. They reconfigure how they work together. They reconfigure how they deploy expertise, because expertise will turn out to be in short supply.

They have to integrate people who come to volunteer. People come to the hospital to help make it work in a mass casualty event. How do I integrate them in? I sacrifice things like documentation, because so many people are coming in, but I still have to do documentation because we transfer patients.

How can I do it in a new way, because the standard way is way too slow, way too workload intensive, it doesn't work in the crisis situation. We change the way we deal with other parts of the hospital and pharmacy reciprocates, because what do they do? They don't come back and say, oh, you have to put that request into the formal system, and then it will show up on my computer screen, and then I will fill out all these other forms, then the authorization will come from the other center, and eventually it shows up in 12 hours or whatever, three hours or whatever. It's not possible.

They adapt their processes and modify them. What did the doctor say today? Health care workers should not be forced to incur additional risk because people don't want to practice social distancing, or we should not have to pay for short sighted policies that have already eviscerated public health infrastructure, and of course we need proper masks.

Social order relies on reciprocity; imposing outsized burdens on one group without sacrifice from others is unfair. Doctors and nurses and other healthcare workers may be heroes in this pandemic but we will not be martyrs. What does that capture? Reciprocity and sacrifice. The sacrifice judgment was one of the key things laid out in the 2006 Essentials chapter.

You can look at last year's essentials revisited chapter for some of these things where I emphasized the other word to use reciprocity. Without reciprocity across groups, what did we put on our diary? Our COVID diary yesterday is an analogy to team defense in basketball since this would be basketball season in the US, and Team Defense is all about help.

You start to recognize one player's responsibilities are about to break down in the face of the offensive plays and skill sets. The other defenders come to help which then means other defenders come to help, but of course, it isn't just what happens during the game at that moment. It's what the team worked on in terms of the different tactics and strategies for help defense.

How does that work out, how do they put the team together, and various things? All of the layers in the team and its structure come into the play to build a system for Team Defense that has to change, as the opponent makes changes. We wrote it up as Team COVID versus Team USA. Team USA got behind real early. We need that system of help in order to make progress. Guess what? We ended with the line.

Whether we build a system that helps or not, the healthcare workers, cleaners, and delivery people who are on the front lines, a lot of them will step up and they will do the job now. Reciprocity, the sacrifice judgment, tradeoffs are all in that quote.

Now let's take another big idea that matters a lot here. I've been talking about this for 40 years since the Three Mile Island nuclear accident started my career. A couple of days ago, a professor at Georgetown University said that a biomedical powerhouse like the US should so thoroughly fail to create a very simple diagnostic test was quite literally unimaginable.

I'm not aware of any simulations that I or others have run, where we even consider such a failure of testing. Wait a minute, guess where I started, Three Mile Island. That was written then. It was written with Challenger, Columbia, and many, many highly visible cases of failures. This was an early idea. It came under different labels and it's often been misunderstood, talked about, and not driving at the core simple idea.

The simple idea is that we are always vulnerable to surprise, surprise is ongoing. It's not a rarity that occasionally happens once in 100 years. Surprise is always present. Small surprises are always occurring and it's part of the gap between Work as Imagined and Work as Done.

In fact, it's fundamental, it turns out systems have to run this way. The simple reason is that there are only two assumptions that I have to make to lay out Graceful Extensibility and the theory of Graceful Extensibility that explains and integrates all this stuff and tells us where to go, which are finite resources and that change never stops.

Why does that matter? Finite resources means there will always be limits to whatever planful approach we have worked out, however it's embodied in technology, procedures, training, and whatever. That holds at whatever scale we look at it, even in places like the emergency room that's designed to adapt to a certain degree.

However we lay that out, finite resources and change mean there will be limits. There are events outside those limits or boundaries. Those events are surprises. I've often used the metaphor, visually, dramatically, of dragons of surprise. If I could ever go visit my two-year-old grandson, who loves dragons and dinosaurs and can't tell the difference, and he got to see grandpa with the dragon animations, he would be very excited and go roar.

There are roaring dragons out there now, it's COVID-19. They're not subtle dragons, they are brightly colored dragons, they are beating up all different kinds of parts of the system at the same time.

David: When you talk about those assumptions, another way that you say it is that uncertainty is never zero and varies, and therefore risk is never zero and varies.

Dave: That's right. That leads us to why these tradeoffs are fundamental. The idea of surprise turns out to be really, really hard for people to get. I've been a little surprised because it's not frequency. We can run through and I usually do a corrective about the frequency version of surprise because distributions are heavy tailed in several senses.

Three out of four years, we had 1-in-500-year extreme rain events in Houston, Texas in the US. The frequency models are not correct, even simply from a frequency point of view. This has to do with space and time averaging and other things.
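Editor's sketch of the space-and-time averaging point, not something from the episode. A "1-in-500-year" label describes one place in one year. Once you ask about many places, or several years, the chance that the event shows up somewhere rises quickly. The function name and the site counts below are hypothetical, and the calculation assumes every site-year is independent, which real weather is not, so treat the numbers as illustration only.

```python
def prob_at_least_one(annual_prob: float, sites: int, years: int) -> float:
    """Chance that at least one site sees the event at least once.

    Assumes every site-year is independent (a simplification made
    only for this illustration).
    """
    site_years = sites * years
    return 1.0 - (1.0 - annual_prob) ** site_years


# A "1-in-500-year" event: 0.2% chance per site per year.
ANNUAL_PROB = 1 / 500

# Hypothetical numbers of independent locations being watched.
for sites in (1, 100, 1000):
    p = prob_at_least_one(ANNUAL_PROB, sites, years=1)
    print(f"{sites:>5} sites, 1 year: {p:.1%}")
```

Under these assumptions a single site sees the event in 0.2% of years, but across 100 sites the chance of at least one occurrence in a year is about 18%, and across 1,000 sites it is about 86%. The rare event for one location is a regular event for the system as a whole.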

What's the probability of an extreme rain event happening in the US over any three month period? It isn't a probability, it's a number. These things happen regularly, so you have to prepare for them. We often find we're quite underprepared because we misinterpret frequency, but the bigger issue is surprise is model surprise, and it goes back to the old military saying, plans fail quickly after contact with an adversary or in disaster planning.

All of our emergency plans fall apart after contact with a crisis. These are old sayings, but what they are saying is about our models of how things should work, the models that have developed over time to get better on some criteria, to get more optimal. Even if we put in some mechanisms to defend against well understood kinds of disturbances and failures, they have limits, they get challenged. You can't enumerate the kinds of challenges that can arise, but you still have to be prepared.

Now we're at the discovery of Graceful Extensibility, and the story of that is what I meant from the beginning in 2000, when I first proposed that we needed to develop resilience engineering as a proactive safety strategy for NASA. I meant resilience as the opposite of brittleness. I knew resilience as a label would get popular, but I had no idea of how ridiculously popular it would get, and therefore be used to cover everything and anything and become, as a result, merely a pointer word and not really have any technical content or substantive content anymore.

About 2014, 2015, I realized, grudgingly, that I had to think about this differently. In some ways I had to go dig into what I really meant. Why did I want to see it as this inverse of brittleness? Given the work with John Doyle, I went back and started looking through what I really meant and the work I've been doing with John Doyle and his papers and proof of concept studies, going back to some of our studies and proof of concept from our colleagues like in emergency medicine. That's where I realized the core principle.

This wasn't just a work thing, there was a fundamental concept underneath that applied to all adaptive systems at all scales, given that the interdependent, interconnected world is inescapable eventually. It was this line that viability requires extensibility, that this is a hard constraint. It's a universal constraint, viability requires extensibility.

What that means is really simple, is in the short run, I can be more optimal for what frequently occurs, what I understand, what I can come to understand, and I make investments and try to constantly improve that. That seeking of being more optimal for what regularly occurs, especially regular variations by trying to estimate patterns and variation. That's important across the biological sphere at any level, from cells, organ systems, healthy organs, sick organs with people helping, all the way up human systems and societies.

David: Humans in organizations will always try to get more efficient, faster, better, cheaper, easier to do anything that they do every day.

Dave: It's perfectly reasonable to do that. As long as you don't only do that, because what happens given the constraints of an interdependent world and the cross scale interactions, finite resources, continuing change, surprise will show up. There are limits, there are boundaries to your envelope of competence from your planful behavior, then what you end up with is you will be surprised.

Your world will find the limits of your planful system. It will challenge those limits. It is probing all the time and adjustments have to happen all the time in order to compensate. That means that biological systems have to have another capability and we call that Graceful Extensibility. I have to be able to extend gracefully from my normal activity in order to handle surprises. Graceful Extensibility, viability requires this. Why, because brittleness.

Descriptively, brittleness is a boundary effect. At the edge of your limits, how fast does performance fall off? We think of a material, it bends, it bends, it bends, and it breaks. We call that its brittle point. You could look at that in material science, we can draw the curves in material science.

The same thing plays out in adaptive systems. It's called decompensation. It's a fundamental way adaptive systems can break. How do they break? They continue deploying whatever resources they can deploy to keep up with increasing demands, until eventually those are exhausted. There's nothing else to grab and there's a sudden collapse in performance.
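Editor's sketch of that pattern, not something from the episode. This is a toy model with made-up numbers: demand rises steadily, a fixed base capacity handles normal load, and a finite reserve covers the excess. The function name, parameters, and values are all hypothetical.

```python
def simulate(demand_growth: float, base_capacity: float,
             reserve: float, steps: int) -> list[float]:
    """Return the fraction of demand met at each step.

    Hypothetical toy model: demand rises linearly, and any demand above
    base capacity is covered by drawing down a finite reserve.
    """
    served_fraction = []
    demand = base_capacity  # start exactly at normal capacity
    for _ in range(steps):
        demand += demand_growth
        shortfall = max(0.0, demand - base_capacity)
        drawn = min(shortfall, reserve)  # compensate while the reserve lasts
        reserve -= drawn
        served_fraction.append((base_capacity + drawn) / demand)
    return served_fraction


# Made-up numbers chosen only to show the shape of the curve.
history = simulate(demand_growth=5.0, base_capacity=100.0,
                   reserve=150.0, steps=12)
for step, fraction in enumerate(history, start=1):
    print(f"step {step:2d}: {fraction:.0%} of demand met")
```

With these values the system meets all demand for the first seven steps, so from the outside nothing looks wrong while the reserve is quietly being spent. At step eight the reserve runs out and performance starts falling. The point of the sketch is that the visible output gives no warning; the warning is in how hard the system is working to keep up.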

By the way, this is what's going on at multiple levels in the current pandemic. You have stages of physiological, respiratory decompensation. The respiratory system has to start failing enough that you go to the hospital, if you're sick enough to get into potentially overloaded hospitals. Then when you're in the hospital, they're watching you and getting ready because your respiratory system may decompensate again.

It relates to the physiological mechanism of this particular virus, which is not the normal configuration of symptoms and risk. What do they do? They preemptively intubate patients, in a very particular way, to deal with some of the side effects, the side constraints of this disease and its risk that it brings. They need to get control of the airway and be ready to support the respiratory system. Why, because when the patients decompensate, you need to do something constructively fast.

If you're trying to intubate them when their oxygen saturation is plummeting and the organ systems are starting to respond to the lack of oxygen, it's too late and you're unlikely to be successful in supporting that patient.

That happens again later. Patients look like they're on recovery 10, 12 days in, maybe even a little longer. Those are just long time delays and then what happens is you get a sudden cardiac death. The mechanism is unclear, but it's another form: as the virus percolates through your body it has a secondary effect, which may depend on some other illnesses and states of the person's physiology. We don't know yet, but it is decompensation.

What's the ICU doing? It's at risk of decompensating in New York, in Italy, in Spain. It's loaded, overloaded, overloaded, overloaded, especially as healthcare workers drop out because they become sick as well, because there's a lack of protective gear. Notice how one thing is piling on another as this plays out.

In the biological world, viability in the long run, avoiding brittle collapse turns out to be essential. Because eventually, no matter how specialized you are for the environment, as is, as it's changing slowly and the variations within that environment seasonally, etcetera, the world will shift on you.

This is back to Darwin when he visited the Galapagos Islands. We can talk about it with Darwin's finches. What about the finches? One is specialized for an environment with a strong beak for nuts. Another one specializes in food from flowers, with a long slender beak. Where did they all come from? A generalized bird. But those specialized ones, while being more optimal for the current environment, become less optimal should that environment change. The more generalized, original species now becomes more adaptable to the new environment.

It happens with neurophysiology, it happens with the brain. It's really interesting what the brain does, because it simultaneously is listening to the world about what's typical, what's frequent, what's the frequent patterns of change and variation over a certain time and space range, we're really good at that.

The brain is going to pick up these trends and tune behaviors to match them, given the affordances for that particular organism. The world changes, so what does the brain have to do? It has to be really sensitive to what's new, what's novel, what's a surprise, and it turns out it is. It's really sensitive to anomalies.

Within a certain space and time scale, it's enormously sensitive. It builds up this rolling model of what's typical that varies with context, and it's really sensitive to departures from typical. Wow. In fact, it does it really early in neurological processing. It's not something to infer, infer, infer and I got to compute, compute, compute, and then I'll figure this out.

No, it's set up to do it really fast, because if novel things are happening, you want to know about them really fast. What's going on related to this is the anticipation. We have the critical function for resilient performance of anticipating, what Richard and I found 30-some years ago: anticipating bottlenecks ahead, and adapting in advance of the crunch.

We see the difference now between jurisdictions where they took the information and started to act early. In my state of Ohio, our governor acted early, before there was a single case in the state. He canceled a very large sporting event. There were economic consequences for several parts of the state and the cities involved, and a lot of criticism, but he ended up being the one who saw the signs from other places, because it's a rolling series of outbreaks.

The other outbreaks and what had happened provided a basis for him to see what could be coming and say I need to prepare the stage in advance. I need to reduce transmission rate, so social distancing, canceling large gatherings. I need to expand the capability, the readiness to respond of our healthcare system, so they can take on more patients. He was implementing basic resilience engineering findings and tactics for dealing with surprise by anticipating and building a readiness to respond. It turned out to require coordination across many different roles, jurisdictions, levels, and has turned out to not be able to be supported by a federal level response. The state is responding effectively and trying to stay out of controversy, but we're going to have to take care of ourselves. That's a breakdown in coordination or synchronization.

We've seen three downsides, and the three associated complementary upsides, of the way adaptive systems can fail. The first is decompensation: they can't keep up with the growing demands and difficulties, disturbances. We see that in several places in the rolling outbreak around the world.

The opposite is anticipation. We see parts of Asia anticipating and understanding what happened in Wuhan, reducing transmission early with extensive test, track, and isolate, the standard epidemiological intervention, coupled with canceling large gatherings and social distancing. They've reduced transmission, not overloaded their hospitals and been able to relax some of the activity restrictions in their societies.

We see the second one, working at cross purposes versus synchronization. We see that working at cross purposes breaks down across levels, vertical levels in society, certainly in the US and the UK, as opposed to places like Singapore and Taiwan where those were well synchronized vertically.

What's interesting is, what we've seen many times, the people who are taking on the responsibility. The people in the roles who end up being the final and most critical source of resilient performance, they have to adapt and they know they can't do it by themselves. What do they do? Remember that quote I read from today's New York Times, so what do they do? They build ad hoc emergent horizontal communication. They pulse their personal network, who they were fellows with, who they were residents with? Where are they in the world? Are they earlier in the rolling outbreaks? What can they tell me about this disease and what we have to do to get ready? They work their horizontal professional links. Where's the best information? Look at the society for critical care medicine or emergency medicine.

These kinds of societies are trying to collate information. What do they tell you? You talk to the people who are in the ICU as physicians, they come back and go well, what I heard from my old colleagues is the freshest stuff, that's important because that's the fastest information. It needs to be corroborated and so we get this from the professional societies.

Most are saying the same thing, it's just a little time delayed. We get other information from authorities. Those are time delayed some more because they're processing, and then they have to get it up, and then they have to communicate it back down, so that's even slower.

We have to accommodate that, and we ran through all the ways that the intensive care units were adapting, the hospitals were adapting to support the intensive care units. How the larger system across the state was adapting to support the hospitals, support the ICU. In all of this, he comes back to me and says it's going to be interesting to see how fast we can learn and adapt and get that information to the bedside.

How fast can we learn and adapt? What's that? That's the third one. Slow and stale, where we get stuck in old models of how things used to work well when the world has changed and we're not ready. We're falling behind that change. It's not that the plans weren't appropriate to a previous world, it's just not the world you're in anymore.

Notice the downside version of people saying, oh, it's just the flu. The answer is no, no, this is not. This is a novel disease with novel presentation, novel risks, challenging the medical personnel to re-organize their knowledge and techniques in order to care for these patients. They're learning as they go, and they're open to learn more. That's the proactive learning or upside on the third one.

Being stuck in stale approaches versus proactively learning and adapting from what's going on. That's one of the ironies in terms of illustrating the resilience findings in the rolling outbreaks. Because it happened to somebody else first, you had the opportunity to anticipate and adapt. You could learn, you could start the coordination process, you could build up resources in advance of crunches that could be coming.

How will this play out? We'll see. It's always possible that some jurisdictions will over prepare. At least in this round, they will not get crunched. They will have capacity. They didn't have to shut everything down. The answer is good, that's a sign of success. You have to over prepare. Viability requires extensibility because if you didn't build it in advance and you happen to get crunched, your performance will be worse.

What's ironic about this setting from a safety point of view is this is a case where we have what we've never had. With safety, we can't have bad events. We have too many bad events, it's too costly and we have to change. We end up with this problem that we can never show you for sure in some statistical way that it was important to make these investments in advance, that it was necessary in order to handle potential crunches to support sources of resilience at the frontlines.

In this case, unfortunately, we will have the data, it's called excess deaths. Just like in the 1918 Influenza Pandemic, you can go back afterwards and you can calculate the excess deaths. There'll be arguments and estimations about hidden deaths, and were they assigned to the virus or something else, or they were assigned to the virus when they should have been assigned to something else. Underserved populations will have deaths that happen outside of hospitals because there's not enough care capacity. They don't have access to it but they'll estimate it.

You know what's going to be correlated with? It's going to be correlated with who anticipated and built a readiness to respond will avoid getting slammed, avoid getting their ICU and hospitals saturated, and they will have better outcomes for patients.

David: That's a real fundamental finding. I think you mentioned that to me before about emergency response in control room settings. I suppose people and teams that are able to revise their model, revise their situation assessment, and change the way that they understand what's going on are the ones able to respond quickly and make the small sacrifice judgments which become much easier to make than the bigger ones once that cascading chain of events has basically resulted in saturation.

Dave: That's one of the things that's been coming out recently. How do you make the sacrifices small enough that they're doable and they have a direction that you build? You make a small sacrifice that generates more information. How hard do I have to work to handle this which gives you feedback? I have to work harder. If you earn the feedback, you need to work harder. You need to invest some more. You need to dig a little further. You need to recruit some more specialized knowledge. You need to organize in a different way in order to continue to keep up with the pace of demands. It's about keeping pace.

What you hit on is a very basic aspect of adaptive capacity, which is you're poised to adapt. You have to have the capability in advance of the situation that demands performance. It has to exist in advance of the demonstration of that capability and putting it into action. You have to be poised to adapt.

It's like potential energy in physics. Go back to your high school physics. There's kinetic energy and there's potential energy. We can run these things. It's like this. This is like the fundamental physics for biological systems. Just like fundamental physics in the early 20th century, it's pretty bloody counterintuitive. The way the world works, the way the biological world works is not the way we think it does. We can simplify all we want but eventually those simplifications will bite us, and then we will have too little adaptive capacity. What will happen is some people will step into the breach. As responsible people, they will try to make it work, they will fill the gap.

One of the surprises in this comes when we reframe and put complexity first: the world is always complex. The question becomes when it is safe to temporarily relax how we behave in a complex world, versus the other way, which says, “We can relax. It's safe to relax. We'll only deal with those complex things when we absolutely have to.” That hasn't worked. It doesn't work at all. Instead, we're going to start with complexity and say, “What do you have to do here?”

The first one is viability requires extensibility. You have to have Graceful Extensibility. It has to be nonzero. What that means is, at a minimum, you have to have two kinds of performance. You must have the ability to get more optimal. You need to get more efficient, more productive, more tuned to what regularly happens relative to your goals and things. That pursuit of optimality is a pressure on this other capability that builds an adaptive capacity at the boundaries when surprise happens. Graceful Extensibility, you need both. I mentioned examples in biology and I could go on where they're both.

By the way, ironically, what's one of the best examples? Viruses. Viruses would have been beaten by immune systems and their capability to develop treatments. Why do they keep coming back? Because they can adapt to preserve future adaptive value. The biological instances show that these things that had seemed impossible are, in fact, fundamental in biology. What we can select for and build is being poised to adapt even though we haven't experienced these challenges. We never thought testing would fail. We never thought, about nuclear power in the 1970s, that a series of small problems would combine with a couple of misjudgments by people who had never thought, never been trained, or engaged in the knowledge that would require them to think like nuclear physicists.

That combination of small things built up in a sour pattern to be the equivalent of the big single failure that they had designed their safety systems around. I think that was possible.

David: One of the big takeouts from the theory for me was really that balance between that robust optimality and Graceful Extensibility. If you invest too much in managing the continuous improvement cycle and managing for the normal daily events, you're at risk of making yourself less able to deal with surprise. Is that a fair take here?

Dave: Yeah, it is. It's called net adaptive value. Other than that they didn't put the word “net” on it, it's actually got long roots in biology. We can pull this out. That's a surprise in my own career, work, my colleagues, stuff, and other things. It's when the reframing happened five or six years ago, you look back through all this stuff and you go, wow, it's here. I had it there. Before me, these people had it there. All this stuff fits together and helps lay this out.

Net adaptive capacity means you have to invest both to build Graceful Extensibility. It needs to be nonzero. We'll pursue optimality and you can build in some robustness criteria in that, but that depends on you understanding the kinds of challenges and having good models of them. You have to have both of these things. The problem is they interact. Unless you experience a surprise crisis, unless it is tangible that plans are limited, that the world hits your plans and breaks them fairly regularly and fairly visibly to all of your organization and stakeholders, you tend to see the sources of resilient performance as inefficiencies and so you start to eliminate them.

One of the potential lessons of the current pandemic is that it happens in different parts of the world in different ways. But for the Western World, we've been OCD on narrow definitions of optimality in pursuit of that and we've undermined sources of resilience, the breakdowns at the top levels of the country. The CDC’s failure, the failures of testing, the failures of being able to scale up protective gear, undermine many different aspects.

This is a common finding in disaster response, initiating the disaster event. The crisis event operates at scale. Hurricanes get stronger and extend over larger physical space. What happens is everybody's local plans, contingency plans, and backup turn out to all break in the same way at the same time. You see that playing out in the rolling outbreaks. Everybody assumed.

You see there are some of the medical state level personnel. We always assume we can get help. If the crisis might hit us, but it doesn't hit everybody around us, we can get resources and borrow from them. Well, they're getting hit too. You quickly find that some of the things that are assumed in your backup plans often implicitly turn out to get undermined or broken by the same event that initiates the crisis for your main service.

Things that you would rely on in your contingency plans as a backup resource or another resource you could draw on to compensate for the main event turn out to be oversubscribed because everybody is trying to use that resource. Think of your bandwidth slowing down, as an example, as we all adapt to being on Zoom meetings all day, working from home.

There's a variety of things. Think of, or look at, the issues with Zoom on security, privacy, and whatever: things that didn't matter when it was an occasional thing, et cetera. When it becomes your frontline way of accomplishing work, all of a sudden, some of those security vulnerabilities in Zoom or some of the other tools we're using become much more significant challenges and surprises.

David: Dave, you talk about Graceful Extensibility being a dynamic capability. I think there's a way of oversimplifying this where people might form a view that Graceful Extensibility is just about more redundancy, more contingency planning, more components and resources within your system to draw on. But you talked in your theory about the law of stretched systems and how it's not about just having more stuff in your system...

Dave: The simplest way to put this is I'm going to borrow from two people: Erik Hollnagel, my colleague in much of this, and Lynn Margulis, who's one of the major biology figures. In trying to explain what we've understood in biology, she puts it very simply. Life is a verb. Life did not develop through combat, but by networking.

Let's start with the first one, that is, life is a verb. It's about what you can do. As Erik puts it, it's about capabilities for acting. Building an action capability, anticipation supports acting and appropriate timing. These are dynamic capabilities. We have a problem in English right off the bat since resilience isn't a verb grammatically; it is a substantive. It's all about the verbs and the adverbs, what modifies your verbs.

Western management has been set up around categories and nouns. This is where I said this stuff is hard to understand, not because it's in a sense hard to understand, but because we've tried to make the world be the opposite of how it really works. It gave us a bunch of ways to pursue optimality: if we could relax change, if we could relax finite resources, if we could isolate a piece of the world and pretend it isn't interconnected to other things. What we found is that actually people were moderating the interconnections to other things. They were moderating the little surprises that were continuously happening to make this little planful envelope work as intended.

This is going to get into the third subset of theorems, findings, and fundamentals in the theory. Resilience as a verb is about capabilities. That runs you into the first set of fundamentals in the theory. Given finite resources and change, everything has limits, surprise happens. There's regular occurrence at the boundaries of those limits. That means what we're interested in is the risk of saturation or sustaining the capacity to maneuver. In other words, as I start to run out of action capacity, I need a mechanism to be able to generate more. I have to extend my capability for action. I have to generate, mobilize, and deploy new ways of working as I start to run out of the capacity to continue. Where does that come from? It comes from decompensation.

Decompensation in our current case is happening at a society level and at large scale jurisdiction levels. It's happening at hospital system levels when people can't get in the hospital because the hospital is so overloaded, it can't handle more. That's a sign of decompensation and saturation. You can't deal with it very well when you're close to the edge. Instead, you have to act as you start to approach it. We have proofs of concept for many settings, human society level, jurisdictional levels, organizational levels, cognitive system levels, neurophysiological levels, biological cellular levels, in all of these places.

The critical thing here is every unit at every scale is at risk of saturating. That means they need to avoid brittleness. Avoiding brittleness on the positive side is maintaining viability in the long run that requires extensibility and that has to move gracefully from the ways we normally work as we encounter and deal with surprise. You can't just invoke something completely out of the blue because it won't work very well. It won't be sustainable because it's sitting around doing nothing waiting to all of a sudden deal with a surprise. You have to take advantage of things that are normally going on and push them further. You have to build these deeper capabilities of reciprocity, initiative, and many others in order to have that potential to adapt when a bigger surprise hits.

We ended up with this constraint that every unit at whatever scale has to have nonzero Graceful Extensibility. It has to have some. Now, it has two implications. One is that all of our automation has zero Graceful Extensibility because it's all designed saying this is the best way to do things. It's just going to do its thing, it doesn't recognize that risk of saturation. It doesn't monitor for risk of saturation. It doesn't invoke or interact with other parties in a way to gracefully extend performance.

We just published a paper where we did it with automation on an airplane case. Why don't we do it? Because nobody ever tries, we keep falling into the trap of the oversimplifications, and just the pursuit of optimality. You have to have nonzero Graceful Extensibility. The second constraint on that is you can't have enough, because of the nature of surprise, which you can't characterize specifically. It means that whatever you build up, given the interaction with pursuit of optimality, if I build up a lot to deal with all the kinds of surprises, I'm going to be too inefficient in the short run. I'm balancing that adaptive value between these two capabilities.

One of the things I need is the second set. I need a network of units that have adaptive capabilities to synchronize and coordinate. Remember, that's the second failure pattern. The second complementary success thing is we need to coordinate. If you look at the hospital systems, they've been doing a massive reconfiguration across multiple roles and levels, not just within a hospital system but across hospital systems and other aspects in the US in order to build a capacity to respond to a patient surge from COVID-19.

It's all about reconfiguring how you work together. It's in our studies of emergency departments that have been fundamental to resilience engineering for 20 years. It's in the studies of responses to beyond-surge events in emergency medicine from mass casualty events, et cetera. What that means is neighboring units can either constrict or extend the adaptive capacity of the units under stress. What we want is ones that expand that capacity, that help you when you're at risk of saturation, or when that risk is growing, and expand your capacity for maneuver. We don't want ones that constrict. You see people hoarding resources because they see relationships breaking down across supply chains and things. From a total level, that's exacerbating as shortages start moving around and more people defect from cooperative activity.

How do you build social solidarity so people cooperate? There's a cost to helping the other unit at risk of saturation. I'm devoting resources, I'm devoting energy. I'm not doing the things I'm supposed to count to look good because I got all these measurements that I'm supposed to hit. All these things are monitoring and counting them. I'm not working on them, I'm not doing my role. I'm helping somebody else's role.

David: One positive example at the moment is the health care workers from Poland who are jumping on planes and going to Italy to support the healthcare system over there.

Dave: That's quickly subset two of the fundamentals which sets us up for subset three that we've explained in part in terms of net adaptive value. You have to have a system that has both capabilities. Yes, they're dynamic capabilities. You need to monitor and understand that and make sure you appreciate where your resilient performance can come from. To do that requires a reframing. That is going from my system. I have a great plan. I've worked out a great way for us to work and I have a great track record that we're doing a great job. We've identified some threats and problems. We've made progress and I have data that says we're doing great.

What happens when people adapt to handle a surprise? If they adapt well, you don't see it. That's the law of fluency. Notice, it's a law. The law of fluency is that well-adapted activity hides the difficulties handled and the dilemmas resolved. If you do it well, nobody sees that there's a gap between the plans and the way things really work. They don't see the plans have limits. They don't see the kinds of surprises that are coming, what you need to have in order to figure out what to do in that surprising situation, and who to coordinate with to do it in a timely way.

It's hiding it so when things go badly, the people who develop the plan are responsible for the planning, automation, and procedures and say it didn't work. If you'd work the plan, if you'd work the role, if you'd work the rule, everything would have been okay. They come back and say, “Let's increase the pressure to work the rule, work the role, and work the plan.” Guess what that does?

Pressure turns out to be part of the fundamentals. You increase the pressure to work the role. You get more role retreat. What does that mean? You get less coordination across roles. You get less help when you're at risk of saturation. What do you get? You get less Graceful Extensibility and you get less help when your Graceful Extensibility gets exhausted. What did that doctor say today? I'm not getting any help. What's my responsibility when you don't help, when you don't support me, when I'm personally at risk and I'm supposed to be a responsible actor taking care of the patients? That's the commitment. That's my identity. That's my work. You put me in a fundamental dilemma and it's related to your lack of support.

We see this playing out across the world at different levels when you're late and stale at responding to the rolling outbreaks. You create this fundamental dilemma at the sharp end. The systems are inherently messy. It takes work to see them. SNAFU (situation normal, all fouled up) is constant. That's what you're hearing from the frontlines. That's what you're hearing from protective gear. That's what's happening because of the lack of testing. It means our standard effective epidemiological interventions aren't possible.

Situation normal, all fouled up. That's actually normal. Your belief that your plans work great is common and off. That turns out to be fundamental. It's not a weakness of yours, it's not a character flaw. It's not some indictment of you as an organization, a manager, or a human being. It turns out it has to be that way. The difference is do you spend effort to recalibrate.

You will be miscalibrated. You will think your model and your competence envelope is more competent than it really is. You will think the envelope is bigger than it really is. No matter what you do, that will be the case. Who's successful? Those who put energy in and sensitivity. Remember, the third form of breakdown, the third complementary success to that breakdown is proactive learning versus being slow and stale, stuck in old ways of doing things.

They are poised to adapt, they're poised to learn, they're poised to remodel how things are working and change the way things work in order to meet the new situation. They may draw on fundamentals that made that organization successful in the past but they have to be utilized and matched to the challenges of the world in new ways. That’s the irony here. It derives from one of the constraints that people miss; this is called perspective balance. It's related to the fact that any unit at any level we're talking about is local, it's in the world in some place. A constraint on being in the world in some place is a bound on your perspective. That's stated very simply.

The view from any single point of observation simultaneously reveals and obscures properties of the environment. It simultaneously reveals some. It's a good perspective; at the same time it obscures some. It's hard to see some things from that perspective. That's a hard constraint. In other words, nobody can stand outside this world. Nobody can escape from this constraint. You can't get it by putting in a drone at 1000 feet. You can't get it by having sensors on a satellite looking down; that's some perspective that simultaneously reveals and obscures. You can't have it by saying, “I've got the big view as an effective C suite.” You can't get the view only by being at the frontlines and seeing the details of what it takes to get the work really done, the problems, and messiness that really occurs there. All of them reveal something and obscure something.

The way biology solves this constraint is to shift and contrast perspectives. Biology has no omniscient point from which you can observe and have truth. No command point can exist that has the perfect perspective. Rather, it solves the problem by changing perspective and contrasting perspective. I'll give you a simple proof of concept which is the way you move through the world. The way you move through the world is not because you have the perfect perspective but because you pick up information from a changing perspective. If we took that away, you wouldn't be able to move smoothly through a cluttered environment. You just wouldn't be able to get around. That's how fundamental this is. It's one of these things that we do but to know about how we do it is really difficult.

David: In my interpretation, central to Graceful Extensibility is what you talk about as being the capacity for maneuver. Is it fair for me to say that Graceful Extensibility is the opposite of brittleness and also a boundary phenomenon? We worry about Graceful Extensibility when we're at the boundary of our performance template.

Dave: It's a boundary effect. The issue is two things are going on. The performance near the boundary is going on at some rate and at some tempo. It goes on. It's there. You don't assume it's zero and then occasionally happens to turn around the other way. It's going on all the time. What is going on all the time? What are the small ones? Is it over here or is it over there? It's going on and you have to find it. Remember the fluency law. You have to go find. At the same time, you operate far from the boundaries. You do them simultaneously in parallel. That's a big mental switch for everybody because they see this as one or the other mode of operation.

It turns out you don't transition very well from one to the other if you think of them as sequential and separate; you have to see them as parallel. It turns out it has to be this way in biology. You can do this badly too. You're not going to survive long if you do both badly; you need them both. Don't say we're just building up a reserve or building up buffers and extra supplies because those detract from the efficiency criteria. On the other hand, if your efficiency criteria run amok by themselves, you will inevitably eat the sources of resilient performance. You will inevitably reduce Graceful Extensibility, you will inevitably get more brittle.

What's the fundamental finding from all the modern system safety stuff when you cut through it all? It's very simple. Failure in complex systems is due to brittleness, not to component failure. That's it. Yes, components fail. Yes, we should make better components. Yes, we should have better subsystems. That's not how we get reliability, robustness, and resilient performance from organizations, systems, industries, and services we provide. We get it by coping with the potential for brittleness.

Building up and maintaining some capacity for maneuver. We can’t have a deployable capacity for maneuver that's big enough to handle everything. We have to be able to mobilize additional capacity. It's not going to be enough. We have to be able to generate. An example at the state level in the US: as the rolling outbreak heads toward your state, what are you trying to do? You're going, "I need to generate new capability. Mobilize it all so I can move it to places that need it, depending on where the outbreak is in my state. The surge is challenging delivering medical care to seriously ill patients." I have to do that while still taking care of the patients who don't have COVID-19, who still have, despite the reduction, serious health problems that can't be deferred. I have to do it all in new ways. That's three kinds of stressors on the system, interacting with new people, in new ways. It's a novel situation.

I need to get that deployed at the frontline. It has to get to the bedside, to be in the ICU. It has to reduce the chances of healthcare providers getting sick themselves, adding more burden to the system, and taking away the capacity to care for patients. Early in this, we were getting estimates that 8% of the healthcare workers are getting COVID-19 seriously. The last one I saw today was 14%. There's some speculation that this happened because of the intensity of their exposure to the virus, that they're getting sicker than the average person in the general population. We don't know these things for sure. It reflects the uncertainty of the situation.

We do know, for sure, that if we can't reduce the infection rate of the healthcare workers, we undermine our ability to provide for the sick patients who need care. Therefore, we will have a higher excessive death toll. By the way, if you look at the stuff I'm putting out, my current range of estimate is 4x to over 10x. When we've calculated excessive deaths down the road, that number could come out to be 10x.

The things you do make a difference that could reduce fatalities by a factor of 10, or doing the wrong things could increase fatalities by a factor of 10. Maybe it's only 4x, maybe you'll do okay, maybe it's even less. The difference between the best places and your place could be very large. How are you going to deal with that when this is all over? We kind of have a score. There could be a bunch of places that look pretty damn bad.

David: I think towards the end of the theory you laid out some architectural safeguards or some things that enable this capacity for maneuver when you're at the boundary. I really like those. I think it's not the normal way of organizations generally functioning. I think there's three things there: empowering decentralized initiative, like pushing decision making local, rewarding reciprocity, and creating a greater means to synchronize activities. They're the type of capacities that you think organizations really need to be deploying.

It's kind of interesting that every organization around the world, at least some part of their organization, is severely stretched.

Dave: Yeah. We can add a couple more to that list. When people say, "Okay, I got the concepts. What do I do? What do I start with?" The general idea is you've got to support the initiatives—governing the expression of initiative. Does that mean everybody gets to make it up as they go? Without initiative, you can't have graceful extensibility.

If your initiative leads to a lack of coordination across roles and levels, then you end up with fragmentation. The organization has to think about where and how we support initiative, because it can't run amok. This happens in military and military history. If you actually looked at who wins and how they win and whatever, it actually turns out that supporting that initiative, yet embedding that initiative in a collaborative network of coordination, that combination is what's critical.

That's how you set up, control, and manage the expression of initiative. Without initiative, there's no graceful extensibility, you're maximizing brittleness. You're maximizing the chances and the challenge that your performance will collapse.

Reciprocity, that's why the quote today came up and was noticed by a bunch of us simultaneously today. It's been much of what we've been pushing in terms of the response to this, whether we use the word reciprocity or not, or the basketball “help.” Team defense is all about the help. Reciprocity. What's interesting is reciprocity is an old finding in social science. It's often talked about as altruism. In Elinor Ostrom’s work, it gets a little more specific and relates to what is called Polycentric Governance. Some of these are architectural issues about how multiple jurisdictions, places of partial authority, and autonomy have to interact and coordinate.

Reciprocity, in the theory, gets a more actionable definition. Reciprocity is specifically about your ability to see that another interdependent unit, role, or level is under stress and at risk of saturation, running out of capacity for maneuver. You will take action to extend it. The second set of fundamentals is that you will expand and extend their ability to perform despite the increasing stress and load on them.

In the theory, it's that second subset. It's absolutely fundamental. You can't do it by yourself. No matter how you build for graceful extensibility, you can't have enough. If you do have enough, it won't last long because of the efficiency pressures. One crisis is visible and frequent enough. You have to build this set of reciprocity in advance. You can't build it in the moment. You can't build your management of expression of initiative, and how you reconfigure roles and relationships to support initiative embedded in a larger coordinative network, on the spot. You have to do it in advance.

You have to build up these relationships. You have to build up the ability that the measures you use on performance are not so narrow that you are actually and inadvertently driving out reciprocity, driving out initiative, or driving it underground and wasting it. It won't work as effectively. It won't be as easy to coordinate and won't share the information as much because they're an underground system that goes on to make your organization's service work under regular pressures in the regular kinds of [...] that show up.

David: David, if we go three to four weeks from today, what can you see? What can you anticipate is going to be going on with the healthcare sector and the state of this virus based on what you've seen in their readiness to respond right now?

Dave: Unfortunately, the evidence, remember the unique characteristic of what you're getting at is we are in the middle. We are actors and stakeholders in the middle of things we study. We are seeing the patterns of resilience and brittleness play out. We're seeing graceful extensibility trying to be created ad hoc, late in the process, and not very effectively. We're seeing decompensation at hospital and state or jurisdiction (metropolitan jurisdiction) levels in all areas, and we're seeing the opposite.

In some sense, part of this is what we always see in disasters. We see the best of people and sometimes the worst of people. Those go on together as we see people exploit the changes, the challenges, and the reconfiguration for completely other purposes that have nothing to do with the threat to life that the virus represents.

What matters a lot is how early you anticipate and build a readiness to respond versus doing it late. The prediction, not in four weeks but when we can actually figure it out, is that the ones that respond late get slammed at the hospital level, where severely ill people outweigh their ability to care. It's looking pretty slammed right now. They'll have higher excessive death rates when this is all over. That's in two years.

It's a rolling outbreak. Places that are late are already late. Now, there's some other jurisdictions that are going to see this later. They're going to have a larger capacity as an entity. Some of these might be rural areas which don't normally have central administrations. They're much more about local institutions. Will they be able to coordinate, synchronize their activities, and learn from the other areas? Or will they think that's not about me? Distance themselves, I'm not an urban area. The kinds of bad things may come in to give them the model they can be. They will not be very affected by this. Or they'll just be late in it and something will happen with vaccines or treatments or something. It'll be okay for us because we're late in the cycle.

The pressures of being late create more dilemmas that have higher cost than results. We're seeing the impact of that already. We don't have to project out. We'll see the people defecting. You see them already because I don't like the consequences for me. I'm willing to take the risk even though I'm adding risk for others in terms of fatalities. Excessive death counts eventually because in the short run, for me, it's easier. We'll see this defection from a collaborative action. Efforts to build solidarity are very, very important. Jurisdictions and layers of society that operate outside, directly feeding the hospital's capability, were reducing transmission rates, and needed to build solidarity in society. We're all pulled into the same direction.

You want to get economic recovery and restoration? The fastest way and the best way to do it is to break the transmission so cases aren't going severely ill, and to generate the capacity to treat all the people who are severely ill as well as all the other people who still need healthcare.

We're going to put out in the next few days criteria that need to be met if we're going to resolve the outbreaks and start to relax some of the restrictions. How do we start to move forward and bounce forward? One of the common mistakes about resilience is that we bounce back to where we were, that we return to the old equilibrium. It's a dynamic thing, equilibrium.

Equilibrium models are simplifications of these processes of the adaptive systems. You're not bouncing back. You're not restoring. You're not going back to where you were. You will be different. We will all be different. How different? In what way? What research says is we will draw on our past. We will draw on what worked. We will draw on images of our past. Our identities about our past to define a post-COVID reality. I think it will make a difference in terms of access to healthcare. This is a kind of situation where somebody's slice of society’s lack of access to healthcare puts everybody in that society at risk.

In fact, it's part of our restoration criteria that if you want to start rolling back restrictions, it's kind of hard to see how that's going to happen if you can't test and treat, or isolate and treat people who have the virus. You've got to see this in a society. The places that have done well do that. They test everybody, they test extensively, they trace contacts, they isolate. As the symptoms get worse and you're more ill, they have the capacity to treat. That's what's going on in Germany.

Germany is working, not because they tested the earliest, but because they're testing the most thoroughly. They are testing, tracking contacts, and isolating very, very thoroughly and effectively. That's why transmission has been low. They've also had more capacity than some of the other European countries in terms of their medical system, and were able to stand up additional capacity, so they have not been overloaded at this stage because they acted in advance of the surge. Death rates are very low in Germany. That's a relatively recent example in the rolling outbreaks. We could look at Singapore and we could look at Taiwan. We could look at South Korea. The stories are slightly different in each place. The size of these jurisdictions is different from the US or the European continent.

What do I see coming? I see a bunch of places that are going to get slammed. I think some places will get lucky. Taking early action will be sufficient to keep things within the range of their healthcare system's capabilities. What's interesting is how we advise organizations to act at the right time. We can have all the explanations, but those explanations have to help them be decisive at the right time. A lot of this is about timing. At the time you need to be decisive, it's not going to be obvious that it's necessary. It's easy for people to delay until they think it's absolutely necessary, like Florida, which still hasn't completely come to grips with the necessity of action. You fear for those jurisdictions that they will have a rough time ahead.

One important point to make about this situation: remember, the theory is grounded in an expanded view of the control of dynamical systems. In dynamical systems, saturation and lag are the two things that kill performance. I'm basically treating lag as another form of saturation in this theory. The COVID virus is perfectly poised to be hard to counteract because of the long time constants, the lag in the system. The fact that people can have the disease and not have symptoms, stay asymptomatic but be highly transmissible, makes this very difficult.

The fact that it takes roughly 14 days to cover the full distribution of onset of symptoms is a problem. So is recognizing that if you need hospitalization, it's probably going to take 20 days before you leave. One way or another, it's going to be 20 days. If you take up an ICU bed, you're in there for 12 or 14 days. We can saturate the resources very quickly because of the way these cases accumulate given the time delays, the timing of this stuff. What does it illustrate? Let me come back to the simple story at the beginning: resilience is a verb. It's about verbs. It's about the adverbs modifying the verbs. It's about how to know when to be fast and decisive, and when to be slower and thorough.
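The accumulation effect described here can be illustrated with a small sketch. Only the roughly 14-day ICU stay comes from the discussion; the admission rate, the bed count, the 3-day comparison case, and the function names are hypothetical and chosen purely for illustration.

```python
def icu_occupancy(daily_admissions, stay_days):
    """Beds occupied on each day if every admission stays `stay_days` days."""
    occupancy = []
    for day in range(len(daily_admissions)):
        # Patients admitted within the last `stay_days` days are still in a bed.
        start = max(0, day - stay_days + 1)
        occupancy.append(sum(daily_admissions[start:day + 1]))
    return occupancy


def first_saturated_day(occupancy, capacity):
    """First day (0-indexed) on which occupancy exceeds capacity, else None."""
    for day, beds in enumerate(occupancy):
        if beds > capacity:
            return day
    return None


# Hypothetical scenario: 8 ICU admissions per day for 30 days, 100 beds.
admissions = [8] * 30
capacity = 100

short_stay = icu_occupancy(admissions, stay_days=3)   # assumed comparison case
long_stay = icu_occupancy(admissions, stay_days=14)   # stay length from the talk

print(max(short_stay), first_saturated_day(short_stay, capacity))
print(max(long_stay), first_saturated_day(long_stay, capacity))
```

With the same admission rate, the short stay levels off well below capacity, while the 14-day stay keeps accumulating patients until the beds run out. The time delay alone, not a rise in arrivals, is what saturates the resource in this toy case.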

In a crisis like this, you don't get to be one or the other, you have to be both. That's what's going on. For example, in developing treatments and vaccines, you have to be fast. But if you give up being thorough, you're going to create new forms of harm, avoidable harm, because you stopped being thorough. In some of the success stories we look at, we see how organizations reconfigure, push initiative down, build reciprocity, and realign coordination between the top layers of the organization and the frontlines, in order to be highly responsive and still have new ways to be thorough. What do they sacrifice? They sacrifice the normal procedures and ways of doing things, and the controls and goals those procedures supported.

They spend more resources. They spend a lot of resources. They don't worry about resource expenditure. What matters is, "Can I get this stuff? Do I have this stuff to take the actions that are going to be effective at the right time? Have I checked that the action I take is not having negative side effects that undermine or offset the benefits of the action?"

We see new forms of horizontal coordination like I described in the healthcare system. Those have to be supported so that all of a sudden, people are talking directly to each other. When somebody says, "This has been good for me. We're going to go do this," and somebody else goes, "Whoa! Whoa! Whoa! There's another way to do that. That's going to screw us. That's going to put this at risk. We don't want to do that." They go, "Oh, yeah. There's another way we can do that." "Oh, good. That one doesn't put me at risk or my people at risk. Let's do that one."

Often, their second choice may have fewer consequences for me, or fit with my first or second choice, in a way that we both get goal satisfaction and risk reduction in handling the new situation.

David: All right, thank you so much for your time. I really appreciate you doing that. I was in a study group the other day that was facilitated by [...]. We met and hopefully, now, I can send this off to people who are involved in that but didn't quite get to the bottom of all the ideas. Thank you so much for your time.

Dave: There's a lot of ideas in there. A lot of them are challenging, all the way from the idea that we all think we have a command system, when it turns out biology won't let you have a command system that works and that's adaptive enough, to the fact that your models will be wrong, and that's okay. Your models will go wrong. The world will find ways. In fact, one of the biggest reasons your models, your procedures, automation, whatever, will go wrong is because they're successful. When they're successful, it triggers more forms of adaptation. Those forms of adaptation will produce new kinds of challenges, new dragons of surprise, that will test the limits of what worked, what you approved, and what you're successful at.

Then there's the idea that you have to keep on learning and that you will be miscalibrated. Successful organizations put effort into testing: is my model of the world still the way the world works? Or is something different going on and I need to change it? It's okay to change. Don't get stuck in a stale situation because it used to work.

David: Before we wrap up, I thought it might help to briefly share my thoughts on that discussion with David. Graceful extensibility is a dynamic capability. It's really hard to have a set formula for how to build it into each individual organization.

Here are my five practical takeaways from this conversation. Number one, all work involves continual change and finite resources. Therefore, uncertainty is never zero and risk is never zero. This is a really important overall conclusion, overall finding. We really need to be wary of static models of work and risk: method statements, risk registers, procedures. They are only ever a point in time and only ever a partial representation of work as done. The more we focus on these rigid, prescriptive, and static models of work, the more vulnerable we are to surprise.

Number two, all teams have a base adaptive capacity to adapt their day to day work. Think of things like the weather, unexpected situations, equipment failure. All of this day to day problem solving is a demonstration of adaptive capacity in action. Humans are great at it. We need to support this in our organizations because this is how work gets done.

Number three, when teams and organizations are at their limit of this local adaptive capacity, they need to be able to extend to avoid catastrophic failure. This is different to their day to day adaptive capacity. What they need to do is they need to be able to invoke new resources, new decision making processes, new compromises around different types of goals and new relationships. They need to be able to draw in from areas outside of their own immediate team.

We see this quite practically in things like crisis management teams, which many organizations will be standing up right now in response to the COVID-19 virus. This lets your organization function in a very different way to the way it normally functions. You need to be able to know when your teams and organizations are at the boundary of their performance and their resources. You need to let go of your old models of work and decision making to do this.

This is not just redundancy and contingency planning. This is adapting the way that your organization and your teams function to prevent them from catastrophically failing at the boundary of their resources.

Number four, capacity for maneuver. This is what David talked about. This is what helps. This is what you can build into the organization before your resources and teams become compromised, or as David said, saturated. There are three important factors here. First, decentralize the decision making. Teams and organizations will always be more responsive, and will always be better able to keep pace with the changing nature of the context for their work, if they decentralize the decision making. Let the decisions be made by the people who are closest to the information, to the point of risk, and to the real time changing nature of work.

Second, reward reciprocity. Teams need support in trying circumstances on the edge of their performance envelope. Neighboring teams, and those vertically up the organization, can either constrict them or extend them. Most important are the horizontal relationships. You need to think about this in the context of an organization that runs multiple sites. What are you doing to encourage relationships, sharing, leveraging, and mutual support arrangements between different sites, so that the site manager of one site can straightaway call a site manager at another site to draw in resources, advice, and support?

Organizations that do this well, that maintain these really strong horizontal relationships between like teams across their organization, are going to be able to extend their capacity when they need to. Third, you have to have a means to synchronize your activities. You need to make sure that you can quickly pass information, communication, and decision making across the different parts of the organization. You need to be practising this in advance. You need to be connecting your organization well during normal work so that it knows how to maintain those connections when it's tested at the boundaries.

Number five, this is really tough and really interesting. The more you focus on efficiency, the more fragile you become to unexpected surprises. That means the more optimal you are for the current environment (the faster, the better, the cheaper), the more efficient you are for what you're dealing with today, the more likely it is that you're not going to be able to face the things that you might run into tomorrow. If your efficiency criteria, as David said, run rampant, you'll eat into your graceful extensibility.

Listeners who are safety professionals in their organizations need to be a voice for maintaining some level of flexibility and resources on the frontline of their business. A great example is in general aviation, where people want to move from two pilots to one pilot as a push for efficiency. It's going to compromise your capacity for maneuver when you're faced with a testing situation at the boundary of your resources and your performance.

I also liked the way that David says, as a bonus takeaway after those five, that teams and organizations are always overconfident. They always think they can handle more than they actually can. They always think that the safety margin is bigger, and the boundary further away, than it actually is.

That's it for this week. I hope you found this episode thought-provoking and ultimately useful in shaping the safety of work in your own organization. If you're enjoying this podcast, please leave us a review. It will help others to find it. Send any feedback, questions, or ideas for future episodes directly to us at feedback@safetyofwork.com.