The Safety of Work

Ep. 103 Should we be happy when our people speak out about safety?

Episode Summary

In this episode, we’ll discuss the paper entitled “Repentance as Rebuke: Betrayal and Moral Injury in Safety Engineering” by Sidney W. A. Dekker, Mark D. Layson, and David D. Woods. The paper is part of a series published in the journal Science and Engineering Ethics, volume 28, in 2022.

Episode Notes

In concert with the paper, we’ll focus on two major separate but related Boeing 737 accidents: 

  1. Lion Air Flight 610 in October 2018 - The plane took off from Jakarta and crashed 13 minutes later, with the highest death toll of any 737 crash: 189 souls.
  2. Ethiopian Airlines Flight 302 in March 2019 - This plane took off from Addis Ababa and crashed six minutes after takeoff, killing 157.

 

The paper’s abstract reads:

Following other contributions about the MAX accidents to this journal, this paper explores the role of betrayal and moral injury in safety engineering related to the U.S. federal regulator’s role in approving the Boeing 737MAX—a plane involved in two crashes that together killed 346 people. It discusses the tension between humility and hubris when engineers are faced with complex systems that create ambiguity, uncertain judgements, and equivocal test results from unstructured situations. It considers the relationship between moral injury, principled outrage and rebuke when the technology ends up involved in disasters. It examines the corporate backdrop against which calls for enhanced employee voice are typically made, and argues that when engineers need to rely on various protections and moral inducements to ‘speak up,’ then the ethical essence of engineering—skepticism, testing, checking, and questioning—has already failed.

 


Quotes:

“When you develop a new system for an aircraft, one of the first safety things you do is you classify them according to their criticality.” - Drew

“Just like we tend to blame accidents on human error, there’s a tendency to push ethics down to that front line.” - Drew

“There’s this lasting psychological/biological behavioral, social or even spiritual impact of either perpetrating, or failing to prevent, or bearing witness to, these acts that transgress our deeply held moral beliefs and expectations.” - David

“Engineers are sort of taught to think in these binaries, instead of complex tradeoffs, particularly when it comes to ethics.” - Drew

“Whenever you have this whistleblower protection, you’re admitting that whistleblowers are vulnerable.” - Drew

“Engineers see themselves as belonging to a company, not to a profession, when they’re working.” - Drew

 

Resources:

Link to the paper

The Safety of Work Podcast

The Safety of Work on LinkedIn

feedback@safetyofwork.com

Episode Transcription

Drew: You're listening to The Safety of Work podcast, episode 103. Today, we're asking the question, should we be happy when our people speak out about safety? Let's get started. 

Hey, everybody. My name is Drew Rae. I'm here with David Provan, and we're from the Safety Science Innovation Lab at Griffith University in Australia.

Welcome to The Safety of Work podcast. In each episode, we ask an important question in relation to the safety of work or the work of safety, and we have a look at the evidence surrounding it.

David, what paper do we have this week?

David: We'll introduce it in a little while, but first, I guess, an opportunity for our listeners, and one that I use myself. Many of our listeners will hopefully be familiar with Google Scholar. What Google Scholar allows is for authors to create profiles, and then you can actually follow those authors.

I got a hit in my inbox this week twice because I follow two of the authors involved in this paper. I thought since it's a paper that was only published last month that it might be a good opportunity to talk about it. We're going to look at engineering, ethics, broken organizations, and what all that means. We're going to do that through the lens of the Boeing MCAS accidents and everything that's been written around them over the last couple of years.

Drew, maybe before we introduce the paper, do you want to give a little bit of background to our listeners around these accidents themselves and the MCAS system? Most will be across it at some level but maybe not in great detail.

Drew: Sure, David. I never quite know how much people know about accidents. It's like when you get really geeky about a subject and you don't know whether it's in the public domain or not.

I actually ended up writing a little almost DisasterCast episode by way of introduction to this episode. We're talking here about two separate but closely related accidents, Lion Air Flight 610 in October 2018 and Ethiopian Airlines Flight 302 in 2019.

The Lion Air flight basically took off from Jakarta. Thirteen minutes later, it crashed into the Java Sea. It was the highest death toll of any 737 accident ever, dating right back to when the 737 was first introduced in the 1960s. On that particular plane, there had been issues on the flight beforehand, so immediately, attention went to these malfunctioning sensors. In particular, the angle of attack sensor, which basically tells you whether your plane's nose is tipping up or tipping down.

Another system, which that sensor feeds into and which we'll talk about in a moment, is called MCAS. Basically, David, my memory is that pretty much straight after the accident, people were just blaming the pilots. Even during the investigation, when they found these failures, people were still thinking, oh, it's an Indonesian airline, pilot training is not so good in Indonesia, they've stuffed up.

Then, a very similar accident happened not that far later. They were still investigating the Lion Air Flight which happened in October when the Ethiopian Airlines Flight happened in March. This one left Addis Ababa and crashed six minutes after takeoff. When they looked at the wreckage, one of the most immediate things that they found was that the horizontal stabilizer was screwed right into the maximum position possible. It would have been pushing the plane into a dive, and in fact, the plane dove into the ground.

We're not going to talk in great detail about these accidents and the investigation, but the important thing you need to know is that we've got this single sensor in both planes that malfunctioned, the angle of attack sensor. It feeds into a system called MCAS, which relies on this single sensor. As a result of the faulty readings, MCAS makes incorrect adjustments that push the plane's nose down.

Most of what the investigations have said is that it isn't automatic that the plane's going to crash, but the plane is headed toward the ground, and it creates a really, really high workload for the pilots. They have to do exactly the right things in exactly the right order, and they've got no guidance from the systems as to what that order is. In both cases, the pilots ran out of time and height, and the planes hit the ground before they could recover.

David, is there anything else you want to say about the accident, or shall we talk a bit about MCAS?

David: Let's talk about the MCAS system itself because that feeds well into the paper around engineering and safety engineering particularly.

Drew: To understand MCAS, you need to understand the Boeing 737. I'd be pretty confident to say that most, if not all, of our listeners have at multiple times in their life flown on a 737. It's the most common aircraft in the world and it's been around since 1967. I looked this up, David. It's as old as the original series of Star Trek.

David: Excellent, Drew. I've never watched Star Trek, so I'll take your word for it.

Drew: We've got to start a whole new podcast, Drew Makes David Watch Star Trek from Start to Finish.

David: All right, I'm up for it.

Drew: David, what you need to know about Star Trek is that there are lots of different flavors of Star Trek going from the very original to the next generation, and then they just keep reinventing on the same formula. This is what they did at Boeing. We've got Boeing Original Series, Boeing Classic, Boeing Next Gen, and Boeing MAX.

When they are up to the MAX, basically, they're trying to keep the plane the same from the point of view of the pilots, but they also want to upgrade it, so it's got brand-new, bigger engines and better fuel efficiency. The big new engines give the plane a different balance. They add in an electronic system to make the plane handle the same way as the previous plane.

What MCAS does is it just tweaks the plane so that it feels the same to the pilots, so you don't need to train the pilots to handle a different plane, but in both accidents, instead of making these small tweaks, MCAS responded to the malfunctioning sensor and pushed the plane's nose right down.

David, I will go quickly through the backstory. How do you get to the point when you've got a system that just needs a single unreliable sensor to make that system do something dangerous?

When you develop a new system for an aircraft, one of the first safety things you do is classify them according to their criticality. If it's very new or very dangerous, it gets a higher classification. Higher classification means more scrutiny, more safety analysis, the regulator takes a closer look at it, and you've got to submit more documents. If it's something that is relevant to the pilots, you've got to do more training for the pilots on how to handle that system.
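To make the classification idea concrete, here is a minimal Python sketch: the worst credible effect of a failure, not the system's intended function, drives how much scrutiny and training a change attracts. The level names, activities, and mapping below are invented for illustration, not the actual certification scheme.

```python
# Hypothetical sketch: classification is driven by the worst credible effect
# of a failure, and the classification drives the scrutiny. Level names and
# required activities are invented for illustration, not the real scheme.

REQUIRED_ACTIVITIES = {
    "minor":        ["internal design review"],
    "major":        ["internal design review", "safety analysis"],
    "hazardous":    ["internal design review", "safety analysis",
                     "regulator review", "pilot briefing material"],
    "catastrophic": ["internal design review", "safety analysis",
                     "regulator review", "redundant sensor inputs",
                     "simulator-based pilot training"],
}

def required_activities(worst_credible_failure_effect: str) -> list[str]:
    """Classify on what the system can do when it fails,
    not on what it is meant to do in normal operation."""
    return REQUIRED_ACTIVITIES[worst_credible_failure_effect]

# A 'feel-smoothing' function sounds minor, but if its failure can command
# sustained nose-down trim, the worst credible effect is what counts.
print(required_activities("hazardous"))
```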

There's a big incentive not to over-classify things, because every time you classify something as critical, it incurs all of these extra costs. Some of those costs go directly to Boeing. Some of them go to the customers, either because the aircraft is more expensive or because they've got to do more training.

In the case of the MAX, they would have had to buy brand-new simulators, train people to operate the simulators, and then train the pilots using the simulators. Boeing was trying to sell into markets where they don't have money for simulators and lots of pilot training.

Boeing decided that because MCAS is really just there to help the feel of the plane and it's not really to actually help recover from stalls or anything like that, it's not a critical system. This overlooks the fact that if MCAS fails, suddenly, it does become a critical system because a failed MCAS can do really dangerous things.

You're supposed to classify systems not on what they're supposed to do but on what they can do if they go wrong. You've got to be really careful when you try to judge, in hindsight, what people should have classified something as because, of course, we've got the information that it fails dangerously. The engineers who were designing it didn't have that information.

I think it's fairly noncontroversial to say that given what MCAS was capable of doing, it should have been given a more critical classification. What's a bit more controversial and speculative is what would have happened if it had gotten that classification.

You should be really careful when people say, oh, they should have done a risk assessment. If they'd done the risk assessment, they would have found the problem, they would have fixed the problem, and the accident wouldn't have happened. We know that most risk assessments aren't capable of doing that, but this is the argument that was made.

In this episode, we're going to be talking about a whistleblower. The whistleblower clearly believes that if they had done the extra analysis or the extra scrutiny, then they would have identified and fixed MCAS. Specifically, the claim is that in a system that critical, people would have looked at the single sensor and would have said no, you can't have a single sensor feeding into a system that is dangerous. We've got to have multiple sensors.
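A minimal sketch of that redundancy argument, with invented readings and a hypothetical three-channel voting arrangement (no claim is being made here about the real aircraft architecture): a control function fed by a single sensor inherits every fault of that sensor, while a voted reading tolerates one bad channel.

```python
# Illustrative sketch of the redundancy argument, with made-up numbers.
# A single-sensor feed passes a faulty value straight to the control law;
# a median vote across three channels tolerates any one bad sensor.
from statistics import median

def single_sensor_feed(aoa_readings: list[float]) -> float:
    # Trust whichever sensor the function happens to be wired to.
    return aoa_readings[0]

def voted_feed(aoa_readings: list[float]) -> float:
    # Median of three: one wildly wrong channel cannot dominate.
    return median(aoa_readings)

# Two healthy angle-of-attack readings around 2 degrees, one stuck at nonsense.
readings = [74.5, 2.1, 1.9]

print(single_sensor_feed(readings))  # 74.5 -> the control law sees a false stall
print(voted_feed(readings))          # 2.1  -> the faulty channel is outvoted
```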

They would have also looked at the pilot training and said, look, you've got a system like this. Pilots need to be given specific training on how to deal with it failing instead of just being given press releases on how to deal with a failed MCAS.

That's pretty much it for background and what you need to know about MCAS accidents. David, do you want to talk a little bit about the whistleblower?

David: I'm going to get you to do that, Drew. You went and looked at the main source article. Then, we'll introduce the paper and get into it.

Drew: What this article is going to be talking about is a particular engineer who came forward publicly, pretty much after a lot of the main investigation. This guy is called Joe Jacobsen. He worked for the Federal Aviation Administration (FAA) and previously worked for Boeing, but he wasn't actually involved in the MCAS system, so he didn't design it or regulate it. But he did get involved afterward in some of the FAA responses. 

What happened is when he was retiring, he didn't just quietly retire. He made public a letter that he wrote to the family of one of the victims, shopped it around to newspapers, and then went and gave interviews to lots of newspapers, which is a very atypical thing for an engineer to do. 

It's not an apology letter. He doesn't accept any personal responsibility for what happened, although he does wish that he had been more aggressive early on in insisting that he be involved. 

The letter doesn't really reveal any new details. It's really just his thoughts about how Boeing and the FAA should respond going forward. I think it's this letter that particularly prompted the attention of the authors of the paper we're looking at today, because it was a fairly unusual thing to do. It was written with very, very moral overtones. His newfound faith in Christianity is a large part of the letter. He draws a direct connection between that and the act of whistleblowing.

David: I'll introduce the paper now. We've held out for a little while. The authors of this paper are Sidney Dekker, Mark Layson, and David Woods. We've covered papers by both Sidney and David before. In relation to this article, Sidney is very interested in the second victim, the moral aspects of safety management, and also major disasters. 

David Woods, with his cognitive systems engineering background, has always been very interested in automation. Particularly these days, he's involved in very complex technology systems and has written lots of papers on how to make automation a team player and how to manage complex technologies and automation. 

And Mark Layson, I don't know whether he's still a Ph.D. candidate at Charles Sturt University or has already graduated from his Ph.D. If you have, Mark, congratulations. His Ph.D. was all about investigating the harm that arises from traumatic experiences. Mark has been a police officer, he's a minister, and he's been a chaplain in an ambulance service. He's really looking at ways that we think about healing moral injuries and moral distress in people. 

I guess from his perspective on this paper, Joe Jacobsen felt quite a moral injury. The FAA employees and many of the Boeing employees probably felt quite a moral injury as a result of the incidents themselves and their own actions or potential inactions. 

The paper is titled Repentance as Rebuke: Betrayal and Moral Injury in Safety Engineering. It was published in October 2022 in the journal Science and Engineering Ethics. 

I'm not that familiar with this journal, but it sounds like, from at least the citations inside this paper, that this journal publishes heavily on the ethics of engineering.

Drew: Yes. This is part of basically a series of articles appearing in that journal about the 737 MAX situation.

David: Drew, to kick us off on this repentance aspect of the title of the paper: we've had a discussion about the ethical implications of engineering for a long time. There's a famous quote from the Space Shuttle era about engineers needing to take their engineering hat off and put their management hat on, about balancing engineering needs against organizational and commercial needs. 

We know this big debate about autonomous vehicles now, the ethical implications of the coding in the software used for autonomous vehicles, and obviously, these more recent Boeing 737 incidents. 

The key call in lots of these incidents and issues is for more moral courage: this idea that what we need is engineers who have greater courage to act on their own moral convictions and make sure they strictly adhere to their code of ethics, that we need to strengthen the voice of engineers within large organizations, and that we need to make sure engineers are always able to do their best engineering work.

Drew: You sometimes see this expressed as calls for this to feed into the training of engineers. Engineers should be trained in ethics and major accidents, and organizations should have structures that enable people to speak up and have whistleblowing policies.

David: We're seeing this quite a lot, Drew. Even in the safety profession itself, I was involved in the Australian Institute of Health and Safety publishing a chapter on ethics for safety professionals in The OHS Body of Knowledge. It's this idea that actually when we've got professionals inside organizations, they need to be able to speak up on matters in relation to their profession and be heard by the organization. 

There's an early comment in this paper that if we need to focus so heavily on moral courage just for professionals in organizations to do their job, then there is a major moral and systemic failure within those organizations, and the remedy for that failure can't sustainably or reliably be having more frontline moral heroes. Our organizations aren't going to be successful in the long term if they're broken enough to require heroes on the frontline. 

Any other thoughts around that argument? I know you've been heavily involved in systems assurance, engineering, risk management, and research of all of that.

Drew: Nothing specifically, David. We'll get into this a little bit later. There's this curious framing of ethics that people tend to only talk about ethics when it goes badly wrong. Just like we tend to blame accidents on human error, there's a tendency to push ethics down to that front line and treat ethics as a last-ditch defense rather than recognizing the way it permeates our whole approach to things like how we regulate high-hazard industries. 

There are ethical things in the approach in aviation of self-regulation like deciding that mostly, it's up to the companies to manage safety and it's up to the regulators to make sure that the companies are doing their job properly. 

That's an ethical stance. You might agree or disagree with whether it's the right ethical approach, but there are ethics tightly bound in those choices. The idea that we want these to be private companies that are accountable to shareholders and hold shareholders as their primary stakeholders is an ethical decision.

David: There are contradictory professional ethics at play here as well that we'll also talk about when we talk about the organization's objectives because as a professional, you'll also want to be ethically motivated to support the organization to achieve its objectives. Obviously, there are engineering objectives and also cost, sales, and other objectives for the organization, so ethics isn't as simple as just saying that the ethical code here is about things always being safe.

Drew: There's something I really want to throw in here. This is not actually in the paper. I think it's a failing of the paper because the authors of the paper aren't engineers themselves. They might never have actually had to sign up to the engineers' code of ethics. 

One of the really interesting things about how ethics are framed for engineers is that it basically works like the Laws of Robotics, which is another pop culture reference I'll have to explain to you some time. 

Basically, you pass each threshold in turn. Your number one duty is to the safety and well-being of the public. The idea is that once you've passed that threshold, your responsibility is to your employer. It treats safety and welfare almost as a binary yes-no thing. So long as you're not doing something actively unethical, then your next responsibility is to the employer. 

Your engineering ethics doesn't necessarily help you with this complex mix between what is in the interest of the public and what is in the interest of your employer. As those two things become intertwined, how much money do you spend on safety? How many people do you appoint for safety? How strictly do you classify something, inviting greater regulatory scrutiny and slowing down the project? 

Engineers are taught to think in these binaries instead of complex trade-offs particularly when it comes to ethics.

David: It's a good point, Drew. This paper then goes into a section on moral injury and defines it. It's this idea of an individual's ethical framework being broadsided or broken by their own actions or the actions of others. 

This term has been around since the 1990s, and it links heavily to the betrayal of what is right. If we feel like something that we've done, or something someone else has done, is a betrayal of what is right in a particular high-stakes situation, in this case the design and approval of aircraft, then these moral injuries can be triggered by either self-accusation or someone else's accusation of us. 

This has been described before; Sidney has previously written a book titled Second Victim. There's this lasting psychological, biological, behavioral, social, or even spiritual impact of either perpetrating, failing to prevent, or bearing witness to these acts that transgress our deeply held moral beliefs and expectations. 

What we're saying here is that in this situation, knowing what we know after these incidents, the engineers within both Boeing and the FAA must have felt this double injury: from the actions of others as well as from their own actions or inactions. 

The paper then goes into some quite deep philosophy and history here. I assume that's been driven by the contributions of Sidney Dekker, with betrayal as this central theme, a ubiquitous human experience over the ages. Even some of the biblical stories and the original stories that we tell go back to someone's betrayal of someone else, or someone having done something they shouldn't have done, or having failed to prevent something. 

Drew, this is not uncommon in safety either. I think this safety engineering reference is prominent, particularly in relation to NASA. Do you want to give some reflection on the Challenger incident specifically?

Drew: One of the things that we see a lot after a major accident is people stepping outside of their defined roles, where they wouldn't normally be permitted to speak by their organization. I don't think whistleblower is quite the right term, because they're usually not revealing new information. 

They just decide to perform these public acts of either self-accusation or accusation of the company they were part of, as a way of trying to make up for the moral injury. Either they feel that they have let people down and need to repent, or they feel that their company has injured people and they have been implicated, so they feel the need to speak out against the company's betrayal of them. 

One of the big examples is the engineers who were involved in the Challenger launch decision meetings. I don't want to go too deeply into this because I think their actual role in the lead-up to the accident is quite mixed. There is a bit of self-justification in some of the ways that they have tried to portray it afterward, but it is certainly the case that from their point of view, they tried and failed to stop the launch.

Roger Boisjoly, in particular, spoke out a lot about the Challenger accident. Another one is the engineer who was in charge of the Hyatt Regency project, which led to the Hyatt Regency walkway collapse. He basically spoke out and said, look, this is on me.

He was overseeing the project. He didn't actually make the drafting error or any of the design errors, but he basically said, projects are supposed to have an engineer in charge to stop this. I didn't properly do my job as the engineer in charge.

He spent a lot of the rest of his career trying to educate engineers about what it means to be the engineer in charge of a project and the need to take personal accountability and responsibility.

David: Going back to the start of this episode about disaster literacy, for those who want to look into both the Challenger and the Hyatt Regency project, the back catalog of DisasterCast has very good and interesting comprehensive episodes on both of those.

Drew: If you can find them these days, David. I haven't been particularly maintaining the access to DisasterCast, but the transcripts are still easy to find at least.

David: I think it's still on my podcast app sitting on the Interweb somewhere.

We go to this section. We've got this repentance, which is engineering done wrong. We need to support engineers to speak up more and we know we've got these moral injuries that we need to understand and help people with. 

Then, we've got this section in this paper called humility and hubris. It isn't referenced in here, but this section made me think of some work by Karl Weick and others on organizing for collective mindfulness, where he talks about the need to have both prosocial motivation and emotional ambivalence.

The way that they described emotional ambivalence in that paper is that we need almost equal amounts of hope and fear. We need to be equally—in the language of this paper—humbled to the extent that actually, we may not know everything we need to know and we may not have covered all the risks that we need to cover, so we need to be quite cautious and quite open to exploring, testing, checking, and understanding. 

The hubris side of things is that we can't second-guess every single thing we do, because otherwise we'd never do anything. We need to have an equal amount of confidence that, well actually, no, we do have experienced people involved, we have followed good processes, and we do know what we're doing. 

Karl Weick, who's been involved since the start of HRO theory, concluded that you need equal amounts of both. One eye, or 50% of you, needs to be worried that you actually don't know everything, and the other eye, or the other 50% of you, needs to be confident that you can move forward knowing what you know. 

I guess in this paper, they're worried about overconfidence in engineering, so there's this idea that our technologies are actually quite unruly: less orderly, less rule-bound, less controlled, and less universally reliable than we think. Technology involves lots and lots of uncertainty, judgments, and assumptions, even when we see these really nice and comprehensive designs that mask the fact that there's a lot of, and I hesitate to use the word, guesswork behind these seemingly very complete designs.

Drew: David, there's a specific really interesting thing that goes on with legacy systems. One of the ways we try to convince ourselves that we've got this certainty is we keep something that has been very successful and has a long history, and we just make small tweaks to it. We tell ourselves that because they're only small tweaks, they're not really new, it's not really a more complex system, and it's just the old system with a little bit of change as if we're borrowing all of the certainty and confidence. 

The idea is that your 737 MAX isn't a brand-new, highly complex system. It's the same thing we've been operating since the 1960s, so we've got more than 50 years of confidence. 

MCAS itself is a system designed to make the 737 MAX behave just like the previous ones. Even admitting that it's a novel thing breaks that magic line of trust, but in fact, it isn't just making the plane behave like the old thing. It's adding a whole new piece of technology that needed to be evaluated.

David: The other side of the argument is that you're changing the size and the weight of the engines, the load distribution of the aircraft, and the aerodynamics and flight handling. Pilots hop in the MAX as opposed to the other 737s, and the plane responds completely differently. You go, well, we'll just put in this little bit of design and software to help smooth out the pilot's controls. 

The problem with the story you tell yourself about the change you're making in these brownfield-type environments is that the plane is more than the sum of its parts. In any complex system, as soon as we introduce something, we may not know exactly what the unanticipated or unintended consequences might be for the rest of the operation of the system.

Drew: Yup. I want to be clear about what we are saying here because there are people who've just come right out and said, oh, Boeing was trying to hide the complexity of this and was trying to avoid scrutiny from the regulator in order to save money. 

I think there is an equally plausible story that Boeing and the engineers wanted to believe that this was not a complex system and not a big change. 

Having your own work scrutinized by regulators is a pain. It adds no actual value to the design. It's just a whole lot of money and effort to have someone else check and tell you yes, okay, here is your approval. Except for that 1 time out of 100 when the regulator comes back with a really, really good point. No one believes they are that 1 time out of 100. Everyone believes they are the other 99 times, when this is just a cost added for no actual benefit to the design. 

David: There are different stories that you can tell yourself here about this. I don't think we claim to know anything about exactly what happened pre-events, but on one hand, you can see a very experienced engineering team that's been working with an aircraft for 55 years and knows a lot about that aircraft and how it operates. 

It's been incredibly safe over its lifetime. They're designing some software that's going to make minor adjustments to the feel of the plane in flight. They classify it as hazardous rather than safety-critical because they figure that at any point in time, the pilots can just turn the system off and fly the plane as they normally would. 

They work through the designs and approval process like that. Like you said, avoiding unnecessary testing in their minds, avoiding unnecessary scrutiny, and moving forward. 

Then, the other story you can tell yourself is you've got these senior managers who are incentivized to make things as profitable as possible and are standing over the shoulders of engineers telling them to down-classify things and ignore certain things. 

I think the reality of the first story for me seems a little bit more plausible than the second, but that's just my own assumptions of organizational life.

Drew: The argument that I know that Dekker and Woods would both like to make—I don't think this is the paper where they're trying to make it, so I'm not accusing them of failing to make their argument—is I don't think the evidence or the theory is there yet. 

We've got these engineering rationalizations and decisions that the engineers are making as engineers thinking that they're doing the right things appropriately. We've got the managers who are absolutely under stakeholder and shareholder pressure and have got really perverse incentives to cut budgets, cut regulation, cut the amount of training, and get the product out. 

What we don't have is a clear mechanism for how one influences the other. The engineers are not sitting there saying, let's make money for the CEO. They're not sitting there saying, oh, I want the shareholders to go away with a packet of money. 

But there are lots of ways in which the influence can trickle through, things like the number of engineers and the timeline on your project. 

All of those things are going to create direct incentives for the engineers because they just have to do a manageable amount of work. If you've got infinite time and plenty of spare staff, why not send it off to the regulator? The rookie can handle that while the rest of us get on with something else. 

But if there's no rookie there, there are fewer people, and the junior person who'd get that job got canned last week, then you've got more incentive not to send it out. You've got a limited budget. 

Again, you're worried about, will it get delayed if we do this? There are plausible mechanisms. We just don't have evidence that that's actually what's going on here. 

David: You and I are both supervising a Ph.D. research project now from Russell McMullan that's hopefully going to get some answers in some of the links between some of these aspects and engineering decision-making in the real world. We've had Russell on the podcast before talking about his master's thesis on safety engineering and design decisions.

Drew: I'm looking forward to seeing whether he can build up some of that evidence of how those mid-level influences in the organization translate cost and budget pressures down to the engineering decision-making level.

David: We then go into a section on the decay and the disaster. How did this all play out in the design and approval process around this aircraft, and in all of these pressures on people to deliver these changes to the aircraft? We haven't spoken about that in detail, but obviously, you've mentioned it generally. 

There's a huge incentive to not have to retrain these pilots, so can we just introduce this aircraft with our existing pilot group and basically then not have to worry about which pilot is on which plane at which point in time? Because if they're certified current on a 737, they can fly this aircraft. 

This pressure is really big in the aviation sector and obviously clear with airlines all around the world and even within the FAA. 

The FAA has a role here. As we discussed in the episode we did on the nuclear industry and risk assessments, episode 101, sometimes we overestimate the role that the regulator plays in the safety assurance of these industries.

Drew: David, I have to say though, I don't buy at all the idea that this is an erosion of a well-working system. Let me just read out to you what the paper says. It says, "A US Senate inquiry in the aftermath of the MAX accidents found: insufficient training, improper certification, FAA management acting favorably toward operators, and management undermining of frontline inspectors."

Anyone who knows anything about the history of the FAA knows that that has been the entire life of accident investigation and transport regulation in America. I bet I can find almost exactly those same words in accident reports from the 1960s. That's just the way the structure is always set up, with regulators that have got this joint responsibility for managing safety but also fostering the industry that they're regulating. You always have this very difficult path that the regulator has to navigate.

David: In some ways, from my experience with regulators in other safety-critical industries over the course of my career, my own under-researched opinion would be that there has been some form of decay in what's going on. 

When I started my career, a job in government and a job in private enterprise weren't seen to be that disparate in salaries. In many ways, government paid more. I went from a safety role in private enterprise to a government safety role and doubled my salary at the start of my career. 

Over time, we've seen this loss of capability inside regulatory offices for a range of reasons: fewer people wanting to work for regulators, maybe a bit of an identity and brand crisis for regulators, and the remuneration offered for those roles.

This paper says that regulators are sometimes lacking technical capability. They get into a fight with a manufacturer who's got hundreds or thousands of engineers, lawyers, and a whole bunch of other resources at their disposal. It's almost an unfair fight: the regulator is meant to have these legal powers, but maybe the actual power doesn't really reside with the regulator.

Just to give an example without naming anyone, I was involved in an environmental approval process for an energy company. Their regulator was being very difficult with the environmental approvals process after the Macondo Gulf of Mexico incident. 

The environmental department actually reported to the Energy and Environment Minister, and a very senior person in the organization just called the Energy Minister and said, you've got to tell your environmental regulator to calm down or we're never going to get this project done. All of a sudden, that was approved within the next week.

I only tell that story to say that I actually think there are real problems in lots of regulators, and those problems have gotten worse over the last couple of decades.

Drew: I would certainly concede that with respect to salaries for regulatory staff and training and development opportunities for regulators. But I don't think there's a perfect fix to it, because particularly in something like aviation, where do you become an expert in designing aircraft if you're a regulator at the FAA? The only way you do that is to have previously worked for Boeing, so your best, most experienced regulators and inspectors in the FAA are all ex-employees of the company that they are regulating. 

You're really reliant then on this hero engineer model and their personal integrity as engineers, as opposed to them being captured by a system where they take that legacy for granted: if they worked on the 737 in its earlier versions, they would have faith in it as a system, which makes it very difficult.

David: They would have relationships with people inside that organization, a sense of maybe unverified trust, some very stale knowledge if they've been out of the organization for two, three, or five years, lots of assumptions about the way that organization functions, and some rose-colored glasses, looking at that organization from the time they were there and at what they think is going on within it.

We had this situation that came out in a few of these reviews after the Boeing incidents, that the FAA was quite a strict, military-style chain of command. Specialists could offer opinions, but otherwise, it was up to the management of the FAA to decide what happens. 

That's true of all organizations, so we shouldn't criticize them too much for that. However, I really felt like this idea of deference to expertise just didn't exist inside the regulator here. At least going on the story of the individual on whom this paper is based, he really felt that he had the most expertise of anyone, yet he wasn't assigned to be involved, not only beforehand but even after the accidents had occurred, until he ended up, in his own words, inviting himself along to meetings that he wasn't actually invited to.

Drew: David, should we talk a little bit about what solutions are available and what the paper calls the remedies?

David: Yeah. We've got this situation that we've talked about through this episode here. We've got these engineered systems that are constantly being updated, there's always this trade-off between the commercial pressures of organizations and engineering safety, we've got regulators overseeing this, and we've got quite a complex system involving regulators, manufacturers, and operators.

One of the remedies, which we called out at the very start of this episode, is to speak up: really just try to tell engineers that there's an expectation that you're really knowledgeable about this system design, there's lots of pressure in the organization, and we want to be safe, so you really need to speak up. Even when it feels like there are pressures for you to remain silent, we really want you to use your voice. Go against the dominant goals of the organization like time schedules and profit. Even though you might feel that the organization sees you as a traitor to the employer or to your colleagues, you have to speak up.

Then, the paper does a little bit to show how invalid that remedy might be, and that the assumptions around what it takes to speak up are unlikely to hold true.

Drew: It particularly suggests that all of these things we do to try to encourage employee voice and get people to speak up are just signs of how difficult it is. Whenever you have this whistleblower protection, you're admitting that whistleblowers are vulnerable, and the idea that you can beef up protection and encouragement enough to resist the power of the very organization that employs someone is a little bit naïve.

They go further and say that much in the same way, David, that you discussed in your paper about safety professionals, engineers see themselves as belonging to a company, not to a profession when they're working. It's not that they are resisting the goals of their employers. The goals of the employer become the goals of the engineers. 

If you're a company making aircraft, your goal is to make that aircraft, get that aircraft approved, and get that aircraft flying. Engineers working for rocket companies don't have as their goal, I want to serve the profession. They want to put stuff into space.

David: I think by the very nature of the engineering profession—I'm also not an engineer, Drew—engineers see themselves as very practical solution-oriented problem solvers. We can do this, we can work it out, and we can make this happen. 

It's also mentioned in this paper that what you're asking engineers to do is something very non-engineering, which is to admit that this is wrong, we don't know what to do here, and we need to stop without knowing what comes next.

The authors then say that the FAA should have been the right place for this, if we can't expect people inside companies to put their jobs on the line and stand up to all this management power. We know that it doesn't end well for whistleblowers. Professional engineering associations don't really do much. We've seen that play out in the media around the accounting profession and the major audit houses, and the inability or unwillingness of professional accounting associations to do anything about their members.

The authors say that the FAA is the right place for this, so the regulator really needs to understand that this is actually the way that organizations function. The regulators should have processes to actually dig deeper for all of these approvals. Maybe that's unfair as well because regulators never have the resources that they need and they can't possibly double-check everything.

Drew: I don't know if you are reading this the same as I am, but what I'm hearing in the paper is they're almost suggesting that quite apart from resources for the regulator, the regulators have the ability to have the culture and engineering environment that is more aimed towards resisting rather than being captured by the goals of the industry. 

That's actually a really interesting suggestion, that regulators could do a lot of cultural work within themselves. If it's true that engineers form their goals to match their own organization, conform to rules, hierarchy, and production goals, and get their job satisfaction and identity from that, then it should be quite possible to have all of that aligned towards the goal of a regulator trying to regulate.

David: It would be very interesting to look at regulators doing some of that cultural work, because the primary objective or primary goal of a regulatory organization, particularly a safety regulatory organization, is to do everything in its power to enable and assure the safety of the industry. 

I've looked at some of the identity and cultural work around things like policing departments, priorities of some policing departments like increasing public trust in the police force, and a whole bunch of interesting priorities and objectives for an organization that you'd never see in a commercial enterprise. You could actually do something to try to shape the identity of the professionals inside a regulatory environment, notwithstanding the fact that the behaviors you would drive would be seen as very, very resistant. You'll see a lot of behaviors play out that the industry will get quite upset with.

Drew: It's an interesting question whether a good regulator has to be an adversarial regulator because I think there's got to be a middle ground between a regulator who sees their job as promoting the industry and also promoting safety versus a regulator at the other extreme who sees their job as hindering and stopping the industry from doing bad things. 

There's got to be a middle ground where they're just doing a professional job of ensuring that adequate scrutiny happens regardless of the reputation, history, and confidence the company has. They just fairly and impartially make sure that everything's been checked.

David: Process and transparency would help with doing their professional job because regulators should never see themselves as successful when the industry is very satisfied with the partnership with the regulator. 

If in this case the FAA had said, no, we're not only going to scrutinize the things that you mark as safety-critical, we're going to scrutinize all changes, this is the process through which we're going to scrutinize them, and this is the information that we need, then they would have seen that this is just a single-point failure. 

This is a big assumption for me, a non-engineer, but this is one of the most basic things you would catch if you did a HAZOP or some process hazard analysis of any engineered system. Here's a single point of failure: if this sensor shows the wrong reading, then this system is going to take control of the flight. Therefore, any change to software that has the ability to change in-flight controls automatically becomes a safety-critical system; it's a safety-critical change. Maybe there's actually more work that a regulator can do with procedure and transparency.
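The basic check being described here could be expressed as a crude screening rule, sketched below with hypothetical field names: if a single sensor failure can drive an automatic flight-control action, the change gets escalated regardless of how it was originally labelled. This is an illustration of the idea, not a substitute for a HAZOP.

```python
# A crude screening rule of the kind described above, with hypothetical
# field names. It doesn't replace a HAZOP; it just flags the obvious case:
# one sensor failing lets automation move a flight control on its own.
from dataclasses import dataclass

@dataclass
class ProposedChange:
    name: str
    independent_sensor_inputs: int      # how many sensors feed the function
    can_command_flight_controls: bool   # can it move control surfaces itself
    declared_classification: str        # what the manufacturer marked it as

def screened_classification(change: ProposedChange) -> str:
    single_point_of_failure = change.independent_sensor_inputs <= 1
    if single_point_of_failure and change.can_command_flight_controls:
        return "safety-critical"  # escalate, whatever it was declared as
    return change.declared_classification

mcas_like = ProposedChange(
    name="handling-feel augmentation",
    independent_sensor_inputs=1,
    can_command_flight_controls=True,
    declared_classification="non-critical",
)
print(screened_classification(mcas_like))  # safety-critical
```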

Drew: As you know, I'm the most hesitant person to say that if we'd done a risk assessment, we would have caught this, or if we'd done the classification, we would have caught this. I cannot imagine setting this as a basic question in system safety 101 and not having most of my class get it right.

You're right. It's a single sensor going into a system, which is automatically suspect so long as you actually do the scrutiny in the first place. But the question of how to classify this is where the mistake was made. It wasn't that someone failed to do the analysis properly; it's that they failed to recognize that this was the type of system that needed that type of analysis. That is a more nuanced question, and it's hard to know exactly what that looked like from the inside.

David: Knowing that you've got a system where the assumptions made in a classification flow through to a whole bunch of actions and scrutiny that do or don't happen is where you need regulators to do a good engineering job as regulators. I think the transparency of their processes and protocols could be a lot more detailed. 

Now, industry actually finds that a little bit frustrating, particularly under safety case regimes: not knowing what the regulator wants to see and then finding it very difficult to know what information to provide.

Drew: That's what I want to quickly mention here because I think this is an important point. One of the reasons businesses hate it is that those initial classification decisions are what drive the budget for the project. 

The most annoying thing is you've classified it as SIL 2, you've done all your development and all your processes at SIL 2, and then the regulator comes along and says, why was that classified as SIL 2? You should've classified it as SIL 3 and done all the SIL 3 work. 

That's where getting the timing right for the regulator to review the decision matters. It also requires regulators who have a lot of confidence in their own analytical abilities. This is where what you were saying before about pay and so on matters: the regulator has to believe that they're smarter than the engineer who made the decision. They have to believe that the engineer might have got it wrong and that their own opinion is better, rather than thinking, I don't even understand this analysis, I'll give it cursory scrutiny and let it pass.
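To see why a late reclassification stings, here is a toy illustration with invented effort figures: assurance effort scales with the integrity level, so a project that budgeted for SIL 2 and is later told it should have been SIL 3 is not topping up a little, it is redoing work.

```python
# Toy illustration with invented effort figures: the assurance effort scales
# with integrity level, and a share of the work already done at the lower
# level has to be repeated to the stricter standard, not just topped up.

EFFORT_DAYS = {"SIL 1": 20, "SIL 2": 60, "SIL 3": 180}  # hypothetical person-days

def reclassification_cost(done_at: str, required: str, rework_fraction: float = 0.5) -> int:
    """Extra days if 'done_at' turns out to need to be 'required'.
    rework_fraction is an assumed share of completed work that must be redone."""
    already_spent = EFFORT_DAYS[done_at]
    return round(EFFORT_DAYS[required] - already_spent + rework_fraction * already_spent)

print(reclassification_cost("SIL 2", "SIL 3"))  # 150 extra days on top of the 60 already spent
```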

David: SIL being safety integrity level, for the non-systems engineers.

Drew: It's not actually what they use in aviation. They use [...] in aviation.

David: There's this idea about working counter to the organization. This is what you said about the government should make this a little bit easier. Do you want to talk a little bit about some of the other examples outside of Boeing like the arrangements within NASA in the 1980s and the situation of engineers in relation to the Shuttle program?

Drew: The mixing of organizational goals with your engineering decisions is that you start getting more concerned by, oh, if I do this analysis and find a problem, it's going to delay the program because we've got to do more assurance, than you are concerned about whether it's actually safe or not. 

This happened a lot at NASA, particularly throughout the Shuttle program. The way the decisions were framed was, are we going to risk delaying the next part of the project? Given that they were dealing with quite serious safety issues, they should have been asking, are we worried about the shuttle blowing up or failing on launch? 

What I described as drift is the language then starts to change about how you're describing things and how you get used to decisions that have already been made. You feel like a jerk if you insist on revisiting something that other people have said is closed and we've finished talking about it. Is that what you meant, David?

David: Yeah. I think this is the idea that organizational goals come into it; [...] also said that engineers don't resist organizational goals. The Space Shuttle program was the engineering goal of the organization, so the engineers were using their technical skills in the interest of those goals. Speaking out is almost like saying that we're all failing here and we can't solve this problem. The paper goes on to say that maybe this idea of speak-up moral courage is actually slightly ridiculous.

Drew, the final section of this paper restates and deepens that argument to say that the remedy for all of this is not having people speak up more, because our organizations are designed in a way that means that's never going to be an effective solution.

There's some detailed history on Boeing here. It relocated its headquarters. The express objective of the chief operating officer was to run Boeing like a business rather than a great engineering firm. 

People want to invest in a company because they want to make money. So that reorientation of Boeing, going back 10 to 15 years before the first of these incidents occurred, which is the timeframe of the design of this equipment, meant that the management team started being rewarded based on total shareholder return, that the focus shifted to short-term business performance, and that organizational decisions and management were oriented towards short-term commercial objectives and shareholder returns.

Drew: We've talked before about the importance of disaster literacy. I just want to point out that the CEO's public quote was basically saying to his own organization, take your engineering hat off and put your manager's hat on. Anyone with a knowledge of the history of aviation disasters should have absolutely flinched at that statement and said, hey, boss, you do realize that this is what they said just before the Challenger blew up.

David: NASA went faster, better, cheaper with this idea, and that happened in a government organization. Now you put that into a private firm, and the paper talks here about the intersection of Boeing, the regulator, and Wall Street and how that tension plays out.

Drew: There's something in here that I don't actually know the truth of, but they basically make the argument that the 737 MAX had heaps of pre-orders. It was going to be a successful program. It wasn't a question of success or failure of the project or making enough money. This was purely about already making lots of money, and then how much extra can we skim off the top for the shareholders? 

It paints this whole thing as basically an attempt to get as much free money off the top of the organization as possible. All of that money has to come from somewhere. Record profits are not just free money; they've got to be squeezed out of somewhere. 

One place they get squeezed out of is reducing the size of your engineering bureaucracy, which is another way of saying your oversight programs, and the extra slack in the project for doing the extra regulatory activities.

David: Drew, I think that's not a bad step-off point to go into takeaways as well—where those profits come from—because I guess that was one of the takeaways. I'll go down there, and then I'll throw it back to you.

Maybe that's something for safety professionals, because you're saying that profit is borrowed from somewhere: there might be record sales, but maybe what's going on is that the margin is getting bigger and costs are going down. I think safety professionals can play an interesting role in organizations in going, oh, we had a record year of profit, a record year of production, a record year of customer orders. Okay, what does that mean for the risk in our business? Where is that extra profit coming from? Does the organization know where that extra profit is coming from? Is it borrowing from safety in the short term? 

That'll actually come back to bite the organization in the long term, because trading off engineering resources in the design of an aircraft doesn't compromise safety at the time those trade-offs are being made. It compromises safety once that aircraft goes into operation, long after the quarterly reports have been published.

Drew: David, there's a phrase that's used in IT that I haven't heard used in safety called technical debt. It's basically the amount of extra stuff you've got stocked up from small changes you've made without properly integrating them and rushed decisions that you've made at crunch time. 

It goes onto the balance sheet of the organization. You just don't see it. It's all this mess you've accrued for yourself in the future because you haven't done things as smoothly, as cleanly, and as sorted out as they could otherwise have been. It might seem like a profit now, but you're going to pay for it.

David: Yeah. Software developers will tell you, look, the system works, but there are bugs in it, there's testing that we haven't done, and if we don't actually come back, liquidate that technical debt, go through, actually check everything, integrate everything, and test it all, then it's just going to result in [...] down the track and it's going to be really complex for us to take the system apart and figure out where the problem is. I think that's a really good example of how we like to kick something down the road and never actually deal with it.

Drew, do you want to add to the takeaways?

Drew: Next takeaway from me is don't romanticize ethics. One of the points that comes through clearly from this paper is that often we get the impression that the most important ethical thing is being a brave whistleblower, standing up to defend the public against the evil company. 

If you've got to the point where you need that person, you're way beyond it. Your practical ethics gets done on spreadsheets and budgets; it doesn't get done through whistleblowing. We need to incorporate that ethical thinking into our culture of budgeting, recruitment, and all those other parts of the organization, not just build in these systems as a last-ditch defense through whistleblowing.

David: Another takeaway is around safety approvals and being clear about what our primary task is. Is the primary task safety assessment or safety approval? Because when you're inside an organization and you've got a safety assurance team or a safety approval team, if they see their primary task as safety approval, then they're focused on approval as their outcome and they're using their engineering ability to make arguments for getting approvals. If they see their task as safety assessment, that may drive a different narrative, identity, and set of activities. So be very wary of what goal teams see themselves as having and what their primary goal is.

Drew: David, I think that's particularly important, and it leads into another takeaway. When we're looking across regulators and contractors, a lot of these approval systems are designed as if they're adversarial. 

One side is going to be scrutinizing the work of the other. The risk is that both halves of that relationship rely on each other and assume that they're in a joint project to acquire approval. If you've set the system up to be adversarial and you're on the side that's supposed to be scrutinizing, it's your job to scrutinize and not trust other people, not just assume that their work is good. It's not just your job to read through, check it's okay, check it's in the right format, and stamp it. 

It includes things like asking for documents that they haven't given you and asking to look over systems that they try to claim that you don't need to look at. The moment someone says, you don't need to see that, the answer should be, I don't just want to see that, I now want to see everything.

David: Yeah. There's a quote that I love and paraphrase when it comes to safety, and that I actually use whenever I'm training safety professionals. It says, if two people always agree, then one of them is not necessary. Your job actually is to constructively challenge others, and the regulatory process should be built around constructive challenge. It's also about not relying on others, because there is an argument that could be made, particularly by senior people in organizations, which is, well, this has been approved by the regulators, so it's safe. It must be safe. 

This idea that if it's approved, it's okay is a really dangerous assumption, because pretty much every major incident that has taken place in a major hazard or safety-critical industry has involved a technology, a site, or an activity that had regulatory approval.

Drew: This one is specific to Boeing, David. When the investigation happens, don't let them find emails in your company saying, oh, it must be safe because the regulator approved it and other emails in your company saying, the regulator is an idiot because those don't look good put next to each other.

David: Yes. Then, there are these rewards. We did an episode not too long ago about whether we should have reward schemes for safety; that was on company research by [...] that looked at rewards, decisions, and behaviors in safety. 

We know that organizations are accountable to their shareholders. We know that executives are incentivized to maximize returns for those shareholders. I guess what we've got to do in our organizations is understand that, particularly in these high-risk technology engineering firms, safety actually does involve an investment of money and time to check, test, and verify lots and lots of complex activities and designs, and that time and cost is going to cut straight into shareholder profits.

If you're a leader in those types of organizations or a safety professional in those types of organizations, I think you've got to have a very strong focus on basically how that trade-off is being balanced or how that conflict is being balanced.

Drew: You can make a [...] to the shareholders without having to maximize the amount of money that you're giving away in your dividends or stock price, give them a fair profit, and put the risk back into safety.

David: It was mentioned in this paper that in Boeing, where on a previous program they might have had 20 engineers involved in a certain activity, they now had just one. You can actually look at your own safety and engineering practices and resources over time and just make some inquiries. Have we had a change in the number of engineering resources in the team? Go and talk to an engineer and ask them, do you feel like our engineering practices are getting weaker? In what areas? What other processes do we use to check, verify, and test engineering designs? 

You can make inquiries about how engineering work is done in the same way that you'd make inquiries in your organization about how frontline work is done, and see what emerges.

Drew: What did we use to do that we don't do now, and are you sure we shouldn't still be doing it?

David: Drew, the question we asked this week was should we be happy when our people speak out about safety?

Drew: Certainly, the answer in this paper is once we've got to that point, we're already well beyond being happy, so no. It's a sign that if we need that to happen, then things are bad. 

David: It's a good point. Not that we shouldn't be happy if someone does catch something, but maybe your focus should be on what's broken in our organization that we've had to rely on someone speaking out to maintain safety.

Drew, that's it for this week. We hope you found this episode thought-provoking and ultimately useful in shaping the safety of work in your own organization. Send any comments, questions, or ideas for future episodes to feedback@safetyofwork.com.