The Safety of Work

Ep. 109 Do safety performance indicators mean the same thing to different stakeholders?

Episode Summary

In this episode, we’ll discuss the paper entitled, “Tracking the right path: Safety performance indicators as boundary objects in air ambulance services”, by Jan Hayes, Tone Njølstad Slotsvik, Carl Macrae, and Kenneth Arne Pettersen Gould. It was published in Volume 163 of Safety Science.

Episode Notes


Dr. Drew Rae and Dr. David Provan

 

The abstract reads:

Indicators are used by most organizations to track their safety performance. Research attention has been drawn to what makes for a good indicator (specific, proactive, etc.) and the sometimes perverse and unexpected consequences of their introduction. While previous research has demonstrated some of the complexity, uncertainties and debates that surround safety indicators in the scientific community, to date, little attention has been paid to how a safety indicator can act as a boundary object that bridges different social worlds despite the social groups’ diverse conceptualizations. We examine how a safety performance indicator is interpreted and negotiated by different social groups in the context of public procurement of critical services, specifically fixed-wing ambulance services. The different uses that the procurer and service providers have for performance data are investigated, to analyze how a safety performance indicator can act as a boundary object, and with what consequences. Moving beyond the functionality of indicators to explore the meanings ascribed by different actors, allows for greater understanding of how indicators function in and between social groups and organizations, and how safety is more fundamentally conceived and enacted. In some cases, safety has become a proxy for other risks (reputation and financial). Focusing on the symbolic equivocality of outcome indicators and even more tightly defined safety performance indicators ultimately allows a richer understanding of the priorities of each actor within a supply chain and indicates that the imposition of oversimplified indicators may disrupt important work in ways that could be detrimental to safety performance.

 

Discussion Points:

 

Quotes:

“The way in which we turn things into numbers reveals a lot about the logic that is driving the way that we act and give meaning to our actions.” - Drew

“You’ve got these different measures of the service that are vastly different, depending on what you’re counting, and what you’re looking for.” - David

“The paper never draws a final conclusion - was the service good, was the service bad?” - Drew

“The pilots are always in this sort of weird, negotiated situation, where ‘doing the right thing’ could be in either direction.” - Drew

“If someone’s promising something better, bigger, faster and cheaper, make sure you take the effort to understand how that company is going to do that….” - David 

 

Resources:

Link to the Paper 

The Safety of Work Podcast

The Safety of Work on LinkedIn

Feedback@safetyofwork

Episode Transcription

David: You're listening to The Safety of Work Podcast episode 109. Today we're asking the question, do safety performance indicators mean the same thing to different stakeholders? Let's get started.

Hey, everybody. My name is David Provan. I'm here with Drew Rae, and we're from the Safety Science Innovation Lab at Griffith University in Australia. Welcome to The Safety of Work Podcast.

In each episode, we ask an important question in relation to safety of work or the work of safety, and we examine the evidence surrounding it. Drew, let's talk about today's episode. Today, we're going to talk about safety performance measures again. We've talked about it a few times, but do you want to introduce how we are going to talk about it today?

Drew: Sure, David. Before we get too far into it, I thought I would mention that I was actually listening to you on another podcast recently. I promised them when I did an episode with them that I’d do a cross promotion. I don't know that I've actually done it on this podcast, so I'll shout out now to the Safety Labs by Slice. A little bit weird, a podcast sponsored by a safety knife, but it's a really, really good interview-based safety podcast.

The host of it, Mary Conquest, is a fantastic interviewer. She's done a recent interview with you, David. She did a recent interview with me. She's just done one, I think, with Rosa Carrillo. Some fantastic interviews across the breadth of safety. If you're bored with our podcast, then switch over to that one instead.

David: Yeah, Drew, because we're not an interview podcast. I guess the majority of podcasts are interview-based. We had a lot of fun talking to Mary. I saw that you had done it, maybe even the episode just before me. A number of people that I hold in very high regard in the safety space have done interviews on that podcast. If you don't like Drew's and my format around what we do here, it's a great podcast in the safety world, so head over there.

Drew: Yeah, but let's get into today's episode. We've talked about performance measures quite a few times on the podcast, which I think is a reflection both of how interested people tend to be in metrics, and also how much of the safety research clusters around issues around measurement.

In episode 35, we just talked generally about leading and lagging indicators. In episode 55, we looked at Matthew Hallowell's work around the statistical validity of lost-time injury frequency rates. In episode 76, we did actually do one of our rare interviews, which was David talking to Greg Smith about capacity-based performance metrics.

David: Yeah, and looking at how we want to understand safety capacity in our organizations. It wasn't specifically about performance measures, but it definitely lent itself towards the things you would look at in your organization for safety.

Drew: In Episode 97, we talked about whether it was a good idea to link safety performance to bonus pay. If you're interested in metrics, go back and listen to any of those episodes. Most of those episodes, we were talking about performance indicators as a tool for decision making.

Basically, the key question in almost all of them is, what's a valid and useful indicator? This paper is a little bit different. It's going beyond safety performance measures simply as a measure and looking at the broader organizational meaning that we have, that we attach to metrics and how that influences behavior.

David, I thought we might begin with just a quote from the paper, which says that, "Studying the meaning of a safety performance indicator beyond its eponymous function is therefore important to gain a greater understanding of the conditions for indicator effectiveness and usability. This includes studying the multiple meanings that may be assigned to an indicator." That's what we're looking at today: the multiple meanings that we can have around the same indicator.

David: I think it's useful to think about indicators like this. On one hand, we think that a safety indicator is something that we look at in isolation. Is it going up? Is it going down? What decisions do we make?

In this paper, what we'll talk about is it can be a very central feature for the overall functioning of the organization. If we think about it as a performance metric, if you're really shaped the way that the whole organization operates, it has advantages and unintended consequences as well.

Drew: Yeah. I think whether or not you are a fan of measurement in general or particular measures, we can acknowledge that anything which has the capacity to influence things positively, has the capacity to influence things negatively as well. If we believe that key performance indicators are effective, then we have to believe that having the wrong key performance indicators can be ineffective.

Social science is filled with people who are typically fairly uncomfortable with numbers. Pretty much everyone acknowledges the power that numbers have, and what we choose to turn into a number reveals a lot about what we hold important. The way in which we turn things into numbers reveals a lot about the logic that's driving the way that we both act and give meaning to our actions.

David: Yeah, Drew, I think that's really important. Like you said, whether we like it or not, this is the reality of organizational life. We turn lots of things into numbers. It's really helpful to understand how some of these numbers become very powerful in shaping the way that people in organizations make decisions, and the way that organizations as a whole think about their overall success.

Drew, today's the title of today's paper, and I know you love a good title, but today's paper is called Tracking the right path: Safety performance indicators as boundary objects in air ambulance services. Four authors, Jan Hayes who we know from RMIT here in Melbourne, Australia, Tone Slotsvik from Stavanger University, Carl McCrae from Nottingham, and Kenneth Pettersen from Stavanger as well. I know, Drew, that you know some of these authors very well. Do you want to give us any background into the authors of this paper?

Drew: David, you've put me on the spot because mainly, we haven't actually written a little biography for these authors, mostly, I think just because these are pretty well-known people in the safety space. If you look up any history of the Journal in Safety Science, you'll see all four of these names occurring multiple times. Given that this is a case study paper, I don't really think we need to know a lot about the authors.

Often, when we're looking at something like a literature review or using any particular method, then you might want to inquire how much to take what the authors give you on face value, versus how much you want to check it for yourself. This is a very straightforward reporting of a real example that the four authors were involved in. Yeah, very experienced authors, but it actually probably doesn't matter for interpreting or reading this paper.

David: Yeah. I apologize for putting you on the spot. I know Ken Pettersen and Jan Hayes are very widely published in the safety world and have also served some time on editorial boards of journals and things like that. I think when we get to case study research and qualitative research, it's helpful to know that the researchers involved are very, very experienced, credible researchers in the safety science world.

This paper only became available online on the 20th of March 2023, so it's only a month old. It's published open access, so we'll put a link in the show notes and on LinkedIn. I think the other thing I wouldn't mind your thoughts on is that, in this paper, it seems the question this particular paper answers wasn't the primary research question during the data gathering phase. These researchers went and did a whole bunch of interviews in the air ambulance service.

This issue of availability became a really central topic in terms of the performance of the service provider in relation to the availability of services. That seemed to spark this whole paper. This whole paper seems to come out of the data that was gathered from a particular research project, even though this paper wasn't the original intended paper that was going to be written.

Drew: Yeah. I really liked the detective work, actually, that led to this. They noticed that there was this contradiction in their data. Some people thought that the availability performance was good, and some people thought it was bad. Basically, you've got the exact same thing. Different participants are giving you conflicting stories about whether it was getting better or worse.

Rather than either try to reconcile the truth and decide who was telling the truth, or try to fudge it and ignore the issue because you've got contradictory data, they thought, actually, there's something interesting going on here. Let's do a couple more interviews specifically to get at why people are telling us different stories about the same thing. It's a number. People shouldn't be disagreeing about what the number is. It was that curiosity and that detective work that led to a really interesting story.

That's my little background to this paper. It's a case study of the transition of air ambulance services from one private provider to another private provider. You've basically got the health service. Under that health service, you've got patient transport, and then under that, you've got the air ambulance service, but then they contract out for the actual pilots and aircraft.

The contract was with one organization for quite a number of years and was shifting to a new organization, basically, because there'd been a competitive tender. The new organization had won the contract, actually in part, because they had promised higher availability of aircraft for lower cost. It was this scenario that the researchers were studying, when the issue of how do you actually measure the availability, and did the availability go up or down during this transition, became the focus of what they were studying.

David: How you describe that situation, where you've got an incoming contract, if you like, promising certain things. We've got four main groups that are part of the data gathering for this research. You've got the actual procurement organization, so the company that's going out and tendering for these different service providers, you've got the incoming operator, you've got the outgoing operator, and you've got the pilots. You've got several interviews in each group.

I guess the pilots in this case, it's a little bit like sometimes we see in contracting environments in the safety world, where different companies will take on the overarching contract, and then the people who actually perform the work will be the same people, but maybe they just wear a different logo on their shirt when they're performing those services.

Drew: When we start talking about the pilots, we'll get into more detail about how they felt about the rebadging. The way they structure the paper and the way that we'll go through it in our discussion is they basically tell the story from the point of view of the four groups. Firstly from the procurer, secondly from the outgoing contractor, then the incoming contractor, and then the pilots. They bring it all together to talk about what these different stories mean for the notion of availability. David, if you're happy, should we just go through the four groups?

David: Yeah, I think so. Let's talk about the four different groups and how they think about this idea of availability.

Drew: Sure. Starting with the procurer, this is the air ambulance service. They operate two types of aircraft, fixed-wing and helicopters. This case study is just for the fixed-wing contract.

When you're running an air ambulance service, availability already has two different meanings. One of them is overall as an air ambulance service. How available are you? Basically, someone is sick, they need to be transported, are you able to transport them? Availability at that top level is overall what the service is trying to achieve.

You've also got the more specific availability, which is looking at contractor performance because there’s lots of reasons why they might not be an aircraft that aren't the contractors' fault if in the middle of a blizzard, you're not available, but it also shouldn't count against the contract performance.

The second meaning of availability is the contractor providing pilots and aircraft at the right rate. What they noticed is in their previous contract, they were averaging availability over time, which might seem to make sense because what matters is not your performance on a particular day, but measured over a couple of months. How often is the service available? How often is it not?

What they were worried was happening was, if the contractor was doing particularly well in any given period, they would slack off. They would reduce availability, even though they were staying above the target because they knew they had the margin. To save costs, they'd transport aircraft out of the areas where they knew they were going to meet the targets, into areas where they weren't going to meet the targets, just so that everything stayed just above the target.

They weren't worried that this was a perverse incentive that they weren't getting the contract performance, whereas they had a slightly different contract structure for the helicopter service. For the new contract, they thought, okay, what we could do is learn from the two different contracts and come up with a new contract, which would more likely drive the behaviors that we want, leading overall to better genuine availability.

Part of the paradox they had, and a lot of contractors have this, is you don't actually want to penalize your subcontractors too much. Certainly, you don't want to drive your own subcontractor out of business. There's still a bit of a perverse incentive when you're putting these performance penalties. You don't necessarily want to actually penalize someone for worse availability because then they have less money, and then they're less available.

There's limited ability on the part of the procurement agency to actually prevent poor performance. There's a lot of opportunity to use both the contract and public pressure to incentivize as much as possible, as much availability as they can get. That was their goal and belief. That they could, with a better contract, incentivize the subcontractor.

What they found is that the media started reporting on these availability statistics. Even though they had this quite sophisticated contract and this quite sophisticated way of measuring availability, obviously, the media didn't want that. The media just wanted a raw percentage and started reporting this raw percentage.

The procurement agency ended up almost internally having to use the media's definition of availability because that was what was being reported, that was what government ministers were asking about, and that's what they were getting phone calls about. That started to drive some of their own understanding of what was important. If the media is reporting a certain percentage, then you've got to make that certain percentage look good, which means you have to make the contractors look good against that measure as well, or at least manage their performance against that measure.

David: Drew, I know you've done a little bit work in rail. Some of our listeners will know that I spent a bit of time in the rail industry. When I was reading through this paper, I was thinking a lot about on-time running of the train services. There are some very clear targets that get set by governments and get reported publicly around the on-time running of train services.

Even though there's a whole bunch of complex factors, which play into the contractual agreements between governments and rail service providers around the on-time running, at the end of the day, the public really only wants to know how many trains are on time. They don't worry too much about, what factors get included or not included into whether trains don't run on time or not.

Overall, those organizations, even though they've got some really complex ways of negotiating with government over service delivery, when it comes to the overall percentage, end up really measuring themselves based on what the public measures them on.

Drew: Yeah. That simplified measure leads to all of these really perverse incentives. For example, if you cancel a train half an hour before it's due to depart, you no longer need to count that. That train is no longer running late. Once a train is 10 minutes late, it's already late. There's no incentive for it running on time now. You might as well just let it be late.

David: And let those trains behind it be on time.

Drew: Yeah. If you measure it by raw percentage of how many trains are on time, then once it's late, it's late. You run into similar things with things like how we measure waiting times for elective surgery. Often, that becomes a big political issue. If the media is reporting average waiting time, that drives one behavior. If the media is reporting the number of people who had to wait more than six months, that drives a totally different behavior.

David: Yeah, that's a great example, Drew.

Drew: We can try to have really sophisticated measures that avoid these perverse incentives, but then we get driven into more simplified measures because that's what people want to hear about. Would you like to move on to the providers?

David: Yeah, let's keep going. Yeah, please.

Drew: Okay, the second group here is the outgoing provider. Remember, these are the people who have just lost the contract. They're feeling really hard done by. As we'll get into, they are not just feeling hard done by because they've lost the contract, but because they believe that they unfairly lost the contract, because they lost it to an organization that was making promises that they couldn't meet.

This is the classic case. You've got an incumbent provider. In order to drive competition, you open it up to public tender. Ideally, the market is supposed to solve these problems. But then someone comes in who doesn't know what it takes to run the service, because they haven't had to run it, and wins the contract based on promises that they're not going to meet, but it's too late. They've already won the contract. The original contractor has already lost it.

One of the challenges that the outgoing provider has, is they're about to lose all of their pilots to the new organization. Even though they know they've already lost the contract, even though they know there's a fixed time at which they no longer have to operate, they're still actually interested in both operating a good service and in making sure that these pilots successfully transition.

The pilots are not just going to leave, they're going to have to be trained on a new aircraft. It's the pilots that are shifting; the aircraft are not shifting. The air ambulance service isn't willing to make availability concessions to allow the pilots to be trained. The outgoing service can't release the pilots for training because it's still got to meet its availability targets. It's actually taking a lot of pride in meeting those targets.

They're trying to deliver according to the contract. They're trying to make sure that things are ready for the transition. They're trying really hard to convince the procurer that they're headed for failure because the incoming operators are never going to meet the requirements.

David: I think this is a fascinating intersection of contracts, because the situation you've described is something that may be familiar to our listeners, where you've got this incumbent contractor who's doing all of the work. Then you've tendered it. Maybe you found a cheaper price, maybe you found some better performance. You've got another company that's going to come in and take over the service.

At the same time, they're going to take over the people that are involved in the old contract, and yet you've got an existing contract where you're still measuring and rewarding the performance under that existing contract in a way that doesn't motivate, incentivize, or create the opportunity for the existing provider to actually support the new provider anyway. This is the one where structurally, you've created this situation which really doesn't serve your purpose at that point in time.

Drew: Yeah. You've incentivized the old provider to meet the availability targets. You've incentivized the new provider to meet the availability targets. But you haven't incentivized either of them to do what's actually necessary to meet the availability targets.

My favorite detail of this is, often, when you have this competition, the way that the new provider is able to underbid is by being innovative. They might have a new way of delivering the service, they might have new technologies, they may have better efficiencies. Or it may just be that they think the incumbent has been racking up the prices, and there's a profit margin that can be reduced.

In this case, we know what was going to happen. The new provider was planning to deliver more availability with less cost, with the exact same set of pilots. The only literal way they can do that is to make all of the pilots work for longer hours. This was the plan, except the pilots weren't contracted to do that. The pilots were still contracted by the old provider. We're going to get into what this means for the pilots in a minute, but let's talk about the incoming provider.

There is nothing in this paper, and there is nothing in the data, that says that this incoming provider was doing anything dishonest or wrong. I'm quite interested to read the rest of the research that they're publishing around this transition that they were studying. At least in this paper, we have the incoming provider's view from their point of view. From their point of view, they were being honest, they were being clear, they had availability targets, they knew that they couldn't meet those immediately, and they were honest about that; they negotiated to make sure that the overall availability of the service was maintained.

The way this was going to happen was the procurement agency, the air ambulance service, was acquiring extra helicopters and aircraft to operate during the transition period. A lot of these, for example, were military aircraft. Military aircraft and pilots are filling the gap before the new provider can get fully up to speed.

Really, the question then is, how do you measure availability? Is it according to the contract, which is, is the new provider able to provide pilots and aircraft? Is it according to the patient, which is, is there an aircraft to take the patient? Or is it according to this negotiated deal, where, along with the extra aircraft, the contractor's obligations are being met; they're just not being met by the contractor. They're being met by the interim arrangement that everyone has agreed to, even though that's not strictly what the original contract was.

Who gets to decide which definition? It depends who you are. If you're the media trying to drum up how bad the service is, then you try to use the lowest possible figure which is, how many aircraft are being provided by the new contractor? If you're a politician trying to claim that the service is fine, then you exclude all of the reasons why something might not be there, and you just talk about the maximum possible figure.

From the incoming providers' point of view, they're doing their best, they're meeting their targets, and they're meeting officially what counts. They're still getting hammered by the media, therefore also hammered by the procurement agency for not doing a good enough job.

David: Drew, we've got these real multiple ideas of availability, like you've mentioned. What does availability mean to the patient? What does it mean to the air ambulance service? What does it mean to the contractor? We can have these situations, where maybe the operator itself or the incoming operator, is only providing these claims 50% of the time.

When actual call-outs occur, maybe 97% of patients get the plane that they need. The 50% doesn't really matter from a service point of view, like you said, unless you're trying to actually negotiate penalties in a contract. It's the 97% of patients who get the plane when they need it that matters.

In the other direction, you might have planes available 90% of the time, so between 50% and 90% is a huge service availability uplift. But maybe in terms of the call-outs, the contractors are hitting a target of 98%. You've got these different measures of the same service that are vastly different depending on what you're counting and what you're looking for.

Drew: Yeah. Interestingly, the paper never draws a final conclusion: was the service good or was the service bad? Was there a stuff-up? Was there not a stuff-up? It really tells a story that this entirely depends on your point of view and how you choose to present the figures. The people that are caught in the middle of all of this are the pilots, so let's talk a little bit about the pilots.

When you're a pilot, you've basically got competing considerations. From a broad aviation safety point of view, the default is always that you don't fly. If there's bad weather, the safe thing to do is stay on the ground. If there's a problem with the aircraft, the good thing to do is stay on the ground. If you're feeling tired, the safe thing to do is declare that you're tired and stay on the ground.

From a patient point of view, availability is the desirable thing. You need to get somewhere you want the plane to fly because that's going to keep you personally as safe and as healthy as possible, assuming that the plane doesn't crash. The pilots are always in this weird negotiated situation, where doing the right thing could be in either direction, but then this is compounded by their employment situation.

The new operator is planning all along to use the same pilots with new aircraft. That's the basis for their contract: just assuming, if we win, there are going to be all of these unemployed pilots. We'll be able to pick them up because there'll be no other employer. It'll be easy for them to make the transition; we'll just grab them, and we're going to employ them working longer shifts than they're currently working.

I don't know if they just assumed that the pilots would go along with that or assume that the pilots wouldn't have a choice, but that's what they planned. The pilots never agreed to those new arrangements. The pilots weren't in a contractual situation with the new operator. The pilots at the time of the tender were in a contractual situation with the old operator. They were already feeling overworked and stressed. Then they'd go into this transition period, where basically their employer is going out of business.

They're feeling just as hard done by as the old operator. They're feeling that the service that they've been doing isn't valued. They're expected to move the new operator, but they haven't been promised anything. They don't have clear contracts already in place. They don't have an agreement for work hours, but they're expected to be doing stuff during the transition period like training onto the new aircraft so that they can get these jobs with the new employer.

At the same time, there are difficult labor negotiations because there needs to actually be an agreement in place for who's going to pay for the training. Are these new shifts viable? Essentially, the pilots end up in an industrial dispute with the new operator. The new operator is not going to be able to immediately fly because the new operator doesn't have pilots. They won't have pilots, until the pilots have actually signed an employment agreement.

At the same time, we've got air ambulances that are not in the air. The media is making it look as if this is basically because of a pilot strike. The pilots are, in part, getting blamed for the lack of availability. They're feeling lots and lots of pressure, partly because they're getting blamed for lack of air ambulances, and partly because they're out of work. This is the only work that's going to be available to them as if they do eventually sign some deal with the new operator.

For everyone else, availability is a political figure, it's whether you get an air ambulance, or it's a contract term that determines penalties. For pilots, availability is basically how much they have to work. Measured availability has a lot to do with, basically, their conditions of work and conditions of life.

David: We'll get to practical takeaways at the end, as we always do, but I think it's worth pointing out here a little bit of a practical takeaway when we talk about these procurement processes. Anytime that a company is making forward commitments that they're going to deliver something that's better, cheaper, or greater than what's currently being done, I think it's really, really important that the organization understands the assumptions that have been made in that procurement process.

Here, like you've mentioned, these new service providers come along promising the same or greater availability at a cheaper price. But that promise is premised on a whole bunch of assumptions to do with contracting these pilots, the pilots working longer hours, and a whole lot of other things, and there's no substance behind it. There are no agreements in place, there's no existing practice that can be referred to. At the end of the day, at this point in time, it's just a whole bunch of empty promises that this particular company asserts it can deliver.

I think that's just a practical call-out I wanted to make now, which is, if someone's promising something better, bigger, faster, and cheaper, make sure you take the effort to understand how that company thinks it's going to do that, and whether or not there's any fact or evidence to suggest that it can actually do that.

Drew: Yeah, contracts and the commercial market aren't magic. They can make money go up and down, but they can't create extra planes or extra pilot hours. Unless there's a genuine technology improvement, there's always going to be some trade-off.

We see the same thing in the gig economy. The gig economy is basically about contractual differences. It doesn't magically create faster ways of getting from A to B, reduce the cost of getting someone from A to B, or reduce the wages that it takes for someone to work for a certain number of hours. Unless you believe that the industry is currently already corrupted with huge profit margins, which is usually not a safe assumption, then any slack has to come from somewhere. It has to come from working people harder, charging people more, or reducing the safety and quality of the service.

David: I think what we're seeing broadly in industry, at least at the moment, is that there's not huge profit margins in the supply chain that are just waiting there to be picked off. I think we've got to question any aspect of our supply chain, where there seems to be the opportunity for significant improvement. Drew, let's talk a little bit about the conclusions of this paper. Do you want to talk about what the overall findings were?

Drew: Okay. Before we do, David, I'm actually going to throw to you now to explain some of the theory for us, because the guiding principle they're using here is the idea of a boundary object. Can you explain to us what a boundary object is and how it plays into this paper?

David: Drew, this might be my punishment for throwing you under the bus earlier in relation to the authors. A colleague of mine, Ralph Shreeve, did a master's research thesis on looking at contracts as boundary objects between parties, between a client and a contractor. I guess this idea is that, maybe this metric that we're talking about here is in terms of the availability of the air ambulance service. It's actually a boundary object.

We're not looking at the specific criteria or the specific requirements. We're looking at more of a framing object or a criteria. In that research example that I gave, maybe the specific terms and conditions of the contract aren't as important as the overall deliverables of the contract itself.

What we've got here is a boundary object. We're looking at something that defines the boundaries of a particular service or activity. Notwithstanding the definitions, what counts as an available aircraft, whether it's late, or whether it's not available, or whether it's caused by a blizzard, as you mentioned earlier, Drew, all that stuff doesn't become as important as this overall boundary around the particular service itself.

What we're going to think about is that we don't have to worry about the details as much as what the overall position is. If we get these boundary conditions right, then the details can take care of themselves. If we take care of the details, it may not necessarily result in the overall condition being what we want it to be. Drew, I think my understanding of these things is that boundary objects become just, what's the overall framing of the relationship or the situation?

Drew: Thanks for that, David. If we leave aside the specifics of how we define availability, looking at boundary objects causes us to ask, what does availability fundamentally mean to different people? If you're applying a very commercial logic to things, if you're trying to maximize your profits as a contractor, then availability is something that you want to minimize.

Availability isn't good. Availability costs you money. You actually want availability to be as low as you can get away with. Less availability means fewer pilots, fewer planes in the air, less fuel, more time for maintenance, and overall, lower costs. Unless you actually get hit with a performance penalty, which is bad, you want availability to be low.

The targets in contracts are basically floors. You want to get as close to that floor as possible without crossing it. From a commercial point of view, that's what availability means. But from the point of view of an air ambulance service, and we learned that everyone involved here sees themselves as part of an air ambulance service, availability is your mission, it's your pride.

High availability means you get the patient there, and you save their life. High availability means you are meeting the public's demands. You're providing a high-value service. You're recognized by politicians and the media as doing well.

We've got these almost conflicting meanings of availability that drive really quite different types of understanding and behavior. This is why this is published as a safety paper, and why we're talking about it on a safety podcast: in this particular case, availability and safety sit in that exact same space.

Commercially, safety is something that logically an organization wants to minimize. You want to meet your acceptable standards of safety. Anything more than meeting your acceptable standards is going to cost you unnecessary money, and create competitive disadvantage against people who are bidding for the same contracts as you are with lower levels of safety, and therefore with lower costs.

As an organizational value, we really care about safety, and we believe in it. We want to maximize safety. In fact, the aspiration is not to have as little safety as we can get away with. The aspiration is often, explicitly, that we're trying to get perfect safety. You've got the same boundary object and the same contradictory message: something that we're trying to get as low as possible commercially, and as high as possible based on our values.

It's worth understanding those different meanings and those different logics because otherwise, we lean towards one or the other without ever acknowledging that sometimes things are actually shifting meanings. We really noticed this when we talked about the pilots. For the pilots, they've got not just availability, but they've also got aviation safety. They've got professional pride in being safe aviators, but they've also got professional pride in being an air ambulance service willing to get the patient there no matter the weather, no matter the difficulties in the way.

You've got this third meaning of availability, which is that, because it's important to other people, availability becomes a source of power in negotiations. Availability is something that the pilots can offer. If the pilots are willing to work longer hours, that offers more availability to other people. If the pilots say, no, we can't do that, then that gives them a reason to refuse things and say, we'd love to give you this availability, but because of flight safety and fatigue management, we just can't do that.

David: We see this a lot, Drew. We mentioned on-time running in the rail industry. I think train drivers have the same opportunity to position themselves in the middle of this reliability and on-time running performance, because they can make the decision about whether something is safe enough and appropriate to operate.

In this example, we're talking about health care. It's the same in a whole bunch of situations where people want to do their job, and we also expect them to be safe. I've done some work with a lot of utility organizations. When there are service outages, I think people pride themselves on turning the lights back on, and are prepared to try to figure out the appropriate trade-off between safety and the availability or the reinstatement of a particular service.

I think this happens in lots of operational environments where we've got the core operational deliverable. We've got this expectation of, in this instance, or in our instance from the podcast point of view, safety sometimes competing alongside that.

Drew: Yeah, and it's also not that unusual that the core operational deliverable is as important or even more important than worker safety. When we’re talking about getting the lights back on, that's also getting the power back on to hospitals. That's also making sure that the person who needs continuous power for their medical devices at home is getting that continuous power. Providing people with continuous safe drinking water, providing people with transport, these can be life critical things. It's not as simple as just our safety versus your operational availability, as if operational availability doesn't matter as much.

David: Maybe I'll just throw this out there. In this trade-off between the core operational deliverable and safety, I think at the end of the day, the intention within organizations is to maximize the core operational deliverable for a level of safety that they can get away with. It's that idea you mentioned earlier about floors in performance.

I think this might be controversial, Drew. To your point, even though organizations say we want the maximum level of safety for the minimum acceptable operational deliverable, I actually think organizations want the maximum level of operational deliverable for a level of safety that's no greater than what they can get away with.

Drew: David, I don't think I'd like to make a generalizable judgment there. What I will do is support the message in this paper. Just because we've put both of these things into our contract doesn't mean that we've really clarified what they mean. The metrics that we're using can have different meanings going upwards and downwards.

In this case, the air ambulance service, the procurement organization, the message that they're sending upwards to their political masters is to make it look like there's maximum availability, and then the message they're sending downwards to the subcontractor is trying to penalize them for poor availability. That's why it's interesting and important to look at what we mean by these different measures.

The next interesting point that the authors make is that there are these problems with measuring availability, and it's not a good measure, but it's also very obviously not a good measure. Sure, a percentage is very simplified, but explaining the problems with a percentage is also pretty easy to do.

Using our example of on-time train running, it's pretty easy to explain why the number of trains on time is not actually a good measure of how well your train system is running. The stakeholders aren't stupid, so why are key stakeholders focusing so much on such a simplified metric? The authors say that they can actually explain that.

They say, "Having just this simplified, broad metric, even though it's got real problems, even though it may be statistically completely invalid, they say it has leading qualities in practice, and thus might be more effective promoting positive change than might be predicted." That's a direct quote. They basically say that even though it's bad, it still may be effective.

They say that even if we're not really in agreement about what the right measure is of availability, at least if we have a figure, and we all agree on that figure, it puts some stability into the system and lets us then use that as our starting point for conversations, use it as our starting point for investigating poor availability, use it as our starting point for agreeing that we need to improve availability and talking about how we're going to do that.

David: I guess in practice, maybe using that as a framing for decisions as well. We talked a little bit about metrics being bad reference points for operational decisions. But I think what we're saying here is that, even if the way that we calculate the metric isn't exactly the best representation of underlying performance.

If it points the organization in a certain direction in the way that it thinks about its purpose and its decisions, then maybe it makes decisions in a certain way, which isn't all bad, which is interesting. I hadn't thought about metrics in this way until I read this paper, which is, sure, it might not be measuring what we want it to, but it might be framing decisions in a way that we actually do want it to.

Drew: To be honest, I don't really buy the argument. I think it's a sophisticated way of saying, yeah, it's a bad measure, but it's just one source of information, and if it sparks good conversations, that's positive. The reason I don't buy it is you could have those conversations anyway. You could have those investigations anyway. You would know that availability was bad anyway, because you've got to put in all these extra arrangements, borrowing aircraft from the military, because you don't have good availability.

I don't really think you've got good evidence that the percentage measure of availability, using that as a metric, and publishing that as a metric, is helping the situation at all. I think you've got the conversations anyway, I think you've got the negotiations anyway, I think you've got the shared understanding and the shared lack of understanding anyway.

If anything, I think they're trying to use this simplified figure to hide the problem. It was just that the problem was so big, that even though they could manipulate the percentage figure, the public still knew that there were bad things going on because they could see the military jets operating in the air ambulance service.

For all of the stakeholders, they were trying to use that figure to achieve their political ends. It was just that the situation was so bad that they had to do the investigation. I could see it as possible. I just think you'd need better evidence that the measure was actually helping, rather than the general situation was going to lead there anyway.

David: Yeah, I think that's important. Thanks, Drew. I think that's an important point. I personally wouldn't underestimate the pressures and the tensions that come into organizations based on some of these metrics. Even though maybe the conditions exist, the fact that you've actually put a number around it, and it might be 75%, when your target was 95%, I think when you actually start to label things, and you do have an indicator around things, I think that creates pressures and tensions that automatically make something really important to an organization.

Even though those conditions might exist, and you see the military aircraft, and maybe you're not delivering the service as you might want to, as we said at the start of this episode, once it starts becoming a number, once a whole bunch of stakeholders start talking about it, maybe it does actually focus an organization's attention on, what can we do about this? For a while it might be, how do we manage the number?

If it's 75%, 76%, 77%, you can only massage that number to a point before you actually start having to make some different operational decisions about how you run that service. I'd love to see the research in these core operational metrics, how it actually frames the different decisions that organizations make.

Drew: David, the other point I'd throw in there is that, where I think this gets really negative is when there is very little that can be done, except for the massaging of the number. I think that's similar to the case here. We've got a limited number of pilots. We've got a limited number of hours they can work. No amount of pressure is going to miraculously create new pilots.

We have a similar thing in healthcare with waiting times. Things get reduced to certain key performance indicators for the healthcare system. Yes, that drives media attention. Yes, that drives political attention. But if anything, it leads to oversimplified interventions.

We get the government then manipulating the figures about how many new nurses they're providing, how many new beds they're providing. They're just using new simplified statistics to talk about the things that they're doing to fix the problem, which has been oversimplified. Are we actually graduating more doctors, graduating more nurses, and putting more money into the system? There's always going to be a limit as to how much that pressure is actually helping, rather than just driving perverse behavior.

David: Yeah, Drew. It's a great point, the pressure on the system. I guess I was more thinking about where there is freedom of movement and opportunity, where we've got some genuine system constraints. Then that pressure, obviously, isn't going to be constructive in the way that I'd like it to be. In this case, we've got a certain number of pilots, a certain number of aircraft, a certain number of dollars inside the contract. All these constraints are fixed, so I think that's a great point.

Drew, I think at that point in time, additional pressure in the system isn't going to change operational outcomes. Additional pressure in the system is just going to make people frame things differently.

Drew: Yeah. One other point they make, and this is a point I think they're very persuasive on, is they explain that the reason the procuring organization is focused on this availability percentage is that it's tangible and comprehensible to outside stakeholders. Remember, part of this whole contract change was they wanted to use more sophisticated ways of measuring and tracking availability.

They knew the problems with their old way of measuring it, they'd thought about it, they'd looked at other contracts that had worked better and provided better incentives, they'd changed the definitions, changed the measurement, changed the contracts. But because the media and the politicians are focusing on this raw percentage, the whole organization ends up back focusing on the raw percentage and putting pressure on the contractors over the raw percentage again.

I think that explains a lot about why organizations get stuck with things like lost-time injury rates. It's not that they're stupid, it's not that they don't see the arguments, it's not that they don't see the evidence and the problems. It's that this is a figure which other people value and find understandable.

You can't internally replace it with something sophisticated at the same time as you're externally reporting something simplified because your internal systems have to link up with those outside systems. You can't, with a straight face, try to manage this outside number, and then within your organization, tell a totally different story about what matters, what's valued, and what's measured.

David: Yeah, I think that's a great point. I thought you might even scream louder about the analogy between the availability percentage that we've spoken about throughout this whole episode and the lost-time injury rate for safety. I think how you described this is the reality. The reason we're still having this debate 20 or 30 years later about metrics in safety is that this simplified metric is tangible, understandable, and easy to communicate.

Even though inside organizations there's a broad understanding that maybe it's not quite representative of overall safety system performance, and it's not representative of how we think about things, at the end of the day, how many people did you hurt? In the conversation we're having today about air ambulance services, at the end of the day, how often were planes available to fly? I don't know how we're ever going to get away from that desire to boil things all the way down to a very simple way of understanding, is the company doing what we want it to do?

Drew: I don't think we can ever get away from it, but I do think we can try our best not to contribute to it. We'll get to the takeaways in a moment, but I think this is where we should be very skeptical of things like making companies publish their safety figures, and doing things like making companies listed on the ASX include certain figures in their annual reporting, whether it's for safety, environmental sustainability, or community goals.

Those seem like good ideas. Those seem like they're increasing the amount of information. But we should be really cautious about how that causes people to have to dumb down how they're thinking about those things on the boundary, and therefore also how they're thinking about those things within their organization. What we think is transparency may actually be forced simplification.

David: Drew, I don't think these non-financial aspects of an entity's performance are any different from any other aspect of organizational performance. We could have said the same thing about profit. We know that there's a lot of pressure on the accounting treatment of finances within organizations based on delivering a certain level of profit, statutory profit, underlying profit, or whatever it is that people are paying attention to in that organization.

I think any time we put a number on an aspect of organizational performance, we need to understand that it's a gross oversimplification of the operating performance of that company in respect to that metric. The company is going to be doing whatever they can to tell a story around that number, or make that number be something that aligns with the story they want to tell. I think that's the broad takeaway in relation to organizational performance from this paper.

Drew: Thanks for that, David. Should we move from broad takeaways into our specific takeaways?

David: Yeah, tell us what you think, Drew. Tell us what you think we can do about this.

Drew: Sure. I think the first takeaway is that the case study in this paper shows very clearly that the argument about metrics and indicators isn't just some academic problem about what is statistically valid. This is a really good example of how the choice of a particular metric leads to people repeatedly spending time and effort improving the indicator at the expense of safety. That was happening even before this whole case study, under the old contracts, and it was illustrated all through the new contract. Having metrics that aren't quite what you want them to be leads to perverse behavior.

David: Drew, we know that. We've linked it to lost-time injuries. We've known throughout the decades that putting a target or focus on lost-time injuries changes the way people classify things. People spend time improving the indicator at the expense of trying to improve the underlying conditions that are causing incidents in the organization.

Drew: The second one is that when we put indicators on things, they can make complex and less measurable factors overshadowed or invisible. That takeaway is almost a direct quote from the paper; I just thought it was a really important one. Just beware, it's not just that the metric itself might be a problem, but there are other things that might fall into the background and get less attention because we're focusing on the metric.

The third one is specifically about financial penalties. This is when we're subcontracting. When we start to tie things like financial penalties to indicators, that can create counterintuitive incentives that can actually work directly against what we're trying to achieve.

David, I'm not a lawyer, and you're not a lawyer. You know much more about contracts than I do, but this is really an expert field that non-experts should steer away from. Just putting clauses into contracts doesn't always achieve what you think it does, particularly when it comes to things like financial penalties.

David: Yeah. Like you said, even targets inside contracts. If I've got a contractual obligation to provide 95% availability for an air ambulance service, but, as you mentioned earlier, that's measured as a quarterly outcome, and for the last two and a half months I've been tracking at 98% or whatever, and I work out that in the next two weeks I actually don't have to provide much service to meet my contractual obligation, then I can move aircraft and pilots for those two weeks onto a different contract where I might be able to catch up.

That target and that financial reward doesn't actually serve my intended purpose, which is to maximize the availability of this particular contract. I think how we think about objectives, targets, and rewards in these situations, is really important.
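As a rough sketch of the arithmetic behind this quarterly-averaging loophole (the 95% target, 98% running rate, and 13-week quarter are illustrative assumptions, not figures from the paper):

```python
# Hypothetical sketch: with a quarterly-averaged availability target,
# a strong running rate lets a provider cut service late in the period.
def min_required_rate(target, achieved, weeks_done, weeks_total):
    """Minimum average availability over the remaining weeks that still
    keeps the whole-quarter average at or above the contractual target."""
    weeks_left = weeks_total - weeks_done
    shortfall = target * weeks_total - achieved * weeks_done
    return max(0.0, shortfall / weeks_left)

# 11 of 13 weeks done at 98% against a 95% target: the final two weeks
# need only about 78.5% availability, freeing capacity for other work.
floor = min_required_rate(target=0.95, achieved=0.98,
                          weeks_done=11, weeks_total=13)
print(f"Minimum availability needed in the remaining weeks: {floor:.1%}")
```

The better the provider is tracking against the target, the lower the floor for the remaining weeks, which is exactly the incentive problem being described: the reward structure pays for hitting the quarterly average, not for maximizing availability every day.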

I don't have the answer, necessarily, but I think a lot of times in these arrangements, we just put these numbers, put these timeframes, put these outcomes together, and we don't necessarily think about how that shapes decision making for companies in the supply chain that are already under a lot of pressure. They're just going to do whatever they can to maximize their gain in any way. We shouldn't judge them for doing that. That's the business model that we've created for them.

Drew: Exactly. David, the question we asked this week was, do safety performance indicators mean the same thing to different stakeholders? The very short answer is no. A slightly longer answer is that metrics sit on the boundaries between stakeholders. They always seem to be performing that communication function, but we should just be cautious that they can be communicating different things in different directions across that boundary.

David: Drew, that's it for this week. We hope you found this episode thought provoking and ultimately useful in shaping the safety of work in your own organization. Send any comments, questions, or ideas for future episodes to us at feedback@safetyofwork.com.