The Safety of Work

Ep.55 Are injury rates statistically invalid?

Episode Summary

Welcome back to the Safety of Work podcast! In this episode, we discuss whether total recordable injury rates are statistically invalid. Further, we talk about what that would mean on a practical level.

Episode Notes

The paper we use to frame our discussion is one that has been making the rounds on social media recently. We thought it was important to seize on the opportunity to discuss a work safety issue while it’s top of mind for the public.

 

Topics:

 

Quotes:

“I’ve noticed in Australia, at least, there’s an increasing move to have safety statistics included in annual reports, at least for publicly traded companies.”

“And their conclusion was: Almost all of it was explained by randomness.”

“If recordable injury rates are used to reward performance, then we’re actually rewarding random variation.”

 

Resources:

The Statistical Invalidity of TRIR as a Measure of Safety Performance

Feedback@safetyofwork.com

Episode Transcription

David: You're listening to the Safety of Work Podcast, Episode 55. Today, we're asking the question, are total recordable injury rates statistically invalid? If so, what does that mean practically? Let's get started. 

Hi, everybody. My name is David Provan and I'm here with Drew Rae. We're from the Safety Science Innovation Lab at Griffith University. Welcome to the Safety of Work Podcast. In each episode, we ask an important question in relation to the safety of work or the work of safety, and we examine the evidence surrounding it today. 

Drew, we've got something a little bit different today from what we normally review. What's today's question?

Drew: David, this is a little bit of a quick turnaround, which is partly due to the interest of the paper and partly because our ability to record episodes in advance is getting crunched as we come up to a busy time of the year. 

There's a paper that's been making the rounds on social media that has attracted interest and that we were interested in ourselves. It's by the Construction Safety Research Alliance. The lead author is Dr. Matthew Hallowell. The title for the paper is The Statistical Invalidity of TRIR as a Measure of Safety Performance. We thought rather than doing anything sophisticated, we'd just pick up the paper and talk about what we think about it. We'll give you a bit of a summary of the paper, talk about some of its key findings, discuss what we consider to be sort of strengths and weaknesses, and what we think the findings mean in practice.

David: Drew, I think for as long as I've been working with you, you've had a working paper sitting there unfinished. I think the title is something like, We Don't Kill Enough People. Is this a case where, I suppose, someone has beaten you to the punch with a paper that you were intending to publish at some point in time?

Drew: Absolutely, David. I think I had the cooler title out of the two, but they've written the paper I wanted to write and they've worked out a couple of ways of explaining things that I was finding hard to explain. They solved the big problem I had. Some of the things about injury rates are actually really well-known, statistically. They're not actually new findings, so it's a bit hard to publish academic papers where you're stating what, to academics, is the obvious. 

They've come up with a really neat solution, which was just, yeah, forget about the academic publication. Just send it straight out to industry as a paper that they've done themselves. Yeah, I'm a little bit jealous. I can hardly say I've been scooped about something that people have been complaining about for 30 years. Yeah, I wish I'd written this paper myself.

David: Yeah. We'll talk a little bit about the format of an industry white paper and how it's more accessible to certain audiences. I think the authors of this paper clearly had an audience in mind when they were choosing this particular format to publish.

Drew, do you want to describe a little bit about the nature of the paper? It was published by the Construction Safety Research Alliance, which is sort of an industry-academia collaboration. Do you want to talk a little bit about the paper, how these alliances work, and where they get their funding from?

Drew: I don't know a lot about this particular group, but it appears to be a structure that works. It seems to work a little bit more successfully in America than elsewhere, where industry pulls together in a consortium. A number of companies that are interested in research and have like-minded problems and questions go to researchers, pool their funding, and then use that to sponsor particular projects that all of those companies are interested in.

The advantage is that the funding goes directly to questions that the companies are concerned about. You don't have the hassles of grant applications and competitive funding, and it usually leads to higher quality projects that are targeted directly at industry needs.

That publication is what we call a white paper. The idea is it's been self-published, it's publicly available—there's no paywall—anyone can read it, but the downside is that it hasn't been peer-reviewed. 

Now, we've talked a bit before on the podcast, David, about how the peer review process works. The fact that something hasn't been peer-reviewed doesn't necessarily mean it's bad. It just means that it hasn't been scrutinized. There's a slightly higher obligation on readers to think about the methods and think about the types of things that peer reviewers would look at rather than trusting that someone has looked at that for them. 

I should say that a version of the paper is being published in a journal called Professional Safety. Even though Professional Safety is technically peer-reviewed, it's an industry journal. This is not intended as an academic type publication.

David: One of the advantages, Drew, I think, is that in a white paper you can write for your audience. You don't have to write in a heavily academic tone that you'd be normally writing for peer reviewers of academic papers. It provides the opportunity to make the content a bit more accessible for a non-academic audience.

People will see this, read the paper, and I'm sure some of our listeners will have already read it. Others can just click on it in the show notes and get open access to it. It means that if we're going to look at it critically, we have to look at who the authors are. Are they authors who have some academic credibility, so we can trust to some level the information that they're communicating without that peer review process? Do you want to give us some insight into who the authors are?

Drew: The lead author is Professor Matthew Hallowell. He works at an American university, the University of Colorado. His title is Professor. He has published a lot, interestingly, almost entirely in construction journals and conferences rather than in the key safety journals. I don't know quite why that is. It tends to be that people find their niche where they're comfortable publishing, where they know what peer review turnaround they'll get, and tend to publish in a narrow slice of places.

Another author is Professor Matt Jones, who is a psychologist. His expertise is in experimental psychology and learning systems. Reading between the lines, I'm thinking that he's the one who really knows his stuff when it comes to statistics. He was brought in as an author to provide the internal peer review on the statistical arguments.

The other authors are senior people in industry. These are the people who understand how injury rates are collected and used in practice. It's pretty much exactly the list of authors you'd want—a safety expert, a statistics expert, and some safety industry people.

David: Let's go for first impressions. We'll go through how the research was done and all of the findings. Just to spoil a few things that really jumped out: it was good to see the statistical work that had gone into confirming things that I'd assumed, like the lack of a relationship between injuries and fatalities. What that means practically, we'll talk about throughout the podcast. Then there's the randomness of movements in TRIR, and what that means and doesn't mean for how we can even think about using injury rates in any way to understand the safety of our businesses.

There are a few things that weren't nice to see, because they give us a real problem in safety, but it was good to see such a large body of data statistically confirming those sorts of things that we tend to believe.

Drew: I also like the fact that even though this paper gives a pretty damning account of injury rates, and the conclusion really should be that no one with credibility should use them, they don't quite go that far in their recommendations. They do provide a couple of fallback positions, which I think provide a practical compromise for people who have no choice but to report injury rates and are looking for ways to escape from some of the traps we've got ourselves into, where everyone reports them so everyone else has to report them.

David: Let's talk about a summary. We'll make a few assumptions. We'll make the assumption that our listeners are familiar with total recordable injury rate. What we're really talking about here is the number of injuries that you have in your organization over the number of hours worked. Depending on where you are in the world, it might be how many injuries per 100,000 hours, per 200,000 hours, per million hours. Really, it kind of doesn't matter. It's just the rate of incidents per hours worked in your organization. How do organizations use these injury rates, Drew?

Drew: In the paper, they've got a list of different types of reporting and decision-making that use injury rates. They say companies use it to report results. They use it to benchmark against their peers. This is saying, this is what our injury rate is. This is what someone else in a similar industry has as their injury rate. They use it to prequalify and select contractors.

David, I don't know about you, but I've seen that directly myself. Contractors actually have to fill out on the form: what is your total recordable injury rate over the past two years? They also use it to evaluate managers, which is not one I'd heard of before, but I guess it makes sense. If your injury rate goes up, the safety manager gets a slap on the wrist.

David: Oh, look, I think it's used in reward schemes. Many organizations would apply financial bonuses for managers, would promote managers who'd been seen to reduce injury rates in certain areas, and would demote or change managers who'd been seen not to improve recordable injury rates sufficiently. I think it's very much used as an individual evaluation method.

Drew: To track the impact of safety initiatives—this is knowing whether what you're currently doing for safety is working or not and whether it's making a difference—they say it may influence insurance premiums, may influence public opinion, and be scrutinized by investors. I've noticed in Australia at least, there's an increasing move to have safety statistics included in annual reports, at least for publicly traded companies. Some companies definitely use their injury rate as a flagship on their marketing materials.

David: Very much so, Drew. In sustainability reporting in the GRI and others, it's quite a prominent indicator. I think our common criticisms would be, we talk a lot about lagging indicators. We talk about it being reactive. We talk about it not differentiating between injuries of different severities, these minor injuries versus fatalities. 

These criticisms I think I might’ve mentioned on the podcast before. There's a paper that was published in a safety journal I like, I think in 1993 or 1994. It has a title of something like, Thankfully we've seen the end of using lost time injury rates as a measure of safety performance, which was the start of the real positive performance indicator movement. Which is what? 26 or 27 years ago?

Like you said, this is not new. It might be a new way to look at a large data set and look at some statistics. It's not new, but this paper also doesn't do what we haven't managed to do in the last 25 years, which is line up behind some other kind of alternative measurement. I think we'll get to that at the end of the podcast.

Drew: One thing they don't mention that I think is definitely worth a mention whenever we talk about injury rates is the problem of how much the rate gets distorted by both reporting and classification of injuries. It's quite possible in many organizations that total recordable injuries is an imaginary figure, so distorted is it by what gets reported and what gets classified as work-related or not work-related. But in this particular paper, they're not concerned with the fact that it's lagging, or about underreporting, or that it doesn't differentiate between injuries of different severities.

If you're arguing with anyone on LinkedIn and they're making comments about those things with injury rates, that's not actually what this paper is about at all. This paper almost takes as a given that we've got good data. It says, given that we've actually got good data about injuries, these are still the problems that exist with total recordable injury rates. David, is it okay to move on and start talking a little bit about those statistics?

David: Yeah. Tell us about the statistics, because I must admit some of the statistics went over my head. Even for a white paper, I was out of my depth when I was reading through some of it, and it was probably the most technical white paper that I've seen. I think the statistical argument being made necessitated the heavy statistics. If you're happy to explain them, Drew, I'm happy to let you.

Drew: Let's start off with the very basic concept of validity. There's two types of validity that we care about when we're measuring safety. There is construct validity, which is, are we measuring the thing that we think we're measuring? Then there's statistical validity. Statistical validity is often spoken about a lot by academics and really misunderstood by practitioners. 

The common misunderstanding is people think that things can be good up to a point, then they become statistically valid, and then they're extra good. Whereas the reality is that statistical validity is the minimum bar for something to have any meaning at all. Anything that doesn't have statistical validity has no practical use, no matter what its other qualities are.

A common example is if one politician is ahead of another politician in an opinion poll. We say that difference isn't statistically meaningful or it's within the margin of error. We're not saying one politician is only slightly ahead. We're saying they are so close that we don't know who is ahead. 

When we say that an indicator has no statistical validity, we're saying that when it goes up, the reality might be going down. When it goes down, the reality might be going up. It's so close that we don't actually know what's going on from the indicator. 

Then, if injury rates, or TRIR as they're called in this paper, aren't statistically valid but we're using them to make decisions, then we don't actually know whether a company is getting safer or getting less safe, even though the numbers might appear to be dropping. It means if we're using it to compare contractors, we don't know which one has the better safety record, even though one has got a lower number. It means we don't know whether the safety manager deserves their bonus and is doing a good job even though the score has gone down. It means we don't know which safety initiatives are working.

You can't get out of this by saying, oh yeah, we use recordable injuries, but it's only one of the indicators we use and we use a package of indicators. If it's not statistically valid, it should have no part in any package of measures. If something doesn't have statistical validity, that's it. It has no use as an indicator. That's the question they're mostly asking in this paper, that and a couple of other questions. Statistical validity is the big one: can it do at all what it says on the tin?

David: To achieve that statistical validity with all these statistical tests, for someone who doesn't spend a lot of time with heavily quantitative statistics, what we're looking for is sufficient pattern in the data to have confidence that things are moving around in some sort of predictable and reliable way. We can see that the real problem here is just the sheer randomness of the movements of the numbers.

Drew: Yes, exactly. The paper starts off with a description of some stuff. This is the bit that I'm really jealous of because this is the bit that I wanted to put out in the paper. You can only be told the same stories so many times before it gets boring. It comes originally from a book called The Law of Small Numbers, which was used to calculate how many soldiers in the Prussian army would get kicked to death by horses. It's a famous story in statistics, and it's what led to Poisson distributions.

Safety data doesn't work according to the normal bell curve distributions that we're used to seeing in other types of statistics. It follows these Poisson distributions.

The first thing I saw when reading this paper is, yes, they've got it right, just the fundamentals that so many people miss by using the wrong family of statistics. The original use of Poisson distributions drew on 20 years of data from 14 cavalry corps. That gives you some indication of the large numbers you need to be able to make statistical claims about very small numbers of people getting hurt.

They've got this figure. It's Figure 1 in the paper, if you're reading it for yourself. I checked with Matthew, there is actually an error in this figure. It's supposed to be either percentages or decimals, but not both. I read this as percentages and it gives some idea of the variability. 

It says that if your injury rate is truly 1.00, this is the actual underlying risk that people face, 37% of the time you'll get 0 injuries, 37% of the time you'll get 1 injury, 18% of the time you'll get two injuries, 6% of the time you'll get 3 injuries, and the rest of the time you'll get more than 3 injuries. Even if you've got just a constant level of safety, the numbers you're getting can easily roll around anywhere between 0 and 3, which is why if you see your figures sort of going from 0 to 1 to 2 to 3, back to 0 up to 6, that's just normally what you'd expect with a constant risk. Your thoughts, David?
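For readers who want to check those percentages, here is a minimal sketch in Python (not code from the paper) that reproduces the Poisson arithmetic Drew describes, assuming a true rate of 1.0 recordable injury per reporting period.

```python
# Minimal sketch (not from the paper): Poisson probabilities when the true
# underlying rate is 1.0 recordable injury per reporting period.
from math import exp, factorial

def poisson_pmf(k, rate):
    """Probability of observing exactly k injuries when the true rate is `rate`."""
    return rate ** k * exp(-rate) / factorial(k)

rate = 1.0
for k in range(4):
    print(f"P({k} injuries) = {poisson_pmf(k, rate):.0%}")
print(f"P(more than 3) = {1 - sum(poisson_pmf(k, rate) for k in range(4)):.1%}")
```

The printed values land on the 37%, 37%, 18% and 6% figures quoted above, with roughly a 2% chance of more than three.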

David: Yeah, look, it took me a while to get my head around. This is probably because I had the bell curve in my head: if TRIR is truly 1.0, then why wouldn't you have the same chance of 0 and 2? That was kind of my limited understanding of the statistics. Then, the more I looked at it, the more it made absolute sense to me.

I think that's the range thinking we need, which we'll talk a little bit about later: there's no point saying that TRIR is 1.0. The honest answer is that my TRIR is probably somewhere between 0 and 3, but I don't really know where it is.

Drew: That's the answer they give. They say that we should think of these things as ranges rather than as point estimates. If you think it's 1.0, then actually it's really a confidence interval somewhere between around 0.2 and 5.7. That's what we should say. Instead of saying it was 1.0 this month, we should say it's somewhere between 0.2 and 5.7.

Then, if that's the range, then if next time it's also within that same range, then it hasn't gone up or down. It's just within the same outcome that you'd expect for a fairly constant risk. They also give some fairly useful ballparks for how many work hours you'd need to get a more precise estimate.

I think this is a really good way of explaining it. It's basically saying that as we work more and more hours, we can get more and more precise estimates. You've got to work a heck of a lot of hours to narrow it down even to one decimal place, to be accurate enough for a one-decimal-place difference to mean anything; that is, instead of claiming that it is 1.0, claiming that it's something like 1.1 or 1.2.

To claim it that precisely, you need around 300 million hours of worker exposure for each calculation. Unless you have that many worker hours, you shouldn't be reporting things to one decimal place. To report it to two decimal places, you need 30 billion worker hours. I thought that was just a really good way of explaining that if you're putting decimal places, you're just talking nonsense. It can't possibly be that precise.
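As a rough check on those ballparks, here is a back-of-envelope sketch. It is my own reconstruction under a simple Poisson model, not the paper's exact derivation, and it assumes a rate expressed per 200,000 hours.

```python
# Back-of-envelope sketch (assumes a simple Poisson model; not the paper's
# exact derivation): exposure hours needed for a ~95% interval of +/- half_width
# around a true rate of `true_rate` injuries per `per_hours` worked hours.
def hours_needed(true_rate, half_width, per_hours=200_000):
    # Expected count over H hours is true_rate * H / per_hours, so the standard
    # error of the estimated rate is sqrt(true_rate * per_hours / H).
    # Solving 1.96 * SE <= half_width for H gives:
    return (1.96 ** 2) * true_rate * per_hours / half_width ** 2

print(f"One decimal place (+/- 0.05): {hours_needed(1.0, 0.05):,.0f} hours")    # ~307 million
print(f"Two decimal places (+/- 0.005): {hours_needed(1.0, 0.005):,.0f} hours") # ~31 billion
```

Both figures land close to the 300 million and 30 billion hours quoted above.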

David: I think many of our listeners and many organizations would report their TRIR to two decimal places and not have 30 billion hours. Just doing the quick maths, one person works 2,000 hours a year, so for 300 million hours you'd need to report a year of statistics for 150,000 people to have some chance of one decimal place.

I think, Drew, our reporting inside organizations clearly doesn't match the statistics. I'll ask you now for your view. I did try this with one organization at one point. The TRIR was 4.0, then the next year you want it to be 3.0, then the next you want it to be 2.0, then it went up to 2.5 and that was bad, and then we wanted it down to 1.0.

We were talking to that organization about just saying, if it's under 5.0, who cares? Anywhere between 0 and 5.0. I suppose that doesn't quite work if it's at the upper end, because then it could really be somewhere between 2.0 and 10 and you wouldn't know.

Definitely, the mindset I was trying to get that part of the organization into was to say, who cares whether it's 1.0 or 2.0 or 3.0 or 4.0? Just pick a big range and say, look, if it's anywhere under 5.0, we don't care anymore whether it bounces up and down. Does that sort of match some of the findings here, or would you want to be more specific about your range?

Drew: Sorry, David. I've actually got two conflicting thoughts about this. The first one is that if your actual risk is fairly high, it can still bounce down below that level of 5.0, so that still could be quite misleading. I think if you had a genuine target of 1.0 and that was your long-term average, then it would really be quite reasonable to say, okay, we know that it's going to bounce around. We know that for a steady risk, it's going to be easily up to 6.0. That's our warning sign: if ever we get above 6.0, then there's probably something going on, because there's a really low chance of that happening by chance.

You could do a similar thing if you thought your long-term average was 5.0. You could say, okay, we're going to expect it to go around anywhere from 0 to 10. If it's above 10, that's our warning sign.
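One way to turn that idea into a rule of thumb is a simple upper limit on the count of recordables, along the lines of a control chart. The sketch below is an assumption on my part, not a method from the paper: it treats the expected number of recordables in a period as a Poisson mean and finds the count you would only exceed about 1% of the time by chance.

```python
# Sketch (my assumption, not the paper's method): a Poisson-based warning limit.
from math import exp, factorial

def poisson_cdf(k, mean):
    return sum(mean ** i * exp(-mean) / factorial(i) for i in range(k + 1))

def warning_limit(expected_count, tail=0.01):
    """Smallest count k such that exceeding k has probability < `tail` by chance."""
    k = 0
    while 1.0 - poisson_cdf(k, expected_count) >= tail:
        k += 1
    return k

print(warning_limit(1.0))   # 4: with an average of 1 per period, 0-4 is unremarkable
print(warning_limit(5.0))   # 11: with an average of 5 per period, 0-11 is unremarkable
```

The exact threshold depends on the tail probability you pick and your exposure per period, which is why Drew's spoken figure of 6 for a long-run average of 1 is just a slightly more forgiving cut-off.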

David: I think that matches for some people. I don't think we're necessarily advocating measuring it, but we also understand the practicalities of how hard it is to remove these numbers from your organization. Maybe just to restate that as a practical takeaway: if you've got a really low TRIR of 1.0, don't get worried until it hits 5.0, and have your organization understand that. If you're setting key risk indicators around these things, again, pick a number that's double or five times your average, set some limits, and really pay no attention to the number at all until it hits that limit. At that point it's less likely to be just statistical chance.

Drew: Yes, and they make a suggestion that we might talk about again when we get to practical takeaways, which is just, whenever you report it, report it with those confidence intervals. Just get into the habit of saying it is somewhere between this and this, and the most likely place for it to be is this particular number. If we can get into those habits of using the statistical language correctly, we can start creating the impression that these are ranges rather than very accurate numbers.
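For anyone who wants to generate an interval like the 0.2 to 5.7 mentioned earlier, here is a sketch. I am assuming a flat-prior Bayesian credible interval on the Poisson rate, which happens to land close to that quoted range; the paper's own interval construction may differ.

```python
# Sketch (assumes a flat-prior Bayesian credible interval; the paper's own
# method may differ): the range consistent with observing 1 recordable injury
# in 200,000 hours, i.e. a reported "TRIR of 1.0".
from scipy.stats import gamma

observed_injuries = 1
posterior_shape = observed_injuries + 1          # flat prior on the Poisson mean
lo, hi = gamma.ppf([0.025, 0.975], posterior_shape)
print(f"A 'TRIR of 1.0' is really somewhere between {lo:.1f} and {hi:.1f}")
# -> roughly 0.2 to 5.6
```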

The second thing they do in the paper, David, is a calculation of the amount of randomness. I don't think it would be particularly constructive for either of us to try to explain in detail how you do the randomness calculations. Basically, all the stuff so far doesn't actually need real TRIR data to do the calculations they've done; it's just a raw fact about these types of distributions. For this bit, they have a lot of data from the companies that they're working with. They've got a number of companies, lots and lots of months of data. They can look at comparatively how these things move around within a single company and how they compare between companies. They can ask, is there any indication that these differences are due to some underlying factor or trend that is moving the rates around?

The idea is you use a model that goes something like: there's an underlying variable that causes it, plus there is an amount of randomness. We ask how much of it can be explained by differences in the underlying variable and how much of it can be explained by the randomness. Their conclusion was that almost all of it is explained by randomness. If you have two TRIRs and they're different, that's mostly just because of random movement. It's not because of some fundamental difference between the two months, the two companies, or the two situations.

David: They put a number on that, Drew: 96%-98%. That's saying that of all the movement of all the TRIRs in all of the data, basically 96% to 98% of it can be put down to randomness. That feels like a very big number.

Drew: Well, one way to think about it is to think of this as a signal and noise. If it's 95%-98%, your signal is 2%-5%, and the noise over the top is all of the rest of the movement. If you're trying to listen to a radio with that much noise and that much signal, you'd just be hearing white noise.
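A toy simulation makes the same point. In the sketch below (my illustration, not the paper's analysis) every company has exactly the same underlying risk, and the monthly exposure figure is an assumption; the reported rates still jump around dramatically.

```python
# Toy simulation (not the paper's analysis): identical underlying risk,
# wildly different reported monthly TRIRs.
import numpy as np

rng = np.random.default_rng(1)
TRUE_RATE = 1.0            # injuries per 200,000 hours, the same for every company
HOURS_PER_MONTH = 50_000   # assumed monthly exposure for a mid-sized operation

expected = TRUE_RATE * HOURS_PER_MONTH / 200_000     # 0.25 expected injuries per month
counts = rng.poisson(expected, size=(10, 12))        # 10 companies x 12 months
monthly_trir = counts * 200_000 / HOURS_PER_MONTH    # convert counts back to a rate

print(monthly_trir)   # mostly 0.0, with jumps to 4.0 or 8.0 purely by chance
```

None of the differences between the rows reflect anything real about the companies; in the real data, almost all of the spread turns out to behave the same way.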

David: That randomness is really important, statistically, for us to understand. We all know what happens in our organizations next month when the TRIR is up, and what happens next month when it's down.

The second thing the paper went into some detail on was the relationship between recordable injuries and fatalities, and it drew much the same conclusion: they couldn't really find any statistical relationship between recordable injuries and fatalities. The white paper even refers back to some of the more recent work.

We've had debates about Heinrich's work on triangles and pyramids, depending on what you want to call them. Really, this big statistical exercise shows that you can't look at your recordable injuries and infer anything about your risk of having a fatality.

Drew: Yeah, I don't know if this one is best described as beating a dead horse or rubbing it in. Once you've established that all of your movement is random, then of course, it's not going to correlate with something real. That's basically what they found. They went looking to see whether that small percentage of underlying pattern correlates with fatalities or can predict the fatalities.

In terms of statistical methods, there are more things that they could have tried here. They were basically just looking for a straight relationship. You can do things like correlation over time, test whether it predicts one month in advance, two months in advance, three months in advance, and do a sliding statistical pattern to check for every possibility; they haven't been thorough to that extent. As I said, once you've established the first bit, this is an inevitable conclusion anyway. When you're looking for a pattern inside noise, the answer is noise.

David: I think, practically, for people who say, and I must admit this is something that I believed at a point in my career, that our recordable injuries are worth looking at, that our TRIR is a barometer for our broader management of safety, including fatality risk: that's just not true.

If people think that they're measuring their recordable injuries as a barometer of safety or as some kind of early indication of fatality risk, it's not true. If you're interested in fatality risk, looking at your recordable injury rate will give you no insight at best and probably false confidence at worst.

Drew: Yes, absolutely. Before we just move on to the findings, we mentioned at the start that this is a white paper, it hasn't been peer reviewed. I'm fairly good at reviewing statistics. I spotted a couple of minor errors at the level of typos. Their statistical arguments here are fundamentally sound. They've used the right types of distributions, the right types of models, the right types of tests. They could have been a little bit more thorough in just trying harder to find some sort of thing that the TRIR predicted. They tried pretty hard and found nothing.

David: Drew, let's go through the six findings that are called out in the paper. Most of them we might have already mentioned, but we're restating them so listeners can pay close attention, because these are all of the arguments around recordable injury rates. The first is that they're not associated with fatalities. Statistically, after this paper, we should probably never ask that question again of whether there's a relationship between recordable injuries and fatalities.

Drew: This has been asked and answered with multiple data sets. This is yet another nail in the coffin. No one has ever found a strong relationship between total recordable injuries and fatalities.

David: Drew, the second is that movements in recordable injury rates are almost entirely random.

Drew: Yeah, their work certainly bears this out. I find it rather surprising because I actually would have assumed that the social pressure to have constantly reducing TRIR would create an artificial stability and that there would be a pattern there caused basically by distortions in reporting and distortions in classification. Maybe it's just that their industry partners are unusually honest in giving them unfiltered data.

David: Yeah. Drew, so recordable injury rates cannot be represented as a single point estimate. Saying your rate is 1.0 is not something that makes any sense.

Drew: Yeah. That's not an empirical finding, that's just a mathematical statement supported by their mathematical explanations in this paper.

David: Drew, recordable injury rates are not precise. If you're reporting a recordable injury rate with one or two decimal places, it's not going to make any statistical sense.

Drew: To be precise, if you have 300 million work hours, you're allowed to go to one decimal place. If you have 30 billion work hours, you're allowed to go to two decimal places. Otherwise, no decimal places.

David: Okay, very good. Some very large organizations are probably going to be able to keep a decimal place. If recordable injury rates are used to reward performance, then we're actually rewarding random variation.

One year, randomly, rates will go down and people get a bonus. The next year, rates will go up and people won't get a bonus. In neither of those two years have you actually rewarded people for anything other than random variation in their statistics. Or, I suppose, Drew, maliciously, the ability to creatively classify the incidents that they do have.
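To put a number on that, here is a toy sketch (not from the paper; the expected count of five recordables per year is just an assumed figure) of how often a year-on-year "improvement" shows up when nothing about the underlying risk has changed.

```python
# Toy sketch (assumed figures, not from the paper): how often does next year's
# count come in lower purely by chance, when the underlying risk is unchanged?
import numpy as np

rng = np.random.default_rng(0)
expected_per_year = 5                      # assumed expected recordables per year
years = rng.poisson(expected_per_year, size=(100_000, 2))

improved = (years[:, 1] < years[:, 0]).mean()
worsened = (years[:, 1] > years[:, 0]).mean()
print(f"'Improved' by chance: {improved:.0%}, 'got worse' by chance: {worsened:.0%}")
# -> roughly 44% each way, with the rest being exact ties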

Drew: Yes, and if you get someone who's regularly earning those bonuses, that's a way of saying it's pretty much certainly fraud. If anyone is being honest, their numbers are going to go up and down randomly. You're rewarding random variation or you're rewarding an inability to read the statistics.

That's true of any decision we make. It doesn't just have to be a reward decision. If it's a decision about which subcontractor to hire, or a decision about which safety program to persevere with, those decisions are just as suspect. We're making those decisions based on random variation.

David: The final finding, not a half-positive thing but a thing with a glimmer of positivity around injury rates, says that if recordable injury rates are predictive at all, it is only over very long periods of time, say 100 months of data. If you look at your recordable injury rates over eight or more years, there may be a small chance that there's some hidden predictability in there.

Drew: Yeah, what they don't say in this paper is you've got to keep everything else stable. If you make any major changes to your company over those eight years, you've got to start again. 

Keep everything flat for eight years, make no changes, let the injuries happen, and then you'll have good data at the end of it to make a decision using TRIR.

David: Drew, there are six findings. I suppose we're going to have listeners who are going to say, so what? I knew all this stuff. Maybe there is a bit of a so what? I suppose there is a lot of evidence behind this now and there is a white paper. If you're actually trying to initiate some change in your organization, now might be a pretty good time to actually pick this up and remake some of those arguments that you're making to your organization. 

The authors offer some key takeaways, Drew. Let's go through these and throw in some of our own practical takeaways as well.

Drew: Number one is just don't use injury rates as a proxy for serious injury and fatality risk. I think that's a fairly practical one for practitioners. If you have to use injury rates, when you put up the slide, put a title on the slide about your minor injury measurement. Then have a separate slide to talk about major injury risk. Just sort of split the two.

David: Correct people when they make these claims about the relationship between the two. 

Don't use recordable injury rates to track internal performance or to compare the performance of companies, projects, individuals, or teams. This is practically going to be hard for a lot of our listeners inside organizations to say we're not going to track our safety performance internally using recordable rates. We're not going to compare our contractors or pre-qualify our contractors based on their rates. We're not going to look at one project versus another project, one asset versus another asset. This one here is what companies do. This is actually what they do with their recordable injury rates, Drew. The authors here are saying don't do that.

Drew: Yeah. That's one that safety professionals may not be able to fight against fully. At the very least, we can stop making claims ourselves about it. Unless we're told to, we can leave them off our resumes. Unless we're told to, we can leave them off our reports. At the very least, don't buy into using it for those purposes and don't encourage it.

David: Drew, if you have to do some of that reporting, then safety professionals should change how they communicate recordable injury rates, like reporting the range and not reporting the decimal places.

For example, if we want to compare two sites, rather than saying this site has a TRIR of 2.0 and that site has a TRIR of 3.0, you might say that each of them is really a range of something like 0 to 5.0. Just talk about ranges and say, well, all these sites are basically within the same random bucket of rates.
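As a concrete illustration of that reporting style, here is a sketch using the same flat-prior credible interval as earlier. Again, this is my assumption, not the paper's method, and the 200,000 hours of exposure per site is an assumed figure.

```python
# Sketch (flat-prior credible intervals, my assumption): two sites reporting
# TRIRs of 2.0 and 3.0 on 200,000 hours each are statistically indistinguishable.
from scipy.stats import gamma

def trir_range(injuries, hours, per_hours=200_000):
    lo, hi = gamma.ppf([0.025, 0.975], injuries + 1)   # interval on the count
    scale = per_hours / hours
    return lo * scale, hi * scale

lo2, hi2 = trir_range(injuries=2, hours=200_000)
print(f"'TRIR 2.0' is really {lo2:.1f} to {hi2:.1f}")   # roughly 0.6 to 7.2
lo3, hi3 = trir_range(injuries=3, hours=200_000)
print(f"'TRIR 3.0' is really {lo3:.1f} to {hi3:.1f}")   # roughly 1.1 to 8.8
```

The two ranges overlap almost entirely, which is the "same random bucket" David is describing.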

Drew: Takeaway four.

David: Takeaway four was missing, Drew, as you noticed. Earlier I made a claim about the sort of thing peer review might have picked up, and this is probably one of them. I don't want to undervalue the peer review process, because it's very good, and with a few reviewers it tends to be a pretty detailed process. Or it might have been an awful joke, just about statistics and numbers, to leave it out.

Drew: I hadn't considered that possibility. Takeaway five is don't use injury rates to measure the performance of interventions. Just because your injury rates have gone down doesn't mean the intervention has been a success. Just because they've gone up doesn't mean you should throw away the intervention. Find some other way to measure performance.

David: I might jump in now before we go on to number six, because we talked, I think it was on the podcast last week, about communication campaigns, and the dependent variable there was injury rates.

What does this mean for the academic world? Injury rates are often used as a dependent variable. They’re used a lot in safety science research. What does this report mean for how you design research in the academic world?

Drew: David, I've been thinking about this a fair bit recently because we've had a couple of listeners provide feedback and sort of take us to task for the fact that we are generally critical of injury rates. Then, we sometimes talk positively about studies that have injury rates as one of their measures.

I think there's a couple of definite things we can say and some gray areas. The definite thing would be that if we're talking at the level of epidemiological studies, where we're looking across whole industries, whole populations, or whole countries, and we're using that level of data, then injury rates are reasonably meaningful. We have enough data points, we have large enough numbers, that we get away from these statistical validity problems.

If we're talking about single companies, then injury rates are just as bad for research as they are for any other company decision. A company has got to be hurting an awful lot of people for that to show up as a difference in a trial about whether an intervention works or not. Whether it's an academic paper or a company report, if someone says we put this intervention in place and our injury rate went down, really, that's probably just random variation.

David: I think we also talk a bit on the podcast about being clear on the direct mechanism or the direct variable of what your intervention is trying to achieve. I think that sort of flows into the final takeaway here offered by the authors, which is that new approaches to safety measurement are needed. 

Like I said, it's about identifying the things that we think we're trying to change to create safety, and seeing if we can measure those things that we think we're trying to change.

Drew: Yeah, I don't blame the authors for not putting out positive suggestions here; that's not the purpose of this paper. It's a very easy thing to say that we need new approaches to measurement. It is much, much harder to find what those new measures should be, because every measure has one of two problems. Either it has the exact same statistical problems as injury rates, that we just don't have enough data points, or it has construct validity problems, that we don't really have a proven connection to safety.

I agree with you, David, that I think measurements based around the mechanisms are really important. What we mean by that is if you’re measuring the effectiveness of your inductions, then don't worry about whether they're going to change the rate of injuries. Inductions are intended to work by the mechanism of changing people's awareness of hazards. Measure whether your induction has changed people's awareness of hazards or not and just accept that as a metric.

David: Drew, does it leave us in an infinite loop or an infinite leap of faith? Even that thing that you said there, that awareness of hazards, we're still making an assumption that awareness of hazards is something that actually creates safety. There's still a link that's always going to be missing.

Drew: This is actually the point that I recall Erik Hollnagel was originally trying to make with Safety-I and Safety-II, which is so misunderstood. That infinite loop can only really be solved, at least in Hollnagel's view, by stopping saying that safety is about the number of injuries, and by recognising that trying to measure safety with that end-to-end connection is a futile path.

From a more pragmatic point of view, you can build the chain step by step. You can look at whether your inductions create hazard awareness. You can look at whether hazard awareness changes workplace behaviours. You can look at whether changes in workplace behaviours change injuries. You're never going to be able to do it end-to-end in one study, but you build up each of those steps carefully. Then you have a strong evidence-based chain for why good inductions might improve safety outcomes.

David: The authors finally conclude that recordable injury rates have remained—I'm sort of quoting here—the most pervasive measure of safety for 50 years. This study has demonstrated a basic need to test all of our assumptions in relation to safety management. 

I think that's what we try to do each week on the podcast: test a lot of these assumptions. These authors did a good job this week, in a fairly accessible format, of answering that question about the statistical validity of recordable injury rates. Drew, what would you like to know from our listeners?

Drew: I'm curious this time about the nature of this particular paper. I think this is the first time where we have been a little bit behind the curve in that people were already reading and discussing the paper before we tried to sort of draw attention to it.

I'm interested in whether people saw the paper when it came out, whether they felt inclined to read the whole thing. How did you find the non-academic style, even though it was fairly heavy on the statistics? Was this style more like an industry report? Easier to read and easier to follow? 

What other safety topics would you like to see addressed in this sort of white paper format? Maybe we can get some other people to write similar reports of similar quality.

David: Before we wrap up, Drew, a couple of our listeners asked for our views on this particular paper. I think they were asking us to run a credibility lens over it as a white paper. Drew, you're an editor of a safety science journal and a peer reviewer. What would be your overarching message to people, when they're reading this, about the credibility of what they're reading?

Drew: Yeah, if you find the statistics hard, I think it's safe in this case to just sort of blur over the statistics and trust that they have been done properly. Focus on the clearly stated conclusions that come out of the analysis that they've done. The findings are supported by the work and the takeaways come directly from the findings.

David: Here's a free challenge for our listeners. I encourage and challenge any of our listeners to prepare a little summary PowerPoint slide of this particular paper, the findings and the takeaways, and go and present that to management. If you do that, tell us how it went.

Drew: Yeah, I think there are some real gems in there of good ways of explaining things. I think just a couple of the figures and examples will just be very easily translatable for people to take and explain within their own organizations. 

Our question for this week was: is TRIR statistically valid, and what does this mean? Our answer?

David: Drew, our answer is no, they really aren't. We should really stop using them. If we have to use injury statistics, which we don't advocate, at the very least we should stop pretending that they're precise, report them in ranges and in whole numbers, and really not pay too much attention to them unless they move outside the range of statistical randomness.

Drew: That's it for this week. We hope you found the episode thought-provoking and ultimately useful in shaping the safety of work in your own organization.

As always, send any comments, questions, or ideas for future episodes to feedback@safetyofwork.com or contact us on LinkedIn.