The Safety of Work

Ep.40 When should we trust expert opinions about risk?

Episode Summary

On this episode of the Safety of Work podcast, we discuss whether and when we should trust expert opinions about risk.

Episode Notes

To frame our conversation, we use one of Drew’s papers to discuss this issue. This paper, Forecasts or Fortune-Telling, was born out of deep frustration.

Tune in to hear our discussion about when or if it is appropriate to listen to experts.





“Is it best to grab ten oncologists and take the average of their opinions?”

“But there is this possibility that there are some people who are better at managing their own cognitive biases than others. And it’s not to do with domain expertise, it’s to do with a particular set of skills that they call ‘super-forecasting’.”

“As far as I understand it, most organizations do not use complicated ways of combining expert opinions.”



Forecasts or Fortune-Telling


Episode Transcription

You’re listening to the Safety of Work Podcast, episode 40. Today we’re asking the question: when should we trust expert opinions about risk? Let’s get started.

David: Hey, everybody, my name is David Provan and I’m here with Drew Rae. We’re from the Safety Science Innovation Lab at Griffith University. Welcome to the Safety of Work Podcast. Regular listeners would know that every 10 episodes, we give ourselves the opportunity to talk about one of our own papers. In previous episodes, we have talked about papers that we’ve written together—Drew and I. But this one I personally wasn’t involved in.

Drew is probably going to do most of the talking and I might play a bit of Q&A type of role. But Drew, do you want to give us some background into why you wrote this paper and what the research question was?

Drew: Sure. Papers usually start off with some sort of curiosity or desire to investigate. This particular paper started with deep frustration. Listeners may know that I’ve got an unpaid side gig as an associate editor at the journal Safety Science. When you’re working as an editor at a journal, you get to see loads of papers that never make it to publication. Of course, the ones that don’t make it to publication are not as good as the ones that do get published. The ones that do get published aren’t perfect either. 

The papers tend to come in waves of lots of similar papers that use the same techniques and make the same mistakes. You’re sitting there as an editor, going through a list, rejecting, rejecting, rejecting similar papers for similar reasons. It starts to sap your confidence, because when everyone else is disagreeing with you, the two explanations are either you’re right and everyone else is wrong, or there’s something that you’re not seeing.

I thought it was time to do a deep dive to check my reasons for rejecting the papers, just to make sure that it wasn’t me alone in this sea of disagreement, and that this was something the academic community was behind me on. What was common to all of the papers in this particular group is that they were using survey or interview techniques to gather subject matter expert opinions about risk. 

And then rather than just reporting that as if it was an opinion poll, they were using some form of complicated mathematical modeling or processing to create a new answer that they thought was more accurate or valid than the original opinions. David, that general idea is something that tends to happen more in academia than in real life. But asking people’s opinions about risk is something that we do in lots of organizations.

David: Yeah, Drew. We do it. We do it all the time. We talked about risk matrices and decision-making way back on episode eight. Risk assessment is this real cornerstone of safety management practices. We maybe don’t critically reflect on it as much as we should, as much as you have in your academic career, Drew.

The paper that we’re talking about today is Forecasts or fortune-telling: When are expert judgments of safety risk valid? Drew, you’re the first author and the second author is Rob Alexander. It was published in Safety Science in 2017. Our listeners might remember from episode 20, when we talked about our manifesto paper, that Rob Alexander was also a co-author with us on that paper. Drew, do you want to talk a little bit about the publication?

Drew: Sure. Rob Alexander’s a close personal friend of mine from the University of York. He tends to play the role of the supervisor in our collaborations. Just as with a Ph.D. student, where they’d do the writing and I would check it and ask the hard questions, when I do my writing, Rob’s the one who asks the hard questions and pulls me up when I’m getting convoluted.

There were two questions we wanted to answer in the paper. The first was a very practical question: what can we currently claim about using expert estimates of risk as data? The secondary question is, are there particular methods for asking for those opinions, or combining those opinions, that somehow increase the validity and make them better than just asking the experts and reporting what they say?

David: Drew, if I recall correctly, when we were working through our safety work versus the safety of work paper, a couple of times we sent it to him to get his opinion on it. But this paper, Drew, doesn’t actually have a method section. You described the paper as a critical synthesis of the evidence from multiple disciplines. It’s more of a literature review. If people recall our episode 36 on management fads and fashions, we described that as a narrative literature review, and this follows a very similar format.

Drew: I’d be interested in our listeners’ opinions, if they grab a copy of the paper, on this style of writing. It’s something that Rob and I play around with occasionally: trying to make our papers as readable as possible by throwing away some of the standard conventions and not having an introduction, methodology, background, results, discussion, and conclusions. The technique is to make the outline of the paper itself readable. You could skim just the headings, and that should tell a story in its own right.

That does mean doing things like getting rid of the method section. But I’d argue the paper does have a lot of methodology in it. We thought deeply about some of the key questions, like who is an expert, and how do you tell who is an expert? What does it mean to estimate risk? What does it mean to be valid? I try to answer those questions rather than the boring questions like what database we used, what search terms we used, and how many papers matched those searches.

I don’t think people really learn much from that in a literature review so much as they learn from engaging about the key questions.

David: Yeah, Drew. I also like the way, in your paper (and we’ve done it in some other papers), you frame each section of the paper, or each heading in each section, as a question. What we’re going to talk through now will be questions such as: can expert judgment of risk be made more accurate? Is there such a thing as a risk estimation expert? Are multiple experts better than one expert?

When you’re thinking about this field, this idea of when should I trust or rely on an expert assessment of risk, these are the questions that should be in your mind.

Drew: Yes. If the paper works correctly, all of the main headings are questions, and all of the subheadings are answers to those questions.

David: Let’s dive in, Drew. Let’s work our way through the paper, because I think it’s a fascinating question, and one that our listeners are, or should be, interested in, because people come to them, if they’re a safety professional, as an expert to provide certain estimations of risk. And I’m sure many of our listeners, safety professionals and others, rely on engineers, doctors, and a whole range of professions as experts to tell them the chance that something is or isn’t going to happen.

The first part of the paper asks: is there such a thing as a risk estimation expert? Drew, tell us what the literature says in this space.

Drew: Okay. The first answer to that question is just that expert is a really ambiguous term when you’re talking about risk assessment. In the literature, there are two main definitions that get used. The first one is that an expert is someone whose judgment gets accorded extra weight. It might be because of their qualifications, their experience, their domain expertise, or some other signal of authority.

And then the second definition is that an expert is someone who makes particularly accurate estimates. We often blur those two things together. We assume that the people we give extra weight to are the same people who are good at making estimates. But they’re absolutely, definitively not the same thing.

It’s certainly the case that when we give extra weight to some people, we probably in the back of our mind assume that they have some special ability to tell us what the risk is. But that’s often very unproven—the link between the things that signal expertise and actual proven ability to make more accurate estimates of risk.

David: Drew, I was thinking when I read some of this—and I may have the wrong slant on it, so correct me if I’m wrong. Say a patient gets a cancer diagnosis and the oncologist tells the patient that they’ve got a 50% chance of surviving the next five years. That’s some kind of assessment of the mortality risk for that particular individual. Is that a time when we should be thinking of the oncologist as the expert and putting weight in their estimation?

Drew: I guess that’s the question we want to ask. What they’re doing there is definitely a risk forecast. The question is, if you want to know what’s your chance of surviving cancer, is it best to grab an oncologist? Is it best to grab 10 oncologists and take the average of their opinions? Is it best to take some 15-year-old Norwegian kid who likes to play on prediction markets and take their opinion?

We assume that the best oncologist is the person who’s going to give us the most accurate estimate. That’s usually what we mean by expertise: someone who’s got that domain expertise, and we think that’s going to lead to them giving us good information.

The scientific literature suggests that there are three reasons why an expert might be better than just pulling someone off the street. The first one is they could have some sort of private information. That is, they can make a better estimate because, in the back of their mind, there’s all of this data that we don’t have.

The example might be if you want to know about what’s your chance of being in a traffic accident, you ask a traffic expert. They already know how many people are killed on the roads each year. They know where those things tend to happen, what time of the day they tend to happen. They can use all of that secret information to give you the answer.

The second reason an expert might be better than someone else is they could have domain knowledge. This is probably why you think the oncologist is going to be good at making the estimate. This is just different from knowing the average information. They’ve got a really good understanding of how things work. 

The oncologist looks at your particular cancer and they say, oh, I can see that this is a particularly aggressive cancer compared to other ones. Or a structural engineer looks at a bridge design and says, that’s a bad design. It’s going to fall down. It’s less about knowing the odds and more about knowing how to analyze the situation using domain knowledge.

The third one, which is kind of fun, is there could be a thing called a super-forecaster. The idea of a super-forecaster is someone who, regardless of domain knowledge or private information, is just really good at extrapolating from the past to the future. Later in the paper, we’ll talk a little bit about this idea of the super-forecasters, because that might be what we mean by expertise: it’s not about knowledge, it’s about the ability to crunch the numbers.

David: Yeah. I think, Drew, there are people and professionals who call themselves futurists, who consider themselves as having the ability to predict more accurately than others what might happen in the future, regardless of the question that’s thrown at them.

Drew: Yeah. There’s a whole field of research about whether such people are in fact better than other people at doing that.

David: Very good. We might get to some of that. Moving on to the next section: we’ve just looked at what an expert is, and then there’s a section in the paper about why expert accuracy is difficult to study. Basically, you say there’s no real way of studying this unless you actually get people to make a whole heap of predictions and then let the passage of time show whether they were right or not.

Drew: Yes. We wanted to make sure that people who read the paper had an idea of what counts as good evidence that an expert is good at their job when it comes to forecasting. The classical way that people study this is they have a bunch of estimation tasks where the testers already know the true answer. And then you ask the experts to give their answers and then you compare them. 

You might measure the expert based on both their accuracy and their confidence. You can say, did they get close to the answer? If we ask them to give maybe upper and lower bounds on their own estimates, does the correct answer fall inside those bounds? Is the expert good at knowing whether their own guess is a good guess or not?
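The scoring Drew describes can be sketched in a few lines of code. This is a hypothetical illustration, not a method from the paper: we score an expert on both accuracy (how close the best guess is to the known answer) and calibration (how often the truth falls inside the expert’s own stated bounds). All names and numbers are invented for the example.

```python
# Illustrative sketch of a calibration study: score an expert on accuracy
# and on whether known answers fall inside their self-reported bounds.
# All function names and numbers are hypothetical, not from the paper.

def score_expert(estimates, truths):
    """estimates: list of (low, best, high) tuples; truths: known answers."""
    abs_errors = []
    hits = 0
    for (low, best, high), truth in zip(estimates, truths):
        abs_errors.append(abs(best - truth))
        if low <= truth <= high:  # truth falls inside the expert's own bounds
            hits += 1
    mean_error = sum(abs_errors) / len(abs_errors)
    hit_rate = hits / len(truths)  # e.g. "90% bounds" should capture ~90%
    return mean_error, hit_rate

# One expert's answers to three questions where the testers know the truth.
expert = [(8, 10, 15), (90, 120, 180), (0.5, 2, 4)]
truths = [12, 100, 5]
mean_error, hit_rate = score_expert(expert, truths)
print(mean_error, hit_rate)
```

A well-calibrated expert is one whose hit rate roughly matches the confidence level they claimed for their bounds, regardless of how sharp their best guesses are.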

But the trouble with these sorts of studies is you can only do it if the testers know the answer. That’s not like any real-world risk estimation task that we want people to do. If we already knew the answer we wouldn’t be asking experts to guess for us. An expert might be good at that particular task—the one where we know the answer—but not be able to take that skill out into the domain where they don’t know the answer.

In fact, there have been some clever studies that have shown this. There’s even a problem that, even within this restricted domain where the testers know the answer, there’s still the question of who gets to decide, based on the only available data, what the correct answer is. It could be that the experts and the testers believe the value is one thing, while the laypeople we’re calling wrong or inaccurate are just different people from the testers who rely on different types of information. That doesn’t necessarily mean they’re wrong. It just means that they value different information.

David: Drew, we’ve talked quite generically about risk estimation. But obviously, you and Rob, when you were thinking about this paper, were really interested in system safety and estimating the risk of major accident events and so on. The paper then goes on to argue that experts don’t have access to privileged information about safety risk. That raised a question I might throw to you later: what is a safety professional actually an expert in?

When you say an expert doesn't have access to privileged information about safety risk, what do you mean there?

Drew: Definitely, experts have got information. There’s no question that safety experts in particular should have some idea of the relative likelihood of particular things, how often they’ve seen particular things, or which hazards are most likely to occur. But the trouble is, all of that is historical data. When we ask them to do an estimation task, we’re not asking them historically what’s the chance of this particular hazard. We’re asking them, in the future, what’s the likelihood of this particular hazard.

That said, experts are generally better than laypeople when it comes to historical odds. For example, David: we’re launching a rocket, what’s the chance of it blowing up?

David: Maybe 1 in 400.

Drew: Okay. The thing is I actually know the answer to that question. It’s really easy for me to outperform you on this particular test. I know that typically, rocket launches succeed on launch somewhere between 95% and 97% of the time. The chance of it blowing up is more likely somewhere between 1 in 20 and 1 in 40. I drastically outperform you on this task.

David: By an order of magnitude, too.

Drew: This is fantastic. We have a brand new rocket, and we have to forecast this. Now we both have access to the historical data. Do I have any extra power over you at guessing the likelihood of the new rocket?

David: I suppose if I knew details about the design of that rocket compared with the design of the previous rockets in the historical data, I might be able to make some sort of adjustment to the historical information based on that. Whereas I would probably just form the view of going, well, I guess the engineers are probably a bit smarter, and give them an order of magnitude reduction in risk, in the hope that they’d learned from what happened previously.

Drew: I think that’s exactly how they made the claim that the space shuttle was going to have a risk of about 1 in 400. This is where the distinction is: when it comes to privileged information, the experts don’t have it. But they may have some sort of understanding of the causal mechanisms. It turns out that this mostly works on very simple problems where you’re using a very small number of physical laws. 

You certainly can’t apply it to launching a rocket, but you can do it with something like: what’s the weather going to be like tomorrow? I could have all of the same data that the guy on channel nine has about tomorrow’s weather, and they’re still going to come up with a better answer than me, because they know how to read and interpret weather data.

David: But Drew, I roughly recall a statistic at some point saying that even the guy on the weather only gets it right about 30% of the time, given the complexity of the system that they’re trying to model and understand?

Drew: Yeah. Weather forecasting has actually got increasingly better, even over our lifetimes. That’s a good sign that there is increasing knowledge of the underlying mechanisms; the ability to build that understanding into the models, and into the interpretation of the models, lets them perform really, really well. That’s where forecasting expertise really comes into its own: very well-understood problems, where the better we understand the problem, the more accurate the forecast is going to be.

But we shouldn’t confuse that sort of stuff with broader questions that go beyond those models. The weather forecaster is a terrible climate forecaster, and the climate forecaster doesn’t necessarily have any expertise in weather forecasting. Neither of them is any good at judging what the global reaction to climate change is going to be, or whether climate change is actually going to happen or not.

David: Drew, the next part of the paper talks about domain experts may have a superior understanding of causal mechanisms. 

Drew: That’s most of the point I was just making. We do need to differentiate between tasks that are really scientific modeling tasks and tasks that are forecasting tasks. Give someone a good model and good physical laws, and they can tell you how things are going to play out. That’s usually about a single, very specific hazard rather than a broader domain.

The next one is this idea that experts are somehow able to be more objective and free from bias in their estimates. There was a lot of literature around in the 1960s and 1970s that talked about how bad laypeople were at risk estimation, about all the distortions in risk perception that people had. I was actually a little bit surprised myself, when I got into this, to realize that most of that literature has now been rejected, even by the people who wrote it.

The general consensus now is that experts not only suffer from the same cognitive distortions and biases as everyone else, but there are some extra special ones that experts themselves are particularly prone to. But there is this possibility that there are some people who are better at managing their own cognitive biases than others. It’s not to do with domain expertise; it’s to do with a particular set of skills that they call super-forecasting.

Super-forecasting has a whole body of literature, and it’s probably worth an episode on its own. But the way they study this is very different from traditional expert forecasting. Remember, the problem is: how do you give someone a fair test when it’s supposed to be a forecasting problem but you already know the answer? 

The way they do this super-forecasting research is to make the study last multiple years. They get their experts to make predictions now about future events that no one knows the answer to. As those events happen or don’t happen, they then check whether the experts were right or not, work out who was better than other people, and what sorts of techniques they used.

They find that some people are definitely better than others at that task. It seems that they’ve got particular behaviors and cognitive skills that they use to filter data, to interpret data, to update their forecasts, and to communicate with other people about the risk.

David: Drew, it sounds like there are some capabilities, not so much in the domain expertise the person holds but in more individual skills, that enable them to make potentially more accurate forecasts, or at least be more immune to some of those challenges over time. But are there things that we can do to make expert judgments of risk more accurate, through processes or outside influence?

Drew: That’s the big question that I really wanted to know the answer to, because it has real practical implications for real-world risk assessment. If particular techniques are more accurate than others, then we should all be using those techniques. The first answer is a definite yes: there are some ways of asking questions that influence the validity of the expert forecasts.

The literature talks about substantive goodness and normative goodness. Substantive goodness is whether the expert knows the answer. And then normative goodness is whether they’re able to turn that answer into a probability that other people can use. How we ask the expert the question definitely changes the probability that they come out with.

If we follow good elicitation techniques, good ways of asking experts their opinions, we can get different answers. Interestingly, the absolute worst way to do it is to ask them a series of different questions about different risks in a row. In other words, a typical hazard identification and risk assessment workshop, where each question follows on from the last, is the absolute worst environment for eliciting opinions.

David: Drew, I’m curious about that, because that’s what we tend to do in a risk workshop. We go: let’s talk about this risk, now let’s talk about a different risk, now another one. The alternative would be to follow a line of inquiry about a particular circumstance, talk about it in a range of different ways to try to narrow in on a more comprehensive understanding of the situation, then park it, and maybe have some time before moving on to the next topic. Or what, practically, would you do to try to change that process?

Drew: There are a number of different issues there. One of them is just that having different people in the room at the same time is going to mean that the aggregate of their opinion is going to be mixed up in social factors, which we’ll talk about a little bit in the next section of the paper. But the second one is that there are a number of framing effects, where a risk number that you’ve just given acts like the upper or lower bound for the next opinion you give. You don’t give it independently; you give it adjusted based on the previous one.

That adjustment is not calibration. It doesn’t get you a consistent assessment; it gives each one a compounding distortion. By the time you get to the end of the session, you’re not really evaluating the risk at all. The best thing is to have a clear head and approach each one from scratch, without having recently talked about other numbers.

David: Yup. That makes sense. I can imagine, when we talk about those social factors, that a loud voice at the start saying something can’t happen makes it very difficult for someone to come out and say, I reckon that’s going to happen tomorrow.

Drew: Yeah. There’s a thing called anchoring, which says that even if you do disagree with that person, and you have the social capital to shut them down, the number they said still affects what you say, because you’re going to adjust based off that number: I think it’s lower, I think it’s higher. That in itself gives you a different opinion than if you’d come at it from scratch.

David: Is there anything else that you wanted to say in relation to the training that can be provided, or how to break up the task or the way that we elicit the expert opinion?

Drew: The only other thing I’d throw in there is that giving people some practice exercises in coming up with probabilities, and then giving them feedback, improves their ability to turn their domain knowledge into probabilities. Giving people some training exercises before they start the risk assessment task, particularly if they’re not used to creating probabilities, definitely helps their accuracy.

The other one is there’s no real consensus on whether you can improve expert performance by breaking the task up into little bits. There’s one school of thought that says: we’re not going to ask the experts what’s the chance of a fire; we’re going to have a fire model, and then we’ll ask them about all the different inputs. Rate the flammability from 1–10. Rate the building materials from 1–10. Rate the fire protection from 1–10. For that sort of decomposition task, there’s no real evidence that it ends up in better answers.

But the bit that I really want to talk about is this bit about multiple experts. David, you’ve heard of the phrase “wisdom of crowds” before?

David: Yeah, Drew. When I read this section, I actually read the notes that you’d prepared before I went and reread the paper, and I thought, oh, there was that guy a hundred years ago who did this at a fair. And then when I read the paper, I realized that you referenced it. It immediately reminded me of Francis Galton, and I looked it up: in 1906, at a county fair in Plymouth, he had 800 people try to guess the weight of a slaughtered ox sitting in the middle of the fairground. Out of those 800 guesses, he just took the median, which was 1,207 lbs. That was accurate to within 1% of the real weight of 1,198 lbs.

Drew, that’s what it immediately reminded me of. But I hadn’t heard the term wisdom of crowds before.

Drew: Yes. That experiment is pretty much where it all started. Oddly enough, it’s also where it all ends. The absolute question here is: if you ask a number of different people the same question, does the group as a whole outperform just picking the best individuals in that group? The wisdom of crowds isn’t about whether the crowd is better than an average person. It’s whether the crowd is better than picking out one or two experts from the crowd and asking them.

That really depends on how you combine the experts. The first question is: what if we combine the experts socially? Instead of doing it mathematically, we just shove them all in a room and say, give us an answer. There’s a lot of research about groupthink. The result is that if you do that, on average, the majority opinion tends to win anyway. You’ve wasted time getting them to talk about it; you may as well have just got them to vote from the start and picked whatever the majority said.

The exception is if there’s a minority opinion that is able to be persuasive. In which case, the minority opinion wins through being able to justify its opinion better.

David: Drew, if we follow, for instance, Galton’s experiment: question one is whether 800 random people at the fair are going to be better than 5 butchers, who deal with weights and meat every day, put in a room together. And the second is, what if four of those butchers say one thing and one butcher says something quite different, but then that butcher says, actually, this morning I was weighing a carcass that looks exactly like this one, and this was its weight. Is that what you mean?

Drew: Hold that thought, because I just want to lead you through a couple of findings that get us directly to the answer to that question. I’m going to have to quote directly from the paper to explain this. But the too-long-didn’t-read version up front is that if you mathematically combine the opinions of a bunch of experts with equal ability (let’s say we’ve canvassed 20 butchers), then we will get a better answer than if we ask one butcher, no matter who that butcher is. 

Let’s start with two butchers, and say the true answer is somewhere between those two butchers’ guesses. Mathematically, the average will always be closer to the true answer than picking one of the two butchers at random. We call this effect bracketing. Now, that’s not true if the true answer doesn’t fall between the two guesses. But if it doesn’t, we don’t know which butcher to pick anyway. Unless we’ve got some way of picking out the correct butcher, we are better off just taking the average.
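The bracketing effect is easy to demonstrate with a quick simulation. This is an illustrative sketch, not from the paper; the truth value and the error ranges are invented. It checks that, whenever two estimates bracket the truth, the error of their average never exceeds the expected error of picking one estimate at random.

```python
import random

# Simulation of "bracketing": when the true answer lies between two
# estimates, the average of the two is at least as close to the truth as
# the *expected* error of choosing one estimate at random.
# All numbers here are made up for illustration.

random.seed(1)
truth = 100.0
trials = 10_000
avg_at_least_as_good = 0
for _ in range(trials):
    # Two butchers whose guesses bracket the truth on either side.
    low_guess = truth - random.uniform(1, 30)
    high_guess = truth + random.uniform(1, 30)
    avg_err = abs((low_guess + high_guess) / 2 - truth)
    # Expected error of picking one of the two butchers at random.
    expected_pick_err = (abs(low_guess - truth) + abs(high_guess - truth)) / 2
    if avg_err <= expected_pick_err:
        avg_at_least_as_good += 1
print(avg_at_least_as_good / trials)  # → 1.0: averaging never loses when bracketed
```

The algebra behind it: with errors x and y on either side of the truth, the average’s error is |x − y|/2, which is never larger than the expected random-pick error (x + y)/2.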

Then the question is: what about the crowd versus the bunch of butchers? Is it better to have just a bunch of butchers and ask their opinion, or to throw in all of the random fairgoers as well? The answer is that if we’ve got a good way of picking out the experts and creating a smaller pool, then each person we add is going to add both information and noise. There’s a magic point where you stop adding information and are just adding noise, making it worse. But we never know where that point is.

If it’s the whole crowd versus a crowd of butchers, definitely pick the crowd of butchers. But if we try to go within that crowd of butchers and pick out the best few to ask, we’re more likely to add error by trying to single out the best butchers than we are by just taking the average. 

David: If we just step outside the scenario for a minute into risk assessment in the workplace: we do these risk assessment workshops with an entire management team or a group of people, some of whom have closer domain knowledge of the risk and others who fit more into the random-fairgoer category. No disrespect to anyone in the organization, but people have different knowledge about different things.

What you’re saying is you actually don’t know if those people are adding valuable contributions or noise to an assessment about a domain that they don’t understand, or maybe aren’t an expert in.

Drew: That’s right. The only way you can really find out is to get them to actually interact socially about the risk, talk about it, and see what information they’re bringing in. The risk there is that once they’ve done that, it’s pretty much contaminated any chance of getting a clean number.

David: Drew, if it’s not incorrect to conclude that, look, two experts are better than one, and ten experts might be better than one, then depending on how you frame the questions, how you manage the situation, the training that’s provided, and all those other things you mentioned, are there any specific ways we can combine their opinions, as opposed to the simple average we’ve just talked about, and get a more accurate forecast of risk?

Drew: Now we’re getting down to the crunch question that got me wanting to write this paper in the first place. As far as I understand it, most organizations do not use complicated ways of combining expert opinions. Most organizations just throw people in a room and get them to come to a consensus.

But in the academic community, there is a huge bunch of people trying to come up with better methods of doing risk assessment. Some of the motivation is to get companies to use the method. Some of it is just to get academic publications. This is something where we absolutely do know the answer to the question. People have been trying to do this for 20 or 30 years, and no one has yet come up with a better method than simply taking a linear average of the expert opinions, giving every expert equal weight.
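[An aside from the editing desk: the equal-weights linear pool Drew describes is simple enough to sketch in a few lines of Python. The estimates below are made-up numbers for illustration, not figures from the paper.]

```python
# Equal-weights linear opinion pool: the combined forecast is just the
# arithmetic mean of the individual expert estimates. The input numbers
# here are hypothetical annual probability estimates from four experts.

def linear_pool(estimates):
    """Combine expert probability estimates with equal weights."""
    return sum(estimates) / len(estimates)

expert_estimates = [0.02, 0.05, 0.01, 0.03]
combined = linear_pool(expert_estimates)
print(round(combined, 4))  # 0.0275
```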

This is my challenge to anyone who wants to publish a new paper combining expert opinions using some new method. I’ve seen it with Bayesian Belief Networks, AHP (the Analytic Hierarchy Process), DEMATEL, neural networks, fuzzy analysis. None of it has any evidence that it outperforms a linear average. If you want to publish your paper on this, prove that your method gives a better result than simply adding all the experts’ opinions together and dividing by the number of experts. If you can’t prove that, your method is useless.

David: I think, Drew, what you’d be saying is make sure that the dependent variable is actually the accuracy of the risk forecast, not just how well the expert opinion aligns with the answer the researchers hold.

Drew: That’s it exactly.

David: Drew, there are a couple of things in the paper that I read through that we haven’t talked about, before we get to practical takeaways. I wouldn’t mind putting you on the spot about a couple of things, if you don’t mind—since you put me on the spot about the risk of a space shuttle blowing up on takeoff.

Drew: Absolutely, I’ve seen you do this to people on other papers.

David: You talk about the use of experts for estimating the risk of major accident events, and we do that a lot. We do fire and explosion modeling, we go to structural engineers, we get a whole raft of experts to try to understand, in system safety, the risk of a major accident event. And you mention in the paper that these MAEs happen too infrequently for past statistics to be a good indicator of risk. These things don’t happen all the time, and we can’t figure them out from frequency data alone.

For an asset that’s going to be around for, say, a 50-year life, what’s actually the difference between a risk that’s going to happen 1 in 1,000 years and a risk that’s going to happen 1 in 1 million years? So 10 to the minus 3 versus 10 to the minus 6 per year. How should we actually think about the difference in risk for something like that when we’re building something that’s going to be around for 50 years?
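[An editorial aside: the arithmetic behind David’s question can be made concrete. Assuming a constant annual frequency and independent years (a simplification, not something from the paper), the chance of at least one event over a 50-year asset life works out as follows.]

```python
# Chance of at least one event over an asset's life, assuming a constant
# annual probability p and independent years: 1 - (1 - p)^years.
# The rates below correspond to "1 in 1,000 years" and "1 in 1 million years".

def lifetime_probability(annual_p, years=50):
    return 1 - (1 - annual_p) ** years

print(round(lifetime_probability(1e-3), 3))  # 0.049 -- roughly a 5% chance over 50 years
print(round(lifetime_probability(1e-6), 6))  # 5e-05 -- effectively negligible
```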

Drew: What I would love is for people to realize that the question of risk is not a useful question. The question is, do we want to do it? Practically, what we mean by the difference between a risk that’s going to happen 1 in 100,000 years and a risk that’s going to happen 1 in 50 years is the implied idea that if it’s going to happen more often, then we’re going to try to do something about it, whereas if it’s going to happen really rarely, then we’re not.

History has shown us that is just a bad way of making engineering decisions. We’re not likely to make a better engineering decision after having done the risk assessment. We’re more likely to make a better decision if we just ask what is reasonable and appropriate for the design here, and do that. If you find yourself saying, let’s remove the safety feature because we think the risk is really, really low, then go and stare at the engineers’ code of ethics for a little while, come back, and remake the decision.

David: Drew, I threw you a probability there, and you also quote in the paper some very prolific researchers and writers in the risk space, Aven and others. They talk about replacing probability with uncertainty as the fundamental cornerstone of risk, because 1 in 1,000 versus 1 in 1 million doesn’t mean as much as saying, I’m actually 90% confident that this is right.

Would we worry more about a 1-in-1-year scenario that I’m 90% confident in, versus a 1-in-1-million-years scenario where I’m only about 10% confident that I’ve estimated it correctly, or that I understand the range of probabilities that particular risk could have?

Drew: Terje Aven is a much bigger name in the risk field than I am or ever will be. I disagree with hesitation, but also with confidence, because I think playing mathematical games with risk doesn’t improve our decision-making. I think a lot of the time, when people talk about uncertainty, they very quickly default back to these mathematical methods and mathematical models. As an engineer myself, I’ll say that is a comfort zone for engineers: turning away from existential angst and doing something that we can calculate.

But there are much more practical ways to think about uncertainty. One of them is don’t get people to do calculations. Get your 10 experts to write down what they think the risk is on a piece of paper. If they all agree and they all agree it’s serious and it’s a high risk, then you should respect that consensus and do something about it. If they all disagree, then you should say, hey, this is a really dangerous area of uncertainty where even the experts don’t know what the risk is. We ought to do something about it.

The only case where it’s confusing is if they all agree that it’s something that you shouldn’t worry about. We don’t really know what to do with that answer.
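[To make the heuristic concrete, here is a rough sketch from the editing desk of the decision rule Drew is describing. The threshold values are arbitrary illustrations, not anything from the paper.]

```python
# Interpret a set of independent expert risk estimates (scaled 0 to 1):
# - wide spread            -> the disagreement itself signals structural uncertainty
# - agreement on high risk -> act on the consensus
# - agreement on low risk  -> the ambiguous case Drew mentions
from statistics import mean, pstdev

def interpret_estimates(estimates, high=0.7, max_spread=0.15):
    if pstdev(estimates) > max_spread:
        return "experts disagree: structural uncertainty, take precautions"
    if mean(estimates) >= high:
        return "consensus on high risk: do something about it"
    return "consensus on low risk: no clear guidance"

print(interpret_estimates([0.80, 0.75, 0.85]))  # consensus on high risk
print(interpret_estimates([0.10, 0.90, 0.50]))  # experts disagree
```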

David: Drew, that leads into the last point before we get to the practical takeaways, where you mention in the paper taking a constructivist approach to risk: using experts and expert assessment of risk more for communicating risk than for quantifying risk. And I think, in that case, it’s about communicating which risks the organization should perhaps do something about, and which risks maybe they don’t need to worry about as much.

When I think about Fukushima and the situation that happened there, and the 1-in-100-years or 1-in-1,000-years surge height of a wave, the numbers all looked fine for the basis of design for that facility. But I wonder whether, like you said, a narrative-based assessment saying, well, we actually don’t really know what height to design for, might have changed decisions where the numbers didn’t.

Drew: Let me give you a slightly more recent example that I think really brings home that point. At the moment, around the world, there’s a number of countries that are still experiencing great difficulty with COVID-19. In fact, David, I think you are currently in lockdown in Melbourne with compulsory mask-wearing in place.

David: Yeah. You said some countries. I think the rest of Australia would really like Victoria to be another country at the moment.

Drew: Yeah. Imagine if, when COVID-19 first started being a threat, people had said clearly and responsibly, we don’t know much about how it’s transmitted. We don’t yet know what role masks will have in either protecting individuals or reducing transmission. But what we do know for sure is that we’re about to have serious mask shortages in hospitals where we desperately need those masks.

Honest communication of the risk information there, and the uncertainty (not as a mathematical thing, just as what do we know and what do we not know), I think would leave us in a much better place now, when the experts are saying, we’ve done the modeling, we do understand what’s going on, wear your damn masks.

People remember the conflicting advice. They remember that they were told with certainty, oh, you don’t need the mask, we’ve done a risk assessment. Being more willing to communicate what’s going on in our thinking and reasoning will actually help people make better risk decisions.

David: I like the way you’ve described that example, Drew. I had a few notes here about expert judgments of risk, the Centers for Disease Control, the WHO, and the things that they’ve said, what they predicted, and what’s actually happened. But our listeners can all form their own views in that space once they listen to this episode and think about that situation with COVID-19, all of the experts, and what’s transpired.

Drew, if we move to practical takeaways, this idea of when experts’ judgments about risk are valid and when we should listen: what does this mean for our conscientious listeners who are researchers, professionals, and decision-makers? They want to make use of the best available evidence. They’re listening to the Safety of Work Podcast because they want to understand and make use of this evidence.

But they also don’t want to overstate the quality and validity of that evidence. Many times, on this podcast, when we give views, and even when we talk about the research, we always outline the limitations and the uncertainty around that evidence. What should our listeners do?

Drew: One thing that we were very worried about when we wrote this paper is that we didn’t want people to take away the message that experts are bad at assessing risk, therefore let’s not listen to the experts. That’s absolutely not what the conclusion is. The trick is recognizing the difference between experts who are making domain-knowledge judgments about causal mechanisms and experts who are pontificating or guessing about probabilities.

When you have domain experts who understand what’s going on, who are trying to base their decisions and their advice on understanding the mechanisms, and they all agree with each other, absolutely we should listen to them. When you’re talking about things like climate change, do not substitute your own opinion, your own thinking about what’s going on, for the opinions of the experts who understand the causal mechanisms.

They understand it far better than you ever could, even if you can’t fully grasp the explanations. When a fire safety expert tells you the fire is going to spread in this particular way so you need to put a sprinkler in, listen to the damn expert. And even when the experts disagree, those contradictions tell you where there is structural uncertainty. They tell you we need to be less confident, less sure, and we need to take precautions. So the first takeaway is really an anti-takeaway: don’t take away from this that you should ignore the experts.

But the second one is that the way we’re currently using experts, including and perhaps especially the way we do it in safety research, is not supported by the evidence about what experts are capable of. We are using experts in ways that our own expertise about experts tells us is wrong. Be very careful about how you use experts, particularly when you’re trying to use them to make judgments about risk as an objective quantity.

David: Drew, just to step in there: if an organization sees a safety professional as an expert, and gets a new piece of equipment, and comes up to the safety professional and asks, what’s the risk associated with this piece of equipment, or is this piece of equipment safe, is that a situation where we should rely on what the safety professional says, or not?

Drew: There are two questions we can ask them. There’s the question you just asked, where the answer is basically a number. Don’t ask them that one, and don’t listen to them if they give you an answer to it. But if the question is, what sorts of things should we be worried about with this new piece of equipment, and what sorts of things should we be doing to manage it, that’s definitely the case where you want to ask the expert and listen to the answer.

David: And the next takeaway, Drew?

Drew: That leads on to what things you do want to ask your experts to do. One of them is to get them to describe uncertainty rather than quantify it. That applies to any risk assessment. It shouldn’t be, here are the numbers. It should be, what are the things that we’re not sure about? What should we be worried about? What’s missing? What’s unclear? Rather than trying to get a number or a classification for the risk.

David: I think there are some applications that do that really well. I’m thinking of oil and gas exploration and drilling activities, where they have geological models of subsurface structures, but they also identify things they’re really uncertain about, like formations or shallow gas. They adjust their drilling plans and well construction around that uncertainty. But they just talk about it. They talk about shallow gas and they talk about formations. They don’t quantify it a lot of the time.

Drew: That’s a technique that managers can use when they’re talking to technical experts. You might not actually understand all the details of the drilling plan or the drilling assessment. Just ask them: if you are wrong, where are you most likely to be wrong? And then, what extra resources do you need to find out about this? What do we need to do to account for that possibility?

The final takeaway I’d like to throw in is that it’s good for all of us to think about risk like social scientists rather than like engineers. That applies even if you’re an engineer yourself. The distinction is to think of risk assessments as tools for communicating and explaining decisions, not as tools for making decisions.

They are like the writing of the report; they’re not like the engineering calculations that go into the design. When we confuse one for the other, when we confuse the reporting with the analysis, that’s when we get into trouble. Risk assessment and expert opinion are much more about the reporting than about the actual analysis.

David: Drew, take your engineer’s hat off and take your management hat off and put your social scientist hat on.

Drew: Yeah.

David: Drew, what would you like to know from our listeners?

Drew: Do you use expert opinions in your own work? I think, David, we talked about formal risk assessment workshops. But I’d be interested in where else people tend to ask experts for their opinions, and do you do any expert polling in your work? Do you ever ask more than one expert, get a range of opinions, and try to add them together? If you do that sort of thing, then how do you decide who counts as an expert? 

Who gets to be in the room when you do risk assessments? How do you decide that question? Have you put much thought into how you get those opinions and putting in place deliberate steps to try to reduce the bias or to reduce the distortion from those experts?

David: Drew, the question we asked today was when should we trust expert opinions about risk. And you wrote the paper, so your answer?

Drew: Thankfully, David, you’ve already teed up a little summary answer for me, which I’m quite happy to agree with: we should trust expert opinions when there is a specific local causal mechanism that the expert understands really well, and they’re applying simple physical laws and explaining how the world works. And if we’ve got multiple expert opinions, we should just average them together and use the wisdom of crowds. We’ll get better estimates by asking more people and averaging their answers than we will by trying to pick out the key experts.

David: Thanks, Drew. And thanks for doing the heavy lifting on the episode this week. That’s it for this week. We hope you found this episode thought-provoking and ultimately useful in shaping the safety of work in your own organization. Send any comments, questions, or ideas for future episodes to us at