Businesses are hiring data scientists in droves to make rigorous, scientific, unbiased, data-driven decisions.
And now, the bad news: those decisions usually aren’t.
For a decision to be data-driven, it has to be the data — as opposed to something else entirely — that drive it. Seems so straightforward, and yet it’s so rare in practice because decision-makers lack a key psychological habit.
Imagine that you are considering buying something online instead of making a pilgrimage to the other side of town to fetch it. You’ve boiled your decision down to whether or not you trust the online seller. A quick search yields some relevant data: you see that the seller has an average rating of 4.2 out of 5.
Without decision-making fundamentals, your decision will be at best inspired by data, but not driven by it.
Did you decide, before you looked, what rating the seller would need for you to go ahead? If not, you can’t use that 4.2 to drive your decision. Game over! Once we’ve seen the answer, we’re free to pick the most convenient question. If the first thing we do is poke around in our data, our decision will be, at best, something I like to call data-inspired.
That’s where we, like whales encountering plankton, swim around in some numbers, and then reach an emotional tipping point and… decide. There are numbers near our decision somewhere, but those numbers don’t drive it. The decision comes from somewhere else entirely.
A whale shark swimming around in some data.
The decision-maker’s mind was made up before the data, so the decision was there all along. Turns out humans interact with data selectively to confirm choices we’ve already made in our heart of hearts. We find the most convenient light in which to see evidence, and we don’t always know we’re doing it. Psychologists have a lovely name for this: confirmation bias.
Many people only use data to feel better about decisions they’ve already made.
Is 4.2/5 a good number? Depends on your unconscious biases. A decision-maker who really wants to make the online purchase will squint at that 4.2 and sing a happy song about how it’s a high number. “It’s more than 4.0!” They can even produce a rigorous-looking analysis showing that it’s statistically significantly higher than 4.0. (What a coincidence: exactly the p-value they were hoping for.) Meanwhile, someone who really doesn’t want to use that seller will find another way to frame the question in response to the data: “Why would I settle for a seller with less than 4.5 stars?” Or perhaps, “But look at those 1-star reviews. I don’t like how many there are.” Sound familiar?
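To see how easily that framing game works, here’s a minimal sketch in Python. The individual ratings, the sample size, and the 4.0 and 4.5 thresholds are all invented for illustration; the point is that the same reviews can be made “statistically significant” in whichever direction the decision-maker already prefers.

```python
# Hypothetical illustration: the same ratings "support" opposite conclusions
# depending on which question you choose after peeking at the data.
# Requires scipy >= 1.6 for the `alternative` argument.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
# Invented individual star ratings with a mean around 4.2.
ratings = rng.choice([1, 2, 3, 4, 5], size=400,
                     p=[0.05, 0.03, 0.07, 0.32, 0.53])

# Framing 1: the eager buyer asks, "Is the true rating above 4.0?"
eager = stats.ttest_1samp(ratings, popmean=4.0, alternative="greater")

# Framing 2: the reluctant buyer asks, "Is the true rating below 4.5?"
reluctant = stats.ttest_1samp(ratings, popmean=4.5, alternative="less")

print(f"mean rating: {ratings.mean():.2f}")
print(f"p-value for 'significantly above 4.0': {eager.pvalue:.2g}")
print(f"p-value for 'significantly below 4.5': {reluctant.pvalue:.2g}")
# With these made-up numbers, both p-values typically come out tiny at the
# same time: the math is fine, but each question was chosen to confirm a
# decision that had already been made.
```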
The more ways there are to slice the data, the more your analysis is a breeding ground for confirmation bias.
Mathematical complexity doesn’t provide the antidote; it merely makes the problem harder to see. As a result, what’s obvious in the trivial example we just saw becomes hidden in a jumble of gorgeous Gaussians. Don’t assume your friendly neighborhood data scientist sees it either. The more ways there are to slice the data, the more your analysis is a breeding ground for confirmation bias.
The result? Decision-makers end up using data to feel better about doing what they were going to do anyway.
When the analysis is complex or the data are hard to process, a pinch of tragedy finds its way into our comedy. Sometimes boiling everything down to arrive at that 4.2 number takes months of toil by a horde of data scientists and engineers. At the end of a grueling journey, the data science team triumphantly presents the result: it’s 4.2 out of 5! The math was done meticulously. The team worked nights and weekends to get it in on time.
What do the stakeholders do with it? Yup, same as our previous 4.2: look at it through their confirmation bias goggles, with no effect on real-world actions. It doesn’t even matter that it’s accurate—nothing would be different if all those poor data scientists just made some numbers up.
When decision-makers lack fundamental skills, there’s no math in the world that can fix it. Your data science team will not contribute to data-driven decision-making.
Using data like that to feel better about actions we’re going to take anyway is an expensive (and wasteful) hobby. Data scientist friends, if your organization suffers from this kind of decision-maker, then I suggest sticking to the most lightweight and simple analyses to save time and money. Until the decision-makers are better trained, your showy mathematical jiu jitsu is producing nothing but dissipated heat.
Problem: you’re free to move the goalposts after you find out where the data landed. (Of course you score a goal every time. You’re just that good.)
Solution: set the goalposts in advance and resist temptation to move them later.
In other words, the decision-maker has some homework to do before anyone analyzes the data.
Until decision-makers are better trained, showy mathematical jiu jitsu only produces dissipated heat.
Framing the decision and setting decision criteria are a science of their own (we’ll dive into it in future posts, as the problem we examine here is just the tip of the iceberg), but in the meantime a quick fix that goes a long way is to settle on your decision boundary up front, before anyone on your data science project looks at the data.
I recently went clothes shopping in Brooklyn with my friend Emma. Showing off a pretty dress, she tugged at the price tag on the back. “Hey, what does this say?” she asked me. “If it’s less than 80 bucks, I’ll buy it.”
Now that’s some decision intelligence! Instead of first seeing the price and then talking herself into a decision she’s already made, she uses the data to drive it. With a well-practiced reflex, she weighs how much she likes the dress against her budget, sets the decision boundary, and only allows herself to see the data (the price) once that’s done. She’s in the habit of using data in the right order, and that’s a muscle you can exercise too.
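In code, Emma’s habit is nothing more than an ordering constraint: the boundary gets written down before the data are revealed. Here’s a toy sketch; the 80-dollar boundary comes from the anecdote, and everything else (the function name, the observed price) is invented.

```python
# Toy illustration of "decision boundary first, data second".
# The $80 boundary comes from the anecdote; the observed price is invented.

MAX_PRICE = 80.00  # commit to the boundary BEFORE looking at the price tag


def decide(price: float, max_price: float = MAX_PRICE) -> str:
    """Apply the pre-committed boundary once the data are revealed."""
    return "buy it" if price < max_price else "leave it on the rack"


# Only now do we look at the data.
observed_price = 74.99
print(decide(observed_price))  # -> buy it
```

The code itself is trivial; what matters is that the comparison is fully specified before the data get a chance to negotiate with you.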
People don’t always need to be data-driven and Emma knows that. She doesn’t have to make unimportant decisions that way, but she also knows that practice makes perfect. It’s much easier to build the habit on trivial decisions than to struggle when the important ones come around.
This idea is not new. Many different courses teach it, though one that’s almost guaranteed to cover it on day one is negotiation. If you haven’t put a value on your BATNA (best alternative to a negotiated agreement, roughly your walk-away point) before entering a negotiation, you may as well paint “no idea what I’m doing” on your forehead. It’s the same thing by a different name: figuring out the decision boundary between your default action and the alternative.
The antidote is setting your decision criteria in advance.
In fact, standard advice for negotiators is to think through the entire range of potential offer combinations and plan your reactions to them in advance; otherwise it’s very easy for an experienced opponent to take advantage of you. Even without all the persuasion tactics at your counterpart’s disposal, irrelevant short-term factors like your blood sugar level, your mood, how much the other party is smiling, and whether the sun is shining can have a disproportionate effect on the deal. Again, the same goes for data analysis: think of the data as negotiating with you to change your mind. The antidote there is planning your response in advance. Next time you’re negotiating a salary, for example, make sure you’ve thought about your number before you hear theirs.
Whether you think about what a number means to you before or after you see it, you still have to think about it. Doing it beforehand helps you counter some of the bugs in your human programming, with large payoffs in decision quality and negotiation performance. Improving the order of operations here is a valuable habit to cultivate and crucial if you’d like to be involved in data-driven decision-making. And here’s some bonus good news: with practice it’ll feel automatic.