Cassie Kozyrkov


Incompetence, delegation, and population

Anticipating your friendly statistician’s top 5 objections

If you’re new to the role the concept of population plays in statistics, skim my intro to set the mood. Long story short:

  • You only get to glimpse your population through the incomplete keyhole that is your sample — dealing with that is what all those fancy calculations are for.
  • The population is whatever the decision-maker chooses to interest themselves in for the purpose of making this decision.
  • In a machine learning / AI setting, the population is usually defined in terms of the instances the system needs to work on.
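The keyhole idea above is what the "fancy calculations" handle. As a minimal sketch (the conversion data and 95% normal-approximation interval are invented for illustration, not from the article), here's how a sample gives you a hedged glimpse of a population quantity:

```python
import math

# Hypothetical sample: did each surveyed user convert? The population is
# whatever the decision-maker agreed to base the decision on.
sample = [1, 0, 0, 1, 1, 0, 1, 0, 0, 1, 1, 0, 0, 0, 1, 0, 1, 1, 0, 0]

n = len(sample)
p_hat = sum(sample) / n  # best guess through the keyhole

# 95% confidence interval (normal approximation) -- one of the "fancy
# calculations" for dealing with the incompleteness of the sample
se = math.sqrt(p_hat * (1 - p_hat) / n)
lo, hi = p_hat - 1.96 * se, p_hat + 1.96 * se
print(f"sample proportion: {p_hat:.2f}, 95% CI: ({lo:.2f}, {hi:.2f})")
```

The point isn't the arithmetic; it's that the interval only means something relative to a population someone deliberately chose.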

So the decision-maker gets to pick any population definition they want, even if it sounds stupid? Would now be a really good time for the statistician in you to get angry? (That’s our secret, Captain: we’re always angry.)

Let’s raise some valid objections together, shall we?

Your friendly statistician’s objections

Objection 1: This isn’t what the decision-maker is interested in.

In this setting, interested in is actually shorthand for “interested in for the purpose of making this decision” — perhaps a better way to say it is “what the decision-maker agrees to base this decision on.” You might find it helpful to think of defining a population of interest as a negotiation of sorts.

Sometimes the decision-maker starts out with very ambitious, all-encompassing interests… and then sees the price tag on the sampling required and rapidly backpedals to a much more modest and narrow “interest.” That’s perfectly fine. What’s important is that the decision-maker understands what they’re basing their decision on and is at peace with any corners cut or simplifications made. It should be a conscious choice.

Objection 2: The decision-maker isn’t the actual decision-maker.

It’s really important to know who’s in charge of choosing how the decision should be framed, because that’s the person who calls the shots here. If the actual decision-maker is outside the statistics project, bring them in — it’s vital that the actual decision-maker is engaged in framing the decision.

If a statistician senses that they’re being asked to work on something that bypassed a negotiation with the true decision-maker, they are within their rights to block the request until that person has approved the decision setup. If the real decision-maker can’t be bothered to commit the time and effort, they should delegate the decision to someone with the requisite skills and bandwidth.

If your goal is to persuade people using data, you may as well throw rigor out the window (since that’s where it belongs) and make pretty graphs instead.

Statistics makes most sense as a set of tools for making a decision. It doesn’t really stand up to epistemological scrutiny if you’re using it for persuasion. Stick with analytics and don’t worry so much about populations, because it’s an inspiration game at this point. Essentially, your goal is to inspire your victim to go with the decision you’ve already made on their behalf. They’re not the real decision-maker anyway. (I hope the real decision-maker made the decision in a smart way before the data theatrics commenced.)

Leaders, stop pretending you have the time to make every decision. It’s time to delegate!

Senior leaders, quit pretending you have the time to make every single decision. Save your attention for the important ones and delegate the rest. You don’t want to be part of the farce where your junior folk sell you the decisions they’ve made and you honestly believe there’s nothing selective about the analyses they show you. Numbers can’t lie? Yeah, and according to my photos (data!), every time I visit a tourist landmark I’m the only human there. … Exactly.

Left: Hey look, this is a completely desolate place. No one ever goes here. Right: Look again. These photos are data… turns out data can lie. And if you think there’s some applicable magic involving 30 datapoints, want to bet I can’t find you 3000 tourist landmark photos with no one in them?

Objection 3: There is no decision.

When the decision-maker can’t articulate how information would drive action, the approach you’re looking for is called analytics (a.k.a. data-mining), not statistics. It’s less stressful than statistical inference and the colors are pretty. More on this here.

So, why are we trying to do statistics when there is no decision? Consider objection 4…

Objection 4: The decision-maker doesn’t know what they’re doing.

If the decision-maker doesn’t understand what they’ve just asked for, the whole team has a huge problem.

Sometimes the decision-maker isn’t very skilled at their craft and lacks the ability to really think through what they’re interested in and how they want to frame the decision. In that case, the other team members, including the statistician, should push back. After all, downstream work relies on the decision-maker’s tasks being completed competently; otherwise the completed analysis will be a rigorous answer to a sloppy and misguided question. That’s a Type III error right there.

If the decision-maker doesn’t have the right skills, the whole project is doomed.

So if you’re a data scientist working with a newbie decision-maker, bad news: you’ve just landed in the babysitting role of encouraging your decision-maker to learn the skills of deep thinking and rigorous decision framing required to make their requests worth a statistician’s time. (The alternative is getting them to delegate decision-making to you. Ask, don’t usurp. It’s safer.)

I hope they warned you that being a data scientist may include babysitting an unskilled decision-maker.

Modern decision intelligence teams can solve this problem another way. Instead of the statistician standing behind the decision-maker with a cattle-prod and forcing them to educate themselves in how to frame decisions, the team takes advantage of the qualitative expert role.

This person serves as an assistant to the decision-maker, asking a lot of questions and listening carefully to the decision-maker’s wishes, presenting scenarios the decision-maker might not have had the time to think of, and then translating everything into rigorous language and study designs that the downstream team will be able to work with.

Instead of firing bad decision-makers, you can augment them by hiring a helper: the qualitative expert.

On smaller teams, a data scientist with great people skills might take on this role in addition to their standard duties, while in larger organizations this might be a full-time job with one qualitative expert assisting multiple decision-makers. Social science backgrounds, especially in behavioral economics or cognitive psychology, lend themselves well to this role.

Objection 5: Insufficiently specific population description.

Just because your decision context makes sense in your head doesn’t mean it’s okay to be vague.

Tolerate no ambiguity! If you want nice things, expect to pay for them with effort.

It’s never a good idea for your written population description to be nothing but “all users” even if you swear up and down that you could sing an epic saga about the details if pressed. You’re probably not working alone, but even if you are, memory is a leaky bucket. I swear (up and down) that I could come up with a whole article about this horrible idea.*
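One way to force the ambiguity out is to write the population definition as an explicit, checkable rule rather than a phrase. Here's a minimal sketch — the fields and cutoffs (country, signup date, recent activity) are hypothetical, invented purely to illustrate what "pinned down in writing" looks like:

```python
from dataclasses import dataclass
from datetime import date

# Hypothetical user record; field names are illustrative assumptions.
@dataclass
class User:
    country: str
    signup: date
    active_last_30d: bool

def in_population(u: User) -> bool:
    """Not 'all users' -- a written, unambiguous definition:
    US users who signed up before 2024 and were active in the
    last 30 days."""
    return (
        u.country == "US"
        and u.signup < date(2024, 1, 1)
        and u.active_last_30d
    )

users = [
    User("US", date(2023, 5, 1), True),   # in the population
    User("US", date(2024, 3, 1), True),   # out: signed up too late
    User("DE", date(2023, 5, 1), True),   # out: wrong country
]
print(sum(in_population(u) for u in users))  # prints 1
```

A rule like this survives leaky memory and new teammates in a way "all users" never will.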

Statisticians, if you sense that your decision-maker failed to do a thorough job, don’t budge until that homework is complete.

If no one is clear on what we’re trying to rigorously base our decisions on, what’s the point of rigor? We may as well stick with mere inspiration. Mere inspiration not good enough? Inspiration is cheap but rigor is expensive, so if you want nice things, you have to pay for them with effort.

* Jokes aside, here’s the article.
