Cassie Kozyrkov


Is data science a bubble?

You’d be surprised how often I get this question. My answer?

“Probably not, but the data scientist title might be.”

And now let me try to explain myself before the pitchfork-wielding mob gets here.

Using data scientists to define data science

If you’re unfamiliar with what data science is, you can pop over to my quick overview here. There’s a variety of opinions, but the definition I favor is this one: “Data science is the discipline of making data useful.”

If you don’t like my definition or the alternatives in the link, perhaps you’ll enjoy this crisp one by Harlan Harris:

“‘Data Science’ is defined as what ‘Data Scientists’ do.
What Data Scientists do has been very well covered… Who Data Scientists are may be the more fundamental question.”

I like circles too. Fine. Who are we then? Well, it depends which club you’re in. (And this is where the whole data-science-as-a-bubble thing starts heating up a little.)

AND versus OR

To some, the title implies full competence at three specializations. To others, it implies skills that add up to those of a single specialist. If you’re not experienced at hiring data scientists, the difference could hurt you.

Option 1: Only the worthy!

There are those who fantasize about stripping “wannabes” of their data scientist titles and limiting the hallowed profession to an elite that’s skilled in The Everything Of Data.

What’s it like to be interviewed by these folks? They want to see my statistician pedigree, my machine learning chops, my analytics black belt, and my portfolio of applied projects. They want to know whether my grad school was a fancy one. They also want to check my history of leadership and business problem-solving. Oh, and I’d better be a great communicator. (How about the moon on a stick too, while you’re at it?) If I hadn’t been nerding out with datasets since I was eight, I’d be downright intimidated. As it is, it’s a funny little club that I have deeply conflicted feelings about.

Let me call it what it is: The AND Club. Members must be full statisticians AND machine learning experts AND analysts with coding skills that can cut diamonds. Notice that it’s pretty hard to qualify into it; very few people are experts in All The Data Things. Staffing the world’s data needs with these folk is never going to work. Dismally, that’s the nature of supply and demand.

Option 2: Everyone’s welcome!

The alternative, and far more populous, club is The OR Club. It consists of folks who upgraded their narrower titles like analyst or statistician to the umbrella term. It sounds better, it increases employment in the data science profession, it broadens the community, it brings diverse skills, everyone wins. Right? Well, almost.

What I love about it is that it emphasizes the team sport nature of data science and empowers more people to participate in working with data. This is a great thing! And some parts of data science aren’t that complicated. Data-mining, for example, is something more people are qualified for than they realize. If you thought a PhD is required for data-mining, good news: all you need is a way to look inside datasets plus healthy humility and common sense.

How about the downside? Data science has a reputation for deep skills, high qualifications, and long immersive study. My heart goes out to the poor confused hiring managers who think they’re luring an all-in-one data scientist but get someone much less qualified. False advertising does damage.

(Tip: If you want to be completely sure you’re not stretching the truth on your resume, the title Data Analyst is the safest choice.)

The floodgates of false advertising!

Let me level with you. With every ‘data scientist’ title I’ve held, I had already been doing the job under a different name before rebranding czars in HR applied a little nip-tuck to the employee database. My duties didn’t change in the slightest.

I’m no exception; my social circle is full of former statisticians, decision support engineers, quantitative analysts, math professors, big data specialists, business intelligence experts, analytics leads, research scientists, software engineers, Excel jockeys, niche PhD survivors… all proud Data Scientists of today.

When my title became data scientist, my duties didn’t change in the slightest.

Hey, friends, I don’t judge. Good on you for managing your professional brand. What I’d like to point out, though, is that basing the definition of data science on the “data scientist” isn’t a very stable choice, what with the mixed crowd attracted to the title. Taking the limit, we get a bunch of words carefully crafted to say as little as possible, which feeds back into how data scientists are seen. (Told you I like circles.) I recently felt my blood pressure spike when a data science hiring manager posted something like “Have a PhD? Then you’re probably a data scientist” (paraphrased to protect the innocent).

Using job titles to define data science is a dangerous game.

This kind of thing doesn’t hurt incumbents much. The established data science shops already know what they’re looking for and can sniff out a good, ahem, data scientist even if the resume job title says space alien. It’s the less experienced hiring managers I worry about.

A lot of firms getting started with data science don’t have someone experienced to guide them. Their plan? Hire a data scientist and all will be well.

Buyer beware

Put yourself in a new hiring manager’s shoes: you’ve done a bunch of reading and decided you need statistics, data-mining, and machine learning skills for your project. You can hire three people. Now let’s see the candidates: 10 resumes with “data scientist” on them.

If these are AND folk, you can pick any three of them. Each one has the skills you need. Unfortunately, that club is small (read: very expensive to hire) so chances are that these 10 aren’t members.

It can be hard for a hiring manager to ferret out which part of data science the job candidate’s actually good at.

If these are OR folk (more likely in today’s climate), you have to interview them carefully to figure out what they’re actually qualified to do. You’re looking for three different skillsets. The folks in front of you might have just one, but they also have every incentive to convince you that they’re the expensive all-in-ones. They might have learned just enough about all three areas (statistics, data-mining, machine learning) to be dangerous to both your project and your hiring process. You need to ferret out what they’re actually good at and that can be hard if you’re not an experienced data scientist with the full rainbow of experience.

The result? Mistakes in the hiring process.

I’ve seen many teams accidentally get several copies of the data-mining type of analyst instead of the well-rounded data team. But hey, this isn’t only a data science problem. Turns out buzzwords on a resume don’t necessarily come with skills guarantees. The hotter the buzzword, the more it spreads. Buyer beware.

The end of the data scientist?

Personally, I take job titles with a grain of salt. The important thing is matching skills to whatever needs doing. If the title isn’t a good indicator of that, then good hiring managers will learn to look for something else on the resume.

Enough of that behavior will result in a new hot label for the exact same job. If I squint at the logic, I can almost make out a story involving OR Clubs and hiring bubbles. The title might just go out of fashion, but I’m not one to bet on horses.

Is data science a bubble?

The world is generating more and more data every year, so it’s reasonable to expect labor that extracts business value from it to be able to earn its keep.

More data means more demand for the three main activities within data science — statistical inference, machine learning, analytics / data-mining — so those skills will stay very relevant, though their names might evolve.

Labor that extracts value from data will always be able to earn its keep. Work that isn’t useful has an expiration date.

On the other hand, teams who got hired on the hype and never learned how to focus on what’s useful to the business may find that their season has an expiration date.

Several years ago, an engineering director friend who works in tech was bemoaning his useless data scientists. “I think you might be hiring data scientists the way a drug lord buys a tiger for his backyard,” I told him. “You don’t know what you want with the tiger, but all the other drug lords have one.”

I don’t know any actual drug lords (or tigers), so I’m not sure what’s in those backyards. But you get my point.

Though that sounds like prime bubble, I’m actually pretty optimistic. Growing data means growing opportunities — it all just needs good management. My friend, for example, ended up conquering a lot of his problems by recognizing that the rest of his organization needed training in how to work with data scientists. Since then, his teams have been more thoughtful about how to assign work and great things followed. Training decision-makers in how to make use of data science saved the day!

Check that your decision-makers have the right skills for working with data scientists. If a bubble exists, that might be the root of it.

The challenge for today’s data science leaders is to help decision-makers get training like that, creating more people with the skills to point the technical brilliance of data scientists in valuable directions. (Read further here.) Once data scientists are able to make themselves useful, keeping them around becomes a no-brainer, rather than a matter of fashion. Will we manage it before their data scientist title falls out of favor and they scramble towards another rebranding? Stay tuned.
