AI's New Data Economy Has a Landlord Problem

Written by yunfeiz | Published 2026/03/24
Tech Story Tags: ai-data-economy | ai-training-data | ai-data-pipelines | ai-economics | ai-data-lineage | data-supply-chain | web-scraping | ai-governance

TL;DR: AI companies are paying platforms for training data, but the creators behind that data receive little to nothing, creating an extractive data economy that may harm future model quality.

Reddit's AI licensing deals with Google and OpenAI bring in $130 million a year — about 10% of its total revenue. Shutterstock pulled in $104 million from licensing deals. News Corp signed a $250 million agreement with OpenAI. The licensed data market showed up fast, and the money is real.

I work in data operations for an AI company. I spend my days figuring out where training data comes from, how it moves through pipelines, and what it costs. And from where I sit, this new market looks less like a fair exchange and more like a landlord economy — the people who own the pipes collect rent, while the people who fill those pipes with value get nothing.

Because here is the question nobody in those press releases wants to answer: who actually got paid?

Not the Reddit user who wrote the comment. Not the photographer who uploaded to Shutterstock in 2014. Not the freelance journalist whose article ended up in a News Corp archive. The platforms got paid. The middlemen got paid. The people who made the stuff? Nothing.

The math tells the story

Let me show you some numbers.

A ProMarket analysis broke down what licensing revenue looks like when you follow it to the source. Taylor & Francis sold Microsoft access to its entire academic journal catalog — thousands of journals, decades of research — for an initial fee of $10 million. Google's licensing deal with Reddit is reportedly worth about $60 million a year. Divide that among Reddit's daily active users and each person's contribution comes to a few cents. Not per month. Total.

That's the licensed data economy in one snapshot. Billions flowing between a handful of AI companies and a handful of platforms. Crumbs — if anything — reaching the people whose work made those platforms worth licensing in the first place.
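The arithmetic above is easy to run yourself. A quick back-of-envelope sketch — the deal figure is from the article, but the user counts below are rough illustrative assumptions, not official Reddit numbers:

```python
# Back-of-envelope: what a platform licensing deal is worth per contributor.
# The $60M figure is the reported annual value of the Reddit/Google deal;
# the user counts are illustrative assumptions for the sketch.
deals = {
    # label: (annual_license_fee_usd, assumed_user_count)
    "per registered account (assumed ~1B)": (60_000_000, 1_000_000_000),
    "per daily active user (assumed ~100M)": (60_000_000, 100_000_000),
}

for label, (fee, users) in deals.items():
    print(f"{label}: ${fee / users:.2f} per year")
```

Even under the most generous denominator, the individual contributor's share is pocket change.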

Platforms sell what they don't create

Think about it for a second.

Who wrote all those Reddit posts? Not Reddit. Who took the photos on Shutterstock? Not Shutterstock. Regular people made that stuff — millions of them — and they clicked "I agree" on a terms of service page without reading it. Somewhere on page nine of that wall of text, there's a line that says the platform can do pretty much whatever it wants with your uploads. Train an AI on it? Sure. Sell access to it? Go ahead.

Take LinkedIn. Last September, they quietly updated their settings so your data would be shared with Microsoft for AI training — turned on by default. You had until November to find the toggle and switch it off. Miss the window? Too bad. And anything they already grabbed before you opted out? That's theirs now.

The pattern repeats. Platform accumulates user content. Platform licenses that content to an AI company. AI company pays platform. User gets nothing. Sometimes the user doesn't even know it happened.

Small creators can't get to the table

Even if you set aside the platform problem, the licensing market still has a structural flaw. Only the biggest content owners can negotiate directly with AI companies. A major publisher has lawyers and a catalog worth millions. An independent illustrator in Buenos Aires has a portfolio and an Instagram account.

Research from ProMarket makes this blunt: licensing en masse will never be a realistic option for training large language models. The sheer volume of content needed makes it impossible to negotiate with every individual creator. So the deals go to whoever controls the biggest piles of content — which is the platforms, not the people.

A few intermediaries have tried to bridge this gap. Calliope Networks built a "License to Scrape" program for YouTube creators, then got acquired by Protege before it gained enough traction to matter.

Reddit, Yahoo and Medium backed a new initiative called Really Simple Licensing to standardize how AI companies pay publishers. Collective management organizations in the US, Australia and Germany have started experimenting with AI licensing on behalf of their members. These are all steps in the right direction. But they mostly help publishers and platforms — not the individual people who made the content.

The work-for-hire trap

There's a worse version of this that nobody talks about. If you're a freelance illustrator or photographer, chances are you signed a work-for-hire contract at some point. That means the company that hired you owns your work — not you. So when an AI company pays a studio or a publisher for training data, guess who gets the check? The studio. You drew the picture, but your name is nowhere near the money.

The result: an artist's entire body of work can feed a commercial AI model while the payment lands at Disney or a stock photo agency, with no trace of the artist anywhere in the transaction.

What this means for AI data sourcing

I care about this because I build data pipelines for a living. The quality of any AI model depends on its training data. And right now, the licensing model is creating incentives that will degrade that quality over time.

If creators feel exploited, they stop contributing. They pull content behind paywalls. They watermark everything. They opt out wherever they can.

Cloudflare started blocking AI crawlers by default in July 2025, and half of the major publishers now block training bots. The open web that early AI models trained on is shrinking — not because the technology failed, but because the economics were unfair.
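Mechanically, much of this blocking happens through robots.txt rules aimed at specific training crawlers. The user-agent tokens below (GPTBot, CCBot, Google-Extended) are real published crawler names, but the policy itself is an illustrative example, not any particular site's file — here checked with Python's stdlib parser:

```python
# Sketch: a robots.txt policy that disallows known AI training crawlers
# while leaving ordinary search crawling open, verified with the stdlib
# robots.txt parser. The rules are illustrative, not any specific site's.
from urllib.robotparser import RobotFileParser

rules = """\
User-agent: GPTBot
Disallow: /

User-agent: CCBot
Disallow: /

User-agent: Google-Extended
Disallow: /

User-agent: *
Allow: /
"""

rp = RobotFileParser()
rp.parse(rules.splitlines())

print(rp.can_fetch("GPTBot", "https://example.com/article"))     # training crawler: blocked
print(rp.can_fetch("Googlebot", "https://example.com/article"))  # search crawler: allowed
```

The asymmetry is the point: publishers want to stay in search indexes while opting out of model training, and per-agent rules are currently the bluntest tool available for expressing that.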

The companies that figure out how to compensate creators directly, build transparent data supply chains, and treat data sourcing as a relationship rather than an extraction will have better models in three years. Not for moral reasons. For data quality reasons.

Everyone else will be training on an increasingly stale, increasingly narrow slice of the internet. And sooner or later, their users will notice.


Written by yunfeiz | COO of Abaka AI, a Silicon Valley company specializing in data collection, annotation and dataset creation for AI.
Published by HackerNoon on 2026/03/24