Tech Economics

Token Deflation: The Treadmill Killing AI Margins

Tech Economics
Jun 05, 2026

In late 2021, GPT-3 cost $60 per million tokens.

Today, the same level of intelligence costs about six cents per million tokens.

That’s a 1,000x price collapse in roughly four years. The product got a thousand times cheaper.
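As a rough check on those two figures, the implied average annual decline can be backed out directly. This is a minimal sketch that assumes a span of exactly four years (the text only says "roughly"); the function name is illustrative.

```python
def annualized_decline_factor(start_price: float, end_price: float, years: float) -> float:
    """Average factor by which price falls each year (10.0 means 10x cheaper per year)."""
    if start_price <= 0 or end_price <= 0 or years <= 0:
        raise ValueError("prices and years must be positive")
    return (start_price / end_price) ** (1.0 / years)


# Prices in dollars per million tokens, taken from the text above.
total = 60.0 / 0.06
# Assumption: the decline is spread over exactly 4.0 years.
per_year = annualized_decline_factor(60.0, 0.06, years=4.0)
print(f"total: {total:,.0f}x, average per year: {per_year:.1f}x")
```

Under that four-year assumption the average works out to about 5.6x per year. That is an average over the whole span, so individual years can run faster or slower.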

Now hold that number in one hand. In the other, hold this one: the four largest US tech companies will spend over $650 billion on capex in 2026.

A product whose price is collapsing. A cost base that is exploding.

That’s the whole problem, and almost nobody prices it honestly. Let me.


The deflation is real, and it’s brutal

Andreessen Horowitz has a name for it: LLMflation. The cost to deliver a given level of AI performance falls about 10x per year.

Epoch AI measured it across specific capability milestones. The decline ranged from 9x to 900x per year, depending on the task.

Put plainly: whatever you sold last year for a dollar, the market will pay roughly ten cents for this year. Maybe less.

Part of this is better chips. Part is better software. And part is competition — when DeepSeek showed up pricing 90% below the incumbents, everyone had to follow.

Even the optimistic case for sellers still has prices falling 3 to 5x a year through 2027.

For a customer, this is wonderful. For the company selling tokens, it is a treadmill set to a punishing speed.


The treadmill math

Here’s the part that should keep AI CFOs awake.

Say your price per token falls 90% in a year. That’s roughly the trend.

To earn the same revenue you earned last year, you now have to sell about ten times as many tokens.

That isn't growing revenue ten times. It's selling ten times the volume just to stand still.

And to actually grow revenue, you have to outrun a product that’s getting 90% cheaper underneath you. Every year. Forever.
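The stand-still arithmetic is simple enough to sketch. Revenue is price times volume, so if price falls by a fraction d, volume has to rise by 1 / (1 - d) to keep revenue flat. The function name and the sample decline rates below are illustrative, not figures from any company.

```python
def volume_multiple_for_flat_revenue(price_decline: float) -> float:
    """Factor by which token volume must grow to keep revenue flat
    when price per token falls by `price_decline` (0.9 means 90%)."""
    if not 0 <= price_decline < 1:
        raise ValueError("price_decline must be in [0, 1)")
    return 1.0 / (1.0 - price_decline)


# Illustrative decline rates; 0.9 is the roughly-10x-per-year trend.
for decline in (0.5, 0.8, 0.9):
    multiple = volume_multiple_for_flat_revenue(decline)
    print(f"price -{decline:.0%}: sell {multiple:.1f}x the tokens to stand still")
```

The relationship is steeply nonlinear: a 50% price cut needs 2x the volume, an 80% cut needs 5x, and a 90% cut needs 10x.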

Volume has been growing that fast, for now, because AI usage is exploding. That’s the bull case, and it’s not stupid.

But think about what it requires. You have to add enormous volume every year just to offset price. And serving that volume costs more compute, which means more capex — the $650 billion — into hardware that, as I wrote elsewhere, is also depreciating fast.

So the model is: spend more every year, to sell exponentially more units, of a thing that’s worth 90% less each year, and hope volume outruns price forever.

That can work. Treadmills can be run. But you don’t get to stop. And the moment volume growth slows while prices keep falling, revenue doesn’t plateau. It falls.
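To see why revenue falls rather than plateaus, here is a small sketch that compounds a constant 90% annual price decline against volume growth that slows over time. The volume multiples are hypothetical, chosen only to illustrate the shape, and the function name is mine.

```python
def revenue_path(price_decline: float, volume_growth_by_year: list[float]) -> list[float]:
    """Revenue index (year 0 = 1.0) when price falls by `price_decline` each year
    and volume is multiplied by the given factor in each successive year."""
    revenue = 1.0
    path = [revenue]
    for growth in volume_growth_by_year:
        # Revenue = price x volume, so each year scales by both factors.
        revenue *= growth * (1.0 - price_decline)
        path.append(revenue)
    return path


# Hypothetical volume multiples: 12x, then 10x, then slowing to 6x and 3x.
path = revenue_path(0.9, [12.0, 10.0, 6.0, 3.0])
for year, rev in enumerate(path):
    print(f"year {year}: revenue index {rev:.2f}")
```

In this made-up path, revenue grows 20% in the first year, holds flat in the second, then drops to 0.72 and 0.22 of the starting level. Volume is still tripling in the final year, and revenue still falls by about 70%.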


Why this matters before the paywall

The comforting story is “AI is deflationary, that’s just technology getting cheaper, like it always does.”

True. But “deflationary” is a lovely word for a buyer and a terrifying one for a seller financing the deflation with hundreds of billions in capex.

The question nobody answers cleanly: at these price-decline rates, does selling tokens ever become a good business — or only ever a land grab that has to keep growing or collapse?

Behind the paywall:

The unit math on what a frontier query actually earns versus costs, once you stop quoting “revenue run rate” and start quoting gross margin.

Why the deflation hits the frontier labs hardest and the chip seller least — and what that tells you about where the profit in AI actually ends up.

The volume growth rate the labs need to hit, every year, just to keep revenue flat at current price-decline rates — and what happens to the whole sector the first year they miss it.

And the one business model in AI that benefits from token deflation instead of dying from it.

If you’ve read this far, you understand the treadmill. The paid section tells you who falls off it first.
