The chart says intelligence got cheaper. The invoice disagrees.

It's a Tuesday morning. A CFO opens the Artificial Analysis chart. The line falls. It gets forwarded to the board. The board nods. Then, next quarter, a massive invoice arrives.

The "price of intelligence" continues to decline. Courtesy A16z, Artificial Analysis

The chart is honest. It is also class-blind. It plots GPT-4 next to GPT-4o mini. Claude Opus next to Claude Haiku. A flagship and a distilled cousin, on one line, called "intelligence".

That isn't a generation getting cheaper. That's a vendor shipping a smaller product.

Here's my thesis: Compare same class to same class and the frontier is getting more expensive, not less. Roughly five times the bill. Roughly one-and-a-half times the brain.

The chart averages a Porsche and a scooter

The Intelligence Index is a benchmark average. Useful. Not linear. The cost axis is real dollars per million tokens. The capability axis is a benchmark percentage.

Plot them together, average across product tiers, and the line will always trend down.

The chart is class-blind. Your finance team isn't.

Same class, different story

GPT-5.4 to GPT-5.5. Same family. Same tier. The price doubled, from $2.50/$15 to $5/$30 per million tokens (input/output). The Intelligence Index moved from roughly 56 to 60. Two times the bill. A handful of percentage points more brain.
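Here is that arithmetic as a minimal sketch, using only the figures quoted above. The variable names and the "price per index point" ratio are my own illustration, not an Artificial Analysis metric. The index is not linear, so treat that last number as a rough gauge and nothing more.

```python
# Figures quoted in the text: USD per million tokens and Intelligence Index scores.
old = {"input_usd": 2.50, "output_usd": 15.00, "index": 56}
new = {"input_usd": 5.00, "output_usd": 30.00, "index": 60}


def ratio(after: float, before: float) -> float:
    """How many times larger `after` is than `before`."""
    return after / before


input_price_ratio = ratio(new["input_usd"], old["input_usd"])
output_price_ratio = ratio(new["output_usd"], old["output_usd"])
index_ratio = ratio(new["index"], old["index"])

# Illustrative only: price growth divided by index growth.
price_per_index_point_ratio = output_price_ratio / index_ratio

print(f"input price:           {input_price_ratio:.2f}x")
print(f"output price:          {output_price_ratio:.2f}x")
print(f"index:                 {index_ratio:.2f}x")
print(f"price per index point: {price_per_index_point_ratio:.2f}x")
```

The price doubles on both sides of the meter. The index rises about 7%.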

The sticker price per million tokens sometimes doubles, sometimes stays the same. But overall, the bill increases by 5x or more!

Claude Opus 4.6 to 4.7. Sticker price unchanged. The tokenizer changed. The same task now needs 1.4x the tokens. An invisible 40% bump.
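A rough way to see the invisible bump: cost per task is price per token times tokens per task, so either factor can move the bill. The sketch below assumes nothing beyond the two ratios stated above. The function name is hypothetical.

```python
def cost_per_task_change(sticker_ratio: float, token_ratio: float) -> float:
    """Fractional change in cost per task.

    sticker_ratio: new price per token / old price per token
    token_ratio:   new tokens per task / old tokens per task
    """
    return sticker_ratio * token_ratio - 1.0


# Opus 4.6 -> 4.7 as described in the text: same sticker, 1.4x the tokens.
opus_change = cost_per_task_change(sticker_ratio=1.0, token_ratio=1.4)

print(f"Opus cost per task: {opus_change:+.0%}")
```

The price list never moves. The bill does.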

Gemini 1.5 Flash to 2.5 Flash. Output price up 138%. Same Flash badge.

Simon Willison, who has been tracking this longer than most, calls April's hikes the steepest he has logged.

That isn't deflation. That's the opposite.

Reasoning tokens are the tax nobody priced in

The frontier got smarter by thinking more. Chain-of-thought. Agent loops. Each thinking token is an output token. Output tokens are billed at three to eight times the input rate.

A hard query that took GPT-4 two thousand tokens to answer now takes a reasoning model fifteen thousand to chew on.

Cheaper per token. More expensive per task.

The chart counts tokens. Your finance team counts tasks.
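To count tasks instead of tokens, a minimal sketch. The token counts come from the example above. The two per-token prices are hypothetical placeholders, picked only so that the newer model is half the price per token. They are not quotes from any vendor's price list.

```python
OLD_TOKENS = 2_000    # from the text: tokens GPT-4 needed for the hard query
NEW_TOKENS = 15_000   # from the text: tokens the reasoning model needs


def task_cost(output_tokens: int, usd_per_million: float) -> float:
    """Cost in USD of one task, counting output tokens only."""
    return output_tokens * usd_per_million / 1_000_000


# How far the per-token price must fall just to keep the task cost flat.
break_even_price_drop = NEW_TOKENS / OLD_TOKENS

# HYPOTHETICAL prices in USD per million output tokens (illustration only).
old_price, new_price = 30.0, 15.0

old_cost = task_cost(OLD_TOKENS, old_price)
new_cost = task_cost(NEW_TOKENS, new_price)
task_cost_ratio = new_cost / old_cost

print(f"break-even price drop: {break_even_price_drop:.1f}x")
print(f"task cost ratio at half the token price: {task_cost_ratio:.2f}x")
```

Under those placeholder prices, the token costs half as much and the task costs nearly four times as much. Anything short of a 7.5x cut per token leaves the task more expensive.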

The honest read

At Riafy, we route every request to the right fine-tuned version of R10, because the dirty secret of 2026 is this: capability per dollar improves only if you go smaller and cleverer. Stay at the frontier, and it declines sharply.

The chart goes down. The invoice goes up.

Don't trust a line that averages a Porsche and a scooter.
