LLM Costs Are Falling. Infrastructure Margins Are Not.

The cost of calling a language model API has fallen by roughly two orders of magnitude over the past three years. GPT-4 in 2023 cost around $60 per million tokens. Frontier model calls today are an order of magnitude cheaper, and smaller capable models are cheaper still. This is a remarkable and generally good development for the ecosystem. Cheaper tokens lower the cost of experimentation and make more use cases economically viable.

There is a narrative that follows naturally from this observation: as model costs fall, the infrastructure layer built around those models will compress too. If the thing the infrastructure manages gets cheap, why does managing it stay expensive? This narrative is wrong, and I want to explain why.

What Cheaper Tokens Actually Do

Cheaper tokens do not reduce the complexity of running AI in production. They increase it. When tokens were expensive, companies carefully constrained their LLM usage to a small set of high-value, well-understood use cases. As tokens get cheaper, the scope of what teams try to do with LLMs expands — longer context, more complex pipelines, more agentic patterns, more integration points with external systems. The total system complexity grows faster than the per-token cost falls.

The engineering costs — the latency management, context assembly, routing logic, observability infrastructure, evaluation pipelines — are not sensitive to token prices. They are sensitive to how complex the overall system is. And as teams get more ambitious about what they try to build, those engineering costs rise. The infrastructure layer is not getting commoditized; its scope is expanding.

The Abstraction Value

The other dimension that falling token costs do not touch is the value of the right abstraction. The teams that win in AI infrastructure are not winning by being cheaper than raw API calls. They are winning by providing abstractions that let application developers move significantly faster than they could building on primitives. That abstraction value — the reduction in time-to-production, the removal of footguns, the provision of best practices baked into the API — does not correlate with model pricing.

We continue to believe that the infrastructure layer above the model APIs represents the best risk-adjusted investment opportunity in the AI stack. The falling cost of foundation models makes the application layer more interesting but does not diminish the infrastructure layer. It enlarges it.