Horizon Desk · In Practice · Opinion · 12 min read

The bill of agent-native

Cheaper to build, more expensive to serve.

As of March 31, 2026, OpenAI is changing the billing unit for its agent containers. Until then they are priced per memory tier, at three cents per gigabyte. From March 31 onward they're priced per session of twenty minutes, with their own meter alongside the model. Two weeks earlier, Google introduced project-level spending caps in AI Studio, and it enforces monthly per-tier caps from April 1. Gemini also offers Flex and Priority tiers, pricing variants on the same inference meter. Anthropic bills web search separately from tokens, at ten dollars per thousand calls, and runs Managed Agents on both tokens and session runtime.

Three labs, three different entry points, the same pattern. The bill for an agent is no longer a single number. It’s a stack.

This isn’t a pricing detail. Agent-native software doesn’t arrive as a UX upgrade layered on top of classic apps. It arrives as software with its own cost architecture, and that architecture shows up more clearly in the API docs every month.

Whether this stack is the final form, no one knows. Pricing models are historically volatile. CDNs, storage, serverless have all redrawn their meters multiple times. What’s happening now may be a transitional phase. In two years we could be back to flat subscriptions per agent. The shape of the bill will probably keep shifting. Anthropic recently swapped its code-execution container-hours for session runtime on Managed Agents, the same kind of meter under a different name. The existence of a multi-line bill is what the rest of this argument assumes, and that assumption deserves one footnote of honesty.

What agent-native software is

A quick recalibration, because the term gets used too loosely.

Classic software routes every step through the user. You click, navigate, fill in, confirm. The machine waits. AI-assisted software adds an advisor to this. You get a suggestion, a summary, a draft. You still act. The machine hands you something.

Agent-native software is something else. You give a task, and the system executes it itself. It breaks the task into steps, calls tools, weighs intermediate results, decides what to do next. Where classic software reacts and AI-assisted advises, an agent acts within the task you gave it.

It sounds like a difference of degree. It's actually a shift in where the costs sit. In classic software, build costs are high and usage costs are low. You build a button once; after that, each click costs almost nothing. In agent-native software, that ratio inverts. Build costs can be low because much of the work shifts to prompts, tools, and orchestration. Usage costs climb, because every execution is metered. Model usage, five to twenty tool calls per task, control layers, logging, fallbacks. Each of those burns something. And each of them now shows up as a separate line item on the lab's invoice.
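To make that concrete, here is a back-of-envelope sketch of one agent run, summing the separate meters. Every number in it is a placeholder assumption, not a quote from anyone's price list.

```python
# Back-of-envelope cost of a single agent run, summing the separate meters.
# Every number here is a placeholder assumption, not a published price.

def cost_per_run(
    input_tokens: int = 30_000,        # prompt + tool schemas + intermediate context
    output_tokens: int = 4_000,        # model output across all steps
    tool_calls: int = 12,              # within the five-to-twenty range per task
    price_per_m_input: float = 3.00,   # $ per million input tokens (assumed)
    price_per_m_output: float = 15.00, # $ per million output tokens (assumed)
    price_per_k_tool_calls: float = 10.00,  # e.g. metered web search (assumed)
    session_minutes: float = 8.0,      # runtime share of the run (assumed)
    price_per_session_hour: float = 0.20,   # container/runtime meter (assumed)
) -> float:
    tokens = (input_tokens / 1e6) * price_per_m_input \
           + (output_tokens / 1e6) * price_per_m_output
    tools = (tool_calls / 1_000) * price_per_k_tool_calls
    runtime = (session_minutes / 60) * price_per_session_hour
    return tokens + tools + runtime

print(f"${cost_per_run():.2f} per run")  # roughly $0.30 under these assumptions
```

Under these assumptions the run lands around thirty cents, and no single line dominates; the point is the sum, not any one meter.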

The pattern beneath the three labs

OpenAI has unbundled its pricing along multiple axes. Flex processing and Priority processing are pricing variants on the existing token meter, not a separate meter. Web search does stand apart, metered at ten dollars per thousand calls, separate from the model. Containers, from March 31, are metered as short-lived sessions, twenty minutes apiece, on a meter of their own. The real stack at OpenAI is tokens, web search, and containers.

Google does the same from the other side. Gemini has monthly spending caps at the project and tier level. Flex inference and Priority inference aren’t marketing names; they’re explicit optimization layers in the Gemini docs, available from April 1, 2026. Vertex AI meters grounding separately, at forty-five dollars per thousand grounded prompts in the enterprise variant.

Anthropic confirms it from the third side, with a sober list. Tool use adds extra system-prompt tokens. Web search costs ten dollars per thousand calls, on top of the standard tokens. Computer use carries its own token overhead. Claude Managed Agents bill on both tokens and session runtime. Prompt caching and batch processing, the mechanisms for cost control, aren’t fine print. They’re a chapter of their own.

Agentic usage is no longer a single model call. It’s a stacked runtime with separate meters for latency, tools, execution environment, and retrieval.

The misconception many teams still make

Ask someone bolting an agent feature into their product what that means architecturally, and you’ll usually get an answer in terms of capability. The agent can now call tool X. The agent can now decide on its own. The agent can fill in a form. An extension of what the software already did.

That works in a demo. One user, one task, enough tokens in the buffer. Under production load, that frame collapses. Ten thousand users, a hundred agents per user per day, each agent run a stack of tool calls. You haven't built a feature. You've placed a new category of invoice under your product, with unit economics that don't resemble those of your old software.
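The arithmetic is blunt. A hypothetical scale-up of that same per-run figure, with all numbers assumed:

```python
# Hypothetical scale-up of the per-run sketch above; all numbers are assumptions.
users = 10_000
runs_per_user_per_day = 100
cost_per_run = 0.30            # assumed, carried over from the earlier sketch

daily = users * runs_per_user_per_day * cost_per_run
print(f"${daily:,.0f} per day, ${daily * 30:,.0f} per month")
# -> $300,000 per day, $9,000,000 per month under these assumptions
```

Even at a tenth of the assumed per-run cost, the monthly line still runs to seven figures. That is the invoice category, not the feature.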

The misconception sits in one phrase: on top of. As if you can add agent-native as a layer without reconsidering the layers beneath.

That an agent SDK fits technically onto your existing backend, true. That this gives you an agent-native product, false. You have an agent shell around classic software. That shell works until it has to scale, and then, back to the drawing board.

Building becomes more equal, serving does not

Much has been written about how AI democratizes software building. Ideas reach working systems faster. Small teams build what used to require a full development team a few years ago. Solo builders ship prototypes that were previously out of reach. That democratization is real. The threshold to build is lower.

What’s less discussed: that democratization stops at the build layer. Whoever has made something then has to serve it. That’s where a different economic regime begins.

With classic SaaS, the weight sat up front. Basecamp, Linear, Notion took a few months to a few years of development work; after that came distribution, where each additional user cost almost nothing. That made room for small players. Not just to build something, but to build a business on it. Scale worked in their favor.

With agent-native software, scale works differently. Each user brings their own stack of inference calls. A power user running ten agents a day consumes more than a casual user triggering one a week, sometimes by a factor of a hundred. The unit economics that made classic SaaS profitable for small builders don't apply automatically here. You have a product that's cheaper to make, and more expensive to run.

A technical escape route exists: bring your own key. Let the user supply their own API account. For developers and technical power users, that's an elegant model. Cursor started that way, though it has been phasing BYOK out for its core features since late 2025. Zed AI still does it; Raycast AI offers it as an option. And each of those products has since added a tier where the provider carries the bill. BYOK isn't a mass-market model. The average end user doesn't want to manage keys, doesn't want to monitor token usage, doesn't want a second relationship with OpenAI or Anthropic. They want a product. Predictable, simple, one price.

Whoever can absorb that price has a structural advantage. A large software player adds AI to an existing product suite with existing margins. They absorb inference costs into an existing P&L, amortize them across existing customer relationships, cross-subsidize them with products that are already profitable. A small provider has to build their entire business model around those costs. That difference isn’t in the technology; it’s in financing capacity. And it becomes more visible the more work agents do.

That's the tension beneath the democratization. Software building is becoming more broadly accessible. Software serving at scale is becoming the opposite. Small players can still do it. It just gets harder, with profit margins that are less of a given than classic SaaS offered.

The real counter-argument

The sharpest critic of this argument doesn’t say agent-native is overhyped. They say I’m looking at the wrong accounting model.

Their argument goes like this. Judge an agent call on value per completed task. Cost per execution is the wrong measuring point. An agent that reads ten contracts in sixty seconds and finds three contradictions does work a lawyer would spend hours on. Eighty cents of runtime for what otherwise costs eighty dollars of human work isn’t a cost problem. The thesis confuses input costs with output economics.

That critique is right, and it sharpens my point.

Agent-native isn't expensive by definition. Agent-native has unevenly distributed value. Some steps in a workflow amply justify expensive intelligence. Others are handled fine by a line of code, a regex, or a lookup. The design question therefore becomes more precise. Not: can an agent do this? But: where does an agent generate enough value to carry the runtime, governance, and error costs? That question gets asked per step, not per product.

What open-weight and compliance change

One tension I feel myself. This argument shouldn’t hang on the pricing sheets of OpenAI, Google, and Anthropic. For builders in EU jurisdictions, self-hosting on owned GPU clusters or through European providers isn’t a fringe case. It’s often the main route. Especially in semi-public sectors like local government, healthcare, and education, where data residency and processing agreements under GDPR are more often a blocker than a preference.

Does self-hosting change the economic point? The shape of the bill, yes. Its existence, no. Variable-per-call shifts to mostly-fixed-plus-marginal. You pay up front for compute capacity, and after that you pay the marginal costs of orchestration, latency, and operations. That's a different product-economic regime. It makes budget predictable. It makes the scaling boundary more abrupt. You're under your capacity, or you're over it and need to buy more.
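A rough sketch of the two regimes side by side shows where the abruptness comes from. The capacity figures and prices below are stated assumptions, nothing more.

```python
# Two cost regimes for the same monthly workload; every number is a placeholder assumption.
import math

def api_cost(runs: int, cost_per_run: float = 0.30) -> float:
    # Variable per-call: the bill scales smoothly with usage.
    return runs * cost_per_run

def self_hosted_cost(runs: int,
                     cluster_per_month: float = 25_000.0,  # fixed GPU capacity (assumed)
                     capacity_runs: int = 400_000,         # runs one cluster can serve (assumed)
                     marginal_per_run: float = 0.02) -> float:
    # Mostly fixed plus marginal: cheap inside capacity, a hard step when you cross it.
    clusters = max(1, math.ceil(runs / capacity_runs))
    return clusters * cluster_per_month + runs * marginal_per_run

for runs in (50_000, 400_000, 500_000):
    print(f"{runs:>7} runs  API ${api_cost(runs):>9,.0f}  self-hosted ${self_hosted_cost(runs):>9,.0f}")
```

Inside capacity, self-hosting looks flat and cheap; one batch of runs past it, and the bill steps up by an entire cluster.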

The GDPR dimension adds weight. A call to a US frontier lab that touches personal data requires a transfer basis that hasn’t been self-evident since the Schrems II ruling. The 2023 EU-US Data Privacy Framework offers a successor, but it’s legally fragile, not the kind of foundation to build a long-running product on. For a European product builder, the choice between frontier API and self-hosting therefore isn’t only a price question. It’s a legal choice that shapes the product fundamentally.

In both regimes, the central fact remains. Runtime intelligence is a product variable, not an infrastructure detail. The location of the invoice changes. The existence of an invoice does not.

What this means for designers

The UI designer of fifteen years ago asked questions about flows, screens, and buttons. Those questions persist. They're no longer sufficient.

The designer building agent-native software today has to answer four questions, per step in a workflow, not per product.

What is the agent permitted to do? Not what it can do technically. What it's authorized to do within this step. Which tools, which actions, which data. Everything outside that is closed off.

How long can it run? A time budget per task. A twenty-minute container is a design choice, not a technical detail. Too short, work goes unfinished. Too long, you’re paying for an agent stuck in a loop.

Under what limits? Rate limits, spend caps, audit rules, fallbacks. What happens if a tool call fails. What happens if the output can’t be verified. Which human signature belongs to which action. And, for European builders, which data is even allowed to pass through it.

At what price? Not just the API invoice. The cost of control layers, logging, orchestration, and human time to review errors. The total amount this step costs, every time it runs.

These aren't four policy checkboxes. They're a decision framework you fill in per interaction, because every interaction has a different balance between value and cost. A dropdown doesn't need an agent. Plowing through ten contracts for contradictions justifies one. The next step in the same workflow can be deterministic again.
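What filling in that framework per step could look like, as a hypothetical sketch; the schema, field names, and values below are illustrative, not any vendor's API.

```python
# Hypothetical per-step decision frame; field names and values are illustrative only.
from dataclasses import dataclass, field

@dataclass
class StepPolicy:
    allowed_tools: list[str]         # what the agent is permitted to do in this step
    max_runtime_seconds: int         # how long it may run
    max_cost_usd: float              # at what price the step is cut off
    requires_human_signoff: bool     # one of the limits it operates under
    allowed_data: list[str] = field(default_factory=list)  # which data may pass through

contract_review = StepPolicy(
    allowed_tools=["read_document", "search_clauses"],
    max_runtime_seconds=600,
    max_cost_usd=1.50,
    requires_human_signoff=True,
    allowed_data=["contracts"],      # no personal data in this step
)

country_dropdown = None  # deterministic UI: no agent, no policy needed
```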

Hybrid becomes the norm. Cheap deterministic layers for predictable work, light models for routine decisions, heavy models only where real reasoning is needed. Routing between those layers is itself a design discipline, and that discipline is still in its infancy.
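A minimal sketch of what such a routing layer could look like, with the tiers and thresholds as assumptions rather than recommendations:

```python
# Minimal routing sketch: deterministic first, light model next, heavy model last.
# Tier names and thresholds are assumptions, not a recommendation.

def route(step: dict) -> str:
    if step.get("deterministic"):               # lookup, regex, validation
        return "code_path"
    if step.get("estimated_value_usd", 0) < 0.50:
        return "light_model"                    # routine classification, extraction
    return "heavy_model"                        # real multi-step reasoning

print(route({"deterministic": True}))           # -> code_path
print(route({"estimated_value_usd": 0.10}))     # -> light_model
print(route({"estimated_value_usd": 40.0}))     # -> heavy_model
```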

The layer above the meter

Whoever pays the bill, in the end, determines who decides. If users can't see what an agent action costs, the incentive is to do everything agent-native. It feels like a free capability upgrade. The bill comes later, with the CFO, at the scaling limit, with the compliance officer. Over the next one or two years, teams will hit walls where their agent architecture either can't make the math work or can't account for what it's doing. That's not a disaster; it's market maturity. But it's coming.

The winners won’t be the teams with the most impressive demos. They’ll be the teams that choose sharply where an agent does and doesn’t belong, and that encode those choices somewhere they carry through to execution.

That last part is the open question. The four answers, what the agent is permitted to do, how long, under what limits, and at what price, are still mostly implicit, scattered across code, config, prompts, and policy documents. That doesn't work when you have to fill them in per interaction. A layer belongs above the meter, where the designer's intent is testably captured, independent of the model beneath. What that layer looks like isn't crystallized yet. That it will come seems inevitable.

For small builders, that layer isn’t a luxury; it’s existential. Without explicit decision frames per step, every agent feature becomes an open tab. With such frames, there’s room for indie builders to make products that stay profitable under production load, and that don’t automatically feed value back to the largest players who can absorb inference costs into an existing P&L.

The old design question was: what can the software do? That question isn’t in the foreground anymore. The new question is different. What is it permitted to do. For how long. Under what limits. At what price.

Those who treat this as a design discipline build products that survive production load, regardless of how large the player behind them is. Those who keep treating it as a feature layer build demos.

That difference will become visible over the next two years.