Coding agents are better than you think
The skepticism about AI coding agents is increasingly based on an outdated frame of reference. The technology has changed. The way of working hasn't.
Many developers say AI isn’t good enough to build with seriously. That judgment sounds level-headed. It’s increasingly based on an outdated frame of reference.
Coding agents make mistakes. That’s beyond dispute. But something structural has changed in recent months: the production layer of software has compressed dramatically. Tasks that used to take days or weeks are now, when properly scoped, completed in hours or minutes.
That difference is underestimated.
The wrong test
Much skepticism stems from experience with an earlier generation of tooling. The context was too small, the output too fragile, the iterations too limited. Or people still use coding agents as a hybrid of autocomplete and chatbot: one prompt in, some code out, see if it happens to work.
That’s the wrong test.
Claude Code now processes a codebase of 100,000 lines as context. It reads files, searches patterns, modifies code, runs tests, corrects errors and keeps going — without the human delay between all those steps. Cursor navigates an unfamiliar project and locates bugs faster than a developer who still needs to learn the codebase. GitHub Copilot agent picks up an issue, creates a branch, writes code, runs CI and opens a pull request — autonomously.
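That loop of reading, editing, running tests and correcting can be sketched in a few lines. This is a minimal, hypothetical illustration of the control flow, not any real agent's API; the tool functions are stubs.

```typescript
// Minimal sketch of an agent's edit-test loop. The stubs stand in for
// real tool calls (file edits, test runner); names are illustrative.
type ToolResult = { ok: boolean; output: string };

function runTests(attempt: number): ToolResult {
  // Stub: pretend the tests fail on the first attempt, then pass.
  return attempt === 0
    ? { ok: false, output: "1 test failed" }
    : { ok: true, output: "all tests passed" };
}

function applyEdit(feedback: string): void {
  // Stub: a real agent would modify files based on the test output.
}

function agentLoop(maxIterations: number): string {
  let lastOutput = "";
  for (let i = 0; i < maxIterations; i++) {
    const result = runTests(i);
    lastOutput = result.output;
    if (result.ok) return lastOutput; // tests are green: done
    applyEdit(result.output);         // otherwise correct and retry
  }
  return lastOutput; // out of budget: report the last failure
}

console.log(agentLoop(5)); // → "all tests passed"
```

The point of the loop is the absence of a human between iterations: the agent consumes its own test feedback, which is what collapses days of back-and-forth into minutes.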
The distance between “autocomplete suggesting a line of code” and “agent implementing a complete feature” has been bridged in twelve months. Teams that still base their judgment on their first experience are testing an airplane by looking at a bicycle.
The real problem isn’t in the agent
The error sits less and less in the technology and more and more in how teams work with it. Instructions that are too broad. Too little context. Too little review. Too much expectation that an agent will automatically understand what’s intended.
Nobody would expect a human developer to immediately build the right thing without a briefing, without codebase context and without acceptance criteria. With coding agents, a surprising number of teams still expect exactly that.
A concrete example: a team I know used Claude Code to perform an API migration — 47 endpoints, from REST to GraphQL. The first attempt failed spectacularly. They gave one prompt: “migrate the API to GraphQL.” The result was unusable.
The second attempt took a day. They wrote a specification per endpoint: which fields, which relations, which breaking changes were acceptable. Claude Code processed all 47 endpoints in four hours. Three needed manual correction. The rest was production-ready.
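What such a per-endpoint specification might look like can be sketched as a small data structure plus a function that turns it into an instruction. Everything here is illustrative — the endpoint, field names and spec shape are assumptions, not details from the actual project.

```typescript
// Hypothetical shape of a per-endpoint migration spec.
interface EndpointSpec {
  restPath: string;              // existing REST route
  graphqlField: string;          // target Query/Mutation field
  fields: string[];              // fields the new type must expose
  relations: string[];           // related types to resolve
  breakingChangesAllowed: boolean;
}

// Illustrative example; not one of the 47 real endpoints.
const orderSpec: EndpointSpec = {
  restPath: "GET /orders/:id",
  graphqlField: "order(id: ID!): Order",
  fields: ["id", "status", "total", "createdAt"],
  relations: ["customer", "lineItems"],
  breakingChangesAllowed: false,
};

// Assemble one scoped instruction per endpoint, so the agent gets
// explicit acceptance criteria instead of "migrate the API".
function toInstruction(spec: EndpointSpec): string {
  return [
    `Migrate ${spec.restPath} to GraphQL field ${spec.graphqlField}.`,
    `Expose fields: ${spec.fields.join(", ")}.`,
    `Resolve relations: ${spec.relations.join(", ")}.`,
    spec.breakingChangesAllowed
      ? "Breaking changes are acceptable."
      : "No breaking changes for existing consumers.",
  ].join("\n");
}

console.log(toInstruction(orderSpec));
```

One spec object per endpoint is exactly the kind of scoping the second attempt added: each prompt carries its own fields, relations and constraints, so the agent never has to guess what "done" means.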
The difference was in the instruction, in the scoping and in the review. The agent was identical.
The shift teams are missing
Once building becomes cheap and fast, the bottleneck shifts. Away from production capacity, toward scoping, instruction and review. That explains why some teams are already moving ten times faster while other teams remain stuck in the conclusion that it’s not good enough yet.
The former have adjusted their frame of reference. They write sharper instructions. They split work into scoped tasks. They review output the way they’d review a junior developer’s work: thoroughly, quickly, with clear feedback. They treat the coding agent as a production tool that needs quality control, not as an oracle that should just know.
The latter are still testing as if it’s 2024.
Vic Boomer is the proof. The entire codebase of Pantion — 46 tools, more than a thousand tests, tens of thousands of lines of TypeScript — was built with Claude Code. The past sprint: a complete OpenClaw adapter, six agent workspaces, three styleguides, an image pipeline and a deployment system. One developer, one coding agent, two weeks.
That pace was unthinkable eighteen months ago. Not because the developer types faster, but because the production layer works fundamentally differently.
What teams should do now
The first step is adjusting the frame of reference. Coding agents of April 2026 are a different category than those of 2024. Anyone basing their judgment on an experience from a year ago is evaluating a different product.
The second step is learning to work with the production layer. That means: smaller instructions, more context, sharper acceptance criteria, faster review loops. The quality of the output is a direct function of the quality of the instruction.
The third step is measuring. Take a task that normally takes a week. Give it to a coding agent with proper scoping. Measure the difference. Three hours versus five days isn’t an anecdote — it’s a structural shift in the economics of software.
Outdated benchmark
This isn’t a temporary tooling story. It’s a structural shift in how software is built. The agents are good enough. The question is whether teams are good enough at working with this new production layer.
Anyone still evaluating coding agents based on last year’s experience is watching the wrong game.
Vic Boomer is an essay-led AI studio that turns ideas about AI, agents and software into clear analysis, working systems and practical tools.