The harness compounds. The argument doesn't.

Commentary · 2 min read · Published 2026-05-03 · AI Primer

Source: Kartik

AI and Software · Developer Tools · AI Strategy · AI Hype

Kartik's piece, Why Everyone Is Suddenly Building Their Own Agent Harness, makes a claim worth taking seriously. As frontier models converge on capability and price, the scaffolding around them — tool design, context management, eval suites, verification loops — is where the durable engineering investment now lives.

The reframing is specific, and most AI commentary still misses it. The post separates raw model intelligence, which resets with every release, from harness investment, which accumulates. Every fix to a failure mode (a lint rule, a sub-agent, a tool boundary) keeps paying off across future model versions. That asymmetry is real, and naming it cleanly is more useful than another round of benchmark commentary. The Claude Code leak figure, roughly 513,000 lines of TypeScript wrapping a few lines of API call, is the kind of concrete number that makes the point land.
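The asymmetry can be made concrete with a minimal harness loop: each fixed failure mode lands as a permanent check in the verification step, where it keeps working when the model underneath is swapped. This is a hypothetical sketch, not the post's code; `call_model`, `run_task`, and the check names are all illustrative stand-ins.

```python
from typing import Callable, Optional

# Each fixed failure mode becomes a permanent check; this list only grows.
CHECKS: list[Callable[[str], Optional[str]]] = []

def check(fn: Callable[[str], Optional[str]]) -> Callable[[str], Optional[str]]:
    CHECKS.append(fn)
    return fn

@check
def no_placeholder_code(output: str) -> Optional[str]:
    # A fix added after one bad model release; it pays off on every later model.
    return "contains TODO placeholder" if "TODO" in output else None

def run_task(prompt: str, call_model: Callable[[str], str], retries: int = 2) -> str:
    """Run a task; on check failure, feed the failure back and retry."""
    output = call_model(prompt)
    for _ in range(retries):
        failures = [msg for fn in CHECKS if (msg := fn(output)) is not None]
        if not failures:
            break
        prompt = prompt + "\nFix: " + "; ".join(failures)
        output = call_model(prompt)
    return output
```

The point of the shape is that swapping `call_model` for a new model release leaves `CHECKS` intact — the accumulated fixes carry over.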

The headline is where it falls apart. "Everyone is suddenly building their own" is doing work the rest of the piece doesn't support. Read the second-to-last section carefully. The actual advice is: build your own when you see a sustained 15-point eval gap, when per-task economics matter, when you need audit trails stock harnesses don't provide, or when your domain needs tools that don't exist yet. Those are uncommon conditions. Most teams adding AI to a product should extend Claude Code or Cursor, not rebuild them. The headline is selling a movement. The advice is describing a niche.
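The post's four conditions read less like a movement and more like a gate. A hypothetical sketch of that gate, with illustrative field names (the 15-point threshold is the post's; everything else is an assumption):

```python
from dataclasses import dataclass

@dataclass
class HarnessSignals:
    eval_gap_points: float       # sustained eval gap vs. the best stock harness
    per_task_cost_matters: bool  # margins depend on per-task economics
    needs_custom_audit: bool     # audit trails stock harnesses don't provide
    needs_novel_tools: bool      # domain tools that don't exist yet

def should_build_own_harness(s: HarnessSignals) -> bool:
    """Build your own only if at least one condition holds;
    otherwise extend an existing harness (e.g. Claude Code, Cursor)."""
    return (
        s.eval_gap_points >= 15
        or s.per_task_cost_matters
        or s.needs_custom_audit
        or s.needs_novel_tools
    )
```

For most teams every field is false, which is exactly the niche-versus-movement gap the headline papers over.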

The example list has the same problem. Cursor, Devin, Replit, Factory, Sourcegraph: these companies' product is an agent. Of course their harness is custom. That tells you very little about what an internal ops team or a SaaS company embedding an AI feature should do.

The other thing the piece glosses over is that harnesses also rot. A lot of harness work is tuned to specific model quirks: prompt patterns, tool-call reliability, context behaviour. When the model underneath changes, that work degrades. The compounding is real, but it's uneven. "Harnesses compound, models commoditise" works cleanly as a slogan and less cleanly as a description.

The signal worth keeping is narrower than the packaging. If your product is an agent, the engineering around the model is where the moat sits. If AI is a feature inside something else, the harness is an implementation detail of someone else's product.
