6,600 Commits and Nobody's Reading the Code

Commentary · 2 min read · Published 2026-02-24 · AI Primer

Source: Charlie Guo

AI and Software · Developer Tools · AI Adoption

Charlie Guo, writing about the emerging harness engineering playbook:

@steipete, creator of OpenClaw, told the Pragmatic Engineer he ships code he doesn't read. One person, 6,600+ commits in a month, running 5-10 agents simultaneously.

And later:

Agent-generated code accumulates cruft differently than human-written code.

These two observations appear in the same piece. Nobody pauses on the tension between them.

The substantive argument here is good. The engineer's role is splitting into environment design and work orchestration. When an agent fails, treat it as a missing guardrail, not a capability gap. Living documentation that updates every time an agent screws up. Architecture constraints enforced by linters that correct the agent and teach it in the process. These are real practices converging across OpenAI, Stripe, and Anthropic's own engineering teams, and if you lead a software team, you should understand them now.
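To make the "linters that teach" idea concrete, here's a minimal sketch of what such a guardrail might look like. Everything in it is illustrative: the layering rule, module names, and message wording are my assumptions, not taken from the article. The point is that the check fails with an explanation the agent can read and act on, rather than a bare error code.

```python
# Hypothetical "teaching" lint rule: a CI check that doesn't just fail on
# an architecture violation but explains the constraint, so an agent (or
# human) reading the output learns the rule. The layering rule below
# (ui must not import db directly) is an invented example.
import ast

FORBIDDEN = {
    # (importing layer, imported layer) -> explanation shown on failure
    ("ui", "db"): (
        "ui modules must go through the service layer; "
        "import from `services` instead of `db`."
    ),
}

def check(path: str, source: str) -> list[str]:
    """Return teaching-style error messages for layering violations."""
    layer = path.split("/")[0]  # crude: top-level directory = layer
    errors = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, (ast.Import, ast.ImportFrom)):
            if isinstance(node, ast.ImportFrom) and node.module:
                names = [node.module]
            else:
                names = [alias.name for alias in node.names]
            for name in names:
                rule = FORBIDDEN.get((layer, name.split(".")[0]))
                if rule:
                    errors.append(f"{path}:{node.lineno}: {rule}")
    return errors

if __name__ == "__main__":
    for msg in check("ui/page.py", "from db import users\n"):
        print(msg)
```

Wired into CI, a failure here produces output the agent sees on its next attempt, which is exactly the "missing guardrail, not a capability gap" framing: the environment, not the prompt, encodes the constraint.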

But every example is either greenfield code, a Big Tech team with world-class infrastructure, or one guy with an extraordinary appetite for risk. Guo calls this "the retrofit problem" and gives it exactly four sentences before moving on. Meanwhile, most professional software engineering happens inside ten-year-old codebases with inconsistent test coverage and documentation last updated during the Obama administration. That's not a minor caveat. That's the job.

The piece also opens by gesturing at "general-purpose knowledge work" and then spends its entirety talking about pull requests, CI pipelines, and linter errors. The harness pattern works because code is mechanically verifiable — you can run tests, enforce types, check for regressions automatically. Most professional work doesn't have a CI pipeline. The implied universality is doing a lot of quiet heavy lifting.

What I keep coming back to is the word trust. Guo frames this as a spectrum: attended parallelisation (you're watching the agents) versus unattended (you post a task and walk away). Where you land depends on "how mature your harness is, and how much you trust the agent with your codebase." Fair enough. But trust built on volume — thousands of merged PRs, thousands of commits — is not the same as trust built on understanding. We're getting very good at measuring throughput. We have no idea how to measure the maintenance debt accumulating underneath it.

The most honest sentence in the whole piece: "How harness engineering works for brownfield projects is an open question." For most teams, that open question is the whole question. Everything else is a greenfield success story told by the survivors.
