When Four Teams Independently Build the Same Thing, the Pattern Is Real — Even If the Announcement Isn't Ready

Commentary · 3 min read · Published 2026-03-15 · AI Primer

Source: Random Labs

AI and Software · AI Hype · Developer Tools

Random Labs released a long post on X announcing the new version of Slate, their coding agent. The post claims Slate is the first "swarm native" agent, one that orchestrates sub-agents through a shared code environment rather than through message passing. It's part technical report, part startup positioning exercise, and the two pull against each other throughout.

Here's what's actually worth your attention: the architectural convergence. The post explicitly names Cognition, Fundamental (formerly Altera), Manus, and the original RLM researchers as teams that arrived at essentially the same design principle — separate strategic reasoning from tactical execution, compress context at the boundary between the two. Random Labs claims they reached the same conclusions independently. When four or five teams working on different problems converge on the same primitive, that's a signal worth tracking regardless of which team's marketing you find most convincing.

The specific mechanism matters. In the Recursive Language Model paradigm that Slate builds on, the model doesn't consume its full context directly. It treats the context as an environment variable it can inspect programmatically — slicing, filtering, and delegating chunks to sub-models. This is a genuine departure from "stuff everything into the context window and hope," and the results from Prime Intellect and others suggest it holds up on hard long-context tasks where standard approaches collapse.
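To make the mechanism concrete, here is a minimal sketch of context-as-environment, under stated assumptions: `sub_model` and `keyword_of` are hypothetical stand-ins, and real RLM implementations let the model write arbitrary inspection code rather than this fixed filter. The contrast is with the baseline of feeding `context` to the model wholesale.

```python
def sub_model(chunk: str, question: str) -> str:
    # Stand-in for delegating a slice of context to a sub-model call.
    return f"answer about {question!r} from a {len(chunk)}-char chunk"


def keyword_of(question: str) -> str:
    # Crude stand-in for deciding what to look for in the context.
    return question.split()[0].lower()


def rlm_answer(context: str, question: str, chunk_size: int = 1000) -> list[str]:
    """The root model never reads `context` directly. It treats it as a
    variable: slices it, filters the slices, and delegates only the
    relevant ones to sub-models."""
    keyword = keyword_of(question)
    relevant = [
        context[i:i + chunk_size]
        for i in range(0, len(context), chunk_size)
        if keyword in context[i:i + chunk_size].lower()
    ]
    return [sub_model(chunk, question) for chunk in relevant]
```

The payoff on long-context tasks is that cost scales with the filtered slices, not with the full context length.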

Now, the part that doesn't earn itself.

Midway through, the post drops a Terminal Bench 2.0 anecdote: a "less flexible" earlier version of Slate passed a task 2 out of 3 times, where Opus succeeds at most 1 time in 5. This is presented as evidence of capability, but look at what's actually there: an older architecture, a single task, a sample size the post doesn't justify, and 2/3 isn't a number anyone would publish in a paper. Two sentences later: "we really do not believe in benchmaxxing, but we will be producing benchmark scores at some point in the coming weeks."

That's the move to watch for. Announce with confidence now, promise rigour later. It's endemic in this space and it's worth naming every time it happens, because the gap between an exciting anecdote and a systematic evaluation is where most agent claims quietly die. If the architecture is as sound as the convergence evidence suggests, it will survive benchmarks. Deferring them while making performance claims is a choice that undermines an otherwise credible technical narrative.

The new terminology — "knowledge overhang" and "expressivity" — may or may not stick. Both describe real phenomena (models know more than they use; interface design shapes what capabilities get expressed), but agent research already has vocabulary for adjacent concepts, and coining new terms in a product announcement rather than a peer-reviewed context is a flag, not a feature.

Read the post for the architecture. Ignore the tone. Wait for the benchmarks.
