Most AI Agent Failures Aren't Engineering Problems
Source: Rohit on X

Rohit, in "How to Build a Production Grade AI Agent":
Over 40% of agentic AI projects fail — not because of the models, but due to inadequate risk controls, poor architecture, and unclear business value.
That last item — unclear business value — is the one that matters most. The other two get the ten-point framework. It gets half a sentence and is never mentioned again.
The engineering advice itself is largely sound. The confused deputy problem is real: agents hold elevated permissions that users don't, and prompt injection can weaponise that gap. Typed schemas for tool calls, circuit breakers, tenant isolation, context compression with provenance tracking — none of this is glamorous, but teams shipping agents that write to databases or send customer emails ignore it at genuine risk. The principle that "real security must exist entirely outside the LLM reasoning loop" deserves to be highlighted and repeated.
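That principle is worth making concrete. A minimal sketch, in Python, of what "outside the reasoning loop" means: a hard gate that validates a model-proposed tool call against a typed schema and the *user's* permissions before anything executes. The tool names and permission table here are invented for illustration; they are not from the article.

```python
# Hypothetical illustration: a tool-call gate enforced entirely outside
# the model. A prompt-injected "call" hits the same wall as any other.

ALLOWED_TOOLS = {
    "send_email": {"to": str, "subject": str, "body": str},
    "query_db": {"sql": str},
}

# Permissions keyed to the *user*, not the agent -- this is what closes
# the confused-deputy gap: the agent cannot do what the user cannot.
USER_PERMISSIONS = {"alice": {"query_db"}, "bob": {"query_db", "send_email"}}


def gate_tool_call(user: str, tool: str, args: dict) -> dict:
    """Validate a model-proposed tool call before anything executes."""
    if tool not in ALLOWED_TOOLS:
        raise PermissionError(f"unknown tool: {tool}")
    if tool not in USER_PERMISSIONS.get(user, set()):
        raise PermissionError(f"{user} may not call {tool}")
    schema = ALLOWED_TOOLS[tool]
    if set(args) != set(schema):
        raise ValueError(f"args must be exactly {sorted(schema)}")
    for name, typ in schema.items():
        if not isinstance(args[name], typ):
            raise TypeError(f"{name} must be {typ.__name__}")
    return args  # only now is the call handed to the real tool
```

The point is structural: no amount of adversarial text in the context window can alter `ALLOWED_TOOLS` or `USER_PERMISSIONS`, because the model never touches them.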
Where it strains is in scope. This is a reference architecture for a platform team at a mid-to-large enterprise — hardware security modules, mutual TLS, workload identity federation. That's a legitimate audience, but the framing implies it's the way to build agents. A five-person company deploying an internal support tool doesn't need most of this. More importantly, the article never asks the prior question: should this be an agent at all? Many of the failure modes it catalogues — runaway loops, unauthorised actions, context contamination — are symptoms of reaching for agentic architecture when a simpler pipeline would have done the job.
There's also a tension the piece doesn't resolve. It acknowledges that prompt injection "may be inherent to how LLMs process natural language," then prescribes an engineering framework to make agents production-safe anyway. If the foundational vulnerability is unsolved, the honest framing isn't "here's how to secure it" — it's "here's how to contain the blast radius of a system whose core security model remains open." That distinction matters for anyone deciding how much autonomy to hand over.
Useful as a checklist for teams already committed to deploying agents at scale. But the article it should have been starts with the question it never asks: given that 40% of these projects fail, how do you decide whether yours should exist in the first place?