AI ROI looks disappointing when the company buys tools but keeps the old operating model.


There are two AI conversations happening at the same time.

In one, the technology is obviously powerful. Agents can work for hours, call tools, write code, summarize messy histories, draft documents, classify work, and coordinate across systems. Independent studies show real productivity gains in support, development, and knowledge work. Startups are running leaner. Solo businesses are crossing revenue thresholds that used to require teams.

In the other, executives keep asking where the ROI is. The pilots are everywhere. The dashboards show usage. The invoices are real. The margin impact is harder to find.

Both conversations can be true because they are talking about different things.

AI capability can improve before the organization knows how to absorb it.

That gap is the ROI problem.

Tool activity can climb long before ROI shows up. ROI appears when workflow, memory, ontology, and decision rights are redesigned.

Adoption is not transformation

The most common mistake is measuring adoption as if it were business change.

How many seats did we deploy? How many prompts did employees run? How many documents did the assistant summarize? How many tickets touched the AI system? Those are useful operating metrics, but they are not ROI. They are evidence that the tool was present.

ROI shows up somewhere else: cycle time, conversion, cost to serve, revenue per employee, issue resolution, customer retention, error rate, time to onboard, percentage of work handled without escalation.

The difference matters because a company can have very high AI activity and very little organizational change. Employees use the tool inside the old process. The old queue remains. The old approvals remain. The old reporting remains. The old data cleanup remains. The old manager still asks for the same status update because the system did not become trustworthy enough to remove it.

The model sped up fragments of work. The production system stayed the same.

That is not a model failure. It is an absorption failure.

The consultant trap

The a16z Charts piece included a striking market signal: Accenture, once treated as a likely winner from AI adoption and implementation demand, saw its valuation compress sharply from early 2025 levels. I would not overread one stock chart. Markets are noisy, and companies are complicated.

But the chart is a useful symbol of a broader correction.

The early enterprise AI story was friendly to consultants: every company wants AI, every company needs a roadmap, and somebody has to implement it. That story assumes the hard part is adoption. Pick use cases, integrate tools, train people, manage change.

The harder truth is that implementation is not the same as redesign.

You can implement AI into a broken process and make the broken process faster. You can build a chatbot over messy knowledge and make the mess more accessible. You can automate a handoff that should not exist. You can summarize data that the agent should have been able to retrieve as structured memory. You can spend a year adding AI to workflows that should have been deleted.

That work creates activity. It may even create local wins. But it does not necessarily create firm-level ROI.

The consultant trap is mistaking "AI has been added" for "the work has been remade."

Why the gains are uneven

This also explains why AI productivity results look uneven across studies.

Give a good assistant to a new support rep and the gains can be large. The work is repetitive, the knowledge gap is real, the success metric is clear, and the AI helps retrieve and compose what the rep does not yet know.

Give an early agent to an expert doing work they already perform fluently, inside a workflow that was not redesigned, and it can slow them down. They have to supervise, correct, translate, and manage the tool. The AI adds a second job: operator of the assistant.

Both results make sense. AI helps when it removes a real bottleneck. It hurts when it adds coordination overhead to a person who was not bottlenecked there.

This is why "AI for everyone" is weaker than "AI for the right workflow." The workflow determines whether the model is leverage or friction.

The discovery bottleneck

The most useful phrase from the research summarized in the PDF is "discovery bottleneck."

Companies are not merely deciding whether to use AI. They are trying to discover where AI changes production. That is a different problem. It requires understanding the work deeply enough to see which parts should be delegated, which should be redesigned, which should be governed, and which should remain human.

This discovery work is slower than buying software. It is also where the value is.

The winning question is not "what can this model do?" Models can do a lot. The winning question is "where does our current process exist only because humans used to be the only flexible reasoning layer?"

Those places are everywhere:

  • humans reading ten systems before taking one action
  • managers collecting status because systems do not produce it
  • analysts translating business terms into database fields
  • sales reps rebuilding account context before every touch
  • support reps searching policy, customer history, and product notes separately
  • operators checking work because the system has no persistent memory of exceptions

These are not isolated tasks. They are symptoms of missing infrastructure.

The infrastructure of ROI

If the ROI comes from redesign, then the infrastructure has to support redesigned work.

That means memory. The agent must retain what it learned about a customer, account, ticket, employee, project, or policy. Otherwise every run pays the same context cost again.

It means ontology. The agent must understand the business meaning of the entities, metrics, stages, relationships, and rules it touches. Otherwise it operates on raw fields and plausible guesses.

It means retrieval as a session, not a stateless query. Long-running agents should not rediscover the same facts every step or flood the context window with repeated evidence.

It means instructions, not prompts. The workflow should be explicit enough to run repeatedly, inspectably, and at scale.

It means governed writes. The system must know what the agent can update, what requires approval, what should be logged, and what should never be touched.

It means outcome measurement. The point is not that the agent ran. The point is that the workflow got faster, cheaper, more accurate, more complete, or more scalable.

Without that layer, the company is asking a model to create ROI inside an operating system designed for humans manually moving context around.
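To make "governed writes" concrete, here is a minimal sketch of a write gate. It assumes a hypothetical policy table that maps each record field to one of three rules: the agent may write alone, the write needs review, or the field must never be touched. Every request is logged. The field names, `WriteRule`, and `WriteGate` are illustrative inventions, not any product's API.

```python
from dataclasses import dataclass, field
from enum import Enum


class WriteRule(Enum):
    AUTO = "auto"            # agent may write alone; the write is still logged
    REVIEW = "review"        # write is queued for human approval
    FORBIDDEN = "forbidden"  # agent must never touch this field


# Hypothetical policy for a support workflow; field names are illustrative.
POLICY = {
    "ticket.tags": WriteRule.AUTO,
    "ticket.internal_note": WriteRule.AUTO,
    "ticket.customer_reply": WriteRule.REVIEW,
    "account.refund_amount": WriteRule.REVIEW,
    "account.contract_terms": WriteRule.FORBIDDEN,
}


@dataclass
class WriteGate:
    policy: dict
    records: dict = field(default_factory=dict)       # stand-in for the system of record
    audit_log: list = field(default_factory=list)     # every request, whatever the outcome
    review_queue: list = field(default_factory=list)  # writes waiting for a human

    def request(self, agent_id: str, target: str, value: object) -> str:
        # Unlisted targets default to FORBIDDEN: the agent only has authority it was granted.
        rule = self.policy.get(target, WriteRule.FORBIDDEN)
        self.audit_log.append((agent_id, target, value, rule.value))
        if rule is WriteRule.AUTO:
            self.records[target] = value
            return "applied"
        if rule is WriteRule.REVIEW:
            self.review_queue.append((agent_id, target, value))
            return "queued"
        return "rejected"
```

The design choice worth noting is the default. A field nobody thought about is rejected rather than writable, so the trust boundary grows only when someone explicitly extends the policy.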

A practical diagnostic

Here is the diagnostic I would use before funding an AI project.

First, name the business outcome in a sentence. Not "deploy an AI assistant for support." Say "reduce time to first qualified response on tier-two support tickets by 40 percent without lowering resolution quality."

Second, draw the workflow as it exists. Include the ugly parts: copying from Slack, checking the old spreadsheet, asking the same person for approval, searching the policy doc, rewriting the same CRM note.

Third, ask which parts exist because the system lacks memory, meaning, or authority. Those are the candidates for redesign.

Fourth, decide what the agent should own. Retrieval? Drafting? Classification? Tool calls? Follow-up? Record updates? Escalation? Do not say "assist." Say what it owns.

Fifth, define the trust boundary. What can it do alone? What needs review? What gets written back? What gets remembered?

Sixth, measure the outcome before and after. If the metric is usage, you are not measuring ROI.

This is slower than announcing a Copilot rollout. It is also the work that makes the rollout matter.
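The sixth step can be as plain as a before/after comparison against the outcome named in the first step. The sketch below assumes the example target from that step: a 40 percent cut in median time to first qualified response, with the resolution rate as the quality guardrail. The function name, the inputs, and the sample numbers are hypothetical, chosen only to show the arithmetic. Note that nothing in it counts prompts or seats.

```python
from statistics import median


def outcome_check(before_min, after_min, before_resolved, after_resolved,
                  target_reduction=0.40):
    """Compare median time to first qualified response before and after a redesign.

    before_min / after_min: minutes to first qualified response, one value per ticket.
    before_resolved / after_resolved: resolution rate (0..1), used as the quality guardrail.
    """
    base = median(before_min)
    new = median(after_min)
    if base <= 0:
        raise ValueError("baseline median must be positive")
    reduction = (base - new) / base
    quality_held = after_resolved >= before_resolved
    return {
        "median_before": base,
        "median_after": new,
        "reduction": round(reduction, 3),
        # The target counts only if speed improved AND quality did not drop.
        "target_met": reduction >= target_reduction and quality_held,
    }


# Illustrative numbers only, not real results.
result = outcome_check([120, 90, 150, 100, 80], [60, 55, 70, 50, 58], 0.90, 0.91)
```

With these made-up inputs the median falls from 100 to 58 minutes, a 42 percent reduction, and quality holds, so the target is met. The same speedup paired with a lower resolution rate would fail the check.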

The ROI will not be evenly distributed

The uncomfortable part is that AI ROI will compound unevenly.

Two companies can buy access to the same model and get completely different results. One bolts it onto old workflows and sees scattered productivity. The other remaps the work, gives agents memory, defines an ontology, governs writes, and measures outcomes. The second company gets a learning system. The first gets an expense line with enthusiasm attached.

This is why the next phase of AI competition will feel unfair. The model layer is increasingly shared. The organizational layer is not. Companies that redesign work will turn the same model into operating leverage. Companies that only deploy tools will wonder why the promised gains keep showing up in someone else's numbers.

The ROI question is real. Skepticism is healthy. But the answer is not to wait until AI gets a little better.

The answer is to stop treating AI as a tool rollout and start treating it as a production redesign.

Until the org changes, the ROI will look worse than the capability.

After the org changes, the capability starts to compound.


The companion piece, AI-Native Is Not AI-Enabled, covers the strategy. Where to Start With Agents turns it into a first deployment pattern.