Every provider now spawns subagents at scale. Spawning was never the hard part. Running the same job correctly and affordably across a hundred thousand database rows is. Here is what changes when the unit of agentic work becomes one subagent per record, grounded in governed memory.
Point an autonomous agent at one record and it looks like magic. It reads the contact, decides what to research, calls a few tools, writes a tidy summary. You watch it work and you think: do this for every row in my database and I am done.
So you do. You point the same agent at a hundred thousand rows. Two things break, and neither has anything to do with how good the model is.
The first is the bill. An autonomous agent decides for itself how many tool calls it needs. One record triggers three searches; the next triggers eleven because the agent got curious. Multiply a per-record cost you cannot predict by a hundred thousand and you do not have a budget, you have a lottery ticket. You find out what it cost after it has already cost it.
The second is consistency. The same instruction, run a hundred thousand times by an agent that re-decides its approach every run, produces a hundred thousand slightly different shapes. Some scores are 0-100, some are 1-5, some are prose. Some emails open with the pain point, some bury it in paragraph three. Each output is defensible on its own. As a column in your database, they are noise. Nothing downstream can trust a field whose format depends on the mood of the agent that wrote it.
The model was never the problem. The unit of work was.
Spawning was the easy part
The last year shipped agent spawning into everything. Anthropic's Dynamic Workflows let Claude write a script that fans out subagents in the background while your session stays live. Kimi spawns tool-using subagents against a goal. OpenAI, Google, and a growing list of providers all have some version of "give the model a task and let it place a fleet."
These are genuinely good at what they are for: open-ended work where you want the agent to figure out the shape of the problem. One human, one goal, an agent that decides which files to read and which APIs to call.
But notice what they all assume. They assume the hard problem is getting a fleet of agents to exist. It is not. Spawning a thousand agents is a for loop. The hard problem is the one that shows up only after they exist: making a thousand of them do the same job, correctly, against a thousand different records, for a knowable price.
That second problem is the one I have spent the last stretch of work on. The answer turned out to require changing what counts as the unit of agentic work.
One subagent per row
For two years an agent has been a thing you talk to: a chat window, a session, a conversation that holds state. The shift is small to say and large in consequence. The unit is no longer a conversation. It is a record.
The pattern is one record, one subagent run. You select the rows that need work, and you dispatch a subagent per row that does five things in order: it plans, it gathers, it reasons, it acts, and it writes back. Its first step is grounded automatically in everything the organization already knows about that record: its memory, its history, the org's guidelines. It does the work, then it writes structured properties back onto the record and leaves free-text observations the next run will recall.
// Select the rows that need work.
const contacts = await retrieveRecords({ type: "contact", conditions: [/* ICP filter */] });

// One subagent run per record.
for (const contact of contacts) {
  await aiSubagent({
    instructions: [ /* plan -> gather -> reason -> produce */ ],
    outputs: [{ name: "next_best_action", required: true }],
    tier: "pro",
    metadata: { recordId: contact.record_id },
    memorize: { email: contact.email, type: "Contact" },
  });
}
This scales to thousands of subagents in parallel. We cap concurrency to protect our own servers, but the architecture has no ceiling in principle. Each subagent can use most major LLMs, can call your MCP tools, and can run for up to ten minutes of real reasoning per record. That is not a one-shot completion. That is a small worker that researches, validates, and produces, then disappears.
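The concurrency cap described above can be sketched as a small worker pool. This is a minimal sketch under my own assumptions, not the platform's dispatcher: `dispatchAll`, `Outcome`, `runOne`, and `limit` are hypothetical names, and `runOne` stands in for whatever call runs one subagent against one record.

```typescript
// Hypothetical sketch: one task per record, with at most `limit` running at once.
// `runOne` is a placeholder for a per-record subagent call, not a real SDK function.
type Outcome<T> = { ok: true; value: T } | { ok: false; error: unknown };

async function dispatchAll<R, T>(
  records: readonly R[],
  limit: number,
  runOne: (record: R) => Promise<T>,
): Promise<Outcome<T>[]> {
  const results: Outcome<T>[] = new Array<Outcome<T>>(records.length);
  let next = 0; // index of the next record nobody has claimed yet

  // Each worker keeps claiming the next unclaimed index until none remain.
  async function worker(): Promise<void> {
    while (next < records.length) {
      const i = next++;
      try {
        results[i] = { ok: true, value: await runOne(records[i]) };
      } catch (error) {
        // One failed record must not stop the rest of the batch.
        results[i] = { ok: false, error };
      }
    }
  }

  const workerCount = Math.max(1, Math.min(limit, records.length));
  await Promise.all(Array.from({ length: workerCount }, () => worker()));
  return results;
}
```

Results come back in record order regardless of which run finished first, so a failure stays attached to the row that caused it.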
The interesting part is not the fan-out. It is that the fan-out is boring. Every run is the same shape because every run executes the same authored instruction chain, not an improvised plan.
How structure buys back what autonomy spent
The trade is the whole game. You give up open-ended autonomy inside each agent, and you get back two things that production work cannot do without.
You get predictable cost. A structured subagent runs the instruction chain it was handed, on the model tier you picked, returning in a known token envelope. Ten thousand of them cost ten thousand times a number you can compute in advance. You can authorize the spend before it happens instead of discovering it after.
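The arithmetic behind that claim is short enough to write down. A minimal sketch, assuming cost is bounded by a per-run token envelope; the interface, the function names, and every number you feed it are hypothetical placeholders, not real pricing for any provider or tier.

```typescript
// Hypothetical envelope: all fields are placeholders chosen for illustration.
interface RunEnvelope {
  maxInputTokens: number;     // upper bound on tokens sent per run
  maxOutputTokens: number;    // upper bound on tokens returned per run
  inputPricePerMTok: number;  // currency units per million input tokens
  outputPricePerMTok: number; // currency units per million output tokens
}

// Worst-case cost of a single run that stays inside the envelope.
function maxCostPerRun(e: RunEnvelope): number {
  return (
    (e.maxInputTokens * e.inputPricePerMTok +
      e.maxOutputTokens * e.outputPricePerMTok) / 1e6
  );
}

// Worst-case cost of the whole batch, checked against an approved budget
// before anything is dispatched.
function forecast(recordCount: number, e: RunEnvelope, budget: number) {
  const total = recordCount * maxCostPerRun(e);
  return { total, withinBudget: total <= budget };
}
```

With a made-up envelope of 4,000 input tokens at 1 unit per million and 1,000 output tokens at 4 units per million, ten thousand runs forecast to roughly 80 units. The point is that the number exists before the first run starts.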
You get predictable quality. An instruction chain tuned on fifty records produces, on the next fifty thousand, output a downstream system can build on without second-guessing every row. The format holds. The reasoning steps are the same steps. The score is on the same scale every time.
Autonomy is the right default for one open-ended task. It is the wrong default for the same task repeated fifty thousand times.
The customer who wants an agent to explore an unknown problem reaches for Claude Code or a dynamic workflow, and should. The customer who wants a hundred thousand records processed correctly, cheaply, and identically by tomorrow morning needs the opposite property. Same underlying models. Opposite design center.
Consistency is built, not hoped for
Predictable shape does not happen because you asked nicely. It is bought with specific machinery inside the instruction chain. Three pieces do most of the work.
Refuse instead of fabricate. The classic failure of a subagent at scale is the plausible invention: it cannot find the contact's employer, so it makes one up, and auto-memorize writes the fiction to your database forever. The cure is to give the model an explicit exit. If it cannot verify a fact, it aborts the run, and an abort writes nothing back.
"Research this company using the allowed tools. If after the search you cannot
verify even the company's basic identity, emit <abort reason='unverified_company'>
explanation</abort> instead. Do not invent."
A refusal is a clean signal you can route on. A fabrication is a landmine you find months later.
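On the receiving side, routing on that signal can look like the sketch below. It assumes the run's raw text output is available to your own code and that the abort tag arrives in the form the instruction asks for; `classifyOutput`, `handleRun`, and the callback names are hypothetical, not part of any SDK.

```typescript
type RunResult =
  | { kind: "aborted"; reason: string; explanation: string }
  | { kind: "completed"; text: string };

// Matches <abort reason='...'>...</abort>, with single or double quotes.
const ABORT_TAG = /<abort\s+reason=(['"])([^'"]*)\1\s*>([\s\S]*?)<\/abort>/;

// Classify raw output as either a refusal or a completed result.
function classifyOutput(raw: string): RunResult {
  const m = ABORT_TAG.exec(raw);
  if (m) {
    return { kind: "aborted", reason: m[2], explanation: m[3].trim() };
  }
  return { kind: "completed", text: raw };
}

// Only completed runs reach the write path. Aborted runs are routed, not stored.
function handleRun(
  raw: string,
  writeBack: (text: string) => void,
  onAbort: (reason: string, explanation: string) => void,
): void {
  const result = classifyOutput(raw);
  if (result.kind === "aborted") {
    onAbort(result.reason, result.explanation);
    return; // nothing is written back
  }
  writeBack(result.text);
}
```

The `reason` attribute is what makes the refusal routable: a queue for `unverified_company` can go to a human or a retry with different tools, while the database stays untouched.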
Check the work against itself. For numeric tasks, the model's summary number often disagrees with its own component scores. So you make the last step recompute from the parts and overrule the model if they diverge.
// Step 2 - score the factors and produce a weighted total
"Compute 0.4*firmographic + 0.3*signals + 0.2*engagement + 0.1*champion. Round to integer."
// Step 3 - consistency check, then emit
"Recompute the weighted sum with the same formula. If it differs from step 2 by
more than 1, use the recomputed value. Then emit the required outputs."
This is a math invariant, not creative reasoning, so it runs on a cheap tier. The agent is no longer allowed to disagree with itself silently.
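The same invariant can also be enforced in plain code after the run. That is a variation on the in-prompt check above, not a description of how the chain itself does it. A minimal sketch using the weights from the instruction; the `Factors` shape and function names are hypothetical.

```typescript
interface Factors {
  firmographic: number;
  signals: number;
  engagement: number;
  champion: number;
}

// Same formula as the instruction: weighted sum, rounded to an integer.
function weightedTotal(f: Factors): number {
  return Math.round(
    0.4 * f.firmographic + 0.3 * f.signals + 0.2 * f.engagement + 0.1 * f.champion,
  );
}

// Keep the model's total if it is within tolerance of the recomputed value,
// otherwise overrule it with the recomputed value.
function reconcile(
  f: Factors,
  modelTotal: number,
  tolerance = 1,
): { total: number; overruled: boolean } {
  const recomputed = weightedTotal(f);
  const overruled = Math.abs(recomputed - modelTotal) > tolerance;
  return { total: overruled ? recomputed : modelTotal, overruled };
}
```

For component scores of 80, 60, 50, and 40, the parts sum to 32 + 18 + 10 + 4 = 64. A reported total of 71 gets overruled to 64; a reported 65 is within tolerance and stands.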
Separate the audit from the rewrite. When generation must follow strict format and voice rules, the model usually gets the content right and breaks one rule on the way. Fusing "find the problems" and "fix the problems" into one step lets the model quietly mask its own findings. Splitting them into two steps, audit first, mechanical rewrite second, forbidden from adding new content, catches the violation without a human in the loop.
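The control flow of that split is simple once the two steps are separate functions. In a real chain both steps would be model calls; in this sketch they are plain functions so the shape is visible, and the single toy rule (no exclamation marks) is invented for illustration. All names are hypothetical.

```typescript
interface Violation {
  rule: string;    // which rule was broken
  excerpt: string; // the offending text, quoted from the draft
}

type AuditStep = (draft: string) => Violation[];
type RewriteStep = (draft: string, violations: readonly Violation[]) => string;

function auditThenRewrite(draft: string, audit: AuditStep, rewrite: RewriteStep) {
  // Step 1: the audit only reports. It has no opportunity to fix anything.
  const violations = audit(draft);
  if (violations.length === 0) {
    return { text: draft, violations };
  }
  // Step 2: the rewrite receives the findings as input and fixes only those.
  return { text: rewrite(draft, violations), violations };
}

// Toy stand-in for the audit: flag every sentence ending in "!".
const auditExclamations: AuditStep = (draft) =>
  (draft.match(/[^.!?]*!/g) ?? []).map((s) => ({
    rule: "no_exclamation",
    excerpt: s.trim(),
  }));

// Toy stand-in for the mechanical rewrite: swap punctuation, add no content.
const fixExclamations: RewriteStep = (draft, violations) =>
  violations.some((v) => v.rule === "no_exclamation")
    ? draft.replace(/!/g, ".")
    : draft;
```

Because the findings are returned alongside the final text, they survive as a record even after the rewrite has erased the evidence from the output.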
None of these are clever. That is the point. Consistency at scale is an accumulation of unglamorous guardrails, each closing one way the output can drift.
The part nobody ships in the box
I started this as a separate project. A loop that dispatched a tool-using subagent per record. It demoed beautifully and fell apart at volume, and the reason it fell apart is the reason this section exists.
The subagents had no shared memory and no governance. Each one started cold, re-derived what the last one already knew, and re-decided rules the organization had already settled. So the outputs drifted, not because the model was weak, but because every agent was reasoning from a blank slate over and over. The guardrails above fix how a single run behaves. They do nothing about the fact that ten thousand runs share no ground.
The missing layer was underneath, not above. What made the fleet consistent was giving every subagent the same two things on its first step, automatically: the governed memory scoped to its record, and the org's guidelines as live context. That is when I folded the project into Personize, as a dispatch layer on top of its governed memory rather than a thing bolted beside it.
Memory and governance are not features you add to subagent dispatch. They are the substrate that makes a thousand independent agents agree.
Memory scoped to the record means each subagent inherits what every prior workflow learned about that entity, with hard isolation so one record's knowledge never bleeds into another's. Governance means the org's rules apply to all ten thousand runs at once, by default, not as a clause each prompt has to remember to repeat. Take those two away and you are back to a for loop spawning strangers.
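A toy model makes the two properties concrete. This is an in-memory stand-in I wrote for illustration, not how Personize stores anything: `ScopedMemory` and `firstStepContext` are hypothetical names, and a real governed memory layer does far more than a `Map`.

```typescript
// Hypothetical in-memory stand-in for a record-scoped memory store.
class ScopedMemory {
  private readonly byRecord = new Map<string, string[]>();

  // Append an observation to one record's history.
  remember(recordId: string, observation: string): void {
    const existing = this.byRecord.get(recordId) ?? [];
    this.byRecord.set(recordId, [...existing, observation]);
  }

  // Recall is keyed by record ID only, so no query can span records.
  recall(recordId: string): readonly string[] {
    return this.byRecord.get(recordId) ?? [];
  }
}

// Builds the context every subagent receives on its first step:
// the same guidelines for every run, and only this record's history.
function firstStepContext(
  memory: ScopedMemory,
  guidelines: readonly string[],
  recordId: string,
) {
  return {
    recordId,
    guidelines,
    memory: memory.recall(recordId),
  };
}
```

Isolation here comes from the shape of the interface rather than from a filter someone has to remember to apply. Governance comes from passing the guidelines in at dispatch rather than restating them in each prompt.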
Predictable pieces compose
There is a dividend on the far side of all this discipline. When ten thousand subagents produce typed, schema-conforming output, other systems and other agents can act on that output programmatically. One subagent's score becomes another's input. A typed property a research fleet wrote becomes the column a dashboard reads. The pieces wire themselves together because each piece is trustworthy enough to build on.
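What "trustworthy enough to build on" means in code is a shape check at the boundary. A minimal sketch, assuming a 0-100 integer score and a `nextBestAction` string as the agreed output shape; the `LeadScore` type and both function names are hypothetical.

```typescript
// Hypothetical output shape for one subagent run.
interface LeadScore {
  recordId: string;
  score: number;          // assumed scale: integer from 0 to 100
  nextBestAction: string;
}

// Type guard: accepts a value only if it matches the agreed shape exactly.
function isLeadScore(value: unknown): value is LeadScore {
  if (typeof value !== "object" || value === null) return false;
  const v = value as Record<string, unknown>;
  const score = v["score"];
  const action = v["nextBestAction"];
  return (
    typeof v["recordId"] === "string" &&
    typeof score === "number" &&
    Number.isInteger(score) &&
    score >= 0 &&
    score <= 100 &&
    typeof action === "string" &&
    action.length > 0
  );
}

// A downstream consumer: it can compare scores only because every
// accepted output uses the same scale.
function recordsAtOrAbove(outputs: readonly unknown[], threshold: number): string[] {
  return outputs
    .filter(isLeadScore)
    .filter((o) => o.score >= threshold)
    .map((o) => o.recordId);
}
```

An output whose score arrives as "4/5" fails the guard and never reaches the comparison. With a fleet that always emits the agreed shape, the guard becomes a formality, which is the dividend.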
You cannot get that from a fleet of autonomous agents whose output shape varies per run. Unstructured output is a dead end for the next agent in line. Predictable output is a substrate.
Spawning thousands of subagents is a solved problem; every provider can do it now. Making thousands of them write the same true thing to the same database, on a budget you set in advance, is the problem worth solving. The answer is not a smarter agent. It is a smaller unit of work, grounded in memory the whole organization shares.
Companion pieces: "Agent Dispatch is Live", on the dispatcher-and-subagent shape this builds on; "Instructions, Not Prompts", on the per-minute economics that make the cost forecastable; "Why Agents Fail Without Memory" and "Zero Cross-Entity Leakage", on the governed-memory substrate underneath every dispatched subagent.