Progressive Context Delivery: How We Cut Token Usage 50% in Multi-Step Agents

When agents re-plan and act in loops, re-injecting the same governance context at every step is expensive and makes outputs worse. Here's the fix.

TL;DR

Autonomous agents operating in multi-step loops invoke governance routing multiple times per session. Without session awareness, the same policies get re-injected at every step.
This creates three compounding problems: token bloat, accuracy degradation (redundant context competes with fresh instructions for model attention), and unnecessary cost.
Progressive context delivery maintains a session state tracking what's already been delivered. Each step gets only new or newly relevant content.
Result: 50% token reduction across multi-step workflows with no quality loss.
Supplementary variables are intentionally not tracked — they can be promoted to critical on a later step if the task evolves.

Most discussions of AI agent governance focus on a single interaction: the agent gets a task, governance context is injected, the agent responds. Clean, simple, one shot.

Production agents don't work like that.

How Autonomous Agents Actually Operate

A modern agentic workflow looks more like this: the agent receives a task, makes a plan, takes an action, observes the result, re-plans based on what it learned, takes another action, observes again — and so on, for as many steps as the task requires. Without human intervention between steps.

At each step, the agent may need governance context. The compliance policy about what the agent can and cannot commit to in writing. The brand voice guidelines that govern tone. The pricing rules that constrain what offers can be made. These don't change between steps. But without session awareness, the governance routing layer doesn't know that. It sees a new request and does what it always does: selects and injects the relevant governance variables.

Step one: brand voice injected, compliance policy injected, pricing rules injected. Step two: same brand voice, same compliance policy, same pricing rules — all injected again. Step three: repeat.

By step five of a typical agentic workflow, the agent's context window contains the same three governance variables five times over. The content hasn't changed. The agent doesn't need it again. But it's there, consuming space that should be reserved for the actual task.

Three Compounding Problems

Naive re-injection creates problems that compound across steps.

Token bloat. Context windows fill with previously delivered material. A compliance policy that takes 800 tokens gets delivered at every step — in a five-step workflow, that's 4,000 tokens of redundant context before you've added a single token of task-specific content. At scale, across hundreds of concurrent workflows, this translates directly to API cost.

Accuracy degradation. Redundant context competes with fresh instructions for model attention. Liu et al. demonstrated that models under-use information in long-context middles. As context grows, each piece of information receives proportionally less attention. When step five's context window is half-full of governance material the agent has already processed, the new task-relevant instructions are fighting for attention against five copies of the same compliance policy. The model attends less precisely to what actually matters for this step.

Unnecessary cost. Every duplicated token is billed. In a simple two-governance-variable scenario across ten steps, naive re-injection bills roughly 10x the tokens that progressive delivery does. For organizations running thousands of concurrent agentic workflows, the difference is not marginal.

How Progressive Delivery Works

The mechanism is straightforward: maintain a session state record that tracks which governance variables — and which sections within them — have already been delivered to the agent in this session.

On each governance routing call:

Run the normal routing logic to select relevant variables (critical and supplementary)
Check each selected variable against the session state
For critical variables: if already delivered, exclude from this step's injection. If not yet delivered, inject and record as delivered.
For supplementary variables: inject if newly relevant, but do not record as delivered

Progressive delivery session state — which governance variables are delivered at each step

The treatment of supplementary variables is intentional. Supplementary context — guidelines that are potentially relevant but not essential for the current step — should not be locked in as "delivered." The agent's task may evolve. What was peripheral in step one may become critical in step three. By not recording supplementary delivery, those variables remain eligible for promotion when the task shifts.

A session TTL (default: 24 hours) bounds staleness. If a governance variable is updated mid-session — unlikely, but possible — the TTL ensures the session state eventually resets and the agent receives the fresh version.

The Numbers

Across multi-step workflows, progressive delivery reduces token usage by 50% with no quality degradation.

Token usage per step — naive re-injection vs. progressive delivery

The quality finding is as important as the cost finding. This isn't a tradeoff where you save tokens by accepting worse governance coverage. The agent receives exactly the governance context it needs, exactly when it's new. It stops receiving context it has already processed and doesn't need again. Removing the redundancy improves the signal-to-noise ratio in the context window.

The 50% reduction comes entirely from eliminating repetition, not from delivering less governance context overall.

Across a full session, the agent receives every relevant governance variable — once. After that, it receives only variables that become relevant as the task evolves.

Session-Aware Design Patterns

Progressive delivery enables a set of agentic patterns that aren't practical without it.

Long-horizon tasks. An agent working on a multi-hour research and drafting task might take dozens of steps. Without progressive delivery, governance context accumulates proportionally. With it, governance overhead is roughly constant after the first few steps — the agent has its core constraints and only receives updates when something new becomes relevant.

Task pivots. When an agent's plan shifts mid-execution — the initial approach didn't work, or new information changes the strategy — new governance context can surface. Variables that weren't relevant to the original plan become critical for the pivot. Because supplementary variables aren't locked in the session state, they're available for exactly this promotion.

Multi-agent handoffs. When one agent's output feeds into another agent's task within the same session context, progressive delivery ensures the receiving agent doesn't re-receive governance context the session has already established. The handoff carries the session state.

The Relationship to Token Efficiency

Progressive delivery addresses a specific category of token waste: redundant governance context. It works alongside, not instead of, other efficiency mechanisms.

Seven memories per entity is the complementary finding on the memory side: output quality saturates at approximately seven governed memories, so injecting more is waste from a quality standpoint, not just a cost standpoint. Progressive delivery handles the governance side of the same principle: inject what's needed, don't repeat what's already there.

Together, they define a design philosophy: context window space is not free, and filling it with anything other than what the current step needs is a cost you're paying for nothing.

The agents that work well at scale aren't the ones with the largest context windows. They're the ones that use context precisely.

Frequently Asked Questions

What if a governance variable is updated mid-session? The session TTL handles this. If the session is within the TTL window (default 24 hours), updated variables will be re-delivered on the next routing call since the session state tracks the version delivered, not just the variable ID. Operators can also reset session state explicitly if an urgent policy update needs to reach all active agents immediately.

Does progressive delivery work across multiple agent nodes in the same workflow? Yes, if the nodes share a session context. In a multi-agent pipeline where a coordinator distributes tasks to worker agents, the session state can be shared across nodes — ensuring that governance context delivered to one node isn't re-delivered to another in the same session.

What's the tradeoff of not recording supplementary variables? The tradeoff is that supplementary variables may be re-evaluated on every step. In practice, this is cheap — evaluation in fast mode takes ~850ms and involves no LLM calls. The benefit — keeping supplementary variables eligible for promotion when tasks evolve — outweighs the cost of re-evaluation.

How does progressive delivery interact with reflection-bounded retrieval? They operate on different parts of the context window. Progressive delivery manages governance context. Reflection-bounded retrieval manages entity memory. Both reduce the volume of content competing for model attention, but they address different pools of context.