A real fact about a customer lives in four places at once: a typed property, an atomic memory, a document chunk, and a graph edge. Most retrieval stacks pick one and let the agent stitch the rest. Here is the storage discipline that makes returning all four in one call actually useful.
A real fact lives in four places
A real fact about a customer lives in four places at once. The stage of the deal sits in a property (stage = qualified). The reason it is qualified sits in a sentence from a call transcript: "the CTO said budget was approved on the September call." The competitive context sits in a paragraph of a deal review document. The political dynamic sits in an edge in the org graph: Jane reports to Marcus, who is the actual decision-maker.
Most retrieval stacks pick one of these and let the agent stitch the rest. Vector stores see memories. Property databases see structured fields. Document stores see chunks. Graph databases see edges. Each one returns its own native shape, on its own API, with its own ranking model. It is the agent's job to figure out that the property stage, the memory about the CTO's call, the document about the review, and the edge from Jane to Marcus are all facets of the same thing.
This is the multi-system retrieval pattern. It is also where most production agent code goes to die.
This article is about what changes when those four sources come back in one call, with one identity resolution, with one set of citations that span all four. And, more importantly, about the storage decision underneath that makes the unified call workable in the first place: memories have to be atomic and properties have to be typed, or the unified payload becomes a soup the agent cannot reason over.
The four-source problem
The naive agent calls the vector store with the user's query, gets back six memory chunks, dumps them into the prompt, and asks the model to answer. This works for chat assistants because the user is usually asking "tell me what you know" and the vector store is usually good enough for that.
It stops working the moment the question is mixed-mode.
"What's the stage of this deal and what's the most recent objection?" mixes a deterministic property lookup (stage) with a semantic memory search (most recent objection). The vector store will surface memories about stages. The property database will give you the exact stage value. The right answer requires both, plus the awareness that the property is the source of truth and the memory is the context.
"Find all enterprise contacts in healthcare who were recently asked about HIPAA" mixes a filter (industry = healthcare, segment = enterprise) with a semantic search (HIPAA conversation). Again, two retrieval systems, two ranking models, and an agent that has to join them and pick which is authoritative.
"What is Jane likely to push back on, based on what we know?" mixes the graph (Jane's role, her org, who she reports to), property data (her record's flags), memory (her prior objections), and possibly a playbook document (how to handle her segment). Four sources. One question.
When the agent has to make those four calls itself, three things happen.
It writes a lot of plumbing code per pipeline. Every new agent needs the same fan-out, the same identity resolution, the same dedup. The plumbing is not the product, but it is most of the engineering time.
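It pays for four round-trips and four serialization passes. Each call has its own latency, its own retry semantics, its own rate limits. When the calls run one after another, as they must whenever one lookup needs an ID resolved by an earlier one, the agent's wall-clock retrieval time is the sum, not the max.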
It gets the joins subtly wrong. The contact in the property database has a recordId. The memory in the vector store has a recordId. The graph edge points at a recordId. If any of the four systems uses a different identity model, and they often do because they were built at different times, the agent ends up with a partially-merged view of the customer and answers based on it.
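The third failure is the easiest to see in code. Here is a minimal sketch, assuming hypothetical in-memory stand-ins for three of the stores, where the graph happens to key its edges by email instead of recordId. Every name, shape, and email address below is illustrative and not any real store's API.

```typescript
// Hypothetical stand-ins for three stores; shapes are assumptions for illustration.
interface PropertyRow { recordId: string; stage: string }
interface MemoryRow { recordId: string; text: string }
interface GraphEdge { from: string; to: string; edgeType: string } // keyed by email, not recordId

const propertyRows: PropertyRow[] = [{ recordId: "REC#1", stage: "qualified" }];
const memoryRows: MemoryRow[] = [
  { recordId: "REC#1", text: "The CTO said budget was approved on the September call." },
];
const graphEdges: GraphEdge[] = [
  { from: "jane@acme.com", to: "marcus@acme.com", edgeType: "reports_to" },
];

interface MergedView {
  stage: string | undefined;
  memories: string[];
  edges: GraphEdge[];
}

// Naive client-side join: assumes every store uses the same key.
function naiveJoin(recordId: string): MergedView {
  return {
    stage: propertyRows.find(p => p.recordId === recordId)?.stage,
    memories: memoryRows.filter(m => m.recordId === recordId).map(m => m.text),
    edges: graphEdges.filter(e => e.from === recordId), // matches nothing, and says nothing
  };
}

const view = naiveJoin("REC#1");
```

Nothing throws and no call fails. The merged view simply comes back without the reporting chain, and the agent answers as if Jane had none.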
The storage decision underneath
The first instinct, when you see this pattern, is to build a federation layer. A service in front of the four stores that fans out, normalizes, and merges. Many systems have tried this. Most run into the same wall.
The wall is that the four stores were not designed to be merged. The memory is a 500-token chunk that mentions three different facts. The property is a typed value with no source attribution. The document chunk overlaps with another document chunk. The graph edge has no temporal context. The federation layer can splice the JSON, but it cannot make the agent's job easier, because the underlying shapes do not compose.
The decision that does work is upstream of retrieval. It is about how the memory is stored in the first place.
A memory has to be atomic: one extracted fact, self-contained, coreference-resolved, temporally anchored, with its own ID. Not a chunk of conversation. Not a paragraph from a transcript. A single statement of a single thing.
"The CTO, James Chen, mentioned on the October 2025 call that they are evaluating three cloud migration vendors, including Personize."
That is atomic. The pronoun has been resolved (he → the CTO, James Chen). The time has been anchored (recently → October 2025). One fact. One ID. Citable, dedupable, paginatable.
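As a rough illustration, an atomic memory can be modeled as a small record, with a lint that flags text still carrying relative time or a leading pronoun. The field names and the heuristic are assumptions made for this sketch. A real extraction pipeline would do proper coreference resolution, not pattern matching.

```typescript
// Hypothetical shape for an atomic memory; field names are assumptions.
interface AtomicMemory {
  id: string;         // stable ID used for citation and dedup
  recordId: string;   // the record this fact is about
  text: string;       // one self-contained statement
  anchoredAt: string; // resolved time reference, e.g. "2025-10"
}

// Crude heuristics only: relative time phrases, or a sentence that opens with a pronoun.
const RELATIVE_TIME = /\b(last week|recently|yesterday|today|tomorrow|next week)\b/i;
const LEADING_PRONOUN = /^(he|she|they|it|we)\b/i;

// Returns true when the text still depends on surrounding context to be understood.
function looksUnresolved(text: string): boolean {
  const t = text.trim();
  return RELATIVE_TIME.test(t) || LEADING_PRONOUN.test(t);
}

const memory: AtomicMemory = {
  id: "M1",
  recordId: "REC#1",
  text:
    "The CTO, James Chen, mentioned on the October 2025 call that they are " +
    "evaluating three cloud migration vendors, including Personize.",
  anchoredAt: "2025-10",
};

const atomicOk = !looksUnresolved(memory.text);
const chunkFlagged = looksUnresolved("We had a call with the CTO last week.");
```

The lint passes the atomic statement and flags the raw transcript sentence, which is the distinction the storage layer has to enforce before anything is written.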
A property has to be schema-enforced: typed, validated, and extracted by an LLM that is given the schema's own definition of a valid value. Not a free-text field. Not "approximately $450K." A typed value:
buying_stage: "Evaluating" (type: options, confidence: 0.88)
competitor_count: 3 (type: number, confidence: 0.90)
budget_range: 450000 (type: number, confidence: 0.92)
That is queryable by filter, comparable across records, usable in downstream workflows.
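One way to picture schema enforcement is a validator that sits between the extractor and the store and rejects anything that does not match the declared type. This is a sketch under assumptions: the property names mirror the example above, while the option list and the `validate` helper are hypothetical.

```typescript
// Hypothetical schema; the option values are assumed for illustration.
type PropertySchema =
  | { type: "number" }
  | { type: "options"; options: readonly string[] };

const schema: Record<string, PropertySchema> = {
  buying_stage: { type: "options", options: ["Researching", "Evaluating", "Negotiating"] },
  competitor_count: { type: "number" },
  budget_range: { type: "number" },
};

interface TypedValue { value: string | number; confidence: number }

// Accepts a candidate value only if it matches the declared type; otherwise returns null.
function validate(name: string, candidate: unknown, confidence: number): TypedValue | null {
  const def = schema[name];
  if (def === undefined) return null; // unknown property: reject
  if (def.type === "number") {
    return typeof candidate === "number" && isFinite(candidate)
      ? { value: candidate, confidence }
      : null;
  }
  return typeof candidate === "string" && def.options.indexOf(candidate) !== -1
    ? { value: candidate, confidence }
    : null;
}

const accepted = validate("budget_range", 450000, 0.92);
const rejected = validate("budget_range", "approximately $450K", 0.92);
```

The prose value never reaches the store, so every stored `budget_range` can be compared numerically without a second interpretation pass.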
I have written about why dual storage matters at the storage layer. The point for this article is that retrieval inherits the storage decisions. If the memory is atomic, retrieval can return one memory at a time, with an ID, with a citation marker, with deduplication across calls. If the property is typed, retrieval can filter on it deterministically and return its current value alongside the relevant memories. Unified retrieval is the consequence of atomicity and typing at the storage layer, not a federation layer on top of it.
What one call actually returns
A unified retrieval call against a record returns the four sources at once, identity-resolved, with consistent labels and a single coverage map.
// One call: "scout this record"
{
  "mode": "scout",
  "record": { "email": "jane@acme.com" }
}

The response, abbreviated:
{
  "record": {
    "id": "REC#1",
    "type": "contact",
    "displayName": "Jane Doe",
    "properties": { "stage": "qualified", "revenue": 1500000 },
    "aliasExpansion": { "sharedDomain": ["REC#COLLEAGUE_1"] }
  },
  "memories": { "tier1": [ /* atomic facts, each with its own ID */ ] },
  "documents": { "must": [ /* relevant doc chunks, each with provenance */ ] },
  "graph": { "neighbors": [ /* edges, each with type + counterparty */ ] },
  "state": { "sessionId": "...", "coverage": { /* per-source counts */ } }
}

Four things to notice.
One identity resolution. record.id = REC#1 is the resolved record. Every memory ID, every document chunk, every graph edge in the payload is associated with this record. The agent does not have to map between four different identity models.
Properties come first, typed. stage: "qualified" is the source of truth for that property. The memories about the stage are context, not authority. The agent that asks "what stage is this deal in?" reads properties.stage and trusts it. No semantic search required.
Atomic memories carry their own IDs. Each item in memories.tier1 is a self-contained, coreference-resolved fact with an ID. When the agent cites it, it cites a specific fact, not a chunk of conversation that contained the fact. When the next retrieval call comes in via the expand mode, the system can skip this exact memory by ID. Atomicity makes both citation and pagination work.
Alias expansion is automatic. aliasExpansion.sharedDomain quietly surfaces other records associated with the same domain (other Acme contacts). For a contact research agent, this is what "who else should I look at?" looks like, served as a free side effect of identity resolution.
The payload is one HTTP call, one set of credit charges, one coverage map, one identity. The agent code that consumes it is shorter than the agent code that would have called four stores and tried to merge.
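To make the "property is authority, memory is context" split concrete, here is a small sketch that types part of the abbreviated payload and reads the stage from it. The `Memory` item shape and the `readStage` helper are assumptions. Only the field names shown in the response above come from the example.

```typescript
// Types inferred from the abbreviated response; inner item shapes are assumptions.
interface Memory { id: string; text: string }
interface ScoutPayload {
  record: {
    id: string;
    properties: Record<string, string | number>;
    aliasExpansion: { sharedDomain: string[] };
  };
  memories: { tier1: Memory[] };
}

interface StageAnswer {
  stage: string | number | undefined; // authoritative value, read from the typed property
  contextIds: string[];               // memory IDs offered as supporting narrative
  alsoLookAt: string[];               // related records surfaced by alias expansion
}

// Reads the stage from properties only; memories are attached as context, never consulted.
function readStage(payload: ScoutPayload): StageAnswer {
  return {
    stage: payload.record.properties["stage"],
    contextIds: payload.memories.tier1.map(m => m.id),
    alsoLookAt: payload.record.aliasExpansion.sharedDomain,
  };
}

const sample: ScoutPayload = {
  record: {
    id: "REC#1",
    properties: { stage: "qualified", revenue: 1500000 },
    aliasExpansion: { sharedDomain: ["REC#COLLEAGUE_1"] },
  },
  memories: {
    tier1: [{ id: "M1", text: "The CTO said budget was approved on the September call." }],
  },
};

const answer = readStage(sample);
```

There is no search and no ranking in `readStage`. The stage is a lookup, and the memory IDs travel alongside it so the agent can explain the value without being allowed to override it.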
Why atomicity is the load-bearing decision
If memories were stored as 500-token chunks instead of atomic facts, the unified payload would still be possible, but it would not be useful.
A chunk-style memory says: "We had a call with the CTO last week. He mentioned they are evaluating three vendors. He also said budget was approved. The VP of Engineering was on the call and raised concerns about API reliability." That is four facts in one record.
When the agent retrieves this chunk in step 1 and a related chunk in step 4, both chunks contain the "evaluating three vendors" fact. Coverage tracking cannot dedup them because they are two different chunk IDs, and for the same reason pagination cannot skip exactly what was already delivered. Citations point at the chunk, not the fact, so the audit trail says "based on chunks A and B" when the actual evidence is the same sentence in both. The agent's reasoning treats the over-represented fact as more important than it is. It is the same silent quality collapse that stateless retrieval produces, but inside the memory store itself.
Atomicity solves all four of those problems at once.
- Dedup: two retrieval calls returning the same atomic fact merge at the ID level.
- Citation: the marker [M3] points at a single fact, not a paragraph.
- Pagination: the system skips exactly the facts already delivered, not approximately.
- Weighting: each fact appears once or not at all, so the model does not over-weight repeated content that was never actually emphasized in the source.
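The dedup and pagination points reduce to one small mechanism: remember which IDs have been delivered and admit only new ones. This sketch assumes a hypothetical `Coverage` class and `Memory` shape. It illustrates the idea and is not the actual coverage implementation.

```typescript
interface Memory { id: string; text: string }

// Tracks which memory IDs have already been delivered to the agent.
class Coverage {
  private delivered = new Set<string>();

  // Returns only facts not seen before, and records them as delivered.
  admit(batch: Memory[]): Memory[] {
    const fresh: Memory[] = [];
    for (const m of batch) {
      if (this.delivered.has(m.id)) continue; // exact skip, by ID
      this.delivered.add(m.id);
      fresh.push(m);
    }
    return fresh;
  }

  get size(): number {
    return this.delivered.size;
  }
}

const coverage = new Coverage();
const step1 = coverage.admit([
  { id: "M1", text: "Budget was approved." },
  { id: "M2", text: "They are evaluating three vendors." },
]);
const step4 = coverage.admit([
  { id: "M2", text: "They are evaluating three vendors." },
  { id: "M3", text: "The VP of Engineering raised concerns about API reliability." },
]);
```

With chunk-level IDs the second call would have delivered the "three vendors" sentence again inside a different chunk. With fact-level IDs it is skipped exactly, and only M3 reaches the prompt.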
Why types are the other load-bearing decision
The same logic applies to properties.
If a property is a free-text field ("the budget is in the mid-six figures"), it cannot be filtered. The agent that wants "all deals over $1M" has to read the prose for every record and decide. That is slow, expensive, and noisy.
If the property is a typed value (budget: 450000, type: number), the filter is deterministic. The agent does not even need to read the memory store for "all deals over $1M". The filter mode returns the records directly, no LLM in the loop.
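With a numeric budget the filter is a plain comparison. This is a minimal sketch over made-up records, where the `DealRecord` shape and the sample values are illustrative.

```typescript
interface DealRecord { id: string; budget: number }

// Illustrative data only.
const deals: DealRecord[] = [
  { id: "REC#1", budget: 450000 },
  { id: "REC#2", budget: 1500000 },
  { id: "REC#3", budget: 2200000 },
];

// Deterministic numeric filter: no prose to read, no model call.
function dealsOver(records: DealRecord[], threshold: number): DealRecord[] {
  return records.filter(r => r.budget > threshold);
}

const bigDeals = dealsOver(deals, 1000000).map(r => r.id);
```

The same question against "the budget is in the mid-six figures" has no equivalent one-liner, because something would first have to decide what number the sentence means.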
Types also let the unified payload pair properties with memories cleanly. The stage property is qualified, and that value is true as of this moment. The memories about how the deal got qualified are the narrative. The agent reads the typed property as the current state and the memories as the context, and the two answer different questions on the same record. Typed properties say what is true now. Atomic memories say what happened.
What the agent code looks like
The same agent that used to fan out four calls now looks like this:
const retrieved = await client.retrieve({
  mode: "scout",
  record: { email: "jane@acme.com" },
  sources: { properties: true, memories: true, documents: true, graph: true }
});

const stage = retrieved.record.properties.stage;
const recentMemories = retrieved.memories.tier1;
const playbookExcerpt = retrieved.documents.must.find(d => d.type === "playbook");
const reportingChain = retrieved.graph.neighbors.filter(n => n.edgeType === "reports_to");

No client-side joins. No four-system orchestration. No identity reconciliation. The agent reads what it needs from a single object that already understood the record was Jane Doe at Acme and pulled the right things from each source.
For the autonomous loop covered in Retrieval Is a Conversation, this also means coverage tracking and expand work across all four sources at once. The agent's "give me more" call paginates memories, documents, and graph edges in lockstep. The conversation continues; the agent does not have to manage four separate paginations.
What this buys for an autonomous agent
Three concrete things.
Less plumbing per agent. The boilerplate that used to be 60 lines of fan-out and merge becomes one call. Engineering time goes into the agent's reasoning, not into the retrieval layer.
Less identity drift. A single resolution step at the API boundary means the agent never operates on a half-merged view of the entity. The bug class "the property data and the memory data refer to subtly different records" disappears.
Real auditability. Because every memory is atomic with its own ID, every property is typed with its own provenance, and every document chunk carries its source, the agent's output can carry citations like [M3], [D2], [P:stage], [G:reports_to] and have them mean something specific. The compliance officer who asks "why did the system say this?" gets a precise answer instead of "the model decided based on context."
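Citation markers like these are only useful if they can be parsed back into something checkable. The sketch below assumes the four marker forms shown above, and also assumes that the number in `[M3]` or `[D2]` identifies a specific memory or document chunk. The `Citation` type and the parser are hypothetical.

```typescript
type Citation =
  | { kind: "memory" | "document"; index: number }
  | { kind: "property" | "graph"; key: string };

// Parses markers of the form [M3], [D2], [P:stage], [G:reports_to], in order of appearance.
function parseCitations(text: string): Citation[] {
  const pattern = /\[(?:([MD])(\d+)|([PG]):([A-Za-z_][A-Za-z0-9_]*))\]/g;
  const out: Citation[] = [];
  let match: RegExpExecArray | null;
  while ((match = pattern.exec(text)) !== null) {
    if (match[1] !== undefined) {
      // Numbered marker: a memory or a document chunk.
      out.push({ kind: match[1] === "M" ? "memory" : "document", index: Number(match[2]) });
    } else {
      // Keyed marker: a property name or a graph edge type.
      out.push({ kind: match[3] === "P" ? "property" : "graph", key: match[4] ?? "" });
    }
  }
  return out;
}

const cites = parseCitations(
  "Jane reports to Marcus [G:reports_to]; the deal is qualified [P:stage] per [M3] and [D2]."
);
```

Each parsed citation can then be checked against the payload that produced the answer, so a marker pointing at a fact that was never retrieved becomes detectable instead of silently trusted.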
The unified retrieve is not a federation layer. It is what is possible when the storage layer was designed to be retrieved as one thing.
The principle
Composability is a storage property, exposed through a retrieval API. Atomic memories make memory composable with itself across calls. Typed properties make properties composable with memories across questions. Together, they make the unified payload more than a JSON merge: they make it a coherent view of the entity that an autonomous agent can reason over without stitching.
If the storage layer is not atomic and not typed, no retrieval API on top of it can fix the problem. The agent will keep doing the joins, paying the round-trips, drifting on identity, and over-weighting repeats.
If the storage layer is atomic and typed, the retrieval API is allowed to be simple. One call. Four sources. One identity. One coverage map. The agent stops being a data integration system and starts being an agent.
Companion pieces: Dual Memory: Free-Text Facts and Typed Properties covers the storage side of the same architecture. Retrieval Is a Conversation covers the per-record pagination pattern that complements unified payloads in autonomous loops.