Hamed Taheri


All writing · 67 posts

13 min read
Enterprise-grade accurate personalization at scale
Accurate personalization at scale is an architecture, not a prompt. Built on governed memory as the single source of truth, it enriches and cross-validates what you know, breaks the work into specialized zones tuned across many records, and wraps the model in deterministic guardrails and self-review, so every claim is either specific and true, or honestly general. Here is the framework, and why it holds at enterprise scale.
Analysis · 6 min read
Trust Is the New Payload
CrowdStrike's 2026 report says 82% of attacks now carry no malware and calls the pattern 'trust exploitation.' The number is real and the label is too small. What the telemetry is measuring is personalization: a machine that models a human and acts on the model faster than anyone can check.
10 min read
Agentjacking: When Read Access Quietly Became Execute Access
A public error-reporting key, a fake bug report, and a coding agent that runs the attacker's code with your credentials. Agentjacking is not a Sentry bug. It is proof that 'read this' and 'act on what you read' have fused into one permission, and almost no one is looking at the gap.
11 min read
Why Coding Agents Forget, and What a Real Memory Layer Must Do
Your coding agent does not remember your codebase; it re-perceives it every session, greps and reads its way back to a mental model, then throws that model away. That re-perception is a line item, and the fix is not a bigger context window. It is a persistent, governed model of the codebase the agent writes to and reads from under a budget.
7 min read
Modeling the World for Agents
There is a new modeling discipline emerging, the way relational schema design emerged in the 1970s. This time the reader is a stochastic model with an expensive context window, and the objective flips from saving bytes to maximizing signal per token.
Analysis · 7 min read
Wiki Memory Is the Right Idea. A Person Is Not a Codebase.
LangChain's wiki-memory pattern validates the core lesson of agent memory: synthesize at write time, don't rediscover structure at query time. It also names its own limit. The moment the wiki is about a person, files stop being enough.
Analysis · 8 min read
AI-Native Is Not AI-Enabled
The firms getting leverage from AI are not just adding models to existing workflows. They are redesigning the work around what the model can do. That distinction is the whole game.
Analysis · 8 min read
The Lean Agentic Company
AI startups are not just hiring fewer people. They are pointing at a new company shape: smaller teams, flatter coordination, more machine work, and a much higher burden on memory and systems.
Analysis · 8 min read
Why AI ROI Looks Bad Until the Org Changes
AI ROI often looks weak because companies measure tool adoption while leaving the production system untouched. The gains appear when the organization redesigns the workflow, memory, and decision rights around agents.
Analysis · 5 min read
Agents Are Eating the Hour-Long Task
The shift from chatbots to delegated, long-horizon work is already here, and the productivity is real. What the vendor and independent data actually show, and why now is the moment to lean in.
Guide · 5 min read
Where to Start With Agents
Most agent pilots stall, and the few that pay off share a copyable pattern. A practical on-ramp: which task to pick first, how to set it up so it actually learns your work, and how to get a real result this month.
Note · 2 min read
154 subagents: feature or smell?
A curated collection of 154+ specialized Claude Code subagents is impressive and also a tell. Specialization has a coordination cost, and one of the ten categories is managing the other nine.
Analysis · 8 min read
Agents Don't Need Your Database. They Need Your Ontology.
The raw schema was built for a query planner, not an agent. In 2026 every major data platform admitted it and shipped the same fix: a semantic layer, and above it, an ontology. Here's the difference, and why it's the whole game.
Note · 2 min read
Headroom, and the context window as a budget
Compressing tool outputs before they reach the model is becoming standard tooling. The interesting part isn't the percentage; it's the shift from treating context as a bucket to treating it as a budget.
Note · 1 min read
On-machine vs hosted agents: goose and Foundry answer different questions
goose runs on your machine; Microsoft Foundry runs your agent for you. The real axis isn't open vs. closed; it's control and locality vs. managed scale. Each wins a different job.
Note · 2 min read
Portable agent memory is table stakes, not a moat
EverOS carries an agent's memory across Claude Code, Copilot, Gemini, and OpenCode. Portability is the half that's commoditizing. The defensible half is governance, and it's harder.
13 min read
The Retrieval Explosion: The Economics of AI Agents in Your CRM
AI agents now work for hours and make dozens of retrievals per task. As each customer gets worked by tens to hundreds of agents a month, the cost of letting those agents know your customers multiplies with every agent and every retrieval, unless a memory layer sits between the agents and your CRM.
10 min read
Agent Spawn: Dispatching Thousands of Subagents in the Database
Every provider now spawns subagents at scale. Spawning was never the hard part. Running the same job correctly and affordably across a hundred thousand database rows is. Here is what changes when the unit of agentic work becomes one subagent per record, grounded in governed memory.
51 min read
Personize Governed Memory vs. the Alternatives
My team asked me to compare Personize against Pinecone, Mem0, Zep, Supermemory, and AWS's native memory. So I wrote the honest version: where each tool is genuinely the right choice, where it stops, and why memory for a fleet of agents over a database of entities is a different problem than memory for a chatbot remembering one user.
Featured · Leadership · 11 min read
Who's Running Your CRM: You, Your Agents, or Your Agents' Agents?
Three architectures are reshaping revenue operations. A human clicking dashboards. A human directing agents. Agents directing agents. Only the third changes what's possible, and two companies just built the door.
16 min read
From Relational Schemas to a Memory Model for Agents
A work-in-progress framework. The relational model and entity-relationship design gave software a way to model a world so a machine could use it. Agents need an equivalent, and it is not a table. This is the memory model I have been building toward, in stages, and how it applies to our own engineering team.
Featured · 11 min read
65% Off Memorization: How Batch APIs and Prompt Caching Change the Cost Per Memory
Provider batch APIs cut LLM extraction cost in half. Prompt caching cuts another fifteen to twenty percent off. Combined, customers who can wait 1-24 hours pay roughly a third of what they would on the sync path. Here is the architecture that makes that discount actually reach the customer, and why writes are the right place to spend the latency budget you saved at retrieval.
Featured · 18 min read
Agent Dispatch is Live: One Script, Hundreds of Subagents, All Governed
A code-capable agent reads your records, decides what work each one needs, and places hundreds of trigger-bound subagents in a single script run. Some fire now. Some fire when conditions are met. All governed, all observable, all on the same per-minute rate. Here is what we shipped, the use cases we are exploring with early users, and what you already get.
Featured · 12 min read
Documents Are Memories Too: The Claude-Markdown-File Pattern in a Governed System
AI agents need a place to write sectioned markdown notes they can later read, append, or rewrite. The same shape Claude uses for its memory files, but inside a governance and retrieval layer that everything else uses. When to save as a document, how the section-update pattern works, and what flipping the aiExtraction switch does to your modality picture.
Featured · 12 min read
Instructions, Not Prompts: How We Run 1,000 Agents Per Minute Without the Bill Going Sideways
Per-token pricing punishes you for using AI at scale. Per-minute pricing forces you to design agents like production systems. Here is the pattern we converged on, the trade-offs of one big chain versus a forest of small agents, and what changes when the models themselves are designed for workflows.
Featured · 13 min read
One Call, Four Sources: How Atomic Memories and Typed Properties Make Unified Retrieval Possible
A real fact about a customer lives in four places at once: a typed property, an atomic memory, a document chunk, and a graph edge. Most retrieval stacks pick one and let the agent stitch the rest. Here is the storage discipline that makes returning all four in one call actually useful.
Featured · 9 min read
Retrieval Is a Conversation, Not a Query: Why Autonomous Agents Need Pagination as a First-Class API
Stateless retrieval is fine for chat. For autonomous agents taking 100 steps with no human in the loop, it is the single biggest source of wasted output tokens, context bloat, and silent quality collapse. Here is the pattern that replaces every-call-is-fresh with retrieval-as-conversation.
Featured · 13 min read
Schema-Guided Extraction: How Your Properties Teach the LLM What to Look For
Every property in your schema carries a type, a description, extraction instructions, examples, and measurement criteria. Together they form a per-property system prompt the LLM consumes at extraction time. You stop hand-prompting; the schema does it. Here is what an extraction-ready property actually looks like, and why this is the only way schema-based memory scales past a handful of fields.
Featured · 10 min read
Think and Execute For Me: Intent-Based Retrieval for Expensive Agents
Every output token your agent produces is at the premium rate of the strongest model in the loop. When the agent's job is to compress raw retrieved items into a paragraph, you are paying premium rates for compression. Here is how intent-based retrieval moves the synthesis to where the data lives, and what that does to the token bill.
Featured · 12 min read
The Three Channels of Graph Writes: Declared, Property-Inferred, LLM-Inferred
A relationship between two records can be told to the system, inferred deterministically from extracted properties, or surfaced by an LLM reading free-text memories. Three independent channels, three different cost profiles, opt-in per write. The same three apply whether the write target is a record or a document. The graph is not a separate system; it is an opt-in fan-out on the same save call.
Featured · 11 min read
Three Modalities, One Save Surface: Atomic Memories, Typed Properties, and Markdown Documents
AI agents need three different kinds of memory: extracted atomic facts, typed structured properties, and sectioned markdown documents they can revise like notes. The discipline is picking the right modality for the kind of knowledge. The unified save surface is what makes all three composable at retrieval time, which is the only place that matters.
17 min read
The Knowledge Problem in Enterprise AI: Why RAG Isn't Enough, and What We're Building Instead
Enterprises have thousands of documents AI agents should read. Context windows say they can't. RAG helps, but loses structure and intent. Here's how we're solving it.
Featured · 11 min read
57x Compression for AI Coding Agents
1.5M tokens of source code, compressed to 27K of curated context. Benchmarked against Graphify and raw grep on 20 real engineering tasks; the curated approach won on total workflow cost.
Research · 12 min read
Beyond Fact Count: Measuring What Actually Matters in Agent Memory Extraction
Memory extraction quality is not about how many facts you extract. It's about entity awareness, strategic depth, smart splitting, implied context, and cross-entity intelligence: the levels that separate storage from understanding.
15 min read
Guided Memory Extraction: Why Domain Expertise Belongs in Your Memory Pipeline
Why AI agents need extracted memories over raw content, and why guided extraction with entity awareness and domain expertise is what separates useful memory from noise.
Build Log · 11 min read
One Endpoint to Replace Ten: What We Learned Building a Unified Recall Interface
We had ten memory endpoints. Our agents only needed one. Here's what five rewrites of an intent classifier taught us about building a natural-language query layer for AI memory.
17 min read
The AI Vendor Paradox: You Trust Them With Your Data But Can't Verify Anything
Enterprise AI requires handing sensitive data to platforms you can't inspect. The answer isn't better trust. It's better architecture, one where trust isn't required because verification is built in.
15 min read
4-Tier PII Redaction: How We Built Privacy Into the Memory Layer, Not Around It
AI memory systems ingest everything: emails, transcripts, documents. PII is embedded in all of it. We built redaction into the write path itself, not as a retrieval filter. Here's the architecture and why the distinction matters.
Leadership · 12 min read
Why the Next Wave of AI Winners Will Be Infrastructure Companies, Not Model Companies
The model layer is commoditizing, with the cost of a given level of model capability down roughly 280x in two years. The real defensibility is in the infrastructure between the model and the enterprise: memory, governance, deployment. That's the bet we're making.
Featured · 10 min read
Moving Governance and Evaluation Below the Application Layer
Governance lives in system prompts. Evaluation lives in separate pipelines. State lives in session stores. We moved all three into the infrastructure layer of an AI API. Here is the architecture and what it changes.
Featured · 15 min read
We Replaced messages[] With steps[] in Our Agent API. Here's Why.
We started with the same messages[] pattern everyone uses. For complex, repeatable agent workflows it kept failing in predictable ways. So we decomposed instructions into sequential steps with scoped tools and shared context. Here's what we learned.
Featured · 11 min read
Code-Orchestrated Agents vs. Tool-Calling: The Architecture Decision That Matters Most
Stripe, Shopify, and Salesforce all converged on the same pattern: LLM decides, code executes. Here's the architectural reasoning, the trade-offs, and when tool-calling actually makes sense.
Research · 10 min read
LLM Function Calling in Production: What the Benchmarks Actually Say
The best models fail 30% of the time on complex tool-calling scenarios. Seven documented error patterns, infinite loop failures, and silent cascading errors. Here's what the data says before you ship function calling to production.
Research · 9 min read
Adversarial Governance Compliance — Our Methodology and What Near-Perfect Accuracy Tells Us
Delivering the right context to agents is one problem. Ensuring they respect what they must never do is another. Here's how we designed our adversarial governance experiment, what our results show, and why this work is never finished.
11 min read
Dual Memory: Why You Need Both Free-Text Facts and Typed Properties
38% of valuable information no schema anticipated. 12% only usable with type enforcement. Neither modality alone captures the full picture, and both come from one extraction pass.
11 min read
The Four-Layer Architecture Behind Governed Memory
Dual memory, governance routing, reflection-bounded retrieval, and schema lifecycle — the architecture we built when RAG wasn't enough.
10 min read
14 Agent Configs, 3 Teams, Zero Source of Truth
Sales embeds brand voice in a system prompt. Support copies compliance rules from a Notion doc. Marketing uses its own tone guidelines. When legal updates the data handling policy, nothing propagates.
8 min read
Progressive Context Delivery: How We Cut Token Usage 50% in Multi-Step Agents
When agents re-plan and act in loops, re-injecting the same governance context at every step is expensive and makes outputs worse. Here's the fix.
Research · 8 min read
Reflection-Bounded Retrieval: +25.7pp Completeness on Hard Queries
When the information an agent needs is scattered across 3–5 sources, a single retrieval pass misses most of it. Here's what actually works — and the surprising finding about what drives the gain.
10 min read
Why Schema-Enforced Memory Is the CRM Integration Layer AI Has Been Missing
Free-text memories can go into a prompt. They can't sync to Salesforce, filter by deal stage, or aggregate across 10,000 entities. That's the downstream dead end.
9 min read
Schemas Are Living Documents: The Closed-Loop Refinement Pipeline
Schemas age. Models get updated. Content types shift. New agent workflows produce data the schema wasn't designed for. Here's how to build a schema that keeps up.
8 min read
Seven Memories Per Entity Is All You Need
Output quality saturates at roughly seven governed memories per entity. More context isn't better context — it's expensive noise.
Leadership · 9 min read
The $450K Email Your AI Sent Wrong
Your enrichment agent knows the CTO is evaluating three vendors. Your outbound agent sends a generic cold email anyway. This is how memory silos cost you deals.
9 min read
Two-Phase Redaction: Scrubbing PII Before and After LLM Extraction
Most redaction pipelines scrub PII from the output. We scrub it before the LLM sees the content and again after extraction. Here's why both phases are necessary.
Research · 9 min read
99.6% Fact Recall, 74.8% on LoCoMo — What the Numbers Actually Mean
Transparent breakdown of our experimental results: what we tested, what the numbers prove, what they don't, and why we benchmark against ourselves honestly.
8 min read
Zero Cross-Entity Leakage Across 3,800 Results
100 entities, overlapping names, same industry, similar roles — and zero actual memory bleed. Here's how entity isolation works when embeddings can't save you.
Featured · 16 min read
The Multi-Entity Memory Pattern
Most AI systems memorize contacts. The ones that work memorize contacts, their companies, their deals, and the relationships between all of them — then recall across entity boundaries at inference time.
12 min read
Your Agents Know Things. They Just Don't Tell Each Other.
Every workflow learns something. No workflow shares it. This is where organizational intelligence goes to die.
Featured · Build Log · 8 min read
Encoding Solution Architecture Into an AI Skill
The early stages of AI implementation are mostly discovery — assembling scattered context into a coherent system design. We spent two years encoding that process. Here's what we found.
Build Log · 4 min read
Dogfooding governed memory: building smart notifications for our own product
I installed our own SDK as a customer with a standard API key. No internal shortcuts. This is what I built and what happened.
13 min read
What's Relevant? What Do We Know? What Are the Rules?
Three questions that reveal whether your AI agents have what they need — or whether you're building on gaps.
11 min read
7 Patterns for Building Governed AI Knowledge Bases
A response to The New Stack's excellent taxonomy. They got six right. Here's the pattern nobody's building yet, and a practical blueprint for how to build it.
Analysis · 8 min read
Amazon, LinkedIn, and the Race to Build Agentic Knowledge Bases (Part 2)
Google, Microsoft, and Salesforce are each solving a piece of the agent governance puzzle. Here's what the pattern reveals — and the gap nobody has closed.
Analysis · 6 min read
Amazon, LinkedIn, and the Race to Build Agentic Knowledge Bases (Part 1)
The biggest companies in tech are converging on the same conclusion: AI agents without organizational knowledge are a liability.
Leadership · 8 min read
Who's Actually in Charge of Your AI Agents?
Same company, same task, three different AI agents, three completely different answers. Your customers notice. Do you?
5 min read
3 Shortcomings of RAG as a Memory
The gap between 'stored' and 'remembered' is where agent quality lives.
9 min read
Why Agents Fail Without Memory
If your AI agents forget everything between conversations, they're not agents — they're expensive autocomplete.
