Hamed Taheri
I turn AI strategy into production-grade systems that organizations can trust, govern, and scale.
Co-founder & CTO at Personize.ai. I write about memory, governance, and the operating model of enterprise AI. Based in Vancouver.
65% Off Memorization: How Batch APIs and Prompt Caching Change the Cost Per Memory
Provider batch APIs cut LLM extraction cost in half. Prompt caching cuts another fifteen to twenty percent off. Combined, customers who can wait 1-24 hours pay roughly a third of what they would on the sync path. Here is the architecture that makes that discount actually reach the customer, and why writes are the right place to spend the latency budget you saved at retrieval.
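The arithmetic behind that headline can be sketched in a few lines. This is a reading of the teaser, not the article's pricing model: it assumes both discounts are taken off the sync-path price (additive), which is the interpretation that matches the "65% off" title and the "roughly a third" figure. The function name `async_cost_fraction` is hypothetical.

```python
def async_cost_fraction(batch_discount: float, caching_discount: float) -> float:
    """Fraction of the sync-path price still paid on the async path.

    Assumption: both discounts are expressed against the sync price,
    so they add rather than compound.
    """
    return 1.0 - batch_discount - caching_discount


# Batch API: half off. Prompt caching: another 15 to 20 points off.
low_saving = async_cost_fraction(0.50, 0.15)   # caching at the low end
high_saving = async_cost_fraction(0.50, 0.20)  # caching at the high end

print(f"pay {high_saving:.0%} to {low_saving:.0%} of the sync price")
```

If the caching discount instead applied only to the already-halved batch price, the remaining fraction would be 0.50 × 0.80 to 0.50 × 0.85, or 40% to 42.5%, so the additive reading is the one consistent with the stated numbers.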
Agent Dispatch is Live: One Script, Hundreds of Subagents, All Governed
A code-capable agent reads your records, decides what work each one needs, and places hundreds of trigger-bound subagents in a single script run. Some fire now. Some fire when conditions are met. All governed, all observable, all on the same per-minute rate. Here is what we shipped, the use cases we are exploring with early users, and what you already get.
Documents Are Memories Too: The Claude-Markdown-File Pattern in a Governed System
AI agents need a place to write sectioned markdown notes they can later read, append, or rewrite: the same shape Claude uses for its memory files, but inside the governance and retrieval layer that everything else uses. This post covers when to save as a document, how the section-update pattern works, and what flipping the aiExtraction switch does to your modality picture.
Instructions, Not Prompts: How We Run 1,000 Agents Per Minute Without the Bill Going Sideways
Per-token pricing punishes you for using AI at scale. Per-minute pricing forces you to design agents like production systems. Here is the pattern we converged on, the trade-offs of one big chain versus a forest of small agents, and what changes when the models themselves are designed for workflows.
One Call, Four Sources: How Atomic Memories and Typed Properties Make Unified Retrieval Possible
A real fact about a customer lives in four places at once: a typed property, an atomic memory, a document chunk, and a graph edge. Most retrieval stacks pick one and let the agent stitch the rest. Here is the storage discipline that makes returning all four in one call actually useful.
Retrieval Is a Conversation, Not a Query: Why Autonomous Agents Need Pagination as a First-Class API
Stateless retrieval is fine for chat. For autonomous agents taking 100 steps with no human in the loop, it is the single biggest source of wasted output tokens, context bloat, and silent quality collapse. Here is the pattern that replaces every-call-is-fresh with retrieval-as-conversation.
Schema-Guided Extraction: How Your Properties Teach the LLM What to Look For
Every property in your schema carries a type, a description, extraction instructions, examples, and measurement criteria. Together they form a per-property system prompt the LLM consumes at extraction time. You stop hand-prompting; the schema does it. Here is what an extraction-ready property actually looks like, and why this is the only way schema-based memory scales past a handful of fields.
Think and Execute For Me: Intent-Based Retrieval for Expensive Agents
Every output token your agent produces is billed at the premium rate of the strongest model in the loop. When the agent's job is to compress raw retrieved items into a paragraph, you are paying premium rates for compression. Here is how intent-based retrieval moves the synthesis to where the data lives, and what that does to the token bill.
The Three Channels of Graph Writes: Declared, Property-Inferred, LLM-Inferred
A relationship between two records can be told to the system, inferred deterministically from extracted properties, or surfaced by an LLM reading free-text memories. Three independent channels, three different cost profiles, opt-in per write. The same three apply whether the write target is a record or a document. The graph is not a separate system; it is an opt-in fan-out on the same save call.
Three Modalities, One Save Surface: Atomic Memories, Typed Properties, and Markdown Documents
AI agents need three different kinds of memory: extracted atomic facts, typed structured properties, and sectioned markdown documents they can revise like notes. The discipline is picking the right modality for the kind of knowledge. The unified save surface is what makes all three composable at retrieval time, which is the only place that matters.
57x Compression for AI Coding Agents
1.5M tokens of source code, compressed to 27K of curated context. Benchmarked against Graphify and raw grep on 20 real engineering tasks; the curated approach won on total workflow cost.
Moving Governance and Evaluation Below the Application Layer
Governance lives in system prompts. Evaluation lives in separate pipelines. State lives in session stores. We moved all three into the infrastructure layer of an AI API. Here is the architecture and what it changes.
We Replaced messages[] With steps[] in Our Agent API. Here's Why.
We started with the same messages[] pattern everyone uses. For complex, repeatable agent workflows it kept failing in predictable ways. So we decomposed instructions into sequential steps with scoped tools and shared context. Here's what we learned.
Code-Orchestrated Agents vs. Tool-Calling: The Architecture Decision That Matters Most
Stripe, Shopify, and Salesforce all converged on the same pattern: LLM decides, code executes. Here's the architectural reasoning, the trade-offs, and when tool-calling actually makes sense.
The Multi-Entity Memory Pattern
Most AI systems memorize contacts. The ones that work memorize contacts, their companies, their deals, and the relationships between all of them — then recall across entity boundaries at inference time.
Encoding Solution Architecture Into an AI Skill
The early stages of AI implementation are mostly discovery — assembling scattered context into a coherent system design. We spent two years encoding that process. Here's what we found.
The Knowledge Problem in Enterprise AI: Why RAG Isn't Enough, and What We're Building Instead
Enterprises have thousands of documents AI agents should read. Context windows say they can't. RAG helps, but loses structure and intent. Here's how we're solving it.
Beyond Fact Count: Measuring What Actually Matters in Agent Memory Extraction
Memory extraction quality is not about how many facts you extract. It's about entity awareness, strategic depth, smart splitting, implied context, and cross-entity intelligence: the levels that separate storage from understanding.
Guided Memory Extraction: Why Domain Expertise Belongs in Your Memory Pipeline
Why AI agents need extracted memories over raw content, and why guided extraction with entity awareness and domain expertise is what separates useful memory from noise.
One Endpoint to Replace Ten: What We Learned Building a Unified Recall Interface
We had ten memory endpoints. Our agents only needed one. Here's what five rewrites of an intent classifier taught us about building a natural-language query layer for AI memory.
The AI Vendor Paradox: You Trust Them With Your Data But Can't Verify Anything
Enterprise AI requires handing sensitive data to platforms you can't inspect. The answer isn't better trust. It's better architecture, one where trust isn't required because verification is built in.
4-Tier PII Redaction: How We Built Privacy Into the Memory Layer, Not Around It
AI memory systems ingest everything: emails, transcripts, documents. PII is embedded in all of it. We built redaction into the write path itself, not as a retrieval filter. Here's the architecture and why the distinction matters.
Why the Next Wave of AI Winners Will Be Infrastructure Companies, Not Model Companies
The model layer is commoditizing at 280x in two years. The real defensibility is in the infrastructure between the model and the enterprise: memory, governance, deployment. That's the bet we're making.
LLM Function Calling in Production: What the Benchmarks Actually Say
The best models fail 30% of the time on complex tool-calling scenarios. Seven documented error patterns, infinite loop failures, and silent cascading errors. Here's what the data says before you ship function calling to production.
Adversarial Governance Compliance — Our Methodology and What Near-Perfect Accuracy Tells Us
Delivering the right context to agents is one problem. Ensuring they respect what they must never do is another. Here's how we designed our adversarial governance experiment, what our results show, and why this work is never finished.
Dual Memory: Why You Need Both Free-Text Facts and Typed Properties
38% of valuable information was never anticipated by any schema. 12% was only usable with type enforcement. Neither modality alone captures the full picture, and both come from one extraction pass.
The Four-Layer Architecture Behind Governed Memory
Dual memory, governance routing, reflection-bounded retrieval, and schema lifecycle — the architecture we built when RAG wasn't enough.
14 Agent Configs, 3 Teams, Zero Source of Truth
Sales embeds brand voice in a system prompt. Support copies compliance rules from a Notion doc. Marketing uses its own tone guidelines. When legal updates the data handling policy, nothing propagates.
Progressive Context Delivery: How We Cut Token Usage 50% in Multi-Step Agents
When agents re-plan and act in loops, re-injecting the same governance context at every step is expensive and makes outputs worse. Here's the fix.
Reflection-Bounded Retrieval: +25.7pp Completeness on Hard Queries
When the information an agent needs is scattered across 3–5 sources, a single retrieval pass misses most of it. Here's what actually works — and the surprising finding about what drives the gain.
Why Schema-Enforced Memory Is the CRM Integration Layer AI Has Been Missing
Free-text memories can go into a prompt. They can't sync to Salesforce, filter by deal stage, or aggregate across 10,000 entities. That's the downstream dead end.
Schemas Are Living Documents: The Closed-Loop Refinement Pipeline
Schemas age. Models get updated. Content types shift. New agent workflows produce data the schema wasn't designed for. Here's how to build a schema that keeps up.
Seven Memories Per Entity Is All You Need
Output quality saturates at roughly seven governed memories per entity. More context isn't better context — it's expensive noise.
The $450K Email Your AI Sent Wrong
Your enrichment agent knows the CTO is evaluating three vendors. Your outbound agent sends a generic cold email anyway. This is how memory silos cost you deals.
Two-Phase Redaction: Scrubbing PII Before and After LLM Extraction
Most redaction pipelines scrub PII from the output. We scrub it before the LLM sees the content and again after extraction. Here's why both phases are necessary.
99.6% Fact Recall, 74.8% on LoCoMo — What the Numbers Actually Mean
Transparent breakdown of our experimental results: what we tested, what the numbers prove, what they don't, and why we benchmark against ourselves honestly.
Zero Cross-Entity Leakage Across 3,800 Results
100 entities, overlapping names, same industry, similar roles — and zero actual memory bleed. Here's how entity isolation works when embeddings can't save you.
Your Agents Know Things. They Just Don't Tell Each Other.
Every workflow learns something. No workflow shares it. This is where organizational intelligence goes to die.
Dogfooding Governed Memory: Building Smart Notifications for Our Own Product
I installed our own SDK as a customer with a standard API key. No internal shortcuts. This is what I built and what happened.
What's Relevant? What Do We Know? What Are the Rules?
Three questions that reveal whether your AI agents have what they need — or whether you're building on gaps.
7 Patterns for Building Governed AI Knowledge Bases
A response to The New Stack's excellent taxonomy. They got six right. Here's the pattern nobody's building yet, and a practical blueprint for how to build it.
Amazon, LinkedIn, and the Race to Build Agentic Knowledge Bases (Part 2)
Google, Microsoft, and Salesforce are each solving a piece of the agent governance puzzle. Here's what the pattern reveals — and the gap nobody has closed.
Amazon, LinkedIn, and the Race to Build Agentic Knowledge Bases (Part 1)
The biggest companies in tech are converging on the same conclusion: AI agents without organizational knowledge are a liability.
Who's Actually in Charge of Your AI Agents?
Same company, same task, three different AI agents, three completely different answers. Your customers notice. Do you?
3 Shortcomings of RAG as a Memory
The gap between 'stored' and 'remembered' is where agent quality lives.
Why Agents Fail Without Memory
If your AI agents forget everything between conversations, they're not agents — they're expensive autocomplete.