Most redaction pipelines scrub PII from the output. We scrub it before the LLM sees the content and again after extraction. Here's why both phases are necessary.
TL;DR
- Single-phase redaction (post-output only) fails for one fundamental reason: the LLM has already seen the sensitive data during processing.
- Two-phase redaction runs before extraction (replacing PII with typed placeholders before the LLM call) and after extraction (scanning output for PII the LLM reconstructed from context).
- Four entity tiers: secrets (API keys, private keys), financial PII (credit cards, IBAN), identity PII (SSNs), and contact PII (emails, phones, IPs).
- Three anonymization strategies: redact (typed placeholder), mask (preserve last 4 digits), hash (SHA-256 prefix for linkability without reversibility).
- Redaction is configurable: contact PII like email can be skipped if the use case requires it. Tier enablement, strategy selection, and field-level exceptions are all configurable.
When organizations build AI memory systems over sensitive content — customer call transcripts, support tickets, sales emails, HR records — PII protection usually comes up late. After the memory layer is working. After extraction is producing useful results. As an afterthought.
The afterthought approach produces a specific failure mode: scrubbing outputs but not inputs. The LLM has already processed the raw sensitive data. Even if no sensitive values appear in the extracted memories, the model's internal representations were shaped by content it shouldn't have seen. And when the LLM reconstructs patterns from context — inferring a credit card number format from partial information, or synthesizing a plausible email from surrounding context — single-phase output redaction catches nothing.
Two-phase redaction addresses both failure modes.
Phase 1: Pre-Extraction
Before any content reaches the LLM, the redaction pipeline scans and replaces sensitive patterns.
Raw text is scanned against entity pattern definitions. Each match is replaced with a typed placeholder: [EMAIL], [CREDIT_CARD], [SSN], [API_KEY]. The replacement is deterministic and typed — the placeholder preserves the semantic category of what was redacted without preserving the value.
Input: "Sarah's email is sarah@acme.com and her budget is $450K"
Output: "Sarah's email is [EMAIL] and her budget is $450K"
The LLM processes the redacted version. It can extract "contact has an email address" and "budget is $450K" — both useful facts. It never sees sarah@acme.com.
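As a sketch, the Phase 1 pass can be a walk over a pattern table. The patterns and table below are illustrative, not the pipeline's actual definitions:

```python
import re

# Hypothetical pattern table: entity type -> compiled regex.
# The real pipeline's pattern set is larger; two entries suffice here.
PATTERNS = {
    "EMAIL": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.-]+\b"),
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}

def redact(text: str) -> str:
    """Replace each match with a typed placeholder like [EMAIL].

    Deterministic and typed: the placeholder keeps the semantic
    category of what was removed, never the value.
    """
    for entity_type, pattern in PATTERNS.items():
        text = pattern.sub(f"[{entity_type}]", text)
    return text
```

The LLM call then receives only the return value of `redact`; the raw text never leaves the pre-processing step.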
Phase 1 also handles secrets, which are categorically different from PII. An API key or private key appearing in a sales call transcript (rare but not impossible — a customer sharing integration credentials during a demo) must never be extracted into persistent memory. Pattern matching on key prefixes and PEM blocks catches these before extraction.
Phase 2: Post-Extraction
Phase 1 handles what the pattern matcher can find. Phase 2 handles what it misses.
Two failure modes escape pre-extraction redaction:
Pattern gaps. Regex-based matching has high precision on well-structured PII (credit cards with Luhn validation, SSNs with area/group/serial validation) but lower coverage on context-dependent patterns. A casual mention of a bank account number in a non-standard format may not match the pattern.
LLM reconstruction. More insidiously: the LLM may infer or reconstruct PII-like patterns from surrounding context. If the input says "her card ending in 4242" and the LLM extracts "credit card: [reconstructed from context]", the post-extraction scan catches this. The pre-extraction phase couldn't catch what wasn't explicitly present.
Post-extraction scanning runs the same pattern matching over the extracted values, applying the configured anonymization strategy to any newly detected matches. The result is that even if the LLM produces PII-like output, it doesn't enter the memory store.
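A minimal sketch of the Phase 2 rescan, assuming extracted facts arrive as strings (a single email pattern stands in for the full pattern table):

```python
import re

# Illustrative: the real Phase 2 reuses the same full pattern table as Phase 1.
EMAIL = re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.-]+\b")

def scrub_extraction(facts: list[str]) -> list[str]:
    """Phase 2: re-scan LLM output so PII-shaped values never reach
    the memory store, even if the model reconstructed them from context."""
    return [EMAIL.sub("[EMAIL]", fact) for fact in facts]
```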
Four Tiers of Entity Detection
The pipeline organizes entity detection into four tiers with independent enablement:
Tier 1: Secrets. API keys, private keys (PEM blocks), connection strings, passwords. Detection uses pattern matching on key prefixes and structural patterns (length, character set, entropy). These are never stored under any configuration — there is no use case where an API key belongs in an AI memory system.
Tier 2: Financial PII. Credit card numbers (with Luhn validation), IBAN numbers (with format validation). Regex with checksum verification reduces false positives on digit sequences that happen to match card number patterns.
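The Luhn checksum mentioned above is simple enough to sketch: a digit sequence is only treated as a card number if its weighted digit sum is divisible by 10.

```python
def luhn_valid(number: str) -> bool:
    """Luhn checksum: from the right, double every second digit
    (subtracting 9 when the double exceeds 9); valid card numbers
    have a total divisible by 10."""
    digits = [int(d) for d in number if d.isdigit()]
    if len(digits) < 13:  # card numbers are 13-19 digits
        return False
    total = 0
    for i, d in enumerate(reversed(digits)):
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0
```

This is why arbitrary 16-digit sequences (order IDs, timestamps) rarely trip the Tier 2 matcher: roughly nine out of ten random digit strings fail the checksum.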
Tier 3: Identity PII. Social Security Numbers with area/group/serial validation. The validation step distinguishes SSNs from other 9-digit sequences.
Tier 4: Contact PII. Email addresses, phone numbers, IP addresses. Standard format matching. This tier is configurable — email addresses, for instance, may be intentionally retained in contact-focused memory systems where the email is part of entity identity rather than sensitive data. When the use case requires it, Tier 4 can be partially or fully disabled.
Each tier is independently enabled or disabled in the configuration. A system processing internal HR records applies all four tiers. A system processing technical documentation may disable Tier 4 entirely. The configuration decision is explicit and auditable, not implicit.
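One way the configuration might look. The field names here are hypothetical; only the shape — explicit per-tier enablement with a strategy per tier — reflects the design described above:

```python
# Hypothetical configuration shape: every tier decision is explicit,
# so an auditor can read the redaction posture directly from config.
REDACTION_CONFIG = {
    "tier1_secrets":   {"enabled": True,  "strategy": "redact"},  # never stored, any config
    "tier2_financial": {"enabled": True,  "strategy": "mask"},
    "tier3_identity":  {"enabled": True,  "strategy": "redact"},
    "tier4_contact":   {"enabled": False, "strategy": "hash"},    # e.g. technical-docs deployment
}
```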
Three Anonymization Strategies
For each detected entity, the pipeline applies one of three strategies based on configuration:
Redact. Replace the value with a typed placeholder. sarah@acme.com → [EMAIL]. Highest privacy protection. No information about the original value is preserved. Appropriate for most PII categories.
Mask. Preserve the last N characters (default: 4). sarah@acme.com → [EMAIL:...com]. Useful when some information must be preserved for downstream use — confirming a card ending in a specific digit, or identifying email domain without storing the full address.
Hash. Replace with a SHA-256 prefix. sarah@acme.com → [EMAIL:a3f9b2...]. Preserves linkability without reversibility: two occurrences of the same email produce the same hash, enabling deduplication and entity linking without storing the original value. Appropriate when identity linking is needed but PII cannot be stored.
Strategy selection is per-tier in the configuration, not per-value. All credit card numbers use the mask strategy. All API keys use the redact strategy. The consistency is intentional — per-value decisions require judgment that shouldn't be delegated to runtime logic.
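The three strategies can be sketched as three small functions. The six-hex-character hash prefix is an assumption chosen to match the `[EMAIL:a3f9b2...]` example above:

```python
import hashlib

def redact(value: str, entity_type: str) -> str:
    """Redact: typed placeholder; nothing of the value survives."""
    return f"[{entity_type}]"

def mask(value: str, entity_type: str, keep: int = 4) -> str:
    """Mask: keep the last `keep` characters for downstream confirmation."""
    return f"[{entity_type}:...{value[-keep:]}]"

def hash_prefix(value: str, entity_type: str) -> str:
    """Hash: SHA-256 prefix -- same input, same token; not reversible."""
    return f"[{entity_type}:{hashlib.sha256(value.encode()).hexdigest()[:6]}]"
```

A per-tier config then maps each tier to exactly one of these functions, which is what makes the behavior predictable and auditable.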
Audit Trail
Every redaction produces an audit record:
RedactionAudit:
tier: string -- which tier triggered
entityType: string -- what was detected (EMAIL, SSN, etc.)
strategy: string -- which strategy was applied
count: number -- how many instances were redacted
Audit records are attached to the memory entry's provenance metadata. The redactionApplied flag in the provenance record indicates whether Phase 1 redaction ran on the source content. This enables diagnostic queries: "show me all memories where redaction was applied" or "how many SSNs were detected across last month's content ingestion."
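The audit record translates directly into a small data type; the provenance attachment and query layer are out of scope for this sketch, but a diagnostic query reduces to a filter:

```python
from dataclasses import dataclass

@dataclass
class RedactionAudit:
    tier: str         # which tier triggered
    entity_type: str  # what was detected (EMAIL, SSN, ...)
    strategy: str     # which strategy was applied
    count: int        # how many instances were redacted

def redacted_only(audits: list[RedactionAudit]) -> list[RedactionAudit]:
    """E.g. 'show me all records where redaction actually fired'."""
    return [a for a in audits if a.count > 0]
```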
For GDPR, CCPA, and similar regulatory frameworks, the audit trail answers the questions regulators ask: what sensitive data was detected, how was it handled, and what remains in the system?
What Regex Can't Catch
Regex-based redaction covers well-structured PII with high precision. It does not cover everything. Obfuscated patterns ("my card number is four, two, four, two..."), context-dependent PII (a name that's PII in one context and not in another), and novel formats that don't match existing patterns all escape pattern-based detection.
We're honest about this limitation in the paper. ML-augmented redaction — transformer-based NER for non-standard PII patterns — is future work. The current pipeline is a high-precision foundation, not a comprehensive solution.
The practical implication: two-phase regex-based redaction significantly reduces PII exposure in extracted memories, but it doesn't eliminate it. Organizations with strict PII requirements should layer additional controls — access logging, data retention limits, right-to-erasure workflows — on top of the redaction pipeline, not instead of it.
The Right Model for PII in Memory Systems
PII protection in AI memory systems has a specific challenge that general-purpose redaction tools don't face: the goal is to preserve semantic content while removing identifying content.
"Sarah's email is [EMAIL] and her budget is $450K" is better than "Sarah's email is sarah@acme.com and her budget is $450K" — the LLM doesn't need the email address to extract useful facts about the entity. But "Sarah's budget is [AMOUNT]" is worse — the budget is the fact we want to preserve, not PII.
The tier structure and configurable strategy handle this by matching the redaction scope to the data category. Contact PII (email) is redacted. Budget figures are untouched because the Tier 2 definition includes no currency-amount pattern — dollar amounts are facts, not identifiers. The distinction is explicit and configurable.
Redaction in memory systems should preserve what's analytically useful while removing what's identifying. The two-phase pipeline, with configurable tiers and strategies, is designed for that specific tradeoff.
Frequently Asked Questions
Why regex instead of ML-based NER?
Regex with validation provides high precision — close to zero false positives on well-structured PII categories. An ML model might identify "John" as a person name and redact it, losing semantic content that isn't actually sensitive. For the specific categories we target (API keys, credit cards, SSNs, emails, phones, IPs), pattern-based detection with validation is more reliable than ML where precision is the requirement. We plan to add ML-based detection as a supplementary layer for unstructured PII, not as a replacement.
What happens to memories that were stored before redaction was enabled?
Existing memories in the store don't get retroactively redacted by default — extraction replay is required for that. For organizations enabling redaction after initial deployment, we recommend a migration pass: replay extraction with redaction enabled on historical content, replace existing memories with the redacted versions. The contentHash in provenance metadata enables matching new extractions to their source documents.
Does redaction affect extraction quality?
Minimally. The LLM can still extract facts about entities whose identifiers have been replaced with typed placeholders. "The contact's email is [EMAIL] and they prefer email communication" extracts correctly as a communication preference fact, even without the specific address. The only case where redaction affects quality is when the specific value is itself the fact — a scenario where the extraction hint should be revised to make clear that the value will be redacted.
How does hashing enable entity linking if hashes aren't reversible?
Consistent hashing means the same input always produces the same hash. If sarah@acme.com appears in five different documents and is hashed to [EMAIL:a3f9b2...] in each, the memory system can recognize that all five observations are about the same entity without storing the email address itself. Deduplication and entity aggregation work on the hash, not the original value.
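A short sketch of the consistency property, with the token format assumed from the `[EMAIL:a3f9b2...]` example earlier:

```python
import hashlib

def email_token(email: str) -> str:
    """Deterministic: the same address always yields the same token."""
    return f"[EMAIL:{hashlib.sha256(email.encode()).hexdigest()[:6]}]"

# Five observations of the same address collapse to a single token,
# so deduplication and entity linking work without storing the address.
tokens = {email_token("sarah@acme.com") for _ in range(5)}
```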