PII Janitor
The PII Janitor is Bauxite’s high-speed redaction engine. It ensures that sensitive user data never leaves your infrastructure by intercepting prompts and replacing Personally Identifiable Information (PII) with ephemeral tokens before the prompts reach the LLM.
How It Works: The “Vault Swap”
The Janitor operates on a three-stage lifecycle: Detect, Vault, and Re-Identify. This entire process happens within the 20MB Straitjacket, utilizing zero-allocation buffers for maximum performance.
1. Detection
As the request stream enters the intercept, the Janitor scans the text using optimized regex patterns.
- Stream-Safe: It uses a 32 KB sliding window to scan data without loading the entire prompt into memory.
- Low Latency: Detection adds less than 1 ms of overhead to the request path.
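The windowed scan can be sketched as follows. This is a minimal illustration, not Bauxite's implementation: `scanStream`, `scanString`, and the `maxMatch` overlap parameter are hypothetical names, and the sketch assumes no single match is longer than `maxMatch` bytes. It carries the tail of each window into the next pass so that a match split across two reads is still detected whole. For readability it allocates a string per match, so it does not reproduce the zero-allocation behavior described above.

```go
package main

import (
	"fmt"
	"io"
	"regexp"
	"strings"
)

// emailRe is the email pattern listed under Supported Entities.
var emailRe = regexp.MustCompile(`[a-z0-9._%+-]+@[a-z0-9.-]+\.[a-z]{2,}`)

// scanStream (hypothetical) reads r in window-sized chunks and returns every
// match of re. maxMatch is an assumed upper bound on the length of one match.
func scanStream(r io.Reader, re *regexp.Regexp, window, maxMatch int) ([]string, error) {
	buf := make([]byte, 0, window+maxMatch)
	chunk := make([]byte, window)
	var found []string
	for {
		n, err := r.Read(chunk)
		if err != nil && err != io.EOF {
			return found, err
		}
		atEOF := err == io.EOF
		buf = append(buf, chunk[:n]...)

		consumed := 0
		for _, loc := range re.FindAllIndex(buf, -1) {
			// A match that starts within maxMatch bytes of the end could
			// still grow when more data arrives, so defer it to the next pass.
			if !atEOF && loc[0]+maxMatch > len(buf) {
				break
			}
			found = append(found, string(buf[loc[0]:loc[1]]))
			consumed = loc[1]
		}
		if atEOF {
			return found, nil
		}
		// Keep only the unconsumed tail (at most maxMatch bytes) for the next pass.
		keepFrom := len(buf) - maxMatch
		if keepFrom < consumed {
			keepFrom = consumed
		}
		buf = buf[:copy(buf, buf[keepFrom:])]
	}
}

// scanString is a small convenience wrapper for the demo below.
func scanString(s string, window, maxMatch int) []string {
	found, _ := scanStream(strings.NewReader(s), emailRe, window, maxMatch)
	return found
}

func main() {
	// An 8-byte window forces both addresses to straddle chunk boundaries;
	// the window described above would be 32 KB.
	fmt.Println(scanString("contact alice@example.com or bob@example.org now", 8, 32))
	// prints [alice@example.com bob@example.org]
}
```

The buffer never grows beyond `window+maxMatch` bytes, which is the property that lets a scan like this run under a fixed memory ceiling regardless of prompt size.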
2. Vaulting (The In-Memory Vault)
When a match (e.g., an email address) is found, it is not “deleted.” Instead, it is moved to a Request-Scoped Vault.
- Tokenization: The matched value is replaced with a numbered token. For example, the first email address found becomes [EMAIL_1].
- Volatility: The vault is a map[string]string tied to the specific HTTP request context.
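A rough sketch of the tokenization step, under stated assumptions: the `Vault` type, its fields, and `Tokenize` are hypothetical names, only the email detector is wired in, and reusing one token for a repeated value is a design choice of this sketch rather than documented Janitor behavior.

```go
package main

import (
	"fmt"
	"regexp"
)

// emailRe is the email pattern listed under Supported Entities.
var emailRe = regexp.MustCompile(`[a-z0-9._%+-]+@[a-z0-9.-]+\.[a-z]{2,}`)

// Vault (hypothetical) holds the token-to-value mapping for one request.
type Vault struct {
	entries map[string]string // token -> original value
	byValue map[string]string // original value -> token (assumption: repeats share a token)
	counter int
}

func NewVault() *Vault {
	return &Vault{entries: map[string]string{}, byValue: map[string]string{}}
}

// Tokenize replaces every email match in text with [EMAIL_n] and records
// the original value in the vault.
func (v *Vault) Tokenize(text string) string {
	return emailRe.ReplaceAllStringFunc(text, func(match string) string {
		if tok, ok := v.byValue[match]; ok {
			return tok
		}
		v.counter++
		tok := fmt.Sprintf("[EMAIL_%d]", v.counter)
		v.entries[tok] = match
		v.byValue[match] = tok
		return tok
	})
}

func main() {
	v := NewVault()
	fmt.Println(v.Tokenize("Mail alice@example.com and bob@example.org, then alice@example.com again."))
	// prints Mail [EMAIL_1] and [EMAIL_2], then [EMAIL_1] again.
	fmt.Println(v.entries["[EMAIL_2]"])
	// prints bob@example.org
}
```

Creating one `Vault` per request and discarding it when the request ends is what keeps the mapping scoped to that request.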
3. Re-Identification
When the LLM responds, it might reference the token
(e.g., “I have sent an email to [EMAIL_1]”). The Janitor intercepts the
response stream, looks up [EMAIL_1] in the vault, and replaces it with the
original value before it reaches the user.
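The lookup-and-replace step might look like the sketch below. `reidentify` and `tokenRe` are hypothetical names, and the token shape (`[NAME_n]`) is inferred from the labels used in this document. The sketch works on a complete string; a streaming implementation would also need to handle a token split across two response chunks.

```go
package main

import (
	"fmt"
	"regexp"
)

// tokenRe matches placeholders such as [EMAIL_1] or [INTERNAL_ID_3].
var tokenRe = regexp.MustCompile(`\[[A-Z_]+_\d+\]`)

// reidentify swaps each known token in the response for its original value.
// Tokens that are not in the vault are left untouched.
func reidentify(response string, vault map[string]string) string {
	return tokenRe.ReplaceAllStringFunc(response, func(tok string) string {
		if orig, ok := vault[tok]; ok {
			return orig
		}
		return tok
	})
}

func main() {
	vault := map[string]string{"[EMAIL_1]": "alice@example.com"}
	fmt.Println(reidentify("I have sent an email to [EMAIL_1]. See [EMAIL_9].", vault))
	// prints I have sent an email to alice@example.com. See [EMAIL_9].
}
```

Leaving unknown tokens as-is avoids corrupting a response when the model invents a placeholder that was never issued.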
Supported Entities
The PII Janitor comes with pre-configured patterns for common sensitive data types:
| Entity Type | Example Pattern | Redaction Label |
|---|---|---|
| Email | [a-z0-9._%+-]+@[a-z0-9.-]+\.[a-z]{2,} | [EMAIL_X] |
| API Keys | sk-[a-zA-Z0-9]{32,} | [SECRET_X] |
| Credit Cards | \d{4}-\d{4}-\d{4}-\d{4} | [CARD_X] |
| Phone (US) | \(\d{3}\) \d{3}-\d{4} | [PHONE_X] |
| IP Addresses | \d{1,3}(\.\d{1,3}){3} | [IP_X] |
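To see how the listed patterns behave, the sketch below compiles them with Go's `regexp` package and reports which one matches a sample string first. The `detectors` slice and `labelFor` are hypothetical names, and checking the patterns in table order is an assumption of this sketch.

```go
package main

import (
	"fmt"
	"regexp"
)

// detectors pairs each label from the table with its compiled pattern.
var detectors = []struct {
	label string
	re    *regexp.Regexp
}{
	{"EMAIL", regexp.MustCompile(`[a-z0-9._%+-]+@[a-z0-9.-]+\.[a-z]{2,}`)},
	{"SECRET", regexp.MustCompile(`sk-[a-zA-Z0-9]{32,}`)},
	{"CARD", regexp.MustCompile(`\d{4}-\d{4}-\d{4}-\d{4}`)},
	{"PHONE", regexp.MustCompile(`\(\d{3}\) \d{3}-\d{4}`)},
	{"IP", regexp.MustCompile(`\d{1,3}(\.\d{1,3}){3}`)},
}

// labelFor returns the label of the first pattern that matches s,
// or an empty string when none match.
func labelFor(s string) string {
	for _, d := range detectors {
		if d.re.MatchString(s) {
			return d.label
		}
	}
	return ""
}

func main() {
	samples := []string{
		"alice@example.com",
		"sk-0123456789abcdef0123456789abcdef",
		"4111-1111-1111-1111",
		"(555) 010-9999",
		"10.0.0.1",
		"hello",
	}
	for _, s := range samples {
		fmt.Printf("%q -> %q\n", s, labelFor(s))
	}
}
```

As written in the table, the patterns are purely syntactic: the IP pattern also accepts out-of-range octets such as 999.999.999.999, and the email pattern is lowercase-only, so an address containing capitals would need a case-insensitive flag such as `(?i)` to match.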
Memory Safety & The “Straitjacket”
The PII Janitor is the primary reason for Bauxite’s strict memory limits. To prevent data leaks, we implement Hard Erasure:
- No Disk Spillage: If a vault grows too large, the request is terminated rather than swapped to disk.
- Explicit Wiping: When a request finishes, the Janitor does not wait for the Garbage Collector. It iterates through the vault and zeroes out the byte slices.
- CGO-Free: The entire logic is pure Go, avoiding the memory-safety pitfalls of C-based regex libraries.
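The "No Disk Spillage" rule can be pictured as a vault with a hard byte budget. The sketch below is an assumption-heavy illustration: `BoundedVault`, `ErrVaultFull`, and the 16-byte limit are hypothetical, and how Bauxite actually accounts for vault size is not specified here.

```go
package main

import (
	"errors"
	"fmt"
)

// ErrVaultFull (hypothetical) signals that the request should be terminated.
var ErrVaultFull = errors.New("vault: size limit exceeded")

// BoundedVault (hypothetical) refuses new entries once its byte budget is spent.
type BoundedVault struct {
	entries map[string][]byte
	used    int // bytes currently stored
	limit   int // maximum bytes allowed
}

func NewBoundedVault(limit int) *BoundedVault {
	return &BoundedVault{entries: map[string][]byte{}, limit: limit}
}

// Put stores a private copy of value under token, or returns ErrVaultFull
// without storing anything when the budget would be exceeded.
func (v *BoundedVault) Put(token string, value []byte) error {
	if v.used+len(value) > v.limit {
		return ErrVaultFull
	}
	v.entries[token] = append([]byte(nil), value...)
	v.used += len(value)
	return nil
}

func main() {
	v := NewBoundedVault(16)
	fmt.Println(v.Put("[EMAIL_1]", []byte("0123456789"))) // prints <nil>
	err := v.Put("[EMAIL_2]", []byte("0123456789"))
	fmt.Println(errors.Is(err, ErrVaultFull)) // prints true
}
```

A caller that receives `ErrVaultFull` would abort the request instead of buffering the overflow anywhere else, which matches the fail-closed behavior described in the list above.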
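// Example of the Janitor's "Wipe" function.
// Assumption: values are held as []byte, because a Go string is immutable
// and cannot be zeroed in place.
type Vault struct {
	entries map[string][]byte
}
// Purge zeroes every stored value, then removes its entry from the vault.
func (v *Vault) Purge() {
	for key, value := range v.entries {
		// Overwrite memory with zeroes
		for i := range value {
			value[i] = 0
		}
		// Deleting the current key while ranging over a map is safe in Go.
		delete(v.entries, key)
	}
}
Configuration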
You can enable or disable specific detectors in your config.yaml:
pii_janitor:
  enabled: true
  strategy: "redact" # options: redact, hash, block
  entities:
    - email
    - api_keys
    - credit_cards
  custom_patterns:
    - name: "INTERNAL_ID"
      regex: "ID-[0-9]{5}"
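Because a custom pattern is user-supplied, it is worth validating at startup. The sketch below shows one way that could work; `CustomPattern` and `compileCustom` are hypothetical names that mirror the `custom_patterns` entries above, and YAML parsing is left out so the example stays within the standard library.

```go
package main

import (
	"fmt"
	"regexp"
)

// CustomPattern (hypothetical) mirrors one entry under custom_patterns.
type CustomPattern struct {
	Name  string
	Regex string
}

// compileCustom compiles every pattern and fails fast on the first invalid
// regex, naming the offending entry in the error.
func compileCustom(patterns []CustomPattern) (map[string]*regexp.Regexp, error) {
	compiled := make(map[string]*regexp.Regexp, len(patterns))
	for _, p := range patterns {
		re, err := regexp.Compile(p.Regex)
		if err != nil {
			return nil, fmt.Errorf("custom pattern %q: %w", p.Name, err)
		}
		compiled[p.Name] = re
	}
	return compiled, nil
}

func main() {
	compiled, err := compileCustom([]CustomPattern{{Name: "INTERNAL_ID", Regex: "ID-[0-9]{5}"}})
	if err != nil {
		fmt.Println("config error:", err)
		return
	}
	fmt.Println(compiled["INTERNAL_ID"].FindString("ticket ID-12345 was closed"))
	// prints ID-12345

	// An unterminated character class is rejected before any traffic is served.
	_, err = compileCustom([]CustomPattern{{Name: "BROKEN", Regex: "ID-[0-9"}})
	fmt.Println(err != nil)
	// prints true
}
```

Go's `regexp` package uses RE2 syntax, so patterns that rely on backreferences or lookarounds would fail this check.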