# Streaming Redaction
Bauxite is built for the era of real-time AI. Unlike traditional proxies that buffer the full response before processing it, Bauxite’s Streaming Redaction engine scrubs data on the fly with sub-millisecond overhead.
## The Challenge: Partial Matches
When an LLM streams a response via Server-Sent Events (SSE), it often sends data in tiny chunks:
```
Chunk 1: {"text": "My email is john."}
Chunk 2: {"text": "doe@exam"}
Chunk 3: {"text": "ple.com"}
```
A naive regex would miss the email address because it never sees the full string at once.
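This failure mode is easy to reproduce. The sketch below (the email regex and chunk values are illustrative, not Bauxite’s actual patterns) shows that no chunk matches on its own, while the concatenated stream does:

```python
import re

EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")

chunks = ["My email is john.", "doe@exam", "ple.com"]

# Scanning each chunk in isolation finds nothing: the address is
# split across all three chunks, so no single chunk matches.
per_chunk = [m.group() for c in chunks for m in EMAIL.finditer(c)]
print(per_chunk)  # []

# Only the reassembled stream contains the full address.
print(EMAIL.findall("".join(chunks)))  # ['john.doe@example.com']
```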
## The Bauxite Solution: Sliding Window Buffer
To solve this within our 20MB Straitjacket, Bauxite uses a fixed-size, circular sliding window buffer.
- Windowing: Bauxite maintains a small 32KB buffer for each active stream.
- Lookahead: As chunks arrive, they are appended to the window.
- Redaction: The regex engine scans the window. If a partial match is detected at the end of a chunk (e.g., `john.doe@`), Bauxite holds that small fragment back for a few milliseconds until the next chunk confirms or breaks the pattern.
- Flush: Once a pattern is confirmed and redacted, or the window moves past a non-match, the data is immediately flushed to the client.
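The steps above can be sketched roughly as follows. `StreamRedactor`, the `[EMAIL]` placeholder, and both regexes are hypothetical stand-ins for Bauxite’s internals, and the sketch omits the window size cap and the hold timer:

```python
import re

EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")
# A trailing run of these characters could still grow into an email once
# more bytes arrive, so it is held back instead of flushed.
TAIL = re.compile(r"[\w.@+-]+$")

class StreamRedactor:
    """Toy sliding-window redactor with tail-holding (illustrative only)."""

    def __init__(self) -> None:
        self.window = ""  # unflushed tail of the stream

    def feed(self, chunk: str) -> str:
        self.window += chunk
        redacted = EMAIL.sub("[EMAIL]", self.window)
        tail = TAIL.search(redacted)
        if tail:  # possible partial match at the end: hold it back
            self.window = tail.group()
            return redacted[: tail.start()]
        self.window = ""
        return redacted

    def flush(self) -> str:
        # Called at end of stream (or when the hold timeout expires).
        out, self.window = self.window, ""
        return out

r = StreamRedactor()
pieces = [r.feed(c) for c in ["My email is john.", "doe@exam", "ple.com"]]
pieces.append(r.flush())
print("".join(pieces))  # -> My email is [EMAIL]
```

Note how the first call flushes only `"My email is "` and holds `john.` back, since it could be the start of an address.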
## Performance Impact
- Latency: Adds ~0.5ms to 2ms of jitter (negligible for human reading speeds).
- Memory: Fixed 32KB per concurrent stream. 100 concurrent users only consume ~3.2MB of the 20MB limit.
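The memory figure above is straightforward arithmetic on the fixed per-stream buffer:

```python
buffer_kb = 32            # fixed window per concurrent stream
streams = 100             # concurrent users
budget_mb = 20            # the 20MB memory limit

total_mb = buffer_kb * streams / 1000
print(total_mb)           # 3.2 -- well under the 20MB budget
```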
## Security: The “No-Leak” Guarantee
Because Bauxite operates at the byte level, we ensure that no PII “leaks” through the stream even if the LLM attempts to obfuscate it with unusual chunking.
| Feature | Implementation |
|---|---|
| SSE Aware | Parses `data:` frames natively to avoid redacting metadata. |
| Tail-Holding | Prevents partial PII leaks by holding the end of the buffer until the next chunk. |
| Backpressure | If the client is slow, Bauxite holds data in the memory-only window, never swapping to disk. |
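To illustrate the SSE-aware row, here is a minimal sketch of `data:`-frame handling. The `redact_sse` helper, the secret-key regex, and the `[SECRET_1]` token are illustrative assumptions, not Bauxite’s API; the field names follow the SSE spec:

```python
import re

SECRET = re.compile(r"sk-\w+")  # illustrative secret-key pattern

def redact_sse(frame: str) -> str:
    """Redact only `data:` payload lines; pass SSE metadata through untouched."""
    out = []
    for line in frame.splitlines():
        if line.startswith("data:"):
            out.append("data:" + SECRET.sub("[SECRET_1]", line[5:]))
        else:
            # event:, id:, retry: fields and comments are not user payload
            out.append(line)
    return "\n".join(out)

frame = "event: message\ndata: my secret is sk-12345\n"
print(redact_sse(frame))
```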
## Configuration
Streaming redaction is enabled by default when the PII Janitor is active. You can tune the lookahead buffer size in your `config.yaml`:
```yaml
streaming:
  enabled: true
  lookahead_buffer_kb: 32   # Default 32KB
  max_hold_ms: 50           # Max time to hold a partial match before flushing
```

## Testing the Stream
You can test the real-time redaction using `curl`:

```shell
curl -X POST http://localhost:9090/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4",
    "stream": true,
    "messages": [{"role": "user", "content": "Repeat this back: my secret is sk-12345"}]
  }'
```

Expected output:

```
data: {"choices": [{"delta": {"content": "Your secret is [SECRET_1]"}}]}
```