Centralized Load Balancer (LB) Mode
In Load Balancer (LB) Mode, Bauxite acts as a high-performance, centralized intercept for your entire organization. This is the preferred pattern for platform teams providing “LLM-as-a-Service” to multiple internal departments.
Architecture
Instead of living inside each Pod, Bauxite sits behind a standard Network Load Balancer (NLB) or Ingress. All internal applications point their BASE_URL to this central cluster.
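As a rough illustration of the client side, an application Deployment might set its base URL through an environment variable. This is a sketch under assumptions: the app name, image, and `team-a` namespace are placeholders, the hostname combines the `bauxite-intercept` Service and `security-egress` namespace mentioned elsewhere on this page, and whether your client library actually reads a variable named `BASE_URL` depends on the SDK you use.

```yaml
# Hypothetical client Deployment: names, image, and namespace are placeholders.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: example-app
  namespace: team-a
spec:
  replicas: 1
  selector:
    matchLabels:
      app: example-app
  template:
    metadata:
      labels:
        app: example-app
    spec:
      containers:
        - name: example-app
          image: registry.example.com/example-app:1.0.0
          env:
            # Point the LLM client at the central Bauxite Service instead of the provider.
            # Assumed hostname: <service>.<namespace>.svc.cluster.local
            - name: BASE_URL
              value: "http://bauxite-intercept.security-egress.svc.cluster.local:80"
```

If mTLS is enforced between apps and the LB, the scheme and port here would change accordingly.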
Key Benefits of LB Mode
| Feature | Description |
|---|---|
| Global Rate Limiting | Prevent a single “noisy neighbor” app from exhausting your corporate OpenAI/Anthropic quotas. |
| Unified Key Management | Manage your provider API keys in one secure vault rather than distributing them to every app team. |
| Org-Wide Auditing | Centralized Carbon Tracking and security logs for every prompt across the company. |
| Shared KV-Cache | Maximize KV-Aware Routing hits by pooling common RAG prefixes in a centralized memory layer. |
Deployment Configuration
When running in LB Mode, you typically replace the strict 20 MB local limit with a larger shared pool (e.g., 512 MB) so the cluster can handle hundreds of concurrent streams.
```yaml
# config.yaml (LB Mode Optimization)
mode: load_balancer
max_concurrent_streams: 1000
pii_janitor:
  enabled: true
  vault_type: shared_memory  # Uses a faster, non-locking map for high concurrency
```

Ingress Example (Kubernetes)
To expose the LB cluster internally, use a standard Kubernetes Service:
```yaml
apiVersion: v1
kind: Service
metadata:
  name: bauxite-intercept
spec:
  selector:
    app: bauxite
  ports:
    - protocol: TCP
      port: 80
      targetPort: 9090
  type: ClusterIP
```

Security Considerations
While LB Mode is highly efficient, it turns Bauxite into a single point of failure for all LLM traffic. To mitigate this:
- High Availability: Always run at least 3 replicas.
- mTLS: Enforce mutual TLS between your internal apps and the Bauxite LB so that traffic is encrypted and cannot be read by other workloads inside your network.
- Namespace Isolation: Deploy the Bauxite LB in a dedicated security-egress namespace with strict NetworkPolicies.
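
The high-availability and namespace-isolation points above can be sketched as Kubernetes manifests. This is an illustrative outline, not a reference deployment: the image name, the spread-across-nodes constraint, and the `bauxite-client: "true"` namespace label are assumptions, while the `app: bauxite` label, port 9090, and the `security-egress` namespace are taken from this page. mTLS is left out because it is usually configured through a service mesh or the proxy's own TLS settings, which are not described here.

```yaml
# Sketch only: image, policy name, and the client namespace label are assumptions.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: bauxite
  namespace: security-egress
spec:
  replicas: 3  # High Availability: at least 3 replicas
  selector:
    matchLabels:
      app: bauxite
  template:
    metadata:
      labels:
        app: bauxite
    spec:
      # Prefer spreading replicas across nodes so one node failure leaves others serving.
      topologySpreadConstraints:
        - maxSkew: 1
          topologyKey: kubernetes.io/hostname
          whenUnsatisfiable: ScheduleAnyway
          labelSelector:
            matchLabels:
              app: bauxite
      containers:
        - name: bauxite
          image: registry.example.com/bauxite:latest  # placeholder image
          ports:
            - containerPort: 9090
---
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: bauxite-allow-labelled-clients
  namespace: security-egress
spec:
  # Applies to the Bauxite pods only.
  podSelector:
    matchLabels:
      app: bauxite
  policyTypes:
    - Ingress
  ingress:
    # Only namespaces carrying the (assumed) client label may reach port 9090.
    - from:
        - namespaceSelector:
            matchLabels:
              bauxite-client: "true"
      ports:
        - protocol: TCP
          port: 9090
```

With a policy like this, an app team's namespace would need the label before its pods can reach the LB, which gives the platform team an explicit onboarding step.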