What is the Tessera Optimize Layer?

The Tessera Optimize Layer is a thin proxy that lives in your LLM request path. Your application points its OpenAI, Anthropic, or Google client at Tessera; before forwarding each request to the provider, Tessera applies four moves: auto-route to a cheaper model when quality holds on your golden-set eval, auto-cache identical requests at the edge, auto-compress prompts via a deterministic whitespace + structural pass (LLMLingua-2 template substitution is on the roadmap), and auto-batch where batch APIs apply. Every saved dollar is measured directly from proxy logs, not inferred from a billing CSV.

How does Tessera measure savings?

Savings are measured from Tessera proxy logs at request granularity. For each request that gets routed, cached, compressed, or batched, Tessera records the counterfactual provider cost (what the request would have cost without the optimization) and the actual incurred cost. Aggregate Ongoing Savings equal the sum of (counterfactual minus actual) across all in-scope workloads in the period. Provider price moves and unrelated workload shrinkage are excluded; only optimizations attributable to Tessera count toward your reported savings.
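The aggregation described above can be sketched as follows. This is a minimal illustration, not Tessera's implementation: the `RequestLog` row shape, its field names, and the `ongoing_savings` helper are all assumptions made for the example.

```python
from dataclasses import dataclass
from decimal import Decimal


@dataclass(frozen=True)
class RequestLog:
    # Hypothetical log-row shape; field names are illustrative only.
    workload: str
    counterfactual_usd: Decimal  # what the request would have cost unoptimized
    actual_usd: Decimal          # what the request actually cost
    optimized: bool              # True if routed, cached, compressed, or batched


def ongoing_savings(rows, in_scope):
    """Sum (counterfactual - actual) over optimized requests in in-scope workloads."""
    return sum(
        (r.counterfactual_usd - r.actual_usd
         for r in rows
         if r.optimized and r.workload in in_scope),
        Decimal("0"),
    )


rows = [
    RequestLog("chat", Decimal("0.0120"), Decimal("0.0030"), True),       # routed
    RequestLog("chat", Decimal("0.0080"), Decimal("0.0000"), True),       # cache hit
    RequestLog("chat", Decimal("0.0100"), Decimal("0.0100"), False),      # untouched
    RequestLog("batch-etl", Decimal("0.0500"), Decimal("0.0250"), True),  # out of scope
]
total = ongoing_savings(rows, in_scope={"chat"})
```

Requests that were not optimized, or that belong to out-of-scope workloads, contribute nothing, so an unrelated drop in traffic does not show up as savings.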

How is Tessera different from an observability platform?

Observability platforms trace requests and show dashboards. Tessera is an optimize layer: we sit in the request path and actively route, cache, compress, and batch on every call. Pricing is a flat monthly subscription by token volume, not per-seat SaaS, and savings are measured directly from our own proxy logs. Your existing observability tool can stay; it still receives downstream telemetry. Tessera is the layer above whatever vendors you already use, not an alternative to them.

What does Tessera cost?

Flat monthly pricing by gross tokens submitted, same feature set on every tier — only the token cap differs. Free Sandbox: 60 million tokens per month, $0, full mechanic stack active. Paid tiers: Starter $199/month (up to 1B tokens), Growth $999/month (up to 5B), Scale $3,999/month (up to 20B), Enterprise custom (20B+). No per-token markup and no cut of your savings — savings are measured and you keep 100%. No floor, no retainer, no contract review for activation. Quality preservation guaranteed at 0.95 by canary; three-day breach triggers auto-disable plus a service credit.
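To make the tier boundaries concrete, here is a small lookup sketch using the caps and prices quoted above. The `tier_for` helper is hypothetical, and treating each cap as inclusive ("up to") is an assumption; the text does not say which side of a boundary an exact match lands on.

```python
# Monthly token caps and USD prices copied from the pricing text above.
TIERS = [
    ("Free Sandbox", 60_000_000, 0),
    ("Starter", 1_000_000_000, 199),
    ("Growth", 5_000_000_000, 999),
    ("Scale", 20_000_000_000, 3999),
]


def tier_for(gross_tokens_per_month: int):
    """Return (tier name, monthly USD price) for the smallest tier that fits.

    A price of None means custom (Enterprise) pricing.
    Assumption: caps are inclusive.
    """
    if gross_tokens_per_month < 0:
        raise ValueError("token volume cannot be negative")
    for name, cap, price in TIERS:
        if gross_tokens_per_month <= cap:
            return name, price
    return "Enterprise", None
```

For example, a workload submitting 3B gross tokens per month falls in Growth at $999, regardless of how many of those tokens were cache hits or routed to a cheaper model.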

Does Tessera modify our production code?

No. Tessera is added to your request path via two HTTP headers and one config-line change — you point your OpenAI/Anthropic/Google client at the Tessera proxy base URL and add an API key. Tessera never modifies your application source, never deploys into your codebase, and never makes provider-side changes on your account. All optimization logic runs inside the Tessera proxy infrastructure. You can disable Tessera at any time by reverting the two headers, or by using the in-dashboard pause control (see next question).

Can I pause Tessera at any time?

Yes. Every operator dashboard ships with an always-available kill-switch — account-wide and per-workload. When engaged, the Tessera proxy bypasses all four optimizations (route, cache, compress, batch) and forwards your requests to the upstream provider as pure passthrough. Pause is reversible at any time without notice. The pause right is contractually preserved in §8 of the Tessera Terms of Service — Tessera does not work uncontrolled in your stack.

Who is Tessera for?

Two primary buyer profiles account for most paid-tier signups: (1) the CTO of a Series A-B AI-native SaaS company spending $20k-$200k per month on LLM APIs, with gross margin under pressure, who can change a base URL in 30 minutes and signs up within 72 hours; (2) a Series B-D scale-up adding AI features, spending $50k-$500k per month, where an AI Platform Lead owns the budget and a light security review takes 2-4 weeks. There is also a developer-facing Free Sandbox tier (60M tokens/mo, $0) for solo developers and side projects. Workloads tagged regulated (HIPAA, PCI-DSS, SOC 2 in-scope) never auto-route; the compliance gate blocks routing at the code level. Above 20B gross tokens/mo, Enterprise Clients may negotiate a custom MSA addendum (invoice billing, custom SLO).

How does Tessera handle data privacy?

Confidential Information furnished to Tessera is stored primarily inside the European Economic Area (EEA) under Estonian operating jurisdiction. We do not require, request, or process Client end-user personal data, model prompts, or model completions in their full content form — only token-count and structural metadata. US-based AI provider sub-processors (Anthropic, OpenAI, Google) are engaged solely for Tessera-internal analysis under strict anonymisation conditions per the Data Processing Agreement.

Does Tessera accept referral fees from AI providers?

No. Tessera receives no affiliate revenue, referral fees, kickbacks, sponsorships, advisory-board honoraria, or any other compensation from any AI provider, gateway vendor, or observability platform we recommend in the course of operating the proxy. Client fees are our only income. This is contractually binding under §10 of the Tessera Terms of Service (Vendor Neutrality) and breach permits Clients to terminate without penalty and withdraw their balance.


Mechanic · M12 · Safety primitive

Guardrails

Shipped 21 May 2026 · v0.1 heuristic · Migration 0090

M12 inspects each request before the upstream provider sees it. PII strings (email, phone, SSN, credit card, IBAN, …) are redacted via a Python sidecar running microsoft/presidio. Jailbreak and toxicity hits are caught by pure-TypeScript heuristics inside the Worker isolate. Per-workload mode selects the consequence: observe (log_only), redact, or block (strict).

Honest framing first: this is a v0.1 heuristic. The PII detector is presidio's regex + dictionary + spaCy NER ensemble. The jailbreak detector is a regex set covering the published DAN / Sydney / role-flip corpus. The toxicity filter is a banned-token list with a per-message ratio threshold. None of these are ML-classifier-backed. M12 is suitable for low-risk redact mode and for blunt blocking in strict mode. It is not a substitute for a dedicated moderation provider on workloads with stringent compliance contracts.

What it does

For every request through the proxy with guardrails_mode != 'off':

PII validator. HMAC-signs the request text, POSTs to the presidio-bridge sidecar at /v1/analyze-redact. The sidecar returns { redacted_text, entities_found[] } and the worker substitutes the redacted version back into each chat message slot when redact or strict mode is active.
Jailbreak validator. Pure-TypeScript regex scan over the concatenated message text. Pattern set covers "ignore previous instructions", DAN role-flip, persona override, mode escalation, system-prompt extraction, safety bypass, encoded-payload smuggling. ~80 µs per check.
Toxicity validator. Banned-token list with a ratio gate. Single-token hit flags short messages (< 50 words); long messages need ratio ≥ 0.02. Narrow list, conservative. False positives on enterprise traffic are more damaging at v0.1 than false negatives.
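The toxicity ratio gate described above can be sketched as follows. The shipped validator is TypeScript; this Python version only illustrates the logic. The banned tokens are placeholders, the tokenizer is an assumption, and the name `toxicity_flag` is hypothetical. Only the 50-word cutoff and the 0.02 ratio come from the text.

```python
import re

BANNED = frozenset({"badword", "slurword"})  # placeholder tokens, not the shipped list
SHORT_MESSAGE_WORDS = 50  # messages under this many words flag on a single hit
RATIO_THRESHOLD = 0.02    # longer messages need hits / words >= this ratio


def toxicity_flag(message: str) -> bool:
    """Return True when the message trips the banned-token gate."""
    # Assumption: lowercase alphabetic tokenization; the real tokenizer may differ.
    words = re.findall(r"[a-z']+", message.lower())
    if not words:
        return False
    hits = sum(1 for word in words if word in BANNED)
    if len(words) < SHORT_MESSAGE_WORDS:
        return hits >= 1
    return hits / len(words) >= RATIO_THRESHOLD


short_hit = toxicity_flag("you are a badword")
long_one_hit = toxicity_flag(" ".join(["fine"] * 99 + ["badword"]))       # ratio 0.01
long_two_hits = toxicity_flag(" ".join(["fine"] * 98 + ["badword"] * 2))  # ratio 0.02
```

The ratio gate is what keeps a single quoted term inside a long support transcript from flagging the whole message.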

The three validators compose into one outcome. In strict mode the request is rejected with HTTP 403 + tessera-guardrails-blocked header carrying the block reason (jailbreak_blocked or toxicity_blocked). PII never blocks; it always redacts. Blocking on PII would convert ordinary traffic into 403s for any message that happens to contain an email address, which is incompatible with the proxy contract.

Mode taxonomy

The per-workload guardrails_mode enum has four values, evaluated server-side on every request:

off. Default. Validators are skipped entirely. No audit row.
log_only. Validators run, audit row written, request unchanged. Use this to validate v0.1 behavior on real traffic before committing to a mutation policy.
redact. PII redacted in-place. Jailbreak and toxicity hits are recorded but the request still proceeds. Best for workloads that handle customer-supplied content where blocking is too disruptive but PII leakage to the upstream is unacceptable.
strict. PII redacted; jailbreak or toxicity hit returns 403 to the caller. Use only on workloads where you have a fall-back path for blocked requests. The block header taxonomy is stable.
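The four modes reduce to a small decision function. The sketch below is illustrative only: the `Outcome` shape and the `decide` name are hypothetical, and checking jailbreak before toxicity when both fire is an assumption, since the text does not state a precedence.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Outcome:
    status: int                   # 200 = forwarded upstream, 403 = blocked
    upstream_text: Optional[str]  # text sent to the provider (None when blocked)
    block_reason: Optional[str]   # value for the tessera-guardrails-blocked header


def decide(mode: str, text: str, redacted_text: str,
           jailbreak_hit: bool, toxicity_hit: bool) -> Outcome:
    """Map a guardrails_mode plus validator results to a request outcome."""
    if mode in ("off", "log_only"):
        return Outcome(200, text, None)  # request body is never mutated
    if mode == "strict":
        # Assumption: jailbreak takes precedence when both validators fire.
        if jailbreak_hit:
            return Outcome(403, None, "jailbreak_blocked")
        if toxicity_hit:
            return Outcome(403, None, "toxicity_blocked")
    if mode in ("redact", "strict"):
        return Outcome(200, redacted_text, None)  # PII redacts, never blocks
    raise ValueError(f"unknown guardrails_mode: {mode!r}")


blocked = decide("strict", "mail a@b.co", "mail <EMAIL>", True, False)
redacted = decide("redact", "mail a@b.co", "mail <EMAIL>", True, True)
observed = decide("log_only", "mail a@b.co", "mail <EMAIL>", True, True)
```

Note that in redact mode a jailbreak or toxicity hit changes nothing about the forwarded request; only strict mode turns a hit into a 403.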

Composition cap

M12 redact mutates the request body, so it counts as a content-mutator alongside M3 compress and M7 context-prune. The proxy enforces a max of two content-mutators per request (LOCKED 2026-05-20). Stacking three mutations against the same model call multiplies the quality risk past what the canary can catch.

When M3 + M7 already fired on this request AND guardrails_mode is redact, the runner downgrades to log_only for that single request and emits an audit row with composition_cap_exceeded=true. Validators still run; entities still get recorded; the request body is just not mutated. Jailbreak and toxicity validators are non-mutating, so they always run when mode is anything but off regardless of cap.
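The downgrade rule above can be sketched as a per-request check. The `effective_mode` helper and the set-of-mutator-names representation are assumptions for illustration. The text only specifies the downgrade for redact mode, so this sketch leaves strict untouched; that is an assumption, not confirmed behavior.

```python
MAX_CONTENT_MUTATORS = 2  # cap stated in the text


def effective_mode(mode: str, mutators_fired: set):
    """Return (mode to apply for this request, composition_cap_exceeded).

    mutators_fired holds the content-mutators that already ran on this
    request, e.g. {"M3", "M7"}.
    """
    if mode == "redact" and len(mutators_fired) >= MAX_CONTENT_MUTATORS:
        # Validators still run and entities are still recorded,
        # but the request body is not mutated a third time.
        return "log_only", True
    # Assumption: other modes (including strict) are passed through unchanged,
    # because the text only describes the redact case.
    return mode, False


capped = effective_mode("redact", {"M3", "M7"})
uncapped = effective_mode("redact", {"M3"})
```

The downgrade applies to the single request only; the workload's configured mode is unchanged.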

Sidecar architecture

Presidio (the PII engine) is Python-only. It ships with spaCy and transformers dependencies that cannot run inside the Cloudflare Worker V8 isolate. We run presidio as an HTTP sidecar (presidio-bridge/ in the monorepo, deployable via Docker Compose) and call out server-to-server over HMAC-signed requests.

The HMAC scheme is identical to the worker → dashboard ingest path documented in .claude/rules/hmac.md: header X-Tessera-Signature: t=<ts>,v1=<base64url-hmac-sha256> with the same asymmetric drift window (reject futures > 30 s, reject pasts > 300 s). One secret rotation updates both sides in lockstep. The bridge fails closed with HTTP 500 if the signing-secret env var is missing, never running unauthenticated.
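A minimal sketch of signing and verifying a header in this format, using only the Python standard library. The header layout and the 30 s / 300 s drift window come from the text. Two details are assumptions because the text does not specify them: that the signed payload is the timestamp, a dot, and the raw body, and that base64url padding is stripped. The `sign` and `verify` names are hypothetical.

```python
import base64
import hashlib
import hmac
import time

MAX_FUTURE_S = 30   # reject timestamps more than 30 s in the future
MAX_PAST_S = 300    # reject timestamps more than 300 s in the past


def _mac(secret: bytes, ts: int, body: bytes) -> str:
    # Assumption: signed payload is "<ts>." + body; padding stripped from base64url.
    digest = hmac.new(secret, str(ts).encode() + b"." + body, hashlib.sha256).digest()
    return base64.urlsafe_b64encode(digest).rstrip(b"=").decode()


def sign(secret: bytes, body: bytes, now=None) -> str:
    """Build the X-Tessera-Signature header value."""
    ts = int(time.time() if now is None else now)
    return f"t={ts},v1={_mac(secret, ts, body)}"


def verify(secret: bytes, body: bytes, header: str, now=None) -> bool:
    """Check the drift window first, then compare MACs in constant time."""
    now = int(time.time() if now is None else now)
    try:
        fields = dict(part.split("=", 1) for part in header.split(","))
        ts = int(fields["t"])
        sig = fields["v1"]
    except (KeyError, ValueError):
        return False  # malformed header
    if ts - now > MAX_FUTURE_S or now - ts > MAX_PAST_S:
        return False
    return hmac.compare_digest(sig.encode(), _mac(secret, ts, body).encode())


header = sign(b"shared-secret", b'{"text":"hi"}', now=1000)
```

The asymmetric window tolerates slow delivery (up to 300 s old) while staying strict about clocks that run ahead.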

Failure modes degrade gracefully. If the bridge is unreachable or returns a non-2xx, the PII validator logs the failure and passes the request through unredacted. The customer's call still succeeds. The other two validators are in-process and have no network dependency.
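The fail-open behavior can be sketched as a wrapper around the bridge call. The `call_bridge` transport hook and the `redact_or_passthrough` name are hypothetical; the two log event names are the ones listed under Verification surfaces. The stub functions exist only to exercise the three paths.

```python
import logging

log = logging.getLogger("guardrails")


def redact_or_passthrough(text: str, call_bridge):
    """Return (text to forward, whether redaction was applied).

    call_bridge(text) -> (status_code, redacted_text) is a hypothetical
    transport hook standing in for the signed POST to the sidecar.
    """
    try:
        status, redacted = call_bridge(text)
    except OSError:  # covers connection refused, timeouts, DNS failures
        log.warning("guardrails_pii_bridge_unreachable")
        return text, False
    if not 200 <= status < 300:
        log.warning("guardrails_pii_bridge_non_2xx")
        return text, False
    return redacted, True


# Stub bridges for illustration only.
def bridge_ok(text):
    return 200, text.replace("a@b.co", "<EMAIL>")


def bridge_down(text):
    raise ConnectionError("connection refused")


def bridge_error(text):
    return 503, ""
```

The trade-off is explicit: availability of the customer's call wins over redaction when the sidecar is down, so a bridge outage means PII can reach the upstream until it recovers.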

What we don’t do yet

Five deferrals are explicit, not hidden:

ML-classifier-backed jailbreak detection. v0.1 is regex-only. False-positive risk is real on long adversarial prompts; sponsors running strict mode should also wire in per-prompt monitoring. v0.2 adds a small transformer classifier (likely the Garak corpus fine-tuned model) as a second-pass when the regex fires.
ML-classifier-backed toxicity scoring. v0.1 is banned-token + ratio. False-negative rate on creatively-spelled slurs is high. The shipping API surface keeps room for a Detoxify-class drop-in once we ship v0.2.
Streaming response validators. v0.1 inspects the request only. Mid-stream output redaction requires SSE tee + per-chunk classification that we'd rather not ship half-done.
Multilingual jailbreak corpus. Regex set is English-only. Non-English jailbreak attempts will slip through v0.1 by construction.
Sponsor-authored custom validators. The RequestValidator interface is stable enough to support config-driven JSON rules; we did not ship the config-loader path so we could harden the three baseline validators first.

How to enable

On a workload page in /portal/settings, select a mode from the Guardrails (M12) dropdown and save. KV refresh fires immediately; the next request through the worker honours the new mode without waiting for the 5-minute periodic sync.

Default mode is off. Sponsors must explicitly opt-in per workload. Free Sandbox and paid-tier workloads both see the full mode flow; guardrails are a reliability/compliance primitive, not a paid-tier-only feature.

Verification surfaces

M12 is observable in three places:

Response headers on a strict-mode blocked request: tessera-guardrails-blocked with value jailbreak_blocked or toxicity_blocked plus the standard x-tessera-request-id for correlation.
Worker logs. guardrails_strict_block, guardrails_redact_applied, guardrails_log_only_triggered, guardrails_composition_cap_exceeded cover the four interesting paths. PII bridge failures surface as guardrails_pii_bridge_unreachable or guardrails_pii_bridge_non_2xx.
Audit row for every mode flip on audit_events: event_kind='guardrails_mode_change' with payload.from and payload.to. The per-request guardrails_outcome audit row ships in v0.2 when the audit-log ingest path is wired in for the runner output.
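On the caller side, a strict-mode block can be told apart from an ordinary 403 by the response header. A minimal sketch, assuming the response is available as a status code plus a header mapping; the `guardrails_block` helper and the sample request id are hypothetical.

```python
BLOCK_REASONS = {"jailbreak_blocked", "toxicity_blocked"}


def guardrails_block(status: int, headers: dict):
    """Return (reason, request_id) for a guardrails block, else None."""
    # HTTP header names are case-insensitive, so normalize before lookup.
    lowered = {name.lower(): value for name, value in headers.items()}
    reason = lowered.get("tessera-guardrails-blocked")
    if status == 403 and reason in BLOCK_REASONS:
        return reason, lowered.get("x-tessera-request-id")
    return None


hit = guardrails_block(
    403,
    {"Tessera-Guardrails-Blocked": "toxicity_blocked", "X-Tessera-Request-Id": "req_123"},
)
```

A caller in strict mode can branch on the returned reason to trigger its fall-back path and log the request id for correlation with worker logs.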

What we promise

M12 v0.1 is opt-in, default OFF. We do not flip the default to anything else until v0.2 ships with a real classifier and the public framing changes in lockstep. The mode enum and block header taxonomy are a stable contract. Changes ship with a CHANGELOG entry and a deprecation window.

Worker source: workers/proxy/src/guardrails/ (types · runner · three validators). Bridge source: presidio-bridge/ (Dockerfile · FastAPI app · HMAC verify · pytest suite). Migration: supabase/migrations/0090_guardrails_mode.sql. Parent index: How it works.