What is the Tessera Optimize Layer?

The Tessera Optimize Layer is a thin proxy that lives in your LLM request path. Your application points its OpenAI, Anthropic, or Google client at Tessera; Tessera forwards each request to the provider but first applies four moves: auto-route to a cheaper model when quality holds on your golden-set eval, auto-cache identical requests at the edge, auto-compress prompts via a deterministic whitespace + structural pass (LLMLingua-2 template substitution on roadmap), auto-batch where batch APIs apply. Every saved dollar is measured directly from proxy logs, not inferred from a billing CSV.
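To make two of the four moves concrete, here is a minimal sketch of a deterministic whitespace pass and a cache key for identical requests. It is an illustration under assumptions, not Tessera's implementation: the function names, the exact normalization rules, and the choice of SHA-256 over canonical JSON are all hypothetical.

```python
import hashlib
import json
import re


def compress_whitespace(prompt: str) -> str:
    """Deterministic pass (assumed rules): collapse runs of spaces and tabs,
    trim each line, and squeeze three or more newlines down to two."""
    lines = [re.sub(r"[ \t]+", " ", line).strip() for line in prompt.splitlines()]
    text = "\n".join(lines)
    return re.sub(r"\n{3,}", "\n\n", text).strip()


def cache_key(provider: str, model: str, messages: list[dict]) -> str:
    """Identical requests map to the same key; any change in provider,
    model, or message content yields a different key."""
    canonical = json.dumps(
        {"provider": provider, "model": model, "messages": messages},
        sort_keys=True,            # dict key order must not affect the key
        separators=(",", ":"),
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()
```

A pass like this would damage whitespace-sensitive content such as indented code, so a real compressor would need to skip those spans.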

How does Tessera measure savings?

Savings are measured from Tessera proxy logs at request granularity. For each request that gets routed, cached, compressed, or batched, Tessera records the counterfactual provider cost (what the request would have cost without the optimization) and the actual incurred cost. Aggregate Ongoing Savings equal the sum of (counterfactual minus actual) across all in-scope workloads in the period. Provider price moves and unrelated workload shrinkage are excluded; only optimizations attributable to Tessera count toward your reported savings.
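The aggregation described above can be sketched in a few lines. The record shape and names below are assumptions for illustration; only the arithmetic (sum of counterfactual minus actual over in-scope workloads) comes from the description.

```python
from dataclasses import dataclass
from decimal import Decimal


@dataclass(frozen=True)
class LoggedRequest:
    workload: str
    mechanic: str                 # "route", "cache", "compress", or "batch"
    counterfactual_usd: Decimal   # cost without the optimization
    actual_usd: Decimal           # cost actually incurred


def ongoing_savings(log, in_scope_workloads):
    """Sum (counterfactual - actual) over requests in in-scope workloads."""
    return sum(
        (r.counterfactual_usd - r.actual_usd
         for r in log if r.workload in in_scope_workloads),
        Decimal("0"),
    )


# Illustrative figures only.
log = [
    LoggedRequest("chat", "route", Decimal("0.0100"), Decimal("0.0020")),
    LoggedRequest("chat", "cache", Decimal("0.0050"), Decimal("0")),
    LoggedRequest("experiments", "route", Decimal("0.0300"), Decimal("0.0100")),
]
total = ongoing_savings(log, in_scope_workloads={"chat"})
```

Because each request carries its own counterfactual and actual cost, a provider price move or a shrinking workload changes neither term of an already-logged row.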

How is Tessera different from an observability platform?

Observability platforms trace requests and show dashboards. Tessera is an optimize layer — we sit in the request path and actively route, cache, compress, and batch on every call. Pricing is a flat monthly subscription by token volume, not a per-seat SaaS — and savings are measured directly from our own proxy logs. Your existing observability tool can stay — it still receives downstream telemetry. Substrate-include posture: Tessera is the layer above whatever vendors you already use, not their alternative.

What does Tessera cost?

Flat monthly pricing by gross tokens submitted, same feature set on every tier — only the token cap differs. Free Sandbox: 60 million tokens per month, $0, full mechanic stack active. Paid tiers: Starter $199/month (up to 1B tokens), Growth $999/month (up to 5B), Scale $3,999/month (up to 20B), Enterprise custom (20B+). No per-token markup and no cut of your savings — savings are measured and you keep 100%. No floor, no retainer, no contract review for activation. Quality preservation guaranteed at 0.95 by canary; three-day breach triggers auto-disable plus a service credit.
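As a quick way to see where a given volume lands, the tier table above can be written as a lookup. This is a sketch: it assumes the caps are inclusive, which the pricing text implies ("up to") but does not spell out at the exact boundary.

```python
# (tier name, monthly gross-token cap, USD per month), taken from the tiers listed above.
TIERS = [
    ("Free Sandbox", 60_000_000, 0),
    ("Starter", 1_000_000_000, 199),
    ("Growth", 5_000_000_000, 999),
    ("Scale", 20_000_000_000, 3999),
]


def tier_for(monthly_gross_tokens: int):
    """Return (tier name, monthly price); price is None for custom Enterprise."""
    if monthly_gross_tokens < 0:
        raise ValueError("token volume cannot be negative")
    for name, cap, price in TIERS:
        if monthly_gross_tokens <= cap:   # assumption: caps are inclusive
            return name, price
    return "Enterprise", None
```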

Does Tessera modify our production code?

No. Tessera is added to your request path via two HTTP headers and one config-line change — you point your OpenAI/Anthropic/Google client at the Tessera proxy base URL and add an API key. Tessera never modifies your application source, never deploys into your codebase, and never makes provider-side changes on your account. All optimization logic runs inside the Tessera proxy infrastructure. You can disable Tessera at any time by reverting the two headers, or by using the in-dashboard pause control (see next question).
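The change amounts to swapping one base URL and adding two headers, which the sketch below models as plain configuration. The proxy URL and both header names are hypothetical placeholders; the real values come from your Tessera dashboard.

```python
from dataclasses import dataclass, field, replace

# Hypothetical placeholder, not a real endpoint.
TESSERA_BASE_URL = "https://proxy.tessera.example/v1"


@dataclass(frozen=True)
class ClientConfig:
    base_url: str
    headers: dict = field(default_factory=dict)


def via_tessera(direct: ClientConfig, tessera_api_key: str, workload: str) -> ClientConfig:
    """Return a copy of the client config that points at the proxy.
    Header names are assumptions made for illustration."""
    extra = {
        "X-Tessera-Key": tessera_api_key,   # hypothetical header
        "X-Tessera-Workload": workload,     # hypothetical header
    }
    return replace(direct, base_url=TESSERA_BASE_URL,
                   headers={**direct.headers, **extra})


direct = ClientConfig("https://api.openai.com/v1",
                      {"Authorization": "Bearer sk-test"})
proxied = via_tessera(direct, tessera_api_key="tk-test", workload="chat")
```

Reverting is the inverse: keep using `direct`, and nothing in the application source has changed.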

Can I pause Tessera at any time?

Yes. Every operator dashboard ships with an always-available kill-switch — account-wide and per-workload. When engaged, the Tessera proxy bypasses all four optimizations (route, cache, compress, batch) and forwards your requests to the upstream provider as pure passthrough. Pause is reversible at any time without notice. The pause right is contractually preserved in §8 of the Tessera Terms of Service — Tessera does not work uncontrolled in your stack.

Who is Tessera for?

Two customer profiles account for most paid-tier signups: (1) the CTO of a Series A-B AI-native SaaS company spending $20k-$200k per month on LLM APIs, with gross margin under pressure, who can change a base URL in 30 minutes and sign up within 72 hours; (2) a Series B-D scale-up adding AI features and spending $50k-$500k per month, where an AI Platform Lead owns the budget and a light security review takes 2-4 weeks. There is also a developer-facing Free Sandbox tier (60M tokens/mo, $0) for solo developers and side projects. Workloads tagged regulated (HIPAA, PCI-DSS, SOC 2 in-scope) never auto-route; the compliance gate blocks routing at the code level. Above 20B gross tokens/mo, Enterprise Clients may negotiate a custom MSA addendum (invoice billing, custom SLO).
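The pause control and the compliance gate both reduce to a decision about which of the four mechanics may run for a request. A minimal sketch, with hypothetical names and tag spellings; it assumes regulated workloads lose only routing, since the text says nothing about the other three mechanics.

```python
ALL_MECHANICS = ("route", "cache", "compress", "batch")
REGULATED_TAGS = {"hipaa", "pci-dss", "soc2"}   # assumed tag spellings


def active_mechanics(workload, tags, account_paused, paused_workloads):
    """Return the mechanics allowed for one request."""
    # Kill-switch, account-wide or per-workload: pure passthrough.
    if account_paused or workload in paused_workloads:
        return ()
    # Compliance gate: regulated workloads never auto-route.
    if REGULATED_TAGS & {t.lower() for t in tags}:
        return tuple(m for m in ALL_MECHANICS if m != "route")
    return ALL_MECHANICS
```

Checking the pause state first means the kill-switch wins over every other rule, which matches the passthrough behavior described for the pause control.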

How does Tessera handle data privacy?

Confidential Information furnished to Tessera is stored primarily inside the European Economic Area (EEA) under Estonian operating jurisdiction. We do not require, request, or process Client end-user personal data, model prompts, or model completions in their full content form — only token-count and structural metadata. US-based AI provider sub-processors (Anthropic, OpenAI, Google) are engaged solely for Tessera-internal analysis under strict anonymisation conditions per the Data Processing Agreement.

Does Tessera accept referral fees from AI providers?

No. Tessera receives no affiliate revenue, referral fees, kickbacks, sponsorships, advisory-board honoraria, or any other compensation from any AI provider, gateway vendor, or observability platform we recommend in the course of operating the proxy. Client fees are our only income. This is contractually binding under §10 of the Tessera Terms of Service (Vendor Neutrality) and breach permits Clients to terminate without penalty and withdraw their balance.


2026-05-19 · 9 min read

Audit immutability for AI cost claims — what snapshot-pinning actually buys you

A vendor pings you in Slack: “Your AI bill dropped 38% last month — here's the dashboard.” Your finance team asks how the savings were measured. Your security team asks what data flowed to whom. Your CTO asks what would happen if you turned the vendor off. The dashboard is pretty. None of those three people are satisfied.

This is a routine problem for cost-optimization vendors. Most handle it badly: the savings dashboard renders a percentage, the question of how goes unanswered, and the customer either takes it on faith or doesn't. Eventually a procurement review surfaces the gap and the engagement gets stuck in a six-month contract debate.

We took a different approach. Every cost number Tessera shows you is a function of two inputs (your tokens, the catalog price) pinned to a snapshot id at the moment the request fired. The catalog is multi-source verified. The snapshot is immutable. Two engineers, three hours, can re-derive any month's savings from raw inputs. This post is the architecture.

The four things a savings number depends on

Trivially, every savings claim of the form “you would have paid $X without us, paid $Y with us, saved $X − $Y” is a function of four inputs:

The token counts your request consumed (input + output, per role)
The model the request would have run on without the vendor (baseline)
The model the request actually ran on (post-routing)
The per-token price for each of the above models at the moment of the request

Each input has a different failure mode for trust:

Token counts. The provider returns them in the response. Hard to fake without colluding with the provider (and easy to spot-check by counting locally).
Baseline model. What you would have used. The vendor has every incentive to inflate the baseline (a more expensive baseline = more savings claimed). This is the primary trust gap.
Actual model. What the request ran on. The vendor knows because they routed it.
Prices. Lookup. Providers change them occasionally. The vendor might quote outdated or inflated list rates. This is the secondary trust gap.

Most cost-savings products do not address either trust gap in their architecture. They either show a single number (no provenance) or show line items priced at “current” rates (which can drift weeks after a provider price change). Both leave the customer with no way to audit.

What snapshot-pinning means

Tessera ships a Postgres table called pricing_catalog that maps (provider, model, role, time_window) to a per-token price plus a confidence score plus a snapshot id. Every request the proxy handles emits a savings row that references the exact pricing_snapshot_id used to price it at request time.

The pricing_catalog is append-only by design. We never UPDATE a row. When a provider changes a price, the catalog gets a NEW row with a new snapshot id, the new effective_from timestamp, and the old row stays untouched. Yesterday's requests stay priced at yesterday's snapshot. Today's requests get today's.

The customer-visible consequence: if OpenAI cuts the price of gpt-4o tomorrow, your historical savings number does not silently shift. The post-hoc “we re-priced last month at the new rate” failure mode is impossible because we never re-price last month. Each request carries its own price provenance into eternity.
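The append-only behavior is easy to model in memory. The sketch below is not the Postgres schema: field names other than the snapshot id and effective_from are assumptions, the confidence score is left out for brevity, and the prices are made up.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from decimal import Decimal


@dataclass(frozen=True)          # rows are immutable once written
class CatalogRow:
    snapshot_id: int
    provider: str
    model: str
    role: str                    # "input" or "output"
    usd_per_token: Decimal
    effective_from: datetime


class PricingCatalog:
    """Append-only: there is deliberately no update or delete method."""

    def __init__(self):
        self._rows = []

    def append(self, provider, model, role, usd_per_token, effective_from):
        row = CatalogRow(len(self._rows) + 1, provider, model, role,
                         usd_per_token, effective_from)
        self._rows.append(row)
        return row

    def in_force(self, provider, model, role, at):
        """Return the row whose price applied at time `at`."""
        candidates = [
            r for r in self._rows
            if (r.provider, r.model, r.role) == (provider, model, role)
            and r.effective_from <= at
        ]
        if not candidates:
            raise LookupError("no snapshot in force at that time")
        return max(candidates, key=lambda r: r.effective_from)


catalog = PricingCatalog()
jan = datetime(2026, 1, 1, tzinfo=timezone.utc)
feb = datetime(2026, 2, 1, tzinfo=timezone.utc)
catalog.append("example", "example-model", "input", Decimal("0.000005"), jan)
catalog.append("example", "example-model", "input", Decimal("0.000003"), feb)

old = catalog.in_force("example", "example-model", "input",
                       datetime(2026, 1, 15, tzinfo=timezone.utc))
new = catalog.in_force("example", "example-model", "input",
                       datetime(2026, 2, 2, tzinfo=timezone.utc))
```

A request logged in January stores `old.snapshot_id`, so the February price cut cannot reach it.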

Multi-source verification

A snapshot is only as good as the input it was built from. So the pricing_catalog ingests prices from three independent sources, scores them per-row by agreement, and surfaces the confidence score on every snapshot:

LiteLLM's model_prices_and_context_window.json — community-curated, refreshed multiple times per week, broad coverage across 100+ providers.
The tokencost library — Python package, also community-curated, refreshed on its own cadence, different curation source from LiteLLM (and so independently catches a stale price one of them missed).
The OpenRouter API — live-quoted prices for providers OpenRouter routes to. Different update mechanism (pulled, not curated).

The verification rule: when all three sources agree on a price within 1%, the row gets a confidence score of 0.95 and is eligible for use in production billing. When only two of the three agree, the row holds at 0.80 — below the billing threshold — so we fall back to the last high-confidence snapshot and flag the diverging source for investigation. We never bill against a row under 0.95 confidence.
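The rule can be sketched as a pairwise comparison. This is not the contents of lib/pricing/consensus.ts; it assumes "within 1%" means a relative difference measured against the smaller price, and it assumes a score of 0.0 when no two sources agree, a case the rule above does not specify.

```python
from itertools import combinations

BILLING_THRESHOLD = 0.95
TOLERANCE = 0.01   # "within 1%"


def agree(a: float, b: float, tol: float = TOLERANCE) -> bool:
    """Assumed definition: relative difference against the smaller price."""
    return abs(a - b) <= tol * min(a, b)


def confidence(prices: dict) -> float:
    """prices maps source name -> quoted price; exactly three sources expected."""
    if len(prices) != 3:
        raise ValueError("expected quotes from exactly three sources")
    pairs = list(combinations(prices.values(), 2))
    if all(agree(a, b) for a, b in pairs):
        return 0.95          # all three agree
    if any(agree(a, b) for a, b in pairs):
        return 0.80          # only two agree
    return 0.0               # assumption: no agreement at all


def billable(score: float) -> bool:
    return score >= BILLING_THRESHOLD


# Illustrative quotes in USD per million tokens.
full = confidence({"litellm": 2.50, "tokencost": 2.50, "openrouter": 2.51})
partial = confidence({"litellm": 2.50, "tokencost": 2.50, "openrouter": 3.00})
```

In the partial case the row is not billable, so pricing falls back to the last high-confidence snapshot as described above.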

A daily cron runs the verification across the entire catalog and writes the updated confidence scores. Recency-decay applies: a snapshot built today carries the day's confidence; one built three weeks ago drifts toward 0 unless re-verified. So a row pricing a request in production can never be both stale AND high-confidence at the same time. If you see a savings figure with a snapshot that was verified within 24 hours, all three sources agreed, and the deltas across them are within 1% — the provenance chain is intact.
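The post does not give the decay curve, so the following is only one plausible shape: full confidence while the verification is under 24 hours old, then a linear fall to zero at 21 days. Both constants and the linear form are assumptions chosen to match the "within 24 hours" and "three weeks" figures above.

```python
from datetime import datetime, timedelta, timezone

FRESH_WINDOW = timedelta(hours=24)   # assumption: no decay inside this window
ZERO_AFTER = timedelta(days=21)      # assumption: confidence reaches 0 here


def decayed_confidence(base: float, verified_at: datetime, now: datetime) -> float:
    """Linear recency decay (assumed shape) applied to a verification score."""
    age = now - verified_at
    if age <= FRESH_WINDOW:
        return base
    if age >= ZERO_AFTER:
        return 0.0
    remaining = (ZERO_AFTER - age) / (ZERO_AFTER - FRESH_WINDOW)
    return base * remaining


now = datetime(2026, 5, 19, tzinfo=timezone.utc)
fresh = decayed_confidence(0.95, now - timedelta(hours=12), now)
stale = decayed_confidence(0.95, now - timedelta(days=11), now)
expired = decayed_confidence(0.95, now - timedelta(days=21), now)
```

Under any curve of this kind, a row that misses its daily re-verification drops below the 0.95 billing threshold, which is what keeps stale rows out of production pricing.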

The architecture lives in three small files in the dashboard: lib/pricing/consensus.ts (cross-source agreement check), lib/pricing/tokencost.ts (one of the three source adapters), api/cron/verify-pricing/route.ts (the daily cron). Each ships with its own unit tests because the verification correctness is the whole point.

What this lets you audit

Every row your dashboard shows for the month — every individual savings line item — carries seven columns we never let the dashboard hide:

request_id — the proxy's unique id for the request
model_baseline — what would have run absent Tessera (your configured default)
model_actual — what actually ran (after auto-route / batch / etc.)
tokens_in + tokens_out — counts as reported by the upstream provider
original_cost_usd — counterfactual: tokens × baseline-model price at snapshot
actual_cost_usd — tokens × actual-model price at snapshot
pricing_snapshot_id — pointer back to the catalog row that priced both

The savings delta = original − actual. The customer keeps 100% of it — Tessera bills a flat monthly subscription by token volume, not a cut of the savings. Every number on the dashboard is the sum of values from rows like these.

You can export the full ledger as CSV from /portal/audit any time. Two engineers, three hours, can write a script that: (1) reads the export, (2) joins each row to the corresponding pricing_snapshot_id, (3) recomputes the savings delta from scratch, and (4) emits a PASS/FAIL per row plus an aggregate reconciliation. The reconciliation succeeds because our dashboard math is the arithmetic of these rows. There is nothing else.
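A sketch of such a reconciliation script follows. It assumes the export uses the column names listed above with tokens_in and tokens_out as separate columns, and that each snapshot id resolves to per-token prices keyed by (model, role); that lookup shape is hypothetical, as are the prices and model names in the sample data.

```python
import csv
import io
from decimal import Decimal


def recompute(row, snapshots):
    """Re-derive both costs for one ledger row from tokens and pinned prices."""
    prices = snapshots[row["pricing_snapshot_id"]]
    t_in, t_out = int(row["tokens_in"]), int(row["tokens_out"])

    def cost(model):
        return t_in * prices[(model, "input")] + t_out * prices[(model, "output")]

    return cost(row["model_baseline"]), cost(row["model_actual"])


def reconcile(csv_text, snapshots, tolerance=Decimal("0.000001")):
    """Return per-row PASS/FAIL results and the recomputed aggregate savings."""
    results, total = [], Decimal("0")
    for row in csv.DictReader(io.StringIO(csv_text)):
        original, actual = recompute(row, snapshots)
        ok = (abs(original - Decimal(row["original_cost_usd"])) <= tolerance
              and abs(actual - Decimal(row["actual_cost_usd"])) <= tolerance)
        results.append((row["request_id"], "PASS" if ok else "FAIL"))
        total += original - actual
    return results, total


# Made-up snapshot and ledger; the second row's original cost is inflated.
snapshots = {"snap-1": {
    ("big", "input"): Decimal("0.000005"), ("big", "output"): Decimal("0.000015"),
    ("small", "input"): Decimal("0.0000005"), ("small", "output"): Decimal("0.0000015"),
}}
export = (
    "request_id,model_baseline,model_actual,tokens_in,tokens_out,"
    "original_cost_usd,actual_cost_usd,pricing_snapshot_id\n"
    "r1,big,small,1000,500,0.0125,0.00125,snap-1\n"
    "r2,big,small,1000,500,0.0200,0.00125,snap-1\n"
)
results, total = reconcile(export, snapshots)
```

Using Decimal rather than float keeps the recomputed totals exact, so a FAIL reflects a real discrepancy rather than rounding noise.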

Why this matters beyond accounting

The audit story matters for accounting — that is the obvious payoff. But it also matters for three less-obvious reasons.

Vendor incentive alignment. We charge a flat monthly subscription, not a percentage of savings, so we have zero financial incentive to inflate the savings number. A bigger reported delta earns Tessera nothing extra. Snapshot-pinned provenance plus multi-source verification lets you check that the savings we report are real; we simply don't profit from a bigger number. Every savings row on your dashboard is backed by seven immutable columns of evidence.

Mid-contract pricing changes. If OpenAI cuts rates tomorrow, your historical savings stay locked: each row is priced at the snapshot in force when the request ran. The reconstructed delta never silently re-rates. The customer's record of what was saved stops depending on what any vendor decides next.

Downstream substrate primitives. The same architectural pattern, snapshot-pinned provenance, supports features we will ship next: SLA enforcement (which requests fell below the canary floor, locked at the moment), governance gates (which traffic was routed differently for compliance reasons, locked at the moment), and regulated-data classification (which requests were tagged as regulated and routed accordingly). Audit immutability is not just a cost-claim trick. It is the foundation for everything else that needs to survive a lawyer reading it back to you in six months.

Try it on your workload

Free Sandbox tier: 60M tokens/month, no card. Sign up at tesseraai.io/dev — 30-second flow returns an instant API key and dashboard access.

Run your workload for 7 days. Open /portal/audit. Click any savings row. The provenance trail behind it — request id, baseline model, actual model, tokens, original cost, actual cost, snapshot id — is what you should expect from a cost-optimization vendor in 2026.

Paid tiers are a flat monthly subscription by token volume — Starter $199 (≤1B) / Growth $999 (≤5B) / Scale $3,999 (≤20B) — and you keep 100% of the savings. The audit surface is identical to the Free Sandbox: same seven columns per row, same multi-source verification, same snapshot-pinning. Because we don't bill on the savings number, the audit isn't protecting our invoice — it's proving your ROI is real. The trust architecture is load-bearing at every tier or it isn't architecture.


Audit-grade cost transparency

60M tokens free, full audit trail per request, no card

Every savings line carries a request_id, baseline model, actual model, tokens, original cost, actual cost, and pricing_snapshot_id. CSV export any time. Kill-switch any time. Flat monthly pricing by token volume — keep 100% of savings.
