Audit immutability for AI cost claims — what snapshot-pinning actually buys you
A vendor pings you in Slack: “Your AI bill dropped 38% last month — here's the dashboard.” Your finance team asks how the savings were measured. Your security team asks what data flowed to whom. Your CTO asks what would happen if you turned the vendor off. The dashboard is pretty. None of those three people are satisfied.
This is a routine problem for cost-optimization vendors, and most handle it badly: the savings dashboard renders a percentage, the question of how goes unanswered, and the customer either trusts the number on faith or doesn't. Eventually a procurement review surfaces the gap and the engagement gets stuck in a six-month contract debate.
We took a different approach. Every cost number Tessera shows you is a function of two inputs (your tokens, the catalog price) pinned to a snapshot id at the moment the request fired. The catalog is multi-source verified. The snapshot is immutable. Two engineers, three hours, can re-derive any month's savings from raw inputs. This post is the architecture.
The four things a savings number depends on
Trivially, every savings claim of the form “you would have paid $X without us, paid $Y with us, saved $X − $Y” is a function of four inputs:
- The token counts your request consumed (input + output, per role)
- The model the request would have run on without the vendor (baseline)
- The model the request actually ran on (post-routing)
- The per-token price for each of the above models at the moment of the request
Each input has a different failure mode for trust:
- Token counts. The provider returns them in the response. Hard to fake without colluding with the provider (and easy to spot-check by counting locally).
- Baseline model. What you would have used. The vendor has every incentive to inflate the baseline (a more expensive baseline = more savings claimed). This is the primary trust gap.
- Actual model. What the request ran on. The vendor knows because they routed it.
- Prices. Lookup. Providers change them occasionally. The vendor might quote outdated or inflated list rates. This is the secondary trust gap.
Most cost-savings products do not address either trust gap in their architecture. They either show a single number (no provenance) or show line items priced at “current” rates (which can drift weeks after a provider price change). Both leave the customer with no way to audit.
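To make the four-input dependency concrete, here is a minimal sketch in Python. The model names and per-token prices are made-up placeholders, not real catalog values, and the function names are hypothetical. It assumes the same token counts are applied to both the baseline and the actual model, which is how the counterfactual is described in this post.

```python
from decimal import Decimal

# Hypothetical per-token USD prices keyed by (model, role). Illustrative values only.
PRICES = {
    ("big-model", "input"): Decimal("0.000005"),
    ("big-model", "output"): Decimal("0.000015"),
    ("small-model", "input"): Decimal("0.0000005"),
    ("small-model", "output"): Decimal("0.0000015"),
}


def cost(model, tokens_in, tokens_out, prices):
    """Cost of one request: tokens x per-token price, summed over both roles."""
    return tokens_in * prices[(model, "input")] + tokens_out * prices[(model, "output")]


def savings(baseline_model, actual_model, tokens_in, tokens_out, prices):
    """Counterfactual cost on the baseline model minus the cost actually incurred."""
    original = cost(baseline_model, tokens_in, tokens_out, prices)
    actual = cost(actual_model, tokens_in, tokens_out, prices)
    return original - actual


# Inputs: token counts, baseline model, actual model, and the price table.
result = savings("big-model", "small-model", 1000, 500, PRICES)
print(result)
```

Change any one of the four inputs (for example, swap in a pricier baseline) and the claimed savings move with it, which is why each input needs its own provenance.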
What snapshot-pinning means
Tessera ships a Postgres table called pricing_catalog that maps (provider, model, role, time_window) to a per-token price plus a confidence score plus a snapshot id. Every request the proxy handles emits a savings row that references the exact pricing_snapshot_id used to price it at request time.
The pricing_catalog is append-only by design. We never UPDATE a row. When a provider changes a price, the catalog gets a NEW row with a new snapshot id, the new effective_from timestamp, and the old row stays untouched. Yesterday's requests stay priced at yesterday's snapshot. Today's requests get today's.
The customer-visible consequence: if OpenAI cuts the price of gpt-4o tomorrow, your historical savings number does not silently shift. The post-hoc “we re-priced last month at the new rate” failure mode is impossible because we never re-price last month. Each request carries its own price provenance into eternity.
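The append-only behaviour can be illustrated with a small in-memory sketch. This is not the Postgres schema itself: the class and method names are hypothetical, prices are placeholders, and giving each row its own integer snapshot id is a simplifying assumption.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from decimal import Decimal
from typing import List, Optional


@dataclass(frozen=True)  # frozen: a row cannot be mutated once created
class CatalogRow:
    snapshot_id: int
    provider: str
    model: str
    role: str
    price_per_token: Decimal
    effective_from: datetime


class PricingCatalog:
    """Append-only store: rows are added, never updated or deleted."""

    def __init__(self) -> None:
        self._rows: List[CatalogRow] = []

    def append(self, provider, model, role, price, effective_from) -> CatalogRow:
        row = CatalogRow(len(self._rows) + 1, provider, model, role, price, effective_from)
        self._rows.append(row)
        return row

    def snapshot_at(self, provider, model, role, when) -> Optional[CatalogRow]:
        """Latest row for this key whose effective_from is at or before `when`."""
        candidates = [
            r for r in self._rows
            if (r.provider, r.model, r.role) == (provider, model, role)
            and r.effective_from <= when
        ]
        return max(candidates, key=lambda r: r.effective_from, default=None)

    def by_id(self, snapshot_id: int) -> CatalogRow:
        return self._rows[snapshot_id - 1]


catalog = PricingCatalog()
jan = datetime(2026, 1, 1, tzinfo=timezone.utc)
mar = datetime(2026, 3, 1, tzinfo=timezone.utc)
old = catalog.append("openai", "gpt-4o", "input", Decimal("0.000005"), jan)
new = catalog.append("openai", "gpt-4o", "input", Decimal("0.000004"), mar)

# A February request pins the January snapshot and keeps it after the March change.
feb = datetime(2026, 2, 10, tzinfo=timezone.utc)
pinned = catalog.snapshot_at("openai", "gpt-4o", "input", feb)
```

Because the request stores `pinned.snapshot_id` rather than looking up the current price later, a subsequent `append` cannot change what that request cost.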
Multi-source verification
A snapshot is only as good as the input it was built from. So the pricing_catalog ingests prices from three independent sources, scores them per-row by agreement, and surfaces the confidence score on every snapshot:
- LiteLLM's model_prices_and_context_window.json: community-curated, refreshed multiple times per week, broad coverage across 100+ providers.
- The tokencost library: a Python package, also community-curated, refreshed on its own cadence, with a different curation source from LiteLLM (so each can independently catch a stale price the other missed).
- The OpenRouter API: live-quoted prices for providers OpenRouter routes to. Different update mechanism (pulled, not curated).
The verification rule: if at least two of the three sources agree on a price within 0.5%, the row gets a confidence score ≥ 0.95 and is eligible for use in production billing. If a single source disagrees with the other two, we go with the consensus and log the outlier for investigation. If no two sources agree within that tolerance, we hold the previous snapshot until manual review.
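The rule above can be sketched as a small function. This is an illustration, not the contents of lib/pricing/consensus.ts: the function names are hypothetical, measuring the 0.5% relative to the larger quote is an assumption, and so is taking the median of the agreeing quotes as the consensus price.

```python
from itertools import combinations
from statistics import median

TOLERANCE = 0.005  # 0.5%, as stated in the verification rule


def agree(a, b, tol=TOLERANCE):
    """True when two quotes differ by at most `tol`, relative to the larger quote."""
    return abs(a - b) <= tol * max(a, b)


def consensus(quotes):
    """Return (price, outliers). price is None when no two sources agree."""
    sources = sorted(quotes)
    # Try the largest group first: all three sources, then each pair.
    for size in range(len(sources), 1, -1):
        for group in combinations(sources, size):
            if all(agree(quotes[a], quotes[b]) for a, b in combinations(group, 2)):
                price = median(quotes[s] for s in group)
                outliers = [s for s in sources if s not in group]
                return price, outliers
    # No agreeing pair: caller should hold the previous snapshot for manual review.
    return None, sources


# Quotes in USD per million tokens (placeholder numbers).
price, outliers = consensus({"litellm": 5.0, "tokencost": 5.0, "openrouter": 6.0})
```

In the example, two sources match and the third is returned as the outlier to log. When the function returns `None`, nothing new is written and the previous snapshot stays in force.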
A daily cron runs the verification across the entire catalog and writes the updated confidence scores. Recency-decay applies: a snapshot built today carries the day's confidence; one built three weeks ago drifts toward 0 unless re-verified. So a row pricing a request in production can never be both stale AND high-confidence at the same time. If you see a savings figure with a snapshot that was verified within 24 hours, all three sources agreed, and the deltas across them are below 0.5% — the provenance chain is intact.
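The post does not specify the decay curve, so the following is only one possible shape: a linear decay that reaches zero after a hypothetical 21-day window. The window length, the function name, and the choice of a linear curve are all assumptions made for illustration.

```python
from datetime import datetime, timedelta, timezone

DECAY_WINDOW = timedelta(days=21)  # hypothetical: confidence reaches 0 after three weeks


def decayed_confidence(base, verified_at, now, window=DECAY_WINDOW):
    """Scale the confidence assigned at verification time by how fresh it still is."""
    age = max(now - verified_at, timedelta(0))  # guard against clock skew
    remaining = max(1.0 - age / window, 0.0)    # 1.0 when fresh, 0.0 at or past the window
    return base * remaining


verified = datetime(2026, 1, 1, tzinfo=timezone.utc)
fresh = decayed_confidence(0.97, verified, verified)
week_old = decayed_confidence(0.97, verified, verified + timedelta(days=7))
expired = decayed_confidence(0.97, verified, verified + timedelta(days=30))
```

Whatever the real curve is, the property that matters is the one stated above: confidence only goes down with age unless the row is re-verified, so a stale row cannot keep a high score.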
The architecture lives in three small files in the dashboard: lib/pricing/consensus.ts (cross-source agreement check), lib/pricing/tokencost.ts (one of the three source adapters), api/cron/verify-pricing/route.ts (the daily cron). Each ships with its own unit tests because the verification correctness is the whole point.
What this lets you audit
Every row your dashboard shows for the month — every individual savings line item — carries seven columns we never let the dashboard hide:
- request_id: the proxy's unique id for the request
- model_baseline: what would have run absent Tessera (your configured default)
- model_actual: what actually ran (after auto-route / batch / etc.)
- tokens_in + tokens_out: counts as reported by the upstream provider
- original_cost_usd: counterfactual, tokens × baseline-model price at snapshot
- actual_cost_usd: tokens × actual-model price at snapshot
- pricing_snapshot_id: pointer back to the catalog row that priced both
Gross savings = original − actual. The Tessera fee is 20% of that gross figure, so it is itself derivable from the same rows, and your net savings = original − actual − fee. Every number on the dashboard is the sum of values from rows like these.
You can export the full ledger as CSV from /portal/audit any time. Two engineers, three hours, can write a script that: (1) reads the export, (2) joins each row to the corresponding pricing_snapshot_id, (3) recomputes the savings delta from scratch, and (4) emits a PASS/FAIL per row plus an aggregate reconciliation. The reconciliation succeeds because our dashboard math is the arithmetic of these rows. There is nothing else.
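A reconciliation script along those lines might look like the sketch below. The CSV header names follow the seven columns listed in this post, but the exact export format, the shape of the snapshot lookup table, the rounding tolerance, and all prices shown are assumptions for illustration.

```python
import csv
import io
from decimal import Decimal

# Hypothetical lookup: snapshot id -> {(model, role): per-token USD price}.
SNAPSHOTS = {
    "snap-001": {
        ("big-model", "input"): Decimal("0.000005"),
        ("big-model", "output"): Decimal("0.000015"),
        ("small-model", "input"): Decimal("0.0000005"),
        ("small-model", "output"): Decimal("0.0000015"),
    },
}


def price_request(model, tokens_in, tokens_out, prices):
    return tokens_in * prices[(model, "input")] + tokens_out * prices[(model, "output")]


def reconcile(csv_text, snapshots, tolerance=Decimal("0.000001")):
    """Recompute both costs per row; return per-row verdicts and total recomputed savings."""
    verdicts, total = [], Decimal("0")
    for row in csv.DictReader(io.StringIO(csv_text)):
        prices = snapshots[row["pricing_snapshot_id"]]  # step 2: join to the snapshot
        t_in, t_out = int(row["tokens_in"]), int(row["tokens_out"])
        original = price_request(row["model_baseline"], t_in, t_out, prices)
        actual = price_request(row["model_actual"], t_in, t_out, prices)
        ok = (abs(original - Decimal(row["original_cost_usd"])) <= tolerance
              and abs(actual - Decimal(row["actual_cost_usd"])) <= tolerance)
        verdicts.append((row["request_id"], "PASS" if ok else "FAIL"))
        total += original - actual  # step 3: savings recomputed from raw inputs
    return verdicts, total


EXPORT = """request_id,model_baseline,model_actual,tokens_in,tokens_out,original_cost_usd,actual_cost_usd,pricing_snapshot_id
req-1,big-model,small-model,1000,500,0.0125,0.00125,snap-001
req-2,big-model,big-model,200,100,0.0025,0.0030,snap-001
"""

verdicts, total = reconcile(EXPORT, SNAPSHOTS)
```

The second sample row is deliberately wrong (its reported actual cost does not match tokens × price), so it comes back as FAIL. Comparing `total` against the dashboard's monthly figure is the aggregate reconciliation.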
Why this matters beyond accounting
The audit story matters for accounting — that is the obvious payoff. But it also matters for three less-obvious reasons.
Vendor incentive alignment. We charge 20% of measured savings. Bigger reported savings = bigger fee, which is a built-in pull toward inflating the number. Snapshot-pinned provenance + multi-source verification is the structural brake against that pull. Every $1 of fee on your invoice has 7 columns of evidence backing it, immutable.
Mid-contract pricing changes. If OpenAI cuts rates tomorrow, your historical savings stay locked. If we changed our fee structure tomorrow, your historical fees would stay locked. The customer's relationship to the vendor stops depending on what the vendor decides next.
Downstream substrate primitives. The same architectural pattern, snapshot-pinned provenance, supports features we will ship next: SLA enforcement (which requests fell below the canary floor, locked at the moment), governance gates (which traffic was routed differently for compliance reasons, locked at the moment), and regulated-data classification (which requests were tagged as regulated and routed accordingly). Audit-immutability is not just a cost-claim trick. It is the foundation for everything else that needs to survive a downstream lawyer reading it back to you in six months.
Try it on your workload
Free Dev tier: 60M tokens/month, no card. Sign up at tesseraai.io/dev — 30-second flow returns an instant API key and dashboard access.
Run your workload for 7 days. Open /portal/audit. Click any savings row. The provenance trail behind it — request id, baseline model, actual model, tokens, original cost, actual cost, snapshot id — is what you should expect from a cost-optimization vendor in 2026.
Production tier (over 60M tokens/month): 20% of measured savings. Zero savings, zero fee. The audit surface is identical to Free Dev — same seven columns per row, same multi-source verification, same snapshot-pinning. The trust architecture is load-bearing at every tier or it isn't architecture.
References
- Architecture: the nine mechanics + quality SLA + reliability primitives
- Security & data handling: what we do and do not store
- Worked example: a $24k/month OpenAI bill cut 38% — stage-by-stage breakdown with snapshot-pinned audit trail
- The prompt bloat you don't see — chars-before/after telemetry, related provenance pattern
- SDK on GitHub: github.com/tessera-llm/tessera-sdk