Outbound SDR loops — research, draft, personalise, score, reply-classify. Most of the prompt is repeated context. Auto-cache catches the system + persona blocks, auto-route sends low-stakes classification to a cheaper model, our heuristic compressor strips whitespace + structural noise from the research dump where the per-stack quality canary holds. Five-percent canary keeps the reply-quality canary green.
The Optimize Layer · For Sales AI products built on LLM stacks · Free Sandbox open
Your AI SDR runs
at half the token cost.
Flat monthly pricing — you keep the savings.
Aggregate measured savings · this month · across active Pilots
$47,820
If you ship a Sales AI product — Artisan, 11x, Cresta, Conversica, Apollo AI, Outreach AI, Regie, Lavender, Nooks — your LLM bill scales linearly with prospects, meetings, and enriched rows. Tessera is a thin proxy in your request path that routes cheaper-when-safe, caches identical calls, compresses what compresses, and batches what batches.
Every saved dollar is measured directly from our proxy logs — not inferred from a billing CSV after the fact. Pricing is a flat monthly subscription by token volume (Starter $199 / Growth $999 / Scale $3,999) with the same feature set on every tier; only the token cap differs. We measure and prove your savings; you keep one hundred percent of every saved dollar. Quality gate per workload — reply-quality eval ≥ 0.90 by canary, three-day breach triggers auto-rollback plus a one-month subscription credit.
60M tokens per month, no card up front, full mechanic stack — same feature set as every paid tier. Savings show in the dashboard from your first request through the proxy.
Use cases · Sales AI
Three unit-cost lines we cut on every Sales AI stack.
If your product looks like Artisan, 11x, Cresta, Conversica, Apollo AI, Outreach AI, Regie, Lavender, or Nooks — the request mix below is where the proxy earns its keep. Numbers are typical observed ranges across active Pilots running comparable workloads. Your real reading lands in your first Monthly Joint Reading.
Multi-touch sequences (4–7 emails, optional voice, optional LinkedIn). Each touch is its own LLM call; the cost amortised against booked meetings runs hot. Auto-batch on overnight nurture sends, auto-cache on persona-segment templates, auto-route on follow-up generation where booking lift is statistically flat.
CRM enrichment and persona research — title normalisation, account firmographics, technographics, fit scoring. Highly repetitive shape, very batch-friendly. We queue enrichment jobs for batch APIs (50% off at OpenAI and Anthropic), cache identical company lookups, route deterministic normalisation to a small model.
Coverage · twelve named providers
Tessera sits in front of these APIs —
Tessera is not affiliated with, endorsed by, or remunerated by any of the providers shown. Marks rendered to identify each API surface that the Tessera Optimize Layer can route to. Provider list expands as new SDK adapters land in the LiteLLM ingest path — full active coverage is enumerated in the llms.txt reference file.
How it works · in four steps
Proxy. Measure. Optimize. Invoice.
Ten-minute setup. One config line. Two headers on outbound LLM calls. The proxy replays your existing eval suite before it changes anything in production.
i · Proxy
Point your existing LLM SDK base URL at api.tesseraai.io and add two headers. Anthropic, OpenAI, Google, Bedrock — same shim. No SDK rewrite, no provider lock-in. Reversible in one line.
ii · Measure
The proxy logs every request — token counts, model, latency, paid cost from pricing_catalog snapshot. We anchor a seven-day baseline so every later dollar has a reference. You own the data, exportable any time.
iii · Optimize
Auto-route, auto-cache, auto-compress, auto-batch — each gated by your golden-set eval and a five-percent canary against the original model. Quality fails closed; nothing routes until your eval is uploaded.
iv · Invoice
Flat monthly subscription priced by gross token volume: Starter $199, Growth $999, Scale $3,999. You keep 100% of measured savings; the savings number is ROI proof, not our cut. Monthly Reading PDF auto-issued for accounting.
I · Mechanics
Four moves we make on every request.
Live metric column shows the rolling seven-day average across active Pilots. Illustrative shape — your real numbers are measured from your own proxy logs.
Auto-route to a cheaper model when quality holds
Your code asks for GPT-4o. Tessera checks whether GPT-4o-mini passes your golden-set eval. If yes, we route. If your golden set isn't uploaded yet, we don't route — quality gate fails closed. Five percent of routed requests are canary-sampled against the original model so we catch regressions before you do.
Auto-cache identical requests at the edge
If the same system prompt + user prompt + parameters has been asked before within your cache TTL, we return the cached response without calling the provider. Cache hits cost nothing upstream — you get sub-10ms latency and one-hundred-percent savings on that request.
Auto-compress prompts via deterministic whitespace + structural pass
When the input is long and our eligibility heuristic confirms compressible redundancy, we strip whitespace + structural noise from the request. Per-role opt-in; preserves code fences and JSON structure verbatim. v0.1 yields 5–15% on long prompts. LLMLingua-2 template substitution on roadmap.
Auto-batch where batch APIs apply
OpenAI and Anthropic both offer fifty-percent discount on batch-eligible workloads. You tag a workload as batch-eligible — Tessera queues for up to sixty seconds, fires as a batch, returns when ready. No code change on your side.
II · Evidence
Every saved dollar is computed from a trace your CFO can read.
At the close of each month, Tessera issues the Monthly Reading — a typeset register listing each in-scope workload, its ratified baseline cost, the actual paid cost in period, and the savings trace. You keep one hundred percent of measured savings; the flat subscription is your only line item. Below is an anonymised Acme reading, in full. The same artefact format applies to every customer.
Total savings
$45,180
against ratified baseline
Tessera fee · 25%
$11,295
Annual tier
Customer keeps · 75%
$33,885
net to Acme
§ 1 · Workload breakdown
§ 2 · Cumulative savings · 11 weeks
Operator dashboard
What you see, every day, in seven tabs.
Three of the seven dashboard surfaces below. The dashboard lives at ledger.tesseraai.io and is provisioned for each Pilot at onboarding. Click any tile for the live demo.
Today saved
$312.48
across Pilots
MTD saved
$8,421
this month
Projected
$26,840
monthly · 30d trail
Quality preservation
0.94
canary vs original
12-week savings curve
Route classification → gpt-4o-mini
shipped$11,200/mo · projected
Enable prompt cache · doc-sum prompt
ready$4,800/mo · projected
Batch nightly embeddings
open$1,450/mo · projected
Sorted by $ savings × reversibility / eng days
Audit log · last three events
pricing.snapshot
14:02:18
OpenAI gpt-4o · v37 · conf 0.92
recommendation.shipped
13:51:09
route_swap · clf-prod-01
reading.published
09:14:33
Acme · Apr 2026 · $45,180
Every $ figure traces to a pricing_catalog snapshot version
Other four tabs · Spend · Anomalies & Quality · Seats · Forecast & Settings
Calculator
Run the numbers on your stack.
Indicative only. Real savings are measured month over month from Tessera proxy logs and recorded in the Monthly Joint Reading. You keep 100% of every dollar we measure; pricing is a flat monthly subscription by token volume, never a cut of your savings.
$75,000
28%
Indicative range — pre-engagement composite shows 18-35% on mid-spend stacks
Pricing · flat monthly
Your plan is a flat monthly subscription by token volume (Starter $199, Growth $999, Scale $3,999), independent of how much we save you. Quality preservation is held at 0.90 by canary; a three-day breach auto-disables routing and posts a service credit. Compliance-tagged workloads never route.
III · Economics
The math is symmetric. We win when you win.
Prepaid balance billing — like Claude API. You top up your account, the proxy debits measured-savings fee in real time, you control top-up cadence. If balance reaches zero, optimizations auto-pause until you top up again. Pricing v3.6 · locked 2026-05-20.
I · Free Sandbox
60M tokens / month · $0 fee · no card
Full observability of what the proxy would have optimized, with mutating mechanics off. Bring your existing provider keys, route traffic through the proxy, see the savings dashboard populate from request one. Hard cap at 60M tokens per calendar month; quota resets monthly. Flip to Production when you want the full mechanic stack active.
II · Paid tiers
Flat monthly by token volume · from $199/mo
Flat monthly subscription by gross tokens submitted: Starter $199 (≤1B), Growth $999 (≤5B), Scale $3,999 (≤20B), Enterprise custom (20B+). The full mechanic stack runs on every paid tier. You keep 100% of measured savings, with no per-token markup and no cut of your savings. Free Sandbox stays free; upgrade only when your volume needs it.
Quality Service Level is the single safety gate — quality preservation ≥ 0.90 by canary against your golden set, three-day breach triggers auto-disable of routing plus a service credit on your next invoice. Compliance gate — workloads tagged regulated never get auto-routed (code-level gate). Always-on client pause control — your dashboard kill-switch overrides everything.
Who leads the practice
Tallinn · Estonia
Yevheny Panin · founder
Banker first, trader second. Three years running international payments operations at a European commercial bank — reading invoice-line-item data, distinguishing genuine optimisation from cosmetic re-pricing, writing a settlement contract that survives an audit. Five years on the FX trading floor pricing execution against asymmetric liquidity cost.
Tessera applies that structural discipline to LLM inference pricing. Audit-immutable Monthly Readings, joint baselines, every saved dollar measured from proxy logs at request granularity, a flat subscription that is the only line item — all borrowed from how banker-class advisory keeps the customer aligned with the result.
More about the practiceIV · Apply
You keep 100% of every dollar we save. That is the entire deal.
Ten-minute setup. One config line. Two headers. Zero SDK rewrite. We reply within a few minutes with the magic link — golden-set upload (your existing reply-quality, persona-fit, or escalation eval works; if you don't have one, we'll help you bootstrap from your last 200 production traces), proxy anchors a seven-day baseline, optimizations turn on workload by workload. The proxy measures from request one. Every dollar of savings is yours to keep.
Always-on client pause control. Every operator dashboard ships with an account-wide and per-workload kill-switch — pause routing, caching, compression, and batching instantly. The proxy keeps forwarding your requests as passthrough; optimization simply stops. Reversible at any time, no notice required. Tessera does not work uncontrolled in your stack.
Start with the Free Sandbox
One email. Code by mail. You're in.
No card. No procurement cycle. Sign in with your work email — we email a 6-digit code, you enter it, your account is provisioned in seconds on the Free Sandbox tier: 60M tokens per month, zero fee, full observability of what the proxy would have optimized. Upgrade to a paid plan when you want the full mechanic stack active: a flat monthly subscription by token volume, and you keep 100% of measured savings.
Sign in or get started →Building solo or evaluating before procurement? Grab a free 60M-token developer key at /dev — no card, instant provisioning, the same proxy and mechanic stack as Production. Or write directly to contact@tesseraai.io.