Skip to main content

The Optimize Layer · For Support AI that resolves tickets at scale · Free Sandbox open

Your support agent resolves tickets
at half the per-resolution cost.
You pay only on what we measurably save.

Aggregate measured savings · this month · across active Pilots

$47,820

If you ship a Support AI product — Decagon, Forethought, Ada, Sierra, Replicant, Intercom Fin, Espressive, Lang.ai, Maven AGI — your LLM cost scales by deflected ticket. Tessera sits in your request path, routes deflection classifiers to cheaper models, caches the policy knowledge base, compresses retrieved articles, and batches QA scoring overnight.

Every saved dollar is measured directly from our proxy logs — not inferred. Performance Fee is twenty-five percent of measured savings, debited in real time from a prepaid balance. Compliance gate per workload — anything taggedregulated(HIPAA, PCI, SOC2-restricted) never gets auto-routed; you keep the original model end-to-end.

Seven days free

No card up front. Your trial clock starts on the first request your code sends through the proxy. Savings show in the dashboard from minute one.

Tessera Optimize Layer · request path schematicYour application sends requests to the Tessera proxy, which applies four optimizations before forwarding to your LLM provider: route to a cheaper model when quality holds, cache identical requests, compress prompt tokens, and batch where eligible. Five percent of routed requests are canary-sampled against the original model and the quality gate fails closed on uncertainty.FIG. I · REQUEST PATHYour appOPENAI · ANTHROPICGOOGLE · ETC.TESSERA · OPTIMIZE LAYERROUTECHEAPER MODEL142/hCACHEIDENTICAL REQ.98/hCOMPRESSPROMPT TOKENS67/hBATCHBATCH-ELIGIBLE14/hProviderOPENAI · ANTHROPICGOOGLE · XAI · ETC.REQUESTRESPONSERE-ROUTEDMEASURED5% canary sampled against original model — quality gate fails closed

Use cases · Support AI

Three unit-cost lines we cut on every Support AI stack.

If your product looks like Decagon, Forethought, Ada, Sierra, Replicant, Intercom Fin, Espressive, Lang.ai, or Maven AGI — the request mix below is where the proxy earns its keep. Numbers are typical observed ranges across active Pilots running comparable support workloads. Your real reading lands in your first Monthly Joint Reading.

01 · Per-deflected-ticket cost−61%
$0.18$0.07

Customer message → intent → policy lookup → response → escalation gate. Most cost is the policy KB retrieval prompt. Auto-cache catches identical policy queries, auto-route sends intent classification to a small model, LLMLingua-2 compresses retrieved KB articles where retrieval-quality preserved.

02 · Per-escalation cost−62%
$0.34$0.13

Escalation chain — context summarisation, agent assist, transfer message generation. Highly cacheable on the summary side, routable on the assist side. Compliance gate respected: regulated workloads bypass routing.

03 · Per-QA-batch cost−64%
$0.45$0.16

Overnight ticket-quality scoring — every resolved ticket scored on accuracy, tone, completeness. Pure batch workload. Batch API discount (50%), cache identical-template scoring, route to small model where eval holds.

Support AI runs on regulated data more often than other verticals. Compliance gate is a per-workload tag — when set to regulated, the proxy never auto-routes, never compresses prompts, never batches. Cache + observability still apply. Quality preservation ≥ 0.95 enforced on routed workloads via your golden set (your existing CSAT classifier or escalation-correctness eval).

Coverage · twelve named providers

Tessera sits in front of these APIs —

01OpenAIOpenAI
02AnthropicAnthropic
03GoogleGoogle
04AWS BedrockAWS Bedrock
05Azure OpenAIAzure OpenAI
06xAIxAI
07MistralMistral
08CohereCohere
09PerplexityPerplexity
10OpenRouterOpenRouter
11GroqGroq
12TogetherTogether

Tessera is not affiliated with, endorsed by, or remunerated by any of the providers shown. Marks rendered to identify each API surface that the Tessera Optimize Layer can route to. Provider list expands as new SDK adapters land in the LiteLLM ingest path — full active coverage is enumerated in the llms.txt reference file.

How it works · in four steps

Proxy. Measure. Optimize. Invoice.

Ten-minute setup. One config line. Two headers on outbound LLM calls. The proxy replays your existing eval suite before it changes anything in production.

i · Proxy

Point your existing LLM SDK base URL at api.tesseraai.io and add two headers. Anthropic, OpenAI, Google, Bedrock — same shim. No SDK rewrite, no provider lock-in. Reversible in one line.

ii · Measure

The proxy logs every request — token counts, model, latency, paid cost from pricing_catalog snapshot. We anchor a seven-day baseline so every later dollar has a reference. You own the data, exportable any time.

iii · Optimize

Auto-route, auto-cache, auto-compress, auto-batch — each gated by your golden-set eval and a five-percent canary against the original model. Quality fails closed; nothing routes until your eval is uploaded.

iv · Invoice

Performance Fee is twenty-five percent of measured savings, debited in real time from a prepaid balance. Monthly Reading PDF auto-issued for accounting. Top up $100 to start, pause anytime, balance is yours.

I · Mechanics

Four moves we make on every request.

Live metric column shows the rolling seven-day average across active Pilots. Illustrative shape — your real numbers are measured from your own proxy logs.

01

Auto-route to a cheaper model when quality holds

Your code asks for GPT-4o. Tessera checks whether GPT-4o-mini passes your golden-set eval. If yes, we route. If your golden set isn't uploaded yet, we don't route — quality gate fails closed. Five percent of routed requests are canary-sampled against the original model so we catch regressions before you do.

02

Auto-cache identical requests at the edge

If the same system prompt + user prompt + parameters has been asked before within your cache TTL, we return the cached response without calling the provider. Cache hits cost nothing upstream — you get sub-10ms latency and one-hundred-percent savings on that request.

03

Auto-compress prompts where LLMLingua-2 says safe

When the input is large and compression preserves quality on your eval, we send a tighter prompt upstream. Microsoft's LLMLingua-2 paper shows two to three times compression on retrieval-heavy workloads with negligible quality loss. We use the same threshold gate as routing.

04

Auto-batch where batch APIs apply

OpenAI and Anthropic both offer fifty-percent discount on batch-eligible workloads. You tag a workload as batch-eligible — Tessera queues for up to sixty seconds, fires as a batch, returns when ready. No code change on your side.

II · Evidence

Every fee is computed from a trace your CFO can read.

At the close of each month, Tessera issues the Monthly Joint Reading — a typeset register listing each in-scope workload, its ratified baseline cost, the actual paid cost in period, and the Performance Fee computation trace. Below is an anonymised Acme reading, in full. The same artefact format applies to every Pilot.

Tessera · Monthly Joint ReadingAcme Corp · Annual month 3 · Apr 2026

Total savings

$45,180

against ratified baseline

Tessera fee · 25%

$11,295

Annual tier

Customer keeps · 75%

$33,885

net to Acme

§ 1 · Workload breakdown

WorkloadBaselineActualSaved
classification$48,210$28,900
$19,310
doc-summarisation$36,140$24,800
$11,340
chat$71,500$58,420
$13,080
embeddings$12,360$10,910
$1,450

§ 2 · Cumulative savings · 11 weeks

Read the full anonymised Acme reading

Calculator

Run the numbers on your stack.

Indicative only. Real savings are measured month over month from Tessera proxy logs and recorded in the Monthly Joint Reading. The proxy bills only on what it measures — if zero savings, zero fee. There is no spend floor, no retainer, and no separate Diagnostic phase.

$75,000

28%

Indicative range — pre-engagement composite shows 18-35% on mid-spend stacks

Measured monthly savings$21,000
Annual fee · 25%$5,250
Enterprise fee · 15%$3,150
You keep · Annual$15,750
You keep · Enterprise$17,850

Quality SLA · automatic

Quality preservation guaranteed at 0.90 by canary. Three-day breach → auto-disable of routing + 10% fee credit. Compliance-tagged workloads never route.

III · Economics

The math is symmetric. We win when you win.

Prepaid balance billing — like Claude API. You top up your account, the proxy debits measured-savings fee in real time, you control top-up cadence. If balance reaches zero, optimizations auto-pause until you top up again. Pricing v3.6 · locked 2026-05-20.

I · Free Sandbox

60M tokens / month · $0 fee · no card

Full observability of what the proxy would have optimized, with mutating mechanics off. Bring your existing provider keys, route traffic through the proxy, see the savings dashboard populate from request one. Hard cap at 60M tokens per calendar month; quota resets monthly. Flip to Production when you want the full mechanic stack active.

II · Production

20% of measured savings · $100 minimum top-up

Prepaid balance via Stripe (or invoice on request). Top up $100, $500, $5k — your choice; minimum entry is $100. Tessera deducts 20% of every measured-savings dollar in real time. If balance hits zero, the proxy auto-pauses to passthrough mode (you keep forwarding requests, no optimization fees accrue). Top up again to resume. No floor, no retainer, no contract review for activation.

Quality Service Level is the single safety gate — quality preservation ≥ 0.90 by canary against your golden set, three-day breach triggers auto-disable of routing plus a ten-percent fee credit (credit applied to your balance). Compliance gate — workloads tagged regulated never get auto-routed (code-level gate). Always-on client pause control — your dashboard kill-switch overrides everything.

Who leads the practice

Yevheny Panin

Tallinn · Estonia

Yevheny Panin · founder

Banker first, trader second. Three years running international payments operations at a European commercial bank — reading invoice-line-item data, distinguishing genuine optimisation from cosmetic re-pricing, writing a settlement contract that survives an audit. Five years on the FX trading floor pricing execution against asymmetric liquidity cost.

Tessera applies that structural fix to LLM inference pricing. Performance fees, joint baselines, audit-immutable Monthly Readings, and a Pilot floor of zero are all borrowed straight out of how banker-class advisory works — translated to proxy logs measured at request granularity.

More about the practice

IV · Apply

Zero measured savings, zero fee. That is the entire deal.

Ten-minute setup. One config line. Two headers. Zero SDK rewrite. We reply within a few minutes with the magic link — golden-set upload (your existing reply-quality, persona-fit, or escalation eval works; if you don't have one, we'll help you bootstrap from your last 200 production traces), proxy anchors a seven-day baseline, optimizations turn on workload by workload. The proxy measures from request one. If zero savings, zero fee.

Always-on client pause control. Every operator dashboard ships with an account-wide and per-workload kill-switch — pause routing, caching, compression, and batching instantly. The proxy keeps forwarding your requests as passthrough; Performance Fee does not accrue on paused traffic. Reversible at any time, no notice required. Tessera does not work uncontrolled in your stack.

Start with the Free Sandbox

One email. Code by mail. You're in.

No card. No procurement cycle. Sign in with your work email — we email a 6-digit code, you enter it, your account is provisioned in seconds on the Free Sandbox tier: 60M tokens per month, zero fee, full observability of what the proxy would have optimized. Flip to Production when you want the full mechanic stack active — 20% of measured savings, prepaid balance you control ($100 minimum top-up). If zero savings, zero fee.

Sign in or get started →

Building solo or evaluating before procurement? Grab a free 60M-token developer key at /dev — no card, instant provisioning, the same proxy and mechanic stack as Production. Or write directly to contact@tesseraai.io.