
The Optimize Layer · For Customer Success AI that drives renewals · Free Sandbox open

Your CS agent drives renewals
at half the per-account cost.
Flat monthly pricing — you keep the savings.

Aggregate measured savings · this month · across active Pilots

$47,820

If you ship a Customer Success AI product (Catalyst, Gainsight AI, Vivun, ChurnZero AI, Pylon, Vitally, Planhat), your LLM cost scales with the number of accounts in your pipeline. Tessera proxies the request path, routes health-score classifiers to cheaper models, caches identical account-profile lookups, and batches QBR-prep workloads overnight.

Every saved dollar is measured directly from our proxy logs. Pricing is a flat monthly subscription by token volume (Starter $199 / Growth $999 / Scale $3,999). We measure and prove your savings; you keep all of them. The quality gate is applied per workload: CSM augmentation that materially affects renewal probability stays on premium models, while commodity classification routes down.

Free Sandbox

60M tokens per month, no card up front, full mechanic stack — same feature set as every paid tier. Savings show in the dashboard from your first request through the proxy.

FIG. I · REQUEST PATH · Tessera Optimize Layer request path schematic. Your application sends requests to the Tessera proxy, which applies four optimizations before forwarding to your LLM provider (OpenAI, Anthropic, Google, xAI, etc.): route to a cheaper model when quality holds (142/h), cache identical requests (98/h), compress prompt tokens (67/h), and batch where eligible (14/h). Five percent of routed requests are canary-sampled against the original model, and the quality gate fails closed on uncertainty.

Use cases · Customer Success AI

Three unit-cost lines we cut on every Customer Success AI stack.

If your product looks like Catalyst, Gainsight AI, Vivun, ChurnZero AI, Pylon, Vitally, or Planhat — the request mix below is where the proxy earns its keep. Numbers are typical observed ranges across active Pilots running comparable Customer Success workloads. Your real reading lands in your first Monthly Joint Reading.

01 · Per-active-account cost · −61%
$0.28 → $0.11

Account health surveillance — usage signals, ticket history, NPS trend, churn-risk scoring. System prompt + persona blocks are repeated; auto-cache + auto-route handle the bulk. Compression on the activity-log dump.

02 · Per-QBR-prep cost · −62%
$4.20 → $1.60

Quarterly business review prep — usage summary, renewal forecast, expansion-opportunity scoring, talk-track draft. Multi-step prompt chain. Auto-batch on overnight QBR-generation runs (50% off). Cache common account-segment templates.

03 · Per-renewal-outreach cost · −61%
$0.31 → $0.12

Outbound renewal sequencing — personalised email, voice prep brief, account-team handoff notes. Multi-touch loop, persona-cacheable, batch-eligible on overnight sends.

Coverage · twelve named providers

Tessera sits in front of these APIs —

01 · OpenAI
02 · Anthropic
03 · Google
04 · AWS Bedrock
05 · Azure OpenAI
06 · xAI
07 · Mistral
08 · Cohere
09 · Perplexity
10 · OpenRouter
11 · Groq
12 · Together

Tessera is not affiliated with, endorsed by, or remunerated by any of the providers shown. Marks are rendered to identify each API surface that the Tessera Optimize Layer can route to. The provider list expands as new SDK adapters land in the LiteLLM ingest path; full active coverage is enumerated in the llms.txt reference file.

How it works · in four steps

Proxy. Measure. Optimize. Invoice.

Ten-minute setup. One config line. Two headers on outbound LLM calls. The proxy replays your existing eval suite before it changes anything in production.

i · Proxy

Point your existing LLM SDK base URL at api.tesseraai.io and add two headers. Anthropic, OpenAI, Google, Bedrock — same shim. No SDK rewrite, no provider lock-in. Reversible in one line.
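A minimal sketch of the shim in Python. The page names the host (api.tesseraai.io) but not the two headers, so the HTTPS scheme and both header names (`X-Tessera-Key`, `X-Tessera-Workload`) are hypothetical placeholders; `ClientConfig` is likewise an illustrative stand-in for whatever settings object your SDK accepts.

```python
from dataclasses import dataclass, field

# Assumption: the proxy is served over HTTPS at the host named above.
TESSERA_BASE_URL = "https://api.tesseraai.io"


@dataclass
class ClientConfig:
    """Connection settings an LLM SDK would be given (illustrative type)."""
    base_url: str
    headers: dict = field(default_factory=dict)


def through_proxy(cfg: ClientConfig, tessera_key: str, workload: str) -> ClientConfig:
    """Return a copy of cfg pointed at the proxy, with two extra headers.

    Both header names are hypothetical placeholders, not documented names.
    """
    headers = dict(cfg.headers)                 # leave the original untouched
    headers["X-Tessera-Key"] = tessera_key      # hypothetical header 1
    headers["X-Tessera-Workload"] = workload    # hypothetical header 2
    return ClientConfig(base_url=TESSERA_BASE_URL, headers=headers)


direct = ClientConfig("https://api.openai.com/v1", {"Authorization": "Bearer PROVIDER_KEY"})
proxied = through_proxy(direct, tessera_key="EXAMPLE_KEY", workload="health-score")
print(proxied.base_url)
```

Because `through_proxy` returns a new object, reverting is a matter of passing `direct` to the SDK again, which mirrors the one-line rollback described above.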

ii · Measure

The proxy logs every request: token counts, model, latency, and paid cost from a pricing_catalog snapshot. We anchor a seven-day baseline so every later dollar has a reference. You own the data, exportable any time.
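One way to picture the measurement step, as a sketch only: each logged request is priced from a catalog snapshot, and the baseline is the sum over the seven-day window. The model names, the per-million-token prices, and the `RequestLog` fields below are illustrative assumptions, not Tessera's actual schema or any provider's real price list.

```python
from dataclasses import dataclass

# Hypothetical price snapshot in USD per million tokens: (input, output).
PRICING_CATALOG = {
    "premium-model": (2.50, 10.00),
    "small-model": (0.15, 0.60),
}


@dataclass
class RequestLog:
    model: str
    input_tokens: int
    output_tokens: int
    latency_ms: float


def paid_cost(log: RequestLog, catalog=PRICING_CATALOG) -> float:
    """Cost of one request in USD, priced from the catalog snapshot."""
    in_price, out_price = catalog[log.model]
    return (log.input_tokens * in_price + log.output_tokens * out_price) / 1_000_000


def baseline_cost(logs: list) -> float:
    """Total paid cost over the baseline window (for example, seven days of logs)."""
    return sum(paid_cost(log) for log in logs)


week = [
    RequestLog("premium-model", 1_200, 300, 840.0),
    RequestLog("premium-model", 900, 250, 790.0),
]
print(round(baseline_cost(week), 6))
```

Pricing from a frozen snapshot rather than live prices keeps the baseline comparable even if a provider changes its rates mid-month.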

iii · Optimize

Auto-route, auto-cache, auto-compress, auto-batch — each gated by your golden-set eval and a five-percent canary against the original model. Quality fails closed; nothing routes until your eval is uploaded.

iv · Invoice

Flat monthly subscription priced by gross token volume: Starter $199, Growth $999, Scale $3,999. You keep 100% of measured savings; the savings number is ROI proof, not our cut. Monthly Reading PDF auto-issued for accounting.

I · Mechanics

Four moves we make on every request.

Live metric column shows the rolling seven-day average across active Pilots. Illustrative shape — your real numbers are measured from your own proxy logs.

01

Auto-route to a cheaper model when quality holds

Your code asks for GPT-4o. Tessera checks whether GPT-4o-mini passes your golden-set eval. If yes, we route. If your golden set isn't uploaded yet, we don't route — quality gate fails closed. Five percent of routed requests are canary-sampled against the original model so we catch regressions before you do.
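The fail-closed rule and the canary sample can be sketched as two small functions. This is an illustration of the logic described above, not Tessera's implementation; the function names and the tri-state `golden_set_pass` argument are assumptions.

```python
import random
from typing import Optional


def choose_model(requested: str, cheaper: str, golden_set_pass: Optional[bool]) -> str:
    """Route down only when the golden-set eval is known to pass.

    golden_set_pass is None when no golden set has been uploaded. That case
    and a failed eval both keep the requested model (the gate fails closed).
    """
    return cheaper if golden_set_pass is True else requested


def is_canary(rng: random.Random, rate: float = 0.05) -> bool:
    """Decide whether a routed request is also sampled against the original model."""
    return rng.random() < rate


print(choose_model("gpt-4o", "gpt-4o-mini", True))
print(choose_model("gpt-4o", "gpt-4o-mini", None))
```

Treating "unknown" the same as "failed" is the design choice that makes the gate fail closed: the cheaper model is used only on positive evidence.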

02

Auto-cache identical requests at the edge

If the same combination of system prompt, user prompt, and parameters has been sent before within your cache TTL, we return the cached response without calling the provider. Cache hits cost nothing upstream: you get sub-10ms latency and one-hundred-percent savings on that request.
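A sketch of how an exact-match cache with a TTL might work, assuming the key is a hash over the system prompt, user prompt, and parameters. The SHA-256 keying, the `TTLCache` class, and the injectable clock are illustrative choices, not a description of Tessera's edge cache.

```python
import hashlib
import json
import time


def cache_key(system_prompt: str, user_prompt: str, params: dict) -> str:
    """Hash the full request so that identical requests map to the same key."""
    payload = json.dumps(
        {"system": system_prompt, "user": user_prompt, "params": params},
        sort_keys=True,  # parameter order must not change the key
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


class TTLCache:
    def __init__(self, ttl_seconds: float, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock
        self._store = {}  # key -> (stored_at, response)

    def get(self, key: str):
        entry = self._store.get(key)
        if entry is None:
            return None
        stored_at, response = entry
        if self.clock() - stored_at > self.ttl:
            del self._store[key]  # expired: the caller must call the provider
            return None
        return response

    def put(self, key: str, response: str) -> None:
        self._store[key] = (self.clock(), response)
```

Any difference in prompt text or parameters produces a different key, so only truly identical requests are served from cache.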

03

Auto-compress prompts via deterministic whitespace + structural pass

When the input is long and our eligibility heuristic confirms compressible redundancy, we strip whitespace + structural noise from the request. Per-role opt-in; preserves code fences and JSON structure verbatim. v0.1 yields 5–15% on long prompts. LLMLingua-2 template substitution on roadmap.
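To make the whitespace pass concrete, here is a deliberately small sketch: it collapses runs of spaces and blank lines outside fenced code blocks and copies fenced content through unchanged. It does not attempt the eligibility heuristic, per-role opt-in, or protection of inline JSON mentioned above, and `compress_prompt` is a hypothetical name.

```python
import re

FENCE = "`" * 3  # Markdown code-fence marker


def compress_prompt(text: str) -> str:
    """Collapse redundant whitespace outside fenced code blocks."""
    out = []
    in_fence = False
    blank_run = 0
    for line in text.split("\n"):
        if line.lstrip().startswith(FENCE):
            in_fence = not in_fence  # entering or leaving a code block
            out.append(line)
            blank_run = 0
            continue
        if in_fence:
            out.append(line)  # code is preserved verbatim
            continue
        squeezed = re.sub(r"[ \t]+", " ", line).strip()
        if squeezed == "":
            blank_run += 1
            if blank_run > 1:
                continue  # keep at most one consecutive blank line
        else:
            blank_run = 0
        out.append(squeezed)
    return "\n".join(out)


sample = "Account   notes:  \n\n\n\n" + FENCE + "\n  x  =  1\n" + FENCE + "\nend"
print(len(sample), len(compress_prompt(sample)))
```

The pass is deterministic: the same input always yields the same output, which keeps compressed prompts cache-friendly.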

04

Auto-batch where batch APIs apply

OpenAI and Anthropic both offer a fifty-percent discount on batch-eligible workloads. You tag a workload as batch-eligible; Tessera queues requests for up to sixty seconds, fires them as a batch, and returns the results when ready. No code change on your side.
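The queueing behaviour can be sketched as a window that opens with the first tagged request and flushes after sixty seconds. The `BatchQueue` class and its method names are hypothetical; the actual submission to a provider batch API is left out.

```python
from dataclasses import dataclass, field

BATCH_DISCOUNT = 0.50     # batch discount cited above
MAX_WAIT_SECONDS = 60.0   # queueing window cited above


@dataclass
class BatchQueue:
    opened_at: float = 0.0
    items: list = field(default_factory=list)

    def add(self, request: dict, now: float) -> None:
        if not self.items:
            self.opened_at = now  # the window starts with the first queued request
        self.items.append(request)

    def should_flush(self, now: float) -> bool:
        return bool(self.items) and now - self.opened_at >= MAX_WAIT_SECONDS

    def flush(self) -> list:
        """Hand back everything queued and start an empty window."""
        batch, self.items = self.items, []
        return batch


def batch_cost(list_price_cost: float) -> float:
    """Cost of a batch-eligible workload after the discount."""
    return list_price_cost * (1 - BATCH_DISCOUNT)


queue = BatchQueue()
queue.add({"id": 1}, now=10.0)
queue.add({"id": 2}, now=40.0)
print(queue.should_flush(now=70.0))
```

Timestamps are passed in explicitly so the window logic can be exercised without waiting on a real clock.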

II · Evidence

Every saved dollar is computed from a trace your CFO can read.

At the close of each month, Tessera issues the Monthly Reading — a typeset register listing each in-scope workload, its ratified baseline cost, the actual paid cost in period, and the savings trace. You keep one hundred percent of measured savings; the flat subscription is your only line item. Below is an anonymised Acme reading, in full. The same artefact format applies to every customer.

Tessera · Monthly Joint Reading
Acme Corp · Annual month 3 · Apr 2026

Total savings

$45,180

against ratified baseline

Tessera fee · 25%

$11,295

Annual tier

Customer keeps · 75%

$33,885

net to Acme

§ 1 · Workload breakdown

Workload | Baseline | Actual | Saved
classification | $48,210 | $28,900 | $19,310
doc-summarisation | $36,140 | $24,800 | $11,340
chat | $71,500 | $58,420 | $13,080
embeddings | $12,360 | $10,910 | $1,450
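The arithmetic behind the breakdown is simply baseline minus actual per workload, summed across workloads. The short check below uses the Acme figures from the reading; the variable and function names are illustrative.

```python
# Baseline and actual paid cost per workload, in USD, from the reading above.
READING = {
    "classification": (48_210, 28_900),
    "doc-summarisation": (36_140, 24_800),
    "chat": (71_500, 58_420),
    "embeddings": (12_360, 10_910),
}


def saved_per_workload(reading: dict) -> dict:
    """Savings per workload: ratified baseline minus actual paid cost."""
    return {name: baseline - actual for name, (baseline, actual) in reading.items()}


saved = saved_per_workload(READING)
total_saved = sum(saved.values())
print(saved)
print(total_saved)  # 45180, matching the reading's total savings
```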

§ 2 · Cumulative savings · 11 weeks

Read the full anonymised Acme reading

Calculator

Run the numbers on your stack.

Indicative only. Real savings are measured month over month from Tessera proxy logs and recorded in the Monthly Joint Reading. You keep 100% of every dollar we measure; pricing is a flat monthly subscription by token volume, never a cut of your savings.

$75,000

28%

Indicative range — pre-engagement composite shows 18-35% on mid-spend stacks

Measured monthly savings · $21,000
Annualized savings · $252,000
You keep · 100% · $21,000
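The calculator outputs are consistent with a simple product, assuming the $75,000 input is monthly LLM spend and 28% is the indicative savings rate (the page does not label the two inputs, so that reading is an inference from the arithmetic).

```python
def indicative_savings(monthly_spend: int, savings_percent: int) -> dict:
    """Indicative monthly and annualized savings in whole dollars (rounded down)."""
    monthly = monthly_spend * savings_percent // 100
    return {"monthly": monthly, "annualized": monthly * 12}


result = indicative_savings(75_000, 28)
print(result)  # {'monthly': 21000, 'annualized': 252000}
```

Running the same function at 18 and 35 percent gives the edges of the indicative range quoted above for the same spend.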

Pricing · flat monthly

Your plan is a flat monthly subscription by token volume (Starter $199, Growth $999, Scale $3,999), independent of how much we save you. Quality preservation is held at 0.90 by canary; a three-day breach auto-disables routing and posts a service credit. Compliance-tagged workloads never route.

III · Economics

The math is symmetric. We win when you win.

Prepaid balance billing, like the Claude API. You top up your account, the proxy debits the measured-savings fee in real time, and you control the top-up cadence. If the balance reaches zero, optimizations auto-pause until you top up again. Pricing v3.6 · locked 2026-05-20.

I · Free Sandbox

60M tokens / month · $0 fee · no card

Full observability of what the proxy would have optimized, with mutating mechanics off. Bring your existing provider keys, route traffic through the proxy, see the savings dashboard populate from request one. Hard cap at 60M tokens per calendar month; quota resets monthly. Flip to Production when you want the full mechanic stack active.

II · Paid tiers

Flat monthly by token volume · from $199/mo

Flat monthly subscription by gross tokens submitted: Starter $199 (≤1B), Growth $999 (≤5B), Scale $3,999 (≤20B), Enterprise custom (20B+). The full mechanic stack runs on every paid tier. You keep 100% of measured savings, with no per-token markup and no cut of your savings. Free Sandbox stays free; upgrade only when your volume needs it.
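The tier boundaries above reduce to a lookup on monthly gross tokens. A sketch, using the caps and prices as listed; the function name is illustrative, and the Free Sandbox (60M tokens) is left out because it is a separate opt-in tier rather than a volume band.

```python
# (tier name, monthly token cap, USD per month), as listed above.
TIERS = [
    ("Starter", 1_000_000_000, 199),
    ("Growth", 5_000_000_000, 999),
    ("Scale", 20_000_000_000, 3_999),
]


def tier_for(monthly_tokens: int):
    """Return (tier name, monthly price) for a given gross token volume."""
    for name, cap, price in TIERS:
        if monthly_tokens <= cap:
            return name, price
    return "Enterprise", None  # above 20B tokens: custom pricing


print(tier_for(3_200_000_000))  # ('Growth', 999)
```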

Quality Service Level is the single safety gate — quality preservation ≥ 0.90 by canary against your golden set, three-day breach triggers auto-disable of routing plus a service credit on your next invoice. Compliance gate — workloads tagged regulated never get auto-routed (code-level gate). Always-on client pause control — your dashboard kill-switch overrides everything.
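The three gates compose in a fixed order: client pause first, then the compliance tag, then the quality floor. The sketch below assumes "three-day breach" means three consecutive daily canary scores below 0.90; that interpretation, the `regulated` tag string, and the function names are assumptions for illustration.

```python
QUALITY_FLOOR = 0.90  # quality preservation threshold cited above
BREACH_DAYS = 3       # assumed: consecutive daily scores below the floor


def routing_enabled(daily_scores: list) -> bool:
    """False once the most recent BREACH_DAYS scores are all below the floor."""
    recent = daily_scores[-BREACH_DAYS:]
    breached = len(recent) == BREACH_DAYS and all(s < QUALITY_FLOOR for s in recent)
    return not breached


def may_auto_route(tags: set, daily_scores: list, client_paused: bool) -> bool:
    if client_paused:        # dashboard kill-switch overrides everything
        return False
    if "regulated" in tags:  # compliance gate: regulated workloads never route
        return False
    return routing_enabled(daily_scores)


print(may_auto_route(set(), [0.95, 0.89, 0.88, 0.87], client_paused=False))
```

The service credit that accompanies a breach is a billing action and is not modelled here.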

Who leads the practice


Tallinn · Estonia

Yevheny Panin · founder

Banker first, trader second. Three years running international payments operations at a European commercial bank — reading invoice-line-item data, distinguishing genuine optimisation from cosmetic re-pricing, writing a settlement contract that survives an audit. Five years on the FX trading floor pricing execution against asymmetric liquidity cost.

Tessera applies that structural discipline to LLM inference pricing. Audit-immutable Monthly Readings, joint baselines, every saved dollar measured from proxy logs at request granularity, a flat subscription that is the only line item — all borrowed from how banker-class advisory keeps the customer aligned with the result.

More about the practice

IV · Apply

You keep 100% of every dollar we save. That is the entire deal.

Ten-minute setup. One config line. Two headers. Zero SDK rewrite. We reply within a few minutes with the magic link. From there, you upload a golden set (your existing reply-quality, persona-fit, or escalation eval works; if you don't have one, we'll help you bootstrap from your last 200 production traces), the proxy anchors a seven-day baseline, and optimizations turn on workload by workload. The proxy measures from request one. Every dollar of savings is yours to keep.

Always-on client pause control. Every operator dashboard ships with an account-wide and per-workload kill-switch — pause routing, caching, compression, and batching instantly. The proxy keeps forwarding your requests as passthrough; optimization simply stops. Reversible at any time, no notice required. Tessera does not work uncontrolled in your stack.

Start with the Free Sandbox

One email. Code by mail. You're in.

No card. No procurement cycle. Sign in with your work email — we email a 6-digit code, you enter it, your account is provisioned in seconds on the Free Sandbox tier: 60M tokens per month, zero fee, full observability of what the proxy would have optimized. Upgrade to a paid plan when you want the full mechanic stack active: a flat monthly subscription by token volume, and you keep 100% of measured savings.

Sign in or get started →

Building solo or evaluating before procurement? Grab a free 60M-token developer key at /dev — no card, instant provisioning, the same proxy and mechanic stack as Production. Or write directly to contact@tesseraai.io.