What is the Tessera Optimize Layer?

The Tessera Optimize Layer is a thin proxy that lives in your LLM request path. Your application points its OpenAI, Anthropic, or Google client at Tessera; Tessera forwards each request to the provider but first applies four moves: auto-route to a cheaper model when quality holds on your golden-set eval, auto-cache identical requests at the edge, auto-compress prompts via LLMLingua-2 where safe, auto-batch where batch APIs apply. Every saved dollar is measured directly from proxy logs, not inferred from a billing CSV.

How does Tessera measure savings?

Savings are measured from Tessera proxy logs at request granularity. For each request that gets routed, cached, compressed, or batched, Tessera records the counterfactual provider cost (what the request would have cost without the optimization) and the actual incurred cost. Aggregate Ongoing Savings equal the sum of (counterfactual minus actual) across all in-scope workloads in the period. Provider price moves and unrelated workload shrinkage are excluded; only optimizations attributable to Tessera enter the Performance Fee.
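
As a rough illustration of the sum described above, here is a minimal sketch of the aggregation. The RequestLog shape and its field names are assumptions made for this example, not Tessera's actual log schema.

```typescript
// Hypothetical shape of one proxy log record (field names are assumptions).
interface RequestLog {
  workload: string;
  inScope: boolean;              // only in-scope workloads count toward savings
  counterfactualCostUsd: number; // what the request would have cost unoptimized
  actualCostUsd: number;         // what the request actually cost
}

// Sum of (counterfactual minus actual) across in-scope requests in a period.
function aggregateSavingsUsd(logs: RequestLog[]): number {
  return logs
    .filter((log) => log.inScope)
    .reduce((sum, log) => sum + (log.counterfactualCostUsd - log.actualCostUsd), 0);
}

const sampleLogs: RequestLog[] = [
  // Optimized request: half the counterfactual cost was saved.
  { workload: "triage", inScope: true, counterfactualCostUsd: 0.5, actualCostUsd: 0.25 },
  // Passthrough request: counterfactual equals actual, so it contributes nothing.
  { workload: "draft", inScope: true, counterfactualCostUsd: 1.0, actualCostUsd: 1.0 },
  // Out-of-scope workload: excluded even though its cost dropped.
  { workload: "internal", inScope: false, counterfactualCostUsd: 2.0, actualCostUsd: 0.5 },
];

console.log(aggregateSavingsUsd(sampleLogs)); // 0.25
```

Because the difference is taken per request, a provider price change affects both sides of the subtraction equally and does not show up as savings.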

How is Tessera different from an observability platform?

Observability platforms trace requests and show dashboards. Tessera is an optimize layer: we sit in the request path and actively route, cache, compress, and batch on every call. We bill on measured savings from our own proxy logs, not per SaaS seat. Your existing observability tool can stay, and it still receives downstream telemetry. Tessera is designed as a layer above whatever vendors you already use, not as a replacement for them.

What does Tessera cost?

Two tiers. Free Sandbox: 60 million tokens per month, $0 fee, observe-only optimizations (savings are measured but no fee accrues). Production: 20% of measured savings, billed against a prepaid balance (Stripe Checkout, $100 minimum top-up, or invoice on request). Zero savings in a billing window means zero fee for that window. No floor, no retainer, no contract review for activation. If the balance reaches zero, the proxy automatically pauses optimizations and forwards requests as passthrough; no fees accrue while paused, and topping up resumes optimization. Large-volume Production Clients (above $500,000 per month in measured savings) may negotiate a custom MSA addendum. Quality preservation is guaranteed at a 0.95 floor, monitored by a quality canary; a three-day breach triggers auto-disable plus a 10% fee credit applied to your balance.

Does Tessera modify our production code?

No. Tessera is added to your request path via two HTTP headers and one config-line change — you point your OpenAI/Anthropic/Google client at the Tessera proxy base URL and add an API key. Tessera never modifies your application source, never deploys into your codebase, and never makes provider-side changes on your account. All optimization logic runs inside the Tessera proxy infrastructure. You can disable Tessera at any time by reverting the two headers, or by using the in-dashboard pause control (see next question).

Can I pause Tessera at any time?

Yes. Every operator dashboard ships with an always-available kill-switch — account-wide and per-workload. When engaged, the Tessera proxy bypasses all four optimizations (route, cache, compress, batch) and forwards your requests to the upstream provider as pure passthrough. Performance Fee does not accrue on paused traffic. Pause is reversible at any time without notice. The pause right is contractually preserved in §8 of the Tessera Terms of Service — Tessera does not work uncontrolled in your stack.
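
The pause behaviour can be pictured as a single gate in front of the four optimizations. The sketch below is illustrative only: the PauseState type and the activeMechanics function are hypothetical names, and the zero-balance rule is taken from the pricing answer above.

```typescript
type Mechanic = "route" | "cache" | "compress" | "batch";

// Hypothetical pause state; the real proxy's internals are not documented here.
interface PauseState {
  accountPaused: boolean;       // account-wide kill-switch
  pausedWorkloads: Set<string>; // per-workload kill-switches
  prepaidBalanceUsd: number;    // a zero balance also forces passthrough
}

const ALL_MECHANICS: Mechanic[] = ["route", "cache", "compress", "batch"];

// Returns the optimizations allowed for a workload; an empty list means pure passthrough.
function activeMechanics(workload: string, state: PauseState): Mechanic[] {
  const paused =
    state.accountPaused ||
    state.pausedWorkloads.has(workload) ||
    state.prepaidBalanceUsd <= 0;
  return paused ? [] : [...ALL_MECHANICS];
}

const state: PauseState = {
  accountPaused: false,
  pausedWorkloads: new Set(["billing"]),
  prepaidBalanceUsd: 250,
};

console.log(activeMechanics("billing", state)); // [] (paused workload, passthrough)
console.log(activeMechanics("triage", state));  // all four mechanics
```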

Who is Tessera for?

Two customer profiles account for most Production signups: (1) the CTO of a Series A-B AI-native SaaS company spending $20k-$200k per month on LLM APIs, with gross margin under pressure, who can change a base URL in 30 minutes and signs up within 72 hours; (2) a Series B-D scale-up adding AI features and spending $50k-$500k per month, where an AI Platform Lead owns the budget and a light security review takes 2-4 weeks. There is also a developer-facing Free Sandbox tier (60M tokens/mo, $0 fee) for solo developers and side projects. Workloads tagged regulated (HIPAA, PCI-DSS, SOC 2 in-scope) never auto-route: the compliance gate blocks routing at the code level. Above $500k/mo in measured savings, Production Clients may negotiate a custom MSA addendum (invoice billing, custom SLO).

How does Tessera handle data privacy?

Confidential Information furnished to Tessera is stored primarily inside the European Economic Area (EEA) under Estonian operating jurisdiction. We do not require, request, or process Client end-user personal data, model prompts, or model completions in their full content form — only token-count and structural metadata. US-based AI provider sub-processors (Anthropic, OpenAI, Google) are engaged solely for Tessera-internal analysis under strict anonymisation conditions per the Data Processing Agreement.

Does Tessera accept referral fees from AI providers?

No. Tessera receives no affiliate revenue, referral fees, kickbacks, sponsorships, advisory-board honoraria, or any other compensation from any AI provider, gateway vendor, or observability platform we recommend in the course of operating the proxy. Client fees are our only income. This is contractually binding under §10 of the Tessera Terms of Service (Vendor Neutrality) and breach permits Clients to terminate without penalty and withdraw their balance.

Tessera × Mastra

@tessera-llm/mastra is a Mastra-shaped adapter for the Tessera substrate layer. Mastra Agents accept any AI SDK provider module as the model: field. Tessera's config function returns exactly that shape, so dropping it in is a three-line edit on each agent.

The same mechanic stack (auto-route, exact + semantic cache, compress, output-length ceiling, batch arbitrage) fires server-side on every agent.generate() / agent.stream() call. Tools, structured outputs, schema-constrained tool calls, and RAG retrieval workflows are all unchanged.

Install

npm install @tessera-llm/mastra @mastra/core
# Plus whichever provider package you use:
npm install @ai-sdk/openai          # or @ai-sdk/anthropic / @ai-sdk/mistral / @ai-sdk/groq / @ai-sdk/cohere

Get a free Tessera API key at tesseraai.io/dev — 60M tokens/month, no card up front.

Quickstart

import { Agent } from "@mastra/core/agent";
import { createOpenAI } from "@ai-sdk/openai";
import { tesseraOpenAIConfig } from "@tessera-llm/mastra";

const openai = createOpenAI({
  apiKey: process.env.OPENAI_API_KEY!,
  ...tesseraOpenAIConfig({ apiKey: process.env.TESSERA_API_KEY! }),
});

export const supportAgent = new Agent({
  name: "Support",
  instructions: "Triage incoming support tickets in two sentences.",
  model: openai("gpt-4o"),
});

Scenario 1 — Convenience factory — skip the explicit createOpenAI import

If you don't already use createOpenAI elsewhere in your codebase, the convenience factory wraps the import and pre-merges the Tessera config so the Agent definition stays clean.

import { Agent } from "@mastra/core/agent";
import { tesseraOpenAI } from "@tessera-llm/mastra";

const openai = await tesseraOpenAI({
  openaiApiKey: process.env.OPENAI_API_KEY!,
  tesseraApiKey: process.env.TESSERA_API_KEY!,
});

export const triageAgent = new Agent({
  name: "Triage",
  instructions: "Classify support tickets into bug, feature, billing, or other.",
  model: openai("gpt-4o-mini"),
});

const result = await triageAgent.generate("Customer says their March invoice is wrong.");

Scenario 2 — Agent with tools

Auto-route gates on tool-calling capability — an agent using tools never gets routed to a non-tool-capable model. Tool dispatch loops on long-running workflows benefit from exact-match cache on identical sub-prompts; cached responses return upstream of any content-mutating mechanic.

import { Agent } from "@mastra/core/agent";
import { createTool } from "@mastra/core/tools";
import { tesseraAnthropic } from "@tessera-llm/mastra";
import { z } from "zod";

const lookupCustomer = createTool({
  id: "lookup-customer",
  description: "Look up customer billing history",
  inputSchema: z.object({ accountId: z.string() }),
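  // billingDB stands in for your application's own data-access layer; it is not provided by Tessera or Mastra.
  execute: async ({ context: { accountId } }) => billingDB.lookup(accountId),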
});

const anthropic = await tesseraAnthropic({
  anthropicApiKey: process.env.ANTHROPIC_API_KEY!,
  tesseraApiKey: process.env.TESSERA_API_KEY!,
});

export const billingAgent = new Agent({
  name: "Billing",
  instructions: "Resolve billing questions. Use lookup-customer when needed.",
  model: anthropic("claude-sonnet-4-6"),
  tools: { lookupCustomer },
});
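
To make the tool-capability gate concrete, here is a minimal sketch of how a router might filter candidate models. Everything in it is an assumption for illustration: the ModelInfo and WorkloadProfile types, the model identifiers, the prices, and the use of 0.95 as the golden-set threshold. It is not Tessera's routing code.

```typescript
// Hypothetical model metadata; ids, prices, and scores are made up.
interface ModelInfo {
  id: string;
  supportsTools: boolean;
  costPerMillionTokensUsd: number;
  goldenSetScore: number; // quality on the golden-set eval, 0..1
}

interface WorkloadProfile {
  usesTools: boolean;
  regulated: boolean; // regulated workloads never auto-route
}

const QUALITY_FLOOR = 0.95; // assumed threshold for "quality holds"

function chooseModel(current: ModelInfo, alternatives: ModelInfo[], workload: WorkloadProfile): ModelInfo {
  if (workload.regulated) return current; // compliance gate: no routing at all
  const eligible = alternatives.filter(
    (m) =>
      m.costPerMillionTokensUsd < current.costPerMillionTokensUsd &&
      m.goldenSetScore >= QUALITY_FLOOR &&
      (!workload.usesTools || m.supportsTools), // tool users need tool-capable models
  );
  if (eligible.length === 0) return current;
  // Pick the cheapest model that passed every gate.
  return eligible.reduce((best, m) => (m.costPerMillionTokensUsd < best.costPerMillionTokensUsd ? m : best));
}

const current: ModelInfo = { id: "large-model", supportsTools: true, costPerMillionTokensUsd: 10, goldenSetScore: 0.99 };
const alternatives: ModelInfo[] = [
  { id: "tiny-no-tools", supportsTools: false, costPerMillionTokensUsd: 1, goldenSetScore: 0.97 },
  { id: "small-with-tools", supportsTools: true, costPerMillionTokensUsd: 3, goldenSetScore: 0.96 },
  { id: "cheap-low-quality", supportsTools: true, costPerMillionTokensUsd: 2, goldenSetScore: 0.9 },
];

console.log(chooseModel(current, alternatives, { usesTools: true, regulated: false }).id); // "small-with-tools"
```

In this sketch the cheapest candidate is skipped for a tool-using agent because it cannot call tools, which is the behaviour Scenario 2 describes.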

Scenario 3 — Multi-agent app with one Tessera key

Register multiple Agents on the same Mastra instance, each on a different model + provider, all routed through one Tessera key. Per-workload auto-route and quality canary fire independently; the verified-savings ledger rolls up totals across every Agent into one monthly reading.

// One Tessera key powers every agent — auto-route decisions and quality
// canary stay per-workload, but billing rolls up to the same ledger.

import { Agent } from "@mastra/core/agent";
import { Mastra } from "@mastra/core";
import { tesseraOpenAI, tesseraAnthropic } from "@tessera-llm/mastra";

const openai = await tesseraOpenAI({
  openaiApiKey: process.env.OPENAI_API_KEY!,
  tesseraApiKey: process.env.TESSERA_API_KEY!,
});
const anthropic = await tesseraAnthropic({
  anthropicApiKey: process.env.ANTHROPIC_API_KEY!,
  tesseraApiKey: process.env.TESSERA_API_KEY!,
});

export const mastra = new Mastra({
  agents: {
    triage: new Agent({ name: "Triage", model: openai("gpt-4o-mini"), instructions: "..." }),
    research: new Agent({ name: "Research", model: anthropic("claude-sonnet-4-6"), instructions: "..." }),
    draft: new Agent({ name: "Draft", model: openai("gpt-4o"), instructions: "..." }),
  },
});

Worked savings example

Three-agent customer-support app on Mastra. Triage on gpt-4o-mini, Research on claude-sonnet-4-6, Draft on gpt-4o. Combined 5B tokens/month, list prices.

Stage	Cost / mo	Saved
Baseline — direct providers via Mastra	$24,000	—
+ Tessera (route, cache, prompt-cache, compress, output-length ceiling, batch)	$9,400	$14,600
Tessera fee (20% × measured savings)	$2,920	—
Net cost to you	$12,320	$11,680 / mo saved

The per-stack quality canary held a mean score of 0.96 across all three agent workloads (floor 0.95), and the 0.95 SLA held on all 30 days.
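
The arithmetic in the table above can be reproduced with a few lines. This is only a sketch of the calculation; the function name and return shape are chosen for this example, and the 20% rate comes from the Production tier described earlier.

```typescript
const FEE_PERCENT = 20; // Production tier: 20% of measured savings

function savingsBreakdown(baselineUsd: number, optimizedUsd: number) {
  const measuredSavings = baselineUsd - optimizedUsd;  // what the optimizations saved
  const fee = (measuredSavings * FEE_PERCENT) / 100;   // Tessera fee on those savings
  const netCost = optimizedUsd + fee;                  // provider bill plus fee
  const netSaved = baselineUsd - netCost;              // what stays with the client
  return { measuredSavings, fee, netCost, netSaved };
}

const example = savingsBreakdown(24000, 9400);
console.log(example);
// { measuredSavings: 14600, fee: 2920, netCost: 12320, netSaved: 11680 }
```

Since the fee is a fixed share of measured savings, the client keeps 80% of whatever is saved, and a month with zero savings produces a zero fee.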

Architecture and quality contract

The adapter is a thin npm package (Apache-2.0). The Mastra core SDK and the @ai-sdk/* provider packages are peer dependencies, so you install only the providers you actually use.

The composition cap (max 2 content-mutators per request), the per-stack 0.95 quality floor with auto-rollback and a 10% credit on breach, and audit-immutable measurement (two pricing snapshots per request) are all enforced on the proxy. The verified-savings ledger is at ledger.tesseraai.io.
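
One way to picture the quality floor and rollback rule is the sketch below. It rests on two assumptions that the text does not spell out: that a "three-day breach" means three consecutive daily canary scores below 0.95, and that the 10% credit is computed on the fee for the affected period. The names are hypothetical.

```typescript
const QUALITY_FLOOR = 0.95;
const BREACH_DAYS = 3;     // assumed: consecutive days below the floor
const CREDIT_PERCENT = 10; // assumed: percentage of the period's fee

interface CanaryVerdict {
  disable: boolean;  // true means roll back to passthrough for this stack
  creditUsd: number; // credit applied to the prepaid balance
}

function evaluateCanary(dailyScores: number[], periodFeeUsd: number): CanaryVerdict {
  let streak = 0;
  for (const score of dailyScores) {
    // Count consecutive days under the floor; any passing day resets the streak.
    streak = score < QUALITY_FLOOR ? streak + 1 : 0;
    if (streak >= BREACH_DAYS) {
      return { disable: true, creditUsd: (periodFeeUsd * CREDIT_PERCENT) / 100 };
    }
  }
  return { disable: false, creditUsd: 0 };
}

console.log(evaluateCanary([0.96, 0.94, 0.93, 0.94], 2920)); // { disable: true, creditUsd: 292 }
console.log(evaluateCanary([0.94, 0.96, 0.94, 0.94], 2920)); // { disable: false, creditUsd: 0 }
```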

FAQ

Does this work with Mastra's string-shorthand model ID ("openai/gpt-4o")?

Not directly. Mastra's string shorthand reads environment variables such as OPENAI_API_KEY and calls the provider's canonical endpoint. Tessera needs a custom baseURL plus a custom X-Tessera-API-Key header, which the string shorthand doesn't expose. Use the AI SDK provider factory form shown in the examples; Mastra accepts it as a first-class alternative.

Does this break Agent tools / structured outputs / streaming / RAG?

No. The Vercel AI SDK provider object that Mastra accepts is unchanged in shape, so agent.generate(), agent.stream(), generateObject, schema-constrained tool calls, and RAG retrieval workflows all work unchanged.

Why are there two API surfaces (tesseraOpenAIConfig vs tesseraOpenAI)?

The config function returns the settings object you spread into createOpenAI(...): explicit, and easy to combine with other settings (organization, custom fetch, etc.). The convenience factory imports createOpenAI for you and pre-merges the Tessera config. Use whichever you find more readable.
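
To show what "a custom baseURL plus a custom X-Tessera-API-Key header" might look like as a spreadable settings object, here is a hypothetical stand-in for the config function. It is not the package's implementation: sketchTesseraConfig, the proxyBaseURL parameter, and the placeholder URL are all invented for this example. The only grounded parts are the header name from the FAQ and the fact that createOpenAI accepts baseURL and headers settings.

```typescript
// Hypothetical settings shape, limited to the two fields the FAQ mentions.
interface ProviderSettings {
  baseURL: string;
  headers: Record<string, string>;
}

// Illustrative stand-in only; the real tesseraOpenAIConfig may differ.
function sketchTesseraConfig(opts: { apiKey: string; proxyBaseURL: string }): ProviderSettings {
  if (!opts.apiKey) throw new Error("Tessera API key is required");
  return {
    baseURL: opts.proxyBaseURL,                    // send requests to the proxy, not the provider
    headers: { "X-Tessera-API-Key": opts.apiKey }, // header name taken from the FAQ
  };
}

// Spread alongside the provider key, mirroring the Quickstart pattern.
const settings = {
  apiKey: "sk-provider-key-placeholder",
  ...sketchTesseraConfig({
    apiKey: "tessera-key-placeholder",
    proxyBaseURL: "https://proxy.example.invalid/v1", // placeholder, not a real endpoint
  }),
};

console.log(settings.baseURL, Object.keys(settings.headers));
```

Returning a plain object is what makes the explicit form easy to combine with other createOpenAI settings: the provider key stays untouched and only the endpoint and headers are overridden.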

Where to go next

Get a free 60M-tokens/month API key — sign-up takes ~30 seconds.
How the mechanic stack works — composition cap, quality SLA, per-stack canary, auto-rollback.
@tessera-llm/mastra on npm
Worked-numbers blog post — $24k → $9.4k, quality canary 0.96.