What is the Tessera Optimize Layer?

The Tessera Optimize Layer is a thin proxy that sits in your LLM request path. Your application points its OpenAI, Anthropic, or Google client at Tessera, and Tessera applies four optimizations before forwarding each request to the provider: auto-route to a cheaper model when quality holds on your golden-set eval; auto-cache identical requests at the edge; auto-compress prompts via a deterministic whitespace and structural pass (LLMLingua-2 template substitution is on the roadmap); and auto-batch where provider batch APIs apply. Every saved dollar is measured directly from proxy logs, not inferred from a billing CSV.

How does Tessera measure savings?

Savings are measured from Tessera proxy logs at request granularity. For each request that gets routed, cached, compressed, or batched, Tessera records the counterfactual provider cost (what the request would have cost without the optimization) and the actual incurred cost. Aggregate Ongoing Savings equal the sum of (counterfactual minus actual) across all in-scope workloads in the period. Provider price moves and unrelated workload shrinkage are excluded; only optimizations attributable to Tessera count toward your reported savings.
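The per-request accounting described above can be sketched in a few lines. This is a minimal illustration, not Tessera's implementation: the `RequestLog` record, its field names, and the `in_scope` flag are hypothetical stand-ins for whatever the proxy logs actually contain.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RequestLog:
    """Hypothetical shape of one proxy log record."""
    workload: str
    counterfactual_usd: float  # what the request would have cost unoptimized
    actual_usd: float          # what was actually incurred
    in_scope: bool = True      # assumption: out-of-scope workloads are flagged


def ongoing_savings(logs):
    """Sum (counterfactual - actual) over in-scope requests in the period."""
    return sum(r.counterfactual_usd - r.actual_usd for r in logs if r.in_scope)


logs = [
    RequestLog("support-agent", 0.50, 0.25),                # routed to a cheaper model
    RequestLog("support-agent", 0.75, 0.0),                 # served from cache
    RequestLog("internal-test", 1.0, 0.5, in_scope=False),  # excluded from the total
]
print(ongoing_savings(logs))  # 1.0
```

Because both costs are recorded on the same request, a provider price change moves the counterfactual and the actual together and does not show up as a saving.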

How is Tessera different from an observability platform?

Observability platforms trace requests and show dashboards. Tessera is an optimize layer: it sits in the request path and actively routes, caches, compresses, and batches on every call. Pricing is a flat monthly subscription by token volume, not per-seat SaaS, and savings are measured directly from Tessera's own proxy logs. Your existing observability tool can stay; it still receives downstream telemetry. Tessera is the layer above whatever vendors you already use, not a replacement for them.

What does Tessera cost?

Flat monthly pricing by gross tokens submitted, with the same feature set on every tier; only the token cap differs. Free Sandbox: 60 million tokens per month, $0, full mechanic stack active. Paid tiers: Starter $199/month (up to 1B tokens), Growth $999/month (up to 5B), Scale $3,999/month (up to 20B), Enterprise custom (20B+). There is no per-token markup and no cut of your savings: savings are measured and you keep 100%. No floor, no retainer, no contract review for activation. Quality preservation is guaranteed at a canary mean-score of 0.95; a three-day breach triggers auto-disable plus a service credit.
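To see which tier a given monthly volume lands in, the price list above can be written as a lookup. This is only a sketch: `tier_for` is a hypothetical helper, and it assumes each "up to" cap is inclusive, which the price list does not state explicitly.

```python
BILLION = 1_000_000_000

# (tier name, monthly token cap, monthly price in USD), taken from the list above
TIERS = [
    ("Free Sandbox", 60_000_000, 0),
    ("Starter", 1 * BILLION, 199),
    ("Growth", 5 * BILLION, 999),
    ("Scale", 20 * BILLION, 3_999),
]


def tier_for(monthly_tokens):
    """Return (tier name, monthly price) for a gross monthly token volume."""
    for name, cap, price in TIERS:
        if monthly_tokens <= cap:  # assumption: caps are inclusive
            return name, price
    return "Enterprise", None      # custom pricing above 20B tokens


print(tier_for(1_200_000_000))  # ('Growth', 999)
```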

Does Tessera modify our production code?

No. Tessera is added to your request path via two HTTP headers and one config-line change — you point your OpenAI/Anthropic/Google client at the Tessera proxy base URL and add an API key. Tessera never modifies your application source, never deploys into your codebase, and never makes provider-side changes on your account. All optimization logic runs inside the Tessera proxy infrastructure. You can disable Tessera at any time by reverting the two headers, or by using the in-dashboard pause control (see next question).

Can I pause Tessera at any time?

Yes. Every operator dashboard ships with an always-available kill-switch, both account-wide and per-workload. When engaged, the Tessera proxy bypasses all four optimizations (route, cache, compress, batch) and forwards your requests to the upstream provider as pure passthrough. You can pause and resume at any time, without notice. The pause right is contractually preserved in §8 of the Tessera Terms of Service, so Tessera never runs in your stack outside your control.

Two primary shapes carry most paid-tier signups: (1) Series A-B AI-native SaaS CTO, $20k-$200k per month on LLM APIs, gross margin under pressure, can change a base-URL in 30 minutes and signs up in 72 hours; (2) Series B-D scale-up adding AI features, $50k-$500k per month, AI Platform Lead owns the budget, light security review in 2-4 weeks. Plus a developer-facing Free Sandbox tier (60M tokens/mo, $0) for solo developers + side projects. Workloads tagged regulated (HIPAA, PCI-DSS, SOC 2 in-scope) never auto-route — the compliance gate blocks routing at the code level. Above 20B gross tokens/mo, Enterprise Clients may negotiate a custom MSA addendum (invoice billing, custom SLO).

How does Tessera handle data privacy?

Confidential Information furnished to Tessera is stored primarily inside the European Economic Area (EEA) under Estonian operating jurisdiction. We do not require, request, or process Client end-user personal data, model prompts, or model completions in their full content form — only token-count and structural metadata. US-based AI provider sub-processors (Anthropic, OpenAI, Google) are engaged solely for Tessera-internal analysis under strict anonymisation conditions per the Data Processing Agreement.

Does Tessera accept referral fees from AI providers?

No. Tessera receives no affiliate revenue, referral fees, kickbacks, sponsorships, advisory-board honoraria, or any other compensation from any AI provider, gateway vendor, or observability platform we recommend in the course of operating the proxy. Client fees are our only income. This is contractually binding under §10 of the Tessera Terms of Service (Vendor Neutrality) and breach permits Clients to terminate without penalty and withdraw their balance.

2026-05-19 · 7 min read

Using Tessera with LangChain in 30 seconds — drop-in cost optimization

If your application is on LangChain and your LLM bill is real enough to care about, here is the shortest path from "I am paying full list price for every token" to "I cut the bill and keep 100% of what I save." It is two lines of code, your existing ChatModel keeps working, and the proxy underneath does the optimization.

The integration (Python)

One install. One line of config in your ChatOpenAI / ChatAnthropic / ChatMistralAI constructor. That is the entire integration.

pip install tessera-langchain

# In your existing app code:

from langchain_openai import ChatOpenAI
from tessera_langchain import tessera_openai_config

llm = ChatOpenAI(
    model="gpt-4o",
    openai_api_key="sk-...",                          # your OpenAI key
    **tessera_openai_config(api_key="tk_..."),       # routes through Tessera
)

# Everything you already wrote with this llm — agents, chains,
# tool calling, structured output, streaming — works unchanged.
response = llm.invoke("Summarize this support ticket in 2 sentences.")

Get a free Tessera API key (60M tokens/month, no card) at tesseraai.io/dev — sign-up takes about 30 seconds and returns an instant tk_… key.

The integration (TypeScript / LangChain.js)

npm install @tessera-llm/langchain

// In your existing app code:

import { ChatOpenAI } from "@langchain/openai";
import { tesseraOpenAIConfig } from "@tessera-llm/langchain";

const llm = new ChatOpenAI({
  model: "gpt-4o",
  apiKey: process.env.OPENAI_API_KEY!,
  ...tesseraOpenAIConfig({ apiKey: process.env.TESSERA_API_KEY! }),
});

// Existing chains, agents, tool calls, streaming — all unchanged.
const response = await llm.invoke("Summarize this support ticket in 2 sentences.");

What changes underneath

Every call your LangChain ChatModel makes now goes to api.tesseraai.io first. The Tessera proxy applies a stack of cost-optimization mechanics in real time before forwarding the request to the provider:

Auto-route to a cheaper-equivalent model when a daily promptfoo canary on your eval set holds > 0.95 mean-score. Stays on the more expensive model otherwise.
Exact + semantic cache on the canonical request body (sha256 hash, 7-day TTL, Cloudflare edge KV).
Provider prompt-cache headers — injects OpenAI cached_tokens_id or Anthropic cache_control: ephemeral markers when a stable system prefix is detected. 50-90% off on cached prefix tokens.
Per-role prompt compression (heuristic whitespace/structural normalization preserving code fences and JSON shapes). Independent toggles for system vs user turns.
Context pruning for long conversations (system + last 8 turns; TF-IDF rerank on RAG attachments).
Output-length ceiling: a daily job computes the p90 of your historical output length per workload and injects max_tokens = p90 × 1.3.
Batch arbitrage on async-tolerant requests (provider Batch APIs run 50% off list).
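Three of the mechanics above are simple enough to sketch directly: the exact-match cache key, context pruning, and the output-length ceiling. The code below is an illustration under stated assumptions, not the proxy's implementation. It assumes "canonical request body" means JSON with sorted keys, that one "turn" is one non-system message, and that p90 is taken by the nearest-rank method; all three function names are hypothetical.

```python
import hashlib
import json


def cache_key(request_body: dict) -> str:
    """sha256 over a canonical JSON form, so key order does not change the hash."""
    canonical = json.dumps(request_body, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def prune_context(messages: list, keep_last: int = 8) -> list:
    """Keep every system message plus the last `keep_last` non-system messages."""
    system = [m for m in messages if m["role"] == "system"]
    rest = [m for m in messages if m["role"] != "system"]
    return system + rest[-keep_last:]


def output_ceiling(output_lengths, headroom: float = 1.3) -> int:
    """max_tokens = p90 of historical output lengths, times the headroom factor."""
    ordered = sorted(output_lengths)
    if not ordered:
        raise ValueError("need at least one historical output length")
    rank = (9 * len(ordered) + 9) // 10  # nearest-rank p90: ceil(0.9 * n)
    return round(ordered[rank - 1] * headroom)


history = [100, 200, 300, 400, 500, 600, 700, 800, 900, 1000]
print(output_ceiling(history))  # p90 is 900, so the ceiling is 1170
```

Two requests that differ only in key order hash to the same cache key, which is the point of canonicalizing before hashing.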

On top of these, a per-provider circuit breaker tracks rolling 5xx rates per upstream and skips degraded providers in auto-route decisions until they recover. (Cross-provider failover — re-routing to a different provider entirely — is on the roadmap, not shipped yet. See /how-it-works → Reliability primitives for current status.)
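A rolling-window breaker of this kind might look like the sketch below. It is illustrative only: the class name, the window size, the 20% threshold, and the minimum sample count are hypothetical values chosen for the example, not Tessera's settings.

```python
from collections import deque


class ProviderBreaker:
    """Tracks recent upstream status codes per provider and flags degraded ones."""

    def __init__(self, window=100, max_5xx_rate=0.2, min_samples=10):
        self.window = window
        self.max_5xx_rate = max_5xx_rate
        self.min_samples = min_samples
        self._recent = {}  # provider -> deque of booleans (True means 5xx)

    def record(self, provider, status_code):
        samples = self._recent.setdefault(provider, deque(maxlen=self.window))
        samples.append(500 <= status_code <= 599)

    def is_degraded(self, provider):
        samples = self._recent.get(provider, ())
        if len(samples) < self.min_samples:
            return False  # not enough data to judge
        return sum(samples) / len(samples) > self.max_5xx_rate

    def healthy(self, providers):
        """Filter the auto-route candidate list down to non-degraded providers."""
        return [p for p in providers if not self.is_degraded(p)]


breaker = ProviderBreaker(window=10, min_samples=5)
for _ in range(10):
    breaker.record("openai", 200)
for code in [503] * 5 + [200] * 5:
    breaker.record("anthropic", code)
print(breaker.healthy(["openai", "anthropic"]))  # ['openai']
```

Because the window is bounded, a provider recovers on its own once enough successful responses push the old failures out.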

What does NOT change

Your eval set. Same promptfoo / langsmith / custom eval. Tessera fires its own daily canary against a 10% sample of production traffic and rolls back the offending mechanic stack if quality drops below 0.95 — but your application-level evals run against the proxied output unchanged.
Tool calling. The proxy passes tools through and respects them for routing decisions (auto-route never picks a fallback model that lacks tool-calling capability when tools are present in the request).
Streaming. SSE / chunked responses pass through. Mid-stream provider 5xx surfaces a terminal [ERROR] marker rather than silently retrying — we never hide failed completions from your app.
Structured output / JSON mode. response_format is preserved on routing decisions; the auto-route table only considers models that support the requested format.
Your provider keys. Your OpenAI / Anthropic / Mistral key stays in your environment. The proxy forwards it upstream untouched. We never store prompts or completions — only token counts, cost deltas, and mechanic-stack metadata.
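The capability gates in the list above (tool calling, response_format, and the 0.95 canary floor) can be pictured as a filter over candidate models. This is a minimal sketch under assumptions: `Candidate`, `pick_model`, the model names, and the cheapest-first ordering are all hypothetical, and the real route table is not documented here.

```python
from dataclasses import dataclass

QUALITY_FLOOR = 0.95  # canary mean-score a cheaper model must exceed


@dataclass(frozen=True)
class Candidate:
    """Hypothetical entry in an auto-route table."""
    name: str
    supports_tools: bool
    supports_response_format: bool
    canary_score: float  # latest daily canary mean-score on this workload


def pick_model(request: dict, original: str, candidates: list) -> str:
    """Return the first eligible cheaper model, or stay on the original."""
    needs_tools = bool(request.get("tools"))
    needs_format = "response_format" in request
    for c in candidates:  # assumption: ordered cheapest first
        if needs_tools and not c.supports_tools:
            continue
        if needs_format and not c.supports_response_format:
            continue
        if c.canary_score > QUALITY_FLOOR:
            return c.name
    return original


candidates = [
    Candidate("cheap-no-tools", False, True, 0.97),
    Candidate("cheap-with-tools", True, True, 0.96),
]
request = {"messages": [], "tools": [{"type": "function"}]}
print(pick_model(request, "gpt-4o", candidates))  # cheap-with-tools
```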

Worked example: a customer-support LangChain agent

Concrete numbers from a real beta workload — customer-support agent on gpt-4o with a top-5 RAG retrieval per turn, 1.2 billion tokens/month aggregate, OpenAI list prices.

Stage	Cost / mo	Saved
LangChain → OpenAI direct (baseline)	$24,000	—
+ LangChain → Tessera (auto-route + cache + prompt-cache headers)	$11,520	$12,480
+ context pruning + output-length ceiling + compression + batch	$9,400	$2,120
Tessera-optimized total	$9,400	$14,600
Tessera subscription (Growth tier, flat)	$999	—
Savings the customer keeps	100%	$14,600
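The table's arithmetic can be checked in a few lines. The dollar figures are the ones in the table; the net-of-subscription figure and the percentage are derived here and are not quoted from the workload report.

```python
baseline = 24_000       # LangChain -> OpenAI direct, USD per month
after_stage_1 = 11_520  # auto-route + cache + prompt-cache headers
after_stage_2 = 9_400   # plus pruning, output ceiling, compression, batch
subscription = 999      # Growth tier, flat

saved_stage_1 = baseline - after_stage_1       # 12,480
saved_stage_2 = after_stage_1 - after_stage_2  # 2,120
total_saved = saved_stage_1 + saved_stage_2    # 14,600
net_of_subscription = total_saved - subscription
reduction_pct = round(100 * total_saved / baseline, 1)

print(saved_stage_1, saved_stage_2, total_saved)  # 12480 2120 14600
print(net_of_subscription, reduction_pct)         # 13601 60.8
```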

Quality canary across the full mechanic stack: mean-score 0.96 (floor 0.95) — 0.95 SLA held all 30 days. The application code is one LangChain ChatOpenAI constructor + the eight-line agent setup they already had. No prompt rewrites. No model swap by hand. Full breakdown by mechanic in the companion post.

Per-provider one-liners

from tessera_langchain import (
    tessera_openai_config,      # → langchain_openai.ChatOpenAI(...)
    tessera_anthropic_config,   # → langchain_anthropic.ChatAnthropic(...)
    tessera_mistral_config,     # → langchain_mistralai.ChatMistralAI(...)
    tessera_groq_config,        # → langchain_groq.ChatGroq(...)
    tessera_cohere_config,      # → langchain_cohere.ChatCohere(...)
)

# Or the generic dispatcher when provider is runtime-parameterized:
from tessera_langchain import tessera_config

cfg = tessera_config("anthropic", api_key="tk_...")

All five helpers are used the same way: **tessera_<provider>_config(api_key=...) (or an object spread in TypeScript). The package ships separate functions per provider because each LangChain ChatModel uses a slightly different field name for the upstream base URL, and each per-provider function returns the correct keyword for its class.

FAQ

Q: Does this break my eval set?

No. Your eval runs against the proxied output unchanged. Tessera additionally fires its own daily quality canary against a 10% production sample for SLA enforcement — that is independent of your eval pipeline.

Q: My agent uses tool calling. Does that still work?

Yes. The proxy passes tools through. Auto-route gates on tool-calling capability — if a cheaper fallback model does not support function calling, the request stays on the original model for that call.

Q: Streaming?

Streamed responses pass through. Cache hits still stream from the proxy edge (fast). Provider failures mid-stream surface a terminal error marker rather than silently retrying.

Q: Can I use this with the LangChain `init_chat_model()` helper?

Should work — pass the Tessera config kwargs as the LangChain init kwargs: init_chat_model("gpt-4o", model_provider="openai", **tessera_openai_config(api_key="tk_...")). This routes through the same constructor path as a direct ChatOpenAI(...) call, but we have not exhaustively tested every model_provider value. File an issue at tessera-llm/tessera-langchain if a specific provider/init combination misbehaves and we will ship a patch.

Q: What if I am using a LangChain provider class not in the list?

The Tessera proxy accepts the OpenAI wire format on api.tesseraai.io/v1/openai — any LangChain provider class that accepts a base_url override and default_headers works (Together, DeepSeek, Fireworks, OpenRouter pass-through, etc.). File an issue if your provider class is missing a first-class helper — we ship them in patch releases.
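For a provider class without a first-class helper, the manual wiring might look like the sketch below. It is an assumption-heavy illustration: the `https://` scheme, the helper name `openai_compatible_kwargs`, and the header name `X-Tessera-Api-Key` are hypothetical placeholders, so check the Tessera docs for the real header names before using it.

```python
def openai_compatible_kwargs(tessera_key: str) -> dict:
    """Build base_url / default_headers kwargs for an OpenAI-wire-format class."""
    return {
        # Host and path are from the answer above; the scheme is assumed.
        "base_url": "https://api.tesseraai.io/v1/openai",
        # Hypothetical header name, used only for illustration.
        "default_headers": {"X-Tessera-Api-Key": tessera_key},
    }


kwargs = openai_compatible_kwargs("tk_...")

# Usage with any class that accepts both keywords, for example:
#   from langchain_openai import ChatOpenAI
#   llm = ChatOpenAI(model="gpt-4o", api_key="sk-...", **kwargs)
```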

Q: How is this different from the main `tessera-sdk`?

Same proxy. Same mechanics. Same billing record. `tessera-sdk` patches the underlying OpenAI / Anthropic / etc. SDK constructors via a one-line tessera.activate(key). `tessera-langchain` wires into LangChain's ChatModel constructors directly. Use whichever fits your codebase. Side-by-side install is supported.

Try Tessera + LangChain on your workload

60M tokens free, no card, one line of config

pip install tessera-langchain (or npm install @tessera-llm/langchain). Wire your existing ChatModel through the Tessera proxy. Kill-switch any time. Flat monthly pricing by token volume — keep 100% of savings.
