What is the Tessera Optimize Layer?

The Tessera Optimize Layer is a thin proxy that lives in your LLM request path. Your application points its OpenAI, Anthropic, or Google client at Tessera; Tessera forwards each request to the provider but first applies four moves: auto-route to a cheaper model when quality holds on your golden-set eval, auto-cache identical requests at the edge, auto-compress prompts via a deterministic whitespace + structural pass (LLMLingua-2 template substitution on roadmap), auto-batch where batch APIs apply. Every saved dollar is measured directly from proxy logs, not inferred from a billing CSV.

How does Tessera measure savings?

Savings are measured from Tessera proxy logs at request granularity. For each request that gets routed, cached, compressed, or batched, Tessera records the counterfactual provider cost (what the request would have cost without the optimization) and the actual incurred cost. Aggregate Ongoing Savings equal the sum of (counterfactual minus actual) across all in-scope workloads in the period. Provider price moves and unrelated workload shrinkage are excluded; only optimizations attributable to Tessera count toward your reported savings.

How is Tessera different from an observability platform?

Observability platforms trace requests and show dashboards. Tessera is an optimize layer — we sit in the request path and actively route, cache, compress, and batch on every call. Pricing is a flat monthly subscription by token volume, not a per-seat SaaS — and savings are measured directly from our own proxy logs. Your existing observability tool can stay — it still receives downstream telemetry. Substrate-include posture: Tessera is the layer above whatever vendors you already use, not their alternative.

What does Tessera cost?

Flat monthly pricing by gross tokens submitted, same feature set on every tier — only the token cap differs. Free Sandbox: 60 million tokens per month, $0, full mechanic stack active. Paid tiers: Starter $199/month (up to 1B tokens), Growth $999/month (up to 5B), Scale $3,999/month (up to 20B), Enterprise custom (20B+). No per-token markup and no cut of your savings — savings are measured and you keep 100%. No floor, no retainer, no contract review for activation. Quality preservation guaranteed at 0.95 by canary; three-day breach triggers auto-disable plus a service credit.

Does Tessera modify our production code?

No. Tessera is added to your request path via two HTTP headers and one config-line change — you point your OpenAI/Anthropic/Google client at the Tessera proxy base URL and add an API key. Tessera never modifies your application source, never deploys into your codebase, and never makes provider-side changes on your account. All optimization logic runs inside the Tessera proxy infrastructure. You can disable Tessera at any time by reverting the two headers, or by using the in-dashboard pause control (see next question).

Can I pause Tessera at any time?

Yes. Every operator dashboard ships with an always-available kill-switch — account-wide and per-workload. When engaged, the Tessera proxy bypasses all four optimizations (route, cache, compress, batch) and forwards your requests to the upstream provider as pure passthrough. Pause is reversible at any time without notice. The pause right is contractually preserved in §8 of the Tessera Terms of Service — Tessera does not work uncontrolled in your stack.

Two primary shapes carry most paid-tier signups: (1) Series A-B AI-native SaaS CTO, $20k-$200k per month on LLM APIs, gross margin under pressure, can change a base-URL in 30 minutes and signs up in 72 hours; (2) Series B-D scale-up adding AI features, $50k-$500k per month, AI Platform Lead owns the budget, light security review in 2-4 weeks. Plus a developer-facing Free Sandbox tier (60M tokens/mo, $0) for solo developers + side projects. Workloads tagged regulated (HIPAA, PCI-DSS, SOC 2 in-scope) never auto-route — the compliance gate blocks routing at the code level. Above 20B gross tokens/mo, Enterprise Clients may negotiate a custom MSA addendum (invoice billing, custom SLO).

How does Tessera handle data privacy?

Confidential Information furnished to Tessera is stored primarily inside the European Economic Area (EEA) under Estonian operating jurisdiction. We do not require, request, or process Client end-user personal data, model prompts, or model completions in their full content form — only token-count and structural metadata. US-based AI provider sub-processors (Anthropic, OpenAI, Google) are engaged solely for Tessera-internal analysis under strict anonymisation conditions per the Data Processing Agreement.

Does Tessera accept referral fees from AI providers?

No. Tessera receives no affiliate revenue, referral fees, kickbacks, sponsorships, advisory-board honoraria, or any other compensation from any AI provider, gateway vendor, or observability platform we recommend in the course of operating the proxy. Client fees are our only income. This is contractually binding under §10 of the Tessera Terms of Service (Vendor Neutrality) and breach permits Clients to terminate without penalty and withdraw their balance.

tesseraIntegrations

Integrations · LangChain ↗

Tessera × LangChain

tessera-langchain is a thin adapter that routes every LangChain LLM call through the Tessera substrate layer. You keep your existing ChatOpenAI, ChatAnthropic, and other constructor calls — the adapter only changes the underlying HTTP endpoint and adds one header. Auto-route, exact + semantic cache, provider prompt cache, compression, output-length ceiling, and batch arbitrage all run on the proxy side, gated by your own promptfoo eval set.

Works with Python (langchain_openai, langchain_anthropic, langchain_mistralai, langchain_groq, langchain_cohere) and JavaScript / TypeScript (@langchain/openai + sibling provider packages). Same Tessera key powers both.

Install

pip install tessera-langchain          # Python
npm install @tessera-llm/langchain     # Node / TypeScript

Get a free Tessera API key at tesseraai.io/dev — 60M tokens/month, no card up front.

Quickstart — Python

from langchain_openai import ChatOpenAI
from tessera_langchain import tessera_openai_config

llm = ChatOpenAI(
    model="gpt-4o",
    openai_api_key="sk-...",                          # your OpenAI key, unchanged
    **tessera_openai_config(api_key="tk_..."),       # one line routes through Tessera
)

response = llm.invoke("Summarize this customer support ticket in 2 sentences.")

Scenario 1 — TypeScript / Node — same shape

The JS adapter mirrors the Python API. Spread tesseraOpenAIConfig()into the constructor options bag — that's the entire diff.

import { ChatOpenAI } from "@langchain/openai";
import { tesseraOpenAIConfig } from "@tessera-llm/langchain";

const llm = new ChatOpenAI({
  model: "gpt-4o",
  apiKey: process.env.OPENAI_API_KEY,
  ...tesseraOpenAIConfig({ apiKey: process.env.TESSERA_API_KEY! }),
});

const response = await llm.invoke("Summarize this customer support ticket in 2 sentences.");

Scenario 2 — Tool-calling customer-support agent

A tool-calling agent on gpt-4o that triages support tickets, queries a knowledge-base tool, and writes a draft response. Auto-route gates on tool-calling capability, so the agent never gets routed to a non-tool-capable model. Exact cache hits on repeat tool inputs return instantly with no upstream cost.

from langchain_openai import ChatOpenAI
from langchain.agents import AgentExecutor, create_tool_calling_agent
from langchain_core.prompts import ChatPromptTemplate
from tessera_langchain import tessera_openai_config

llm = ChatOpenAI(
    model="gpt-4o",
    openai_api_key="sk-...",
    **tessera_openai_config(api_key="tk_..."),
)

prompt = ChatPromptTemplate.from_messages([
    ("system", "You triage support tickets. Use the kb_lookup tool to find prior cases."),
    ("human", "{input}"),
    ("placeholder", "{agent_scratchpad}"),
])
agent = create_tool_calling_agent(llm, [kb_lookup_tool], prompt)
executor = AgentExecutor(agent=agent, tools=[kb_lookup_tool])

Scenario 3 — RAG pipeline with provider prompt cache

A LangChain RetrievalQA chain on claude-sonnet-4-6. Tessera auto-injects Anthropic cache_control: ephemeral markers on the retrieved-context prefix — 90% off cache reads on the long retrieval prefix. RAG workloads with a stable document-corpus prefix typically save 40-70% on top-of-funnel input tokens.

from langchain_anthropic import ChatAnthropic
from langchain.chains import RetrievalQA
from langchain_community.vectorstores import Chroma
from tessera_langchain import tessera_anthropic_config

llm = ChatAnthropic(
    model="claude-sonnet-4-6",
    anthropic_api_key="sk-ant-...",
    **tessera_anthropic_config(api_key="tk_..."),
)

qa = RetrievalQA.from_chain_type(
    llm=llm,
    chain_type="stuff",
    retriever=Chroma(persist_directory="./db").as_retriever(search_kwargs={"k": 5}),
)
answer = qa.invoke({"query": "What does our refund policy say about gift cards?"})

Scenario 4 — LCEL batch — intent classification at scale

When a LangChain LCEL chain runs .batch(...)across many inputs and the SLA allows minutes-of-latency, Tessera's M10 batch arbitrage routes the request to the provider Batch API for 50% off. Per-input quality canary still fires on a 10% sample — batching never bypasses the quality contract.

from langchain_openai import ChatOpenAI
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import StrOutputParser
from tessera_langchain import tessera_openai_config

llm = ChatOpenAI(
    model="gpt-4o-mini",
    openai_api_key="sk-...",
    **tessera_openai_config(api_key="tk_..."),
)

chain = (
    ChatPromptTemplate.from_template("Classify the intent of: {query}")
    | llm
    | StrOutputParser()
)
intents = chain.batch([{"query": q} for q in queries])

Worked savings example

Customer-support agent on gpt-4o, 5B tokens/month at OpenAI list prices.

Stage	Cost / mo	Saved
Baseline — OpenAI direct via LangChain	$24,000	—
+ Tessera (route, cache, prompt-cache, compress, M9 ceiling, batch)	$9,400	$14,600
Tessera subscription (Growth tier — ≤5B tokens/mo)	$999	—
You net pay	$10,399	$13,601 / mo saved

Quality canary held mean-score 0.96 across the full mechanic stack for all 30 days (floor 0.95). Full breakdown in the worked-numbers post.

Architecture and quality contract

The adapter is open-source SDK (Apache-2.0); the proxy at api.tesseraai.iois closed — that's where the mechanic decisions, the canary cron, and the audit-immutable measurement layer run. The split is intentional: the wire format is open so you can audit what we send, the mechanic implementations are the asymmetric IP.

Composition cap: we never stack more than 2 content-mutating mechanics on the same request. Quality floor: per-stack mean eval-score must hold ≥ 0.95 over a rolling 3-day window — if it drops, the stack auto-rolls back to baseline and a 10% credit is issued on the affected fee window. Audit-immutable: every request emits an original-vs-actual cost pair, both snapshot-pinned to a pricing_catalog version captured at request time. Browse it on ledger.tesseraai.io.

FAQ

Does this break my LangChain callbacks / tracing / LangSmith?: No. Tessera operates at the HTTP layer underneath LangChain's ChatModel abstraction — callback handlers, LangSmith tracing, and tool-call event hooks all observe the same input/output pairs they would without the adapter. Tessera-side metadata (mechanics_stack, savings, audit-row id) is returned in response headers and visible in your dashboard at /portal/audit.
What happens if Tessera is down?: Your LangChain app sees an HTTP 5xx. The proxy ships a per-provider circuit breaker that skips degraded upstreams in auto-route decisions; cross- provider failover (M11) is opt-in and re-routes to OpenRouter when the primary upstream returns 5xx / connection error / timeout. Wrap the LLM with your standard LangChain retry / fallback (RunnableWithFallbacks) for belt-and-braces resilience.
Do you store my prompts and completions?: No. We log only token counts, cost deltas, mechanics_stack, and provider response status. Prompts and completions are never persisted. Full data handling on /security.

Where to go next

Get a free 60M-tokens/month API key — sign-up takes ~30 seconds.
How the mechanic stack works — composition cap, quality SLA, per-stack canary, auto-rollback.
tessera-llm/tessera-langchain on GitHub — Apache-2.0, browse the source, file issues, fork.
tessera-langchain on PyPI
@tessera-llm/langchain on npm
Worked-numbers blog post — $24k → $9.4k, quality canary 0.96.