Using Tessera with LangChain in 30 seconds — drop-in cost optimization
If your application is on LangChain and your LLM bill is real enough to care about, here is the shortest path from "I am paying for every token" to "I am paying for measured savings only." It is two lines of code, your existing ChatModel keeps working, and the proxy underneath does the optimization.
The integration (Python)
One install. One line of config in your ChatOpenAI / ChatAnthropic / ChatMistralAI constructor. That is the entire integration.
pip install tessera-langchain
# In your existing app code:
from langchain_openai import ChatOpenAI
from tessera_langchain import tessera_openai_config
llm = ChatOpenAI(
    model="gpt-4o",
    openai_api_key="sk-...",  # your OpenAI key
    **tessera_openai_config(api_key="tk_..."),  # routes through Tessera
)
# Everything you already wrote with this llm — agents, chains,
# tool calling, structured output, streaming — works unchanged.
response = llm.invoke("Summarize this support ticket in 2 sentences.")
Get a free Tessera API key (60M tokens/month, no card) at tesseraai.io/dev — sign-up takes about 30 seconds and returns an instant tk_… key.
The integration (TypeScript / LangChain.js)
npm install @tessera-llm/langchain
// In your existing app code:
import { ChatOpenAI } from "@langchain/openai";
import { tesseraOpenAIConfig } from "@tessera-llm/langchain";
const llm = new ChatOpenAI({
  model: "gpt-4o",
  apiKey: process.env.OPENAI_API_KEY!,
  ...tesseraOpenAIConfig({ apiKey: process.env.TESSERA_API_KEY! }),
});
// Existing chains, agents, tool calls, streaming — all unchanged.
const response = await llm.invoke("Summarize this support ticket in 2 sentences.");
What changes underneath
Every call your LangChain ChatModel makes now goes to api.tesseraai.io first. The Tessera proxy applies a stack of cost-optimization mechanics in real time before forwarding the request to the provider:
- Auto-route to a cheaper-equivalent model when a daily promptfoo canary on your eval set holds > 0.95 mean-score. Stays on the more expensive model otherwise.
- Exact + semantic cache on the canonical request body (sha256 hash, 7-day TTL, Cloudflare edge KV).
- Provider prompt-cache headers: injects OpenAI `cached_tokens_id` or Anthropic `cache_control: ephemeral` markers when a stable system prefix is detected. 50-90% off on cached prefix tokens.
- Per-role prompt compression (heuristic whitespace/structural normalization preserving code fences and JSON shapes). Independent toggles for system vs user turns.
- Context pruning for long conversations (system + last 8 turns; TF-IDF rerank on RAG attachments).
- Output-length ceiling: a daily job computes the p90 of your historical output length per workload and injects `max_tokens = p90 × 1.3`.
- Batch arbitrage on async-tolerant requests (provider Batch APIs run 50% off list).
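To make three of these mechanics concrete, here is a minimal Python sketch of the exact-cache key, context pruning, and the output-length ceiling. It is an illustration under stated assumptions, not Tessera's implementation: the function names, the JSON canonicalization, the percentile method, and the reading of a "turn" as one non-system message are all hypothetical.

```python
import hashlib
import json
import math
import statistics


def cache_key(body):
    """Exact-cache key: sha256 over a canonical encoding of the request body."""
    # Sorted keys and fixed separators are one possible canonical form (assumed).
    canonical = json.dumps(body, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def prune_context(messages, keep_last=8):
    """Keep every system message plus the last `keep_last` non-system messages."""
    system = [m for m in messages if m["role"] == "system"]
    rest = [m for m in messages if m["role"] != "system"]
    # Slice from an explicit start index so keep_last=0 keeps nothing.
    return system + rest[max(0, len(rest) - keep_last):]


def output_ceiling(output_lengths, headroom=1.3):
    """max_tokens = p90 of historical output lengths x 1.3, rounded up."""
    # The "inclusive" percentile method is an assumption; needs >= 2 samples.
    p90 = statistics.quantiles(output_lengths, n=10, method="inclusive")[8]
    return math.ceil(p90 * headroom)
```

Two requests that differ only in key order hash to the same cache key, which is the point of canonicalizing before hashing.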
On top of these, a per-provider circuit breaker tracks rolling 5xx rates per upstream and skips degraded providers in auto-route decisions until they recover. (Cross-provider failover — re-routing to a different provider entirely — is on the roadmap, not shipped yet. See /how-it-works → Reliability primitives for current status.)
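The circuit-breaker behavior can be sketched as a rolling window of recent outcomes per upstream. This is a rough illustration only: the class name, window size, error-rate threshold, and minimum sample count are assumptions, not Tessera's actual values.

```python
from collections import defaultdict, deque


class CircuitBreaker:
    """Tracks a rolling 5xx rate per upstream provider (hypothetical sketch)."""

    def __init__(self, window=20, max_error_rate=0.5, min_samples=5):
        self.max_error_rate = max_error_rate
        self.min_samples = min_samples
        # One bounded deque per provider; old outcomes fall off automatically.
        self._outcomes = defaultdict(lambda: deque(maxlen=window))

    def record(self, provider, status_code):
        """Store True for a 5xx response, False for anything else."""
        self._outcomes[provider].append(500 <= status_code < 600)

    def is_healthy(self, provider):
        outcomes = self._outcomes[provider]
        if len(outcomes) < self.min_samples:
            return True  # too little data to call the provider degraded
        return sum(outcomes) / len(outcomes) <= self.max_error_rate

    def healthy(self, providers):
        """Filter auto-route candidates down to providers that are not degraded."""
        return [p for p in providers if self.is_healthy(p)]
```

Because the window is bounded, a provider recovers on its own once enough successful responses push the failures out of the window.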
What does NOT change
- Your eval set. Same promptfoo / langsmith / custom eval. Tessera fires its own daily canary against a 10% sample of production traffic and rolls back the offending mechanic stack if quality drops below 0.95 — but your application-level evals run against the proxied output unchanged.
- Tool calling. The proxy passes `tools` through and respects them for routing decisions (auto-route never picks a fallback model that lacks tool-calling capability when tools are present in the request).
- Streaming. SSE / chunked responses pass through. A mid-stream provider 5xx surfaces a terminal `[ERROR]` marker rather than silently retrying; we never hide failed completions from your app.
- Structured output / JSON mode. `response_format` is preserved on routing decisions; the auto-route table only considers models that support the requested format.
- Your provider keys. Your OpenAI / Anthropic / Mistral key stays in your environment. The proxy forwards it upstream untouched. We never store prompts or completions, only token counts, cost deltas, and mechanic-stack metadata.
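The capability gate described for tool calling and structured output can be sketched as a simple filter. This is a hypothetical illustration: the function names, the capability labels, and the shape of the capability table are assumptions, and the model names in any table you pass in are placeholders.

```python
def required_capabilities(request):
    """Derive what a fallback model must support from the request itself."""
    needed = set()
    if request.get("tools"):
        needed.add("tools")
    if request.get("response_format"):
        needed.add("response_format")
    return needed


def pick_model(request, cheaper_candidates, capabilities):
    """Return the first cheaper model that supports everything the request
    needs; otherwise stay on the model the caller asked for."""
    needed = required_capabilities(request)
    for model in cheaper_candidates:  # assumed to be ordered cheapest first
        if needed <= capabilities.get(model, set()):
            return model
    return request["model"]
```

A request with no `tools` and no `response_format` needs nothing special, so any candidate qualifies; a request carrying `tools` stays on its original model unless a cheaper candidate lists tool support.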
Worked example: a customer-support LangChain agent
Concrete numbers from a real beta workload — customer-support agent on gpt-4o with a top-5 RAG retrieval per turn, 1.2 billion tokens/month aggregate, OpenAI list prices.
| Stage | Cost / mo | Saved |
|---|---|---|
| LangChain → OpenAI direct (baseline) | $24,000 | — |
| + LangChain → Tessera (auto-route + cache + prompt-cache headers) | $11,520 | $12,480 |
| + context pruning + output-length ceiling (M9) + compression + batch | $9,400 | $2,120 |
| Tessera-optimized total | $9,400 | $14,600 |
| Tessera fee (20% × savings) | $2,920 | — |
| Customer net pay | $12,320 | $11,680 |
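The arithmetic behind the table can be checked in a few lines. The dollar figures and the 20% fee rate come from the table; the variable names are only for illustration.

```python
FEE_RATE_PERCENT = 20  # Tessera fee: 20% of measured savings

baseline = 24_000       # LangChain -> OpenAI direct
after_stage_1 = 11_520  # auto-route + cache + prompt-cache headers
after_stage_2 = 9_400   # + pruning + output ceiling + compression + batch

saved_stage_1 = baseline - after_stage_1       # 12,480
saved_stage_2 = after_stage_1 - after_stage_2  # 2,120
total_saved = saved_stage_1 + saved_stage_2    # 14,600

fee = total_saved * FEE_RATE_PERCENT // 100    # 2,920
net_pay = after_stage_2 + fee                  # 12,320
net_saved = baseline - net_pay                 # 11,680
```

The last row of the table is the optimized provider cost plus the fee, so the customer's net saving is the total saving minus the fee.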
Quality canary across the full mechanic stack: mean score 0.96 against a 0.95 floor, with the SLA held on all 30 days. The application code is one LangChain ChatOpenAI constructor plus the eight-line agent setup the team already had. No prompt rewrites. No model swap by hand. Full breakdown by mechanic in the companion post.
Per-provider one-liners
from tessera_langchain import (
    tessera_openai_config,     # → langchain_openai.ChatOpenAI(...)
    tessera_anthropic_config,  # → langchain_anthropic.ChatAnthropic(...)
    tessera_mistral_config,    # → langchain_mistralai.ChatMistralAI(...)
    tessera_groq_config,       # → langchain_groq.ChatGroq(...)
    tessera_cohere_config,     # → langchain_cohere.ChatCohere(...)
)
# Or the generic dispatcher when provider is runtime-parameterized:
from tessera_langchain import tessera_config
cfg = tessera_config("anthropic", api_key="tk_...")
All five constructors accept the same shape: **tessera_<provider>_config(api_key=...) (or spread in TypeScript). The package ships separate functions per provider because each LangChain ChatModel uses a slightly different field name for the upstream base URL — the per-provider functions return the correct keyword for each.
FAQ
Q: Does this break my eval set?
No. Your eval runs against the proxied output unchanged. Tessera additionally fires its own daily quality canary against a 10% production sample for SLA enforcement — that is independent of your eval pipeline.
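As a rough picture of what such a canary check involves, the sketch below samples about 10% of requests and compares the mean score to the 0.95 floor. Only those two numbers come from this post; the hash-based sampling, the function names, and the use of a plain mean are assumptions, not a description of Tessera's pipeline.

```python
import hashlib
import statistics

QUALITY_FLOOR = 0.95  # floor quoted in this post
SAMPLE_PERCENT = 10   # canary sample size quoted in this post


def in_canary_sample(request_id):
    """Deterministically place roughly 10% of request IDs in the sample."""
    # Hash-bucket sampling is an assumed technique, chosen for repeatability.
    digest = hashlib.sha256(request_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % 100 < SAMPLE_PERCENT


def canary_holds(scores):
    """True while the mean eval score has not dropped below the floor."""
    # Expects at least one score; fmean raises StatisticsError on empty input.
    return statistics.fmean(scores) >= QUALITY_FLOOR
```

A failing check is what would trigger the rollback of the offending mechanic stack; your own eval pipeline never needs to call anything like this.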
Q: My agent uses tool calling. Does that still work?
Yes. The proxy passes tools through. Auto-route gates on tool-calling capability — if a cheaper fallback model does not support function calling, the request stays on the original model for that call.
Q: Streaming?
Streamed responses pass through. Cache hits still stream from the proxy edge (fast). Provider failures mid-stream surface a terminal error marker rather than silently retrying.
Q: Can I use this with the LangChain `init_chat_model()` helper?
Should work — pass the Tessera config kwargs as the LangChain init kwargs: init_chat_model("gpt-4o", model_provider="openai", **tessera_openai_config(api_key="tk_...")). This routes through the same constructor path as a direct ChatOpenAI(...) call, but we have not exhaustively tested every model_provider value. File an issue at tessera-llm/tessera-langchain if a specific provider/init combination misbehaves and we will ship a patch.
Q: What if I am using a LangChain provider class not in the list?
The Tessera proxy accepts the OpenAI wire format on api.tesseraai.io/v1/openai — any LangChain provider class that accepts a base_url override and default_headers works (Together, DeepSeek, Fireworks, OpenRouter pass-through, etc.). File an issue if your provider class is missing a first-class helper — we ship them in patch releases.
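If you want to wire an unlisted provider class by hand, the shape of the override looks roughly like the sketch below. The endpoint is the one named above (the https scheme is assumed); the helper name and the header name are hypothetical, so check the package docs for how the Tessera key is actually sent before relying on this.

```python
# Endpoint from the answer above; the https scheme is an assumption.
TESSERA_OPENAI_BASE_URL = "https://api.tesseraai.io/v1/openai"


def manual_tessera_kwargs(tessera_key, header_name):
    """Build constructor kwargs for a provider class that accepts
    `base_url` and `default_headers` (hypothetical helper)."""
    return {
        "base_url": TESSERA_OPENAI_BASE_URL,
        # header_name is a placeholder: the real header is not given in this post.
        "default_headers": {header_name: tessera_key},
    }


kwargs = manual_tessera_kwargs("tk_...", header_name="X-Example-Tessera-Key")
# Then spread into your provider class, for example:
# SomeOpenAICompatibleChatModel(model="...", api_key="...", **kwargs)
```

The first-class helpers exist so you do not have to maintain this mapping yourself; treat the manual route as a stopgap until a helper ships for your provider.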
Q: How is this different from the main `tessera-sdk`?
Same proxy. Same mechanics. Same billing record. `tessera-sdk` patches the underlying OpenAI / Anthropic / etc. SDK constructors via a one-line `tessera.activate(key)`. `tessera-langchain` wires into LangChain's ChatModel constructors directly. Use whichever fits your codebase. Side-by-side install is supported.
References
- tessera-langchain on GitHub (Apache-2.0 SDK)
- PyPI: tessera-langchain · npm: @tessera-llm/langchain
- How the mechanics + per-stack canary + auto-rollback work
- Worked example: $24k → $9.4k stage-by-stage
- The prompt bloat you don't see — why system-prompt size matters
- Cross-provider failover at the edge — M11 architecture write-up
Try Tessera + LangChain on your workload
60M tokens free, no card, one line of config
pip install tessera-langchain (or npm install @tessera-llm/langchain). Wire your existing ChatModel through the Tessera proxy. Kill-switch any time. Pay 20% only on measured savings.