Tessera × LangChain
tessera-langchain is a thin adapter that routes every LangChain LLM call through the Tessera substrate layer. You keep your existing ChatOpenAI, ChatAnthropic, and other constructor calls — the adapter only changes the underlying HTTP endpoint and adds one header. Auto-route, exact + semantic cache, provider prompt cache, compression, output-length ceiling, and batch arbitrage all run on the proxy side, gated by your own promptfoo eval set.
Works with Python (langchain_openai, langchain_anthropic, langchain_mistralai, langchain_groq, langchain_cohere) and JavaScript / TypeScript (@langchain/openai + sibling provider packages). Same Tessera key powers both.
Install
pip install tessera-langchain # Python
npm install @tessera-llm/langchain # Node / TypeScriptGet a free Tessera API key at tesseraai.io/dev — 60M tokens/month, no card up front.
Quickstart — Python
from langchain_openai import ChatOpenAI
from tessera_langchain import tessera_openai_config
llm = ChatOpenAI(
model="gpt-4o",
openai_api_key="sk-...", # your OpenAI key, unchanged
**tessera_openai_config(api_key="tk_..."), # one line routes through Tessera
)
response = llm.invoke("Summarize this customer support ticket in 2 sentences.")Scenario 1 — TypeScript / Node — same shape
The JS adapter mirrors the Python API. Spread tesseraOpenAIConfig()into the constructor options bag — that's the entire diff.
import { ChatOpenAI } from "@langchain/openai";
import { tesseraOpenAIConfig } from "@tessera-llm/langchain";
const llm = new ChatOpenAI({
model: "gpt-4o",
apiKey: process.env.OPENAI_API_KEY,
...tesseraOpenAIConfig({ apiKey: process.env.TESSERA_API_KEY! }),
});
const response = await llm.invoke("Summarize this customer support ticket in 2 sentences.");Scenario 2 — Tool-calling customer-support agent
A tool-calling agent on gpt-4o that triages support tickets, queries a knowledge-base tool, and writes a draft response. Auto-route gates on tool-calling capability, so the agent never gets routed to a non-tool-capable model. Exact cache hits on repeat tool inputs return instantly with no upstream cost.
from langchain_openai import ChatOpenAI
from langchain.agents import AgentExecutor, create_tool_calling_agent
from langchain_core.prompts import ChatPromptTemplate
from tessera_langchain import tessera_openai_config
llm = ChatOpenAI(
model="gpt-4o",
openai_api_key="sk-...",
**tessera_openai_config(api_key="tk_..."),
)
prompt = ChatPromptTemplate.from_messages([
("system", "You triage support tickets. Use the kb_lookup tool to find prior cases."),
("human", "{input}"),
("placeholder", "{agent_scratchpad}"),
])
agent = create_tool_calling_agent(llm, [kb_lookup_tool], prompt)
executor = AgentExecutor(agent=agent, tools=[kb_lookup_tool])Scenario 3 — RAG pipeline with provider prompt cache
A LangChain RetrievalQA chain on claude-sonnet-4-6. Tessera auto-injects Anthropic cache_control: ephemeral markers on the retrieved-context prefix — 90% off cache reads on the long retrieval prefix. RAG workloads with a stable document-corpus prefix typically save 40-70% on top-of-funnel input tokens.
from langchain_anthropic import ChatAnthropic
from langchain.chains import RetrievalQA
from langchain_community.vectorstores import Chroma
from tessera_langchain import tessera_anthropic_config
llm = ChatAnthropic(
model="claude-sonnet-4-6",
anthropic_api_key="sk-ant-...",
**tessera_anthropic_config(api_key="tk_..."),
)
qa = RetrievalQA.from_chain_type(
llm=llm,
chain_type="stuff",
retriever=Chroma(persist_directory="./db").as_retriever(search_kwargs={"k": 5}),
)
answer = qa.invoke({"query": "What does our refund policy say about gift cards?"})Scenario 4 — LCEL batch — intent classification at scale
When a LangChain LCEL chain runs .batch(...)across many inputs and the SLA allows minutes-of-latency, Tessera's M10 batch arbitrage routes the request to the provider Batch API for 50% off. Per-input quality canary still fires on a 10% sample — batching never bypasses the quality contract.
from langchain_openai import ChatOpenAI
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import StrOutputParser
from tessera_langchain import tessera_openai_config
llm = ChatOpenAI(
model="gpt-4o-mini",
openai_api_key="sk-...",
**tessera_openai_config(api_key="tk_..."),
)
chain = (
ChatPromptTemplate.from_template("Classify the intent of: {query}")
| llm
| StrOutputParser()
)
intents = chain.batch([{"query": q} for q in queries])Worked savings example
Customer-support agent on gpt-4o, 5B tokens/month at OpenAI list prices.
| Stage | Cost / mo | Saved |
|---|---|---|
| Baseline — OpenAI direct via LangChain | $24,000 | — |
| + Tessera (route, cache, prompt-cache, compress, M9 ceiling, batch) | $9,400 | $14,600 |
| Tessera fee (20% × measured savings) | $2,920 | — |
| You net pay | $12,320 | $11,680 / mo saved |
Quality canary held mean-score 0.96 across the full mechanic stack for all 30 days (floor 0.95). Full breakdown in the worked-numbers post.
Architecture and quality contract
The adapter is open-source SDK (Apache-2.0); the proxy at api.tesseraai.iois closed — that's where the mechanic decisions, the canary cron, and the audit-immutable measurement layer run. The split is intentional: the wire format is open so you can audit what we send, the mechanic implementations are the asymmetric IP.
Composition cap: we never stack more than 2 content-mutating mechanics on the same request. Quality floor: per-stack mean eval-score must hold ≥ 0.95 over a rolling 3-day window — if it drops, the stack auto-rolls back to baseline and a 10% credit is issued on the affected fee window. Audit-immutable: every request emits an original-vs-actual cost pair, both snapshot-pinned to a pricing_catalog version captured at request time. Browse it on ledger.tesseraai.io.
FAQ
- Does this break my LangChain callbacks / tracing / LangSmith?
- No. Tessera operates at the HTTP layer underneath LangChain's ChatModel abstraction — callback handlers, LangSmith tracing, and tool-call event hooks all observe the same input/output pairs they would without the adapter. Tessera-side metadata (mechanics_stack, savings, audit-row id) is returned in response headers and visible in your dashboard at
/portal/audit. - What happens if Tessera is down?
- Your LangChain app sees an HTTP 5xx. The proxy ships a per-provider circuit breaker that skips degraded upstreams in auto-route decisions; cross- provider failover (M11) is opt-in and re-routes to OpenRouter when the primary upstream returns 5xx / connection error / timeout. Wrap the LLM with your standard LangChain retry / fallback (
RunnableWithFallbacks) for belt-and-braces resilience. - Do you store my prompts and completions?
- No. We log only token counts, cost deltas, mechanics_stack, and provider response status. Prompts and completions are never persisted. Full data handling on /security.
Where to go next
- Get a free 60M-tokens/month API key — sign-up takes ~30 seconds.
- How the mechanic stack works — composition cap, quality SLA, per-stack canary, auto-rollback.
- tessera-llm/tessera-langchain on GitHub — Apache-2.0, browse the source, file issues, fork.
- tessera-langchain on PyPI
- @tessera-llm/langchain on npm
- Worked-numbers blog post — $24k → $9.4k, quality canary 0.96.