Tessera × LangChain

tessera-langchain is a thin adapter that routes every LangChain LLM call through the Tessera substrate layer. You keep your existing ChatOpenAI, ChatAnthropic, and other constructor calls — the adapter only changes the underlying HTTP endpoint and adds one header. Auto-route, exact + semantic cache, provider prompt cache, compression, output-length ceiling, and batch arbitrage all run on the proxy side, gated by your own promptfoo eval set.

Works with Python (langchain_openai, langchain_anthropic, langchain_mistralai, langchain_groq, langchain_cohere) and JavaScript / TypeScript (@langchain/openai + sibling provider packages). Same Tessera key powers both.

Install

pip install tessera-langchain          # Python
npm install @tessera-llm/langchain     # Node / TypeScript

Get a free Tessera API key at tesseraai.io/dev — 60M tokens/month, no card up front.

Quickstart — Python

from langchain_openai import ChatOpenAI
from tessera_langchain import tessera_openai_config

llm = ChatOpenAI(
    model="gpt-4o",
    openai_api_key="sk-...",                          # your OpenAI key, unchanged
    **tessera_openai_config(api_key="tk_..."),       # one line routes through Tessera
)

response = llm.invoke("Summarize this customer support ticket in 2 sentences.")
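
The introduction says the adapter only changes the underlying HTTP endpoint and adds one header. As a rough mental model, and not the package's actual source, the helper could return a small dict of ChatOpenAI keyword arguments. The function name sketch_openai_config, the /v1 path, and the X-Tessera-Key header name below are assumptions made for illustration; only the api.tesseraai.io host appears on this page.

```python
def sketch_openai_config(api_key: str, base_url: str = "https://api.tesseraai.io/v1") -> dict:
    """Hypothetical stand-in for tessera_openai_config (illustration only)."""
    if not api_key:
        raise ValueError("a Tessera API key is required")
    return {
        # ChatOpenAI accepts base_url to override the default OpenAI endpoint.
        "base_url": base_url,
        # ChatOpenAI accepts default_headers; this header name is assumed.
        "default_headers": {"X-Tessera-Key": api_key},
    }


config = sketch_openai_config(api_key="tk_example")
print(sorted(config))
```

Because the result is a plain dict, spreading it with ** leaves model, openai_api_key, and every other constructor argument untouched.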

Scenario 1 — TypeScript / Node — same shape

The JS adapter mirrors the Python API. Spread tesseraOpenAIConfig() into the constructor options bag; that is the entire diff.

import { ChatOpenAI } from "@langchain/openai";
import { tesseraOpenAIConfig } from "@tessera-llm/langchain";

const llm = new ChatOpenAI({
  model: "gpt-4o",
  apiKey: process.env.OPENAI_API_KEY,
  ...tesseraOpenAIConfig({ apiKey: process.env.TESSERA_API_KEY! }),
});

const response = await llm.invoke("Summarize this customer support ticket in 2 sentences.");

Scenario 2 — Tool-calling customer-support agent

A tool-calling agent on gpt-4o that triages support tickets, queries a knowledge-base tool, and writes a draft response. Auto-route gates on tool-calling capability, so the agent never gets routed to a non-tool-capable model. Exact cache hits on repeat tool inputs return instantly with no upstream cost.

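from langchain_openai import ChatOpenAI
from langchain.agents import AgentExecutor, create_tool_calling_agent
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.tools import tool
from tessera_langchain import tessera_openai_config

# Placeholder tool so the snippet runs; the body is a stand-in, swap in your real knowledge-base search.
@tool
def kb_lookup_tool(query: str) -> str:
    """Look up prior support cases that match the query."""
    return "No prior cases found."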

llm = ChatOpenAI(
    model="gpt-4o",
    openai_api_key="sk-...",
    **tessera_openai_config(api_key="tk_..."),
)

prompt = ChatPromptTemplate.from_messages([
    ("system", "You triage support tickets. Use the kb_lookup tool to find prior cases."),
    ("human", "{input}"),
    ("placeholder", "{agent_scratchpad}"),
])
agent = create_tool_calling_agent(llm, [kb_lookup_tool], prompt)
executor = AgentExecutor(agent=agent, tools=[kb_lookup_tool])
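
An exact-cache hit on repeat tool inputs requires two requests to map to the same key. The sketch below shows one common way to build such a key, by hashing a canonical JSON form of the request. It illustrates the idea only and is not Tessera's implementation; exact_cache_key, cached_call, and the in-memory cache dict are hypothetical names.

```python
import hashlib
import json
from typing import Callable, Dict, List, Optional


def exact_cache_key(model: str, messages: List[dict], tools: Optional[List[dict]] = None) -> str:
    """Hash a canonical JSON form so identical requests map to the same key."""
    payload = {"model": model, "messages": messages, "tools": tools or []}
    # sort_keys makes the key independent of dict insertion order.
    canonical = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


cache: Dict[str, str] = {}


def cached_call(model: str, messages: List[dict], call_upstream: Callable[[str, List[dict]], str]) -> str:
    """Return a stored completion on an exact match, else call upstream once."""
    key = exact_cache_key(model, messages)
    if key not in cache:
        cache[key] = call_upstream(model, messages)
    return cache[key]
```

Any byte-level difference in the messages (a timestamp, a reordered list) produces a different key, which is why stable tool inputs matter for hit rate.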

Scenario 3 — RAG pipeline with provider prompt cache

A LangChain RetrievalQA chain on claude-sonnet-4-6. Tessera auto-injects Anthropic cache_control: ephemeral markers on the retrieved-context prefix — 90% off cache reads on the long retrieval prefix. RAG workloads with a stable document-corpus prefix typically save 40-70% on top-of-funnel input tokens.

from langchain_anthropic import ChatAnthropic
from langchain.chains import RetrievalQA
from langchain_community.vectorstores import Chroma
from tessera_langchain import tessera_anthropic_config

llm = ChatAnthropic(
    model="claude-sonnet-4-6",
    anthropic_api_key="sk-ant-...",
    **tessera_anthropic_config(api_key="tk_..."),
)

qa = RetrievalQA.from_chain_type(
    llm=llm,
    chain_type="stuff",
    retriever=Chroma(persist_directory="./db").as_retriever(search_kwargs={"k": 5}),
)
answer = qa.invoke({"query": "What does our refund policy say about gift cards?"})

Scenario 4 — LCEL batch — intent classification at scale

When a LangChain LCEL chain runs .batch(...) across many inputs and the SLA tolerates minutes of latency, Tessera's M10 batch arbitrage routes the requests to the provider Batch API for 50% off. The per-input quality canary still fires on a 10% sample, so batching never bypasses the quality contract.

from langchain_openai import ChatOpenAI
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import StrOutputParser
from tessera_langchain import tessera_openai_config

llm = ChatOpenAI(
    model="gpt-4o-mini",
    openai_api_key="sk-...",
    **tessera_openai_config(api_key="tk_..."),
)

chain = (
    ChatPromptTemplate.from_template("Classify the intent of: {query}")
    | llm
    | StrOutputParser()
)
queries = ["Where is my order?", "I want a refund."]  # example inputs; replace with your own
intents = chain.batch([{"query": q} for q in queries])
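
The 10% canary sample mentioned above can be pictured as a deterministic, hash-based selection, so the same request is always either in or out of the sample. This is a sketch of that general technique under assumed names (in_canary_sample, request ids of the form req-N); how Tessera actually selects its sample is not documented here.

```python
import hashlib


def in_canary_sample(request_id: str, rate: float = 0.10) -> bool:
    """Deterministically select roughly `rate` of requests by hashing their id."""
    digest = hashlib.sha256(request_id.encode("utf-8")).digest()
    # Map the first 8 bytes of the digest to a number in [0, 1).
    bucket = int.from_bytes(digest[:8], "big") / 2**64
    return bucket < rate


sampled = [rid for rid in (f"req-{i}" for i in range(10_000)) if in_canary_sample(rid)]
print(len(sampled))
```

Hashing the id rather than calling a random generator keeps the decision reproducible, which helps when an audit needs to explain why a given input was scored.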

Worked savings example

Customer-support agent on gpt-4o, 5B tokens/month at OpenAI list prices.

| Stage | Cost / mo | Saved |
| --- | --- | --- |
| Baseline: OpenAI direct via LangChain | $24,000 | |
| + Tessera (route, cache, prompt-cache, compress, M9 ceiling, batch) | $9,400 | $14,600 |
| Tessera fee (20% × measured savings) | $2,920 | |
| You net pay | $12,320 | $11,680 / mo saved |
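
The arithmetic behind the table can be checked in a few lines. The function name net_savings and its field names are illustrative; the inputs are the figures from the table, and the fee is taken as 20% of measured savings as stated there.

```python
def net_savings(baseline: float, optimized: float, fee_percent: float = 20) -> dict:
    """Reproduce the table: the fee is a percentage of measured savings."""
    gross_saved = baseline - optimized
    fee = gross_saved * fee_percent / 100
    net_pay = optimized + fee
    return {
        "gross_saved": gross_saved,
        "fee": fee,
        "net_pay": net_pay,
        "net_saved": baseline - net_pay,
    }


result = net_savings(baseline=24_000, optimized=9_400)
print(result)
```

Since the fee is a fixed share of savings, the net saving is always 80% of the gross saving: $14,600 × 0.8 = $11,680.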

The quality canary held a mean score of 0.96 across the full mechanic stack for all 30 days (floor 0.95). Full breakdown in the worked-numbers post.

Architecture and quality contract

The adapter is an open-source SDK (Apache-2.0); the proxy at api.tesseraai.io is closed. That is where the mechanic decisions, the canary cron, and the audit-immutable measurement layer run. The split is intentional: the wire format is open so you can audit what we send, while the mechanic implementations are the asymmetric IP.

Composition cap: we never stack more than 2 content-mutating mechanics on the same request. Quality floor: per-stack mean eval-score must hold ≥ 0.95 over a rolling 3-day window — if it drops, the stack auto-rolls back to baseline and a 10% credit is issued on the affected fee window. Audit-immutable: every request emits an original-vs-actual cost pair, both snapshot-pinned to a pricing_catalog version captured at request time. Browse it on ledger.tesseraai.io.
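
The two rules above, the composition cap and the rolling quality floor, can be sketched as simple checks. This is a minimal illustration, not the proxy's code: the class and function names are hypothetical, the window is modeled as three equally weighted daily means, and which mechanics count as content-mutating is an assumption passed in by the caller.

```python
from collections import deque
from statistics import mean
from typing import List, Set

MAX_MUTATING_MECHANICS = 2  # composition cap from the text
QUALITY_FLOOR = 0.95        # rolling mean must stay at or above this
WINDOW_DAYS = 3             # rolling window length from the text


def within_composition_cap(stack: List[str], mutating: Set[str]) -> bool:
    """True if the stack uses at most the allowed number of content-mutating mechanics."""
    return sum(1 for mechanic in stack if mechanic in mutating) <= MAX_MUTATING_MECHANICS


class QualityGate:
    """Track daily mean eval scores and flag when the rolling mean drops below the floor."""

    def __init__(self) -> None:
        self.daily_scores = deque(maxlen=WINDOW_DAYS)

    def record_day(self, score: float) -> bool:
        """Add one day's mean score; return True if the stack should roll back."""
        self.daily_scores.append(score)
        return mean(self.daily_scores) < QUALITY_FLOOR


gate = QualityGate()
rollbacks = [gate.record_day(score) for score in (0.97, 0.96, 0.90)]
print(rollbacks)
```

In this sketch a single bad day can trigger a rollback once it pulls the three-day mean under 0.95, while one mildly low day surrounded by strong ones does not.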

FAQ

Does this break my LangChain callbacks / tracing / LangSmith?
No. Tessera operates at the HTTP layer underneath LangChain's ChatModel abstraction — callback handlers, LangSmith tracing, and tool-call event hooks all observe the same input/output pairs they would without the adapter. Tessera-side metadata (mechanics_stack, savings, audit-row id) is returned in response headers and visible in your dashboard at /portal/audit.
What happens if Tessera is down?
Your LangChain app sees an HTTP 5xx. The proxy ships a per-provider circuit breaker that skips degraded upstreams in auto-route decisions; cross-provider failover (M11) is opt-in and re-routes to OpenRouter when the primary upstream returns 5xx / connection error / timeout. Wrap the LLM with your standard LangChain retry / fallback (RunnableWithFallbacks) for belt-and-braces resilience.
Do you store my prompts and completions?
No. We log only token counts, cost deltas, mechanics_stack, and provider response status. Prompts and completions are never persisted. Full data handling on /security.
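
For the outage question above, the retry-then-fallback pattern can be sketched in plain Python. The function invoke_with_fallbacks and its parameters are hypothetical and only show the control flow; each callable stands in for a model invocation, for example one routed through Tessera and one that calls the provider directly.

```python
import time
from typing import Callable, Optional, Sequence


def invoke_with_fallbacks(
    prompt: str,
    primary: Callable[[str], str],
    fallbacks: Sequence[Callable[[str], str]],
    retries: int = 2,
    backoff_seconds: float = 0.5,
) -> str:
    """Retry the primary callable, then try each fallback once, in order."""
    last_error: Optional[Exception] = None
    for attempt in range(retries):
        try:
            return primary(prompt)
        except Exception as exc:  # in real code, catch only transport and 5xx errors
            last_error = exc
            time.sleep(backoff_seconds * (2 ** attempt))  # exponential backoff
    for fallback in fallbacks:
        try:
            return fallback(prompt)
        except Exception as exc:
            last_error = exc
    raise RuntimeError("all models failed") from last_error
```

In LangChain itself the same shape is available on any Runnable, for example llm.with_retry(stop_after_attempt=2).with_fallbacks([direct_llm]), where direct_llm would be a model constructed without the Tessera config.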

Where to go next