Tessera × CrewAI
tessera-crewai wires Tessera's substrate proxy into CrewAI multi-agent flows. Factory functions return a pre-wired CrewAI LLM instance ready to assign to any Agent; the explicit LLM(...) + config-kwargs pattern is also supported for cases that need fine-grained control.
Multi-agent crews are the workload class that benefits most from caching. Crews routinely re-issue identical sub-prompts across the planning → execution → review loop. Exact-match cache (M2) and semantic cache (M5) catch those returns at zero upstream cost. Auto-route promotes triage / classification agents from gpt-4o down to gpt-4o-miniwhen the canary confirms equivalent output on the customer's eval set.
Install
pip install tessera-crewaiGet a free Tessera API key at tesseraai.io/dev — 60M tokens/month, no card up front.
Quickstart — OpenAI
from crewai import Agent, Crew, Task
from tessera_crewai import tessera_openai_llm
llm = tessera_openai_llm(
model="gpt-4o",
openai_api_key="sk-...", # your OpenAI key
tessera_api_key="tk_...", # get a free one at tesseraai.io/dev
)
researcher = Agent(
role="Senior Researcher",
goal="Uncover cutting-edge developments in AI infrastructure",
backstory="You are a seasoned researcher with deep industry knowledge.",
llm=llm,
)
# Rest of your CrewAI code runs unchanged — Crew, Task, kickoff() all
# route through Tessera and benefit from auto-optimization.Scenario 1 — Anthropic-backed analyst agent
Same factory shape, different provider. Use tessera_anthropic_llm in place of tessera_openai_llm. Tessera auto-injects Anthropic cache_control: ephemeral markers on long stable system prompts — useful when agent backstories run long.
from crewai import Agent
from tessera_crewai import tessera_anthropic_llm
llm = tessera_anthropic_llm(
model="claude-sonnet-4-6",
anthropic_api_key="sk-ant-...",
tessera_api_key="tk_...",
)
analyst = Agent(
role="Senior Analyst",
goal="Synthesize research into investor-ready briefings",
backstory="You have ten years of equity research experience.",
llm=llm,
)Scenario 2 — Mixed-provider crew with tiered model assignment
Triage on a cheap model, deep-reasoning on a capable model — one Tessera key for both. Per-workload auto-route + quality canary fire independently for each agent role. Billing rolls up across the crew to one monthly reading.
from crewai import Agent, Crew, Task, Process
from tessera_crewai import tessera_openai_llm, tessera_anthropic_llm
router_llm = tessera_openai_llm(
model="gpt-4o-mini", # cheap classifier
openai_api_key="sk-...",
tessera_api_key="tk_...",
)
reasoner_llm = tessera_anthropic_llm(
model="claude-sonnet-4-6", # deep reasoning
anthropic_api_key="sk-ant-...",
tessera_api_key="tk_...",
)
triage = Agent(role="Triage", goal="Classify ticket priority", llm=router_llm)
investigate = Agent(role="Investigator", goal="Dig into root cause", llm=reasoner_llm)
resolve = Agent(role="Resolver", goal="Write customer-facing reply", llm=reasoner_llm)
crew = Crew(
agents=[triage, investigate, resolve],
tasks=[Task(description="Process incoming ticket: {ticket}", agent=triage)],
process=Process.sequential,
)
crew.kickoff(inputs={"ticket": "..."})Scenario 3 — Explicit LLM construction with custom kwargs
When the factory functions hide a CrewAI LLM kwarg you need (custom timeout, response_format, etc.), build the LLM yourself and spread tessera_openai_config() into the kwargs.
# When you need fine-grained LLM kwargs that the factory functions hide,
# pass tessera config directly into the CrewAI LLM constructor.
from crewai import LLM
from tessera_crewai import tessera_openai_config
llm = LLM(
model="openai/gpt-4o",
api_key="sk-...",
**tessera_openai_config(api_key="tk_..."),
)Worked savings example
Three-agent customer-support crew (triage → investigate → resolve), 5B tokens/month, mix of gpt-4o-mini and claude-sonnet-4-6.
| Stage | Cost / mo | Saved |
|---|---|---|
| Baseline — direct providers via CrewAI | $24,000 | — |
| + Tessera (route, cache, prompt-cache, compress, M9 ceiling, batch) | $9,400 | $14,600 |
| Tessera fee (20% × measured savings) | $2,920 | — |
| You net pay | $12,320 | $11,680 / mo saved |
Cache hit rate on multi-agent crews is unusually high (35-55%) because planning + execution loops re-issue the same tool-call prompts. M2 + M5 together contribute most of the savings.
Architecture and quality contract
CrewAI is a peer dependency — install it alongside this package. The factory functions build a CrewAI LLM instance with base_url + default_headers pointed at the Tessera proxy.
Composition cap, per-stack 0.95 quality floor with auto-rollback, audit-immutable measurement — all enforced on the proxy. Verified-savings ledger at ledger.tesseraai.io.
FAQ
- Why only OpenAI and Anthropic in v0.1?
- Honesty over feature breadth. v0.1 covers ~85% of customer LLM traffic per our outreach research. Mistral / Groq / Cohere on CrewAI use a similar pattern but their constructor shape has not been end-to-end verified in our CI yet. Open a GitHub issue if you need one of those surfaced sooner.
- Does this work with Process.hierarchical / manager_llm?
- Yes. The manager_llm and worker llms are independent CrewAI LLM instances — wire each through its matching Tessera factory or config function. Auto-route and quality canary fire per-workload, so the manager can run on a cheaper model than the workers if the canary tolerates the swap.
- Do cache hits actually fire on multi-agent crews?
- Yes — and they fire more often than on single-agent workloads. Planning loops, tool retries, and reflection passes routinely re-issue identical sub-prompts. Exact-match cache (M2) catches verbatim repeats; semantic cache (M5) catches paraphrased ones above a 0.95 cosine-similarity threshold.
Where to go next
- Get a free 60M-tokens/month API key — sign-up takes ~30 seconds.
- How the mechanic stack works — composition cap, quality SLA, per-stack canary, auto-rollback.
- tessera-crewai on PyPI
- Worked-numbers blog post — $24k → $9.4k, quality canary 0.96.