Tessera × CrewAI

tessera-crewai wires Tessera's substrate proxy into CrewAI multi-agent flows. Factory functions return a pre-wired CrewAI LLM instance ready to assign to any Agent; the explicit LLM(...) + config-kwargs pattern is also supported for cases that need fine-grained control.

Multi-agent crews are the workload class that benefits most from caching. Crews routinely re-issue identical sub-prompts across the planning → execution → review loop. Exact-match cache (M2) and semantic cache (M5) catch those repeats at zero upstream cost. Auto-route moves triage / classification agents from gpt-4o down to gpt-4o-mini when the canary confirms equivalent output on the customer's eval set.

Install

pip install tessera-crewai

Get a free Tessera API key at tesseraai.io/dev — 60M tokens/month, no card up front.

Quickstart — OpenAI

from crewai import Agent, Crew, Task
from tessera_crewai import tessera_openai_llm

llm = tessera_openai_llm(
    model="gpt-4o",
    openai_api_key="sk-...",   # your OpenAI key
    tessera_api_key="tk_...",  # get a free one at tesseraai.io/dev
)

researcher = Agent(
    role="Senior Researcher",
    goal="Uncover cutting-edge developments in AI infrastructure",
    backstory="You are a seasoned researcher with deep industry knowledge.",
    llm=llm,
)

# Rest of your CrewAI code runs unchanged — Crew, Task, kickoff() all
# route through Tessera and benefit from auto-optimization.

Scenario 1 — Anthropic-backed analyst agent

Same factory shape, different provider. Use tessera_anthropic_llm in place of tessera_openai_llm. Tessera auto-injects Anthropic cache_control: ephemeral markers on long stable system prompts — useful when agent backstories run long.

from crewai import Agent
from tessera_crewai import tessera_anthropic_llm

llm = tessera_anthropic_llm(
    model="claude-sonnet-4-6",
    anthropic_api_key="sk-ant-...",
    tessera_api_key="tk_...",
)

analyst = Agent(
    role="Senior Analyst",
    goal="Synthesize research into investor-ready briefings",
    backstory="You have ten years of equity research experience.",
    llm=llm,
)

Scenario 2 — Mixed-provider crew with tiered model assignment

Triage on a cheap model, deep-reasoning on a capable model — one Tessera key for both. Per-workload auto-route + quality canary fire independently for each agent role. Billing rolls up across the crew to one monthly reading.

from crewai import Agent, Crew, Task, Process
from tessera_crewai import tessera_openai_llm, tessera_anthropic_llm

router_llm = tessera_openai_llm(
    model="gpt-4o-mini",       # cheap classifier
    openai_api_key="sk-...",
    tessera_api_key="tk_...",
)
reasoner_llm = tessera_anthropic_llm(
    model="claude-sonnet-4-6",  # deep reasoning
    anthropic_api_key="sk-ant-...",
    tessera_api_key="tk_...",
)

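# CrewAI requires a backstory on every Agent and an expected_output on every Task.
triage = Agent(role="Triage", goal="Classify ticket priority",
               backstory="You sort incoming support tickets by urgency.", llm=router_llm)
investigate = Agent(role="Investigator", goal="Dig into root cause",
                    backstory="You debug customer-reported issues.", llm=reasoner_llm)
resolve = Agent(role="Resolver", goal="Write customer-facing reply",
                backstory="You write clear, empathetic support replies.", llm=reasoner_llm)

# One task per agent; in a sequential process each task sees the earlier outputs.
tasks = [
    Task(description="Classify the priority of this ticket: {ticket}",
         expected_output="A priority label with a one-line justification", agent=triage),
    Task(description="Investigate the root cause of the ticket",
         expected_output="A short root-cause summary", agent=investigate),
    Task(description="Write the customer-facing reply",
         expected_output="A reply ready to send to the customer", agent=resolve),
]

crew = Crew(agents=[triage, investigate, resolve], tasks=tasks, process=Process.sequential)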
crew.kickoff(inputs={"ticket": "..."})

Scenario 3 — Explicit LLM construction with custom kwargs

When the factory functions hide a CrewAI LLM kwarg you need (custom timeout, response_format, etc.), build the LLM yourself and spread tessera_openai_config() into the kwargs.

# When you need fine-grained LLM kwargs that the factory functions hide,
# pass tessera config directly into the CrewAI LLM constructor.

from crewai import LLM
from tessera_crewai import tessera_openai_config

llm = LLM(
    model="openai/gpt-4o",
    api_key="sk-...",
    **tessera_openai_config(api_key="tk_..."),
)

Worked savings example

Three-agent customer-support crew (triage → investigate → resolve), 5B tokens/month, mix of gpt-4o-mini and claude-sonnet-4-6.

Stage                                                                  Cost / mo    Saved
Baseline: direct providers via CrewAI                                  $24,000
+ Tessera (route, cache, prompt-cache, compress, M9 ceiling, batch)    $9,400       $14,600
Tessera fee (20% × measured savings)                                   $2,920
You net pay                                                            $12,320      $11,680 / mo saved

Cache hit rate on multi-agent crews is unusually high (35-55%) because planning + execution loops re-issue the same tool-call prompts. M2 + M5 together contribute most of the savings.
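
The arithmetic behind the table can be checked with a short sketch. The function and field names below are illustrative only and are not part of tessera-crewai; the figures are the ones from the table, in whole dollars, with the fee taken as 20% of measured savings.

```python
from dataclasses import dataclass


@dataclass
class SavingsBreakdown:
    gross_saved: int  # baseline cost minus optimized cost
    fee: int          # fee charged on the measured savings
    net_cost: int     # optimized cost plus fee
    net_saved: int    # baseline cost minus net cost


def savings_breakdown(baseline: int, optimized: int, fee_percent: int = 20) -> SavingsBreakdown:
    """Reproduce the table's rows from the baseline and optimized monthly costs."""
    gross_saved = baseline - optimized
    fee = gross_saved * fee_percent // 100  # integer dollars avoid float rounding
    net_cost = optimized + fee
    return SavingsBreakdown(gross_saved, fee, net_cost, baseline - net_cost)


result = savings_breakdown(baseline=24_000, optimized=9_400)
print(result)
```

With the table's inputs, the sketch yields $14,600 gross savings, a $2,920 fee, $12,320 net cost, and $11,680 net savings per month.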

Architecture and quality contract

CrewAI is a peer dependency — install it alongside this package. The factory functions build a CrewAI LLM instance with base_url + default_headers pointed at the Tessera proxy.
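
As a rough illustration of that wiring, the sketch below assembles the kind of keyword arguments a factory might hand to the CrewAI LLM constructor. The proxy URL and the header name are placeholders invented for this example; this page does not state the real values, and the helper itself is hypothetical rather than part of tessera-crewai.

```python
from typing import Any, Dict

# Placeholder values, NOT the real Tessera endpoint or header name.
ASSUMED_PROXY_URL = "https://proxy.example.invalid/v1"
ASSUMED_KEY_HEADER = "X-Tessera-Api-Key"


def proxy_llm_kwargs(model: str, provider_api_key: str, tessera_api_key: str) -> Dict[str, Any]:
    """Build LLM kwargs that send traffic to a proxy instead of the provider.

    The provider key still authenticates the upstream call; the proxy key
    travels in a default header attached to every request.
    """
    return {
        "model": model,
        "api_key": provider_api_key,
        "base_url": ASSUMED_PROXY_URL,
        "default_headers": {ASSUMED_KEY_HEADER: tessera_api_key},
    }


kwargs = proxy_llm_kwargs("openai/gpt-4o", "sk-...", "tk_...")
print(sorted(kwargs))
```

Keeping the provider key and the proxy key in separate fields is the design point here: the agent code never changes, only the endpoint and headers the LLM object carries.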

The composition cap, the per-stack 0.95 quality floor with auto-rollback, and audit-immutable measurement are all enforced on the proxy. The verified-savings ledger is at ledger.tesseraai.io.
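
A quality floor with rollback can be pictured as a simple gate. The sketch below is one possible reading, not Tessera's implementation: how the proxy actually scores quality is not described here, so the score is assumed to be the fraction of eval-set items on which the cheaper model's output is judged equivalent to the baseline's. All names are hypothetical.

```python
from typing import Callable, Sequence

QUALITY_FLOOR = 0.95  # the floor quoted in the text


def canary_score(
    baseline_outputs: Sequence[str],
    candidate_outputs: Sequence[str],
    equivalent: Callable[[str, str], bool],
) -> float:
    """Fraction of eval items where the candidate matches the baseline."""
    if not baseline_outputs or len(baseline_outputs) != len(candidate_outputs):
        raise ValueError("eval sets must be non-empty and the same length")
    matches = sum(1 for b, c in zip(baseline_outputs, candidate_outputs) if equivalent(b, c))
    return matches / len(baseline_outputs)


def choose_model(current: str, candidate: str, score: float, floor: float = QUALITY_FLOOR) -> str:
    """Route to the candidate only if it clears the floor; otherwise roll back."""
    return candidate if score >= floor else current


# Toy eval set: 20 triage labels, the candidate disagrees on exactly one.
baseline = ["P1"] * 20
candidate = ["P1"] * 19 + ["P2"]
score = canary_score(baseline, candidate, lambda b, c: b == c)
print(choose_model("gpt-4o", "gpt-4o-mini", score))
```

In this toy run the candidate agrees on 19 of 20 items, which sits exactly on the floor, so the cheaper model is selected; one more disagreement would keep traffic on the current model.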

FAQ

Why only OpenAI and Anthropic in v0.1?
Honesty over feature breadth. v0.1 covers ~85% of customer LLM traffic per our outreach research. Mistral / Groq / Cohere on CrewAI use a similar pattern but their constructor shape has not been end-to-end verified in our CI yet. Open a GitHub issue if you need one of those surfaced sooner.
Does this work with Process.hierarchical / manager_llm?
Yes. The manager_llm and worker llms are independent CrewAI LLM instances — wire each through its matching Tessera factory or config function. Auto-route and quality canary fire per-workload, so the manager can run on a cheaper model than the workers if the canary tolerates the swap.
Do cache hits actually fire on multi-agent crews?
Yes — and they fire more often than on single-agent workloads. Planning loops, tool retries, and reflection passes routinely re-issue identical sub-prompts. Exact-match cache (M2) catches verbatim repeats; semantic cache (M5) catches paraphrased ones above a 0.95 cosine-similarity threshold.
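
The two lookup tiers described in the last answer can be sketched in a few lines. This is a toy model under stated assumptions, not the proxy's code: the class name is made up, the embeddings are hand-written two-dimensional vectors standing in for a real embedding model, and only the 0.95 cosine threshold comes from the text.

```python
import hashlib
import math
from typing import Dict, List, Optional, Sequence, Tuple

SIMILARITY_THRESHOLD = 0.95  # the threshold quoted in the text


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class TwoTierCache:
    """Exact-match lookup first, then nearest stored embedding above a threshold."""

    def __init__(self, threshold: float = SIMILARITY_THRESHOLD) -> None:
        self.threshold = threshold
        self._exact: Dict[str, str] = {}
        self._semantic: List[Tuple[List[float], str]] = []

    @staticmethod
    def _key(prompt: str) -> str:
        return hashlib.sha256(prompt.encode("utf-8")).hexdigest()

    def put(self, prompt: str, embedding: Sequence[float], response: str) -> None:
        self._exact[self._key(prompt)] = response
        self._semantic.append((list(embedding), response))

    def get(self, prompt: str, embedding: Sequence[float]) -> Optional[Tuple[str, str]]:
        exact = self._exact.get(self._key(prompt))
        if exact is not None:
            return ("exact", exact)  # verbatim repeat
        best_score, best_response = -1.0, None
        for stored, response in self._semantic:
            score = cosine_similarity(stored, embedding)
            if score >= self.threshold and score > best_score:
                best_score, best_response = score, response
        return ("semantic", best_response) if best_response is not None else None


cache = TwoTierCache()
cache.put("Classify ticket priority: login fails", [1.0, 0.0], "P1")
print(cache.get("Classify ticket priority: login fails", [1.0, 0.0]))
print(cache.get("What priority is the login failure ticket?", [0.99, 0.05]))
print(cache.get("Summarize the quarterly report", [0.0, 1.0]))
```

The first lookup is a verbatim repeat and hits the exact tier, the second is a paraphrase whose toy embedding is close enough to hit the semantic tier, and the third is unrelated and misses both.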

Where to go next