What is the Tessera Optimize Layer?

The Tessera Optimize Layer is a thin proxy that sits in your LLM request path. Your application points its OpenAI, Anthropic, or Google client at Tessera, and Tessera applies four optimizations before forwarding each request to the provider: auto-route to a cheaper model when quality holds on your golden-set eval; auto-cache identical requests at the edge; auto-compress prompts via a deterministic whitespace and structural pass (LLMLingua-2 template substitution is on the roadmap); and auto-batch where batch APIs apply. Every saved dollar is measured directly from proxy logs, not inferred from a billing CSV.

How does Tessera measure savings?

Savings are measured from Tessera proxy logs at request granularity. For each request that gets routed, cached, compressed, or batched, Tessera records the counterfactual provider cost (what the request would have cost without the optimization) and the actual incurred cost. Aggregate Ongoing Savings equal the sum of (counterfactual minus actual) across all in-scope workloads in the period. Provider price moves and unrelated workload shrinkage are excluded; only optimizations attributable to Tessera count toward your reported savings.
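
The aggregation described above can be sketched in a few lines. This is an illustration of the arithmetic only: the record fields, workload names, and dollar amounts are assumptions made up for the example, not Tessera's actual log schema.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RequestRecord:
    """One proxy-log row. Field names are hypothetical, not Tessera's schema."""
    workload: str
    counterfactual_usd: float  # what the request would have cost unoptimized
    actual_usd: float          # what the request actually cost
    in_scope: bool = True      # False for workloads excluded from reporting


def ongoing_savings(records):
    """Sum (counterfactual - actual) across in-scope requests."""
    return sum(
        r.counterfactual_usd - r.actual_usd for r in records if r.in_scope
    )


# Illustrative rows only; the amounts are invented for the example.
records = [
    RequestRecord("support-triage", 0.012, 0.003),   # routed to a cheaper model
    RequestRecord("support-triage", 0.012, 0.0),     # served from cache
    RequestRecord("batch-summaries", 0.040, 0.020),  # sent via a batch API
    RequestRecord("legacy-job", 0.050, 0.050, in_scope=False),
]
print(f"Ongoing savings: ${ongoing_savings(records):.4f}")
```

Because the counterfactual is recorded per request at the price in effect for that request, a later provider price change does not alter savings already logged.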

How is Tessera different from an observability platform?

Observability platforms trace requests and show dashboards. Tessera is an optimize layer: we sit in the request path and actively route, cache, compress, and batch on every call. Pricing is a flat monthly subscription by token volume, not per-seat SaaS, and savings are measured directly from our own proxy logs. Your existing observability tool can stay; it still receives downstream telemetry. Tessera is the layer above whatever vendors you already use, not a replacement for them.

What does Tessera cost?

Flat monthly pricing by gross tokens submitted, same feature set on every tier — only the token cap differs. Free Sandbox: 60 million tokens per month, $0, full mechanic stack active. Paid tiers: Starter $199/month (up to 1B tokens), Growth $999/month (up to 5B), Scale $3,999/month (up to 20B), Enterprise custom (20B+). No per-token markup and no cut of your savings — savings are measured and you keep 100%. No floor, no retainer, no contract review for activation. Quality preservation guaranteed at 0.95 by canary; three-day breach triggers auto-disable plus a service credit.
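
To see which tier a given monthly volume falls into, the price list above can be expressed as a small lookup. This is a sketch only: the caps and prices are copied from the paragraph above, the function name is made up, and treating each cap as inclusive is an assumption.

```python
# (tier name, monthly token cap, monthly price in USD), from the list above.
TIERS = [
    ("Free Sandbox", 60_000_000, 0),
    ("Starter", 1_000_000_000, 199),
    ("Growth", 5_000_000_000, 999),
    ("Scale", 20_000_000_000, 3999),
]


def tier_for(gross_tokens_per_month):
    """Return (tier name, monthly USD) for the smallest tier that fits.

    Caps are assumed inclusive. The price is None for Enterprise,
    which is custom-quoted.
    """
    for name, cap, price in TIERS:
        if gross_tokens_per_month <= cap:
            return name, price
    return "Enterprise", None


print(tier_for(3_200_000_000))  # a workload submitting 3.2B tokens per month
```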

Does Tessera modify our production code?

No. Tessera is added to your request path via two HTTP headers and one config-line change — you point your OpenAI/Anthropic/Google client at the Tessera proxy base URL and add an API key. Tessera never modifies your application source, never deploys into your codebase, and never makes provider-side changes on your account. All optimization logic runs inside the Tessera proxy infrastructure. You can disable Tessera at any time by reverting the two headers, or by using the in-dashboard pause control (see next question).
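
As a rough sketch of what that change looks like on the client side: the helper below only assembles the base URL and two headers. The header names, the example URL, and the function itself are placeholders invented for illustration; use the values shown in your Tessera dashboard.

```python
def tessera_client_kwargs(tessera_api_key, proxy_base_url, workload):
    """Build the client settings that redirect traffic through a proxy.

    Header names here are hypothetical placeholders, not Tessera's real ones.
    """
    return {
        "base_url": proxy_base_url,  # the one config-line change
        "default_headers": {         # the two added HTTP headers
            "X-Tessera-Api-Key": tessera_api_key,
            "X-Tessera-Workload": workload,
        },
    }


kwargs = tessera_client_kwargs(
    tessera_api_key="tk_...",
    proxy_base_url="https://proxy.example.invalid/v1",  # placeholder URL
    workload="support-triage",
)

# With the OpenAI Python SDK installed, the same settings would be passed as:
#   from openai import OpenAI
#   client = OpenAI(api_key="sk-...", **kwargs)
# Reverting is the reverse: drop the kwargs and the client talks to the
# provider directly again.
print(kwargs["base_url"])
```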

Can I pause Tessera at any time?

Yes. Every operator dashboard includes an always-available kill switch, both account-wide and per-workload. When engaged, the Tessera proxy bypasses all four optimizations (route, cache, compress, batch) and forwards your requests to the upstream provider as pure passthrough. You can pause and resume at any time without notice. The pause right is contractually preserved in §8 of the Tessera Terms of Service, so Tessera never runs in your stack outside your control.

Who is Tessera for?

Two customer profiles account for most paid-tier signups: (1) the CTO of a Series A-B AI-native SaaS company spending $20k-$200k per month on LLM APIs, with gross margin under pressure, who can change a base URL in 30 minutes and signs up within 72 hours; (2) a Series B-D scale-up adding AI features and spending $50k-$500k per month, where the AI Platform Lead owns the budget and a light security review takes 2-4 weeks. There is also a developer-facing Free Sandbox tier (60M tokens/mo, $0) for solo developers and side projects. Workloads tagged regulated (HIPAA, PCI-DSS, SOC 2 in-scope) never auto-route; the compliance gate blocks routing at the code level. Above 20B gross tokens/mo, Enterprise Clients may negotiate a custom MSA addendum (invoice billing, custom SLO).

How does Tessera handle data privacy?

Confidential Information furnished to Tessera is stored primarily inside the European Economic Area (EEA) under Estonian operating jurisdiction. We do not require, request, or process Client end-user personal data, model prompts, or model completions in their full content form — only token-count and structural metadata. US-based AI provider sub-processors (Anthropic, OpenAI, Google) are engaged solely for Tessera-internal analysis under strict anonymisation conditions per the Data Processing Agreement.

Does Tessera accept referral fees from AI providers?

No. Tessera receives no affiliate revenue, referral fees, kickbacks, sponsorships, advisory-board honoraria, or any other compensation from any AI provider, gateway vendor, or observability platform we recommend in the course of operating the proxy. Client fees are our only income. This is contractually binding under §10 of the Tessera Terms of Service (Vendor Neutrality) and breach permits Clients to terminate without penalty and withdraw their balance.

Tessera × CrewAI

tessera-crewai wires Tessera's substrate proxy into CrewAI multi-agent flows. Factory functions return a pre-wired CrewAI LLM instance ready to assign to any Agent; the explicit LLM(...) + config-kwargs pattern is also supported for cases that need fine-grained control.

Multi-agent crews are the workload class that benefits most from caching. Crews routinely re-issue identical sub-prompts across the planning → execution → review loop. Exact-match cache (M2) and semantic cache (M5) catch those repeats at zero upstream cost. Auto-route moves triage / classification agents from gpt-4o down to gpt-4o-mini when the canary confirms equivalent output on the customer's eval set.

Install

pip install tessera-crewai

Get a free Tessera API key at tesseraai.io/dev — 60M tokens/month, no card up front.

Quickstart — OpenAI

from crewai import Agent, Crew, Task
from tessera_crewai import tessera_openai_llm

llm = tessera_openai_llm(
    model="gpt-4o",
    openai_api_key="sk-...",   # your OpenAI key
    tessera_api_key="tk_...",  # get a free one at tesseraai.io/dev
)

researcher = Agent(
    role="Senior Researcher",
    goal="Uncover cutting-edge developments in AI infrastructure",
    backstory="You are a seasoned researcher with deep industry knowledge.",
    llm=llm,
)

# Rest of your CrewAI code runs unchanged — Crew, Task, kickoff() all
# route through Tessera and benefit from auto-optimization.

Scenario 1 — Anthropic-backed analyst agent

Same factory shape, different provider. Use tessera_anthropic_llm in place of tessera_openai_llm. Tessera auto-injects Anthropic cache_control: ephemeral markers on long stable system prompts — useful when agent backstories run long.

from crewai import Agent
from tessera_crewai import tessera_anthropic_llm

llm = tessera_anthropic_llm(
    model="claude-sonnet-4-6",
    anthropic_api_key="sk-ant-...",
    tessera_api_key="tk_...",
)

analyst = Agent(
    role="Senior Analyst",
    goal="Synthesize research into investor-ready briefings",
    backstory="You have ten years of equity research experience.",
    llm=llm,
)

Scenario 2 — Mixed-provider crew with tiered model assignment

Triage on a cheap model, deep-reasoning on a capable model — one Tessera key for both. Per-workload auto-route + quality canary fire independently for each agent role. Billing rolls up across the crew to one monthly reading.

from crewai import Agent, Crew, Task, Process
from tessera_crewai import tessera_openai_llm, tessera_anthropic_llm

router_llm = tessera_openai_llm(
    model="gpt-4o-mini",       # cheap classifier
    openai_api_key="sk-...",
    tessera_api_key="tk_...",
)
reasoner_llm = tessera_anthropic_llm(
    model="claude-sonnet-4-6",  # deep reasoning
    anthropic_api_key="sk-ant-...",
    tessera_api_key="tk_...",
)

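# CrewAI agents need a backstory, and tasks need an expected_output.
triage = Agent(role="Triage", goal="Classify ticket priority",
               backstory="You sort incoming support tickets by urgency.", llm=router_llm)
investigate = Agent(role="Investigator", goal="Dig into root cause",
                    backstory="You diagnose product issues from ticket details.", llm=reasoner_llm)
resolve = Agent(role="Resolver", goal="Write customer-facing reply",
                backstory="You write clear, empathetic support replies.", llm=reasoner_llm)

# One task per agent, so every stage of the sequential process actually runs.
tasks = [
    Task(description="Classify the priority of this ticket: {ticket}",
         expected_output="A priority label with a one-line reason", agent=triage),
    Task(description="Investigate the likely root cause of the ticket",
         expected_output="A short root-cause summary", agent=investigate),
    Task(description="Write the customer-facing reply",
         expected_output="A reply ready to send to the customer", agent=resolve),
]

crew = Crew(agents=[triage, investigate, resolve], tasks=tasks, process=Process.sequential)
crew.kickoff(inputs={"ticket": "..."})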

Scenario 3 — Explicit LLM construction with custom kwargs

When the factory functions hide a CrewAI LLM kwarg you need (custom timeout, response_format, etc.), build the LLM yourself and spread tessera_openai_config() into the kwargs.

# When you need fine-grained LLM kwargs that the factory functions hide,
# pass tessera config directly into the CrewAI LLM constructor.

from crewai import LLM
from tessera_crewai import tessera_openai_config

llm = LLM(
    model="openai/gpt-4o",
    api_key="sk-...",
    **tessera_openai_config(api_key="tk_..."),
)

Worked savings example

Three-agent customer-support crew (triage → investigate → resolve), 5B tokens/month, mix of gpt-4o-mini and claude-sonnet-4-6.

Stage	Cost / month	Saved / month
Baseline: direct providers via CrewAI	$24,000	n/a
+ Tessera (route, cache, prompt-cache, compress, M9 ceiling, batch)	$9,400	$14,600
Tessera subscription (Growth tier, ≤5B tokens/mo)	$999	n/a
Net cost to you (provider spend + subscription)	$10,399	$13,601
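
The arithmetic behind the table can be checked with a short script. The figures are taken from the table; the variable names are illustrative only.

```python
baseline_usd = 24_000      # direct provider spend per month
optimized_usd = 9_400      # provider spend per month with Tessera in the path
subscription_usd = 999     # Growth tier, up to 5B tokens per month

gross_saved = baseline_usd - optimized_usd   # 14,600
net_cost = optimized_usd + subscription_usd  # 10,399
net_saved = baseline_usd - net_cost          # 13,601

print(f"Gross saved: ${gross_saved:,}")
print(f"Net cost:    ${net_cost:,}")
print(f"Net saved:   ${net_saved:,} ({net_saved / baseline_usd:.1%} of baseline)")
```

The subscription is subtracted only once, after the optimization savings, because it is a flat fee rather than a share of the savings.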

Cache hit rate on multi-agent crews is unusually high (35-55%) because planning + execution loops re-issue the same tool-call prompts. M2 + M5 together contribute most of the savings.

Architecture and quality contract

CrewAI is a peer dependency — install it alongside this package. The factory functions build a CrewAI LLM instance with base_url + default_headers pointed at the Tessera proxy.

The composition cap, the per-stack 0.95 quality floor with auto-rollback, and audit-immutable measurement are all enforced on the proxy. The verified-savings ledger is at ledger.tesseraai.io.
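
The quality floor and the three-day breach rule from the pricing section can be pictured as a simple check over canary scores. The sketch below is not the proxy's actual logic: it assumes one canary score per day and reads "three-day breach" as three consecutive days below the floor. The function name is hypothetical.

```python
QUALITY_FLOOR = 0.95  # per-stack quality floor stated above
BREACH_DAYS = 3       # assumed to mean three consecutive days


def should_auto_disable(daily_scores, floor=QUALITY_FLOOR, breach_days=BREACH_DAYS):
    """Return True if the most recent `breach_days` scores are all below `floor`."""
    if len(daily_scores) < breach_days:
        return False
    return all(score < floor for score in daily_scores[-breach_days:])


# A single bad day does not trip the rule; three in a row does.
print(should_auto_disable([0.97, 0.94, 0.96, 0.97]))
print(should_auto_disable([0.97, 0.94, 0.93, 0.92]))
```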

FAQ

Why only OpenAI and Anthropic in v0.1? Honesty over feature breadth. v0.1 covers ~85% of customer LLM traffic per our outreach research. Mistral / Groq / Cohere on CrewAI use a similar pattern but their constructor shape has not been end-to-end verified in our CI yet. Open a GitHub issue if you need one of those surfaced sooner.
Does this work with Process.hierarchical / manager_llm? Yes. The manager_llm and worker llms are independent CrewAI LLM instances; wire each through its matching Tessera factory or config function. Auto-route and quality canary fire per-workload, so the manager can run on a cheaper model than the workers if the canary tolerates the swap.
Do cache hits actually fire on multi-agent crews? Yes, and they fire more often than on single-agent workloads. Planning loops, tool retries, and reflection passes routinely re-issue identical sub-prompts. Exact-match cache (M2) catches verbatim repeats; semantic cache (M5) catches paraphrased ones above a 0.95 cosine-similarity threshold.
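
The two cache tiers in the last answer can be illustrated with a toy lookup: a hash of the prompt for verbatim repeats, then cosine similarity against stored embeddings for paraphrases. This is a sketch of the general technique, not Tessera's implementation. The class name is made up, the three-dimensional vectors stand in for real embeddings, and a score equal to the threshold is treated as a hit.

```python
import hashlib
import math

SIM_THRESHOLD = 0.95  # cosine-similarity threshold quoted above


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class TwoTierCache:
    """Toy cache: exact match first, semantic match second."""

    def __init__(self):
        self.exact = {}     # sha256(prompt) -> response
        self.semantic = []  # list of (embedding, response)

    @staticmethod
    def _key(prompt):
        return hashlib.sha256(prompt.encode("utf-8")).hexdigest()

    def put(self, prompt, embedding, response):
        self.exact[self._key(prompt)] = response
        self.semantic.append((embedding, response))

    def get(self, prompt, embedding):
        """Return ("exact" | "semantic" | "miss", response or None)."""
        hit = self.exact.get(self._key(prompt))
        if hit is not None:
            return "exact", hit
        best = max(self.semantic, key=lambda e: cosine(e[0], embedding), default=None)
        if best is not None and cosine(best[0], embedding) >= SIM_THRESHOLD:
            return "semantic", best[1]
        return "miss", None


cache = TwoTierCache()
cache.put("Classify ticket priority: login fails", [1.0, 0.0, 0.1], "P1")

print(cache.get("Classify ticket priority: login fails", [1.0, 0.0, 0.1]))
print(cache.get("What priority is this ticket: login fails", [0.98, 0.05, 0.12]))
print(cache.get("Summarize the refund policy", [0.0, 1.0, 0.0]))
```

The exact tier is checked first because it needs no embedding comparison; the semantic tier only runs on an exact miss.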

Where to go next

Get a free 60M-tokens/month API key — sign-up takes ~30 seconds.
How the mechanic stack works — composition cap, quality SLA, per-stack canary, auto-rollback.
tessera-crewai on PyPI
Worked-numbers blog post — $24k → $9.4k, quality canary 0.96.