What is the Tessera Optimize Layer?

The Tessera Optimize Layer is a thin proxy that sits in your LLM request path. Your application points its OpenAI, Anthropic, or Google client at Tessera, and Tessera applies four optimizations before forwarding each request to the provider: auto-route to a cheaper model when quality holds on your golden-set eval; auto-cache identical requests at the edge; auto-compress prompts via a deterministic whitespace and structural pass (LLMLingua-2 template substitution is on the roadmap); and auto-batch where provider batch APIs apply. Every saved dollar is measured directly from proxy logs, not inferred from a billing CSV.
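To make the cache and compress steps concrete, here is a minimal sketch of what a deterministic whitespace pass and an exact-match cache key could look like. It is an illustration only, not Tessera's implementation: the function names `compress_whitespace` and `cache_key` and the exact normalisation rules are assumptions.

```python
import hashlib
import json
import re


def compress_whitespace(prompt: str) -> str:
    """Hypothetical deterministic pass: collapse space runs and blank-line runs."""
    # Collapse runs of spaces/tabs inside each line and trim both ends.
    lines = [re.sub(r"[ \t]+", " ", line).strip() for line in prompt.splitlines()]
    text = "\n".join(lines)
    # Keep at most one blank line between paragraphs.
    return re.sub(r"\n{3,}", "\n\n", text).strip()


def cache_key(provider: str, model: str, messages: list) -> str:
    """Hypothetical exact-match key: identical requests serialise to identical bytes."""
    canonical = json.dumps(
        {"provider": provider, "model": model, "messages": messages},
        sort_keys=True,           # key order must not change the hash
        separators=(",", ":"),    # no incidental whitespace in the encoding
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()
```

A pass like this is only safe for prose prompts; whitespace-sensitive content such as indented code would need to be excluded, which is one reason a real structural pass is more involved than this sketch.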

How does Tessera measure savings?

Savings are measured from Tessera proxy logs at request granularity. For each request that gets routed, cached, compressed, or batched, Tessera records the counterfactual provider cost (what the request would have cost without the optimization) and the actual incurred cost. Aggregate Ongoing Savings equal the sum of (counterfactual minus actual) across all in-scope workloads in the period. Provider price moves and unrelated workload shrinkage are excluded; only optimizations attributable to Tessera count toward your reported savings.
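The aggregation described above can be sketched as follows. The `RequestLog` record and its field names are hypothetical, chosen only to mirror the wording of the answer; the real log schema is not documented here.

```python
from dataclasses import dataclass
from decimal import Decimal
from typing import Iterable, Set


@dataclass(frozen=True)
class RequestLog:
    # Hypothetical log record; field names are assumptions for illustration.
    workload: str
    counterfactual_usd: Decimal  # cost without the optimization
    actual_usd: Decimal          # cost actually incurred
    optimized: bool              # True if routed, cached, compressed, or batched


def ongoing_savings(logs: Iterable[RequestLog], in_scope: Set[str]) -> Decimal:
    """Sum (counterfactual - actual) over optimized requests in in-scope workloads."""
    return sum(
        (
            r.counterfactual_usd - r.actual_usd
            for r in logs
            if r.optimized and r.workload in in_scope
        ),
        Decimal("0"),
    )
```

Requests that Tessera did not touch contribute nothing, so a provider price cut or a shrinking workload cannot show up as reported savings.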

How is Tessera different from an observability platform?

Observability platforms trace requests and show dashboards. Tessera is an optimize layer: we sit in the request path and actively route, cache, compress, and batch on every call. Pricing is a flat monthly subscription by token volume, not per-seat SaaS, and savings are measured directly from our own proxy logs. Your existing observability tool can stay and still receives downstream telemetry. Tessera is the layer above whatever vendors you already use, not a replacement for them.

What does Tessera cost?

Pricing is a flat monthly fee by gross tokens submitted, with the same feature set on every tier; only the token cap differs. Free Sandbox: 60 million tokens per month, $0, full mechanic stack active. Paid tiers: Starter $199/month (up to 1B tokens), Growth $999/month (up to 5B), Scale $3,999/month (up to 20B), Enterprise custom (20B+). There is no per-token markup and no cut of your savings: savings are measured and you keep 100%. No floor, no retainer, no contract review for activation. Quality preservation is guaranteed at a 0.95 canary score; a three-day breach triggers auto-disable plus a service credit.
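As a quick way to see which tier a given volume falls into, the price list above can be written as a lookup. This is a sketch: the `pick_tier` helper is hypothetical, and treating each cap as inclusive (exactly 20B still counts as Scale) is an assumption the pricing text does not settle.

```python
from typing import Optional, Tuple

# (tier name, monthly gross-token cap, USD per month), taken from the list above.
TIERS = [
    ("Free Sandbox", 60_000_000, 0),
    ("Starter", 1_000_000_000, 199),
    ("Growth", 5_000_000_000, 999),
    ("Scale", 20_000_000_000, 3_999),
]


def pick_tier(gross_tokens_per_month: int) -> Tuple[str, Optional[int]]:
    """Return the smallest tier whose cap covers the volume (caps assumed inclusive)."""
    for name, cap, price in TIERS:
        if gross_tokens_per_month <= cap:
            return name, price
    # Above the Scale cap pricing is custom, so no fixed price is returned.
    return "Enterprise", None
```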

Does Tessera modify our production code?

No. Tessera is added to your request path via two HTTP headers and one config-line change — you point your OpenAI/Anthropic/Google client at the Tessera proxy base URL and add an API key. Tessera never modifies your application source, never deploys into your codebase, and never makes provider-side changes on your account. All optimization logic runs inside the Tessera proxy infrastructure. You can disable Tessera at any time by reverting the two headers, or by using the in-dashboard pause control (see next question).
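The shape of that change can be sketched as a small helper that produces client keyword arguments. Everything specific here is a placeholder: the proxy URL, the two header names, and the `client_kwargs` helper are assumptions for illustration, so take the real values from your Tessera dashboard.

```python
from typing import Dict

# Placeholder values, NOT real Tessera endpoints or header names.
TESSERA_BASE_URL = "https://tessera-proxy.example/v1"
KEY_HEADER = "X-Tessera-Key"            # hypothetical header name
WORKLOAD_HEADER = "X-Tessera-Workload"  # hypothetical header name


def client_kwargs(use_tessera: bool, tessera_api_key: str, workload: str) -> Dict:
    """Extra kwargs for a provider client; empty when Tessera is switched off."""
    if not use_tessera:
        # Reverting is a config change only: the client talks to the provider directly.
        return {}
    return {
        "base_url": TESSERA_BASE_URL,
        "default_headers": {
            KEY_HEADER: tessera_api_key,
            WORKLOAD_HEADER: workload,
        },
    }
```

The official OpenAI Python client accepts `base_url` and `default_headers` as constructor arguments, so a dictionary like this can be unpacked into the client without touching the rest of the application.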

Can I pause Tessera at any time?

Yes. Every operator dashboard ships with an always-available kill switch, both account-wide and per-workload. When it is engaged, the Tessera proxy bypasses all four optimizations (route, cache, compress, batch) and forwards your requests to the upstream provider as pure passthrough. You can pause and resume at any time without giving notice. The pause right is contractually preserved in §8 of the Tessera Terms of Service, so Tessera never runs uncontrolled in your stack.
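The two pause scopes can be modelled in a few lines. This is a sketch of the behaviour described above, not the proxy's code; `PauseState` and `active_optimizations` are hypothetical names.

```python
from dataclasses import dataclass, field
from typing import Set, Tuple

OPTIMIZATIONS = ("route", "cache", "compress", "batch")


@dataclass
class PauseState:
    # Hypothetical model of the kill switch: one account-wide flag plus per-workload pauses.
    account_paused: bool = False
    paused_workloads: Set[str] = field(default_factory=set)

    def is_paused(self, workload: str) -> bool:
        return self.account_paused or workload in self.paused_workloads


def active_optimizations(state: PauseState, workload: str) -> Tuple[str, ...]:
    """Paused traffic gets no optimizations at all, i.e. pure passthrough."""
    return () if state.is_paused(workload) else OPTIMIZATIONS
```

Pausing one workload leaves the others untouched, while the account-wide flag overrides everything.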

Two primary customer profiles account for most paid-tier signups: (1) a CTO at a Series A-B AI-native SaaS company spending $20k-$200k per month on LLM APIs, with gross margin under pressure, who can change a base URL in 30 minutes and signs up within 72 hours; (2) a Series B-D scale-up adding AI features and spending $50k-$500k per month, where the AI Platform Lead owns the budget and a light security review takes 2-4 weeks. A developer-facing Free Sandbox tier (60M tokens/mo, $0) covers solo developers and side projects. Workloads tagged as regulated (HIPAA, PCI-DSS, SOC 2 in-scope) never auto-route: the compliance gate blocks routing at the code level. Above 20B gross tokens/mo, Enterprise Clients may negotiate a custom MSA addendum (invoice billing, custom SLO).
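A compliance gate of the kind mentioned above could look like the following sketch. The tag spellings and the `allowed_optimizations` helper are hypothetical, and the sketch assumes the other three optimizations stay available for regulated workloads, which the text does not state either way.

```python
from typing import Iterable, Tuple

ALL_OPTIMIZATIONS = ("route", "cache", "compress", "batch")
# Hypothetical tag spellings for the regulated categories named above.
REGULATED_TAGS = frozenset({"hipaa", "pci-dss", "soc2"})


def allowed_optimizations(workload_tags: Iterable[str]) -> Tuple[str, ...]:
    """Drop auto-route for any workload carrying a regulated tag."""
    tags = {t.lower() for t in workload_tags}
    if tags & REGULATED_TAGS:
        # Assumption: only routing is blocked; cache, compress, and batch remain.
        return tuple(o for o in ALL_OPTIMIZATIONS if o != "route")
    return ALL_OPTIMIZATIONS
```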

How does Tessera handle data privacy?

Confidential Information furnished to Tessera is stored primarily inside the European Economic Area (EEA) under Estonian operating jurisdiction. We do not require, request, or process Client end-user personal data, model prompts, or model completions in their full content form — only token-count and structural metadata. US-based AI provider sub-processors (Anthropic, OpenAI, Google) are engaged solely for Tessera-internal analysis under strict anonymisation conditions per the Data Processing Agreement.

Does Tessera accept referral fees from AI providers?

No. Tessera receives no affiliate revenue, referral fees, kickbacks, sponsorships, advisory-board honoraria, or any other compensation from any AI provider, gateway vendor, or observability platform we recommend in the course of operating the proxy. Client fees are our only income. This is contractually binding under §10 of the Tessera Terms of Service (Vendor Neutrality) and breach permits Clients to terminate without penalty and withdraw their balance.

Tessera × AutoGen 0.4+

tessera-autogen wires Tessera's substrate proxy into AutoGen 0.4+ multi-agent teams. Factory functions return a pre-wired AutoGen ChatCompletionClient ready to pass into AssistantAgent, SelectorGroupChat, Swarm, or any other AutoGen team primitive.

v0.1 ships OpenAI + Anthropic — covers ~85% of customer LLM traffic per our outreach research. Mistral / Gemini queued for v0.2. AutoGen reflection loops, tool-use exchanges, and group-chat selector decisions all re-issue prompts often enough that exact + semantic cache contribute the bulk of savings on this framework specifically.

Install

pip install tessera-autogen autogen-core autogen-agentchat autogen-ext

Get a free Tessera API key at tesseraai.io/dev — 60M tokens/month, no card up front.

Quickstart — OpenAI

from autogen_agentchat.agents import AssistantAgent
from tessera_autogen import tessera_openai_client

client = tessera_openai_client(
    model="gpt-4o",
    openai_api_key="sk-...",   # your OpenAI key
    tessera_api_key="tk_...",  # get a free one at tesseraai.io/dev
)

agent = AssistantAgent(name="researcher", model_client=client)

# Rest of your AutoGen code runs unchanged — single-agent calls,
# SelectorGroupChat, Swarm, tool use all route through Tessera.

Scenario 1 — SelectorGroupChat with tiered model assignment

A two-agent team where the selector / triage agent runs on a cheap model and the reasoner runs on a capable one. One Tessera key powers both clients; per-workload auto-route + quality canary fire independently for each agent. The selector workload, which routinely outputs short routing decisions, often qualifies for further auto-route swaps that the reasoner cannot tolerate.

import asyncio

from autogen_agentchat.agents import AssistantAgent
from autogen_agentchat.teams import SelectorGroupChat
from autogen_agentchat.conditions import MaxMessageTermination
from tessera_autogen import tessera_openai_client, tessera_anthropic_client

triage_client = tessera_openai_client(
    model="gpt-4o-mini", openai_api_key="sk-...", tessera_api_key="tk_...",
)
reasoner_client = tessera_anthropic_client(
    model="claude-sonnet-4-6", anthropic_api_key="sk-ant-...", tessera_api_key="tk_...",
)

triage = AssistantAgent(name="triage", model_client=triage_client,
    system_message="Decide ticket priority and routing.")
reasoner = AssistantAgent(name="reasoner", model_client=reasoner_client,
    system_message="Investigate root cause and draft a fix proposal.")

team = SelectorGroupChat(
    [triage, reasoner],
    model_client=triage_client,  # cheap selector
    termination_condition=MaxMessageTermination(8),
)


async def main() -> None:
    # team.run() is a coroutine, so outside a notebook it needs an event loop.
    await team.run(task="Investigate ticket #7733 and propose a fix.")


asyncio.run(main())

Scenario 2 — Swarm with handoffs

AutoGen Swarm agents hand off to each other based on a shared context. The same handoff topology runs through one Tessera ChatCompletionClient — cache hits on identical handoff messages return at zero upstream cost.

import asyncio

from autogen_agentchat.agents import AssistantAgent
from autogen_agentchat.teams import Swarm
from tessera_autogen import tessera_openai_client

shared_client = tessera_openai_client(
    model="gpt-4o", openai_api_key="sk-...", tessera_api_key="tk_...",
)

researcher = AssistantAgent(
    name="researcher", model_client=shared_client,
    handoffs=["writer"], system_message="Research and hand off to writer.",
)
writer = AssistantAgent(
    name="writer", model_client=shared_client,
    handoffs=["researcher"], system_message="Draft, hand back to researcher if more facts needed.",
)

swarm = Swarm([researcher, writer])


async def main() -> None:
    # swarm.run() is a coroutine, so outside a notebook it needs an event loop.
    await swarm.run(task="Write a 300-word brief on substrate-layer LLM cost optimization.")


asyncio.run(main())

Scenario 3 — Explicit ChatCompletionClient with custom kwargs

When the factory function does not expose a kwarg you need (custom timeout, response_format, parallel tool calls), build the client yourself and unpack tessera_openai_config() into its keyword arguments.

# Fine-grained client kwargs when the factory functions don't expose what
# you need — e.g. response_format, custom timeout, parallel_tool_calls.

from autogen_ext.models.openai import OpenAIChatCompletionClient
from tessera_autogen import tessera_openai_config

client = OpenAIChatCompletionClient(
    model="gpt-4o",
    api_key="sk-...",
    **tessera_openai_config(api_key="tk_..."),
)

Worked savings example

Three-agent research team on AutoGen (planner → researcher → writer), 5B tokens/month, mix of gpt-4o-mini and claude-sonnet-4-6.

Stage	Cost / mo	Saved
Baseline — direct providers via AutoGen	$24,000	—
+ Tessera (route, cache, prompt-cache, compress, M9 ceiling, batch)	$9,400	$14,600
Tessera subscription (Growth tier — ≤5B tokens/mo)	$999	—
Your net cost	$10,399	$13,601 / mo saved
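The arithmetic behind the table can be checked in a few lines. The dollar figures are the ones from the table; the percentage at the end is derived from them and is not a figure from the original page.

```python
baseline = 24_000      # direct providers via AutoGen, USD per month
optimized = 9_400      # provider spend with Tessera in the path
subscription = 999     # Growth tier

gross_saved = baseline - optimized      # 14600
net_cost = optimized + subscription     # 10399
net_saved = baseline - net_cost         # 13601

# Derived for convenience: net saving as a share of the baseline bill.
net_saved_pct = round(100 * net_saved / baseline, 1)  # 56.7
```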

AutoGen workloads see particularly high M2 + M5 cache hit rates because SelectorGroupChat re-issues the agent-selection prompt on every turn.

Architecture and quality contract

autogen-core + autogen-ext are peer dependencies — install them alongside this package. The factory builds an OpenAIChatCompletionClient (or Anthropic equivalent) with base_url + default_headers pointed at the Tessera proxy.

Composition cap, per-stack 0.95 quality floor with auto-rollback, audit-immutable measurement — all enforced on the proxy. Verified-savings ledger at ledger.tesseraai.io.

FAQ

Does this work with AutoGen 0.2 (the legacy pre-0.4 API)?

No. v0.1 targets AutoGen 0.4+ (the autogen-core / autogen-agentchat / autogen-ext architecture). The 0.2 API surface is different enough that it would need a separate adapter. Open an issue if you need 0.2 compatibility.

Does the per-stack canary fire correctly on group chats?

Yes, but the canary keys on workload, not on conversation. Each AutoGen agent is one workload from Tessera's perspective; multi-agent conversations create multiple per-workload eval streams that are aggregated into a per-stack score. Auto-rollback fires per-workload, not per-conversation.

Can I use this with Microsoft's Magentic-One example?

Yes. Magentic-One uses the same ChatCompletionClient abstraction. Wire each constituent agent (the Orchestrator, WebSurfer, FileSurfer, Coder, ComputerTerminal) through its matching Tessera factory. Tool-using agents benefit from auto-route's tool-capability gate.
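The per-workload rollback described in the canary answer above can be sketched as follows. This is an illustration under stated assumptions: it reads "three-day breach" as three consecutive daily canary scores below the 0.95 floor, and the helper names are hypothetical. How per-workload streams are aggregated into a per-stack score is not specified, so that step is left out.

```python
from typing import Dict, List

QUALITY_FLOOR = 0.95  # quality floor quoted in the pricing answer
BREACH_DAYS = 3       # assumed to mean three consecutive daily scores


def should_disable(daily_scores: List[float]) -> bool:
    """True if the most recent BREACH_DAYS scores are all below the floor."""
    recent = daily_scores[-BREACH_DAYS:]
    return len(recent) == BREACH_DAYS and all(s < QUALITY_FLOOR for s in recent)


def workloads_to_roll_back(scores_by_workload: Dict[str, List[float]]) -> List[str]:
    """Each agent is its own workload, so rollback is decided one workload at a time."""
    return sorted(w for w, s in scores_by_workload.items() if should_disable(s))
```

In a two-agent team, a triage agent that dips below the floor for three days would be rolled back on its own while the reasoner keeps its optimizations.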

Where to go next

Get a free 60M-tokens/month API key — sign-up takes ~30 seconds.
How the mechanic stack works — composition cap, quality SLA, per-stack canary, auto-rollback.
tessera-autogen on PyPI
Worked-numbers blog post — $24k → $9.4k, quality canary 0.96.