Tessera × AutoGen 0.4+
tessera-autogen wires Tessera's substrate proxy into AutoGen 0.4+ multi-agent teams. Factory functions return a pre-wired AutoGen ChatCompletionClient ready to pass into AssistantAgent, SelectorGroupChat, Swarm, or any other AutoGen team primitive.
v0.1 ships OpenAI + Anthropic — covers ~85% of customer LLM traffic per our outreach research. Mistral / Gemini queued for v0.2. AutoGen reflection loops, tool-use exchanges, and group-chat selector decisions all re-issue prompts often enough that exact + semantic cache contribute the bulk of savings on this framework specifically.
Install
pip install tessera-autogen autogen-core autogen-agentchat autogen-extGet a free Tessera API key at tesseraai.io/dev — 60M tokens/month, no card up front.
Quickstart — OpenAI
from autogen_agentchat.agents import AssistantAgent
from tessera_autogen import tessera_openai_client
client = tessera_openai_client(
model="gpt-4o",
openai_api_key="sk-...", # your OpenAI key
tessera_api_key="tk_...", # get a free one at tesseraai.io/dev
)
agent = AssistantAgent(name="researcher", model_client=client)
# Rest of your AutoGen code runs unchanged — single-agent calls,
# SelectorGroupChat, Swarm, tool use all route through Tessera.Scenario 1 — SelectorGroupChat with tiered model assignment
A two-agent team where the selector / triage agent runs on a cheap model and the reasoner runs on a capable one. One Tessera key powers both clients; per-workload auto-route + quality canary fire independently for each agent. The selector workload, which routinely outputs short routing decisions, often qualifies for further auto-route swaps that the reasoner cannot tolerate.
from autogen_agentchat.agents import AssistantAgent
from autogen_agentchat.teams import SelectorGroupChat
from autogen_agentchat.conditions import MaxMessageTermination
from tessera_autogen import tessera_openai_client, tessera_anthropic_client
triage_client = tessera_openai_client(
model="gpt-4o-mini", openai_api_key="sk-...", tessera_api_key="tk_...",
)
reasoner_client = tessera_anthropic_client(
model="claude-sonnet-4-6", anthropic_api_key="sk-ant-...", tessera_api_key="tk_...",
)
triage = AssistantAgent(name="triage", model_client=triage_client,
system_message="Decide ticket priority and routing.")
reasoner = AssistantAgent(name="reasoner", model_client=reasoner_client,
system_message="Investigate root cause and draft a fix proposal.")
team = SelectorGroupChat(
[triage, reasoner],
model_client=triage_client, # cheap selector
termination_condition=MaxMessageTermination(8),
)
await team.run(task="Investigate ticket #7733 and propose a fix.")Scenario 2 — Swarm with handoffs
AutoGen Swarm agents hand off to each other based on a shared context. The same handoff topology runs through one Tessera ChatCompletionClient — cache hits on identical handoff messages return at zero upstream cost.
from autogen_agentchat.agents import AssistantAgent
from autogen_agentchat.teams import Swarm
from tessera_autogen import tessera_openai_client
shared_client = tessera_openai_client(
model="gpt-4o", openai_api_key="sk-...", tessera_api_key="tk_...",
)
researcher = AssistantAgent(
name="researcher", model_client=shared_client,
handoffs=["writer"], system_message="Research and hand off to writer.",
)
writer = AssistantAgent(
name="writer", model_client=shared_client,
handoffs=["researcher"], system_message="Draft, hand back to researcher if more facts needed.",
)
swarm = Swarm([researcher, writer])
await swarm.run(task="Write a 300-word brief on substrate-layer LLM cost optimization.")Scenario 3 — Explicit ChatCompletionClient with custom kwargs
When the factory function hides a kwarg you need (custom timeout, response_format, parallel tool calls), build the client yourself and spread tessera_openai_config() into the kwargs.
# Fine-grained client kwargs when the factory functions don't expose what
# you need — e.g. response_format, custom timeout, parallel_tool_calls.
from autogen_ext.models.openai import OpenAIChatCompletionClient
from tessera_autogen import tessera_openai_config
client = OpenAIChatCompletionClient(
model="gpt-4o",
api_key="sk-...",
**tessera_openai_config(api_key="tk_..."),
)Worked savings example
Three-agent research team on AutoGen (planner → researcher → writer), 5B tokens/month, mix of gpt-4o-mini and claude-sonnet-4-6.
| Stage | Cost / mo | Saved |
|---|---|---|
| Baseline — direct providers via AutoGen | $24,000 | — |
| + Tessera (route, cache, prompt-cache, compress, M9 ceiling, batch) | $9,400 | $14,600 |
| Tessera fee (20% × measured savings) | $2,920 | — |
| You net pay | $12,320 | $11,680 / mo saved |
AutoGen workloads see particularly high M2 + M5 cache hit rates because SelectorGroupChat re-issues the agent-selection prompt on every turn.
Architecture and quality contract
autogen-core + autogen-ext are peer dependencies — install them alongside this package. The factory builds an OpenAIChatCompletionClient (or Anthropic equivalent) with base_url + default_headers pointed at the Tessera proxy.
Composition cap, per-stack 0.95 quality floor with auto-rollback, audit-immutable measurement — all enforced on the proxy. Verified-savings ledger at ledger.tesseraai.io.
FAQ
- Does this work with AutoGen 0.2 (the old asyncio-based API)?
- No. v0.1 targets AutoGen 0.4+ (the autogen-core / autogen-agentchat / autogen-ext architecture). The 0.2 API surface is different enough that it would need a separate adapter. Open an issue if you need 0.2 compatibility.
- Does the per-stack canary fire correctly on group chats?
- Yes — but the canary keys on workload, not on conversation. Each AutoGen agent is one workload from Tessera's perspective; multi-agent conversations create multiple per-workload eval streams that get aggregated to a per-stack score. Auto-rollback fires per-workload, not per-conversation.
- Can I use this with Microsoft's Magnetic-One example?
- Yes — Magnetic-One uses the same ChatCompletionClient abstraction. Wire each constituent agent (the Orchestrator, WebSurfer, FileSurfer, Coder, ComputerTerminal) through its matching Tessera factory. Tool-using agents benefit from auto-route's tool-capability gate.
Where to go next
- Get a free 60M-tokens/month API key — sign-up takes ~30 seconds.
- How the mechanic stack works — composition cap, quality SLA, per-stack canary, auto-rollback.
- tessera-autogen on PyPI
- Worked-numbers blog post — $24k → $9.4k, quality canary 0.96.