What is the Tessera Optimize Layer?

The Tessera Optimize Layer is a thin proxy that sits in your LLM request path. Your application points its OpenAI, Anthropic, or Google client at Tessera; before forwarding each request to the provider, Tessera applies four optimizations: auto-route to a cheaper model when quality holds on your golden-set eval, auto-cache identical requests at the edge, auto-compress prompts via a deterministic whitespace and structural pass (LLMLingua-2 template substitution is on the roadmap), and auto-batch where batch APIs apply. Every saved dollar is measured directly from proxy logs, not inferred from a billing CSV.
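To make "identical requests" and a "deterministic whitespace pass" concrete, here is a minimal sketch of how a proxy could derive a cache key and squeeze whitespace. The function names, the use of SHA-256, and the canonical-JSON approach are illustrative assumptions, not Tessera's actual implementation.

```python
import hashlib
import json
import re


def cache_key(provider: str, payload: dict) -> str:
    """Hypothetical cache key: hash of the provider plus a canonical JSON body."""
    # sort_keys and fixed separators make the serialization independent of
    # dict insertion order, so logically identical requests hash the same.
    canonical = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(f"{provider}\n{canonical}".encode("utf-8")).hexdigest()


def squeeze_whitespace(prompt: str) -> str:
    """Hypothetical whitespace pass: collapse space/tab runs and extra blank lines."""
    lines = [re.sub(r"[ \t]+", " ", line).strip() for line in prompt.splitlines()]
    text = "\n".join(lines)
    # Three or more consecutive newlines become a single blank line.
    return re.sub(r"\n{3,}", "\n\n", text).strip()
```

Both functions are pure, so the same input always yields the same key or the same compressed prompt. A real pass would need to leave whitespace-sensitive content such as code or YAML untouched.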

How does Tessera measure savings?

Savings are measured from Tessera proxy logs at request granularity. For each request that gets routed, cached, compressed, or batched, Tessera records the counterfactual provider cost (what the request would have cost without the optimization) and the actual incurred cost. Aggregate Ongoing Savings equal the sum of (counterfactual minus actual) across all in-scope workloads in the period. Provider price moves and unrelated workload shrinkage are excluded; only optimizations attributable to Tessera count toward your reported savings.
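The aggregation described above can be sketched in a few lines. The record type and its field names are hypothetical; only the formula (counterfactual minus actual, summed over in-scope workloads) comes from the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RequestRecord:
    """Hypothetical per-request log row; field names are illustrative."""
    workload: str
    counterfactual_usd: float  # cost without the optimization
    actual_usd: float          # cost actually incurred


def ongoing_savings(records, in_scope):
    """Sum (counterfactual - actual) over records whose workload is in scope."""
    return sum(
        r.counterfactual_usd - r.actual_usd
        for r in records
        if r.workload in in_scope
    )
```

A request that received no optimization has equal counterfactual and actual cost, so it contributes zero. Out-of-scope workloads are filtered out before summing.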

How is Tessera different from an observability platform?

Observability platforms trace requests and show dashboards. Tessera is an optimize layer: we sit in the request path and actively route, cache, compress, and batch on every call. Pricing is a flat monthly subscription by token volume rather than per seat, and savings are measured directly from our own proxy logs. Your existing observability tool can stay; it still receives downstream telemetry. Tessera is a layer above whatever vendors you already use, not a replacement for them.

What does Tessera cost?

Pricing is a flat monthly fee based on gross tokens submitted, with the same feature set on every tier; only the token cap differs. Free Sandbox: 60 million tokens per month, $0, full mechanic stack active. Paid tiers: Starter $199/month (up to 1B tokens), Growth $999/month (up to 5B), Scale $3,999/month (up to 20B), Enterprise custom (20B+). There is no per-token markup and no cut of your savings: savings are measured and you keep 100%. No floor, no retainer, no contract review for activation. Quality preservation is guaranteed at a canary score of 0.95; a three-day breach triggers auto-disable plus a service credit.
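As a quick illustration of the tier table, the following sketch maps a monthly token volume to the smallest tier that covers it. The caps and prices are the ones listed above; treating each cap as inclusive and picking the cheapest fitting tier are assumptions of this sketch.

```python
# Tier caps (tokens per month) and prices (USD per month) as listed above.
TIERS = [
    ("Free Sandbox", 60_000_000, 0),
    ("Starter", 1_000_000_000, 199),
    ("Growth", 5_000_000_000, 999),
    ("Scale", 20_000_000_000, 3999),
]


def tier_for(monthly_tokens: int):
    """Return (tier name, monthly USD price); price is None for Enterprise."""
    for name, cap, price in TIERS:
        if monthly_tokens <= cap:  # assumption: "up to" means inclusive
            return name, price
    return "Enterprise", None  # custom pricing above 20B tokens
```

For example, `tier_for(5_000_000_000)` returns `("Growth", 999)`, which matches the tier used in the worked savings example further down.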

Does Tessera modify our production code?

No. Tessera is added to your request path via two HTTP headers and one config-line change — you point your OpenAI/Anthropic/Google client at the Tessera proxy base URL and add an API key. Tessera never modifies your application source, never deploys into your codebase, and never makes provider-side changes on your account. All optimization logic runs inside the Tessera proxy infrastructure. You can disable Tessera at any time by reverting the two headers, or by using the in-dashboard pause control (see next question).
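A rough sketch of what "one base-URL change plus two headers" could look like as client configuration. `base_url` and `default_headers` are real constructor parameters of the `openai` and `anthropic` Python clients; the header names and the helper itself are placeholders invented for illustration, since the actual header names are not given here.

```python
def proxy_client_kwargs(proxy_base_url: str, tessera_api_key: str, workload: str) -> dict:
    """Build kwargs for an SDK client that accepts base_url and default_headers."""
    if not proxy_base_url.startswith("https://"):
        raise ValueError("proxy base URL should use HTTPS")
    return {
        "base_url": proxy_base_url,
        "default_headers": {
            # Both header names are hypothetical placeholders.
            "X-Example-Proxy-Key": tessera_api_key,
            "X-Example-Workload": workload,
        },
    }
```

The result could be spread into a client constructor, for example `OpenAI(api_key="sk-...", **kwargs)`. Reverting is then a matter of dropping those kwargs, and no application logic changes.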

Can I pause Tessera at any time?

Yes. Every operator dashboard ships with an always-available kill-switch — account-wide and per-workload. When engaged, the Tessera proxy bypasses all four optimizations (route, cache, compress, batch) and forwards your requests to the upstream provider as pure passthrough. Pause is reversible at any time without notice. The pause right is contractually preserved in §8 of the Tessera Terms of Service — Tessera does not work uncontrolled in your stack.

Two primary customer profiles account for most paid-tier signups: (1) the CTO of a Series A-B AI-native SaaS company spending $20k-$200k per month on LLM APIs, with gross margin under pressure, who can change a base URL in 30 minutes and signs up within 72 hours; (2) a Series B-D scale-up adding AI features and spending $50k-$500k per month, where the AI Platform Lead owns the budget and a light security review takes 2-4 weeks. There is also a developer-facing Free Sandbox tier (60M tokens/mo, $0) for solo developers and side projects. Workloads tagged regulated (HIPAA, PCI-DSS, SOC 2 in-scope) never auto-route: the compliance gate blocks routing at the code level. Above 20B gross tokens/mo, Enterprise Clients may negotiate a custom MSA addendum (invoice billing, custom SLO).
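The pause control and the compliance gate can be pictured as a filter over the four optimizations. This is a minimal sketch under stated assumptions: the function and flag names are hypothetical, and the text only says regulated workloads never auto-route, so leaving the other three optimizations enabled for them is an assumption.

```python
OPTIMIZATIONS = ("route", "cache", "compress", "batch")


def allowed_optimizations(paused: bool, regulated: bool) -> tuple:
    """Return the optimizations a request may receive under the two gates."""
    if paused:
        return ()  # kill-switch engaged: pure passthrough
    if regulated:
        # Regulated workloads never auto-route; keeping the rest enabled
        # is an assumption of this sketch.
        return tuple(o for o in OPTIMIZATIONS if o != "route")
    return OPTIMIZATIONS
```

Checking the pause flag first means the kill-switch wins over every other setting, which matches the "pure passthrough" behaviour described for a paused workload.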

How does Tessera handle data privacy?

Confidential Information furnished to Tessera is stored primarily inside the European Economic Area (EEA) under Estonian operating jurisdiction. We do not require, request, or process Client end-user personal data, model prompts, or model completions in their full content form — only token-count and structural metadata. US-based AI provider sub-processors (Anthropic, OpenAI, Google) are engaged solely for Tessera-internal analysis under strict anonymisation conditions per the Data Processing Agreement.

Does Tessera accept referral fees from AI providers?

No. Tessera receives no affiliate revenue, referral fees, kickbacks, sponsorships, advisory-board honoraria, or any other compensation from any AI provider, gateway vendor, or observability platform we recommend in the course of operating the proxy. Client fees are our only income. This is contractually binding under §10 of the Tessera Terms of Service (Vendor Neutrality) and breach permits Clients to terminate without penalty and withdraw their balance.

Tessera × Pydantic AI

tessera-pydantic-ai is a thin adapter that routes every Pydantic AI agent.run_sync() / agent.run() call through Tessera's substrate proxy. It supports two patterns: a provider wrapper (one function call returns a ready-to-use Pydantic AI Provider) or config kwargs (you keep direct control of the underlying AsyncOpenAI / AsyncAnthropic client and spread our config into its constructor).

v0.1 ships OpenAI + Anthropic. Mistral / Groq / Cohere are queued for v0.2: those Provider classes have a similar custom-client pattern, but the exact signature has not been verified end to end yet, so we'd rather ship two honest providers than five unverified ones. For verified Mistral / Groq / Cohere, use tessera-langchain or tessera-llamaindex.

Install

pip install tessera-pydantic-ai pydantic-ai openai anthropic

Get a free Tessera API key at tesseraai.io/dev — 60M tokens/month, no card up front.

Quickstart — OpenAI

from pydantic_ai import Agent
from pydantic_ai.models.openai import OpenAIChatModel
from tessera_pydantic_ai import tessera_openai_provider

provider = tessera_openai_provider(
    openai_api_key="sk-...",        # your OpenAI key
    tessera_api_key="tk_...",      # free from tesseraai.io/dev
)

agent = Agent(OpenAIChatModel("gpt-4o", provider=provider))
result = agent.run_sync("Summarize this customer support ticket in 2 sentences.")

Scenario 1 — Anthropic — same shape, different model class

The provider-wrapper pattern mirrors OpenAI. Use tessera_anthropic_provider in place of tessera_openai_provider and pass the result into AnthropicModel(...).

from pydantic_ai import Agent
from pydantic_ai.models.anthropic import AnthropicModel
from tessera_pydantic_ai import tessera_anthropic_provider

provider = tessera_anthropic_provider(
    anthropic_api_key="sk-ant-...",
    tessera_api_key="tk_...",
)
agent = Agent(AnthropicModel("claude-sonnet-4-6", provider=provider))
result = agent.run_sync("Classify the intent of: ...")

Scenario 2 — Type-safe structured output

Pydantic AI's killer feature is the type-safe output_type: the agent produces a Pydantic model instance instead of a raw string. Auto-route often moves structured-output workloads from gpt-4o down to gpt-4o-mini because schema-constrained tasks tolerate the swap on the canary eval.

from pydantic_ai import Agent
from pydantic_ai.models.openai import OpenAIChatModel
from pydantic import BaseModel
from tessera_pydantic_ai import tessera_openai_provider

class TicketTriage(BaseModel):
    intent: str   # bug / feature / billing / other
    severity: str # low / medium / high / critical
    summary: str

agent = Agent(
    OpenAIChatModel("gpt-4o-mini", provider=tessera_openai_provider(
        openai_api_key="sk-...",
        tessera_api_key="tk_...",
    )),
    output_type=TicketTriage,
)

result = agent.run_sync("Customer says their March invoice shows $0 but card was charged.")
# result.output is a TicketTriage instance — fully typed.

Scenario 3 — Fine-grained AsyncOpenAI client control

When you need to set an organization id, custom http_client, retry config, or compose other constructor kwargs that the wrapper hides, use the pass-through config function and build the AsyncOpenAI client yourself.

# Build the AsyncOpenAI client yourself and spread the Tessera config into it.

from openai import AsyncOpenAI
from pydantic_ai.providers.openai import OpenAIProvider
from pydantic_ai.models.openai import OpenAIChatModel
from pydantic_ai import Agent
from tessera_pydantic_ai import tessera_openai_config

client = AsyncOpenAI(
    api_key="sk-...",
    organization="org-mine",
    **tessera_openai_config(api_key="tk_..."),
)
provider = OpenAIProvider(openai_client=client)
agent = Agent(OpenAIChatModel("gpt-4o", provider=provider))

Worked savings example

Customer-support agent on gpt-4o with structured output, 5B tokens/month, OpenAI list prices.

Stage	Cost / mo	Saved
Baseline — OpenAI direct via Pydantic AI	$24,000	—
+ Tessera (route, cache, prompt-cache, compress, M9 ceiling, batch)	$9,400	$14,600
Tessera subscription (Growth tier — ≤5B tokens/mo)	$999	—
Your net cost	$10,399	$13,601 / mo saved

Quality canary held 0.96 across the full mechanic stack (floor 0.95). Auto-route from gpt-4o to gpt-4o-mini on schema-constrained sub-tasks contributed roughly half of the savings.
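The arithmetic behind the table can be checked directly. The three input figures are taken from the table above; the variable names and the percentage line are added here for illustration.

```python
baseline_usd = 24_000      # OpenAI direct via Pydantic AI
optimized_usd = 9_400      # provider spend with Tessera in the path
subscription_usd = 999     # Growth tier

gross_saved = baseline_usd - optimized_usd
net_pay = optimized_usd + subscription_usd
net_saved = baseline_usd - net_pay
net_saved_pct = round(100 * net_saved / baseline_usd, 1)

print(gross_saved, net_pay, net_saved, net_saved_pct)
# prints: 14600 10399 13601 56.7
```

Net of the subscription, the example workload pays about 57% less per month than the baseline.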

Architecture and quality contract

Pydantic AI + the underlying SDK clients (openai, anthropic) are NOT declared as dependencies of this package — install whichever providers you actually use. The Tessera factories import them lazily at call time so the package itself never blows up over missing optional deps.
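The lazy-import approach mentioned above can be sketched as follows. This is an illustration of the general pattern, not the package's actual source; `lazy_sdk` and `make_openai_client` are hypothetical names.

```python
import importlib


def lazy_sdk(module_name: str, pip_name: str):
    """Import an optional SDK only when a factory is actually called."""
    try:
        return importlib.import_module(module_name)
    except ImportError as exc:
        raise ImportError(
            f"{module_name!r} is required for this provider; "
            f"install it with: pip install {pip_name}"
        ) from exc


def make_openai_client(**kwargs):
    """Hypothetical factory: importing this file never touches `openai`."""
    openai = lazy_sdk("openai", "openai")
    return openai.AsyncOpenAI(**kwargs)
```

Because the import happens inside the function body, a user who only installed `anthropic` can still import the package and never sees an error unless they call the OpenAI factory.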

The composition cap (max 2 content-mutators per request), the per-stack 0.95 quality floor, and audit-immutable measurement are all enforced on the proxy. Browse the verified-savings ledger at ledger.tesseraai.io.
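The two numeric rules here, the 0.95 quality floor and the cap of two content-mutators, can be expressed as simple checks. This sketch assumes that the "three-day breach" mentioned in the pricing answer means three consecutive daily canary scores strictly below the floor; that reading, and both function names, are assumptions.

```python
def should_auto_disable(daily_scores, floor=0.95, breach_days=3):
    """True if the most recent `breach_days` daily scores are all below `floor`."""
    if len(daily_scores) < breach_days:
        return False
    return all(score < floor for score in daily_scores[-breach_days:])


def within_composition_cap(mutators, cap=2):
    """True if a request applies at most `cap` distinct content-mutating mechanics."""
    return len(set(mutators)) <= cap
```

With this reading, a single recovered day resets the streak: scores of 0.94, 0.96, 0.94 do not trigger auto-disable, while 0.94, 0.93, 0.94 do.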

FAQ

Why only OpenAI and Anthropic in v0.1?

Honesty over feature breadth. The Pydantic AI Provider class for each LLM accepts a custom underlying-SDK client (e.g. OpenAIProvider(openai_client=...)), and we have verified that pattern end to end against OpenAI and Anthropic. Mistral / Groq / Cohere Provider classes exist in Pydantic AI, but their custom-client shape has not been tested in our CI. Per our public-claim discipline, the unverified providers stay queued for v0.2.

Does this break tools / structured outputs / streaming?

No. The Pydantic AI Provider object that wraps the underlying SDK client keeps its shape, so agent.run_sync(), agent.run(), tool calls, structured outputs (output_type=...), and streaming all work unchanged. Auto-route checks tool-calling capability before switching models.

What if I already use multiple providers in one app?

The same Tessera key works for all of them. Construct one Pydantic AI Agent per provider and route each through its matching Tessera helper. Per-workload auto-route and the quality canary operate independently. Billing rolls up to the same monthly reading.

Where to go next

Get a free 60M-tokens/month API key — sign-up takes ~30 seconds.
How the mechanic stack works — composition cap, quality SLA, per-stack canary, auto-rollback.
tessera-pydantic-ai on PyPI
Worked-numbers blog post — $24k → $9.4k, quality canary 0.96.