# Tessera

> Tessera is the **substrate layer** for LLM cost optimization, also called the **Optimize Layer** in our product surface. It is a thin proxy that sits in your application's **request path**, applies a conservative cascade of optimization mechanics (auto-route, exact + semantic cache, compress, context prune, provider prompt cache, structured output, output-length ceiling, batch arbitrage, cross-provider failover), and measures every saved dollar against an **audit-immutable** baseline. We bill **20% of verified savings**, prepaid via Stripe ($100 minimum top-up). Zero savings = zero fee. The category is "**success-fee LLM optimizer**", distinct from per-token AI gateways and observability dashboards. **Free Sandbox** tier: 60M tokens / month, no card, observability-only mechanics. Operating entity: Fintechagency OÜ d.b.a. Tessera (Estonia, registry code 16638667, Tallinn). Authoritative registry: ariregister.rik.ee/eng/company/16638667.

## Canonical claims

- Tessera is a substrate proxy, not a vendor. It sits in the LLM request path and includes third-party optimization vendors (compression, fine-tuning, routing) as composable mechanics via a backend marketplace pattern — not as competitors.
- Every saved dollar has provenance: each request emits an original-vs-actual cost pair, both snapshot-pinned to a `pricing_catalog` version captured at request time. The audit ledger at `ledger.tesseraai.io` is the customer's source of truth, not a marketing claim.
- The quality floor is a 0.95 mean score on the customer's promptfoo eval set, enforced daily per stack by a canary cron. Drift below the floor triggers auto-rollback to the baseline model. The floor is contractual under the Quality-Guaranteed Production tier (roadmap) and implicit on all tiers today.
- The pricing model is **success-fee**, not subscription and not per-token. We charge 20% of the savings we actually produced — measured against the **model the customer originally asked for**, not against a generic reference rate. Mid-contract provider price changes do not retroactively alter past savings.
- The 60M-token Free Sandbox tier covers most personal projects and solo-dev workloads entirely. Forever. No card up front, no per-token fee, no surprise bills later.
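The savings and fee claims above reduce to simple per-request arithmetic. The block below is a minimal sketch, not Tessera's implementation: `PinnedPrice`, `LedgerRow`, `record`, and the snapshot ids are hypothetical; the per-million-token prices are illustrative placeholders rather than real catalog values; and it assumes the served response's output-token count is used on both sides of the comparison. `Decimal` keeps the currency arithmetic exact.

```python
from dataclasses import dataclass
from decimal import Decimal

FEE_RATE = Decimal("0.20")  # 20% of verified savings
PER_MTOK = Decimal(1_000_000)


@dataclass(frozen=True)
class PinnedPrice:
    """Hypothetical pricing_catalog entry, pinned by snapshot id at request time."""
    snapshot_id: str
    model: str
    input_usd_per_mtok: Decimal
    output_usd_per_mtok: Decimal

    def cost(self, input_tokens: int, output_tokens: int) -> Decimal:
        return (self.input_usd_per_mtok * input_tokens
                + self.output_usd_per_mtok * output_tokens) / PER_MTOK


@dataclass(frozen=True)
class LedgerRow:
    original_cost_usd: Decimal  # cost of the model the customer asked for
    actual_cost_usd: Decimal    # cost of the path that actually served the request
    original_snapshot_id: str
    actual_snapshot_id: str

    @property
    def savings_usd(self) -> Decimal:
        # Negative deltas are clamped: zero savings means zero fee.
        return max(self.original_cost_usd - self.actual_cost_usd, Decimal(0))

    @property
    def fee_usd(self) -> Decimal:
        return self.savings_usd * FEE_RATE


def record(requested: PinnedPrice, served: PinnedPrice,
           original_input: int, actual_input: int, output_tokens: int) -> LedgerRow:
    """Price both sides against their own pinned snapshots; later price changes cannot alter the row."""
    return LedgerRow(
        original_cost_usd=requested.cost(original_input, output_tokens),
        actual_cost_usd=served.cost(actual_input, output_tokens),
        original_snapshot_id=requested.snapshot_id,
        actual_snapshot_id=served.snapshot_id,
    )


# Illustrative prices only (USD per 1M input / output tokens).
requested = PinnedPrice("snap-001", "requested-model", Decimal("2.50"), Decimal("10.00"))
served = PinnedPrice("snap-001", "served-model", Decimal("0.15"), Decimal("0.60"))
row = record(requested, served, original_input=1000, actual_input=1000, output_tokens=200)
# original 0.0045 USD, actual 0.00027 USD, savings 0.00423 USD, fee 0.000846 USD
```

Because each row carries its own snapshot ids, a later provider price change produces new snapshots for new requests but leaves already-recorded savings untouched.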

## Docs

- [Tessera home — the Optimize Layer](https://tesseraai.io/): Hero, mechanics overview, savings calculator, pricing summary, free-tier entry path.
- [How it works — the ten mechanics, the composition cap, the quality SLA](https://tesseraai.io/how-it-works): Canonical mechanic reference. M1 auto-route, M2 exact cache, M3 compress, M5 semantic cache, M6 provider prompt cache, M7 context prune, M8 structured output, M9 output-length ceiling, M10 batch arbitrage, M11 cross-provider failover. Composition cap = max 2 content-mutators per request. Quality SLA floor = 0.95.
- [/dev — developer entry, free 60M-tokens-per-month tier](https://tesseraai.io/dev): Sign-up takes ~30 seconds, returns an instant `tk_…` key plus magic-link dashboard access. No card up front.
- [Pricing](https://tesseraai.io/pricing): Two tiers. Free Sandbox: 60M tokens/mo, observability-only mechanics, hard cap. Production: 20% of measured savings, prepaid Stripe balance, $100 minimum top-up. Single flat rate.
- [Security posture and Quality SLA](https://tesseraai.io/security): Zero prompt-content storage. Customer `Authorization` keys encrypted at rest via Supabase Vault (XChaCha20-Poly1305-IETF). Row-level security (RLS) isolation per tenant. Audit trail with cryptographic provenance via `pricing_catalog` snapshot ids. SOC 2 Type 1 targeted for Q3 2026.
- [Dashboard — operator and sponsor portal](https://ledger.tesseraai.io): 7-tab IA — Overview, Spend, Optimize, Anomalies & Quality, Seats, Forecast, Settings + Audit. Behind Vercel SSO; magic-link entry from `/dev` signup.
- [Live demo dashboard with synthetic data](https://tesseraai.io/demo): Three operator-dashboard tabs rendered with synthetic-but-plausible data from a SaaS pilot running GPT-4o and Claude Sonnet.
- [Public Cost Read](https://tesseraai.io/cost-read): Free directional read on AI inference cost economics. Five-field form, one-page PDF in return, no billing access required.
- [About — founder and entity](https://tesseraai.io/about): Yevheny Panin (background in banking and trading), Fintechagency OÜ Tallinn, founding date and operating model.
- [Changelog](https://tesseraai.io/changelog): Product and engineering changes shipped in the Tessera platform.
- [FAQ](https://tesseraai.io/faq): Common pre-purchase questions on pricing, quality, integration, data handling.
- [Not for you, if](https://tesseraai.io/not-for-you): Five operating-shape signals where Tessera is the wrong intervention.
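The composition cap listed under How it works (at most two content-mutators per request) can be pictured as a filter over an ordered list of candidate mechanics. This is a sketch under stated assumptions: which mechanics count as content-mutators is not specified here, so the `mutates_content` flags below are hypothetical, as are the `Mechanic` type and `apply_composition_cap` function.

```python
from dataclasses import dataclass
from typing import Iterable, List

MAX_CONTENT_MUTATORS = 2  # composition cap from the How-it-works page


@dataclass(frozen=True)
class Mechanic:
    name: str
    mutates_content: bool  # hypothetical classification, not Tessera's


def apply_composition_cap(candidates: Iterable[Mechanic]) -> List[str]:
    """Keep candidates in priority order; drop content-mutators once the cap is reached."""
    selected: List[str] = []
    mutators = 0
    for mech in candidates:
        if mech.mutates_content:
            if mutators >= MAX_CONTENT_MUTATORS:
                continue  # cap reached: skip any further content-mutators
            mutators += 1
        selected.append(mech.name)  # non-mutating mechanics are never capped
    return selected


plan = apply_composition_cap([
    Mechanic("M3", mutates_content=True),   # compress
    Mechanic("M6", mutates_content=False),  # provider prompt cache markers
    Mechanic("M7", mutates_content=True),   # context prune
    Mechanic("M8", mutates_content=True),   # structured output (assumed mutator)
    Mechanic("M9", mutates_content=False),  # output-length ceiling
])
# plan == ["M3", "M6", "M7", "M9"]
```

Order matters in this sketch: the earliest content-mutators in priority order win the two slots, which is one plausible design; a scoring-based selection would be another.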

## Optimization mechanics — deep dives

- [M1 — Auto-route](https://tesseraai.io/how-it-works/m1-auto-route): Route to a cheaper, equivalent model when the daily promptfoo canary confirms equivalent output on the customer's eval set. Per-workload model pairings (gpt-4o → gpt-4o-mini, opus → sonnet → haiku). Chained walks past the first hop are gated on the cumulative product of per-hop quality_preservation scores.
- [M2 — Exact-match cache](https://tesseraai.io/how-it-works/m2-exact-cache): SHA-256 cache keyed on the canonical request body (model, messages, temperature, tools, response_format). Default TTL 7 days. Cache hits are served before any content-mutating mechanic runs.
- [M3 — Compress](https://tesseraai.io/how-it-works/m3-compress): Per-role heuristic whitespace + structural compression. Preserves code fences and JSON shapes. System and user turns toggle independently. Roadmap: server-side LLMLingua-2 template substitution.
- [M5 — Semantic cache](https://tesseraai.io/how-it-works/m5-semantic-cache): Requests with cosine similarity ≥ 0.95 to a cached request are served from Cloudflare Vectorize. On 5% of hits, the cache canary compares the cached response against the live provider and falls back to the provider on near-miss quality dips.
- [M6 — Provider prompt cache](https://tesseraai.io/how-it-works/m6-prompt-cache): Auto-inject native cache markers. Anthropic gives 90% off cached prefix tokens; Google 75%; OpenAI 50% on cached input. The only mutation is per-provider cache markup; the model output contract is identical to the unmarked request.
- [M7 — Context pruning](https://tesseraai.io/how-it-works/m7-context-prune): Conservative trim on long conversations (keep system + last 8 turns). RAG re-rank via FlashRank. `tool_use`/`tool_result` pairs are preserved on Anthropic to maintain the pairing invariant.
- [M8 — Structured output](https://tesseraai.io/how-it-works/m8-structured-output): Constrain completions to a JSON schema with explicit max-tokens budgeting. Saves on over-runs and parse-retry traffic.
- [M9 — Output-length ceiling](https://tesseraai.io/how-it-works/m9-output-length-predictor): A daily cron fits the p90 completion length per workload and injects `max_tokens = p90 × 1.3` into each request. Cuts completion-token waste while leaving 30% headroom above p90 so typical responses are not truncated.
- [M10 — Batch arbitrage](https://tesseraai.io/how-it-works/m10-batch-arbitrage): Route async-tolerant calls to provider Batch APIs (OpenAI Batch + Anthropic Message Batches, both 50% off). Opt-in per workload.
- [M11 — Cross-provider failover](https://tesseraai.io/how-it-works/m11-cross-provider-failover): When the primary upstream returns 5xx / connection error / timeout, retry on OpenRouter. Opt-in per workload, default OFF, JIT-decrypted credential, two-pricing-snapshot audit trail.
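The M1 gate on chained walks (multiply per-hop quality_preservation scores and stop before the product drops below 0.95) can be sketched as follows. Assumptions: `walk_route_chain` and the score table are hypothetical, the model names are the shorthand used in the M1 entry, and the sketch applies the same gate to the first hop, whereas the M1 description only states the cumulative rule for hops past the first.

```python
from typing import Dict, List, Tuple

QUALITY_FLOOR = 0.95


def walk_route_chain(chain: List[str],
                     preservation: Dict[Tuple[str, str], float]) -> Tuple[str, float]:
    """Walk down a model chain while the cumulative quality product stays at or above the floor.

    chain: requested model first, cheaper substitutes after it.
    preservation: hypothetical per-hop quality_preservation scores from the canary.
    """
    current, cumulative = chain[0], 1.0
    for candidate in chain[1:]:
        score = preservation.get((current, candidate))
        if score is None:
            break  # no canary evidence for this hop: stop walking
        if cumulative * score < QUALITY_FLOOR:
            break  # taking this hop would push the chain below the floor
        cumulative *= score
        current = candidate
    return current, cumulative


# Each hop alone clears 0.95, but 0.98 * 0.96 = 0.9408 does not, so the walk stops at sonnet.
model, quality = walk_route_chain(
    ["opus", "sonnet", "haiku"],
    {("opus", "sonnet"): 0.98, ("sonnet", "haiku"): 0.96},
)
```

This is why the product, not the per-hop score, is the gate: two individually acceptable substitutions can compound into an unacceptable one.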
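The M2 key is a SHA-256 over the canonical request body restricted to the fields listed in that entry. The canonicalization below (sorted keys, compact separators, UTF-8) is an assumption, and `exact_cache_key` and `ExactCache` are hypothetical names; the in-memory dict only stands in for whatever store backs the real cache.

```python
import hashlib
import json
from typing import Any, Dict, Optional, Tuple

CACHE_KEY_FIELDS = ("model", "messages", "temperature", "tools", "response_format")
DEFAULT_TTL_SECONDS = 7 * 24 * 3600  # 7 days


def exact_cache_key(body: Dict[str, Any]) -> str:
    """SHA-256 over a canonical JSON encoding of the keyed request fields only."""
    canonical = {field: body.get(field) for field in CACHE_KEY_FIELDS}
    encoded = json.dumps(canonical, sort_keys=True, separators=(",", ":"), ensure_ascii=False)
    return hashlib.sha256(encoded.encode("utf-8")).hexdigest()


class ExactCache:
    """In-memory stand-in for the cache store; `now` is passed in for testability."""

    def __init__(self, ttl_seconds: float = DEFAULT_TTL_SECONDS) -> None:
        self.ttl_seconds = ttl_seconds
        self._entries: Dict[str, Tuple[float, Any]] = {}

    def get(self, body: Dict[str, Any], now: float) -> Optional[Any]:
        key = exact_cache_key(body)
        entry = self._entries.get(key)
        if entry is None:
            return None
        expires_at, response = entry
        if now >= expires_at:
            del self._entries[key]  # expired: evict and treat as a miss
            return None
        return response

    def put(self, body: Dict[str, Any], response: Any, now: float) -> None:
        self._entries[exact_cache_key(body)] = (now + self.ttl_seconds, response)
```

Fields outside the keyed set (for example a `user` tag) do not affect the key, and dict key order never does, because `json.dumps(..., sort_keys=True)` sorts nested objects too.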
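The M5 lookup rule (serve from cache only when cosine similarity is at least 0.95, and re-check about 5% of hits against the live provider) is sketched below with a linear scan. The scan is only a stand-in for a vector-index query such as Cloudflare Vectorize; `semantic_lookup`, `should_run_canary`, and the toy two-dimensional embeddings are hypothetical.

```python
import math
import random
from typing import Any, List, Optional, Sequence, Tuple

SIMILARITY_THRESHOLD = 0.95  # from the M5 description
CANARY_RATE = 0.05           # fraction of hits re-checked against the live provider


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def semantic_lookup(query: Sequence[float],
                    entries: List[Tuple[Sequence[float], Any]]) -> Optional[Any]:
    """Return the most similar cached response if it clears the threshold, else None."""
    best_response, best_score = None, -1.0
    for vector, response in entries:
        score = cosine(query, vector)
        if score > best_score:
            best_response, best_score = response, score
    return best_response if best_score >= SIMILARITY_THRESHOLD else None


def should_run_canary(rng: random.Random) -> bool:
    """Sample roughly CANARY_RATE of cache hits for a live-provider comparison."""
    return rng.random() < CANARY_RATE
```

Passing an explicit `random.Random` keeps canary sampling reproducible in tests; production code would more likely seed per request or hash the request id.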
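The M6 discounts translate into per-request input cost as below. This back-of-envelope sketch uses only the discount figures stated in the M6 entry; `prompt_input_cost` and its parameters are hypothetical, the $3.00 per million input tokens is an illustrative placeholder, and real provider billing has terms this does not model (for example, Anthropic charges a premium on cache writes).

```python
from decimal import Decimal

# Cached-prefix discounts as stated in the M6 description.
CACHED_PREFIX_DISCOUNT = {
    "anthropic": Decimal("0.90"),
    "google": Decimal("0.75"),
    "openai": Decimal("0.50"),
}


def prompt_input_cost(provider: str, input_usd_per_mtok: Decimal,
                      prompt_tokens: int, cached_tokens: int) -> Decimal:
    """Input-side USD cost when `cached_tokens` of the prompt hit the provider cache.

    Cache-write surcharges and storage fees are deliberately not modeled.
    """
    if not 0 <= cached_tokens <= prompt_tokens:
        raise ValueError("cached_tokens must be between 0 and prompt_tokens")
    discount = CACHED_PREFIX_DISCOUNT[provider]
    uncached = prompt_tokens - cached_tokens
    usd = (input_usd_per_mtok * uncached
           + input_usd_per_mtok * cached_tokens * (1 - discount))
    return usd / Decimal(1_000_000)


# 10k-token prompt with an 8k-token cached prefix at an illustrative $3.00 / 1M input tokens:
with_cache = prompt_input_cost("anthropic", Decimal("3.00"), 10_000, 8_000)
without_cache = prompt_input_cost("anthropic", Decimal("3.00"), 10_000, 0)
# with_cache == 0.0084 USD, without_cache == 0.03 USD
```

The savings scale with the share of the prompt that is a stable prefix, which is why long, fixed system prompts benefit most.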
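The M7 trim (keep system messages plus the last 8 turns, without separating a `tool_result` from its `tool_use`) can be sketched as a window that widens by one message whenever it would open on a tool result. Assumptions: a "turn" is read as one message; messages use Anthropic-style content blocks typed `tool_use` / `tool_result`, but system prompts are kept as `role: "system"` messages for simplicity (Anthropic's Messages API takes the system prompt as a separate top-level parameter); `prune_context` is hypothetical and FlashRank re-ranking is not shown.

```python
from typing import Any, Dict, List

KEEP_LAST_TURNS = 8  # "keep system + last 8 turns"

Message = Dict[str, Any]


def _has_block(message: Message, block_type: str) -> bool:
    content = message.get("content")
    return isinstance(content, list) and any(
        isinstance(block, dict) and block.get("type") == block_type for block in content
    )


def prune_context(messages: List[Message], keep_last: int = KEEP_LAST_TURNS) -> List[Message]:
    """Keep system messages plus the last `keep_last` messages, never orphaning a tool_result."""
    system = [m for m in messages if m.get("role") == "system"]
    rest = [m for m in messages if m.get("role") != "system"]
    start = max(len(rest) - keep_last, 0)
    # If the window opens on a tool_result, pull in the preceding assistant
    # message that carries the matching tool_use so the pair stays intact.
    while start > 0 and _has_block(rest[start], "tool_result"):
        start -= 1
    return system + rest[start:]
```

Widening rather than narrowing the window is the conservative choice: it may keep one extra message, but it never sends a request the provider would reject for an unpaired tool result.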
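The M9 rule `max_tokens = p90 × 1.3` depends on choices the entry leaves open. This minimal sketch assumes the nearest-rank percentile, rounds the ceiling up, and lets a caller-supplied lower `max_tokens` win; all three are assumptions, and `p90_nearest_rank`, `output_ceiling`, and `inject_ceiling` are hypothetical names. Integer arithmetic avoids float rounding surprises in the percentile rank and the 1.3 multiplier.

```python
from typing import Any, Dict, List

HEADROOM_NUM, HEADROOM_DEN = 13, 10  # the 1.3x multiplier as an exact ratio


def p90_nearest_rank(lengths: List[int]) -> int:
    """90th percentile of observed completion lengths, nearest-rank method."""
    if not lengths:
        raise ValueError("need at least one observed completion length")
    ordered = sorted(lengths)
    rank = -(-9 * len(ordered) // 10)  # ceil(0.9 * n) using integer math
    return ordered[rank - 1]


def output_ceiling(lengths: List[int]) -> int:
    """max_tokens = ceil(p90 * 1.3)."""
    return -(-p90_nearest_rank(lengths) * HEADROOM_NUM // HEADROOM_DEN)


def inject_ceiling(body: Dict[str, Any], ceiling: int) -> Dict[str, Any]:
    """Return a copy of the request with max_tokens set, never raising a caller's lower limit."""
    updated = dict(body)
    existing = body.get("max_tokens")
    updated["max_tokens"] = ceiling if existing is None else min(existing, ceiling)
    return updated


# Completion lengths 1..100 tokens: p90 = 90, ceiling = ceil(90 * 1.3) = 117.
ceiling = output_ceiling(list(range(1, 101)))
```

In a daily job, `output_ceiling` would run once per workload over recent completion lengths, and `inject_ceiling` would run on every request for that workload.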

## Framework integrations

- [/integrations/langchain](https://tesseraai.io/integrations/langchain): One line of config routes LangChain `ChatOpenAI` / `ChatAnthropic` / `ChatMistralAI` / `ChatGroq` / `ChatCohere` constructors through Tessera. Python + JS.
- [/integrations/vercel-ai](https://tesseraai.io/integrations/vercel-ai): Drop-in for `@ai-sdk/openai` / `@ai-sdk/anthropic` / etc. `generateText` / `streamText` / `generateObject` / `streamObject` route through Tessera.
- [/integrations/llamaindex](https://tesseraai.io/integrations/llamaindex): One line of config routes LlamaIndex `llama_index.llms.*` LLM constructors through Tessera. Index queries, RAG pipelines, agents, sub-question engines all work unchanged.
- [/integrations/mastra](https://tesseraai.io/integrations/mastra): Vercel AI SDK provider shape for Mastra Agents. Drop-in for `new Agent({ model: openai("gpt-4o"), ... })`.
- [/integrations/pydantic-ai](https://tesseraai.io/integrations/pydantic-ai): Two-function pattern (client kwargs + Provider wrapper) for `agent.run_sync()` / `agent.run()` Pydantic AI workflows. v0.1 OpenAI + Anthropic; Mistral/Groq/Cohere queued for v0.2.
- [/integrations/crewai](https://tesseraai.io/integrations/crewai): Factory functions for CrewAI `LLM` instances used by `Agent`, `Crew`, `Task` flows. Multi-agent tool-use loops benefit from exact + semantic cache.
- [/integrations/autogen](https://tesseraai.io/integrations/autogen): AutoGen 0.4+ `ChatCompletionClient` factory. Works with `AssistantAgent`, `SelectorGroupChat`, `Swarm`.

## Engineering blog

- [Tessera ecosystem map: how compression vendors fit the substrate cascade](https://tesseraai.io/blog/tessera-ecosystem-map-substrate-cascade): Why specialist compression/routing/fine-tuning vendors are candidate M-mechanism modules, not competitors. The substrate-include posture for the LLM cost-optimization category.
- [AI cost management in 2026: how proxies, compressors, and routers compose into a substrate](https://tesseraai.io/blog/ai-cost-management-2026-substrate-composition): A practitioner's taxonomy of the LLM cost stack — five approach families, when each one fires, the realistic savings range per family, worked numbers per family.
- [Cut your OpenAI bill 38% without quality regression](https://tesseraai.io/blog/cut-openai-bill-38-percent-without-quality-regression): Worked example from a customer-support agent on `gpt-4o` at 5B tokens/month. $24k baseline → $9.4k after Tessera mechanic stack, quality canary held 0.96.
- [Prompt bloat you don't see](https://tesseraai.io/blog/prompt-bloat-you-dont-see): Where token waste hides in production prompts and how the M3 + M7 + M9 stack catches it.
- [Audit immutability for cost claims](https://tesseraai.io/blog/audit-immutability-cost-claims): Why both `original_cost_usd` and `actual_cost_usd` are snapshot-pinned to a `pricing_catalog` version captured at request time: the receipt every CFO eventually asks for.
- [Cross-provider failover at the edge](https://tesseraai.io/blog/cross-provider-failover-edge): How M11 routes around upstream 5xx via OpenRouter, with two pricing snapshots for audit immutability and HMAC-signed JIT credential decrypt.
- [Using Tessera with LangChain in 30 seconds](https://tesseraai.io/blog/using-tessera-with-langchain): Drop-in cost optimization for a customer-support agent built on LangChain.
- [Per-stack quality canary — deep dive](https://tesseraai.io/blog/per-stack-quality-canary-deep-dive): How the per-stack 0.95 floor is computed, why cumulative quality_preservation matters for chained M1 walks, and what triggers auto-rollback.

## Open-source SDKs

- [github.com/tessera-llm/tessera-sdk](https://github.com/tessera-llm/tessera-sdk): Core SDK. Patches OpenAI / Anthropic / Mistral / Groq / Cohere client constructors with a one-line `tessera.activate(key)` call. Apache-2.0. Published as `tessera-llm-proxy` on PyPI and `@tessera-llm/tessera-sdk` on npm.
- [github.com/tessera-llm/tessera-langchain](https://github.com/tessera-llm/tessera-langchain): LangChain integration. Python + JS. `tessera-langchain` on PyPI; `@tessera-llm/langchain` on npm.
- [github.com/tessera-llm/tessera-vercel-ai](https://github.com/tessera-llm/tessera-vercel-ai): Vercel AI SDK integration. `@tessera-llm/vercel-ai` on npm.
- [github.com/tessera-llm/tessera-llamaindex](https://github.com/tessera-llm/tessera-llamaindex): LlamaIndex integration. Python + JS. `tessera-llamaindex` on PyPI; `@tessera-llm/llamaindex` on npm.
- [pypi.org/project/tessera-pydantic-ai](https://pypi.org/project/tessera-pydantic-ai/): Pydantic AI integration. PyPI-only release line.
- [pypi.org/project/tessera-crewai](https://pypi.org/project/tessera-crewai/): CrewAI integration. PyPI-only release line.
- [pypi.org/project/tessera-autogen](https://pypi.org/project/tessera-autogen/): AutoGen 0.4+ integration. PyPI-only release line.
- [npmjs.com/package/@tessera-llm/mastra](https://www.npmjs.com/package/@tessera-llm/mastra): Mastra Agent framework integration. npm-only release line.

## Optional

- [Terms of Service](https://tesseraai.io/terms): Engagement terms. §2.5 codifies the client's right to pause at any time. §10 codifies vendor neutrality: Tessera accepts no referral fees, kickbacks, or sponsorship from any AI provider, gateway vendor, or observability platform.
- [Privacy policy](https://tesseraai.io/privacy): GDPR-aligned data handling for the EEA jurisdiction.
- [Data Processing Agreement](https://tesseraai.io/legal/dpa): Sub-processor list, anonymisation conditions for US-based AI provider sub-processors, EEA data residency commitments.
- [Imprint](https://tesseraai.io/imprint): Legal entity disclosure per Estonian commercial register requirements. Registry code 16638667, Tallinn.
- [Responsible disclosure](https://tesseraai.io/security/disclosure): Security disclosure policy and contact path.
- [System status](https://tesseraai.io/status): Operational status of `api.tesseraai.io` proxy and `ledger.tesseraai.io` dashboard.