
Free Dev tier · 60M tokens / month · no card

Cut your LLM bill 30–40%. Four optimizations, measured live.

Drop-in proxy for OpenAI, Anthropic, and 12 more. Routes to cheaper-equivalent models, caches repeated prompts, compresses context, batches eligible calls. Every request shows the dollars saved, live. Quality stays at ≥ 0.95 against your golden set or routing auto-disables. Remove one line of code and you're back to direct API in under two hours.

Apache 2.0 · TypeScript + Python

api.tesseraai.io · 30s demo
curl https://api.tesseraai.io/v1/openai/chat/completions \
  -H "X-Tessera-Key: tk_<your-free-key>" \
  -H "Authorization: Bearer sk-<your-openai-key>" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4o",
    "messages": [{"role":"user","content":"Hello"}]
  }'

# Response is plain OpenAI shape.
# Behind the scenes: route + cache + compress + batch.
# Open ledger.tesseraai.io/portal — savings counter ticks live.

  • Free tier ceiling: 60M tokens / mo
  • Optimization mechanics: 4 per request
  • Providers supported: 13, one OpenAI-shaped config each
  • Fee on free tier: $0 forever, until you upgrade

Providers supported

  • OpenAI
  • Anthropic
  • Google Gemini
  • xAI
  • Cohere
  • Mistral
  • DeepSeek
  • Groq
  • Together
  • Fireworks
  • OpenRouter
  • Perplexity
  • Cerebras

GDPR ready

EEA-only Confidential Information storage

DPA on request

Standard contractual clauses on file

SOC 2 Type I

In progress · Q3 2026 attestation target

Vendor-neutral

Zero affiliate revenue, contractually bound

How it works

Three minutes to first measured savings.

01

Sign up

Email + ToS. No card. Get your tk_ key + magic-link instantly at ledger.tesseraai.io/signup-dev.

02

Drop in two headers

Point your OpenAI / Anthropic client at api.tesseraai.io. Send your provider key in Authorization. Send your Tessera key in X-Tessera-Key. That's it.
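
As a rough Python sketch of the two-header setup, using only the standard library: the URL path is copied from the curl demo, and the helper name `build_chat_request` is made up for illustration. It builds the request without sending it, so you can inspect exactly what would go over the wire.

```python
import json
import urllib.request

# Path taken from the curl demo; other providers presumably use a different path segment.
TESSERA_BASE_URL = "https://api.tesseraai.io/v1/openai"


def build_chat_request(provider_key: str, tessera_key: str, payload: dict) -> urllib.request.Request:
    """Build (but do not send) a chat completion request routed through the proxy."""
    return urllib.request.Request(
        url=f"{TESSERA_BASE_URL}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        method="POST",
        headers={
            "Authorization": f"Bearer {provider_key}",  # your existing provider key
            "X-Tessera-Key": tessera_key,               # your Tessera key
            "Content-Type": "application/json",
        },
    )


request = build_chat_request(
    provider_key="sk-<your-openai-key>",
    tessera_key="tk_<your-free-key>",
    payload={"model": "gpt-4o", "messages": [{"role": "user", "content": "Hello"}]},
)
# To send it: urllib.request.urlopen(request), then json-decode the response body.
```

If you already use a provider SDK, the same idea should apply: override the base URL and add `X-Tessera-Key` as a default header.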

03

Watch the counter

Every request goes through the proxy. Savings measured per-request, surfaced live in ledger.tesseraai.io/portal.

01 · LangChain

The dedicated tessera-langchain package wires ChatOpenAI / ChatAnthropic / ChatMistralAI through the Tessera proxy with one line of config — no header plumbing.

tessera-langchain

02 · Vercel AI SDK

The dedicated @tessera-llm/vercel-ai package wires createOpenAI / createAnthropic / createMistral through the Tessera proxy. Works with generateText, streamText, generateObject — unchanged.

@tessera-llm/vercel-ai

03 · LlamaIndex

The dedicated tessera-llamaindex package wires the OpenAI / Anthropic / Mistral / Groq / Cohere LLM classes through the Tessera proxy. RAG, agents, query engines run unchanged.

tessera-llamaindex

Four mechanics, one proxy

Four optimizations every LLM request runs through.

These four are what you see. Internally Tessera runs nine mechanics, including chained auto-route, per-role prompt split, output-length prediction, and batch reconciliation. Reliability primitives sit alongside them: a single mechanic combination is disabled on canary regression, and requests fail over to another provider when the primary upstream returns a 5xx. Full list at /how-it-works.
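
To make the failover primitive concrete, here is a minimal sketch of "fail over on upstream 5xx". It is not Tessera's implementation: the `Upstream` callable shape and the function name `call_with_failover` are assumptions chosen for illustration.

```python
from typing import Callable, Sequence, Tuple

# Hypothetical shape: each upstream takes a payload and returns (http_status, body).
Upstream = Callable[[dict], Tuple[int, str]]


def call_with_failover(upstreams: Sequence[Upstream], payload: dict) -> Tuple[int, str]:
    """Try upstreams in order; move to the next one only on a 5xx response."""
    if not upstreams:
        raise ValueError("at least one upstream is required")
    last: Tuple[int, str] = (0, "")
    for upstream in upstreams:
        last = upstream(payload)
        if not 500 <= last[0] <= 599:
            return last  # success or a client error: failing over would only mask it
    return last  # every upstream returned 5xx; surface the final response


def primary(_payload: dict) -> Tuple[int, str]:
    return (503, "unavailable")  # simulated outage


def secondary(_payload: dict) -> Tuple[int, str]:
    return (200, "ok")


result = call_with_failover([primary, secondary], {"model": "gpt-4o"})
```

Only 5xx responses trigger the next provider here; a 4xx is returned as-is, since retrying a malformed request elsewhere would not help.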

Route

Auto-route to cheaper-equivalent models

For each of your endpoints, we pre-compute which cheaper model returns equivalent quality, for example GPT-4o → GPT-4o-mini or Claude Opus → Sonnet. A quality canary on 5% of traffic keeps checking that assumption.

Cache

Auto-cache repeated prompt hashes

Identical prompts within a 7-day window return the cached response. Entries are keyed by prompt hash, with a per-key TTL. A cache miss falls through to the provider transparently.
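
A prompt-hash cache of this kind can be sketched in a few lines. This is an illustration under assumptions, not Tessera's code: the canonical-JSON SHA-256 key, the in-memory dictionary, and the names `PromptCache` and `complete` are all hypothetical, and a single TTL stands in for the per-key TTL.

```python
import hashlib
import json
import time
from typing import Callable, Dict, Optional, Tuple

SEVEN_DAYS = 7 * 24 * 60 * 60  # cache window in seconds


def prompt_hash(payload: dict) -> str:
    """Hash a canonical JSON form so key order does not change the digest."""
    canonical = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


class PromptCache:
    def __init__(self, ttl_seconds: float = SEVEN_DAYS, clock: Callable[[], float] = time.time):
        self._ttl = ttl_seconds
        self._clock = clock  # injectable so expiry can be tested without waiting
        self._entries: Dict[str, Tuple[float, str]] = {}

    def get(self, payload: dict) -> Optional[str]:
        key = prompt_hash(payload)
        entry = self._entries.get(key)
        if entry is None:
            return None
        stored_at, response = entry
        if self._clock() - stored_at > self._ttl:
            del self._entries[key]  # expired: treat as a miss
            return None
        return response

    def put(self, payload: dict, response: str) -> None:
        self._entries[prompt_hash(payload)] = (self._clock(), response)


def complete(cache: PromptCache, payload: dict, call_upstream: Callable[[dict], str]) -> str:
    """Return the cached response if present; otherwise fall through to the upstream."""
    cached = cache.get(payload)
    if cached is not None:
        return cached
    response = call_upstream(payload)
    cache.put(payload, response)
    return response
```

Only byte-identical payloads (after key sorting) hit the cache; any change to model, messages, or sampling parameters produces a different hash and a miss.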

Compress

Auto-compress context with semantic preservation

Strip low-signal tokens from prompts before they hit the LLM. The LLMLingua-2 algorithm preserves semantic intent. Compression can be toggled per workload.

Batch

Auto-batch eligible requests

When the workload tolerates the latency, parallel calls are batched into a single upstream request. Provider batch APIs (50% discount on OpenAI, etc.) are used when available.
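
The eligibility decision and the resulting cost can be sketched as follows. The `PendingCall` fields, the function names, and the idea of a per-call wait budget are assumptions made for this example; the 50% figure is the OpenAI discount quoted above, and other providers may differ.

```python
from dataclasses import dataclass
from typing import List, Tuple

BATCH_DISCOUNT = 0.50  # OpenAI batch discount quoted above; an assumption for other providers


@dataclass
class PendingCall:
    request_id: str
    list_price_usd: float    # cost if sent as a normal synchronous call
    max_wait_seconds: float  # how long the caller can wait for the result


def split_for_batching(
    calls: List[PendingCall], batch_turnaround_seconds: float
) -> Tuple[List[PendingCall], List[PendingCall]]:
    """Return (batched, immediate): a call is batched only if it can wait out the batch turnaround."""
    batched = [c for c in calls if c.max_wait_seconds >= batch_turnaround_seconds]
    immediate = [c for c in calls if c.max_wait_seconds < batch_turnaround_seconds]
    return batched, immediate


def estimated_cost(batched: List[PendingCall], immediate: List[PendingCall]) -> float:
    """Batched calls get the discount; immediate calls pay list price."""
    batched_cost = sum(c.list_price_usd for c in batched) * (1 - BATCH_DISCOUNT)
    return batched_cost + sum(c.list_price_usd for c in immediate)


calls = [
    PendingCall("a", 0.10, max_wait_seconds=5),       # interactive: must go out now
    PendingCall("b", 0.20, max_wait_seconds=90_000),  # offline job: can wait
    PendingCall("c", 0.30, max_wait_seconds=90_000),
]
batched, immediate = split_for_batching(calls, batch_turnaround_seconds=86_400)
cost = estimated_cost(batched, immediate)  # 0.50 * 0.5 + 0.10
```

In this example the two offline calls are batched at half price and the interactive call is sent immediately, so the estimate is about $0.35 against a $0.60 list price.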

Pricing

Free Dev for exploration. Production when you scale.

Performance fee on Production — $0 if we don't save you money.

Free Dev

$0

forever, up to limit

  • 60M tokens / month
  • 30 requests / minute
  • All 4 optimization mechanics
  • Real-time savings counter
  • Anomaly alerts (read-only)
  • Apache 2.0 SDK · Python + TypeScript
  • No card required
Get free API key →

Production

20%

of measured savings vs your unoptimized spend · $0 invoiced if we save you nothing

  • Unlimited token throughput
  • 60 requests / minute
  • Balance management + Stripe top-ups
  • Monthly savings statement + CSV export (audit-grade)
  • Auto-throttle on cost spike, auto-halt on runaway
  • Team seats (up to 5)
  • Quality SLA floor 0.95 · auto-rollback on drift
Start free, upgrade when you scale →
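
As a worked example of the Production fee, here is the arithmetic as a small function. How the unoptimized baseline is measured is not specified on this page, so both spend figures are simply inputs here, and the function name is hypothetical.

```python
def production_fee(unoptimized_spend_usd: float, actual_spend_usd: float, fee_rate: float = 0.20) -> float:
    """Fee is a share of measured savings; with no savings the fee is zero."""
    measured_savings = max(0.0, unoptimized_spend_usd - actual_spend_usd)
    return round(measured_savings * fee_rate, 2)


# $10,000 unoptimized vs $6,500 actual: $3,500 saved, 20% of that is the fee.
fee = production_fee(10_000, 6_500)

# No savings (or a higher bill) means nothing is invoiced.
no_fee = production_fee(5_000, 5_200)
```

With these numbers the fee is $700, so the net saving after the fee is $2,800.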

FAQ

Why a proxy and not just an observability layer?

Observability shows you what your LLM calls did. Tessera does the optimization inside the request path: the same dollars saved, with zero engineering hours. It is compatible with whatever telemetry you already run.

I already have observability set up.

Keep it. Tessera sits on the request path; your existing tracer still receives downstream telemetry. Different layer, not a replacement.

Will routing change my output quality?

We run a quality canary on 5% of each workload's traffic. If the cheaper-equivalent model's score drifts by more than 10%, the route auto-disables for that workload and a Sentry alert fires. Quality is never traded for cost.
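
The canary logic can be sketched like this. It assumes "drift" means the relative drop in mean quality score between the original model and the cheaper one; the scoring method, the function names, and the constants' names are illustrative and not taken from Tessera's code.

```python
import random
from statistics import mean
from typing import List

CANARY_FRACTION = 0.05  # share of traffic scored on both models
MAX_DRIFT = 0.10        # relative score drop that disables the route


def in_canary(rng: random.Random) -> bool:
    """Sample roughly 5% of requests into the canary."""
    return rng.random() < CANARY_FRACTION


def relative_drift(baseline_scores: List[float], candidate_scores: List[float]) -> float:
    """Relative drop of the cheaper model's mean score versus the baseline model's."""
    baseline = mean(baseline_scores)
    if baseline <= 0:
        raise ValueError("baseline mean score must be positive")
    return (baseline - mean(candidate_scores)) / baseline


def route_enabled(baseline_scores: List[float], candidate_scores: List[float]) -> bool:
    """Keep the cheaper route only while drift stays within the threshold."""
    return relative_drift(baseline_scores, candidate_scores) <= MAX_DRIFT


# About 3% drift: the cheaper route stays on.
healthy = route_enabled([0.90, 0.90], [0.88, 0.86])

# About 17% drift: the route is disabled for this workload.
degraded = route_enabled([0.90, 0.90], [0.70, 0.80])
```

A production version would also need a minimum sample size before acting, so that a handful of noisy scores cannot flip the route.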

What's the 60M ceiling for?

The Free Dev ceiling keeps production traffic from squatting indefinitely on the free quota. Hobby and side projects rarely hit it. When you do, upgrade to Production, where you pay only on measured savings.

What providers are supported today?

OpenAI, Anthropic, Google (Gemini AI Studio), xAI, Cohere, Mistral, DeepSeek, Groq, Together, Fireworks, OpenRouter, Perplexity, Cerebras. AWS Bedrock, Azure OpenAI, and Vertex AI are planned for September 2026.

Where does my data go?

Through Cloudflare Workers (request-path proxy) → upstream provider (your existing key, your existing billing relationship). We log token counts + cost deltas in Supabase (Tessera-managed). Per-request prompt content is not stored. Full audit at /security.

Free key. 30 seconds. No card.

Get free API key →

Email + ToS only. Magic link sign-in. Key shown once — copy it to your secret manager.

Questions? Join the GitHub Discussions →