Free Dev tier · 60M tokens / month · no card
Cut your LLM bill 30–40%. Four optimizations, measured live.
Drop-in proxy for OpenAI, Anthropic, and 11 more. Routes to cheaper-equivalent models, caches repeated prompts, compresses context, batches eligible calls. Every request shows the dollars saved, live. Quality stays at ≥ 0.95 against your golden set or routing auto-disables. Remove one line of code and you're back to direct API in under two hours.
Apache 2.0 · TypeScript + Python
curl https://api.tesseraai.io/v1/openai/chat/completions \
  -H "X-Tessera-Key: tk_<your-free-key>" \
  -H "Authorization: Bearer sk-<your-openai-key>" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4o",
    "messages": [{"role":"user","content":"Hello"}]
  }'
# Response is plain OpenAI shape.
# Behind the scenes: route + cache + compress + batch.
# Open ledger.tesseraai.io/portal to watch the savings counter tick live.
Free tier ceiling
60M
tokens / mo
Optimization mechanics
4
per request
Providers supported
13
one OpenAI-shaped config each
Fee on free tier
$0
forever, until you upgrade
Providers supported
- OpenAI
- Anthropic
- Google Gemini
- xAI
- Cohere
- Mistral
- DeepSeek
- Groq
- Together
- Fireworks
- OpenRouter
- Perplexity
- Cerebras
GDPR ready
EEA-only Confidential Information storage
DPA on request
Standard contractual clauses on file
SOC 2 Type I
In progress · Q3 2026 attestation target
Vendor-neutral
Zero affiliate revenue, contractually bound
How it works
Three minutes to first measured savings.
01
Sign up
Email + ToS. No card. Get your tk_ key + magic-link instantly at ledger.tesseraai.io/signup-dev.
02
Drop in two headers
Point your OpenAI / Anthropic client at api.tesseraai.io. Send your provider key in Authorization. Send your Tessera key in X-Tessera-Key. That's it.
03
Watch the counter
Every request goes through the proxy. Savings measured per-request, surfaced live in ledger.tesseraai.io/portal.
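The two-header setup from step 02 can be sketched in Python with only the standard library. The URL matches the curl example earlier on this page; the helper name `build_tessera_request` is an illustrative assumption and not part of any Tessera SDK.

```python
import json
import urllib.request

# Endpoint taken from the curl example; other providers would use a different path.
TESSERA_OPENAI_URL = "https://api.tesseraai.io/v1/openai/chat/completions"


def build_tessera_request(tessera_key: str, provider_key: str,
                          model: str, messages: list) -> urllib.request.Request:
    """Build (but do not send) a chat completion request routed via the proxy.

    Hypothetical helper, for illustration only.
    """
    body = json.dumps({"model": model, "messages": messages}).encode("utf-8")
    headers = {
        "X-Tessera-Key": tessera_key,               # your Tessera key (tk_...)
        "Authorization": f"Bearer {provider_key}",  # your existing provider key
        "Content-Type": "application/json",
    }
    return urllib.request.Request(TESSERA_OPENAI_URL, data=body,
                                  headers=headers, method="POST")
```

With real keys and network access, passing the returned request to `urllib.request.urlopen` sends it; the response body should then be parseable as ordinary OpenAI-shaped JSON.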
01 · LangChain
The dedicated tessera-langchain package wires ChatOpenAI / ChatAnthropic / ChatMistralAI through the Tessera proxy with one line of config — no header plumbing.
02 · Vercel AI SDK
The dedicated @tessera-llm/vercel-ai package wires createOpenAI / createAnthropic / createMistral through the Tessera proxy. Works with generateText, streamText, generateObject — unchanged.
03 · LlamaIndex
The dedicated tessera-llamaindex package wires the OpenAI / Anthropic / Mistral / Groq / Cohere LLM classes through the Tessera proxy. RAG, agents, query engines run unchanged.
Four mechanics, one proxy
Four optimizations every LLM request runs through.
These four are what you see. Internally Tessera runs nine, including chained auto-route, per-role prompt split, output-length prediction, and batch reconciliation. Reliability primitives sit on top: a single mechanic combination is disabled on canary regression, and requests fail over to another provider when the primary upstream returns a 5xx. Full list at /how-it-works.
Route
Auto-route to cheaper-equivalent models
For each of your endpoints, we pre-compute which cheaper model returns equivalent quality. GPT-4o → GPT-4o-mini. Claude Opus → Sonnet. A quality canary on 5% of traffic keeps checking that the assumption still holds.
Cache
Auto-cache repeated prompt hashes
Identical-prompt requests within a 7-day window return cached responses. Hash-locked. Per-key TTL. Cache miss falls through transparently.
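As a rough illustration of hash-keyed caching with a TTL, here is a minimal in-memory sketch. The class name `PromptCache`, the use of SHA-256 over canonical JSON of the model and messages, and the injectable clock are all assumptions; this page does not specify how Tessera computes or stores its hashes.

```python
import hashlib
import json
import time
from typing import Callable, Optional

SEVEN_DAYS_SECONDS = 7 * 24 * 60 * 60


class PromptCache:
    """Illustrative in-memory cache keyed by a hash of the request (hypothetical)."""

    def __init__(self, ttl_seconds: float = SEVEN_DAYS_SECONDS,
                 clock: Callable[[], float] = time.time) -> None:
        self.ttl_seconds = ttl_seconds
        self.clock = clock
        self._store: dict = {}  # hash -> (stored_at, response)

    @staticmethod
    def key_for(model: str, messages: list) -> str:
        # Canonical JSON so that identical requests always hash identically.
        payload = json.dumps({"model": model, "messages": messages},
                             sort_keys=True, separators=(",", ":"))
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()

    def get(self, model: str, messages: list) -> Optional[str]:
        key = self.key_for(model, messages)
        entry = self._store.get(key)
        if entry is None:
            return None  # miss: the caller falls through to the upstream provider
        stored_at, response = entry
        if self.clock() - stored_at > self.ttl_seconds:
            del self._store[key]  # an expired entry counts as a miss
            return None
        return response

    def put(self, model: str, messages: list, response: str) -> None:
        self._store[self.key_for(model, messages)] = (self.clock(), response)
```

A miss returns `None`, so the caller can forward the request upstream and store the result with `put`, which mirrors the transparent fall-through described above.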
Compress
Auto-compress context with semantic preservation
Strip low-signal tokens from prompts before they hit the LLM. LLMLingua-2 algorithm preserves semantic intent. Optional per-workload toggle.
Batch
Auto-batch eligible requests
When your latency tolerance allows, parallel calls are batched into a single upstream request. Provider batch APIs (50% discount on OpenAI, etc.) are used when available.
Pricing
Free Dev for exploration. Production when you scale.
Performance fee on Production — $0 if we don't save you money.
Free Dev
$0
forever, up to limit
- 60M tokens / month
- 30 requests / minute
- All 4 optimization mechanics
- Real-time savings counter
- Anomaly alerts (read-only)
- Apache 2.0 SDK · Python + TypeScript
- No card required
Production
20%
of measured savings vs your unoptimized spend · $0 invoiced if we save you nothing
- Unlimited token throughput
- 60 requests / minute
- Balance management + Stripe top-ups
- Monthly savings statement + CSV export (audit-grade)
- Auto-throttle on cost spike, auto-halt on runaway
- Team seats (up to 5)
- Quality SLA floor 0.95 · auto-rollback on drift
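The Production fee reduces to simple arithmetic. The sketch below assumes measured savings are the difference between unoptimized and actual spend, floored at zero; the function name and the dollar figures in the note that follows are hypothetical.

```python
FEE_RATE = 0.20  # 20% of measured savings, per the Production tier


def production_fee(unoptimized_spend: float, actual_spend: float) -> float:
    """Return the fee in dollars: 20% of savings, or 0 if nothing was saved."""
    savings = max(unoptimized_spend - actual_spend, 0.0)
    return round(FEE_RATE * savings, 2)
```

For instance, if unoptimized spend would have been $10,000 and actual spend was $6,500, measured savings are $3,500 and the fee is $700. If actual spend is not lower than unoptimized spend, the fee is $0.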
FAQ
Why a proxy and not just an observability layer?
Observability shows you what your LLM calls did. Tessera does the optimization inside the request path. Same dollar saved, zero engineer hours. Compatible with whatever telemetry you already run.
I already have observability set up.
Keep it. Tessera sits on the request path; your existing tracer still receives downstream telemetry. Different layer, not a replacement.
Will routing change my output quality?
For each workload, we run a quality canary on 5% of traffic. If the cheaper-equivalent model's score drifts by more than 10%, the route auto-disables for that workload and a Sentry alert fires. Quality is never traded for cost.
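The canary decision can be sketched as follows. This assumes that a drift of more than 10% means the cheaper model's mean score falls more than 10% below the baseline model's mean score on the sampled traffic. The function names, the uniform random sampling, and the shape of the score inputs are hypothetical, since this page does not describe them.

```python
import random
from statistics import mean

CANARY_FRACTION = 0.05      # share of traffic sampled for the quality check
MAX_RELATIVE_DRIFT = 0.10   # disable the route beyond this relative drop in score


def in_canary(rng: random.Random) -> bool:
    """Decide whether a request joins the canary sample (assumed: uniform random)."""
    return rng.random() < CANARY_FRACTION


def route_should_stay_enabled(baseline_scores: list, candidate_scores: list) -> bool:
    """Compare mean quality scores; False means the cheaper route is disabled."""
    baseline = mean(baseline_scores)
    candidate = mean(candidate_scores)
    if baseline <= 0:
        return False  # no meaningful baseline: fail safe and disable the route
    drift = (baseline - candidate) / baseline
    return drift <= MAX_RELATIVE_DRIFT
```

Failing safe when the baseline is unusable is a design choice in this sketch: it favors sending traffic to the original model over guessing.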
What's the 60M ceiling for?
The ceiling keeps production traffic from squatting indefinitely on free quota. Hobby and side projects rarely hit it. If you do, upgrade to Production, where you pay only on measured savings.
What providers are supported today?
OpenAI, Anthropic, Google (Gemini AI Studio), xAI, Cohere, Mistral, DeepSeek, Groq, Together, Fireworks, OpenRouter, Perplexity, Cerebras. AWS Bedrock, Azure OpenAI, Vertex AI — September 2026.
Where does my data go?
Through Cloudflare Workers (request-path proxy) → upstream provider (your existing key, your existing billing relationship). We log token counts + cost deltas in Supabase (Tessera-managed). Per-request prompt content is not stored. Full audit at /security.
Free key. 30 seconds. No card.
Get free API key →
Email + ToS only. Magic link sign-in. Key shown once, so copy it to your secret manager.