Tessera · Budget Guard

The spend cap OpenAI removed in 2025, restored at the proxy layer.

Hard request-blocking caps per workload, per API key, per tenant. Sub-second freshness. Cross-provider aggregation. The block-at-request-time guard your dashboard quietly dropped. Plus a full cost-optimization stack on top, same product — and you keep 100% of the savings.

Free Sandbox includes spend caps + observability on day one. Paid plans unlock the full mechanic stack (route / cache / compress / batch) for a flat monthly subscription by token volume. You keep 100% of measured savings.

Why this page exists

Three real bills from the last 90 days.

$38,000

AWS Bedrock · 2026-04-28

A simple prompt-caching miss compounded with an autonomous agent loop. The AWS dashboard's budget alert fired the morning after, by which time the bill was already final.

HN 47933355 →

$30,000

Claude API · 2026-05-15

Cost alert missed by hours. The Register covered it. The dev had alerts configured. The alert fired after the spend had already crossed the threshold by ~15×.

HN 48153258 →

$200

Solo agent · 2026-02-12

Smallest in dollar terms, most representative. Tool-call loop burned through $200 in a few hours. The founder shipped a per-tool budget control of his own and posted it on HN as Lava.so.

HN 46991656 →

All three share one architectural property: warnings, not blocks. OpenAI removed hard budget limits from their API dashboard in late 2025. You now get email reminders at thresholds, not request-time enforcement. AWS Bedrock's budget alerts fire on a delay that can lag the spend by hours. Anthropic ships no per-key cap UI at all. Hard enforcement exists only at the layer that sits in front of the request.

Mechanic

How the hard cap works.

Every request to the proxy carries a key. The key carries policies: daily, weekly, or monthly caps in dollars or tokens. A cap can scope to a single workload, an entire API key, or a tenant identifier you pass in metadata.

On request arrival, the Cloudflare worker reads a running spend counter from KV. Sub-millisecond. If the projected post-request spend would cross the cap, the worker returns HTTP 402 Payment Required with a JSON body naming the cap, the remaining budget, and the period.
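The check described above can be sketched in a few lines. This is an illustrative sketch, not Tessera's worker code: the `Cap` class, the `check_cap` function, and the `remaining_usd` field name are assumptions made for the example, and "crossing" the cap is assumed to mean strictly exceeding it.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Cap:
    """Hypothetical policy record attached to a key."""
    scope: str      # "workload", "key", or "tenant"
    period: str     # e.g. "daily_utc"
    cap_usd: float


def check_cap(cap: Cap, spent_usd: float, would_spend_usd: float) -> Optional[dict]:
    """Return None to allow the request, or a 402-style body to block it."""
    projected = spent_usd + would_spend_usd
    if projected <= cap.cap_usd:
        return None  # still within budget: forward to the upstream provider
    return {
        "status": 402,
        "error": {
            "type": "budget_cap_exceeded",
            "scope": cap.scope,
            "period": cap.period,
            "cap_usd": cap.cap_usd,
            "spent_usd": spent_usd,
            "would_spend_usd": would_spend_usd,
            # Assumed field name for the "remaining budget" the text mentions.
            "remaining_usd": round(max(cap.cap_usd - spent_usd, 0.0), 2),
        },
    }
```

With a $500 daily cap, $499.84 already spent, and a request projected at $0.42, the function returns a blocking body; with $100 spent it returns `None` and the request goes through.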

Your client decides what to do next: fail to user, downgrade to a cheaper model, retry against a fallback provider, or accept the block until the period rolls. The upstream provider never sees the call. The token count stays at zero. The bill stays bounded.
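One way a client might encode that decision is shown below. The sketch is hedged: `plan_after_response` and the `CHEAPER_MODEL` table are hypothetical names, and the policy (downgrade if a cheaper model is known, otherwise wait for the reset) is only one of the options listed above.

```python
from datetime import datetime, timezone

# Hypothetical downgrade table; fill in whatever pairs suit your workload.
CHEAPER_MODEL = {"gpt-4o": "gpt-4o-mini"}


def plan_after_response(status: int, body: dict, model: str, now: datetime) -> dict:
    """Decide the client's next step after a proxy response."""
    if status != 402:
        return {"action": "proceed"}
    error = body.get("error", {})
    if error.get("type") != "budget_cap_exceeded":
        # A 402 we do not recognize: surface it rather than guess.
        return {"action": "fail", "reason": "unrecognized 402"}
    cheaper = CHEAPER_MODEL.get(model)
    if cheaper is not None:
        return {"action": "downgrade", "model": cheaper}
    # No cheaper model known: wait until the period rolls over.
    resets_at = datetime.fromisoformat(error["resets_at"].replace("Z", "+00:00"))
    wait_seconds = max((resets_at - now).total_seconds(), 0.0)
    return {"action": "wait", "seconds": wait_seconds}
```

The function is pure, so it can be unit-tested without a network call; the HTTP client that produces `status` and `body` is left to your stack.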

Sample 402 response

HTTP/1.1 402 Payment Required
Content-Type: application/json
X-Tessera-Cap-Scope: workload
X-Tessera-Cap-Period: daily

{
  "error": {
    "type": "budget_cap_exceeded",
    "scope": "workload",
    "workload_id": "wkld_a83f...",
    "period": "daily_utc",
    "cap_usd": 500.0,
    "spent_usd": 499.84,
    "would_spend_usd": 0.42,
    "resets_at": "2026-05-25T00:00:00Z",
    "remediation": [
      "downgrade_model: gpt-4o → gpt-4o-mini",
      "retry_after_reset",
      "increase_cap_via_portal"
    ]
  }
}
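The sample body carries `cap_usd`, `spent_usd`, and `would_spend_usd`, so the remaining budget and the overshoot can be derived on the client. A minimal sketch follows, using a trimmed copy of the sample with only the numeric fields; the helper names are illustrative, and `Decimal` is used to avoid float rounding on currency.

```python
import json
from decimal import Decimal

# Trimmed copy of the sample body above (numeric fields only).
SAMPLE_BODY = """
{
  "error": {
    "type": "budget_cap_exceeded",
    "cap_usd": 500.0,
    "spent_usd": 499.84,
    "would_spend_usd": 0.42
  }
}
"""


def remaining_usd(body_text: str) -> Decimal:
    """Budget left in the period before the blocked request."""
    error = json.loads(body_text, parse_float=Decimal)["error"]
    return max(error["cap_usd"] - error["spent_usd"], Decimal("0"))


def overshoot_usd(body_text: str) -> Decimal:
    """How far past the cap the blocked request would have gone."""
    error = json.loads(body_text, parse_float=Decimal)["error"]
    return error["spent_usd"] + error["would_spend_usd"] - error["cap_usd"]
```

For the sample, $0.16 remains and the blocked request would have overshot the cap by $0.26.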

Scope

Per workload, per key, per tenant. Or any combination. Caps compose: hit the strictest cap first, return the structured reason.
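A sketch of how composed caps could resolve to a single reason. It assumes "strictest" means the blocking cap with the least headroom left; `ScopedCap` and `first_blocking_cap` are hypothetical names, not part of the Tessera API.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ScopedCap:
    scope: str        # "workload", "key", or "tenant"
    cap_usd: float
    spent_usd: float

    @property
    def headroom_usd(self) -> float:
        return self.cap_usd - self.spent_usd


def first_blocking_cap(caps: List[ScopedCap], would_spend_usd: float) -> Optional[ScopedCap]:
    """Return the strictest cap the request would cross, or None if all pass."""
    blocking = [c for c in caps if c.spent_usd + would_spend_usd > c.cap_usd]
    # Assumption: "strictest" = smallest remaining headroom among blocking caps.
    return min(blocking, key=lambda c: c.headroom_usd, default=None)
```

If a workload cap has $10 of headroom and a tenant cap has $2, a $5 request is blocked with the tenant cap reported as the reason.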

Cross-provider

Caps total spend across OpenAI + Anthropic + Mistral + Groq + Cohere in a single budget envelope. Provider-side dashboards silo per-provider.
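The difference from per-provider silos is easiest to see with numbers. In this sketch (function names and figures are illustrative), no single provider has crossed $500, yet the combined envelope blocks the next request.

```python
from typing import Mapping


def envelope_spend(spend_by_provider: Mapping[str, float]) -> float:
    """Total spend across every provider in the budget envelope."""
    return sum(spend_by_provider.values())


def envelope_blocks(spend_by_provider: Mapping[str, float],
                    cap_usd: float, would_spend_usd: float) -> bool:
    """True if the next request would push the combined spend past the cap."""
    return envelope_spend(spend_by_provider) + would_spend_usd > cap_usd


spend = {"openai": 300.0, "anthropic": 150.0, "mistral": 49.84}
```

Here `envelope_blocks(spend, 500.0, 0.42)` is true, while a check against the OpenAI figure alone would still pass.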

Freshness

KV-tracked running spend, written on every request finalize. The lag between spend being recorded and the cap check seeing it is under 1 second across the edge.

Same product, dual hook

Spend caps are additive. The savings stack still ships underneath.

Tessera is not only a spend-control product. The same proxy that blocks at the cap also routes between model tiers, caches exact + semantic matches, injects provider prompt-cache headers, compresses prompts, and arbitrages async-tolerant calls onto provider batch APIs. The cap is one of ten mechanics, and it is most useful when you wire it together with the others.

Without Tessera

$30,000 surprise bill, no warning that fires before the spend.

Cost dashboards lag by hours. Email alerts arrive after the period has settled. No request-time enforcement exists at the provider layer. Three months of incidents on HN to prove it.

With Tessera

Hard 402 at $500. Plus 30-50% off the underlying spend.

The cap blocks the runaway at request time. Below the cap, the mechanic stack keeps the bill 30-50% lower than baseline through routing, caching, and compression. Eval-gated at a 0.95 quality floor. Flat monthly pricing by token volume, and you keep 100% of the savings.

See the full mechanic catalog and the worked example ($24K → $9.4K) on the main How-It-Works page.

FAQ

Five real questions.

How is this different from LiteLLM virtual-key budgets or Portkey gateway limits?

Both ship soft (dashboard-shown) budgets. Tessera supports hard request-blocking caps with sub-second freshness. The request never reaches the upstream provider once you cross the cap. Plus cross-provider aggregation: caps total your OpenAI + Anthropic + Mistral spend, not per-provider silos.

Does this work for AWS Bedrock?

Bedrock IAM/STS integration is on the roadmap, not shipped. The current Tessera proxy supports direct provider endpoints with well-defined wire formats: OpenAI, Anthropic, Mistral, Groq, Cohere. If your stack is 80% Bedrock today, describe your setup via the dev signup and we'll email when the Bedrock adapter lands.

What happens when the cap is hit mid-conversation?

HTTP 402 with structured JSON body. Cap scope, period, remaining budget, reset time, and a remediation field suggesting downgrades or retries. Your client logic decides whether to fail to user, downgrade to a cheaper model, or retry against a fallback provider. The upstream provider never sees the blocked call.

Do hard caps slow down legitimate traffic?

The cap check is a single KV read at the Cloudflare edge, adding under a millisecond to the existing ~15-25 ms p50 proxy hop. Negligible versus the upstream provider call.

Can I set caps without the savings layer?

Yes. The Free Sandbox tier ships cap enforcement + observability with no pricing dependency. Useful if you want to validate the spend-control mechanic on real traffic without committing to a paid tier. The flat monthly subscription (by token volume) only applies once you move to a paid tier; the Free Sandbox stays free.

Free Sandbox includes the cap on day one.

60M tokens / month. No card. The cap mechanic is active in observability mode from your first request. Flip enforcement on when you're ready.

Operated by Fintechagency OÜ · Tallinn, Estonia