Mechanic · M11 · Reliability primitive

Cross-provider failover

Shipped 20 May 2026 · Worker 0.43.0-m11-failover · Migration 0081

When the primary LLM provider returns a 5xx, a connection error, or a timeout, the Tessera proxy retries the customer's request on OpenRouter with the namespaced equivalent model. The customer sees a successful response instead of a transport failure. This page documents what we built, the audit invariants that hold across the failover path, and the deferrals we deliberately did not paper over.

M11 is a reliability primitive, not a cost mechanic. It does not save money — when failover fires, the customer is briefly paying OpenRouter's 5.5% surcharge on top of the model rate. The framing matters: this is request infrastructure, sitting in the same proxy path as the nine cost mechanics, addressing a different concern.

What it does

On every request, after the primary forwarder returns (or throws), the worker classifies the outcome via parseFailoverTrigger:

  • upstream_5xx — primary returned a 5xx status. 4xx is excluded (caller-fault — the request would also fail on the alternate).
  • connection_error — fetch rejected before a response arrived (TCP reset, DNS failure, unreachable host).
  • timeout — request budget exceeded by the worker's outer guard.

Any classification other than these three returns null and the failover path is skipped. The customer sees the original response unchanged.
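
The classification above can be sketched as a small pure function. This is a minimal illustration, not the worker's source: the PrimaryOutcome shape and its kind values are assumptions made for the example, and only the three trigger names and the function name parseFailoverTrigger come from this page.

```typescript
// Hypothetical outcome shape; the worker's real types are not shown on this page.
type PrimaryOutcome =
  | { kind: "response"; status: number } // primary returned an HTTP response
  | { kind: "fetch_rejected" }           // fetch rejected before a response arrived
  | { kind: "timed_out" };               // the worker's outer guard fired

type FailoverTrigger = "upstream_5xx" | "connection_error" | "timeout";

function parseFailoverTrigger(outcome: PrimaryOutcome): FailoverTrigger | null {
  if (outcome.kind === "timed_out") return "timeout";
  if (outcome.kind === "fetch_rejected") return "connection_error";
  // Only 5xx qualifies. 4xx is caller-fault and would also fail on the alternate.
  return outcome.status >= 500 && outcome.status <= 599 ? "upstream_5xx" : null;
}
```

A null result means the failover path is skipped and the original response is returned unchanged.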

When the trigger fires AND the workload has opted in AND an OpenRouter API key is provisioned AND the request is not streaming AND the requested model exists in our static translation table, the worker JIT-decrypts the OR key, swaps the model field to the OR namespace (gpt-5 → openai/gpt-5, deepseek-chat → deepseek/deepseek-chat), and retries against openrouter.ai/api/v1. The customer sees the OR response with extra response headers identifying the failover.
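
The conjunction of conditions above can be sketched as one eligibility check followed by a table lookup. This is a sketch under stated assumptions: FailoverContext, planFailover, and the reason strings are hypothetical names, and the table holds only the model pairs named on this page, not the full translation table.

```typescript
// Illustrative subset: only the model pairs mentioned on this page.
const OR_MODEL_TABLE = new Map<string, string>([
  ["gpt-5", "openai/gpt-5"],
  ["gpt-5-mini", "openai/gpt-5-mini"],
  ["deepseek-chat", "deepseek/deepseek-chat"],
]);

// Hypothetical context object gathering the inputs the text lists.
interface FailoverContext {
  trigger: string | null;     // result of trigger classification, null if none
  workloadOptedIn: boolean;   // cross_provider_failover_enabled
  vaultKeyId: string | null;  // openrouter_failover_key_vault_id
  streaming: boolean;         // request body had stream: true
  model: string;              // model the primary was asked for
}

type FailoverPlan =
  | { failover: true; orModel: string }
  | { failover: false; reason: string };

function planFailover(ctx: FailoverContext): FailoverPlan {
  if (ctx.trigger === null) return { failover: false, reason: "no_trigger" };
  if (!ctx.workloadOptedIn) return { failover: false, reason: "not_opted_in" };
  if (ctx.vaultKeyId === null) return { failover: false, reason: "no_key" };
  if (ctx.streaming) return { failover: false, reason: "streaming" };
  const orModel = OR_MODEL_TABLE.get(ctx.model);
  if (orModel === undefined) return { failover: false, reason: "model_not_mapped" };
  return { failover: true, orModel };
}
```

Every condition is checked before the key is touched, so the JIT decrypt only happens for a request that will actually be retried.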

What we built (MVP scope locks)

Migration 0081 added two workload columns — cross_provider_failover_enabled (boolean, default FALSE) and openrouter_failover_key_vault_id (UUID into vault.secrets) — with a CHECK constraint that prevents orphan keys: vault uuid non-null requires the toggle to be ON. Three Vault RPC wrappers (m11_store_openrouter_failover_key, m11_read_openrouter_failover_key, m11_destroy_openrouter_failover_key) handle storage, JIT-decrypt, and cleanup, all SECURITY DEFINER with locked search paths.

The worker stores only the vault UUID in Cloudflare KV. The decrypted OpenRouter key is fetched per-failover via an HMAC-signed call to /api/internal/failover-key on the dashboard — cold path only, never cached across requests or across isolates. The plaintext key exists in worker memory for the duration of one failover and is then discarded. Audit: audit_events receives an m11_failover_key_decrypted row on every successful decrypt.

Four columns on optimize_savings record the failover provenance: failover_attempted, failover_from_provider, failover_trigger, primary_attempted_pricing_snapshot_id. Two pricing snapshots are persisted per failover row — the primary's catalog snapshot we would have used, and OpenRouter's snapshot we actually billed against. The audit-immutability invariant the rest of the proxy holds (every $ figure traceable to an immutable catalog row) covers the failover path the same way.
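
As a rough illustration of the provenance fields, the sketch below models the four columns as a TypeScript type. The column names come from this page. The value types, the nullability on non-failover rows, and the helper name failoverProvenance are assumptions.

```typescript
type FailoverTrigger = "upstream_5xx" | "connection_error" | "timeout";

// Column names are from this page; value types and nullability are assumed.
interface FailoverProvenance {
  failover_attempted: boolean;
  failover_from_provider: string | null;
  failover_trigger: FailoverTrigger | null;
  primary_attempted_pricing_snapshot_id: string | null;
}

// Hypothetical helper: builds the provenance fields for one savings row.
function failoverProvenance(
  failover: { fromProvider: string; trigger: FailoverTrigger; primarySnapshotId: string } | null,
): FailoverProvenance {
  if (failover === null) {
    // Assumption: rows served by the primary leave the detail columns NULL.
    return {
      failover_attempted: false,
      failover_from_provider: null,
      failover_trigger: null,
      primary_attempted_pricing_snapshot_id: null,
    };
  }
  return {
    failover_attempted: true,
    failover_from_provider: failover.fromProvider,
    failover_trigger: failover.trigger,
    primary_attempted_pricing_snapshot_id: failover.primarySnapshotId,
  };
}
```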

The capability gate in decideFailover refuses to fail over when the request body carries a tools array and the OR alternate lacks function-calling support. We do not silently drop a tool call to dodge an outage. If no eligible alternative exists, the primary 5xx surfaces unchanged.
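
The capability gate reduces to a single implication: a request that uses tools requires an alternate that supports function calling. The sketch below is illustrative only. The type names and passesCapabilityGate are hypothetical, and treating an empty tools array as "no tools" is an assumption the text does not settle.

```typescript
// Hypothetical shapes; only the tools array and function-calling support come from the text.
interface ChatRequestBody {
  model: string;
  tools?: unknown[];
}

interface AlternateCapabilities {
  functionCalling: boolean;
}

function passesCapabilityGate(body: ChatRequestBody, alt: AlternateCapabilities): boolean {
  // Assumption: an empty tools array is treated the same as no tools at all.
  const usesTools = Array.isArray(body.tools) && body.tools.length > 0;
  // Refuse rather than silently drop a tool call.
  return !usesTools || alt.functionCalling;
}
```

When the gate returns false and no other alternate qualifies, the primary's 5xx is surfaced unchanged.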

What we don’t do yet

Five Phase 2 deferrals are explicit, not hidden:

  • Active failover via population signal. The worker today reads the per-isolate circuit-breaker state. Cross-isolate consensus via a Cloudflare Durable Object holding the canonical breaker state is on the roadmap — when the broader fleet sees 5xx rates spike before any one isolate accumulates enough samples to flip, active failover redirects new requests without first paying the per-request 5xx wait.
  • Anthropic /v1/messages primary. Anthropic's body shape diverges from OpenAI's chat-completions shape (separate system field, typed content blocks, distinct tool schema). MVP routes only OpenAI-shape primaries; cross-shape body translation lands in a separate sprint.
  • Mid-stream 5xx surfacing. Streaming requests (stream: true) skip failover. Surfacing partial completion plus a terminal [ERROR] marker requires SSE tee + chunk-class detection that we'd rather not ship half-done.
  • Recovery probe + fail-back. No synthetic 10-second probe to test the primary upstream while failover is active. Recovery is implicit: the next successful primary request returns the customer to the direct path. The synthetic probe is queued — the cost is monitoring complexity and the benefit only shows up during multi-minute outages where the implicit path delays recovery on the second-affected request.
  • Per-pair canary validation. The daily promptfoo canary scores the customer's configured mechanic stack. Per-pair validation across the failover path (OpenRouter-served gpt-5 vs OpenAI-direct gpt-5) is queued; today we surface every failover event on the audit ledger (/portal/audit for sponsors with provisioned access) instead of asserting per-pair quality automatically.

How to enable

On a workload page in /portal/settings, paste an OpenRouter API key (sk-or-v1-…) into the OpenRouter failover key form and save. The store action wraps the key in Vault and writes the UUID; the toggle flips ON in lockstep (the DB CHECK constraint guarantees the (toggle, vault uuid) pair is consistent on every row). Existing keys can be rotated by saving a new value: the prior Vault row is destroyed and a fresh one is created. The Clear key button destroys the Vault row, NULLs the column, and force-disables the toggle.
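
The store, rotate, and clear transitions can be modelled in a few lines. This sketch uses an in-memory stand-in for Vault and is not the dashboard's code: FakeVault, saveKey, clearKey, and satisfiesCheck are hypothetical names, and only the two column names and the CHECK rule come from this page.

```typescript
interface WorkloadFailoverSettings {
  cross_provider_failover_enabled: boolean;
  openrouter_failover_key_vault_id: string | null;
}

// Mirrors the CHECK constraint: a non-null vault uuid requires the toggle to be ON.
function satisfiesCheck(s: WorkloadFailoverSettings): boolean {
  return s.openrouter_failover_key_vault_id === null || s.cross_provider_failover_enabled;
}

// Hypothetical in-memory stand-in for Vault storage.
class FakeVault {
  private rows = new Map<string, string>();
  private next = 1;
  store(secret: string): string {
    const id = `vault-${this.next++}`;
    this.rows.set(id, secret);
    return id;
  }
  destroy(id: string): void {
    this.rows.delete(id);
  }
  has(id: string): boolean {
    return this.rows.has(id);
  }
}

// Store or rotate: destroy any prior Vault row, create a fresh one, flip the toggle ON.
function saveKey(vault: FakeVault, current: WorkloadFailoverSettings, key: string): WorkloadFailoverSettings {
  if (current.openrouter_failover_key_vault_id !== null) {
    vault.destroy(current.openrouter_failover_key_vault_id);
  }
  return {
    cross_provider_failover_enabled: true,
    openrouter_failover_key_vault_id: vault.store(key),
  };
}

// Clear: destroy the Vault row, NULL the column, force-disable the toggle.
function clearKey(vault: FakeVault, current: WorkloadFailoverSettings): WorkloadFailoverSettings {
  if (current.openrouter_failover_key_vault_id !== null) {
    vault.destroy(current.openrouter_failover_key_vault_id);
  }
  return { cross_provider_failover_enabled: false, openrouter_failover_key_vault_id: null };
}
```

Because both transitions return the toggle and the vault uuid together, no intermediate state can violate the constraint.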

KV refresh is immediate (refreshKvForWorkloadKeys fires after every settings mutation). The next request through the worker reads the new value without waiting for the 5-minute periodic sync.

Verification surfaces

M11 is observable in three places:

  • Response headers on the failover-served request: x-tessera-failover: applied, x-tessera-failover-from: openai, x-tessera-failover-trigger: upstream_5xx, x-tessera-failover-to: openrouter:openai/gpt-5.
  • /portal/audit chip strip. Failover rows carry an m11 chip in their canonical mechanics_stack alongside any other applied mechanics. A per-row caption surfaces the trigger class and from-provider so the failover event is visible without click-through.
  • audit_events table. Every JIT key decrypt writes an m11_failover_key_decrypted row with the request id + vault uuid for compliance traceability. The append-only invariant on audit_events covers M11 the same way it covers every other state-change event.
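
For the first surface, the four response headers can be sketched as a small builder. The header names and the openrouter:<model> value format come from the example above. The function name failoverHeaders and its parameters are hypothetical.

```typescript
type FailoverTrigger = "upstream_5xx" | "connection_error" | "timeout";

// Hypothetical builder for the headers attached to a failover-served response.
function failoverHeaders(
  fromProvider: string,
  trigger: FailoverTrigger,
  orModel: string,
): Record<string, string> {
  return {
    "x-tessera-failover": "applied",
    "x-tessera-failover-from": fromProvider,
    "x-tessera-failover-trigger": trigger,
    "x-tessera-failover-to": `openrouter:${orModel}`,
  };
}
```

A client can detect a failover-served response by checking x-tessera-failover for the value applied.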

Composition with other mechanics

M11 sits outside the M3 / M7 / M8 content-mutation mutex. It composes freely with M1 auto-route, M2 exact cache, M5 semantic cache, M6 prompt cache, M9 output-length predictor, and M10 batch arbitrage. Practically: when the primary fails after auto-routing to a cheaper model (M1 fired gpt-5 → gpt-5-mini), the worker fails over to openai/gpt-5-mini on OpenRouter — the M1 swap is preserved, not undone.

Pricing follows the actual upstream: the OR pricing-catalog row for openai/gpt-5-mini backs actual_cost_usd (with OR's surcharge baked in); the primary's catalog row backs the snapshot we would have used. The performance-fee math runs against the customer's baseline anchor as always — failover does not unmoor the savings calculation.

What we promise

M11 is opt-in, default OFF. Quality lock 2026-05-20: OpenRouter routes through its own infrastructure, so a failover response is not bit-identical to a direct provider call. We refuse to flip the default to ON until 30+ days of telemetry validate per-pair quality preservation at ≥ 0.95 across the failover pairs we ship. The same per-stack auto-rollback that protects the cost mechanics will cover M11 once per-pair canary validation lands.

Every commitment on this page is enforceable in source. The worker version is pinned at the top of this document. The migration that added the columns is listed alongside it. The response headers are stable contracts — if we ever change their semantics, that change ships with a CHANGELOG entry and a deprecation window.

Architecture write-up: Cross-provider failover at the edge — what we built (and what we did not). Parent index: How it works.