Mechanic · M8 · Content-mutating · Priority 3

Structured output

OpenAI · Anthropic · Gemini · v0.1

Workloads that ask the LLM for JSON output often emit JSON wrapped in prose ("Here is the JSON: { ... }Hope this helps!"), burning output tokens on apologies and natural-language framing the application then strips and discards. M8 injects the provider-native structured-output constraint — response_format on OpenAI-shape, prefill prefix on Anthropic-shape, responseSchema in generationConfig on Gemini — so the model emits only the schema-conformant payload. Typical output-token cut on JSON workloads: 30–70%.

M8 is content-mutating under the composition cap. Priority M7 > M3 > M8: if M7 context-prune or M3 compress already fired on this request, M8 short-circuits. Single-pick budget — one mutation per request, never compounded.

Schema-mode vs auto-mode

When workloads.expected_schema is set (a JSON-Schema document validated against the lightweight 2020-12 sanity check at write time), M8 injects strict-mode constraints:

  • OpenAI: response_format: { type: 'json_schema', json_schema: { name, schema, strict: true } }. OpenAI rejects model outputs that violate the schema with a 422; Tessera surfaces a Tier 1 anomaly when this fires.
  • Anthropic: prefill injection. We append an assistant message that starts with { so the model continues from the open brace. When the workload has a tool-use schema definition, the prefill is combined with it.
  • Gemini: generationConfig.responseMimeType: 'application/json' + responseSchema (Gemini accepts a subset of JSON-Schema; the applier translates).
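As a rough illustration of the strict-mode shapes listed above, the sketch below builds the OpenAI and Gemini request fields from a registered schema. The helper names (withOpenAIStrict, withGeminiStrict), the body types and the schema name workload_output are assumptions made for this example, not the actual applier code. The Gemini schema translation step is left out, so the schema is assumed to already sit inside the subset Gemini accepts.

```typescript
// Hypothetical types and helpers for illustration; not the real appliers.
type JsonSchema = Record<string, unknown>;

interface OpenAIBody {
  model: string;
  messages: unknown[];
  response_format?: unknown;
}

interface GeminiBody {
  contents: unknown[];
  generationConfig?: Record<string, unknown>;
}

// Strict mode on an OpenAI-shape request: json_schema with strict: true.
function withOpenAIStrict(body: OpenAIBody, name: string, schema: JsonSchema): OpenAIBody {
  return {
    ...body,
    response_format: { type: "json_schema", json_schema: { name, schema, strict: true } },
  };
}

// Strict mode on Gemini: JSON MIME type plus responseSchema.
// Assumption: no translation is needed for this schema.
function withGeminiStrict(body: GeminiBody, schema: JsonSchema): GeminiBody {
  return {
    ...body,
    generationConfig: {
      ...(body.generationConfig ?? {}),
      responseMimeType: "application/json",
      responseSchema: schema,
    },
  };
}

const exampleSchema: JsonSchema = {
  type: "object",
  properties: { sentiment: { type: "string" } },
  required: ["sentiment"],
  additionalProperties: false,
};

const openaiOut = withOpenAIStrict(
  { model: "example-model", messages: [] },
  "workload_output",
  exampleSchema,
);
const geminiOut = withGeminiStrict(
  { contents: [], generationConfig: { temperature: 0 } },
  exampleSchema,
);
```

Both helpers return a new body rather than mutating the input, which keeps the original request available if the mutation has to be skipped or audited.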

When no schema is registered (workloads.expected_schema IS NULL but the toggle is ON), M8 runs in auto-mode:

  • OpenAI: response_format: { type: 'json_object' } — model emits JSON shape but no schema constraint.
  • Anthropic: prefill-only (no schema).
  • Gemini: responseMimeType: 'application/json' alone.
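The Anthropic prefill used in both modes can be sketched roughly as follows. The names withPrefill and parsePrefilled are hypothetical, and re-attaching the brace before parsing is an assumption about how a consumer would handle the response: with prefill, the completion continues after the prefilled text, so the open brace is not part of the returned text.

```typescript
// Hypothetical sketch of prefill injection on an Anthropic-shape request.
interface AnthropicMessage {
  role: "user" | "assistant";
  content: string;
}

interface AnthropicBody {
  model: string;
  max_tokens: number;
  messages: AnthropicMessage[];
}

const PREFILL = "{";

// Append an assistant turn containing only the open brace so the model
// continues the JSON object instead of opening with prose.
function withPrefill(body: AnthropicBody): AnthropicBody {
  return {
    ...body,
    messages: [...body.messages, { role: "assistant", content: PREFILL }],
  };
}

// The completion picks up after the prefill, so the brace is re-attached
// before the text is parsed as JSON (assumed consumer-side handling).
function parsePrefilled(completionText: string): unknown {
  return JSON.parse(PREFILL + completionText);
}

const prefilledRequest = withPrefill({
  model: "example-model",
  max_tokens: 256,
  messages: [{ role: "user", content: "Classify the sentiment of: 'great service'" }],
});

// Example completion text as it might come back after the prefill.
const parsedCompletion = parsePrefilled('"sentiment": "positive"}');
```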

The caller-already-set bypass

If the caller already set response_format (OpenAI) or an equivalent provider field, M8 does not overwrite. Sponsors may have intentional non-JSON modes for some endpoints; we never override an explicit caller choice. The applier checks the field presence and exits without mutation when set.
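A minimal sketch of the presence check, under stated assumptions: the function name callerAlreadySetFormat is hypothetical, and which fields count as "equivalent" on Gemini and Anthropic is a guess for illustration (the text above only names response_format on OpenAI).

```typescript
type ProviderShape = "openai" | "anthropic" | "gemini";

// Returns true when the caller has already chosen an output format,
// in which case the request should pass through without mutation.
function callerAlreadySetFormat(provider: ProviderShape, body: Record<string, any>): boolean {
  switch (provider) {
    case "openai":
      return body.response_format !== undefined;
    case "gemini": {
      // Assumption: either field on generationConfig counts as an explicit choice.
      const cfg = body.generationConfig ?? {};
      return cfg.responseMimeType !== undefined || cfg.responseSchema !== undefined;
    }
    case "anthropic": {
      // Assumption: a trailing assistant message means the caller already prefilled.
      const msgs: Array<{ role: string }> = body.messages ?? [];
      return msgs.length > 0 && msgs[msgs.length - 1].role === "assistant";
    }
  }
}

// An explicit non-JSON choice is still an explicit choice and is left alone.
const skipBecauseCallerChose = callerAlreadySetFormat("openai", {
  response_format: { type: "text" },
});
```

The check looks only at whether a field is present, not at its value, which matches the rule that an explicit caller choice is never overridden.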

Schema editor (sponsor self-serve)

The /portal/settings schema editor accepts a JSON document, validates against the lightweight 2020-12 sanity check (validateSchemaDocument), and writes to workloads.expected_schema. Empty {} is rejected to disambiguate "clear" (NULL via the form's dedicated clear button) from "save nothing" (the editor body left blank).

Server-side validation prevents the sponsor from bricking every JSON-mode call by pasting a malformed schema. KV refresh fires after every write so the next request sees the new schema without waiting for the 5-minute periodic sync.
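To make the idea of a lightweight sanity check concrete, here is a sketch of what such a validator might look like. It is not the actual validateSchemaDocument: the function name sanityCheckSchemaDocument and every rule except the empty-{} rejection are assumptions chosen for illustration.

```typescript
type CheckResult =
  | { ok: true; schema: Record<string, unknown> }
  | { ok: false; error: string };

const JSON_SCHEMA_TYPES = ["object", "array", "string", "number", "integer", "boolean", "null"];

function isPlainObject(v: unknown): v is Record<string, unknown> {
  return typeof v === "object" && v !== null && !Array.isArray(v);
}

// Hypothetical stand-in for a lightweight schema sanity check.
function sanityCheckSchemaDocument(raw: string): CheckResult {
  let doc: unknown;
  try {
    doc = JSON.parse(raw);
  } catch {
    return { ok: false, error: "not valid JSON" };
  }
  if (!isPlainObject(doc)) {
    return { ok: false, error: "schema must be a JSON object" };
  }
  // From the text above: an empty {} is rejected so it cannot be mistaken for "clear".
  if (Object.keys(doc).length === 0) {
    return { ok: false, error: "empty schema is not allowed" };
  }
  // Assumed rule: "type", if present, must use known JSON-Schema type names.
  const t = doc.type;
  const types: unknown[] = Array.isArray(t) ? t : t === undefined ? [] : [t];
  if (!types.every((x) => typeof x === "string" && JSON_SCHEMA_TYPES.indexOf(x) !== -1)) {
    return { ok: false, error: "unknown value in type" };
  }
  // Assumed rule: "properties", if present, must be an object.
  if (doc.properties !== undefined && !isPlainObject(doc.properties)) {
    return { ok: false, error: "properties must be an object" };
  }
  // Assumed rule: "required", if present, must be an array of strings.
  const req = doc.required;
  if (req !== undefined && !(Array.isArray(req) && req.every((r) => typeof r === "string"))) {
    return { ok: false, error: "required must be an array of strings" };
  }
  return { ok: true, schema: doc };
}

const accepted = sanityCheckSchemaDocument(
  '{"type":"object","properties":{"id":{"type":"string"}},"required":["id"]}',
);
const rejectedEmpty = sanityCheckSchemaDocument("{}");
```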

Composition cap priority

M8 is third-priority among content mutators:

  • M7 context-prune wins. If M7 trimmed messages on this request, contentMutationApplied = true and M8 skips.
  • M3 compress wins. If M3 ran on this request, M8 skips.
  • M1 auto-route is also disabled when M8 is eligible to fire (stacking a cost optimisation on top of a content mutation multiplies risk).

The cap is enforced inline in the worker, not via a separate policy engine. Unit tests in compose-budget.test.ts lock the priority order.
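The single-pick rule can be sketched as a loop over mutators in priority order that stops at the first one that actually changes the request. The types and the runWithCompositionCap name are hypothetical; only the contentMutationApplied flag and the M7 > M3 > M8 order come from the text above.

```typescript
interface MutationContext {
  contentMutationApplied: boolean;
  appliedBy: string | null;
}

interface ContentMutator {
  id: string;
  // Returns true if the mutator changed the request.
  tryApply: () => boolean;
}

// Runs mutators in the order given and stops after the first that fires,
// so at most one content mutation is applied per request.
function runWithCompositionCap(mutators: ContentMutator[]): MutationContext {
  const ctx: MutationContext = { contentMutationApplied: false, appliedBy: null };
  for (const m of mutators) {
    if (ctx.contentMutationApplied) break; // budget spent, later mutators skip
    if (m.tryApply()) {
      ctx.contentMutationApplied = true;
      ctx.appliedBy = m.id;
    }
  }
  return ctx;
}

// Example: M7 has nothing to prune, M3 compresses, so M8 never runs.
let m8WasCalled = false;
const capResult = runWithCompositionCap([
  { id: "M7", tryApply: () => false },
  { id: "M3", tryApply: () => true },
  { id: "M8", tryApply: () => { m8WasCalled = true; return true; } },
]);
```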

What we don’t do (yet)

  • No streaming structured-output. v0.1 injects on non-streaming JSON requests only. Streaming JSON-mode requests pass through (the provider handles them natively if the SDK sent the field; M8 doesn't inject).
  • No nested schema synthesis. The schema editor accepts whatever the sponsor pastes. We do not derive a schema from sample outputs; that's a separate workflow we haven't scoped.
  • No on-the-fly schema relaxation. If the model 422s repeatedly against the strict schema, we surface the anomaly to the sponsor — we do not auto-relax to auto-mode. The sponsor decides whether to fix the schema or drop strict.
  • No XML / YAML / other structured modes. v0.1 is JSON-only. Other structured outputs are queued behind customer-traffic signal.

Verification surfaces

  • Response header x-tessera-structured-output: json_schema (strict-mode applied) or json_object (auto-mode).
  • /portal/audit chip strip — m8 chip (warning-amber colour for mutating mechanic). structured_output_applied = true on the row.
  • Anomaly row m8_strict_schema_rejection when the provider 422s against a strict schema — surfaces the non-conformant fields on /portal/anomalies.
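A client can check the response header from the first item above with something like the sketch below. The function name, the HeaderReader interface and the not_applied fallback are assumptions; in particular, the text does not say what the header looks like when M8 did not fire, so an absent or unrecognised value is treated as "not applied" here.

```typescript
type StructuredOutputMode = "json_schema" | "json_object" | "not_applied";

// Minimal reader interface so the sketch works with fetch's Headers or a stub.
interface HeaderReader {
  get(name: string): string | null;
}

function readStructuredOutputMode(headers: HeaderReader): StructuredOutputMode {
  const value = headers.get("x-tessera-structured-output");
  if (value === "json_schema" || value === "json_object") {
    return value;
  }
  // Assumption: a missing or unexpected value means M8 did not apply.
  return "not_applied";
}

// Stub standing in for response.headers from a fetch call.
const stubHeaders: HeaderReader = {
  get: (name) => (name === "x-tessera-structured-output" ? "json_schema" : null),
};
const observedMode = readStructuredOutputMode(stubHeaders);
```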

What we promise

M8 is opt-in per workload, schema-validated server-side before persistence, never overrides an explicit caller response_format, is composition-cap-mutex with M3 and M7, and is surfaced on every applied row. Provider-specific shape translation lives in dedicated appliers (applyOpenAIStructuredOutput, applyAnthropicStructuredOutput, applyGoogleStructuredOutput), giving one source of truth per provider.

Parent index: How it works. Mutex-mate deep-dives: M3 compress · M7 context prune.