Structured output
OpenAI · Anthropic · Gemini · v0.1
Workloads that ask the LLM for JSON output often emit JSON wrapped in prose ("Here is the JSON: { ... }Hope this helps!"), burning output tokens on apologies and natural-language framing the application then strips and discards. M8 injects the provider-native structured-output constraint — response_format on OpenAI-shape, prefill prefix on Anthropic-shape, responseSchema in generationConfig on Gemini — so the model emits only the schema-conformant payload. Typical output-token cut on JSON workloads: 30–70%.
M8 is content-mutating under the composition cap. Priority M7 > M3 > M8: if M7 context-prune or M3 compress already fired on this request, M8 short-circuits. Single-pick budget — one mutation per request, never compounded.
Schema-mode vs auto-mode
When workloads.expected_schema is set (a JSON-Schema document validated against the lightweight 2020-12 sanity check at write time), M8 injects strict-mode constraints:
- OpenAI: response_format: { type: 'json_schema', json_schema: { name, schema, strict: true } }. OpenAI rejects model outputs that violate the schema with a 422; Tessera surfaces a Tier 1 anomaly when this fires.
- Anthropic: prefill prefix injection. We add a pre-completion assistant message starting with { so the model continues from the open brace. Combined with the tool-use schema definition when the workload has one.
- Gemini: generationConfig.responseMimeType: 'application/json' + responseSchema (Gemini accepts a subset of JSON-Schema; the applier translates).
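The three strict-mode injections can be sketched roughly as follows. This is an illustrative sketch, not the actual appliers: the function names, the Body type, and the parsePrefilled helper are hypothetical, and the Anthropic variant shows only the prefill (the tool-use schema path is omitted). The field shapes follow each provider's public request format.

```typescript
// Hypothetical types for illustration; the real applier signatures are not shown here.
type Body = Record<string, unknown>;
type JsonSchema = Record<string, unknown>;
type Message = { role: string; content: string };

// OpenAI-shape: attach a strict json_schema response_format.
function openAIStrict(body: Body, name: string, schema: JsonSchema): Body {
  return {
    ...body,
    response_format: { type: "json_schema", json_schema: { name, schema, strict: true } },
  };
}

// Anthropic-shape: append an assistant turn holding only the open brace,
// so the model continues the JSON object instead of starting with prose.
function anthropicPrefill(body: Body): Body {
  const raw = body["messages"];
  const messages = Array.isArray(raw) ? (raw as Message[]) : [];
  return { ...body, messages: messages.concat([{ role: "assistant", content: "{" }]) };
}

// Gemini-shape: set the MIME type and schema inside generationConfig,
// preserving any generation settings the caller already supplied.
function geminiStrict(body: Body, schema: JsonSchema): Body {
  const existing = (body["generationConfig"] ?? {}) as Body;
  return {
    ...body,
    generationConfig: { ...existing, responseMimeType: "application/json", responseSchema: schema },
  };
}

// With a prefill, the completion text starts after the "{", so whoever parses
// the response has to re-attach the brace first.
function parsePrefilled(completion: string): unknown {
  return JSON.parse("{" + completion);
}
```

Each function returns a new body rather than mutating the input, which keeps the original request available if the mutation has to be skipped or audited.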
When no schema is registered (workloads.expected_schema IS NULL but the toggle is ON), M8 runs in auto-mode:
- OpenAI: response_format: { type: 'json_object' }. The model emits JSON, but with no schema constraint.
- Anthropic: prefill-only (no schema).
- Gemini: responseMimeType: 'application/json' alone.
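The choice between schema-mode and auto-mode reduces to two inputs: the per-workload toggle and whether expected_schema is NULL. A minimal sketch, assuming a hypothetical Workload shape (the structuredOutputEnabled field name is an assumption; only expected_schema is named in this document):

```typescript
// Mode names mirror the provider values used in this section.
type Mode = "json_schema" | "json_object" | null;

// Hypothetical workload row; only the two inputs that drive mode selection.
interface Workload {
  structuredOutputEnabled: boolean; // the M8 toggle (field name assumed)
  expected_schema: Record<string, unknown> | null;
}

function selectMode(w: Workload): Mode {
  if (!w.structuredOutputEnabled) return null; // toggle OFF: M8 does nothing
  // Toggle ON: a registered schema means strict schema-mode, NULL means auto-mode.
  return w.expected_schema !== null ? "json_schema" : "json_object";
}
```

Note that a registered schema has no effect while the toggle is OFF in this sketch; the schema only selects which mode runs once M8 is enabled.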
The caller-already-set bypass
If the caller already set response_format (OpenAI) or an equivalent provider field, M8 does not overwrite. Sponsors may have intentional non-JSON modes for some endpoints; we never override an explicit caller choice. The applier checks the field presence and exits without mutation when set.
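The bypass is a presence check followed by an early exit. The sketch below is an assumption-heavy illustration: the OpenAI check follows the text directly, while treating Gemini's responseMimeType / responseSchema and a trailing assistant message on Anthropic as the "equivalent provider field" is a guess at what equivalence means, not a documented rule. All names are hypothetical.

```typescript
type Body = Record<string, unknown>;
type Provider = "openai" | "anthropic" | "gemini";

// True when the caller has already expressed an output-format choice.
function callerAlreadySet(provider: Provider, body: Body): boolean {
  switch (provider) {
    case "openai":
      // Presence check only: any explicit response_format counts, even a non-JSON one.
      return "response_format" in body;
    case "gemini": {
      const cfg = (body["generationConfig"] ?? {}) as Body;
      return "responseMimeType" in cfg || "responseSchema" in cfg;
    }
    case "anthropic": {
      // Assumption: a trailing assistant turn means the caller supplied its own prefill.
      const raw = body["messages"];
      const msgs = Array.isArray(raw) ? (raw as Array<{ role?: string }>) : [];
      return msgs.length > 0 && msgs[msgs.length - 1]?.role === "assistant";
    }
  }
}

// Wraps any applier so that it exits without mutation when the caller chose first.
function applyUnlessSet(provider: Provider, body: Body, apply: (b: Body) => Body): Body {
  return callerAlreadySet(provider, body) ? body : apply(body);
}
```

Returning the same object reference on bypass makes "no mutation happened" cheap to assert in tests.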
Schema editor (sponsor self-serve)
The /portal/settings schema editor accepts a JSON document, validates against the lightweight 2020-12 sanity check (validateSchemaDocument), and writes to workloads.expected_schema. Empty {} is rejected to disambiguate "clear" (NULL via the form's dedicated clear button) from "save nothing" (the editor body left blank).
Server-side validation prevents the sponsor from bricking every JSON-mode call by pasting a malformed schema. KV refresh fires after every write so the next request sees the new schema without waiting for the 5-minute periodic sync.
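To make the write-time check concrete, here is a sketch of what a lightweight sanity check could look like. It is not validateSchemaDocument itself, whose rules are not listed here; the function name sanityCheckSchema and the specific checks (object-only, non-empty, known "type" values, well-formed "properties" and "required") are assumptions chosen to match the behaviour described above.

```typescript
type CheckResult = { ok: true; schema: Record<string, unknown> } | { ok: false; reason: string };

// The seven primitive type names defined by JSON Schema.
const JSON_TYPES = ["string", "number", "integer", "boolean", "object", "array", "null"];

// Hypothetical stand-in for the server-side check; rejects input that is clearly unusable.
function sanityCheckSchema(text: string): CheckResult {
  let doc: unknown;
  try {
    doc = JSON.parse(text); // a blank editor body fails here
  } catch {
    return { ok: false, reason: "not valid JSON" };
  }
  if (typeof doc !== "object" || doc === null || Array.isArray(doc)) {
    return { ok: false, reason: "schema must be a JSON object" };
  }
  const schema = doc as Record<string, unknown>;
  if (Object.keys(schema).length === 0) {
    return { ok: false, reason: "empty {} is rejected; use the clear button to unset" };
  }
  const t = schema["type"];
  if (t !== undefined) {
    const types: unknown[] = Array.isArray(t) ? t : [t];
    if (!types.every((x) => typeof x === "string" && JSON_TYPES.indexOf(x) !== -1)) {
      return { ok: false, reason: "unknown value in type" };
    }
  }
  const props = schema["properties"];
  if (props !== undefined && (typeof props !== "object" || props === null || Array.isArray(props))) {
    return { ok: false, reason: "properties must be an object" };
  }
  const req = schema["required"];
  if (req !== undefined && !(Array.isArray(req) && req.every((x) => typeof x === "string"))) {
    return { ok: false, reason: "required must be an array of strings" };
  }
  return { ok: true, schema };
}
```

A check of this size catches pasted fragments and typos without pulling in a full JSON-Schema validator, which is the trade-off "lightweight" implies.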
Composition cap priority
M8 is third-priority among content mutators:
- M7 context-prune wins. If M7 trimmed messages on this request, contentMutationApplied = true and M8 skips.
- M3 compress wins. If M3 ran on this request, M8 skips.
- M1 auto-route also disables when M8 is eligible to fire (cost stacked on mutation multiplies risk).
The cap is enforced inline in the worker, not via a separate policy engine. Unit tests in compose-budget.test.ts lock the priority order.
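The single-pick rule can be expressed as one ordered pass over the mutators that stops at the first one that fires. The sketch below is illustrative only: the function names and the wants input are hypothetical, and it models "eligible" as a plain boolean per mechanic. Only the priority order and the contentMutationApplied flag come from this page.

```typescript
type Mutator = "M7" | "M3" | "M8";

// Priority order from this page: M7 > M3 > M8.
const PRIORITY: Mutator[] = ["M7", "M3", "M8"];

interface RequestState {
  contentMutationApplied: boolean;
  applied: Mutator[];
}

// Runs the content mutators in priority order; at most one ever applies.
function runContentMutators(wants: Record<Mutator, boolean>): RequestState {
  const state: RequestState = { contentMutationApplied: false, applied: [] };
  for (const m of PRIORITY) {
    if (state.contentMutationApplied) break; // the cap: never compound mutations
    if (wants[m]) {
      state.applied.push(m);
      state.contentMutationApplied = true;
    }
  }
  return state;
}

// M1 auto-route is not a content mutator, but it stands down when M8 is eligible.
function autoRouteAllowed(m8Eligible: boolean): boolean {
  return !m8Eligible;
}
```

For example, a request where all three mutators are eligible ends with only M7 applied, and M8 is the one applied only when neither M7 nor M3 fired.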
What we don’t do (yet)
- No streaming structured-output. v0.1 injects on non-streaming JSON requests only. Streaming JSON-mode requests pass through (provider handles natively if the SDK sent the field; M8 doesn't inject).
- No nested schema synthesis. The schema editor accepts whatever the sponsor pastes. We do not derive a schema from sample outputs; that's a separate workflow we haven't scoped.
- No on-the-fly schema relaxation. If the model 422s repeatedly against the strict schema, we surface the anomaly to the sponsor — we do not auto-relax to auto-mode. The sponsor decides whether to fix the schema or drop strict.
- No XML / YAML / other structured modes. v0.1 is JSON-only. Other structured outputs are queued behind customer-traffic signal.
Verification surfaces
- Response header x-tessera-structured-output: json_schema (strict-mode applied) or json_object (auto-mode).
- /portal/audit chip strip: m8 chip (warning-amber colour for mutating mechanic). structured_output_applied = true on the row.
- Anomaly row m8_strict_schema_rejection when the provider 422s against a strict schema; surfaces the non-conformant fields on /portal/anomalies.
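On the client side, the response header is the quickest of these surfaces to check. A small sketch, assuming headers arrive as a plain string map and assuming that a missing header means M8 did not apply (this page only specifies the two values present when it does); the function name is hypothetical.

```typescript
type AppliedMode = "json_schema" | "json_object" | "none";

// Reads x-tessera-structured-output from a plain header map.
// Header names are case-insensitive, so keys are compared in lower case.
function structuredOutputMode(headers: Record<string, string>): AppliedMode {
  for (const key of Object.keys(headers)) {
    if (key.toLowerCase() === "x-tessera-structured-output") {
      const value = (headers[key] ?? "").trim();
      if (value === "json_schema" || value === "json_object") return value;
    }
  }
  // Assumption: no header (or an unrecognised value) is treated as "not applied".
  return "none";
}
```

A caller could log this value next to its own parse success rate to confirm that strict-mode or auto-mode is actually in effect for a workload.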
What we promise
M8 is opt-in per workload, schema-validated server-side before persistence, never overrides an explicit caller response_format, is composition-cap-mutex with M3 and M7, and is surfaced on every applied row. Provider-specific shape translation lives in dedicated appliers (applyOpenAIStructuredOutput, applyAnthropicStructuredOutput, applyGoogleStructuredOutput), one source of truth per provider.
Parent index: How it works. Mutex-mate deep-dives: M3 compress · M7 context prune.