Getting a language model to return valid JSON used to be an exercise in prompt engineering and prayer. Modern model APIs offer first-class support: structured output, function calling, and JSON mode. Here's how each works, when to use which, and the failure modes to plan for.

The old way: prompt and parse

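Before structured output existed, you'd ask the model to "Return JSON with fields name, age, and email" and hope. The model would mostly comply, but occasionally it would do one of these things: wrap the JSON in markdown code fences, add a sentence of explanation before or after it, emit malformed JSON (a trailing comma, an unescaped quote), or return the wrong fields or types.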

Production code did regex extraction, repaired malformed JSON, and retried on failures. It worked but was brittle. The error rate on free-form JSON generation was 1-5% depending on the model, prompt, and complexity.
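For contrast with what follows, here is a minimal sketch of that old extract-repair-retry pipeline. It is illustrative rather than any particular library's approach: `generate` is a hypothetical async function that returns raw model text, and the "repair" step only handles trailing commas.

```javascript
// Old-style prompt-and-parse: strip fences, cut out the outermost {...},
// patch trivial syntax errors, parse, and retry when all of that fails.

function extractJson(text) {
  // Remove Markdown code fences (three backticks, optionally tagged "json").
  const unfenced = text.replace(/`{3}(?:json)?/g, "");
  // Keep only the span from the first "{" to the last "}" to drop chatty prose.
  const start = unfenced.indexOf("{");
  const end = unfenced.lastIndexOf("}");
  if (start === -1 || end < start) {
    throw new Error("no JSON object found");
  }
  const candidate = unfenced.slice(start, end + 1);
  // Crude repair: drop trailing commas before a closing brace or bracket.
  // (This can also touch string contents; it is a heuristic, not a parser.)
  const repaired = candidate.replace(/,\s*([}\]])/g, "$1");
  return JSON.parse(repaired);
}

// `generate` is hypothetical: (prompt) => Promise<string> of raw model output.
async function generateJsonWithRetries(generate, prompt, maxAttempts = 3) {
  let lastError;
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const raw = await generate(prompt);
    try {
      return extractJson(raw);
    } catch (err) {
      lastError = err; // malformed output: ask the model again
    }
  }
  throw new Error(`gave up after ${maxAttempts} attempts: ${lastError.message}`);
}
```

Every branch in this code exists to absorb a specific failure mode, which is why the approach was brittle: any new kind of malformed output needs a new heuristic.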

Structured output (the modern approach)

Most major model APIs now support a structured-output mode where you provide a JSON Schema and the model is constrained to produce output that conforms. The mechanism varies but the effect is the same: the model literally cannot output tokens that would violate the schema.

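// OpenAI structured output (official Node SDK; the client reads OPENAI_API_KEY from the environment)
import OpenAI from "openai";

const openai = new OpenAI();

const response = await openai.chat.completions.create({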
  model: "gpt-4o",
  messages: [...],
  response_format: {
    type: "json_schema",
    json_schema: {
      name: "user_info",
      schema: {
        type: "object",
        properties: {
          name: { type: "string" },
          age: { type: "integer" },
          email: { type: "string", format: "email" }
        },
        required: ["name", "age", "email"],
        additionalProperties: false
      },
      strict: true
    }
  }
});

Unless the model refuses or runs into the output-token limit, the output is guaranteed to be a valid JSON object matching the schema. No regex repair, no fence-stripping, no retry-on-malformed.
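You still have to read the result out of the response. A minimal sketch, assuming the OpenAI Chat Completions response shape: the JSON arrives as a string in `choices[0].message.content`, and a refusal arrives in `message.refusal` instead of a schema-conforming payload. Other providers structure this differently.

```javascript
// Pull the structured payload out of a Chat Completions response.
function readStructuredOutput(response) {
  const message = response.choices[0].message;
  if (message.refusal) {
    // The model declined: there is no schema-conforming JSON to parse.
    throw new Error(`model refused: ${message.refusal}`);
  }
  // content is a JSON string that conforms to the schema sent with the request.
  return JSON.parse(message.content);
}
```

With the request above, `readStructuredOutput(response)` would return an object with `name`, `age`, and `email`. Note that `JSON.parse` is still needed: the guarantee is about the text's shape, not about the SDK handing you an object.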

Under the hood, this works by constrained decoding: at each token position, the model's logits are masked to only allow tokens that could continue toward a valid completion. If the schema says the next field is a number, the model can't emit a letter. The implementation is non-trivial — schemas with oneOf or recursive references are tricky — but the result is reliable.
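The masking idea fits in a few lines if the "grammar" is simply a finite set of allowed outputs, such as the JSON strings of an enum field. This is a toy illustration under that simplifying assumption, not how any provider implements it: real systems compile the full schema into a grammar over the model's actual tokenizer. Here `scoreTokens` is a hypothetical stand-in for the model, returning a logit per token.

```javascript
// Tokens that can extend `prefix` toward at least one allowed output.
function allowedNext(prefix, vocab, allowedOutputs) {
  return vocab.filter((tok) =>
    allowedOutputs.some((out) => out.startsWith(prefix + tok))
  );
}

// Greedy decoding with a mask: illegal tokens get a logit of -Infinity.
function constrainedGreedyDecode(scoreTokens, vocab, allowedOutputs) {
  let output = "";
  while (!allowedOutputs.includes(output)) {
    const legal = new Set(allowedNext(output, vocab, allowedOutputs));
    if (legal.size === 0) {
      throw new Error(`no legal token after ${JSON.stringify(output)}`);
    }
    const logits = scoreTokens(output); // hypothetical model: token -> logit
    // Mask step: only tokens that keep the output on a valid path survive.
    const masked = vocab.map((tok) => (legal.has(tok) ? (logits[tok] ?? 0) : -Infinity));
    // Greedy pick over the masked logits.
    let best = 0;
    for (let i = 1; i < masked.length; i++) {
      if (masked[i] > masked[best]) best = i;
    }
    output += vocab[best];
  }
  return output;
}

const vocab = ['"', "act", "ive", "in", "pend", "ing", "x", "yes"];
const statusValues = ['"active"', '"inactive"', '"pending"'];
// A model that would rather say "yes" than any valid enum value.
const stubbornModel = () => ({ yes: 10, x: 9, act: 2, in: 1, pend: 0.5, ive: 0.3, ing: 0.2, '"': 0.1 });
```

Decoding with `constrainedGreedyDecode(stubbornModel, vocab, statusValues)` produces `"active"` even though the model's favorite token is `yes`: the mask never lets an illegal token win, which is the sense in which the model "cannot" violate the schema.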

Function calling vs. structured output

Function calling (or "tool use") is conceptually similar: you describe a function signature with parameters, and the model decides whether to call it and with what arguments. The arguments come back as JSON.

// Tool definition
{
  "name": "get_weather",
  "parameters": {
    "type": "object",
    "properties": {
      "location": { "type": "string" },
      "unit": { "type": "string", "enum": ["celsius", "fahrenheit"] }
    },
    "required": ["location"]
  }
}

The two features overlap. Use function calling when the model might also reasonably respond without calling the function (a chat where some user messages need tool use and others don't). Use structured output when you always want JSON conforming to a schema (extraction tasks, classification, structured generation).
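On the application side, function calling means checking whether the model asked for a tool and, if so, running it with the parsed arguments. A minimal sketch, assuming the OpenAI Chat Completions shape, where each entry in `message.tool_calls` has an `id` and a `function` with `name` and `arguments` (a JSON-encoded string). `getWeather` is a hypothetical stub for the `get_weather` tool defined above.

```javascript
// Hypothetical local implementation of get_weather; a real one would call
// a weather service. "unit" is optional in the tool schema, so default it.
function getWeather({ location, unit = "celsius" }) {
  return { location, unit, forecast: "stub" };
}

// Tool names as declared to the model, mapped to local functions.
const toolImplementations = new Map([["get_weather", getWeather]]);

function handleAssistantMessage(message) {
  const calls = message.tool_calls ?? [];
  // No tool calls: the model chose to answer directly.
  if (calls.length === 0) {
    return { kind: "text", text: message.content };
  }
  const results = calls.map((call) => {
    const impl = toolImplementations.get(call.function.name);
    if (!impl) throw new Error(`unknown tool: ${call.function.name}`);
    // Arguments arrive as a JSON string, not as an object.
    const args = JSON.parse(call.function.arguments);
    return { toolCallId: call.id, result: impl(args) };
  });
  return { kind: "tool_results", results };
}
```

The "text" branch is the reason to prefer function calling in a chat: the same handler copes with turns where the model decides no tool is needed. In OpenAI's API, each result is then sent back as a `tool` role message carrying the matching `tool_call_id`.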

JSON mode (the lighter alternative)

Some APIs offer a "JSON mode" flag that constrains the output to syntactically valid JSON without enforcing a specific schema. It eliminates malformed-JSON failures but does nothing about wrong fields or wrong types.

JSON mode is useful as a safety net when you can't or don't want to write a full schema. For production extraction tasks, the strict schema-based approach is more robust.
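Because JSON mode only guarantees that the output parses, you have to check its shape yourself. A minimal sketch, assuming flat objects with primitive types and a schema written like the `user_info` example earlier. It checks `required`, primitive `type`, and `additionalProperties: false` only, and is no substitute for a real JSON Schema validator.

```javascript
// Minimal shape check for JSON-mode output (flat objects, primitive types).
const hasOwn = (obj, key) => Object.prototype.hasOwnProperty.call(obj, key);

function checkFlatShape(value, schema) {
  if (typeof value !== "object" || value === null || Array.isArray(value)) {
    return ["expected a JSON object"];
  }
  const properties = schema.properties ?? {};
  const errors = [];
  // Every required key must be present.
  for (const key of schema.required ?? []) {
    if (!hasOwn(value, key)) errors.push(`missing required field "${key}"`);
  }
  // Present keys must have the declared primitive type.
  for (const [key, prop] of Object.entries(properties)) {
    if (!hasOwn(value, key)) continue;
    const v = value[key];
    const typeOk = {
      string: typeof v === "string",
      integer: Number.isInteger(v),
      number: typeof v === "number",
      boolean: typeof v === "boolean",
    }[prop.type];
    // Types this sketch doesn't know (undefined lookup) are skipped.
    if (typeOk === false) errors.push(`field "${key}" should be ${prop.type}`);
  }
  // Mirror additionalProperties: false.
  if (schema.additionalProperties === false) {
    for (const key of Object.keys(value)) {
      if (!hasOwn(properties, key)) errors.push(`unexpected field "${key}"`);
    }
  }
  return errors;
}
```

Output such as `{"name": "Ada", "age": "36", "nickname": "A"}` is perfectly valid JSON, so JSON mode accepts it, yet this check reports three problems: a missing `email`, a string `age`, and an unexpected `nickname`.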

Pitfalls in production

Even with structured output, some failure modes remain. Plan for them.

Schema-conformant nonsense. The model can return valid JSON that matches the schema but contains hallucinated values. A schema can't enforce that "city" is a real city. Validate downstream where it matters.
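Downstream validation here means checking values against something you trust, not against the schema. A minimal sketch: `KNOWN_CITIES` is a hypothetical stand-in for a real gazetteer or database lookup, and the normalization (trim plus lowercase) is an assumption about how your reference data is stored.

```javascript
// Hypothetical reference data; in practice a gazetteer or database lookup.
const KNOWN_CITIES = new Set(["paris", "lagos", "osaka", "lima"]);

// Schema-valid is not the same as true: check the value against trusted data.
function checkCity(extracted) {
  const city = extracted.city.trim().toLowerCase();
  return KNOWN_CITIES.has(city)
    ? { ok: true }
    : { ok: false, reason: `unrecognized city: ${extracted.city}` };
}
```

Both `{"city": "Paris"}` and `{"city": "Atlantis"}` satisfy a schema that says `city` is a string; only the lookup tells them apart.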

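Token budget limits. Every request has a maximum output length. With large schemas or long field values, generation can be cut off mid-object, and a truncated object is invalid JSON even in strict mode: the constraint controls which tokens may come next, but it can't make the model finish early. Depending on the API, this surfaces as an error or as an incomplete response with a length-limit stop reason (OpenAI's Chat Completions, for example, reports finish_reason "length"). Set a generous max_tokens for structured tasks and check for completion before parsing.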
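"Verify completion" can be as simple as checking the stop reason before parsing. A minimal sketch, assuming the OpenAI Chat Completions choice shape, where `finish_reason` is `"length"` when the output hit the token limit. Other providers name this signal differently.

```javascript
// Check for truncation before trusting a structured-output choice.
function parseIfComplete(choice) {
  if (choice.finish_reason === "length") {
    // Cut off at max_tokens: the JSON is almost certainly incomplete.
    // Retry with a larger budget or a smaller schema instead of repairing it.
    return { ok: false, reason: "truncated at max_tokens" };
  }
  try {
    return { ok: true, value: JSON.parse(choice.message.content) };
  } catch (err) {
    return { ok: false, reason: `unparseable output: ${err.message}` };
  }
}
```

Checking the stop reason first gives a clearer error than letting `JSON.parse` fail on a half-written object.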

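Optional fields and nulls. If a field is optional in your schema, the model decides whether to include it or omit it. If you need consistency, make every field required and represent missing values explicitly, with an empty string or null. A field declared as "type": "string" rejects null, so declare nullable fields as "type": ["string", "null"] if your schema dialect and provider support it. OpenAI's strict mode, for instance, requires every property to be listed in required and uses this null union to express optional values.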
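If you already have schemas with optional fields, you can convert them mechanically. A minimal sketch (the helper name is hypothetical): it handles only properties whose `type` is a string or an array of strings, and it doesn't recurse into nested objects.

```javascript
// Turn "optional" properties into "required but nullable" ones, so the model
// always emits every key and uses null for absent values.
function requireAllAllowNull(schema) {
  const required = new Set(schema.required ?? []);
  const properties = {};
  for (const [key, prop] of Object.entries(schema.properties ?? {})) {
    if (required.has(key)) {
      properties[key] = prop; // already required: leave untouched
    } else {
      // Widen the type to a union with "null", e.g. "string" -> ["string", "null"].
      const types = Array.isArray(prop.type) ? prop.type : [prop.type];
      properties[key] = {
        ...prop,
        type: types.includes("null") ? types : [...types, "null"],
      };
    }
  }
  // Every property is now required.
  return { ...schema, properties, required: Object.keys(properties) };
}
```

After conversion, downstream code can treat `null` as the single meaning of "no value" instead of having to decide whether a missing key meant "unknown" or "the model skipped it".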

Enum drift. If your enum has values like ["active", "inactive", "pending"], the model will always pick one, but not necessarily the right one. Test edge cases: for ambiguous inputs the model may pick "pending" where "active" was correct. JSON Schema's enum has no slot for per-value descriptions, so spell out what each value means in the field's description to disambiguate.
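One way to keep the enum values and their meanings from drifting apart is to generate both from a single map. A minimal sketch: `describedEnum` is a hypothetical helper, and the status meanings are made-up examples of the kind of disambiguation you might write.

```javascript
// Build a string enum whose description explains every value.
function describedEnum(meanings, intro = "") {
  const values = Object.keys(meanings);
  const lines = values.map((v) => `"${v}": ${meanings[v]}`);
  return {
    type: "string",
    enum: values,
    // Intro line (if any) followed by one line per value.
    description: [intro, ...lines].filter(Boolean).join("\n"),
  };
}

// Hypothetical meanings for the status field discussed above.
const statusField = describedEnum(
  {
    active: "the account is in use, even if the user has not logged in recently",
    inactive: "the account was explicitly closed or suspended",
    pending: "signup started but email verification is not complete",
  },
  "Account status. Choose the value that best matches the record."
);
```

The resulting `statusField` drops into a schema's `properties` like any other field, and adding a value to the map updates the enum and its explanation together.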

Nested object hallucination. Even with strict schemas, complex nested objects can be filled with plausible-but-wrong details. The cure is the same as for any LLM output: validate semantically, not just syntactically.
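Semantic validation often means checking that parts of a nested object agree with each other, which no schema can express. A minimal sketch on a hypothetical invoice shape (line items with integer cents, plus a total): the schema can require both, but only code can require that they add up.

```javascript
// Cross-field checks on a nested extraction result (hypothetical invoice shape).
function checkInvoice(invoice) {
  const errors = [];
  let sumCents = 0;
  for (const [i, item] of invoice.lineItems.entries()) {
    if (item.quantity <= 0) errors.push(`lineItems[${i}]: quantity must be positive`);
    sumCents += item.quantity * item.unitPriceCents;
  }
  // The stated total must match what the line items add up to.
  if (sumCents !== invoice.totalCents) {
    errors.push(`total ${invoice.totalCents} does not match line items (${sumCents})`);
  }
  return errors;
}
```

Working in integer cents keeps the comparison exact; summing floating-point prices would need a tolerance instead of `!==`.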

Cost and latency

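Structured output is generally not slower than free-form generation, because constrained decoding adds little per-token overhead. Some providers do preprocess each new schema, though: OpenAI, for example, documents extra latency on the first request that uses a given schema. The schema also counts as input tokens on every request, so a large schema can be expensive. For schemas above ~1000 tokens, consider whether the model needs the full schema or whether you can prune it to the fields relevant to each request.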
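Per-request pruning can be a small transformation on a top-level object schema. A minimal sketch with a hypothetical helper name: it keeps only the listed properties and drops them from `required` as well. Serialized length is used only as a rough proxy for input size; real token counts depend on the provider's tokenizer.

```javascript
// Keep only the properties a given request needs (top-level object schemas).
function pruneSchema(schema, keep) {
  const keepSet = new Set(keep);
  const properties = Object.fromEntries(
    Object.entries(schema.properties).filter(([key]) => keepSet.has(key))
  );
  return {
    ...schema, // preserves type, additionalProperties, etc.
    properties,
    required: (schema.required ?? []).filter((key) => keepSet.has(key)),
  };
}
```

Comparing `JSON.stringify(pruned).length` with the full schema's length gives a quick sense of the savings before you measure actual token usage.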

Validation downstream

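Even when structured output guarantees schema conformance, run the output through your own validator before acting on it. It's cheap defense in depth: it catches the cases where the guarantee doesn't hold, such as truncated responses or requests made in JSON mode, and it can enforce rules a schema can't express.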

For browser-side validation during prompt development, our JSON Schema validator lets you paste in the model's output and your schema and see what conforms.

What's next

The frontier is models that handle nested constrained decoding for more complex grammars — not just JSON, but DSLs, regex, and full programming languages. We're not quite there, but the trajectory is clear. For JSON specifically, structured output is now solved at the API level; the remaining work is on the schema-design and validation side.

For more context, see our guide to JSON Schema and Draft 7 vs 2020-12 for which schema dialect to target.