How to Generate a JSON Schema From Sample Data

Writing a JSON schema by hand for a large, nested object is tedious and error-prone. If you already have a representative sample of the data — a real API response, a fixture file, an example payload — you can generate a draft schema from it automatically and then refine what needs to change.

What a generated schema captures

A JSON schema generated from sample data describes the structure of that specific sample: the keys present, the types of their values, and the nesting depth. Given this payload:

{
  "id": 1234,
  "name": "Alice",
  "active": true,
  "tags": ["admin", "editor"],
  "metadata": {"created": "2026-01-15"}
}

A generator typically produces a schema like this (some tools also emit "$schema" and "required" keywords):

{
  "type": "object",
  "properties": {
    "id": {"type": "integer"},
    "name": {"type": "string"},
    "active": {"type": "boolean"},
    "tags": {"type": "array", "items": {"type": "string"}},
    "metadata": {
      "type": "object",
      "properties": {
        "created": {"type": "string"}
      }
    }
  }
}
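
The inference step itself is small. The following is a minimal sketch in Python, not the implementation of any particular tool: infer_schema is a hypothetical helper name, and describing array items from the first element only is a simplifying assumption.

```python
import json


def infer_schema(value):
    """Return a draft JSON Schema fragment describing one sample value."""
    # bool must be checked before int: in Python, bool is a subclass of int.
    if isinstance(value, bool):
        return {"type": "boolean"}
    if isinstance(value, int):
        return {"type": "integer"}
    if isinstance(value, float):
        return {"type": "number"}
    if isinstance(value, str):
        return {"type": "string"}
    if value is None:
        return {"type": "null"}
    if isinstance(value, list):
        schema = {"type": "array"}
        if value:
            # Simplification (assumption): describe items from the first element only.
            schema["items"] = infer_schema(value[0])
        return schema
    if isinstance(value, dict):
        return {
            "type": "object",
            "properties": {key: infer_schema(val) for key, val in value.items()},
        }
    raise TypeError(f"not a JSON value: {type(value).__name__}")


sample = json.loads("""
{
  "id": 1234,
  "name": "Alice",
  "active": true,
  "tags": ["admin", "editor"],
  "metadata": {"created": "2026-01-15"}
}
""")

schema = infer_schema(sample)
print(json.dumps(schema, indent=2))
```

Run against the sample payload, this sketch reproduces the schema shown above. An empty array yields {"type": "array"} with no "items", because there is nothing to infer from.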

What to refine after generation

A generated schema is a starting point, not a finished specification. Things it gets wrong or doesn't capture:

  • Required vs optional fields. A single sample cannot show which fields are always present and which are optional. Some generators list every observed key as required and others list none, so review or add "required": [...] manually.
  • String formats. A string that always holds a date needs "format": "date", not just "type": "string". Many validators enforce "format" only when format checking is explicitly enabled.
  • Value ranges. An integer ID that's always positive needs "minimum": 1.
  • Enum values. A string that's always "active", "inactive", or "pending" should use "enum", not just "type": "string".
  • Additional properties. By default, JSON schemas allow any additional property. Adding "additionalProperties": false makes the schema strict: objects with keys not listed under "properties" are rejected.
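
The refinements in this list can be applied to the generated schema programmatically. The sketch below is illustrative only: the choice of required fields is an assumption about the API, and the "status" property is hypothetical (it does not appear in the sample payload) and is included only to show "enum".

```python
import copy
import json

# The draft schema generated from the sample payload, as a Python dict.
generated = {
    "type": "object",
    "properties": {
        "id": {"type": "integer"},
        "name": {"type": "string"},
        "active": {"type": "boolean"},
        "tags": {"type": "array", "items": {"type": "string"}},
        "metadata": {
            "type": "object",
            "properties": {"created": {"type": "string"}},
        },
    },
}

# Work on a deep copy so the generated draft stays available for comparison.
refined = copy.deepcopy(generated)
props = refined["properties"]

# Value range: IDs are assumed to be positive.
props["id"]["minimum"] = 1

# String format: "created" is assumed to always hold a date.
props["metadata"]["properties"]["created"]["format"] = "date"

# Enum: hypothetical "status" field, not present in the sample payload.
props["status"] = {"type": "string", "enum": ["active", "inactive", "pending"]}

# Required fields: an assumption about which keys are always present.
refined["required"] = ["id", "name"]

# Strict mode: reject keys that are not listed under "properties".
refined["additionalProperties"] = False

print(json.dumps(refined, indent=2))
```

Keeping the refinements in a script rather than editing the JSON by hand makes them repeatable when the schema is regenerated from a newer sample.
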

Use multiple samples. A schema generated from one example may not capture the full range of valid inputs. If you have access to multiple real payloads (including edge cases with optional fields absent), generate the schema from the most complete example, then manually check it against the others.
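
Part of that cross-check can be automated. As a rough sketch, a key that appears in every sample is a candidate for "required", while a key missing from any sample is probably optional. The helper name required_keys and the second and third payloads below are hypothetical.

```python
def required_keys(samples):
    """Return the sorted top-level keys that appear in every sample object."""
    if not samples:
        return []
    common = set(samples[0])
    for sample in samples[1:]:
        common &= set(sample)  # keep only keys also present in this sample
    return sorted(common)


# Hypothetical payloads: the later ones omit some optional fields.
samples = [
    {"id": 1234, "name": "Alice", "active": True, "tags": ["admin", "editor"]},
    {"id": 5678, "name": "Bob", "active": False},
    {"id": 9012, "name": "Carol"},
]

print(required_keys(samples))  # ['id', 'name']
```

This only inspects top-level keys, and it is only evidence: a field that happens to appear in every sample you collected may still be optional in the real API.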

Use cases for JSON schema

API contract documentation. A JSON schema attached to API documentation tells consumers exactly what fields to expect, their types, and which are required — much more precisely than prose descriptions.

Request/response validation. Libraries in Python, JavaScript, Go, and Java can validate JSON against a schema at runtime. A validate-on-receive pattern catches malformed inputs before they cause downstream errors.
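
To make the validate-on-receive pattern concrete, here is a minimal sketch using only the Python standard library. It checks a small subset of keywords ("type", "properties", "required", "items") and is not a substitute for a real validation library. The names validate and handle_request and the response shape are assumptions made for illustration.

```python
import json

# Maps JSON Schema type names to Python types (only the subset used here).
_TYPES = {
    "object": dict,
    "array": list,
    "string": str,
    "integer": int,
    "number": (int, float),
    "boolean": bool,
    "null": type(None),
}


def validate(instance, schema, path="$"):
    """Return a list of error strings; an empty list means the instance passed."""
    expected = schema.get("type")
    if expected is not None:
        ok = isinstance(instance, _TYPES[expected])
        # bool is a subclass of int in Python, so reject it for numeric types.
        if expected in ("integer", "number") and isinstance(instance, bool):
            ok = False
        if not ok:
            return [f"{path}: expected {expected}"]
    errors = []
    if isinstance(instance, dict):
        for key in schema.get("required", []):
            if key not in instance:
                errors.append(f"{path}: missing required key {key!r}")
        for key, subschema in schema.get("properties", {}).items():
            if key in instance:
                errors.extend(validate(instance[key], subschema, f"{path}.{key}"))
    if isinstance(instance, list) and "items" in schema:
        for index, item in enumerate(instance):
            errors.extend(validate(item, schema["items"], f"{path}[{index}]"))
    return errors


def handle_request(body, schema):
    """Validate on receive: reject bad input before any further processing."""
    try:
        payload = json.loads(body)
    except json.JSONDecodeError as exc:
        return {"status": 400, "errors": [f"invalid JSON: {exc.msg}"]}
    errors = validate(payload, schema)
    if errors:
        return {"status": 400, "errors": errors}
    return {"status": 200, "payload": payload}
```

The handler returns every error it finds with a path to the offending value, so a client can fix a bad payload in one round trip. In production code, an established validator library is the better choice because it covers the full specification.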

Testing and mocking. A schema can drive test fixture generation. Tools like JSON Schema Faker generate synthetic data that matches a schema, and libraries like Faker can supply realistic values for individual fields. Both are useful for load testing and integration tests.
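
The idea behind schema-driven fixtures can be sketched in a few lines of standard-library Python. This is a toy illustration, not how any particular tool works: fake_from_schema is a hypothetical helper, it handles only a handful of keywords, and the value ranges and string lengths are arbitrary assumptions.

```python
import random
import string


def fake_from_schema(schema, rng):
    """Build a random value that matches a small subset of JSON Schema."""
    if "enum" in schema:
        return rng.choice(schema["enum"])
    kind = schema.get("type")
    if kind == "object":
        return {
            key: fake_from_schema(subschema, rng)
            for key, subschema in schema.get("properties", {}).items()
        }
    if kind == "array":
        items = schema.get("items", {"type": "string"})
        return [fake_from_schema(items, rng) for _ in range(rng.randint(0, 3))]
    if kind == "integer":
        low = schema.get("minimum", 0)
        high = schema.get("maximum", low + 10_000)  # arbitrary default range
        return rng.randint(low, high)
    if kind == "number":
        return rng.uniform(0.0, 1.0)
    if kind == "boolean":
        return rng.choice([True, False])
    if kind == "string":
        return "".join(rng.choice(string.ascii_lowercase) for _ in range(8))
    return None  # covers "null" and unsupported schemas


schema = {
    "type": "object",
    "properties": {
        "id": {"type": "integer", "minimum": 1},
        "name": {"type": "string"},
        "active": {"type": "boolean"},
        "tags": {"type": "array", "items": {"type": "string"}},
        "metadata": {
            "type": "object",
            "properties": {"created": {"type": "string"}},
        },
    },
}

# A seeded generator makes the fixture reproducible across test runs.
fixture = fake_from_schema(schema, random.Random(42))
print(fixture)
```

Passing a seeded random.Random instance keeps fixtures deterministic, so a failing test can be replayed with the same data.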