How to Generate a JSON Schema From Sample Data
Writing a JSON schema by hand for a large, nested object is tedious and error-prone. If you already have a representative sample of the data — a real API response, a fixture file, an example payload — you can generate a draft schema from it automatically and then refine what needs to change.
What a generated schema captures
A JSON schema generated from sample data describes the structure of that specific sample: the keys present, the types of their values, and the nesting depth. Given this payload:
{
  "id": 1234,
  "name": "Alice",
  "active": true,
  "tags": ["admin", "editor"],
  "metadata": {"created": "2026-01-15"}
}
A generator produces a schema like:
{
  "type": "object",
  "properties": {
    "id": {"type": "integer"},
    "name": {"type": "string"},
    "active": {"type": "boolean"},
    "tags": {"type": "array", "items": {"type": "string"}},
    "metadata": {
      "type": "object",
      "properties": {
        "created": {"type": "string"}
      }
    }
  }
}
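The inference step itself can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not how any particular generator works: the function name infer_schema is hypothetical, arrays are assumed to be homogeneous (only the first element is inspected), and only the keywords shown above are emitted.

```python
import json


def infer_schema(value):
    """Infer a draft JSON schema fragment from one sample value."""
    # bool must be checked before int: in Python, bool is a subclass of int.
    if isinstance(value, bool):
        return {"type": "boolean"}
    if isinstance(value, int):
        return {"type": "integer"}
    if isinstance(value, float):
        return {"type": "number"}
    if isinstance(value, str):
        return {"type": "string"}
    if value is None:
        return {"type": "null"}
    if isinstance(value, list):
        schema = {"type": "array"}
        # Assumption: arrays are homogeneous, so the first item is representative.
        if value:
            schema["items"] = infer_schema(value[0])
        return schema
    if isinstance(value, dict):
        return {
            "type": "object",
            "properties": {key: infer_schema(item) for key, item in value.items()},
        }
    raise TypeError(f"not a JSON value: {type(value).__name__}")


sample = json.loads("""
{
  "id": 1234,
  "name": "Alice",
  "active": true,
  "tags": ["admin", "editor"],
  "metadata": {"created": "2026-01-15"}
}
""")

print(json.dumps(infer_schema(sample), indent=2))
```

Because the sketch sees a single sample, it cannot tell a field that happens to be present from one that is guaranteed to be, which is why the output still needs manual refinement.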
What to refine after generation
A generated schema is a starting point, not a finished specification. Things it gets wrong or doesn't capture:
- Required vs optional fields. The generator doesn't know which fields are always present and which are optional. You need to add "required": [...] manually.
- String formats. A string that's always a date needs "format": "date", not just "type": "string".
- Value ranges. An integer ID that's always positive needs "minimum": 1.
- Enum values. A string that's always "active", "inactive", or "pending" should use "enum", not just "type": "string".
- Additional properties. By default, JSON schemas allow any additional property. Adding "additionalProperties": false makes the schema strict.
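As a rough sketch, these refinements can be applied to the generated schema in Python before it is saved. The refine function is hypothetical, and the specific choices in it (which fields are required, that IDs start at 1, the fixed list of tag values) are assumptions made for illustration; only you know the real rules for your data.

```python
import copy
import json

# Draft schema as a generator might emit it for the sample payload.
generated = {
    "type": "object",
    "properties": {
        "id": {"type": "integer"},
        "name": {"type": "string"},
        "active": {"type": "boolean"},
        "tags": {"type": "array", "items": {"type": "string"}},
        "metadata": {
            "type": "object",
            "properties": {"created": {"type": "string"}},
        },
    },
}


def refine(schema):
    """Return a hand-tightened copy of the generated schema."""
    refined = copy.deepcopy(schema)
    props = refined["properties"]
    # Assumption: id and name are always present in real payloads.
    refined["required"] = ["id", "name"]
    # Assumption: IDs are always positive.
    props["id"]["minimum"] = 1
    # The sample's date string gets an explicit format.
    props["metadata"]["properties"]["created"]["format"] = "date"
    # Hypothetical: assume tags come from a fixed set of roles.
    props["tags"]["items"]["enum"] = ["admin", "editor", "viewer"]
    # Reject keys the schema does not declare.
    refined["additionalProperties"] = False
    return refined


refined = refine(generated)
print(json.dumps(refined, indent=2))
```

One caveat worth checking in your validator of choice: many validators treat "format" as an annotation and only enforce it when format checking is explicitly enabled.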
Use cases for JSON schema
API contract documentation. A JSON schema attached to API documentation tells consumers exactly what fields to expect, their types, and which are required — much more precisely than prose descriptions.
Request/response validation. Libraries in Python, JavaScript, Go, and Java can validate JSON against a schema at runtime. A validate-on-receive pattern catches malformed inputs before they cause downstream errors.
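A validate-on-receive check might look like the sketch below. In practice you would use an established library (in Python, for example, the jsonschema package). The hand-rolled validate function, the SCHEMA constant, and handle_request are all hypothetical, and validate supports only "type", "properties", "required", and "items" so that the example stays dependency-free.

```python
import json

# Maps JSON Schema type names to Python types (subset used in this sketch).
_TYPES = {
    "object": dict,
    "array": list,
    "string": str,
    "integer": int,
    "boolean": bool,
}


def validate(instance, schema, path="$"):
    """Return a list of error strings; an empty list means the instance passed."""
    errors = []
    expected = schema.get("type")
    if expected is not None:
        ok = isinstance(instance, _TYPES[expected])
        # bool is a subclass of int in Python, but not a JSON integer.
        if expected == "integer" and isinstance(instance, bool):
            ok = False
        if not ok:
            return [f"{path}: expected {expected}"]
    if isinstance(instance, dict):
        for key in schema.get("required", []):
            if key not in instance:
                errors.append(f"{path}: missing required key '{key}'")
        for key, subschema in schema.get("properties", {}).items():
            if key in instance:
                errors.extend(validate(instance[key], subschema, f"{path}.{key}"))
    if isinstance(instance, list) and "items" in schema:
        for index, item in enumerate(instance):
            errors.extend(validate(item, schema["items"], f"{path}[{index}]"))
    return errors


SCHEMA = {
    "type": "object",
    "required": ["id", "name"],
    "properties": {
        "id": {"type": "integer"},
        "name": {"type": "string"},
        "tags": {"type": "array", "items": {"type": "string"}},
    },
}


def handle_request(body):
    """Parse and validate a raw JSON body before any business logic runs."""
    payload = json.loads(body)
    errors = validate(payload, SCHEMA)
    if errors:
        raise ValueError("; ".join(errors))
    return payload
```

Collecting every error with its path, rather than stopping at the first one, lets the caller return a single response that lists everything wrong with the payload.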
Testing and mocking. A schema can drive test fixture generation: tools like JSON Schema Faker generate synthetic data that matches a schema (optionally drawing realistic values from a library such as Faker), which is useful for load testing and integration tests.
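To make the idea concrete, here is a minimal sketch of schema-driven fixture generation in Python. It does not reflect how any existing tool is implemented: fake_from_schema and FIXTURE_SCHEMA are hypothetical names, only a small subset of keywords is handled, and the default bounds (integers up to 10,000, arrays of up to three items, eight-letter strings) are arbitrary assumptions.

```python
import random
import string


def fake_from_schema(schema, rng):
    """Generate one synthetic value matching a small subset of JSON Schema."""
    if "enum" in schema:
        return rng.choice(schema["enum"])
    kind = schema.get("type")
    if kind == "object":
        return {
            key: fake_from_schema(subschema, rng)
            for key, subschema in schema.get("properties", {}).items()
        }
    if kind == "array":
        # Assumption: untyped arrays fall back to strings.
        item_schema = schema.get("items", {"type": "string"})
        return [fake_from_schema(item_schema, rng) for _ in range(rng.randint(0, 3))]
    if kind == "integer":
        return rng.randint(schema.get("minimum", 0), schema.get("maximum", 10_000))
    if kind == "boolean":
        return rng.choice([True, False])
    if kind == "string":
        return "".join(rng.choice(string.ascii_lowercase) for _ in range(8))
    raise ValueError(f"unsupported schema: {schema!r}")


FIXTURE_SCHEMA = {
    "type": "object",
    "properties": {
        "id": {"type": "integer", "minimum": 1},
        "name": {"type": "string"},
        "active": {"type": "boolean"},
        "status": {"type": "string", "enum": ["active", "inactive", "pending"]},
        "tags": {"type": "array", "items": {"type": "string"}},
    },
}

# A seeded generator keeps fixtures reproducible across test runs.
rng = random.Random(42)
fixture = fake_from_schema(FIXTURE_SCHEMA, rng)
print(fixture)
```

Tighter constraints in the schema (such as "enum" and "minimum") translate directly into more realistic fixtures, which is one more reason to refine a generated schema before relying on it.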