How to Prepare JSONL Files for AI Batch Jobs
AI batch APIs require JSONL input — not a JSON array. Each line is an independent request. Getting the format right means the batch succeeds; getting it wrong means the entire job fails with a format error before processing a single record.
Why batch APIs use JSONL
JSONL allows the API to stream and process records independently without loading the entire file. A JSON array must be parsed completely before any record can be processed. For batch jobs with 50,000 requests, JSONL's streamability is a practical requirement — loading a 500 MB JSON array into memory just to start processing is inefficient.
JSONL also means partial success is possible. If a batch run fails partway through, all records up to the failure are processed. A malformed JSON array fails entirely at parse time.
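The record-by-record processing described above can be sketched in a few lines. This is an illustrative reader, not any provider's actual implementation; the helper name iter_jsonl and the in-memory sample are assumptions made for the example.

```python
import io
import json
from typing import Iterator, TextIO


def iter_jsonl(stream: TextIO) -> Iterator[dict]:
    """Yield one parsed record per non-empty line, without reading the whole file."""
    for line in stream:
        line = line.strip()
        if line:
            yield json.loads(line)


# An in-memory stand-in for a batch file; a file object from open() behaves the same way.
sample = io.StringIO('{"custom_id": "req-001"}\n{"custom_id": "req-002"}\n')

for record in iter_jsonl(sample):
    # Each record is available as soon as its line has been read.
    print(record["custom_id"])
```

Because the generator parses one line at a time, a bad line only raises when the reader reaches it, and every earlier record has already been handed to the caller.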
OpenAI Batch API format
Each line in an OpenAI batch JSONL file must be a valid JSON object with four fields:
{"custom_id": "req-001", "method": "POST", "url": "/v1/chat/completions", "body": {"model": "gpt-4o-mini", "messages": [{"role": "user", "content": "What is JSON?"}], "max_tokens": 100}}
{"custom_id": "req-002", "method": "POST", "url": "/v1/chat/completions", "body": {"model": "gpt-4o-mini", "messages": [{"role": "user", "content": "What is YAML?"}], "max_tokens": 100}}
custom_id: your identifier for this request, used to match input to output. Make it unique within the batch.
method: always "POST" for chat completions.
url: the API endpoint path.
body: the complete request body as it would be sent to the regular API.
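To illustrate why custom_id matters, here is a minimal sketch of joining requests to results. It assumes each output line repeats the custom_id next to some result payload; the "response" key and the shortened records are placeholders for the example, so check the provider's documented output format before relying on them.

```python
import json


def index_by_custom_id(lines):
    """Map custom_id -> parsed record, rejecting duplicates."""
    index = {}
    for line in lines:
        record = json.loads(line)
        cid = record["custom_id"]
        if cid in index:
            raise ValueError(f"duplicate custom_id: {cid}")
        index[cid] = record
    return index


requests = index_by_custom_id([
    '{"custom_id": "req-001", "body": {"messages": [{"role": "user", "content": "What is JSON?"}]}}',
    '{"custom_id": "req-002", "body": {"messages": [{"role": "user", "content": "What is YAML?"}]}}',
])

# Hypothetical output lines, deliberately out of order; the "response" key is an assumption.
results = index_by_custom_id([
    '{"custom_id": "req-002", "response": "YAML is ..."}',
    '{"custom_id": "req-001", "response": "JSON is ..."}',
])

for cid, request in requests.items():
    question = request["body"]["messages"][0]["content"]
    print(cid, question, "->", results[cid]["response"])
```

Looking results up by custom_id rather than by position means the join still works if the output arrives in a different order from the input.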
Anthropic Batch API format
Anthropic's message batches use a slightly different structure:
{"custom_id": "req-001", "params": {"model": "claude-3-haiku-20240307", "max_tokens": 100, "messages": [{"role": "user", "content": "What is JSON?"}]}}
{"custom_id": "req-002", "params": {"model": "claude-3-haiku-20240307", "max_tokens": 100, "messages": [{"role": "user", "content": "What is YAML?"}]}}
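The main difference: the request body sits under params instead of body, and there are no method or url fields. Note that, at the time of writing, Anthropic's Message Batches API takes these objects as entries in a requests array in the body of the batch-creation call rather than as an uploaded file, so records stored as JSONL are read and sent as that array. Results are returned as JSONL.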
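The structural difference can be shown as a small conversion function. This is a sketch under a strong assumption: it only covers simple requests whose body fields (messages, max_tokens) are accepted by both providers, and the function name is hypothetical.

```python
import json


def openai_to_anthropic_record(record: dict, model: str) -> dict:
    """Move the request body under "params" and drop "method" and "url"."""
    params = dict(record["body"])  # shallow copy so the input record is left untouched
    params["model"] = model        # model names differ between providers
    return {"custom_id": record["custom_id"], "params": params}


openai_record = {
    "custom_id": "req-001",
    "method": "POST",
    "url": "/v1/chat/completions",
    "body": {
        "model": "gpt-4o-mini",
        "messages": [{"role": "user", "content": "What is JSON?"}],
        "max_tokens": 100,
    },
}

print(json.dumps(openai_to_anthropic_record(openai_record, "claude-3-haiku-20240307")))
```

Anything provider-specific in the body (for example, options that exist in only one API) would need explicit handling that this sketch does not attempt.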
Generating JSONL programmatically
import json

# Build batch requests from a list of inputs
inputs = ["What is JSON?", "What is YAML?", "What is TOML?"]

with open("batch_requests.jsonl", "w", encoding="utf-8") as f:
    for i, question in enumerate(inputs):
        request = {
            "custom_id": f"req-{i:04d}",
            "method": "POST",
            "url": "/v1/chat/completions",
            "body": {
                "model": "gpt-4o-mini",
                "messages": [{"role": "user", "content": question}],
                "max_tokens": 200
            }
        }
        # Write one compact JSON object per line, terminated by a newline
        f.write(json.dumps(request) + "\n")
json.dumps escapes any newline in a string value as \n within the JSON string. A literal newline inside a value breaks the one-record-per-line format and invalidates all subsequent records.
Validating before uploading
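Before checking the file as a whole, the newline behavior described above can be confirmed directly. The sketch below serializes a prompt that contains line breaks and inspects the result. The prompt text and the trimmed-down record are illustrative assumptions.

```python
import json

prompt = "Summarize this:\nline one\nline two"
line = json.dumps(
    {"custom_id": "req-0001", "body": {"messages": [{"role": "user", "content": prompt}]}}
)

# The serialized record contains the two-character escape \n, not a real line break.
assert "\n" not in line
assert "\\n" in line

# Parsing the line restores the original multi-line prompt.
restored = json.loads(line)["body"]["messages"][0]["content"]
assert restored == prompt
```

The risk comes from building lines by hand, for example with string formatting, where a raw line break in the prompt would end up in the file unescaped.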
Before uploading a batch file, validate it: confirm each line parses as valid JSON on its own, verify custom_id values are unique, check that required fields are present for every record, and confirm the model name is valid. A malformed line near the end of a 10,000-record file can fail the entire batch, and validation catches this before the upload.
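A minimal sketch of those checks, assuming OpenAI-style records. The function name, the REQUIRED_FIELDS tuple, and the sample lines are assumptions for illustration. Checking that the model name is valid is left out because it depends on the provider's current model list.

```python
import json

REQUIRED_FIELDS = ("custom_id", "method", "url", "body")  # OpenAI-style records


def validate_jsonl(lines, required=REQUIRED_FIELDS):
    """Return a list of (line_number, problem) tuples; an empty list means no problems found."""
    problems = []
    seen = set()
    for number, line in enumerate(lines, start=1):
        if not line.strip():
            problems.append((number, "blank line"))
            continue
        try:
            record = json.loads(line)
        except json.JSONDecodeError as exc:
            problems.append((number, f"invalid JSON: {exc.msg}"))
            continue
        if not isinstance(record, dict):
            problems.append((number, "not a JSON object"))
            continue
        missing = [name for name in required if name not in record]
        if missing:
            problems.append((number, "missing fields: " + ", ".join(missing)))
        cid = record.get("custom_id")
        if isinstance(cid, str):
            if cid in seen:
                problems.append((number, f"duplicate custom_id: {cid}"))
            seen.add(cid)
    return problems


sample = [
    '{"custom_id": "req-0000", "method": "POST", "url": "/v1/chat/completions", "body": {}}',
    '{"custom_id": "req-0000", "method": "POST", "url": "/v1/chat/completions", "body": {}}',
    '{"custom_id": "req-0002", "method": "POST"',
]

for number, problem in validate_jsonl(sample):
    print(f"line {number}: {problem}")
```

For a real file, passing the open file object (for example, open("batch_requests.jsonl", encoding="utf-8")) in place of the sample list checks it line by line without loading it all into memory.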