When to Convert a JSON Array to JSON Lines

If you work with APIs, data pipelines, or AI batch jobs, you've almost certainly hit the moment where a tool rejects your perfectly valid JSON because it expected JSON Lines instead, or vice versa. The two formats store the same data but behave very differently in practice. Choosing the wrong one doesn't just cause errors; it affects memory usage, processing speed, and whether your pipeline can scale.

This guide explains the difference, when each format wins, and how to convert between them cleanly.

What Is the Difference Between JSON Array and JSON Lines?

A JSON array wraps all records inside square brackets, separated by commas. The entire structure is a single valid JSON document:

[
  {"id": 1, "name": "Alice", "score": 92},
  {"id": 2, "name": "Bob", "score": 87},
  {"id": 3, "name": "Carol", "score": 95}
]

JSON Lines, also called JSONL or NDJSON, puts each record on its own line with no outer wrapper. Each line is a complete, independent JSON object:

{"id": 1, "name": "Alice", "score": 92}
{"id": 2, "name": "Bob", "score": 87}
{"id": 3, "name": "Carol", "score": 95}

Both formats are lossless representations of the same data. The difference is entirely in how parsers read them, and that has significant downstream consequences.

When to Use JSON Lines (JSONL)

1. Large files and streaming. A JSON array parser must read the entire file before it can process a single record because it needs the closing ] to confirm the structure is complete. That means loading a 2 GB JSON array requires 2 GB of memory before your code can do anything useful.

JSONL is inherently streamable. A processor reads line 1, handles it, discards it from memory, then moves to line 2 without ever holding the full dataset in memory. For files with hundreds of thousands of records, this is the difference between a pipeline that works and one that crashes.

2. Append-only logs and event streams. Appending a new record to a JSONL file is a single write operation: add a new line at the end. Appending to a JSON array requires either rewriting the entire file or maintaining more complex writer logic to insert before the closing ].

This makes JSONL the natural format for log files, event streams, and any data source where records are continuously written over time.

3. Fault tolerance and partial files. If a JSON array file is truncated because of a failed write, a crash, or a network interruption, the entire file becomes unparseable. The missing closing bracket makes the whole document invalid.

A truncated JSONL file is still partially useful. Every line written before the truncation is a valid, independent record. Your pipeline can process what it has and flag the incomplete record at the end.

4. AI batch APIs and fine-tuning. This is one of the most common reasons developers convert JSON arrays to JSONL today. Batch processing and dataset workflows often expect one request or training example per line:

  • OpenAI Batch API: each line is one request object
  • Anthropic batch endpoint: each line is one message request
  • OpenAI fine-tuning: training data uses one {"messages": [...]} object per line
  • Hugging Face datasets: JSONL is a common large-dataset upload format

If your data starts as a JSON array and you are submitting to one of these systems, you usually need to convert it first.

5. Parallel processing. Because each JSONL line is independent, a file can be split across workers trivially. Worker 1 can handle lines 1 through 10,000 while worker 2 handles lines 10,001 through 20,000. A JSON array typically needs to be parsed into records before that split becomes convenient.

When to Keep the JSON Array

1. REST APIs and HTTP responses. JSONL is not valid JSON. You cannot send it as an application/json response body, and JSON.parse() will throw on a raw JSONL string. Any REST API, GraphQL endpoint, or web service that returns a collection of records should use a JSON array.

2. Small datasets loaded entirely into memory. Configuration files, test fixtures, seed data, and small reference datasets are usually loaded completely anyway. JSON array is the conventional choice here, has broader tooling support, and is easier to inspect in editors and code review.

3. Nested or hierarchical data. JSONL is designed for flat records: a sequence of independent top-level objects. If your data has meaningful outer structure, such as a document that contains metadata alongside an array of sub-records, a JSON array preserves that hierarchy naturally.

4. Browser and frontend consumption. fetch() and JSON.parse() work natively with JSON arrays. Parsing JSONL in the browser requires line splitting and per-line parsing. For data consumed directly by frontend JavaScript, a JSON array is the simpler and more compatible default.

Same data, different behavior: JSON array and JSONL can represent the same records exactly. The decision is not about data loss. It is about which format matches the next parser, pipeline, or API cleanly.

How to Convert JSON Array to JSON Lines

Python:

import json

with open('data.json', 'r') as f:
    records = json.load(f)

with open('data.jsonl', 'w') as f:
    for record in records:
        f.write(json.dumps(record) + '\n')

Node.js:

const fs = require('fs');

const records = JSON.parse(fs.readFileSync('data.json', 'utf8'));
const jsonl = records.map(r => JSON.stringify(r)).join('\n');
fs.writeFileSync('data.jsonl', jsonl);

Command line (jq):

jq -c '.[]' data.json > data.jsonl

How to Convert JSON Lines Back to a JSON Array

Python:

import json

with open('data.jsonl', 'r') as f:
    records = [json.loads(line) for line in f if line.strip()]

with open('data.json', 'w') as f:
    json.dump(records, f, indent=2)

Node.js:

const fs = require('fs');

const records = fs.readFileSync('data.jsonl', 'utf8')
  .split('\n')
  .filter(Boolean)
  .map(line => JSON.parse(line));

fs.writeFileSync('data.json', JSON.stringify(records, null, 2));

Both conversions are lossless and reversible. No data is added or removed in the process.

FAQ

What is the difference between JSONL, NDJSON, and JSON Lines? They are the same format with different names. JSONL and JSON Lines are common in AI and ML contexts. NDJSON, short for newline-delimited JSON, is common in data engineering. All three mean one JSON object per line with no outer wrapper.

Can a JSONL file contain nested objects? Yes. Each line must be a valid JSON object, and that object can contain nested arrays and nested objects. The only constraint is that each top-level record occupies exactly one line.

Why do AI APIs require JSONL instead of a JSON array? Because batch jobs process records independently and in parallel. JSONL lets the API read one request at a time without loading the full file, and it makes it easier to match each line in the input file to a corresponding line in the output.

How do I validate a JSONL file? Parse each line independently. A valid JSONL file has no blank lines between records, a trailing newline is fine, and every non-empty line parses as a valid JSON object.

What happens if there is a blank line in a JSONL file? Many parsers skip blank lines, but some strict implementations throw. The safest practice is to strip blank lines before processing with logic like lines.filter(line => line.trim()).