CSV and JSON formats introduction

Comma Separated Values (CSV) is a common data exchange format, widely used to represent sets of records that share an identical list of fields. JavaScript Object Notation (JSON) has become the de facto standard for data exchange, replacing XML, which was a huge buzzword in the early 2000s. It is not only self-describing but also human-readable.

Let’s look at examples of both formats. Here is a list of families represented as CSV data:

    id,father,mother,children
    1,Mark,Charlotte,1
    2,John,Ann,3
    3,Bob,Monika,2

CSV looks a lot simpler than its JSON array analog shown below:

    [
    {"id":1,"father":"Mark","mother":"Charlotte","children":1},
    {"id":2,"father":"John","mother":"Ann","children":3},
    {"id":3,"father":"Bob","mother":"Monika","children":2}
    ]

But CSV is limited to storing two-dimensional, untyped data. There is no way to store nested structures or typed values, such as the names of the children, in plain CSV. Representing nested structures in JSON files is easy, though:

    [
    {"id":1,"father":"Mark","mother":"Charlotte","children":["Tom"]},
    {"id":2,"father":"John","mother":"Ann","children":["Jessika","Antony","Jack"]},
    {"id":3,"father":"Bob","mother":"Monika","children":["Jerry","Karol"]}
    ]

Why not just wrap all the records in a regular JSON array, so that the file itself is valid JSON? Because in order to insert or read a single record in a JSON array you have to parse the whole file, which is far from ideal. In JSON Lines, where each record sits on its own line, every entry is valid JSON and can be parsed/unmarshaled as a standalone JSON document. For example, you can seek within the file, or split a 10 GB file into smaller files, without parsing the entire thing.

1. There is no need to read the whole file into memory before parsing it.
2. You can easily add further records simply by appending lines to the file. If the entire file were a JSON array, you would have to parse it, add the new record, and then convert everything back to JSON.

So it is not practical to keep a multi-gigabyte file as a single JSON array.
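The two points above can be sketched in Python using only the standard library. This is a minimal illustration, not Dataflow kit’s implementation: the function names `append_record` and `iter_records` and the file name `families.jsonl` are assumptions made for the example.

```python
import json
import os
import tempfile


def append_record(path, record):
    """Append one record as a single line; existing lines are never parsed."""
    with open(path, "a", encoding="utf-8") as f:
        f.write(json.dumps(record, ensure_ascii=False) + "\n")


def iter_records(path):
    """Yield records one at a time, so only one line is parsed at once."""
    with open(path, encoding="utf-8") as f:
        for line in f:
            if line.strip():  # skip blank lines
                yield json.loads(line)


if __name__ == "__main__":
    # Hypothetical file name, created in a temporary directory.
    path = os.path.join(tempfile.mkdtemp(), "families.jsonl")
    append_record(path, {"id": 1, "father": "Mark", "mother": "Charlotte",
                         "children": ["Tom"]})
    append_record(path, {"id": 2, "father": "John", "mother": "Ann",
                         "children": ["Jessika", "Antony", "Jack"]})
    for family in iter_records(path):
        print(family["id"], family["children"])
```

Appending opens the file in `"a"` mode and writes one line, so the cost does not depend on how large the file already is. Doing the same with a JSON array would mean loading, modifying, and rewriting the whole document.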
Taking into consideration that Dataflow kit users need to store and parse big volumes of data, we’ve implemented export to the JSONL format. JSON Lines (jsonl), Newline-delimited JSON (ndjson), and line-delimited JSON (ldjson) are three terms for the same format, primarily intended for JSON streaming. Let’s look into what JSON Lines is, and how it compares to other JSON streaming formats.

JSON Lines vs. JSON

Exactly the same list of families expressed in the JSON Lines format looks like this:

    {"id":1,"father":"Mark","mother":"Charlotte","children":["Tom"]}
    {"id":2,"father":"John","mother":"Ann","children":["Jessika","Antony","Jack"]}
    {"id":3,"father":"Bob","mother":"Monika","children":["Jerry","Karol"]}

JSON Lines essentially consists of several lines, where each individual line is a valid JSON value (here, an object) and lines are separated by the newline character `\n`. It doesn’t require custom parsers. Just read a line, parse it as JSON, read a line, parse it as JSON… and so on. In fact, JSON Lines is already very common in industry. More details can be found in the JSON Lines specification at jsonlines.org, which describes the JSON Lines text format, also called newline-delimited JSON.

JSON Lines vs. JSON text sequences

Let’s compare the JSON text sequence format, and its associated media type “application/json-seq”, with NDJSON. A JSON text sequence consists of any number of JSON texts, all encoded in UTF-8, each prefixed by an ASCII Record Separator (0x1E) and each ending with an ASCII Line Feed character (0x0A). Let’s look at the list of families mentioned above expressed as a JSON text sequence file:

    <RS>{"id":1,"father":"Mark","mother":"Charlotte","children":["Tom"]}<LF>
    <RS>{"id":2,"father":"John","mother":"Ann","children":["Jessika","Antony","Jack"]}<LF>
    <RS>{"id":3,"father":"Bob","mother":"Monika","children":["Jerry","Karol"]}<LF>

Here <RS> is a placeholder for the non-printable ASCII Record Separator (0x1E), and <LF> represents the line feed character.
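The framing described above can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions: the helper names `encode_json_seq` and `decode_json_seq` are hypothetical, the data is handled as an in-memory string rather than a byte stream, and no error recovery for damaged records is attempted.

```python
import json

RS = "\x1e"  # ASCII Record Separator (0x1E)
LF = "\n"    # ASCII Line Feed (0x0A)


def encode_json_seq(records):
    """Frame each record as RS + JSON text + LF."""
    return "".join(RS + json.dumps(record) + LF for record in records)


def decode_json_seq(text):
    """Split on RS and parse every non-empty chunk as one JSON text."""
    return [json.loads(chunk) for chunk in text.split(RS) if chunk.strip()]


if __name__ == "__main__":
    families = [
        {"id": 1, "father": "Mark", "mother": "Charlotte", "children": ["Tom"]},
        {"id": 3, "father": "Bob", "mother": "Monika", "children": ["Jerry", "Karol"]},
    ]
    encoded = encode_json_seq(families)
    print(repr(encoded[:12]))  # the record separator shows up as \x1e
    print(decode_json_seq(encoded) == families)
```

Because records are delimited by the separator rather than by line breaks, a record in this sketch may itself span several lines (for example, when pretty-printed), which JSON Lines does not allow.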
The format looks almost identical to JSON Lines, except for this special symbol at the beginning of each record. As these two formats are so similar, you may wonder why they both exist.

The JSON text sequences format is intended for streaming contexts, so it does not define a corresponding file extension, though its specification does register the new MIME media type application/json-seq. It is error-prone to store and edit this format in a text editor, as the non-printable (0x1E) character may be garbled. For data kept in files, consider using JSON Lines consistently as an alternative.

JSON Lines vs. Concatenated JSON

Another alternative to JSON Lines is concatenated JSON. In this format the JSON texts are not separated from each other at all. Here is the concatenated JSON representation of the example above:

    {"id":1,"father":"Mark","mother":"Charlotte","children":["Tom"]}{"id":2,"father":"John","mother":"Ann","children":["Jessika","Antony","Jack"]}{"id":3,"father":"Bob","mother":"Monika","children":["Jerry","Karol"]}

Concatenated JSON isn’t a new format; it’s simply a name for streaming multiple JSON objects without any delimiters. While generating concatenated JSON is not a complex task, parsing it requires significant effort. In fact, you have to implement a context-aware parser that detects individual records and separates them from each other correctly.

Pretty printed JSON formats

If you have large nested structures, reading the JSON Lines text directly isn’t recommended. Use the jq tool to make viewing large structures easier (grep . simply drops empty lines):

    grep . families.jsonl | jq .

As a result you will see pretty printed JSON:

    {
      "id": 1,
      "father": "Mark",
      "mother": "Charlotte",
      "children": [
        "Tom"
      ]
    }
    {
      "id": 2,
      "father": "John",
      "mother": "Ann",
      "children": [
        "Jessika",
        "Antony",
        "Jack"
      ]
    }
    {
      "id": 3,
      "father": "Bob",
      "mother": "Monika",
      "children": [
        "Jerry",
        "Karol"
      ]
    }

Conclusion

The complete JSON Lines file as a whole is technically no longer valid JSON, because it contains multiple JSON texts. The fact that every new line means a separate entry makes a JSON Lines formatted file streamable. You can read just as many lines as needed to get the same number of records.
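The point about reading only as many lines as needed can be illustrated with a short Python sketch. The function name `head` is an assumption made for this example, and an in-memory stream stands in for a large file on disk.

```python
import io
import json
from itertools import islice


def head(lines, n):
    """Parse only the first n non-empty lines of a JSON Lines stream."""
    non_empty = (line for line in lines if line.strip())
    return [json.loads(line) for line in islice(non_empty, n)]


if __name__ == "__main__":
    # Stand-in for a file object returned by open("families.jsonl").
    stream = io.StringIO(
        '{"id":1,"father":"Mark","mother":"Charlotte","children":["Tom"]}\n'
        '{"id":2,"father":"John","mother":"Ann","children":["Jessika","Antony","Jack"]}\n'
        '{"id":3,"father":"Bob","mother":"Monika","children":["Jerry","Karol"]}\n'
    )
    for family in head(stream, 2):
        print(family["id"], family["father"])
```

Lines after the first `n` are neither read nor parsed, so the cost depends on the number of records requested and not on the size of the file.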