Our Ambitious Quest to Democratise Real-time Distributed Systems

In our last post we argued that distributed systems hold huge potential for startups and consumers. We also briefly mentioned some of the tools we are developing at Nanosai.com, including our IAP protocol, which is aimed at helping developers build smarter and more versatile distributed systems. Today we're going to focus on IAP's data format, called Internet Object Notation (ION for short).

Why Does the Data Format Matter?

Today we live in a world where, by some estimates, we generate around 2.5 quintillion bytes of data per day on average, much of it generated by users of popular services from household names such as Google, Facebook, Amazon and Netflix. In fact, a recent report by Cisco predicts the following:

● By 2020, the gigabyte (GB) equivalent of all movies ever made will cross the global Internet every 2 minutes.

● Globally, IP traffic will reach 511 terabits per second (Tbps) in 2020, the equivalent of 142 million people streaming Internet high-definition (HD) video simultaneously, all day, every day.

● Global IP traffic in 2020 will be equivalent to 504 billion DVDs per year, 42 billion DVDs per month, or 58 million DVDs per hour.

At Nanosai we believe that handling such a high volume of data will only be effective with the rise of a new generation of intelligent distributed systems, which will most likely challenge us with new types of data and communication patterns. In particular, this will require a data format that is more versatile, compact, fast and easy to handle for small devices as well as for big servers. Any advanced network protocol, such as our own IAP, will need to accommodate that in order to be suitable for such distributed systems.

Today the most popular data format among developers is without doubt JSON. In fact, companies like Twitter have made their APIs JSON-only. But JSON has a number of shortcomings, among them the following:

1. JSON is a textual format. That makes it a fairly verbose way to send numbers, and a poor fit for raw binary data: raw bytes must be Base64 or hex encoded and transferred as strings. Base64 encoding increases the size of the encoded data to 4/3 of the raw size, and hex encoding doubles it.

2. JSON is not that versatile, in the sense that it is not good at modelling all types of data structures. For example, JSON is weak at modelling tables of similar data with rows and columns (e.g. CSV files). JSON would encode such tabular data as an array of objects, meaning the column names are repeated for every single object (row) in the table. This is a clear waste of bytes.

3. Being both textual and verbose, JSON is not the fastest data format to read or write. Being verbose it is also slower to transfer, especially for devices with limited bandwidth like small IoT devices, mobile phones on weak connections or ships floating in the middle of the ocean.
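The size overheads described in the list above are easy to check for yourself. The following is a minimal Python sketch using only the standard library; the sample data, the variable names and the "columns"/"rows" layout are our own illustrative assumptions and are not taken from ION or any particular API.

```python
import base64
import json
import os
import struct

# 1. Raw bytes: Base64 grows to 4/3 of the raw size, hex to twice the raw size.
raw = os.urandom(3000)            # 3000 is a multiple of 3, so Base64 needs no padding
b64 = base64.b64encode(raw)
hexed = raw.hex()
print(len(raw), len(b64), len(hexed))   # 3000 4000 6000

# 2. Numbers: the decimal text of a 32-bit integer vs. its fixed 4-byte binary form.
number = 1234567890
as_text = json.dumps(number)             # 10 characters
as_binary = struct.pack("<i", number)    # 4 bytes
print(len(as_text), len(as_binary))      # 10 4

# 3. Tabular data: an array of objects repeats every column name in every row.
rows = [{"id": i, "name": f"user{i}", "active": i % 2 == 0} for i in range(100)]
as_objects = json.dumps(rows, separators=(",", ":"))

# Hypothetical table-like layout: column names written once, then plain rows.
columns = ["id", "name", "active"]
as_table = json.dumps(
    {"columns": columns, "rows": [[row[c] for c in columns] for row in rows]},
    separators=(",", ":"),
)
print(len(as_objects), len(as_table))    # the table layout is noticeably smaller
```

Even while staying inside JSON, writing the column names once shrinks the tabular document considerably; a binary format can apply the same idea and drop the textual overhead as well.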

Finally, if the Internet of Everything is to become truly plug-and-play, we cannot have a situation where every single OEM or developer has to go through the hassle of creating their own data format just because JSON isn't suitable or versatile enough for their use case.

What Is ION?

ION (see the ION specifications) is the default binary data format for IAP, and it is versatile enough to encode a wide variety of data. ION can model the most commonly used data structures, such as:

· Raw bytes

· Basic primitive types like booleans, integers, floating-point numbers, UTF-8 text and datetime.

· Streams (unbounded) of fields of the above types.

· Arrays (bounded) of fields

· Maps (key, value pairs)

· Objects (with property name, property value pairs, null values, class id, cyclic references etc.)

· Tables with rows and columns.

· Complex object graphs with objects nested in objects, objects nested in tables, tables nested in tables, tables nested in objects, and tables and objects nested in arrays and vice versa.

· Combinations of all of the above.

ION is typically 2/3 the size of the corresponding JSON document, and for tabular data (e.g. arrays of objects) the size can drop to as low as 1/5 (even less for very small data types like booleans). Reading and writing ION messages can compete with Google Protocol Buffers in speed, if speed is what you need. ION is also self-describing and can be navigated without a schema.
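To give a feel for what "self-describing" and "navigated without a schema" mean in a binary format, here is a minimal Python sketch of a generic type-length-value (TLV) encoding. It is emphatically not ION's actual wire format: the type tags, the 5-byte header and the function names are all hypothetical choices made for this illustration only, and the real layout is defined in the ION specification.

```python
import struct

# Hypothetical one-byte type tags for this sketch only (not ION's real field codes).
TYPE_BYTES, TYPE_INT, TYPE_UTF8 = 1, 2, 3
HEADER = ">BI"                       # 1-byte type tag + 4-byte big-endian length
HEADER_SIZE = struct.calcsize(HEADER)  # 5 bytes


def encode_field(value):
    """Encode one value as header (type tag, payload length) followed by payload."""
    if isinstance(value, bytes):
        tag, payload = TYPE_BYTES, value              # raw bytes need no text encoding
    elif isinstance(value, int):
        tag, payload = TYPE_INT, struct.pack(">q", value)   # fixed 8-byte signed integer
    elif isinstance(value, str):
        tag, payload = TYPE_UTF8, value.encode("utf-8")
    else:
        raise TypeError(f"unsupported type: {type(value).__name__}")
    return struct.pack(HEADER, tag, len(payload)) + payload


def decode_fields(buffer):
    """Walk the buffer field by field using only the embedded headers (no schema)."""
    fields, offset = [], 0
    while offset < len(buffer):
        tag, length = struct.unpack_from(HEADER, buffer, offset)
        offset += HEADER_SIZE
        payload = buffer[offset:offset + length]
        offset += length
        if tag == TYPE_BYTES:
            fields.append(bytes(payload))
        elif tag == TYPE_INT:
            fields.append(struct.unpack(">q", payload)[0])
        elif tag == TYPE_UTF8:
            fields.append(payload.decode("utf-8"))
        else:
            raise ValueError(f"unknown type tag: {tag}")
    return fields


message = encode_field(42) + encode_field("hello") + encode_field(b"\x00\x01\x02")
print(decode_fields(message))        # [42, 'hello', b'\x00\x01\x02']
```

Because every field carries its own type and length, a reader can skip over fields it does not care about, or over types it does not understand, without knowing the message structure in advance.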

An interesting fact about ION is that, although we developed it for our IAP protocol, it was designed to be independent of any network protocol. Developers can therefore use ION outside of IAP as an alternative data format to JSON, XML, YAML etc. As ION is reasonably compact and fast, using it over HTTP might be a first step for organisations looking to switch to IAP from e.g. HTTP/JSON or SOAP/XML.

Ultimately, our goal is to make ION a more versatile alternative to JSON that is capable of supporting a wider variety of data. We believe that in order to make internet communication more versatile, we really need a more versatile data format than JSON. Our co-founder and CTO Jakob Jenkov has written up some "ION Performance Benchmarks" comparing ION with JSON, Protobuf, MessagePack and CBOR.

Finally, over the coming weeks and months we'll be sharing our journey with you in follow-up posts, i.e. Act 3, Act 4 … Act N. In the meantime, please visit our GitHub repo to check out our code and stay up to date with our progress.

Posted by Bamborde, Co-Founder at Zaiku. Twitter: @cloudbalde

Our Ambitious Quest to Democratise Real-time Distributed Systems — Act 2: ION vs JSON