ION: A Data Format for Intelligent Real-time Distributed Systems

In our previous posts [1] [2] we briefly introduced our data format ION which is an integral part of project. In this post, we`ll share a little bit more lightweight technical information about ION and how we think it can contribute to help developers build a better distributed systems ecosystem.

But first we need to clarify something that otherwise may cause confusion. We published our ION data format well before Amazon publicly published their ION format. Therefore we did not choose the name ION because of its similarity to Amazon’s ION. In fact we did not discover the similarities until after we had designed and named our ION format. The acronym ION derives from “IAP Object Notation” where our open network protocol IAP stands for “Internet Application Protocol”. A straightforward evidence is the following;

1. Our Co — Founder published an article at here about IAP which clearly mentions ION.

2. Our Hacker News announcement here about ION versus a Hacker News announcement post here about Amazon`s ION. As you will notice our Hacker News announcement is much older than the Amazon`s announcement.

What Is ION (our ION!)?

In short, ION is a versatile binary data format that can be used to encode a wide variety of data. It is expressive enough to contain serialized objects (e.g. Java or C# objects), CSV, JSON, XML, text and binary data.

It is very fast and reasonably easy to parse and generate, more compressed on the wire than JSON and XML, and easy to handle for servers and routers and other lightweight hardware (we believe).

ION is one of the central pieces of our open distributed systems stack as illustrated bellow. We designed ION as default data format for IAP, thus, all IAP messages are encoded using ION.

IAP is a versatile message oriented network protocol designed for both synchronous and asynchronous communication, making IAP suitable for many different use cases such as RPC, file exchange, streaming, message queue subscriptions. We created IAP because existing protocols such as HTTP did not meet our versatility and high performance requirements.

The Definition of Intelligent Real-time Distributed Systems

At we believe that the conditions set out in the following two definitions are the minimum and sufficient conditions that a system needs to satisfy to be classed as “Intelligent Real-time Distributed System”.

Definition 1: Let S be a distributed system. We say S is an Intelligent Distributed System if S satisfies at least the following conditions;

(i) Plug and play style interaction between nodes in the system.

(ii) Versatile, meaning it supports many different use cases and architectures.

(iii) Robust, meaning it can survive in case of failure.

(iv) Self-healing capability when failure conditions are fixed.

Definition 2: A distributed system S is a Real-time Distributed System if it satisfies at the least the following conditions;

(i) Communicates via a fast network protocol. This in term means a compact, binary protocol.

(ii) Communicates asynchronously, as many real-time systems require.

(iii) Can handle at least 1.000 messages per second (though our real goal is between 10.000 to 100.000+ messages per second).

Why do we need a new data format?

First, when exchanging data between nodes in a distributed system it is advantageous to encode that data using a fast, compact and versatile data format. We felt that the existing formats (e.g. Protobuf, CBOR, MessagePack, JSON) were not versatile and fast enough for the type of use cases that we envisioned distributed systems of the future will be.

Second, a fast data format is faster to read and write (deserialize and serialize) for the communicating nodes. ION can even be traversed in its binary form if developers need maximum speed.

Third, a compact data format requires less bytes to represent the encoded data. Fewer bytes requires less network bandwidth and can thus be transferred faster across the network. A compact data format is also an advantage when reading and writing it.

Finally, a versatile data format is a data format that can be used for as many use cases as possible. This minimizes the need for inventing new data formats. Of course, there is always the possibility of someone else coming up with a better data format than ION. But we would welcome that in the spirit of open innovation!

ION Outside IAP

Being a data format ION can be used independently of IAP. Developers can use ION as a data format in data files, log files, as data format for binary messages transmitted over HTTP etc.

ION can contain binary data so developers can also embed other formats inside ION when necessary. For instance, an MP3 file, ZIP file, JPG file etc.

ION vs. Other Data Formats

ION is a data format which is similar to a binary version of JSON. In that respect ION is similar to MessagePack, CBOR and Amazon’s ION but with a advantageous differences.

We have a more detailed description of how ION compares to other data formats in the text ION vs. Other Formats and ION Performance Benchmarks.

Current Release

The new 0.5.0 release contains an improvement of the IonObjectReader and IonObjectWriter and related classes. These classes are now able to read and write object graphs where a POJO class contains a field of its own type. For instance, a Node class which contains a Node child1 and Node child2 instance variables inside.

Before this release trying to create an IonObjectReader or IonObjectWriter with classes that referenced itself would result in a stack overflow exception. This is now fixed.

Over coming weeks, we will be publishing new ION updates and update documentation, benchmarks etc.

Finally, please download our free book authored by our CTO here. Our open source Java toolkit Grid Ops is available on Github. If you would like to receive beta invite for our hosted infrastructure services when we launch, please Subscribe here.

Posted by Bambordé Baldé, Co — Founder.

More by Zaiku

Topics of interest

More Related Stories