Principal Software Engineer
Blockchain technology and Apache Kafka share characteristics which suggest a natural affinity. For instance, both share the concept of an ‘immutable append only log’. In the case of a Kafka partition:
Each partition is an ordered, immutable sequence of records that is continually appended to — a structured commit log. The records in the partitions are each assigned a sequential id number called the offset that uniquely identifies each record within the partition [Apache Kafka]
Whereas a blockchain can be described as:
a continuously growing list of records, called blocks, which are linked and secured using cryptography. Each block typically contains a hash pointer as a link to a previous block, a timestamp and transaction data [Wikipedia]
Clearly, these technologies share the parallel concepts of an immutable sequential structure, with Kafka being particularly optimized for high throughput and horizontal scalability, and blockchain excelling in guaranteeing the order and structure of a sequence.
By integrating these technologies, we can create a platform for experimenting with blockchain concepts.
Kafka provides a convenient framework for distributed peer to peer communication, with some characteristics particularly suitable for blockchain applications. While this approach may not be viable in a trustless public environment, there could be practical uses in a private or consortium network. See Scaling Blockchains with Apache Kafka for further ideas on how this could be implemented.
Additionally, with some experimentation, we may be able to draw on concepts already implemented in Kafka (e.g. sharding by partition) to explore solutions to blockchain challenges in public networks (e.g. scalability problems).
The purpose of this experiment is therefore to take a simple blockchain implementation and port it to the Kafka platform; we’ll take Kafka’s concept of a sequential log and guarantee immutability by chaining the entries together with hashes. The
blockchain topic on Kafka will become our distributed ledger. Graphically, it will look like this:
Kafka is a streaming platform designed for high-throughput, real-time messaging, i.e. it enables publication and subscription to streams of records. In this respect it is similar to a message queue or a traditional enterprise messaging system. Some of the characteristics are:
In Kafka, each topic is split into partitions, where each partition is a sequence of records which is continually appended to. This is similar to a text log file, where new lines are appended to the end. The entries in the partition are each assigned a sequential id, called an offset, which uniquely identifies the record.
The Kafka broker can be queried by offset, i.e. a consumer can reset its offset to some arbitrary point in the log to retrieve records from that point forward.
Full source code is available here.
On startup, our Kafka consumer will try to do three things: initialize a new blockchain if one has not yet been created; build an internal representation of the current state of the blockchain topic; then begin reading transactions in a loop:
The initialization step looks like this:
First, we find the highest available offset on the blockchain topic. If nothing has ever been published to the topic, the blockchain is new, so we start by creating and publishing the genesis block:
read_and_validate_chain(), we’ll first create a consumer to read from the
Some notes on the parameters we’re creating this consumer with:
blockchaingroup allows the broker to keep a reference of the offset the consumers have reached, for a given partition and topic
auto_offset_reset=OffsetType.EARLIESTindicates that we’ll begin downloading messages from the start of the topic.
auto_commit_enable=Trueperiodically notifies the broker of the offset we’ve just consumed (as opposed to manually committing)
reset_offset_on_start=Trueis a switch which activates the
auto_offset_resetfor the consumer
consumer_timeout_ms=5000will trigger the consumer to return from the method after five seconds if no new messages are being read (we’ve reached the end of the chain)
Then we begin reading block messages from the
For each message we receive:
At the end of this process, we’ll have downloaded the whole chain, discarding any invalid blocks, and we’ll have a reference to the offset of the latest block.
At this point, we’re ready to create a consumer on the
Our example topic has been created with two partitions, to demonstrate how partitioning works in Kafka. The partitions are set up in the
docker-compose.yml file, with this line:
transactions:2:1 specifies the number of partitions and the replication factor (i.e. how many brokers will maintain a copy of the data on this partition).
This time, our consumer will start from
OffsetType.LATEST so we only get transactions published from the current time onwards.
By pinning the consumer to a specific partition of the
transactions topic, we can increase the total throughput of all consumers on the topic. The Kafka broker will evenly distribute incoming messages across the two partitions of the transactions topic, unless we specify a partition when we publish to the topic. This means each consumer will be responsible for processing 50% of the messages, doubling the potential throughput of a single consumer.
Now we can begin consuming transactions:
As transactions are received, we’ll add them to an internal list. Every three transactions, we’ll create a new block and call
read_and_validate_chainfrom before, this time supplying our latest known offset to retrieve only the newer blocks.
docker-compose up -d
2. Run a consumer on partition 0:
python kafka_blockchain.py 0
3. Publish 3 transactions directly to partition 0:
4. Check the transactions were added to a block on the
kafkacat -C -b kafka:9092 -t blockchain
You should see output like this:
To balance transactions across two consumers, start a second consumer on partition 1, and remove
-p 0 from the publication script above.
Kafka can provide the foundation for a simple framework for blockchain experimentation. We can take advantage of features built into the platform, and associated tools like kafkacat, to experiment with distributed peer to peer transactions.
While scaling transactions in a public setting presents one set of issues, within a private network or consortium, where real-world trust is already established, transaction scaling might be achieved via an implementation which takes advantage of Kafka concepts.
Create your free account to unlock your custom reading experience.