Blockchain technology and Apache Kafka share characteristics which suggest a natural affinity. For instance, both share the concept of an ‘immutable, append-only log’. In the case of a Kafka partition:
Each partition is an ordered, immutable sequence of records that is continually appended to — a structured commit log. The records in the partitions are each assigned a sequential id number called the offset that uniquely identifies each record within the partition [Apache Kafka]
Whereas a blockchain can be described as:
a continuously growing list of records, called blocks, which are linked and secured using cryptography. Each block typically contains a hash pointer as a link to a previous block, a timestamp and transaction data [Wikipedia]
Clearly, these technologies share the parallel concept of an immutable, sequential structure, with Kafka particularly optimized for high throughput and horizontal scalability, and blockchain excelling at guaranteeing the order and structure of a sequence.
By integrating these technologies, we can create a platform for experimenting with blockchain concepts.
Kafka provides a convenient framework for distributed peer-to-peer communication, with some characteristics particularly suitable for blockchain applications. While this approach may not be viable in a trustless public environment, there could be practical uses in a private or consortium network. See Scaling Blockchains with Apache Kafka for further ideas on how this could be implemented.
Additionally, with some experimentation, we may be able to draw on concepts already implemented in Kafka (e.g. sharding by partition) to explore solutions to blockchain challenges in public networks (e.g. scalability problems).
The purpose of this experiment is therefore to take a simple blockchain implementation and port it to the Kafka platform; we’ll take Kafka’s concept of a sequential log and guarantee immutability by chaining the entries together with hashes. The blockchain topic on Kafka will become our distributed ledger. Graphically, it will look like this:
Visual of a Kafka blockchain
Kafka is a streaming platform designed for high-throughput, real-time messaging: it enables publication and subscription to streams of records. In this respect it is similar to a message queue or a traditional enterprise messaging system. Two of its characteristics are particularly relevant here: partitioned, append-only topics and offset-based consumption.
In Kafka, each topic is split into partitions, where each partition is a sequence of records which is continually appended to. This is similar to a text log file, where new lines are appended to the end. The entries in the partition are each assigned a sequential id, called an offset, which uniquely identifies the record.
Kafka partitioning
The Kafka broker can be queried by offset, i.e. a consumer can reset its offset to some arbitrary point in the log to retrieve records from that point forward.
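For example, a minimal sketch using the pykafka client (whose consumer parameters appear later in this post); the host, topic and offset values here are purely illustrative:

```python
from pykafka import KafkaClient

client = KafkaClient(hosts="kafka:9092")
topic = client.topics[b'blockchain']
consumer = topic.get_simple_consumer(consumer_timeout_ms=5000)

# Rewind partition 0 to an arbitrary earlier offset; consumption then
# continues forward from that point in the log.
consumer.reset_offsets([(topic.partitions[0], 42)])
for message in consumer:
    print(message.offset, message.value)
```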
Full source code is available here.
On startup, our Kafka consumer will try to do three things: initialize a new blockchain if one has not yet been created; build an internal representation of the current state of the blockchain topic; then begin reading transactions in a loop:
Start
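In outline, that flow might look something like this; read_and_validate_chain is named later in the post, while the other helper names are assumptions standing in for the steps sketched in the following sections:

```python
import sys

def main(partition):
    # 1. Publish a genesis block if the blockchain topic is still empty
    initialize_chain()

    # 2. Rebuild a local copy of the chain from the blockchain topic,
    #    remembering the offset of the latest valid block
    chain, latest_offset = read_and_validate_chain()

    # 3. Read transactions in a loop, mining and publishing a new block
    #    every few transactions
    consume_transactions(chain, latest_offset, partition)

if __name__ == '__main__':
    main(int(sys.argv[1]))  # partition number passed on the command line
```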
The initialization step looks like this:
Initialize the chain
First, we find the highest available offset on the blockchain topic. If nothing has ever been published to the topic, the blockchain is new, so we start by creating and publishing the genesis block:
The genesis block
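A sketch of roughly what this could look like, assuming pykafka and JSON-serialised blocks; the block field names and hashing scheme are assumptions, and the linked source may differ:

```python
import json
import time
from hashlib import sha256
from pykafka import KafkaClient

client = KafkaClient(hosts="kafka:9092")
blockchain_topic = client.topics[b'blockchain']

# The latest available offset is 0 when nothing has ever been published.
if blockchain_topic.latest_available_offsets()[0].offset[0] == 0:
    genesis = {
        'index': 0,
        'timestamp': time.time(),
        'transactions': [],
        'previous_hash': '0',  # there is no parent block to point at
    }
    genesis['hash'] = sha256(json.dumps(genesis, sort_keys=True).encode()).hexdigest()

    # Publish the genesis block so every node starts from the same root.
    with blockchain_topic.get_sync_producer() as producer:
        producer.produce(json.dumps(genesis).encode())
```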
In read_and_validate_chain(), we’ll first create a consumer to read from the blockchain topic:
Create the blockchain consumer
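Something along these lines, assuming the pykafka client (the parameter names below are pykafka’s):

```python
from pykafka import KafkaClient
from pykafka.common import OffsetType

client = KafkaClient(hosts="kafka:9092")  # broker address taken from the run instructions below
blockchain_topic = client.topics[b'blockchain']

consumer = blockchain_topic.get_simple_consumer(
    consumer_group=b'blockchain',
    auto_offset_reset=OffsetType.EARLIEST,
    reset_offset_on_start=True,
    auto_commit_enable=True,
    consumer_timeout_ms=5000,
)
```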
Some notes on the parameters we’re creating this consumer with:
- The blockchain group allows the broker to keep a reference of the offset the consumers have reached, for a given partition and topic
- auto_offset_reset=OffsetType.EARLIEST indicates that we’ll begin downloading messages from the start of the topic
- auto_commit_enable=True periodically notifies the broker of the offset we’ve just consumed (as opposed to manually committing)
- reset_offset_on_start=True is a switch which activates the auto_offset_reset for the consumer
- consumer_timeout_ms=5000 will trigger the consumer to return from the method after five seconds if no new messages are being read (we’ve reached the end of the chain)

Then we begin reading block messages from the blockchain topic:
Read blocks
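A sketch of that loop; the block field names and validation rules are assumptions:

```python
import json
from hashlib import sha256

def block_hash(block):
    # Hash of the block's contents, excluding the stored hash itself (assumed scheme).
    contents = {k: v for k, v in block.items() if k != 'hash'}
    return sha256(json.dumps(contents, sort_keys=True).encode()).hexdigest()

chain = []
latest_offset = 0

for message in consumer:  # the consumer created above; stops ~5s after the last message
    block = json.loads(message.value.decode())

    # Accept the block only if it points at the previous block's hash and its
    # own hash matches its contents; otherwise discard it.
    if not chain or (block['previous_hash'] == chain[-1]['hash']
                     and block['hash'] == block_hash(block)):
        chain.append(block)
        latest_offset = message.offset
```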
For each message we receive, we deserialize the block, check that it follows on correctly from the previous block, and keep track of its offset. At the end of this process, we’ll have downloaded the whole chain, discarding any invalid blocks, and we’ll have a reference to the offset of the latest block.
At this point, we’re ready to create a consumer on the transactions topic:
Create the transaction consumer
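Again assuming pykafka; the partition number is passed on the command line, and OffsetType.LATEST and partition pinning are explained below (the consumer group name here is an assumption):

```python
import sys
from pykafka import KafkaClient
from pykafka.common import OffsetType

client = KafkaClient(hosts="kafka:9092")
transactions_topic = client.topics[b'transactions']
partition = int(sys.argv[1])  # e.g. 0 or 1, as in the run instructions below

# Pin the consumer to one partition and start from the latest offset,
# so only transactions published from now on are read.
transaction_consumer = transactions_topic.get_simple_consumer(
    consumer_group=b'transactions',
    partitions=[transactions_topic.partitions[partition]],
    auto_offset_reset=OffsetType.LATEST,
    reset_offset_on_start=True,
    auto_commit_enable=True,
)
```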
Our example topic has been created with two partitions, to demonstrate how partitioning works in Kafka. The partitions are set up in the docker-compose.yml file, with this line:
KAFKA_CREATE_TOPICS=transactions:2:1,blockchain:1:1
transactions:2:1 specifies the topic name, the number of partitions and the replication factor (i.e. how many brokers will maintain a copy of the data on this partition).
This time, our consumer will start from OffsetType.LATEST, so we only get transactions published from the current time onwards.
By pinning the consumer to a specific partition of the transactions topic, we can increase the total throughput of all consumers on the topic. The Kafka broker will evenly distribute incoming messages across the two partitions of the transactions topic, unless we specify a partition when we publish to the topic. This means each consumer will be responsible for processing 50% of the messages, doubling the potential throughput of a single consumer.
Now we can begin consuming transactions:
Start consuming transactions
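A sketch of that loop; pending is an assumed name for the internal list, and chain is the local copy built by read_and_validate_chain:

```python
import json
import time

BLOCK_SIZE = 3  # three transactions per block, as described below
pending = []    # transactions waiting to be put into a block

for message in transaction_consumer:
    pending.append(json.loads(message.value.decode()))

    if len(pending) == BLOCK_SIZE:
        # Build a candidate block on top of our current chain tip and mine it;
        # mine() is sketched below, and publication follows after that.
        block = mine({
            'index': chain[-1]['index'] + 1,
            'timestamp': time.time(),
            'transactions': pending,
            'previous_hash': chain[-1]['hash'],
        })
        pending = []
```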
As transactions are received, we’ll add them to an internal list. Every three transactions, we’ll create a new block and call mine():
Mine a block
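One simple proof-of-work sketch, where the difficulty and the nonce field are assumptions rather than necessarily what the linked source does:

```python
import json
from hashlib import sha256

def mine(block, difficulty=2):
    # Assumed proof-of-work: bump a nonce until the block's hash starts with
    # `difficulty` leading zeros, then store the winning hash on the block.
    block['nonce'] = 0
    while True:
        candidate = sha256(json.dumps(block, sort_keys=True).encode()).hexdigest()
        if candidate.startswith('0' * difficulty):
            block['hash'] = candidate
            return block
        block['nonce'] += 1
```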
Once the block has been mined, we call read_and_validate_chain from before, this time supplying our latest known offset to retrieve only the newer blocks, i.e. any blocks other nodes may have published in the meantime.
Publish a block
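Publishing can then be as simple as serialising the block and producing it to the blockchain topic (a sketch, assuming a pykafka sync producer):

```python
import json

# blockchain_topic as created earlier; block is the newly mined block.
with blockchain_topic.get_sync_producer() as producer:
    producer.produce(json.dumps(block).encode())
```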
1. Start the Kafka stack:
docker-compose up -d
2. Run a consumer on partition 0:
python kafka_blockchain.py 0
3. Publish 3 transactions directly to partition 0:
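For example, a transaction could be pushed straight to partition 0 with kafkacat (the JSON payload here is purely illustrative):

echo '{"from": "alice", "to": "bob", "amount": 5}' | kafkacat -P -b kafka:9092 -t transactions -p 0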
4. Check the transactions were added to a block on the blockchain topic:
kafkacat -C -b kafka:9092 -t blockchain
You should see the blocks on the chain, ending with a new block containing the three transactions you just published.
To balance transactions across two consumers, start a second consumer on partition 1, and remove -p 0 from the publication script above.
Kafka can provide the foundation for a simple framework for blockchain experimentation. We can take advantage of features built into the platform, and associated tools like kafkacat, to experiment with distributed peer-to-peer transactions.
While scaling transactions in a public setting presents one set of issues, within a private or consortium network, where real-world trust is already established, transaction scaling might be achieved with an implementation that takes advantage of Kafka concepts.