Kafka & Redis Streams

Let’s talk about queue design.

We have a very long book which we would like many people to read. Some can read during their lunch hour, some read on Monday nights, others take it home for the weekend. The book is so long that at any point in time, we have hundreds of people reading it.

Readers need to keep track of where they are up to in our book, so they mark their location by putting a bookmark in it. Some readers read very slowly, leaving their bookmark close to the beginning. Other readers give up halfway, leaving theirs in the middle and never coming back to it.

To make matters even worse, we are adding pages to this book every day. Nobody can actually finish this book.

Eventually our book fills up with bookmarks, until finally one day it is too heavy to carry and nobody can read it any more.

A very clever person then decided that readers should not be allowed to place bookmarks inside the book, and must instead write down the page they are up to in their own diary.

This is the design of Apache Kafka, and it is a very resilient design. Readers are often not responsible citizens and often will not clean up after themselves, and the book may be the log of all the important events that happen in our company.

The common alternative design for other queues is for the queue service to keep track of where readers are up to, which means allocating memory per reader. Badly behaved readers may repeatedly request new queue sessions, and this can overwhelm the queue service. This often isn’t a good design, as we want readers to feel free to read without any risk to the queue.

Apache Kafka

Kafka is designed around a stream of events, such as:

1001. ‘tim’ has purchased travel deal ‘fiji’
1002. ‘tim’ has updated his subscription preference to ‘daily’
1003. ‘sam’ has logged in using ‘iphone’
1004. ‘sam’ has opened travel deal ‘bali’
1005. ‘sam’ has logged in using ‘desktop web’
1006. ‘sam’ has purchased deal ‘bali’

Kafka event readers keep track of the ID in the stream which they have read up to, meaning that the event server does not need to keep track of them. This allows the Kafka event server to maintain predictable memory use, even with many poorly behaved readers.
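To make the bookmark-in-the-diary idea concrete, here is a minimal in-memory sketch in Python. It is not how Kafka is implemented; the `EventLog` and `Reader` names are made up for illustration. The point it tries to show is that the log stores only events, while each reader holds its own position.

```python
class EventLog:
    """Append-only log. It stores events only, never reader positions."""

    def __init__(self):
        self._events = []

    def append(self, event):
        self._events.append(event)
        return len(self._events) - 1  # offset of the event just written

    def read(self, offset, count):
        """Return up to `count` events starting at `offset`."""
        return self._events[offset:offset + count]


class Reader:
    """A reader keeps its own bookmark (the 'diary'), not the log."""

    def __init__(self, log):
        self.log = log
        self.position = 0

    def poll(self, count=2):
        batch = self.log.read(self.position, count)
        self.position += len(batch)  # advance our own bookmark
        return batch


log = EventLog()
for event in [
    "tim purchased travel deal fiji",
    "tim updated subscription preference to daily",
    "sam logged in using iphone",
]:
    log.append(event)

fast, slow = Reader(log), Reader(log)
fast.poll(count=3)
slow.poll(count=1)
print(fast.position, slow.position)
```

However many readers come and go, the memory used by `log` depends only on the number of events written to it.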

Kafka sounds great, why Redis Streams?

Kafka is an excellent choice for storing a stream of events, and it is designed for high scale. Kafka takes on extra complexity in order to achieve this scale. Provisioning and managing a Kafka setup requires an understanding of some complex concepts. For smaller projects, a simpler and smaller system can be a better choice. Although we all like to solve Google-scale problems, this is very rarely needed.

Redis is one of the most common choices for a simple, usually non-persistent data store. It has great library support across all popular programming languages, and is well understood by most developers. It is an excellent example of trading off strong distributed resiliency for simplicity. Redis now supports a simpler version of the Kafka event stream concept, making the architectural concept easily and inexpensively accessible to everyone.

To get started with Redis Streams, I will introduce two new commands that are now in the ‘unstable’ branch of Redis. To follow along with this example, you can create an online Redis instance for free here in about 2 minutes. You can also pull the ‘unstable’ branch of Redis from here.

Writing to a Redis Stream

XADD stream_name * key1 value1 key2 value2 (etc)

XADD allows us to write a stream of events. Let’s create a stream of events reflecting the example above. We will name our stream ‘events’.

XADD events * user tim action purchase item travel:fiji
XADD events * user tim action preferences subscription daily
XADD events * user sam action login platform iPhone
XADD events * user sam action visit item travel:bali
XADD events * user sam action login platform "desktop web"
XADD events * user sam action purchase item travel:bali

The ‘*’ sits in the position of the entry ID and asks Redis to generate an ID for us. The key-value pairs that make up the event follow it.

This will write all these actions onto the ‘events’ stream.
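If you are sending these commands from code rather than from the CLI, it can help to build the argument list programmatically. The sketch below is in Python and `build_xadd_args` is a hypothetical helper, not part of any library. It assumes the third argument is the entry ID, with "*" asking Redis to generate one. With a client such as redis-py, the resulting list could presumably be passed to `client.execute_command(*args)`, but that step is left out here so the example runs without a server.

```python
def build_xadd_args(stream, fields, entry_id="*"):
    """Return the argument list for one XADD command.

    entry_id="*" asks Redis to generate the entry ID itself.
    """
    if not fields:
        raise ValueError("XADD needs at least one field/value pair")
    args = ["XADD", stream, entry_id]
    for key, value in fields.items():
        args.extend([key, str(value)])
    return args


# The six events from the example above, as field/value mappings.
events = [
    {"user": "tim", "action": "purchase", "item": "travel:fiji"},
    {"user": "tim", "action": "preferences", "subscription": "daily"},
    {"user": "sam", "action": "login", "platform": "iPhone"},
    {"user": "sam", "action": "visit", "item": "travel:bali"},
    {"user": "sam", "action": "login", "platform": "desktop web"},
    {"user": "sam", "action": "purchase", "item": "travel:bali"},
]

commands = [build_xadd_args("events", event) for event in events]
for command in commands:
    print(command)
```

Passing the arguments as a list also avoids quoting problems: "desktop web" stays a single value without any shell-style quotes.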

Reading from a Redis Stream

XREAD COUNT 2 STREAMS events item-id

XREAD lets us read items from this queue. Let’s read two items at a time:

demo.wiftycloud.com:6379> XREAD COUNT 2 STREAMS events 0

1. “events”
2. 1. 1. 1512991598699-0
    2. 1. “user”
      2. “tim”
      3. “action”
      4. “purchase”
      5. “item”
      6. “travel:fiji”
  2. 1. 1512991602438-0
    2. 1. “user”
      2. “tim”
      3. “action”
      4. “preferences”
      5. “subscription”
      6. “daily”

We are requesting items in the stream ‘events’, starting at the beginning of the list (by passing ‘0’). We requested only 2 items. To get the next two items, instead of beginning at ‘0’, we now begin at the last ID returned by Redis, which in the example above is ‘1512991602438-0’.

> XREAD COUNT 2 STREAMS events 1512991602438-0

1. “events”
2. 1. 1. 1512991605766-0
    2. 1. “user”
      2. “sam”
      3. “action”
      4. “login”
      5. “platform”
      6. “iPhone”
  2. 1. 1512991617871-0
    2. 1. “user”
      2. “sam”
      3. “action”
      4. “visit”
      5. “item”
      6. “travel:bali”

Other options are available — BLOCK allows us to have the Redis server wait on the connection until events are available.

Between these two commands, XADD and XREAD, we can very easily construct a queue of business events.

Node.js Example

Library support for Streams is still not quite ready; however, custom commands can be used in the meantime. An example of doing this using ioredis can be found here.

TL;DR

Kafka is amazing, and Redis Streams is on the way to becoming a great LoFi alternative to Kafka for managing a stream of events.