This article’s aim is to give you a very quick overview of how Kafka relates to queues, and why you would consider using it instead.
Kafka is a piece of technology originally developed by the folks at Linkedin. In a nutshell, it’s sort of like a message queueing system with a few twists that enable it to support pub/sub, scaling out over many servers, and replaying of messages.
These are all concerns when you want to adopt a reactive programming style over an imperative programming style.
The difference between imperative programming and reactive programming
Imperative programming is the type of programming we all start out with. Something happens, in other words an event occurs, and your code is notified of that event. For example, a user clicked a button and where you handle the event in your code, you decide what that action should mean to your system. You might save records to a DB, call another service, send an email, or a combination of all of these. The important bit here, is that the event is directly coupled to specific actions taking place.
Reactive programming enables you to respond to events that occur, often in the form of streams. Multiple concerns can subscribe to the same event and let the event have it’s effect in it’s domain, regardless of what happens in other domains. In other words, it allows for loosely coupled code that can easily be extended with more functionality. It’s possible that various big down-stream systems coded in different stacks are affected by an event, or even a whole bunch of serverless functions executing somewhere in the cloud.
From queues to Kafka
To understand what Kafka will bring to your architecture, let’s start by talking about message queues. We’ll start here, because we will talk about it’s limitations and then see how Kafka solves them.
A message queue allows a bunch of subscribers to pull a message, or a batch of messages, from the end of the queue. Queues usually allow for some level of transaction when pulling a message off, to ensure that the desired action was executed, before the message gets removed.
Not all queueing systems have the same functionality, but once a message has been processed, it gets removed from the queue. If you think about it, it’s very similar to imperative programming, something happened, and the originating system decided that a certain action should occur in a downstream system.
Even though you can scale out with multiple consumers on the queue, they will all contain the same functionality, and this is done just to handle load and process messages in parallel, in other words, it doesn’t allow you to kick off multiple independent actions based on the same event. All the processors of of the queue messages will execute the same type of logic in the same domain. This means that the messages in the queue are actually commands, which is suited towards imperative programming, and not an event, which is suited towards reactive programming.
With Kafka on the other hand, you publish messages/events to topics, and they get persisted. They don’t get removed when consumers receive them. This allows you to replay messages, but more importantly, it allows a multitude of consumers to process logic based on the same messages/events.
You can still scale out to get parallel processing in the same domain, but more importantly, you can add different types of consumers that execute different logic based on the same event. In other words, with Kafka, you can adopt a reactive pub/sub architecture.
This is possible with Kafka due to the fact that messages are retained and the concept of consumer groups. Consumer groups in Kafka identify themselves to Kafka when they ask for messages on a topic. Kafka will record which messages (offset) were delivered to which consumer group, so that it doesn’t serve it up again. Actually, it is a bit more complex than that, because you have a bunch of configuration options available to control this, but we don’t need to explore the options fully just to understand Kafka at a high level.
There is a bunch more to Kafka, for example how it manages scaling out (partitions), configuration options for reliable messaging, etc. But my hope is that this article was good enough to let you understand why you would consider adopting Kafka over good ‘ol message queues.