If you’re an architect or developer looking at event-driven architectures, stream processing might be just what you need to make your app faster, more scalable, and more decoupled.
In this article—the third in a series about event-driven architectures—we will review a little of the first article in the series, which outlined the benefits of event-driven architectures, some of the options, and a few patterns and anti-patterns. We will also review the second article, which provided more detail on message queues and deployed a quick-start message queue using Redis and RSMQ.
This article will also dive deeper into stream processing. We will discuss why you might pick stream processing as your architecture, some of the pros and cons, and a quick-to-deploy reference architecture using Apache Kafka.
Stream processing is a type of event-driven architecture. In event-driven architectures, when a component performs some piece of work that other components might be interested in, that component (called a producer) produces an event—a record of the performed action. Other components (called consumers) consume those events so that they can perform their own tasks as a result of the event.
This decoupling of consumers and producers gives event-driven architectures several benefits:
There are two main kinds of event-driven architectures: message queues and stream processing. Let's dive into the differences.
With message queues, the original event-driven architecture, the producer places a message into a queue targeted to a specific consumer. That message is held in the queue (often in first-in, first-out order) until the consumer retrieves it, at which time the message is deleted.
Message queues are useful for systems where you know exactly what needs to happen as a result of an event. When an issue occurs, your producer sends a message to the queue, targeted to some consumer(s). Those consumers obtain the message from the queue and then execute the next operation.
Once that next step is taken, the event is removed from the queue forever. In the case of message queues, the flow is generally known by the queue, giving rise to the term “smart broker/dumb consumer”, which means the broker (queue) knows where to send a message, and the consumer is just reacting.
With stream processing, messages are not targeted to a certain recipient, but rather are published at-large to a specific topic and available to all interested consumers. Any and all interested recipients can subscribe to that topic and read the message. Since the message must be available to all consumers, the message is not deleted when it is read from the stream.
Producers and brokers don’t need or want to know what will happen as a result of a message, or where that message will go. The producer just sends the message to broker, the broker publishes it, and the producer and broker move on. Interested consumers receive the message and complete their processing. Because of this further decoupling, systems with event streaming can evolve easily as the project evolves.
Consumers can be added and deleted and can change how and what they process, regardless of the overall system. The producer and the broker don’t need to know about these changes because the services are decoupled. This is often referred to as “dumb broker/smart consumer”—the broker (stream) is just a broker, and has no knowledge of routing. The consumers in message processing are the smart components; they are aware of what messages to listen for.
Also, consumers can retrieve multiple messages at the same time and since messages are not deleted, consumers can replay a series of messages going back in time. For example, a new consumer can go back and read older messages from before that consumer was deployed.
Stream processing has become the go-to choice for many event-driven systems. It offers several advantages over message queues including multiple consumers, replay of events, and sliding window statistics. Overall, you gain a major increase in flexibility.
Here are a several use cases for each:
Message Queues
Message queues, such as RabbitMQ and ActiveMQ are popular. Message queues are particularly helpful in systems where you have known or complex routing logic, or when you need to guarantee a single delivery of each message.
A typical use case for message queues is a busy ecommerce website where your services must be highly available, your requests must be delivered, and your routing logic is known and unlikely to change. With these constraints, message queues give you the powerful advantages of asynchronous communication and decoupled services, while keeping your architecture simple.
Additional use cases often involve system dependencies or constraints, such as a system having a frontend and backend written in different languages or a need to integrate into legacy infrastructure.
Stream Processing
Stream processing is useful for systems with more complex consumers of messages such as:
In our previous article, we deployed a quick-to-stand-up message queue to learn about queues. Let’s do a similar example stream processing.
There are many options for stream processing architectures, including the following:
We'll use the Apache Kafka reference architecture on Heroku. Heroku is a cloud platform-as a service (PaaS) that offers Kafka as an add-on. Their cloud platform makes it easy to deploy a streaming system rather than hosting or running your own. Since Heroku provides a Terraform script that deploys all the needed code and configuration for you in one step, it's a quick and easy way to learn about stream processing.
We won’t walk through the deployment steps here, as they are outlined in detail on the reference architecture page. However, it deploys an example ecommerce system that showcases the major components and advantages of stream processing. Clicks to browse or purchase products are recorded as events to Kafka.
Here is a key snippet of code from edm-relay, which sends messages to the Kafka stream. It's quite simple to publish events to Kafka since it's only a matter of calling the producer API to insert a JSON object.
app.post('/produceClickMessage', function (req, res) {
try {
const topic = `${process.env.KAFKA_PREFIX}${req.body.topic}`;
console.log(`topic: ${topic}`);
producer.produce(
topic,
null,
// Message to send. Must be a buffer
Buffer.from(JSON.stringify(req.body)),
// for keyed messages, we also specify the key - note that this field is optional
null,
// you can send a timestamp here. If your broker version supports it,
// it will get added. Otherwise, we default to 0
Date.now(),
);
} catch (err) {
console.error('A problem occurred when sending our message');
throw err;
}
res.status(200).send("{\"message\":\"Success!\"}")
});
A real-time dashboard then consumes the stream of click events and displays analytics. This could be useful for business analytics to explore the most popular products, changing trends, and so on.
Here is the code from edm-stream that subscribes to the topic:
.on('ready', (id, metadata) => {
consumer.subscribe(kafkaTopics);
consumer.consume();
consumer.on('error', err => {
console.log(`! Error in Kafka consumer: ${err.stack}`);
});
console.log('Kafka consumer ready.' + JSON.stringify(metadata));
clearTimeout(connectTimoutId);
})
and then consumes the message from the stream by calling an event handler for each message:
.on('data', function(data) {
const message = data.value.toString()
console.log(message, `Offset: ${data.offset}`, `partition: ${data.partition}`, `consumerId: edm/${process.env.DYNO || 'localhost'}`);
socket.sockets.emit('event', message);
consumer.commitMessage(data);
})
The reference architecture is not just about buying coffee; it's a starting point for any web app where you want to track clicks and report in a real-time dashboard. It's open source, so feel free to experiment and modify it according to your own needs.
Stream processing not only decouples your components so that they are easy to build, test, deploy, and scale independently, but also adds yet another layer of decoupling by creating a “dumb” broker between your components.
If you haven’t already, read our other articles in this series on the advantages of event-driven architecture and deploying a sample message queue using Redis and RSMQ.