This is a 1.0 story that I editedThanks to Tom de Ruijter, Steven Reitsma and Laurens Koppenol for proof reading this post. While playing Factorio the other day, I was struck by the many similarities with Apache Kafka. If you aren’t familiar with them: Factorio is an open-world RTS where you build and optimize supply chains in order to launch a satellite and restore communications with your home planet, and Kafka is a distributed streaming platform, which handles asynchronous communication in a durable way. I wonder how far we can take the analogy between Factorio and Kafka before it starts to break down. Let’s start from scratch, explore the core Kafka concepts through Factorio visualizations, and have some fun along the way. Why bother with async messaging? Let’s say we have three microservices. One for mining iron ore, one for smelting iron ore into iron plates, and one for producing iron gear wheels from these plates. We can chain these services with synchronous HTTP calls. Whenever our mining drill has new iron ore, it does a POST call on the smelting furnace, which in turn POSTs to the factory. From left to right: mining, smelting and producing — tightly coupled via synchronous communication This setup served us well, until there was a power outage in the factory. The furnace’s HTTP calls failed, causing the mining drill’s calls to fail as well. We can implement circuit breakers and retries to prevent cascading failures and message loss, but at some point we’ll have to stop trying, or we’ll run out of memory. Power outage at the factory If only there was a way to decouple these microservices... This is, of course, where Kafka comes in. With Kafka, you can store streams of records in a fault-tolerant and durable way. In Kafka terminology, these streams are called topics. Microservices decoupled by asynchronous messaging With asynchronous topics between services, messages, or records, are buffered during peak loads, and when there is an outage. These buffers obviously have limited capacity, so let’s talk about scalability. We can increase storage capacity and throughput by adding Kafka servers to the cluster. Another way is to increase disk size (for storage), or CPU and network speed (for throughput). Which of these options give you the best value for money is use-case specific, but buying bigger servers — unlike buying more servers — is subject to the law of diminishing returns. Kafka’s capacity scales linearly with each node added, so that’s usually the way to go. Vertical scaling — a bigger, exponentially more expensive server Horizontal scaling — distribute the load over more servers In order to divide a topic between multiple servers, we need a way to split a topic into smaller substreams. These substreams are called partitions. Whenever a service produces a new record, this service gets to decide which partition the record should land on. https://imgur.com/a/7lPAdVA A wagon producing records, a partitioner that puts messages on the right partition, and a topic with four partitions. The default partitioner hashes the message key and modulos that over the number of partitions: return Utils.toPositive(Utils.murmur2(keyBytes)) % numPartitions;That way messages with the same key always end up on the same partition. Note that messages are only guaranteed to be ordered within the context of a producer and partition. Records from multiple producers, or from a single producer on multiple partitions, can interleave. Now that we know how messages are put onto topics, let’s see how they are consumed. When you start listening to a topic, by default the records from all partitions are routed to you. It’s common though, to have multiple instances of a microservice running at the same time to achieve higher throughput or availability. If they all start listening to the topic, each record gets processed by each instance, which is usually not what you want. All microservice instances consume all messages Consumer groups allow you to evenly divide the partitions among multiple consumers. When a microservice instance joins the consumer group, Kafka will reassign some of the partitions to it. Likewise, when an instance crashes, or leaves the group for another reason, its partitions will be assigned to other instances. Kafka makes sure the partitions are always evenly divided among the consumers in each group. A single consumer group with three consumers If there’s a topic where the number of records per partition are skewed, you might be in trouble. An instance might not be able to keep up, because it was assigned the partition with many records, while other instances are idle. It’s up to you to make sure that there are no partitions that have vastly more records than others. Messages are piling up on a hot partition Each consumer keeps track of which records it has processed. Since records are processed in order, a simple offset is enough. Every once in a while (5 seconds by default), a consumer will commit its offset to Kafka. When a consumer leaves its group, its partitions are given to other consumer in the group. The new consumers will be able to start requesting records starting at the offset where the previous consumer stopped. It is possible that a record was processed, but not yet committed. You’ll either have to start at the committed offset, or start processing new messages and skip everything that’s not yet processed. This is why Kafka can only guarantee that messages are delivered at least once, or at most once. The analogy no longer really makes sense when we start duplicating data. With Kafka, we can process a single record multiple times. Multiple consumer groups can consume the same records. Topics can be stored with a replication factor of three for reliability. Topics can have a retention period after which records are deleted. All this is possible because data, unlike iron, can be duplicated easily. This is a good place to end this post. We’ve covered all the major concepts of Kafka and you should have a general understanding of how Kafka works. Let’s wrap up with a short recap. What we learned Kafka is a distributed streaming platform that stores records in a durable way through replicating records across multiple servers. Topics consist of partitions, that store records in order. Partitioners decide which records belong on which partitions. Consumer groups are optional, and help distribute partitions among consumers for scalability. Offsets are committed as checkpoints for when consumers crash. And that, in a nutshell, is how Kafka works. About the author Ruurtjan Pul (Twitter: @ruurtjan) is a data engineer at BigData Republic, a data science consultancy company in the Netherlands. We hire the best of the best in BigData Science and BigData Engineering. If you are interested in using Kafka and other data engineering tools on practical use cases, feel free to contact us at info@bigdatarepublic.nl. Learn more Apache KafkaFactorioAchieving stack and heap safety in recursive functionsFeature importance - what's in a name? This is a 1.0 story that I edited Thanks to Tom de Ruijter, Steven Reitsma and Laurens Koppenol for proof reading this post. Thanks to Tom de Ruijter, Tom de Ruijter, Steven Reitsma Steven Reitsma and Laurens Koppenol Laurens Koppenol for proof reading this post. While playing Factorio the other day, I was struck by the many similarities with Apache Kafka . If you aren’t familiar with them: Factorio is an open-world RTS where you build and optimize supply chains in order to launch a satellite and restore communications with your home planet, and Kafka is a distributed streaming platform, which handles asynchronous communication in a durable way. Factorio Apache Kafka I wonder how far we can take the analogy between Factorio and Kafka before it starts to break down. Let’s start from scratch, explore the core Kafka concepts through Factorio visualizations, and have some fun along the way. Why bother with async messaging? Let’s say we have three microservices. One for mining iron ore, one for smelting iron ore into iron plates, and one for producing iron gear wheels from these plates. We can chain these services with synchronous HTTP calls. Whenever our mining drill has new iron ore, it does a POST call on the smelting furnace, which in turn POSTs to the factory. From left to right: mining, smelting and producing — tightly coupled via synchronous communication From left to right: mining, smelting and producing — tightly coupled via synchronous communication This setup served us well, until there was a power outage in the factory. The furnace’s HTTP calls failed, causing the mining drill’s calls to fail as well. We can implement circuit breakers and retries to prevent cascading failures and message loss, but at some point we’ll have to stop trying, or we’ll run out of memory. circuit breakers Power outage at the factory Power outage at the factory If only there was a way to decouple these microservices... This is, of course, where Kafka comes in. With Kafka, you can store streams of records in a fault-tolerant and durable way. In Kafka terminology, these streams are called topics . topics Microservices decoupled by asynchronous messaging Microservices decoupled by asynchronous messaging With asynchronous topics between services, messages, or records , are buffered during peak loads, and when there is an outage. These buffers obviously have limited capacity, so let’s talk about scalability. records We can increase storage capacity and throughput by adding Kafka servers to the cluster. Another way is to increase disk size (for storage), or CPU and network speed (for throughput). Which of these options give you the best value for money is use-case specific, but buying bigger servers — unlike buying more servers — is subject to the law of diminishing returns . Kafka’s capacity scales linearly with each node added, so that’s usually the way to go. bigger more the law of diminishing returns Vertical scaling — a bigger, exponentially more expensive server Vertical scaling — a bigger, exponentially more expensive server Horizontal scaling — distribute the load over more servers Horizontal scaling — distribute the load over more servers In order to divide a topic between multiple servers, we need a way to split a topic into smaller substreams. These substreams are called partitions. Whenever a service produces a new record, this service gets to decide which partition the record should land on. partitions. produces https://imgur.com/a/7lPAdVA https://imgur.com/a/7lPAdVA A wagon producing records, a partitioner that puts messages on the right partition, and a topic with four partitions. A wagon producing records, a partitioner that puts messages on the right partition, and a topic with four partitions. The default partitioner hashes the message key and modulos that over the number of partitions: default partitioner partitioner return Utils.toPositive(Utils.murmur2(keyBytes)) % numPartitions; That way messages with the same key always end up on the same partition. return Utils.toPositive(Utils.murmur2(keyBytes)) % numPartitions; return Utils.toPositive(Utils.murmur2(keyBytes)) % numPartitions; Note that messages are only guaranteed to be ordered within the context of a producer and partition. Records from multiple producers, or from a single producer on multiple partitions, can interleave. Now that we know how messages are put onto topics, let’s see how they are consumed . When you start listening to a topic, by default the records from all partitions are routed to you. It’s common though, to have multiple instances of a microservice running at the same time to achieve higher throughput or availability. If they all start listening to the topic, each record gets processed by each instance, which is usually not what you want. consumed All microservice instances consume all messages All microservice instances consume all messages Consumer groups allow you to evenly divide the partitions among multiple consumers. When a microservice instance joins the consumer group, Kafka will reassign some of the partitions to it. Likewise, when an instance crashes, or leaves the group for another reason, its partitions will be assigned to other instances. Kafka makes sure the partitions are always evenly divided among the consumers in each group. Consumer groups A single consumer group with three consumers A single consumer group with three consumers If there’s a topic where the number of records per partition are skewed, you might be in trouble. An instance might not be able to keep up, because it was assigned the partition with many records, while other instances are idle. It’s up to you to make sure that there are no partitions that have vastly more records than others. Messages are piling up on a hot partition Messages are piling up on a hot partition Each consumer keeps track of which records it has processed. Since records are processed in order, a simple offset is enough. Every once in a while (5 seconds by default), a consumer will commit its offset to Kafka . offset commit . When a consumer leaves its group, its partitions are given to other consumer in the group. The new consumers will be able to start requesting records starting at the offset where the previous consumer stopped. It is possible that a record was processed, but not yet committed. You’ll either have to start at the committed offset, or start processing new messages and skip everything that’s not yet processed. This is why Kafka can only guarantee that messages are delivered at least once, or at most once. The analogy no longer really makes sense when we start duplicating data. With Kafka, we can process a single record multiple times. Multiple consumer groups can consume the same records. Topics can be stored with a replication factor of three for reliability. Topics can have a retention period after which records are deleted. All this is possible because data, unlike iron, can be duplicated easily. replication factor retention period This is a good place to end this post. We’ve covered all the major concepts of Kafka and you should have a general understanding of how Kafka works. Let’s wrap up with a short recap. What we learned Kafka is a distributed streaming platform that stores records in a durable way through replicating records across multiple servers. Topics consist of partitions, that store records in order. Partitioners decide which records belong on which partitions. Consumer groups are optional, and help distribute partitions among consumers for scalability. Offsets are committed as checkpoints for when consumers crash. And that, in a nutshell, is how Kafka works. About the author Ruurtjan Pul ( Twitter: @ruurtjan ) is a data engineer at BigData Republic , a data science consultancy company in the Netherlands. We hire the best of the best in BigData Science and BigData Engineering. If you are interested in using Kafka and other data engineering tools on practical use cases, feel free to contact us at info@bigdatarepublic.nl . Ruurtjan Pul Twitter: @ruurtjan BigData Republic BigData Republic info@bigdatarepublic.nl info@bigdatarepublic.nl Learn more Apache Kafka Factorio Achieving stack and heap safety in recursive functions Feature importance - what's in a name? Apache Kafka Apache Kafka Factorio Factorio Achieving stack and heap safety in recursive functions Achieving stack and heap safety in recursive functions Feature importance - what's in a name? Feature importance - what's in a name?

Understanding Kafka with Factorio

About Author

Comments

TOPICS

THIS ARTICLE WAS FEATURED IN

Related Stories

"DNS Propagation" Does Not Exist: A Suggested Change In Terminology

The Noonification: If You Cant Beat Em, Buy Em! (10/18/2023)

The Noonification: The Conversational AI Arms Race Has Begun (2/9/2023)

3 Top Resources To Learn About Apache Kafka

47 Stories To Learn About Kafka

5 Things Every Apache Kafka Dev Needs To Know: A Performance and Architectural Deep Dive

"DNS Propagation" Does Not Exist: A Suggested Change In Terminology

The Noonification: If You Cant Beat Em, Buy Em! (10/18/2023)

The Noonification: The Conversational AI Arms Race Has Begun (2/9/2023)

3 Top Resources To Learn About Apache Kafka

47 Stories To Learn About Kafka

5 Things Every Apache Kafka Dev Needs To Know: A Performance and Architectural Deep Dive

Light-Mode

Classic

Newspaper

Minty

Dark-Mode

Neon Noir

Minty

HN StartUps