Thanks to Tom de Ruijter, Steven Reitsma and Laurens Koppenol for proof reading this post. While playing the other day, I was struck by the many similarities with . If you aren’t familiar with them: Factorio is an open-world RTS where you build and optimize supply chains in order to launch a satellite and restore communications with your home planet, and Kafka is a distributed streaming platform, which handles asynchronous communication in a durable way. Factorio Apache Kafka I wonder how far we can take the analogy between Factorio and Kafka before it starts to break down. Let’s start from scratch, explore the core Kafka concepts through Factorio visualizations, and have some fun along the way. Why bother with async messaging? Let’s say we have three microservices. One for mining iron ore, one for smelting iron ore into iron plates, and one for producing iron gear wheels from these plates. We can chain these services with synchronous HTTP calls. Whenever our mining drill has new iron ore, it does a POST call on the smelting furnace, which in turn POSTs to the factory. From left to right: mining, smelting and producing — tightly coupled via synchronous communication This setup served us well, until there was a power outage in the factory. The furnace’s HTTP calls failed, causing the mining drill’s calls to fail as well. We can implement and retries to prevent cascading failures and message loss, but at some point we’ll have to stop trying, or we’ll run out of memory. circuit breakers Power outage at the factory If only there was a way to decouple these microservices... This is, of course, where Kafka comes in. With Kafka, you can store streams of records in a fault-tolerant and durable way. In Kafka terminology, these streams are called . topics Microservices decoupled by asynchronous messaging With asynchronous topics between services, messages, or , are buffered during peak loads, and when there is an outage. These buffers obviously have limited capacity, so let’s talk about scalability. records We can increase storage capacity and throughput by adding Kafka servers to the cluster. Another way is to increase disk size (for storage), or CPU and network speed (for throughput). Which of these options give you the best value for money is use-case specific, but buying servers — unlike buying servers — is subject to . Kafka’s capacity scales linearly with each node added, so that’s usually the way to go. bigger more the law of diminishing returns Vertical scaling — a bigger, exponentially more expensive server Horizontal scaling — distribute the load over more servers To divide a topic between multiple servers, we need a way to split a topic into smaller substreams. These substreams are called Whenever a service a new record, this service gets to decide which partition the record should land on. partitions. produces A wagon producing records, a partitioner that puts messages on the right partition, and a topic with four partitions. The hashes the message key and modulos that over the number of partitions: default partitioner <a href="https://medium.com/media/1fd7ab0d0c92a4b751ae414957285d73/href">https://medium.com/media/1fd7ab0d0c92a4b751ae414957285d73/href</a> That way messages with the same key always end up on the same partition. Note that messages are only guaranteed to be ordered within the context of a producer and partition. Records from multiple producers, or from a single producer on multiple partitions, can interleave. Now that we know how messages are put onto topics, let’s see how they are . When you start listening to a topic, by default the records from all partitions are routed to you. It’s common though, to have multiple instances of a microservice running at the same time to achieve higher throughput or availability. If they all start listening to the topic, each record gets processed by each instance, which is usually not what you want. consumed All microservice instances consume all messages allow you to evenly divide the partitions among multiple consumers. When a microservice instance joins the consumer group, Kafka will reassign some of the partitions to it. Likewise, when an instance crashes, or leaves the group for another reason, its partitions will be assigned to other instances. Kafka makes sure the partitions are always evenly divided among the consumers in each group. Consumer groups A single consumer group with three consumers If there’s a topic where the number of records per partition are skewed, you might be in trouble. An instance might not be able to keep up, because it was assigned the partition with many records, while other instances are idle. It’s up to you to make sure that there are no partitions that have vastly more records than others. Messages are piling up on a hot partition Each consumer keeps track of which records it has processed. Since records are processed in order, a simple is enough. Every once in a while (5 seconds by default), a consumer will its offset to Kafka_._ offset commit When a consumer leaves its group, its partitions are given to other consumer in the group. The new consumers will be able to start requesting records starting at the offset where the previous consumer stopped. It is possible that a record was processed, but not yet committed. You’ll either have to start at the committed offset, or start processing new messages and skip everything that’s not yet processed. This is why Kafka can only guarantee that messages are delivered at least once, or at most once. The analogy no longer really makes sense when we start duplicating data. With Kafka, we can process a single record multiple times. Multiple consumer groups can consume the same records. Topics can be stored with a of three for reliability. Topics can have a after which records are deleted. All this is possible because data, unlike iron, can be duplicated easily. replication factor retention period This is a good place to end this post. We’ve covered all the major concepts of Kafka and you should have a general understanding of how Kafka works. Let’s wrap up with a short recap. What we learned Kafka is a distributed streaming platform that stores records in a durable way through replicating records across multiple servers. Topics consist of partitions, that store records in order. Partitioners decide which records belong on which partitions. Consumer groups are optional, and help distribute partitions among consumers for scalability. Offsets are committed as checkpoints for when consumers crash. And that, in a nutshell, is how Kafka works. About the author ( ) is a data engineer at , a data science consultancy company in the Netherlands. We hire the best of the best in BigData Science and BigData Engineering. If you are interested in using Kafka and other data engineering tools on practical use cases, feel free to contact us at . Ruurtjan Pul Twitter: @ruurtjan BigData Republic info@bigdatarepublic.nl Learn more Apache Kafka Factorio Achieving stack and heap safety in recursive functions Feature importance — what’s in a name? <a href="https://medium.com/media/3c851dac986ab6dbb2d1aaa91205a8eb/href">https://medium.com/media/3c851dac986ab6dbb2d1aaa91205a8eb/href</a>

Chain

Apache

Twitter

Understanding Kafka with Factorio

About Author

Comments

TOPICS

THIS ARTICLE WAS FEATURED IN

Related Stories

"DNS Propagation" Does Not Exist: A Suggested Change In Terminology

The Noonification: If You Cant Beat Em, Buy Em! (10/18/2023)

The Noonification: The Conversational AI Arms Race Has Begun (2/9/2023)

3 Top Resources To Learn About Apache Kafka

47 Stories To Learn About Kafka

5 Things Every Apache Kafka Dev Needs To Know: A Performance and Architectural Deep Dive

"DNS Propagation" Does Not Exist: A Suggested Change In Terminology

The Noonification: If You Cant Beat Em, Buy Em! (10/18/2023)

The Noonification: The Conversational AI Arms Race Has Begun (2/9/2023)

3 Top Resources To Learn About Apache Kafka

47 Stories To Learn About Kafka

5 Things Every Apache Kafka Dev Needs To Know: A Performance and Architectural Deep Dive

Light-Mode

Classic

Newspaper

Minty

Dark-Mode

Neon Noir

Minty

HN StartUps