I sat down with some of my co-workers at Confluent to ask what the most pertinent Apache Kafka interview questions were for this year. A wide variety of questions surfaced, from the open-ended basics to advanced challenges.
Then, I took notes and organized the answers.
While the questions asked in a real-life scenario would differ widely, depending on whether the role was on a data team or engineering team and whether the candidate was junior or senior, I did my best to emulate the questions by providing answers for different levels.
This post focuses on the questions that surfaced for junior-level applicants. If you’re a junior-level developer who’s looking to level up their Kafka knowledge for an interview, I hope you find this helpful!
Apache Kafka® is an open-source distributed event store and stream processing platform.
An event is a key/value pair representing a thing that happened – it could be a webpage click, a rideshare app request, or a thermostat adjustment.
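To make that concrete, here is a minimal sketch of an event as a key/value pair. The field names and values are illustrative, not a real Kafka API:

```python
# Illustrative only: a Kafka-style event modeled as a plain key/value pair.
# The key identifies what the event is about; the value describes what happened.
thermostat_event = {
    "key": "thermostat-42",  # hypothetical device ID
    "value": {"action": "set_temperature", "celsius": 21.5},
}
```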
A topic stores events. Its underlying data structure is a log: an append-only, immutable sequence of records. Unlike a queue, reading an event does not remove it from the log.
Topics are broken up into partitions. Each partition can live on a different node in a cluster of servers, which enables low latency and high throughput.
Each partition can be replicated across brokers; a replication factor of 3 is the common production setting, and it makes partitions highly resilient.
A broker is a server that is responsible for message storage as well as metadata storage. For example, brokers store consumers' committed offsets so that consumers can pick up where they left off.
A producer writes messages to a Kafka topic, and a consumer reads those messages. Producers are responsible for assigning events to partitions as well as for data compression.
An offset is the logical position of a record in a topic’s partition, assigned by the broker when storing the record. Consumer clients keep track of offsets so they can pick up reading from the last successfully processed one when resuming work from being offline.
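A small simulation can show why tracked offsets matter. This is not the real consumer API, just a sketch of the bookkeeping: the consumer commits the offset it has read up to, then resumes from that offset after a restart.

```python
# Simulation of offset tracking (not the real Kafka client API).
partition_log = ["event-0", "event-1", "event-2", "event-3", "event-4"]

committed_offset = 0  # next offset to read, as a consumer group would store it

def poll(log, offset, max_records=2):
    """Return up to max_records starting at `offset`, plus the new offset."""
    batch = log[offset:offset + max_records]
    return batch, offset + len(batch)

batch, committed_offset = poll(partition_log, committed_offset)
# ... the consumer goes offline and comes back; it resumes from the
# committed offset instead of re-reading the whole partition ...
batch, committed_offset = poll(partition_log, committed_offset)
print(batch)  # ['event-2', 'event-3']
```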
When consumers share the same Group ID, they belong to a consumer group. This group splits the work of message consumption. Consequently, two consumers in the same group cannot read from the same partition. If you want two consumers to each receive every message from a topic, they must be in separate groups.
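The idea of splitting work can be sketched as a simple partition assignment. In real Kafka a group coordinator handles this; the round-robin function below just mirrors the key property that no two members of one group share a partition:

```python
# Illustrative consumer-group assignment (the real protocol is coordinated
# by a broker-side group coordinator; this just shows the end result).
def assign(partitions, members):
    assignment = {m: [] for m in members}
    for i, p in enumerate(partitions):
        assignment[members[i % len(members)]].append(p)
    return assignment

assignment = assign(partitions=[0, 1, 2, 3, 4, 5],
                    members=["consumer-a", "consumer-b"])
print(assignment)  # {'consumer-a': [0, 2, 4], 'consumer-b': [1, 3, 5]}
```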
When a broker receives a message from a producer, it can send back an acknowledgment. You can configure the producer's acks setting depending on whether you want it to wait for this acknowledgment, from the leader only or from all in-sync replicas, before considering a send successful.
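As a sketch, these acknowledgment settings live in the producer configuration. The config keys below are the standard Kafka producer names; the broker address is a placeholder:

```python
# Producer configuration fragment showing the acks setting.
producer_config = {
    "bootstrap.servers": "localhost:9092",  # placeholder broker address
    # "acks" controls how many acknowledgments the producer waits for:
    #   "0"   -> don't wait at all (fastest, messages may be lost)
    #   "1"   -> wait for the partition leader only
    #   "all" -> wait for all in-sync replicas (strongest durability)
    "acks": "all",
}
```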
Kafka is used for building real-time data pipelines. At LinkedIn, where it was invented, it was used to track site user activity. It’s also used for metrics, stream processing, event sourcing, and as a commit log.
With a replication factor of 3, each partition is stored on three different brokers. That means if one broker goes down, there will still be two replicas available.
Kafka Connect itself is a framework that makes it easier to write the code you need to connect to data sources and sinks.
A Kafka connector, on the other hand, is a ready-made component built on that framework, which removes the hassle and boilerplate code you'd otherwise have to write to integrate with a specific data source or sink.
We know topics are split into multiple partitions and that events are sent to topics. How are events assigned to partitions? That depends on the partitioning strategy.
The partitioning strategy depends on whether the events being produced have keys.
If there’s no key, the events will be distributed among the different partitions round-robin style, so two events produced one after another may land in different partitions, and there’s no reason to expect consecutive events to end up in order in the same partition.
Kafka handles events with keys by computing their destination from a hash of the key. This ensures that events with the same key end up in the same partition.
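The hashing idea can be shown in a few lines. Kafka's default partitioner actually uses a murmur2 hash; here an MD5 digest stands in purely to illustrate the property that the same key always maps to the same partition:

```python
# Simplified keyed partitioning (Kafka really uses murmur2, not MD5).
import hashlib

def partition_for(key: str, num_partitions: int) -> int:
    digest = hashlib.md5(key.encode()).digest()
    return int.from_bytes(digest[:4], "big") % num_partitions

# Same key, same partition -> per-key ordering is preserved.
assert partition_for("user-123", 6) == partition_for("user-123", 6)
```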
So if your underlying logic is sound, say, you’re producing events that come from the same user and using that user’s ID as the key, it’s safe to assume those events arrive in their partition in order.
If you’re producing events keyed by two different users, the keys may hash to different partitions, so ordering across those events is not guaranteed.
Did you find this helpful? Were there questions that I missed? Comment below, and let me know! I’ll be continuing this as part of a series of interview questions, so please let me know if there are other questions you’d like to see featured.
If you’d like to solidify your understanding of Kafka, you can take this introductory course: Kafka101