Senior Software Engineer
A few months ago, I wrote about creating your own sink connector after we started using ours. Surprisingly, we replaced it with Kafka Consumers last week. In this short article, I am going to review our experience and lay out the advantages and disadvantages of both technologies.
To tell the truth, we were quite satisfied with our custom Elastic Sink Connector; it handled our traffic (up to 7,000 product records per second, each about 1,000 lines, on 4 Docker containers) and we never had a single performance failure. However, debugging and testing are A LOT more difficult for custom Kafka Connectors!
You cannot easily write integration tests, because the connector is a kind of plugin in your Kafka stack. You need to mock internal Kafka classes, create anti-corruption layers, spend a lot of time on unit tests and… Still, you don’t feel satisfied as there are no end-to-end or integration tests.
Assume that you have a bug in production and you can’t figure it out because your tests are incomplete for the reasons above. OK, you want to debug the connector locally. Unfortunately, “debugging” is quite a luxury word for Kafka Connectors. If you need to put a breakpoint somewhere, you need to do A LOT of “wtf… really?” stuff mentioned here. Creating your own environment? You can attach your connector to Landoop’s fast-data-dev stack, but again, that is not debugging; you have just cloned the problem and are trying to solve it in a more controllable infrastructure.
For these reasons, we replaced our custom Sink Connector with a regular Kafka Consumer, and now maintaining it is a piece of cake. (At least for now…)
We were worried about the performance, but the test results impressed us: for our workload, the two are almost identical. A difference might show up under much heavier traffic, but in that case you can simply scale out, which is the best advantage of Kafka.
Debugging and testing Kafka Consumers is quite easy, just like a regular API. If you decide to move to consumers, you can write them in many programming languages. We first thought about writing ours in Python or C#, but our final choice was Java, because we could quickly copy over the classes from our Kafka Connect project. (Custom Kafka Connectors have to be written in Java or another JVM language.)
I want to emphasize the main point here: Committing the offsets.
In a custom Kafka Connector, the framework commits the offsets automatically as long as the put method does not throw an exception.
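That rule can be modelled in a few lines of plain Java. This is only a toy sketch, not Connect's actual code: the Sink interface and the deliverOnce method are hypothetical names made up for the illustration, and real Connect commits offsets periodically rather than after every batch.

```java
import java.util.ArrayList;
import java.util.List;

class ConnectCommitModel {
    /** Hypothetical stand-in for a sink task: throwing from put signals failure. */
    interface Sink {
        void put(List<String> batch);
    }

    /**
     * Hands the next batch to the sink and returns the new committed offset.
     * The offset only moves forward when put returns without an exception.
     */
    static int deliverOnce(List<String> topic, int committed, int batchSize, Sink sink) {
        int end = Math.min(committed + batchSize, topic.size());
        List<String> batch = topic.subList(committed, end);
        try {
            sink.put(batch);
            return end;        // success: everything up to `end` may be committed
        } catch (RuntimeException e) {
            return committed;  // failure: nothing is committed, the batch comes again
        }
    }

    public static void main(String[] args) {
        List<String> topic = List.of("p1", "p2", "p3", "p4");
        List<String> indexed = new ArrayList<>();
        int committed = 0;

        // First attempt: the target system is down, so put throws.
        committed = deliverOnce(topic, committed, 2, batch -> {
            throw new IllegalStateException("target system is down");
        });
        System.out.println("after failure: " + committed);

        // Second attempt: the same batch is delivered again and succeeds.
        committed = deliverOnce(topic, committed, 2, indexed::addAll);
        System.out.println("after success: " + committed + " " + indexed);
    }
}
```

The point of the model is that the connector author never calls commit: returning normally from put is the success signal, and throwing is the failure signal.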
In Kafka Consumers, you can use auto-commit again, but this time keep in mind that the client commits the offsets of the messages it has handed to you on a timer, whether or not your code has finished processing them. What I’m trying to say is: if your code throws an exception before transferring the data and the consumer keeps polling, that message is lost.
Instead, you can commit manually, which makes it really easy to imitate Connect’s behaviour: just commit after you complete your job :)
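To make the difference concrete, here is a small simulation in plain Java. It is a sketch under stated assumptions and does not use the Kafka client: the topic is just a list, CommitOrderDemo and run are hypothetical names, and a failure is simulated as one crash at a chosen offset followed by a restart from the last committed offset.

```java
import java.util.ArrayList;
import java.util.List;

class CommitOrderDemo {
    /**
     * Consumes the whole topic and returns the messages that reached the target.
     * Processing fails once at failAtOffset; the consumer then resumes from
     * whatever offset was committed at that moment.
     */
    static List<String> run(List<String> topic, int failAtOffset, boolean commitBeforeProcessing) {
        int committed = 0;
        boolean failedOnce = false;
        List<String> delivered = new ArrayList<>();

        while (committed < topic.size()) {
            int offset = committed;
            String message = topic.get(offset);

            if (commitBeforeProcessing) {
                committed = offset + 1;   // offset moves on as soon as we receive
            }
            if (offset == failAtOffset && !failedOnce) {
                failedOnce = true;
                continue;                 // simulated crash: restart from `committed`
            }
            delivered.add(message);       // the "job": transfer to the target system
            if (!commitBeforeProcessing) {
                committed = offset + 1;   // commit only after the job is complete
            }
        }
        return delivered;
    }

    public static void main(String[] args) {
        List<String> topic = List.of("a", "b", "c");
        System.out.println("commit on receive: " + run(topic, 1, true));   // [a, c]
        System.out.println("commit after job:  " + run(topic, 1, false));  // [a, b, c]
    }
}
```

With the real Java client, the second ordering corresponds to setting enable.auto.commit to false and calling commitSync() on the consumer once the records returned by poll() have been transferred. Note that this gives at-least-once delivery: after a crash the same message can arrive again, so the write to the target system should tolerate duplicates.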
So, should you stay away from Kafka Connect? No! If you simply need to transfer your topic data to another system, or vice versa, and there is a community or Confluent supported Kafka Connector, use it! A little configuration and boom, it’s ready. However, if you must write your own Kafka Connector for some reason, please keep in mind that testing and debugging it is hard.
Thanks for reading,
Sometimes I tweet a useful piece of information: @_skynyrd