What The Heck is WarpStream?

Written by progrockrec | Published 2024/05/20

TL;DR: WarpStream is an Apache Kafka protocol-compatible data streaming platform built directly on top of object storage such as S3. It's delivered as a single, stateless Go binary, so there are no local disks to manage, no brokers to rebalance, and no ZooKeeper to operate.

In the summer of 2023, a cheeky blog post titled "Kafka is dead, long live Kafka" came across my feed. It was written by a company I hadn't heard of yet, WarpStream. It's a long post, but I couldn't stop reading it; what it described was incredibly interesting. What WarpStream had proposed and built is an Apache Kafka® protocol-compatible data streaming platform built directly on top of S3. It's delivered as a single, stateless Go binary, so there are no local disks to manage, no brokers to rebalance, and no ZooKeeper to operate. If you aren't in the know about Kafka, it is an open-source event streaming platform that can collect, store, process, and publish streaming data in real time.

Kafka was originally developed at LinkedIn and open-sourced in 2011. It is written in Java and Scala and is widely deployed, but it is a large, complex project that is difficult to launch and manage. The Kafka API is well established and supported by other vendors in this space, which makes for a robust community and exciting developments.

Let's Dive In

Originally designed for LinkedIn's data centers, Kafka has faced significant hurdles in its migration to the cloud. In a typical multi-zone cloud deployment, its replication strategy copies every write to brokers in other availability zones, which often results in high inter-availability-zone bandwidth costs, and managing it typically requires a dedicated team. This is precisely the problem that WarpStream aims to address.
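To make the cross-zone cost concrete, here is a rough back-of-the-envelope model in Go. It is a sketch under stated assumptions, not a pricing calculator: it assumes three availability zones, replication factor 3 with one replica per zone, producers spread evenly across zones, and a hypothetical inter-AZ price of $0.02 per GB. Substitute your provider's real rates. It also ignores consumer traffic, which adds more cross-zone transfer unless consumers read from a replica in their own zone.

```go
package main

import "fmt"

// crossAZGB estimates how many GB cross availability-zone boundaries when
// producers write writeGB of data into a cluster with one replica per zone.
// Assumptions (illustrative only):
//   - producers are spread evenly over `zones` zones, so on average
//     (zones-1)/zones of produce traffic reaches a leader in another zone;
//   - the leader copies every write to (replicationFactor-1) followers,
//     each in a different zone from the leader.
func crossAZGB(writeGB float64, zones, replicationFactor int) float64 {
	produce := writeGB * float64(zones-1) / float64(zones)
	replication := writeGB * float64(replicationFactor-1)
	return produce + replication
}

func main() {
	const (
		writeGBPerDay = 1000.0 // hypothetical: about 1 TB of writes per day
		pricePerGB    = 0.02   // hypothetical inter-AZ price, USD per GB
	)
	gb := crossAZGB(writeGBPerDay, 3, 3)
	fmt.Printf("cross-AZ traffic: %.0f GB/day, ~$%.2f/day\n", gb, gb*pricePerGB)
}
```

Under these assumptions, every gigabyte written turns into roughly 2.7 GB of billable cross-zone traffic before a single consumer reads it.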

As previously mentioned, it is a Kafka protocol-compatible data streaming platform that runs directly on top of any commodity object store. Name the cloud, and it appears they run on it. I'm personally a big fan of the Go language, so I liked that it was developed in Go. What's next?
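The idea of a stateless agent sitting on an object store can be illustrated with a toy model. This is a hypothetical sketch, not WarpStream's code or API: the ObjectStore interface, memStore, and Agent types are invented for illustration, and a real system also needs a metadata service to order batches, plus compaction and retention. The agent keeps records only in an in-memory buffer and writes each batch as a single object, so nothing lives on local disk.

```go
package main

import (
	"bytes"
	"fmt"
	"sync"
)

// ObjectStore is a hypothetical stand-in for S3 or any commodity object store.
type ObjectStore interface {
	Put(key string, data []byte) error
}

// memStore is an in-memory ObjectStore used only for this illustration.
type memStore struct {
	mu      sync.Mutex
	objects map[string][]byte
}

func (m *memStore) Put(key string, data []byte) error {
	m.mu.Lock()
	defer m.mu.Unlock()
	m.objects[key] = append([]byte(nil), data...) // copy: caller reuses its buffer
	return nil
}

// Agent buffers records in memory and uploads each batch as one object.
// A record should be treated as durable only after Flush returns nil.
type Agent struct {
	store   ObjectStore
	buf     bytes.Buffer
	pending int
	seq     int
}

// Append adds a record to the current in-memory batch.
func (a *Agent) Append(record []byte) {
	a.buf.Write(record)
	a.buf.WriteByte('\n') // newline-delimited records, for simplicity
	a.pending++
}

// Flush uploads the buffered batch and returns the key it wrote ("" if empty).
func (a *Agent) Flush() (string, error) {
	if a.pending == 0 {
		return "", nil
	}
	key := fmt.Sprintf("batches/%08d", a.seq)
	if err := a.store.Put(key, a.buf.Bytes()); err != nil {
		return "", err
	}
	a.seq++
	a.buf.Reset()
	a.pending = 0
	return key, nil
}

func main() {
	store := &memStore{objects: map[string][]byte{}}
	agent := &Agent{store: store}
	agent.Append([]byte(`{"id":1}`))
	agent.Append([]byte(`{"id":2}`))
	key, err := agent.Flush()
	if err != nil {
		panic(err)
	}
	fmt.Printf("wrote %s (%d bytes)\n", key, len(store.objects[key]))
}
```

Batching matters in this design because object stores charge per request, so writing many records per object keeps request costs down at the price of some added latency.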

Using WarpStream

I'm doing this in Ubuntu running in WSL on Windows 11. The docs give you a few options for installing the WarpStream agent. I used the one-line installation script; note that you don't need to sign up for anything to do this:

curl https://console.warpstream.com/install.sh | bash

With that installed, I ran the warpstream demo command, which does the following:

  1. Automatically sign you up for a temporary account that is valid for 12 hours.
  2. Run an in-memory producer that will produce small JSON documents to a stream periodically.
  3. Run an in-memory WebAssembly consumer that consumes the JSON documents and prints them to the standard console.

This is what it looked like. The bottom line in green is the URL that I pasted into my web browser, which Iā€™ll get to in a moment.

After more than a screenful of output, I got a suggestion, in blue, to execute that command in another terminal, and then I could see data being produced to a topic with four partitions.

With that all running, I pasted the URL into my browser and immediately saw the cluster overview, which showed two agents running in the cluster.

Curious about the activity, I decided to delve into the topics. There, I found our topic with its four partitions, all showing write activity from the demo command. The whole demo is ephemeral, so feel free to try it out, but it shuts itself down after an hour, so don't try doing anything serious.

There are a number of other features in there that I tried out; they've definitely bundled a lot of ease of use into the interface. If I had a nit to pick, it's the pop-up dialogs for creating and deleting objects: they have no 'Cancel' option, so you have to click the left navigation to bail out. There is a very cool startup wizard when you start building either a Serverless or BYOC cluster. It creates a temporary virtual cluster for a tutorial that walks you through each step of getting started. The wizard seems to trigger the first time you create a cluster in your account.

Summary

There are a lot of cool features in the WarpStream product that make it easy to set up and use, crazy easy. If you want a deep dive, I suggest reading their blog post linked in the intro and trying it yourself. The fact that you can drop it into an existing Kafka setup and have it basically "just work" is very compelling. If I had a nit to pick, it's that there isn't an open-source component, but on the other hand, it's a product built on an open protocol, which is how a lot of open-source companies monetize. They are currently sponsoring Benthos, which was recently integrated into WarpStream to give you integrated pipeline editing and management. That's a really nifty integration that is worth its own blog post.

So, what the heck is WarpStream? It's a really powerful, easy-to-use, Kafka API-compatible replacement for Apache Kafka. It should lower your operational overhead for running Kafka and make it cheaper to run your streaming workloads. If you're streaming with Kafka or another Kafka-compatible platform, it's worth checking out.



Written by progrockrec | Software designer/developer, developer advocate, writer, and musician.
Published by HackerNoon on 2024/05/20