Redis Stream vs. Amazon SQS

Written by przemeq | Published 2020/07/01
Tech Story Tags: programming | aws | redis | amazon | hackernoon-top-story | redis-stream-vs-amazon-sqs | redis-stream | amazon-sqs

TLDR A head-to-head battle between Redis Streams (Redis version 5.0) and Amazon SQS (AWS SQS). The fight between the two queue systems consists of five rounds: ease of use, error handling, persistence, throughput, and delivery.

Do you like boxing fights? This evening I bring you a head-to-head battle between Redis Streams and AWS SQS. If you are interested in which technology is better and which will end up on the ground, check it out!

The story behind the competition

Recently we had to add communication with an external API to our system. It's nothing unusual, and we already have dozens of similar integrations in our applications. This time, however, we had to meet more strict per-second rate limits, whilst still being able to maximize their usage.
To keep it fast, but simple, we decided to move all the API calls to a subsystem. No database, no front-end, just two queues (input and output), and just one purpose - hit API as fast as possible within limits. This created the next challenge - which queueing system to choose?

Rivals

Our application is written in Ruby and hosted in the AWS cloud. The natural candidate was AWS SQS, as we've already used it in the other parts of the system. On the other hand, we already had Redis set up for Sidekiq, so we decided to check if Redis Streams could be a better fit. We preferred not to introduce entirely new technology if one of the existing solutions met our needs.
AWS SQS comes in two flavors: FIFO and Standard. They have their differences, so we decided to compare them separately. We ended up with three candidates:
  1. AWS SQS (standard)
  2. AWS SQS (FIFO)
  3. Redis Streams (Redis version 5.0)

Detailed comparison

The battle between queue systems is going to consist of five rounds. We will check:
  1. Ease of use
  2. Error handling
  3. Persistence
  4. Throughput
  5. Delivery
Hopefully, you have a fighting knowledge of how AWS services work, and you know how Redis punches.
I won't let you wait any longer; let the fight begin!
Battleground: How do queues work?
A typical way of processing queue messages looks like the following: the system marks a message as "pending" (or "in-flight") and sends it to the consumer. When the consumer finishes processing, it notifies the queue of the success by sending an "acknowledge." The message then changes state from "pending" to "processed."
Lack of an "acknowledge" within a given time means that processing has failed, and the message is no longer considered "pending." This way, no message gets stuck in the "pending" state forever.
When processing fails a couple of times, the queue assumes there is a problem with the message itself and stops sending it to the consumer. Optionally, it can also move it to the Dead Letter Queue (DLQ) - a graveyard for messages.
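The lifecycle above can be sketched in plain Ruby. This is an in-memory toy of the state machine only (the class name and the `MAX_DELIVERIES` threshold are made up for illustration), not a real queue client:

```ruby
# Toy model of the pending/acknowledge/DLQ lifecycle described above.
class ToyQueue
  MAX_DELIVERIES = 3  # after this many failed deliveries, dead-letter the message

  attr_reader :dlq

  def initialize(messages)
    @messages = messages.map { |m| { body: m, deliveries: 0 } }
    @dlq = []
  end

  # Deliver the next message; it stays "pending" until acknowledged.
  def receive
    msg = @messages.shift
    return nil unless msg
    msg[:deliveries] += 1
    @pending = msg
    msg[:body]
  end

  # Consumer succeeded: the pending message is done for good.
  def ack
    @pending = nil
  end

  # No ack arrived in time: retry, or give up and dead-letter the message.
  def timeout!
    return unless @pending
    if @pending[:deliveries] >= MAX_DELIVERIES
      @dlq << @pending[:body]
    else
      @messages.push(@pending)  # back of the queue for another attempt
    end
    @pending = nil
  end
end

queue = ToyQueue.new(%w[good bad])
queue.receive   # => "good"
queue.ack       # processed successfully
3.times do      # "bad" keeps failing until it lands in the DLQ
  queue.receive
  queue.timeout!
end
queue.dlq       # => ["bad"]
```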
Round 1: Ease of use 👨‍🔬
AWS SQS provides a handful of ways to use it: from the web interface, through the CLI, to client libraries available in many languages.
For example, processing messages in Ruby code looks like this:
require 'aws-sdk-sqs'
sqs_poller = Aws::SQS::QueuePoller.new(queue_url) # queue_url of your SQS queue
poll_config = { max_number_of_messages: 10 }
stats = sqs_poller.poll(poll_config) { |messages| process_messages(messages) }
You can set up a batch size together with a bunch of other parameters describing polling loop details. The library takes care of acknowledging messages when the processing block finishes. When polling finishes, the poller returns an object containing some basic stats (e.g., the number of messages received).
Using Redis Streams is more complicated. First, you need to perform all actions using the CLI or a client library - no web UI included. What's more, as of today, the set of commands provided by the API is somewhat limited. Simple code for processing messages looks like this:
require 'redis'
redis = Redis.new
# Read one new message from "mystream" as consumer "Alice" in group "mygroup"
redis.xreadgroup('mygroup', 'Alice', 'mystream', '>', count: 1)
# Process the message, then acknowledge it by ID
redis.xack('mystream', 'mygroup', '1526569495631-0')
It looks easy, but there is a catch; if you want to handle failures, you need to code it yourself:
redis.xpending('mystream', 'mygroup')
# Check how long each message has been "pending" and its delivery counter
redis.xclaim('mystream', 'mygroup', 'Alice', 3600000, '1526569498055-0')
Unless you have worked with Redis Streams before, you probably have no idea what's happening here, which clearly shows that using SQS is way easier. If you wonder what these commands do, the best way is to check the documentation. This is what we always do when working with the Redis API.
AWS with web UI access and more friendly API wins this round.
AWS 1:0 Redis
Round 2: Error handling 🥊
Two features help with handling message processing errors:
  1. retrying non-acknowledged messages
  2. moving messages to the Dead Letter Queue (DLQ).
Now, let's check how our candidates support them.
In SQS (both FIFO and standard), the whole flow is integrated with the queue lifecycle. You can easily set this up using the web UI or the CLI. There are options to define the "in-flight" state timeout, the retry limit, and which DLQ to use.
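For reference, the DLQ wiring in SQS boils down to a single queue attribute, RedrivePolicy, which you can set from the web UI or the CLI; the ARN below is a placeholder:

```json
{
  "deadLetterTargetArn": "arn:aws:sqs:us-east-1:123456789012:my-dlq",
  "maxReceiveCount": "5"
}
```

After five failed receives, SQS moves the message to the my-dlq queue on its own.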
In Redis, there is no web UI, and there are no knobs for manipulating the "pending" state timeout or DLQ settings. There is a possibility to get all the needed data from the Redis API, but there is nothing built-in to utilise. You have to code the error handling by yourself.
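A minimal sketch of what that custom code has to decide, assuming you have already mapped each XPENDING entry into a hash with :id, :idle_ms, and :deliveries keys (both thresholds are illustrative):

```ruby
# Decide what to do with each pending entry reported by XPENDING.
# Entries idle past the timeout get retried (XCLAIM) until the delivery
# counter hits the limit; after that they should be dead-lettered (e.g.
# XADD to a separate "dead letters" stream) and acknowledged.
PENDING_TIMEOUT_MS = 60_000  # illustrative threshold
MAX_DELIVERIES     = 3       # illustrative threshold

def triage(entry)
  return :leave if entry[:idle_ms] < PENDING_TIMEOUT_MS
  entry[:deliveries] >= MAX_DELIVERIES ? :dead_letter : :claim
end

triage(id: '1526569498055-0', idle_ms: 1_000,  deliveries: 1)  # => :leave
triage(id: '1526569498055-0', idle_ms: 90_000, deliveries: 1)  # => :claim
triage(id: '1526569498055-0', idle_ms: 90_000, deliveries: 3)  # => :dead_letter
```

SQS runs exactly this decision loop for you; with Redis Streams you schedule it yourself and issue the XCLAIM/XADD/XACK calls accordingly.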
Summing up, AWS allows you to configure error handling with a few clicks, while Redis requires you to implement a custom solution. AWS wins this round.
AWS 2:0 Redis
Round 3: Persistence 💾
By "persistence" here, I mean how the queue stores messages. I'll compare how long a queue keeps messages (retention) and the limit on the number of stored messages.
In AWS SQS, we can configure retention to any value from 1 minute to 14 days. SQS can store an unlimited number of messages, with limits imposed only on the number of in-flight messages. It is set to 20,000 for FIFO and 120,000 for a standard queue.
The Redis queue works differently - it stores all the messages in memory. As a result, Redis' memory size limits the number of messages (the maximum depends on message size). You have to trim old messages from the stream yourself (e.g., with XTRIM or XADD's MAXLEN option) to make sure you won't receive an out-of-memory error.
Storing messages in memory also has another drawback. In the case of server downtime or restart, you'll lose all the data. You can mitigate it by periodically creating Redis database snapshots together with an append-only log of all operations, yet that requires additional manual configuration (see Redis RDB and AOF persistence).
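For completeness, enabling both mitigations is a few lines of redis.conf (the values below are illustrative, not a recommendation):

```conf
# redis.conf -- illustrative values
save 900 1           # RDB snapshot if at least 1 key changed in 900 s
appendonly yes       # enable the AOF operations log
appendfsync everysec # fsync the AOF once per second
```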
Taking into account the better versatility of AWS's out-of-the-box persistence configuration, AWS wins again.
AWS 3:0 Redis
Round 4: Throughput 🏎
A queue system cannot be the bottleneck of the application. In this round, I'll compare how fast the different queue systems can deliver messages.
AWS SQS introduces a limit when batching messages downloaded from the queue: in each request, you can download up to 10 messages. If your system needs to process a great number of messages, that means many queries to the API. On top of this, if you use a FIFO queue, you'll get a cap of 300 requests per second (it's possible to increase this limit by contacting support). Combined with message batching (up to 10 per request), this gives a theoretical maximum of 3,000 messages per second for FIFO SQS. My experiments showed that, in practice, downloading 250 messages from FIFO SQS using a single Ruby thread takes around 0.6s.
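The back-of-the-envelope numbers above, spelled out in Ruby:

```ruby
# FIFO SQS theoretical ceiling: request cap times batch size.
requests_per_second = 300
batch_size          = 10
max_messages_per_s  = requests_per_second * batch_size  # => 3000

# Our measured load: 250 messages, fetched in batches of 10,
# took ~0.6 s from a single Ruby thread.
requests_needed     = (250.0 / batch_size).ceil         # => 25
seconds_per_request = 0.6 / requests_needed             # ~0.024 s per request
```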
Here you can see the benefits of storing messages in memory, as Redis does. There are no limits on the number of requests, and my experiment with downloading 250 messages in Ruby took around 0.001s.
On top of this, Redis' message size is limited only by the size of the server memory, while AWS SQS limits the message size to 256 KB. You can use the Extended Client Library to store message payloads on S3, which extends this limit to 2 GB - still less than Redis.
Redis is several orders of magnitude faster than AWS SQS, hence it wins this round.
AWS 3:1 Redis
Round 5: Delivery 📨
In the final round, I'll check if the queues guarantee First-In-First-Out (FIFO) and exactly-once delivery.
Redis Streams always provide both FIFO and exactly-once processing. Period.
AWS SQS provides two different types of queues. When FIFO and exactly-once processing is not a requirement, a standard SQS queue is the right choice. It can deliver messages in any order (i.e., a later message can be delivered before an earlier one). It also promises only the "at least once" delivery (one message can be delivered multiple times).
For cases when FIFO is a must-have, AWS provides the FIFO SQS queues. They guarantee both FIFO and "exactly once" delivery at the cost of the limitations mentioned above.
So this round is a tie (with Redis having a small advantage).
AWS 4:2 Redis

Verdict

Winner by points 🏆
AWS SQS has won the competition. It's a well-balanced solution that works out-of-the-box and satisfies most requirements. In case you need to tune it, there is a web UI or CLI to make that as easy as possible. It also comes with a lot of tutorials and tools that help you get started.
Winner by knockout 👊
In most cases, we use AWS SQS at work at u2i. Recently, though, a client gave us specific requirements: we were tasked with building a system that had to process at least 250 messages each second.
AWS SQS turned out to be a bit too slow. We have only one second for processing; when downloading 250 messages eats 0.6s of that second, it doesn't leave time for possible retries. What's more, we can live with losing messages when the system is down for more than 15 minutes, so we don't need over-the-top persistence.
Given the above, we decided to choose Redis Streams and bear the cost of manual implementation of the error handling scenarios.

Written by przemeq | Tech Lead, Full Stack Developer at u2i.com