We're running an e-commerce platform, where people publish products and other people purchase those products. Our backend has some highly scalable microservices running on well-designed Lambdas, and there's a lot of caching involved. Our order processing microservice writes to a DynamoDB table. We're using DynamoDB provisioned capacity mode with auto-scaling. We did a great job and everything runs smoothly. Suddenly, someone's product goes viral, and a lot of people rush in to buy it at the same time. Our cache and CDN don't even blink at the traffic, and our well-designed Lambdas scale amazingly fast, but our table is suddenly bombarded with writes and the auto-scaling can't keep up. Our order processing Lambda receives , and when it retries it just makes everything worse. Things crash. Sales are lost. We eventually recover, but those customers are gone. How do we make sure it doesn't happen again? DynamoDB ProvisionedThroughputExceededException is to change the DynamoDB table to On-demand, which can keep up with Lambda when scaling, but it's over 5x more expensive. Option 1 is to make sure the table's write capacity isn't exceeded. Option 2 Let's explore option 2. AWS Services involved: Our database. All you need to know for this post is . DynamoDB: how DynamoDB scales A fully managed message queuing service that enables you to decouple components. Producers like our order processing microservice post to the queue, the queue stores these messages until they're read, and consumers read from the queue in their own time. SQS: An email platform, more similar to services like MailChimp than to an AWS service. If you're already on AWS and you just need to send emails programmatically, it's easy to set up. If you're not on , need more control, or need to send so many emails that price is a factor, you'll need to do some research. For this post, SES is good enough. SES: AWS Content Overview What is Amazon SQS? Types of SQS Queues How to Implement an SQS Queue for DynamoDB Synchronous and Asynchronous Workflows with SQS Best Practices for SQS and DynamoDB What is Amazon SQS? SQS is a fully managed message queuing service. A messaging queue is a data structure where items can be read in the same order as they were written: First-In, First-Out (FIFO). Queues allow us to decouple components by making the consumer (that's the component that reads from the queue) unaware of who wrote the item (the writer is called producer). Additionally, in software architecture we usually focus on another characteristic of queues: The reading of the item can happen sometime after the writing. This lets us decouple producers and consumers in time: Consumers don't need to be available when Producers write to the queue. The queue stores the messages for a certain amount of time, and when Consumers are ready, they the queue for messages and receive the oldest message. poll For our solution, we're going to use a queue so that our order processing microservice can send a message with the order, the queue stores the message, and a consumer can read it at its own rhythm (i.e. at our DynamoDB table's rhythm). Types of SQS Queues There are two types of queues in SQS: queues are the default type of queue. They're cheaper than FIFO queues and nearly infinitely scalable. The tradeoff is that they only guarantee (meaning you might get duplicates), and the order of the messages is mostly respected but not guaranteed. Standard at-least-once delivery queues are more expensive than Standard queues, and they don't scale infinitely, but they guarantee ordered . You need to set the property in the message since FIFO queues only deliver the next message in a MessageGroup after the previous message has been successfully processed. For example, if you set the value of to the customer ID and a customer makes two orders at the same time, the second one to come in won't be processed until the first one is finished processing. It's also important to set , to ensure that if the message gets duplicated upstream, it will be deduplicated at the queue. A FIFO queue will only keep one message per unique value of MessageDeduplicationId. FIFO exactly-once delivery MessageGroupId MessageGroupId MessageDeduplicationId Most people who think of queues are thinking of guaranteed FIFO orders and exactly-once delivery. The only way to actually get those guarantees is with FIFO queues. How to Implement an SQS Queue for DynamoDB Follow these step-by-step instructions to implement an SQS Queue to throttle writes to a DynamoDB table. Replace and with the appropriate values for your account and region. YOUR_ACCOUNT_ID YOUR_REGION Create the Orders Queue Go to the SQS console. Click "Create queue" Choose the "FIFO" queue type (not the default Standard) In the "Queue name" field enter "OrdersQueue" Leave the rest as default Click on "Create queue" Update the Orders service to write to the SQS queue We need to update the code of the Orders service so that it sends the new Order to the Orders Queue, instead of writing to the Orders table. This is what the code looks like: const AWS = require('aws-sdk');
const sqs = new AWS.SQS();
const queueUrl = 'https://sqs.YOUR_REGION.amazonaws.com/YOUR_ACCOUNT_ID/OrdersQueue';

async function processOrder(order) {
  const params = {
    MessageBody: JSON.stringify(order),
    QueueUrl: queueUrl,
    MessageGroupId: order.customerId,
    MessageDeduplicationId: order.id
  };

  try {
    const result = await sqs.sendMessage(params).promise();
    console.log('Order sent to SQS:', result.MessageId);
  } catch (error) {
    console.error('Error sending order to SQS:', error);
  }
} Also, add this policy to the IAM Role of the function, so it can access SQS. Don't forget to delete the permissions to access DynamoDB! {
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": "sqs:SendMessage",
      "Resource": "arn:aws:sqs:YOUR_REGION:YOUR_ACCOUNT_ID:OrdersQueue"
    }
  ]
} Set up SES to notify the customer via email Open the SES console Click on "Domains" in the left navigation pane Click "Verify a new domain" Follow the on-screen instructions to add the required DNS records for your domain. Alternatively, click on "Email Addresses" and then click the "Verify a new email address" button. Enter the email address you want to verify and click "Verify This Email Address". Check your inbox and click the link. Set up the Order Processing service Go to the Lambda console and create a new Lambda function. Add the following code: const AWS = require('aws-sdk');
const dynamoDB = new AWS.DynamoDB.DocumentClient();
const ses = new AWS.SES();

exports.handler = async (event) => {
    for (const record of event.Records) {
        const order = JSON.parse(record.body);
        await saveOrderToDynamoDB(order);
        await sendEmailNotification(order);
    }
};

async function saveOrderToDynamoDB(order) {
    const params = {
        TableName: 'orders',
        Item: order
    };
    
    try {
        await dynamoDB.put(params).promise();
        console.log(`Order saved: ${order.orderId}`);
    } catch (error) {
        console.error(`Error saving order: ${order.orderId}`, error);
    }
}

async function sendEmailNotification(order) {
    const emailParams = {
        Source: 'you@simpleaws.dev',
        Destination: {
            ToAddresses: [order.customerEmail]
        },
        Message: {
            Subject: {
                Data: 'Your order is ready'
            },
            Body: {
                Text: {
                    Data: `Thank you for your order, ${order.customerName}! Your order #${order.orderId} is now ready.`
                }
            }
        }
    };

    try {
        await ses.sendEmail(emailParams).promise();
        console.log(`Email sent: ${order.orderId}`);
    } catch (error) {
        console.error(`Error sending email for order: ${order.orderId}`, error);
    }
} Also, add the following IAM Policy to the IAM Role of the function, so it can be triggered by SQS and access DynamoDB and SES: {
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "sqs:ReceiveMessage",
        "sqs:DeleteMessage",
        "sqs:GetQueueAttributes"
      ],
      "Resource": "arn:aws:sqs:YOUR_REGION:YOUR_ACCOUNT_ID:OrdersQueue"
    },
    {
      "Effect": "Allow",
      "Action": [
        "dynamodb:PutItem",
        "dynamodb:UpdateItem",
        "dynamodb:DeleteItem"
      ],
      "Resource": "arn:aws:dynamodb:YOUR_REGION:YOUR_ACCOUNT_ID:table/Orders"
    },
    {
      "Effect": "Allow",
      "Action": "ses:SendEmail",
      "Resource": "*"
    }
  ]
} Make the Orders Queue trigger the Order Processing service In the Lambda console, go to the Order Processing lambda In the "Function overview" section, click "Add trigger" Click "Select a trigger" and choose "SQS" Select the Orders Queue Set Batch size to 1 Make sure that the "Enable trigger" checkbox is checked Click "Add" Limit concurrent executions of the Order Processing Lambda In the Lambda console, go to the Order Processing lambda Scroll down to the "Concurrency" section Click "Edit" In the "Provisioned Concurrency" section, set "Reserved Concurrency" to 10 Click "Save" Synchronous and Asynchronous Workflows with SQS Architecture-wise, there's one big change in our solution: We've made our workflow ! Let me bring the diagram here. async Before, our Orders service would return the result of the order. From the user's perspective, they , and they see the result on the website. From the system's perspective, we're constrained to either succeed or fail processing the order in the timeout limit of API Gateway (29 seconds). In more practical terms, we're limited by what the user is expecting: we can't just show a "loading" icon for 29 seconds! wait until the order is processed After the change, the website just shows something like "We're processing your order, we'll email you when it's ready". That sets a to the user. That's important for the system, because now we could actually have our Lambda function take 15 minutes, without hitting the 29 seconds limit of API Gateway, or without the user getting angry. It's not just that though, if the Order Processing lambda crashes mid-execution, the SQS queue will make the order available again as a message after the visibility timeout expires, and the Lambda service will invoke our function again with the same order. When the maxReceiveCount limit is reached, the order can be sent to another queue called Dead Letters Queue (DLQ), where we can store failed orders for future reference. We didn't set up a DLQ here, but it's easy enough, and for small and medium-sized systems you can easily set up SNS to send you an email and resolve the issue manually, since the volume shouldn't be particularly large. different expectation Once the order went through all the steps, failed some, retried, succeeded, etc, then we notify the user that their order is "ready". This can look different for different systems, some are just a "we got the money", some ship physical products, some onboard the user to a complex SaaS. For this solution I chose to do it via email because it's easy and common enough, but you could use a webhook and still keep the process async. Best Practices for SQS and DynamoDB Operational Excellence You know how to monitor Lambdas. You can monitor SQS queues as well! An interesting alarm to set here would be number of orders in the queue, so our customers don't wait too long for their orders to be processed. Monitor and set alarms: Be ready for anything to fail, and architect accordingly. Set up a DLQ, set up notifications (to you and to the user) for when things fail, and above all don't lose/corrupt data. Handle errors and retries: We're complicating things a bit (hopefully for a good reason). We can gain better visibility into that complexity by setting up X-Ray. Set up tracing: Security That's all you need to do for an SQS queue to be encrypted at rest: check that box, and pick a KMS key. SQS communicates over HTTPS, so you already have encryption in transit. Check "Enable server-side encryption": The IAM policies in this issue are pretty restrictive. But there's always a nut to tighten, so keep your eyes open. Tighten permissions: Reliability With a FIFO queue, the next message won't be available for processing until the previous one is either processed successfully or dropped (to the DLQ if you set one) after maxReceiveCount attempts. If you don't set these, one corrupted order will block your whole system. Set up maxReceiveCount and a DQL: This is the time that SQS waits without receiving the "success" response, before assuming the message wasn't processed successfully and making it available again for the next consumer. Set a reasonable value, and set the same value as a timeout for your consumer (Order Processing lambda in this case). Set visibility timeout: Performance Efficiency More memory means more money. But it also means faster processing. Going from 30 to 25 seconds won't matter much for a successfully processed order, but if orders are retried 5 times, now it's 25 seconds we're gaining instead of 5. Could be worth it, depending on your customers' expectations. Optimize Lambda function memory: As discussed earlier, you should consider processing messages in batches. Use Batch processing: Cost Optimization Remember that this could be fixed by using our DynamoDB table in On-demand mode. It's 5x more expensive though. Same goes for relational databases (if we use Aurora, then Aurora Serverless is an option). Provisioned vs. On-demand for DynamoDB: In this case, we're trying to get all orders processed relatively fast. If the processing can wait a bit more, an auto scaling group that scales based on the number of messages in the SQS queue can work wonders, for a lot less money. Consider something other than Lambda: Also published here.

The is an opinion piece based on the author’s POV and does not necessarily reflect the views of HackerNoon.

Securing Your EC2 to S3 Connection

Using SQS to Throttle Writes to DynamoDB

About Author

Comments

TOPICS

THIS ARTICLE WAS FEATURED IN

Related Stories

Navigating Amazon EBS: A Primer on Basics and Best Practices

5 Common Amazon SQS Issues

AWS Decoupling: The Big Comparison

How to Build a Scalable Webhook Delivery System With Kafka, SQS & S3

Improving Our MongoDB Write Throughput with SQS

Redis Stream vs. Amazon SQS

Navigating Amazon EBS: A Primer on Basics and Best Practices

5 Common Amazon SQS Issues

AWS Decoupling: The Big Comparison

How to Build a Scalable Webhook Delivery System With Kafka, SQS & S3

Improving Our MongoDB Write Throughput with SQS

Redis Stream vs. Amazon SQS

Light-Mode

Classic

Newspaper

Minty

Dark-Mode

Neon Noir

Minty

HN StartUps