I have been using Lambda in production for about four years now personally, and three years professionally at Volta. Initially, I shipped Lambdas because it was easier than managing servers. At Volta, we now exclusively use serverless services because they are the smartest option for our workloads if we remember to support them correctly. This is a cheat sheet, a checklist of all the things you might want to remember when shipping something new to ensure it runs successfully.

Infrastructure as Code

Regardless of what support you like to build into your Lambdas, the most important thing to do is to ensure consistency. If you’ve deployed CloudWatch alarms for one of them, it can be quite a surprise to see older functions fail silently because they predate your alarm strategy. Writing your infrastructure as code gives you a way to document and deploy your Lambdas, while also enabling infrastructure conversations through code reviews.

Figure: A standard CloudFormation file.

I personally use CloudFormation because I love writing hundreds of lines of YAML, and because I got comfortable with it back when the Serverless Framework didn’t quite have feature parity and Terraform didn’t have remote state. If I could do it again I would spend more time exploring Terraform or Terragrunt.

CloudWatch Alarms

If you’re mesmerized by graphs like I am, you probably already spend time looking at CloudWatch Metrics. Let’s take it one step further and turn them into something actionable; something which can wake you up if absolutely necessary. There are four basic measurable properties of a Lambda: invocation rate, invocation duration, error rate, and throttle rate. If you’re really ambitious you can also add CPU usage and RAM to the list. These are important because they all characterize the Lambda’s workload. By setting expectations for each of these properties as CloudWatch Alarms, you essentially get an abstract test, and we all want more test coverage, right?
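As a rough sketch of what “setting expectations” could look like, the snippet below builds one `AWS::CloudWatch::Alarm` resource for each of the four properties listed above, using the `Invocations`, `Duration`, `Errors`, and `Throttles` metrics in the `AWS/Lambda` namespace. The function name `alarm_resources`, the logical resource IDs, the thresholds, and the SNS topic ARN are all hypothetical placeholders, not values from any real setup.

```python
import json

# Hypothetical expectations; tune every threshold to your own workload.
# metric name: (statistic, comparison operator, threshold)
EXPECTATIONS = {
    "Invocations": ("Sum", "GreaterThanThreshold", 1000),
    "Duration": ("Average", "GreaterThanThreshold", 3000),  # milliseconds
    "Errors": ("Sum", "GreaterThanThreshold", 0),
    "Throttles": ("Sum", "GreaterThanThreshold", 0),
}


def alarm_resources(function_name, topic_arn, period=60):
    """Build one CloudFormation alarm resource per Lambda metric."""
    resources = {}
    for metric, (statistic, operator, threshold) in EXPECTATIONS.items():
        resources[f"{metric}Alarm"] = {
            "Type": "AWS::CloudWatch::Alarm",
            "Properties": {
                "AlarmDescription": f"{metric} outside expectations for {function_name}",
                "Namespace": "AWS/Lambda",
                "MetricName": metric,
                "Dimensions": [{"Name": "FunctionName", "Value": function_name}],
                "Statistic": statistic,
                "Period": period,  # seconds
                "EvaluationPeriods": 1,
                "Threshold": threshold,
                "ComparisonOperator": operator,
                # Assumption: a quiet function should not page anyone.
                "TreatMissingData": "notBreaching",
                "AlarmActions": [topic_arn],
            },
        }
    return resources


if __name__ == "__main__":
    # Placeholder function name and topic ARN, for illustration only.
    template = {
        "Resources": alarm_resources(
            "my-function", "arn:aws:sns:region:account-id:my-alerts"
        )
    }
    print(json.dumps(template, indent=2))
```

Generating the same four alarms for every function is one way to get the consistency described earlier: a new Lambda cannot ship without the alarms the older ones already have.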
To make things more interesting, you can even use a few math expressions to compare metrics. Another recent addition is anomaly detection, which is very valuable if you expect your Lambda’s performance to vary over time.

Figure: Look! A wild anomaly!

Continuous Deployment

Traffic Shifting

For me, developer experience generally falls into two buckets: writing less code and deploying more confidently. Nothing has made a more significant impact on my deployment confidence than traffic shifting; it is that magical. This feature basically gives Lambda the ability to slowly move invocations from the old version of your Lambda to a new version, while monitoring some CloudWatch Alarms along the way to see if it should roll back.

Have I gotten too cocky and deployed issues my alarms didn’t catch? Yes. Should they support more than just time-based deployment options? Yes! Is it annoying they call it by three different names throughout their documentation? Absolutely! But all that aside, Traffic Shifting gives you the power of blue/green testing in just a few lines of CloudFormation and makes it easier to test in production and release more confidently on a Friday night (if you’re into that sort of thing).

Automated Build Pipeline

I’ve spent a lot of time thinking about how to make CircleCI work well for mono-repositories, and think I’ve settled on a pretty good configuration for my needs. That said, there are countless services built for this purpose that may fit your needs better. The most important feature of a build pipeline is that it enables you to quickly release new versions of your Lambdas in both hotfix and feature release scenarios.

Distributed Tracing

The first thing you’ll notice after deploying and running a Lambda in AWS is that your CloudWatch logs were not designed for Lambda. Log groups contain multiple invocations and make no effort to visually separate one invocation from another, making it incredibly hard to parse what is actually going on.
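To illustrate the kind of separation that is missing, here is a minimal sketch that regroups interleaved log lines by request ID after the fact. It assumes each line carries the invocation’s request ID (a UUID), which the platform’s START, END, and REPORT lines do, but which plain print-style output may not. The function name `group_by_invocation` and the sample lines are made up for the example.

```python
import re
from collections import defaultdict

# Lambda request IDs are UUIDs; this pattern finds the first one on a line.
REQUEST_ID = re.compile(
    r"[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}"
)


def group_by_invocation(lines):
    """Return {request_id: [lines]}, keeping lines in arrival order."""
    groups = defaultdict(list)
    for line in lines:
        match = REQUEST_ID.search(line)
        # Lines with no request ID are collected under the key None.
        groups[match.group(0) if match else None].append(line)
    return dict(groups)


if __name__ == "__main__":
    # Made-up log lines from two interleaved invocations.
    sample = [
        "START RequestId: 11111111-1111-1111-1111-111111111111 Version: $LATEST",
        "START RequestId: 22222222-2222-2222-2222-222222222222 Version: $LATEST",
        "2020-01-01T00:00:00.000Z\t11111111-1111-1111-1111-111111111111\tINFO\thello",
        "END RequestId: 22222222-2222-2222-2222-222222222222",
    ]
    for request_id, grouped in group_by_invocation(sample).items():
        print(request_id)
        for line in grouped:
            print("   ", line)
```

This only untangles a single function’s log group; it does nothing to connect one Lambda’s invocation to the event that caused it.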
Figure: Come on AWS, just group by invocation ID already!

The other problem is that Lambdas are quite often invoked by things, and occasionally emit something as well. This concept of tracing across resources is present in AWS X-Ray, but is perfected by third-party services such as Epsagon. With a simple instrumentation call, they capture the event which invoked the Lambda and can visualize each invocation separately. If one Lambda emits an SNS message which invokes another Lambda, you can even see both invocations in one trace. Problem solved.

Dead-Letter Queues

Lest we forget the oft-forgotten invocations that we errored out so many paragraphs before. A significant concern of using Lambda is responding slowly, or worse yet not at all. A dead-letter queue to catch your unprocessable events is a good way to ensure you at least have a record of what you could not handle, and it also makes it easy to reprocess the events after you’ve improved your function.

The practice behind this is just as important: let your Lambdas error out. Exiting doesn’t hurt the next invocation as it would in a conventional server-full environment since it’s just an execution failing, not the entire service. In fact, Lambda does a lot to account for these failures, such as retrying the invocation for you. This also makes tracking issues easier in CloudWatch, or your other favorite monitoring tool. In short: when in doubt, throw it out.

Strict IAM Policies

The Principle of Least Privilege is another guardrail that frequently gets left underutilized. IAM policies allow you to specify what resources and actions they grant access to, but also allow you to grant unnecessarily wide access with a little *. This is generally a bad idea because if someone compromises your service, they can use that role to impact other services as well. For example, if your Lambda should be able to read from a DynamoDB table, use this:

    {
      "Version": "2012-10-17",
      "Statement": [
        {
          "Sid": "QueryMyTable",
          "Effect": "Allow",
          "Action": [
            "dynamodb:DescribeTable",
            "dynamodb:Query",
            "dynamodb:Scan"
          ],
          "Resource": "arn:aws:dynamodb:region:account-id:table/MyTable"
        }
      ]
    }

Alternatively, if you had used "Action": "*" and "Resource": "arn:aws:dynamodb:*", you would be allowing that Lambda to DeleteTable for every table in your account.

More to Come

I’m sure this list will continue to grow as AWS adds new supporting features, and as folks like you point out things I’ve missed. Until then, I’ll see you in production.

Previously published at https://medium.com/cazzer/lambdas-in-production-92f8e4ca70a2

