As with any new technology, the serverless ecosystem is evolving rapidly. Cloud providers release new features and services every month, which can be quite overwhelming for a new user. So in this post, I’ll help you figure out who’s who in the serverless zoo: I’ll give an overview of the services you should know when building serverless applications, discuss when you should (or shouldn’t) use them, and list some common gotchas.
Lambda is the obvious choice for embarking on our serverless journey. It is a FaaS (Function-as-a-Service) offering: an event-driven compute service that lets you execute your code without managing any servers. You zip your code and upload it to the cloud provider, configure the events that should trigger it (such as an HTTP request or a message placed in a queue), and the cloud provider takes care of the rest.
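To make this concrete, here is a minimal sketch of a Python Lambda handler. The `handler(event, context)` signature is what the Python runtime calls; the `name` field in the event is a hypothetical payload, and the handler is exercised locally with a hand-written event instead of a real trigger.

```python
import json


def handler(event, context):
    """Entry point that Lambda invokes once per event.

    `event` is a dict whose shape depends on the trigger (HTTP request,
    queue message, ...); `context` carries runtime metadata such as the
    remaining execution time. The "name" field is a hypothetical payload.
    """
    name = event.get("name", "world")
    # Return a JSON-serializable value; Lambda hands it back to the caller.
    return {"message": f"Hello, {name}!"}


if __name__ == "__main__":
    # Local smoke test: call the handler directly with a fake event.
    # This handler never touches the context, so None is enough here.
    print(json.dumps(handler({"name": "serverless"}, None)))
```

Because the handler is a plain function, you can unit test it locally like this before wiring it to any event source.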
Lambda (and FaaS in general) is an integral part of most serverless systems. It scales automatically and is pay-per-use, as most serverless services are. I would recommend choosing Lambda for most cases, although there are other serverless compute services that target more specific use cases, which we will cover later on.
However, there are some new gotchas you should note when using Lambda:
Lambda is the cornerstone of most large serverless apps, so there are many more gotchas, and it takes time to get into the serverless mindset. But this should get you started on the right track.
SQS (Simple Queue Service) is a fully serverless queue service. You can create a queue with the click of a button and start sending messages through it. Queues are a common component in distributed systems: they decouple different parts of the system so each can operate on its own. SQS offers two types of queues: FIFO queues, where message order is guaranteed, and standard queues, where order is not guaranteed but throughput is nearly unlimited.
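When SQS triggers a Lambda function, the function receives a batch of messages. Here is a minimal sketch of a handler that reports failed messages individually instead of failing the whole batch. It assumes the event source mapping has `ReportBatchItemFailures` enabled (without it, the returned list is ignored), and `process` is a hypothetical stand-in for your business logic.

```python
import json


def process(order):
    """Hypothetical business logic: reject messages without an order id."""
    if "order_id" not in order:
        raise ValueError("missing order_id")
    return order["order_id"]


def handler(event, context):
    """Process a batch of SQS messages delivered by a Lambda trigger.

    Each record's "body" is the raw message string and "messageId"
    identifies it. Only the IDs listed in batchItemFailures go back to
    the queue for a retry; the rest are deleted as successfully handled.
    """
    failures = []
    for record in event["Records"]:
        try:
            # json.JSONDecodeError is a subclass of ValueError,
            # so malformed bodies are caught here as well.
            process(json.loads(record["body"]))
        except ValueError:
            failures.append({"itemIdentifier": record["messageId"]})
    return {"batchItemFailures": failures}
```

Reporting partial failures keeps one bad message from forcing already-processed messages in the same batch to be retried.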
There are several things you should note when using an SQS queue as a component in a serverless system:
SNS (Simple Notification Service) is a managed pub/sub service. An SNS “entity” is called a topic. Each topic can have several subscribers (HTTP endpoints, Lambda functions or SQS queues).
A typical use case for SNS is a service broadcasting an event to the rest of the system. Let’s say I have several microservices that need to react to the registration of a new user. I can have my registration service publish a message to a topic whenever a user registers, and subscribe all the other components to that topic (via a Lambda trigger, an SQS queue or an HTTP endpoint). Classic pub/sub.
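Depending on how a component subscribes, the message reaches it in a different envelope. The sketch below handles two cases: a Lambda function triggered directly by SNS, and an SQS queue subscribed to the topic without raw message delivery, where the SQS body is SNS’s JSON notification with the published text in its `Message` field. It assumes the registration service publishes JSON such as `{"user_id": "..."}`, which is a hypothetical format.

```python
import json


def extract_user_events(event):
    """Return the published payloads from either trigger style.

    - SNS -> Lambda: each record carries an "Sns" object whose
      "Message" field is the string the publisher sent.
    - SNS -> SQS -> Lambda: each record's "body" is the SNS JSON
      envelope (unless raw message delivery is enabled), and the
      published string sits in its "Message" field.
    """
    payloads = []
    for record in event["Records"]:
        if "Sns" in record:
            message = record["Sns"]["Message"]
        else:
            envelope = json.loads(record["body"])
            message = envelope["Message"]
        # The publisher is assumed to send JSON (hypothetical format).
        payloads.append(json.loads(message))
    return payloads
```

Forgetting to unwrap the SNS envelope is a common surprise when a queue subscriber starts reading messages it expected to look like the published payload.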
Here are some things to keep in mind when using SNS:
Kinesis is a fully managed data stream. It allows processing data records in order, at a very high scale. To enable parallel processing, each stream is made up of several “shards”, and ordering is guaranteed only within a shard. To make sure two records are handled by the same shard (and therefore in order relative to each other), write them with the same partition key.
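Kinesis hashes each partition key with MD5 into a 128-bit integer and routes the record to the shard whose hash key range contains that value. The sketch below simulates this routing to show why equal keys always land on the same shard. It assumes the shards split the key space into equal slices, which roughly matches a freshly created stream; after resharding, real ranges can be uneven, so treat it as an illustration, not a replica of the service.

```python
import hashlib

HASH_SPACE = 2 ** 128  # MD5 produces a 128-bit value


def shard_for(partition_key: str, shard_count: int) -> int:
    """Return the index of the shard a partition key maps to.

    Assumes evenly sized hash key ranges (an approximation of a new
    stream), so shard i owns [i * 2**128 / n, (i + 1) * 2**128 / n).
    """
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    hash_key = int.from_bytes(digest, byteorder="big")
    return hash_key * shard_count // HASH_SPACE


# Records for the same user share a partition key, so they land on the
# same shard and are read in the order they were written.
for key in ["user-42", "user-42", "user-7"]:
    print(key, shard_for(key, shard_count=4))
```

Using a high-cardinality key (such as a user ID) spreads load across shards while still keeping each user’s records ordered.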
Kinesis is a family of services that includes Kinesis Data Streams (the “standard” Kinesis), Kinesis Video Streams and Kinesis Data Firehose, a service that buffers streaming data and delivers it to destinations such as S3.
You can read data from Kinesis using the KCL (Kinesis Client Library) or by triggering a Lambda function. Here are some notes on integrating Kinesis into your serverless system:
Since Lambda functions are stateless, managing state can sometimes be hard. But fear not, Step Functions are here to the rescue! Step Functions is an AWS orchestration service that lets you model workflows as state machines, effectively managing state as code. You can find more details in this excellent post by Yan Cui (whose name I ripped off).
Note that at re:Invent 2018, AWS announced more service integrations for Step Functions, including DynamoDB, Fargate, SNS and SQS, so you can use Step Functions to orchestrate a lot more than just Lambda. The service is a bit pricey (standard workflows are billed per state transition), so in some use cases it might be overkill, but for most cases it is a pretty awesome tool to have at your disposal.
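State machines are written in the Amazon States Language, a JSON format. As a sketch, the following Python builds a two-step workflow: a Lambda task with retries, followed by a direct DynamoDB `putItem` service integration. The function ARN, account ID, table name and the `orderId` field are hypothetical. The printed JSON is what you would supply as the state machine `definition` (for example through boto3’s `create_state_machine`).

```python
import json

# Hypothetical ARN and table name, used only for illustration.
VALIDATE_FN = "arn:aws:lambda:us-east-1:123456789012:function:validate-order"
ORDERS_TABLE = "Orders"

definition = {
    "Comment": "Validate an order with Lambda, then store it in DynamoDB",
    "StartAt": "ValidateOrder",
    "States": {
        "ValidateOrder": {
            "Type": "Task",
            "Resource": VALIDATE_FN,
            # Retry failed invocations with exponential backoff instead of
            # hand-writing retry loops inside the function.
            "Retry": [{
                "ErrorEquals": ["States.TaskFailed"],
                "IntervalSeconds": 2,
                "MaxAttempts": 3,
                "BackoffRate": 2.0,
            }],
            # Once retries are exhausted, route any error to a Fail state.
            "Catch": [{"ErrorEquals": ["States.ALL"], "Next": "OrderFailed"}],
            "Next": "SaveOrder",
        },
        "SaveOrder": {
            "Type": "Task",
            # Direct service integration: no Lambda needed to write the item.
            "Resource": "arn:aws:states:::dynamodb:putItem",
            "Parameters": {
                "TableName": ORDERS_TABLE,
                # Assumes the Lambda's output contains an "orderId" field.
                "Item": {"orderId": {"S.$": "$.orderId"}},
            },
            "End": True,
        },
        "OrderFailed": {"Type": "Fail", "Error": "OrderValidationFailed"},
    },
}

print(json.dumps(definition, indent=2))
```

Keeping retries and error routing in the state machine rather than in function code is one of the main reasons to reach for Step Functions.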
S3 (Simple Storage Service) is probably the most popular object storage service in the world. It has many possible use cases (e.g. backups, static websites), but serverless systems reveal its full potential.
When building a serverless microservice, you usually need some kind of database. In some cases, S3 will be a great fit: it is easy to use, highly available, durable, and very cheap. It also integrates with Lambda: you can configure a bucket as an event source so a function runs whenever objects are created or deleted. And when combined with other serverless services (like Athena or Glue), it can be quite powerful despite its simplicity.
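When S3 triggers a Lambda function, the event lists the affected objects. A common gotcha is that object keys in the notification are URL-encoded, so a key containing spaces arrives with `+` signs. A minimal sketch, with a hypothetical bucket and key in the sample event:

```python
from urllib.parse import unquote_plus


def handler(event, context):
    """Collect (bucket, key) pairs from an S3 event notification.

    Object keys arrive URL-encoded (a space becomes "+"), so decode
    them before using them in SDK calls such as get_object.
    """
    objects = []
    for record in event["Records"]:
        bucket = record["s3"]["bucket"]["name"]
        key = unquote_plus(record["s3"]["object"]["key"])
        objects.append((bucket, key))
    return objects


# Hypothetical notification for an upload of "reports/2019 Q1+final.csv".
sample_event = {
    "Records": [{
        "eventName": "ObjectCreated:Put",
        "s3": {
            "bucket": {"name": "my-bucket"},
            "object": {"key": "reports/2019+Q1%2Bfinal.csv"},
        },
    }]
}
print(handler(sample_event, None))
```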
However, when choosing S3 as the database for your service, make sure it is actually a good fit. S3 is an object store: it lacks database features such as locking and transactions, which can be a problem for services that need concurrent access to the data (a common pattern with Lambda, where many invocations may write at the same time). Make sure this is not the case for your service, now or in the foreseeable future. If you don’t have concurrent writes, or they don’t interfere with your service, congratulations: S3 might be a great fit.
DynamoDB is a serverless key-value and document database, and a popular choice for building serverless applications on AWS. When I say DynamoDB is serverless, it implies a few features:
DynamoDB is a pretty flexible database and works best with a single-purpose service. If you build a monolith with DynamoDB and use it as an all-purpose database, you are gonna have a bad time.
I am not saying this because it’s impossible, or because it will be harder than with other databases (it might be, but that’s not the point). It is because DynamoDB is easiest to use when you keep your data model as simple as possible. Adding many indexes for lots of different purposes (each of which probably needs only a small slice of the data) is an anti-pattern. Additional indexes are valid in some cases, but use them wisely.
DynamoDB recently announced transaction support, which was the last missing piece for making it the ultimate serverless database. However, there are cases where DynamoDB is not the best fit: when you need to run complex searches against your data, or when you hold raw analytics or time-series data, you might find DynamoDB hard to use. For these cases, use one of the other serverless databases.
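As an example of what transactions enable, the sketch below builds the parameters for a `TransactWriteItems` call that registers a user and reserves their email address atomically, so a duplicate email cannot slip in between two separate writes. The table name, the single `pk` key attribute and the `USER#`/`EMAIL#` key prefixes are hypothetical design choices. The function only builds the request, so it runs without AWS credentials.

```python
def build_registration_transaction(table: str, user_id: str, email: str) -> dict:
    """Build TransactWriteItems parameters that create a user and reserve
    their email in one all-or-nothing operation.

    Both puts carry attribute_not_exists(pk), so the whole transaction is
    rejected if either the user id or the email is already taken.
    Values use the low-level client's typed format ({"S": ...}).
    """
    return {
        "TransactItems": [
            {
                "Put": {
                    "TableName": table,
                    "Item": {
                        "pk": {"S": f"USER#{user_id}"},
                        "email": {"S": email},
                    },
                    "ConditionExpression": "attribute_not_exists(pk)",
                }
            },
            {
                "Put": {
                    "TableName": table,
                    "Item": {
                        "pk": {"S": f"EMAIL#{email}"},
                        "user_id": {"S": user_id},
                    },
                    "ConditionExpression": "attribute_not_exists(pk)",
                }
            },
        ]
    }


params = build_registration_transaction("app-table", "u-123", "ada@example.com")
# With boto3 installed and credentials configured, this would be sent as:
#   boto3.client("dynamodb").transact_write_items(**params)
```

Before transactions, enforcing this kind of uniqueness across two items required compensating logic in application code.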
AWS recently announced several serverless databases besides DynamoDB. Some are still in preview but are expected to be released this year. I will not go through each of them (since DynamoDB is a good fit for most use cases, at least the basic ones), but that does not mean you should not use them. It’s the other way around: you should know all of them well and use the one that best fits your use case. Always try to use the right tool for the job.
Serverless databases on AWS: choose from key-value (DynamoDB), graph (Neptune), time-series (Timestream) or ledger (QLDB).
API Gateway is the gateway to your application. It lets you manage your API easily and integrates with many backends that can handle the requests (Lambda is one of them!). It is REST-based (meaning you can use the different HTTP verbs with it) and has features that give you great control over your API, such as throttling and usage limits, or authorizers (IAM, Cognito or custom Lambda authorizers) that keep your authentication logic separate from your main business logic.
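With the Lambda proxy integration, API Gateway passes the whole HTTP request to your function as an event and expects a response object back. Here is a minimal sketch of such a handler; the `GET /users/{id}` route is hypothetical, and the user lookup is stubbed out.

```python
import json


def handler(event, context):
    """Handle GET /users/{id} behind an API Gateway Lambda proxy integration.

    The proxy event carries the HTTP method, path parameters, query string,
    headers and body; the response must provide statusCode, optional
    headers, and a *string* body.
    """
    if event.get("httpMethod") != "GET":
        return _response(405, {"error": "method not allowed"})
    # pathParameters is null (None) when the route has no parameters.
    user_id = (event.get("pathParameters") or {}).get("id")
    if not user_id:
        return _response(400, {"error": "missing id"})
    # A real service would look the user up in a data store here.
    return _response(200, {"id": user_id})


def _response(status, payload):
    """Shape a proxy-integration response with a JSON body."""
    return {
        "statusCode": status,
        "headers": {"Content-Type": "application/json"},
        "body": json.dumps(payload),
    }
```

Returning a dict as the body instead of a JSON string is a frequent cause of 502 errors with this integration.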
While it is a handy service, note that it adds some latency to your requests (compared to, for example, invoking a Lambda directly via the SDK), and it is not especially cheap (nor very expensive, but somewhere in the middle). API Gateway is an excellent choice for user-facing REST APIs. For internal synchronous calls between microservices, it depends on the use case: don’t just default to API Gateway, but check what benefits it offers over, for example, invoking the other service’s Lambda directly. Sometimes there will be some; sometimes there won’t.
This one is exciting. We already mentioned that you should use Lambda to transform, not transport. But say you are building a user-facing application: how should you send data to the user? Isn’t that precisely what Lambda + API Gateway does? The short answer is yes, but you can do it better.
AppSync is a serverless backend for mobile, web or any other API-consuming application. Unlike API Gateway, it exposes a GraphQL API for your service. To use AppSync, you define your data schema and then configure the data sources AppSync reads the data from. DynamoDB is the typical data source, but there are many other options (you can even put API Gateway behind it, for legacy APIs for example). If you need to transform the data before sending it to the user, you can use a Lambda resolver to process the request, and let AppSync take care of the rest.
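Here is a minimal sketch of a Lambda resolver in Python. It assumes AppSync’s direct Lambda resolver event, which includes `arguments` and an `info` object with `parentTypeName` and `fieldName`, and a hypothetical schema with a `getUser(id: ID!)` query returning a `User` with `id` and `fullName`. The in-memory `USERS` dict stands in for a real data source.

```python
# Hypothetical records standing in for a real data source.
USERS = {"1": {"id": "1", "first": "Ada", "last": "Lovelace"}}


def get_user(arguments):
    """Resolve Query.getUser, reshaping the stored record for the schema."""
    user = USERS.get(arguments["id"])
    if user is None:
        return None  # GraphQL returns null for a missing nullable field
    # Transform the stored shape into the shape the schema promises.
    return {"id": user["id"], "fullName": f'{user["first"]} {user["last"]}'}


# Map (parent type, field) pairs to resolver functions.
RESOLVERS = {("Query", "getUser"): get_user}


def handler(event, context):
    """Dispatch an AppSync Lambda resolver call by parent type and field."""
    info = event["info"]
    resolver = RESOLVERS.get((info["parentTypeName"], info["fieldName"]))
    if resolver is None:
        raise ValueError(f'no resolver for {info["parentTypeName"]}.{info["fieldName"]}')
    return resolver(event["arguments"])
```

The function only does the transformation; AppSync still handles the GraphQL parsing, validation and response shaping.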
Writing APIs that use API Gateway + Lambda to serve user requests can take significant effort that you could spend elsewhere. I highly recommend using AppSync for that. However, it does require some basic knowledge of GraphQL. If you are at the beginning of your serverless transition, you will probably want to take it step by step rather than introduce all these new technologies to your stack at once.
Love analytics? This one is just for you. Athena is a serverless tool built on Presto that lets you analyze massive amounts of data stored in S3 quickly and cheaply, using standard SQL queries. The data can be stored in many formats, including CSV, JSON or Parquet. Athena is also integrated with Glue, a serverless ETL service that is pretty cool on its own: you can run Athena queries against the tables in your Glue Data Catalog.
One thing to keep in mind when using Athena is that you pay not only for Athena itself (priced per terabyte of data scanned) but also for the S3 requests Athena makes. Some things to keep an eye on are:
Here are some more tips for working with Athena, by Manjeet Chayel and Mert Hocanin.
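Since Athena bills by the amount of data a query scans, a quick estimate shows why reducing the bytes scanned (for example with partitioning or columnar formats like Parquet) matters. The sketch below assumes a price of $5 per TB scanned (the long-standing us-east-1 price), scanned bytes rounded up to the nearest megabyte, and a 10 MB minimum per query. It treats 1 MB as 2^20 bytes and 1 TB as 2^40 bytes, and it ignores the separate S3 request charges. Check the current pricing page before relying on the numbers.

```python
MB = 1024 ** 2
TB = 1024 ** 4


def athena_query_cost(bytes_scanned: int, usd_per_tb: float = 5.0) -> float:
    """Estimate the Athena charge (USD) for one query.

    Assumptions (check current pricing for your region): usd_per_tb per
    terabyte scanned, scanned bytes rounded up to the next megabyte, and
    a 10 MB minimum per query. S3 request charges are not included.
    """
    billed_mb = max(10, -(-bytes_scanned // MB))  # integer ceiling division
    return billed_mb * MB / TB * usd_per_tb


# Hypothetical comparison: a full scan of 500 GB of raw CSV versus a query
# that reads only one partition and two Parquet columns (2 GB in total).
full_scan = athena_query_cost(500 * 1024 ** 3)
pruned = athena_query_cost(2 * 1024 ** 3)
print(f"full scan: ${full_scan:.2f}, pruned: ${pruned:.4f}")
```

Under these assumptions, the full scan costs about $2.44 while the pruned query costs roughly one cent, a difference that adds up quickly for frequently run queries.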
This is just the tip of the iceberg. While writing this post I had to leave a lot out, and it is still pretty packed. AWS Batch, CloudFront, Route 53, and the various IoT and machine learning services are just a few of the services I left out. Whenever you need to implement something in your application, check whether a service for it already exists. But now you are at least familiar with the most basic and common services you will use on your serverless journey.
Also, these are just the basic building blocks; now it’s up to you to start using them. Keep in mind that serverless architectures are quite different from traditional ones, and learning to design them well takes a while. Here is a serverless transformation example (again, by Yan Cui) and a good read about serverless design patterns by Jeremy Daly to give you a sense of what a serverless application might look like. You can also use tools like AWS’s new Well-Architected Tool to assist along the way.
Like any new technology, going serverless takes time and effort. But the good news is that you are not alone! The serverless community is ever-growing and really inclusive. Join the serverless forum on Slack or attend a ServerlessDays event, and you will surely find people who will help you take your first steps.
So long, and thanks for all the functions!