A while ago, we covered the invocation (trigger) methods supported by Lambda and the integrations available with the AWS catalog.
Now we’re launching a series of articles to correlate these integration possibilities with common serverless architectural patterns (covered by this literature review).
In Part I, we will cover the Orchestration & Aggregation category. Subscribe to our newsletter and stay tuned for the next parts of the series.
Purpose
A single API is used to aggregate multiple downstream resources.
Entry-point Lambda as a router to other Lambdas
Requests come from API Gateway, which triggers a Lambda function (L1) synchronously using the proxy integration model. L1 then triggers multiple other Lambda functions (L2x). The invocation from L1 to L2x could be synchronous or asynchronous, depending on the use case.
If the client expects data that comes from L2x, the invocation trigger should be synchronous. For write-only endpoints, when the client only expects a ‘200 - OK’ response, L1 can invoke L2x asynchronously and respond to the client immediately.
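The choice between the two invocation modes can be sketched in Python as follows. This is a minimal illustration rather than a reference implementation: the function names `l2-orders` and `l2-profile` and the rule "GET requests wait for results" are assumptions made for the example, and the Lambda client is passed in so the routing logic can be exercised without AWS access. The only AWS call used is boto3's Lambda `invoke`, with `InvocationType` set to `RequestResponse` (synchronous) or `Event` (asynchronous).

```python
import json


def route_request(lambda_client, function_names, payload, wait_for_results):
    """Invoke each downstream (L2x) function and build an API Gateway proxy response.

    lambda_client is any object exposing boto3's Lambda `invoke` call.
    """
    invocation_type = "RequestResponse" if wait_for_results else "Event"
    results = {}
    for name in function_names:
        response = lambda_client.invoke(
            FunctionName=name,
            InvocationType=invocation_type,
            Payload=json.dumps(payload).encode("utf-8"),
        )
        if wait_for_results:
            # Synchronous call: the downstream result arrives in the Payload stream.
            results[name] = json.loads(response["Payload"].read())
        else:
            # Asynchronous call: Lambda only acknowledges that the event was queued (202).
            results[name] = {"accepted": response["StatusCode"] == 202}
    return {"statusCode": 200, "body": json.dumps(results)}


def handler(event, context):
    """Entry point for L1. boto3 ships with the Lambda Python runtime."""
    import boto3

    is_read = event.get("httpMethod") == "GET"  # assumption: reads need data back
    return route_request(
        boto3.client("lambda"),
        ["l2-orders", "l2-profile"],  # hypothetical L2x function names
        {"path": event.get("path")},
        wait_for_results=is_read,
    )
```

For brevity, the synchronous branch calls the L2x functions one after another; a real router would probably issue them in parallel to keep the waiting time down.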
One disadvantage of synchronous invocations is that the L1 function continues to be billed for every millisecond it spends waiting for results from the L2x functions. See more in this Serverless Trilemma tutorial.
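A quick back-of-the-envelope calculation makes the double-billing effect concrete. The durations below are hypothetical, and the sketch deliberately ignores memory size, pricing, and invocation overhead; it only counts billed milliseconds.

```python
def billed_ms_with_router(l1_own_ms, l2_durations_ms, parallel=True):
    """Total billed duration when L1 waits synchronously for every L2x function."""
    if parallel:
        wait_ms = max(l2_durations_ms, default=0)  # L1 waits for the slowest call
    else:
        wait_ms = sum(l2_durations_ms)  # L1 waits for each call in turn
    l1_billed_ms = l1_own_ms + wait_ms  # L1 is billed while it sits idle waiting
    return l1_billed_ms + sum(l2_durations_ms)


def billed_ms_without_router(l2_durations_ms):
    """Total billed duration when no function waits on the L2x functions."""
    return sum(l2_durations_ms)


durations = [300, 300, 300]  # hypothetical L2x durations in milliseconds
print(billed_ms_with_router(20, durations))   # 20 + 300 + 900 = 1220
print(billed_ms_without_router(durations))    # 900
```

Under these assumed numbers, the waiting router adds 320 billed milliseconds on top of the 900 consumed by the L2x functions themselves.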
API Gateway as a router, client as aggregator
In some cases, the client could play the role of aggregator. Consider a frontend application under your control that requires data from multiple backend sources. A single API Gateway can be deployed with several endpoints, each routing to different L2x Lambda functions (also using the proxy integration model).
The client is then responsible for parallelizing calls to all required endpoints, collecting, and aggregating results.
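A client-side aggregator along these lines could look like the sketch below. It is written in Python to stay consistent with the other examples in this article; a browser frontend would do the same with its own HTTP tooling. The endpoint names and URLs are placeholders, and recording a failed call as an error entry (rather than failing the whole aggregation) is a design assumption.

```python
import json
import urllib.request
from concurrent.futures import ThreadPoolExecutor


def fetch_json(url, timeout=5):
    """Fetch one endpoint and decode its JSON body."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


def aggregate(endpoints, fetch=fetch_json):
    """Call every endpoint in parallel and merge the results under their names.

    endpoints maps a result name to a URL. A failed call is recorded as an
    error entry instead of aborting the whole aggregation.
    """
    def safe_fetch(url):
        try:
            return {"ok": True, "data": fetch(url)}
        except Exception as exc:  # network errors, timeouts, invalid JSON
            return {"ok": False, "error": str(exc)}

    names = list(endpoints)
    with ThreadPoolExecutor(max_workers=max(1, len(names))) as pool:
        results = list(pool.map(safe_fetch, [endpoints[name] for name in names]))
    return dict(zip(names, results))
```

Usage would be something like `aggregate({"orders": "https://api.example.com/orders", "profile": "https://api.example.com/profile"})`, where both URLs stand in for endpoints of the same API Gateway deployment.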
The main benefit of this approach is removing the double-billing factor: no Lambda function is billed while it sits idle waiting for another one. Watch this tutorial about the Serverless Trilemma to learn more.
Architectural concerns
1. Timeout limits
2. Concurrency limits
3. Potential failures
Purpose
Provide central, long-term data storage that is rarely modified and supports flexible, on-demand querying and transformation to suit different access patterns.
Solutions
Push-based approach
API Gateway and Lambda (using proxy integration) can serve as a passive gate to receive requests with information for the data lake. Authorized applications would send the data in JSON format through a REST endpoint. The Lambda function is responsible for packing the data and uploading it to an S3 bucket.
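The ingestion function described above might be sketched as follows. The bucket environment variable `DATA_LAKE_BUCKET`, the `raw/` prefix, and the date-based key layout are assumptions made for illustration; the S3 client is injected so the logic can be tested without AWS access. The only AWS call used is boto3's S3 `put_object`.

```python
import json
import uuid
from datetime import datetime, timezone


def build_object_key(now, record_id, prefix="raw"):
    """Build a date-partitioned key (Hive-style layout, an assumption of this sketch)."""
    return f"{prefix}/year={now:%Y}/month={now:%m}/day={now:%d}/{record_id}.json"


def ingest(s3_client, bucket, event, now=None, record_id=None):
    """Validate the request body and store it as one JSON object in S3."""
    try:
        record = json.loads(event.get("body") or "")
    except json.JSONDecodeError:
        return {"statusCode": 400, "body": json.dumps({"error": "body must be valid JSON"})}

    now = now or datetime.now(timezone.utc)
    record_id = record_id or str(uuid.uuid4())
    key = build_object_key(now, record_id)
    s3_client.put_object(
        Bucket=bucket,
        Key=key,
        Body=(json.dumps(record) + "\n").encode("utf-8"),
        ContentType="application/json",
    )
    return {"statusCode": 200, "body": json.dumps({"stored": key})}


def handler(event, context):
    """Lambda entry point for the proxy integration. boto3 ships with the runtime."""
    import os
    import boto3

    return ingest(boto3.client("s3"), os.environ["DATA_LAKE_BUCKET"], event)
```

Writing one small object per request keeps the function simple, but it produces many tiny files; batching records before upload is a common refinement.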
This bucket serves as the data lake storage. Amazon Athena is used to query the JSON data stored in S3 on demand. Athena can be accessed through JDBC or ODBC drivers (which opens the door to GUI analytics tools), an HTTP API, or the AWS CLI.
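Querying Athena through its API follows a start, poll, fetch cycle, which can be sketched as below. The database name, output location, and polling limits are parameters chosen for this example, and the client is injected for testability. The sketch uses boto3's Athena calls `start_query_execution`, `get_query_execution`, and `get_query_results`, and reads only the first page of results.

```python
import time


def run_athena_query(athena_client, sql, database, output_location,
                     poll_seconds=1.0, max_polls=30, sleep=time.sleep):
    """Start a query, poll until it finishes, and return rows as dictionaries."""
    started = athena_client.start_query_execution(
        QueryString=sql,
        QueryExecutionContext={"Database": database},
        ResultConfiguration={"OutputLocation": output_location},
    )
    query_id = started["QueryExecutionId"]

    for _ in range(max_polls):
        status = athena_client.get_query_execution(QueryExecutionId=query_id)
        state = status["QueryExecution"]["Status"]["State"]
        if state == "SUCCEEDED":
            break
        if state in ("FAILED", "CANCELLED"):
            raise RuntimeError(f"Athena query {query_id} ended in state {state}")
        sleep(poll_seconds)
    else:
        raise TimeoutError(f"Athena query {query_id} did not finish in time")

    # Only the first page is read here; larger result sets need NextToken paging.
    page = athena_client.get_query_results(QueryExecutionId=query_id)
    rows = page["ResultSet"]["Rows"]
    if not rows:
        return []
    # For SELECT queries, the first row of the first page holds the column names.
    header = [col.get("VarCharValue") for col in rows[0]["Data"]]
    return [
        dict(zip(header, [col.get("VarCharValue") for col in row["Data"]]))
        for row in rows[1:]
    ]
```

All values come back as strings (or `None` for nulls), so any type conversion is left to the caller.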
If needed, a second API endpoint and Lambda function could receive data requests, query Athena, and send the results back to the client. The benefits of this approach are:
Event-driven approach
If the primary data storage service supports event-driven triggers, the Lambda function can consume data for the data lake asynchronously. This is the case for DynamoDB and Aurora, for example.
DynamoDB Streams can trigger a Lambda function automatically as information is entered or modified in a table. Aurora (MySQL compatible only) can similarly trigger a Lambda function in an event-driven way.
The asynchronously triggered Lambda would then perform the same operations to store the data in S3.
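For the DynamoDB Streams case, the consuming function might look like the sketch below. It assumes the stream is configured to include new item images, keeps items in DynamoDB's typed JSON form rather than converting them, and leaves the choice of bucket and object key to the caller. The S3 client is injected for testability.

```python
import json


def records_to_json_lines(event):
    """Turn the inserted/modified items of a DynamoDB Streams event into JSON lines."""
    lines = []
    for record in event.get("Records", []):
        if record.get("eventName") not in ("INSERT", "MODIFY"):
            continue  # REMOVE events carry no NewImage
        image = record.get("dynamodb", {}).get("NewImage")
        if image is None:
            continue  # the stream view type does not include new images
        # The image stays in DynamoDB's typed JSON form, e.g. {"id": {"S": "42"}}.
        lines.append(json.dumps(image, sort_keys=True))
    return ("\n".join(lines) + "\n") if lines else ""


def archive_stream_batch(s3_client, bucket, event, key):
    """Upload one stream batch to S3 and return the number of archived items."""
    body = records_to_json_lines(event)
    if not body:
        return 0
    s3_client.put_object(Bucket=bucket, Key=key, Body=body.encode("utf-8"))
    return body.count("\n")
```

Each batch is stored as newline-delimited JSON, one item per line, which is the layout Athena's JSON support expects.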
Optimizing storage for fast and cheap reads
JSON is a universal and easy-to-use structured data format, but it is not optimized for large-scale data consumption. Athena bills by the amount of data scanned, so columnar formats such as Apache Parquet, which let a query read only the columns it needs, can make queries substantially faster and cheaper.
An EMR cluster could be used to transform JSON data into a columnar format, but Amazon Kinesis would probably fit better in a serverless stack like ours. Kinesis Data Firehose can convert incoming JSON data into columnar formats supported by Athena (Apache Parquet or Apache ORC). In this case, Firehose delivers the data directly into S3.
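On the producer side, sending records to a Firehose delivery stream could be sketched as below. The stream name is supplied by the caller, and the format conversion itself is assumed to be configured on the delivery stream, not in this code. The sketch uses boto3's Firehose `put_record_batch`, which accepts up to 500 records per call, and it only counts failed records instead of retrying them.

```python
import json


def send_to_firehose(firehose_client, stream_name, items, batch_size=500):
    """Send items as JSON records in batches; returns the number of failed records."""
    failed = 0
    for start in range(0, len(items), batch_size):
        chunk = items[start:start + batch_size]
        response = firehose_client.put_record_batch(
            DeliveryStreamName=stream_name,
            # The trailing newline keeps records separable if delivered unconverted.
            Records=[{"Data": (json.dumps(item) + "\n").encode("utf-8")} for item in chunk],
        )
        # Failed records are reported per batch and are not retried in this sketch.
        failed += response.get("FailedPutCount", 0)
    return failed
```

A production version would inspect the per-record responses and resend only the entries that failed.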
An API Gateway can also be used in front of Kinesis Firehose with the AWS-type integration, which is beneficial for security and concurrency control purposes.
Architectural concerns
1. Concurrency limits
2. Query scalability limits
3. Data access and security
This was the first article in a series about Lambda triggers and architectural design patterns. We’ve covered some patterns within the Orchestration & Aggregation category. In the coming weeks, we’ll cover more patterns in the same category, such as Fan-in/Fan-out, Queue-based Load Leveling, and Finite-state Machine.
Other categories of patterns will come as well, such as Event-Management, Availability, Communication, and Authorization patterns.
Subscribe to our newsletter to stay tuned for the next parts in this series!
In case you are looking for a solution to help you build well-architected serverless applications, Dashbird Insights cross-references your cloud stack against industry best practices to suggest performance and architectural improvements. You can try the service for free today, no credit card required.
Previously published at https://dashbird.io/blog/complete-guide-lambda-triggers-design-patterns-part-1/