This is part of a series of articles discussing strategies to implement serverless architectural design patterns. We continue to follow this literature review. Although we use AWS serverless services to illustrate concepts, they can be applied on other cloud providers as well.

In the previous article (Part 1) we covered the Aggregator and Data Lake patterns. In today's article, we'll continue in the Orchestration & Aggregation category, covering the Fan-in/Fan-out and Queue-based load leveling patterns.

We previously explored these concepts in Crash Course on Fan-out & Fan-in with AWS Lambda and Why Serverless Apps Fail and How to Design Resilient Architectures, in case you'd like to dig deeper into some more examples.

Pattern: Fan-in / Fan-out

Purpose - Accomplish tasks that are too big or too slow for a single serverless function to handle.

Solutions - There are basically four major steps in a Fan-in/Fan-out architecture:

- Source of task: everything starts with a big task or a large list of tasks
- Ventilator: the entry point of the task and where the "Fan-out" process starts
- Processing: where tasks are actually accomplished
- Consolidation: brings together the results from all task processing (a.k.a. "Fan-in")

We'll cover possible solutions for each of these stages below. In some cases, we use examples to illustrate our points. They are hypothetical scenarios and not necessarily an implementation recommendation for any particular case.

Task Source

Tasks can come from virtually any AWS Lambda trigger, either synchronous or asynchronous. This includes traditional invoke API calls, or integrations with API Gateway, DynamoDB Streams, etc.

For tasks that involve processing large amounts of data, it's common to use storage integrations such as S3, since Lambda invocation payload size limits are relatively low (256 KB for asynchronous and 6 MB for synchronous invocations). For example, to process a 1 GB video file, it can first be stored in S3. The S3 PUT operation can automatically trigger a Lambda function. This doesn't violate Lambda limits because the invocation payload only carries the S3 object key. The Lambda function can then GET the file from S3 for processing.

AWS also recently added support for EFS (Elastic File System) within Lambda functions, which is an alternative to S3 for storing tasks backed by large amounts of data. The criteria to choose between the two services go beyond the purposes of this article.

Ventilator

This is where the Fan-out process really starts. An entry-point Lambda function receives a big task (or a large list of tasks) and is responsible for distributing it to multiple processors. Tasks can be distributed individually or grouped in small batches.

In the example of a 1 GB video file, let's say we need to perform machine learning analysis on video frames, or extract audio from the video. The Ventilator function could break the video down into 200 pieces of 5 MB each. Each of these smaller video sections would then be handed to an external API, a cluster of servers, or even a second Lambda function to conduct the actual analysis. This obviously rests on the premise that splitting the video is much faster than performing the analysis we are interested in.
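To make the Ventilator step more concrete, here is a minimal Python sketch of what such a function could look like. It rests on assumptions that are not part of the article: chunk objects land in a hypothetical CHUNKS_BUCKET, the Processor Lambda name comes from PROCESSOR_FUNCTION, and the file is split by plain byte ranges (real video splitting would need keyframe-aware tooling such as ffmpeg).

```python
import json
import os
import uuid

import boto3

s3 = boto3.client("s3")
lambda_client = boto3.client("lambda")

CHUNKS_BUCKET = os.environ["CHUNKS_BUCKET"]            # hypothetical bucket for the 5 MB pieces
PROCESSOR_FUNCTION = os.environ["PROCESSOR_FUNCTION"]  # hypothetical Processor Lambda name


def split_into_chunks(bucket, key, chunk_size_mb=5):
    """Read the source object in ranged GETs and store each piece as its own S3 object."""
    size = s3.head_object(Bucket=bucket, Key=key)["ContentLength"]
    chunk_size = chunk_size_mb * 1024 * 1024
    chunk_keys = []
    for index, start in enumerate(range(0, size, chunk_size)):
        end = min(start + chunk_size, size) - 1
        body = s3.get_object(Bucket=bucket, Key=key, Range=f"bytes={start}-{end}")["Body"].read()
        chunk_key = f"chunks/{key}/{index:05d}"
        s3.put_object(Bucket=CHUNKS_BUCKET, Key=chunk_key, Body=body)
        chunk_keys.append(chunk_key)
    return chunk_keys


def handler(event, context):
    """Ventilator: triggered by the S3 PUT, splits the object and fans out to Processors."""
    record = event["Records"][0]["s3"]
    source_bucket = record["bucket"]["name"]
    source_key = record["object"]["key"]

    correlation_id = str(uuid.uuid4())  # shared ID to trace the whole Fan-in/Fan-out run
    chunk_keys = split_into_chunks(source_bucket, source_key)

    # Fan-out: one asynchronous ("Event") invocation per chunk.
    for index, chunk_key in enumerate(chunk_keys):
        lambda_client.invoke(
            FunctionName=PROCESSOR_FUNCTION,
            InvocationType="Event",
            Payload=json.dumps({
                "correlationId": correlation_id,
                "chunkIndex": index,
                "bucket": CHUNKS_BUCKET,
                "key": chunk_key,
            }),
        )

    return {"correlationId": correlation_id, "processCount": len(chunk_keys)}
```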
Processing

The 200 Fan-out requests coming from the Ventilator can be dispatched concurrently to multiple processors. If it takes, say, 1 minute to process each 1 MB of video, the entire task can be accomplished in roughly 5 minutes (1 minute per MB * 5 MB per section).

If the entire video were processed sequentially on a single node, it would take around 1,000 minutes (over 16 hours). That is clearly not possible in Lambda, due to its timeout limitations.

You might think this will also reduce total costs, since Processor Lambdas could be configured with much less memory than the large task would require. But in most Fan-in/Fan-out use cases the workload is CPU-bound. Reducing memory size also reduces CPU power, which in turn increases processing time. Since Lambda is billed not only by memory size but also by duration, the gains from a lower memory allocation can be offset by the longer duration. To learn more about this, I recommend reading How to Optimize Lambda Memory and CPU.

Consolidation

In some cases, it will be necessary to bring the results from all processors together. Since each Fan-out process will probably run independently and asynchronously, it's difficult to coordinate the delivery of results without an intermediary storage mechanism. For that we can use:

- Queues, topics or event buses
- Stream processors
- Databases or datastores

In the AWS ecosystem, for the first group we have SQS (queues), SNS (topics) and EventBridge. In the second, Kinesis comes in different flavors depending on the type of data and requirements we have. Finally, in the third group there are S3 (object storage), Aurora Serverless (relational database) and DynamoDB (key-value store). Again, choosing what best suits each use case is beyond the scope of our current discussion.

Each of these services can act as temporary storage for processing results. A third Lambda function, which we'll call the Consolidator, can later pick up all the results and assemble them in a coherent way for the task at hand.

But how does the Consolidator know when everything is ready? One approach is to have the Ventilator store a task summary in a DynamoDB table, for example. It could store an item such as the one sketched below.
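The item shape below is a hypothetical stand-in (the table, key and attribute names are illustrative, not taken from the article), shown together with the atomic counter update each Processor could run against it:

```python
import boto3

dynamodb = boto3.client("dynamodb")
TABLE_NAME = "fan-out-tasks"  # hypothetical table name

# Task summary written by the Ventilator when it fans the work out:
# {
#   "correlationId": "5f8d7a...",  # partition key, shared by all Processors
#   "processCount": 200,           # total number of chunks dispatched
#   "processReady": 0              # completed chunks, incremented by each Processor
# }

def mark_chunk_done(correlation_id):
    """Called by a Processor when its chunk is finished.

    ADD performs an atomic counter update, so concurrent Processors cannot lose increments.
    """
    return dynamodb.update_item(
        TableName=TABLE_NAME,
        Key={"correlationId": {"S": correlation_id}},
        UpdateExpression="ADD processReady :one",
        ExpressionAttributeValues={":one": {"N": "1"}},
        ReturnValues="UPDATED_NEW",
    )
```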
Each processor increments the processReady integer when its task is finished. Since DynamoDB supports atomic increments, this operation is safe for our use case.

The Consolidator function can be invoked on a scheduled basis, using CloudWatch Rules, to check whether processReady == processCount. Or we can use DynamoDB Streams to automatically invoke the Consolidator upon item updates (which may not be efficient, since the Consolidator will have to ignore 199 invocations out of 200).

One disadvantage of this architecture concerns monitoring and debugging. A good practice is to generate a unique correlationID in the Ventilator and pass it to each Processor. All Lambda functions should log the same correlationID; this way it's possible to track down every request associated with a single Fan-in/Fan-out process.

Monitoring services that are tailored to serverless also allow us to create "projects" representing multiple Lambdas, which simplifies issue tracking and resolution in distributed architectures such as Fan-in/Fan-out.

Pattern: Queue-based load leveling

Purpose - Decouple highly variable workloads from downstream resources with expensive or inelastic behavior.

For example: consider an API endpoint whose incoming data is processed by a Lambda function and stored in a DynamoDB table using Provisioned capacity mode. The concurrency level of the API is usually low, but occasionally, for short and unpredictable periods of time, it may spike 10x to 15x higher. DynamoDB auto-scaling usually can't cope with such a rapid increase in demand, which leads to ProvisionedThroughputExceededException errors.

Solutions

The Queue-based Load Leveling pattern is a great candidate to solve this type of problem. It is highly recommended for workloads that are:

- Write-intensive
- Tolerant to an eventually-consistent data model
- Highly variable
- Subject to unpredictable spikes in demand

Implementing the solution is straightforward. In the example above, instead of having the Lambda function store data directly in DynamoDB, it pushes the data to a Queue first. It responds with a 200 OK to the consumer, even though the data hasn't reached its final destination (DynamoDB) yet. A second Lambda function is responsible for polling messages from the Queue at a predetermined concurrency level aligned with the Provisioned Capacity allocated to the DynamoDB table (a minimal code sketch of this flow is included at the end of the article).

Risk Mitigation

The risks associated with this pattern are mainly data consistency and loss of information.

Having a Queue in front of DynamoDB means the data is never "committed" to the datastore immediately after the API client submits it. If the client writes and then reads right away, it might still get stale data from DynamoDB. This pattern is only recommended in scenarios where this is not an issue.

During peaks, the Queue may grow and information can be lost if we don't take the necessary precautions. Messages have a retention period, after which they'll be deleted by the Queue, even if not yet read by the second Lambda. To avoid this situation, there are three areas of caution:

- Enable centralized monitoring of your inventory of queues to receive alerts about the ones growing too rapidly, before messages are at risk of being lost
- Set a retention period that leaves comfortable time for the second Lambda to catch up in the event of unexpectedly long spikes
- Configure a Dead Letter Queue to give extra room for processing - make sure to observe a caveat when setting an appropriate retention period for this additional queue

Important considerations

Read-intensive workloads can't benefit from Queue-based load leveling, basically because we must access the downstream resource to retrieve the data needed by the consumer. Reserve this pattern for write-intensive endpoints.

As discussed before, Queue-based load leveling is not recommended for systems where strong data consistency is expected.

The pattern can only deliver good value in scenarios where demand is highly variable, with spiky and unpredictable behavior. If your application has steady and predictable demand, it's better to adjust your downstream resources to it. In the case of DynamoDB, the auto-scaling feature might be enough, or increasing the Provisioned Throughput might be all that's needed.

Wrapping up

In the next articles, we'll discuss more patterns around Availability, Event Management, Communication and Authorization.

Implementing a well-architected serverless application is not an easy feat. To support you on that journey, Dashbird launched the first serverless Insights engine. It runs live checks on your infrastructure and cross-references them against industry best practices, giving early warnings to prevent failures and indicating areas that can benefit from an architectural improvement. You can test the service with a free account today, no credit card required.

Also published on: https://dashbird.io/blog/complete-guide-lambda-triggers-design-patterns-part-2/
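Closing out, here is the minimal sketch of the Queue-based Load Leveling flow referenced in the Solutions section above. It assumes an SQS queue as the buffer, an incoming JSON payload carrying an id field, and QUEUE_URL / TABLE_NAME environment variables; all of these are illustrative assumptions, and concerns like validation, retries and DLQ wiring are left out.

```python
import json
import os

import boto3

sqs = boto3.client("sqs")
dynamodb = boto3.client("dynamodb")

QUEUE_URL = os.environ["QUEUE_URL"]    # hypothetical buffering queue
TABLE_NAME = os.environ["TABLE_NAME"]  # hypothetical DynamoDB table


def api_handler(event, context):
    """API-facing Lambda: buffers the write in SQS and answers immediately."""
    sqs.send_message(QueueUrl=QUEUE_URL, MessageBody=event["body"])
    # 200 is returned even though the item is not in DynamoDB yet:
    # eventual consistency is the trade-off this pattern accepts.
    return {"statusCode": 200, "body": json.dumps({"status": "accepted"})}


def queue_consumer(event, context):
    """Second Lambda, triggered by an SQS event source mapping.

    Its batch size and reserved concurrency are what actually level the load
    to match the table's Provisioned Capacity.
    """
    for record in event["Records"]:
        item = json.loads(record["body"])
        dynamodb.put_item(
            TableName=TABLE_NAME,
            Item={
                "id": {"S": str(item["id"])},
                "payload": {"S": json.dumps(item)},
            },
        )
```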