My team has a Lambda function that is scheduled to run every hour. It succeeds about 90% of the time but fails about 10% of the time due to network errors. When it fails, it does so silently, so we had to check its logs regularly and manually make up for whatever was missing. This was quite inconvenient, and we wanted a better way: the Lambda should automatically retry a few times after it fails, and if it still fails after all attempts, we should be notified by email.

We achieved this using AWS Step Functions. It saved us a lot of time, and we like how it simplifies the logic and reduces the amount of code (and bugs) that we would otherwise have to write.

This post will show you how to do that. We will first see how to create a step function in the AWS console, and then how to do the same through an infrastructure-as-code tool such as Serverless.

Create a Step Function in AWS Console

1. Add the Lambda

- Go to "AWS console" > "Step Functions" and click on "Create state machine".
- Select "Design your workflow visually", choose the "Standard" type, and hit "Next".
- In Workflow Studio, drag a "Lambda: Invoke" block into the first state.
- Under "Configuration" > "API Parameters" > "Function name", choose the target lambda in the dropdown.
- Under "Configuration" > "Additional configuration" > "Next state", choose "Go to end".

2. Add a Retrier

A retrier defines a set of retry rules such as the maximum number of retry attempts and the retry interval. A retrier reruns the lambda after it fails with a certain error.

Step Functions allows you to add multiple retriers to handle different errors. To keep it simple, we will add one retrier that runs on all errors.

- Under "Error handling" > "Retry on errors", click "Add new retrier".
- Under "Retrier # 1" > "Errors", select States.ALL. This means this retrier will apply to all errors.
- Set the "Interval" to 5 seconds, "Max attempts" to 2, and the "Backoff rate" to 1.

Interval and max attempts are easy to understand; the backoff rate determines how the retry interval increases from one retry to the next.
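
The way these three settings interact can be sketched in a few lines of JavaScript. This is only an illustration of the arithmetic, assuming the wait before each retry is the previous wait multiplied by the backoff rate; `retryDelays` is a hypothetical helper written for this post, not part of any AWS SDK.

```javascript
// Hypothetical helper (not an AWS API): lists the wait, in seconds,
// before each retry for a given retrier configuration.
function retryDelays(intervalSeconds, maxAttempts, backoffRate) {
  const delays = [];
  for (let retry = 0; retry < maxAttempts; retry++) {
    // The first retry waits intervalSeconds; each later retry waits
    // backoffRate times longer than the one before it.
    delays.push(intervalSeconds * Math.pow(backoffRate, retry));
  }
  return delays;
}

// Settings used in this post: 5-second interval, 2 attempts, backoff rate 1.
console.log(retryDelays(5, 2, 1)); // [ 5, 5 ]
```
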
For example, if the interval is 5 seconds and the backoff rate is 2, the lambda will wait 5 seconds before retrying after the first failure, 10 seconds after the second failure, 20 seconds after the third, and so on. With a backoff rate of 1, every retry waits the same 5 seconds.

3. Add a Catcher

A catcher defines a set of error handling rules that apply if the lambda still fails after all retries. I want to send an email with AWS Simple Notification Service (SNS) if all retries fail.

- Under "Error handling" > "Catch errors", click "Add a new catcher".
- Under "Catcher # 1" > "Errors", select States.ALL. This means the catcher can be triggered by all errors.
- Under "Catcher # 1" > "Fallback state", click "Add new state". This will create a new error handling branch in the workflow.
- Search for SNS in the search bar on the left, and drag an "Amazon SNS Publish" block into the fallback state.
- Next, click on the "SNS: Publish" block to edit it.
- Under "Configuration" > "API Parameters" > "Topic", select a topic. For example, the HelloFuncFailed topic here will send an email to me. See this documentation on how to set up SNS to send emails.

Now that we have added the Lambda and defined the retry and catch rules in the step function, you can click "Next" to review the definition, and then create the state machine.

Deploy Step Function with Serverless

To make it easier to share and maintain the step function configuration, you can also deploy the same step function with an infrastructure-as-code tool. Below is the Serverless definition for the step function that we created above.

    # serverless.yml
    service: myService

    provider:
      name: aws
      runtime: nodejs12.x

    functions:
      hello:
        handler: hello.handler # required, handler set in AWS Lambda
        name: hello-function

    stepFunctions:
      stateMachines:
        helloStepFunc:
          name: helloStepFunc
          definition:
            StartAt: HelloLambda
            States:
              HelloLambda:
                Type: Task
                Resource:
                  Fn::GetAtt: [hello, Arn]
                End: true
                Retry:
                  - ErrorEquals:
                      - States.ALL
                    IntervalSeconds: 5 # 5 seconds
                    MaxAttempts: 2 # same value as in the console setup
                    BackoffRate: 1
                Catch:
                  - ErrorEquals:
                      - States.ALL
                    Next: SNSNotification
              SNSNotification:
                Type: Task
                Resource: arn:aws:states:::sns:publish
                Parameters:
                  Subject: Hello Lambda failed after retries
                  Message.$: $
                  TopicArn: xxx:HelloFuncFailed # your topic arn here
                End: true

    plugins:
      - serverless-step-functions # need to run $npm install --save-dev serverless-step-functions

The above template assumes that the lambda code is defined in a hello.js file in the same directory. You can also refer to an existing Lambda by its Amazon Resource Name (ARN). See the Serverless documentation for more details.

Conclusion

So this is how to use AWS Step Functions to add retry-on-error and notification logic to a lambda function. You can create a step function through the AWS console or create one using an infrastructure-as-code tool such as Serverless.

Step Functions has saved my team lots of time. It simplifies our error handling logic and allows us to implement a set of rather complex rules with a few lines of code. Hopefully, there is something you can take away and apply to your project. And please let me know if you have any questions. 🙂
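
One footnote on the hello.js file that serverless.yml refers to: the retrier and catcher only react when the task actually fails, so the handler should let errors propagate instead of catching and hiding them. Below is a minimal sketch of that idea. The `doHourlyWork` function and the `simulateFailure` flag are hypothetical stand-ins for the real job, which this post does not show.

```javascript
// hello.js (sketch). doHourlyWork is a hypothetical placeholder for the
// real hourly job; simulateFailure only exists to imitate a network error.
async function doHourlyWork(event) {
  if (event && event.simulateFailure) {
    throw new Error("network error");
  }
  return { status: "ok" };
}

// Lambda entry point, referenced as hello.handler in serverless.yml.
const handler = async (event) => {
  // No try/catch here on purpose: a thrown error fails the invocation,
  // which is what allows the state machine to retry or fall back.
  return doHourlyWork(event);
};

exports.handler = handler;
```

If the handler caught the error and returned normally, the step function would treat the run as a success and neither the retries nor the email would be triggered.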