In my last post we talked about how to implement semaphores with Step Functions. Another common scenario is handling errors from a block of states, the way we’re used to with a try-catch block.
try {
  step1()
  step2()
  step3()
} catch (States.Timeout) {
  ...
} catch (States.ALL) {
  ...
}
With Step Functions, you can use Retry and Catch clauses to handle errors from Task states. There are a number of predefined system errors, and you can also handle custom errors that are thrown by your Lambda functions.
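As a rough sketch of what these clauses look like on a single Task state: the fragment below retries timeouts with exponential backoff, routes a custom error to its own handler, and sends everything else to NotifyError. The MyCustomError name, the HandleCustomError state and the retry numbers are assumptions made up for illustration, and the Resource is left elided.

```json
{
  "Step1": {
    "Type": "Task",
    "Comment": "Illustrative only: MyCustomError, HandleCustomError and the retry settings are assumptions.",
    "Resource": "...",
    "Retry": [
      {
        "ErrorEquals": [ "States.Timeout" ],
        "IntervalSeconds": 2,
        "MaxAttempts": 3,
        "BackoffRate": 2.0
      }
    ],
    "Catch": [
      { "ErrorEquals": [ "MyCustomError" ], "Next": "HandleCustomError" },
      { "ErrorEquals": [ "States.ALL" ], "Next": "NotifyError" }
    ],
    "Next": "Step2"
  }
}
```

Catchers are evaluated in order, so the specific error comes first. States.ALL has to appear on its own and in the last entry of the list.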
To emulate the try-catch block above, you can add the same Catch clause to each of the Task states.
"Catch": [{"ErrorEquals": [ "States.ALL" ],"Next": "NotifyError"}]
However, this approach requires you to add the same boilerplate to every Task state. As your error handling strategy, or the state machine itself, becomes more complex, this becomes a maintenance headache.
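To see how quickly the repetition adds up, here is a sketch of the States section with the same Catch clause copied onto all three steps. The step names follow the try-catch example, and the NotifySuccess and NotifyError targets are assumed to be defined elsewhere in the state machine.

```json
{
  "Step1": {
    "Type": "Task",
    "Comment": "Sketch only: the identical Catch clause is repeated on every Task state.",
    "Resource": "...",
    "Catch": [
      { "ErrorEquals": [ "States.ALL" ], "Next": "NotifyError" }
    ],
    "Next": "Step2"
  },
  "Step2": {
    "Type": "Task",
    "Resource": "...",
    "Catch": [
      { "ErrorEquals": [ "States.ALL" ], "Next": "NotifyError" }
    ],
    "Next": "Step3"
  },
  "Step3": {
    "Type": "Task",
    "Resource": "...",
    "Catch": [
      { "ErrorEquals": [ "States.ALL" ], "Next": "NotifyError" }
    ],
    "Next": "NotifySuccess"
  }
}
```

Any change to the error handling, such as adding a second catcher, has to be made in three places here.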
Fortunately, both Retry and Catch can be used on Parallel states too!
Even if you’re not looking to perform tasks in parallel, you can still use a Parallel state to simplify your error handling.
In this case, if I wrap Step1, Step2 and Step3 into a single branch inside a Parallel state, then I can catch unhandled errors from any of the steps with one Catch clause.
{
  "StartAt": "Try",
  "States": {
    "Try": {
      "Type": "Parallel",
      "Branches": [
        {
          "StartAt": "Step1",
          "States": {
            "Step1": {
              "Type": "Task",
              "Resource": "...",
              "Next": "Step2"
            },
            "Step2": {
              "Type": "Task",
              "Resource": "...",
              "Next": "Step3"
            },
            "Step3": {
              "Type": "Task",
              "Resource": "...",
              "End": true
            }
          }
        }
      ],
      "Catch": [
        {
          "ErrorEquals": [ "States.ALL" ],
          "Next": "NotifyError"
        }
      ],
      "Next": "NotifySuccess"
    },
    ...
  }
}
One caveat with this approach is that a Parallel state wraps the output from its branches into an array, with one element per branch. So if a subsequent state, such as the NotifySuccess state in the example above, wants to use the output from Step3, it has to take that into account.
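To make the wrapping concrete, the snippet below contrasts what Step3 returns with what the Try state passes on. It is not a state machine definition, just an illustration of the two shapes; the orderId field and its value are hypothetical.

```json
{
  "Comment": "Illustration of data shapes only; the orderId payload is hypothetical.",
  "Step3Output": { "orderId": "123" },
  "TryOutput": [ { "orderId": "123" } ]
}
```

With a single branch the array always has exactly one element, so the output of Step3 sits at index 0.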
Alternatively, you can add a Pass state to unwrap the array, like this:
"UnwrapOutput": {"Type": "Pass","InputPath": "$[0]","Next": "NotifySuccess"}
This technique is useful when you want to apply the same error handling to a block of states without resorting to boilerplate.
You can add a Retry clause to the Parallel state to retry the entire block (i.e. from Step1, even if Step3 errored). You can also add Retry and Catch clauses to individual states to mix things up.
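Here is a sketch of how the two levels might be combined: Step2 retries its own timeouts, and the Parallel state retries the whole block once before falling through to the Catch clause. All retry settings are assumptions chosen for illustration.

```json
{
  "Try": {
    "Type": "Parallel",
    "Comment": "Sketch only: the retry settings at both levels are assumptions.",
    "Branches": [
      {
        "StartAt": "Step1",
        "States": {
          "Step1": { "Type": "Task", "Resource": "...", "Next": "Step2" },
          "Step2": {
            "Type": "Task",
            "Resource": "...",
            "Retry": [
              { "ErrorEquals": [ "States.Timeout" ], "MaxAttempts": 2 }
            ],
            "Next": "Step3"
          },
          "Step3": { "Type": "Task", "Resource": "...", "End": true }
        }
      }
    ],
    "Retry": [
      { "ErrorEquals": [ "States.ALL" ], "IntervalSeconds": 5, "MaxAttempts": 1 }
    ],
    "Catch": [
      { "ErrorEquals": [ "States.ALL" ], "Next": "NotifyError" }
    ],
    "Next": "NotifySuccess"
  }
}
```

An error only reaches the Parallel state's Retry once the failing step's own retries are exhausted, and the Catch clause only applies after the block-level retries run out. Keep in mind that a block-level retry reruns Step1 and Step2 as well, so the steps should be safe to repeat.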
So that’s it, a nice and short post to share with you a simple technique that I have found useful with Step Functions.
I have been spending a fair bit of time with Step Functions and enjoying the service. Let me know in the comments if you have use cases that you find difficult to implement with Step Functions. I would love to hear what others are doing with it.
Hi, my name is Yan Cui. I’m an AWS Serverless Hero and the author of Production-Ready Serverless. I have run production workloads at scale in AWS for nearly 10 years and I have been an architect or principal engineer in a variety of industries ranging from banking, e-commerce and sports streaming to mobile gaming. I currently work as an independent consultant focused on AWS and serverless.
You can contact me via Email, Twitter and LinkedIn.