One thing people will notice when migrating to using cloud services are intermittent errors occur at a higher rate than you’re used to when running on premise.
These errors are often called transient errors and can be mitigated by just retrying the operation. While some operations might be fine to retry in all cases i.e. fetching data, others like creating orders, deducting money, etc. might not or at least need a little more fine grain control. Also, some errors makes no sense to retry and as a result it’s better to fail fast.
Writing the code to handle can easy turn to spaghetti code and it isn’t code you want to copy paste as it’s easy to get wrong and hard to maintain — so you really do want a framework to assist you with these scenarios.
This is how the authors describe Polly:
Polly is a .NET resilience and transient-fault-handling library that allows developers to express policies such as Retry, Circuit Breaker, Timeout, Bulkhead Isolation, and Fallback in a fluent and thread-safe manner.
Which really sums up what it does, it lets you create policies using a fluent API, policies which then can be reused and can range from being generic to target very specific scenarios.
Polly is available as NuGet package and works with both old .NET and shiny .NET Core, which will be what I’m using in this post.
Also worth noting Polly is open source available on GitHub.Polly’s also member of the .NET Foundation.
Defining a policy with Polly is really straightforward in the Polly
namespace you Policy
class which is what you utilize to create your policies
Above we’ve via the Handle<TException>
method defined a policy that if a TimeoutException
occurs it’ll wait 3 seconds and retry the operation 5 times before failing on the 6th time which is defined using the WaitAndRetry
method.
Because the delay of each retry attempt in this case is retrieved using a Func<int, TimeSpan>
delegate it’s really easy to implement more advanced scenarios like a sliding delay
to let our policy handle more exceptions we can use one to manyOr<TException>
method after theHandle<TException>
call
you can also make your policy even more granular by inspecting the exception with an exception predicate
In this case we’ll fail fast if it’s a .tmp
file that’s not found. You can also set policies based on result values, besides retries there’s also support for circuit-breaker, timeout, fallback, composing policies together and more, so the policy “engine” is very flexible.
We now have our policy in place and want to put it into good use, here the policies Execute
method comes in play
You can also create asynchronous policies to and then we can utilize the policies ExecuteAsync
method
It’s almost too good to be true, there’s a lot happening under the hood — but in my opinion “hidden” under a very easy to understand and maintainable API.
When is comes to SQL server there’s a few errors known to be safe to retry, so we explicitly look for those, so if we begin with it could look something like this:
where SqlRetryCount
is just a constant for how many retries to do and ExponetialBackoff
if a method that exponentially increases the delay between each attempt
TimeoutException:s
will always be retried but SqlExceptions will be passed to AnyRetryableError
method for assessment if it’s retryable or not
AnyRetryableError
will iterate all errors and with the RetryableError
method check if it’s an error known to be retriable and here is where the magic happens
we will switch on SqlException error number and use a few known constants to determine if it should fail fast or retry.
In this case, I’m going to do a simple API that extends SqlCOnnection
with WithRetry
methods, i.e. open SQL connection
giving us an open sync and async method, which in its simplest form usage could look something like this
combining with a simple SQL to .NET object mapper like Dapper results in pretty clean code
the above code will open the sql connection using the retry policy, but the query will be executed without retry, which could be what you want for some operations, but for a select it’s often safe/what you want to retry the whole operation. For that we add an overload that lets us execute code within the policy boundary
this will open the connection and then invoke and return the result of Func
you pass to it all within the scope of policy and just small refactoring of the calling code
and async variant usage could look something like this
Polly is really powerful, yet really easy to get started with and also fairly easy to retrofit into an existing application — so I definitely think you should take it for a spin.
I’ve created a complete sample repository on GitHub with the code from this post, thanks for reading! ❤
azurevoodoo/AzureSQLTransientHandling_AzureSQLTransientHandling - Handling Azure SQL transient errors in .NET Core using Polly_github.com