KJ Jones

@KJ.Jones

Solving Serverless Cold Starts with Advanced Tooling

November 4th 2018

Taking one of the biggest downsides of serverless and making it work for you.

What are Cold Starts?

If you spend a lot of time building serverless functions, you might start to think they’re too good to be true. The code is easier to manage and maintain. Azure’s Durable Functions set you up for success by giving you complicated plumbing right out of the box. Plus, they’re absurdly cheap.

My inner skeptic has been wondering: what’s the catch? Something has to give. No doubt, serverless is not a perfect world, and there are certainly downsides to consider. One of the biggest is cold starts.

Cold start is a term used to describe the phenomenon that applications which haven’t been used a while take longer to start up. (MSDN)

The above MSDN article does a great job of explaining what cold starts are, how they work, and why they’re necessary. To help keep costs low, a serverless app that has been idle for some period of time gets deallocated. This means a user’s first call to a deallocated function may have a delay while the function loads.

Cold starts work a little differently in each serverless environment (Azure, AWS, Google Cloud), but the concept is the same. Mikhail Shilkov has a great series of articles on cold starts, including one that compares cold start behavior across the platforms.

[Figure: What Happens During a Cold Start (From MSDN)]

Let’s face it, for most applications, the cold start problem is more of a minor inconvenience. A rare and occasional delay of a few seconds isn’t too big of a deal for most apps. But for some use cases, it could be an absolute deal breaker. So what can we do?

Solutions

The general consensus is that to reduce cold starts, you make sure your app never goes “cold” in the first place. This is usually done by creating a separate function that makes a request to your app every X minutes. I’ll propose a very similar pattern, but one that could provide some added business value.

A Timer Function App

The tried and true approach is to create a separate function app that runs on a timer. You set the timer to fire more often than the platform’s idle timeout, so your app never gets the chance to go cold. Azure Functions, for example, “deallocate resources after roughly 20 minutes of inactivity*”.

[Figure: Azure Function Cold Starter]

So it may be good to set the timer to run every 15 minutes to ensure the app never goes cold. Or more often, to be sure! Once again, I’ll link to Mikhail Shilkov, who has another great article showing the effectiveness of this approach. It’s not perfect, but it’s a great alternative to rolling the dice and hoping your users aren’t affected by cold starts.
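As a rough illustration of the timer approach, here is a minimal Python sketch of the request a keep-warm function might make on each tick. The `warm` helper and its parameters are hypothetical names for this example, not part of any Azure SDK. Scheduling is left to the platform; in Azure Functions, a timer trigger with an NCRONTAB expression such as `0 */15 * * * *` should fire every 15 minutes.

```python
import time
import urllib.request


def warm(url, opener=urllib.request.urlopen, timeout=10.0):
    """Send one GET to `url` and return (HTTP status, seconds elapsed).

    `opener` is injectable so the helper can be exercised without a network.
    With the default opener, 4xx/5xx responses raise urllib.error.HTTPError.
    """
    start = time.perf_counter()
    with opener(url, timeout=timeout) as response:
        response.read()  # drain the body so the full round trip is timed
        status = response.status
    elapsed = time.perf_counter() - start
    return status, elapsed
```

Calling `warm(...)` against a cheap route on your own app from the timer function is enough to reset the idle clock. Logging the elapsed time also gives you a crude record of whether the ping itself hit a cold instance.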

A Third Party Monitor

If cold starts are a real problem, then this is a pretty mission-critical application. And for those apps, you need all the assurance you can get that your app is not only performing well, but is up and running in the first place.

Serverless functions have pretty high uptimes. Azure will give you some credits if uptime for functions dips below 99.95%. But that’s not to say they never go down, or slow down. And for mission-critical apps, a little downtime can cost a lot. Getting alerts, even if the app is only down for a few minutes, is a big win.

That’s where API monitoring tools like RunScope and Assertible come into play. These services let you set up schedules to hit your APIs at some interval. They can run tests on the results and alert you if anything is out of whack. This is a perfect cold starter!

[Figure: RunScope Scheduler]
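To make "run tests on the results" concrete, here is a small Python sketch of the kind of check a scheduled monitor might apply to each response. It is not RunScope’s or Assertible’s actual API; `check_response`, its parameters, and the 2-second threshold are assumptions chosen for illustration.

```python
def check_response(status, elapsed_s, body, max_latency_s=2.0, must_contain=None):
    """Return a list of human-readable failures; an empty list means healthy."""
    failures = []
    if status != 200:
        failures.append(f"expected status 200, got {status}")
    if elapsed_s > max_latency_s:
        failures.append(f"slow response: {elapsed_s:.2f}s > {max_latency_s:.2f}s")
    if must_contain is not None and must_contain not in body:
        failures.append(f"body is missing {must_contain!r}")
    return failures
```

A non-empty list is what would trigger an alert. The latency check is the one that matters for cold starts, since a cold instance usually still answers correctly, just slowly.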

It’s the same concept as your own function app, but comes with added benefits. For one, there’s a separation of concerns. If your function app is down, the cold starter function (that’s also down) isn’t going to help you much. A hosted monitor gives you an unbiased third party ensuring everything is working appropriately.

You can also get a view of how performant your function is over time. Now that you’re testing your app so often, you’ll start to see patterns. Is the app getting slower? Has a new feature increased load times? Can we be sure there aren’t cold start issues in production?
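One simple way to spot those patterns in your own data is to compare each response time against the typical one. The sketch below is an assumption about how you might do that, not something the monitoring tools expose: `flag_slow_samples` and the factor of 3 are hypothetical.

```python
import statistics


def flag_slow_samples(latencies_s, factor=3.0):
    """Return the indexes of samples slower than `factor` times the median."""
    if not latencies_s:
        return []
    baseline = statistics.median(latencies_s)
    return [i for i, value in enumerate(latencies_s) if value > factor * baseline]
```

With made-up samples of `[0.21, 0.19, 0.22, 3.8, 0.2, 0.23]` seconds, only the 3.8-second request (index 3) is flagged, which is the sort of spike a cold start tends to produce. The median is used instead of the mean so that the spikes themselves don’t drag the baseline up.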

[Figure: RunScope’s Test Dashboard]

I have no affiliation, but I’ve always been a big fan of RunScope. And I’d highly recommend it not just for cold starting your function apps, but for all your API monitoring. The above overview is pretty nice to have. Plus, responding to dips (like on October 10 in the graph) based on RunScope alerts instead of angry users is priceless.

That said, $79/month for the cheapest plan may be a bit much if you’re only cold starting. Assertible is another nice option that even has a free plan to get going. You can make requests as often as every 30 minutes with the free tier. That’s not quite often enough to eliminate cold starts, but it could reduce how often your users experience them. Plus you get all the other benefits. If you find your users are still having issues, you can upgrade to a more reasonable plan. The $25/month plan will let you run every five minutes or faster. Perfect for keeping those function apps warm.
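The trade-off between ping interval and idle timeout comes down to simple arithmetic. The Python sketch below uses a deliberately simplified model, which is my assumption and not a description of how Azure behaves exactly: the ping is the only traffic, and the app is deallocated exactly `idle_timeout_min` minutes after the last request. The `cold_fraction` name is hypothetical.

```python
def cold_fraction(ping_interval_min, idle_timeout_min=20.0):
    """Fraction of each ping interval during which the app is deallocated.

    Simplified model: the ping is the only traffic, and the app goes cold
    exactly `idle_timeout_min` minutes after the last request.
    """
    if ping_interval_min <= 0:
        raise ValueError("ping_interval_min must be positive")
    cold_minutes = max(0.0, ping_interval_min - idle_timeout_min)
    return cold_minutes / ping_interval_min
```

Under these assumptions, a 30-minute ping against a 20-minute timeout leaves the app cold for the last 10 minutes of every interval, or about a third of the time, while a 5-minute ping leaves it cold for none. Real user traffic also resets the idle clock, so the actual figure for a busy app would be lower.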

Wrap Up

Whatever you choose to do, it’s clear that the cold start issue deserves some thought as you go serverless. There’s no doubt that Azure is working on improving cold start performance. There was a time when the idle timeout was five minutes; now we’re up to ~20. But until the problem is gone, do yourself (and your users) a favor and keep the apps warm. You may even benefit from it in the process.
