TL;DR: When adopting serverless, you need to make up your mind who will be on call to troubleshoot and fix functions. If you can’t imagine your developers being on call, then you need to think harder.
As I tried to argue in my 2016 O’Reilly book on Serverless Ops, when adopting serverless compute, organizations need to consider the operations side of things (also, thanks to whoever coined NoOps; not helping):
- There is an operations part around the platform itself. If you’re using one of the public cloud providers’ offerings (such as AWS Lambda, Google Cloud Functions, Microsoft Azure Functions, or IBM Cloud Functions), then chances are you’ll never meet the fine folks who’re running the platform. If, however, you run serverless on your own infra (for example, the popular Kubernetes offerings OpenFaaS and kubeless, or up-and-coming things like Oracle’s fn or Pivotal’s PFS), then you’ve at least got a chance to know the people who maintain the Kubernetes cluster it’s running on. I won’t be covering this ops aspect here.
- Then there’s the operations part around the app (or function). A function might break, due to a bug in the code itself or due to unexpected load. How about security? Someone needs to be (a) alerted that something went wrong (that is, on call), and (b) in a position to fix it (logs, metrics, tracing, troubleshooting). This is what interests me. How do folks who’ve adopted serverless deal with it and how do folks who plan to adopt serverless think they’ll cover this ops task?
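To make (a) and (b) a bit more concrete, here is a minimal sketch in Python of what covering them at the function level could look like. It is not any provider's API: `with_ops`, `notify_on_call`, and the fields in the log record are names I made up for illustration, and the notifier stands in for whatever paging or chat integration your team actually uses.

```python
import json
import logging
import time
import traceback
from typing import Any, Callable, Dict

logger = logging.getLogger("function")

# Hypothetical type aliases: a handler takes an event dict, a notifier takes a record dict.
Handler = Callable[[Dict[str, Any]], Any]
Notifier = Callable[[Dict[str, Any]], None]


def with_ops(handler: Handler, notify_on_call: Notifier) -> Handler:
    """Wrap a function handler so failures are logged and routed to whoever is on call."""

    def wrapped(event: Dict[str, Any]) -> Any:
        started = time.monotonic()
        try:
            return handler(event)
        except Exception as exc:
            record = {
                "function": getattr(handler, "__name__", "unknown"),
                "error": repr(exc),
                "trace": traceback.format_exc(),
                "duration_ms": round((time.monotonic() - started) * 1000, 1),
            }
            logger.error(json.dumps(record))  # (b) something to troubleshoot with
            notify_on_call(record)            # (a) someone gets alerted
            raise  # re-raise so the platform still sees the failure

    return wrapped
```

Wrapping a handler then looks like `safe = with_ops(my_handler, page_the_team)`, where both arguments are your own functions. The code is the easy part; the open question is still who sits on the receiving end of `notify_on_call`.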
Now, don’t get me wrong, I’m a big believer in serverless (I wouldn’t have been investing in it myself since ca. 2015 if I thought it wouldn’t fly). I also by and large agree with Simon Wardley when he says:
“I’m a big fan of containers, despite what people might think. I look at them as invisible subsystems. That’s not where the battle is. The battle is code execution environments, particularly with functional billing.”
I’ve used various serverless offerings in the cloud (AWS Lambda, Google Cloud Functions, and Azure Functions) as well as on top of Kubernetes (like OpenFaaS and kubeless). Heck, I’ve even implemented one myself. I know how awesome the UX is. No more thinking about which kind of VM or which container base image to use. Here’s my code and off we go!
Nevertheless (and I say this even though some folks keep trying to steer the discussion in a certain direction; sigh, it’s NOT about serverless vs. containers, see also Simon’s statement above), you have to come up with a strategy for dealing with the appops aspect of serverless. Have you?
From organizations and individuals I hear essentially two kinds of answers:
- Progressive, forward-looking ones say: “Of course our developers are on call for serverless”, for example Expedia’s engineering lead Subbu Allamaraju.
- The majority of folks I’ve been talking with at various conferences and user groups, mainly in Europe and the USA, say something along the lines of: “We’re paying our developers to produce code, not to be on call.” Hmmmmm …
I’d love to learn your approach here. How are you dealing with the app operations part in serverless compute? Who’s on call in your team?