Serverless testing from the trenches

From the trenches series

We need testing

Testing in a serverless environment is tough. Before we dig into the technical details, I want to do a quick review, based on my experience and the experience of others (see the list at the bottom), of the testing paradigm in serverless apps: why it is different, and why you need to change your thinking about testing. Let’s begin.

For those who are not familiar with it, there is the famous testing triangle:

This is a famous triangle

The idea behind this triangle is that you should concentrate most on writing unit tests, less on integration tests, and even less on E2E, manual tests, and so on.

When moving towards microservices there is a shift towards integration tests, but why? Because now a lot of the interaction happens not only in code, but also in the configuration and in the interfaces between the services.

Serverless is microservices on steroids. With microservices, your code runs in services that you own, so you control the interfaces. Serverless usually means outsourcing anything that is not directly related to your main business, from “basic” stuff like CI and source control, through databases and compute engines, up to machine learning models, data lakes, etc. The moment you do not control the other end, there is a danger that your code will not talk properly to the other side, or that the other side will decide to change its rules.

In the microservices world, and even in the traditional monolithic world, you write mocks that enable you to test everything locally with a high degree of certainty. Remember that you own the code, which gives you a lot of power. But mocks are not good enough in the serverless world, and not because there aren’t any: serverless offline, AWS SAM, and localstack (and the list goes on) offer official and less official mocks. The problem is twofold:

  • You do not have mocks for all the services that you’ll use. Remember there are other service providers like Firebase and Auth0 (and again the list goes on).
  • Even for services that do have mocks, the mocks will never support the latest and greatest features that the cloud providers release. I remember using an S3 feature called presigned POST, and when I first used it, no mock provided the interface.

For me serverless testing means a different shape.

Unit tests are important, very important. Actually you should probably write them first, but in a serverless environment they are not the only important tests: you also need tests that run in the cloud environment and tests that assess the quality of your cloud environment, which is something that only integration tests can give you. You’ll probably still have more unit tests than integration tests, by an order of magnitude, but you can’t skip integration tests.

But it does not end here: I believe that in order to produce good code as a developer in a serverless environment you have to run and play with your code in an actual cloud environment, not as part of CI, but as part of the regular code → test cycle. And here there is another paradigm shift: we, as developers, were used to coding and testing locally on our own computers, but I believe serverless will push us to code locally and test remotely.

How we test our Services

I’ll be honest with you, I’m going to skip the unit testing part. I believe this area is well covered in other posts and in general it is no different than what you were doing up until now. Let’s move to the interesting part, which is integration.

Our stack

In order to better understand our methodology, you need to know which services we used.

  1. Python + Zappa
  2. AWS API GW
  3. AWS Lambda
  4. AWS S3
  5. AWS RDS
  6. AWS ES
  7. AWS Rekognition
  8. Firebase Authentication service
  9. Firebase Firestore as our mobile device data store
  10. Various mobile-related tools on Firebase, like crash reporting, analytics, etc.

The fact that we are using two cloud providers complicates our development, but for us Firebase is a winner for mobile-related development and we preferred to work hard on integrating it.

Prerequisites

As I wrote, serverless testing is a change in mindset, and each developer needs to have these prerequisites ready before writing a single line of code.

  • Each developer has their own environment with each cloud provider (AWS and Firebase).
  • Although each developer has their own expertise, we expect every developer to be able to deploy and run integration tests. They do not have to deploy all components (the system might be huge), but they must at least deploy the components that are affected by the change.

The ability for each developer to deploy the entire environment is rather new. In the traditional world, where you ran everything locally, sometimes you couldn’t run everything (too complicated, not enough compute resources, etc.), but here in the brave new world we embraced this capability.

It is not easy.

  1. We’ve created multiple CloudFormation files, one for each component that requires provisioning, like network (VPC, SG, etc.), DB, ES, and S3. Note that for Lambda and API GW we rely on Zappa’s built-in capabilities to create them.
  2. There is a welcome wiki that each developer follows to create the relevant resources via the AWS CLI.
  3. Firebase is more complicated, and there is a lot of manual work there, mainly because there are no easy CLI and configuration tools.
  4. Provisioning the relevant services is not enough. After running CloudFormation the developer has to go into each serverless app and update the relevant configuration: things like DB credentials, subnets, SG, and the correct S3 buckets. We provide a template in which the developer fills in the missing pieces.
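The manual fill-in from step 4 could in principle be scripted. The following is only a sketch under stated assumptions, not our actual tooling: the placeholder names (`DeployBucket`, `PrivateSubnetId`, `LambdaSecurityGroupId`, `DbEndpoint`) are hypothetical stack outputs, and the input is assumed to have the `OutputKey`/`OutputValue` shape that CloudFormation returns when a stack is described. The settings keys (`s3_bucket`, `vpc_config`, `environment_variables`) follow Zappa's `zappa_settings.json` layout.

```python
import json
from string import Template

# Hypothetical template: the $placeholders are assumed CloudFormation output
# names, not the real ones from our stacks.
SETTINGS_TEMPLATE = Template("""
{
  "dev": {
    "s3_bucket": "$DeployBucket",
    "vpc_config": {
      "SubnetIds": ["$PrivateSubnetId"],
      "SecurityGroupIds": ["$LambdaSecurityGroupId"]
    },
    "environment_variables": {
      "DB_HOST": "$DbEndpoint"
    }
  }
}
""")


def outputs_to_mapping(outputs):
    """Flatten stack outputs into a {OutputKey: OutputValue} dict."""
    return {item["OutputKey"]: item["OutputValue"] for item in outputs}


def render_settings(outputs):
    """Fill the template from stack outputs.

    Raises KeyError if a placeholder has no matching output, so a
    half-provisioned environment fails loudly instead of deploying
    with a blank value.
    """
    rendered = SETTINGS_TEMPLATE.substitute(outputs_to_mapping(outputs))
    return json.loads(rendered)  # parsing also checks the result is valid JSON


if __name__ == "__main__":
    example_outputs = [
        {"OutputKey": "DeployBucket", "OutputValue": "dev-alice-deploy"},
        {"OutputKey": "PrivateSubnetId", "OutputValue": "subnet-0abc"},
        {"OutputKey": "LambdaSecurityGroupId", "OutputValue": "sg-0def"},
        {"OutputKey": "DbEndpoint", "OutputValue": "db.example.internal"},
    ]
    print(json.dumps(render_settings(example_outputs), indent=2))
```

Secrets such as DB passwords are deliberately left out of the sketch; they would need a safer channel than a rendered settings file.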

The above process, although highly automated, has a couple of problems:

  • It’s expensive: provisioning RDS, ES and a NAT gateway for each developer costs quite a lot of money. We were able to reduce the costs by choosing micro instances everywhere we could, but it’s still a couple of tens of dollars per month per working environment (multiply by the number of developers and it gets expensive). After some time we decided to move from micro to a better instance type, because it was agonising to wait for our testing environment.
  • It’s very error prone and cumbersome to manage. AWS has organization support, but Firebase does not. Some things are not completely automated, like developer account creation on AWS and Firebase (i.e. developers have to create the accounts manually).

Integration tests flow

Now the fun begins. Let’s do a quick overview of the main flow we have in our integration tests:

  • Install a client.
  • Register.
  • Create test data on the device and upload it for analysis.
  • Consume the test data, run a basic ML algorithm, and notify the client of the results via a Firebase notification.
  • Update device properties and verify that the client received the update via Firestore with proper content.

I’m cheating

This is not a client ! by Braydon Anderson on Unsplash

Adding a real mobile device to our integration tests would make them too complicated, so instead we used a mocked device: code that calls the same API the device is supposed to call. Testing with a real device, which is sometimes called E2E (end to end) testing, is done manually by the developers themselves before committing the code. No QA is involved. Actually we don’t have any QA, and developers are complete owners of the entire process (another part of the paradigm shift in serverless, but that’s another post).
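To make the idea of a mocked device concrete, here is a minimal sketch, assuming a JSON-over-HTTP API. It is not our real client: the routes `/register` and `/data`, the payload fields, and the bearer-token handling are hypothetical, and the Firebase Authentication handshake is left out. The HTTP function is injectable so the class can also be exercised without a network.

```python
import json
from urllib import request


def http_post(url, payload, token=None):
    """POST a JSON payload using only the standard library; return parsed JSON."""
    headers = {"Content-Type": "application/json"}
    if token:
        headers["Authorization"] = f"Bearer {token}"
    req = request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers=headers,
        method="POST",
    )
    with request.urlopen(req) as resp:
        return json.loads(resp.read().decode("utf-8"))


class MockDevice:
    """Stands in for the mobile client by calling the same HTTP API."""

    def __init__(self, base_url, post=http_post):
        self.base_url = base_url.rstrip("/")
        self._post = post  # injectable transport, handy for offline checks
        self.token = None

    def register(self, email):
        # "/register" and the "token" field are assumptions for illustration.
        body = self._post(f"{self.base_url}/register", {"email": email})
        self.token = body["token"]
        return body

    def upload_test_data(self, samples):
        # "/data" is a hypothetical route for the upload step of the flow.
        return self._post(
            f"{self.base_url}/data", {"samples": samples}, token=self.token
        )
```

In an integration test, `base_url` would be the API GW address of the deployed function, and the assertions on Firebase side effects would happen separately.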

We use DRF (Django REST framework) to expose functionality that is not visible in the production environment. This functionality is used for instrumentation purposes only; it is heavily monitored and is disabled in production via configuration.

In order to verify that Firebase is behaving properly, we use its Admin SDK to query the relevant services.
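The "disabled in production via configuration" part can be illustrated with a framework-agnostic sketch. This is plain Python rather than our DRF code, and the environment variable name `ENABLE_INSTRUMENTATION` and the `reset_test_user` hook are hypothetical. The point is only that the switch defaults to off, so an environment that forgets to set it behaves like production.

```python
import os
from functools import wraps


class InstrumentationDisabled(Exception):
    """Raised when a test-only hook is called while the switch is off."""


def instrumentation_enabled():
    # Hypothetical variable name; anything other than "true" means disabled.
    return os.environ.get("ENABLE_INSTRUMENTATION", "false").lower() == "true"


def instrumentation_only(func):
    """Refuse to run the wrapped handler unless instrumentation is enabled."""

    @wraps(func)
    def wrapper(*args, **kwargs):
        if not instrumentation_enabled():
            raise InstrumentationDisabled(func.__name__)
        return func(*args, **kwargs)

    return wrapper


@instrumentation_only
def reset_test_user(user_id):
    # Hypothetical hook that an integration test might call between runs.
    return {"reset": user_id}
```

In a DRF application the same check would more naturally live in a permission class or in the URL configuration, so that the endpoint is not even routed in production.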

Running the damn tests

Writing tests and running them are totally different creatures in the serverless world. In your local development environment all you have to do is click play, but there is no play button for AWS :-(

That would be nice

So running our integration tests consists of two phases:

  • Discover all the Lambda functions we have in the system, pull the latest git changes, and deploy to AWS. For those interested in viewing the deployment script, check it out on GitHub.
  • The result of deploying through Zappa is the actual API GW address of each function; these addresses are passed as parameters to pytest.
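The two phases above can be sketched roughly as follows. This is not the deployment script mentioned above, only an illustration under assumptions: an app is taken to be any directory containing Zappa's default `zappa_settings.json`, the API GW address is pulled out of the captured deploy output with a regular expression, and the `--<app>-url` pytest options are hypothetical (they would have to be registered with `pytest_addoption` in a `conftest.py`). The git pull and the Zappa invocation themselves are omitted.

```python
import re
from pathlib import Path

# Zappa's default settings file name; its parent directory is treated as one app.
SETTINGS_FILE = "zappa_settings.json"

# API Gateway invoke URLs have the form
# https://<api-id>.execute-api.<region>.amazonaws.com/<stage>
API_URL_RE = re.compile(
    r"https://[a-z0-9]+\.execute-api\.[a-z0-9-]+\.amazonaws\.com/\S*"
)


def discover_apps(root):
    """Return the sorted directories under root that hold a Zappa settings file."""
    return sorted(p.parent for p in Path(root).rglob(SETTINGS_FILE))


def extract_api_url(deploy_output):
    """Return the first API Gateway URL in the captured output, or None."""
    match = API_URL_RE.search(deploy_output)
    return match.group(0) if match else None


def pytest_args(urls_by_app, test_dir="tests/integration"):
    """Build a pytest argument list with one hypothetical --<app>-url option per app."""
    args = [test_dir]
    for app, url in sorted(urls_by_app.items()):
        args.append(f"--{app}-url={url}")
    return args
```

Returning `None` when no URL is found lets the caller abort before running tests against a deployment that did not actually succeed.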

So we’ve actually created our own play button, hurray! But sometimes things break and the breakage is found during the integration tests, and here unfortunately we fall short. The code → test cycle I was talking about does not work very well in the cloud: for us, fixing a small piece of code usually means waiting 30 to 60 seconds for the cloud update.

Epilogue

This is a long and cumbersome process. It takes a lot of time to update, and there is a real gap here in tooling. I found an interesting solution called seed. I did not try it, because we use Python + Zappa, which is not supported.

You are more than welcome to share your experience with testing and development environment provisioning in serverless.

Bibliography

More by Efi Merdler-Kravitz
