Development flow in a serverless environment, from the trenches

Written by fooshm | Published 2018/09/06



What is a development flow?

We are not going to talk about the product development flow (no product managers were harmed during the making of this post!). Instead, we’re framing the process from the moment you, as a developer, have a well-defined feature (what constitutes a “well-defined feature” is the topic of another lengthy post) to the point where the feature is deployed in production. Remember: this is a very opinionated workflow! It may very well not be the best for your specific situation, and that’s okay.

Why are serverless development workflows different?

Serverless changes a lot — but in this context, it can be distilled down to two distinct points.

  • Mindset — Serverless is not only using functions; it also means using managed services, leaving you free to concentrate on what you do best. But wait, there’s more! In our model, serverless also means all or most of your toolchain is in the cloud, so someone else is managing it. That includes your git repositories ✓, code quality and linting ✓, CI/CD engine ✓, development and testing environment ✓, etc.

  • Tooling — the moment your tools are also in the cloud, your workflow needs to accommodate that and make it easy for developers to make a quick transition (while developing) from local to remote and vice versa.

Guiding principles

When creating a serverless development workflow three principles should guide what you build:

  1. You should detect problems as early as possible in the development cycle. We all know that fixing a bug in production is more expensive than fixing it during development, but in our case it is also cheaper to detect bugs locally on the developer’s laptop than in a remote testing environment. Local is faster than remote (for the time being).
  2. No dedicated QA department — ohhhh, this is a contentious point! I don’t have anything against QA, but I do believe that developers are the sole owners of their developed product (note I didn’t call it “code”), from the moment it is first conceived until the moment customers use it and love/hate it, from top to bottom. There is room for QA where automation is too expensive to build, but this is the exception, not the norm. Existing QA teams can also serve as a valuable source of guidance, helping developers think about cases that require testing. Create tools that allow developers to test easily, both locally and remotely, and try to automate the testing process as much as possible.
  3. Developers are responsible for monitoring — again, this is a painful point. I’m not saying that there is no room for devops in the organization, but developers are usually the folks best positioned to know whether their developed product is behaving normally, and as product owners they are also the right people to define its KPIs. Bake product visibility into your development workflow.

High level view

[Diagram: feature life cycle]

[Diagram: types of environments we have]

[Diagram: technical stack]

Drill down

Task management

We chose Trello (although, as a side note, I question whether it is a solution that can scale well). It is important to mention that we use Kanban as our product process. The main benefit that I see is the ability to pick a feature, develop it, and push it to production without waiting for a “formal” sprint to end.

Implementation

During this phase it is the developer’s responsibility to choose KPIs. There are two types of KPIs that should be taken into consideration:

  1. Product KPI — Is the feature being used? How is it being used? What is its impact on the organization? (A sketch of emitting such a KPI follows this list.)
  2. Technical KPI — Are there any operational errors? What is the latency of each request? How much does it cost?
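To make this concrete, here is a minimal sketch (my illustration, not the author’s code) of emitting a product KPI as a CloudWatch custom metric from a Lambda handler; the namespace and metric name are hypothetical:

```python
# Sketch: emit a product KPI as a CloudWatch custom metric from a Lambda.
# Namespace and metric name are hypothetical examples.
import boto3

cloudwatch = boto3.client("cloudwatch")

def handler(event, context):
    # ... the feature's actual logic would run here ...

    # Product KPI: count each use of the feature so we can answer
    # "is it being used, and how often?"
    cloudwatch.put_metric_data(
        Namespace="MyApp/Features",
        MetricData=[{
            "MetricName": "FeatureUsed",
            "Value": 1,
            "Unit": "Count",
        }],
    )
    return {"statusCode": 200}
```

Technical KPIs such as error counts and latency come largely for free from the platform (CloudWatch publishes per-function invocation, error, and duration metrics), so the developer’s job there is mostly to pick thresholds and alarms.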

Unit testing

Of course it depends on whether you use TDD or not, so writing tests might be an inner step of the implementation. The point here is that you cannot move forward in your flow without finishing unit tests. Remember: test locally whenever you can, it’s faster. A word of caution about mocks: I prefer to avoid mocking services with “emulator” mocks, i.e. mocks that actually do something behind the scenes, like DynamoDB Local from AWS (a sketch of the alternative follows below), for two main reasons:

  1. Interfaces change frequently, so a test that passes against the local implementation does not guarantee the code will work in production.
  2. Setting up your local development environment becomes very cumbersome. For more details, you are welcome to read my thoughts about testing in a serverless environment:

Serverless testing from the trenches: A quick overview on Serverless testing paradigms (hackernoon.com)
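As an illustration of the alternative (my sketch, with hypothetical names, not the author’s code): stub your own thin wrapper at the service boundary in unit tests, and leave the real DynamoDB interface to the cloud integration tests described later.

```python
# Sketch: stub your own boundary in unit tests instead of running an
# emulator such as DynamoDB Local. All names here are hypothetical.
from unittest.mock import MagicMock, patch

import boto3

def load_user(user_id: str) -> dict:
    """Thin wrapper around DynamoDB; exercised for real only in the cloud."""
    table = boto3.resource("dynamodb").Table("users")
    return table.get_item(Key={"id": user_id}).get("Item", {})

def greet(user_id: str) -> str:
    """Business logic under unit test."""
    user = load_user(user_id)
    return f"Hello {user.get('name', 'stranger')}"

# The unit test patches the wrapper, so no emulator (and no AWS) is needed.
@patch(f"{__name__}.load_user", return_value={"id": "42", "name": "Jane"})
def test_greet(mock_load_user: MagicMock) -> None:
    assert greet("42") == "Hello Jane"
```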

Automatic linting

No matter whether we are talking about dynamic or static languages, linting is a mandatory step. Linting should include, at a bare minimum:

  1. Opinionated auto-formatter — something similar to black for Python or prettier.io for JS. I do not want to work hard to format my code so everything should be done automatically.
  2. Static type analysis — in dynamic languages like Python or JS I run a static type checker such as mypy or Flow.
  3. Static analysis — This is the real “linting” process, pylint for example.

What do I mean by automatic? Simply put, everything is baked into a git pre-commit hook. We are using the pre-commit framework, which gives us the ability to run the linting process either manually or in CI environments. Our .pre-commit-config.yaml file, together with a bash wrapper script, lets the developer run the linter flow.
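The original config is not reproduced here; a minimal sketch of what such a .pre-commit-config.yaml could look like (the hook revisions below are illustrative, not the author’s actual pins):

```yaml
# Sketch of a .pre-commit-config.yaml covering the three linting steps;
# the revs below are illustrative, not the author's actual pins.
repos:
  - repo: https://github.com/psf/black              # opinionated auto-formatter
    rev: 24.3.0
    hooks:
      - id: black
  - repo: https://github.com/pre-commit/mirrors-mypy  # static type analysis
    rev: v1.9.0
    hooks:
      - id: mypy
  - repo: https://github.com/PyCQA/pylint           # static analysis proper
    rev: v3.1.0
    hooks:
      - id: pylint
```

Running `pre-commit run --all-files` executes the same checks manually, which is what makes the hook reusable in CI.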

Extend and/or run your integration test

Running integration tests is mandatory in serverless environments. For those too lazy to click: this is because many interfaces are not under your control, and you need to test continuously to ensure that nothing is broken. Running is mandatory, but extending is not. My rule of thumb for whether a code change requires a new integration test: is it a new flow (no integration test for a bug fix, for example), and is it on a critical path (if it breaks, do customers leave us)? A sketch of such a test follows.
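For instance, a critical-path test might look like this (a sketch with an assumed endpoint and payload, pointed at a deployed cloud environment rather than a local emulator):

```python
# Sketch of a critical-path integration test; the endpoint and payload are
# assumptions. API_BASE_URL points at a real deployed cloud environment.
import os

import requests

BASE_URL = os.environ["API_BASE_URL"]

def test_signup_critical_path():
    # New flow on a critical path: if signup breaks, customers leave.
    resp = requests.post(f"{BASE_URL}/signup",
                         json={"email": "test@example.com"})
    assert resp.status_code == 200
    assert "user_id" in resp.json()
```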

Test on personal cloud environment

Committing your changes without testing them in an actual native cloud environment means that the developer does not really know whether his or her code actually works.

Each developer, as part of their first days in the company, prepares their own cloud environment; you can read more about the process in the following post:

Serverless multi-cloud from the trenches: From the trenches series (hackernoon.com)

A script enables the developer to push changes to their cloud environment and run either manual or automated tests on it.

The developers have now reached a point where they trust their code to do what it’s intended to do and that it does not break anything. It’s time to move the changes up the ladder!

Open PR + CI job

We use the GitHub flow in conjunction with Travis CI. When a developer opens a PR, two things happen in parallel:

  1. Code review is done through GitHub’s wonderful interface; review is mandatory, and the developer cannot merge to master without it.
  2. A bunch of tests run, making sure nothing bad is merged into master. Let us go over our Travis script (an illustrative sketch follows this list):
  • Line 18 — We run the same checks that a developer is able to run locally as part of git’s pre-commit hook.
  • Line 21 — One of the coolest things in serverless is the fact that you can keep all raw production data (like API requests) and use this data while testing your changes. We’ve created a script that pulls the last X requests that were made in production and allows us to replay them in our cloud testing environment. Note that this cloud environment is used only by the CI service; it is an integration test environment. This test can take some time, so it runs only on PRs. As part of these production raw-data tests we also run the integration tests.
  • Line 28 — We use Codecov for code coverage; if it does not reach a certain threshold, then no merge can happen.
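The original Travis script is not shown here, and the line numbers above refer to the author’s file. Purely as a sketch of its shape (script names and stage names are my assumptions):

```yaml
# Illustrative .travis.yml sketch; not the author's file, so the line
# numbers referenced above will not match this sketch.
language: python
python: "3.6"
install:
  - pip install -r requirements.txt codecov
script:
  # The same checks a developer runs locally via the pre-commit hook
  - pre-commit run --all-files
  # Unit tests with coverage
  - pytest --cov=src tests/unit
  # PRs only: update the CI cloud environment, replay recent production
  # requests against it, and run the integration tests (hypothetical scripts)
  - if [ "$TRAVIS_PULL_REQUEST" != "false" ]; then ./deploy.sh update zappa_settings.json ci && ./replay_production_requests.sh && pytest tests/integration; fi
after_success:
  # Upload coverage; merging is blocked below the Codecov threshold
  - codecov
```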

Pushing to production

If all tests and prerequisites pass, we are ready to push the changes to production. Only the relevant functions get updated, not the entire application.

One of the guiding principles is creating scripts that are usable both by developers, locally, and by the CI system; for local use, sensible defaults mean no knowledge of how to run the scripts is required. For example, our deploy script uses Zappa to update an AWS environment. To update their own cloud environment, all a developer has to run is ./deploy.sh, while the CI environment runs ./deploy.sh update zappa_settings.json production.
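The script itself is not reproduced here; a minimal sketch of a deploy.sh with such defaults might look like this (the argument handling and personal-stage naming are assumptions):

```bash
#!/usr/bin/env bash
# Sketch of a deploy.sh with sensible defaults: developers run it with no
# arguments, CI passes all three explicitly. Stage naming is hypothetical.
set -euo pipefail

COMMAND="${1:-update}"                 # zappa command to run
SETTINGS="${2:-zappa_settings.json}"   # zappa settings file
STAGE="${3:-dev_$(whoami)}"            # default: the developer's personal stage

zappa "$COMMAND" "$STAGE" -s "$SETTINGS"
```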

Fin

Defining a development workflow is an art, not a science. Each development team has its own quirks and preferred way of working. That said, I do believe that the core values of using serverless tools and testing in the cloud are a sensible default that should be shared among all serverless developers.

As I wrote at the beginning of the post, this is a very opinionated workflow that works quite nicely for us. Please share your thoughts or alternate workflows in the comments!


Read more from the “from the trenches” series

