
Shifting Containers Left: On the Quest for Reproducible Development Environments

by Guillaume Tamboise, December 1st, 2019

Photo by Casey Horner on Unsplash

From a very high-level perspective, a delivery pipeline can be modeled as a rather peculiar one-way tunnel with intersections - yes, the analogy has not broken down just yet.

The adding-functionality part of coding happens on the left and is typically performed by Developers. These people value development environments that take care of the mundane (and more): coding, building, running tests, iterating. Hey, if it works on my machine, what can go wrong?

The delivering-functionality piece happens on the right, typically performed by people with more of an Ops acumen. They focus on the ability to consistently and efficiently deploy software (including its dependencies) and keep it running.

CI/CD folks live throughout the pipeline, promoting the product from development to deployment and operations. They, too, value repeatability. There is nothing worse than trying to introduce a single, tiny code change, only to realize that the entire build tool set has been patched twice since the previous push of some functionality.

Transitioning from one environment to the next (Coding, Building, Testing, Deploying/Operating) is where the fun begins.

Docker came up with multi-stage builds as a good way to stitch together build time and operations time. In non-trivial cases, the image used in production is a significantly cut-down version of the image used as the build environment. In the world of Golang, for example, we could use one of the Golang-provided base images at build time and go back to a vanilla Alpine in operations:

# build stage
FROM golang:1-alpine AS build-env
WORKDIR /go/src/example.com/mypackage
RUN apk add \
        ca-certificates \
        gcc musl-dev \
        git
COPY . .
RUN go get -d -v ./... \
    && go test -short -timeout 30s ./... \
    && go install -v ./...

# production stage
FROM alpine
RUN apk add \
        ca-certificates
WORKDIR /go/bin/
COPY --from=build-env /go/bin/ /go/bin/

Why go through the trouble of trimming down images?

Beyond the cyber security aspect (less code means less potential for exploitation), the mere size reduction should be a good incentive. From 359 MB down to... 5 MB when this article was written.
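As a quick sanity check (the image tag is illustrative, not from the article), building the multi-stage file above and listing image sizes makes the difference visible:

docker build -t myapp:latest .                     # final image: the production stage only
docker build --target build-env -t myapp:build .   # optionally materialize the build stage too
docker images | grep -E 'myapp|golang|alpine'      # compare the resulting sizes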

Now, would a developer want to develop in a Continuous Integration build environment? This proposal sounds questionable, as Continuous Integration environments are really meant for machines, not for humans.

Let's take the example of a piece of code written in Python. Not necessarily deployed as a container. Say, deployed as an AWS Lambda function coded in Python.

If we look at the problem from the Ops side and move left, we drop a number of Python modules directly alongside the source code. Keep moving left: working in a directory cluttered with Python modules is not pleasant. Just think about the resulting .gitignore file to be maintained. So at a minimum, the CI/CD tool needs to perform a transition from Dev to Ops, typically using pip's very good requirements.txt.
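As a sketch of what that Dev-to-Ops transition could look like for a Lambda function (file names and paths are illustrative, not the article's actual pipeline):

pip install -r requirements.txt -t build/   # pull the pinned modules out of the source tree
cp handler.py build/                        # add the handler itself
(cd build && zip -r ../lambda.zip .)        # produce the deployable artifact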

Mind, however, that we do not want the CI/CD tool to micromanage the promotion from development to testing, otherwise that promotion cannot be performed manually. Whenever we are closely iterating on that part of the pipeline, we would end up copy-pasting steps from the CI/CD tool, say Gitlab's .gitlab-ci.yml file, into a command shell. Doable, but awkward.
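For illustration, a hypothetical .gitlab-ci.yml job; its script lines are exactly what would end up being copy-pasted into a shell when iterating by hand:

# Hypothetical job definition, not taken from an actual pipeline.
deploy-dev:
  stage: deploy
  script:
    - pip install -r requirements.txt -t build/
    - serverless deploy --stage dev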

Let's picture the development cycle when something does not work quite right once in a Lambda function. The developer wants to iterate quickly by introducing a small change locally, then pushing, then inspecting, then coding again.

Quickly means without checking in code and pushing it through the pipeline, so without necessarily involving the CI/CD tool. After all, even developers of the pipeline itself want to be able to promote environments manually before asking a CI/CD tool to automate the process. So our CI/CD tool is our best friend, but right there and now it gets in the way of getting the job done.

WWOD - What would an Ops do?

Couldn't we develop in an uncluttered filesystem where external bits and pieces fall into standard places, out of the way? But without the burden of setting up individual workstations? Yep, containers would fit the bill, but probably not the same containers used at build time, at deployment time or in production. Enter VS Code and its Remote - Containers extension, which former-friend-at-work Brian (Brian's former work, not former friend) highlighted as very handy.

The gymnastics are somewhat similar to the exercise of creating a CI/CD build image: spend some cycles thinking about what goes in there, as opposed to being regenerated as part of the pipeline. In the case of npm, command-line packages such as serverless are best installed globally, so they reside in the dev image, while project modules are best installed at .devcontainer time (picture this file as the orchestrator of your development environment).
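A minimal sketch of how that split could look in the dev image (base image and versions are assumptions, not the article's actual Dockerfile):

# Global CLI tooling lives in the dev image...
FROM node:lts-alpine
RUN npm install -g serverless
# ...while project-local modules are installed later, for example by the
# devcontainer's postCreateCommand ("npm install").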

Hang on a minute. Orchestration of the development environment. Is that really necessary? Let's take a few steps back. What would an alternative solution look like? Do mind that we are not focusing on environment-specific solutions, à la Skaffold for Kubernetes.

A first, naive version would involve maintaining (or choosing) a development container and running it with a script such as:

docker run --rm -it -v ${HOME}:/home/user \
    my-development-image:latest /bin/bash

We would then spend some thought on:

  • The gymnastics of what needs to run inside. A code editor with at least syntax highlighting? A compiler or interpreter? A command-line linter with common configuration settings throughout the team? git hooks to enforce said common development practices?
  • As opposed to what needs to run outside the container. A full GUI-based IDE? The same compiler, interpreter, linter, git hooks as listed in the previous point?
  • The components sitting outside the container need to be efficiently installed, version controlled and updated.
  • And then the my-development-image image needs to be maintained and refreshed when needed.

That is basically what VS Code's Remote - Containers extension takes care of.

The VS Code extensions we want available in the (containerized) development environment - language support and other SDLC tooling - are defined in .devcontainer.json. For example, for Python developers it could start by looking like this:

"extensions": [
	"ms-python.python",
        "mhutchie.git-graph"
],

With that in place, where do we store the configuration of the development environment, as far as the piece of software under development is concerned? Think API keys, Cloud credentials, etc. The configuration of the development environment can be specified in a .env file. It can then be picked up by docker-compose and zsh (using oh-my-zsh's dotenv plugin).
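For illustration, such a .env file could look like this (all values are placeholders):

# Picked up by docker-compose and, interactively, by oh-my-zsh's dotenv plugin.
SLACK_WEBHOOK=https://hooks.slack.com/services/XXXX/YYYY/ZZZZ
AWS_PROFILE=dev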

The .env file belongs to the environment and must not be checked into source control. Later in the pipeline, the content of the .env file can be deployed using Kubernetes secrets or equivalent. Alternatively, the CI/CD tool can programmatically recreate it as a file on the filesystem, or we can just ignore the file in favor of another mechanism. This is really a matter of taste (and of computing platform).
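As one hedged example of the Kubernetes route (the secret name is illustrative):

# Turn the .env file into a Kubernetes secret, one key per entry.
kubectl create secret generic app-env --from-env-file=.env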

Coming back to the aforementioned AWS Lambda function. Our development environment can now benefit from docker-compose orchestration that makes sure the Lambda function always sees a DynamoDB instance in front of it, for its persistence needs. That is, provided that .devcontainer/docker-compose.yml includes something to this effect:

version: '3.5'
services:
  controller:
    build:
      context: ..
      dockerfile: Dockerfile
    volumes:
      - /home/myuser/git/the-app:/workspace
      - /home/myuser/.aws:/home/developer/.aws
    links:
      - dynamodb
    environment:
      - SLACK_WEBHOOK
    command: sleep infinity

services/controller/environment contains the environment variables to be passed into the development container, with SLACK_WEBHOOK as an example.

The DynamoDB container is then added with a simple:

  dynamodb:
    image: amazon/dynamodb-local
    restart: unless-stopped
    volumes:
      - /data/db
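To sanity-check the wiring from inside the controller container, something along these lines should work, assuming the AWS CLI is available in the dev image (DynamoDB Local accepts dummy credentials):

# dynamodb-local listens on port 8000; the service name resolves thanks to the link above.
aws dynamodb list-tables --endpoint-url http://dynamodb:8000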

Remote - Containers knows to pick up these containers thanks to these entries in .devcontainer/devcontainer.json:

        "dockerComposeFile": "docker-compose.yml",
        "service": "controller",

Notice that we keep a workspaceFolder, but we remove our workspaceMount if we decide to let docker-compose take care of the mounting.

    "workspaceFolder": "/workspace",    

All in all, this is what our devcontainer.json looks like:

{
  "name": "Python 3",
  "dockerComposeFile": "docker-compose.yml",
  "service": "controller",
  "postCreateCommand": "npm install",
  "extensions": [
    "ms-python.python",
    "mhutchie.git-graph"
  ],
  "settings": {
    "terminal.integrated.shell.linux": "/bin/bash",
    "python.pythonPath": "/usr/local/bin/python",
    "python.linting.enabled": true,
    "python.linting.pylintEnabled": true,
    "python.linting.pylintPath": "/usr/local/bin/pylint"
  },
  "workspaceFolder": "/workspace",	
}

Looking further right in the pipeline: automated testing. Are we going to re-use exactly the same container orchestration as part of the CI/CD tool? Not necessarily, for a number of practical reasons:

  • CI/CD credentials and the place they are stored are unlikely to be the same between a developer's machine and a CI/CD agent/runner
  • the Docker images may be built outside of Docker Compose, as opposed to inside

It may be possible to bundle everything in a single docker-compose file for development and automated testing, but sometimes the best way to express that things serve a different purpose... is to keep them separate.

In the specific case of this Python AWS Lambda function, keeping the same Dockerfile from development to deployment has proven possible. The fun part has been to merge the Dockerfile coming from the left (dev-first approach) and the one coming from the right (used at deployment time, as part of the CI/CD tool).
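To make that concrete, here is one way such a merged, multi-stage Dockerfile could be laid out (stage names, base image and tooling are assumptions, not the article's actual file):

# base stage: runtime dependencies shared by dev and deploy
FROM python:3.8-slim AS base
WORKDIR /workspace
COPY requirements.txt .
RUN pip install -r requirements.txt

# dev stage: adds what the devcontainer expects (linter, git, ...)
FROM base AS dev
RUN pip install pylint \
    && apt-get update \
    && apt-get install -y --no-install-recommends git \
    && rm -rf /var/lib/apt/lists/*
CMD ["sleep", "infinity"]

# deploy stage: just the handler and its dependencies, ready for packaging
FROM base AS deploy
COPY handler.py .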

We have seen how remote containers can help maintain a reproducible development environment, staying silent on the potential cross-platform benefits that this may bring.

Next use case to stare at? The advent of ARM as a desktop platform, should the universe (and Apple) decide so. Being able to code/build/test on an ARM platform with all the local bells and whistles, but from a non-local, non-target hardware platform (some developer laptop), may come in extremely handy.