The Containerized Software Development Guide

Written by decoder | Published 2021/09/19


Introduction

This article describes containerization best practices throughout the full lifecycle of a containerized workload, with emphasis on development and security. We will look at:

  • Container image design guidelines
  • Development, debugging and testing
  • Security best practices
  • CI/CD pipelines
  • Operations and maintenance

You will find it useful if you are a software developer starting your journey with developing in containers. Even a senior developer might pick up a few tricks here and there.

There is also something for security professionals as well as automation engineers or SREs (Ops).

A little disclaimer: if your title is DevOps Engineer, please don’t feel left out. You will surely benefit from the content of this article. It’s just that DevOps is neither a title, a role, nor a team, but rather a philosophy and culture. Unfortunately, in most companies DevOps really means automation engineering and soft-ops (mostly configuring and dealing with Kubernetes and other complex software). So wherever you read “automation engineer”, that also means a DevOps engineer.

This document intends to serve as a framework and guide for developing and operationalizing containerized software. This article is about containers only. If you are interested in container orchestration, check out my two previous blogs, orchestrating containers with Kubernetes and developing on Kubernetes.

There is a lot of ground to cover, so let’s get started!

Basic definitions and concepts

Container

A container is the runtime instantiation of a Container Image. A container is a standard Linux process often isolated further through the use of cgroups and namespaces.

Container Image

A container image, in its simplest definition, is a file that is pulled down from a Registry Server and used locally as a mount point when starting Containers.

Container Host

The container host is the system that runs the containerized processes, often simply called containers.

Container Engine

A container engine is a piece of software that accepts user requests, including command-line options, pulls images, and from the end user’s perspective runs the container. There are many container engines, including docker, RKT, CRI-O, and LXD.

Image Registry

A registry server is essentially a fancy file server that is used to store docker repositories. Typically, the registry server is specified as a normal DNS name and optionally a port number to connect to.

Overview

This documentation assumes basic knowledge of Docker and Docker CLI. To learn or refresh on container-related concepts, please refer to the official documentation.

Please note that since most development activities will start on the “docker stack” (docker CLI, docker CE, docker desktop, etc), most of the time we will refer to docker tooling. There are a lot of alternatives to every mentioned component, for example podman, buildah, buildpacks and many other technologies that do not come from Docker the company.

The same goes for the container OS: we focus on Linux containers, and Windows containers are largely outside the scope of this article.

Docker Architecture Recap

For detailed information about docker architecture, please refer to Docker or Mirantis documentation. Here is a handy diagram explaining high-level docker architecture and its components.


Container Lifecycle

When you start developing containerized workloads, there are a lot of similarities with developing regular software, but also a few key differences. The below diagram provides a simplified view of various stages of containerized workload lifecycle.

Docker CLI Syntax

Docker CLI has the following syntax:

Syntax: docker <docker-object> <sub-command> <-options> <arguments/commands>

Example: docker container run -it ubuntu

Container Layers

All docker image layers are immutable (read-only). When a container is created using the docker run command, an additional mutable (read-write) layer is created on top of them. This layer lives as long as the container exists and is deleted when the container is removed (with docker rm, or automatically on exit when the container was started with --rm). When a file from the image is modified in a running container, docker first copies it into the container layer (copy-on-write) and applies the change to that copy. The original files in the image are never changed.
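To see copy-on-write in action, here is a minimal sketch. The helper name show_copy_on_write and the container name cow-demo are made up for illustration, and the sketch assumes a running Docker daemon and the public nginx image.

```shell
# Hypothetical helper; function and container names are illustrative only.
show_copy_on_write() {
  # Start a throwaway container in the background
  docker run -d --name cow-demo nginx >/dev/null
  # Change a file that ships with the image and add a brand new one
  docker exec cow-demo sh -c 'echo "# changed" >> /etc/nginx/nginx.conf; touch /tmp/new-file'
  # List what the writable layer holds: A = added, C = changed, D = deleted
  docker diff cow-demo
  # Removing the container throws the writable layer away; the image is untouched
  docker rm -f cow-demo >/dev/null
}
```

Calling show_copy_on_write should list the touched paths, while a fresh container started from the same image sees the original files again.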

Access remote Docker host from CLI

On the machine from which you want to access the docker host, set up the variable:

export DOCKER_HOST="tcp://<docker-host-ip>:2375"

Docker default ports:

2375 — unencrypted traffic

2376 — encrypted traffic.

IMPORTANT: This setting is only for testing/playground purposes. It will make the docker host available on the network, and by default there is no authentication.

Use docker CLI as a non-root user

  1. Create the docker group (it may already exist): sudo groupadd docker
  2. Create a non-root user that belongs to the docker group: sudo useradd -G docker <user-name>
  3. Alternatively, add an existing user to the docker group: sudo usermod -aG docker <user-name>
  4. Log off and log in again as that user so the new group membership takes effect.
  5. Optional: restart the docker service: sudo systemctl restart docker

It is highly recommended to use VS Code with a Docker plugin for developing with containers.

Here is a good write-up about how to set up and use the Docker extension with VS Code.

Read best practices for building Dockerfiles

Quickly create Dockerfile stub

If you are using VS Code with a Docker extension, you can quickly create a Dockerfile stub for your project.

  • open folder with your project in VS Code
  • go to command palette Ctrl+Shift+P and type Docker: Add Docker Files to Workspace
  • select your language from the dropdown box and answer a few questions
  • your Dockerfile will be generated in the directory you are currently in
  • make sure to tweak the file, but the templates are pretty good already

How to debug image building process

To build an image, you can use the docker CLI:

docker build --progress=plain -t imagename:tag -f Dockerfile .

or use the VS Code Docker extension to do the same.

The --progress=plain flag prints verbose build output to stdout and is enabled by default when using the Docker extension.

With the legacy builder, each Dockerfile instruction such as RUN, ADD or COPY produces an intermediate image that you can start a container from and debug.

The debugging steps differ depending on whether the docker host uses the new build mechanism, BuildKit (available from Docker 18.09 onwards), or the legacy builder. BuildKit debugging is relatively complex, so it is easier to fall back to the legacy builder by setting DOCKER_BUILDKIT=0 before running the docker build command. This setting temporarily switches the build to the legacy one.

Steps to debug the Dockerfile build process using the legacy build

  • clone the test repository or create a new one with a Dockerfile that contains an error you want to debug
  • run the legacy build command DOCKER_BUILDKIT=0 docker build --rm=false -t wrongimage -f Dockerfile.bad .
  • this Dockerfile produces an error because a folder is missing

Step 17/19 : WORKDIR /app
 ---> Running in 21b793c569f4
 ---> 0d5d0c9d52a3
Step 18/19 : COPY --from=publish /app/publish1 .
COPY failed: stat app/publish1: file does not exist

  • note that right above the error there is a message with an intermediate image ID of 0d5d0c9d52a3
  • since we used flag --rm=false intermediate images are not removed and we can list them using docker image ls
  • let’s start a new container from this image in an interactive mode docker run -it 0d5d0c9d52a3 sh
  • inside the container, we can see that the required folder is not created

How to debug applications running in containers

Applications running in containers can be debugged directly from an IDE when a launch.json file is present and contains instructions on how to launch and debug a docker container.

It is strongly recommended to use VS Code with a Docker extension to easily add a Dockerfile and debugging settings to the project.

  • Click here to see an already setup sample ASP.NET Core WebAPI project
  • Clone the project
  • cd into project directory
  • code . to open VS Code
  • select docker: initialize for debugging and follow the wizard
  • switch to Run and Debug view Ctrl+Shift+D
  • Select Docker .NET Launch
  • set breakpoint in the controller

Use Multistage builds

In a multi-stage build, you create an intermediate container — or stage — with all the required tools to compile or produce your final artefacts (i.e., the final executable). Then, you copy only the resulting artefacts to the final image, without additional development dependencies, temporary build files, etc.

A well-crafted multistage build includes only the minimal required binaries and dependencies in the final image, and no build tools or intermediate files. This reduces the attack surface and the number of potential vulnerabilities.

It is safer, and it also reduces image size.

Consider the Dockerfile below, which builds a Go API. The use of the multistage build is explained in the file comments. Try it yourself!
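As a minimal sketch of the idea, the snippet below writes a two-stage Dockerfile for a Go program. The file name Dockerfile.multistage, the image tags, and the binary name api are assumptions chosen for illustration, and the final stage uses a distroless base (described in the next section); any minimal base image would do.

```shell
# Write an illustrative multi-stage Dockerfile (all names are assumptions)
cat > Dockerfile.multistage <<'EOF'
# Stage 1: build stage with the full Go toolchain
FROM golang:1.22 AS build
WORKDIR /src
COPY . .
# Build a static binary so it can run on a minimal base image
RUN CGO_ENABLED=0 go build -o /out/api .

# Stage 2: final image that receives only the compiled binary
FROM gcr.io/distroless/static-debian12:nonroot
COPY --from=build /out/api /api
ENTRYPOINT ["/api"]
EOF

# To build it (needs Docker and a Go module in the current directory):
#   docker build -t my-go-api:dev -f Dockerfile.multistage .
```

The Go toolchain, the source code, and any temporary build files stay behind in the build stage; only /api ends up in the image you ship.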

Use Distroless images

Use the minimal required base container to follow Dockerfile best practices.

Ideally, we would create containers from scratch, but only binaries that are 100% static will work.

Distroless images are a nice alternative. These are designed to contain only the minimal set of libraries required to run Go, Python, or other frameworks.

Use docker-slim to ensure that your image is as lean as possible

Container images should be small and contain only components/packages necessary for the containerized workload to work correctly. This is important for two main reasons:

  • security: making images smaller by removing unnecessary packages greatly reduces attack surface
  • performance: smaller images are pulled faster, so containers start sooner

docker-slim comes with many options. It supports slimming down images, scanning Dockerfiles etc. The best way to start with it is to follow steps in demo setup.

Confidential information and secrets

Use .dockerignore to exclude unnecessary files from building in the container. They might contain confidential information.
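A small sketch of what such a file could look like follows. The entries are assumptions about a typical project, so adjust them to yours; note that running the snippet overwrites an existing .dockerignore in the current directory.

```shell
# Write an illustrative .dockerignore (entries are examples, not a complete list)
cat > .dockerignore <<'EOF'
# Version control metadata
.git
# Local environment files and keys that may hold credentials
.env
*.pem
# Dependencies and build output that get recreated inside the image
node_modules
bin
obj
EOF
```

Everything matched by these patterns is left out of the build context, so a COPY . . instruction cannot accidentally bake it into a layer.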

Docker uses BuildKit by default for building images. One of BuildKit's features is the ability to mount secrets into a build step using RUN --mount=type=secret. This is for the scenario where you need to use secrets during the image build process, for example credentials for pulling from a private git repository.

Here is an example of how to retrieve and use a secret:

  • create a secret file or environment variable: export SUPERSECRET=secret
  • inside a Dockerfile, add RUN --mount=type=secret,id=supersecret to the step that needs it; this makes the secret available under /run/secrets/supersecret for the duration of that step only
  • build the image with your secret like so:

export DOCKER_BUILDKIT=1
docker build --secret id=supersecret,env=SUPERSECRET .

This safely passes the value of the SUPERSECRET environment variable to the build. Examining image history or decomposing layers will not reveal the secret.
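Put together, the steps above could look roughly like this. The file name Dockerfile.secret, the alpine base image, and the helper build_with_secret are assumptions for illustration; the RUN step only checks that the secret file is present instead of doing real work with it.

```shell
# Write an illustrative Dockerfile that consumes a build secret
cat > Dockerfile.secret <<'EOF'
# syntax=docker/dockerfile:1
FROM alpine:3.19
# The secret is mounted only while this RUN step executes
# and is not written to any image layer.
RUN --mount=type=secret,id=supersecret \
    test -s /run/secrets/supersecret && echo "secret is available during build"
EOF

# Hypothetical helper: SUPERSECRET must be exported in the calling shell
build_with_secret() {
  DOCKER_BUILDKIT=1 docker build --secret id=supersecret,env=SUPERSECRET -f Dockerfile.secret .
}
```

After export SUPERSECRET=secret, calling build_with_secret runs the build; docker history on the resulting image should show the RUN instruction but not the secret value.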

Use multiple Dockerfiles

Consider creating separate Dockerfiles for different purposes. For example, you can have a dedicated Dockerfile with testing and scanning tooling preinstalled and use it during the local development phase.

Remember, you can build images from different Dockerfiles by passing the -f flag, for example

docker build -t my-docker-image:v1.0 -f Dockerfile.test .

Use docker-compose to spin up multiple containers

The Docker Compose specification is a developer-focused standard for defining cloud and platform-agnostic container-based applications. Instead of running containers directly from the command line using the docker CLI, consider creating a docker-compose.yaml describing all the containers that comprise your application.

Please note that an application described with the Docker Compose specification is fully portable, so you can run it locally or in Azure Container Instances.
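For a feel of the format, here is a minimal sketch that writes a compose file with two services. The file name docker-compose.example.yaml, the service names, and the image tags are assumptions for illustration.

```shell
# Write an illustrative compose file describing a web server and a cache
cat > docker-compose.example.yaml <<'EOF'
services:
  web:
    image: nginx:1.25
    ports:
      - "8080:80"   # host port 8080 -> container port 80
    depends_on:
      - cache       # start the cache before the web service
  cache:
    image: redis:7
EOF

# To start and stop the whole application (needs Docker with the compose plugin):
#   docker compose -f docker-compose.example.yaml up -d
#   docker compose -f docker-compose.example.yaml down
```

One file now replaces two separate docker run commands, and it can live in the repository next to the source code.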

Use Kompose to convert docker-compose files to Kubernetes manifests

If you already have a docker-compose file and need a kick-start with generating Kubernetes YAML files, use kompose.

kompose allows for quick conversion from a docker-compose.yaml file to native Kubernetes manifest files.

You can download Kompose binaries from the home page
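A conversion could be wrapped like this. The helper name compose_to_k8s and the output directory are assumptions, and the sketch presumes the kompose binary is already on your PATH.

```shell
# Hypothetical helper around kompose convert
compose_to_k8s() {
  # $1: path to the compose file, $2: directory for the generated manifests
  mkdir -p "$2"
  kompose convert -f "$1" -o "$2"
}
```

For example, compose_to_k8s docker-compose.yaml k8s writes the generated Kubernetes manifests into the k8s directory, where you can review and tweak them before applying.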

Use composerize to quickly create docker-compose files from docker run commands

Docker run commands represent the imperative style of interacting with containers. A docker-compose file, on the other hand, follows the preferred, declarative style.

Composerize is a neat little tool that can quickly turn a lengthy docker run command into a docker-compose.yaml file.

composerize can generate docker-compose files either from CLI or a web based interface.

Here is an example of converting a docker run command from one of my images:
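On the command line, the conversion is a matter of prefixing the docker run command with composerize. The wrapper below is a sketch: the function name to_compose is made up, and it assumes composerize was installed globally with npm.

```shell
# Hypothetical wrapper: pass the complete docker run command as arguments
to_compose() {
  composerize "$@"
}
```

For example, to_compose docker run -d -p 8080:80 --name web nginx prints a compose definition that you can redirect into a docker-compose.yaml file.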

Control resources utilization by a container

CPU

Default CPU share per container is 1024

Option 1: If the host has multiple CPUs, it is possible to assign each container a specific CPU.

Option 2: If the host has multiple CPUs, it is possible to restrict how many CPUs a given container can use.
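Both options, plus the CPU share mentioned above, map to docker run flags. The helper names below are made up for illustration, and the values are arbitrary examples.

```shell
# Option 1: pin the container to a specific CPU (here CPU 0)
run_pinned() {
  docker run -d --cpuset-cpus="0" "$@"
}

# Option 2: cap how much CPU time the container may use (here 1.5 CPUs)
run_capped() {
  docker run -d --cpus="1.5" "$@"
}

# Relative weight: half of the default share of 1024, only relevant under contention
run_low_priority() {
  docker run -d --cpu-shares=512 "$@"
}
```

For example, run_capped nginx starts an nginx container that can use at most one and a half CPUs, no matter how many the host has.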

It’s worth noting that container orchestrators (like Kubernetes) provide declarative methods to restrict resources usage per run-time unit (pod in the case of Kubernetes).

Memory

Run the container with the --memory=<limit> flag to restrict its memory use. If the container tries to consume more memory than its limit, the kernel's out-of-memory (OOM) killer terminates it. By default the container is also allowed to use the same amount of swap space as the memory limit, effectively doubling the limit, provided of course that swap is not disabled on the host.
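If you do not want the extra swap allowance, you can set --memory-swap to the same value as --memory. A minimal sketch, with a made-up helper name and an arbitrary 256 MB limit:

```shell
# Hypothetical helper: hard memory limit of 256 MB with no additional swap
run_with_memory_limit() {
  # --memory-swap is the total of memory plus swap, so equal values mean no swap
  docker run -d --memory=256m --memory-swap=256m "$@"
}
```

For example, run_with_memory_limit nginx starts a container that is killed once it tries to use more than 256 MB.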

Map only ports you want to open

Ports mapping always goes from HOST to CONTAINER, so -p 8080:80 would be a mapping of port 8080 on the host to port 80 on the container.

Hint: Prefer using “-p” option with static port when running containers in production.
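As a short sketch of static port mappings (the helper names are made up for illustration):

```shell
# Publish container port 80 on host port 8080, on all host interfaces
run_published() {
  docker run -d -p 8080:80 "$@"
}

# Publish the same port on the loopback interface only,
# so it is not reachable from other machines
run_local_only() {
  docker run -d -p 127.0.0.1:8080:80 "$@"
}
```

Binding to 127.0.0.1 is handy for services that only a local reverse proxy or a developer on the same machine should reach.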

Use trivy to scan for image vulnerabilities

When using open-source images, it is critical to scan for security vulnerabilities. Fortunately, there are a lot of commercial as well as open-source tools to help with this task.

trivy from Aquasecurity

Using trivy is trivial ;) trivy image nginx reveals a list of vulnerabilities with links to CVEs

In addition to scanning images, trivy can also search for misconfigurations and vulnerabilities in Dockerfiles and other configuration files.
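Both kinds of scan can be wrapped in small helpers, for example to call them from a pipeline. The function names are assumptions; the severity filter and the exit code are just one possible policy.

```shell
# Hypothetical helper: fail with exit code 1 when HIGH or CRITICAL
# vulnerabilities are found in the given image
scan_image() {
  trivy image --severity HIGH,CRITICAL --exit-code 1 "$1"
}

# Hypothetical helper: check Dockerfiles and other configuration files
# under the given directory for misconfigurations
scan_config() {
  trivy config "$1"
}
```

For example, scan_image nginx followed by scan_config . covers both the base image and the project's own configuration files.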

Here is a result of trivy scan over a sample project:

Use linters on a Dockerfile

As part of your development process, ensure good linting rules for your Dockerfiles.

A good example is a simple tool called FROM:Latest developed by Replicated.

Below is a screenshot of the tool with recommendations:

Consider installing linting plugins in your editor of choice as well as running linting as part of your CI process.

Use dive to inspect images

Docker and similar tools provide an option for inspecting an image.

docker inspect [image name] displays information about the image in JSON format. The optional --format flag takes a Go template to select specific fields.

You can also pipe the output of the command to jq and query the result. For example, if you have an nginx image, you could easily query for environment variables like so: docker inspect nginx | jq '.[].Config.Env[]'

This information however is rather rudimentary. To inspect the image even deeper, use dive

Follow the installation instructions for your system. Dive shows details of image content and commands used to create layers.

Decomposing an image

If you cannot install tools like dive, it is possible to decompose a container image using this simple method.

Container images are just tar files containing other files as layers.

Here is how to extract and save an Nginx image and inspect its content:

docker save nginx > nginx_image.tar
mkdir nginx_image
cd nginx_image
tar -xvf ../nginx_image.tar
tree -C

Each layer corresponds to a command in the Dockerfile. Extracting a layer.tar file will reveal the files and settings of this layer.

Consider signing and verifying images

Supply chain attacks have recently increased in frequency. Trusted and verifiable source code and traceable software bill of materials are critical to the security and integrity of the whole ecosystem.

You can sign your images using tools from the SigStore project

Sigstore is part of Linux Foundation and defines itself as “A new standard for signing, verifying and protecting software”.

There are many tools under SigStore’s umbrella, but we are interested in Cosign. Follow the installation steps from the Cosign repo.

Here is how to sign your image and push it to the Docker hub:

cosign generate-key-pair   # generates two files: cosign.key (private key) and cosign.pub (public key)
cosign sign --key cosign.key <dockeruser/image:tag>
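Signing is only half of the story, since consumers of the image should verify the signature with the public key. A minimal sketch, assuming the key pair generated above sits in the current directory and the image has already been pushed; the helper name is made up:

```shell
# Hypothetical helper: sign an image and immediately verify the signature
sign_and_verify() {
  image="$1"
  # Sign with the private key; the signature is stored in the registry next to the image
  cosign sign --key cosign.key "$image"
  # Verify with the public key; this fails if no valid signature is found
  cosign verify --key cosign.pub "$image"
}
```

In practice the verify step usually runs on the consuming side, for example in a deployment pipeline that only has access to cosign.pub.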

Shipping containerized software has become easier and more streamlined due to standardized packaging (image) and runtime (container). CI/CD and systems automation tooling benefits from this greatly.

Nowadays pipelines follow the “X-As Code” movement and are expressed as YAML files and hosted alongside source code files in a git repository.

The exact syntax of those YAML files will vary from provider to provider. Azure DevOps, GitHub, GitLab, etc will have their variations.

Nevertheless, there are a few key components. Here is a sample YAML pipeline file for Azure DevOps with the most important definitions:

  • Resources: additional resources that pipeline needs to function. Can be other pipelines, image repositories, etc
  • Trigger: How the pipeline is triggered, can be only for a specific branch, pull request and more
  • Paths: for the trigger branch/PR what is the path where the source code is to work with
  • Variables: For convenience, most pipeline runners will provide a way to inject variables into a pipeline
  • Pool: VM or container running the pipeline jobs
  • Stages: Sequential stages of the pipeline, stages are logical grouping of jobs
  • Jobs: Another grouping level inside of stage
  • Task: actual activity carried out on the artefacts/source code
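As a rough sketch of how these components can fit together, the snippet below writes a minimal Azure DevOps pipeline file. The file name, branch, path filter, variable, and image name are all assumptions for illustration, and the resources section is left out for brevity.

```shell
# Write an illustrative Azure DevOps pipeline definition
cat > azure-pipelines.example.yml <<'EOF'
# Trigger: run for changes on main that touch files under src/
trigger:
  branches:
    include:
      - main
  paths:
    include:
      - src/*

# Variables: values injected into the pipeline
variables:
  imageName: my-api

# Pool: where the jobs run
pool:
  vmImage: ubuntu-latest

# Stages > jobs > steps
stages:
  - stage: Build
    jobs:
      - job: BuildImage
        steps:
          - script: docker build -t $(imageName):$(Build.BuildId) .
            displayName: Build container image
EOF
```

Committing a file like this together with the first Dockerfile means every change to the image is built the same way from day one.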

There is much more to CI/CD pipelines in general, the emphasis here is on actually incorporating a pipeline from the start with your project.

Build images using Kaniko or Buildah

To increase security consider building images in pipelines using Kaniko or Buildah instead of Docker.

Neither tool depends on a Docker daemon; both execute each command within a Dockerfile completely in userspace. This enables building container images in environments that can’t easily or securely run a Docker daemon, such as a standard Kubernetes cluster. Kaniko is oriented towards building images inside a Kubernetes cluster, whereas Buildah is a standalone tool that also works well outside Kubernetes and produces Docker-compatible images.

Implement image scanning in the build process

Image scanning refers to the process of analyzing the contents and the build process of a container image in order to detect security issues, vulnerabilities or bad practices.

Recommendation: there are three major image scanning tools currently available: Snyk, Sysdig and Aqua. My recommendation is to use Snyk, for more detailed comparison check out this blog

Follow these best practices when integrating image scanning with your CI/CD pipelines:

  1. Scan images from the build pipeline (CI)
  2. Scan images in repositories before containers are created from them (CI)
  3. Scan running containers (CD)
  4. Always pin the image version explicitly (DO NOT use “latest” or “staging” tags)

For a detailed explanation of how to integrate image scanning using Snyk with Azure Pipelines, for example, please refer to the Snyk documentation.

Nowadays, operations on raw containers (without an orchestrator) happen mostly for simpler workloads or in non-production environments. IoT and edge devices are an exception, but even there Kubernetes is rapidly taking over.

Installation

Installing docker engine on a Linux distro is pretty straightforward. Please follow the installation steps from Docker documentation.

Installing docker engine on Windows Server is a bit more difficult, follow this tutorial to install and configure all prerequisites.

By default only windows containers will run on Windows Server. Linux containers must be additionally switched on (part of the documentation above)

Once the docker host is installed, you can use Portainer to interact with it, monitor it and troubleshoot.

Choose the installation option depending on the environment you are in.

Sample Portainer dashboard

Once installed, docker creates a folder under /var/lib/docker/ where all the containers, images, volumes and configurations are stored. Kubernetes stores cluster state and related information in etcd, which by default listens on port 2379 for client connections and on port 2380 for peer communication. Docker Swarm mode keeps its cluster state in a built-in Raft store instead.

Use watchtower to update images

Since the docker host does not provide automated image updates, you can use Watchtower to update images automatically when they are pushed to your image registry.

docker run -d \
  --name watchtower \
  -e REPO_USER=username \
  -e REPO_PASS=password \
  -v /var/run/docker.sock:/var/run/docker.sock \
  containrrr/watchtower container_to_watch --debug

Summary

Developing containerized workloads nowadays is a primary mode of server-side software development. Whether you are working on a web app, API, batch job or service, chances are that at some point you will add “Dockerfile” to your project.

When this happens, hopefully, you’ve bookmarked this article and will find here inspiration and guidance to do things right from the start.


Written by decoder | Multi-cloud is real, Microservices are hard, Kubernetes is the future, CLIs are good.
Published by HackerNoon on 2021/09/19