Why Dockerizing Applications is the Key to Building Scalable Software

Docker, one of the most popular container technology providers, registered a record-breaking 15+ million monthly active users in February 2022.

The success of Docker is a testament to the impact that container technologies have on the entire IT landscape.

But what causes more and more developers and organizations to move their applications and services into containers?

This blog post will explain the fundamental concepts of container technologies and show you 10 reasons that make container technologies attractive for you.

What is a Container?

Containers are small packages that contain software in a runnable software environment. You can use those packages to ship software to any computer or virtual machine (VM) that supports a container runtime like Docker.

The idea behind containers is the virtualization of the operating system. Containers are processes that disguise themselves as operating systems. The container looks like an operating system for applications that run inside the container. But they are in reality processes that run on top of an existing operating system. They share system resources such as disk, memory and network with the host operating system.

Container Images

A container image is a packaged template that tells a container runtime how a container process should start. It bundles the files the application needs together with start-up settings, such as the command that the container runtime runs when we start a container.

The file format to specify a container image in the Docker ecosystem is called a Dockerfile. It contains a set of instructions that Docker carries out from top to bottom. We can create a Dockerfile with our code editor and turn it into a container image with the docker build command.

Here is an example Dockerfile from the NodeJS community:

FROM node:16

# Create app directory
WORKDIR /usr/src/app

# Install app dependencies
# A wildcard is used to ensure both package.json 
# AND package-lock.json are copied
# where available (npm@5+)
COPY package*.json ./

RUN npm install
# If you are building your code for production
# RUN npm ci --only=production

# Bundle app source
COPY . .

EXPOSE 8080
CMD [ "node", "server.js" ]

You can see that the Dockerfile above carries out commands with the RUN directive much as you would do in the terminal of your UNIX system. But Dockerfiles also offer container-specific instructions like COPY and EXPOSE, which we use to copy files into the image and to document the port that the application listens on.

Every container image references a "base image". This is done through the FROM statement. The example above references the official NodeJS Docker image. Using the NodeJS Docker image as a base image ensures that we already have a Linux container with NodeJS tools like npm and node pre-installed.

But we could have chosen any other Docker image from the official Docker Hub registry instead.

The concept of reusing existing container images by referencing base images is called image layering. Container images can consist of multiple layers. The example Dockerfile above is based on the NodeJS image layer, which is in turn based on a Debian image layer.
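The layering idea can be sketched as a toy model. The snippet below is an illustration and not how Docker stores images: it assumes that each layer is a plain dictionary mapping file paths to contents, and the layer names and file entries are made up for this example. Layers are applied from bottom to top, so a file in an upper layer shadows the same path in a lower layer.

```python
def flatten(layers):
    """Apply layers from bottom to top; upper layers shadow lower ones."""
    filesystem = {}
    for layer in layers:
        filesystem.update(layer)
    return filesystem


# Hypothetical layer contents, bottom layer first.
base_os_layer = {"/bin/sh": "shell", "/etc/motd": "welcome to the base image"}
node_layer = {"/usr/local/bin/node": "node runtime", "/usr/local/bin/npm": "npm"}
app_layer = {"/usr/src/app/server.js": "app code", "/etc/motd": "welcome to my app"}

image = flatten([base_os_layer, node_layer, app_layer])

for path in sorted(image):
    print(path, "->", image[path])
```

The application layer only has to describe what it adds or changes. Everything else comes from the layers below it.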

Container Registries

Containers are made to simplify the deployment of software. But we need to ship container images to production systems in order to start container processes.

Container registries help us to distribute container images. A registry is a database for container images that we can consume through a REST API or a client. One of the most popular container registries is the Docker Hub Registry. This is the place where many open source communities upload their official container images.

The Dockerfile example from the last section uses the official Docker Hub image for NodeJS. The docker command-line interface (CLI) searches the Docker Hub container registry by default if the required container image cannot be found on the host system.

Downloading a container image from a registry is called a "pull" or "pulling".

You can also host a container registry yourself. Nexus and Artifactory are two common applications that provide container registries. Most cloud providers like Amazon Web Services, Microsoft Azure or Google Cloud Platform offer managed container registries for users.

The Docker documentation describes how you can use another container registry with your Docker CLI.
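How an image name decides which registry gets contacted can be sketched in a few lines. The function below is a simplified illustration of the defaults that Docker applies to short image names (Docker Hub as registry, the library/ prefix for official images, and the latest tag). The function name and the registry.example.com host are made up for this example, and references pinned by digest are not handled.

```python
def normalize_reference(reference: str) -> str:
    """Expand a short image reference into registry/repository:tag."""
    name, tag = reference, "latest"

    # A colon only separates the tag if it comes after the last slash;
    # otherwise it belongs to a registry port such as localhost:5000.
    last_slash = reference.rfind("/")
    last_colon = reference.rfind(":")
    if last_colon > last_slash:
        name, tag = reference[:last_colon], reference[last_colon + 1:]

    # The first path component is a registry host if it looks like one.
    first, _, rest = name.partition("/")
    if rest and ("." in first or ":" in first or first == "localhost"):
        registry, repository = first, rest
    else:
        registry, repository = "docker.io", name
        if "/" not in repository:
            repository = "library/" + repository  # official images

    return f"{registry}/{repository}:{tag}"


for short_name in ["node:16", "node", "registry.example.com/team/app:1.2"]:
    print(short_name, "->", normalize_reference(short_name))
```

A name that starts with a host, like the third example, is left pointing at that registry. Everything else falls back to Docker Hub.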

Container Orchestration

Operating production systems with many container applications is difficult. Container orchestrators like Kubernetes, Docker Swarm and Docker Compose exist to make the deployment and maintenance of containerized production systems easier.

Container orchestrators differ greatly in complexity and features. But Docker Compose is a good orchestrator to get started with Docker. It is primarily used for local development purposes or the deployment of production systems on a single virtual machine or computer.

Other orchestrators like Kubernetes are more complex to use but make it possible to deploy container applications on infrastructure with multiple virtual machines.

Open Container Initiative

The Open Container Initiative (OCI) provides standards for the most important components of container technologies. These standards cover, for example, the container image format and the container runtime.

Container runtimes like containerd, CRI-O, Docker and Mirantis Container Runtime implement this OCI container runtime standard. That is important because container orchestrators like Kubernetes make the container runtime configurable. You can use any compatible container runtime that respects the OCI standard in your Kubernetes cluster.

Reasons to Use Container Technologies

We now understand the fundamental concepts of container technologies and can start to learn more about their use cases and benefits.

Here are the 10 reasons why you should use container technologies.

1. Resource Efficiency

Containers help us to utilize more of our system resources in our computers, servers and virtual machines.

Organizations can deploy multiple services or applications on a machine through containers while maintaining a degree of isolation between them. That makes it possible to run more software on the same machine which improves resource utilization and reduces hosting costs.
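A small back-of-the-envelope calculation makes the effect visible. The sketch below assumes five services with made-up memory needs and hosts with 2048 MiB each, and it uses a simple first-fit placement that is only meant as an illustration. It compares one dedicated machine per service with several containers sharing a machine.

```python
def first_fit(service_memory_mib, host_capacity_mib):
    """Place each service on the first host that still has room for it."""
    hosts = []  # each host is a list of the memory requests placed on it
    for request in service_memory_mib:
        if request > host_capacity_mib:
            raise ValueError(f"service needing {request} MiB exceeds host capacity")
        for host in hosts:
            if sum(host) + request <= host_capacity_mib:
                host.append(request)
                break
        else:
            # No existing host had room, so a new one is needed.
            hosts.append([request])
    return hosts


services = [512, 256, 1024, 768, 256]  # hypothetical memory needs in MiB
shared_hosts = first_fit(services, host_capacity_mib=2048)

print("dedicated machines:", len(services))
print("shared machines:", len(shared_hosts))
```

With these assumed numbers, the five services fit on two shared machines instead of five dedicated ones.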

A smaller resource overhead compared to dedicated virtual machines makes the container a cheaper deployment target for software. Containers do not provide system resources on their own but reuse existing system resources of the host machine instead.

A host machine can be anything that provides an operating system, like a virtual machine or computer. Virtual machines are an interesting platform to host container processes since they are widely used in cloud computing and software hosting in general.

Virtualization of system resources in the form of virtual machines is time-intensive and costly. Automated infrastructures typically need several minutes to bootstrap a virtual machine, while containers are up and running in seconds.

The speed and resource efficiency of containers make them a space-, resource- and time-efficient deployment option and help us to maximize the resource usage of virtual machines.

2. Isolation

Software products have been deployed on virtual machines for a long time. Linux operating systems offer service managers like systemd to orchestrate several service processes on the same virtual machine. But that can be quite challenging because of the lack of isolation between processes.

Processes running on the same Linux operating system share system-wide dependencies, disk space, network, and CPU resources. It is difficult to ensure that services running in different Linux processes do not interfere with each other.

Container processes, on the other hand, offer a higher degree of isolation. On Linux, a container process is typically started with the clone system call, whereas most other processes are created with fork and then load their program with exec.

The clone system call spawns a new process like the fork system call does. But clone has some important additional capabilities:

clone can place the child process into different Linux namespaces.
clone can let the child process share the virtual address space of its parent.
clone can let the child process share the file descriptor table of its parent.
...

We won't go into the details of how a Unix operating system works in this blog post. But the capabilities offered by the clone system call, namespaces in particular, improve the isolation of containers compared to "regular" UNIX processes.
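You can look at namespaces without writing any low-level code. The sketch below assumes a Linux system with /proc mounted, where every process lists its namespaces under /proc/<pid>/ns. The function name is made up for this example, and on other operating systems the function simply returns an empty result.

```python
import os


def namespaces_of(pid="self"):
    """Map each namespace type (e.g. 'pid', 'net') to its identifier."""
    ns_dir = f"/proc/{pid}/ns"
    found = {}
    if not os.path.isdir(ns_dir):
        return found  # not Linux, or /proc is not mounted
    for entry in sorted(os.listdir(ns_dir)):
        try:
            found[entry] = os.readlink(os.path.join(ns_dir, entry))
        except OSError:
            continue  # entry is not readable for this process
    return found


for ns_type, identifier in namespaces_of().items():
    print(f"{ns_type:>20}  {identifier}")
```

If you run the same script on the host and inside a container, you would typically see different identifiers for namespaces such as pid, net and mnt. That difference is the isolation at work.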

Container processes provide a virtualized operating system that reuses the existing system resources of the host. The virtualized operating system prevents the leakage of system-wide dependencies like dynamically linked libraries into software processes that run inside a container.

For example, installing a NodeJS NPM package in one container process running a NodeJS app does not affect other container processes. The same might not be true for two NodeJS processes running on the same Linux machine. There is the possibility that both apps use the same NodeJS interpreter which might lead to incompatibilities.

3. Automated Setup

Container images provide a declarative way to describe a container. Container runtimes use container images to start container processes on a host operating system.

The starting procedure that the container runtime carries out is automated and reproducible. Instructing the container runtime to start a container process from the same image a hundred times leads to the same result a hundred times.

That is an important quality of container technologies. It makes software deployments more predictable and bugs in software systems easier to track down. Software developers can reproduce bugs that appear in production systems more easily if the underlying application runs in a container. Developers can use the container image to run an application in the same runtime environment on their local machine for troubleshooting.
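One way to check that two machines really start from "the same image" is to compare a content digest. Registries and the OCI image format identify image content by digests of the form sha256:<hex>. The sketch below only illustrates the idea: it assumes the image content is available as bytes, and the byte strings are placeholders rather than real image data.

```python
import hashlib


def content_digest(data: bytes) -> str:
    """Return a digest string in the 'sha256:<hex>' form."""
    return "sha256:" + hashlib.sha256(data).hexdigest()


# Placeholder contents standing in for image data on two machines.
image_on_laptop = b"placeholder image content, version 1"
image_on_server = b"placeholder image content, version 1"
image_after_change = b"placeholder image content, version 2"

print(content_digest(image_on_laptop) == content_digest(image_on_server))
print(content_digest(image_on_laptop) == content_digest(image_after_change))
```

Identical content always yields the same digest, so a matching digest tells you that production and your local machine run the same image.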

4. Reusability

The container image ecosystem works like an onion. Container images consist of multiple layers that wrap around each other.

Reusing rather than rewriting container images makes it easier to specify a container. A NodeJS developer can use the official NodeJS Docker Hub image to containerize their application. There is no need to specify installation routines for a NodeJS interpreter or the NPM package manager. This step is already covered by the NodeJS base container image.

Compare this approach with the typical installation workflow on virtual machines. A vanilla virtual machine comes with an operating system only. DevOps engineers have to connect to the virtual machine through SSH to install software dependencies and parts of the software runtime by hand.

Additional software like Chef or Ansible can automate this process but administrators have to configure and maintain automation workflows.

5. Flexibility

A Docker container can be deployed to any operating system with a Docker Engine installed. Docker supports a wide range of operating systems, from Windows and macOS to most Linux distributions.

Being able to deploy a container across many operating systems offers flexibility. It makes us more independent from conditions that we meet on our infrastructure. Virtual machines on Amazon Web Services might differ greatly from virtual machines on Microsoft Azure. So your software applications might require different dependencies depending on the infrastructure that you deploy them on.

Container technologies ship their own software runtime and circumvent discrepancies between infrastructures. That makes it easier to deploy them on different infrastructures.

6. Reproducibility

The premise of containers is that software running inside them behaves the same, regardless of which host system we deploy them on. The extended isolation of container processes compared to "regular" UNIX processes helps this premise hold true.

Knowing that your containerized application behaves the same on any host system makes it easier to reproduce and debug problems that happen in production systems. Software developers can run containerized applications on their local machine and debug a problem that the customer reported in the production environment.

7. Interoperability

Container technologies simplify the collaboration between developers and operators. Operators can provide the container image while software developers focus on programming the software application specified in the container image.

Changes in the software application rarely require changes to the container image. This makes it easier to separate the developer and operator roles in a software company.

Container images function as a contract between developer and operator. This contract specifies how software can be deployed on the production system through a container runtime.

Compare this with "regular" software processes running on a developer machine or a virtual machine. Complex software processes require different data services and software dependencies. Developers and operators have to install these runtime dependencies on local development machines and virtual machines alike. This results in a confusing installation procedure that can lead to misunderstandings.

8. Composability

Composing software services and applications across multiple virtual machines can be challenging. Platform providers tend to install applications and services on dedicated virtual machines to limit potential side effects.

Additional software like Chef or Ansible can automate this process but requires configuration and maintenance by qualified personnel.

Large microservice architectures with multiple applications and services that communicate with each other are common these days.

Container orchestrators like Kubernetes can help to schedule and deploy containerized microservices across multiple virtual machines.

9. Scalability

Containers are fast. It takes a few seconds to start a container on a host system. They are so fast that Kubernetes kills failing container services and starts a new service rather than fixing the old instance.
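The "replace instead of repair" idea can be sketched as a tiny loop. This is a toy model and not the Kubernetes API: the Instance class, the reconcile function and the web-N names are made up for this example. The loop drops every unhealthy instance and starts fresh ones until the desired count is reached again.

```python
from dataclasses import dataclass
from itertools import count


@dataclass
class Instance:
    name: str
    healthy: bool = True


def reconcile(running, desired_count, new_name):
    """Drop unhealthy instances, then start fresh ones until the count matches."""
    kept = [instance for instance in running if instance.healthy]
    while len(kept) < desired_count:
        kept.append(Instance(name=new_name()))
    return kept[:desired_count]


counter = count(1)


def new_name():
    """Hand out the next hypothetical instance name: web-1, web-2, ..."""
    return f"web-{next(counter)}"


running = [Instance(new_name()) for _ in range(3)]
running[1].healthy = False  # the second instance starts failing

running = reconcile(running, desired_count=3, new_name=new_name)
print([instance.name for instance in running])
```

This approach only pays off because a replacement starts in seconds. With start-up times of several minutes, the service would run below capacity for much longer.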

Deployment speed is an important metric in software hosting. It reduces the downtime of services during updates and makes it easier to scale out.

Creating a new virtual machine on an automated infrastructure (IaaS) like AWS, on the other hand, takes minutes. Requesting a larger VM on the same infrastructure to satisfy the growing resource demand of an application costs you another couple of minutes.

This makes it much more difficult to scale out applications or services that run as processes on virtual machines instead of containers.

10. Less Permissive

It can be challenging to deploy applications on virtual machines if you don't have sufficient permissions to install third-party applications and dependencies.

Many Linux users install software and dependencies through a package manager like aptitude with superuser permissions. This can be done with the sudo command that runs a terminal command with superuser privileges.

But giving every developer and operator superuser privileges is dangerous. Many organizations restrict superuser privileges on production virtual machines for this reason.

Container processes do not need to be executed with superuser privileges. They ship their own virtual operating system and developers specify required dependencies through the container image.

Summary

Container technologies revolutionized how we as a community operate and deploy software.

This blog post taught us the fundamental concepts of container technologies along with their advantages.

But containers are not a magic pill. They have limitations like any other technology. One of the weaknesses of container technologies is at the same time their biggest strength. The lack of virtualized system resources makes containers slim and fast, but it weakens their isolation in comparison to virtual machines.

A container does not virtualize the network, storage, or server resources. It reuses existing system resources of the host system along with other containers and processes. Unless you configure explicit limits, a container runtime does not cap the bandwidth of IO operations for a container process. This problem is also known as the IOPS problem.

Placing two large containerized databases on the same virtual machine might lead to resource contention. One database could occupy the complete bandwidth for data IO operations of the host system, effectively leaving the other database dry.

And those are the 10 reasons why you should use container technologies.

I recommend you have a look at the excellent documentation of Docker if you want to get started with container technologies now.
