Make your containerized CI environments truly useful by accelerating your Docker builds
The modern software development cycle often means packaging your applications as containers. This task can be time-consuming and may slow down your testing or deployment significantly. The problem is especially obvious in a continuous integration and deployment process, where images are built at every code modification.
In this article, we will discuss various ways of speeding up the build time of Docker images in a continuous integration pipeline by implementing different strategies.
As an example, we will take a minimal Python Flask application. It cannot get simpler than that:
from flask import Flask

app = Flask(__name__)


@app.route('/')
def hello_world():
    return 'Hello, World!'
Writing the Dockerfile
Let’s write the corresponding Dockerfile:
FROM python:3.7-alpine as builder
# install dependencies required to build python packages
RUN apk update && apk add --no-cache make gcc && pip install --upgrade pip
# setup venv and download or build dependencies
ENV VENV="/venv"
ENV PATH="${VENV}/bin:${PATH}"
COPY requirements.txt .
RUN python -m venv ${VENV} \
&& pip install --no-cache-dir -r requirements.txt
FROM python:3.7-alpine
# setup venv with dependencies from the builder stage
ENV VENV="/venv"
ENV PATH="${VENV}/bin:$PATH"
COPY --from=builder ${VENV} ${VENV}
# copy app files
WORKDIR /app
COPY app .
# run the app
EXPOSE 5000
ENV FLASK_APP="hello.py"
CMD [ "flask", "run", "--host=0.0.0.0" ]
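You can see here a classic multi-stage build process: a builder stage installs the build tools and the dependencies into a virtual environment, then a final stage copies that virtual environment and the application files into a fresh base image.
Why this two-stage process? First, the build is isolated and therefore more secure, because it runs in a container without interference from the host environment. Second, the final image stays slim: it contains only what is required to run the app, without the build libraries.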
Running and testing the image
Making sure everything is working as expected:
docker build -t hello .
docker run -d --rm -p 5000:5000 hello
curl localhost:5000
Hello, World!
If you run the docker build command a second time:
docker build -t hello .
...
Step 2/15 : RUN apk update && apk add --no-cache make gcc && pip install --upgrade pip
---> Using cache
---> 24d044c28dce
...
As you can see, this second build is much quicker because the layers are cached by your local Docker daemon and are reused when nothing has changed.
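To make the caching behaviour more concrete, here is a minimal sketch in Python of how a chained layer cache behaves. It is a toy model and not Docker's actual implementation: the function names are hypothetical, and real Docker also checksums the files referenced by COPY and ADD, which this sketch ignores.

```python
import hashlib


def layer_keys(instructions):
    """Return one cache key per instruction, each chained to the previous key."""
    keys = []
    parent = ""
    for instruction in instructions:
        digest = hashlib.sha256((parent + "\n" + instruction).encode()).hexdigest()
        keys.append(digest)
        parent = digest
    return keys


def build(instructions, cache):
    """Simulate a build against `cache` (a set of keys); report each step."""
    results = []
    for key in layer_keys(instructions):
        if key in cache:
            results.append("cached")
        else:
            cache.add(key)
            results.append("built")
    return results


dockerfile = [
    "FROM python:3.7-alpine",
    "RUN apk add --no-cache make gcc",
    "COPY requirements.txt .",
    "RUN pip install -r requirements.txt",
]

cache = set()  # stands in for the local daemon's layer store
first = build(dockerfile, cache)
second = build(dockerfile, cache)

# Changing the third instruction invalidates it and every step after it,
# because each key depends on the key of the step before.
changed = dockerfile[:2] + ["COPY requirements-dev.txt ."] + dockerfile[3:]
third = build(changed, cache)

print(first)   # every step is built
print(second)  # every step is cached
print(third)   # the first two steps are cached, the last two are built
```

The important property is the chaining: once one step misses the cache, none of the following steps can hit it.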
Pushing the image
Let’s publish our image to an external registry and see what happens:
docker tag hello my-registry/hello:1.0
docker push my-registry/hello:1.0
The push refers to repository [my-registry/hello]
8388d558f57d: Pushed
77a59788172c: Pushed
673c6888b7ef: Pushed
fdb8581dab88: Pushed
6360407af3e7: Pushed
68aa0de28940: Pushed
f04cc38c0ac2: Pushed
ace0eda3e3be: Pushed
1.0: digest: sha256:d815c1694083ffa8cc379f5a52ea69e435290c9d1ae629969e82d705b7f5ea95 size: 1994
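Note how each layer is identified by a hash. The 8 layers listed are those of the final image only: the ones inherited from the python:3.7-alpine base image plus the ones created by the filesystem-changing instructions (COPY, WORKDIR) that follow our last FROM instruction. Metadata-only instructions such as ENV, EXPOSE and CMD do not add a layer to push.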
It’s important to understand that the layers from our builder stage are not sent to the remote Docker registry when we push our image; only the layers from the last stage are pushed. The intermediate layers are still cached in the local Docker daemon though, so they can be reused for your next local build command.
No problem with local builds, so let’s now see how it works in a CI environment.
In real life, building and pushing Docker images isn’t necessarily done locally like this; it typically runs inside a continuous integration and deployment platform. You want to build and push your image at every code change before deploying your application. Of course, the build time is critical as you want a very fast feedback loop.
Test CI environment
We will use a CI environment built on GitLab pipelines whose jobs run in a Kubernetes cluster.
The Kubernetes part is important because our CI jobs will run in a containerized environment: each job is spawned as a Kubernetes Pod. Every modern CI solution uses containerized jobs, and they all face the same problem when trying to build Docker images: you need to make Docker commands work inside a Docker container.
To make everything go smoothly you have two options: expose the Docker daemon of the host to the job container (typically by mounting its socket), or run a dedicated Docker daemon in a container of its own, known as Docker-in-Docker (DinD).
We will use the latter option for simplicity.
GitLab pipeline implementation
In a GitLab pipeline, you usually create utility containers like DinD by means of the services keyword.
In the pipeline excerpt below, both the docker-build job and the dind service container run in the same Kubernetes Pod. When docker is used in the job’s script, it sends its commands to the auxiliary dind container thanks to the DOCKER_HOST environment variable.
stages:
  - build
  - test
  - deploy

variables:
  # disable Docker TLS validation
  DOCKER_TLS_CERTDIR: ""
  # localhost address is shared by both the job container and the dind container (as they share the same Pod)
  # So this configuration makes the dind service our Docker daemon when running Docker commands
  DOCKER_HOST: "tcp://localhost:2375"

services:
  - docker:stable-dind

docker-build:
  image: docker:stable
  stage: build
  script:
    - docker build -t hello .
    # docker tag needs both the source image and the target name
    - docker tag hello my-registry/hello:${CI_COMMIT_SHORT_SHA}
    - docker push my-registry/hello:${CI_COMMIT_SHORT_SHA}
This pipeline should run fine. By running it once and checking the job output we have:
docker build -t hello .
Step 1/15 : FROM python:3.7-alpine as builder
...
Step 2/15 : RUN apk update && apk add --no-cache make gcc && pip install --upgrade pip
---> Running in ca50f59a21f8
fetch http://dl-cdn.alpinelinux.org/alpine/v3.12/main/x86_64/APKINDEX.tar.gz
...
As it’s the first time we are building our container, every layer is built by executing commands. The total run time of the job is around 1 minute.
If you run your pipeline a second time without changing anything, you should observe the same thing: every layer is rebuilt! When we ran our build commands locally, cached layers were reused, but not here. For such a simple image it doesn’t really matter, but in real life, where some images may take tens of minutes to build, it can be a real hassle.
Why is that? Simply because in this case dind is a temporary container that is created with the job and dies once the job is done, so any cached data is lost. Sadly, you cannot easily persist the data between two pipeline runs.
How can we benefit from the cache and still run a dind container?
One solution: Pull/Push dancing
The first solution is rather straightforward: we will use our remote registry (the one we push into) as a remote cache for our layers.
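More precisely, each job pulls the latest image from the registry (tolerating a failure when it does not exist yet), builds with the --cache-from option so that the pulled layers serve as a cache, then tags the result and pushes it back under both the commit tag and latest: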
stages:
  - build
  - test
  - deploy

variables:
  # disable Docker TLS validation
  DOCKER_TLS_CERTDIR: ""
  DOCKER_HOST: "tcp://localhost:2375"

services:
  - docker:stable-dind

docker-build:
  image: docker:stable
  stage: build
  script:
    - docker pull my-registry/hello:latest || true
    - docker build --cache-from my-registry/hello:latest -t hello:latest .
    - docker tag hello:latest my-registry/hello:${CI_COMMIT_SHORT_SHA}
    - docker tag hello:latest my-registry/hello:latest
    - docker push my-registry/hello:${CI_COMMIT_SHORT_SHA}
    - docker push my-registry/hello:latest
If you run this new pipeline two times, the cache use is still disappointing.
The layers from the builder stage are all rebuilt. Only the first 2 layers (steps 8 and 9) of the final stage use the cache; the following layers are rebuilt.
As we saw earlier when pushing our image from our local machine, the layers of the builder stage are not pushed to the remote registry and are effectively lost. Consequently, when we pull the latest image, they are not there and need to be rebuilt.
Then, when our final stage is built (steps 8 to 15), the first two layers are present in the image we pulled and are used as cache. But in step 10 we get the dependencies from the builder stage, which has changed, so every step after that is also built again.
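The reasoning above can be replayed with a small Python sketch. It is a toy model of the behaviour described in this article, not of Docker's internals: the simulate function and the "final" stage label are hypothetical names introduced only for illustration, and FROM steps are reported separately because they refer to a base image rather than a built layer.

```python
# (stage, instruction) pairs mirroring the 15 steps of the Dockerfile above.
# "final" is a hypothetical label for the unnamed last stage.
STEPS = [
    ("builder", "FROM python:3.7-alpine as builder"),
    ("builder", "RUN apk update && apk add --no-cache make gcc && pip install --upgrade pip"),
    ("builder", 'ENV VENV="/venv"'),
    ("builder", 'ENV PATH="${VENV}/bin:${PATH}"'),
    ("builder", "COPY requirements.txt ."),
    ("builder", "RUN python -m venv ${VENV} && pip install --no-cache-dir -r requirements.txt"),
    ("final", "FROM python:3.7-alpine"),
    ("final", 'ENV VENV="/venv"'),
    ("final", 'ENV PATH="${VENV}/bin:$PATH"'),
    ("final", "COPY --from=builder ${VENV} ${VENV}"),
    ("final", "WORKDIR /app"),
    ("final", "COPY app ."),
    ("final", "EXPOSE 5000"),
    ("final", 'ENV FLASK_APP="hello.py"'),
    ("final", 'CMD [ "flask", "run", "--host=0.0.0.0" ]'),
]


def simulate(steps, cached_stages):
    """Return 'base', 'cached' or 'built' for each step.

    cached_stages: names of the stages whose layers were pulled from the registry.
    """
    results = []
    rebuilt_stages = set()  # stages in which at least one step had to be rebuilt
    for stage, instruction in steps:
        if instruction.startswith("FROM "):
            results.append("base")
            continue
        source = None
        if instruction.startswith("COPY --from="):
            source = instruction.split()[1].split("=", 1)[1]
        hit = (
            stage in cached_stages
            and stage not in rebuilt_stages   # a miss invalidates the rest of the stage
            and source not in rebuilt_stages  # copying from a rebuilt stage is a miss
        )
        if hit:
            results.append("cached")
        else:
            results.append("built")
            rebuilt_stages.add(stage)
    return results


results = simulate(STEPS, cached_stages={"final"})
cached = [number for number, result in enumerate(results, start=1) if result == "cached"]
print(cached)  # prints [8, 9]
```

In this model, only the layers of the last stage are available as cache, so the builder steps miss, and the COPY --from=builder step then drags the rest of the final stage with it.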
To sum up, cache use is modest: only 2 steps out of 15 benefit from it! To improve this, we need to push the intermediary builder image to the remote registry to persist its layers:
stages:
  - build
  - test
  - deploy

variables:
  # disable Docker TLS validation
  DOCKER_TLS_CERTDIR: ""
  DOCKER_HOST: "tcp://localhost:2375"

services:
  - docker:stable-dind

docker-build:
  image: docker:stable
  stage: build
  script:
    - docker pull my-registry/hello-builder:latest || true
    - docker pull my-registry/hello:latest || true
    - docker build --cache-from my-registry/hello-builder:latest --target builder -t hello-builder:latest .
    - docker build --cache-from my-registry/hello:latest --cache-from my-registry/hello-builder:latest -t hello:latest .
    - docker tag hello-builder:latest my-registry/hello-builder:latest
    - docker tag hello:latest my-registry/hello:${CI_COMMIT_SHORT_SHA}
    - docker tag hello:latest my-registry/hello:latest
    - docker push my-registry/hello-builder:latest
    - docker push my-registry/hello:${CI_COMMIT_SHORT_SHA}
    - docker push my-registry/hello:latest
We build our intermediary builder stage as a proper Docker image using the --target option. After that, we push it to the remote registry so that later runs can pull it as a cache for building our final image. When running the pipeline, our time is down to 15 seconds!
You can see the build is slowly becoming quite complicated. If you are already lost, just think about an image with 3 or 4 intermediary stages! It does work though. Another drawback is that you have to upload and download all these layers each time, which may be quite expensive in storage and transfer costs.
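To give an idea of how the script grows with the number of stages, here is a minimal Python sketch that generates the command list. The function name, its parameters and the "image-stage" naming scheme are assumptions made for illustration (the naming follows the hello-builder example above), and using every earlier stage as a cache source for the later ones is one possible choice, not a requirement.

```python
def cache_dance_commands(registry, image, stages, commit_tag):
    """Return the pull/build/tag/push commands for a multi-stage image.

    stages: names of the intermediate stages, in Dockerfile order
            (the final, unnamed stage is implicit).
    """
    stage_images = [f"{image}-{stage}" for stage in stages]
    commands = []

    # 1. Pull every cache source; "|| true" tolerates a missing image on the first run.
    for name in stage_images + [image]:
        commands.append(f"docker pull {registry}/{name}:latest || true")

    # 2. Build each intermediate stage, reusing the stages before it as cache sources.
    stage_sources = []
    for stage, name in zip(stages, stage_images):
        stage_sources.append(f"--cache-from {registry}/{name}:latest")
        sources = " ".join(stage_sources)
        commands.append(f"docker build {sources} --target {stage} -t {name}:latest .")

    # 3. Build the final image with every pulled image as a cache source.
    sources = " ".join([f"--cache-from {registry}/{image}:latest"] + stage_sources)
    commands.append(f"docker build {sources} -t {image}:latest .")

    # 4. Tag everything, then push everything.
    targets = [(f"{name}:latest", f"{registry}/{name}:latest") for name in stage_images]
    targets.append((f"{image}:latest", f"{registry}/{image}:{commit_tag}"))
    targets.append((f"{image}:latest", f"{registry}/{image}:latest"))
    commands += [f"docker tag {local} {remote}" for local, remote in targets]
    commands += [f"docker push {remote}" for _, remote in targets]
    return commands


for command in cache_dance_commands("my-registry", "hello", ["builder"], "${CI_COMMIT_SHORT_SHA}"):
    print(command)
```

With a single builder stage this yields the 10 commands of the pipeline above. Each additional stage adds four more (one pull, one build, one tag, one push), which illustrates both the complexity and the extra registry traffic.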
Another solution: external dind service
We need a dind service running to execute our docker build. In our previous attempt, dind was embedded into each job and shared the job’s lifecycle, making it impossible to build a proper cache.
Why not make dind a first-class citizen by creating a dind service in our Kubernetes cluster? It would run with a PersistentVolume attached to hold the cached data, and every job could send its docker commands to this shared service.
Creating such a service in Kubernetes is easy:
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  labels:
    app: docker-dind
  name: dind
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 500Gi
---
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app: docker-dind
  name: dind
spec:
  replicas: 1
  selector:
    matchLabels:
      app: docker-dind
  template:
    metadata:
      labels:
        app: docker-dind
    spec:
      containers:
        - image: docker:19.03-dind
          name: docker-dind
          env:
            - name: DOCKER_HOST
              value: tcp://0.0.0.0:2375
            - name: DOCKER_TLS_CERTDIR
              value: ""
          volumeMounts:
            - name: dind-data
              mountPath: /var/lib/docker/
          ports:
            - name: daemon-port
              containerPort: 2375
              protocol: TCP
          securityContext:
            privileged: true # Required for the dind container to work.
      volumes:
        - name: dind-data
          persistentVolumeClaim:
            claimName: dind
---
apiVersion: v1
kind: Service
metadata:
  labels:
    app: docker-dind
  name: dind
spec:
  ports:
    - port: 2375
      protocol: TCP
      targetPort: 2375
  selector:
    app: docker-dind
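Once the service is deployed, it can be handy to check from inside the cluster that the daemon port answers before pointing jobs at it. The following Python sketch is one possible pre-flight check using only the standard library; the helper names are hypothetical, and it only verifies that a TCP connection can be opened, not that the Docker API itself is healthy.

```python
import socket
from urllib.parse import urlsplit


def parse_docker_host(value):
    """Split a tcp:// DOCKER_HOST value into (host, port)."""
    parts = urlsplit(value)
    if parts.scheme != "tcp" or parts.hostname is None or parts.port is None:
        raise ValueError(f"expected tcp://host:port, got {value!r}")
    return parts.hostname, parts.port


def is_reachable(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port opens within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


# "dind" is the name of the Kubernetes Service defined above.
print(parse_docker_host("tcp://dind:2375"))  # prints ('dind', 2375)
```

From a Pod in the same namespace, calling is_reachable(*parse_docker_host("tcp://dind:2375")) should return True once the Deployment is ready.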
Then we slightly modify our original GitLab pipeline to point to this new external service and remove the built-in dind service:
stages:
  - build
  - test
  - deploy

variables:
  # disable Docker TLS validation
  DOCKER_TLS_CERTDIR: ""
  # here the dind hostname is resolved as the Kubernetes dind service by the kube dns
  DOCKER_HOST: "tcp://dind:2375"

docker-build:
  image: docker:stable
  stage: build
  script:
    - docker build -t hello .
    - docker tag hello:latest my-registry/hello:${CI_COMMIT_SHORT_SHA}
    - docker push my-registry/hello:${CI_COMMIT_SHORT_SHA}
If you run the pipeline twice, the second build should take about 10 seconds, even better than our previous solution. For a “big” image taking around 10 minutes to build, this strategy also reduces the build time to a few seconds when no layers have changed.
One last option: using Kaniko
A final option may be to use Kaniko. With it, you can build Docker images without needing a Docker daemon, which makes everything we saw above a non-problem.
However, please note that by doing so you cannot use advanced BuildKit options such as injecting secrets when building your image. For this reason, it’s not the solution I retained.
As software development makes heavy use of containers everywhere, building them efficiently is key to your release pipeline. As we’ve seen, the problem can become quite complex and every solution has its trade-offs. The solutions proposed here are illustrated with GitLab, but keep in mind that they apply to any other containerized CI environment.
Read behind a paywall at https://medium.com/swlh/dramatically-improve-your-docker-build-time-in-gitlab-ci-db0259f1bb08