Make your containerized CI environments truly useful by accelerating your Docker builds.

The modern software development cycle often means packaging your applications as containers. This task can be time-consuming and may slow down your testing or deployment significantly. The problem is especially obvious in a continuous integration and deployment process where images are built at every code modification.

In this article, we will discuss various strategies for speeding up the build time of Docker images in a continuous integration pipeline.

Packaging a sample application locally

As an example, we will first take a Python application. It cannot be simpler than that:

from flask import Flask

app = Flask(__name__)

@app.route('/')
def hello_world():
    return 'Hello, World!'

Writing the Dockerfile

Let’s write the corresponding Dockerfile:

FROM python:3.7-alpine as builder

# install dependencies required to build python packages
RUN apk update && apk add --no-cache make gcc && pip install --upgrade pip

# setup venv and download or build dependencies
ENV VENV="/venv"
ENV PATH="${VENV}/bin:${PATH}"

COPY requirements.txt .
RUN python -m venv ${VENV} \
    && pip install --no-cache-dir -r requirements.txt

FROM python:3.7-alpine

# setup venv with dependencies from the builder stage
ENV VENV="/venv"
ENV PATH="${VENV}/bin:$PATH"
COPY --from=builder ${VENV} ${VENV}

# copy app files
WORKDIR /app
COPY app .

# run the app
EXPOSE 5000
ENV FLASK_APP="hello.py"
CMD [ "flask", "run", "--host=0.0.0.0" ]

You can see here a classic multi-stage build process:

- We start with a light base image in which we install the build tools and download or compile the dependencies into a Python virtual environment.
- In the second stage, we copy the virtual env with our dependencies into the target image and finally add the application files.

Why this two-stage process? First, you get a safer build process, as it runs in a container without interference from the host environment. Second, you get a slim final image, without the build libraries, containing only what is required to run the app.

Running and testing the image

Making sure everything is working as expected:

docker build -t hello .
docker run -d --rm -p 5000:5000 hello
curl localhost:5000
Hello, World!

If you run the docker build command a second time:

docker build -t hello .
...
Step 2/15 : RUN apk update && apk add --no-cache make gcc && pip install --upgrade pip
 ---> Using cache
 ---> 24d044c28dce
...

As you can see, this second build is much quicker, as layers are cached in your local Docker service and are reused if they present no change.

Pushing the image

Let’s publish our image to an external registry and see what happens:

docker tag hello my-registry/hello:1.0
docker push my-registry/hello:1.0

The push refers to repository [my-registry/hello]
8388d558f57d: Pushed 
77a59788172c: Pushed 
673c6888b7ef: Pushed 
fdb8581dab88: Pushed
6360407af3e7: Pushed
68aa0de28940: Pushed
f04cc38c0ac2: Pushed
ace0eda3e3be: Pushed
latest: digest: sha256:d815c1694083ffa8cc379f5a52ea69e435290c9d1ae629969e82d705b7f5ea95 size: 1994

Note how each intermediary layer is identified by a hash. We count 8 layers because we have exactly 8 Docker commands in our Dockerfile beyond our last FROM instruction.

It’s important to understand that layers from our base builder image are not sent to the remote Docker registry when we push our image; only layers from the last stage are pushed. The intermediate layers are still cached in the local Docker daemon though, so they can be reused for your next local build command.

No problem with local builds. Let’s now see how it works in a CI environment.

Building the Docker image in a CI pipeline context

In real life, building and pushing Docker images isn’t necessarily done locally like this but typically runs inside a continuous integration and deployment platform. You want to build and push your image at every code change before deploying your application. Of course, the build time is critical as you want a very fast feedback loop.

Test CI environment

We will use a CI environment leveraging:

- GitLab.com CI for hosting
- GitLab Runner
- Kubernetes Executor

The last point is important because our CI jobs will run in a containerized environment. With that in mind, each job is spawned as a Kubernetes Pod. Every modern CI solution uses containerized jobs and all face the same problem when trying to build Docker containers: you need to make the Docker commands work inside a Docker container.

To make everything go smoothly you have two options:

- Binding the /var/run/docker.sock socket on which the Docker daemon listens, effectively making the host daemon available to our job container.
- Using an additional container running “Docker in Docker” (aka dind) alongside your job. Dind is a special Docker variant running as privileged and configured to be able to run inside Docker itself 😵

We will use the latter option for simplicity.
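From the job's point of view, the two options mostly differ in which endpoint the docker CLI talks to, and the CLI reads that endpoint from the DOCKER_HOST environment variable. The snippet below is only a minimal sketch of that idea: the `docker_env` helper and the option names are hypothetical, and the dind address assumes the plain TCP endpoint on port 2375 used in the pipeline configuration later in this article.

```python
import os

# Default Unix socket of a local Docker daemon (option 1: bind-mounted socket).
SOCKET_ENDPOINT = "unix:///var/run/docker.sock"
# Assumed dind endpoint (option 2): plain TCP on localhost, port 2375.
DIND_ENDPOINT = "tcp://localhost:2375"


def docker_env(option, base_env=None):
    """Return a copy of the environment with DOCKER_HOST set for the chosen option.

    `option` is a hypothetical label: "socket" or "dind".
    """
    endpoints = {"socket": SOCKET_ENDPOINT, "dind": DIND_ENDPOINT}
    if option not in endpoints:
        raise ValueError(f"unknown option: {option!r}")
    env = dict(os.environ if base_env is None else base_env)
    env["DOCKER_HOST"] = endpoints[option]
    return env


print(docker_env("dind", base_env={})["DOCKER_HOST"])  # tcp://localhost:2375
```

Such an environment could then be handed to a docker command, for example with `subprocess.run(["docker", "info"], env=docker_env("dind"))`, provided a daemon is actually listening on that address.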
GitLab pipeline implementation

In a GitLab pipeline, you usually create utility containers like dind by means of the services keyword.

In the pipeline excerpt below, both the docker-build job and the dind service container will run in the same Kubernetes Pod. When docker is used in the job’s script, it sends commands to the dind auxiliary container thanks to the DOCKER_HOST environment variable.

stages:
  - build
  - test
  - deploy

variables:
  # disable Docker TLS validation
  DOCKER_TLS_CERTDIR: ""
  # localhost address is shared by both the job container and the dind container (as they share the same Pod)
  # So this configuration makes the dind service our Docker daemon when running Docker commands
  DOCKER_HOST: "tcp://localhost:2375"

services:
  - docker:stable-dind

docker-build:
  image: docker:stable
  stage: build
  script:
    - docker build -t hello .
    - docker tag hello:latest my-registry/hello:${CI_COMMIT_SHORT_SHA}
    - docker push my-registry/hello:${CI_COMMIT_SHORT_SHA}

Running the pipeline

This pipeline should run fine. Running it once and checking the job output, we have:

docker build -t hello .

Step 1/15 : FROM python:3.7-alpine as builder
...
Step 2/15 : RUN apk update && apk add --no-cache make gcc && pip install --upgrade pip
 ---> Running in ca50f59a21f8
fetch http://dl-cdn.alpinelinux.org/alpine/v3.12/main/x86_64/APKINDEX.tar.gz
...

As it’s the first time we are building our container, every layer is built by executing commands. The total run time of the job is around 1 minute.

If you run your pipeline a second time without changing anything, you should observe the same thing: every layer is rebuilt! When we ran our build commands locally, cached layers were reused, but not here. For such a simple image it doesn’t really matter, but in real life, where some images may take tens of minutes to build, it can be a real hassle.

Why is that? Simply because in this case dind is a temporary container that is created with the job and dies after the job is done, so any cached data is lost. Sadly, you cannot easily persist the data between two pipeline launches.

How can we benefit from the cache while still running a dind container?

Benefiting from the Docker cache while running Docker in Docker

One solution: Pull/Push dancing

The first solution is rather straightforward: we will use our remote registry (the one we push into) as a remote cache for our layers.

More precisely:

- We start by pulling the most recent image (i.e. latest) from the remote registry to be used as a cache for the subsequent docker build command.
- Then we build the image using the pulled image as a cache (--cache-from argument) if available. We tag this new build with latest and with the commit SHA.
- Finally we push both tagged images to the remote registry so that they may also be used as cache for subsequent builds.

stages:
  - build
  - test
  - deploy

variables:
  # disable Docker TLS validation
  DOCKER_TLS_CERTDIR: ""
  DOCKER_HOST: "tcp://localhost:2375"

services:
  - docker:stable-dind

docker-build:
  image: docker:stable
  stage: build
  script:
    - docker pull my-registry/hello:latest || true
    - docker build --cache-from my-registry/hello:latest -t hello:latest .
    - docker tag hello:latest my-registry/hello:${CI_COMMIT_SHORT_SHA}
    - docker tag hello:latest my-registry/hello:latest
    - docker push my-registry/hello:${CI_COMMIT_SHORT_SHA}
    - docker push my-registry/hello:latest

If you run this new pipeline twice, the cache use is still disappointing.

The layers from the base builder image are all rebuilt. Only the first 2 layers (8 & 9) of the final stage are using the cache, but the following layers are rebuilt.

As we saw earlier when pushing our image from the local machine, the layers of the base builder image are not pushed to the remote registry and are effectively lost. Consequently, when we pull the latest image, they are not there and need to be rebuilt.

Then, when our final stage image is built (steps 8 to 15), the first two layers are present in the image we pulled and used as cache. But in step 10 we are getting dependencies from the builder image, which have changed, so every step after it is also built again.

To sum it up, there is only a modest cache use, with 2 steps out of 15 benefiting from the cache! To improve it, we need to push the intermediary builder image to the remote registry to persist its layers:

stages:
  - build
  - test
  - deploy

variables:
  # disable Docker TLS validation
  DOCKER_TLS_CERTDIR: ""
  DOCKER_HOST: "tcp://localhost:2375"

services:
  - docker:stable-dind

docker-build:
  image: docker:stable
  stage: build
  script:
    - docker pull my-registry/hello-builder:latest || true
    - docker pull my-registry/hello:latest || true
    - docker build --cache-from my-registry/hello-builder:latest --target builder -t hello-builder:latest .
    - docker build --cache-from my-registry/hello:latest --cache-from my-registry/hello-builder:latest -t hello:latest .
    - docker tag hello-builder:latest my-registry/hello-builder:latest
    - docker tag hello:latest my-registry/hello:${CI_COMMIT_SHORT_SHA}
    - docker tag hello:latest my-registry/hello:latest
    - docker push my-registry/hello-builder:latest
    - docker push my-registry/hello:${CI_COMMIT_SHORT_SHA}
    - docker push my-registry/hello:latest

We build our builder intermediary stage as a proper Docker image using the --target option. After that, we push it to the remote registry, eventually pulling it as a cache for building our final image.

When running the pipeline, our time is down to 15 seconds!

You can see the build is slowly becoming quite complicated. If you are already lost, just think about an image with 3 or 4 intermediary stages! It does work though. Another drawback is that you have to upload and download all these layers each time, which may be quite expensive in storage and transfer costs.

Another solution: external dind service

We need a dind service running to execute our docker build. In our previous attempt, dind was embedded into each job and shared the job's lifecycle, making it impossible to build a proper cache.

Why not make dind a first-class citizen by creating a dind service in our Kubernetes cluster? It would run with a PersistentVolume attached to handle the cached data, and every job could send its docker commands to this shared service.
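To see why a long-lived daemon changes the picture, here is a toy model in Python, not Docker's actual implementation. The `ToyDaemon` class, the `layer_keys` helper and the simplified `STEPS` list are hypothetical names made up for this illustration; the real cache also takes into account things like the checksums of copied files. The sketch only assumes that a layer is reused when the same instruction follows the same parent layer.

```python
import hashlib


def layer_keys(instructions):
    """Derive one cache key per instruction, chained to the previous key.

    Chaining means that changing one instruction changes the key of every
    instruction after it, which mimics how a cache miss cascades in a build.
    """
    keys = []
    parent = ""
    for instruction in instructions:
        parent = hashlib.sha256(f"{parent}\n{instruction}".encode()).hexdigest()
        keys.append(parent)
    return keys


class ToyDaemon:
    """Hypothetical stand-in for a Docker daemon: it only remembers layer keys."""

    def __init__(self):
        self.cache = set()

    def build(self, instructions):
        """Return how many steps had to be executed (cache misses)."""
        executed = 0
        for key in layer_keys(instructions):
            if key not in self.cache:
                self.cache.add(key)
                executed += 1
        return executed


# Simplified instruction list, loosely based on the builder stage above.
STEPS = [
    "FROM python:3.7-alpine",
    "RUN apk add --no-cache make gcc",
    "COPY requirements.txt .",
    "RUN pip install -r requirements.txt",
]


def run_with_embedded_dind(jobs):
    """Each job gets a fresh daemon, as with the dind service created per job."""
    return [ToyDaemon().build(STEPS) for _ in range(jobs)]


def run_with_shared_dind(jobs):
    """All jobs talk to one long-lived daemon whose cache persists."""
    daemon = ToyDaemon()
    return [daemon.build(STEPS) for _ in range(jobs)]


print(run_with_embedded_dind(2))  # [4, 4]
print(run_with_shared_dind(2))    # [4, 0]
```

In this model the embedded daemon executes every step in every job, while the shared daemon executes nothing on the second run, which is the behaviour we are hoping to get from a persistent dind service.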
Creating such a service in Kubernetes is easy:

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  labels:
    app: docker-dind
  name: dind
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 500Gi
---
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app: docker-dind
  name: dind
spec:
  replicas: 1
  selector:
    matchLabels:
      app: docker-dind
  template:
    metadata:
      labels:
        app: docker-dind
    spec:
      containers:
        - image: docker:19.03-dind
          name: docker-dind
          env:
            - name: DOCKER_HOST
              value: tcp://0.0.0.0:2375
            - name: DOCKER_TLS_CERTDIR
              value: ""
          volumeMounts:
            - name: dind-data
              mountPath: /var/lib/docker/
          ports:
            - name: daemon-port
              containerPort: 2375
              protocol: TCP
          securityContext:
            privileged: true # Required for dind container to work.
      volumes:
        - name: dind-data
          persistentVolumeClaim:
            claimName: dind
---
apiVersion: v1
kind: Service
metadata:
  labels:
    app: docker-dind
  name: dind
spec:
  ports:
    - port: 2375
      protocol: TCP
      targetPort: 2375
  selector:
    app: docker-dind

Then we slightly modify our original GitLab pipeline to point to this new external service and remove the built-in dind service:

stages:
  - build
  - test
  - deploy

variables:
  # disable Docker TLS validation
  DOCKER_TLS_CERTDIR: ""
  # here the dind hostname is resolved as the Kubernetes dind service by the kube dns
  DOCKER_HOST: "tcp://dind:2375"

docker-build:
  image: docker:stable
  stage: build
  script:
    - docker build -t hello .
    - docker tag hello:latest my-registry/hello:${CI_COMMIT_SHORT_SHA}
    - docker push my-registry/hello:${CI_COMMIT_SHORT_SHA}

If you run the pipeline twice, the second time the build should take around 10 seconds, even better than our previous solution. For a “big” image taking around 10 minutes to build, this strategy also reduces the build time to a few seconds if no layers have changed.

One last option: using Kaniko

A final option may be to use Kaniko. With it, you can build Docker images without the need for a Docker daemon, making everything we saw a non-problem.
However, please note that doing so you cannot use advanced BuildKit options like, for example, injecting secrets when building your image. For this reason, it’s not the solution I retained.

Conclusion

As software development makes heavy use of containers everywhere, building them efficiently is key in your release pipeline. As we’ve seen, the problem can become quite complex and every solution has its trade-offs. The solutions proposed here are illustrated with GitLab, but keep in mind they still hold in any other containerized CI environment.

Read behind a paywall at https://medium.com/swlh/dramatically-improve-your-docker-build-time-in-gitlab-ci-db0259f1bb08

How To Improve Your Docker Build Time in GitLab CI
