Whooosh: A Comprehensive Guide for the Fastest Possible Docker Builds in Human Existence

Written by aaronbatilo | Published 2023/01/20

TL;DR: This post is written with the assumption that you already have some experience with Kubernetes and with AWS. The concepts and examples should translate fairly directly to other clouds or other managed Kubernetes offerings.

A few months ago, I migrated my newsletter to Substack, and the first post I published there was about using S3 as your remote layer cache. When that post went out, a kind user on Twitter told me about https://depot.dev. One of the founders of Depot responded to the thread, and it got me really curious about trying the platform.

I signed up and started building my containers, and I was blown away. Since they support GitHub OIDC for authentication, and since they’re a drop-in replacement for docker build, getting set up genuinely took a few minutes. I had to switch away from my bake file approach since Depot doesn’t currently support bake (though they’re interested in supporting it!), but even then, running all of my docker builds in parallel via a GitHub Actions matrix, my builds went from about 3.5 minutes down to about 50 seconds in the cold case, and down to about 15 seconds when a given container in my monorepo hadn’t changed.

I was flabbergasted. I became obsessed with understanding how the improvements could be so ridiculous. The Depot founders were not shy about sharing some of the secret sauce. I spoke with them directly, but they also explain the improvements clearly right in their documentation:

Depot is a remote container build service that makes image builds 3-14x faster than building Docker images inside generic CI providers. Docker image builds get sent to a fast builder instance with a persistent cache. The resulting image can then be downloaded locally or pushed to a registry.

The persistent cache! The SSD attached to their builder instances holds a cache that makes everything significantly faster. Having the cache right on disk means you don’t spend any time transferring cache artifacts from a remote location. Depot has a well-documented architecture for how they’ve implemented their builders: it’s isolated, secure, and managed. They support a flurry of other features, but I wanted to figure out how to simulate their setup for my own learning, and now I’m here to share it with you, along with the additional features that become available once you know you have a persistent cache.

To be clear, I’m not officially affiliated with Depot in any way whatsoever. This is not a paid post. I’m just genuinely impressed and fascinated by their product.

Who is this for?

This post is written with the assumption that you already have some experience with Kubernetes and with AWS. The concepts and examples should translate fairly directly to other clouds or other managed Kubernetes offerings.

Running Docker buildkit in Kubernetes

As it turns out, Docker buildkit already supports running in Kubernetes via its “Kubernetes driver”:

⇒  docker buildx create \
  --bootstrap \
  --name=kube \
  --driver=kubernetes \
  --driver-opt=[key=value,...]

This command uses your current Kubernetes context to create a remote buildx agent that you can use to build your containers, creating all of the relevant Kubernetes resources for you automatically. This method creates a kind: Deployment in your cluster that you can scale out. The --driver-opt values documented in the link above let you control things like the number of replicas and the CPU/memory allocations.
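For instance, here’s a hedged sketch of sizing the agent at creation time (the option names come from the buildx Kubernetes driver documentation; the values are placeholders you should tune):

⇒  docker buildx create \
  --bootstrap \
  --name=kube \
  --driver=kubernetes \
  --driver-opt=namespace=default,replicas=2,requests.cpu=2,requests.memory=4Gi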

Of course, if you want even more control, you can refer to buildkit’s list of examples in their GitHub repo. They have examples for authenticating with TLS, examples of using other deployment types in your Kubernetes cluster, etc. For this article, we’re going to break it down step by step, and we’re going to rely on Kubernetes cluster authentication for accessing the remote agents. From inspecting the resources that the docker command above creates, we can come up with the following minimal Kubernetes deployment:

---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: remote-buildkit-agent
  labels:
    app: remote-buildkit-agent
spec:
  replicas: 1
  selector:
    matchLabels:
      app: remote-buildkit-agent
  template:
    metadata:
      labels:
        app: remote-buildkit-agent
    spec:
      containers:
        - name: buildkitd
          image: moby/buildkit:buildx-stable-1
          readinessProbe:
            exec:
              command:
                - "buildctl"
                - "debug"
                - "workers"
          securityContext:
            privileged: true

That’s it. That’s the bare minimum you need to schedule a remote agent. Now we need to figure out how to connect to it so that we can start sending it work.
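Assuming you saved the manifest above as buildkit-deployment.yaml (a filename of your choosing), applying and checking on it looks something like this:

⇒  kubectl apply -f buildkit-deployment.yaml
⇒  kubectl rollout status deployment/remote-buildkit-agent
⇒  kubectl get pods -l app=remote-buildkit-agent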

As a quick aside, you’ll get a massive speed-up when pushing your docker images if you run this remote buildkit agent in the same cloud region where you store your images. I run my EKS cluster in us-west-2, and all of my ECR repositories are also configured for us-west-2. If you’d like to read more about this speed-up, I talk about it in a previous newsletter post where I saw an almost 30% improvement.

Authenticating with remote buildkit using a kubeconfig

When you run the docker buildx create command, a file containing the buildx configuration gets created on your computer. If you use the following command:

⇒  docker buildx create --name remote-buildkit-agent --bootstrap --use --driver kubernetes

Then your configuration will be created like so:

⇒  cat ~/.docker/buildx/instances/remote-buildkit-agent | jq
{
  "Name": "remote-buildkit-agent",
  "Driver": "kubernetes",
  "Nodes": [
    {
      "Name": "remote-buildkit-agent0",
      "Endpoint": "kubernetes:///remote-buildkit-agent?deployment=&kubeconfig=",
      "Platforms": null,
      "Flags": null,
      "DriverOpts": null,
      "Files": null
    }
  ],
  "Dynamic": false
}

This file will be named after the agent that you created. The file you see above is what gets created for you automatically, but we can remove most of the fields and everything will still work. All you actually need is:

⇒  cat remote-buildkit-agent | jq
{
  "Name": "remote-buildkit-agent",
  "Driver": "kubernetes",
  "Nodes": [
    {
      "Name": "remote-buildkit-agent",
      "Endpoint": "kubernetes:///remote-buildkit-agent?deployment=&kubeconfig=",
      "DriverOpts": {
        "namespace": "default"
      }
    }
  ]
}

Do take note that with this configuration, the kubeconfig field is blank, which means the docker client will look for your kubeconfig at the default location. By applying the minimal deployment file and creating this configuration file manually, you can start using the remote agent right away:

docker buildx use remote-buildkit-agent

Now every time you execute docker buildx build, the build executes remotely. Your local CPU and memory won’t be used at all. This also means that if you specify a larger CPU or memory allocation in the deployment, you’ll be able to leverage significantly more powerful build machines. That alone can speed up your docker builds, and that’s before we get to any of the caching optimizations.
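For example, a sketch of what a build against the remote agent could look like (the ECR repository name here is a placeholder):

⇒  docker buildx build \
  --push \
  -t 123456789012.dkr.ecr.us-west-2.amazonaws.com/my-app:latest \
  .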

Configuring a more generous garbage collection policy for buildkit

If you build an image a few times with this existing remote agent, you’ll probably notice that a lot of your steps get nicely cached, but then quickly stop being cached once you build another image. This is because buildkit keeps only a limited amount of space for its cache. Fortunately for us, we can raise that limit so that garbage collection happens less often, which makes the cache much more useful.

If you look at the buildkit GitHub repo, we can find documentation and examples for the default buildkit configuration. That documentation does a great job covering all of the other config options, but the sections we really care about here are the *.*.gcpolicy rules. These sections are tiered and allow for multiple levels of eviction policy. I’ll also make one quick shout-out for the max-parallelism option, which you may want to leverage when you have a large number of CPUs allocated to your buildkit agent.
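As a hedged sketch, setting that option looks something like this in buildkitd.toml (the value is arbitrary; tune it to your CPU allocation):

  [worker.oci]
    max-parallelism = 4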

Let’s take a look at one of the sections in the documented example:

  [[worker.oci.gcpolicy]]
    keepBytes = 512000000
    keepDuration = 172800
    filters = [ "type==source.local", "type==exec.cachemount", "type==source.git.checkout"]

  [[worker.oci.gcpolicy]]
    all = true
    keepBytes = 1024000000

First we see keepBytes, which is the number of bytes you want to allocate to the matching cache. 512000000 bytes, aka 512MB, is being allocated for specific types of artifacts: the local cache for source code that’s part of the docker context, cache mounts (which we’ll talk more about in the next section), and artifacts from a remote git checkout (like when you run docker buildx build against a public GitHub URL). You can also set time-based eviction; that’s what keepDuration is for. The value 172800 is in seconds, which translates to 48 hours. Evictions happen based on whichever limit is hit first.

The second gcpolicy has the all key, which makes it the fallback: 1024000000 bytes, aka roughly 1GB, of space for all other caches and cache types. If you have a large number of dependencies in your build, you can easily need more than 1GB of space.

This is the big win that will drastically speed up our builds, so let’s get into how we integrate it with our existing Kubernetes deployment. Ultimately, this consists of two changes: we need to allocate an EBS volume to store the actual cache, and we need to update the config that our remote agent uses and mount it into the buildkit container.

---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: remote-buildkit-agent
  labels:
    app: remote-buildkit-agent
spec:
  replicas: 1
  selector:
    matchLabels:
      app: remote-buildkit-agent
  template:
    metadata:
      labels:
        app: remote-buildkit-agent
    spec:
      containers:
        - name: buildkitd
          image: moby/buildkit:buildx-stable-1
          volumeMounts:
            - name: config
              mountPath: /etc/buildkit
            - name: var-lib-buildkit
              mountPath: /var/lib/buildkit
          readinessProbe:
            exec:
              command:
                - "buildctl"
                - "debug"
                - "workers"
          securityContext:
            privileged: true
      volumes:
        - name: config
          configMap:
            name: remote-buildkit-agent
        - name: var-lib-buildkit
          persistentVolumeClaim:
            claimName: remote-buildkit-agent
---
apiVersion: v1
kind: ConfigMap
metadata:
  name: remote-buildkit-agent
data:
  buildkitd.toml: |
    root = "/var/lib/buildkit"

    [worker]

    [worker.containerd]
      enabled = false

    [worker.oci]
      enabled = true
      gc = true
      gckeepstorage = 30000000000
      snapshotter = "overlayfs"

      [[worker.oci.gcpolicy]]
        filters = ["type==source.local", "type==exec.cachemount", "type==source.git.checkout"]
        keepBytes = 10240000000
        keepDuration = 604800

      [[worker.oci.gcpolicy]]
        keepBytes = 30000000000

      [[worker.oci.gcpolicy]]
        all = true
        keepBytes = 30000000000
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: remote-buildkit-agent
spec:
  resources:
    requests:
      storage: "50Gi"
  accessModes:
    - "ReadWriteOnce"

Apply all of the above YAML and, assuming you have a CSI provisioner configured on your cluster, it will allocate a 50Gi persistent volume, attach it to the buildkitd container, and create a ConfigMap with a minimally configured buildkit file that raises the garbage collection thresholds. In this example, the buildkit config file allows for a 30GB persistent cache. That means we can hold significantly more layers in the local cache, while layers that haven’t been touched still get evicted as the cache fills up.
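Once the pod is running, you can sanity-check that the volume is bound and watch the cache grow between builds. A sketch, using commands that should work against the deployment as defined above:

⇒  kubectl get pvc remote-buildkit-agent
⇒  kubectl exec deploy/remote-buildkit-agent -- df -h /var/lib/buildkit
⇒  kubectl exec deploy/remote-buildkit-agent -- buildctl du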

Leveraging mount caches with persistent disk buildkit

Newer versions of the buildkit backend support an entirely different kind of cache called the run cache mount. Anywhere you specify a RUN command in your Dockerfile, you can add a cache mount that lets you specify directories you’d like cached as type==exec.cachemount artifacts. For example:

RUN \
    --mount=type=cache,target=/var/cache/apt \
    apt-get update && apt-get install -y git

This stores the /var/cache/apt directory in its own explicit cache, instead of having the generic layer cache store the results of the apt commands.
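One caveat worth knowing if you try this with Debian- or Ubuntu-based images: the official images ship an apt config (/etc/apt/apt.conf.d/docker-clean) that deletes downloaded packages after every install, which largely defeats the cache mount. The BuildKit Dockerfile documentation suggests a variant along these lines:

RUN \
    rm -f /etc/apt/apt.conf.d/docker-clean && \
    echo 'Binary::apt::APT::Keep-Downloaded-Packages "true";' > /etc/apt/apt.conf.d/keep-cache

RUN \
    --mount=type=cache,target=/var/cache/apt,sharing=locked \
    --mount=type=cache,target=/var/lib/apt,sharing=locked \
    apt-get update && apt-get install -y git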

As opposed to the buildkit inline cache or the existing remote caches for remote registries, you won’t see as much documentation about this type of cache yet, both because it’s a new feature and because there isn’t a build provider in the world that can leverage this type of cache except for Depot. Providers like GitHub Actions don’t give you dedicated disk space that you can reuse. GitHub Actions, CircleCI, etc., all let you cache directories on the host within some limit, and there’s an open issue for buildkit to be able to specify the cache directory that gets used. At the moment, though, either you run your own remote agents or you use Depot to get a persistent disk. A run cache mount is a phenomenal speed increase if your applications can use incremental compilation of some kind.

For example, the Go toolchain introduced a package build cache in 1.10. That means that you can have a Dockerfile line like so:

RUN \
  --mount=type=cache,target=/root/.cache \
  --mount=type=cache,target=/root/go/pkg/mod \
  go test -v ./... && \
  CGO_ENABLED=0 GOOS=linux GOARCH=amd64 go install -ldflags="-w -s" ./cmd/...

If you run your tests at the same time as your build, like I outlined in my previous newsletter post, then the built test artifacts and results will also be reused through this run cache. This makes running your tests and building your actual artifacts as fast as possible, because you’ve minimized re-work.

For even more examples of using a run cache mount, I’d recommend that you check out this article by Vladislav Supalov.

Wrapping it all into a GitHub Action Workflow

Combining some of the steps from earlier in this post, you can authenticate a GitHub Action Workflow just by writing a kubeconfig and the docker configuration listed above. Normally, you would use the docker/setup-buildx-action action with the right driver to get a buildkit agent in your cluster. Unfortunately, if you have already created the agents with a custom config like ours, that action doesn’t let you connect to existing agents, so you’ll need your own version of the same commands. With EKS, I leverage the aws eks update-kubeconfig command and then write the JSON file myself. The other big key here is the docker buildx use command.

      - name: Connect to remote buildkit agent
        env:
          REMOTE_AGENT_NAME: remote-buildkit-agent
        run: |
          aws eks update-kubeconfig --name your-cluster-name
          mkdir -p ~/.docker/buildx/instances/
          cat << EOF > ~/.docker/buildx/instances/"$REMOTE_AGENT_NAME"
          {
            "Name": "$REMOTE_AGENT_NAME",
            "Driver": "kubernetes",
            "Nodes": [
              {
                "Name": "$REMOTE_AGENT_NAME",
                "Endpoint": "kubernetes:///$REMOTE_AGENT_NAME?deployment=&kubeconfig=",
                "DriverOpts": {
                  "namespace": "default"
                }
              }
            ]
          }
          EOF
          docker buildx use "$REMOTE_AGENT_NAME"

If you set the docker buildx context correctly, then using the available docker/build-push-action means you’ll automatically use your remote buildkit agent with your persistent cache. You could even use docker/bake-action with this setup, since it’s native buildkit; that’s actually one of the advantages of the self-hosted option over Depot’s offering.
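For completeness, here’s a hedged sketch of a build step that could follow the connect step above (the tag is a placeholder; the action picks up whichever builder docker buildx use selected):

      - name: Build and push
        uses: docker/build-push-action@v4
        with:
          context: .
          push: true
          tags: 123456789012.dkr.ecr.us-west-2.amazonaws.com/my-app:${{ github.sha }}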

What’s next?

The EBS persistent volume that we used in this post can only be attached to one pod replica at a time. That means we can’t scale out our build agents: even if you increase the number of replicas in the deployment, the PVC will only attach to one of them. One way to improve this is by using a Kubernetes StatefulSet, but then you’ll have multiple copies of the cache which are not guaranteed to be the same. The Kubernetes buildkit driver does have a parameter called loadbalance, which you can leverage for some amount of predictability, but not much. Something I want to experiment with in the future is using an EFS mount in EKS, so that I can attach the same filesystem to multiple replicas.
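For reference, a hedged sketch of what opting into that parameter looks like at agent creation time (the replica count is arbitrary):

⇒  docker buildx create \
  --bootstrap \
  --name=kube \
  --driver=kubernetes \
  --driver-opt=replicas=3,loadbalance=random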

Another problem with our configuration is that we currently run the buildkit agent as a privileged container. From a security perspective, it’s not ideal to grant those additional privileges. There are ways to run the buildkit agent in rootless mode, but that’s for another time.

The last big piece of missing functionality here is any configuration for doing cross-platform docker builds. For Go applications, cross-compilation is very easy, but for some application stacks, it’s significantly easier to build natively on the other platforms.

For one final piece of thoroughness: many of the problems that this custom, self-hosted configuration has are solved by the folks at Depot. Again, I’m not officially affiliated with them at all. Just a genuinely big fan of their product.


That’s all for now, folks. This ended up being quite a long post! As usual, sample files are also available on GitHub. Let me know what you think and I’ll see you next time. Please consider subscribing to get future posts, and consider sharing with your friends!



Written by aaronbatilo | Writing about software, machine learning, and cloud infrastructure experiments at https://sliceofexperiments.com