Mike Nikles

@mikenikles

CircleCI Performance Difference Between Cache and Workspace

Persist ~ 70% faster; restore ~ 80% faster, your mileage may vary

A few days ago, Andrew Stiegmann commented on a blog post of mine where I shared how we automate our release process with CircleCI. Andrew’s comment can be summarized as “Hey, is there any reason you use CircleCI cache instead of a workspace?”

I read up on a CircleCI blog post that explains the difference between a cache and a workspace. Their diagram does a great job explaining all that:

CircleCI cache vs workspace. Source: CircleCI blog post

Our CircleCI workflow consists of five jobs, each of which needs access to node_modules and a bunch of generated files in dist folders. Our “build job” as outlined in the diagram is where we install all npm packages and generate the files in the dist folders. We use a monorepo (more about that here), which results in a lot of packages hoisted to the root node_modules directory.

Migrate from a cache to a workspace

The main change here is in the .circleci/config.yml file. First, the save_cache directive needs to be replaced with persist_to_workspace. Likewise, any occurrence of restore_cache needs to be replaced with an attach_workspace directive.

In our case, that change alone improved persistence of data by about 60% (from 67 seconds writing to the cache to 27 seconds persisting to a workspace), while restoring it in a subsequent job dropped from 35 seconds with a cache to 12 seconds with a workspace (a roughly 65% performance gain).
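The swap described above can be sketched roughly as follows. This is a minimal, assumed config, not our actual file: the job names, the `cimg/node:lts` image, the `yarn build` and `yarn test` scripts, and the `packages/*/dist` layout are all placeholders for illustration.

```yaml
# Sketch only: job names, image tag, scripts and paths are assumptions.
version: 2.1

jobs:
  build:
    docker:
      - image: cimg/node:lts
    steps:
      - checkout
      - run: yarn install --frozen-lockfile
      - run: yarn build          # assumed script that generates the dist folders
      # Takes the place of the former save_cache step.
      - persist_to_workspace:
          root: .
          paths:
            - node_modules
            - packages/*/dist    # assumed monorepo layout

  test:
    docker:
      - image: cimg/node:lts
    steps:
      - checkout
      # Takes the place of the former restore_cache step.
      - attach_workspace:
          at: .
      - run: yarn test

workflows:
  build_and_test:
    jobs:
      - build
      - test:
          requires:
            - build
```

Every job that previously restored the cache would get the same `attach_workspace` step and a `requires` entry pointing at the build job, since a workspace only flows from upstream jobs to downstream ones.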

Bonus improvement — prune node_modules

So far so good, but while I was at it, I dug a bit deeper. For a while now I’ve had an eye on our node_modules directory size… 🙀 Have you ever checked yours? If so, I’m quite sure you’re with me here. If not, go ahead and check yours, then come back here — I’ll wait.

Heaviest objects in the universe — based on estimates

Alright, now that all readers understand what I mean, let’s continue. In our case, the size of node_modules is 553 MB. Does it have to be though…? Definitely not! There’s no need to have *.md files, documentation assets, tests, temporary files, etc. All these files have to be compressed and decompressed when we share node_modules across jobs on CircleCI, regardless of whether we use a cache or a workspace.

I’m aware of two options, so I implemented both and compared them to make an informed decision on which one to choose.

node-prune

TJ Holowaychuk built node-prune for one of his products and open-sourced it at https://github.com/tj/node-prune. It’s a tiny Go command that can be installed in a single line. In our case, it takes 3 seconds to install and run the command. The output of the command looks as follows:

node-prune doing its job on CircleCI

That’s 136 MB of unnecessary files dropped from the node_modules directory. That’s also 136 MB less data to be compressed and decompressed when passing data from one CircleCI job to the next.

Now, persisting data to a workspace takes 20 seconds, a roughly 70% improvement compared to the original implementation of using a cache. Restoring that data takes 7 seconds, 80% faster than what we had originally.
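For reference, the node-prune step could look something like the sketch below, placed in the build job after the install and build steps and before `persist_to_workspace`. The install command is an assumption based on what I believe the project README lists; check the README before copying it, and note that it may need write access to the install directory on your image.

```yaml
# Sketch of two extra steps for the build job; the install command is an
# assumption, verify it against the node-prune README.
steps:
  - run:
      name: Install node-prune
      command: curl -sf https://gobinaries.com/tj/node-prune | sh
  - run:
      name: Prune node_modules
      # Without arguments, node-prune works on ./node_modules.
      command: node-prune
```

Pruning before persisting is what matters here: anything removed at this point never has to be compressed, uploaded, downloaded, or decompressed by later jobs.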

yarn autoclean

I wasn’t satisfied with only one option to prune node_modules. Whenever possible, it’s a good idea to have a few extra data points.

I found yarn autoclean as documented at https://yarnpkg.com/lang/en/docs/cli/autoclean/. This command cleans up node_modules as part of the yarn install command.

Running it produced slightly slower results than node-prune. yarn install took about 20 seconds longer (because it runs autoclean), which is 17 seconds more than node-prune needs to get installed and prune the dependencies. Persisting the repository to a workspace took 23 seconds (about 65% faster than writing to a cache) and restoring the workspace was “only” about 70% faster.
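A rough sketch of the yarn autoclean variant, assuming Yarn 1.x and a `.yarnclean` file committed to the repository (created once locally with `yarn autoclean --init`). The step layout is illustrative, not a copy of our config.

```yaml
# Sketch of the build job's install steps with yarn autoclean (Yarn 1.x).
steps:
  - checkout
  # With a .yarnclean file present, yarn install runs the cleanup itself,
  # which is where the extra install time comes from.
  - run: yarn install --frozen-lockfile
  # Optional: trigger the cleanup explicitly.
  - run: yarn autoclean --force
```

The upside of this approach is that no extra binary is needed on the CI image; the trade-off, as measured above, is a slower install.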

Conclusion

This table taken from the PR I opened at work summarizes the results:

CircleCI cache vs workspace

I completely dropped the cache for now, mainly to keep the PR small and manageable. I’m still exploring whether a cache could help speed up repeated workflow runs, since a cache is shared across workflow runs while a workspace only lives within one.

In our case, we save 2.6 minutes for an entire workflow on CircleCI.

If you have similar results, or a completely different experience, I’d love to hear from you. Please leave a comment or clap if you found this interesting (those claps are encouraging to write and share more 😀).
