Facebook recently released Yarn, a new Node.js package manager built on top of the npm registry, massively reducing install times and shipping a deterministic build out of the box.
Determinism has always been a problem with npm, and solutions like npm shrinkwrap do not work well. This makes it hard to use an npm-based system across multiple developers and on continuous integration. In addition, npm is slow with complex package.json files, which causes long build times and is a serious blocker when using Docker for local development.
This article discusses how to use Yarn with Docker for Node.js development and deployment.
git clone https://github.com/mfornasa/DockerYarn.git
cd DockerYarn
./build.sh
docker run yarn-demo node -e "console.log('Hello, World')"
The first time you build the container, Yarn fetches the npm dependencies for you. After that, Yarn is executed only when you modify your package.json, and it uses the cache from previous executions. On top of that, you get determinism: the same dependency tree is installed every time and on every machine. And it’s blazing fast!
The procedure works on Mac and Linux. We are going to use the RisingStack Node.js Docker image for Node 6. Please install Yarn on your machine before proceeding.
wget https://yarnpkg.com/latest.tar.gz
Dockerfile:
FROM risingstack/alpine:3.4-v6.7.0-4.0.0
WORKDIR /opt/app
# Install yarn from the local .tgz
RUN mkdir -p /opt
ADD latest.tar.gz /opt/
RUN mv /opt/dist /opt/yarn
ENV PATH "$PATH:/opt/yarn/bin"
# Install packages using Yarn
ADD package.json /tmp/package.json
RUN cd /tmp && yarn
RUN mkdir -p /opt/app && cd /opt/app && ln -s /tmp/node_modules
This is based on a well-known trick that uses Docker layer caching to avoid reinstalling all your modules each time you build the container. This way, Yarn is executed only when you change **package.json** (and the first time, of course).
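The rule behind this trick can be mimicked outside Docker. The sketch below is only an analogy and not how Docker is implemented: it treats a saved copy of package.json as the "cache key" and reports that an install is needed only when the file differs from that copy. The function names `needs_install` and `mark_installed` and the stamp-file path are hypothetical.

```shell
#!/bin/bash
# Hypothetical helpers, for illustration only (not part of Docker or Yarn).

# Succeeds (exit 0) when <manifest> differs from the copy recorded at the
# last install, i.e. when the install step would have to run again.
needs_install() {
  local manifest="$1" stamp="$2"
  # No stamp yet: this is the first run, so an install is needed.
  [ -f "$stamp" ] || return 0
  # cmp -s is silent and exits 0 only when the contents are identical.
  ! cmp -s "$manifest" "$stamp"
}

# Record the manifest that was just installed.
mark_installed() {
  local manifest="$1" stamp="$2"
  cp "$manifest" "$stamp"
}
```

A caller would run `needs_install package.json .package.json.stamp`, install when it succeeds, and then call `mark_installed` with the same two arguments. Docker applies the same idea per layer: the `RUN cd /tmp && yarn` step is reused from cache as long as the preceding `ADD` sees unchanged content.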
package.json
yarn init
yarn add react
docker build . -t yarn-demo
docker run yarn-demo node -e "console.log('Hello, World')"
Congratulations! You’re using Yarn with Docker.
What about yarn.lock?
Yarn stores the exact version of each package and sub-package in yarn.lock, so that it can reproduce exactly the same dependency tree on each run. Both package.json and yarn.lock must be checked into source control. As we run Yarn inside the container, we need to retrieve yarn.lock from it. Luckily, it’s not hard to extract yarn.lock after each run. Simply replace the ADD line for package.json in the Dockerfile with the following:
ADD package.json yarn.lock /tmp/
and build the container using the following command:
docker build . -t yarn-demo; docker run --rm --entrypoint cat yarn-demo:latest /tmp/yarn.lock > yarn.lock
After the build, yarn.lock is copied to your working directory, and it will be reused on the next build, installing the same dependencies each time.
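The one-liner above redirects straight into yarn.lock, so a failed `docker run` would leave an empty lockfile behind. A slightly more defensive variant, as a sketch: the function name `build_and_extract_lock` is hypothetical, and the image tag is passed in as an argument rather than hard-coded.

```shell
#!/bin/bash
# Hypothetical wrapper around the build and extract commands above.
# Usage: build_and_extract_lock <image-tag>
build_and_extract_lock() {
  local image="$1" tmp
  tmp="$(mktemp)" || return 1
  # Build first; if the build fails, yarn.lock is left untouched.
  docker build . -t "$image" || { rm -f "$tmp"; return 1; }
  # Dump the lockfile from the image into a temporary file, and only
  # overwrite yarn.lock once that has succeeded.
  if docker run --rm --entrypoint cat "$image" /tmp/yarn.lock > "$tmp"; then
    cat "$tmp" > yarn.lock
    rm -f "$tmp"
  else
    rm -f "$tmp"
    return 1
  fi
}
```

With this in place, `build_and_extract_lock yarn-demo` does the same job as the command above.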
Congratulations! Now you have deterministic Yarn execution.
As you may have noticed, we are now running Yarn at each build, even if package.json has not been modified. This is because yarn.lock is copied from the container to your working directory each time, even if it has not changed, thus invalidating Docker layer caching. To solve this, we need to copy yarn.lock only if it has really changed. To do so:
build.sh file:
#!/bin/bash
docker build . -t yarn-demo
docker run --rm --entrypoint cat yarn-demo:latest /tmp/yarn.lock > /tmp/yarn.lock
if ! diff -q yarn.lock /tmp/yarn.lock > /dev/null 2>&1; then
  echo "We have a new yarn.lock"
  cp /tmp/yarn.lock yarn.lock
fi
chmod +x build.sh
./build.sh
docker run yarn-demo node -e "console.log('Hello, World')"
Congratulations! You now have a deterministic Yarn execution, and Yarn is executed only when you change **package.json**.
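The "copy only if changed" step in build.sh can be pulled out into a small reusable function. This is a minimal sketch: `update_if_changed` is a hypothetical name, and it uses `cmp -s` where the script uses `diff -q`, which is an equivalent check for this purpose.

```shell
#!/bin/bash
# Hypothetical helper: replace <dst> with <src> only when their contents differ.
# Usage: update_if_changed <src> <dst>
update_if_changed() {
  local src="$1" dst="$2"
  # cmp -s exits 0 only when both files exist and are byte-identical;
  # a missing <dst> counts as "changed".
  if cmp -s "$src" "$dst" 2>/dev/null; then
    return 0
  fi
  echo "We have a new $dst"
  cp "$src" "$dst"
}
```

In build.sh this would be called as `update_if_changed /tmp/yarn.lock yarn.lock` right after the `docker run` line. When nothing changed, the file in the working directory is not rewritten.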
Another powerful feature of Yarn is its package cache, which is stored on the local filesystem to avoid downloading packages again. Our procedure so far does not preserve the cache across container builds. This could be an issue for big package.json files.
The following build.sh solves the issue by saving the Yarn cache in your working directory.
#!/bin/bash
# Init empty cache file
if [ ! -f .yarn-cache.tgz ]; then
  echo "Init empty .yarn-cache.tgz"
  tar cvzf .yarn-cache.tgz --files-from /dev/null
fi
docker build . -t yarn-demo
docker run --rm --entrypoint cat yarn-demo:latest /tmp/yarn.lock > /tmp/yarn.lock
if ! diff -q yarn.lock /tmp/yarn.lock > /dev/null 2>&1; then
  echo "Saving Yarn cache"
  docker run --rm --entrypoint tar yarn-demo:latest czf - /root/.yarn-cache/ > .yarn-cache.tgz
  echo "Saving yarn.lock"
  cp /tmp/yarn.lock yarn.lock
fi
You also need to add this to your Dockerfile, after the ADD package.json... line:
# Copy cache contents (if any) from local machine
ADD .yarn-cache.tgz /
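This works because `ADD` unpacks a local tar archive into the target directory, and `tar` stores the cache under the relative path `root/.yarn-cache`. The round trip can be tried without Docker. The sketch below is only an approximation of what happens during the build: `pack_cache` and `unpack_cache` are hypothetical names, and the directories are stand-ins for the container filesystem.

```shell
#!/bin/bash
# Hypothetical helpers that imitate the cache round trip on plain directories.

# Pack <root>/root/.yarn-cache into <archive>, with member paths stored
# relative to <root> (so they start with root/.yarn-cache).
pack_cache() {
  local root="$1" archive="$2"
  tar czf "$archive" -C "$root" root/.yarn-cache
}

# Unpack <archive> under <root>, roughly what "ADD .yarn-cache.tgz /"
# does with / as the root.
unpack_cache() {
  local archive="$1" root="$2"
  mkdir -p "$root"
  tar xzf "$archive" -C "$root"
}
```

Packing one directory tree and unpacking it under another root leaves the cache at `<root>/root/.yarn-cache`, which corresponds to `/root/.yarn-cache` in the image.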
The cache file is not meant to be pushed to the repo, so it should be added to a .gitignore file.
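One way to do that from the shell, without adding the entry twice if the command is run again, is sketched below. The helper name `ignore_once` is hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: append <pattern> to <file> (default: .gitignore)
# unless exactly that line is already present.
ignore_once() {
  local pattern="$1" file="${2:-.gitignore}"
  # -q: quiet, -x: match the whole line, -F: fixed string (dots are literal).
  grep -qxF -- "$pattern" "$file" 2>/dev/null || echo "$pattern" >> "$file"
}
```

Running `ignore_once .yarn-cache.tgz` in the project root adds the entry on the first call and does nothing on later calls.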
Congratulations, again! You now have a deterministic Yarn execution, which runs only when you change **package.json** and uses Yarn caching. Try this with a complex package.json file from a real project; you will be amazed!