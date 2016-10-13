DevOps Lead
Facebook recently released Yarn, a new Node.js package manager built on top of the npm registry, massively reducing install times and shipping a deterministic build out of the box.
Determinism has always been a problem with npm, and solutions like npm shrinkwrap do not work well. This makes it hard to use an npm-based system across multiple developers and on continuous integration. Also, npm is slow with complex package.json files, which causes long build times and is a serious blocker when using Docker for local development.
This article discusses how to use Yarn with Docker for Node.js development and deployment.
cd DockerYarn
./build.sh
docker run yarn-demo node -e "console.log('Hello, World')"
The first time you build the container, Yarn fetches the npm dependencies for you. After that, Yarn is executed only when you modify your package.json, and it uses the cache from previous executions. On top of that, you get determinism: the same dependency tree is installed every time and on every machine. And it’s blazing fast!
The procedure works on Mac and Linux. We are going to use the RisingStack Node.js Docker image for Node 6. Please install Yarn on your machine before proceeding.
Dockerfile:
FROM risingstack/alpine:3.4-v6.7.0-4.0.0
WORKDIR /opt/app
# Install yarn from the local .tgz
RUN mkdir -p /opt
ADD latest.tar.gz /opt/
RUN mv /opt/dist /opt/yarn
ENV PATH "$PATH:/opt/yarn/bin"
# Install packages using Yarn
ADD package.json /tmp/package.json
RUN cd /tmp && yarn
RUN mkdir -p /opt/app && cd /opt/app && ln -s /tmp/node_modules
This is based on a well-known trick that uses Docker layer caching to avoid reinstalling all your modules each time you build the container. This way, Yarn is executed only when you change package.json (and the first time, of course).
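To get a feel for when the install layer will be rebuilt, the idea can be mimicked on the host: remember a checksum of package.json and compare it on the next run. This is only an analogy for the content-based check Docker applies to ADD, not Docker's implementation. The function name and the `.package.json.cksum` stamp file are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: succeeds (exit 0) when package.json differs from the
# version seen on the previous call, or when there was no previous call.
# The stamp file name is an assumption; neither Docker nor Yarn uses it.
package_json_changed() {
  manifest="${1:-package.json}"
  stamp="${2:-.package.json.cksum}"
  # cksum is POSIX; reading from stdin keeps the file name out of the output.
  current=$(cksum < "$manifest")
  if [ -f "$stamp" ] && [ "$(cat "$stamp")" = "$current" ]; then
    return 1  # unchanged: a cached install would still be valid
  fi
  printf '%s\n' "$current" > "$stamp"
  return 0    # changed or first run: the install has to run again
}
```

Calling `package_json_changed && echo "Yarn layer will be rebuilt"` before a build gives a rough hint of what to expect.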
package.json
yarn init
yarn add react
docker build . -t yarn-demo
docker run yarn-demo node -e "console.log('Hello, World')"
Congratulations! You’re using Yarn with Docker.
yarn.lock”?
Yarn stores the exact version of each package and sub-package in yarn.lock so that it can reproduce exactly the same dependency tree on each run. Both package.json and yarn.lock must be checked into source control. As we run Yarn inside the container, we need to retrieve yarn.lock from it. Luckily, it’s not hard to extract yarn.lock after each run. Simply replace the ADD line in the Dockerfile with the following:
ADD package.json yarn.lock /tmp/
and build the container using the following command:
docker build . -t yarn-demo; docker run --rm --entrypoint cat yarn-demo:latest /tmp/yarn.lock > yarn.lock
After the build, yarn.lock is copied to your working directory, and it will be reused on the next Docker build, installing the same dependencies each time.
Congratulations! Now you have deterministic Yarn execution.
That is correct: we are now running Yarn at each build, even if package.json has not been modified. This is because yarn.lock is copied from the container to your working directory each time, even if it has not changed, thus invalidating Docker layer caching. To solve this, we need to copy yarn.lock only if it has really changed. To do so, create a build.sh file:
#!/bin/bash
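# Build the image; the Yarn layer is reused unless package.json or yarn.lock changed
docker build . -t yarn-demo
# Extract the lock file produced inside the image to a temporary location
docker run --rm --entrypoint cat yarn-demo:latest /tmp/yarn.lock > /tmp/yarn.lock
# Overwrite the local yarn.lock only when its content differs
if ! diff -q yarn.lock /tmp/yarn.lock > /dev/null 2>&1; then
  echo "We have a new yarn.lock"
  cp /tmp/yarn.lock yarn.lock
fi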
chmod +x build.sh
./build.sh
docker run yarn-demo node -e "console.log('Hello, World')"
Congratulations! You now have a deterministic Yarn execution, and Yarn is executed only when you change package.json.
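The compare-then-copy step from build.sh can also be written as a small reusable function. The sketch below uses `cmp -s` instead of `diff -q`. The name `copy_if_changed` is hypothetical and not part of Yarn or Docker.

```shell
#!/bin/sh
# Hypothetical helper: copy src over dst only when their contents differ.
# Exit status 0 means a copy happened, 1 means dst was already up to date.
copy_if_changed() {
  src="$1"
  dst="$2"
  # cmp -s prints nothing; it fails when the files differ or dst is missing.
  if cmp -s "$src" "$dst" 2>/dev/null; then
    return 1  # identical: leave dst untouched
  fi
  cp "$src" "$dst"
}
```

In build.sh this would be called as `copy_if_changed /tmp/yarn.lock yarn.lock && echo "We have a new yarn.lock"`.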
Another powerful feature of Yarn is its package cache, which is stored on the local filesystem to avoid downloading packages again. Our procedure so far does not keep the cache across container builds. This could be an issue for big package.json files.
The following build.sh solves the issue by saving the Yarn cache in your working directory.
#!/bin/bash
# Init empty cache file
if [ ! -f .yarn-cache.tgz ]; then
  echo "Init empty .yarn-cache.tgz"
  tar cvzf .yarn-cache.tgz --files-from /dev/null
fi
docker build . -t yarn-demo
# Extract the lock file produced inside the image to a temporary location
docker run --rm --entrypoint cat yarn-demo:latest /tmp/yarn.lock > /tmp/yarn.lock
# A changed yarn.lock means dependencies changed, so refresh the cache archive too
if ! diff -q yarn.lock /tmp/yarn.lock > /dev/null 2>&1; then
  echo "Saving Yarn cache"
  docker run --rm --entrypoint tar yarn-demo:latest czf - /root/.yarn-cache/ > .yarn-cache.tgz
  echo "Saving yarn.lock"
  cp /tmp/yarn.lock yarn.lock
fi
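The empty archive matters because Docker's ADD fails when its source file is missing from the build context, so the very first build needs a placeholder. The sketch below isolates that step and adds a way to see how many entries the archive holds. Both function names are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper wrapping the "init empty cache" step from build.sh.
init_cache_archive() {
  archive="${1:-.yarn-cache.tgz}"
  if [ ! -f "$archive" ]; then
    # An archive built from an empty file list is valid and has no entries.
    tar czf "$archive" --files-from /dev/null
  fi
}

# Hypothetical helper: print the number of entries stored in the archive.
cache_entry_count() {
  # tr strips the padding some wc implementations add around the number.
  tar tzf "${1:-.yarn-cache.tgz}" | wc -l | tr -d ' '
}
```

Running `cache_entry_count` after a build is a quick way to check that the cache was actually saved: a count of 0 means the archive is still the empty placeholder.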
You also need to add this to your Dockerfile, after the ADD package.json... line:
# Copy cache contents (if any) from local machine
ADD .yarn-cache.tgz /
The cache file is not meant to be pushed to the repo, so it should be added to your .gitignore file.
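One way to add the entry without duplicating it on repeated runs is sketched below. The helper name `ignore_once` is hypothetical, and the sketch assumes any existing .gitignore ends with a newline.

```shell
#!/bin/sh
# Hypothetical helper: append an entry to .gitignore unless it is already there.
ignore_once() {
  entry="$1"
  file="${2:-.gitignore}"
  # -q: quiet, -x: match the whole line, -F: treat the entry as a fixed string.
  if ! grep -qxF -- "$entry" "$file" 2>/dev/null; then
    printf '%s\n' "$entry" >> "$file"
  fi
}
```

For this setup the call would be `ignore_once .yarn-cache.tgz`.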
Congratulations, again! You now have a deterministic Yarn execution, which runs only when you change package.json and uses Yarn caching. Try this with a complex package.json file from a real project; you will be amazed!
