Working Of Yarn and npm

As an Ember developer, discovering Yarn package manager by Facebook was the best that could happen to me.

Whenever one of my fellow developers encountered “Broccoli Plugin failed…” I used to say ‘rm -rf node_modules’, that’s what my seniors told me and I continued the tradition.

That was very frustrating and as a beginner, I use to blame it all on Ember as I didn’t understand much of what was happening. As time passed I realized npm was the culprit. But a few months back I discovered this Yarn package manager. When I found it, I read that it promised to solve most frequent ‘npm install’ errors. It claimed it was better because it was:

· Deterministic unlike npm.

· Network performance by parallel installations.

· Same package installation like npm (it uses the same registry).

· Yarn uses check-sums to verify the integrity of packages.

· Works even Offline for previously installed packages as it caches all the packages.

Its has gained a lot traction in recent times so its features can be found all over the internet. What you won’t find that easily on internet is how does it achieve these great features. I initially didn’t pay much attention to how did npm installations worked but as I got to learn about the yarn I got curious, so I decided to dive deep and find how did they work. Hope it helps beginners understand things better.

As a beginner in npm world, I wasn’t aware of few of following terms so I decided to add them here so that the newcomers won’t have to google every word and then jump back and forth.

1. Packages

They are a piece of software that can be downloaded from the internet to accomplish some task. They may depend on other packages.

2. Dependencies

They specify the packages your application uses. They are specified inside manifest files like package.json . Following dependencies might be represented as below:

“dependencies”: { “a” : “1.0.0”, “b” : “1.0.0” }

They use semantic versioning i.e.

i. Given a version number MAJOR.MINOR.PATCH.

ii. MAJOR version when you make incompatible API changes,

iii. MINOR version when you add functionality in a backwards-compatible manner, and.

iv. PATCH version when you make backwards-compatible bug fixes.

They follow flat namespace, meaning all the different packages are given same hierarchy unless they have a multiple versions, then it that case they’ll be assigned lower level in the namespace like for ‘s^2.0.0’ in the above example.

So lets start with an example of npm installation and then followed by yarn installation. I’ll keep the example simple but highly detailed.

Assumptions

i. This will be our first time installation of package.json

ii. There is no node_modules folder at the moment.

iii. No pre-cached packages.

There are three phases in package installation

i. Dependency resolution

Make requests to the registry and look up dependencies recursively and determine the location of package installation in the dependency folder.

ii. Fetch packages

Fetch packages in compressed format and place them in global cache.

iii. Linking Packages

Copying files from global cache to local node_modules folder.

That said lets have a look at how do above things actually take place when we do ‘npm install’.

Now consider a manifest file(package.json) having following simple hypothetical packages.

“dependencies”: { “a” : “1.0.0”, “b” : “1.0.0” }

We’ll make it more simpler by removing the minor and patch versions and just keep the major versions. That leaves us with following:

When we do ‘npm install’ for the above package.json file, npm does following :

1. Load Packages:

a. If node_modules folder is already present (i.e. you had previously used ‘npm install’ which in turn creates npm_modules folder), then uses that to make ideal tree by loading from disc.

b. Clone the existing tree to build ideal tree, which serve as our final tree for the app.

c. Using the cloned tree build ideal tree, which will be used to build package folders in node_modules folder. This can be seen in the below snapshot.

First we encounter ‘a1’ package which has another dependeny of ‘s1’, as this is the first time we saw it so we follow the flat structure and put it as the common dependency along ‘a1’ and ‘b1’. This structure allows us to reuse this package for another package which maybe dependent on ‘s1’.

Now when we encounter ‘s2’ which is version 2 of same package ‘s’ then we cannot follow flat structure as same packages cannot reside in the same directory. So we consider it as a child for ‘b1’ and a folder for ‘s2’ is created inside ‘b1’. This also tells us that when we need to reuse a package, we need to have it at the top of another package with same name but different version, which may not be reused by other packages.

2. Dependency Resolution:

a. Based on the ideal tree npm detects which packages need to be installed and which are already installed. For our case as there is no ideal, as we started installation from scratch.

b. It also resolves the versions used in sub packages and in the flat directory path.

c. At this step, npm knows which packages needs to be installed in which folders.

3. Package Fetching and Linking

a. Here, npm actually fetches the packages from npm registry and installs them in the respective folders for packages.

Now, that’s how npm did it from ages, now lets see what does our new guy Yarn has to do to make things better.

1. Dependency Resolution:

a. Yarn creates a list of package requests it needs to make for the manifest file. It first checks for the package and then it adds the required version to the list. Dependency resolution is done package by package. For this example, it considers ‘a1’.

b. Now that we have package name and its version number, Yarn goes on to npm repository (Registry) and finds for the same or higher version of the package as mentioned in the manifest file. We can mention in package.json or manifest file about the package that needs to be fetched. Following are the symbols used to indicate what version needs to be fetched.

i. ‘~’ : It fixes both the major and minor version numbers, while matching any patch number. Ex. ~2.1.0 means that fetch anything higher than 2.1.0 but less than 2.2.0.

ii. ‘^’ : It locks the major version and looks for the latest version for minor and patch versions. Ex ^2.1.0 means fetch anything higher than 2.1.0 and less than 3.0.0

iii. ‘*’ : It locks MAJOR or MINOR depending on where its used. Ex 2.* means anything less than version 3.0.0 and 2.0.* means anything less than 2.1.0

c. Now Yarn checks the local cache before making actual request to registry. If the package that needs to installed is already used by some other package then there is no need to request it again as it was already cached by yarn for later reuse.

d. The above process is carried out for evry package and then has the list of packages that needs to fetched as it knows which were previously fetched and are cached.

2. Fetching and Linking

a. Now the request for the package is made and is fetched from registry. In same fashion, Yarn fetches other packages.

Once all the required packages are fetched Yarn does the job of linking those packages as ‘npm’ did. But here linking is a bit complex as Yarn needs to link both the cached and newly fetched packages. It needs to copy the cached packages to the new path for linking.

3. Save the lock file

a. It is used to store exactly which versions of each dependency were installed . It’s similar to npm’s npm-shrinkwrap.json, however it’s not lossy and it creates reproducible results. In Rails, you have bundler for doing similar thing. It is managed by Yarn and is updated every time you add/remove/update the existing packages.

b. This file always needs to be committed as it is the sole reason Yarn is able to maintain integrity and consistency of packages across the systems. So it is very impostant who commits this file and what is committed with this file. Broken package commit might lead to installation of same broken package on every team member’s machine.

c. It also sues Checksums for every package which helps ensure that the integrity of your data is maintained.

Now that we know how Yarn works , I would like to demonstrate one of main shortcomings of npm and show you how yarn solves it.

I would like to show how installing same packages on different machines results in different node_modules folder in npm and how yarn solves this problem by maintaining a lock file.

Now suppose, Yehuda installs the package as follows using npm

Consider he decides to upgrade Package ‘a’ to ‘a2’ . When ‘npm install’ is run it produces following structure.

Now suppose Yehuda wants to share his code with the world and people start using the ‘npm install’ for the above package.

So as we can see that npm doesn’t maintain consistent structure across systems but Yarn was built to address this sole issue. Rest of the features were just an addons.

This can be solved by clearing ‘node_modules’ folder but that’s not the most heavenly way of doing it . In fact reinstalling the packages again is gonna take ages.

Shrinkwrap to rescue….. but where do I find it???

Its already there with npm but its disabled by default and is lossy. So even npm guys don’t trust it for you.

Now Yehuda discovers Yarn, so he decides to give it a try.

Also there is lock file which maintains this dependency hierarchy even when you run ‘yarn install’ . The result is consistent on every machine.

Now who ever does ‘yarn install’ , he gets the same dependencies and in exact same order, as lock file is transmitted to everyone.

Pro Tip : Never forget to commit your lock file as it serves as the single source of truth to all the users.

We know Yahuda is really smart so he decides to make integration between Ember and Yarn tighter.

Ember 2.13 is now Yarn aware meaning it generates lock file and encourages you to use Yarn.