Alex Kras

@akras14

Overview of differences between npm, yarn and pnpm

I am not an expert on any of these package managers. On the contrary, until a few days ago I didn’t realize that npm used a local cache. Unaware of that, I wrote an article titled OMG — NPM clone that finally makes sense and was called out on some of my false assumptions. That feedback forced me to take a step back and look more closely at some of the differences between the package managers.

I have been using npm full time for the past 5 years. I’ve played around with yarn when it first came out, and I learned about pnpm via the “Why should we use pnpm?” article posted about a week ago.

I’ve spent the past week reading up on npm, yarn, and pnpm and wanted to summarize and share my findings. My target audience is regular npm users, like myself, who haven’t invested time in looking into how the various npm alternatives stack up. I will only focus on the top three (for me) and will not be covering ied, npm-install, npmd, etc., because I don’t know anything about them.

It’s also important to point out that, as of the writing of this article, none of the competing tools aim to replace NPM the registry (aka the place that stores the packages). Rather, they all aim to replace the npm command line client, providing an alternative user interface and behavior with similar functionality.

NPM

npm has been around since the early days of Node.js and is one of the main reasons that Node.js itself is so successful as a project. The npm team has done a great job making sure that npm remains backward compatible and works consistently across various environments.

npm was designed around the idea of Semantic Versioning (semver), which is a pretty straightforward approach, as quoted from the semver website:

Given a version number MAJOR.MINOR.PATCH, increment the:
- MAJOR version when you make incompatible API changes,
- MINOR version when you add functionality in a backwards-compatible manner
- PATCH version when you make backwards-compatible bug fixes.

npm uses a file called package.json in which users can store all of the dependencies for the project, via the npm install --save command.

For example, running npm install --save lodash will add the following entry to the package.json file:

"dependencies": {
    "lodash": "^4.17.4"
}

Notice the ^ character added before the version number of lodash. This character tells npm to install any version of the library that has a MAJOR version of 4 and is at least 4.17.4. So if I were to run npm install a year from now, npm would install the latest version of lodash with a MAJOR value of 4, for example lodash@4.25.5 (@ is the npm convention for specifying a version together with a package name). You can see all supported characters here: https://docs.npmjs.com/misc/semver.
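
To make the caret rule a bit more concrete, here is a minimal sketch of the check in plain JavaScript. It is not npm's actual implementation: the function names are made up for illustration, and it only handles plain MAJOR.MINOR.PATCH versions with a MAJOR of 1 or higher (real npm ranges also treat 0.x versions and prerelease tags specially). A version matches when its MAJOR equals the one in the range and it is not older than the version named in the range.

```javascript
// Simplified caret check, for illustration only.
// Assumption: versions are plain MAJOR.MINOR.PATCH strings with MAJOR >= 1.
function parseVersion(version) {
  const [major, minor, patch] = version.split(".").map(Number);
  return { major, minor, patch };
}

// Negative if a is older than b, zero if equal, positive if newer.
function compareVersions(a, b) {
  return a.major - b.major || a.minor - b.minor || a.patch - b.patch;
}

function satisfiesCaret(version, range) {
  const base = parseVersion(range.replace(/^\^/, ""));
  const candidate = parseVersion(version);
  // Same MAJOR, and not older than the version written in package.json.
  return candidate.major === base.major && compareVersions(candidate, base) >= 0;
}

console.log(satisfiesCaret("4.25.5", "^4.17.4")); // true
console.log(satisfiesCaret("5.0.0", "^4.17.4")); // false
```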

The reason for this is that bumping the MINOR version should (in theory) only introduce backward compatible changes. Hence installing the latest version of the library should work AND might pull in important bug and security fixes that were released since the original version 4.17.4 was installed.

On the flip side, it could result in a situation where different developers have different versions of the same library installed on their machines, even though they are sharing the same package.json file, leading to potentially hard to debug bugs and “works on my machine” situations.

Most npm libraries rely heavily on other npm libraries. This results in nested dependencies and increases the chance of a version mismatch.

The default behavior of using ^ in front of the library version can be turned off via the npm config set save-exact true command, but this will only lock in top level dependencies. Since every required library has its own package.json file, which could have ^ in front of its dependencies, the package.json file provides no guarantees for the nested content.

To combat this concern, npm provides a shrinkwrap command. This command will generate an npm-shrinkwrap.json file, specifying the exact version to use for all libraries and all nested dependencies.

That being said, even with an npm-shrinkwrap.json file in place, npm will only lock in the library versions, not the library content. Even though npm now prevents users from re-publishing the same version of a library more than once, npm admins still reserve the right to force update some of the libraries.

Here is a quote from the shrinkwrap documentation page:

If you wish to lock down the specific bytes included in a package, for example to have 100% confidence in being able to reproduce a deployment or build, then you ought to check your dependencies into source control, or pursue some other mechanism that can verify contents rather than versions.

npm version 2 used to install all dependencies inside of each package that required them. So if we had a project that required project A, which required project B, which required project C, the tree structure for all dependencies would look as follows:

node_modules
- package-A
-- node_modules
--- package-B
----- node_modules
------ package-C
-------- some-really-really-really-long-file-name-in-package-c.js

This structure could get pretty long, which was merely an annoyance on Unix based systems but was actually breaking things on Windows, where a lot of utilities were not implemented to handle file paths longer than 260 characters.

The npm version 3 solution to this problem was to flatten the dependency tree, so our 3 package structure would now look as follows:

node_modules
- package-A
- package-B
- package-C
-- some-file-name-in-package-c.js

As a result of that change, a path to some really long file went from ./node_modules/package-A/node_modules/package-B/node_modules/package-C/some-file-name-in-package-c.js to ./node_modules/package-C/some-file-name-in-package-c.js.
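
The idea of flattening can be sketched in a few lines of JavaScript. This is a deliberately naive illustration, not npm's actual algorithm: the nestedTree object and the flatten function are hypothetical, and the sketch assumes that no two packages need conflicting versions of the same dependency (as far as I understand, npm 3 keeps a conflicting copy nested rather than hoisting it).

```javascript
// Hypothetical nested tree mirroring the npm 2 layout shown earlier.
const nestedTree = {
  "package-A": { "package-B": { "package-C": {} } },
};

// Naive flattening: walk the tree and hoist every package to the top level.
// Assumption: there are no version conflicts between packages.
function flatten(tree, flat = new Set()) {
  for (const [name, deps] of Object.entries(tree)) {
    flat.add(name);
    flatten(deps, flat);
  }
  return flat;
}

console.log([...flatten(nestedTree)]); // [ 'package-A', 'package-B', 'package-C' ]
```

Even in this toy version, the whole tree has to be visited before the final layout is known.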

You can read more on how NPM 3 dependency resolution works here.

A downside of this approach is that npm now has to go through all of the project dependencies and decide how to flatten the node_modules folder. npm is forced to build a full dependency tree for all modules used, which is a costly operation and one of the leading causes of slow npm install runs.

Since I didn’t follow the npm changes carefully, I assumed that the slowdown came from npm having to download everything from the Internet every time I ran the npm install command.

It turns out I was wrong: npm does have a local cache, where it keeps a tarball of each version of a library that it has downloaded. The content of the local cache can be seen via the npm cache ls command. Having a local cache helps to improve install times.

All in all, npm is a mature, stable and fun to use package manager.

Yarn

Yarn was announced in October 2016 and quickly rose to 24K+ stars on GitHub. For comparison, npm only has 12K+ stars. It is a project with some high profile developers, such as Sebastian McKenzie (Babel.js) and Yehuda Katz (Ember.js, Rust, Bundler, etc.).

From what I could gather, Yarn’s main initial goal was to address npm installations not being deterministic due to the semver related behavior described in the previous section. While a predictable dependency tree (if desired) can be achieved with npm shrinkwrap, it is not the default behavior and relies on all developers knowing that such an option exists and that it should be used.

Yarn took a different approach. Every yarn install generates a yarn.lock file, which is similar to npm-shrinkwrap.json but is created by default. In addition to the regular information, the yarn.lock file contains checksums for the content to be installed, ensuring that the same content, and not just the same version number, is used.
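
Here is a rough sketch of what checking content against a recorded checksum could look like in Node.js. It is not yarn's code and does not reproduce the yarn.lock format: the lockEntry object, the checksum and verify helpers, and the choice of SHA-1 are all assumptions made for illustration.

```javascript
const crypto = require("crypto");

// Hash the raw bytes of a package (SHA-1 is an illustrative choice here).
function checksum(content) {
  return crypto.createHash("sha1").update(content).digest("hex");
}

// Hypothetical package content and lock entry recorded at first install.
const published = Buffer.from("module.exports = 42;\n");
const lockEntry = {
  name: "some-package",
  version: "1.0.0",
  checksum: checksum(published),
};

// Later installs compare the downloaded bytes against the recorded checksum.
function verify(downloaded, entry) {
  return checksum(downloaded) === entry.checksum;
}

console.log(verify(published, lockEntry)); // true
console.log(verify(Buffer.from("module.exports = 43;\n"), lockEntry)); // false
```

The second call shows the case a version number alone cannot catch: same name, same version, different bytes.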

Since yarn was a fresh rewrite of the npm client, the developers were able to properly parallelize all needed operations and add some other improvements, which provided a significant speedup in overall install time. My guess is that this speedup is the main reason for yarn’s popularity.

Like npm, yarn uses a local cache. Unlike npm, yarn does not need an internet connection to install dependencies that are already cached locally, which gives it an offline mode, a feature that has been requested from npm, unsuccessfully, since 2012.

Yarn provides some other perks. For example, it can aggregate the licenses for all packages used in a project, which is nice to look at.

An interesting side note is how the attitude of the yarn documentation towards npm has changed since the yarn project became popular.

The initial yarn announcement said the following about installing yarn:

The easiest way to get started is to run:
npm install -g yarn
yarn

Here is what the Yarn installation page has to say about installing yarn now:

Note: Installation via npm is generally not recommended. npm is non-deterministic, packages are not signed, and npm does not perform any integrity checks other than a basic SHA1 hash, which is a security risk when installing system-wide apps.
For these reasons, it is highly recommended that you install Yarn through the installation method best suited to your operating system.

At this pace, I would not be surprised if yarn were to announce their own registry, allowing developers to slowly phase out npm completely.

It looks like, thanks to yarn, npm finally realized that they need to look more closely into some highly requested issues. NPM’s initial reaction to the release of Yarn read to me along the lines of “it’s cute”. Now, when I was reviewing the highly requested “offline” feature that I mentioned earlier, I noticed that a fix for it (and other related issues) is being actively worked on as we speak.

pnpm

As I mentioned, I only became aware of pnpm a short while ago via the “Why should we use pnpm?” post by Zoltan Kochan, the author of pnpm.

I am not going to go into too much detail (since this post is getting long), but you can check out my initial post for a lot more context and discussion on Twitter.

BUT

I want to point out that pnpm is so fast that it outperforms both npm and yarn.

The reason why it’s so fast? Because it uses a clever approach that leverages hardlinks and symlinks to avoid having to copy all of the locally cached source files, which is one of the biggest performance hits for yarn.
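
To show why linking is cheaper than copying, here is a small Node.js sketch that hard links a file from a pretend cache into a pretend node_modules folder. It is not how pnpm is implemented: the directory layout and file names are hypothetical, and it assumes both paths live on the same filesystem, which hard links require.

```javascript
const fs = require("fs");
const os = require("os");
const path = require("path");

// Hypothetical layout: a "store" folder standing in for the local cache
// and a project folder standing in for node_modules.
const root = fs.mkdtempSync(path.join(os.tmpdir(), "link-demo-"));
const storeFile = path.join(root, "store", "lodash.js");
const projectFile = path.join(root, "project", "node_modules", "lodash.js");

fs.mkdirSync(path.dirname(storeFile), { recursive: true });
fs.mkdirSync(path.dirname(projectFile), { recursive: true });
fs.writeFileSync(storeFile, "module.exports = {};\n");

// A hard link adds a second name for the same data on disk; nothing is copied.
fs.linkSync(storeFile, projectFile);

// Both names should point at the same inode on filesystems with hard links.
const sameFile = fs.statSync(storeFile).ino === fs.statSync(projectFile).ino;
console.log(sameFile);
```

Because only a directory entry is created, the cost does not grow with the size of the file, which is where the time savings over copying would come from.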

Using links is not easy, and comes with a list of issues to consider.

As Sebastian pointed out on Twitter, he initially considered using symlinks in yarn, but decided against it for a number of reasons.

At the same time, as indicated by its 2K+ stars on GitHub, pnpm is able to make linking work for a lot of people.

In addition, as of March 2017 it provides all of the benefits that yarn provides, including offline mode and deterministic installs.

Conclusion

I think the yarn and pnpm developers have done an amazing job. My personal preference is for deterministic installs, since I like the control and I don’t like surprises.

Whatever the outcome of this race is (which kind of reminds me of the io.js fork), I am thankful to yarn for lighting a fire under npm's feet and providing a reasonable alternative until the dust settles.

I also think it’s possible that yarn threw out the idea of hard and soft linking a bit too early. I wonder what the yarn team could do with this idea, considering how much damage a lone developer was able to do with pnpm and how highly users seem to value the speed of installs.

I do think that yarn is the safer choice overall, but pnpm might be a better choice for some use cases. For example, it could play well with a small to medium size team that runs a lot of integration tests and wants its dependencies installed as fast as possible.

Last but not least, I think that npm still provides a very useful solution that supports a wide range of use cases. Most developers can do just fine by sticking with the plain npm client.

In any case, I am thankful for all of the contenders who are working hard to keep the ecosystem healthy. When companies compete, users win.

Originally published at www.alexkras.com on May 1, 2017.
