Reproducible steps for identifying unwanted and malicious code

The state of NPM security

2018 brought us some fairly high-profile talking points about the state of NPM security. In January, David Gilbertson broke the internet (pun intended) with a plausible attack on everyone’s PII (“I’m harvesting credit card numbers and passwords from your site. Here’s how.”). The hypothetical attack centered on malicious code that was added to a package but never appeared in its source control.

In July, attackers added malicious code to eslint-scope that allowed them to steal npm tokens from other packages. Our friends over at NPM nuked all tokens before the attack could spread further.

In September, malicious code was added to event-stream via a dependency and remained undetected for months.

[Image: Photo by João Silas on Unsplash]

NPM has a few attack vectors

I hesitate to call the following vectors “security flaws” because some of them are just the way package managers work. However, most of the high-profile stories from last year targeted the same things. Let’s take a look at what they are and then see if anything can be done to mitigate their risk.

Legitimate packages have contents that aren’t in source control

An npm package is just a tarball of files. In fact, all package managers (npm, NuGet, Maven, etc.) distribute tarballs, zip files, or some other bundle of content. Any responsible developer is going to keep their code in source control; however, that code may or may not be the only thing in the package.

For compiled languages like Java or .NET, packages contain build artifacts, not source code. Especially if the build output is obfuscated, it is difficult if not impossible to casually discover security flaws in the package contents. JavaScript is a bit different in that many simple packages are simply tarballs of unmodified source code.
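Because a package is nothing more than a tarball, you can inspect what it actually contains without installing it. As a minimal sketch, assuming you have a .tgz file such as the one npm pack writes, the Node.js function below lists the regular files in the archive using only the built-in zlib module. listTarball is a hypothetical helper and a deliberately simplified ustar reader (no checksum validation, pax extended headers ignored), not a replacement for a real tar library.

```javascript
const zlib = require("zlib");

// Minimal sketch: list the regular files inside a gzipped tarball (.tgz).
// Simplifications: checksums are not validated and pax extended headers are
// skipped rather than applied, so unusually long paths may come back truncated.
function listTarball(tgz) {
  const tar = zlib.gunzipSync(tgz);
  const files = [];
  let offset = 0;
  while (offset + 512 <= tar.length) {
    const header = tar.subarray(offset, offset + 512);
    // An all-zero 512-byte block marks the end of the archive.
    if (header.every((byte) => byte === 0)) break;
    // Header fields are fixed-width and NUL-terminated.
    const field = (start, length) =>
      header.toString("utf8", start, start + length).replace(/\0.*$/s, "");
    const name = field(0, 100);
    const prefix = field(345, 155); // ustar splits long paths into prefix + name
    const size = parseInt(field(124, 12).trim() || "0", 8); // size is stored in octal
    const type = field(156, 1); // "0" (or NUL in old archives) means a regular file
    if (type === "0" || type === "") files.push(prefix ? `${prefix}/${name}` : name);
    // Skip the header plus the entry's data, which is padded to 512-byte blocks.
    offset += 512 + Math.ceil(size / 512) * 512;
  }
  return files;
}

// Example (hypothetical file name):
// const fs = require("fs");
// console.log(listTarball(fs.readFileSync("some-package-1.0.0.tgz")));
```

npm stores every file in a package tarball under a top-level package/ directory, so the listed paths will start with package/.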
Not every package is that simple, though. TypeScript requires a transpilation step, and many non-TypeScript codebases include some form of bundling or minification before an npm package is created. Security concerns are exacerbated by the fact that minified code is hard to read: the chances of someone finding a flaw without sifting through the code character by character are very low.

I cannot overemphasize how normal all of this is; however, the disparity between what’s in GitHub and what’s in the package is a great place to hide malicious code.

This is what happened in the attack on eslint-scope. The attackers gained access to the token required to publish the package, but not to the project’s GitHub account. In an oversimplified explanation, they added malicious code on their local machine and republished the package. When the dust settled, NPM unpublished the offending version, but it is important to note that the latest version of eslint-scope still does not match the repository at the git HEAD published via the registry. (I’ll explain this in detail later.)

Legitimate packages can be updated with bad dependencies

This is the basis of David Gilbertson’s hypothetical attack and the very real attack on event-stream. By targeting dependencies, attackers never had to gain access to anything but the attention of maintainers. Gilbertson (fictitiously) created a few quasi-useful packages and then sent out a blast of PRs to get them into trusted packages. Flatmap-stream was created and actually added to event-stream.

The only guard against such an attack is to meticulously review all commits that affect the dependency tree. Additionally, using a package-lock.json file helps ensure that predictable dependencies are used, which mitigates the risk of an attack on a sub-dependency. But security by code review has to be on-point for the entire dependency tree all the time, whereas attackers only have to get lucky once.
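A lock file does more than pin version numbers: package-lock.json also records an integrity hash for each resolved dependency, in Subresource Integrity form (an algorithm name, a dash, and a base64 digest, typically sha512), and npm checks downloaded tarballs against it. As a minimal sketch of why that helps, here is how a tarball could be checked against such an entry in Node.js; computeIntegrity and matchesLockEntry are hypothetical helpers for illustration, not npm APIs.

```javascript
const crypto = require("crypto");

// Build an SRI string such as "sha512-<base64 digest>" for some bytes.
function computeIntegrity(bytes, algorithm = "sha512") {
  return `${algorithm}-${crypto.createHash(algorithm).update(bytes).digest("base64")}`;
}

// Hypothetical check: does a tarball match the "integrity" field of a lock file entry?
// An SRI field may hold several space-separated hashes; any supported match passes.
function matchesLockEntry(tarball, lockEntry) {
  const candidates = String(lockEntry.integrity || "").split(/\s+/).filter(Boolean);
  return candidates.some((sri) => {
    const dash = sri.indexOf("-");
    const algorithm = sri.slice(0, dash);
    if (dash < 1 || !crypto.getHashes().includes(algorithm)) return false;
    return computeIntegrity(tarball, algorithm) === sri;
  });
}

// Usage with made-up bytes standing in for a downloaded .tgz:
const tarball = Buffer.from("pretend these are tarball bytes");
const entry = { version: "1.0.0", integrity: computeIntegrity(tarball) };
console.log(matchesLockEntry(tarball, entry)); // true
console.log(matchesLockEntry(Buffer.from("tampered bytes"), entry)); // false
```

The hash only proves that you received the same bytes that were locked; it says nothing about whether those bytes were trustworthy when the lock file was written.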
There is no canonical way to prove that a package is legitimate

This is somewhat of a combination of the previous two vectors. If NPM had known that eslint-scope contained malicious code, they could have kept it from ever being published.

Let’s say that the code in a package must always match what is in source control. Because the attackers never gained access to the GitHub account, they couldn’t change the repo, so this rule would have prevented the new package version from being published. But it would also prevent TypeScript-based projects such as RxJS from existing at all. So obviously, that isn’t a good heuristic.

It would also be nice to verify that dependencies are valid before installing them, or to flag problematic dependencies with something like npm audit.

Is there a viable solution?

Ideally, publishing an npm package would look something like this:

1. Edit code
2. Commit and push to GitHub/GitLab/Bitbucket/etc.
3. ???
4. Publish

What’s missing is an automated process that verifies that the package about to be published matches what is in source control, or at least is a deterministic result of what is in source control. Let’s take a look at what NPM offers for building such a process.

NPM pre/post scripts

I’m going to give a quick refresher about npm scripts so that we are all on the same page. You can create a “pre” or “post” version of any script, which runs before or after that script. For example, if you have a “build” script that runs the TypeScript compiler, you could also create a “prebuild” script that clears any previous files from the build output folder. Breaking apart complex scripts this way helps produce small, easy-to-read scripts.

You can also write “pre” and “post” scripts for the built-in npm scripts. For example, the “prepack,” “prepublish,” and weirdly named “prepare” scripts let you run builds or tests before creating or publishing your package.
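The pre/post convention is simple enough to model in a few lines. This is a rough sketch, not npm’s implementation, of the order in which npm run build would execute entries from a package.json “scripts” object: a matching prebuild first, then build, then postbuild, skipping whichever hooks are absent. runOrder is a hypothetical helper, and the command strings (rimraf dist, tsc) are illustrative placeholders.

```javascript
// Rough model of the order `npm run <name>` executes scripts: "pre<name>",
// "<name>", then "post<name>", skipping hooks that are not defined.
function runOrder(scripts, name) {
  const has = (key) => Object.prototype.hasOwnProperty.call(scripts, key);
  if (!has(name)) throw new Error(`Missing script: "${name}"`);
  return [`pre${name}`, name, `post${name}`].filter(has);
}

// The example from the text: clear old build output, then compile TypeScript.
// The command strings are placeholders; npm would run each one in a shell.
const scripts = {
  prebuild: "rimraf dist",
  build: "tsc",
};

console.log(runOrder(scripts, "build")); // [ 'prebuild', 'build' ]
```

Built-in commands such as npm pack follow their own fixed lifecycle (prepack, prepare, and so on), which is why a build step placed in those hooks runs automatically before a tarball is created.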
Here is the actual “prepare” script from the Redux package.json:

    "prepare": "npm run clean && npm run format:check && npm run lint && npm test"

As you can see, packaging Redux will first clean, check, lint, and test all the things. If a developer somehow snuck bugs or even bad formatting past code review, this script would prevent that change from being packaged and subsequently published. This is pretty neat!

NPM registry and source control

While npm is a command line application, all of its data comes from registry.npmjs.com. Any publicly available data about a package is provided by this API. If you want to see all of the current and historical data about Redux, just GET https://registry.npmjs.com/redux.

As you can see, the “versions” object contains data about each version, keyed by version number. At the time of writing, redux@4.0.1 is the latest stable version. There are a few interesting things to note about the version info that comes from the registry. Here is the abbreviated object so that we can focus on the relevant bits:

    "4.0.1": {
      ...
      "repository": {
        "type": "git",
        "url": "git+https://github.com/reduxjs/redux.git"
      },
      ...
      "gitHead": "c5d87d95f3b9b0ebdb57791f69b53d8507cebbed",
      ...
      "dist": {
        ...
        "shasum": "436cae6cc40fbe4727689d7c8fae44808f1bfef5",
        ...
      }
    }

The “gitHead” corresponds to the commit that was checked out on the machine from which the package was published. For a bit more context, you can view that commit on GitHub: https://github.com/reduxjs/redux/commit/c5d87d95f3b9b0ebdb57791f69b53d8507cebbed

NPM pack dry run

The “shasum” from the registry output is the SHA-1 checksum of the generated package tarball. Assuming you have the Redux repo cloned, you can check out the package’s gitHead with git checkout c5d87d95 (the first 8 characters of the SHA) and install dependencies from the lock file with npm ci.

Now we can do a dry run of the pack command, which performs the packaging process without actually generating the file.
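The three registry fields this walkthrough relies on (repository.url, gitHead, and dist.shasum) can be read with a few lines of Node.js. This is a minimal sketch under some assumptions: fetchPackument relies on the global fetch of Node 18 and later and on npm’s default registry URL, and versionInfo, githubCommitUrl, and tarballShasum are hypothetical helpers that only handle the common case of a GitHub repository. The usage at the bottom runs offline against the abbreviated Redux data shown above.

```javascript
const crypto = require("crypto");

// Fetch a package's full registry document (Node 18+ global fetch). Not called below.
async function fetchPackument(name, registry = "https://registry.npmjs.org") {
  // Scoped names like @scope/pkg are requested as @scope%2Fpkg.
  const res = await fetch(`${registry}/${name.replace("/", "%2F")}`);
  if (!res.ok) throw new Error(`Registry returned ${res.status} for ${name}`);
  return res.json();
}

// Pick out the fields the verification relies on for one version.
function versionInfo(packument, version) {
  const meta = packument.versions && packument.versions[version];
  if (!meta) throw new Error(`${packument.name}@${version} is not in the registry data`);
  return {
    repositoryUrl: meta.repository && meta.repository.url,
    gitHead: meta.gitHead, // commit checked out when the package was published
    shasum: meta.dist && meta.dist.shasum, // SHA-1 of the published tarball
  };
}

// Turn "git+https://github.com/owner/repo.git" plus a gitHead into a commit URL.
// Returns null for anything that does not look like a GitHub repository.
function githubCommitUrl(repositoryUrl, gitHead) {
  const match = /github\.com[\/:]([^\/]+)\/([^\/]+?)(?:\.git)?$/.exec(repositoryUrl || "");
  return match ? `https://github.com/${match[1]}/${match[2]}/commit/${gitHead}` : null;
}

// The registry's "shasum" is a hex SHA-1 digest of the .tgz file's bytes.
function tarballShasum(tgzBytes) {
  return crypto.createHash("sha1").update(tgzBytes).digest("hex");
}

// Offline usage with the abbreviated redux@4.0.1 registry data.
const packument = {
  name: "redux",
  versions: {
    "4.0.1": {
      repository: { type: "git", url: "git+https://github.com/reduxjs/redux.git" },
      gitHead: "c5d87d95f3b9b0ebdb57791f69b53d8507cebbed",
      dist: { shasum: "436cae6cc40fbe4727689d7c8fae44808f1bfef5" },
    },
  },
};
const info = versionInfo(packument, "4.0.1");
console.log(githubCommitUrl(info.repositoryUrl, info.gitHead));
// https://github.com/reduxjs/redux/commit/c5d87d95f3b9b0ebdb57791f69b53d8507cebbed
```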
Here is the output of npm pack --dry-run at this specific commit:

[Image: Result of packing Redux at c5d87d95]

This is what would go into the package’s tarball if we didn’t run it as a dry run. There are two really cool things going on here. First, anyone who checks out this commit on a clean repo and runs the pack command will get the EXACT SAME OUTPUT. Second, note that the shasum in the output is the EXACT SAME as the one from the registry. That proves that when Tim Dorr packed and published Redux, he performed the EXACT SAME steps we did. No undocumented manual steps, no hidden malicious code, no sneaky business.

We know this because we started with the same git HEAD and ended with the same shasum. We can look at everything in the repo, including both the code and the “prepare” script, and see that everything is above board. Armed with this knowledge, I think that we can definitely trust this version of Redux!

👍👍👍👍👍👍👍👍👍👍👍👍👍👍👍👍👍👍👍👍👍👍👍👍👍👍👍👍👍👍

RealScience™ is repeatable

Here are the steps we just took to verify Redux:

1. Get package data from the NPM registry
2. Find the repository URL
3. Find the gitHead/commit
4. Check out the commit (into a temp folder)
5. Install dependencies
6. Pack dry run
7. Compare published and re-created shasums

I created a proof-of-concept package validator called TBV (trust but verify) that automates these steps for verifying packages. The package’s repo is “shallow cloned” into a temporary folder where all of the npm operations are performed. Here is an example output when we run TBV on Redux:

Redux passes. So does Express:

Note that Express doesn’t have a “prepare” or “prepack” step, so we didn’t need to install dependencies. But not all popular packages verify.
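The seven verification steps listed above can be sketched in Node.js. This is not TBV’s code; it is a hypothetical illustration with stated assumptions and one deliberate swap: instead of parsing npm pack --dry-run output, rebuildShasum really packs inside a temporary clone and hashes the tarball itself (the SHA-1 of those bytes is what the registry calls shasum). It assumes a Unix-like system with git and npm on the PATH, a package.json at the repository root, a committed lock file for npm ci, npm printing the tarball name as the last line of stdout, and the global fetch of Node 18+. It does a full clone rather than a shallow one, for simplicity.

```javascript
const { execFileSync } = require("child_process");
const crypto = require("crypto");
const fs = require("fs");
const os = require("os");
const path = require("path");

// Run a command and return its stdout (throws if the command fails).
function run(cmd, args, cwd) {
  return execFileSync(cmd, args, { cwd, encoding: "utf8" });
}

// Step 1: get package data from the registry (Node 18+ global fetch).
async function fetchPackument(name) {
  const res = await fetch(`https://registry.npmjs.org/${name.replace("/", "%2F")}`);
  if (!res.ok) throw new Error(`Registry returned ${res.status} for ${name}`);
  return res.json();
}

// Steps 4-6: check out the commit in a temp folder, install only when a build
// hook exists, pack, and hash the resulting tarball.
function rebuildShasum(repositoryUrl, gitHead) {
  const dir = fs.mkdtempSync(path.join(os.tmpdir(), "verify-"));
  try {
    run("git", ["clone", "--quiet", repositoryUrl.replace(/^git\+/, ""), dir]);
    run("git", ["checkout", "--quiet", gitHead], dir);
    const pkg = JSON.parse(fs.readFileSync(path.join(dir, "package.json"), "utf8"));
    const scripts = pkg.scripts || {};
    if (scripts.prepare || scripts.prepack) run("npm", ["ci"], dir);
    // Assumed: npm pack prints the tarball's file name as the last line of stdout.
    const lines = run("npm", ["pack"], dir).trim().split("\n");
    const tarball = path.join(dir, lines[lines.length - 1].trim());
    return crypto.createHash("sha1").update(fs.readFileSync(tarball)).digest("hex");
  } finally {
    fs.rmSync(dir, { recursive: true, force: true });
  }
}

// Steps 1-7 wired together. Both side-effecting steps are injectable.
async function verifyPackage(name, version, deps = { fetchPackument, rebuildShasum }) {
  const packument = await deps.fetchPackument(name);
  const meta = packument.versions && packument.versions[version];
  if (!meta) throw new Error(`${name}@${version} not found`);
  const repositoryUrl = meta.repository && meta.repository.url; // step 2
  const gitHead = meta.gitHead; // step 3
  if (!repositoryUrl || !gitHead) {
    return { verified: false, reason: "registry data has no repository URL or gitHead" };
  }
  const published = meta.dist.shasum;
  const rebuilt = await deps.rebuildShasum(repositoryUrl, gitHead); // steps 4-6
  return { verified: rebuilt === published, published, rebuilt }; // step 7
}

// Live usage (network, git, and npm required):
// verifyPackage("redux", "4.0.1").then(console.log);
```

Passing fetchPackument and rebuildShasum as parameters keeps the comparison logic testable offline, with fakes standing in for the network and the file system.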
At the time of writing, lodash is the most depended-upon package on NPM; however, it doesn’t have a “prepack” step, and the published shasum doesn’t match the one generated from the corresponding version tag in GitHub:

This isn’t to say that lodash is dangerous to use; it just means that we can’t verify it with the same steps we performed on other packages. In the future, I intend to do further research on the most popular packages to see if there are ways to reduce false negatives.

Pre-publish verification

There is a similar set of steps for self-verifying prior to publishing. The intent here is to release packages into the wild that are easy to verify:

1. Ensure that your package.json specifies a repository
2. Pack dry run (local files)
3. Check out the latest local commit (into a temp folder)
4. Install dependencies
5. Pack dry run (temp folder)
6. Compare local and temp folder shasums

This process is similar to the previous verification process; however, instead of the shasum coming from the registry (we haven’t published yet), we get it from a dry run of the local code. This is then compared against what is in source control. As a result, if there are any uncommitted or unpushed changes on your local machine, the process will fail.

The “test” process indicates whether or not other developers would be able to verify your package if it were published. Because it doesn’t pull from the NPM registry at all, you can make and push as many changes as necessary to get the package to validate before publishing.

Here is an example output from testing the TBV project itself:

Proof-of-Concept to Production

The TBV tool is nothing more than a quick and dirty experiment to test the hypothesis that automated package verification is possible. And it looks like it is! But in its current form, it is nothing more than a proof-of-concept. Where do we go from here?
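Before answering that question, here is a rough sketch of the pre-publish check described above, in Node.js and hypothetical rather than TBV’s implementation. Assumptions: git and npm are on the PATH of a Unix-like system, npm is recent enough to support npm pack --pack-destination (so the project folder is not littered with tarballs), npm prints the tarball name as the last line of stdout, and the repository field holds a full git URL. Instead of reading dry-run output, it packs real tarballs into temporary folders and compares their SHA-1 hashes. Uncommitted edits change the local hash, and an unpushed commit cannot be checked out from a fresh clone of the remote, so both make the check fail.

```javascript
const { execFileSync } = require("child_process");
const crypto = require("crypto");
const fs = require("fs");
const os = require("os");
const path = require("path");

const run = (cmd, args, cwd) => execFileSync(cmd, args, { cwd, encoding: "utf8" });

// Pack `dir` into a throwaway folder and return the tarball's SHA-1 (the registry's
// "shasum"). Assumes npm prints the tarball's file name as the last stdout line.
function packAndHash(dir) {
  const out = fs.mkdtempSync(path.join(os.tmpdir(), "pack-"));
  try {
    const lines = run("npm", ["pack", "--pack-destination", out], dir).trim().split("\n");
    const file = path.join(out, path.basename(lines[lines.length - 1].trim()));
    return crypto.createHash("sha1").update(fs.readFileSync(file)).digest("hex");
  } finally {
    fs.rmSync(out, { recursive: true, force: true });
  }
}

// Steps 3-5: clone the remote repository, check out the local HEAD commit (this throws
// if that commit was never pushed), install when a build hook exists, then pack.
function cloneAndHash(repositoryUrl, commit) {
  const dir = fs.mkdtempSync(path.join(os.tmpdir(), "clone-"));
  try {
    run("git", ["clone", "--quiet", repositoryUrl.replace(/^git\+/, ""), dir]);
    run("git", ["checkout", "--quiet", commit], dir);
    const pkg = JSON.parse(fs.readFileSync(path.join(dir, "package.json"), "utf8"));
    const scripts = pkg.scripts || {};
    if (scripts.prepare || scripts.prepack) run("npm", ["ci"], dir);
    return packAndHash(dir);
  } finally {
    fs.rmSync(dir, { recursive: true, force: true });
  }
}

const defaultDeps = {
  readPackageJson: (dir) => JSON.parse(fs.readFileSync(path.join(dir, "package.json"), "utf8")),
  localHead: (dir) => run("git", ["rev-parse", "HEAD"], dir).trim(),
  packAndHash,
  cloneAndHash,
};

// Steps 1-6. Side-effecting steps are injectable so the logic can be tested offline.
function prepublishCheck(projectDir, deps = defaultDeps) {
  const pkg = deps.readPackageJson(projectDir);
  const repo = typeof pkg.repository === "string" ? pkg.repository : pkg.repository && pkg.repository.url;
  if (!repo) return { ok: false, reason: "package.json does not specify a repository" }; // step 1
  const local = deps.packAndHash(projectDir); // step 2
  let clean;
  try {
    clean = deps.cloneAndHash(repo, deps.localHead(projectDir)); // steps 3-5
  } catch (err) {
    return { ok: false, reason: `could not rebuild from source control: ${err.message}` };
  }
  return { ok: local === clean, local, clean }; // step 6
}

// Usage, from the project folder: console.log(prepublishCheck(process.cwd()));
```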
Official Verified Packages

I think that NPM (and probably all package managers) ought to have this sort of package verification baked in. Npmjs.com could display the verification status of packages to indicate which ones are much less likely to contain unpredictable or non-reproducible contents:

Twitter, please don’t sue me. You’ll benefit from this, too. I promise! :D

Verified-only accounts/packages

NPM could allow accounts to be configured such that publication of unverified packages is prohibited. This would prevent attacks like the one on eslint-scope. Hackers wouldn’t be able to update the code in GitHub, which means it would be impossible to publish a package that would validate, even if they had access to the publish token.

To flip it around, package.json could have a “verifiedOnly” flag to prevent installation of unverified dependencies or sub-dependencies.

Improve NPM audit

Currently, the npm audit command checks for known security vulnerabilities in the project’s full dependency tree. A useful addition to the current audit would be reporting how many “unverified” packages exist in the dependency tree.

At the moment, many legitimate packages do not validate per the simplistic rules I define in this post. If such a feature makes it into npm (or yarn), there would be a need for whitelisting certain known-good versions of unverified packages.

Actually catch a bad guy

The best thing that could happen is for a news story to break that someone DIDN’T introduce malicious code into a popular package. No, you’re right, such a non-story would never even hit the blogosphere in the first place. 👍

I actually learned a lot about the NPM ecosystem while researching this project! Watching high-profile security flaws hit the news machine is painful because it always brings out the NPM and NodeJS haters. I am optimistic that solutions are within reach! Let’s make 2018 the last time that something like event-stream@3.3.6 ever happens.
Let’s also make sure that David Gilbertson keeps his grimy mitts off our credit card numbers and passwords 😃.

I’m genuinely interested in your feedback on this sort of approach. Drop a comment or leave some 👏 to let me know what you think. Also, check out TBV (yes, it validates 😃) and give it some 🌟🌟🌟 if you think it earned them.

Happy coding!

EDIT: I have put a bit of work into building an experimental version of “Package Verification as a Service.” You can read about it here: https://hackernoon.com/npm-package-verification-ep-2-2b2ec66eb610