A week or so ago, some students applied this concept to the idea of typosquatting (registering malicious packages with names similar to popular libraries). By getting a university to issue a security notice, they generated some interest, which finally resulted in some changes to pypi/warehouse to address these issues.
I decided to take another look at the download figures for my packages, and see what damage my malicious alter-ego could have wreaked.
Across the 12 system module packages I’m hosting, I’m getting on average 1.5 thousand downloads per day, via pip. This adds up to 491,292 downloads so far this year. I’m hoping to hit 500k downloads before my packages are deleted!
By package, the download ratios pretty much match the numbers from May:
There’s a plan to delete my fake packages now that restrictions have been added to prevent this sort of attack, but it was fun while it lasted!
At a London Python Dojo in October last year, we discovered that PyPI allows packages to be registered with builtin module names.
So what, you might ask? Who would pip install a system package? Well, the story goes something like this:
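1. A developer knows that Python packages are installed with pip.
2. They want to use the sys module, so they run pip install sys.
3. Someone has already registered sys as a pip module, and included a malicious payload.
4. import sys in Python works, and imports a functional sys module, so nobody notices.

When we discovered this, I was pretty interested in how plausible this was as an attack vector, so I did a few things: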
raise RuntimeError("Package 'json' must not be downloaded from pypi")
It’s perfectly possible to squat on a PyPI package name and not upload any files. But by adding an empty package, I could track the downloads from the PyPI download stats.
PyPI uploads its access logs (sans identifying information) to Google BigQuery, which is pretty awesome, and allows us to get a good idea of how many systems each package ends up on.
BigQuery says that so far this year (as of 19th May 2017), my dummy packages have been downloaded ~244k times. Lucky they’re benign, huh? Otherwise that’s a quarter of a million infected machines!
Some of the downloads will be people using custom scrapers, others may be automated build jobs, running over and over, but I used some tactics to gauge the quality of this data:
- installer.name: this seems equivalent to an HTTP user agent string. By only selecting rows where installer.name is pip, we’re more likely to be counting actual installs, rather than scrapers or other bots.
- system.release: this tracks very high-level system version information (for example 4.1.13–18.26.amzn1.x86_64). By including this in the counts, we can see that lots of different types of setups are downloading these packages, suggesting it’s not just a few bots scraping the site. 3.1k different system versions have downloaded my packages this year, compared with 33k total unique versions across the whole of PyPI.

The query I used is here:
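The sketch below is not the original query, just a minimal illustration of the two tactics described above: filtering on the installer name and counting distinct system releases. It only builds a BigQuery standard SQL string; running it would need a BigQuery client and credentials. The table name, the details. prefix on the installer and system fields, and the helper name build_download_query are all assumptions based on the public PyPI downloads dataset as I understand it, so check them against the current schema before relying on them.

```python
import datetime
import re

# Assumption: the public PyPI downloads table; adjust if the dataset has moved.
DOWNLOADS_TABLE = "bigquery-public-data.pypi.file_downloads"

# Only allow characters that are valid in package names, so that names can be
# embedded in the SQL text without quoting problems.
_NAME_RE = re.compile(r"[A-Za-z0-9._-]+")


def build_download_query(projects, start, end, table=DOWNLOADS_TABLE):
    """Build a query counting pip downloads and distinct system releases.

    `projects` is a list of package names; `start` and `end` are
    datetime.date objects (inclusive). This is a hypothetical helper,
    not part of any library.
    """
    for name in projects:
        if not _NAME_RE.fullmatch(name):
            raise ValueError(f"unexpected project name: {name!r}")
    project_list = ", ".join(f"'{name}'" for name in projects)
    return f"""
SELECT
  file.project AS package,
  COUNT(*) AS downloads,
  COUNT(DISTINCT details.system.release) AS system_releases
FROM `{table}`
WHERE file.project IN ({project_list})
  AND details.installer.name = 'pip'
  AND DATE(timestamp) BETWEEN '{start.isoformat()}' AND '{end.isoformat()}'
GROUP BY package
ORDER BY downloads DESC
""".strip()


if __name__ == "__main__":
    print(build_download_query(
        ["sys", "json"],
        datetime.date(2017, 1, 1),
        datetime.date(2017, 5, 19),
    ))
```

Restricting to details.installer.name = 'pip' drops most scrapers and mirrors, and the distinct count of system releases gives a rough feel for how varied the downloading machines are.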
I never actually received a reply to my email, so a while later, in January, I raised an issue on the official PyPI GitHub issue tracker. This also got no reply.
I’m currently squatting all the system package names that seem most at risk, and doing so with benign packages, so I don’t see much of a risk in disclosing this now.
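For anyone wondering which names fall into this category, here is a minimal sketch of how to check whether a package name collides with a standard library module. It relies on sys.builtin_module_names and, on Python 3.10 or later, sys.stdlib_module_names; on older interpreters it only sees the smaller set of compiled-in modules. The function names are hypothetical, and this is not how PyPI itself enforces its restrictions.

```python
import sys


def stdlib_names():
    """Return the standard library module names known to this interpreter."""
    names = set(sys.builtin_module_names)
    # sys.stdlib_module_names only exists on Python 3.10+; older versions
    # fall back to the compiled-in modules collected above.
    names.update(getattr(sys, "stdlib_module_names", ()))
    return names


def shadows_stdlib(package_name):
    """Return True if a package name matches a standard library module name."""
    # PyPI treats names case-insensitively, so compare in lowercase.
    return package_name.lower() in stdlib_names()


if __name__ == "__main__":
    for candidate in ("sys", "json", "requests"):
        print(candidate, shadows_stdlib(candidate))
```

A check like this could sit in a build script and flag any requirement whose name matches a standard library module before pip is ever invoked.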