An update: September 2017

A week or so ago, some students applied this concept to the idea of typosquatting (registering malicious packages with names similar to popular libraries). By getting a university to issue a security notice, they generated some interest, which finally resulted in some changes to pypi/warehouse to address these issues.

I decided to take another look at the download figures for my packages, and see what damage my malicious alter-ego could have wreaked. Across the 12 system module packages I’m hosting, I’m getting on average 1.5 thousand downloads per day via pip. This adds up to 491,292 downloads so far this year. I’m hoping to hit 500k downloads before my packages are deleted!

By package, the download ratios pretty much match the numbers from May.

There’s a plan to delete my fake packages now that restrictions have been added to prevent this sort of attack, but it was fun while it lasted!

Intro

At a London Python dojo in October last year, we discovered that PyPI allows packages to be registered with builtin module names.

So what? you might ask. Who would pip install a system package? Well, the story goes something like this:

1. An inexperienced Python developer/deployer realises they need X functionality
2. Googling/asking around, they find out that to install packages, people use pip
3. Developer happily types in e.g. pip install sys
4. Baddie has registered the sys pip module, and included a malicious payload
5. Developer is now pwned by the malicious package, but import sys in Python works, and imports a functional sys module, so nobody notices.

When we discovered this, I was pretty interested in how plausible this was as an attack vector, so I did a few things:

- Emailed the PyPI contacts listed on the PyPI security page
- Proactively registered all the common system module names that I could think of, as packages
- Uploaded an empty package to each of them that does nothing other than immediately traceback:

    raise RuntimeError("Package 'json' must not be downloaded from pypi")

Why upload anything?

It’s perfectly possible to squat on a PyPI package and not upload any files. But by adding an empty package, I could track the downloads from the PyPI download stats. PyPI uploads its access logs (sans identifying information) to Google BigQuery, which is pretty awesome, and allows us to get a good idea of how many systems each package ends up on.

How effective is this attack vector?

BigQuery says that so far this year (as of 19th May 2017), my dummy packages have been downloaded ~244k times. Lucky they’re benign, huh, otherwise that’s potentially a quarter of a million infected machines!

Some of the downloads will be people using custom scrapers, others may be automated build jobs running over and over, but I used some tactics to gauge the quality of this data:

- The PyPI download logs include an installer.name column, which seems equivalent to an HTTP user agent string. By only selecting rows where installer.name is pip, we’re more likely to be counting actual installs, rather than scrapers or other bots.
- Another column, system.release, tracks very high-level system version information (for example 4.1.13-18.26.amzn1.x86_64). By including this in the counts, we can see that lots of different types of setups are downloading these packages, suggesting it’s not just a few bots scraping the site. 3.1k different system versions have downloaded my packages this year, compared with 33k total unique versions across the whole of PyPI.

The query I used is here:

What now?
I never actually received a reply to my email, so a while later, I raised an issue on the official PyPI GitHub issue tracker in January. This also got no reply.

I’m currently squatting all the system package names that seem most at risk, and doing so with benign packages, so I don’t see much of a risk of disclosing this now.
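
If you want to check your own requirements for this kind of name collision, something like the following might help. It is only a minimal sketch, not how PyPI itself enforces its restrictions. It assumes Python 3.10+, where sys.stdlib_module_names exists, and falls back to the much shorter sys.builtin_module_names list on older interpreters. The helper names stdlib_names and shadows_stdlib are made up for this illustration.

```python
import sys


def stdlib_names():
    """Return the standard library module names known to this interpreter."""
    # sys.stdlib_module_names was added in Python 3.10. Older interpreters
    # only expose the compiled-in modules via sys.builtin_module_names,
    # so the fallback is an incomplete list (an assumption of this sketch).
    names = getattr(sys, "stdlib_module_names", None)
    if names is None:
        names = sys.builtin_module_names
    return {name.lower() for name in names}


def shadows_stdlib(package_name):
    """Return True if a requested package name matches a stdlib module name."""
    # Rough normalisation: compare case-insensitively and treat hyphens
    # and dots as underscores, since module names cannot contain them.
    candidate = package_name.strip().lower().replace("-", "_").replace(".", "_")
    return candidate in stdlib_names()


if __name__ == "__main__":
    for requested in ("sys", "json", "requests"):
        verdict = "shadows a stdlib module" if shadows_stdlib(requested) else "looks fine"
        print(f"pip install {requested}: {verdict}")
```

Running a check like this over a requirements file before installing would flag the pip install sys mistake from the story above, although it does nothing about typosquatting of third-party package names.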

Building a botnet on PyPI
