How We Analyzed Crypto API Misuses in 895 GitHub Projects
by @cryptosovereignty
Too Long; Didn't Read

The methodology involved selecting and analyzing 895 Python repositories from GitHub and 51 curated MicroPython projects. Dependencies were downloaded, and source code was filtered to focus on production code and crypto usages. A comparative analysis with Java and C codebases was also conducted to understand crypto misuses across different languages.
(1) Anna-Katharina Wickert, Technische Universität Darmstadt, Darmstadt, Germany;

(2) Lars Baumgärtner, Technische Universität Darmstadt, Darmstadt, Germany;

(3) Florian Breitfelder, Technische Universität Darmstadt, Darmstadt, Germany;

(4) Mira Mezini, Technische Universität Darmstadt, Darmstadt, Germany.

Abstract and 1 Introduction

2 Background

3 Design and Implementation of Licma and 3.1 Design

3.2 Implementation

4 Methodology and 4.1 Searching and Downloading Python Apps

4.2 Comparison with Previous Studies

5 Evaluation and 5.1 GitHub Python Projects

5.2 MicroPython

6 Comparison with previous studies

7 Threats to Validity

8 Related Work

9 Conclusion, Acknowledgments, and References


To analyze Python applications, we constructed two distinct data sets of popular Python and MicroPython projects. Furthermore, we compared our findings in Python programs with previous studies about Java and C code.

4.1 Searching and Downloading Python Apps

The two data sets represent very different domains in which Python is used, ranging from server and desktop use to low-level embedded code. The following describes how we selected the projects in each data set for our empirical study.

4.1.1 Python Projects from GitHub. For our evaluation of crypto misuses in Python code, we focus on open-source code. Thus, we crawled and downloaded the top 895 Python repositories from GitHub, sorted by stars. To further understand the influence of dependencies, we downloaded the dependencies of each project with Python's standard dependency manager, pip. This resulted in 14,442 Python packages, of which 3,420 are unique.
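The distinction between all downloaded packages and unique packages can be illustrated with a minimal sketch. It assumes that the dependencies resolved for each project are available as a list of package names; the function name, the case-insensitive comparison, and the example data are assumptions made for illustration and are not taken from the study.

```python
from collections import Counter


def count_packages(deps_per_project):
    """Return (total, unique) package counts across all projects.

    deps_per_project maps a project name to the list of package names
    downloaded for it. Names are compared case-insensitively, which is
    an assumption of this sketch.
    """
    counts = Counter(
        name.lower()
        for deps in deps_per_project.values()
        for name in deps
    )
    total = sum(counts.values())   # every download, duplicates included
    unique = len(counts)           # distinct package names only
    return total, unique


# Hypothetical example data, not from the study.
example = {
    "project-a": ["requests", "cryptography"],
    "project-b": ["Requests", "pycryptodome"],
    "project-c": ["cryptography"],
}
total, unique = count_packages(example)
```

Here `total` counts every downloaded package, while `unique` collapses packages shared by several projects into one entry.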

As our analysis works on a per-file basis, we reduced our set to only those source code files that include the function calls referenced in our rules, e.g., (...) . In addition, we filter for production code and ignore test code, which should not be present when the application runs. After applying these two filter steps, we ended up with 946 source files from 155 different repositories. Unfortunately, Babelfish was unable to parse 35 of these files, and it reached the maximum recursion depth for the AST XPath queries for at least one rule in a further 50 files. These 85 failures are distributed among 61 different projects. However, for each of these projects, at least one file with a crypto usage was analyzed successfully. In total, we successfully analyzed 861 different files within 155 Python repositories with LICMA.
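The two filter steps can be sketched as follows. This is only an illustration under stated assumptions: the call patterns, the test-file heuristic, and all function names are hypothetical and do not reproduce LICMA's rule set or the filtering scripts used in the study.

```python
import re
from pathlib import PurePosixPath

# Hypothetical call patterns chosen for illustration; the actual calls
# referenced in the study's rules are not reproduced here.
CRYPTO_CALL_PATTERNS = [
    re.compile(r"\bAES\.new\s*\("),
    re.compile(r"\bPBKDF2\s*\("),
]


def uses_crypto(source):
    """Filter step 1: the file contains at least one call of interest."""
    return any(p.search(source) for p in CRYPTO_CALL_PATTERNS)


def is_test_file(path):
    """Filter step 2 (assumed heuristic): the file lives in a 'test' or
    'tests' directory, or is named test_*.py or *_test.py."""
    p = PurePosixPath(path)
    if any(part in ("test", "tests") for part in p.parts[:-1]):
        return True
    return p.name.startswith("test_") or p.stem.endswith("_test")


def select_files(files):
    """Return the sorted paths that use crypto and are not test code.

    files maps a file path to its source text.
    """
    return sorted(
        path
        for path, source in files.items()
        if uses_crypto(source) and not is_test_file(path)
    )


# Hypothetical example input.
files = {
    "app/cipher.py": "from Crypto.Cipher import AES\nc = AES.new(key, AES.MODE_ECB)\n",
    "tests/test_cipher.py": "c = AES.new(key, AES.MODE_ECB)\n",
    "app/util.py": "print('hello')\n",
}
selected = select_files(files)
```

A textual pre-filter of this kind only narrows the set of candidate files. The misuse detection itself works on the parsed AST, as described in the text above.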

4.1.2 Curated Top MicroPython Projects. As an extension to our Python application set, we crawled 51 MicroPython projects that are listed as the top announced MicroPython projects[5]. As with the regular Python applications, we downloaded all dependencies with pip and obtained 113 dependencies, including 1 duplicate. Afterwards, we applied the same filter steps as before: the usage of crypto and the exclusion of test files. These steps resulted in 5 files that appear to use the Python crypto libraries supported by LICMA. Note that we included the MicroPython crypto library ucryptolib in LICMA and in our filtering steps. To further understand this small number of potential usages, we also analyzed our data set of MicroPython applications manually. This analysis revealed that we potentially missed five crypto usages.

This paper is available on arXiv under the CC BY 4.0 DEED license.