ABSTRACT
This paper presents a framework that selectively triggers security reviews for incoming source code changes. Functioning as a review bot within a code review service, the framework can automatically request additional security reviews at pre-submit time, before the code changes are submitted to a source code repository. Because performing such secure code reviews adds cost, the framework employs a classifier trained to identify code changes with a high likelihood of vulnerabilities.
The online classifier leverages various types of input features to analyze the review patterns, track the software engineering process, and mine specific text patterns within given code changes. The classifier and its features are meticulously chosen and optimized using data from the submitted code changes and reported vulnerabilities in Android Open Source Project (AOSP). The evaluation results demonstrate that our Vulnerability Prevention (VP) framework identifies approximately 80% of the vulnerability-inducing code changes in the dataset with a precision ratio of around 98% and a false positive rate of around 1.7%.
We discuss the implications of deploying the VP framework in multi-project settings and future directions for Android security research. This paper explores and validates our approach to code change-granularity vulnerability prediction, offering a preventive technique for software security by preemptively detecting vulnerable code changes before submission.
I. INTRODUCTION
The free and open source software (FOSS) supply chains for Internet-of-Things devices (e.g., smartphones and TVs) present an attractive, economic target for security attackers (e.g., supply-chain attacks [20][21][28]). This is in part because attackers can submit seemingly innocuous code changes containing vulnerabilities without revealing their identities and motives. The submitted vulnerable code changes can then propagate quickly and quietly to end-user devices.
Targeting specific, widely used open source projects (e.g., OS kernels, libraries, browsers, or media players) can maximize the impact, as those projects typically underpin a vast array of consumer products. The fast software update cycles of those products can quickly pull in vulnerabilities from the latest patches of their upstream FOSS projects if rigorous security reviews and testing are not performed before each software update or release. As a result, those vulnerable code changes can remain undetected and thus unfixed, reaching a large number of end-user devices.
From a holistic societal perspective, the overall security testing cost can be optimized by identifying such vulnerable code changes early at pre-submit time, before those changes are submitted to upstream, open source project repositories. Otherwise, the security testing burden is multiplied across all the downstream software projects that depend on any of the upstream projects.
Downstream projects also cannot rely on the first downstream adopters to find and fix merged upstream vulnerabilities because the timeframe for such fixes and their subsequent upstreaming is unpredictable (e.g., in part due to internal policies [22]). Thus, it is desirable to prevent vulnerable code submissions in the upstream projects.
A naïve approach of requiring comprehensive security reviews for every code change causes unrealistic costs for many upstream open source project owners. This is especially true for FOSS projects receiving a high volume of code changes or requiring specialized security expertise for reviews (e.g., expertise specific to the domains). To this end, this paper presents a Vulnerability Prevention (VP) framework that automates vulnerability assessment of code changes using a machine learning (ML) classifier.
The classifier model estimates the likelihood that a given code change contains or induces at least one security vulnerability. Code changes whose estimated likelihood exceeds a threshold are flagged as likely-vulnerable (a minimal sketch follows the feature list below). The model is trained on historical data generated using a set of associated analysis tools. It uses the common features used for software defect prediction as well as four types of novel features that capture:
(1) the patch set complexity,
(2) the code review patterns,
(3) the software development lifecycle phase of each source code file, and
(4) the nature of a code change, as determined by analyzing the edited source code lines. In total, this study comprehensively examines 6 types of classifiers using over 30 types of feature data to optimize the accuracy of the ML model.
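To make the flagging step concrete, the following is a minimal sketch assuming a scikit-learn-style random forest (the classifier the later evaluation identifies as most effective); the threshold value and feature encoding are assumptions for illustration, not the paper's tuned settings.

    # A minimal sketch of the threshold-based flagging described above;
    # the threshold and feature encoding are illustrative assumptions.
    from sklearn.ensemble import RandomForestClassifier

    THRESHOLD = 0.5  # assumed cut-off for "likely-vulnerable"

    def train_model(features, labels):
        # labels: 1 = vulnerability-inducing, 0 = normal (see Section V)
        model = RandomForestClassifier(n_estimators=100, random_state=0)
        model.fit(features, labels)
        return model

    def flag_change(model, change_features):
        # Estimated likelihood that the change induces >= 1 vulnerability.
        p_vulnerable = model.predict_proba([change_features])[0][1]
        return p_vulnerable >= THRESHOLD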
To generate the training and test data, we leverage the security bugs discovered and fixed in the Android Open Source Project (AOSP)1. The study specifically targets the AOSP media project2 (i.e., for multimedia data processing), which was extensively fuzz-tested and thus revealed many security defects. A set of specialized tools is designed and developed as part of this study to:
(1) identify vulnerability-fixing change(s) associated with each target security bug, and
(2) backtrack vulnerability-inducing change(s) linked to each of the identified vulnerability-fixing changes. All the identified vulnerability-inducing changes are then manually analyzed and verified before being associated with the respective security bugs. The associated vulnerability-inducing changes are labeled as ‘1’, while all the other code changes submitted to the target media project are labeled as ‘0’ in the dataset.
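As a rough illustration of step (2), the sketch below applies an SZZ-style heuristic: blame the pre-image of each line deleted or edited by a fixing commit to recover the commits that last touched those lines. This is an illustrative approximation, not the paper's actual tooling (which Section V describes).

    # An SZZ-style sketch: backtrack candidate vulnerability-inducing
    # commits from one vulnerability-fixing commit via `git blame`.
    import subprocess

    def inducing_commits(repo, fixing_commit):
        diff = subprocess.run(
            ["git", "-C", repo, "show", "--unified=0", fixing_commit],
            capture_output=True, text=True, check=True).stdout
        candidates, path = set(), None
        for line in diff.splitlines():
            if line.startswith("--- a/"):
                path = line[len("--- a/"):]
            elif line.startswith("@@ ") and path:
                old = line.split()[1]            # pre-image range, e.g. "-12,3"
                start, _, count = old[1:].partition(",")
                count = int(count) if count else 1
                if count == 0:
                    continue                     # pure insertion: nothing to blame
                blame = subprocess.run(
                    ["git", "-C", repo, "blame", "--porcelain",
                     "-L", f"{start},+{count}",
                     f"{fixing_commit}^", "--", path],
                    capture_output=True, text=True, check=True).stdout
                for b in blame.splitlines():
                    tok = b.split(" ", 1)[0]
                    if not b.startswith("\t") and len(tok) == 40:
                        candidates.add(tok)      # header lines start with the SHA-1
        return candidates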
The N-fold evaluation using the first year of data identifies random forest as the most effective classifier based on its accuracy. The classifier identifies ~60% of the vulnerability-inducing code changes with a precision of ~85%. It also identifies ~99% of the likely-normal code changes with a precision of ~97% when using all the features for the training and testing.
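As an illustration of this evaluation protocol, a stratified N-fold loop could look like the sketch below; the fold count of 10 and the use of numpy arrays for the inputs are assumptions, not the paper's stated configuration.

    # A minimal sketch of the N-fold validation; fold count is assumed.
    import numpy as np
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.metrics import precision_score, recall_score
    from sklearn.model_selection import StratifiedKFold

    def n_fold_eval(X: np.ndarray, y: np.ndarray, n_folds: int = 10):
        precisions, recalls = [], []
        for train_idx, test_idx in StratifiedKFold(n_splits=n_folds).split(X, y):
            model = RandomForestClassifier(random_state=0)
            model.fit(X[train_idx], y[train_idx])
            pred = model.predict(X[test_idx])
            precisions.append(precision_score(y[test_idx], pred))
            recalls.append(recall_score(y[test_idx], pred))
        return np.mean(precisions), np.mean(recalls)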
The VP framework is then deployed as an online model retrained monthly on data from the previous month. When applied to about six years of the vulnerability data3, the framework demonstrates approximately 80% recall and approximately 98% precision for vulnerability-inducing changes, along with a 99.8% recall and a 98.5% precision for likely-normal changes. This accuracy surpasses the results achieved in the N-fold validation, in large part because the online deployment mode can better utilize the underlying temporal localities, causalities, and patterns within the feature data.
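Continuing the earlier sketch, the online deployment mode could be approximated as the loop below, where the model screening each month's incoming changes was trained on the previous month's labeled data; the data-loading helpers are hypothetical placeholders.

    # A sketch of the online deployment mode. `load_labeled` and
    # `load_incoming` are hypothetical helpers; train_model and
    # flag_change are from the earlier sketch.
    def run_online(months, load_labeled, load_incoming):
        flagged, model = [], None
        for month in months:
            if model is not None:  # no model before the first training month
                for change in load_incoming(month):
                    if flag_change(model, change.features):
                        flagged.append(change)  # request a security review
            X, y = load_labeled(month)  # this month's ground-truth labels
            model = train_model(X, y)   # becomes next month's screening model
        return flagged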
In summary, 7.4% of the reviewed and merged code changes are classified as vulnerability-inducing. On average, the number of likely-vulnerable changes requiring additional attention during their code reviews is around 7 per month. This manageable volume (fewer than 2 code changes per week) justifies the cost, considering the high recall (~80%) and precision (~98%) for identifying vulnerability-inducing changes.
The main contributions of this study include:
- We explore and confirm the possibility of code change-granularity vulnerability prediction, which can be used to prevent vulnerabilities by flagging likely-vulnerable code changes at pre-submit time.
- We present the Vulnerability Prevention (VP) framework that automates online assessment of software vulnerabilities using a machine learning classifier.
- We devise novel feature types to improve the classifier accuracy and reduce the feature data set by evaluating the precision and recall metrics.
- We present the specialized tools to label code changes in AOSP, facilitating robust training and testing data collection.
- We demonstrate a high precision (~98%) and recall (~80%) of the VP framework in identifying vulnerability-inducing changes, showing its potential as a practical tool to reduce security risks.
- We discuss the implications of deploying the VP framework in multi-project settings. Our analysis data suggests two focus areas for future Android security research: optimizing the Android vulnerability fixing latency and increasing efforts to prevent vulnerabilities.
The rest of this paper is organized as follows. Section II provides the background information. Section III analyzes the design requirements and presents the VP framework design. Section IV details the design of the ML model, including the classifier and features for classifying likely-vulnerable code changes. Section V describes the tools developed to collect vulnerability datasets for model training and testing.
Section VI describes the data collection process using the tools and characterizes the vulnerability issues, vulnerability-fixing changes, and vulnerability-inducing changes in an AOSP sub-project. Section VII presents the evaluation of the VP framework using an N-fold validation. Section VIII extends the framework for real-time, online classification. Section IX discusses the implications and threats to validity. Section X reviews the related works before concluding this paper in Section XI.
II. BACKGROUND
This section outlines the code review and submission process of an open source software project, using AOSP (Android Open Source Project) as a case study. AOSP is chosen considering its role as an upstream software project with significant reach, powering more than 3 billion active end-user products.
Code Change. A code change (simply, change) consists of a set of added, deleted, and/or edited source code lines for source code files in a target source code repository (e.g., git). A typical software engineer sends a code change to a code review service (e.g., Gerrit4 ) for mandatory code reviews prior to submission. A code change is attributed to an author who has an associated email address in AOSP. The change can also have one or more code reviewers. Both the author and reviewers have specific permissions within each project (e.g., project ownership status and review level).
During the code review process, a code change can undergo multiple revisions, resulting in one or more patch sets. Each patch set uploaded to the code review service represents an updated version of the code change. The final, approved patch set of the change can then be submitted and merged into the target source code repository.
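Schematically, the relationships defined above could be modeled as below; the field names are hypothetical and do not reflect Gerrit's or AOSP's actual schema.

    # A schematic data model for the terms above (hypothetical fields).
    from dataclasses import dataclass, field

    @dataclass
    class PatchSet:
        revision: int                    # 1, 2, ... as the change is revised
        changed_files: list[str] = field(default_factory=list)

    @dataclass
    class CodeChange:
        change_id: str
        author_email: str                # the attributed author
        reviewers: list[str] = field(default_factory=list)
        patch_sets: list[PatchSet] = field(default_factory=list)

        def final_patch_set(self) -> PatchSet:
            # The last (approved) patch set is what gets merged.
            return self.patch_sets[-1]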
Code Review. The code change author can revise and resend the change as a new patch set for further review or approval by the designated code reviewer(s). The key reviewer permissions include: a score of +1 to indicate that the change looks good to the reviewer, a score of +2 to approve the code change, a score of -1 to indicate that the change does not look good (e.g., due to a minor issue), and a score of -2 to block the code change submission.
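As a sketch of the default submit rule these scores imply (individual projects can configure their own rules, as noted next): any -2 vote blocks submission, and at least one +2 is required to merge.

    # A sketch of the default submit rule implied by the scores above.
    def is_submittable(votes: list[int]) -> bool:
        return -2 not in votes and +2 in votes

    assert is_submittable([+1, +2])
    assert not is_submittable([+2, -2])  # a single -2 blocks the change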
Projects (e.g., git repositories or subdirectories in a git repository) can have custom permissions and review rules. For example, a custom review rule can enable authors to mark their code changes as ready for pre-submit testing, because authors often upload non-final versions to the code review service (e.g., to inspect the diffs5 and gather preliminary feedback).