Inside the Data Pipeline Behind Classifying Android Security Flaws

V. DATA COLLECTION

This section describes how the vulnerability dataset is collected and generated for evaluating the accuracy of classifier models. The dataset consists of a list of source code changes, where each change is labeled as either a ViC or an LNC (a category that includes VfCs). This binary labeling aligns with the goal of this study: to build a classifier that accurately differentiates between ViCs and LNCs.

The data collection process (depicted in Figure 2) involves three key steps:

(1) selecting all critical vulnerabilities found in the target AOSP codebase;

(2) associating each vulnerability with its corresponding fixes, i.e., VfCs; and

(3) locating the ViC(s) for each VfC.

Selecting Target Vulnerabilities. As depicted in Figure 2, this study leverages the CVE (Common Vulnerabilities and Exposures) database, maintained by the National Cybersecurity FFRDC (NCF), to select the target vulnerabilities. Specifically, it focuses on the CVEs that are found in the target AOSP codebase (namely, AOSP CVEs) and published in the AOSP Security and Update Bulletins (ASB). This study excludes some types of CVEs to remain focused. First, CVEs discovered and fixed internally by Google during new Android dessert releases (e.g., v14) are omitted due to the lack of publicly available details. Second, vulnerabilities found in the proprietary extensions from silicon vendors, ODMs (e.g., Qualcomm), and Google Play services are not considered because they fall outside the upstream AOSP development. Third, CVEs of the upstream Linux kernel (e.g., mainline, stable, and longterm releases) are excluded, although AOSP-specific Linux kernel CVEs (e.g., ones found in the Android common kernel extensions) are included. This is because they often involve code developed by Google, silicon vendors, and ODMs, and are not strictly tied to a specific AOSP platform version.
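The selection criteria above can be read as a simple filter over CVE records. The following is a minimal sketch of that reading, not the study's actual tooling: the `CveRecord` type, its field names, and the component tags are hypothetical and introduced only for illustration.

```python
from dataclasses import dataclass


@dataclass
class CveRecord:
    # All fields are hypothetical; the source does not describe a record schema.
    cve_id: str
    in_asb: bool          # published in an AOSP Security or Update Bulletin
    internal_only: bool   # found and fixed internally, no public details
    component: str        # "aosp", "vendor", "upstream_kernel", "android_kernel"


# Components excluded by the second and third criteria in the text.
EXCLUDED_COMPONENTS = {"vendor", "upstream_kernel"}


def select_target_cves(records):
    """Keep bulletin-published CVEs that pass all three exclusion rules."""
    return [
        r for r in records
        if r.in_asb
        and not r.internal_only
        and r.component not in EXCLUDED_COMPONENTS
    ]
```

In practice the component of a CVE would have to be derived from bulletin metadata or manual triage; the tag here only stands in for that decision.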

Associating Vulnerabilities and Fixes. For each of the selected CVEs, this step locates the associated VfC(s). It begins by identifying all the relevant bug report(s) linked from a given CVE issue. We note that every target CVE issue published on the AOSP security bulletins has one or more associated bug reports stored in an issue tracking service (e.g., Google Issue Tracker, also known as Buganizer). Conversely, multiple CVEs can sometimes share the same bug report if their fixes are identical or closely related.

Bug reports offer valuable insights into the vulnerability fixing process (e.g., key discussions held while reproducing or fixing the issue). In the vast majority of cases, bug reports contain information about all or a subset of their VfCs. The link is explicit when a VfC lists a bug report ID in its code change description (e.g., a Bug: or Fixes: line in the gerrit change description), because its submission event is then posted on the bug report.
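To illustrate the explicit case, the sketch below pulls bug IDs out of a change description by looking for Bug: and Fixes: lines. It assumes that bug IDs are purely numeric and separated by commas or whitespace; the function name `extract_bug_ids` is hypothetical and is not part of the tooling described in the paper.

```python
import re

# Matches lines such as "Bug: 123" or "Fixes: 456, 789" in a description.
BUG_LINE = re.compile(r"^(?:Bug|Fixes):\s*(.+)$", re.IGNORECASE | re.MULTILINE)


def extract_bug_ids(description: str) -> list[str]:
    """Return the numeric bug IDs referenced in a change description."""
    bug_ids: list[str] = []
    for match in BUG_LINE.finditer(description):
        for token in re.split(r"[,\s]+", match.group(1)):
            # Assumption: only all-digit tokens are treated as bug IDs.
            if re.fullmatch(r"\d+", token) and token not in bug_ids:
                bug_ids.append(token)
    return bug_ids
```

Tokens that are not all digits (for example, a commit hash after Fixes:) are ignored under this assumption, so such references would still need manual review.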

Our BugID2GerritID script automates the process of finding VfCs. It takes a list of bug IDs as input, scans the content of those bug reports, and returns any posted change IDs. Because code changes can be cherry-picked to other branches, a single change can exist across multiple branches. At this stage, the script does not yet differentiate between original changes and cherry-picks, gathering the change IDs (i.e., gerrit IDs) of all relevant changes.

While VfCs for CVEs or other important security issues usually reference their bug report in their gerrit description, this is not always the case in practice, depending on the development protocol used. If the script finds no gerrit ID, a manual review is triggered for each such bug report to find the associated, implicit VfCs. In rare cases, a bug report has no VfC at all because the externally known issue does not exist in the internal repository (e.g., it was already resolved). Occasionally, such manual analyses reveal relevant gerrit changes or commits (e.g., URLs) linked to the VfCs. In those cases, the GerritID2ChangeIDandCommitHash script is used to extract the specific VfC IDs and commit hashes from the gerrit IDs. Importantly, commits sharing the same change ID indicate cherry-picks of the original change.
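The observation that commits with the same change ID are cherry-picks of one change suggests grouping commits by change ID. The sketch below shows one way to do that. The `Commit` type and its fields are hypothetical, and treating the earliest commit in each group as the original is an assumption made here, not a rule stated in the paper.

```python
from collections import defaultdict
from typing import NamedTuple


class Commit(NamedTuple):
    # Hypothetical record; field names are illustrative only.
    commit_hash: str
    change_id: str
    branch: str
    committed_at: int  # Unix timestamp


def group_by_change_id(commits):
    """Map each change ID to its commits, ordered oldest first."""
    groups = defaultdict(list)
    for commit in commits:
        groups[commit.change_id].append(commit)
    return {
        change_id: sorted(group, key=lambda c: c.committed_at)
        for change_id, group in groups.items()
    }


def split_original(commits_for_change):
    """Assumption: the earliest commit is the original, the rest cherry-picks."""
    original, *cherry_picks = commits_for_change
    return original, cherry_picks
```

Grouping first and deciding on the original later keeps the collection step simple, which matches the text's note that originals and cherry-picks are not differentiated at the gathering stage.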

Locating Vulnerability-inducing Changes (ViCs) for each Vulnerability-fixing Change (VfC). The primary objective of Vulnerability Prevention (VP) is to maximize the accurate identification of ViCs. However, the two preceding steps only identify CVEs and VfCs. Thus, we introduce a technique that enables the identification of ViCs from a given VfC. The identified ViCs undergo manual analysis to remove irrelevant code changes, resulting in a refined ViC set used to evaluate VP classifiers and features.

Table I presents the algorithm for finding ViCs. It first identifies all the changed lines (i.e., additions and deletions) by running the git show command and parsing its output. For each of the identified changed source code lines, our Blame script filters out extraneous lines (e.g., empty lines, headers, and comments) in order to retain only the relevant vulnerability-fixing lines (VfLs). For a deleted line or a sequence of deleted lines, the script checks when each deleted line was added or last modified. The emphasis on the addition and last modification helps pinpoint potential ViCs because those code changes could, at the very least, have addressed the vulnerability but failed to do so. We note that automatically and accurately determining whether a target vulnerability originates from the last modification or from prior changes (if such changes exist) remains a challenge. Thus, this study relies on manual reviews for such cases.
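The first part of this procedure, extracting changed lines from git show output and dropping extraneous ones, can be sketched as follows. This is an illustration under stated assumptions rather than the paper's Blame script: the function names are hypothetical, the input is assumed to be unified diff text, and the comment and header prefixes are a rough heuristic for C, C++, and Java sources.

```python
import re

# Unified diff hunk header, e.g. "@@ -10,4 +10,6 @@".
HUNK_HEADER = re.compile(r"^@@ -(\d+)(?:,\d+)? \+(\d+)(?:,\d+)? @@")


def is_relevant(code: str) -> bool:
    """Heuristic filter: drop empty lines, comments, and include/import headers."""
    stripped = code.strip()
    if not stripped:
        return False
    return not stripped.startswith(("//", "/*", "*", "#include", "import "))


def changed_lines(diff_text: str):
    """Return (deleted, added) lists of (path, line_number, text) tuples."""
    deleted, added = [], []
    old_path = new_path = None
    old_no = new_no = None  # None while outside a hunk
    for line in diff_text.splitlines():
        if line.startswith("diff --git"):
            old_no = new_no = None
            continue
        hunk = HUNK_HEADER.match(line)
        if hunk:
            old_no, new_no = int(hunk.group(1)), int(hunk.group(2))
            continue
        if old_no is None:
            # Commit message or per-file header; remember the file paths.
            if line.startswith("--- "):
                old_path = line[4:].removeprefix("a/")
            elif line.startswith("+++ "):
                new_path = line[4:].removeprefix("b/")
            continue
        if line.startswith("-"):
            if is_relevant(line[1:]):
                deleted.append((old_path, old_no, line[1:]))
            old_no += 1
        elif line.startswith("+"):
            if is_relevant(line[1:]):
                added.append((new_path, new_no, line[1:]))
            new_no += 1
        elif not line.startswith("\\"):
            # Context line: advances both the old and new line counters.
            old_no += 1
            new_no += 1
    return deleted, added
```

Each retained deleted line carries its line number in the pre-fix file, so one plausible next step is to run git blame on the parent of the VfC for that path and line (for example, with the -L option) to find the change that added or last modified it.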

When deleted lines are replaced by newly added lines, the added lines are typically more complex (e.g., in terms of the number of lines) because they implement tailored error checking rules and error handling routines that can prevent the corresponding vulnerability at runtime. The tool does not classify such a case as a modification because it is challenging to determine whether it is a sequence of deletions and additions or a true modification.

For an added line or consecutively added lines in VfLs, our Blame script analyzes when the next valid line was last modified. Here, a next valid line means, for example, a line that is neither an empty line nor a comment. This targets the common case where an error checking routine is added right before a checked variable is used. By examining the addition or last modification time of the subsequent line, the tool identifies potential ViCs where the initial error checks for those variable(s) might have been missed.
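The next-valid-line rule can be sketched as a scan over the post-fix file. The code below is a minimal illustration, assuming 1-based line numbers and a simple comment heuristic for C-like languages. The function names are hypothetical, and skipping candidate lines that were themselves added by the fix is an assumption made here so that the result points at pre-existing code.

```python
def is_valid_line(text: str) -> bool:
    """A valid line is neither empty nor (heuristically) a comment."""
    stripped = text.strip()
    return bool(stripped) and not stripped.startswith(("//", "/*", "*"))


def next_valid_lines(file_lines, added_line_numbers):
    """For each run of consecutive added lines, find the next valid line.

    file_lines holds the post-fix file content; line numbers are 1-based.
    """
    added = set(added_line_numbers)
    targets = []
    for number in sorted(added):
        if number + 1 in added:
            continue  # not the last line of its run
        candidate = number + 1
        while candidate <= len(file_lines):
            # Assumption: lines added by the fix itself are skipped.
            if candidate not in added and is_valid_line(file_lines[candidate - 1]):
                if candidate not in targets:
                    targets.append(candidate)
                break
            candidate += 1
    return targets
```

The returned line numbers refer to lines that the fix did not add, so blaming them at the VfC revision would point at an earlier change, which is then a ViC candidate for manual review.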

If multiple ViCs are identified for a single VfC, the script lists them all. While the analysis done in this study mostly relies on such script-based automated techniques for locating ViCs, sometimes valuable insights for locating ViC(s) are found in the discussions posted on the bug reports or from the descriptions of VfCs. The tools and their algorithms are continuously refined through an iterative validation process of the discovered ViCs.

Author:

Keun Soo Yim

This paper is available on arXiv under the CC BY 4.0 Deed (Attribution 4.0 International) license.