IX. RELATED WORK

This section reviews related work.

Main Focus of System Security Research. Historically, much security engineering effort focused on identifying previously unknown types of vulnerabilities and designing mitigation techniques for them. Consequently, system security research often emphasized the development of innovative attack and defense techniques [39][40]. To this end, those earlier security works employed a range of security testing, validation, and verification techniques [45][46][47][48], including stress testing, instrumentation, fuzzing, static analysis, dynamic analysis, and model checking.

Emerging Software Supply Chain (SSC) Attacks.
SSC attacks that have surfaced recently still leverage known types of vulnerabilities (i.e., they are not entirely novel approaches). Their primary focus lies in strategically infiltrating critical SSCs (e.g., by submitting vulnerable code changes) to reach a vast number of end-user devices. Reference [21] outlines 107 unique SSC attack vectors used in 94 real-world attacks or identified vulnerabilities.

Notable examples include Heartbleed [29], Log4Shell [29], and SolarWinds [31]. In some cases, developers are directly targeted because their compromised credentials enable attackers to submit malicious code changes. This is exemplified by the CircleCI incident, where a developer laptop was compromised to steal two-factor authentication (2FA)-backed single sign-on sessions [33], and the LastPass incident, where keys were stolen via keylogger malware [34][35].

Existing SSC Mitigation Techniques. Reference [21] identified 33 detective or preventive techniques (e.g., production branch protection [24][25], unused dependency removal [26], version pinning [27], and open source vulnerability scanner integration with CI/CD [23]). Those techniques focus on securing the build process and downstream merging (e.g., what is known as build reproducibility and bootstrappability [36]).

Recognizing the increasing threat of developer credential theft, GitHub now mandates 2FA for developer accounts associated with critical projects. This suggests that more widespread industry participation is crucial [32]. Notably, [20] emphasizes the need for community-driven efforts to empower stakeholders in securing the SSC. That includes addressing human factors to reduce developer overwhelm and systematically decreasing the attack surface for individual software developers. Such efforts could involve establishing usable communication channels across projects and encompassing tools within the build process and CI/CD.
However, such efforts place less emphasis on protecting upstream code check-ins or on improving the credibility or security coverage of other vital upstream development activities.

Faulty Module Prediction. The concept of software fault prediction is similar to the concept of likely-vulnerable code change prediction presented in this study. Software fault prediction has been studied extensively [1][5], with a focus on identifying modules likely to contain defects. This aligns with the objective of this work, which is to predict the potential for vulnerabilities within code changes.

This study nevertheless differs significantly from those previous works. Most prior research concentrated on predicting general software defects rather than pinpointing vulnerabilities. Furthermore, their prediction granularity typically targeted software modules or subsystems. Thus, those techniques are used periodically to select testing targets and thereby optimize the allocation of limited testing resources. This remains evident in [10], which specifically focuses on vulnerabilities but predicts the most vulnerable modules. In contrast, this paper offers a distinct advantage by providing an online classification for every code change.

Various existing works focused on optimizing features, classifiers, and filters for defect prediction. Past studies achieved an average detection probability of 71% with a false alarm rate of 25% [1], which was deemed acceptable. Other techniques explored simple heuristics instead of supervised learning, such as [7], which tracks recently modified files, previously buggy files, and their nearby files. Despite continuous efforts to improve prediction techniques [8], attempts to deploy such techniques in commercial software development environments [9] raised skepticism.
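
To make the two figures reported in [1] concrete, the following minimal Python sketch computes a detection probability and a false alarm rate from confusion-matrix counts. It assumes the standard definitions (detection probability as recall, false alarm rate as the fraction of non-defective modules that are flagged). The class name, function names, and counts are hypothetical and are chosen only so that the arithmetic reproduces 71% and 25%; they are not data from [1].

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Outcome counts for a binary defect predictor (hypothetical container)."""
    true_positives: int   # defective modules that were flagged
    false_negatives: int  # defective modules that were missed
    false_positives: int  # clean modules that were flagged
    true_negatives: int   # clean modules that were left unflagged


def detection_probability(c: ConfusionCounts) -> float:
    """Fraction of actually defective modules that the predictor flagged."""
    actual_defective = c.true_positives + c.false_negatives
    if actual_defective == 0:
        raise ValueError("no defective modules in the evaluation set")
    return c.true_positives / actual_defective


def false_alarm_rate(c: ConfusionCounts) -> float:
    """Fraction of actually clean modules that the predictor flagged."""
    actual_clean = c.false_positives + c.true_negatives
    if actual_clean == 0:
        raise ValueError("no clean modules in the evaluation set")
    return c.false_positives / actual_clean


# Illustrative counts only (an assumption, not taken from [1]):
# 100 defective modules and 400 clean modules.
counts = ConfusionCounts(
    true_positives=71,
    false_negatives=29,
    false_positives=100,
    true_negatives=300,
)

print(f"detection probability = {detection_probability(counts):.2f}")  # 71 / 100
print(f"false alarm rate      = {false_alarm_rate(counts):.2f}")       # 100 / 400
```

Rates of this kind summarize offline prediction quality; they do not by themselves show how readily a predictor can be deployed in a commercial development environment.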
Deployment may have been difficult in part because those techniques were neither originally designed for online prediction nor targeted at high-priority problems (e.g., high-severity security issues).

The long-term accuracy of a defect prediction system (e.g., over multiple years) is significantly influenced by defect triggers. For instance, a high number of bugs found in a file could indicate either frequent code changes that continuously introduce new bugs (assuming consistent triggers) or increased testing efforts that uncover existing bugs over a short period. In the latter case, the buggy modules would be less likely to exhibit new bugs in the near future than in the former case. Therefore, to accurately predict the likelihood of a code change introducing new security bugs, it is crucial to capture and analyze both development progress and code change statistics. Such aspects, however, were not the main focus of many previous defect prediction works.

Vulnerability Prevention Techniques. Our VP approach shares a common goal with other approaches that focus on enhancing the code review process or providing secure coding education to software engineers. Techniques exist to improve code reviews by identifying syntax errors, common bugs (e.g., through linting), and coding style issues. Others aim to ensure comprehensive unit testing and appropriate reviewer assignments. The underlying motivation of VP aligns with those efforts: to proactively prevent the introduction of vulnerabilities during the development phase.

Vulnerability prevention can also be achieved at the programming language level. Type-safe languages (e.g., Java, which is used in diverse areas including Android framework and app development) offer inherent protection against common vulnerabilities, such as buffer overflows, integer overflows, and format string vulnerabilities.
However, despite efforts to design safer languages like Go, native C/C++ and assembly code remain prevalent, especially in mobile platforms and Internet-of-Things system software (e.g., AOSP). This prevalence leaves those codebases exposed to a wide range of common software vulnerabilities.

Software Bug Characterization. Some previous works have sought to characterize software defects. Those efforts use a variety of methods, including code inspection, analysis of previously found bugs, static analysis, and long-term project history analysis. A notable example is [6], a quantitative study based on a static analysis of Linux kernel commit history data. Additionally, other works employed software fault injectors that emulate common software defects (e.g., using ODC-based models [15]) to study the resulting consequences [16].

While static analysis can be useful for identifying certain types of known bugs, it faces limitations in detecting the wide range of realistic security bugs. The high false positive rate and binary decision output (warning vs. no warning) make it challenging to apply static analysis at the granular level of code changes. If the baseline codebase already contains numerous warnings, it becomes difficult to determine whether new warnings resulting from a given code change are actually caused by that code change.

Author: Keun Soo Yim

This paper is available on arXiv under the CC BY 4.0 Deed (Attribution 4.0 International) license.