Table of Links
Abstract
I. Introduction
II. Background
III. Design (Definitions; Design Goals; Framework; Extensions)
IV. Modeling (Classifiers; Features)
V. Data Collection
VI. Characterization (Vulnerability Fixing Latency; Analysis of Vulnerability-Fixing Changes; Analysis of Vulnerability-Inducing Changes)
VII. Result (N-Fold Validation; Evaluation Using Online Deployment Mode)
VIII. Discussion (Implications on Multi-Projects; Implications on Android Security Works; Threats to Validity; Alternative Approaches)
IX. Related Work
Conclusion and References

III. DESIGN

This section outlines the design of the VP (Vulnerability Prevention) framework, based on an analysis of its essential design requirements.

A. DEFINITIONS

We first define the secure code review points and a taxonomy for classifying code changes in this study.

Secure Code Review Points. Three types of events can be used to automatically trigger our classifier: (1) a code change is initially sent for code review (or marked as ready for review or pre-submit testing); (2) a new patch set is sent; and (3) a code change is submitted.
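As a minimal sketch, the three trigger points and the optional refinement conditions could be modeled as follows (the enum and function names are our own illustrations, not part of the paper's implementation):

```python
from enum import Enum, auto

class ReviewEvent(Enum):
    """The three automatic trigger points described above (illustrative names)."""
    CHANGE_SENT_FOR_REVIEW = auto()   # (1) change initially sent / marked ready for review
    NEW_PATCH_SET = auto()            # (2) a new patch set is uploaded
    CHANGE_SUBMITTED = auto()         # (3) the change is submitted

def should_trigger_classifier(event, has_reviewer):
    """Fire the classifier on any of the three events; the reviewer check
    models one possible 'extra condition' refining automatic triggering."""
    if event is ReviewEvent.CHANGE_SENT_FOR_REVIEW and not has_reviewer:
        return False  # e.g., trigger only once a reviewer has been specified
    return event in (ReviewEvent.CHANGE_SENT_FOR_REVIEW,
                     ReviewEvent.NEW_PATCH_SET,
                     ReviewEvent.CHANGE_SUBMITTED)
```

In practice, such a dispatch check would live in the code review service's event hooks, alongside any manual-trigger paths.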
Automatic triggering can be refined by extra conditions (e.g., triggering only when a reviewer is specified). The classifier can also be manually triggered in several ways (e.g., by adding a tag to the change description, clicking a UI button or checkbox in the code review service, or executing a shell command for a specific code change available in the code review service).

Classification of Code Changes. In this study, we classify code changes into the following categories:

ViC (Vulnerability-inducing Change): a code change that originally induced a vulnerability.

VfC (Vulnerability-fixing Change): a code change that fixed an existing vulnerability.

LNC (Likely Normal Change): a code change unlikely to induce a vulnerability. Notably, this category includes changes that have not been identified as a known ViC at the time of analysis.

Additionally, VfLs (Vulnerability-fixing Lines) are the specific subset of source code lines edited by a VfC where the edits are essential to resolving the vulnerability.

B. DESIGN GOALS

Our approach is devised for use cases meeting the following conditions:

The target project experiences frequent software vulnerabilities with high potential consequences (e.g., costly fixes, product reputational damage, and impact on users).

The target project serves as an upstream source for downstream software projects used to build many integrated, end-user products or services (e.g., AOSP for Android smartphones and TVs).

Downstream projects often lack rigorous security testing (e.g., system fuzzing with dynamic analyzers [19]) due to the associated cost, technical expertise, and tooling constraints.

By detecting and blocking vulnerable code changes in the upstream target project, security engineering costs are reduced for downstream projects.
This reduction encourages continued use of the upstream project and attracts additional downstream adoption, incentivizing the upstream project owners to invest in vulnerability prevention practices.

Under the targeted conditions, a classifier that estimates the likelihood of a vulnerability in a given code change proves effective. When the estimated likelihood exceeds a threshold, the respective code change is flagged for further scrutiny via secure code review or rigorous security testing. Our approach facilitates the detection of vulnerable code changes (e.g., via failing security tests) and consequently prevents their integration into the repository.

Our approach also offers seamless integration into post-submit, secure code review processes. Code changes flagged by the classifier undergo additional offline review by security engineers or domain experts, which increases the likelihood of vulnerability detection within those changes. By applying the classifier post-submission but pre-release, targeted security reviews can focus on the highest-risk code changes (based on the estimated likelihood). Our approach thus optimizes the secure code review process, reducing the overall security costs while maintaining a robust security posture.

Considering those use cases, the key design goals for the classifier are set as follows:

Reasonable recall (>75% for ViCs). This ensures that a significant majority (>75%) of vulnerable code changes is detected, substantially reducing the overall security risk.

High precision (>90% for ViCs). Since vulnerable code changes are rare, having at most 1 false positive among every 10 flagged code changes is generally acceptable for the security reviewers.

Low false positive ratio (<2%). Normal code changes should rarely be flagged in order to maintain a streamlined review process for developers.

Fast inference time (e.g., <1 minute).
This enables smooth integration into a code review service without causing developer-visible delays.

Low inference cost. To operate within typical open source project budgets for security infrastructure and tools, inference should be possible without powerful or specialized ML hardware.

Infrequent retraining. Because the cost of model retraining is also important, monthly retraining on up to about a million samples is considered acceptable. This balances accuracy maintenance (vs. daily retraining) with affordability for most open source projects.

C. FRAMEWORK

Our VP framework is applicable to three distinct use cases:

Pre-submit Security Review utilizes VP to assess every code change sent for code review and identify likely-vulnerable code changes for additional secure code review by security domain experts.

Pre-submit Security Testing employs VP to assess every code change sent for code review and identify likely-vulnerable code changes for extra security testing (e.g., static analysis, dynamic analysis, or fuzzing) before submission.

Post-submit Security Review applies VP to all code changes submitted within a predefined period (e.g., daily or weekly) and isolates a set of likely-vulnerable code changes for an additional, in-depth secure code inspection by security domain experts. This use case differs from existing post-submit security testing.

The pre-submit security review use case has the highest complexity. Compared with post-submit use cases, pre-submit use cases have stricter requirements for inference time and online retraining (e.g., results are directly visible to code change authors rather than to a quality assurance team).
Compared with the pre-submit security testing use case, the pre-submit security review use case has stricter accuracy requirements (e.g., false positives for security testing mostly mean extra testing costs). Thus, pre-submit security review is used as the primary target for the framework design.

To address the selected use case, the VP (Vulnerability Prevention) framework leverages the following key subsystems, as depicted in Figure 1:

Code Review Service. Authors initiate the code review process by uploading their code changes to a designated code review service. They then assign reviewers for their changes (see step 1 in Figure 1). The review service automatically triggers one or more review bots in response to the request of an author or reviewer, or when the uploaded code changes satisfy predefined conditions.

Review Bot(s). Triggered review bots access the specific edits made by a source code change, along with relevant metadata of the code change and the baseline source code. To conduct in-depth analysis, bots usually leverage backend services for compilation, analysis, and testing of the change against the baseline. The new VP review bot utilizes those capabilities and forwards the gathered data to the classifier service. The classifier service then determines whether the given source code change is likely vulnerable.

Classifier Service. When the classifier service is triggered, it utilizes the feature extractors and the data from the VP review bot to extract the pertinent features of the given code change. Subsequently, it performs inference using a model to estimate how likely the code change is to contain vulnerabilities.

The classifier model uses the extracted features as input and generates a binary output signal (i.e., ‘1’ indicates likely-vulnerable and ‘0’ indicates likely-normal).
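The inference step of the classifier service can be sketched roughly as follows; the function, the stub extractors, and the threshold value are our own illustrative assumptions, not details from the paper:

```python
# Hypothetical sketch: feature extractors turn a code change into a feature
# vector, a trained model estimates the vulnerability likelihood, and a
# threshold maps that likelihood to the binary output signal.

def classify_change(change, feature_extractors, model, threshold=0.5):
    """Return 1 (likely-vulnerable) or 0 (likely-normal) for a code change."""
    features = [extract(change) for extract in feature_extractors]
    likelihood = model(features)  # estimated probability of a vulnerability
    return 1 if likelihood >= threshold else 0

# Usage with toy stand-ins for the extractors and the model:
feature_extractors = [lambda c: len(c), lambda c: c.count("memcpy")]
toy_model = lambda feats: 0.9 if feats[1] > 0 else 0.1
label = classify_change("memcpy(dst, src, n);", feature_extractors, toy_model)
```

The threshold would be tuned against the precision and recall goals stated earlier, rather than fixed at 0.5.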
The output signal indicates whether additional security review (or security testing) is beneficial for the code change. The classifier service has an option to employ multiple models, combining their results (e.g., through logical operators or majority voting) for better accuracy.

Notification Service. When a code change is classified as likely-vulnerable, the notification service posts a comment on the code change in the code review service. The comment alerts the code change author and existing reviewers to the potential presence of vulnerabilities, urging extra scrutiny.

If the target project maintains a dedicated pool of security reviewers, the notification service can automatically assign a member of the pool as a required reviewer for the target code change. The selected reviewer can be a primary engineer on a rotation or chosen through heuristics (e.g., round robin) for balanced distribution of the security review workload.

D. EXTENSIONS

The VP framework supports extensions for security testing and post-submit use cases:

Selective Security Testing. To extend the VP framework for selective, pre-submit security testing, an asynchronous test execution service is further employed. The execution service patches the given code change into the baseline code, builds artifacts (e.g., binaries), and executes relevant security tests against the build artifacts.

The execution service supports customization of security test configurations, including parameter tuning to target specific functions and adjust the maximum testing time. In our implementation, a review bot is extended to generate tailored testing parameters. The extended bot leverages both the source code delta of the target code change and the vulnerability statistics of the target project.
The resulting data-driven method allows the bot to use either the default parameter values or dynamically generate new ones, helping to optimize the balance between security testing coverage and the associated costs.

Post-Submit Use Case. To extend the VP framework for post-submit use cases, a replay mechanism is needed to process submitted code changes and invoke the VP review bot with the relevant input data. In particular, if the version control system in use is git, the mechanism requires tracking code change identifiers from the git commit hashes. It uses the classification results to select a subset of code changes for further comprehensive security review.

Author: Keun Soo Yim

This paper is available on arxiv under CC by 4.0 Deed (Attribution 4.0 International) license.