Authors: Wachiraphan Charoenwet, Patanamon Thongtanunam, Van-Thuan Pham, Christoph Treude

Table of links

Abstract
1 Introduction
2 Background and Related Work
    Software Security
    Coding Weaknesses
    Security Shift-Left
    Modern Code Review
    Code Review for Software Security
    Security Concern Handling Process in Code Review
3 Motivating Examples
4 Case Study Design
    Research Questions
    Studied Projects
    Data Collection
    Coding Weakness Taxonomy
    Study Overview
    Security Concern Identification Approach (RQ1)
    Alignment Analysis of Known Vulnerabilities (RQ2)
    Handling Process Identification (RQ3)
5 Preliminary Analysis
    PA1: Prevalence of Coding Weakness Comments
    PA2: Preliminary Evaluation of our Security Concern Identification Approach
6 Case Study Results
7 Discussion
8 Threats to Validity
    Internal Validity
    Construct Validity
    External Validity

Abstract

Identifying security issues early is encouraged to reduce latent negative impacts on software systems. Code review is a widely used method that allows developers to manually inspect modified code, catching security issues during a software development cycle. However, existing code review studies often focus on known vulnerabilities, neglecting coding weaknesses, which can introduce real-world security issues that are more visible through code review. The practices of code reviews in identifying such coding weaknesses are not yet fully investigated. To better understand this, we conducted an empirical case study in two large open-source projects, OpenSSL and PHP. Based on 135,560 code review comments, we found that reviewers raised security concerns in 35 out of 40 coding weakness categories. Surprisingly, some coding weaknesses related to past vulnerabilities, such as memory errors and resource management, were discussed less often than the vulnerabilities. Developers attempted to address raised security concerns in many cases (39%-41%), but a substantial portion was merely acknowledged (30%-36%), and some went unfixed due to disagreements about solutions (18%-20%). This highlights that coding weaknesses can slip through code review even when identified. Our findings suggest that reviewers can identify various coding weaknesses leading to security issues during code reviews. However, these results also reveal shortcomings in current code review practices, indicating the need for more effective mechanisms or support for increasing awareness of security issue management in code reviews.

1 Introduction

Software security is an important focus in software development processes because it encompasses how a software system withstands external threats (McGraw, 2004).
Managing security issues in software products is crucial because latent security issues, especially exploitable vulnerabilities, can severely impact end-users and require more resources to resolve if discovered at a later stage. To mitigate security issues, developers are encouraged by the ongoing shift-left concept (Migues, 2021; Weir et al., 2022) to test new software as early as possible.

In the spirit of shifting left, numerous organizations have adopted modern code review, a software quality assurance activity for identifying and removing software defects early in the development lifecycle (Bosu, 2013; Rigby et al., 2012). Prior studies reported that code review is a promising approach for identifying and eliminating security issues at an early stage (Hein and Saiedian, 2009; Bosu et al., 2014; Assal and Chiasson, 2018). In particular, Di Biase et al. (2016) observed that code review could identify well-known security issues such as Cross-Site Scripting (XSS).

Several studies have investigated the benefits of code reviews in identifying security issues (Alfadel et al., 2023; Bosu et al., 2014; Di Biase et al., 2016; Edmundson et al., 2013; Paul et al., 2021b). Still, the security issues studied in previous work were typically bounded by the types of well-known vulnerabilities such as SQL Injection and XSS. In particular, the majority of studied security issues are limited to vulnerabilities that attackers can exploit. Since code review focuses on identifying and mitigating coding issues, we hypothesize that coding weaknesses, i.e., faults in code that can potentially lead to security issues, may also be found and mitigated during the code review process.
Moreover, coding weaknesses should fit the capability of typical reviewers, who may have limited security awareness and knowledge (Braz et al., 2022), because reviewers need a substantial understanding of security to identify security issues during code review (Braz and Bacchelli, 2022). Our preliminary analyses (Section 5) indicate that simple coding weaknesses such as numeric errors, insufficient input validation, or business logic errors are more frequently discussed by reviewers than the security issues that prior work regularly studied in code reviews.

However, the practices of code reviews in identifying such coding weaknesses are not yet fully investigated. This includes the types of coding weaknesses that lead to security issues and the handling process of these coding weaknesses. In addition, little is known about whether the security concerns raised during code reviews are aligned with the vulnerabilities that a system may have had in the past.

Exploring these aspects would help us better understand the unrealized benefits of considering coding weaknesses during code reviews for the early prevention of software security issues. Such insight can also reveal the gaps between current code review practice and the vulnerabilities that were known in the respective systems. On the one hand, software teams could develop secure code review policies that enable them to more effectively identify and address security concerns during the code review process (Mäntylä and Lassenius, 2009). On the other hand, researchers can broaden the perspective of code review studies and understand the shortcomings in code review practices and tools.

In this work, we aim to investigate the coding weaknesses that were raised during code review and how the code review comments that mentioned coding weaknesses were handled.
We conducted our case study on OpenSSL and PHP, which are large open-source systems that are prone to security issues. We chose to examine these phenomena in open-source projects due to the availability of publicly accessible datasets, the mandatory code review policy, and the past vulnerabilities of the selected projects. This decision also stems from the observation that code review outcomes in open-source communities, such as the ratio between functional defects and maintainability defects identified by reviewers, do not significantly differ from those observed in industry settings (Mäntylä and Lassenius, 2009; Beller et al., 2014).

To confirm our presumption that discussion related to coding weaknesses is more prevalent than discussion of explicit vulnerabilities, we conducted an initial analysis by manually annotating 400 randomly sampled code review comments from each studied project. We found that coding weaknesses could be raised in code reviews 21 to 33.5 times more often than explicit vulnerabilities.

Therefore, we conducted an empirical study to address three research questions: (RQ1) What kinds of security concerns related to coding weaknesses are often raised in code review? (RQ2) How aligned are the raised security concerns and known vulnerabilities? (RQ3) How are security concerns handled in code review? To do so, we applied a semi-automated approach to 135,560 code review comments to identify code review comments that are related to coding weaknesses. Then, we manually annotated the types of coding weaknesses for 6,146 code review comments that are related to coding weaknesses. We used the Common Weakness Enumeration taxonomy CWE-699, which covers 40 categories of coding weaknesses that are related to security issues.
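As a rough illustration of the general idea behind embedding-based filtering of review comments, the sketch below flags a comment as a candidate when one of its words lies close to a security-related seed term in vector space. It is not the approach used in this study: the three-dimensional vectors, the seed term, and the 0.9 threshold are hypothetical values chosen for illustration, whereas a real pipeline would load vectors from a pre-trained, domain-specific model.

```python
import math

# Hypothetical toy embeddings; a real pipeline would load vectors from a
# pre-trained, domain-specific word embedding model instead.
TOY_VECTORS = {
    "overflow": [0.9, 0.1, 0.0],
    "buffer": [0.8, 0.2, 0.1],
    "leak": [0.7, 0.3, 0.0],
    "typo": [0.0, 0.1, 0.9],
    "rename": [0.1, 0.0, 0.8],
}

SEED_TERMS = ["overflow"]  # hypothetical security-related seed term
THRESHOLD = 0.9            # hypothetical cosine-similarity cut-off


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def is_candidate(comment):
    """Return True if any known word in the comment is close to a seed term."""
    for raw_word in comment.lower().split():
        vector = TOY_VECTORS.get(raw_word.strip(".,;:!?()"))
        if vector is None:
            continue  # words without an embedding are ignored
        if any(cosine(vector, TOY_VECTORS[seed]) >= THRESHOLD for seed in SEED_TERMS):
            return True
    return False


comments = [
    "This memcpy may overflow the buffer.",
    "Please fix the typo and rename this variable",
]
candidates = [c for c in comments if is_candidate(c)]
```

In this toy setup only the first comment is selected. Because such a filter is automated, the comments it flags are candidates only and would still need manual annotation against the weakness taxonomy.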
In addition, we analyzed 378 Common Vulnerabilities and Exposures (CVE) reports (101 from OpenSSL; 277 from PHP) to investigate whether the coding weaknesses raised during code reviews aligned with the known vulnerabilities in the systems. To understand how coding weakness comments were handled during code reviews, we performed a qualitative analysis to identify the handling scenarios based on the code review activities (e.g., review discussion, revisions) of the corresponding code changes.

The case study results show that coding weaknesses related to 35 out of the 40 categories in CWE-699 were raised during the code review process of OpenSSL and PHP (RQ1). For example, comments about coding weaknesses in authentication, privilege, and API were frequently raised in both studied projects. Each studied project also has unique coding weaknesses raised during code reviews, e.g., direct security threats in OpenSSL and input data validation in PHP. These results indicate that various coding weaknesses linked to security issues were raised during the code review process, and that different software projects focus on different coding weaknesses.

Known vulnerabilities in the studied projects are related to 16 weakness categories (RQ2). However, coding weaknesses related to memory buffer errors and resource management errors are the least frequently discussed coding weaknesses in OpenSSL and PHP (4%-9%), despite the high percentages of known vulnerabilities (17%-29%). These results suggest that some important coding weaknesses in a project may not be sufficiently discussed in current code review practice.

Coding weaknesses raised during code reviews were handled in four ways (RQ3). In many cases (39%-41%), developers attempted to solve the issues. Nevertheless, approximately a third (30%-36%) of the raised coding weaknesses were only acknowledged without immediate fixes (i.e., no additional modifications to the code changes in the reviews).
We observed that some of the acknowledged concerns were agreed to be fixed in new separate code changes (10%-18%) and some were left unfixed due to disagreement about the proper solution (18%-20%). A relatively small proportion of the concerns raised (14%-26%) were clarified and dismissed through discussion. Across all scenarios, we found alarming cases (6%-9%) where security issues could be introduced into the code because code changes with unresolved discussions were eventually merged. Additionally, the abandoned concerns (3%-9%) and the unsuccessfully fixed concerns (2%-4%) are also important because they can negatively affect the developer's contribution (Gerosa et al., 2021). These scenarios indicate that security concerns from coding weaknesses need a better handling process.

Based on the findings, we recommend that software projects consider coding weakness categories (i.e., CWE-699) as a guideline for identifying coding weaknesses that can introduce security issues in code reviews. Coding weaknesses can be prioritized based on the significance, proneness, and unique set of coding weaknesses raised in past code reviews. Our work also highlights shortcomings of the code review process in handling security concerns (i.e., unsuccessfully fixed, unresolved discussion, and unresponded), which might require future work to address.

Novelty & Contributions: To the best of our knowledge, this paper is the first to empirically investigate code review in identifying and mitigating coding weaknesses that link to security issues in addition to the well-known vulnerabilities, highlighting the potential benefits of code reviews for early prevention of potential security issues. Second, we presented a novel semi-automated approach that leverages a domain-specific pre-trained word embedding model to find potential code review comments related to security issues.
Third, we examined the alignment between the known vulnerabilities in the studied systems and the coding weaknesses that were often raised during the code review process, highlighting a shortcoming of current code review practices: some important coding weaknesses may not be sufficiently discussed. Finally, we investigated the handling process of the coding weaknesses raised during code review, which sheds light on how a coding weakness can slip through the code review process and potentially become a security issue in the future.

Data Availability: We have released the supplementary material of scripts for data retrieval and data analysis in this study along with the annotated data to facilitate further research.

Paper Organization: Section 2 describes the background and explores the related work. Section 3 demonstrates examples of vulnerabilities that were caused by coding weaknesses. Section 4 explains the case study design. Section 5 explains the initial analysis method and results. Section 6 reports the results. Section 7 discusses the implications and suggestions. Section 8 clarifies the threats that may affect the validity of this study. Finally, Section 9 draws the conclusion.

This paper is available on arXiv under the CC BY 4.0 Deed (Attribution 4.0 International) license.