Code Reviews Uncover Dozens of Security Weakness Categories, Study Shows

Table of links

Abstract

1 Introduction

2 Background and Related Work

Software Security
Coding Weaknesses
Security Shift-Left
Modern Code Review
Code Review for Software Security
Security Concern Handling Process in Code Review

3 Motivating Examples

4 Case Study Design

Research Questions
Studied Projects
Data Collection
Coding Weakness Taxonomy
Study Overview
Security Concern Identification Approach (RQ1)
Alignment Analysis of Known Vulnerabilities (RQ2)
Handling Process Identification (RQ3)

5 Preliminary Analysis

PA1: Prevalence of Coding Weakness Comments
PA2: Preliminary Evaluation of our Security Concern Identification Approach

6 Case Study Results

7 Discussion

8 Threats to Validity

Internal Validity
Construct Validity
External Validity

6 Case Study Results

We report the empirical results based on the code review comments identified by the semi-automated approach; and answer the three research questions in this section, followed by a summary of our findings.

RQ1: What kinds of security concerns related to coding weaknesses are often raised in code review? Table 9 shows the number of identified code review comments and aggregated security concerns. From the 135K code review comments in the dataset, we manually read 3,570 OpenSSL and 2,576 PHP comments with the highest cosine similarity scores until reaching the saturation point (i.e., 50 consecutive irrelevant comments). As described in Section 4.6.2, in the first iteration we removed irrelevant comments (e.g., related to bookkeeping and code styling), resulting in 232 and 148 comments. Subsequently, the first and the third author independently determined whether the comments raised legitimate security concerns and could be classified into one of the coding weakness categories, resulting in 202 and 128 comments. To simplify the results, we aggregated comments within the same pull request that were classified into the identical coding weakness category into singular security concern. In total, we identified 188 security concerns from 202 comments in 164 pull requests in OpenSSL and 123 security concerns from 128 comments in 100

pull requests in PHP. Note that one pull request can have multiple concerns with different coding weakness categories. The manual annotation process by the first and the third author achieved the inter-rater agreement (Cohen, 1960) κ = 0.70 and κ = 0.84 for OpenSSL and PHP, which can be interpreted (McHugh, 2012) as substantial (0.61 ≥ |κ| ≥ 0.81) and almost perfect (|κ| > 0.81), respectively. Table 10 shows the number of identified security concerns across the 40 coding weakness categories of CWE-699. The numbers in parentheses indicate the CWE category number of the coding weakness. We found that in OpenSSL and PHP, identified security concerns were related to 35 out of 40 coding weakness categories of CWE-699, suggesting that diverse types of coding weaknesses can be discovered during the code review process. The bold text in Table 10 highlights the top ten coding weaknesses that were frequently raised in each project and the ‡ symbol indicates the concerns that were frequently raised in both OpenSSL and PHP. We found that six coding weaknesses, i.e., Authentication Errors (CWE-1211), API / Function Errors (CWE-1228), Privilege Issues (CWE-265), Behavioral Problems (CWE-438), Cryptographic Issues (CWE-310) and Random Number Issues (CWE1213), were among the top ten concerns in both OpenSSL and PHP. Additionally, we observe that several coding weaknesses were frequently raised in a particular project. This may suggest that while reviewers in OpenSSL and PHP share a set of common concerns, they can have a specific focus on particular security aspects as well. Below, we present common security concerns across both projects and projectspecific security concerns.

Common security concerns in OpenSSL and PHP: The first two common security concerns are related to users and rights, i.e., Authentication Errors (CWE-1211) and Privilege Issues (CWE-265) coding weaknesses. Authentication Errors (CWE-1211) are related to the failure to properly verify the identification of the rightful actors who can gain access to the system. For example, as shown in Figure 7, we observed that a reviewer noticed that the program does not verify whether the certificate is trusted or not: ”[...]The certificate in question is now detached from its provenance, we don’t know whether it came from the trust store, or from the peer-supplied untrusted chain![...] ”.44 Privilege Issues (CWE-265) are related to the improper management of critical privileges assigned to users or objects. For example, a reviewer mentioned that the developer did not use the correct approach to verify that the user has sufficient privileges to execute a script.45

Another two common security concerns are related to coding weaknesses about the functionality of the system, i.e., API/Function Errors (CWE-1228) and Behavioral Problems (CWE-438). API/Function Errors (CWE-1228) covers the use of dangerous functions or the exposing of the functions that allow unwanted actors to execute restricted actions. For example, as shown in Figure 8, we observed that a reviewer commented that assigning the result of the format string function to the same input variable can be potentially harmful: ”[...]Using the same variable as both input and output for spprintf looks dangerous. Are you sure it is safe? ”.46 Behavioral Problems (CWE-438) refer to code that may cause unexpected behavior in the software system. For example, a reviewer noticed that the code can look for the required files in incorrect directories if the program is compiled in different environments.4

Concerns related to the cryptographic process, i.e., Cryptographic Issues (CWE310) and Random Number Issues (CWE-1213), were also common in both OpenSSL and PHP. Cryptographic Issues (CWE-310) covers the proper use of encryption algorithms and cryptographic keys to ensure system and data security. For instance, as shown in Figure 9, a developer responded to a reviewer’s suggestion that the lengths of the cryptographic keys can be dynamic and cannot be restricted to a fixed value by saying ”[...]HMAC keys can be variable length so SHA256 DIGEST LENGTH doesn’t seem like the right answer here”.48 Random Number Issues (CWE-1213) account for the process of obtaining sufficient ran

Including the six common coding weaknesses, there are 21 types of coding weaknesses that were raised in both projects. In particular, security concerns related to coding weaknesses in category Audit/Logging Errors (CWE-1210), Information Management Errors (CWE-199), Concurrency Issues (CWE-557), Memory Buffer Errors (CWE-1218), Business Logic Errors (CWE-840), and Resource Locking Problems (CWE-411) are among the top 20 categories in both projects. Security concerns in these categories may also be considered common concerns to some extent. The previous code review works (Alfadel et al., 2023; Paul et al., 2021b; Di Biase et al., 2016; Bosu et al., 2014; Edmundson et al., 2013) reported that reviewers can identify security issues in various degrees based on the different application domains and the programming languages. However, the studied security issues are frequently bounded by well-known vulnerabilities that are associated with security consequences such as SQL Injection, XSS, or Denial of service. Our results further reveal that reviewers can commonly discuss more extensive coding weaknesses that can introduce those vulnerabilities from the development perspective. For example, the discussion regarding API / Function Errors (CWE-1228), Behavioral Problems (CWE-438), Cryptographic Issues (CWE-310), and Random Number Issues (CWE-1213) have not been previously reported.

Project-specific security concerns: In addition to common security concerns, understanding project-specific concerns would allow us to gain better insight into the secure code review practices in each project. We observed that in OpenSSL, a library that provides encryption functionalities to its dependent systems, reviewers seem to focus on preventing direct security threats that are related to encryption, e.g., Key Management Errors (CWE-255) and Communication Channel Errors (CWE-417). For example, a reviewer discussed the causes of timing-attack, which can reveal the type of cryptographic key used in secure communication with the attacker.50 On the other hand, in PHP, a programming language for web applications, reviewers rather focus on security related to data controlling, e.g., Data Validation Issues (CWE-1215) and the versatility of language, e.g., Pointer Issues (CWE-465) and Type Errors (CWE-136). Also, it seems that PHP reviewers are concerned with Documentation Issues (CWE-1225), which are rarely recognized in a security context (Alfadel et al., 2023). For example, a developer explained to a reviewer that a function should not declare to accept any type of parameters if it intends to raise TypeError when the user inputs the parameters of incorrect types, e.g., to avoid Denial of Service vulnerability.51 In another case, a reviewer noticed that a function does not implement a randomization algorithm that it claims to use in the document.52 These types of security concerns highlight the importance of input management and documentation in PHP.

Lastly, for the coding weakness types that were rarely raised, it may be because these issues are irrelevant to the application domains of the systems. We did not observe any concerns related to Lockout Mechanism Errors (CWE-1216), as it can cause an overly restrictive authentication policy, which is not applicable in both projects. Similarly, no concerns related to User Interface Security Issues (CWE355) were found, as OpenSSL and PHP do not have an elaborate user interface. Therefore, it is less likely that reviewers would raise this type of concern.

RQ2: How aligned are the raised security concerns and known vulnerabilities? Based on the mapping of known vulnerabilities to related coding weaknesses, as explained in Section 4.7, we find that the known vulnerabilities of OpenSSL and PHP during the studied period are related to 16 coding weakness categories. We answer this question by comparing the percentages of the known vulnerabilities and the raised security concerns that we found in RQ1 (Table 10).

Figure 10 shows that nine coding weakness categories in OpenSSL and six coding weakness categories in PHP have a high proportion of known vulnerabilities in the past, but are less frequently discussed in code reviews. For instance, the top two coding weakness categories that have the highest proportion of known vulnerabilities are Memory Buffer Errors (CWE-1218; 21% in OpenSSL and 29% in PHP) and Resource Management Errors (CWE-399; 21% in OpenSSL and 17% in PHP). However, these two coding weakness categories have a relatively low proportion of security concerns raised in the code reviews (4% - 9%). Similarly, 6% - 12% of the known vulnerabilities are related to Business Logic Errors (CWE840), File Handling Errors (CWE-1219), and Pointer Issues (CWE-465) which were rarely discussed in the code review (only 1% - 7% of the security concerns).

Moreover, we observe that OpenSSL has three coding weaknesses that are lessfrequently discussed in code reviews i.e., Information Management Errors (CWE199) (17% of known vulnerabilities; 3% of security concerns), Cryptographic Issues (CWE-310) (7% of known vulnerabilities; 4% of security concerns), and Data Neutralization Issues (CWE-137) (2% of known vulnerabilities; 0% of security concerns). In particular, the lower number of security concerns about Data Neutralization Issues align with the observation of Braz et al. (2021) that developers may not be aware of the consequences of improper input validation, as well as the case of Heartbleed as shown in Figure 1. On the other hand, coding weaknesses in six categories in both OpenSSL and PHP were more frequently discussed than the known vulnerabilities. Coding weaknesses related to Authentication Errors (CWE-1211), String Errors (CWE-133), Type Errors (CWE-136), Concurrency Issues (CWE-557), Data Processing Errors (CWE-19), and Behavioral Problems (CWE-438) which occurred in 4% of known vulnerabilities in OpenSSL and PHP were discussed by 22%-23% of security concerns in both projects. Despite the low frequency of the security concerns compared to the known vulnerabilities, all of the coding weakness categories of the known vulnerabilities, except for Numeric Errors (CWE-189) were discussed in the code review as shown in Figure 10. This finding suggests that reviewers may be able to identify these kinds of coding weakness, but require more attention.

C2. Acknowledged (30% in OpenSSL; 36% in PHP): For a third of the security concerns raised, we observed that security concerns were acknowledged by the developer or other reviewers but were not fixed in the same pull request. We observed that the concerns were not fixed in the same pull request because they will be fixed elsewhere (C2.1; 10% for OpenSSL and 18% for PHP) or due to an unresolved discussion (C2.2; 20% for OpenSSL and 18% for PHP). In particular, for the fix-elsewhere scenario (C2.1), the reviewers and developers discussed the raised concern and agreed that the necessary fixes should be made in new pull requests. We find that around half (55%) of the security concerns in this scenario were eventually merged in both projects ( 11 20 for OpenSSL and 12 22 for PHP). For example, a reviewer noticed the use of stale pointer and suggested a fix. The developer then replied, ”[...] Ok. I’ll prepare a pull request (but not right away) and request your review.”.56 However, it is not possible to confirm whether all security concerns in the C2.1 scenario were later fixed as promised. For the unresolved discussion scenario (C2.2), the developers and reviewers cannot find an agreeable direction to address the concern. The discussions in this scenario tend to be more rambling and involve several sub-concerns, hindering

reviewers from reaching an agreeable resolution. This could be due to different understandings and perspectives between reviewers. For example, reviewers and developers discussed the resolution while aiming to maintain compliance with security standards. However, due to the equivocal interpretation of the standards, the discussion cannot reach an agreeable resolution.57 Another example is that a reviewer raised a concern about the certificate authentication process and requested a modification.58

The other reviewers, including the developer, agreed that the concern was valid but expressed multiple opinions on the solutions. The pull request with the concern was eventually merged without any changes. Indeed, we found that 16 pull requests in OpenSSL and 5 pull requests in PHP which contain 16 ( 16 36 = 44% for OpenSSL) and 7 ( 7 22 = 31% for PHP) concerns in C2.2 were eventually merged without any evidence that the concerns were addressed. It should be noted that the reviewer’s workload may affect the code review outcomes. We found that a significant portion of reviewers (54% in OpenSSL and 17% in PHP) engaged in unresolved discussions (C2.2) are classified as highworkload reviewers i.e., reviewed over 100 pull requests in each respective project. We hypothesize that workload, characterized by the volume of code reviews, as discussed in prior research (Ruangwan et al., 2019), could influence the quality of code review process. However, future work can be conducted to further investigate this phenomenon.

C3. Dismissed (15% in OpenSSL; 26% in PHP): In this scenario, the developer and reviewers discussed the security concerns raised, and the security concerns were dismissed. We observed that the discussions eventually concluded that the concern was a false concern (C3.1; 13% for OpenSSL and 7% for PHP) or acceptable by design choice (C3.2; 24% for OpenSSL and 7% for PHP). Specifically, the false concern scenario (C3.1) is related to cases in which developers or other reviewers offered an explanation to invalidate the security concerns. For example, a reviewer raised a concern about leaking sensitive data.59 Then, the developer replied to the comment to explain that the implementation is not leaking sensitive data ”[...] %s given part shouldn’t be added for values (but only for types) since they might contain sensitive data”, which was agreed by the reviewer. The design choice scenario (C3.2) refers to cases where security concerns were dismissed by other factors such as performance trade-off, maintainability, or system design (Zanaty et al., 2018). For example, a developer responded that a change in the data-neutralizing process was a valid concern as raised by the reviewer; however, it did not affect the application logic.60 The reviewer finally agreed and approved the pull request. We also observed that 20 pull requests in OpenSSL and 4 pull requests in PHP which contain 21 ( 21 24 = 88% for OpenSSL) and 4 ( 4 8 = 78% for PHP) concerns in scenario C3.2 were eventually merged.

C4. Unresponded (3% in OpenSSL; 9% in PHP): There were a few cases where security concerns did not receive any responses nor activities logged in the pull request. This was in part due to out-of-context (C4.1; 2% for PHP) or unknown & inactivity (C4.2; 3% for OpenSSL and 7% for PHP). The out-of context scenario (C4.1) refers to cases where security concerns drift away from the current discussion or the goal of the code changes. For example, a reviewer raised a security concern about insufficient check of input and instantly volunteered to create a new change request that fixes the problem, however, the developer and other reviewers did not respond to the concern.61 The unknown & inactivity scenario (C4.2) refers to cases where security concerns were simply disregarded without a clear reason. For example, a reviewer remarked suspicious use of pointer but the developer did not respond and the pull request was eventually rejected.

Authors:

This paper is available on arxiv under CC by 4.0 Deed (Attribution 4.0 International) license.