Why Your Code Review Process Might Be Missing Its Biggest Security Risks

Table of links

Abstract

1 Introduction

2 Background and Related Work

Software Security
Coding Weaknesses
Security Shift-Left
Modern Code Review
Code Review for Software Security
Security Concern Handling Process in Code Review

3 Motivating Examples

4 Case Study Design

Research Questions
Studied Projects
Data Collection
Coding Weakness Taxonomy
Study Overview
Security Concern Identification Approach (RQ1)
Alignment Analysis of Known Vulnerabilities (RQ2)
Handling Process Identification (RQ3)

5 Preliminary Analysis

PA1: Prevalence of Coding Weakness Comments
PA2: Preliminary Evaluation of our Security Concern Identification Approach

6 Case Study Results

7 Discussion

8 Threats to Validity

Internal Validity
Construct Validity
External Validity

7 Discussion

In this section, we discuss the implications of our results and provide practical recommendations for practitioners and potential future work. 1) Various coding weaknesses that may lead to security issues can be raised during code reviews. Our first preliminary analysis (PA1) in Section 5 shows that coding weaknesses were raised in the code review process 21 - 33.5 times more often than explicit vulnerabilities. This finding supports our intuition that the reviewers tend to focus on issues in source code. Therefore, it is more natural for the reviewers to identify coding weaknesses than security issues. This implication aligns with the previous work (Gon¸calves et al., 2022) that the cognitive load required for code reviews is lower if the reviewers already have the relevant knowledge. Indeed, our RQ1 shows that the raised security concerns in code reviews of OpenSSL and PHP cover nearly 90% of the CWE-699 weakness types (i.e., 35 out of 40 categories, see Table 10). This confirms our presumption that a variety of coding weaknesses can be raised by reviewers during the code review process. As shown in the motivating examples in Section 3, such coding weaknesses can lead to security issues. It can be implied that the coding weaknesses that may introduce security issues can potentially be identified during the code review process although the weaknesses did not yet explicitly expose the vulnerable outcomes (Braz et al., 2021). Our manual observations from RQ1 also show that the code changes may potentially be vulnerable if the author did not address the raised security concerns. For instance, Figure 7 shows that vulnerabilities such as CVE-2008-498963 and CVE-2012-582164 could be introduced into the code if the Improper Certificate Validation coding weakness (CWE-295) under the Authentication Errors category (CWE-1211) was not raised by a reviewer.

Recommendation: As we found that coding weaknesses can be identified in code reviews, our findings suggest that practitioners and/or other software projects could adopt the coding weaknesses taxonomy (i.e., CWE-699) to assist code reviews. A list of coding weaknesses should help the team increase the awareness of the potential problems that can lead to security issues without requiring deep security knowledge. A recent controlled experiment of Braz et al. (2022) has shown that a code review checklist could help reviewers better find security issues. Hence, one of the possible ways to adopt the coding weaknesses taxonomy for code reviews is to incorporate it into a code review checklist. Future work should investigate the effectiveness and practicality of using coding weaknesses as a code review checklist for identifying and mitigating security issues during the code review process. Moreover, as coding weakness are more frequently discussed than the security issue, coding weakness can also be an effective proxy for understanding secure code review practices

2) Coding weaknesses related to the known vulnerabilities of the systems are not frequently discussed in code reviews. Our RQ2 shows that some types of coding weaknesses were less frequently discussed compared to the known vulnerabilities (see Figure 10). In particular, we found that Memory Buffer Errors (CWE-1218) and Resource Management Errors (CWE-399) are the least frequently discussed coding weaknesses in OpenSSL and PHP (4%-9%), albeit the high percentages of known vulnerabilities (17%-29%). Furthermore, our motivating examples in Section 3 highlighted that such coding weaknesses can lead to a serious vulnerability. For example, OpenSSL’s Heartbleed is a known vulnerability related to weakness Out-of-bounds Read (CWE125) which is a type of memory buffer error.

These coding weaknesses were rarely discussed maybe because they are generic and easy to be overlooked. Hence, the reviewers may have failed to notice them. To mitigate this problem, the reviewers should be aware of these latent coding weaknesses in order to properly prioritize them in the code reviews. In addition to the known vulnerabilities, our RQ1 indicates that the security concerns in code reviews can vary from project to project. Particularly, OpenSSL reviewers were concerned about direct security threats (e.g., Authentication Errors (CWE-1211 and Random Number Issues (CWE-1213)), while PHP reviewers were more concerned about data controlling (e.g., Type Errors (CWE-136)). As OpenSSL is an encryption library for secure communication and PHP is a programming language, it can be implied that the application domain may correlate with the coding weaknesses that reviewers can raise. This finding also supports our results that coding weaknesses such as User-interface Security Issues (CWE-355) and Encapsulation Issues (CWE-1227) were neither found in our results nor appear in the known vulnerabilities because they are less related to the application domains of the studied projects.

Recommendation: Our findings suggest that it is essential to identify the specific coding weaknesses that are significant, highly prone to introduce security issues, and relevant to the application domain of the projects. Thus, rather than reviewing all types of coding weaknesses, a selected set of coding weaknesses can be prioritized for effective code reviews. Prioritization of coding weaknesses during code reviews can be based on known vulnerabilities and the unique concerns of the projects that were raised in the past. Future work can investigate a systematic approach for identifying and prioritizing the types of important coding weaknesses for individual projects in this context.

3) Not all the raised security concerns were addressed within the same code review process. The security concern handling scenarios identified in our RQ3 reveal a shortcoming in the code review process. Our results show that approximately a third of the security concerns from coding weaknesses (30%-36%, see C2 in Table 11) were acknowledged without fixes in the process. We observed that developers promised to fix some of the acknowledged concerns in the new independent code changes (10%-18%), but some concerns were left without fixing due to disagreement about the proper solution (18%-20%). Nevertheless, approximately half of the unresolved concerns (6%-9%) were eventually merged. This result implies a possible risk that security issues can slip through the code review process into the software product. The incomplete code reviews or unclean code changes that contain security concerns related to coding weaknesses should be held from merging until all security concerns are resolved. Otherwise, the remaining coding weaknesses in code changes can become security issues in the future.

This implication is consistent with the findings of the prior work which reported that relentless and inconclusive discussion could impact the code review quality (Kononenko et al., 2015), and the incomplete code reviews and the unsuccessfully fixed can negatively affect the developer’s contribution (Gerosa et al., 2021). Recommendation: Code reviews with security concerns should be escalated if the final resolutions cannot be agreed upon before merging. Security experts or experienced developers should be included in such code reviews to investigate complex security concerns. In addition, the mechanisms to notify the reviewers of the incomplete code reviews or the insufficiently addressed security concerns could reduce the risk that security issues will slip through the code review process into the software product. Our suggestion aligns with Wessel et al. (2020) who reported that the adoption of an automated mechanism such as code review bots can increase the number of merged pull requests, and, hence, reduce the number of abandoned code reviews. Kudrjavets et al. (2022) also observed that the automated bots can remind the developers of the pending tasks in the code review process without inciting negative feelings. Hence, future work should investigate an approach to identify incomplete code reviews or the insufficiently addressed security concerns to help developers increase awareness.

Authors:

This paper is available on arxiv under CC by 4.0 Deed (Attribution 4.0 International) license.