Table of links

Abstract

1 Introduction

2 Background and Related Work
  Software Security
  Coding Weaknesses
  Security Shift-Left
  Modern Code Review
  Code Review for Software Security
  Security Concern Handling Process in Code Review

3 Motivating Examples

4 Case Study Design
  Research Questions
  Studied Projects
  Data Collection
  Coding Weakness Taxonomy
  Study Overview
  Security Concern Identification Approach (RQ1)
  Alignment Analysis of Known Vulnerabilities (RQ2)
  Handling Process Identification (RQ3)

5 Preliminary Analysis
  PA1: Prevalence of Coding Weakness Comments
  PA2: Preliminary Evaluation of our Security Concern Identification Approach

6 Case Study Results

7 Discussion

8 Threats to Validity
  Internal Validity
  Construct Validity
  External Validity

3 Motivating Examples

In this section, we provide motivating examples of security issues that are related to coding weaknesses which can potentially be identified by code reviews.
We obtained three examples from the Common Vulnerabilities and Exposures (CVE) reports for our studied systems (i.e., OpenSSL and PHP). A CVE report describes a known vulnerability and provides related information, including the corresponding code changes (or patches), a severity score, and related weaknesses that are assigned by the National Vulnerability Database (NVD) security analysts.

Example 1: Heartbleed (CVE-2014-0160) in OpenSSL is a data leak vulnerability that occurred due to improper input validation (CWE-20). Heartbleed is one of the most famous security incidents in OpenSSL and affected a large number of servers between 2012 and 2014. An OpenSSL client can send a Heartbeat message to monitor the availability of a server, and the server should return the received message to the client. However, in the vulnerable version (i.e., the version with the improper input validation), OpenSSL responds with data of whatever length the client specifies. Hence, the client can obtain sensitive information from the server. The coding weakness that caused this unexpected behavior is the improper input validation of the length parameter write length. As shown in the fixing patch of the Heartbleed issue (see Figure 1), the vulnerable code was fixed by ensuring that the entered length is valid. A study by Durumeric et al. (2014) suggested that if the reviewers had identified the improper input validation in the vulnerable version of OpenSSL, this vulnerability could have been prevented early.

Example 2: The Integer Overflow or Wraparound weakness can trigger heap memory corruption in a vulnerable version of OpenSSL, leading to a denial-of-service error (CVE-2016-2106).
The cause of this vulnerability is an integer overflow (CWE-190). The Integer Overflow or Wraparound weakness is a numeric error in which an integer value becomes larger than the maximum value of its data type, forcing the system to mistakenly wrap the value around and causing the denial-of-service error. As Figure 2 shows, the original condition (i + inl < b) is prone to integer overflow because the result of the left-hand-side operation can exceed the range of the implicit variable type. This particular weakness was fixed by a patch that rewrote the expression to subtract two integers instead of adding them. Hence, in this case, the denial-of-service vulnerability from heap memory corruption was caused by the Integer Overflow or Wraparound weakness.

Example 3: The Potential Infinite Loop (CVE-2014-0238) is a denial-of-service vulnerability in PHP caused by a weakness of type Allocation of Resources Without Limits or Throttling (CWE-770), which can also be considered a Business Logic Errors (CWE-840) weakness. The vulnerable version of PHP can make the system unresponsive to requests when an attacker passes a lengthy input to a function that executes multiple for-loops. As seen in Figure 3, one of the loop variables (i) is incremented without being checked in the control condition. The vulnerability was fixed by adding the missing exit condition, which terminates the loop when the counter reaches the maximum size.

These examples demonstrate that coding weaknesses can contribute to security issues. Since code reviews focus on identifying coding issues in source code, identifying coding weaknesses that can lead to exploitable vulnerabilities and security issues would be beneficial to code review practices. Figure 4 shows a real-world example of a coding weakness identified during a code review, i.e., exposing internal values in error messages, which is relevant to Information Management Errors (CWE-199).
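To make the first two examples concrete, the following is a simplified Python model of the two weaknesses, not the OpenSSL C code itself. All function names, the `SERVER_MEMORY` buffer, and the 32-bit width are assumptions made for illustration. The wraparound is emulated explicitly because Python integers do not overflow (in C, signed overflow is formally undefined behavior, and wraparound is one common outcome). The rearranged comparison `b - i > inl` is one way to "subtract instead of add"; the exact patched expression is the one in Figure 2.

```python
INT_BITS = 32  # assumed width of a C `int`; an illustration parameter only

# Hypothetical server memory: a 4-byte heartbeat payload followed by unrelated data.
SERVER_MEMORY = b"ping" + b"SECRET-KEY"


def wrap_int32(value: int) -> int:
    """Reduce value to a signed 32-bit integer, emulating two's-complement wraparound."""
    half = 1 << (INT_BITS - 1)
    return (value + half) % (1 << INT_BITS) - half


def heartbeat_echo_unchecked(memory: bytes, claimed_length: int) -> bytes:
    """Improper input validation: echo as many bytes as the client claims to have sent."""
    return memory[:claimed_length]


def heartbeat_echo_checked(memory: bytes, actual_length: int, claimed_length: int) -> bytes:
    """Echo the payload only if the claimed length fits the payload actually received."""
    if claimed_length < 0 or claimed_length > actual_length:
        return b""  # modelling choice: drop the malformed request
    return memory[:claimed_length]


def has_room_unchecked(i: int, inl: int, b: int) -> bool:
    """Overflow-prone check: the sum i + inl may wrap around before the comparison."""
    return wrap_int32(i + inl) < b


def has_room_checked(i: int, inl: int, b: int) -> bool:
    """Rearranged check: with 0 <= i <= b, the difference b - i cannot overflow."""
    return b - i > inl
```

With the 4-byte payload above, the unchecked echo returns the neighbouring bytes when the client claims a length of 14, whereas the checked version returns nothing. Likewise, for a very large `inl` the unchecked comparison reports that the data fits because the sum has wrapped to a negative value, while the rearranged comparison rejects it.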
Little is known, however, about how often reviewers identify coding weaknesses linked to security issues during code reviews, what kinds of security concerns they raise, and how those concerns are handled or responded to. This insight would help practitioners improve their code review practice and equip developers with a secure code review mindset that is more compatible with their technical expertise.

Authors: Wachiraphan Charoenwet, Patanamon Thongtanunam, Van-Thuan Pham, Christoph Treude

This paper is available on arXiv under the CC BY 4.0 Deed (Attribution 4.0 International) license.