Table of Links
IV. Systematic Security Vulnerability Discovery of Code Generation Models
VII. Conclusion, Acknowledgments, and References
Appendix
A. Details of Code Language Models
B. Finding Security Vulnerabilities in GitHub Copilot
C. Other Baselines Using ChatGPT
D. Effect of Different Number of Few-shot Examples
E. Effectiveness in Generating Specific Vulnerabilities for C Codes
F. Security Vulnerability Results after Fuzzy Code Deduplication
G. Detailed Results of Transferability of the Generated Non-secure Prompts
H. Details of Generating Non-secure Prompts Dataset
I. Detailed Results of Evaluating CodeLMs using Non-secure Dataset
J. Effect of Sampling Temperature
K. Effectiveness of the Model Inversion Scheme in Reconstructing the Vulnerable Codes
L. Qualitative Examples Generated by CodeGen and ChatGPT
M. Qualitative Examples Generated by GitHub Copilot
M. Qualitative Examples Generated by GitHub Copilot
Listings 12 and 13 show two examples of code generated by GitHub Copilot that contain security vulnerabilities. Listing 12 shows generated code that contains CWE-022, the path traversal vulnerability. In this example, lines 1 to 6 are the non-secure prompt, and the rest of the code is the model's completion of that prompt. The path traversal vulnerability is at line 10, where the code allows arbitrary file writes during tar file extraction. Listing 13 shows generated code that contains CWE-079, which relates to cross-site scripting attacks. Lines 1 to 8 of Listing 13 contain the input non-secure prompt, and the rest of the code is the completion of that prompt. The code in this listing contains a cross-site scripting vulnerability at line 12.
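Listings 12 and 13 are not reproduced in this section. As a rough illustration of the two vulnerability classes they describe, the following is a minimal sketch in Python using only the standard library. It is not the code from the listings or the authors' method. The function names (`is_within_directory`, `safe_members`, `render_greeting`, `build_demo_archive`) and the demo archive contents are hypothetical, chosen only for this example. The sketch shows one common way to reject tar members that would be written outside the destination directory (CWE-022) and one way to escape user input before placing it in HTML (CWE-079).

```python
import html
import io
import os
import tarfile


def is_within_directory(base_dir, member_name):
    """Return True if member_name resolves to a path inside base_dir."""
    base = os.path.realpath(base_dir)
    target = os.path.realpath(os.path.join(base, member_name))
    return os.path.commonpath([base, target]) == base


def safe_members(archive, dest_dir):
    """Keep only regular files and directories that stay inside dest_dir.

    Links and special files are dropped, since they can also be used to
    write outside the destination directory.
    """
    return [
        member
        for member in archive.getmembers()
        if (member.isfile() or member.isdir())
        and is_within_directory(dest_dir, member.name)
    ]


def render_greeting(user_input):
    """Escape user input before embedding it in an HTML fragment."""
    return "<p>Hello, " + html.escape(user_input) + "!</p>"


def build_demo_archive():
    """Build an in-memory tar archive with one benign and one hostile name."""
    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode="w") as tar:
        for name in ("notes.txt", "../evil.txt"):
            data = b"demo"
            info = tarfile.TarInfo(name=name)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    buffer.seek(0)
    return buffer


if __name__ == "__main__":
    with tarfile.open(fileobj=build_demo_archive(), mode="r") as tar:
        # Only members that stay inside the hypothetical "extracted" directory remain.
        print([member.name for member in safe_members(tar, "extracted")])
    print(render_greeting("<script>alert(1)</script>"))
```

A caller could pass the filtered list to `extractall` through its `members` argument. Calling `extractall` on an untrusted archive with no such check is the kind of pattern that permits arbitrary file writes. Likewise, concatenating unescaped request data into an HTML response is the typical source of a cross-site scripting flaw.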
Authors:
(1) Hossein Hajipour, CISPA Helmholtz Center for Information Security;
(2) Keno Hassler, CISPA Helmholtz Center for Information Security;
(3) Thorsten Holz, CISPA Helmholtz Center for Information Security;
(4) Lea Schönherr, CISPA Helmholtz Center for Information Security;
(5) Mario Fritz, CISPA Helmholtz Center for Information Security.
This paper is