Authors:
(1) Vahid Majdinasab, Department of Computer and Software Engineering Polytechnique Montreal, Canada;
(2) Michael Joshua Bishop, School of Mathematical and Computational Sciences Massey University, New Zealand;
(3) Shawn Rasheed, Information & Communication Technology Group UCOL - Te Pukenga, New Zealand;
(4) Arghavan Moradidakhel, Department of Computer and Software Engineering Polytechnique Montreal, Canada;
(5) Amjed Tahir, School of Mathematical and Computational Sciences Massey University, New Zealand;
(6) Foutse Khomh, Department of Computer and Software Engineering Polytechnique Montreal, Canada.
VII. CONCLUSION
This study aimed to replicate the work of Pearce et al. [14], which uncovered several security weaknesses in code suggestions generated by GitHub Copilot. The replication focused on Python code and used the same baseline of weaknesses (MITRE's top CWEs) to create the code generation prompts, covering a variety of weaknesses and scenarios. After the publication of [14], GitHub announced an upgrade to Copilot aimed at filtering out suggestions that contain top CWEs. Despite these improvements, our results show that Copilot continues to propose vulnerable suggestions in various scenarios. In particular, for four of the CWEs tested (CWE-78 (OS Command Injection), CWE-434 (Unrestricted File Upload), CWE-306 (Missing Authentication for Critical Function), and CWE-502 (Deserialization of Untrusted Data)), Copilot's suggestions still exhibit vulnerabilities.
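To make the first of these weaknesses concrete, the sketch below contrasts the shell-string pattern behind CWE-78 with an argument-list alternative. It is illustrative only: it is not one of the study's prompts or a Copilot output. The function names (`build_ping_command`, `ping_once`), the hostname allow-list, and the Unix-style `ping -c 1` invocation are all hypothetical choices made for this example.

```python
import re
import subprocess

# Hypothetical allow-list for this sketch: letters, digits, dots and hyphens,
# which covers ordinary DNS hostnames and IPv4 addresses.
_HOST_PATTERN = re.compile(r"[A-Za-z0-9.-]{1,253}")


def build_ping_command(host: str) -> list[str]:
    """Return an argv list that pings `host` once (Unix-style ping assumed).

    Raises ValueError for anything outside the allow-list, and for values
    starting with '-', which ping would otherwise parse as an option.
    """
    if not _HOST_PATTERN.fullmatch(host) or host.startswith("-"):
        raise ValueError(f"invalid host: {host!r}")
    return ["ping", "-c", "1", host]


def ping_once(host: str) -> subprocess.CompletedProcess:
    # Vulnerable pattern (CWE-78), shown for contrast only:
    #     subprocess.run(f"ping -c 1 {host}", shell=True)
    # A host such as "example.com; rm -rf ~" would run a second command.
    # Here the argv list goes straight to the OS (shell=False is the default),
    # so shell metacharacters in `host` are never interpreted.
    return subprocess.run(build_ping_command(host), capture_output=True, text=True)
```

Passing an argument list keeps `host` out of any shell parsing; the allow-list is an extra, assumed layer of defense against option injection (a value such as `-f` being read as a flag).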
Our results highlight the importance of developers continuously checking the security of code generated by such models, through rigorous security code reviews and the use of security analysis tools. GitHub makes this recommendation explicitly: “You are responsible for ensuring the security and quality of your code. We recommend you take the same precautions when using code generated by GitHub Copilot that you would when using any code you didn’t write yourself.” [29].
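As a rough illustration of what a security analysis tool automates, the following toy scanner uses Python's standard `ast` module to flag a few call patterns associated with CWE-502 and CWE-78. It is a hypothetical sketch, not the analysis used in this study. The rule table `RISKY_CALLS` is an assumption, and the scanner misses aliased imports (e.g. `from pickle import loads`) and data-flow cases that dedicated tools such as Bandit or CodeQL are designed to handle.

```python
import ast
from typing import List, Optional, Tuple

# Hypothetical rule set for this sketch: dotted call name -> associated CWE.
RISKY_CALLS = {
    "pickle.loads": "CWE-502",
    "pickle.load": "CWE-502",
    "os.system": "CWE-78",
}


def _dotted_name(node: ast.AST) -> Optional[str]:
    """Return 'a.b.c' for Name/Attribute chains, otherwise None."""
    if isinstance(node, ast.Name):
        return node.id
    if isinstance(node, ast.Attribute):
        base = _dotted_name(node.value)
        return f"{base}.{node.attr}" if base else None
    return None


def scan(source: str) -> List[Tuple[int, str, str]]:
    """Return sorted (line, call, CWE) findings for a few risky call patterns."""
    findings = []
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.Call):
            continue
        name = _dotted_name(node.func)
        if name in RISKY_CALLS:
            findings.append((node.lineno, name, RISKY_CALLS[name]))
        # subprocess.* with a literal shell=True hands a string to the shell.
        if name and name.startswith("subprocess.") and any(
            kw.arg == "shell"
            and isinstance(kw.value, ast.Constant)
            and kw.value.value is True
            for kw in node.keywords
        ):
            findings.append((node.lineno, name, "CWE-78"))
    # ast.walk is breadth-first, so sort to report findings in source order.
    return sorted(findings)


if __name__ == "__main__":
    sample = "import pickle\nobj = pickle.loads(payload)\n"
    for line, call, cwe in scan(sample):
        print(f"line {line}: {call} ({cwe})")
```

The scanner only parses the code and never executes it, which is also how real static analyzers can be applied safely to untrusted suggestions.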
Security issues in generated code, especially code produced by LLMs, will continue to affect the quality of code generation tools and may therefore reduce developers' trust in them. Continued investigation of these issues is important, as both the underlying code generation models and the nature of weaknesses evolve rapidly.
While some work has examined Copilot's security, little has been done for other code generation tools (especially those built on similar LLMs). We expect these tools to face similar security challenges, which will require further investigation.
VIII. ACKNOWLEDGEMENTS
This work is partially supported by Massey University SREF funding, the Fonds de recherche du Québec (FRQ), the Canadian Institute for Advanced Research (CIFAR), and the Natural Sciences and Engineering Research Council of Canada (NSERC).
REFERENCES
[1] M. Chen, J. Tworek, H. Jun, Q. Yuan, H. P. d. O. Pinto, J. Kaplan, H. Edwards, Y. Burda, N. Joseph, G. Brockman, et al., “Evaluating large language models trained on code,” arXiv preprint arXiv:2107.03374, 2021.
[2] T. Dohmke, “The economic impact of the AI-powered developer lifecycle and lessons from GitHub Copilot - The GitHub Blog.” https://github.blog/2023-06-27-the-economic-impact-of-the-ai-powered-developer-lifecycle-and-lessons-from-github-copilot/?ref=blog.gitguardian.com, June 2023. (Accessed on 10/10/2023).
[3] S. Lertbanjongngam, B. Chinthanet, T. Ishio, R. G. Kula, P. Leelaprute, B. Manaskasemsak, A. Rungsawang, and K. Matsumoto, “An Empirical Evaluation of Competitive Programming AI: A Case Study of AlphaCode,” in Proceedings of the 16th IEEE International Workshop on Software Clones (IWSC), pp. 10–15, IEEE, 2022.
[4] D. Wong, A. Kothig, and P. Lam, “Exploring the Verifiability of Code Generated by GitHub Copilot,” arXiv preprint arXiv:2209.01766, 2022.
[5] A. M. Dakhel, V. Majdinasab, A. Nikanjam, F. Khomh, M. C. Desmarais, and Z. M. J. Jiang, “Github copilot AI pair programmer: Asset or liability?,” Journal of Systems and Software, vol. 203, p. 111734, 2023.
[6] R. Pudari and N. A. Ernst, “From Copilot to Pilot: Towards AI Supported Software Development,” arXiv preprint arXiv:2303.04142, 2023.
[7] M. L. Siddiq and J. C. Santos, “SecurityEval dataset: mining vulnerability examples to evaluate machine learning-based code generation techniques,” in Proceedings of the 1st International Workshop on Mining Software Repositories Applications for Privacy and Security (MSR4P&S), pp. 29–33, ACM, 2022.
[8] J. He and M. Vechev, “Controlling large language models to generate secure and vulnerable code,” arXiv preprint arXiv:2302.05319, 2023.
[9] B. Yetiştiren, I. Özsoy, M. Ayerdem, and E. Tüzün, “Evaluating the Code Quality of AI-Assisted Code Generation Tools: An Empirical Study on GitHub Copilot, Amazon CodeWhisperer, and ChatGPT,” arXiv preprint arXiv:2304.10778, 2023.
[10] O. Asare, M. Nagappan, and N. Asokan, “Is GitHub’s Copilot as bad as humans at introducing vulnerabilities in code?,” Empirical Software Engineering, vol. 28, no. 6, pp. 1–24, 2023.
[11] M. Verdi, A. Sami, J. Akhondali, F. Khomh, G. Uddin, and A. K. Motlagh, “An Empirical Study of C++ Vulnerabilities in Crowd-sourced Code Examples,” IEEE Transactions on Software Engineering, vol. 48, no. 5, pp. 1497–1514, 2020.
[12] T. Brown, B. Mann, N. Ryder, M. Subbiah, J. D. Kaplan, P. Dhariwal, A. Neelakantan, P. Shyam, G. Sastry, A. Askell, et al., “Language models are few-shot learners,” Advances in neural information processing systems, vol. 33, pp. 1877–1901, 2020.
[13] M. O. F. Rokon, R. Islam, A. Darki, E. E. Papalexakis, and M. Faloutsos, “SourceFinder: Finding malware Source-Code from publicly available repositories in GitHub,” in 23rd International Symposium on Research in Attacks, Intrusions and Defenses (RAID 2020), pp. 149–163, 2020.
[14] H. Pearce, B. Ahmad, B. Tan, B. Dolan-Gavitt, and R. Karri, “Asleep at the Keyboard? Assessing the Security of GitHub Copilot’s Code Contributions,” in 2022 IEEE Symposium on Security and Privacy (SP), pp. 754–768, 2022.
[15] Y. Fu, P. Liang, A. Tahir, Z. Li, M. Shahin, and J. Yu, “Security Weaknesses of Copilot Generated Code in GitHub,” arXiv preprint arXiv:2310.02059, 2023.
[16] Y. Huang, Y. Li, W. Wu, J. Zhang, and M. R. Lyu, “Do Not Give Away My Secrets: Uncovering the Privacy Issue of Neural Code Completion Tools,” arXiv preprint arXiv:2309.07639, 2023.
[17] GitGuardian, “The State of Secrets Sprawl Report 2023.” https://www.gitguardian.com/files/the-state-of-secrets-sprawl-report-2023?ref=blog.gitguardian.com, 2023.
[18] The MITRE Corporation (MITRE), “CWE List Version 4.12.” https://cwe.mitre.org/data/index.html, 2023.
[19] S. Zhao, “GitHub Copilot now has a better AI model and new capabilities - The GitHub Blog.” https://github.blog/2023-02-14-github-copilot-now-has-a-better-ai-model-and-new-capabilities/. (Accessed on 12/11/2023).
[20] R. Khoury, A. R. Avila, J. Brunelle, and B. M. Camara, “How Secure is Code Generated by ChatGPT?,” arXiv preprint arXiv:2304.09655, 2023.
[21] H. Hajipour, T. Holz, L. Schönherr, and M. Fritz, “Systematically Finding Security Vulnerabilities in Black-Box Code Generation Models,” arXiv preprint arXiv:2302.04012, 2023.
[22] O. Asare, M. Nagappan, and N. Asokan, “Is GitHub’s Copilot as bad as humans at introducing vulnerabilities in code?,” arXiv preprint arXiv:2204.04741, 2022.
[23] J. Shi, Y. Liu, P. Zhou, and L. Sun, “BadGPT: Exploring Security Vulnerabilities of ChatGPT via Backdoor Attacks to InstructGPT,” arXiv preprint arXiv:2304.12298, 2023.
[24] E. Derner and K. Batistič, “Beyond the Safeguards: Exploring the Security Risks of ChatGPT,” arXiv preprint arXiv:2305.08005, 2023.
[25] K. R. Go, S. Soundarapandian, A. Mitra, M. Vidoni, and N. E. D. Ferreyra, “Simple stupid insecure practices and GitHub’s code search: A looming threat?,” Journal of Systems and Software, vol. 202, p. 111698, 2023.
[26] N. Perry, M. Srivastava, D. Kumar, and D. Boneh, “Do users write more insecure code with AI assistants?,” arXiv preprint arXiv:2211.03622, 2022.
[27] X. Huang, W. Ruan, W. Huang, G. Jin, Y. Dong, C. Wu, S. Bensalem, R. Mu, Y. Qi, X. Zhao, et al., “A Survey of Safety and Trustworthiness of Large Language Models through the Lens of Verification and Validation,” arXiv preprint arXiv:2305.11391, 2023.
[28] N. Nguyen and S. Nadi, “An Empirical Evaluation of GitHub Copilot’s Code Suggestions,” in Proceedings of the 19th International Conference on Mining Software Repositories, pp. 1–5, 2022.
[29] GitHub, “GitHub Copilot for Individuals.” https://docs.github.com/en/copilot/overview-of-github-copilot/about-github-copilot-for-individuals, 2023. (Accessed on 02/11/2023).
This paper is available on arXiv under a CC 4.0 license.
