
Effect of Sampling Temperature


Table of Links

Abstract and I. Introduction

II. Related Work

III. Technical Background

IV. Systematic Security Vulnerability Discovery of Code Generation Models

V. Experiments

VI. Discussion

VII. Conclusion, Acknowledgments, and References


Appendix

A. Details of Code Language Models

B. Finding Security Vulnerabilities in GitHub Copilot

C. Other Baselines Using ChatGPT

D. Effect of Different Number of Few-shot Examples

E. Effectiveness in Generating Specific Vulnerabilities for C Codes

F. Security Vulnerability Results after Fuzzy Code Deduplication

G. Detailed Results of Transferability of the Generated Nonsecure Prompts

H. Details of Generating non-secure prompts Dataset

I. Detailed Results of Evaluating CodeLMs using Non-secure Dataset

J. Effect of Sampling Temperature

K. Effectiveness of the Model Inversion Scheme in Reconstructing the Vulnerable Codes

L. Qualitative Examples Generated by CodeGen and ChatGPT

M. Qualitative Examples Generated by GitHub Copilot

J. Effect of Sampling Temperature

Figure 8 provides detailed results on the effect of different sampling temperatures on generating non-secure prompts and vulnerable code. We conduct this evaluation using our FS-Code method and sample the non-secure prompts and Python code from the CodeGen model. We report the total number of generated vulnerable codes across three CWEs (CWE-020, CWE-022, and CWE-079), sampling 125 code samples for each CWE. The y-axis refers to the sampling temperature used to sample the non-secure prompts, and the x-axis refers to the sampling temperature of the code generation procedure. The results in Figure 8 show that, in general, the sampling temperature of the non-secure prompts has a significant effect on generating vulnerable codes, while the sampling temperature of the codes has only a minor impact (within each row, the number of vulnerable codes varies little). Furthermore, Figure 8 shows that 0.6 is an optimal temperature for sampling the non-secure prompts. Note that in all of our experiments, following previous work in the program generation domain [6], [5], we set the sampling temperature of both the non-secure prompts and the codes to 0.6 to obtain fair results.
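To make the two-stage sampling setup concrete, the sketch below shows how such a temperature sweep could look with the Hugging Face transformers API: one temperature for sampling candidate non-secure prompts and another for sampling code completions from them. The CodeGen checkpoint name, seed prompt, sample counts, and top-p value are illustrative assumptions, not the paper's exact configuration.

```python
# Minimal sketch of a two-stage temperature sweep (prompt temperature vs.
# code temperature), assuming a small CodeGen checkpoint from the Hub.
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_NAME = "Salesforce/codegen-350M-mono"  # assumed checkpoint for illustration
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForCausalLM.from_pretrained(MODEL_NAME)


def sample_completions(prompt: str, temperature: float, n: int = 2):
    """Sample `n` completions of `prompt` at the given temperature."""
    inputs = tokenizer(prompt, return_tensors="pt")
    outputs = model.generate(
        **inputs,
        do_sample=True,
        temperature=temperature,
        top_p=0.95,               # assumed nucleus-sampling value
        max_new_tokens=128,
        num_return_sequences=n,
        pad_token_id=tokenizer.eos_token_id,
    )
    return [tokenizer.decode(o, skip_special_tokens=True) for o in outputs]


# Sweep both temperatures, as in the grid of Figure 8.
prompt_temperatures = [0.2, 0.4, 0.6, 0.8, 1.0]
code_temperatures = [0.2, 0.4, 0.6, 0.8, 1.0]
seed_prompt = "# Read a filename from the request and return its contents\n"  # illustrative

for t_prompt in prompt_temperatures:
    # Stage 1: sample candidate non-secure prompts at t_prompt.
    candidate_prompts = sample_completions(seed_prompt, t_prompt, n=2)
    for t_code in code_temperatures:
        for p in candidate_prompts:
            # Stage 2: sample code completions at t_code; in the paper's
            # pipeline the generated code would then be checked for CWEs
            # with a static analyzer such as CodeQL.
            completions = sample_completions(p, t_code, n=2)
```

In this setup, each cell of the Figure 8 grid corresponds to one (prompt temperature, code temperature) pair, and the reported value is the number of generated samples flagged as vulnerable.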


Authors:

(1) Hossein Hajipour, CISPA Helmholtz Center for Information Security ([email protected]);

(2) Keno Hassler, CISPA Helmholtz Center for Information Security ([email protected]);

(3) Thorsten Holz, CISPA Helmholtz Center for Information Security ([email protected]);

(4) Lea Schönherr, CISPA Helmholtz Center for Information Security ([email protected]);

(5) Mario Fritz, CISPA Helmholtz Center for Information Security ([email protected]).


This paper is available on arxiv under CC BY-NC-SA 4.0 DEED license.

