Date: 2024-02-15

This paper is available on arXiv under CC 4.0 license.

Authors:

(1) Domenico Cotroneo, University of Naples Federico II, Naples, Italy;

(2) Alessio Foggia, University of Naples Federico II, Naples, Italy;

(3) Cristina Improta, University of Naples Federico II, Naples, Italy;

(4) Pietro Liguori, University of Naples Federico II, Naples, Italy;

(5) Roberto Natella, University of Naples Federico II, Naples, Italy.

Abstract & Introduction

Motivating Example

Proposed Method

Experimental Setup

Experimental Results

Related Work

Conclusion & References

7. Conclusion

In this paper, we addressed the problem of automatically assessing the correctness of code produced by AI code generators. We proposed a fully automated method, named ACCA, that uses symbolic execution to assess the correctness of security-oriented code without any human effort.

We used our method to evaluate the performance of four state-of-the-art code generators in generating offensive assembly code from natural language (NL) descriptions, and compared the results with human evaluation and with several baseline solutions, including state-of-the-art output similarity metrics and the well-known ChatGPT.

Our experiments showed that ACCA provides results almost equal to those of human evaluation, which is considered the gold standard in the field, and that it is the assessment solution most correlated with human evaluation. Moreover, the analysis of the computational cost revealed that assessing a code snippet takes ∼0.17 s on average, which, based on our experience, is lower than the average time human analysts need to manually inspect the code.

