Ensuring Quality and Compliance: Addressing Challenges with Copilot Usage

Written by textmodels | Published 2024/03/04
Tech Story Tags: github-copilot | ai-code | ai-code-generation | can-ai-code | ai-applications | software-development | copilot-usage-challenges | github-copilot-user-experience

TL;DR: This excerpt discusses the need to enhance Copilot's compatibility across IDEs, simplify its configuration, provide customization options for users, give users more control over content generation, and improve the quality of code suggestions. It also highlights the importance of code explanations and addresses concerns regarding intellectual property and copyright.

Authors:

(1) Xiyu Zhou, School of Computer Science, Wuhan University, Wuhan, China;

(2) Peng Liang, School of Computer Science, Wuhan University, Wuhan, China;

(3) Zengyang Li, School of Computer Science, Central China Normal University, Wuhan, China;

(4) Aakash Ahmad, School of Computing and Communications, Lancaster University Leipzig, Leipzig, Germany;

(5) Mojtaba Shahin, School of Computing Technologies, RMIT University, Melbourne, Australia;

(6) Muhammad Waseem, Faculty of Information Technology, University of Jyväskylä, Jyväskylä, Finland.

IV. DISCUSSION

Enhance compatibility across various IDEs and editors and simplify the configuration of Copilot. According to the results of RQ1 and RQ2, Compatibility Issue is the second-largest category, and Editor/IDE Compatibility Issue is the cause that leads to many Usage Issues. From the users' perspective, we also observed many discussions about the details of configuring and setting up Copilot, which makes Modify Configuration/Setting the second most frequently employed solution. Additionally, Improper Configuration/Setting is the fifth most common cause of issues. Based on these findings, we believe that enhancing compatibility and simplifying the configuration process of Copilot can significantly improve the user experience. Therefore, the Copilot team could offer more detailed installation and configuration guidelines, provide user-friendly configuration options, and perform regular updates and maintenance.
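As one concrete illustration of what user-friendly configuration options can look like, the sketch below shows a minimal VS Code settings.json fragment that turns Copilot on or off per file type. The setting names reflect current Copilot releases as we understand them and may differ across versions and editors; treat this as a sketch rather than authoritative documentation.

```jsonc
// settings.json (VS Code) — a minimal sketch; setting names may vary by Copilot version.
{
  // Enable Copilot by default, but switch it off where completions are rarely wanted.
  "github.copilot.enable": {
    "*": true,          // on for all languages by default
    "plaintext": false, // off in plain text files
    "markdown": false,  // off in Markdown
    "scminput": false   // off in the commit-message input box
  }
}
```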

Need for more customization options to allow users to tailor Copilot’s behavior to align with their own workflow. Among the 123 FUNCTION REQUESTS, we identified 52 instances of requests to customize Copilot’s behavior in various aspects, accounting for approximately 50%. Some common requests are to specify the file types or workspaces in which Copilot automatically runs (11), to modify the shortcut keys for accepting suggestions (10), to accept code suggestions line by line or word by word (9), to prevent Copilot from generating certain types of suggestions (e.g., file paths, comments) (3), and to configure text color and fonts (3). Zhang et al. [19] also indicated that allowing users to customize suggestions is essential. In addition, the POOR FUNCTIONALITY EXPERIENCE instances (e.g., perceiving Copilot’s auto-suggestions as distracting, which is also mentioned in the study by Bird et al. [20]) reveal the demand for customizing Copilot’s behavior. Based on these results, we believe that the extent to which Copilot’s behavior can adapt to users’ individual coding habits is a significant factor in their decision to use Copilot. Therefore, providing flexible and user-friendly customization options is highly beneficial. Furthermore, it is worthwhile to explore how AI coding tools should interact with users and how to integrate these tools into practical development.
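For instance, two of the most requested customizations, remapping the accept shortcut and accepting a suggestion word by word, can already be approximated in VS Code through keybindings.json. The command identifiers and context key below are assumptions based on recent VS Code releases and may change between versions; the sketch only illustrates the kind of customization users are asking for.

```jsonc
// keybindings.json (VS Code) — a sketch; command IDs may vary by editor version.
[
  {
    // Accept the whole inline suggestion with Ctrl+Enter instead of Tab.
    "key": "ctrl+enter",
    "command": "editor.action.inlineSuggest.commit",
    "when": "inlineSuggestionVisible"
  },
  {
    // Accept only the next word of the current suggestion.
    "key": "ctrl+right",
    "command": "editor.action.inlineSuggest.acceptNextWord",
    "when": "inlineSuggestionVisible"
  }
]
```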

Users need more ways to control the content generated by Copilot. From Table IV, it can be observed that the majority of solutions are aimed at addressing Usage Issue and Compatibility Issue, while there are few solutions for Suggestion Content Issue. Out of 69 Suggestion Content Issues, we identified only 5 solutions, indicating that users may find it challenging to provide ideal solutions for issues with the content suggested by Copilot. This is partly because users have limited ways to control Copilot's code generation beyond the code and code comments themselves. Therefore, additional methods are required to address Suggestion Content Issue, for instance, allowing developers to interact with Copilot and iterate on the generated code until it meets their expectations.
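Today, the main lever a user has is the prompt itself: function names, type hints, docstrings, and comments. The hypothetical Python snippet below sketches how a developer typically steers a suggestion by tightening that context and regenerating; the function body shown is illustrative, not an actual Copilot output.

```python
# A sketch of comment-driven steering; the "suggested" body is illustrative,
# not a real Copilot completion.
from typing import Iterable


def median_response_time(samples_ms: Iterable[float]) -> float:
    """Return the median of the given response times in milliseconds.

    Iteration 1 of the prompt said only "compute the average"; after the
    suggestion missed the intent, the docstring was rewritten to say
    "median" and to state the unit, and the suggestion was regenerated.
    """
    values = sorted(samples_ms)
    if not values:
        raise ValueError("samples_ms must not be empty")
    mid = len(values) // 2
    if len(values) % 2:
        return values[mid]
    return (values[mid - 1] + values[mid]) / 2.0


if __name__ == "__main__":
    print(median_response_time([120.0, 80.0, 95.0, 200.0]))  # 107.5
```

Steering through comments works, but it is indirect; richer interaction channels (e.g., iterating on a suggestion in a dialogue) would give users more direct control over what is generated.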

Improve the quality of Copilot-generated code. In Suggestion Content Issues, the predominant types are LOW QUALITY SUGGESTION (27) and NONSENSICAL SUGGESTION (13). The experiment by Imai et al. [9] found that, in comparison to human pair programming, Copilot, while capable of generating a significant amount of code, also led to more code deletions during testing, highlighting the need for improvement in Copilot’s code quality. Bird et al. [20] observed that Copilot occasionally offers peculiar and nonsensical code suggestions, as reported by users, some of which may include personal information. Furthermore, although INSECURE SUGGESTION and LESS EFFECTIVE SUGGESTION each have only two instances, we believe this is primarily because users have difficulty detecting issues of these kinds and are less inclined to report them. Pearce et al. [6] found that out of the 1,689 code snippets generated by Copilot, 40% were vulnerable. Given the successive iterations of Copilot, it becomes imperative to conduct regular assessments of the quality of its suggestions.
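As a concrete illustration of the kind of INSECURE SUGGESTION that is easy to overlook, the sketch below contrasts a string-interpolated SQL query, a pattern the literature on generated code frequently flags as injection-prone, with a parameterized alternative. The example is our own illustration, not an actual Copilot suggestion.

```python
# Illustrative only: a pattern often flagged as vulnerable versus a safer rewrite.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT, email TEXT)")
conn.execute("INSERT INTO users VALUES ('alice', 'alice@example.com')")


def find_user_vulnerable(name: str):
    # The kind of completion a user might accept without noticing the risk:
    # untrusted input is interpolated directly into the SQL string.
    query = f"SELECT email FROM users WHERE name = '{name}'"
    return conn.execute(query).fetchall()


def find_user_safe(name: str):
    # Parameterized query: the driver escapes the value, preventing injection.
    return conn.execute("SELECT email FROM users WHERE name = ?", (name,)).fetchall()


if __name__ == "__main__":
    print(find_user_safe("alice"))                   # [('alice@example.com',)]
    print(find_user_vulnerable("alice' OR '1'='1"))  # returns every row
```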

The use of Copilot alters the coding process and increases the time cost of verifying code suggestions, making code explanations highly important. In our research, INCOMPREHENSIBLE SUGGESTION (8) ranks as the fourth most common Suggestion Content Issue. Some users mentioned issues with code suggestions being excessively long, which reduced readability. This indicates that when Copilot provides relatively complex suggestions, or when users lack coding experience in a particular domain, understanding the code logic and verifying its correctness can be time-consuming. The study by Wang et al. [21] shows that using AI-generated code can lead to significant review pressure. Therefore, we believe that AI coding tools (e.g., Copilot) will change the allocation of time spent on various tasks in software development. We observed four feature requests related to code explanation, and the Copilot team places significant emphasis on expanding this functionality. The chat feature [15] introduced in Copilot X is already capable of providing detailed code explanations, though its accuracy requires further experimental evaluation.
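To make the verification cost concrete, the sketch below contrasts a compressed one-liner of the kind users described as hard to read with an equivalent, easier-to-review version. Both implementations are ours and serve only to illustrate why dense suggestions take longer to verify.

```python
# Illustrative: a compact suggestion versus a version that is easier to verify.

def top_error_codes_dense(logs: list[str], n: int = 3) -> list[str]:
    # The kind of one-liner a completion might offer: correct, but slow to verify.
    return [c for c, _ in sorted({c: sum(1 for l in logs if l.startswith(c)) for c in {l.split()[0] for l in logs}}.items(), key=lambda kv: -kv[1])[:n]]


def top_error_codes_readable(logs: list[str], n: int = 3) -> list[str]:
    # Same behaviour, spelled out step by step.
    counts: dict[str, int] = {}
    for line in logs:
        code = line.split()[0]
        counts[code] = counts.get(code, 0) + 1
    ranked = sorted(counts, key=lambda code: counts[code], reverse=True)
    return ranked[:n]


if __name__ == "__main__":
    logs = ["500 upstream timeout", "404 not found", "500 db error", "503 overloaded"]
    print(top_error_codes_dense(logs))     # e.g. ['500', '404', '503']
    print(top_error_codes_readable(logs))  # e.g. ['500', '404', '503']
```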

Consider intellectual property and copyright. The number of Copyright and Policy Issues is slightly higher than we expected, and we observed many concerns from both users and GitHub repository owners during the data extraction process. Bird et al. [20] also noticed discussions about how copyright applies to Copilot’s code suggestions. The goal of our research is not to evaluate such issues or Copilot’s non-open-source nature, as this is a complex problem that depends on various factors such as the goal, target users, and business model of Copilot. However, we contend that the Copilot team can take measures to address these problems, providing stable and high-quality code generation services while protecting user privacy and intellectual property.

This paper is available on arXiv under a CC 4.0 license.
