Table of Links
2. Methodology and 2.1. Research Questions
3. Results and Interpretation and 3.1. Type of Problems (RQ1)
4. Implications
4.1. Implications for the Copilot Users
4.2. Implications for the Copilot Team
4.3. Implications for Researchers
6. Related Work
6.1. Evaluating the Quality of Code Generated by Copilot
6.2. Copilot’s Impact on Practical Development and 6.3. Conclusive Summary
6.2. Copilot’s Impact on Practical Development
Several studies have investigated the performance of Copilot in actual software development, as well as the opinions of software practitioners about it. Wang et al. (2023a) interviewed 15 practitioners and then surveyed 599 practitioners from 18 IT companies regarding their expectations of code completion; they found that 13% of the participants had used Copilot as their code completion tool. Jaworski and Piotrkowski (2023) designed a survey questionnaire of 18 questions to investigate developers' attitudes toward Copilot. Their findings indicate that most respondents held a positive attitude toward the tool, although a few participants expressed concerns about the security issues associated with using Copilot. Imai (2022) conducted experiments with 21 participants to compare pair programming with Copilot against pair programming with a human partner in terms of productivity and code quality. The results indicate that while Copilot increases productivity as measured by lines of code added, the generated code is of lower quality, since more lines of code had to be removed during subsequent testing.

Barke et al. (2023) observed 20 participants collaborating with Copilot on programming tasks in four languages, and found that interaction with the programming assistant is bimodal, falling into two distinct collaboration modes: developers either use Copilot to accelerate code they already intend to write or to explore possible solutions. Bird et al. (2023) conducted three studies aimed at understanding how developers utilize Copilot. Their findings suggest that developers spend a considerable amount of time assessing the suggestions generated by Copilot rather than writing code for their tasks. Peng et al. (2023) presented the results of a controlled experiment using Copilot as an AI pair programmer, and found that the experimental group with access to Copilot completed tasks 55.8% faster than the control group. Zhang et al. (2023) investigated the programming languages, IDEs, associated technologies, implemented functionalities, advantages, limitations, and challenges involved in using Copilot. Vaithilingam et al. (2022) conducted a user study with 24 participants to assess the usability of Copilot and its integration into the programming workflow. They found that while Copilot may not directly make programming tasks faster to complete, it serves as a valuable starting point for programmers, saving the time they would otherwise spend searching online.

Liang et al. (2024) surveyed software developers and found that the primary motivations for using AI programming assistants are reducing keystrokes, completing programming tasks quickly, and recalling syntax, whereas using these tools to help generate potential solutions was a much less important motivation. Gustavo et al. (2023) analyzed the use of Copilot for programming, compared it with earlier forms of programmer assistance, and explored potential challenges that could arise when applying LLMs to programming. Ziegler et al. (2024) assessed the impact of Copilot on user productivity through a case study that relates users' perceptions to empirical usage data. Their research highlights the aspects in which Copilot enhances users' coding productivity and how it achieves these improvements.
6.3. Conclusive Summary
Most prior studies used controlled experiments or surveys to evaluate the effectiveness of Copilot. Our research is instead grounded in the perspective of software developers, focusing on the real-world problems they encounter when using Copilot and exploring the underlying causes and viable solutions. By analyzing the study results, we aim to provide insights for Copilot users, the Copilot team, and researchers. In addition, we collected data from three popular software development platforms and forums, i.e., GitHub Issues, GitHub Discussions, and Stack Overflow (SO), to ensure the comprehensiveness of our dataset.
Authors:
(1) Xiyu Zhou, School of Computer Science, Wuhan University, Wuhan, China;
(2) Peng Liang (Corresponding Author), School of Computer Science, Wuhan University, Wuhan, China;
(3) Beiqi Zhang, School of Computer Science, Wuhan University, Wuhan, China;
(4) Zengyang Li, School of Computer Science, Central China Normal University, Wuhan, China;
(5) Aakash Ahmad, School of Computing and Communications, Lancaster University Leipzig, Leipzig, Germany;
(6) Mojtaba Shahin, School of Computing Technologies, RMIT University, Melbourne, Australia;
(7) Muhammad Waseem, Faculty of Information Technology, University of Jyväskylä, Jyväskylä, Finland.