Table of Links
2. Methodology and 2.1. Research Questions
3. Results and Interpretation and 3.1. Type of Problems (RQ1)
4. Implications
4.1. Implications for the Copilot Users
4.2. Implications for the Copilot Team
4.3. Implications for Researchers
6. Related Work
6.1. Evaluating the Quality of Code Generated by Copilot
6.2. Copilot’s Impact on Practical Development and 6.3. Conclusive Summary
2.2. Data Collection
We collected data from three sources: GitHub Issues[1], GitHub Discussions[2], and Stack Overflow (SO) posts[3]. GitHub Issues is a widely used GitHub feature for tracking bugs and feature requests and for reporting other problems in software development projects. It allows us to capture the specific problems that users have encountered when coding with Copilot. GitHub Discussions is a GitHub feature for open-ended conversations among project contributors and community members, and it serves as a central hub for project-related discussion and knowledge sharing. Topics in GitHub Discussions range from technical questions and suggestions to usage issues associated with Copilot. Stack Overflow is a popular technology community that provides a public Q&A platform covering a broad spectrum of programming, development, and technology topics, including questions about using Copilot.
Since Copilot was announced and entered its technical preview on June 29, 2021, we collected only data created after that date. Data collection was conducted on June 18, 2023. To answer RQ3, i.e., to identify solutions to Copilot problems, we collected closed GitHub issues, as well as answered GitHub discussions and SO posts. Specifically, for GitHub issues, we used “Copilot” as the keyword to search for closed Copilot-related issues across all of GitHub, which retrieved a total of 4,057 issues. We also used “Copilot” as the keyword to search for answered posts on SO, which retrieved 679 posts. Note that we did not retrieve SO posts by the “Copilot” tag, because keyword-based search allows us to obtain a more exhaustive dataset. Unlike GitHub issues and SO posts, GitHub discussions are organized into subcategories, with “Copilot” being a subcategory of the overarching “Product” category. Given the high relevance of these discussions to Copilot, we collected all 925 answered discussions in the “Copilot” subcategory.
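The inclusion criteria above can be summarized as a small filter. The sketch below is illustrative only and is not the authors' tooling: the `Post` record, its field names, and the helpers `is_included` and `github_issue_query` are hypothetical, and it assumes each retrieved item has already been normalized into a common record. The two dates and the three retrieval counts come from the text; the query string uses GitHub's documented search qualifiers (`is:issue`, `is:closed`, `created:`), but the exact query the authors ran is not stated.

```python
from dataclasses import dataclass
from datetime import date

# Dates and counts stated in the text; all names below are hypothetical.
PREVIEW_DATE = date(2021, 6, 29)     # Copilot technical preview announced
COLLECTION_DATE = date(2023, 6, 18)  # day the data were collected
RETRIEVED = {"issue": 4057, "so": 679, "discussion": 925}


@dataclass
class Post:
    """Hypothetical normalized record for one retrieved item."""
    source: str          # "issue", "discussion", or "so"
    created: date
    resolved: bool       # closed (issues) or answered (discussions, SO posts)
    text: str            # title + body
    category: str = ""   # only meaningful for GitHub discussions


def is_included(post: Post) -> bool:
    """Apply the Section 2.2 inclusion criteria (illustrative sketch)."""
    # Keep only items created after the preview date, up to the collection day.
    if not (PREVIEW_DATE < post.created <= COLLECTION_DATE):
        return False
    # RQ3 needs solutions: keep closed issues and answered discussions/posts.
    if not post.resolved:
        return False
    if post.source == "discussion":
        # Discussions are selected by subcategory, not by keyword.
        return post.category == "Copilot"
    # Issues and SO posts are selected by keyword rather than by tag.
    return "copilot" in post.text.lower()


def github_issue_query(keyword: str = "Copilot") -> str:
    """Build a GitHub search string using documented qualifiers (assumed form)."""
    return f"{keyword} is:issue is:closed created:>{PREVIEW_DATE.isoformat()}"


posts = [
    Post("issue", date(2022, 3, 1), True, "Copilot stops suggesting in VS Code"),
    Post("so", date(2021, 5, 1), True, "How to enable Copilot?"),  # predates preview
    Post("discussion", date(2023, 1, 10), True, "Login loop", category="Copilot"),
]
print([is_included(p) for p in posts])  # [True, False, True]
print(github_issue_query())             # Copilot is:issue is:closed created:>2021-06-29
print(sum(RETRIEVED.values()))          # 5661 items retrieved across the three sources
```

Encoding discussions by subcategory and the other two sources by keyword mirrors the asymmetry described in the text; a real pipeline would also need per-platform retrieval and deduplication, which this sketch omits.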
Authors:
(1) Xiyu Zhou, School of Computer Science, Wuhan University, Wuhan, China;
(2) Peng Liang (Corresponding Author), School of Computer Science, Wuhan University, Wuhan, China;
(3) Beiqi Zhang, School of Computer Science, Wuhan University, Wuhan, China;
(4) Zengyang Li, School of Computer Science, Central China Normal University, Wuhan, China;
(5) Aakash Ahmad, School of Computing and Communications, Lancaster University Leipzig, Leipzig, Germany;
(6) Mojtaba Shahin, School of Computing Technologies, RMIT University, Melbourne, Australia;
(7) Muhammad Waseem, Faculty of Information Technology, University of Jyväskylä, Jyväskylä, Finland.
This paper is
[1] https://docs.github.com/en/issues
[2] https://github.com/orgs/community/discussions/categories/copilot