
Data Collection


Table of Links

Abstract and 1. Introduction

2. Methodology and 2.1. Research Questions

2.2. Data Collection

2.3. Data Labelling

2.4. Data Extraction

2.5. Data Analysis

3. Results and Interpretation and 3.1. Type of Problems (RQ1)

3.2. Type of Causes (RQ2)

3.3. Type of Solutions (RQ3)

4. Implications

4.1. Implications for the Copilot Users

4.2. Implications for the Copilot Team

4.3. Implications for Researchers

5. Threats to Validity

6. Related Work

6.1. Evaluating the Quality of Code Generated by Copilot

6.2. Copilot’s Impact on Practical Development and 6.3. Conclusive Summary

7. Conclusions, Data availability, Acknowledgments, CRediT authorship contribution statement and References

2.2. Data Collection

We collected data from three sources: GitHub Issues[1], GitHub Discussions[2], and Stack Overflow (SO) posts[3]. GitHub Issues is a commonly used GitHub feature for tracking bugs, feature requests, and other issues in software development projects, which allows us to capture the specific problems that users have encountered when coding with Copilot. GitHub Discussions is a GitHub feature for open-ended discussions among project contributors and community members, and it serves as a central hub for project-related discussions and knowledge sharing. Topics in GitHub Discussions range from technical questions and suggestions to usage issues associated with Copilot. Stack Overflow is a popular public Q&A platform that covers a broad spectrum of topics related to programming, development, and technology, including questions about using Copilot.


Because Copilot was announced and entered its technical preview on June 29, 2021, we collected only data created after that date. The data collection was conducted on June 18, 2023. To answer RQ3, i.e., the solutions for addressing Copilot problems, we collected closed GitHub issues, as well as answered GitHub discussions and SO posts. Specifically, for GitHub issues, we used “Copilot” as the keyword to search for closed Copilot-related issues across all of GitHub, which retrieved a total of 4,057 issues. We also used “Copilot” as the keyword to search answered posts on SO, which retrieved 679 posts. Note that we did not use the “Copilot” tag for retrieval because the keyword-based method yields a more exhaustive dataset. Unlike GitHub issues and SO posts, GitHub discussions are organized into specific subcategories, with “Copilot” included as a subcategory under the overarching “Product” category. Given the high relevance of these discussions to Copilot, we collected all 925 answered discussions under the “Copilot” subcategory.
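The issue search described above could be approximated with GitHub's REST search endpoint. The following is only a sketch under stated assumptions, not the authors' actual tooling: the text does not say how the search was run, so the choice of the `/search/issues` endpoint, the `is:issue is:closed created:>` qualifiers, and the helper names `github_closed_issue_query` and `github_search_url` are all illustrative. The code only builds the query and URL; it sends no requests.

```python
from datetime import date
from urllib.parse import urlencode

# Date stated in the text: start of the Copilot technical preview.
PREVIEW_START = date(2021, 6, 29)

# Retrieved item counts reported in the text, per source.
RETRIEVED = {
    "github_issues": 4057,
    "so_posts": 679,
    "github_discussions": 925,
}


def github_closed_issue_query(keyword: str = "Copilot") -> str:
    """Build a search query for closed issues created after the preview start.

    Assumption: GitHub's search qualifiers are used to express the
    "closed" and "created after" criteria described in the text.
    """
    return f"{keyword} is:issue is:closed created:>{PREVIEW_START.isoformat()}"


def github_search_url(query: str, page: int = 1, per_page: int = 100) -> str:
    """Build the REST URL for one page of issue search results (hypothetical helper)."""
    params = {"q": query, "per_page": per_page, "page": page}
    return "https://api.github.com/search/issues?" + urlencode(params)


if __name__ == "__main__":
    query = github_closed_issue_query()
    print(query)
    print(github_search_url(query))
    print("Total retrieved items:", sum(RETRIEVED.values()))
```

GitHub's search API returns at most 1,000 results per query, so reproducing a result set of several thousand issues this way would likely require splitting the `created:` range into smaller date windows and merging the pages.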


Authors:

(1) Xiyu Zhou, School of Computer Science, Wuhan University, Wuhan, China;

(2) Peng Liang (Corresponding Author), School of Computer Science, Wuhan University, Wuhan, China;

(3) Beiqi Zhang, School of Computer Science, Wuhan University, Wuhan, China;

(4) Zengyang Li, School of Computer Science, Central China Normal University, Wuhan, China;

(5) Aakash Ahmad, School of Computing and Communications, Lancaster University Leipzig, Leipzig, Germany;

(6) Mojtaba Shahin, School of Computing Technologies, RMIT University, Melbourne, Australia;

(7) Muhammad Waseem, Faculty of Information Technology, University of Jyväskylä, Jyväskylä, Finland.


This paper is available on arXiv under the CC BY 4.0 DEED license.

[1] https://docs.github.com/en/issues


[2] https://github.com/orgs/community/discussions/categories/copilot


[3] https://stackoverflow.com/
