Table of Links
2. Contexts, Methods, and Tasks
3.1. Quality and 3.2 Productivity
5. Discussion and Future Work
5.1. LLM, Your pAIr Programmer?
5.2. LLM, A Better pAIr Programmer?
5.3. LLM, Students’ pAIr Programmer?
6. Conclusion, Acknowledgments, and References
3 MIXED OUTCOMES
Literature reviews of human-human pair programming have reported various benefits as well as mixed effects. In the industry context, according to Alves De Lima Salge and Berente [5], pair programming improves code quality, increases productivity, and enhances learning outcomes. However, according to Hannay et al. [31], pair programming improves quality and shortens duration but increases effort: higher quality comes at the expense of considerably greater effort, and reduced completion time comes with lower quality. In the education context, pair programming brings benefits including higher-quality software, greater student confidence in solutions, and improved assignment grades, exam scores, success/passing rates in introductory courses, and retention [29, 52, 83]. All the reviews of human-human pair programming acknowledge that even though a meta-analysis can show an overall trend and a significant effect size, individual studies may report contradictory outcomes (see examples in Table 1).
For human-AI pair programming, existing work focuses mainly on quality, productivity, and satisfaction, and has already demonstrated mixed results in quality and productivity [8, 35, 84] (see examples in Table 1). Additionally, there is not yet enough research for a comprehensive review, so we cannot draw any conclusion about the effectiveness of human-AI pair programming. It is also hard to compare the human-human and human-AI pair programming literature, as the two differ in the outcomes and measurements they adopt.
Therefore, in the top rows of Table 1, we list the most common outcome variables in both bodies of literature (quality, productivity, satisfaction, learning, and cost), along with sample works that demonstrate the mixed outcomes and the variety of measures. We elaborate below on the different ways some of the listed outcomes are measured.
Authors:
(1) Qianou Ma (Corresponding author), Carnegie Mellon University, Pittsburgh, USA;
(2) Tongshuang Wu, Carnegie Mellon University, Pittsburgh, USA;
(3) Kenneth Koedinger, Carnegie Mellon University, Pittsburgh, USA.
This paper is