Pair Programming Experiment: Threats to Conclusion Validity

Written by pairprogramming | Published 2025/08/22

TL;DR: This article outlines the threats to conclusion validity in a pair programming study, including issues with statistical power, variance homogeneity, and unreliable measures.

Table of Links

Abstract and 1. Introduction

2. Experiment Definition

3. Experiment Design and Conduct

3.1 Latin Square Designs

3.2 Subjects, Tasks and Objects

3.3 Conduct

3.4 Measures

4. Data Analysis

4.1 Model Assumptions

4.2 Analysis of Variance (ANOVA)

4.3 Treatment Comparisons

4.4 Effect Size and Power Analysis

5. Experiment Limitations and 5.1 Threats to Conclusion Validity

5.2 Threats to Internal Validity

5.3 Threats to Construct Validity

5.4 Threats to External Validity

6. Discussion and 6.1 Duration

6.2 Effort

7. Conclusions and Further Work, and References

5. Experiment Limitations

Experiments are subject to concerns regarding validity. In this section we discuss the limitations of the experiment based on the four categories of threats to validity described in [11]. Each category includes several threats that can negatively impact the experiment results. We list both the threats that may have affected this experiment and suggestions for improvements in future versions of it.

5.1 Threats to Conclusion Validity

These threats concern issues that affect our ability to draw correct conclusions about the existence of a relationship between the treatment and the outcome. Next, we describe the threats in this category that may have affected our experiment.

Although the experiment achieved a moderate statistical power of 50%, the results may still have been affected by low statistical power. With the aim of increasing the power to 80%, we will perform an a priori power analysis to estimate the sample size needed before conducting replications of this experiment.
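
As an illustration, below is a minimal sketch of such a sample-size calculation in Python with statsmodels; the effect size used (Cohen's f = 0.25, a conventional medium effect) is an assumed placeholder, not the effect size estimated in this experiment.

```python
# A-priori power analysis for a two-group (solo vs. pair) ANOVA comparison.
# The effect size below is an assumed medium effect (Cohen's f = 0.25),
# not the value estimated from this experiment's data.
from statsmodels.stats.power import FTestAnovaPower

analysis = FTestAnovaPower()

n_total = analysis.solve_power(
    effect_size=0.25,  # assumed Cohen's f; replace with the observed effect size
    alpha=0.05,        # significance level
    power=0.80,        # target statistical power
    k_groups=2,        # treatments: solo programming vs. pair programming
)

print(f"Total number of subjects required: {n_total:.0f}")
```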

Regarding the assumptions of the statistical tests, although the experiment data satisfy the independence and normality assumptions, the results may have been affected by a lack of variance homogeneity. We identified the program as a source of variation. With the aim of reducing variance heterogeneity, in future replications we will use programs of similar complexity.
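
For reference, homogeneity of variance across programs can be checked with Levene's test. The sketch below assumes Python with SciPy and uses made-up duration values grouped by program, not data from this experiment.

```python
# Levene's test for homogeneity of variance across programs.
# The duration values (in minutes) are hypothetical placeholders.
from scipy.stats import levene

program_a = [52, 61, 58, 70, 66]
program_b = [95, 120, 88, 132, 101]

stat, p_value = levene(program_a, program_b, center="median")

# A p-value below 0.05 suggests the variances differ across programs,
# i.e., the homogeneity assumption is violated.
print(f"Levene statistic = {stat:.3f}, p-value = {p_value:.3f}")
```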

Another threat that might have affected conclusion validity concerns the reliability of measures. Although all measures were collected during the second session, some measures for solo programmers were not collected during the first session due to time constraints. In that session, subjects who did not finish on time were asked to record their completion time at home. To avoid this threat in future replications, we will manage the session time more carefully.

Authors:

(1) Omar S. Gómez, full-time professor of Software Engineering at the Faculty of Mathematics of the Autonomous University of Yucatan (UADY);

(2) José L. Batún, full-time professor of Statistics at the Faculty of Mathematics of the Autonomous University of Yucatan (UADY);

(3) Raúl A. Aguilar, Faculty of Mathematics, Autonomous University of Yucatan, Merida, Yucatan 97119, Mexico.


This paper is available on arXiv under the CC BY-NC-ND 4.0 DEED license.

