Effect Size and Power Analysis: A Guide for Software Engineers

Written by pairprogramming | Published 2025/08/22
Tech Story Tags: pair-programming | pair-versus-solo-programming | software-engineering | design-of-experiments | latin-square-design | programming-efficiency | cohen's-d | power-analysis

TL;DR: This article explains how to calculate effect size using Cohen's d and perform a power analysis to assess the sensitivity and significance of your experiment's results.

Table of Links

Abstract and 1. Introduction

2. Experiment Definition

3. Experiment Design and Conduct

3.1 Latin Square Designs

3.2 Subjects, Tasks and Objects

3.3 Conduct

3.4 Measures

4. Data Analysis

4.1 Model Assumptions

4.2 Analysis of Variance (ANOVA)

4.3 Treatment Comparisons

4.4 Effect Size and Power Analysis

5. Experiment Limitations and 5.1 Threats to the Conclusion Validity

5.2 Threats to Internal Validity

5.3 Threats to Construct Validity

5.4 Threats to External Validity

6. Discussion and 6.1 Duration

6.2 Effort

7. Conclusions and Further Work, and References

4.4 Effect Size and Power Analysis

Effect size is a measure that quantifies the difference between two groups of data; it is commonly used to indicate the magnitude of a treatment effect. Using the function defined in equation (2) [5], we calculate Cohen's d coefficient [10]. This coefficient serves as an effect size estimate for the comparison between two means (in this case, solo and pair programming). According to Cohen [10], a d value between 0.2 and 0.3 represents a small effect, a value around 0.5 a medium effect, and a value of 0.8 or larger a large effect.

Using the F-value of 2.9843 from the first ANOVA (Table 6), we obtain an effect size d of 0.6529; for the F-value of 2.8953 from the second ANOVA (Table 7), we obtain an effect size d of 0.6431. According to Cohen's classification, both are medium effects. The first effect size works against solo programming (with respect to duration), whereas the second works against pair programming (with respect to effort).
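As a rough check on these figures, the following R sketch converts an F-value into Cohen's d using a common conversion for two groups of equal size, d = sqrt(F * (n1 + n2) / (n1 * n2)). The paper's equation (2) [5] is not reproduced here, so this particular formula is an assumption; it does, however, reproduce the reported values.

# Hypothetical helper (not taken from the paper): converts an F-value from a
# two-group comparison into Cohen's d, assuming two groups of equal size.
f_to_d <- function(f, n1, n2) sqrt(f * (n1 + n2) / (n1 * n2))

f_to_d(2.9843, 14, 14)  # ~0.6529, duration (first ANOVA, Table 6)
f_to_d(2.8953, 14, 14)  # ~0.6431, effort (second ANOVA, Table 7)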

Once we have calculated the effect sizes, we carry out a power analysis. The power of a statistical test is the probability of rejecting the null hypothesis when it is false; in other words, power indicates how sensitive a test is to detecting an effect of the treatment under examination.
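To make the notion of power concrete, the following R sketch (illustrative only, not part of the original analysis) estimates power by simulation: it repeatedly draws two samples whose true standardized mean difference is roughly 0.65 and counts how often a two-sample t-test rejects the null hypothesis at α = 0.1 with n = 14 per group.

set.seed(42)
d <- 0.65      # assumed true standardized mean difference
n <- 14        # measures per group, as in the experiment
alpha <- 0.1   # significance level used in the text
# Simulate many experiments with a true effect of size d and count how
# often the two-sample t-test rejects the null hypothesis at level alpha.
reject <- replicate(10000, {
  solo <- rnorm(n, mean = 0, sd = 1)
  pair <- rnorm(n, mean = d, sd = 1)
  t.test(pair, solo, var.equal = TRUE)$p.value < alpha
})
mean(reject)   # empirical power, close to 0.5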

Once we know the effect size, it is possible to compute the power of the test. To determine the power, we use the function pwr.t.test() of the R environment [9], which implements power analysis as outlined by Cohen [10]. Given an effect size of 0.6529 (related to duration), a sample size of n = 14 (the number of measures in each group, pair and solo programming), and a significance level of α = 0.1, we obtain a power of 0.51 (51%). Similarly, a power of 0.50 (50%) is obtained with the same sample size and significance level but with the effect size replaced by 0.6431 (related to effort).
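These power figures can be reproduced with a standard pwr.t.test() call from the pwr package; the arguments below simply restate the values given in the text, with a two-sided, two-sample test assumed.

library(pwr)

# Power for the duration comparison (d = 0.6529, n = 14 per group, alpha = 0.1)
pwr.t.test(n = 14, d = 0.6529, sig.level = 0.1,
           type = "two.sample", alternative = "two.sided")  # power ~ 0.51

# Power for the effort comparison (d = 0.6431, same n and alpha)
pwr.t.test(n = 14, d = 0.6431, sig.level = 0.1,
           type = "two.sample", alternative = "two.sided")  # power ~ 0.50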

Authors:

(1) Omar S. Gómez, full time professor of Software Engineering at Mathematics Faculty of the Autonomous University of Yucatan (UADY);

(2) José L. Batún, full time professor of Statistics at Mathematics Faculty of the Autonomous University of Yucatan (UADY);

(3) Raúl A. Aguilar, Faculty of Mathematics, Autonomous University of Yucatan Merida, Yucatan 97119, Mexico.


This paper is available on arXiv under a CC BY-NC-ND 4.0 DEED license.

