A Data-centric Approach to Class-specific Bias in Image Data Augmentation: Abstract and Intro
by @computational


Authors:

(1) Athanasios Angelakis, Amsterdam University Medical Center, University of Amsterdam - Data Science Center, Amsterdam Public Health Research Institute, Amsterdam, Netherlands

(2) Andrey Rass, Den Haag, Netherlands.

Abstract

Data augmentation (DA) enhances model generalization in computer vision but may introduce biases, impacting class accuracy unevenly. Our study extends this inquiry, examining DA’s class-specific bias under random cropping across various datasets, including ones distinct from ImageNet. We evaluated this phenomenon with ResNet50, EfficientNetV2S, and SWIN ViT, discovering that while residual models showed similar bias effects, Vision Transformers exhibited greater robustness or altered dynamics. This suggests a nuanced approach to model selection, emphasizing bias mitigation. We also refined a “data augmentation robustness scouting” method to manage DA-induced biases more efficiently, reducing computational demands significantly (training 112 models instead of 1860; a reduction by a factor of 16.2) while still capturing essential bias trends.

1 Introduction

Machine learning is generally defined as aiming at systems that solve, or “learn”, a particular task or set of tasks (e.g. regression, classification, machine translation, anomaly detection; Goodfellow, Bengio, and Courville 2016a) based on some training data, allowing computers to assign labels or predict future outcomes without manual programming. In a typical case the training dataset is finite, and the system’s parameters are optimized with a technique such as gradient descent, using some performance measure (e.g. “accuracy”) for evaluation both during optimization and on a holdout “test” set (LeCun et al. 1998; Bishop and Nasrabadi 2006; Shalev-Shwartz and Ben-David 2014; Goodfellow, Bengio, and Courville 2016a).


In particular, computer vision tasks (e.g. image classification) are now commonly accomplished via deep learning methods (Feng et al. 2019) such as convolutional neural networks (CNNs), which have long been regarded as the go-to approach to such problems (LeCun, Bengio et al. 1995; LeCun, Kavukcuoglu, and Farabet 2010; Voulodimos et al. 2018), owing to their ability to extract features from data with a known grid-like topology, such as images (Goodfellow, Bengio, and Courville 2016b).


Like any other machine learning system, CNNs can suffer from overfitting: a performance gap between training and test data, which represents an inability to generalize the learned task to unseen data (Goodfellow, Bengio, and Courville 2016c). To combat this, various regularization methods have been developed for use during optimization (Tikhonov 1943; Tihonov 1963). Particular to image-based tasks, though not limited to them (Ko et al. 2015), is the use of data augmentation: applying certain transformations, such as random cropping, stretching and color jitter, to training data during training iterations. Such techniques are near-ubiquitous in computer vision tasks due to their effectiveness as a regularization measure (Shorten and Khoshgoftaar 2019).
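As a rough illustration of what applying transformations during training iterations can look like, the sketch below implements random cropping and random horizontal flipping on a toy image stored as nested lists. The function names, the toy image and the flip probability are assumptions made for illustration only; they are not taken from the paper, and real pipelines would normally rely on an image library instead.

```python
import random

def random_crop(image, crop_h, crop_w, rng):
    """Return a crop_h x crop_w window taken at a random position."""
    height, width = len(image), len(image[0])
    if crop_h > height or crop_w > width:
        raise ValueError("crop size exceeds image size")
    top = rng.randint(0, height - crop_h)   # randint is inclusive on both ends
    left = rng.randint(0, width - crop_w)
    return [row[left:left + crop_w] for row in image[top:top + crop_h]]

def random_horizontal_flip(image, rng, p=0.5):
    """Mirror the image left-to-right with probability p (p is an assumed default)."""
    if rng.random() < p:
        return [row[::-1] for row in image]
    return [row[:] for row in image]

def augment(image, crop_h, crop_w, rng):
    """Draw a fresh random crop and flip; meant to be called once per training iteration."""
    return random_horizontal_flip(random_crop(image, crop_h, crop_w, rng), rng)

# Hypothetical 4x4 grayscale "image" whose pixel values encode their position.
toy_image = [[r * 4 + c for c in range(4)] for r in range(4)]
rng = random.Random(0)  # seeded only to make the sketch reproducible
augmented = augment(toy_image, 3, 3, rng)
```

Because a new crop position is drawn on every call, the model rarely sees exactly the same pixels twice, which is where the regularizing effect comes from. The smaller the crop relative to the image, the more aggressive the transformation.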


However, recent research by Balestriero, Bottou, and LeCun (2022) suggests that, although data augmentation is such a prevalent method of improving model performance, turning to it blindly regardless of dataset and approach may carry a risk. This appears to be part of a larger phenomenon: parametrized regularization measures tend to sacrifice performance on certain classes in favor of overall model accuracy. It is also caused in large part by the fact that different image transformations seem to preserve labels to varying degrees (Cui, Goel, and Kingsbury 2015; Taylor and Nitschke 2018) depending on the class of image data and, as such, can have a severe class-specific negative impact on model performance by incurring label loss if applied too aggressively. The effect can be deep enough to degrade downstream performance in transfer learning tasks.
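The trade-off between overall accuracy and per-class accuracy can be made concrete with a small sketch. The labels and predictions below are invented toy data, and the helper names are hypothetical; the point is only to show how an overall score can improve while one class degrades.

```python
from collections import defaultdict

def per_class_accuracy(labels, predictions):
    """Map each class to the fraction of its samples that were predicted correctly."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for y, y_hat in zip(labels, predictions):
        total[y] += 1
        if y == y_hat:
            correct[y] += 1
    return {c: correct[c] / total[c] for c in total}

def overall_accuracy(labels, predictions):
    """Fraction of all samples that were predicted correctly."""
    return sum(y == p for y, p in zip(labels, predictions)) / len(labels)

# Invented toy data: predictions from a mildly and an aggressively augmented model.
labels     = ["cat", "cat", "cat", "cat", "dog", "dog", "dog", "dog", "car", "car"]
preds_mild = ["cat", "cat", "cat", "dog", "dog", "dog", "dog", "cat", "car", "car"]
preds_aggr = ["cat", "cat", "cat", "cat", "dog", "dog", "dog", "dog", "car", "dog"]

acc_mild = per_class_accuracy(labels, preds_mild)
acc_aggr = per_class_accuracy(labels, preds_aggr)

# Positive values mean the class gained accuracy, negative values mean it lost.
delta = {c: acc_aggr[c] - acc_mild[c] for c in acc_mild}
```

In this toy case the overall accuracy rises from 0.8 to 0.9, yet the "car" class falls from 1.0 to 0.5. A single aggregate metric would hide that shift, which is why per-class reporting matters when studying this kind of bias.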


A natural reaction might be to caution against using data augmentation as a regularization technique. However, this can prove difficult in practice, as the overall performance boost the technique provides is undeniable, and it is currently used ubiquitously, with few alternatives. In addition, as alluded to before, Balestriero, Bottou, and LeCun (2022) showed that other regularization methods, such as weight decay, can also be subject to this phenomenon. It is also important to note that they claim the phenomenon is model-agnostic for popular CNNs based on residual blocks, such as ResNet (He et al. 2015), DenseNet (Huang, Liu, and Weinberger 2016), and others, but make no claims as to whether the phenomenon is data-agnostic, or how it would manifest in image classification networks whose architectures are founded on considerably different principles, such as Vision Transformers (Dosovitskiy et al. 2020b; Liu et al. 2021), which use patch-based image processing and self-attention mechanisms to extract features from images.


Our work follows a course of further investigation. To formulate our primary research question, we follow the tenets of the data-centric AI movement championed by Andrew Ng, which seeks to systematically engineer the data used in training AI systems and places special focus on accounting for imperfections in real-world data. For data augmentation, the discipline highlights issues such as domain gaps, data bias and noise. Following this school of thought leads us to ask whether class-specific bias from data augmentation affects datasets other than ImageNet in a different way. In particular, we seek to test whether, and to what extent, this phenomenon can be observed on datasets that differ in nature to various degrees from ImageNet (Deng et al. 2009), which was used by Balestriero, Bottou, and LeCun (2022). To supplement this line of validation, a secondary research question emerged from a seemingly minor detail in the original study. Concretely, we investigate whether the addition of Random Horizontal Flipping has an effect on how the class-specific bias phenomenon manifests.


While not our core focus, we also seek to confirm how model-agnostic this phenomenon is on these new datasets. For this, it is first necessary to test with a model that has common features with ResNet50, which was the baseline for many experiments in Balestriero, Bottou, and LeCun (2022). We selected EfficientNet, first described in Tan and Le (2019), a family of models that also utilizes residual blocks but was designed via neural architecture search (Elsken, Metzen, and Hutter 2019) and uses a new scaling method that uniformly scales all dimensions of model depth, width and resolution with a simple yet highly effective compound coefficient. To that end, we ask whether class-specific bias from data augmentation on the same dataset would affect a different residual CNN architecture in the same manner as it would a ResNet. It is also worth investigating the effects of a vastly different architecture, as mentioned earlier. For this purpose, we have chosen a SWIN Transformer, a relatively small patch-based vision transformer that uses a novel shifted windowing technique for more efficient computation of the self-attention mechanism inherent to Transformer-type models (Liu et al. 2021). Finally, we consider whether class-specific bias from data augmentation on the same dataset would affect a Vision Transformer model in the same manner as it would a ResNet.
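The compound coefficient mentioned above for EfficientNet can be sketched numerically. The base values below are those reported by Tan and Le (2019) for the original EfficientNet family; the function names are hypothetical, and the EfficientNetV2S variant named in the abstract may be scaled differently, so this is an illustration of the idea only.

```python
# Base multipliers reported by Tan and Le (2019) for depth, width and resolution.
ALPHA, BETA, GAMMA = 1.2, 1.1, 1.15

def compound_scale(phi):
    """Scale depth, width and resolution together from one coefficient phi."""
    return {
        "depth": ALPHA ** phi,
        "width": BETA ** phi,
        "resolution": GAMMA ** phi,
    }

def flops_multiplier(phi):
    """Approximate compute growth: linear in depth, quadratic in width and resolution."""
    return (ALPHA * BETA ** 2 * GAMMA ** 2) ** phi

# Multipliers for phi = 0 (baseline) through phi = 3.
scales = [compound_scale(phi) for phi in range(4)]
```

The base values are chosen so that ALPHA * BETA**2 * GAMMA**2 is close to 2, meaning each unit increase of phi roughly doubles the compute budget while all three dimensions grow together.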


This paper is available on arXiv under the CC BY 4.0 DEED license.