Authors:
(1) Nora Schneider, Computer Science Department, ETH Zurich, Zurich, Switzerland ([email protected]);
(2) Shirin Goshtasbpour, Computer Science Department, ETH Zurich, Zurich, Switzerland and Swiss Data Science Center, Zurich, Switzerland ([email protected]);
(3) Fernando Perez-Cruz, Computer Science Department, ETH Zurich, Zurich, Switzerland and Swiss Data Science Center, Zurich, Switzerland ([email protected]).
Many data augmentation methods have been proposed in recent years with several applications in mind. Still, most of the augmentations we mention here use human-designed transformations based on domain knowledge, which leave the target variable invariant. For instance, Cutout [10] is an image-specific augmentation technique that is successfully used to train models on CIFAR10 and CIFAR100 [25], but was determined to be unsuitable for larger, higher-resolution image datasets like ImageNet [9]. Other augmentation methods for images, such as random crop, horizontal or vertical mirroring, random rotation, or translation [29, 43], may similarly apply to a certain group of image datasets while being inapplicable to others, e.g. datasets of digits and letters.
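As a concrete illustration of a label-preserving, image-specific transformation, the following is a minimal sketch in the spirit of Cutout [10]: a square patch at a random location is masked out and the target is left unchanged. The function name `cutout`, the patch size, and the zero fill value are assumptions made for illustration and are not taken from [10].

```python
import numpy as np

def cutout(image, size, rng=None):
    """Return a copy of `image` with one random square patch set to zero.

    `image` has shape (H, W) or (H, W, C). The patch is clipped at the
    image border, so it may be smaller than `size` x `size`.
    """
    rng = np.random.default_rng() if rng is None else rng
    h, w = image.shape[:2]
    cy = rng.integers(0, h)  # patch center, row
    cx = rng.integers(0, w)  # patch center, column
    y0, y1 = max(cy - size // 2, 0), min(cy + size // 2, h)
    x0, x1 = max(cx - size // 2, 0), min(cx + size // 2, w)
    out = image.copy()
    out[y0:y1, x0:x1] = 0
    return out

# Toy usage: the label of the augmented image is simply reused.
image = np.ones((32, 32, 3))
augmented = cutout(image, size=8, rng=np.random.default_rng(0))
```

The sketch also shows why such transformations are dataset-dependent: masking an 8 x 8 patch is mild on a natural image but could remove most of a small digit.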
In an attempt to automate the augmentation process and reduce human involvement, policy-based or search-based automated augmentation methods were developed. In AutoAugment [7], a neural network is trained with Reinforcement Learning (RL) to combine an assortment of transformations of varying strengths, which are applied to samples of a given dataset to improve model accuracy. Methods such as RandAugment [8], Fast AutoAugment [30], UniformAugment [32] and TrivialAugment [36] aim to reduce the cost of the pretraining search phase in automated augmentation by using randomized transformations and a reduced search space.
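The idea of replacing a searched policy with randomly sampled transformations can be sketched as follows. This is a toy illustration and not the procedure of any of the cited methods: the three operations, the names `OPS` and `random_policy`, and the way `magnitude` is interpreted are all assumptions.

```python
import numpy as np

def flip_lr(img, magnitude):
    # Horizontal mirroring; the magnitude is ignored for this operation.
    return img[:, ::-1]

def shift_x(img, magnitude):
    # Cyclic shift along the width axis by `magnitude` pixels.
    return np.roll(img, shift=int(magnitude), axis=1)

def brighten(img, magnitude):
    # Scale intensities and clip back to the [0, 1] range.
    return np.clip(img * (1.0 + 0.1 * magnitude), 0.0, 1.0)

OPS = [flip_lr, shift_x, brighten]

def random_policy(img, n_ops=2, magnitude=3, rng=None):
    """Apply `n_ops` operations drawn uniformly at random from OPS."""
    rng = np.random.default_rng() if rng is None else rng
    for idx in rng.integers(0, len(OPS), size=n_ops):
        img = OPS[idx](img, magnitude)
    return img

# Toy usage on a random grayscale image with values in [0, 1).
rng = np.random.default_rng(0)
img = rng.random((8, 8))
out = random_policy(img, n_ops=2, magnitude=3, rng=rng)
```

With only two hyperparameters (`n_ops` and `magnitude`) the search space is small enough for a plain grid search, but the list of operations itself still encodes domain knowledge.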
Alternatively, in order to adapt the augmentation policy to the model during training, Population-Based Augmentation [16] and Online Hyperparameter Learning [31] use multiple data augmentation workers that are updated using evolutionary strategies and RL, respectively. Adversarial AutoAugment [53] and AugMax [47] optimize for the augmentation policy that deteriorates the training accuracy in order to improve the model's robustness. DivAug [34] finds the policy that maximizes the diversity of the augmented data.
Having a separate search phase for the optimal augmentation policy is computationally expensive and may exceed the computation required to train the downstream model [8, 48]. In addition, these methods and their online counterparts need to be trained separately on every single dataset. While OnlineAugment [44] and DDAS exploit meta-learning to avoid this problem, they still rely on a set of predefined class-invariant transformations that require domain-specific information.
Generic transformations such as Gaussian or adversarial noise [10, 28, 45] and dropout [3] are also effective in expanding the training dataset. Generative models such as Generative Adversarial Networks (GAN) [13] and Variational Auto-Encoders (VAE) [22] are trained in [1, 6, 44] to synthesize samples close to the low-dimensional manifold of the data for classification.
Mixup [51] is a popular data augmentation method that uses convex combinations of pairs of samples from different classes, together with their softened labels. Mixup is only evaluated on classification problems, even though it is claimed that the application to regression is straightforward. Various extensions of Mixup have been proposed to prevent data manifold intrusion [46], use more complex mixing strategies [33, 50], or account for saliency in augmented samples [20, 21]. These methods were predominantly designed to excel in classification tasks. Mixup for regression was studied in [5, 18, 49, 52], but it was reported to adversely impact predictions when misleading augmented samples are generated from a pair of faraway samples.
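The convex combination underlying Mixup can be sketched in a few lines. This is a minimal illustration under stated assumptions: a single mixing coefficient is drawn per batch from a Beta(alpha, alpha) distribution, partners are chosen by a random permutation of the batch, and the function name `mixup_batch` and the default `alpha=0.2` are hypothetical choices rather than details confirmed by the text above.

```python
import numpy as np

def mixup_batch(x, y, alpha=0.2, rng=None):
    """Mix every sample (and its label) with a random partner from the batch."""
    rng = np.random.default_rng() if rng is None else rng
    lam = rng.beta(alpha, alpha)      # mixing coefficient in [0, 1]
    perm = rng.permutation(len(x))    # index of the partner of each sample
    x_mix = lam * x + (1.0 - lam) * x[perm]
    y_mix = lam * y + (1.0 - lam) * y[perm]
    return x_mix, y_mix, lam

# Toy usage: 8 samples with 3 features and one-hot labels over 4 classes.
rng = np.random.default_rng(0)
x = rng.normal(size=(8, 3))
y = np.eye(4)[rng.integers(0, 4, size=8)]
x_mix, y_mix, lam = mixup_batch(x, y, alpha=0.2, rng=rng)
```

With one-hot `y`, the mixed labels are soft class probabilities. Passing real-valued targets instead applies the same interpolation to a regression problem, where mixing two samples that are far apart may produce a target that does not match the mixed input, which is the failure mode reported above.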
This paper is available on arXiv under the CC0 1.0 DEED license.