Authors:

(1) Nora Schneider, Computer Science Department, ETH Zurich, Zurich, Switzerland (nschneide@student.ethz.ch);

(2) Shirin Goshtasbpour, Computer Science Department, ETH Zurich, Zurich, Switzerland and Swiss Data Science Center, Zurich, Switzerland (shirin.goshtasbpour@inf.ethz.ch);

(3) Fernando Perez-Cruz, Computer Science Department, ETH Zurich, Zurich, Switzerland and Swiss Data Science Center, Zurich, Switzerland (fernando.perezcruz@sdsc.ethz.ch).

Table of Links

Abstract and 1 Introduction
2 Background
2.1 Data Augmentation
2.2 Anchor Regression
3 Anchor Data Augmentation
3.1 Comparison to C-Mixup and 3.2 Preserving nonlinear data structure
3.3 Algorithm
4 Experiments and 4.1 Linear synthetic data
4.2 Housing nonlinear regression
4.3 In-distribution Generalization
4.4 Out-of-distribution Robustness
5 Conclusion, Broader Impact, and References
A Additional information for Anchor Data Augmentation
B Experiments

3 Anchor Data Augmentation

In this section, we introduce Anchor Data Augmentation (ADA), a domain-independent data augmentation method inspired by AR. ADA requires neither prior knowledge of the data invariances nor manually engineered transformations. Unlike existing domain-agnostic data augmentation methods [10, 45, 46], it does not require training an expensive generative model, and the augmentation adds only marginally to the computational cost of training. In addition, since ADA originates from a causal regression problem, it can be readily applied to regression problems. Even when ADA does not improve performance, its negative effect on performance remains minimal.

This paper is available on arxiv under CC0 1.0 DEED license.
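To make the idea above concrete, here is a minimal sketch of how an anchor-regression-style transformation can perturb both inputs and targets of a regression dataset. It applies the AR shift v ↦ (I − P_A)v + √γ · P_A v, where P_A projects onto the span of an anchor matrix A. The function name `ada_augment`, the one-hot anchor matrix, and the single fixed γ are illustrative assumptions; the paper's actual anchor construction and choice of γ are detailed in Section 3.3, not here.

```python
import numpy as np

def ada_augment(X, y, anchors, gamma=2.0):
    """Sketch of an anchor-regression-style augmentation (assumed form).

    Applies v -> (I - P_A) v + sqrt(gamma) * P_A v to both X and y,
    where P_A is the orthogonal projection onto the columns of `anchors`.
    gamma > 1 amplifies the anchor-aligned component; gamma = 1 is identity.
    """
    A = np.asarray(anchors, dtype=float)
    # Projection onto the column span of A: P_A = A (A^T A)^+ A^T
    P = A @ np.linalg.pinv(A.T @ A) @ A.T
    scale = np.sqrt(gamma)
    X_aug = X - P @ X + scale * (P @ X)
    y_aug = y - P @ y + scale * (P @ y)
    return X_aug, y_aug

# Illustrative usage: anchors as one-hot group indicators (an assumption;
# one natural choice is cluster memberships of the training points).
X = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
y = np.array([1.0, 2.0, 3.0])
A = np.array([[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
X_new, y_new = ada_augment(X, y, A, gamma=4.0)
```

Because the transformation is a single projection plus a rescaling, generating an augmented copy of the dataset costs only a few matrix products, which is consistent with the claim that ADA adds marginally to training cost.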