Continual learning and Benchmarking continual learning

Authors:
(1) Sebastian Dziadzio, University of Tübingen (sebastian.dziadzio@uni-tuebingen.de);
(2) Çagatay Yıldız, University of Tübingen;
(3) Gido M. van de Ven, KU Leuven;
(4) Tomasz Trzcinski, IDEAS NCBR, Warsaw University of Technology, Tooploox;
(5) Tinne Tuytelaars, KU Leuven;
(6) Matthias Bethge, University of Tübingen.

4. Related work

4.1. Continual learning

Continual learning literature typically focuses on catastrophic forgetting in supervised classification. Parameter isolation methods use dedicated parameters for each task by periodically extending the architecture while freezing already trained parameters [33] or by relying on isolated subnetworks [6].

Regularization approaches aim to preserve existing knowledge by limiting the plasticity of the network. Functional regularization methods constrain the network output through knowledge distillation [17] or by using a small set of anchor points to build a functional prior [26, 36]. Weight regularization methods [39] directly constrain network parameters according to their estimated importance for previous tasks. In particular, Variational Continual Learning (VCL) [25] derives the importance estimate by framing continual learning as sequential approximate Bayesian inference. Most methods incorporate regularization into the objective function, but it is also possible to implement it using constrained optimization [2, 10, 13, 22].
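To make the idea of weight regularization concrete, the sketch below adds a quadratic penalty to the task loss, weighting each parameter's drift from its previous value by an importance score. This is a generic illustration, not the formulation of any specific cited method: the function names, the flat parameter lists, and the `strength` coefficient are all assumptions, and the cited approaches differ mainly in how they estimate importance.

```python
def weight_regularization_penalty(params, old_params, importance, strength=1.0):
    """Quadratic penalty on parameter drift, scaled per parameter by importance.

    All arguments are hypothetical: flat lists of floats standing in for the
    current weights, the weights after the previous task, and a per-weight
    importance estimate.
    """
    if not (len(params) == len(old_params) == len(importance)):
        raise ValueError("params, old_params and importance must have equal length")
    drift = sum(w * (p - q) ** 2 for p, q, w in zip(params, old_params, importance))
    return 0.5 * strength * drift


def regularized_loss(task_loss, params, old_params, importance, strength=1.0):
    """Loss on the new task plus the penalty that discourages forgetting."""
    return task_loss + weight_regularization_penalty(
        params, old_params, importance, strength
    )
```

A parameter with high importance is expensive to move, so the optimizer is pushed to adapt the less important ones; setting `strength` to zero recovers plain fine-tuning on the new task.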
Finally, replay methods [4, 12, 30, 32] retain knowledge through rehearsal. When learning a new task, the network is trained with a mix of new samples from the training stream and previously seen samples drawn from the memory buffer. A specific case of this strategy is generative replay [3, 34], where the rehearsal samples are produced by a generative model trained to approximate the data distribution for each class. Many continual learning methods are hybrid systems that mix and match the above techniques.

4.2. Benchmarking continual learning

Established continual learning benchmarks primarily involve splitting existing computer vision datasets into discrete, nonoverlapping segments to study continual supervised classification. Notable examples in this domain include split MNIST [39], split CIFAR [39], and split MiniImageNet [1, 4], along with their augmented counterparts, such as rotated MNIST [22] and permuted MNIST [15]. More recently, contributions from Lomonaco and Maltoni [20], Verwimp et al. [38], and Roady et al. [31] have enriched the field with datasets designed specifically for continual learning, such as CORe50, CLAD, and Stream-51, which comprise temporally correlated images with diverse backgrounds and environments.

This paper is available on arXiv under a CC 4.0 license.
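As an illustration of the setup described in Sections 4.1 and 4.2, the sketch below splits a labeled dataset into nonoverlapping class segments and builds training batches that mix new samples with rehearsal samples from a memory buffer. Everything here is an assumption made for illustration: the names `split_by_class`, `ReplayBuffer`, and `mixed_batch` are hypothetical, and the buffer is filled by reservoir sampling, which is one common choice rather than something the text prescribes.

```python
import random


def split_by_class(labels, classes_per_task):
    """Partition sample indices into tasks with disjoint sets of classes.

    Returns a list of (task_classes, sample_indices) pairs, one per task.
    """
    if classes_per_task < 1:
        raise ValueError("classes_per_task must be at least 1")
    classes = sorted(set(labels))
    tasks = []
    for start in range(0, len(classes), classes_per_task):
        task_classes = classes[start:start + classes_per_task]
        wanted = set(task_classes)
        indices = [i for i, y in enumerate(labels) if y in wanted]
        tasks.append((task_classes, indices))
    return tasks


class ReplayBuffer:
    """Fixed-size memory filled by reservoir sampling (an assumed policy)."""

    def __init__(self, capacity, seed=0):
        if capacity < 1:
            raise ValueError("capacity must be at least 1")
        self.capacity = capacity
        self.items = []
        self.seen = 0  # number of items offered to the buffer so far
        self.rng = random.Random(seed)

    def add(self, item):
        """Keep each offered item with probability capacity / seen."""
        self.seen += 1
        if len(self.items) < self.capacity:
            self.items.append(item)
        else:
            slot = self.rng.randrange(self.seen)
            if slot < self.capacity:
                self.items[slot] = item

    def sample(self, k):
        """Draw up to k distinct stored items without replacement."""
        return self.rng.sample(self.items, min(k, len(self.items)))


def mixed_batch(new_items, buffer, replay_size):
    """Concatenate new samples with rehearsal samples from the buffer."""
    return list(new_items) + buffer.sample(replay_size)
```

In a training loop, one would iterate over the tasks returned by `split_by_class`, train on `mixed_batch(...)` for each incoming batch, and call `buffer.add(...)` on every new sample so that later tasks can rehearse earlier classes.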