Authors:
(1) Sebastian Dziadzio, University of Tübingen ([email protected]);
(2) Çağatay Yıldız, University of Tübingen;
(3) Gido M. van de Ven, KU Leuven;
(4) Tomasz Trzciński, IDEAS NCBR, Warsaw University of Technology, Tooploox;
(5) Tinne Tuytelaars, KU Leuven;
(6) Matthias Bethge, University of Tübingen.
Table of Links
- 2. Two problems with the current approach to class-incremental continual learning
- 3. Methods and 3.1. Infinite dSprites
- 4.1. Continual learning and 4.2. Benchmarking continual learning
- 5.1. Regularization methods and 5.2. Replay-based methods
- 5.4. One-shot generalization and 5.5. Open-set classification
- Conclusion, Acknowledgments and References
The continual learning literature typically focuses on catastrophic forgetting in supervised classification. Parameter isolation methods dedicate parameters to each task, either by periodically extending the architecture while freezing already trained parameters [33] or by relying on isolated subnetworks [6]. Regularization approaches aim to preserve existing knowledge by limiting the plasticity of the network. Functional regularization methods constrain the network output through knowledge distillation [17] or by using a small set of anchor points to build a functional prior [26, 36]. Weight regularization methods [39] directly constrain network parameters according to their estimated importance for previous tasks. In particular, Variational Continual Learning (VCL) [25] derives this importance estimate by framing continual learning as sequential approximate Bayesian inference. Most methods incorporate regularization into the objective function, but it can also be implemented through constrained optimization [2, 10, 13, 22]. Finally, replay methods [4, 12, 30, 32] retain knowledge through rehearsal: when learning a new task, the network is trained on a mix of new samples from the training stream and previously seen samples drawn from a memory buffer. A special case of this strategy is generative replay [3, 34], in which the rehearsal samples are produced by a generative model trained to approximate the data distribution of each class. Many continual learning methods are hybrid systems that mix and match the above techniques.
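To make two of these mechanisms concrete, the following is a minimal sketch (not taken from the paper; the names `ReplayBuffer`, `make_rehearsal_batch`, and `weight_regularization_penalty` are illustrative) of a rehearsal memory with reservoir sampling and an EWC-style quadratic weight-regularization penalty:

```python
import random


class ReplayBuffer:
    """Fixed-size memory of past examples, filled with reservoir sampling
    so the buffer approximates a uniform sample over the whole stream."""

    def __init__(self, capacity, seed=0):
        self.capacity = capacity
        self.buffer = []
        self.num_seen = 0
        self.rng = random.Random(seed)

    def add(self, example):
        self.num_seen += 1
        if len(self.buffer) < self.capacity:
            self.buffer.append(example)
        else:
            # Replace a random slot with probability capacity / num_seen.
            j = self.rng.randrange(self.num_seen)
            if j < self.capacity:
                self.buffer[j] = example

    def sample(self, k):
        return self.rng.sample(self.buffer, min(k, len(self.buffer)))


def make_rehearsal_batch(new_batch, buffer, replay_fraction=0.5):
    """Mix fresh stream samples with replayed ones; in rehearsal-based
    methods the gradient step is then taken on this mixed batch."""
    n_replay = int(len(new_batch) * replay_fraction)
    return list(new_batch) + buffer.sample(n_replay)


def weight_regularization_penalty(params, old_params, importance, strength=1.0):
    """EWC-style penalty: parameters deemed important for previous tasks
    (large importance) are pulled back toward their old values."""
    return strength * sum(
        f * (p - p_old) ** 2
        for p, p_old, f in zip(params, old_params, importance)
    )
```

In a training loop, the penalty would simply be added to the task loss, while `make_rehearsal_batch` would replace the plain mini-batch; the replay fraction and penalty strength are the usual knobs trading stability against plasticity.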
Established continual learning benchmarks primarily involve splitting existing computer vision datasets into discrete, non-overlapping segments to study continual supervised classification. Notable examples in this domain include split MNIST [39], split CIFAR [39], and split MiniImageNet [1, 4], along with their augmented counterparts, such as rotated MNIST [22] and permuted MNIST [15]. More recently, contributions from Lomonaco and Maltoni [20], Verwimp et al. [38], and Roady et al. [31] have enriched the field with datasets designed specifically for continual learning, such as CORe50, CLAD, and Stream-51, which comprise temporally correlated images with diverse backgrounds and environments.
This paper is available on arxiv under CC 4.0 license.