Date: 2024-08-27

Authors: (1) Sebastian Dziadzio, University of Tübingen ([email protected]); (2) Çagatay Yıldız, University of Tübingen; (3) Gido M. van de Ven, KU Leuven; (4) Tomasz Trzcinski, IDEAS NCBR, Warsaw University of Technology, Tooploox; (5) Tinne Tuytelaars, KU Leuven; (6) Matthias Bethge, University of Tübingen.

Abstract and 1. Introduction

2. Two problems with the current approach to class-incremental continual learning

3. Methods and 3.1. Infinite dSprites

3.2. Disentangled learning

4. Related work

4.1. Continual learning and 4.2. Benchmarking continual learning

5. Experiments

5.1. Regularization methods and 5.2. Replay-based methods

5.3. Do we need equivariance?

5.4. One-shot generalization and 5.5. Open-set classification

5.6. Online vs. offline

Conclusion, Acknowledgments and References

Supplementary Material

6. Discussion

In the last decade, continual learning research has made progress through parameter and functional regularization, rehearsal, and architectural strategies that mitigate forgetting by preserving important parameters or compartmentalizing knowledge. As pointed out in a recent survey [37], the best-performing continual learners are based on storing or synthesizing samples. Such methods are typically evaluated on sequential versions of standard computer vision datasets such as MNIST or CIFAR-100, which often involve only a small number of learning tasks, discrete task boundaries, and fixed data distributions. As such, these benchmarks do not match the lifelong nature of real-world learning tasks.

Our work is motivated by the hypothesis that state-of-the-art continual learners and their predecessors would inevitably fail when trained in a truly lifelong fashion, akin to human learning. To test this claim, we introduced the idSprites dataset, consisting of procedurally generated shapes and their affine transformations. To our knowledge, this is the first class-incremental continual learning benchmark that allows generating hundreds or thousands of tasks. While acknowledging the relatively simplistic nature of our dataset, we believe any lifelong learner must solve idSprites before tackling more complicated, real-world datasets. Yet even on this simple benchmark, our empirical findings indicate that all standard methods eventually collapse, and memory buffers can only defer the collapse.

Updating synaptic connections in the human brain upon novel experiences does not interfere with the general knowledge accumulated throughout life. Inspired by this insight, we propose our disentangled learning framework, which splits the continual learning problem into (i) sequentially training a network that models the general aspects of the problem that apply to all instances (equivariances) and (ii) memorizing class-specific information relevant to the task (exemplars). This separation enables disentangled model updating, which allows for continually learning equivariant representations without catastrophic forgetting and explicitly updating class-specific information without harming information corresponding to other classes. As demonstrated experimentally, such a separation exhibits successful forward and backward transfer and achieves impressive one-shot generalization and open-set recognition performance.
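
The separation described above can be illustrated with a minimal sketch. It is not the implementation used in this work: the hand-written `normalize` function is a hypothetical stand-in for the learned normalization network (it removes only translation and scale, not rotation), shapes are assumed to be lists of corresponding 2D points, and `ExemplarMemory`, `add_class`, `classify`, and `reject_above` are illustrative names. The sketch shows the two roles: a shared mapping to a canonical form, and a per-class exemplar store in which adding a class leaves all other classes untouched.

```python
import math


def normalize(points):
    """Hypothetical stand-in for the learned normalization network.

    Maps a list of (x, y) points to a canonical form by removing
    translation (centering) and scale (unit RMS radius).
    """
    n = len(points)
    cx = sum(x for x, _ in points) / n
    cy = sum(y for _, y in points) / n
    centered = [(x - cx, y - cy) for x, y in points]
    rms = math.sqrt(sum(x * x + y * y for x, y in centered) / n) or 1.0
    return [(x / rms, y / rms) for x, y in centered]


def distance(a, b):
    """Mean Euclidean distance between corresponding points (assumed aligned)."""
    return sum(math.dist(p, q) for p, q in zip(a, b)) / len(a)


class ExemplarMemory:
    """Class-specific memory: one canonical exemplar per class (assumption)."""

    def __init__(self):
        self.exemplars = {}

    def add_class(self, label, sample):
        # Only the entry for `label` is written; other classes are unaffected.
        self.exemplars[label] = normalize(sample)

    def classify(self, sample, reject_above=None):
        # Normalize the input, then return the label of the nearest exemplar.
        canonical = normalize(sample)
        label, best = min(
            ((lbl, distance(canonical, ex)) for lbl, ex in self.exemplars.items()),
            key=lambda pair: pair[1],
        )
        # Optional open-set rejection: no stored exemplar is close enough.
        if reject_above is not None and best > reject_above:
            return None
        return label


memory = ExemplarMemory()
memory.add_class("square", [(0, 0), (1, 0), (1, 1), (0, 1)])
memory.add_class("diamond", [(1, 0), (2, 1), (1, 2), (0, 1)])

# A scaled and translated square is matched to the stored square exemplar.
prediction = memory.classify([(5, -2), (8, -2), (8, 1), (5, 1)])
```

In this toy setting a single example suffices to register a new class, and the optional distance threshold gives a simple form of open-set rejection. In the framework discussed here, the normalization step is learned rather than hand-coded.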

Limitations. With this work, we aim to bring a fresh perspective and chart a novel research direction in continual learning. To demonstrate our framework, we stick to a simple dataset and include the correct inductive biases in our learning architecture. We acknowledge that when applied to natural images, our approach would suffer from a number of issues, which we list below, along with some mitigation strategies.

• Real-world data does not come with perfect supervision signals, which hinders the learning of equivariant networks. As a remedy, one might employ equivariant architectures as an inductive bias [7] or weakly supervise the learning, e.g., with image-text pairs [29].

• Obtaining class exemplars for real-world data is not straightforward, which makes training the normalization network difficult. A potential solution is to maintain multiple exemplars per class.

• It is not clear that we can separate generalization and memorization for any continual learning problem. We plan to investigate this question on a real-world dataset.

Acknowledgements. This work was supported by the German Federal Ministry of Education and Research (BMBF): Tübingen AI Center, FKZ: 01IS18039A. This research utilized compute resources at the Tübingen Machine Learning Cloud, DFG FKZ INST 37/1057-1 FUGG. We thank the International Max Planck Research School for Intelligent Systems (IMPRS-IS) for supporting SD. This work was supported by the National Centre of Science (Poland) Grants No. 2020/39/B/ST6/01511 and 2022/45/B/ST6/02817.

References

This paper is available on arXiv under a CC 4.0 license.
