Authors:
(1) Sebastian Dziadzio, University of Tübingen;
(2) Çagatay Yıldız, University of Tübingen;
(3) Gido M. van de Ven, KU Leuven;
(4) Tomasz Trzcinski, IDEAS NCBR, Warsaw University of Technology, Tooploox;
(5) Tinne Tuytelaars, KU Leuven;
(6) Matthias Bethge, University of Tübingen.
2. Two problems with the current approach to class-incremental continual learning
3. Methods and 3.1. Infinite dSprites
4. Related work
4.1. Continual learning and 4.2. Benchmarking continual learning
5.1. Regularization methods and 5.2. Replay-based methods
5.4. One-shot generalization and 5.5. Open-set classification
Conclusion, Acknowledgments and References
In the last decade, continual learning research has made progress through parameter and functional regularization, rehearsal, and architectural strategies that mitigate forgetting by preserving important parameters or compartmentalizing knowledge. As pointed out in a recent survey [37], the best-performing continual learners rely on storing or synthesizing samples. Such methods are typically evaluated on sequential versions of standard computer vision datasets such as MNIST or CIFAR-100, which often involve only a small number of learning tasks, discrete task boundaries, and fixed data distributions. These benchmarks therefore do not match the lifelong nature of real-world learning.
Our work is motivated by the hypothesis that state-of-the-art continual learners and their predecessors would inevitably fail when trained in a truly lifelong fashion akin to human learning. To test this claim, we introduced the Infinite dSprites (idSprites) dataset, consisting of procedurally generated shapes and their affine transformations. To our knowledge, this is the first class-incremental continual learning benchmark that allows generating hundreds or even thousands of tasks. While we acknowledge the relatively simplistic nature of our dataset, we believe any lifelong learner must solve idSprites before tackling more complicated, real-world datasets. Indeed, our empirical findings show that all standard methods eventually collapse and that memory buffers can only delay the inevitable.
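To make the setup concrete, here is a minimal, hypothetical Python sketch of such a class-incremental stream: each task introduces a handful of brand-new procedurally generated shapes, and every sample is a random affine transformation (rotation, scale, translation) of one of them. This is only an illustration of the benchmark structure described above, not the actual idSprites generation code; all names (generate_shape, random_affine, task_stream) and parameter choices are ours.

```python
# Hypothetical sketch of a class-incremental stream of procedurally
# generated shapes under random affine transformations.
import numpy as np

rng = np.random.default_rng(0)

def generate_shape(num_vertices: int = 6) -> np.ndarray:
    """Sample a random closed polygon; each new shape is a new class."""
    angles = np.sort(rng.uniform(0.0, 2.0 * np.pi, num_vertices))
    radii = rng.uniform(0.5, 1.0, num_vertices)
    return np.stack([radii * np.cos(angles), radii * np.sin(angles)], axis=1)

def random_affine(vertices: np.ndarray) -> tuple[np.ndarray, dict]:
    """Apply a random rotation, scale, and translation; also return the factors."""
    theta = rng.uniform(0.0, 2.0 * np.pi)
    scale = rng.uniform(0.5, 1.5)
    shift = rng.uniform(-0.5, 0.5, size=2)
    rot = np.array([[np.cos(theta), -np.sin(theta)],
                    [np.sin(theta),  np.cos(theta)]])
    return scale * vertices @ rot.T + shift, {"angle": theta, "scale": scale, "shift": shift}

def task_stream(num_tasks: int, classes_per_task: int, samples_per_class: int):
    """Yield an arbitrarily long sequence of tasks, each with unseen classes."""
    for task_id in range(num_tasks):
        shapes = [generate_shape() for _ in range(classes_per_task)]
        task = []
        for offset, shape in enumerate(shapes):
            label = task_id * classes_per_task + offset
            for _ in range(samples_per_class):
                sample, factors = random_affine(shape)
                task.append((sample, label, factors))
        yield task_id, task

for task_id, task in task_stream(num_tasks=3, classes_per_task=4, samples_per_class=10):
    print(f"task {task_id}: {len(task)} samples")
```

Because the shapes are generated on the fly, the length of the stream is limited only by compute, which is what allows a benchmark of this kind to run for hundreds or thousands of tasks.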
Updating synaptic connections in the human brain upon novel experiences does not interfere with the general knowledge accumulated throughout life. Inspired by this insight, we propose a disentangled learning framework that splits the continual learning problem into (i) sequentially training a network that models the general aspects of the problem shared by all instances (equivariances) and (ii) memorizing class-specific information relevant to the task (exemplars). This separation enables disentangled model updating: equivariant representations are learned continually without catastrophic forgetting, and class-specific information is updated explicitly without harming the information corresponding to other classes. As demonstrated experimentally, this separation yields successful forward and backward transfer and strong one-shot generalization and open-set recognition performance.
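To illustrate how this separation could look in code, below is a minimal PyTorch-style sketch; it is not the authors' implementation, and the names NormalizationNet and ExemplarClassifier are hypothetical stand-ins for the two components. The shared network predicts an affine transformation that maps an input back to a canonical pose (the equivariance part), while classification reduces to a nearest-exemplar lookup in a per-class memory (the memorization part), so adding or updating one class never touches the others.

```python
# Hypothetical sketch of the split between a shared equivariance module
# and a per-class exemplar memory (not the exact architecture from the paper).
import torch
import torch.nn as nn
import torch.nn.functional as F

class NormalizationNet(nn.Module):
    """Predicts affine parameters that map an input image to a canonical pose."""
    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(1, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(32, 6),  # entries of a 2x3 affine matrix
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        theta = self.encoder(x).view(-1, 2, 3)
        grid = F.affine_grid(theta, x.size(), align_corners=False)
        return F.grid_sample(x, grid, align_corners=False)  # normalized image

class ExemplarClassifier:
    """Class-specific knowledge: one stored exemplar per class, updated in isolation."""
    def __init__(self):
        self.exemplars: dict[int, torch.Tensor] = {}

    def add_class(self, label: int, exemplar: torch.Tensor) -> None:
        self.exemplars[label] = exemplar  # never interferes with other classes

    def predict(self, normalized: torch.Tensor) -> int:
        # Nearest exemplar in the canonical pose (pixel space here for simplicity).
        distances = {label: torch.norm(normalized - ex).item()
                     for label, ex in self.exemplars.items()}
        return min(distances, key=distances.get)

# The normalization network is trained continually on the shared equivariance
# task; learning a new class only requires storing one exemplar.
net = NormalizationNet()
memory = ExemplarClassifier()
memory.add_class(0, torch.zeros(1, 1, 64, 64))
image = torch.randn(1, 1, 64, 64)
print(memory.predict(net(image)))
```

In this reading, catastrophic forgetting is avoided by construction: the only continually trained component solves a task shared across all classes, and everything class-specific lives in an explicitly editable memory.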
Limitations With this work, we aim to bring a fresh perspective and chart a new research direction in continual learning. To demonstrate our framework, we restrict ourselves to a simple dataset and build the appropriate inductive biases into our learning architecture. We acknowledge that, when applied to natural images, our approach would suffer from a number of issues, which we list below along with possible mitigation strategies.
• Real-world data does not come with perfect supervision signals, hindering the learning of equivariant networks. As a remedy, one might employ equivariant architectures as an inductive bias [7] or weakly supervise the learning, e.g. with image-text pairs [29].
• Obtaining class exemplars for real-world data is not straightforward, which makes training the normalization network difficult. A potential solution is to maintain multiple exemplars per class.
• It is not clear that we can separate generalization and memorization for any continual learning problem. We plan to investigate this question on a real-world dataset.
Acknowledgements This work was supported by the German Federal Ministry of Education and Research (BMBF): Tübingen AI Center, FKZ: 01IS18039A. This research utilized compute resources at the Tübingen Machine Learning Cloud, DFG FKZ INST 37/1057-1 FUGG. We thank the International Max Planck Research School for Intelligent Systems (IMPRS-IS) for supporting SD. This work was also supported by the National Science Centre (Poland) Grants No. 2020/39/B/ST6/01511 and 2022/45/B/ST6/02817.
[1] Rahaf Aljundi, Eugene Belilovsky, Tinne Tuytelaars, Laurent Charlin, Massimo Caccia, Min Lin, and Lucas Page-Caccia. Online continual learning with maximal interfered retrieval. In Advances in Neural Information Processing Systems 32, pages 11849–11860. Curran Associates, Inc., 2019. 5
[2] Rahaf Aljundi, Min Lin, Baptiste Goujaud, and Yoshua Bengio. Gradient based sample selection for online continual learning. Advances in neural information processing systems, 32, 2019. 5
[3] Craig Atkinson, Brendan McCane, Lech Szymanski, and Anthony Robins. Pseudo-recursal: Solving the catastrophic forgetting problem in deep neural networks. arXiv preprint arXiv:1802.03875, 2018. 5
[4] Arslan Chaudhry, Marcus Rohrbach, Mohamed Elhoseiny, Thalaiyasingam Ajanthan, Puneet K Dokania, Philip HS Torr, and Marc'Aurelio Ranzato. On tiny episodic memories in continual learning. arXiv preprint arXiv:1902.10486, 2019. 1, 5
[5] Ting Chen, Simon Kornblith, Mohammad Norouzi, and Geoffrey Hinton. A simple framework for contrastive learning of visual representations. In International conference on machine learning, pages 1597–1607. PMLR, 2020. 6, 11
[6] Chrisantha Fernando, Dylan Banarse, Charles Blundell, Yori Zwols, David Ha, Andrei A Rusu, Alexander Pritzel, and Daan Wierstra. PathNet: Evolution channels gradient descent in super neural networks. arXiv preprint arXiv:1701.08734, 2017. 1, 5
[7] Marc Finzi, Max Welling, and Andrew Gordon Wilson. A practical method for constructing equivariant multilayer perceptrons for arbitrary matrix groups. In International conference on machine learning, pages 3318–3328. PMLR, 2021. 8
[8] Muhammad Waleed Gondal, Manuel Wuthrich, Djordje Miladinovic, Francesco Locatello, Martin Breidt, Valentin Volchkov, Joel Akpo, Olivier Bachem, Bernhard Schölkopf, and Stefan Bauer. On the transfer of inductive bias from simulation to the real world: a new disentanglement dataset. In Advances in Neural Information Processing Systems 32 (NeurIPS 2019), pages 15714–15725. Curran Associates, Inc., 2019. 3
[9] Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Deep residual learning for image recognition. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 770–778, 2016. 3, 5, 11
[10] Timm Hess, Tinne Tuytelaars, and Gido M van de Ven. Two complementary perspectives to continual learning: Ask not only what to optimize, but also how. arXiv preprint arXiv:2311.04898, 2023. 5
[11] Irina Higgins, Sébastien Racanière, and Danilo Rezende. Symmetry-based representations for artificial and biological general intelligence. Frontiers in Computational Neuroscience, 16:836498, 2022. 1, 3
[12] David Isele and Akansel Cosgun. Selective experience replay for lifelong learning. In Proceedings of the AAAI Conference on Artificial Intelligence, 2018. 1, 5
[13] Ta-Chu Kao, Kristopher Jensen, Gido van de Ven, Alberto Bernacchia, and Guillaume Hennequin. Natural continual learning: success is a journey, not (just) a destination. Advances in neural information processing systems, 34:28067–28079, 2021. 5
[14] Diederik P Kingma and Jimmy Ba. Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980, 2014. 11
[15] James Kirkpatrick, Razvan Pascanu, Neil C. Rabinowitz, Joel Veness, Guillaume Desjardins, Andrei A. Rusu, Kieran Milan, John Quan, Tiago Ramalho, Agnieszka Grabska-Barwinska, Demis Hassabis, Claudia Clopath, Dharshan Kumaran, and Raia Hadsell. Overcoming catastrophic forgetting in neural networks. Proceedings of the National Academy of Sciences of the United States of America, 114(13):3521–3526, 2017. MAG ID: 2560647685. 1, 5
[16] Alex Krizhevsky, Ilya Sutskever, and Geoffrey E Hinton. Imagenet classification with deep convolutional neural networks. Advances in neural information processing systems, 25, 2012. 3
[17] Zhizhong Li and Derek Hoiem. Learning without forgetting. IEEE Transactions on Pattern Analysis and Machine Intelligence, 40(12):2935–2947, 2017. 1, 5, 6
[18] Zhizhong Li and Derek Hoiem. Learning without forgetting. IEEE transactions on pattern analysis and machine intelligence, 40(12):2935–2947, 2017. 11
[19] Francesco Locatello, Stefan Bauer, Mario Lucic, Gunnar Raetsch, Sylvain Gelly, Bernhard Schölkopf, and Olivier Bachem. Challenging common assumptions in the unsupervised learning of disentangled representations. In international conference on machine learning, pages 4114–4124, 2019. 3
[20] Vincenzo Lomonaco and Davide Maltoni. CORe50: a new dataset and benchmark for continuous object recognition. In Proceedings of the 1st Annual Conference on Robot Learning, pages 17–26. PMLR, 2017. 5
[21] Vincenzo Lomonaco, Lorenzo Pellegrini, Andrea Cossu, Antonio Carta, Gabriele Graffieti, Tyler L Hayes, Matthias De Lange, Marc Masana, Jary Pomponi, Gido M Van de Ven, et al. Avalanche: an end-to-end library for continual learning. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, pages 3600–3610, 2021. 6
[22] David Lopez-Paz and Marc'Aurelio Ranzato. Gradient Episodic Memory for Continual Learning. In Advances in Neural Information Processing Systems. Curran Associates, Inc., 2017. 5
[23] Loic Matthey, Irina Higgins, Demis Hassabis, and Alexander Lerchner. dSprites: Disentanglement testing sprites dataset. https://github.com/deepmind/dsprites-dataset/, 2017. 2, 4
[24] Michael McCloskey and Neal J. Cohen. Catastrophic interference in connectionist networks: The sequential learning problem. In Psychology of Learning and Motivation, Vol. 24, pages 109–165. Academic Press, 1989. 1
[25] Cuong V. Nguyen, Yingzhen Li, Thang D. Bui, and Richard E. Turner. Variational continual learning. In International Conference on Learning Representations, 2018. 5
[26] Pingbo Pan, Siddharth Swaroop, Alexander Immer, Runa Eschenhagen, Richard Turner, and Mohammad Emtiyaz E Khan. Continual Deep Learning by Functional Regularisation of Memorable Past. In Advances in Neural Information Processing Systems, pages 4453–4464. Curran Associates, Inc., 2020. 1, 5
[27] Adam Paszke, Sam Gross, Francisco Massa, Adam Lerer, James Bradbury, Gregory Chanan, Trevor Killeen, Zeming Lin, Natalia Gimelshein, Luca Antiga, et al. Pytorch: An imperative style, high-performance deep learning library. Advances in neural information processing systems, 32, 2019. 11
[28] Ameya Prabhu, Philip HS Torr, and Puneet K Dokania. GDumb: A simple approach that questions our progress in continual learning. In Computer Vision – ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part II 16, pages 524–540. Springer, 2020. 6, 11
[29] Alec Radford, Jong Wook Kim, Chris Hallacy, Aditya Ramesh, Gabriel Goh, Sandhini Agarwal, Girish Sastry, Amanda Askell, Pamela Mishkin, Jack Clark, et al. Learning transferable visual models from natural language supervision. In International conference on machine learning, pages 8748–8763. PMLR, 2021. 8
[30] Sylvestre-Alvise Rebuffi, Alexander Kolesnikov, Georg Sperl, and Christoph H. Lampert. iCaRL: Incremental classifier and representation learning. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2017. 1, 5
[31] Ryne Roady, Tyler L. Hayes, Hitesh Vaidya, and Christopher Kanan. Stream-51: Streaming classification and novelty detection from videos. In The IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops, 2020. 5
[32] David Rolnick, Arun Ahuja, Jonathan Schwarz, Timothy Lillicrap, and Gregory Wayne. Experience replay for continual learning. In Advances in Neural Information Processing Systems. Curran Associates, Inc., 2019. 1, 5
[33] Andrei A Rusu, Neil C Rabinowitz, Guillaume Desjardins, Hubert Soyer, James Kirkpatrick, Koray Kavukcuoglu, Razvan Pascanu, and Raia Hadsell. Progressive neural networks. arXiv preprint arXiv:1606.04671, 2016. 1, 5
[34] Hanul Shin, Jung Kwon Lee, Jaehong Kim, and Jiwon Kim. Continual learning with deep generative replay. Advances in neural information processing systems, 30, 2017. 5
[35] Naftali Tishby and Noga Zaslavsky. Deep learning and the information bottleneck principle. In 2015 IEEE Information Theory Workshop (ITW), pages 1–5. IEEE, 2015. 1, 3
[36] Michalis K. Titsias, Jonathan Schwarz, Alexander G. de G. Matthews, Razvan Pascanu, and Yee Whye Teh. Functional regularisation for continual learning with gaussian processes. In International Conference on Learning Representations, 2020. 1, 5
[37] Gido M van de Ven, Tinne Tuytelaars, and Andreas S Tolias. Three types of incremental learning. Nature Machine Intelligence, 4(12):1185–1197, 2022. 3, 4, 5, 8
[38] Eli Verwimp, Kuo Yang, Sarah Parisot, Lanqing Hong, Steven McDonagh, Eduardo Pérez-Pellitero, Matthias De Lange, and Tinne Tuytelaars. Clad: A realistic continual learning benchmark for autonomous driving. Neural Networks, 161: 659–669, 2023. 5
[39] Friedemann Zenke, Ben Poole, and Surya Ganguli. Continual learning through synaptic intelligence. In International conference on machine learning, pages 3987–3995. PMLR, 2017. 5, 6, 11
This paper is available on arXiv under a CC 4.0 license.