Lifelong Intelligence Beyond the Edge using Hyperdimensional Computing: Conclusion and References

Authors:
(1) Xiaofan Yu, University of California San Diego, La Jolla, California, USA (x1yu@ucsd.edu);
(2) Anthony Thomas, University of California San Diego, La Jolla, California, USA (ahthomas@ucsd.edu);
(3) Ivannia Gomez Moreno, CETYS University, Campus Tijuana, Tijuana, Mexico (ivannia.gomez@cetys.edu.mx);
(4) Louis Gutierrez, University of California San Diego, La Jolla, California, USA (l8gutierrez@ucsd.edu);
(5) Tajana Šimunić Rosing, University of California San Diego, La Jolla, USA (tajana@ucsd.edu).

Table of Links
Abstract and 1. Introduction
2 Related Work
3 Background on HDC
4 Problem Definition
5 LifeHD
6 Variants of LifeHD
7 Evaluation of LifeHD
8 Evaluation of LifeHDsemi and LifeHDa
9 Discussions and Future Works
10 Conclusion, Acknowledgments, and References

10 CONCLUSION

The ability to learn continuously and indefinitely in the presence of change, and without access to supervision, on a resource-constrained device is a crucial trait for future sensor systems. In this work, we design and deploy the first end-to-end system named LifeHD to learn continuously from real-world data streams without labels. Our approach is based on Hyperdimensional Computing (HDC), an emerging neurally-inspired paradigm for lightweight edge computing. LifeHD is built on a two-tier memory hierarchy including a working and a long-term memory, with collaborative components of novelty detection, online cluster HV update, and cluster HV merging for optimal lifelong learning performance. We further propose two extensions to LifeHD, LifeHDsemi and LifeHDa, to handle scarce labeled samples and power constraints. Practical deployments on typical edge platforms and three IoT scenarios demonstrate LifeHD’s improvement of up to 74.8% on unsupervised clustering accuracy and up to 34.3x on energy efficiency compared to state-of-the-art NN-based unsupervised lifelong learning baselines [13, 14, 54].
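As a rough illustration of two of the components named in the conclusion, novelty detection and online cluster HV update, the following is a minimal sketch and not the LifeHD implementation. The random-projection encoder, the dimensionality `DIM`, the threshold `NOVELTY_THRESHOLD`, and the `WorkingMemory` class are all hypothetical choices made for illustration; cluster HV merging and the long-term memory are left out.

```python
import math
import random

DIM = 2048                # hypervector dimensionality (assumed value)
NOVELTY_THRESHOLD = 0.3   # cosine similarity below this counts as novel (assumed value)


def make_projection(num_features, dim=DIM, seed=0):
    """Draw a fixed random Gaussian projection matrix (dim x num_features)."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(num_features)] for _ in range(dim)]


def encode(x, projection):
    """Map a feature vector to a bipolar (+1/-1) hypervector via random projection."""
    return [1.0 if sum(w * v for w, v in zip(row, x)) >= 0 else -1.0
            for row in projection]


def cosine(a, b):
    """Cosine similarity between two equally sized vectors."""
    dot = sum(p * q for p, q in zip(a, b))
    norm = math.sqrt(sum(p * p for p in a)) * math.sqrt(sum(q * q for q in b))
    return dot / norm if norm else 0.0


class WorkingMemory:
    """Hypothetical working memory holding one accumulated HV per cluster."""

    def __init__(self, threshold=NOVELTY_THRESHOLD):
        self.threshold = threshold
        self.clusters = []

    def observe(self, hv):
        """Return (cluster_index, is_novel) for one incoming hypervector."""
        if self.clusters:
            sims = [cosine(hv, c) for c in self.clusters]
            best = max(range(len(sims)), key=sims.__getitem__)
            if sims[best] >= self.threshold:
                # Online update: bundle the sample into the closest cluster HV.
                self.clusters[best] = [c + h for c, h in zip(self.clusters[best], hv)]
                return best, False
        # Novelty: no existing cluster is similar enough, so start a new one.
        self.clusters.append(list(hv))
        return len(self.clusters) - 1, True
```

Each unlabeled sample is encoded once and then either bundled (added element-wise) into its most similar cluster HV or used to open a new cluster. Bundling keeps the per-sample cost to a handful of similarity checks and one vector addition, which is the kind of lightweight operation that makes HDC attractive on edge devices.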
ACKNOWLEDGMENTS

The authors would like to thank the anonymous shepherd, reviewers, and our colleague Xiyuan Zhang for their valuable feedback. This work was supported in part by the National Science Foundation under Grants #2003279, #1826967, #2100237, #2112167, #1911095, #2112665, and in part by PRISM and CoCoSys, centers in JUMP 2.0, an SRC program sponsored by DARPA.

REFERENCES

[1] 2023. Jetson TX2 Module. https://developer.nvidia.com/embedded/jetson-tx2. [Online].
[2] 2023. Raspberry Pi 4B. https://www.raspberrypi.com/products/raspberry-pi-4-model-b/. [Online].
[3] 2023. Raspberry Pi Zero 2 W. https://www.raspberrypi.com/products/raspberry-pi-zero-2-w/. [Online].
[4] Aurore Avarguès-Weber et al. 2012. Simultaneous mastering of two abstract concepts by the miniature brain of bees. Proceedings of the National Academy of Sciences 109, 19 (2012), 7481–7486.
[5] Alan Baddeley. 1992. Working memory. Science 255, 5044 (1992), 556–559.
[6] Oresti Banos, Rafael Garcia, and Alejandro Saez. 2014. MHEALTH Dataset. UCI Machine Learning Repository. DOI: https://doi.org/10.24432/C5TW22.
[7] Bernd Bischl et al. 2023. Hyperparameter optimization: Foundations, algorithms, best practices, and open challenges. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery 13, 2 (2023), e1484.
[8] Trenton Bricken et al. 2023. Sparse Distributed Memory is a Continual Learner. In International Conference on Learning Representations.
[9] Han Cai et al. 2020. Tinytl: Reduce memory, not parameters for efficient on-device learning. Advances in Neural Information Processing Systems 33 (2020), 11285–11297.
[10] Ning Chen et al. 2016. Smart urban surveillance using fog computing. In 2016 IEEE/ACM Symposium on Edge Computing (SEC). IEEE, 95–96.
[11] Arpan Dutta et al. 2022. Hdnn-pim: Efficient in memory design of hyperdimensional computing with feature extraction. In Proceedings of the Great Lakes Symposium on VLSI 2022. 281–286.
[12] Ehab Essa and Islam R Abdelmaksoud. 2023. Temporal-channel convolution with self-attention network for human activity recognition using wearable sensors. Knowledge-Based Systems 278 (2023), 110867.
[13] Divyam Madaan et al. 2022. Representational Continuity for Unsupervised Continual Learning. In International Conference on Learning Representations.
[14] Enrico Fini et al. 2022. Self-supervised models are continual learners. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition.
[15] In Gim and JeongGil Ko. 2022. Memory-efficient DNN training on mobile devices. In Proceedings of the 20th Annual International Conference on Mobile Systems, Applications and Services. 464–476.
[16] Jean-Bastien Grill et al. 2020. Bootstrap your own latent - a new approach to self-supervised learning. Advances in neural information processing systems 33 (2020), 21271–21284.
[17] Nathan Halko, Per-Gunnar Martinsson, and Joel A Tropp. 2011. Finding structure with randomness: Probabilistic algorithms for constructing approximate matrix decompositions. SIAM review 53, 2 (2011), 217–288.
[18] Michael Hersche et al. 2022. Constrained few-shot class-incremental learning. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 9057–9067.
[19] Hioki. 2023. Hioki3334 Powermeter. https://www.hioki.com/en/products/detail/?product_key=5812.
[20] Andrew Howard et al. 2019. Searching for mobilenetv3. In Proceedings of the IEEE/CVF International Conference on Computer Vision. 1314–1324.
[21] Mohsen Imani et al. 2019. Hdcluster: An accurate clustering using brain-inspired high-dimensional computing. In Design, Automation & Test in Europe Conference & Exhibition (DATE). IEEE, 1591–1594.
[22] Mohsen Imani et al. 2019. Semihd: Semi-supervised learning using hyperdimensional computing. In IEEE/ACM International Conference on Computer-Aided Design (ICCAD). IEEE, 1–8.
[23] Mohsen Imani, Deqian Kong, Abbas Rahimi, and Tajana Rosing. 2017. Voicehd: Hyperdimensional computing for efficient speech recognition. In IEEE International Conference on Rebooting Computing (ICRC). IEEE, 1–8.
[24] Pentti Kanerva. 2009. Hyperdimensional computing: An introduction to computing in distributed representation with high-dimensional random vectors. Cognitive Computation 1 (2009), 139–159.
[25] Behnam Khaleghi, Mohsen Imani, and Tajana Rosing. 2020. Prive-hd: Privacy-preserved hyperdimensional computing. In ACM/IEEE Design Automation Conference (DAC). IEEE, 1–6.
[26] Hyeji Kim, Muhammad Umar Karim Khan, and Chong-Min Kyung. 2019. Efficient neural network compression. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition. 12569–12577.
[27] Yeseong Kim, Mohsen Imani, and Tajana S Rosing. 2018. Efficient human activity recognition using hyperdimensional computing. In Proceedings of the 8th International Conference on the Internet of Things. 1–6.
[28] James Kirkpatrick et al. 2017. Overcoming catastrophic forgetting in neural networks. Proceedings of the national academy of sciences (2017).
[29] Alex Krizhevsky, Geoffrey Hinton, et al. 2009. Learning multiple layers of features from tiny images. (2009).
[30] Young D Kwon et al. 2023. LifeLearner: Hardware-Aware Meta Continual Learning System for Embedded Computing Platforms. In Proceedings of the 21st ACM Conference on Embedded Networked Sensor Systems.
[31] Soochan Lee et al. 2020. A Neural Dirichlet Process Mixture Model for Task-Free Continual Learning. In International Conference on Learning Representations.
[32] Ji Lin et al. 2020. Mcunet: Tiny deep learning on iot devices. Advances in Neural Information Processing Systems 33 (2020), 11711–11722.
[33] Ji Lin et al. 2021. Memory-efficient patch-based inference for tiny deep learning. Advances in Neural Information Processing Systems 34 (2021), 2346–2358.
[34] Ji Lin et al. 2022. On-device training under 256kb memory. Advances in Neural Information Processing Systems 35 (2022), 22941–22954.
[35] David Lopez-Paz and Marc’Aurelio Ranzato. 2017. Gradient episodic memory for continual learning. Advances in neural information processing systems 30 (2017).
[36] Michael McCloskey and Neal J Cohen. 1989. Catastrophic interference in connectionist networks: The sequential learning problem. In Psychology of learning and motivation. Vol. 24. Elsevier, 109–165.
[37] Md Mohaimenuzzaman et al. 2023. Environmental Sound Classification on the Edge: A Pipeline for Deep Acoustic Networks on Extremely Resource-Constrained Devices. Pattern Recognition 133 (2023), 109025.
[38] Ali Moin et al. 2021. A wearable biosensing system with in-sensor adaptive machine learning for hand gesture recognition. Nature Electronics 4, 1 (2021), 54–63.
[39] James O’Neill. 2020. An overview of neural network compression. arXiv preprint arXiv:2006.03669 (2020).
[40] Andrew Ng, Michael Jordan, and Yair Weiss. 2001. On spectral clustering: Analysis and an algorithm. Advances in neural information processing systems 14 (2001).
[41] Evgeny Osipov et al. 2022. Hyperseed: Unsupervised learning with vector symbolic architectures. IEEE Transactions on Neural Networks and Learning Systems (2022).
[42] German I Parisi, Ronald Kemker, Jose L Part, Christopher Kanan, and Stefan Wermter. 2019. Continual lifelong learning with neural networks: A review. Neural networks 113 (2019), 54–71.
[43] Adam Paszke et al. 2019. Pytorch: An imperative style, high-performance deep learning library. Advances in neural information processing systems 32 (2019).
[44] Karol J Piczak. 2015. ESC: Dataset for environmental sound classification. In Proceedings of the 23rd ACM international conference on Multimedia. 1015–1018.
[45] Christos Profentzas, Magnus Almgren, and Olaf Landsiedel. 2022. MiniLearn: On-Device Learning for Low-Power IoT Devices. In International Conference on Embedded Wireless Systems and Networks.
[46] Dushyant Rao, Francesco Visin, Andrei Rusu, Razvan Pascanu, Yee Whye Teh, and Raia Hadsell. 2019. Continual unsupervised representation learning. Advances in neural information processing systems 32 (2019).
[47] Haoyu Ren, Darko Anicic, and Thomas A Runkler. 2021. Tinyol: Tinyml with online-learning on microcontrollers. In 2021 International Joint Conference on Neural Networks (IJCNN). IEEE, 1–8.
[48] Olga Russakovsky et al. 2015. Imagenet large scale visual recognition challenge. International journal of computer vision 115 (2015), 211–252.
[49] Andrei A Rusu et al. 2016. Progressive neural networks. arXiv preprint arXiv:1606.04671 (2016).
[50] Swapnil Sayan Saha et al. 2023. TinyNS: Platform-Aware Neurosymbolic Auto Tiny Machine Learning. ACM Transactions on Embedded Computing Systems (2023).
[51] Mark Sandler et al. 2018. Mobilenetv2: Inverted residuals and linear bottlenecks. In Proceedings of the IEEE conference on computer vision and pattern recognition. 4510–4520.
[52] Yang Shen, Sanjoy Dasgupta, and Saket Navlakha. 2021. Algorithmic insights on continual learning from fruit flies. arXiv preprint arXiv:2107.07617 (2021).
[53] Shun Shunhou and Yang Peng. 2022. AIoT on Cloud. In Digital Transformation in Cloud Computing. CRC Press, 629–732.
[54] James Smith et al. 2021. Unsupervised Progressive Learning and the STAM Architecture. In Proceedings of the Thirtieth International Joint Conference on Artificial Intelligence, IJCAI-21. 2979–2987.
[55] Ke Sun, Chen Chen, and Xinyu Zhang. 2020. "Alexa, stop spying on me!" speech privacy protection against voice assistants. In Proceedings of the 18th conference on Embedded Networked Sensor Systems. 298–311.
[56] Anthony Thomas, Sanjoy Dasgupta, and Tajana Rosing. 2021. A theoretical perspective on hyperdimensional computing. Journal of Artificial Intelligence Research 72 (2021), 215–249.
[57] Matteo Tiezzi et al. 2022. Stochastic Coherence Over Attention Trajectory For Continuous Learning In Video Streams. In Proceedings of the Thirty-First International Joint Conference on Artificial Intelligence, IJCAI-22. 3480–3486.
[58] Rishabh Tiwari et al. 2022. Gcr: Gradient coreset based replay buffer selection for continual learning. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 99–108.
[59] Ulrike Von Luxburg. 2007. A tutorial on spectral clustering. Statistics and computing 17 (2007), 395–416.
[60] Erwei Wang et al. 2019. Deep neural network approximation for custom hardware: Where we’ve been, where we’re going. ACM Computing Surveys (CSUR) 52, 2 (2019), 1–39.
[61] Qipeng Wang et al. 2022. Melon: Breaking the memory wall for resource-efficient on-device machine learning. In Proceedings of the 20th Annual International Conference on Mobile Systems, Applications and Services. 450–463.
[62] Gary M Weiss et al. 2016. Smartwatch-based activity recognition: A machine learning approach. In 2016 IEEE-EMBS International Conference on Biomedical and Health Informatics (BHI). IEEE, 426–429.
[63] Junyuan Xie, Ross Girshick, and Ali Farhadi. 2016. Unsupervised deep embedding for clustering analysis. In International Conference on Machine Learning. PMLR, 478–487.
[64] Daliang Xu et al. 2022. Mandheling: Mixed-precision on-device dnn training with dsp offloading. In Proceedings of the 28th Annual International Conference on Mobile Computing And Networking. 214–227.
[65] Weihong Xu, Jaeyoung Kang, and Tajana Rosing. 2023. FSL-HD: Accelerating Few-Shot Learning on ReRAM using Hyperdimensional Computing. In 2023 Design, Automation & Test in Europe Conference & Exhibition (DATE). IEEE, 1–6.
[66] Junting Zhang et al. 2020. Class-incremental learning via deep model consolidation. In Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision. 1131–1140.
[67] Yu Zhang, Tao Gu, and Xi Zhang. 2020. MDLdroidLite: A release-and-inhibit control approach to resource-efficient deep neural networks on mobile devices. In Proceedings of the 18th Conference on Embedded Networked Sensor Systems. 463–475.
This paper is available on arXiv under CC BY-NC-SA 4.0 DEED license.