Authors:
(1) Xiaofan Yu, University of California San Diego, La Jolla, California, USA ([email protected]);
(2) Anthony Thomas, University of California San Diego, La Jolla, California, USA ([email protected]);
(3) Ivannia Gomez Moreno, CETYS University, Campus Tijuana, Tijuana, Mexico ([email protected]);
(4) Louis Gutierrez, University of California San Diego, La Jolla, California, USA ([email protected]);
(5) Tajana ล imuniฤ Rosing, University of California San Diego, La Jolla, USA ([email protected]).
8 Evaluation of LifeHD semi and LifeHDa
9 Discussions and Future Works
10 Conclusion, Acknowledgments, and References
In this section, we present the design of LifeHD, the first unsupervised HDC framework for lifelong learning in general edge IoT applications. Compared to operating in the original data space, HDC improves pattern separability through sparsity and high dimensionality, making it more resilient against catastrophic forgetting [52]. LifeHD preserves the advantages of HDC in computational efficiency and lifelong learning, while handling the input of unlabeled streaming data, which has not been achieved in previous work [18, 21, 22, 41, 65].
Fig. 4 gives an overview of how LifeHD works. The first step is HDC encoding of data into hypervectors as described in Sec. 3. Training samples ๐ are organized into batches of size ๐๐๐๐ง๐ and input into an optional fixed NN for feature extraction (e.g. for images and sound) and the encoding module. The encoded hypervectors ๐ (๐) are input to LifeHDโs two-tier memory design inspired by cognitive science studies [5], consisting of working memory and long-term memory. This memory system intelligently and dynamically manages historical patterns, stored as hypervectors and referred to as cluster HVs. As shown in Fig. 4, the working memory is designed with three components: novelty detection, cluster HV update and cluster HV merge. ๐ (๐) is first input into novelty detection step (1). An insertion to the cluster HVs is made if a novelty flag is raised, otherwise ๐ (๐) updates the existing cluster HVs (2). The third component, cluster HV merge (3), retrieves the cluster HVs from long-term memory, and merges similar cluster HVs into a supercluster via a novel spectral clustering-based merging algorithm [59]. The interaction between working and long-term memory happens as commonly encountered cluster HVs are copied to long-term memory, which we call consolidation (4). Finally, when the size
limit of either working or long-term memory is reached, the least recently used cluster HVs are forgotten (5).
ecently used cluster HVs are forgotten (โ5 ). All modules in LifeHD work collaboratively, making it adaptive and robust to continuously changing streams without relying on any form of prior knowledge. For example, in scenarios of distribution drift, LifeHD may generate new cluster HVs upon encountering drifted samples initially, which can later be merged into coarse clusters. This approach ensures that LifeHD can efficiently capture and retain historical patterns.
In the following, we discuss more details about the major components of LifeHD: novelty detection (Sec. 5.2), cluster HV update (Sec. 5.3), and cluster HV merging (Sec. 5.4). We summarize the important notations used in this paper in Table 1.
The initial novelty detection step (1 in Fig. 4) is crucial for identifying emerging patterns in the environment. SupposeM = {๐1, ...,๐๐} is the set of cluster HVs stored in the working memory. We gauge the "radius" of each cluster by tracking two scalars for each cluster HV ๐: ๐๐ and ๐ห๐ , which represent the mean cosine difference and standard difference between the cluster HV and its assigned inputs. Given ๐ (๐), we first identify the most similar cluster HV, denoted by ๐. LifeHD marks ๐ (๐) as โnovel" if it substantially differs from its nearest cluster HV. Specifically, this dissimilarity is measured by comparing cos(๐ (๐),๐๐) with a threshold based on the historical distance distribution of cluster HV ๐:
The hyperparameter ๐พ fine-tunes the sensitivity to novelties.
LifeHD recognizes new ๐ (๐) as prototypes and inserts them into the working memory. When reaching its size limit ๐, the working memory experiences forgetting (โ5 in Fig. 4). The least recently used (LRU) cluster HV, represented by ๐ฟ๐ ๐ = argmin๐ ๐=1 ๐๐ , is replaced. Here ๐ corresponds to the latest batch index where the cluster HV was accessed. A similar forgetting mechanism is configured for the long-term memory, where the last batch accessed is marked with ๐.
If novelty is not detected, indicating that ๐ (๐) closely matches cluster HV ๐, we proceed to update the cluster HV and its associated information (2 in Fig. 4). This update process involves bundling ๐ (๐) with cluster HV ๐๐ , akin to how class hypervectors are updated as described in Sec. 3, and updated ๐๐ and ๐ห๐ with their moving average
The hyperparameter ๐ผ adjusts the balance between historical and recent inputs, where a higher ๐ผ gives more weight to recent samples. Properly maintaining ๐๐ and ๐ห๐ is vital for tracking the โradiusโ of each cluster HV, affecting future novelty detection. We also increase the hit frequency โ๐๐ก๐ and refresh ๐๐ with current batch index ๐๐๐ฅ. โ๐๐ก๐ is further used to compared with a predetermined threshold โ๐๐ก๐กโ to decide when a working memory cluster HV appears sufficiently frequently to be consolidated to long-term memory (4 in Fig. 4). ๐๐ determines forgetting as described in the previous section. With this lightweight approach, LifeHD continually records temporal cluster HVs from the environment, while the most prominent cluster HVs are transferred to long-term memory.
Cluster HV merge (3 in Fig. 4) has the dual benefit of reducing memory use and of elucidating underlying similarity structure in the data. Intuitively, a group of cluster HVs can be merged if they are similar to each other and dissimilar from other cluster HVs. For instance, one might merge the cluster HVs for Bulldog and Chihuahua into a single โDogโ cluster HV, that remains distinct from the cluster HV for โTabby Catโ.
To merge the cluster HVs, we first construct a similarity graph defined over the cluster HVs from the long-term memory. The cluster HVs correspond to nodes, and a pair of cluster HVs are connected by an edge if they are sufficiently similar. We then merge the cluster HVs by computing a particular type of cut in the graph in a manner similar to spectral clustering [40]. This graph based formalism for clustering is able to capture complex types of cluster geometry and often substantially outperforms simpler approaches like K-Means [59]. We detail the steps of cluster HV merging in LifeHD below, while Fig. 5 offers an illustrative overview.
Step 2: Decomposition. We compute the Laplacian๐ = ๐ท โ๐ด, where ๐ท is the diagonal matrix in which ๐ท๐๐ = ร ๐ ๐ด๐๐ . We then compute the eigenvalues ๐1, .., ๐๐ฟ, sorted in increasing order, and eigenvectors ๐1, ..., ๐๐ฟ of ๐.
Step 3: Grouping. We infer ๐ = max๐โ [๐ฟ] ๐๐ โค ๐๐ข๐, and merge the cluster HVs by running K-Means on ๐1, ..., ๐๐ . The upperbound ๐๐ข๐ is a hyperparameter that adjusts the granularity of merging, with a smaller ๐๐ข๐ leading to smaller ๐ thus encouraging merging more aggressively.
Our merging approach is formally grounded, as discussed in [59]. It is a well-known fact that the eigenvectors of ๐ encode information about the connected components of G. When G has ๐ connected components, the eigenvalues ๐1 = ๐2 = ... = ๐๐ = 0. To recover these components, K-Means clustering on ๐1, ..., ๐๐ can be employed, as explained in [59]. However, practical scenarios may have a few inter-component edges that should ideally be distinct. For instance, when the similarity threshold is imprecisely set, erroneous edges may appear in the graph, causing ๐1, ..., ๐๐ to be only approximately zero. Our merging approach is designed to handle this situation by introducing ๐๐ข๐. The cluster HV merging is evaluated every ๐๐๐๐๐๐ batches, where ๐๐๐๐๐๐ is a hyperparameter that controls the trade-off between merging performance and computational latency. Both ๐๐ข๐ and ๐๐๐๐๐๐ are analyzed in Sec. 7.8, along with other key hyperparameters in LifeHD.
This paper is available on arxiv under CC BY-NC-SA 4.0 DEED license.