Table of links

Abstract
1 Introduction
2 Background and Related Work
2.1 Different Formulations of the Log-based Anomaly Detection Task
2.2 Supervised v.s. Unsupervised
2.3 Information within Log Data
2.4 Fix-Window Grouping
2.5 Related Works
3 A Configurable Transformer-based Anomaly Detection Approach
3.1 Problem Formulation
3.2 Log Parsing and Log Embedding
3.3 Positional & Temporal Encoding
3.4 Model Structure
3.5 Supervised Binary Classification
4 Experimental Setup
4.1 Datasets
4.2 Evaluation Metrics
4.3 Generating Log Sequences of Varying Lengths
4.4 Implementation Details and Experimental Environment
5 Experimental Results
5.1 RQ1: How does our proposed anomaly detection model perform compared to the baselines?
5.2 RQ2: How much does the sequential and temporal information within log sequences affect anomaly detection?
5.3 RQ3: How much do the different types of information individually contribute to anomaly detection?
6 Discussion
7 Threats to validity
8 Conclusions and References

2 Background and Related Work

2.1 Different Formulations of the Log-based Anomaly Detection Task

Previous works formulate the log-based anomaly detection task differently. Generally, the common formulations can be classified into the following categories.

Binary Classification
The most common way to formulate the log-based anomaly detection task is to transform it into a binary classification task, where machine learning models classify logs or log sequences as anomalous or normal [1]. Both supervised [18–20] and unsupervised [8] classifiers can be used under this formulation. In unsupervised schemes, a threshold on the degree of pattern violation is usually used to decide whether a sample is an anomaly.

Future Event Prediction
Some approaches formulate the anomaly detection task as a prediction task [10]. Usually, sequential models are trained to predict the next event given the past few logs within a fixed window. In the prediction phase, the model is expected to output the Top-N most probable candidates for the next event. If the actual event is not among the predicted candidates, the unexpected log is considered an anomaly that violates the normal pattern of log sequences.

Masked Log Prediction
The log-based anomaly detection task can also be formulated as a masked log prediction task [21], where models trained on normal log sequences are expected to predict randomly masked log events in a log sequence. As in future event prediction, a log sequence is considered normal if the log events that actually appear in it are among the predicted candidates.

Others
Some works formulate the anomaly detection task as a clustering task, where feature vectors of normal and abnormal log sequences are expected to fall into different clusters [22]. The label of a log sequence is predicted from the distance between that sequence and the cluster centroids. Moreover, there are previous approaches that utilize invariant mining [9] to tackle the task.
They identify anomalies by discerning pattern violations of feature vectors of log sequences.

2.2 Supervised v.s. Unsupervised

Formulations of the anomaly detection task also differ in their training mechanism. Supervised anomaly detection methods require labeled logs as training data to learn to discern abnormal samples from normal ones, while unsupervised methods learn the normal pattern from normal log data and do not require labels during training. Unsupervised methods are more practical because well-annotated log data is usually not available. However, supervised methods usually achieve superior and more stable performance according to previous empirical studies.

2.3 Information within Log Data

Generally, log data, which is formed by sequences of log events, contains various types of information. Within a log sequence, the occurrences of logs from different templates serve as context and are a distinctive feature of the sequence. Similar to the Bag-of-Words model, a numerical representation based on the frequency of template occurrences can represent log sequences and be used in anomaly detection. Various works [1] use the message count vector (MCV) to represent this information.

Moreover, the sequential information within the log items provides richer information about the occurrences of logs and probably reflects the execution sequence of applications and services. DeepLog [10] uses an LSTM model to encode the sequential information.

Furthermore, the temporal information in the log data provides even richer details about the occurrence of logs. The time intervals between log events may offer valuable insights about system status, workload, and potential blocks for anomaly detection and other log analysis tasks. Du et al. [10] tried to use this information in a parameter value anomaly detection model.
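
The three kinds of information described above (template occurrence counts, event order, and time intervals) can be illustrated with a small sketch. This is only a toy example under stated assumptions: the template identifiers, timestamps, and function names are hypothetical and are not taken from any of the cited works.

```python
from collections import Counter

# Hypothetical parsed log sequence: (timestamp in seconds, template id).
# Template ids and timestamps are made up for illustration.
sequence = [
    (0.0, "E1"),
    (0.5, "E2"),
    (1.0, "E1"),
    (4.0, "E3"),
]

# Assumed fixed template vocabulary; it defines the layout of the count vector.
templates = ["E1", "E2", "E3", "E4"]


def message_count_vector(seq, vocab):
    """Count how often each template occurs, ignoring order (bag-of-words style)."""
    counts = Counter(event for _, event in seq)
    return [counts.get(t, 0) for t in vocab]


def event_order(seq):
    """Keep the templates in order of occurrence (sequential information)."""
    return [event for _, event in seq]


def time_intervals(seq):
    """Elapsed time between consecutive events (temporal information)."""
    stamps = [ts for ts, _ in seq]
    return [later - earlier for earlier, later in zip(stamps, stamps[1:])]


mcv = message_count_vector(sequence, templates)
order = event_order(sequence)
gaps = time_intervals(sequence)

print(mcv)    # [2, 1, 1, 0]
print(order)  # ['E1', 'E2', 'E1', 'E3']
print(gaps)   # [0.5, 0.5, 3.0]
```

The count vector discards both order and timing, the ordered list keeps order but not timing, and the interval list keeps only timing, which is why the three views can be studied separately.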

Besides, textual or semantic information provided by log messages has garnered significant attention in recent studies [5, 11, 12]. Log messages are written by developers and describe the system's operations, errors, and events in natural language, which makes them valuable for troubleshooting and system analysis. Various natural language processing techniques are employed to extract textual features and generate embeddings for log messages. These range from basic numerical statistics such as TF-IDF, through word embedding techniques like Word2Vec, to contextual embedding methods like BERT, and each step aims to capture the semantic information in log messages more accurately. The objective is to distinguish between unrelated logs and connect similar ones, thereby supplying more informative and distinguishable features for downstream models.

In addition, the parameters carried by the log messages offer more diverse information about the systems. However, as most parameters are system-specific and lack a consistent format or range, deciding on the best way to model the information from different parameters is a formidable challenge. In most previous works, the parameters, which are usually numbers and tokens, are removed in the pre-processing stages. In DeepLog [10], a parameter value anomaly detection model for each log key (i.e., log template) is used to detect anomalies associated with parameter values, as an auxiliary measure to the log key anomaly detection model. In a more recent study [12], a parameter encoding module is employed to produce character-level encodings for parameters. Each output is then assigned a learnable scalar, which functions as a bias term within the self-attention mechanism.
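
As a rough illustration of the pre-processing step mentioned above, in which parameters are stripped from log messages, the following sketch replaces a few kinds of parameter values with placeholders. The regular expressions, the placeholder names, and the sample message are assumptions chosen for this example; real systems need their own patterns, and the cited works may mask parameters differently.

```python
import re

# Assumed masking rules, applied in order. More specific patterns come first
# so that the generic number rule does not break up addresses or block ids.
PARAM_PATTERNS = [
    (re.compile(r"\b\d{1,3}(?:\.\d{1,3}){3}(?::\d+)?\b"), "<IP>"),  # IPv4, optional port
    (re.compile(r"\bblk_-?\d+\b"), "<BLK>"),                        # block-style identifier
    (re.compile(r"\b\d+\b"), "<NUM>"),                              # any remaining integer
]


def mask_parameters(message):
    """Replace parameter values with placeholders, leaving the constant text."""
    for pattern, placeholder in PARAM_PATTERNS:
        message = pattern.sub(placeholder, message)
    return message


# Made-up log message used only to exercise the rules above.
raw = "Received block blk_123 of size 67108864 from 10.0.0.5:50010"
masked = mask_parameters(raw)
print(masked)  # Received block <BLK> of size <NUM> from <IP>
```

Masking of this kind keeps the constant part of the message for template matching or embedding, at the cost of discarding the parameter values themselves.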

Moreover, log data generated by various systems and applications often contains system-specific information that may require domain-specific knowledge and tailored approaches to optimize the performance of downstream tasks.

2.4 Fix-Window Grouping

Available public datasets for log-based anomaly detection have either sequence-level or event-level annotations. For datasets that do not have a grouping identifier, fixed-length or fixed-time grouping is often employed during pre-processing to form log sequences that can be processed by log representation techniques and anomaly detection models. Various grouping settings have been used in previous studies for public datasets [1]. Different grouping settings generate different numbers of samples and different contextual windows of log data, making direct comparisons of the reported performance impossible. Moreover, logs are not generated at fixed rates or in fixed lengths, so using fixed-window grouped log sequences as training and testing samples does not match real scenarios.

2.5 Related Works

Recent empirical studies on log-based anomaly detection aim to deepen the understanding of existing log-based anomaly detection models and the public datasets used for evaluation. They focus on several issues. Le et al. [15] conducted an in-depth analysis of recent deep-learning anomaly detection models over several aspects of model evaluation. Their findings suggest that different settings of the stages in anomaly detection greatly impact the evaluation process. Therefore, using diverse datasets and analyzing logical relationships between logs are important for assessing log-based anomaly detection approaches.

Wu et al. [7] conducted an empirical study on vectorization (i.e., representation) techniques for log-based anomaly detection. They evaluated the effectiveness of some existing classical and semantic-based techniques with different anomaly detection models.
Their experimental results suggest that the classical ways of transforming textual logs into feature vectors can achieve results competitive with more complex semantic embeddings.

A more recent work [23] compared classical and deep-learning approaches to log-based anomaly detection. Its experimental results also suggest that simple models can outperform complex log vectorization methods, and that the deep learning approaches fail to surpass the simpler techniques. This work highlights the need to critically analyze the datasets used in evaluation.

Moreover, Landauer et al. [16] critically reviewed the common log datasets used to evaluate anomaly detection techniques. Their analysis suggests that most anomalies in these datasets are not directly associated with sequential information within the log sequence, so sophisticated detection methods are unnecessary for attaining excellent detection performance. Their findings also highlight the need to create new datasets that incorporate sequential anomalies for evaluating anomaly detection approaches.

In our work, we proposed a Transformer-based anomaly detection model capable of capturing sequential and temporal information within the log sequence, in addition to event occurrence and semantic information. Due to the flexibility of the proposed model, we can easily use various combinations of log features as input for our evaluations. Through a series of carefully designed experiments, we scrutinized the four common public datasets and deepened our understanding of the roles of different types of information in identifying anomalies within the log sequence. Our findings are generally in accordance with the previous empirical studies. However, our analysis offers a more comprehensive and detailed understanding of the anomaly detection task and the studied public datasets.

Authors: Xingfang Wu, Heng Li, Foutse Khomh

This paper is available on arxiv under CC by 4.0 Deed (Attribution 4.0 International) license.