Support Vector Data Description (SVDD) is a popular boundary-based method used in machine learning for anomaly detection. The goal of SVDD is to create a model that captures the characteristics of normal (non-anomalous) data and then identifies instances that deviate from these characteristics as anomalies.
Anomaly detection finds extensive use in various applications, such as fraud detection for credit cards, insurance, or health care, intrusion detection for cyber-security, fault detection in safety-critical systems, and military surveillance for enemy activities.
Imagine you have a set of data points, and most of them represent normal behavior. SVDD aims to create a boundary around these normal data points in such a way that the majority of the data falls inside this boundary. Any data point outside this boundary is then considered an anomaly or an outlier.
In other words, we are teaching a computer to recognize what "normal" looks like based on a set of examples, so that it can flag something as "unusual" if it doesn't fit the learned pattern.
In this article, we deep-dive into the fundamental concepts of SVDD, exploring the utilization of privileged information during the training phase — a technique aimed at enhancing classification accuracy in anomaly detection scenarios.
As noted above, a classical approach to anomaly detection is to describe expected ("normal") behavior using one-class classification techniques, i.e., to construct a description of a "normal" state from many examples, e.g., by describing the region that training patterns occupy in a feature space. If a new test pattern does not belong to the "normal" class, we consider it anomalous.
To construct a "normal" domain, we can use well-known approaches such as the Support Vector Domain Description.
We start with a brief explanation of the original SVDD without privileged information. We are given i.i.d. samples (x1, . . . , xl).
The main idea of this algorithm is to separate a significant part of the samples considered "normal" from those considered "abnormal" in some sense. We denote by φ(·) a mapping of the original data points into some more expressive feature space: for example, adding polynomial features, applying feature extraction with a deep neural net, or even assuming the mapping lands in some infinite-dimensional space.
Let a be some point in the image of the feature map and R be some positive value. A pattern x belongs to the "normal" class if it is inside the sphere ∥a − φ(x)∥ ≤ R. To find the center a and the radius R, we solve the optimization problem:
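A standard way to write it, in the ν-parameterization (the same ν that appears in the positivity condition below), is:

$$\min_{R,\, a,\, \xi}\; R^2 + \frac{1}{\nu l}\sum_{i=1}^{l} \xi_i \quad \text{s.t.}\quad \|\varphi(x_i) - a\|^2 \le R^2 + \xi_i,\quad \xi_i \ge 0,\quad i = 1, \dots, l.$$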
Here ξi is a slack variable: the distance from xi to the surface of the sphere for points located outside it; if a point is inside the sphere, we set ξi = 0. The variable R can be interpreted as a radius only if we require it to be positive. However, it can easily be proved that this condition is automatically fulfilled if ν ∈ (0, 1), and for ν ∉ (0, 1) the solution either contains all points or none of them.
As you can probably guess, since we have "support" in the name of the algorithm, we will be solving the dual problem:
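Introducing Lagrange multipliers αi for the sphere constraints and eliminating the primal variables yields the standard form:

$$\max_{\alpha}\; \sum_{i=1}^{l} \alpha_i K(x_i, x_i) - \sum_{i,j=1}^{l} \alpha_i \alpha_j K(x_i, x_j) \quad \text{s.t.}\quad 0 \le \alpha_i \le \frac{1}{\nu l},\quad \sum_{i=1}^{l} \alpha_i = 1.$$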
Here we replace the scalar product (φ(xi) · φ(xj)) with the corresponding kernel K(xi, xj). From the solution we can calculate a and R: the center is a = Σi αi φ(xi), and R is the distance from a to any support vector xi with 0 < αi < 1/(νl), i.e., one lying exactly on the sphere.
Based on this, we can define the decision function:
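In kernel form, with a = Σi αi φ(xi) substituted in, it reads:

$$f(x) = \|\varphi(x) - a\|^2 - R^2 = K(x, x) - 2\sum_{i=1}^{l} \alpha_i K(x_i, x) + \sum_{i,j=1}^{l} \alpha_i \alpha_j K(x_i, x_j) - R^2.$$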
If f(x) > 0, then a pattern x is located outside the sphere and is considered anomalous. Also, notice that f(x) returns a real value, so we can tune the decision threshold to achieve a target balance of true positive and true negative rates.
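As a minimal practical sketch: for kernels with a constant K(x, x), such as the RBF kernel, SVDD is known to coincide with the ν-one-class SVM, so scikit-learn's OneClassSVM can stand in for it. The data and parameter values below are purely illustrative:

```python
import numpy as np
from sklearn.svm import OneClassSVM

rng = np.random.default_rng(0)
X_train = rng.normal(size=(200, 2))             # "normal" behavior only
X_test = np.array([[0.1, -0.2], [4.0, 4.0]])    # one typical point, one outlier

# With an RBF kernel, K(x, x) is constant, so the nu-one-class SVM solves
# the same problem as SVDD; nu plays the same role as in the text above.
model = OneClassSVM(kernel="rbf", nu=0.05, gamma=0.5).fit(X_train)

print(model.predict(X_test))            # +1 = inside the boundary, -1 = anomaly
# Note: sklearn's decision_function is positive INSIDE the boundary,
# i.e., the opposite sign convention to f(x) above.
print(model.decision_function(X_test))  # tune a threshold on this value
```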
For the original two-class Support Vector Machine, an algorithm creating an optimal boundary between different classes of data points, Vapnik and Vashist proposed learning using privileged information (LUPI): extra features that are available only for the training examples are used to guide the search for a better decision rule. The same idea can be applied to SVDD.
Let us provide some examples of privileged information. If we solve an image classification problem, then as privileged information we can use a textual description of the image. In the case of malware detection, we can use the source code of the malware to get additional features for the classification.
Such information is unavailable during the test phase (e.g., it could be computationally prohibitive or too costly to obtain) when we use the trained model for anomaly detection and classification. Still, it can be used during the training phase.
Let's assume that the training data comes in pairs (xi, xi*). For example, imagine we are trying to detect anomalies in X-ray images, and for each training image we also have the doctor's written description. Such a description is usually more than informative enough, but producing it requires an expert's assistance, so at prediction time only the images are available. Can the descriptions be used during model training while predictions are made from images alone? Yes, it is possible to use this additional information to improve detection.
In the previous formulation, the error of each training pattern appears in the form of the slack ξi. Let's assume that the privileged data is so good that it can predict the size of this error:
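Following the SVM+ idea of a "correcting function", the slack is modeled as a linear function in a privileged feature space φ*(·), with parameters w* and b*:

$$\xi_i = (w^* \cdot \varphi^*(x_i^*)) + b^*.$$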
We can think of this as an intelligent teacher who tells us during training that a small error is simply not achievable on a particular example, so it is reasonable to concentrate on other, more valuable examples.
Now, let's write down this monster-like equation:
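A sketch of this formulation, following the SVM+ construction and consistent with the roles of γ and ζi described next (the exact parameterization may differ in other write-ups):

$$\min_{R,\, a,\, w^*,\, b^*,\, \zeta}\; R^2 + \frac{1}{\nu l}\sum_{i=1}^{l}\Big[(w^* \cdot \varphi^*(x_i^*)) + b^* + \zeta_i\Big] + \frac{1}{2\gamma}\|w^*\|^2$$

subject to

$$\|\varphi(x_i) - a\|^2 \le R^2 + (w^* \cdot \varphi^*(x_i^*)) + b^*,\qquad (w^* \cdot \varphi^*(x_i^*)) + b^* + \zeta_i \ge 0,\qquad \zeta_i \ge 0.$$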
Here γ is a regularization parameter for the linear approximation of the slack variables. The ζi are instrumental variables that prevent patterns belonging to the "positive" half-plane from being penalized. Note that as γ goes to infinity, the solution approaches the original SVDD solution.
To avoid complications messing around with the Lagrange function, we write down the dual form of this problem directly:
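Under the primal sketched above, introducing multipliers αi for the sphere constraints and βi for the half-plane constraints gives:

$$\max_{\alpha,\, \beta}\; \sum_{i=1}^{l} \alpha_i K(x_i, x_i) - \sum_{i,j=1}^{l} \alpha_i \alpha_j K(x_i, x_j) - \frac{\gamma}{2}\sum_{i,j=1}^{l}\Big(\alpha_i + \beta_i - \tfrac{1}{\nu l}\Big)\Big(\alpha_j + \beta_j - \tfrac{1}{\nu l}\Big) K^*(x_i^*, x_j^*)$$

$$\text{s.t.}\quad \alpha_i \ge 0,\quad 0 \le \beta_i \le \frac{1}{\nu l},\quad \sum_{i=1}^{l} \alpha_i = 1,\quad \sum_{i=1}^{l} (\alpha_i + \beta_i) = \frac{1}{\nu}.$$

Notice how the last term penalizes deviations of αi + βi from 1/(νl): as γ → ∞ it forces αi + βi = 1/(νl), which together with βi ∈ [0, 1/(νl)] recovers exactly the box constraints 0 ≤ αi ≤ 1/(νl) of the original SVDD dual.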
Here we replace the scalar product (φ*(xi*) · φ*(xj*)) with the corresponding kernel function K*(xi*, xj*). In the end, the decision function has the same form as in the original SVDD, f(x) = ∥φ(x) − a∥² − R².
Notice that despite being slightly scarier than the original problem, this task is still a quadratic optimization problem and can be easily solved by standard approaches such as the logarithmic barrier (interior-point) method.
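To make this concrete, here is a minimal sketch that solves the dual above with cvxpy. The function name and the small ridge term are our own choices, and the objective follows the hedged formulation sketched earlier:

```python
import numpy as np
import cvxpy as cp

def svdd_plus_dual(K, K_star, nu, gamma, ridge=1e-9):
    """Solve the SVDD+ dual QP sketched above.

    K      -- l x l kernel matrix on the original patterns x_i
    K_star -- l x l kernel matrix on the privileged patterns x_i^*
    """
    l = K.shape[0]
    # Cholesky factors let us express the quadratic terms as sums of squares;
    # the tiny ridge keeps the factorization stable for near-singular kernels.
    L = np.linalg.cholesky(K + ridge * np.eye(l))
    L_star = np.linalg.cholesky(K_star + ridge * np.eye(l))

    alpha = cp.Variable(l, nonneg=True)
    beta = cp.Variable(l, nonneg=True)
    delta = alpha + beta - 1.0 / (nu * l)  # expansion coefficients of w*

    objective = cp.Maximize(
        cp.sum(cp.multiply(alpha, np.diag(K)))
        - cp.sum_squares(L.T @ alpha)                     # alpha^T K alpha
        - (gamma / 2) * cp.sum_squares(L_star.T @ delta)  # privileged term
    )
    constraints = [
        cp.sum(alpha) == 1,                # from dL/dR = 0
        cp.sum(alpha + beta) == 1.0 / nu,  # from dL/db* = 0
        beta <= 1.0 / (nu * l),            # from dL/dzeta_i = 0
    ]
    cp.Problem(objective, constraints).solve()
    return alpha.value, beta.value
```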
The original SVDD approach focuses on constructing a boundary around normal data points in a high-dimensional space. However, the SVDD+ theory introduces the concept of privileged information during the training phase to enhance classification accuracy.
Privileged information, not available during testing, can be utilized during training to provide additional insights, improving the model's ability to detect anomalies. Incorporating privileged information involves a modification of the original SVDD algorithm, allowing it to consider supplemental data during training, such as textual descriptions accompanying images in medical anomaly detection.
The inclusion of privileged information is framed as a form of intelligent guidance, akin to an informed teacher providing valuable insights to improve the model's learning. The modified SVDD+ formulation involves a quadratic optimization task, solvable through standard approaches like the logarithmic barrier function. Despite the complexity introduced by the inclusion of privileged information, the decision function in the SVDD+ theory maintains a form similar to the original SVDD, facilitating practical implementation.
In summary, the SVDD+ theory showcases a promising avenue for improving anomaly detection by leveraging privileged information during the training phase, offering potential applications across various fields, including image classification and malware detection.