Before exploring the SVDD+ implementation, let us briefly revisit what SVDD (Support Vector Data Description) and SVDD+ entail.
Support Vector Data Description (SVDD) is a method for describing data by encapsulating the target dataset within a hypersphere, suitable for one-class classification or outlier detection. SVDD+, in its turn, is a novel approach that integrates privileged information into the traditional SVDD framework.
Classical SVDD ignores privileged information, the kind of additional knowledge that is often present in human learning. SVDD+, by contrast, leverages it to improve the training phase by constructing a set of corrective functions.
Moving forward, we will look in detail at SVDD+ as a quadratic optimization problem, implement it with the cvxopt library, and compare SVDD+ with OneClassSVM.
In the previous post, we discussed building an anomaly detection algorithm that could use special information available only during the training phase.
In that post, we showed that this requires solving the following optimization problem:
Alternatively, we could solve a dual problem:
Here, K(x_i, x_j) is a dot product in some feature space, i.e., a kernel function. This dual turns out to be a special case of a quadratic optimization problem.
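For reference, here is a reconstruction of the dual as it can be read off from the matrices that `prepare_problem` builds later in this post. It assumes the standard SVDD linear term; z_i denote the privileged features, K* the privileged kernel, γ the privileged regularization, and C = 1/(lν):

```latex
\begin{aligned}
\max_{\alpha,\delta}\quad & \sum_{i=1}^{l}\alpha_i K(x_i,x_i) - \sum_{i,j=1}^{l}\alpha_i\alpha_j K(x_i,x_j) - \frac{\gamma}{2}\sum_{i,j=1}^{l}\delta_i\delta_j K^*(z_i,z_j)\\
\text{s.t.}\quad & \sum_{i=1}^{l}\alpha_i = 1,\qquad \sum_{i=1}^{l}\delta_i = 0,\\
& \alpha_i \ge 0,\qquad 0 \le \alpha_i - \delta_i \le C,\qquad i = 1,\dots,l.
\end{aligned}
```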
To reproduce the solution, we need a solver. There are techniques designed specifically for this problem; here, however, we will use a more generic quadratic programming solver.
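A quadratic program is usually written in the following standard form, which is also the form cvxopt expects:

$$\min_{x}\ \frac{1}{2}x^\top P x + q^\top x \quad \text{subject to} \quad Gx \preceq h,\quad Ax = b.$$

Here, P is a symmetric positive semidefinite matrix and x is a single vector of variables. This does not look like what we had before, since our dual has two sets of variables, α and δ, so it requires an additional workaround.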
First, let's define a new vector x that combines the variables α and δ. We stack them together into a single vector of length 2l, where l is the number of training objects:

$$x = (\alpha_1, \dots, \alpha_l, \delta_1, \dots, \delta_l)^\top.$$

We can always recover the original values from this new vector: $\alpha_i = x_i$ and $\delta_i = x_{l+i}$ for $i = 1, \dots, l$.
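A minimal NumPy sketch of this stacking, with made-up values for α and δ (l = 3), shows that the original variables come back by simple slicing:

```python
import numpy as np

l = 3  # number of training objects (illustrative)
alpha = np.array([0.2, 0.5, 0.3])   # hypothetical values of alpha_i
delta = np.array([0.1, -0.3, 0.2])  # hypothetical values of delta_i

# Stack both sets of variables into one vector x of length 2l
x = np.concatenate([alpha, delta])

# Recover the original variables: alpha_i = x_i, delta_i = x_{l+i}
alpha_back, delta_back = x[:l], x[l:]
print(x.shape)  # (6,)
```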
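With this new variable x, we can introduce the matrix P as

$$P = 2\begin{pmatrix} K & 0 \\ 0 & \frac{\gamma}{2} K^* \end{pmatrix}.$$

Here, K is the matrix of pairwise kernel function values on the training data, $K_{ij} = K(x_i, x_j)$; $K^*$ is the analogous matrix for the privileged information, $K^*_{ij} = K^*(z_i, z_j)$.

Now we can write the double sums in the optimization problem as a matrix product:

$$\frac{1}{2}x^\top P x = \sum_{i,j=1}^{l}\alpha_i\alpha_j K(x_i,x_j) + \frac{\gamma}{2}\sum_{i,j=1}^{l}\delta_i\delta_j K^*(z_i,z_j).$$

This is precisely the quadratic part of the dual problem, with the sign flipped because the standard form minimizes while the dual maximizes.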
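As a sanity check, this NumPy sketch builds P from random stand-in data (the `rbf` helper, data, and parameter values are illustrative assumptions) and confirms that ½ xᵀPx matches the two double sums:

```python
import numpy as np

rng = np.random.default_rng(0)
l, gamma = 4, 0.1  # illustrative size and privileged regularization

def rbf(A, gamma_k=0.5):
    # RBF kernel matrix exp(-gamma_k * ||a_i - a_j||^2)
    sq = np.sum(A**2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2 * A @ A.T
    return np.exp(-gamma_k * np.maximum(d2, 0.0))

X = rng.normal(size=(l, 2))  # ordinary features (random stand-ins)
Z = rng.normal(size=(l, 3))  # privileged features (random stand-ins)
K, K_star = rbf(X), rbf(Z)

# Block matrix P, laid out as in prepare_problem
zeros = np.zeros((l, l))
P = 2 * np.block([[K, zeros], [zeros, 0.5 * gamma * K_star]])

alpha = rng.random(l)
delta = rng.normal(size=l)
x = np.concatenate([alpha, delta])

lhs = 0.5 * x @ P @ x
rhs = alpha @ K @ alpha + 0.5 * gamma * delta @ K_star @ delta
print(np.isclose(lhs, rhs))  # True
```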
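Next, we have to deal with the equality constraints. We know that the sum of all α_i is equal to 1. Let's rewrite it as a product with a row vector whose first l elements are equal to 1 and whose last l elements are equal to 0; this picks out exactly the first l elements of x, which are equal to α_i:

$$(\underbrace{1,\dots,1}_{l},\underbrace{0,\dots,0}_{l})\,x = \sum_{i=1}^{l}\alpha_i = 1.$$

The second equality is slightly trickier. If we multiply x by a vector whose first l elements are equal to 0 and whose last l elements are equal to 1, we get the sum of δ_i, which must be equal to 0:

$$(\underbrace{0,\dots,0}_{l},\underbrace{1,\dots,1}_{l})\,x = \sum_{i=1}^{l}\delta_i = 0.$$

So, the final matrix form of the equality constraints is

$$A = \begin{pmatrix} \mathbf{1}^\top & \mathbf{0}^\top \\ \mathbf{0}^\top & \mathbf{1}^\top \end{pmatrix},\qquad b = \begin{pmatrix} 1 \\ 0 \end{pmatrix}.$$

The code below uses an equivalent pair of rows, (1, …, 1, 0, …, 0) and (1, …, 1), with b = (1, 1); together they also give Σα_i = 1 and Σδ_i = 0.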
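A quick NumPy check, with made-up values, that the two-block form of A with b = (1, 0) and the rows used in `prepare_problem` with b = (1, 1) accept and reject the same points:

```python
import numpy as np

l = 3
ones, zeros = np.ones(l), np.zeros(l)

# Block form: first row sums alpha, second row sums delta; b = (1, 0)
A_text = np.vstack([np.concatenate([ones, zeros]),
                    np.concatenate([zeros, ones])])
b_text = np.array([1.0, 0.0])

# Form used in prepare_problem: second row sums all of x; b = (1, 1)
A_code = np.vstack([np.concatenate([ones, zeros]),
                    np.ones(2 * l)])
b_code = np.array([1.0, 1.0])

# A feasible point: alpha sums to 1, delta sums to 0
x = np.array([0.2, 0.5, 0.3, 0.1, -0.3, 0.2])
print(np.allclose(A_text @ x, b_text), np.allclose(A_code @ x, b_code))  # True True
```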
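Now we can describe the inequality constraints. Notice that we can isolate α from x by multiplying it by a block matrix:

$$\begin{pmatrix} I & 0 \end{pmatrix} x = \alpha.$$

Here, the block matrix has size l × 2l and is combined from the l × l identity matrix and an l × l zero matrix.

Handling δ requires slightly more effort, because the constraints involve the difference between α and δ. We combine two identity matrices with opposite signs: $\begin{pmatrix} I & -I \end{pmatrix} x = \alpha - \delta$. Finally, combining it all together, the constraints $\alpha_i \ge 0$ and $0 \le \alpha_i - \delta_i \le C$ become $Gx \preceq h$ with

$$G = \begin{pmatrix} -I & 0 \\ -I & I \\ I & -I \end{pmatrix},\qquad h = \begin{pmatrix} \mathbf{0} \\ \mathbf{0} \\ C\,\mathbf{1} \end{pmatrix}.$$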
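The following NumPy sketch (with illustrative values of l, C, and x) builds G and h block by block and checks that Gx ≤ h agrees with the element-wise constraints α_i ≥ 0 and 0 ≤ α_i − δ_i ≤ C:

```python
import numpy as np

l, C = 3, 0.5  # illustrative size and upper bound C = 1/(l * nu)
I, O = np.eye(l), np.zeros((l, l))

# Rows: -alpha <= 0,  delta - alpha <= 0,  alpha - delta <= C
G = np.block([[-I, O],
              [-I, I],
              [I, -I]])
h = np.concatenate([np.zeros(2 * l), np.full(l, C)])

def satisfies(x):
    alpha, delta = x[:l], x[l:]
    matrix_form = np.all(G @ x <= h)
    # The same constraints written element-wise
    elementwise = (np.all(alpha >= 0)
                   and np.all(alpha - delta >= 0)
                   and np.all(alpha - delta <= C))
    return bool(matrix_form), bool(elementwise)

print(satisfies(np.array([0.2, 0.5, 0.3, 0.1, 0.2, 0.0])))   # (True, True)
print(satisfies(np.array([0.2, 0.5, 0.3, 0.1, 0.2, -0.5])))  # (False, False)
```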
Now we are going to use the cvxopt library. Given the matrices of ordinary and privileged features together with the kernel functions, we can prepare the optimization problem with the following Python function:
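```python
import numpy as np
from cvxopt import matrix, sparse


def prepare_problem(X, Z, original_kernel, privileged_kernel, nu, gamma):
    """Build the SVDD+ dual as a cvxopt QP: min 1/2 x'Px + q'x, Gx <= h, Ax = b."""
    size = X.shape[0]
    C = 1.0 / size / nu  # upper bound for alpha_i - delta_i
    kernel_x = np.asarray(original_kernel(X), dtype=float)    # K
    kernel_z = np.asarray(privileged_kernel(Z), dtype=float)  # K*
    zeros_matrix = np.zeros_like(kernel_x)

    # Quadratic term: 1/2 x'Px = alpha'K alpha + gamma/2 delta'K* delta
    P = matrix(2 * np.block([[kernel_x, zeros_matrix],
                             [zeros_matrix, 0.5 * gamma * kernel_z]]))
    # Linear term -sum_i alpha_i K(x_i, x_i), negated because cvxopt minimizes
    # (for the RBF kernel it is constant, since sum(alpha) = 1)
    q = matrix(np.concatenate([-np.diag(kernel_x), np.zeros(size)]))

    # Equalities: sum(alpha) = 1 and sum(alpha) + sum(delta) = 1
    A = matrix([[1.0] * size + [0.0] * size, [1.0] * (2 * size)]).T
    b = matrix([1.0, 1.0])

    # Inequalities: -alpha <= 0, delta - alpha <= 0, alpha - delta <= C
    identity = np.eye(size)
    G = sparse(matrix(np.block([[-identity, zeros_matrix],
                                [-identity, identity],
                                [identity, -identity]])))
    h = matrix([0.0] * (2 * size) + [C] * size)

    return {'P': P, 'q': q, 'G': G, 'h': h, 'A': A, 'b': b}
```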
You can find the full implementation in the following repository, which can be installed with `pip install git+https://github.com/sklef/ISVDD.git`.
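Once `cvxopt.solvers.qp(**problem)` returns, the first l entries of `solution['x']` are α. As a hedged sketch (the helpers `rbf` and `svdd_squared_distance` are hypothetical, not part of the ISVDD package), α can then score new points through the standard SVDD squared distance to the center a = Σ α_i φ(x_i); δ only shapes training and does not appear here:

```python
import numpy as np

def rbf(A, B, gamma=0.5):
    # Hypothetical helper: RBF kernel matrix exp(-gamma * ||a - b||^2)
    d2 = (np.sum(A**2, axis=1)[:, None] + np.sum(B**2, axis=1)[None, :]
          - 2 * A @ B.T)
    return np.exp(-gamma * np.maximum(d2, 0.0))

def svdd_squared_distance(X_train, alpha, X_new, gamma=0.5):
    """||phi(x) - a||^2 with a = sum_i alpha_i phi(x_i) (hypothetical helper)."""
    K_train = rbf(X_train, X_train, gamma)
    K_cross = rbf(X_new, X_train, gamma)
    self_sim = np.ones(len(X_new))  # K(x, x) = 1 for the RBF kernel
    return self_sim - 2 * K_cross @ alpha + alpha @ K_train @ alpha

# Toy usage: uniform alpha over three training points
X_train = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
alpha = np.full(3, 1 / 3)
X_new = np.array([[0.3, 0.3], [5.0, 5.0]])
scores = svdd_squared_distance(X_train, alpha, X_new)
print(scores[1] > scores[0])  # True: the far point lies farther from the center
```

Larger distances indicate more anomalous points, so this score plays the same role as the negated `decision_function` used in the evaluation.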
We will use the kdd99 dataset to show how this approach works. The dataset describes TCP connections recorded over nine weeks: the first seven weeks are used as the training set, and the remaining two as the test set. The training and test distributions differ, and the test set even contains attacks that are not present in the training set.
We can load the data with scikit-learn:

```python
from sklearn.datasets import fetch_kddcup99

features, labels = fetch_kddcup99(return_X_y=True, as_frame=True)
```

There are three types of features in the kdd99 dataset: basic, content, and traffic features. We use the basic features as ordinary features and the traffic and content features as privileged information:

```python
basic_features = ["duration", "src_bytes", "dst_bytes", "land", "wrong_fragment", "urgent"]
content_features = ["hot", "num_failed_logins", "logged_in", "num_compromised", "root_shell",
                    "su_attempted", "num_root", "num_file_creations", "num_shells", "num_access_files",
                    "num_outbound_cmds", "is_host_login", "is_guest_login"]
traffic_features = ["count", "serror_rate", "rerror_rate", "same_srv_rate", "diff_srv_rate",
                    "srv_count", "srv_serror_rate", "srv_rerror_rate", "srv_diff_host_rate"]

ordinary_features = features[basic_features]
privileged_features = features[traffic_features + content_features]
```
Before the experiment, we remove the categorical features (none of them appear in the lists above) and normalize the rest by subtracting the mean and dividing by the standard deviation plus a constant of 1e-5. The constant prevents division by zero for features with zero variance.

```python
ordinary_features = features[basic_features].astype(float)
ordinary_features = (ordinary_features - ordinary_features.mean()) / (ordinary_features.std() + 1e-5)

privileged_features = features[traffic_features + content_features].astype(float)
privileged_features = (privileged_features - privileged_features.mean()) / (privileged_features.std() + 1e-5)
```
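A NumPy sketch of the same normalization (the `standardize` helper is hypothetical; the code above uses pandas) shows why the constant matters for a column with zero variance:

```python
import numpy as np

def standardize(values, eps=1e-5):
    # Subtract the column mean and divide by the column std plus eps.
    # ddof=1 matches the default of pandas' DataFrame.std().
    values = np.asarray(values, dtype=float)
    return (values - values.mean(axis=0)) / (values.std(axis=0, ddof=1) + eps)

# The second column is constant: its std is 0, so eps keeps the result finite
data = np.array([[1.0, 0.0],
                 [2.0, 0.0],
                 [3.0, 0.0]])
scaled = standardize(data)
print(scaled[:, 1])  # [0. 0. 0.]
```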
Further on, we will compare SVDD+ with OneClassSVM. It can be proved that, with the RBF kernel, OneClassSVM and SVDD produce the same result, so by comparing with OneClassSVM we are effectively comparing with the original SVDD. This is convenient because, unfortunately, there is no sufficiently mature Python API for SVDD itself.
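One way to see why the RBF kernel makes the two methods coincide: K(x, x) = exp(0) = 1, so the linear term Σ α_i K(x_i, x_i) of the SVDD dual equals Σ α_i = 1 for every feasible α. Only the quadratic term remains, which matches the ν-OneClassSVM dual up to a constant factor. A small NumPy check on random stand-in data:

```python
import numpy as np

rng = np.random.default_rng(42)
X = rng.normal(size=(5, 3))  # random stand-in data
gamma = 0.7

# RBF kernel matrix K_ij = exp(-gamma * ||x_i - x_j||^2)
sq = np.sum(X**2, axis=1)
K = np.exp(-gamma * np.maximum(sq[:, None] + sq[None, :] - 2 * X @ X.T, 0.0))

alpha = rng.random(5)
alpha /= alpha.sum()  # feasible: sum(alpha) == 1

linear_term = alpha @ np.diag(K)
print(np.allclose(np.diag(K), 1.0), np.isclose(linear_term, 1.0))  # True True
```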
Let us suppose that we have a train and a test set. We know which records in the train set correspond to an attack and want to determine which records in the test set are malicious.
To do so, we take only the normal records from the train set and fit a One-Class SVM. We then apply the trained model to detect test records that differ from the normal records in the train set.
As a result, we get an anomaly score, which expresses the algorithm's confidence in its prediction. From it, we calculate the area under the precision-recall curve (average precision).
Using cross-validation also gives us the standard deviation of the metric estimate.
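For reference, scikit-learn's `average_precision_score` computes AP = Σ_n (R_n − R_{n−1}) P_n, where P_n and R_n are the precision and recall at the n-th threshold. A hand-rolled NumPy sketch of that formula (it assumes no tied scores; `average_precision` is a hypothetical helper), with attacks as the positive class and the negated decision function as the anomaly score, as in `evaluate_parameters`:

```python
import numpy as np

def average_precision(is_anomaly, anomaly_score):
    # AP = sum_n (R_n - R_{n-1}) * P_n, with one threshold per score
    # (assumes there are no tied scores)
    order = np.argsort(-np.asarray(anomaly_score))   # highest score first
    hits = np.asarray(is_anomaly, dtype=float)[order]
    precision = np.cumsum(hits) / np.arange(1, len(hits) + 1)
    recall_increment = hits / hits.sum()             # recall only grows at positives
    return float(np.sum(recall_increment * precision))

# decision_function is high for normal points, so its negation is an anomaly score
decision = np.array([2.0, 1.5, -0.5, 0.3, -1.2])
is_anomaly = np.array([False, False, True, False, True])
print(average_precision(is_anomaly, -decision))  # 1.0
```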
```python
import numpy as np
from sklearn.metrics import average_precision_score
from sklearn.model_selection import KFold


def evaluate_parameters(model, ordinary_features, labels, privileged_features=None):
    splitter = KFold(n_splits=10)
    all_scores = []
    # KFold.split yields (train, test); unpacking it as (test, train) fits each
    # model on a single fold and evaluates it on the remaining nine folds.
    for test, train in splitter.split(labels):
        # Fit only on the normal records of the training part
        train_labels = labels.iloc[train]
        normal_data_indices = train_labels[train_labels == b"normal."].index
        train_ordinary_features = ordinary_features.loc[normal_data_indices]
        if privileged_features is not None:
            train_privileged_features = privileged_features.loc[normal_data_indices]
            model.fit(train_ordinary_features, train_privileged_features)
        else:
            model.fit(train_ordinary_features)

        # Privileged features are not used at test time
        test_features = ordinary_features.iloc[test]
        predictions = model.decision_function(test_features)
        # Attacks are the positive class; low decision values mean anomalies
        test_labels = labels.iloc[test] != b"normal."
        all_scores.append(average_precision_score(test_labels, -predictions))
    return np.mean(all_scores), np.std(all_scores)
```
We are going to use RBF kernels with different values of the kernel parameter γ (gamma) for the ordinary features.
For SVDD+, we fix the parameter ν, the privileged information kernel, and the regularization.
```python
from functools import partial
import numpy as np
from sklearn.metrics.pairwise import rbf_kernel
from sklearn.svm import OneClassSVM
# ISVDD comes from the repository linked above; its import line is not shown here.

all_gammas = np.logspace(-7, 7, 30)

# Baseline: OneClassSVM on the ordinary features only
ordinary_average_precision_mean = np.zeros_like(all_gammas)
ordinary_average_precision_std = np.zeros_like(all_gammas)
for index, gamma in enumerate(all_gammas):
    average_precision, average_precision_std = evaluate_parameters(OneClassSVM(gamma=gamma), ordinary_features, labels)
    ordinary_average_precision_mean[index] = average_precision
    ordinary_average_precision_std[index] = average_precision_std

# SVDD+: fixed privileged kernel and regularization, varying gamma of the ordinary kernel
priv_kernel = partial(rbf_kernel, gamma=1e-4)
privileged_average_precision_mean = np.zeros_like(all_gammas)
privileged_average_precision_std = np.zeros_like(all_gammas)
for index, gamma in enumerate(all_gammas):
    kernel = partial(rbf_kernel, gamma=gamma)
    model = ISVDD(0.1, kernel, priv_kernel, privileged_regularization=0.1,
                  tol=0.0001, max_iter=100, silent=True)
    average_precision, average_precision_std = evaluate_parameters(model, ordinary_features, labels, privileged_features)
    privileged_average_precision_mean[index] = average_precision
    privileged_average_precision_std[index] = average_precision_std
```
Using privileged information improves performance on the test data compared with OneClassSVM in the same feature space, but it is still worse than using all features during both the training and the test phase. These results indicate that privileged information can be seen as a special case of regularization.
In addition, we compared test performance for different combinations of the kernel widths for the original and the privileged features. A wider kernel on the original features allows better performance with a narrower kernel on the privileged features, which can be explained by the regularizing role of the privileged information.
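In scikit-learn's `rbf_kernel`, K(x, y) = exp(−γ‖x − y‖²), so γ acts as an inverse width: writing the kernel as exp(−‖x − y‖²/(2σ²)) gives σ = 1/√(2γ). A short sketch (the `rbf_width` helper is hypothetical) converts the experiment's grid and the privileged kernel's γ = 1e-4 into widths, showing that a larger γ means a narrower kernel:

```python
import numpy as np

def rbf_width(gamma):
    # exp(-gamma * d^2) == exp(-d^2 / (2 * sigma^2))  =>  sigma = 1 / sqrt(2 * gamma)
    return 1.0 / np.sqrt(2.0 * gamma)

all_gammas = np.logspace(-7, 7, 30)  # same grid as in the experiment
widths = rbf_width(all_gammas)
print(rbf_width(0.5))   # 1.0
print(rbf_width(1e-4))  # width of the privileged kernel, about 70.7
```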