One Class Classification for Images with Deep Features

by @aniket_52307

December 9th 2017

As programmers, we regularly come across projects that require a binary classifier of the type A vs ~A: given a new data sample, the classifier predicts whether the sample belongs to class A or is an outlier. One reliable but difficult approach to such a problem is the one-class learning paradigm.

In one-class learning we train the model only on the positive-class data-set and then use it to make judgments on the whole universe [A union ~A]. It’s an active research topic, and multiple tools such as One-class SVM and Isolation Forest are available for the task. One-class learning is particularly valuable when the ~A samples can follow any distribution, so that it isn’t possible to learn a pattern for the ~A class.
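To make the idea concrete, here is a minimal sketch on synthetic 2-D data rather than images. The blob positions and the hyperparameters (`gamma`, `nu`, `contamination`) are illustrative assumptions, not values from this post. Both models are fitted on class A only and then asked to judge a mix of A and ~A samples.

```python
import numpy as np
from sklearn.svm import OneClassSVM
from sklearn.ensemble import IsolationForest

rng = np.random.RandomState(0)

# Class A: a 2-D Gaussian blob. Only these points are used for training.
X_train_a = rng.normal(loc=0.0, scale=1.0, size=(500, 2))

# Test universe: fresh A samples plus ~A samples placed far from the blob.
X_test_a = rng.normal(loc=0.0, scale=1.0, size=(100, 2))
X_test_not_a = rng.uniform(low=6.0, high=10.0, size=(100, 2))

# Hyperparameters are illustrative assumptions, not tuned values.
oc_svm = OneClassSVM(kernel='rbf', gamma=0.1, nu=0.05).fit(X_train_a)
iso_forest = IsolationForest(n_estimators=100, contamination=0.05,
                             random_state=0).fit(X_train_a)

# Both estimators return +1 for inliers (A) and -1 for outliers (~A).
svm_inlier_rate = np.mean(oc_svm.predict(X_test_a) == 1)
svm_outlier_rate = np.mean(oc_svm.predict(X_test_not_a) == -1)
if_inlier_rate = np.mean(iso_forest.predict(X_test_a) == 1)
if_outlier_rate = np.mean(iso_forest.predict(X_test_not_a) == -1)

print('OC-SVM: A accepted %.2f, ~A rejected %.2f' % (svm_inlier_rate, svm_outlier_rate))
print('IForest: A accepted %.2f, ~A rejected %.2f' % (if_inlier_rate, if_outlier_rate))
```

Neither model ever sees a ~A sample during fitting; it only learns where the A samples live.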

But one-class learning becomes more challenging as the dimensionality of the sample points increases. For example, consider an image of size 224x224px: applying a one-class learning algorithm to the raw pixels straight out of the box is impractical because of the immense number of features each sample point holds (50,176 pixels, or 150,528 values for an RGB image). Consequently, an efficient and discriminative feature representation is required to build a one-class classifier for images, or for high-dimensional data in general.
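As a quick check of the arithmetic, the snippet below counts the raw features of a single 224x224 image. The all-zero array is just a stand-in for real pixel data.

```python
import numpy as np

# Placeholder RGB image with the dimensions discussed above
image = np.zeros((224, 224, 3), dtype=np.uint8)

pixels = image.shape[0] * image.shape[1]   # spatial positions only
flat = image.reshape(-1)                   # every channel value as one feature

print(pixels)         # 50176
print(flat.shape[0])  # 150528
```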

*This post covers the implementation of one-class learning using deep neural net features and compares classifier performance for three approaches: OC-SVM, Isolation Forest and Gaussian Mixtures.*

**CNNs as Feature Extractors**

Convolutional Neural Nets have proven to be state-of-the-art for object recognition in images. CNNs have replaced traditional machine learning pipelines in which feature extraction and the model that learns from those features were two separate stages. CNNs also help extract meaningful features from an image, because in order to distinguish one class from the others a deep neural net learns which features are important and which aren’t. This allows us to take the ready-made features returned by a deep CNN and build our classifier on top of them.

Moreover, the availability of CNNs pre-trained on ImageNet data (the full database holds more than 14 million images; the commonly used ILSVRC subset covers 1,000 categories) has made image categorization much simpler, since a pre-trained CNN will generally return features that are *sufficiently satisfactory* to train a light-weight model on them.

As mentioned earlier, feature vectors returned by CNNs give a powerful representation of the image that generated them, so we use these features to train our one-class classifier. For the purpose of this post, we use ResNet-50 as the feature extractor for our model. *Why?* Because it’s fast, accurate, and credible: the ResNet architecture won the ImageNet Challenge in 2015.

**Dataset used for this problem:** We use the Food5k data-set, which contains both Food and ~Food images (2500 each). Sample images are shown below.

**Implementation details for One Class SVM and Isolation Forest models:**

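We first compute ResNet-50 features for the image data-set. The code looks as follows:

```python
from keras.applications.resnet50 import ResNet50

def extract_resnet(X, image_h=224, image_w=224):
    # X: numpy array of images, shape (n_samples, image_h, image_w, 3),
    # assumed to be already preprocessed for ResNet-50
    # include_top=False drops the top fc layer, which is only used for predictions
    resnet_model = ResNet50(input_shape=(image_h, image_w, 3), weights='imagenet', include_top=False)
    features_array = resnet_model.predict(X)
    # Flatten to (n_samples, n_features) so scikit-learn estimators can consume it
    return features_array.reshape(len(X), -1)
```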

We can compute ResNet features for all the images in the data-set in this way. The next step is to follow this pipeline:

- Apply a standard scaler to the obtained features.
- Apply Principal Component Analysis with n_components = 512.
- Pass the reduced features to a One Class SVM or Isolation Forest model.

In the code below, X_train and X_test are the ResNet features for the train and test images.

```python
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
from sklearn.ensemble import IsolationForest
from sklearn import svm

# Apply standard scaler to output from resnet50
ss = StandardScaler()
ss.fit(X_train)
X_train = ss.transform(X_train)
X_test = ss.transform(X_test)

# Take PCA to reduce feature space dimensionality
pca = PCA(n_components=512, whiten=True)
pca = pca.fit(X_train)
print('Explained variance ratio = %0.2f' % sum(pca.explained_variance_ratio_))
X_train = pca.transform(X_train)
X_test = pca.transform(X_test)

# Train classifiers and obtain predictions for OC-SVM and Isolation Forest
oc_svm_clf = svm.OneClassSVM(gamma=0.001, kernel='rbf', nu=0.08)  # Obtained using grid search
if_clf = IsolationForest(contamination=0.08, max_features=1.0, max_samples=1.0, n_estimators=40)  # Obtained using grid search

oc_svm_clf.fit(X_train)
if_clf.fit(X_train)

oc_svm_preds = oc_svm_clf.predict(X_test)
if_preds = if_clf.predict(X_test)

# Further compute accuracy, precision and recall for the two prediction sets obtained
```

**PS**: Predictions returned by both Isolation Forest and One-class SVM take values in {-1, 1}: -1 for “Not food” and 1 for “Food”.
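The metric computation left as a comment above could be sketched as follows. The arrays `y_true` and `raw_preds` are small hypothetical examples, not results from the Food5k experiments, and the ground truth is assumed to be encoded as 0 (not food) and 1 (food).

```python
import numpy as np
from sklearn.metrics import accuracy_score, precision_score, recall_score

# Hypothetical ground truth in {0, 1} (1 = food) and estimator output in {-1, 1}
y_true = np.array([1, 1, 1, 0, 0, 0, 1, 0])
raw_preds = np.array([1, 1, -1, -1, 1, -1, 1, -1])

# Map -1 -> 0 so the predictions share the label encoding of y_true
y_pred = (raw_preds == 1).astype(int)

accuracy = accuracy_score(y_true, y_pred)
precision = precision_score(y_true, y_pred)  # of predicted "food", how many are food
recall = recall_score(y_true, y_pred)        # of actual "food", how many were found

print(accuracy, precision, recall)  # 0.75 0.75 0.75
```

The same mapping works for `oc_svm_preds` and `if_preds`, provided the test labels use the 0/1 encoding assumed here.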

**One Class Classification using Gaussian Mixtures and Isotonic Regression**

Intuitively, food items can belong to different clusters like *cereals, egg dishes, breads*, etc., and some food items may belong to multiple clusters simultaneously. We can therefore fit a Gaussian mixture on the positive-class data points (ResNet features). Gaussian mixture models “*are a probabilistic model for representing normally distributed subpopulations within an overall population*”. In other words, a mixture model represents clusters of normally distributed subpopulations. Once fitted on the data, a Gaussian mixture model can tell us how likely it is that a new point was generated from that distribution.
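A small sketch of this intuition on synthetic data: three 2-D clusters stand in for hypothetical food sub-categories. The cluster centers and query points are made up for illustration. A point near a cluster gets a much higher log density than a point far from all of them, and `predict_proba` exposes the soft membership across clusters.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.RandomState(0)

# Hypothetical positive-class features: three well-separated 2-D clusters
centers = np.array([[0.0, 0.0], [8.0, 0.0], [0.0, 8.0]])
X_pos = np.vstack([rng.normal(loc=c, scale=0.5, size=(200, 2)) for c in centers])

gmm = GaussianMixture(n_components=3, covariance_type='spherical',
                      random_state=0).fit(X_pos)

# Log density of a point close to a cluster vs. a point far from every cluster
log_density_near = gmm.score_samples(np.array([[0.1, -0.2]]))[0]
log_density_far = gmm.score_samples(np.array([[20.0, 20.0]]))[0]

# Soft membership: per-component posterior for a point between two clusters
membership = gmm.predict_proba(np.array([[4.0, 0.0]]))[0]

print(log_density_near, log_density_far)
print(membership)
```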

But beware: **Gaussian mixture models return the log of the probability density function value for a given sample** (and not actual probabilities). Hence it is necessary to convert these log-density values to ‘probability scores’, which can then show that a new sample belongs to the fitted mixture with “x” amount of confidence.
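One way to see that these values are not probabilities: for tightly concentrated data the density itself exceeds 1, so its log is positive. The sketch below uses made-up 1-D data with a standard deviation of 0.01.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.RandomState(0)

# Tightly concentrated 1-D data: the density near the mean is far above 1
X = rng.normal(loc=0.0, scale=0.01, size=(1000, 1))
gmm = GaussianMixture(n_components=1, random_state=0).fit(X)

log_density = gmm.score_samples(np.array([[0.0]]))[0]
density = np.exp(log_density)

# A probability can never exceed 1, but a density can
print(log_density, density)
```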

A simple yet efficient method to accomplish this is by fitting an isotonic regression model on the log probability density scores w.r.t. labels for the validation set data points. Isotonic regression is a probability calibration technique which can calibrate classifier scores to approximate probability values by fitting a stepwise non-decreasing function along the scores returned by the classifier.
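A toy illustration of the calibration step, using hypothetical validation scores and labels (not values from the Food5k experiments). The fitted function is non-decreasing, and `out_of_bounds='clip'` handles scores outside the range seen during fitting.

```python
import numpy as np
from sklearn.isotonic import IsotonicRegression

# Hypothetical validation log-density scores and labels (1 = food, 0 = not food)
scores_val = np.array([-50.0, -40.0, -30.0, -20.0, -10.0, -5.0, -2.0, -1.0])
labels_val = np.array([0, 0, 1, 0, 1, 1, 1, 1])

calibrator = IsotonicRegression(y_min=0.0, y_max=1.0, out_of_bounds='clip')
calibrator.fit(scores_val, labels_val)

# Scores outside the fitted range are clipped to the nearest boundary value
new_scores = np.array([-100.0, -25.0, -3.0, 5.0])
probs = calibrator.predict(new_scores)

print(probs)
```

The out-of-order pair of labels at scores -30 and -20 is pooled into a single level of 0.5, which is how isotonic regression keeps the mapping monotonic.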

**Implementation Details for one class learning with GMMs**

```python
# The standard scaler and PCA part remain same. Just that we will also require a validation set to fit
# isotonic regressor on the probability density scores returned by GMM
# Also assuming that resnet feature generation is done

from sklearn.mixture import GaussianMixture
from sklearn.isotonic import IsotonicRegression

gmm_clf = GaussianMixture(covariance_type='spherical', n_components=18, max_iter=int(1e7))  # Obtained via grid search
gmm_clf.fit(X_train)
log_probs_val = gmm_clf.score_samples(X_val)

isotonic_regressor = IsotonicRegression(out_of_bounds='clip')
isotonic_regressor.fit(log_probs_val, y_val)  # y_val is for labels 0 - not food 1 - food (validation set)

# Obtaining results on the test set
log_probs_test = gmm_clf.score_samples(X_test)
test_probabilities = isotonic_regressor.predict(log_probs_test)
test_predictions = [1 if prob >= 0.5 else 0 for prob in test_probabilities]

# Calculate accuracy metrics
```

Following are the results obtained using the three one class learning techniques

Thus the GMM model (calibrated using isotonic regression) outperforms both the other one-class learning models and is not far from the ‘state-of-the-art’, which is obtained after training a neural net for 700 iterations. Furthermore, in the state-of-the-art approach the model is trained on both positive and negative data samples, whereas in our approach the model is **trained only on positive-class samples**, making it more robust to any kind of distribution in the ~A samples.

GMM improves the precision of the model by correctly predicting more “Not food” images than one-class SVM, which results in considerably fewer false positives. For example, some of the images that GMM correctly predicts as “Not food” (as opposed to OC-SVM) are:

Finishing off with an additional note: I didn’t go deeper into how to do a grid search for Gaussian Mixture Models, but anyone who wishes to read more about it can check out this sklearn tutorial.
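For readers who want a starting point, one common way to search over GMM settings is to compare the Bayesian Information Criterion (BIC) across candidates. This is an assumption about a reasonable procedure, not necessarily how the values in this post were chosen, and the synthetic data below only stands in for the PCA-reduced ResNet features.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.RandomState(0)

# Synthetic stand-in for the training features: three 2-D clusters
centers = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0]])
X_train = np.vstack([rng.normal(loc=c, scale=1.0, size=(300, 2)) for c in centers])

best_bic, best_params = np.inf, None
for covariance_type in ('spherical', 'diag', 'full'):
    for n_components in range(1, 7):
        gmm = GaussianMixture(n_components=n_components,
                              covariance_type=covariance_type,
                              random_state=0).fit(X_train)
        bic = gmm.bic(X_train)  # lower BIC is better
        if bic < best_bic:
            best_bic, best_params = bic, (covariance_type, n_components)

print(best_params, best_bic)
```

An alternative is to score each candidate by accuracy on a labelled validation set after calibration, which ties the search directly to the classification task.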

If you liked this post you can check more stories from Squad Engineering at https://medium.com/squad-engineering.