As programmers we regularly come across projects that require building binary classifiers of the type A vs ~A, in which, when the classifier is given a new data sample, it is able to predict whether the sample belongs to class A or is an outlier. One reliable but difficult approach to such a problem is the One-Class Learning paradigm. In one-class learning we train the model only on the positive class data-set and then take judgments from it directly on the whole universe [A union ~A]. It is an active research topic and there are multiple tools available, like One-Class SVM and Isolation Forest, to achieve this task. One-class learning can prove vital in scenarios where the ~A samples can follow any distribution and it is not possible to learn a pattern for the ~A class.

But one-class learning becomes more challenging as the dimensionality of the sample points increases. For example, consider an image of size 224x224 px: applying any one-class learning algorithm straight out of the box here can prove fatal due to the immense number of features each sample point holds (in this example, 50,176 features). Consequently an efficient and discriminating feature representation is required to build a one-class classifier for images, or for high-dimensional data in general. This post covers the implementation of one-class learning using deep neural net features and compares classifier performance across three approaches: OC-SVM, Isolation Forest and Gaussian Mixtures.

CNNs as Feature Extractors

Convolutional Neural Nets have proven to be state-of-the-art when it comes to object recognition in images. CNNs have replaced traditional pipelines where feature extraction and the models used to learn from those features were two separate entities. Moreover, CNNs can also help in extracting meaningful features for an image, because a deep neural net learns which features are important, and which are not, in order to distinguish a class from the others. This allows us to simply use the ready-to-go features returned by these deep CNNs and build our machine learning classifier on top of them.

Moreover, the availability of CNNs pre-trained on ImageNet data, with over 1,000 categories and more than 14 million images, has made image categorization much simpler, since a pre-trained CNN will generally return features that are good enough to train a light-weight model on.

As I mentioned earlier, since the feature vectors returned by CNNs give a powerful representation of the image that generated them, we use these features to train our one-class classifier. For the purpose of this post, we use ResNet-50 as the feature extractor for our model. Why? Because it is fast, accurate and credible, as it won the ImageNet Challenge in 2015.

We use the Food-5K data-set, which contains both Food and ~Food images (2,500 each). Sample images are shown below -

[Figure: sample images from the dataset used for this problem, Food-5K]

Implementation details for the One-Class SVM and Isolation Forest models:

We first compute ResNet-50 features for the image data-set. The code for this looks as follows -

from keras.applications.resnet50 import ResNet50

def extract_resnet(X):
    # X : images numpy array
    resnet_model = ResNet50(input_shape=(image_h, image_w, 3),
                            weights='imagenet', include_top=False)
    # include_top=False, since the top layer is the fc layer used for ImageNet predictions
    features_array = resnet_model.predict(X)
    return features_array

We can accordingly compute ResNet features for all the images in the data-set.
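As a minimal sketch of how the images might be loaded and turned into 2-D feature vectors before the classifiers are trained: the load_images helper, the train_paths/test_paths variables, the 224x224 value for image_h and image_w, and the flattening of the convolutional feature maps are illustrative assumptions and are not shown in the original post.

import numpy as np
from keras.preprocessing import image
from keras.applications.resnet50 import preprocess_input

image_h, image_w = 224, 224  # assumed input size, matching ResNet-50's default

def load_images(file_paths):
    # Read each image, resize it to image_h x image_w and stack into one numpy array
    batch = [image.img_to_array(image.load_img(p, target_size=(image_h, image_w)))
             for p in file_paths]
    return preprocess_input(np.array(batch))

# train_paths / test_paths are hypothetical lists of image file paths
X_train = extract_resnet(load_images(train_paths))
X_test = extract_resnet(load_images(test_paths))

# With include_top=False the network returns (n, 7, 7, 2048) feature maps,
# so we flatten them to 2-D vectors before scaling and PCA
X_train = X_train.reshape(len(X_train), -1)
X_test = X_test.reshape(len(X_test), -1)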
Next, follow this pipeline:

1. Apply a standard scaler to the obtained features.
2. Apply Principal Component Analysis with n_components = 512.
3. Pass the resulting features to the One-Class SVM model or the Isolation Forest.

In the code below, X_train and X_test are the ResNet features for the train and test images.

from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
from sklearn.ensemble import IsolationForest
from sklearn import svm

# Apply standard scaler to the output from resnet50
ss = StandardScaler()
ss.fit(X_train)
X_train = ss.transform(X_train)
X_test = ss.transform(X_test)

# Take PCA to reduce feature space dimensionality
pca = PCA(n_components=512, whiten=True)
pca = pca.fit(X_train)
print('Explained variance percentage = %0.2f' % sum(pca.explained_variance_ratio_))
X_train = pca.transform(X_train)
X_test = pca.transform(X_test)

# Train classifiers and obtain predictions for OC-SVM and Isolation Forest
oc_svm_clf = svm.OneClassSVM(gamma=0.001, kernel='rbf', nu=0.08)  # Hyperparameters obtained using grid search
if_clf = IsolationForest(contamination=0.08, max_features=1.0, max_samples=1.0, n_estimators=40)  # Hyperparameters obtained using grid search

oc_svm_clf.fit(X_train)
if_clf.fit(X_train)

oc_svm_preds = oc_svm_clf.predict(X_test)
if_preds = if_clf.predict(X_test)

# Further compute accuracy, precision and recall for the two prediction sets obtained

PS: Predictions returned by both Isolation Forest and One-Class SVM are of the form {-1, 1}: -1 for "Not food" and 1 for "Food".
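As a rough sketch of that last metrics step, assuming a hypothetical y_test array holding the ground-truth test labels already mapped to the same {-1, 1} convention (1 for food, -1 for not food); this snippet is not from the original post.

from sklearn.metrics import accuracy_score, precision_score, recall_score

def report(name, y_true, y_pred):
    # pos_label=1 treats "Food" as the positive class
    print('%s: accuracy = %.3f, precision = %.3f, recall = %.3f' % (
        name,
        accuracy_score(y_true, y_pred),
        precision_score(y_true, y_pred, pos_label=1),
        recall_score(y_true, y_pred, pos_label=1)))

report('OC-SVM', y_test, oc_svm_preds)
report('Isolation Forest', y_test, if_preds)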
One Class Classification using Gaussian Mixtures and Isotonic Regression

Intuitively, food items can belong to different clusters like cereals, egg dishes, breads, etc., and some food items may also belong to multiple clusters simultaneously. As a result, we can fit a Gaussian mixture on the positive class data points (the ResNet features). "Gaussian mixture models are a probabilistic model for representing normally distributed subpopulations within an overall population." In other words, a mixture model represents clusters of normally distributed subpopulations. A Gaussian mixture model, once fitted on the data, can tell us how likely it is that any new point was generated from that distribution.

But beware: Gaussian mixture models return the log of the probability density function value for a given sample (and not actual probabilities). Hence it is necessary to convert these probability density values into 'probability scores', which can then tell us with what confidence a new sample belongs to the fitted distribution.

A simple yet efficient method to accomplish this is to fit an isotonic regression model on the log probability density scores w.r.t. the labels of the validation set data points. Isotonic regression is a probability calibration technique which can calibrate classifier scores to approximate probability values by fitting a stepwise non-decreasing function along the scores returned by the classifier.

Implementation details for one-class learning with GMMs:

# The standard scaler and PCA part remain the same. We additionally require a
# validation set to fit the isotonic regressor on the probability density
# scores returned by the GMM. Also assuming that ResNet feature generation is done.
from sklearn.mixture import GaussianMixture
from sklearn.isotonic import IsotonicRegression

gmm_clf = GaussianMixture(covariance_type='spherical', n_components=18, max_iter=int(1e7))  # Hyperparameters obtained via grid search
gmm_clf.fit(X_train)
log_probs_val = gmm_clf.score_samples(X_val)

isotonic_regressor = IsotonicRegression(out_of_bounds='clip')
isotonic_regressor.fit(log_probs_val, y_val)  # y_val holds the validation labels: 0 - not food, 1 - food

# Obtaining results on the test set
log_probs_test = gmm_clf.score_samples(X_test)
test_probabilities = isotonic_regressor.predict(log_probs_test)
test_predictions = [1 if prob >= 0.5 else 0 for prob in test_probabilities]

# Calculate accuracy metrics

Results and Discussion

Following are the results obtained using the three one-class learning techniques:

[Table: results comparing all three algorithms on the Food vs ~Food data]

Thus the GMM model (calibrated using isotonic regression) outperforms both of the other one-class learning models and is not far from the state-of-the-art result, which is obtained after training a neural net for 700 iterations. Furthermore, in the state-of-the-art approach the model is trained on both positive and negative data samples, whereas in our approach the model is trained only on positive class samples, making it more robust to any kind of distribution in the ~A samples.

GMM helps improve the precision of the model by correctly predicting more "Not food" images as compared to the one-class SVM. This results in considerably fewer false positives. For example, some of the images which GMM correctly predicts as "Not food" (as opposed to OC-SVM) are shown below -

[Figure: "Not food" images correctly predicted by GMM as opposed to OC-SVM]

Finishing off with an additional note: I didn't go deeper into how to do a grid search for Gaussian Mixture Models, but anyone who wishes to read more about it can check out this sklearn tutorial.

If you liked this post you can check out more stories from Squad Engineering at https://medium.com/squad-engineering.