Recommendation (or recommender) systems are not new: they have been in wide use for years, and the technology behind them is constantly evolving and being re-evaluated. Whether it is a small e-shop or a multinational giant, the principle remains the same: the site visitor’s interest must be kept alive with suggestions of products (physical or not) or content that could potentially fit their needs or intrigue them. Various methods have been developed to implement such systems. It is not within the scope of this article to explain the bits and bobs of these methods, but generally speaking, they fall into three broad categories:
Content-based
Collaborative filtering
Hybrid
It goes without saying that each method has its pros and cons. In this article, we will explore, using an example, how to generate product recommendations based purely on visual similarity, ignoring product attributes and user ratings. For this, we need a system that can somehow “summarise” the distinct features of the product images and then calculate the similarity between them.
Transfer learning is a technique that reuses the “knowledge” a model gained when it was trained to solve a specific task. We can apply that “knowledge” to a similar task instead of starting from scratch, which would require a lot of extra work. So, how is this related to our image similarity problem? How can we make use of existing model knowledge? There are several pre-trained model architectures to choose from. At the heart of these models are Convolutional Neural Networks, which are widely used in the image classification domain. We will choose the VGG16 architecture for this example. VGG16 (also called OxfordNet) is a convolutional neural network architecture named after the Visual Geometry Group at Oxford, which developed it. It consists of 16 weight layers and comes pre-trained on ImageNet (more than a million labelled images spanning 1,000 classes). This is the “knowledge” I mentioned earlier. But we don’t need all of the network’s layers: out of the box, VGG16 performs image classification, whereas we are only interested in the layers that extract features from the images, so we will get rid of the layers that output class probabilities.
We will use Google Colab to execute the code. The product images come from the Zappos50k dataset, a large shoe dataset consisting of 50,025 catalog images collected from Zappos.com. The images are stored in a structure that follows the pattern category/subcategory/designer/. Since we are not interested in predicting the categories, we just need to collect all the images, assuming they are stored in Google Drive under this path:
/content/drive/MyDrive/Data/Zappos/ut-zap50k-images-square
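If the dataset lives in Google Drive as above, the drive must first be mounted in the Colab runtime (the /content/drive mount point is the Colab convention):

from google.colab import drive
drive.mount('/content/drive')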
Let’s start by importing the necessary packages and setting up some values:
from pathlib import Path
import matplotlib.pyplot as plt
import pandas as pd
from keras.applications.vgg16 import VGG16, preprocess_input
from keras.preprocessing.image import load_img, ImageDataGenerator
from keras.models import Model
ZAPPOS_DATA = "/content/drive/MyDrive/Data/Zappos/ut-zap50k-images-square"
BATCH_SIZE = 64
IMG_SIZE = (224, 224)
VGG16 expects 224x224 RGB images
We will load the model, but since we will be using it as a feature extractor rather than a classifier, we don’t need the final classification layer.
# load the model pre-trained on ImageNet, including the classification head
vgg16 = VGG16()
# drop the final softmax layer and output the 4096-dimensional fc2 layer instead
vgg_custom = Model(inputs=vgg16.input, outputs=vgg16.layers[-2].output)
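As an optional sanity check, printing the summary of the truncated model should show that its last layer is now the 4096-unit fully connected layer (fc2) instead of the 1,000-unit softmax:

vgg_custom.summary()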
zappos_images = [str(x) for x in Path(ZAPPOS_DATA).rglob("*.jpg")]
datagen = ImageDataGenerator(preprocessing_function=preprocess_input)
imgs_dframe = pd.DataFrame(zappos_images, columns=['filename'])
reduced_img_dframe = imgs_dframe.sample(n=5000, random_state=42)
# class_mode=None: we only need the images, not labels;
# shuffle=False keeps the batch order aligned with the dataframe rows
dset = datagen.flow_from_dataframe(reduced_img_dframe,
                                   target_size=IMG_SIZE,
                                   batch_size=BATCH_SIZE,
                                   class_mode=None,
                                   shuffle=False)
We randomly select a set of 5000 images to speed up the process
preds = vgg_custom.predict(dset)
print(preds.shape)
(5000, 4096)
The prediction step might take a while to complete
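Because feature extraction is the slow part, it can be worth caching the result so a later session doesn’t have to recompute it. A minimal sketch (the file name zappos_features.npy is my own choice):

import numpy as np

# cache the extracted features in Google Drive...
np.save('/content/drive/MyDrive/Data/zappos_features.npy', preds)
# ...and restore them later without re-running the network
# preds = np.load('/content/drive/MyDrive/Data/zappos_features.npy')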
OK, so now we are at the point where we have fed the network 5,000 images and extracted their features. Each image is represented by a vector of 4,096 elements, and we will measure the cosine similarity between each pair of vectors, i.e. the cosine of the angle between the two vectors in a multi-dimensional space. This produces a matrix that we can (optionally) save as a Pandas dataframe for future use.
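To make the definition concrete, here is a tiny NumPy sketch (the two toy vectors are my own) of what cosine_similarity computes for every pair of feature vectors:

import numpy as np

a = np.array([1.0, 2.0, 3.0])
b = np.array([2.0, 4.0, 6.0])

# cosine similarity = a·b / (||a|| * ||b||)
# 1.0 means the vectors point in the same direction, 0.0 means orthogonal
cos_sim = np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))
print(cos_sim)  # 1.0, since b is just a scaled version of a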
from sklearn.metrics.pairwise import cosine_similarity
cosSimilarities = cosine_similarity(preds)
# index both rows and columns by image path so any product can be looked up directly
cos_similarities_df = pd.DataFrame(cosSimilarities,
                                   columns=reduced_img_dframe['filename'],
                                   index=reduced_img_dframe['filename'])
cos_similarities_df.to_pickle('/content/drive/MyDrive/Data/zappos_cosine.pkl')
cos_similarities_df.shape
(5000, 5000)
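In a later session we can load the saved matrix back and skip the feature extraction entirely:

cos_similarities_df = pd.read_pickle('/content/drive/MyDrive/Data/zappos_cosine.pkl')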
We now have everything we need to make product recommendations. We will write a simple class that, given a product image, looks it up in the similarity matrix and returns the most similar product images.
class SimilarityPredictor:
    def __init__(self, cosinedf):
        self.cosinedf = cosinedf

    def getSimilar(self, originalPath, noRelated):
        print("Original product")
        org = load_img(originalPath)
        plt.imshow(org)
        plt.show()
        # sort the column for this product once; skip position 0,
        # which is the product itself (similarity 1.0)
        closest = self.cosinedf[originalPath].sort_values(ascending=False)[1:noRelated + 1]
        closest_imgs = closest.index
        closest_imgs_scores = closest.values  # kept in case you want to display the scores
        print("You might also like: ")
        f = plt.figure(figsize=(20, 12))
        for i in range(len(closest_imgs)):
            org = load_img(closest_imgs[i])
            f.add_subplot(1, len(closest_imgs), i + 1)
            plt.imshow(org)
        plt.show()
Let’s test with some random product images:
similarity = SimilarityPredictor(cos_similarities_df)
# pick one random product image from our 5,000-image sample
randomProd = reduced_img_dframe['filename'].sample(n=1).iloc[0]
similarity.getSimilar(randomProd, 5)
I ran the above code cell three times and these are the images returned (the original image is in the first row and the 5 suggestions in the second):
The suggestions are more or less on point. The suggested products are visually similar to the original, and we achieved this without using any product data (category, sub-category, color, etc.), just the images.