We make recommendation systems easy, simple and fun.
A cold start problem is when the system cannot draw any inferences for users or items about which it has not yet gathered sufficient information. Simply put, if you have no or less initial data, what recommendation is the system supposed to give to the user?
While recommender systems are useful for users who have some previous interaction history, the same might not be the case for a new user or a newly added item. The problem is that in both cases we don’t have any history to base the recommendations on.
Since we don't have any usage history of the user or the item hasn't been yet interacted with by any user, the usual techniques for recommending stuff won't work. This is a common problem of not being able to start recommending things-- hence called the cold start problem.
For any recommendation system that needs to bootstrap from an initial phase with little or no historical data, interaction must be taken into consideration. It will enable the application to provide a personalized experience to its user as a core part of their product. One may argue that solving cold start problems might even make or break a product during its launch.
So, how do we make recommendations without any historical data about the user or item? There are several ways of doing that. The basic idea is to use the data we have about the user or item to make the recommendations.
The most common techniques are listed below:
Using greedy methods is the most basic way of handling cold-start problems because they are very rudimentary and easy to implement. These methods involve using some sort of easily calculable algorithm of serving recommendations to users.
a. Random Normal Predictor
As the name suggests this method serves recommended items randomly. Some more sophisticated methods sample a random value from the distribution of ratings given to the items by other users. Although simple to implement, this method doesn't serve relevant items to the user and is only useful as a means to bootstrap the process of gathering user preferences for future recommender models to use.
b. Most popular items
Another widely used method is to just recommend the most popular items, i.e. either in terms of interaction or best rating to each new user. For example, if I log in to a guitar parts website then it would make sense to show me guitar strings and picks as these are items that most people need to replace in their guitar.
If a user has given explicit or implicit feedback to only a few items then Collaborative Filtering may not give good results. However, we can measure the similarities of these items to other items that the user hasn’t given feedback and recommend the closest ones based on the similarity scores. This will also solve the problem related to recommending new items no one has interacted with yet as we are only looking at the item metadata.
A drawback of this method is the recommended items will not be diverse as they are scored only based on the similarity of items the user has already interacted with. For example, suppose that a user has watched 3 action movies, then only action movies would be recommended to the user. This is, however, more personalized to the user and can work with small interaction data as well.
Another bottleneck could be the speed of generating recommendations as each item needs to be compared to every other item. To overcome such cases we can use fast similarity-search based methods.
The multi-armed bandit method is based on combining the earlier two methods to first showing the recommendation in a random manner, and then iteratively improving them as the user interacts with these recommended items.
The name “bandit” comes from the world of casinos where we have multiple slot machines and we have to decide if we should continue playing a single machine (exploitation) or move to a new machine (exploration) to maximize our winnings. At any given time, we won’t know the internal mechanism of these machines — only the actions we take and rewards we get. Here, we need to decide between multiple choices purely based on taking actions and observing the results. These algorithms try to do a trade-off between exploration and exploitation to identify optimal strategy.
To better understand multi-armed bandit algorithms we can look at the example of website design. Say, we have a new version of our website and need to decide how well it performs compared to the older version. Let’s assume that we have an effective feedback mechanism by which we know the version the user liked or disliked (reward). Our action here is to present the user with option A or B of the website. Our environment is the user visiting the website. Our reward is the feedback received.
This works well if we have no data to start recommending items. The two main steps involved are exploration and exploitation that can be explained below.
We recommend items that have no feedback from the user so it is unchartered territory. When we tread these territories and recommend items never interacted with users, we start to gain the preferences of users. Thus, slowly we know what to recommend to users and what not to. But the priority always lies in recommending items never seen by the user.
If a user likes an item and rates it positively we just keep on recommending the same item or same type of item to the user taking no risks. This usually ensures that the recommendations are going to work but it gets boring pretty soon and doesn’t let different types of items that the user hasn’t interacted with to be tested on the user.
Multi-arm bandit algorithms help us maintain a balance between exploitation and exploration with our inputs. Hence, it helps solve the cold start problem in recommendation systems.