Recommender Systems in E-commerce: Why Are They Crucial and How Do You Improve Them?

Most of us now shop online, perhaps more often than in person. E-commerce continues to boom, with many experts projecting the market to reach almost $6.5 trillion by 2027, which corresponds to an annual growth rate of 12% over the next five years.

An integral part of every e-commerce business is its recommender system (RS) – an inbuilt filtering mechanism that uses ranking and other methods to generate the most suitable results, providing shoppers with personalized suggestions based on their shopping history.

This is crucial, because it allows e-marketplaces to boost their revenues by up-selling and cross-selling to their clients, while also eliminating what’s known as the “burden of choice” by listing the most appropriate items first.

Having an RS that displays shopping items based on each customer’s preferences is how e-commerce platforms retain their customers. Consequently, this becomes all about establishing long-term relationships between e-stores and e-shoppers. For example, according to Accenture Research, a vast majority of consumers – over 90% – like shopping at those e-stores that can quickly recognize their habits and preferences. A well-publicized finding from McKinsey from almost a decade ago famously revealed that 75% of what people watch on Netflix is based on the app’s RS, as is 35% of all purchases made on Amazon. And this trajectory has only solidified and intensified since.

The role of ML

Concurrently, the global AI retail market is projected to exceed $24 billion in 2028. This market is intimately tied to e-commerce because of Machine Learning (ML).

An ML model at the core of every RS allows that RS to carry out its functions. Every ML model, in turn, requires a high-quality labeled dataset that the algorithm can utilize to train itself. As a result, a solid RS – and hence also a successful e-commerce store – implies a source of function-specific labeled data.

This labeled data can be obtained in a number of ways, of which crowdsourcing is considered one of the fastest and most affordable. Through ML, the data powered by human insight has a direct effect on recommender systems and e-commerce.

How annotated data is used

E-commerce companies can use this data to train their ML models (whether brand-new models or pre-trained foundation models) and tailor them to specific tasks that bolster RS performance. This is known as model fine-tuning for downstream applications.

Another way that annotated data is successfully used by e-commerce businesses is commonly referred to as human-in-the-loop monitoring, which is essentially performance evaluation by real people. This is when crowd contributors gauge recommendation systems post-deployment and detect potential problems (for instance, data drift), so that these issues can be nipped in the bud before they take their toll on the business.
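As a rough illustration of this kind of monitoring, the sketch below compares the share of recommendations that annotators judged relevant in a baseline window with the share in a recent window, and raises a flag when the drop is large. The function names, the sample labels, and the 10-point threshold are assumptions made for the example, not part of any particular platform.

```python
def relevance_rate(labels):
    """Share of recommendations that annotators marked as relevant."""
    if not labels:
        raise ValueError("need at least one annotator label")
    return sum(labels) / len(labels)


def quality_dropped(baseline_labels, recent_labels, max_drop=0.10):
    """Return True when the recent relevance rate falls more than
    `max_drop` below the baseline rate (threshold is an assumption)."""
    drop = relevance_rate(baseline_labels) - relevance_rate(recent_labels)
    return drop > max_drop


# Hypothetical annotator judgments: True means "this recommendation fits".
baseline = [True] * 9 + [False]      # 90% judged relevant at launch
recent = [True] * 7 + [False] * 3    # 70% judged relevant this week
print(quality_dropped(baseline, recent))
```

A flag like this does not say why quality fell; it only signals that a closer look (at data drift, for instance) is probably worthwhile.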

Human annotators can evaluate RS performance directly – by rating predictions provided by the model. They can also do it indirectly – by completing the same tasks the model faced and producing “ground truths” that can later be used to judge the model’s responses. The first track is great for evaluation, while the second is great for both evaluation and further RS fine-tuning. This is because the data provided by annotators in the second case doesn’t just flag when something is wrong but also supplies the right answers.
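The two tracks can be sketched in a few lines. In this minimal example, the direct track averages annotator ratings of the model's output, while the indirect track scores the model's top recommendations against an annotator-built ground-truth set using precision at k. The metric choice, the function names, and the sample items are assumptions for illustration only.

```python
def mean_rating(ratings):
    """Direct track: average of annotator ratings given to model output."""
    if not ratings:
        raise ValueError("need at least one rating")
    return sum(ratings) / len(ratings)


def precision_at_k(recommended, ground_truth, k):
    """Indirect track: fraction of the top-k recommended items that
    also appear in the annotator-produced ground-truth set."""
    if k <= 0:
        raise ValueError("k must be positive")
    top_k = recommended[:k]
    hits = sum(1 for item in top_k if item in ground_truth)
    return hits / k


# Hypothetical data: the model's ranked output and the annotators' answers.
recommended = ["case", "charger", "socks", "cable"]
ground_truth = {"case", "cable", "screen_protector"}

print(mean_rating([5, 4, 3]))
print(precision_at_k(recommended, ground_truth, k=4))
```

Note that the ground-truth set also contains an item the model missed ("screen_protector" here), which is exactly the kind of corrective signal that can feed later fine-tuning.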

How an RS can be improved

Now, let’s look at three data-labeling tasks that crowd contributors carry out to help fine-tune and evaluate recommender systems.

Personalized products

A vital feature in any effective RS is the engine’s ability to offer personalized products based on each customer’s shopping preferences (i.e., matching items to shoppers’ profiles). When an RS understands what to offer to every customer, this results in immediate purchases, as well as long-term marketplace loyalty. This loyalty stems from the fact that a customer will be reluctant to look for new shopping options elsewhere if their habits and needs are already understood and met by their favorite e-store. To achieve that, crowd contributors act as e-customers with individual shopping histories and annotate training data by rating “degrees of interest” on the items offered to them by an RS.
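One way such "degree of interest" ratings might be turned into a single quality score is normalized discounted cumulative gain (NDCG), a common ranking metric that rewards placing highly rated items near the top. The sketch below assumes the ratings are numeric and listed in the order the RS displayed the items; the metric is a choice made for this example and is not named in the article.

```python
import math


def dcg(ratings):
    """Discounted cumulative gain: ratings lower in the list count less."""
    return sum(r / math.log2(pos + 2) for pos, r in enumerate(ratings))


def ndcg(ratings_in_shown_order):
    """DCG of the shown order divided by DCG of the best possible order."""
    ideal = dcg(sorted(ratings_in_shown_order, reverse=True))
    if ideal == 0:
        return 0.0  # no item was rated as interesting at all
    return dcg(ratings_in_shown_order) / ideal


# Hypothetical interest ratings (0 = none, 3 = high) for five shown items.
shown = [3, 0, 2, 1, 0]
print(round(ndcg(shown), 3))
```

A score of 1.0 means the items were shown in exactly the order the annotator would have preferred; lower values mean interesting items were buried further down the list.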

Complementary item discovery

The goal here is to improve an RS, so that it offers accurate recommendations of relevant accessories and complementary items. A good example would be a phone case that matches the exact model of a smartphone recently purchased or carted by the e-shopper. Often, automated solutions that are meant to do this get complaints about their inaccuracy. So, in order to provide an ML model with the right data to improve an RS, crowd contributors carry out a series of pairwise comparisons. During this form of data labeling, human annotators match different items that can potentially be grouped together. After fine-tuning based on this labeled data, RS accuracy for complementary item discovery has been reported to climb to over 90%.
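Because several contributors usually judge the same pair, their answers need to be combined before they can serve as training labels. Here is a minimal sketch that does this with a simple majority vote. The vote format, the item names, and the agreement threshold are all hypothetical, and production pipelines often use more elaborate aggregation.

```python
from collections import defaultdict


def aggregate_pair_votes(votes, min_agreement=0.5):
    """Combine annotator votes into one label per item pair.

    `votes` is an iterable of (item_a, item_b, is_complementary) tuples.
    A pair is labeled complementary when the share of "yes" votes
    exceeds `min_agreement`.
    """
    tallies = defaultdict(lambda: [0, 0])  # pair -> [yes votes, total votes]
    for item_a, item_b, is_complementary in votes:
        pair = tuple(sorted((item_a, item_b)))  # order of items is irrelevant
        tallies[pair][0] += int(is_complementary)
        tallies[pair][1] += 1
    return {pair: yes / total > min_agreement
            for pair, (yes, total) in tallies.items()}


# Hypothetical votes from three annotators on two candidate pairs.
votes = [
    ("phone_x", "case_x", True),
    ("case_x", "phone_x", True),
    ("phone_x", "case_x", False),
    ("phone_x", "case_y", False),
    ("phone_x", "case_y", False),
    ("phone_x", "case_y", True),
]
print(aggregate_pair_votes(votes))
```

The resulting pair labels can then be used as positive and negative examples when fine-tuning the model that proposes accessories.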

Serendipitous item discovery

Another useful data-labeling task that’s used to train, fine-tune, evaluate, and ultimately improve recommender systems is what’s known as serendipitous item discovery. This means that a well-trained ML algorithm at the core of an e-commerce engine will recommend new or surprising products that aren’t related to other goods bought by the same customer. Sometimes, serendipitous items may be linked to shopping behavior, but even more often, it’s about their overall “coolness” from the perspective of novelty, not necessarily usability.

Serendipitous item discovery is crucial in e-commerce, because there’s only so much a well-tuned RS can do as far as suggesting complementary items; after all, there’s a finite number of products that can be used with other products. So, to get an RS to provide more options to e-shoppers, crowd contributors are asked to annotate more data and determine what can be seen as unusual products or novelty items. This is done via text and image classification tasks, with up to 20,000 items being processed in a single day.

Key takeaways

Today, recommender systems are fundamental to success in e-commerce. These systems need to operate effectively to give e-marketplaces an edge in a highly competitive environment. Since recommender systems rely on ML models for functioning, high-quality labeled data is needed to both train and fine-tune ML models, as well as test them after deployment, which is known as performance evaluation or human-in-the-loop monitoring.

Annotated data for RS improvement has to be store-specific, and it has to be delivered to e-platforms quickly, on a continuous basis, and at an affordable rate. Crowdsourcing offers a viable alternative to lengthy and expensive in-house labeling, with platforms like Toloka serving international e-commerce clients with their large fleet of global crowd contributors and ML engineers.