Assessing Preference Manipulation in Amazon's Recommender System by @escholar



Too Long; Didn't Read

While the analysis confirms a significant increase in Barrier-to-Exit, indicating preference manipulation, the study faces challenges with data validity and sample bias. Further research needs broader metrics capturing diverse user behaviors and system-level audits to gauge preference manipulation accurately across various contexts.


(1) Jonathan H. Rystrøm.

Abstract and Introduction

Previous Literature

Methods and Data



Conclusions and References

A. Validation of Assumptions

B. Other Models

C. Pre-processing steps

5 Discussion

5.1 Key Findings

Recall our research question:

RQ: Has the Amazon Book Recommender System made it more difficult for users to change preferences over time?

Our analysis finds highly significant growth in Barrier-to-Exit over the period from 1998 to 2018. We can therefore reject the null hypothesis of no change, which provides evidence that the Amazon Book Recommender System has indeed made it more difficult to change preferences.

The growth rate of Barrier-to-Exit is approximately 1.8% per year. This implies that over the 20-year period of the dataset, Barrier-to-Exit has increased by approximately 43%.[3] Extrapolating is a tricky matter, particularly for nonlinear parameters (Timmers & Wagenaar, 1977), and we must further moderate these findings in light of the limitations of the study, which we discuss in the next section.
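The cumulative figure follows from compounding the yearly rate, as in footnote [3]; a quick arithmetic check:

```python
annual_growth = 0.018   # ~1.8% estimated yearly growth in Barrier-to-Exit
years = 20              # span of the dataset (1998-2018)

# Compound the yearly rate over the full period
total = (1 + annual_growth) ** years
print(round(total, 2))  # ~1.43, i.e. a ~43% cumulative increase
```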

5.2 Strengths and Limitations

There are both strengths and weaknesses in our analysis. The main strength is the scale of the analysis, which allowed us to detect relatively small effect sizes even while relying on noise-introducing approximations, such as defining category relevance through co-occurrence (see 3.1). The scale also allowed us to test the feasibility of using Barrier-to-Exit in practice, a point we return to later.

However, the design also has some significant limitations, particularly with a) the validity of the proxy (i.e. the construct validity) and b) the potential sample bias.

The primary issue with the validity of the proxy is that we have an indirect and incomplete view of the recommendation process. Ratings constitute only a small part of the interaction with the recommender system; the primary feedback loop is plausibly in purchasing and browsing behaviour (Smith & Linden, 2017). This breaks with the original framing of Rakova and Chowdhury (2019), which assumes a more direct interaction between ratings and recommender systems. While this assumption holds for MovieLens (Harper & Konstan, 2016), it is more problematic for Amazon. The MovieLens recommender system is built around ratings; the "contract" between the user and MovieLens is that the user provides ratings and MovieLens gives personalised recommendations based on those ratings.

The ratings on Amazon, however, serve a more public purpose: they help other consumers choose products (Leino & Räihä, 2007). In some sense, this makes ratings a strong signal of preferences (Leino & Räihä, 2007).

The main implication for validity is thus one of coverage. Rating on Amazon carries a cost both in time (one has to publicly create a review) and money (most users probably rate products they have bought; Leino & Räihä, 2007). Accordingly, the average Amazon user in the filtered dataset made 43 ratings, while the average MovieLens user made ca. 740 (Harper & Konstan, 2016).[4]

This leads us to potential sample bias. Because Barrier-to-Exit requires relatively many reviews to have a well-defined value, our sample consists predominantly of very active users. This introduces a bias: we can only draw inferences for this particular subset of users and not for the general population of Amazon customers. The sample bias reveals a problematic aspect of Barrier-to-Exit as a model for preference change, which we discuss further in section 5.3.

These active users plausibly represent a substantial fraction of Amazon's revenue. However, while there is plausibly a correlation between the number of ratings a user makes and that user's commercial value to Amazon, this need not be the case.

Additionally, there are important limitations with the statistical analysis that must be addressed.

First, some assumptions were violated (see appendix A). Specifically, the residuals and random effects deviate from normality. These violations arise from ill-behaving categories (see 6 in appendix A). Theoretically, this should not affect the fixed-effects estimates (Schielzeth et al., 2020), and re-fitting the model with the problematic categories removed indeed showed similar results (see appendix B.2). It does, however, affect the interpretation of the VPCs of the two categories. Since these are mainly used to control for variability across levels (Baayen et al., 2008) rather than to test hypotheses (as described by Maddala, 1971), this is a minor problem.

There are two potential statistical causes of these issues. The first is a lack of data from the early years of Amazon. As fig. 4b shows, there are very few observations of Barrier-to-Exit in the early years compared to later. Amazon has grown dramatically in the past 25 years (Wells et al., 2018), which has been fuelled by many more people having access to the internet (Pandita, 2017). Statistically, this makes it more difficult to assess the long-term increase as early observations will tend to have high leverage (Fox, 2015).

This leads us to the second issue: problems with transformations. Transforming the data introduces problems for the validity and interpretability of the results (Feng et al., 2014), which leads some researchers to argue that it is better to refrain from transformation even when assumptions are violated (Schielzeth et al., 2020).

In our case, however, fitting the models without transforming Barrier-to-Exit led to singular fits (Fox, 2015). Since our data contained no zero values and had a relatively high mean given the skew, we could log-transform the data while avoiding the most serious problems of transformation (O'Hara & Kotze, 2010; Ghasemi & Zahediasl, 2012).
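To illustrate why a log transform helps in this situation, the following sketch shows how it removes the right skew of a strictly positive variable. The data here are synthetic (lognormal by assumption, not the paper's dataset):

```python
import numpy as np

def skewness(v):
    """Sample skewness: third standardised central moment."""
    m, s = v.mean(), v.std()
    return ((v - m) ** 3).mean() / s ** 3

# Simulated right-skewed, strictly positive values, standing in for a
# Barrier-to-Exit-like variable (an illustrative assumption only).
rng = np.random.default_rng(1)
x = rng.lognormal(mean=1.0, sigma=0.8, size=5000)

print(round(skewness(x), 2))          # strongly right-skewed before
print(round(skewness(np.log(x)), 2))  # near zero after the log transform
```

Because the simulated values are strictly positive, the log is defined everywhere; this mirrors the paper's observation that the absence of zero values makes the log transform unproblematic.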

However, there are alternatives. We could have used generalised linear mixed models (GLMMs; Fox, 2003). GLMMs allow us to specify a link function, which makes it possible to account for the non-normality in a more elegant way (Fox, 2015). As Barrier-to-Exit is continuous and right-skewed, it might be well fitted by a gamma distribution (Nakagawa et al., 2017). As a post-hoc test, we fitted a gamma GLMM to the data (see B.3). The findings are similar (positive growth in Barrier-to-Exit); however, problems with the fit preclude strong conclusions. This highlights the added complexity of modelling with GLMMs (Fox, 2003).
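As a rough illustration of the gamma-with-log-link idea, the following is a fixed-effects-only sketch on synthetic data; it is not the paper's GLMM, which also includes random effects. It fits a gamma GLM by iteratively reweighted least squares (IRLS); for a log link the gamma IRLS weights are constant, so each step reduces to an ordinary least-squares solve:

```python
import numpy as np

def fit_gamma_glm_log_link(X, y, n_iter=50):
    """Fit a gamma GLM with a log link via IRLS.

    For a gamma family with a log link the IRLS weights are constant,
    so each step is an OLS solve on the working response
    z = eta + (y - mu) / mu.
    """
    eta = np.log(y)  # initialise the linear predictor from the response
    beta = None
    for _ in range(n_iter):
        mu = np.exp(eta)
        z = eta + (y - mu) / mu
        beta, *_ = np.linalg.lstsq(X, z, rcond=None)
        eta = X @ beta
    return beta

# Synthetic example: a strictly positive, right-skewed outcome whose mean
# grows ~1.8% per "year", loosely mirroring the reported growth rate.
rng = np.random.default_rng(0)
years = rng.uniform(0, 20, 2000)
mu = np.exp(0.5 + 0.018 * years)          # log-linear mean
y = rng.gamma(shape=2.0, scale=mu / 2.0)  # gamma noise, right-skewed

X = np.column_stack([np.ones_like(years), years])
intercept, slope = fit_gamma_glm_log_link(X, y)
print(round(slope, 3))  # recovers a growth rate close to 0.018
```

With the log link, the fitted slope is directly interpretable as a multiplicative yearly growth rate, which is what makes this family a natural candidate for Barrier-to-Exit.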

5.3 Implications and Further Research

Because of the issues discussed above, we fail to provide a "smoking gun" for preference manipulation in Amazon's book recommender system. Nevertheless, in attempting to model the evolution of Barrier-to-Exit for Amazon, we have uncovered some important perspectives that can inform further work on investigating preference manipulation.

First, it is essential to have metrics that are defined for the entire population. Metrics can exclude certain people either in their definition or in their execution. As previously discussed, Barrier-to-Exit is only defined for people who have made several ratings within the same category during the specified time window. This excludes both customers who choose not to (publicly) rate products and customers who only use Amazon sporadically. Previous research suggests that "lurkers" (people who do not actively post) make up a substantial part of the internet population (Nonnecke & Preece, 2000). Investigating preference manipulation for these users is important, both ethically (Jannach & Adomavicius, 2016) and legally (Franklin et al., 2022). However, with Barrier-to-Exit we lack the data to do so.

One way to get broader coverage is to shift the metrics from the user level to the system level, i.e. to ask whether the recommender system manipulates preferences in general. In this paper, we use the aggregate Barrier-to-Exit of many users to investigate the trend over time. By instead focusing on the system itself, we could expand our toolbox. This includes "sock puppet" auditing (Sandvig et al., 2014): creating fake profiles that interact with recommender systems in controlled ways. Sock-puppet audits have been used to investigate whether different recommender systems facilitate radicalisation (Ledwich & Zaitsev, 2019). However, the methodology comes with its own set of practical and ethical limitations (see Sandvig et al., 2014).

Second, there is a dilemma of portability (i.e. how well the metric can be used across contexts; see Selbst et al., 2019). On the one hand, socio-technical metrics like Barrier-to-Exit need to be tailored to their context; blindly "porting" metrics from one domain to another can obscure their original purpose. On the other hand, portability between different systems is necessary for comparison.

Barrier-to-Exit was designed for a content recommendation system based on user ratings. Intuitively, that should make it well suited for the Amazon book recommender: the setup of the data is similar (Harper & Konstan, 2016; Ni et al., 2019). Nevertheless, the difference in context makes it difficult to port Barrier-to-Exit to Amazon, introducing the sampling-bias issues discussed earlier.

One solution is to rely on audits conducted by the companies themselves. Such audits could provide a more accurate estimate of user preferences than external ratings, since the companies have access to proprietary data. However, this raises ethical concerns, as companies may not accurately report negative findings about their own systems. Some researchers support this type of self-governance (Roski et al., 2021), while others are sceptical (Zuboff, 2019). In any case, there must be mechanisms in place to verify these audits in order to establish trust and comply with regulations (Floridi et al., 2022).

Further work should focus on creating measures of preference manipulation in content-based recommender systems. These measures should have high construct validity and high coverage (i.e. measure the "actual" effort of preference change for close to all users).

[3] 1.018^20 ≈ 1.43

[4] Bear in mind the long-tailed nature of both distributions.

This paper is available on arxiv under CC 4.0 license.