Our case study applied our proposed framework to observe how attribute association bias changes when user gender is removed as a feature in an industry LFR model. Gonen and Goldberg [30] suggested that user gender bias can manifest as systematic bias embedded within the latent space, making it difficult for simple mitigation techniques to address the core issue; they noted “a systematic bias found in the embeddings, which is independent of the gender direction.” Given this independence, debiasing methods grounded in removing the gender direction were found to be “superficial” fixes. Systematic bias in our case study, similar to that found in word embeddings, would result in significant attribute association bias even when user gender is not included as a model feature.
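To make the “superficial fix” argument concrete, the sketch below (illustrative only, not the paper’s implementation) estimates a gender direction from user embeddings, projects it out, and then checks whether user gender can still be recovered from the “debiased” vectors; above-chance accuracy would point to systematic bias that is independent of the gender direction. The helper names and the centroid-difference heuristic are assumptions made for the sketch.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score


def project_out(vectors: np.ndarray, direction: np.ndarray) -> np.ndarray:
    """Remove the component of each row vector along a unit-normalized direction."""
    d = direction / np.linalg.norm(direction)
    return vectors - np.outer(vectors @ d, d)


def residual_gender_signal(user_vecs: np.ndarray, gender_labels: np.ndarray) -> float:
    """Cross-validated accuracy of predicting gender from 'debiased' user vectors.

    Accuracy well above chance after the gender direction is removed suggests
    systematic bias that direction removal alone does not address.
    """
    # Estimate a gender direction as the difference of group centroids (a common heuristic).
    direction = (user_vecs[gender_labels == 1].mean(axis=0)
                 - user_vecs[gender_labels == 0].mean(axis=0))
    debiased = project_out(user_vecs, direction)
    clf = LogisticRegression(max_iter=1000)
    return cross_val_score(clf, debiased, gender_labels, cv=5).mean()
```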
Consistent with the results of Gonen and Goldberg [30], our case study suggests that gender stereotypes can become implicitly embedded in the representations of both users and items, as evidenced by the persistence of this bias after user gender was removed as a feature during model training. We found that removing user gender as a feature resulted in a statistically significant decrease in levels of attribute association bias, but significant attribute association bias still remained. The presence of such implicit attribute association bias signals the potential for systematic gender bias when using LFR models and representations, and potentially recommendation algorithms more generally, to serve podcasts to users. This finding is not surprising given previous research detailing the highly gendered nature of podcast listening [12, 18, 52]. Our findings demonstrate that, as in Gonen and Goldberg [30], known systematic bias can be detected and quantified in recommendations using our framework. Given this, it is essential for practitioners to audit for attribute association bias when systematic bias is a known factor in their recommendation scenario, such as podcast recommendation.
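As one hedged illustration of how such a comparison could be operationalized (this is not the paper’s exact metric pipeline), the sketch below scores each item vector against a gender direction estimated from the user latent space and tests whether removing the gender feature lowers the distribution of absolute association scores. The centroid-difference direction, the cosine-similarity scores, and the paired Wilcoxon test are assumptions chosen for the example.

```python
import numpy as np
from scipy import stats


def gender_direction(user_vecs: np.ndarray, gender_labels: np.ndarray) -> np.ndarray:
    """Unit vector from the female-user centroid toward the male-user centroid."""
    d = (user_vecs[gender_labels == 1].mean(axis=0)
         - user_vecs[gender_labels == 0].mean(axis=0))
    return d / np.linalg.norm(d)


def item_association_scores(item_vecs: np.ndarray, direction: np.ndarray) -> np.ndarray:
    """Cosine similarity of each item vector with the gender direction."""
    norms = np.linalg.norm(item_vecs, axis=1, keepdims=True)
    return (item_vecs / norms) @ direction


def bias_decrease_test(scores_with_gender: np.ndarray, scores_without_gender: np.ndarray):
    """Paired one-sided test: are absolute association scores lower without the gender feature?"""
    return stats.wilcoxon(np.abs(scores_with_gender),
                          np.abs(scores_without_gender),
                          alternative="greater")
```

A significant result would indicate a decrease in bias after feature removal, while the magnitude of the remaining scores indicates how much implicit association persists.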
Additionally, our findings indicate that more straightforward measures, such as removing gender as a feature during training, can also be seen as a “superficial” attempt to remove gender bias from the representation space entirely. Our results, which show relatively small changes in our metrics and classification outputs, demonstrate that relying on feature removal in recommendations will not fully mitigate user gender bias in the user or item vectors. The findings also demonstrate that implicit attribute association bias can occur in LFR models, signaling that this type of bias may need to be accounted for in representation learning beyond NLP and image processing. That said, we did find that removing user gender during training of a deep LFR model reduces the inequality of gender association between male and female users and their stereotypical genres and content. In the case of podcast listening, gender bias is unlikely to be removed entirely, given the systematic bias evident in users’ listening trends and documented in external research. However, removing gender as a distinct feature could be a first step in reducing the overall amplification of gendered associations reflected in the relationships between user and item vectors within the trained latent space.
Our findings, which support the possibility that systematic bias can surface as attribute association bias in LFR outputs, lead us to a more challenging question for the research community: when is it appropriate to mitigate systematic bias? If implicit attribute association bias is found to improve user experience, how should one reduce the risk of representative harm? In some cases, like ours, stereotyped behavior is typical, and some may argue that it is beneficial for providing valuable content recommendations to users. In the case of user gender bias, the harm lies in the model potentially reinforcing stereotypes by driving users towards gendered listening habits they might not otherwise adopt. It is possible to monitor levels of attribute association bias over time to flag increasing bias in the latent space. Nevertheless, at what point do levels of reinforced and implicit bias become harmful? Both the research and practitioner communities would benefit from more exploration of how to approach setting baselines for managing representative harms in recommendations.
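As a rough sketch of the monitoring idea raised above, the example below flags a newly trained latent space when its mean absolute attribute association score drifts above a rolling baseline; the window size and tolerance are hypothetical values, not thresholds proposed in this work.

```python
import numpy as np


def flag_bias_drift(history: list[float], current: float,
                    window: int = 5, tolerance: float = 0.10) -> bool:
    """Return True when the current bias level exceeds the recent baseline by `tolerance`.

    `history` holds mean absolute association scores from previous model versions;
    `current` is the score for the newly trained latent space.
    """
    if len(history) < window:
        return False  # not enough history to establish a baseline
    baseline = float(np.mean(history[-window:]))
    return current > baseline * (1.0 + tolerance)
```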
