Abstract
We present a novel evaluation framework for representation bias in latent factor recommendation algorithms. Our framework introduces the concept of attribute association bias in recommendations, allowing practitioners to explore how recommendation systems can introduce or amplify stakeholder representation harm. Attribute association bias (AAB) occurs when sensitive attributes become semantically captured or entangled in the trained recommendation latent space. This bias can result in the recommender reinforcing harmful stereotypes, which may lead to downstream representation harms for consumer and provider stakeholders. Latent factor recommendation (LFR) models are at risk of attribute association bias because they can entangle explicit and implicit attributes in the trained latent space. Understanding this phenomenon is essential given the increasingly common use of entity vectors as features in downstream components of hybrid industry recommendation systems. We provide practitioners with a framework for executing disaggregated evaluations of AAB within broader algorithmic auditing frameworks. Inspired by natural language processing (NLP) research on gender bias in word embeddings, our framework introduces AAB evaluation methods specifically for recommendation entity vectors. We present four evaluation strategies for sensitive AAB in LFR models: attribute bias vector creation, attribute association bias metrics, classification for explaining bias, and latent space visualization. We demonstrate the utility of our framework by evaluating user gender AAB with respect to podcast genres in an industry case study of a production-level DNN recommendation model. We uncover significant levels of user gender AAB both when user gender is used as a model feature and when it is removed during training to mitigate explicit user gender bias, pointing to the potential for systematic bias in LFR model outputs. Additionally, we discuss best practices, learnings, and future directions for evaluating AAB in practice.
1 Introduction
Latent factor recommendation (LFR) algorithms have become fundamental to industry recommendation settings [6, 14]. These recommendation algorithms, such as collaborative filtering and deep learning, provide predictions of engagement and embedded vector representations of users and items. The resulting trained vector representations can capture entity relationships and characteristics in a condensed dimensional space and allow for comparisons between different entity vectors in the trained latent semantic space. It has been demonstrated that user and item attributes can become entangled when leveraging these algorithms, resulting in feature duplication and bias amplification [62]. This algorithmic outcome can result in lower and less robust recommendation quality [62]. Research seeking to reduce this type of attribute entanglement has become increasingly prevalent and showcases favorable results when targeting exposure bias attributed to item attributes, popularity, or user behavior [20, 44, 58, 62, 64, 65]. However, this research primarily focuses on intrinsic mitigation techniques and improving recommendation performance; it does not always provide evaluation techniques for understanding how the bias may be captured explicitly or implicitly within the latent space. Attribute disentanglement methods traditionally require attributes to be independent and explicitly used as model features. This requirement means that disentanglement evaluation methods fail to address situations where attributes are interdependent with one another or present themselves implicitly in results. This stipulation hinders processes for identifying systematic bias, such as gender or racial bias, that can be interdependent with other attributes or be implicitly captured by behavior in the recommendation scenario.
Other research concerning this type of bias in representation learning has shown that systematic bias can occur due to the nature of latent factor algorithms, as previously found in research exploring systematic gender bias in natural language processing (NLP) [30]. Leveraging the framework introduced in this paper, we demonstrate that this risk can also occur within the outputs of latent factor recommendation. Given the popularity of LFR models in industry systems and the use of their outputs as downstream features in hybrid or multi-component recommendation systems, it is paramount that practitioners understand and can evaluate attribute association bias to reduce the risk of introducing or reinforcing representation bias in their recommendation systems.
In other areas of representation learning, such as NLP, implicit or systematic bias has been shown to result in downstream representation harms (e.g., translation systems being more likely to generate masculine pronouns when referring to stereotypically-male occupations [49]). NLP researchers have studied this type of bias by evaluating and mitigating gender association bias [8, 13, 15]. Even though association bias evaluation has been a focus in other areas of representation learning, it remains largely unstudied for recommendation systems [23]. In this work, we close this gap by presenting a framework of methodologies for evaluating attribute association bias resulting from LFR algorithms. Unlike previous work focusing on gender, our framework is designed to be attribute agnostic, hence the name attribute association bias. Attribute association bias is present when entity embeddings showcase significant levels of association with specific explicit or implicit entity attributes. For example, while users can be explicitly labeled by gender, pieces of content cannot be gendered. However, due to the potential for attribute entanglement, pieces of content can show measurable levels of implicit association with a gender attribute.
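To make this notion concrete, the sketch below shows one way such implicit association could be measured: a bias direction is estimated from user embeddings labeled with a binary attribute, and item embeddings from the same latent space are scored by their cosine similarity with that direction. This is a minimal illustration under stated assumptions, not the paper's released implementation; the array names (`user_vecs`, `user_attr`, `item_vecs`) and the centroid-difference construction are assumptions for the example.

```python
# Minimal sketch: estimate a binary-attribute bias direction from user
# embeddings and score item embeddings by their implicit association with it.
# Assumes `user_vecs` (n_users x d) and `item_vecs` (n_items x d) come from
# the same trained latent space; `user_attr` holds binary labels (0/1).
import numpy as np

def attribute_direction(user_vecs: np.ndarray, user_attr: np.ndarray) -> np.ndarray:
    """Unit-norm difference between the two attribute-group centroids."""
    centroid_a = user_vecs[user_attr == 0].mean(axis=0)
    centroid_b = user_vecs[user_attr == 1].mean(axis=0)
    direction = centroid_a - centroid_b
    return direction / np.linalg.norm(direction)

def implicit_association(item_vecs: np.ndarray, direction: np.ndarray) -> np.ndarray:
    """Cosine similarity of each item vector with the attribute direction."""
    norms = np.linalg.norm(item_vecs, axis=1, keepdims=True)
    return (item_vecs / norms) @ direction

# Items with scores far from zero lean toward one attribute group even though
# they carry no explicit attribute label themselves.
```

Scores concentrated near one extreme for a given content category (e.g., a podcast genre) would be the kind of signal the framework's metrics are designed to surface.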
Understanding how attribute association bias can become entangled within the trained latent space is vital in industry recommendation settings because of the popularity of latent factor algorithms and the common practice of leveraging embedding outputs in hybrid recommendation systems [6, 14]. In these settings, learned vectors may be used in otherwise unrelated model components to create product predictions based on user and item representations. If sensitive attribute bias is encoded into the vectors, it can plausibly be repeated and amplified when those vectors are used as features in other models. Ignoring this type of bias puts practitioners at risk of unknowingly amplifying stereotypes and representative harm within their recommendation systems. For example, Amatriain and Basilico [6] described how matrix factorization algorithms could be combined with “traditional neighborhood-based approaches” to create a recommendation system for Netflix. This combination consisted of LFR embeddings ranked by neighborhood-based algorithms to produce final recommendations for consumption [6]. If certain sets of user or item vectors were closely associated with a sensitive attribute, the resulting attribute association bias could be carried into, and potentially amplified by, the final ranked recommendations.
Our proposed evaluation methods for understanding attribute association bias in LFR models can be coupled with bias mitigation techniques to first determine whether a case requires mitigation and subsequently whether the mitigation technique was successful. This enables greater transparency when it would otherwise be challenging to interpret the causal effects of attributes on recommendation representations. Evaluation frameworks, such as the one presented in this paper, are essential for practitioners, allowing them to thoroughly investigate the level of bias in their system before experimenting with expensive mitigation techniques [48]. Our work provides a practical guide for evaluating attribute association bias in trained vector embeddings. We introduce recommendation entity-specific attribute association vector directions, bias metrics, and evaluation techniques inspired by NLP research on gender association bias. Our methods account for differences between recommendation system and NLP embeddings, are agnostic to the type of LFR algorithm, and are designed to support evaluation of any binary attribute beyond gender. In addition to introducing the methodology, we apply these methods in an industry case study to demonstrate how they can be used in practice and to illustrate best practices for interpreting evaluation results. To the best of our knowledge, we present one of the first evaluation frameworks for addressing attribute association bias (as a type of representation bias). Our case study also presents one of the first quantitative analyses of user gender bias in latent factor recommendations for podcasts. Our work lowers the barrier to analyzing attribute representation bias in LFR algorithms and opens the door to disentangling dependent attributes and implicit representation bias in recommendation systems.
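As one example of the kind of analysis the framework supports, "classification for explaining bias" can be illustrated with a simple probing setup: if a held-out classifier predicts the sensitive attribute from trained entity embeddings well above a majority-class baseline, the attribute is encoded in the latent space. The sketch below is only illustrative; the names `entity_vecs` and `attr_labels` and the choice of a scikit-learn logistic-regression probe are assumptions, not the classifier used in the paper's case study.

```python
# Illustrative probe for "classification for explaining bias": can a simple
# model recover the sensitive attribute from trained entity embeddings?
# Assumes numpy arrays `entity_vecs` (n x d) and binary `attr_labels` (n,);
# the logistic-regression probe is an assumption made for this sketch.
import numpy as np
from sklearn.dummy import DummyClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

def probe_accuracy(entity_vecs: np.ndarray, attr_labels: np.ndarray, folds: int = 5):
    """Cross-validated probe accuracy vs. a majority-class baseline."""
    probe = LogisticRegression(max_iter=1000)
    baseline = DummyClassifier(strategy="most_frequent")
    probe_acc = cross_val_score(probe, entity_vecs, attr_labels, cv=folds).mean()
    base_acc = cross_val_score(baseline, entity_vecs, attr_labels, cv=folds).mean()
    return probe_acc, base_acc

# Probe accuracy well above the baseline suggests the embeddings encode the
# attribute, flagging a candidate case for the framework's bias metrics and
# for possible mitigation.
```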
Our contributions include:
• Definition of attribute association bias within the context of latent factor recommendation algorithms (§3).
• Evaluation methodologies and metrics for analyzing attribute association bias between recommendation entity embeddings (§4).
• Recommendations for implementation in practice by showcasing techniques with a case study observing user gender attribute association bias resulting from a podcast recommendation model (§5-6).
• Discussion of limitations of this approach and future directions (§7-8).
Throughout the paper, we use language such as stereotypes, bias, and harm. When referring to bias, we are discussing algorithmic or qualitative statistically skewed results found in experimental or evaluation settings, which can produce harm [54]. We refer to stereotypes as a “product of biases” often held at the societal level, which may or may not be supported in experimental settings [5]. Stereotyping becomes an algorithmic harm when it is produced by algorithmic bias [54]. This type of harm can be seen as “representative” harm because it reinforces “the subordination of some groups along the lines of identity” [54]. Our framework and the presented case study seek to quantitatively evaluate algorithmic bias (attribute association bias) that signals the potential for reinforcing stereotypes and thus producing downstream representative harm.
This paper is organized as follows: §2 reviews related work, §3 defines attribute association bias within the context of latent factor recommendation, §4 introduces our evaluation methodologies and metrics, §5-6 present our industry case study and its results, and §7-8 discuss limitations, learnings, and future directions.
