Why Algorithmic Fairness is Elusive

Written by ambercazzell | Published 2019/12/06
Tech Story Tags: algorithmic-fairness | machine-learning | artificial-intelligence | ethics | tech-ethics | statistics | algorithms | hackernoon-top-story

TLDR In 2016, Google Photos classified a picture of two African-Americans as “gorillas.” Two years later, Google had yet to do more than remove the word “gorillas” from its database of classifications. In Florida, algorithms used to recommend detention and parole decisions on the basis of risk of recidivism were shown to have a higher error rate among African-Americans. Protected categories often make a meaningful impact on the behaviors that algorithms are designed to predict. The trouble is that it’s not at all clear how to make algorithms fair.

In 2016, Google Photos classified a picture of two African-Americans as “gorillas.” Two years later, Google had yet to do more than remove the word “gorillas” from its database of classifications. Also in 2016, Amazon was shown to be disproportionately offering same-day delivery to European-American consumers. In Florida, algorithms used to recommend detention and parole decisions on the basis of risk of recidivism were shown to have a higher error rate among African-Americans, such that African-Americans who would not go on to re-offend were more likely to be incorrectly recommended for detention. And when translating out of a language with gender-neutral pronouns into a language with gendered pronouns, Google’s neural translation models (built on word embeddings akin to word2vec) inject gender stereotypes into translations, such that pronouns become “he” in conjunction with “doctor” (or “boss,” “financier,” etc.) but become “she” in conjunction with “nurse” (or “homemaker,” or “nanny,” etc.).
These issues arise from a constellation of causes. Some have underlying social roots: if you train a machine learning algorithm on data created by biased humans, you’ll get a biased algorithm. Some are simply statistical artifacts: if you train a machine learning algorithm to find the best fit for the overall population, then to the extent that minorities differ in a relevant way, their classifications or recommendations will necessarily have poorer fit. And some are a combination of the two: biased humans lead to biased algorithms that make recommendations which reinforce unjustified stereotypes (for instance, harsher policing of poorer neighborhoods leads to more crime reports in those neighborhoods; more crime reports trigger policing analytics to recommend deploying more cops to those neighborhoods, and voila! You have a nasty feedback loop). The trouble is that it’s not at all clear how to make algorithms fair. And in this regard, conversations about algorithmic fairness have been a magnifying mirror on society’s ethics. Debates over how to define and measure fairness reflect broader ethical conversations taking place today.
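To see how that feedback loop plays out mechanically, here is a toy simulation. Everything in it is hypothetical: two neighborhoods with identical underlying crime rates, recorded crime that scales with patrol presence, and an analytics rule that shifts patrols toward wherever more crime was recorded last year.

```python
# Toy simulation of the policing feedback loop described above.
# Both neighborhoods share the SAME underlying crime rate; only the initial
# patrol allocation differs. All numbers here are hypothetical.
true_crime_rate = 0.05
patrols = {"A": 55, "B": 45}            # neighborhood A starts slightly over-policed

for year in range(1, 6):
    # Recorded crime scales with how many officers are present to record it.
    reports = {n: round(p * true_crime_rate, 2) for n, p in patrols.items()}
    # The "analytics" shift patrols toward wherever more crime was recorded.
    hot, cold = sorted(reports, key=reports.get, reverse=True)
    patrols[hot] += 5
    patrols[cold] -= 5
    print(f"year {year}: reports={reports}, patrols={patrols}")
```

After a few iterations, the initially over-policed neighborhood has pulled far ahead in patrols, even though nothing about the underlying crime differs between the two.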
I’ve recently had the pleasure of interviewing Sharad Goel, the executive director of the Computational Policy Lab at Stanford. We got to talk about some of his applied work in algorithmic fairness. In particular, we got to discuss the perks and shortcomings of three sides of the debate over how to conceptualize fairness algorithmically. Technical folks can find a fuller treatment of the debate in this paper, but I’m going to try and boil it down here.

Three conceptualizations of fairness

Certain group labels should be off limits. This mode of thought maintains that algorithms should not be allowed to take certain protected categories into account when making predictions. For instance, in this view, algorithms used to predict loan qualifications or recidivism should not be allowed to base predictions on race or gender. This approach to achieving fairness is straightforward and easy to understand. But there are two main problems:
1. Distinguishing between acceptable and unacceptable proxies of protected categories. Even when such categories are eliminated from an algorithm, the statistical variance explained by these protected categories tends to slip into other available variables. For instance, while race might be excluded from loan applications, zip code, which tends to be highly correlated with race, can take on a higher predictive weight in the model and mask discrimination. For all intents and purposes, zip code becomes the new race variable (the sketch after this list illustrates the effect on synthetic data). It’s challenging and debatable which proxies are illegitimate substitutes for protected categories, and which are acceptable, distinct variables. This fuzzy line brings us to the other problem with making certain labels “off-limits.”
2. The societal (and sometimes personal) costs are high. Protected categories often make a meaningful impact on the behaviors that the algorithms are designed to predict. For instance, it is commonly known that insurance premiums are higher for male drivers, because male drivers really do account for more of the total insurance payouts. Eliminating gender from these algorithms would cause car insurance premiums to decrease for men, but it would increase the rates for women. Whether women should be required to pay for more than their share of risk so that gender can be eliminated from risk algorithms is debatable. In short, while this may create exact equality, it seems to miss the mark of what is proportionally equitable. Some would argue this approach is actually unfair.
Higher stakes can be found in criminal justice settings. Removing protected categories like gender or race from algorithms designed to predict recidivism degrades the efficiency of the algorithm, meaning that more people of lower real risk are detained, and more people of higher real risk are set free. The consequence is that more crime takes place in general, and in communities that already experience higher crime in particular. To see this, keep in mind that the majority of violent crime occurs between people who already know each other. And so, communities already plagued by violent crime stand to bear the additional re-offenses when algorithmic efficiency is slashed (that is, when protected, but nonetheless explanatory, categories are disallowed).
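Here is the promised sketch of the proxy problem (problem 1 above). It uses entirely synthetic data and hypothetical variable names, and assumes NumPy and scikit-learn are available; it is meant only to show how a correlated stand-in can absorb a protected category’s predictive weight, not to model any real lending system.

```python
# Minimal, synthetic illustration of the proxy-variable problem.
# All variable names and coefficients are hypothetical.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 50_000

race = rng.integers(0, 2, n)                       # protected attribute (0/1)
zip_code = np.where(rng.random(n) < 0.9, race,     # proxy: matches race 90% of the time
                    1 - race)
income = rng.normal(50 + 10 * race, 15, n)         # a "legitimate" predictor

# Outcomes generated with a direct dependence on race (i.e., a biased world).
logit = -2.0 + 1.2 * race + 0.02 * income
default = rng.random(n) < 1 / (1 + np.exp(-logit))

X_full = np.column_stack([race, zip_code, income])
X_blind = np.column_stack([zip_code, income])      # the "race-blind" model

full = LogisticRegression(max_iter=1000).fit(X_full, default)
blind = LogisticRegression(max_iter=1000).fit(X_blind, default)

print("race and zip coefficients, race included:", full.coef_[0][:2].round(2))
print("zip coefficient once race is dropped:    ", blind.coef_[0][0].round(2))
print("correlation between the two models' predictions:",
      np.corrcoef(full.predict_proba(X_full)[:, 1],
                  blind.predict_proba(X_blind)[:, 1])[0, 1].round(3))
```

When race is dropped, the zip-code coefficient swells to soak up most of the signal race used to carry, and the two models’ predictions remain almost perfectly correlated: the “race-blind” model is blind in name only.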

Most people (and the law) agree that basing decisions on protected categories when there is no tangible justification is morally reprehensible. The tough part is when using these protected categories appears to efficiently cut down harmful outcomes. This trade-off has led some to take alternative approaches to defining fairness algorithmically. Is there a way to maximize predictive accuracy (allowing inclusion of meaningful protected categories) while still being fair?
Algorithms should perform equally well across certain groups. As opposed to ignoring protected categories like race and gender (i.e., being color- or gender-blind), this approach to fairness instead argues that indicators of an algorithm’s performance should be equivalent across the protected categories. For example, an algorithm which classifies criminals as either high or low risk of re-offending should make prediction errors at equal rates for white and black criminals. This approach is less intuitive than the color-blind approach, but it at least theoretically allows algorithms to be more efficient in their predictions, and it has the added perk of avoiding tricky judgment calls about which proxies (e.g., zip code as a crude substitute for race) are and aren’t acceptable for inclusion in algorithms.
Still, this approach is imperfect. To see why, it’s important to understand that different groups of people will represent distinct populations: populations with different average scores, deviations, skews, kurtosis, etc. (picture two overlapping risk-score curves, one per group, and imagine trying to get one algorithm to perform equally for each curve using the same cutoff threshold). Generally, when we speak about fairness, we want all people, regardless of their group membership, to be held to the same standards. But if the same cutoff threshold is used for different populations, predictive ability and error rates are more than likely to differ across groups; this is simply the natural result of how statistics works. If government regulation compels corporations to turn out algorithms that maintain the same performance across protected groups, corporations and institutions are incentivized to discriminate under the obscuring power of statistical wizardry and employee NDAs.
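A minimal simulation makes the point. The data below are synthetic and the parameters hypothetical; the scores are generated to be equally well calibrated for both groups, and the only difference is where each group’s score distribution sits relative to a single shared cutoff.

```python
# One cutoff, two groups with different score distributions.
# Synthetic data; all parameters are hypothetical.
import numpy as np

rng = np.random.default_rng(1)
n = 100_000
cutoff = 0.50                                  # the same standard for everyone

# Group B's risk scores are centered higher than group A's.
scores = {"A": rng.normal(0.40, 0.15, n), "B": rng.normal(0.55, 0.15, n)}

for group, s in scores.items():
    p = np.clip(s, 0, 1)                       # scores are calibrated: P(reoffend) = score
    reoffend = rng.random(n) < p
    flagged = s >= cutoff
    false_positive_rate = np.mean(flagged & ~reoffend) / np.mean(~reoffend)
    print(f"group {group}: {flagged.mean():.0%} flagged, "
          f"false positive rate {false_positive_rate:.0%}")
```

Even though the scores mean the same thing in both groups, the group whose distribution sits higher gets flagged more often and racks up a higher false positive rate, so a regulator demanding equal error rates forces one of the two options described next.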
Corporations facing such a requirement generally have two options:
1. Lower the quality and efficiency of their algorithms by toying with the code until algorithmic performance is equal across groups (this option introduces the potential for harm discussed previously, such as releasing criminals with genuinely high risk scores), or
2. Adopt different algorithmic thresholds for different populations, such that cutoffs differ across groups (genders, races, people of different sexual orientations, etc.). But this clearly breaks with notions of fairness, and is usually morally frowned upon and considered illegal (a notable exception being something like affirmative action).
The negative impacts of forced equalization of algorithmic performance across groups aren’t just theoretical: they have been documented, for instance, in recidivism risk-score databases as well as in databases predicting the likelihood of police finding contraband among white and black citizens.
Algorithmic scores should represent the same things across members of different groups. A third approach to achieving fairness in algorithms is to ensure that an algorithm’s scores mean equivalent things across protected categories (for instance, a woman receiving a risk score of X on her insurance application should have similar insurance payouts as a man who also receives a risk score of X on his). On the surface, it would seem that this approach is getting at what we want: it seems fair. The problem is that it cannot guarantee fairness in the presence of intentionally discriminatory action, and so regulation of algorithms on the basis of this definition of fairness will still leave room for obscured discriminatory treatment. There are at least two ways this can happen:
1. Proxies (like zip code for race) can still be used to gerrymander population scores above or below an algorithm’s cutoff thresholds. For example, individuals at a higher risk of defaulting on a loan can be paired with individuals at a lower risk, such that a protected category’s risk scores can be pushed above or below a cutoff threshold at will. This essentially boils down to algorithmic redlining.
2. As discussed above, different groups will have different statistical risk curves. If quantitative scores are discretized within groups (for instance, substituting “high,” “medium,” or “low” labels in place of an individual’s exact score), those differences in the real risk curves can mask different group cutoffs while maintaining the veneer that individuals labeled “high” risk re-offend, default, and get into car crashes at similar rates across protected (race, gender, etc.) categories. For example, assigning a person a “high,” “medium,” or “low” risk label on the basis of their within-group percentile will effectively yield different cutoff thresholds for each group, while potentially maintaining the same algorithmic performance across those labeled “high” risk in every protected group, as sketched below.
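Here is that discretization trick in miniature. The scores are synthetic and the “top 20% of each group counts as high risk” rule is hypothetical; the point is only that the same label can sit on top of very different score cutoffs.

```python
# Within-group discretization hiding different effective cutoffs.
# Synthetic scores; the 80th-percentile rule is hypothetical.
import numpy as np

rng = np.random.default_rng(2)
scores = {"group_1": rng.normal(0.40, 0.15, 50_000),
          "group_2": rng.normal(0.55, 0.15, 50_000)}

for group, s in scores.items():
    # Label the top 20% of EACH group "high" risk, instead of using one
    # shared score cutoff for everyone.
    group_cutoff = np.quantile(s, 0.80)
    high = s >= group_cutoff
    print(f"{group}: 'high' begins at score {group_cutoff:.2f} "
          f"({high.mean():.0%} of the group labeled high)")
```

Both groups end up with the same share of “high” labels, but the score a person needs in order to earn that label differs by group, which is exactly the kind of hidden cutoff difference described above.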
While using these techniques would presumably be somewhat rare among B2C corporations, which would more often than not suffer a loss of profits by discriminating in these ways, incentives still exist for B2B corporations. For instance, ad-matching companies have incentives to push certain groups above or below cutoff thresholds in order to justify ad targeting on the basis of protected categories. It’s not difficult to imagine political campaigns or lobbyists being attracted to the power of these methods to sway public opinion among strategic subgroups while leaving behind few breadcrumbs, and convoluted breadcrumbs at that. (I'm just saying, if US senators couldn't understand Facebook's business model, my faith in their understanding of this issue is... well, it's not good.)

The challenge

Each approach to algorithmically defining fairness has its strengths and weaknesses. I think what’s most troubling is not so much the weaknesses that each approach faces, but that these approaches are fundamentally incompatible with one another. We cannot ignore protected categories while also using protected categories as the baseline to detect fairness. And we can’t demand similar algorithmic error rates while demanding that similar risk scores actually do entail similar outcomes across groups. The race is still on to define fairness algorithmically. But my background in moral psychology also gives me pause. Democrats, Republicans, and Libertarians can't agree on what's fair, and I think it's too optimistic to treat algorithmic fairness like a mathematical, computer-science problem. The trouble isn’t solving some complicated statistical Rubik’s Cube so much as it is trying to manifest Plato’s perfect form of fairness on a cave wall that’s only capable of capturing shadows. It’s hard to predict which solutions we’ll embrace, and what the costs will be when those solutions interact with regulatory and economic incentives. Algorithmic fairness is, at its heart, a socio-moral problem.
I'm building out the egalitarian infrastructure of the dWeb with ERA. If you enjoyed this article, I'd love to connect with you on Twitter!

Written by ambercazzell | Social psychologist interested in ethical technology :)
Published by HackerNoon on 2019/12/06