A summary and review of: The Ethical Algorithm: The Science of Socially Aware Algorithm Design by Aaron Roth and Michael Kearns.
Algorithms have been around for a long time — some say the earliest algorithms were written around 4,000 years ago by the Babylonians, who used them for practical purposes such as computing the area of agricultural land and compound interest on loans. Ancient Indian mathematics was strongly algorithmic, and computations were performed, among other applications, to estimate the positions of planets, in the belief that the planets and their positions influenced the events of our lives.
Much later, algorithms began to replace human judgment. One of the earliest applications was in lending, where credit-scoring algorithms replaced human gut feel in assessing the creditworthiness of borrowers. Humans are not necessarily good at complex decision making; we are subject to a plethora of cognitive biases that distort our judgments.
Two major technological trends converged in the late years of the twentieth century: the rise of the internet and vast increases in computing power. The internet interconnected the world’s computers and enabled the sharing of data. Increased computing power and the availability of huge volumes of data spurred advances in the discipline of machine learning, such as the development of powerful methods to classify, identify, and predict human behavior. This success has inspired applications in criminal justice (e.g., predictions of recidivism risk), employment screening, student scoring for college admissions, policing, and many other areas.
Machine learning, like technologies such as nuclear power and social media, is a double-edged sword — with its great power to transform society comes the potential to cause much harm. Computer scientists, accustomed to working with technical criteria for success such as computation speed, prediction accuracy, and precision, paid little attention to the human impact of their models and algorithms.
Machine learning models can be complex and opaque, and often even their developers cannot anticipate how they will behave in specific situations. Journalists and social-justice-minded computer scientists have documented several cases of harm resulting from the indiscriminate application of machine learning models, raising ethical concerns about the use of algorithms in general. Regulatory bodies have gotten into the act as well; in the US, the Federal Trade Commission (FTC) recently issued tough guidance warning companies against the use of biased algorithms.
Michael Kearns and Aaron Roth, both professors in the department of computer and information science at the University of Pennsylvania, are prolific contributors to this research. Their book, “The Ethical Algorithm — The Science of Socially Aware Algorithm Design,” published in 2019, is written for the non-specialist. It is a very accessible introduction to the ethical problems of algorithmic decision making and to algorithmic approaches for managing privacy, fairness, the political polarization driven by recommendation and personalization algorithms, and the pitfalls of reusing the same data sets in adaptive analysis. The authors also touch on topics where the research is less developed, such as the interpretability of machine learning models. However, they are too sanguine about algorithms being able to resolve the ethical issues that arise in algorithmic decision making.
Consider privacy. Most of our social interactions and commercial transactions are digitized. Our online actions are monitored and recorded, and the data is analyzed, sold, and used to target us with ads and products. Data breaches occur with increasing regularity, exposing our identities, credit cards, and other personal data. Serious harm can result if this data gets into the wrong hands. Autocratic governments have used their power over companies to get hold of sensitive information, which they have used to harass their opponents. Scammers have used private data to defraud people.
Behavior patterns mined from large volumes of data can be used to infer personal details that we may not want revealed. Researchers at Facebook showed that it was possible to predict, with a high degree of accuracy, users’ partners, and which couples were most likely to break up, based solely on their actions in their Facebook feeds and their friend networks. Sensors on our smartphones can track our location. Logs of our app usage, website visits, media consumption, communications, and screen activity can be used to build detailed profiles of our habits and preferences.
Privacy is defined as the right of an individual to keep his or her personal information from being disclosed. The UN Declaration of Human Rights recognizes privacy as a fundamental human right.
On the flip side, aggregated personal data contains useful information with many beneficial applications. Precision medicine — treatments tailored to an individual’s genetics, environment, and lifestyle — relies on the collection of vast volumes of patient data. A research program created by the NIH in the US, called All of Us, aims to gather health data from a million or more volunteers to accelerate this kind of research.
After a few failed attempts, computer scientists have found ways to realize the value of data while protecting individuals’ privacy. It is important to distinguish between two closely related concepts: anonymity and privacy. Privacy, as noted above, is the right of an individual to keep his or her information from disclosure; an individual is anonymous when his or her information is disclosed without revealing his or her identity. An early approach to protecting individuals involved anonymizing data by removing identifying attributes such as name and address. However, removing personally identifying information turned out not to be sufficient to preserve anonymity.
Using the 1990 US census data, Latanya Sweeney showed that a large majority of Americans could be uniquely identified from just their ZIP code, birth date, and gender. In another famous case, Arvind Narayanan and Vitaly Shmatikov studied the anonymized movie-rating data that Netflix had released for its recommendation-algorithm competition. The two researchers de-anonymized some of the data by comparing it with non-anonymous IMDb (Internet Movie Database) users’ movie ratings. The resulting re-identifications could reveal information users might well consider sensitive, such as their apparent political leanings.
A more sophisticated type of anonymization called k-anonymity — ensuring that each individual’s record is indistinguishable from at least k−1 others — was proposed, but it too proved inadequate against attackers with background knowledge or access to auxiliary data sets.
In order to make progress toward protecting privacy, we need a formal criterion that can be guaranteed. A few decades ago, the statistician Tore Dalenius proposed one: access to a database should not enable anyone to learn anything about an individual that could not be learned without that access. This ideal turned out to be unachievable, but a relaxation of it led to the modern criterion of differential privacy.
Intuitively, a query protects the privacy of individuals in the data if its output does not reveal any information about any specific individual. Differential Privacy formalizes this intuition mathematically to provide a guarantee of privacy protection. We can prove that a specific algorithm “satisfies” differential privacy. Privacy is not just a property of the output, but rather a property of the computation that generated the output. Informally, differential privacy guarantees the following for each individual who contributes data for analysis: the output of a differentially private analysis will be roughly the same, whether or not you contribute your data.
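The book keeps the treatment informal, but the standard mathematical definition is compact enough to state here. A randomized algorithm M is ε-differentially private if, for every pair of data sets D and D′ that differ in one individual’s record, and for every set S of possible outputs:

```latex
\Pr[M(D) \in S] \;\le\; e^{\varepsilon} \cdot \Pr[M(D') \in S]
```

A small ε means the two output distributions are nearly indistinguishable, so an observer can learn almost nothing about whether any one person’s record was included.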
Differential privacy works by adding noise to the output of a query. The challenge lies in determining where to add the noise and how much to add. We can control the strength of the privacy guarantee by tuning a parameter known as the privacy loss or privacy budget. The lower the value of this parameter, the more indistinguishable the results, and the better the protection.
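To make this concrete, here is a minimal sketch (not from the book) of the classic Laplace mechanism applied to a counting query; the data and the epsilon value are made up for illustration:

```python
import numpy as np

def laplace_count(data, predicate, epsilon):
    """Differentially private count of the records satisfying `predicate`.

    A counting query has sensitivity 1: adding or removing one person's
    record changes the true count by at most 1. Laplace noise with scale
    1/epsilon then gives epsilon-differential privacy.
    """
    true_count = sum(1 for row in data if predicate(row))
    noise = np.random.laplace(loc=0.0, scale=1.0 / epsilon)
    return true_count + noise

# Smaller epsilon -> more noise -> stronger privacy, lower accuracy.
ages = [34, 29, 51, 42, 67, 23, 45]
print(laplace_count(ages, lambda a: a > 40, epsilon=0.5))
```

Halving epsilon doubles the typical noise, which is the trade-off the privacy budget controls.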
Differential privacy has gained widespread adoption by governments, firms, and researchers. It is already being used for “disclosure avoidance” by the US Census Bureau. The noise that protects privacy, however, comes at a cost in the accuracy of the published statistics.
This is a big concern for census data. While statistics for large populations — for example, for entire states or major metropolitan areas — can be adequately protected with negligible amounts of noise, many important uses of census data require calculations on smaller populations such as the native tribes of Alaska, where the impacts of noise can be much more significant. One of the complexities the Census Bureau faced was the need to carefully enumerate the myriad ways that census data are used and identify which of those uses are more critical than others. Nevertheless, the results were better than those obtained from previously used methods.
The literature on algorithms for privacy can be abstruse, but Kearns and Roth provide an excellent discussion of the history of these algorithms and a lucid explanation of the key underlying ideas. They use many real-life examples to illustrate these ideas. They also point out the limits of these algorithms. Differential privacy is designed to protect secrets in individual data records, but it does not protect secrets embedded in the records of many people. Even with differential privacy, someone can use data from Facebook about users’ likes to discover patterns that can be used to infer users’ gender, political affiliation, sexual orientation, and many other attributes.
Kearns and Roth discuss the idea of fairness in a long chapter. Soon after machine learning was used in applications such as criminal justice, employment screening, and health risk assessment, reports of gender, race, and other types of bias began to surface. A widely discussed example is COMPAS, a recidivism-prediction tool used to inform sentencing and parole decisions, which a ProPublica investigation reported produced false positives for Black defendants at nearly twice the rate it did for white defendants.
The US Department of Justice uses the concept of disparate impact: a practice can be discriminatory if it disproportionately disadvantages a protected group, regardless of intent. Notions of group fairness in machine learning follow the same spirit, requiring that a model’s decision rates or error rates be similar across groups.
In contrast to group fairness, individual fairness requires that similar individuals be treated similarly by the model. However, it is not clear how to define similarity. Further, Roth and Kearns point out that all models have errors in their predictions, so with a naive application of individual fairness “its applicability will be greatly constrained and its costs to accuracy are likely to be unpalatable; we’re simply asking for too much.”
A major source of bias in algorithms is the data used to train models. All kinds of hidden (and not-so-hidden) biases can be embedded in the data, and complex models trained on such data can amplify those biases and introduce new ones. When such models become the basis for widely deployed services such as search engines, targeted advertising, and hiring tools, the bias is propagated and amplified further. Bias amplification occurs in systems that rank content, such as the recommender systems behind content and ad personalization, which present or give priority to some items over others. Users’ responses to the items presented (which generate the labels for training examples) are collected, while responses to items not presented remain unknown. User responses are also influenced by the position of items on the page and by details of presentation such as font and media (for instance, does the item contain images?).
Another way in which data introduces bias is when the different groups do not have uniform representation in the data. Some groups may have more data than others. Since model training is based on minimizing error, the model will perform poorly (relatively speaking) on groups that have smaller representation in the data.
Ironically the quest for privacy using differential privacy can exacerbate bias. As we discussed before, providing a high level of privacy for individuals belonging to groups with small amounts of data requires injecting a high degree of noise. This makes the data much less accurate relative to groups with larger volumes of data. Decisions made on the basis of such data can result in serious inequities to certain groups. David Pujol and his colleagues simulated funds allocation to congressional districts using the differentially private 2020 US census data and showed that smaller districts could receive comparatively more funding than what they would receive without differential privacy, and larger districts would get less funding.
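A quick simulation (with made-up population sizes and budget) shows why the same amount of noise hurts small groups more: the absolute noise is identical for both, so the relative error is far larger for the small district:

```python
import numpy as np

rng = np.random.default_rng(0)
epsilon = 0.1                          # same privacy budget for both districts
results = {}
for true_count in (100, 100_000):      # small district vs large district
    # Laplace noise with scale 1/epsilon, averaged over many simulated draws
    noisy = true_count + rng.laplace(0, 1 / epsilon, size=10_000)
    results[true_count] = np.mean(np.abs(noisy - true_count)) / true_count
    print(f"population {true_count:>6}: mean relative error {results[true_count]:.4%}")
```

The expected noise magnitude is 1/ε = 10 people in both cases — roughly 10% of the small district’s count but a negligible fraction of the large one’s.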
While data is a major source of bias, it is not the only one. The algorithm designer can introduce bias too, perhaps unintentionally. In machine learning models, the loss functions used in training models and the metrics used to evaluate the models can determine the model performance relative to different groups. Sara Hooker has argued that certain choices for learning rate and length of training — hyper-parameters that are set by the model developer — can adversely affect groups that are underrepresented in the data.
How do we achieve fairness in algorithms? Biases in data can be reduced to a degree by de-biasing techniques, and fairness criteria can be imposed as constraints in the model training process. Some other solutions that seem obvious turn out to be problematic. It might seem that we could eliminate bias by avoiding the use of protected attributes such as race, age, and gender in models. However, excluding these attributes may yield less accurate predictions; other attributes may be strongly correlated with the protected ones; and, surprisingly, removing the offending attributes may sometimes even exacerbate the racial bias of the model.
Therefore we should try to define the fairness of a model in terms of its outputs rather than its inputs. However, this is not straightforward, as there is no gold standard for fairness; there are multiple possible criteria. Jon Kleinberg and his colleagues showed that some of these criteria are incompatible with one another — they can be simultaneously satisfied only in certain highly constrained cases. An outcome that is fair by one criterion may seem unfair by another. The choice of criterion is a fraught one, and it needs to involve the model designer and the key stakeholders in the decision.
As in the case of privacy, there is a trade-off between fairness and classification error (for example, someone judged credit-worthy who goes on to default). These trade-offs can be represented as a curve on a two-dimensional chart, the Pareto frontier: the set of all choices for which fairness cannot be improved without increasing error, and vice versa. But it is up to humans to decide which point on this frontier to pick.
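A toy calculation (with entirely made-up loan decisions) illustrates the incompatibility: two groups can have identical approval rates, satisfying one fairness criterion (statistical parity), while their false positive rates diverge, violating another (an equalized-odds-style criterion):

```python
def rates(outcomes):
    """outcomes: list of (approved, actually_repaid) pairs for one group."""
    n = len(outcomes)
    approval_rate = sum(a for a, _ in outcomes) / n
    # False positive rate: fraction approved among those who did NOT repay.
    negatives = [(a, r) for a, r in outcomes if not r]
    fpr = sum(a for a, _ in negatives) / len(negatives)
    return approval_rate, fpr

# Hypothetical decisions for two groups of six applicants each.
group_a = [(1, 1), (1, 1), (1, 0), (0, 0), (0, 0), (1, 1)]
group_b = [(1, 1), (0, 1), (1, 0), (1, 0), (0, 0), (1, 0)]

for name, group in [("A", group_a), ("B", group_b)]:
    approval, fpr = rates(group)
    print(f"group {name}: approval rate={approval:.2f}, false positive rate={fpr:.2f}")
```

Both groups are approved at the same 67% rate, yet group B’s non-repayers are approved far more often (75% vs 33%) — fair by one measure, unfair by the other.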
Kearns’ and Roth’s book focuses on situations where a decision-maker makes decisions that may be unfair to some of the affected groups. These decisions, made with the support of an algorithm, are based on predictions of the future performance of individuals or groups. But who decides what is fair? What is fair to one group may seem unfair to another. Kearns and Roth opine that “questions of fairness should be informed by many factors that cannot be made quantitative, including what the societal goals for protecting a particular group is and what is at stake”. These factors, including the choice of a fairness criterion, have to be chosen by humans. Algorithms can implement a criterion, but are not ethical in themselves, and some ethical questions have no obvious right or wrong answers.
There are other types of situations, not discussed by Kearns and Roth, where algorithms are used to find a solution. One is the problem of dividing a set of resources among several people (or organizations) who have an entitlement to them, such that each receives their due share. Here, the criteria for fairness are clear and good algorithms for finding fair solutions are available. This problem is known by the rubric “fair division”.
A different ethical issue arises when the outcomes of our actions are influenced by the actions of others. One example is driving. We share the road system — a network of freeways, highways, and streets — with other drivers. Drivers typically want to take the shortest route (usually in terms of time) to their destination. However, the time taken depends on the decisions made by all the other drivers using the same roads. Before technology such as Waze and Google Maps was available, determining such routes was very difficult. Waze and Google Maps compute the shortest possible routes for us, but these routes are computed for each driver in isolation. Suggesting the same route to every driver adds to congestion, and the suggested routes may no longer be the shortest. From an ethics angle, this “selfish” routing decreases the utility of drivers (assuming time to destination is what they want to minimize). A better solution, one that decreases the driving time of some drivers without making anyone worse off, is possible. Kearns and Roth outline one approach an app could take. Instead of always suggesting the selfish, best-response route to each user in isolation, the app gathers the planned origin and destination of every user in the system and computes, for each user, the route that minimizes the average driving time across all drivers (a maximum social welfare solution). To achieve this, the app may need to recommend a slow route to some drivers and a fast route to others. The authors point out that this can be done without anyone being worse off than in the competitive solution. But would such an app work? Drivers who are suggested a longer route can ignore the recommendation and choose a different (shorter) route, say by using a different app such as Google Maps, making the situation worse for everyone.
Differential privacy once again comes to the rescue: by ensuring that a single driver’s data has little influence, manipulations such as lying about where one wants to go confer no benefit. This may be the best solution if every driver values her time equally (an assumption Kearns and Roth implicitly make). However, to someone rushing to the hospital, or driving to the airport to catch a flight, a shorter driving time means a lot more than it does to, say, someone heading to a vacation spot. We need a price mechanism that allows drivers to signal the value of their time. Congestion pricing, where the price charged is the marginal social cost of a trip in terms of its impact on others, is a better solution: drivers in a hurry are likely to pay to take the less congested routes, while others may be content taking longer ones. This solution may also be less complex to compute than the one Kearns and Roth suggest.
Similar socially bad outcomes can occur in other situations. Many of us get our news from content recommendation platforms such as Facebook and YouTube. These platforms apply machine learning techniques to collective data to build individual profiles of user interests, and use these models to choose what appears in our feeds. Our collective data is used to estimate a small number of user types, and each of us is assigned to one of them.
Our “type” determines the news and articles we see in our streams, narrowing the diversity of content and isolating each of us in our own echo chamber. The more we adopt the suggestions (click on the articles in the stream), the more we accentuate the echo chamber. Arguably this has contributed to the rampant polarization we see in the US today, as we have become less informed and thereby less tolerant of opposing perspectives. This is the result of algorithmic, model-based systems trying to simultaneously optimize everyone’s choices. However, it is possible to use these same algorithms to inject diversity, say by recommending articles aligned with types different from our own (it is possible to compute how different two types are — the distance between them).
A workable solution might be to implement a “knob” (that could be adjusted by readers) to adjust how “personalized” the recommendations should be.
While most of the algorithms we have discussed force a trade-off between two or more conflicting objectives (classification error and fairness, or privacy and social benefit), the family of algorithms that solve the so-called stable matching problem — such as the deferred-acceptance algorithm used to match medical students to residency programs — can produce solutions that are good for all participants.
Statistics teaches us proper methods for data analysis and model building: the models to be fit, and the hypotheses to be tested are fixed independently of the data, and preliminary analysis of the data does not feed back into the data gathering procedure. However, the practice of data analysis in machine learning model development is highly adaptive — model selection is performed iteratively on a dataset by tuning hyper-parameters, and exploratory data analysis is conducted to suggest hypotheses, which are then validated on the same data sets used to discover them. This results in over-fitting to the data — and the model lacking in generality. In economist Ronald Coase’s memorable words: “If you torture the data for long enough, it will confess to anything.” This practice is often referred to as p-hacking, and blamed in part for the surprising prevalence of non-reproducible science in some empirical fields.
Many highly publicized findings, such as the “power pose” idea of Amy Cuddy, whose 2012 TED talk has garnered over 60 million views on the TED website, could not be reproduced. And in a large-scale replication effort, researchers managed to reproduce only a minority of the published psychology findings they examined.
Adaptive analysis cannot be avoided when the available data sets are relatively small, so a remedy for over-fitting is needed. Once again, differential privacy comes to the rescue: it turns out that adding noise to the training data or to the output of the model makes over-fitting less likely. Blum and Hardt proposed another idea that further improves the generality of the model: rather than reporting the performance of every iteration of the model, report a new score only when it beats the previous best by a margin.
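The Blum and Hardt idea, as described above, can be sketched in a few lines; the scores and the margin here are made-up placeholders:

```python
def ladder_leaderboard(model_scores, margin=0.01):
    """Report a new holdout score only when it beats the best so far by `margin`.

    `model_scores` is a sequence of holdout accuracies from successive model
    submissions. Suppressing insignificant improvements limits how much the
    analyst can adaptively over-fit to the holdout set.
    """
    best = float("-inf")
    reported = []
    for score in model_scores:
        if score > best + margin:
            best = score          # meaningful improvement: update the leaderboard
        reported.append(best)     # otherwise keep reporting the old best
    return reported

scores = [0.70, 0.705, 0.72, 0.721, 0.74]
print(ladder_leaderboard(scores))  # small wiggles are never echoed back
```

The analyst sees [0.70, 0.70, 0.72, 0.72, 0.74]: the tiny gains at 0.705 and 0.721, which are most likely noise, are withheld.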
Kearns and Roth admit that in the coverage of topics they have chosen to “look where the light is”. Fairness and privacy have been in the spotlight the most in the news media. They are also the most developed in terms of theory and methodologies. Matching algorithms are fairly mature and they deserve to be better known, given their clear role in saving lives and finding preferred matches for thousands of medical students.
In the final chapter they discuss topics that are less developed but nevertheless important. One is the black-box nature of models, especially neural network models, which makes them difficult to interpret. This lack of interpretability has caused problems: hard-to-detect bias has crept in, resulting in a loss of trust. Determining whether a black-box model is fair with respect to gender or racial groups is much more difficult than determining whether an interpretable model has such a bias. Some types of models, such as regression models (linear and logistic) and decision trees, are more interpretable than others, and the easiest way to achieve interpretability is to use only these types of models.
The model designer and the users of the model need to decide on an interpretability criterion appropriate to the context of use. Examples include monotonicity with respect to predictor variables, model sparsity (a small number of predictors), and decomposability into sub-models. Creating interpretable models can sometimes be much more difficult than creating black-box models for many reasons: training can be much more computationally intensive, better data may be needed, and the right criterion for interpretability may not be obvious. Interpretability should be a goal only for models used in high-stakes decisions, such as medical diagnosis and criminal sentencing.
Are there limits to how far we can rely on algorithms to resolve the ethical problems introduced by algorithmic decision-making? Clearly there are situations such as warfare where life and death decisions need to be made. Since algorithms do not have moral agency and cannot be accountable, decisions in these situations should only be taken by humans. Kearns and Roth believe that in most other situations the solution to ethical problems in algorithmic decision-making should be algorithmic. However, the trade-offs between competing objectives — such as between individual privacy and social utility of data — should be set by humans.
For the specific problems that Kearns and Roth discuss, algorithmic solutions have proven to be brittle. Differential privacy works well to protect the privacy of large groups of individuals, but not of small groups or unique individuals. Further, differential privacy mechanisms have been designed for specific sets of queries and are not general-purpose. Mechanisms for fairness work at the group level but do not guarantee fairness for individuals. Individual fairness has proven difficult to tackle: while a general principle can be stated — similar individuals should be treated similarly — it is difficult to execute algorithmically, since similarity is defined by context. Wittgenstein, in his Philosophical Investigations, discusses the difficulty of understanding rules as abstract statements.
Instead, he asserted that rules are best explained by examples. People learn how to apply a rule by watching others (instructors, teachers) execute it in various situations. Occasionally they may make mistakes, which they can then go on to correct. Similarly, algorithmic decision-making need not be autonomous; better results can be obtained with a human (or humans) in the loop. In contexts such as college admission decisions, fairness questions are best resolved interactively, through a process of dialectic involving other people, including those affected (when practical to do so) — a process that is essentially non-algorithmic. One of the key issues that needs resolution through this process is the set of fairness criteria relevant to the situation. In situations where including humans in the loop would be inefficient, as in the analysis of large data sets or the finding of an optimal route to a destination, algorithmic resolution of ethical issues may be necessary.
Can an algorithm be a potential threat to the human race? Nick Bostrom, Elon Musk, and Stephen Hawking, among other prominent people, consider AI our most serious existential risk: they fear that a super-intelligent AI could spell the end of the human race. This is captured in a thought experiment due to Bostrom, a philosopher at the University of Oxford, known as the “paperclip maximizer”: a super-intelligent AI given the innocuous goal of manufacturing paperclips could end up consuming all of the Earth’s resources, humanity included, in single-minded pursuit of that goal.
Computer scientist Stuart Russell believes that one way around this existential threat is to teach a super-intelligent AI to satisfy human preferences. Clearly we cannot just teach the AI to always obey the rule “do not harm humans”, since it would be impossible to enumerate all the possible ways in which humans may be harmed. Instead, Russell and his colleagues advocate an approach whereby machines learn human preferences by watching human behavior. Much like learning through examples, self-driving cars learn by watching how humans respond to various situations on the road. However, there is a limit to how much can be learned by observation, since novel situations can arise that have not been seen before. Humans, on the other hand, even from infancy, are able to generalize well and respond effectively to novel situations. Until we are able to replicate this human ability, we cannot rely on algorithms alone to resolve consequential ethical issues. Humans need to be in the loop. Models will need to be transparent — their output interpretable — and will need to accept new input from humans and modify their recommendations. Humans make mistakes too, but through a dialectical process with other participants they are able to recognize errors and correct their decisions. Someday algorithms might too.