Large language models (LLMs) have generated huge hype in the AI field. GPT-3, ChatGPT, Bing, Google’s Bard — you name it. Twitter is buzzing about how racist and sexist they can be, while AI engineers say they’re supplemental tools that need human curation.
Bias has always been prevalent in AI. The human brain tries to simplify information consumption because it struggles to compare and evaluate everything fairly. The resulting errors are called cognitive biases, and they affect how technology is used. For example, information retrieval systems like search engines have always suffered from position bias: people perceive items at the top of a list as the most relevant.
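To make position bias concrete, here is a minimal sketch of one common countermeasure in learning-to-rank: weighting clicks by how likely a user was to even look at that position (inverse propensity weighting). The 1/rank examination model and the toy click log are assumptions for illustration, not data or code from any real search engine.

```python
import numpy as np

# Toy click log: each entry is one impression of a search result.
# "ranks" is the position the result was shown at, "clicks" marks clicks.
ranks = np.array([1, 2, 3, 1, 2, 3, 1, 2, 3])
clicks = np.array([1, 0, 0, 1, 1, 0, 0, 0, 1])

# Assumed position-bias model: the chance that a user even examines a result
# decays with its rank (1/rank is a deliberately simple choice).
examination_propensity = 1.0 / ranks

# Inverse propensity weighting: clicks at lower positions count for more,
# compensating for the fact that top results are seen far more often.
ipw_weights = clicks / examination_propensity

for r in (1, 2, 3):
    mask = ranks == r
    print(f"rank {r}: naive CTR = {clicks[mask].mean():.2f}, "
          f"debiased click weight = {ipw_weights[mask].sum():.1f}")
```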
Not all biases in AI systems are harmful. One digital nudging mechanism is categorization, an example being the “healthy snacks” category that grocery delivery apps usually have. By leveraging the diversification bias, these apps pull users out of the filter bubble to explore new and healthier options.
But some biases are obviously dangerous. That’s why fairness and bias in AI is a hot topic, supercharged by the recent boom of LLMs. Many biases hide in the data used to train ML models. Quite a few language models boil down to tools that generate the words with the highest probability in the current context. The large corpora of “words in contexts” scraped from the web, which is inherently biased, simultaneously empower LLMs and cause problems for them.
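To see what “the word with the highest probability in the current context” means in practice, here is a minimal sketch using GPT-2 via the Hugging Face transformers library as a small, openly available stand-in for larger LLMs. The prompt and the five-token cutoff are arbitrary choices for illustration; any skew in the top continuations comes straight from the model’s web-scale training text.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# GPT-2 stands in for larger LLMs; it is small and freely downloadable.
tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

prompt = "The nurse said that"
inputs = tokenizer(prompt, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits  # shape: (1, sequence_length, vocab_size)

# Distribution over the next token given the prompt so far.
next_token_probs = torch.softmax(logits[0, -1], dim=-1)
top = torch.topk(next_token_probs, k=5)

for prob, token_id in zip(top.values, top.indices):
    print(f"{tokenizer.decode(token_id.item())!r:>8}  p = {prob:.3f}")
```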
Biases are identified, classified, and mitigated. The problem is too big to ignore. There are even guidelines. For example, Google’s People + AI Guidebook has a well-formulated, concise chapter called Data Collection + Evaluation with recommendations for minimizing bias in data. But there are still many unanswered questions and unsolved problems.
As a data advocate, I’m lucky enough to have access to experts at industrial conferences and meetups. At the Data-Driven AI Berlin meetup on biases in AI, I asked software engineering and data science leads on a discussion panel the questions that bug me the most.
These people in tech are creating AI startups and designing ML pipelines. That puts them in direct contact with biases in AI systems, so their answers give us an approximate understanding of the actual situation in the industry.
A majority of biases come from historical data, old beliefs, and outdated behavioral patterns. Obviously, historical data should not be ignored or erased; ML models learn from our history the same way we do. Should we explicitly fix historical biases in AI or wait for history to become history?
First, we should decide what to optimize for in each use case by clearly defining what we mean by equality: equal opportunity, equalized odds, or equal outcomes. The more we aim to change the status quo, the more stringent we need to be with the definitions we use. At a minimum, we should measure the problems caused by historical biases and be fully transparent in sharing this information.
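As a rough idea of what “clearly defining” those criteria can look like, here is a sketch that computes three common fairness gaps (demographic parity, equal opportunity, equalized odds) for a hypothetical binary classifier. The labels, predictions, and group assignments are made up; in a real project they would come from your evaluation set.

```python
import numpy as np

# Hypothetical evaluation data for a binary classifier.
# y_true: ground-truth labels, y_pred: model decisions, group: protected attribute.
y_true = np.array([1, 0, 1, 1, 0, 1, 0, 0, 1, 0])
y_pred = np.array([1, 0, 1, 0, 0, 1, 1, 0, 0, 0])
group  = np.array([0, 0, 0, 0, 0, 1, 1, 1, 1, 1])

def positive_rate(g):
    return y_pred[group == g].mean()

def true_positive_rate(g):
    return y_pred[(group == g) & (y_true == 1)].mean()

def false_positive_rate(g):
    return y_pred[(group == g) & (y_true == 0)].mean()

# Demographic parity: both groups receive positive decisions at similar rates.
dp_gap = abs(positive_rate(0) - positive_rate(1))

# Equal opportunity: qualified people are accepted at similar rates.
eo_gap = abs(true_positive_rate(0) - true_positive_rate(1))

# Equalized odds: equal opportunity plus similar false positive rates.
eodds_gap = max(eo_gap, abs(false_positive_rate(0) - false_positive_rate(1)))

print(f"demographic parity gap: {dp_gap:.2f}")
print(f"equal opportunity gap:  {eo_gap:.2f}")
print(f"equalized odds gap:     {eodds_gap:.2f}")
```

When base rates differ across groups, these criteria generally can’t all be satisfied at once, which is exactly why the choice has to be made per use case.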
Next, we should look at generalization versus specificity. Generalizing specific cases usually introduces new biases. For example, DALL-E addressed ethnicity and gender bias by silently inserting additional demographic terms into user prompts, which some users claimed led to incorrect results. Teaching users to be specific might eliminate many historical biases in AI.
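OpenAI hasn’t published the exact mechanism, but the reported idea, silently appending demographic terms to under-specified prompts, can be sketched roughly as follows. The word lists and the trigger condition here are invented for illustration and are not OpenAI’s actual code.

```python
import random

# Invented word lists; a production system would be far more careful.
GENDER_TERMS = ["woman", "man"]
ETHNICITY_TERMS = ["Black", "East Asian", "Hispanic", "white"]

def augment_prompt(prompt: str) -> str:
    """Append demographic terms when the prompt mentions a person generically."""
    lowered = prompt.lower()
    already_specific = any(term.lower() in lowered
                           for term in GENDER_TERMS + ETHNICITY_TERMS)
    if "person" in lowered and not already_specific:
        return f"{prompt}, {random.choice(ETHNICITY_TERMS)} {random.choice(GENDER_TERMS)}"
    return prompt

print(augment_prompt("a portrait of a person reading a book"))
# e.g. "a portrait of a person reading a book, East Asian woman"
```

A sketch like this also shows where the complaints came from: the appended terms can contradict details the original prompt already implies.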
Do all biases come from humans?
Even dinosaurs would probably have exhibited position bias if someone had ranked meat for them. All data produced by humans is biased in some way, which results in biased outputs from models.
As a side note, I don’t fully agree. All generalization creates or intensifies biases, so neural networks themselves, which simplify, learn weights, and generalize input information to handle unseen data, might be a separate driver of bias. What do you think?
Humanity seems to be stuck in an infinite loop of introducing and fixing biases. Should we try to automate everything? AutoML for the win, AI models training AI models, and an unbiased AI eventually.
Even if there were a cycle of automation like that, we, as customers, would still have to kick it off and set its objective. That raises the likelihood of biases being built into the loop. It’s not a perfect solution, especially with full AI automation decades away.
Mistakes are especially scary where human life is deeply impacted, as in medicine and jurisprudence. Given the current state of ethical AI, these fields can’t be fully automated, so we’re better off looking for manual solutions.
Biases in humans and AI are psychological and sociological phenomena. Should tech specialists study those fields to better handle biased AI systems? Or can we focus on common sense and outsource the rest to experts?
A degree in ethics isn’t mandatory; having well-defined personal ethics is. Your ethics at work can’t be split off from your personal ethics the way work and personal selves are severed in the TV series “Severance”, and that applies to technical experts too.
However, developing AI systems nowadays is rarely up to the tech crowd alone. In modern AI companies, cross-functional teams are responsible for product development, and one of their main focuses is conducting comprehensive user studies. User studies are extremely important for bias detection and overall product improvement. The pace at which users find biases in publicly accessible models, like CLIP’s infamous bias toward text printed in an image, speaks for itself.
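That CLIP example is easy to probe with a few lines of zero-shot classification code. The checkpoint below is the public openai/clip-vit-base-patch32 model from Hugging Face, and the image paths are placeholders for the widely shared demo in which a handwritten “iPod” note stuck on an apple flips the prediction.

```python
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

labels = ["a photo of an apple", "a photo of an iPod"]

# Placeholder paths: the second image is the same apple with a paper note
# reading "iPod" attached to it.
for path in ["apple.jpg", "apple_with_ipod_note.jpg"]:
    image = Image.open(path)
    inputs = processor(text=labels, images=image, return_tensors="pt", padding=True)
    with torch.no_grad():
        probs = model(**inputs).logits_per_image.softmax(dim=-1)[0]
    print(path, {label: round(p.item(), 3) for label, p in zip(labels, probs)})
```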
If you, as a tech specialist or a team, can’t find any more biases in your AI product, can you relax?
There will always be situations where you don’t have the imagination to understand someone else’s experience. But that doesn’t mean it doesn’t exist. Where human lives are affected, we have to set the bar as high as possible, remembering that fairness is key to marketplace sustainability as well.
The panel discussion shed some light on the perspectives AI experts have on biases. But there are still questions that bother me:
Which bias detection tools work best in which cases?
How do we mitigate biases and account for ethics that vary geographically?
Is it possible for LLMs to be fully unbiased?
I’m going to dig deeper into the topic, gather more intel, and come back soon with more insights!