
Why Ignoring Sensitive Factors Won't Solve Algorithmic Bias and Discrimination

by Gurman May 7th, 2024

Too Long; Didn't Read

The article explains that simply leaving sensitive factors like race or sex out of an algorithm doesn't prevent it from being biased. Bias can still seep in through related data, such as location or income, which act as proxies. Some countries, under laws like the GDPR, avoid collecting data on these sensitive factors, but that doesn't solve bias; it just hides it, because the bias can no longer be measured. The piece argues for stronger regulation and accountability, not just guidelines, to actively detect and correct bias in algorithms, and it proposes a detailed auditing framework to keep algorithms fair and transparent.

Not Accounting for Sensitive Factors Doesn’t Mean Your Algorithm Won’t be Biased

Being colorblind doesn't mean that color doesn't exist. Similarly, leaving sensitive factors such as race and sex out of an algorithm doesn't mean the algorithm won't carry biases based on race or sex. Those biases are ingrained in society, and therefore in the data. Most algorithms are literal: their outputs are a function of the patterns they observe.


Nonetheless, a common technique developers have applied is straight omission, despite its continued failure. Kwok from Yale's School of Management explains that when race is removed from a racially biased algorithm, a subtler bias, "latent discrimination," takes its place: other factors correlated with race, such as income or location, essentially serve as proxies for it. The Harvard Business Review likewise examined an employment recruitment scenario and found that proxies in the data could predict gender with 91% accuracy.
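One way to make this concrete is to train a simple probe model to predict the sensitive attribute from the supposedly neutral features: if the probe does well, those features are acting as proxies. Below is a minimal sketch in Python using scikit-learn on synthetic data; the feature names, correlations, and numbers are made-up assumptions for illustration, not the data from the Yale or HBR studies.

```python
# Proxy check sketch: can "neutral" features recover an omitted sensitive attribute?
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(0)
n = 5_000

# Hypothetical sensitive attribute (0/1) and two correlated "neutral" features,
# e.g. income and a neighborhood index. The correlations are assumed for illustration.
sensitive = rng.integers(0, 2, size=n)
income = 40_000 + 15_000 * sensitive + rng.normal(0, 8_000, size=n)
neighborhood = 0.7 * sensitive + rng.normal(0, 0.5, size=n)
X = np.column_stack([income, neighborhood])

X_train, X_test, s_train, s_test = train_test_split(
    X, sensitive, test_size=0.3, random_state=0
)

# Probe: try to recover the sensitive attribute from the remaining features.
probe = make_pipeline(StandardScaler(), LogisticRegression())
probe.fit(X_train, s_train)
acc = accuracy_score(s_test, probe.predict(X_test))

print(f"Sensitive attribute recovered with {acc:.0%} accuracy")
# Accuracy far above 50% means the "omitted" attribute is still encoded
# in the features the model can see.
```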


The omission strategy extends beyond individual scenarios, though. During a recent conference on AI regulation at California Western School of Law, a French panelist noted that France doesn't have to deal with racial bias in algorithms because it simply does not collect race as a factor. This stems from the GDPR, whose Article 9 prohibits the use of "special categories of data," covering sensitive factors as well as proxies that may reveal them. The provision reads as follows:


A Robot Working by Andrea De Santis from Unsplash


Processing of personal data revealing racial or ethnic origin, political opinions, religious or philosophical beliefs, or trade union membership, and the processing of genetic data, biometric data for the purpose of uniquely identifying a natural person, data concerning health or data concerning a natural person’s sex life or sexual orientation shall be prohibited.


Countries subject to the GDPR, such as France, still have racial biases; they simply cannot be measured, because the data is never collected. However, one could argue that perhaps biases don't need to be "fixed," since an algorithm should reflect real life. When ProPublica criticized the maker of COMPAS, a recidivism-prediction algorithm, after finding that Black defendants were nearly twice as likely as their white counterparts to be misclassified as high risk, the algorithm's maker and researchers responded that it was mathematically impossible to build an algorithm that didn't produce such racial gaps, given the impact of race on recidivism rates.
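For context, the kind of group-level disparity ProPublica measured can be checked in a few lines: compute each group's high-risk classification rate and the ratio between them. The sketch below uses made-up numbers, not the actual COMPAS data.

```python
# Group-disparity sketch: compare high-risk classification rates across groups.
import pandas as pd

scores = pd.DataFrame({
    "group":     ["A"] * 6 + ["B"] * 6,
    "high_risk": [1, 1, 1, 0, 1, 0,   # group A: flagged 4 of 6 times
                  1, 0, 0, 0, 1, 0],  # group B: flagged 2 of 6 times
})

rates = scores.groupby("group")["high_risk"].mean()
print(rates)                    # per-group high-risk classification rate
print(rates["A"] / rates["B"])  # disparity ratio; 1.0 would mean parity
```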

Image of Justice by Tingey Injury Law from Unsplash


This reasoning is problematic because algorithms can amplify and perpetuate biases. For example, predictive policing tends to direct law enforcement toward Black and brown neighborhoods based on past data. But that past data is itself biased by heightened racial tensions, and increased law enforcement in those areas produces more arrests, skewing future data and widening the racial disparity among arrestees.
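A toy simulation makes this feedback loop visible. In the sketch below, two areas have identical underlying offense rates, but the arrest record starts out skewed and patrols are concentrated wherever past arrests were recorded; all counts, rates, and the 80/20 patrol split are illustrative assumptions, not empirical figures.

```python
# Feedback-loop sketch: skewed historical data steers patrols, which
# generates more skewed data, even when the ground truth is identical.
true_offense_rate = {"north": 0.10, "south": 0.10}   # identical ground truth
recorded_arrests = {"north": 120, "south": 60}       # skewed starting record

for year in range(1, 6):
    # The predicted "hot spot" is simply the area with more recorded arrests.
    hot_spot = max(recorded_arrests, key=recorded_arrests.get)
    # 80% of a fixed patrol budget goes to the predicted hot spot.
    patrols = {area: (800 if area == hot_spot else 200) for area in recorded_arrests}
    # Offenses are only recorded where patrols are present.
    for area, n_patrols in patrols.items():
        recorded_arrests[area] += round(n_patrols * true_offense_rate[area])
    share = recorded_arrests["north"] / sum(recorded_arrests.values())
    print(f"year {year}: north's share of recorded arrests = {share:.1%}")

# The share drifts from 66.7% toward 80% even though the two areas are
# identical, because the data driving the allocation is itself a product
# of where patrols were sent.
```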

We need a solution that prevents algorithms from perpetuating cycles of existing bias, and simply ignoring sensitive factors only masks the issue. The U.S. lacks a regulatory framework that requires organizations to measure and mitigate their own bias. The White House Office of Science and Technology Policy's Blueprint for an AI Bill of Rights outlines thorough recommendations for best practices, but the lack of enforcement undermines its effectiveness, as evidenced by the harm that deployed biased algorithms continue to cause. Since sweeping bans such as GDPR Article 9 do little to mitigate bias, I argue that policymakers' role shouldn't be to tell developers how to minimize bias, but rather to act as regulators and strictly hold developers accountable through audits.


Here is a sample auditing framework that draws heavily on the National Institute of Standards and Technology's (NIST) identification of three primary categories of AI bias: systemic, computational, and human. A sketch of the framework as a machine-readable checklist follows the outline.


  1. Assessment of AI System Objectives
    1. Purpose of System
    2. Assumptions Regarding Fairness and Bias
      1. Definitions of Fairness the Model Attempts to Satisfy
      2. Sensitive Factors Accounted For
    3. Organizational Norms (e.g., Implicit Bias Training)
    4. Diversity of Team
  2. Data Management and Analysis
    1. Data Collection Oversight
      1. Representation of Groups in Data
      2. Context of Data
    2. Proxy Identification
  3. Algorithm Development and Model Training
    1. Transparent Design
      1. Documentation of Development Decisions with Justifications (Particularly Relevant for Models Used in High-Risk Settings such as Courts and Healthcare)
    2. Bias Mitigation Techniques Used
  4. Testing and Evaluation
    1. Independent Validation
    2. Continuous Monitoring
    3. Disclosure of Bias Audit Findings
    4. Stakeholder Engagement
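To make such an audit repeatable and its findings disclosable, the outline above could be encoded as a machine-readable checklist that auditors fill in item by item. The sketch below is one possible encoding in Python; the class, field names, and statuses are my own assumptions, not a standardized NIST schema.

```python
# Sketch: the auditing framework as a checklist whose findings can be
# recorded per item and summarized for disclosure.
from dataclasses import dataclass

@dataclass
class AuditItem:
    category: str          # e.g. "Data Management and Analysis"
    item: str              # e.g. "Proxy Identification"
    status: str = "open"   # "open", "pass", or "fail"
    evidence: str = ""     # documentation or findings supporting the status

AUDIT_CHECKLIST = [
    AuditItem("Assessment of AI System Objectives", "Purpose of System"),
    AuditItem("Assessment of AI System Objectives", "Definitions of Fairness Attempted"),
    AuditItem("Assessment of AI System Objectives", "Sensitive Factors Accounted For"),
    AuditItem("Data Management and Analysis", "Representation of Groups in Data"),
    AuditItem("Data Management and Analysis", "Proxy Identification"),
    AuditItem("Algorithm Development and Model Training", "Transparent Design"),
    AuditItem("Algorithm Development and Model Training", "Bias Mitigation Techniques Used"),
    AuditItem("Testing and Evaluation", "Independent Validation"),
    AuditItem("Testing and Evaluation", "Continuous Monitoring"),
    AuditItem("Testing and Evaluation", "Disclosure of Bias Audit Findings"),
]

def audit_summary(checklist):
    """Count items by status so regulators can see what remains unaddressed."""
    counts = {}
    for item in checklist:
        counts[item.status] = counts.get(item.status, 0) + 1
    return counts

# Example: record one finding and summarize the audit's current state.
AUDIT_CHECKLIST[4].status = "fail"
AUDIT_CHECKLIST[4].evidence = "Location and income predict race with high accuracy."
print(audit_summary(AUDIT_CHECKLIST))   # e.g. {'open': 9, 'fail': 1}
```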