Bias in AI is a serious problem. In particular, AI can compound the effects of existing societal biases: in a recruiting tool, if historical data show more men than women working as software engineers, an AI system trained on that data is likely to overscreen for men when identifying job applicants, creating a vicious circle of bias. Indeed, Amazon recently scrapped its AI recruiting engine project for exactly that reason. Now that AI is increasingly used in high-impact applications, such as criminal justice, hiring, and healthcare, characterizing and mitigating biases is more urgent than ever: we must prevent AI from further disadvantaging already-disadvantaged people.
To what extent can AI be used to address its own problems? Most methodologies that have been proposed to mitigate biases in AI and machine learning systems assume access to sensitive demographic attributes. However, in practice, this information is often unavailable and, in some contexts, may even be illegal to use. How can we mitigate biases if we do not have access to sensitive attributes?
What if we could use fire to fight fire, or in this case, use bias to fight bias?
That’s the approach we came up with for our paper, “What’s in a Name? Reducing Bias in Bios Without Access to Protected Attributes,” to be presented at the 2019 Annual Conference of the North American Chapter of the Association for Computational Linguistics (NAACL), running June 2-7, 2019, in Minneapolis. We’re also pleased to announce that our paper won the Best Thematic Paper award.
In our paper, we propose a method that relies on word embeddings of names to reduce biases without requiring access to sensitive attributes. Our method even tackles intersectional biases, such as biases involving combinations of race and gender. As we showed in previous work, inherent societal biases involving sensitive attributes are encoded in word embeddings. Essentially, word embeddings are mappings of words to vectors, learned from large collections of documents, so they capture any biases represented in those documents. For example, in commonly used word embeddings, Hispanic names are “embedded” closer to ‘taco’ than to ‘hummus,’ and name embeddings reflect other, more harmful cultural stereotypes as well.
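To make this concrete, here is a minimal sketch of how one might probe a pretrained embedding for name associations. It assumes gensim and the publicly available word2vec-google-news-300 vectors; the names and food words below are purely illustrative, whether they appear at all depends on the embedding’s vocabulary, and this is not the measurement reported in our previous work.

```python
# Illustrative sketch: probe a pretrained embedding for name-word associations.
# Assumes gensim is installed; the vectors are a large download.
import gensim.downloader as api

vectors = api.load("word2vec-google-news-300")

# Cosine similarity between a first name and two food words (illustrative tokens).
for name in ["Juan", "Emily"]:
    for food in ["taco", "hummus"]:
        if name in vectors and food in vectors:
            print(name, food, round(float(vectors.similarity(name, food)), 3))
```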
Specifically, we look at mitigating biases in occupation classification, using names from a large-scale dataset of online biographies: the predicted probability of an individual’s occupation should not depend on their name—nor on any sensitive attributes that may be inferred from it. Crucially, and in contrast to previous work, our method requires access to names only at training time and not at deployment time.
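One hypothetical way to arrange the data reflects that constraint (field names here are ours, not the dataset’s): at training time each example carries the bio text, the occupation label, and an embedding of the individual’s name, while at deployment time the model needs only the bio text.

```python
# Hypothetical data layout (sketch, not the paper's code): the name embedding is
# attached only to training examples and is never needed at deployment time.
def to_training_example(record, name_vectors):
    return {
        "text": record["bio_text"],                    # classifier input
        "label": record["occupation"],                 # prediction target
        "name_emb": name_vectors.get(record["name"]),  # used only in the training penalty
    }

def to_deployment_example(record):
    return {"text": record["bio_text"]}                # no name required at deployment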
Using societal biases in word embeddings
Here’s how our method works: we penalize the classifier if there is a correlation between the embedding of an individual’s name and the probability of correctly predicting that individual’s occupation. This encourages the classifier to use signals that are useful for occupation classification but not useful for predicting names or any sensitive attributes correlated with them. We find that our method reduces differences in classification accuracy across race and gender, while having very little effect on the classifier’s overall performance, quantified in terms of its true positive rate.
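In code, the overall objective might look like the following minimal PyTorch sketch, assuming a batch of classifier logits, true occupation labels, and a precomputed name embedding per individual. The names training_loss, fairness_penalty, and lam are ours; this is an illustration of the idea, not the paper’s implementation.

```python
import torch

# Minimal sketch: standard cross-entropy for occupation classification plus a
# penalty that ties name embeddings to the predicted probability of the true occupation.
def training_loss(logits, labels, name_embs, fairness_penalty, lam=1.0):
    ce = torch.nn.functional.cross_entropy(logits, labels)
    # Probability the classifier assigns to each individual's true occupation.
    probs_true = torch.softmax(logits, dim=1).gather(1, labels.unsqueeze(1)).squeeze(1)
    return ce + lam * fairness_penalty(name_embs, probs_true)
```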
We propose two variations of our method. The first variation uses k-means to cluster word embeddings of the names of the individuals in the training set and then, for each pair of clusters, minimizes between-cluster differences in the predicted probabilities of the true occupations of the individuals in the training set. The second variation directly minimizes the covariance between the predicted probability of each (training set) individual’s true occupation and a word embedding of that individual’s name.
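Both variations can be written as penalties in the style of the fairness_penalty hook sketched above (again, our illustration rather than the paper’s implementation). For the first variation, the cluster assignments are assumed to come from running k-means, for example scikit-learn’s KMeans, on the training-set name embeddings before training, with the number of clusters as a hyperparameter; note that this variant takes the precomputed cluster indices rather than the raw name embeddings.

```python
import itertools
import torch

def cluster_penalty(probs_true, cluster_ids, n_clusters):
    """First variation (sketch): with each name assigned to a k-means cluster
    ahead of time, penalize pairwise between-cluster differences in the mean
    predicted probability of the true occupation."""
    # cluster_ids is a tensor of cluster indices aligned with probs_true.
    means = [probs_true[cluster_ids == c].mean()
             for c in range(n_clusters) if (cluster_ids == c).any()]
    return sum((a - b) ** 2 for a, b in itertools.combinations(means, 2))

def covariance_penalty(name_embs, probs_true):
    """Second variation (sketch): penalize the covariance, computed over the
    batch, between each name embedding and the predicted probability of the
    true occupation."""
    e = name_embs - name_embs.mean(dim=0, keepdim=True)   # (batch, embedding_dim)
    c = probs_true - probs_true.mean()                     # (batch,)
    cov = (e * c.unsqueeze(1)).mean(dim=0)                 # (embedding_dim,)
    return (cov ** 2).sum()
```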
Both variations of our method therefore mitigate societal biases that are encoded in names, including biases involving age, religion, race, and gender. (Biases that are not encoded in names, such as those involving disabilities, are not addressed by our method.) Crucially, our method mitigates intersectional biases involving specific combinations of these attributes, which may otherwise go undetected.
Because our method requires access to names only at training time, it extends fairness benefits to individuals whose sensitive attributes are not reflected in their names, such as women named Alex.
Relationships started during internships at Microsoft
This work came about because we were interns together at Microsoft Research New England, where we worked on projects in the space of fairness in AI. The experience was very positive and we formed a great team, so we continued to work together on this project even after the internship was over.
Both of us were motivated by our desire to build AI systems that work well for everyone. We’re passionate about understanding the roadblocks that prevent the effective use of AI and then developing ways to address them. Each of our co-authors (Hanna Wallach, Jennifer Chayes, Christian Borgs, Alexandra Chouldechova, Sahin Geyik, Krishnaram Kenthapadi, Anna Rumshisky, and Adam Kalai) contributed a unique perspective to this project, and we’d like to thank them for their contributions and for the research environment they have created. In Hanna’s words, “Microsoft is particularly interested in work at the intersection of machine learning and the social sciences that can be used to mitigate biases in AI systems. This paper presents one such method. Moving forward, we’re excited to see how our method might be used in real-world settings.”