Ever since Microsoft’s chatbot Tay started spouting racist commentary after 24 hours of interacting with humans on Twitter, it has been obvious that our AI creations can fall prey to human prejudice. Now a group of researchers has figured out one reason why that happens. Their findings shed light on more than our future robot overlords, however. They’ve also worked out an algorithm that can actually predict human prejudices based on an intensive analysis of how people use English online.
The implicit bias test
Many AIs are trained to understand human language by learning from a massive corpus known as the Common Crawl. The Common Crawl is the result of a large-scale crawl of the Internet in 2014 that contains 840 billion tokens, or words. Princeton Center for Information Technology Policy researcher Aylin Caliskan and her colleagues wondered whether that corpus—created by millions of people typing away online—might contain biases that could be discovered by algorithm. To figure it out, they turned to an unusual source: the Implicit Association Test (IAT), which is used to measure often unconscious social attitudes.
People taking the IAT are asked to put words into two categories. The longer it takes for the person to place a word in a category, the less they associate the word with the category. (If you’d like to take an IAT, there are several online at Harvard University.) IAT is used to measure bias by asking people to associate random words with categories like gender, race, disability, age, and more. Outcomes are often unsurprising: for example, most people associate women with family, and men with work. But that obviousness is actually evidence for the IAT’s usefulness in discovering people’s latent stereotypes about each other.