Artificial intelligence systems echoing biases from their training material, putting scientists on guard

Scientists are still learning how technology like BERT works. And they are often surprised by the mistakes their new AI is making.


Last fall, Google unveiled a breakthrough artificial intelligence technology called BERT that changed the way scientists build systems that learn how people write and talk.

But BERT, which is now being deployed in services like Google’s internet search engine, has a problem: It could be picking up biases much as a child mimics the bad behaviour of his parents.

BERT is one of a number of AI systems that learn from lots and lots of digitised information, as varied as old books, Wikipedia entries and news articles. Decades and even centuries of biases — along with a few new ones — are probably baked into all that material.

BERT and its peers are more likely to associate men with computer programming, for example, and generally don’t give women enough credit. One program decided almost everything written about President Donald Trump was negative, even if the actual content was flattering.

As new, more complex AI moves into an increasingly wide array of products, like online ad services and business software or talking digital assistants like Apple’s Siri and Amazon’s Alexa, tech companies will be pressured to guard against the unexpected biases that are being discovered.

But scientists are still learning how technologies like BERT, known as “universal language models,” work. And they are often surprised by the mistakes their new AI is making.

On a recent afternoon in San Francisco, while researching a book on artificial intelligence, computer scientist Robert Munro fed 100 English words into BERT: “jewelry,” “baby,” “horses,” “house,” “money,” “action.” In 99 cases out of 100, BERT was more likely to associate the words with men rather than women. The word “mom” was the outlier.

“This is the same historical inequity we have always seen,” said Munro, who has a PhD in computational linguistics and previously oversaw natural language and translation technology at Amazon Web Services. “Now, with something like BERT, this bias can continue to perpetuate.”
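The kind of association Munro describes can be illustrated with a toy word count. The sketch below is not his method and does not use BERT; it assumes a handful of made-up sentences and a hypothetical helper, `gender_lean`, that checks whether a word appears more often alongside male or female pronouns. A system trained on skewed text would inherit a similar lean.

```python
# Toy illustration only: the sentences and the helper name are invented.
MALE = {"he", "him", "his"}
FEMALE = {"she", "her", "hers"}

SAMPLE = [
    "he spent his money on horses",
    "he took action and saved his house",
    "she gave her money to him",
    "the mom said she loved her baby",
]


def gender_lean(word, sentences=SAMPLE):
    """Return 'male', 'female' or 'neutral' for a word, based on which
    pronouns share a sentence with it more often."""
    male = female = 0
    for sentence in sentences:
        tokens = sentence.lower().split()
        if word in tokens:
            male += sum(token in MALE for token in tokens)
            female += sum(token in FEMALE for token in tokens)
    if male > female:
        return "male"
    if female > male:
        return "female"
    return "neutral"


for word in ["money", "action", "mom", "jewelry"]:
    print(word, gender_lean(word))
```

In this sample, "money" leans male simply because the invented sentences mention men with money more often; the count reflects the text, not the world.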

In a blog post set to be published this week, Munro also describes how he examined cloud-computing services from Google and Amazon Web Services that help other businesses add language skills into new applications. Both services failed to recognise the word “hers” as a pronoun, though they correctly identified “his.”

“We are aware of the issue and are taking the necessary steps to address and resolve it,” a Google spokesman said. “Mitigating bias from our systems is one of our AI principles, and is a top priority.” Amazon did not respond to multiple requests for comment.

Researchers have long warned of bias in AI that learns from large amounts of data, including the facial recognition systems that are used by police departments and other government agencies as well as popular internet services from tech giants like Google and Facebook. In 2015, for example, the Google Photos app was caught labelling African Americans as “gorillas.” The services Munro scrutinised also showed bias against women and people of colour.

BERT and similar systems are far more complex — too complex for anyone to predict what they will ultimately do.

“Even the people building these systems don’t understand how they are behaving,” said Emily Bender, a professor at the University of Washington who specialises in computational linguistics.

BERT is one of many universal language models used in industry and academia. Others are called ELMo, ERNIE and GPT-2. As a kind of inside joke among AI researchers, they are often named for Sesame Street characters. (BERT is short for Bidirectional Encoder Representations from Transformers.)

They learn the nuances of language by analysing enormous amounts of text. A system built by OpenAI, an artificial intelligence lab in San Francisco, analysed thousands of self-published books, including romance novels, mysteries and science fiction. BERT analysed the same library of books along with thousands of Wikipedia articles.

In analysing all this text, each system learned a specific task. OpenAI’s system learned to predict the next word in a sentence. BERT learned to identify the missing word in a sentence (such as “I want to ____ that car because it is cheap”).
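The difference between the two tasks can be shown with a toy sketch. It assumes a three-sentence made-up corpus and simple counting in place of the neural networks the real systems use; the function names `predict_next` and `fill_mask` are hypothetical. The point is only that the first task sees the words to the left, while the second sees words on both sides of the gap.

```python
from collections import Counter

# Tiny invented corpus; the real systems learn from thousands of books.
CORPUS = [
    "i want to buy that car because it is cheap",
    "i want to buy that house because it is big",
    "i want to sell that car because it is old",
]


def predict_next(prefix, corpus=CORPUS):
    """Next-word task: guess the word that follows `prefix` (left context only)."""
    prefix = prefix.split()
    n = len(prefix)
    counts = Counter()
    for sentence in corpus:
        words = sentence.split()
        for i in range(len(words) - n):
            if words[i:i + n] == prefix:
                counts[words[i + n]] += 1
    return counts.most_common(1)[0][0] if counts else None


def fill_mask(left, right, corpus=CORPUS):
    """Missing-word task: guess the word between `left` and `right` (both sides)."""
    left, right = left.split(), right.split()
    counts = Counter()
    for sentence in corpus:
        words = sentence.split()
        for i in range(len(left), len(words) - len(right)):
            before = words[i - len(left):i]
            after = words[i + 1:i + 1 + len(right)]
            if before == left and after == right:
                counts[words[i]] += 1
    return counts.most_common(1)[0][0] if counts else None


print(predict_next("i want to"))
print(fill_mask("i want to", "that car because it is old"))
```

With only the left-hand words, the counter picks the most common continuation in the corpus. Given the words after the gap as well, it can settle on the one sentence that matches, which is the intuition behind the "bidirectional" in BERT's name.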

Through learning these tasks, BERT comes to understand in a general way how people put words together. Then it can learn other tasks by analysing more data. As a result, it allows AI applications to improve at a rate not previously possible.

“BERT completely changed everything,” said John Bohannon, director of science at Primer, a startup in San Francisco that specialises in natural language technologies. “You can teach one pony all the tricks.”

Google itself has used BERT to improve its search engine. Before, if you typed “Do aestheticians stand a lot at work?” into the Google search engine, it did not quite understand what you were asking. Words like “stand” and “work” can have multiple meanings, serving either as nouns or verbs. But now, thanks to BERT, Google correctly responds to the same question with a link describing the physical demands of life in the skin-care industry.

But tools like BERT pick up bias, according to a recent research paper from a team of computer scientists at Carnegie Mellon University. The paper showed, for instance, that BERT is more likely to associate the word “programmer” with men rather than women. Language bias can be a particularly difficult problem in conversational systems.

As these new technologies proliferate, biases can appear almost anywhere. At Primer, Bohannon and his engineers recently used BERT to build a system that lets businesses automatically judge the sentiment of headlines, tweets and other streams of online media. Businesses use such tools to inform stock trades and other pointed decisions.

But after training his tool, Bohannon noticed a consistent bias. If a tweet or headline contained the word “Trump,” the tool almost always judged it to be negative, no matter how positive the sentiment.
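One common way to check for this kind of bias is a name-swap test: replace the name with a neutral word and see whether the verdict changes. The sketch below is not Primer's tool. It assumes a hypothetical table of word weights standing in for a trained model, with a large negative weight on "trump" to mimic the learned bias, and hypothetical helpers `label` and `name_swap_flips`.

```python
# Hypothetical word weights standing in for a trained sentiment model.
# The negative weight on "trump" imitates a bias picked up from training data.
WEIGHTS = {
    "great": 2.0,
    "praised": 1.5,
    "wins": 1.5,
    "terrible": -2.0,
    "trump": -5.0,
}


def score(text):
    """Sum the weights of the words in the text; unknown words count as zero."""
    return sum(WEIGHTS.get(token, 0.0) for token in text.lower().split())


def label(text):
    """Call the text positive if its score is above zero, negative otherwise."""
    return "positive" if score(text) > 0 else "negative"


def name_swap_flips(text, name, placeholder="someone"):
    """Return True if replacing `name` with a neutral placeholder changes the label."""
    swapped = " ".join(
        placeholder if token.lower() == name.lower() else token
        for token in text.split()
    )
    return label(text) != label(swapped)


headline = "Trump praised for great speech"
print(label(headline))
print(name_swap_flips(headline, "Trump"))
```

If the label flips when only the name changes, the name itself is driving the verdict rather than the sentiment of the surrounding words.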

“This is hard. You need a lot of time and care,” he said. “We found an obvious bias. But how many others are in there?”

Bohannon said computer scientists must develop the skills of a biologist. Much as a biologist strives to understand how a cell works, software engineers must find ways of understanding systems like BERT.

In unveiling the new version of its search engine last month, Google executives acknowledged this phenomenon. And they said they tested their systems extensively with an eye toward removing any bias.

Researchers are only beginning to understand the effects of bias in systems like BERT. But as Munro showed, companies are already slow to notice even obvious bias in their systems. After Munro pointed out the problem, Amazon corrected it. Google said it was working to fix the issue.

Primer’s chief executive, Sean Gourley, said vetting the behaviour of this new technology will become so important that it will spawn a whole new industry, where companies pay specialists to audit their algorithms for all kinds of bias and other unexpected behaviours.

“This is probably a billion-dollar industry,” he said.

Cade Metz © The New York Times Company