Mozilla's 'Common Voice', And How Crowdsourcing Speech Dataset Can Eliminate AI Biases

With human-computer interaction is shifting from touch to voice, the future would rely on voice control.

From smart digital assistants inside smartphones to smart speakers and smart home appliances, these devices are all ears waiting to fulfill whatever command their masters have. They wait quietly for that trigger voice command, as they patiently serve our bidding.

But traditionally, voice assistants, like Google Assistant, Amazon Alexa, Apple Siri or Microsoft Cortana are all developed to represent mostly white male dominance. This is because most of their developers were indeed male, and white.

In one way or another, this led to gender bias.

For end users, this can translate to less appealing experience, especially for those who don't speak English as their native language, and not male, or white. And this has been confirmed by a study by the UN.

This is why Mozilla, the organization behind the popular Firefox browser, created an open-source project called 'Common Voice' back in 2017.

What it does, is crowdsourcing voices as a data set, in order to diversify sources of AI knowledge, by representing the global population, and not just the West.

Common Voice works by releasing its growing dataset publicly so any company can use it for research and to build and train their own voice-enabled software.

With thousands of hours of voice data in multiple languages - including English, French, German,Traditional Mandarin Chinese, Welsh, and Kabyle - Common Voice has the ultimate goal of improving voice recognition for all people regardless of their language, gender, age, or accent.

According to Kelly Davis, the Head of Machine Learning, at Mozilla

"Existing speech recognition services are only available in languages that are financially profitable. They also tend to work better for men than women and struggle to understand people with different accents, all of which are a result of biases within the data on which they are trained."
Speak - Listen (Mozilla Common Voice)
People can contribute to the Common Voice project, by either recording their own voice, or validate others' recordings

Speech is becoming a preferred way to interact with machines, and that’s contributed to the growth of services like Google Assistant and Amazon Alexa.

While these voice assistants have revolutionized the way humans communicate with their devices, "the innovative potential of this technology is widely untapped because developers, researchers, and startups around the globe working on voice-recognition technology face one problem alike: a lack of publicly available voice data in their respective language to train speech-to-text engines," added Davis.

So although the mass usage of voice recognition AI is increasing, it's far from what it should be, as according to Davis, voice assistants still majorly caters to the West with six of its seven language options being European.

"Largely, the efforts to address the race gap in AI have fallen on non-corporate hands," continued Davis.

This was further highlighted back in April 2019, when a study by New York University’s AI Now Institute found that the lack of diverse representation at major technology companies like Microsoft, Google, and Facebook caused AI to cater more readily to white men.

The report highlighted that only 15%of Facebook’s AI staff are women. Google on the other hand, only 10% are women.

Voice Search
"Most speech databases are trained with an over-representation of certain demographics which results in a bias towards male and white and middle class."

"Accents and dialects that tend to be under-represented in training datasets. Many machines struggle with understanding female voices or voices of elderly people."

This is why, despite the growing number of digital assistants usage, only a fraction of people really benefit from these voice recognition technology.

"Think about how speech recognition could be used by minority language speakers to enable more people to have access to technology and the services the internet can provide, even if they never learned how to read?" Davis said.

"The same is true for visually impaired or physically handicapped people, but regular market forces will not help them."

Using Common Voice, Mozilla hopes to speed up the process of collecting data in all languages around the globe, regardless of accent, gender, or age.

"By making this data available - and developing a speech recognition engine in the open, project Deep Speech – we can empower entrepreneurs and communities to address existing gaps on their own," said Davis.

Anyone can help contribute to this Mozilla's project. To do this, on the Mozilla’s Common Voice page, people can simply record their own voice reading out sentences out loud, or just listen to others' recording to verify whether or not they are accurate.

This way, Mozilla's approach is like giving the power back to users, and not to some third-party contractors like Google and Amazon.

Further reading: The Ways Voice-Enabled Technology Can Change Consumer Behavior