With human-computer interaction is shifting from touch to voice, the future would rely on voice control.
From smart digital assistants inside smartphones to smart speakers and smart home appliances, these devices are all ears waiting to fulfill whatever command their masters have. They wait quietly for that trigger voice command, as they patiently serve our bidding.
But traditionally, voice assistants, like Google Assistant, Amazon Alexa, Apple Siri or Microsoft Cortana are all developed to represent mostly white male dominance. This is because most of their developers were indeed male, and white.
In one way or another, this led to gender bias.
For end users, this can translate to less appealing experience, especially for those who don't speak English as their native language, and not male, or white. And this has been confirmed by a study by the UN.
What it does, is crowdsourcing voices as a data set, in order to diversify sources of AI knowledge, by representing the global population, and not just the West.
Common Voice works by releasing its growing dataset publicly so any company can use it for research and to build and train their own voice-enabled software.
With thousands of hours of voice data in multiple languages - including English, French, German,Traditional Mandarin Chinese, Welsh, and Kabyle - Common Voice has the ultimate goal of improving voice recognition for all people regardless of their language, gender, age, or accent.
According to Kelly Davis, the Head of Machine Learning, at Mozilla
Speech is becoming a preferred way to interact with machines, and that’s contributed to the growth of services like Google Assistant and Amazon Alexa.
While these voice assistants have revolutionized the way humans communicate with their devices, "the innovative potential of this technology is widely untapped because developers, researchers, and startups around the globe working on voice-recognition technology face one problem alike: a lack of publicly available voice data in their respective language to train speech-to-text engines," added Davis.
So although the mass usage of voice recognition AI is increasing, it's far from what it should be, as according to Davis, voice assistants still majorly caters to the West with six of its seven language options being European.
"Largely, the efforts to address the race gap in AI have fallen on non-corporate hands," continued Davis.
This was further highlighted back in April 2019, when a study by New York University’s AI Now Institute found that the lack of diverse representation at major technology companies like Microsoft, Google, and Facebook caused AI to cater more readily to white men.
The report highlighted that only 15%of Facebook’s AI staff are women. Google on the other hand, only 10% are women.
"Accents and dialects that tend to be under-represented in training datasets. Many machines struggle with understanding female voices or voices of elderly people."
This is why, despite the growing number of digital assistants usage, only a fraction of people really benefit from these voice recognition technology.
"Think about how speech recognition could be used by minority language speakers to enable more people to have access to technology and the services the internet can provide, even if they never learned how to read?" Davis said.
"The same is true for visually impaired or physically handicapped people, but regular market forces will not help them."
Using Common Voice, Mozilla hopes to speed up the process of collecting data in all languages around the globe, regardless of accent, gender, or age.
"By making this data available - and developing a speech recognition engine in the open, project Deep Speech – we can empower entrepreneurs and communities to address existing gaps on their own," said Davis.
Anyone can help contribute to this Mozilla's project. To do this, on the Mozilla’s Common Voice page, people can simply record their own voice reading out sentences out loud, or just listen to others' recording to verify whether or not they are accurate.
This way, Mozilla's approach is like giving the power back to users, and not to some third-party contractors like Google and Amazon.
Further reading: The Ways Voice-Enabled Technology Can Change Consumer Behavior