Background

'Tacotron 2' AI From Google Is Able To Generate Voice Indistinguishable From Humans

Humans speak like humans; that much is obvious. But what about machines? Can we make them speak the way we do?

When computers were first able to speak, they talked and sounded like robots. As technology advanced, people began lending their voices to machines so that they could sound far more human. One of those advances has come from Google.

In a research paper, the company details how a text-to-speech system called 'Tacotron 2' achieves near-human accuracy at imitating the audio of a person speaking, working from text alone.

The system is Google’s second official generation of the technology, and it uses two deep neural networks. The first network translates the text into a spectrogram, a visual way to represent audio frequencies over time. That spectrogram is then fed into WaveNet, a system from Alphabet’s AI research lab DeepMind, which reads the chart and generates the corresponding audio.
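To make the idea of a spectrogram more concrete, here is a minimal sketch in Python with NumPy. It is not how Tacotron 2 works internally: Tacotron 2 predicts a spectrogram from text, whereas this example computes one from an existing audio signal using a short-time Fourier transform. The function name `spectrogram`, the frame and hop sizes, and the synthetic 1 kHz test tone are all illustrative assumptions, not details from Google's paper.

```python
import numpy as np

def spectrogram(signal, frame_len=256, hop=128):
    """Magnitude spectrogram via a short-time Fourier transform.

    Illustrative only: frame_len and hop are arbitrary choices,
    not values taken from Tacotron 2.
    """
    window = np.hanning(frame_len)
    n_frames = 1 + (len(signal) - frame_len) // hop
    # Slice the signal into overlapping, windowed frames.
    frames = np.stack([
        signal[i * hop : i * hop + frame_len] * window
        for i in range(n_frames)
    ])
    # rfft keeps only the non-negative frequencies of a real signal.
    # Transpose so rows are frequency bins and columns are time frames.
    return np.abs(np.fft.rfft(frames, axis=1)).T

# Hypothetical input: one second of a pure 1 kHz tone sampled at 8 kHz.
sample_rate = 8000
t = np.arange(sample_rate) / sample_rate
tone = np.sin(2 * np.pi * 1000 * t)

spec = spectrogram(tone)
freqs = np.fft.rfftfreq(256, d=1 / sample_rate)

# The strongest row of the spectrogram should sit at the tone's frequency.
peak_hz = freqs[spec.mean(axis=1).argmax()]
print(spec.shape, peak_hz)
```

Each column of `spec` describes which frequencies are present during one short slice of time. That grid is the kind of "chart" the second network turns back into audio.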

Below is an example. One of the samples was generated by an AI, the other by a human.

“George Washington was the first President of the United States.”

The Google researchers demonstrate that Tacotron 2 can also pronounce names and words that are difficult:

“Basilar membrane and otolaryngology are not auto-correlations.”

It also has the ability to enunciate based on punctuation.

“This is your personal assistant, Google Home.”

“This is your personal assistant Google Home.”

Capitalized words are stressed, just as a person would stress a word to indicate that it is an important part of the sentence:

“The buses aren’t the problem, they actually provide a solution.”

“The buses aren’t the PROBLEM, they actually provide a SOLUTION.”

It is also robust to spelling errors:

“Thisss isrealy awhsome.”

What's more, Tacotron 2 is also good at tongue twisters:

“She sells sea-shells on the sea-shore. The shells she sells are sea-shells I'm sure.”

The Tacotron 2 text-to-speech system is like the company's other AI research projects, with one difference: the technology can be immediately useful to Google.

Its personal assistant, for example, has used WaveNet technology since 2016. With Tacotron 2, Google can make the digital assistant a lot more powerful.

However, the Tacotron 2 system has only been trained to mimic one female voice. For it to speak like another woman, or like a man, Google would need to train the system again.

Published: 
30/12/2017