
Humans speak like humans; that much is obvious, since we are living beings. But what about machines? Can we make them speak the way we do?
When computers first learned to speak, they talked like robots and sounded like robots. As the technology advanced, researchers began giving machines more human-sounding voices, so that they could speak much more like people. One of these advances comes from Google.
In a research paper, the company details how its text-to-speech system, called Tacotron 2, can generate speech from text that comes close to a recording of a real person.
The system is Google's second official generation of the technology and uses two deep neural networks. The first network translates the text into a spectrogram, a visual representation of audio frequencies over time. That spectrogram is then fed into WaveNet, a system from Alphabet's AI research lab DeepMind, which reads the chart and generates the corresponding audio.
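To make the idea of a spectrogram more concrete, here is a minimal Python sketch that computes a magnitude spectrogram by sliding a window over an audio signal and taking a discrete Fourier transform of each frame. It is an illustration under simplifying assumptions, not Google's code: the function names, the 256-sample frame, and the 64-sample hop are hypothetical choices, and Tacotron 2 itself predicts a mel-scaled spectrogram, which adds a frequency-warping step not shown here.

```python
import cmath
import math


def hann_window(n):
    """Periodic Hann window of length n, used to taper each frame's edges."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * i / n) for i in range(n)]


def magnitude_spectrogram(signal, frame_len=256, hop=64):
    """Hypothetical helper: return one list of DFT magnitudes per frame.

    Each frame covers frame_len samples, consecutive frames start hop
    samples apart, and each frame holds bins 0 .. frame_len // 2.
    Uses a naive O(N^2) DFT for clarity; real systems use an FFT.
    """
    window = hann_window(frame_len)
    frames = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        # Taper the frame so its abrupt edges do not smear the spectrum.
        chunk = [s * w for s, w in zip(signal[start:start + frame_len], window)]
        bins = []
        for k in range(frame_len // 2 + 1):
            # Correlate the frame with a complex sinusoid at bin k.
            acc = sum(x * cmath.exp(-2j * math.pi * k * n / frame_len)
                      for n, x in enumerate(chunk))
            bins.append(abs(acc))
        frames.append(bins)
    return frames


if __name__ == "__main__":
    sample_rate = 8000  # assumed sample rate for this toy example
    tone = [math.sin(2 * math.pi * 1000 * t / sample_rate) for t in range(1024)]
    spec = magnitude_spectrogram(tone)
    loudest = max(range(len(spec[0])), key=lambda k: spec[0][k])
    print(len(spec), "frames; loudest bin in first frame:", loudest)
```

For a 1,000 Hz tone sampled at 8,000 Hz, each bin spans 8000 / 256 = 31.25 Hz, so the energy in every frame concentrates in bin 32. Stacking these frames side by side gives the time-versus-frequency "chart" that a vocoder such as WaveNet is asked to turn back into sound.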
Below is an example. One of the samples was generated by the AI, and the other was recorded by a human.
The Google researchers demonstrate that Tacotron 2 can also pronounce difficult names and words:
It can also adjust its enunciation based on punctuation.
Capitalized words are stressed, just as a human would stress a word to signal that it is an important part of the sentence:

It is also robust to spelling errors:
What's more, Tacotron 2 handles tongue twisters well:
Tacotron 2 is a research project like much of the company's other AI work, but this one stands out because the technology could be immediately useful to Google.
Google Assistant, for example, already uses WaveNet, which DeepMind introduced in 2016. With Tacotron 2, Google could make its digital assistant sound considerably more natural.
However, Tacotron 2 has so far been trained to mimic only a single female voice. To speak like a different woman, or like a man, the system would have to be trained again.