Background

Google Introduces 'Aloud', An AI-Powered Video Dubbing Translator Tool

Aloud

How many languages can someone master? While there are polyglots who can speak dozens of languages, computers can master more languages than that.

Google is a tech company best known for its search engine, but its business extends far beyond that. Among the company's many products is Google Translate, which lets users translate between more than 100 languages.

Translating text, however, is not the same as creating subtitles, and it is a long way from dubbing a video.

This is why Google is experimenting with something that could permanently change tedious work that has long required human labor.

The tech giant has been developing an AI-powered tool that makes adding audio in a language other than the original as simple as typing the translation into a document.

The tool comes out of Area 120, and Google calls it 'Aloud'.

In a blog post, Google said:

"Aloud makes video dubbing easy and cost effective, getting us one step closer towards overcoming the language barrier in videos."

"Have you ever wanted to learn something from a video, but couldn’t because it was in another language? If your answer is yes, you’re not alone."

"That’s why we are introducing Aloud, a new product from Area 120, Google’s in-house incubator for experimental projects. Using Aloud, creators can quickly and easily dub their videos into multiple languages, unlocking knowledge that might be trapped in a single language today. We support dubbing into Spanish and Portuguese, with Hindi, Bahasa-Indonesia and other languages coming soon. We hope this makes dubbing more accessible to creators who previously considered it too difficult or too costly."

The main purpose of the technology is to better bridge the language barrier.

"Subtitles can help bridge the language gap, but they’re not always ideal on mobile devices due to the small form factor, the necessity of constant attention to the screen, and accessibility challenges for those with visual or reading impairments. Dubbing, the process of adding a translated voice track, overcomes those limitations, but is time-consuming and cost-prohibitive for most creators."

With manual translation, proper dubbing can take weeks, and for production companies it can be expensive.

With Aloud, the whole process can take only a few minutes.

To make this happen, Aloud uses advances in audio separation, machine translation and speech synthesis to significantly reduce the time a human translator needs to spend on the work.

"You do not even need to know any language other than the ones you already speak, and all of this is available at no cost to the creator," Google said.

A preview of Aloud's text editor. (Credit: Google)

All that is needed is to provide Aloud with the video and its subtitles in the original language.

If original subtitles are not available, creators can instead review a text transcript that Google generates automatically.

Aloud works by combining several popular AI tasks into a single tool.

With Aloud, users can provide subtitles for a video, or the AI's speech-to-text model can produce a transcript for review. The tool then translates the text into an available language and lets users pick a synthetic voice to read out the translated speech, which replaces the original audio in the video for publishing.
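To make that sequence concrete, here is a minimal Python sketch of the steps: take subtitles or fall back to a transcript, translate each line, then synthesize a voice clip that keeps the original timing. It is not Google's implementation, and the blog post does not describe Aloud's internals or any public API. Every name in it (Segment, dub_video, and the three stand-in functions) is hypothetical, and each stage is passed in as a plain function so that any real speech-to-text, translation or text-to-speech service could be plugged in.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class Segment:
    """One line of speech with its timing in the video (hypothetical type)."""
    start: float  # seconds
    end: float    # seconds
    text: str


# Hypothetical stage signatures; none of these are Aloud APIs.
Transcriber = Callable[[str], List[Segment]]   # video path -> timed text
Translator = Callable[[str, str], str]         # (text, language) -> text
Synthesizer = Callable[[str, str], bytes]      # (text, voice) -> audio bytes


def dub_video(
    video_path: str,
    target_language: str,
    voice: str,
    transcribe: Transcriber,
    translate: Translator,
    synthesize: Synthesizer,
    subtitles: Optional[List[Segment]] = None,
) -> List[Tuple[float, float, bytes]]:
    """Return (start, end, audio) clips for a translated voice track."""
    # Step 1: use creator-supplied subtitles, else fall back to speech-to-text.
    segments = subtitles if subtitles is not None else transcribe(video_path)

    dubbed = []
    for seg in segments:
        # Step 2: translate each line into the target language.
        translated = translate(seg.text, target_language)
        # Step 3: synthesize speech for the translated line in the chosen voice.
        audio = synthesize(translated, voice)
        # Keep the original timing so the clip can replace the source audio.
        dubbed.append((seg.start, seg.end, audio))
    return dubbed


# Stand-in stages for illustration only; real services would replace these.
def fake_transcribe(video_path: str) -> List[Segment]:
    return [Segment(0.0, 1.5, "hello")]


def fake_translate(text: str, language: str) -> str:
    phrasebook = {("hello", "es"): "hola"}
    return phrasebook.get((text, language), text)


def fake_synthesize(text: str, voice: str) -> bytes:
    return f"{voice}:{text}".encode("utf-8")


clips = dub_video(
    "talk.mp4", "es", "voice-a",
    fake_transcribe, fake_translate, fake_synthesize,
)
```

The human review that the article mentions would sit between the first and second steps: a creator corrects the transcript before anything is translated, so errors are fixed once rather than carried into every target language.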

In other words, Google wants Aloud to trim the usual dubbing process, which is complex, lengthy and requires several people, down to a simple one that it says won't cost creators anything.

Published: 17/03/2022