
Since first releasing Google Translate, Google has been steadily adding new languages to it.
Eighteen years after the tool was first released, Google Translate supported 133 languages as of May. Now Google has announced Google Translate's "largest expansion ever," adding support for 110 new languages in one go.
The update brings the total number of supported languages to 243.
This is a significant increase that puts Google Translate way ahead of Apple Translate, which supports only 20 languages, and leaps past Microsoft Translator, which supports 135 languages.
The 110 additional languages give Google the upper hand among the three big players.
And this happened thanks to AI.
We’re using AI to add over 100 new languages to Google Translate, our largest expansion ever. Learn more ↓ https://t.co/jLGouceAIG
— Google (@Google) June 27, 2024
According to Google, it achieved this feat with the help of the company’s PaLM 2 large language model.
Some of the 110 newly added languages include Afar, Cantonese, Manx, NKo, Punjabi (Shahmukhi), Tamazight (Amazigh), and Tok Pisin.
"From Cantonese to Q'eqchi', these new languages represent more than 614 million speakers, opening up translations for around 8% of the world's population," Isaac Caswell, senior software engineer for Google Translate, wrote in a release.
"Some are major world languages with over 100 million speakers. Others are spoken by small communities of Indigenous people, and a few have almost no native speakers but active revitalization efforts."
A quarter of the 110 languages come from Africa, including Fon, Kikongo, Luo, Ga, Swati, Venda, and Wolof, making this the largest expansion of African languages on Google Translate to date.
When deciding which languages to add, Google has a lot to consider.
By its own account, that includes "everything from what varieties we offer, to what specific spellings we use."
"Languages have an immense amount of variation: regional varieties, dialects, different spelling standards. In fact, many languages have no one standard form, so it’s impossible to pick a 'right' variety," explained Caswell.
"Our approach has been to prioritize the most commonly used varieties of each language. For example, Romani is a language that has many dialects all throughout Europe. Our models produce text that is closest to Southern Vlax Romani, a commonly used variety online. But it also mixes in elements from others, like Northern Vlax and Balkan Romani."
Cantonese, Caswell said, has "long been one of the most requested languages," but adding it was challenging because it often overlaps with Mandarin in writing, which made it "tricky to find data and train models."
Tok Pisin, the lingua franca of Papua New Guinea, is an English-based creole. App users who speak English should try translating into Tok Pisin because they "might be able to make out the meaning!"
Shahmukhi, a variety of Punjabi that's the most spoken language in Pakistan, was added along with Afar, a language spoken in Djibouti, Eritrea and Ethiopia. Of all the languages in this launch, Afar had the most volunteer community contributions.
Manx, the Celtic language of the Isle of Man, was nearly extinct after the death of its last native speaker in 1974. But Google added it to Google Translate after a revival movement on the island produced thousands of speakers.
NKo is a standardized form of the West African Manding languages, which unifies many dialects into a common language. Its unique alphabet was invented in 1949, and it has an active research community that develops modern resources and technology.
Punjabi (Shahmukhi) is the variety of Punjabi written in Perso-Arabic script (Shahmukhi), and is the most spoken language in Pakistan. Tamazight (Amazigh) is a Berber language spoken across North Africa.
Although there are many dialects, the written form is generally mutually understandable. It’s written in Latin script and Tifinagh script, both of which Google Translate supports.
PaLM 2 was a key piece of the puzzle in making all of this possible, helping Translate more efficiently learn languages that are closely related to each other.