Behind Bing, Google Search Finally Pushes BERT To Over 70 Languages Worldwide


BERT is a natural-language processing (NLP) technique that helps search engines better understand users' searches.

While BERT was created and published in 2018 by Jacob Devlin and his colleagues at Google, the search engine giant had only used it for a mere 10% of English search results in the U.S. This is a far cry from Bing.

Previously, Microsoft's search engine announced that it was already applying BERT to every query worldwide.

To catch up, Google has finally turned on BERT in its search engine for all users worldwide.

Google announced on Monday that BERT is coming to over 70 languages globally, which should make Google Search more capable of understanding users' queries.

According to Pandu Nayak, Google Fellow and Vice President of Search, in a blog post back in October:

"If there’s one thing I’ve learned over the 15 years working on Google Search, it’s that people’s curiosity is endless. We see billions of searches every day, and 15 percent of those queries are ones we haven’t seen before--so we’ve built ways to return results for queries we can’t anticipate."

"When people like you or I come to Search, we aren’t always quite sure about the best way to formulate a query. We might not know the right words to use, or how to spell something, because often times, we come to Search looking to learn--we don’t necessarily have the knowledge to begin with. "

Because a search engine has to return relevant results, its algorithms first need to understand the user's intention behind every query. This has proven difficult, since people can use different spellings, word combinations or grammar to express the same thing.

"We sometimes still don’t quite get it right, particularly with complex or conversational queries," continued Nayak at the time.

And this is where Google applies BERT models to Google Search.

BERT, which stands for 'Bidirectional Encoder Representations from Transformers', is an open-source, neural network-based technique for NLP pre-training that enables anyone to train "their own state-of-the-art question answering system."

This is possible because BERT builds on earlier work in pre-training contextual representations, including Semi-supervised Sequence Learning, Generative Pre-Training, ELMo, and ULMFit. However, unlike these previous models, BERT is the first deeply bidirectional, unsupervised language representation, pre-trained using only a plain text corpus.

As a result, models using BERT can process words in relation to all the other words in a sentence, rather than one-by-one in order.

In other words, BERT models can consider the full context of a word by looking at the words that come before and after it. This is particularly useful for search engines to understand the intent behind search queries.
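To make the difference concrete, here is a toy sketch in Python. It is not BERT and involves no neural network; it only shows which words a left-to-right model and a bidirectional model would each be able to "see" around the word "bank". The function names and the two example sentences are made up for illustration.

```python
def left_context(tokens, i):
    """Words a strictly left-to-right model can see for tokens[i]."""
    return tokens[:i]


def bidirectional_context(tokens, i):
    """Words a bidirectional model can see: everything except tokens[i]."""
    return tokens[:i] + tokens[i + 1:]


# Two hypothetical queries in which "bank" means different things.
a = "the bank approved the loan".split()
b = "the bank of the river flooded".split()
i = 1  # position of "bank" in both sentences

# Looking only to the left, the two uses of "bank" look identical.
print(left_context(a, i) == left_context(b, i))

# Looking both ways, the surrounding words tell them apart.
print(bidirectional_context(a, i) == bidirectional_context(b, i))
```

With only the left-hand word "the" available, nothing separates the financial "bank" from the riverside one. Once the words to the right are included, the two contexts differ, and this is the kind of signal a bidirectional model can use to work out what a query means.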

To make BERT functional, Google had to invest in more powerful hardware, because BERT models can be so complex that they push the limits of what Google can do with its existing hardware.