Facebook Uses AI To Read Text Inside Images, Struggles To Understand Memes

Facebook is the largest social network on the web. With billions of users sharing whatever they can think of, it can be very difficult to police them all.

From text posts to photo uploads, videos and more, content is shared on Facebook billions of times every single day. That volume makes it impossible for its human moderators to sift through everything comprehensively. This is why the company leverages AI to do the work.

Relying on artificial intelligence, Facebook can surface things like spam and pornography more efficiently.

One of the AIs Facebook uses is called 'Rosetta'. This machine-learning system has been deployed at a large scale to automatically and proactively identify "inappropriate or harmful content" in images on the social network.

In other words, this Facebook AI can read the text in a photo of a newspaper or in a screenshot, and even help tell whether a meme is offensive or not.

Rosetta reading meme
Rosetta's text extraction model

While chats and most other text inside photos are relatively easy to understand, memes are different. Memes are complicated cultural artifacts, and they have proven difficult for computers to understand.

This is where Rosetta uses a two-step process to understand context better. The first step detects regions of an image that might contain text, using optical character recognition (OCR) technology combined with other machine-learning techniques; the second uses another neural network to transcribe the text and understand its meaning.
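The two-step idea (detect first, then recognize) can be illustrated with a toy sketch. This is not Rosetta's code: the `ToyImage` representation, the `Box` type and the function names are all assumptions made up for illustration, and a real system would run neural networks over pixel data rather than scan strings.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Box:
    """A region of the toy image that appears to contain one word."""
    row: int
    start: int
    end: int  # exclusive


# Hypothetical stand-in for an image: each string is one row, and any
# non-space character counts as "ink". Real systems work on pixel arrays.
ToyImage = List[str]


def detect_word_boxes(image: ToyImage) -> List[Box]:
    """Step 1 (detection): find regions that look like they contain a word."""
    boxes = []
    for row_idx, row in enumerate(image):
        start = None
        # The appended space is a sentinel that closes a run ending at the edge.
        for col, ch in enumerate(row + " "):
            if ch != " " and start is None:
                start = col
            elif ch == " " and start is not None:
                boxes.append(Box(row_idx, start, col))
                start = None
    return boxes


def recognize_word(image: ToyImage, box: Box) -> str:
    """Step 2 (recognition): transcribe the content of one detected region."""
    return image[box.row][box.start:box.end].lower()


def extract_text(image: ToyImage) -> str:
    """Run detection, then recognition on every detected region."""
    return " ".join(recognize_word(image, box) for box in detect_word_boxes(image))


meme = ["  ONE DOES  ", "NOT SIMPLY"]
print(extract_text(meme))
```

Keeping the two steps as separate functions mirrors the split described above: the detector only has to answer "where might text be?", while the recognizer only has to answer "what does this region say?".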

It can then feed that text through other systems, like one that checks whether the meme is about an already-debunked viral hoax.
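As a rough illustration of such a downstream check, the sketch below compares extracted text against a list of known claims. It is only a sketch under stated assumptions: the `DEBUNKED_CLAIMS` list, its sample entry and the function names are hypothetical, and Facebook's actual systems are not described in enough detail here to reproduce. Simple substring matching stands in for whatever matching is really used.

```python
import re

# Hypothetical list of claims that fact-checkers have already debunked.
DEBUNKED_CLAIMS = {
    "the moon is made of cheese",
}


def normalize(text: str) -> str:
    """Lowercase, drop punctuation, and collapse runs of whitespace."""
    text = re.sub(r"[^a-z0-9\s]", "", text.lower())
    return " ".join(text.split())


def matches_debunked_claim(extracted_text: str, claims=DEBUNKED_CLAIMS) -> bool:
    """Return True if the text extracted from an image contains a known claim."""
    cleaned = normalize(extracted_text)
    return any(normalize(claim) in cleaned for claim in claims)


print(matches_debunked_claim("SCIENTISTS ADMIT: The Moon is made of cheese!!!"))
print(matches_debunked_claim("Look at my cat"))
```

Normalizing both sides before comparing means that capitalization and punctuation in the meme do not hide a match.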

The automated systems can “read” the words that are overlaid on top of the photo, as well as analyze the image itself.

Rosetta's knowledge comes from extracting text from more than a billion public Facebook and Instagram images and video frames (in a variety of languages), daily and in real time. This data serves as the input to its text recognition model.

Rosetta is already being used by teams at Facebook and Instagram to improve the quality of photo search, improve the accuracy of photo classification in the News Feed, and identify hate speech.

Rosetta uses a two-step process to understand context better
Two-step model architecture: the first step is to detect words; the second is to recognize them

Using AI can be a good thing. With clever automation, AI can do people's jobs in real time, without stopping, and can be faster and more efficient. But AI is apparently not yet very effective at this task.

Facebook has struggled in the past to identify hate speech and misleading information. Using AI to rate the severity of speech can be difficult because content keeps adapting to trends and cultures, so an AI's training material may not be sufficient.

This is because AIs, unlike humans, need to see tens of thousands of examples before they can learn to complete a complicated task.

With about 350 million photos uploaded to the social network each day, Facebook is fighting an uphill battle. The task requires the social giant to process a wide variety of visual elements, all at once.

Rosetta has a lot to learn, and a lot of catching up to do.

Viswanath Sivakumar, a software engineer at Facebook who works on Rosetta, said, "In the context of proactively detecting hate speech and other policy-violating content, meme-style images are the more complex AI challenge."

Published: 15/09/2018