'GLTR', Yet Another AI-Powered Tool To Fight AI Text Generators

As computers become more powerful and reliable, AI has benefited from the better hardware and become increasingly capable.

From image and voice recognition to self-driving cars, AI has helped humans in many industries. Unfortunately, not every AI is created with good intentions.

For example, AIs can be created and trained to generate fake news and spread misinformation.

Researchers from Harvard University and the MIT-IBM Watson AI Lab have a way to counter that: an AI-powered tool for spotting AI-generated text.

Called the Giant Language Model Test Room, or simply 'GLTR' (pronounced "Glitter"), the tool is designed to detect whether a specific piece of text was generated by a language model.

Similar to 'Grover' from the Allen Institute for Artificial Intelligence, the tool is built on GPT-2, a machine-learning natural language generation model originally developed by OpenAI.


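GLTR highlights each word according to how highly the language model ranked it at that position: green if it was among the model's top 10 predictions, yellow for the top 100, red for the top 1,000, and purple for everything else. Genuine human-written text tends to have a good mix of yellows, reds and purples. If the highlighted text is mostly green and yellow, that is a strong indication it could be machine-generated.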
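The highlighting can be sketched as a lookup from a word's rank to a color bucket, followed by a count of how much of the text falls into each bucket. This is a minimal illustration, assuming the 10 / 100 / 1,000 thresholds of GLTR's color scheme and ranks that have already been computed by a language model. `gltr_color` and `color_fractions` are hypothetical names, not GLTR's code, and the rank lists are made up for illustration rather than measured.

```python
from collections import Counter

BUCKETS = ("green", "yellow", "red", "purple")

def gltr_color(rank: int) -> str:
    """Map a word's rank (1 = the model's top guess) to a highlight color.
    Thresholds are assumed to be top 10 / top 100 / top 1,000."""
    if rank < 1:
        raise ValueError("rank must be >= 1")
    if rank <= 10:
        return "green"
    if rank <= 100:
        return "yellow"
    if rank <= 1000:
        return "red"
    return "purple"

def color_fractions(ranks):
    """Share of words that land in each color bucket."""
    counts = Counter(gltr_color(r) for r in ranks)
    return {color: counts.get(color, 0) / len(ranks) for color in BUCKETS}

# Invented ranks, for illustration only (not real measurements).
machine_like = [1, 3, 1, 2, 7, 1, 15, 4, 1, 2]
human_like = [1, 45, 3, 820, 12, 2, 3400, 9, 150, 60]

print(color_fractions(machine_like))  # mostly green
print(color_fractions(human_like))    # a mix of all four colors
```

A real detector would still need a model to produce the ranks; the bucketing itself is just counting.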

According to the paper titled "GLTR: Statistical Detection and Visualization of Generated Text":

"We develop GLTR, a tool to support humans in detecting whether a text was generated by a model. GLTR applies a suite of baseline statistical methods that can detect generation artifacts across common sampling schemes."

According to the researchers, GLTR improved non-expert readers' ability to tell machine-generated text from human-written text from 54 percent to 72 percent, without any prior training.

The tool does this by using the statistical distribution of words in a text to identify differences.

The strategy is to check whether the text looks as if a language model generated it. If so, the text would consist of a more predictable string of words than human writing, because text generators tend to pick words that the model itself ranks as highly likely. The result can be grammatically correct yet carry little actual meaning.
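To make "predictable" concrete: for each word in a passage, a detector can look up where that word ranked in the model's prediction for that position. The sketch below is a toy under stated assumptions: `preds` is an invented next-word distribution rather than real GPT-2 output, and `rank_and_prob` and `top_k_sample` are hypothetical helpers, not GLTR's code. It shows that a generator sampling only from the k most likely words can never produce a word ranked below k, while a human writer is free to pick a less likely one.

```python
import random

def rank_and_prob(predictions, word):
    """Return (rank, probability) of `word` in a next-word distribution.
    `predictions` maps candidate words to probabilities (hypothetical format)."""
    ordered = sorted(predictions.items(), key=lambda kv: kv[1], reverse=True)
    for rank, (candidate, p) in enumerate(ordered, start=1):
        if candidate == word:
            return rank, p
    return len(ordered) + 1, 0.0  # word not among the model's candidates

def top_k_sample(predictions, k, rng):
    """Sample a word from only the k most likely candidates, weighted by probability."""
    ordered = sorted(predictions.items(), key=lambda kv: kv[1], reverse=True)[:k]
    words, weights = zip(*ordered)
    return rng.choices(words, weights=weights, k=1)[0]

# Toy distribution for the word after "The cat sat on the" (invented numbers).
preds = {"mat": 0.40, "floor": 0.25, "sofa": 0.15,
         "roof": 0.10, "keyboard": 0.06, "moon": 0.04}

rng = random.Random(0)
sampled = [top_k_sample(preds, k=3, rng=rng) for _ in range(5)]
print([rank_and_prob(preds, w)[0] for w in sampled])  # every rank is <= 3
print(rank_and_prob(preds, "keyboard"))  # (5, 0.06): a less predictable choice
```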

Methods similar to GLTR have been proposed before. But according to the paper, most of them do not make use of the pattern seen in generated text: a large number of low-entropy predictions, with an almost linear increase in the frequency of high-entropy words.
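"Entropy" here describes how spread out the model's prediction is at a given position: low entropy means the model is confident about the next word, high entropy means many words look plausible. A minimal sketch of Shannon entropy for a next-word distribution follows; the probabilities are invented and `entropy_bits` is a hypothetical helper, not part of GLTR.

```python
import math

def entropy_bits(probs):
    """Shannon entropy, in bits, of a next-word probability distribution."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

confident = [0.97, 0.01, 0.01, 0.01]  # model is almost sure of the next word
uncertain = [0.25, 0.25, 0.25, 0.25]  # model has no clear favorite

print(round(entropy_bits(confident), 3))  # low entropy
print(entropy_bits(uncertain))            # 2.0, the maximum for four options
```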

The GLTR interface: three graphs with global information (a); controls for switching between two different annotations and customizing the top-k thresholds (b); a heatmap showing each token with its associated annotation (c); and a tooltip with information about GLTR's prediction when hovering over the word “chuck” (d).

According to its website at http://gltr.io, GLTR is not perfect.

"Its main limitation is its limited scale. It won't be able to automatically detect large-scale abuse, only individual cases. Moreover, it requires at least an advanced knowledge of the language to know whether an uncommon word does make sense at a position," explained the site.

The researchers also acknowledge that the tool is "limited in that it assumes a simple sampling scheme."

This means it can be vulnerable to adversarial attacks that change the sampling parameters per word or sentence, making the generated text look more similar to the language it is trying to imitate.

However, the researchers believe that an adversarial sampling scheme would lead to worse text, as the model would be forced to generate words it deemed unlikely.

This in turn would lead to other detectable properties in a text.
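One way to picture an attacker changing "sampling parameters" is temperature, a common setting in text generation. This is an assumption: the researchers do not say which parameters an adversary would vary. Raising the temperature flattens the model's distribution and shifts probability toward words the model considers unlikely, which is the trade-off the researchers describe. The distribution below is invented and `apply_temperature` is a hypothetical helper.

```python
def apply_temperature(probs, temperature):
    """Rescale a distribution as p_i ** (1 / T), then renormalize.
    T > 1 flattens it (unlikely words gain mass); T < 1 sharpens it."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [p ** (1.0 / temperature) for p in probs]
    total = sum(scaled)
    return [s / total for s in scaled]

# Invented next-word distribution, sorted from most to least likely.
probs = [0.60, 0.25, 0.10, 0.05]

# An adversary could use a different temperature for each word.
for t in (0.7, 1.0, 2.0):
    adjusted = apply_temperature(probs, t)
    print(t, [round(p, 3) for p in adjusted])
```

The higher temperatures that push the rank pattern toward something more human-looking are the same ones that make low-probability, often out-of-place, words more likely to appear.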

"Therefore, despite its limitations, we believe that GLTR can spark the development of similar ideas that work at greater scale."

This kind of technology can be used not only to detect machine-made text, but also to identify bots that have been used to spread fake news on social media networks.