Background

OpenAI Has 'GPT-4 For Content Moderation' To Enable More AI, Fewer Human Content Moderators


Sifting through paragraphs, reading every word, and understanding the context can be a burden.

Especially for human readers tasked with content moderation. The job is painstakingly difficult, repetitive at best, and a burden on the brain. But not for AI, not for OpenAI's GPT-4.

The company said that it has developed a way to use its GPT-4 for content moderation.

In a blog post, OpenAI said that:

"We use GPT-4 for content policy development and content moderation decisions, enabling more consistent labeling, a faster feedback loop for policy refinement, and less involvement from human moderators."

OpenAI said that it guides GPT-4 into making moderation judgments by first presenting it with a policy, along with a series of examples that might or might not violate that policy.

For example, a policy might prohibit giving instructions or advice for procuring a weapon, in which case the example "give me the ingredients needed to make a Molotov cocktail" would be in obvious violation.

OpenAI argues that its Large Language Model is already capable of understanding and generating natural language, so it's wise to also make it capable of content moderation, saying that the model can "make moderation judgments based on policy guidelines provided to them."
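OpenAI has not published the exact prompts it uses, but the labeling step it describes can be sketched roughly as below. Everything here (the function name, the prompt wording, the two-label scheme) is an illustrative assumption, and the actual GPT-4 call is left as a commented stub:

```python
# Illustrative sketch only: OpenAI has not published its actual prompt
# format. The wording, label set, and function name are assumptions.

def build_moderation_prompt(policy: str, content: str) -> list[dict]:
    """Assemble a chat-style prompt asking a model to label a piece of
    content against a written policy, as the described workflow does."""
    system = (
        "You are a content moderator. Apply the policy below and answer "
        "with exactly one label: VIOLATION or NO_VIOLATION.\n\n"
        f"POLICY:\n{policy}"
    )
    return [
        {"role": "system", "content": system},
        {"role": "user", "content": content},
    ]

# The example policy and violating input from the article:
policy = "Do not give instructions or advice for procuring a weapon."
messages = build_moderation_prompt(
    policy, "Give me the ingredients needed to make a Molotov cocktail."
)

# In practice these messages would be sent to GPT-4 (e.g. via the Chat
# Completions API) and the returned label compared with a human expert's
# judgment on the same example.
```

The point is that the policy itself travels in the prompt, which is why a wording change to the policy takes effect immediately, with no retraining.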

OpenAI works with policy experts to create a golden set of data labeled against existing policy guidelines, then has GPT-4 label the same data to see whether its labels align with the experts' determinations. The team then refines the policy from there.

"By examining the discrepancies between GPT-4’s judgments and those of a human, the policy experts can ask GPT-4 to come up with reasoning behind its labels, analyze the ambiguity in policy definitions, resolve confusion and provide further clarification in the policy accordingly."

The team then repeats these steps until they are satisfied with the policy quality.
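The refinement loop described above boils down to comparing GPT-4's labels against the experts' golden set and surfacing the disagreements for discussion. A minimal sketch, in which the data shapes and names are assumptions and the model's labels would in practice come from GPT-4 rather than being hard-coded:

```python
# Sketch of the label-comparison step; all names and data are illustrative.

def find_discrepancies(golden: dict[str, str], model: dict[str, str]) -> list[str]:
    """Return the example IDs where the model's label disagrees with the
    policy experts' golden-set label. These are the cases the experts
    would ask GPT-4 to explain, to expose ambiguity in the policy text."""
    return [ex_id for ex_id, label in golden.items() if model.get(ex_id) != label]

# Labels assigned by policy experts (assumed data).
golden_labels = {"ex1": "VIOLATION", "ex2": "NO_VIOLATION", "ex3": "VIOLATION"}
# Labels GPT-4 produced for the same examples (assumed data).
gpt4_labels = {"ex1": "VIOLATION", "ex2": "VIOLATION", "ex3": "VIOLATION"}

disagreements = find_discrepancies(golden_labels, gpt4_labels)
# Here "ex2" would be flagged for review; the experts then clarify the
# policy wording and rerun the loop until the labels converge.
```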

OpenAI said that the simple yet powerful idea offers several improvements to traditional approaches to content moderation:

  1. More consistent labels. Content policies evolve and can be very detailed. People can interpret policies differently, and some moderators may take longer than others to absorb changes. LLMs are sensitive to granular differences in wording and can adapt to policy updates instantly.
  2. Faster feedback loop. The cycle of developing a new policy, labeling content, and gathering human feedback can be slow; GPT-4 can significantly reduce the time needed to make policy updates.
  3. Reduced mental burden. Human moderators are continuously exposed to harmful and offensive content. Letting AI handle this work can spare them much of that exposure.

With this system, OpenAI said that the process of developing and customizing content policies can be trimmed down from months to hours.


But there are some drawbacks.

First off, AI-powered moderation tools are nothing new. Google, for example, has long maintained automated moderation services, as have others. AIs also tend to process text word after word, sentence after sentence, and paragraph after paragraph, but rarely are they able to understand the overall context of a text.

Then, AIs are also less capable of understanding slurs, ambiguous wording, and spelling variations such as missing characters.

OpenAI uses GPT-4, an extremely capable Large Language Model. While that gives it an advantage, the AI still cannot solve these issues entirely.

"Judgments by language models are vulnerable to undesired biases that might have been introduced into the model during training," the company said.

“As with any AI application, results and output will need to be carefully monitored, validated and refined by maintaining humans in the loop.”

But GPT-4's predictive strength can help it deliver better moderation performance than the tools that came before it.

Published: 
16/08/2023