
For a long time, the AI industry was relatively quiet, and its developments rarely created ripples that disrupted other industries.
But when OpenAI introduced ChatGPT and took the world by storm, the possibilities it opened up shook pretty much the entire tech world. It even put Google on the defensive, triggering what was reportedly a "code red" situation inside the search giant.
Google was quick to catch up, rolling out its own generative AI offering.
While ChatGPT doesn't suggest that users add glue to their pizza or eat rocks, the chatbot is far from perfect.
In the rapidly advancing field of AI, it is crucial to assess the outputs of AI models accurately.
Such state-of-the-art AI systems, including OpenAI's own GPT-4, are trained using what's called Reinforcement Learning from Human Feedback (RLHF), in which human judgments are used to direct the training process.
This approach is used because it's typically quicker and simpler for humans to evaluate AI-generated outputs than it is to create perfect examples from scratch.
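To make that mechanism concrete, here is a minimal sketch of the pairwise preference objective commonly used to train the reward model at the heart of RLHF. It is a generic illustration, not OpenAI's actual implementation; the function name and toy scores are assumptions.

```python
# A minimal sketch of the pairwise preference loss behind RLHF's reward-model
# step. `preference_loss` and the toy scores are illustrative assumptions,
# not OpenAI's implementation.
import torch
import torch.nn.functional as F

def preference_loss(score_preferred: torch.Tensor,
                    score_rejected: torch.Tensor) -> torch.Tensor:
    # Bradley-Terry style objective: push the score of the human-preferred
    # response above the score of the rejected one.
    return -F.logsigmoid(score_preferred - score_rejected).mean()

# Toy usage: scalar scores a reward model might assign two candidate answers.
loss = preference_loss(torch.tensor([1.3]), torch.tensor([0.4]))
print(float(loss))  # small loss, since the preferred answer already scores higher
```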
However, as AI models grow smarter and more complex, the human specialists who evaluate them are becoming overwhelmed.
With ChatGPT becoming more accurate and its errors more subtle, finding inaccuracies is increasingly difficult, and AI trainers struggle to assess the accuracy and quality of these outputs consistently.
To overcome this, OpenAI researchers have introduced CriticGPT, a model that can help human trainers spot errors in ChatGPT’s responses.
We’ve trained a model, CriticGPT, to catch bugs in GPT-4’s code. We’re starting to integrate such models into our RLHF alignment pipeline to help humans supervise AI on difficult tasks: https://t.co/5oQYfrpVBu
— OpenAI (@OpenAI) June 27, 2024
CriticGPT’s primary purpose is to produce thorough criticisms that draw attention to mistakes, especially in code outputs. This model has been created to overcome the inherent limitations of human review in RLHF. It offers a scalable supervision mechanism that improves the precision and dependability of AI systems.
In one example, OpenAI shows how ChatGPT writes code snippets in response to user prompts.
Here, the GPT-4-based CriticGPT model can help find errors in the code output provided by the chatbot, writing critiques that highlight inaccuracies in ChatGPT's answers.
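To illustrate the kind of review this enables, here is a hypothetical example (not taken from OpenAI's materials) of a subtle bug a chatbot might produce, with the sort of localized critique CriticGPT is designed to write shown as a comment.

```python
# Hypothetical illustration, not from OpenAI's paper: a subtle bug of the kind
# a chatbot might produce, with a CriticGPT-style critique shown as a comment.

def moving_average(values, window):
    """Return the moving averages of `values` over `window` elements."""
    out = []
    # BUG: `range(len(values) - window)` drops the final window. A pointed
    # critique would flag exactly this line and suggest
    # `range(len(values) - window + 1)`.
    for i in range(len(values) - window):
        out.append(sum(values[i:i + window]) / window)
    return out

print(moving_average([1, 2, 3, 4], 2))  # [1.5, 2.5] -- the 3.5 average is missing
```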
"CriticGPT’s suggestions are not always correct, but we find that they can help trainers to catch many more problems with model-written answers than they would without AI help," said OpenAI.
"Additionally, when people use CriticGPT, the AI augments their skills, resulting in more comprehensive critiques than when people work alone, and fewer hallucinated bugs than when the model works alone. In our experiments a second random trainer preferred critiques from the Human+CriticGPT team over those from an unassisted person more than 60% of the time."
OpenAI said that CriticGPT helps trainers write more comprehensive critiques than they would without help, while producing fewer hallucinations than critiques from the model alone.

To make this work, CriticGPT was itself trained with RLHF, much like ChatGPT.
But unlike ChatGPT, this particular model was trained on a large number of inputs containing deliberately inserted mistakes, which it then had to critique.
"We asked AI trainers to manually insert these mistakes into code written by ChatGPT and then write example feedback as if they had caught the bug that they just inserted," OpenAI explained. "The same person then compared multiple critiques of the modified code so they could easily tell when a critique caught their inserted bug."
In the experiments, OpenAI found that CriticGPT could catch both the inserted bugs and "naturally occurring" ChatGPT bugs that a previous trainer had caught.
"We find that CriticGPT critiques are preferred by trainers over ChatGPT critiques in 63% of cases on naturally occurring bugs, in part because the new critic produces fewer “nitpicks” (small complaints that are unhelpful) and hallucinates problems less often," the company went on.
The team also found that the AI can "generate longer and more comprehensive critiques by using additional test-time search against the critique reward model."
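In spirit, this amounts to sampling multiple candidate critiques and letting the critique reward model pick the strongest. The snippet below is a simple best-of-n stand-in for the more elaborate search OpenAI describes; `sample_critique` and `reward_model` are hypothetical placeholders.

```python
# A simple best-of-n stand-in for test-time search against the critique reward
# model; OpenAI's actual procedure is more elaborate. `sample_critique` and
# `reward_model` are hypothetical placeholders.
import random

def sample_critique(answer: str) -> str:
    # Stand-in for drawing one critique from the critic model.
    return f"critique #{random.randint(0, 999)} of: {answer!r}"

def reward_model(answer: str, critique: str) -> float:
    # Stand-in for the critique reward model's scalar score.
    return random.random()

def best_critique(answer: str, n: int = 8) -> str:
    # Sample several candidates and keep the one the reward model scores
    # highest; a larger n trades compute for more thorough critiques.
    candidates = [sample_critique(answer) for _ in range(n)]
    return max(candidates, key=lambda c: reward_model(answer, c))

print(best_critique("def add(a, b): return a - b"))
```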
In that sense, CriticGPT is a starting point for AI that can help improve itself by learning from its own mistakes.

At least for now, CriticGPT has some weaknesses.
For example, it struggles to understand long and complex tasks, and the model can still hallucinate.
"Sometimes real-world mistakes can be spread across many parts of an answer. Our work focuses on errors that can be pointed out in one place, but in the future we need to tackle dispersed errors as well," the team added.
At this time, CriticGPT can only help so much: if a task or response is extremely complex, even an expert with model help may not be able to evaluate it correctly.
Initially, the CriticGPT model is being used internally, and OpenAI has published a research paper (PDF) describing it in detail.