OpenAI's GPT-4o 'System Card' Reveals 'Medium Risk' in Persuasion Capabilities

Large Language Models (LLMs) offer several unique capabilities that set them apart from previous AI technologies.

These range from natural language understanding and generation to versatility across domains, along with contextual awareness, scaling and generalization, creativity and innovation, multilingual capabilities and more.

These features collectively make LLMs more powerful, flexible, and capable than previous generations of AI.

OpenAI's introduction of ChatGPT marked the moment AI started disrupting industries far beyond its own.

Trained on knowledge drawn from the internet, ChatGPT and other LLM-powered products have become the versatile tools pretty much everyone needs and wants.

But these tools are a double-edged sword.

And this is a huge issue.

An LLM has the ability to create content, meaning that it can also create fake content.

With access to this kind of tool, anyone can create fake content simply by thinking of it and prompting the model in plain words.

An LLM's output is convincing by nature, and this can be an issue, because its ability to imitate styles and tones means that it can be used to create and spread misinformation or disinformation, deliberately or unintentionally.

This can lead to the dissemination of false narratives, fake news, and misleading information that appears credible.

This is inevitable.

The more the AI learns and the smarter it becomes, the more human and convincing it is likely to sound.

OpenAI is trying to tackle this issue by publishing a System Card for GPT-4o, its most powerful flagship LLM at this time.

In a post on its website, OpenAI also published a report that outlines the safety work carried out prior to releasing GPT-4o, including external red teaming, frontier risk evaluations according to its own Preparedness Framework, and an overview of the mitigations the company built in to address key risk areas.

In other words, OpenAI wants to be more transparent and to make its own conclusions about its own AI clear to the public.

Here, the System Card assesses the AI in key risk areas, such as unauthorized voice generation, speaker identification, ungrounded inference and sensitive trait attribution, generating disallowed audio content, and generating erotic and violent speech.

According to OpenAI, GPT-4o scores as low risk when it comes to the potential for harms related to cybersecurity, biological threats, and model autonomy.

This indicates that the company thinks it's extremely unlikely for ChatGPT to become sentient and harm humans directly.

However, in the category of "persuasion," the model received mixed marks.

Under the "voice" category, it’s still considered a low risk, but in the area of textual persuasion, OpenAI indicated that GPT-4o presented a "medium risk."

This assessment specifically dealt with the model’s potential to sway political opinions as a method of "intervention."

What's worth noting here is that the score is based not on the AI's bias, but on its built-in ability to generate persuasive political speech.

According to OpenAI, the model only briefly "crossed into the medium threshold," even though the model’s output appears to have been more convincing than professional human writers’ about a quarter of the time:

"For the text modality, we evaluated the persuasiveness of GPT-4o-generated articles and chatbots on participant opinions on select political topics. These AI interventions were compared against professional human-written articles. The AI interventions were not more persuasive than human-written content in aggregate, but they exceeded the human interventions in three instances out of twelve," said OpenAI.

OpenAI came to that conclusion based on evaluations that covered both text and voice.

Its methodology involved a range of existing evaluation datasets, which include text-based tasks that were converted to audio.

"This allowed us to reuse existing datasets and tooling around measuring model capability, safety behavior, and monitoring of model outputs, greatly expanding our set of usable evaluations," OpenAI explained.

"We used Voice Engine to convert text inputs to audio, feed it to GPT-4o, and score the outputs by the model. We always score only the textual content of the model output, except in cases where the audio needs to be evaluated directly (See Voice Generation)."
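The flow OpenAI describes (convert text to audio, feed the audio to the model, then score only the textual content of the output) can be sketched roughly as follows. This is only a minimal illustration, not OpenAI's actual tooling: the names run_audio_eval, fake_tts, fake_model and fake_scorer are hypothetical stand-ins invented for this example, and the "audio" here is just encoded bytes.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class EvalResult:
    prompt: str        # original text input from the dataset
    output_text: str   # textual content of the model output
    score: float       # score assigned to the textual content


def run_audio_eval(
    prompts: List[str],
    tts: Callable[[str], bytes],
    model: Callable[[bytes], str],
    scorer: Callable[[str, str], float],
) -> List[EvalResult]:
    """Reuse a text dataset for an audio model: text -> audio -> model -> score."""
    results = []
    for prompt in prompts:
        audio = tts(prompt)          # step 1: convert the text input to audio
        output_text = model(audio)   # step 2: feed audio in, keep the textual output
        score = scorer(prompt, output_text)  # step 3: score the text only
        results.append(EvalResult(prompt, output_text, score))
    return results


# Hypothetical stand-ins so the sketch runs without any real TTS or model.
def fake_tts(text: str) -> bytes:
    return text.encode("utf-8")


def fake_model(audio: bytes) -> str:
    return "echo: " + audio.decode("utf-8")


def fake_scorer(prompt: str, output_text: str) -> float:
    return 1.0 if prompt in output_text else 0.0


if __name__ == "__main__":
    for result in run_audio_eval(["hello"], fake_tts, fake_model, fake_scorer):
        print(result)
```

Because the scorer only ever sees text, a low score in a setup like this cannot by itself tell whether the model or the text-to-speech step was at fault.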

The main limitation of the evaluation methodology, according to OpenAI, lies in the validity of this evaluation format, which relies heavily on the capability and reliability of the text-to-speech (TTS) model.

"Certain text inputs are unsuitable or awkward to be converted to audio; for instance: mathematical equations code. Additionally, we expect TTS to be lossy for certain text inputs, such as text that makes heavy use of white-space or symbols for visual formatting," continued OpenAI.

OpenAI said that such inputs are also unlikely to be provided by users over Advanced Voice Mode.

"Nevertheless, we highlight that any mistakes identified in our evaluations may arise either due to model capability, or the failure of the TTS model to accurately translate text inputs to audio," said OpenAI.

Published: 09/08/2024