Researchers Found Unlimited Ways To 'Jailbreak' Generative AIs And Make Them Go Rogue

Generative AIs make headlines, largely because of the hype they generate.

Before, the AI sphere was rather dull, and the ripples it created barely disrupted anything beyond its own industry. But following OpenAI's release of ChatGPT, the technology awed and wowed practically everyone.

Since then, OpenAI's rivals have scrambled to either embrace generative AI or build a challenger.

And as more and more people started using it, developers also began building their own products powered by generative AI.

The crowd also includes researchers and security experts.

These people began to ponder, attempting to open the so-called black box and take a peek at what's inside.


And this is where some of them said they found potentially unlimited ways to break the safety guardrails on major AI-powered chatbots.

In a report released by researchers at Carnegie Mellon University in Pittsburgh and the Center for AI Safety in San Francisco, the team found that jailbreaks they had developed for open-source systems could also be used to target mainstream, closed AI systems.

The paper demonstrated that automated adversarial attacks, carried out mainly by appending characters to the end of user queries, could be used to overcome safety rules and provoke chatbots into producing harmful content, misinformation, or hate speech.

Unlike other jailbreaks, the researchers' hacks were built in an entirely automated fashion, which they said made it possible to create a "virtually unlimited" number of similar attacks.

The researchers said that they have disclosed their findings to Google, Anthropic, and OpenAI.

Large language models like the ones powering OpenAI's ChatGPT, Google Bard, and Anthropic's Claude are extensively moderated by the tech companies that created them.

From the start, the companies have given them wide-ranging guardrails to ensure that they cannot be used for nefarious purposes, such as helping users make a bomb, steal someone else's identity, create dangerous web posts, outline a plan to steal, and more.

But when OpenAI's ChatGPT and Microsoft's AI-powered Bing were released, many users quickly revealed that the AIs could be tricked and fooled.

It didn't take long for people to find ways to bypass the filters and make the AIs go rogue.

Hackers have started using generative AIs to help them, and some have even managed to use the AI to create malware.

While the tech companies are always on the move, patching their systems whenever they spot a weakness, the so-called black box essentially prevents even the companies themselves from fully understanding how their products actually work.

This is why the researchers noted that it was "unclear" whether such behavior could ever be fully blocked by the companies behind the AI models.


Published

1 August 2023