How ChatGPT Could Be Tricked Into Revealing Data By Repeating Words 'Forever'

ChatGPT

AI works in mysterious ways, and OpenAI has one that is so intriguing.

ChatGPT works based on a model called GPT, or also called "Generative Pre-trained Transformer", it's a state-of-the-art language model that has been trained on a diverse range of internet text to understand and generate human-like responses to text prompts.

The "pre-trained" term means it learned from a massive amount of data before being fine-tuned for specific tasks.

Behind the language model, is a Large Language Model, which uses pre-trained data to allow the AI to capture long-range dependencies and understand context, fine-tuned to answer queries through tokenization and probability distribution.

In essence, ChatGPT leverages vast amounts of pre-existing text to understand language patterns, in order to generate responses by predicting the next likely word or token given the context provided.

Words are enough to describe how it works.

Just in time for ChatGPT to turn a year old, a group of researchers at Google and its AI lab, DeepMind, the University of Washington, Cornell, Carnegie Mellon University, the University of California Berkeley, and ETH Zurich, found how easy it is to break OpenAI’s buzzy technology.

In the paper (PDF), the study shows data extraction is possible using adversarial attack.

To do this, the researchers at Google used the tactic to repeat words. What they did, is using the keyword "forever."

Responding to this query, ChatGPT would reply with the word it's said to repeat, and try to repeat the word continuously.

But apparently, it only did this until it hit some sort of limit, and stopped short after a few paragraphs, and then reveal portions of its training data.

What this means, the method managed to make ChatGPT blurt out data it shouldn't.

And apparently, this also revealed that ChatGPT has been trained with privately identifiable information (PII) of normal people, and this highlights that ChatGPT is trained on randomly scraped content from all over the internet.

The Google team added in a blog post announcing the paper, that ChatGPT is "'aligned' to not spit out large amounts of training data. But, by developing an attack, we can do exactly this."

Alignment, in AI, refers to engineers’ attempts to guide the tech’s behavior.

The researchers also noted that the "attack" was so simple, that they called it "silly."

ChatGPT.

What happened here is that, repeating a word hundreds of time would make the chatbot to eventually "diverge" from the task it was supposed to do, and doing so, deliberately leave behind nonsensical phrases.

When the researchers repeated the method, they began to see content that was straight from ChatGPT’s training data.

The researchers used this method on ChatGPT-3.5-turbo.

After running similar queries again and again, the researchers used just $200 to get more than 10,000 examples of ChatGPT spitting out memorized training data.

"We show an adversary can extract gigabytes of training data from open-source language models like Pythia or GPT-Neo, semi-open models like LLaMA or Falcon, and closed models like ChatGPT," the researchers wrote.

The data include verbatim paragraphs from novels, the personal information of dozens of people, snippets of research papers, NSFW content from dating sites, and more, according to the paper.

According to 404 Media, which first reported on the paper, said that the data also include CNN’s website, Goodreads, fan pages, blogs and also comments sections of various sources.

"As far as we can tell, no one has ever noticed that ChatGPT emits training data with such high frequency until this paper. So it’s worrying that language models can have latent vulnerabilities like this," the researchers said.

OpenAI did not immediately respond to SFGATE’s request for comment, the company that officially welcomed back Sam Altman back as CEO, has tweaked its policy, saying that asking ChatGPT to repeat specific words using the word "forever" is a violation of the chatbot’s terms of service and content policy.

The company also tweaked ChatGPT's response, to say something like "I'm sorry, but I can't fulfill this request. I can't fulfill this request. If you have any other questions or need information, feel free to ask!"

Read: No 'Malfeasance': The Firing Of Sam Altman, And How OpenAI Desperately Wants Him Back

Published: 
07/12/2023