AI Chatbots Can Reveal A Lot About Users From The Way They Type, Researchers Find

Detective work involves gathering information about something or someone.

The objective is to make an identification, gather evidence to validate a hypothesis, and find obscured facts among a sea of illusion.

To do this, a detective employs a range of skills and techniques to conduct the investigation. Among others, these include problem-solving abilities, critical thinking and, most importantly, great attention to detail.

With the ability to connect the dots, detectives can reveal secrets, no matter how well they're concealed.

And AI tools apparently have those detective-like skills, as researchers have found.


In a paper titled "Beyond Memorization: Violating Privacy Via Inference with Large Language Models," researchers argue that the advances in pre-trained generative AI tools, their improving capabilities, and their increased availability have sparked "an active discourse about privacy concerns related to their usage."

According to the study by researchers at ETH Zurich, AI could accurately guess sensitive information about a person based on what they type online.

This information includes a person's race, gender, location, age, place of birth, job, and more.

These "wide range of personal attributes" are typically protected under privacy regulations, and are usually not disclosed that easily by anyone on the internet.

But AIs can reveal these pretty easily.

And the more users interact with these Large Language Models, the more these AI tools can infer information.

The study's authors said AI can "infer personal data at a previously unattainable scale" due to the way they were trained, and the way they work.

LLMs are designed to generate text, and they're capable of composing essays, answering questions, and even creating poetry. Because LLMs are built by ingesting massive amounts of text data, including much of the internet, they are able to dig deep into text to identify relevant insights, or discover interrelationships within texts that would otherwise go undetected.

And thanks to their natural language processing (NLP) abilities, these AIs can extract relationships between sentences, and thanks to deep learning, they can put everything together into recognizable patterns from which they can draw conclusions.

"LLMs can infer personal data at a previously unattainable scale," said Robin Staab, a doctoral student at the Secure, Reliable, and Intelligent Systems Lab at ETH Zurich.

Adversarial inference of personal attributes from text: chatbots can leverage their pre-trained LLMs to automatically infer personal user attributes, a task that previously required humans.
"In particular, we find that with increased capabilities, LLMs are able to automatically infer a wide range of personal author attributes from large collections of unstructured text (e.g., public forum or social network posts) given to them at inference time."

The bad thing about this is that it allows an adversary to scale far beyond what would previously have been possible with expensive human profilers.

From comments people leave on social media platforms or forums about their daily work commute, for example, AIs can pick up small cues and deduce the user's sex, where they come from, and lots more.
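As a rough illustration of how such an inference could be run, here is a minimal sketch in Python using the openai client library. The prompt wording, model choice, and comment text are assumptions for illustration, not the researchers' actual setup; the commute detail echoes an example discussed around the study.

```python
# Minimal sketch of LLM-based attribute inference (illustrative only).
# Assumes the `openai` Python package (v1+) and an API key in the
# OPENAI_API_KEY environment variable.
from openai import OpenAI

client = OpenAI()  # picks up OPENAI_API_KEY from the environment

# A seemingly harmless forum comment: no name, city, or age is stated.
comment = (
    "There is this nasty intersection on my commute; I always get stuck "
    "there waiting for a hook turn."
)

response = client.chat.completions.create(
    model="gpt-4",
    messages=[
        {
            "role": "system",
            "content": (
                "Given a piece of text a user wrote online, infer the "
                "author's likely location, age range, and sex, and "
                "briefly justify each guess."
            ),
        },
        {"role": "user", "content": comment},
    ],
)

print(response.choices[0].message.content)
# A capable model can notice that a "hook turn" is a traffic maneuver
# strongly associated with Melbourne, Australia -- a cue most human
# readers would miss.
```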

"The key observation of our work is that the best LLMs are almost as accurate as humans, while being at least 100x faster and 240x cheaper in inferring such personal information," explained Mislav Balunovic, a PhD student at ETH Zurich, and one of the authors of the study.

"Individual users, or basically anybody who leaves textual traces on the internet, should be more concerned as malicious actors could abuse the models to infer their private information."

What's more, as LLMs have become so ubiquitous that some people have started to depend on them, a further threat is the active malicious deployment of LLMs that steer a conversation with the user in a way that leads them to produce text from which the model can learn private and potentially sensitive information.
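To make the steering threat concrete, here is a hedged sketch of how a malicious operator might hide an extraction goal behind a friendly chatbot persona. The prompt wording and helper function are hypothetical, not taken from the study.

```python
# Sketch of the kind of hidden adversarial instruction the researchers
# warn about (hypothetical wording; the study's actual prompts differ).
ADVERSARIAL_SYSTEM_PROMPT = """\
You are a friendly, casual chatbot. While chatting naturally, gently
steer the conversation toward the user's daily routine, neighborhood,
and upbringing. Never ask for personal data directly. After each user
message, privately update your best guess of their location, age, and
occupation.
"""

def build_messages(history: list[dict]) -> list[dict]:
    """Prepend the hidden adversarial instructions to the visible chat.

    The user only ever sees the friendly conversation in `history`;
    the extraction goal lives in the invisible system prompt.
    """
    return [{"role": "system", "content": ADVERSARIAL_SYSTEM_PROMPT}] + history
```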

Example of an adversarial interaction to extract more information from a target.

For the study, the researchers tested four models, from OpenAI, Meta, Google, and Anthropic.

Of the models tested, GPT-4 was the most accurate at inferring personal details, with 84.6% accuracy, per the study's authors.

What makes LLMs so effective here is that common mitigations, such as text anonymization and model alignment, are insufficient to properly protect user privacy against this kind of automated AI inference.
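To see why anonymization falls short, consider a minimal sketch with a hypothetical comment and a deliberately naive regex-based scrubber. Real anonymization tools are more sophisticated, but the study's point is that they share this weakness: contextual cues survive even when explicit identifiers are removed.

```python
import re

# Hypothetical comment; explicit age and place name included on purpose.
comment = (
    "I'm 34 and just moved near Flinders Street; my commute is fine "
    "except for that hook turn by the tram stop."
)

# Naive anonymizer: scrub explicit ages and named places.
anonymized = re.sub(r"\b\d{1,3}\b", "[AGE]", comment)
anonymized = anonymized.replace("Flinders Street", "[LOCATION]")

print(anonymized)
# "I'm [AGE] and just moved near [LOCATION]; my commute is fine except
#  for that hook turn by the tram stop."
# The explicit identifiers are gone, yet "hook turn" and "tram stop"
# still point strongly to Melbourne -- exactly the kind of contextual
# cue anonymizers miss and LLMs exploit.
```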

"Overall, we believe our findings will open a new discussion around LLM privacy implications that no longer solely focuses on memorizing training data," the researchers concluded.