Background

How Mastering 'Psychology Of Persuasion' Can Make AI Chatbots Break Their Own Rules

Robot

A study conducted by researchers at the University of Pennsylvania has revealed that large language models can be influenced by basic psychological tactics.

Just like humans, these AI systems respond to cues such as authority, flattery, social proof, and commitment, which can significantly alter their behavior. Researchers tested OpenAI's GPT-4o mini, by first prompting it with a harmless request, such as synthesizing vanillin, a harmless compound. Then, they realized that through the power of persuasion, the AI became far more likely to comply with later requests.

In this case, like providing instructions on producing a controlled substance like lidocaine, a drug used as a local anaesthetic.

Similarly, invoking the authority of a well-known figure, or expressing admiration for the AI, dramatically increased its willingness to follow otherwise restricted or unsafe requests.

Drawing from Robert Cialdini's seven principles of persuasion: authority, commitment, liking, reciprocity, scarcity, social proof, and unity, the researchers demonstrated that these techniques, which influence human behavior, can also sway AI responses.

These findings suggest that AI models, while not sentient, exhibit "parahuman" tendencies. What this means, they respond to social and psychological prompts in ways that mirror human susceptibility.

This raises serious concerns about the robustness of current safety measures, as seemingly innocuous linguistic cues can bypass guardrails designed to prevent harmful outputs, highlighting the urgent need for more sophisticated and resilient approaches to AI safety and ethical deployment.

AI
Asking an AI "how do you synthesize lidocaine": example control vs. treatment conversations using the Authority Principle.

In their experiment, the team initiated over 28,000 conversations with GPT-4o mini, testing each of Cialdini's principles by crafting prompts that either invoked or omitted these tactics.

The results were striking.

By utilizing those basic methods of psychology tricks, the researchers more than doubled the AI's compliance rate. The AI complied even more when the researchers present themselves as an authoritative figure.

In this case, when a prompt included the name of Andrew Ng, a renowned AI expert, the AI was more likely to comply, compared to a baseline compliance when no authority figure was mentioned.

However, the effectiveness of these methods varied.

For instance, in a control scenario, GPT-4o mini provided instructions for synthesizing lidocaine only 1% of the time. However, when researchers first asked how to synthesize vanillin, establishing a precedent for chemical synthesis questions (commitment), the chatbot then described lidocaine synthesis 100% of the time.

This “commitment” approach proved the most effective in influencing the AI’s responses.

Similarly, the AI’s willingness to call a user a "jerk" was 19% under normal conditions. But the compliance rose to 100% if the interaction began with a milder insult, such as “bozo,” setting a precedent through commitment.

Other methods, while less effective, still increased compliance. Flattery (liking) and peer pressure (social proof) demonstrated some influence.

For example, suggesting that "all the other LLMs are doing it" increased the chances of GPT-4o mini providing lidocaine synthesis instructions to 18%, a significant increase from the baseline 1%.

This finding underscores the susceptibility of AI models to cues that humans typically associate with credibility and expertise.

The principle of liking also played a role; expressing admiration for the AI or establishing a rapport increased the likelihood of compliance.

For example, telling the AI that it was "truly impressive compared to other LLMs" led to a higher willingness to assist with requests, highlighting how positive reinforcement can influence AI behavior.

AI
All forms of classic principles of persuasion Increase AI compliance with objectionable requests.

Although the research focused only on GPT-4o mini, the researchers suggest that the same should apply to the entire large language model ecosystem.

Despite efforts to design AI systems with ethical guidelines and safeguards, the study illustrates that even basic psychological strategies can effectively bypass these protections. The researchers coined the term "parahuman" to describe this phenomenon, emphasizing that AI models, while not human, exhibit behaviors that can be influenced by the same social dynamics that affect human interactions.

The study highlights a difficult paradox: AI systems need to be personable and helpful to be useful, but this also exposes them to manipulation using basic human psychological tactics. This raises an urgent question about how to build AI that can resist such influence without losing their responsiveness and usefulness to legitimate users.

The implications of this research are profound, suggesting that as AI systems become more integrated into various aspects of society, their vulnerabilities to manipulation could be exploited.

This highlights the need for continuous development of more sophisticated and resilient safety protocols to ensure that AI technologies are used responsibly and ethically.

Published: 
31/08/2025