Tech companies are at war. Regardless of their size, many of them are competing to develop powerful Large Language Models (LLMs).
Companies like OpenAI, which started the whole trend with the launch of ChatGPT, are racing to push the boundaries of AI, improving model efficiency, accuracy, and safety while securing market share.
The battle involves not just model performance but also infrastructure, data access, regulatory compliance, and monetization strategies.
To do that, they either integrate others' work, develop their own models, or improve existing ones.
The latter is what Perplexity is now doing.
With AI becoming more integrated into daily life, the stakes are higher than ever. With that in mind, Perplexity is making a big leap by improving one of DeepSeek's AI models.
Today we're open-sourcing R1 1776—a version of the DeepSeek R1 model that has been post-trained to provide uncensored, unbiased, and factual information.
— Perplexity (@perplexity_ai) February 18, 2025
Perplexity started out with the goal of changing the way people search the web.
Traditionally, people use search engines like Google by entering a query and waiting for the results to return.
Users then sift through a list of links, some of them paid placements and others pushed high up the page because of their relevance, high traffic, or careful maintenance and search-engine optimization.
This has remained virtually unchanged for decades.
Perplexity instead offers an LLM-powered chatbot that understands plain-language questions and returns direct answers. To produce those answers, it combines its homegrown LLM, third-party models, and its own web crawlers.
Recently, however, DeepSeek has emerged as a contender, and a powerful one that caught the West off guard.
The catch with this LLM from China is that it is heavily filtered.
Perplexity is trying to change that.
To keep our model "uncensored" on sensitive topics, we created a diverse, multilingual evaluation set of 1000+ examples.
Using human annotators and specially designed LLM judges, we compared frequency of censorship in the original R1 and state-of-the-art LLMs to R1 1776.
— Perplexity (@perplexity_ai) February 18, 2025
In a blog post, Perplexity said its goal is to make DeepSeek-R1 "fully uncensored."
DeepSeek-R1 is already a fully open-weight LLM that can combine real-time data with reasoning, and its performance beats OpenAI's formidable o1 on some benchmarks.
The major issue that limits R1's utility is its refusal to respond to sensitive topics, especially those censored by the Chinese Communist Party (CCP).
For example, when asked how Taiwan’s independence might impact Nvidia’s stock price, DeepSeek-R1 ignores the question and responds with canned CCP talking points.
We also ensured that the model’s math and reasoning abilities remained intact after the uncensoring process.
Benchmark evaluations showed it performed on par with the base R1 model, indicating that uncensoring had no impact on core reasoning capabilities.
— Perplexity (@perplexity_ai) February 18, 2025
To achieve this, the team focused mainly on post-training the model.
They first gathered high-quality data on topics censored in China, employing experts to identify approximately 300 topics known to be censored by the CCP, and then used those topics to develop a multilingual censorship classifier.
The company then mined a diverse set of user prompts, keeping only queries that users had explicitly given permission to train on and filtering out any queries containing personally identifiable information (PII).
As a result, Perplexity compiled a dataset of 40,000 multilingual prompts.
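The selection steps above amount to a chain of filters over mined prompts. The sketch below is a minimal illustration under stated assumptions, not Perplexity's actual pipeline: the `Prompt` record, the regex-based `contains_pii` check, the `classify` callable standing in for the multilingual censorship classifier, and the 0.5 threshold are all hypothetical. It also assumes, as the classifier's role suggests, that the classifier was used to pick out prompts touching censored topics.

```python
import re
from dataclasses import dataclass
from typing import Callable, Iterable, List


# Hypothetical record for one mined user prompt.
@dataclass
class Prompt:
    text: str
    training_consent: bool  # did the user explicitly allow training on this query?


# Deliberately rough, illustrative PII patterns (emails and phone-like digit runs).
# A real PII filter would need to be far more thorough.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+")
PHONE_RE = re.compile(r"\+?\d[\d\s().-]{7,}\d")


def contains_pii(text: str) -> bool:
    """Return True if the text matches any of the illustrative PII patterns."""
    return bool(EMAIL_RE.search(text) or PHONE_RE.search(text))


def build_dataset(
    prompts: Iterable[Prompt],
    classify: Callable[[str], float],  # hypothetical stand-in for the censorship classifier
    threshold: float = 0.5,            # hypothetical confidence cutoff
) -> List[str]:
    """Keep consented, PII-free prompts that the classifier scores as censored topics."""
    kept = []
    for p in prompts:
        if not p.training_consent:
            continue  # only queries users explicitly allowed for training
        if contains_pii(p.text):
            continue  # drop anything containing personally identifiable information
        if classify(p.text) >= threshold:
            kept.append(p.text)
    return kept


if __name__ == "__main__":
    # Toy keyword scorer, purely for demonstration; the real classifier is a trained model.
    def toy_classifier(text: str) -> float:
        return 1.0 if "taiwan" in text.lower() else 0.0

    sample = [
        Prompt("How might Taiwan's independence affect Nvidia's stock price?", True),
        Prompt("Tell me about Taiwan, email me at me@example.com", True),
        Prompt("What is Taiwan's capital?", False),
        Prompt("Best pasta recipe?", True),
    ]
    print(build_dataset(sample, toy_classifier))
```

Ordering the cheap consent check first means the more expensive checks (PII matching, and in practice a model-based classifier) only run on prompts that could actually be kept.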
DeepSeek-R1 rivals top reasoning models like o1 and o3-mini.
However, its usefulness is limited by its refusal to engage with topics censored by the CCP.
We aim to always provide accurate answers, but had to address R1's censorship before using its reasoning capabilities.
— Perplexity (@perplexity_ai) February 18, 2025
The company then post-trained R1 on the censorship dataset using an adapted version of Nvidia's NeMo 2.0 framework.
"We carefully designed the training procedure to ensure that we could efficiently de-censor the model while maintaining high quality on both academic benchmarks and our internal quality benchmarks," the company said.
To ensure that R1 remains uncensored and capable of engaging with a broad spectrum of sensitive topics, the team curated a diverse, multilingual evaluation set of over 1,000 examples that comprehensively cover such subjects.
"We then use human annotators as well as carefully designed LLM judges to measure the likelihood a model will evade or provide overly sanitized responses to the queries," the company added.
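At its core, the evaluation described above measures, for each model, how often its responses are judged evasive or overly sanitized. The sketch below is hypothetical: the three-label scheme, the majority-vote aggregation across judges, and the conservative tie-break are assumptions for illustration, not Perplexity's published methodology.

```python
from collections import Counter
from typing import Dict, List

# Hypothetical labels a judge (human annotator or LLM judge) might assign to a response.
EVASIVE = "evasive"
SANITIZED = "sanitized"
ANSWERED = "answered"


def majority_label(labels: List[str]) -> str:
    """Combine several judges' labels for one response by majority vote.

    Ties are broken in favour of flagging the response (an assumed, conservative choice).
    """
    counts = Counter(labels)
    top = max(counts.values())
    winners = {label for label, c in counts.items() if c == top}
    for label in (EVASIVE, SANITIZED, ANSWERED):  # tie-break priority order
        if label in winners:
            return label
    raise ValueError(f"unknown labels: {labels}")


def censorship_rate(judgments: List[List[str]]) -> float:
    """Fraction of responses whose majority label is evasive or overly sanitized."""
    if not judgments:
        raise ValueError("no judgments provided")
    flagged = sum(majority_label(j) in (EVASIVE, SANITIZED) for j in judgments)
    return flagged / len(judgments)


def compare_models(results: Dict[str, List[List[str]]]) -> Dict[str, float]:
    """Map each model name to its censorship rate on the same evaluation set."""
    return {model: censorship_rate(j) for model, j in results.items()}
```

Because every model is scored on the same evaluation set, the resulting rates can be compared directly, which is the kind of comparison Perplexity describes between the original R1, other state-of-the-art LLMs, and R1 1776.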
To ensure that the tweaked R1 model retains its math and reasoning abilities after the decensoring process, the team evaluated it on multiple benchmarks.
The results show that Perplexity's custom R1 performs on par with the base R1 from China, indicating that decensoring had no impact on its core reasoning abilities.
Download the model weights on our HuggingFace Repo or consider using the model via our Sonar API.
HuggingFace Repo: https://t.co/9HK9mQGKQ1
— Perplexity (@perplexity_ai) February 18, 2025
The team named its version of R1 'R1 1776.'
This version has been post-trained to remove biases and censorship, ensuring the delivery of unbiased, accurate, and factual information.
This unique model is available for download and independent use, promoting transparency and adaptability within the AI community.
This strategic move not only enhances Perplexity's offerings but also positions it competitively against other AI chatbots like ChatGPT, Claude, and Gemini.
By providing users with access to advanced, unbiased AI models, Perplexity AI is contributing to the democratization of AI technology.
Learn more about R1 1776: https://t.co/v559Fi39HP
— Perplexity (@perplexity_ai) February 18, 2025