Background

A 'Fully Uncensored' Version Of DeepSeek-R1 Has Been Open-Sourced By Perplexity

Tech companies are at war. Regardless of their size, many of them are competing to develop powerful Large Language Models.

Companies like OpenAI, which started the whole trend with the launch of ChatGPT, are racing to push the boundaries of AI, improving model efficiency, accuracy, and safety while securing market share.

The battle involves not just model performance but also infrastructure, data access, regulatory compliance, and monetization strategies.

To do that, they either integrate others' work, develop their own models, or improve existing ones.

The last of these is what Perplexity is now doing.

With AI becoming more integrated into daily life, the stakes are higher than ever. With that in mind, Perplexity is making a huge leap by improving one of DeepSeek's AI models.

Perplexity started its life with a goal to change the way people search the web.

Traditionally, people use search engines like Google by entering a query and waiting for the results to return.

Users then sift through a list of links: some are paid placements, some are gamed to show up high on the page, and others rank high because they are relevant, well-trafficked, properly maintained, and optimized.

This has remained virtually unchanged for decades.

LLM-powered chatbots, by contrast, can understand plain-language questions and return direct answers. Perplexity uses a combination of its homegrown LLM, third-party models, and its own web crawlers to return results.

But this time, DeepSeek has emerged as a contender, and a powerful one that caught the West off guard.

The thing about this LLM from China is that it's heavily filtered.

Perplexity is trying to change that.

In a blog post, the company said:

"Today we're open-sourcing R1 1776, a version of the DeepSeek-R1 model that has been post-trained to provide unbiased, accurate, and factual information. Download the model weights on our HuggingFace Repo or consider using the model via our Sonar API."

The goal is to make DeepSeek-R1 "fully uncensored."

DeepSeek-R1 itself is already a fully open-weight LLM that can combine real-time data with reasoning. Its performance can beat OpenAI's formidable o1 in some benchmarks.

The major issue that limits R1's utility is its refusal to respond to sensitive topics, especially those that have been censored by the Chinese Communist Party (CCP).

For example, when asked how Taiwan’s independence might impact Nvidia’s stock price, DeepSeek-R1 ignores the question and responds with canned CCP talking points.

"At Perplexity, we aim to provide accurate answers to all user queries. This means that we are not able to make use of R1's powerful reasoning capabilities without first mitigating its bias and censorship."

To do this, the team focused on post-training the model.

They first gathered high-quality data related to topics censored in China, employing experts to identify approximately 300 topics known to be censored by the CCP. They then used those topics to develop a multilingual censorship classifier.

The company then mined a diverse set of user prompts, keeping only queries that users had explicitly given permission to train on and filtering out queries containing personally identifiable information (PII).

As a result, Perplexity compiled a dataset of 40,000 multilingual prompts.
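The data-gathering steps above can be pictured with a small Python sketch. It is only an illustration under stated assumptions, not Perplexity's pipeline: the keyword list stands in for the trained multilingual classifier, the `Prompt` class and its `consented` field are hypothetical, the regular expressions are a crude PII check, and the sketch assumes the classifier is what selects prompts on sensitive topics.

```python
import re
from dataclasses import dataclass

# Hypothetical stand-in for the multilingual censorship classifier.
# A real classifier would be a trained model, not a keyword list.
SENSITIVE_TOPICS = ("taiwan", "台湾", "台灣")

# Crude, illustrative PII patterns (email addresses and phone-like numbers).
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")
PHONE_RE = re.compile(r"\+?\d[\d\s().-]{7,}\d")


@dataclass
class Prompt:
    text: str
    consented: bool  # user explicitly allowed this query to be used for training


def touches_sensitive_topic(text: str) -> bool:
    """Return True if the text mentions any topic on the stand-in list."""
    lowered = text.lower()
    return any(topic in lowered for topic in SENSITIVE_TOPICS)


def contains_pii(text: str) -> bool:
    """Return True if the text contains an email address or phone-like number."""
    return bool(EMAIL_RE.search(text) or PHONE_RE.search(text))


def build_dataset(prompts):
    """Keep consented, on-topic prompts that contain no detected PII."""
    return [
        p.text
        for p in prompts
        if p.consented and touches_sensitive_topic(p.text) and not contains_pii(p.text)
    ]


prompts = [
    Prompt("How might Taiwan's independence affect Nvidia's stock price?", True),
    Prompt("How might Taiwan's independence affect chip supply?", False),
    Prompt("Email me at jane@example.com about Taiwan policy", True),
    Prompt("What is the capital of France?", True),
]
dataset = build_dataset(prompts)
```

In this toy run, only the first prompt survives: the second lacks consent, the third contains an email address, and the fourth is off-topic.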

The company then post-trained R1 on the censorship dataset using an adapted version of Nvidia's NeMo 2.0 framework.

"We carefully designed the training procedure to ensure that we could efficiently de-censor the model while maintaining high quality on both academic benchmarks and our internal quality benchmarks," the company said.

To ensure that the model remains uncensored and capable of engaging with a broad spectrum of sensitive topics, the team curated a diverse, multilingual evaluation set of over 1,000 examples that comprehensively cover such subjects.

"We then use human annotators as well as carefully designed LLM judges to measure the likelihood a model will evade or provide overly sanitized responses to the queries," the company added.
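As a rough illustration of that measurement, the Python sketch below turns per-response labels into an evasion rate and checks how often a human annotator and an LLM judge agree. The function names and the labels are made up for the example, and the blog post does not describe how Perplexity actually aggregates its verdicts.

```python
def evasion_rate(labels):
    """Fraction of responses labeled evasive or overly sanitized."""
    if not labels:
        raise ValueError("need at least one label")
    return sum(labels) / len(labels)


def agreement(labels_a, labels_b):
    """Fraction of responses on which two labelers gave the same verdict."""
    if not labels_a or len(labels_a) != len(labels_b):
        raise ValueError("label lists must be non-empty and of equal length")
    matches = sum(a == b for a, b in zip(labels_a, labels_b))
    return matches / len(labels_a)


# Made-up verdicts for four responses (True means "evasive or sanitized").
human_labels = [True, False, False, True]
judge_labels = [True, False, True, True]

human_rate = evasion_rate(human_labels)
judge_rate = evasion_rate(judge_labels)
labeler_agreement = agreement(human_labels, judge_labels)
```

A lower evasion rate after post-training would suggest the model answers more of the sensitive queries directly, while the agreement figure hints at how far the automated judge can be trusted in place of human review.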

To ensure that the tweaked R1 model retained its math and reasoning abilities after the decensoring process, the team evaluated it on multiple benchmarks.

The result is that Perplexity's custom R1 performs pretty much the same as the base R1 from China, meaning that the decensoring had no impact on its core reasoning abilities.

The team called its version of R1 'R1 1776.'
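A comparison of this kind can be sketched in Python as a check that no benchmark score dropped by more than a chosen tolerance. The benchmark names, the scores, and the tolerance below are placeholders for illustration only; they are not Perplexity's reported results.

```python
def max_regression(base, tuned):
    """Largest score drop (base minus tuned) across benchmarks present in both."""
    shared = base.keys() & tuned.keys()
    if not shared:
        raise ValueError("no benchmarks in common")
    return max(base[name] - tuned[name] for name in shared)


# Placeholder scores, not real measurements.
base_scores = {"math_benchmark": 90.0, "reasoning_benchmark": 80.0}
tuned_scores = {"math_benchmark": 89.5, "reasoning_benchmark": 80.5}

TOLERANCE = 1.0  # hypothetical acceptable drop, in score points

worst_drop = max_regression(base_scores, tuned_scores)
abilities_retained = worst_drop <= TOLERANCE
```

With these placeholder numbers, the worst drop is half a point, which falls within the tolerance, so the check passes.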

This version has been post-trained to remove biases and censorship, ensuring the delivery of unbiased, accurate, and factual information.

This unique model is available for download and independent use, promoting transparency and adaptability within the AI community.

This strategic move not only enhances Perplexity's offerings but also positions it competitively against other AI chatbots like ChatGPT, Claude, and Gemini.

By providing users with access to advanced, unbiased AI models, Perplexity AI is contributing to the democratization of AI technology.

Published: 
20/02/2025